7 Practical optimization¶
In this chapter we discuss various practical aspects of creating optimization models.
7.1 Conic reformulations¶
Many nonlinear constraint families were reformulated to conic form in previous chapters, and these examples are often sufficient to construct conic models for the optimization problems at hand. In case you do need to go beyond the cookbook examples, however, a few composition rules are worth knowing to guarantee correctness of your reformulation.
7.1.1 Perspective functions¶
The perspective of a function $f(x)$ is the function of two variables $sf(x/s)$ with $s>0$. A conic model of the epigraph of $f$ can be reused, almost verbatim, to model the epigraph of its perspective.
In Sec. 5.2.6 (Log-sum-exp) it was shown that the logarithm of sum of exponentials

$$f(x)=\log\left(e^{x_{1}}+\cdots+e^{x_{n}}\right)$$

can be modeled as:

$$\sum_{i=1}^{n}u_{i}\leq 1,\qquad (u_{i},1,x_{i}-t)\in K_{\exp},\ \ i=1,\ldots,n.$$

It follows immediately that its perspective

$$sf(x/s)=s\log\left(e^{x_{1}/s}+\cdots+e^{x_{n}/s}\right)$$

can be modeled as:

$$\sum_{i=1}^{n}u_{i}\leq s,\qquad (u_{i},s,x_{i}-t)\in K_{\exp},\ \ i=1,\ldots,n.$$

If, more generally, the epigraph $f(x)\leq t$ has a conic representation of the form $Ax+bt+c\in K$ (possibly involving auxiliary variables), then the epigraph of the perspective, $sf(x/s)\leq t$ with $s>0$, is represented by $Ax+bt+cs\in K$, i.e. by multiplying every constant term by $s$. Indeed, dividing the latter by $s$ and using the fact that cones are closed under scaling by positive numbers recovers $A(x/s)+b(t/s)+c\in K$, that is $f(x/s)\leq t/s$.
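The same mechanism applies to a single exponential term; assuming the exponential cone convention $x_{1}\geq x_{2}e^{x_{3}/x_{2}}$ used above, we have

$$e^{x}\leq t\ \Longleftrightarrow\ (t,1,x)\in K_{\exp},\qquad\qquad s\,e^{x/s}\leq t\ \Longleftrightarrow\ (t,s,x)\in K_{\exp},$$

so passing to the perspective again amounts to replacing the constant $1$ by the variable $s$.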
7.1.2 Composite functions¶
In representing composites $f(g(x))$ it is often useful to move the inner function into a separate constraint by means of an intermediate variable. Two rules make this legitimate:
If $f$ is convex and nondecreasing and $g$ is convex, then $f(g(x))\leq t$ is equivalent to

$$f(r)\leq t,\quad g(x)\leq r,\qquad\text{(Relaxation 1)}$$

If $f$ is convex and nonincreasing and $g$ is concave, then $f(g(x))\leq t$ is equivalent to

$$f(r)\leq t,\quad g(x)\geq r,\qquad\text{(Relaxation 2)}$$

where $r$ is a new auxiliary variable.
Consider Relaxation 1. Intuitively, if $g(x)\leq r$ then, because $f$ is nondecreasing, $f(g(x))\leq f(r)\leq t$, so every point feasible for the pair of constraints also satisfies the original inequality; conversely, any $x$ with $f(g(x))\leq t$ remains feasible after setting $r=g(x)$. Without monotonicity of $f$ the pair of constraints is in general only a relaxation of the original inequality, hence the name.
Here are some simple substitutions.
The inequality $t\geq e^{g(x)}$ can be rewritten as $t\geq e^{r}$ and $r\geq g(x)$ for all convex functions $g(x)$. This is because $e^{r}$ is nondecreasing everywhere.
The inequality $t\geq g(x)^{2}$ can be rewritten as $t\geq r^{2}$ and $r\geq g(x)$ for all nonnegative convex functions $g(x)$. This is because $r^{2}$ is nondecreasing on $\mathbb{R}_{+}$.
The nonconvex inequality $t\geq (x^{2}-1)^{2}$ can be relaxed as $t\geq r^{2}$ and $r\geq x^{2}-1$. This relaxation is exact where the outer square is applied to a nonnegative argument, i.e., on the disjoint domain $|x|\geq 1$. On the domain $|x|\leq 1$ it represents the strongest possible convex cut given by $t\geq 0$.
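For instance, taking $g(x)=x^{2}$ in the first substitution gives a small worked example:

$$e^{x^{2}}\leq t\quad\Longleftrightarrow\quad e^{r}\leq t,\ \ x^{2}\leq r,$$

and both resulting constraints have standard conic models (exponential cone and rotated quadratic cone, respectively).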
Some amount of work is generally required to find the correct reformulation of a nested nonlinear constraint. For instance, suppose that for
A natural first attempt is:
which corresponds to Relaxation 2 with
which is
We refer to Sec. 4 (The power cones) to verify that all the constraints above are representable using power cones. We leave it as an exercise to find other conic representations, based on other transformations of the original inequality.
7.1.3 Convex univariate piecewise-defined functions¶
Consider a univariate function with $k$ pieces:

$$f(x)=\begin{cases}f_{1}(x), & x\leq\alpha_{1},\\ f_{i}(x), & \alpha_{i-1}\leq x\leq\alpha_{i},\ \ i=2,\ldots,k-1,\\ f_{k}(x), & x\geq\alpha_{k-1},\end{cases}$$

with breakpoints $\alpha_{1}<\cdots<\alpha_{k-1}$.
Suppose $f(x)$ is convex; in particular each piece $f_{i}(x)$ is convex and neighboring pieces agree at the breakpoints. Then the epigraph $f(x)\leq t$ can be modeled with one pair of auxiliary variables per piece:

$$f_{i}(x_{i})\leq t_{i},\quad \alpha_{i-1}\leq x_{i}\leq\alpha_{i},\quad i=1,\ldots,k,$$
$$x_{1}+\cdots+x_{k}=x+\alpha_{1}+\cdots+\alpha_{k-1},$$
$$t_{1}+\cdots+t_{k}\leq t+f(\alpha_{1})+\cdots+f(\alpha_{k-1}),$$

with the convention $\alpha_{0}=-\infty$, $\alpha_{k}=+\infty$, so that the outermost bounds are vacuous. Convexity of $f$ guarantees that no admissible split of $x$ across the pieces can produce a value below $f(x)$, while the split that places every $x_{i}$ at a breakpoint, except in the piece actually containing $x$, attains it; hence the model is exact.
In the special case when every piece, extended to the whole real line, stays below $f$ (for instance when all pieces are affine), we simply have $f(x)=\max_{i}f_{i}(x)$ and the constraints $f_{i}(x)\leq t$, $i=1,\ldots,k$, model the epigraph with no auxiliary variables at all.
As the reformulation grows in size with the number of pieces, it is preferable to keep this number low. Trivially, if two neighboring pieces are given by the same expression they can be merged into one and the breakpoint between them dropped.
The Huber loss function

$$f(x)=\begin{cases}-2\mu x-\mu^{2}, & x\leq-\mu,\\ x^{2}, & -\mu\leq x\leq\mu,\\ 2\mu x-\mu^{2}, & x\geq\mu,\end{cases}$$

is convex and its epigraph $f(x)\leq t$ has an equivalent representation

$$x=u+v,\qquad u^{2}+2\mu z\leq t,\qquad |v|\leq z,$$

where $u$, $v$ and $z$ are additional variables.
In this particular example the absolute value $|v|$ captures both symmetric affine pieces at once, so the representation is more compact than what the generic piecewise recipe would produce; it is worth looking for such shortcuts before falling back on the general construction.
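As a quick numerical sanity check of the representation above, one can verify with plain NumPy (minimizing over a fine grid of $u$) that $\min_{u}\,u^{2}+2\mu|x-u|$ reproduces the Huber loss:

```python
import numpy as np

mu = 1.5

def huber(x):
    # Closed-form Huber loss with parameter mu.
    return np.where(np.abs(x) <= mu, x**2, 2*mu*np.abs(x) - mu**2)

# Epigraph form: f(x) <= t iff x = u + v and u^2 + 2*mu*|v| <= t for some u, v.
# Minimizing the left-hand side over the split x = u + (x - u) recovers f(x).
u = np.linspace(-10.0, 10.0, 200_001)
for x in np.linspace(-5.0, 5.0, 11):
    t_min = np.min(u**2 + 2*mu*np.abs(x - u))
    assert abs(t_min - huber(x)) < 1e-2        # equal up to grid resolution
print("infimal representation matches the Huber loss")
```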
7.2 Avoiding ill-posed problems¶
For a well-posed continuous problem a small change in the input data should induce a small change of the optimal solution. A problem is ill-posed if small perturbations of the problem data result in arbitrarily large perturbations of the solution, or even change the feasibility status of the problem. In such cases small rounding errors, or solving the problem on a different computer, can result in different or wrong solutions.
In fact, from an algorithmic point of view, even computing a wrong solution is numerically difficult for ill-posed problems. A rigorous definition of the degree of ill-posedness is possible by defining a condition number. This is an attractive, but not very practical, metric, as its evaluation requires solving several auxiliary optimization problems. Therefore we only make the modest recommendation to avoid problems
that are nearly infeasible,
with constraints that are linearly dependent, or nearly linearly dependent.
To give some idea about the perils of near linear dependence, consider this pair of optimization problems:
and
The first problem has a unique optimal solution
with optimal solution
Typical examples of problems with nearly linearly dependent constraints are discretizations of continuous processes, where the constraints invariably become more correlated as we make the discretization finer; as such there may be nothing wrong with the discretization or problem formulation, but we should expect numerical difficulties for sufficiently fine discretizations.
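The effect of near linear dependence is easy to reproduce outside any particular model; in the generic example below two almost identical equality constraints turn a harmless perturbation of the right-hand side into a massive change of the solution.

```python
import numpy as np

# Two nearly linearly dependent constraints: x + y = 2 and x + (1 + 1e-8) y = 2.
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-8]])
b = np.array([2.0, 2.0])
print(np.linalg.solve(A, b))        # approximately [2, 0]

# Perturbing one right-hand side by 1e-6 moves the solution by about 100.
b_pert = np.array([2.0, 2.0 + 1e-6])
print(np.linalg.solve(A, b_pert))   # approximately [-98, 100]
```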
One should also be careful not to specify problems whose optimal value is only achieved in the limit. A trivial example is
The infimum value 0 is not attained for any finite

Fig. 7.1 Geometric representations of some ill-posed situations.¶
7.3 Scaling¶
Another difficulty encountered in practice involves models that are badly scaled. Loosely speaking, we consider a model to be badly scaled if
variables are measured on very different scales,
constraints or bounds are measured on very different scales.
For example, if one variable contributes to the objective at a scale many orders of magnitude below that of another term,
then in finite precision the second term dominates the objective and renders the contribution from the first variable negligible, so the reported values of that variable carry little meaning.
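The absorption is visible already with plain double-precision floats (a generic illustration, not tied to any specific model):

```python
huge, small = 1e16, 1.0

# In double precision the small term is swallowed completely by the huge one,
# so the objective is flat in any direction that only changes the small term.
print((huge + small) - huge)    # 0.0
print(huge + small == huge)     # True
```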
A similar situation (from a numerical point of view) is encountered when using penalization or big-$M$ strategies,
reasoning that the large penalty term will force the penalized quantity to zero at the optimum; if the penalty constant is excessive, the penalty term swamps the true objective in exactly the way described above.
Suppose we want to solve an efficient frontier variant of the optimal portfolio problem from Sec. 3.3.3 (Markowitz portfolio optimization) with an objective of the form
where the parameter
and note that the maximum is attained for
Consider again the problem (2.12), but with additional (redundant) constraints
with a dual problem
Suppose we do not know a priori an upper bound on
Suppose we have a mixed integer model which involves a big-M constraint (see Sec. 9 (Mixed integer optimization)), for instance
as in Sec. 9.1.3 (Indicator constraints). The constant $M$ must be large enough for the constraint to remain valid, but the larger it is the weaker the continuous relaxation of the model becomes and the more numerical trouble it causes, so it should be kept as small as the problem data allows.
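To see concretely why an oversized constant hurts, consider a generic indicator-type constraint on a variable known to satisfy $0\leq x\leq 1$ (numbers chosen only for illustration):

$$x\leq Mz,\qquad z\in\{0,1\}.$$

In the continuous relaxation $z\in[0,1]$ the point $(x,z)=(1,1/M)$ is feasible, so with $M=10^{8}$ the relaxation allows $z=10^{-8}$, a value indistinguishable from zero at typical feasibility tolerances; the relaxation is then almost uninformative and rounding $z$ becomes unreliable. With the tightest valid choice $M=1$ the same relaxation already enforces $z\geq x$.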
7.4 The huge and the tiny¶
Some types of problems and constraints are especially prone to behave badly with very large or very small numbers. We discuss such examples briefly in this section.
Sum of squares
The sum of squares $x_{1}^{2}+\cdots+x_{n}^{2}$ can be bounded directly using the rotated quadratic cone:

$$x_{1}^{2}+\cdots+x_{n}^{2}\leq t \iff \left(\tfrac{1}{2},t,x\right)\in\mathcal{Q}_{\mathrm{r}}^{n+2},$$

but in most cases it is better to bound the square root with a quadratic cone:

$$\sqrt{x_{1}^{2}+\cdots+x_{n}^{2}}\leq t' \iff (t',x)\in\mathcal{Q}^{n+1}.$$

The latter has slightly better numerical properties: $t'$ is of the same order of magnitude as the entries of $x$, whereas $t$ is comparable to their squares and therefore becomes huge or tiny whenever the $x_{i}$ are even moderately large or small.
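The difference shows up already when evaluating the two bounds in floating point; in the generic Python snippet below the explicit squares overflow while the norm itself is perfectly representable.

```python
import math

x = [1e200, 1e200, 1e200]

# Working with the sum of squares overflows double precision...
print(sum(xi * xi for xi in x))   # inf

# ...while the norm itself is of a perfectly manageable magnitude.
print(math.hypot(*x))             # about 1.732e+200
```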
Exponential cone
Using the function $e^{x}$ safely requires keeping its argument in a modest range: in double precision $e^{x}$ overflows for $x\gtrsim 710$, and in a sum such as $e^{x_{1}}+e^{x_{2}}$ the second term is lost entirely once $x_{1}-x_{2}\gtrsim 36$,
since the smaller summand disappears in comparison to the other one, falling below the relative machine precision.
For the same reason it is advised to replace explicit inequalities involving $e^{x}$ with their log-sum-exp counterparts (see Sec. 5.2.6 (Log-sum-exp)). For example, consider a constraint of the form $e^{x_{1}}+e^{x_{2}}\leq t$.
Then after a substitution $t=e^{t'}$ it becomes $\log\left(e^{x_{1}}+e^{x_{2}}\right)\leq t',$
which has the advantage that $t'$ stays of the same order of magnitude as $x_{1}$ and $x_{2}$.
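Both effects are easy to demonstrate with NumPy (a generic illustration): the smaller exponential vanishes from the sum, while the log-domain computation stays well scaled.

```python
import numpy as np

x1, x2 = 50.0, 0.0

# exp(50) + exp(0) is indistinguishable from exp(50) in double precision:
print(np.exp(x1) + np.exp(x2) == np.exp(x1))   # True

# The log-sum-exp value is still evaluated accurately in the log domain:
print(np.logaddexp(x1, x2))                    # 50.0 (plus a term of order 1e-22)
```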
Power cone
The power cone is not reliable when one of the exponents is very small. For example, consider the function $f(x)=x^{0.01}$, which looks perfectly innocuous for $x\geq 0$.
Now suppose the solver returns $x=10^{-30}$ where the exact solution has $x=0$. Such a violation is negligible under any reasonable tolerance, yet $f(10^{-30})\approx 0.5$, so the value of $f$ changes by $0.5$; constraints whose meaning changes this drastically under microscopic perturbations should be avoided.
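The steepness near zero can be checked directly in plain Python:

```python
# x**0.01 jumps from 0 to about 0.5 as x moves from 0 to 1e-30:
print(0.0 ** 0.01)       # 0.0
print(1e-30 ** 0.01)     # about 0.501
print(1e-300 ** 0.01)    # about 0.001
```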
7.5 Semidefinite variables¶
Special care should be given to models involving semidefinite matrix variables (see Sec. 6 (Semidefinite optimization)), otherwise it is easy to produce an unnecessarily inefficient model. The general rule of thumb is:
having many small matrix variables is more efficient than one big matrix variable,
efficiency drops with a growing number of semidefinite terms involved in linear constraints. This can have a much bigger impact on the solution time than increasing the dimension of the semidefinite variable.
Let us consider a few examples.
Block matrices
Given two matrix variables $X,Y\succeq 0$, it may be tempting to assemble them into a single big matrix variable

$$Z=\begin{bmatrix}X & 0\\ 0 & Y\end{bmatrix}\succeq 0,$$

with linear constraints fixing the off-diagonal blocks of $Z$ to zero.
This increases the dimension of the problem and, even worse, introduces unnecessary constraints for a large portion of entries of the block matrix.
Schur complement
Suppose we want to model a relaxation of a rank-one constraint:

$$\begin{bmatrix}X & x\\ x^{T} & 1\end{bmatrix}\succeq 0,$$

where $x\in\mathbb{R}^{n}$ and $X\in\mathcal{S}_{+}^{n}$; by the Schur complement this is precisely the relaxation $X\succeq xx^{T}$ of the rank-one condition $X=xx^{T}$. Instead of declaring a matrix variable $X$ and a vector variable $x$ and then coupling them through the linear matrix inequality above (which costs a large number of linear constraints), it is better to declare a single semidefinite variable $Z\in\mathcal{S}_{+}^{n+1}$, fix its lower-right entry to $1$,
and use the upper-left $n\times n$ block of $Z$ as $X$ and its last column (without the corner entry) as $x$. The block structure then comes for free and only one scalar constraint is needed.
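For reference, the equivalence invoked here is the standard Schur complement criterion with the strictly positive scalar block $C=1$: provided $C\succ 0$,

$$\begin{bmatrix}A & B\\ B^{T} & C\end{bmatrix}\succeq 0\quad\Longleftrightarrow\quad A-BC^{-1}B^{T}\succeq 0,$$

and with $A=X$, $B=x$, $C=1$ this reads exactly $X-xx^{T}\succeq 0$.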
Sparse LMIs
Suppose we want to model a problem with a sparse linear matrix inequality (see Sec. 6.2.1 (Linear matrix inequalities)) such as:

$$A_{0}+x_{1}A_{1}+\cdots+x_{n}A_{n}\succeq 0,$$

where $x\in\mathbb{R}^{n}$ and the symmetric matrices $A_{i}\in\mathcal{S}^{m}$ are sparse. The straightforward formulation introduces a semidefinite slack variable $S\in\mathcal{S}_{+}^{m}$ with the coupling $S=A_{0}+\sum_{i}x_{i}A_{i}$,
and the linear constraint requires a full set of $m(m+1)/2$ scalar equalities, no matter how sparse the $A_{i}$ are. It may then be more efficient to pose the problem in dual form, where the data matrices enter only through the inner products $\langle A_{i},Z\rangle$ with a dual matrix variable $Z\in\mathcal{S}_{+}^{m}$,
and the number of nonzeros in linear constraints is just the joint number of nonzeros in $A_{0},\ldots,A_{n}$.
7.6 The quality of a solution¶
In this section we will discuss how to validate an obtained solution. Assume we have a conic model with continuous variables only and that the optimization software has reported an optimal primal and dual solution. Given such a solution, we might ask how to verify that it is indeed feasible and optimal.
To that end, consider a simple model
where a solver might approximate the solution as
and therefore the approximate optimal objective value is
The true objective value is
Most likely this difference is irrelevant for all practical purposes. Nevertheless, in general a solution obtained using floating point arithmetic is only an approximation. Most (if not all) commercial optimization software uses double precision floating point arithmetic, implying that about 16 digits are correct in the computations performed internally by the software. This also means that irrational numbers such as $\sqrt{2}$ and $\pi$ can only be stored accurately to about 16 digits.
Verifying feasibility
A good practice after solving an optimization problem is to evaluate the reported solution. At the very least this process should be carried out during the initial phase of building a model, or if the reported solution is unexpected in some way. The first step in that process is to check that the solution is feasible; in case of the small example (7.1) this amounts to checking that:
which demonstrates that one constraint is slightly violated due to computations in finite precision. It is up to the user to assess the significance of a specific violation; for example a violation of one unit in
may or may not be more significant than a violation of one unit in
The right-hand side of
Verifying optimality
Another question is how to verify that a feasible approximate solution is actually optimal. This is answered by duality theory, discussed in Sec. 2.4 (Duality in linear optimization) for linear problems and in Sec. 8 (Duality in conic optimization) in full generality. Following Sec. 8 (Duality in conic optimization) we can derive an equivalent version of the dual problem as:
This problem has a feasible point
leading to a duality gap of about
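In practice both checks amount to a few lines of linear algebra. The sketch below is generic NumPy with made-up data standing in for the actual model: it evaluates the worst constraint violation of a candidate primal point for a small linear problem and the gap between the primal and dual objectives.

```python
import numpy as np

# Hypothetical data for a small LP:  minimize c^T x  s.t.  A x >= b,  x >= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([1.0, 1.0])

# Candidate primal and dual points, as an optimizer might report them.
x = np.array([0.60000001, 0.19999999])   # approximate primal solution
y = np.array([0.4, 0.2])                 # approximate dual multipliers, y >= 0

# Feasibility: largest violation of A x >= b and of x >= 0.
primal_viol = max(float(np.max(b - A @ x)), float(np.max(-x)), 0.0)

# Optimality: duality gap c^T x - b^T y (zero at an exact optimum).
gap = float(c @ x - b @ y)

print(f"worst primal violation: {primal_viol:.1e}")   # about 1e-08
print(f"duality gap:            {gap:.1e}")           # essentially zero
```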
7.7 Distance to a cone¶
The violation of a linear inequality constraint $a^{T}x\leq b$ at a point $x^{*}$ is naturally measured as $\max(0,a^{T}x^{*}-b)$.
It is less obvious how to assess violations of conic constraints. To this end suppose we have a convex cone $K$ and a point $x^{*}$ which is supposed to belong to $K$. Its projection onto $K$ is the solution of the problem

$$\begin{array}{ll}\text{minimize} & \|x^{*}-y\|_{2}\\ \text{subject to} & y\in K.\end{array}\qquad\qquad (7.2)$$

We denote the projection by $\mathrm{proj}_{K}(x^{*})$. The distance

$$\mathrm{dist}(x^{*},K)=\min_{y\in K}\|x^{*}-y\|_{2},$$

i.e. the objective value of (7.2), measures the violation of a conic constraint involving the cone $K$.
This distance measure is attractive because it depends only on the set $K$ itself, and not on any particular algebraic description of the constraint.
Let
is clearly not in the cone, in fact
belongs to
Therefore the distance of
Distance to certain cones
For some basic cone types the projection problem (7.2) can be solved analytically. Denoting by $[x]_{+}=\max(0,x)$ the positive part, we have:
For $x\in\mathbb{R}$ the projection onto the nonnegative half-line is $\mathrm{proj}_{\mathbb{R}_{+}}(x)=[x]_{+}$.
For $(t,x)\in\mathbb{R}\times\mathbb{R}^{n-1}$ the projection onto the quadratic cone $\mathcal{Q}^{n}$ is

$$\mathrm{proj}_{\mathcal{Q}^{n}}(t,x)=\begin{cases}(t,x), & t\geq\|x\|_{2},\\ (0,0), & t\leq-\|x\|_{2},\\ \dfrac{t+\|x\|_{2}}{2}\left(1,\dfrac{x}{\|x\|_{2}}\right), & \text{otherwise.}\end{cases}$$

For a symmetric matrix $X$ with eigenvalue decomposition $X=\sum_{i}\lambda_{i}q_{i}q_{i}^{T}$ the projection onto the semidefinite cone is

$$\mathrm{proj}_{\mathcal{S}_{+}^{n}}(X)=\sum_{i}[\lambda_{i}]_{+}q_{i}q_{i}^{T}.$$
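These formulas translate directly into code; the NumPy sketch below (not tied to any particular solver) implements the three projections and evaluates the distance of a slightly perturbed point to the quadratic cone.

```python
import numpy as np

def proj_halfline(x):
    """Projection of a scalar onto the nonnegative half-line R_+."""
    return max(x, 0.0)

def proj_quadratic_cone(t, x):
    """Projection of (t, x) onto Q^n = {(t, x) : t >= ||x||_2}."""
    nrm = np.linalg.norm(x)
    if t >= nrm:                      # already inside the cone
        return t, x
    if t <= -nrm:                     # closest point is the apex
        return 0.0, np.zeros_like(x)
    alpha = (t + nrm) / 2.0           # otherwise project onto the boundary
    return alpha, alpha * x / nrm

def proj_psd_cone(X):
    """Projection of a symmetric matrix onto the PSD cone: clip eigenvalues at 0."""
    w, Q = np.linalg.eigh(X)
    return (Q * np.maximum(w, 0.0)) @ Q.T

# Distance of a slightly infeasible point to the quadratic cone:
t, x = 0.9999999, np.array([0.6, 0.8])
pt, px = proj_quadratic_cone(t, x)
print(np.hypot(t - pt, np.linalg.norm(x - px)))   # about 7.1e-08
```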