3 Conic quadratic optimization¶
This chapter extends the notion of linear optimization with quadratic cones. Conic quadratic optimization, also known as second-order cone optimization, is a straightforward generalization of linear optimization, in the sense that we optimize a linear function under linear (in)equalities with some variables belonging to one or more (rotated) quadratic cones. We discuss the basic concept of quadratic cones, and demonstrate the surprisingly large flexibility of conic quadratic modeling.
3.1 Cones¶
Since this is the first place where we introduce a non-linear cone, it seems suitable to make our most important definition:
A set \(K\subseteq\real^n\) is called a convex cone if
for every \(x,y\in K\) we have \(x+y\in K\),
for every \(x\in K\) and \(\alpha\geq 0\) we have \(\alpha x\in K\).
For example a linear subspace of \(\real^n\), the positive orthant \(\real_{\geq 0}^n\) or any ray (half-line) starting at the origin are examples of convex cones. We leave it for the reader to check that the intersection of convex cones is a convex cone; this property enables us to assemble complicated optimization models from individual conic bricks.
3.1.1 Quadratic cones¶
We define the \(n\)-dimensional quadratic cone as
The geometric interpretation of a quadratic (or second-order) cone is shown in Fig. 3.1 for a cone with three variables, and illustrates how the boundary of the cone resembles an ice-cream cone. The 1-dimensional quadratic cone simply states nonnegativity \(x_1\geq 0\).
3.1.2 Rotated quadratic cones¶
An \(n-\)dimensional rotated quadratic cone is defined as
As the name indicates, there is a simple relationship between quadratic and rotated quadratic cones. Define an orthogonal transformation
Then it is easy to verify that
and since \(T\) is orthogonal we call \(\Q_r^n\) a rotated cone; the transformation corresponds to a rotation of \(\pi/4\) in the \((x_1,x_2)\) plane. For example if \(x\in \Q^3\) and
then
and similarly we see that
Thus, one could argue that we only need quadratic cones \(\Q^n\), but there are many examples where using an explicit rotated quadratic cone \(\Q^n_r\) is more natural, as we will see next.
3.2 Conic quadratic modeling¶
In the following we describe several convex sets that can be modeled using conic quadratic formulations or, as we call them, are conic quadratic representable.
3.2.1 Absolute values¶
In Sec. 2.2.2 (Absolute value) we saw how to model \(|x|\leq t\) using two linear inequalities, but in fact the epigraph of the absolute value is just the definition of a two-dimensional quadratic cone, i.e.,
3.2.2 Euclidean norms¶
The Euclidean norm of \(x\in \R^n\),
essentially defines the quadratic cone, i.e.,
The epigraph of the squared Euclidean norm can be described as the intersection of a rotated quadratic cone with an affine hyperplane,
3.2.3 Convex quadratic sets¶
Assume \(Q \in \R^{n \times n}\) is a symmetric positive semidefinite matrix. The convex inequality
may be rewritten as
Since \(Q\) is symmetric positive semidefinite the epigraph
is a convex set and there exists a matrix \(F\in \R^{k\times n}\) such that
(see Sec. 6 (Semidefinite optimization) for properties of semidefinite matrices). For instance \(F\) could be the Cholesky factorization of \(Q\). Then
and we have an equivalent characterization of (3.5) as
Frequently \(Q\) has the structure
where \(I\) is the identity matrix, so
and hence
is a conic quadratic representation of (3.5) in this case.
3.2.4 Second-order cones¶
A second-order cone is occasionally specified as
where \(A \in \R^{m\times n}\) and \(c \in \R^n\). The formulation (3.7) is simply
or equivalently
As will be explained in Sec. 8 (Duality in conic optimization), we refer to (3.8) as the dual form and (3.9) as the primal form. An alternative characterization of (3.7) is
which shows that certain quadratic inequalities are conic quadratic representable.
3.2.5 Simple sets involving power functions¶
Some power-like inequalities are conic quadratic representable, even though it need not be obvious at first glance. For example, we have
or in a similar fashion
For a more complicated example, consider the constraint
This is equivalent to a statement involving two cones and an extra variable
because
In practice power-like inequalities representable with similar tricks can often be expressed much more naturally using the power cone (see Sec. 4 (The power cones)), so we will not dwell on these examples much longer.
3.2.6 Rational functions¶
A general non-degenerate rational function of one variable \(f(x)=\frac{ax+b}{gx+h}\) can always be written in the form \(f(x)=p+\frac{q}{gx+h}\), and when \(q>0\) it is convex on the set where \(gx+h>0\). The conic quadratic model of
with variables \(t,x\) can be written as:
We can generalize it to a quadratic-over-linear function \(f(x)=\frac{ax^2+bx+c}{gx+h}\). By performing long polynomial division we can write it as \(f(x)=rx+s+\frac{q}{gx+h}\), and as before it is convex when \(q>0\) on the set where \(gx+h>0\). In particular
can be written as
In both cases the argument boils down to the observation that the target function is a sum of an affine expression and the inverse of a positive affine expression, see Sec. 3.2.5 (Simple sets involving power functions).
3.2.7 Harmonic mean¶
Consider next the hypograph of the harmonic mean,
It is not obvious that the inequality defines a convex set, nor that it is conic quadratic representable. However, we can rewrite it in the form
which suggests the conic representation:
Alternatively, in case \(n=2\), the hypograph of the harmonic mean can be represented by a single quadratic cone:
As this representation generalizes the harmonic mean to all points \(x_1 + x_2 \geq 0\), as implied by (3.12), the variable bounds \(x\geq 0\) and \(t\geq 0\) have to be added seperately if so desired.
3.2.8 Quadratic forms with one negative eigenvalue¶
Assume that \(A \in \R^{n \times n}\) is a symmetric matrix with exactly one negative eigenvalue, i.e., \(A\) has a spectral factorization (i.e., eigenvalue decomposition)
where \(Q^T Q = I\), \(\Lambda=\Diag(-\alpha_1, \alpha_2, \dots, \alpha_n)\), \(\alpha_i \geq 0\). Then
is equivalent to
This shows \(x^T A x \leq 0\) to be a union of two convex subsets, each representable by a quadratic cone. For example, assuming \(q_1^T x \geq 0\), we can characterize (3.13) as
3.2.9 Ellipsoidal sets¶
The set
describes an ellipsoid centred at \(c\). It has a natural conic quadratic representation, i.e., \(x\in {\cal E}\) if and only if
3.3 Conic quadratic case studies¶
3.3.1 Quadratically constrained quadratic optimization¶
A general convex quadratically constrained quadratic optimization problem can be written as
where all \(Q_i\in\real^{n\times n}\) are symmetric positive semidefinite. Let
where \(F_i \in \R^{k_i \times n}\). Using the formulations in Sec. 3.2.3 (Convex quadratic sets) we then get an equivalent conic quadratic problem
Assume next that \(k_i\), the number of rows in \(F_i\), is small compared to \(n\). Storing \(Q_i\) requires about \(n^2/2\) space whereas storing \(F_i\) then only requires \(nk_i\) space. Moreover, the amount of work required to evaluate \(x^T Q_i x\) is proportional to \(n^2\) whereas the work required to evaluate \(x^T F_i^T F_ i x = \| F_i x \|^2\) is proportional to \(nk_i\) only. In other words, if \(Q_i\) have low rank, then (3.16) will require much less space and time to solve than (3.15). We will study the reformulation (3.16) in much more detail in Sec. 10 (Quadratic optimization).
3.3.2 Robust optimization with ellipsoidal uncertainties¶
Often in robust optimization some of the parameters in the model are assumed to be unknown exactly, but there is a simple set describing the uncertainty. For example, for a standard linear optimization problem we may wish to find a robust solution for all objective vectors \(c\) in an ellipsoid
A common approach is then to optimize for the worst-case scenario for \(c\), so we get a robust version
The worst-case objective can be evaluated as
where we used that \(\sup_{\| u \|_2\leq 1} v^T u = (v^T v)/\|v\|_2=\|v\|_2\). Thus the robust problem (3.17) is equivalent to
which can be posed as a conic quadratic problem
3.3.3 Markowitz portfolio optimization¶
In classical Markowitz portfolio optimization we consider investment in \(n\) stocks or assets held over a period of time. Let \(x_i\) denote the amount we invest in asset \(i\), and assume a stochastic model where the return of the assets is a random variable \(r\) with known mean
and covariance
The return of our investment is also a random variable \(y = r^Tx\) with mean (or expected return)
and variance (or risk)
We then wish to rebalance our portfolio to achieve a compromise between risk and expected return, e.g., we can maximize the expected return given an upper bound \(\gamma\) on the tolerable risk and a constraint that our total investment is fixed,
Suppose we factor \(\Sigma = GG^T\) (e.g., using a Cholesky or a eigenvalue decomposition). We then get a conic formulation
In practice both the average return and covariance are estimated using historical data. A recent trend is then to formulate a robust version of the portfolio optimization problem to combat the inherent uncertainty in those estimates, e.g., we can constrain \(\mu\) to an ellipsoidal uncertainty set as in Sec. 3.3.2 (Robust optimization with ellipsoidal uncertainties).
It is also common that the data for a portfolio optimization problem is already given in the form of a factor model \(\Sigma=F^TF\) of \(\Sigma=I+F^TF\) and a conic quadratic formulation as in Sec. 3.3.1 (Quadratically constrained quadratic optimization) is most natural. For more details see Sec. 10.3 (Example: Factor model).
3.3.4 Maximizing the Sharpe ratio¶
Continuing the previous example, the Sharpe ratio defines an efficiency metric of a portfolio as the expected return per unit risk, i.e.,
where \(r_f\) denotes the return of a risk-free asset. We assume that there is a portfolio with \(\mu^T x>r_f\), so maximizing the Sharpe ratio is equivalent to minimizing \(1/S(x)\). In other words, we have the following problem
The objective has the same nature as a quotient of two affine functions we studied in Sec. 2.2.5 (Homogenization). We reformulate the problem in a similar way, introducing a scalar variable \(z\geq 0\) and a variable transformation
Since a positive \(z\) can be chosen arbitrarily and \((\mu - r_fe)^T x>0\), we can without loss of generality assume that
Thus, we obtain the following conic problem for maximizing the Sharpe ratio,
and we recover \(x = y/z\).
3.3.5 A resource constrained production and inventory problem¶
The resource constrained production and inventory problem [Zie82] can be formulated as follows:
where \(n\) denotes the number of items to be produced, \(b\) denotes the amount of common resource, and \(r_j\) is the consumption of the limited resource to produce one unit of item \(j\). The objective function represents inventory and ordering costs. Let \(c_j^p\) denote the holding cost per unit of product \(j\) and \(c_j^r\) denote the rate of holding costs, respectively. Further, let
so that
is the average holding costs for product \(j\). If \(D_j\) denotes the total demand for product \(j\) and \(c_j^o\) the ordering cost per order of product \(j\) then let
and hence
is the average ordering costs for product \(j\). In summary, the problem finds the optimal batch size such that the inventory and ordering cost are minimized while satisfying the constraints on the common resource. Given \(d_j, e_j \geq 0\) problem (3.21) is equivalent to the conic quadratic problem
It is not always possible to produce a fractional number of items. In such case \(x_j\) should be constrained to be integers. See Sec. 9 (Mixed integer optimization).