8 Duality in conic optimization

In Sec. 2 (Linear optimization) we introduced duality and related concepts for linear optimization. Here we present a more general version of this theory for conic optimization and we illustrate it with examples. Although this chapter is self-contained, we recommend familiarity with Sec. 2 (Linear optimization), which some of this material is a natural extension of.

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problems.

From now on we consider a conic problem in standard form

(8.1)\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax=b,\\ & x\in K, \end{array}\end{split}\]

where \(K\) is a convex cone. In practice \(K\) would likely be defined as a product

\[K=K_1\times\cdots\times K_m\]

of smaller cones corresponding to actual constraints in the problem formulation. The abstract formulation (8.1) encompasses such special cases as \(K=\real_+^n\) (linear optimization), \(K_i=\real\) (unconstrained variable) or \(K_i=\PSD^n\) (\(\frac12n(n+1)\) variables representing the upper-triangular part of a matrix).

The feasible set for (8.1):

\[{\cal F}_p=\{x\in \R^n \mid Ax=b\} \cap K\]

is a section of \(K\). We say the problem is feasible if \({\cal F}_p\neq\emptyset\) and infeasible otherwise. The value of (8.1) is defined as

\[p^\star = \inf\{c^Tx~:~ x\in{\cal F}_p\},\]

allowing the special cases of \(p^\star=+\infty\) (the problem is infeasible) and \(p^\star=-\infty\) (the problem is unbounded). Note that \(p^\star\) may not be attained, i.e. the infimum is not necessarily a minimum, although this sort of problems are ill-posed from a practical point of view (see Sec. 7 (Practical optimization)).

Example 8.1 Unattained problem value

The conic quadratic problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & x \\ \mbox{subject to} & xy\geq 0.5\quad ((x,y,1)\in\Q_r^3), \end{array}\end{split}\]

has \(p^\star=0\) but this optimal value is not attained by any point in \({\cal F}_p\).

8.1 Dual cone

Suppose \(K\subseteq \real^n\) is a closed convex cone. We define the dual cone \(K^*\) as

(8.2)\[K^* = \{ y\in \real^n~:~ y^Tx\geq 0\ \forall x\in K\}.\]

For simplicity we write \(y^Tx\), denoting the standard Euclidean inner product, but everything in this section applies verbatim to the inner product of matrices in the semidefinite context. In other words \(K^*\) consists of vectors which form an acute (or right) angle with every vector in \(K\). We see easily that \(K^*\) is in fact a convex cone. The main example, associated with linear optimization, is the dual of the positive orthant:

Lemma 8.1 Dual of linear cone
\[(\real_+^n)^* = \real_+^n\]
Proof

This is obvious for \(n=1\), and then we use the simple fact

\[(K_1\times\cdots\times K_m)^* = K_1^*\times\cdots\times K_m^*,\]

which we invite the reader to prove.

All cones studied in this cookbook can be explicitly dualized as we show next.

Lemma 8.2 Duals of nonlinear cones

We have the following:

  • The quadratic, rotated quadratic and semidefinite cones are self-dual:

    \[(\Q^n)^*=\Q^n,\ (\Qr^n)^*=\Qr^n,\ (\PSD^n)^*=\PSD^n\]
  • The dual of the power cone (4.4) is

    \[(\POW_n^{\alpha_1,\cdots,\alpha_m})^* = \left\lbrace y\in\real^n~:~ \left(\frac{y_1}{\alpha_1},\ldots,\frac{y_m}{\alpha_m},y_{m+1},\ldots,y_n\right)\in\POW_n^{\alpha_1,\cdots,\alpha_m}\right\rbrace.\]
  • The dual of the exponential cone (5.2) is the closure

    \[(\EXP)^* = \mathrm{cl}\left\{y\in\real^3~:~ y_1\geq -y_3e^{y_2/y_3-1},\ y_1>0, y_3<0\right\}.\]
Proof

We only sketch some parts of the proof and indicate geometric intuitions. First, let us show \((\Q^n)^*=\Q^n\). For \((t,x),(s,y)\in\Q^n\) we have

\[st + y^Tx\geq \|y\|_2\|x\|_2 + y^Tx \geq 0\]

by the Cauchy-Schwartz inequality \(|u^Tv|\leq \|u\|_2\|v\|_2\), therefore \(\Q^n\subseteq (\Q^n)^*\). Geometrically we are just saying that the quadratic cone \(\Q^n\) has a right angle at the apex, so any two vectors inside the cone form at most a right angle. Now suppose \((s,y)\) is a point outside \(\Q^n\). If \((-s,-y)\in \Q^n\) then \(-s^2-y^Ty<0\) and we showed \((s,y)\not\in (\Q^n)^*\). Otherwise let \((t,x)=\mathrm{proj}_{\Q^n}(s,y)\) be the projection (see Sec. 7.5 (Distance to a cone)); note that \((s,y)\) and \((t,-x)\in\Q^n\) form an obtuse angle.

For the semidefinite cone we use the property \(\langle X,Y\rangle\geq 0\) for \(X,Y\in\PSD^n\) (see Sec. 6.1.2 (Properties of semidefinite matrices)). Conversely, assume that \(Z\not \in \PSD^n\). Then there exists a \(w\) satisfying \(\langle ww^T, Z\rangle = w^T Z w <0\), so \(Z\not \in (\PSD^n)^*\).

As our last exercise let us check that vectors in \(\EXP\) and \((\EXP)^*\) form acute angles. By definition of \(\EXP\) and \((\EXP)^*\) we take \(x,y\) such that:

\[x_1\geq x_2\exp(x_3/x_2),\quad y_1\geq -y_3\exp(y_2/y_3-1),\quad x_1,x_2,y_1>0,\ y_3<0\]

and then we have

\[\begin{split}\begin{array}{ll} y^Tx=x_1y_1+x_2y_2+x_3y_3 & \geq -x_2y_3\exp(\frac{x_3}{x_2}+\frac{y_2}{y_3}-1)+x_2y_2+x_3y_3 \\ & \geq -x_2y_3(\frac{x_3}{x_2}+\frac{y_2}{y_3})+x_2y_2+x_3y_3 = 0, \end{array}\end{split}\]

using the inequality \(\exp(t)\geq 1+t\). We refer to the literature for full proofs of all the statements.

Finally it is nice to realize that \((K^*)^*=K\) and that the cones \(\{0\}\) and \(\real\) are each others’ duals. We leave it to the reader to check these facts.

8.2 Infeasibility in conic optimization

We can now discuss infeasibility certificates for conic problems. Given an optimization problem, the first basic question we are interested in is its feasibility status. The theory of infeasibility certificates for linear problems, including the Farkas lemma (see Sec. 2.3 (Infeasibility in linear optimization)) extends almost verbatim to the conic case.

Lemma 8.3 Farkas' lemma, conic version

Suppose we have the conic optimization problem (8.1). Exactly one of the following statements is true:

  1. Problem (8.1) is feasible.
  2. Problem (8.1) is infeasible, but there is a sequence \(x_n\in K\) such that \(\|Ax_n-b\|\to 0\).
  3. There exists \(y\) such that \(-A^T y \in K^*\) and \(b^T y > 0\).

Problems which fall under option 2. (limit-feasible) are ill-posed: an arbitrarily small perturbation of input data puts the problem in either category 1. or 3. This fringe case should therefore not appear in practice, and if it does, it signals issues with the optimization model which should be addressed.

Example 8.2 Limit-feasible model

Here is an example of an ill-posed limit-feasible model created by fixing one of the root variables of a rotated quadratic cone to \(0\).

\[\begin{split}\begin{array}{ll} \mbox{minimize} & u \\ \mbox{subject to}& (u,v,w)\in\Q^3_r,\\ & v=0,\\ & w\geq 1. \end{array}\end{split}\]

The problem is clearly infeasible, but the sequence \((u_n,v_n,w_n)=(n,1/n,1)\in\Q_r^3\) with \((v_n,w_n)\to(0,1)\) makes it limit-feasible as in alternative 2. of Lemma 8.3. There is no infeasibility certificate as in alternative 3.

Having cleared this detail we continue with the proof and example for the actually useful part of conic Farkas’ lemma.

Proof

Consider the set

\[A(K) = \{Ax~:~x\in K\}.\]

It is a convex cone. Feasibility is equivalent to \(b\in A(K)\). If \(b\not\in A(K)\) but \(b\in\mathrm{cl}(A(K))\) then we have the second alternative. Finally, if \(b\not\in\mathrm{cl}(A(K))\) then the closed convex cone \(\mathrm{cl}(A(K))\) and the point \(b\) can be strictly separated by a hyperplane passing through the origin, i.e. there exists \(y\) such that

\[b^Ty > 0, \quad (Ax)^Ty \leq 0\ \forall x\in K.\]

But then \(0\leq -(Ax)^Ty=(-A^Ty)^Tx\) for all \(x\in K\), so by definition of \(K^*\) we have \(-A^Ty\in K^*\), and we showed \(y\) satisfies the third alternative. Finally, 1. and 3. are mutually exclusive, since otherwise we would have

\[0\leq (-A^Ty)^Tx=-y^TAx=-y^Tb<0\]

and the same argument works in the limit if \(x_n\) is a sequence as in 2. That proves the lemma.

Therefore Farkas’ lemma implies that (up to certain ill-posed cases) either the problem (8.1) is feasible (first alternative) or there is a certificate of infeasibility \(y\) (last alternative). In other words, every time we classify a model as infeasible, we can certify this fact by providing an appropriate \(y\). Note that when \(K=\real_+^n\) we recover precisely the linear version, Lemma 2.1.

Example 8.3 Infeasible conic problem

Consider a minimization problem:

\[\begin{split}\begin{array}{ll} \mbox{minimize} & x_1 \\ \mbox{subject to}& -x_1+x_2-x_4 = 1,\\ & 2x_3-3x_4 = -1,\\ & x_1 \geq \sqrt{x_2^2+x_3^2},\\ & x_4\geq 0. \end{array}\end{split}\]

It can be expressed in the standard form (8.1) with

\[\begin{split}A = \left[\begin{array}{cccc}-1&1&0&-1\\0&0&2&-3\end{array}\right],\ b = \left[\begin{array}{cc}1\\-1\end{array}\right],\ K = \Q^3\times \real_+,\ c=[1,0,0,0]^T.\end{split}\]

A certificate of infeasibility is \(y=[1,0]^T\). Indeed, \(b^Ty=1>0\) and \(-A^Ty=[1,-1,0,1]\in K^*=\Q^3\times \real_+\). The certificate indicates that the first linear constraint alone causes infeasibility, which is indeed the case: the first equality together with the conic constraints yield a contradiction:

\[x_2-1 \geq x_2-1-x_4 = x_1\geq |x_2|.\]

8.3 Lagrangian and the dual problem

Classical Lagrangian

In general constrained optimization we consider an optimization problem of the form

(8.3)\[\begin{split}\begin{array}{ll} \mbox{minimize} & f_0(x) \\ \mbox{subject to} & f(x) \leq 0,\\ & h(x) = 0, \end{array}\end{split}\]

where \(f_0:\R^n \mapsto \R\) is the objective function, \(f:\R^n \mapsto \R^m\) encodes inequality constraints, and \(h: \R^n \mapsto \R^p\) encodes equality constraints. Readers familiar with the method of Lagrange multipliers or penalization will recognize the Lagrangian for (8.3), a function \(L:\R^n\times \R^m \times \R^p \mapsto \R\) that augments the objective with a weighted combination of all the constraints,

(8.4)\[ L(x, y, s) = f_0(x) + y^T h(x) + s^T f(x).\]

The variables \(y\in \R^p\) and \(s\in \R_+^m\) are called Lagrange multipliers or dual variables. The Lagrangian has the property that \(L(x,y,s)\leq f_0(x)\) whenever \(x\) is feasible for (8.1) and \(s\in \R_+^m\). The optimal point satisfies the first-order optimality condition \(\nabla_xL(x,y,s)=0\). Moreover, the dual function \(g(y,s)=\inf_x L(x,y,s)\) provides a lower bound for the optimal value of (8.3) for any \(y\in \R^p\) and \(s\in \R_+^m\), which leads to considering the dual problem of maximizing \(g(y,s)\).

Lagrangian for a conic problem

We next set up an analogue of the Lagrangian theory for the conic problem (8.1)

\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^Tx \\ \mbox{subject to} & Ax=b,\\ & x\in K, \end{array}\end{split}\]

where \(x,c\in \R^n\), \(b\in\R^m\), \(A\in\R^{m\times n}\) and \(K\subseteq \R^n\) is a convex cone. We associate with (8.1) a Lagrangian of the form \(L:\R^n\times \R^m \times K^* \to \R\)

\[L(x, y, s) = c^Tx + y^T(b-Ax) - s^T x.\]

For any feasible \(x^*\in{\cal F}_p\) and any \((y^*,s^*)\in\real^m\times K^*\) we have

(8.5)\[L(x^*, y^*, s^*)=c^Tx^*+(y^*)^T\cdot 0 - (s^*)^Tx^*\leq c^T x^*.\]

Note the we used the definition of the dual cone to conclude that \((s^*)^Tx^*\geq 0\). The dual function is defined as the minimum of \(L(x,y,s)\) over \(x\). Thus the dual function of (8.1) is

\[\begin{split}g(y, s) = \min_x L(x,y,s) = \min_x x^T( c - A^T y - s ) + b^T y = \left \{ \begin{array}{ll} b^T y, & c - A^T y - s = 0\\ -\infty, & \mbox{otherwise}. \end{array} \right.\end{split}\]

Dual problem

From (8.5) every \((y,s)\in\R^m\times K^*\) satisfies \(g(y,s)\leq p^\star\), i.e. \(g(y,s)\) is a lower bound for \(p^\star\). To get the best such bound we maximize \(g(y,s)\) over all \((y,s)\in\R^m\times K^*\) and get the dual problem:

(8.6)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y \\ \mbox{subject to} & c-A^Ty=s,\\ & s \in K^*, \end{array}\end{split}\]

or simply:

(8.7)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y \\ \mbox{subject to} & c-A^Ty\in K^*. \end{array}\end{split}\]

The optimal value of (8.6) will be denoted \(d^\star\). As in the case of (8.1) (which from now on we call the primal problem), the dual problem can be infeasible (\(d^\star=-\infty\)), have an optimal solution (\(-\infty<d^\star<+\infty\)) or be unbounded (\(d^\star=+\infty\)). As before, the value \(d^\star\) is defined as a supremum of \(b^Ty\) and may not be attained. Note that the roles of \(-\infty\) and \(+\infty\) are now reversed because the dual is a maximization problem.

Example 8.4 More general constraints

We can just as easily derive the dual of a problem with more general constraints, without necessarily having to transform the problem to standard form beforehand. Imagine, for example, that some solver accepts conic problems of the form:

(8.8)\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^Tx +c^f \\ \mbox{subject to} & l^c \leq Ax \leq u^c, \\ & l^x \leq x \leq u^x, \\ & x\in K. \end{array}\end{split}\]

Then we define a Lagrangian with one set of dual variables for each constraint appearing in the problem:

\[\begin{split}\begin{array}{rl} L(x, s_l^c, s_u^c, s_l^x, s_u^x, s_n^x) = & c^Tx+c^f-(s_l^c)^T(Ax-l^c)-(s_u^c)^T(u^c-Ax) \\ & -(s_l^x)^T(x-l^x)-(s_u^x)^T(u^x-x)-(s_n^x)^Tx \\ = & x^T(c-A^Ts_l^c+A^Ts_u^c-s_l^x+s_u^x-s_n^x) \\ & + (l^c)^Ts_l^c - (u^c)^Ts_u^c+(l^x)^Ts_l^x-(u^c)^Ts_u^x+c^f \end{array}\end{split}\]

and that gives a dual problem

\[\begin{split}\begin{array}{ll} \mbox{maximize} & (l^c)^Ts_l^c - (u^c)^Ts_u^c+(l^x)^Ts_l^x-(u^c)^Ts_u^x+c^f \\ \mbox{subject to} & c+A^T(-s_l^c+s_u^c)-s_l^x+s_u^x-s_n^x=0, \\ & s_l^c,s_u^c,s_l^x,s_u^x\geq 0, \\ & s_n^x\in K^*. \end{array}\end{split}\]
Example 8.5 Dual of simple portfolio

Consider a simplified version of the portfolio optimization problem, where we maximize expected return subject to an upper bound on the risk and no other constraints:

\[\begin{split}\begin{array}{ll} \mbox{maximize} & \mu^T x \\ \mbox{subject to} & \|Fx\|_2\leq \gamma, \end{array}\end{split}\]

where \(x\in\real^n\) and \(F\in\real^{m\times n}\). The conic formulation is

(8.9)\[\begin{split}\begin{array}{ll} \mbox{maximize} & \mu^T x \\ \mbox{subject to} & (\gamma, Fx)\in\Q^{m+1}, \end{array}\end{split}\]

and we can directly write the Lagrangian

\[L(x,v,w) = \mu^Tx+v\gamma+w^TFx = x^T(\mu+F^Tw) + v\gamma\]

with \((v,w)\in(\Q^{m+1})^*=\Q^{m+1}\). Note that we chose signs to have \(L(x,v,w)\geq \mu^Tx\) since we are dealing with a maximization problem. The dual is now determined by the conditions \(F^Tw+\mu=0\) and \(v\geq\|w\|_2\), so it can be formulated as

(8.10)\[\begin{split}\begin{array}{ll} \mbox{minimize} & \gamma\|w\|_2 \\ \mbox{subject to} & F^Tw=-\mu. \end{array}\end{split}\]

Note that it is actually more natural to view problem (8.9) as the dual form and problem (8.10) as the primal. Indeed we can write the constraint in (8.9) as

\[\begin{split}\left[\begin{array}{cc}\gamma\\0\end{array}\right]-\left[\begin{array}{cc}0\\-F\end{array}\right]x\in(Q^{m+1})^*\end{split}\]

which fits naturally into the form (8.7). Having done this we can recover the dual as a minimization problem in the standard form (8.1). We leave it to the reader to check that we get the same answer as above.

8.4 Weak and strong duality

Weak duality

Suppose \(x^*\) and \((y^*, s^*)\) are feasible points for the primal and dual problems (8.1) and (8.6), respectively. Then we have

(8.11)\[b^Ty^*=(Ax^*)^Ty^*=(x^*)^T(A^Ty^*)=(x^*)^T(c-s^*)=c^Tx^*-(s^*)^Tx^* \leq c^Tx^*\]

so the dual objective value is a lower bound on the objective value of the primal. In particular, any dual feasible point \((y^*, s^*)\) gives a lower bound:

\[b^Ty^* \leq p^\star\]

and we immediately get the next lemma.

Lemma 8.4 Weak duality
\(d^\star\leq p^\star\).

It follows that if \(b^Ty^* = c^Tx^*\) then \(x^*\) is optimal for the primal, \((y^*,s^*)\) is optimal for the dual and \(b^Ty^* = c^Tx^*\) is the common optimal objective value. This way we can use the optimal dual solution to certify optimality of the primal solution and vice versa.

Complementary slackness

Moreover, (8.11) asserts that \(b^Ty^*=c^Tx^*\) is equivalent to orthogonality

\[(s^*)^Tx^*=0\]

i.e. complementary slackness. It is not hard to verify what complementary slackness means for particular types of cones, for example

  • for \(s,x\in\real_+\) we have \(sx=0\) iff \(s=0\) or \(x=0\),
  • vectors \((s_1,\tilde{s}), (x_1,\tilde{x}) \in \Q^{n+1}\) are orthogonal iff \((s_1,\tilde{s})\) and \((x_1,-\tilde{x})\) are parallel,
  • vectors \((s_1,s_2,\tilde{s}), (x_1,x_2,\tilde{x}) \in \Qr^{n+2}\) are orthogonal iff \((s_1,s_2,\tilde{s})\) and \((x_2,x_1,-\tilde{x})\) are parallel.

One implicit case is worth special attention: complementary slackness for a linear inequality constraint \(a^Tx\leq b\) with a non-negative dual variable \(y\) asserts that \((a^Tx^*-b)y^*=0\). This can be seen by directly writing down the appropriate Lagrangian for this type of constraint. Alternatively, we can introduce a slack variable \(u=b-a^Tx\) with a conic constraint \(u\geq0\) and let \(y\) be the dual conic variable. In particular, if a constraint is non-binding in the optimal solution (\(a^Tx^*<b\)) then the corresponding dual variable \(y^*=0\). If \(y^*>0\) then it can be related to a shadow price, see Sec. 2.4.4 (Dual values as shadow prices) and Sec. 8.5.2 (Constraint attribution).

Strong duality

The obvious question is now whether \(d^\star=p^\star\), that is if optimality of a primal solution can always be certified by a dual solution with matching objective value, as for linear programming. This turns out not to be the case for general conic problems.

Example 8.6 Positive duality gap

Consider the problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & x_3 \\ \mbox{subject to} & x_1\geq \sqrt{x_2^2+x_3^2},\\ & x_2\geq x_1,\ x_3\geq -1. \end{array}\end{split}\]

The only feasible points are \((x,x,0)\), so \(p^\star=0\). The dual problem is

\[\begin{split}\begin{array}{ll} \mbox{maximize} & -y_2 \\ \mbox{subject to} & y_1\geq \sqrt{y_1^2+(1-y_2)^2}, \end{array}\end{split}\]

with feasible points \((y,1)\), hence \(d^\star=-1\).

Similarly, we consider a problem

\[\begin{split}\begin{array}{ll} \mbox{minimize}& x_1\\ \mbox{subject to}& \left[ \begin{array}{ccc} 0 & x_1 & 0 \\ x_1 & x_2 & 0\\ 0 & 0 & 1+x_1 \end{array} \right]\in \PSD^3, \end{array}\end{split}\]

with feasible set \(\{x_1=0, x_2\geq 0\}\) and optimal value \(p^\star=0\). The dual problem can be formulated as

\[\begin{split}\begin{array}{ll} \mbox{maximize}& -z_2\\ \mbox{subject to}& \left[ \begin{array}{ccc} z_1 & (1-z_2)/2 & 0 \\ (1-z_2)/2 & 0 & 0\\ 0 & 0 & z_2 \end{array} \right]\in \PSD^3, \end{array}\end{split}\]

which has a feasible set \(\{z_1\geq 0, z_2=1\}\) and dual optimal value \(d^\star=-1\).

To ensure strong duality for conic problems we need an additional regularity assumption. As with the conic version of Farkas’ lemma Lemma 8.3, we stress that this is a technical condition to eliminate ill-posed problems which should not be formed in practice. In particular, we invite the reader to think that strong duality holds for all well formed conic problems one is likely to come across in applications, and that having a duality gap signals issues with the model formulation.

We say problem (8.1) is very nicely posed if for all values of \(c_0\) the feasibility problem

\[c^Tx = c_0,\ Ax=b,\ x\in K\]

satisfies either the first or third alternative in Lemma 8.3.

Lemma 8.5 Strong duality
Suppose that (8.1) is very nicely posed and \(p^\star\) is finite. Then \(d^\star=p^\star\).
Proof

For any \(\varepsilon>0\) consider the feasibility problem with variable \(x\) and constraints

\[\begin{split}\begin{array}{rcl} \left[\begin{array}{cc}-c^T \\ A\end{array}\right]x & = & \left[\begin{array}{cc}-p^\star+\varepsilon\\ b\end{array}\right] \\ x & \in & K \end{array} \quad \mbox{that is}\quad \begin{array}{rcl} \begin{array}{rcl}c^Tx&=&p^\star-\varepsilon, \\ Ax&=&b,\\ x & \in & K.\end{array} \\ \end{array}\end{split}\]

Optimality of \(p^\star\) implies that the above problem is infeasible. By Lemma 8.3 and because we assumed very-nice-posedness there exists \(\hat{y}=[y_0\ y]^T\) such that

\[\left[c,\ -A^T\right]\hat{y} \in K^* \ \mbox{and} \ \left[-p^\star+\varepsilon,\ b^T\right]\hat{y}>0.\]

If \(y_0=0\) then \(-A^Ty\in K^*\) and \(b^Ty>0\), which by Lemma 8.3 again would mean that the original problem was infeasible, which is not the case. Hence we can rescale so that \(y_0=1\) and then we get

\[c-A^Ty\in K^* \quad \mbox{and}\quad b^Ty\geq p^\star-\varepsilon.\]

The first constraint means that \(y\) is feasible for the dual problem. By letting \(\varepsilon\to 0\) we obtain \(d^\star\geq p^\star\).

There are more direct conditions which guarantee strong duality, such as below.

Lemma 8.6 Slater constraint qualification
Suppose that (8.1) is strictly feasible: there exists a point \(x\in\mathrm{int}(K)\) in the interior of \(K\) such that \(Ax=b\). Then strong duality holds if \(p^\star\) is finite. Moreover, if both primal and dual problem are strictly feasible then \(p^\star\) and \(d^\star\) are attained.

We omit the proof which can be found in standard texts. Note that the first problem from Example 8.6 does not satisfy Slater constraint qualification: the only feasible points lie on the boundary of the cone (we say the problem has no interior). That problem is not very nicely posed either: the point \((x_1,x_2,x_3)=(0.5c_0^2\varepsilon^{-1}+\varepsilon, 0.5c_0^2\varepsilon^{-1}, c_0)\in\Q^3\) violates the inequality \(x_2\geq x_1\) by an arbitrarily small \(\varepsilon\), so the problem is infeasible but limit-feasible (second alternative in Lemma 8.3).

8.5 Applications of conic duality

8.5.1 Linear regression and the normal equation

Least-squares linear regression is the problem of minimizing \(\|Ax-b\|_2^2\) over \(x\in\real^n\), where \(A\in\real^{m\times n}\) and \(b\in\real^m\) are fixed. This problem can be posed in conic form as

\[\begin{split}\begin{array}{ll} \mbox{minimize} & t \\ \mbox{subject to} & (t,Ax-b)\in \Q^{m+1}, \end{array}\end{split}\]

and we can write the Lagrangian

\[L(t,x,u,v) = t-tu-v^T(Ax-b) = t(1-u)-x^TA^Tv+b^Tv\]

so the constraints in the dual problem are:

\[u=1,\ A^Tv = 0,\ (u,v)\in Q^{m+1}.\]

The problem exhibits strong duality with both the primal and dual values attained in the optimal solution. The primal solution clearly satisfies \(t=\|Ax-b\|_2\), and so complementary slackness for the quadratic cone implies that the vectors \((u,-v)=(1,-v)\) and \((t,Ax-b)\) are parallel. As a consequence the constraint \(A^Tv=0\) becomes \(A^T(Ax-b)=0\) or simply

\[A^TAx=A^Tb\]

so if \(A^TA\) is invertible then \(x=(A^TA)^{-1}A^Tb\). This is the so-called normal equation for least-squares regression, which we now obtained as a consequence of strong duality.

8.5.2 Constraint attribution

Consider again a portfolio optimization problem with mean-variance utility function, vector of expected returns \(\alpha\) and covariance matrix \(\Sigma=F^TF\):

(8.12)\[\begin{split}\begin{array}{ll} \mbox{maximize} & \alpha^Tx-\frac12 cx^T\Sigma x\\ \mbox{subject to}& Ax\leq b, \end{array}\end{split}\]

where the linear part represents any set of additional constraints: total budget, sector constraints, diversification constraints, individual relations between positions etc. In the absence of additional constraints the solution to the unconstrained maximization problem is easy to derive using basic calculus and equals

\[\hat{x} = c^{-1}\Sigma^{-1}\alpha.\]

We would like to understand the difference \(x^*-\hat{x}\), where \(x^*\) is the solution of (8.12), and in particular to measure which of the linear constraints actually cause \(x^*\) to deviate from \(\hat{x}\) and to what degree. This can be quantified using the dual variables.

The conic version of problem (8.12) is

\[\begin{split}\begin{array}{ll} \mbox{maximize} & \alpha^Tx-cr\\ \mbox{subject to}& Ax\leq b,\\ & (1, r, Fx)\in \Qr \end{array}\end{split}\]

with dual

\[\begin{split}\begin{array}{ll} \mbox{minimize} & b^Ty+s\\ \mbox{subject to}& A^Ty=\alpha+F^Tu,\\ & (s, c, u)\in \Qr,\\ & y\geq 0. \end{array}\end{split}\]

Suppose we have a primal-dual optimal solution \((x^*,r^*,y^*,s^*,u^*)\). Complementary slackness for the rotated quadratic cone implies

\[(s^*, c, u^*) = \beta(r^*, 1, -Fx^*)\]

which leads to \(\beta=c\) and

\[A^Ty^* = \alpha-cF^TFx^*\]

or equivalently

\[x^* = \hat{x} - c^{-1}\Sigma^{-1}A^Ty^*.\]

This equation splits the difference between the constrained and unconstrained solutions into contributions from individual constraints, where the weights are precisely the dual variables \(y^*\). For example, if a constraint is not binding (\(a_i^Tx^*-b_i<0\)) then by complementary slackness \(y_i^*=0\) and, indeed, a non-binding constraint has no effect on the change in solution.

8.6 Semidefinite duality and LMIs

The general theory of conic duality applies in particular to problems with semidefinite variables so here we just state it in the language familiar to SDP practitioners. Consider for simplicity a primal semidefinite optimization problem with one matrix variable

(8.13)\[\begin{split}\begin{array}{ll} \mbox{minimize} & \langle C, X \rangle\\ \mbox{subject to} & \langle A_i, X \rangle = b_i, \quad i=1,\dots,m,\\ & X \in \PSD^n. \end{array}\end{split}\]

We can quickly repeat the derivation of the dual problem in this notation. The Lagrangian is

\[\begin{split}\begin{array}{ll} L(X,y,S) & = \langle C, X \rangle - \sum_i y_i(\langle A_i, X \rangle - b_i) - \langle S, X \rangle \\ & = \langle C-\sum_i y_iA_i-S, X\rangle + b^Ty \end{array}\end{split}\]

and we get the dual problem

(8.14)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y\\ \mbox{subject to} & C - \sum_{i=1}^m y_i A_i \in \PSD^n.\\ \end{array}\end{split}\]

The dual contains an affine matrix-valued function with coefficients \(C,A_i\in \mathcal{S}^n\) and variable \(y\in \R^m\). Such a matrix-valued affine inequality is called a linear matrix inequality (LMI). In Sec. 6 (Semidefinite optimization) we formulated many problems as LMIs, that is in the form more naturally matching the dual.

From a modeling perspective it does not matter whether constraints are given as linear matrix inequalities or as an intersection of affine hyperplanes; one formulation is easily converted to other using auxiliary variables and constraints, and this transformation is often done transparently by optimization software. Nevertheless, it is instructive to study an explicit example of how to carry out this transformation. An linear matrix inequality

\[A_0 + x_1 A_1 + \dots + x_n A_n \succeq 0\]

where \(A_i\in\PSD^m\) is converted to a set of linear equality constraints using a slack variable

\[A_0 + x_1 A_1 + \dots + x_n A_n = S, \qquad S \succeq 0.\]

Apart from introducing an explicit semidefinite variable \(S\in\PSD^m\) we also added \(m(m+1)/2\) equality constraints. On the other hand, a semidefinite variable \(X\in\PSD^n\) can be rewritten as a linear matrix inequality with \(n(n+1)/2\) scalar variables

\[X = \sum_{i=1}^n e_ie_i^T x_{ii} + \sum_{i=1}^n\sum_{j=i+1}^n (e_ie_j^T + e_je_i^T) x_{ij} \succeq 0.\]

Obviously we should only use these transformations when necessary; if we have a problem that is more naturally interpreted in either primal or dual form, we should be careful to recognize that structure.

Example 8.7 Dualization and efficiency

Consider the problem:

\[\begin{split}\begin{array}{ll} \mbox{minimize} & e^T z \\ \mbox{subject to} & A + \Diag(z) = X\\ & X\succeq 0. \end{array}\end{split}\]

with the variables \(X\in\PSD^n\) and \(z\in \R^n\). This is a problem in primal form with \(n(n+1)/2\) equality constraints, but they are more naturally interpreted as a linear matrix inequality

\[A + \sum_i e_i e_i^T z_i \succeq 0.\]

The dual problem is

\[\begin{split}\begin{array}{ll} \mbox{maximize} & -\langle A, Z\rangle \\ \mbox{subject to} & \diag(Z) = e\\ & Z\succeq 0, \end{array}\end{split}\]

in the variable \(Z\in \PSD^n\). The dual problem has only \(n\) equality constraints, which is a vast improvement over the \(n(n+1)/2\) constraints in the primal problem.

Example 8.8 Sum of singular values revisited

In Sec. 6.2.3 (Singular value optimization), and specifically in (6.17), we expressed the problem of minimizing the sum of singular values of a nonsymmetric matrix \(X\). Problem (6.17) can be written as an LMI:

\[\begin{split}\begin{array}{ll} \mbox{maximize} & \sum_{i,j}X_{ij}z_{ij}\\ \mbox{subject to} & I - \sum_{i,j} z_{ij}\left[\begin{array}{cc}0 & e_je_i^T\\e_ie_j^T&0\end{array}\right] \succeq 0. \end{array}\end{split}\]

Treating this as the dual and going back to the primal form we get:

\[\begin{split}\begin{array}{ll} \mbox{minimize} & \trace(U)+\trace(V)\\ \mbox{subject to} & S = -\frac12 X,\\ & \left[\begin{array}{cc}U & S^T\\S & V\end{array}\right] \succeq 0, \end{array}\end{split}\]

which is equivalent to the claimed (6.18). The dual formulation has the advantage of being linear in \(X\).