# 8 Duality in conic optimization¶

In Sec. 2 (Linear optimization) we introduced duality and related concepts for linear optimization. Here we present a more general version of this theory for conic optimization and we illustrate it with examples. Although this chapter is self-contained, we recommend familiarity with Sec. 2 (Linear optimization), of which some of this material is a natural extension.

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problems.

From now on we consider a conic problem in standard form

\[\begin{array}{ll}
\mbox{minimize} & c^Tx \\
\mbox{subject to} & Ax = b, \\
& x\in K,
\end{array}\tag{8.1}\]

where \(K\) is a convex cone. In practice \(K\) would likely be defined as a product

\[K = K_1\times\cdots\times K_r\]

of smaller cones corresponding to actual constraints in the problem formulation. The abstract formulation (8.1) encompasses such special cases as \(K=\real_+^n\) (linear optimization), \(K_i=\real\) (unconstrained variable) or \(K_i=\PSD^n\) (\(\frac12n(n+1)\) variables representing the upper-triangular part of a matrix).

The *feasible set* for (8.1)

\[{\cal F}_p = \{x\in K ~:~ Ax=b\}\]

is a section of \(K\). We say the problem is *feasible* if \({\cal F}_p\neq\emptyset\) and *infeasible* otherwise. The *value* of (8.1) is defined as

\[p^\star = \inf\{c^Tx ~:~ x\in{\cal F}_p\},\]

allowing the special cases of \(p^\star=+\infty\) (the problem is infeasible) and \(p^\star=-\infty\) (the problem is unbounded). Note that \(p^\star\) may not be attained, i.e. the infimum is not necessarily a minimum, although such problems are ill-posed from a practical point of view (see Sec. 7 (Practical optimization)).

The conic quadratic problem

\[\begin{array}{ll}
\mbox{minimize} & x_1-x_2 \\
\mbox{subject to} & (x_1,x_2,x_3)\in\Q^3, \\
& x_3 = 1,
\end{array}\]

has \(p^\star=0\) but this optimal value is not attained by any point in \({\cal F}_p\): the objective \(\sqrt{x_2^2+1}-x_2\) is positive everywhere and tends to \(0\) as \(x_2\to\infty\).

## 8.1 Dual cone¶

Suppose \(K\subseteq \real^n\) is a closed convex cone. We define the *dual cone* \(K^*\) as

\[K^* = \{y\in\real^n ~:~ y^Tx\geq 0\ \ \forall x\in K\}.\]

For simplicity we write \(y^Tx\), denoting the standard Euclidean inner product, but everything in this section applies verbatim to the inner product of matrices in the semidefinite context. In other words \(K^*\) consists of vectors which form an acute (or right) angle with *every* vector in \(K\). It is easy to see that \(K^*\) is itself a convex cone. The main example, associated with linear optimization, is the dual of the positive orthant:

\[(\real_+^n)^* = \real_+^n.\]

This is obvious for \(n=1\), and then we use the simple fact

\[(K_1\times K_2)^* = K_1^*\times K_2^*,\]

which we invite the reader to prove.
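As a quick numerical sanity check of the self-duality of the positive orthant (a sketch, not a proof; the sampling routine and vector names below are our own illustration), one can test the defining inequality of the dual cone against sampled points of \(\real_+^n\):

```python
import numpy as np

rng = np.random.default_rng(0)

def in_orthant_dual(y, n_samples=1000):
    """Empirically test y in (R_+^n)^*: y^T x >= 0 for sampled x >= 0."""
    X = rng.random((n_samples, y.size))   # sample points in R_+^n
    return bool((X @ y >= 0).all())

y_nonneg = np.array([0.5, 0.0, 2.0])      # all entries >= 0: in the dual cone
y_mixed  = np.array([0.5, -0.1, 2.0])     # one negative entry: not in the dual

print(in_orthant_dual(y_nonneg))
# A negative entry y_i is exposed by the basis vector e_i in R_+^n:
e1 = np.array([0.0, 1.0, 0.0])
print(y_mixed @ e1)                       # negative inner product
```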

All cones studied in this cookbook can be explicitly dualized as we show next.

We have the following:

The quadratic, rotated quadratic and semidefinite cones are self-dual:

\[(\Q^n)^*=\Q^n,\quad (\Qr^n)^*=\Qr^n,\quad (\PSD^n)^*=\PSD^n.\]

The dual of the power cone (4.4) is

\[(\POW_n^{\alpha_1,\cdots,\alpha_m})^* = \left\lbrace y\in\real^n~:~ \left(\frac{y_1}{\alpha_1},\ldots,\frac{y_m}{\alpha_m},y_{m+1},\ldots,y_n\right)\in\POW_n^{\alpha_1,\cdots,\alpha_m}\right\rbrace.\]

The dual of the exponential cone (5.2) is the closure

\[(\EXP)^* = \mathrm{cl}\left\{y\in\real^3~:~ y_1\geq -y_3e^{y_2/y_3-1},\ y_1>0,\ y_3<0\right\}.\]

We only sketch some parts of the proof and indicate geometric intuitions. First, let us show \((\Q^n)^*=\Q^n\). For \((t,x),(s,y)\in\Q^n\) we have

\[ts + x^Ty \geq \|x\|_2\|y\|_2 + x^Ty \geq 0\]

by the Cauchy–Schwarz inequality \(|u^Tv|\leq \|u\|_2\|v\|_2\), therefore \(\Q^n\subseteq (\Q^n)^*\). Geometrically we are just saying that the quadratic cone \(\Q^n\) has a right angle at the apex, so any two vectors inside the cone form at most a right angle. Now suppose \((s,y)\) is a point outside \(\Q^n\). If \((-s,-y)\in \Q^n\) then \(\langle (s,y),(-s,-y)\rangle = -s^2-y^Ty<0\), and we have shown \((s,y)\not\in (\Q^n)^*\). Otherwise let \((t,x)=\mathrm{proj}_{\Q^n}(s,y)\) be the projection (see Sec. 7.5 (Distance to a cone)); note that \((s,y)\) and \((t,-x)\in\Q^n\) form an obtuse angle.
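The inclusion \(\Q^n\subseteq(\Q^n)^*\) argued above can also be checked empirically by sampling; a minimal numpy sketch (the sampling scheme and dimensions are our own choice):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_Qn(dim, n):
    """Sample points (t, x) in the quadratic cone t >= ||x||_2."""
    x = rng.standard_normal((n, dim - 1))
    t = np.linalg.norm(x, axis=1) + rng.random(n)   # nonnegative slack keeps us inside
    return np.column_stack([t, x])

P = sample_Qn(4, 200)
Q = sample_Qn(4, 200)
# Every pairwise inner product t*s + x^T y must be nonnegative.
gram = P @ Q.T
print(gram.min() >= -1e-9)
```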

For the semidefinite cone we use the property \(\langle X,Y\rangle\geq 0\) for \(X,Y\in\PSD^n\) (see Sec. 6.1.2 (Properties of semidefinite matrices)). Conversely, assume that \(Z\not \in \PSD^n\). Then there exists a \(w\) satisfying \(\langle ww^T, Z\rangle = w^T Z w <0\), so \(Z\not \in (\PSD^n)^*\).
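Both directions of the semidefinite argument above are easy to see numerically; a short sketch with our own sample matrices (the non-PSD witness \(Z\) is chosen by hand):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_psd(n):
    """Gram matrices B B^T are positive semidefinite."""
    B = rng.standard_normal((n, n))
    return B @ B.T

# <X, Y> = trace(X Y) >= 0 for PSD X, Y.
X, Y = random_psd(5), random_psd(5)
print(np.trace(X @ Y) >= 0)

# For Z not PSD, an eigenvector w of a negative eigenvalue certifies
# <w w^T, Z> = w^T Z w < 0, so Z is outside the dual cone as well.
Z = np.diag([1.0, -2.0, 3.0])
vals, vecs = np.linalg.eigh(Z)
w = vecs[:, 0]                    # eigenvector of the smallest eigenvalue (-2)
print(w @ Z @ w)                  # negative certificate
```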

As our last exercise let us check that vectors in \(\EXP\) and \((\EXP)^*\) form acute angles. By the definitions of \(\EXP\) and \((\EXP)^*\) we take \(x,y\) such that

\[x_1\geq x_2e^{x_3/x_2},\ x_2>0, \qquad y_1\geq -y_3e^{y_2/y_3-1},\ y_1>0,\ y_3<0,\]

and then we have

\[x^Ty = x_1y_1+x_2y_2+x_3y_3 \geq -x_2y_3e^{x_3/x_2+y_2/y_3-1}+x_2y_2+x_3y_3 \geq -x_2y_3\left(\frac{x_3}{x_2}+\frac{y_2}{y_3}\right)+x_2y_2+x_3y_3 = 0,\]

using the inequality \(\exp(t)\geq 1+t\). We refer to the literature for full proofs of all the statements.
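The acute-angle property of \(\EXP\) and \((\EXP)^*\) can be stress-tested numerically; a sketch with our own sampling routines (ratios are kept bounded purely for floating-point stability):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_exp_cone(n):
    """Points x with x1 >= x2*exp(x3/x2), x2 > 0."""
    x2 = rng.uniform(0.1, 1.1, n)
    x3 = x2 * rng.uniform(-2.0, 2.0, n)      # bounded ratio x3/x2
    x1 = x2 * np.exp(x3 / x2) + rng.random(n)
    return np.column_stack([x1, x2, x3])

def sample_exp_dual(n):
    """Points y with y1 >= -y3*exp(y2/y3 - 1), y1 > 0, y3 < 0."""
    y3 = -rng.uniform(0.1, 1.1, n)
    y2 = y3 * rng.uniform(-2.0, 2.0, n)      # bounded ratio y2/y3
    y1 = -y3 * np.exp(y2 / y3 - 1.0) + rng.random(n)
    return np.column_stack([y1, y2, y3])

X, Y = sample_exp_cone(200), sample_exp_dual(200)
print((X @ Y.T).min() >= -1e-9)              # all pairwise inner products x^T y >= 0
```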

Finally it is nice to realize that \((K^*)^*=K\) and that the cones \(\{0\}\) and \(\real\) are each other's duals. We leave it to the reader to check these facts.

## 8.2 Infeasibility in conic optimization¶

We can now discuss infeasibility certificates for conic problems. Given an optimization problem, the first basic question we are interested in is its feasibility status. The theory of infeasibility certificates for linear problems, including the Farkas lemma (see Sec. 2.3 (Infeasibility in linear optimization)), extends almost verbatim to the conic case.

**Lemma 8.3 (Farkas' lemma, conic version).** Suppose \(K\) is a closed convex cone. Then exactly one of the following alternatives holds:

1. the problem (8.1) is feasible, i.e. \({\cal F}_p\neq\emptyset\);
2. the problem (8.1) is infeasible, but there is a sequence \(x_n\in K\) such that \(\|Ax_n-b\|\to 0\) (the problem is *limit-feasible*);
3. there exists \(y\) such that \(-A^Ty\in K^*\) and \(b^Ty>0\).

Problems which fall under option 2. (*limit-feasible*) are ill-posed: an arbitrarily small perturbation of input data puts the problem in either category 1. or 3. This fringe case should therefore not appear in practice, and if it does, it signals issues with the optimization model which should be addressed.

Here is an example of an ill-posed limit-feasible model created by fixing one of the root variables of a rotated quadratic cone to \(0\):

\[(u,v,w)\in\Qr^3,\quad v=0,\ w=1.\]

The problem is clearly infeasible, but the sequence \((u_n,v_n,w_n)=(n,1/n,1)\in\Qr^3\) with \((v_n,w_n)\to(0,1)\) makes it limit-feasible as in alternative 2. of Lemma 8.3. There is no infeasibility certificate as in alternative 3.
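The behaviour of this sequence is easy to verify directly; a tiny sketch checking membership in \(\Qr^3\) (the membership test \(2uv\geq w^2\), \(u,v\geq 0\) follows the cone's definition):

```python
# The sequence (u_n, v_n, w_n) = (n, 1/n, 1) stays in Q_r^3, i.e. satisfies
# 2*u*v >= w^2 with u, v >= 0, while (v_n, w_n) -> (0, 1) approaches the
# infeasible fixed values v = 0, w = 1.
for n in [1, 10, 100, 1000]:
    u, v, w = float(n), 1.0 / n, 1.0
    assert 2.0 * u * v >= w * w and u >= 0 and v >= 0
    print(n, v)        # v shrinks toward the fixed value 0
```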

Having cleared up this detail, we continue with the proof and an example for the useful part of the conic Farkas' lemma.

Consider the set

\[A(K) = \{Ax ~:~ x\in K\}.\]

It is a convex cone. Feasibility of (8.1) is equivalent to \(b\in A(K)\). If \(b\not\in A(K)\) but \(b\in\mathrm{cl}(A(K))\) then we have the second alternative. Finally, if \(b\not\in\mathrm{cl}(A(K))\) then the *closed* convex cone \(\mathrm{cl}(A(K))\) and the point \(b\) can be strictly separated by a hyperplane passing through the origin, i.e. there exists \(y\) such that

\[b^Ty > 0 \quad \mbox{and} \quad (Ax)^Ty\leq 0\ \ \mbox{for all}\ x\in K.\]

But then \(0\leq -(Ax)^Ty=(-A^Ty)^Tx\) for all \(x\in K\), so by definition of \(K^*\) we have \(-A^Ty\in K^*\), and we have shown that \(y\) satisfies the third alternative. Finally, 1. and 3. are mutually exclusive, since otherwise we would have

\[0 < b^Ty = (Ax)^Ty = -\left((-A^Ty)^Tx\right) \leq 0,\]

and the same argument works in the limit if \(x_n\) is a sequence as in 2. That proves the lemma.

Therefore Farkas’ lemma implies that (up to certain ill-posed cases) either the problem (8.1) is feasible (first alternative) or there is a *certificate of infeasibility* \(y\) (last alternative). In other words, every time we classify a model as infeasible, we can certify this fact by providing an appropriate \(y\). Note that when \(K=\real_+^n\) we recover precisely the linear version, Lemma 2.1.

Consider a minimization problem:

It can be expressed in the standard form (8.1) with

A certificate of infeasibility is \(y=[1,0]^T\). Indeed, \(b^Ty=1>0\) and \(-A^Ty=[1,-1,0,1]\in K^*=\Q^3\times \real_+\). The certificate indicates that the first linear constraint alone causes infeasibility, which is indeed the case: the first equality together with the conic constraints yield a contradiction:

\[1 = -x_1+x_2-x_4 \leq -x_4 \leq 0,\]

since \(x_2\leq\sqrt{x_2^2+x_3^2}\leq x_1\) and \(x_4\geq 0\).
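Checking a certificate only requires verifying cone membership; a sketch using the certificate data quoted above (the split of \(-A^Ty\) into a \(\Q^3\) part and an \(\real_+\) part mirrors \(K^*\)):

```python
import numpy as np

# Certificate data from the example: y = [1, 0], b^T y = 1, and
# -A^T y = [1, -1, 0, 1], which must lie in K^* = Q^3 x R_+.
bTy = 1.0
s = np.array([1.0, -1.0, 0.0, 1.0])       # -A^T y, split as (s[0:3], s[3])

in_Q3 = s[0] >= np.linalg.norm(s[1:3])    # quadratic-cone part: 1 >= ||(-1, 0)||
in_Rplus = s[3] >= 0                      # linear part
print(bTy > 0 and in_Q3 and in_Rplus)     # a valid infeasibility certificate
```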

## 8.3 Lagrangian and the dual problem¶

Classical Lagrangian

In general constrained optimization we consider an optimization problem of the form

\[\begin{array}{ll}
\mbox{minimize} & f_0(x) \\
\mbox{subject to} & f(x)\leq 0, \\
& h(x)=0,
\end{array}\tag{8.3}\]

where \(f_0:\R^n \mapsto \R\) is the objective function, \(f:\R^n \mapsto \R^m\) encodes inequality constraints, and \(h: \R^n \mapsto \R^p\) encodes equality constraints. Readers familiar with the method of *Lagrange multipliers* or *penalization* will recognize the *Lagrangian* for (8.3), a function \(L:\R^n\times \R^p \times \R^m \mapsto \R\) that augments the objective with a weighted combination of all the constraints,

\[L(x,y,s) = f_0(x) + y^Th(x) + s^Tf(x).\]

The variables \(y\in \R^p\) and \(s\in \R_+^m\) are called *Lagrange multipliers* or *dual variables*. The Lagrangian has the property that \(L(x,y,s)\leq f_0(x)\) whenever \(x\) is feasible for (8.3) and \(s\in \R_+^m\). At an optimal point, with appropriate multipliers, the first-order optimality condition \(\nabla_xL(x,y,s)=0\) holds. Moreover, the dual function \(g(y,s)=\inf_x L(x,y,s)\) provides a lower bound for the optimal value of (8.3) for any \(y\in \R^p\) and \(s\in \R_+^m\), which leads to considering the dual problem of maximizing \(g(y,s)\).
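The lower-bound property of the dual function can be seen on a toy instance of our own (not from the text): minimize \((x-2)^2\) subject to \(x\leq 1\), with Lagrangian \(L(x,s)=(x-2)^2+s(x-1)\):

```python
import numpy as np

# Toy instance: f0(x) = (x-2)^2, f(x) = x - 1 <= 0, multiplier s >= 0.
xs = np.linspace(-10, 10, 200001)         # dense grid standing in for inf_x

def g(s):
    """Dual function: unconstrained minimum of the Lagrangian over x."""
    return ((xs - 2.0) ** 2 + s * (xs - 1.0)).min()

p_star = 1.0                              # primal optimum, attained at x = 1
svals = np.linspace(0.0, 4.0, 81)
gvals = np.array([g(s) for s in svals])

print((gvals <= p_star + 1e-6).all())     # every g(s) is a lower bound on p*
print(round(gvals.max(), 6))              # best bound ~ 1.0, attained at s = 2
```

Here the best lower bound equals the primal optimum, a preview of strong duality.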

Lagrangian for a conic problem

We next set up an analogue of the Lagrangian theory for the conic problem (8.1)

\[\begin{array}{ll}
\mbox{minimize} & c^Tx \\
\mbox{subject to} & Ax = b, \\
& x\in K,
\end{array}\]

where \(x,c\in \R^n\), \(b\in\R^m\), \(A\in\R^{m\times n}\) and \(K\subseteq \R^n\) is a convex cone. We associate with (8.1) a Lagrangian of the form \(L:\R^n\times \R^m \times K^* \to \R\)

\[L(x,y,s) = c^Tx + y^T(b-Ax) - s^Tx.\]

For any feasible \(x^*\in{\cal F}_p\) and any \((y^*,s^*)\in\real^m\times K^*\) we have

\[L(x^*,y^*,s^*) = c^Tx^* + (y^*)^T(b-Ax^*) - (s^*)^Tx^* = c^Tx^* - (s^*)^Tx^* \leq c^Tx^*.\tag{8.5}\]

Note that we used the definition of the dual cone to conclude that \((s^*)^Tx^*\geq 0\). The *dual function* is defined as the minimum of \(L(x,y,s)\) over \(x\). Thus the dual function of (8.1) is

\[g(y,s) = \min_x L(x,y,s) = \min_x\, (c-A^Ty-s)^Tx + b^Ty = \begin{cases} b^Ty, & c-A^Ty-s=0, \\ -\infty, & \mbox{otherwise}. \end{cases}\]
Dual problem

From (8.5) every \((y,s)\in\R^m\times K^*\) satisfies \(g(y,s)\leq p^\star\), i.e. \(g(y,s)\) is a lower bound for \(p^\star\). To get the best such bound we maximize \(g(y,s)\) over all \((y,s)\in\R^m\times K^*\) and get the *dual problem*:

\[\begin{array}{ll}
\mbox{maximize} & b^Ty \\
\mbox{subject to} & c-A^Ty = s, \\
& s\in K^*,
\end{array}\tag{8.6}\]

or simply:

\[\begin{array}{ll}
\mbox{maximize} & b^Ty \\
\mbox{subject to} & c-A^Ty\in K^*.
\end{array}\]
The optimal value of (8.6) will be denoted \(d^\star\). As in the case of (8.1) (which from now on we call the *primal problem*), the dual problem can be infeasible (\(d^\star=-\infty\)), have an optimal solution (\(-\infty<d^\star<+\infty\)) or be unbounded (\(d^\star=+\infty\)). As before, the value \(d^\star\) is defined as a supremum of \(b^Ty\) and may not be attained. Note that the roles of \(-\infty\) and \(+\infty\) are now reversed because the dual is a maximization problem.

We can just as easily derive the dual of a problem with more general constraints, without necessarily having to transform the problem to standard form beforehand. Imagine, for example, that some solver accepts conic problems of the form:

Then we define a Lagrangian with one set of dual variables for each constraint appearing in the problem:

and that gives a dual problem

Consider a simplified version of the portfolio optimization problem, where we maximize expected return subject to an upper bound \(\gamma\) on the risk and no other constraints:

\[\begin{array}{ll}
\mbox{maximize} & \mu^Tx \\
\mbox{subject to} & \|Fx\|_2\leq \gamma,
\end{array}\tag{8.9}\]

where \(x\in\real^n\) and \(F\in\real^{m\times n}\). The conic formulation is

\[\begin{array}{ll}
\mbox{maximize} & \mu^Tx \\
\mbox{subject to} & (\gamma,Fx)\in\Q^{m+1},
\end{array}\]

and we can directly write the Lagrangian

\[L(x,v,w) = \mu^Tx + v\gamma + w^TFx = (\mu+F^Tw)^Tx + \gamma v\]

with \((v,w)\in(\Q^{m+1})^*=\Q^{m+1}\). Note that we chose signs to have \(L(x,v,w)\geq \mu^Tx\) since we are dealing with a maximization problem. The dual is now determined by the conditions \(F^Tw+\mu=0\) and \(v\geq\|w\|_2\), so it can be formulated as

\[\begin{array}{ll}
\mbox{minimize} & \gamma\|w\|_2 \\
\mbox{subject to} & F^Tw=-\mu.
\end{array}\tag{8.10}\]
Note that it is actually more natural to view problem (8.9) as the dual form and problem (8.10) as the primal. Indeed we can write the constraint in (8.9) as

which fits naturally into the form (8.7). Having done this we can recover the dual as a minimization problem in the standard form (8.1). We leave it to the reader to check that we get the same answer as above.

## 8.4 Weak and strong duality¶

Weak duality

Suppose \(x^*\) and \((y^*, s^*)\) are feasible points for the primal and dual problems (8.1) and (8.6), respectively. Then we have

\[b^Ty^* = (Ax^*)^Ty^* = (x^*)^T(A^Ty^*) = (x^*)^T(c-s^*) = c^Tx^* - (s^*)^Tx^* \leq c^Tx^*,\tag{8.11}\]

so the dual objective value is a lower bound on the objective value of the primal. In particular, any dual feasible point \((y^*, s^*)\) gives a lower bound

\[b^Ty^*\leq p^\star,\]

and we immediately obtain weak duality: \(d^\star\leq p^\star\).

It follows that if \(b^Ty^* = c^Tx^*\) then \(x^*\) is optimal for the primal, \((y^*,s^*)\) is optimal for the dual and \(b^Ty^* = c^Tx^*\) is the common optimal objective value. This way we can use the optimal dual solution to certify optimality of the primal solution and vice versa.
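Such a certificate check needs no solver, only feasibility tests and a comparison of objective values; a sketch on a tiny LP of our own in the standard form (8.1) with \(K=\real_+^2\):

```python
import numpy as np

# Our own toy LP: minimize x1 + 2*x2 subject to x1 + x2 = 1, x >= 0.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

x_opt = np.array([1.0, 0.0])              # primal candidate
y_opt = np.array([1.0])                   # dual candidate for max b^T y, c - A^T y in K^*

primal_feasible = np.allclose(A @ x_opt, b) and (x_opt >= 0).all()
dual_feasible = ((c - A.T @ y_opt).ravel() >= 0).all()
print(primal_feasible, dual_feasible)
print(c @ x_opt, b @ y_opt)               # equal values certify optimality of both
```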

Complementary slackness

Moreover, (8.11) asserts that \(b^Ty^*=c^Tx^*\) is equivalent to orthogonality

\[(s^*)^Tx^* = 0,\]

i.e. *complementary slackness*. It is not hard to verify what complementary slackness means for particular types of cones, for example

- for \(s,x\in\real_+\) we have \(sx=0\) iff \(s=0\) or \(x=0\),
- vectors \((s_1,\tilde{s}), (x_1,\tilde{x}) \in \Q^{n+1}\) are orthogonal iff \((s_1,\tilde{s})\) and \((x_1,-\tilde{x})\) are parallel,
- vectors \((s_1,s_2,\tilde{s}), (x_1,x_2,\tilde{x}) \in \Qr^{n+2}\) are orthogonal iff \((s_1,s_2,\tilde{s})\) and \((x_2,x_1,-\tilde{x})\) are parallel.
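The second bullet can be illustrated numerically: take a boundary point \(s=(\|\tilde{s}\|,\tilde{s})\) of \(\Q^{n+1}\) and any \(x\) with \((x_1,-\tilde{x})\) parallel to \(s\); such pairs are exactly orthogonal (the data below are our own sample):

```python
import numpy as np

rng = np.random.default_rng(4)

# Boundary point s = (||s~||, s~) of Q^4.
s_tail = rng.standard_normal(3)
s = np.concatenate([[np.linalg.norm(s_tail)], s_tail])

lam = 2.5                                 # any positive scaling
x = lam * np.concatenate([[s[0]], -s_tail])   # (x1, x~) with (s1, s~) || (x1, -x~)

# s^T x = lam * (s1^2 - ||s~||^2) = 0 on the boundary.
print(abs(s @ x))
```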

One implicit case is worth special attention: complementary slackness for a linear inequality constraint \(a^Tx\leq b\) with a non-negative dual variable \(y\) asserts that \((a^Tx^*-b)y^*=0\). This can be seen by directly writing down the appropriate Lagrangian for this type of constraint. Alternatively, we can introduce a slack variable \(u=b-a^Tx\) with a conic constraint \(u\geq0\) and let \(y\) be the dual conic variable. In particular, if a constraint is non-binding in the optimal solution (\(a^Tx^*<b\)) then the corresponding dual variable \(y^*=0\). If \(y^*>0\) then it can be related to a shadow price, see Sec. 2.4.4 (Dual values as shadow prices) and Sec. 8.5.2 (Constraint attribution).

Strong duality

The obvious question is now whether \(d^\star=p^\star\), that is if optimality of a primal solution can always be certified by a dual solution with matching objective value, as for linear programming. This turns out not to be the case for general conic problems.

Consider the problem

The only feasible points are \((x,x,0)\), so \(p^\star=0\). The dual problem is

with feasible points \((y,1)\), hence \(d^\star=-1\).

Similarly, we consider a problem

with feasible set \(\{x_1=0, x_2\geq 0\}\) and optimal value \(p^\star=0\). The dual problem can be formulated as

which has a feasible set \(\{z_1\geq 0, z_2=1\}\) and dual optimal value \(d^\star=-1\).

To ensure strong duality for conic problems we need an additional regularity assumption. As with the conic version of Farkas' lemma, Lemma 8.3, we stress that this is a technical condition to eliminate ill-posed problems which should not be formed in practice. In particular, we invite the reader to assume that strong duality holds for all well-formed conic problems one is likely to come across in applications, and that a duality gap signals issues with the model formulation.

We say problem (8.1) is *very nicely posed* if for all values of \(c_0\) the feasibility problem

\[Ax=b,\quad c^Tx=c_0,\quad x\in K\]

satisfies either the first or third alternative in Lemma 8.3.

For any \(\varepsilon>0\) consider the feasibility problem with variable \(x\) and constraints

\[Ax = b,\quad c^Tx\leq p^\star-\varepsilon,\quad x\in K.\]

Optimality of \(p^\star\) implies that the above problem is infeasible. By Lemma 8.3 and because we assumed very-nice-posedness there exists \(\hat{y}=[y_0\ y]^T\) such that

\[-A^Ty + y_0c\in K^*,\quad b^Ty - y_0(p^\star-\varepsilon)>0,\quad y_0\geq 0.\]

If \(y_0=0\) then \(-A^Ty\in K^*\) and \(b^Ty>0\), which by Lemma 8.3 again would mean that the original problem was infeasible, which is not the case. Hence we can rescale so that \(y_0=1\) and then we get

\[c-A^Ty\in K^*,\quad b^Ty> p^\star-\varepsilon.\]

The first constraint means that \(y\) is feasible for the dual problem. By letting \(\varepsilon\to 0\) we obtain \(d^\star\geq p^\star\).

There are more direct conditions which guarantee strong duality, such as the following Slater-type condition. Suppose problem (8.1) is *strictly feasible*: there exists a point \(x\in\mathrm{int}(K)\) in the interior of \(K\) such that \(Ax=b\). Then strong duality holds if \(p^\star\) is finite. Moreover, if both the primal and dual problems are strictly feasible then \(p^\star\) and \(d^\star\) are attained.

We omit the proof, which can be found in standard texts. Note that the first problem from Example 8.6 does not satisfy the Slater constraint qualification: the only feasible points lie on the boundary of the cone (we say the problem has *no interior*). That problem is not very nicely posed either: the point \((x_1,x_2,x_3)=(0.5c_0^2\varepsilon^{-1}+\varepsilon, 0.5c_0^2\varepsilon^{-1}, c_0)\in\Q^3\) violates the inequality \(x_2\geq x_1\) by an arbitrarily small \(\varepsilon\), so the problem is infeasible but limit-feasible (second alternative in Lemma 8.3).

## 8.5 Applications of conic duality¶

### 8.5.1 Linear regression and the normal equation¶

Least-squares linear regression is the problem of minimizing \(\|Ax-b\|_2^2\) over \(x\in\real^n\), where \(A\in\real^{m\times n}\) and \(b\in\real^m\) are fixed. This problem can be posed in conic form as

\[\begin{array}{ll}
\mbox{minimize} & t \\
\mbox{subject to} & (t, Ax-b)\in\Q^{m+1},
\end{array}\]

and we can write the Lagrangian

\[L(t,x,u,v) = t - ut - v^T(Ax-b) = t(1-u) - v^TAx + v^Tb,\]

so the constraints in the dual problem are:

\[u = 1,\quad A^Tv = 0,\quad (u,v)\in\Q^{m+1}.\]

The problem exhibits strong duality with both the primal and dual values attained in the optimal solution. The primal solution clearly satisfies \(t=\|Ax-b\|_2\), and so complementary slackness for the quadratic cone implies that the vectors \((u,-v)=(1,-v)\) and \((t,Ax-b)\) are parallel. As a consequence the constraint \(A^Tv=0\) becomes \(A^T(Ax-b)=0\), or simply

\[A^TAx = A^Tb,\]
so if \(A^TA\) is invertible then \(x=(A^TA)^{-1}A^Tb\). This is the so-called *normal equation* for least-squares regression, which we now obtained as a consequence of strong duality.
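The normal equation is easy to verify against a generic least-squares solver; a short sketch on random data of our own:

```python
import numpy as np

rng = np.random.default_rng(5)

# min ||Ax - b||_2: the normal equation A^T A x = A^T b recovers the
# same solution as numpy's least-squares routine.
A = rng.standard_normal((20, 4))          # full column rank almost surely
b = rng.standard_normal(20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))
print(np.allclose(A.T @ (A @ x_normal - b), 0))   # residual orthogonal to range(A)
```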

### 8.5.2 Constraint attribution¶

Consider again a portfolio optimization problem with mean-variance utility function, vector of expected returns \(\alpha\) and covariance matrix \(\Sigma=F^TF\):

where the linear part represents any set of additional constraints: total budget, sector constraints, diversification constraints, individual relations between positions etc. In the absence of additional constraints the solution to the unconstrained maximization problem is easy to derive using basic calculus and equals

We would like to understand the difference \(x^*-\hat{x}\), where \(x^*\) is the solution of (8.12), and in particular to measure which of the linear constraints actually cause \(x^*\) to deviate from \(\hat{x}\) and to what degree. This can be quantified using the dual variables.
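The unconstrained optimum can be computed and checked with basic linear algebra; a sketch under an assumed mean-variance utility \(\alpha^Tx - \delta\, x^T\Sigma x\) (the exact scaling of the risk term in (8.12) may differ), whose stationarity condition \(\alpha - 2\delta\Sigma x = 0\) gives \(\hat{x}=\frac{1}{2\delta}\Sigma^{-1}\alpha\):

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed utility: maximize alpha^T x - delta * x^T Sigma x.
n, delta = 5, 0.7
F = rng.standard_normal((8, n))
Sigma = F.T @ F + 0.1 * np.eye(n)         # positive definite covariance
alpha = rng.standard_normal(n)

x_hat = np.linalg.solve(Sigma, alpha) / (2.0 * delta)
grad = alpha - 2.0 * delta * (Sigma @ x_hat)
print(np.allclose(grad, 0))               # x_hat satisfies the first-order condition
```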

The conic version of problem (8.12) is

with dual

Suppose we have a primal-dual optimal solution \((x^*,r^*,y^*,s^*,u^*)\). Complementary slackness for the rotated quadratic cone implies

which leads to \(\beta=c\) and

or equivalently

This equation splits the difference between the constrained and unconstrained solutions into contributions from individual constraints, where the weights are precisely the dual variables \(y^*\). For example, if a constraint is not binding (\(a_i^Tx^*-b_i<0\)) then by complementary slackness \(y_i^*=0\) and, indeed, a non-binding constraint has no effect on the change in solution.

## 8.6 Semidefinite duality and LMIs¶

The general theory of conic duality applies in particular to problems with semidefinite variables so here we just state it in the language familiar to SDP practitioners. Consider for simplicity a primal semidefinite optimization problem with one matrix variable

\[\begin{array}{ll}
\mbox{minimize} & \langle C,X\rangle \\
\mbox{subject to} & \langle A_i,X\rangle = b_i,\ i=1,\ldots,m, \\
& X\succeq 0.
\end{array}\]

We can quickly repeat the derivation of the dual problem in this notation. The Lagrangian is

\[L(X,y,S) = \langle C,X\rangle + \sum_{i=1}^m y_i(b_i-\langle A_i,X\rangle) - \langle S,X\rangle = \left\langle C-\sum_{i=1}^m y_iA_i-S,\,X\right\rangle + b^Ty,\]

and we get the dual problem

\[\begin{array}{ll}
\mbox{maximize} & b^Ty \\
\mbox{subject to} & C-\sum_{i=1}^m y_iA_i \succeq 0.
\end{array}\]
The dual contains an affine matrix-valued function with coefficients \(C,A_i\in \mathcal{S}^n\) and variable \(y\in \R^m\). Such a matrix-valued affine inequality is called a *linear matrix inequality (LMI)*. In Sec. 6 (Semidefinite optimization) we formulated many problems as LMIs, that is in the form more naturally matching the dual.

From a modeling perspective it does not matter whether constraints are given as linear matrix inequalities or as an intersection of affine hyperplanes; one formulation is easily converted to the other using auxiliary variables and constraints, and this transformation is often done transparently by optimization software. Nevertheless, it is instructive to study an explicit example of how to carry out this transformation. A linear matrix inequality

\[A_0 + x_1A_1 + \cdots + x_nA_n \succeq 0,\]

where \(A_i\in\mathcal{S}^m\), is converted to a set of linear equality constraints using a slack variable:

\[A_0 + x_1A_1 + \cdots + x_nA_n = S,\quad S\succeq 0.\]

Apart from introducing an explicit semidefinite variable \(S\in\PSD^m\) we also added \(m(m+1)/2\) equality constraints. On the other hand, a semidefinite variable \(X\in\PSD^n\) can be rewritten as a linear matrix inequality with \(n(n+1)/2\) scalar variables

\[X = \sum_{i\leq j} x_{ij}E_{ij} \succeq 0,\]

where \(E_{ii}=e_ie_i^T\) and \(E_{ij}=e_ie_j^T+e_je_i^T\) for \(i<j\) are basis matrices of \(\mathcal{S}^n\).
Obviously we should only use these transformations when necessary; if we have a problem that is more naturally interpreted in either primal or dual form, we should be careful to recognize that structure.
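The LMI-to-slack conversion above can be sketched numerically on toy data of our own: form the slack matrix \(S=A_0+\sum_i x_iA_i\) for a point satisfying the LMI, check that it is PSD, and count the equalities added:

```python
import numpy as np

rng = np.random.default_rng(7)

# A0 + x1*A1 + ... + xn*An >= 0 becomes A0 + sum_i x_i*A_i = S with S >= 0,
# i.e. m*(m+1)/2 scalar equalities, one per upper-triangular entry of S.
m, n = 4, 3

def sym(B):
    return (B + B.T) / 2

B = rng.standard_normal((m, m))
A0 = B @ B.T + np.eye(m)                  # PSD + I, so the LMI holds near x = 0
As = [sym(rng.standard_normal((m, m))) for _ in range(n)]

x = 0.01 * rng.standard_normal(n)         # a small point satisfying the LMI
S = A0 + sum(xi * Ai for xi, Ai in zip(x, As))    # the slack matrix

print(np.linalg.eigvalsh(S).min() > 0)    # S is PSD, so x satisfies the LMI
print(m * (m + 1) // 2)                   # number of equality constraints added
```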

Consider the problem:

with the variables \(X\in\PSD^n\) and \(z\in \R^n\). This is a problem in primal form with \(n(n+1)/2\) equality constraints, but they are more naturally interpreted as a linear matrix inequality

The dual problem is

in the variable \(Z\in \PSD^n\). The dual problem has only \(n\) equality constraints, which is a vast improvement over the \(n(n+1)/2\) constraints in the primal problem.

In Sec. 6.2.3 (Singular value optimization), and specifically in (6.17), we expressed the problem of minimizing the sum of singular values of a nonsymmetric matrix \(X\). Problem (6.17) can be written as an LMI:

Treating this as the dual and going back to the primal form we get:

which is equivalent to the claimed (6.18). The dual formulation has the advantage of being linear in \(X\).