9. Duality in conic optimization

9.1. Introduction

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues in linear (and convex) optimization. Furthermore, it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problem.

9.2. The Lagrangian

Duality theory is inherently connected with a (primal) optimization problem. Initially we consider an optimization problem in a general form

(1)\[\begin{split}\begin{array}{ll} \mbox{minimize}& f_0(x) \\ \mbox{subject to}& f(x) \leq 0\\ & h(x) = 0, \end{array}\end{split}\]

where \(f_0:\R^n \mapsto \R\) is the objective funtion, \(f:\R^n \mapsto \R^m\) encodes inequality constraints, and \(h: \R^n \mapsto \R^p\) encodes equality constraints. We will denote (1) the primal problem, and its optimal value by \(p^\star\). A general problem such as (1) is well-suited for our initial discussion of duality (especially weak duality); for our discussion of strong duality in Section 9.5 we only consider a particular class of linear optimization problems.

The Lagrangian for (1) is a function \(L:\R^n\times \R^m \times \R^p \mapsto \R\) that augments the objective with a weighted combination of all the constraints,

(2)\[ L(x, y, s) = f_0(x) + y^T h(x) + s^T f(x).\]

The variables \(y\in \R^p\) and \(s\in \R_+^m\) are called Lagrange multipliers or dual variables. Readers familiar with the method of penalization for enforcing constraints may recognize the Lagrangian as something similar; if we fix \(y\) and \(s\) large enough, then the minimizer of \(L(x)\) may approximately satisfy \(h(x)=0\) and \(f(x)\leq 0\). That is just a (very) simplifying interpretation, however. The main property of the Lagrangian is that it provides a lower bound on feasible solutions to (1); for a feasible \(x\) we have \(y^T h(x)=0\) and \(s^T f(x)\leq 0\) (since \(s\geq 0\)), so obviously \(L(x,y,s) \leq f_0(x)\).

9.3. The dual function and problem

The Lagrange dual function is defined as the minimum of the Lagrangian over \(x\),

\[g(y, s) = \inf_{x} L(x, y, s),\]

and where the Lagrangian is unbounded below in \(x\), we assign the dual function the value \(-\infty\). For fixed \((y,s)\) the dual function is an affine function, so we can think of \(g(y,s)\) as the pointwise infimum of infinitely many such affine functions (recall our discussion in Section 2.2.1 of convex piecewise-linear functions defined as the maximum of a set of affine function); thus \(g(y,s)\) is a concave function, even when the problem (1) is not convex.

The Lagrange dual problem is then obtained by maximizing \(g(x, y)\), i.e., is it defined as

(3)\[\begin{split}\begin{array}{ll} \mbox{maximize}& g(y, s)\\ \mbox{subject to}& s\geq 0, \end{array}\end{split}\]

which is then always a concave optimization problem.

9.4. Weak duality

We already established that

\[L(x,y,s)\leq f_0(x),\]

which is a global (and weak) inequality, i.e., it holds for all \((x, y, s)\). The Lagrange dual

\[g(y,s)=\inf_x L(x,y,s)\]

provides a bound on the optimal value \(p^\star=f_0(x^\star)\) by minimizing over \(x\). The best such lower bound is given by the dual problem (i.e., \(d^\star=\sup_{y,s} g(y,s)\)), and the relationship

\[p^\star \geq d^\star\]

is called the weak duality property, and holds for any kind of optimization problem with a well-defined Lagrangian function (including, e.g., nonconvex problems, but excluding general integer problems).

9.5. Strong duality

Strong duality is one the most important concepts in convex optimization. The problem (3) is generally not convex; for that we require that the objective function \(f_0(x)\) and the inequalities \(f_i(x), \: i=1,\dots,m\) are convex functions, and that the equality constraints are affine functions. Then a sufficient condition for strong duality is known as Slater’s constraint qualifications, which basically assumes that the existence of a strictly feasible point exists for either the primal or dual problem.

In this section we only show strong duality for the linear optimization problem. Since all linear optimization can be rewritten into this form, this incurs no loss of generality. For linear optimization strong duality is traditionally proved using ideas from the simplex algorithm; our proof is based on a more recent approach using separating hyperplanes and Farkas’ lemma, which are also important for the discussion of infeasibility in Section 2.4.

To prove strong duality we need two intermediate results. The first (which we state without proof) is a special case of a more general separating hyperplane theorem.

Theorem 5 Separating hyperplane theorem

Let \(S\) by a closed convex set, and \(b\notin S\). Then there exists a strictly separating hyperplane \(a\) such that

\[a^T b > a^T x, \quad \forall x \in S.\]

The geometric interpretation of the Separating hyperplane theorem is given in Fig. 9.1, and indicates that a constructive proof follows from projecting \(b\) onto \(S\).

_images/fig-hyperplane-new.png

Fig. 9.1 Separating hyperplane for a closed convex set \(S\) and a point \(b \not\in S\).

We next show an important result, which makes use of the separating hyperplane theorem.

Lemma 6 Farkas lemma

Given \(A\) and \(b\), exactly one of the two propositions is true:

  1. \(\exists x \: : \: Ax = b, \quad x\geq 0\),
  2. \(\exists y \: : \: A^Ty \leq 0, \quad b^T y > 0\).

Proof. Both cannot be true, because then

\[b^T y = x^T A^T y \leq 0,\]

which is a contradiction. Next assume that 1. is not true, i.e., \(b\notin S = \{Ax \: | \: x\geq 0 \}.\) The set \(S\) is a closed and convex, so there exists a separating hyperplane \(y\) (separating \(b\) and \(S\)) such that

\[y^T b > y^T A x, \quad \forall x\geq 0\]

But this implies that \(b^T y > 0\) and \(A^Ty\leq 0\).

We can now prove strong duality.

Proof Assume that the dual optimal value is attained (i.e., \(d^\star := b^T y^\star < \infty\)). Since \(s^\star = c - A^T y^\star\geq 0\) there can then be no \(z\) satisfying

(4)\[ b^T z > 0, \qquad A^T z \leq 0,\]

because otherwise

\[b^T(y^\star + z)>d^\star, \qquad c-A^T(y^\star + z) \geq 0\]

contradicting the optimality assumption on \(y^\star\). From Farkas’ lemma (4) is infeasible if and only if there exists \(x\) such that

\[Ax = b, \qquad x\geq 0.\]

For that particular choice of \(x\) we have

\[c^T x = x^T A^T y = b^T y^\star,\]

i.e., \(p^\star=d^\star\). On the other hand, assume that \(-\infty < p^\star < \infty\). From weak duality we then have \(d^\star < \infty\), and from Farkas’ lemma \(d^\star > -\infty\), i.e., the dual optimal value is attained. In other words, if either the primal or dual optimal values are attained, we have \(p^\star = d^\star\).

Farkas’ lemma is an example of theorems of alternatives, where exactly one of two systems have a solution, and we can also formulate a dual variant.

Lemma 7 Dual variant of Farkas’ lemma

Given \(A\) and \(c\), exactly one of the two propositions is true:

  1. \(\exists x \: : \: Ax = 0, \quad x\geq 0, \quad c^Tx < 0\),
  2. \(\exists y \: : \: c-A^Ty \geq 0\).

Proof. Both cannot be true, because then

\[c^T x = x^T(c - A^Ty) \geq 0.\]

Next assume that 2. is not true, i.e., \(\forall y \: : \: c < A^Ty\). But then for \(x\geq 0\) we have

\[c^T x < x^T A^T y, \quad \forall y\]

implying that \(c^Tx<0\) and \(Ax=0\).