4. Semidefinite optimization

4.1. Introduction

In this chapter we extend the conic optimization framework from Chap. 2 and 3 with symmetric positive semidefinite matrix variables.

4.1.1. Semidefinite matrices and cones

A symmetric matrix \(X\in \mathcal{S}^n\) is called symmetric positive semidefinite if

\[z^T X z \geq 0, \quad \forall z\in\mathbb{R}^n.\]

We then define the cone of symmetric positive semidefinite matrices as

(1)\[\PSD^n = \{ X \in \mathcal{S}^n \mid z^T X z \geq 0, \: \forall z\in \mathbb{R}^n \}.\]

For brevity we will often use the shorter term semidefinite instead of symmetric positive semidefinite, and we will write \(X\succeq Y\) (\(X \preceq Y\)) as shorthand notation for \((X-Y)\in \PSD^n\) (\((Y-X)\in \PSD^n\)). As inner product for semidefinite matrices, we use the standard trace inner product for general matrices, i.e.,

\[\langle A, B \rangle := \mathrm{tr}(A^T B) = \sum_{ij}a_{ij}b_{ij}.\]

It is easy to see that (1) indeed specifies a convex cone; it is pointed (with vertex \(X=0\)), and \(X,Y\in \PSD^n\) implies that \((\alpha X + \beta Y)\in \PSD^n\) for all \(\alpha,\beta \geq 0\). Let us review a few equivalent definitions of \(\PSD^n\). It is well-known that every symmetric matrix \(A\) has a spectral factorization

\[A = \sum_{i=1}^n \lambda_i q_i q_i^T,\]

where \(q_i \in \mathbb{R}^n\) are the (orthonormal) eigenvectors and \(\lambda_i\) are the eigenvalues of \(A\). Using the spectral factorization of \(A\) we have

\[x^T A x = \sum_{i=1}^n \lambda_i (x^T q_i)^2,\]

which shows that \(x^T A x \geq 0 \: \Leftrightarrow \: \lambda_i \geq 0, \: i=1,\dots,n\). In other words,

(2)\[\PSD^n = \{ X \in \Symm^n \mid \lambda_i(X)\geq 0, \: i=1,\dots,n \}.\]

Another useful characterization is that \(A\in \PSD^n\) if and only if it is a Grammian matrix \(A=V^T V\). Using the Grammian representation we have

\[x^T A x = x^T V^T V x = \|V x\|_2^2,\]

i.e., if \(A=V^TV\) then \(x^T A x \geq 0, \: \forall x\). On the other hand, from the positive spectral factorization \(A=Q\Lambda Q^T\) we have \(A = V^T V\) with \(V=\Lambda^{1/2} Q^T\), where \(\Lambda^{1/2}=\mathbf{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})\). We thus have the equivalent characterization

(3)\[\PSD^n = \{ X \in \Symm^n \mid X = V^T V \text{ for some } V\in\R^{n\times n}\}.\]

In a completely analogous way we define the cone of symmetric positive definite matrices as

\[\begin{split}\begin{array}{cl} \PD^n & = \{ X \in \Symm^n \mid z^T X z > 0, \: \forall z\in \R^n \} \\ & = \{ X \in \Symm^n \mid \lambda_i(X) > 0, \: i=1,\dots,n \} \\ & = \{ X \in \PSD^n \mid X = V^T V \text{ for some } V\in\R^{n\times n},\: \mathbf{rank}(V)=n\}, \end{array}\end{split}\]

and we write \(X\succ Y\) (\(X\prec Y\)) as shorthand notation for \((X-Y)\in \PD^n\) (\((Y-X)\in \PD^n\)).
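As a quick numerical sanity check of the equivalence between (1), (2) and (3), the following small sketch (using Python with numpy, which we assume is available; it is not part of the modeling framework itself) builds a Grammian matrix and verifies nonnegativity of its eigenvalues and of a sampled quadratic form.

```python
import numpy as np

# Sanity check of the equivalent characterizations (1)-(3) of the semidefinite cone.
rng = np.random.default_rng(0)
V = rng.standard_normal((4, 4))
A = V.T @ V                       # Grammian matrix, PSD by characterization (3)

eigvals = np.linalg.eigvalsh(A)   # spectral characterization (2)
assert np.all(eigvals >= -1e-12)

x = rng.standard_normal(4)
assert x @ A @ x >= -1e-12        # definition (1), sampled in one direction x
```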

The one dimensional cone \(\PSD^1\) simply corresponds to \(\R_+\). Similarly consider

\[\begin{split}X = \left[\begin{array}{cc} x_1 & x_3\\ x_3 & x_2 \end{array} \right]\end{split}\]

with determinant \(\det(X)=x_1 x_2 - x_3^2 = \lambda_1\lambda_2\) and trace \(\mathrm{tr}(X)=x_1 + x_2 = \lambda_1 + \lambda_2\). Therefore \(X\) has nonnegative eigenvalues if and only if

\[x_1x_2 \geq x_3^2, \quad x_1, x_2\geq 0,\]

which characterizes a three dimensional scaled rotated cone, i.e.,

\[\begin{split}\left[\begin{array}{cc} x_1 & x_3\\ x_3 & x_2 \end{array} \right] \in \PSD^2 \quad \Longleftrightarrow \quad (x_1,x_2,x_3\sqrt{2})\in \mathcal{Q}_r^3.\end{split}\]

More generally we have

\[x\in \R^n_+ \quad \Longleftrightarrow \quad \mathbf{Diag}(x) \in \PSD^n\]

and

\[\begin{split}(t,x)\in \mathcal{Q}^{n+1} \quad \Longleftrightarrow \quad \left[\begin{array}{cc} t & x^T \\ x & t I \end{array}\right] \in \PSD^{n+1},\end{split}\]

where the latter equivalence follows immediately from Lemma 14. Thus both the linear and the conic quadratic cone are embedded in the semidefinite cone. In practice, however, linear and conic quadratic constraints should never be modeled using semidefinite cones, as this would result in a large performance penalty by squaring the number of variables.

Example 8

As a more interesting example, consider the symmetric matrix

(4)\[\begin{split}A(x,y,z) = \left[ \begin{array}{ccc} 1 & x & y\\ x & 1 & z\\ y & z & 1 \end{array} \right]\end{split}\]

parametrized by \((x,y,z)\). The set

\[S = \{ (x,y,z) \in \R^3 \mid A(x,y,z) \in \PSD^3 \},\]

(shown in Fig. 4.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite representable set, which cannot be represented using (finitely many) linear or conic quadratic cones. To gain a geometric intuition of \(S\), we note that

\[\det(A(x,y,z)) = -(x^2 + y^2 + z^2 - 2xyz - 1),\]

so the boundary of \(S\) can be characterized as

\[x^2 + y^2 + z^2 - 2xyz = 1,\]

or equivalently as

\[\begin{split}\left[\begin{array}{c} x\\y \end{array}\right]^T \left[\begin{array}{rr} 1 & -z\\-z &1 \end{array}\right] \left[\begin{array}{c} x\\y \end{array}\right] = 1 - z^2.\end{split}\]

For \(z=0\) this describes a circle in the \((x,y)\)-plane, and for fixed \(-1< z < 1\) it characterizes an ellipse.
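Membership in \(S\) is easy to test numerically via the eigenvalue characterization (2); the following sketch (assuming numpy, with a few arbitrary test points) illustrates this.

```python
import numpy as np

def in_spectrahedron(x, y, z, tol=1e-9):
    """Test (x, y, z) in S by checking that A(x, y, z) from (4) is PSD."""
    A = np.array([[1.0, x, y],
                  [x, 1.0, z],
                  [y, z, 1.0]])
    return np.linalg.eigvalsh(A).min() >= -tol

print(in_spectrahedron(0.0, 0.0, 0.0))   # True: A is the identity matrix
print(in_spectrahedron(1.0, 1.0, 1.0))   # True: a rank-1 boundary point
print(in_spectrahedron(0.9, 0.9, -0.9))  # False: det(A) < 0, so A is not PSD
```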

_images/3dset.png

Fig. 4.1 Plot of spectrahedron \(S=\{ (x,y,z) \in \R^3 \mid A(x,y,z)\succeq 0 \}\).

4.1.2. Properties of semidefinite matrices

Many useful properties of (semi)definite matrices follow directly from the definitions (1)-(3) and their definite counterparts.

  • The diagonal elements of \(A\in\PSD^n\) are nonnegative. Let \(e_i\) denote the \(i\)th standard basis vector (i.e., \([e_i]_j=0,\: j\neq i\), \([e_i]_i=1\)). Then \(A_{ii}=e_i^T A e_i\), so (1) implies that \(A_{ii}\geq 0\).

  • A block-diagonal matrix \(A=\mathbf{diag}(A_1,\dots A_p)\) is (semi)definite if and only if each diagonal block \(A_i\) is (semi)definite.

  • Consider a quadratic transformation \(M:=B^T A B\) with \(A\succ 0\). Then \(M\succ 0\) if and only if \(B\) has full column rank; this follows directly from the Grammian characterization \(M=(VB)^T(VB)\) with \(A=V^TV\). Similarly, \(A\succeq 0\) implies \(M\succeq 0\) for any \(B\). As an example, if \(A\) is (semi)definite then so is any symmetric permutation \(P^T A P\).

  • Any principal submatrix of \(A\in \PD^n\) (\(A\in\PSD^n\)) is positive (semi)definite; this follows from the Grammian characterization \(A=V^T V\), since a principal submatrix is the Grammian \(V_I^T V_I\) of the corresponding subset of columns \(V_I\) of \(V\) (which are linearly independent when \(V\) is nonsingular).

  • The inner product of positive (semi)definite matrices is positive (nonnegative). For any \(A,B\in \PD^n\) let \(A=U^TU\) and \(B=V^T V\) where \(U\) and \(V\) have full rank. Then

    \[\langle A, B \rangle = \mathrm{tr}(U^T U V^T V) = \|U V^T \|_F^2 > 0,\]

    where strict positivity follows from the assumption that \(U\) has full column-rank, i.e., \(UV^T \neq 0\).

  • The inverse of a positive definite matrix is positive definite. This follows from the positive spectral factorization \(A=Q \Lambda Q^T\), which gives us

    \[A^{-1} = Q \Lambda^{-1} Q^T\]

    where \(\Lambda_{ii}>0\). If \(A\) is semidefinite then the pseudo-inverse of \(A\) is semidefinite.

  • Consider a matrix \(X\in\Symm^n\) partitioned as

    \[\begin{split}X = \left[\begin{array}{cc} A & B^T\\ B & C \end{array}\right].\end{split}\]

    Let us find necessary and sufficient conditions for \(X\succ 0\). We know that \(A\succ 0\) and \(C\succ 0\) (since any submatrix must be positive definite). Furthermore, we can simplify the analysis using a nonsingular transformation

    \[\begin{split}L = \left[\begin{array}{cc} I & 0\\ F & I \end{array}\right]\end{split}\]

    to diagonalize \(X\) as \(L X L^T = D\), where \(D\) is block-diagonal. Note that \(\det(L)=1\), so \(L\) is indeed nonsingular. Then \(X\succ 0\) if and only if \(D\succ 0\). Expanding \(L X L^T = D\), we get

    \[\begin{split}\left[\begin{array}{cc} A & AF^T + B^T\\ FA + B & F A F^T + F B^T + B F^T + C \end{array}\right] = \left[\begin{array}{cc} D_1 & 0\\ 0 & D_2 \end{array}\right].\end{split}\]

    Since \(\det(A)\neq 0\) (by assuming that \(A\succ 0\)) we see that \(F=-BA^{-1}\) and direct substitution gives us

    \[\begin{split}\left[\begin{array}{cc} A & 0\\ 0 & C - B A^{-1} B^T \end{array}\right] = \left[\begin{array}{cc} D_1 & 0\\ 0 & D_2 \end{array}\right].\end{split}\]

    We have thus established the following useful result.

    Lemma 14 Schur complement lemma

    A symmetric matrix

    \[\begin{split}X = \left[\begin{array}{cc} A & B^T\\ B & C \end{array}\right]\end{split}\]

    is positive definite if and only if

    \[A \succ 0, \quad C - B A^{-1} B^T \succ 0.\]
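A small numerical illustration of Lemma 14 (assuming numpy; the matrix data and block sizes are arbitrary) is given below.

```python
import numpy as np

# Numerical illustration of Lemma 14: X = [[A, B^T], [B, C]] is positive definite
# exactly when A and the Schur complement C - B A^{-1} B^T are positive definite.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
X = M @ M.T + 0.1 * np.eye(5)            # a positive definite test matrix
A, B, C = X[:2, :2], X[2:, :2], X[2:, 2:]

schur = C - B @ np.linalg.solve(A, B.T)  # C - B A^{-1} B^T
is_pd = lambda S: np.linalg.eigvalsh(S).min() > 0
print(is_pd(X), is_pd(A) and is_pd(schur))   # both True, as the lemma predicts
```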

4.1.3. Semidefinite duality

Semidefinite duality is largely identical to conic quadratic duality. We define a primal semidefinite optimization problem as

(5)\[\begin{split}\begin{array}{ll} \mbox{minimize} & \langle C, X \rangle\\ \mbox{subject to} & \langle A_i, X \rangle = b_i, \quad i=1,\dots,m\\ & X \in \PSD^n \end{array}\end{split}\]

with a dual cone

\[(\PSD^n)^* = \{ Z \in \mathcal{S}^{n} \mid \langle X, Z \rangle\geq 0, \: \forall X \in \PSD^n \}.\]
Lemma 15
The semidefinite cone is self-dual, \((\PSD^n)^*=\PSD^n\).

Proof. We have already seen that \(X,Z\in \PSD^n \Rightarrow \langle X,Z \rangle \geq 0\). Conversely, assume that \(Z\not \in \PSD^n\). Then there exists a \(w\) satisfying \(w^T Z w < 0\), i.e., \(X=ww^T\in \PSD^n\) satisfies \(\langle X, Z\rangle <0\), so \(Z\not \in (\PSD^n)^*\).

We can now define a Lagrange function

\[\begin{split}\begin{array}{ll} L(X,y,S) & = \langle C, X \rangle - \sum_{i=1}^m y_i(\langle A_i, X \rangle-b_i) - \langle X, S \rangle\\ & = \langle X, C - \sum_{i=1}^m y_i A_i - S\rangle + b^T y \end{array}\end{split}\]

with a dual variable \(S\in\PSD^n\). The Lagrange function is unbounded below unless \(C-\sum_i y_i A_i = S\), so we get a dual problem

(6)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y\\ \mbox{subject to} & C - \sum_{i=1}^m y_i A_i = S\\ & S \in \PSD^n. \end{array}\end{split}\]

The primal constraints are a set of \(m\) equality constraints involving inner products, \(\langle A_i, X\rangle=b_i\), i.e., an intersection of affine hyperplanes in the \(n(n+1)/2\)-dimensional space of symmetric matrices. If we eliminate \(S\) from the dual constraints,

\[C-\sum_{i=1}^m y_i A_i \succeq 0\]

we get an affine matrix-valued function with coefficients \((C,A_i)\in \mathcal{S}^n\) and variable \(y\in \R^m\). Such a matrix-valued affine inequality is called a linear matrix inequality, which we discuss in greater detail in the next section.

Weak duality follows immediately by taking the inner product between \(X\) and the dual constraint,

\[\langle X, C-\sum_{i=1}^m y_i A_i \rangle = \langle C, X \rangle - b^T y = \langle X, S\rangle \geq 0,\]

but (similar to conic quadratic optimization) the issue of strong duality is more complicated than for linear optimization. In particular, it is possible that the primal or dual optimal values are unattained, and it is also possible for a problem to have a positive duality gap, even though both the primal and dual optimal values are attained. This is illustrated in the following example.

Example 9 Duality

We consider a problem

\[\begin{split}\begin{array}{ll} \mbox{minimize}& x_1\\ \mbox{subject to}& \left[ \begin{array}{ccc} 0 & x_1 & 0 \\ x_1 & x_2 & 0\\ 0 & 0 & 1+x_1 \end{array} \right]\in \PSD^3, \end{array}\end{split}\]

with feasible set \(\{x_1=0, x_2\geq 0\}\) and optimal value \(p^\star=0\). The dual problem can be formulated as

\[\begin{split}\begin{array}{ll} \mbox{maximize}& -z_2\\ \mbox{subject to}& \left[ \begin{array}{ccc} z_1 & (1-z_2)/2 & 0 \\ (1-z_2)/2 & 0 & 0\\ 0 & 0 & z_2 \end{array} \right]\in \PSD^3, \end{array}\end{split}\]

which has a feasible set \(\{z_1\geq 0, z_2=1\}\) and dual optimal value \(d^\star=-1\). Both problems are feasible, but not strictly feasible.

Just as for conic quadratic optimization, existence of either a strictly feasible primal or dual point (i.e., a Slater constraint qualification) ensures strong duality for the pair of primal and dual semidefinite programs.
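As a concrete illustration of the primal-dual pair (5)-(6), the following sketch solves a tiny instance with a single equality constraint using CVXPY together with an SDP-capable solver such as SCS (both are assumed tools, not part of the text). Both problems in this instance are strictly feasible, so the optimal values coincide.

```python
import cvxpy as cp
import numpy as np

# A tiny instance of the primal SDP (5) and its dual (6):
#   minimize <C, X> s.t. tr(X) = 1, X PSD   /   maximize y s.t. C - y*I PSD.
C = np.array([[2.0, 1.0],
              [1.0, 3.0]])

X = cp.Variable((2, 2), PSD=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ X)), [cp.trace(X) == 1])
primal.solve()

y = cp.Variable()
dual = cp.Problem(cp.Maximize(y), [C - y * np.eye(2) >> 0])
dual.solve()

# Both values equal the smallest eigenvalue of C, as strong duality predicts.
print(primal.value, dual.value, np.linalg.eigvalsh(C).min())
```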

As discussed in the introduction, we can embed the nonnegative orthant and quadratic cone within the semidefinite cone, and the Cartesian product of several semidefinite cones can be modeled as a larger block-diagonal semidefinite matrix. For an optimization solver, however, it is more efficient to explicitly include multiple cones in the problem formulation. A general linear, conic quadratic and semidefinite problem format often used by optimization solvers is

(7)\[\begin{split}\begin{array}{ll} \mbox{minimize} & \langle c^l, x^l \rangle + \sum_{j=1}^{n_q} \langle c^q_j, x^q_j \rangle + \sum_{j=1}^{n_s}\langle C_j, X_j \rangle\\ \mbox{subject to} & A^l x^l + \sum_{j=1}^{n_q} A^q_j x^q_j + \sum_{j=1}^{n_s} {\cal A}_j(X_j) = b\\ & x^l \in \R^{n_l}_+,\: x^q_j \in \mathcal{Q}^{q_j},\: X_j \in \PSD^{s_j}\:\forall j \end{array}\end{split}\]

with a linear variable \(x^l\), conic quadratic variables \(x^q_j\), semidefinite variables \(X_j\) and a linear mapping \({\cal A}_j: \mathcal{S}^{s_j} \mapsto \R^m\),

\[\begin{split}{\cal A}_j(X^s_j) := \left[\begin{array}{c} \langle A^s_{1j}, X^s_j \rangle \\ \vdots \\ \langle A^s_{mj}, X^s_j \rangle \end{array} \right].\end{split}\]

All the duality results we have presented for the simpler linear, conic quadratic and semidefinite problems translate directly to (7); the dual problem is

(8)\[\begin{split}\begin{array}{lccll} \mbox{maximize} & b^T y & & &\\ \mbox{subject to} & c^l - (A^l)^T y & = & s^l & \\ & c^q_j - (A^q_j)^T y & = & s^q_j, & j=1,\dots,n_q\\ & C_j - {\cal A}_j^*(y) & = & S_j, & j=1,\dots,n_s, \end{array}\end{split}\]

where

\[{\cal A}^*_j (y) := \sum_{i=1}^m y_i A^s_{ij},\]

and strong duality holds if a strictly feasible point exists for either (7) or (8). Similarly, if there exist

\[x^l\in \R^{n_l}_+, \qquad x^q_j \in \mathcal{Q}^{q_j}\: \forall j, \qquad X_j \in \PSD^{s_j}\: \forall j\]

satisfying

\[A^l x^l + \sum_{j=1}^{n_q} A^q_j x^q_j + \sum_{j=1}^{n_s} {\cal A}_j(X_j) = 0, \quad \langle c^l, x^l \rangle + \sum_{j=1}^{n_q} \langle c^q_j, x^q_j \rangle + \sum_{j=1}^{n_s}\langle C_j, X_j \rangle < 0,\]

then the dual problem is infeasible.

4.2. Semidefinite modeling

Having discussed different characterizations and properties of semidefinite matrices, we next turn to different functions and sets that can be modeled using semidefinite cones and variables. Most of those representations involve semidefinite matrix-valued affine functions, which we discuss next.

4.2.1. Linear matrix inequalities

A linear matrix inequality is an affine matrix-valued mapping \(A:\R^n \mapsto \mathcal{S}^m\),

(9)\[A(x) = A_0 + x_1 A_1 + \dots + x_n A_n \succeq 0,\]

in the variable \(x\in \R^n\) with symmetric coefficients \(A_i\in\mathcal{S}^m,\: i=0,\dots,n\). As a simple example consider the matrix in (4),

\[A(x,y,z) = A_0 + x A_1 + y A_2 + z A_3 \succeq 0\]

with

\[\begin{split}\begin{array}{cccc} A_0 = I, & A_1 = \left[ \begin{array}{ccc} 0 & 1 & 0\\ 1 & 0 & 0\\0 & 0 &0 \end{array}\right], & A_2 = \left[ \begin{array}{ccc} 0 & 0 & 1\\ 0 & 0 & 0\\1 & 0 &0 \end{array}\right], & A_3 = \left[\begin{array}{ccc} 0 & 0 & 0\\ 0 & 0 & 1\\0 & 1 &0 \end{array}\right]. \end{array}\end{split}\]

Alternatively, we can describe the linear matrix inequality \(A(x,y,z)\succeq 0\) as

\[X\in \PSD^3, \qquad X_{11} = X_{22} = X_{33}=1,\]

i.e., as a semidefinite variable with fixed diagonal elements; these two alternative formulations illustrate the difference between primal and dual semidefinite formulations, discussed in Section 4.1.3.

4.2.2. Eigenvalue optimization

Consider a symmetric matrix valued function \(A:\R^n\mapsto \mathcal{S}^m\),

\[A(x) = A_0 + x_1 A_1 + \dots + x_n A_n,\]

and let the eigenvalues of \(A(x)\) be denoted by

\[\lambda_1(A(x)) \geq \lambda_2(A(x)) \geq \dots \geq \lambda_m(A(x)).\]

A number of different functions of \(\lambda_i(A(x))\) can then be described using simple linear matrix inequalities.

4.2.2.1. Sum of eigenvalues

The sum of the eigenvalues corresponds to

\[\sum_{i=1}^m \lambda_i (A(x)) = \mathrm{tr}(A(x)) = \mathrm{tr}(A_0) + \sum_{i=1}^n x_i \mathrm{tr}(A_i).\]

4.2.2.2. Largest eigenvalue

The largest eigenvalue can be characterized in epigraph form \(\lambda_1(A(x))\leq t\) as

(10)\[K = \{ (x, t)\in \R^{n+1} \mid tI - A(x) \succeq 0 \}.\]

To verify this, suppose we have a spectral factorization \(A(x)=Q(x) \Lambda(x) Q(x)^T\) where \(Q(x)\) is orthogonal and \(\Lambda(x)\) is diagonal. Then \(t\) is an upper bound on the largest eigenvalue if and only if

\[Q(x)^T(tI - A(x))Q(x) = t I - \Lambda(x) \succeq 0.\]

Thus we can minimize the largest eigenvalue of \(A(x)\).
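For example, minimizing the largest eigenvalue over \(x\) amounts to minimizing \(t\) subject to the linear matrix inequality in (10). A minimal sketch, assuming CVXPY with an SDP-capable solver (not part of the text) and arbitrary example data \(A_0, A_1\):

```python
import cvxpy as cp
import numpy as np

# Minimize the largest eigenvalue of A(x) = A0 + x*A1 via the epigraph set (10).
A0 = np.array([[2.0, 0.5],
               [0.5, 1.0]])
A1 = np.array([[1.0, 0.0],
               [0.0, -1.0]])

x, t = cp.Variable(), cp.Variable()
prob = cp.Problem(cp.Minimize(t), [t * np.eye(2) - (A0 + x * A1) >> 0])
prob.solve()
print(t.value, np.linalg.eigvalsh(A0 + x.value * A1).max())   # the two agree
```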

4.2.2.3. Smallest eigenvalue

The smallest eigenvalue can be described in hypograph form \(\lambda_m(A(x)) \geq t\) as

(11)\[ K = \{ (x,t)\in\R^{n+1} \mid A(x) \succeq tI\},\]

i.e., we can maximize the smallest eigenvalue of \(A(x)\).

4.2.2.4. Eigenvalue spread

The eigenvalue spread can be modeled in epigraph form

\[\lambda_1(A(x))-\lambda_m(A(x))\leq t\]

by combining the two linear matrix inequalities in (10) and (11), i.e.,

(12)\[K = \{ (x,t) \in \R^{n+1} \mid \exists\, s,z\in\R: \: zI \preceq A(x) \preceq sI, \: s-z \leq t \}.\]

4.2.2.5. Spectral radius

The spectral radius \(\rho(A(x)):=\max_i |\lambda_i(A(x)) |\) can be modeled in epigraph form \(\rho(A(x))\leq t\) using two linear matrix inequalities

\[t I \pm A(x) \succeq 0,\]

i.e., the epigraph is

(13)\[K = \{ (x,t)\in\R^{n+1} \mid tI \succeq A(x) \succeq -tI \}.\]

4.2.2.6. Condition number of a positive definite matrix

The condition number of a positive definite matrix can be minimized by noting that \(\lambda_1(A(x))/\lambda_m(A(x))\leq t\) if and only if there exists a \(\mu>0\) such that

\[\mu I \preceq A(x) \preceq \mu t I,\]

or equivalently if and only if \(I \preceq \mu^{-1}A(x) \preceq t I\). In other words, if we make a change of variables \(z:=x/\mu\), \(\nu:=1/\mu\) we can minimize the condition number as

\begin{equation} \begin{array}{lc} \mbox{minimize} & t \\ \mbox{subject to} & I \preceq \nu A_0 + \sum_{i=1}^n z_i A_i \preceq t I, \end{array} \end{equation}

and subsequently recover the solution as \(x=z/\nu\). In essence, we first normalize the spectrum by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear matrix inequality.
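The following sketch (again assuming CVXPY with an SDP-capable solver; the data \(A_0, A_1\) are arbitrary) implements this homogenized formulation and recovers \(x=z/\nu\).

```python
import cvxpy as cp
import numpy as np

# Minimize the condition number of A(x) = A0 + x*A1 using the homogenized
# variables (nu, z, t); afterwards the original variable is recovered as x = z/nu.
A0 = np.array([[3.0, 0.0],
               [0.0, 1.0]])
A1 = np.array([[0.0, 1.0],
               [1.0, 0.0]])

nu, z, t = cp.Variable(), cp.Variable(), cp.Variable()
B = nu * A0 + z * A1
prob = cp.Problem(cp.Minimize(t),
                  [B - np.eye(2) >> 0,       # I <= nu*A0 + z*A1
                   t * np.eye(2) - B >> 0])  # nu*A0 + z*A1 <= t*I
prob.solve()

x = z.value / nu.value
print(t.value, np.linalg.cond(A0 + x * A1))  # minimized condition number (here 3)
```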

4.2.2.7. Roots of the determinant

The determinant of a semidefinite matrix

\[\det(A(x)) = \prod_{i=1}^m \lambda_i(A(x))\]

is neither convex nor concave, but rational powers of the determinant can be modeled using linear matrix inequalities. For a rational power \(0\leq q \leq 1/m\) we have that

\[t \leq \det(A(x))^q\]

if and only if

(14)\[\begin{split}\begin{aligned} \left[ \begin{array}{cc} A(x) & Z \\ Z^T & \mathbf{Diag}(Z) \end{array}\right] & \succeq 0\\ (Z_{11} Z_{22} \cdots Z_{mm})^q & \geq t, \end{aligned}\end{split}\]

where \(Z\in\R^{m\times m}\) is a lower-triangular matrix, and the inequality in (14) involving \((Z_{11} Z_{22} \cdots Z_{mm})^q\) can be modeled using the formulations in Section 3.2.7; we refer to the literature for a proof. Similarly we can model negative powers of the determinant, i.e., for any rational \(q>0\) we have

\[t \geq \det(A(x))^{-q}\]

if and only if

(15)\[\begin{split}\begin{aligned} \left[ \begin{array}{cc} A(x) & Z \\ Z^T & \mathbf{Diag}(Z) \end{array} \right]& \succeq 0\\ (Z_{11} Z_{22} \cdots Z_{mm})^{-q} & \geq t \end{aligned}\end{split}\]

for a lower triangular \(Z\).

4.2.3. Singular value optimization

We next consider a nonsymmetric matrix valued function \(A:\R^n\mapsto \R^{m\times p}\),

\[A(x) = A_0 + x_1 A_1 + \dots + x_n A_n,\]

where \(A_i\in \R^{m\times p}\). Assume \(p\leq m\) and denote the singular values of \(A(x)\) by

\[\sigma_1(A(x)) \geq \sigma_2(A(x)) \geq \dots \geq \sigma_p(A(x)) \geq 0.\]

The singular values are related to the eigenvalues of \(A(x)^T A(x)\),

(16)\[\sigma_i(A(x)) = \sqrt{\lambda_i(A(x)^T A(x))},\]

and if \(A(x)\) is symmetric then the singular values correspond to the absolute values of the eigenvalues. We can then optimize several functions of the singular values.

4.2.3.1. Largest singular value

The epigraph \(\sigma_1(A(x))\leq t\) can be characterized using (16) as

\[A(x)^T A(x) \preceq t^2 I\]

which, for \(t\geq 0\), is equivalent by the Schur complement lemma to the linear matrix inequality

(17)\[\begin{split} \left[\begin{array}{cc} tI & A(x) \\ A(x)^T & t I \end{array} \right]\succeq 0.\end{split}\]

The largest singular value \(\sigma_1(A(x))\) is also called the spectral norm or the \(\ell_2\)-norm of \(A(x)\), \(\|A(x)\|_2 := \sigma_1(A(x))\).

4.2.3.2. Sum of singular values

The trace norm or the nuclear norm of \(A(x)\) is the dual of the \(\ell_2\)-norm. We can characterize it as

(18)\[ \| X \|_{*}=\sup_{\| Z \|_2\leq 1}\mathrm{tr}(X^T Z).\]

It turns out that the nuclear norm corresponds to the sum of the singular values,

(19)\[ \| X \|_{*} = \sum_{i=1}^p \sigma_i(X),\]

which is easy to verify using singular value decomposition \(X=U \Sigma V^T\). We then have

\[\begin{split}\begin{aligned} \sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^T Z) & = \sup_{\|Z\|_2\leq 1} \mathrm{tr}(\Sigma^T U^T ZV)\\ & = \sup_{\| U^T Z V\|_2\leq 1} \mathrm{tr}(\Sigma^T U^T Z V) \\ & = \sup_{\| Y\|_2\leq 1} \mathrm{tr}(\Sigma^T Y), \end{aligned}\end{split}\]

where we used the unitary invariance of the norm \(\|\cdot \|_2\). It suffices to consider \(Y=\mathbf{diag}(y_1, \dots, y_p)\) (the off-diagonal entries of \(Y\) do not contribute to \(\mathrm{tr}(\Sigma^T Y)\), and \(|y_i|\leq \|Y\|_2\)), so that

\[\sup_{\|Z\|_2\leq 1} \mathrm{tr}(X^T Z) = \sup_{|y_i| \leq 1} \sum_{i=1}^p \sigma_i y_i = \sum_{i=1}^p \sigma_i,\]

which shows (19). Alternatively, we can express (18) as the solution to

\[\begin{split}\begin{array}{ll} \mbox{maximize} & \mathrm{tr}(X^T Z) \\ \mbox{subject to} & \left[\begin{array}{cc}I & Z^T\\Z & I\end{array}\right] \succeq 0, \end{array}\end{split}\]

with the dual problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & \mathrm{tr}(U + V)/2 \\ \mbox{subject to} & \left[\begin{array}{cc}U & X^T\\X & V\end{array}\right] \succeq 0. \end{array}\end{split}\]

In other words, using strong duality we can characterize the epigraph \(\| A(x) \|_{*}\leq t\) using a linear matrix inequality

(20)\[\begin{split}\left[\begin{array}{cc} U & A(x)^T \\ A(x) & V\end{array}\right] \succeq 0, \quad \mathrm{tr}(U + V)/2 \leq t.\end{split}\]

For a symmetric matrix the nuclear norm corresponds to the sum of the absolute eigenvalues, and for a semidefinite matrix it simply corresponds to the trace of the matrix.
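The following sketch (assuming CVXPY with an SDP-capable solver, and an arbitrary fixed test matrix) evaluates the nuclear norm via the linear matrix inequality (20) and compares it with the sum of singular values (19); for a matrix-valued affine function \(A(x)\) the same constraints would simply involve the variable \(x\) as well.

```python
import cvxpy as cp
import numpy as np

# Nuclear norm of a fixed 4x3 matrix via the linear matrix inequality (20),
# compared with the sum of singular values (19).
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))

U = cp.Variable((3, 3), symmetric=True)
V = cp.Variable((4, 4), symmetric=True)
t = cp.Variable()
M = cp.bmat([[U, A.T],
             [A, V]])
prob = cp.Problem(cp.Minimize(t),
                  [M >> 0, (cp.trace(U) + cp.trace(V)) / 2 <= t])
prob.solve()
print(t.value, np.linalg.svd(A, compute_uv=False).sum())   # the two agree
```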

4.2.4. Matrix inequalities from Schur’s Lemma

Several quadratic or quadratic-over-linear matrix inequalities follow immediately from Schur’s lemma. Suppose \(A:\R^n \mapsto \R^{m\times p}\) and \(B:\R^r \mapsto \mathcal{S}^{p}\) are matrix-valued affine functions

\[\begin{split}\begin{array}{rl} A(x) & = A_0 + x_1 A_1 + \dots + x_n A_n\\ B(y) & = B_0 + y_1 B_1 + \dots + y_r B_r \end{array}\end{split}\]

where \(A_i\in \R^{m\times p}\), \(B_i\in \mathcal{S}^p\), \(x\in \R^n\) and \(y\in \R^r\). Then, provided that \(B(y)\succ 0\),

\[A(x)^T B(y)^{-1} A(x) \preceq C\]

if and only if

\[\begin{split}\left[\begin{array}{cl} C & A(x)^T \\ A(x) & B(y) \end{array} \right] \succeq 0.\end{split}\]

4.2.5. Nonnegative polynomials

We next consider characterizations of polynomials constrained to be nonnegative on the real axis. To that end, consider a polynomial basis function

\[v(t) = \left(1, t, \dots, t^{2n}\right).\]

It is then well-known that a polynomial \(f:\R\mapsto \R\) of even degree \(2n\) is nonnegative on the entire real axis

(21)\[f(t) := x^T v(t) = x_0 + x_1 t + \dots + x_{2n} t^{2n} \geq 0, \: \forall t\]

if and only if it can be written as a sum of squared polynomials of degree \(n\) (or less), i.e., for some \(q_1,q_2\in \R^{n+1}\)

(22)\[f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2, \qquad u(t):=(1,t,\dots,t^n).\]

It turns out that an equivalent characterization of \(\{ x \mid f(t)\geq 0, \: \forall t\}\) can be given in terms of a semidefinite variable \(X\),

(23)\[x_i = \langle X, H_i\rangle, \quad i=0,\dots,2n, \qquad X \in \PSD^{n+1}.\]

where \(H^{n+1}_i\in \R^{(n+1)\times (n+1)}\) are Hankel matrices

\[\begin{split}[H_i]_{kl} = \left \{\begin{array}{ll}1, & k+l=i\\0, &\text{otherwise}.\end{array} \right.\end{split}\]

When there is no ambiguity, we drop the superscript on \(H_i\). For example, for \(n=2\) we have

\[\begin{split}H_0 = \left[\begin{array}{ccc} 1 & 0 & 0\\0 & 0 & 0\\0 & 0 & 0 \end{array} \right], \quad H_1 = \left[\begin{array}{ccc} 0 & 1 & 0\\1 & 0 & 0\\0 & 0 & 0 \end{array} \right], \quad \dots \quad H_4 = \left[\begin{array}{ccc} 0 & 0 & 0\\0 & 0 & 0\\0 & 0 & 1 \end{array} \right].\end{split}\]

To verify that (21) and (23) are equivalent, we first note that

\[u(t)u(t)^T = \sum_{i=0}^{2n} H_i v_i(t),\]

i.e.,

\[\begin{split}\left[\begin{array}{c} 1\\ t \\ \vdots \\t^n \end{array}\right] \left[\begin{array}{c} 1\\ t \\ \vdots \\t^n \end{array}\right]^T = \left[\begin{array}{cccc} 1 & t & \dots & t^n\\ t & t^2 & \dots & t^{n+1}\\ \vdots & \vdots & \ddots & \vdots \\ t^n & t^{n+1} & \dots & t^{2n} \end{array}\right].\end{split}\]

Assume next that \(f(t)\geq 0\). Then from (22) we have

\[\begin{split}\begin{array}{cl} f(t) & = (q_1^T u(t))^2 + (q_2^T u(t))^2 \\ & = \langle q_1q_1^T + q_2 q_2^T, u(t)u(t)^T \rangle\\ & = \sum_{i=0}^{2n} \langle q_1q_1^T + q_2 q_2^T, H_i \rangle v_i(t), \end{array}\end{split}\]

i.e., we have \(f(t)=x^T v(t)\) with \(x_i = \langle X, H_i \rangle, \: X = (q_1q_1^T + q_2 q_2^T)\succeq 0\). Conversely, assume that (23) holds. Then

\[f(t) = \sum_{i=0}^{2n} \langle H_i, X\rangle v_i(t) = \langle X , \sum_{i=0}^{2n}H_i v_i(t)\rangle = \langle X, u(t) u(t)^T\rangle \geq 0\]

since \(X\succeq 0\). In summary, we can characterize the cone of nonnegative polynomials over the real axis as

(24)\[K^n_{\infty} = \{ x\in \R^{n+1} \mid x_i = \langle X, H_i\rangle, \: i=0,\dots,2n, \: X\in\PSD^{n+1} \}.\]

Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility problem.
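For instance, nonnegativity of \(f(t)=1-2t+2t^2\) (a polynomial of degree \(2n=2\)) can be certified with the following sketch, which builds the Hankel matrices explicitly and poses the feasibility problem (24) in CVXPY (an assumed modeling tool with an SDP-capable solver; neither is part of the text).

```python
import cvxpy as cp
import numpy as np

# Certify nonnegativity of f(t) = 1 - 2t + 2t^2 on the real axis via (24).
x = np.array([1.0, -2.0, 2.0])   # coefficients x_0, x_1, x_2 of f
n = 1                            # f has degree 2n = 2

def hankel(i, size):
    """The Hankel matrix H_i of the given size, [H_i]_{kl} = 1 if k + l = i."""
    H = np.zeros((size, size))
    for k in range(size):
        for l in range(size):
            if k + l == i:
                H[k, l] = 1.0
    return H

X = cp.Variable((n + 1, n + 1), PSD=True)
constraints = [cp.trace(hankel(i, n + 1) @ X) == x[i] for i in range(2 * n + 1)]
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)   # 'optimal' (feasible) certifies that f is nonnegative
```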

4.2.5.1. Nonnegativity on a finite interval

As an extension we consider a basis function of degree \(n\),

\[v(t)=(1,t,\ldots,t^n).\]

A polynomial \(f(t):=x^T v(t)\) is then nonnegative on a subinterval \(I=[a,b]\subset \R\) if and only if \(f(t)\) can be written as a sum of weighted squares,

\[f(t) = w_1(t)(q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2\]

where \(w_i(t)\) are polynomials that are nonnegative on \(I\), i.e., \(w_i(t)\geq 0, \,i=1,2, \, \forall t\in [a,b]\). To describe the cone

\[K^n_{a,b} = \{ x\in \R^{n+1} \mid f(t) = x^T v(t) \geq 0, \: \forall t\in [a, b] \}\]

we distinguish between polynomials of odd and even degree.

  • Even degree. Let \(n=2m\) and denote

    \[u_1(t)=(1,t,\ldots,t^m), \qquad u_2(t)=(1,t,\ldots,t^{m-1}).\]

    We choose \(w_1(t)=1\) and \(w_2(t)=(t-a)(b-t)\) and note that \(w_2(t)\geq 0\) on \([a,b]\). Then \(f(t)\geq 0, \forall t\in [a, b]\) if and only if

    \[f(t) = (q_1^T u_1(t))^2 + w_2(t)(q_2^T u_2(t))^2\]

    for some \(q_1, q_2\), and an equivalent semidefinite characterization can be found as

    (25)\[\begin{split} K^n_{a,b} = \{ x\in \R^{n+1} \mid x_i = \langle X_1, H^{m+1}_i\rangle + \langle X_2, (a+b)H^{m}_{i-1}-ab H^{m}_i - H^{m}_{i-2}\rangle, \\ \, i=0,\ldots,n, \, X_1 \in \PSD^{m+1}, \, X_2 \in \PSD^{m} \}.\end{split}\]
  • Odd degree. Let \(n=2m+1\) and denote \(u(t)=(1,t,\ldots,t^m)\). We choose \(w_1(t)=(t-a)\) and \(w_2(t)=(b-t)\). We then have that \(f(t)=x^T v(t)\geq 0, \forall t\in[a,b]\) if and only if

    \[f(t) = (t-a)(q_1^T u(t))^2 + (b-t)(q_2^T u(t))^2\]

    for some \(q_1, q_2\), and an equivalent semidefinite characterization can be found as

    (26)\[\begin{split}K^n_{a,b} = \{ x\in \R^{n+1} \mid x_i = \langle X_1, H^{m+1}_{i-1} - a H^{m+1}_i\rangle + \langle X_2, bH^{m+1}_i- H^{m+1}_{i-1}\rangle, \\ \, i=0,\ldots,n, \, X_1, X_2 \in \PSD^{m+1} \}.\end{split}\]

4.2.6. Hermitian matrices

Semidefinite optimization can be extended to complex-valued matrices. To that end, let \(\mathcal{H}^n\) denote the cone of Hermitian matrices of order \(n\), i.e.,

(27)\[X \in \mathcal{H}^n \qquad \Longleftrightarrow \qquad X\in \mathbb{C}^{n\times n}, \quad X^H = X\]

where superscript \(H\) denotes Hermitian (conjugate) transposition. Then \(X\in \mathcal{H}_+^n\), the cone of Hermitian positive semidefinite matrices, if and only if

\[\begin{split}\begin{array}{cl} z^H X z & = (\Re z - i \Im z)^T (\Re X + i\Im X) (\Re z + i \Im z)\\ & = \left[\begin{array}{c}\Re z\\ \Im z\end{array}\right]^T \left[ \begin{array}{rr} \Re X & -\Im X\\ \Im X & \Re X \end{array} \right] \left[ \begin{array}{c} \Re z\\ \Im z \end{array} \right] \geq 0, \quad \forall z \in \mathbb{C}^n. \end{array}\end{split}\]

In other words,

(28)\[\begin{split} X \in \mathcal{H}_+^n \qquad \Longleftrightarrow \qquad \left[\begin{array}{rr} \Re X & -\Im X\\ \Im X & \Re X \end{array} \right]\in \PSD^{2n}.\end{split}\]

Note that (28) implies skew-symmetry of \(\Im X\), i.e., \(\Im X = -\Im X^T\).
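A quick numerical check of (28) (assuming numpy): the real embedding of a Hermitian positive semidefinite matrix is real, symmetric and positive semidefinite, and each eigenvalue of \(X\) appears twice.

```python
import numpy as np

# Check of (28): the real embedding of a Hermitian PSD matrix X is a real
# symmetric PSD matrix whose spectrum repeats each eigenvalue of X twice.
rng = np.random.default_rng(3)
W = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = W.conj().T @ W                        # Hermitian PSD by construction

R = np.block([[X.real, -X.imag],
              [X.imag,  X.real]])         # the embedding from (28)
print(np.linalg.eigvalsh(X).round(6))
print(np.linalg.eigvalsh(R).round(6))     # each eigenvalue appears twice
```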

4.2.7. Nonnegative trigonometric polynomials

As a complex-valued variation of the sum-of-squares representation we consider trigonometric polynomials; optimization over cones of nonnegative trigonometric polynomials has several important engineering applications. Consider a trigonometric polynomial evaluated on the complex unit-circle

(29)\[ f(z) = x_0 + 2\Re ( \sum_{i=1}^n x_i z^{-i} ), \qquad |z|=1\]

parametrized by \(x\in \R\times \mathbb{C}^n\). We are interested in characterizing the cone of trigonometric polynomials that are nonnegative on the angular interval \([0,\pi]\),

\[K^n_{0,\pi} = \{ x\in \R\times \mathbb{C}^{n} \mid x_0 + 2\Re ( \sum_{i=1}^n x_i z^{-i} )\geq 0, \: \forall z=e^{jt}, t\in[0,\pi] \}.\]

Consider a complex-valued basis function

\[v(z) = (1,z,\dots,z^n).\]

The Riesz-Fejér theorem states that a trigonometric polynomial \(f(z)\) in (29) is nonnegative (i.e., \(x \in K^n_{0,\pi}\)) if and only if for some \(q\in \mathbb{C}^{n+1}\)

(30)\[f(z) = | q^H v(z) |^2.\]

Analogously to Section 4.2.5 we have a semidefinite characterization of the sum-of-squares representation, i.e., \(f(z)\geq 0, \forall z=e^{jt}\) if and only if

(31)\[x_i = \langle X, T_i\rangle, \quad i=0,\dots,n, \qquad X\in\mathcal{H}_+^{n+1}\]

where \(T_i^{n+1}\in \R^{(n+1)\times (n+1)}\) are Toeplitz matrices

\[\begin{split}[T_i]_{kl} = \left \{\begin{array}{ll}1, & k-l=i\\0, &\text{otherwise}\end{array} \right., \quad i=0,\dots,n.\end{split}\]

When there is no ambiguity, we drop the superscript on \(T_i\). For example, for \(n=2\) we have

\[\begin{split}T_0 = \left[\begin{array}{ccc} 1 & 0 & 0\\0 & 1 & 0\\0 & 0 & 1 \end{array} \right], \quad T_1 = \left[\begin{array}{ccc} 0 & 0 & 0\\1 & 0 & 0\\0 & 1 & 0 \end{array} \right], \quad T_2 = \left[\begin{array}{ccc} 0 & 0 & 0\\0 & 0 & 0\\1 & 0 & 0 \end{array} \right].\end{split}\]

To prove correctness of the semidefinite characterization, we first note that

\[v(z)v(z)^H = I + \sum_{i=1}^n \left(T_i v_i(z)\right ) + \sum_{i=1}^n \left( T_i v_i(z)\right )^H\]

i.e.,

\[\begin{split}\left[ \begin{array}{c} 1\\ z \\ \vdots \\z ^n \end{array} \right] \left[ \begin{array}{c} 1\\ z \\ \vdots \\z^n \end{array} \right]^H = \left[ \begin{array}{cccc} 1 & z^{-1} & \dots & z^{-n}\\ z & 1 & \dots & z^{1-n}\\ \vdots & \vdots & \ddots & \vdots \\ z^n & z^{n-1} & \dots & 1 \end{array} \right].\end{split}\]

Next assume that (30) is satisfied. Then

\[\begin{split}\begin{array}{cl} f(z) & = \langle qq^H, v(z)v(z)^H \rangle\\ & = \langle qq^H, I\rangle + \langle qq^H, \sum_{i=1}^n T_i v_i(z)\rangle + \langle qq^H, \sum_{i=1}^n T_i^T \overline{v_i(z)}\rangle\\ & = \langle qq^H, I\rangle + \sum_{i=1}^n\langle qq^H, T_i \rangle v_i(z) + \sum_{i=1}^n\langle qq^H, T_i^T \rangle \overline{v_i(z)}\\ & = x_0 + 2\Re ( \sum_{i=1}^n x_i v_i(z) ) \end{array}\end{split}\]

with \(x_i = \langle qq^H, T_i \rangle\). Conversely, assume that (31) holds. Then

\[f(z) = \langle X, I\rangle + \sum_{i=1}^n\langle X, T_i \rangle v_i(z) + \sum_{i=1}^n\langle X, T_i^T \rangle \overline{v_i(z)} = \langle X, v(z) v(z)^H \rangle \geq 0.\]

In other words, we have shown that

(32)\[K^n_{0,\pi} = \{ x\in \R\times \mathbb{C}^n \mid x_i = \langle X, T_i\rangle, \: i=0,\dots,n, \: X\in\mathcal{H}_+^{n+1} \}.\]

4.2.7.1. Nonnegativity on a subinterval

We next sketch a few useful extensions. An extension of the Riesz-Fejer Theorem states that a trigonometric polynomial \(f(z)\) of degree \(n\) is nonnegative on \(I(a,b)=\{ z \mid z=e^{jt}, \: t\in [a,b]\subseteq [0,\pi] \}\) if and only if it can be written as a weighted sum of squared trigonometric polynomials

\[f(z) = | f_1(z) |^2 + g(z) | f_2(z) |^2\]

where \(f_1, f_2, g\) are trigonometric polynomials of degree \(n, n-d\) and \(d\), respectively, and \(g(z)\geq 0\) on \(I(a,b)\). For example \(g(z) = z + z^{-1} - 2\cos \alpha\) is nonnegative on \(I(0,\alpha)\), and it can be verified that \(f(z)\geq 0, \: \forall z\in I(0,\alpha)\) if and only if

\[x_i = \langle X_1, T_i^{n+1} \rangle + \langle X_2 , T_{i+1}^n \rangle + \langle X_2 , T_{i-1}^n \rangle - 2\cos(\alpha)\langle X_2 , T_i^n\rangle,\]

for \(X_1\in \mathcal{H}_+^{n+1}\), \(X_2\in\mathcal{H}_+^n\), i.e.,

(33)\[\begin{split} K^n_{0,\alpha} = \{ x\in \R\times \mathbb{C}^n \mid x_i = \langle X_1, T_i^{n+1} \rangle + \langle X_2 , T_{i+1}^n \rangle + \langle X_2 , T_{i-1}^n \rangle \\ - 2\cos(\alpha)\langle X_2 , T_i^n\rangle, \: X_1\in \mathcal{H}_+^{n+1}, X_2\in\mathcal{H}_+^n \}.\end{split}\]

Similarly, using the weight \(g(z) = 2\cos \alpha - z - z^{-1}\), which is nonnegative on \(I(\alpha,\pi)\), we have that \(f(z)\geq 0, \: \forall z \in I(\alpha,\pi)\) if and only if

\[x_i = \langle X_1, T_i^{n+1} \rangle - \langle X_2 , T_{i+1}^n \rangle - \langle X_2 , T_{i-1}^n \rangle +2\cos(\alpha)\langle X_2 , T_i^n\rangle\]

i.e.,

(34)\[\begin{split} K^n_{\alpha,\pi} = \{ x\in \R\times \mathbb{C}^n \mid x_i = \langle X_1, T_i^{n+1} \rangle - \langle X_2 , T_{i+1}^n \rangle - \langle X_2 , T_{i-1}^n \rangle \\ +2\cos(\alpha)\langle X_2 , T_i^n\rangle, \: X_1\in \mathcal{H}_+^{n+1}, X_2\in\mathcal{H}_+^n \}.\end{split}\]

4.3. Semidefinite optimization examples

In this section we give examples of semidefinite optimization problems using some of the results from Section 4.2.

4.3.1. Nearest correlation matrix

We consider the set

\[S = \{ X \in \PSD^n \mid X_{ii}=1, \: i=1,\dots,n \}\]

(shown in Fig. 4.1 for \(n=3\)). For \(A\in \mathcal{S}^n\) the nearest correlation matrix is

\[X^\star = \arg \min_{X\in S} \| A - X\|_F,\]

i.e., the projection of \(A\) onto the set \(S\). To pose this as a conic optimization problem we define the linear operator

\[\mathbf{svec}(U) = (U_{11},\sqrt{2}U_{21},\dots,\sqrt{2}U_{n1}, U_{22},\sqrt{2}U_{32},\dots,\sqrt{2}U_{n2},\dots,U_{nn}),\]

which extracts and scales the lower-triangular part of \(U\). We then get a conic formulation of the nearest correlation problem exploiting symmetry of \(A-X\),

(35)\[\begin{split} \begin{array}{lcl} \mbox{minimize} & t &\\ \mbox{subject to} & \| z \|_2 & \leq t\\ & \mathbf{svec}(A-X) & = z\\ & \mathbf{diag}(X) & = e\\ & X \succeq 0. \end{array}\end{split}\]

This is an example of a problem with both conic quadratic and semidefinite constraints in primal form, i.e., it matches the generic format (7).
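A compact sketch of the nearest correlation problem is given below, assuming CVXPY with an SDP-capable solver (not part of the text) and an arbitrary input matrix \(A\); rather than spelling out the \(\mathbf{svec}\)-based conic form (35) explicitly, it states the Frobenius-norm objective directly and lets the modeling tool perform the conversion.

```python
import cvxpy as cp
import numpy as np

# Nearest correlation matrix: project A onto S = {X PSD : X_ii = 1}.
A = np.array([[1.0, 0.9, 0.4],
              [0.9, 1.0, 0.9],
              [0.4, 0.9, 1.0]])   # not PSD (det(A) < 0), so the projection is nontrivial

X = cp.Variable((3, 3), PSD=True)
prob = cp.Problem(cp.Minimize(cp.norm(A - X, 'fro')), [cp.diag(X) == 1])
prob.solve()
print(np.round(X.value, 3))
```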

We can add different constraints to the problem, for example a bound \(\gamma\) on the smallest eigenvalue

\[X - \gamma I \succeq 0.\]

A naive way to add the eigenvalue bound would be to introduce a new variable \(U\),

\[U = X - \gamma I, \qquad U\in \PSD^n,\]

which would approximately double the number of variables and constraints in the model. Instead we should just interpret \(U\) as a change of variables leading to a problem

\[\begin{split}\begin{array}{lcl} \mbox{minimize} & t & \\ \mbox{subject to} & \| z \|_2 & \leq t\\ & \mathbf{svec}(A-U-\gamma I) & = z\\ & \mathbf{diag}(U+\gamma I) & = e\\ & U \succeq 0. \end{array}\end{split}\]

4.3.2. Extremal ellipsoids

Given a polytope we can find the largest ellipsoid contained in the polytope, or the smallest ellipsoid containing the polytope (for certain representations of the polytope).

4.3.2.1. Maximal inscribed ellipsoid

Consider a polytope

\[S = \{ x \in \R^n \: | \: a_i^T x \leq b_i, \: i=1,\dots,m \}.\]

The ellipsoid

\[\mathcal{E}:=\{ x \: | \: x=Cu + d, \: \|u\|\leq 1\}\]

is contained in \(S\) if and only if

\[\max_{\| u\|_2\leq 1}a_i^T(Cu + d) \leq b_i, \quad i=1,\dots,m\]

or equivalently, if and only if

\[\| C a_i\|_2 + a_i^T d \leq b_i, \quad i=1,\dots,m.\]

Since \(\mathbf{Vol}({\mathcal{E}})\approx \det(C)^{1/n}\) the maximum-volume inscribed ellipsoid is the solution to

\[\begin{split}\begin{array}{lccll} \mbox{maximize} & \det(C)^{1/n} & & &\\ \mbox{subject to} & \| C a_i\|_2 + a_i^T d & \leq & b_i & i=1,\dots,m\\ & C \succeq 0. & & & \end{array}\end{split}\]

If we use the semidefinite characterization of a fractional power of the determinant of a positive definite matrix, we get an equivalent conic formulation,

\[\begin{split}\begin{array}{ll} \mbox{maximize} & t\\ \mbox{subject to} & \| C a_i\|_2 + a_i^T d \leq b_i, \quad i=1,\dots,m\\ & \left[\begin{array}{cc} C & Z\\ Z^T & \mathbf{Diag}(Z) \end{array}\right] \succeq 0\\ & t \leq \left(Z_{11}Z_{22} \cdots Z_{nn} \right)^{1/n}, \end{array}\end{split}\]

where \(t \leq \left(Z_{11}Z_{22} \cdots Z_{nn} \right)^{1/n}\) can be modeled as the intersection of rotated quadratic cones, see Section 3.2.7.
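A sketch of the maximal inscribed ellipsoid problem is given below, assuming CVXPY with a conic solver that supports the exponential cone (e.g. SCS) and arbitrary polytope data; for simplicity it maximizes \(\log\det(C)\), which has the same maximizer as \(\det(C)^{1/n}\), instead of writing out the explicit conic model above.

```python
import cvxpy as cp
import numpy as np

# Maximal inscribed ellipsoid {Cu + d : ||u||_2 <= 1} of {x : a_i^T x <= b_i},
# maximizing log det(C) (same maximizer as det(C)^{1/n}).
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0, 1.0, 1.5])

C = cp.Variable((2, 2), PSD=True)
d = cp.Variable(2)
constraints = [cp.norm(C @ A[i], 2) + A[i] @ d <= b[i] for i in range(len(b))]
prob = cp.Problem(cp.Maximize(cp.log_det(C)), constraints)
prob.solve()
print(np.round(C.value, 3), np.round(d.value, 3))
```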

4.3.2.2. Minimal enclosing ellipsoid

Next consider a polytope given as the convex hull of a set of points,

\[S'=\mathbf{conv}\{ x_1, x_2, \dots, x_m \}, \quad x_i\in \R^n.\]

The ellipsoid

\[\mathcal{E}' := \{ x \: | \: \|P(x-c)\|_2 \leq 1 \}\]

has \(\mathbf{Vol}({\mathcal{E'}})\approx \det(P)^{-1/n}\), so the minimum-volume enclosing ellipsoid is the solution to

\[\begin{split}\begin{array}{lccl} \mbox{minimize} & \det(P)^{-1/n} & &\\ \mbox{subject to} & \| P(x_i-c)\|_2 & \leq 1, & i=1,\ldots,m\\ & P \succeq 0. & & \end{array}\end{split}\]

Alternatively, we can maximize \(\det(P)^{1/n}\) to get an equivalent formulation

\[\begin{split}\begin{array}{ll} \mbox{maximize} & t\\ \mbox{subject to} & \| P(x_i-c)\|_2 \leq 1, \quad i=1,\dots,m\\ & \left[\begin{array}{cc} P & Z\\ Z^T & \mathbf{Diag}(Z) \end{array}\right] \succeq 0\\ & t \leq \left(Z_{11}Z_{22} \cdots Z_{nn} \right)^{1/n}. \end{array}\end{split}\]

In Fig. 4.2 we illustrate the inner and outer ellipsoidal approximations of a polytope characterized by 5 points in \(\R^2\).

_images/lownerjohn.png

Fig. 4.2 Example of inner and outer ellipsoidal approximations.

4.3.3. Polynomial curve-fitting

Consider a univariate polynomial of degree \(n\),

\[f(t) = x_0 + x_1 t + x_2 t^2 + \dots + x_n t^n.\]

Often we wish to fit such a polynomial to a given set of measurements or control points

\[\{ (t_1, y_1), (t_2, y_2), \ldots, (t_m, y_m) \},\]

i.e., we wish to determine coefficients \(x_i, \, i=0,\ldots,n\) such that

\[f(t_j) \approx y_j, \quad j=1,\ldots,m.\]

To that end, define the Vandermonde matrix

\[\begin{split}A = \left [ \begin{array}{ccccc} 1 & t_1 & t_1^2 & \dots & t_1^n\\ 1 & t_2 & t_2^2 & \dots & t_2^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & t_m & t_m^2 & \dots & t_m^n\\ \end{array} \right ].\end{split}\]

We can then express the desired curve-fit compactly as

\[A x \approx y,\]

i.e., as a linear expression in the coefficients \(x\). When the number of measurements equals the number of coefficients, \(m=n+1\), the matrix \(A\) is square, and it is non-singular provided the points \(t_i\) are distinct, so we can solve

\[Ax = y\]

to find a polynomial that passes through all the control points \((t_i, y_i)\). Similarly, if \(m<n+1\) there are infinitely many solutions satisfying the underdetermined system \(Ax=y\). A typical choice in that case is the least-norm solution

\[x_{\mathrm{ln}} = \arg \min_{Ax=y} \| x \|_2\]

which (assuming that \(A\) has full row rank) has the simple closed-form solution

\[x_{\mathrm{ln}} = A^T (AA^T)^{-1} y.\]

On the other hand, if \(m>n+1\) we generally cannot find a solution to the overdetermined system \(Ax=y\), and we typically resort to a least-squares solution

\[x_{\mathrm{ls}} = \arg \min \| Ax-y \|_2\]

which has a simple solution

\[x_{\mathrm{ls}} = (A^T A)^{-1} A^T y.\]
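As a baseline, a plain unconstrained least-squares fit via the Vandermonde matrix looks as follows (a small sketch assuming numpy; the data points are arbitrary).

```python
import numpy as np

# Baseline: unconstrained least-squares fit of a degree-3 polynomial to 7 points.
t = np.linspace(-1.0, 1.0, 7)
y = np.abs(t)                                  # data to be fitted
A = np.vander(t, N=4, increasing=True)         # columns 1, t, t^2, t^3
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)   # coefficients x_0, ..., x_3
print(np.round(x_ls, 3))
```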

In the following we discuss how the semidefinite characterizations of nonnegative polynomials (see Section 4.2.5) lead to more advanced and useful polynomial curve-fitting constraints.

  • Nonnegativity. One possible constraint is nonnegativity on an interval,

    \[f(t) := x_0 + x_1 t + \cdots + x_n t^n \geq 0, \, \forall t\in [a, b]\]

    with a semidefinite characterization embedded in \(x \in K^n_{a,b}\), see (25) and (26).

  • Monotonicity. We can ensure monotonicity of \(f(t)\) by requiring that \(f'(t)\geq 0\) (or \(f'(t)\leq 0\)), i.e.,

    \[(x_1, 2 x_2, \ldots, n x_n) \in K^{n-1}_{a,b},\]

    or

    \[-(x_1, 2 x_2, \ldots, n x_n) \in K^{n-1}_{a,b},\]

    respectively.

  • Convexity or concavity. Convexity (or concavity) of \(f(t)\) corresponds to \(f''(t)\geq 0\) (or \(f''(t)\leq 0\)), i.e.,

    \[(2 x_2, 6 x_3, \ldots, (n-1)n x_n) \in K^{n-2}_{a,b},\]

    or

    \[-(2 x_2, 6 x_3, \ldots, (n-1)n x_n) \in K^{n-2}_{a,b},\]

    respectively.

As an example, we consider fitting a smooth polynomial

\[f_n(t) = x_0 + x_1 t + \cdots + x_n t^n\]

to the points \(\{(-1,1),(0,0),(1,1)\}\), where smoothness is implied by bounding \(|f_n'(t)|\). More specifically, we wish to solve the problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & z\\ \mbox{subject to} & |f'_n(t)| \leq z, \quad \forall t\in [-1,1]\\ & f_n(-1) = 1, \quad f_n(0) = 0, \quad f_n(1) = 1, \end{array}\end{split}\]

or equivalently

\[\begin{split}\begin{array}{ll} \mbox{minimize} & z\\ \mbox{subject to} & z - f'_n(t) \geq 0, \quad \forall t\in [-1,1]\\ & f'_n(t) + z \geq 0, \quad \forall t\in [-1,1]\\ & f_n(-1) = 1, \quad f_n(0) = 0, \quad f_n(1) = 1. \end{array}\end{split}\]

Finally, we use the characterizations \(K_{a,b}^n\) to get a conic problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & z\\ \mbox{subject to} & (z-x_1, -2x_2, \ldots, -nx_n) \in K^{n-1}_{-1,1}\\ & (x_1+z, 2x_2, \ldots, nx_n) \in K^{n-1}_{-1,1}\\ & f_n(-1) = 1, \quad f_n(0) = 0, \quad f_n(1) = 1. \end{array}\end{split}\]
_images/curvefit1.png

Fig. 4.3 Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through \(\{(-1,1),(0,0),(1,1)\}\). The higher-degree polynomials are increasingly smoother on \([-1,1]\).

In Fig. 4.3 we show the graphs of the resulting polynomials of degree 2, 4 and 8, respectively. The second degree polynomial is uniquely determined by the three interpolation constraints \(f_2(-1)=1\), \(f_2(0)=0\), \(f_2(1)=1\), i.e., \(f_2(t) = t^2\). Also, since \(f_n(1)-f_n(0)=1\), the mean value theorem gives an obvious lower bound on the largest derivative, \(\max_{t\in[-1,1]}|f_n'(t)|\geq 1\). The computed fourth degree polynomial is given by

\[f_4(t) = \frac{3}{2} t^2 - \frac{1}{2} t^4\]

after rounding coefficients to rational numbers. Furthermore, the largest derivative is given by

\[f_4'(1/\sqrt{2}) = \sqrt{2},\]

and \(f_4''(t)< 0\) on \((1/\sqrt{2},1]\) so, although not visibly clear, the polynomial is nonconvex on \([-1,1]\). In Fig. 4.4 we show the graphs of the corresponding polynomials where we added a convexity constraint \(f''_n(t)\geq 0\), i.e.,

\[(2 x_2, 6 x_3, \ldots, (n-1)n x_n) \in K^{n-2}_{-1,1}.\]

In this case, we get

\[f_4(t) = \frac{6}{5} t^2 - \frac{1}{5} t^4\]

and the largest derivative increases to \(\frac{8}{5}\).

_images/curvefit2.png

Fig. 4.4 Graph of univariate polynomials of degree \(2, 4,\) and \(8\), respectively, passing through \(\{(-1,1),(0,0),(1,1)\}\). The polynomials all have positive second derivative (i.e., they are convex) on \([-1,1]\) and the higher-degree polynomials are increasingly smoother on that interval.

4.3.4. Filter design problems

An important signal processing application is filter design problems over trigonometric polynomials

\[\begin{split}\begin{array}{cl} H(\omega) & = x_0 + 2\Re( \sum_{k=1}^n x_k e^{-j\omega k} )\\ & = a_0 + 2\sum_{k=1}^n \left (a_k \cos(\omega k) + b_k \sin(\omega k) \right ) \end{array}\end{split}\]

where \(a_k:=\Re (x_k)\), \(b_k:=\Im (x_k)\). If the function \(H(\omega)\) is nonnegative we call it a transfer function, and it describes how different harmonic components of a periodic discrete signal are attenuated when a filter with transfer function \(H(\omega)\) is applied to the signal.

We often wish to design a transfer function for which \(H(\omega)\approx 1\) for \(0\leq \omega \leq \omega_p\) and \(H(\omega)\approx 0\) for \(\omega_s \leq \omega \leq \pi\), for given constants \(\omega_p, \omega_s\). One possible formulation for achieving this is

\[\begin{split}\begin{array}{lcccccl} \mbox{minimize} & & & t & & &\\ \mbox{subject to} & 0 & \leq & H(\omega) & & & \forall \omega\in [0,\pi]\\ & 1-\delta & \leq & H(\omega) & \leq & 1+\delta & \forall \omega\in [0,\omega_p]\\ & & & H(\omega) & \leq & t & \forall \omega\in [\omega_s,\pi], \end{array}\end{split}\]

which corresponds to minimizing the maximum of \(H(\omega)\) on the interval \([\omega_s,\pi]\) while allowing \(H(\omega)\) to depart from unity by a small amount \(\delta\) on the interval \([0, \omega_p]\). Using the results from Section 4.2.7 (in particular (32), (33) and (34)), we can pose this as a conic optimization problem

(36)\[\begin{split}\begin{array}{lrcl} \mbox{minimize} & t & \\ \mbox{subject to}& x & \in & K^n_{0,\pi}\\ & (x_0-(1-\delta),x_{1:n}) & \in & K^n_{0,\omega_p}\\ & -(x_0-(1+\delta),x_{1:n}) & \in & K^n_{0,\omega_p}\\ & -(x_0-t,x_{1:n}) & \in & K^n_{\omega_s, \pi}, \end{array}\end{split}\]

which is a semidefinite optimization problem. In Fig. 4.5 we show \(H(\omega)\) obtained by solving (36) for \(n=10\), \(\delta=0.05\), \(\omega_p=\pi/4\) and \(\omega_s=\omega_p+\pi/8\).

_images/trigpoly.png

Fig. 4.5 Plot of lowpass filter transfer function.

4.3.5. Relaxations of binary optimization

Semidefinite optimization is also useful for computing bounds on difficult non-convex or combinatorial optimization problems. For example consider the binary linear optimization problem

(37)\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & Ax = b \\ & x\in \{ 0, 1\}^n. \end{array}\end{split}\]

In general, problem (37) is a very difficult non-convex problem where, in the worst case, we have to explore all \(2^n\) binary vectors. Alternatively, we can use semidefinite optimization to get a lower bound on the optimal value with polynomial complexity. We first note that

\[x_i \in \{0,1\} \quad \Longleftrightarrow \quad x_i^2=x_i,\]

which is, in fact, equivalent to a rank constraint on a semidefinite variable,

\[X = xx^T, \qquad \mathbf{diag}(X) = x.\]

By relaxing the rank 1 constraint on \(X\) we get a semidefinite relaxation of (37),

(38)\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^T x\\ \mbox{subject to} & Ax = b\\ & \mathbf{diag}(X) = x\\ & X \succeq xx^T, \end{array}\end{split}\]

where we note that

\[\begin{split}X \succeq xx^T \quad \Longleftrightarrow \quad \left[\begin{array}{cc} 1 & x^T \\ x & X \end{array}\right] \succeq 0.\end{split}\]

Since (38) is a semidefinite optimization problem, it can be solved very efficiently. Suppose \(x^\star\) is an optimal solution for (37); then \((x^\star,x^\star (x^\star)^T)\) is also feasible for (38), but the feasible set for (38) is larger than the feasible set for (37), so in general the optimal value of (38) serves as a lower bound. However, if the optimal solution \(X^\star\) of (38) has rank 1 we have found a solution to (37) as well.
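A minimal sketch of the relaxation (38), assuming CVXPY with an SDP-capable solver and arbitrary example data; the constraint \(X\succeq xx^T\) is modeled through the equivalent block-matrix condition above.

```python
import cvxpy as cp
import numpy as np

# The relaxation (38) for a small binary problem:
#   minimize c^T x s.t. Ax = b, x in {0,1}^3, with binary optimum -1 at x = (1,1,0).
c = np.array([1.0, -2.0, 3.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([2.0])

Y = cp.Variable((4, 4), PSD=True)   # Y models [[1, x^T], [x, X]], so X >= x x^T
x = Y[0, 1:]
X = Y[1:, 1:]
prob = cp.Problem(cp.Minimize(c @ x),
                  [Y[0, 0] == 1, A @ x == b, cp.diag(X) == x])
prob.solve()
print(prob.value)   # -1.0: a lower bound, tight for this instance
```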

The semidefinite relaxation (38) can be derived more systematically. The Lagrangian function of (37) is

\[\begin{split}\begin{array}{cl} L(x,y,z) & = c^T x + y^T(Ax-b) + \sum_{i=1}^n z_i(x_i^2 - x_i) \\ & = x^T (c + A^Ty - z) + x^T \mathbf{diag}(z) x - b^T y, \end{array}\end{split}\]

which is bounded below only if \(\mathbf{diag}(z)\succeq 0\) and \((c+A^Ty-z)\in\mathcal{R}(\mathbf{diag}(z))\). Assume for simplicity that \(\mathbf{diag}(z)\succ 0\). Then the minimizer of \(L(x,y,z)\) over \(x\) is

\[\arg\min_x L(x,y) = -(1/2)\mathbf{diag}(z)^{-1}(c+A^Ty-z)\]

and we get a dual problem

\[\begin{split}\begin{array}{ll} \mbox{maximize} & -(1/4)(c+A^Ty-z)^T \mathbf{diag}(z)^{-1} (c+A^Ty-z) - b^T y\\ \mbox{subject to} & z > 0. \end{array}\end{split}\]

By the Schur complement lemma (Lemma 14) this is equivalent to a standard semidefinite optimization problem

(39)\[\begin{split}\begin{array}{ll} \mbox{maximize} & -y^T b -t\\ \mbox{subject to} & \left [ \begin{array}{cc} t & \frac{1}{2}(c + A^T y -z)^T\\ \frac{1}{2}(c + A^T y -z) & \mathbf{diag}(z) \end{array} \right ]\succeq 0. \end{array}\end{split}\]

This dual problem gives us a lower bound. To obtain an explicit relaxation of (37) we derive the dual once more, but this time of problem (39). The Lagrangian function of (39) is

\[\begin{split}\begin{array}{cl} L(t, y, z; \nu, x, X) & = -y^T b - t + \langle \left[ \begin{array}{cc} \nu & x^T \\ x & X \end{array} \right] , \left[ \begin{array}{cc} t & \frac{1}{2}(c + A^T y -z)^T\\\frac{1}{2}(c + A^T y -z) & \mathbf{diag}(z) \end{array} \right] \rangle\\ & = t(\nu-1) + y^T(Ax-b) + z^T(\mathbf{Diag}(X)-x) + c^Tx \end{array}\end{split}\]

which is bounded above only if \(\nu=1\), \(Ax=b\), \(\mathbf{Diag}(X) = x\); since the multiplier \(\left[\begin{array}{cc}\nu & x^T\\ x & X\end{array}\right]\) is also required to be positive semidefinite (i.e., \(X\succeq xx^T\) when \(\nu=1\)), we recover the dual problem (38), which is called a Lagrangian relaxation. From strong duality, problem (38) has the same optimal value as problem (39).

We can tighten (or improve) the relaxation (38) by adding other constraints that cut away part of the feasible set, without excluding rank 1 solutions. By tightening the relaxation, we reduce the gap between the optimal values of the original problem and the relaxation. For example, we can add the constraints

\[0 \leq x_i \leq 1, \quad i=1,\dots,n\]

and

\[0 \leq X_{ij} \leq 1, \quad i=1,\dots,n, \: j=1,\dots,n.\]

A semidefinite matrix \(X\) with \(X_{ij}\geq 0\) is called a doubly nonnegative matrix. In practice, constraining a semidefinite matrix to be doubly nonnegative has a dramatic impact on the solution time and memory requirements of an optimization problem, since we add \(n^2\) linear inequality constraints.

4.3.6. Relaxations of boolean optimization

Similarly to Section 4.3.5 we can use semidefinite relaxations of boolean constraints \(x\in \{-1,+1\}^n\). To that end, we note that

(40)\[x\in\{-1,+1\}^n \quad \Longleftrightarrow \quad X = xx^T, \quad \mathbf{diag}(X) = e,\]

with a semidefinite relaxation \(X \succeq xx^T\) of the rank-1 constraint.

As a (standard) example of a combinatorial problem with boolean constraints, let us consider an undirected graph \(G\) described by a set of vertices \(V=\{v_1, \dots, v_n \}\) and a set of edges \(E\subseteq \{ (i,j) \mid i,j\in V, \: i\neq j\}\), and we wish to find the cut of maximum capacity. A cut \(C\) partitions the nodes \(V\) into two disjoint sets \(S\) and \(T=V\setminus S\), and the cut-set \(I\) is the set of edges with one node in \(S\) and another in \(T\), i.e.,

\[I = \{ (u,v) \in E \mid u \in S, \: v\in T \}.\]

The capacity of a cut is then defined as the number of edges in the cut-set \(|I|\); for example Fig. 4.6 illustrates a cut \(\{ v_2, v_4, v_5\}\) of capacity \(9\).

_images/graph7.png

Fig. 4.6 Undirected graph. The cut \(\{ v_2, v_4, v_5 \}\) has capacity 9.

To maximize the capacity of a cut we define the symmetric adjacency matrix \(A\in\mathcal{S}^n\),

\[\begin{split}[A]_{ij} = \left \{ \begin{array}{ll}1, & (v_i,v_j)\in E\\0, & \text{otherwise}\end{array}\right.\end{split}\]

where \(n=|V|\), and let

\[\begin{split}x_i = \left \{ \begin{array}{ll}+1, & v_i\in S\\-1, & v_i \notin S. \end{array}\right.\end{split}\]

Suppose \(v_i\in S\). Then \(1-x_i x_j=0\) if \(v_j\in S\) and \(1-x_i x_j = 2\) if \(v_j\notin S\), so we get an expression of the capacity as

\[c(x) = \frac{1}{4}\sum_{ij}(1-x_i x_j)A_{ij} = \frac{1}{4}e^T A e - \frac{1}{4}x^T A x,\]

and discarding the constant term \(e^T A e\) gives us the MAX-CUT problem

(41)\[\begin{split}\begin{array}{ll} \mbox{minimize} & x^T A x\\ \mbox{subject to} & x \in \{-1, +1\}^n, \end{array}\end{split}\]

with a semidefinite relaxation

(42)\[\begin{split}\begin{array}{ll} \mbox{minimize} & \langle A, X\rangle \\ \mbox{subject to} & \mathbf{diag}(X) = e\\ & X \succeq 0. \end{array}\end{split}\]

The MAX-CUT relaxation can also be derived using Lagrangian duality, analogous to Section 4.3.5.
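The following sketch (assuming CVXPY with an SDP-capable solver, and an arbitrary small graph) solves the relaxation (42) and compares the bound with a brute-force enumeration of (41).

```python
import cvxpy as cp
import numpy as np

# The MAX-CUT relaxation (42) for a small graph, compared with brute force over (41).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)     # adjacency matrix, n = 4

X = cp.Variable((4, 4), PSD=True)
prob = cp.Problem(cp.Minimize(cp.trace(A @ X)), [cp.diag(X) == 1])
prob.solve()

signs = [2 * np.array(s) - 1 for s in np.ndindex(2, 2, 2, 2)]   # all x in {-1,+1}^4
best = min(x @ A @ x for x in signs)
print(prob.value, best)   # relaxation value <= exact binary minimum
```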

4.3.7. Converting a problem to standard form

From a modeling perspective it does not matter whether constraints are given as linear matrix inequalities or as an intersection of affine hyperplanes; one formulation is easily converted to the other using auxiliary variables and constraints, and this transformation is often done transparently by optimization software. Nevertheless, it is instructive to study an explicit example of how to carry out this transformation.

Optimization solvers require a problem to be posed either in primal form (7) or dual form (8), but problems often have both linear equality constraints and linear matrix inequalities, so one must be converted to the other. A linear matrix inequality

\[A_0 + x_1 A_1 + \dots + x_n A_n \succeq 0\]

where \(A_i\in\mathcal{S}^m\) is converted to a set of linear equality constraints using a slack variable

(43)\[A_0 + x_1 A_1 + \dots + x_n A_n = S, \qquad S \succeq 0.\]

Apart from introducing an additional semidefinite variable \(S\in\PSD^m\), we also add \(m(m+1)/2\) equality constraints. On the other hand, a semidefinite variable \(X\in\PSD^n\) can be rewritten as a linear matrix inequality with \(n(n+1)/2\) scalar variables

(44)\[X = \sum_{i=1}^n e_ie_i^T x_{ii} + \sum_{i=1}^n\sum_{j=i+1}^n (e_ie_j^T + e_je_i^T) x_{ij} \succeq 0.\]

Such conversions are often necessary, but they can make the computational cost of the resulting problem prohibitively high.

Obviously we should only use the transformations (43) and (44) when necessary; if we have a problem that is more naturally interpreted in either primal or dual form, we should be careful to recognize that structure. For example, consider a problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & e^T z \\ \mbox{subject to} & A + \mathbf{Diag}(z) = X\\ & X\succeq 0. \end{array}\end{split}\]

with the variables \(X\in\PSD^n\) and \(z\in \R^n\). This is a problem in primal form with \(n(n+1)/2\) equality constraints, but they are more naturally interpreted as a linear matrix inequality

\[A + \sum_i e_i e_i^T z_i \succeq 0.\]

The dual problem is

\[\begin{split}\begin{array}{ll} \mbox{maximize} & -\langle A, Z\rangle \\ \mbox{subject to} & \mathbf{diag}(Z) = e\\ & Z\succeq 0, \end{array}\end{split}\]

in the variable \(Z\in \PSD^n\), which corresponds to the MAX-CUT relaxation. The dual problem has only \(n\) equality constraints, which is a vast improvement over the \(n(n+1)/2\) constraints in the primal problem.

4.4. Bibliography

Much of the material in this chapter is based on the paper [VB96] and the books [BenTalN01], [BKVH04]. The section on optimization over nonnegative polynomials is based on [Nes99], [Hac03]. We refer to [LR05] for a comprehensive survey on semidefinite optimization and relaxations in combinatorial optimization.