6 Semidefinite optimization

In this chapter we extend the conic optimization framework introduced before with symmetric positive semidefinite matrix variables.

6.1 Introduction to semidefinite matrices

6.1.1 Semidefinite matrices and cones

A symmetric matrix $X \in \mathcal{S}^n$ is called symmetric positive semidefinite if

\[ z^T X z \geq 0, \quad \forall z \in \mathbb{R}^n. \]

We then define the cone of symmetric positive semidefinite matrices as

\[ \mathcal{S}_+^n = \left\{ X \in \mathcal{S}^n \mid z^T X z \geq 0, \; \forall z \in \mathbb{R}^n \right\}. \tag{6.1} \]

For brevity we will often use the shorter notion semidefinite instead of symmetric positive semidefinite, and we will write $X \succeq Y$ ($X \preceq Y$) as shorthand notation for $(X - Y) \in \mathcal{S}_+^n$ ($(Y - X) \in \mathcal{S}_+^n$). As inner product for semidefinite matrices, we use the standard trace inner product for general matrices, i.e.,

\[ \langle A, B \rangle := \mathrm{tr}(A^T B) = \sum_{i,j} a_{ij} b_{ij}. \]

It is easy to see that (6.1) indeed specifies a convex cone; it is pointed (with origin $X = 0$), and $X, Y \in \mathcal{S}_+^n$ implies that $(\alpha X + \beta Y) \in \mathcal{S}_+^n$ for all $\alpha, \beta \geq 0$. Let us review a few equivalent definitions of $\mathcal{S}_+^n$. It is well-known that every symmetric matrix $A$ has a spectral factorization

\[ A = \sum_{i=1}^n \lambda_i q_i q_i^T, \]

where $q_i \in \mathbb{R}^n$ are the (orthogonal) eigenvectors and $\lambda_i$ are the eigenvalues of $A$. Using the spectral factorization of $A$ we have

\[ x^T A x = \sum_{i=1}^n \lambda_i (x^T q_i)^2, \]

which shows that $x^T A x \geq 0$ for all $x$ if and only if $\lambda_i \geq 0$, $i = 1, \dots, n$. In other words,

\[ \mathcal{S}_+^n = \left\{ X \in \mathcal{S}^n \mid \lambda_i(X) \geq 0, \; i = 1, \dots, n \right\}. \tag{6.3} \]

Another useful characterization is that $A \in \mathcal{S}_+^n$ if and only if it is a Grammian matrix $A = V^T V$. Here $V$ is called the Cholesky factor of $A$. Using the Grammian representation we have

\[ x^T A x = x^T V^T V x = \| V x \|_2^2, \]

i.e., if $A = V^T V$ then $x^T A x \geq 0$ for all $x$. On the other hand, from the positive spectral factorization $A = Q \Lambda Q^T$ we have $A = V^T V$ with $V = \Lambda^{1/2} Q^T$, where $\Lambda^{1/2} = \mathrm{Diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n})$. We thus have the equivalent characterization

\[ \mathcal{S}_+^n = \left\{ X \in \mathcal{S}^n \mid X = V^T V \text{ for some } V \in \mathbb{R}^{n \times n} \right\}. \tag{6.4} \]

In a completely analogous way we define the cone of symmetric positive definite matrices as

\[ \begin{aligned} \mathcal{S}_{++}^n & = \left\{ X \in \mathcal{S}^n \mid z^T X z > 0, \; \forall z \in \mathbb{R}^n \setminus \{0\} \right\} \\ & = \left\{ X \in \mathcal{S}^n \mid \lambda_i(X) > 0, \; i = 1, \dots, n \right\} \\ & = \left\{ X \in \mathcal{S}_+^n \mid X = V^T V \text{ for some } V \in \mathbb{R}^{n \times n}, \; \mathrm{rank}(V) = n \right\}, \end{aligned} \]

and we write $X \succ Y$ ($X \prec Y$) as shorthand notation for $(X - Y) \in \mathcal{S}_{++}^n$ ($(Y - X) \in \mathcal{S}_{++}^n$).
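The characterizations (6.3) and (6.4) are straightforward to verify numerically. The following is a minimal Python sketch (assuming numpy is available; the matrix is an arbitrary example):

```python
import numpy as np

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])   # a symmetric example matrix

lam, Q = np.linalg.eigh(A)           # spectral factorization A = Q diag(lam) Q^T
print(np.all(lam >= 0))              # (6.3): all eigenvalues nonnegative -> True

V = np.diag(np.sqrt(lam)) @ Q.T      # V = Lambda^{1/2} Q^T
print(np.allclose(V.T @ V, A))       # (6.4): Grammian representation -> True
```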

The one-dimensional cone $\mathcal{S}_+^1$ simply corresponds to $\mathbb{R}_+$. Similarly consider

\[ X = \begin{bmatrix} x_1 & x_3 \\ x_3 & x_2 \end{bmatrix} \]

with determinant $\det(X) = x_1 x_2 - x_3^2 = \lambda_1 \lambda_2$ and trace $\mathrm{tr}(X) = x_1 + x_2 = \lambda_1 + \lambda_2$. Therefore $X$ has nonnegative eigenvalues if and only if

\[ x_1 x_2 \geq x_3^2, \quad x_1, x_2 \geq 0, \]

which characterizes a three-dimensional scaled rotated cone, i.e.,

\[ \begin{bmatrix} x_1 & x_3 \\ x_3 & x_2 \end{bmatrix} \in \mathcal{S}_+^2 \quad \Longleftrightarrow \quad (x_1, x_2, x_3 \sqrt{2}) \in \mathcal{Q}_r^3. \]

More generally we have

\[ x \in \mathbb{R}_+^n \quad \Longleftrightarrow \quad \mathrm{Diag}(x) \in \mathcal{S}_+^n \]

and

\[ (t, x) \in \mathcal{Q}^{n+1} \quad \Longleftrightarrow \quad \begin{bmatrix} t & x^T \\ x & t I \end{bmatrix} \in \mathcal{S}_+^{n+1}, \]

where the latter equivalence follows immediately from Lemma 6.1. Thus both the linear and quadratic cone are embedded in the semidefinite cone. In practice, however, linear and quadratic cones should never be described using semidefinite constraints, which would result in a large performance penalty by squaring the number of variables.

Example 6.1

As a more interesting example, consider the symmetric matrix

\[ A(x, y, z) = \begin{bmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{bmatrix} \tag{6.5} \]

parametrized by (x,y,z). The set

\[ S = \left\{ (x, y, z) \in \mathbb{R}^3 \mid A(x, y, z) \in \mathcal{S}_+^3 \right\}, \]

(shown in Fig. 6.1) is called a spectrahedron and is perhaps the simplest bounded semidefinite representable set, which cannot be represented using (finitely many) linear or quadratic cones. To gain a geometric intuition of S, we note that

\[ \det(A(x, y, z)) = -(x^2 + y^2 + z^2 - 2xyz - 1), \]

so the boundary of $S$ can be characterized as

\[ x^2 + y^2 + z^2 - 2xyz = 1, \]

or equivalently as

\[ \begin{bmatrix} x \\ y \end{bmatrix}^T \begin{bmatrix} 1 & -z \\ -z & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 1 - z^2. \]

For $z = 0$ this describes a circle in the $(x, y)$-plane, and for $-1 \leq z \leq 1$ it characterizes an ellipse (for a fixed $z$).

Fig. 6.1 Plot of the spectrahedron $S = \{ (x, y, z) \in \mathbb{R}^3 \mid A(x, y, z) \succeq 0 \}$.

6.1.2 Properties of semidefinite matrices

Many useful properties of (semi)definite matrices follow directly from the definitions (6.1)-(6.4) and their definite counterparts.

  • The diagonal elements of $A \in \mathcal{S}_+^n$ are nonnegative. Let $e_i$ denote the $i$th standard basis vector (i.e., $[e_i]_j = 0$ for $j \neq i$, $[e_i]_i = 1$). Then $A_{ii} = e_i^T A e_i$, so (6.1) implies that $A_{ii} \geq 0$.

  • A block-diagonal matrix $A = \mathrm{Diag}(A_1, \dots, A_p)$ is (semi)definite if and only if each diagonal block $A_i$ is (semi)definite.

  • Given a quadratic transformation $M := B^T A B$, $M \succ 0$ if and only if $A \succ 0$ and $B$ has full rank. This follows directly from the Grammian characterization $M = (VB)^T (VB)$. For $M \succeq 0$ we only require that $A \succeq 0$. As an example, if $A$ is (semi)definite then so is any permutation $P^T A P$.

  • Any principal submatrix of $A \in \mathcal{S}_+^n$ ($A$ restricted to the same set of rows as columns) is positive semidefinite; this follows by restricting the Grammian characterization $A = V^T V$ to a submatrix of $V$.

  • The inner product of positive (semi)definite matrices is positive (nonnegative). For any $A, B \in \mathcal{S}_{++}^n$ let $A = U^T U$ and $B = V^T V$ where $U$ and $V$ have full rank. Then

    \[ \langle A, B \rangle = \mathrm{tr}(U^T U V^T V) = \| U V^T \|_F^2 > 0, \]

    where strict positivity follows from the assumption that $U$ has full column-rank, i.e., $U V^T \neq 0$.

  • The inverse of a positive definite matrix is positive definite. This follows from the positive spectral factorization $A = Q \Lambda Q^T$, which gives us

    \[ A^{-1} = Q \Lambda^{-1} Q^T, \]

    where $\Lambda_{ii} > 0$. If $A$ is semidefinite then the pseudo-inverse $A^\dagger$ of $A$ is semidefinite.

  • Consider a matrix $X \in \mathcal{S}^n$ partitioned as

    \[ X = \begin{bmatrix} A & B^T \\ B & C \end{bmatrix}. \]

    Let us find necessary and sufficient conditions for $X \succ 0$. We know that $A \succ 0$ and $C \succ 0$ (since any principal submatrix must be positive definite). Furthermore, we can simplify the analysis using a nonsingular transformation

    \[ L = \begin{bmatrix} I & 0 \\ F & I \end{bmatrix} \]

    to diagonalize $X$ as $L X L^T = D$, where $D$ is block-diagonal. Note that $\det(L) = 1$, so $L$ is indeed nonsingular. Then $X \succ 0$ if and only if $D \succ 0$. Expanding $L X L^T = D$, we get

    \[ \begin{bmatrix} A & A F^T + B^T \\ F A + B & F A F^T + F B^T + B F^T + C \end{bmatrix} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}. \]

    Since $\det(A) \neq 0$ (by assuming that $A \succ 0$) we see that $F = -B A^{-1}$, and direct substitution gives us

    \[ \begin{bmatrix} A & 0 \\ 0 & C - B A^{-1} B^T \end{bmatrix} = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}. \]

In the last part we have thus established the following useful result.

Lemma 6.1 Schur complement lemma

A symmetric matrix

\[ X = \begin{bmatrix} A & B^T \\ B & C \end{bmatrix} \]

is positive definite if and only if

\[ A \succ 0, \quad C - B A^{-1} B^T \succ 0. \]
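Lemma 6.1 is easy to check numerically; the following sketch (numpy assumed, with randomly generated data) compares positive definiteness of a partitioned $X$ with positive definiteness of $A$ and of the Schur complement $C - B A^{-1} B^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
X = M @ M.T + 0.1 * np.eye(5)              # a random positive definite matrix
A, B, C = X[:2, :2], X[2:, :2], X[2:, 2:]  # X = [[A, B^T], [B, C]]

def is_posdef(Y):
    return bool(np.all(np.linalg.eigvalsh(Y) > 0))

S = C - B @ np.linalg.solve(A, B.T)        # Schur complement C - B A^{-1} B^T
print(is_posdef(X), is_posdef(A) and is_posdef(S))  # True True
```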

6.2 Semidefinite modeling

Having discussed different characterizations and properties of semidefinite matrices, we next turn to different functions and sets that can be modeled using semidefinite cones and variables. Most of those representations involve semidefinite matrix-valued affine functions, which we discuss next.

6.2.1 Linear matrix inequalities

Consider an affine matrix-valued mapping $A : \mathbb{R}^n \to \mathcal{S}^m$:

\[ A(x) = A_0 + x_1 A_1 + \cdots + x_n A_n. \tag{6.8} \]

A linear matrix inequality (LMI) is a constraint of the form

\[ A_0 + x_1 A_1 + \cdots + x_n A_n \succeq 0 \tag{6.9} \]

in the variable $x \in \mathbb{R}^n$ with symmetric coefficients $A_i \in \mathcal{S}^m$, $i = 0, \dots, n$. As a simple example consider the matrix in (6.5),

\[ A(x, y, z) = A_0 + x A_1 + y A_2 + z A_3 \succeq 0 \]

with

\[ A_0 = I, \quad A_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad A_2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}. \]

Alternatively, we can describe the linear matrix inequality $A(x, y, z) \succeq 0$ as

\[ X \in \mathcal{S}_+^3, \quad X_{11} = X_{22} = X_{33} = 1, \quad x = X_{21}, \; y = X_{31}, \; z = X_{32}, \]

i.e., as a semidefinite variable with fixed diagonal; these two alternative formulations illustrate the difference between primal and dual form of semidefinite problems, see Sec. 8.6 (Semidefinite duality and LMIs).
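Both formulations are easy to state in a modeling tool. The following sketch uses CVXPY (an assumption; any conic modeling package with an SDP-capable solver would do) to maximize $x + y + z$ over the spectrahedron of Example 6.1, once via the LMI (6.9) and once as a semidefinite variable with fixed unit diagonal; both return the optimal value 3, attained at $x = y = z = 1$:

```python
import cvxpy as cp
import numpy as np

# LMI form: A0 + x*A1 + y*A2 + z*A3 >= 0
A1 = np.array([[0., 1, 0], [1, 0, 0], [0, 0, 0]])
A2 = np.array([[0., 0, 1], [0, 0, 0], [1, 0, 0]])
A3 = np.array([[0., 0, 0], [0, 0, 1], [0, 1, 0]])
x, y, z = cp.Variable(), cp.Variable(), cp.Variable()
lmi = cp.Problem(cp.Maximize(x + y + z),
                 [np.eye(3) + x * A1 + y * A2 + z * A3 >> 0])
lmi.solve()

# Semidefinite variable with fixed unit diagonal
X = cp.Variable((3, 3), symmetric=True)
sdp = cp.Problem(cp.Maximize(X[1, 0] + X[2, 0] + X[2, 1]),
                 [X >> 0, cp.diag(X) == 1])
sdp.solve()
print(lmi.value, sdp.value)   # both approximately 3
```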

6.2.2 Eigenvalue optimization

Consider a symmetric matrix $A \in \mathcal{S}^m$ and let its eigenvalues be denoted by

\[ \lambda_1(A) \geq \lambda_2(A) \geq \cdots \geq \lambda_m(A). \]

A number of different functions of $\lambda_i$ can then be described using a mix of linear and semidefinite constraints.

Sum of eigenvalues

The sum of the eigenvalues corresponds to

\[ \sum_{i=1}^m \lambda_i(A) = \mathrm{tr}(A). \]

Largest eigenvalue

The largest eigenvalue can be characterized in epigraph form $\lambda_1(A) \leq t$ as

\[ t I - A \succeq 0. \tag{6.10} \]

To verify this, suppose we have a spectral factorization $A = Q \Lambda Q^T$ where $Q$ is orthogonal and $\Lambda$ is diagonal. Then $t$ is an upper bound on the largest eigenvalue if and only if

\[ Q^T (t I - A) Q = t I - \Lambda \succeq 0. \]

Thus we can minimize the largest eigenvalue of A.
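For example, minimizing the largest eigenvalue of an affine matrix function $A(x)$ is a small semidefinite program. A CVXPY sketch (package and example data assumed):

```python
import cvxpy as cp
import numpy as np

A0 = np.array([[2., 1.], [1., 0.]])    # arbitrary symmetric data
A1 = np.array([[0., 1.], [1., 2.]])
x, t = cp.Variable(), cp.Variable()
prob = cp.Problem(cp.Minimize(t),
                  [t * np.eye(2) >> A0 + x * A1])   # (6.10): tI - A(x) >= 0
prob.solve()
print(t.value, np.linalg.eigvalsh(A0 + x.value * A1).max())  # equal
```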

Smallest eigenvalue

The smallest eigenvalue can be described in hypograph form $\lambda_m(A) \geq t$ as

\[ A \succeq t I, \tag{6.11} \]

i.e., we can maximize the smallest eigenvalue of A.

Eigenvalue spread

The eigenvalue spread can be modeled in epigraph form

\[ \lambda_1(A) - \lambda_m(A) \leq t \]

by combining the two linear matrix inequalities in (6.10) and (6.11), i.e.,

\[ z I \preceq A \preceq s I, \quad s - z \leq t. \tag{6.12} \]

Spectral radius

The spectral radius $\rho(A) := \max_i |\lambda_i(A)|$ can be modeled in epigraph form $\rho(A) \leq t$ using two linear matrix inequalities

\[ -t I \preceq A \preceq t I. \]

Condition number of a positive definite matrix

Suppose now that $A \in \mathcal{S}_{++}^m$. The condition number of a positive definite matrix can be minimized by noting that $\lambda_1(A)/\lambda_m(A) \leq t$ if and only if there exists a $\mu > 0$ such that

\[ \mu I \preceq A \preceq \mu t I, \]

or equivalently if and only if $I \preceq \mu^{-1} A \preceq t I$. If $A = A(x)$ is represented as in (6.8) then a change of variables $z := x/\mu$, $\nu := 1/\mu$ leads to a problem of the form

\[ \begin{array}{ll} \text{minimize} & t \\ \text{subject to} & I \preceq \nu A_0 + \sum_{i=1}^n z_i A_i \preceq t I, \end{array} \]

from which we recover the solution $x = z/\nu$. In essence, we first normalize the spectrum by the smallest eigenvalue, and then minimize the largest eigenvalue of the normalized linear matrix inequality. Compare Sec. 2.2.5 (Homogenization).

6.2.3 Log-determinant

Consider again a symmetric positive-definite matrix $A \in \mathcal{S}_{++}^m$. The determinant

\[ \det(A) = \prod_{i=1}^m \lambda_i(A) \]

is neither convex nor concave, but $\log \det(A)$ is concave and we can write the inequality

\[ t \leq \log \det(A) \]

in the form of the following problem:

\[ \begin{bmatrix} A & Z \\ Z^T & \mathrm{diag}(Z) \end{bmatrix} \succeq 0, \quad Z \text{ lower triangular}, \quad t \leq \sum_i \log Z_{ii}. \tag{6.13} \]

The equivalence of the two problems follows from Lemma 6.1 and superadditivity of the determinant for semidefinite matrices. That is:

\[ 0 \leq \det(A - Z \, \mathrm{diag}(Z)^{-1} Z^T) \leq \det(A) - \det(Z \, \mathrm{diag}(Z)^{-1} Z^T) = \det(A) - \det(Z). \]

On the other hand the optimal value $\det(A)$ is attained for $Z = LD$ if $A = L D L^T$ is the LDL factorization of $A$.

The last inequality in problem (6.13) can of course be modeled using the exponential cone as in Sec. 5.2 (Modeling with the exponential cone). Note that we can replace that bound with $t \leq (\prod_i Z_{ii})^{1/m}$ to get instead the model of $t \leq \det(A)^{1/m}$ using Sec. 4.2.4 (Geometric mean).
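Modeling tools typically expose $\log\det$ directly. As a minimal CVXPY sketch (assuming a solver supporting the required cones, e.g. SCS), we maximize $\log\det(X)$ over matrices with unit diagonal; by Hadamard's inequality the maximizer is $X = I$:

```python
import cvxpy as cp

X = cp.Variable((3, 3), symmetric=True)
prob = cp.Problem(cp.Maximize(cp.log_det(X)),
                  [cp.diag(X) == 1])   # X >= 0 is implicit in log_det's domain
prob.solve()
print(prob.value)                      # ~0 = log det(I)
```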

6.2.4 Singular value optimization

We next consider a non-square matrix $A \in \mathbb{R}^{m \times p}$. Assume $p \leq m$ and denote the singular values of $A$ by

\[ \sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_p(A) \geq 0. \]

The singular values are connected to the eigenvalues of $A^T A$ via

\[ \sigma_i(A) = \sqrt{\lambda_i(A^T A)}, \tag{6.14} \]

and if $A$ is square and symmetric then $\sigma_i(A) = |\lambda_i(A)|$. We show next how to optimize several functions of the singular values.

Largest singular value

The epigraph $\sigma_1(A) \leq t$ can be characterized using (6.14) as

\[ A^T A \preceq t^2 I, \quad t \geq 0, \]

which from Schur's lemma is equivalent to

\[ \begin{bmatrix} t I & A \\ A^T & t I \end{bmatrix} \succeq 0. \tag{6.15} \]

The largest singular value $\sigma_1(A)$ is also called the spectral norm or the $\ell_2$-norm of $A$, $\| A \|_2 := \sigma_1(A)$.

Sum of singular values

The trace norm or the nuclear norm of $X$ is the dual of the $\ell_2$-norm:

\[ \| X \|_* = \sup_{\| Z \|_2 \leq 1} \mathrm{tr}(X^T Z). \tag{6.16} \]

It turns out that the nuclear norm corresponds to the sum of the singular values,

\[ \| X \|_* = \sum_{i=1}^p \sigma_i(X) = \sum_{i=1}^p \sqrt{\lambda_i(X^T X)}, \tag{6.17} \]

which is easy to verify using the singular value decomposition $X = U \Sigma V^T$. We have

\[ \sup_{\| Z \|_2 \leq 1} \mathrm{tr}(X^T Z) = \sup_{\| Z \|_2 \leq 1} \mathrm{tr}(\Sigma^T U^T Z V) = \sup_{\| Y \|_2 \leq 1} \mathrm{tr}(\Sigma^T Y) = \sup_{|y_i| \leq 1} \sum_{i=1}^p \sigma_i y_i = \sum_{i=1}^p \sigma_i, \]

which shows (6.17). Alternatively, we can express (6.16) as the solution to

\[ \begin{array}{ll} \text{maximize} & \mathrm{tr}(X^T Z) \\ \text{subject to} & \begin{bmatrix} I & Z^T \\ Z & I \end{bmatrix} \succeq 0, \end{array} \tag{6.18} \]

with the dual problem (see Example 8.8)

\[ \begin{array}{ll} \text{minimize} & (\mathrm{tr}(U) + \mathrm{tr}(V))/2 \\ \text{subject to} & \begin{bmatrix} U & X^T \\ X & V \end{bmatrix} \succeq 0. \end{array} \tag{6.19} \]

In other words, using strong duality we can characterize the epigraph $\| A \|_* \leq t$ with

\[ \begin{bmatrix} U & A^T \\ A & V \end{bmatrix} \succeq 0, \quad (\mathrm{tr}(U) + \mathrm{tr}(V))/2 \leq t. \tag{6.20} \]

For a symmetric matrix the nuclear norm corresponds to the sum of absolute values of eigenvalues, and for a semidefinite matrix it simply corresponds to the trace of the matrix.
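The characterization (6.20) can be tested directly against an SVD. A CVXPY sketch (package assumed) minimizing $(\mathrm{tr}(U) + \mathrm{tr}(V))/2$ over the LMI, built here as one symmetric block variable:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))          # m = 4, p = 3
M = cp.Variable((7, 7), symmetric=True)  # M = [[U, A^T], [A, V]]
U, V = M[:3, :3], M[3:, 3:]
prob = cp.Problem(cp.Minimize((cp.trace(U) + cp.trace(V)) / 2),
                  [M[3:, :3] == A, M >> 0])
prob.solve()
print(prob.value, np.linalg.svd(A, compute_uv=False).sum())  # equal
```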

6.2.5 Matrix inequalities from Schur’s Lemma

Several quadratic or quadratic-over-linear matrix inequalities follow immediately from Schur's lemma. Suppose $A \in \mathbb{R}^{m \times p}$ and $B \in \mathcal{S}_{++}^m$ are matrix variables. Then

\[ A^T B^{-1} A \preceq C \]

if and only if

\[ \begin{bmatrix} C & A^T \\ A & B \end{bmatrix} \succeq 0. \]

6.2.6 Nonnegative polynomials

We next consider characterizations of polynomials constrained to be nonnegative on the real axis. To that end, consider a polynomial basis function

\[ v(t) = (1, t, \dots, t^{2n}). \]

It is then well-known that a polynomial $f : \mathbb{R} \to \mathbb{R}$ of even degree $2n$ is nonnegative on the entire real axis,

\[ f(t) := x^T v(t) = x_0 + x_1 t + \cdots + x_{2n} t^{2n} \geq 0, \quad \forall t, \tag{6.21} \]

if and only if it can be written as a sum of squared polynomials of degree $n$ (or less), i.e., for some $q_1, q_2 \in \mathbb{R}^{n+1}$

\[ f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2, \quad u(t) := (1, t, \dots, t^n). \tag{6.22} \]

It turns out that an equivalent characterization of $\{ x \mid x^T v(t) \geq 0, \; \forall t \}$ can be given in terms of a semidefinite variable $X$,

\[ x_i = \langle X, H_i \rangle, \quad i = 0, \dots, 2n, \qquad X \in \mathcal{S}_+^{n+1}, \tag{6.23} \]

where $H_i^{n+1} \in \mathbb{R}^{(n+1) \times (n+1)}$ are Hankel matrices

\[ [H_i]_{kl} = \begin{cases} 1, & k + l = i, \\ 0, & \text{otherwise}. \end{cases} \]

When there is no ambiguity, we drop the superscript on $H_i$. For example, for $n = 2$ we have

\[ H_0 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad H_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \dots, \quad H_4 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

To verify that (6.21) and (6.23) are equivalent, we first note that

\[ u(t) u(t)^T = \sum_{i=0}^{2n} H_i v_i(t), \]

i.e.,

\[ \begin{bmatrix} 1 \\ t \\ \vdots \\ t^n \end{bmatrix} \begin{bmatrix} 1 \\ t \\ \vdots \\ t^n \end{bmatrix}^T = \begin{bmatrix} 1 & t & \cdots & t^n \\ t & t^2 & \cdots & t^{n+1} \\ \vdots & \vdots & \ddots & \vdots \\ t^n & t^{n+1} & \cdots & t^{2n} \end{bmatrix}. \]

Assume next that $f(t) \geq 0$. Then from (6.22) we have

\[ f(t) = (q_1^T u(t))^2 + (q_2^T u(t))^2 = \left\langle q_1 q_1^T + q_2 q_2^T, \, u(t) u(t)^T \right\rangle = \sum_{i=0}^{2n} \left\langle q_1 q_1^T + q_2 q_2^T, \, H_i \right\rangle v_i(t), \]

i.e., we have $f(t) = x^T v(t)$ with $x_i = \langle X, H_i \rangle$, $X = (q_1 q_1^T + q_2 q_2^T) \succeq 0$. Conversely, assume that (6.23) holds. Then

\[ f(t) = \sum_{i=0}^{2n} \langle H_i, X \rangle v_i(t) = \left\langle X, \sum_{i=0}^{2n} H_i v_i(t) \right\rangle = \left\langle X, u(t) u(t)^T \right\rangle \geq 0 \]

since $X \succeq 0$. In summary, we can characterize the cone of nonnegative polynomials over the real axis as

\[ K_\infty^n = \left\{ x \in \mathbb{R}^{2n+1} \mid x_i = \langle X, H_i \rangle, \; i = 0, \dots, 2n, \; X \in \mathcal{S}_+^{n+1} \right\}. \tag{6.24} \]

Checking nonnegativity of a univariate polynomial thus corresponds to a semidefinite feasibility problem.
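As a sketch of such a feasibility problem, the following CVXPY code (package assumed) checks nonnegativity of $f(t) = 1 - 2t^2 + 2t^4$ on the real axis by searching for $X \succeq 0$ satisfying (6.23):

```python
import cvxpy as cp
import numpy as np

x = np.array([1.0, 0.0, -2.0, 0.0, 2.0])   # f(t) = 1 - 2t^2 + 2t^4, n = 2
n = 2
X = cp.Variable((n + 1, n + 1), symmetric=True)
constraints = [X >> 0]
for i in range(2 * n + 1):
    H = np.zeros((n + 1, n + 1))            # Hankel matrix: ones where k + l = i
    for k in range(n + 1):
        if 0 <= i - k <= n:
            H[k, i - k] = 1.0
    constraints.append(cp.trace(H @ X) == x[i])   # x_i = <X, H_i>
prob = cp.Problem(cp.Minimize(0), constraints)
prob.solve()
print(prob.status)    # 'optimal' -> f is nonnegative on R
```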

6.2.6.1 Nonnegativity on a finite interval

As an extension we consider a basis function of degree n,

\[ v(t) = (1, t, \dots, t^n). \]

A polynomial $f(t) := x^T v(t)$ is then nonnegative on a subinterval $I = [a, b] \subset \mathbb{R}$ if and only if $f(t)$ can be written as a sum of weighted squares,

\[ f(t) = w_1(t) (q_1^T u_1(t))^2 + w_2(t) (q_2^T u_2(t))^2, \]

where $w_i(t)$ are polynomials nonnegative on $[a, b]$. To describe the cone

\[ K_{a,b}^n = \left\{ x \in \mathbb{R}^{n+1} \mid f(t) = x^T v(t) \geq 0, \; \forall t \in [a, b] \right\} \]

we distinguish between polynomials of odd and even degree.

  • Even degree. Let $n = 2m$ and denote

    \[ u_1(t) = (1, t, \dots, t^m), \quad u_2(t) = (1, t, \dots, t^{m-1}). \]

    We choose $w_1(t) = 1$ and $w_2(t) = (t - a)(b - t)$ and note that $w_2(t) \geq 0$ on $[a, b]$. Then $f(t) \geq 0, \; t \in [a, b]$ if and only if

    \[ f(t) = (q_1^T u_1(t))^2 + w_2(t) (q_2^T u_2(t))^2 \]

    for some $q_1, q_2$, and an equivalent semidefinite characterization can be found as

    \[ K_{a,b}^n = \left\{ x \in \mathbb{R}^{n+1} \mid x_i = \langle X_1, H_i^{m+1} \rangle + \langle X_2, (a + b) H_{i-1}^m - a b H_i^m - H_{i-2}^m \rangle, \; i = 0, \dots, n, \; X_1 \in \mathcal{S}_+^{m+1}, \, X_2 \in \mathcal{S}_+^m \right\}. \tag{6.25} \]
  • Odd degree. Let $n = 2m + 1$ and denote $u(t) = (1, t, \dots, t^m)$. We choose $w_1(t) = (t - a)$ and $w_2(t) = (b - t)$. We then have that $f(t) = x^T v(t) \geq 0, \; t \in [a, b]$ if and only if

    \[ f(t) = (t - a)(q_1^T u(t))^2 + (b - t)(q_2^T u(t))^2 \]

    for some $q_1, q_2$, and an equivalent semidefinite characterization can be found as

    \[ K_{a,b}^n = \left\{ x \in \mathbb{R}^{n+1} \mid x_i = \langle X_1, H_{i-1}^{m+1} - a H_i^{m+1} \rangle + \langle X_2, b H_i^{m+1} - H_{i-1}^{m+1} \rangle, \; i = 0, \dots, n, \; X_1, X_2 \in \mathcal{S}_+^{m+1} \right\}. \tag{6.26} \]

6.2.7 Hermitian matrices

Semidefinite optimization can be extended to complex-valued matrices. To that end, let $\mathcal{H}^n$ denote the cone of Hermitian matrices of order $n$, i.e.,

\[ X \in \mathcal{H}^n \quad \Longleftrightarrow \quad X \in \mathbb{C}^{n \times n}, \; X^H = X, \tag{6.27} \]

where superscript '$H$' denotes Hermitian (or complex) transposition. Then $X \in \mathcal{H}_+^n$ if and only if

\[ z^H X z = (\Re z - i \Im z)^T (\Re X + i \Im X)(\Re z + i \Im z) = \begin{bmatrix} \Re z \\ \Im z \end{bmatrix}^T \begin{bmatrix} \Re X & -\Im X \\ \Im X & \Re X \end{bmatrix} \begin{bmatrix} \Re z \\ \Im z \end{bmatrix} \geq 0, \quad \forall z \in \mathbb{C}^n. \]

In other words,

\[ X \in \mathcal{H}_+^n \quad \Longleftrightarrow \quad \begin{bmatrix} \Re X & -\Im X \\ \Im X & \Re X \end{bmatrix} \in \mathcal{S}_+^{2n}. \tag{6.28} \]

Note that (6.27) implies skew-symmetry of $\Im X$, i.e., $\Im X = -(\Im X)^T$.
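A short numpy check of the embedding (6.28), with a random Hermitian positive semidefinite matrix (data assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
X = W @ W.conj().T                  # Hermitian positive semidefinite
R = np.block([[X.real, -X.imag],
              [X.imag,  X.real]])   # the real 2n x 2n embedding in (6.28)
print(np.all(np.linalg.eigvalsh(X) >= -1e-9),
      np.all(np.linalg.eigvalsh(R) >= -1e-9))   # True True
```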

6.2.8 Nonnegative trigonometric polynomials

As a complex-valued variation of the sum-of-squares representation we consider trigonometric polynomials; optimization over cones of nonnegative trigonometric polynomials has several important engineering applications. Consider a trigonometric polynomial evaluated on the complex unit-circle

\[ f(z) = x_0 + 2 \Re \left( \sum_{i=1}^n x_i z^{-i} \right), \quad |z| = 1, \tag{6.29} \]

parametrized by $x \in \mathbb{R} \times \mathbb{C}^n$. We are interested in characterizing the cone of trigonometric polynomials that are nonnegative on the angular interval $[0, \pi]$,

\[ K_{0,\pi}^n = \left\{ x \in \mathbb{R} \times \mathbb{C}^n \mid x_0 + 2 \Re \left( \sum_{i=1}^n x_i z^{-i} \right) \geq 0, \; \forall z = e^{jt}, \, t \in [0, \pi] \right\}. \]

Consider a complex-valued basis function

\[ v(z) = (1, z, \dots, z^n). \]

The Riesz-Fejér Theorem states that a trigonometric polynomial $f(z)$ in (6.29) is nonnegative (i.e., $x \in K_{0,\pi}^n$) if and only if for some $q \in \mathbb{C}^{n+1}$

\[ f(z) = | q^H v(z) |^2. \tag{6.30} \]

Analogously to Sec. 6.2.6 (Nonnegative polynomials) we have a semidefinite characterization of the sum-of-squares representation, i.e., $f(z) \geq 0$, $\forall z = e^{jt}$, $t \in [0, 2\pi]$ if and only if

\[ x_i = \langle X, T_i \rangle, \quad i = 0, \dots, n, \qquad X \in \mathcal{H}_+^{n+1}, \tag{6.31} \]

where $T_i^{n+1} \in \mathbb{R}^{(n+1) \times (n+1)}$ are Toeplitz matrices

\[ [T_i]_{kl} = \begin{cases} 1, & k - l = i, \\ 0, & \text{otherwise}, \end{cases} \qquad i = 0, \dots, n. \]

When there is no ambiguity, we drop the superscript on $T_i$. For example, for $n = 2$ we have

\[ T_0 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_1 = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad T_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}. \]

To prove correctness of the semidefinite characterization, we first note that

\[ v(z) v(z)^H = I + \sum_{i=1}^n \left( T_i v_i(z) \right) + \sum_{i=1}^n \left( T_i v_i(z) \right)^H, \]

i.e.,

\[ \begin{bmatrix} 1 \\ z \\ \vdots \\ z^n \end{bmatrix} \begin{bmatrix} 1 \\ z \\ \vdots \\ z^n \end{bmatrix}^H = \begin{bmatrix} 1 & z^{-1} & \cdots & z^{-n} \\ z & 1 & \cdots & z^{1-n} \\ \vdots & \vdots & \ddots & \vdots \\ z^n & z^{n-1} & \cdots & 1 \end{bmatrix}. \]

Next assume that (6.30) is satisfied. Then

\[ f(z) = \left\langle q q^H, v(z) v(z)^H \right\rangle = \left\langle q q^H, I \right\rangle + \left\langle q q^H, \sum_{i=1}^n T_i v_i(z) \right\rangle + \left\langle q q^H, \sum_{i=1}^n (T_i v_i(z))^H \right\rangle = x_0 + 2 \Re \left( \sum_{i=1}^n x_i z^{-i} \right) \]

with $x_i = \langle q q^H, T_i \rangle$. Conversely, assume that (6.31) holds. Then

\[ f(z) = \langle X, I \rangle + \sum_{i=1}^n \left\langle X, T_i v_i(z) \right\rangle + \sum_{i=1}^n \left\langle X, (T_i v_i(z))^H \right\rangle = \left\langle X, v(z) v(z)^H \right\rangle \geq 0. \]

In other words, we have shown that

\[ K_{0,\pi}^n = \left\{ x \in \mathbb{R} \times \mathbb{C}^n \mid x_i = \langle X, T_i \rangle, \; i = 0, \dots, n, \; X \in \mathcal{H}_+^{n+1} \right\}. \tag{6.32} \]

6.2.8.1 Nonnegativity on a subinterval

We next sketch a few useful extensions. An extension of the Riesz-Fejér Theorem states that a trigonometric polynomial $f(z)$ of degree $n$ is nonnegative on $I(a, b) = \{ z \mid z = e^{jt}, \, t \in [a, b] \subseteq [0, \pi] \}$ if and only if it can be written as a weighted sum of squared trigonometric polynomials

\[ f(z) = | f_1(z) |^2 + g(z) | f_2(z) |^2, \]

where $f_1, f_2, g$ are trigonometric polynomials of degree $n$, $n - d$ and $d$, respectively, and $g(z) \geq 0$ on $I(a, b)$. For example $g(z) = z + z^{-1} - 2 \cos \alpha$ is nonnegative on $I(0, \alpha)$, and it can be verified that $f(z) \geq 0, \; \forall z \in I(0, \alpha)$ if and only if

\[ x_i = \langle X_1, T_i^{n+1} \rangle + \langle X_2, T_{i+1}^n \rangle + \langle X_2, T_{i-1}^n \rangle - 2 \cos(\alpha) \langle X_2, T_i^n \rangle \]

for $X_1 \in \mathcal{H}_+^{n+1}$, $X_2 \in \mathcal{H}_+^n$, i.e.,

\[ K_{0,\alpha}^n = \left\{ x \in \mathbb{R} \times \mathbb{C}^n \mid x_i = \langle X_1, T_i^{n+1} \rangle + \langle X_2, T_{i+1}^n \rangle + \langle X_2, T_{i-1}^n \rangle - 2 \cos(\alpha) \langle X_2, T_i^n \rangle, \; X_1 \in \mathcal{H}_+^{n+1}, \, X_2 \in \mathcal{H}_+^n \right\}. \tag{6.33} \]

Similarly $f(z) \geq 0, \; \forall z \in I(\alpha, \pi)$ if and only if

\[ x_i = \langle X_1, T_i^{n+1} \rangle - \langle X_2, T_{i+1}^n \rangle - \langle X_2, T_{i-1}^n \rangle + 2 \cos(\alpha) \langle X_2, T_i^n \rangle, \]

i.e.,

\[ K_{\alpha,\pi}^n = \left\{ x \in \mathbb{R} \times \mathbb{C}^n \mid x_i = \langle X_1, T_i^{n+1} \rangle - \langle X_2, T_{i+1}^n \rangle - \langle X_2, T_{i-1}^n \rangle + 2 \cos(\alpha) \langle X_2, T_i^n \rangle, \; X_1 \in \mathcal{H}_+^{n+1}, \, X_2 \in \mathcal{H}_+^n \right\}, \tag{6.34} \]

corresponding to the weight $g(z) = 2 \cos \alpha - z - z^{-1}$, which is nonnegative on $I(\alpha, \pi)$.

6.3 Semidefinite optimization case studies

6.3.1 Nearest correlation matrix

We consider the set

\[ S = \left\{ X \in \mathcal{S}_+^n \mid X_{ii} = 1, \; i = 1, \dots, n \right\} \]

(shown in Fig. 6.1 for $n = 3$). For $A \in \mathcal{S}^n$ the nearest correlation matrix is

\[ X^\star = \arg \min_{X \in S} \| A - X \|_F, \]

i.e., the projection of $A$ onto the set $S$. To pose this as a conic optimization problem we define the linear operator

\[ \mathrm{svec}(U) = (U_{11}, \sqrt{2} U_{21}, \dots, \sqrt{2} U_{n1}, U_{22}, \sqrt{2} U_{32}, \dots, \sqrt{2} U_{n2}, \dots, U_{nn}), \]

which extracts and scales the lower-triangular part of $U$. We then get a conic formulation of the nearest correlation problem exploiting symmetry of $A - X$,

\[ \begin{array}{ll} \text{minimize} & t \\ \text{subject to} & \| z \|_2 \leq t, \\ & \mathrm{svec}(A - X) = z, \\ & \mathrm{diag}(X) = e, \\ & X \succeq 0. \end{array} \tag{6.35} \]

This is an example of a problem with both conic quadratic and semidefinite constraints in primal form. We can add different constraints to the problem, for example a bound $\gamma$ on the smallest eigenvalue by replacing $X \succeq 0$ with $X \succeq \gamma I$.
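A compact CVXPY sketch of (6.35) (package assumed); the Frobenius-norm objective lets the tool handle the svec/quadratic-cone bookkeeping internally:

```python
import cvxpy as cp
import numpy as np

A = np.array([[1.0, 0.9, 0.6],
              [0.9, 1.0, 0.9],
              [0.6, 0.9, 1.0]])        # example data; this A is not PSD
X = cp.Variable((3, 3), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.norm(A - X, 'fro')),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()
print(np.round(X.value, 3))            # the nearest correlation matrix
```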

6.3.2 Extremal ellipsoids

Given a polytope we can find the largest ellipsoid contained in the polytope, or the smallest ellipsoid containing the polytope (for certain representations of the polytope).

6.3.2.1 Maximal inscribed ellipsoid

Consider a polytope

\[ S = \left\{ x \in \mathbb{R}^n \mid a_i^T x \leq b_i, \; i = 1, \dots, m \right\}. \]

The ellipsoid

\[ \mathcal{E} := \left\{ x \mid x = C u + d, \; \| u \|_2 \leq 1 \right\} \]

is contained in $S$ if and only if

\[ \max_{\| u \|_2 \leq 1} a_i^T (C u + d) \leq b_i, \quad i = 1, \dots, m, \]

or equivalently, if and only if

\[ \| C a_i \|_2 + a_i^T d \leq b_i, \quad i = 1, \dots, m. \]

Since $\mathrm{Vol}(\mathcal{E}) \sim \det(C)^{1/n}$, the maximum-volume inscribed ellipsoid is the solution to

\[ \begin{array}{ll} \text{maximize} & \det(C) \\ \text{subject to} & \| C a_i \|_2 + a_i^T d \leq b_i, \; i = 1, \dots, m, \\ & C \succeq 0. \end{array} \]

In Sec. 6.2.3 (Log-determinant) we show how to maximize the determinant of a positive definite matrix.
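A CVXPY sketch for the box $[-1, 1]^2$ (package assumed), using $\log\det(C)$ as the concave determinant model from Sec. 6.2.3; the optimal inscribed ellipsoid is the unit disc, $C \approx I$, $d \approx 0$. The minimal enclosing ellipsoid of the next subsection can be modeled analogously:

```python
import cvxpy as cp
import numpy as np

a = np.array([[1., 0.], [-1., 0.], [0., 1.], [0., -1.]])   # a_i^T x <= b_i
b = np.ones(4)
C = cp.Variable((2, 2), symmetric=True)
d = cp.Variable(2)
cons = [cp.norm(C @ a[i]) + a[i] @ d <= b[i] for i in range(4)]
prob = cp.Problem(cp.Maximize(cp.log_det(C)), cons)
prob.solve()
print(np.round(C.value, 3), np.round(d.value, 3))
```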

6.3.2.2 Minimal enclosing ellipsoid

Next consider a polytope given as the convex hull of a set of points,

\[ S = \mathrm{conv} \{ x_1, x_2, \dots, x_m \}, \quad x_i \in \mathbb{R}^n. \]

The ellipsoid

\[ \mathcal{E} := \{ x \mid \| P x - c \|_2 \leq 1 \} \]

has $\mathrm{Vol}(\mathcal{E}) \sim \det(P)^{-1/n}$, so the minimum-volume enclosing ellipsoid is the solution to

\[ \begin{array}{ll} \text{maximize} & \det(P) \\ \text{subject to} & \| P x_i - c \|_2 \leq 1, \; i = 1, \dots, m, \\ & P \succeq 0. \end{array} \]
Fig. 6.2 Example of inner and outer ellipsoidal approximations of a pentagon in $\mathbb{R}^2$.

6.3.3 Polynomial curve-fitting

Consider a univariate polynomial of degree $n$,

\[ f(t) = x_0 + x_1 t + x_2 t^2 + \cdots + x_n t^n. \]

Often we wish to fit such a polynomial to a given set of measurements or control points

\[ \{ (t_1, y_1), (t_2, y_2), \dots, (t_m, y_m) \}, \]

i.e., we wish to determine coefficients $x_i$, $i = 0, \dots, n$ such that

\[ f(t_j) \approx y_j, \quad j = 1, \dots, m. \]

To that end, define the Vandermonde matrix

\[ A = \begin{bmatrix} 1 & t_1 & t_1^2 & \cdots & t_1^n \\ 1 & t_2 & t_2^2 & \cdots & t_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_m & t_m^2 & \cdots & t_m^n \end{bmatrix}. \]

We can then express the desired curve-fit compactly as

\[ A x \approx y, \]

i.e., as a linear expression in the coefficients $x$. When the number of coefficients equals the number of measurements, $n + 1 = m$, the matrix $A$ is square and non-singular (provided the points $t_j$ are distinct), so we can solve

\[ A x = y \]

to find a polynomial that passes through all the control points $(t_i, y_i)$. Similarly, if $n + 1 > m$ there are infinitely many solutions satisfying the underdetermined system $A x = y$. A typical choice in that case is the least-norm solution

\[ x_{\mathrm{ln}} = \arg \min_{A x = y} \| x \|_2, \]

which (assuming again that the points $t_j$ are distinct) equals

\[ x_{\mathrm{ln}} = A^T (A A^T)^{-1} y. \]

On the other hand, if $n + 1 < m$ we generally cannot find a solution to the overdetermined system $A x = y$, and we typically resort to a least-squares solution

\[ x_{\mathrm{ls}} = \arg \min \| A x - y \|_2, \]

which is given by the formula (see Sec. 8.5.1 (Linear regression and the normal equation))

\[ x_{\mathrm{ls}} = (A^T A)^{-1} A^T y. \]
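All three regimes are one-liners with a Vandermonde matrix in numpy (assumed), here for the control points used in the example below:

```python
import numpy as np

t = np.array([-1.0, 0.0, 1.0])
y = np.array([ 1.0, 0.0, 1.0])

A = np.vander(t, 3, increasing=True)          # square: n + 1 = m = 3
x_interp = np.linalg.solve(A, y)              # interpolant: f(t) = t^2

A = np.vander(t, 5, increasing=True)          # underdetermined: n + 1 = 5 > m
x_ln = A.T @ np.linalg.solve(A @ A.T, y)      # least-norm solution

A = np.vander(t, 2, increasing=True)          # overdetermined: n + 1 = 2 < m
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution
print(x_interp, x_ln, x_ls)
```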

In the following we discuss how the semidefinite characterizations of nonnegative polynomials (see Sec. 6.2.6 (Nonnegative polynomials)) lead to more advanced and useful polynomial curve-fitting constraints.

  • Nonnegativity. One possible constraint is nonnegativity on an interval,

    \[ f(t) := x_0 + x_1 t + \cdots + x_n t^n \geq 0, \quad \forall t \in [a, b], \]

    with a semidefinite characterization embedded in $x \in K_{a,b}^n$, see (6.25).

  • Monotonicity. We can ensure monotonicity of $f(t)$ by requiring that $f'(t) \geq 0$ (or $f'(t) \leq 0$), i.e.,

    \[ (x_1, 2 x_2, \dots, n x_n) \in K_{a,b}^{n-1}, \]

    or

    \[ -(x_1, 2 x_2, \dots, n x_n) \in K_{a,b}^{n-1}, \]

    respectively.

  • Convexity or concavity. Convexity (or concavity) of $f(t)$ corresponds to $f''(t) \geq 0$ (or $f''(t) \leq 0$), i.e.,

    \[ (2 x_2, 6 x_3, \dots, (n-1) n x_n) \in K_{a,b}^{n-2}, \]

    or

    \[ -(2 x_2, 6 x_3, \dots, (n-1) n x_n) \in K_{a,b}^{n-2}, \]

    respectively.

As an example, we consider fitting a smooth polynomial

\[ f_n(t) = x_0 + x_1 t + \cdots + x_n t^n \]

to the points $\{ (-1, 1), (0, 0), (1, 1) \}$, where smoothness is implied by bounding $| f_n'(t) |$. More specifically, we wish to solve the problem

\[ \begin{array}{lll} \text{minimize} & z & \\ \text{subject to} & | f_n'(t) | \leq z, & \forall t \in [-1, 1], \\ & f_n(-1) = 1, \; f_n(0) = 0, \; f_n(1) = 1, \end{array} \]

or equivalently

\[ \begin{array}{lll} \text{minimize} & z & \\ \text{subject to} & z - f_n'(t) \geq 0, & \forall t \in [-1, 1], \\ & f_n'(t) + z \geq 0, & \forall t \in [-1, 1], \\ & f_n(-1) = 1, \; f_n(0) = 0, \; f_n(1) = 1. \end{array} \]

Finally, we use the characterizations $K_{a,b}^n$ to get a conic problem

\[ \begin{array}{ll} \text{minimize} & z \\ \text{subject to} & (z - x_1, -2 x_2, \dots, -n x_n) \in K_{-1,1}^{n-1}, \\ & (z + x_1, 2 x_2, \dots, n x_n) \in K_{-1,1}^{n-1}, \\ & f_n(-1) = 1, \; f_n(0) = 0, \; f_n(1) = 1. \end{array} \]
Fig. 6.3 Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through $\{ (-1, 1), (0, 0), (1, 1) \}$. The higher-degree polynomials are increasingly smoother on $[-1, 1]$.

In Fig. 6.3 we show the graphs for the resulting polynomials of degree 2, 4 and 8, respectively. The second-degree polynomial is uniquely determined by the three constraints $f_2(-1) = 1$, $f_2(0) = 0$, $f_2(1) = 1$, i.e., $f_2(t) = t^2$. Also, we obviously have a lower bound on the largest derivative, $\max_{t \in [-1,1]} | f_n'(t) | \geq 1$. The computed fourth-degree polynomial is given by

\[ f_4(t) = \tfrac{3}{2} t^2 - \tfrac{1}{2} t^4 \]

after rounding coefficients to rational numbers. Furthermore, the largest derivative is given by

\[ f_4'(1/\sqrt{2}) = \sqrt{2}, \]

and $f_4''(t) < 0$ on $(1/\sqrt{2}, 1]$, so, although not visibly clear, the polynomial is nonconvex on $[-1, 1]$. In Fig. 6.4 we show the graphs of the corresponding polynomials where we added a convexity constraint $f_n''(t) \geq 0$, i.e.,

\[ (2 x_2, 6 x_3, \dots, (n-1) n x_n) \in K_{-1,1}^{n-2}. \]

In this case, we get

\[ f_4(t) = \tfrac{6}{5} t^2 - \tfrac{1}{5} t^4 \]

and the largest derivative increases to $\tfrac{8}{5}$.

Fig. 6.4 Graph of univariate polynomials of degree 2, 4, and 8, respectively, passing through $\{ (-1, 1), (0, 0), (1, 1) \}$. The polynomials all have nonnegative second derivative (i.e., they are convex) on $[-1, 1]$ and the higher-degree polynomials are increasingly smoother on that interval.

6.3.4 Filter design problems

Filter design is an important application of optimization over trigonometric polynomials in signal processing. We consider a trigonometric polynomial

\[ H(\omega) = x_0 + 2 \Re \left( \sum_{k=1}^n x_k e^{-j \omega k} \right) = a_0 + 2 \sum_{k=1}^n \left( a_k \cos(\omega k) + b_k \sin(\omega k) \right), \]

where $a_k := \Re(x_k)$, $b_k := \Im(x_k)$. If the function $H(\omega)$ is nonnegative we call it a transfer function, and it describes how different harmonic components of a periodic discrete signal are attenuated when a filter with transfer function $H(\omega)$ is applied to the signal.

We often wish to design a transfer function where $H(\omega) \approx 1$ for $0 \leq \omega \leq \omega_p$ and $H(\omega) \approx 0$ for $\omega_s \leq \omega \leq \pi$, for given constants $\omega_p, \omega_s$. One possible formulation for achieving this is

\[ \begin{array}{lll} \text{minimize} & t & \\ \text{subject to} & 0 \leq H(\omega), & \omega \in [0, \pi], \\ & 1 - \delta \leq H(\omega) \leq 1 + \delta, & \omega \in [0, \omega_p], \\ & H(\omega) \leq t, & \omega \in [\omega_s, \pi], \end{array} \]

which corresponds to minimizing $H(\omega)$ on the interval $[\omega_s, \pi]$ while allowing $H(\omega)$ to depart from unity by a small amount $\delta$ on the interval $[0, \omega_p]$. Using the results from Sec. 6.2.8 (Nonnegative trigonometric polynomials) (in particular (6.32), (6.33) and (6.34)), we can pose this as a conic optimization problem

\[ \begin{array}{ll} \text{minimize} & t \\ \text{subject to} & x \in K_{0,\pi}^n, \\ & (x_0 - (1 - \delta), x_{1:n}) \in K_{0,\omega_p}^n, \\ & -(x_0 - (1 + \delta), x_{1:n}) \in K_{0,\omega_p}^n, \\ & -(x_0 - t, x_{1:n}) \in K_{\omega_s,\pi}^n, \end{array} \tag{6.36} \]

which is a semidefinite optimization problem. In Fig. 6.5 we show $H(\omega)$ obtained by solving (6.36) for $n = 10$, $\delta = 0.05$, $\omega_p = \pi/4$ and $\omega_s = \omega_p + \pi/8$.


Fig. 6.5 Plot of lowpass filter transfer function.

6.3.5 Relaxations of binary optimization

Semidefinite optimization is also useful for computing bounds on difficult non-convex or combinatorial optimization problems. For example consider the binary linear optimization problem

\[ \begin{array}{ll} \text{minimize} & c^T x \\ \text{subject to} & A x = b, \\ & x \in \{0, 1\}^n. \end{array} \tag{6.37} \]

In general, problem (6.37) is a very difficult non-convex problem where we may have to explore all $2^n$ different binary vectors. Alternatively, we can use semidefinite optimization to get a lower bound on the optimal solution with polynomial complexity. We first note that

\[ x_i \in \{0, 1\} \quad \Longleftrightarrow \quad x_i^2 = x_i, \]

which is, in fact, equivalent to a rank constraint on a semidefinite variable,

\[ X = x x^T, \quad \mathrm{diag}(X) = x. \]

By relaxing the rank-1 constraint on $X$ we get a semidefinite relaxation of (6.37),

\[ \begin{array}{ll} \text{minimize} & c^T x \\ \text{subject to} & A x = b, \\ & \mathrm{diag}(X) = x, \\ & X \succeq x x^T, \end{array} \tag{6.38} \]

where we note that

\[ X \succeq x x^T \quad \Longleftrightarrow \quad \begin{bmatrix} 1 & x^T \\ x & X \end{bmatrix} \succeq 0. \]

Since (6.38) is a semidefinite optimization problem, it can be solved very efficiently. Suppose $x^\star$ is an optimal solution for (6.37); then $(x^\star, x^\star (x^\star)^T)$ is also feasible for (6.38), but the feasible set for (6.38) is larger than the feasible set for (6.37), so in general the optimal value of (6.38) serves as a lower bound. However, if the optimal solution $X^\star$ of (6.38) has rank 1 we have found a solution to (6.37) also. The semidefinite relaxation can also be used in a branch-and-bound mixed-integer exact algorithm for (6.37).

We can tighten (or improve) the relaxation (6.38) by adding other constraints that cut away parts of the feasible set, without excluding rank-1 solutions. By tightening the relaxation, we reduce the gap between the optimal values of the original problem and the relaxation. For example, we can add the constraints

\[ 0 \leq X_{ij} \leq 1, \quad i, j = 1, \dots, n, \qquad X_{ii} \geq X_{ij}, \quad i, j = 1, \dots, n, \]

and so on. This will usually have a dramatic impact on solution times and memory requirements. Already constraining a semidefinite matrix to be doubly nonnegative ($X_{ij} \geq 0$) introduces an additional $n^2$ linear inequality constraints.
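A CVXPY sketch of the relaxation (6.38) for a tiny instance (data chosen arbitrarily): minimize $c^T x$ subject to $\sum_i x_i = 2$, $x \in \{0, 1\}^3$. Here the relaxation bound, 3, is attained by the binary solution $x = (1, 1, 0)$:

```python
import cvxpy as cp
import numpy as np

c = np.array([1.0, 2.0, 3.0])
Z = cp.Variable((4, 4), symmetric=True)   # Z = [[1, x^T], [x, X]]
x, X = Z[0, 1:], Z[1:, 1:]
prob = cp.Problem(cp.Minimize(c @ x),
                  [Z[0, 0] == 1, cp.sum(x) == 2,
                   cp.diag(X) == x, Z >> 0])
prob.solve()
print(prob.value)   # lower bound on the binary optimum (here tight: 3)
```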

6.3.6 Relaxations of boolean optimization

Similarly to Sec. 6.3.5 (Relaxations of binary optimization) we can use semidefinite relaxations of boolean constraints $x \in \{-1, +1\}^n$. To that end, we note that

\[ x \in \{-1, +1\}^n \quad \Longleftrightarrow \quad X = x x^T, \; \mathrm{diag}(X) = e, \tag{6.39} \]

with a semidefinite relaxation $X \succeq x x^T$ of the rank-1 constraint.

As a (standard) example of a combinatorial problem with boolean constraints, we consider an undirected graph $G = (V, E)$ described by a set of vertices $V = \{v_1, \dots, v_n\}$ and a set of edges $E = \{ (v_i, v_j) \mid v_i, v_j \in V, \; i \neq j \}$, and we wish to find the cut of maximum capacity. A cut $C$ partitions the nodes $V$ into two disjoint sets $S$ and $T = V \setminus S$, and the cut-set $I$ is the set of edges with one node in $S$ and another in $T$, i.e.,

\[ I = \{ (u, v) \in E \mid u \in S, \; v \in T \}. \]

The capacity of a cut is then defined as the number of edges in the cut-set, |I|.

Fig. 6.6 Undirected graph. The cut $\{ v_2, v_4, v_5 \}$ has capacity 9 (thick edges).

To maximize the capacity of a cut we define the symmetric adjacency matrix $A \in \mathcal{S}^n$,

\[ [A]_{ij} = \begin{cases} 1, & (v_i, v_j) \in E, \\ 0, & \text{otherwise}, \end{cases} \]

where $n = |V|$, and let

\[ x_i = \begin{cases} +1, & v_i \in S, \\ -1, & v_i \notin S. \end{cases} \]

Suppose $v_i \in S$. Then $1 - x_i x_j = 0$ if $v_j \in S$ and $1 - x_i x_j = 2$ if $v_j \notin S$, so we get an expression for the capacity as

\[ c(x) = \frac{1}{4} \sum_{ij} (1 - x_i x_j) A_{ij} = \frac{1}{4} e^T A e - \frac{1}{4} x^T A x, \]

and discarding the constant term $e^T A e$ gives us the MAX-CUT problem

\[ \begin{array}{ll} \text{minimize} & x^T A x \\ \text{subject to} & x \in \{-1, +1\}^n, \end{array} \tag{6.40} \]

with a semidefinite relaxation

\[ \begin{array}{ll} \text{minimize} & \langle A, X \rangle \\ \text{subject to} & \mathrm{diag}(X) = e, \\ & X \succeq 0. \end{array} \tag{6.41} \]
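A CVXPY sketch of (6.41) together with the classic Goemans-Williamson random-hyperplane rounding, which recovers a cut from the relaxation (package and example graph assumed):

```python
import cvxpy as cp
import numpy as np

A = np.array([[0., 1, 1, 0],
              [1., 0, 1, 1],
              [1., 1, 0, 1],
              [0., 1, 1, 0]])               # adjacency matrix of a small graph
n = A.shape[0]
X = cp.Variable((n, n), symmetric=True)
prob = cp.Problem(cp.Minimize(cp.trace(A @ X)),
                  [cp.diag(X) == 1, X >> 0])
prob.solve()

w, Q = np.linalg.eigh(X.value)              # factor X = V^T V
V = np.diag(np.sqrt(np.maximum(w, 0))) @ Q.T
x = np.sign(V.T @ np.random.default_rng(3).standard_normal(n))
print(0.25 * (A.sum() - x @ A @ x))         # capacity of the rounded cut
```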