13 Appendix

13.1 Conic optimization refresher

In this section we give a short summary about conic optimization. Read more about the topic from a mathematical and modeling point of view in our other publication, the MOSEK Modeling Cookbook [MOSEKApS24].

Conic optimization is a class of convex optimization problems, which contains and generalizes in an unified way many specific and well known convex models. These models include linear optimization (LO), quadratic optimization (QO), quadratically constrained quadratic optimization (QCQO), second-order cone optimization (SOCO), and semidefinite optimization (SDO). The general form of a conic optimization problem is

(13.1)maximizecTxsubject toAx+bK.

where K is a product of the following basic types of cones:

  • Linear cone:

    Rn, R+n, {0}.

    Linear cones model all traditionally LO problems.

  • Quadratic cone and rotated quadratic cone:

    The quadratic cone is the set

    Qn={xRn|x1x22++xn2}.

    The rotated quadratic cone is the set

    Qrn={xRn|2x1x2x32++xn2, x1,x20}.

    Modeling with these two cones cover the class of SOCO problems, which include all traditionally QO and QCQO problems as well. See in Sec. 13.1.2 (Traditional quadratic models) for more details.

  • Primal power cone:

    Pnα,1α={xRn|x1αx21αx32++xn2, x1,x20},

    or its dual cone.

  • Primal exponential cone:

    Kexp={xR3|x1x2exp(x3x2), x1,x20},

    or its dual cone.

  • Semidefinite cone:

    S+n={XRn×n|X is symmetric positive semidefinite}.

    Semidefinite cones model all traditionally SDO problems.

Each of these cones allow formulating different types of convex constraints.

13.1.1 Selection of conic constraints

In the following we will list the constraints appearing in financial context in this book and show how can we convert them to conic form.

13.1.1.1 Maximum function

We can model the maximum constraint max(x1,x2,,xn)c using linear constraints by introducing an auxiliary variable t:

(13.2)tc,tx1,txn.

For instance, we could write a series of constraints max(xi,0)ci, i=1,,n as

(13.3)tc,tx,t0,

where t is an n-dimensional vector.

13.1.1.2 Positive and negative part

A special case of modeling the maximum function is to model the positive part x+ and negative part x of a variable x. We define these as x+=max(x,0) and x=max(x,0).

We can model them explicitly based on Sec. 13.1.1.1 (Maximum function) by relaxing the definitions to inequalities x+max(x,0) and xmax(x,0), or we can also model them implicitly by the following set of constraints:

(13.4)x=x+x,x+, x0.

Note however, that in either case a freedom remains in the magnitude of x+ and x. This is because in the explicit case we relaxed the equalities, and in the impicit case only the difference of the variables is constrained. In other words it will be possible for x+ and x to be both positive, allowing optimal solutions where x+=max(x,0) and x=max(x,0) does not hold. We could ensure that only either x+ or x is positive and never both by stating the complementarity constraint x+x=0 (or in the vector case x+,x=0), but unfortunately such an equality constraint is non-convex and cannot be modeled.

We can do workarounds to ensure that equalities x+=max(x,0) and x=max(x,0) hold in the optimal solution. One approach is to penalize the magnitude of these two variables, so that if both are positive in any solution, then the solver could always improve the objective by reducing them until either one becomes zero. Another possible workaround is to formulate a mixed integer problem; see Sec. 13.2.1.4 (Positive and negative part).

13.1.1.3 Absolute value

We can model the absolute value constraint |x|c using the maximum function by observing that |x|=max(x,x) (see Sec. 13.1.1.1 (Maximum function)):

(13.5)cxc.

Another possibility is to model it using the quadratic cone:

(13.6)(c,x)Q2.

13.1.1.4 Sum of largest elements

The sum of the m largest elements of a vector x is the optimal solution of the LO problem

(13.7)maximizexTzsubject to1Tz=m,0z1.

Here x cannot be a variable, because that would result in a nonlinear objective. Let us take the dual of problem (13.7):

(13.8)minimizemt+1Tusubject tou+tx,u0.

Problem (13.8) is actually the same as mint mt+imax(0,xit). In this case x can be a variable, and thus it can also be optimized.

13.1.1.5 Linear combination of largest elements

We can slightly extend the problem in Sec. 13.1.1.4 (Sum of largest elements) such that z can have an upper bound c0, and there is a real number 0bcsum instead of the integer m, where csum=ici:

(13.9)maximizexTzsubject to1Tz=csumb,0zc.

This has the optimal objective cibfracxib+i>ibcixi, where ib is such that i=1ib1ci<bi=1ibci, and cibfrac=i=1ibcib<cib.

If we take the dual of problem (13.9), we get:

(13.10)minimize(csumb)t+cTusubject tou+tx,u0,

which is the same as mint (csumb)t+icimax(0,xit).

13.1.1.6 Manhattan norm (1-norm)

Let xRn and observe that x1=|x1|++|xn|. Then we can model the Manhattan norm or 1-norm constraint x1c by modeling the absolute value for each coordinate:

(13.11)zxz,i=1nzi=c,

where z is an auxiliary variable.

13.1.1.7 Euclidean norm (2-norm)

Let xRn and observe that x2=x12++xn2. Then we can model the Euclidean norm or 2-norm constraint x2c using the quadratic cone:

(13.12)(c,x)Qn+1.

13.1.1.8 Hyperbolic constraint

Let xRn and observe that x22=xTx=x12++xn2. Then we can model the hyperbolic constraint xTxyz, where y,z0 using the rotated quadratic cone:

(13.13)(y,z,x)Qrn+2.

13.1.1.9 Squared Euclidean norm

We can model the squared Euclidean norm or sum-of-squares constraint x22c using the hyperbolic constraint in Sec. 13.1.1.1 (Maximum function) and the rotated quadratic cone:

(13.14)(c,12,x)Qrn+2.

13.1.1.10 Quadratic form

Let xRn and let QS+n, i. e., a symmetric positive semidefinite matrix. Then we can model the quadratic form constraint 12xTQxc either using the quadratic cone or the rotated quadratic cone. To see this, observe that there exists a matrix GRn×k such that Q=GGT. Of course this decomposition can be done in many ways, so the matrix G is not unique. The most interesting cases are when kn or G is very sparse, because these make the optimization problem much easier to solve numerically (see in Sec. 13.1.2 (Traditional quadratic models)).

Most common ways to compute the matrix G are the following:

  • Cholesky decomposition: Q=CCT, where C is a lower triangular matrix with nonnegative entries in the diagonal. From this decomposition we have G=CRn×n.

  • Eigenvalue decomposition: Q=VDVT, where the diagonal matrix D contains the (nonnegative) eigenvalues of Q and the unitary matrix V contains the corresponding eigenvectors in its columns. From this decomposition we have G=VD1/2Rn×n.

  • Matrix square root: Q=Q1/2Q1/2, where Q1/2 is the symmetric positive semidefinite “square root” matrix of Q. From this decomposition we have G=Q1/2Rn×n.

  • Factor model: If Q is a covariance matrix of some data, then we can approximate the data series with the combination of kn common factors. Then we have the decomposition Q=βQFβT+D, where QFRk×k is the covariance of the factors, βRn×k is the exposure of the data series to each factor, and D is diagonal. From this, by computing the Cholesky decomposition QF=FFT we have G=[βF,D1/2]Rn×(n+k). The advantage of factor models is that G is very sparse and the factors have a direct financial interpretation (see Sec. 5 (Factor models) for details).

After obtaining the matrix G, we can write the quadratic form constraint as a sum-of-squares 12xTGGTxc, which is a squared Euclidean norm constraint 12GTx22c.

We can choose to model this using the rotated quadratic cone as

(13.15)(c,1,GTx)Qrk+2,

or we can choose to model its square root using the quadratic cone as

(13.16)(c,GTx)Qk+1.

Whether to use the quadratic cone or the rotated quadratic cone for modeling can be decided based on which is more natural. Typically the quadratic cone is used to model 2-norm constraints, while the rotated quadratic cone is more natural for modeling of quadratic functions. There can be exceptions, however; see for example in Sec. 2.3 (Conic formulation).

13.1.1.11 Power

Let xR and α>1. Then we can model the power constraint c|x|α or equivalently c1/α|x| using the power cone:

(13.17)(c,1,x)P31/α,(α1)/α.

13.1.1.12 Exponential

Let xR. Then we can model the exponential constraint tex using the exponential cone:

(13.18)(t,1,x)Kexp.

13.1.1.13 Log-sum-exp

Let x1,,xnR. Then we can model the log-sum-exp constraint tlog(i=1nexi) by applying rule Sec. 13.1.1.12 (Exponential) n times:

(13.19)i=1nui1,(ui,1,xit)Kexp, i=1,,n

13.1.1.14 Perspective of function

The perspective of a function f(x) is defined as sf(x/s) on s>0. From any conic representation of tf(x) we can reach a representation of tsf(x/s) by substituting all constants c with their homogenized counterpart sc.

13.1.1.15 Perspective of log-sum-exp

Let x1,,xnR. Then we can model the perspective of the log-sum-exp constraint tslog(i=1nexi/s) by applying rule Sec. 13.1.1.14 (Perspective of function) on constraint (13.19):

(13.20)i=1nuis,(ui,s,xit)Kexp, i=1,,n

13.1.2 Traditional quadratic models

Optimization problems involving quadratic functions often appear in practice. In this section we show how to convert the traditionally QO or QCQO problems into conic optimization form, because most of the time the solution of these conic models is computationally more efficient.

13.1.2.1 Quadratic optimization

The standard form of a quadratic optimization (QO) problem is the following:

(13.21)minimize12xTQx+cTxsubject toAx=b,x0.

The matrix QRn×n must be symmetric positive semidefinite, otherwise the objective function would not be convex.

Assuming the factorization Q=GGT with GRn×k, we can reformulate the problem (13.21) as a conic problem by applying the method described in Sec. 13.1.1.10 (Quadratic form):

(13.22)minimizet+cTxsubject toAx=b,x0,(t,1,GTx)Qrk+2.

13.1.2.2 Quadratically constrained quadratic optimization

Consider the quadratically constrained quadratic optimization (QCQO) problem of the form

(13.23)minimize12xTQ0x+c0Tx+a0subject to12xTQix+ciTx+ai0,i=1,,m.

The matrices QiRn×n, i=0,,m must all be symmetric positive semidefinite, otherwise the optimization problem would not be convex.

Assuming the factorization Qi=GiGiT with GiRn×ki, we can reformulate the problem (13.23) as a conic problem by applying the method described in Sec. 13.1.1.10 (Quadratic form) for both the objective and the constraints:

(13.24)minimizet0+c0Tx+a0subject toti+ciTx+ai0,i=1,,m,(ti,1,GiTx)Qrki+2,i=0,,m.

13.1.2.3 Practical benefits of conic models

The key step in the model conversion is to transform quadratic terms xTQx using the factorization Q=GGT, where GRn×k. Assuming kn, it results in the following benefits:

  • The storage requirement nk of G can be much lower than the storage requirement n2/2 of Q.

  • The amount of work to evaluate xTQx is proportional to n2 whereas the work to evaluate GTx22 is proportional to nk only.

  • No need to numerically validate positive semidefiniteness of the matrix Q. This is otherwise difficult owing to the presence of rounding errors that can make Q indefinite.

  • Duality theory is much simpler for conic quadratic optimization.

In summary, the conic equivalents are not only as easy to solve as the original QP or QCQP problems, but in most cases also need less space and solution time.

13.2 Mixed-integer models

Mixed integer optimization (MIO) is an extension of convex optimization, which introduces variables taking values only in the set of integers or in some subset of integers. This allows for solving a much wider range of optimization problems in addition to the ones convex optimization already covers. However, these problems are no longer convex. Handling of integer variables make the optimization NP-hard and needs specific algorithms, thus the solution of MIO problems is much slower. In many practical cases we cannot expect to find the optimal solution in reasonable time, and only near-optimal solutions will be available.

A general mixed integer optimization problem has the following form:

(13.25)maximizecTxsubject toAx+bK,xiZ,iI.

where K is a cone and I{1,,n} contains the indices of integer variables. We can model any finite range for the integer variable xi by simply adding the extra constraint aixibi.

13.2.1 Selection of integer constraints

In the following we will list the integer constraints appearing in financial context in this book and show how to model them.

13.2.1.1 Switch

In some practical cases we might wish to impose conditions on parts of our optimization model. For example, allowing nonzero value for a variable only in presence of a condition. We can model such situations using binary variables (or indicator variables), which can only take 0 or 1 values. The following set of constraints only allow x to be positive when the indicator variable y is equal to 1:

(13.26)xMy,y{0,1}.

The number M here is not related to the optimization problem, but it is necessary to form such a switchable constraint. If y=0, then the upper limit of x is 0. If y=1, then the upper limit of x is M. This modeling technique is called big-M. The choice of M can affect the solution performance, but a nice feature of it is that the problem cannot accidentally become infeasible.

If we have a vector variable x then the switch will look like:

(13.27)xMy,y{0,1}n,

where denotes the elementwise product. We accounted for a possibly different big-M value for each coordinate.

13.2.1.2 Semi-continuous variable

A slight extension of Sec. 13.2.1.1 (Switch) is to model semi-continuous variables, i. e. when x{0}[a,b], where 0<ab. We can model this by

(13.28)ayxby,y{0,1}.

13.2.1.3 Cardinality

We might need to limit the number of nonzeros in a vector x to some number K<n. We can do this with the help of an indicator vector y of length n, which indicates |x|0 (see Sec. 13.2.1.1 (Switch)). First we add a big-M bound My to the absolute value, and model it based on Sec. 13.1.1.3 (Absolute value). Then we limit the cardinality of x by limiting the cardinality of y:

(13.29)xMy,xMy,1TyK,y{0,1}n.

13.2.1.4 Positive and negative part

We introduced the positive part x+ and negative part x of a variable x in Sec. 13.1.1.2 (Positive and negative part). We noted that we need a way to ensure only x+ or x will be positive in the optimal solution and not both. One such way is to use binary variables:

(13.30)x=x+x,x+My,xM(1y),x+, x0,y{0,1}.

Here the binary variable y allows the positive (negative) part to become nonzero exactly when the negative (positive) part is zero.

Sometimes we need to handle separately the case when both the positive and the negative part is fixed to be zero. Then we need to introduce two binary variables y+ and y, and include the constraint y++y1 to prevent both variables from being 1:

(13.31)x=x+x,x+My+,xMy,x+, x0,y++y1,y+, y{0,1}.

13.3 Quadratic cones and riskless solution

In Sec. 6 (Transaction costs) we consider portfolio optimization problems with risk-free security. In this case there is a difference in the computation of the efficient frontier, if we model using the quadratic cone instead of the rotated quadratic cone. Namely, the optimization problem will not have solutions that are a mix of the risk-free security and risky securities. Instead, it will only have either the 100% risk-free solution, or a portfolio of only risky securities. Along the derivation below we will see that the same behavior applies to problems not having a risk-free security but having x=0 as a feasible solution.

Consider the following simple model:

(13.32)maximizeμTx+rfxfδ~xTΣxsubject to1Tx+xf=1,x,xf0,

where rf is the risk-free rate and xf is the allocation to the risk-free asset.

We can transform this problem using xf=11Tx, such that we are only left with the variable x:

(13.33)maximize(μrf1)Tx+rfδ~xTΣxsubject to1Tx1,x0.

The feasible region in this form is a probability simplex. The solution x=0 means that everything is allocated to the risk-free security, and the hyperplane 1Tx=1 has all the feasible solutions purely involving risky assets.

Let us denote the objective function value by obj(x). Then the directional derivative of the objective along the direction u>0, ||u||=1 will be

uobj(x)=(μrf1)Tuδ~xTΣuxTΣx.

This does not depend on the norm of x, meaning that uobj(cu) will be constant in c:

uobj(cu)=(μrf1)Tuδ~uTΣu.

Thus the objective is linear along the 1-dimensional slice between x=0 and any x>0, meaning that the optimal solution is either x=0 or some x>0 satisfying 1Tx=1.

Furthermore, along every 1-dimensional slice represented by a direction u, there is a δ~ threshold

δ~u=(μrf1)TuuTSu.

This is the Sharpe ratio of all portfolios of the form cu. If δ~>δ~u then uobj(u) turns negative.

In general, the larger we set δ~, the fewer directions u will exist for which δ~<δ~u, i. e., uobj(u)>0, and the optimal solution is a combination of risky assets. The largest δ~ for which we still have such a direction is δ~=maxu δ~u, the maximal Sharpe ratio. The corresponding optimal portfolio is the one having the smallest risk while consisting purely of risky assets. For δ~>δ~ we have uobj(u)<0 in all directions, meaning that the optimal solution will always be x=0, the 100% risk-free portfolio.

13.4 Monte Carlo Optimization Selection (MCOS)

We can use the following procedure to compare the accuracy of different portfolio optimization methods.

  1. Inputs of the procedure are the estimated mean return μ and estimated covariance Σ. Treat this pair of inputs as true values.

  2. Compute the optimal asset allocation x for the original input pair (μ,Σ).

  3. Repeat Ksim times:

    1. Do parametric bootstrap resampling, i. e., draw a new N×T return data sample Rk and derive a simulated pair (μk,Σk).

    2. De-noise the covariance matrix Σk using the method in Sec. 4.2.2 (Covariance de-noising).

    3. Compute the optimal portfolio xk using the optimization method(s) that we wish to analyze.

  4. To estimate the error for each security in the optimal portfolio, compute the standard deviation of the difference vectors xkx. Then by taking the mean we get a single number estimating the error of the optimization method.

Regarding the final step, we can also measure the average decay in performance by any of the following:

  • the mean difference in expected outcomes: 1Ksimk(xkx)Tμ

  • the mean difference in variance: 1Ksimk(xkx)TΣ(xkw)

  • the mean difference in Sharpe ratio or other metric computed from the above statistics.