13 Appendix¶
13.1 Conic optimization refresher¶
In this section we give a short summary of conic optimization. More about the topic, from both a mathematical and a modeling point of view, can be found in our other publication, the MOSEK Modeling Cookbook [MOSEKApS24].
Conic optimization is a class of convex optimization problems, which contains and generalizes many specific, well-known convex models in a unified way. These models include linear optimization (LO), quadratic optimization (QO), quadratically constrained quadratic optimization (QCQO), second-order cone optimization (SOCO), and semidefinite optimization (SDO). The general form of a conic optimization problem is
\[\begin{array}{lrcl} \mbox{minimize} & \mathbf{c}^\mathsf{T}\mathbf{x} & & \\ \mbox{subject to} & \mathbf{A}\mathbf{x} & = & \mathbf{b},\\ & \mathbf{x} & \in & \K, \end{array}\]
where \(\K\) is a product of the following basic types of cones:
Linear cone:
\[\R^n,\ \R_+^n,\ \{0\}.\]
Linear cones model all traditional LO problems.
Quadratic cone and rotated quadratic cone:
The quadratic cone is the set
\[\Q^n = \left\{x\in\real^n\middle\vert x_1\geq\sqrt{x_2^2+\cdots+x_n^2}\right\}.\]
The rotated quadratic cone is the set
\[\Q_\mathrm{r}^n = \left\{x\in\real^n\middle\vert 2x_1x_2\geq x_3^2+\cdots+x_n^2,\ x_1,x_2\geq 0\right\}.\]
Modeling with these two cones covers the class of SOCO problems, which includes all traditional QO and QCQO problems as well. See Sec. 13.1.2 (Traditional quadratic models) for more details.
Primal power cone:
\[\POW_n^{\alpha,1-\alpha} = \left\{x\in\real^n\middle\vert x_1^\alpha x_2^{1-\alpha}\geq \sqrt{x_3^2+\cdots+x_n^2},\ x_1,x_2\geq 0\right\},\]
or its dual cone.
Primal exponential cone:
\[\EXP=\left\{x\in\real^3 \middle\vert x_1\geq x_2\exp\left(\frac{x_3}{x_2}\right),\ x_1,x_2\geq 0\right\},\]
or its dual cone.
Semidefinite cone:
\[\PSD^n=\left\{X\in\real^{n\times n}\middle\vert X\ \textrm{is symmetric positive semidefinite}\right\}.\]
Semidefinite cones model all traditional SDO problems.
Each of these cones allows formulating different types of convex constraints.
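As an illustration, the following minimal sketch shows how membership in each of these basic cones can be declared. It assumes the MOSEK Fusion API for Python; any conic-capable modeling interface works similarly, and all names and dimensions are illustrative.

```python
from mosek.fusion import Model, Domain

with Model("cone-zoo") as M:
    # Linear cone: the nonnegative orthant R_+^3.
    x_lin = M.variable("x_lin", 3, Domain.greaterThan(0.0))
    # Quadratic cone: x_1 >= sqrt(x_2^2 + x_3^2).
    x_q = M.variable("x_q", 3, Domain.inQCone())
    # Rotated quadratic cone: 2 x_1 x_2 >= x_3^2, x_1, x_2 >= 0.
    x_qr = M.variable("x_qr", 3, Domain.inRotatedQCone())
    # Primal power cone with alpha = 0.3: x_1^0.3 x_2^0.7 >= |x_3|.
    x_pow = M.variable("x_pow", 3, Domain.inPPowerCone(0.3))
    # Primal exponential cone: x_1 >= x_2 exp(x_3 / x_2).
    x_exp = M.variable("x_exp", 3, Domain.inPExpCone())
    # Semidefinite cone: a 3x3 symmetric PSD matrix variable.
    X_psd = M.variable("X_psd", Domain.inPSDCone(3))
```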
13.1.1 Selection of conic constraints¶
In the following we list the constraints appearing in a financial context in this book and show how we can convert them to conic form.
13.1.1.1 Maximum function¶
We can model the maximum constraint \(\mathrm{max}(x_1, x_2, \dots, x_n) \leq c\) using linear constraints by introducing an auxiliary variable \(t\):
\[x_i \leq t,\ i = 1,\dots,n,\quad t \leq c.\]
For instance, we could write a series of constraints \(\mathrm{max}(x_i, 0) \leq c_i,\ i = 1,\dots,n\) as
\[\mathbf{x} \leq \mathbf{t},\quad \mathbf{0} \leq \mathbf{t},\quad \mathbf{t} \leq \mathbf{c},\]
where \(\mathbf{t}\) is an \(n\)-dimensional vector.
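A minimal Fusion sketch of the vector version, with a hypothetical bound vector c:

```python
import numpy as np
from mosek.fusion import Model, Domain, Expr

n = 4
c = np.array([0.5, 0.5, 1.0, 1.0])  # hypothetical bounds

with Model("max-function") as M:
    x = M.variable("x", n, Domain.unbounded())
    t = M.variable("t", n, Domain.greaterThan(0.0))        # 0 <= t
    M.constraint(Expr.sub(t, x), Domain.greaterThan(0.0))  # x <= t
    M.constraint(t, Domain.lessThan(c))                    # t <= c
    # ... the objective and the rest of the actual model go here
```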
13.1.1.2 Positive and negative part¶
A special case of modeling the maximum function is to model the positive part \(x^+\) and negative part \(x^-\) of a variable \(x\). We define these as \(x^+ = \mathrm{max}(x, 0)\) and \(x^- = \mathrm{max}(-x, 0)\).
We can model them explicitly based on Sec. 13.1.1.1 (Maximum function) by relaxing the definitions to the inequalities \(x^+ \geq \mathrm{max}(x, 0)\) and \(x^- \geq \mathrm{max}(-x, 0)\), or we can model them implicitly by the following set of constraints:
\[x = x^+ - x^-,\quad x^+ \geq 0,\quad x^- \geq 0.\]
Note, however, that in either case some freedom remains in the magnitudes of \(x^+\) and \(x^-\). This is because in the explicit case we relaxed the equalities, and in the implicit case only the difference of the variables is constrained. In other words, it is possible for \(x^+\) and \(x^-\) to both be positive, allowing optimal solutions where \(x^+ = \mathrm{max}(x, 0)\) and \(x^- = \mathrm{max}(-x, 0)\) do not hold. We could ensure that only one of \(x^+\) and \(x^-\) is positive and never both by stating the complementarity constraint \(x^+x^- = 0\) (or in the vector case \(\langle\mathbf{x}^+, \mathbf{x}^-\rangle = 0\)), but unfortunately such an equality constraint is non-convex and cannot be modeled directly.
There are workarounds to ensure that the equalities \(x^+ = \mathrm{max}(x, 0)\) and \(x^- = \mathrm{max}(-x, 0)\) hold in the optimal solution. One approach is to penalize the magnitude of these two variables, so that if both are positive in any solution, the solver can always improve the objective by reducing them until one of them becomes zero. Another possible workaround is to formulate a mixed-integer problem; see Sec. 13.2.1.4 (Positive and negative part).
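The penalty workaround can be sketched in Fusion as follows; the penalty weight and the rest of the model are placeholders.

```python
from mosek.fusion import Model, Domain, Expr, ObjectiveSense

n = 3
with Model("pos-neg-part") as M:
    x = M.variable("x", n, Domain.unbounded())
    xp = M.variable("xp", n, Domain.greaterThan(0.0))  # x^+ >= 0
    xm = M.variable("xm", n, Domain.greaterThan(0.0))  # x^- >= 0
    # Implicit definition: x = x^+ - x^-.
    M.constraint(Expr.sub(x, Expr.sub(xp, xm)), Domain.equalsTo(0.0))
    # Hypothetical objective: penalizing 1^T(x^+ + x^-) makes it suboptimal
    # for both parts to be positive at once, so the equalities
    # x^+ = max(x, 0) and x^- = max(-x, 0) hold at the optimum.
    penalty = 0.01  # assumed penalty weight
    M.objective(ObjectiveSense.Minimize,
                Expr.mul(penalty, Expr.sum(Expr.add(xp, xm))))
```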
13.1.1.3 Absolute value¶
We can model the absolute value constraint \(|x| \leq c\) using the maximum function by observing that \(|x| = \mathrm{max}(x, -x)\) (see Sec. 13.1.1.1 (Maximum function)):
\[-c \leq x \leq c.\]
Another possibility is to model it using the quadratic cone:
\[(c, x) \in \Q^2.\]
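Both formulations in a minimal Fusion sketch, with a hypothetical bound c (in a real model one would state only one of them):

```python
from mosek.fusion import Model, Domain, Expr

c = 2.0  # assumed bound
with Model("abs-value") as M:
    x = M.variable("x", 1, Domain.unbounded())
    # Linear formulation: -c <= x <= c.
    M.constraint(x, Domain.inRange(-c, c))
    # Equivalent conic formulation: (c, x) in Q^2.
    M.constraint(Expr.vstack(c, x), Domain.inQCone())
```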
13.1.1.4 Sum of largest elements¶
The sum of the \(m\) largest elements of a vector \(\mathbf{x}\) is the optimal value of the LO problem
\[\begin{array}{lrcl} \mbox{maximize}_{\mathbf{z}} & \mathbf{x}^\mathsf{T}\mathbf{z} & & \\ \mbox{subject to} & \mathbf{1}^\mathsf{T}\mathbf{z} & = & m,\\ & \mathbf{0} \leq \mathbf{z} & \leq & \mathbf{1}. \end{array}\]
Here \(\mathbf{x}\) cannot be a variable, because that would result in a nonlinear objective. Let us take the dual of problem (13.7):
\[\begin{array}{lrcl} \mbox{minimize}_{t,\mathbf{u}} & mt + \mathbf{1}^\mathsf{T}\mathbf{u} & & \\ \mbox{subject to} & \mathbf{u} + t\mathbf{1} & \geq & \mathbf{x},\\ & \mathbf{u} & \geq & \mathbf{0}. \end{array}\]
Problem (13.8) is actually the same as \(\min_t\ mt + \sum_i\mathrm{max}(0, x_i - t)\). In this case \(\mathbf{x}\) can be a variable, and thus it can also be optimized.
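For example, to impose that the sum of the m largest entries of a variable x is at most some bound b (both m and b hypothetical), we can state the dual form with auxiliary t and u, as in this Fusion sketch:

```python
from mosek.fusion import Model, Domain, Expr, Var

n, m, b = 5, 2, 1.0  # assumed dimension, number of elements, and bound

with Model("sum-largest") as M:
    x = M.variable("x", n, Domain.unbounded())
    t = M.variable("t")
    u = M.variable("u", n, Domain.greaterThan(0.0))
    # u + t*1 >= x, together with u >= 0, gives u_i >= max(0, x_i - t).
    M.constraint(Expr.sub(Expr.add(u, Var.repeat(t, n)), x),
                 Domain.greaterThan(0.0))
    # m*t + 1^T u <= b bounds the sum of the m largest entries of x.
    M.constraint(Expr.add(Expr.mul(float(m), t), Expr.sum(u)),
                 Domain.lessThan(b))
```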
13.1.1.5 Linear combination of largest elements¶
We can slightly extend the problem in Sec. 13.1.1.4 (Sum of largest elements) such that \(\mathbf{z}\) can have an upper bound \(\mathbf{c} \geq \mathbf{0}\), and there is a real number \(0 \leq b \leq c_\mathrm{sum}\) instead of the integer \(m\), where \(c_\mathrm{sum} = \sum_ic_i\):
\[\begin{array}{lrcl} \mbox{maximize}_{\mathbf{z}} & \mathbf{x}^\mathsf{T}\mathbf{z} & & \\ \mbox{subject to} & \mathbf{1}^\mathsf{T}\mathbf{z} & = & c_\mathrm{sum} - b,\\ & \mathbf{0} \leq \mathbf{z} & \leq & \mathbf{c}. \end{array}\]
This has the optimal objective value \(c_{i_b}^\mathrm{frac} x_{i_b} + \sum_{i > i_b}c_ix_i\) (assuming the entries of \(\mathbf{x}\) are sorted in increasing order), where \(i_b\) is such that \(\sum_{i=1}^{i_b-1}c_i < b \leq \sum_{i=1}^{i_b}c_i\), and \(c_{i_b}^\mathrm{frac} = \sum_{i=1}^{i_b}c_i - b < c_{i_b}\).
If we take the dual of problem (13.9), we get
\[\begin{array}{lrcl} \mbox{minimize}_{t,\mathbf{u}} & (c_\mathrm{sum} - b)t + \mathbf{c}^\mathsf{T}\mathbf{u} & & \\ \mbox{subject to} & \mathbf{u} + t\mathbf{1} & \geq & \mathbf{x},\\ & \mathbf{u} & \geq & \mathbf{0}, \end{array}\]
which is the same as \(\min_t\ (c_\mathrm{sum} - b)t + \sum_ic_i\mathrm{max}(0, x_i - t)\).
13.1.1.6 Manhattan norm (1-norm)¶
Let \(\mathbf{x} \in \R^n\) and observe that \(\|\mathbf{x}\|_1 = |x_1| +\cdots+ |x_n|\). Then we can model the Manhattan norm or 1-norm constraint \(\|\mathbf{x}\|_1 \leq c\) by modeling the absolute value of each coordinate:
\[-\mathbf{z} \leq \mathbf{x} \leq \mathbf{z},\quad \mathbf{1}^\mathsf{T}\mathbf{z} \leq c,\]
where \(\mathbf{z}\) is an auxiliary variable.
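A Fusion sketch with a hypothetical bound c:

```python
from mosek.fusion import Model, Domain, Expr

n, c = 4, 1.0  # assumed dimension and bound
with Model("norm1") as M:
    x = M.variable("x", n, Domain.unbounded())
    z = M.variable("z", n, Domain.unbounded())
    M.constraint(Expr.sub(z, x), Domain.greaterThan(0.0))  # x <= z
    M.constraint(Expr.add(z, x), Domain.greaterThan(0.0))  # -z <= x
    M.constraint(Expr.sum(z), Domain.lessThan(c))          # 1^T z <= c
```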
13.1.1.7 Euclidean norm (2-norm)¶
Let \(\mathbf{x} \in \R^n\) and observe that \(\|\mathbf{x}\|_2 = \sqrt{x_1^2+\cdots+x_n^2}\). Then we can model the Euclidean norm or 2-norm constraint \(\|\mathbf{x}\|_2 \leq c\) using the quadratic cone:
\[(c, \mathbf{x}) \in \Q^{n+1}.\]
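The same constraint in a Fusion sketch, with a hypothetical bound c:

```python
from mosek.fusion import Model, Domain, Expr

n, c = 4, 1.0  # assumed dimension and bound
with Model("norm2") as M:
    x = M.variable("x", n, Domain.unbounded())
    # (c, x) in Q^{n+1}  <=>  ||x||_2 <= c
    M.constraint(Expr.vstack(c, x), Domain.inQCone())
```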
13.1.1.8 Hyperbolic constraint¶
Let \(\mathbf{x} \in \R^n\) and observe that \(\|\mathbf{x}\|_2^2 = \mathbf{x}^\mathsf{T}\mathbf{x} = x_1^2+\cdots+x_n^2\). Then we can model the hyperbolic constraint \(\mathbf{x}^\mathsf{T}\mathbf{x} \leq yz\), where \(y, z \geq 0\), using the rotated quadratic cone:
\[\left(y, z, \sqrt{2}\mathbf{x}\right) \in \Q_\mathrm{r}^{n+2}.\]
13.1.1.9 Squared Euclidean norm¶
We can model the squared Euclidean norm or sum-of-squares constraint \(\|\mathbf{x}\|_2^2 \leq c\) using the hyperbolic constraint in Sec. 13.1.1.8 (Hyperbolic constraint) and the rotated quadratic cone:
\[\left(c, \frac{1}{2}, \mathbf{x}\right) \in \Q_\mathrm{r}^{n+2}.\]
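A Fusion sketch with a hypothetical bound c:

```python
from mosek.fusion import Model, Domain, Expr

n, c = 4, 1.0  # assumed dimension and bound
with Model("sq-norm") as M:
    x = M.variable("x", n, Domain.unbounded())
    # (c, 1/2, x) in Q_r^{n+2}  <=>  2 * c * (1/2) >= ||x||_2^2
    M.constraint(Expr.vstack(c, 0.5, x), Domain.inRotatedQCone())
```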
13.1.1.10 Quadratic form¶
Let \(\mathbf{x} \in \R^n\) and let \(\mathbf{Q} \in \PSD^n\), i.e., a symmetric positive semidefinite matrix. Then we can model the quadratic form constraint \(\frac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{Q}\mathbf{x} \leq c\) either using the quadratic cone or the rotated quadratic cone. To see this, observe that there exists a matrix \(\mathbf{G} \in \R^{n\times k}\) such that \(\mathbf{Q} = \mathbf{G}\mathbf{G}^\mathsf{T}\). Of course this decomposition can be done in many ways, so the matrix \(\mathbf{G}\) is not unique. The most interesting cases are when \(k \ll n\) or \(\mathbf{G}\) is very sparse, because these make the optimization problem much easier to solve numerically (see Sec. 13.1.2 (Traditional quadratic models)).
The most common ways to compute the matrix \(\mathbf{G}\) are the following:
Cholesky decomposition: \(\mathbf{Q} = \mathbf{C}\mathbf{C}^\mathsf{T}\), where \(\mathbf{C}\) is a lower triangular matrix with nonnegative entries in the diagonal. From this decomposition we have \(\mathbf{G} = \mathbf{C} \in \R^{n\times n}\).
Eigenvalue decomposition: \(\mathbf{Q} = \mathbf{V}\mathbf{D}\mathbf{V}^\mathsf{T}\), where the diagonal matrix \(\mathbf{D}\) contains the (nonnegative) eigenvalues of \(\mathbf{Q}\) and the orthogonal matrix \(\mathbf{V}\) contains the corresponding eigenvectors in its columns. From this decomposition we have \(\mathbf{G} = \mathbf{V}\mathbf{D}^{1/2} \in \R^{n\times n}\).
Matrix square root: \(\mathbf{Q} = \mathbf{Q}^{1/2}\mathbf{Q}^{1/2}\), where \(\mathbf{Q}^{1/2}\) is the symmetric positive semidefinite “square root” matrix of \(\mathbf{Q}\). From this decomposition we have \(\mathbf{G} = \mathbf{Q}^{1/2} \in \R^{n\times n}\).
Factor model: If \(\mathbf{Q}\) is a covariance matrix of some data, then we can approximate the data series with the combination of \(k \ll n\) common factors. Then we have the decomposition \(\mathbf{Q} = \EBeta\mathbf{Q}_F\EBeta^\mathsf{T} + \mathbf{D}\), where \(\mathbf{Q}_F \in \R^{k\times k}\) is the covariance of the factors, \(\EBeta \in \R^{n\times k}\) is the exposure of the data series to each factor, and \(\mathbf{D}\) is diagonal. From this, by computing the Cholesky decomposition \(\mathbf{Q}_F = \mathbf{F}\mathbf{F}^\mathsf{T}\) we have \(\mathbf{G} = [\EBeta\mathbf{F}, \mathbf{D}^{1/2}] \in \R^{n\times (n+k)}\). The advantage of factor models is that \(\mathbf{G}\) is very sparse and the factors have a direct financial interpretation (see Sec. 5 (Factor models) for details).
After obtaining the matrix \(\mathbf{G}\), we can write the quadratic form constraint as a sum-of-squares \(\frac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{G}\mathbf{G}^\mathsf{T}\mathbf{x} \leq c\), which is a squared Euclidean norm constraint \(\frac{1}{2}\|\mathbf{G}^\mathsf{T}\mathbf{x}\|_2^2 \leq c\).
We can choose to model this using the rotated quadratic cone as
\[\left(c, 1, \mathbf{G}^\mathsf{T}\mathbf{x}\right) \in \Q_\mathrm{r}^{k+2},\]
or we can choose to model its square root using the quadratic cone as
\[\left(\sqrt{2c}, \mathbf{G}^\mathsf{T}\mathbf{x}\right) \in \Q^{k+1}.\]
Whether to use the quadratic cone or the rotated quadratic cone for modeling can be decided based on which is more natural. Typically the quadratic cone is used to model 2-norm constraints, while the rotated quadratic cone is more natural for modeling quadratic functions. There can be exceptions, however; see for example Sec. 2.3 (Conic formulation).
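A minimal sketch of the rotated-cone version, assuming a hypothetical positive definite Q so that np.linalg.cholesky applies (for a singular semidefinite Q an eigenvalue-based factorization can be used instead):

```python
import numpy as np
from mosek.fusion import Model, Domain, Expr

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])  # hypothetical positive definite matrix
c = 1.0                     # assumed bound
G = np.linalg.cholesky(Q)   # Q = G G^T, G lower triangular

n = Q.shape[0]
with Model("quad-form") as M:
    x = M.variable("x", n, Domain.unbounded())
    # (c, 1, G^T x) in Q_r  <=>  (1/2) x^T Q x <= c
    M.constraint(Expr.vstack(c, 1.0, Expr.mul(G.T, x)),
                 Domain.inRotatedQCone())
```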
13.1.1.11 Power¶
Let \(x \in \R\) and \(\alpha > 1\). Then we can model the power constraint \(c \geq |x|^\alpha\), or equivalently \(c^{1/\alpha} \geq |x|\), using the power cone:
\[(c, 1, x) \in \POW_3^{1/\alpha,\,1-1/\alpha}.\]
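A Fusion sketch with a hypothetical exponent alpha; here c is a variable bounded below by |x|^alpha:

```python
from mosek.fusion import Model, Domain, Expr

alpha = 3.0  # assumed exponent, alpha > 1
with Model("power") as M:
    x = M.variable("x", 1, Domain.unbounded())
    c = M.variable("c", 1, Domain.greaterThan(0.0))
    # (c, 1, x) in POW_3^(1/alpha, 1-1/alpha)  <=>  c^(1/alpha) >= |x|
    M.constraint(Expr.vstack(c, 1.0, x), Domain.inPPowerCone(1.0 / alpha))
```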
13.1.1.12 Exponential¶
Let \(x \in \R\). Then we can model the exponential constraint \(t \geq e^x\) using the exponential cone:
\[(t, 1, x) \in \EXP.\]
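A Fusion sketch:

```python
from mosek.fusion import Model, Domain, Expr

with Model("exp") as M:
    x = M.variable("x", 1, Domain.unbounded())
    t = M.variable("t", 1, Domain.unbounded())
    # (t, 1, x) in EXP  <=>  t >= exp(x)
    M.constraint(Expr.vstack(t, 1.0, x), Domain.inPExpCone())
```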
13.1.1.13 Log-sum-exp¶
Let \(x_1,\dots,x_n \in \R\). Then we can model the log-sum-exp constraint \(t \geq \mathrm{log}\left(\sum_{i=1}^n e^{x_i}\right)\) by applying the rule in Sec. 13.1.1.12 (Exponential) \(n\) times:
\[\sum_{i=1}^n u_i \leq 1,\quad (u_i, 1, x_i - t) \in \EXP,\ i = 1,\dots,n,\]
where \(u_1,\dots,u_n\) are auxiliary variables.
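A Fusion sketch of this construction:

```python
from mosek.fusion import Model, Domain, Expr

n = 3
with Model("log-sum-exp") as M:
    x = M.variable("x", n, Domain.unbounded())
    t = M.variable("t")
    u = M.variable("u", n, Domain.greaterThan(0.0))
    # (u_i, 1, x_i - t) in EXP, i.e. u_i >= exp(x_i - t), for each i.
    for i in range(n):
        M.constraint(Expr.vstack(u.index(i), 1.0, Expr.sub(x.index(i), t)),
                     Domain.inPExpCone())
    # sum_i u_i <= 1 then gives t >= log(sum_i exp(x_i)).
    M.constraint(Expr.sum(u), Domain.lessThan(1.0))
```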
13.1.1.14 Perspective of function¶
The perspective of a function \(f(x)\) is defined as \(sf(x/s)\) for \(s>0\). From any conic representation of \(t\geq f(x)\) we can obtain a representation of \(t\geq sf(x/s)\) by substituting all constants \(c\) with their homogenized counterpart \(sc\).
13.1.1.15 Perspective of log-sum-exp¶
Let \(x_1,\dots,x_n \in \R\). Then we can model the perspective of the log-sum-exp constraint, \(t \geq s\,\mathrm{log}\left(\sum_{i=1}^n e^{x_i/s}\right)\), by applying the rule in Sec. 13.1.1.14 (Perspective of function) to constraint (13.19):
\[\sum_{i=1}^n u_i \leq s,\quad (u_i, s, x_i - t) \in \EXP,\ i = 1,\dots,n.\]
13.1.2 Traditional quadratic models¶
Optimization problems involving quadratic functions often appear in practice. In this section we show how to convert traditional QO and QCQO problems into conic optimization form, because most of the time the solution of the conic model is computationally more efficient.
13.1.2.1 Quadratic optimization¶
The standard form of a quadratic optimization (QO) problem is the following:
\[\begin{array}{lrcl} \mbox{minimize} & \frac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{Q}\mathbf{x} + \mathbf{c}^\mathsf{T}\mathbf{x} & & \\ \mbox{subject to} & \mathbf{A}\mathbf{x} & = & \mathbf{b},\\ & \mathbf{x} & \geq & \mathbf{0}. \end{array}\]
The matrix \(\mathbf{Q} \in \R^{n\times n}\) must be symmetric positive semidefinite, otherwise the objective function would not be convex.
Assuming the factorization \(\mathbf{Q} = \mathbf{G}\mathbf{G}^\mathsf{T}\) with \(\mathbf{G}\in \R^{n\times k}\), we can reformulate problem (13.21) as a conic problem by applying the method described in Sec. 13.1.1.10 (Quadratic form):
\[\begin{array}{lrcl} \mbox{minimize} & t + \mathbf{c}^\mathsf{T}\mathbf{x} & & \\ \mbox{subject to} & \mathbf{A}\mathbf{x} & = & \mathbf{b},\\ & (t, 1, \mathbf{G}^\mathsf{T}\mathbf{x}) & \in & \Q_\mathrm{r}^{k+2},\\ & \mathbf{x} & \geq & \mathbf{0}. \end{array}\]
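A minimal end-to-end Fusion sketch of this conversion on hypothetical data (Q assumed positive definite so that np.linalg.cholesky applies):

```python
import numpy as np
from mosek.fusion import Model, Domain, Expr, ObjectiveSense

# Hypothetical problem data.
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
G = np.linalg.cholesky(Q)  # Q = G G^T

n = len(c)
with Model("qo-conic") as M:
    x = M.variable("x", n, Domain.greaterThan(0.0))
    t = M.variable("t")
    M.constraint(Expr.mul(A, x), Domain.equalsTo(b))
    # (t, 1, G^T x) in Q_r  <=>  t >= (1/2) x^T Q x
    M.constraint(Expr.vstack(t, 1.0, Expr.mul(G.T, x)),
                 Domain.inRotatedQCone())
    # Minimize t + c^T x instead of (1/2) x^T Q x + c^T x.
    M.objective(ObjectiveSense.Minimize, Expr.add(t, Expr.dot(c, x)))
    M.solve()
    print(x.level())
```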
13.1.2.2 Quadratically constrained quadratic optimization¶
Consider the quadratically constrained quadratic optimization (QCQO) problem of the form
\[\begin{array}{lrcl} \mbox{minimize} & \frac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{Q}_0\mathbf{x} + \mathbf{c}_0^\mathsf{T}\mathbf{x} & & \\ \mbox{subject to} & \frac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{Q}_i\mathbf{x} + \mathbf{c}_i^\mathsf{T}\mathbf{x} + b_i & \leq & 0,\quad i = 1,\dots,m. \end{array}\]
The matrices \(\mathbf{Q}_i \in \R^{n\times n}\), \(i=0,\dots,m\) must all be symmetric positive semidefinite, otherwise the optimization problem would not be convex.
Assuming the factorizations \(\mathbf{Q}_i = \mathbf{G}_i\mathbf{G}_i^\mathsf{T}\) with \(\mathbf{G}_i\in \R^{n\times k_i}\), we can reformulate problem (13.23) as a conic problem by applying the method described in Sec. 13.1.1.10 (Quadratic form) to both the objective and the constraints:
\[\begin{array}{lrcl} \mbox{minimize} & t_0 + \mathbf{c}_0^\mathsf{T}\mathbf{x} & & \\ \mbox{subject to} & t_i + \mathbf{c}_i^\mathsf{T}\mathbf{x} + b_i & \leq & 0,\quad i = 1,\dots,m,\\ & (t_i, 1, \mathbf{G}_i^\mathsf{T}\mathbf{x}) & \in & \Q_\mathrm{r}^{k_i+2},\quad i = 0,\dots,m. \end{array}\]
13.1.2.3 Practical benefits of conic models¶
The key step in the model conversion is to transform quadratic terms \(\mathbf{x}^\mathsf{T}\mathbf{Q}\mathbf{x}\) using the factorization \(\mathbf{Q} = \mathbf{G}\mathbf{G}^\mathsf{T}\), where \(\mathbf{G}\in \R^{n\times k}\). Assuming \(k \ll n\), this results in the following benefits:
The storage requirement \(nk\) of \(\mathbf{G}\) can be much lower than the storage requirement \(n^2/2\) of \(\mathbf{Q}\).
The amount of work to evaluate \(\mathbf{x}^\mathsf{T}\mathbf{Q}\mathbf{x}\) is proportional to \(n^2\) whereas the work to evaluate \(\|\mathbf{G}^\mathsf{T}\mathbf{x}\|_2^2\) is proportional to \(nk\) only.
No need to numerically validate positive semidefiniteness of the matrix \(\mathbf{Q}\). This is otherwise difficult owing to the presence of rounding errors that can make \(\mathbf{Q}\) indefinite.
Duality theory is much simpler for conic quadratic optimization.
In summary, the conic equivalents are not only as easy to solve as the original QO or QCQO problems, but in most cases they also need less space and less solution time.
13.2 Mixed-integer models¶
Mixed-integer optimization (MIO) is an extension of convex optimization, in which some variables take values only in the set of integers or in some subset of the integers. This allows for solving a much wider range of optimization problems than the ones convex optimization already covers. However, these problems are no longer convex. Handling integer variables makes the optimization NP-hard and requires specific algorithms, so the solution of MIO problems is much slower. In many practical cases we cannot expect to find the optimal solution in reasonable time, and only near-optimal solutions will be available.
A general mixed-integer optimization problem has the following form:
\[\begin{array}{lrcl} \mbox{minimize} & \mathbf{c}^\mathsf{T}\mathbf{x} & & \\ \mbox{subject to} & \mathbf{A}\mathbf{x} & = & \mathbf{b},\\ & \mathbf{x} & \in & \K,\\ & x_i \in \mathbb{Z}, & & i \in \mathcal{I}, \end{array}\]
where \(\K\) is a cone and \(\mathcal{I} \subseteq \{1,\dots,n\}\) contains the indices of integer variables. We can model any finite range for the integer variable \(x_i\) by simply adding the extra constraint \(a_i \leq x_i \leq b_i\).
13.2.1 Selection of integer constraints¶
In the following we list the integer constraints appearing in a financial context in this book and show how to model them.
13.2.1.1 Switch¶
In some practical cases we might wish to impose conditions on parts of our optimization model, for example, allowing a nonzero value for a variable only in the presence of a condition. We can model such situations using binary variables (or indicator variables), which can only take the values \(0\) or \(1\). The following set of constraints allows \(x\) to be positive only when the indicator variable \(y\) is equal to \(1\):
\[0 \leq x \leq My,\quad y \in \{0, 1\}.\]
The number \(M\) here is not related to the optimization problem, but it is necessary to form such a switchable constraint. If \(y=0\), then the upper limit of \(x\) is \(0\). If \(y=1\), then the upper limit of \(x\) is \(M\). This modeling technique is called big-M. The choice of \(M\) can affect the solution performance, but a nice feature of it is that the problem cannot accidentally become infeasible.
If we have a vector variable \(\mathbf{x}\), then the switch will look like
\[\mathbf{0} \leq \mathbf{x} \leq \mathbf{M}\circ\mathbf{y},\quad \mathbf{y} \in \{0, 1\}^n,\]
where \(\circ\) denotes the elementwise product. Here we account for a possibly different big-M value for each coordinate.
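A Fusion sketch of the vector switch, with hypothetical big-M values:

```python
import numpy as np
from mosek.fusion import Model, Domain, Expr

n = 3
bigM = np.array([10.0, 10.0, 10.0])  # assumed per-coordinate big-M bounds

with Model("switch") as M:
    x = M.variable("x", n, Domain.greaterThan(0.0))
    y = M.variable("y", n, Domain.binary())
    # x <= M o y: x_i can be positive only if y_i = 1.
    M.constraint(Expr.sub(x, Expr.mulElm(bigM, y)), Domain.lessThan(0.0))
```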
13.2.1.2 Semi-continuous variable¶
A slight extension of Sec. 13.2.1.1 (Switch) is to model semi-continuous variables, i.e., when \(x \in \{0\} \cup [a, b]\), where \(0 < a \leq b\). We can model this by
\[ay \leq x \leq by,\quad y \in \{0, 1\}.\]
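A Fusion sketch with a hypothetical interval [a, b]:

```python
from mosek.fusion import Model, Domain, Expr

a, b = 0.1, 1.0  # assumed interval bounds, 0 < a <= b
with Model("semi-continuous") as M:
    x = M.variable("x", 1, Domain.unbounded())
    y = M.variable("y", 1, Domain.binary())
    M.constraint(Expr.sub(x, Expr.mul(a, y)), Domain.greaterThan(0.0))  # a*y <= x
    M.constraint(Expr.sub(x, Expr.mul(b, y)), Domain.lessThan(0.0))     # x <= b*y
```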
13.2.1.3 Cardinality¶
We might need to limit the number of nonzeros in a vector \(\mathbf{x}\) to some number \(K<n\). We can do this with the help of an indicator vector \(\mathbf{y}\) of length \(n\), which indicates the nonzero coordinates of \(\mathbf{x}\) (see Sec. 13.2.1.1 (Switch)). First we add a big-M bound \(\mathbf{M}\circ\mathbf{y}\) on the absolute value, modeled as in Sec. 13.1.1.3 (Absolute value). Then we limit the cardinality of \(\mathbf{x}\) by limiting the cardinality of \(\mathbf{y}\):
\[-\mathbf{M}\circ\mathbf{y} \leq \mathbf{x} \leq \mathbf{M}\circ\mathbf{y},\quad \mathbf{1}^\mathsf{T}\mathbf{y} \leq K,\quad \mathbf{y} \in \{0, 1\}^n.\]
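A Fusion sketch with hypothetical big-M values and cardinality limit K:

```python
import numpy as np
from mosek.fusion import Model, Domain, Expr

n, K = 5, 2              # assumed dimension and cardinality limit
bigM = np.full(n, 10.0)  # assumed big-M bounds

with Model("cardinality") as M:
    x = M.variable("x", n, Domain.unbounded())
    y = M.variable("y", n, Domain.binary())
    # -M o y <= x <= M o y: y_i indicates that x_i may be nonzero.
    M.constraint(Expr.sub(x, Expr.mulElm(bigM, y)), Domain.lessThan(0.0))
    M.constraint(Expr.add(x, Expr.mulElm(bigM, y)), Domain.greaterThan(0.0))
    # At most K nonzero coordinates.
    M.constraint(Expr.sum(y), Domain.lessThan(K))
```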
13.2.1.4 Positive and negative part¶
We introduced the positive part \(x^+\) and negative part \(x^-\) of a variable \(x\) in Sec. 13.1.1.2 (Positive and negative part). We noted that we need a way to ensure that only \(x^+\) or \(x^-\) will be positive in the optimal solution, and not both. One such way is to use binary variables:
\[x = x^+ - x^-,\quad 0 \leq x^+ \leq My,\quad 0 \leq x^- \leq M(1-y),\quad y \in \{0, 1\}.\]
Here the binary variable \(y\) allows the positive (negative) part to become nonzero exactly when the negative (positive) part is zero.
Sometimes we need to handle separately the case when both the positive and the negative part are fixed to be zero. Then we need to introduce two binary variables \(y^+\) and \(y^-\), and include the constraint \(y^+ + y^- \leq 1\) to prevent both variables from being \(1\):
\[x = x^+ - x^-,\quad 0 \leq x^+ \leq My^+,\quad 0 \leq x^- \leq My^-,\quad y^+ + y^- \leq 1,\quad y^+, y^- \in \{0, 1\}.\]
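A Fusion sketch of the two-binary variant, with a hypothetical big-M bound:

```python
from mosek.fusion import Model, Domain, Expr

bigM = 10.0  # assumed bound on |x|
with Model("pos-neg-mio") as M:
    x = M.variable("x", 1, Domain.unbounded())
    xp = M.variable("xp", 1, Domain.greaterThan(0.0))
    xm = M.variable("xm", 1, Domain.greaterThan(0.0))
    yp = M.variable("yp", 1, Domain.binary())
    ym = M.variable("ym", 1, Domain.binary())
    M.constraint(Expr.sub(x, Expr.sub(xp, xm)), Domain.equalsTo(0.0))     # x = x^+ - x^-
    M.constraint(Expr.sub(xp, Expr.mul(bigM, yp)), Domain.lessThan(0.0))  # x^+ <= M y^+
    M.constraint(Expr.sub(xm, Expr.mul(bigM, ym)), Domain.lessThan(0.0))  # x^- <= M y^-
    M.constraint(Expr.add(yp, ym), Domain.lessThan(1.0))                  # not both
```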
13.3 Quadratic cones and riskless solution¶
In Sec. 6 (Transaction costs) we consider portfolio optimization problems with a risk-free security. In this case there is a difference in the computation of the efficient frontier if we model using the quadratic cone instead of the rotated quadratic cone. Namely, the optimization problem will not have solutions that mix the risk-free security with risky securities. Instead, it will only have either the 100% risk-free solution, or a portfolio of only risky securities. In the derivation below we will see that the same behavior applies to problems that do not have a risk-free security but have \(\mathbf{x} = \mathbf{0}\) as a feasible solution.
Consider the following simple model:
\[\begin{array}{lrcl} \mbox{maximize} & \EMean^\mathsf{T}\mathbf{x} + r^\mathrm{f}x^\mathrm{f} - \tilde{\delta}\sqrt{\mathbf{x}^\mathsf{T}\ECov\mathbf{x}} & & \\ \mbox{subject to} & \mathbf{1}^\mathsf{T}\mathbf{x} + x^\mathrm{f} & = & 1,\\ & \mathbf{x} \geq \mathbf{0},\ x^\mathrm{f} \geq 0, & & \end{array}\]
where \(r^\mathrm{f}\) is the risk-free rate and \(x^\mathrm{f}\) is the allocation to the risk-free asset.
We can transform this problem using \(x^\mathrm{f} = 1 - \mathbf{1}^\mathsf{T}\mathbf{x}\), such that we are only left with the variable \(\mathbf{x}\):
\[\begin{array}{lrcl} \mbox{maximize} & (\EMean - r^\mathrm{f}\mathbf{1})^\mathsf{T}\mathbf{x} + r^\mathrm{f} - \tilde{\delta}\sqrt{\mathbf{x}^\mathsf{T}\ECov\mathbf{x}} & & \\ \mbox{subject to} & \mathbf{1}^\mathsf{T}\mathbf{x} & \leq & 1,\\ & \mathbf{x} \geq \mathbf{0}. & & \end{array}\]
The feasible region in this form is a probability simplex. The solution \(\mathbf{x} = \mathbf{0}\) means that everything is allocated to the risk-free security, and the hyperplane \(\mathbf{1}^\mathsf{T}\mathbf{x} = 1\) contains all the feasible solutions involving purely risky assets.
Let us denote the objective function value by \(\mathrm{obj}(\mathbf{x})\). Then the directional derivative of the objective along a direction \(\mathbf{u} > \mathbf{0}\), \(\|\mathbf{u}\| = 1\) will be
\[\partial_\mathbf{u}\mathrm{obj}(\mathbf{x}) = (\EMean - r^\mathrm{f}\mathbf{1})^\mathsf{T}\mathbf{u} - \tilde{\delta}\frac{\mathbf{u}^\mathsf{T}\ECov\mathbf{x}}{\sqrt{\mathbf{x}^\mathsf{T}\ECov\mathbf{x}}}.\]
This does not depend on the norm of \(\mathbf{x}\), meaning that \(\partial_\mathbf{u}\mathrm{obj}(c\mathbf{u})\) will be constant in \(c\):
\[\partial_\mathbf{u}\mathrm{obj}(c\mathbf{u}) = (\EMean - r^\mathrm{f}\mathbf{1})^\mathsf{T}\mathbf{u} - \tilde{\delta}\sqrt{\mathbf{u}^\mathsf{T}\ECov\mathbf{u}}.\]
Thus the objective is linear along the one-dimensional slice between \(\mathbf{x} = \mathbf{0}\) and any \(\mathbf{x} > \mathbf{0}\), meaning that the optimal solution is either \(\mathbf{x} = \mathbf{0}\) or some \(\mathbf{x} > \mathbf{0}\) satisfying \(\mathbf{1}^\mathsf{T}\mathbf{x} = 1\).
Furthermore, along every one-dimensional slice represented by a direction \(\mathbf{u}\), there is a threshold
\[\tilde{\delta}_\mathbf{u} = \frac{(\EMean - r^\mathrm{f}\mathbf{1})^\mathsf{T}\mathbf{u}}{\sqrt{\mathbf{u}^\mathsf{T}\ECov\mathbf{u}}},\]
at which \(\partial_\mathbf{u}\mathrm{obj}(\mathbf{u}) = 0\). This is the Sharpe ratio of all portfolios of the form \(c\mathbf{u}\). If \(\tilde{\delta} > \tilde{\delta}_\mathbf{u}\), then \(\partial_\mathbf{u}\mathrm{obj}(\mathbf{u})\) turns negative.
In general, the larger we set \(\tilde{\delta}\), the fewer directions \(\mathbf{u}\) will exist for which \(\tilde{\delta} < \tilde{\delta}_\mathbf{u}\), i.e., \(\partial_\mathbf{u}\mathrm{obj}(\mathbf{u}) > 0\), and for which the optimal solution is a combination of risky assets. The largest \(\tilde{\delta}\) for which we still have such a direction is \(\tilde{\delta}^* = \mathrm{max}_\mathbf{u}\ \tilde{\delta}_\mathbf{u}\), the maximal Sharpe ratio. The corresponding optimal portfolio is the one having the smallest risk while consisting purely of risky assets. For \(\tilde{\delta} > \tilde{\delta}^*\) we have \(\partial_\mathbf{u}\mathrm{obj}(\mathbf{u}) < 0\) in all directions, meaning that the optimal solution will always be \(\mathbf{x} = \mathbf{0}\), the 100% risk-free portfolio.
13.4 Monte Carlo Optimization Selection (MCOS)¶
We can use the following procedure to compare the accuracy of different portfolio optimization methods.
Inputs of the procedure are the estimated mean return \(\EMean\) and estimated covariance \(\ECov\). Treat this pair of inputs as true values.
Compute the optimal asset allocation \(\mathbf{x}\) for the original input pair \((\EMean, \ECov)\).
Repeat \(K_\mathrm{sim}\) times:
Do parametric bootstrap resampling, i.e., draw a new \(N\times T\) return data sample \(\mathbf{R}_k\) and derive from it a simulated pair \((\EMean_k, \ECov_k)\).
De-noise the covariance matrix \(\ECov_k\) using the method in Sec. 4.2.2 (Covariance de-noising).
Compute the optimal portfolio \(\mathbf{x}_k\) using the optimization method(s) that we wish to analyze.
To estimate the error for each security in the optimal portfolio, compute the standard deviation of the difference vectors \(\mathbf{x}_k-\mathbf{x}\). Then, by taking the mean over securities, we get a single number estimating the error of the optimization method. (A schematic implementation of the whole procedure is sketched after the list below.)
Regarding the final step, we can also measure the average decay in performance by any of the following:
the mean difference in expected outcomes: \(\frac{1}{K_\mathrm{sim}}\sum_k (\mathbf{x}_k-\mathbf{x})^\mathsf{T}\EMean\)
the mean difference in variance: \(\frac{1}{K_\mathrm{sim}}\sum_k (\mathbf{x}_k-\mathbf{x})^\mathsf{T}\ECov(\mathbf{x}_k-\mathbf{x})\)
the mean difference in Sharpe ratio or other metric computed from the above statistics.
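A schematic NumPy implementation of the whole procedure; optimize and denoise are user-supplied placeholder routines (the allocation rule under study and the de-noising step of Sec. 4.2.2 (Covariance de-noising)), and returns are resampled from a normal distribution as one possible parametric bootstrap:

```python
import numpy as np

def mcos_error(mu, Sigma, optimize, denoise, T=250, K_sim=100, seed=0):
    """Estimate the instability of an allocation rule by parametric bootstrap.

    mu and Sigma are treated as the true moments; optimize(mean, cov) must
    return a portfolio vector and denoise(cov, T) a cleaned covariance.
    """
    rng = np.random.default_rng(seed)
    n = len(mu)
    x_true = optimize(mu, Sigma)          # allocation on the original inputs
    diffs = np.empty((K_sim, n))
    for k in range(K_sim):
        R = rng.multivariate_normal(mu, Sigma, size=T)  # simulated returns
        mu_k = R.mean(axis=0)
        Sigma_k = denoise(np.cov(R, rowvar=False), T)
        diffs[k] = optimize(mu_k, Sigma_k) - x_true     # x_k - x
    # Per-security standard deviation of the differences, averaged into a
    # single error number; the performance-decay metrics above can be
    # computed from the same diffs array.
    return diffs.std(axis=0).mean()
```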