MOSEK Portfolio Optimization Workshop
Theory and Practice of Portfolio Optimization with MOSEK
Copenhagen, 18 November 2021

Optimization Specialist, MOSEK ApS

# Outline

• A deeper look at MOSEK through the simple MVO example.
• How (not to) implement a big factor model.
• What to expect working with actual data.
• Problems requiring the mixed-integer optimizer.

Part 1. Our first model.

# Simple MVO model

$\begin{array}{lrcl} \mbox{maximize} & m^T x & &\\ \mbox{subject to} & x^T \Sigma x & \leq & \gamma^2,\\ & 1^Tx & = & 1,\\ & x & \geq & 0.\\ \end{array}$

Recall that we cast the risk constraint in conic form using any $G$ such that

$\Sigma = GG^T$

Then

$x^T\Sigma x=x^TGG^Tx=(G^Tx)^T(G^Tx) = \|G^Tx\|_2^2$

So:

$x^T\Sigma x\leq \gamma^2 \iff \|G^Tx\|_2\leq \gamma$

The quadratic cone (second-order cone, SOC) $\mathcal{Q}^k$ in dimension $k$ is the convex set defined as:

$\mathcal{Q}^k = \{(y_1,y_2,\ldots,y_k)\in\mathbb{R}^k~:~y_1\geq\left\|(y_2,\ldots,y_k)\right\|_2\}$

Therefore the bound

$\gamma \geq \|G^Tx\|_2$

in conic form becomes

$(\gamma,G^Tx)\in\mathcal{Q}^{N+1}$.
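The equivalence above can be checked numerically. The following is a minimal sketch using NumPy only; the covariance matrix, the portfolio `x` and the bound `gamma` are made-up illustrative values, and the Cholesky factor is just one possible choice of $G$.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix (illustrative values only).
Sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.010],
                  [0.002, 0.010, 0.160]])

# Lower-triangular Cholesky factor, so that Sigma = G @ G.T.
G = np.linalg.cholesky(Sigma)

x = np.array([0.5, 0.3, 0.2])   # some long-only portfolio (assumed)
gamma = 0.25                    # illustrative risk bound (assumed)

quad = x @ Sigma @ x              # x^T Sigma x
norm = np.linalg.norm(G.T @ x)    # ||G^T x||_2

# Membership test for (gamma, G^T x) in the quadratic cone.
in_cone = gamma >= norm
print(quad, norm**2, in_cone)
```

Both quantities agree up to rounding, so testing $\gamma \geq \|G^Tx\|_2$ is the same as testing $x^T\Sigma x\leq\gamma^2$ whenever $\gamma\geq 0$.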

# Simple MVO model - conic form

$\begin{array}{lrcl} \mbox{maximize} & m^T x & &\\ \mbox{subject to} & (\gamma,G^Tx) & \in & \mathcal{Q}^{N+1},\\ & 1^Tx & = & 1,\\ & x & \geq & 0.\\ \end{array}$
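
The model is assembled one piece at a time: the variable with its bound, the budget constraint, the conic risk constraint, and the objective.

• $x\geq 0$
• $1^Tx = 1$
• $(\gamma, G^Tx)\in\mathcal{Q}^{N+1}$
• $\mathrm{maximize} \quad m^Tx$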

Let us check that the constraints of the original problem before conic reformulation

$\begin{array}{lrcl} \mbox{maximize} & m^T x & &\\ \mbox{subject to} & x^T \Sigma x & \leq & \gamma^2,\\ & 1^Tx & = & 1,\\ & x & \geq & 0.\\ \end{array}$

are satisfied:
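A check of this kind might look as follows. This is only a sketch: the helper name `check_mvo_constraints` and all input values are hypothetical, and in practice `x` would be the solution returned by the solver.

```python
import numpy as np

def check_mvo_constraints(x, Sigma, gamma):
    """Return the violation of each original constraint (0.0 means satisfied)."""
    x = np.asarray(x, dtype=float)
    return {
        # x^T Sigma x <= gamma^2
        "risk": max(0.0, float(x @ Sigma @ x) - gamma**2),
        # 1^T x = 1
        "budget": abs(float(x.sum()) - 1.0),
        # x >= 0
        "nonneg": max(0.0, float(-x.min())),
    }

# Illustrative data (assumed, not taken from the workshop).
Sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.010],
                  [0.002, 0.010, 0.160]])

ok = check_mvo_constraints([0.5, 0.3, 0.2], Sigma, gamma=0.25)
bad = check_mvo_constraints([1.2, -0.2, 0.0], Sigma, gamma=0.25)
print(ok)
print(bad)
```

Reporting the size of each violation rather than a yes/no answer makes it easier to judge whether a small violation is only solver tolerance.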

## Simple debugging - log output

• Problem size summary.
• Presolve.
• Details of the presolved problem.
• Iteration log.
• Solution summary:
  • size of objective,
  • solution norms,
  • violations.

# A CVXPY example

There are modeling tools that allow writing the quadratic model directly and still use MOSEK.

Part 2. A more complicated factor model.

### We consider a real-world portfolio optimization problem.

$\begin{array}{ll} \mbox{maximize} & \alpha^T x \\ \mbox{subject to} & 1^Tx = 0,\\ & t_\mathrm{min} \leq x \leq t_\mathrm{max}, \\ & l \leq A\tilde{x} \leq u, \end{array}$

with a risk bound

$(\tilde{x}+h)^T\ \left(\beta\Sigma_F\beta^T+\mathrm{diag}(S_\theta)\right)\ (\tilde{x}+h) \leq \gamma^2$

where

• $x\in\mathbb{R}^{n+1}$, $x = (\tilde{x}, c)$
• $\Sigma_F\in\mathbb{R}^{k\times k}$ - factor covariance matrix,
• $\beta\in\mathbb{R}^{n\times k}$ - factor exposures,
• $S_\theta\in\mathbb{R}^n$ - specific risk vector,
• $h\in\mathbb{R}^n$ - initial holdings.

The linear part

$\begin{array}{ll} \mbox{maximize} & \alpha^T x \\ \mbox{subject to} & 1^Tx = 0,\\ & t_\mathrm{min} \leq x \leq t_\mathrm{max}, \\ & l \leq A\tilde{x} \leq u, \end{array}$

has a direct model:

We proceed with the risk bound constraint

$(\tilde{x}+h)^T\ \left(\beta\Sigma_F\beta^T+\mathrm{diag}(S_\theta)\right)\ (\tilde{x}+h) \leq \gamma^2$

A direct Cholesky factorization of this $n\times n$ matrix may fail. The typical reason is a small negative eigenvalue caused by precision issues. One usually tries to correct for it by boosting the diagonal slightly.
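The effect and the diagonal correction can be reproduced on a toy matrix. The sketch below is an assumption-laden illustration: the matrix is a made-up rank-deficient example, the rounding error is simulated with an explicit shift of $10^{-9}$, and the safety margin $10^{-8}$ is an arbitrary choice.

```python
import numpy as np

# Hypothetical rank-deficient "covariance": B @ B.T has a zero eigenvalue.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Simulate rounding errors with a tiny negative shift of the spectrum.
S = B @ B.T - 1e-9 * np.eye(3)

min_eig = np.linalg.eigvalsh(S).min()   # slightly below zero

try:
    np.linalg.cholesky(S)
    cholesky_failed = False
except np.linalg.LinAlgError:
    cholesky_failed = True

# Boost the diagonal just enough to make the matrix positive definite.
shift = max(0.0, -min_eig) + 1e-8
S_boosted = S + shift * np.eye(3)
G = np.linalg.cholesky(S_boosted)       # now succeeds

print(min_eig, cholesky_failed, shift)
```

The factor `G` reproduces the boosted matrix, not the original one, so the resulting risk constraint is only an approximation of the intended bound.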

Going back to the original optimization problem, we can now write the risk bound

$(\tilde{x}+h)^T\ \left(\beta\Sigma_F\beta^T+\mathrm{diag}(S_\theta)\right)\ (\tilde{x}+h) \leq \gamma^2$

in conic form as $(\gamma, G^T(\tilde{x}+h))\in\mathcal{Q}^{n+1}$, since we (approximately) factorized

$\beta\Sigma_F\beta^T+\mathrm{diag}(S_\theta) = GG^T$

Let us instead exploit the fact that we already (almost) have a factorization of the covariance matrix on our hands.

If $\Sigma_F = P\cdot P^T$ then $\beta\Sigma_F\beta^T = \beta P P^T \beta^T = (\beta P)(\beta P)^T$

and finally

$(\tilde{x}+h)^T\ \left(\beta\Sigma_F\beta^T+\mathrm{diag}(S_\theta)\right)\ (\tilde{x}+h) = \left\|\left[\begin{array}{c}(\beta P)^T\\ \mathrm{diag}(\sqrt{S_\theta})\end{array}\right](\tilde{x}+h)\right\|_2^2$
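The identity can be verified numerically. The following sketch uses randomly generated data of hypothetical size ($n=6$ assets, $k=2$ factors) and checks that the stacked matrix reproduces the quadratic form; only the small $k\times k$ matrix $\Sigma_F$ is factorized.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2                              # hypothetical problem sizes

beta = rng.standard_normal((n, k))       # factor exposures
A = rng.standard_normal((k, k))
Sigma_F = A @ A.T + 0.1 * np.eye(k)      # positive definite factor covariance
S_theta = rng.uniform(0.01, 0.05, n)     # specific risk vector
x_tilde = rng.standard_normal(n)         # trades
h = rng.standard_normal(n)               # initial holdings

P = np.linalg.cholesky(Sigma_F)          # Sigma_F = P @ P.T, only k x k

# Stacked matrix [(beta P)^T ; diag(sqrt(S_theta))] of shape (k + n) x n.
GT = np.vstack([(beta @ P).T, np.diag(np.sqrt(S_theta))])

y = x_tilde + h
lhs = y @ (beta @ Sigma_F @ beta.T + np.diag(S_theta)) @ y
rhs = np.linalg.norm(GT @ y) ** 2
print(lhs, rhs)
```

The stacked matrix has $k+n$ rows, but its lower block is diagonal, so it is far sparser than a dense $n\times n$ factor of the full covariance matrix.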

As always we can analyze the solution and make some checks:

How many trades are actually substantial?

A typical slice of the trades looks like:

Here it is the user's job to interpret and post-process the solution if necessary.

We could for example choose to

• force those positions to $0$,
• reoptimize.
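The first of these post-processing steps could be sketched as below. The helper name `drop_small_trades`, the threshold and the trade values are all hypothetical; the appropriate threshold depends on the units of the data.

```python
import numpy as np

def drop_small_trades(x, threshold):
    """Zero out trades below the threshold; return cleaned trades and the count kept."""
    x = np.asarray(x, dtype=float)
    substantial = np.abs(x) >= threshold
    return np.where(substantial, x, 0.0), int(substantial.sum())

# Made-up slice of a solution: two real trades and three at noise level.
trades = np.array([1.2e-9, -350.0, 4.0e-7, 125.5, -3.0e-10])
cleaned, count = drop_small_trades(trades, threshold=1e-6)
print(cleaned, count)
```

Zeroing positions can slightly violate constraints such as $1^Tx=0$, which is one reason to reoptimize afterwards with those positions fixed.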

# Factor model summary

• fullModel
  • ignores the factor structure: multiplies inputs which then have to be factorized again (duplicate work),
  • factorizing the big matrix can be numerically delicate.
• factorModel
  • uses the factor structure in an essential way,
  • Cholesky only of a very small matrix, numerically stable,
  • the conic form is essential.

# NB. Efficiency depends on sparsity rather than dimensions

Part 3. Working with the mixed-integer optimizer.

## Mixed-integer optimization (MIO)

• An integer variable $z$ is a variable whose values are restricted to the set of integers, $z\in\{\ldots,-2,-1,0,1,2,\ldots\}$.
• A binary variable is a variable restricted to $\{0,1\}$.
• MOSEK can solve problems with integer variables using the mixed-integer optimizer.

## Applications of integer variables

• Actual integer amounts, for example counting items.
• Binary variables:
  • as an on/off indicator,
  • in modeling if ... then implications,
  • in modeling of functions defined piecewise.

# Example: cardinality constraint

Let us extend the previous model with a cardinality constraint for the number of traded assets:

$\#\ \{i~:~x_i\neq 0\} \ \leq\ k$.

## Mixed-integer model:

1. We introduce a binary variable $z$ of the same length as $x$.
2. We want $z_i = 1$ (on) if we trade the $i$-th asset ($x_i\neq 0$) and $z_i = 0$ (off) otherwise.
3. We know a good common bound $M$ for all trades, $-M\leq x\leq M$ (the so-called big-M).

Then we can express 2. with

$-Mz \leq x \leq Mz$

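and the number of assets for which trading is on equals $\sum_i z_i$, so the cardinality constraint becomes $\sum_i z_i\leq k$. Note that $z_i=0$ forces $x_i=0$, while $z_i=1$ only allows a trade without requiring one, which is enough to bound the number of traded assets.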
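The big-M linking can be illustrated without a solver by checking candidate pairs $(x, z)$ directly. In this sketch the function name `satisfies_big_m` and the values of `M`, `k` and `x` are hypothetical.

```python
import numpy as np

def satisfies_big_m(x, z, M, k):
    """Check -M z <= x <= M z and sum(z) <= k for a binary vector z."""
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)
    linked = np.all(-M * z <= x) and np.all(x <= M * z)
    return bool(linked and z.sum() <= k)

M, k = 10.0, 2
x = np.array([3.0, 0.0, -7.5, 0.0])     # two assets are traded
z = (x != 0).astype(int)                # indicator of traded assets

print(satisfies_big_m(x, z, M, k))              # consistent pair
print(satisfies_big_m(x, [1, 0, 0, 0], M, k))   # z_3 is off but x_3 is nonzero
print(satisfies_big_m(x, z, M, 1))              # cardinality limit too tight
```

The second call fails because switching $z_i$ off collapses the bounds on $x_i$ to $0\leq x_i\leq 0$, which is exactly how the constraint forces untraded assets to zero.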

Let us inspect the solution.

What to consider with the mixed-integer optimizer:

• tuning with solver parameters: level of heuristics, feasibility pump, outer approximation, choice of cuts and much more.
• termination criteria:
  • time limit (very machine-dependent),
  • reaching a given relative optimality gap,
  • finding a given number of feasible solutions.
• initial solution: specifying a good feasible initial point can reduce search time.
• the algorithm is sensitive to problem structure and input data.

Part 4. Final remarks.