A tutorial on MOSEK 10

KTH Royal Institute of Technology
Stockholm, May 19th, 2022

Sven Wiese

www.mosek.com

Outline

Linear Programming

LP in standard form:

$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{s.t.} & Ax = b\\ & x \geq 0. \end{array} $

Pro:

Therefore, we have powerful algorithms and software.

Con:

Nonlinear Programming

The classical nonlinear optimization problem:

$ \begin{array}{ll} \mbox{minimize} & f(x)\\ \mbox{s.t.} & h(x) = 0 \\ & g(x) \leq 0. \end{array} $

Pro:

Con:

Is there a class of optimization problems that

Good partial orderings

Definition (Ben-Tal & Nemirovski, 2001): A good partial ordering of $\mathbb{R}^n$ is a vector relation that satisfies:

  1. reflexivity
  2. antisymmetry
  3. transitivity
  4. compatibility with linear operations

The coordinatewise ordering $$x\geq y \Longleftrightarrow x_i \geq y_i ~\forall i=1,\ldots,n$$

is an example, but not the only one!

The coordinatewise ordering leads to the cone $\mathbb{R}^n_+$.

Conic Programming

In standard form:

$ \begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{s.t.} & Ax = b\\ & x\in\mathcal{K} \end{array} $

where $\mathcal{K}$ is a (closed) pointed convex cone (with non-empty interior).

The beauty of conic optimization

MOSEK

A software package/library for solving:

Current version is MOSEK 10, released March 2022.

Continuous optimization folklore: "Almost all convex constraints that arise in practice are representable using these 5 cones."

More evidence: (Lubin et al., 2016) show that all 333 convex instances in MINLPLIB2 are conic-representable using only 4 of the above cones.

Interfaces to MOSEK

$ \begin{array}{lrcl} \mbox{minimize} & & c^T x \\ \mbox{s.t.} & l^c \leq & Ax & \leq u^c\\ & l^x \leq & x & \leq u^x\\ & & Fx + g & \in \mathcal{K} \end{array} $

Linear Programming case study

Solve a simple diet problem: combine servings of foods to meet minimum and maximum nutrient requirements at the lowest cost.

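$ \begin{array}{lrrr} & \text{Corn} & \text{Milk} & \text{Bread}\\ \hline \text{Cost per serving} & \$0.18 & \$0.23 & \$0.05\\ \text{Vitamin A} & 107 & 500 & 0\\ \text{Calories} & 72 & 121 & 65\\ \text{Max servings} & \text{--} & 10 & 10 \end{array} $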

Total calories should stay between 2,000 and 2,250, and total vitamin A between 5,000 and 50,000.

First create a model:

Add variables to it:

Set the objective:

Add constraints on vitamin A and calorie intake:

Set upper bounds on variables:

And solve:
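To make the model concrete without a solver license, here is a plain-Python sketch (standard library only). The helper solve_diet and its signature are hypothetical, and brute-force vertex enumeration is not what MOSEK does; it only works for a toy of this size. Each constraint is written as $a^Tx\leq b$, every system of three tight constraints is solved exactly with fractions, and the cheapest feasible vertex is kept.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Table data, columns ordered corn, milk, bread.
COST = [Fr("0.18"), Fr("0.23"), Fr("0.05")]
VIT_A = [Fr(107), Fr(500), Fr(0)]
CALORIES = [Fr(72), Fr(121), Fr(65)]
MAX_SERVINGS = [None, 10, 10]  # None = no upper bound (corn)


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve_diet(vit_range, cal_range):
    """Cheapest diet by brute-force vertex enumeration (toy illustration).

    Every constraint is stored as (a, b) meaning a^T x <= b. The calorie upper
    bound plus x >= 0 makes the feasible set a bounded polytope, so an optimum
    (if any) sits at a vertex where three independent constraints are tight.
    Returns (cost, [corn, milk, bread]) or None if the LP is infeasible.
    """
    rows = []
    for coef, (lo, hi) in ((VIT_A, vit_range), (CALORIES, cal_range)):
        rows.append(([-c for c in coef], -Fr(lo)))    # coef^T x >= lo
        rows.append((list(coef), Fr(hi)))             # coef^T x <= hi
    for j in range(3):
        unit = [Fr(int(i == j)) for i in range(3)]
        rows.append(([-u for u in unit], Fr(0)))      # x_j >= 0
        if MAX_SERVINGS[j] is not None:
            rows.append((unit, Fr(MAX_SERVINGS[j])))  # x_j <= max servings

    best = None
    for triple in combinations(rows, 3):
        A = [a for a, _ in triple]
        b = [rhs for _, rhs in triple]
        d = det3(A)
        if d == 0:
            continue  # the three planes do not meet in a single point
        # Cramer's rule: x_k = det(A with column k replaced by b) / det(A)
        x = [det3([[b[r] if c == k else A[r][c] for c in range(3)]
                   for r in range(3)]) / d for k in range(3)]
        feasible = all(sum(ai * xi for ai, xi in zip(a, x)) <= rhs
                       for a, rhs in rows)
        if feasible:
            cost = sum(ci * xi for ci, xi in zip(COST, x))
            if best is None or cost < best[0]:
                best = (cost, x)
    return best
```

Called as solve_diet((5000, 50000), (2000, 2250)) (vitamin A range first, then calories), it returns a cost of 63/20 = 3.15 with 35/18 ≈ 1.94 servings of corn and 10 servings each of milk and bread. With the two ranges exchanged it returns None: capping vitamin A at 2,250 limits corn and milk so much that, even with 10 servings of bread, total calories stay below 5,000.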

Try some linear algebra utilities:

Linear operators available in Fusion's Expr class:

Quadratic Cone Programming

The quadratic cone family:

$ \mathcal{Q}^n = \{ x\in \mathbb{R}^n \mid x_1 \geq \left( x_2^2 + \cdots + x_n^2 \right)^{1/2} = \|x_{2:n}\|_2 \} $
$ \mathcal{Q}_r^n = \{ x\in \mathbb{R}^n \mid 2x_1 x_2 \geq x_3^2 + \cdots + x_n^2 = \|x_{3:n}\|_2^2, \, x_1, x_2\geq 0\} $

The two cones are equivalent in the sense that $x\in\mathcal{Q}^n\Longleftrightarrow T_nx\in \mathcal{Q}^n_r$ with $ T_n = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0\\ 1/\sqrt{2} & -1/\sqrt{2} & 0\\ 0 & 0 & I_{n-2} \end{pmatrix}. $
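A quick numerical sanity check of this equivalence, as a plain-Python sketch (in_Q, in_Qr and T are illustrative names, not MOSEK functions). It samples random points and compares membership of $x$ in $\mathcal{Q}^n$ with membership of $T_nx$ in $\mathcal{Q}_r^n$, using a small tolerance for floating-point rounding.

```python
import math
import random


def in_Q(x, tol=1e-9):
    """x in Q^n  <=>  x_1 >= ||x_{2:n}||_2."""
    return x[0] >= math.sqrt(sum(v * v for v in x[1:])) - tol


def in_Qr(x, tol=1e-9):
    """x in Q_r^n  <=>  2 x_1 x_2 >= ||x_{3:n}||_2^2 with x_1, x_2 >= 0."""
    return (x[0] >= -tol and x[1] >= -tol
            and 2 * x[0] * x[1] >= sum(v * v for v in x[2:]) - tol)


def T(x):
    """Apply T_n: mix the first two coordinates, leave x_{3:n} unchanged."""
    s = 1 / math.sqrt(2)
    return [s * (x[0] + x[1]), s * (x[0] - x[1])] + list(x[2:])


random.seed(0)
# Compare both membership tests on random points in [-3, 3]^5.
agree = all(in_Q(x) == in_Qr(T(x))
            for x in ([random.uniform(-3, 3) for _ in range(5)]
                      for _ in range(2000)))
```

The algebra behind it: the first two entries of $T_nx$ multiply to $(x_1^2-x_2^2)/2$ and are both nonnegative exactly when $x_1\geq|x_2|$. For instance $(2,1,1)$ lies in $\mathcal{Q}^3$ and its image lies in $\mathcal{Q}_r^3$, while $(1,1,1)$ is in neither.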

Conic quadratic modeling

$ c^Tx + d \geq \| Ax + b\|_2 \quad \Longleftrightarrow \quad (c^Tx + d, Ax+b) \in \mathcal{Q}^{m+1} $
$ t\geq\| x \|_2^2 \quad \Longleftrightarrow \quad (t, 1/2, x) \in \mathcal{Q}_r^{n+2} $
$ t\geq (1/2)x^T Q x \quad \Longleftrightarrow \quad (t, 1, F^Tx) \in \mathcal{Q}_r^{k+2}\text{ with }Q=FF^T, F\in\mathbb{R}^{n\times k}. $
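For the last rule, the factorization is taken as $Q=FF^T$ with $F\in\mathbb{R}^{n\times k}$, so that $x^TQx=\Vert F^Tx\Vert_2^2$ and $2\cdot t\cdot 1\geq\Vert F^Tx\Vert_2^2$ is exactly $t\geq(1/2)x^TQx$. A small integer check in plain Python (hypothetical helper names, no MOSEK calls):

```python
def matT_vec(F, x):
    """F^T x for F given as a list of n rows of length k."""
    k = len(F[0])
    return [sum(F[i][j] * x[i] for i in range(len(F))) for j in range(k)]


def quad_form(F, x):
    """x^T Q x with Q = F F^T formed explicitly."""
    n, k = len(F), len(F[0])
    Q = [[sum(F[i][l] * F[j][l] for l in range(k)) for j in range(n)]
         for i in range(n)]
    return sum(x[i] * Q[i][j] * x[j] for i in range(n) for j in range(n))


def in_Qr(v):
    """v in Q_r: 2 v_1 v_2 >= ||v_{3:}||^2 with v_1, v_2 >= 0 (exact for ints)."""
    return v[0] >= 0 and v[1] >= 0 and 2 * v[0] * v[1] >= sum(w * w for w in v[2:])
```

With F = [[1, 0], [2, 1], [0, 3]] and x = [1, -1, 2], both quad_form(F, x) and the squared norm of F^T x = [-1, 5] equal 26, so $(13, 1, F^Tx)$ lies on the boundary of $\mathcal{Q}_r^4$, while $(12, 1, F^Tx)$ does not.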

Any convex (MI)QCQP can be cast in conic form!

Conic quadratic case study: least squares regression

Simplest form: given observations $y\in\mathbb{R}^n$ and features $X\in\mathbb{R}^{n\times d}$, solve


$ \min_{w\in\mathbb{R}^d}\Vert y - Xw\Vert_2 $

Put it in the conic framework - we need a bit of reformulation:

$ \begin{array}{lrcl} \mbox{minimize} & & c^T x \\ \mbox{s.t.} & l^c \leq & Ax & \leq u^c\\ & l^x \leq & x & \leq u^x\\ & & Fx + g & \in \mathcal{K} \end{array} $

Remember: $\mathcal{Q}^n = \{ x\in \mathbb{R}^n \mid x_1 \geq \left( x_2^2 + \cdots + x_n^2 \right)^{1/2} = \|x_{2:n}\|_2 \}$

Solution:

$ \begin{array}{ll} \mbox{minimize} & t\\ \mbox{s.t.} & t\geq\Vert y-Xw\Vert_2 \\ & t\in\mathbb{R}, w\in\mathbb{R}^d \end{array} $
$ \begin{array}{ll} \mbox{minimize} & t\\ \mbox{s.t.} & (t, y-Xw)\in \mathcal{Q}^{n+1}\\ & t\in\mathbb{R}, w\in\mathbb{R}^d \end{array} $

The regression coefficients and bound on the norm:

The quadratic cone constraint, note the (vertical) stacking:

The objective:
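One way to picture the vertical stacking in the affine form $Fx+g\in\mathcal{K}$: take the variable vector $x=(t,w)$. The first row of $F$ then selects $t$ and the remaining $n$ rows produce $y-Xw$. The sketch below builds $F$ and $g$ as plain Python lists and checks cone membership for a given point; the helper names are hypothetical and this is not the MOSEK API.

```python
def lsq_conic_data(X, y):
    """Return (F, g) such that F @ (t, w) + g == (t, y - X w).

    Row 0 picks out t; rows 1..n give the residuals, i.e. the scalar t is
    stacked on top of the vector y - X w.
    """
    d = len(X[0])
    F = [[1.0] + [0.0] * d]                                # t
    F += [[0.0] + [-float(v) for v in row] for row in X]   # -X w
    g = [0.0] + [float(v) for v in y]                      # + y (0 for the t row)
    return F, g


def affine(F, g, x):
    """Evaluate F x + g with plain lists."""
    return [sum(f * xi for f, xi in zip(row, x)) + gi for row, gi in zip(F, g)]


def in_Q(v):
    """v in Q^m  <=>  v_1 >= ||v_{2:m}||_2."""
    return v[0] >= sum(u * u for u in v[1:]) ** 0.5
```

For example, with X = [[1, 0], [0, 1], [1, 1]], y = [1, 2, 4] and w = (1, 2), the residual is (0, 0, 1), so the point (t, w) = (1, 1, 2) maps to (1, 0, 0, 1), which lies on the boundary of $\mathcal{Q}^4$; with t = 0.5 the cone constraint is violated.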

Advanced conic quadratic modeling: simple rational powers

Portfolio optimization with market impact:

$ \begin{array}{ll} \mbox{maximize} & \mu^Tx\\ \mbox{s.t.} & \sum_ix_i \color{red}{+ a^Tt} = 1 \\ & \color{red}{t_i = x_i^{3/2}\qquad\forall i}\\ & x^T\Sigma x \leq \sigma^2 \\ & x \geq 0 \end{array} $

$x^T\Sigma x\leq \sigma^2$ is a convex quadratic constraint (easy to reformulate as above), but how do we model $t_i\geq x_i^{3/2}$?

For some constant $c$, together with $x\geq 0$,

$ (s,t,x), (x,c,s)\in\mathcal{Q}_r^3, s\in\mathbb{R} $

implies $t_i \geq x_i^{3/2}$.

What's the value of $c$? Remember:

$ \mathcal{Q}_r^n = \{ x\in \mathbb{R}^n \mid 2x_1 x_2 \geq x_3^2 + \cdots + x_n^2 = \|x_{3:n}\|_2^2, \, x_1, x_2\geq 0\} $

Solution:

$ (s,t,x), (x,1/8,s)\in\mathcal{Q}_r^3\implies 2st\geq x^2, 2x\cdot\frac{1}{8}\geq s^2 $
$ \implies 4s^2t^2\cdot 2x\cdot \frac{1}{8}\geq x^4\cdot s^2\implies t^2\geq x^3\implies t\geq x^{3/2} $
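The derivation can also be checked numerically. The second constraint forces $s\leq\sqrt{x}/2$, and the first ($2st\geq x^2$ with $t\geq 0$) is easiest to satisfy with the largest admissible $s$, so a pair $(x,t)$ is feasible exactly when $s=\sqrt{x}/2$ works. A plain-Python sketch with $c=1/8$ (hypothetical helper names):

```python
import math


def in_Qr3(a, b, c, tol=1e-12):
    """(a, b, c) in Q_r^3  <=>  2ab >= c^2 with a, b >= 0."""
    return a >= -tol and b >= -tol and 2 * a * b >= c * c - tol


def feasible(x, t):
    """Try the largest s allowed by (x, 1/8, s) in Q_r^3, i.e. s = sqrt(x)/2,
    and report whether both rotated cone constraints then hold (x >= 0)."""
    s = math.sqrt(x) / 2
    return in_Qr3(s, t, x) and in_Qr3(x, 1 / 8, s)
```

For $t=x^{3/2}$ both cones are tight at $s=\sqrt{x}/2$, and lowering $t$ by 1% makes the pair infeasible for every $x>0$.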

Symmetric vs Nonsymmetric cones

The LP and the quadratic cones are so-called symmetric cones: they are homogeneous and self-dual.

There are several nonsymmetric cones, exponential and power cones being two examples.

Power Cone Programming

$n$-dimensional power cone:

$ \mathcal{P}^\alpha_n = \{ x\in\mathbb{R}^n \mid x_1^\alpha x_2^{(1-\alpha)} \geq \left( x_3^2 + \cdots + x_n^2 \right)^{1/2} = \|x_{3:n}\|_2, \: x_1, x_2 \geq 0 \} $

with parameter $0 < \alpha < 1$.

More generally:

$ \mathcal{P}^{(\alpha_1, \ldots, \alpha_{n_l})}_n = \{ x\in\mathbb{R}^n \mid \prod\limits_{i=1}^{n_l} x_i^{\beta_i} \geq \left( x_{n_l+1}^2 + \cdots + x_n^2 \right)^{1/2} = \|x_{n_l+1:n}\|_2, \: x_1,\ldots, x_{n_l} \geq 0 \} $

where $\beta_i = \alpha_i / (\sum_j\alpha_j)$.

Power cone modeling

$ |t| \leq x^p, x \geq 0 \text{ with } 0 < p < 1 \Longleftrightarrow (x, 1, t)\in\mathcal{P}_3^{p} $
$ (x_1\cdots x_n)^{1/n}\geq|z| \Longleftrightarrow (x,z)\in\mathcal{P}_{n+1}^{(1/n, \ldots,1/n)} $
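A plain-Python membership test for the general power cone, following the definition above including the normalization $\beta_i=\alpha_i/\sum_j\alpha_j$. The function name and tolerance are illustrative assumptions; it can be used to spot-check both modeling rules.

```python
import math


def in_power_cone(x, alphas, tol=1e-9):
    """Membership in P_n^{(alpha_1, ..., alpha_{n_l})} with n_l = len(alphas):
    the first n_l entries are >= 0 and prod x_i^{beta_i} >= ||rest||_2."""
    nl = len(alphas)
    lead, rest = x[:nl], x[nl:]
    if any(v < -tol for v in lead):
        return False
    total = sum(alphas)
    betas = [a / total for a in alphas]  # beta_i = alpha_i / sum_j alpha_j
    lhs = math.prod(max(v, 0.0) ** b for v, b in zip(lead, betas))
    return lhs >= math.sqrt(sum(v * v for v in rest)) - tol
```

For example, $(4,1,-2)\in\mathcal{P}_3^{1/2}$ because $4^{1/2}\geq|-2|$, and $(1,4,16,4)\in\mathcal{P}_4^{(1/3,1/3,1/3)}$ because the geometric mean of 1, 4 and 16 is 4. Scaling all $\alpha_i$ by the same factor does not change the cone.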

Power cone case study: Portfolio optimization with market impact

From above:

$ \begin{array}{ll} \mbox{maximize} & \mu^Tx\\ \mbox{s.t.} & \sum_ix_i + a^Tt = 1 \\ & t_i = x_i^{3/2}\qquad\forall i\\ & x^T\Sigma x \leq \sigma^2 \\ & x \geq 0 \end{array} $

For $t \geq x^{3/2}$ use the power cone (instead of $(s,t,x), (x,1/8,s)\in \mathcal{Q}_r^3$...). Remember:

$$ \mathcal{P}^\alpha_3 = \{ x\in\mathbb{R}^3 \mid x_1^\alpha x_2^{(1-\alpha)} \geq | x_3|, \: x_1, x_2 \geq 0 \} $$

Solution: $$t \geq |x|^{3/2} \Longleftrightarrow (t,1,x)\in\mathcal{P}_3^{(2/3,1/3)}.$$
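Fixing the middle entry to 1 in $\mathcal{P}_3^{2/3}$ leaves $t^{2/3}\geq|x|$, i.e. $t\geq|x|^{3/2}$. A small plain-Python check of this rule (in_P3 is a hypothetical helper with a tolerance for rounding):

```python
def in_P3(x1, x2, x3, alpha, tol=1e-9):
    """(x1, x2, x3) in P_3^alpha  <=>  x1^alpha * x2^(1-alpha) >= |x3|, x1, x2 >= 0."""
    if x1 < -tol or x2 < -tol:
        return False
    return max(x1, 0.0) ** alpha * max(x2, 0.0) ** (1 - alpha) >= abs(x3) - tol
```

For each test value of $x$, the point $(|x|^{3/2}, 1, x)$ is in the cone, while shrinking the first entry by 1% leaves it for $x\neq 0$. In the portfolio model the same rule is applied once per asset.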

Budget constraint $\sum_ix_i + a^Tt = 1$:

One power cone constraint for each asset: $(t_i, 1, x_i) \in\mathcal{P}_3^{(2/3,1/3)}$

Equivalent vectorization:

The convex quadratic risk constraint $x^T\Sigma x \leq \sigma^2$:

Set objective and solve:

Exponential Cone Programming

The exponential cone: closure of the epigraph of the perspective of the exponential function:

$ \mathcal{K}_{exp} := \text{cl}\{ x\in\mathbb{R}^3 \mid x_1 \geq x_2 \exp(x_3/x_2), \: x_2 > 0 \} $

or more explicitly

$ \mathcal{K}_{exp} = \{ (x_1, x_2, x_3) \mid x_1 \geq x_2 \exp(x_3/x_2), \: x_2 > 0 \}\bigcup\{(x_1, 0, x_3) \mid x_1 \geq 0, x_3\leq 0\} $
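A plain-Python membership test following the explicit description above (an illustrative helper in floating point, without tolerance). For $x_2>0$ it compares logarithms instead of evaluating $x_2\exp(x_3/x_2)$, which would overflow for large $x_3/x_2$.

```python
import math


def in_exp_cone(x1, x2, x3):
    """Membership test for K_exp using the explicit two-part description."""
    if x2 > 0:
        # x1 >= x2 * exp(x3 / x2)  <=>  x1 > 0 and log(x1 / x2) >= x3 / x2
        # (comparing logs avoids overflow in exp when x3 / x2 is large)
        return x1 > 0 and math.log(x1 / x2) >= x3 / x2
    # Closure part: x2 == 0 with x1 >= 0 and x3 <= 0
    return x2 == 0 and x1 >= 0 and x3 <= 0
```

For example, $(3,1,1)\in\mathcal{K}_{exp}$ since $3\geq e$, while $(2,1,1)$ is not; $(1,0,-5)$ belongs to the closure part and $(1,0,1)$ does not.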

Exponential cone modeling

Exponential cone case study: logistic regression

Given $n$ binary-labeled training points $\{ (x_i, y_i) \}$ in $\mathbb{R}^{d+1}$ (features $x_i\in\mathbb{R}^d$, labels $y_i\in\{0,1\}$), determine the classifier

$ h_\theta(x) = \frac{1}{1+\exp(-\theta^T x)} $
$\approx$ "probability that $x$ belongs to class 1".

Log-likelihood training:

$ \begin{array}{lll} \mbox{minimize} & \sum_i t_i \\ \mbox{s.t.} & t_i \geq \log( 1 + \exp(-\theta^T x_i)), & y_i = 1\\ & t_i \geq \log( 1 + \exp(\theta^T x_i)), & y_i = 0\\ & \theta\in\mathbb{R}^d \end{array} $

How to model the softplus function $t\geq\log(1+\exp(z))$?

Hint: write as $e^{\ldots} + e^{\ldots} \leq 1$ and use 2 cone constraints as in

$ e^x\leq t \quad \Longleftrightarrow \quad (t,1,x)\in \mathcal{K}_{exp} $

Solution: $$ u + v \leq 1, (u,1,z-t), (v,1,-t)\in\mathcal{K}_{exp}\implies e^{z-t} + e^{-t} \leq u + v \leq 1 $$

$$ \implies e^t\geq 1 + e^z\implies t\geq\log(1+\exp(z)) $$
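The solution can be checked numerically: at $t=\log(1+e^z)$ the smallest admissible $u=e^{z-t}$ and $v=e^{-t}$ sum to exactly 1, and any smaller $t$ pushes $u+v$ above 1. A plain-Python sketch (helper names are hypothetical; softplus uses the standard stable rewriting $\max(z,0)+\log(1+e^{-|z|})$):

```python
import math


def softplus(z):
    """log(1 + exp(z)), computed stably for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def u_v(z, t):
    """Smallest u, v allowed by (u, 1, z - t), (v, 1, -t) in K_exp."""
    return math.exp(z - t), math.exp(-t)
```

Since $u+v=e^{-t}(1+e^z)$, the constraint $u+v\leq 1$ holds with equality at $t=\log(1+e^z)$ and fails for every smaller $t$.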

Take monomials with some maximum degree $D$ as features, where $p_i = (p_{i1}, p_{i2})\in\mathbb{R}^2$ are the raw coordinates of the $i$-th point: $$\theta^T x_i = \sum_{(a,b)~:~a+b\leq D}\theta_{(a,b)}p_{i1}^ap_{i2}^b$$
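A sketch of this feature map in plain Python (helper names are hypothetical). It lists all exponent pairs with $a+b\leq D$, of which there are $(D+1)(D+2)/2$, and evaluates the corresponding monomials at a 2D point.

```python
def monomial_exponents(D):
    """All exponent pairs (a, b) with a, b >= 0 and a + b <= D."""
    return [(a, b) for a in range(D + 1) for b in range(D + 1 - a)]


def monomial_features(p, D):
    """Map a 2D point p = (p1, p2) to [p1**a * p2**b for each (a, b)]."""
    return [p[0] ** a * p[1] ** b for a, b in monomial_exponents(D)]
```

For $D=1$ the exponent order is (0,0), (0,1), (1,0), so the point (2, 3) maps to [1, 3, 2]; $\theta$ then has one entry per exponent pair.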

Algorithms employed by MOSEK

The interior-point optimizer log:

The interior-point optimizer solves the so-called homogeneous model, and converges towards

at the same time!

... or otherwise provides certificates of primal or dual infeasibility (modulo ill-posedness).

Integer variables

MOSEK allows integer variables together with all cone types seen until now.

Example: Logistic regression problem as above with feature selection:

$ \begin{array}{lll} \mbox{minimize} & \sum_i t_i {\color{red}{~+~ F\cdot\lvert\{j~|~\theta_j \neq 0\}\rvert}}\\ \mbox{s.t.} & t_i \geq \log( 1 + \exp(-\theta^T x_i)), & y_i = 1\\ & t_i \geq \log( 1 + \exp(\theta^T x_i)), & y_i = 0\\ & \theta\in\mathbb{R}^d \end{array} $

Modeled with big-M constraints, where the constant $M$ must be large enough to bound every $|\theta_j|$ at an optimal solution:

$ \begin{array}{lll} \mbox{minimize} & \sum_i t_i {\color{red}{~+~ F\cdot\sum_jz_j}}\\ \mbox{s.t.} & t_i \geq \log( 1 + \exp(-\theta^T x_i)), & y_i = 1\\ & t_i \geq \log( 1 + \exp(\theta^T x_i)), & y_i = 0\\ & {\color{red}{-Mz_j\leq\theta_j \leq Mz_j}}&{\color{red}{\forall j}}\\ & {\color{red}{z\in\{0,1\}^d}}\\ & \theta\in\mathbb{R}^d \end{array} $

Semidefinite Programming

The PSD cone can be defined in matrix space $\mathbb{R}^{n\times n}$:

$ \mathbb{S}^n_+ := \{ X\in \mathbb{R}^{n\times n} \mid X\in\mathbb{S}^n, z^T X z \geq 0, \: \forall z \in\mathbb{R}^n \} $

If desired, one may also work in the vector space $\mathbb{R}^{n(n+1)/2}$:

$ \mathcal{S}^{n(n+1)/2} := \{ \text{svec}(X) \mid X\in\mathbb{S}_+^n\} \subset \mathbb{R}^{n(n+1)/2} $

with $ \text{svec}(X) := (X_{11}, \sqrt{2}X_{21}, \ldots, \sqrt{2}X_{n1}, X_{22}, \sqrt{2}X_{32}, \ldots, X_{nn})^T $

Note that $\Vert X\Vert_F = \Vert \text{svec}(X)\Vert_2$.
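A plain-Python sketch of svec as defined above (lower triangle column by column, off-diagonal entries scaled by $\sqrt{2}$; helper names are illustrative), together with a check of the norm identity:

```python
import math


def svec(X):
    """Lower triangle of symmetric X, column by column, off-diagonals times sqrt(2)."""
    n = len(X)
    return [X[i][j] if i == j else math.sqrt(2) * X[i][j]
            for j in range(n) for i in range(j, n)]


def frobenius(X):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in X for v in row))
```

The $\sqrt{2}$ scaling is what makes the identity hold: each off-diagonal entry appears twice in $\Vert X\Vert_F^2$ but only once in $\text{svec}(X)$.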

SDP case study: nearest correlation matrix

Let $C\in\mathbb{S}^n$ and assume we want to find its nearest correlation matrix

$ X^*\in\mathcal{C}:=\{X\in\mathbb{S}^n_+~|~X_{ii} = 1 ~\forall i=1,\ldots, n\}, $

i.e. $X^* = \arg\min_{X\in\mathcal{C}}\Vert C - X\Vert_F.$

$ \begin{array}{ll} \mbox{minimize} & t\\ \mbox{s.t.} & (t, \text{svec}(C-X))\in\mathcal{Q}^{n(n+1)/2 + 1} \\ & \text{diag}(X) = e\\ & X\in \mathbb{S}^{n}_+. \end{array} $

The PSD cone is also a symmetric cone!

Are there more cones?

"Exotic" cones:

But for every new cone, questions arise:

Do we need more cones?

Remember the folklore: "Almost all convex constraints that arise in practice are representable using these 5 cones."

Further remarks

Further reading

Questions?