13.4 Conic Optimization

For conic optimization problems only an interior-point type optimizer is available.

13.4.1 The Interior-point optimizer

13.4.1.1 The homogeneous primal-dual problem

The interior-point optimizer is an implementation of the so-called homogeneous and self-dual algorithm. For a detailed description of the algorithm, please see [ART03]. In order to keep our discussion simple we will assume that MOSEK solves a conic optimization problem of the form:

(1)\[\begin{split}\begin{array}{lcll} \mbox{minimize} & c^T x & & \\ \mbox{subject to} & A x & = & b, \\ & x \in \K & & \end{array}\end{split}\]

where \(\K\) is a convex cone. The corresponding dual problem is

(2)\[\begin{split}\begin{array}{lcll} \mbox{maximize} & b^T y & & \\ \mbox{subject to} & A^T y + s & = & c, \\ & s \in \K^* & & \end{array}\end{split}\]

where \(\K^*\) is the dual cone of \(\K\). See Sec. 12.2 (Conic Quadratic Optimization) for definitions.

Since it is not known beforehand whether problem (1) has an optimal solution, is primal infeasible or is dual infeasible, the optimization algorithm must deal with all three situations. This is the reason that MOSEK solves the so-called homogeneous model

(3)\[\begin{split}\begin{array}{rcl} A x - b \tau & = & 0, \\ A^T y + s - c \tau & = & 0, \\ -c^T x + b^T y - \kappa & = & 0, \\ x & \in & \K, \\ s & \in & \K^*, \\ \tau ,\kappa & \geq & 0, \end{array}\end{split}\]

where \(y\) and \(s\) correspond to the dual variables in (1), and \(\tau\) and \(\kappa\) are two additional scalar variables. Note that the homogeneous model (3) always has a solution since

\[(x,y,s,\tau ,\kappa ) = (0,0,0,0,0)\]

is a solution, although not a very interesting one. Any solution

\[(x^*,y^*,s^*,\tau^*,\kappa^*)\]

to the homogeneous model (3) satisfies

\[(x^*)^T s^* + \tau^* \kappa^* = 0\]

i.e. complementarity. Observe that \(x^* \in \K\) and \(s^* \in \K^*\) implies

\[(x^*)^T s^* \geq 0\]

and therefore

\[\tau^* \kappa^* = 0\]

since \(\tau^*, \kappa^* \geq 0\). Hence, at least one of \(\tau^*\) and \(\kappa^*\) is zero.

First, assume that \(\tau^*>0\) and hence \(\kappa^*=0\). It follows that

\[\begin{split}\begin{array}{rcl} A \frac{x^*}{\tau^*} & = & b, \\ A^T \frac{y^*}{\tau^*} + \frac{s^*}{\tau^*} & = & c, \\ - c^T \frac{x^*}{\tau^*} + b^T\frac{y^*}{\tau^*} & = & 0, \\ x^*/\tau^* & \in & \K, \\ s^*/\tau^* & \in & \K^*. \end{array}\end{split}\]

This shows that \(\frac{x^*}{\tau^*}\) is a primal optimal solution and \((\frac{y^*}{\tau^*},\frac{s^*}{\tau^*})\) is a dual optimal solution; this is reported as the optimal interior-point solution since

\[(x,y,s) = \left ( \frac{x^*}{\tau^*},\frac{y^*}{\tau^*},\frac{s^*}{\tau^*} \right )\]

is a primal-dual optimal solution.
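The scaling step above can be checked on a tiny instance. The following pure-Python sketch (the one-variable problem and the homogeneous solution are toy data of our own choosing, not MOSEK output) verifies the homogeneous equations (3) for a point with \(\tau^* > 0\) and then recovers an optimal primal-dual pair by dividing by \(\tau^*\):

```python
# Toy instance (our own illustration, not MOSEK code):
#   minimize  x   subject to  x = 1,  x >= 0   (K = R_+)
A, b, c = [[1.0]], [1.0], [1.0]

# A solution to the homogeneous model (3) with tau > 0:
x, y, s, tau, kappa = [2.0], [2.0], [0.0], 2.0, 0.0

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
matvec = lambda M, v: [dot(row, v) for row in M]

# Verify the homogeneous equations (3).
assert all(abs(r - bi * tau) < 1e-12 for r, bi in zip(matvec(A, x), b))
assert all(abs(dot([row[j] for row in A], y) + s[j] - c[j] * tau) < 1e-12
           for j in range(len(c)))
assert abs(-dot(c, x) + dot(b, y) - kappa) < 1e-12
assert abs(dot(x, s) + tau * kappa) < 1e-12          # complementarity

# Since tau > 0, scaling by 1/tau recovers an optimal primal-dual pair
# with zero duality gap.
x_opt = [xi / tau for xi in x]
y_opt = [yi / tau for yi in y]
print(x_opt, y_opt, dot(c, x_opt) - dot(b, y_opt))
```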

On the other hand, if \(\kappa^* >0\) then

\[\begin{split}\begin{array}{rcl} A x^* & = & 0, \\ A^T y^* + s^* & = & 0, \\ -c^T x^* + b^T y^* & = & \kappa^*, \\ x^* & \in & \K, \\ s^* & \in & \K^*. \end{array}\end{split}\]

This implies that at least one of

(4)\[c^T x^* < 0\]

or

(5)\[b^T y^* > 0\]

holds. If (4) is satisfied, then \(x^*\) is a certificate of dual infeasibility, whereas if (5) holds then \(y^*\) is a certificate of primal infeasibility.

In summary, by computing an appropriate solution to the homogeneous model, all information required for a solution to the original problem is obtained. A solution to the homogeneous model can be computed using a primal-dual interior-point algorithm [And09].

13.4.1.2 Interior-point Termination Criterion

Since computations are performed in finite precision, and for efficiency reasons, it is not possible to solve the homogeneous model exactly in general. Hence, an exact optimal solution or an exact infeasibility certificate cannot be computed and a reasonable termination criterion has to be employed.

In every iteration \(k\) of the interior-point algorithm a trial solution

\[(x^k,y^k,s^k,\tau^k,\kappa^k)\]

to the homogeneous model is generated, where

\[x^k \in \K, s^k \in \K^*, \tau^k, \kappa^k > 0.\]

Therefore, it is possible to compute the values:

\[\begin{split}\begin{array}{rcl} \rho_p^k & = & \mbox{arg min}_{\rho} \left \{\rho \mid \, \left\| A \frac{x^k}{\tau^k} - b \right\|_\infty \leq \rho \varepsilon_p (1+\left\| b \right\|_\infty ) \right \}, \\ \rho_d^k & = & \mbox{arg min}_{\rho} \left \{\rho \mid \, \left\| A^T \frac{y^k}{\tau^k} + \frac{s^k}{\tau^k}- c \right\|_\infty \leq \rho \varepsilon_d (1+\left\| c \right\|_\infty ) \right \}, \\ \rho_g^k & = & \mbox{arg min}_{\rho} \left \{\rho \mid \, \left( \frac{(x^k)^T s^k}{ (\tau^k)^2 }, \vert \frac{c^T x^k}{\tau^k} - \frac{ b^T y^k}{\tau^k} \vert \right) \leq \rho \varepsilon_g \max \left( 1, \frac{ \min\left( \left\vert c^T x^k \right\vert, \left\vert b^T y^k \right\vert \right) }{\tau^k} \right) \right \}, \\ \rho_{pi}^k & = & \mbox{arg min}_{\rho} \left \{\rho \mid \, \left\| A^T y^k + s^k \right \|_\infty \leq \rho \varepsilon_i b^T y^k,\, b^T y^k > 0 \right \} \mbox{ and} \\ \rho_{di}^k & = & \mbox{arg min}_{\rho} \left \{\rho \mid \, \left\| A x^k \right \|_\infty \leq - \rho \varepsilon_i c^T x^k,\, c^T x^k < 0 \right \}. \\ \end{array}\end{split}\]

Note that \(\varepsilon_p, \varepsilon_d, \varepsilon_g \mbox{ and } \varepsilon_i\) are nonnegative user-specified tolerances.
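The first two measures are simply residual norms scaled by the corresponding tolerance. As a sketch (using toy data of our own choosing, not anything computed by MOSEK), \(\rho_p^k\) and \(\rho_d^k\) can be evaluated like this:

```python
# Evaluate the feasibility measures rho_p and rho_d for a trial point
# (toy data of our own; in MOSEK these quantities are computed internally).
inf_norm = lambda v: max(abs(vi) for vi in v)
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

A = [[1.0, 1.0]]
b = [1.0]
c = [1.0, 2.0]
eps_p = eps_d = 1.0e-8

# Trial iterate (x^k, y^k, s^k, tau^k) with a small primal residual.
x, y, s, tau = [0.5 + 1e-9, 0.5], [1.0], [0.0, 1.0], 1.0

r_p = [dot(row, [xi / tau for xi in x]) - bi for row, bi in zip(A, b)]
r_d = [dot([row[j] for row in A], y) / tau + s[j] / tau - c[j]
       for j in range(len(c))]

# rho <= 1 means the residual is within the requested tolerance.
rho_p = inf_norm(r_p) / (eps_p * (1.0 + inf_norm(b)))
rho_d = inf_norm(r_d) / (eps_d * (1.0 + inf_norm(c)))
print(rho_p <= 1.0, rho_d <= 1.0)
```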

Optimal Case

Observe \(\rho_p^k\) measures how far \(x^k/\tau^k\) is from being a good approximate primal feasible solution. Indeed if \(\rho_p^k \leq 1\), then

(6)\[\left\| A \frac{x^k}{\tau^k} - b \right\|_\infty \leq \varepsilon_p (1+\left\| b \right\|_\infty ).\]

This shows that the violations of the primal equality constraints for the solution \(x^k/\tau^k\) are small compared to the size of \(b\), provided that \(\varepsilon_p\) is small.

Similarly, if \(\rho_d^k \leq 1\), then \((y^k,s^k)/\tau^k\) is an approximate dual feasible solution. If in addition \(\rho_g^k \leq 1\), then the solution \((x^k,y^k,s^k)/\tau^k\) is approximately optimal because the associated primal and dual objective values are almost identical.

In other words, if \(\max(\rho_p^k,\rho_d^k,\rho_g^k) \leq 1\), then

\[\frac{(x^k,y^k,s^k)}{\tau^k}\]

is an approximate optimal solution.

Dual Infeasibility Certificate

Next assume that \(\rho_{di}^k \leq 1\) and hence

\[\left\| A x^k \right \|_\infty \leq - \varepsilon_i c^T x^k \mbox{ and } -c^T x^k > 0\]

holds. In this case the problem is declared dual infeasible and \(x^k\) is reported as a certificate of dual infeasibility. The motivation for this stopping criterion is as follows. Let

\[\bar{x} := \frac{x^k} {- c^T x^k }\]

Then it is easy to verify that

\[\left\| A \bar x \right\|_\infty \leq \varepsilon_i \mbox{ and } c^T \bar{x} = -1\]

which shows \(\bar{x}\) is an approximate certificate of dual infeasibility, where \(\varepsilon_{i}\) controls the quality of the approximation.
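The normalization of \(x^k\) into \(\bar{x}\) can be illustrated on a tiny unbounded problem (the instance and iterate below are toy data of our own choosing, not MOSEK output):

```python
# Normalizing a dual infeasibility certificate (toy data of our own).
# Problem: minimize -x subject to 0*x = 0, x >= 0
# The primal objective is unbounded below, i.e. the dual is infeasible.
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
inf_norm = lambda v: max(abs(vi) for vi in v)

A, c = [[0.0]], [-1.0]
eps_i = 1.0e-10

x = [4.0]                       # trial iterate with c^T x < 0
assert dot(c, x) < 0
assert inf_norm([dot(row, x) for row in A]) <= -eps_i * dot(c, x)

# Scale so that c^T x_bar = -1; the residual bound ||A x_bar|| <= eps_i
# then certifies dual infeasibility up to the tolerance.
x_bar = [xi / -dot(c, x) for xi in x]
print(x_bar, dot(c, x_bar))
```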

Primal Infeasibility Certificate

Next assume that \(\rho_{pi}^k \leq 1\) and hence

\[\left\| A^T y^k + s^k \right \|_\infty \leq \varepsilon_i b^T y^k \mbox{ and } b^T y^k > 0\]

holds. In this case the problem is declared primal infeasible and \((y^k,s^k)\) is reported as a certificate of primal infeasibility. The motivation for this stopping criterion is as follows. Let

\[\bar{y} := \frac{y^k} { b^T y^k } \mbox{ and } \bar{s} := \frac{s^k} { b^T y^k }\]

Then it is easy to verify that

\[\left\| A^T \bar y + \bar s \right \|_\infty \leq \varepsilon_i \mbox{ and } b^T \bar y = 1\]

which shows \((\bar{y},\bar{s})\) is an approximate certificate of primal infeasibility, where \(\varepsilon_{i}\) controls the quality of the approximation.
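The symmetric normalization of \((y^k,s^k)\) can be checked on a small infeasible instance (again toy data of our own choosing, not MOSEK output):

```python
# Normalizing a primal infeasibility certificate (toy data of our own).
# Problem: minimize 0 subject to x = -1, x >= 0  -- clearly primal infeasible.
dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

A, b = [[1.0]], [-1.0]
y, s = [-2.0], [2.0]            # trial iterate with b^T y > 0

bty = dot(b, y)
assert bty > 0
# A^T y + s vanishes for this iterate, so the criterion holds for any eps_i.
assert all(dot([row[j] for row in A], y) + s[j] == 0.0 for j in range(len(s)))

# Scale so that b^T y_bar = 1, giving the normalized certificate.
y_bar = [yi / bty for yi in y]
s_bar = [si / bty for si in s]
print(y_bar, s_bar, dot(b, y_bar))
```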

13.4.1.3 Adjusting optimality criteria and near optimality

It is possible to adjust the tolerances \(\varepsilon_{p}\), \(\varepsilon_{d}\), \(\varepsilon_{g}\) and \(\varepsilon_{i}\) using parameters; see Table 14 for details.

Table 14 Parameters employed in termination criterion
Tolerance              Parameter name
\(\varepsilon_{p}\) MSK_DPAR_INTPNT_CO_TOL_PFEAS
\(\varepsilon_{d}\) MSK_DPAR_INTPNT_CO_TOL_DFEAS
\(\varepsilon_{g}\) MSK_DPAR_INTPNT_CO_TOL_REL_GAP
\(\varepsilon_{i}\) MSK_DPAR_INTPNT_CO_TOL_INFEAS

The default values of the termination tolerances are chosen such that for a majority of problems appearing in practice it is not possible to achieve much better accuracy. Therefore, tightening the tolerances usually is not worthwhile. However, an inspection of (6) reveals that the quality of the solution depends on \(\left\| b \right\|_{\infty}\) and \(\left\| c \right\|_{\infty}\); the smaller the norms are, the better the solution accuracy.

The interior-point method as implemented by MOSEK will converge toward optimality and primal and dual feasibility at the same rate [And09]. This means that if the optimizer is stopped prematurely then it is very unlikely that either the primal or dual solution is feasible. Another consequence is that in most cases all the tolerances, \(\varepsilon_p\), \(\varepsilon_d\), \(\varepsilon_g\) and \(\varepsilon_i\), have to be relaxed together to achieve an effect.

In some cases the interior-point method terminates having found a solution not too far from meeting the optimality condition (6). A solution is defined as near optimal if scaling the termination tolerances \(\varepsilon_{p}\), \(\varepsilon_{d}\), \(\varepsilon_{g}\) and \(\varepsilon_{i}\) by the same factor \(\varepsilon_{n}\in [1.0,+\infty)\) makes the condition (6) satisfied. A near optimal solution is therefore of lower quality but still potentially valuable. If for instance the solver stalls, i.e. it can make no more significant progress towards the optimal solution, a near optimal solution could be available and be good enough for the user. Near infeasibility certificates are defined similarly. The value of \(\varepsilon_n\) can be adjusted with the parameter MSK_DPAR_INTPNT_CO_TOL_NEAR_REL.

To conclude the discussion in this section, relaxing the termination criterion is usually not worthwhile.

13.4.1.4 The Interior-point Log

Below is a typical log output from the interior-point optimizer:

Optimizer  - threads                : 20
Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 1
Optimizer  - Cones                  : 2
Optimizer  - Scalar variables       : 6                 conic                  : 6
Optimizer  - Semi-definite variables: 0                 scalarized             : 0
Factor     - setup time             : 0.00              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 1                 after factor           : 1
Factor     - dense dim.             : 0                 flops                  : 1.70e+01
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   1.0e+00  2.9e-01  3.4e+00  0.00e+00   2.414213562e+00   0.000000000e+00   1.0e+00  0.01
1   2.7e-01  7.9e-02  2.2e+00  8.83e-01   6.969257574e-01   -9.685901771e-03  2.7e-01  0.01
2   6.5e-02  1.9e-02  1.2e+00  1.16e+00   7.606090061e-01   6.046141322e-01   6.5e-02  0.01
3   1.7e-03  5.0e-04  2.2e-01  1.12e+00   7.084385672e-01   7.045122560e-01   1.7e-03  0.01
4   1.4e-08  4.2e-09  4.9e-08  1.00e+00   7.071067941e-01   7.071067599e-01   1.4e-08  0.01

The first line displays the number of threads used by the optimizer and the second line tells that the optimizer chose to solve the primal problem rather than the dual problem. The next line displays the problem dimensions as seen by the optimizer, and the Factor... lines show various statistics. This is followed by the iteration log.

Using the same notation as in Sec. 13.4.1 (The Interior-point optimizer) the columns of the iteration log have the following meaning:

  • ITE: Iteration index \(k\).
  • PFEAS: \(\left\| Ax^k - b \tau^k \right\|_{\infty}\) . The numbers in this column should converge monotonically towards zero but may stall at a low level due to rounding errors.
  • DFEAS: \(\left\| A^T y^{k} + s^k - c \tau^k \right\|_{\infty}\) . The numbers in this column should converge monotonically towards zero but may stall at a low level due to rounding errors.
  • GFEAS: \(| - c^T x^k + b^T y^{k} - \kappa^k|\) . The numbers in this column should converge monotonically towards zero but may stall at a low level due to rounding errors.
  • PRSTATUS: This number converges to \(1\) if the problem has an optimal solution whereas it converges to \(-1\) if that is not the case.
  • POBJ: \(c^T x^k / \tau^k\). An estimate for the primal objective value.
  • DOBJ: \(b^T y^k / \tau^k\). An estimate for the dual objective value.
  • MU: \(\frac{ (x^k)^T s^k + \tau^k \kappa^k }{n+1}\) . The numbers in this column should always converge to zero.
  • TIME: Time spent since the optimization started (in seconds).
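The expected monotone behavior of the PFEAS and MU columns can be verified mechanically. The sketch below parses the iteration lines of the log shown above (the log text is copied verbatim; the column positions are as printed):

```python
# Parse the iteration lines of the interior-point log shown above and
# check that the PFEAS and MU columns decrease monotonically.
log = """\
0   1.0e+00  2.9e-01  3.4e+00  0.00e+00   2.414213562e+00   0.000000000e+00   1.0e+00  0.01
1   2.7e-01  7.9e-02  2.2e+00  8.83e-01   6.969257574e-01   -9.685901771e-03  2.7e-01  0.01
2   6.5e-02  1.9e-02  1.2e+00  1.16e+00   7.606090061e-01   6.046141322e-01   6.5e-02  0.01
3   1.7e-03  5.0e-04  2.2e-01  1.12e+00   7.084385672e-01   7.045122560e-01   1.7e-03  0.01
4   1.4e-08  4.2e-09  4.9e-08  1.00e+00   7.071067941e-01   7.071067599e-01   1.4e-08  0.01
"""
rows = [line.split() for line in log.splitlines()]
pfeas = [float(r[1]) for r in rows]   # column 2: PFEAS
mu = [float(r[7]) for r in rows]      # column 8: MU
assert all(a > b for a, b in zip(pfeas, pfeas[1:]))   # PFEAS decreases
assert all(a > b for a, b in zip(mu, mu[1:]))         # MU decreases
print(pfeas[-1], mu[-1])
```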