13.2 Linear Optimization¶

13.2.1 Optimizer Selection¶

Two different types of optimizers are available for linear problems: The default is an interior-point method, and the alternative is the simplex method (primal or dual). The optimizer can be selected using the parameter MSK_IPAR_OPTIMIZER.

The Interior-point or the Simplex Optimizer?

Given a linear optimization problem, which optimizer is the best: the simplex or the interior-point optimizer? It is impossible to provide a general answer to this question. However, the interior-point optimizer behaves more predictably: it tends to use between 20 and 100 iterations, almost independently of problem size, but cannot perform warm-start. On the other hand the simplex method can take advantage of an initial solution, but is less predictable from cold-start. The interior-point optimizer is used by default.

The Primal or the Dual Simplex Variant?

MOSEK provides both a primal and a dual simplex optimizer. Predicting which simplex optimizer is faster is impossible, however, in recent years the dual optimizer has seen several algorithmic and computational improvements, which, in our experience, make it faster on average than the primal version. Still, it depends much on the problem structure and size. Setting the MSK_IPAR_OPTIMIZER parameter to "MSK_OPTIMIZER_FREE_SIMPLEX" instructs MOSEK to choose one of the simplex variants automatically.

To summarize, if you want to know which optimizer is faster for a given problem type, it is best to try all the options.

13.2.2 The Interior-point Optimizer¶

The purpose of this section is to provide information about the algorithm employed in the MOSEK interior-point optimizer for linear problems and about its termination criteria.

13.2.2.1 The homogeneous primal-dual problem¶

In order to keep the discussion simple it is assumed that MOSEK solves linear optimization problems of standard form

(13.1)¶\[\begin{split}\begin{array}{lcll} \mbox{minimize} & c^T x & & \\ \mbox{subject to} & A x & = & b, \\ & x\geq 0. & & \end{array}\end{split}\]

This is in fact what happens inside MOSEK; for efficiency reasons MOSEK converts the problem to standard form before solving, then converts it back to the input form when reporting the solution.

Since it is not known beforehand whether problem (13.1) has an optimal solution, is primal infeasible or is dual infeasible, the optimization algorithm must deal with all three situations. This is the reason why MOSEK solves the so-called homogeneous model

(13.2)¶\[\begin{split}\begin{array}{rcl} A x - b \tau & = & 0, \\ A^T y + s - c \tau & = & 0, \\ -c^T x + b^T y - \kappa & = & 0, \\ x,s,\tau ,\kappa & \geq & 0, \end{array}\end{split}\]

where \(y\) and \(s\) correspond to the dual variables in (13.1), and \(\tau\) and \(\kappa\) are two additional scalar variables. Note that the homogeneous model (13.2) always has solution since

\[(x,y,s,\tau ,\kappa ) = (0,0,0,0,0)\]

is a solution, although not a very interesting one. Any solution

\[(x^*,y^*,s^*,\tau^*,\kappa^*)\]

to the homogeneous model (13.2) satisfies

\[x_{j}^* s_{j}^* = 0 \mbox{ and } \tau^* \kappa^* = 0.\]

Moreover, there is always a solution that has the property \(\tau^* + \kappa^* > 0\).

First, assume that \(\tau^*>0\) . It follows that

\[\begin{split}\begin{array}{rcl} A \frac{x^*}{\tau^*} & = & b, \\ A^T \frac{y^*}{\tau^*} + \frac{s^*}{\tau^*} & = & c, \\ - c^T \frac{x^*}{\tau^*} + b^T\frac{y^*}{\tau^*} & = & 0, \\ x^*,s^*,\tau^*,\kappa^* & \geq & 0. \end{array}\end{split}\]

This shows that \(\frac{x^*}{\tau^*}\) is a primal optimal solution and \((\frac{y^*}{\tau^*},\frac{s^*}{\tau *})\) is a dual optimal solution; this is reported as the optimal interior-point solution since

\[(x,y,s) = \left\lbrace \frac{x^*}{\tau^*},\frac{y^*}{\tau^*},\frac{s^*}{\tau^*} \right\rbrace\]

is a primal-dual optimal solution (see Sec. 12.1 (Linear Optimization) for the mathematical background on duality and optimality).

On other hand, if \(\kappa^* >0\) then

\[\begin{split}\begin{array}{rcl} A x^* & = & 0, \\ A^T y^* + s^* & = & 0, \\ -c^T x^* + b^T y^* & = & \kappa^*, \\ x^*, s^*, \tau^* ,\kappa^* & \geq & 0. \end{array}\end{split}\]

This implies that at least one of

(13.3)¶\[c^T x^* < 0\]

or

(13.4)¶\[b^T y^* > 0\]

is satisfied. If (13.3) is satisfied then \(x^*\) is a certificate of dual infeasibility, whereas if (13.4) is satisfied then \(y^*\) is a certificate of primal infeasibility.

In summary, by computing an appropriate solution to the homogeneous model, all information required for a solution to the original problem is obtained. A solution to the homogeneous model can be computed using a primal-dual interior-point algorithm [And09].

13.2.2.2 Interior-point Termination Criterion¶

For efficiency reasons it is not practical to solve the homogeneous model exactly. Hence, an exact optimal solution or an exact infeasibility certificate cannot be computed and a reasonable termination criterion has to be employed.

In the \(k\)-th iteration of the interior-point algorithm a trial solution

\[(x^k,y^k,s^k,\tau^k,\kappa^k)\]

to homogeneous model is generated, where

\[x^k,s^k,\tau^k,\kappa^k > 0.\]

Optimal case

Whenever the trial solution satisfies the criterion

(13.5)¶\[\begin{split}\begin{array}{rcl} \left\| A \frac{x^k}{\tau^k} - b \right\|_\infty & \leq & \epsilon_p (1+\left\| b \right\|_\infty ), \\ \left\| A^T \frac{y^k}{\tau^k} + \frac{s^k}{\tau^k}- c \right\|_\infty & \leq & \epsilon_d (1+\left\| c \right\|_\infty ), \mbox{ and} \\ \min \left( \frac{(x^k)^T s^k}{ (\tau^k)^2 }, \vert \frac{c^T x^k}{\tau^k} - \frac{ b^T y^k}{\tau^k} \vert \right) & \leq & \epsilon_g \max \left( 1, \frac{ \min\left( \left\vert c^T x^k \right\vert, \left\vert b^T y^k \right\vert \right) }{\tau^k} \right), \end{array}\end{split}\]

the interior-point optimizer is terminated and

\[\frac{(x^k,y^k,s^k)}{\tau^k}\]

is reported as the primal-dual optimal solution. The interpretation of (13.5) is that the optimizer is terminated if

\(\frac{x^k}{\tau^k}\) is approximately primal feasible,
\(\left\lbrace \frac{y^k}{\tau^k},\frac{s^k}{\tau^k} \right\rbrace\) is approximately dual feasible, and
the duality gap is almost zero.

Dual infeasibility certificate

On the other hand, if the trial solution satisfies

\[-\epsilon_i c^T x^k > \frac{ \left\| c \right\|_\infty }{ \max\left( 1, \left\| b \right\|_\infty \right) } \left\| A x^k \right\|_\infty\]

then the problem is declared dual infeasible and \(x^k\) is reported as a certificate of dual infeasibility. The motivation for this stopping criterion is as follows: First assume that \(\left\| A x^k \right\|_\infty = 0\) ; then \(x^k\) is an exact certificate of dual infeasibility. Next assume that this is not the case, i.e.

\[\left\| A x^{k} \right\|_\infty > 0,\]

and define

\[\bar{x} := \epsilon_i \frac{ \max \left( 1, \left\| b \right\|_\infty \right) } { \left\| A x^k \right\|_\infty \left\| c \right\|_\infty } x^k.\]

It is easy to verify that

\[\left\| A \bar x \right\|_\infty = \epsilon_i \frac{\max \left( 1,\left\| b \right\|_\infty\right)} {\left\| c \right\|_\infty} \mbox{ and } -c^T \bar{x} > 1,\]

which shows \(\bar{x}\) is an approximate certificate of dual infeasibility, where \(\varepsilon_{i}\) controls the quality of the approximation. A smaller value means a better approximation.

Primal infeasibility certificate

Finally, if

\[\epsilon_i b^T y^k > \frac{ \left\| b \right\|_\infty } { \max \left( 1,\left\| c \right\|_\infty \right) } \left\| A^T y^k + s^k \right\|_\infty\]

then \(y^k\) is reported as a certificate of primal infeasibility.

13.2.2.3 Adjusting optimality criteria¶

It is possible to adjust the tolerances \(\varepsilon_{p}\), \(\varepsilon_{d}\), \(\varepsilon_{g}\) and \(\varepsilon_{i}\) using parameters; see table for details.

Table 13.1 Parameters employed in termination criterion¶
ToleranceParameter	name
\(\varepsilon_{p}\)	`MSK_DPAR_INTPNT_TOL_PFEAS`
\(\varepsilon_{d}\)	`MSK_DPAR_INTPNT_TOL_DFEAS`
\(\varepsilon_{g}\)	`MSK_DPAR_INTPNT_TOL_REL_GAP`
\(\varepsilon_{i}\)	`MSK_DPAR_INTPNT_TOL_INFEAS`

The default values of the termination tolerances are chosen such that for a majority of problems appearing in practice it is not possible to achieve much better accuracy. Therefore, tightening the tolerances usually is not worthwhile. However, an inspection of (13.5) reveals that the quality of the solution depends on \(\left\| b \right\|_{\infty}\) and \(\left\| c \right\|_{\infty}\); the smaller the norms are, the better the solution accuracy.

The interior-point method as implemented by MOSEK will converge toward optimality and primal and dual feasibility at the same rate [And09]. This means that if the optimizer is stopped prematurely then it is very unlikely that either the primal or dual solution is feasible. Another consequence is that in most cases all the tolerances, \(\varepsilon_p\), \(\varepsilon_d\), \(\varepsilon_g\) and \(\varepsilon_i\), have to be relaxed together to achieve an effect.

The basis identification discussed in Sec. 13.2.2.4 (Basis Identification) requires an optimal solution to work well; hence basis identification should be turned off if the termination criterion is relaxed.

To conclude the discussion in this section, relaxing the termination criterion is usually not worthwhile.

13.2.2.4 Basis Identification¶

An interior-point optimizer does not return an optimal basic solution unless the problem has a unique primal and dual optimal solution. Therefore, the interior-point optimizer has an optional post-processing step that computes an optimal basic solution starting from the optimal interior-point solution. More information about the basis identification procedure may be found in [AY96]. In the following we provide an overall idea of the procedure.

There are some cases in which a basic solution could be more valuable:

a basic solution is often more accurate than an interior-point solution,
a basic solution can be used to warm-start the simplex algorithm in case of reoptimization,
a basic solution is in general more sparse, i.e. more variables are fixed to zero. This is particularly appealing when solving continuous relaxations of mixed integer problems, as well as in all applications in which sparser solutions are preferred.

To illustrate how the basis identification routine works, we use the following trivial example:

\[\begin{split}\begin{array}{lcll} \mbox{minimize} & x + y & & \\ \mbox{subject to} & x + y & = & 1, \\ & x,y\geq 0. & & \end{array}\end{split}\]

It is easy to see that all feasible solutions are also optimal. In particular, there are two basic solutions, namely

\[\begin{split}\begin{array}{lcll} (x_1^*,y_1^*) & = & (1,0), & \\ (x_2^*,y_2^*) & = & (0,1). & \end{array}\end{split}\]

The interior point algorithm will actually converge to the center of the optimal set, i.e. to \((x^*,y^*)=(1/2,1/2)\) (to see this in MOSEK deactivate Presolve).

In practice, when the algorithm gets close to the optimal solution, it is possible to construct in polynomial time an initial basis for the simplex algorithm from the current interior point solution. This basis is used to warm-start the simplex algorithm that will provide the optimal basic solution. In most cases the constructed basis is optimal, or very few iterations are required by the simplex algorithm to make it optimal and hence the final clean-up phase be short. However, for some cases of ill-conditioned problems the additional simplex clean up phase may take of lot a time.

By default MOSEK performs a basis identification. However, if a basic solution is not needed, the basis identification procedure can be turned off. The parameters

control when basis identification is performed.

The type of simplex algorithm to be used (primal/dual) can be tuned with the parameter MSK_IPAR_BI_CLEAN_OPTIMIZER, and the maximum number of iterations can be set with MSK_IPAR_BI_MAX_ITERATIONS.

Finally, it should be mentioned that there is no guarantee on which basic solution will be returned.

13.2.2.5 The Interior-point Log¶

Below is a typical log output from the interior-point optimizer:

Optimizer  - threads                : 1
Optimizer  - solved problem         : the dual
Optimizer  - Constraints            : 2
Optimizer  - Cones                  : 0
Optimizer  - Scalar variables       : 6                 conic                  : 0
Optimizer  - Semi-definite variables: 0                 scalarized             : 0
Factor     - setup time             : 0.00              dense det. time        : 0.00
Factor     - ML order time          : 0.00              GP order time          : 0.00
Factor     - nonzeros before factor : 3                 after factor           : 3
Factor     - dense dim.             : 0                 flops                  : 7.00e+001
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME
0   1.0e+000 8.6e+000 6.1e+000 1.00e+000  0.000000000e+000  -2.208000000e+003 1.0e+000 0.00
1   1.1e+000 2.5e+000 1.6e-001 0.00e+000  -7.901380925e+003 -7.394611417e+003 2.5e+000 0.00
2   1.4e-001 3.4e-001 2.1e-002 8.36e-001  -8.113031650e+003 -8.055866001e+003 3.3e-001 0.00
3   2.4e-002 5.8e-002 3.6e-003 1.27e+000  -7.777530698e+003 -7.766471080e+003 5.7e-002 0.01
4   1.3e-004 3.2e-004 2.0e-005 1.08e+000  -7.668323435e+003 -7.668207177e+003 3.2e-004 0.01
5   1.3e-008 3.2e-008 2.0e-009 1.00e+000  -7.668000027e+003 -7.668000015e+003 3.2e-008 0.01
6   1.3e-012 3.2e-012 2.0e-013 1.00e+000  -7.667999994e+003 -7.667999994e+003 3.2e-012 0.01

The first line displays the number of threads used by the optimizer and the second line indicates if the optimizer chose to solve the primal or dual problem (see MSK_IPAR_INTPNT_SOLVE_FORM). The next lines display the problem dimensions as seen by the optimizer, and the Factor... lines show various statistics. This is followed by the iteration log.

Using the same notation as in Sec. 13.2.2 (The Interior-point Optimizer) the columns of the iteration log have the following meaning:

ITE: Iteration index \(k\).
PFEAS: \(\left\| Ax^k - b \tau^k \right\|_{\infty}\) . The numbers in this column should converge monotonically towards zero but may stall at low level due to rounding errors.
DFEAS: \(\left\| A^T y^{k} + s^k - c \tau^k \right\|_{\infty}\) . The numbers in this column should converge monotonically towards zero but may stall at low level due to rounding errors.
GFEAS: \(| - c^T x^k + b^T y^{k} - \kappa^k|\) . The numbers in this column should converge monotonically towards zero but may stall at low level due to rounding errors.
PRSTATUS: This number converges to \(1\) if the problem has an optimal solution whereas it converges to \(-1\) if that is not the case.
POBJ: \(c^T x^k / \tau^k\). An estimate for the primal objective value.
DOBJ: \(b^T y^k / \tau^k\). An estimate for the dual objective value.
MU: \(\frac{ (x^k)^T s^k + \tau^k \kappa^k }{n+1}\) . The numbers in this column should always converge to zero.
TIME: Time spent since the optimization started.

13.2.3 The Simplex Optimizer¶

An alternative to the interior-point optimizer is the simplex optimizer. The simplex optimizer uses a different method that allows exploiting an initial guess for the optimal solution to reduce the solution time. Depending on the problem it may be faster or slower to use an initial guess; see Sec. 13.2.1 (Optimizer Selection) for a discussion. MOSEK provides both a primal and a dual variant of the simplex optimizer.

13.2.3.1 Simplex Termination Criterion¶

The simplex optimizer terminates when it finds an optimal basic solution or an infeasibility certificate. A basic solution is optimal when it is primal and dual feasible; see Sec. 12.1 (Linear Optimization) for a definition of the primal and dual problem. Due to the fact that computations are performed in finite precision MOSEK allows violations of primal and dual feasibility within certain tolerances. The user can control the allowed primal and dual tolerances with the parameters MSK_DPAR_BASIS_TOL_X and MSK_DPAR_BASIS_TOL_S.

Setting the parameter MSK_IPAR_OPTIMIZER to "MSK_OPTIMIZER_FREE_SIMPLEX" instructs MOSEK to select automatically between the primal and the dual simplex optimizers. Hence, MOSEK tries to choose the best optimizer for the given problem and the available solution. The same parameter can also be used to force one of the variants.

13.2.3.2 Starting From an Existing Solution¶

When using the simplex optimizer it may be possible to reuse an existing solution and thereby reduce the solution time significantly. When a simplex optimizer starts from an existing solution it is said to perform a warm-start. If the user is solving a sequence of optimization problems by solving the problem, making modifications, and solving again, MOSEK will warm-start automatically.

By default MOSEK uses presolve when performing a warm-start. If the optimizer only needs very few iterations to find the optimal solution it may be better to turn off the presolve.

13.2.3.3 Numerical Difficulties in the Simplex Optimizers¶

Though MOSEK is designed to minimize numerical instability, completely avoiding it is impossible when working in finite precision. MOSEK treats a “numerically unexpected behavior” event inside the optimizer as a set-back. The user can define how many set-backs the optimizer accepts; if that number is exceeded, the optimization will be aborted. Set-backs are a way to escape long sequences where the optimizer tries to recover from an unstable situation.

Examples of set-backs are: repeated singularities when factorizing the basis matrix, repeated loss of feasibility, degeneracy problems (no progress in objective) and other events indicating numerical difficulties. If the simplex optimizer encounters a lot of set-backs the problem is usually badly scaled; in such a situation try to reformulate it into a better scaled problem. Then, if a lot of set-backs still occur, trying one or more of the following suggestions may be worthwhile:

Raise tolerances for allowed primal or dual feasibility: increase the value of
- MSK_DPAR_BASIS_TOL_X, and
- MSK_DPAR_BASIS_TOL_S.
Raise or lower pivot tolerance: Change the MSK_DPAR_SIMPLEX_ABS_TOL_PIV parameter.
Switch optimizer: Try another optimizer.
Switch off crash: Set both MSK_IPAR_SIM_PRIMAL_CRASH and MSK_IPAR_SIM_DUAL_CRASH to 0.
Experiment with other pricing strategies: Try different values for the parameters
- MSK_IPAR_SIM_PRIMAL_SELECTION and
- MSK_IPAR_SIM_DUAL_SELECTION.
If you are using warm-starts, in rare cases switching off this feature may improve stability. This is controlled by the MSK_IPAR_SIM_HOTSTART parameter.
Increase maximum number of set-backs allowed controlled by MSK_IPAR_SIM_MAX_NUM_SETBACKS.
If the problem repeatedly becomes infeasible try switching off the special degeneracy handling. See the parameter MSK_IPAR_SIM_DEGEN for details.

13.2.3.4 The Simplex Log¶

Below is a typical log output from the simplex optimizer:

Optimizer  - solved problem         : the primal
Optimizer  - Constraints            : 667
Optimizer  - Scalar variables       : 1424              conic                  : 0
Optimizer  - hotstart               : no
ITER      DEGITER(%)  PFEAS       DFEAS       POBJ                  DOBJ                  TIME      TOTTIME
0         0.00        1.43e+05    NA          6.5584140832e+03      NA                    0.00      0.02
1000      1.10        0.00e+00    NA          1.4588289726e+04      NA                    0.13      0.14
2000      0.75        0.00e+00    NA          7.3705564855e+03      NA                    0.21      0.22
3000      0.67        0.00e+00    NA          6.0509727712e+03      NA                    0.29      0.31
4000      0.52        0.00e+00    NA          5.5771203906e+03      NA                    0.38      0.39
4533      0.49        0.00e+00    NA          5.5018458883e+03      NA                    0.42      0.44

The first lines summarize the problem the optimizer is solving. This is followed by the iteration log, with the following meaning:

ITER: Number of iterations.
DEGITER(%): Ratio of degenerate iterations.
PFEAS: Primal feasibility measure reported by the simplex optimizer. The numbers should be 0 if the problem is primal feasible (when the primal variant is used).
DFEAS: Dual feasibility measure reported by the simplex optimizer. The number should be 0 if the problem is dual feasible (when the dual variant is used).
POBJ: An estimate for the primal objective value (when the primal variant is used).
DOBJ: An estimate for the dual objective value (when the dual variant is used).
TIME: Time spent since this instance of the simplex optimizer was invoked (in seconds).
TOTTIME: Time spent since optimization started (in seconds).