6. Mixed integer optimization

6.1. Introduction

In the previous chapters we have considered different classes of convex problems with continuous variables. In this chapter we consider a much wider range of problems by allowing integer variables. In particular, we consider conic and convex quadratic optimization problems where some of the variables are constrained to be integer valued. In principle, this allows us to model a vastly larger range of problems, but in practice we may not able to solve the resulting integer optimization problems within a reasonable amount of time.

6.2. Integer modeling

In this section we consider different building blocks for integer optimization, which we think of as building blocks for general integer problems.

6.2.1. Implication of positivity

Often we have a real-valued variable \(x\in \R\) satisfying \(0 \leq x < M\) for a known upper bound \(M\), and we wish to model the implication

(1)\[(x>0) \to (z = 1)\]

where \(z\in \{0,1\}\) is an indicator variable. Since \(0 \leq x <M\) we can rewrite (1) as a linear inequality

(2)\[x \leq Mz,\]

where \(z\in\{0,1\}\). From the characterization (2) we see that that (1) is equivalent to

\[(z=0) \to (x=0),\]

which is the well-known property of contraposition. In Sec. 6.3 we discuss such standard boolean expressions in more details.


Fig. 6.1 Production cost with fixed charge \(w_i\).

Example 6 Fixed charge cost

Assume that production of a specific item \(i\) has a production cost of \(u_i\) per unit produced with an additional fixed charge of \(w_i\) if we produce item \(i\), i.e., a discontinuous production cost

\[\begin{split}c_i(x_i) = \left \{ \begin{array}{ll}w_i + u_i x_i, & x_i > 0\\0, & x_i=0.\end{array}\right .\end{split}\]

shown in Fig. 6.1. If we let \(M\) denote an upper bound on the quantities we can produce, we get an alternative expression for the production cost,

\[c_i(x_i) = u_i x_i + w_i z_i, \qquad x_i \leq M z_i, \quad z_i\in\{0,1\},\]

where \(z_i\) is a binary variable indicating whether item \(i\) is produced or not. We can then minimize the total production cost under an affine constraint \(Ax=b\) as

\[\begin{split}\begin{array}{ll} \mbox{minimize}& u^T x + w^T z\\ \mbox{subject to}& Ax = b\\ & x_i \leq M z_i, \quad i=1,\dots,n\\ & x\geq 0\\ & z\in \{0,1\}^n, \end{array}\end{split}\]

which is a linear mixed-integer optimization problem. Note that by minimizing the production cost, we rule out the possibility that \(z_i=1\) and \(x_i=0\).

On the other hand, suppose \(x\) satisfies \(0<m \leq x\) for a known lower bound \(m\). We can then model the implication

(3)\[(z=1) \to (x>0),\]

(which is the reverse implication of (1)) as a linear inequality

(4)\[x \geq m z,\]

where \(z\in\{0,1\}\).

6.2.2. Semi-continuous variables

We can also model semi-continuity of a variable \(x\in \R^n\),

(5)\[x \in 0 \cup [a,b]\]

where \(0<a\leq b\) using a double inequality

(6)\[az \leq x \leq bz\]

where \(z\in \{0,1\}\).

6.2.3. Constraint satisfaction

Suppose we have an upper bound \(M\) on the affine expression \(a^T x - b\) where \(x\in \R^n\). We can then model the implication \((z=1) \to (a^Tx \leq b)\) as

(7)\[a^Tx \leq b + M(1-z)\]

where \(z\in\{0,1\}\), i.e., \(z\) indicates whether or not the constraint \(a^T x \leq b\) is satisfied. To model the reverse implication \((a^Tx\leq b)\to (z=1)\) we equivalently consider the contraposition \((z=0)\to (a^Tx > b)\), which we can rewrite as

\[a^T x > b + mz\]

where \(m<a^T x - b\) is a lower bound. In practice, we relax the strict inequality using a small amount of slack, i.e.,

(8)\[a^T x \geq b + (m-\epsilon)z + \epsilon.\]

Collectively, we thus model \((a^Tx \leq b) \leftrightarrow (z=1)\) as

(9)\[a^Tx \leq b + M(1-z), \qquad a^T x \geq b + (m-\epsilon)z + \epsilon.\]

In a similar fashion, we can model \((z=1)\to (a^T x \geq b)\) as

(10)\[a^Tx \geq b + m(1-z)\]

and \((z=0)\to (a^T x < b)\) as

(11)\[a^T x \leq b + (M+\epsilon) z - \epsilon,\]

so we model \((z=1)\leftrightarrow (a^T x \geq b)\) as

(12)\[a^Tx \geq b + m(1-z), \qquad a^T x \leq b + (M+\epsilon) z - \epsilon,\]

using the alower bound \(m<a^Tx - b\) and upper bound \(M>a^Tx - b\). We can combine (9) and (9) to model indication of equality constraints \(a^Tx = b\). To that end, we note that \((z=1)\to (a^T x = b)\) is equivalent to both (7) and (10), i.e.,

\[a^Tx \leq b + M(1-z), \qquad a^Tx \geq b + m(1-z).\]

On the other hand, \((z=0)\to (a^T x \neq b)\) (or equivalently \((z=0) \to (a^Tx > b) \lor (a^Tx < b)\)) can be written using (8) and (11) as

\[a^T x \geq b + (m-\epsilon) z_1 + \epsilon, \quad a^T x \leq b + (M+\epsilon) z_2 - \epsilon, \quad z_1 + z_2 -z \leq 1,\]

where \(z_1 + z_2 - z \leq 1\) is equivalent to \((z=0)\to (z_1=0)\lor (z_2=0)\).

6.2.4. Disjunctive constraints

With disjunctive constraints we require that at least one out of a set of constraints is satisfied. For example, we may require that

\[(a_1^T x \leq b_1) \lor (a_2^T x \leq b_2) \lor \cdots \lor (a_k^T x \leq b_k).\]

To that end, let \(M>a_j^T x - b_j,\:\forall j\) be a collective upper bound. Then we can model

\[(z=1) \to (a_1^T x \leq b_1) \lor (a_2^T x \leq b_2) \lor \cdots \lor (a_k^T x \leq b_k)\]


(13)\[z = z_1 + \dots + z_k \geq 1, \qquad a_j^T x \leq b_j + M(1-z_j) \: \forall j,\]

where \(z_j\in \{0,1\}\) are binary variables. To characterize the reverse implication \((a_1^T x \leq b_1) \lor (a_2^T x \leq b_2) \lor \cdots \lor (a_k^T x \leq b_k)\to (z=1)\) we consider the contrapositive

\[(z=0)\to (a_1^T x > b_1) \land (a_2^T x > b_2) \land \cdots \land (a_k^T x > b_k),\]

which we can write as

\[a_j^T x \geq b + (m-\epsilon)z + \epsilon, \quad j=1,\dots,k,\]

for a lower bound \(m<a_j^T x - b_j, \: \forall j\).

6.2.5. Pack constraints

For a pack constraint we require that at most one of the constraints are satisfied. Considering (13) we can formulate this as

\[z_1 + \dots + z_k \leq 1, \qquad a_j^T x \leq b_j + M(1-z_j) \: \forall j.\]

6.2.6. Partition constraints

For a partition constraint we require that exactly one of the constraints are satisfied. Considering (13) we can formulate this as

\[z_1 + \dots + z_k = 1, \qquad a_j^T x \leq b_j + M(1-z_j) \: \forall j.\]

6.2.7. Continuous piecewise-linear functions

Consider the continuous univariate piecewise-linear non-convex function \(f:[\alpha_1,\alpha_5] \mapsto \R\) shown in Fig. 6.2. At the interval \([\alpha_j,\alpha_{j+1}]\), \(j=1,2,3,4\) we can describe the function as

\[f(x) = \lambda_j f(\alpha_j) + \lambda_{j+1} f(\alpha_{j+1})\]

for some \(\lambda_j,\lambda_{j+1}\geq 0\) with \(\lambda_j+\lambda_{j+1}=1\). If we add a constraint that only two (adjacent) variables \(\lambda_{j}, \lambda_{j+1}\) can be nonzero we can characterize \(f(x)\) over the entire interval \([\alpha_1,\alpha_5]\) as a convex combination,

\[f(x) =\sum_{j=1}^4 \lambda_j f(\alpha_j).\]

The condition that only two adjacent variables can be nonzero is sometimes called an SOS2 constraint. If we introduce indicator variables \(z_i\) for each pair of adjacent variables \((\lambda_i,\lambda_{i+1})\), we can model it as

\[\begin{split}\begin{array}{c} \lambda_1 \leq z_1, \quad \lambda_2 \leq z_1+z_2, \quad \lambda_3 \leq z_2+z_3, \quad \lambda_4 \leq z_4+z_3, \quad \lambda_5 \leq z_4\\ z_1+z_2+z_3+z_4 = 1, \quad z\in \{0,1\}^4, \end{array}\end{split}\]

which satisfies \((z_j=1)\to \lambda_i=0, i\neq\{j,j+1\}\). Collectively, we can then model the epigraph \(f(x)\leq t\) as

(14)\[\begin{split}\begin{array}{c} x=\sum_{j=1}^n \lambda_j \alpha_j, \quad \sum_{j=1}^n \lambda_j f(\alpha_j) \leq t\\ \lambda_1 \leq z_1, \qquad \lambda_j \leq z_j + z_{j-1}, \: j=2,\dots,n-1, \qquad \lambda_n \leq z_{n-1},\\ \lambda \geq 0,\qquad \sum_{j=1}^n \lambda_j = 1, \qquad \sum_{j=1}^{n-1} z_j =1, \qquad z\in \{0,1\}^{n-1}, \end{array}\end{split}\]

for a piecewise-linear function \(f(x)\) with \(n\) terms. This approach is often called the lambda-method.


Fig. 6.2 A univariate piecewise-linear non-convex function.

For the function in Fig. 6.2 we can reduce the number of integer variables by using a Gray encoding


of the intervals \([\alpha_j,\alpha_{j+1}]\) and an indicator variable \(y\in \{0,1\}^2\) to represent the four different values of Gray code. We can then describe the constraints on \(\lambda\) using only two indicator variables,

\[\begin{split}\begin{array}{lll} (y_1=0) & \to & \lambda_3=0\\ (y_1=1) & \to & \lambda_1=\lambda_5=0\\ (y_2=0) & \to & \lambda_4=\lambda_5=0\\ (y_2=1) & \to & \lambda_1=\lambda_2=0, \end{array}\end{split}\]

which leads to a more efficient characterization of the epigraph \(f(x)\leq t\),

\[\begin{split}\begin{array}{c} x=\sum_{j=1}^5\lambda_j \alpha_j, \quad \sum_{j=1}^5\lambda_j f(\alpha_j) \leq t,\\ \lambda_3 \leq y_1, \quad \lambda_1+\lambda_5\leq (1-y_1), \quad \lambda_4+\lambda_5\leq y_2, \quad \lambda_1+\lambda_2\leq (1-y_2),\\ \lambda\geq 0, \quad \sum_{j=1}^5\lambda_j=1, \quad y_1+y_2 = 1, \quad y\in \{0,1\}^2, \end{array}\end{split}\]

The lambda-method can also be used to model multivariate continuous piecewise-linear non-convex functions, specified on a set of polyhedra \(P_k\). For example, for the function shown in Fig. 6.3 we can model the epigraph \(f(x)\leq t\) as

(15)\[\begin{split}\begin{array}{c} x=\sum_{i=1}^6 \lambda_i v_i, \qquad \sum_{i=1}^6 \lambda_i f(v_i) \leq t,\\ \lambda_1 \leq z_1 + z_2, \qquad \lambda_2\leq z_1, \qquad \lambda_3 \leq z_2+z_3,\\ \lambda_4 \leq z_1 + z_2 + z_3 + z_4, \qquad \lambda_5 \leq z_3+z_4, \qquad \lambda_6 \leq z_4,\\ \lambda \geq 0, \qquad \sum_{i=1}^6 \lambda_i = 1, \qquad \sum_{i=1}^4 z_i=1,\\ \qquad z \in \{0,1\}^4. \end{array}\end{split}\]

Note, for example, that \(z_2=1\) implies that \(\lambda_2=\lambda_5=\lambda_6=0\) and \(x=\lambda_1 v_1 + \lambda_3 v_3 + \lambda_4 v_4\).


Fig. 6.3 A multivariate continuous piecewise-linear non-convex function.

For this function we can also reduce the number of indicator variables using a Gray encoding of decision variables for the different polyhedra. If we use a decision variable \(y\in\{0,1\}^2\) with a Gray encoding

\[(P_1, P_2, P_3, P_4) \to (00, 10, 11, 01),\]

we have

\[\begin{split}\begin{array}{lll} (y_1=0) & \to & \lambda_3=0\\ (y_1=1) & \to & \lambda_2=\lambda_6=0\\ (y_2=0) & \to & \lambda_5=\lambda_6=0\\ (y_2=1) & \to & \lambda_1=\lambda_2=0, \end{array}\end{split}\]

which gives the characterization of the epigraph

(16)\[\begin{split} \begin{array}{c} x=\sum_{i=1}^6 \lambda_i v_i, \qquad \sum_{i=1}^6 \lambda_i f(v_i) \leq t,\\ \lambda_1 \leq y_1, \quad \lambda_1+\lambda_5\leq y_2, \quad \lambda_4+\lambda_5 \leq y_3, \quad \lambda_1+\lambda_2 \leq (1-y_2),\\ \lambda \geq 0, \qquad \sum_{i=1}^6 \lambda_i = 1, \qquad \sum_{i=1}^2 y_i=1, \qquad y \in \{0,1\}^2. \end{array}\end{split}\]

For multidimensional piecewise linear functions, a reduction of the number of binary decision variables is only possible when the polyhedral regions have a special topology, for example a “Union Jack” arrangement as shown in Fig. 6.3. We refer to the recent survey [VAN10] for further references and comparisons between different mixed-integer formulations for piecewise linear functions.

6.2.8. Lower semicontinuous piecewise-linear functions

The ideas in Sec. 6.2.7 can be applied to lower semicontinuous piecewise-linear functions as well. For example, consider the function shown in Fig. 6.4. If we denote the one-sided limits by \(f_{-}(c):=\lim_{x\uparrow c}f(x)\) and \(f_{+}(c):=\lim_{x\downarrow c}f(x)\), respectively, the one-sided limits, then we can describe the epigraph \(f(x)\leq t\) for the function in Fig. 6.4 as

(17)\[\begin{split}\begin{array}{c} x=\lambda_1 \alpha_1 + (\lambda_2+\lambda_3+\lambda_4)\alpha_2 + \lambda_5 \alpha_5,\\ \lambda_1 f(\alpha_1) + \lambda_2 f_-(\alpha_2) + \lambda_3 f(\alpha_2) + \lambda_4 f_+(\alpha_2) + \lambda_5 f(\alpha_3) \leq t, \\ \lambda_1 + \lambda_2 \leq z_1, \quad \lambda_3\leq z_2, \quad \lambda_4+\lambda_5 \leq z_3,\\ \lambda \geq 0, \qquad \sum_{i=5}^6 \lambda_i = 1, \qquad \sum_{i=1}^3 z_i=1, \qquad z \in \{0,1\}^3, \end{array}\end{split}\]

where we a different decision variable for the intervals \([\alpha_1,\alpha_2)\), \([\alpha_2,\alpha_2]\), and \((\alpha_2,\alpha_3]\). As a special case this gives us an alternative characterization of fixed charge models considered in Sec. 6.2.1.


Fig. 6.4 A univariate lower continuous piecewise-linear function.

6.3. Boolean primitives

In the previous sections we used a binary decision variables to indicate whether or not a real valued affine expression was positive. In general, we can express a multitude of boolean expressions as linear inequality constraints involving integer variables. The basic operators from boolean algebra are complement, conjunction and disjunction, which are used to construct more complicated boolean expressions. In the following we let \(x\) and \(y\) denote binary variables with values \(\{0,1\}\).

6.3.1. Complement

Logical complement \(\lnot x\) is defined by the following truth table.

\(x\) \(\lnot x\)
0 1
1 0

Thus in terms of an affine relationship,

(18)\[ \lnot x = 1 - x.\]

6.3.2. Conjunction

Logical conjunction \((x\land y)\) is defined by the following truth table.

\(x\) \(y\) \(x \land y\)
0 0 0
1 0 0
0 1 0
1 1 1

Thus \(z = (x \land y)\) is equivalent to

(19)\[ z + 1 \geq x + y, \qquad x \geq z, \qquad y \geq z.\]

6.3.3. Disjunction

Logical disjunction \((x\lor y)\) is defined by the following truth table.

\(x\) \(y\) \(x \lor y\)
0 0 0
1 0 1
0 1 1
1 1 1

Thus \((x \lor y)\) is equivalent to

(20)\[ x + y \geq 1.\]

The following table shows basic properties that are easily verified using the truth tables for complement, conjunction and disjunction.

Associativity \(x\lor(y\lor z) = (x\lor y)\lor z\) \(x\land(y\land z) = (x\land y)\land z\)
Commutativity \(x\lor y = y \lor x\) \(x\land y = y\land x\)
Distributivity \(x\lor (y \land z) = (x \lor y) \land (x \lor z)\) \(x\land (y \lor z) = (x \land y) \lor (x \land z)\)
De Morgan \(\lnot (x \land y) = \lnot x \lor \lnot y\) \(\lnot (x \lor y) = \lnot x \land \lnot y\)

The last row in the table shows the properties called De Morgan’s laws, which shows that conjugation and disjunction in a sense are dual operators.

6.3.4. Implication

Logical implication \((x \to y)\) is defined by the following truth table.

\(x\) \(y\) \(x \to y\)
0 0 1
1 0 0
0 1 1
1 1 1

It is easy verify that that \((x \to y)\) is equivalent to both \(( \lnot x \lor y)\) and

(21)\[\lnot y \to \lnot x.\]

In terms of indicator variables we have that \(x \to y\) is equivalent to

(22)\[x\leq y.\]
Example 7 Implication

Consider two real-valued variables \(0\leq u,v \leq M\). We can model complementary \(uv=0\) by introducing two indicator variables \(x\) and \(y\),

\[(u>0) \to (x=1), \qquad (v>0) \to (y=1),\]

or equivalently

\[\lnot x \to (u=0), \qquad \lnot y \to (v=0).\]

We then model \((u=0)\lor(v=0)\) as \((\lnot x \lor \lnot y)\), which can be written as

\[u\leq Mx, \qquad v\leq My, \qquad (1-x) + (1-y) \geq 1.\]

6.3.5. Exclusive disjunction

Logical exclusive disjunction \((x \oplus y)\) is defined by the following truth table.

\(x\) \(y\) \(x \oplus y\)
0 0 0
1 0 1
0 1 1
1 1 0

Thus \(z=x \oplus y\) is equivalent to

(23)\[ z \leq x+y, \qquad z \geq x-y, \qquad z\geq -x +y, \qquad z \leq 2-x-y.\]

It is easy to verify from the truth table that \((x \oplus y)\) is equivalent to \(( x \lor y) \land \lnot (x \land y)\).

6.4. Integer optimization

A general mixed integer conic optimization problem has the form

\[\begin{split}\begin{array}{ll} \mbox{minimize}& c^T x\\ \mbox{subject to}& Ax = b\\ & x \in {\cal C}\\ & x_i \in \mathbb{Z}, \quad \forall i\in {\cal I}, \end{array}\end{split}\]

where \(\cal C\) is direct product of self-dual cones (discussed in Chaps. 2-4) and \({\cal I}\subseteq \{1, \dots, n\}\) denotes the set of variables that are constrained to be integers.

6.5. Bibliography

This chapter is based on the books [NW88], [Wil93]. Modeling of piecewise linear functions is described in the survey paper [VAN10].