9 Mixed integer optimization

In other chapters of this cookbook we have considered different classes of convex problems with continuous variables. In this chapter we consider a much wider range of non-convex problems by allowing integer variables. This technique is extremely useful in practice, and already for linear programming it covers a vast range of problems. We introduce different building blocks for integer optimization, which make it possible to model useful non-convex dependencies between variables in conic problems. It should be noted that mixed integer optimization problems are very hard (technically NP-hard), and for many practical cases an exact solution may not be found in reasonable time.

9.1 Integer modeling

A general mixed integer conic optimization problem has the form

(9.1)minimizecTxsubject toAx=b,xK,xiZ,iI,

where K is a cone and I{1,,n} denotes the set of variables that are constrained to be integers.

Two major techniques are typical for mixed integer optimization. The first one is the use of binary variables, also known as indicator variables, which only take values 0 and 1, and indicate the absence or presence of a particular event or choice. This restriction can of course be modeled in the form (9.1) by writing:

0x1 and xZ.

The other, known as big-M, refers to the fact that some relations can only be modeled linearly if one assumes some fixed bound M on the quantities involved, and this constant enters the model formulation. The choice of M can affect the performance of the model, see Example 7.8.

9.1.1 Implication of positivity

Often we have a real-valued variable xR satisfying 0x<M for a known upper bound M, and we wish to model the implication

(9.2)x>0z=1.

Making z a binary variable we can write (9.2) as a linear inequality

(9.3)xMz, z{0,1}.

Indeed x>0 excludes the possibility of z=0, hence forces z=1. Since a priori xM, there is no danger that the constraint accidentally makes the problem infeasible. A typical use of this trick is to model fixed setup costs.

Example 9.1 Fixed setup cost

Assume that production of a specific item i costs ui per unit, but there is an additional fixed charge of wi if we produce item i at all. For instance, wi could be the cost of setting up a production plant, initial cost of equipment etc. Then the cost of producing xi units of product i is given by the discontinuous function

ci(xi)={wi+uixi,xi>00,xi=0.

If we let M denote an upper bound on the quantities we can produce, we can then minimize the total production cost of n products under some affine constraint Ax=b with

minimizeuTx+wTzsubject toAx=b,xiMzi,i=1,,nx0,z{0,1}n,

which is a linear mixed-integer optimization problem. Note that by minimizing the production cost, we drive zi to 0 when xi=0, so setup costs are indeed included only for products with xi>0.

_images/fixedcost.png

Fig. 9.1 Production cost with fixed setup cost wi.

9.1.2 Semi-continuous variables

We can also model semi-continuity of a variable xR,

(9.4)x0[a,b],

where 0<ab using a double inequality

azxbz, z{0,1}.

9.1.3 Indicator constraints

Suppose we want to model the fact that a certain linear inequality must be satisfied when some other event occurs. In other words, for a binary variable z we want to model the implication

z=1aTxb.

Suppose we know in advance an upper bound aTxbM. Then we can write the above as a linear inequality

aTxb+M(1z).

Now if z=1 then we forced aTxb, while for z=0 the inequality is trivially satisfied and does not enforce any additional constraint on x.

9.1.4 Disjunctive constraints

With a disjunctive constraint we require that at least one of the given linear constraints is satisfied, that is

(a1Txb1)(a2Txb2)(akTxbk).

Introducing binary variables z1,,zk, we can use Sec. 9.1.3 (Indicator constraints) to write a linear model

z1++zk1,z1,,zk{0,1},aiTxbi+M(1zi), i=1,,k.

Note that zj=1 implies that the j-th constraint is satisfied, but not vice-versa. Achieving that effect is described in the next section.

9.1.5 Constraint satisfaction

Say we want to define an optimization model that will behave differently depending on which of the inequalities

aTxboraTxb

is satisfied. Suppose we have lower and upper bounds for aTxb in the form of maTxbM. Then we can write a model

(9.5)b+mzaTxb+M(1z), z{0,1}.

Now observe that z=0 implies baTxb+M, of which the right-hand inequality is redundant, i.e. always satisfied. Similarly, z=1 implies b+maTxb. In other words z is an indicator of whether aTxb.

In practice we would relax one inequality using a small amount of slack, i.e.,

(9.6)b+(mϵ)z+ϵaTx

to avoid issues with classifying the equality aTx=b.

9.1.6 Exact absolute value

In Sec. 2.2.2 (Absolute value) we showed how to model |x|t as two linear inequalities. Now suppose we need to model an exact equality

(9.7)|x|=t.

It defines a non-convex set, hence it is not conic representable. If we split x into positive and negative part x=x+x, where x+,x0, then |x|=x++x as long as either x+=0 or x=0. That last alternative can be modeled with a binary variable, and we get a model of (9.7):

(9.8)x=x+x,t=x++x,0x+,x,x+Mz,xM(1z),z{0,1},

where the constant M is an a priori known upper bound on |x| in the problem.

9.1.7 Exact 1-norm

We can use the technique above to model the exact 1-norm equality constraint

(9.9)i=1n|xi|=c,

where xRn is a decision variable and c is a constant. Such constraints arise for instance in fully invested portfolio optimizations scenarios (with short-selling). As before, we split x into a positive and negative part, using a sequence of binary variables to guarantee that at most one of them is nonzero:

(9.10)x=x+x,0x+,x,x+cz,xc(ez),ixi++ixi=c,z{0,1}n, x+,xRn.

9.1.8 Maximum

The exact equality t=max{x1,,xn} can be expressed by introducing a sequence of mutually exclusive indicator variables z1,,zn, with the intention that zi=1 picks the variable xi which actually achieves maximum. Choosing a safe bound M we get a model:

(9.11)xit  xi+M(1zi), i=1,,n,z1++zn=1,z{0,1}n.

9.1.9 Boolean operators

Typically an indicator variable z{0,1} represents a boolean value (true/false). In this case the standard boolean operators can be implemented as linear inequalities. In the table below we assume all variables are binary.

Table 9.1 Boolean operators

Boolean

Linear

z=x OR y

xz, yz, zx+y

z=x AND y

xz, yz, z+1x+y

z=NOT x

z=1x

xy

xy

At most one of z1,,zn holds (SOS1, set-packing)

izi1

Exactly one of z1,,zn holds (set-partitioning)

izi=1

At least one of z1,,zn holds (set-covering)

izi1

At most k of z1,,zn holds (cardinality)

izik

9.1.10 Fixed set of values

We can restrict a variable x to take on only values from a specified finite set {a1,,an} by writing

(9.12)x=iziaiz{0,1}n,izi=1.

In (9.12) we essentially defined zi to be the indicator variable of whether x=ai. In some circumstances there may be more efficient representations of a restricted set of values, for example:

  • (sign) x{1,1}x=2z1, z{0,1},

  • (modulo) x{1,4,7,10}x=3z+1, 0z3, zZ,

  • (fraction) x{0,1/3,2/3,1}3x=z, 0z3,zZ,

  • (gap) x(,a][b,)bM(1z)xa+Mz, z{0,1} for sufficiently large M.

In a very similar fashion we can restrict a variable x to a finite union of intervals i[ai,bi]:

(9.13)iziaixizibi,z{0,1}n,izi=1.

9.1.11 Alternative as a bilinear equality

In the literature one often finds models containing a bilinear constraint

xy=0.

This constraint is just an algebraic formulation of the alternative x=0 or y=0 and therefore it can be written as a mixed-integer linear problem:

|x|Mz,|y|M(1z),z{0,1},

for a suitable constant M. The absolute value can be omitted if both x,y are nonnegative. Otherwise it can be modeled as in Sec. 2.2.2 (Absolute value).

9.1.12 Equal signs

Suppose we want to say that the variables x1,,xn must either be all nonnegative or all nonpositive. This can be achieved by introducing a single binary variable z{0,1} which picks one of these two alternatives:

M(1z)xiMz,i=1,,n.

Indeed, if z=0 then we have Mxi0 and if z=1 then 0xiM.

9.1.13 Continuous piecewise-linear functions

Consider a continuous, univariate, piecewise-linear, non-convex function f:[α1,α5]R shown in Fig. 9.2. At the interval [αj,αj+1], j=1,2,3,4 we can describe the function as

f(x)=λjf(αj)+λj+1f(αj+1)

where λjαj+λj+1αj+1=x and λj+λj+1=1. If we add a constraint that only two (adjacent) variables λj,λj+1 can be nonzero, we can characterize every value f(x) over the entire interval [α1,α5] as some convex combination,

f(x)=j=14λjf(αj).
_images/pwlnonconvex.png

Fig. 9.2 A univariate piecewise-linear non-convex function.

The condition that only two adjacent variables can be nonzero is sometimes called an SOS2 constraint. If we introduce indicator variables zi for each pair of adjacent variables (λi,λi+1), we can model an SOS2 constraint as:

λ1z1,λ2z1+z2,λ3z2+z3,λ4z4+z3,λ5z4z1+z2+z3+z4=1,z{0,1}4,

so that we have zj=1λi=0,i{j,j+1}. Collectively, we can then model the epigraph f(x)t as

(9.14)x=j=1nλjαj,j=1nλjf(αj)tλ1z1,λjzj+zj1,j=2,,n1,λnzn1,λ0,j=1nλj=1,j=1n1zj=1,z{0,1}n1,

for a piecewise-linear function f(x) with n terms. This approach is often called the lambda-method.

For the function in Fig. 9.2 we can reduce the number of integer variables by using a Gray encoding

_images/grayenc.png

of the intervals [αj,αj+1] and an indicator variable y{0,1}2 to represent the four different values of Gray code. We can then describe the constraints on λ using only two indicator variables,

(y1=0)λ3=0,(y1=1)λ1=λ5=0,(y2=0)λ4=λ5=0,(y2=1)λ1=λ2=0,

which leads to a more efficient characterization of the epigraph f(x)t,

x=j=15λjαj,j=15λjf(αj)t,λ3y1,λ1+λ5(1y1),λ4+λ5y2,λ1+λ2(1y2),λ0,j=15λj=1,y{0,1}2,

The lambda-method can also be used to model multivariate continuous piecewise-linear non-convex functions, specified on a set of polyhedra Pk. For example, for the function shown in Fig. 9.3 we can model the epigraph f(x)t as

(9.15)x=i=16λivi,i=16λif(vi)t,λ1z1+z2,λ2z1,λ3z2+z3,λ4z1+z2+z3+z4,λ5z3+z4,λ6z4,λ0,i=16λi=1,i=14zi=1,z{0,1}4.

Note, for example, that z2=1 implies that λ2=λ5=λ6=0 and x=λ1v1+λ3v3+λ4v4.

_images/triangles.png

Fig. 9.3 A multivariate continuous piecewise-linear non-convex function.

9.1.14 Lower semicontinuous piecewise-linear functions

The ideas in Sec. 9.1.13 (Continuous piecewise-linear functions) can be applied to lower semicontinuous piecewise-linear functions as well. For example, consider the function shown in Fig. 9.4. If we denote the one-sided limits by f(c):=limxcf(x) and f+(c):=limxcf(x), respectively, the one-sided limits, then we can describe the epigraph f(x)t for the function in Fig. 9.4 as

(9.16)x=λ1α1+(λ2+λ3+λ4)α2+λ5α3,λ1f(α1)+λ2f(α2)+λ3f(α2)+λ4f+(α2)+λ5f(α3)t,λ1+λ2z1,λ3z2,λ4+λ5z3,λ0,i=15λi=1,i=13zi=1,z{0,1}3,

where we have a different decision variable for the intervals [α1,α2), [α2,α2], and (α2,α3]. As a special case this gives us an alternative characterization of fixed charge models considered in Sec. 9.1.1 (Implication of positivity).

_images/lowersemicontpwl.png

Fig. 9.4 A univariate lower semicontinuous piecewise-linear function.

9.2 Mixed integer conic case studies

9.2.1 Wireless network design

The following problem arises in wireless network design and some other applications. We want to serve n clients with k transmitters. A transmitter with range rj can serve all clients within Euclidean distance rj from its position. The power consumption of a transmitter with range r is proportional to rα for some fixed 1α<. The goal is to assign locations and ranges to transmitters minimizing the total power consumption while providing coverage of all n clients.

Denoting by x1,,xnR2 locations of the clients, we can model this problem using binary decision variables zij indicating if the i-th client is covered by the j-th transmitter. This leads to a mixed-integer conic problem:

(9.17)minimizejrjαsubject torjpjxi2M(1zij),i=1,,n, j=1,,k,jzij1,i=1,,n,pjR2, zij{0,1}.

The objective can be realized by summing power bounds tjrjα or by directly bounding the α-norm of (r1,,rk). The latter approach would be recommended for large α.

This is a type of clustering problem. For α=1,2, respectively, we are minimizing the perimeter and area of the covered region. In practical applications the power exponent α can be as large as 6 depending on various factors (for instance terrain). In the linear cost model (α=1) typical solutions contain a small number of huge disks covering most of the clients. For increasing α large ranges are penalized more heavily and the disks tend to be more balanced.

9.2.2 Avoiding small trades

The standard portfolio optimization model admits a number of mixed-integer extensions aimed at avoiding solutions with very small trades. To fix attention consider the model

(9.18)maximizeμTxsubject to(γ,GTx)Qn+1,eTx=w+eTx0,x0,

with initial holdings x0, initial cash amount w, expected returns μ, risk bound γ and decision variable x. Here e is the all-ones vector. Let Δxj=xjxj0 denote the change of position in asset j.

Transaction costs

A transaction cost involved with nonzero Δxj could be modeled as

Tj(xj)={0,Δxj=0,αjΔxj+βj,Δxj0,

similarly to the problem from Example 9.1. Including transaction costs will now lead to the model:

(9.19)maximizeμTxsubject to(γ,GTx)Qn+1,eTx+αTx+βTz=w+eTx0,xx0Mz, x0xMz,x0, z{0,1}n,

where the binary variable zj is an indicator of Δxj0. Here M is a sufficiently large constant, for instance M=w+eTx0 will do.

Cardinality constraints

Another option is to fix an upper bound k on the number of nonzero trades. The meaning of z is the same as before:

(9.20)maximizeμTxsubject to(γ,GTx)Qn+1,eTx=w+eTx0,xx0Mz, x0xMz,eTzk,x0, z{0,1}n.

Trading size constraints

We can also demand a lower bound on nonzero trades, that is |Δxj|{0}[a,b] for all j. To this end we combine the techniques from Sec. 9.1.6 (Exact absolute value) and Sec. 9.1.2 (Semi-continuous variables) writing pj,qj for the indicators of Δxj>0 and Δxj<0, respectively:

(9.21)maximizeμTxsubject to(γ,GTx)Qn+1,eTx=w+eTx0,xx0=x+x,x+Mp, xMq,a(p+q)x++xb(p+q),p+qe,x,x+,x0, p,q{0,1}n.

9.2.3 Convex piecewise linear regression

Consider the problem of approximating the data (xi,yi), i=1,,n by a piecewise linear convex function of the form

f(x)=max{ajx+bj, j=1,k},

where k is the number of segments we want to consider. The quality of the fit is measured with least squares as

i(f(xi)yi)2.

Note that we do not specify the locations of nodes (breakpoints), i.e. points where f(x) changes slope. Finding them is part of the fitting problem.

As in Sec. 9.1.8 (Maximum) we introduce binary variables zij indicating that f(xi)=ajxi+bj, i.e. it is the j-th linear function that achieves maximum at the point xi. Following Sec. 9.1.8 (Maximum) we now have a mixed integer conic quadratic problem

(9.22)minimizeyf2subject toajxi+bjfi,i=1,,n, j=1,,k,fiajxi+bj+M(1zij),i=1,,n, j=1,,k,jzij=1,i=1,n,zij{0,1},i=1,,n, j=1,,k,

with variables a,b,f,z, where M is a sufficiently large constant.

_images/pieclin.png

Fig. 9.5 Convex piecewise linear fit with k=2,3,4 segments.

Frequently an integer model will have properties which formally follow from the problem’s constraints, but may be very hard or impossible for a mixed-integer solver to automatically deduce. It may dramatically improve efficiency to explicitly add some of them to the model. For example, we can enhance (9.22) with all inequalities of the form

zi,j+1+zi+i,j1,

which indicate that each linear segment covers a contiguous subset of the sample and additionally force these segments to come in the order of increasing j as i increases from left to right. The last statement is an example of symmetry breaking.