In other chapters of this cookbook we have considered different classes of convex problems with continuous variables. In this chapter we consider a much wider range of non-convex problems by allowing integer variables. This technique is extremely useful in practice, and already for linear programming it covers a vast range of problems. We introduce different building blocks for integer optimization, which make it possible to model useful non-convex dependencies between variables in conic problems. It should be noted that mixed integer optimization problems are very hard (technically NP-hard), and for many practical cases an exact solution may not be found in reasonable time.
where is a cone and denotes the set of variables that are constrained to be integers.
Two major techniques are typical for mixed integer optimization. The first one is the use of binary variables, also known as indicator variables, which only take values 0 and 1, and indicate the absence or presence of a particular event or choice. This restriction can of course be modeled in the form (9.1) by writing:
The other, known as big-M, refers to the fact that some relations can only be modeled linearly if one assumes some fixed bound on the quantities involved, and this constant enters the model formulation. The choice of can affect the performance of the model, see Example 7.8.
Indeed excludes the possibility of , hence forces . Since a priori , there is no danger that the constraint accidentally makes the problem infeasible. A typical use of this trick is to model fixed setup costs.
Example 9.1
Fixed setup cost
Assume that production of a specific item costs per unit, but there is an additional fixed charge of if we produce item at all. For instance, could be the cost of setting up a production plant, initial cost of equipment etc. Then the cost of producing units of product is given by the discontinuous function
If we let denote an upper bound on the quantities we can produce, we can then minimize the total production cost of products under some affine constraint with
which is a linear mixed-integer optimization problem. Note that by minimizing the production cost, we drive to 0 when , so setup costs are indeed included only for products with .
Suppose we want to model the fact that a certain linear inequality must be satisfied when some other event occurs. In other words, for a binary variable we want to model the implication
Suppose we know in advance an upper bound . Then we can write the above as a linear inequality
Now if then we forced , while for the inequality is trivially satisfied and does not enforce any additional constraint on .
Now observe that implies , of which the right-hand inequality is redundant, i.e. always satisfied. Similarly, implies . In other words is an indicator of whether .
In practice we would relax one inequality using a small amount of slack, i.e.,
It defines a non-convex set, hence it is not conic representable. If we split into positive and negative part , where , then as long as either or . That last alternative can be modeled with a binary variable, and we get a model of (9.7):
where is a decision variable and is a constant. Such constraints arise for instance in fully invested portfolio optimizations scenarios (with short-selling). As before, we split into a positive and negative part, using a sequence of binary variables to guarantee that at most one of them is nonzero:
The exact equality can be expressed by introducing a sequence of mutually exclusive indicator variables , with the intention that picks the variable which actually achieves maximum. Choosing a safe bound we get a model:
Typically an indicator variable represents a boolean value (true/false). In this case the standard boolean operators can be implemented as linear inequalities. In the table below we assume all variables are binary.
In (9.12) we essentially defined to be the indicator variable of whether . In some circumstances there may be more efficient representations of a restricted set of values, for example:
(sign) ,
(modulo) ,
(fraction) ,
(gap) for sufficiently large .
In a very similar fashion we can restrict a variable to a finite union of intervals :
Suppose we want to say that the variables must either be all nonnegative or all nonpositive. This can be achieved by introducing a single binary variable which picks one of these two alternatives:
Consider a continuous, univariate, piecewise-linear, non-convex function shown in Fig. 9.2. At the interval , we can describe the function as
where and . If we add a constraint that only two (adjacent) variables can be nonzero, we can characterize every value over the entire interval as some convex combination,
Fig. 9.2 A univariate piecewise-linear non-convex function.¶
The condition that only two adjacent variables can be nonzero is sometimes called an SOS2 constraint. If we introduce indicator variables for each pair of adjacent variables , we can model an SOS2 constraint as:
so that we have . Collectively, we can then model the epigraph as
for a piecewise-linear function with terms. This approach is often called the lambda-method.
For the function in Fig. 9.2 we can reduce the number of integer variables by using a Gray encoding
of the intervals and an indicator variable to represent the four different values of Gray code. We can then describe the constraints on using only two indicator variables,
which leads to a more efficient characterization of the epigraph ,
The lambda-method can also be used to model multivariate continuous piecewise-linear non-convex functions, specified on a set of polyhedra . For example, for the function shown in Fig. 9.3 we can model the epigraph as
The ideas in Sec. 9.1.13 (Continuous piecewise-linear functions) can be applied to lower semicontinuous piecewise-linear functions as well. For example, consider the function shown in Fig. 9.4. If we denote the one-sided limits by and , respectively, the one-sided limits, then we can describe the epigraph for the function in Fig. 9.4 as
where we have a different decision variable for the intervals , , and . As a special case this gives us an alternative characterization of fixed charge models considered in Sec. 9.1.1 (Implication of positivity).
Fig. 9.4 A univariate lower semicontinuous piecewise-linear function.¶
The following problem arises in wireless network design and some other applications. We want to serve clients with transmitters. A transmitter with range can serve all clients within Euclidean distance from its position. The power consumption of a transmitter with range is proportional to for some fixed . The goal is to assign locations and ranges to transmitters minimizing the total power consumption while providing coverage of all clients.
Denoting by locations of the clients, we can model this problem using binary decision variables indicating if the -th client is covered by the -th transmitter. This leads to a mixed-integer conic problem:
The objective can be realized by summing power bounds or by directly bounding the -norm of . The latter approach would be recommended for large .
This is a type of clustering problem. For , respectively, we are minimizing the perimeter and area of the covered region. In practical applications the power exponent can be as large as depending on various factors (for instance terrain). In the linear cost model () typical solutions contain a small number of huge disks covering most of the clients. For increasing large ranges are penalized more heavily and the disks tend to be more balanced.
The standard portfolio optimization model admits a number of mixed-integer extensions aimed at avoiding solutions with very small trades. To fix attention consider the model
with initial holdings , initial cash amount , expected returns , risk bound and decision variable . Here is the all-ones vector. Let denote the change of position in asset .
Transaction costs
A transaction cost involved with nonzero could be modeled as
similarly to the problem from Example 9.1. Including transaction costs will now lead to the model:
Consider the problem of approximating the data by a piecewise linear convex function of the form
where is the number of segments we want to consider. The quality of the fit is measured with least squares as
Note that we do not specify the locations of nodes (breakpoints), i.e. points where changes slope. Finding them is part of the fitting problem.
As in Sec. 9.1.8 (Maximum) we introduce binary variables indicating that , i.e. it is the -th linear function that achieves maximum at the point . Following Sec. 9.1.8 (Maximum) we now have a mixed integer conic quadratic problem
with variables , where is a sufficiently large constant.
Fig. 9.5 Convex piecewise linear fit with segments.¶
Frequently an integer model will have properties which formally follow from the problem’s constraints, but may be very hard or impossible for a mixed-integer solver to automatically deduce. It may dramatically improve efficiency to explicitly add some of them to the model. For example, we can enhance (9.22) with all inequalities of the form
which indicate that each linear segment covers a contiguous subset of the sample and additionally force these segments to come in the order of increasing as increases from left to right. The last statement is an example of symmetry breaking.