2. Linear optimization

2.1. Introduction

In this chapter we discuss different aspects of linear optimization. We first introduce the basic concepts of a linear optimization and discuss the underlying geometric interpretations. We then give examples of the most frequently used reformulations or modeling tricks used in linear optimization, and we finally discuss duality and infeasibility theory in some detail.

2.1.1. Basic notions

The most basic class of optimization is linear optimization. In linear optimization we minimize a linear function given a set of linear constraints. For example, we may wish to minimize a linear function

\[x_1 + 2 x_2 - x_3\]

under the constraints that

\[x_1 + x_2 + x_3 = 1, \quad x_1, x_2, x_3 \geq 0.\]

The function we minimize is often called the objective function; in this case we have a linear objective function. The constraints are also linear and consists of both linear equality constraints and linear inequality constraints. We typically use a more compact notation

(1)\[\begin{split} \begin{array}{ll} \mbox{minimize} & x_1 + 2 x_2 - x_3\\ \mbox{subject to} & x_1 + x_2 + x_3 = 1\\ & x_1, x_2, x_3 \geq 0, \end{array}\end{split}\]

and we call (1) an linear optimization problem. The domain where all constraints are satisfied is called the feasible set; the feasible set for (1) is shown in Fig. 2.1.


Fig. 2.1 Feasible set for \(x_1+x_2+x_3=1\) and \(x_1,x_2,x_3\geq 0.\)

For this simple problem we see by inspection that the optimal value of the problem is \(-1\) obtained by the optimal solution

\[(x_1^\star, x_2^\star, x_3^\star) = (0, 0, 1).\]

Linear optimization problems are typically formulated with a notation similar to

\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^T x\\ \mbox{subject to}& A x = b\\ & x\geq 0. \end{array}\end{split}\]

For example, we can pose (1) in this form with

\[\begin{split}x = \left[\begin{array}{c}x_1\\ x_2\\ x_3\end{array}\right], \quad c = \left[\begin{array}{r}1\\ 2\\ -1\end{array}\right], \quad A = \left[ \begin{array}{ccc} 1 & 1 & 1 \end{array} \right].\end{split}\]

There are many other formulations for linear optimization problems; we can have different types of constraints,

\[Ax = b, \quad Ax \geq b, \quad Ax \leq b, \quad l^c \leq Ax \leq u^c,\]

and different bounds on the variables

\[l^x \leq x \leq u^x\]

or we may have no bounds on some \(x_i\), in which case we say that \(x_i\) is a free variable.

All these formulations are equivalent in the sense that by simple transformations and introduction of auxiliary variables they can represent the same problem with the same optimal solution. The important feature is that the objective function and the constraints are all linear in \(x\).

2.1.2. Geometry of linear optimization

A hyperplane is a set \(\{ x \mid a^T(x - x_0) = 0\}\) or equivalently \(\{ x \mid a^T x = \gamma \}\) with \(a^Tx_0 = \gamma\), see Fig. 2.2.


Fig. 2.2 The dashed line illustrates a hyperplane \(\{x\mid a^Tx=\gamma\}\)

Thus linear constraints

\[Ax = b\]

with \(A \in \R^{m\times n}\) represents an intersection of \(m\) hyperplanes.

Next consider a point \(x\) above the hyperplane in Fig. 2.2. Since \(x-x_0\) forms an acute angle with \(a\) we have that \(a^T(x-x_0)\geq 0\), or \(a^Tx \geq \gamma\). The set \(\{ x \mid a^T x \geq \gamma \}\) is called a halfspace, see Fig. 2.3. Similarly the set \(\{ x \mid a^T x \leq 0 \}\) forms another halfspace; in Fig. 2.3 it corresponds to the area below the dashed line.


Fig. 2.3 The grey area is the halfspace \(\{x\mid a^T x\geq \gamma\}\)

A set of linear inequalities

\[Ax \leq b\]

corresponds to an intersection of halfspaces and forms a polyhedron, see Fig. 2.4.


Fig. 2.4 A polyhedron formed as an intersection of halfspaces.

The polyhedral description of the feasible set gives us a very intuitive interpretation of linear optimization, which is illustrated in Fig. 2.5.


Fig. 2.5 Geometric interpretation of linear optimization. The optimal solution \(x^\star\) is at point where the normals to \(c\) (the dashed lines) intersect the polyhedron.

The dashed lines are normal to the objective \(c=(-1,1)\), and to minimize \(c^T x\) we move as far as possible in the opposite direction of \(c\), to a point where one the normals intersect the polyhedron; an optimal solution is therefore always either a vertex of the polyhedron, or an entire facet of the polyhedron may be optimal.

The polyhedron shown in the figure in bounded, but this is not always the case for polyhedra coming from linear inequalities in optimization problems. In such cases the optimization problem may be unbounded, which we will discuss in detail in Section 2.4.

2.2. Linear modeling

In this section we discuss both useful reformulation techniques to model different functions using linear programming, as well as common practices that are best avoided. By modeling we mean equivalent reformulations that lead to the same optimal solution; there is no approximation or modeling error involved.

2.2.1. Convex piecewise-linear functions

Perhaps the most basic and frequently used reformulation for linear optimization involves modeling a convex piecewise-linear function by introducing a number of linear inequalities. Consider the convex piecewise-linear (or rather piecewise-affine) function illustrated in Fig. 2.6, where the function can be described as \(\max \{ a_1 x + b_1, a_2 x + b_2, a_3 x + b_3 \}\).


Fig. 2.6 A convex piecewise-linear function (solid lines) of a single variable \(x\). The function is defined as the maximum of 3 affine functions.

More generally, we consider convex piecewise-linear functions \(f:\R^n \mapsto \R\) defined as

\[f(x) := \max_{i=1,\dots,m} \{ a_i^T x + b_i \},\]

where the important notion is that \(f\) is defined as the maximum of a set of affine functions. To model the epigraph \(f(x) \leq t\) (see App. 10), we note that \(t\geq \max_i \{ a_i^T x + b_i \}\) if and only if \(t\) is larger than all the affine functions, i.e., we have an equivalent formulation with \(m\) inequalities,

\[a_i^T x + b_i \leq t, \quad i=1,\dots,m.\]

Piecewise-linear functions have many uses linear in optimization; either we have a convex piecewise-linear formulation from the onset, or we may approximate a more complicated (nonlinear) problem using piecewise-linear approximations.

A word of caution is in order at this point; in the past the use of linear optimization was significantly more widespread than the use of nonlinear optimization, and piecewise-linear modeling was used to approximately solve nonlinear optimization problems. The field of nonlinear optimization (especially conic optimization) has since matured, and with modern optimization software it is both easier and more efficient to directly formulate and solve conic problems without piecewise-linear approximations.

2.2.2. Absolute values

The absolute value of a scalar variable is an example of a simple convex piecewise-linear function,

\[| x | := \max \{ \: x, \: -x \: \},\]

so we model \(|x|\leq t\) using two inequalities

\[-t \leq x \leq t.\]

2.2.3. The \(\ell_1\) norm

All norms are convex functions, but the \(\ell_1\) and \(\ell_\infty\) norms are of particular interest for linear optimization. The \(\ell_1\) norm of vector \(x\in\R^n\) is defined as

\[\| x \|_1 := |x_1| + |x_2| + \cdots + |x_n|.\]

To model the epigraph

(2)\[\|x\|_1 \leq t,\]

we first introduce the following system

(3)\[| x_i | \leq z_i, \: i=1,\dots, n, \quad \sum_{i=1}^n z_i = t,\]

with additional (auxiliary) variable \(z\in\R^n\), and we claim that (2) and (3) are equivalent. They are equivalent if all \(z_i=|x_i|\) in (3). Suppose that \(t\) is optimal (as small as possible) for (3), but that some \(z_i > |x_i|\). But then we could reduce \(t\) further by reducing \(z_i\) contradicting the assumption that \(t\) optimal, so the two formulations are equivalent. Therefore, we can model (2) using linear (in)equalities

\[-z_i \leq x_i \leq z_i, \quad \sum_{i=1}^n z_i = t,\]

with auxiliary variables \(z\). Similarly, we can describe the epigraph of the norm of an affine function of \(x\),

\[\| Ax-b\|_1 \leq t\]


\[-z_i \leq a_i^Tx - b_i \leq z_i, \quad \sum_{i=1}^n z_i = t,\]

where \(a_i\) is the \(i-\)th row of \(A\) (taken as a column-vector).

The \(\ell_1\) norm is overwhelmingly popular as a convex approximation of the cardinality (i.e., number on nonzero elements) of a vector \(x\). For example, suppose we are given an underdetermined linear system

\[Ax = b\]

where \(A\in \R^{m\times n}\) and \(m<<n\). The basis pursuit problem

(4)\[\begin{split}\begin{array}{ll} \mbox{minimize} & \|x \|_1\\ \mbox{subject to}& A x = b, \end{array}\end{split}\]

uses the \(\ell_1\) norm of \(x\) as a heuristic of finding a sparse solution (one with many zero elements) to \(Ax=b\), i.e., it aims to represent \(b\) using few columns of \(A\). Using the reformulation above we can pose the problem as a linear optimization problem,

(5)\[\begin{split}\begin{array}{lccccc} \mbox{minimize}& & & e^Tz\\ \mbox{subject to}& -z & \leq & x & \leq & z\\ & & &A x &=& b,\\ \end{array}\end{split}\]

where \(e=(1,\dots,1)^T\).

2.2.4. The \(\ell_\infty\) norm

The \(\ell_\infty\) norm of a vector \(x\in\R^n\) is defined as

\[\| x \|_\infty := \max_{i=1,\dots,n} | x_i |,\]

which is another example of simple piecewise-linear functions. To model

(6)\[ \| x \|_\infty \leq t\]

we use that \(t\geq \max_{i=1,\dots, n} | x_i |\) if and only if \(t\) is greater than each term, i.e., we can model (6) as

\[-t \leq x_i \leq t, \quad i=1,\dots,n.\]

Again, we can also consider an affine function of \(x\), i.e.,

\[\|A x - b\|_\infty \leq t,\]

which can be described as

\[-t \leq a_i^T x - b \leq t, \quad i=1,\dots,n.\]

It is interesting to note that the \(\ell_1\) and \(\ell_\infty\) norms are dual norms. For any norm \(\| \cdot \|\) on \(\R^n\), the dual norm \(\| \cdot \|_*\) is defined as

\[\| x \|_* = \max\{ x^T v \mid \| v\| \leq 1 \}.\]

Let us verify that the dual of the \(\ell_\infty\) norm is the \(\ell_1\) norm. Consider

\[\| x \|_{*,\infty} = \max\{ x^T v \mid \| v\|_\infty \leq 1 \}.\]

Obviously the maximum is attained for

\[\begin{split}v_i = \left\{ \begin{array}{cc} +1, & x_i \geq 0,\\-1, & x_i < 0,\end{array}\right.\end{split}\]

i.e., \(\| x \|_{*,\infty} = \| x \|_1 = \sum_i | x_i |\). Similarly, consider the dual of the \(\ell_1\) norm,

\[\| x \|_{\star,1} = \max\{ x^T v \mid \| v\|_1 \leq 1 \}.\]

To maximize \(x^T v\) subject to \(|v_1|+\cdots+|v_n|\leq 1\) we identify the largest element of \(x\), say \(|x_k|\), The optimizer \(v\) is then given by \(v_k=\pm 1\) and \(v_i=0, \: i\neq k\), i.e., \(\| x \|_{*,1}= \| x \|_\infty\). This illustrates a more general property of dual norms, namely that \(\|x\|_{**}=\|x\|\).

2.2.5. Avoid ill-posed problems

A problem is ill posed if small perturbations of the problem data result in arbitrarily large perturbations of the solution, or change feasibility of the problem. Such problem formulations should always be avoided as even the smallest numerical perturbations (for example rounding errors, or solving the problem on a different computer) can result in different or wrong solutions. Additionally, from an algorithmic point of view, even computing a wrong solution is very difficult for ill-posed problems.

A rigorous definition of the degree of ill-posedness is possible by defining a condition number for a linear optimization, but unfortunately this is not a very practical metric, as evaluating such a condition number requires solving several optimization problems. Therefore even though being able to quantify the difficulty of an optimization problem from a condition number is very attractive, we only make the modest recommendations to avoid problems

  • that are nearly infeasible,
  • with constraints that are linearly dependent, or nearly linearly dependent.

Typical examples of problems with nearly linearly dependent constraints are discretizations of continuous processes, where the constraints invariably become more correlated as we make the discretization finer; as such there may be nothing wrong with the discretization or problem formulation, but we should expect numerical difficulties for sufficiently fine discretizations.

2.2.6. Avoid badly scaled problems

Another difficulty encountered in practice is with models that are badly scaled. Loosely defined, we consider a model to be badly scaled if

  • variables are measured on very different scales,
  • constraints or bounds are measured on very different scales.

For example if one variable \(x_1\) has units of molecules and another variable \(x_2\) measures temperature, we expect both the coefficients \(c_1\) and \(c_2\) to be of a very different scale. In that case we might have an objective

\[10^{12}x_1 + x_2\]

and if we normalize the objective by \(10^{12}\) and round coefficients to 8 digits, we see that \(x_2\) effectively disappears from the objective. In practice we do not round coefficients, but in finite precision arithmetic any contribution of \(x_2\) to the objective will be insignificant (and unreliable).

A similar situation (from a numerical point of view) is encountered with the method of penalization or \(big-M\) strategies. For sake of argument, assume we have a standard linear optimization problem (7) with the additional constraint that \(x_1=0\). We may eliminate \(x_1\) completely from the model, or we might add an additional constraint, but suppose we choose to formulate a penalized problem instead

\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^T x + 10^{12}x_1\\ \mbox{subject to}& Ax = b\\ & x \geq 0, \end{array}\end{split}\]

reasoning that the large penalty term will force \(x_1=0\). However, if \(\| c \|<1\) we have the exact same problem, namely that in finite precision the penalty term will completely dominate the objective and render the contribution \(c^T x\) insignificant or unreliable. Therefore, penalty or big-\(M\) terms should never be used explicitly as part of an optimization model.

As a different example, consider again a problem (7), but with additional (redundant) constraints \(x_i \leq \gamma\). For optimization practitioners this is a common approach for stabilizing the optimization algorithm since it improves conditioning of the linear systems solved by an interior-point algorithm. The problem we solve is then

\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^T x\\ \mbox{subject to} & Ax = b\\ & x \geq 0\\ & x \leq \gamma e, \end{array}\end{split}\]

with a dual problem

\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y - \gamma e^T z\\ \mbox{subject to} & A^T y + s - z = c\\ & s, z \geq 0. \end{array}\end{split}\]

Suppose we do not know a-priori an upper bound on \(\| x \|_\infty\), so we choose \(\gamma=10^{12}\) reasoning that this will not change the optimal solution. Note that the large variable bound becomes a penalty term in the dual problem; in finite precision such a large bound will effectively destroy accuracy of the solution.

2.3. Duality in linear optimization

Duality theory is a rich and powerful area of convex optimization, and central to understanding sensitivity analysis and infeasibility issues in linear (and convex) optimization. Furthermore,it provides a simple and systematic way of obtaining non-trivial lower bounds on the optimal value for many difficult non-convex problem. In this section we only discuss duality theory at a descriptive level suited for practitioners; we refer to Appendix 9 for a more advanced treatment.

2.3.1. The dual problem

Initially, consider the standard linear optimization problem

(7)\[\begin{split}\begin{array}{ll} \mbox{minimize} & c^T x \\ \mbox{subject to} & Ax = b\\ & x \geq 0. \end{array}\end{split}\]

Associated with (7) is a so-called Lagrangian function \(L:\R^n\times \R^m \times \R^n \mapsto \R\) that augments the objective with a weighted combination of all the constraints,

\[L(x, y, s) = c^Tx + y^T(b-Ax) - s^T x,\]

The variables \(y\in \R^p\) and \(s\in \R_+^n\) are called Lagrange multipliers or dual variables. It is easy to verify that

\[L(x, y, s)\leq c^T x\]

for any feasible \(x\). Indeed, we have \(b-Ax=0\) and \(x^T s\geq 0\) since \(x,s\geq 0\), i.e., \(L(x,y,s)\leq c^T x\). Note the importance of nonnegativity of \(s\); more generally of all Lagrange multipliers associated with inequality constraints. Without the nonnegativity constraint the Lagrangian function is not a lower bound.

The dual function is defined as the minimum of \(L(x,y,s)\) over \(x\). Thus the dual function of (7) is

\[g(y, s) = \min_x L(x,y,s) = \min_x x^T( c - A^T y - s ) + b^T y.\]

We see that the Langrangian function is linear in \(x\), so it is unbounded below unless when \(c - A^T y - s = 0\), i.e.,

\[\begin{split}g(y,s) = \left \{ \begin{array}{ll} b^T y, & c - A^T y - s = 0\\ -\infty, & \mbox{otherwise}. \end{array} \right.\end{split}\]

Finally, we get a dual problem is by maximizing \(g(y,s)\). The dual problem of (7) is

(8)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y \\ \mbox{subject to} & c - A^T y = s\\ & s \geq 0. \end{array}\end{split}\]
Example 3 Dual of basis pursuit

As another example, let us derive the dual of the basis pursuit formulation (5). The Lagrangian function is

\[\begin{split}L(x, z, y, u, v) = e^Tz + u^T(x-z) - v^T(x+z) + y^T(b-Ax)\\\end{split}\]

with Lagrange multipliers \(y\in \R^m\) and \(u,v\in \R^n_+\). The dual function

\[g(y,u,v)=\min_{x,z}L(x,z,y,u,v) = \min_{x,z} z^T (e - u - v) + x^T(u - v - A^T y) + y^T b\]

is linear in \(z\) and \(x\) so unbounded below unless \(e = u + v\) and \(A^Ty = u-v\), i.e., the dual problem is

(9)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y \\ \mbox{subject to}& e = u + v,\\ & A^T y = u - v\\ & u, v \geq 0. \end{array}\end{split}\]
Example 4 Dual of basis pursuit revisited

We can also derive the dual of the basis pursuit formulation (4) directly. The Lagrangian is

\[L(x, y) = \| x \|_1 + y^T(Ax-b) = \| x \|_1 + x^T A^T y - b^T y\]

with a dual function

\[g(y, s) = -b^T y + \min_x (\| x \|_1 + x^T A^T y).\]

The term \(\min_x(\| x \|_1 + x^T A^T y)\) can be simplified as

\begin{equation} \begin{aligned} \min_x (\| x \|_1 + x^T A^T y) & = \min_{t\geq 0}\min_{\| z\|_1=1}(t\|z\|_1 + t z^T A^T y)\\ & = \min_{t\geq 0} t(1-\max_{\| z\|_1=1} z^T A^T y) \\ & = \min_{t\geq 0}t(1 - \| A^T y \|_\infty),\end{aligned}\end{equation}

where we used the definition of the dual norm in the last line. Finally \(\min_{t\geq 0}t(1-\| A^T y \|_\infty)\) is 0 if \(\|A^Ty\|_\infty \leq 1\) and unbounded below otherwise. In other words, we get a dual function

\[\begin{split}g(y) = \left\{ \begin{array}{ll}-b^T y, & \| A^T y \|_\infty \leq 1,\\-\infty, & \text{otherwise}, \end{array}\right.\end{split}\]

and a dual problem

(10)\[\begin{split}\begin{array}{ll} \mbox{maximize} & b^T y\\ \mbox{subject to}& \| A^T y \|_\infty \leq 1. \end{array}\end{split}\]

Not surprisingly, (9) and (10) are equivalent in the sense that

\[e = u+v, \quad A^T y = u-v, \quad u,v \geq 0 \qquad \Longleftrightarrow \qquad \|A^T y \|_\infty \leq 1.\]

2.3.2. Duality properties

Many important observations can be made for the dual problem (8), which we discuss in details in Appendix 9. We summarize the most important properties below; to that end let \(x^\star\) and \((y^\star, s^\star)\) denote the optimal primal and dual solutions with optimal values \(p^\star:=c^T x^\star\) and \(d^\star:=b^T y^\star\), respectively.

  • In both the primal and dual problem we optimize over the nonnegative orthant, i.e., \(x, s\geq 0\). The number of constraints in the primal problem becomes the number of variables in the dual problem, and vice versa.

  • The dual problem gives a lower bound on the primal problem for all \((x,y,s)\). From the dual problem we have \(c - A^Ty = s\), so

    \[x^T(c - A^T y) = x^T s \geq 0,\]

    and since \(Ax=b\) we have

    \[c^T x - b^T y = x^T s \geq 0\]

    i.e., \(b^T y \leq c^T x\). This is called the weak duality property, and the nonnegative entity \(x^T s\) is called the complementarity gap.

  • If the primal feasible set is nonempty

    \[{\cal F}_p=\{x\in \R^n \mid Ax=b, \: x\geq 0\} \neq \emptyset\]

    or the dual feasible set is nonempty

    \[{\cal F}_d=\{ y\in \R^m \mid c-A^T y \geq 0\} \neq \emptyset\]

    then the optimal primal and dual values coincide, i.e.,

    \[d^\star = c^T x^\star = b^T y^\star = p^\star.\]

    This remarkable fact (see Appendix 9.5 for a proof) is called the strong duality property. If strong duality holds then \((x^\star)^Ts^\star=0\) so \(x^\star\) and \(s^\star\) are complementary; since \(x^\star,s^\star\geq 0\) we have that \(x^\star_i > 0 \Leftrightarrow s^\star_i = 0\) and vice versa.

    For linear (and conic) optimization strong duality means that we have the choice of solving either the primal or the dual problem (or both).

2.4. Infeasibility in linear optimization

2.4.1. Basic concepts

In Sec. 2.3.2 we summarized the main duality properties, namely weak and strong duality properties. In this section we discuss situations where strong duality does not hold. Those situations are captured by the following two results known as (variations of) Farkas’ lemma; for proofs see Appendix 9.

Lemma 8 Farkas' lemma

Given \(A\) and \(b\), exactly one of the two statements are true:

  1. There exists an \(x\geq 0\) such that \(Ax = b\).
  2. There exists a \(y\) such that \(A^T y \leq 0\) and \(b^T y > 0\).

The Farkas lemma tells us that either the primal problem (7) is feasible (\({\cal F}_p\neq \emptyset\)) or there exists a \(y\) such that \(A^T y \leq 0\) and \(b^T y > 0\). In other words, any \(y\) satisfying

\[A^T y \leq 0, \quad b^T y > 0\]

is a certificate of primal infeasibility. We can also think of a Farkas certificate as an unbounded direction for the dual problem; to that end assume that

\[\not\exists x\geq 0 \: : \: Ax = b,\]

so we have a \(y\) satisfying \(A^Ty \leq 0\) and \(b^T y > 0\). If we further assume existence of point \(y_0\) satifying

\[c - A^T y_0\geq 0\]

then the dual remains feasible in the direction of \(y\),

\[c-A^T(t y + y_0) \geq 0, \quad \forall t\geq 0\]

with an unbounded objective \(b^T (ty + y_0)\rightarrow \infty\) for \(t\rightarrow \infty\), i.e., \(d^\star = \infty\).

Similarly, the dual variant of Farkas’ lemma states that either the dual problem is feasible (\({\cal F}_d\neq \emptyset\)) or there exists an \(x\geq 0\) such that \(Ax=0\) and \(c^T x <0\). More precisely

Lemma 9 Farkas' lemma dual variant

Given \(A\) and \(c\), exactly one of the two statements are true:

  1. There exists an \(x\geq 0\) such that \(Ax = 0\) and \(c^Tx < 0\).
  2. There exists a \(y\) such that \(c-A^Ty \geq 0\).

In other words, any \(x\geq 0\) satisfying \(Ax=0\) and \(c^T x < 0\) is a certificate of dual infeasibility. If the primal problem is feasible, then the certificate is a feasible unbounded direction for the primal objective, i.e., \(p^\star = -\infty\).

Below we summarize the different cases that can occur in linear optimization:

  • If the either the primal or dual problems are feasible, we have strong duality, i.e., \(p^\star=d^\star\).
  • If the primal problem is infeasible (\(p^\star=\infty\)), then from Farkas’ lemma the dual problem is unbounded (\(d^\star = \infty\)) or infeasible (\(d^\star=-\infty\)).
  • If the primal problem is unbounded (\(p^\star=-\infty\)), then from weak duality the dual problem is infeasible (\(d^\star = -\infty\)).
  • If the dual problem is infeasible (\(d^\star=-\infty\)), then from Farkas’ dual lemma then the primal problem is unbounded (\(p^\star=-\infty\)) or infeasible (\(p^\star=\infty\)).
Example 5 Primal and dual infeasibility

As an example exhibiting both primal and dual infeasibility consider the problem

\[\begin{split}\begin{array}{ll} \mbox{minimize} & -x_1 -x_2\\ \mbox{subject to} & x_1 = -1\\ & x_1,\,x_2 \geq 0 \end{array}\end{split}\]

with a dual problem

\[\begin{split}\begin{array}{ll} \mbox{maximize} & -y\\ \mbox{subject to} & \left [ \begin{array}{c}1\\0\end{array} \right ] y \leq \left [ \begin{array}{c}-1\\-1\end{array} \right ]. \end{array}\end{split}\]

Both the primal and dual problems are trivially infeasible; \(y=-1\) serves as a certificate of primal infeasibility, and \(x=(0,1)\) is a certificate of dual infeasibility.

2.4.2. Locating infeasibility

In some cases we are interested in locating the cause of infeasibility in a model, for example if we expect the infeasibility to be caused by an error in the problem formulation. This can be difficult in practice, but a Farkas certificate lets us reduce the dimension of the infeasible problem, which in some cases pinpoints the cause of infeasibility.

To that end, suppose we are given a certificate of primal infeasibility,

\[A^T y \leq 0, \quad b^T y > 0,\]

and define the index-set

\[{\cal I} = \{ i \mid y_i \neq 0\}.\]

Consider the reduced set of constraints

\[A_{{\cal I},:}x = b_{\cal I}, \quad x\geq 0.\]

It is then easy to verify that

\[A_{{\cal I},:}^T y_{\cal I} \leq 0, \quad b_{\cal I}^T y_{\cal I} > 0\]

is an infeasibility certificate for the reduced problem with fewer constraints. If the reduced system is sufficiently small, it may be possible to locate the cause of infeasibility by manual inspection.

2.5. Bibliography

The material in this chapter is very basic, and can be found in any textbook on linear optimization. For further details, we suggest a few standard references [Chv83], [BT97] and [PS98], which all cover much more that discussed here. Although the first edition of is not very recent, it is still considered one of the best textbooks on graph and network flow algorithms. [NW06] gives a more modern treatment of both theory and algorithmic aspects of linear optimization.