4 The power cone

So far we studied quadratic cones and their applications in modeling problems involving, directly or indirectly, quadratic terms. In this part we expand the quadratic and rotated quadratic cone family with power cones, which provide a convenient language to express models involving powers other than \(2\). We must stress that although the power cones include the quadratic cones as special cases, at the current state-of-the-art they require more advanced and less efficient algorithms.

4.1 The power cone(s)

\(n\)-dimensional power cones form a family of convex cones parametrized by a real number \(0<\alpha<1\):

(4.1)\[\begin{array}{rcl} \POW_n^{\alpha,1-\alpha} & = & \left\{x\in \real^n~:~x_1^\alpha x_2^{1-\alpha} \geq \sqrt{x_3^2+\cdots+x_n^2},\ x_1,x_2\geq 0\right\}. \end{array}\]

The constraint in the definition of \(\POW_n^{\alpha,1-\alpha}\) can be expressed as a composition of two constraints, one of which is a quadratic cone:

(4.2)\[\begin{split}\begin{array}{rcl} x_1^\alpha x_2^{1-\alpha} & \geq & |z|,\\ z & \geq & \sqrt{x_3^2+\cdots+x_n^2}, \end{array}\end{split}\]

which means that the basic building block we need to consider is the three-dimensional power cone

(4.3)\[\begin{array}{rcl} \POW_3^{\alpha,1-\alpha} & = & \left\{x\in \real^3~:~x_1^\alpha x_2^{1-\alpha} \geq |x_3|,\ x_1,x_2\geq 0\right\}. \end{array}\]

More generally, we can also consider power cones with “long left-hand side”. That is, for \(m< n\) and a sequence of exponents \(\alpha_1,\ldots,\alpha_m\) with \(\alpha_1+\cdots+\alpha_m=1\), we have the most general power cone object defined as

(4.4)\[\begin{array}{rcl} \POW_n^{\alpha_1,\cdots,\alpha_m} & = & \left\{x\in \real^n~:~\prod_{i=1}^m x_i^{\alpha_i} \geq \sqrt{\sum_{i=m+1}^n x_{i}^2},\ x_1,\ldots,x_m\geq 0\right\}. \end{array}\]

The left-hand side is nothing but the weighted geometric mean of the \(x_i\), \(i=1,\ldots, m\) with weights \(\alpha_i\). As we will see later, also this most general cone can be modeled as a composition of three-dimensional cones \(\POW_3^{\alpha,1-\alpha}\), so in a sense that is the basic object of interest.

There are some notable special cases we are familiar with. If we let \(\alpha\to 0\) then in the limit we get \(\POW_n^{0,1}=\real_+\times \Q^{n-1}\). If \(\alpha=\frac12\) then we have a rescaled version of the rotated quadratic cone, precisely:

\[(x_1,x_2,\ldots,x_n)\in\POW_n^{\frac12,\frac12} \iff (x_1/\sqrt{2},x_2/\sqrt{2},x_3,\ldots,x_n) \in \Qr^n.\]

A gallery of three-dimensional power cones for varying \(\alpha\) is shown in Fig. 4.1.

_images/powcone.png

Fig. 4.1 The boundary of \(\POW_3^{\alpha,1-\alpha}\) seen from a point inside the cone for \(\alpha=0.1,0.2,0.35,0.5\).

4.2 Sets representable using the power cone

In this section we give basic examples of constraints which can be expressed using power cones.

4.2.1 Powers

For all values of \(p\neq 0,1\) we can bound \(x^p\) depending on the convexity of \(f(x)=x^p\).

  • For \(p>1\) the inequality \(t\geq |x|^p\) is equivalent to \(t^{1/p}\geq |x|\) and hence corresponds to

    \[t\geq |x|^p \iff (t, 1, x) \in \POW_3^{1/p,1-1/p}.\]

    For instance \(t\geq |x|^{1.5}\) is equivalent to \((t,1,x)\in\POW_3^{2/3,1/3}\).

  • For \(0<p<1\) the function \(f(x)=x^p\) is concave for \(x\geq 0\) and so we get

    \[|t|\leq x^p,\ x\geq 0 \iff (x, 1, t) \in \POW_3^{p,1-p}.\]
  • For \(p<0\) the function \(f(x)=x^p\) is convex for \(x>0\) and in this range the inequality \(t\geq x^p\) is equivalent to

    \[t\geq x^p \iff t^{1/(1-p)}x^{-p/(1-p)}\geq 1 \iff (t,x,1)\in\POW_3^{1/(1-p),-p/(1-p)}.\]

    For example \(t\geq\frac{1}{\sqrt{x}}\) is the same as \((t,x,1)\in\POW_3^{2/3,1/3}\).

4.2.2 \(p\)-norm cones

Let \(p\geq 1\). The \(p\)-norm of a vector \(x\in\real^n\) is \(\|x\|_p=\left(|x_1|^p+\cdots+|x_n|^p\right)^{1/p}\) and the \(p\)-norm ball of radius \(t\) is defined by the inequality \(\|x\|_p\leq t\). We take the \(p\)-norm cone (in dimension \(n+1\)) to be the convex set

(4.5)\[\left\{(t,x)\in\real^{n+1}~:~t\geq \|x\|_p\right\}.\]

For \(p=2\) this is precisely the quadratic cone. We can model the \(p\)-norm cone by writing the inequality \(t\geq \|x\|_p\) as:

\[t\geq \sum_i |x_i|^p/t^{p-1}\]

and bounding each summand with a power cone. This leads to the following model:

(4.6)\[\begin{split}\begin{array}{rclr} r_i t^{p-1} & \geq & |x_i|^p & ((r_i,t,x_i)\in\POW_3^{1/p,1-1/p}), \\ \sum r_i & = & t. & \end{array}\end{split}\]

When \(0<p<1\) or \(p<0\) the formula for \(\|x\|_p\) gives a concave, rather than convex function on \(\real^n_+\) and in this case it is possible to model the set

\[\left\{(t,x)~:~0\leq t\leq \left(\sum x_i^p\right)^{1/p}, x_i\geq 0\right\},\ p<1, p\neq 0.\]

We leave it as an exercise (see previous subsection). The case \(p=-1\) appears in Sec. 3.2.6 (Harmonic mean).

4.2.3 The most general power cone

Consider the most general version of the power cone with “long left-hand side” defined in (4.4). We show that it can be expressed using the basic three-dimensional cones \(\POW_3^{\alpha,1-\alpha}\). Clearly it suffices to consider a short right-hand side, that is to model the cone

(4.7)\[\POW_{m+1}^{\alpha_1,\ldots,\alpha_m} = \left\{x_1^{\alpha_1} x_2^{\alpha_2}\cdots x_m^{\alpha_m} \geq |z|,\ x_1,\ldots,x_m\geq 0\right\},\]

where \(\sum_i\alpha_i=1\). Denote \(s=\alpha_1+\cdots+\alpha_{m-1}\). We split (4.7) into two constraints

(4.8)\[\begin{split}\begin{array}{rclr} x_1^{\alpha_1/s} \cdots x_{m-1}^{\alpha_{m-1}/s} & \geq & |t|, & \ x_1,\ldots,x_{m-1}\geq 0, \\ t^sx_m^{\alpha_m} & \geq & |z|, & x_m\geq 0, \end{array}\end{split}\]

and this way we expressed \(\POW_{m+1}^{\alpha_1,\ldots,\alpha_m}\) using two power cones \(\POW_{m}^{\alpha_1/s,\ldots,\alpha_{m-1}/s}\) and \(\POW_3^{s,\alpha_m}\). Proceeding by induction gives the desired splitting.

4.2.4 Geometric mean

The power cone \(\POW_{n+1}^{1/n,\ldots,1/n}\) is a direct way to model the inequality

(4.9)\[|z|\leq (x_1x_2\cdots x_n)^{1/n},\]

which corresponds to maximizing the geometric mean of the variables \(x_i\geq 0\). In this special case tracing through the splitting (4.8) produces an equivalent representation of (4.9) using three-dimensional power cones as follows:

\[\begin{split}\begin{array}{rcl} x_1^{1/2}x_2^{1/2} &\geq &|t_3|,\\ t_3^{1-1/3}x_3^{1/3} &\geq& |t_4|,\\ &\cdots& \\ t_{n-1}^{1-1/(n-1)}x_{n-1}^{1/(n-1)} &\geq& |t_n|,\\ t_n^{1-1/n}x_n^{1/n}&\geq& |z|. \end{array}\end{split}\]

4.2.5 Non-homogenous constraints

Every constraint of the form

\[x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_m^{\alpha_m} \geq |z|^\beta,\ x_i\geq 0,\]

where \(\sum_i\alpha_i< \beta\) and \(\alpha_i>0\) is equivalent to

\[(x_1,x_2,\ldots,x_m,1,z)\in\POW_{m+2}^{\alpha_1/\beta,\ldots,\alpha_m/\beta,s}\]

with \(s=1-\sum_i\alpha_i/\beta\).

4.3 Power cone case studies

4.3.1 Portfolio optimization with market impact

Let us go back to the Markowitz portfolio optimization problem introduced in Sec. 3.3.3 (Markowitz portfolio optimization), where we now ask to maximize expected profit subject to bounded risk in a long-only portfolio:

(4.10)\[\begin{split}\begin{array}{lrcl} \maximize & \mu^Tx & & \\ \st & \sqrt{x^T\Sigma x} & \leq & \gamma,\\ & x_i & \geq & 0, \ i=1,\ldots,n, \end{array}\end{split}\]

In a realistic model we would have to consider transaction costs which decrease the expected return. In particular if a really large volume is traded then the trade itself will affect the price of the asset, a phenomenon called market impact. It is typically modeled by decreasing the expected return of \(i\)-th asset by a slippage cost proportional to \(x^\beta\) for some \(\beta>1\), so that the objective function changes to

\[\maximize \left( \mu^Tx - \sum_i\delta_i x_i^\beta\right).\]

A popular choice is \(\beta=3/2\). This objective can easily be modeled with a power cone as in Sec. 4.2 (Sets representable using the power cone):

\[\begin{split}\begin{array}{lrclr} \maximize & \mu^Tx - \delta^Tt & & & \\ \st & (t_i, 1, x_i) & \in & \POW_3^{1/\beta,1-1/\beta} & (t_i\geq x_i^\beta), \\ & \cdots & & & \end{array}\end{split}\]

In particular if \(\beta=3/2\) the inequality \(t_i\geq x_i^{3/2}\) has conic representation \((t_i,1,x_i)\in \POW_3^{2/3,1/3}\).

4.3.2 Maximum volume cuboid

Suppose we have a convex, conic representable set \(K\subseteq\real^n\) (for instance a polyhedron, a ball, intersections of those and so on). We would like to find a maximum volume axis-parallel cuboid inscribed in \(K\). If we denote by \(p\in\real^n\) the leftmost corner and by \(x_1,\ldots,x_n\) the edge lengths, then this problem is equivalent to

(4.11)\[\begin{split}\begin{array}{lll} \maximize & t & \\ \st & t \leq (x_1\cdots x_n)^{1/n},& \\ & x_i \geq 0, &\\ & (p_1+e_1x_1,\ldots,p_n+e_nx_n) \in K, & \forall e_1,\ldots,e_n\in\{0,1\}, \end{array}\end{split}\]

where the last constraint states that all vertices of the cuboid are in \(K\). The optimal volume is then \(v=t^n\). Modeling the geometric mean with power cones was discussed in Sec. 4.2.4 (Geometric mean).

Maximizing the volume of an arbitrary (not necessarily axis-parallel) cuboid inscribed in \(K\) is no longer a convex problem. However, it can be solved by maximizing the solution to (4.11) over all sets \(T(K)\) where \(T\) is a rotation (orthogonal matrix) in \(\real^n\). In practice one can approximate the global solution by sampling sufficiently many rotations \(T\) or using more advanced methods of optimization over the orthogonal group.

_images/icosahedron.png

Fig. 4.2 The maximal volume cuboid inscribed in the regular icosahedron takes up approximately \(0.388\) of the volume of the icosahedron (the exact value is \(3(1+\sqrt{5})/25\)).

4.3.3 \(p\)-norm geometric median

The geometric median of a sequence of points \(x_1,\ldots,x_k\in\real^n\) is defined as

\[\mathrm{argmin}_{y\in\real^n} \sum_{i=1}^k \|y-x_i\|,\]

that is a point which minimizes the sum of distances to all the given points. Here \(\|\cdot\|\) can be any norm on \(\real^n\). The most classical case is the Euclidean norm, where the geometric median is the solution to the basic facility location problem minimizing total transportation cost from one depot to given destinations.

For a general \(p\)-norm \(\|x\|_p\) with \(1\leq p<\infty\) (see Sec. 4.2.2 (-norm cones)) the geometric median is the solution to the obvious conic problem:

(4.12)\[\begin{split}\begin{array}{ll} \minimize & \sum_i t_i \\ \st & t_i \geq \|y-x_i\|_p, \\ & y\in\real^n. \end{array}\end{split}\]

In Sec. 4.2.2 (-norm cones) we showed how to model the \(p\)-norm bound using \(n\) power cones.

The Fermat-Torricelli point of a triangle is the Euclidean geometric mean of its vertices, and a classical theorem in planar geometry (due to Torricelli, posed by Fermat), states that it is the unique point inside the triangle from which each edge is visible at the angle of \(120^\circ\) (or a vertex if the triangle has an angle of \(120^\circ\) or more). Using (4.12) we can compute the \(p\)-norm analogues of the Fermat-Torricelli point. Some examples are shown in Fig. 4.3.

_images/fermat.png

Fig. 4.3 The geometric median of three triangle vertices in various \(p\)-norms.

4.3.4 Maximum likelihood estimator of a convex density function

In [TV98] the problem of estimating a density function that is know in advance to be convex is considered. Here we will show that this problem can be posed as a conic optimization problem. Formally the problem is to estimate an unknown convex density function \(g : \R_+ \rightarrow \R_+\) given an ordered sample \(y_1 < y_2 < \ldots < y_n\) of \(n\) outcomes of a distribution with density \(g\).

The estimator \(\tilde g \geq 0\) is a piecewise linear function

\[\tilde g : [y_1, y_n] \rightarrow \R_+\]

with break points at \((y_i, x_i)\), \(i = 1, \ldots , n\), where the variables \(x_i > 0\) are estimators for \(g(y_i)\). The slope of the \(i\)-th linear segment of \(\tilde g\) is

\[\frac{x_{i+1}-x_i}{y_{i+1}-y_i}.\]

Hence the convexity requirement leads to the constraints

\[\frac{x_{i+1}-x_i}{y_{i+1}-y_i} \leq \frac{x_{i+2}-x_{i+1}}{y_{i+2}-y_{i+1}}, \, i=1,\ldots,n-2.\]

Recall the area under the density function must be 1. Hence,

\[\sum_{i=1}^{n-1} (y_{i+1}-y_i)\left ( \frac{x_{i+1}+x_i}{2} \right ) = 1\]

must hold. Therefore, the problem to be solved is

\[\begin{split}\begin{array}{lrcll} \maximize & \prod_{i=1}^n x_i & \\ \st & \frac{x_{i+1}-x_i}{y_{i+1}-y_i} - \frac{x_{i+2}-x_{i+1}}{y_{i+2}-y_{i+1}} & \leq & 0, & i=1,\ldots,n-2, \\ & \sum_{i=1}^{n-1} (y_{i+1}-y_i)\left ( \frac{x_{i+1}+x_i}{2} \right ) & = & 1, \\ & x & \geq & 0.\\ \end{array}\end{split}\]

Maximizing \(\prod_{i=1}^n x_i\) or the geometric mean \((\prod_{i=1}^n x_i)^\frac{1}{n}\) will produce the same optimal solutions. Using that observation we can model the objective as shown in Sec. 4.2.4 (Geometric mean). Alternatively, one can use as objective \(\sum_i\log{x_i}\) and model it using the exponential cone as in Sec. 5.2 (Modeling with the exponential cone).