9.5 General Convex Optimization

The general nonlinear optimizer, which may be available for all or only some types of nonlinear problems depending on the interface, solves smooth (twice differentiable) convex nonlinear optimization problems of the form

\[\begin{split}\begin{array}{lccccc} \mbox{minimize} & & & f(x) + c^T x + c^f & & \\ \mbox{subject to } & l^c & \leq & g(x) + A x & \leq & u^c, \\ & l^x & \leq & x & \leq & u^x, \end{array}\end{split}\]

where

  • \(m\) is the number of constraints.
  • \(n\) is the number of decision variables.
  • \(x \in \real^n\) is a vector of decision variables.
  • \(c \in \real^n\) is the linear part of the objective function.
  • \(c^f \in \real\) is a constant term in the objective.
  • \(A \in \real^{m \times n}\) is the constraint matrix.
  • \(l^c \in \real^m\) is the lower limit on the activity for the constraints.
  • \(u^c \in \real^m\) is the upper limit on the activity for the constraints.
  • \(l^x \in \real^n\) is the lower limit on the activity for the variables.
  • \(u^x \in \real^n\) is the upper limit on the activity for the variables.
  • \(f: \real^n \rightarrow \real\) is a nonlinear function.
  • \(g: \real^n \rightarrow \real^m\) is a nonlinear vector function.

This means that the \(i\)-th constraint has the form

\[l_i^c \leq g_i(x) + \sum_{j=1}^n a_{ij} x_j \leq u_i^c.\]

The linear term \(Ax\) is not included in \(g(x)\) since it can be handled much more efficiently as a separate entity when optimizing.
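To make the data layout concrete, the sketch below stores the linear data \(c\), \(c^f\), \(A\) and the bounds explicitly, and supplies \(f\) and \(g\) as callbacks. The ConvexProblem name and all of its members are hypothetical, chosen only to mirror the notation above; this is not part of any MOSEK interface.

```python
import numpy as np

class ConvexProblem:
    """Hypothetical container for
        minimize    f(x) + c^T x + c^f
        subject to  l^c <= g(x) + A x <= u^c,
                    l^x <=     x      <= u^x,
    where f and g are user-supplied callbacks (not a MOSEK API)."""

    def __init__(self, c, cf, A, lc, uc, lx, ux, f, g):
        self.c, self.cf = np.asarray(c, float), float(cf)
        self.A = np.asarray(A, float)
        self.lc, self.uc = np.asarray(lc, float), np.asarray(uc, float)
        self.lx, self.ux = np.asarray(lx, float), np.asarray(ux, float)
        self.f, self.g = f, g

    def objective(self, x):
        return self.f(x) + self.c @ x + self.cf

    def activity(self, x):
        # g(x) + A x, the quantity bounded by l^c and u^c.
        return self.g(x) + self.A @ x
```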

The nonlinear functions \(f\) and \(g\) must be smooth for all \(x \in [l^x; u^x]\). Moreover, \(f(x)\) must be a convex function and each \(g_i(x)\) must satisfy

\[\begin{split}\begin{array}{rcl} -\infty <l_i^c & \Rightarrow & g_i(x) \mbox{ is concave},\\ u_i^c < \infty & \Rightarrow & g_i(x) \mbox{ is convex}, \\ -\infty <l_i^c \leq u_i^c < \infty & \Rightarrow & g_i(x) = 0. \end{array}\end{split}\]
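For example, the problem

\[\begin{split}\begin{array}{lccl} \mbox{minimize} & e^{x_1} + x_2 & & \\ \mbox{subject to} & x_1^2 + x_2^2 & \leq & 1 \end{array}\end{split}\]

fits this scheme with \(f(x) = e^{x_1}\), \(c = (0,1)^T\), \(c^f = 0\), \(g_1(x) = x_1^2 + x_2^2\), \(l_1^c = -\infty\) and \(u_1^c = 1\): only \(u_1^c\) is finite, so the rules above require \(g_1\) to be convex, which it is.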

9.5.1 Duality for General Convex Optimization

As in the linear case, MOSEK reports dual information in the general nonlinear case. Here the Lagrange function is defined by

\[\begin{split}\begin{array} {rcl} L(x,s_l^c,s_u^c,s_l^x,s_u^x) & := & f(x) + c^T x + c^{f} \\ & & - (s_l^c)^T(g(x) + A x - l^c) - (s_u^c)^T(u^c - g(x) - A x)\\ & & - (s_l^x)^T(x-l^x) - (s_u^x)^T(u^x-x), \end{array}\end{split}\]
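As a sanity check, this definition can be transcribed directly; the sketch below reuses the hypothetical ConvexProblem container from above (all arguments are NumPy arrays) and skips terms belonging to infinite bounds, since the corresponding multipliers are fixed at zero.

```python
import numpy as np

def lagrangian(x, slc, suc, slx, sux, prob):
    # L(x, s_l^c, s_u^c, s_l^x, s_u^x) as defined above.
    def penalty(s, slack):
        # Multipliers of infinite bounds are zero, so drop
        # those terms instead of evaluating inf * 0.
        finite = np.isfinite(slack)
        return s[finite] @ slack[finite]

    activity = prob.g(x) + prob.A @ x
    return (prob.f(x) + prob.c @ x + prob.cf
            - penalty(slc, activity - prob.lc)
            - penalty(suc, prob.uc - activity)
            - penalty(slx, x - prob.lx)
            - penalty(sux, prob.ux - x))
```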

and the dual problem is given by

\[\begin{split}\begin{array} {lccl} \mbox{maximize} & L(x,s_l^c,s_u^c,s_l^x,s_u^x) & & \\ \mbox{subject to} & \nabla_x L(x,s_l^c,s_u^c,s_l^x,s_u^x)^T & = & 0, \\ & s_l^c,s_u^c,s_l^x,s_u^x \geq 0, & & \end{array}\end{split}\]

which is equivalent to

\[\begin{split}\begin{array} {lccl} \mbox{maximize} & (l^c)^T s_l^c - (u^c)^T s_u^c + (l^x)^T s_l^x - (u^x)^T s_u^x + c^{f} & & \\ & + f(x) - g(x)^T y - (\nabla f(x)^T - \nabla g(x)^T y)^T x & & \\ \mbox{subject to} & A^T y + s_l^x - s_u^x - (\nabla f(x)^T - \nabla g(x)^T y) & = & c, \\ & -y + s_l^c - s_u^c & = & 0, \\ & s_l^c, s_u^c, s_l^x, s_u^x \geq 0. & & \end{array}\end{split}\]
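The equivalence can be verified by introducing \(y := s_l^c - s_u^c\) (the second constraint above) and expanding the stationarity condition with the gradient conventions given below:

\[\nabla_x L(x,s_l^c,s_u^c,s_l^x,s_u^x)^T = \nabla f(x)^T + c - \left(\nabla g(x)^T + A^T\right) y - s_l^x + s_u^x = 0,\]

which rearranges into the first constraint; substituting the resulting expression for \(c^T x\) back into the Lagrange function yields the stated objective.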

In this context we use the following definition of the gradient for scalar functions

\[\begin{array} {lccl} \nabla f(x) & = & \left[ \frac{\partial f(x)}{\partial x_1}, \ldots, \frac{\partial f(x)}{\partial x_n} \right], \end{array}\]

and accordingly for vector functions

\[\begin{split}\begin{array} {lccl} \nabla g(x) & = & \left[ \begin{array} {c} \nabla g_1(x) \\ \vdots \\ \nabla g_m(x) \end{array} \right]. \end{array}\end{split}\]
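Hence \(\nabla f(x)\) is a row vector and \(\nabla g(x)\) is the \(m \times n\) Jacobian whose \(i\)-th row is \(\nabla g_i(x)\). A minimal finite-difference sketch of these conventions follows (illustrative only; the function names and the fixed step h are assumptions):

```python
import numpy as np

def grad(f, x, h=1e-7):
    # Row-vector gradient [df/dx_1, ..., df/dx_n] via central differences.
    return np.array([(f(x + h * e) - f(x - h * e)) / (2.0 * h)
                     for e in np.eye(x.size)])

def jacobian(g, x, m, h=1e-7):
    # m x n matrix whose i-th row is the gradient of g_i.
    return np.vstack([grad(lambda z, i=i: g(z)[i], x, h) for i in range(m)])

x = np.array([0.5, 0.5])
print(grad(lambda z: np.exp(z[0]), x))                # ~ [exp(0.5), 0]
print(jacobian(lambda z: np.array([z @ z]), x, m=1))  # ~ [[1.0, 1.0]]
```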