9.2 Least Squares and Other Norm Minimization Problems
A frequently occurring problem in statistics and in many other areas of science is a norm minimization problem

\[
\begin{array}{ll}
\mbox{minimize}   & \|Fx-g\|, \\
\mbox{subject to} & Ax = b,
\end{array}
\]

where \(x\in\real^n\), and of course other types of constraints can be allowed. The objective can involve various norms: the infinity norm, 1-norm, 2-norm, \(p\)-norms and so on. For instance, the most popular case of the 2-norm corresponds to least squares linear regression, since minimizing \(\|Fx-g\|_2\) is equivalent to minimizing \(\|Fx-g\|_2^2\).
9.2.1 Least squares, 2-norm
In the case of the 2-norm we specify the problem directly in conic quadratic form:

\[
\begin{array}{ll}
\mbox{minimize}   & t, \\
\mbox{subject to} & (t, Fx-g) \in \mathcal{Q}^{k+1}, \\
                  & Ax = b.
\end{array}
\]

The first constraint of the problem can be represented as an affine conic constraint. This leads to the following model.
% Least squares regression
% minimize \|Fx-g\|_2
function x = norm_lse(F,g,A,b)
    n = size(F,2);
    k = size(g,1);
    m = size(A,1);

    % A model with variables x(n) and t(1)
    model = mosekmodel(name = "norm_lse", numvar = n + 1, ...
                       objective = [zeros(n, 1); 1]);

    % Linear constraints Ax = b
    model.appendcons(F = [A, zeros(m, 1)], ...
                     domain = mosekdomain("equal", dim = m, rhs = b));

    % Affine conic constraint (t, Fx-g) in Q^{k+1}
    model.appendcons(F = [sparse(1,n), 1; F, sparse(k,1)], g = [0; -g], ...
                     domain = mosekdomain("quadratic cone", dim = k + 1));

    model.solve();
    xx = model.getsolution("any", "x");
    x = xx(1:n);
end
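The function can be exercised on small synthetic data. The following is a minimal usage sketch; the data, dimensions and the sum constraint are made up for illustration and do not come from the text.

% Minimal usage sketch with made-up data: fit 5 coefficients to 100
% noisy observations, constraining the coefficients to sum to 5.
rng(0);
F = randn(100, 5);
g = F * ones(5, 1) + 0.1 * randn(100, 1);
A = ones(1, 5);
b = 5;
x = norm_lse(F, g, A, b);
fprintf("Residual norm: %.4f\n", norm(F * x - g));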
9.2.2 Ridge regularization
Regularization is classically applied to reduce the impact of outliers and to control overfitting. In the conic version of ridge (Tikhonov) regression we consider the problem

\[
\begin{array}{ll}
\mbox{minimize}   & \|Fx-g\|_2 + \gamma\|x\|_2, \\
\mbox{subject to} & Ax = b,
\end{array}
\]

which can be written explicitly as

\[
\begin{array}{ll}
\mbox{minimize}   & t_1 + \gamma t_2, \\
\mbox{subject to} & (t_1, Fx-g) \in \mathcal{Q}^{k+1}, \\
                  & (t_2, x) \in \mathcal{Q}^{n+1}, \\
                  & Ax = b.
\end{array}
\]
The implementation is a small extension of that from the previous section.
% Least squares regression with regularization
% minimize \|Fx-g\|_2 + gamma*\|x\|_2
function x = norm_lse_reg(F,g,A,b,gamma)
    n = size(F,2);
    k = size(g,1);
    m = size(A,1);

    % A model with variables x(n) and t1, t2
    model = mosekmodel(name = "norm_lse_reg", numvar = n + 2, ...
                       objective = [zeros(n, 1); 1; gamma]);

    % Linear constraints Ax = b
    model.appendcons(F = [A, zeros(m, 2)], ...
                     domain = mosekdomain("equal", dim = m, rhs = b));

    % Affine conic constraint t1 >= \|Fx-g\|_2
    model.appendcons(F = [sparse(1,n), 1, 0; ...
                          F, sparse(k,2)], ...
                     g = [0; -g], ...
                     domain = mosekdomain("quadratic cone", dim = k + 1));

    % Affine conic constraint t2 >= \|x\|_2
    model.appendcons(F = [sparse(1, n+1), 1; ...
                          speye(n), sparse(n,2)], ...
                     domain = mosekdomain("quadratic cone", dim = n + 1));

    model.solve();
    xx = model.getsolution("any", "x");
    x = xx(1:n);
end
Note that classically least squares problems are formulated as quadratic problems, and then the objective function would be written as

\[
\|Fx-g\|_2^2 + \gamma\|x\|_2^2.
\]

This version can easily be obtained by replacing the quadratic cone with an appropriate rotated quadratic cone in (9.16). Then the core of the implementation would change as follows:
% Least squares regression with regularization
% The "classical" quadratic version
% minimize \|Fx-g\|_2^2 + gamma*\|x\|_2^2
function x = norm_lse_reg_quad(F,g,A,b,gamma)
    n = size(F,2);
    k = size(g,1);
    m = size(A,1);

    % A model with variables x(n) and t1, t2
    model = mosekmodel(name = "norm_lse_reg_quad", numvar = n + 2, ...
                       objective = [zeros(n, 1); 1; gamma]);

    % Linear constraints Ax = b
    model.appendcons(F = [A, zeros(m, 2)], ...
                     domain = mosekdomain("equal", dim = m, rhs = b));

    % Affine conic constraint t1 >= \|Fx-g\|_2^2,
    % modeled as (t1, 0.5, Fx-g) in a rotated quadratic cone
    model.appendcons(F = [sparse(1,n), 1, 0; ...
                          sparse(1, n+2); ...
                          F, sparse(k, 2)], ...
                     g = [0; 0.5; -g], ...
                     domain = mosekdomain("rotated quadratic cone", dim = k + 2));

    % Affine conic constraint t2 >= \|x\|_2^2,
    % modeled as (t2, 0.5, x) in a rotated quadratic cone
    model.appendcons(F = [sparse(1, n+1), 1; ...
                          sparse(1, n+2); ...
                          speye(n), sparse(n, 2)], ...
                     g = [0; 0.5; zeros(n, 1)], ...
                     domain = mosekdomain("rotated quadratic cone", dim = n + 2));

    model.solve();
    xx = model.getsolution("any", "x");
    x = xx(1:n);
end
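For the same \(\gamma\) the conic and the quadratic formulations penalize differently, so in general they return different minimizers. A comparison sketch on made-up data (not from the text):

% Comparison sketch with made-up data: same gamma, two formulations.
rng(0);
F = randn(50, 4);
g = randn(50, 1);
A = ones(1, 4);                          % single illustrative constraint
b = 1;
x1 = norm_lse_reg(F, g, A, b, 1.0);      % \|Fx-g\|_2   + gamma*\|x\|_2
x2 = norm_lse_reg_quad(F, g, A, b, 1.0); % \|Fx-g\|_2^2 + gamma*\|x\|_2^2
fprintf("conic:     |x| = %.4f\n", norm(x1));
fprintf("quadratic: |x| = %.4f\n", norm(x2));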
Fig. 9.2 shows the solution to a polynomial fitting problem for a few variants of least squares regression with and without ridge regularization.
9.2.3 Lasso regularization
In lasso (least absolute shrinkage and selection operator) regression the regularization term is the 1-norm of the solution:

\[
\begin{array}{ll}
\mbox{minimize}   & \|Fx-g\|_2 + \gamma\|x\|_1, \\
\mbox{subject to} & Ax = b.
\end{array}
\]

This variant typically tends to give preference to sparser solutions, i.e. solutions where only a few elements of \(x\) are nonzero, and therefore it is used as an efficient approximation to the cardinality-constrained problem with an upper bound on the 0-norm of \(x\). To see how it works we first implement (9.17), adding the constraint \(t\geq \|x\|_1\) as a series of linear constraints

\[
u_i \geq -x_i, \quad u_i \geq x_i, \quad t \geq \sum_i u_i,
\]

so that eventually the problem becomes

\[
\begin{array}{ll}
\mbox{minimize}   & t_1 + \gamma t_2, \\
\mbox{subject to} & u + x \geq 0, \\
                  & u - x \geq 0, \\
                  & t_2 - e^T u \geq 0, \\
                  & (t_1, Fx-g) \in \mathcal{Q}^{k+1}, \\
                  & Ax = b.
\end{array}
\]
% Least squares regression with lasso regularization
% minimize \|Fx-g\|_2 + gamma*\|x\|_1
function x = norm_lse_lasso(F,g,A,b,gamma)
    n = size(F,2);
    k = size(g,1);
    m = size(A,1);

    % A model with variables x(n), u(n) and t1, t2
    model = mosekmodel(name = "norm_lse_lasso", numvar = n + n + 2, ...
                       objective = [zeros(2*n, 1); 1; gamma]);

    % Linear constraints Ax = b
    model.appendcons(F = [A, zeros(m, n + 2)], ...
                     domain = mosekdomain("equal", dim = m, rhs = b));

    % u >= |x|, modeled as u + x >= 0 and u - x >= 0
    model.appendcons(F = [speye(n), speye(n), sparse(n, 2); ...
                          -speye(n), speye(n), sparse(n, 2)], ...
                     domain = mosekdomain("greater than", dim = 2*n, rhs = zeros(2*n, 1)));

    % t2 >= sum(u)
    model.appendcons(F = [sparse(1, n), -ones(1, n), 0, 1], ...
                     domain = mosekdomain("greater than", rhs = 0));

    % Affine conic constraint t1 >= \|Fx-g\|_2
    model.appendcons(F = [sparse(1, 2*n), 1, 0; ...
                          F, sparse(k, n + 2)], ...
                     g = [0; -g], ...
                     domain = mosekdomain("quadratic cone", dim = k + 1));

    model.solve();
    xx = model.getsolution("any", "x");
    x = xx(1:n);
end
The sparsity pattern of the solution of a large random regression problem can look for example as follows:
Lasso regularization
Gamma 0.0100 density 99% |Fx-g|_2: 54.3722
Gamma 0.1000 density 87% |Fx-g|_2: 54.3939
Gamma 0.3000 density 67% |Fx-g|_2: 54.5319
Gamma 0.6000 density 40% |Fx-g|_2: 54.8379
Gamma 0.9000 density 26% |Fx-g|_2: 55.0720
Gamma 1.3000 density 12% |Fx-g|_2: 55.1903
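A log of this shape can be generated with a sweep over \(\gamma\) along the following lines; this is a sketch with made-up data and an arbitrary zero threshold, so the numbers above will not be reproduced exactly.

% Sketch: sweep gamma and report the density of the lasso solution.
rng(0);
F = randn(300, 150);
g = randn(300, 1);
A = ones(1, 150);
b = 1;
fprintf("Lasso regularization\n");
for gamma = [0.01, 0.1, 0.3, 0.6, 0.9, 1.3]
    x = norm_lse_lasso(F, g, A, b, gamma);
    density = 100 * nnz(abs(x) > 1e-6) / numel(x);
    fprintf("Gamma %.4f density %3.0f%% |Fx-g|_2: %.4f\n", ...
            gamma, density, norm(F * x - g));
end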
9.2.4 \(p\)-norm minimization
Now we consider the minimization of the \(p\)-norm defined for \(p>1\) as

\[
\|y\|_p = \left(\sum_i |y_i|^p\right)^{1/p}.
\]

We have the optimization problem:

\[
\begin{array}{ll}
\mbox{minimize}   & \|Fx-g\|_p, \\
\mbox{subject to} & Ax = b.
\end{array}
\]

Increasing the value of \(p\) forces stronger penalization of outliers, since ultimately, when \(p\to\infty\), the \(p\)-norm \(\|y\|_p\) converges to the infinity norm \(\|y\|_\infty\) of \(y\). According to the Modeling Cookbook the \(p\)-norm bound \(t\geq \|Fx-g\|_p\) can be added to the model using a sequence of three-dimensional power cones, and we obtain an equivalent problem

\[
\begin{array}{ll}
\mbox{minimize}   & t, \\
\mbox{subject to} & (r_i, t, (Fx-g)_i) \in \mathcal{P}_3^{1/p,\,1-1/p}, \quad i=1,\ldots,k, \\
                  & \sum_i r_i = t, \\
                  & Ax = b.
\end{array}
\]
The power cones can be added one by one to the structure representing affine conic constraints. Each power cone will require one \(r_i\), one copy of \(t\), and one row from \(F\) and \(g\). An alternative solution is to create the vector

\[
(r_1,\ldots,r_k,\ t,\ldots,t,\ F_1x-g_1,\ldots,F_kx-g_k)
\]

and then reshuffle its elements into

\[
(r_1,\ t,\ F_1x-g_1,\ \ldots,\ r_k,\ t,\ F_kx-g_k)
\]

using an appropriate permutation matrix. This approach is demonstrated in the code below.
% P-norm minimization
% minimize \|Fx-g\|_p
function x = norm_p_norm(F,g,A,b,p)
    n = size(F,2);
    k = size(g,1);
    m = size(A,1);

    % A model with variables x(n), r(k) and t(1)
    model = mosekmodel(name = "norm_p_norm", numvar = n + k + 1, ...
                       objective = [zeros(n + k, 1); 1]);

    % Linear constraints Ax = b
    model.appendcons(F = [A, sparse(m, k + 1)], ...
                     domain = mosekdomain("equal", dim = m, rhs = b));

    % t == sum(r)
    model.appendcons(F = [sparse(1, n), ones(1, k), -1], ...
                     domain = mosekdomain("equal", rhs = 0));

    % Permutation matrix which picks triples (r_i, t, F_ix-g_i)
    M = [];
    for i=1:3
        M = [M, sparse(i:3:3*k, 1:k, ones(k,1), 3*k, k)];
    end

    % Stack (r, t*e, Fx-g) and reshuffle into consecutive triples
    Fcon = M * [sparse(k, n), speye(k), sparse(k,1); ...
                sparse(k, n+k), ones(k, 1); ...
                F, sparse(k, k+1)];
    gcon = M * [sparse(2*k, 1); -g];

    % One power cone (r_i, t, F_ix-g_i) per row of F
    model.appendcons(F = Fcon, g = gcon, ...
                     domain = mosekdomain("power cone", n = k, dim = 3, alpha = [1; p-1]));

    model.solve();
    xx = model.getsolution("any", "x");
    x = xx(1:n);
end
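As a usage sketch (made-up data), one can observe the effect of increasing \(p\) on the largest residual:

% Usage sketch with made-up data: larger p penalizes outliers harder,
% which typically shrinks the largest residual.
rng(0);
F = randn(80, 3);
g = randn(80, 1);
A = ones(1, 3);
b = 1;
for p = [1.5, 2.0, 4.0]
    x = norm_p_norm(F, g, A, b, p);
    fprintf("p = %.1f  max residual: %.4f\n", p, max(abs(F * x - g)));
end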