9.2 Least Squares and Other Norm Minimization Problems¶
A frequently occurring problem in statistics and in many other areas of science is a norm minimization problem

$$\begin{array}{ll}\text{minimize} & \|Fx-g\|,\\ \text{subject to} & Ax=b,\end{array}$$

where $x\in\mathbb{R}^n$ is the variable, $F\in\mathbb{R}^{k\times n}$, $g\in\mathbb{R}^k$ and $A\in\mathbb{R}^{m\times n}$ are data, and $\|\cdot\|$ is one of the norms considered in the subsections below.
9.2.1 Least squares, 2-norm¶
In the case of the 2-norm we specify the problem directly in conic quadratic form:

$$\begin{array}{ll}\text{minimize} & t,\\ \text{subject to} & (t,\, Fx-g)\in\mathcal{Q}^{k+1},\\ & Ax=b.\end{array}$$
The first constraint of the problem can be represented as an affine conic constraint. This leads to the following model.
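Written out, with the variables $(x, t)$ stacked as in the code below, the affine conic constraint reads

$$\begin{bmatrix}0_{1\times n} & 1\\ F & 0_{k\times 1}\end{bmatrix}\begin{bmatrix}x\\ t\end{bmatrix}+\begin{bmatrix}0\\ -g\end{bmatrix}=\begin{bmatrix}t\\ Fx-g\end{bmatrix}\in\mathcal{Q}^{k+1},$$

that is, $t\ge\|Fx-g\|_2$.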
% Least squares regression
% minimize \|Fx-g\|_2
function x = norm_lse(F,g,A,b)
n = size(F,2);
k = size(g,1);
m = size(A,1);
% A model with variables x(n) and t(1)
model = mosekmodel(name = "norm_lse", numvar = n + 1, objective = [zeros(n, 1); 1]);
% Linear constraints
model.appendcons(F = [A, zeros(m, 1)], domain = mosekdomain("equal", dim = m, rhs = b));
% Affine conic constraint
model.appendcons(F = [sparse(1,n), 1; F, sparse(k,1)], g = [0; -g], ...
domain = mosekdomain("quadratic cone", dim = k + 1));
model.solve();
xx = model.getsolution("any", "x");
x = xx(1:n);
end
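As an independent sanity check (not part of the model above), the unconstrained case $m=0$ can be solved with a dense least-squares routine; a minimal NumPy sketch:

```python
import numpy as np

# Small synthetic instance: overdetermined system F x ~ g
rng = np.random.default_rng(0)
F = rng.standard_normal((8, 3))
g = rng.standard_normal(8)

# Unconstrained 2-norm least squares: minimize ||F x - g||_2
x, *_ = np.linalg.lstsq(F, g, rcond=None)

# At the optimum the residual gradient F^T (F x - g) vanishes
grad = F.T @ (F @ x - g)
print(np.allclose(grad, 0, atol=1e-8))  # True
```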
9.2.2 Ridge regularisation¶
Regularisation is classically applied to reduce the impact of outliers and to control overfitting. In the conic version of ridge (Tychonov) regression we consider the problem

$$\begin{array}{ll}\text{minimize} & \|Fx-g\|_2+\gamma\|x\|_2,\\ \text{subject to} & Ax=b,\end{array}$$

which can be written explicitly as

$$\begin{array}{ll}\text{minimize} & t_1+\gamma t_2,\\ \text{subject to} & (t_1,\, Fx-g)\in\mathcal{Q}^{k+1},\\ & (t_2,\, x)\in\mathcal{Q}^{n+1},\\ & Ax=b.\end{array}\tag{9.16}$$
The implementation is a small extension of that from the previous section.
% Least squares regression with regularization
% minimize \|Fx-g\|_2 + gamma*\|x\|_2
function x = norm_lse_reg(F,g,A,b,gamma)
n = size(F,2);
k = size(g,1);
m = size(A,1);
% A model with variables x(n) and t1, t2
model = mosekmodel(name = "norm_lse_reg", numvar = n + 2, objective = [zeros(n, 1); 1; gamma]);
% Linear constraints
model.appendcons(F = [A, zeros(m, 2)], domain = mosekdomain("equal", dim = m, rhs = b));
% Affine conic constraint for \|Fx-g\|_2
model.appendcons(F = [sparse(1,n), 1, 0; ...
F, sparse(k,2)], ...
g = [0; -g], ...
domain = mosekdomain("quadratic cone", dim = k + 1));
% Affine conic constraint for \|x\|_2
model.appendcons(F = [sparse(1, n+1), 1; ...
speye(n), sparse(n,2) ], ...
domain = mosekdomain("quadratic cone", dim = n + 1));
model.solve();
xx = model.getsolution("any", "x");
x = xx(1:n);
end
Note that classically least squares problems are formulated as quadratic problems, with the objective function written as

$$\|Fx-g\|_2^2+\gamma\|x\|_2^2.$$

This version can easily be obtained by replacing the quadratic cone with an appropriate rotated quadratic cone in (9.16). Then the core of the implementation would change as follows:
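The substitution works because of the definition of the rotated quadratic cone: fixing the second entry to the constant $1/2$ gives

$$\left(t_1,\ \tfrac12,\ Fx-g\right)\in\mathcal{Q}_r^{k+2}\iff 2\,t_1\cdot\tfrac12\ \ge\ \|Fx-g\|_2^2\iff t_1\ \ge\ \|Fx-g\|_2^2,$$

and analogously $t_2\ge\|x\|_2^2$; the constant entries are supplied through the $g$ term of the affine conic constraints.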
% Least squares regression with regularization
% The "classical" quadratic version
% minimize \|Fx-g\|_2^2 + gamma*\|x\|_2^2
function x = norm_lse_reg_quad(F,g,A,b,gamma)
n = size(F,2);
k = size(g,1);
m = size(A,1);
% A model with variables x(n) and t1, t2
model = mosekmodel(name = "norm_lse_reg_quad", numvar = n + 2, objective = [zeros(n, 1); 1; gamma]);
% Linear constraints
model.appendcons(F = [A, zeros(m, 2)], domain = mosekdomain("equal", dim = m, rhs = b));
% Affine conic constraint for \|Fx-g\|_2^2
model.appendcons(F = [sparse(1,n) 1 0; ...
sparse(1, n+2); ...
F sparse(k, 2)], ...
g = [0; 0.5; -g], ...
domain = mosekdomain("rotated quadratic cone", dim = k + 2));
% Affine conic constraint for \|x\|_2^2
model.appendcons(F = [sparse(1, n+1) 1; ...
sparse(1, n+2); ...
speye(n) sparse(n, 2) ], ...
g = [0; 0.5; zeros(n, 1)], ...
domain = mosekdomain("rotated quadratic cone", dim = n + 2));
model.solve();
xx = model.getsolution("any", "x");
x = xx(1:n);
end
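For the unconstrained case the quadratic version has a well-known closed form, $x=(F^\top F+\gamma I)^{-1}F^\top g$, which is handy as a reference when testing the conic model; a NumPy sketch (illustrative, not part of the MOSEK API):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.standard_normal((10, 4))
g = rng.standard_normal(10)
gamma = 0.5

# Closed-form minimizer of ||F x - g||_2^2 + gamma * ||x||_2^2
n = F.shape[1]
x = np.linalg.solve(F.T @ F + gamma * np.eye(n), F.T @ g)

# Optimality condition: F^T (F x - g) + gamma * x = 0
grad = F.T @ (F @ x - g) + gamma * x
print(np.allclose(grad, 0, atol=1e-8))  # True
```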
Fig. 9.2 shows the solution to a polynomial fitting problem for a few variants of least squares regression with and without ridge regularization.

Fig. 9.2 Three fits to a dataset at various levels of regularization.¶
9.2.3 Lasso regularization¶
In lasso, or least absolute shrinkage and selection operator, the regularization term is the 1-norm of the solution:

$$\begin{array}{ll}\text{minimize} & \|Fx-g\|_2+\gamma\|x\|_1,\\ \text{subject to} & Ax=b.\end{array}$$

This variant typically tends to give preference to sparser solutions, i.e. solutions where only a few elements of $x$ are nonzero. The 1-norm is modeled with an auxiliary variable $u$ bounding $|x|$ elementwise, so that eventually the problem becomes

$$\begin{array}{ll}\text{minimize} & t_1+\gamma t_2,\\ \text{subject to} & u+x\ge 0,\quad u-x\ge 0,\\ & t_2-\textstyle\sum_i u_i\ge 0,\\ & (t_1,\, Fx-g)\in\mathcal{Q}^{k+1},\\ & Ax=b.\end{array}$$
% Least squares regression with lasso regularization
% minimize \|Fx-g\|_2 + gamma*\|x\|_1
function x = norm_lse_lasso(F,g,A,b,gamma)
n = size(F,2);
k = size(g,1);
m = size(A,1);
% A model with variables x(n), u(n) and t1, t2
model = mosekmodel(name = "norm_lse_lasso", numvar = n + n + 2, objective = [zeros(2*n, 1); 1; gamma]);
% Linear constraints
model.appendcons(F = [A, zeros(m, n + 2)], domain = mosekdomain("equal", dim = m, rhs = b));
% u >= |x|
model.appendcons(F = [speye(n) speye(n) sparse(n, 2); ...
-speye(n) speye(n) sparse(n, 2)], ...
domain = mosekdomain("greater than", dim = 2*n, rhs = zeros(2*n, 1)));
% t2 >= sum(u)
model.appendcons(F = [sparse(1, n), -ones(1, n), 0, 1], ...
domain = mosekdomain("greater than", rhs = 0));
% Affine conic constraint for \|Fx-g\|_2
model.appendcons(F = [sparse(1, 2*n) 1 0; ...
F sparse(k, n + 2)], ...
g = [0; -g], ...
domain = mosekdomain("quadratic cone", dim = k + 1));
model.solve();
xx = model.getsolution("any", "x");
x = xx(1:n);
end
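The linear constraints $u+x\ge 0$ and $u-x\ge 0$ used above are tight at the optimum: for fixed $x$ the smallest feasible $u$ is $|x|$ componentwise, so minimizing drives $t_2=\sum_i u_i$ down to $\|x\|_1$. A brief numerical illustration:

```python
import numpy as np

x = np.array([1.5, -2.0, 0.0, 3.0])

# Componentwise smallest u satisfying u >= x and u >= -x
u = np.maximum(x, -x)

print(u.sum() == np.abs(x).sum())  # True
```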
The sparsity pattern of the solution of a large random regression problem can look, for example, as follows:
Lasso regularization
Gamma 0.0100 density 99% |Fx-g|_2: 54.3722
Gamma 0.1000 density 87% |Fx-g|_2: 54.3939
Gamma 0.3000 density 67% |Fx-g|_2: 54.5319
Gamma 0.6000 density 40% |Fx-g|_2: 54.8379
Gamma 0.9000 density 26% |Fx-g|_2: 55.0720
Gamma 1.3000 density 12% |Fx-g|_2: 55.1903
9.2.4 p-norm minimization¶
Now we consider the minimization of the $p$-norm, defined for $p>1$ as $\|y\|_p=\left(\sum_i |y_i|^p\right)^{1/p}$. We have the optimization problem:

$$\begin{array}{ll}\text{minimize} & \|Fx-g\|_p,\\ \text{subject to} & Ax=b.\end{array}$$

Increasing the value of $p$ forces stronger penalization of large residuals. The bound $t\ge\|Fx-g\|_p$ can be expressed with $k$ three-dimensional power cones:

$$\sum_{i=1}^k r_i = t,\qquad \left(r_i,\ t,\ (Fx-g)_i\right)\in\mathcal{P}_3^{1/p,\,1-1/p},\quad i=1,\dots,k.$$

The power cones can be added one by one to the structure representing affine conic constraints. Each power cone will require one row selecting $r_i$, one row selecting $t$ and one row of $Fx-g$; it is convenient to first build the stacked vector

$$\begin{bmatrix} r\\ t\cdot\mathbf{1}\\ Fx-g \end{bmatrix}$$

and then reshuffle its elements into the triples $(r_i,\, t,\, (Fx-g)_i)$ using an appropriate permutation matrix. This approach is demonstrated in the code below.
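The reshuffling step can be prototyped with a dense permutation matrix in NumPy (variable names and the toy data are illustrative; the MATLAB code below builds the same matrix $M$ in sparse form):

```python
import numpy as np

k = 4
# M sends the stacked vector [r; t*ones(k); y] (length 3k) to the
# interleaved triples (r_1, t, y_1, r_2, t, y_2, ...)
M = np.zeros((3 * k, 3 * k))
for i in range(3):          # position within each triple
    for j in range(k):      # index of the triple
        M[3 * j + i, i * k + j] = 1.0

r = np.arange(1.0, k + 1)             # r_1..r_k
t = 100.0
y = -np.arange(1.0, k + 1)            # stands in for Fx - g
triples = (M @ np.concatenate([r, np.full(k, t), y])).reshape(k, 3)
print(triples[1])  # [  2. 100.  -2.]
```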
% P-norm minimization
% minimize \|Fx-g\|_p
function x = norm_p_norm(F,g,A,b,p)
n = size(F,2);
k = size(g,1);
m = size(A,1);
% A model with variables x(n), r(k) and t(1)
model = mosekmodel(name = "norm_p_norm", numvar = n + k + 1, objective = [zeros(n + k, 1); 1]);
% Linear constraints
model.appendcons(F = [A, sparse(m, k + 1)], domain = mosekdomain("equal", dim = m, rhs = b));
% t == sum(r)
model.appendcons(F = [sparse(1, n), ones(1, k), -1], ...
domain = mosekdomain("equal", rhs = 0));
% Permutation matrix which picks triples (r_i, t, F_ix-g_i)
M = [];
for i=1:3
M = [M, sparse(i:3:3*k, 1:k, ones(k,1), 3*k, k)];
end
Fcon = M * [sparse(k, n), speye(k), sparse(k,1); ...
sparse(k, n+k), ones(k, 1); ...
F, sparse(k, k+1)];
gcon = M * [sparse(2*k, 1); -g];
model.appendcons(F = Fcon, g = gcon, ...
domain = mosekdomain("power cone", n = k, dim = 3, alpha = [1; p-1]));
model.solve();
xx = model.getsolution("any", "x");
x = xx(1:n);
end
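As a numerical check of the power-cone reformulation (independent of the solver): with $t=\|y\|_p$ and $r_i=|y_i|^p/t^{p-1}$, both $\sum_i r_i=t$ and each cone inequality $r_i^{1/p}\,t^{1-1/p}\ge|y_i|$ hold with equality. In NumPy:

```python
import numpy as np

p = 3.0
y = np.array([1.0, -2.0, 0.5, 4.0])        # stands in for Fx - g

t = np.sum(np.abs(y) ** p) ** (1.0 / p)    # t = ||y||_p
r = np.abs(y) ** p / t ** (p - 1.0)        # certificate values for r_i

print(np.isclose(r.sum(), t))                                    # True
print(np.all(r ** (1/p) * t ** (1 - 1/p) >= np.abs(y) - 1e-9))   # True
```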

Fig. 9.3 $p$-norm fits to a dataset for various values of $p$.¶