# 8.1 Linear Least Squares and Related Norm Minimization Problems

A frequently occurring problem in statistics and in many other areas of science is the norm minimization problem

\[
\begin{array}{ll}
\mbox{minimize} & \left\| Fx - b \right\|
\end{array}
\tag{1}
\]

where \(F\) and \(b\) are a matrix and a vector of appropriate dimensions, and \(x\) is the vector of decision variables. Typically, the norm used is the 1-norm, the 2-norm, or the infinity norm.

## 8.1.1 The Case of the 2-norm

Initially, let us focus on the 2-norm. In this case (1) is identical to the quadratic optimization problem

\[
\begin{array}{ll}
\mbox{minimize} & \frac{1}{2} x^T F^T F x - x^T F^T b + \frac{1}{2} b^T b
\end{array}
\tag{2}
\]

in the sense that the set of optimal solutions for the two problems coincides. This fact follows from the expansion

\[
\frac{1}{2} \left\| Fx - b \right\|^2 = \frac{1}{2} \left( x^T F^T F x - 2 x^T F^T b + b^T b \right)
\]

and from the observation that minimizing \(\left\| Fx - b \right\|\) is equivalent to minimizing \(\frac{1}{2}\left\| Fx - b \right\|^2\).
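As a quick numerical sanity check (in Python/NumPy, for illustration only; the data is arbitrary), the quadratic objective of (2) agrees with \(\frac{1}{2}\left\| Fx - b \right\|^2\) for any \(x\):

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((5, 3))
b = rng.standard_normal(5)
x = rng.standard_normal(3)

# Left-hand side: half the squared residual norm.
lhs = 0.5 * np.linalg.norm(F @ x - b) ** 2
# Right-hand side: the quadratic objective of (2).
rhs = 0.5 * x @ (F.T @ F) @ x - x @ (F.T @ b) + 0.5 * b @ b
assert abs(lhs - rhs) < 1e-10
```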

Subsequently, it is demonstrated how the quadratic optimization problem (2) is solved using `mosekopt`. In the example the problem data is specified directly in the script, the data for problem (2) is then constructed, and finally the problem is solved.

```
function nrm1()
% Clear prob.
clear prob;
F = [ [ 0.4302 , 0.3516 ]; [0.6246, 0.3384] ];
b = [ 0.6593, 0.9666]';
% Compute the fixed term in the objective.
prob.cfix = 0.5*b'*b;
% Create the linear objective terms.
prob.c = -F'*b;
% Create the quadratic terms. Please note that only the lower triangular
% part of F'*F is used.
[prob.qosubi,prob.qosubj,prob.qoval] = find(sparse(tril(F'*F)));
% Obtain the matrix dimensions.
[m,n] = size(F);
% Specify an empty constraint matrix a.
prob.a = sparse(0,n);
[r,res] = mosekopt('minimize',prob);
% The optimality conditions are F'*(F*x - b) = 0.
% Check if they are satisfied:
fprintf('\nnorm(F''*(F*x-b)): %e\n',norm(F'*(F*res.sol.itr.xx-b)));
```
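Running the listing requires MOSEK, but the optimality condition it checks is easy to verify independently. A Python/NumPy sketch with the same data (illustration only, not part of the toolbox): solving the normal equations \(F^T F x = F^T b\) yields the least squares solution, which is also the minimizer of (2).

```python
import numpy as np

F = np.array([[0.4302, 0.3516], [0.6246, 0.3384]])
b = np.array([0.6593, 0.9666])

# Solve the normal equations F'F x = F'b.
x = np.linalg.solve(F.T @ F, F.T @ b)
# The optimality conditions F'(Fx - b) = 0 should hold.
assert np.linalg.norm(F.T @ (F @ x - b)) < 1e-9
```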

Often the \(x\) variables must lie within some bounds or satisfy additional linear constraints. These requirements can easily be incorporated into the problem (2). For example, the constraint \(\left\| x \right\|_{\infty}\leq 1\) can be modeled as shown in Listing 8.2.

```
function nrm2()
% Clear prob.
clear prob;
F = [ [ 0.4302 , 0.3516 ]; [0.6246, 0.3384] ];
b = [ 0.6593, 0.9666]';
% Compute the fixed term in the objective.
prob.cfix = 0.5*b'*b;
% Create the linear objective terms.
prob.c = -F'*b;
% Create the quadratic terms. Please note that only the lower triangular
% part of F'*F is used.
[prob.qosubi,prob.qosubj,prob.qoval] = find(sparse(tril(F'*F)));
% Obtain the matrix dimensions.
[m,n] = size(F);
% Bound constraints -1 <= x <= 1, i.e. norm(x,inf) <= 1.
prob.blx = -ones(n,1);
prob.bux = ones(n,1);
% Specify an empty constraint matrix a.
prob.a = sparse(0,n);
[r,res] = mosekopt('minimize',prob);
% Check that the solution satisfies the bounds.
norm(res.sol.itr.xx,inf)
```
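The same bound-constrained least squares problem can be cross-checked with SciPy's `lsq_linear` (an independent solver, used here only for illustration). For this particular data the unconstrained minimizer violates \(\left\| x \right\|_{\infty}\leq 1\), so the bounds are genuinely active:

```python
import numpy as np
from scipy.optimize import lsq_linear

F = np.array([[0.4302, 0.3516], [0.6246, 0.3384]])
b = np.array([0.6593, 0.9666])

# Bound-constrained least squares: -1 <= x <= 1.
res = lsq_linear(F, b, bounds=(-1.0, 1.0))
assert np.linalg.norm(res.x, np.inf) <= 1.0 + 1e-9

# The unconstrained minimizer violates the bound, so the constraint matters.
x_free = np.linalg.solve(F, b)
assert np.linalg.norm(x_free, np.inf) > 1.0
```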

## 8.1.2 The Case of the Infinity Norm

In some applications of the norm minimization problem (1) it is better to use the infinity norm than the 2-norm. The problem (1) stated as an infinity norm problem is equivalent to the linear optimization problem

\[
\begin{array}{lll}
\mbox{minimize}   & t & \\
\mbox{subject to} & Fx + te \geq b, & \\
                  & Fx - te \leq b, &
\end{array}
\tag{3}
\]

where \(e\) is the vector of ones of appropriate dimension. The constraints of (3) imply that

\[
-te \leq Fx - b \leq te,
\quad \mbox{i.e.} \quad
\left\| Fx - b \right\|_{\infty} \leq t,
\]

and hence at an optimum \((x^*, t^*)\)

\[
t^* = \left\| Fx^* - b \right\|_{\infty}
\]

holds. Problem (3) is straightforward to solve, for instance using the script in Listing 8.3.

```
function nrm3()
% Clear prob.
clear prob;
F = [ [ 0.4302 , 0.3516 ]; [0.6246, 0.3384] ];
b = [ 0.6593, 0.9666]';
% Obtain the matrix dimensions.
[m,n] = size(F);
% The variables are [x; t]; only t appears in the objective.
prob.c = sparse(n+1,1,1.0,n+1,1);
prob.a = [[F,ones(m,1)];[F,-ones(m,1)]];
prob.blc = [b ; -inf*ones(m,1)];
prob.buc = [inf*ones(m,1); b ];
[r,res] = mosekopt('minimize',prob);
% The optimal objective value is given by:
norm(F*res.sol.itr.xx(1:n)-b,inf)
```
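The LP (3) can also be reproduced with SciPy's `linprog` (illustration only, not part of the MOSEK toolbox); the variables are stacked as \([x; t]\) and the two-sided constraints are written as inequalities:

```python
import numpy as np
from scipy.optimize import linprog

F = np.array([[0.4302, 0.3516], [0.6246, 0.3384]])
b = np.array([0.6593, 0.9666])
m, n = F.shape
e = np.ones((m, 1))

# minimize t  subject to  Fx - te <= b  and  -Fx - te <= -b.
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[F, -e], [-F, -e]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))

# At the optimum, t equals the infinity norm of the residual.
x, t = res.x[:n], res.x[n]
assert abs(np.linalg.norm(F @ x - b, np.inf) - t) < 1e-6
```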

## 8.1.3 The Case of the 1-norm

By definition, for the 1-norm we have that

\[
\left\| Fx - b \right\|_1 = \sum_{i=1}^{m} \left| f_{i:} x - b_i \right|,
\]

where \(f_{i:}\) denotes the \(i\)-th row of \(F\). Therefore, the norm minimization problem can be formulated as

\[
\mbox{minimize} \quad \sum_{i=1}^{m} \left| f_{i:} x - b_i \right|,
\]

which in turn is equivalent to

\[
\begin{array}{ll}
\mbox{minimize}   & \displaystyle \sum_{i=1}^{m} t_i \\
\mbox{subject to} & \left| f_{i:} x - b_i \right| \leq t_i, \quad i=1,\ldots,m.
\end{array}
\]

The reader should verify that this is really the case. In matrix notation this problem can be expressed as

\[
\begin{array}{lll}
\mbox{minimize}   & e^T t & \\
\mbox{subject to} & Fx + t \geq b, & \\
                  & Fx - t \leq b, &
\end{array}
\tag{4}
\]

where \(e = (1,\ldots ,1)^T\). Next, this problem is solved in Listing 8.4.

```
function nrm4()
% Clear prob.
clear prob;
F = [ [ 0.4302 , 0.3516 ]; [0.6246, 0.3384] ];
b = [ 0.6593, 0.9666]';
% Obtain the matrix dimensions.
[m,n] = size(F);
% The variables are [x; t]; only t appears in the objective.
prob.c = [sparse(n,1) ; ones(m,1)];
prob.a = [[F,-speye(m)] ; [F,speye(m)]];
prob.blc = [-inf*ones(m,1); b];
prob.buc = [b ; inf*ones(m,1)];
[r,res] = mosekopt('minimize',prob);
% The optimal objective value is given by:
norm(F*res.sol.itr.xx(1:n)-b,1)
```
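As with the infinity norm, the LP (4) can be cross-checked with SciPy's `linprog` (illustration only, not part of the MOSEK toolbox), with variables stacked as \([x; t]\):

```python
import numpy as np
from scipy.optimize import linprog

F = np.array([[0.4302, 0.3516], [0.6246, 0.3384]])
b = np.array([0.6593, 0.9666])
m, n = F.shape
I = np.eye(m)

# minimize e't  subject to  Fx - t <= b  and  -Fx - t <= -b.
c = np.r_[np.zeros(n), np.ones(m)]
A_ub = np.block([[F, -I], [-F, -I]])
b_ub = np.r_[b, -b]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + m))

# The optimal objective value equals the 1-norm of the residual.
x = res.x[:n]
assert abs(np.linalg.norm(F @ x - b, 1) - res.fun) < 1e-6
```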

**A better formulation**

It is possible to improve upon the formulation (4). Indeed, problem (4) is equivalent to

\[
\begin{array}{lll}
\mbox{minimize}   & e^T t & \\
\mbox{subject to} & Fx + v - w = b, & \\
                  & t = v + w, & \\
                  & v, w \geq 0.
\end{array}
\tag{5}
\]

After eliminating the \(t\) variables, this problem is equivalent to

\[
\begin{array}{lll}
\mbox{minimize}   & e^T (v + w) & \\
\mbox{subject to} & Fx + v - w = b, & \\
                  & v, w \geq 0.
\end{array}
\tag{6}
\]

Please note that this problem has only half the number of general constraints of problem (4), since we have replaced constraints of the general form

\[
Fx + t \geq b
\quad \mbox{and} \quad
Fx - t \leq b
\]

with the simpler bound constraints

\[
v \geq 0
\quad \mbox{and} \quad
w \geq 0,
\]

which **MOSEK** treats in a special and highly efficient way. Furthermore, **MOSEK** stores only the non-zeros in the coefficient matrix of the constraints, and \(F\) appears once in (6) but twice in (4). This implies that the problem (6) is likely to require much less space than the problem (4).

It is left as an exercise for the reader to implement this formulation in MATLAB.
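For reference, the split-variable formulation can be sketched in Python/SciPy (so the MATLAB exercise remains open); the variables are stacked as \([x; v; w]\):

```python
import numpy as np
from scipy.optimize import linprog

F = np.array([[0.4302, 0.3516], [0.6246, 0.3384]])
b = np.array([0.6593, 0.9666])
m, n = F.shape

# minimize e'(v + w)  subject to  Fx + v - w = b,  v, w >= 0.
c = np.r_[np.zeros(n), np.ones(2 * m)]
A_eq = np.hstack([F, np.eye(m), -np.eye(m)])
bounds = [(None, None)] * n + [(0, None)] * (2 * m)
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=bounds)

# The optimal value equals the minimal 1-norm of the residual.
x = res.x[:n]
assert abs(res.fun - np.linalg.norm(F @ x - b, 1)) < 1e-6
```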

### 8.1.3.1 More About Solving Linear Least Squares Problems

Linear least squares problems with and without linear side constraints appear very frequently in practice, and it is therefore important to know how such problems are solved efficiently using **MOSEK**. Now, assume that the problem of interest is the linear least squares problem

\[
\begin{array}{lll}
\mbox{minimize}   & \frac{1}{2} \left\| Fx - f \right\|^2 & \\
\mbox{subject to} & Ax = b, & \\
                  & l^x \leq x \leq u^x, &
\end{array}
\tag{7}
\]

where \(F\) and \(A\) are matrices and the remaining quantities are vectors; \(x\) is the vector of decision variables. The problem (7) as stated is a convex quadratic optimization problem and can be solved as such.

However, if \(F\) has many fewer rows than columns, then it will usually be more efficient to solve the equivalent problem

\[
\begin{array}{lll}
\mbox{minimize}   & \frac{1}{2} z^T z & \\
\mbox{subject to} & Fx - z = f, & \\
                  & Ax = b, & \\
                  & l^x \leq x \leq u^x. &
\end{array}
\tag{8}
\]

Please note that a number of new constraints and variables have been introduced, which of course seems disadvantageous, but on the other hand the Hessian of the objective in problem (8) (the identity acting on the \(z\) variables) is much sparser than the Hessian \(F^T F\) in problem (7). Frequently this turns out to be more important for computational efficiency, and therefore the latter formulation is usually the better one.
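A small NumPy illustration of this sparsity argument (the dimensions and sparsity pattern are chosen arbitrarily): for a sparse \(F\), the product \(F^T F\) tends to fill in, while the Hessian of (8) is just an identity block.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 200

# A sparse F: each row touches only a few columns.
F = np.zeros((m, n))
for i in range(m):
    cols = rng.choice(n, size=3, replace=False)
    F[i, cols] = rng.standard_normal(3)

H7 = F.T @ F      # Hessian of (7): n x n, with fill-in from F'F
H8 = np.eye(m)    # Hessian of (8): identity on the z variables

nnz7 = np.count_nonzero(H7)
nnz8 = np.count_nonzero(H8)
assert nnz8 == m
assert nnz7 > nnz8  # (7) has more Hessian non-zeros for this data
```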

If \(F\) has many more rows than columns, then formulation (8) is not attractive, whereas the corresponding dual problem is. Using the duality theory outlined in Section 16.5.1 we obtain the dual problem

\[
\begin{array}{lll}
\mbox{maximize}   & f^T y + b^T \hat y + (l^x)^T s_l^x - (u^x)^T s_u^x - \frac{1}{2} \bar y^T \bar y & \\
\mbox{subject to} & F^T y + A^T \hat y + s_l^x - s_u^x = 0, & \\
                  & y + \bar y = 0, & \\
                  & s_l^x, s_u^x \geq 0,
\end{array}
\]

which can be simplified to

\[
\begin{array}{lll}
\mbox{maximize}   & f^T y + b^T \hat y + (l^x)^T s_l^x - (u^x)^T s_u^x - \frac{1}{2} y^T y & \\
\mbox{subject to} & F^T y + A^T \hat y + s_l^x - s_u^x = 0, & \\
                  & s_l^x, s_u^x \geq 0,
\end{array}
\tag{9}
\]

after eliminating the \(\bar{y}\) variables. Here we use the convention that a dual variable corresponding to an infinite bound is fixed at zero, i.e.

\[
l_j^x = -\infty \;\Rightarrow\; (s_l^x)_j = 0
\quad \mbox{and} \quad
u_j^x = \infty \;\Rightarrow\; (s_u^x)_j = 0.
\]

In practice such fixed variables in \(s_l^x\) and \(s_u^x\) should be removed from the problem.

Given our assumptions, the dual problem (9) will have far fewer constraints than the primal problem (8); in general, the fewer constraints a problem contains, the more efficient **MOSEK** tends to be. A natural question is: if the dual problem (9) is solved instead of the primal problem (8), how is the optimal \(x\) solution obtained? It turns out that the dual variables corresponding to the constraint

\[
F^T y + A^T \hat y + s_l^x - s_u^x = 0
\]

are the optimal \(x\) solution. Since **MOSEK** always reports these dual variables in the

```
res.sol.itr.y
```

vector, the optimal \(x\) solution can easily be obtained.

In the following code fragment, it is investigated whether it is attractive to solve the dual rather than the primal problem for a concrete numerical example. The example has no linear equalities, and \(F\) is a \(2000 \times 400\) matrix.

```
function nrm5()
% Build a 2000 by 400 coefficient matrix F and a matching right-hand side f.
F = repmat( [ [ 0.4302, 0.3516 ]; [0.6246, 0.3384] ], 1000, 200);
f = repmat( [ 0.6593, 0.9666]', 1000, 1);
% Obtain the matrix dimensions.
[m,n] = size(F);
prob = [];
% Primal formulation (8): the variables are [x; z] and the
% Hessian is the identity on the z variables.
prob.qosubi = n+(1:m);
prob.qosubj = n+(1:m);
prob.qoval = ones(m,1);
prob.a = [ F,-speye(m,m)];
prob.blc = f;
prob.buc = f;
blx = -ones(n,1);
bux = ones(n,1);
prob.blx = [blx;-inf*ones(m,1)];
prob.bux = [bux; inf*ones(m,1)];
fprintf('m=%d n=%d\n',m,n);
fprintf('First try\n');
tic
[rcode,res] = mosekopt('minimize',prob);
% Display the solution time.
fprintf('Time : %-.2f\n',toc);
try
  % x solution:
  x = res.sol.itr.xx;
  % objective value:
  fprintf('Objective value: %-6e\n', 0.5*norm(F*x(1:n)-f)^2);
  % Check feasibility.
  fprintf('Feasibility : %-6e\n',min(x(1:n)-blx(1:n)));
catch
  fprintf('MSKERROR: Could not get solution')
end
% Clear prob.
prob = [];
%
% Next, we solve the dual problem (9); the variables are [y; s_l; s_u].
% Index of lower bounds that are finite:
lfin = find(blx>-inf);
% Index of upper bounds that are finite:
ufin = find(bux<inf);
prob.qosubi = 1:m;
prob.qosubj = 1:m;
prob.qoval = -ones(m,1);
prob.c = [f;blx(lfin);-bux(ufin)];
prob.a = [F',...
          sparse(lfin,(1:length(lfin))',...
                 ones(length(lfin),1),...
                 n,length(lfin)),...
          sparse(ufin,(1:length(ufin))',...
                 -ones(length(ufin),1),...
                 n,length(ufin))];
prob.blc = sparse(n,1);
prob.buc = sparse(n,1);
prob.blx = [-inf*ones(m,1);...
            sparse(length(lfin)+length(ufin),1)];
prob.bux = [];
fprintf('\n\nSecond try\n');
tic
[rcode,res] = mosekopt('maximize',prob);
% Display the solution time.
fprintf('Time : %-.2f\n',toc);
try
  % The optimal x is reported as the dual solution of the dual problem:
  x = res.sol.itr.y;
  % objective value:
  fprintf('Objective value: %-6e\n',...
          0.5*norm(F*x(1:n)-f)^2);
  % Check feasibility.
  fprintf('Feasibility : %-6e\n',...
          min(x(1:n)-blx(1:n)));
catch
  fprintf('MSKERROR: Could not get solution')
end
```

Here is the output produced:

```
m=2000 n=400
First try
Time : 2.07
Objective value: 2.257945e+001
Feasibility : 1.466434e-009
Second try
Time : 0.47
Objective value: 2.257945e+001
Feasibility : 2.379134e-009
```

Both formulations produced a feasible solution (the reported feasibility measures are small and positive) with the same objective value. Moreover, using the dual formulation reduces the solution time by more than a factor of four; in this case we can conclude that the dual formulation is far superior to the primal formulation of the problem.

## 8.1.4 Using Conic Optimization on Linear Least Squares Problems

Linear least squares problems can also be solved using conic optimization, because the linear least squares problem

\[
\mbox{minimize} \quad \left\| Fx - b \right\|
\]

is equivalent to

\[
\begin{array}{lll}
\mbox{minimize}   & t & \\
\mbox{subject to} & Fx - z = b, & \\
                  & \left\| z \right\| \leq t.
\end{array}
\]

This problem is a conic quadratic optimization problem with one quadratic cone \(\left\| z \right\| \leq t\), and the corresponding dual problem is

\[
\begin{array}{lll}
\mbox{maximize}   & b^T y & \\
\mbox{subject to} & F^T y = 0, & \\
                  & \left\| y \right\| \leq s, & \\
                  & s = 1,
\end{array}
\]

which can be reduced to

\[
\begin{array}{lll}
\mbox{maximize}   & b^T y & \\
\mbox{subject to} & F^T y = 0, & \\
                  & \left\| y \right\| \leq 1
\end{array}
\]

after eliminating the fixed variable \(s\). Often the dual problem has far fewer constraints than the primal problem. In such cases it is more efficient to solve the dual problem and obtain the primal solution \(x\) as the dual solution of the dual problem.
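This primal-dual pair can be illustrated numerically (Python/NumPy, arbitrary data): the normalized least squares residual \(y = r / \left\| r \right\|\), with \(r = b - Fx^*\), is dual feasible, and its dual objective \(b^T y\) equals the primal optimal value \(\left\| r \right\|\).

```python
import numpy as np

rng = np.random.default_rng(2)
F = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Primal optimum: x* = argmin ||Fx - b||, with residual r = b - F x*.
x, *_ = np.linalg.lstsq(F, b, rcond=None)
r = b - F @ x

# Dual feasible point: y = r / ||r|| satisfies F'y = 0 and ||y|| = 1,
# since the residual is orthogonal to the range of F.
y = r / np.linalg.norm(r)
assert np.linalg.norm(F.T @ y) < 1e-9
assert abs(np.linalg.norm(y) - 1.0) < 1e-12

# Strong duality: the dual objective b'y equals the primal optimum ||r||.
assert abs(b @ y - np.linalg.norm(r)) < 1e-9
```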