# 10.5 Nearest Correlation Matrix Problem¶

A correlation matrix is a symmetric positive definite matrix with unit diagonal. This term has origins in statistics, since the matrix whose entries are the correlation coefficients of a sequence of random variables has all these properties.

In this section we study variants of the problem of approximating a given symmetric matrix $$A$$ with correlation matrices:

• find the correlation matrix $$X$$ nearest to $$A$$ in the Frobenius norm,
• find an approximation of the form $$D+X$$ where $$D$$ is a diagonal matrix with positive diagonal and $$X$$ is a positive semidefinite matrix of low rank, using the combination of Frobenius and nuclear norm.

Both problems are related to portfolio optimization, where one can often have a matrix $$A$$ that only approximates the correlations of stocks. For subsequent optimizations one would like to approximate $$A$$ with a correlation matrix or, in the factor model, with $$D+VV^T$$ with $$VV^T$$ of small rank.

## 10.5.1 Nearest correlation with the Frobenius norm¶

The Frobenius norm of a real matrix $$M$$ is defined as

$\|M\|_F = \left(\sum_{i,j}M_{i,j}^2\right)^{1/2}$

and with respect to this norm our optimization problem can be expressed simply as:

(1)$\begin{split}\begin{array}{ll} \minimize & \|A-X\|_F\\ \st & \mathbf{diag}(X) = e,\\ & X \succeq 0.\\ \end{array}\end{split}$

We can exploit the symmetry of $$A$$ and $$X$$ to get a compact vector representation. To this end we make use of the following mapping from a symmetric matrix to a flattened vector containing the (scaled) lower triangular part of the matrix:

(2)$\begin{split}\begin{array}{ll} \mbox{vec}: & \R^{n\times n} \rightarrow \real^{n(n+1)/2} \\ \mbox{vec}(M) = & (\alpha_{11}M_{11},\alpha_{21}M_{21},\alpha_{22}M_{22},\ldots,\alpha_{n1}M_{n1},\ldots,\alpha_{nn}M_{nn}) \\ \alpha_{ij}=&\begin{cases}1 & j=i\\ \sqrt{2} & j<i\end{cases} \end{array}\end{split}$

Note that $$\|M\|_F=\|\mbox{vec}(M)\|_2$$. The Fusion implementation of $$\mbox{vec}$$ is as follows:

Listing 25 Implementation of function $$vec$$ in (2). Click here to download.
function r = vec(e)

import mosek.fusion.*;

N = e.getShape().dim(1);

subi = [1: N*(N+1)/2];
subj = zeros(N*(N+1)/2,1);
val  = zeros(N*(N+1)/2,1);

k=1;
for j=1:N,
for i=j:N,
subj(k) = i+(j-1)*N;
if (i==j),
val(k) = 1;
else
val(k) = sqrt(2);
end;
k = k + 1;
end
end

S = Matrix.sparse(N*(N+1)/2, N*N, subi, subj, val);
r = Expr.mul(S, Expr.reshape( e, N*N ));


That leads to an optimization problem with both conic quadratic and semidefinite constraints:

(3)$\begin{split}\begin{array}{ll} \minimize & t\\ \st & (t, \mbox{vec} (A-X)) \in \Q,\\ & \mathbf{diag}(X) = e,\\ & X \succeq 0.\\ \end{array}\end{split}$

Code example

Listing 26 Implementation of problem (3). Click here to download.
function nearestcorr_frobenius(A,N)

import  mosek.fusion.*;
M = Model('NearestCorrelation');

% Setting up the variables
X = M.variable('X', Domain.inPSDCone(N));
t = M.variable('t', 1, Domain.unbounded());

% (t, vec (A-X)) \in Q
M.constraint( Expr.vstack(t, vec(Expr.sub(A,X))), Domain.inQCone() );

% diag(X) = e
M.constraint(X.diag(), Domain.equalsTo(1.0));

% Objective: minimize t
M.objective(ObjectiveSense.Minimize, t);
M.solve();

% Get the solution values
reshape(X.level(), N,N)
M.dispose();


We use the following input

Listing 27 Input for the nearest correlation problem.
N = 5;
A = [  0.0,  0.5,  -0.1,  -0.2,   0.5;
0.5,  1.25, -0.05, -0.1,   0.25;
-0.1, -0.05,  0.51,  0.02, -0.05;
-0.2, -0.1,   0.02,  0.54, -0.1;
0.5,  0.25, -0.05, -0.1,   1.25 ];


The expected output is the following (small differences may apply):

ans =

1.0000    0.5000   -0.1000   -0.2000    0.5000
0.5000    1.0000   -0.0500   -0.1000    0.2500
-0.1000   -0.0500    1.0000    0.0200   -0.0500
-0.2000   -0.1000    0.0200    1.0000   -0.1000
0.5000    0.2500   -0.0500   -0.1000    1.0000


## 10.5.2 Nearest Correlation with Nuclear-norm Penalty¶

Next, we consider the approximation of $$A$$ of the form $$D+X$$ where $$D=\diag(w),\ w\geq 0$$ and $$X\succeq 0$$. We will also aim at minimizing the rank of $$X$$. This can be approximated by a relaxed linear objective penalizing the trace $$\trace(X)$$ (which in this case is the nuclear norm of $$X$$ and happens to be the sum of its eigenvalues).

The combination of these constraints leads to a problem:

$\begin{split}\begin{array}{ll} \minimize & \left\|X+\diag(w)-A\right\|_F + \gamma \trace(X),\\ \st & X \succeq 0, w \geq 0, \end{array}\end{split}$

where the parameter $$\gamma$$ controls the tradeoff between the quality of approximation and the rank of $$X$$.

Exploit the mapping $$\mbox{vec}$$ defined in (2) we can express this problem as:

(4)$\begin{split}\begin{array}{ll} \minimize & t + \gamma\trace(X) \\ \st & (t, \mbox{vec} (X + \diag(w) - A) ) \in \Q, \\ & X \succeq 0 , w \geq 0. \end{array}\end{split}$

Code example

Listing 28 Implementation of problem (4). Click here to download.
function nearestcorr_nucnorm(A, N, gammas)

import mosek.fusion.*;
M = Model('NucNorm');

% Setup variables
t = M.variable('t', 1, Domain.unbounded());
X = M.variable('X', Domain.inPSDCone(N));
w = M.variable('w', N, Domain.greaterThan(0.0));

% (t, vec (X + diag(w) - A)) in Q
D = Expr.mulElm( Matrix.eye(N), Var.repeat(w,1,N) );
M.constraint( Expr.vstack( t, vec(Expr.sub(Expr.add(X, D), Matrix.dense(A))) ), ...
Domain.inQCone() );

% Trace of X
TX = Expr.sum(X.diag());

for g=gammas
% Objective: Minimize t + gamma*Tr(X)
M.solve()

%Get the eigenvalues of X and approximate its rank
d = eig(reshape(X.level(),N,N));

disp(sprintf('gamma=%f, res=%e, rank=%d', g, t.level(), sum(d>1e-6)))
end
M.dispose();


We feed MOSEK with the same input as in Sec. 10.5.1 (Nearest correlation with the Frobenius norm). The problem is solved for a range of values $$\gamma$$ values, to demonstrate how the penalty term helps achieve a low rank solution. To this extent we report both the rank of $$X$$ and the residual norm $$\left\|X+\diag(w)-A\right\|_F$$.

gamma=0.000000, res=3.076163e-01, rank=4
gamma=0.100000, res=4.251692e-01, rank=2
gamma=0.200000, res=5.112081e-01, rank=1
gamma=0.300000, res=5.298432e-01, rank=1
gamma=0.400000, res=5.592686e-01, rank=1
gamma=0.500000, res=6.045702e-01, rank=1
gamma=0.600000, res=6.764402e-01, rank=1
gamma=0.700000, res=8.009913e-01, rank=1
gamma=0.800000, res=1.062385e+00, rank=1
gamma=0.900000, res=1.129513e+00, rank=0
gamma=1.000000, res=1.129513e+00, rank=0