10.5 Nearest Correlation Matrix Problem

A correlation matrix is a symmetric positive definite matrix with unit diagonal. This term has origins in statistics, since the matrix whose entries are the correlation coefficients of a sequence of random variables has all these properties.

In this section we study variants of the problem of approximating a given symmetric matrix \(A\) with correlation matrices:

  • find the correlation matrix \(X\) nearest to \(A\) in the Frobenius norm,
  • find an approximation of the form \(D+X\) where \(D\) is a diagonal matrix with positive diagonal and \(X\) is a positive semidefinite matrix of low rank, using the combination of Frobenius and nuclear norm.

Both problems are related to portfolio optimization, where one can often have a matrix \(A\) that only approximates the correlations of stocks. For subsequent optimizations one would like to approximate \(A\) with a correlation matrix or, in the factor model, with \(D+VV^T\) with \(VV^T\) of small rank.

10.5.1 Nearest correlation with the Frobenius norm

The Frobenius norm of a real matrix \(M\) is defined as

\[\|M\|_F = \left(\sum_{i,j}M_{i,j}^2\right)^{1/2}\]

and with respect to this norm our optimization problem can be expressed simply as:

(1)\[\begin{split}\begin{array}{ll} \minimize & \|A-X\|_F\\ \st & \mathbf{diag}(X) = e,\\ & X \succeq 0.\\ \end{array}\end{split}\]

We can exploit the symmetry of \(A\) and \(X\) to get a compact vector representation. To this end we make use of the following mapping from a symmetric matrix to a flattened vector containing the (scaled) lower triangular part of the matrix:

(2)\[\begin{split}\begin{array}{ll} \mbox{vec}: & \R^{n\times n} \rightarrow \real^{n(n+1)/2} \\ \mbox{vec}(M) = & (\alpha_{11}M_{11},\alpha_{21}M_{21},\alpha_{22}M_{22},\ldots,\alpha_{n1}M_{n1},\ldots,\alpha_{nn}M_{nn}) \\ \alpha_{ij}=&\begin{cases}1 & j=i\\ \sqrt{2} & j<i\end{cases} \end{array}\end{split}\]

Note that \(\|M\|_F=\|\mbox{vec}(M)\|_2\). The Fusion implementation of \(\mbox{vec}\) is as follows:

Listing 10.11 Implementation of function \(vec\) in (2). Click here to download.
  /** Assuming that e is an NxN expression, return the lower triangular part as a vector.
   */
  public static Expression vec(Expression e) {
    int N         = e.getShape().dim(0);
    int[] msubi   = new int[N * (N + 1) / 2];
    int[] msubj   = new int[N * (N + 1) / 2];
    double[] mcof = new double[N * (N + 1) / 2];

    for (int i = 0, k = 0; i < N; ++i)
      for (int j = 0; j < i + 1; ++j, ++k) {
        msubi[k] = k;
        msubj[k] = i * N + j;
        if (i == j) mcof[k] = 1.0;
        else        mcof[k] = Math.sqrt(2);
      }

    Matrix S = Matrix.sparse(N * (N + 1) / 2, N * N, msubi, msubj, mcof);
    return Expr.mul(S, Expr.flatten(e));
  }

That leads to an optimization problem with both conic quadratic and semidefinite constraints:

(3)\[\begin{split}\begin{array}{ll} \minimize & t\\ \st & (t, \mbox{vec} (A-X)) \in \Q,\\ & \mathbf{diag}(X) = e,\\ & X \succeq 0.\\ \end{array}\end{split}\]

Code example

Listing 10.12 Implementation of problem (3). Click here to download.
  private static void nearestcorr(Matrix A)
  throws SolutionError {
    int N = A.numRows();

    Model M = new Model("NearestCorrelation");
    try {
      // Setting up the variables
      Variable X = M.variable("X", Domain.inPSDCone(N));
      Variable t = M.variable("t", 1, Domain.unbounded());

      // (t, vec (A-X)) \in Q
      M.constraint( Expr.vstack(t, vec(Expr.sub( A, X))), Domain.inQCone() );

      // diag(X) = e
      M.constraint(X.diag(), Domain.equalsTo(1.0));

      // Objective: Minimize t
      M.objective(ObjectiveSense.Minimize, t);

      // Solve the problem
      M.solve();

      // Get the solution values
      System.out.println("X = \n" + mattostr(X.level(), N));
      System.out.println("t = \n" + mattostr(t.level(), N));

    } finally {
      M.dispose();
    }
  }

We use the following input

Listing 10.13 Input for the nearest correlation problem.
    int N = 5;
    Matrix A = Matrix.dense(N, N,
                            new double[] { 0.0,  0.5,  -0.1,  -0.2,   0.5,
                                           0.5,  1.25, -0.05, -0.1,   0.25,
                                           -0.1, -0.05,  0.51,  0.02, -0.05,
                                           -0.2, -0.1,   0.02,  0.54, -0.1,
                                           0.5,  0.25, -0.05, -0.1,   1.25
                                         });

The expected output is the following (small differences may apply):

X =
[ 1.000e+00  5.000e-01  -1.000e-01  -2.000e-01  5.000e-01 ]
[ 5.000e-01  1.000e+00  -5.000e-02  -9.999e-02  2.500e-01 ]
[ -1.000e-01  -5.000e-02  1.000e+00  2.000e-02  -5.000e-02 ]
[ -2.000e-01  -9.999e-02  2.000e-02  1.000e+00  -9.999e-02 ]
[ 5.000e-01  2.500e-01  -5.000e-02  -9.999e-02  1.000e+00 ]

10.5.2 Nearest Correlation with Nuclear-norm Penalty

Next, we consider the approximation of \(A\) of the form \(D+X\) where \(D=\diag(w),\ w\geq 0\) and \(X\succeq 0\). We will also aim at minimizing the rank of \(X\). This can be approximated by a relaxed linear objective penalizing the trace \(\trace(X)\) (which in this case is the nuclear norm of \(X\) and happens to be the sum of its eigenvalues).

The combination of these constraints leads to a problem:

\[\begin{split}\begin{array}{ll} \minimize & \left\|X+\diag(w)-A\right\|_F + \gamma \trace(X),\\ \st & X \succeq 0, w \geq 0, \end{array}\end{split}\]

where the parameter \(\gamma\) controls the tradeoff between the quality of approximation and the rank of \(X\).

Exploit the mapping \(\mbox{vec}\) defined in (2) we can express this problem as:

(4)\[\begin{split}\begin{array}{ll} \minimize & t + \gamma\trace(X) \\ \st & (t, \mbox{vec} (X + \diag(w) - A) ) \in \Q, \\ & X \succeq 0 , w \geq 0. \end{array}\end{split}\]

Code example

Listing 10.14 Implementation of problem (4). Click here to download.
  /* Nearest correlation with nuclear norm penalty */
  private static void nearestcorr_nn(Matrix A, double[] gammas, double[] res, int[] rank)
  throws SolutionError {
    int N = A.numRows();
    Model M = new Model("NucNorm");
    try {
      // Setup variables
      Variable t = M.variable("t", 1, Domain.unbounded());
      Variable X = M.variable("X", N, Domain.inPSDCone());
      Variable w = M.variable("w", N, Domain.greaterThan(0.0));

      // (t, vec (X + diag(w) - A)) in Q
      Expression D = Expr.mulElm( Matrix.eye(N), Var.repeat(w, 1, N) );
      M.constraint( Expr.vstack( t, vec(Expr.sub(Expr.add(X, D), A)) ), Domain.inQCone() );

      // Trace(X)
      Expression TX = Expr.sum(X.diag());

      for (int k = 0; k < gammas.length; ++k) {
        // Objective: Minimize t + gamma*Tr(X)
        M.objective(ObjectiveSense.Minimize, Expr.add(t, Expr.mul(gammas[k], TX)));
        M.solve();

        // Get the eigenvalues of X and approximate its rank
        double[] d = new double[N];
        LinAlg.syeig(mosek.uplo.lo, N, X.level(), d);
        int rnk = 0; for (int i = 0; i < d.length; ++i) if (d[i] > 1e-6) ++rnk;

        res[k] = t.level()[0];
        rank[k] = rnk;
      }
    } finally {
      M.dispose();
    }
  }

We feed MOSEK with the same input as in Section 10.5.1. The problem is solved for a range of values \(\gamma\) values, to demonstrate how the penalty term helps achieve a low rank solution. To this extent we report both the rank of \(X\) and the residual norm \(\left\|X+\diag(w)-A\right\|_F\).

gamma = 0.0, res=3.076e-01, rank = 4
gamma = 0.1, res=4.252e-01, rank = 2
gamma = 0.2, res=5.112e-01, rank = 1
gamma = 0.3, res=5.298e-01, rank = 1
gamma = 0.4, res=5.593e-01, rank = 1
gamma = 0.5, res=6.046e-01, rank = 1
gamma = 0.6, res=6.764e-01, rank = 1
gamma = 0.7, res=8.010e-01, rank = 1
gamma = 0.8, res=1.062e+00, rank = 1
gamma = 0.9, res=1.130e+00, rank = 0
gamma = 1.0, res=1.130e+00, rank = 0