9.3 Computing a Sparse Cholesky Factorization

Given a positive semidefinite symmetric (PSD) matrix

\[A \in \real^{n\times n}\]

it is well known that there exists a matrix \(L\) such that

\[A = LL^T.\]

If the matrix \(L\) is lower triangular, then the factorization is called a Cholesky factorization. If \(A\) is positive definite (nonsingular), then \(L\) is also nonsingular. A Cholesky factorization is useful for many reasons:

  • A system of linear equations \(Ax=b\) can be solved by first solving the lower triangular system \(L y = b\) followed by the upper triangular system \(L^T x = y\).

  • A quadratic term \(x^TAx\) in a constraint or objective can be replaced with \(y^Ty\) for \(y=L^Tx\), potentially leading to a more robust formulation (see [And13]).

Therefore, MOSEK provides a function that can compute a Cholesky factorization of a PSD matrix. In addition, a function for solving linear systems with a nonsingular lower or upper triangular matrix is available.
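
To make the first bullet point above concrete, here is a small dense sketch of solving \(Ax=b\) by a Cholesky factorization followed by two triangular solves. It uses NumPy and SciPy rather than MOSEK, purely for illustration; the matrix and right-hand side are the ones used in the example later in this section.

    import numpy as np
    from scipy.linalg import solve_triangular

    # Dense illustration (NumPy/SciPy, not MOSEK) of solving A x = b
    # via a Cholesky factorization and two triangular solves.
    A = np.array([[4.0, 1.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 1.0]])
    b = np.array([13.0, 3.0, 4.0, 5.0])

    L = np.linalg.cholesky(A)                  # A = L L^T with L lower triangular
    y = solve_triangular(L, b, lower=True)     # solve L y = b
    x = solve_triangular(L.T, y, lower=False)  # solve L^T x = y
    print(x)                                   # [1. 2. 3. 4.]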

In practice \(A\) may be very large, with \(n\) in the range of millions. However, \(A\) is then typically sparse, which means that most of the elements in \(A\) are zero, and sparsity can be exploited to reduce the cost of computing the Cholesky factorization. The computational savings depend on the positions of the zeros in \(A\). For example, below a matrix \(A\) is given together with a Cholesky factor, accurate to 5 digits:

(9.6)\[\begin{split}A = \left[ \begin{array}{cccc} 4 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{array} \right ], \quad L = \left[ \begin{array}{cccc} 2.0000 & 0 & 0 & 0 \\ 0.5000 & 0.8660 & 0 & 0 \\ 0.5000 & -0.2887 & 0.8165 & 0 \\ 0.5000 & -0.2887 & -0.4082 & 0.7071 \end{array} \right ].\end{split}\]

However, if we symmetrically permute the rows and columns of \(A\) using a permutation matrix \(P\)

\[\begin{split}P = \left [ \begin{array}{cccc} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ \end{array} \right ],\quad A' = PAP^T = \left [ \begin{array}{cccc} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 4 \\ \end{array} \right ],\end{split}\]

then the Cholesky factorization of \(A'=L'L'^T\) is

\[\begin{split}L' = \left [ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ \end{array} \right ]\end{split}\]

which is sparser than \(L\).
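
The difference in fill-in can be checked with a short NumPy sketch (again not using MOSEK, purely for illustration); counting nonzeros should give 10 for \(L\) and 7 for \(L'\).

    import numpy as np

    # Compare fill-in of the Cholesky factors of A and of the permuted P A P^T
    # (NumPy only; this sketch is not part of the MOSEK API).
    A = np.array([[4.0, 1.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 1.0]])
    P = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0]])

    L  = np.linalg.cholesky(A)            # fully dense lower triangle: 10 nonzeros
    Lp = np.linalg.cholesky(P @ A @ P.T)  # sparser factor: 7 nonzeros
    print(np.count_nonzero(L), np.count_nonzero(Lp))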

Computing a permutation matrix that leads to the sparsest Cholesky factorization or the minimal amount of work is NP-hard. Good permutations can be chosen by using heuristics, such as the minimum degree heuristic and its variants. The function Env.computesparsecholesky provided by MOSEK for computing a Cholesky factorization has a built-in permutation (reordering) heuristic. The following code illustrates the use of Env.computesparsecholesky and Env.sparsetriangularsolvedense.

Listing 9.3 How to use the sparse Cholesky factorization routine available in MOSEK.
    try:
        perm, diag, lnzc, lptrc, lensubnval, lsubc, lvalc = env.computesparsecholesky(
            0,      # Mosek chooses number of threads
            1,      # Use reordering heuristic
            1.0e-14,# Singularity tolerance
            anzc, aptrc, asubc, avalc)

        printsparse(n, perm, diag, lnzc, lptrc, lensubnval, lsubc, lvalc)

        x = [b[p] for p in perm]   # Permuted b is stored as x.

        # Compute inv(L)*x.
        env.sparsetriangularsolvedense(mosek.transpose.no,
                                       lnzc, lptrc, lsubc, lvalc, x)

        # Compute inv(L^T)*x.
        env.sparsetriangularsolvedense(mosek.transpose.yes,
                                       lnzc, lptrc, lsubc, lvalc, x)

        print("\nSolution Ax=b: x = ", numpy.array(
            [x[j] for i in range(n) for j in range(n) if perm[j] == i]), "\n")
    except:
        raise

We can set up the data to recreate the matrix \(A\) from (9.6):

    # Observe that anzc, aptrc, asubc and avalc only specify the lower
    # triangular part.
    n = 4
    anzc = [4, 1, 1, 1]
    asubc = [0, 1, 2, 3, 1, 2, 3]
    aptrc = [0, 4, 5, 6]
    avalc = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    b = [13.0, 3.0, 4.0, 5.0]
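
The four arrays store the lower triangular part of \(A\) column by column: column \(j\) has anzc[j] nonzeros, whose row indices and values start at position aptrc[j] in asubc and avalc. A small hypothetical helper (not part of the MOSEK API) can expand the arrays back into a dense matrix as a sanity check:

    import numpy as np

    # Hypothetical helper, not part of the MOSEK API: expand the
    # column-compressed lower triangle (anzc, aptrc, asubc, avalc)
    # into a dense symmetric matrix for inspection.
    def lower_triangular_to_dense(n, anzc, aptrc, asubc, avalc):
        A = np.zeros((n, n))
        for j in range(n):
            for k in range(aptrc[j], aptrc[j] + anzc[j]):
                A[asubc[k], j] = avalc[k]
                A[j, asubc[k]] = avalc[k]   # mirror into the upper triangle
        return A

    print(lower_triangular_to_dense(n, anzc, aptrc, asubc, avalc))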

and we obtain the following output:

Example with positive definite A.
P = [ 3 2 0 1 ]
diag(D) = [ 0.00 0.00 0.00 0.00 ]
L=
1.00 0.00 0.00 0.00
0.00 1.00 0.00 0.00
1.00 1.00 1.41 0.00
0.00 0.00 0.71 0.71

Solution A x = b, x = [ 1.00 2.00 3.00 4.00 ]

The output indicates that with the permutation matrix

\[\begin{split}P = \left [ \begin{array}{cccc} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \end{array} \right ]\end{split}\]

there is a Cholesky factorization \(PAP^T=LL^T\), where

\[\begin{split}L = \left [ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 1 & 1.4142 & 0 \\ 0 & 0 & 0.7071 & 0.7071 \\ \end{array} \right ]\end{split}\]

The remaining part of the code solves the linear system \(Ax=b\) for \(b=[13,3,4,5]^T\). The solution is reported to be \(x = [1, 2, 3, 4]^T\), which is correct.
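
As a sanity check, the reported factorization can be verified numerically, for example with NumPy (the entries of \(L\) below are simply copied from the printed output):

    import numpy as np

    # Verify numerically that P A P^T = L L^T for the reported permutation
    # and factor (NumPy only; values copied from the output above).
    A = np.array([[4.0, 1.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 1.0, 0.0],
                  [1.0, 0.0, 0.0, 1.0]])
    perm = [3, 2, 0, 1]
    L = np.array([[1.0, 0.0, 0.0,    0.0   ],
                  [0.0, 1.0, 0.0,    0.0   ],
                  [1.0, 1.0, 1.4142, 0.0   ],
                  [0.0, 0.0, 0.7071, 0.7071]])

    PAPt = A[np.ix_(perm, perm)]                  # rows and columns permuted
    print(np.allclose(PAPt, L @ L.T, atol=1e-3))  # True (up to printed precision)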

The second example shows what happens when we compute a sparse Cholesky factorization of a singular matrix. In this example \(A\) is a rank 1 matrix

(9.7)\[\begin{split}A = \left[ \begin{array}{cccc} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ \end{array} \right ] = \left[ \begin{array}{cccc} 1 \\ 1 \\ 1 \end{array} \right ] \left[ \begin{array}{cccc} 1 \\ 1 \\ 1 \end{array} \right ]^T\end{split}\]
    #Example 2 - singular A
    n = 3
    anzc = [3, 2, 1]
    asubc = [0, 1, 2, 1, 2, 2]
    aptrc = [0, 3, 5]
    avalc = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

Now we get the output

P = [ 0 2 1 ]
diag(D) = [ 0.00e+00 1.00e-14 1.00e-14 ]
L=
1.00e+00 0.00e+00 0.00e+00
1.00e+00 1.00e-07 0.00e+00
1.00e+00 0.00e+00 1.00e-07

which indicates the decomposition

\[P A P^T = LL^T - D\]

where

\[\begin{split}P = \left[ \begin{array}{cccc} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{array} \right ],\quad L = \left[ \begin{array}{cccc} 1 & 0 & 0 \\ 1 & 10^{-7} & 0 \\ 1 & 0 & 10^{-7} \end{array} \right ],\quad D = \left[ \begin{array}{cccc} 0 & 0 & 0 \\ 0 & 10^{-14} & 0 \\ 0 & 0 & 10^{-14} \end{array} \right ].\end{split}\]
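
A quick NumPy check (not MOSEK) confirms that \(LL^T - D\) reproduces the all-ones matrix, which equals \(PAP^T\) for any permutation in this example:

    import numpy as np

    # Check that L L^T - D equals the rank-1 all-ones matrix (NumPy only).
    L = np.array([[1.0, 0.0,  0.0 ],
                  [1.0, 1e-7, 0.0 ],
                  [1.0, 0.0,  1e-7]])
    D = np.diag([0.0, 1e-14, 1e-14])
    print(L @ L.T - D)   # the all-ones 3x3 matrix, i.e. P A P^T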

Since \(A\) is only positive semidefinite, but not of full rank, some of the diagonal elements of \(A\) are boosted to make it truly positive definite. The amount of boosting is passed as an argument to Env.computesparsecholesky, in this case \(10^{-14}\). Note that

\[P A P^T = LL^T - D\]

where \(D\) is a small matrix, so the computed Cholesky factorization is an exact factorization of a slightly perturbed \(A\). In general this is the best we can hope for in finite precision when \(A\) is singular or close to being singular.

We will end this section with a word of caution: computing a Cholesky factorization of a matrix that is not of full rank and not sufficiently well conditioned may lead to incorrect results, i.e. a matrix that is indefinite may be declared positive semidefinite and vice versa.