11.5 Logistic regression

Logistic regression is an example of a binary classifier, where the output takes one of two values 0 or 1 for each data point. We call the two values classes.

Formulation as an optimization problem

Define the sigmoid function

S(x) = \frac{1}{1+\exp(-x)}.

Next, given an observation $x \in \mathbb{R}^d$ and a weight vector $\theta \in \mathbb{R}^d$ we set

h_\theta(x) = S(\theta^T x) = \frac{1}{1+\exp(-\theta^T x)}.

The weight vector $\theta$ is part of the setup of the classifier. The expression $h_\theta(x)$ is interpreted as the probability that $x$ belongs to class 1. When asked to classify $x$ the returned answer is

x \mapsto \begin{cases} 1 & h_\theta(x) \geq 1/2, \\ 0 & h_\theta(x) < 1/2. \end{cases}
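For illustration, here is a minimal sketch of this decision rule in plain C#; the helper name classify and the use of raw arrays are illustrative only and not part of the Fusion model built below. Note that $h_\theta(x) \geq 1/2$ exactly when $\theta^T x \geq 0$, so the rule amounts to checking the sign of $\theta^T x$.

    // Illustrative decision rule: returns 1 if h_theta(x) >= 1/2 and 0 otherwise.
    public static int classify(double[] theta, double[] x)
    {
      double z = 0.0;
      for (int k = 0; k < theta.Length; k++) z += theta[k] * x[k];   // theta^T x
      double h = 1.0 / (1.0 + Math.Exp(-z));                         // h_theta(x) = S(theta^T x)
      return h >= 0.5 ? 1 : 0;
    }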

When training a logistic regression algorithm we are given a sequence of training examples $x_i$, each labelled with its class $y_i \in \{0,1\}$, and we seek the weight vector $\theta$ which maximizes the likelihood function

\prod_i h_\theta(x_i)^{y_i} (1-h_\theta(x_i))^{1-y_i}.

Of course every single $y_i$ equals 0 or 1, so just one factor appears in the product for each training data point. By taking logarithms we can define the logistic loss function:

J(\theta) = -\sum_{i: y_i=1} \log(h_\theta(x_i)) - \sum_{i: y_i=0} \log(1-h_\theta(x_i)).
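Note that $-\log(h_\theta(x_i)) = \log(1+\exp(-\theta^T x_i))$ and $-\log(1-h_\theta(x_i)) = \log(1+\exp(\theta^T x_i))$, which is the form used in the conic model below. A minimal sketch evaluating $J(\theta)$ directly (the helper name logisticLoss and the raw arrays are illustrative only):

    // Illustrative evaluation of J(theta); X[i] is the i-th training point, y[i] its class.
    public static double logisticLoss(double[] theta, double[][] X, bool[] y)
    {
      double J = 0.0;
      for (int i = 0; i < X.Length; i++)
      {
        double z = 0.0;
        for (int k = 0; k < theta.Length; k++) z += theta[k] * X[i][k];   // theta^T x_i
        // -log(h_theta(x_i)) = log(1+exp(-z)),  -log(1-h_theta(x_i)) = log(1+exp(z))
        J += y[i] ? Math.Log(1.0 + Math.Exp(-z)) : Math.Log(1.0 + Math.Exp(z));
      }
      return J;
    }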

The training problem with regularization (a standard technique to prevent overfitting) is now equivalent to

\min_\theta \ J(\theta) + \lambda \|\theta\|_2.

This can equivalently be phrased as

(11.20)
\begin{array}{lll}
\text{minimize}   & \sum_i t_i + \lambda r & \\
\text{subject to} & t_i \geq -\log(h_\theta(x_i)) = \log(1+\exp(-\theta^T x_i)) & \text{if } y_i = 1, \\
                  & t_i \geq -\log(1-h_\theta(x_i)) = \log(1+\exp(\theta^T x_i)) & \text{if } y_i = 0, \\
                  & r \geq \|\theta\|_2. &
\end{array}

Implementation

As can be seen from (11.20) the key point is to implement the softplus bound $t \geq \log(1+e^u)$, which is the simplest example of a log-sum-exp constraint for two terms. Here $t$ is a scalar variable and $u$ will be an affine expression of the form $\pm\theta^T x_i$. This is equivalent to

\exp(u-t) + \exp(-t) \leq 1

and further to

(11.21)
\begin{array}{l}
(z_1, 1, u-t) \in K_{\exp} \quad (z_1 \geq \exp(u-t)), \\
(z_2, 1, -t) \in K_{\exp} \quad (z_2 \geq \exp(-t)), \\
z_1 + z_2 \leq 1.
\end{array}
Listing 11.11 Implementation of $t \geq \log(1+e^u)$ as in (11.21).
    // t >= log( 1 + exp(u) ) coordinatewise
    public static void softplus(Model      M,
                                Expression t,
                                Expression u)
    {
      int n = t.GetShape()[0];
      Variable z1 = M.Variable(n);
      Variable z2 = M.Variable(n);
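      // Model (11.21): z1 >= exp(u-t), z2 >= exp(-t), z1 + z2 <= 1.
      // Using the equality z1 + z2 = 1 is equivalent, since z1 and z2 are only bounded below.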
      M.Constraint(Expr.Add(z1, z2), Domain.EqualsTo(1));
      M.Constraint(Expr.Hstack(z1, Expr.ConstTerm(n, 1.0), Expr.Sub(u,t)), Domain.InPExpCone());
      M.Constraint(Expr.Hstack(z2, Expr.ConstTerm(n, 1.0), Expr.Neg(t)), Domain.InPExpCone());
    }

Once we have this subroutine, it is easy to implement a function that builds the regularized loss function model (11.20).

Listing 11.12 Implementation of (11.20).
    // Model logistic regression (regularized with full 2-norm of theta)
    // X - n x d matrix of data points
    // y - length n vector classifying training points
    // lamb - regularization parameter
    public static Model logisticRegression(double[,]  X, 
                                           bool[]     y,
                                           double     lamb)
    {
      int n = X.GetLength(0);
      int d = X.GetLength(1);       // num samples, dimension
      
      Model M = new Model();   

      Variable theta = M.Variable("theta", d);
      Variable t     = M.Variable(n);
      Variable reg   = M.Variable();

      M.Objective(ObjectiveSense.Minimize, Expr.Add(Expr.Sum(t), Expr.Mul(lamb,reg)));
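      // reg >= ||theta||_2 (quadratic cone models the regularization term)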
      M.Constraint(Var.Vstack(reg, theta), Domain.InQCone());
      
      double[] signs = new double[n];
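      // Per (11.20): y_i = 1 needs t_i >= log(1+exp(-theta^T x_i)) (sign -1),
      //              y_i = 0 needs t_i >= log(1+exp(+theta^T x_i)) (sign +1).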
      for(int i = 0; i < n; i++)
        if (y[i]) signs[i] = -1; else signs[i] = 1;

      softplus(M, t, Expr.MulElm(Expr.Mul(X, theta), signs));

      return M;
    }
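A minimal usage sketch, assuming the training data has already been loaded into X and y in the format expected above (the value of lamb is illustrative):

    Model M = logisticRegression(X, y, 0.1);
    M.Solve();
    double[] theta = M.GetVariable("theta").Level();   // trained weights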

Example: 2D dataset fitting

In the next figure we apply logistic regression to the training set of 2D points taken from the example file ex2data2.txt. The two-dimensional dataset was converted into a feature vector $x \in \mathbb{R}^{28}$ using monomial coordinates of degrees at most 6.


Fig. 11.6 Logistic regression example with no, medium and strong regularization (small, medium, large $\lambda$). Without regularization we get obvious overfitting.
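The degree-6 monomial feature map used above can be sketched as follows; the helper name mapFeature is illustrative and not one of the listings above. It produces all monomials $x_1^i x_2^j$ with $i+j \leq 6$, i.e. 28 features per 2D point.

    // Map a 2D point to all monomials x1^i * x2^j with i + j <= 6 (28 features).
    public static double[] mapFeature(double x1, double x2)
    {
      var features = new System.Collections.Generic.List<double>();
      for (int i = 0; i <= 6; i++)
        for (int j = 0; j <= 6 - i; j++)
          features.Add(Math.Pow(x1, i) * Math.Pow(x2, j));
      return features.ToArray();
    }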