11.2 Primal Support-Vector Machine (SVM)¶
Machine-Learning (ML) has become a widespread tool in many applications that affect our everyday life. In many cases, at the very core of these techniques lies an optimization problem. This case study focuses on the Support-Vector Machine (SVM).
The basic SVM model can be stated as:
We are given a set of \(m\) points \(x_1, \ldots, x_m\) in \(\mathbb{R}^n\), partitioned into two groups. Find, if any, the separating hyperplane of the two subsets with the largest margin, i.e. as far as possible from the points.
Mathematical Model
Let \(y_i \in \{-1, +1\}\) denote the group to which the \(i\)-th point belongs. A hyperplane \(\{x : w^T x - b = 0\}\) separates the two groups with margin at least \(2/\|w\|\) if, after rescaling \(w\) and \(b\),

\[ y_i (w^T x_i - b) \geq 1 \quad \text{for all } i = 1, \ldots, m. \tag{11.13} \]

If a solution exists, the largest-margin hyperplane is found by minimizing \(\|w\|\), i.e. by solving

\[ \begin{array}{ll} \min & \frac{1}{2}\|w\|^2 \\ \text{s.t.} & y_i (w^T x_i - b) \geq 1, \quad i = 1, \ldots, m. \end{array} \]
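The margin value \(2/\|w\|\) follows from a one-line computation, sketched here:

```latex
% Points x_+ and x_- lying on the two supporting hyperplanes satisfy
%   w^T x_+ - b = 1   and   w^T x_- - b = -1.
% Subtracting and projecting onto the unit normal w/||w|| gives the margin:
\frac{w^T (x_+ - x_-)}{\|w\|} \;=\; \frac{1 - (-1)}{\|w\|} \;=\; \frac{2}{\|w\|}.
```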
To allow more flexibility the soft-margin SVM classifier is often used instead. It admits a violation of the large margin requirement (11.13) by non-negative slack variables \(\xi_i\), which are then penalized in the objective function.
In matrix form we have

\[ \begin{array}{ll} \min & \frac{1}{2}\|w\|^2 + C \mathbf{1}^T \xi \\ \text{s.t.} & y \star (Xw - b\mathbf{1}) + \xi \geq \mathbf{1}, \\ & \xi \geq 0, \end{array} \]

where \(X\) is the \(m \times n\) matrix whose rows are the given points, \(y\) is the vector of labels, \(\mathbf{1}\) is the all-ones vector, \(\star\) denotes element-wise multiplication and \(C > 0\) is the penalty parameter.
Implementing the matrix formulation of the soft-margin SVM in Fusion is very easy. We only need to cast the problem in conic form, which in this case involves converting the quadratic term of the objective function into a conic constraint:

\[ \begin{array}{ll} \min & t + C \mathbf{1}^T \xi \\ \text{s.t.} & y \star (Xw - b\mathbf{1}) + \xi \geq \mathbf{1}, \\ & (1, t, w) \in \mathcal{Q}_r^{n+2}, \\ & \xi \geq 0, \end{array} \tag{11.14} \]

where \(\mathcal{Q}_r^{n+2}\) is the rotated quadratic cone, so that \((1, t, w) \in \mathcal{Q}_r^{n+2}\) if and only if \(t \geq \frac{1}{2}\|w\|^2\).
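The rotated-cone constraint is just a reformulation of \(t \geq \frac{1}{2}\|w\|^2\). The following plain-Java check (no Fusion dependency; the class and method names are ours) illustrates the membership condition \(2 x_1 x_2 \geq \|w\|^2\) with \(x_1, x_2 \geq 0\):

```java
// Sketch: numerical membership test for the rotated quadratic cone.
// With x1 = 1 fixed, (1, t, w) is in the cone exactly when t >= ||w||^2 / 2.
public class RotatedConeCheck {
    static boolean inRotatedQCone(double x1, double x2, double[] w) {
        double sq = 0.0;
        for (double wi : w) sq += wi * wi;          // ||w||^2
        return x1 >= 0 && x2 >= 0 && 2 * x1 * x2 >= sq;
    }

    public static void main(String[] args) {
        double[] w = {3.0, 4.0};                    // ||w||^2 = 25
        System.out.println(inRotatedQCone(1.0, 12.5, w)); // t = ||w||^2/2: on the boundary -> true
        System.out.println(inRotatedQCone(1.0, 12.0, w)); // t too small: outside -> false
    }
}
```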
Fusion implementation
We now demonstrate how to implement model (11.14). Let us assume that the training examples are stored in the rows of a matrix X, the labels in a vector y, and that we have a set of weights C for which we want to train the model. The Fusion implementation of our conic model starts by declaring the model object:
Model M = new Model("Primal SVM");
Then we define the variables \(w\), \(t\), \(b\) and \(\xi\):
Variable w = M.variable(n, Domain.unbounded());
Variable t = M.variable(1, Domain.unbounded());
Variable b = M.variable(1, Domain.unbounded());
Variable xi = M.variable(m, Domain.greaterThan(0.));
The conic constraint \((1, t, w) \in \mathcal{Q}_r^{n+2}\) is obtained by stacking the three elements:
M.constraint( Expr.vstack(1., t, w), Domain.inRotatedQCone());
Note how the dimension of the cone is deduced from the arguments. The relaxed classification constraints can be expressed using the built-in expressions available in Fusion. In particular:

- element-wise multiplication is performed with the Expr.mulElm function;
- a vector whose entries are repetitions of \(b\) is produced by Var.repeat.

The result is
M.constraint(
Expr.add( xi ,
Expr.mulElm( y,
Expr.sub( Expr.mul(X, w), Var.repeat(b, m) )
)
),
Domain.greaterThan( 1. )
);
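To see what the stacked expression computes, here is a plain-Java sketch (no Fusion needed; the helper name is ours) that evaluates \(y \star (Xw - b\mathbf{1}) + \xi\) on toy data and checks it against the right-hand side \(\mathbf{1}\):

```java
// Plain-Java evaluation of the soft-margin constraint
// y .* (X w - b 1) + xi >= 1 (element-wise), mirroring the Fusion expression.
public class MarginCheck {
    static double[] slackedMargins(double[][] X, double[] y,
                                   double[] w, double b, double[] xi) {
        int m = X.length, n = w.length;
        double[] r = new double[m];
        for (int i = 0; i < m; i++) {
            double dot = 0.0;
            for (int j = 0; j < n; j++) dot += X[i][j] * w[j];  // (X w)_i
            r[i] = y[i] * (dot - b) + xi[i];                    // y_i (w^T x_i - b) + xi_i
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] X = {{2.0, 0.0}, {-2.0, 0.0}};  // one point on each side
        double[] y = {1.0, -1.0};
        double[] w = {1.0, 0.0};                   // hyperplane x1 = 0
        double[] xi = {0.0, 0.0};                  // no slack needed
        for (double v : slackedMargins(X, y, w, 0.0, xi))
            System.out.println(v >= 1.0);          // constraint satisfied -> true, true
    }
}
```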
Finally, the objective function is defined as
M.objective( ObjectiveSense.Minimize, Expr.add( t, Expr.mul(c, Expr.sum(xi) ) ) );
To solve a sequence of problems with varying C we can simply iterate over those values, redefining the objective function and re-solving each time:
for (int i = 0; i < nc; i++) {
double c = 500.0 * i;
M.objective( ObjectiveSense.Minimize, Expr.add( t, Expr.mul(c, Expr.sum(xi) ) ) );
M.solve();
try {
System.out.format("%4f | %8f", c, b.level()[0] );
for (int j = 0; j < n; j++)
System.out.format(" | %8f", w.level()[j] );
System.out.print("\n");
} catch (FusionException e) {}
}
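Once a model is solved, a new point is classified by the sign of \(w^T x - b\). A minimal plain-Java predictor, using hypothetical solution values in place of the outputs of w.level() and b.level():

```java
// Classify a point with a trained (w, b): label = sign(w^T x - b).
public class SvmPredict {
    static int predict(double[] w, double b, double[] x) {
        double s = -b;
        for (int j = 0; j < w.length; j++) s += w[j] * x[j];
        return s >= 0 ? 1 : -1;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 1.0};   // hypothetical values from w.level()
        double b = 0.0;            // hypothetical value from b.level()[0]
        System.out.println(predict(w, b, new double[]{1.0, 1.0}));   // 1
        System.out.println(predict(w, b, new double[]{-1.0, -1.0})); // -1
    }
}
```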
Source code
Model M = new Model("Primal SVM");
try {
System.out.format("Number of data : %d\n", m);
System.out.format("Number of features: %d\n", n);
Variable w = M.variable(n, Domain.unbounded());
Variable t = M.variable(1, Domain.unbounded());
Variable b = M.variable(1, Domain.unbounded());
Variable xi = M.variable(m, Domain.greaterThan(0.));
M.constraint(
Expr.add( xi ,
Expr.mulElm( y,
Expr.sub( Expr.mul(X, w), Var.repeat(b, m) )
)
),
Domain.greaterThan( 1. )
);
M.constraint( Expr.vstack(1., t, w), Domain.inRotatedQCone());
System.out.println(" c | b | w");
for (int i = 0; i < nc; i++) {
double c = 500.0 * i;
M.objective( ObjectiveSense.Minimize, Expr.add( t, Expr.mul(c, Expr.sum(xi) ) ) );
M.solve();
try {
System.out.format("%4f | %8f", c, b.level()[0] );
for (int j = 0; j < n; j++)
System.out.format(" | %8f", w.level()[j] );
System.out.print("\n");
} catch (FusionException e) {}
}
}
finally {
M.dispose();
}
Example
We generate a random dataset consisting of two groups of points, each drawn from a Gaussian distribution in \(\mathbb{R}^n\), the first centered at \((1, \ldots, 1)\) and the second at \((-1, \ldots, -1)\):
Random gen = new Random(seed);
int nump = gen.nextInt(m);
int numm = m - nump;
double [] y = new double[m];
Arrays.fill(y, 0, nump, 1.);
Arrays.fill(y, nump, m, -1.);
double [][] X = new double[m][n];
for (int i = 0; i < nump; i++)
for (int j = 0; j < n; j++)
X[i][j] = gen.nextGaussian() + 1.;
for (int i = nump; i < m; i++)
for (int j = 0; j < n; j++)
X[i][j] = gen.nextGaussian() - 1.;
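The snippet above relies on m, n and seed being defined elsewhere. For completeness, here is a self-contained sketch with illustrative values of our own choosing:

```java
import java.util.Arrays;
import java.util.Random;

// Self-contained version of the data generator above; the particular
// values of m, n and the seed are illustrative, not from the original.
public class DataGen {
    public static void main(String[] args) {
        int m = 50, n = 2;                 // number of points and features
        Random gen = new Random(0);
        int nump = gen.nextInt(m);         // size of the +1 group
        int numm = m - nump;               // size of the -1 group

        double[] y = new double[m];
        Arrays.fill(y, 0, nump, 1.);       // labels of the first group
        Arrays.fill(y, nump, m, -1.);      // labels of the second group

        double[][] X = new double[m][n];
        for (int i = 0; i < nump; i++)     // +1 group centered at (1, ..., 1)
            for (int j = 0; j < n; j++)
                X[i][j] = gen.nextGaussian() + 1.;
        for (int i = nump; i < m; i++)     // -1 group centered at (-1, ..., -1)
            for (int j = 0; j < n; j++)
                X[i][j] = gen.nextGaussian() - 1.;

        System.out.println(nump + " positive, " + numm + " negative");
    }
}
```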
When the standard deviation of the two distributions is small, the groups are well separated and problem (11.14) yields a separating hyperplane:

Fig. 11.2 Separating hyperplane for two clusters of points.¶
For a larger standard deviation the two groups overlap, and a soft separating hyperplane is obtained instead:

Fig. 11.3 Soft separating hyperplane for two groups of points.¶