The following lines describe how to estimate a Gaussian mixture model using the VLFeat implementation of the Expectation Maximization (EM) algorithm. The EM algorithm attempts to model a dataset as a mixture of K multivariate Gaussian distributions.
Consider a dataset containing 1000 randomly sampled points in 2-D.
If one wants to estimate a Gaussian mixture from this dataset, the following commands could be invoked:
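A minimal sketch of those commands (assuming VLFeat is on the MATLAB path, e.g. after running vl_setup; the cluster count of 30 is an illustrative choice):

    % generate 1000 random points in 2-D
    N         = 1000 ;
    dimension = 2 ;
    data      = rand(dimension, N) ;

    % estimate a mixture of numClusters Gaussians with EM
    numClusters = 30 ;
    [means, sigmas, weights] = vl_gmm(data, numClusters) ;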
The means, sigmas, and weights variables then hold the means, covariances, and weights of the estimated Gaussians that form the mixture. One possible outcome of the algorithm is shown in the following figure, visualized using the vl_plotframe function.
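A sketch of such a visualization, assuming the variables from the previous snippet and vl_plotframe's five-component ellipse frames [x; y; S11; S12; S22]:

    figure ;
    hold on ;
    plot(data(1,:), data(2,:), 'r.') ;
    for i = 1:numClusters
      % draw the i-th Gaussian as an ellipse; the off-diagonal entry
      % is zero because only diagonal covariances are estimated
      vl_plotframe([means(:,i) ; sigmas(1,i) ; 0 ; sigmas(2,i)]) ;
    end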
Note that the ellipses are axis-aligned. This is a consequence of the optimization method: for the sake of speed, all computations are carried out only on the diagonals of the covariance matrices.
The simplest way to initialize the GMM algorithm is to pick a random subset of numClusters data points as the initial means of the individual Gaussians, to use the covariance of the whole dataset as the initial covariance matrices, and to assign equal weights summing to one as the initial weight of each Gaussian. This random method is the default when running the vl_gmm function. However, the user can also specify the Custom initialization method.
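For illustration only, a hypothetical re-implementation of that default random strategy (not the library's internal code) could look like this:

    % pick numClusters random data points as initial means
    perm      = randperm(N) ;
    initMeans = data(:, perm(1:numClusters)) ;

    % diagonal of the whole dataset's covariance for every Gaussian
    initSigmas = repmat(diag(cov(data')), 1, numClusters) ;

    % equal weights summing to one
    initWeights = ones(1, numClusters) / numClusters ;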
The Custom initialization method is used when a user wants to specify their own initialization of the algorithm. When the 'Initialization' option is set to 'Custom', the options 'InitMeans', 'InitSigmas', and 'InitWeights' also have to be set. This initialization approach is frequently combined with the KMeans algorithm: KMeans is run first to obtain initial means, covariances, and weights of the Gaussians, and the EM algorithm then takes over. We show this workflow in the following piece of code:
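A minimal sketch of this workflow, reusing data, N, dimension, and numClusters from the first snippet; the vl_kmeans parameters and the per-cluster estimates below are illustrative choices, not prescribed by the library:

    % run KMeans first to obtain an initial estimate of the mixture
    [initMeans, assignments] = vl_kmeans(data, numClusters, ...
        'MaxNumIterations', 5) ;

    initSigmas  = zeros(dimension, numClusters) ;
    initWeights = zeros(1, numClusters) ;

    % derive a diagonal covariance and a weight for each cluster
    for i = 1:numClusters
      data_k = data(:, assignments == i) ;
      initWeights(i) = size(data_k, 2) / N ;
      if size(data_k, 2) < 2
        % degenerate cluster: fall back to the dataset covariance
        initSigmas(:,i) = diag(cov(data')) ;
      else
        initSigmas(:,i) = diag(cov(data_k')) ;
      end
    end

    % run EM starting from the KMeans-based initialization
    [means, sigmas, weights] = vl_gmm(data, numClusters, ...
        'Initialization', 'Custom', ...
        'InitMeans',   initMeans, ...
        'InitSigmas',  initSigmas, ...
        'InitWeights', initWeights) ;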
The demo scripts vl_demo_gmm_2d and vl_demo_gmm_3d also produce cute colorized figures illustrating these estimates.