This package is a set of Matlab functions for chain-structured conditional random fields (CRFs) with categorical features. The code implements decoding (with the Viterbi algorithm), inference (with the forwards-backwards algorithm), sampling (with the forwards-filter backwards-sample algorithm), and parameter estimation (with a limited-memory quasi-Newton algorithm) in these models. Several of the functions are also implemented in C as mex files to speed up the calculations.
The functions use a data matrix X and a label vector y. Each element i of y gives the label at position i, and takes a value from 0 to k (for some k). The label 0 is special: it marks the boundary between 'sentences'. Each element (i,j) of the data matrix X gives the value of feature j at position i, and takes a value from 0 to f (for some f that can be different for each feature). A feature value of 0 is also special: it indicates that the feature is ignored at this position (in natural language applications, this is used to represent words that don't occur very often). The features at positions with a label of 0 are ignored.
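As a rough illustration of this layout, the sketch below builds a toy X and y holding two sentences and recovers the sentence boundaries from the zeros in y. The variable names (isWord, starts, stops, sentences) and the data values are made up for this example and are not part of the package.

```matlab
% Toy data: 2 features per position, labels in 1..3, label 0 between sentences.
% Row 2 has feature 2 set to 0, so that feature is ignored at that position.
% Row 4 is the separator row; its features are ignored entirely.
X = [1 2; 3 0; 2 1; 0 0; 1 1; 2 2];
y = [1; 3; 2; 0; 2; 1];

% Recover the [first, last] row index of each sentence from the zeros in y.
isWord = (y ~= 0)';                      % row vector, 1 at word positions
starts = find(diff([0 isWord]) == 1);    % word preceded by a non-word
stops  = find(diff([isWord 0]) == -1);   % word followed by a non-word
sentences = [starts' stops'];            % one row per sentence

disp(sentences)
```

The same indexing works whether or not y ends with a trailing 0, since the padding zeros in the two diff calls stand in for the missing boundaries.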
The model uses four sets of parameters:
w_{i,j,k}: The potential of state j given that feature i takes the value k.
v_{start,j}: The potential of state j to start a sentence.
v_{end,j}: The potential of state j to end a sentence.
v_{i,j}: The potential for the transition from state i to state j.
The specific model for a single sentence of length s where each word has f features is:
$$
p(y_{1:s} \mid x_{1:s}, w, v_{start}, v_{end}, v) = \frac{1}{Z} \exp\left( \sum_{i=1}^{s} \sum_{j=1}^{f} w_{j,\,y_i,\,x_{i,j}} + v_{start,\,y_1} + v_{end,\,y_s} + \sum_{i=1}^{s-1} v_{y_i,\,y_{i+1}} \right)
$$
The value of Z is chosen so that the probabilities sum to one over all possible assignments of y_{1:s}. To add a bias for each label, you can append a column of ones to X.
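To make the role of Z concrete, here is a minimal sketch that evaluates the model on a tiny sentence by enumerating every labeling, then recomputes Z with a forward recursion. It does not call any function from the package: the array layout w(feature, state, value), the names vStart, vEnd, nodePot and alpha, and all parameter values are assumptions chosen for illustration. Feature values of 0 are skipped, following the data format described earlier, and the recursion is left unscaled, so it is only suitable for short sentences.

```matlab
% Toy sizes: k = 2 labels, f = 2 features, s = 3 positions.
k = 2; f = 2; maxVal = 2;
x = [1 2; 2 0; 1 1];                 % s-by-f feature values, 0 = ignored
s = size(x, 1);

% Arbitrary illustrative parameters (not trained values).
w = reshape((1:f*k*maxVal) / 10, [f k maxVal]);   % w(feature, state, value)
vStart = [0.5 -0.5];                 % vStart(state)
vEnd   = [0.1  0.2];                 % vEnd(state)
v      = [0.3 -0.2; 0.0 0.4];        % v(from, to)

% Node scores: nodePot(i,c) = sum over features j of w(j, c, x(i,j)).
nodePot = zeros(s, k);
for i = 1:s
    for c = 1:k
        for j = 1:f
            if x(i, j) ~= 0          % a feature value of 0 contributes nothing
                nodePot(i, c) = nodePot(i, c) + w(j, c, x(i, j));
            end
        end
    end
end

% Brute force: unnormalized log-score of each of the k^s labelings.
nLabelings = k^s;
logScore = zeros(nLabelings, 1);
for n = 1:nLabelings
    yy = mod(floor((n-1) ./ k.^(0:s-1)), k) + 1;  % n-th labeling, entries in 1..k
    sc = vStart(yy(1)) + vEnd(yy(s));
    for i = 1:s
        sc = sc + nodePot(i, yy(i));
        if i < s
            sc = sc + v(yy(i), yy(i+1));
        end
    end
    logScore(n) = sc;
end
Z = sum(exp(logScore));              % normalizing constant
p = exp(logScore) / Z;               % probability of each labeling

% Forward recursion: the same Z in O(s*k^2) time instead of O(k^s).
alpha = exp(vStart + nodePot(1, :));
for i = 2:s
    alpha = (alpha * exp(v)) .* exp(nodePot(i, :));
end
Zforward = sum(alpha .* exp(vEnd));
```

Both routes should give the same Z up to rounding error. The enumeration is only feasible for toy problems, which is why the forwards-backwards algorithm is used for inference in practice.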
Note that crfChain contains many sub-directories that must be present on the Matlab path for the files to work. You can add these sub-directories to the Matlab path by typing (in Matlab) 'addpath(genpath(crfChain_dir))', where 'crfChain_dir' is the directory that the zip file is extracted to.
The mex files for several architectures are included in the archive. For systems where the mex files are not included, you can compile the mex files by calling the 'mexAll' function. The default Matlab compiler does not work on Windows, but you can use MinGW and Gnumex instead.
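Put together, a setup session might look like the sketch below. The install location is a hypothetical placeholder, and the mexAll call is only needed when the archive has no mex files for your platform.

```matlab
% Hypothetical location; replace with the directory the zip file was extracted to.
crfChain_dir = fullfile(pwd, 'crfChain');

% Add the package and all of its sub-directories to the Matlab path.
addpath(genpath(crfChain_dir));

% Show the mex file extension this platform expects (for example 'mexa64'),
% which helps when checking whether matching binaries came with the archive.
disp(mexext)

% Compile the C sources if no matching mex files are present.
mexAll
```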