## Training of a random model.

Semantic Image Segmentation with Conditional Random Fields
Creator
Posts: 157
Joined: Tue Dec 16, 2008, 20:52
Location: Hannover, Germany
Contact:

### Training of a random model.

Suppose we have 3 classes into which we want to classify the input data. We derive 2 features from the data, so the feature vector has a length of two. For simplicity, each feature value is stored as an 8-bit value, so it lies within the integer interval from 0 to 255.

Given groundtruth data with known classes, we perform training, i.e. we estimate the probability density functions (PDFs) of the feature value distribution for each class. Since we have 2 features, the PDF of each class can be represented in 2-dimensional space. (In the general case, the PDF is an n-dimensional function, where n = nFeatures, i.e. the number of features.)
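As a minimal sketch of this training step for the two-feature case, the following C++ code accumulates one 256 x 256 histogram per class and normalizes it on lookup. The names `Hist2D`, `Sample` and `train` are hypothetical and chosen only for illustration; they are not taken from any library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: one 256 x 256 histogram for a single class (state).
struct Hist2D {
    std::vector<std::uint32_t> data = std::vector<std::uint32_t>(256 * 256, 0);
    std::uint32_t n = 0;  // number of training samples seen for this class

    // Training: count one occurrence of the feature pair (f0, f1).
    void addSample(std::uint8_t f0, std::uint8_t f1) {
        ++data[f0 * 256 + f1];
        ++n;
    }

    // Estimated probability of the feature pair (f0, f1) for this class.
    double pdf(std::uint8_t f0, std::uint8_t f1) const {
        return n == 0 ? 0.0 : static_cast<double>(data[f0 * 256 + f1]) / n;
    }
};

// Groundtruth sample: two 8-bit features and the known class index.
struct Sample {
    std::uint8_t f0, f1;
    int state;
};

// Builds one 2D histogram per class from the groundtruth samples.
std::vector<Hist2D> train(const std::vector<Sample>& samples, int nStates) {
    std::vector<Hist2D> pdfs(nStates);
    for (const Sample& s : samples) {
        assert(s.state >= 0 && s.state < nStates);
        pdfs[s.state].addSample(s.f0, s.f1);
    }
    return pdfs;
}
```

Each class then occupies 256 * 256 cells, which is exactly the amount of data visualized per color channel in a 256 x 256 image.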

The general case for large nFeatures is almost intractable numerically: to store the PDFs for all nStates classes in an nFeatures-dimensional space, quantized with 8 bits per feature, we need nStates * 256^nFeatures data cells. For our setup this gives 3 * 256^2 = 196,608 cells, but with 10 features it would already be 3 * 256^10, i.e. about 3.6 * 10^24 cells. Therefore, a number of more sophisticated approximations are used (e.g. the Gaussian mixture model).

The distributions for the 3 classes are shown in Fig. 1 (red channel: class 0, green: class 1, blue: class 2).

Fig. 1 (256 x 256 image representing the 3 PDFs of the 3 classes in the 2D feature space)
Last edited by Creator on Tue May 09, 2017, 08:40, edited 4 times in total.


### Bayes Model

The Bayes model approximates the PDFs by decomposing the n-dimensional space into n one-dimensional signals. For this purpose we build 1-dimensional PDFs for each feature and each state (class), neglecting all dependencies between the features (the naive Bayes assumption). These 1-dimensional PDFs are the histograms H[feature][state] of the occurrences of the values of the feature feature for the class state. The normalized histograms of length 256 are shown in Fig. 2.

Fig. 2 (6 histograms of length 256 representing the 1D PDFs of the 3 classes in the 2D feature space)

This approach allows us to shrink nStates * 256^nFeatures data cells to nStates * 256 * nFeatures data cells. As expected, all information about the correlation between the features is lost.
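The training of these per-feature histograms can be sketched as follows. This is only an illustration under assumed names: `Hist1D`, `Histograms`, `makeHistograms`, `addFeatureVec` and `cellCount` are hypothetical, and the layout merely mirrors the H[feature][state] notation used in this post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1D histogram over the 256 possible values of one feature.
struct Hist1D {
    std::array<std::uint32_t, 256> data{};  // occurrence counts, zero-initialized
    std::uint32_t n = 0;                     // total number of samples added
};

// H[feature][state], mirroring the notation of the text.
using Histograms = std::vector<std::vector<Hist1D>>;

Histograms makeHistograms(int nFeatures, int nStates) {
    return Histograms(nFeatures, std::vector<Hist1D>(nStates));
}

// Training step: add one groundtruth feature vector of a known class.
void addFeatureVec(Histograms& H, const std::vector<std::uint8_t>& featureVector, int state) {
    assert(featureVector.size() == H.size());
    for (std::size_t feature = 0; feature < H.size(); ++feature) {
        Hist1D& h = H[feature][state];
        ++h.data[featureVector[feature]];
        ++h.n;
    }
}

// Number of stored cells: nStates * 256 * nFeatures.
std::size_t cellCount(const Histograms& H) {
    return H.empty() ? 0 : H.size() * H[0].size() * 256;
}
```

Because every feature updates only its own histogram, any information about which feature values occurred together in one sample is discarded at this point.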
Last edited by Creator on Tue May 09, 2017, 08:49, edited 4 times in total.


### Bayes Model

To reconstruct the n-dimensional PDF from the n one-dimensional PDFs, we use their multiplicative superposition, i.e. the product of the normalized histogram values:

PDF[featureVector][state] = MUL_{feature \in featureVector} (H[feature][state]);

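```cpp
for (int state = 0; state < nStates; state++) {
    PDF[state] = 1;
    for (int feature = 0; feature < nFeatures; feature++) {
        byte featureValue = featureVector[feature];
        if (H[feature][state].n != 0)
            // Normalized histogram value; the cast guards against integer
            // division in case data[] and n are integer counters
            PDF[state] *= static_cast<double>(H[feature][state].data[featureValue]) / H[feature][state].n;
        else
            PDF[state] = 0;   // no training samples for this feature and state
    }
}
```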

The restored normalized PDFs are shown in Fig. 3.

Fig. 3 (256 x 256 image representing the 3 PDFs restored via multiplicative superposition for the 3 classes in the 2D feature space)
Last edited by Creator on Tue May 09, 2017, 08:49, edited 4 times in total.


### Gaussian Model

Using the Bayes model for training gives high performance. However, we lose all inter-feature dependencies, i.e. each feature influences the resulting potential independently of all other features. It is also possible to use an approximation that is free of this drawback, e.g. approximating the original PDFs with Gaussian functions. In this case, the inter-feature dependencies are encoded in the covariance matrix, one of the two parameters of a multi-dimensional Gaussian kernel:

PDF[featureVector][state] = G[state](featureVector), where G[state](x) is an nFeatures-dimensional Gaussian function.
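For the two-feature case, evaluating such a Gaussian can be sketched as below. The struct `Gaussian2D` is a hypothetical illustration, and it assumes that the mean vector and the covariance matrix of each class have already been estimated from the training samples.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical 2-feature Gaussian: mean vector and 2 x 2 covariance matrix.
struct Gaussian2D {
    std::array<double, 2> mu;
    std::array<std::array<double, 2>, 2> sigma;  // symmetric, positive definite

    // Value of the Gaussian density at the feature vector x.
    double pdf(const std::array<double, 2>& x) const {
        const double pi = 3.14159265358979323846;
        const double det = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0];
        assert(det > 0.0);
        const double d0 = x[0] - mu[0];
        const double d1 = x[1] - mu[1];
        // Squared Mahalanobis distance d^T * sigma^{-1} * d,
        // using the closed-form inverse of a 2 x 2 matrix.
        const double m = (sigma[1][1] * d0 * d0
                          - (sigma[0][1] + sigma[1][0]) * d0 * d1
                          + sigma[0][0] * d1 * d1) / det;
        return std::exp(-0.5 * m) / (2.0 * pi * std::sqrt(det));
    }
};
```

The off-diagonal elements of `sigma` are what carry the inter-feature dependencies; with both set to zero the density factorizes into a product of two 1D Gaussians.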

The restored normalized PDFs for the Gaussian model are shown in Fig. 4.

Fig. 4 (256 x 256 image representing the 3 PDFs restored via the Gaussian model for the 3 classes in the 2D feature space)

This approach allows us to shrink nStates * 256^nFeatures data cells to nStates * (nFeatures^2 + nFeatures) data cells: nFeatures^2 for the covariance matrix and nFeatures for the mean vector of each class. As we can see, this is quadratic in nFeatures.
Last edited by Creator on Tue May 09, 2017, 08:51, edited 3 times in total.


### Gaussian Mixture Model

Although the Gaussian model can encode inter-feature dependencies, it may produce even worse results than the Bayes model. Approximating the complex shape of a real distribution with a single Gaussian function is sometimes almost impossible. For that reason we can extend this model by replacing the single Gaussian with an additive superposition of several Gaussian functions:

PDF[featureVector][state] = SUM_{g \in nGaussians[state]} (k[g] * G[state][g](featureVector)), where nGaussians[state] is the number of Gaussian functions used to approximate the PDF of the state state, and k[g] is a weight coefficient, with SUM_{g \in nGaussians[state]} (k[g]) = 1.
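The weighted sum above can be sketched for two features as follows. The names `Component`, `gaussian` and `mixturePdf` are hypothetical, and the sketch only evaluates a given mixture; it does not show how the weights, means and covariances are trained.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical weighted 2D Gaussian component: weight k, mean (mx, my)
// and covariance matrix [[sxx, sxy], [sxy, syy]].
struct Component {
    double k;
    double mx, my;
    double sxx, sxy, syy;
};

// Density of one 2D Gaussian component (without its weight) at the point (x, y).
double gaussian(const Component& c, double x, double y) {
    const double pi = 3.14159265358979323846;
    const double det = c.sxx * c.syy - c.sxy * c.sxy;
    assert(det > 0.0);
    const double dx = x - c.mx;
    const double dy = y - c.my;
    // Squared Mahalanobis distance via the closed-form 2 x 2 inverse.
    const double m = (c.syy * dx * dx - 2.0 * c.sxy * dx * dy + c.sxx * dy * dy) / det;
    return std::exp(-0.5 * m) / (2.0 * pi * std::sqrt(det));
}

// PDF of one state: SUM_g k[g] * G[g](x, y); the weights are expected to sum to 1.
double mixturePdf(const std::vector<Component>& mixture, double x, double y) {
    double res = 0.0;
    for (const Component& c : mixture) res += c.k * gaussian(c, x, y);
    return res;
}
```

A mixture of two well-separated components, for example, yields a PDF with two peaks, which a single Gaussian cannot represent.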

Fig. 5 (256 x 256 image representing the 3 PDFs restored via the Gaussian mixture model for the 3 classes in the 2D feature space)

This approach allows us to shrink nStates * 256^nFeatures data cells to nStates * nGaussians * (nFeatures^2 + nFeatures) data cells. This is also quadratic in nFeatures.
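To compare the storage requirements of the models discussed so far, the cell-count formulas from this thread can be written down directly. The function names below are hypothetical; the formulas themselves are the ones given in the posts above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Full PDFs: nStates * 256^nFeatures. Returned as double, because the
// value no longer fits into 64-bit integers for nFeatures >= 8.
double cellsFull(int nStates, int nFeatures) {
    assert(nStates > 0 && nFeatures > 0);
    return nStates * std::pow(256.0, nFeatures);
}

// Bayes model: nStates * 256 * nFeatures.
std::uint64_t cellsBayes(std::uint64_t nStates, std::uint64_t nFeatures) {
    return nStates * 256 * nFeatures;
}

// Gaussian model: nStates * (nFeatures^2 + nFeatures).
std::uint64_t cellsGaussian(std::uint64_t nStates, std::uint64_t nFeatures) {
    return nStates * (nFeatures * nFeatures + nFeatures);
}

// Gaussian mixture model: nStates * nGaussians * (nFeatures^2 + nFeatures).
std::uint64_t cellsGMM(std::uint64_t nStates, std::uint64_t nGaussians, std::uint64_t nFeatures) {
    return nStates * nGaussians * (nFeatures * nFeatures + nFeatures);
}
```

For the test setup with 3 classes and 2 features this gives 196,608 cells for the full PDFs, 1,536 for the Bayes model, 18 for the Gaussian model and, assuming 16 Gaussians per class, 288 for the Gaussian mixture model.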
Last edited by Creator on Wed May 10, 2017, 07:59, edited 1 time in total.


### OpenCV Gaussian Mixture Model

As an alternative to the approach above, it is also possible to use the OpenCV implementation of the GMM. It is based on the Expectation Maximization (EM) method and produces the results shown in Fig. 6 (each class is approximated with 16 Gaussians, default parameters).
Compared to our sequential GMM approach, the OpenCV implementation has the following drawbacks:
• OpenCV GMM is about 10 times slower
• All the training samples must be kept in memory during training, which makes it impossible to train the model on a large training dataset when the available RAM is limited
• Relatively poor accuracy

PDF[featureVector][state] = CvEMpredictor_state(featureVector).

Fig. 6 (256 x 256 image representing the 3 PDFs restored via the OpenCV Gaussian mixture model for the 3 classes in the 2D feature space)
Last edited by Creator on Wed May 10, 2017, 08:48, edited 1 time in total.


### OpenCV Random Forest Model

One more example of an OpenCV training approach is Random Forest (RF). Its results for our test setup are shown in Fig. 7. It has the same drawback as the OpenCV GMM approach: all the training samples must be kept in memory during training. It is also very slow, but it has been shown to produce good classification results, even though Fig. 7 differs very much from Fig. 1 (because of the discriminative nature of the random forest approach).

PDF[featureVector][state] = CvRTrees_predictor_state(featureVector).

Fig. 7 (256 x 256 image representing the 3 PDFs restored via the OpenCV random forest model for the 3 classes in the 2D feature space)
Last edited by Creator on Wed May 10, 2017, 08:53, edited 1 time in total.


### Microsoft Sherwood Random Forest Model

Another Random Forest implementation is taken from the Microsoft Sherwood library. Its results for our test setup are shown in Fig. 8. These results do not differ much from those in Fig. 7; nevertheless, the classification accuracy may vary from dataset to dataset.

PDF[featureVector][state] = Forest(featureVector).

Fig. 8 (256 x 256 image representing the 3 PDFs restored via the Microsoft random forest model for the 3 classes in the 2D feature space)
Last edited by Creator on Fri Jun 09, 2017, 19:32, edited 2 times in total.


### K-Nearest Neighbors Model

A further discriminative classifier is based on the k-nearest neighbors algorithm (KNN): for a given input, the k closest training samples in the feature space are found, and the output depends on these k nearest neighbors. Its results for our test setup are shown in Fig. 9. Like the other discriminative methods (Random Forests, etc.), it yields high potentials for almost all samples, including those that are very distant from the training samples. In order to organize the training samples in a k-d tree data structure, the algorithm also needs to keep all of them in memory. However, it performs very well on low-dimensional feature spaces.

PDF[featureVector][state] = KNN(featureVector).
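The idea can be illustrated with a brute-force search, which is swapped in here for the k-d tree mentioned above to keep the example short. The function `knnPotentials` and the vote-share potential (the fraction of the k nearest neighbors belonging to each class) are assumptions made for this sketch, not necessarily the weighting used to produce Fig. 9.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Groundtruth sample: two 8-bit features and the known class index.
struct Sample {
    std::uint8_t f0, f1;
    int state;
};

// Returns, for each state, the fraction of the k nearest training samples
// that belong to it (brute-force search over all training samples).
std::vector<double> knnPotentials(const std::vector<Sample>& samples, int nStates,
                                  std::uint8_t f0, std::uint8_t f1, std::size_t k) {
    assert(!samples.empty() && k > 0);
    k = std::min(k, samples.size());

    // (squared Euclidean distance, state) for every training sample
    std::vector<std::pair<int, int>> dist;
    dist.reserve(samples.size());
    for (const Sample& s : samples) {
        const int d0 = static_cast<int>(s.f0) - static_cast<int>(f0);
        const int d1 = static_cast<int>(s.f1) - static_cast<int>(f1);
        dist.emplace_back(d0 * d0 + d1 * d1, s.state);
    }

    // Move the k closest samples to the front of the vector.
    std::partial_sort(dist.begin(), dist.begin() + static_cast<std::ptrdiff_t>(k), dist.end());

    std::vector<double> potentials(nStates, 0.0);
    for (std::size_t i = 0; i < k; ++i) potentials[dist[i].second] += 1.0 / k;
    return potentials;
}
```

Since the k nearest neighbors always exist, the potentials sum to 1 no matter how far the query lies from the training data, which matches the behavior described above for distant samples.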

Fig. 9 (256 x 256 image representing the 3 PDFs restored via the k-nearest neighbors model for the 3 classes in the 2D feature space)


### OpenCV k-Nearest Neighbors Model

As with the GMM, it is also possible to use the OpenCV implementation of the k-nearest neighbors algorithm. It produces the results shown in Fig. 10.

PDF[featureVector][state] = CvKNN(featureVector).

Fig. 10 (256 x 256 image representing the 3 PDFs restored via the OpenCV k-nearest neighbors model for the 3 classes in the 2D feature space)