Say we have 3 different classes into which we want to classify the input data. We derive 2 features from the data, so the feature vector has a length of two values. For simplicity, we describe each feature value with an 8-bit value, so it lies within the integer interval from 0 to 255.
Having the ground-truth data with known classes, we perform training, i.e. we estimate the probability density functions (PDFs) of the feature value distributions for each class. Since we have 2 features, we can represent the PDF of each class in a 2-dimensional space. (In the general case, the PDF is an n-dimensional function, where n = nFeatures, i.e. the number of features.)
The general case with large nFeatures is almost intractable from the numerical point of view: in order to store the PDFs for all nStates classes within the nFeatures-dimensional space, quantized with 8 bits per dimension, we need nStates * 256^nFeatures data cells. Therefore, a number of more sophisticated approximations are used (e.g. the Gaussian mixture model).
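To make this growth concrete, here is a tiny worked check (an illustration only, not part of any training code) of how many cells a fully tabulated, 8-bit-quantized PDF would need for nStates = 3:
Code:
#include <cmath>
#include <cstdio>

int main() {
    const double nStates = 3;
    // nStates * 256^nFeatures cells for a fully tabulated, 8-bit-quantized PDF
    for (int nFeatures = 1; nFeatures <= 4; nFeatures++)
        std::printf("nFeatures = %d -> %.0f cells\n", nFeatures, nStates * std::pow(256.0, nFeatures));
    return 0;   // already 196608 cells for 2 features and about 1.3e10 for 4 features
}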
The distributions for the 3 classes are depicted in Fig. 1 (red channel - class 0, green - class 1, blue - class 2).
Fig. 1: Training of a random model.
Bayes Model
The Bayes model approximates the PDFs by decomposing the n-dimensional space into n one-dimensional signals. For this purpose we build a 1-dimensional PDF for each feature and for each state (class), neglecting all the dependencies between features. These 1-dimensional PDFs are histograms H[feature][state] of the occurrences of feature feature for the given class state. These normalized histograms of length 256 are presented in Fig. 2.
This approach allows us to shrink nStates * 256^nFeatures data values into nStates * 256 * nFeatures data cells. As expected, all the feature-correlation information is lost.
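A minimal training sketch, assuming a histogram layout compatible with the classification code in the next post (the struct name Hist and the function addTrainingSample are illustrative, not DGM's actual API):
Code:
#include <cstdint>

constexpr int nFeatures = 2;
constexpr int nStates   = 3;

// One 256-bin histogram per (feature, state) pair: bin counts plus the total number of samples
struct Hist { unsigned data[256] = {}; unsigned n = 0; };
Hist H[nFeatures][nStates];

// Accumulate one labelled training sample into the per-feature histograms of its class
void addTrainingSample(const std::uint8_t featureVector[nFeatures], int state) {
    for (int feature = 0; feature < nFeatures; feature++) {
        H[feature][state].data[featureVector[feature]]++;   // count this feature value
        H[feature][state].n++;                               // total samples in this histogram
    }
}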
Bayes Model
In order to reconstruct the orthogonal n-dimensional PDF function from the n one-dimensional PDFs, we take their product:
PDF[featureVector][state] = MUL_{feature \in featureVector} (H[feature][state]);
Code:
for (int state = 0; state < nStates; state++) {
    PDF[state] = 1.0;                                        // neutral element of the product
    for (int feature = 0; feature < nFeatures; feature++) {
        byte featureValue = featureVector[feature];
        if (H[feature][state].n != 0)                        // the histogram contains training samples
            PDF[state] *= static_cast<double>(H[feature][state].data[featureValue]) / H[feature][state].n;
        else                                                 // no samples for this state: the potential is 0
            PDF[state] = 0;
    }
}
The restored normalized PDFs are depicted in Fig. 3.
Gaussian Model
Using the Bayes model for training, we gain high performance. Nevertheless, we lose all the inter-feature dependencies, i.e. each feature influences the resulting potential independently of all the other features. It is also possible to use an approximation which is free from that drawback, e.g. an approximation of the original PDFs with Gaussian functions. In this case, the inter-feature dependencies are encoded in the covariance matrix, one of the two parameters of a multi-dimensional Gaussian kernel:
PDF[featureVector][state] = G[state](featureVector), where G[state](x) is an nFeatures-dimensional Gaussian function.
The restored normalized PDFs for the Gaussian model are depicted in Fig. 4.
This approach allows us to shrink nStates * 256^nFeatures data values into nStates * (nFeatures^2 + nFeatures) data cells (covariance matrix plus mean vector per class), i.e. the storage is quadratic in nFeatures.
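For our 2-feature setup, evaluating G[state](featureVector) reduces to the standard multivariate Gaussian formula. A minimal sketch, assuming the mean vector mu and covariance matrix Sigma have already been estimated from the training samples (an illustration, not DGM's internal code):
Code:
#include <cmath>

// Evaluate a 2-dimensional Gaussian PDF with mean mu and covariance Sigma at point x
double gaussianPDF2D(const double x[2], const double mu[2], const double Sigma[2][2]) {
    const double PI  = 3.14159265358979323846;
    const double det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0];
    const double inv[2][2] = { {  Sigma[1][1] / det, -Sigma[0][1] / det },      // inverse of the 2x2 covariance
                               { -Sigma[1][0] / det,  Sigma[0][0] / det } };
    const double d[2] = { x[0] - mu[0], x[1] - mu[1] };
    const double quad = d[0] * (inv[0][0] * d[0] + inv[0][1] * d[1])
                      + d[1] * (inv[1][0] * d[0] + inv[1][1] * d[1]);           // (x - mu)^T * Sigma^-1 * (x - mu)
    return std::exp(-0.5 * quad) / (2.0 * PI * std::sqrt(det));
}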
Gaussian Mixture Model
Although the Gaussian model can encode the inter-feature dependencies, it may produce even worse results than the Bayes model: approximating the complex shape of a real distribution with a single Gaussian function is sometimes almost impossible. For that reason we can extend this model by substituting the single Gaussian with an additive superposition of several Gaussian functions:
PDF[featureVector][state] = SUM_{g \in nGaussians[state]} (k[g] * G[state][g](featureVector)), where nGaussians[state] is the number of Gaussian functions used to approximate the PDF of the state state, and k[g] is a weight coefficient, with SUM_{g \in nGaussians[state]} (k[g]) = 1.
This approach allows us to shrink nStates * 256^nFeatures data values into nStates * nGaussians * (nFeatures^2 + nFeatures) data cells. It is also quadratic in nFeatures.
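A minimal sketch of the mixture evaluation for one state, reusing the gaussianPDF2D helper sketched in the previous post; the per-Gaussian weights k, means mu and covariances Sigma are assumed to come from training:
Code:
// Additive superposition of nGaussians weighted Gaussians for a single state
double mixturePDF2D(const double x[2], int nGaussians,
                    const double k[], const double mu[][2], const double Sigma[][2][2]) {
    double pdf = 0.0;
    for (int g = 0; g < nGaussians; g++)
        pdf += k[g] * gaussianPDF2D(x, mu[g], Sigma[g]);   // weight times Gaussian value
    return pdf;   // since the weights k[g] sum to 1, the result is again a normalized PDF
}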
OpenCV Gaussian Mixture Model
As above, it is also possible to make use of the OpenCV implementation of the GMM. It is based on the Expectation Maximization (EM) method and produces the results depicted in Fig. 6 (each class is approximated with 16 Gaussians, default parameters).
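A minimal usage sketch of the OpenCV EM classifier (cv::ml::EM, OpenCV 3.x naming) trained on the samples of one class; the helper names and the wiring are assumptions for illustration, not the exact code behind Fig. 6:
Code:
#include <cmath>
#include <opencv2/ml.hpp>
using namespace cv;

// samples: CV_64F matrix [nSamples x nFeatures] with the training vectors of one class
Ptr<ml::EM> trainClassGMM(const Mat &samples) {
    Ptr<ml::EM> em = ml::EM::create();
    em->setClustersNumber(16);                             // 16 Gaussians per class, as in Fig. 6
    em->setCovarianceMatrixType(ml::EM::COV_MAT_GENERIC);
    em->trainEM(samples);                                  // all training samples must reside in memory
    return em;
}

// The per-state potential is the likelihood of the feature vector under that class' mixture
double classLikelihood(const Ptr<ml::EM> &em, const Mat &featureVector) {
    Vec2d res = em->predict2(featureVector, noArray());    // res[0] holds the log-likelihood
    return std::exp(res[0]);
}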
In comparison to our sequential GMM approach, the OpenCV implementation has the following drawbacks:
- OpenCV GMM is about 10 times slower
- All the training samples must be kept in memory for training, so the model cannot be trained on a large training dataset when the PC's RAM is limited
- Relatively poor accuracy
OpenCV Random Forest Model
One more example of an OpenCV training approach is the Random Forest (RF). Its results for our test setup are depicted in Fig. 7. It has the same drawback as the OpenCV GMM approach: all the training samples must be kept in memory for training. It is also very slow, but it is shown to produce good classification results even though Fig. 7 differs from Fig. 1 very much (because of the discriminative nature of the random forest approach).
PDF[featureVector][state] = CvRTrees_predictor_state(featureVector).
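A minimal usage sketch of the OpenCV random forest (the legacy CvRTrees interface named in the formula above); the matrix layout and the default parameters are assumptions for illustration:
Code:
#include <opencv2/ml/ml.hpp>

// trainData: CV_32F matrix [nSamples x nFeatures], responses: CV_32F column of class labels
float classifyWithRF(const cv::Mat &trainData, const cv::Mat &responses, const cv::Mat &featureVector) {
    CvRTrees rf;
    rf.train(trainData, CV_ROW_SAMPLE, responses);   // default CvRTParams; all samples kept in memory
    return rf.predict(featureVector);                // hard class decision for the query vector
}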
Microsoft Sherwood Random Forest Model
Another Random Forest implementation is taken from the Microsoft Sherwood library. Its results for our test setup are depicted in Fig. 8. These results do not differ much from the results depicted in Fig. 7; nevertheless, the classification accuracy may differ from dataset to dataset.
PDF[featureVector][state] = Forest(featureVector).
K-Nearest Neighbors Model
A new discriminative classifier is based on the k-nearest neighbors algorithm (KNN), where the input consists of the k closest training samples in the feature space and the output depends on these k nearest neighbors. Its results for our test setup are depicted in Fig. 9. Like the other discriminative methods (Random Forests, etc.) it provides high potentials for almost all of the samples, including those that are very distant from the training samples. In order to organize the training samples in a k-d tree data structure, the algorithm also needs to keep all of them in memory. However, it has very good performance on low-dimensional feature spaces.
PDF[featureVector][state] = KNN(featureVector).
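A minimal brute-force sketch of the KNN potential (the classifier above organizes the samples in a k-d tree for speed; here a linear scan keeps the idea visible, and the names and containers are illustrative):
Code:
#include <algorithm>
#include <cstdint>
#include <vector>

struct Sample { std::uint8_t feature[2]; int state; };   // a stored training sample (2 features)

// Fraction of the k nearest training samples that belong to class 'state'
double knnPotential(const std::vector<Sample> &train, const std::uint8_t query[2], int k, int state) {
    std::vector<std::pair<int, int>> dist;               // (squared distance, class label)
    for (const Sample &s : train) {
        int dx = int(s.feature[0]) - query[0];
        int dy = int(s.feature[1]) - query[1];
        dist.push_back({dx * dx + dy * dy, s.state});
    }
    std::partial_sort(dist.begin(), dist.begin() + std::min<size_t>(k, dist.size()), dist.end());
    int hits = 0;
    for (int i = 0; i < k && i < (int)dist.size(); i++)
        if (dist[i].second == state) hits++;
    return double(hits) / k;                             // the per-state potential
}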
OpenCV k-Nearest Neighbors Model
As above, it is also possible to make use of the OpenCV implementation of the k-nearest neighbors classifier. Its results for our test setup are depicted in Fig. 10.
PDF[featureVector][state] = CvKNN(featureVector).
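A minimal usage sketch of the OpenCV k-nearest neighbors classifier (cv::ml::KNearest, OpenCV 3.x naming); the wiring is an assumption for illustration, not the exact code behind Fig. 10:
Code:
#include <opencv2/ml.hpp>
using namespace cv;

// trainData: CV_32F matrix [nSamples x nFeatures], responses: CV_32F column of class labels
float classifyWithKNN(const Mat &trainData, const Mat &responses, const Mat &featureVector, int k) {
    Ptr<ml::KNearest> knn = ml::KNearest::create();
    knn->train(trainData, ml::ROW_SAMPLE, responses);   // all training samples are stored internally
    Mat results;
    knn->findNearest(featureVector, k, results);        // majority vote among the k nearest samples
    return results.at<float>(0, 0);                     // predicted class label
}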
Return to “Direct Graphical Models”
Who is online
Users browsing this forum: No registered users and 1 guest