Suppose we have 3 different classes into which we want to classify input data. We derive 2 features from the data, so the feature vector has a length of two. For simplicity, we describe each feature with an 8-bit value, so it lies within the integer interval from 0 to 255.
Given ground-truth data with known classes, we perform training, i.e. we estimate the probability density functions (PDFs) of the feature value distributions for each class. Since we have 2 features, the PDF of each class can be represented in 2-dimensional space. (In the general case, the PDF is an n-dimensional function, where n = nFeatures, the number of features.)
The general case for large nFeatures is almost intractable numerically: in order to store the PDFs of all nStates classes in nFeatures-dimensional space, with each feature quantized to 8 bits, we need nStates * 256^nFeatures data cells. Therefore, a number of sophisticated approximations are used (e.g. the Gaussian mixture model).
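To get a feeling for how quickly the full PDF grows, the cell count can be computed directly. This is only an illustrative sketch; the function name fullPdfCells is an assumption and is not part of any library.

```cpp
#include <cassert>
#include <cmath>

// Number of cells needed to store the full nFeatures-dimensional PDFs of all
// classes when every feature is quantized to 256 levels (8 bit).
// The result is returned as double, because 256^nFeatures no longer fits
// into a 64-bit integer once nFeatures reaches 8.
// NOTE: fullPdfCells is a hypothetical helper written for this example.
double fullPdfCells(int nStates, int nFeatures) {
    assert(nStates > 0 && nFeatures > 0);
    return nStates * std::pow(256.0, nFeatures);
}
```

For the setup above (3 classes, 2 features) this gives 3 * 256^2 = 196608 cells, which is still manageable; with 4 features it is already 3 * 256^4 = 12884901888 cells.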
The distributions for the 3 classes are depicted in Fig. 1 (red channel: class 0, green: class 1, blue: class 2).
Training of a random model.
Last edited by Creator on Tue May 09, 2017, 08:40, edited 4 times in total.
Bayes Model
The Bayes model approximates the PDFs by decomposing the n-dimensional space into n one-dimensional signals. For this purpose we build 1-dimensional PDFs for each feature and each state (class), neglecting all dependencies between features. These 1-dimensional PDFs are histograms H[feature][state] of the occurrences of the values of feature feature for the given class state. The normalized histograms of length 256 are presented in Fig. 2.
This approach allows us to shrink nStates * 256^nFeatures data cells into nStates * 256 * nFeatures data cells. As expected, all information about the correlation between features is lost.
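The training step described above can be sketched as follows. The Histogram struct mirrors the fields data and n used in the code of this thread, but its exact layout, as well as the names HistogramTable, makeTable and addSample, are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: one 256-bin histogram per (feature, state) pair.
struct Histogram {
    std::array<std::uint32_t, 256> data{};  // occurrence count per feature value
    std::uint32_t n = 0;                    // total number of samples added
};

// Indexed as H[feature][state]
using HistogramTable = std::vector<std::vector<Histogram>>;

// Allocates nFeatures * nStates empty histograms (nStates * 256 * nFeatures cells).
HistogramTable makeTable(int nFeatures, int nStates) {
    assert(nFeatures > 0 && nStates > 0);
    return HistogramTable(nFeatures, std::vector<Histogram>(nStates));
}

// Adds one ground-truth sample: every feature value increments only its own
// 1-D histogram, so dependencies between features are not recorded.
void addSample(HistogramTable& H, const std::vector<std::uint8_t>& featureVector, int state) {
    assert(featureVector.size() == H.size());
    for (std::size_t feature = 0; feature < H.size(); ++feature) {
        Histogram& h = H[feature][state];
        h.data[featureVector[feature]]++;
        h.n++;
    }
}
```

Dividing data[value] by n yields the normalized histogram value for that bin.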
Last edited by Creator on Tue May 09, 2017, 08:49, edited 4 times in total.
Bayes Model
In order to reconstruct the orthogonal (separable) n-dimensional PDF from the n one-dimensional PDFs, we multiply them:
PDF[featureVector][state] = MUL_{feature \in featureVector} (H[feature][state]);
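Code:
// Restore the potential of every state as the product of the normalized 1-D histograms
for (int state = 0; state < nStates; state++) {
    PDF[state] = 1;
    for (int feature = 0; feature < nFeatures; feature++) {
        byte featureValue = featureVector[feature];
        if (H[feature][state].n != 0) {
            // the cast avoids integer division in case data[] and n are integer counters
            PDF[state] *= static_cast<double>(H[feature][state].data[featureValue]) / H[feature][state].n;
        } else {
            PDF[state] = 0;    // no training samples for this feature and state
            break;             // the product stays zero, no need to continue
        }
    }
}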
The restored normalized PDFs are depicted in Fig. 3.
Last edited by Creator on Tue May 09, 2017, 08:49, edited 4 times in total.
Gaussian Model
Using the Bayes model in training gives high performance. Nevertheless, we lose all inter-feature dependencies, i.e. each feature influences the resulting potential independently of all other features. It is also possible to use an approximation that is free from this drawback, e.g. approximating the original PDFs with Gaussian functions. In this case, the inter-feature dependencies are encoded in the covariance matrix, one of the two parameters of a multi-dimensional Gaussian kernel (the other one being the mean vector):
PDF[featureVector][state] = G[state](featureVector), where G[state](x) is an nFeatures-dimensional Gaussian function.
The restored normalized PDFs for the Gaussian model are depicted in Fig. 4.
This approach allows us to shrink nStates * 256^nFeatures data cells into nStates * (nFeatures^2 + nFeatures) data cells: nFeatures^2 for the covariance matrix and nFeatures for the mean vector. As we can see, this is quadratic in nFeatures.
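For the 2-feature setup of this thread, evaluating G[state](featureVector) can be sketched with the closed-form inverse of a 2x2 covariance matrix. The struct Gaussian2D and the function gaussianPdf are hypothetical names chosen for this example; a general nFeatures-dimensional version would need a proper matrix inversion instead.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical parameter set of one 2-D Gaussian: mean vector and covariance
// matrix (nFeatures + nFeatures^2 = 6 cells for nFeatures = 2).
struct Gaussian2D {
    double mu[2];
    double sigma[2][2];   // assumed symmetric: sigma[0][1] == sigma[1][0]
};

// Value of the 2-D Gaussian density at the point (x0, x1).
double gaussianPdf(const Gaussian2D& g, double x0, double x1) {
    const double pi = 3.14159265358979323846;
    const double a = g.sigma[0][0], b = g.sigma[0][1], d = g.sigma[1][1];
    const double det = a * d - b * b;
    assert(det > 0.0);    // the covariance matrix must be positive definite
    const double dx = x0 - g.mu[0];
    const double dy = x1 - g.mu[1];
    // Squared Mahalanobis distance, using inverse = 1/det * [[d, -b], [-b, a]]
    const double m = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det;
    return std::exp(-0.5 * m) / (2.0 * pi * std::sqrt(det));
}
```

The off-diagonal element sigma[0][1] is what carries the inter-feature dependency: with a positive value, points where both features deviate from the mean in the same direction receive a higher density than points where they deviate in opposite directions.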
Last edited by Creator on Tue May 09, 2017, 08:51, edited 3 times in total.
Gaussian Mixture Model
Although the Gaussian model can encode inter-feature dependencies, it may produce even worse results than the Bayes model. Approximating the complex shape of real distributions with a single Gaussian function is sometimes almost impossible. For that reason we can extend this model by replacing the single Gaussian with an additive superposition of several Gaussian functions:
PDF[featureVector][state] = SUM_{g \in nGaussians[state]} (k[g] * G[state][g](featureVector)), where nGaussians[state] is the number of Gaussian functions approximating the PDF of the state state, and k[g] is a weight coefficient, with SUM_{g \in nGaussians[state]} (k[g]) = 1.
This approach allows us to shrink nStates * 256^nFeatures data cells into nStates * nGaussians * (nFeatures^2 + nFeatures) data cells. It is also quadratic in nFeatures.
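The weighted sum above can be sketched for one state and 2 features as follows. Only the evaluation of an already trained mixture is shown, not the training itself; the names Component, componentPdf and mixturePdf are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical mixture component: weight k[g] plus the parameters of a 2-D Gaussian.
struct Component {
    double weight;        // k[g]; the weights of one mixture must sum to 1
    double mu[2];
    double sigma[2][2];   // assumed symmetric and positive definite
};

// Density of a single 2-D Gaussian component at (x0, x1), without its weight.
double componentPdf(const Component& c, double x0, double x1) {
    const double pi = 3.14159265358979323846;
    const double a = c.sigma[0][0], b = c.sigma[0][1], d = c.sigma[1][1];
    const double det = a * d - b * b;
    assert(det > 0.0);
    const double dx = x0 - c.mu[0];
    const double dy = x1 - c.mu[1];
    const double m = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det;
    return std::exp(-0.5 * m) / (2.0 * pi * std::sqrt(det));
}

// PDF of one state: SUM over g of k[g] * G[g](x).
double mixturePdf(const std::vector<Component>& mixture, double x0, double x1) {
    double weightSum = 0.0;
    double pdf = 0.0;
    for (const Component& c : mixture) {
        weightSum += c.weight;
        pdf += c.weight * componentPdf(c, x0, x1);
    }
    assert(std::fabs(weightSum - 1.0) < 1e-9);   // weights must be normalized
    return pdf;
}
```

A mixture of two well-separated components has two peaks, a shape that a single Gaussian cannot reproduce.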
Last edited by Creator on Wed May 10, 2017, 07:59, edited 1 time in total.
OpenCV Gaussian Mixture Model
As an alternative to the above, it is also possible to use the OpenCV implementation of the GMM. It is based on the Expectation Maximization (EM) method and produces the results depicted in Fig. 6 (each class is approximated with 16 Gaussians, default parameters).
In comparison to our sequential GMM approach, the OpenCV implementation has the following drawbacks:
- OpenCV GMM is about 10 times slower
- All the training samples must be kept in memory for training, which makes it impossible to train the model on a large training dataset when the available RAM is limited.
- Relatively poor accuracy
Last edited by Creator on Wed May 10, 2017, 08:48, edited 1 time in total.
OpenCV Random Forest Model
One more example of an OpenCV training approach is Random Forest (RF). Its results for our test setup are depicted in Fig. 7. It has the same drawback as the OpenCV GMM approach: all the training samples must be kept in memory for training. It is also very slow, but it has been shown to produce good classification results, even though Fig. 7 differs greatly from Fig. 1 (because of the discriminative nature of the random forest approach).
PDF[featureVector][state] = CvRTrees_predictor_state(featureVector).
Last edited by Creator on Wed May 10, 2017, 08:53, edited 1 time in total.
Microsoft Sherwood Random Forest Model
Another Random Forest implementation is taken from the Microsoft Sherwood library. Its results for our test setup are depicted in Fig. 8. These results do not differ much from those depicted in Fig. 7; nevertheless, the classification accuracy may vary from dataset to dataset.
PDF[featureVector][state] = Forest(featureVector).
Last edited by Creator on Fri Jun 09, 2017, 19:32, edited 2 times in total.
K-Nearest Neighbors Model
Another discriminative classifier is based on the k-nearest neighbors algorithm (KNN), where the input consists of the k closest training samples in the feature space and the output depends on these k nearest neighbors. Its results for our test setup are depicted in Fig. 9. Like the other discriminative methods (Random Forests, etc.), it provides high potentials for almost all samples, including those that are very distant from the training samples. In order to organize the training samples in a k-d tree data structure, the algorithm also needs to keep all of them in memory. However, it performs remarkably well on low-dimensional feature spaces.
PDF[featureVector][state] = KNN(featureVector).
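The idea can be illustrated with a minimal sketch. Two simplifications are assumptions of this example and not necessarily what the library does: the neighbors are found by brute-force search instead of a k-d tree, and the potential of a state is simply the fraction of the k nearest neighbors that belong to it. The names Sample and knnPotentials are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical training sample: two 8-bit features and the ground-truth state.
struct Sample {
    std::uint8_t x0, x1;
    int state;
};

// Returns one potential per state: the share of the k nearest training
// samples (squared Euclidean distance, brute-force search) with that state.
std::vector<double> knnPotentials(const std::vector<Sample>& train,
                                  std::uint8_t x0, std::uint8_t x1,
                                  std::size_t k, int nStates) {
    assert(k > 0 && k <= train.size());
    std::vector<std::pair<long, int>> dist;   // (squared distance, state)
    dist.reserve(train.size());
    for (const Sample& s : train) {
        const long dx = static_cast<long>(s.x0) - x0;
        const long dy = static_cast<long>(s.x1) - x1;
        dist.emplace_back(dx * dx + dy * dy, s.state);
    }
    // Only the k smallest distances need to be sorted
    std::partial_sort(dist.begin(), dist.begin() + static_cast<std::ptrdiff_t>(k), dist.end());
    std::vector<double> potentials(nStates, 0.0);
    for (std::size_t i = 0; i < k; ++i) potentials[dist[i].second] += 1.0;
    for (double& p : potentials) p /= static_cast<double>(k);
    return potentials;
}
```

Because only the relative order of the distances matters, a query point far away from all training samples still receives a high potential for the state of its nearest neighbors, which matches the behavior described above.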
OpenCV k-Nearest Neighbors Model
As above, it is also possible to use the OpenCV implementation of KNN. It produces the results depicted in Fig. 10.
PDF[featureVector][state] = CvKNN(featureVector).