Data is loaded from a text file. To begin, click the "Browse" button and choose the file you want to process in the dialog window that appears. Select the delimiter type. Once a file and a delimiter are selected, click the "Process Data" button to load and process the training data.
Each line in the file is a training example with one or more features followed by its classification category. The values are separated by a delimiter, e.g. tabs, spaces, or commas.
Example 1
0,1,1
In the above example, the line contains 3 values separated by a comma delimiter. This means that the training example has 2 features (0, 1) and its classification category is 1.
Example 2
8.55 26.3 2
Example 2 also has 2 features (8.55, 26.3) and a classification category of 2, this time separated by spaces.
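The line format described above can be sketched in a few lines of Python. This is only an illustration of the format, not the program's own loader; the function name parse_example is hypothetical, and it assumes that the category is always an integer in the last position.

```python
def parse_example(line, delimiter=None):
    """Split one line into (features, category).

    delimiter=None splits on any run of whitespace, which covers
    both space- and tab-separated files.
    """
    fields = [f for f in line.strip().split(delimiter) if f != ""]
    if len(fields) < 2:
        raise ValueError("expected at least one feature and a category")
    # Every value except the last is a feature.
    features = [float(f) for f in fields[:-1]]
    # The last value is the classification category (assumed integer).
    category = int(float(fields[-1]))
    return features, category


example_1 = parse_example("0,1,1", ",")        # Example 1, comma-delimited
example_2 = parse_example("8.55 26.3 2")       # Example 2, space-delimited
```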
If the text file contains several examples, the program estimates the number of categories from them. If the text file was loaded and processed successfully, the status indicator will look similar to the one below:
A failure in the loading and processing is also similarly indicated:
Most loading and processing errors are due to a mismatch between the chosen delimiter and the one actually used in the file.
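A delimiter mismatch is easy to spot before loading: with the wrong delimiter, a line does not split into separate values. The check below is a minimal sketch under that assumption; the function name delimiter_matches is hypothetical and the program's own validation may differ.

```python
def delimiter_matches(lines, delimiter):
    """Return True if every non-empty line splits into the same
    number of fields, and that number is at least 2
    (one feature plus the category)."""
    counts = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        counts.add(len(line.split(delimiter)))
    return len(counts) == 1 and min(counts) >= 2


sample = ["0,1,1", "1,0,0"]
comma_ok = delimiter_matches(sample, ",")    # delimiter used by the file
tab_ok = delimiter_matches(sample, "\t")     # wrong delimiter chosen
```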
Support vector machines are binary classifiers. To perform multi-class/category classification, multiple models are needed, i.e. one model for each class/category. Input data is mapped into a new space via "kernels" to enhance the separability between different classes. SVMs are usually slower to train than other machine learning methods, and mapping input data into a new space incurs a further performance penalty. It is therefore recommended that multiclass SVMs be used only for small data sets.
Kernels require one or two parameters, e.g. σ, slope, or intercept. A model cannot be added if another model already has the same Category number, or if its Category value lies outside those found in the loaded data.
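As an illustration of a kernel with a single parameter, the sketch below implements the commonly used Gaussian (RBF) kernel with width σ. The exact kernel formulas used by the program are not given here, so the form exp(-||x - y||² / (2σ²)) is an assumption, as is the function name gaussian_kernel.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: similarity of two feature vectors.

    Returns 1.0 for identical points and decays towards 0 as the
    points move apart; sigma controls how fast it decays.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


same = gaussian_kernel([0.0, 1.0], [0.0, 1.0])
apart = gaussian_kernel([0.0, 0.0], [1.0, 0.0], sigma=1.0)
```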
Regularization, Tolerance, Passes, Category
The regularization parameter C controls the ability of the trained model to generalize to unseen data, i.e. the trade-off between low error on the training set and low error on the test set.
The Tolerance parameter is used for determining equality between floating point values, i.e. if the difference between two floating point values is less than the tolerance value, the two values are considered "equal".
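The tolerance test amounts to a single comparison. A minimal sketch follows; the name approx_equal and the default value are illustrative only.

```python
def approx_equal(a, b, tolerance=1e-3):
    """Treat two floats as equal if they differ by less than tolerance."""
    return abs(a - b) < tolerance


rounding_case = approx_equal(0.1 + 0.2, 0.3, tolerance=1e-9)
distinct_case = approx_equal(1.0, 1.01, tolerance=1e-3)
```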
The training process iterates over the training examples up to the limit set by the Passes parameter. A pass counts as complete when the model's α parameter values do not change, or change by less than the Tolerance value; otherwise the pass counter is reset. Training terminates once the counter reaches the maximum number of passes.
The Category parameter controls what the model should consider as a positive example, e.g. Category = 1 means that only examples classified as 1 are treated as positive examples. For data sets comprising several categories, a separate model must be trained for each unique category.
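The pass counter can be sketched as the loop below. It shows only the stopping rule: the actual α update is left to a caller-supplied function, here the hypothetical update_alphas, and the toy update in the usage lines is not an SVM optimizer.

```python
def run_passes(update_alphas, alphas, max_passes, tolerance):
    """Repeat the update until the alphas stay unchanged (within
    tolerance) for max_passes consecutive iterations."""
    passes = 0
    while passes < max_passes:
        new_alphas = update_alphas(alphas)
        changed = any(
            abs(new - old) >= tolerance
            for new, old in zip(new_alphas, alphas)
        )
        alphas = new_alphas
        # A changed alpha resets the counter; otherwise one pass completes.
        passes = 0 if changed else passes + 1
    return alphas


# Toy update: move each value halfway towards 1.0.
toy_update = lambda a: [x + (1.0 - x) * 0.5 for x in a]
final_alphas = run_passes(toy_update, [0.0], max_passes=3, tolerance=1e-3)
```

Resetting the counter means Passes is the number of consecutive quiet iterations required, not a cap on the total number of iterations.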
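In one-versus-rest terms, each model sees the chosen category as positive and everything else as negative. The relabeling below is a sketch that assumes the usual +1/-1 label convention for SVMs; the function name to_binary_labels is hypothetical.

```python
def to_binary_labels(categories, positive_category):
    """Map category labels to +1 (positive_category) or -1 (all others)."""
    return [1 if c == positive_category else -1 for c in categories]


categories = [1, 2, 3, 1]
# One label vector per unique category, i.e. one per model to train.
labels_per_model = {
    c: to_binary_labels(categories, c) for c in sorted(set(categories))
}
```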
Test data is loaded from a text file in the same way. Click the "Browse" button and choose the file you want to process in the dialog window that appears, then click the "Process Data" button to load the test data. Similar success or failure indicators are shown based on the result of the loading process. Set the classification threshold and click the "Classify" button to classify the loaded test data. The threshold value is the minimum prediction score required to classify a sample into a specific category. You can classify using only one model, or with all models by choosing "All models" from the model selector. Classifying with "All models" exposes each test data point to every model and chooses the best classification result.
Trained model parameters are viewable on this page. You can also save these trained model parameters to, or load them from, a JSON file. This JSON file is fully compatible with the Support Vector Machine Classifier software and can be used interchangeably with that program.
Plot the classified test data or save the plot as a scalable vector graphics (SVG) file. Displaying contour lines (decision boundaries) is optional.
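The "All models" selection can be pictured as follows: each model produces a prediction score for the sample, and the category with the highest score wins if it reaches the threshold. This is a sketch under assumptions: the function name classify_all is hypothetical, and returning None when no score reaches the threshold is a choice made here, not documented behavior of the program.

```python
def classify_all(scores_by_category, threshold):
    """Pick the category whose model gives the highest score.

    scores_by_category maps each category to the prediction score
    of the model trained for it. Returns None (assumed behavior)
    if even the best score is below the threshold.
    """
    best_category, best_score = max(
        scores_by_category.items(), key=lambda item: item[1]
    )
    if best_score < threshold:
        return None
    return best_category


confident = classify_all({1: 0.2, 2: 0.9, 3: 0.4}, threshold=0.5)
uncertain = classify_all({1: 0.2, 2: 0.3}, threshold=0.5)
```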