This article describes the application of Evolutionary Algorithms to the task of Feature Selection. In particular, the algorithms studied here are Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA). The results depend on the particular parameters used in experimentation. Several parameters are analyzed for this problem on the following datasets:
- Leukemia dataset (LIBSVM Data: Classification (Binary Class), ntu.edu.tw)
- Colon Cancer dataset (LIBSVM Data: Classification (Binary Class), ntu.edu.tw)
- Educational data mining dataset: KDD2010 bridge-to-algebra (kddb) dataset
The purpose of this article is to present a comparative analysis of experiments and to show that the choice of kernel depends on the dataset (the kind of problem), the number of iterations performed, the parameters used, and, above all, the aim of the problem. Here the aim is feature selection: reducing the dataset to a smaller number of features while retaining accuracy. The fitness function for this problem is a weighted mean of accuracy and the number of features (a minimal sketch of one such function is given at the end of this introduction). In the experiments below, the weights are chosen so that the number of selected features remains at or below 20; hence a decrease in accuracy is noticed, given that the number of epochs is not changed.
Further, the results depend on many parameters; this article illustrates results from one experimental setup, and they are not standard results. They are not benchmarks: they are elaborated for lab experimental setups, to be followed by mentors teaching Machine Learning lab work. For benchmark results, consult peer-reviewed research papers. The following SVM kernels are tested:
- Linear
- Radial Basis Function (RBF)
- Polynomial Function
This article focuses on the way such experiments are performed and analyzed, and on some results; it plays no role in benchmarking or in changing existing theory. It is an elaboration meant for academicians and students learning the art of experimentation for feature selection using SVM, GA, and PSO.
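As a concrete starting point, here is a minimal sketch of the kind of weighted fitness function described above, framed as a minimization of classification error plus the fraction of features kept. The specific weight values, the 3-fold cross-validation, and the use of scikit-learn's `SVC` are assumptions for illustration, not the article's exact setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y, w_acc=0.2, w_feat=0.8, kernel="linear"):
    """Weighted-mean fitness to MINIMIZE: trades classification error
    against the fraction of features kept. The 0.2/0.8 weights are
    illustrative; the article tunes them so that <= 20 features survive."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:                 # an empty subset is invalid
        return 1.0
    acc = cross_val_score(SVC(kernel=kernel), X[:, selected], y, cv=3).mean()
    feat_ratio = selected.size / X.shape[1]
    return w_acc * (1.0 - acc) + w_feat * feat_ratio
```

Raising `w_feat` relative to `w_acc` pushes the search toward smaller feature subsets at the cost of accuracy, which is the tradeoff the experiments below explore.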
Experiment 1: Leukemia Dataset, Linear SVM
Linear kernel.
Final accuracy reached: 100%
Number of features selected: 9
The following graph shows the fitness function versus epochs.

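For readers reproducing the setup, a minimal sketch of how such an experiment could be wired together, assuming the `leu` file from the LIBSVM datasets page and the `fitness` sketch above; the split parameters are arbitrary.

```python
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

# "leu" is the Leukemia file name on the LIBSVM datasets page (assumed path)
X, y = load_svmlight_file("leu")
X = X.toarray()                # small enough to densify
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
# a GA/PSO wrapper (sketched later in this article) then evaluates
# fitness(mask, X_tr, y_tr) for candidate 0/1 feature masks
```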

Experiment 2: Leukemia Dataset, RBF Kernel
Radial basis kernel. The accuracy attained is 97.222%. The following graph shows the fitness function versus epochs.

The accuracy reached was 97.222% for 38 features in 2000 epochs. The 10 best features were then obtained for the SVM. After 1400 epochs there was not much decrease in the value of the optimizing function.
Experiment 3: Leukemia Dataset, Polynomial Kernel
The fitness function was chosen with higher weightage on the number of selected features than on accuracy. Final accuracy reached: 87.5%
Iterations: 2000
Number of features: 10
The following graph shows the fitness function versus epochs.

Experiment 4: Colon Cancer Dataset, Linear SVM
Final accuracy reached: 98.3871%
Number of features: 19
The top 19 features were obtained for the linear SVM. After 700 epochs there was not much decrease in the value of the fitness function. The following graph shows the fitness function versus epochs.

The following graph shows the number of features in red, the accuracy in blue, and the minimized fitness value in green. It is clear that after 800 epochs no further reduction in features occurs; hence the algorithm has converged to 19 features as optimal.

Experiment 5: Colon Cancer Dataset, RBF Kernel
Radial basis kernel. The accuracy obtained is 91.935%.
The fitness function was chosen with higher weightage on the number of selected features than on accuracy. The final accuracy reached was 91.935% for 14 features. After 700 epochs there was not much decrease in the value of the optimizing function.

Experiment 6: Colon Cancer Dataset, Polynomial SVM
The fitness function was chosen with higher weightage on the number of selected features than on accuracy.
Final accuracy reached: 76%
Iterations: 1000
Number of features obtained: 12
The best 12 features were obtained. After 900 epochs there was not much decrease in the value of the fitness function. The following graph shows the fitness function versus epochs.

The following graph shows the number of features in red, the accuracy in blue, and the minimized fitness value in green. It is clear that after 900 epochs not much further reduction in features occurs; hence the algorithm has converged to 12 features as optimal.

Experiment 7: Huge Dataset with 30,000,000 Features (kddb)
Testing accuracy on the total data: 87.4392%
Accuracy after selecting all 30,000 non-zero (nnz) features: 88.4387%
The total number of features is approximately 30 million. It took 2 days on the given system to calculate the F-scores. Looking closely at the data, there is a lot of sparseness, so only the non-zero (non-sparse) features were kept, since sparse features will not contribute to the discriminant. This was used as the filtering method; a lot of time was spent calculating F-scores to filter the data.
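As a rough illustration of this filtering step, the following sketch computes a LIBSVM-style F-score per feature on a sparse matrix and keeps the top k. The exact F-score formula used in the original runs is not stated, so this standard two-class definition is an assumption.

```python
import numpy as np
from scipy.sparse import csr_matrix

def f_scores(X, y):
    """LIBSVM-style F-score per feature for binary labels in {+1, -1}."""
    X = csr_matrix(X)
    pos, neg = (y == 1), (y == -1)
    n_pos, n_neg = pos.sum(), neg.sum()
    m_all = np.asarray(X.mean(axis=0)).ravel()
    m_pos = np.asarray(X[pos].mean(axis=0)).ravel()
    m_neg = np.asarray(X[neg].mean(axis=0)).ravel()
    # per-class variances via E[x^2] - mean^2, with the unbiased correction
    v_pos = (np.asarray(X[pos].power(2).mean(axis=0)).ravel() - m_pos**2) \
            * n_pos / (n_pos - 1)
    v_neg = (np.asarray(X[neg].power(2).mean(axis=0)).ravel() - m_neg**2) \
            * n_neg / (n_neg - 1)
    denom = v_pos + v_neg
    denom[denom == 0] = np.inf          # zero-variance features score 0
    return ((m_pos - m_all)**2 + (m_neg - m_all)**2) / denom

def top_k_features(X, y, k=5000):
    """Indices of the k highest-scoring features."""
    return np.argsort(f_scores(X, y))[::-1][:k]
```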
Accuracy with the non-sparse data arranged in increasing order of F-score:

Features | Total accuracy (%)
5000 | 88.4387
10000 | 88.0772
15000 | 87.7453
20000 | 87.4188
25000 | 87.1324
30000 | 86.9255
Results of the different SVM classifiers obtained by varying the number of features from 5000 to 30000 in steps of 5000:
Features | Liblinear | SVM RBF | SVM polynomial
5000 | 88.4387 | 88.772 | 88.7725
10000 | 88.0772 | 88.772 | 88.7725
15000 | 87.7453 | 88.772 | 88.7725
20000 | 87.4188 | 88.772 | 88.7725
25000 | 87.1324 | 88.772 | 88.7725
30000 | 86.9255 | 88.772 | 88.7725
Plot of the results of the different SVM classifiers, varying the number of features from 5000 to 30000 in steps of 5000.
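For reference, a sketch of the sweep that could produce the table and plot above, reusing the hypothetical `top_k_features` helper from the filtering sketch; the variable names and the train/test split are assumptions, and training kernel SVMs at this scale would be slow in practice.

```python
from sklearn.svm import LinearSVC, SVC

# assumed: sparse X_tr/X_te, y_tr/y_te from a kddb split
order = top_k_features(X_tr, y_tr, k=30000)   # helper from the sketch above
models = {"Liblinear": LinearSVC(),
          "SVM RBF": SVC(kernel="rbf"),
          "SVM polynomial": SVC(kernel="poly")}
for k in range(5000, 30001, 5000):
    cols = order[:k]
    for name, clf in models.items():
        acc = clf.fit(X_tr[:, cols], y_tr).score(X_te[:, cols], y_te)
        print(f"{k:>5} features | {name}: {100 * acc:.4f}%")
```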

Further reduction of the 5000 filtered features using PSO
PSO with C1 = 3, C2 = 3
Population size: 5
Iterations: 500
Minimum number of features obtained: 415
Accuracy: 75%
The convergence graph is as follows:

PSO with alpha = 0.8, beta = 0.2, C1 = 3, C2 = 3
Iterations: 500
Population size: 5
Testing accuracy: 83.66%
Minimum number of features obtained: 868
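A minimal sketch of binary PSO for this wrapper step, using the stated settings (c1 = c2 = 3, population 5, 500 iterations). The inertia weight, the sigmoid transfer function, and reuse of the `fitness` sketch (whose weights play the role of the alpha and beta above) are assumptions, as the exact PSO variant is not specified.

```python
import numpy as np

def binary_pso(X, y, iters=500, pop=5, c1=3.0, c2=3.0, w=0.7, rng=None):
    """Binary PSO over 0/1 feature masks: real-valued velocities are
    squashed through a sigmoid to give bit-set probabilities."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    pos = rng.integers(0, 2, size=(pop, d))
    vel = rng.normal(size=(pop, d))
    pbest = pos.copy()
    pbest_f = np.array([fitness(p, X, y) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((pop, d)), rng.random((pop, d))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = (rng.random((pop, d)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)
        f = np.array([fitness(p, X, y) for p in pos])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest  # best feature mask found
```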

Further reduction of the 5000 filtered features using GA
GA, 500 epochs
Population size: 5
Accuracy reached: 88.2%
Minimum number of features: 1999
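A comparable sketch of the GA wrapper (population 5, as stated). Tournament selection, uniform crossover, bit-flip mutation, and elitism are assumed operators, since the article does not specify them.

```python
import numpy as np

def ga_select(X, y, epochs=500, pop=5, mut=0.01, rng=None):
    """Simple GA over 0/1 feature masks: binary tournament selection,
    uniform crossover, bit-flip mutation, and one elite survivor."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    P = rng.integers(0, 2, size=(pop, d))
    for _ in range(epochs):
        f = np.array([fitness(p, X, y) for p in P])
        def pick():                       # tournament: lower fitness wins
            i, j = rng.integers(0, pop, size=2)
            return P[i] if f[i] < f[j] else P[j]
        children = []
        for _ in range(pop - 1):
            a, b = pick(), pick()
            mask = rng.random(d) < 0.5    # uniform crossover
            child = np.where(mask, a, b)
            flip = rng.random(d) < mut    # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        P = np.vstack([P[f.argmin()]] + children)  # elitism: keep the best
    f = np.array([fitness(p, X, y) for p in P])
    return P[f.argmin()]
```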

Further reduction of the 5000 filtered features by applying GA twice
Double application of GA: accuracy 79.398%
Minimum features: 199
Further reduction of the 5000 filtered features using GA with 3000 epochs
GA, 3000 epochs
Minimum number of features obtained: 367; this could be reduced further with more epochs
Accuracy: 86.563%
The convergence graph is as follows:

Further reduction of the 5000 filtered features using the Forward Selection wrapper method
Forward selection:
Reduced number of minimum features: 197
Testing accuracy: 88.6904%
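A sketch of greedy forward selection as a wrapper, for illustration; the stopping rule (no candidate improves held-out accuracy) and the `LinearSVC` base classifier are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def forward_select(X_tr, y_tr, X_val, y_val, max_feats=200):
    """Greedy forward selection: repeatedly add the single feature that
    most improves held-out accuracy; stop when no candidate improves."""
    chosen, best = [], 0.0
    remaining = list(range(X_tr.shape[1]))
    while remaining and len(chosen) < max_feats:
        scores = []
        for j in remaining:
            cols = chosen + [j]
            clf = LinearSVC().fit(X_tr[:, cols], y_tr)
            scores.append(clf.score(X_val[:, cols], y_val))
        k = int(np.argmax(scores))
        if scores[k] <= best:
            break                          # no candidate improves accuracy
        best = scores[k]
        chosen.append(remaining.pop(k))
    return chosen, best
```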
Further reduction of the 5000 filtered features using the Backward Selection wrapper method
This method did not perform well in reducing the number of features by much.
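For completeness, backward elimination can be run with scikit-learn's built-in wrapper; the feature target below is arbitrary, and the data variables are assumed from the earlier sketches.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import LinearSVC

# direction="backward" starts from all features and drops one at a time;
# with thousands of features this is very slow, which matches the poor
# experience reported above
sfs = SequentialFeatureSelector(LinearSVC(), direction="backward",
                                n_features_to_select=200)
sfs.fit(X_tr, y_tr)
mask = sfs.get_support()     # boolean mask of the retained features
```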
These results show a tradeoff between accuracy and the number of features selected. Accuracy also depends on the number of epochs, and the optimal value of the fitness function changes when the fitness function itself changes; here it is taken as a weighted mean targeting higher accuracy and a smaller feature subset on the training data. The results further suggest that fitting a more complex hyperplane may take longer to converge, as was considered in the experimentation. Forward selection appears to be the best method for reducing features after filtering. Finally, the results depend on many parameters and on the filtering and wrapper techniques used for pre- and post-processing; this article illustrates results from one experimental setup, and they are not standard results.