# Feature Selection Techniques using Evolutionary Algorithms

This article describes application of Evolutionary Algorithms to the task of Feature Selection. In particular the algorithms studies in this article are Particle Swarm Algorithm (PSO) and Genetic Algorithm (GA). The results are based on particular parameters used in experimentation. Here several parameters are analyzed for this problem on two datasets:

1. Leukemia dataset (LIBSVM Data: Classification (Binary Class) (ntu.edu.tw))
2. Colon Cancer dataset (LIBSVM Data: Classification (Binary Class) (ntu.edu.tw))
3. Educational data mining   data set: kdd2010 bridge-toalgebra (kddb) dataset.

The purpose of this article is to show comparative analysis of experiments and that choice of kernel depends both on datasets–the kind of problem, the number of iterations performed, parameters used and more specifically aim of the problem. Here the aim of problem is feature selection which means reducing the dataset to lesser number of features while retaining the accuracy. The fitness function for this problem is a weighted mean of accuracy and number of features. In the following experiments weights are so changed that number of features selected remains below or equal to 20. Hence a decrease in accuracy is noticed below, given the fact the the number of epocs are not changed. The following svm kernels are experimented.

Further, results depend on lot of parameters and this article is the illustration of results in one experimental setup and are not standard results. The results are not benchmarks, they are elaborated for experimental setups in labs, and to be followed by mentors teaching Machine Learning Lab Work. For benchmark results look into peer-reviewed research papers.

1. Linear
3. Polynomial Function

This article just pays focus on the way experiments are to be performed and analyzed and some results and do not play a role in benchmarking results or changing existing theories. It is just for elaboration for the purpose of academicians and students to learn the art of experimentation for feature selection using SVM, GA and PSO.

## Experiment 1: Leukemia Dataset, Linear SVM

Linear Kernel.

The final accuracy reached was:   100

Number of features selected: 9

The following shows the graph of fitness function versus epochs

The final accuracy reached was: 100

## Experiment 2: Leukemia Dataset, RBF Kernel

Radial Basis Kernel. The accuracy attained is : 97.222. The following shows the graph of fitness function versus epochs.

The accuracy reached was 97.222 for 38 features in 2000 epocs. Best 10 features were obtained for SVM . After 1400 epocs there was no much decrease in value of optimizing function.

## Experiment 3: Leukemia Dataset, Polynomial Kernel

The fitness chosen is with higher weightage to selected features than to accuracy. The final accuracy reached was:    87.5

iterations: 2000

Number of features: 10

The following shows the graph of fitness function versus epochs

## Experiment 4: Colon Cancer Dataset, Linear SVM

The final accuracy reached was:  98.3871

Number of features: 19

Top best 19 features were obtained for linear SVM . After 700 epocs there was no much decrease in value of fitness function. The following shows the graph of fitness function versus epochs

The following graph shows in red the number of features, blue the accuracy, green the minimization of fitness value. It is clear that after 800 epocs no minimization of features occurs , hence the algorithm has converged to 19 features as optimal.

## Experiment 5: Colon Cancer Dataset, RBF Kernel

Radial Basis Kernel. The accuracy obtained is 91.935

The fitness chosen is with higher weightage to selected features than to accuracy. The final accuracy reached was:  91.935 for 14 features. After 700 epocs there was no much decrease in value of optimizing function.

## Experiment 6: Colon Cancer Dataset, Polynomial SVM

The fitness chosen is with higher weightage to selected features than to accuracy.

The final accuracy reached was:   76

iterations: 1000

Number of features obtained: 12

Obtained best 19 features . After 900 epocs there was not much decrease in value of fitness function. The following shows the graph of fitness function versus epochs

The following graph shows in red the number of features, blue the accuracy, green the minimization of fitness value. It is clear that after 900 epocs not much minimization of features occurs , hence the algorithm has converged to 12 features as optimal.

## Experiment 7: Huge Dataset of 3,00,00,000 features

Total Data Testing accuracy: = 87.4392%

Accuracy  after selecting all 30000 features with nnz  :  88.4387%

The total number of features is approximately 3 crores. It took 2 days on the given system to calculate F-Score. When looking closely at data. The data has lot of sparseness. So only those features were taken which are non zeros i.e. non sparse features as sparse features wont contribute to discriminant. This has been used as a filtering method. A lot of time was spend in calculating f scores to filter the data.

Accuracy with non sparse data arranged in increasing value of f-scores

Total  Accuracy =

5000 features: 88.4387

10000 features : 88.0772

15000 features :  87.7453

20000 features :87.4188

25000 features :  87.1324

30000 features : 86.9255

Results of different SVM classifiers by varying  the dimensions in the range from 1000 to 30000 in steps of 5000.

Plot of the Results of different SVM classifiers by varying  the dimensions in the range from 1000 to 30000 in steps of 5000

More reduction in data features after 5000 features using PSO

PSO with C1=3, C2=3

Min number  of  features obtained:  415

population size=5

Accuracy: 75

Iterations:500

Convergence graph is given as follows:

PSO with alpha=0.8 and beta a= 0.2 ,C1=3, C2=3

Iterations:500

popsize=5

testing accuracy: 83.66

Min number of Features Obtained= 868

More reduction in data features after 5000 features using GA

GA 500 epochs

population size=5

Accuracy reached: 88.2

Minimum no of features:  1999

More reduction in data features after 5000 features using GA two times

Double application of GA:    79.398%

min features: 199

More reduction in data features after 5000 features using GA 3000 epocs

GA 3000 epochs

Reduced Min Number of features obtained : 367, more could be reduced by more epocs

Accuarcy: 86.563%

Convergence graph is as follows:

More reduction in data features after 5000 features using Forwards Selection Wrapper Method

forward selection:-

Reduced number of minimum features: 197

testing accuracy: 88.6904

More reduction in data features after 5000 features using Backward Selection Wrapper Method

This method did not performed well in reducing the number of feature much

These results show there is a tradeoff between accuracy and number of features selected. While accuracy also depends on the epocs, fitness function optimal value changes on change of fitness function which is taken as weighted mean targeting higher accuracy and lower feature subset on the training data. Further this also shows higher complex fitting of hyperplane may take more time to converge as is considered in experimentation. Best method seem to be forward selection one for reducing after filtering. Further, results depend on lot of parameters, filtering and wrapping techniques followed as pre-processing and post-processing and this article is the illustration of results in one experimental setup and are not standard results.