This article describes the application of Evolutionary Algorithms to the task of Feature Selection. In particular, the algorithms studied here are Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA). The results depend on the particular parameters used in experimentation; several parameter settings are analyzed for this problem on three datasets:

- Leukemia dataset (LIBSVM Data: Classification (Binary Class) (ntu.edu.tw))
- Colon Cancer dataset (LIBSVM Data: Classification (Binary Class) (ntu.edu.tw))
- Educational data mining dataset: KDD2010 bridge-to-algebra (*kddb*) dataset

The purpose of this article is to present a comparative analysis of experiments and to show that the choice of kernel depends on the dataset, the kind of problem, the number of iterations performed, the parameters used, and, most specifically, the aim of the problem. Here the aim is feature selection: reducing the dataset to a smaller number of features while retaining accuracy. The fitness function for this problem is a weighted mean of accuracy and the number of features. In the following experiments the weights are chosen so that the number of features selected remains at or below 20; hence a decrease in accuracy is noticed below, given that the number of epochs is not changed.

Further, the results depend on many parameters; this article illustrates results in one experimental setup and does not present standard results. The results are not benchmarks: they are elaborated for experimental setups in labs, to be followed by mentors teaching Machine Learning lab work. For benchmark results, consult peer-reviewed research papers.

The following SVM kernels are used in the experiments:

- Linear
- Radial Basis Function (RBF)
- Polynomial Function
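For reference, the three kernels can be sketched in NumPy as follows. The `gamma`, `degree`, and `coef0` values shown are illustrative defaults, not the settings used in the experiments below:

```python
import numpy as np

def linear_kernel(x, z):
    # K(x, z) = x . z
    return x @ z

def rbf_kernel(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def poly_kernel(x, z, degree=3, coef0=1.0):
    # K(x, z) = (x . z + coef0)^degree
    return (x @ z + coef0) ** degree
```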

This article focuses on the way such experiments are to be performed and analyzed, with some results; it plays no role in benchmarking or in changing existing theories. It is an elaboration intended for academicians and students learning the art of experimentation for feature selection using SVM, GA, and PSO.
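The weighted-mean fitness described above can be sketched as follows; the weights `w_acc` and `w_feat` are illustrative, since the article tunes them so that at most 20 features survive:

```python
def fitness(accuracy, n_selected, n_total, w_acc=0.8, w_feat=0.2):
    """Weighted fitness to be minimized: trades classification error
    against the fraction of features kept. Raising w_feat relative to
    w_acc pushes the search toward smaller subsets at some cost in
    accuracy."""
    error = 1.0 - accuracy                # lower is better
    feature_ratio = n_selected / n_total  # lower is better
    return w_acc * error + w_feat * feature_ratio
```

With `w_feat` raised, a subset of 9 features at 100% accuracy scores strictly better than 38 features at the same accuracy, which is the behaviour the experiments below rely on.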

## Experiment 1: Leukemia Dataset, Linear Kernel

The final accuracy reached was: **100**

**Number of features selected: 9**

The following shows the graph of fitness function versus epochs.


## Experiment 2: Leukemia Dataset, RBF Kernel

The accuracy attained is **97.222** for 38 features in 2000 epochs; the best 10 features were then obtained for the SVM. After 1400 epochs there was not much decrease in the value of the optimizing function. The following shows the graph of fitness function versus epochs.

## Experiment 3: Leukemia Dataset, Polynomial Kernel

The fitness chosen gives higher weightage to the number of selected features than to accuracy. The final accuracy reached was: **87.5**

**Iterations: 2000**

**Number of features: 10**

The following shows the graph of fitness function versus epochs.

## Experiment 4: Colon Cancer Dataset, Linear SVM

The final accuracy reached was: **98.3871**

**Number of features: 19**

The best 19 features were obtained for the linear SVM. After 700 epochs there was not much decrease in the value of the fitness function. The following shows the graph of fitness function versus epochs.

The following graph shows the number of features in red, the accuracy in blue, and the minimized fitness value in green. It is clear that after 800 epochs no further minimization of features occurs; hence the algorithm has converged to 19 features as optimal.

## Experiment 5: Colon Cancer Dataset, RBF Kernel

The fitness chosen gives higher weightage to the number of selected features than to accuracy. The final accuracy reached was **91.935** for 14 features. After 700 epochs there was not much decrease in the value of the optimizing function.

## Experiment 6: Colon Cancer Dataset, Polynomial SVM

The fitness chosen gives higher weightage to the number of selected features than to accuracy.

The final accuracy reached was: **76**

**Iterations: 1000**

**Number of features obtained: 12**

The best 12 features were obtained. After 900 epochs there was not much decrease in the value of the fitness function. The following shows the graph of fitness function versus epochs.

The following graph shows the number of features in red, the accuracy in blue, and the minimized fitness value in green. It is clear that after 900 epochs not much minimization of features occurs; hence the algorithm has converged to 12 features as optimal.

## Experiment 7: Huge Dataset of 30,000,000 (3 Crore) Features

Total data testing accuracy: **87.4392%**

Accuracy after selecting all 30000 features with non-zero entries (nnz): **88.4387%**

The total number of features is approximately 3 crore (30 million). It took 2 days on the given system to calculate the F-score. On closer inspection the data is very sparse, so only the non-zero (non-sparse) features were kept, since all-zero features cannot contribute to the discriminant. This was used as a filtering method; a lot of time was spent calculating F-scores to filter the data.
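The filtering step above can be sketched as follows, assuming the standard F-score (Fisher score) formula for a binary problem with labels ±1; the actual kddb pipeline and its sparse-matrix handling are not reproduced here:

```python
import numpy as np

def f_scores(X, y):
    """F-score per feature for a binary problem with labels +1/-1:
    F_j = ((mean_pos_j - mean_j)^2 + (mean_neg_j - mean_j)^2)
          / (var_pos_j + var_neg_j)."""
    pos, neg = X[y == 1], X[y == -1]
    m, mp, mn = X.mean(0), pos.mean(0), neg.mean(0)
    num = (mp - m) ** 2 + (mn - m) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    # features with zero within-class variance get score 0
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

def filter_features(X, y, k):
    """Keep only non-zero (non-sparse) columns, then the top k by F-score."""
    nnz = np.flatnonzero((X != 0).any(axis=0))
    order = nnz[np.argsort(-f_scores(X[:, nnz], y))]
    return order[:k]
```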

Total accuracy with non-sparse data arranged in increasing value of F-scores:

- 5000 features: 88.4387
- 10000 features: 88.0772
- 15000 features: 87.7453
- 20000 features: 87.4188
- 25000 features: 87.1324
- 30000 features: 86.9255

__Results of different SVM classifiers by varying the dimensions from 5000 to 30000 in steps of 5000.__

| Features | Liblinear | SVM RBF | SVM polynomial |
| --- | --- | --- | --- |
| 5000 | 88.4387 | 88.772 | 88.7725 |
| 10000 | 88.0772 | 88.772 | 88.7725 |
| 15000 | 87.7453 | 88.772 | 88.7725 |
| 20000 | 87.4188 | 88.772 | 88.7725 |
| 25000 | 87.1324 | 88.772 | 88.7725 |
| 30000 | 86.9255 | 88.772 | 88.7725 |

__Plot of the results of different SVM classifiers by varying the dimensions from 5000 to 30000 in steps of 5000__
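The shape of this comparison can be reproduced in outline with scikit-learn on a synthetic stand-in. The real kddb data and F-score ranking are not reproduced here; the correlation-based ranking below is only a placeholder for the F-score filter, and the kernel settings are library defaults:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # only 5 features are informative

# Placeholder ranking standing in for the F-score filter:
# rank features by |correlation| with the label.
rank = np.argsort(-np.abs(np.corrcoef(X.T, y)[-1, :-1]))

results = {}
for k in (5, 10, 20, 40):                    # number of top-ranked features kept
    Xtr, Xte, ytr, yte = train_test_split(X[:, rank[:k]], y, random_state=0)
    for name in ("linear", "rbf", "poly"):
        results[(k, name)] = SVC(kernel=name).fit(Xtr, ytr).score(Xte, yte)
```

Printing `results` gives one test accuracy per (feature count, kernel) pair, mirroring the table above.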

__Further reduction of features after filtering to 5000, using PSO__

PSO with C1 = 3, C2 = 3:

- Iterations: 500
- Population size: 5
- Accuracy: **75**
- Minimum number of features obtained: **415**

Convergence graph is given as follows:

PSO with alpha = 0.8, beta = 0.2, C1 = 3, C2 = 3:

- Iterations: 500
- Population size: 5
- Testing accuracy: **83.66**
- Minimum number of features obtained: **868**
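The PSO runs above can be sketched with a minimal binary PSO, where each particle is a 0/1 mask over features and a sigmoid turns velocities into per-bit selection probabilities. C1 = C2 = 3 and population size 5 follow the article; the iteration count, seed, and velocity clipping are illustrative, and `fitness` is any callable mapping a mask to a value to minimize:

```python
import numpy as np

def binary_pso(fitness, n_feat, n_particles=5, iters=50, c1=3.0, c2=3.0, seed=0):
    """Minimal binary-PSO sketch for feature selection (minimization)."""
    rng = np.random.default_rng(seed)
    pos = rng.integers(0, 2, (n_particles, n_feat))
    vel = rng.normal(size=(n_particles, n_feat))
    pbest = pos.copy()
    pbest_f = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, n_feat))
        # velocity update pulled toward personal and global bests, clipped
        vel = np.clip(vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos), -6, 6)
        prob = 1.0 / (1.0 + np.exp(-vel))            # sigmoid transfer function
        pos = (rng.random((n_particles, n_feat)) < prob).astype(int)
        f = np.array([fitness(p) for p in pos])
        better = f < pbest_f                         # update personal bests
        pbest[better], pbest_f[better] = pos[better], f[better]
        gbest = pbest[pbest_f.argmin()].copy()       # update global best
    return gbest, pbest_f.min()
```

In the experiments the fitness would wrap an SVM evaluation via the weighted mean of error and feature count; here any mask-to-scalar function can be plugged in.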

__Further reduction of features after filtering to 5000, using GA__

GA with 500 epochs:

- Population size: 5
- Accuracy reached: **88.2**
- Minimum number of features: **1999**
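The GA runs can likewise be sketched with bit-string chromosomes, tournament selection, single-point crossover, bit-flip mutation, and elitism. Population size 5 follows the article; the epoch count, mutation rate, and seed are illustrative, and `fitness` again maps a 0/1 mask to a value to minimize:

```python
import numpy as np

def ga_select(fitness, n_feat, pop_size=5, epochs=100, p_mut=0.02, seed=0):
    """Minimal GA sketch for feature selection (minimization)."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, (pop_size, n_feat))
    for _ in range(epochs):
        f = np.array([fitness(c) for c in pop])
        elite = pop[f.argmin()].copy()               # elitism: keep the best

        def pick():                                  # binary tournament
            i, j = rng.integers(0, pop_size, 2)
            return pop[i] if f[i] < f[j] else pop[j]

        children = []
        while len(children) < pop_size - 1:
            a, b = pick(), pick()
            cut = int(rng.integers(1, n_feat))       # single-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut        # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([elite] + children)
    f = np.array([fitness(c) for c in pop])
    return pop[f.argmin()], f.min()
```

Because the elite chromosome is carried over unchanged, the best fitness never worsens from one epoch to the next.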

__Further reduction of features after filtering to 5000, using GA applied twice__

- Accuracy after double application of GA: 79.398%
- Minimum features: **199**

__Further reduction of features after filtering to 5000, using GA with 3000 epochs__

- Minimum number of features obtained: **367** (more could be removed with additional epochs)
- Accuracy: **86.563%**

Convergence graph is as follows:

__Further reduction of features after filtering to 5000, using the Forward Selection Wrapper Method__

Forward selection:

- Reduced number of minimum features: **197**
- Testing accuracy: **88.6904**
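Greedy forward selection can be sketched as follows; `score` is any callable returning a quality measure (e.g. cross-validated SVM accuracy) for a candidate subset, and `max_feats` is an illustrative cap:

```python
def forward_select(score, n_feat, max_feats=20):
    """Greedy forward-selection sketch: start from the empty set and
    repeatedly add the single feature whose inclusion most improves
    score(subset); stop when no candidate improves or max_feats is hit."""
    selected, best = [], float("-inf")
    while len(selected) < max_feats:
        gains = {j: score(selected + [j])
                 for j in range(n_feat) if j not in selected}
        j, s = max(gains.items(), key=lambda kv: kv[1])
        if s <= best:            # no candidate improves the score: stop
            break
        selected.append(j)
        best = s
    return selected, best
```

Backward elimination is the mirror image, starting from all features and removing one at a time; as noted below, it performed poorly here because very few of the 5000 features can be dropped without retraining from a huge starting set.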

__Further reduction of features after filtering to 5000, using the Backward Selection Wrapper Method__

This method did not perform well; it failed to reduce the number of features much.

These results show there is a tradeoff between accuracy and the number of features selected. Accuracy also depends on the number of epochs, and the optimal value of the fitness function changes when the fitness function itself changes; here it is taken as a weighted mean targeting higher accuracy and a smaller feature subset on the training data. The experiments further suggest that fitting a more complex hyperplane may take longer to converge. Forward selection appears to be the best method for further reduction after filtering. Finally, the results depend on many parameters and on the filtering and wrapper techniques used as pre- and post-processing; this article illustrates results in one experimental setup and does not present standard results.