Your smile means the world to me my love When you smile I smile My heart opens up.. What I world I enter into When I hear you giggle I am lost in your smile I cant find myself

Your smile means the world to me my man When you smile I smile My heart opens up.. Colored sparkles raining through clouds is all I see… And is all I want to be wrapped into yes the colored sparkles……..

There is no difference between you and me my love I have merged in colors of your spirit my love Yes you are so manly I am so womanly I felt that you being cruel to me But I trying to understand, the way you express to me Give me time baby

Good that you know me now I can never hurt you my love For you mean the world to me For your world means the universe to me For all that is dear to you is dear to me…. My love

Oh my love….I Thank you for all the love you showered on me ND could have offered me…….Your smile means the world to me, Oh my love…

YOUR LOVE MEANS THE SMILE ON MY LIPS MY LOVE..YOUR LOVE MEANS LIFE TO ME MY LOVE AND YOUR IT MEANS EVERTHIGN TO ME

Many times we want to eat nutritious vegetables but it becomes quite mundane to eat same things daily especially uncooked vegetables. So here is a way to eat delicious food – full of vegetables and just little oil + no other carbohydrates ! Just Vegetables, Spices and Taste!

It can be taken as it is — if you are on diet or you may stuff it in in a burger or any kind of bread !

STEP1

Here are the ingredients for Cabbage-Carrot Fried Mix

Two chopped onions, One chopped tomatoes, finely chopped cabbage-half, three small greenish tomatoes (if available, otherwise put normal tomato), finely chopped carrot (one)

Grinded garlic and ginger as per taste

Spices as per taste

STEP 2

Take few spoons full of cooking oil in frying pan–as per taste – I took 4 spoons

Heat the oil

Put half table spoon of cumin and half table spoon black pepper

Put chopped onions in pan and fry till it turns slightly golden

Put chopped tomatoes in it

Keep it moving and put water in it and let tomatoes become soft

STEP 3

Put chopped cabbage in frying pan

Put salt, red chilli, other spices as per taste

Put chopped ginger and garlic as per taste

Mix it and put water in it

Heat it and put grinded carrot in pan

Heat it till cabbage and carrot becomes soft

Keep putting water and moving the mixture in pan

Keep lid of pan closed for better effects

STEP 4

Once vegetables are soft as per taste needed, cruncy, or mashed up or super soft.

Simple steps to eat this delicious vegetable fried mix.

Ingredients: Two onion, two tomatoes, one cauliflower, two capsicum, spices.

Heat oil as per taste in fry pan

Put a pinch of cumin in hot oil

Chop two onions and put it in this fry pan without burning cumin to black

Fry till onions golden in shade

Put two finely chopped tomatoes in it

Put salt, pepper, red chilli, green chilli, ginger powder and turmeric powder (a pinch), pinch of fennel powder, grinded coriander seeds, cinnamon powder and cloves and other spices as per taste. It comes in market as mixture in a packet of ready to use vegetable Spices.

Mix and fry the mixture

Put some water in it and let it dry till oil separates from the mixture

Grind one cauliflower and put it in the fry fan

Grind two capsicum and put in the same pan

Mix well

Put some water in pan so as to soak the mixture, don’t over add water, it will cause over mashing of the dish

Boil on slow gas with a lid of fry pan closed

Check if the mixture is soft, once so, dry the remaining water by heating on high flame

The dish is ready!

Eat with any kind of bread or on its own. Do serve it Hot!

This article describes application of Evolutionary Algorithms to the task of Feature Selection. In particular the algorithms studies in this article are Particle Swarm Algorithm (PSO) and Genetic Algorithm (GA). The results are based on particular parameters used in experimentation. Here several parameters are analyzed for this problem on two datasets:

Educational data mining data set: kdd2010 bridge-toalgebra (kddb) dataset.

The purpose of this article is to show comparative analysis of experiments and that choice of kernel depends both on datasets–the kind of problem, the number of iterations performed, parameters used and more specifically aim of the problem. Here the aim of problem is feature selection which means reducing the dataset to lesser number of features while retaining the accuracy. The fitness function for this problem is a weighted mean of accuracy and number of features. In the following experiments weights are so changed that number of features selected remains below or equal to 20. Hence a decrease in accuracy is noticed below, given the fact the the number of epocs are not changed. The following svm kernels are experimented.

Further, results depend on lot of parameters and this article is the illustration of results in one experimental setup and are not standard results. The results are not benchmarks, they are elaborated for experimental setups in labs, and to be followed by mentors teaching Machine Learning Lab Work. For benchmark results look into peer-reviewed research papers.

Linear

Radial Basis Function (RBF)

Polynomial Function

This article just pays focus on the way experiments are to be performed and analyzed and some results and do not play a role in benchmarking results or changing existing theories. It is just for elaboration for the purpose of academicians and students to learn the art of experimentation for feature selection using SVM, GA and PSO.

Experiment 1: Leukemia Dataset, Linear SVM

Linear Kernel.

The final accuracy reached was: 100

Number of features selected: 9

The following shows the graph of fitness function versus epochs

The final accuracy reached was: 100

Experiment 2: Leukemia Dataset, RBF Kernel

Radial Basis Kernel. The accuracy attained is : 97.222. The following shows the graph of fitness function versus epochs.

The accuracy reached was 97.222 for 38 features in 2000 epocs. Best 10 features were obtained for SVM . After 1400 epocs there was no much decrease in value of optimizing function.

Experiment 3: Leukemia Dataset, Polynomial Kernel

The fitness chosen is with higher weightage to selected features than to accuracy. The final accuracy reached was: 87.5

iterations: 2000

Number of features: 10

The following shows the graph of fitness function versus epochs

Experiment 4: Colon Cancer Dataset, Linear SVM

The final accuracy reached was: 98.3871

Number of features: 19

Top best 19 features were obtained for linear SVM . After 700 epocs there was no much decrease in value of fitness function. The following shows the graph of fitness function versus epochs

The following graph shows in red the number of features, blue the accuracy, green the minimization of fitness value. It is clear that after 800 epocs no minimization of features occurs , hence the algorithm has converged to 19 features as optimal.

Experiment 5: Colon Cancer Dataset, RBF Kernel

Radial Basis Kernel. The accuracy obtained is 91.935

The fitness chosen is with higher weightage to selected features than to accuracy. The final accuracy reached was: 91.935 for 14 features. After 700 epocs there was no much decrease in value of optimizing function.

Experiment 6: Colon Cancer Dataset, Polynomial SVM

The fitness chosen is with higher weightage to selected features than to accuracy.

The final accuracy reached was: 76

iterations: 1000

Number of features obtained: 12

Obtained best 19 features . After 900 epocs there was not much decrease in value of fitness function. The following shows the graph of fitness function versus epochs

The following graph shows in red the number of features, blue the accuracy, green the minimization of fitness value. It is clear that after 900 epocs not much minimization of features occurs , hence the algorithm has converged to 12 features as optimal.

Experiment 7: Huge Dataset of 3,00,00,000 features

Total Data Testing accuracy: = 87.4392%

Accuracy after selecting all 30000 features with nnz : 88.4387%

The total number of features is approximately 3 crores. It took 2 days on the given system to calculate F-Score. When looking closely at data. The data has lot of sparseness. So only those features were taken which are non zeros i.e. non sparse features as sparse features wont contribute to discriminant. This has been used as a filtering method. A lot of time was spend in calculating f scores to filter the data.

Accuracy with non sparse data arranged in increasing value of f-scores

Total Accuracy =

5000 features: 88.4387

10000 features : 88.0772

15000 features : 87.7453

20000 features :87.4188

25000 features : 87.1324

30000 features : 86.9255

Results of different SVM classifiers by varying the dimensions in the range from 1000 to 30000 in steps of 5000.

Liblinear

Svm RBF

SVM polynomial

5000 features

88.4387

88.772

88.7725

10000 features

88.0772

88.772

88.7725

15000 features

87.7453

88.772

88.7725

20000 features

87.4188

88.772

88.7725

25000 features

87.1324

88.772

88.7725

30000 features

86.9255

88.772

88.7725

Plot of theResults of different SVM classifiers by varying the dimensions in the range from 1000 to 30000 in steps of 5000

More reduction in data features after 5000 features using PSO

PSO with C1=3, C2=3

Min number of features obtained: 415

population size=5

Accuracy: 75

Iterations:500

Convergence graph is given as follows:

PSO with alpha=0.8 and beta a= 0.2 ,C1=3, C2=3

Iterations:500

popsize=5

testing accuracy: 83.66

Min number of Features Obtained= 868

More reduction in data features after 5000 features using GA

GA 500 epochs

population size=5

Accuracy reached: 88.2

Minimum no of features: 1999

More reduction in data features after 5000 features using GA two times

Double application of GA: 79.398%

min features: 199

More reduction in data features after 5000 features using GA 3000 epocs

GA 3000 epochs

Reduced Min Number of features obtained : 367, more could be reduced by more epocs

Accuarcy: 86.563%

Convergence graph is as follows:

More reduction in data features after 5000 features using Forwards Selection Wrapper Method

forward selection:-

Reduced number of minimum features: 197

testing accuracy: 88.6904

More reduction in data features after 5000 features using Backward Selection Wrapper Method

This method did not performed well in reducing the number of feature much

These results show there is a tradeoff between accuracy and number of features selected. While accuracy also depends on the epocs, fitness function optimal value changes on change of fitness function which is taken as weighted mean targeting higher accuracy and lower feature subset on the training data. Further this also shows higher complex fitting of hyperplane may take more time to converge as is considered in experimentation. Best method seem to be forward selection one for reducing after filtering. Further, results depend on lot of parameters, filtering and wrapping techniques followed as pre-processing and post-processing and this article is the illustration of results in one experimental setup and are not standard results.

In this article some experiments and their results are discussed for minimization of Rastrigin function, a famous mathematical function used in Optimization Techniques evaluation. The experiments are performed on certain setup, parameters and system. This has been performed using Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The fitness function is the given function to be optimized. This article just pays focus on the way experiments are to be performed and analyzed in students lab and by mentors teaching Machine Learning results do not play a role in benchmarking or changing existing theories. It is just for elaboration for the purpose of academicians and students to learn the art of optimizing function with the help of GA and PSO.

This function is given by

The function is usually evaluated on x_{i} ∈ [-5.12, 5.12], for all i = 1, …, n . This has been tried on 5 different values of n between 10 and 100 as follows. Taking N as the size of the chromosome for GA and PSO implementation and population size of 10.

The Matlab implementation of fitness function is

%Fitness function function z= testfunction(x) %z= (x’x); [M,N]=size(x); z=[]; sum =0; for j=1:M for i = 1:N sum = sum + (x(j,i)x(j,i) – 10 * cos (23.14x(j,i))); end z= [z;sum + 10*N] end

The following table shows the experimental results.

N

GA Final Minimum Value after 500 iterations

PSO Global Minimum Value after 500 iterations with C1=1, C2=2

PSO Global Minimum Value after 500 iterations with C1=3, C2=3

PSO Global Minimum Value after 500 iterations with C1=3, C2=2

PSO Global Minimum Value after 500 iterations with C1=2, C2=3

10

27.8663(500 epochs)/ 9.1 in 1000 epochs

12.723

4.0002

8.4609

7.0004

20

113.9635(500 epochs)

74.584

13.001

31.879

29.679

40

399.6707 (500 epochs) / 235 in 1000 epochs

191.02

43.23

127.1

115.58

60

918.9643(500 epochs)/ 619 (1000 epochs)

372.2

96.065

193.74

189.75

80

1392.1043(500 epochs)

489.16

220.28

314.04

286.96

The general analysis of results is that PSO is performing better than GA here given the same number of iterations. Though GA solutions are gradually decreasing while PSO solutions are oscillating. Within PSO the c1=3, c2=3 seems to perform best, c1=1 and c2=2 seem to perform the worst in achieving global minimum. More details and comments below.

Green: PSO1: with parameters : c1=1,c2=2 : Oscillates and convergence not as fast as other parameters.

Blue: PSO2 : with parameters : c1=3,c2=3 : Oscillates and reaches towards minima.

Red: PSO3 : with parameters : c1=3,c2=2 : Oscillates. Reaches towards minima

Black: PSO4 : with parameters : c1=2,c2=3 : Oscillates. Reaches towards minima

Dots : GA, still decreasing and is more stable though global minimum is not less than PSO.

And comparing GA and PSO , GA seems not to oscillates and be stable though its minimum not attained as fast.

PSO1 with c1=1,c2=2, seem to not be as fast in attaining minimum as compared to other parameters which are almost equivalent in attaining minimum. Here is separate result of c1=1, c2=2.

Plot of the three competing models is as follows:

Results for N=60

Green: PSO1: with parameters: c1=1,c2=2: Oscillates and convergence not as fast as other parameters.

Blue: PSO2 : with parameters : c1=3,c2=3 : Oscillates and reaches towards minima.

Red: PSO3 : with parameters : c1=3,c2=2 : Oscillates. Reaches towards minima

Black: PSO4 : with parameters : c1=2,c2=3 : Oscillates. Reaches towards minima

Dots: GA, still decreasing and is more stable though global minimum is not less than PSO.

And comparing GA and PSO , GA seems not to oscillates and be stable though its minimum not attained as fast.

Plot of three competing models as follows

PSO1 with c1=1,c2=2, seem to not be as fast in attaining minimum as compared to other parameters which are almost equivalent in attaining minimum. Here is separate result of c1=1, c2=2.

Results for N=40

Green: PSO1: with parameters : c1=1,c2=2 : Oscillates and convergence not as fast as other parameters.

Blue: PSO2 : with parameters : c1=3,c2=3 : Oscillates and reaches towards minima.

Red: PSO3 : with parameters : c1=3,c2=2 : Oscillates. Reaches towards minima

Black: PSO4 : with parameters : c1=2,c2=3 : Oscillates. Reaches towards minima

Dots: GA, still decreasing and is more stable though global minimum is not less than pso.

And comparing GA and PSO , GA seems not to oscillates so much and be stable though its minimum not attained as fast.

PSO1 with c1=1,c2=2, seem to not be as fast in attaining minimum as compared to other parameters which are almost equivalent in attaining minimum. Here is separate result of c1=1, c2=2.

Results for N=20

Green: PSO1: with parameters : c1=1,c2=2 : Oscillates and convergence not as fast as other parameters.

Blue: PSO2 : with parameters : c1=3,c2=3 : Oscillates and reaches towards minima.

Red: PSO3 : with parameters : c1=3,c2=2 : Oscillates. Reaches towards minima

Black: PSO4 : with parameters : c1=2,c2=3 : Oscillates. Reaches towards minima

Dots: GA, still decreasing and is more stable though global minimum is not less than PSO.

And comparing GA and PSO , GA seems not to oscillates and be stable though its minimum not attained as fast.

PSO1 with c1=1,c2=2, seem to not be as fast in attaining minimum as compared to other parameters which are almost equivalent in attaining minimum. Here is separate result of c1=1, c2=2.

Results for N=11

Green: PSO1: with parameters : c1=1,c2=2 : Oscillates and convergence not as fast as other parameters.

Blue: PSO2 : with parameters : c1=3,c2=3 : Oscillates and reaches towards minima.

Red: PSO3 : with parameters : c1=3,c2=2 : Oscillates. Reaches towards minima

Black: PSO4 : with parameters : c1=2,c2=3 : Oscillates. Reaches towards minima

Dots: GA, still decreasing and is more stable though global minimum is not less than PSO

And comparing GA and PSO , GA seems not to oscillates and be stable though its minimum not attained as fast.

PSO1 with c1=1,c2=2, seem to not be as fast in attaining minimum as compared to other parameters which are almost equivalent in attaining minimum. Here is separate result of c1=1, c2=2.

The experiments are performed on certain setup, parameters and system. The general impression and analysis are given at end of each experiment conducted. Changing parameters, the mutations functions, population to be performed for creating new chromosomes in case of GA drastically effects the results obtained. In a similar manner for PSO several parameters, initializations effect the results apart from C1 and C2 experimented above. The results are not benchmarks, they are elaborated for experimental setups in labs, and to be followed by mentors teaching Machine Learning Lab Work. For benchmark results look into peer-reviewed research papers.

This article is for education and learning purpose. The aim is to understand how to start experimentation in the area of Neural Networks, how to compare results, what all features to consider while doign experiments. It is a handy tool for those doing self learning in the area of Machine Learning. And a good start for educators who want to impart education in these areas and want to know the art of assignments and what all to expect. Further, the results are not benchmarks, they are elaborated for explanations given ahead. For benchmark results look into peer-reviewed research papers.

Here we have implemented backpropagation algorithm and have tested on a subset of original MNIST datasets for 2-class problem of digit image recognition.

Backpropogation with MNIST Data for binary case of 3 and 8 digits

This article just pays focus on the way experiments are to be performed and analyzed and some results and do not play a role in benchmarking results or changing existing theories. It is just for elaboration for the purpose of academicians and students to learn the art of how Neural Networks can be used for two class classifications of image data. Further, the results vary with change in number of hidden layers, initial weights, and other settings.

I have written a script to collect subset of data from 3 and 8 digits called mnist_38_2.mat. The script takes the training data, training labels, testing data and testing labels and is combined into one data file. This script extracts the 3 and 8 digits from training and testing data and creates a single mat file of the whole data containing the training and the testing data along with their labels.

Note: I have performed much fewer epochs on experimentations due to constraints on my computing devise used. This is too less when GPUs are used. But all this is for illustrative purpose only. You can ask your students to perform higher iterations for reaching a higher accuracy and even testing on various number of hidden layers.

Here are details of implementation of the backpropagation algorithm that I have written for binary case of digits 3 and digit 8 classification. I have written the code in Matlab. The Matlab file name is backpropogation_38. Appropriate self explanatory comments are given in the code. Here is a brief summary of the algorithm that being implemented.

This is the main method of the file from where execution starts. In this method data is read from file mnist38All.

This is for two class problem for more than two class code changes. Taking correct positive class as 3 ie 0 and the other one as ie 8 as 1. This is the main method of the file from where execution starts. In this method data is read from file mnist38.mat then 10 folds of data is created. Experiments were performed with crossvalind with a holdout of 0.1,0.2,0.3 to make number of elements in testing set as 10% 20% and 30% in testing and remaining ones in training. Further vtot is defined as a vector which stores the average values of following quantities for the final result. The quantities stored in vtot are [ TP,TN,FP,FN, Precision, Recall, F-measure, accuracy]. We define the number of hidden and number of output neurons in this method experiments were done using varying number of hidden neurons equal to 100.

Inside the loop for ten folds call to the back propagation is made for training function on each of the training and testing set generated. Then on the same set of testing data the evaluations is performed. trainBP is for training and testBP for testing in the for loop.

This is the main function that does the training of the network . It takes the following parameters and return types are discussed. Followed by the details of procedure:

—————parameters————–

trainingSet: The training data as obtained by crossvalind

num_Hidden: The number of hidden nodes

num_Output: The number of output nides

trainingLabels: The labels of training data, in case of one output the number of colums in data is one else is equal to number of output nodes

—————-Return Arguments——————–

weights_1_ij:weights from input to hidden

weights_2_ij:weights from hidden to output

biasInput: the bias from input to hidden

biasHidden: the bias from hidden to output

—————-Details———————-

Here in the training the parameters such as learning rate are assigned which is set as equal to 1/sqrt(iteration). Then each training pattern is presented in the loop one by one in each iteration. The maximum number of iterations is set to 5000 due to computationally large size of the datasets which takes large time to compute per iteration. The condition of the looping of iterations is till either the maximum number of iterations reached or and error in values computed is less than permissible error of 0.001. Further for each input updated weights are computed.

S1(j) = S1(j) + weights_1_ij(i,j) * x(i) ; is w1.x, the net input at jth hidden neuron

S2(j) = S2(j) + weights_2_ij(i,j) * h(i) ; w2.h, the net input at the jth output neuron

delta_2_weights_2_ij(j) = O(j)*(1-O(j))*(Y(k)-O(j)); delta 2 for each output neuron

sum =sum = sum + delta_2_weights_2_ij(l) * weights_2_ij(j,l) ;

delta_1_weights_1_ij(j) = h(j)*(1-h(j))*sum; delta1 for each hidden neuron

weights_1_ij(i,j) = weights_1_ij(i,j) + eta * delta_1_weights_1_ ij(j) *x(i) ; updation in weight in input to hidden layer

weights_2_ij(i,j) = weights_2_ij(i,j) + eta * delta_2_weights_2_ij(j) * % h(i) ; updation in weight in hidden layer to output layer

This method is for testing the backpropagation algorithm written using iterative methology (non matrix based weight updation). This is the main function that does the testing of the network . It takes the following parameters and return types are discussed. Following is the detail of the procedure:

—————parameters————–

weights_1_ij:weights from input to hidden

weights_2_ij:weights from hidden to output

biasInput: the bias from input to hidden

biasHidden: the bias from hidden to output

testingSet: the testing set as creteated by 10 fold crossvalind

testLabel: the testing labes corresponding to the testingSet

num_Hidden: The number of hidden layer neurons

num_Output: number of output layer nodes

—————Return Arguments——————–

correctlyClassified

count3: The number of misclassified class 3 elements

count8: The number of misclassified class 8 elements

unclassified: the matrix containing 5 unclassified data elements of each of the given class

v: This is a vector returning the TP, TN, FP, FN ie the confusion matrix , it also returns precision, recall, F-Value and accuracy

—————-Details———————-

In this method each pattern in the testingSet is tested for its accuracy. Here the net input in hidden layer is calculated and the activation function applied on net input and the outputs at hidden layer are evaluated. Further, the net input at the output layer is computed and activation function is applied to get the net output. The results are compared, the net output with the the expected output to evaluate the TP, FP, TN, FN, precision, recall and accuracy.

The results are evaluated as explained above under the maximum number of epoch equal to 50 as it was taking long time to compute the results. The above three functions are executed and results computed. 100 hidden neurons were taken

The following are experiments conducted. Experiment results are as follows:

Number of Epochs

TP

TN

FP

FN

Precision

Recall

F-Value

Accuracy

0.1 holdout

3000

682

0

714

0

.4885

1

.4885

0.4885

Average of 10 folds, for each fold

100

682

0

714

0

.4885

1

.4885

0.4885

Only these many experiments were conducted due to constraint on time it is taking to run for large number of epochs.

Once settings of initial weights, training and testing data values are changed, it drastically changed the number of epocs required as well as accuracy increased considerably. Here are the results.

Number of Epochs

TP

TN

FP

FN

Precision

Recall

F-Value

Accuracy

Average over all folds and 100 epocs with hundred hidden layer neurons

100

679

486

228

3

.74

.99

.846

0.83

The following accuracy was obtained, but since it was taking too much time its ten folds could not be computed. This accuracy was computed 2 times, 350 epochs. I have attached the final weights of this accuracy with the code.

TP

TN

FP

FN

Precision

Recall

F-Value

Accuracy

BackPropagation with 100 hidden layer neurons

682

697

17

0

.9757

1

.9877

.9878

Code

function BackPropogation38() disp(‘..Starting BackPropogation38 Algorithm….’);

%reading data
A = load('mnist38All.mat');
%v=[ TP,TN,FP,FN, Precision, Recall, F-measure, accuracy]
vtot=[0, 0, 0, 0, 0, 0,0 ,0];
data =[A.train;A.test];
%10 folds with 10-90 ratio of testing and training
for i = 1:10
P=.1;
groups=data(:,785);
[train,test] = crossvalind('holdout',groups, P);
train1= data(train, 1: 785);
test1=data(test, 1: 785);
num_Hidden = 100;
num_Output = 1; % two for binary case else 10.
[rowtrain,coltrain]=size(train1);
[rowtest,coltest]= size(test1);
%initilizating weights
trainingLabels(1:rowtrain) = train1(:,coltrain);
trainingSet(1:rowtrain,1:coltrain-1)=0;
trainingSet(1:rowtrain,1:coltrain-1) =
train1(1:rowtrain,1:coltrain-1);
trainingLabels(1:rowtrain)= train1(1:rowtrain,coltrain);
testSet(1:rowtest,1:coltest-1)=0;
testLabels(1:rowtest)=test1(:,coltrain);
testSet(1:rowtest,1:coltest-1) = test1(1:rowtest,1:coltest-1);
testLabels(1:rowtest)= test1(1:rowtest,coltest);
for n1=1:rowtrain
for n2=1:coltrain-1
if trainingSet(n1,n2) >0
trainingSet(n1,n2)=1;
end
end
end
for n1=1:rowtest
for n2=1:coltest-1
if testSet(n1,n2) >0
testSet(n1,n2)=1;
end
end
end
[weights_1_ij, weights_2_ij, biasInput, biasHidden] = trainBP(trainingSet, num_Hidden,num_Output, trainingLabels,testLabels,testSet);
[correctlyClassified,count3,count8,unClassified,v] = testBP(testSet,weights_1_ij, weights_2_ij,biasInput, biasHidden, testLabels, num_Hidden,num_Output);
%v stores the output containg TP.FP etc , vtot the total of all
%such over 10 folds
vtot = vtot + v;
correctlyClassified
count3
count8
end
%computing average of 10 folds
vtot = vtot ./i

end

%learning rate
eta =1;
maxEpochs=100;
errorBound=0.001;
[trainLengthRow, trainLengthCol] = size(trainingSet);
num_Input = trainLengthCol;
Y = trainingLabels;
%assingning initial weights
weights_1_ij(1:num_Input,1:num_Hidden) = 0.01 ;
weights_2_ij(1:num_Hidden,1:num_Output) = 0.01 ;
biasInput(1:num_Hidden) = 0.01 ;
biasHidden(1:num_Output) = 0.01 ;
delta_1_weights_1_ij(1:num_Hidden)= 0;
delta_2_weights_2_ij(1:num_Output)= 0;
epochs =1;
error = 1;
while((epochs < maxEpochs) && (error > errorBound))
eta=1/sqrt(epochs);
error=0;
epochs = epochs+1
for k =( 1: trainLengthRow )
x = trainingSet(k,:) ;
S1(1:num_Hidden)=0;
S2(1:num_Output)=0;
%calculating weights
for j =(1:num_Hidden)
for i =(1:num_Input)
S1(j) = S1(j) + weights_1_ij(i,j) * x(i) ;
end;
S1(j) = S1(j) + biasInput(j) * 1;
end;
h(1:num_Hidden) = 0;
for j =(1:num_Hidden)
h(j)= 1/(1+exp(-1*S1(j)));
end;
for j =(1:num_Output)
for i =(1:num_Hidden)
S2(j) = S2(j) + weights_2_ij(i,j) * h(i) ;
end;
S2(j) = S2(j) + biasHidden(j) * 1;
end;
O(1:num_Output) = 0;
for j =(1:num_Output)
O(j)= 1/(1+exp(-1*S2(j)));
end;
%calculating weights
for j =(1:num_Output)
delta_2_weights_2_ij(j) = O(j)*(1-O(j))*(Y(k)-O(j));
end;
for j =(1:num_Hidden)
sum = 0;
for l=(1:num_Output)
sum = sum + delta_2_weights_2_ij(l) * weights_2_ij(j,l) ;
end;
delta_1_weights_1_ij(j) = h(j)*(1-h(j))*sum;
end;
%updating weights
%calculating new weights
for i =(1:num_Input)
for j =(1:num_Hidden)
weights_1_ij(i,j) = weights_1_ij(i,j) + eta * delta_1_weights_1_ij(j) * x(i) ;
end;
end;
%computing bias
for j =(1:num_Output)
biasHidden(j) = biasHidden(j) + delta_2_weights_2_ij(j) * 1 ;
end;
%updating weights
%calculating new weights
for i =(1:num_Hidden)
for j =(1:num_Output)
weights_2_ij(i,j) = weights_2_ij(i,j) + eta * delta_2_weights_2_ij(j) * h(i) ;
end;
end;
%computing bias
for j =(1:num_Hidden)
biasInput(j) = biasInput(j) + delta_1_weights_1_ij(j) * 1 ;
end;

end
if epochs % 10 == 0
save('weights.mat','weights_1_ij','weights_2_ij','biasInput','biasHidden','testLabels','testSet','trainingSet','trainingLabels');
end

end end

function [correctlyClassified,count3,count8,unClassified,v] = testBP(testingSet,weights_1_ij, weights_2_ij,biasInput, biasHidden, testLabel, num_Hidden,num_Output)

correctlyClassified = 0;
count3 = 0; count8=0; TP=0; TN=0; FP=0; FN =0; P=0; R=0; F=0;
[testLengthRow,testLengthCol]=size(testingSet);
unClassified(1:10 ,1: testLengthCol) = 0;
% checking accuracy by number of correctly classified
for k=(1: testLengthRow )
x=testingSet(k,:);
S1(1:num_Hidden)=0;
S2(1:num_Output)=0;
num_Input =testLengthCol;
%calculating
for j =(1:num_Hidden)
for i =(1:num_Input)
S1(j) = S1(j) + weights_1_ij(i,j) * x(i) ;
end;
S1(j) = S1(j) + biasInput(j) * 1;
end;
h(1:num_Hidden) = 0;
for j =(1:num_Hidden)
h(j)= 1/(1+exp(-1*S1(j)));
end;
for j =(1:num_Output)
for i =(1:num_Hidden)
S2(j) = S2(j) + weights_2_ij(i,j) * h(i) ;
end;
S2(j) = S2(j) + biasHidden(j) * 1;
end;
O(1:num_Output) = 0;
for j =(1:num_Output)
O(j)= 1/(1+exp(-1*S2(j)));
end;
% error as output approaching target
if sqrt( (round(O(1))- (testLabel(k))) * (round(O(1))- (testLabel(k)) )) == 0
% correctly classified examples
correctlyClassified=correctlyClassified+1;
%compute TP, TN
if(testLabel(k)==1)
TP = TP+1;
else
TN = TN +1;
end
else
% wrongly classified examples
if(testLabel(k)==1)
FN = FN+1;
else
FP = FP +1;
end
%storing 5 misclassified classes from each class
if(count8<5 && testLabel(k)==0)
count8 = count8 + 1;
unClassified(count8,1: testLengthCol) = testingSet(k,1: testLengthCol);
end
if(count3<5 && testLabel(k)==1 )
count3 = count3 + 1;
unClassified(count3+5,1: testLengthCol) = testingSet(k,1: testLengthCol);
end
end
end
k
%for storing 'TP, TN, FP, FN, Precision, Recall,
%F value , accuracy
v=[TP, TN, FP, FN, TP/(TP+FP), TP/P, 2*P*R / (P+R) , correctlyClassified/testLengthRow]
disp('TP, TN, FP, FN, TP/(TP+FP), TP/P, 2*P*R / (P+R) , correctlyClassified/trainLengthRow');
unClassified;
accuracy = correctlyClassified/testLengthRow ;
accuracy

end

Backpropagation algorithm for 10 digits learning

Applying BPNN classifier on original multiclass data with the same experimental settings as in the binary case above. The results are not as high, and only upto few epocs could be performed, again due to limits on computing capacity of systems for experimentation. This is due to the fact that the complete Neural Network, initial weights have to change with the change in problem. The outputs are 10 classes represented in form of 1’s and 0’s of ten digits combinations. Conclusion, experiments of parameters for 10 class data need to be performed.

Misclassified Images for 2 class digit recognition problem

Code was written in each of the algorithm that stores 5 misclassified samples from each class in a mat file. And a script that is in this folder to read each of the misclassified row from this mat file and convert it in a 28 x 28 matrix in image form. The script is as follows:

%This is a script that reads data which was unclassified and stored in a %mat file and displays the image file corresponding to that data.

data = load('mat38unclassified.mat');
A = data.unClassified;
[r,c] = size(A);
%read all r data that are present in unclassified mat file
for t=1 : r
k=1;
x(1:c)=A(t,1:c);
a(1:28,1:28)=0;
%convert it back to matrix form
for i=1: 28
for j =1: 28
a(i,j)= x(k);
k=k+1;
end
end
% display each of the figure separately
figure(t);
imshow(a);
end

The following are misclassified ones from 3 and 8 digits backpropagation for 2 class program

Following are misclassified “3” digit.

Comparison with other ML Techniques

Method

TP

TN

FP

FN

Precision

Recall

F-Value

Accuracy

Logistic regression

936

990

20

38

.9791

.9610

.9699

.9708

Naïve Bayes

938

903

72

71

.928

.928

.928

.92

Comment: Logistic Regression is performing the best and Naïve Bayes is also performing good.

RNN takes high amount of time to learn. So the execution time for RNN is very high.

Naïve Bayes require and extra overhead of discretizing the data. Otherwise Naïve Bayes is a fast algorithm in terms of execution time as its output is direct output that does not require iterations.

The accuracy in these experiments is execution time and system restrictions, which prevented optimization of error and hence increase in accuracy. This does not say than a particular do not perform well-it is just an experimental setup for education and learning purpose for how to go ahead with experimentations.

Weka was used for Naïve Bayes. And the settings cant be changed the true positive rate and false positive rate is coming out to be 0.92 and 0.926 respectively. Which is upper right hand column of the ROC graph.

Naive Bayes ROC Curve: Since using Weka GUI it is not possible to create ROC curve though the TPR and FPR is given by (.926,.92). Also see the screen shoot below for details. Only two point plots possible.

ROC for implemented Logistic Regression:

Note again: Much fewer epochs were performed on experimentations due to constraints on computing devise used. This is too less when GPUs are used. But all this is for illustrative purpose only. Also iterative approach for implementation has been used in the code implemented for illustration. Modern day approach do use matrix computations for efficiency in processing, given the advent in processors and computing facilities used for matrix computations.

You must be logged in to post a comment.