Your smile means the world to me my love When you smile I smile My heart opens up.. What a world I enter into When I hear you giggle I am lost in your smile I can't find myself
Your smile means the world to me my man When you smile I smile My heart opens up.. Colored sparkles raining through clouds is all I see… And is all I want to be wrapped into yes the colored sparkles……..
There is no difference between you and me my love I have merged in colors of your spirit my love Yes you are so manly I am so womanly I felt that you were being cruel to me But I am trying to understand the way you express to me Give me time baby
Good that you know me now I can never hurt you my love For you mean the world to me For your world means the universe to me For all that is dear to you is dear to me…. My love
Oh my love….I thank you for all the love you showered on me and could have offered me…….Your smile means the world to me, Oh my love…
YOUR LOVE MEANS THE SMILE ON MY LIPS MY LOVE..YOUR LOVE MEANS LIFE TO ME MY LOVE AND IT MEANS EVERYTHING TO ME
Many times we want to eat nutritious vegetables, but it becomes quite mundane to eat the same things daily, especially uncooked vegetables. So here is a way to eat delicious food: full of vegetables, with just a little oil and no other carbohydrates! Just vegetables, spices and taste!
It can be eaten as it is if you are on a diet, or you may stuff it in a burger or any kind of bread!
STEP 1
Here are the ingredients for Cabbage-Carrot Fried Mix
Two chopped onions, one chopped tomato, half a cabbage (finely chopped), three small greenish tomatoes (if available; otherwise use a normal tomato), one finely chopped carrot
Ground garlic and ginger as per taste
Spices as per taste
STEP 2
Take a few spoonfuls of cooking oil in a frying pan, as per taste (I took 4 spoons)
Heat the oil
Put in half a tablespoon of cumin and half a tablespoon of black pepper
Put the chopped onions in the pan and fry till they turn slightly golden
Put the chopped tomatoes in it
Keep stirring, add water and let the tomatoes become soft
STEP 3
Put the chopped cabbage in the frying pan
Put in salt, red chilli and other spices as per taste
Put in chopped ginger and garlic as per taste
Mix it and add water
Heat it and put the grated carrot in the pan
Heat it till the cabbage and carrot become soft
Keep adding water and stirring the mixture in the pan
Keep the lid of the pan closed for better results
STEP 4
Once the vegetables are as soft as needed (crunchy, mashed up or super soft), the dish is ready.
Simple steps to make this delicious vegetable fried mix:
Ingredients: two onions, two tomatoes, one cauliflower, two capsicums, spices.
Heat oil, as per taste, in a frying pan
Put a pinch of cumin in the hot oil
Chop two onions and put them in the frying pan before the cumin burns black
Fry till the onions are golden in shade
Put two finely chopped tomatoes in it
Put in salt, pepper, red chilli, green chilli, ginger powder, a pinch of turmeric powder, a pinch of fennel powder, ground coriander seeds, cinnamon powder, cloves and other spices as per taste. These also come in the market as a ready-to-use packet of mixed vegetable spices.
Mix and fry the mixture
Put some water in it and let it dry till oil separates from the mixture
Grind one cauliflower and put it in the frying pan
Grind two capsicums and put them in the same pan
Mix well
Put some water in the pan to soak the mixture; do not add too much water, as it will cause the dish to get over-mashed
Boil on a slow flame with the lid of the frying pan closed
Check if the mixture is soft; once it is, dry off the remaining water by heating on a high flame
The dish is ready!
Eat with any kind of bread or on its own. Do serve it Hot!
This article describes the application of Evolutionary Algorithms to the task of Feature Selection. In particular, the algorithms studied in this article are Particle Swarm Optimization (PSO) and the Genetic Algorithm (GA). The results are based on the particular parameters used in experimentation. Here several parameters are analyzed for this problem on the following datasets:
Educational data mining dataset: the KDD Cup 2010 bridge-to-algebra (kddb) dataset.
Gene expression datasets: the Leukemia and Colon Cancer datasets (used in Experiments 1 to 6 below).
The purpose of this article is to show a comparative analysis of experiments and to show that the choice of kernel depends on the dataset (the kind of problem), the number of iterations performed, the parameters used and, more specifically, the aim of the problem. Here the aim is feature selection, which means reducing the dataset to a smaller number of features while retaining accuracy. The fitness function for this problem is a weighted mean of accuracy and number of features (a sketch of such a fitness function is given after the kernel list below). In the following experiments the weights are chosen so that the number of features selected remains below or equal to 20. Hence a decrease in accuracy is noticed below, given that the number of epochs is not changed.
Further, the results depend on a lot of parameters; this article illustrates results from one experimental setup, and these are not standard results. The results are not benchmarks; they are elaborated for experimental setups in labs and meant to be followed by mentors teaching Machine Learning Lab Work. For benchmark results, look into peer-reviewed research papers. The following SVM kernels are experimented with:
Linear
Radial Basis Function (RBF)
Polynomial Function
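To make the weighted fitness concrete, here is a minimal sketch of such a function in Matlab; the helper name evaluateSVM, the weight w and the normalization of the feature count are assumptions for illustration, not the exact code used in these experiments.

% Hypothetical weighted fitness for feature selection (lower is better).
% mask   : binary vector, 1 = feature selected
% data   : n x d data matrix, labels : n x 1 class labels
% w      : weight trading accuracy against subset size (assumed value, e.g. 0.8)
function f = featureSelectionFitness(mask, data, labels, w)
    selected = find(mask);
    if isempty(selected)
        f = Inf;                                   % discard empty feature subsets
        return;
    end
    acc = evaluateSVM(data(:, selected), labels);  % assumed helper returning accuracy in [0,1]
    ratio = numel(selected) / numel(mask);         % fraction of features kept
    f = w * (1 - acc) + (1 - w) * ratio;           % weighted mean of error and subset size
end

Giving more weight to the feature-count term (i.e. lowering w) pushes the search towards smaller subsets (for example, at most 20 features), usually at some cost in accuracy, which matches the behaviour reported in the experiments below.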
This article focuses on the way experiments are to be performed and analyzed, and on some results; it does not play a role in benchmarking or changing existing theories. It is just an elaboration for academicians and students to learn the art of experimentation for feature selection using SVM, GA and PSO.
Experiment 1: Leukemia Dataset, Linear SVM
Linear Kernel.
The final accuracy reached was: 100
Number of features selected: 9
The following shows the graph of fitness function versus epochs
Experiment 2: Leukemia Dataset, RBF Kernel
Radial Basis Kernel. The accuracy attained is 97.222. The following shows the graph of fitness function versus epochs.
The accuracy reached was 97.222 for 38 features in 2000 epochs. The best 10 features were obtained for the SVM. After 1400 epochs there was not much decrease in the value of the optimizing function.
Experiment 3: Leukemia Dataset, Polynomial Kernel
The fitness chosen gives higher weightage to the number of selected features than to accuracy. The final accuracy reached was: 87.5
iterations: 2000
Number of features: 10
The following shows the graph of fitness function versus epochs
Experiment 4: Colon Cancer Dataset, Linear SVM
The final accuracy reached was: 98.3871
Number of features: 19
The best 19 features were obtained for the linear SVM. After 700 epochs there was not much decrease in the value of the fitness function. The following shows the graph of fitness function versus epochs.
The following graph shows the number of features in red, the accuracy in blue and the minimization of the fitness value in green. It is clear that after 800 epochs no further minimization of features occurs; hence the algorithm has converged to 19 features as optimal.
Experiment 5: Colon Cancer Dataset, RBF Kernel
Radial Basis Kernel. The accuracy obtained is 91.935.
The fitness chosen gives higher weightage to the number of selected features than to accuracy. The final accuracy reached was 91.935 for 14 features. After 700 epochs there was not much decrease in the value of the optimizing function.
Experiment 6: Colon Cancer Dataset, Polynomial SVM
The fitness chosen gives higher weightage to the number of selected features than to accuracy.
The final accuracy reached was: 76
iterations: 1000
Number of features obtained: 12
The best 12 features were obtained. After 900 epochs there was not much decrease in the value of the fitness function. The following shows the graph of fitness function versus epochs.
The following graph shows the number of features in red, the accuracy in blue and the minimization of the fitness value in green. It is clear that after 900 epochs not much minimization of features occurs; hence the algorithm has converged to 12 features as optimal.
Experiment 7: Huge Dataset of 30,000,000 (3 crore) features
Total data testing accuracy: 87.4392%
Accuracy after selecting all 30000 non-zero (nnz) features: 88.4387%
The total number of features is approximately 3 crore (30 million). It took 2 days on the given system to calculate the F-scores. Looking closely at the data, it has a lot of sparseness, so only those features were taken which are non-zero, i.e. non-sparse features, as sparse features won't contribute to the discriminant. This has been used as a filtering method. A lot of time was spent in calculating F-scores to filter the data.
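As an illustration of this filtering step, the sketch below computes a Fisher-style F-score per feature for a binary-labelled problem and records which features are non-zero for at least one sample; the exact scoring formula used in the original experiment is not given in this article, so this is one common variant rather than the author's code.

% Hypothetical F-score based filter (one common definition, not the author's exact code).
% X : n x d data matrix (use full(X) if the data is stored sparse), y : labels in {0,1}
function [scores, nonSparse] = fscoreFilter(X, y)
    nonSparse = find(any(X ~= 0, 1));      % features that are non-zero for at least one sample
    Xp = X(y == 1, :);  Xn = X(y == 0, :);
    mAll = mean(X, 1);  mP = mean(Xp, 1);  mN = mean(Xn, 1);
    vP = var(Xp, 0, 1); vN = var(Xn, 0, 1);
    % Between-class separation over within-class spread; eps avoids division by zero.
    scores = ((mP - mAll).^2 + (mN - mAll).^2) ./ (vP + vN + eps);
end

The non-sparse features can then be ranked by their scores and the top 5000, 10000, and so on kept, as in the accuracy figures below.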
Accuracy with the non-sparse data, features arranged in increasing value of F-scores (total accuracy, %):
5000 features: 88.4387
10000 features: 88.0772
15000 features: 87.7453
20000 features: 87.4188
25000 features: 87.1324
30000 features: 86.9255
Results (accuracy, %) of different SVM classifiers, varying the number of features from 5000 to 30000 in steps of 5000:
| Features | Liblinear | SVM RBF | SVM polynomial |
|---|---|---|---|
| 5000 | 88.4387 | 88.772 | 88.7725 |
| 10000 | 88.0772 | 88.772 | 88.7725 |
| 15000 | 87.7453 | 88.772 | 88.7725 |
| 20000 | 87.4188 | 88.772 | 88.7725 |
| 25000 | 87.1324 | 88.772 | 88.7725 |
| 30000 | 86.9255 | 88.772 | 88.7725 |
Plot of the results of different SVM classifiers, varying the number of features from 5000 to 30000 in steps of 5000.
More reduction in data features after 5000 features using PSO
PSO with C1=3, C2=3
Min number of features obtained: 415
population size=5
Accuracy: 75
Iterations:500
Convergence graph is given as follows:
PSO with alpha=0.8 and beta=0.2, C1=3, C2=3
Iterations:500
popsize=5
testing accuracy: 83.66
Min number of Features Obtained= 868
More reduction in data features after 5000 features using GA
GA 500 epochs
population size=5
Accuracy reached: 88.2
Minimum no of features: 1999
More reduction in data features after 5000 features using GA two times
Double application of GA: 79.398%
min features: 199
More reduction in data features after 5000 features using GA, 3000 epochs
GA 3000 epochs
Reduced minimum number of features obtained: 367; more could be reduced with more epochs
Accuracy: 86.563%
Convergence graph is as follows:
More reduction in data features after 5000 features using the Forward Selection Wrapper Method
Forward selection:
Reduced number of minimum features: 197
testing accuracy: 88.6904
More reduction in data features after 5000 features using Backward Selection Wrapper Method
This method did not perform well in reducing the number of features much.
These results show that there is a tradeoff between accuracy and the number of features selected. While accuracy also depends on the number of epochs, the optimal fitness value changes when the fitness function, taken as a weighted mean targeting higher accuracy and a smaller feature subset on the training data, is changed. Further, this also shows that fitting a more complex hyperplane may take more time to converge, as considered in the experimentation. The best method for reducing features after filtering seems to be forward selection. Further, the results depend on a lot of parameters and on the filtering and wrapping techniques followed as pre-processing and post-processing; this article is an illustration of results in one experimental setup and these are not standard results.
In this article some experiments and their results are discussed for the minimization of the Rastrigin function, a famous mathematical function used in the evaluation of Optimization Techniques. The experiments are performed on a certain setup, parameters and system. This has been done using the Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). The fitness function is the given function to be optimized. This article focuses on the way experiments are to be performed and analyzed in a student lab and by mentors teaching Machine Learning; the results do not play a role in benchmarking or changing existing theories. It is just an elaboration for academicians and students to learn the art of optimizing a function with the help of GA and PSO.
This function is given by
f(x) = 10n + Σ (i = 1 to n) [ xi^2 - 10 cos(2π xi) ]
The function is usually evaluated on xi ∈ [-5.12, 5.12], for all i = 1, …, n. This has been tried on 5 different values of n between 10 and 100 as follows, taking N as the size of the chromosome for the GA and PSO implementations and a population size of 10.
The Matlab implementation of the fitness function is:
%Fitness function (Rastrigin)
function z = testfunction(x)
%z = (x'*x);
[M,N] = size(x);
z = [];
for j = 1:M
    sum = 0;   % reset the accumulator for each candidate row
    for i = 1:N
        sum = sum + (x(j,i)*x(j,i) - 10 * cos(2*3.14*x(j,i)));
    end
    z = [z; sum + 10*N];
end
end
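As a quick check (not part of the original script), the function can be evaluated at the origin, where the Rastrigin function attains its known global minimum of 0, and on a random population:

N = 10;                                   % chromosome length
pop = -5.12 + 10.24 * rand(10, N);        % 10 random candidates in [-5.12, 5.12]
z  = testfunction(pop);                   % column vector of fitness values
z0 = testfunction(zeros(1, N))            % equals 0 at the origin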
The following table shows the experimental results.
| N | GA final minimum value after 500 iterations | PSO global minimum after 500 iterations, C1=1, C2=2 | PSO global minimum after 500 iterations, C1=3, C2=3 | PSO global minimum after 500 iterations, C1=3, C2=2 | PSO global minimum after 500 iterations, C1=2, C2=3 |
|---|---|---|---|---|---|
| 10 | 27.8663 (500 epochs) / 9.1 in 1000 epochs | 12.723 | 4.0002 | 8.4609 | 7.0004 |
| 20 | 113.9635 (500 epochs) | 74.584 | 13.001 | 31.879 | 29.679 |
| 40 | 399.6707 (500 epochs) / 235 in 1000 epochs | 191.02 | 43.23 | 127.1 | 115.58 |
| 60 | 918.9643 (500 epochs) / 619 (1000 epochs) | 372.2 | 96.065 | 193.74 | 189.75 |
| 80 | 1392.1043 (500 epochs) | 489.16 | 220.28 | 314.04 | 286.96 |
The general analysis of the results is that PSO performs better than GA here, given the same number of iterations, though the GA solutions decrease gradually while the PSO solutions oscillate. Within PSO, c1=3, c2=3 seems to perform best, while c1=1, c2=2 seems to perform the worst in achieving the global minimum. More details and comments are given below.
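For reference, the c1 and c2 coefficients compared above enter the standard PSO velocity update as the cognitive and social weights. A minimal, self-contained PSO loop over the fitness function above might look like the sketch below; the inertia weight w, the velocity clamp and the bound handling are assumptions, since the article does not list those settings.

% Minimal PSO sketch for the Rastrigin fitness above (w and vmax are assumed values).
N = 10; popSize = 10; iters = 500; c1 = 3; c2 = 3; w = 0.7; vmax = 1;
x = -5.12 + 10.24 * rand(popSize, N);      % particle positions
v = zeros(popSize, N);                     % particle velocities
pbest = x; pbestVal = testfunction(x);     % personal bests
[gbestVal, g] = min(pbestVal); gbest = x(g, :);
for t = 1:iters
    r1 = rand(popSize, N); r2 = rand(popSize, N);
    v = w*v + c1*r1.*(pbest - x) + c2*r2.*(repmat(gbest, popSize, 1) - x);
    v = max(min(v, vmax), -vmax);          % clamp velocity
    x = max(min(x + v, 5.12), -5.12);      % move and stay inside the domain
    fx = testfunction(x);
    improved = fx < pbestVal;              % update personal bests
    pbest(improved, :) = x(improved, :);
    pbestVal(improved) = fx(improved);
    [curVal, g] = min(pbestVal);
    if curVal < gbestVal
        gbestVal = curVal; gbest = pbest(g, :);
    end
end
gbestVal                                   % best Rastrigin value found by the swarm

Larger c1 pulls particles towards their own best positions and larger c2 towards the swarm's best, which is why the (c1, c2) settings compared in the table behave differently.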
Green: PSO1, with parameters c1=1, c2=2: oscillates, and convergence is not as fast as with the other parameter settings.
Blue: PSO2, with parameters c1=3, c2=3: oscillates and moves towards the minimum.
Red: PSO3, with parameters c1=3, c2=2: oscillates and moves towards the minimum.
Black: PSO4, with parameters c1=2, c2=3: oscillates and moves towards the minimum.
Dots: GA, still decreasing and more stable, though its global minimum is not lower than PSO's.
Comparing GA and PSO, GA does not seem to oscillate and is stable, though its minimum is not attained as fast.
PSO1 with c1=1, c2=2 does not seem to be as fast in attaining the minimum as the other parameter settings, which are almost equivalent in attaining the minimum. Here is the separate result for c1=1, c2=2.
Plot of the three competing models is as follows:
Results for N=60
Green: PSO1, with parameters c1=1, c2=2: oscillates, and convergence is not as fast as with the other parameter settings.
Blue: PSO2, with parameters c1=3, c2=3: oscillates and moves towards the minimum.
Red: PSO3, with parameters c1=3, c2=2: oscillates and moves towards the minimum.
Black: PSO4, with parameters c1=2, c2=3: oscillates and moves towards the minimum.
Dots: GA, still decreasing and more stable, though its global minimum is not lower than PSO's.
Comparing GA and PSO, GA does not seem to oscillate and is stable, though its minimum is not attained as fast.
A plot of the three competing models is as follows:
PSO1 with c1=1, c2=2 does not seem to be as fast in attaining the minimum as the other parameter settings, which are almost equivalent in attaining the minimum. Here is the separate result for c1=1, c2=2.
Results for N=40
Green: PSO1, with parameters c1=1, c2=2: oscillates, and convergence is not as fast as with the other parameter settings.
Blue: PSO2, with parameters c1=3, c2=3: oscillates and moves towards the minimum.
Red: PSO3, with parameters c1=3, c2=2: oscillates and moves towards the minimum.
Black: PSO4, with parameters c1=2, c2=3: oscillates and moves towards the minimum.
Dots: GA, still decreasing and more stable, though its global minimum is not lower than PSO's.
Comparing GA and PSO, GA does not seem to oscillate so much and is stable, though its minimum is not attained as fast.
PSO1 with c1=1, c2=2 does not seem to be as fast in attaining the minimum as the other parameter settings, which are almost equivalent in attaining the minimum. Here is the separate result for c1=1, c2=2.
Results for N=20
Green: PSO1, with parameters c1=1, c2=2: oscillates, and convergence is not as fast as with the other parameter settings.
Blue: PSO2, with parameters c1=3, c2=3: oscillates and moves towards the minimum.
Red: PSO3, with parameters c1=3, c2=2: oscillates and moves towards the minimum.
Black: PSO4, with parameters c1=2, c2=3: oscillates and moves towards the minimum.
Dots: GA, still decreasing and more stable, though its global minimum is not lower than PSO's.
Comparing GA and PSO, GA does not seem to oscillate and is stable, though its minimum is not attained as fast.
PSO1 with c1=1, c2=2 does not seem to be as fast in attaining the minimum as the other parameter settings, which are almost equivalent in attaining the minimum. Here is the separate result for c1=1, c2=2.
Results for N=11
Green: PSO1, with parameters c1=1, c2=2: oscillates, and convergence is not as fast as with the other parameter settings.
Blue: PSO2, with parameters c1=3, c2=3: oscillates and moves towards the minimum.
Red: PSO3, with parameters c1=3, c2=2: oscillates and moves towards the minimum.
Black: PSO4, with parameters c1=2, c2=3: oscillates and moves towards the minimum.
Dots: GA, still decreasing and more stable, though its global minimum is not lower than PSO's.
Comparing GA and PSO, GA does not seem to oscillate and is stable, though its minimum is not attained as fast.
PSO1 with c1=1, c2=2 does not seem to be as fast in attaining the minimum as the other parameter settings, which are almost equivalent in attaining the minimum. Here is the separate result for c1=1, c2=2.
The experiments are performed on a certain setup, parameters and system. The general impressions and analysis are given at the end of each experiment conducted. Changing the parameters, the mutation functions and the population used for creating new chromosomes in the case of GA drastically affects the results obtained. In a similar manner, for PSO several parameters and initializations affect the results, apart from the C1 and C2 values experimented with above. The results are not benchmarks; they are elaborated for experimental setups in labs and meant to be followed by mentors teaching Machine Learning Lab Work. For benchmark results, look into peer-reviewed research papers.
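As one concrete example of the GA design choices mentioned above, a simple real-coded Gaussian mutation operator might look like the following sketch; the per-gene mutation probability and step size are illustrative assumptions, not the values used in these experiments.

% Hypothetical Gaussian mutation for a real-coded chromosome on [-5.12, 5.12].
% pm    : per-gene mutation probability (assumed, e.g. 0.1)
% sigma : standard deviation of the Gaussian step (assumed, e.g. 0.3)
function child = gaussianMutation(chrom, pm, sigma)
    genes = rand(size(chrom)) < pm;               % which genes to perturb
    child = chrom + genes .* (sigma * randn(size(chrom)));
    child = max(min(child, 5.12), -5.12);         % keep within the Rastrigin domain
end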
This article is for education and learning purposes. The aim is to understand how to start experimentation in the area of Neural Networks, how to compare results and what features to consider while doing experiments. It is a handy tool for those doing self-learning in the area of Machine Learning, and a good start for educators who want to impart education in these areas and want to know the art of setting assignments and what to expect. Further, the results are not benchmarks; they are elaborated for the explanations given ahead. For benchmark results, look into peer-reviewed research papers.
Here we have implemented the backpropagation algorithm and tested it on a subset of the original MNIST dataset for the 2-class problem of digit image recognition.
Backpropagation with MNIST data for the binary case of digits 3 and 8
This article focuses on the way experiments are to be performed and analyzed, and on some results; it does not play a role in benchmarking or changing existing theories. It is just an elaboration for academicians and students to learn how Neural Networks can be used for two-class classification of image data. Further, the results vary with changes in the number of hidden layers, initial weights and other settings.
I have written a script to collect a subset of the data for the digits 3 and 8, called mnist_38_2.mat. The script takes the training data, training labels, testing data and testing labels and combines them into one data file. It extracts the 3 and 8 digits from the training and testing data and creates a single mat file containing the training and testing data along with their labels.
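The extraction script itself is not listed in this article. A minimal sketch of what it might look like is given below; the raw file name and the variable names trainData, trainLabels, testData and testLabels are assumptions, and the actual script may differ.

% Hypothetical sketch of the 3-vs-8 extraction step (file and variable names are assumptions).
raw = load('mnist_all.mat');                               % assumed raw MNIST file
trIdx = raw.trainLabels == 3 | raw.trainLabels == 8;       % keep only digits 3 and 8
teIdx = raw.testLabels  == 3 | raw.testLabels  == 8;
train = [raw.trainData(trIdx, :), raw.trainLabels(trIdx)]; % 784 pixel columns + label column
test  = [raw.testData(teIdx, :),  raw.testLabels(teIdx)];
save('mnist_38_2.mat', 'train', 'test');                   % combined 3/8 subset used later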
Note: I have performed far fewer epochs in these experiments due to constraints of the computing device used; this is much less than what is feasible when GPUs are used. But all this is for illustrative purposes only. You can ask your students to perform more iterations to reach higher accuracy and even to test various numbers of hidden layers.
Here are the details of the implementation of the backpropagation algorithm that I have written for the binary case of digit 3 versus digit 8 classification. I have written the code in Matlab; the Matlab file name is backpropogation_38. Appropriate self-explanatory comments are given in the code. Here is a brief summary of the algorithm being implemented.
This is the main method of the file, from where execution starts. This is for the two-class problem; for more than two classes the code changes. Digit 3 is taken as the positive class and digit 8 as the other class. In this method the data is read from the file mnist38All.mat, and then 10 folds of the data are created. Experiments were performed with crossvalind with a holdout of 0.1, 0.2 and 0.3 to place 10%, 20% and 30% of the elements in the testing set and the remaining ones in the training set. Further, vtot is defined as a vector which stores the average values of the following quantities for the final result: [TP, TN, FP, FN, Precision, Recall, F-measure, Accuracy]. The numbers of hidden and output neurons are also defined in this method; experiments were done with the number of hidden neurons equal to 100.
Inside the loop over the ten folds, a call to the backpropagation training function is made on each training set generated; the evaluation is then performed on the corresponding testing set. In the for loop, trainBP is used for training and testBP for testing.
trainBP is the main function that does the training of the network. Its parameters and return values are listed below, followed by the details of the procedure:
Parameters:
trainingSet: The training data as obtained by crossvalind
num_Hidden: The number of hidden nodes
num_Output: The number of output nodes
trainingLabels: The labels of the training data; in the case of one output the number of columns is one, otherwise it is equal to the number of output nodes
Return Arguments:
weights_1_ij:weights from input to hidden
weights_2_ij:weights from hidden to output
biasInput: the bias from input to hidden
biasHidden: the bias from hidden to output
Details:
Here, in the training, parameters such as the learning rate are assigned; the learning rate is set equal to 1/sqrt(iteration). Then each training pattern is presented one by one within the loop in each iteration. The maximum number of iterations is set to 5000 due to the computationally large size of the dataset, which takes a long time per iteration. The loop runs until either the maximum number of iterations is reached or the error in the computed values is less than the permissible error of 0.001. Further, for each input the updated weights are computed:
S1(j) = S1(j) + weights_1_ij(i,j) * x(i); (w1·x, the net input at the jth hidden neuron)
S2(j) = S2(j) + weights_2_ij(i,j) * h(i); (w2·h, the net input at the jth output neuron)
delta_2_weights_2_ij(j) = O(j)*(1-O(j))*(Y(k)-O(j)); (delta 2 for each output neuron)
sum = sum + delta_2_weights_2_ij(l) * weights_2_ij(j,l); (accumulating the back-propagated error from the output layer)
delta_1_weights_1_ij(j) = h(j)*(1-h(j))*sum; (delta 1 for each hidden neuron)
weights_1_ij(i,j) = weights_1_ij(i,j) + eta * delta_1_weights_1_ij(j) * x(i); (weight update from the input to the hidden layer)
weights_2_ij(i,j) = weights_2_ij(i,j) + eta * delta_2_weights_2_ij(j) * h(i); (weight update from the hidden to the output layer)
This method is for testing the backpropagation algorithm written using the iterative methodology (non-matrix-based weight updates). testBP is the main function that does the testing of the network. Its parameters and return values are listed below, followed by the details of the procedure:
Parameters:
weights_1_ij:weights from input to hidden
weights_2_ij:weights from hidden to output
biasInput: the bias from input to hidden
biasHidden: the bias from hidden to output
testingSet: the testing set as created by the 10-fold crossvalind
testLabel: the testing labels corresponding to the testingSet
num_Hidden: The number of hidden layer neurons
num_Output: number of output layer nodes
Return Arguments:
correctlyClassified
count3: The number of misclassified class 3 elements
count8: The number of misclassified class 8 elements
unClassified: the matrix containing 5 misclassified data elements from each of the given classes
v: a vector returning TP, TN, FP, FN (i.e. the confusion matrix entries); it also returns precision, recall, F-value and accuracy
Details:
In this method each pattern in the testingSet is tested. The net input to the hidden layer is calculated, the activation function is applied to it, and the outputs of the hidden layer are evaluated. Further, the net input at the output layer is computed and the activation function is applied to get the net output. The net output is then compared with the expected output to evaluate TP, FP, TN, FN, precision, recall and accuracy.
The results are evaluated as explained above with the maximum number of epochs equal to 50, as it was taking a long time to compute the results. The above three functions are executed and the results computed; 100 hidden neurons were used.
The following experiments were conducted; the results are as follows:
| Setting | Number of Epochs | TP | TN | FP | FN | Precision | Recall | F-Value | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| 0.1 holdout | 3000 | 682 | 0 | 714 | 0 | .4885 | 1 | .4885 | 0.4885 |
| Average of 10 folds, for each fold | 100 | 682 | 0 | 714 | 0 | .4885 | 1 | .4885 | 0.4885 |
Only this many experiments were conducted due to the constraint of the time it takes to run a large number of epochs.
Once the settings of the initial weights and the training and testing data values were changed, the number of epochs required changed drastically and the accuracy increased considerably. Here are the results.
| Setting | Number of Epochs | TP | TN | FP | FN | Precision | Recall | F-Value | Accuracy |
|---|---|---|---|---|---|---|---|---|---|
| Average over all folds, 100 epochs, 100 hidden layer neurons | 100 | 679 | 486 | 228 | 3 | .74 | .99 | .846 | 0.83 |
The following accuracy was obtained, but since it was taking too much time, the full ten folds could not be computed. This accuracy was computed 2 times, with 350 epochs. I have attached the final weights for this run with the code.
| Setting | TP | TN | FP | FN | Precision | Recall | F-Value | Accuracy |
|---|---|---|---|---|---|---|---|---|
| BackPropagation with 100 hidden layer neurons | 682 | 697 | 17 | 0 | .9757 | 1 | .9877 | .9878 |
Code
function BackPropogation38()
disp('..Starting BackPropogation38 Algorithm....');
%reading data
A = load('mnist38All.mat');
%v=[ TP,TN,FP,FN, Precision, Recall, F-measure, accuracy]
vtot=[0, 0, 0, 0, 0, 0,0 ,0];
data =[A.train;A.test];
%10 folds with 10-90 ratio of testing and training
for i = 1:10
P=.1;
groups=data(:,785);
[train,test] = crossvalind('holdout',groups, P);
train1= data(train, 1: 785);
test1=data(test, 1: 785);
num_Hidden = 100;
num_Output = 1; % two for binary case else 10.
[rowtrain,coltrain]=size(train1);
[rowtest,coltest]= size(test1);
%initializing the training and testing matrices and labels
trainingLabels(1:rowtrain) = train1(:,coltrain);
trainingSet(1:rowtrain,1:coltrain-1)=0;
trainingSet(1:rowtrain,1:coltrain-1) = train1(1:rowtrain,1:coltrain-1);
trainingLabels(1:rowtrain)= train1(1:rowtrain,coltrain);
testSet(1:rowtest,1:coltest-1)=0;
testLabels(1:rowtest)=test1(:,coltrain);
testSet(1:rowtest,1:coltest-1) = test1(1:rowtest,1:coltest-1);
testLabels(1:rowtest)= test1(1:rowtest,coltest);
for n1=1:rowtrain
for n2=1:coltrain-1
if trainingSet(n1,n2) >0
trainingSet(n1,n2)=1;
end
end
end
for n1=1:rowtest
for n2=1:coltest-1
if testSet(n1,n2) >0
testSet(n1,n2)=1;
end
end
end
[weights_1_ij, weights_2_ij, biasInput, biasHidden] = trainBP(trainingSet, num_Hidden,num_Output, trainingLabels,testLabels,testSet);
[correctlyClassified,count3,count8,unClassified,v] = testBP(testSet,weights_1_ij, weights_2_ij,biasInput, biasHidden, testLabels, num_Hidden,num_Output);
%v stores the output containing TP, FP etc.; vtot is the total of all
%such over 10 folds
vtot = vtot + v;
correctlyClassified
count3
count8
end
%computing average of 10 folds
vtot = vtot ./i
end
function [weights_1_ij, weights_2_ij, biasInput, biasHidden] = trainBP(trainingSet, num_Hidden, num_Output, trainingLabels, testLabels, testSet)
%learning rate
eta =1;
maxEpochs=100;
errorBound=0.001;
[trainLengthRow, trainLengthCol] = size(trainingSet);
num_Input = trainLengthCol;
Y = trainingLabels;
%assigning initial weights
weights_1_ij(1:num_Input,1:num_Hidden) = 0.01 ;
weights_2_ij(1:num_Hidden,1:num_Output) = 0.01 ;
biasInput(1:num_Hidden) = 0.01 ;
biasHidden(1:num_Output) = 0.01 ;
delta_1_weights_1_ij(1:num_Hidden)= 0;
delta_2_weights_2_ij(1:num_Output)= 0;
epochs =1;
error = 1;
while((epochs < maxEpochs) && (error > errorBound))
eta=1/sqrt(epochs);
error=0;
epochs = epochs+1
for k =( 1: trainLengthRow )
x = trainingSet(k,:) ;
S1(1:num_Hidden)=0;
S2(1:num_Output)=0;
%calculating weights
for j =(1:num_Hidden)
for i =(1:num_Input)
S1(j) = S1(j) + weights_1_ij(i,j) * x(i) ;
end;
S1(j) = S1(j) + biasInput(j) * 1;
end;
h(1:num_Hidden) = 0;
for j =(1:num_Hidden)
h(j)= 1/(1+exp(-1*S1(j)));
end;
for j =(1:num_Output)
for i =(1:num_Hidden)
S2(j) = S2(j) + weights_2_ij(i,j) * h(i) ;
end;
S2(j) = S2(j) + biasHidden(j) * 1;
end;
O(1:num_Output) = 0;
for j =(1:num_Output)
O(j)= 1/(1+exp(-1*S2(j)));
end;
%calculating weights
for j =(1:num_Output)
delta_2_weights_2_ij(j) = O(j)*(1-O(j))*(Y(k)-O(j));
end;
for j =(1:num_Hidden)
sum = 0;
for l=(1:num_Output)
sum = sum + delta_2_weights_2_ij(l) * weights_2_ij(j,l) ;
end;
delta_1_weights_1_ij(j) = h(j)*(1-h(j))*sum;
end;
%updating weights
%calculating new weights
for i =(1:num_Input)
for j =(1:num_Hidden)
weights_1_ij(i,j) = weights_1_ij(i,j) + eta * delta_1_weights_1_ij(j) * x(i) ;
end;
end;
%computing bias
for j =(1:num_Output)
biasHidden(j) = biasHidden(j) + delta_2_weights_2_ij(j) * 1 ;
end;
%updating weights
%calculating new weights
for i =(1:num_Hidden)
for j =(1:num_Output)
weights_2_ij(i,j) = weights_2_ij(i,j) + eta * delta_2_weights_2_ij(j) * h(i) ;
end;
end;
%computing bias
for j =(1:num_Hidden)
biasInput(j) = biasInput(j) + delta_1_weights_1_ij(j) * 1 ;
end;
end
if mod(epochs, 10) == 0   % save a checkpoint every 10 epochs
save('weights.mat','weights_1_ij','weights_2_ij','biasInput','biasHidden','testLabels','testSet','trainingSet','trainingLabels');
end
end % while loop over epochs
end % trainBP
function [correctlyClassified,count3,count8,unClassified,v] = testBP(testingSet,weights_1_ij, weights_2_ij,biasInput, biasHidden, testLabel, num_Hidden,num_Output)
correctlyClassified = 0;
count3 = 0; count8=0; TP=0; TN=0; FP=0; FN =0; P=0; R=0; F=0;
[testLengthRow,testLengthCol]=size(testingSet);
unClassified(1:10 ,1: testLengthCol) = 0;
% checking accuracy by number of correctly classified
for k=(1: testLengthRow )
x=testingSet(k,:);
S1(1:num_Hidden)=0;
S2(1:num_Output)=0;
num_Input =testLengthCol;
%calculating
for j =(1:num_Hidden)
for i =(1:num_Input)
S1(j) = S1(j) + weights_1_ij(i,j) * x(i) ;
end;
S1(j) = S1(j) + biasInput(j) * 1;
end;
h(1:num_Hidden) = 0;
for j =(1:num_Hidden)
h(j)= 1/(1+exp(-1*S1(j)));
end;
for j =(1:num_Output)
for i =(1:num_Hidden)
S2(j) = S2(j) + weights_2_ij(i,j) * h(i) ;
end;
S2(j) = S2(j) + biasHidden(j) * 1;
end;
O(1:num_Output) = 0;
for j =(1:num_Output)
O(j)= 1/(1+exp(-1*S2(j)));
end;
% error as output approaching target
if sqrt( (round(O(1))- (testLabel(k))) * (round(O(1))- (testLabel(k)) )) == 0
% correctly classified examples
correctlyClassified=correctlyClassified+1;
%compute TP, TN
if(testLabel(k)==1)
TP = TP+1;
else
TN = TN +1;
end
else
% wrongly classified examples
if(testLabel(k)==1)
FN = FN+1;
else
FP = FP +1;
end
%storing 5 misclassified classes from each class
if(count8<5 && testLabel(k)==0)
count8 = count8 + 1;
unClassified(count8,1: testLengthCol) = testingSet(k,1: testLengthCol);
end
if(count3<5 && testLabel(k)==1 )
count3 = count3 + 1;
unClassified(count3+5,1: testLengthCol) = testingSet(k,1: testLengthCol);
end
end
end
k
%for storing TP, TN, FP, FN, Precision, Recall,
%F value, accuracy
P = TP/(TP+FP);   % precision
R = TP/(TP+FN);   % recall
v = [TP, TN, FP, FN, P, R, 2*P*R/(P+R), correctlyClassified/testLengthRow]
disp('TP, TN, FP, FN, Precision, Recall, F-Value, Accuracy');
unClassified;
accuracy = correctlyClassified/testLengthRow ;
accuracy
end
Backpropagation algorithm for learning all 10 digits
Applying the BPNN classifier to the original multiclass data with the same experimental settings as in the binary case above, the results are not as high, and only a few epochs could be performed, again due to limits on the computing capacity of the systems used for experimentation. This is because the complete Neural Network and the initial weights have to change with the change in problem. The outputs are 10 classes represented in the form of 1's and 0's, one combination per digit. In conclusion, parameter experiments for the 10-class data still need to be performed.
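For the 10-class case, the scalar label per sample is replaced by a 1-of-10 target vector, roughly as in the hypothetical helper below (not part of the attached code).

% Hypothetical 1-of-10 encoding of digit labels 0..9 for a 10-output network.
function Y = oneHotEncode(labels)
    n = numel(labels);
    Y = zeros(n, 10);                 % one row per sample, one column per digit
    for k = 1:n
        Y(k, labels(k) + 1) = 1;      % digit d is marked in column d+1
    end
end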
Misclassified Images for 2 class digit recognition problem
Code was written in each of the algorithms to store 5 misclassified samples from each class in a mat file, along with a script (in the same folder) to read each misclassified row from this mat file and convert it into a 28 x 28 matrix in image form. The script is as follows:
%This is a script that reads the misclassified data stored in a mat file
%and displays the image corresponding to each row.
data = load('mat38unclassified.mat');
A = data.unClassified;
[r,c] = size(A);
%read all r data that are present in unclassified mat file
for t=1 : r
k=1;
x(1:c)=A(t,1:c);
a(1:28,1:28)=0;
%convert it back to matrix form
for i=1: 28
for j =1: 28
a(i,j)= x(k);
k=k+1;
end
end
% display each of the figure separately
figure(t);
imshow(a);
end
The following are misclassified samples from the 3-and-8-digit backpropagation program for the 2-class case.
The following are misclassified "3" digits.
Comparison with other ML Techniques
| Method | TP | TN | FP | FN | Precision | Recall | F-Value | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Logistic regression | 936 | 990 | 20 | 38 | .9791 | .9610 | .9699 | .9708 |
| Naïve Bayes | 938 | 903 | 72 | 71 | .928 | .928 | .928 | .92 |
Comment: Logistic Regression performs the best, and Naïve Bayes also performs well.
The RNN takes a large amount of time to learn, so its execution time is very high.
Naïve Bayes requires an extra overhead of discretizing the data. Otherwise, Naïve Bayes is a fast algorithm in terms of execution time, as its output is computed directly and does not require iterations.
The accuracy in these experiments is limited by execution time and system restrictions, which prevented full optimization of the error and hence a further increase in accuracy. This does not mean that a particular method does not perform well; it is just an experimental setup, for education and learning purposes, on how to go ahead with experimentation.
Weka was used for Naïve Bayes, and its settings cannot be changed; the true positive rate and false positive rate come out to be 0.92 and 0.926 respectively, which lies in the upper right-hand region of the ROC graph.
Naive Bayes ROC curve: Using the Weka GUI it is not possible to create a full ROC curve, though the TPR and FPR are given by (.926, .92). Also see the screenshot below for details. Only two-point plots are possible.
ROC for implemented Logistic Regression:
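For the implemented logistic regression, an ROC curve can be traced by sweeping the decision threshold over the predicted probabilities, roughly as in the sketch below; the vectors y (true 0/1 labels) and p (predicted probabilities) would come from the logistic regression run, which is not listed in this article, so placeholder values are used here.

% Hypothetical ROC sweep; replace the placeholder y and p with the real labels and scores.
y = double(rand(200, 1) > 0.5);                    % placeholder true 0/1 labels
p = 0.7 * y + 0.3 * rand(200, 1);                  % placeholder predicted probabilities
thr = sort(unique([0; p(:); 1]), 'descend');       % candidate decision thresholds
tpr = zeros(size(thr)); fpr = zeros(size(thr));
for t = 1:numel(thr)
    yhat = p >= thr(t);
    tpr(t) = sum(yhat & y == 1) / max(sum(y == 1), 1);   % true positive rate
    fpr(t) = sum(yhat & y == 0) / max(sum(y == 0), 1);   % false positive rate
end
plot(fpr, tpr, '-o'); xlabel('False positive rate'); ylabel('True positive rate');
title('ROC sketch for logistic regression');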
Note again: far fewer epochs were performed in these experiments due to constraints of the computing device used; this is much less than what is feasible when GPUs are used. But all this is for illustrative purposes only. Also, an iterative approach has been used in the code implemented for illustration; modern approaches use matrix computations for efficiency, given the advances in processors and computing facilities for matrix computations.