Preprocessing: I created three .mat files from the input, which was provided as a text file. I first imported the text file into a spreadsheet (.xls) and then copied the data into the .mat files.
NOTE: The derivation for linear and logistic regression is added at the end of the document.
I split each dataset randomly into 70% training and 30% testing data, and repeated this split 5 times (referred to as 5 folds below). I used the MATLAB function crossvalind to create the 70-30 split for each of the 5 runs, and then averaged the results over the five runs.
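The repeated 70-30 split described above can be sketched as follows. This is only an illustration: it uses randperm as a toolbox-free stand-in for crossvalind, and the toy data, the variable names numRuns, testRatio, trainCount and testCount are assumptions, not taken from the submitted code.

```matlab
% Toy data standing in for one of the .mat datasets (hypothetical values)
data   = (1:20)' * [1 2];               % 20 samples, 2 features
labels = [ones(10,1); -ones(10,1)];     % class labels +1 / -1

numRuns   = 5;       % number of random splits ("folds" in the text)
testRatio = 0.3;     % 30% of the samples are held out for testing
N         = size(data, 1);
numTest   = round(testRatio * N);

trainCount = zeros(numRuns, 1);
testCount  = zeros(numRuns, 1);

for r = 1:numRuns
    perm     = randperm(N);             % random ordering of the sample indices
    testIdx  = perm(1:numTest);         % first 30% of the ordering -> test
    trainIdx = perm(numTest+1:end);     % remaining 70% -> training

    trainingSet    = data(trainIdx, :);
    trainingLabels = labels(trainIdx);
    testSet        = data(testIdx, :);
    testLabels     = labels(testIdx);

    % A classifier would be trained and tested here; the per-run results
    % would then be averaged over the numRuns runs.
    trainCount(r) = numel(trainIdx);
    testCount(r)  = numel(testIdx);
end
```

Each run draws a fresh random split, so the same sample can appear in the test set of more than one run.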
Brief discussion about Linear Regression Classifier
The form I have used for the linear classifier is as follows:-
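% Design matrix: first column of ones for the bias, then the features
X(1:trainLengthRow, 1) = 1;
X(1:trainLengthRow, 2:trainLengthCol+1) = trainingSet;
% Normal equations: weights = (X'X)^(-1) X'Y
Z = X' * X;
weights = (inv(Z)) * (X' * Y);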
Here the first weight represents the bias.
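As a worked illustration of the formula above, the following sketch trains on a tiny made-up dataset. The data values are hypothetical, and two choices are assumptions rather than part of the submitted code: the backslash operator is used in place of inv (it solves the same normal equations), and samples are classified by the sign of the linear output.

```matlab
% Toy training data (hypothetical): 4 samples, 1 feature, labels in {-1,+1}
trainingSet    = [1; 2; 3; 4];
trainingLabels = [-1; -1; 1; 1];

trainLengthRow = size(trainingSet, 1);

% Design matrix with a leading column of ones for the bias
X = [ones(trainLengthRow, 1), trainingSet];
Y = trainingLabels;

% Least-squares weights: solve (X'X) * weights = X'Y
weights = (X' * X) \ (X' * Y);

% Assumption: classify by the sign of the linear output
predicted = sign(X * weights);
```

For this data the normal equations are 4a + 10b = 0 and 10a + 30b = 4, giving a bias of -2 and a feature weight of 0.8, so all four training samples are classified correctly.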
The code for classification using linear regression is written in MATLAB. It creates 5 random folds, each with a 70-30 ratio of training to testing data.
The methods in the code are as follows:-
Processing starts by reading the data from the .mat file and creating 5 folds, each in a 70-30 ratio for training and testing.
Method to train the linear regression model
trainingSet: The training set
trainingLabels: The training labels corresponding to the training set
weights: weights computed by linear regression as explained above
This function classifies the test set with the trained weights and compares the computed outputs against the test labels
testSet: The test set
testLabels: The corresponding test labels
weights: weights computed from the training
correctlyClassified: number of correctly classified samples
unClassified: up to 10 misclassified samples (at most 5 from each class)
v: vector that stores TP; TN; FP; FN; P (precision); R (recall); F (F-measure); Accuracy
count0: Number of misclassified class -1 samples recorded, up to a maximum of 5
count1: Number of misclassified class +1 samples recorded, up to a maximum of 5
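The entries of v can be computed along the following lines. This is a sketch under stated assumptions, not the submitted code: class +1 is taken as the positive class, F is taken to be the F1 score, a linear output of exactly zero is assigned to class +1, and the test data and weights are made up.

```matlab
% Hypothetical test data and weights (bias first), labels in {-1,+1}
testSet    = [1; 2; 3; 4; 5];
testLabels = [-1; 1; -1; 1; 1];
weights    = [-2; 0.8];

X = [ones(size(testSet, 1), 1), testSet];
predicted = sign(X * weights);
predicted(predicted == 0) = 1;          % assumption: ties go to class +1

% Confusion-matrix counts, with +1 assumed to be the positive class
TP = sum(predicted ==  1 & testLabels ==  1);
TN = sum(predicted == -1 & testLabels == -1);
FP = sum(predicted ==  1 & testLabels == -1);
FN = sum(predicted == -1 & testLabels ==  1);

P = TP / (TP + FP);                     % precision
R = TP / (TP + FN);                     % recall
F = 2 * P * R / (P + R);                % F1 score (assumed meaning of F)
accuracy = (TP + TN) / numel(testLabels);

correctlyClassified = TP + TN;
v = [TP; TN; FP; FN; P; R; F; accuracy];
```

If a class is never predicted, the divisions above produce NaN, so a real implementation would need to decide how to report that case.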
The logistic regression model is trained with the following parameters:-
numIteration = 1000; (maximum number of iterations)
eta = 0.5;
errorBound = 0.0001;
I tested various values of eta. The results reported here are for eta = 0.5.
I have also written two versions of the logistic regression code, one using the expanded (element-by-element) approach and another using shortened matrix manipulations. The results are similar for both versions. The code is for the two-class problem. Here is a brief description of both:-
The matrix version is as follows:-
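P = zeros(1, trainLengthRow);    % predicted probabilities, one per training sample
Y = trainingLabels';
% Design matrix: first column of ones for the bias, then the features
X = zeros(trainLengthRow, trainLengthCol+1);
X(1:trainLengthRow, 1) = 1;
X(1:trainLengthRow, 2:trainLengthCol+1) = trainingSet(1:trainLengthRow, 1:trainLengthCol);
% For each sample t, accumulate the weighted sum over the features j, then apply the sigmoid
for t = 1:trainLengthRow
    s = W_Old(1);    % bias term, assumed to be stored first; s replaces sum to avoid shadowing the built-in
    for j = 1:trainLengthCol
        s = s + W_Old(j+1)*trainingSet(t,j);
    end
    P(t) = 1/(1 + exp(-s));
end
Z = X' * (Y - P)';
% computing the new weights
W_New = W_Old + eta * Z';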
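Putting the update rule together with the parameters listed earlier gives a training loop roughly like the sketch below. Several details are assumptions, since the report does not state them: labels are taken to be 0/1, the weights start at zero, and errorBound is applied to the norm of the weight change between iterations. The toy data is made up and deliberately not linearly separable so that the weights settle.

```matlab
% Toy data (hypothetical): 6 samples, 1 feature, labels assumed to be 0/1
trainingSet    = [-2; -1; 1; -1; 1; 2];
trainingLabels = [0; 0; 0; 1; 1; 1];

numIteration = 1000;
eta          = 0.5;
errorBound   = 0.0001;

[trainLengthRow, trainLengthCol] = size(trainingSet);
X = [ones(trainLengthRow, 1), trainingSet];   % bias column, then features
Y = trainingLabels';                          % 1 x n row vector

W_Old = zeros(1, trainLengthCol + 1);         % assumption: zero initial weights

for iter = 1:numIteration
    % Sigmoid of the linear output for every sample, as a 1 x n row vector
    P = 1 ./ (1 + exp(-(X * W_Old')))';
    % Gradient of the log-likelihood with respect to the weights
    Z = X' * (Y - P)';
    % computing the new weights
    W_New = W_Old + eta * Z';
    % Assumed stopping rule: stop once the weights barely change
    if norm(W_New - W_Old) < errorBound
        break;
    end
    W_Old = W_New;
end
```

On this data the bias stays at zero by symmetry and the feature weight settles at a positive value after a handful of iterations, well before the numIteration limit.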
The code settings are the same as described above for linear regression. The ratio is 70-30 for training and testing: I used crossvalind with a holdout parameter of 0.3, which gives 30% testing and 70% training data, and took 5 such folds. The methods for training and testing the logistic regression are as follows.
Both versions of the code that I have implemented have the same method interface, with a slight difference in how the weights are stored. The formulas are given above and in the Appendix.
This method is for training the logistic regression model
trainingSet: the training set
trainingLabels: the labels corresponding to the training set
weights: the initial weights used to start training
weight0: The initial bias weight
weights: the final weights obtained from training
weight0: The bias weight
This method is called to test the accuracy of the trained model
testSet: the set of samples to be considered for testing
testLabels: the labels corresponding to testSet
weight0, weight: the weights corresponding to logistic regression
correctlyClassified: The number of correctly classified samples
unClassified: The array containing 5 misclassified data samples from each class
v: The vector that returns the computed values of TP; TN; FP; FN; P; R; F; accuracy
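Collecting the misclassified samples for unClassified might look like the sketch below. It is an illustration only: the 0.5 probability threshold, the 0/1 labels, the helper names maxPerClass, wrong0 and wrong1, and the test data and weights are all assumptions.

```matlab
% Hypothetical trained weights and test data; labels assumed to be 0/1
weight0    = 0;
weights    = 0.75;                       % one weight per feature (row vector)
testSet    = [-2; -1; 1; -1; 1; 2; 3];
testLabels = [0; 0; 0; 1; 1; 1; 0];

% Predicted probability of class 1, thresholded at 0.5 (assumed)
prob      = 1 ./ (1 + exp(-(weight0 + testSet * weights')));
predicted = double(prob >= 0.5);

correctlyClassified = sum(predicted == testLabels);

% Indices of misclassified samples, split by their true class
wrong  = find(predicted ~= testLabels);
wrong0 = wrong(testLabels(wrong) == 0);
wrong1 = wrong(testLabels(wrong) == 1);

% Keep at most maxPerClass samples from each class
maxPerClass = 5;
wrong0 = wrong0(1:min(maxPerClass, numel(wrong0)));
wrong1 = wrong1(1:min(maxPerClass, numel(wrong1)));

unClassified = testSet([wrong0; wrong1], :);
```

Here three samples are misclassified (two of class 0, one of class 1), so unClassified holds those three rows, with the class 0 samples listed first.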