**Preprocessing**: The input was provided as a text file. I first imported it into a spreadsheet (.xls) and then copied the data into three .mat files.

**NOTE:** The **derivation for linear and logistic regression** is added at the end of the document.

## Linear Regression Classifier

I split each dataset randomly into 70% training and 30% testing data and repeated this split 5 times (these repeated random splits are what I call folds below). I used the MATLAB function crossvalind to create each 70-30 split, and finally I averaged the results over the five runs.
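A minimal sketch of the repeated 70-30 split is given below. It uses `randperm` as a stand-in for `crossvalind` so that it runs without any toolbox (with the toolbox available, `[trainMask, testMask] = crossvalind('HoldOut', n, 0.3)` would play the same role). The toy data and the names `data`, `labels`, `numRuns`, and `testRatio` are assumptions made for illustration and are not taken from the original code.

```matlab
% Toy data (hypothetical): 10 samples, 2 features, labels in {-1, +1}
data   = [1 2; 2 1; 3 4; 4 3; 5 6; 6 5; 7 8; 8 7; 9 10; 10 9];
labels = [-1; -1; -1; -1; -1; 1; 1; 1; 1; 1];

numRuns   = 5;                       % number of random 70-30 splits
testRatio = 0.3;                     % fraction held out for testing
n         = size(data, 1);
numTest   = round(testRatio * n);
accuracy  = zeros(numRuns, 1);

for r = 1:numRuns
    perm     = randperm(n);          % random ordering of the sample indices
    testIdx  = perm(1:numTest);      % 30% of the samples for testing
    trainIdx = perm(numTest+1:end);  % remaining 70% for training

    % Fit a least-squares linear classifier on the training part
    Xtrain  = [ones(numel(trainIdx), 1), data(trainIdx, :)];
    weights = Xtrain \ labels(trainIdx);

    % Classify the held-out part by the sign of the linear output
    Xtest     = [ones(numTest, 1), data(testIdx, :)];
    predicted = sign(Xtest * weights);
    accuracy(r) = mean(predicted == labels(testIdx));
end

meanAccuracy = mean(accuracy);       % average over the five runs
```

Because every run draws a fresh random permutation, the individual accuracies vary from run to run; only the averaged value is meant to be reported.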

#### Brief discussion about Linear Regression Classifier

The form of the linear classifier I used is as follows:

Y = trainingLabels;

X(1:trainLengthRow, 1) = 1;

X(1:trainLengthRow, 2:trainLengthCol+1) = trainingSet;

Z = X' * X;

weights = inv(Z) * (X' * Y);

Here the first weight represents the bias, since the first column of X is set to 1.
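As a quick check of the closed-form solution, the sketch below applies the same steps to a tiny made-up data set whose labels are exactly 1 + 2x, so the recovered weights should be a bias of 1 and a slope of 2. The toy values are assumptions for illustration, and the backslash operator is used here in place of `inv` because it solves the same normal equations without forming the inverse explicitly.

```matlab
% Toy training data (hypothetical): labels are exactly 1 + 2*x
trainingSet    = [0; 1; 2; 3];
trainingLabels = [1; 3; 5; 7];

[trainLengthRow, trainLengthCol] = size(trainingSet);

% Design matrix: the leading column of ones carries the bias
X = ones(trainLengthRow, trainLengthCol + 1);
X(:, 2:end) = trainingSet;
Y = trainingLabels;

% Normal equations (X'X) w = X'Y, solved without an explicit inverse
weights = (X' * X) \ (X' * Y);

% weights(1) is the bias, weights(2:end) are the feature weights
disp(weights');
```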

###### Code Description

The code for classification using linear regression is written in MATLAB. It creates 5 random folds, each with a 70-30 ratio of training to testing data.

The methods in the code are as follows:

Processing starts by reading the data from the .mat file and creating the 5 folds in a 70-30 ratio for training and testing.

**train1LinearReg**

Trains the linear regression model.

Parameters:

trainingSet: the training set

trainingLabels: the training labels corresponding to the training set

Returns:

weights: the weights computed by linear regression as explained above

**testLinearReg**

Evaluates the test set by comparing the computed outputs with the test labels.

Parameters:

testSet: the test set

testLables: the corresponding test labels

weights: the weights computed during training

Returns:

correctlyClassified: number of correctly classified samples

unClassified: up to 10 misclassified samples

v: vector that stores TP; TN; FP; FN; P (precision); R (recall); F (F-measure); accuracy

count0: number of misclassified samples of class -1, up to a maximum of 5

count1: number of misclassified samples of class +1, up to a maximum of 5
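The sketch below illustrates how return values of this kind could be computed. It is not the original testLinearReg: the weights and test data are made up, the decision rule (threshold the linear output at zero) is an assumption, and the helper names `scores`, `predicted`, `wrong`, `wrongNeg`, and `wrongPos` are hypothetical.

```matlab
% Hypothetical weights [bias; w1] and a small test set with labels in {-1, +1}
weights    = [-2.5; 1];
testSet    = [1; 2; 3; 4; 0; 5];
testLabels = [-1; -1; 1; 1; 1; -1];

% Prepend the bias column and threshold the linear output at zero (assumed rule)
X         = [ones(size(testSet, 1), 1), testSet];
scores    = X * weights;
predicted = ones(size(scores));
predicted(scores < 0) = -1;

% Confusion counts, treating +1 as the positive class
TP = sum(predicted ==  1 & testLabels ==  1);
TN = sum(predicted == -1 & testLabels == -1);
FP = sum(predicted ==  1 & testLabels == -1);
FN = sum(predicted == -1 & testLabels ==  1);

P = TP / (TP + FP);                  % precision
R = TP / (TP + FN);                  % recall
F = 2 * P * R / (P + R);             % F-measure
accuracy = (TP + TN) / numel(testLabels);
correctlyClassified = TP + TN;
v = [TP; TN; FP; FN; P; R; F; accuracy];

% Keep at most 5 misclassified samples from each class
wrong    = find(predicted ~= testLabels);
wrongNeg = wrong(testLabels(wrong) == -1);
wrongPos = wrong(testLabels(wrong) ==  1);
count0   = min(numel(wrongNeg), 5);
count1   = min(numel(wrongPos), 5);
unClassified = testSet([wrongNeg(1:count0); wrongPos(1:count1)], :);
```

In this toy case four of the six samples are classified correctly, and one sample from each class ends up in `unClassified`.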

## Logistic Regression

The logistic regression model is trained with the following parameters:

numIteration = 1000; % maximum number of iterations

eta = 0.5;

errorBound = 0.0001;

I tested various values of eta; the results reported here use eta = 0.5.

I have also written **two versions of the code** for logistic regression: one using the expanded (element-by-element) approach and another using compact matrix operations. Both versions give similar results. The code handles the two-class problem. Here is a brief description of both:

##### The matrix version is as follows:

P(1:trainLengthRow) = 0;

Y = trainingLabels';

X(1:trainLengthRow, 1:trainLengthCol+1) = 0;

X(1:trainLengthRow, 1) = 1;

X(1:trainLengthRow, 2:trainLengthCol+1) = trainingSet(1:trainLengthRow, 1:trainLengthCol);

% for each sample t, accumulate the weighted sum over the features j

sum = sum + W_Old(j+1)*trainingSet(t,j);

% the sigmoid of the completed sum gives the predicted probability for sample t

P(t) = 1/(1 + exp(-1*sum));

% gradient term

Z = X' * (Y-P)';

% computing the new weights

W_New = W_Old + eta * Z';
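Putting these steps into a loop gives the following sketch of the matrix version, using the parameters listed earlier (numIteration, eta, errorBound). Several points are assumptions rather than the original code: the toy data, the 0/1 coding of the labels (which the Y - P update needs), the zero initial weights, and the stopping rule, which here ends training once no weight changes by more than errorBound.

```matlab
% Toy two-class data (hypothetical); labels coded 0/1 for the (Y - P) update
trainingSet    = [0; 0.2; 0.4; 0.6; 0.8; 1.0];
trainingLabels = [0; 0; 1; 0; 1; 1];

numIteration = 1000;     % maximum number of iterations
eta          = 0.5;      % learning rate
errorBound   = 0.0001;   % assumed: bound on the largest weight change

[trainLengthRow, trainLengthCol] = size(trainingSet);
X     = [ones(trainLengthRow, 1), trainingSet];   % first column carries the bias
Y     = trainingLabels';                          % 1-by-n row vector
W_Old = zeros(1, trainLengthCol + 1);             % [bias, feature weights]

for iter = 1:numIteration
    P     = 1 ./ (1 + exp(-(X * W_Old')'));       % predicted probabilities, 1-by-n
    Z     = X' * (Y - P)';                        % gradient of the log-likelihood
    W_New = W_Old + eta * Z';                     % gradient-ascent step
    if max(abs(W_New - W_Old)) < errorBound       % assumed stopping rule
        break;
    end
    W_Old = W_New;
end

% Point where the predicted probability crosses 0.5 (single-feature case)
boundary = -W_New(1) / W_New(2);
```

The toy labels overlap on purpose: on perfectly separable data the weights would keep growing and the loop would simply run until numIteration is reached.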

###### Code Details

The code settings are the same as described in the code details for linear regression. I used crossvalind with a holdout parameter of 0.3 to obtain 30% testing and 70% training data, and repeated this for 5 folds. The methods for training and testing the logistic regression are as follows.

Both versions of the code have the same method interface, with a slight difference in how the weights are stored. The formulas are given above and in the Appendix.

Method **TrainLogRegr**

This method trains the logistic regression model.

Parameters:

trainingSet: the training set

trainingLabels: the labels corresponding to the training set

weights: the initial weights

weight0: the initial bias weight

Returns:

weights: the final weights obtained from training

weight0: the final bias weight

Method **TestLogRegr**

This method is called to test the accuracy of the trained model.

Parameters:

testSet: the set of samples to be considered for testing

testLabels: the labels corresponding to testSet

weight0, weight: the weights corresponding to logistic regression

Returns:

correctlyClassified: the number of correctly classified samples

unClassified: the array containing 5 misclassified data samples from each classification type

v: the vector that returns the computed values of TP; TN; FP; FN; P; R; F; accuracy
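For the logistic case, the prediction step can be sketched as follows, with the bias stored separately from the feature weights as in the interface above. The parameter values, the test data, the 0/1 label coding, and the 0.5 probability threshold are assumptions for illustration; `prob` and `predicted` are hypothetical names.

```matlab
% Hypothetical trained parameters and test data (labels coded 0/1)
weight0    = -3;                     % bias weight
weights    = 6;                      % one weight per feature (column vector)
testSet    = [0.1; 0.3; 0.7; 0.9];
testLabels = [0; 1; 1; 1];

% Sigmoid of the linear output gives the probability of class 1
prob      = 1 ./ (1 + exp(-(weight0 + testSet * weights)));
predicted = double(prob >= 0.5);     % assumed decision threshold of 0.5

correctlyClassified = sum(predicted == testLabels);
accuracy            = correctlyClassified / numel(testLabels);
```

The remaining entries of v (TP, TN, FP, FN, P, R, F) follow from `predicted` and `testLabels` by counting the four agreement and disagreement cases.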