Posted 27 May 2019

# Step-by-Step Guide to Implement Machine Learning VIII - Linear Regression

Easy to implement machine learning

## Introduction

Relationships among variables exist everywhere. They can be divided into two categories: certainty relations and uncertainty relations. A certainty relation can be expressed with a function. An uncertainty relation is also called correlation, and it can be studied with regression analysis.

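Generally, the linear regression model is:

$$h_w(x) = w^T x$$

where $x$ is a feature vector and $w$ is the parameter vector. The optimal $w$ can be determined by minimizing the loss function over the $n$ training samples $(x_i, y_i)$:

$$J(w) = \sum_{i=1}^{n} \left(y_i - w^T x_i\right)^2$$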
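
As a minimal sketch, assuming the usual squared-error loss $J(w) = \sum_i (y_i - w^T x_i)^2$, the loss can be evaluated with NumPy as follows. The function name `squared_error_loss` and the toy data are illustrative and are not part of the article's code.

```python
import numpy as np

def squared_error_loss(X, y, w):
    """Sum of squared residuals (y - Xw)^T (y - Xw)."""
    residual = y - X @ w
    return float(residual @ residual)

# Toy data: y is generated exactly by w_true, so the loss at w_true is zero.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # first column acts as a bias term
w_true = np.array([0.5, 2.0])
y = X @ w_true

loss_at_truth = squared_error_loss(X, y, w_true)
loss_at_zero = squared_error_loss(X, y, np.zeros(2))
```

Any parameter vector other than the generating one gives a strictly larger loss on this data, which is the quantity every method in this article tries to minimize.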

## Regression Model

The linear regression family covered here includes standard linear regression, locally weighted linear regression, ridge regression, Lasso regression and stepwise linear regression.

### Linear Regression

The parameters of linear regression can be calculated by gradient descent or by the normal equation. Because gradient descent has been introduced in Step-by-Step Guide to Implement Machine Learning IV - Logistic Regression, this article introduces the normal-equation solution.

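First, calculate the derivative of the loss function, written in matrix form as $J(w) = (Y - Xw)^T (Y - Xw)$:

$$\frac{\partial J(w)}{\partial w} = -2X^T (Y - Xw)$$

Then, setting the derivative equal to 0, we obtain:

$$X^T X w = X^T Y$$

Finally, $w$ is:

$$w = (X^T X)^{-1} X^T Y$$

where $X$ is the training data and $Y$ is the corresponding label. The code of linear regression is shown below: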

```python
# Method of the regression class; assumes `import numpy as np` and the article's preProcess helper.
def standardLinearRegression(self, x, y):
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    xTx = np.dot(x.T, x)
    if np.linalg.det(xTx) == 0:   # a zero determinant means xTx has no inverse
        print("Error: Singular Matrix !")
        return
    w = np.dot(np.linalg.inv(xTx), np.dot(x.T, y))   # w = (X^T X)^{-1} X^T Y
    return w
```
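
The normal-equation solution can also be tried outside the class. The following is a minimal standalone sketch, not the article's implementation: the function name `fit_normal_equation`, the synthetic data and the rank-based singularity check are assumptions made for illustration. It solves $X^T X w = X^T Y$ with `np.linalg.solve` instead of forming an explicit inverse, and compares the result with NumPy's own least-squares solver.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Solve X^T X w = X^T y for w without forming an explicit inverse."""
    xTx = X.T @ X
    # A rank check is less fragile than testing det(xTx) == 0 in floating point.
    if np.linalg.matrix_rank(xTx) < xTx.shape[0]:
        raise np.linalg.LinAlgError("X^T X is singular")
    return np.linalg.solve(xTx, X.T @ y)

# Synthetic data (assumed for this sketch): a bias column plus two features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=50)

w_hat = fit_normal_equation(X, y)
w_ref, *_ = np.linalg.lstsq(X, y, rcond=None)  # reference least-squares solution
```

Both `w_hat` and `w_ref` should land close to `w_true`, since the noise added to `y` is small.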


### Local Weighted Linear Regression

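Linear regression tends to underfit because it seeks the unbiased estimate with minimum mean squared error (MMSE). To solve the problem, we assign weights to the training points around the point to be predicted, so that nearby points count more. Then, we apply ordinary regression analysis to the weighted data. The loss function for locally weighted linear regression is:

$$J(w) = (Y - Xw)^T W (Y - Xw)$$

where $W$ is a diagonal weight matrix. Like linear regression, we calculate the derivative of the loss function and set it to 0. The optimal $w$ is:

$$w = (X^T W X)^{-1} X^T W Y$$

The weights in locally weighted linear regression play a role similar to the kernel function in SVM. For a point $x$ to be predicted, they are given by:

$$W_{ii} = \exp\left(-\frac{\lVert x_i - x \rVert^2}{2k^2}\right)$$

where $k$ controls how quickly the weight decays with distance. The code of locally weighted linear regression is shown below: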

```python
def LWLinearRegression(self, x, y, sample):
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    sample_num = len(x)
    weights = np.eye(sample_num)          # diagonal weight matrix W
    for i in range(sample_num):
        diff = sample - x[i, :]
        # Gaussian kernel: training points closer to the sample get larger weights
        weights[i, i] = np.exp(np.dot(diff, diff.T)/(-2 * self.k ** 2))
    xTx = np.dot(x.T, np.dot(weights, x))  # X^T W X
    if np.linalg.det(xTx) == 0:
        print("Error: Singular Matrix !")
        return
    result = np.dot(np.linalg.inv(xTx), np.dot(x.T, np.dot(weights, y)))
    return np.dot(sample.T, result)        # prediction for the sample
```
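
To see why local weighting helps against underfitting, here is a small standalone sketch on non-linear data. It is an illustration under assumptions, not the article's code: the function name `lwlr_predict`, the sine-shaped data and the bandwidth `k=0.2` are all chosen for the example. It fits a Gaussian-weighted least-squares model around one query point and compares the prediction with a single global line.

```python
import numpy as np

def lwlr_predict(X, y, sample, k=1.0):
    """Predict the target at `sample` with locally weighted linear regression."""
    diff = X - sample                                  # offsets from the query point
    weights = np.exp(-np.sum(diff ** 2, axis=1) / (2 * k ** 2))  # Gaussian kernel weights
    xTWx = X.T @ (weights[:, None] * X)                # X^T W X without building the n x n matrix W
    xTWy = X.T @ (weights * y)                         # X^T W y
    w = np.linalg.solve(xTWx, xTWy)
    return float(sample @ w)

# y = sin(x) is non-linear, so one global line fits it poorly.
x = np.linspace(0.0, 6.0, 61)
X = np.column_stack([np.ones_like(x), x])              # bias column + feature
y = np.sin(x)

query = np.array([1.0, 1.5])                           # predict at x = 1.5
local_pred = lwlr_predict(X, y, query, k=0.2)
w_global = np.linalg.lstsq(X, y, rcond=None)[0]        # ordinary least squares on all points
global_pred = float(query @ w_global)
```

A smaller `k` makes the fit more local and more flexible, while a very large `k` gives nearly equal weights and approaches ordinary linear regression.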


### Ridge Regression

If the feature dimension is larger than the number of samples, the matrix $X^T X$ is not full rank, so its inverse does not exist. To solve the problem, ridge regression adds $\lambda I$ to $X^T X$ to make the matrix nonsingular. This is equivalent to adding L2 regularization to the loss function, namely:

$$J(w) = (Y - Xw)^T (Y - Xw) + \lambda \lVert w \rVert_2^2$$

Like linear regression, we calculate the derivative of the loss function and set it to 0. The optimal $w$ is:

$$w = (X^T X + \lambda I)^{-1} X^T Y$$

The code of ridge regression is shown below:

```python
def ridgeRegression(self, x, y):
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    feature_dim = len(x[0])
    xTx = np.dot(x.T, x)
    matrix = xTx + np.eye(feature_dim)*self.lamda   # X^T X + lambda * I
    if np.linalg.det(matrix) == 0:                  # check the regularized matrix, not xTx
        print("Error: Singular Matrix !")
        return
    w = np.dot(np.linalg.inv(matrix), np.dot(x.T, y))
    return w
```
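
The effect of the $\lambda I$ term can be checked on a case with more features than samples. The sketch below is a standalone illustration under assumptions (the function name `fit_ridge`, the random data and the two values of `lam` are not from the article): $X^T X$ is singular there, yet the ridge system can still be solved, and a larger `lam` shrinks the weights further.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge solution w = (X^T X + lam * I)^{-1} X^T y."""
    feature_dim = X.shape[1]
    matrix = X.T @ X + lam * np.eye(feature_dim)
    return np.linalg.solve(matrix, X.T @ y)

# More features (8) than samples (5), so X^T X cannot be full rank.
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))
y = rng.normal(size=5)

rank_xTx = np.linalg.matrix_rank(X.T @ X)   # at most 5 for an 8 x 8 matrix
w_small = fit_ridge(X, y, lam=0.1)
w_large = fit_ridge(X, y, lam=10.0)
```

Comparing `np.linalg.norm(w_small)` with `np.linalg.norm(w_large)` shows the shrinkage: the heavier penalty yields the smaller weight vector.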


### Lasso Regression

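Like ridge regression, Lasso regression adds a regularization term to the loss function, but it uses the L1 norm, namely:

$$J(w) = (Y - Xw)^T (Y - Xw) + \lambda \lVert w \rVert_1$$

Because the L1 regularization contains absolute values, the loss function is not differentiable everywhere. Thus, we apply the coordinate descent method (CD). At each iteration, CD minimizes the loss along one coordinate direction while the other coordinates stay fixed, namely,

$$w_j^{(t+1)} = \arg\min_{w_j} J\left(w_1^{(t+1)}, \dots, w_{j-1}^{(t+1)}, w_j, w_{j+1}^{(t)}, \dots, w_d^{(t)}\right)$$

Each CD step has a closed-form solution, which is given by:

$$w_j = \begin{cases} (\rho_j + \lambda/2)/z_j, & \rho_j < -\lambda/2 \\ 0, & -\lambda/2 \le \rho_j \le \lambda/2 \\ (\rho_j - \lambda/2)/z_j, & \rho_j > \lambda/2 \end{cases}$$

where:

$$\rho_j = \sum_{i=1}^{n} x_{ij}\Big(y_i - \sum_{k \ne j} x_{ik} w_k\Big), \qquad z_j = \sum_{i=1}^{n} x_{ij}^2$$

The code of Lasso regression is shown below: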

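```python
def lassoRegression(self, x, y):
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    y = np.expand_dims(y, axis=1)                 # column vector of labels
    sample_num, feature_dim = np.shape(x)
    w = np.ones([feature_dim, 1])
    for i in range(self.iterations):
        for j in range(feature_dim):
            # prediction made by all features except feature j
            h = np.dot(x[:, 0:j], w[0:j]) + np.dot(x[:, j+1:], w[j+1:])
            w[j] = np.dot(x[:, j], (y - h))       # correlation of feature j with the residual
            if j == 0:
                w[j] = 0                          # the first coefficient is held at zero
            else:
                # softThreshold is a helper of the class that is not shown in this article
                w[j] = self.softThreshold(w[j])
    return w
```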
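
For readers who want to run coordinate descent end to end, here is a self-contained sketch. It is an assumption-laden illustration rather than the article's class: `soft_threshold` and `fit_lasso_cd` are hypothetical names, and the update assumes the loss $\lVert Y - Xw \rVert^2 + \lambda \lVert w \rVert_1$, for which each coordinate step soft-thresholds $\rho_j$ at $\lambda/2$ and divides by $z_j = \sum_i x_{ij}^2$.

```python
import numpy as np

def soft_threshold(rho, threshold):
    """Shrink rho toward zero by `threshold`; values within the threshold become 0."""
    return np.sign(rho) * max(abs(rho) - threshold, 0.0)

def fit_lasso_cd(X, y, lam, iterations=100):
    """Coordinate descent for the loss ||y - Xw||^2 + lam * ||w||_1."""
    sample_num, feature_dim = X.shape
    w = np.zeros(feature_dim)
    z = np.sum(X ** 2, axis=0)                    # z_j = sum_i x_ij^2
    for _ in range(iterations):
        for j in range(feature_dim):
            # Residual with feature j's own contribution added back.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual
            w[j] = soft_threshold(rho, lam / 2) / z[j]
    return w

# Synthetic data (assumed): only features 0 and 3 influence y.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

w_lasso = fit_lasso_cd(X, y, lam=20.0)
lam_max = 2 * np.max(np.abs(X.T @ y))            # above this, every coefficient stays at zero
w_zero = fit_lasso_cd(X, y, lam=1.01 * lam_max)
```

Unlike ridge regression, the L1 penalty can set coefficients exactly to zero, which is why Lasso is often used for feature selection.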


### Stepwise Linear Regression

Stepwise linear regression is similar to Lasso, but it applies a greedy algorithm at each iteration instead of CD. At each iteration, it increases or decreases one weight by a small step and keeps the change that gives the smallest error. The code of stepwise linear regression is shown below:

```python
def forwardstepRegression(self, x, y):
    if self.norm_type == "Standardization":
        x = preProcess.Standardization(x)
    else:
        x = preProcess.Normalization(x)

    y = np.reshape(y, (-1, 1))                     # column vector, so y - y_hat is not broadcast
    sample_num, feature_dim = np.shape(x)
    w = np.zeros([feature_dim, 1])                 # current weights
    for i in range(self.iterations):
        min_error = np.inf
        best_w = w.copy()
        for j in range(feature_dim):
            for sign in [-1, 1]:                   # try a step in each direction
                temp_w = w.copy()                  # copy, so w itself is not modified
                temp_w[j] += sign * self.learning_rate
                y_hat = np.dot(x, temp_w)
                error = ((y - y_hat) ** 2).sum()   # sum of squared errors
                if error < min_error:              # save the best parameters
                    min_error = error
                    best_w = temp_w
        w = best_w
    return w
```
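
The greedy procedure can be tried in isolation with the sketch below. It is a minimal standalone version under assumptions: the function name `fit_forward_stagewise`, the step size, the iteration count and the noise-free synthetic data are chosen for illustration and are not taken from the article's class.

```python
import numpy as np

def fit_forward_stagewise(X, y, step=0.01, iterations=500):
    """Greedy stepwise fit: each iteration changes one weight by +/- step."""
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        min_error = np.inf
        best_w = w
        for j in range(X.shape[1]):
            for sign in (-1.0, 1.0):
                candidate = w.copy()               # never modify w in place
                candidate[j] += sign * step
                error = np.sum((y - X @ candidate) ** 2)
                if error < min_error:              # keep the best single-weight change
                    min_error = error
                    best_w = candidate
        w = best_w
    return w

# Noise-free synthetic data (assumed): the second feature is irrelevant.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, 0.0, -0.5])
y = X @ w_true

w_hat = fit_forward_stagewise(X, y, step=0.01, iterations=500)
```

Because every move has a fixed size, the weights end up oscillating within a few steps of the least-squares solution instead of converging exactly. A smaller `step` tightens the result at the cost of more iterations.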


## Conclusion and Analysis

There are many ways to obtain the optimal parameters for linear regression. In this article, we only introduce some basic algorithms. Finally, let's compare our linear regression with the linear regression in Sklearn. The prediction performance is displayed below:

Sklearn linear regression performance:

Our linear regression performance:

The performances look similar.

The related code and dataset in this article can be found in MachineLearning.

## History

• 28th May, 2019: Initial version

## About the Author

Ryuk is an engineer based in Germany, interested in Machine Learning/Signal Processing/VoIP.