## Introduction

Linear regression is an important supervised learning algorithm. In this article, I reuse the following notation from [1] (see the References section):

- x^{i} denotes the “input” variables, also called input **features**
- y^{i} denotes the “output” or **target** variable that we are trying to predict
- A pair (x^{i}, y^{i}) is called **a training example**
- A list of m training examples {x^{i}, y^{i}; i = 1,…,m} is called **a training set**
- The superscript “i” in the notation (x^{i} and y^{i}) is an index into the training set
- X denotes the space of input values and Y denotes the space of output values. In this article, I am going to assume that X = Y = R
- A function h: X -> Y, where h(x) is a good predictor for the corresponding value of y, is called a **hypothesis or a model**

When the target variable that we are trying to predict is continuous, we call the learning problem a regression problem. When y takes on only a small number of discrete values, we call it a classification problem.

## Background

In machine learning, when we talk about regression, we often mean linear regression. Linear regression means that the output is obtained by adding up the inputs, each multiplied by a constant. We represent the function h as follows:

$$h(x) = w_0 + w_1 x_1 + \dots + w_n x_n$$

where the w_{i}’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y. For simplicity, we also assume that x_{0} = 1, so that h(x) can be written as:

$$h(x) = \sum_{i=0}^{n} w_i x_i$$

If we view w and x both as vectors, we can rewrite h(x) as:

$$h(x) = w^{T} x$$

where x = (x_{0}, x_{1}, x_{2},…,x_{n}) and w = (w_{0}, w_{1},…,w_{n}).
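As a minimal sketch of the vector view, h(x) is the inner product of w and x, which NumPy computes with a dot product. The function name `h` follows the article; the weight and input values are made up for illustration.

```python
import numpy as np

def h(x, w):
    """Linear hypothesis: inner product of w and x, with x[0] assumed to be 1."""
    return np.dot(w, x)

# illustrative values (not from the article): w0 = 1, w1 = 2, input x1 = 3
w = np.array([1.0, 2.0])
x = np.array([1.0, 3.0])   # x0 = 1 is the intercept term
prediction = h(x, w)       # computes w0*1 + w1*3
print(prediction)
```

Keeping x_{0} = 1 as the first component means the intercept w_{0} needs no special handling.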

A natural question is how to obtain the weights w. To answer it, we define a cost function that measures the error as the difference between the predicted value h(x) and the actual value y over the training set. The cost function looks like this:

$$\mathrm{costF}(w) = \frac{1}{2}\sum_{i=1}^{m}\bigl(h(x^{i}) - y^{i}\bigr)^{2}$$
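A minimal NumPy sketch of such a cost function, assuming the usual least-squares form (half the sum of squared differences between h(x) and y, which matches the `0.5*tf.square(...)` term in the TensorFlow script later in this article). The names `cost_f` and `X_rows` and the three-point training set are my own illustrative choices.

```python
import numpy as np

def cost_f(w, X_rows, y):
    """Half the sum of squared errors over the training set.

    X_rows: array of shape (m, n+1), one training input per row, first column 1.
    y:      array of shape (m,), the target values.
    """
    errors = X_rows @ w - y      # h(x^i) - y^i for every training example
    return 0.5 * np.sum(errors ** 2)

# tiny illustrative training set lying exactly on y = x
X_rows = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])

perfect = cost_f(np.array([0.0, 1.0]), X_rows, y)   # weights that fit exactly
worse = cost_f(np.array([0.0, 0.0]), X_rows, y)     # predicts 0 everywhere
print(perfect, worse)
```

The weights that fit the data exactly give a cost of zero, and any worse fit gives a larger value, which is what makes the cost a sensible quantity to minimize.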

We want to choose `w` so as to minimize `costF(w)`. To do this, there are two approaches:

- First approach: use the gradient descent algorithm to minimize `costF(w)`. In this approach, we repeatedly run through the training set, and each time we encounter a training example, we update the weights according to the gradient of the error with respect to that single training example only. This per-example variant is known as stochastic gradient descent.
- Second approach: minimize `costF` by explicitly taking its derivatives with respect to `w` and setting them to zero. Solving for `w` gives the following equation:

$$w = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y}$$

Here $\mathbf{X}$ is the matrix whose rows are the training inputs x^{i} (not the input space X defined earlier) and $\mathbf{y}$ is the vector of target values y^{i}.

You can read more about these approaches in [1]. In the code below, I use the TensorFlow library for the first approach and the NumPy library for the second approach.
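For the first approach, the per-example update can be written out explicitly. This is a sketch under the assumption that the error for a single training example is half the squared difference between h(x^{i}) and y^{i}, as in the TensorFlow script later in this article; the symbol α for the learning rate is my notation, not the article's. Since h(x) is linear in the weights, the derivative with respect to w_{j} is:

$$
\frac{\partial}{\partial w_j}\,\frac{1}{2}\bigl(h(x^{i}) - y^{i}\bigr)^{2}
= \bigl(h(x^{i}) - y^{i}\bigr)\,x_j^{i}
$$

Moving each weight a small step against this gradient gives the update applied after every training example:

$$
w_j := w_j - \alpha\,\bigl(h(x^{i}) - y^{i}\bigr)\,x_j^{i}, \qquad j = 0, \dots, n
$$

The size of the correction is proportional to the prediction error, so examples that are already predicted well barely change the weights.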

## Using the Code

### Initializing a Linear Model

In this article, I assume that our model (or `h` function) is the following equation:

h(x) = w_{1}*x + w_{0}, where x_{0} = 1, x_{1} = x

### Initializing a Training Set

We need to initialize data by creating the following Python script:

```python
import numpy as np
import matplotlib.pyplot as plt

# the training set
x_train = np.linspace(0, 10, 100)
y_train = x_train + np.random.normal(0, 1, 100)

plt.scatter(x_train, y_train)
plt.show()
```

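If you run this script, you get a scatter plot of points spread around the line y = x. The exact points differ from run to run because the noise is random.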
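Because the training set is regenerated on every run, the two scripts in the following sections each fit slightly different data. If you want repeatable data, one option is to draw the noise from a seeded generator. This is only a sketch: the seed value 0 and the names `rng`, `rng_again`, and `y_again` are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)            # seed chosen arbitrarily
x_train = np.linspace(0, 10, 100)
y_train = x_train + rng.normal(0, 1, 100)

# a second generator created with the same seed reproduces the same noise
rng_again = np.random.default_rng(0)
y_again = x_train + rng_again.normal(0, 1, 100)

print(np.array_equal(y_train, y_again))
```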

### Gradient Descent Algorithm Approach

In this approach, we repeatedly run through the training set, and each time we encounter a training example, we update the weights according to the gradient of the error with respect to that single training example only. The following code creates a best-fit line for the given data using the TensorFlow library. It is written against the TensorFlow 1.x API (`tf.placeholder`, `tf.Session`):

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

learning_rate = 0.01
# number of passes through the whole training set
training_epochs = 100

# the training set
x_train = np.linspace(0, 10, 100)
y_train = x_train + np.random.normal(0, 1, 100)

# set up placeholders for input and output
X = tf.placeholder(tf.float32)
Y = tf.placeholder(tf.float32)

# Define h(x) = x*w1 + w0
def h(X, w1, w0):
    return tf.add(tf.multiply(X, w1), w0)

# set up variables for weights (each with its own name)
w0 = tf.Variable(0.0, name="w0")
w1 = tf.Variable(0.0, name="w1")

y_predicted = h(X, w1, w0)

# Define the cost function for a single training example
costF = 0.5*tf.square(Y-y_predicted)

# Define the operation that will be called on each iteration
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(costF)

sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

# Loop through the training data, one example at a time
for epoch in range(training_epochs):
    for (x, y) in zip(x_train, y_train):
        sess.run(train_op, feed_dict={X: x, Y: y})

# get values of the final weights
w_val_0 = sess.run(w0)
w_val_1 = sess.run(w1)
sess.close()

# plot the training data
plt.scatter(x_train, y_train)
# plot the best fit line
y_learned = x_train*w_val_1 + w_val_0
plt.plot(x_train, y_learned, 'r')
plt.show()
```

If we run this script, the plot shows the training data together with the fitted line in red.
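To make the optimizer's work on each step more concrete, here is a sketch of the same per-example update in plain NumPy. It assumes the same model, per-example cost, learning rate, and number of epochs as the TensorFlow script; the seeded generator `rng` is my addition so that the run is repeatable. It is meant as an illustration of the idea, not as a description of TensorFlow's internals.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed for repeatability
x_train = np.linspace(0, 10, 100)
y_train = x_train + rng.normal(0, 1, 100)

learning_rate = 0.01
training_epochs = 100
w0, w1 = 0.0, 0.0                         # same starting weights as above

for epoch in range(training_epochs):
    for x, y in zip(x_train, y_train):
        error = (w1 * x + w0) - y         # h(x) - y for this single example
        # gradient of 0.5 * error**2 with respect to w0 is error,
        # and with respect to w1 is error * x
        w0 -= learning_rate * error
        w1 -= learning_rate * error * x

print(w0, w1)
```

Since the data are generated around the line y = x, the learned slope should end up near 1 and the intercept near 0, although the per-example updates leave some noise in the final values.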

### Matrix Derivatives Approach

In this approach, we minimize `costF` by explicitly taking its derivatives with respect to `w` and setting them to zero. You can use the matrix methods from the TensorFlow library, but here I use the NumPy library to solve this problem. The following code creates a best-fit line for the given data using the NumPy library:

```python
import numpy as np
import matplotlib.pyplot as plt

# the training set
x_train = np.linspace(0, 10, 100)
y_train = x_train + np.random.normal(0, 1, 100)

xArr = []
yArr = []
for i in range(len(x_train)):
    # x0 = 1, x1 = x_train
    xArr.append([1.0, float(x_train[i])])
    yArr.append(float(y_train[i]))

def linearRegres(xArr, yArr):
    xMat = np.array(xArr)
    yMat = np.array(yArr).reshape(-1, 1)
    xTx = xMat.T @ xMat
    # check the determinant: if it is zero, xTx is singular and
    # computing its inverse raises an error
    if np.linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = np.linalg.inv(xTx) @ (xMat.T @ yMat)
    return ws

# get values of the final weights
w_val = linearRegres(xArr, yArr)

# plot the training data
plt.scatter(x_train, y_train)
# plot the best fit line
y_learned = np.array(xArr) @ w_val
plt.plot(x_train, y_learned, 'r')
plt.show()
```

Running the script above produces a similar plot: the training data with the best-fit line in red.
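The explicit inverse in `linearRegres` fails when `xTx` is singular, and comparing a floating-point determinant with exactly 0.0 can miss matrices that are only nearly singular. As an alternative sketch (not the method used in this article), NumPy's `np.linalg.lstsq` solves the same least-squares problem without forming the inverse. The four-point data set and the names `A`, `w_normal`, and `w_lstsq` are made up for illustration.

```python
import numpy as np

# made-up training set lying exactly on y = 2x + 1
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = 2.0 * x_train + 1.0

# matrix of inputs with x0 = 1 in the first column and x1 = x in the second
A = np.column_stack([np.ones_like(x_train), x_train])

# weights from the explicit inverse, as in the script above
w_normal = np.linalg.inv(A.T @ A) @ (A.T @ y_train)

# least-squares solver: same minimizer, no explicit inverse
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(A, y_train, rcond=None)

print(w_normal, w_lstsq, rank)
```

The returned `rank` also tells you whether the columns of the input matrix are linearly independent, which is a more direct check than testing the determinant.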

## Points of Interest

In this article, I introduced two approaches to solving a linear regression problem. One problem with linear regression is that it tends to underfit the data, and one way to address this is a technique known as locally weighted linear regression. You can read more about this technique in [1].

## References

- [1] CS229 Lecture notes by Andrew Ng
- [2] Machine Learning in Action by Peter Harrington
- [3] Machine Learning with TensorFlow by Nishant Shukla
- [4] TensorFlow Machine Learning Cookbook by Nick McClure
- [5] Data Science from Scratch by Joel Grus
- [6] Hands-on Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron

## History

- 21st April, 2019: Initial version