## BufferedDev

In this article we’ll implement multivariable linear regression. As a prerequisite go through Linear Regression

multivariable_linear_regression
In [1]:
import numpy as np
import pandas as pd
from ipywidgets import *
import matplotlib.pyplot as plt

training_data

Out[1]:
area rooms price
0 2104 3 399900
1 1600 3 329900
2 2400 3 369000
3 1416 2 232000
4 3000 4 539900

## Feature Scaling¶

The input variables in the training sample are spread over wide ranges, the area range is significantly bigger than the room range. This will cause gradient descent to run for longer over many iterations before converging. To make gradient descent faster we should normalize and scale the input variables so that the input variables are spread out over approximately the same range.

#### Feature Scaling And Mean Normalization¶

Feature scaling involves dividing the input variable with the range — difference between max and min values or the standard deviation and mean normalization involves substracting the mean from the input variable.

The formula below scales and normalizes feature $x_i$.

$x_i = \frac{x_i – \mu_i}{\sigma_i}$

Where $\mu_i$ is the mean and $\sigma_i$ is the range or standard deviation for the ith feature. In this tutorial, we’ll use standard deviation instead of range for $\sigma_i$

Example

Scale and normalize area’s first entry

$\mu_i = 2104$

$\sigma_i = 568.82$

norm_area $= \frac{2104 – 2104}{568.83}$

Therefore, norm_area $=$ 0

The snippet below implements the formula in Python

In [2]:
def normalize(input_variables):
result = None

mean = np.mean(input_variables)
std = np.std(input_variables)

result = (input_variables - mean) / std
return result


Now we can normalize the input variables, area and rooms

In [3]:
area = training_data['area']
rooms = training_data['rooms']

training_data['area'] = normalize(area)
training_data['rooms'] = normalize(rooms)

training_data

Out[3]:
area rooms price
0 0.000000 0.000000 399900
1 -0.886042 0.000000 329900
2 0.520374 0.000000 369000
3 -1.209517 -1.581139 232000
4 1.575185 1.581139 539900

## Vectorization¶

Linear regression used an iterative technique to compute the gradient descent. While that worked it’s not as fast and simple as the vectorized implementation.

## Hypothesis¶

The formular for the hypothesis function with n features is as shown below

$h(x) = \sum_{0}^{n}\theta_j x_j$

Which is the same as

$h(x) = \theta_0 + \theta_1 x_1 … \theta_n x_n$

The equation above is equivalent to a matrix vector dot product

$h(x) = X \cdot \theta$

#### Example¶

Describing the input variables as a matrix, I will add ones to the first column to make computation easy.

$X = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -0.9 & 0 \\ 1 & 0.5 & 0 \\ 1 & -1.2 & -1.6 \\ 1 & 1.6 & 1.6 \\ \end{bmatrix}$

Then we can describe parameters $\theta$ as a vector — using arbitrary values for $\theta$

$\theta = \begin{bmatrix} 0 \\ 1 \\ 2 \\ \end{bmatrix}$

Then prediction $= X \cdot \theta$

$X \cdot \theta = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -0.9 & 0 \\ 1 & 0.5 & 0 \\ 1 & -1.2 & -1.6 \\ 1 & 1.6 & 1.6 \\ \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ \end{bmatrix}$

Therefore

$prediction = X \cdot \theta = \begin{bmatrix} 0.0 \\ -0.9 \\ -4.4 \\ 4.8 \\ \end{bmatrix}$

Writing the hypothesis algorithm in Python

In [4]:
def hypothesis(X, theta):
return X @ theta


## Cost Function¶

The formular for unvectorized cost function is

$J(\theta) = \frac{1}{2m}\sum_{1}^{m}(h(x_i) – y_i)^2$

The vectorized equivalient of $(h(x_i) – y_i)^2$ is $(X \cdot \theta – y)^T \times (X \cdot \theta – y)$

So the vectorized cost function can be written as

$J(\theta) = \frac{1}{2m}sum((X \cdot \theta – y)^T \times (X \cdot \theta – y))$

Since $X \cdot \theta = prediction$, we can use the hypothesis function when implementing the cost function in Python

In [5]:
def cost_function(X, theta, y):
m, _ = X.shape

sqdError = np.matmul(np.transpose(hypothesis(X, theta) - y),  (hypothesis(X, theta) -y))

return (1/(2 * m)) * np.sum(sqdError)

In [6]:
X = np.array([
[1.0,  0.0,  0.0],
[1.0, -0.9,  0.0],
[1.0,  0.5,  0.0],
[1.0, -1.2, -1.6],
[1.0,  1.6,  1.6]
])
theta = np.array([
0,
1,
2
])
y = np.array([
399900,
329900,
369000,
232000,
539900])

cost = cost_function(X, theta, y)
cost

Out[6]:
75022811342.346

The unvectorized formular for gradient descent is as shown below

repeat until covergence {

$\theta = \theta – \alpha \frac{1}{m}\sum_{1}^{m}(h(x_i) – y_i)x_i$

}

And the vectorized formular

repeat until convergence {

$\theta_i = \theta_i – \alpha \frac{1}{m}sum((X \cdot \theta – y)x_i)$

}

The snippet below implements gradient descent in Python

In [7]:
def gradient_descent(X, theta, alpha=0, num_iter=1):
m, features = X.shape
temp = np.zeros(shape=(theta.shape))

for i in range(0, num_iter):
for feature in range(0, features):
temp[feature] = theta[feature] - (alpha / m ) * np.sum(((X @ theta) - y) * X[:, feature])
theta = temp
return theta


Running gradient descent, we get the following values for the parameters $\theta$

In [8]:
thetas = gradient_descent(X, theta, alpha=30, num_iter=100)
thetas

Out[8]:
array([-2.47154393e+268, -4.65254066e+284,  1.19688093e+286])
In [ ]: