BufferedDev

In this article we’ll implement multivariable linear regression. As a prerequisite go through Linear Regression

multivariable_linear_regression
In [1]:
import numpy as np
import pandas as pd
from ipywidgets import *
import matplotlib.pyplot as plt

df = pd.read_csv('../data/house_prices_2.csv')

training_data = df.head() 
training_data
Out[1]:
area rooms price
0 2104 3 399900
1 1600 3 329900
2 2400 3 369000
3 1416 2 232000
4 3000 4 539900

Feature Scaling

The input variables in the training sample are spread over wide ranges, the area range is significantly bigger than the room range. This will cause gradient descent to run for longer over many iterations before converging. To make gradient descent faster we should normalize and scale the input variables so that the input variables are spread out over approximately the same range.

Feature Scaling And Mean Normalization

Feature scaling involves dividing the input variable with the range — difference between max and min values or the standard deviation and mean normalization involves substracting the mean from the input variable.

The formula below scales and normalizes feature $x_i$.

$x_i = \frac{x_i – \mu_i}{\sigma_i}$

Where $\mu_i$ is the mean and $\sigma_i$ is the range or standard deviation for the ith feature. In this tutorial, we’ll use standard deviation instead of range for $\sigma_i$

Example

Scale and normalize area’s first entry

$\mu_i = 2104$

$\sigma_i = 568.82$

norm_area $= \frac{2104 – 2104}{568.83}$

Therefore, norm_area $=$ 0

The snippet below implements the formula in Python

In [2]:
def normalize(input_variables):
    result = None

    mean = np.mean(input_variables)
    std = np.std(input_variables)

    result = (input_variables - mean) / std
    return result

Now we can normalize the input variables, area and rooms

In [3]:
area = training_data['area']
rooms = training_data['rooms']

training_data['area'] = normalize(area)
training_data['rooms'] = normalize(rooms)

training_data
Out[3]:
area rooms price
0 0.000000 0.000000 399900
1 -0.886042 0.000000 329900
2 0.520374 0.000000 369000
3 -1.209517 -1.581139 232000
4 1.575185 1.581139 539900

Vectorization

Linear regression used an iterative technique to compute the gradient descent. While that worked it’s not as fast and simple as the vectorized implementation.

Hypothesis

The formular for the hypothesis function with n features is as shown below

$h(x) = \sum_{0}^{n}\theta_j x_j$

Which is the same as

$h(x) = \theta_0 + \theta_1 x_1 … \theta_n x_n$

The equation above is equivalent to a matrix vector dot product

$h(x) = X \cdot \theta$

Example

Describing the input variables as a matrix, I will add ones to the first column to make computation easy.

$ X = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -0.9 & 0 \\ 1 & 0.5 & 0 \\ 1 & -1.2 & -1.6 \\ 1 & 1.6 & 1.6 \\ \end{bmatrix} $

Then we can describe parameters $\theta$ as a vector — using arbitrary values for $\theta$

$ \theta = \begin{bmatrix} 0 \\ 1 \\ 2 \\ \end{bmatrix} $

Then prediction $ = X \cdot \theta $

$ X \cdot \theta = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -0.9 & 0 \\ 1 & 0.5 & 0 \\ 1 & -1.2 & -1.6 \\ 1 & 1.6 & 1.6 \\ \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ \end{bmatrix} $

Therefore

$prediction = X \cdot \theta = \begin{bmatrix} 0.0 \\ -0.9 \\ -4.4 \\ 4.8 \\ \end{bmatrix}$

Writing the hypothesis algorithm in Python

In [4]:
def hypothesis(X, theta):
    return X @ theta

Cost Function

The formular for unvectorized cost function is

$J(\theta) = \frac{1}{2m}\sum_{1}^{m}(h(x_i) – y_i)^2$

The vectorized equivalient of $(h(x_i) – y_i)^2$ is $(X \cdot \theta – y)^T \times (X \cdot \theta – y)$

So the vectorized cost function can be written as

$J(\theta) = \frac{1}{2m}sum((X \cdot \theta – y)^T \times (X \cdot \theta – y))$

Since $X \cdot \theta = prediction $, we can use the hypothesis function when implementing the cost function in Python

In [5]:
def cost_function(X, theta, y):
    m, _ = X.shape

    sqdError = np.matmul(np.transpose(hypothesis(X, theta) - y),  (hypothesis(X, theta) -y))
        
    return (1/(2 * m)) * np.sum(sqdError)
In [6]:
X = np.array([
      [1.0,  0.0,  0.0],
      [1.0, -0.9,  0.0],
      [1.0,  0.5,  0.0],
      [1.0, -1.2, -1.6],
      [1.0,  1.6,  1.6]
      ])
theta = np.array([
    0,
    1, 
    2
   ])
y = np.array([
   399900,
   329900,
   369000,
   232000,
   539900])
   
cost = cost_function(X, theta, y)
cost
Out[6]:
75022811342.346

Gradient Descent

The unvectorized formular for gradient descent is as shown below

repeat until covergence {

$\theta = \theta – \alpha \frac{1}{m}\sum_{1}^{m}(h(x_i) – y_i)x_i$

}

And the vectorized formular

repeat until convergence {

$\theta_i = \theta_i – \alpha \frac{1}{m}sum((X \cdot \theta – y)x_i)$

}

The snippet below implements gradient descent in Python

In [7]:
def gradient_descent(X, theta, alpha=0, num_iter=1):
    m, features = X.shape
    temp = np.zeros(shape=(theta.shape))

    for i in range(0, num_iter):
        for feature in range(0, features):
            temp[feature] = theta[feature] - (alpha / m ) * np.sum(((X @ theta) - y) * X[:, feature])
        theta = temp
    return theta

Running gradient descent, we get the following values for the parameters $\theta$

In [8]:
thetas = gradient_descent(X, theta, alpha=30, num_iter=100)
thetas
Out[8]:
array([-2.47154393e+268, -4.65254066e+284,  1.19688093e+286])
In [ ]:
 

View Comments

There are currently no comments.

Next Post