Multivariable Linear Regression
In this article we’ll implement multivariable linear regression. As a prerequisite go through Linear Regression
import numpy as np
import pandas as pd
from ipywidgets import *
import matplotlib.pyplot as plt
df = pd.read_csv('../data/house_prices_2.csv')
training_data = df.head()
training_data
Feature Scaling¶
The input variables in the training sample are spread over wide ranges, the area range is significantly bigger than the room range. This will cause gradient descent to run for longer over many iterations before converging. To make gradient descent faster we should normalize and scale the input variables so that the input variables are spread out over approximately the same range.
Feature Scaling And Mean Normalization¶
Feature scaling involves dividing the input variable with the range — difference between max and min values or the standard deviation and mean normalization involves substracting the mean from the input variable.
The formula below scales and normalizes feature $x_i$.
$x_i = \frac{x_i – \mu_i}{\sigma_i}$
Where $\mu_i$ is the mean and $\sigma_i$ is the range or standard deviation for the ith feature. In this tutorial, we’ll use standard deviation instead of range for $\sigma_i$
Example
Scale and normalize area’s first entry
$\mu_i = 2104$
$\sigma_i = 568.82$
norm_area $= \frac{2104 – 2104}{568.83}$
Therefore, norm_area $=$ 0
The snippet below implements the formula in Python
def normalize(input_variables):
result = None
mean = np.mean(input_variables)
std = np.std(input_variables)
result = (input_variables - mean) / std
return result
Now we can normalize the input variables, area and rooms
area = training_data['area']
rooms = training_data['rooms']
training_data['area'] = normalize(area)
training_data['rooms'] = normalize(rooms)
training_data
Vectorization¶
Linear regression used an iterative technique to compute the gradient descent. While that worked it’s not as fast and simple as the vectorized implementation.
Hypothesis¶
The formular for the hypothesis function with n features is as shown below
$h(x) = \sum_{0}^{n}\theta_j x_j$
Which is the same as
$h(x) = \theta_0 + \theta_1 x_1 … \theta_n x_n$
The equation above is equivalent to a matrix vector dot product
$h(x) = X \cdot \theta$
Example¶
Describing the input variables as a matrix, I will add ones to the first column to make computation easy.
$ X = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -0.9 & 0 \\ 1 & 0.5 & 0 \\ 1 & -1.2 & -1.6 \\ 1 & 1.6 & 1.6 \\ \end{bmatrix} $
Then we can describe parameters $\theta$ as a vector — using arbitrary values for $\theta$
$ \theta = \begin{bmatrix} 0 \\ 1 \\ 2 \\ \end{bmatrix} $
Then prediction $ = X \cdot \theta $
$ X \cdot \theta = \begin{bmatrix} 1 & 0 & 0 \\ 1 & -0.9 & 0 \\ 1 & 0.5 & 0 \\ 1 & -1.2 & -1.6 \\ 1 & 1.6 & 1.6 \\ \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \\ 2 \\ \end{bmatrix} $
Therefore
$prediction = X \cdot \theta = \begin{bmatrix} 0.0 \\ -0.9 \\ -4.4 \\ 4.8 \\ \end{bmatrix}$
Writing the hypothesis algorithm in Python
def hypothesis(X, theta):
return X @ theta
Cost Function¶
The formular for unvectorized cost function is
$J(\theta) = \frac{1}{2m}\sum_{1}^{m}(h(x_i) – y_i)^2$
The vectorized equivalient of $(h(x_i) – y_i)^2$ is $(X \cdot \theta – y)^T \times (X \cdot \theta – y)$
So the vectorized cost function can be written as
$J(\theta) = \frac{1}{2m}sum((X \cdot \theta – y)^T \times (X \cdot \theta – y))$
Since $X \cdot \theta = prediction $, we can use the hypothesis function when implementing the cost function in Python
def cost_function(X, theta, y):
m, _ = X.shape
sqdError = np.matmul(np.transpose(hypothesis(X, theta) - y), (hypothesis(X, theta) -y))
return (1/(2 * m)) * np.sum(sqdError)
X = np.array([
[1.0, 0.0, 0.0],
[1.0, -0.9, 0.0],
[1.0, 0.5, 0.0],
[1.0, -1.2, -1.6],
[1.0, 1.6, 1.6]
])
theta = np.array([
0,
1,
2
])
y = np.array([
399900,
329900,
369000,
232000,
539900])
cost = cost_function(X, theta, y)
cost
Gradient Descent¶
The unvectorized formular for gradient descent is as shown below
repeat until covergence {
$\theta = \theta – \alpha \frac{1}{m}\sum_{1}^{m}(h(x_i) – y_i)x_i$
}
And the vectorized formular
repeat until convergence {
$\theta_i = \theta_i – \alpha \frac{1}{m}sum((X \cdot \theta – y)x_i)$
}
The snippet below implements gradient descent in Python
def gradient_descent(X, theta, alpha=0, num_iter=1):
m, features = X.shape
temp = np.zeros(shape=(theta.shape))
for i in range(0, num_iter):
for feature in range(0, features):
temp[feature] = theta[feature] - (alpha / m ) * np.sum(((X @ theta) - y) * X[:, feature])
theta = temp
return theta
Running gradient descent, we get the following values for the parameters $\theta$
thetas = gradient_descent(X, theta, alpha=30, num_iter=100)
thetas
View Comments
Linear Regression
In this article we’ll implement multivariable linear regression. As a...