Gradient Descent Calculator

A gradient descent calculator for the linear regression model \(y = ax + b\) is presented. The values of the coefficients \(a\) and \(b\) are compared to those calculated using the analytical method based on linear least squares fitting. Comparisons of the results of the two methods are also presented for educational purposes.
Let us assume that we have a set of \( n \) data points \( (x_i , y_i) \) and we need to find a linear model of the form \(y = ax + b\) by minimizing the sum \[ \Sigma^{n}_{i=1}(y_i - (a \cdot x_i + b))^2 \] which is the sum of the squares of the vertical distances between the data points and the corresponding points on the linear model, as shown in the figure below.
(Figure: data points and the fitted linear model)
Note that the sum of the squares is used instead of the absolute values for practical reasons when computing partial derivatives and performing other analytical computations.
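For concreteness, the sum above can be evaluated directly for a trial pair of coefficients. The following Python sketch uses made-up data points and illustrative values of \( a \) and \( b \):

```python
import numpy as np

# Hypothetical data points and trial coefficients, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a, b = 2.0, 0.0

# Sum of the squared vertical distances between the data and the line y = a*x + b.
sse = np.sum((y - (a * x + b)) ** 2)
print(sse)
```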
We now present two methods to calculate the coefficients \( a \) and \( b \) of the linear model.

Gradient Descent Method

Gradient descent is an iterative optimization algorithm that finds the parameters \(a\) and \(b\) by minimizing the cost function defined by:

\[ J(a, b) = \dfrac{1}{2n} \Sigma^{n}_{i=1}(y_i - (a \cdot x_i + b))^2 \]
Its partial derivatives, which are needed in the update rules, are given by:
\[ \dfrac{\partial J}{\partial a} = \dfrac{1}{n} \Sigma^{n}_{i=1}\left(- x_i (y_i - (a x_i + b))\right) \]
\[ \dfrac{\partial J}{\partial b} = \dfrac{1}{n} \Sigma^{n}_{i=1}\left(-(y_i - (a x_i + b))\right) \]
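These expressions translate directly into code. Below is a minimal Python sketch, where the function names cost and gradients are chosen here only for illustration:

```python
import numpy as np

def cost(a, b, x, y):
    """J(a, b) = (1/(2n)) * sum_i (y_i - (a*x_i + b))^2."""
    n = len(x)
    return np.sum((y - (a * x + b)) ** 2) / (2 * n)

def gradients(a, b, x, y):
    """Partial derivatives of J with respect to a and b at the point (a, b)."""
    n = len(x)
    residuals = y - (a * x + b)
    dJ_da = np.sum(-x * residuals) / n
    dJ_db = np.sum(-residuals) / n
    return dJ_da, dJ_db
```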

The update rules for the parameters \( a \) and \( b \) are:

\[ a_{k+1} = a_k - r \left.\dfrac{\partial J}{\partial a}\right|_{a=a_k,b=b_k} = a_k - r \dfrac{1}{n} \Sigma^{n}_{i=1}\left(-x_i (y_i - (a_k x_i + b_k))\right) \]
\[ b_{k+1} = b_k - r \left.\dfrac{\partial J}{\partial b}\right|_{a=a_k,b=b_k} = b_k - r \dfrac{1}{n} \Sigma^{n}_{i=1}\left(-(y_i - (a_k x_i + b_k))\right) \]
where \( r \) is the learning rate, \( k \) is the iteration index, and \( a_0 \), \( b_0 \) are the initial values of the coefficients.
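A possible implementation of the iteration, reusing the gradients helper sketched above, is shown below; the default values of \( r \), \( a_0 \), \( b_0 \), the number of epochs and the tolerance are illustrative and do not correspond to the calculator's actual settings:

```python
def gradient_descent(x, y, a0=0.0, b0=0.0, r=0.01, epochs=1000, tol=1e-9):
    """Apply the update rules a_{k+1} = a_k - r*dJ/da and b_{k+1} = b_k - r*dJ/db."""
    a, b = a0, b0
    for _ in range(epochs):
        dJ_da, dJ_db = gradients(a, b, x, y)
        # The iteration has converged once both partial derivatives are close to zero.
        if abs(dJ_da) < tol and abs(dJ_db) < tol:
            break
        a = a - r * dJ_da
        b = b - r * dJ_db
    return a, b
```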

Linear Least Squares Fitting

The analytical method based on the linear least squares fitting uses closed-form formulas derived from minimizing the sum of squared errors. The formulas for parameters \(a\) and \(b\) are:

\[ a = \dfrac{(n \cdot \displaystyle \Sigma^{n}_{i=1} (x_i \cdot y_i) - \Sigma^{n}_{i=1} x_i \cdot \Sigma^{n}_{i=1} y_i)}{ (n \cdot \Sigma^{n}_{i=1} (x_i^2) - (\Sigma^{n}_{i=1}x_i)^2) } \] \[ b = \dfrac{(\Sigma^{n}_{i=1}y_i - a \cdot \Sigma^{n}_{i=1}x_i) }{ n } \]
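These closed-form formulas can also be coded directly; here is a brief Python sketch, where the function name least_squares_fit is chosen for illustration:

```python
import numpy as np

def least_squares_fit(x, y):
    """Closed-form coefficients of y = a*x + b from the formulas above."""
    n = len(x)
    a = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    b = (np.sum(y) - a * np.sum(x)) / n
    return a, b
```

The coefficients returned by least_squares_fit can then be compared to those produced by the iterative gradient_descent sketch above.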

Use of the Calculator

The purpose of this calculator is to investigate the effects of the learning rate \( r \) and the initial values of the coefficients \( a \) and \( b \) on the convergence of the iterative process based on the gradient descent update formulas above.
Enter the number of points \( n \) and click "Generate Points", which randomly generates \( n \) points; then investigate the behavior for different values of \( r \), \( a_0 \) and \( b_0 \).
The iteration process converges to the solution when the partial derivatives \( \dfrac{\partial J}{\partial a} \) and \( \dfrac{\partial J}{\partial b} \) approach zero.
The maximum number of iterations may also be increased if needed, but the computations might take more time.
A scatter plot (red) of the data points is displayed along with the graph of \( y = a x + b \), in blue for the gradient descent method and in green for the linear least squares method.

Enter the number of points: \( n = \)

Epochs (maximum number of iterations):
Learning Rate \( r \):
Initial value of \( a \): \(a_0 = \)
Initial value of \( b \): \(b_0 = \)

Outputs

\( \dfrac{\partial J}{\partial a} \) =

\( \dfrac{\partial J}{\partial b} \) =

Results From the Gradient Descent Method

\(a = \)

\(b = \)

Results From the Linear Least Squares Method (using the formulas for \( a \) and \( b \) above)

\(a = \)

\(b = \)

Minimum Value of the cost function \( J(a, b) \):

Graph Legend: Blue for the gradient descent method, green for the linear least squares method.