Generate 20 data pairs (X, Y) using y = sin(2*pi*X) + 0.1 * N

Draw X from a uniform distribution between 0 and 1
Sample N from the standard normal (Gaussian) distribution
Use 10 pairs for training and 10 for testing
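The generation step above can be sketched as follows; NumPy is assumed, and the variable names (`x_train`, `y_train`, the fixed seed) are my own choices, not from the source.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

N_PAIRS = 20
x = rng.uniform(0.0, 1.0, N_PAIRS)       # X ~ Uniform(0, 1)
noise = rng.standard_normal(N_PAIRS)     # N ~ Normal(0, 1)
y = np.sin(2 * np.pi * x) + 0.1 * noise  # y = sin(2*pi*X) + 0.1*N

# First 10 pairs for training, remaining 10 for testing
x_train, x_test = x[:10], x[10:]
y_train, y_test = y[:10], y[10:]
```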




Using root mean square error, find the weights of polynomial regression for orders 0, 1, 3, and 9

    For Order = 0







For Order = 1


For Order = 3



For Order = 9
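One possible implementation of the fitting step for all four orders: ordinary least squares on a Vandermonde design matrix. The helper names (`fit_polynomial`, `rmse`) are illustrative, and the training data is regenerated here so the snippet runs on its own.

```python
import numpy as np

def fit_polynomial(x, y, order):
    """Return least-squares weights w[0..order] for y ≈ sum_k w[k] * x**k."""
    A = np.vander(x, order + 1, increasing=True)  # columns: 1, x, x^2, ...
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def rmse(x, y, w):
    """Root-mean-square error of the polynomial with weights w on (x, y)."""
    pred = np.polyval(w[::-1], x)  # polyval expects the highest power first
    return np.sqrt(np.mean((pred - y) ** 2))

# Example with 10 training pairs generated as in the data step
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(10)

weights = {m: fit_polynomial(x_train, y_train, m) for m in (0, 1, 3, 9)}
```

Note that the order-0 fit reduces to the mean of the training targets, and the order-9 fit has as many parameters as training points, so it interpolates them almost exactly.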



To Display All the Weights in a Table
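A minimal way to tabulate the fitted weights, one column per model order; the fit is re-derived here so the snippet is self-contained, and the layout (rows w0..w9, blanks where a weight does not exist) is my own choice.

```python
import numpy as np

def fit_polynomial(x, y, order):
    A = np.vander(x, order + 1, increasing=True)
    return np.linalg.lstsq(A, y, rcond=None)[0]

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(10)

orders = (0, 1, 3, 9)
weights = {m: fit_polynomial(x_train, y_train, m) for m in orders}

# Rows are w0..w9, columns are the model orders; blank where the weight is absent
header = "      " + "".join(f"M={m:<10}" for m in orders)
print(header)
for k in range(10):
    row = f"w{k:<5}"
    for m in orders:
        row += f"{weights[m][k]:<10.2f}" if k <= m else " " * 10
    print(row)
```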




To Draw Train Error vs Test Error Graph

    First we plot the train error

    Then we plot the test error



    Now the Train vs Test Error plot
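The three plotting steps above can be sketched as one script; matplotlib is assumed to be installed, and the headless `Agg` backend plus `savefig` are my choices for a non-interactive run.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; safe when no display is attached
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
x_train, x_test, y_train, y_test = x[:10], x[10:], y[:10], y[10:]

def rmse_for_order(order):
    """Fit on the train split, return (train RMSE, test RMSE)."""
    A = np.vander(x_train, order + 1, increasing=True)
    w = np.linalg.lstsq(A, y_train, rcond=None)[0]
    def err(xs, ys):
        pred = np.vander(xs, order + 1, increasing=True) @ w
        return np.sqrt(np.mean((pred - ys) ** 2))
    return err(x_train, y_train), err(x_test, y_test)

orders = range(10)
train_err, test_err = zip(*(rmse_for_order(m) for m in orders))

plt.plot(orders, train_err, "o-", label="Train RMSE")
plt.plot(orders, test_err, "o-", label="Test RMSE")
plt.xlabel("Polynomial order M")
plt.ylabel("RMSE")
plt.legend()
plt.savefig("train_vs_test_error.png")
```

At order 9 the train error collapses toward zero while the test error stays high, which is the overfitting signature the graph is meant to show.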



Now generate 100 more data pairs, fit them to the 9th-order model, and draw the fit




    Now fit the data to the order-9 model
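A sketch of the regeneration and order-9 re-fit (variable names are my own). With 100 points the model no longer has enough freedom to chase the noise, so the training RMSE stays near the noise level (~0.1) instead of collapsing to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
x100 = rng.uniform(0, 1, 100)
y100 = np.sin(2 * np.pi * x100) + 0.1 * rng.standard_normal(100)

A = np.vander(x100, 10, increasing=True)  # order 9 -> 10 columns
w, *_ = np.linalg.lstsq(A, y100, rcond=None)

train_rmse = np.sqrt(np.mean((A @ w - y100) ** 2))

# To draw the fit, evaluate the polynomial on a dense grid over [0, 1]
xs = np.linspace(0, 1, 200)
fit = np.vander(xs, 10, increasing=True) @ w  # curve to plot against the data
```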




Now we will regularize using the sum of the weights as a penalty term.

Now we need to plot the train error vs test error as a function of lambda

    Based on the best test performance, we found the best lambda value to be -1
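One possible implementation of the lambda sweep. The source describes the penalty as a sum of weights; the standard choice for this exercise is the sum of *squared* weights (ridge / L2), which is what this sketch assumes, and the grid of log-lambda values is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)
x_tr, x_te, y_tr, y_te = x[:10], x[10:], y[:10], y[10:]

def ridge_fit(x, y, order, lam):
    """Minimize ||Aw - y||^2 + lam * ||w||^2 (closed form)."""
    A = np.vander(x, order + 1, increasing=True)
    return np.linalg.solve(A.T @ A + lam * np.eye(order + 1), A.T @ y)

def rmse(x, y, w):
    pred = np.vander(x, len(w), increasing=True) @ w
    return np.sqrt(np.mean((pred - y) ** 2))

ln_lams = np.arange(-30, 1)  # lambda = e^{ln_lam}
results = []
for ll in ln_lams:
    w = ridge_fit(x_tr, y_tr, 9, np.exp(ll))
    results.append((ll, rmse(x_tr, y_tr, w), rmse(x_te, y_te, w)))

best = min(results, key=lambda r: r[2])  # row with the lowest test RMSE
```

The `(ln lambda, train RMSE, test RMSE)` rows in `results` are exactly the series one would plot to pick the best lambda.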

Contributions

In the reference I consulted, the values were hardcoded, so I changed them to be dynamic; the weights now change according to the code. The train vs test error graph was not satisfying the overfitting conditions, so I tried to further decrease the training error and increase the test error.

Challenges

The main challenge I faced was finding the right function to get the right values for the train and test weights: the train error vs test error graph was not satisfying the overfitting condition, as the test error was similar to the train error and there was very little difference between them.

Visualization of the Graphs

Plotting graphs is a major aspect of this concept. Plotting the train error against the test error helps us tell whether the model is overfit: by visualizing the difference between the two errors, we can draw conclusions about the model. Visualization also showed that the model was overfit at order 9 with 10 data points, and that increasing the number of data points at the same order 9 reduces overfitting. Thus visualization has many benefits.

Explanation of Overfitting / Underfitting

Overfitting occurs when the model is too complex: its training error is low but its test error is high. The model learns the training data so thoroughly that it hurts performance on new data. One way to reduce overfitting is to increase the amount of data, which decreases the gap between the training error and the test error. This is exactly what we applied above: when overfitting occurred at order = 9, we increased the number of data points to 100 and reduced the train vs test error gap. Underfitting means the model is too simple, so both the train error and the test error are too high.
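The claim that more data shrinks the train/test gap can be checked numerically. This small illustration (sample sizes and helper name are my own) measures the gap of the order-9 model at 10 versus 100 training points.

```python
import numpy as np

def gap(n_train, seed=0):
    """Test RMSE minus train RMSE for an order-9 fit on n_train points."""
    rng = np.random.default_rng(seed)
    x_tr = rng.uniform(0, 1, n_train)
    y_tr = np.sin(2 * np.pi * x_tr) + 0.1 * rng.standard_normal(n_train)
    x_te = rng.uniform(0, 1, 200)
    y_te = np.sin(2 * np.pi * x_te) + 0.1 * rng.standard_normal(200)

    A = np.vander(x_tr, 10, increasing=True)
    w = np.linalg.lstsq(A, y_tr, rcond=None)[0]

    def rmse(xs, ys):
        pred = np.vander(xs, 10, increasing=True) @ w
        return np.sqrt(np.mean((pred - ys) ** 2))

    return rmse(x_te, y_te) - rmse(x_tr, y_tr)

# gap(10) is typically far larger than gap(100): the overfit shrinks with data
```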

References

I have referred to the code from the following references. They gave me a good head start on the assignment and helped me understand the concept better. I have also experimented with a few things from the referenced code, which is mentioned under "Contributions".

https://mintu07ruet.github.io/files/Miah%2001.html
https://cyanacm.wordpress.com/2020/05/23/dataminint-assignment01/2/
https://www.statology.org/overfitting-machine-learning/


Links

Jupyter notebook for all code.
GitHub link for the blog post
My portfolio

Contact


Any queries? Contact me!

Email Id : Neelesh216@gmail.com

Address : 404 E Border st, Arlington, Texas 76010, USA

Phone : 682 375 1222