Here is the basic gist of what Scaling and Normalization in Machine Learning refer to:
Scaling changes the range of your data.
Normalization changes the shape of your data's distribution.
Now let's describe these two terms in more detail...
Scaling transforms your data to fit a pre-defined scale or range. For example, if the values in some original dataset range from 0 to 10, that range will change after applying a scaler such as Min-Max Scaling.
In the case of the minmax_scaling() method, the resulting range runs from 0 to 1: every value that previously lay between 0 and 10 now lies between 0 and 1, while the proportions/ratios between the values are preserved despite the change in scale.
Here is an example of Scaling using the minmax_scaling() method:
And here is the Kaggle Notebook link that produced the above scatter plot.
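As a minimal sketch of the idea, here is min-max scaling written out in plain NumPy on a hypothetical five-value array standing in for the 0-to-10 dataset (this is the same transform minmax_scaling() applies with its default 0-to-1 range):

```python
import numpy as np

# Hypothetical sample data ranging from 0 to 10
original = np.array([0.0, 2.0, 5.0, 9.0, 10.0])

# Min-max scaling: map the observed range onto [0, 1]
scaled = (original - original.min()) / (original.max() - original.min())

print(scaled)
```

Note that the scaled values keep the same relative spacing as the originals; only the range has shrunk.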
Normalization is a more extreme version of scaling: it not only changes the range of the data in question but also reshapes the data to fit a normal distribution. Doing so changes the shape of the data when graphed, with the fully normalized version looking highly similar to a normal distribution (also known as a bell curve).
Here is an example of Normalization using the Box-Cox Transformation:
And here is the Kaggle Notebook link that produced the above dist plot
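The Box-Cox step can be sketched with SciPy on hypothetical right-skewed data (the exponential sample below is an assumption for illustration; Box-Cox requires strictly positive values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical right-skewed sample (strictly positive, as Box-Cox requires)
skewed = rng.exponential(scale=2.0, size=1000)

# scipy.stats.boxcox returns the transformed data and the fitted lambda
normalized, fitted_lambda = stats.boxcox(skewed)

# Skewness near 0 means the transformed data is roughly bell-shaped
print(stats.skew(skewed), stats.skew(normalized))
```

Plotting a histogram of `normalized` next to one of `skewed` would reproduce the before/after shape change the dist plot shows.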
Similarities between Scaling and Normalization:
There are many similarities between Scaling and Normalization; here are a few of them:
- Both techniques are used to transform numeric values
- Both techniques are part of the broader practice of data cleaning in machine learning
- Both techniques are only applied to numeric data
Differences between Scaling and Normalization:
There are also many differences between Scaling and Normalization. Here are a few of them:
- Scaling changes the range of the data while leaving the shape of its distribution unaffected
- Normalization changes the shape of the distribution (and, as a side effect, usually changes the range as well)
- Scaling can shrink or stretch the data to fit within a given range, but the ratios between values are preserved
- Normalization adjusts the values so the distribution resembles a bell curve, which scaling alone cannot do
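The key difference above can be checked directly: scaling is an affine transform, so it leaves skewness (a measure of distribution shape) untouched, while Box-Cox normalization changes it. A small sketch, using the same hypothetical exponential sample as before:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=1000)  # skewed, strictly positive

# Scaling: the range changes, the shape (skewness) does not
scaled = (data - data.min()) / (data.max() - data.min())

# Normalization: the shape changes toward a bell curve
normalized, _ = stats.boxcox(data)

print(stats.skew(data), stats.skew(scaled), stats.skew(normalized))
```

The first two skewness values come out identical, while the third is close to zero.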
Thanks for reading this blog post!
If you have any questions or concerns, feel free to post a comment on this post and I will get back to you if I find the time.