
Scaling and Normalization in Machine Learning

Figure 1: Introduction image featuring nature scenery

Here is the basic gist of what Scaling and Normalization refer to in machine learning:

Scaling changes the range of your data.

Normalization changes the shape of your data's distribution.

Now let's describe these two terms in more detail...

Scaling

Scaling transforms your data to fit a pre-defined scale or range. For example, if the values in the original dataset range from 0 to 10, that range will change after applying a scaler such as Min-Max Scaling.

In the case of the minmax_scaling() method, the new range runs from 0 to 1: every value that previously lay between 0 and 10 now lies between 0 and 1, while the proportions/ratios between the values are preserved despite the change of scale.
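To make this concrete, here is a minimal sketch of what min-max scaling does under the hood, written with plain NumPy rather than the minmax_scaling() library function (the helper name min_max_scale and the sample values are just for illustration):

```python
import numpy as np

def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so they span [new_min, new_max]."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    # A linear map: the ratios between distances of values are preserved
    return new_min + (values - old_min) * (new_max - new_min) / (old_max - old_min)

data = np.array([0.0, 2.0, 5.0, 10.0])
scaled = min_max_scale(data)
# scaled now spans [0, 1]; 2 maps to 0.2, 5 maps to 0.5, and so on
```

Because the map is linear, a value that sat halfway along the original range still sits halfway along the new one.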

There are various types of scaling, such as Min-Max Scaling, Mean Normalization, Max Absolute Scaling and Robust Scaling.

Here is an example of Scaling using the MinMaxScaler():

Figure 2: An example of Scaling using the MinMaxScaler()

And here is the Kaggle Notebook link that produced the above scatter plot

Normalization

Normalization is a more radical transformation than scaling: it not only changes the range of the data but also reshapes it toward a normal distribution. This changes the shape of the data when graphed, with the fully normalized version closely resembling a normal distribution (also known as a bell curve).

There are various types of normalization, such as Z-Score Normalization (also known as Standardization) and the Box-Cox Transformation.
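As a quick sketch of the first of these, here is Z-Score Standardization in plain NumPy (the exponential sample data is made up for illustration; note that z-scoring recenters and rescales the data, while a power transform such as Box-Cox is what actually reshapes a skewed distribution toward a bell curve):

```python
import numpy as np

rng = np.random.default_rng(0)
# A skewed sample feature column (exponential distribution)
data = rng.exponential(scale=2.0, size=1000)

# Z-score standardization: subtract the mean, divide by the standard deviation
z = (data - data.mean()) / data.std()

# z now has mean ~0 and standard deviation ~1
```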

Here is an example of Normalization using the Box-Cox Transformation:

Figure 3: An example of Normalization using the Box-Cox Transformation

And here is the Kaggle Notebook link that produced the above dist plot

Similarities between Scaling and Normalization:

There are many similarities between Scaling and Normalization and here are a few of them:

  • Both techniques transform numeric values
  • Both techniques are part of the broader practice of data cleaning (preprocessing) in machine learning
  • Both techniques apply only to numeric data, not categorical data

Differences between Scaling and Normalization:

There are also many differences between Scaling and Normalization. Here are a few of them:

  • Scaling changes the range of the data while leaving the shape of its distribution unaffected
  • Normalization changes the shape of the distribution; the resulting range depends on the method used
  • Scaling shrinks or stretches the data linearly so that it fits within a chosen range
  • Normalization reshapes the values so their distribution approaches a bell curve, which means the spacing between values is no longer preserved
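The first two differences above can be checked numerically. The sketch below (all names and sample data are illustrative; the log transform is the Box-Cox transform with lambda = 0) shows that min-max scaling leaves skewness, a measure of distribution shape, exactly unchanged, while a reshaping transform does not:

```python
import numpy as np

def skewness(x):
    """Sample skewness: third standardized moment."""
    x = np.asarray(x, dtype=float)
    return ((x - x.mean()) ** 3).mean() / x.std() ** 3

rng = np.random.default_rng(1)
data = rng.exponential(scale=2.0, size=10_000)  # heavily right-skewed

# Min-max scaling: new range [0, 1], but the shape (skewness) is untouched
scaled = (data - data.min()) / (data.max() - data.min())

# Log transform (Box-Cox with lambda = 0): actually reshapes the data
reshaped = np.log(data)

# skewness(data) == skewness(scaled), since linear maps preserve shape;
# skewness(reshaped) is smaller in magnitude, i.e. more symmetric
```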

Conclusion

Thanks for reading this blog post!

If you have any questions or concerns, feel free to post a comment on this post and I will get back to you when I find the time.

If you found this article helpful, feel free to share it, and make sure to follow me on Twitter and GitHub, connect with me on LinkedIn and subscribe to my YouTube channel.