Linear Regression in Machine Learning Explained

Ever wonder what the terms Regression and Linear Regression mean in a machine learning context? If so, keep reading to find out!

What is Regression in Machine Learning?

Regression is a technique for analyzing the relationship between independent variables (features) and a dependent variable (outcome). In machine learning, it is most often used to solve supervised learning problems.

Regression modeling consists of building a mapping function from input variables to a continuous output variable. The input variables are the independent variables and the continuous output is the dependent variable. This type of modeling is mainly useful in two ways: (1) to predict the outcome for new, unseen input data, and (2) to forecast values and fill gaps where data is missing.
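As a small illustration of the first use, here is a sketch in R that learns a mapping from one input to a continuous output and then applies it to inputs it has never seen. It uses the `cars` dataset that ships with base R and a linear model (covered in the next section); the two new speed values are hypothetical and chosen only because they do not appear in the data.

```r
# `cars` ships with base R: stopping distance (dist) recorded at various speeds
model <- lm(dist ~ speed, data = cars)

# Hypothetical new inputs that are not present in the dataset
new_speeds <- data.frame(speed = c(11.5, 21.5))

# Apply the learned mapping to the unseen inputs
predicted_dist <- predict(model, newdata = new_speeds)
predicted_dist
```

The output is one predicted stopping distance per new speed, which is exactly the "continuous output variable" described above.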

What does Linear Regression mean in Machine Learning?

Linear regression is a type of machine learning model. It is most easily explained as a linear equation that maps a set of input values (the independent variables) to a predicted output value (the dependent variable).

Although linear regression is originally a concept derived from the field of statistics, it is used extensively in machine learning for the purposes of understanding the relationship between independent input variables and a dependent output variable.

There are several types of linear regression. The most notable are simple linear regression (where there is only a single input variable) and multiple linear regression (where there are multiple input variables).

The different types are distinguished mainly by the number of independent variables and by the type of relationship between the independent and dependent variables.
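To show the difference between the simple and multiple cases, here is a minimal sketch using the `mtcars` dataset that ships with base R. The choice of columns (`mpg` as the output, `wt` and `hp` as inputs) is only an assumption for demonstration purposes.

```r
# Simple linear regression: one input variable (car weight)
simple_fit <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: two input variables (weight and horsepower)
multiple_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Each model has one intercept plus one coefficient per input variable
coef(simple_fit)
coef(multiple_fit)
```

The simple model returns two coefficients and the multiple model returns three, since every extra input variable adds one more slope to the equation.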

Linear regression can be described as a simple linear equation because, ultimately, it is a line drawn through a set of data points and chosen to model those points as accurately as possible.

Here is what a linear equation looks like:

y = ax + c

Here x is the input (independent) variable, y is the predicted output (dependent) variable, a represents the slope of the line, and c represents its y-intercept.
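To make the slope and intercept concrete, here is a minimal sketch in R. The data points are invented for illustration, and the names `slope` and `intercept` are my own stand-ins for a and c in the equation above. The snippet computes both values with the standard least-squares formulas and then compares them with what R's built-in lm() function reports.

```r
# Toy data, invented for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares estimates: slope = cov(x, y) / var(x),
# and the line passes through the point (mean(x), mean(y))
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# lm() fits the same line; coef() returns the intercept first, then the slope
fit <- lm(y ~ x)
coef(fit)
c(intercept, slope)
```

Both printed vectors should agree, which shows that lm() with a single input is fitting the y = ax + c line described above.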

Example of Linear Regression:

In order to better understand linear regression, let's go over an example. Here is a graph showing the heights of fathers in comparison to their sons:

Figure 1: Comparison of fathers' heights to their sons

Here is a code snippet using the R programming language that will calculate the linear regression line for the graph above:

library(tidyverse)
library(HistData)
library(caret)

# Keep each family's first-born child when that child is a son, then rename the `childHeight` column to `son`
galton_heights <- GaltonFamilies %>%
  filter(childNum == 1 & gender == "male") %>%
  select(father, childHeight) %>%
  rename(son = childHeight)

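# Randomly pick half of the rows for the test set (the seed value is arbitrary and only makes the split reproducible)
set.seed(1)
y <- galton_heights$son
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)

# Divide the original dataset into train and test sets
train_set <- galton_heights %>% slice(-test_index)
test_set <- galton_heights %>% slice(test_index)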

# Fit linear regression model
fit <- lm(son ~ father, data = train_set)

# Plot the graph and the linear regression line
plot(galton_heights, pch = 16, col = "blue")
abline(fit, col="red", lwd=5)

And here is the linear regression line that the above code produces:

Figure 2: Linear regression line in graph
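The example code creates a test_set but never uses it. As a rough sketch of how the held-out half could be put to work, the snippet below refits the same model and measures its error on the test rows. It is self-contained and uses base R's sample() for the split instead of caret's createDataPartition(), so the split will not match the one in the example. The seed and the choice of RMSE as the metric are my own assumptions.

```r
library(HistData)

# Same filtering as in the example, written in base R
galton_heights <- subset(GaltonFamilies,
                         childNum == 1 & gender == "male",
                         select = c(father, childHeight))
names(galton_heights)[names(galton_heights) == "childHeight"] <- "son"

# Hold out half of the rows (seed is arbitrary, for reproducibility only)
set.seed(1)
test_index <- sample(nrow(galton_heights), size = floor(nrow(galton_heights) / 2))
train_set <- galton_heights[-test_index, ]
test_set <- galton_heights[test_index, ]

# Fit on the training half, then predict the sons' heights in the test half
fit <- lm(son ~ father, data = train_set)
predicted <- predict(fit, newdata = test_set)

# Root mean squared error, in inches (the unit used in the dataset)
rmse <- sqrt(mean((predicted - test_set$son)^2))
rmse
```

A smaller RMSE means the line predicts the unseen sons' heights more closely. Because it is measured on rows the model never saw, it is a fairer estimate of accuracy than the error on the training data.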

Ok, that's it for this blog post on what Linear Regression is. I hope you found it useful!

Conclusion

Thanks for reading this blog post!

If you have any questions or concerns, please feel free to leave a comment on this post and I will get back to you if I find the time.

If you found this article helpful, please share it and make sure to follow me on Twitter and GitHub, connect with me on LinkedIn, and subscribe to my YouTube channel.