Logistic Regression in Machine Learning Explained
What is Logistic Regression?
Logistic regression is a type of machine learning model. Of the two broad types of machine learning models (regression and classification), it is a classification algorithm, even though its name suggests a regression model.
It derives its name from the logistic function, a type of exponential function characterized by an S-shaped curve (also known as a sigmoid curve).
Accordingly, logistic regression models produce S-shaped curves when plotted on graphs, like so:
Figure 2: Logistic Regression Model when graphed
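To make the S-shape concrete, here is a short sketch (in Python, purely for illustration; the worked example later in this post is in R) that evaluates the logistic function at a few inputs:

```python
import math

def sigmoid(x):
    # Logistic function: maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, large positive inputs approach 1,
# and the curve passes through 0.5 at x = 0 -- producing the S shape.
for x in [-6, -2, 0, 2, 6]:
    print(x, round(sigmoid(x), 4))
```

Plotting these outputs against their inputs traces out exactly the sigmoid curve shown in the figure above.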
Here are important points about Logistic Regression:
- Logistic Regression is intended for classification ML problems that have a single binary output variable. Machine learning problems with any other type of output variable should not use Logistic Regression for predictive modelling purposes
- Logistic Regression algorithms are similar to Linear Regression algorithms in that both are predictive analysis algorithms, but Logistic Regression uses the more complex sigmoid function instead of a relatively simple linear function
- Whereas linear regression produces numeric output values, logistic regression produces binary output values (0 or 1)
- Although logistic regression is technically a classification algorithm, it does predict probabilities in a regression-algorithm sense. It's just that each probability is then converted into a binary value (0 or 1)
- Logistic regression uses maximum-likelihood estimation to estimate its coefficients (the beta values b) from the available training data
- Logistic regression and linear regression share the same methods of data preparation
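The conversion from predicted probability to a binary class mentioned above can be sketched as follows. The coefficients `b0` and `b1` and the 0.5 threshold here are hypothetical, chosen only for illustration; in practice they would come from maximum-likelihood estimation on training data:

```python
import math

# Hypothetical coefficients, standing in for values that
# maximum-likelihood estimation would learn from training data.
b0, b1 = -1.5, 0.8

def predict_probability(x):
    # Linear combination of inputs passed through the logistic function
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict_class(x, threshold=0.5):
    # The regression-style probability is converted to a binary label
    return 1 if predict_probability(x) >= threshold else 0

print(predict_probability(3.0))  # a probability strictly between 0 and 1
print(predict_class(3.0))        # either 0 or 1
```

This shows both halves of the earlier point: the model first predicts a probability, and only then does the thresholding step turn that probability into a 0-or-1 classification.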
Example of Logistic Regression In R
Here is a code snippet in the R programming language that demonstrates the use of the Logistic Regression Model:
library(ggplot2)
library(cowplot)

url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
data <- read.csv(url, header=FALSE)
...
logistic <- glm(hd ~ ., data = data, family = "binomial")
...
predicted.data <- data.frame(
  probability.of.hd=logistic$fitted.values,
  hd=data$hd)
predicted.data <- predicted.data[
  order(predicted.data$probability.of.hd, decreasing=FALSE),]
predicted.data$rank <- 1:nrow(predicted.data)

ggplot(data=predicted.data, aes(x=rank, y=probability.of.hd)) +
  geom_point(aes(color=hd)) +
  xlab("Index") +
  ylab("Predicted probability of getting heart disease")
The above code snippet is taken from StatQuest with Josh Starmer, and you can find the full version of this code in this GitHub link
Now, here is the graph that is produced by running the full version of the code snippet above:
Figure 3: Logistic Regression graphed using the R programming language
Here are some noteworthy points about the above graph:
- The range of the y-axis is always between 0 and 1 for Logistic Regression graphs. This is because the dependent variable (the predicted probability of getting heart disease) is a probability, and probabilities always lie between 0 and 1
- The turquoise-colored data points represent patients already having heart disease. The light-red-colored data points represent patients without any past history of heart disease
- The x-axis value 'Rank' orders the 297 patients in the data.frame by their predicted likelihood of having heart disease, with a higher rank indicating a higher likelihood of getting heart disease
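For readers more comfortable outside of R, the same sort-by-predicted-probability idea behind the graph can be sketched in Python with scikit-learn. This is not the heart-disease dataset; the data below is synthetic, used only to illustrate the workflow:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for the heart-disease data:
# one numeric feature and a binary (0/1) label
X = rng.normal(size=(100, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=100) > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Predicted probability of the positive class for each sample,
# sorted ascending -- the analogue of 'rank' on the x-axis of the plot
probs = np.sort(model.predict_proba(X)[:, 1])
print(probs.min(), probs.max())  # all values lie strictly between 0 and 1
```

Plotting `probs` against its index reproduces the characteristic S-shaped scatter seen in the figure above: low-probability patients on the left, high-probability patients on the right.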
That's it for this blog post on Logistic Regression in Machine Learning. I hope it was helpful!
Thanks for reading this blog post!
If you have any questions or concerns feel free to post a comment in this post and I will get back to you if I find the time.
If you found this article helpful, feel free to share it, and make sure to follow me on Twitter and GitHub, connect with me on LinkedIn and subscribe to my YouTube channel.