Supervised learning: Linear Regression

Linear Regression is one of the simplest and most fundamental algorithms in machine learning and statistics, primarily used for predicting a quantitative response. It is a parametric approach, meaning it assumes a linear relationship between the input variables (independent variables) and a single output variable (dependent variable). Here is a more detailed look at Linear Regression, including its types, how it works, and its assumptions.

Types of Linear Regression

Simple Linear Regression: This involves a single independent variable used to predict a dependent variable. It attempts to establish a linear relationship between the two variables by fitting a linear equation to observed data. The equation of a simple linear regression line is:

y = β₀ + β₁x + ε

where y is the dependent variable, x is the independent variable, β₀ is the intercept, β₁ is the slope, and ε is the error term.
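To make this concrete, below is a minimal NumPy sketch that estimates β₀ and β₁ with the closed-form least-squares formulas. The x and y arrays are made-up illustrative values, not data from this article.

```python
import numpy as np

# Hypothetical 1-D data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# Closed-form least-squares estimates for y = β0 + β1*x + ε:
#   β1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², β0 = ȳ - β1 * x̄
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

print(f"intercept β0 ≈ {beta0:.3f}, slope β1 ≈ {beta1:.3f}")
```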

Multiple Linear Regression: This involves two or more independent variables used to predict a dependent variable by fitting a linear equation to the observed data. The equation for multiple linear regression is:

y = β₀ + β₁x₁ + β₂x₂ + ⋯ + βₙxₙ + ε

where each π‘₯ represents a different independent variable, and each 𝛽 represents the coefficient (or slope) of that variable.

How Linear Regression Works

Linear Regression works by estimating the coefficients of the linear equation that best predict the value of the dependent variable from one or more independent variables. The process involves:

  • Fitting the model: This involves determining the line of best fit through the data points. In the case of simple linear regression this is a straight line; for multiple linear regression it is a hyperplane.
  • Minimizing the error: The most common fitting method is ordinary least squares, which minimizes the sum of the squares of the residuals (the differences between observed and predicted values). A sketch of both steps follows this list.
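Here is one way both steps might look in practice, using scikit-learn's LinearRegression (one common ordinary-least-squares implementation). The data are synthetic, generated only for this sketch.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 40 samples, 2 features, a known linear signal plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=40)

# Fitting the model: LinearRegression solves the ordinary least-squares
# problem, choosing coefficients that minimize the sum of squared residuals.
model = LinearRegression().fit(X, y)

# Minimizing the error: the residuals are observed minus predicted values.
residuals = y - model.predict(X)
print("intercept:", model.intercept_)
print("coefficients:", model.coef_)
print("sum of squared residuals:", np.sum(residuals ** 2))
```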

Key Assumptions

Linear Regression is based on several key assumptions; a few informal checks are sketched after this list:

  • Linearity: The relationship between the independent and dependent variables is linear.
  • Homoscedasticity: The variance of the residuals is the same for any value of the independent variables.
  • Independence: Observations are independent of each other.
  • No multicollinearity: In multiple linear regression, the independent variables are not too highly correlated.
  • Normality: For any fixed value of an independent variable, the dependent variable is normally distributed.
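These assumptions are usually probed informally before trusting a fitted model. The sketch below shows a few heuristic, NumPy-only checks on synthetic data: residual-versus-fitted correlations as a rough probe of linearity and homoscedasticity, and a predictor correlation matrix as a rough probe of multicollinearity. The data are illustrative assumptions, not a formal test suite.

```python
import numpy as np

# Synthetic data for demonstration; any (X, y) pair would work here.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Fit by least squares and compute fitted values and residuals.
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
fitted = X_design @ beta
residuals = y - fitted

# Linearity / homoscedasticity heuristic: residuals should show no trend or
# fanning pattern against fitted values (both correlations near zero here).
print("corr(fitted, residuals):  ", np.corrcoef(fitted, residuals)[0, 1])
print("corr(fitted, |residuals|):", np.corrcoef(fitted, np.abs(residuals))[0, 1])

# Multicollinearity heuristic: pairwise correlations between predictors
# should not be close to ±1.
print("predictor correlation matrix:\n", np.corrcoef(X, rowvar=False))
```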