Supervised learning: Support Vector Machines (SVM)

Python and Machine Learning (ML) – Part 1 Supervised learning: Support Vector Machines (SVM)

Support Vector Machines (SVM) are a robust and versatile category of supervised machine learning algorithms, used for both classification and regression tasks, but predominantly known for their application in classification. Developed originally in the 1960s and then refined in the 1990s, SVMs have become a fundamental tool in the data scientist’s toolkit due to their ability to handle both linear and non-linear boundaries depending on the kernel used.

How SVM Works

The main idea of SVM is to find the optimal hyperplane which best separates the data into its respective classes. For two-dimensional data, this hyperplane can be thought of as a line dividing a plane in two parts where each class lays on either side.

Key Components:

Hyperplane: This is the decision boundary that separates different classes in the feature space. In SVM, the aim is to find the hyperplane that has the maximum margin, i.e., the maximum distance between data points of both classes.
Support Vectors: These are the data points that are closest to the hyperplane and influence its position and orientation. SVMs are named after these support vectors because changing their position can alter the position of the hyperplane.
Margin: This is a gap between the two lines on the closest class points.

Types of SVM

Linear SVM: Linear SVM is used for linearly separable data, where the data can be separated using a straight line (or hyperplane in higher dimensions).
Non-linear SVM: When data is not linearly separable, SVM uses a method called the kernel trick to transform the input space to a higher dimensional space where a hyperplane can be used to separate the data. Common kernels include:
- Polynomial Kernel
- Radial Basis Function (RBF) Kernel: Often just called the Gaussian kernel.
- Sigmoid Kernel

The Kernel Trick

The kernel trick is a key feature that allows SVM to fit the maximum-margin hyperplane in a transformed feature space. The transformation is performed by a kernel function which computes a dot product between two vectors in the feature space. This allows the algorithm to fit the data in a higher-dimensional space without explicitly calculating the coordinates in that space, which can be computationally expensive.

Training SVM

The training of an SVM involves selecting a kernel type and its parameters as well as a regularization parameter 𝐶C that controls the trade-off between achieving a low error on the training data and minimizing the norm of the weights, thus ensuring a lower model complexity and better generalization properties on new data.

Applications of SVM

Face Detection: SVM classifies parts of the image as a face and non-face and creates a square boundary around the face.
Text and Hypertext Categorization: SVMs allow Textual data to be classified with high accuracy using training data. It is used for various classification tasks such as news articles, emails, and web pages classification.
Classification of Images: SVMs provide better search accuracy for image classification. It provides better accuracy in comparison to the traditional query-based searching techniques.
Bioinformatics: It includes protein classification and cancer classification. SVMs are used to classify proteins with up to 90% of the compounds classified correctly.
Handwriting Recognition: SVMs are used to recognize handwritten characters used widely.

Advantages of SVM

Effective in high dimensional spaces.
Still effective in cases where the number of dimensions exceeds the number of samples.
Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
Versatile: Different Kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.

Limitations of SVM

If the number of features is much greater than the number of samples, the method is likely to give poor performances.
SVMs do not directly provide probability estimates, these are calculated using an expensive five-fold cross-validation.
They are sensitive to the type of kernel used and can be quite prone to over-fitting if the kernel parameters and regularization terms are not carefully chosen.

Previous Lesson

Back to Tutorial

Next Lesson

Cookie	Duration	Description
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
CookieLawInfoConsent	1 year	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
tk_lr	1 year	The tk_lr is a referral cookie set by the JetPack plugin on sites using WooCommerce, which analyzes referrer behaviour for Jetpack.
tk_or	5 years	The tk_or is a referral cookie set by the JetPack plugin on sites using WooCommerce, which analyzes referrer behaviour for Jetpack.
tk_r3d	3 days	JetPack installs this cookie to collect internal metrics for user activity and in turn improve user experience.
tk_tc	session	JetPack sets this cookie to record details on how user's use the website.