Scikit-learn Python Library: APIs

Scikit-learn's APIs are well documented and designed to make model building, training, and evaluation easy and efficient. Here's an overview of the main APIs the library provides:

1. Estimator API

This is the core API in scikit-learn and is used for all the machine learning algorithms. Each estimator in scikit-learn is a Python class, and the library includes estimators for classification, regression, clustering, and dimensionality reduction. Key methods include:

.fit(): Used for training the model.
.predict(): Used for making predictions.
.score(): Used for evaluating the fitted model on given data with a default metric (mean accuracy for classifiers, R² for regressors).
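
The three methods above can be sketched as follows. This is a minimal illustration, not a recommended workflow: the choice of LogisticRegression, the bundled iris dataset, and the max_iter value are assumptions made for the example, and the model is scored on its own training data only to keep the sketch short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small built-in dataset (assumed here purely for illustration).
X, y = load_iris(return_X_y=True)

# Every estimator is a class; hyperparameters go to the constructor.
clf = LogisticRegression(max_iter=1000)

# .fit() learns the model parameters from the data.
clf.fit(X, y)

# .predict() returns predicted labels for the given samples.
predictions = clf.predict(X[:5])

# .score() returns the default metric (mean accuracy for classifiers).
accuracy = clf.score(X, y)

print(predictions, accuracy)
```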

2. Transformers API

Transformers are used for data preprocessing and feature extraction. They include scaling, normalizing, and converting data so that it can be effectively used by machine learning models. Key methods include:

.fit(): Learning the transformation parameters from the training data.
.transform(): Applying the transformation to any data using the learned parameters.
.fit_transform(): A utility method that combines fit and transform into a single operation.
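
As a rough sketch of these methods, the example below uses StandardScaler on a tiny hand-made array. The scaler choice and the data values are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this example.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new = np.array([[2.0, 40.0]])

scaler = StandardScaler()

# .fit() learns the per-column mean and standard deviation.
scaler.fit(X_train)

# .transform() applies the learned parameters to any data,
# including data that was not seen during fit.
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)

# .fit_transform() does both steps in one call on the same data.
X_train_scaled_2 = StandardScaler().fit_transform(X_train)

print(scaler.mean_)
print(X_new_scaled)
```

Fitting on the training data and then calling transform on new data keeps the new data from influencing the learned parameters.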

3. Pipeline API

Pipelines help to streamline the process of chaining multiple estimators into one, which is useful for building a model that includes a sequence of transformations followed by a classifier or regressor. Key components include:

Pipeline: Class that behaves like a compound estimator.
make_pipeline: Helper function to simplify pipeline construction.
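
The two components can be compared side by side in a short sketch. The particular steps (StandardScaler followed by LogisticRegression), the step names "scaler" and "clf", and the iris dataset are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Pipeline takes explicit (name, estimator) pairs.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipe.fit(X, y)  # fits the scaler, transforms X, then fits the classifier

# make_pipeline builds the same structure and names the steps automatically.
auto_pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auto_pipe.fit(X, y)

step_names = [name for name, _ in auto_pipe.steps]
print(step_names)
print(pipe.score(X, y))
```

Because the pipeline itself exposes fit, predict, and score, it can be used anywhere a single estimator is expected.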

4. Model Selection API

This part of the library includes tools to choose between models, primarily through cross-validation:

train_test_split: Split arrays or matrices into random train and test subsets.
cross_val_score: Evaluate a score by cross-validation.
GridSearchCV: Exhaustive search over specified parameter values for an estimator.
RandomizedSearchCV: Randomized search over parameters.
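
A minimal sketch of how these tools might fit together is shown below. The KNeighborsClassifier estimator, the candidate n_neighbors values, the 25% test split, and the iris dataset are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out a test set; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 5-fold cross-validation on the training portion: one score per fold.
cv_scores = cross_val_score(KNeighborsClassifier(), X_train, y_train, cv=5)

# Exhaustive search over a small, hand-picked grid of candidate values.
search = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": [1, 3, 5, 7]},
    cv=5,
)
search.fit(X_train, y_train)

# The fitted search object behaves like the best estimator it found.
test_accuracy = search.score(X_test, y_test)

print(cv_scores.mean(), search.best_params_, test_accuracy)
```

RandomizedSearchCV follows the same pattern, but takes param_distributions and an n_iter budget instead of trying every combination.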

5. Metrics API

Scikit-learn provides a broad range of metrics for evaluating model performance, such as accuracy, ROC AUC, and mean squared error. Many of these metrics can also be computed across cross-validation folds, for example through the scoring parameter of cross_val_score.

Classification metrics: accuracy_score, roc_auc_score, confusion_matrix, etc.
Regression metrics: mean_squared_error, r2_score, etc.
Clustering metrics: silhouette_score, adjusted_rand_score, etc.
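
The sketch below applies a few of the classification and regression metrics to small hand-made label and value lists. The numbers are invented for the example so that the results can be checked by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    mean_squared_error,
    r2_score,
)

# Classification: metrics take the true labels first, then the predictions.
y_true_cls = [0, 1, 1, 0, 1]
y_pred_cls = [0, 1, 0, 0, 1]

acc = accuracy_score(y_true_cls, y_pred_cls)   # 4 of 5 correct
cm = confusion_matrix(y_true_cls, y_pred_cls)  # rows: true, columns: predicted

# Regression: same argument order.
y_true_reg = [3.0, 5.0, 7.0]
y_pred_reg = [2.0, 5.0, 8.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)  # (1 + 0 + 1) / 3
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - 2 / 8

print(acc, mse, r2)
print(cm)
```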

6. Decomposition API

This API is used for dimensionality reduction, offering various methods to break down high-dimensional datasets into manageable parts while retaining most of the important information:

PCA: Principal component analysis.
NMF: Non-negative matrix factorization.
TruncatedSVD: Dimensionality reduction using truncated SVD.
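
As one possible illustration, the sketch below reduces the four-feature iris dataset to two components with PCA. The dataset and the choice of two components are assumptions for the example. The decomposition classes follow the transformer interface, so fit_transform is available.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance retained by the kept components.
explained = pca.explained_variance_ratio_.sum()

print(X_reduced.shape, explained)
```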

7. Ensemble Methods API

Scikit-learn includes several ensemble algorithms which combine the predictions of several base estimators to improve generalizability and robustness:

RandomForestClassifier and RandomForestRegressor
GradientBoostingClassifier and GradientBoostingRegressor
AdaBoostClassifier and AdaBoostRegressor
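
The ensemble classes are used like any other estimator. A minimal sketch with RandomForestClassifier follows. The dataset, the number of trees, and the random_state values are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators sets how many decision trees are combined.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Accuracy on the held-out test set.
test_accuracy = forest.score(X_test, y_test)

# The fitted base estimators are available for inspection.
n_trees = len(forest.estimators_)

print(n_trees, test_accuracy)
```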
