How to Use the Scikit-learn Python Library

Python and Machine Learning (ML) – Part 1

1. Install Scikit-learn

Ensure Scikit-learn is installed in your Python environment:

pip install scikit-learn
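To confirm the installation worked, you can import the package and print its version. The exact version string will depend on your environment.

```python
# Confirm scikit-learn is importable and report which version is installed
import sklearn

print("scikit-learn version:", sklearn.__version__)
```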

2. Import Necessary Modules

Start your script by importing the classes and functions used in this tutorial:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

3. Load Data

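Scikit-learn ships with several small built-in datasets that are handy for getting started. The example below loads the Iris dataset and stores the feature matrix in X and the class labels in y: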

data = load_iris()
X, y = data.data, data.target
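It is a good habit to inspect the data before modeling. The sketch below prints the array shapes along with the feature and class names; for Iris you should see 150 samples, 4 features, and 3 classes.

```python
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# X is the feature matrix (one row per sample), y holds the integer class labels
print("X shape:", X.shape)            # (n_samples, n_features)
print("y shape:", y.shape)            # (n_samples,)
print("Features:", data.feature_names)
print("Classes:", list(data.target_names))
```

If you only need the arrays, load_iris(return_X_y=True) returns (X, y) directly.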

4. Split the Data

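Divide the data into training and test sets. Here test_size=0.2 holds out 20% of the samples for testing, and random_state=42 makes the split reproducible: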

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
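With test_size=0.2, 30 of the 150 Iris samples go to the test set and 120 to training. For classification it is often worth passing stratify=y so that each class keeps the same proportion in both sets. The sketch below is a variation of the call above with that option added; since Iris has 50 samples per class, the stratified test set should contain 10 of each.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train:", X_train.shape, "Test:", X_test.shape)
# Count how many samples of each class landed in the test set
print("Test class counts:", np.bincount(y_test))
```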

5. Choose a Model

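Select a model that suits your problem. Iris is a classification task, so this example uses a random forest classifier with 100 trees. Setting random_state makes its results reproducible from run to run:

model = RandomForestClassifier(n_estimators=100, random_state=42)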
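Because every scikit-learn estimator exposes the same fit/predict interface, trying a different model usually means changing a single line. The loop below is an illustrative comparison; including LogisticRegression and KNeighborsClassifier alongside the random forest is an assumption for the example, not a recommendation from this tutorial.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate models; all share the same fit/predict interface
models = {
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, estimator.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```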

6. Train the Model

Train the model on the training data with the fit method:

model.fit(X_train, y_train)
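After fit returns, the learned state is stored in attributes whose names end with an underscore. The sketch below reads a few of them from a random forest trained as above (random_state=42 is added for reproducibility). Note that feature_importances_ is specific to tree-based models, so other estimators may not provide it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)  # fit returns the estimator itself, so calls can be chained

# Learned attributes (trailing underscore) only exist after fitting
print("Classes:", model.classes_)
print("Number of features seen:", model.n_features_in_)
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.3f}")
```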

7. Make Predictions

Use the trained model to predict labels for data it has not seen during training, here the test set:

predictions = model.predict(X_test)
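Classifiers such as the random forest can also return per-class probabilities through predict_proba. Each row holds one probability per class and sums to 1, and predict returns the class with the highest probability. A minimal sketch, again assuming random_state=42 for reproducibility:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

predictions = model.predict(X_test)          # hard class labels, shape (30,)
probabilities = model.predict_proba(X_test)  # class probabilities, shape (30, 3)

# Show the first three test samples: predicted label and its probabilities
for label, probs in zip(predictions[:3], probabilities[:3]):
    print(label, np.round(probs, 2))
```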

8. Evaluate the Model

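Assess the model's performance with a metric suited to the task. For classification, accuracy (the fraction of correctly predicted labels) is a simple starting point: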

accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
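Accuracy alone can hide which classes are being confused with one another. confusion_matrix counts true versus predicted labels, and classification_report adds per-class precision, recall, and F1 score. The sketch below uses the same split as above; random_state=42 on the forest is an added assumption for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
predictions = model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predictions)
print(cm)

# Per-class precision, recall and F1, labeled with the species names
print(classification_report(y_test, predictions, target_names=list(data.target_names)))
```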

9. Model Tuning and Cross-Validation

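Improve your model by tuning its hyperparameters with cross-validation. GridSearchCV tries every combination in param_grid (3 × 3 × 5 × 2 = 90 here) and scores each one with 5-fold cross-validation on the training set, so this step performs 450 fits and can take a while: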

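from sklearn.model_selection import GridSearchCV

# Candidate values for each hyperparameter
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_features': ['sqrt', 'log2', None],  # 'auto' is no longer accepted in recent scikit-learn versions
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy']
}

# Evaluate every combination with 5-fold cross-validation on the training data
CV_rfc = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
CV_rfc.fit(X_train, y_train)
print("Best parameters:", CV_rfc.best_params_)
print("Best cross-validation accuracy:", CV_rfc.best_score_)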
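With its default refit=True, GridSearchCV retrains the best combination on the whole training set and exposes it as best_estimator_, which should then be checked once on the held-out test set. For a single fixed configuration, cross_val_score gives a cross-validated estimate directly. The sketch below uses a deliberately smaller grid than the one above so that it runs quickly; the grid values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Small illustrative grid: 2 x 2 = 4 combinations, each scored with 5-fold CV
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Mean CV accuracy of best parameters:", round(search.best_score_, 3))

# best_estimator_ was refit on the full training set; evaluate it once on the test set
test_accuracy = accuracy_score(y_test, search.best_estimator_.predict(X_test))
print("Test accuracy:", round(test_accuracy, 3))

# cross_val_score: 5-fold CV estimate for one fixed configuration
fold_scores = cross_val_score(RandomForestClassifier(random_state=42), X_train, y_train, cv=5)
print("Fold accuracies:", fold_scores.round(3), "mean:", round(fold_scores.mean(), 3))
```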

10. Save and Load Models

Save a trained model to disk with joblib so that it can be reused later without retraining. joblib is installed automatically as a scikit-learn dependency:

from joblib import dump, load

dump(model, 'model.joblib')            # write the fitted model to a file
loaded_model = load('model.joblib')    # load it back later
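A quick way to confirm that saving worked is to reload the file and check that it reproduces the original model's predictions. The sketch below writes to a temporary directory (an assumption, to avoid leaving files behind). Because joblib files are pickle-based, only load files from sources you trust, and where possible reload them with the same scikit-learn version that saved them.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    dump(model, path)           # serialize the fitted model to disk
    loaded_model = load(path)   # read it back into a new object

# The reloaded model should reproduce the original predictions exactly
same = np.array_equal(model.predict(X), loaded_model.predict(X))
print("Predictions match:", same)
```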

Additional Resources and Learning

Scikit-learn’s official documentation provides guides, tutorials, and examples for all of these steps and more, including detailed material on individual machine learning methods, data transformations, model evaluation strategies, and more advanced topics such as building pipelines and working with text data.
