Ensure Scikit-learn is installed in your Python environment:
pip install scikit-learn
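You can confirm the installation succeeded by printing the installed version from Python:
import sklearn
print(sklearn.__version__)  # prints the installed Scikit-learn version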
Start your script by importing the necessary classes and functions:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
Scikit-learn includes several built-in datasets that make it easy to get started quickly:
data = load_iris()
X, y = data.data, data.target
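Before modelling, it is worth taking a quick look at what was loaded; for example:
print(data.feature_names)   # four measurements per flower
print(data.target_names)    # the three iris species
print(X.shape, y.shape)     # (150, 4) and (150,)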
Divide the data into training and testing sets:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
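If your classes are imbalanced, you can also pass stratify=y so that the class proportions are preserved in both splits; a minimal variation:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y  # keep class proportions identical in both splits
)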
Select a model that suits your problem. For example, a classifier for a classification task:
model = RandomForestClassifier(n_estimators=100)
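Because Scikit-learn estimators share the same fit/predict interface, trying a different model only requires changing one line; purely as an illustration, a logistic regression could be substituted like this:
from sklearn.linear_model import LogisticRegression
alternative_model = LogisticRegression(max_iter=200)  # illustrative alternative; max_iter raised to help convergence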
Fit the model to your training data using the fit method:
model.fit(X_train, y_train)
Use the trained model to make predictions on new data:
predictions = model.predict(X_test)
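If you also need class probabilities rather than hard labels, RandomForestClassifier provides predict_proba:
probabilities = model.predict_proba(X_test)  # one column per class; each row sums to 1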
Assess the performance of the model using appropriate metrics:
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)
Improve your model by tuning hyperparameters and using cross-validation:
from sklearn.model_selection import GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_features': ['sqrt', 'log2'],  # 'auto' was removed in recent Scikit-learn releases
    'max_depth': [4, 5, 6, 7, 8],
    'criterion': ['gini', 'entropy']
}
CV_rfc = GridSearchCV(estimator=model, param_grid=param_grid, cv=5)
CV_rfc.fit(X_train, y_train)
print("Best parameters:", CV_rfc.best_params_)
You can save your trained models using joblib for later use:
from joblib import dump, load
dump(model, 'model.joblib')
loaded_model = load('model.joblib')
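As a quick sanity check, the reloaded model should reproduce the original model's predictions; for example:
print(loaded_model.predict(X_test[:5]))  # same outputs as model.predict(X_test[:5])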
Scikit-learn’s official documentation provides comprehensive guides, tutorials, and examples for all of these steps and more, including detailed tutorials on different machine learning methods, data transformations, model evaluation strategies, and more advanced topics such as building pipelines and working with text data.
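As a small taste of pipelines and cross-validation, here is a minimal sketch that chains a (purely illustrative) scaling step with the classifier and scores it with 5-fold cross-validation:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

pipeline = Pipeline([
    ('scaler', StandardScaler()),                      # optional for tree-based models; included for illustration
    ('clf', RandomForestClassifier(n_estimators=100))  # final estimator in the pipeline
])
scores = cross_val_score(pipeline, X, y, cv=5)         # 5-fold cross-validated accuracy
print("Cross-validated accuracy:", scores.mean())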