Deploying a machine learning model involves taking the model you’ve trained and making it available in a production environment where it can provide predictions on new data. This is the stage where the model starts delivering value by automating decisions, enhancing user experiences, or enabling new capabilities. Here’s a detailed look at the deployment process:
1. Model Validation
Before deployment, validate the model thoroughly using unseen test data to ensure that it performs as expected in real-world scenarios. This involves re-confirming that the model’s predictions are accurate and reliable.
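As a minimal sketch of this step (the model and threshold below are placeholders, not a real trained estimator), validation amounts to scoring held-out examples the model never saw during training:

```python
# Placeholder "model": a real deployment would load a trained estimator.
def predict(features):
    return 1 if sum(features) > 12 else 0

# Held-out test data: (features, true label) pairs not used in training.
test_data = [
    ([5.1, 3.5, 1.4, 0.2], 0),
    ([7.0, 3.2, 4.7, 1.4], 1),
]

correct = sum(predict(x) == y for x, y in test_data)
accuracy = correct / len(test_data)
print(accuracy)  # 1.0
```

If accuracy (or whatever metric matters for the application) falls short of the threshold agreed on with stakeholders, the model goes back for more training rather than out to production.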
2. Prepare the Model for Deployment
This step includes:
- Serialization: Saving the model in a file format suitable for deployment. Common choices include pickle or joblib in Python, HDF5 for large numerical models, and ONNX for a cross-platform approach designed to work across different machine learning frameworks.
- Version Control: Keep track of different versions of your model, similar to how code is versioned. This allows you to roll back to a previous version if needed.
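The two points above can be combined in a small sketch: serialize the model with pickle under a versioned filename so that any previous version can be restored later. The version string and the model object here are placeholders.

```python
import pickle
import tempfile
from pathlib import Path

MODEL_VERSION = "1.0.0"          # assumed versioning scheme
model = {"weights": [0.1, 0.2]}  # placeholder for a trained model object

# Write the model to a versioned file so older versions remain available.
out_dir = Path(tempfile.mkdtemp())
path = out_dir / f"model-v{MODEL_VERSION}.pkl"
with path.open("wb") as f:
    pickle.dump(model, f)

# Loading the file back returns an equivalent object.
with path.open("rb") as f:
    restored = pickle.load(f)
print(restored)  # {'weights': [0.1, 0.2]}
```

In practice the versioned artifacts would live in object storage or a model registry rather than a temporary directory, but the naming discipline is the same.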
3. Choose a Deployment Platform
The choice of platform depends on your specific needs, such as expected load, latency requirements, and scalability:
- On-premises: Deploying on your own hardware. This approach offers control over the infrastructure but requires significant maintenance and upfront investment.
- Cloud-based Platforms: Services like AWS SageMaker, Google AI Platform, or Microsoft Azure ML provide robust environments specifically designed for deploying machine learning models. These platforms offer scalability, ease of use, and often include tools for monitoring and maintenance.
- Edge Devices: For applications needing real-time decision making with low latency (like IoT devices), deploying directly on the edge device might be necessary.
4. Integration
Integrate the model into the existing software infrastructure. This might involve:
- APIs: Creating an API around the model so that other software systems can query it and receive predictions.
- Microservices: Deploying the model as a standalone service that communicates with other parts of your application through a lightweight, well-defined interface.
5. Monitoring and Maintenance
After deployment, it’s crucial to monitor the model to ensure it continues to perform well:
- Performance Monitoring: Track metrics like latency and throughput to ensure the model is meeting its performance requirements.
- Model Drift Monitoring: Over time, the data the model receives may change (this is known as data drift), or the model’s performance may degrade (model drift). It’s essential to monitor for these changes and update the model as needed.
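A toy sketch of data-drift detection, under the assumption that the mean and standard deviation of a feature were recorded at training time (all values below are made up for illustration):

```python
# Baseline statistics recorded at training time (assumed values).
train_mean, train_std = 5.0, 0.5

# Feature values observed in production.
live_values = [8.1, 8.3, 7.9, 8.0]

live_mean = sum(live_values) / len(live_values)
# Flag drift when the live mean shifts more than 3 training standard
# deviations away from the training mean.
drifted = abs(live_mean - train_mean) > 3 * train_std
print(drifted)  # True
```

Production systems typically use more robust statistics (population stability index, KS tests) per feature, but the principle is the same: compare live inputs against a stored training baseline and alert when they diverge.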
6. Continuous Improvement
Based on feedback and ongoing monitoring results, the model may need periodic updates. This involves:
- Retraining: Updating the model with new data.
- A/B Testing: Testing multiple models or model versions to determine which performs best.
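One common way to run such a test is to route a fixed share of traffic to the candidate model using a stable hash of the user id, so each user consistently sees the same version. This is a hedged sketch; the function name and 10% split are illustrative choices, not a standard API.

```python
import hashlib

def choose_model(user_id: str, candidate_share: float = 0.1) -> str:
    # Hash the user id into a stable bucket in [0, 1); users that fall
    # below the threshold are routed to the candidate model.
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    bucket = (digest % 10_000) / 10_000
    return "candidate" if bucket < candidate_share else "baseline"

# The split is deterministic: a given user always sees the same model.
print(choose_model("user-42") == choose_model("user-42"))  # True
```

Logging which variant served each prediction then lets you compare the two models' live metrics before promoting the winner.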
Example: Deploying a Flask API for a Model
Here’s a simple example of deploying a machine learning model using Flask, a lightweight web framework in Python:
```python
from flask import Flask, request, jsonify
import pickle

# Load the trained model (ensure it is saved as 'model.pkl' in the project directory)
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    prediction = model.predict([data['features']])
    # .tolist() converts NumPy types to plain Python types so jsonify can serialize them
    return jsonify({'prediction': prediction.tolist()})

if __name__ == '__main__':
    app.run(port=5000)
```
Code Explanation
Importing Libraries
- Flask: A lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications.
- request: This object from Flask is used to handle incoming request data.
- jsonify: A helper function in Flask used to convert the data to JSON format, which is more accessible and readable over the web.
- pickle: A Python module used for serializing and deserializing a Python object structure, which in this case is used to load a pre-trained machine learning model.
Loading the Trained Model
- The model is deserialized using Python's `pickle` module from a file called `model.pkl`, which must be present in the project directory.
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
Setting up Flask
- An instance of the Flask class is created. `__name__` is a special Python variable holding the module's name, which Flask uses to locate resources correctly.
app = Flask(__name__)
Defining the Prediction Route
- The `@app.route` decorator binds the function to a URL. Here it defines the `/predict` endpoint, which listens for POST requests; this is the endpoint to which data for prediction will be sent.
@app.route('/predict', methods=['POST'])
Predict Function
- The `predict()` function handles the incoming requests. It:
  - Extracts JSON data from the incoming POST request using `request.get_json()`.
  - Passes the extracted features to the model, which makes a prediction.
  - Converts the prediction into a plain Python list (to make it JSON-serializable) and returns it as a JSON response.
def predict():
    data = request.get_json()
    prediction = model.predict([data['features']])
    return jsonify({'prediction': prediction.tolist()})
Main Block
- The main block checks whether the script is being executed directly (not imported). If so, it calls `app.run()` to start the server; `port=5000` makes the Flask server listen on port 5000.
if __name__ == '__main__':
    app.run(port=5000)
Running the Server
- When the script is run, Flask starts the web server, making it possible to send HTTP POST requests with JSON data to `http://localhost:5000/predict` to get predictions.
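You can also exercise the endpoint without starting a real server or a separate HTTP client by using Flask's built-in test client. The sketch below substitutes a dummy model for the pickled one so it is self-contained; everything else mirrors the app above.

```python
from flask import Flask, request, jsonify

# Hypothetical stand-in for the pickled estimator, so this sketch runs
# without a 'model.pkl' file on disk.
class DummyModel:
    def predict(self, rows):
        return [0 for _ in rows]

model = DummyModel()
app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    prediction = model.predict([data['features']])
    return jsonify({'prediction': list(prediction)})

# Flask's test client sends requests to the app without opening a port.
with app.test_client() as client:
    resp = client.post('/predict', json={'features': [5.1, 3.5, 1.4, 0.2]})
    print(resp.get_json())  # {'prediction': [0]}
```

The same pattern is handy in automated tests for the deployed service: each test posts a known payload and asserts on the JSON that comes back.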
Output
- Web Server Start: When the script is executed, it starts a Flask web server on localhost at port 5000. This server waits for incoming connections.
- Receiving a POST Request: The server expects POST requests at the `/predict` endpoint. Each request should include a JSON object in the body with a key named `features`, whose value is the array of inputs the machine learning model requires.
- Processing and Response: Upon receiving a POST request, the `predict()` function:
  - Extracts the `features` array from the JSON data sent to the server.
  - Passes these features to the loaded machine learning model, which predicts an outcome.
  - Returns the prediction to the client in JSON format.
- Example of a POST Request and Response:
  - Request: A client sends a POST request with JSON content such as `{"features": [5.1, 3.5, 1.4, 0.2]}`.
  - Response: The server runs this input through the model and returns the prediction as JSON, for instance `{"prediction": [0]}` if the model predicts class 0 for these features.
Output Example
Here is a more detailed illustration, assuming the server is running and you make a POST request with specific input features:
- If you send a request with iris flower measurements, the server will respond with the predicted iris species encoded as an integer (based on the model’s training, where each integer corresponds to a specific species).
The actual content of the output will depend on the specifics of the model and the data it was trained on.