Stock Price Prediction using Machine Learning

Objective:

Automate the process of fetching historical stock price data.
Preprocess the data for model training.
Train a machine learning model to predict future stock prices.
Evaluate the model’s performance.
Save the model and metrics.
Generate and send email reports to stakeholders.

Step-by-Step Solution

Setup Environment:
- Install the third-party libraries: requests, pandas, scikit-learn, joblib. The smtplib, email, and logging modules ship with Python and need no installation.
Data Fetching:
- Define functions to fetch historical stock price data from an API.
Data Preprocessing:
- Use pandas to clean and preprocess the data for machine learning.
Model Training:
- Use scikit-learn to train a machine learning model on the preprocessed data.
Model Evaluation:
- Evaluate the model’s performance using appropriate metrics.
Model and Metrics Saving:
- Save the trained model using joblib and write the evaluation metrics to a text file.
Report Generation:
- Compile the evaluation metrics into an email report.
Automated Emailing:
- Send the generated report to stakeholders using smtplib.
Error Handling and Logging:
- Implement robust error handling and logging throughout the process.

Code Implementation

Step 1: Setting Up the Environment

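First, we install the third-party libraries using pip:

pip install requests pandas scikit-learn joblib

requests makes the HTTP requests, pandas handles data manipulation, scikit-learn provides the machine learning model, and joblib serializes it. The smtplib, email, and logging modules used for sending emails and logging are part of the Python standard library, so they need no installation.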

Step 2: Data Fetching

We fetch historical stock price data from an API:

import requests
import logging
import pandas as pd

# Configure logging
logging.basicConfig(filename='stock_prediction.log', level=logging.INFO,
                    format='%(asctime)s:%(levelname)s:%(message)s')

def fetch_stock_data(api_url):
    try:
        # Set a timeout so a stalled connection cannot hang the script indefinitely
        response = requests.get(api_url, timeout=10)
        response.raise_for_status()
        stock_data = response.json()
        logging.info("Stock data fetched successfully.")
        return stock_data
    except requests.exceptions.RequestException as e:
        logging.error(f"Error fetching stock data: {e}")
        return None

api_url = "https://api.example.com/stock/AAPL"
stock_data = fetch_stock_data(api_url)

Logging Configuration: We configure logging to record the process and errors.
Fetching Data: The fetch_stock_data function makes a GET request to the provided API URL. On success, it logs a message and returns the response body parsed from JSON into Python objects. If the request fails, it logs the error and returns None.
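
The lesson does not show what the API returns, but the preprocessing step that follows reads a date and a close field from each record. A minimal sketch of a shape check that could run before preprocessing, assuming the payload is a list of dictionaries. The helper name validate_stock_payload and the sample values are hypothetical, not part of the original script:

```python
# Assumed payload shape: a list of records, each with "date" and "close".
# A real API may differ; adjust REQUIRED_FIELDS to match it.
REQUIRED_FIELDS = {"date", "close"}

def validate_stock_payload(payload):
    """Return True if payload is a non-empty list of dicts with the required fields."""
    if not isinstance(payload, list) or not payload:
        return False
    return all(
        isinstance(row, dict) and REQUIRED_FIELDS.issubset(row)
        for row in payload
    )

# Made-up sample records, for illustration only
sample_payload = [
    {"date": "2024-01-02", "close": 100.0},
    {"date": "2024-01-03", "close": 101.5},
]

print(validate_stock_payload(sample_payload))
print(validate_stock_payload(None))
```

Checking the shape early turns a confusing KeyError inside pandas into an explicit decision the caller can log.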

Step 3: Data Preprocessing

We preprocess the fetched stock price data:

def preprocess_stock_data(stock_data):
    df = pd.DataFrame(stock_data)
    
    # Convert date column to datetime
    df['date'] = pd.to_datetime(df['date'])
    
    # Sort by date
    df = df.sort_values('date')
    
    # Feature engineering: create lag features
    df['price_lag_1'] = df['close'].shift(1)
    df['price_lag_2'] = df['close'].shift(2)
    df['price_lag_3'] = df['close'].shift(3)
    
    # Drop rows with NaN values
    df = df.dropna()
    
    # Split into features and target
    X = df[['price_lag_1', 'price_lag_2', 'price_lag_3']]
    y = df['close']
    
    logging.info("Stock data preprocessed successfully.")
    return X, y

X, y = preprocess_stock_data(stock_data)

DataFrame Creation: We use pandas to create a DataFrame from the stock data.
Datetime Conversion: We convert the date column to datetime format for proper sorting and handling.
Sorting Data: We sort the DataFrame by date.
Feature Engineering: We create lag features (price_lag_1, price_lag_2, price_lag_3) to use as inputs for the model.
Dropping NaN Values: We drop rows with NaN values resulting from the lag feature creation.
Splitting Data: We split the data into features (X) and target (y).
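
To see what the lag features look like, here is a small worked example on five made-up closing prices (the values are illustrative, not real market data). Each price_lag_n column holds the close from n rows earlier, so the first three rows contain NaN and are dropped:

```python
import pandas as pd

# Made-up closing prices, purely for illustration
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]
    ),
    "close": [100.0, 101.0, 103.0, 102.0, 105.0],
})

# shift(n) moves the column down by n rows, so each row sees the close from n rows before
for lag in (1, 2, 3):
    df[f"price_lag_{lag}"] = df["close"].shift(lag)

# The first three rows lack at least one lag value and are removed
lagged = df.dropna().reset_index(drop=True)
print(lagged)
```

Only two of the five rows survive, which is why the dataset needs noticeably more rows than the largest lag.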

Step 4: Model Training

We train a machine learning model on the preprocessed data:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def train_model(X, y):
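    # Split the data into training and test sets. shuffle=False keeps the rows in
    # date order, so the most recent 20% is held out and the model is not
    # evaluated on dates that precede its training data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)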
    
    # Initialize and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Make predictions on the test set
    y_pred = model.predict(X_test)
    
    # Calculate mean squared error
    mse = mean_squared_error(y_test, y_pred)
    
    logging.info("Model trained successfully.")
    return model, mse

model, mse = train_model(X, y)

Train-Test Split: We split the data into training and test sets.
Model Initialization: We initialize a Linear Regression model.
Model Training: We train the model on the training data.
Predictions: We make predictions on the test data.
Evaluation: We calculate the mean squared error (MSE) to evaluate the model’s performance.
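
An MSE value is hard to judge on its own, because it is expressed in squared price units. One way to put it in context, sketched below with made-up numbers, is to take its square root (RMSE, in price units) and to compare it with a naive baseline that predicts the previous day's close. The arrays are hypothetical stand-ins for y_test, the model's predictions, and the price_lag_1 column:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values, for illustration only
y_test = np.array([102.0, 105.0, 104.0])
y_pred = np.array([101.0, 104.0, 106.0])       # hypothetical model predictions
price_lag_1 = np.array([103.0, 102.0, 105.0])  # previous close for each test row

mse = mean_squared_error(y_test, y_pred)
rmse = float(np.sqrt(mse))  # same units as the price itself

# Naive baseline: predict that the next close equals the previous close
baseline_mse = mean_squared_error(y_test, price_lag_1)

print(f"Model MSE: {mse:.4f}, RMSE: {rmse:.4f}")
print(f"Baseline MSE: {baseline_mse:.4f}")
```

If the model's error is not clearly below the baseline's, the lag features are adding little beyond what the last known price already says.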

Step 5: Model and Metrics Saving

We save the trained model and evaluation metrics:

import joblib

def save_model_and_metrics(model, mse):
    # Save the model to a file
    joblib.dump(model, 'stock_model.pkl')
    
    # Save the metrics to a text file
    with open('model_metrics.txt', 'w') as f:
        f.write(f"Mean Squared Error: {mse}")
    
    logging.info("Model and metrics saved successfully.")

save_model_and_metrics(model, mse)

Model Saving: We save the trained model using joblib.
Metrics Saving: We save the MSE to a text file.
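
A saved model is only useful if it can be loaded again. The sketch below round-trips a small model through joblib and confirms that the restored copy gives the same predictions. The training values are made up, and the file is written to a temporary directory rather than the stock_model.pkl path used above:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up training set: three lag features per row, target is the next close
X = np.array([
    [103.0, 101.0, 100.0],
    [102.0, 103.0, 101.0],
    [105.0, 102.0, 103.0],
    [104.0, 105.0, 102.0],
])
y = np.array([102.0, 105.0, 104.0, 106.0])
model = LinearRegression().fit(X, y)

# Save to a temporary location, then load the model back
model_path = os.path.join(tempfile.mkdtemp(), "stock_model.pkl")
joblib.dump(model, model_path)
restored = joblib.load(model_path)

# The restored model should reproduce the original predictions
same = np.allclose(model.predict(X), restored.predict(X))
print(same)
```

joblib files are pickle-based, so joblib.load should only be used on files from a trusted source.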

Step 6: Report Generation

We generate a report based on the evaluation metrics:

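import textwrap

def generate_report(mse):
    # Compile the report content; dedent and strip remove the indentation and
    # blank lines that the triple-quoted string would otherwise carry into the email
    report_content = textwrap.dedent(f"""
        Stock Price Prediction Model Performance

        Mean Squared Error: {mse}
    """).strip()
    
    logging.info("Report generated successfully.")
    return report_content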

report_content = generate_report(mse)

Report Content: We compile the evaluation metrics into a textual report.

Step 7: Automated Emailing

We send the generated report via email:

import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_email_report(report_content, to_email):
    from_email = "youremail@example.com"
    subject = "Stock Price Prediction Model Performance"
    body = report_content
    
    # Create a multipart email message
    msg = MIMEMultipart()
    msg['From'] = from_email
    msg['To'] = to_email
    msg['Subject'] = subject
    
    # Attach the report content to the email
    msg.attach(MIMEText(body, 'plain'))
    
    try:
        # Connect to the SMTP server, log in, and send the email.
        # The with block closes the connection even if a step fails.
        with smtplib.SMTP('smtp.example.com', 587) as server:
            server.starttls()
            # "yourpassword" is a placeholder; keep real credentials out of source code
            server.login(from_email, "yourpassword")
            server.send_message(msg)
        
        logging.info("Email report sent successfully.")
    except Exception as e:
        logging.error(f"Error sending email report: {e}")

to_email = "stakeholder@example.com"
send_email_report(report_content, to_email)

Email Composition: We create a multipart email message and attach the report content.
SMTP Connection: We connect to the SMTP server, log in, and send the email.
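
Building the message separately from sending it makes the email easier to check without an SMTP server. The sketch below is one possible arrangement, not the lesson's own code: it uses the standard library's EmailMessage class, and both the helper name build_report_message and the SMTP_PASSWORD environment variable are assumptions introduced for illustration:

```python
import os
from email.message import EmailMessage

def build_report_message(report_content, from_email, to_email):
    """Build a plain-text report email without sending it."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = "Stock Price Prediction Model Performance"
    msg.set_content(report_content)
    return msg

# Read the password from the environment instead of hard-coding it.
# SMTP_PASSWORD is an assumed variable name; get() returns None if it is unset.
smtp_password = os.environ.get("SMTP_PASSWORD")

message = build_report_message(
    "Mean Squared Error: 2.0",  # made-up report text
    "youremail@example.com",
    "stakeholder@example.com",
)
print(message["Subject"])
```

The resulting object can be passed to server.send_message in the same way as the MIMEMultipart message above.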

Step 8: Full Script Execution

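The full script integrates all the steps into one workflow, from data fetching to reporting. When assembling it, keep the api_url and to_email definitions but remove the other module-level calls shown after each function in the earlier steps, so that each step runs only once, from main: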

def main():
    # Fetch stock data
    stock_data = fetch_stock_data(api_url)
    
    if stock_data:
        # Preprocess stock data
        X, y = preprocess_stock_data(stock_data)
        
        # Train model
        model, mse = train_model(X, y)
        
        # Save model and metrics
        save_model_and_metrics(model, mse)
        
        # Generate report
        report_content = generate_report(mse)
        
        # Send email report
        send_email_report(report_content, to_email)
    
if __name__ == '__main__':
    main()
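
In the steps above, only fetching and emailing catch exceptions, so a failure in preprocessing or training would stop the script without a log entry. One possible way to cover every step is sketched below, using a hypothetical run_step helper that is not part of the original script. The built-in float function stands in for a pipeline step:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s:%(levelname)s:%(message)s")

def run_step(name, func, *args):
    """Run one pipeline step; log the traceback and return None if it fails."""
    try:
        result = func(*args)
        logging.info("%s succeeded.", name)
        return result
    except Exception:
        # logging.exception records the message together with the full traceback
        logging.exception("%s failed.", name)
        return None

# float() stands in for a real step such as preprocess_stock_data
ok = run_step("parse", float, "3.5")
failed = run_step("parse", float, "not a number")
print(ok, failed)
```

Returning None mirrors what fetch_stock_data already does, so main can stop early with the same kind of check it uses for stock_data.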

Summary

Data Fetching: Uses requests to fetch historical stock price data from an API.
Data Preprocessing: Uses pandas to clean and preprocess the data, creating lag features.
Model Training: Uses scikit-learn to train a Linear Regression model.
Model Evaluation: Uses mean squared error to evaluate the model’s performance.
Model and Metrics Saving: Saves the trained model using joblib and writes the evaluation metrics to a text file.
Report Generation: Compiles the evaluation metrics into an email report.
Automated Emailing: Uses smtplib to send the report via email.
Error Handling and Logging: Logs each step to stock_prediction.log and catches errors during data fetching and email sending.
