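Objective:
Libraries: requests, pandas, scikit-learn, joblib, smtplib, email, logging.
- Use requests to fetch historical stock price data from an API.
- Use pandas to clean and preprocess the data for machine learning.
- Use scikit-learn to train a machine learning model on the preprocessed data.
- Use joblib to save the trained model.
- Use smtplib to send the report via email.
First, we install the third-party libraries using pip:
pip install requests pandas scikit-learn joblib
These libraries are used for making HTTP requests, data manipulation, machine learning, and model serialization. The smtplib, email, and logging modules, used for sending emails and logging, are part of the Python standard library and do not need to be installed.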
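As a quick sanity check after installation, the sketch below reports which modules cannot be found in the current environment. It is only an illustration: the helper name missing_modules is hypothetical, and the lists use import names, which is why scikit-learn appears as sklearn.

```python
import importlib.util

# Import names, not pip package names: scikit-learn is imported as "sklearn"
THIRD_PARTY = ["requests", "pandas", "sklearn", "joblib"]
# These ship with Python itself and are never installed with pip
STANDARD_LIBRARY = ["smtplib", "email", "logging"]


def missing_modules(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing third-party modules:", missing_modules(THIRD_PARTY))
print("Missing standard-library modules:", missing_modules(STANDARD_LIBRARY))
```

An empty list for the third-party modules means the pip step succeeded for this interpreter.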
We fetch historical stock price data from an API:
import requests
import logging
import pandas as pd

# Configure logging
logging.basicConfig(filename='stock_prediction.log', level=logging.INFO,
                    format='%(asctime)s:%(levelname)s:%(message)s')

def fetch_stock_data(api_url):
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(api_url, timeout=10)
        response.raise_for_status()
        stock_data = response.json()
        logging.info("Stock data fetched successfully.")
        return stock_data
    except requests.exceptions.RequestException as e:
        logging.error(f"Error fetching stock data: {e}")
        return None

api_url = "https://api.example.com/stock/AAPL"
stock_data = fetch_stock_data(api_url)
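The preprocessing step that follows reads a date and a close field from every record, but the shape of the API response is not specified here. A minimal sketch of a sanity check, assuming the payload is a list of dictionaries; the function name validate_records and the sample record are hypothetical:

```python
def validate_records(stock_data, required=("date", "close")):
    """Return True if stock_data is a non-empty list of dicts that all
    contain the required keys, False otherwise."""
    if not isinstance(stock_data, list) or not stock_data:
        return False
    return all(
        isinstance(record, dict) and all(key in record for key in required)
        for record in stock_data
    )


# Made-up record, used only to exercise the check
sample = [{"date": "2024-01-02", "close": 185.64}]
print(validate_records(sample))   # well-formed payload
print(validate_records(None))     # what fetch_stock_data returns on failure
```

Running a check like this before preprocessing turns a confusing KeyError later on into an explicit, loggable condition.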
The fetch_stock_data function makes a GET request to the provided API URL. If the request succeeds, it logs a success message and returns the parsed JSON data. If the request fails, it logs the error and returns None.
We preprocess the fetched stock price data:
def preprocess_stock_data(stock_data):
    df = pd.DataFrame(stock_data)
    # Convert date column to datetime
    df['date'] = pd.to_datetime(df['date'])
    # Sort by date
    df = df.sort_values('date')
    # Feature engineering: create lag features
    df['price_lag_1'] = df['close'].shift(1)
    df['price_lag_2'] = df['close'].shift(2)
    df['price_lag_3'] = df['close'].shift(3)
    # Drop rows with NaN values
    df = df.dropna()
    # Split into features and target
    X = df[['price_lag_1', 'price_lag_2', 'price_lag_3']]
    y = df['close']
    logging.info("Stock data preprocessed successfully.")
    return X, y

X, y = preprocess_stock_data(stock_data)
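To see what the lag features look like, here is a small worked example on made-up closing prices (the numbers are illustrative, not real data). Each shift(k) moves the close column down by k rows, so the first three rows have missing lags and are dropped:

```python
import pandas as pd

# Five made-up closing prices, already in date order
closes = pd.DataFrame({"close": [10.0, 11.0, 12.0, 13.0, 14.0]})

# Same lag construction as in preprocess_stock_data
for lag in (1, 2, 3):
    closes[f"price_lag_{lag}"] = closes["close"].shift(lag)

# Rows 0-2 lack at least one lag value, so only rows 3 and 4 survive
supervised = closes.dropna()
print(supervised)
```

In the first surviving row the target close is 13.0 and the three lags are 12.0, 11.0, and 10.0, that is, the three preceding closes.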
- We use pandas to create a DataFrame from the stock data.
- We create lag features (price_lag_1, price_lag_2, price_lag_3) to use as inputs for the model.
- We split the data into features (X) and target (y).
We train a machine learning model on the preprocessed data:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def train_model(X, y):
    # Split the data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    # Initialize and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)
    # Make predictions on the test set
    y_pred = model.predict(X_test)
    # Calculate mean squared error
    mse = mean_squared_error(y_test, y_pred)
    logging.info("Model trained successfully.")
    return model, mse

model, mse = train_model(X, y)
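Because train_test_split shuffles rows by default, the test set above can contain days that precede some of the training days. For time-ordered data, one common alternative is to hold out the most recent rows instead. A minimal sketch, assuming the rows are already sorted by date; chronological_split is a hypothetical helper, and the toy lists stand in for X and y:

```python
from sklearn.model_selection import train_test_split


def chronological_split(X, y, test_size=0.2):
    """Keep row order: the first rows train the model, the last rows test it."""
    return train_test_split(X, y, test_size=test_size, shuffle=False)


# Toy data standing in for date-sorted features and targets
X_demo = [[i] for i in range(10)]
y_demo = list(range(10))

X_train, X_test, y_train, y_test = chronological_split(X_demo, y_demo)
print("train targets:", list(y_train))
print("test targets:", list(y_test))
```

With shuffle=False the last 20% of rows form the test set, so the error is measured on data that comes strictly after everything the model was trained on.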
We save the trained model and evaluation metrics:
import joblib

def save_model_and_metrics(model, mse):
    # Save the model to a file
    joblib.dump(model, 'stock_model.pkl')
    # Save the metrics to a text file
    with open('model_metrics.txt', 'w') as f:
        f.write(f"Mean Squared Error: {mse}")
    logging.info("Model and metrics saved successfully.")

save_model_and_metrics(model, mse)
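A saved model is only useful if it can be loaded again. The sketch below shows the round trip with joblib.load on a toy model; the training numbers are made up, and a temporary directory is used so the example does not overwrite a real stock_model.pkl:

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LinearRegression

# Toy data: each row holds three lagged prices, the target is the next price
X_demo = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0], [4.0, 5.0, 6.0]]
y_demo = [4.0, 5.0, 6.0, 7.0]
demo_model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "stock_model.pkl")
    joblib.dump(demo_model, model_path)    # same call as in save_model_and_metrics
    restored = joblib.load(model_path)     # rebuild the fitted estimator from disk

original_pred = demo_model.predict([[5.0, 6.0, 7.0]])
restored_pred = restored.predict([[5.0, 6.0, 7.0]])
print(original_pred, restored_pred)
```

The restored estimator gives the same prediction as the original. Since joblib files are pickle-based, they should only be loaded from sources you trust.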
We save the trained model to a file using joblib.
We generate a report based on the evaluation metrics:
def generate_report(mse):
    # Compile the report content
    report_content = f"""
    Stock Price Prediction Model Performance
    Mean Squared Error: {mse}
    """
    logging.info("Report generated successfully.")
    return report_content

report_content = generate_report(mse)
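Mean squared error is expressed in squared price units, which can be hard for a reader of the report to interpret. One possible extension is to also report the root mean squared error, which is in the same units as the price. A small sketch; generate_report_with_rmse is a hypothetical variant, not part of the script above:

```python
import math


def generate_report_with_rmse(mse):
    """Build a plain-text report containing both MSE and its square root."""
    rmse = math.sqrt(mse)
    lines = [
        "Stock Price Prediction Model Performance",
        f"Mean Squared Error: {mse:.4f}",
        f"Root Mean Squared Error: {rmse:.4f}",
    ]
    return "\n".join(lines)


# Example with a made-up error value
print(generate_report_with_rmse(4.0))
```

For an MSE of 4.0 the report shows an RMSE of 2.0000, meaning predictions are off by roughly two price units in the root-mean-square sense.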
We send the generated report via email:
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_email_report(report_content, to_email):
    from_email = "youremail@example.com"
    subject = "Stock Price Prediction Model Performance"
    body = report_content
    # Create a multipart email message
    msg = MIMEMultipart()
    msg['From'] = from_email
    msg['To'] = to_email
    msg['Subject'] = subject
    # Attach the report content to the email
    msg.attach(MIMEText(body, 'plain'))
    try:
        # Connect to the SMTP server, login, and send the email.
        # The with-block closes the connection even if a step fails.
        with smtplib.SMTP('smtp.example.com', 587) as server:
            server.starttls()
            server.login(from_email, "yourpassword")
            server.send_message(msg)
        logging.info("Email report sent successfully.")
    except Exception as e:
        logging.error(f"Error sending email report: {e}")

to_email = "stakeholder@example.com"
send_email_report(report_content, to_email)
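Hard-coding the password in the script is convenient for a demo but risky in practice. One common approach is to read the SMTP settings from environment variables. The sketch below assumes variable names (SMTP_HOST, SMTP_PORT, SMTP_USER, SMTP_PASSWORD) that are purely hypothetical, and it builds the message with the standard library's EmailMessage class as an alternative to MIMEMultipart:

```python
import os
from email.message import EmailMessage


def load_smtp_settings(env=os.environ):
    """Read SMTP settings from a mapping; the variable names are assumptions."""
    return {
        "host": env.get("SMTP_HOST", "smtp.example.com"),
        "port": int(env.get("SMTP_PORT", "587")),
        "user": env["SMTP_USER"],          # raises KeyError if not set
        "password": env["SMTP_PASSWORD"],  # raises KeyError if not set
    }


def build_report_message(report_content, to_email, from_email):
    """Create a plain-text email carrying the report."""
    msg = EmailMessage()
    msg["From"] = from_email
    msg["To"] = to_email
    msg["Subject"] = "Stock Price Prediction Model Performance"
    msg.set_content(report_content)
    return msg


# Demo with an explicit mapping instead of the real environment
settings = load_smtp_settings({"SMTP_USER": "youremail@example.com",
                               "SMTP_PASSWORD": "not-a-real-password"})
message = build_report_message("Mean Squared Error: 1.0",
                               "stakeholder@example.com", settings["user"])
print(message["Subject"], settings["host"], settings["port"])
```

The resulting message can be passed to server.send_message in the same way as the MIMEMultipart message, with settings["user"] and settings["password"] supplied to server.login.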
The full script chains all the steps, from data fetching to reporting, in a single main function:
def main():
    # Fetch stock data
    stock_data = fetch_stock_data(api_url)
    if stock_data:
        # Preprocess stock data
        X, y = preprocess_stock_data(stock_data)
        # Train model
        model, mse = train_model(X, y)
        # Save model and metrics
        save_model_and_metrics(model, mse)
        # Generate report
        report_content = generate_report(mse)
        # Send email report
        send_email_report(report_content, to_email)

if __name__ == '__main__':
    main()
Summary:
- Use requests to fetch historical stock price data from an API.
- Use pandas to clean and preprocess the data, creating lag features.
- Use scikit-learn to train a Linear Regression model.
- Use joblib to save the trained model.
- Use smtplib to send the report via email.