In this example, we’ll consider a retail company that wants to perform a comprehensive sales analysis using Python. The objectives include:
We’ll break this down into several detailed steps:
We’ll start by loading sales data and performing necessary preprocessing steps.
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Load the sales data
file_path = '/mnt/data/sales_data.csv'
sales_data = pd.read_csv(file_path)
# Display basic information about the data
print(sales_data.info())
# Convert Date column to datetime
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
# Handle missing values, if any
sales_data = sales_data.dropna()
# Feature engineering: Extract year, month, and day from the Date
sales_data['Year'] = sales_data['Date'].dt.year
sales_data['Month'] = sales_data['Date'].dt.month
sales_data['Day'] = sales_data['Date'].dt.day
# Scale numerical features for clustering
scaler = StandardScaler()
sales_data[['Quantity', 'Price']] = scaler.fit_transform(sales_data[['Quantity', 'Price']])
# Display the first few rows of the preprocessed data
print(sales_data.head())
Explanation:
pandas.read_csv.Date column to datetime format for easier manipulation.Date column.Quantity and Price) using StandardScaler to prepare for clustering.We’ll use the Prophet library to forecast future sales.
from fbprophet import Prophet
# Prepare data for Prophet
sales_forecast_data = sales_data.groupby('Date')['Quantity'].sum().reset_index()
sales_forecast_data.columns = ['ds', 'y']
# Initialize and fit the Prophet model
model = Prophet()
model.fit(sales_forecast_data)
# Create a DataFrame with future dates
future_dates = model.make_future_dataframe(periods=365)
# Make predictions
forecast = model.predict(future_dates)
# Display forecasted data
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
Explanation:
Date and sum the Quantity to prepare it for Prophet.We’ll use K-Means clustering to segment customers based on their purchasing behavior.
from sklearn.cluster import KMeans
# Aggregate data by Customer ID for clustering
customer_data = sales_data.groupby('CustomerID').agg({
'Quantity': 'sum',
'Price': 'mean'
}).reset_index()
# Apply K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=0)
customer_data['Cluster'] = kmeans.fit_predict(customer_data[['Quantity', 'Price']])
# Display the clustered data
print(customer_data.head())
Explanation:
CustomerID to compute total Quantity purchased and average Price.Create an interactive dashboard using Plotly Dash to visualize the data and predictions.
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.graph_objs as go
# Initialize the Dash app
app = dash.Dash(__name__)
# Define the layout of the app
app.layout = html.Div([
html.H1("Retail Sales Dashboard"),
dcc.Graph(id='sales-forecast'),
dcc.Graph(id='customer-segmentation'),
])
# Define callback to update sales forecast graph
@app.callback(
Output('sales-forecast', 'figure'),
[Input('sales-forecast', 'id')]
)
def update_sales_forecast(_):
figure = go.Figure()
figure.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'], mode='lines', name='Forecast'))
figure.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat_lower'], mode='lines', name='Lower Bound', fill='tonexty'))
figure.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat_upper'], mode='lines', name='Upper Bound', fill='tonexty'))
figure.update_layout(title='Sales Forecast', xaxis_title='Date', yaxis_title='Sales')
return figure
# Define callback to update customer segmentation graph
@app.callback(
Output('customer-segmentation', 'figure'),
[Input('customer-segmentation', 'id')]
)
def update_customer_segmentation(_):
figure = go.Figure()
for cluster in customer_data['Cluster'].unique():
cluster_data = customer_data[customer_data['Cluster'] == cluster]
figure.add_trace(go.Scatter(x=cluster_data['Quantity'], y=cluster_data['Price'], mode='markers', name=f'Cluster {cluster}'))
figure.update_layout(title='Customer Segmentation', xaxis_title='Quantity', yaxis_title='Price')
return figure
# Run the app
if __name__ == '__main__':
app.run_server(debug=True)
Explanation:
This advanced scenario covers:
To test the entire scenario, you need to ensure you have the necessary libraries installed, the appropriate data available, and follow a structured testing approach. Here’s a step-by-step guide to help you:
Install Required Libraries: Make sure you have all the required Python libraries installed. You can install them using pip:
Prepare the Data: Ensure you have a sales data CSV file named sales_data.csv with the following columns:
Date: Date of the sale.CustomerID: Identifier for the customer.Quantity: Number of items sold.Price: Price of the item.Create a Python script (e.g., preprocess.py) to test data loading and preprocessing:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# Load the sales data
file_path = 'sales_data.csv'
sales_data = pd.read_csv(file_path)
# Display basic information about the data
print(sales_data.info())
# Convert Date column to datetime
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
# Handle missing values, if any
sales_data = sales_data.dropna()
# Feature engineering: Extract year, month, and day from the Date
sales_data['Year'] = sales_data['Date'].dt.year
sales_data['Month'] = sales_data['Date'].dt.month
sales_data['Day'] = sales_data['Date'].dt.day
# Scale numerical features for clustering
scaler = StandardScaler()
sales_data[['Quantity', 'Price']] = scaler.fit_transform(sales_data[['Quantity', 'Price']])
# Display the first few rows of the preprocessed data
print(sales_data.head())
Run this script to ensure the preprocessing works correctly.
Create a Python script (e.g., forecast.py) to test sales forecasting:
from fbprophet import Prophet
import pandas as pd
# Load preprocessed sales data
file_path = 'sales_data.csv'
sales_data = pd.read_csv(file_path)
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
# Prepare data for Prophet
sales_forecast_data = sales_data.groupby('Date')['Quantity'].sum().reset_index()
sales_forecast_data.columns = ['ds', 'y']
# Initialize and fit the Prophet model
model = Prophet()
model.fit(sales_forecast_data)
# Create a DataFrame with future dates
future_dates = model.make_future_dataframe(periods=365)
# Make predictions
forecast = model.predict(future_dates)
# Display forecasted data
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
Run this script to ensure the sales forecasting works correctly.
Create a Python script (e.g., segmentation.py) to test customer segmentation:
from sklearn.cluster import KMeans
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Load preprocessed sales data
file_path = 'sales_data.csv'
sales_data = pd.read_csv(file_path)
# Aggregate data by Customer ID for clustering
customer_data = sales_data.groupby('CustomerID').agg({
'Quantity': 'sum',
'Price': 'mean'
}).reset_index()
# Scale the data
scaler = StandardScaler()
customer_data[['Quantity', 'Price']] = scaler.fit_transform(customer_data[['Quantity', 'Price']])
# Apply K-Means clustering
kmeans = KMeans(n_clusters=4, random_state=0)
customer_data['Cluster'] = kmeans.fit_predict(customer_data[['Quantity', 'Price']])
# Display the clustered data
print(customer_data.head())
Run this script to ensure customer segmentation works correctly.
Create a Python script (e.g., dashboard.py) to test the interactive dashboard:
import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output
import plotly.graph_objs as go
import pandas as pd
from fbprophet import Prophet
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# Load and preprocess data
file_path = 'sales_data.csv'
sales_data = pd.read_csv(file_path)
sales_data['Date'] = pd.to_datetime(sales_data['Date'])
sales_data = sales_data.dropna()
sales_data['Year'] = sales_data['Date'].dt.year
sales_data['Month'] = sales_data['Date'].dt.month
sales_data['Day'] = sales_data['Date'].dt.day
# Scale numerical features for clustering
scaler = StandardScaler()
sales_data[['Quantity', 'Price']] = scaler.fit_transform(sales_data[['Quantity', 'Price']])
# Forecasting
sales_forecast_data = sales_data.groupby('Date')['Quantity'].sum().reset_index()
sales_forecast_data.columns = ['ds', 'y']
model = Prophet()
model.fit(sales_forecast_data)
future_dates = model.make_future_dataframe(periods=365)
forecast = model.predict(future_dates)
# Clustering
customer_data = sales_data.groupby('CustomerID').agg({
'Quantity': 'sum',
'Price': 'mean'
}).reset_index()
customer_data[['Quantity', 'Price']] = scaler.fit_transform(customer_data[['Quantity', 'Price']])
kmeans = KMeans(n_clusters=4, random_state=0)
customer_data['Cluster'] = kmeans.fit_predict(customer_data[['Quantity', 'Price']])
# Initialize the Dash app
app = dash.Dash(__name__)
# Define the layout of the app
app.layout = html.Div([
html.H1("Retail Sales Dashboard"),
dcc.Graph(id='sales-forecast'),
dcc.Graph(id='customer-segmentation'),
])
# Define callback to update sales forecast graph
@app.callback(
Output('sales-forecast', 'figure'),
[Input('sales-forecast', 'id')]
)
def update_sales_forecast(_):
figure = go.Figure()
figure.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat'], mode='lines', name='Forecast'))
figure.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat_lower'], mode='lines', name='Lower Bound', fill='tonexty'))
figure.add_trace(go.Scatter(x=forecast['ds'], y=forecast['yhat_upper'], mode='lines', name='Upper Bound', fill='tonexty'))
figure.update_layout(title='Sales Forecast', xaxis_title='Date', yaxis_title='Sales')
return figure
# Define callback to update customer segmentation graph
@app.callback(
Output('customer-segmentation', 'figure'),
[Input('customer-segmentation', 'id')]
)
def update_customer_segmentation(_):
figure = go.Figure()
for cluster in customer_data['Cluster'].unique():
cluster_data = customer_data[customer_data['Cluster'] == cluster]
figure.add_trace(go.Scatter(x=cluster_data['Quantity'], y=cluster_data['Price'], mode='markers', name=f'Cluster {cluster}'))
figure.update_layout(title='Customer Segmentation', xaxis_title='Quantity', yaxis_title='Price')
return figure
# Run the app
if __name__ == '__main__':
app.run_server(debug=True)
Run this script to start the dashboard. Open your browser and navigate to the displayed URL to interact with the dashboard.
preprocess.py script runs without errors and the output looks correct.forecast.py script runs and outputs reasonable forecast data.segmentation.py script runs and outputs reasonable clusters.dashboard.py script runs, the dashboard loads in the browser, and the visualizations update correctly based on the callbacks.When you run the dashboard.py script, Dash will start a local server. By default, it runs on http://127.0.0.1:8050.
Here is how you can start the server and access the dashboard:
dashboard.py script is located.python dashboard.pyDash is running on http://127.0.0.1:8050/http://127.0.0.1:8050.