Building an Interactive Analytic Dashboard for Product Recommendation Analysis
Introduction
In this project, I set out to build a recommendation system that suggests products based on a customer’s latest purchase. Each product description is converted into a TF-IDF vector, and the products whose vectors have the highest cosine similarity to the last purchased item are recommended. The project culminates in an interactive Streamlit dashboard for exploring and analyzing the data.
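For reference, the two ingredients can be written out. The idf shown is the smoothed form that scikit-learn's TfidfVectorizer uses by default, with tf(t, d) the raw count of term t in description d, N the number of descriptions and df(t) the number of descriptions containing t. The post does not name its library, so this form is an assumption (the cosine_similarity call in the code points to scikit-learn).

```latex
w_{t,d} = \mathrm{tf}(t,d)\,\mathrm{idf}(t), \qquad
\mathrm{idf}(t) = \ln\frac{1+N}{1+\mathrm{df}(t)} + 1

\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
```

Because all TF-IDF weights are non-negative, the similarity ranges from 0 (no shared terms) to 1 (proportional weight vectors). TfidfVectorizer also L2-normalises each row by default, so the denominator is 1 for its output.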
Exploratory Data Analysis (EDA)
The EDA was conducted using an interactive Streamlit app. The app allows users to filter data by various parameters and visualize trends such as sales over time, top-selling products, and customer purchase behavior.
Key Features of the Streamlit App:
- Sales Over Time: A line chart showing sales trends over selected time periods.
- Top-Selling Products: A bar chart highlighting the top 10 products based on sales volume.
- Customer Analysis: Insights into customer purchase frequency and life cycle.
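The three views could be backed by small pandas aggregations like the following. This is a sketch under assumptions: the post only names CustomerID and Description, so the columns InvoiceNo, InvoiceDate, Quantity and UnitPrice are assumed, revenue is taken as Quantity × UnitPrice, and sales_over_time, top_products, customer_summary and render_dashboard are hypothetical helpers.

```python
import pandas as pd


def sales_over_time(df: pd.DataFrame, freq: str = "MS") -> pd.Series:
    """Revenue per period (month start by default) for the line chart.

    Assumes columns InvoiceDate (datetime), Quantity and UnitPrice.
    """
    revenue = df.assign(Revenue=df["Quantity"] * df["UnitPrice"])
    return revenue.set_index("InvoiceDate")["Revenue"].resample(freq).sum()


def top_products(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Total quantity sold per product, largest first, for the bar chart."""
    return df.groupby("Description")["Quantity"].sum().nlargest(n)


def customer_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-customer order count and first/last purchase dates."""
    grouped = df.groupby("CustomerID")
    summary = pd.DataFrame({
        "orders": grouped["InvoiceNo"].nunique(),
        "first_purchase": grouped["InvoiceDate"].min(),
        "last_purchase": grouped["InvoiceDate"].max(),
    })
    # Days between first and last purchase: a simple view of the customer life cycle
    summary["lifespan_days"] = (summary["last_purchase"] - summary["first_purchase"]).dt.days
    return summary


def render_dashboard(df: pd.DataFrame) -> None:
    """Draw the three views; meant to be called from a script started with `streamlit run`."""
    import streamlit as st  # imported here so the helpers above work without Streamlit

    top_n = st.sidebar.slider("Number of top products", 5, 20, 10)
    st.subheader("Sales over time")
    st.line_chart(sales_over_time(df))
    st.subheader(f"Top {top_n} products by quantity sold")
    st.bar_chart(top_products(df, top_n))
    st.subheader("Customer purchase frequency and life cycle")
    st.dataframe(customer_summary(df))
```

Keeping the aggregations separate from the Streamlit calls means they can be checked in a notebook or a unit test without starting the app.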
You can access the dashboard at http://192.168.43.109:8502/ (a private local-network address, so it is reachable only from the network it was served on).
Project Overview
1. Data Preparation: The data consists of transaction details such as CustomerID, Description, and other sales information. The dataset was first cleaned and preprocessed to ensure consistency.
2. Recommendation System: A TF-IDF vectorizer was used to convert product descriptions into vectors, and cosine similarity was employed to find the most similar products.
3. Implementation Details:
   - Training Data: Descriptions from the training dataset were vectorized and stored.
   - Testing Data: For each customer in the testing dataset, the latest purchase was identified, and recommendations were generated.
4. Evaluation: The recommendations were evaluated by comparing them with actual purchases to determine accuracy.
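The post does not show the preparation step or where tfidf_matrix_train comes from. One possible way to produce train_df, test_df and tfidf_matrix_train is sketched below; the cleaning rules, the date-based split, the InvoiceDate column and the helper names prepare, split_by_date and fit_tfidf are all assumptions rather than the author's code, and TfidfVectorizer is used with its default settings.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning: drop rows without a customer or description and
    normalise descriptions so the same product always matches exactly."""
    df = df.dropna(subset=["CustomerID", "Description"]).copy()
    df["Description"] = df["Description"].str.strip().str.upper()
    return df.reset_index(drop=True)


def split_by_date(df: pd.DataFrame, cutoff: pd.Timestamp):
    """Hypothetical split: purchases before `cutoff` train, the rest test.
    Both parts stay sorted by date, so a customer's last test row is the latest purchase."""
    df = df.sort_values("InvoiceDate", kind="stable")
    train = df[df["InvoiceDate"] < cutoff].reset_index(drop=True)
    test = df[df["InvoiceDate"] >= cutoff].reset_index(drop=True)
    return train, test


def fit_tfidf(descriptions: pd.Series):
    """Fit a TF-IDF vectorizer and return it with the sparse document-term matrix."""
    vectorizer = TfidfVectorizer()
    return vectorizer, vectorizer.fit_transform(descriptions)
```

Typical use: call prepare on the raw transactions, split_by_date with a cutoff of your choosing, then `vectorizer, tfidf_matrix_train = fit_tfidf(train_df["Description"])`, which yields the names the recommendation code expects. Resetting the index after the split keeps row positions and index labels aligned.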
Building the Recommendation System
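```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Assumes train_df and tfidf_matrix_train (TF-IDF of train_df['Description']) exist.
# Map each description to its row position (not index label) in tfidf_matrix_train
train_indices = pd.Series(np.arange(len(train_df)), index=train_df['Description']).to_dict()

# Function to recommend products
def recommend_products(input_product, top_n=3):
    if input_product not in train_indices:
        return [None] * top_n  # unseen product: nothing to compare against
    idx = train_indices[input_product]
    # Similarity of the input row to every training row
    scores = cosine_similarity(tfidf_matrix_train[idx], tfidf_matrix_train).flatten()
    recommendations = []
    for i in np.argsort(scores)[::-1]:  # most similar first
        description = train_df['Description'].iloc[i]
        # Skip the input itself and repeats (train_df may hold one row per transaction)
        if description != input_product and description not in recommendations:
            recommendations.append(description)
            if len(recommendations) == top_n:
                break
    return recommendations
```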
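Because TfidfVectorizer L2-normalises each row by default, the plain dot product of two rows already equals their cosine similarity, so scikit-learn's linear_kernel returns the same scores with less work. The variant below also vectorizes each distinct description only once, which keeps repeated transactions of one product from crowding the top results. It is a hypothetical alternative (build_recommender is not from the post), not a replacement claimed by the author.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def build_recommender(descriptions, top_n=3):
    """Return recommend(product) -> up to top_n most similar distinct products.

    Hypothetical helper: vectorizes each unique description exactly once.
    """
    products = pd.Series(descriptions).drop_duplicates().reset_index(drop=True)
    matrix = TfidfVectorizer().fit_transform(products)  # rows are L2-normalised
    position = {desc: i for i, desc in enumerate(products)}

    def recommend(product):
        if product not in position:
            return []  # product never seen in training
        i = position[product]
        # For L2-normalised rows, the dot product equals cosine similarity
        scores = linear_kernel(matrix[i], matrix).ravel()
        order = np.argsort(-scores, kind="stable")  # most similar first
        best = [j for j in order if j != i][:top_n]  # drop the input itself
        return products.iloc[best].tolist()

    return recommend
```

For example, `recommend = build_recommender(train_df["Description"])` builds the index once, after which `recommend(latest_description)` can be called per customer.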
Generating Recommendations
```python
import pandas as pd

# Generate recommendations for each customer in the testing data.
# Assumes test_df is sorted by purchase date, so each customer's last row is the
# latest purchase; groupby skips rows with a missing CustomerID.
recommendation_results = []
for customer_id, customer_test_df in test_df.groupby('CustomerID', sort=False):
    latest_description = customer_test_df.iloc[-1]['Description']
    recommendations = recommend_products(latest_description)
    # Other products this customer actually bought during the test period
    actual_purchases = set(customer_test_df['Description']) - {latest_description}
    recommendation_results.append({
        'input': latest_description,
        'rec1': recommendations[0] if len(recommendations) > 0 else None,
        'rec2': recommendations[1] if len(recommendations) > 1 else None,
        'rec3': recommendations[2] if len(recommendations) > 2 else None,
        # True if at least one recommended product was actually purchased
        'purchase': any(rec in actual_purchases for rec in recommendations),
    })

# Convert the results to a DataFrame
recommendation_results_df = pd.DataFrame(recommendation_results)
```
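The post reports accuracy but does not show how it was computed. A minimal sketch, assuming "accuracy" means the share of customers whose purchase flag is True (a hit rate), and that customers who received no recommendation (rec1 is None) are reported separately as coverage rather than counted as misses; evaluate is a hypothetical helper.

```python
import pandas as pd


def evaluate(results: pd.DataFrame) -> dict:
    """Summarise recommendation results with 'rec1' and boolean 'purchase' columns.

    coverage: share of customers who received at least one recommendation.
    hit_rate: among those, share whose purchase flag is True.
    """
    has_recs = results["rec1"].notna()
    scored = results.loc[has_recs, "purchase"]
    return {
        "coverage": float(has_recs.mean()),
        "hit_rate": float(scored.mean()) if len(scored) else float("nan"),
    }
```

Calling `evaluate(recommendation_results_df)` gives both numbers; reporting coverage alongside the hit rate makes it visible when many test products never appeared in training.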
Conclusion
This project demonstrates how to build a recommendation system using TF-IDF and cosine similarity. The Streamlit app provides an interactive way to explore the data, making it easier to understand sales trends and customer behavior. The system could be improved further, for example by using signals beyond the product description or by tuning the TF-IDF settings and the number of recommendations.