Analysis of customer purchasing patterns and demographics based on customer transaction data.
The dataset includes the following fields:
- "in-store" - indicating whether the purchase was made online or in-store (online = 0, in-store = 1)
- "age" - indicating the customer age
- "items" - indicating the number of items purchased per transaction
- "amount" - indicating the amount spent in USD per transaction
- "region" - indicating the region in which the transaction was made (north = 1, south = 2, east = 3, west = 4)
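As a rough illustration of the encoding described above, the sketch below parses a few rows and decodes the numeric codes into labels. It assumes the data is stored as CSV with the column names listed above. The sample rows and the helper name `load_transactions` are hypothetical and are not taken from the real dataset or the original analysis.

```python
import csv
import io

# Hypothetical sample rows for illustration only (not real data).
SAMPLE_CSV = """in-store,age,items,amount,region
0,37,4,281.03,2
1,45,2,53.61,1
0,28,6,1207.20,4
1,61,3,89.90,3
"""

# Code-to-label mappings as documented in the field list.
REGION_NAMES = {1: "north", 2: "south", 3: "east", 4: "west"}
CHANNEL_NAMES = {0: "online", 1: "in-store"}


def load_transactions(text):
    """Parse CSV text into a list of dicts with typed, decoded fields."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({
            "channel": CHANNEL_NAMES[int(raw["in-store"])],
            "age": int(raw["age"]),
            "items": int(raw["items"]),
            "amount": float(raw["amount"]),  # USD per transaction
            "region": REGION_NAMES[int(raw["region"])],
        })
    return rows


transactions = load_transactions(SAMPLE_CSV)
for row in transactions:
    print(row)
```

Decoding the codes up front keeps later summaries readable, since groups appear as "north" or "online" rather than as bare integers.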
The goal for Task 1 is to perform an initial exploratory data analysis and to specifically answer the questions:
- Do customers in different regions spend different amounts per transaction? Which regions spend the most and the least?
- Is there a relationship between the number of items purchased and the amount spent?
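One possible way to approach the two Task 1 questions is a per-region mean of the amount spent and a Pearson correlation between items and amount. The sketch below uses only the standard library and a handful of made-up records, so its numbers say nothing about the actual findings. The names `RECORDS`, `mean_amount_by_region`, and `pearson` are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (region, items, amount) records for illustration only.
RECORDS = [
    ("north", 2, 50.0),
    ("north", 4, 110.0),
    ("south", 3, 40.0),
    ("south", 5, 60.0),
    ("east", 1, 300.0),
    ("east", 6, 500.0),
    ("west", 2, 900.0),
    ("west", 7, 1100.0),
]


def mean_amount_by_region(records):
    """Return the average amount spent per transaction for each region."""
    amounts = defaultdict(list)
    for region, _items, amount in records:
        amounts[region].append(amount)
    return {region: sum(vals) / len(vals) for region, vals in amounts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


means = mean_amount_by_region(RECORDS)
highest = max(means, key=means.get)  # region with the largest mean amount
lowest = min(means, key=means.get)   # region with the smallest mean amount
r = pearson([rec[1] for rec in RECORDS], [rec[2] for rec in RECORDS])

print(means, highest, lowest, round(r, 3))
```

A value of `r` near zero would suggest little linear relationship between items and amount, while values near 1 or -1 would suggest a strong one. A scatter plot is a useful companion check, since correlation alone can miss non-linear patterns.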
The goal for Task 2 is to dive deeper into the data by implementing three classification algorithms: Decision Tree, Random Forest, and Gradient Boosting. The analysis was performed with the following questions in mind:
- Are there differences in the age of customers across the different regions? If so, can we predict the age of a customer in a region based on the other demographic data?
- Is there any relationship between the age of a customer and whether a purchase was made online or in-store? Do any other factors correlate with whether a purchase was made online or in-store?
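The three models named above could be compared along the following lines. This is a minimal sketch using scikit-learn, not the original analysis. The data is synthetic, the rule that generates the "in-store" target is arbitrary and made up, and the hyperparameters are placeholder assumptions. Predicting age with a classifier would additionally require binning ages into groups, which is not shown here.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for the documented fields (not real data).
age = rng.integers(18, 86, size=n)
items = rng.integers(1, 9, size=n)
amount = rng.uniform(5, 3000, size=n).round(2)
region = rng.integers(1, 5, size=n)  # 1..4, integer-coded as in the field list

# Made-up target: mostly "in-store" for older customers, with 10% label noise.
in_store = ((age > 50) ^ (rng.random(n) < 0.1)).astype(int)

X = np.column_stack([age, items, amount, region])
X_train, X_test, y_train, y_test = train_test_split(
    X, in_store, test_size=0.25, random_state=0, stratify=in_store
)

# Placeholder hyperparameters; tuning is out of scope for this sketch.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: accuracy = {score:.2f}")
```

Holding out a test split gives each model an accuracy estimate on data it has not seen, which makes the three comparable. On real data, the fitted models' `feature_importances_` attribute is one way to see which factors drive the online versus in-store prediction.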
The results of the analysis are in the attached presentation, a PowerPoint exported as a .pdf file titled "Customer Purchasing Patterns Report". The presentation was a group project, so my group members are credited on the first slide. Although the presentation was prepared as a group, all of the Python analysis is my own work, and every graph in the presentation comes from that analysis. The presentation reports the results of the analysis along with recommendations to the company (Blackwell).