Instacart Analysis

Moriam Adesola Adegbite
5 min readNov 25, 2023

--

Instacart is an online grocery platform, and I explored the dataset to derive insights that can optimize operations and enhance customer experience. Market basket analysis and customer segmentation analysis were the major analyses carried out. By exploring the dataset, I wanted to understand the following things:

  1. The products commonly added to the cart first
  2. The product pairs that were frequently purchased together
  3. The average number of products in a single order
  4. The customer segments based on purchase frequency
  5. The percentage of customers that haven’t placed an order in the past 30 days
  6. The performance of the aisles and departments
  7. The day and hour with most product orders

Before discussing the insights, here are a few things you should know about the dataset. The dataset has 6 tables; aisles, departments, order products prior, order products train, orders, and products tables. The entity relational diagram is displayed below. I merged the aisles and departments tables into the products table. For this analysis, three tables were used — order products prior, orders, and products.

ERD for Instacart Dataset

The dataset had 32,434,489 orders from 206,209 unique users. A total of 49,688 products were in the Instacart dataset, which was distributed across 134 aisles in 21 departments. The dataset can be found here.

Now to the analysis, insights, and recommendations

  1. Market basket analysis: The market basket analysis was carried out to understand customer behavior and gain insights that the business can use to make strategic decisions about pricing, promotions, and customer recommendations. The analysis showed that the average number of products in a single order is 8, while the top 5 products that were commonly added to the cart first were bananas, a bag of organic bananas, organic whole milk, organic strawberries, and organic avocado.

Also, the top 10 product pairs that are most frequently purchased together are shown in the image below:

Recommendation: Products frequently added to the cart first can be used to optimize inventory, such that these products must be available at all times. Also, frequently purchased products can be recommended to users when one of the products in the product pair is added to the cart.

2. Customer segmentation analysis: Customers were grouped based on their purchase frequency for targeted marketing efforts. The purchase frequency was in 4 categories derived from the lower (5), median (9), and upper (19) percentiles of the total number of orders in the dataset. The analysis showed that 56,797 customers had 9 to 18 orders, 53,931 customers had more than 18 orders, 51,905 customers had 5 to 8 orders, and 43576 customers had less than 5 orders.

Recommendation: Targeted marketing, such as product recommendations, should be carried out to customers with a lower number of purchases. Loyalty programs or special offers can be used to reward high-frequency customers for their continued business.

3. Customer churn analysis: The churn analysis was carried out to determine the level of customer retention of the business and the analysis showed that the number of customers that had not used the platform in the last 30 days was 141,127, which is 68% of the total number of customers.

Recommendation: An in-depth analysis on additional data needs to be carried out to understand the reasons for churning and provide recommendations that will encourage repeat purchases.

4. Aisles and departments performance: The performance of the aisles and departments was determined in terms of the number of orders, as shown in the images below. The fresh fruits aisle had the highest number of orders with over 3.6M+ orders. Also, the produce department had the highest number of orders with over 9.4M+ orders.

Recommendation: The performance of the aisles and departments can be used to optimize inventory levels, thus ensuring that there are no understocking or overstocking situations.

5. Lastly, analysis of the dataset showed that products were purchased mostly on Sundays and Mondays, with Thursday having the least purchase. Also, the 10th hour on Monday had the highest number of orders, which is closely followed by the 14th hour on Sunday. The analysis also showed that the purchase hours were mostly between the 8th and 18th hours of the day, with most purchases occurring after noon.

Recommendation: Promotions can be carried out from Tuesdays to Thursdays to encourage mid-week purchases by the users. Adequate levels of staff must be provided on the peak ordering hours and days, thus ensuring that there are no late product pickups, deliveries, or customer disappointments.

The dashboard for the Instacart analysis is shown below.

Click here to interact with the dashboard

Click here to view the SQL queries

Limitations of the data

· The metadata had no description of the year and month of the dataset.

· The demographics of the customers were not provided.

· The prices of the products were not provided.

--

--