Project Title: Retail Sales Analysis
Level: Beginner
Database: SQL
This project is designed to demonstrate SQL skills and techniques typically used by data analysts to explore, clean, and analyze retail sales data. The project involves setting up a retail sales database, performing exploratory data analysis (EDA), and answering specific business questions through SQL queries. This project is ideal for those who are starting their journey in data analysis and want to build a solid foundation in SQL.
- Set up a retail sales database: Create and populate a retail sales database with the provided sales data.
- Data Cleaning: Identify and remove any records with missing or null values.
- Exploratory Data Analysis (EDA): Perform basic exploratory data analysis to understand the dataset.
- Business Analysis: Use SQL to answer specific business questions and derive insights from the sales data.
- Database Creation: The project starts by creating a database named p1_retail_db.
- Table Creation: A table named retail_sales is created to store the sales data. The table structure includes columns for transaction ID, sale date, sale time, customer ID, gender, age, product category, quantity sold, price per unit, cost of goods sold (COGS), and total sale amount.
CREATE DATABASE SQL;
-- The queries below reference this table as "SQL"
CREATE TABLE "SQL"
(
transactions_id INT PRIMARY KEY,
sale_date DATE,
sale_time TIME,
customer_id INT,
gender VARCHAR(10),
age INT,
category VARCHAR(35),
quantity INT,
price_per_unit FLOAT,
cogs FLOAT,
total_sale FLOAT
);
- Record Count: Determine the total number of records in the dataset.
- Customer Count: Find out how many unique customers are in the dataset.
- Category Count: Identify all unique product categories in the dataset.
- Null Value Check: Check for any null values in the dataset and delete records with missing data.
select tablename from pg_tables where schemaname ='public';
select * from "SQL" s
limit 10;
select count(*) from "SQL" s ;
-- Data Cleaning
select * from "SQL" s
where
transactions_id is null
or
sale_date is null
or
sale_time is null
or
gender is null
or
category is null
or
cogs is null
or
total_sale is null
or
quantity is null;
-- Delete the records with missing values
delete from "SQL" s
where
transactions_id is null
or
sale_date is null
or
sale_time is null
or
gender is null
or
category is null
or
cogs is null
or
total_sale is null
or
quantity is null;
-- Data Exploration
-- How many sales records do we have?
select count(*) as total_sales from "SQL" s ;
-- How many records carry a customer ID? (repeat customers are counted once per sale)
select count(s.customer_id ) as total_customers from "SQL" s ;
-- How many unique customers do we have?
select count(distinct customer_id) as total_customers from "SQL" s;
-- Which product categories do we have?
select distinct category from "SQL" s ;
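The cleaning and exploration steps above can be tried end to end without a PostgreSQL server. What follows is a minimal sketch, not the project's actual script. It assumes Python's built-in sqlite3 module, an in-memory table named retail_sales (the name used in the table description; the queries here call it "SQL"), and four made-up rows, one of which has missing values.

```python
import sqlite3

# Hypothetical sample rows; the real project loads the provided sales data.
# Column order: transactions_id, sale_date, sale_time, customer_id, gender, age,
#               category, quantity, price_per_unit, cogs, total_sale
rows = [
    (1, "2022-11-05", "10:15:00", 101, "Female", 34, "Beauty", 2, 50.0, 20.0, 100.0),
    (2, "2022-11-05", "18:40:00", 102, "Male", 45, "Clothing", 5, 300.0, 120.0, 1500.0),
    (3, "2022-11-12", "13:05:00", 101, "Female", 34, "Clothing", None, None, None, None),
    (4, "2023-01-20", "09:30:00", 103, "Male", 29, "Electronics", 1, 500.0, 210.0, 500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE retail_sales (
        transactions_id INTEGER PRIMARY KEY,
        sale_date TEXT, sale_time TEXT, customer_id INTEGER, gender TEXT, age INTEGER,
        category TEXT, quantity INTEGER, price_per_unit REAL, cogs REAL, total_sale REAL
    )
    """
)
conn.executemany("INSERT INTO retail_sales VALUES (?,?,?,?,?,?,?,?,?,?,?)", rows)

# Same columns as the null check in this README.
NULL_FILTER = """
    transactions_id IS NULL OR sale_date IS NULL OR sale_time IS NULL
    OR gender IS NULL OR category IS NULL OR quantity IS NULL
    OR cogs IS NULL OR total_sale IS NULL
"""

# Inspect first, then delete: count the incomplete rows before removing them.
rows_with_nulls = conn.execute(
    f"SELECT COUNT(*) FROM retail_sales WHERE {NULL_FILTER}"
).fetchone()[0]
deleted = conn.execute(f"DELETE FROM retail_sales WHERE {NULL_FILTER}").rowcount

# Basic exploration on the cleaned table.
total_sales = conn.execute("SELECT COUNT(*) FROM retail_sales").fetchone()[0]
unique_customers = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM retail_sales"
).fetchone()[0]
categories = [
    r[0]
    for r in conn.execute("SELECT DISTINCT category FROM retail_sales ORDER BY category")
]

print(rows_with_nulls, deleted, total_sales, unique_customers, categories)
```

Running the SELECT with the same filter before the DELETE makes it easy to confirm that the number of rows removed matches the number of rows flagged.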
The following SQL queries were developed to answer specific business questions:
- Write a SQL query to retrieve all columns for sales made on '2022-11-05':
select * from "SQL" s
where s.sale_date ='2022-11-05';
- Write a SQL query to retrieve all transactions where the category is 'Clothing' and the quantity sold is more than 4 in the month of Nov-2022:
select
*
from "SQL" s
where s.category = 'Clothing'
and s.quantity > 4
and to_char(s.sale_date::date , 'YYYY-MM') = '2022-11';
- Write a SQL query to calculate the total sales (total_sale) for each category:
select category, SUM(total_sale) as Total_sales from "SQL" s
group by s.category ;
- Write a SQL query to find the average age of customers who purchased items from the 'Beauty' category:
select category, count(s.customer_id ) as Total_customers, round(avg(age), 2) as Average_age from "SQL" s
where category = 'Beauty'
group by category ;
- Write a SQL query to find all transactions where the total_sale is greater than 1000:
select * from "SQL" s
where total_sale > 1000;
- Write a SQL query to find the total number of transactions (transactions_id) made by each gender in each category:
select s.category ,gender, count(s.transactions_id ) as Total_transactions from "SQL" s
group by category ,gender
order by 1;
- Write a SQL query to calculate the average sale for each month and find the best-selling month in each year:
select
t.year,
t.month,
t.avg_sale,
t.total_sale,
'Best Month' as status
from (
select
extract(year from sale_date::date ) :: int as year,
extract (month from sale_date::date ) :: int as month,
round(avg(total_sale)::numeric, 2) as avg_sale, -- cast needed: round(value, digits) is defined for numeric, not float
sum(total_sale) as total_sale,
rank() over(partition by extract(year from sale_date::date )
order by avg(total_sale) desc) as month_rank
from "SQL" s
group by
extract(year from sale_date::date ),
extract (month from sale_date::date )
) t
where t.month_rank = 1
order by t.year desc;
--order by 1, 2;
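The ranking step in the query above can be checked on a small dataset. The sketch below is an illustration only: it assumes Python's built-in sqlite3 module (SQLite 3.25 or later for window functions), a hypothetical two-column retail_sales table, and made-up amounts. It also splits the monthly aggregation and the ranking into two steps, which is a restructuring of the query above, and it uses SQLite's strftime in place of PostgreSQL's EXTRACT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retail_sales (sale_date TEXT, total_sale REAL)")

# Hypothetical sample data, chosen so each year has a clear best month.
conn.executemany(
    "INSERT INTO retail_sales VALUES (?, ?)",
    [
        ("2022-10-03", 100.0),
        ("2022-10-17", 300.0),  # Oct 2022 average: 200
        ("2022-11-05", 500.0),
        ("2022-11-20", 700.0),  # Nov 2022 average: 600
        ("2023-01-09", 400.0),  # Jan 2023 average: 400
        ("2023-02-14", 150.0),
        ("2023-02-28", 250.0),  # Feb 2023 average: 200
    ],
)

BEST_MONTH_SQL = """
WITH monthly AS (
    -- Step 1: one row per (year, month) with its average sale
    SELECT
        CAST(strftime('%Y', sale_date) AS INTEGER) AS year,
        CAST(strftime('%m', sale_date) AS INTEGER) AS month,
        ROUND(AVG(total_sale), 2) AS avg_sale
    FROM retail_sales
    GROUP BY strftime('%Y', sale_date), strftime('%m', sale_date)
),
ranked AS (
    -- Step 2: rank the months within each year, highest average first
    SELECT
        year, month, avg_sale,
        RANK() OVER (PARTITION BY year ORDER BY avg_sale DESC) AS month_rank
    FROM monthly
)
SELECT year, month, avg_sale
FROM ranked
WHERE month_rank = 1
ORDER BY year
"""

best_months = conn.execute(BEST_MONTH_SQL).fetchall()
print(best_months)
```

Because RANK() gives tied rows the same rank, a year in which two months share the highest average would return both months.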
-- Alternative version of the same ranking that returns only the average sale
SELECT
year,
month,
avg_sale
FROM
(
SELECT
EXTRACT(YEAR FROM sale_date:: date) :: int as year,
EXTRACT(MONTH FROM sale_date :: date) :: int as month,
AVG(total_sale) as avg_sale,
RANK() OVER(PARTITION BY EXTRACT(YEAR FROM sale_date :: date) ORDER BY AVG(total_sale) DESC) as rank
FROM "SQL" s
GROUP BY
EXTRACT(YEAR FROM sale_date:: date),
EXTRACT(MONTH FROM sale_date :: date)
) as t1
WHERE rank = 1;
-- order by 1, 3
- Write a SQL query to find the top 5 customers based on the highest total sales:
select
customer_id,
sum(s.total_sale ) as sales
from "SQL" s
group by customer_id
order by 2 desc
limit 5;
- Write a SQL query to find the number of unique customers who purchased items from each category:
select
category,
count(distinct s.customer_id ) as Number_of_customers
from "SQL" s
group by s.category
order by count(distinct s.customer_id );
- Write a SQL query to assign each sale to a shift and count the orders per shift (Example: Morning < 12, Afternoon between 12 and 17, Evening > 17):
WITH hourly_sale
AS
(
SELECT *,
CASE
WHEN EXTRACT(HOUR FROM sale_time :: Time) :: int < 12 THEN 'Morning'
WHEN EXTRACT(HOUR FROM sale_time :: Time) :: int BETWEEN 12 AND 17 THEN 'Afternoon'
ELSE 'Evening'
END as shift
FROM "SQL" s
)
SELECT
shift,
COUNT(*) as total_orders
FROM hourly_sale
GROUP BY shift;
- Customer Demographics: The dataset includes customers from various age groups, with sales distributed across different categories such as Clothing and Beauty.
- High-Value Transactions: Several transactions had a total sale amount greater than 1000, indicating premium purchases.
- Sales Trends: Monthly analysis shows variations in sales, helping identify peak seasons.
- Customer Insights: The analysis identifies the top-spending customers and the most popular product categories.
- Sales Summary: A detailed report summarizing total sales, customer demographics, and category performance.
- Trend Analysis: Insights into sales trends across different months and shifts.
- Customer Insights: Reports on top customers and unique customer counts per category.
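The shift breakdown mentioned in the trend analysis depends on how the hour boundaries are handled. The sketch below illustrates the same CASE logic on six made-up sale times placed at the boundaries. It assumes Python's built-in sqlite3 module and a hypothetical two-column retail_sales table, and it uses SQLite's strftime where the PostgreSQL query uses EXTRACT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE retail_sales (transactions_id INTEGER PRIMARY KEY, sale_time TEXT)"
)

# Hypothetical sale times chosen to sit on the shift boundaries.
conn.executemany(
    "INSERT INTO retail_sales VALUES (?, ?)",
    [
        (1, "08:30:00"),  # hour 8  -> Morning
        (2, "11:59:59"),  # hour 11 -> Morning
        (3, "12:00:00"),  # hour 12 -> Afternoon
        (4, "17:45:00"),  # hour 17 -> Afternoon (BETWEEN is inclusive)
        (5, "18:00:00"),  # hour 18 -> Evening
        (6, "22:10:00"),  # hour 22 -> Evening
    ],
)

SHIFT_SQL = """
WITH hourly_sale AS (
    SELECT
        CASE
            WHEN CAST(strftime('%H', sale_time) AS INTEGER) < 12 THEN 'Morning'
            WHEN CAST(strftime('%H', sale_time) AS INTEGER) BETWEEN 12 AND 17 THEN 'Afternoon'
            ELSE 'Evening'
        END AS shift
    FROM retail_sales
)
SELECT shift, COUNT(*) AS total_orders
FROM hourly_sale
GROUP BY shift
"""

orders_by_shift = dict(conn.execute(SHIFT_SQL).fetchall())
print(orders_by_shift)
```

Since the comparison is on the hour only, a sale at 17:45 still counts as Afternoon, and the Evening shift starts at 18:00.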
This project serves as a comprehensive introduction to SQL for data analysts, covering database setup, data cleaning, exploratory data analysis, and business-driven SQL queries. The findings from this project can help drive business decisions by understanding sales patterns, customer behavior, and product performance.
- Clone the Repository: Clone this project repository from GitHub.
- Set Up the Database: Run the SQL scripts provided in the database_setup.sql file to create and populate the database.
- Run the Queries: Use the SQL queries provided in the analysis_queries.sql file to perform your analysis.
- Explore and Modify: Feel free to modify the queries to explore different aspects of the dataset or answer additional business questions.
This project is part of my portfolio, showcasing the SQL skills essential for data analyst roles. If you have any questions, feedback, or would like to collaborate, feel free to get in touch!
- LinkedIn: Connect with me professionally
Thank you for your support, and I look forward to connecting with you!