SQL (Structured Query Language) is one of the most important skills for us data people. In this article and video, you will pick up the SQL skills you need for data analysis work.
Step 0: Install MySQL software
I am using the FREE MySQL Community Edition software to learn & practice SQL at home. You can get it from here.
If you have other database software available (such as SQL Server or Oracle), you can use it to follow this tutorial.
Step 1: Import Awesome Chocolates Dataset
You need some data to practice SQL. So I prepared a sample dataset for a fictional (but yummy) company called Awesome Chocolates.
Download the .SQL file from here.
After you have the file,
- Open MySQL Workbench and log in if necessary
- Click on the “server administration” tab (see illustration)
- Click on “Data Import/Restore”
- Select the option “Import from self-contained file”
- Specify the path of the downloaded awesome-chocolates-data.sql file
- Start import
At the end of these steps, your MySQL server should have the awesome chocolates database. Congratulations 🎉🥳
You can see it in the “Schemas” tab in Workbench.
Using SQL Server?
You can also use SQL Server to practice SQL. If you are using SQL Server Management Studio (SSMS), follow the steps below to import the data.
- Download this SQL Server Backup file
- Unzip the file
- Open SSMS, right-click on Databases and choose the “Restore Database” option. Follow the steps on that screen, using the screenshots below.
Step 2: Learn SQL for Data Analysis with this video
Everything is ready. Time to learn SQL.
I made an hour-long tutorial to explain all the necessary SQL concepts for you. In this video, you will learn:
- How to use SELECT statement to answer business questions
- Working with WHERE clause
- Using AND, OR, NOT and combining them to create complex queries
- Sorting query results using ORDER BY
- Combining data from two or more tables using JOINS
- Creating reports with GROUP BY
- More than 50 example queries, tips and ideas
Please watch the video below or on my YouTube Channel.
The Queries
Here are some of the example queries covered in the video lesson. Feel free to copy and paste them into your SQL console to see how they work.
-- Select everything from sales table
select * from sales;
-- Show just a few columns from sales table
select SaleDate, Amount, Customers from sales;
select Amount, Customers, GeoID from sales;
-- Adding a calculated column with SQL
Select SaleDate, Amount, Boxes, Amount / boxes from sales;
-- Naming a field with AS in SQL
Select SaleDate, Amount, Boxes, Amount / boxes as 'Amount per box' from sales;
-- Using WHERE Clause in SQL
select * from sales
where amount > 10000;
-- Showing sales data where amount is greater than 10,000 by descending order
select * from sales
where amount > 10000
order by amount desc;
-- Showing sales data where geography is g1 by product ID &
-- descending order of amounts
select * from sales
where geoid='g1'
order by PID, Amount desc;
-- Working with dates in SQL
Select * from sales
where amount > 10000 and SaleDate >= '2022-01-01';
-- Using year() function to select all data in a specific year
select SaleDate, Amount from sales
where amount > 10000 and year(SaleDate) = 2022
order by amount desc;
-- Range condition in SQL with < & > operators (more than 0, up to 50 boxes)
select * from sales
where boxes >0 and boxes <=50;
-- Using the BETWEEN operator in SQL (inclusive at both ends, so this also returns rows with 0 boxes)
select * from sales
where boxes between 0 and 50;
-- Using weekday() function in SQL (MySQL counts from 0 = Monday, so 4 = Friday)
select SaleDate, Amount, Boxes, weekday(SaleDate) as 'Day of week'
from sales
where weekday(SaleDate) = 4;
-- Working with People table
select * from people;
-- OR operator in SQL
select * from people
where team = 'Delish' or team = 'Jucies';
-- IN operator in SQL
select * from people
where team in ('Delish','Jucies');
-- LIKE operator in SQL
select * from people
where salesperson like 'B%';
select * from people
where salesperson like '%B%';
select * from sales;
-- Using CASE to create branching logic in SQL
select SaleDate, Amount,
case when amount < 1000 then 'Under 1k'
when amount < 5000 then 'Under 5k'
when amount < 10000 then 'Under 10k'
else '10k or more'
end as 'Amount category'
from sales;
-- GROUP BY in SQL
select team, count(*) from people
group by team;
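The video also covers JOINs, which do not appear in the queries listed above. Here is a minimal sketch of a JOIN combined with GROUP BY. It assumes MySQL 8.0 or later (for the WITH clause) and uses tiny made-up stand-ins for the people and sales tables, so the names and numbers in the rows are hypothetical and the query runs even without the Awesome Chocolates database.

```sql
-- Hypothetical miniature versions of the people and sales tables (made-up rows).
-- Inside this one statement they take the place of the real tables.
with people as (
    select 'SP01' as SPID, 'Asha' as Salesperson, 'Delish' as Team
    union all select 'SP02', 'Ben', 'Jucies'
),
sales as (
    select 'SP01' as SPID, 5000 as Amount
    union all select 'SP01', 1200
    union all select 'SP02', 800
)
-- Match each sale to its salesperson, then summarize per person
select p.Salesperson, p.Team,
       count(*)      as shipments,
       sum(s.Amount) as total_amount
from sales s
join people p on p.SPID = s.SPID
group by p.Salesperson, p.Team
order by total_amount desc;
-- Asha | Delish | 2 | 6200
-- Ben  | Jucies | 1 |  800
```

To try the same query on the real data, drop the WITH clause and keep the select statement on its own.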
SQL Practice Problems
Once you understand the concepts I’ve demoed in the video, try to solve the homework problems below.
If you want to cheat, the solutions are listed after the problems.
INTERMEDIATE PROBLEMS
👉 You need to combine various concepts covered in the video to solve these
1. Print details of shipments (sales) where amounts are > 2,000 and boxes are < 100.
2. How many shipments (sales) did each salesperson have in January 2022?
3. Which product sells more boxes? Milk Bars or Eclairs?
4. Which product sold more boxes in the first 7 days of February 2022? Milk Bars or Eclairs?
5. Which shipments had under 100 customers & under 100 boxes? Did any of them occur on Wednesday?
HARD PROBLEMS
👉 These require concepts not covered in the video
1. What are the names of salespersons who had at least one shipment (sale) in the first 7 days of January 2022?
2. Which salespersons did not make any shipments in the first 7 days of January 2022?
3. How many times did we ship more than 1,000 boxes in each month?
4. Did we ship at least one box of ‘After Nines’ to ‘New Zealand’ in every month?
5. India or Australia? Who buys more chocolate boxes on a monthly basis?
INTERMEDIATE PROBLEMS:
-- 1. Print details of shipments (sales) where amounts are > 2,000 and boxes are < 100.
select * from sales where amount > 2000 and boxes < 100;
-- 2. How many shipments (sales) did each salesperson have in January 2022?
select p.Salesperson, count(*) as 'Shipment Count'
from sales s
join people p on s.spid = p.spid
where SaleDate between '2022-1-1' and '2022-1-31'
group by p.Salesperson;
-- 3. Which product sells more boxes? Milk Bars or Eclairs?
select pr.product, sum(boxes) as 'Total Boxes'
from sales s
join products pr on s.pid = pr.pid
where pr.Product in ('Milk Bars', 'Eclairs')
group by pr.product;
-- 4. Which product sold more boxes in the first 7 days of February 2022? Milk Bars or Eclairs?
select pr.product, sum(boxes) as 'Total Boxes'
from sales s
join products pr on s.pid = pr.pid
where pr.Product in ('Milk Bars', 'Eclairs')
and s.saledate between '2022-2-1' and '2022-2-7'
group by pr.product;
-- 5. Which shipments had under 100 customers & under 100 boxes? Did any of them occur on Wednesday?
select * from sales
where customers < 100 and boxes < 100;
-- Same shipments, with the Wednesday ones flagged (weekday() = 2 is Wednesday)
select *,
case when weekday(saledate)=2 then 'Wednesday Shipment'
else ''
end as 'W Shipment'
from sales
where customers < 100 and boxes < 100;
HARD PROBLEMS:
-- What are the names of salespersons who had at least one shipment (sale) in the first 7 days of January 2022?
select distinct p.Salesperson
from sales s
join people p on p.spid = s.SPID
where s.SaleDate between '2022-01-01' and '2022-01-07';
-- Which salespersons did not make any shipments in the first 7 days of January 2022?
select p.salesperson
from people p
where p.spid not in
(select distinct s.spid from sales s where s.SaleDate between '2022-01-01' and '2022-01-07');
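One caveat with NOT IN is worth knowing, although it may never bite in this dataset: if the subquery returns even one NULL, NOT IN matches nothing. The sketch below uses two hypothetical temporary tables (demo_people and demo_sales, with made-up rows) to show the difference, with NOT EXISTS as an alternative that is not affected. Whether the real sales table can contain a NULL spid is an assumption I have not checked.

```sql
-- Hypothetical demo tables with made-up rows; they do not touch the real data
create temporary table demo_people (spid varchar(10), salesperson varchar(50));
create temporary table demo_sales  (spid varchar(10), SaleDate date);

insert into demo_people values ('SP01', 'Asha'), ('SP02', 'Ben'), ('SP03', 'Chen');
insert into demo_sales  values ('SP01', '2022-01-03'),
                               ('SP02', '2022-01-20'),
                               (null,   '2022-01-05');  -- a sale with no salesperson

-- NOT IN: the NULL in the subquery makes every comparison "unknown",
-- so this returns no rows at all
select p.salesperson
from demo_people p
where p.spid not in
      (select s.spid from demo_sales s
       where s.SaleDate between '2022-01-01' and '2022-01-07');

-- NOT EXISTS: checks each person separately and ignores the NULL row,
-- so this returns Ben and Chen
select p.salesperson
from demo_people p
where not exists
      (select 1 from demo_sales s
       where s.spid = p.spid
         and s.SaleDate between '2022-01-01' and '2022-01-07')
order by p.salesperson;

drop temporary table demo_people;
drop temporary table demo_sales;
```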
-- How many times did we ship more than 1,000 boxes in each month?
select year(saledate) 'Year', month(saledate) 'Month', count(*) 'Times we shipped 1k boxes'
from sales
where boxes>1000
group by year(saledate), month(saledate)
order by year(saledate), month(saledate);
-- Did we ship at least one box of 'After Nines' to 'New Zealand' in every month?
-- Note: months with no matching shipments do not appear in this result at all.
set @product_name = 'After Nines';
set @country_name = 'New Zealand';
select year(saledate) 'Year', month(saledate) 'Month',
if(sum(boxes)>=1, 'Yes','No') 'Status'
from sales s
join products pr on pr.PID = s.PID
join geo g on g.GeoID=s.GeoID
where pr.Product = @product_name and g.Geo = @country_name
group by year(saledate), month(saledate)
order by year(saledate), month(saledate);
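Because the query above only lists months that have at least one matching shipment, a month with none is simply missing instead of showing 'No'. One possible way to make the gaps visible is to build the list of all months found in the sales table and LEFT JOIN the matching months onto it. This is a sketch under assumptions, not the solution from the video: it needs MySQL 8.0 or later, the rows are made up, and the product and country are simplified to the hypothetical IDs 'P1' and 'G1' instead of joining to the products and geo tables.

```sql
-- Hypothetical miniature sales table (made-up rows)
with sales as (
    select date '2022-01-05' as SaleDate, 'P1' as PID, 'G1' as GeoID, 10 as Boxes
    union all select date '2022-02-11', 'P1', 'G1', 4
    union all select date '2022-03-02', 'P2', 'G1', 7   -- another product, so March has no P1 shipment
),
-- Every month in which anything was shipped
all_months as (
    select distinct year(SaleDate) as y, month(SaleDate) as m
    from sales
),
-- Only the months in which the product went to the country
shipped_months as (
    select distinct year(SaleDate) as y, month(SaleDate) as m
    from sales
    where PID = 'P1' and GeoID = 'G1' and Boxes >= 1
)
-- Keep all months; months without a match get NULLs from the left join
select a.y as `Year`, a.m as `Month`,
       if(sm.y is null, 'No', 'Yes') as `Status`
from all_months a
left join shipped_months sm on sm.y = a.y and sm.m = a.m
order by a.y, a.m;
-- 2022 | 1 | Yes
-- 2022 | 2 | Yes
-- 2022 | 3 | No
```

A month in which nothing at all was shipped would still be missing here, since all_months is built from the sales table itself.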
-- India or Australia? Who buys more chocolate boxes on a monthly basis?
select year(saledate) 'Year', month(saledate) 'Month',
sum(CASE WHEN g.geo='India' THEN boxes ELSE 0 END) 'India Boxes',
sum(CASE WHEN g.geo='Australia' THEN boxes ELSE 0 END) 'Australia Boxes'
from sales s
join geo g on g.GeoID=s.GeoID
group by year(saledate), month(saledate)
order by year(saledate), month(saledate);
Resources to Learn More
SQL is a great skill to have if you work with data. Please use the courses, books, articles & websites below to learn more.
SQL BOOKs 📚
I recommend getting these SQL books.
SQL COURSEs 💻
I recommend trying out these courses on SkillShare academy.
SQL WEBSITEs 🌐
Do check out these helpful websites to learn and understand various SQL concepts.
If you use my links to purchase the books or courses, I get a small affiliate commission.
There is no extra cost to you, obviously.
SQL Alternatives
If you want an alternative to SQL, consider learning Power Query.
Here is an article and here is a video to help you with that.
All the best 👍
I wish you all the best with your SQL learning. Do let me know in the comments below if you have enjoyed this article and the video.
22 Responses to “Learn SQL for Data Analysis in one hour”
Thanks Chandoo
your site is great 🙂
Best link for SQL download :
https://dev.mysql.com/downloads/mysql/
Hi, Chandoo. Thanks for all you do. Eagerly awaiting the solutions to your hard problems!
How to export the database from Sql to Excel to have a better view about the Database?
Hello,
Thank you so much for your video "Learn SQL for Data Analysis in one hour" on Youtube and the homework you left in your website. It is really helpful for me.
While I was doing the exercises, I had some questions that I hope you can help me answer.
For answering the question 4 in the homework : "4. Which product sold more boxes in the first 7 days of February 2022? Milk Bars or Eclairs?", I tried 2 queries :
1. select pr.Product, sum(Boxes) as 'Total Boxes'
from sales s
join products pr on pr.pid = s.pid
where s.SaleDate between '2022-2-1' and '2022-2-7'
and pr.Product = 'Milk Bars' or pr.Product = 'Eclairs'
group by pr.Product;
=> Results: Eclairs: 144651; Milk Bars: 818
2. select pr.Product, sum(Boxes) as 'Total Boxes'
from sales s
join products pr on pr.pid = s.pid
where s.SaleDate between '2022-2-1' and '2022-2-7'
and pr.Product in ('Milk Bars', 'Eclairs')
group by pr.Product;
=> Results: Eclairs: 1019; Milk Bars: 818
In the video, if I understand well, the 2 conditions are similar :
- where pr.Product = 'Milk Bars' or pr.Product = 'Eclairs'
- where pr.Product in ('Milk Bars', 'Eclairs')
Can you help me to explain this mistake?
Thanks again!!
Hi there.. interesting question.
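The problem is how you combined AND and OR without brackets. AND is evaluated before OR, so your first query reads as (date in range AND product = 'Milk Bars') OR product = 'Eclairs', which counts Eclairs sales from every date.
Try this: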
1. select pr.Product, sum(Boxes) as 'Total Boxes'
from sales s
join products pr on pr.pid = s.pid
where s.SaleDate between '2022-2-1' and '2022-2-7'
and (pr.Product = 'Milk Bars' or pr.Product = 'Eclairs')
group by pr.Product;
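The effect of the brackets can be reproduced on a handful of rows. The sketch below assumes MySQL 8.0 or later and uses a made-up table t with hypothetical numbers; it runs both versions of the WHERE clause side by side.

```sql
-- Hypothetical rows: each product has one sale inside the first week
-- of February and one sale outside it
with t as (
    select 'Eclairs' as Product, date '2022-02-03' as SaleDate, 10 as Boxes
    union all select 'Eclairs',   date '2022-05-10', 500
    union all select 'Milk Bars', date '2022-02-04', 8
    union all select 'Milk Bars', date '2022-06-01', 300
)
-- Without brackets: (date in range AND Milk Bars) OR Eclairs,
-- so every Eclairs row is counted regardless of date
select 'without brackets' as query_version, Product, sum(Boxes) as total_boxes
from t
where SaleDate between '2022-02-01' and '2022-02-07'
  and Product = 'Milk Bars' or Product = 'Eclairs'
group by Product
union all
-- With brackets: the date filter applies to both products
select 'with brackets', Product, sum(Boxes)
from t
where SaleDate between '2022-02-01' and '2022-02-07'
  and (Product = 'Milk Bars' or Product = 'Eclairs')
group by Product;
-- without brackets: Eclairs 510, Milk Bars 8
-- with brackets:    Eclairs 10,  Milk Bars 8
```

Milk Bars comes out the same both ways while Eclairs is inflated, which is the same pattern as in the two results reported in the question.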
I solved it like I travelled the whole world.
Select * From `awesome chocolates`.sales s;
Select * from `awesome chocolates`.products pr;
Select pr.product, sum(s.boxes) as 'Total Amount of Boxes Sold'
From `awesome chocolates`.sales s
Join `awesome chocolates`.products pr on pr.pid=s.pid
where pr.product = 'Eclairs'
Group by pr.product;
Select pr.product, sum(s.boxes) as 'Total Amount of Boxes Sold'
From `awesome chocolates`.sales s
Join `awesome chocolates`.products pr on pr.pid=s.pid
where pr.product='Milk Bars'
Group by pr.product;
Select Product, Max(total_amount) as 'Total Amount of boxes sold'
From (
Select pr.product, sum(s.boxes) as total_amount
From `awesome chocolates`.sales s
Join `awesome chocolates`.products pr on pr.pid=s.pid
where pr.product = 'Eclairs'
Group by pr.product
Union all
Select pr.product, sum(s.boxes) as total_amount
From `awesome chocolates`.sales s
Join `awesome chocolates`.products pr on pr.pid=s.pid
where pr.product = 'Milk Bars'
Group by pr.product
) as subquery
Group by Product;
# 5. India or Australia? Who buys more chocolate boxes on a monthly basis?
for this question i wrote query without "case when" but the answer is similar to that
select year(s.saledate), month(s.saledate), g.geo, sum(s.boxes)
from sales s
join geo g on g.geoid = s.geoid
where g.geo in ('India', 'Australia')
group by g.geo, month(s.saledate), year(s.saledate)
order by year(s.saledate), month(s.saledate);
Hi, after i watch ur video. im tryin to solve the hard problem number 4. im tryin to use ur query but it didnt work, so im giving my query. is it the result that we expect? and can u give the reason why u use sum() than count(), thank u
select year(s.SaleDate) as 'Year',
monthname(s.SaleDate) as 'Month',
g.Geo, pr.product,
count(s.Boxes) 'Boxes Shipment',
if(count(s.Boxes)> 1, 'Yes','No') 'Status'
from sales s
join geo g on (s.GeoID = g.GeoID)
join products pr on (s.PID = pr.PID)
where g.geo = 'New Zealand' and pr.Product = 'After Nines'
group by `Year`, `Month`
order by `Year` asc;
Hi Chandoo,
Below is my script for question 1 of the hard ones;
select distinct peo.salesperson,
sal.boxes
from sales sal
join people as peo
on peo.spid=sal.spid
where boxes >= 1 and saledate between '2022-01-01' and '2022-01-07'
group by peo.salesperson, sal.boxes;
Is it ok?
Abraham
Hi Chandoo,
Check my script response for question 4 from the hard problems for your comments;
select pro.product,sal.boxes,geo.geo, year(saledate) as 'Year', month(SaleDate) as 'Month', count(*) 'At least one box shipped',
case
when sal.boxes >=1 then 'Yes'
else 'No'
end as 'Status'
from sales sal
join products pro
on pro.pid=sal.pid
join geo geo
on geo.geoid=sal.geoid
where pro.product = 'After Nines' and geo.geo= 'New Zealand'
group by pro.product,sal.boxes,geo.geo, year(saledate), month(SaleDate)
order by sal.boxes desc;
Please help on this error. I cant find the mistake.Homework problem 2.
select p.Salesperson, s.SaleDate, s.Amount from sales s
join people p on s.SPID=p.SPID
where s.SaleDate between '2022-1-1' and '2022-1-31'
group by p.Salesperson;
Error: 15:34:56 select p.Salesperson, s.SaleDate, s.Amount from sales s join people p on s.SPID=p.SPID where SaleDate between '2022-1-1' and '2022-1-31' group by p.Salesperson LIMIT 0, 1000 Error Code: 1055. Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'awesome chocolates.s.SaleDate' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by 0.000 sec
date format should be 'yyyy-mm-dd'.
you should write '2022-01-01' and '2022-01-31'
Hello sir,
can we use "NOT BETWEEN" for hard problem number 2
select distinct p.salesperson from sales s
join people p on p.spid=s.spid
where saledate not between '2022-1-1' and '2022-1-7';
Hard Problem #1 What are the names of salespersons who had at least one shipment(Sale) in the First 7 days of January 2022?
SELECT p.salesperson, s.saledate, s.customers
FROM `awesome chocolates`.people p
JOIN `awesome chocolates`.sales s ON p.SPID = s.SPID
Where s.customers>=1
And Date(s.saledate) between '2022-1-1' and '2022-1-7';
2. Which salespersons did not make any shipments in the first 7 days of January 2022?
SELECT p.salesperson, s.saledate, s.boxes, s.customers
FROM `awesome chocolates`.people p
JOIN `awesome chocolates`.sales s ON p.SPID = s.SPID
Where s.boxes=0
And Date(s.saledate) between '2022-1-1' and '2022-1-7';
Hard Problem #3. How many times we shipped more than 1,000 boxes in each month?
SELECT YEAR(s.saledate) AS year, MONTH(s.saledate) AS month, COUNT(*) AS shipment_count
FROM `awesome chocolates`.sales s
WHERE s.boxes > 1000
AND s.saledate BETWEEN '2021-01-01' AND '2022-03-31'
GROUP BY YEAR(s.saledate), MONTH(s.saledate)
ORDER BY YEAR(s.saledate), MONTH(s.saledate);
Hi Chandoo Sir,
Can you please help with this question?
Did we ship at least one box of ‘After Nines’ to ‘New Zealand’ on all the months?
Hey, there chandoo, thanks for this awesome video and guidance.
I was solving hard-level question #2 and there I had written my own code where I had joined the sales table and I am not getting the same output as yours. can you please clarify my concept on this?? Here is my code:
select ppl. Salesperson
from people as ppl
join sales as s on s.SPID = ppl.SPID
where ppl.SPID not in
(select distinct s.SPID
from sales as s
where s.SaleDate >='2022-01-01'
and s.SaleDate 1);
In hope of response 🙂
I facing some problem in importing data ...the error looks as failed to load with error exitcode 1..please reply
Hi Chandoo, I am unable to restore the .bak file to my SQL 2019 instance as the .bak file you provided is from a newer SQL 2022 version. Can you please export the database as .bacpac file and make it available as a download? Apparently, the .bacpac file will allow one to import the database into a lower SQL server version. Thanks in advance.
Hi Chandoo,
Thank you very much for the "Learn SQL for Data Analysis in one hour" video and homework. They've been very helpful. I attempted the 2nd question in the Intermediate homework and observed that some shipments made by some salespersons had more than one product for the same SaleDate (and time).
This was observed in my script below:
select p.Salesperson,s.PID,s.SaleDate,
row_number() over (partition by p.Salesperson,s.PID,s.SaleDate
order by p.Salesperson,s.PID,s.SaleDate) RowNum from sales s
join People p on s.SPID = p.SPID
where s.SaleDate between '01 January 2022' and '31 January 2022'
The solution you posted seems to count them as separate shipments making the Total Sales (Shipment) for the related Salespersons higher (i.e RowNum > 1). Could you kindly confirm if this was intended.
Based on my assumption, I'm getting lesser Total Sales (Shipment) for the related Salespersons with my script below:
With SPTotal (Salesperson,PID,SaleDate,RowNum)
as
(select p.Salesperson,s.PID,s.SaleDate,
row_number() over (partition by p.Salesperson,s.PID,s.SaleDate
order by p.Salesperson,s.PID,s.SaleDate) RowNum from sales s
join People p on s.SPID = p.SPID
where s.SaleDate between '01 January 2022' and '31 January 2022'
)
select Salesperson, sum(RowNum) as TotalSales from SPTotal
where RowNum = 1
group by Salesperson
order by TotalSales desc;
sir i am using microsoft ssms 19 but my version is 15.xx and your provided bak file version is 16.xx hence the .bak file is not opening due to version difference if you could provide the data in the form of .sql like you provided for my sql it would be helpful