Learn SQL for Data Analysis in one hour

SQL (Structured Query Language) is one of the most important skills for us data people. In this article and video, you will pick up the SQL skills you need for data analysis work.

Step 0: Install MySQL software

I am using the FREE MySQL Community Edition software to learn & practice SQL at home. You can get it from here.

If you have any other database software available (such as SQL Server or Oracle), you can use it to follow this tutorial.

 

Step 1: Import Awesome Chocolates Dataset

You need some data to practice SQL. So I prepared a sample dataset for a fictional (but yummy) company called Awesome Chocolates.

Download the .SQL file from here.

After you have the file:

  1. Open MySQL Workbench and log in if necessary
  2. Click on the “server administration” tab
  3. Click on “Data Import/Restore”
  4. Select the option “Import from self-contained file”
  5. Specify the path of the downloaded awesome-chocolates-data.sql file
  6. Start import

 

At the end of these steps, your MySQL server should have the awesome chocolates database. Congratulations 🎉🥳

You can see it in the “Schemas” tab in Workbench.
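
If you prefer to confirm the import with a query instead of the Schemas tab, here is a minimal sketch. It assumes the imported schema is named `awesome chocolates` (the name that shows up in the reader comments and error messages further down) and that it contains the sales, people, products and geo tables used in this tutorial. Adjust the name if your import created something different.

```sql
-- List all databases on the server; the imported one should appear here
show databases;

-- Switch to the imported schema (name assumed, note the backticks because of the space)
use `awesome chocolates`;

-- List the tables; sales, people, products and geo are the ones used in this tutorial
show tables;

-- Quick sanity check: the sales table should not be empty
select count(*) as row_count from sales;
```

If `use` fails with an unknown database error, the import most likely did not finish, so run the Data Import step again.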

 

Using SQL Server?

You can also use SQL Server to practice SQL. If you are using SQL Server Management Studio (SSMS), follow the steps below to import the data.

 

  1. Download this SQL Server Backup file
  2. Unzip the file
  3. Open SSMS, right-click on Databases and choose the “Restore Database” option. Follow the steps on that screen, using the screenshots below as a guide.

Steps to restore a database from backup in SQL Server

 

Step 2: Learn SQL for Data Analysis with this video

Everything is ready. Time to learn SQL.

I made an hour-long tutorial to explain all the necessary SQL concepts for you. In this video, you will learn:

  • How to use SELECT statement to answer business questions
  • Working with WHERE clause
  • Using AND, OR, NOT and combining them to create complex queries.
  • Sorting query results using ORDER BY
  • Combining data from two or more tables using JOINS
  • Creating reports with GROUP BY
  • More than 50 example queries, tips and ideas

Please watch the video below or on my YouTube Channel.

The Queries

Here are some of the example queries covered in the video lesson. Feel free to copy and paste them into the SQL console to see how they work.

				
-- Select everything from sales table

select * from sales;

-- Show just a few columns from sales table

select SaleDate, Amount, Customers from sales;
select Amount, Customers, GeoID from sales;

-- Adding a calculated column with SQL

Select SaleDate, Amount, Boxes, Amount / boxes  from sales;

-- Naming a field with AS in SQL

Select SaleDate, Amount, Boxes, Amount / boxes as 'Amount per box'  from sales;

-- Using WHERE Clause in SQL

select * from sales
where amount > 10000;

-- Showing sales data where amount is greater than 10,000 by descending order
select * from sales
where amount > 10000
order by amount desc;

-- Showing sales data where geography is g1 by product ID & 
-- descending order of amounts

select * from sales
where geoid='g1'
order by PID, Amount desc;

-- Working with dates in SQL

Select * from sales
where amount > 10000 and SaleDate >= '2022-01-01';

-- Using year() function to select all data in a specific year

select SaleDate, Amount from sales
where amount > 10000 and year(SaleDate) = 2022
order by amount desc;

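-- BETWEEN condition in SQL with < & > operators
-- (this version excludes rows where boxes = 0)

select * from sales
where boxes >0 and boxes <=50;

-- Using the between operator in SQL
-- (BETWEEN is inclusive on both ends, so this also returns rows where boxes = 0)

select * from sales
where boxes between 0 and 50;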

-- Using weekday() function in SQL
-- (in MySQL, weekday() returns 0 for Monday through 6 for Sunday, so 4 means Friday)

select SaleDate, Amount, Boxes, weekday(SaleDate) as 'Day of week'
from sales
where weekday(SaleDate) = 4;

-- Working with People table

select * from people;

-- OR operator in SQL

select * from people
where team = 'Delish' or team = 'Jucies';

-- IN operator in SQL

select * from people
where team in ('Delish','Jucies');

-- LIKE operator in SQL

select * from people
where salesperson like 'B%';

select * from people
where salesperson like '%B%';

select * from sales;

-- Using CASE to create branching logic in SQL

select SaleDate, Amount,
       case when amount < 1000 then 'Under 1k'
            when amount < 5000 then 'Under 5k'
            when amount < 10000 then 'Under 10k'
            else '10k or more'
       end as 'Amount category'
from sales;

-- GROUP BY in SQL

select team, count(*) from people
group by team;
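
The video also covers JOINs, which the list above does not show. Here is a minimal sketch that combines a JOIN with GROUP BY to build a small report. It assumes the sales and people tables share an SPID column (the join column used in the reader comments below); the aliases total_amount and total_boxes are names I made up for this example.

```sql
-- Total amount and boxes per salesperson for 2022, biggest sellers first
select p.Salesperson,
       p.Team,
       sum(s.Amount) as total_amount,
       sum(s.Boxes)  as total_boxes
from sales s
join people p on p.SPID = s.SPID      -- match each sale to its salesperson
where year(s.SaleDate) = 2022
group by p.Salesperson, p.Team        -- every non-aggregated column is listed here
order by total_amount desc;
```

The aliases have no spaces on purpose. An alias such as 'Total amount' in single quotes is treated as a plain text value inside ORDER BY, so the rows would not be sorted by the totals.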
				
			

SQL Practice Problems

Once you understand the concepts I’ve demoed in the video, try to solve the homework problems below.

If you want to cheat, use the solutions tab to see the answers.

Resources to Learn More

SQL Resources

SQL is a great skill to have if you work with data. Please use the courses, books, articles & websites below to learn more.

SQL COURSEs 💻

I recommend trying out these courses on SkillShare academy.

SQL WEBSITEs 🌐

Do check out these helpful websites to learn and understand various SQL concepts.

If you use my links to purchase the books or courses, I get a small affiliate commission.

There is no extra cost to you, obviously.

SQL Alternatives

If you want an alternative to SQL, consider learning Power Query.

Here is an article and here is a video to help you with that.

 

All the best 👍

I wish you all the best with your SQL learning. Do let me know in the comments below if you have enjoyed this article and the video.


22 Responses to “Learn SQL for Data Analysis in one hour”

  1. Tetonne says:

    Thanks Chandoo
    your site is great 🙂

    Best link for SQL download :
    https://dev.mysql.com/downloads/mysql/

  2. L J says:

    Hi, Chandoo. Thanks for all you do. Eagerly awaiting the solutions to your hard problems!

  3. Danny Savio Dey says:

    How to export the database from Sql to Excel to have a better view about the Database?

  4. Ha Linh TRAN says:

    Hello,

    Thank you so much for your video "Learn SQL for Data Analysis in one hour" on Youtube and the homework you left in your website. It is really helpful for me.

    While I was doing the exercises, I had some questions that I hope you can help me answer.

    For answering the question 4 in the homework : "4. Which product sold more boxes in the first 7 days of February 2022? Milk Bars or Eclairs?", I tried 2 queries :
    1. select pr.Product, sum(Boxes) as 'Total Boxes'
    from sales s
    join products pr on pr.pid = s.pid
    where s.SaleDate between '2022-2-1' and '2022-2-7'
    and pr.Product = 'Milk Bars' or pr.Product = 'Eclairs'
    group by pr.Product;

    => Results: Eclairs: 144651; Milk Bars: 818

    2. select pr.Product, sum(Boxes) as 'Total Boxes'
    from sales s
    join products pr on pr.pid = s.pid
    where s.SaleDate between '2022-2-1' and '2022-2-7'
    and pr.Product in ('Milk Bars', 'Eclairs')
    group by pr.Product;
    => Results: Eclairs: 1019; Milk Bars: 818

    In the video, if I understand well, the 2 conditions are similar :
    - where pr.Product = 'Milk Bars' or pr.Product = 'Eclairs'
    - where pr.Product in ('Milk Bars', 'Eclairs')

    Can you help me to explain this mistake?
    Thanks again!!

    • Chandoo says:

      Hi there.. interesting question.
      The problem could be with how you wrote the AND OR clauses without brackets.
      Try this:
      1. select pr.Product, sum(Boxes) as 'Total Boxes'
      from sales s
      join products pr on pr.pid = s.pid
      where s.SaleDate between '2022-2-1' and '2022-2-7'

      and (pr.Product = 'Milk Bars' or pr.Product = 'Eclairs')

      group by pr.Product;
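
A short note on why the brackets matter. In MySQL, AND is evaluated before OR, so the first query in the question is read as (date range AND Milk Bars) OR Eclairs. That lets every Eclairs row through regardless of date, which would explain the much larger Eclairs total. The sketch below shows the same precedence rule with plain 0 and 1 values, so it runs without any tables. The column aliases are made up for illustration.

```sql
-- AND binds tighter than OR, so the two expressions give different answers
select 0 and 0 or 1   as without_brackets,   -- read as (0 and 0) or 1, which is 1
       0 and (0 or 1) as with_brackets;      -- read as 0 and 1, which is 0
```

Using IN, as in the second query of the question, avoids the issue because the list of products is a single condition.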

      • Al Kaviul Sarker says:

        I solved it like I travelled the whole world.

        Select*From 'awesome chocolates'.sales s;
        Select*from 'awesome chocolates'.products pr;

        Select pr.product, sum(s.boxes) as 'Total Amount of Boxes Sold'
        From 'awesome chocolates'.products pr on pr.pid=s.pid
        where pr.products = 'Eclairs'
        Order by 'Total Amount of Boxes Sold' desc;

        Select pr.product, sum(s.boxes) as 'Total Amount of Boxes Sold'
        From 'awesome chocolates'.sales s
        Join 'awesome chocolates'. products pr on pr.pid=s.pid
        where pr.product='Milk Bars'
        Order by 'Total Amount of Boxes Sold' desc;

        Select Product, Max(total amount) as 'Total Amount of boxes sold'
        From (
        Select pr.product, sum(s.boxes) as Total Amount
        From 'awesome chocolates'.products pr on pr.pid=s.pid
        where pr.products = 'Eclairs'
        Group by pr.product

        Union all

        Select pr.product, sum(s.boxes) as Total Amount
        From 'awesome chocolates'.products pr on pr.pid=s.pid
        where pr.products = 'Milk Bars'
        Group by pr.product
        ) as subquery
        Group by Product;

        • prithviraj says:

          # 5. India or Australia? Who buys more chocolate boxes on a monthly basis?

          for this question i wrote query without "case when" but the answer is similar to that

          select year(s.saledate), month(s.saledate),g.geo,sum(s.boxes) , pd.product
          from sales s
          join geo g on g.geoid = s.geoid
          join products pd on pd.pid = s.pid
          where g.geo in ("india" ,"australia")
          group by g.geo,month(s.saledate),year(s.saledate)
          order by year(saledate), month(saledate);

  5. Bagus says:

    Hi, after i watch ur video. im tryin to solve the hard problem number 4. im tryin to use ur query but it didnt work, so im giving my query. is it the result that we expect? and can u give the reason why u use sum() than count(), thank u

    select year(s.SaleDate) as 'Year',
    monthname(s.SaleDate) as 'Month',
    g.Geo, pr.product,
    count(s.Boxes) 'Boxes Shipment',
    if(count(s.Boxes)> 1, 'Yes','No') 'Status'
    from sales s
    join geo g on (s.GeoID = g.GeoID)
    join products pr on (s.PID = pr.PID)
    where g.geo = 'New Zealand' and pr.Product = 'After Nines'
    group by `Year`, `Month`
    order by `Year` asc;

  6. Abraham says:

    Hi Chandoo,
    Below is my script for question 1 of the hard ones;
    select distinct peo.salesperson,
    sal.boxes
    from sales sal
    join people as peo
    on peo.spid=sal.spid
    where boxes >= 1 and saledate between '2022-01-01' and '2022-01-07'
    group by peo.salesperson, sal.boxes;

    Is it ok?
    Abraham

  7. Abraham says:

    Hi Chandoo,
    Check my script response for question 4 from the hard problems for your comments;
    select pro.product,sal.boxes,geo.geo, year(saledate) as 'Year', month(SaleDate) as 'Month', count(*) 'At least one box shipped',
    case
    when sal.boxes >=1 then 'Yes'
    else 'No'
    end as 'Status'
    from sales sal
    join products pro
    on pro.pid=sal.pid
    join geo geo
    on geo.geoid=sal.geoid
    where pro.product = 'After Nines' and geo.geo= 'New Zealand'
    group by pro.product,sal.boxes,geo.geo, year(saledate), month(SaleDate)
    order by sal.boxes desc;

  8. Jenita says:

    Please help on this error. I cant find the mistake.Homework problem 2.

    select p.Salesperson, s.SaleDate, s.Amount from sales s
    join people p on s.SPID=p.SPID
    where s.SaleDate between '2022-1-1' and '2022-1-31'
    group by p.Salesperson;

    Error: 15:34:56 select p.Salesperson, s.SaleDate, s.Amount from sales s join people p on s.SPID=p.SPID where SaleDate between '2022-1-1' and '2022-1-31' group by p.Salesperson LIMIT 0, 1000 Error Code: 1055. Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'awesome chocolates.s.SaleDate' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by 0.000 sec
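
A note on error 1055 above. With only_full_group_by enabled, every column in the SELECT list must either appear in GROUP BY or sit inside an aggregate function such as sum() or count(). The query groups by Salesperson but also selects SaleDate and Amount as they are, so MySQL cannot tell which date or amount to show for each salesperson. Here is a sketch of one possible fix, assuming the goal is one row per salesperson with their January 2022 totals. The exact homework wording is not reproduced here, and the aliases are made up.

```sql
-- One row per salesperson: number of shipments and total amount in January 2022
select p.Salesperson,
       count(*)      as shipment_count,
       sum(s.Amount) as total_amount
from sales s
join people p on s.SPID = p.SPID
where s.SaleDate between '2022-01-01' and '2022-01-31'
group by p.Salesperson
order by total_amount desc;
```

If SaleDate stores a time as well as a date, the upper bound '2022-01-31' stops at midnight, so sales later on the 31st would be left out. In that case, a condition such as s.SaleDate >= '2022-01-01' and s.SaleDate < '2022-02-01' is safer.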

  9. Ragavi says:

    Hello sir,
    can we use "NOT BETWEEN" for hard problem number 2
    select distinct p.salesperson from sales s
    join people p on p.spid=s.spid
    where saledate not between '2022-1-1' and '2022-1-7';

  10. Al Kaviul Sarker says:

    Hard Problem #1 What are the names of salespersons who had at least one shipment(Sale) in the First 7 days of January 2022?

    SELECT p.salesperson, s.saledate, s.customers
    FROM `awesome chocolates`.people p
    JOIN `awesome chocolates`.sales s ON p.SPID = s.SPID
    Where s.customers>=1
    And Date(s.saledate) between '2022-1-1' and '2022-1-7';

  11. Al Kaviul Sarker says:

    2. Which salespersons did not make any shipments in the first 7 days of January 2022?

    SELECT p.salesperson, s.saledate, s.boxes, s.customers
    FROM `awesome chocolates`.people p
    JOIN `awesome chocolates`.sales s ON p.SPID = s.SPID
    Where s.boxes=0
    And Date(s.saledate) between '2022-1-1' and '2022-1-7';

  12. Al Kaviul Sarker says:

    Hard Problem #3. How many times we shipped more than 1,000 boxes in each month?

    SELECT YEAR(s.saledate) AS year, MONTH(s.saledate) AS month, COUNT(*) AS shipment_count
    FROM `awesome chocolates`.sales s
    WHERE s.boxes > 1000
    AND s.saledate BETWEEN '2021-01-01' AND '2022-03-31'
    GROUP BY YEAR(s.saledate), MONTH(s.saledate)
    ORDER BY YEAR(s.saledate), MONTH(s.saledate);

  13. Kirti Nishad says:

    Hi Chandoo Sir,

    Can you please help with this question?

    Did we ship at least one box of ‘After Nines’ to ‘New Zealand’ on all the months?

  14. Lakshay Gupta says:

    Hey, there chandoo, thanks for this awesome video and guidance.
    I was solving hard-level question #2 and there I had written my own code where I had joined the sales table and I am not getting the same output as yours. can you please clarify my concept on this?? Here is my code:

    select ppl. Salesperson
    from people as ppl
    join sales as s on s.SPID = ppl.SPID
    where ppl.SPID not in
    (select distinct s.SPID
    from sales as s
    where s.SaleDate >='2022-01-01'
    and s.SaleDate 1);

    In hope of response 🙂

  15. Ubhay shankar Shastri says:

    I facing some problem in importing data ...the error looks as failed to load with error exitcode 1..please reply

  16. Naresh says:

    Hi Chandoo, I am unable to restore the .bak file to my SQL 2019 instance as the .bak file you provided is from a newer SQL 2022 version. Can you please export the database as .bacpac file and make it available as a download? Apparently, the .bacpac file will allow one to import the database into a lower SQL server version. Thanks in advance.

  17. Temi says:

    Hi Chandoo,

    Thanks you very much for the "Learn SQL for Data Analysis in one hour" video and homework. They've been very helpful.I attempted the 2nd question in the Intermediate homework and observed that some shipments made by some salespersons had more than one product for Saledate (and time).

    This was observed in my script below:

    select p.Salesperson,s.PID,s.SaleDate,
    row_number() over (partition by p.Salesperson,s.PID,s.SaleDate
    order by p.Salesperson,s.PID,s.SaleDate) RowNum from sales s
    join People p on s.SPID = p.SPID
    where s.SaleDate between '01 January 2022' and '31 January 2022'

    The solution you posted seems to count them as separate shipments making the Total Sales (Shipment) for the related Salespersons higher (i.e RowNum > 1). Could you kindly confirm if this was intended.

    Based on my assumption, I'm getting lesser Total Sales (Shipment) for the related Salespersons with my script below:

    With SPTotal (Salesperson,PID,SaleDate,RowNum)
    as
    (select p.Salesperson,s.PID,s.SaleDate,
    row_number() over (partition by p.Salesperson,s.PID,s.SaleDate
    order by p.Salesperson,s.PID,s.SaleDate) RowNum from sales s
    join People p on s.SPID = p.SPID
    where s.SaleDate between '01 January 2022' and '31 January 2022'
    )
    select Salesperson, sum(RowNum) as TotalSales from SPTotal
    where RowNum = 1
    group by Salesperson
    order by TotalSales desc;

  18. dss says:

    sir i am using microsoft ssms 19 but my version is 15.xx and your provided bak file version is 16.xx hence the .bak file is not opening due to version difference if you could provide the data in the form of .sql like you provided for my sql it would be helpful
