SQL Mastery Course
23 weeks · 3 phases · From zero to job-ready
What is SQL + Databases
Week 1 — What is SQL + Databases
Key Concepts
- A database is an organized collection of data stored in tables
- A table has columns (categories) and rows (data)
- SQL is the language used to query (ask questions of) a database
Core Syntax
SELECT -- what columns you want
FROM -- which table
WHERE -- filter condition (optional)Your First Queries
Show all data
SELECT *
FROM employees;Show specific columns
SELECT name, salary
FROM employees;Filter with WHERE
SELECT *
FROM employees
WHERE department = 'Finance';Multiple conditions
SELECT *
FROM employees
WHERE department = 'Finance' AND salary > 50000;Sort results
SELECT *
FROM employees
ORDER BY salary DESC;DESC = highest first. ASC = lowest first.
Practice Exercises
Using this table employees:
| id | name | department | salary |
|---|---|---|---|
| 1 | Dattatray | Finance | 60000 |
| 2 | Rahul | Engineering | 80000 |
| 3 | Priya | Finance | 55000 |
| 4 | Amit | Marketing | 45000 |
| 5 | Sneha | Engineering | 90000 |
Write SQL queries to answer:
- Show all employees
- Show only name and department of all employees
- Show all employees in the Engineering department
- Show all employees earning more than 55000
- Show all Finance employees earning more than 55000
- Show all employees sorted by salary from highest to lowest
- Show only the name of employees in Marketing
Answers below — try first before looking!
Answers
-- 1
SELECT * FROM employees;
-- 2
SELECT name, department FROM employees;
-- 3
SELECT * FROM employees WHERE department = 'Engineering';
-- 4
SELECT * FROM employees WHERE salary > 55000;
-- 5
SELECT * FROM employees WHERE department = 'Finance' AND salary > 55000;
-- 6
SELECT * FROM employees ORDER BY salary DESC;
-- 7
SELECT name FROM employees WHERE department = 'Marketing';- SQL is not case sensitive —
SELECTandselectboth work, but uppercase is convention - Always end your query with
; - Text values go in single quotes:
'Finance' - Number values don't need quotes:
50000
- WHERE with more conditions: AND, OR, NOT, IN, BETWEEN, LIKE
- Filtering like a pro
Filtering Data
Week 2 — Filtering Data
SELECT,FROM,WHERE,ORDER BY- Filtering with a single condition:
WHERE salary > 50000
This week you go deeper — combining conditions and using powerful filter tools.
Combining Conditions
AND — both conditions must be true
SELECT *
FROM employees
WHERE department = 'Finance' AND salary > 55000;Returns only Finance employees who also earn more than 55000.
OR — either condition can be true
SELECT *
FROM employees
WHERE department = 'Finance' OR department = 'Marketing';Returns employees from Finance or Marketing.
NOT — exclude a condition
SELECT *
FROM employees
WHERE NOT department = 'Finance';Returns everyone except Finance employees.
IN — match a list of values
Instead of chaining multiple ORs:
-- Long way
WHERE department = 'Finance' OR department = 'Marketing' OR department = 'HR'
-- Clean way with IN
WHERE department IN ('Finance', 'Marketing', 'HR')You can also use NOT IN to exclude:
SELECT *
FROM employees
WHERE department NOT IN ('Finance', 'Engineering');BETWEEN — filter a range
SELECT *
FROM employees
WHERE salary BETWEEN 50000 AND 75000;Includes both endpoints (50000 and 75000 are included).
Equivalent to:
WHERE salary >= 50000 AND salary <= 75000LIKE — pattern matching for text
LIKE lets you search for partial matches using wildcards:
%= any number of characters (including zero)_= exactly one character
-- Names starting with 'S'
SELECT * FROM employees WHERE name LIKE 'S%';
-- Names ending with 'a'
SELECT * FROM employees WHERE name LIKE '%a';
-- Names containing 'rah' anywhere
SELECT * FROM employees WHERE name LIKE '%rah%';
-- 5-letter names starting with 'A'
SELECT * FROM employees WHERE name LIKE 'A____';Combining Everything
SELECT *
FROM employees
WHERE department IN ('Finance', 'Engineering')
AND salary BETWEEN 60000 AND 90000
AND name LIKE 'S%'
ORDER BY salary DESC;Practice Exercises
Using the same employees table:
| id | name | department | salary |
|---|---|---|---|
| 1 | Dattatray | Finance | 60000 |
| 2 | Rahul | Engineering | 80000 |
| 3 | Priya | Finance | 55000 |
| 4 | Amit | Marketing | 45000 |
| 5 | Sneha | Engineering | 90000 |
| 6 | Sanjay | HR | 48000 |
| 7 | Divya | Marketing | 52000 |
Write queries to answer:
- Show employees from Finance or HR
- Show employees not in Engineering
- Show employees with salary between 50000 and 65000
- Show employees whose name starts with 'S'
- Show employees in Finance or Marketing with salary above 50000
- Show employees whose name contains the letter 'a' (anywhere)
- Show Engineering employees earning more than 75000, sorted by salary descending
Try first before looking at answers!
Answers
-- 1
SELECT * FROM employees
WHERE department IN ('Finance', 'HR');
-- 2
SELECT * FROM employees
WHERE department NOT IN ('Engineering');
-- 3
SELECT * FROM employees
WHERE salary BETWEEN 50000 AND 65000;
-- 4
SELECT * FROM employees
WHERE name LIKE 'S%';
-- 5
SELECT * FROM employees
WHERE department IN ('Finance', 'Marketing') AND salary > 50000;
-- 6
SELECT * FROM employees
WHERE name LIKE '%a%';
-- 7
SELECT * FROM employees
WHERE department = 'Engineering' AND salary > 75000
ORDER BY salary DESC;ANDis stricter — both must be trueORis broader — either can be trueINis cleaner than chaining many ORsBETWEENincludes the boundary valuesLIKEis for text patterns —%is your wildcard- You can combine all of these in one
WHEREclause
- Functions: COUNT, SUM, AVG, MIN, MAX
- Answer questions like: "How many employees are in Finance?" or "What's the average salary?"
Aggregate Functions
Week 3 — Aggregate Functions
- SELECT, FROM, WHERE, ORDER BY
- Filtering with AND, OR, NOT, IN, BETWEEN, LIKE
This week you learn to summarize data — count rows, add up values, find averages.
The 5 Core Aggregate Functions
| Function | What it does |
|---|---|
COUNT() | Count number of rows |
SUM() | Add up values |
AVG() | Calculate average |
MIN() | Find the lowest value |
MAX() | Find the highest value |
COUNT
-- How many employees are there?
SELECT COUNT(*) FROM employees;
-- How many employees are in Finance?
SELECT COUNT(*) FROM employees
WHERE department = 'Finance';
-- Give the result a clear name with AS
SELECT COUNT(*) AS total_employees FROM employees;AS renames the output column — always use it with aggregates so results are readable.
SUM
-- Total salary bill
SELECT SUM(salary) AS total_salary FROM employees;
-- Total salary for Engineering only
SELECT SUM(salary) AS engineering_salary
FROM employees
WHERE department = 'Engineering';AVG
-- Average salary across all employees
SELECT AVG(salary) AS avg_salary FROM employees;
-- Average salary in Finance
SELECT AVG(salary) AS avg_finance_salary
FROM employees
WHERE department = 'Finance';MIN and MAX
-- Lowest and highest salary
SELECT MIN(salary) AS lowest, MAX(salary) AS highest
FROM employees;
-- Highest salary in Engineering
SELECT MAX(salary) AS top_eng_salary
FROM employees
WHERE department = 'Engineering';Combining Multiple Aggregates
SELECT
COUNT(*) AS headcount,
SUM(salary) AS total_payroll,
AVG(salary) AS avg_salary,
MIN(salary) AS lowest_salary,
MAX(salary) AS highest_salary
FROM employees;Practice Exercises
Using employees:
| id | name | department | salary |
|---|---|---|---|
| 1 | Dattatray | Finance | 60000 |
| 2 | Rahul | Engineering | 80000 |
| 3 | Priya | Finance | 55000 |
| 4 | Amit | Marketing | 45000 |
| 5 | Sneha | Engineering | 90000 |
| 6 | Sanjay | HR | 48000 |
| 7 | Divya | Marketing | 52000 |
- How many total employees are there?
- What is the total payroll (sum of all salaries)?
- What is the average salary?
- What is the highest salary in Marketing?
- What is the lowest salary in the entire company?
- How many employees are in Engineering?
- Show count, average, min, and max salary all in one query
Try first!
Answers
-- 1
SELECT COUNT(*) AS total_employees FROM employees;
-- 2
SELECT SUM(salary) AS total_payroll FROM employees;
-- 3
SELECT AVG(salary) AS avg_salary FROM employees;
-- 4
SELECT MAX(salary) AS highest_marketing_salary
FROM employees WHERE department = 'Marketing';
-- 5
SELECT MIN(salary) AS lowest_salary FROM employees;
-- 6
SELECT COUNT(*) AS engineering_count
FROM employees WHERE department = 'Engineering';
-- 7
SELECT
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM employees;- Aggregate functions collapse many rows into one number
- Always use
ASto name your output columns - You can use
WHEREbefore aggregating to filter first COUNT(*)counts all rows;COUNT(column)skips NULLs
- GROUP BY — run aggregates per group (e.g. avg salary per department)
- HAVING — filter groups after aggregating
GROUP BY + HAVING
Week 4 — GROUP BY + HAVING
- COUNT, SUM, AVG, MIN, MAX
- Using WHERE to filter before aggregating
This week you learn to group data — get aggregates per department, per city, per category.
GROUP BY — Aggregates Per Group
Without GROUP BY, aggregates return one row for the whole table.
With GROUP BY, you get one row per group.
-- Average salary per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;Result:
| department | avg_salary |
|---|---|
| Finance | 57500 |
| Engineering | 85000 |
| Marketing | 48500 |
| HR | 48000 |
Rules for GROUP BY
Every column in SELECT must either:
- Be in
GROUP BY, OR - Be inside an aggregate function
-- CORRECT
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;
-- WRONG — 'name' is not grouped or aggregated
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;Multiple Columns in GROUP BY
-- Count employees per department per job title
SELECT department, job_title, COUNT(*) AS count
FROM employees
GROUP BY department, job_title;ORDER BY with GROUP BY
-- Departments sorted by average salary (highest first)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;HAVING — Filter Groups After Aggregating
WHERE filters rows before grouping.
HAVING filters groups after aggregating.
-- Only departments with more than 1 employee
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 1;
-- Only departments with average salary above 60000
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;WHERE vs HAVING — The Key Difference
-- WHERE filters individual rows first, THEN GROUP BY runs
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 45000 -- filter rows first
GROUP BY department
HAVING AVG(salary) > 60000; -- then filter groupsThink of it this way:
WHERE= filter before cookingHAVING= filter after cooking
Full Query Order
SQL clauses always go in this order:
SELECT department, AVG(salary)
FROM employees
WHERE salary > 40000
GROUP BY department
HAVING AVG(salary) > 55000
ORDER BY AVG(salary) DESC;Practice Exercises
- Show headcount (number of employees) per department
- Show total payroll per department
- Show average salary per department, sorted highest to lowest
- Show only departments with more than 1 employee
- Show departments where average salary is above 55000
- Show departments with total payroll above 100000
- Show departments with more than 1 employee, sorted by headcount descending
Answers
-- 1
SELECT department, COUNT(*) AS headcount
FROM employees GROUP BY department;
-- 2
SELECT department, SUM(salary) AS total_payroll
FROM employees GROUP BY department;
-- 3
SELECT department, AVG(salary) AS avg_salary
FROM employees GROUP BY department
ORDER BY avg_salary DESC;
-- 4
SELECT department, COUNT(*) AS headcount
FROM employees GROUP BY department
HAVING COUNT(*) > 1;
-- 5
SELECT department, AVG(salary) AS avg_salary
FROM employees GROUP BY department
HAVING AVG(salary) > 55000;
-- 6
SELECT department, SUM(salary) AS total_payroll
FROM employees GROUP BY department
HAVING SUM(salary) > 100000;
-- 7
SELECT department, COUNT(*) AS headcount
FROM employees GROUP BY department
HAVING COUNT(*) > 1
ORDER BY headcount DESC;GROUP BYsplits data into groups, aggregates run per group- Every non-aggregate in
SELECTmust be inGROUP BY WHEREfilters rows before grouping —HAVINGfilters groups after- You can use both
WHEREandHAVINGin the same query
- String functions — UPPER, LOWER, TRIM, LENGTH, SUBSTR
- Date functions — DATE, YEAR, MONTH, strftime
Subqueries
Week 5 — String + Date Functions
- GROUP BY, HAVING, aggregate functions
This week you learn to clean and transform text and date values inside queries.
String Functions
UPPER and LOWER — change case
SELECT UPPER(name) AS name_upper FROM employees;
-- 'Dattatray' → 'DATTATRAY'
SELECT LOWER(name) AS name_lower FROM employees;
-- 'Dattatray' → 'dattatray'LENGTH — count characters
SELECT name, LENGTH(name) AS name_length FROM employees;
-- 'Sneha' → 5TRIM — remove leading/trailing spaces
SELECT TRIM(' hello ') AS clean;
-- ' hello ' → 'hello'Useful when data has accidental spaces.
SUBSTR — extract part of a string
-- SUBSTR(column, start_position, length)
SELECT SUBSTR(name, 1, 3) AS short_name FROM employees;
-- 'Dattatray' → 'Dat'
-- 'Rahul' → 'Rah'REPLACE — swap text
SELECT REPLACE(department, 'Engineering', 'Eng') AS dept
FROM employees;Combining string functions
SELECT
name,
UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) AS formatted_name
FROM employees;|| is SQLite's string concatenation operator.
Date Functions (SQLite)
SQLite stores dates as text in YYYY-MM-DD format.
DATE — get today's date
SELECT DATE('now');
-- '2026-04-25'strftime — extract parts of a date
-- strftime('%Y', date_column) → year
-- strftime('%m', date_column) → month (01-12)
-- strftime('%d', date_column) → day
SELECT
hire_date,
strftime('%Y', hire_date) AS hire_year,
strftime('%m', hire_date) AS hire_month
FROM employees;Date arithmetic
-- Employees hired more than 2 years ago
SELECT * FROM employees
WHERE hire_date < DATE('now', '-2 years');
-- Employees hired this year
SELECT * FROM employees
WHERE strftime('%Y', hire_date) = strftime('%Y', DATE('now'));Adding hire_date to practice
Your employees table can be extended like this for date practice:
| id | name | department | salary | hire_date |
|---|---|---|---|---|
| 1 | Dattatray | Finance | 60000 | 2021-03-15 |
| 2 | Rahul | Engineering | 80000 | 2019-07-01 |
| 3 | Priya | Finance | 55000 | 2022-11-20 |
| 4 | Amit | Marketing | 45000 | 2023-01-10 |
| 5 | Sneha | Engineering | 90000 | 2018-05-30 |
Practice Exercises
- Show all employee names in UPPERCASE
- Show each employee's name and the length of their name
- Show the first 3 letters of each employee's name as
short_name - Show hire year and hire month for each employee
- Show employees hired before 2022
- Show employees hired in 2022 or later, sorted by hire_date
- Show names and department — replace 'Finance' with 'FIN'
Answers
-- 1
SELECT UPPER(name) AS name FROM employees;
-- 2
SELECT name, LENGTH(name) AS name_length FROM employees;
-- 3
SELECT name, SUBSTR(name, 1, 3) AS short_name FROM employees;
-- 4
SELECT
name,
strftime('%Y', hire_date) AS hire_year,
strftime('%m', hire_date) AS hire_month
FROM employees;
-- 5
SELECT * FROM employees WHERE hire_date < '2022-01-01';
-- 6
SELECT * FROM employees
WHERE hire_date >= '2022-01-01'
ORDER BY hire_date;
-- 7
SELECT name, REPLACE(department, 'Finance', 'FIN') AS dept
FROM employees;- String functions don't change the actual data — they transform it for display
- SQLite uses
||for string concatenation - SQLite uses
strftime()for date formatting (PostgreSQL usesEXTRACTandTO_CHAR) - Always store dates as
YYYY-MM-DDso sorting and comparison works correctly
- NULL handling — IS NULL, IS NOT NULL, COALESCE
- What happens when data is missing
NULL Values
Week 6 — NULL Handling
- Aggregate functions, GROUP BY, string and date functions
This week you tackle NULL — the most misunderstood thing in SQL.
What is NULL?
NULL means unknown or missing — it is not zero, not an empty string, not "N/A". It is the absence of a value.
Real examples of NULL:
- An employee with no manager (they're the CEO)
- A product with no discount set
- A customer who hasn't entered their phone number
NULL breaks normal comparisons
-- This does NOT work as expected
SELECT * FROM employees WHERE bonus = NULL; -- returns nothing!
SELECT * FROM employees WHERE bonus != NULL; -- also returns nothing!NULL is not equal to anything — not even itself.
Use IS NULL and IS NOT NULL instead:
-- Employees with no bonus set
SELECT * FROM employees WHERE bonus IS NULL;
-- Employees who have a bonus
SELECT * FROM employees WHERE bonus IS NOT NULL;NULL in aggregate functions
-- COUNT(*) counts all rows including NULLs
-- COUNT(column) skips NULLs
SELECT
COUNT(*) AS total_rows,
COUNT(bonus) AS rows_with_bonus
FROM employees;SUM, AVG, MIN, MAX also skip NULLs automatically.
COALESCE — replace NULL with a default
COALESCE(value, fallback) returns the first non-NULL value.
-- Show 0 instead of NULL for missing bonuses
SELECT name, COALESCE(bonus, 0) AS bonus
FROM employees;
-- Multiple fallbacks
SELECT name, COALESCE(bonus, commission, 0) AS extra_pay
FROM employees;IFNULL — SQLite shorthand for COALESCE
SELECT name, IFNULL(bonus, 0) AS bonus
FROM employees;IFNULL(a, b) = COALESCE(a, b) — same thing, two names.
NULLIF — turn a value into NULL
NULLIF(a, b) returns NULL if a = b, otherwise returns a.
-- Treat 0 as NULL (useful to avoid division by zero)
SELECT salary / NULLIF(hours_worked, 0) AS hourly_rate
FROM employees;NULL in calculations
Any math with NULL returns NULL:
SELECT 100 + NULL; -- NULL
SELECT salary + NULL AS total FROM employees; -- all NULLsFix it with COALESCE:
SELECT salary + COALESCE(bonus, 0) AS total_comp
FROM employees;Extended employees table for practice
| id | name | department | salary | bonus |
|---|---|---|---|---|
| 1 | Dattatray | Finance | 60000 | 5000 |
| 2 | Rahul | Engineering | 80000 | NULL |
| 3 | Priya | Finance | 55000 | 3000 |
| 4 | Amit | Marketing | 45000 | NULL |
| 5 | Sneha | Engineering | 90000 | 8000 |
| 6 | Sanjay | HR | 48000 | NULL |
Practice Exercises
- Show employees who have no bonus
- Show employees who do have a bonus
- Show all employees with bonus — display 0 if NULL
- Show total compensation (salary + bonus) for each employee — treat NULL bonus as 0
- Count how many employees have a bonus set
- Show employees with no bonus in Finance
- Show the average bonus — but only count employees who actually have one
Answers
-- 1
SELECT * FROM employees WHERE bonus IS NULL;
-- 2
SELECT * FROM employees WHERE bonus IS NOT NULL;
-- 3
SELECT name, COALESCE(bonus, 0) AS bonus FROM employees;
-- 4
SELECT name, salary + COALESCE(bonus, 0) AS total_comp
FROM employees;
-- 5
SELECT COUNT(bonus) AS employees_with_bonus FROM employees;
-- 6
SELECT * FROM employees
WHERE bonus IS NULL AND department = 'Finance';
-- 7
SELECT AVG(bonus) AS avg_bonus FROM employees;
-- AVG automatically skips NULLs, so this is correct as-is- NULL = unknown, not zero
- Always use
IS NULL/IS NOT NULL— never= NULL COALESCEis your go-to for replacing NULLs with defaultsCOUNT(column)skips NULLs;COUNT(*)does not- Math with NULL always returns NULL — use COALESCE to fix it
- Mini Project 1 — Sales data analysis using everything from Phase 1
Mini Project 1
Week 7 — Mini Project 1: Sales Data Analysis
What This Tests
Everything from Phase 1:
- SELECT, FROM, WHERE, ORDER BY
- AND, OR, IN, BETWEEN, LIKE
- COUNT, SUM, AVG, MIN, MAX
- GROUP BY, HAVING
- String + Date functions
- NULL handling
The Dataset
You're a data analyst at a retail company. You have two tables:
Table: sales
| sale_id | product | category | amount | sale_date | rep_id |
|---|---|---|---|---|---|
| 1 | Laptop | Electronics | 85000 | 2024-01-15 | 101 |
| 2 | Phone | Electronics | 45000 | 2024-01-20 | 102 |
| 3 | Desk | Furniture | 12000 | 2024-02-05 | 101 |
| 4 | Chair | Furniture | 8000 | 2024-02-10 | 103 |
| 5 | Tablet | Electronics | 30000 | 2024-03-01 | 102 |
| 6 | Monitor | Electronics | 22000 | 2024-03-15 | 101 |
| 7 | Keyboard | Electronics | 5000 | 2024-04-02 | 103 |
| 8 | Sofa | Furniture | 35000 | 2024-04-18 | 102 |
| 9 | TV | Electronics | 55000 | 2024-05-10 | NULL |
| 10 | Bookshelf | Furniture | 9500 | 2024-05-25 | 103 |
Table: reps
| rep_id | rep_name | region |
|---|---|---|
| 101 | Dattatray | West |
| 102 | Priya | East |
| 103 | Rahul | North |
CREATE TABLE IF NOT EXISTS sales (
sale_id INTEGER PRIMARY KEY,
product TEXT,
category TEXT,
amount INTEGER,
sale_date TEXT,
rep_id INTEGER
);
INSERT INTO sales VALUES
(1, 'Laptop', 'Electronics', 85000, '2024-01-15', 101),
(2, 'Phone', 'Electronics', 45000, '2024-01-20', 102),
(3, 'Desk', 'Furniture', 12000, '2024-02-05', 101),
(4, 'Chair', 'Furniture', 8000, '2024-02-10', 103),
(5, 'Tablet', 'Electronics', 30000, '2024-03-01', 102),
(6, 'Monitor', 'Electronics', 22000, '2024-03-15', 101),
(7, 'Keyboard', 'Electronics', 5000, '2024-04-02', 103),
(8, 'Sofa', 'Furniture', 35000, '2024-04-18', 102),
(9, 'TV', 'Electronics', 55000, '2024-05-10', NULL),
(10, 'Bookshelf', 'Furniture', 9500, '2024-05-25', 103);
CREATE TABLE IF NOT EXISTS reps (
rep_id INTEGER PRIMARY KEY,
rep_name TEXT,
region TEXT
);
INSERT INTO reps VALUES
(101, 'Dattatray', 'West'),
(102, 'Priya', 'East'),
(103, 'Rahul', 'North');The Questions
Basic (5 questions)
- Show all sales, ordered by amount descending
- Show only Electronics sales
- Show sales with amount between 10000 and 50000
- Show sales where the product name starts with a vowel (A, E, I, O, U)
- Show sales with no rep assigned (NULL rep_id)
Aggregations (5 questions)
- What is the total revenue across all sales?
- How many sales were made per category?
- What is the average sale amount per category?
- Which category has the highest total revenue?
- Show only categories with total revenue above 100000
Date-based (3 questions)
- Show all sales from Q1 2024 (January–March)
- How many sales were made per month?
- What is the total revenue per month, sorted by month?
Combined (2 questions)
- Show Electronics sales above 30000, sorted by amount descending
- Show the category, count, and total revenue — but only for categories with more than 3 sales
Answers
-- 1
SELECT * FROM sales ORDER BY amount DESC;
-- 2
SELECT * FROM sales WHERE category = 'Electronics';
-- 3
SELECT * FROM sales WHERE amount BETWEEN 10000 AND 50000;
-- 4
SELECT * FROM sales
WHERE product LIKE 'A%' OR product LIKE 'E%'
OR product LIKE 'I%' OR product LIKE 'O%'
OR product LIKE 'U%';
-- 5
SELECT * FROM sales WHERE rep_id IS NULL;
-- 6
SELECT SUM(amount) AS total_revenue FROM sales;
-- 7
SELECT category, COUNT(*) AS sales_count
FROM sales GROUP BY category;
-- 8
SELECT category, AVG(amount) AS avg_amount
FROM sales GROUP BY category;
-- 9
SELECT category, SUM(amount) AS total_revenue
FROM sales GROUP BY category
ORDER BY total_revenue DESC
LIMIT 1;
-- 10
SELECT category, SUM(amount) AS total_revenue
FROM sales GROUP BY category
HAVING SUM(amount) > 100000;
-- 11
SELECT * FROM sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31';
-- 12
SELECT strftime('%m', sale_date) AS month, COUNT(*) AS sales_count
FROM sales GROUP BY month;
-- 13
SELECT
strftime('%Y-%m', sale_date) AS month,
SUM(amount) AS monthly_revenue
FROM sales
GROUP BY month
ORDER BY month;
-- 14
SELECT * FROM sales
WHERE category = 'Electronics' AND amount > 30000
ORDER BY amount DESC;
-- 15
SELECT category, COUNT(*) AS sales_count, SUM(amount) AS total_revenue
FROM sales
GROUP BY category
HAVING COUNT(*) > 3;You Passed Phase 1 If You Can...
- [ ] Write any SELECT query from memory
- [ ] Filter with WHERE using multiple conditions
- [ ] Use all 5 aggregate functions confidently
- [ ] GROUP BY a column and filter groups with HAVING
- [ ] Handle NULLs without making mistakes
- [ ] Extract dates and do basic date filtering
Next Up: Phase 2 — Intermediate SQL
- Week 8: JOINs Part 1 — combining data from multiple tables
JOINs — INNER & LEFT
Week 8 — JOINs Part 1: INNER JOIN + LEFT JOIN
- All of Phase 1: SELECT, filter, aggregate, GROUP BY, NULL handling
This week you learn the most important skill in SQL — combining data from multiple tables.
Why JOINs?
Real databases never store everything in one table. Data is split across tables to avoid repetition:
orderstable has order details + customer_idcustomerstable has customer names + addresses
To see an order with the customer's name, you need to JOIN the two tables.
The Tables We'll Use
orders
| order_id | customer_id | product | amount |
|---|---|---|---|
| 1 | 1 | Laptop | 85000 |
| 2 | 2 | Phone | 45000 |
| 3 | 1 | Monitor | 22000 |
| 4 | 3 | Tablet | 30000 |
| 5 | 99 | Keyboard | 5000 |
customers
| customer_id | name | city |
|---|---|---|
| 1 | Dattatray | Mumbai |
| 2 | Priya | Delhi |
| 3 | Rahul | Pune |
| 4 | Sneha | Bangalore |
Note: Order 5 has customer_id 99 (no matching customer). Customer 4 (Sneha) has no orders.
INNER JOIN — only matching rows from both tables
Returns rows where the join condition matches in both tables.
SELECT orders.order_id, customers.name, orders.product, orders.amount
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;Result:
| order_id | name | product | amount |
|---|---|---|---|
| 1 | Dattatray | Laptop | 85000 |
| 2 | Priya | Phone | 45000 |
| 3 | Dattatray | Monitor | 22000 |
| 4 | Rahul | Tablet | 30000 |
Order 5 (customer_id 99) is excluded — no match.
Sneha is also excluded — she has no orders.
Using Aliases to Shorten Table Names
SELECT o.order_id, c.name, o.product, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;o and c are aliases — much cleaner than repeating full table names.
LEFT JOIN — all rows from left table, matched rows from right
Returns all rows from the left table (before JOIN), and matching rows from the right. If no match, right side columns are NULL.
SELECT o.order_id, c.name, o.product, o.amount
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id;Result:
| order_id | name | product | amount |
|---|---|---|---|
| 1 | Dattatray | Laptop | 85000 |
| 2 | Priya | Phone | 45000 |
| 3 | Dattatray | Monitor | 22000 |
| 4 | Rahul | Tablet | 30000 |
| 5 | NULL | Keyboard | 5000 |
Order 5 is now included — name is NULL because customer_id 99 doesn't exist.
LEFT JOIN to find unmatched rows
Classic use case — find orders with no customer:
SELECT o.order_id, o.product
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;Returns order 5 — the "orphan" order with no matching customer.
Filtering on JOIN results
-- Orders for customers in Mumbai
SELECT o.order_id, c.name, o.product
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE c.city = 'Mumbai';Aggregating after JOIN
-- Total spend per customer
SELECT c.name, SUM(o.amount) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY total_spent DESC;CREATE TABLE IF NOT EXISTS customers (
customer_id INTEGER PRIMARY KEY,
name TEXT,
city TEXT
);
INSERT INTO customers VALUES
(1, 'Dattatray', 'Mumbai'),
(2, 'Priya', 'Delhi'),
(3, 'Rahul', 'Pune'),
(4, 'Sneha', 'Bangalore');
CREATE TABLE IF NOT EXISTS orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
product TEXT,
amount INTEGER
);
INSERT INTO orders VALUES
(1, 1, 'Laptop', 85000),
(2, 2, 'Phone', 45000),
(3, 1, 'Monitor', 22000),
(4, 3, 'Tablet', 30000),
(5, 99, 'Keyboard', 5000);Practice Exercises
- Show all orders with customer name (INNER JOIN)
- Show all orders including those with no matching customer (LEFT JOIN)
- Show only orders that have no matching customer
- Show total amount spent per customer name
- Show orders for customers in Delhi or Pune
- How many orders has each customer made?
- Which customer has the highest total spend?
Answers
-- 1
SELECT o.order_id, c.name, o.product, o.amount
FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id;
-- 2
SELECT o.order_id, c.name, o.product, o.amount
FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id;
-- 3
SELECT o.order_id, o.product
FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
-- 4
SELECT c.name, SUM(o.amount) AS total_spent
FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name ORDER BY total_spent DESC;
-- 5
SELECT o.order_id, c.name, c.city, o.product
FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE c.city IN ('Delhi', 'Pune');
-- 6
SELECT c.name, COUNT(o.order_id) AS order_count
FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name;
-- 7
SELECT c.name, SUM(o.amount) AS total_spent
FROM orders o INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name ORDER BY total_spent DESC LIMIT 1;INNER JOIN= only matching rows from both sidesLEFT JOIN= all rows from left, NULLs where no match on right- Always use aliases (
o,c) for cleaner queries - Join condition goes after
ON— usually matching IDs
- JOINs Part 2 — RIGHT JOIN, FULL OUTER JOIN, self joins, joining 3+ tables
JOINs — RIGHT, FULL & CROSS
Week 9 — JOINs Part 2: RIGHT JOIN, FULL OUTER JOIN, Self Joins, Multi-table JOINs
- INNER JOIN — only matching rows
- LEFT JOIN — all left rows + matching right rows
RIGHT JOIN — all rows from right table
Same as LEFT JOIN but flipped. Returns all rows from the right table, and matches from the left.
SELECT o.order_id, c.name, o.product
FROM orders o
RIGHT JOIN customers c ON o.customer_id = c.customer_id;Result:
| order_id | name | product |
|---|---|---|
| 1 | Dattatray | Laptop |
| 3 | Dattatray | Monitor |
| 2 | Priya | Phone |
| 4 | Rahul | Tablet |
| NULL | Sneha | NULL |
Sneha is included even though she has no orders.
> Note: SQLite does not natively support RIGHT JOIN. Flip your tables and use LEFT JOIN instead:
> FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id
FULL OUTER JOIN — all rows from both tables
Returns all rows from both tables. Where there's no match, the missing side is NULL.
SELECT o.order_id, c.name, o.product
FROM orders o
FULL OUTER JOIN customers c ON o.customer_id = c.customer_id;Returns: all orders (including orphans) + all customers (including those with no orders).
> SQLite does not support FULL OUTER JOIN natively. Simulate with UNION of LEFT JOIN + RIGHT JOIN:
SELECT o.order_id, c.name, o.product
FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id
UNION
SELECT o.order_id, c.name, o.product
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;Self Join — joining a table to itself
Used when rows in a table reference other rows in the same table.
Classic example: employees table with a manager_id column.
-- employees table: id, name, manager_id
SELECT
e.name AS employee,
m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;This reads the same table twice with different aliases (e for employee, m for manager).
Joining 3+ Tables
-- orders → customers → regions (3 tables)
SELECT
o.order_id,
c.name AS customer,
r.region_name,
o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN regions r ON c.region_id = r.region_id;Just keep chaining JOIN ... ON ... for each additional table.
JOIN Cheat Sheet
| JOIN Type | Returns |
|---|---|
| INNER JOIN | Only rows with matches in both tables |
| LEFT JOIN | All left rows + matching right rows (NULL if no match) |
| RIGHT JOIN | All right rows + matching left rows (NULL if no match) |
| FULL OUTER JOIN | All rows from both tables (NULL where no match) |
| Self JOIN | Table joined to itself using aliases |
Practice Exercises
Using orders, customers, and this new products table:
products
| product_id | product_name | category | list_price |
|---|---|---|---|
| 1 | Laptop | Electronics | 90000 |
| 2 | Phone | Electronics | 48000 |
| 3 | Monitor | Electronics | 25000 |
| 4 | Tablet | Electronics | 32000 |
| 5 | Keyboard | Electronics | 6000 |
- Show all customers including those with no orders (simulate RIGHT JOIN using LEFT JOIN)
- Show orders joined with customers and city
- Show all customers who have never placed an order
- Join orders to customers to regions (add a
regioncolumn to customers for this: Mumbai=West, Delhi=North, Pune=West, Bangalore=South) - Show each customer's name alongside their total spend, ordered by spend descending
Answers
-- 1 (simulating RIGHT JOIN)
SELECT o.order_id, c.name, o.product
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id;
-- 2
SELECT o.order_id, c.name, c.city, o.product, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;
-- 3
SELECT c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
-- 4 (assuming region column added to customers)
SELECT o.order_id, c.name, c.region, o.product, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;
-- 5
SELECT c.name, SUM(o.amount) AS total_spent
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.name
ORDER BY total_spent DESC;- SQLite doesn't support RIGHT JOIN or FULL OUTER JOIN — flip tables and use LEFT JOIN
- Self JOIN needs two aliases for the same table
- Chain as many JOINs as you need — just keep matching the right columns
- Always qualify column names with table aliases when joining (avoids ambiguity)
- Subqueries — queries inside queries
String Functions
Week 10 — Subqueries
- JOINs — combining tables
- GROUP BY + HAVING — aggregating groups
A subquery is a query inside another query. It lets you use the result of one query as input to another.
Subquery in WHERE
Answer questions like: "Show employees who earn more than the average salary."
-- Step 1: what's the average?
SELECT AVG(salary) FROM employees; -- returns 67000
-- Step 2: use it as a filter
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);The inner query runs first, returns a value, and the outer query uses it.
Subquery with IN
-- Show orders for customers in Mumbai
SELECT * FROM orders
WHERE customer_id IN (
SELECT customer_id FROM customers WHERE city = 'Mumbai'
);The subquery returns a list of IDs, and IN checks against that list.
Subquery with NOT IN
-- Show customers who have never placed an order
SELECT * FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);Always add WHERE customer_id IS NOT NULL in the subquery — NOT IN breaks silently if the list contains NULL.
Subquery in FROM (Derived Table)
Use a subquery as if it were a table. Must give it an alias.
-- Average of department averages
SELECT AVG(dept_avg) AS overall_avg
FROM (
SELECT department, AVG(salary) AS dept_avg
FROM employees
GROUP BY department
) AS dept_summary;Subquery in SELECT (Scalar Subquery)
Returns a single value used as a column in the output.
-- Show each employee's salary alongside the company average
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;Correlated Subquery
A subquery that references the outer query. Runs once per row — slower but powerful.
-- Show employees who earn more than their department's average
SELECT name, department, salary
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department = e.department -- references outer query's row
);Subquery vs JOIN — when to use which
| Use subquery when... | Use JOIN when... |
|---|---|
| You need a single value (scalar) | You need columns from both tables |
| You're filtering with IN/NOT IN | You're combining data |
| The logic reads more clearly | Performance matters (JOINs are usually faster) |
Practice Exercises
- Show employees who earn more than the average salary
- Show orders placed by customers in Pune
- Show customers who have never placed an order (using NOT IN)
- Show each employee's salary and the company-wide average salary as a second column
- Show departments where at least one employee earns more than 80000
- Show the highest-paid employee in each department (using a subquery in WHERE)
Answers
-- 1
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- 2
SELECT * FROM orders
WHERE customer_id IN (
SELECT customer_id FROM customers WHERE city = 'Pune'
);
-- 3
SELECT * FROM customers
WHERE customer_id NOT IN (
SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);
-- 4
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;
-- 5
SELECT DISTINCT department FROM employees
WHERE department IN (
SELECT department FROM employees WHERE salary > 80000
);
-- 6
SELECT name, department, salary
FROM employees e
WHERE salary = (
SELECT MAX(salary) FROM employees
WHERE department = e.department
);- Subquery in WHERE returns a value or list to filter against
- Subquery in FROM creates a temporary table — always alias it
- Subquery in SELECT returns a single value per row
NOT INwith NULLs in the list returns no results — always filter NULLs in the subquery- Correlated subqueries are powerful but slow — prefer JOINs when possible
- CASE WHEN — conditional logic inside queries
Date & Time Functions
Week 11 — CASE WHEN
- Subqueries, JOINs, aggregations
CASE WHEN is SQL's if/else. It lets you add conditional logic inside a query.
Basic Syntax
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE default_result
ENDSimple Example — Salary Bands
SELECT
name,
salary,
CASE
WHEN salary >= 80000 THEN 'High'
WHEN salary >= 55000 THEN 'Mid'
ELSE 'Low'
END AS salary_band
FROM employees;Result:
| name | salary | salary_band |
|---|---|---|
| Dattatray | 60000 | Mid |
| Rahul | 80000 | High |
| Priya | 55000 | Mid |
| Amit | 45000 | Low |
| Sneha | 90000 | High |
CASE WHEN with AND / OR
SELECT
name,
department,
salary,
CASE
WHEN department = 'Engineering' AND salary > 80000 THEN 'Top Engineer'
WHEN department = 'Finance' THEN 'Finance Team'
ELSE 'Other'
END AS label
FROM employees;CASE WHEN inside aggregate functions
Conditional aggregation — count or sum only rows that match a condition.
-- Count high-salary and low-salary employees in one query
SELECT
COUNT(CASE WHEN salary >= 70000 THEN 1 END) AS high_earners,
COUNT(CASE WHEN salary < 70000 THEN 1 END) AS low_earners
FROM employees;
-- Total payroll split by department
SELECT
SUM(CASE WHEN department = 'Engineering' THEN salary ELSE 0 END) AS eng_payroll,
SUM(CASE WHEN department = 'Finance' THEN salary ELSE 0 END) AS fin_payroll
FROM employees;CASE WHEN in ORDER BY
-- Sort Finance first, then everything else alphabetically
SELECT * FROM employees
ORDER BY
CASE WHEN department = 'Finance' THEN 0 ELSE 1 END,
name;CASE WHEN in GROUP BY
-- Group by salary band
SELECT
CASE
WHEN salary >= 80000 THEN 'High'
WHEN salary >= 55000 THEN 'Mid'
ELSE 'Low'
END AS salary_band,
COUNT(*) AS headcount
FROM employees
GROUP BY salary_band;Simple CASE (matching a single column)
-- Abbreviated syntax when checking one column
SELECT
name,
CASE department
WHEN 'Finance' THEN 'FIN'
WHEN 'Engineering' THEN 'ENG'
WHEN 'Marketing' THEN 'MKT'
ELSE 'OTHER'
END AS dept_code
FROM employees;Practice Exercises
- Add a column
salary_band— 'High' (≥80k), 'Mid' (55k–79k), 'Low' (<55k) - Add a column
is_senior— 'Yes' if salary > 70000, 'No' otherwise - Count how many employees are in each salary band
- Show total payroll split into Engineering vs Non-Engineering in one row
- Show employees — label Finance employees as 'Priority', others as 'Standard'
- Sort employees: Engineering first, Finance second, rest last
Answers
-- 1
SELECT name, salary,
CASE
WHEN salary >= 80000 THEN 'High'
WHEN salary >= 55000 THEN 'Mid'
ELSE 'Low'
END AS salary_band
FROM employees;
-- 2
SELECT name,
CASE WHEN salary > 70000 THEN 'Yes' ELSE 'No' END AS is_senior
FROM employees;
-- 3
SELECT
CASE
WHEN salary >= 80000 THEN 'High'
WHEN salary >= 55000 THEN 'Mid'
ELSE 'Low'
END AS salary_band,
COUNT(*) AS headcount
FROM employees
GROUP BY salary_band;
-- 4
SELECT
SUM(CASE WHEN department = 'Engineering' THEN salary ELSE 0 END) AS eng_payroll,
SUM(CASE WHEN department != 'Engineering' THEN salary ELSE 0 END) AS other_payroll
FROM employees;
-- 5
SELECT name, department,
CASE WHEN department = 'Finance' THEN 'Priority' ELSE 'Standard' END AS priority
FROM employees;
-- 6
SELECT * FROM employees
ORDER BY
CASE department
WHEN 'Engineering' THEN 1
WHEN 'Finance' THEN 2
ELSE 3
END;CASE WHENevaluates top to bottom — first match wins- Always end with
END(and optionallyAS column_name) ELSEis optional — without it, unmatched rows return NULL- You can use
CASE WHENinsideCOUNT,SUM,ORDER BY,GROUP BY
- UNION + INTERSECT — combining results from multiple queries
CASE Statements
Week 12 — UNION, INTERSECT, EXCEPT
- JOINs combine columns side-by-side
- UNION and friends stack results on top of each other
UNION — combine rows from two queries
SELECT name, city FROM customers_2023
UNION
SELECT name, city FROM customers_2024;Rules:
- Both queries must have the same number of columns
- Columns must have compatible data types
- Column names come from the first query
UNIONremoves duplicate rows automatically
UNION ALL — keep duplicates
SELECT name FROM employees_india
UNION ALL
SELECT name FROM employees_usa;UNION ALL is faster than UNION because it skips the deduplication step. Use it when you know there are no duplicates or you want to count them.
UNION vs JOIN — the difference
| UNION | JOIN |
|---|---|
| Stacks rows vertically | Combines columns horizontally |
| Both queries need same columns | Tables linked by a shared key |
| Used to combine similar datasets | Used to enrich data from related tables |
INTERSECT — rows that appear in both queries
-- Customers who placed orders in both 2023 AND 2024
SELECT customer_id FROM orders_2023
INTERSECT
SELECT customer_id FROM orders_2024;Returns only rows that exist in both result sets.
EXCEPT — rows in first query but not the second
-- Customers from 2023 who did NOT order in 2024
SELECT customer_id FROM orders_2023
EXCEPT
SELECT customer_id FROM orders_2024;Practical Example: Monthly Report
-- Combine Q1 and Q2 sales into one result
SELECT 'Q1' AS quarter, product, amount FROM sales_q1
UNION ALL
SELECT 'Q2' AS quarter, product, amount FROM sales_q2
ORDER BY quarter, amount DESC;Using UNION with same table (different filters)
-- Top earners vs specific department — tagged rows
SELECT name, salary, 'Top Earner' AS tag
FROM employees WHERE salary > 80000
UNION
SELECT name, salary, 'Finance' AS tag
FROM employees WHERE department = 'Finance';Practice Exercises
For these exercises, imagine two tables:
orders_jan— orders from Januaryorders_feb— orders from February
Both have: order_id, customer_id, product, amount
- Show all orders from January and February combined (no duplicates)
- Show all orders from both months including duplicates
- Show customer IDs that placed orders in both months
- Show customer IDs that ordered in January but NOT in February
- From the
employeestable, show all employees earning > 80000 labeled 'High' and all in Finance labeled 'Finance' using UNION
Answers
-- 1
SELECT * FROM orders_jan
UNION
SELECT * FROM orders_feb;
-- 2
SELECT * FROM orders_jan
UNION ALL
SELECT * FROM orders_feb;
-- 3
SELECT customer_id FROM orders_jan
INTERSECT
SELECT customer_id FROM orders_feb;
-- 4
SELECT customer_id FROM orders_jan
EXCEPT
SELECT customer_id FROM orders_feb;
-- 5
SELECT name, salary, 'High' AS label FROM employees WHERE salary > 80000
UNION
SELECT name, salary, 'Finance' AS label FROM employees WHERE department = 'Finance';UNIONremoves duplicates;UNION ALLkeeps them (and is faster)- Both queries must have the same number and type of columns
INTERSECT= in both;EXCEPT= in first but not second- SQLite supports UNION, UNION ALL, INTERSECT, EXCEPT
- INSERT, UPDATE, DELETE — modifying data in tables
Views
Week 13 — INSERT, UPDATE, DELETE
- All SELECT-based querying
- You've been reading data — now you'll write it
INSERT — add new rows
Insert a single row
INSERT INTO employees (name, department, salary)
VALUES ('Kavya', 'HR', 52000);Insert multiple rows at once
INSERT INTO employees (name, department, salary)
VALUES
('Vikram', 'Engineering', 78000),
('Meera', 'Finance', 63000),
('Arjun', 'Marketing', 47000);Insert without column names (match exact column order)
INSERT INTO employees VALUES (8, 'Kavya', 'HR', 52000, NULL);Risky — if column order changes, this breaks. Always name columns.
Insert from another table
INSERT INTO employees_archive
SELECT * FROM employees WHERE hire_date < '2020-01-01';UPDATE — modify existing rows
Always use WHERE — or you update every row
-- Give Amit a raise
UPDATE employees
SET salary = 50000
WHERE name = 'Amit';
-- Update multiple columns at once
UPDATE employees
SET salary = 50000, department = 'Sales'
WHERE name = 'Amit';
-- Give everyone in Finance a 10% raise
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Finance';Rule: Always write the WHERE clause first mentally, then add SET. Forgetting WHERE updates every row.
DELETE — remove rows
-- Delete one employee
DELETE FROM employees WHERE name = 'Amit';
-- Delete all Marketing employees
DELETE FROM employees WHERE department = 'Marketing';WARNING: DELETE FROM employees; (no WHERE) deletes all rows. The table structure stays, but all data is gone.
Safe workflow before UPDATE/DELETE
Before running a destructive query, run the SELECT version first:
-- Step 1: preview what you're about to change
SELECT * FROM employees WHERE department = 'Marketing';
-- Step 2: once you're sure, run the DELETE
DELETE FROM employees WHERE department = 'Marketing';Transactions — undo mistakes
Wrap changes in a transaction so you can roll back if something goes wrong:
BEGIN;
UPDATE employees SET salary = salary * 1.10 WHERE department = 'Finance';
-- Check the result
SELECT * FROM employees WHERE department = 'Finance';
-- If happy: confirm
COMMIT;
-- If wrong: undo everything
ROLLBACK;TRUNCATE vs DELETE
| Command | What it does |
|---|---|
DELETE FROM table | Removes all rows, can be rolled back |
DELETE FROM table WHERE ... | Removes matching rows |
DROP TABLE table | Removes the entire table including structure |
SQLite doesn't have TRUNCATE — use DELETE FROM table; instead.
Practice Exercises
- Insert a new employee: 'Kedar', Engineering, salary 72000
- Insert 3 employees at once into the employees table
- Give all Engineering employees a 15% salary increase
- Change Priya's department from Finance to HR
- Delete all employees with salary below 50000
- Use a transaction to safely delete Marketing employees (preview first, then commit)
Answers
-- 1
INSERT INTO employees (name, department, salary)
VALUES ('Kedar', 'Engineering', 72000);
-- 2
INSERT INTO employees (name, department, salary)
VALUES
('Riya', 'Finance', 58000),
('Suresh', 'Marketing', 44000),
('Pooja', 'HR', 49000);
-- 3
UPDATE employees
SET salary = salary * 1.15
WHERE department = 'Engineering';
-- 4
UPDATE employees
SET department = 'HR'
WHERE name = 'Priya';
-- 5
DELETE FROM employees WHERE salary < 50000;
-- 6
BEGIN;
SELECT * FROM employees WHERE department = 'Marketing';
DELETE FROM employees WHERE department = 'Marketing';
-- If looks right:
COMMIT;
-- If something's wrong:
-- ROLLBACK;INSERT INTO table (cols) VALUES (...)— always name your columnsUPDATEwithoutWHERE= updates every row (dangerous)DELETEwithoutWHERE= deletes every row (dangerous)- Preview with
SELECTbefore runningUPDATEorDELETE - Use transactions (
BEGIN / COMMIT / ROLLBACK) for risky changes
- CREATE TABLE + constraints — designing your own tables
Stored Procedures
Week 14 — CREATE TABLE + Constraints
- INSERT, UPDATE, DELETE
- How to query and modify data
This week you learn to design tables from scratch — and enforce rules on the data.
CREATE TABLE
CREATE TABLE employees (
id INTEGER,
name TEXT,
department TEXT,
salary REAL,
hire_date TEXT
);Common Data Types (SQLite)
| Type | Use for |
|---|---|
INTEGER | Whole numbers: 1, 42, -5 |
REAL | Decimal numbers: 3.14, 99.9 |
TEXT | Strings: names, emails, dates |
BLOB | Binary data: images, files |
NULL | Missing/unknown value |
Constraints — rules that protect your data
PRIMARY KEY — unique identifier for each row
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT,
salary INTEGER
);- Each value must be unique
- Cannot be NULL
AUTOINCREMENTmakes it auto-generate
CREATE TABLE employees (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT
);NOT NULL — column must always have a value
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL, -- required
department TEXT NOT NULL,
salary INTEGER -- optional
);UNIQUE — all values in column must be different
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
email TEXT UNIQUE, -- no two employees can share an email
name TEXT
);DEFAULT — use a value if none is provided
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
status TEXT DEFAULT 'active',
created_at TEXT DEFAULT (DATE('now'))
);CHECK — custom validation rule
CREATE TABLE employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
salary INTEGER CHECK (salary > 0),
age INTEGER CHECK (age >= 18 AND age <= 70)
);FOREIGN KEY — link to another table
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER,
amount REAL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);Prevents inserting an order with a customer_id that doesn't exist in customers.
> SQLite: foreign key enforcement is OFF by default. Enable with:
> PRAGMA foreign_keys = ON;
Full Example: Designing a Finance App Schema
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
city TEXT,
created_at TEXT DEFAULT (DATE('now'))
);
CREATE TABLE accounts (
account_id INTEGER PRIMARY KEY AUTOINCREMENT,
customer_id INTEGER NOT NULL,
account_type TEXT CHECK (account_type IN ('savings', 'current', 'investment')),
balance REAL DEFAULT 0 CHECK (balance >= 0),
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE transactions (
txn_id INTEGER PRIMARY KEY AUTOINCREMENT,
account_id INTEGER NOT NULL,
amount REAL NOT NULL,
txn_type TEXT CHECK (txn_type IN ('credit', 'debit')),
txn_date TEXT DEFAULT (DATE('now')),
FOREIGN KEY (account_id) REFERENCES accounts(account_id)
);ALTER TABLE — modify an existing table
SQLite has limited ALTER TABLE support:
-- Add a column
ALTER TABLE employees ADD COLUMN bonus INTEGER DEFAULT 0;
-- Rename the table
ALTER TABLE employees RENAME TO staff;
-- Rename a column (SQLite 3.25+)
ALTER TABLE employees RENAME COLUMN salary TO base_salary;DROP TABLE — delete a table completely
DROP TABLE employees;
-- Safe version — no error if table doesn't exist
DROP TABLE IF EXISTS employees;CREATE TABLE IF NOT EXISTS
CREATE TABLE IF NOT EXISTS employees (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL
);Safe to run multiple times — won't error if table already exists.
Practice Exercises
- Create a
productstable with: id, name, category, price (must be > 0), stock (default 0) - Create a
salestable linked toproductsvia foreign key - Add a
discountcolumn toproductswith default 0 - Create a
userstable with unique email and NOT NULL name - Write the schema for a simple personal finance tracker: accounts + transactions
Answers
-- 1
CREATE TABLE products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
category TEXT,
price REAL CHECK (price > 0),
stock INTEGER DEFAULT 0
);
-- 2
CREATE TABLE sales (
sale_id INTEGER PRIMARY KEY AUTOINCREMENT,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
sale_date TEXT DEFAULT (DATE('now')),
FOREIGN KEY (product_id) REFERENCES products(id)
);
-- 3
ALTER TABLE products ADD COLUMN discount REAL DEFAULT 0;
-- 4
CREATE TABLE users (
user_id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
email TEXT UNIQUE NOT NULL,
created_at TEXT DEFAULT (DATE('now'))
);
-- 5
CREATE TABLE accounts (
account_id INTEGER PRIMARY KEY AUTOINCREMENT,
account_name TEXT NOT NULL,
balance REAL DEFAULT 0
);
CREATE TABLE transactions (
txn_id INTEGER PRIMARY KEY AUTOINCREMENT,
account_id INTEGER NOT NULL,
amount REAL NOT NULL,
type TEXT CHECK (type IN ('income', 'expense')),
note TEXT,
date TEXT DEFAULT (DATE('now')),
FOREIGN KEY (account_id) REFERENCES accounts(account_id)
);PRIMARY KEY= unique + not null identifier for each rowNOT NULL= mandatory fieldUNIQUE= no duplicates allowedDEFAULT= auto-fill when value not providedCHECK= custom validationFOREIGN KEY= links rows across tables (enable with PRAGMA in SQLite)
- Mini Project 2 — Financial data analysis using JOINs + aggregations
Mini Project 2
Week 15 — Mini Project 2: Financial Data Analysis
What This Tests
Everything from Phase 2:
- INNER JOIN, LEFT JOIN, multi-table JOINs
- Subqueries in WHERE, FROM, SELECT
- CASE WHEN — conditional logic
- UNION, INTERSECT, EXCEPT
- INSERT, UPDATE, DELETE
- CREATE TABLE + constraints
The Dataset: Personal Finance Tracker
You're building and querying a personal finance database.
Schema
CREATE TABLE accounts (
account_id INTEGER PRIMARY KEY,
account_name TEXT NOT NULL,
account_type TEXT CHECK (account_type IN ('savings', 'checking', 'investment')),
balance REAL DEFAULT 0
);
CREATE TABLE categories (
category_id INTEGER PRIMARY KEY,
category_name TEXT NOT NULL,
type TEXT CHECK (type IN ('income', 'expense'))
);
CREATE TABLE transactions (
txn_id INTEGER PRIMARY KEY,
account_id INTEGER,
category_id INTEGER,
amount REAL NOT NULL,
txn_date TEXT,
note TEXT,
FOREIGN KEY (account_id) REFERENCES accounts(account_id),
FOREIGN KEY (category_id) REFERENCES categories(category_id)
);Setup Data
INSERT INTO accounts VALUES
(1, 'HDFC Savings', 'savings', 150000),
(2, 'SBI Checking', 'checking', 35000),
(3, 'Zerodha', 'investment', 200000);
INSERT INTO categories VALUES
(1, 'Salary', 'income'),
(2, 'Freelance', 'income'),
(3, 'Food', 'expense'),
(4, 'Rent', 'expense'),
(5, 'Transport', 'expense'),
(6, 'Entertainment', 'expense'),
(7, 'Investments', 'expense');
INSERT INTO transactions VALUES
(1, 1, 1, 85000, '2024-01-01', 'Monthly salary'),
(2, 1, 3, -8500, '2024-01-05', 'Groceries'),
(3, 1, 4, -20000, '2024-01-07', 'Rent January'),
(4, 2, 5, -2500, '2024-01-10', 'Uber rides'),
(5, 1, 6, -3000, '2024-01-15', 'Movies + dinner'),
(6, 1, 2, 15000, '2024-01-20', 'Freelance project'),
(7, 3, 7, -10000, '2024-01-25', 'SIP investment'),
(8, 1, 1, 85000, '2024-02-01', 'Monthly salary'),
(9, 1, 3, -9200, '2024-02-05', 'Groceries'),
(10, 1, 4, -20000, '2024-02-07', 'Rent February'),
(11, 2, 5, -1800, '2024-02-12', 'Metro + auto'),
(12, 1, 6, -1500, '2024-02-18', 'OTT subscriptions'),
(13, 3, 7, -10000, '2024-02-25', 'SIP investment'),
(14, 1, 1, 85000, '2024-03-01', 'Monthly salary'),
(15, 1, 3, -7800, '2024-03-05', 'Groceries');The Questions
Basic Joins (4 questions)
- Show all transactions with account name and category name
- Show all transactions for the 'HDFC Savings' account
- Show total amount spent per category (expense categories only)
- Show all transactions in January 2024
Aggregations (4 questions)
- What is the total income vs total expense for each month?
- Which category has the highest total spending?
- What is the average transaction amount per account?
- Show months where total expenses exceeded 35000
Subqueries + CASE WHEN (4 questions)
- Show transactions that are above the average transaction amount
- Add a column: label each transaction 'Income' or 'Expense' based on whether amount is positive or negative
- Show categories that have more than 2 transactions
- Show the account with the highest current balance
Data Modification (3 questions)
- Add a new transaction: salary credit of 90000 on 2024-04-01 to HDFC Savings
- Update the Zerodha balance to 210000 (after investment gains)
- Delete all Entertainment transactions
Answers
-- 1
SELECT t.txn_id, a.account_name, c.category_name, t.amount, t.txn_date, t.note
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
INNER JOIN categories c ON t.category_id = c.category_id;
-- 2
SELECT t.*, c.category_name
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
INNER JOIN categories c ON t.category_id = c.category_id
WHERE a.account_name = 'HDFC Savings';
-- 3
SELECT c.category_name, SUM(ABS(t.amount)) AS total_spent
FROM transactions t
INNER JOIN categories c ON t.category_id = c.category_id
WHERE c.type = 'expense'
GROUP BY c.category_name
ORDER BY total_spent DESC;
-- 4
SELECT t.txn_id, a.account_name, c.category_name, t.amount, t.txn_date
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
INNER JOIN categories c ON t.category_id = c.category_id
WHERE t.txn_date BETWEEN '2024-01-01' AND '2024-01-31';
-- 5
SELECT
strftime('%Y-%m', txn_date) AS month,
SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS total_income,
SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END) AS total_expense
FROM transactions
GROUP BY month
ORDER BY month;
-- 6
SELECT c.category_name, SUM(ABS(t.amount)) AS total_spent
FROM transactions t
INNER JOIN categories c ON t.category_id = c.category_id
WHERE c.type = 'expense'
GROUP BY c.category_name
ORDER BY total_spent DESC
LIMIT 1;
-- 7
SELECT a.account_name, AVG(ABS(t.amount)) AS avg_txn
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
GROUP BY a.account_name;
-- 8
SELECT strftime('%Y-%m', txn_date) AS month, SUM(ABS(amount)) AS total_expense
FROM transactions
WHERE amount < 0
GROUP BY month
HAVING SUM(ABS(amount)) > 35000;
-- 9
SELECT * FROM transactions
WHERE ABS(amount) > (SELECT AVG(ABS(amount)) FROM transactions);
-- 10
SELECT
txn_id,
amount,
txn_date,
CASE WHEN amount > 0 THEN 'Income' ELSE 'Expense' END AS txn_type
FROM transactions;
-- 11
SELECT c.category_name, COUNT(*) AS txn_count
FROM transactions t
INNER JOIN categories c ON t.category_id = c.category_id
GROUP BY c.category_name
HAVING COUNT(*) > 2;
-- 12
SELECT account_name, balance
FROM accounts
ORDER BY balance DESC
LIMIT 1;
-- 13
INSERT INTO transactions (account_id, category_id, amount, txn_date, note)
VALUES (1, 1, 90000, '2024-04-01', 'Monthly salary');
-- 14
UPDATE accounts SET balance = 210000 WHERE account_name = 'Zerodha';
-- 15
DELETE FROM transactions
WHERE category_id = (
SELECT category_id FROM categories WHERE category_name = 'Entertainment'
);You Passed Phase 2 If You Can...
- [ ] JOIN 3 tables together confidently
- [ ] Write a subquery in WHERE and FROM
- [ ] Use CASE WHEN for conditional columns and conditional aggregation
- [ ] Write INSERT, UPDATE, DELETE safely (with WHERE)
- [ ] Design a basic table with appropriate constraints
Next Up: Phase 3 — Advanced SQL
- Week 16: Window Functions Part 1 — ROW_NUMBER, RANK, DENSE_RANK
Window Functions — Ranking
Week 16 — Window Functions Part 1: ROW_NUMBER, RANK, DENSE_RANK
- Aggregates with GROUP BY collapse rows into one row per group
- Window functions are different — they add a computed column without collapsing rows
What is a Window Function?
-- GROUP BY collapses rows — you lose individual employee rows
SELECT department, AVG(salary) FROM employees GROUP BY department;
-- Window function keeps all rows AND adds the average alongside each
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;Both calculate the average per department — but window functions keep every row visible.
Syntax
function_name() OVER (
PARTITION BY column -- divide into groups (optional)
ORDER BY column -- order within each group (required for ranking)
)ROW_NUMBER — sequential number per partition
-- Rank employees within each department by salary (1 = highest)
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;Result:
| name | department | salary | row_num |
|---|---|---|---|
| Sneha | Engineering | 90000 | 1 |
| Rahul | Engineering | 80000 | 2 |
| Dattatray | Finance | 60000 | 1 |
| Priya | Finance | 55000 | 2 |
Ties get different numbers (arbitrary order for ties).
RANK — rank with gaps for ties
SELECT
name,
salary,
RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;If two employees tie at rank 2, the next rank is 4 (gap).
| name | salary | rnk |
|---|---|---|
| Sneha | 90000 | 1 |
| Rahul | 80000 | 2 |
| Dattatray | 60000 | 3 |
| Priya | 55000 | 4 |
| Amit | 45000 | 5 |
DENSE_RANK — rank without gaps
SELECT
name,
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;If two employees tie at rank 2, the next rank is still 3 (no gap).
ROW_NUMBER vs RANK vs DENSE_RANK
Imagine salaries: 90000, 80000, 80000, 55000
| Function | Result |
|---|---|
| ROW_NUMBER | 1, 2, 3, 4 |
| RANK | 1, 2, 2, 4 |
| DENSE_RANK | 1, 2, 2, 3 |
Classic Use Case: Top N per Group
Get the top 1 (or top 2) earner per department:
SELECT name, department, salary
FROM (
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
WHERE rn = 1;This is one of the most common interview questions in SQL.
Without PARTITION BY — rank across entire table
SELECT
name,
salary,
RANK() OVER (ORDER BY salary DESC) AS company_rank
FROM employees;No PARTITION BY = the window is the entire table.
Practice Exercises
- Rank all employees by salary (highest = rank 1) using RANK
- Assign row numbers to employees within each department (ordered by salary desc)
- Show employees with their company-wide DENSE_RANK by salary
- Find the top earner in each department
- Find the top 2 earners in each department
- Show each employee's salary rank within their department and their overall company rank
Answers
-- 1
SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;
-- 2
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;
-- 3
SELECT name, salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;
-- 4
SELECT name, department, salary FROM (
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) WHERE rn = 1;
-- 5
SELECT name, department, salary FROM (
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
) WHERE rn <= 2;
-- 6
SELECT
name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank,
RANK() OVER (ORDER BY salary DESC) AS company_rank
FROM employees;- Window functions add a column — they don't reduce rows like GROUP BY
PARTITION BY= how to split the window (like GROUP BY for window functions)ORDER BYinside OVER = how to order within each partitionROW_NUMBER= always unique;RANK= gaps on ties;DENSE_RANK= no gaps- Top-N per group pattern: ROW_NUMBER in subquery, then filter by rn <= N
- Window Functions Part 2 — LAG, LEAD, SUM/AVG OVER, running totals
Window Functions — LAG/LEAD
Week 17 — Window Functions Part 2: LAG, LEAD, Running Totals
- ROW_NUMBER, RANK, DENSE_RANK
- PARTITION BY and ORDER BY inside OVER()
LAG — look at the previous row's value
SELECT
sale_date,
amount,
LAG(amount) OVER (ORDER BY sale_date) AS prev_amount
FROM sales;Result:
| sale_date | amount | prev_amount |
|---|---|---|
| 2024-01-01 | 50000 | NULL |
| 2024-02-01 | 65000 | 50000 |
| 2024-03-01 | 45000 | 65000 |
First row has NULL — no previous row exists.
LAG with default value
LAG(amount, 1, 0) OVER (ORDER BY sale_date)
-- 1 = look 1 row back, 0 = default if no row existsLEAD — look at the next row's value
SELECT
sale_date,
amount,
LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales;Last row has NULL — no next row.
Month-over-month change using LAG
SELECT
sale_date,
amount,
LAG(amount) OVER (ORDER BY sale_date) AS prev_month,
amount - LAG(amount) OVER (ORDER BY sale_date) AS mom_change,
ROUND(
100.0 * (amount - LAG(amount) OVER (ORDER BY sale_date))
/ LAG(amount) OVER (ORDER BY sale_date),
2
) AS pct_change
FROM monthly_sales;This is a classic interview question — calculate MoM growth.
Running Total with SUM OVER
SELECT
sale_date,
amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;Result:
| sale_date | amount | running_total |
|---|---|---|
| 2024-01-01 | 50000 | 50000 |
| 2024-02-01 | 65000 | 115000 |
| 2024-03-01 | 45000 | 160000 |
Running Average
SELECT
sale_date,
amount,
AVG(amount) OVER (ORDER BY sale_date) AS running_avg
FROM sales;Partitioned Running Total
-- Running total per department
SELECT
name,
department,
salary,
SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS dept_running_total
FROM employees;Moving Average (rolling window)
-- 3-month moving average
SELECT
sale_date,
amount,
AVG(amount) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3m
FROM monthly_sales;ROWS BETWEEN 2 PRECEDING AND CURRENT ROW = current row + 2 rows before it.
NTILE — divide rows into N buckets
-- Divide employees into 4 salary quartiles
SELECT
name,
salary,
NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;Quartile 1 = top 25%, quartile 4 = bottom 25%.
FIRST_VALUE and LAST_VALUE
SELECT
name,
department,
salary,
FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS top_earner
FROM employees;Practice Exercises
Using a monthly_sales table:
| month | revenue |
|---|---|
| 2024-01 | 120000 |
| 2024-02 | 145000 |
| 2024-03 | 98000 |
| 2024-04 | 167000 |
| 2024-05 | 152000 |
- Show each month's revenue alongside the previous month's revenue
- Calculate month-over-month change in revenue
- Show running total of revenue
- Show 3-month moving average of revenue
- Using
employees, show each employee's salary and the highest salary in their department
Answers
-- 1
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue
FROM monthly_sales;
-- 2
SELECT
month,
revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
FROM monthly_sales;
-- 3
SELECT month, revenue,
SUM(revenue) OVER (ORDER BY month) AS running_total
FROM monthly_sales;
-- 4
SELECT month, revenue,
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3m
FROM monthly_sales;
-- 5
SELECT name, department, salary,
MAX(salary) OVER (PARTITION BY department) AS dept_max_salary
FROM employees;LAG(col)= previous row's value;LEAD(col)= next row's valueSUM() OVER (ORDER BY ...)= running total (cumulative)AVG() OVER (ORDER BY ...)= running averageROWS BETWEEN= define a rolling window- MoM change =
amount - LAG(amount) OVER (ORDER BY date)
- CTEs (Common Table Expressions) — the
WITHclause for readable complex queries
CTEs
Week 18 — CTEs (Common Table Expressions)
- Subqueries — queries inside queries
- Window functions
A CTE is a named, temporary result set defined with WITH. It makes complex queries readable.
Basic CTE Syntax
WITH cte_name AS (
SELECT ...
FROM ...
WHERE ...
)
SELECT * FROM cte_name;CTE vs Subquery — same result, different readability
Subquery (hard to read)
SELECT name, department, salary
FROM (
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
)
WHERE salary > dept_avg;CTE (readable)
WITH dept_averages AS (
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
)
SELECT name, department, salary
FROM dept_averages
WHERE salary > dept_avg;Same result. The CTE version is far easier to understand and debug.
Multiple CTEs
Chain multiple CTEs with commas:
WITH
monthly_revenue AS (
SELECT
strftime('%Y-%m', sale_date) AS month,
SUM(amount) AS revenue
FROM sales
GROUP BY month
),
mom_growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS growth
FROM monthly_revenue
)
SELECT * FROM mom_growth
WHERE growth IS NOT NULL
ORDER BY month;Each CTE builds on the previous one. This replaces deeply nested subqueries.
CTEs for Top-N per Group (cleaner version)
WITH ranked_employees AS (
SELECT
name,
department,
salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rn = 1;Recursive CTEs — generate sequences
A powerful but advanced use: a CTE that references itself.
-- Generate numbers 1 to 10
WITH RECURSIVE counter(n) AS (
SELECT 1
UNION ALL
SELECT n + 1 FROM counter WHERE n < 10
)
SELECT n FROM counter;Practical use: generate a calendar of dates.
WITH RECURSIVE dates(d) AS (
SELECT DATE('2024-01-01')
UNION ALL
SELECT DATE(d, '+1 day') FROM dates WHERE d < '2024-01-31'
)
SELECT d FROM dates;CTEs for Reusability Within a Query
WITH high_earners AS (
SELECT * FROM employees WHERE salary > 70000
)
-- Use it multiple times
SELECT department, COUNT(*) FROM high_earners GROUP BY department
UNION ALL
SELECT 'TOTAL', COUNT(*) FROM high_earners;When to Use CTEs vs Subqueries
| Use CTE when... | Use subquery when... |
|---|---|
| Logic is complex or multi-step | It's a simple one-liner |
| You need to reuse the same result | It's used only once |
| You want readable, debuggable code | You're optimizing for brevity |
| You're building on previous steps | Query is already simple |
In interviews and real work: default to CTEs. They signal clean thinking.
Practice Exercises
- Use a CTE to find employees who earn above the company average salary
- Use two CTEs: first get total revenue per month, then calculate MoM change
- Rewrite this subquery as a CTE:
SELECT * FROM (
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees
) WHERE rnk <= 2;- Use a recursive CTE to generate dates for all of February 2024
Answers
-- 1
WITH avg_salary AS (
SELECT AVG(salary) AS avg FROM employees
)
SELECT e.name, e.salary
FROM employees e, avg_salary
WHERE e.salary > avg_salary.avg;
-- 2
WITH monthly AS (
SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS revenue
FROM sales GROUP BY month
),
growth AS (
SELECT month, revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
revenue - LAG(revenue) OVER (ORDER BY month) AS mom_change
FROM monthly
)
SELECT * FROM growth ORDER BY month;
-- 3
WITH ranked AS (
SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
FROM employees
)
SELECT name, department, salary FROM ranked WHERE rnk <= 2;
-- 4
WITH RECURSIVE feb_dates(d) AS (
SELECT DATE('2024-02-01')
UNION ALL
SELECT DATE(d, '+1 day') FROM feb_dates WHERE d < '2024-02-29'
)
SELECT d FROM feb_dates;WITH name AS (...)defines a CTE — use it like a table below- Multiple CTEs:
WITH a AS (...), b AS (...)— separate with commas - CTEs make complex queries readable and debuggable
- Recursive CTEs use
UNION ALLand reference themselves - Default to CTEs over nested subqueries in interviews and real work
- Indexes + Query Performance — why queries slow down and how to fix them
Indexes & Performance
Week 19 — Indexes + Query Performance
- All SQL querying techniques
- CTEs and window functions
This week you learn why queries go slow — and how to fix them.
Why Queries Slow Down
When you run SELECT * FROM orders WHERE customer_id = 42, the database has two choices:
- Full table scan — read every single row and check if customer_id = 42
- Index lookup — jump directly to rows where customer_id = 42
Without an index, the database always does a full table scan. With millions of rows, this is slow.
What is an Index?
An index is a separate data structure (like a book's index) that maps column values to row locations. The database can jump straight to matching rows.
CREATE INDEX
-- Basic index on one column
CREATE INDEX idx_customer_id ON orders(customer_id);
-- Index on multiple columns (composite index)
CREATE INDEX idx_dept_salary ON employees(department, salary);
-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_email ON users(email);DROP INDEX
DROP INDEX idx_customer_id;When Indexes Help
| Query pattern | Index helps? |
|---|---|
WHERE customer_id = 42 | Yes — equality lookup |
WHERE department = 'Finance' | Yes |
WHERE salary > 50000 | Yes — range scan |
ORDER BY hire_date | Yes |
JOIN ON orders.customer_id = customers.customer_id | Yes — both sides |
WHERE UPPER(name) = 'RAHUL' | No — function on column defeats index |
WHERE name LIKE '%rah%' | No — leading wildcard defeats index |
WHERE name LIKE 'rah%' | Yes — trailing wildcard uses index |
EXPLAIN QUERY PLAN (SQLite)
See how SQLite executes a query — does it use an index or scan the whole table?
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 42;Output without index:
SCAN TABLE ordersOutput with index:
SEARCH TABLE orders USING INDEX idx_customer_id (customer_id=?)SEARCH = uses index (fast). SCAN = full table scan (slow).
Index Trade-offs
| Benefit | Cost |
|---|---|
| SELECT queries are faster | INSERT/UPDATE/DELETE are slower |
| ORDER BY is faster | More disk space used |
| JOINs are faster | Too many indexes = slow writes |
Rule: Index columns used frequently in WHERE, JOIN ON, and ORDER BY. Don't index everything.
Query Optimization Tips
1. Avoid functions on indexed columns in WHERE
-- BAD — index not used
WHERE YEAR(hire_date) = 2023
-- GOOD — index used
WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31'2. Avoid leading wildcards in LIKE
-- BAD — full scan
WHERE name LIKE '%aya'
-- GOOD — index used
WHERE name LIKE 'Dat%'3. Use LIMIT for large result sets
SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;4. Select only what you need
-- BAD
SELECT * FROM orders WHERE customer_id = 42;
-- GOOD
SELECT order_id, amount, order_date FROM orders WHERE customer_id = 42;5. Index both sides of a JOIN
-- Make sure both customer_id columns are indexed
CREATE INDEX idx_orders_cust ON orders(customer_id);
-- customers.customer_id is usually a PRIMARY KEY (auto-indexed)Composite Index Column Order Matters
CREATE INDEX idx_dept_salary ON employees(department, salary);This index helps:
WHERE department = 'Finance'WHERE department = 'Finance' AND salary > 60000
This index does NOT help:
WHERE salary > 60000(without department)
The leftmost column in the index must be in the WHERE clause.
Practice Exercises
- Create an index on
orders.customer_id - Create a composite index on
employees(department, salary) - Use EXPLAIN QUERY PLAN to check if a query uses an index
- Rewrite this slow query to use an index:
WHERE STRFTIME('%Y', hire_date) = '2023'
- Which of these needs an index: a table with 100 rows or 10 million rows?
Answers
-- 1
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- 2
CREATE INDEX idx_emp_dept_salary ON employees(department, salary);
-- 3
EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 5;
-- 4
-- BAD (function on column, can't use index)
WHERE STRFTIME('%Y', hire_date) = '2023'
-- GOOD (range scan, index works)
WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31'
-- 5
10 million rows — indexes are for large tables.
For 100 rows, a full scan is nearly instant anyway.- Index = lookup table for column values → speeds up reads, slows writes
EXPLAIN QUERY PLANshows you if the index is being used- Avoid functions on indexed columns in WHERE — they bypass the index
- Leading wildcards (
LIKE '%term') defeat indexes - Composite index: leftmost column must be in WHERE to get a benefit
- Don't over-index — every index adds write overhead
- Views + Stored Procedures — saving reusable query logic
Transactions & ACID
Week 20 — Views + Stored Procedures
- CTEs for reusable in-query logic
- Indexes for performance
This week you learn to save query logic permanently — views for reusable queries, stored procedures for reusable programs.
Views — saved SELECT queries
A view is a stored SELECT query that you can use like a table.
-- Create a view
CREATE VIEW high_earners AS
SELECT name, department, salary
FROM employees
WHERE salary > 70000;
-- Use it like a table
SELECT * FROM high_earners;
SELECT department, COUNT(*) FROM high_earners GROUP BY department;The view doesn't store data — it reruns the SELECT every time you query it.
CREATE VIEW
CREATE VIEW dept_summary AS
SELECT
department,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees
GROUP BY department;Now anyone can run SELECT * FROM dept_summary without knowing the underlying query.
DROP VIEW
DROP VIEW high_earners;
DROP VIEW IF EXISTS high_earners;CREATE OR REPLACE VIEW (PostgreSQL/MySQL)
CREATE OR REPLACE VIEW high_earners AS
SELECT name, department, salary, bonus
FROM employees
WHERE salary > 70000;SQLite doesn't support OR REPLACE for views — drop and recreate.
When to Use Views
| Use case | Example |
|---|---|
| Simplify complex queries | One view for a 5-table JOIN |
| Hide sensitive columns | View without salary column for non-HR users |
| Consistent reporting | Standard monthly summary everyone uses |
| Simplify permissions | Grant access to a view, not the raw table |
Stored Procedures (PostgreSQL/MySQL concept)
SQLite doesn't support stored procedures. But you'll encounter them in PostgreSQL and MySQL at work.
A stored procedure is a named, saved program that can run multiple SQL statements:
-- PostgreSQL syntax
CREATE OR REPLACE PROCEDURE give_raise(dept TEXT, pct REAL)
LANGUAGE plpgsql AS $$
BEGIN
UPDATE employees
SET salary = salary * (1 + pct / 100)
WHERE department = dept;
COMMIT;
END;
$$;
-- Call it
CALL give_raise('Engineering', 10);Functions (available in SQLite with extensions)
SQLite supports user-defined functions via application code, but not natively via SQL.
In PostgreSQL:
CREATE FUNCTION get_headcount(dept TEXT)
RETURNS INTEGER AS $$
SELECT COUNT(*) FROM employees WHERE department = dept;
$$ LANGUAGE sql;
SELECT get_headcount('Finance'); -- returns 2Materialized Views (PostgreSQL concept)
A regular view reruns the query every time. A materialized view stores the result physically — like a cache.
-- PostgreSQL
CREATE MATERIALIZED VIEW monthly_summary AS
SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS revenue
FROM sales GROUP BY month;
-- Refresh when data changes
REFRESH MATERIALIZED VIEW monthly_summary;Use when the underlying query is expensive and data doesn't change every second.
Practice Exercises (SQLite — views only)
- Create a view
finance_teamshowing only Finance department employees - Create a view
order_summarythat joins orders + customers and shows: order_id, customer name, product, amount - Create a view
dept_statsshowing headcount, avg salary, and total payroll per department - Query
dept_statsto find departments with avg salary > 60000 - Drop the
finance_teamview
Answers
-- 1
CREATE VIEW finance_team AS
SELECT * FROM employees WHERE department = 'Finance';
-- 2
CREATE VIEW order_summary AS
SELECT o.order_id, c.name AS customer, o.product, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;
-- 3
CREATE VIEW dept_stats AS
SELECT
department,
COUNT(*) AS headcount,
AVG(salary) AS avg_salary,
SUM(salary) AS total_payroll
FROM employees
GROUP BY department;
-- 4
SELECT * FROM dept_stats WHERE avg_salary > 60000;
-- 5
DROP VIEW IF EXISTS finance_team;- A view is a named SELECT query — query it like a table
- Views don't store data — they re-execute on every query
- Use views to simplify complex joins and standardize reporting
- SQLite supports views but not stored procedures
- PostgreSQL/MySQL have stored procedures + functions + materialized views
- Advanced Aggregations — ROLLUP, CUBE, GROUPING SETS
Advanced Subqueries
Week 21 — Advanced Aggregations: ROLLUP, CUBE, GROUPING SETS
- GROUP BY, HAVING
- Window functions
- CTEs
These three operators extend GROUP BY to produce subtotals and grand totals in a single query.
> Note: SQLite has limited support for these. Full support in PostgreSQL and MySQL. This lesson prepares you for those environments.
The Problem They Solve
You want a report with:
- Revenue per region per product
- Subtotal per region
- Grand total for all regions
Without ROLLUP, you'd need 3 separate queries and UNION them. ROLLUP does it in one.
ROLLUP — hierarchical subtotals
SELECT region, product, SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, product);Result:
| region | product | total | |
|---|---|---|---|
| North | Laptop | 85000 | |
| North | Phone | 45000 | |
| North | NULL | 130000 | ← subtotal for North |
| South | Tablet | 30000 | |
| South | NULL | 30000 | ← subtotal for South |
| NULL | NULL | 160000 | ← grand total |
NULL in the output = subtotal row (not actual NULL data).
GROUPING() — detect if a NULL is a subtotal
SELECT
CASE WHEN GROUPING(region) = 1 THEN 'ALL REGIONS' ELSE region END AS region,
CASE WHEN GROUPING(product) = 1 THEN 'ALL PRODUCTS' ELSE product END AS product,
SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, product);GROUPING(col) returns 1 if the NULL is from a rollup (subtotal), 0 if it's real data.
CUBE — all combinations of subtotals
ROLLUP gives subtotals along a hierarchy (region → product).
CUBE gives subtotals for every possible combination.
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY CUBE(region, product);Generates:
- Per region per product
- Per region (across all products)
- Per product (across all regions)
- Grand total
More rows than ROLLUP — useful for pivot-style reports.
GROUPING SETS — choose exactly which subtotals you want
More control than ROLLUP or CUBE:
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
(region, product), -- detail level
(region), -- subtotal by region
() -- grand total
);Simulating ROLLUP in SQLite with UNION ALL
Since SQLite doesn't support ROLLUP natively:
-- Detail rows
SELECT department, 'subtotal' AS level, SUM(salary) AS total
FROM employees GROUP BY department
UNION ALL
-- Grand total
SELECT 'ALL', 'grand_total', SUM(salary)
FROM employees;Practical Example: Sales Dashboard Query
-- PostgreSQL/MySQL
SELECT
COALESCE(region, 'TOTAL') AS region,
COALESCE(category, 'ALL CATEGORIES') AS category,
SUM(amount) AS revenue,
COUNT(*) AS transactions
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region, category;Practice Exercises
For these, use PostgreSQL or MySQL if available. In SQLite, simulate with UNION ALL.
- Get total sales per department with a grand total row (simulate with UNION ALL in SQLite)
- Write the ROLLUP version for sales by region and product
- What is the difference between ROLLUP and CUBE?
- Write a query using GROUPING SETS to get: per department, per salary band, and overall total
Answers
-- 1 (SQLite simulation)
SELECT department, SUM(salary) AS total_salary
FROM employees GROUP BY department
UNION ALL
SELECT 'GRAND TOTAL', SUM(salary) FROM employees;
-- 2 (PostgreSQL/MySQL)
SELECT region, product, SUM(amount) AS revenue
FROM sales
GROUP BY ROLLUP(region, product);
-- 3
-- ROLLUP: hierarchical — subtotals along one axis (region → product → grand total)
-- CUBE: all combinations — subtotals for every possible grouping
-- 4 (PostgreSQL/MySQL)
SELECT
department,
CASE
WHEN salary >= 80000 THEN 'High'
WHEN salary >= 55000 THEN 'Mid'
ELSE 'Low'
END AS salary_band,
SUM(salary) AS total
FROM employees
GROUP BY GROUPING SETS (
(department),
(salary_band),
()
);ROLLUP= subtotals along a hierarchy + grand totalCUBE= subtotals for every combination of dimensionsGROUPING SETS= you pick exactly which subtotals to includeGROUPING(col)= 1 if NULL is from rollup, 0 if real data- SQLite doesn't support these natively — PostgreSQL and MySQL do
- In SQLite, simulate with
UNION ALL
- Interview Prep — the top SQL questions asked in data analyst and engineering interviews
Interview Prep
Week 22 — SQL Interview Prep
The Most Common SQL Interview Questions
Categorized by type. Each one appears frequently at companies like Flipkart, Swiggy, Zepto, Goldman Sachs, JP Morgan, and top tech firms.
Category 1: Basic Filtering + Aggregation
-- Method 1: OFFSET
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;
-- Method 2: Subquery
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);
-- Method 3: DENSE_RANK (preferred in interviews)
SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
)
WHERE rnk = 2;-- For Nth highest, use DENSE_RANK with rnk = N
SELECT salary FROM (
SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees
)
WHERE rnk = N; -- replace N with the desired rankSELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
SELECT AVG(salary) FROM employees WHERE department = e.department
);
-- Cleaner with window function
SELECT name, department, salary FROM (
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees
)
WHERE salary > dept_avg;Category 2: Duplicates
SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;DELETE FROM users
WHERE id NOT IN (
SELECT MIN(id) FROM users GROUP BY email
);-- Customers who never ordered
SELECT customer_id FROM customers
WHERE customer_id NOT IN (
SELECT DISTINCT customer_id FROM orders WHERE customer_id IS NOT NULL
);
-- Cleaner with LEFT JOIN
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;Category 3: Rankings + Top N
SELECT name, department, salary FROM (
SELECT name, department, salary,
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
FROM employees
)
WHERE rn = 1;SELECT product, SUM(amount) AS revenue
FROM sales
GROUP BY product
ORDER BY revenue DESC
LIMIT 3;SELECT name, department, salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;Category 4: Date and Time
SELECT * FROM orders
WHERE order_date >= DATE('now', '-30 days');WITH monthly AS (
SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS revenue
FROM sales GROUP BY month
)
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
/ LAG(revenue) OVER (ORDER BY month), 2) AS pct_growth
FROM monthly;SELECT * FROM users
WHERE last_login < DATE('now', '-90 days')
OR last_login IS NULL;Category 5: Self Joins + Hierarchies
SELECT
m.name AS manager,
e.name AS employee
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;SELECT * FROM employees WHERE manager_id IS NULL;Category 6: Running Totals + Cumulative
SELECT sale_date, amount,
SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;SELECT
product,
amount,
ROUND(100.0 * amount / SUM(amount) OVER (), 2) AS pct_of_total
FROM sales;Category 7: Pivoting Data
-- Monthly sales as columns
SELECT
strftime('%Y', sale_date) AS year,
SUM(CASE WHEN strftime('%m', sale_date) = '01' THEN amount ELSE 0 END) AS Jan,
SUM(CASE WHEN strftime('%m', sale_date) = '02' THEN amount ELSE 0 END) AS Feb,
SUM(CASE WHEN strftime('%m', sale_date) = '03' THEN amount ELSE 0 END) AS Mar
FROM sales
GROUP BY year;Category 8: Classic Tricky Questions
SELECT DISTINCT customer_id FROM orders WHERE product = 'Laptop'
EXCEPT
SELECT DISTINCT customer_id FROM orders WHERE product = 'Phone';WITH daily AS (
SELECT DISTINCT sale_date FROM sales
),
ranked AS (
SELECT sale_date,
ROW_NUMBER() OVER (ORDER BY sale_date) AS rn
FROM daily
)
SELECT MIN(sale_date), MAX(sale_date), COUNT(*) AS streak
FROM ranked
GROUP BY DATE(sale_date, '-' || rn || ' days');SELECT name, hire_date FROM employees
WHERE strftime('%Y-%m', hire_date) IN (
SELECT strftime('%Y-%m', hire_date)
FROM employees
GROUP BY strftime('%Y-%m', hire_date)
HAVING COUNT(*) > 1
);- Think out loud — explain your approach before writing
- Clarify edge cases — "Are there NULLs? Can salaries be tied?"
- Start simple — write a basic version, then optimize
- Use CTEs — they show you can write clean, readable SQL
- Know DENSE_RANK vs RANK vs ROW_NUMBER — this comes up in every interview
- Know two ways to answer — subquery + window function versions
- Mention performance — "If this table is large, I'd ensure an index on X"
Quick Reference: Question → Technique
| Question type | Best technique |
|---|---|
| Nth highest value | DENSE_RANK window function |
| Top N per group | ROW_NUMBER + subquery |
| Duplicates | GROUP BY + HAVING COUNT > 1 |
| Missing from another table | LEFT JOIN + IS NULL, or NOT IN, or EXCEPT |
| Running total | SUM() OVER (ORDER BY ...) |
| MoM change | LAG() |
| Percentage of total | value / SUM() OVER () |
| Conditional count/sum | SUM(CASE WHEN ... THEN 1 ELSE 0 END) |
| Complex multi-step logic | CTE |
- Capstone Project — full dataset analysis combining everything
Capstone Project
Week 23 — Capstone Project: Stock Market + Finance Analysis
What This Tests
Everything across all 3 phases:
- All querying, filtering, aggregation
- JOINs, subqueries, CTEs
- Window functions (ROW_NUMBER, RANK, LAG, running totals)
- CASE WHEN, NULL handling
- Table design
This is your portfolio project — write it clean, use CTEs, comment your thinking.
The Dataset: Market + Fundamentals
Schema
CREATE TABLE stocks (
ticker TEXT PRIMARY KEY,
company_name TEXT NOT NULL,
sector TEXT,
market_cap REAL
);
CREATE TABLE prices (
price_id INTEGER PRIMARY KEY,
ticker TEXT,
price_date TEXT,
open_price REAL,
close_price REAL,
volume INTEGER,
FOREIGN KEY (ticker) REFERENCES stocks(ticker)
);
CREATE TABLE financials (
fin_id INTEGER PRIMARY KEY,
ticker TEXT,
fiscal_year INTEGER,
revenue REAL,
net_income REAL,
eps REAL,
FOREIGN KEY (ticker) REFERENCES stocks(ticker)
);Setup Data
INSERT INTO stocks VALUES
('RELIANCE', 'Reliance Industries', 'Energy', 1800000),
('TCS', 'Tata Consultancy Services', 'IT', 1400000),
('HDFCBANK', 'HDFC Bank', 'Banking', 1100000),
('INFY', 'Infosys', 'IT', 650000),
('WIPRO', 'Wipro', 'IT', 280000),
('ICICIBANK', 'ICICI Bank', 'Banking', 750000),
('SBIN', 'State Bank of India', 'Banking', 600000);
INSERT INTO prices VALUES
(1, 'TCS', '2024-01-01', 3800, 3850, 500000),
(2, 'TCS', '2024-02-01', 3850, 3920, 480000),
(3, 'TCS', '2024-03-01', 3920, 3880, 510000),
(4, 'TCS', '2024-04-01', 3880, 4050, 620000),
(5, 'TCS', '2024-05-01', 4050, 4120, 590000),
(6, 'INFY', '2024-01-01', 1600, 1640, 800000),
(7, 'INFY', '2024-02-01', 1640, 1590, 750000),
(8, 'INFY', '2024-03-01', 1590, 1620, 820000),
(9, 'INFY', '2024-04-01', 1620, 1700, 900000),
(10, 'INFY', '2024-05-01', 1700, 1680, 870000),
(11, 'HDFCBANK', '2024-01-01', 1500, 1520, 1200000),
(12, 'HDFCBANK', '2024-02-01', 1520, 1480, 1100000),
(13, 'HDFCBANK', '2024-03-01', 1480, 1510, 1150000),
(14, 'HDFCBANK', '2024-04-01', 1510, 1560, 1300000),
(15, 'HDFCBANK', '2024-05-01', 1560, 1590, 1250000);
INSERT INTO financials VALUES
(1, 'TCS', 2022, 1917540, 422720, 114.50),
(2, 'TCS', 2023, 2254540, 463620, 125.80),
(3, 'TCS', 2024, 2408560, 471980, 128.60),
(4, 'INFY', 2022, 1217310, 224530, 53.40),
(5, 'INFY', 2023, 1467490, 244910, 58.10),
(6, 'INFY', 2024, 1533990, 261290, 62.30),
(7, 'HDFCBANK', 2022, 1684500, 362960, 64.80),
(8, 'HDFCBANK', 2023, 2069770, 445930, 79.20),
(9, 'HDFCBANK', 2024, 2422640, 512830, 91.10);The Questions (30 total)
Price Analysis (10 questions)
- Show the latest price for each stock
- Show all stocks with their price change from Jan to May 2024
- Calculate the monthly return for TCS (% change each month)
- Show the best performing month for each stock (highest close price)
- Show MoM price change for all stocks (LAG)
- Which stock had the highest single-month price increase?
- Show a running total of volume traded for TCS
- Show the 3-month moving average price for INFY
- Rank stocks by their total % return from Jan to May 2024
- Which months did any stock price drop (close < open)?
Fundamental Analysis (8 questions)
- Show revenue growth % year-over-year for each company
- Which company has the highest EPS in 2024?
- Show net income margin (net_income/revenue) for each company in 2024
- Which company grew revenue fastest from 2022 to 2024?
- Show companies where EPS increased every year (2022→2023→2024)
- Rank companies by 2024 revenue
- Show the year with the highest revenue for each company
- Show companies where net income > 400000 in any year
Combined Analysis (7 questions)
- Show each stock's current price alongside its 2024 EPS (P/E ratio = price/EPS)
- Show IT sector stocks with their 2024 revenue
- Which sector has the highest total market cap?
- Show stocks where price in May 2024 > price in Jan 2024 AND revenue grew in 2024
- Show each company's sector and their YoY revenue growth
- Show the top 2 stocks by EPS within each sector
- Create a performance summary: ticker, sector, latest price, 2024 EPS, 2024 net income
Data Management (5 questions)
- Add a new stock: 'BAJFINANCE', 'Bajaj Finance', 'Finance', 500000
- Add price records for BAJFINANCE for January and February 2024 (prices: 7200, 7450)
- Update WIPRO's market cap to 320000
- Create a view
it_sector_summaryshowing all IT stocks with their 2024 financials - Create an index on prices(ticker, price_date) for faster queries
Answers
-- 1
SELECT ticker, MAX(price_date) AS latest_date, close_price
FROM prices GROUP BY ticker;
-- Better with window function
SELECT ticker, price_date, close_price FROM (
SELECT ticker, price_date, close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price_date DESC) AS rn
FROM prices
) WHERE rn = 1;
-- 2
WITH jan AS (
SELECT ticker, close_price AS jan_price FROM prices WHERE price_date = '2024-01-01'
),
may AS (
SELECT ticker, close_price AS may_price FROM prices WHERE price_date = '2024-05-01'
)
SELECT j.ticker, j.jan_price, m.may_price,
ROUND(100.0 * (m.may_price - j.jan_price) / j.jan_price, 2) AS total_return_pct
FROM jan j INNER JOIN may m ON j.ticker = m.ticker;
-- 3
SELECT ticker, price_date, close_price,
LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date) AS prev_close,
ROUND(100.0 * (close_price - LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date))
/ LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date), 2) AS monthly_return_pct
FROM prices
WHERE ticker = 'TCS';
-- 4
SELECT ticker, price_date, close_price FROM (
SELECT ticker, price_date, close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY close_price DESC) AS rn
FROM prices
) WHERE rn = 1;
-- 5
SELECT ticker, price_date, close_price,
close_price - LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date) AS price_change
FROM prices;
-- 6
WITH changes AS (
SELECT ticker, price_date,
close_price - LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date) AS price_change
FROM prices
)
SELECT ticker, price_date, price_change FROM changes
WHERE price_change = (SELECT MAX(price_change) FROM changes);
-- 7
SELECT price_date, volume,
SUM(volume) OVER (ORDER BY price_date) AS running_volume
FROM prices WHERE ticker = 'TCS';
-- 8
SELECT price_date, close_price,
AVG(close_price) OVER (
ORDER BY price_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3m
FROM prices WHERE ticker = 'INFY';
-- 9
WITH returns AS (
WITH jan AS (SELECT ticker, close_price AS jan_p FROM prices WHERE price_date = '2024-01-01'),
may AS (SELECT ticker, close_price AS may_p FROM prices WHERE price_date = '2024-05-01')
SELECT j.ticker, ROUND(100.0*(m.may_p - j.jan_p)/j.jan_p, 2) AS total_return
FROM jan j JOIN may m ON j.ticker = m.ticker
)
SELECT ticker, total_return, RANK() OVER (ORDER BY total_return DESC) AS performance_rank
FROM returns;
-- 10
SELECT ticker, price_date, open_price, close_price
FROM prices WHERE close_price < open_price;
-- 11
SELECT ticker, fiscal_year, revenue,
LAG(revenue) OVER (PARTITION BY ticker ORDER BY fiscal_year) AS prev_revenue,
ROUND(100.0 * (revenue - LAG(revenue) OVER (PARTITION BY ticker ORDER BY fiscal_year))
/ LAG(revenue) OVER (PARTITION BY ticker ORDER BY fiscal_year), 2) AS yoy_growth_pct
FROM financials;
-- 12
SELECT f.ticker, s.company_name, f.eps
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024
ORDER BY f.eps DESC LIMIT 1;
-- 13
SELECT f.ticker, s.company_name,
ROUND(100.0 * f.net_income / f.revenue, 2) AS net_margin_pct
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024
ORDER BY net_margin_pct DESC;
-- 14
WITH growth AS (
WITH y2022 AS (SELECT ticker, revenue AS rev_2022 FROM financials WHERE fiscal_year = 2022),
y2024 AS (SELECT ticker, revenue AS rev_2024 FROM financials WHERE fiscal_year = 2024)
SELECT a.ticker, ROUND(100.0*(b.rev_2024 - a.rev_2022)/a.rev_2022, 2) AS total_growth_pct
FROM y2022 a JOIN y2024 b ON a.ticker = b.ticker
)
SELECT * FROM growth ORDER BY total_growth_pct DESC LIMIT 1;
-- 15
SELECT ticker FROM (
SELECT ticker, fiscal_year, eps,
LAG(eps) OVER (PARTITION BY ticker ORDER BY fiscal_year) AS prev_eps
FROM financials
) WHERE prev_eps IS NOT NULL AND eps > prev_eps
GROUP BY ticker HAVING COUNT(*) = 2;
-- 16
SELECT f.ticker, s.company_name, f.revenue,
RANK() OVER (ORDER BY f.revenue DESC) AS revenue_rank
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024;
-- 17
SELECT ticker, fiscal_year, revenue FROM (
SELECT ticker, fiscal_year, revenue,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY revenue DESC) AS rn
FROM financials
) WHERE rn = 1;
-- 18
SELECT DISTINCT f.ticker, s.company_name
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.net_income > 400000;
-- 19
WITH latest_prices AS (
SELECT ticker, close_price FROM (
SELECT ticker, close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price_date DESC) AS rn
FROM prices
) WHERE rn = 1
)
SELECT s.ticker, s.company_name, lp.close_price, f.eps,
ROUND(lp.close_price / f.eps, 2) AS pe_ratio
FROM stocks s
JOIN latest_prices lp ON s.ticker = lp.ticker
JOIN financials f ON s.ticker = f.ticker AND f.fiscal_year = 2024;
-- 20
SELECT s.ticker, s.company_name, s.sector, f.revenue
FROM stocks s JOIN financials f ON s.ticker = f.ticker
WHERE s.sector = 'IT' AND f.fiscal_year = 2024;
-- 21
SELECT sector, SUM(market_cap) AS total_market_cap
FROM stocks GROUP BY sector ORDER BY total_market_cap DESC;
-- 22 (stocks that went up in price AND grew revenue)
WITH price_up AS (
WITH jan AS (SELECT ticker, close_price AS jan_p FROM prices WHERE price_date = '2024-01-01'),
may AS (SELECT ticker, close_price AS may_p FROM prices WHERE price_date = '2024-05-01')
SELECT j.ticker FROM jan j JOIN may m ON j.ticker = m.ticker WHERE m.may_p > j.jan_p
),
rev_grew AS (
WITH y23 AS (SELECT ticker, revenue AS r23 FROM financials WHERE fiscal_year = 2023),
y24 AS (SELECT ticker, revenue AS r24 FROM financials WHERE fiscal_year = 2024)
SELECT a.ticker FROM y23 a JOIN y24 b ON a.ticker = b.ticker WHERE b.r24 > a.r23
)
SELECT ticker FROM price_up
INTERSECT
SELECT ticker FROM rev_grew;
-- 23
SELECT s.ticker, s.sector, f.fiscal_year, f.revenue,
LAG(f.revenue) OVER (PARTITION BY f.ticker ORDER BY f.fiscal_year) AS prev_revenue,
ROUND(100.0*(f.revenue - LAG(f.revenue) OVER (PARTITION BY f.ticker ORDER BY f.fiscal_year))
/ LAG(f.revenue) OVER (PARTITION BY f.ticker ORDER BY f.fiscal_year), 2) AS yoy_growth
FROM stocks s JOIN financials f ON s.ticker = f.ticker;
-- 24
SELECT ticker, sector, eps FROM (
SELECT f.ticker, s.sector, f.eps,
ROW_NUMBER() OVER (PARTITION BY s.sector ORDER BY f.eps DESC) AS rn
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024
) WHERE rn <= 2;
-- 25
WITH latest AS (
SELECT ticker, close_price FROM (
SELECT ticker, close_price,
ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price_date DESC) AS rn
FROM prices
) WHERE rn = 1
)
SELECT s.ticker, s.sector, l.close_price, f.eps, f.net_income
FROM stocks s
LEFT JOIN latest l ON s.ticker = l.ticker
LEFT JOIN financials f ON s.ticker = f.ticker AND f.fiscal_year = 2024;
-- 26
INSERT INTO stocks VALUES ('BAJFINANCE', 'Bajaj Finance', 'Finance', 500000);
-- 27
INSERT INTO prices (ticker, price_date, open_price, close_price, volume)
VALUES
('BAJFINANCE', '2024-01-01', 7150, 7200, 300000),
('BAJFINANCE', '2024-02-01', 7200, 7450, 320000);
-- 28
UPDATE stocks SET market_cap = 320000 WHERE ticker = 'WIPRO';
-- 29
CREATE VIEW it_sector_summary AS
SELECT s.ticker, s.company_name, s.market_cap, f.fiscal_year, f.revenue, f.net_income, f.eps
FROM stocks s JOIN financials f ON s.ticker = f.ticker
WHERE s.sector = 'IT';
-- 30
CREATE INDEX idx_prices_ticker_date ON prices(ticker, price_date);Capstone Complete
If you worked through all 30 questions, you can:
- [ ] Write complex multi-CTE queries from scratch
- [ ] Use window functions for rankings, running totals, and MoM analysis
- [ ] Join 3+ tables and analyze at multiple granularities
- [ ] Design schemas with proper constraints and indexes
- [ ] Write queries that directly answer business questions
You are now job-ready for data analyst and SQL-heavy roles.
What to Do Next
- Push this project to GitHub — show it in your portfolio
- Practice on LeetCode SQL — Database problems section, Hard level
- Set up PostgreSQL — learn the differences from SQLite
- Explore Google BigQuery — free tier, industry-scale data
- Build Project 2 — Company financial health dashboard with real NSE/BSE data