SQL Mastery Course

23 weeks · 3 phases · From zero to job-ready

0 / 23 complete

W01

What is SQL + Databases

W02

Filtering Data

W03

Aggregate Functions

W04

GROUP BY + HAVING

W05

Subqueries

W06

NULL Values

W07

Mini Project 1

W08

JOINs — INNER & LEFT

W09

JOINs — RIGHT, FULL & CROSS

W10

String Functions

W11

Date & Time Functions

W12

CASE Statements

W13

Views

W14

Stored Procedures

W15

Mini Project 2

W16

Window Functions — Ranking

W17

Window Functions — LAG/LEAD

W18

CTEs

W19

Indexes & Performance

W20

Transactions & ACID

W21

Advanced Subqueries

W22

Interview Prep

W23

Capstone Project

Week 1 PHASE 1

What is SQL + Databases

Week 1 — What is SQL + Databases

Key Concepts

A database is an organized collection of data stored in tables
A table has columns (categories) and rows (data)
SQL is the language used to query (ask questions of) a database

Core Syntax

SQL

SELECT   -- what columns you want
FROM     -- which table
WHERE    -- filter condition (optional)

Your First Queries

Show all data

SQL

SELECT *
FROM employees;

Show specific columns

SQL

SELECT name, salary
FROM employees;

Filter with WHERE

SQL

SELECT *
FROM employees
WHERE department = 'Finance';

Multiple conditions

SQL

SELECT *
FROM employees
WHERE department = 'Finance' AND salary > 50000;

Sort results

SQL

SELECT *
FROM employees
ORDER BY salary DESC;

DESC = highest first. ASC = lowest first.

Practice Exercises

Using this table employees:

id	name	department	salary
1	Dattatray	Finance	60000
2	Rahul	Engineering	80000
3	Priya	Finance	55000
4	Amit	Marketing	45000
5	Sneha	Engineering	90000

Write SQL queries to answer:

Show all employees
Show only name and department of all employees
Show all employees in the Engineering department
Show all employees earning more than 55000
Show all Finance employees earning more than 55000
Show all employees sorted by salary from highest to lowest
Show only the name of employees in Marketing

Answers below — try first before looking!

Answers

💡 Key Things to Remember

SQL is not case sensitive — SELECT and select both work, but uppercase is convention
Always end your query with ;
Text values go in single quotes: 'Finance'
Number values don't need quotes: 50000

→ Next Week

WHERE with more conditions: AND, OR, NOT, IN, BETWEEN, LIKE
Filtering like a pro

Week 2 PHASE 1

Filtering Data

Week 2 — Filtering Data

📌 What You Already Know

SELECT, FROM, WHERE, ORDER BY
Filtering with a single condition: WHERE salary > 50000

This week you go deeper — combining conditions and using powerful filter tools.

Combining Conditions

AND — both conditions must be true

SQL

SELECT *
FROM employees
WHERE department = 'Finance' AND salary > 55000;

Returns only Finance employees who also earn more than 55000.

OR — either condition can be true

SQL

SELECT *
FROM employees
WHERE department = 'Finance' OR department = 'Marketing';

Returns employees from Finance or Marketing.

NOT — exclude a condition

SQL

SELECT *
FROM employees
WHERE NOT department = 'Finance';

Returns everyone except Finance employees.

IN — match a list of values

Instead of chaining multiple ORs:

SQL

-- Long way
WHERE department = 'Finance' OR department = 'Marketing' OR department = 'HR'

-- Clean way with IN
WHERE department IN ('Finance', 'Marketing', 'HR')

You can also use NOT IN to exclude:

SQL

SELECT *
FROM employees
WHERE department NOT IN ('Finance', 'Engineering');

BETWEEN — filter a range

SQL

SELECT *
FROM employees
WHERE salary BETWEEN 50000 AND 75000;

Includes both endpoints (50000 and 75000 are included).

Equivalent to:

SQL

WHERE salary >= 50000 AND salary <= 75000

LIKE — pattern matching for text

LIKE lets you search for partial matches using wildcards:

% = any number of characters (including zero)
_ = exactly one character

SQL

-- Names starting with 'S'
SELECT * FROM employees WHERE name LIKE 'S%';

-- Names ending with 'a'
SELECT * FROM employees WHERE name LIKE '%a';

-- Names containing 'rah' anywhere
SELECT * FROM employees WHERE name LIKE '%rah%';

-- 5-letter names starting with 'A'
SELECT * FROM employees WHERE name LIKE 'A____';

Combining Everything

SQL

SELECT *
FROM employees
WHERE department IN ('Finance', 'Engineering')
  AND salary BETWEEN 60000 AND 90000
  AND name LIKE 'S%'
ORDER BY salary DESC;

Practice Exercises

Using the same employees table:

id	name	department	salary
1	Dattatray	Finance	60000
2	Rahul	Engineering	80000
3	Priya	Finance	55000
4	Amit	Marketing	45000
5	Sneha	Engineering	90000
6	Sanjay	HR	48000
7	Divya	Marketing	52000

Write queries to answer:

Show employees from Finance or HR
Show employees not in Engineering
Show employees with salary between 50000 and 65000
Show employees whose name starts with 'S'
Show employees in Finance or Marketing with salary above 50000
Show employees whose name contains the letter 'a' (anywhere)
Show Engineering employees earning more than 75000, sorted by salary descending

Try first before looking at answers!

Answers

💡 Key Things to Remember

AND is stricter — both must be true
OR is broader — either can be true
IN is cleaner than chaining many ORs
BETWEEN includes the boundary values
LIKE is for text patterns — % is your wildcard
You can combine all of these in one WHERE clause

→ Next Week

Functions: COUNT, SUM, AVG, MIN, MAX
Answer questions like: "How many employees are in Finance?" or "What's the average salary?"

Week 3 PHASE 1

Aggregate Functions

Week 3 — Aggregate Functions

📌 What You Already Know

SELECT, FROM, WHERE, ORDER BY
Filtering with AND, OR, NOT, IN, BETWEEN, LIKE

This week you learn to summarize data — count rows, add up values, find averages.

The 5 Core Aggregate Functions

Function	What it does
`COUNT()`	Count number of rows
`SUM()`	Add up values
`AVG()`	Calculate average
`MIN()`	Find the lowest value
`MAX()`	Find the highest value

COUNT

SQL

-- How many employees are there?
SELECT COUNT(*) FROM employees;

-- How many employees are in Finance?
SELECT COUNT(*) FROM employees
WHERE department = 'Finance';

-- Give the result a clear name with AS
SELECT COUNT(*) AS total_employees FROM employees;

AS renames the output column — always use it with aggregates so results are readable.

SUM

SQL

-- Total salary bill
SELECT SUM(salary) AS total_salary FROM employees;

-- Total salary for Engineering only
SELECT SUM(salary) AS engineering_salary
FROM employees
WHERE department = 'Engineering';

AVG

SQL

-- Average salary across all employees
SELECT AVG(salary) AS avg_salary FROM employees;

-- Average salary in Finance
SELECT AVG(salary) AS avg_finance_salary
FROM employees
WHERE department = 'Finance';

MIN and MAX

SQL

-- Lowest and highest salary
SELECT MIN(salary) AS lowest, MAX(salary) AS highest
FROM employees;

-- Highest salary in Engineering
SELECT MAX(salary) AS top_eng_salary
FROM employees
WHERE department = 'Engineering';

Combining Multiple Aggregates

SQL

SELECT
    COUNT(*) AS headcount,
    SUM(salary) AS total_payroll,
    AVG(salary) AS avg_salary,
    MIN(salary) AS lowest_salary,
    MAX(salary) AS highest_salary
FROM employees;

Practice Exercises

Using employees:

id	name	department	salary
1	Dattatray	Finance	60000
2	Rahul	Engineering	80000
3	Priya	Finance	55000
4	Amit	Marketing	45000
5	Sneha	Engineering	90000
6	Sanjay	HR	48000
7	Divya	Marketing	52000

How many total employees are there?
What is the total payroll (sum of all salaries)?
What is the average salary?
What is the highest salary in Marketing?
What is the lowest salary in the entire company?
How many employees are in Engineering?
Show count, average, min, and max salary all in one query

Try first!

Answers

💡 Key Things to Remember

Aggregate functions collapse many rows into one number
Always use AS to name your output columns
You can use WHERE before aggregating to filter first
COUNT(*) counts all rows; COUNT(column) skips NULLs

→ Next Week

GROUP BY — run aggregates per group (e.g. avg salary per department)
HAVING — filter groups after aggregating

Week 4 PHASE 1

GROUP BY + HAVING

Week 4 — GROUP BY + HAVING

📌 What You Already Know

COUNT, SUM, AVG, MIN, MAX
Using WHERE to filter before aggregating

This week you learn to group data — get aggregates per department, per city, per category.

GROUP BY — Aggregates Per Group

Without GROUP BY, aggregates return one row for the whole table.

With GROUP BY, you get one row per group.

SQL

-- Average salary per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;

Result:

department	avg_salary
Finance	57500
Engineering	85000
Marketing	48500
HR	48000

Rules for GROUP BY

Every column in SELECT must either:

Be in GROUP BY, OR
Be inside an aggregate function

SQL

-- CORRECT
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department;

-- WRONG — 'name' is not grouped or aggregated
SELECT department, name, COUNT(*)
FROM employees
GROUP BY department;

Multiple Columns in GROUP BY

SQL

-- Count employees per department per job title
SELECT department, job_title, COUNT(*) AS count
FROM employees
GROUP BY department, job_title;

ORDER BY with GROUP BY

SQL

-- Departments sorted by average salary (highest first)
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC;

HAVING — Filter Groups After Aggregating

WHERE filters rows before grouping.

HAVING filters groups after aggregating.

SQL

-- Only departments with more than 1 employee
SELECT department, COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 1;

-- Only departments with average salary above 60000
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 60000;

WHERE vs HAVING — The Key Difference

SQL

-- WHERE filters individual rows first, THEN GROUP BY runs
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE salary > 45000          -- filter rows first
GROUP BY department
HAVING AVG(salary) > 60000;  -- then filter groups

Think of it this way:

WHERE = filter before cooking
HAVING = filter after cooking

Full Query Order

SQL clauses always go in this order:

SQL

SELECT   department, AVG(salary)
FROM     employees
WHERE    salary > 40000
GROUP BY department
HAVING   AVG(salary) > 55000
ORDER BY AVG(salary) DESC;

Practice Exercises

Show headcount (number of employees) per department
Show total payroll per department
Show average salary per department, sorted highest to lowest
Show only departments with more than 1 employee
Show departments where average salary is above 55000
Show departments with total payroll above 100000
Show departments with more than 1 employee, sorted by headcount descending

Answers

💡 Key Things to Remember

GROUP BY splits data into groups, aggregates run per group
Every non-aggregate in SELECT must be in GROUP BY
WHERE filters rows before grouping — HAVING filters groups after
You can use both WHERE and HAVING in the same query

→ Next Week

String functions — UPPER, LOWER, TRIM, LENGTH, SUBSTR
Date functions — DATE, YEAR, MONTH, strftime

Week 5 PHASE 1

Subqueries

Week 5 — String + Date Functions

📌 What You Already Know

GROUP BY, HAVING, aggregate functions

This week you learn to clean and transform text and date values inside queries.

String Functions

UPPER and LOWER — change case

SQL

SELECT UPPER(name) AS name_upper FROM employees;
-- 'Dattatray' → 'DATTATRAY'

SELECT LOWER(name) AS name_lower FROM employees;
-- 'Dattatray' → 'dattatray'

LENGTH — count characters

SQL

SELECT name, LENGTH(name) AS name_length FROM employees;
-- 'Sneha' → 5

TRIM — remove leading/trailing spaces

SQL

SELECT TRIM('  hello  ') AS clean;
-- '  hello  ' → 'hello'

Useful when data has accidental spaces.

SUBSTR — extract part of a string

SQL

-- SUBSTR(column, start_position, length)
SELECT SUBSTR(name, 1, 3) AS short_name FROM employees;
-- 'Dattatray' → 'Dat'
-- 'Rahul' → 'Rah'

REPLACE — swap text

SQL

SELECT REPLACE(department, 'Engineering', 'Eng') AS dept
FROM employees;

Combining string functions

SQL

SELECT
    name,
    UPPER(SUBSTR(name, 1, 1)) || LOWER(SUBSTR(name, 2)) AS formatted_name
FROM employees;

|| is SQLite's string concatenation operator.

Date Functions (SQLite)

SQLite stores dates as text in YYYY-MM-DD format.

DATE — get today's date

SQL

SELECT DATE('now');
-- '2026-04-25'

strftime — extract parts of a date

SQL

-- strftime('%Y', date_column) → year
-- strftime('%m', date_column) → month (01-12)
-- strftime('%d', date_column) → day

SELECT
    hire_date,
    strftime('%Y', hire_date) AS hire_year,
    strftime('%m', hire_date) AS hire_month
FROM employees;

Date arithmetic

SQL

-- Employees hired more than 2 years ago
SELECT * FROM employees
WHERE hire_date < DATE('now', '-2 years');

-- Employees hired this year
SELECT * FROM employees
WHERE strftime('%Y', hire_date) = strftime('%Y', DATE('now'));

Adding hire_date to practice

Your employees table can be extended like this for date practice:

id	name	department	salary	hire_date
1	Dattatray	Finance	60000	2021-03-15
2	Rahul	Engineering	80000	2019-07-01
3	Priya	Finance	55000	2022-11-20
4	Amit	Marketing	45000	2023-01-10
5	Sneha	Engineering	90000	2018-05-30

Practice Exercises

Show all employee names in UPPERCASE
Show each employee's name and the length of their name
Show the first 3 letters of each employee's name as short_name
Show hire year and hire month for each employee
Show employees hired before 2022
Show employees hired in 2022 or later, sorted by hire_date
Show names and department — replace 'Finance' with 'FIN'

Answers

💡 Key Things to Remember

String functions don't change the actual data — they transform it for display
SQLite uses || for string concatenation
SQLite uses strftime() for date formatting (PostgreSQL uses EXTRACT and TO_CHAR)
Always store dates as YYYY-MM-DD so sorting and comparison works correctly

→ Next Week

NULL handling — IS NULL, IS NOT NULL, COALESCE
What happens when data is missing

Week 6 PHASE 1

NULL Values

Week 6 — NULL Handling

📌 What You Already Know

Aggregate functions, GROUP BY, string and date functions

This week you tackle NULL — the most misunderstood thing in SQL.

What is NULL?

NULL means unknown or missing — it is not zero, not an empty string, not "N/A". It is the absence of a value.

Real examples of NULL:

An employee with no manager (they're the CEO)
A product with no discount set
A customer who hasn't entered their phone number

NULL breaks normal comparisons

SQL

-- This does NOT work as expected
SELECT * FROM employees WHERE bonus = NULL;   -- returns nothing!
SELECT * FROM employees WHERE bonus != NULL;  -- also returns nothing!

NULL is not equal to anything — not even itself.

Use IS NULL and IS NOT NULL instead:

SQL

-- Employees with no bonus set
SELECT * FROM employees WHERE bonus IS NULL;

-- Employees who have a bonus
SELECT * FROM employees WHERE bonus IS NOT NULL;

NULL in aggregate functions

SQL

-- COUNT(*) counts all rows including NULLs
-- COUNT(column) skips NULLs

SELECT
    COUNT(*) AS total_rows,
    COUNT(bonus) AS rows_with_bonus
FROM employees;

SUM, AVG, MIN, MAX also skip NULLs automatically.

COALESCE — replace NULL with a default

COALESCE(value, fallback) returns the first non-NULL value.

SQL

-- Show 0 instead of NULL for missing bonuses
SELECT name, COALESCE(bonus, 0) AS bonus
FROM employees;

-- Multiple fallbacks
SELECT name, COALESCE(bonus, commission, 0) AS extra_pay
FROM employees;

IFNULL — SQLite shorthand for COALESCE

SQL

SELECT name, IFNULL(bonus, 0) AS bonus
FROM employees;

IFNULL(a, b) = COALESCE(a, b) — same thing, two names.

NULLIF — turn a value into NULL

NULLIF(a, b) returns NULL if a = b, otherwise returns a.

SQL

-- Treat 0 as NULL (useful to avoid division by zero)
SELECT salary / NULLIF(hours_worked, 0) AS hourly_rate
FROM employees;

NULL in calculations

Any math with NULL returns NULL:

SQL

SELECT 100 + NULL;   -- NULL
SELECT salary + NULL AS total FROM employees;  -- all NULLs

Fix it with COALESCE:

SQL

SELECT salary + COALESCE(bonus, 0) AS total_comp
FROM employees;

Extended employees table for practice

id	name	department	salary	bonus
1	Dattatray	Finance	60000	5000
2	Rahul	Engineering	80000	NULL
3	Priya	Finance	55000	3000
4	Amit	Marketing	45000	NULL
5	Sneha	Engineering	90000	8000
6	Sanjay	HR	48000	NULL

Practice Exercises

Show employees who have no bonus
Show employees who do have a bonus
Show all employees with bonus — display 0 if NULL
Show total compensation (salary + bonus) for each employee — treat NULL bonus as 0
Count how many employees have a bonus set
Show employees with no bonus in Finance
Show the average bonus — but only count employees who actually have one

Answers

💡 Key Things to Remember

NULL = unknown, not zero
Always use IS NULL / IS NOT NULL — never = NULL
COALESCE is your go-to for replacing NULLs with defaults
COUNT(column) skips NULLs; COUNT(*) does not
Math with NULL always returns NULL — use COALESCE to fix it

→ Next Week

Mini Project 1 — Sales data analysis using everything from Phase 1

Week 7 PROJECT

Mini Project 1

Week 7 — Mini Project 1: Sales Data Analysis

What This Tests

Everything from Phase 1:

SELECT, FROM, WHERE, ORDER BY
AND, OR, IN, BETWEEN, LIKE
COUNT, SUM, AVG, MIN, MAX
GROUP BY, HAVING
String + Date functions
NULL handling

The Dataset

You're a data analyst at a retail company. You have two tables:

Table: `sales`

sale_id	product	category	amount	sale_date	rep_id
1	Laptop	Electronics	85000	2024-01-15	101
2	Phone	Electronics	45000	2024-01-20	102
3	Desk	Furniture	12000	2024-02-05	101
4	Chair	Furniture	8000	2024-02-10	103
5	Tablet	Electronics	30000	2024-03-01	102
6	Monitor	Electronics	22000	2024-03-15	101
7	Keyboard	Electronics	5000	2024-04-02	103
8	Sofa	Furniture	35000	2024-04-18	102
9	TV	Electronics	55000	2024-05-10	NULL
10	Bookshelf	Furniture	9500	2024-05-25	103

Table: `reps`

rep_id	rep_name	region
101	Dattatray	West
102	Priya	East
103	Rahul	North

🛠 Setup SQL — Run This First

SQL

CREATE TABLE IF NOT EXISTS sales (
    sale_id INTEGER PRIMARY KEY,
    product TEXT,
    category TEXT,
    amount INTEGER,
    sale_date TEXT,
    rep_id INTEGER
);

INSERT INTO sales VALUES
(1, 'Laptop', 'Electronics', 85000, '2024-01-15', 101),
(2, 'Phone', 'Electronics', 45000, '2024-01-20', 102),
(3, 'Desk', 'Furniture', 12000, '2024-02-05', 101),
(4, 'Chair', 'Furniture', 8000, '2024-02-10', 103),
(5, 'Tablet', 'Electronics', 30000, '2024-03-01', 102),
(6, 'Monitor', 'Electronics', 22000, '2024-03-15', 101),
(7, 'Keyboard', 'Electronics', 5000, '2024-04-02', 103),
(8, 'Sofa', 'Furniture', 35000, '2024-04-18', 102),
(9, 'TV', 'Electronics', 55000, '2024-05-10', NULL),
(10, 'Bookshelf', 'Furniture', 9500, '2024-05-25', 103);

CREATE TABLE IF NOT EXISTS reps (
    rep_id INTEGER PRIMARY KEY,
    rep_name TEXT,
    region TEXT
);

INSERT INTO reps VALUES
(101, 'Dattatray', 'West'),
(102, 'Priya', 'East'),
(103, 'Rahul', 'North');

The Questions

Basic (5 questions)

Show all sales, ordered by amount descending
Show only Electronics sales
Show sales with amount between 10000 and 50000
Show sales where the product name starts with a vowel (A, E, I, O, U)
Show sales with no rep assigned (NULL rep_id)

Aggregations (5 questions)

What is the total revenue across all sales?
How many sales were made per category?
What is the average sale amount per category?
Which category has the highest total revenue?
Show only categories with total revenue above 100000

Date-based (3 questions)

Show all sales from Q1 2024 (January–March)
How many sales were made per month?
What is the total revenue per month, sorted by month?

Combined (2 questions)

Show Electronics sales above 30000, sorted by amount descending
Show the category, count, and total revenue — but only for categories with more than 3 sales

Answers

You Passed Phase 1 If You Can...

[ ] Write any SELECT query from memory
[ ] Filter with WHERE using multiple conditions
[ ] Use all 5 aggregate functions confidently
[ ] GROUP BY a column and filter groups with HAVING
[ ] Handle NULLs without making mistakes
[ ] Extract dates and do basic date filtering

Next Up: Phase 2 — Intermediate SQL

Week 8: JOINs Part 1 — combining data from multiple tables

Week 8 PHASE 2

JOINs — INNER & LEFT

Week 8 — JOINs Part 1: INNER JOIN + LEFT JOIN

📌 What You Already Know

All of Phase 1: SELECT, filter, aggregate, GROUP BY, NULL handling

This week you learn the most important skill in SQL — combining data from multiple tables.

Why JOINs?

Real databases never store everything in one table. Data is split across tables to avoid repetition:

orders table has order details + customer_id
customers table has customer names + addresses

To see an order with the customer's name, you need to JOIN the two tables.

The Tables We'll Use

`orders`

order_id	customer_id	product	amount
1	1	Laptop	85000
2	2	Phone	45000
3	1	Monitor	22000
4	3	Tablet	30000
5	99	Keyboard	5000

`customers`

customer_id	name	city
1	Dattatray	Mumbai
2	Priya	Delhi
3	Rahul	Pune
4	Sneha	Bangalore

Note: Order 5 has customer_id 99 (no matching customer). Customer 4 (Sneha) has no orders.

INNER JOIN — only matching rows from both tables

Returns rows where the join condition matches in both tables.

SQL

SELECT orders.order_id, customers.name, orders.product, orders.amount
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;

Result:

order_id	name	product	amount
1	Dattatray	Laptop	85000
2	Priya	Phone	45000
3	Dattatray	Monitor	22000
4	Rahul	Tablet	30000

Order 5 (customer_id 99) is excluded — no match.

Sneha is also excluded — she has no orders.

Using Aliases to Shorten Table Names

SQL

SELECT o.order_id, c.name, o.product, o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

o and c are aliases — much cleaner than repeating full table names.

LEFT JOIN — all rows from left table, matched rows from right

Returns all rows from the left table (before JOIN), and matching rows from the right. If no match, right side columns are NULL.

SQL

SELECT o.order_id, c.name, o.product, o.amount
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id;

Result:

order_id	name	product	amount
1	Dattatray	Laptop	85000
2	Priya	Phone	45000
3	Dattatray	Monitor	22000
4	Rahul	Tablet	30000
5	NULL	Keyboard	5000

Order 5 is now included — name is NULL because customer_id 99 doesn't exist.

LEFT JOIN to find unmatched rows

Classic use case — find orders with no customer:

SQL

SELECT o.order_id, o.product
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;

Returns order 5 — the "orphan" order with no matching customer.

Filtering on JOIN results

SQL

-- Orders for customers in Mumbai
SELECT o.order_id, c.name, o.product
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE c.city = 'Mumbai';

Aggregating after JOIN

SQL

-- Total spend per customer
SELECT c.name, SUM(o.amount) AS total_spent
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
GROUP BY c.name
ORDER BY total_spent DESC;

🛠 Setup SQL — Run This First

SQL

CREATE TABLE IF NOT EXISTS customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT,
    city TEXT
);

INSERT INTO customers VALUES
(1, 'Dattatray', 'Mumbai'),
(2, 'Priya', 'Delhi'),
(3, 'Rahul', 'Pune'),
(4, 'Sneha', 'Bangalore');

CREATE TABLE IF NOT EXISTS orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    product TEXT,
    amount INTEGER
);

INSERT INTO orders VALUES
(1, 1, 'Laptop', 85000),
(2, 2, 'Phone', 45000),
(3, 1, 'Monitor', 22000),
(4, 3, 'Tablet', 30000),
(5, 99, 'Keyboard', 5000);

Practice Exercises

Show all orders with customer name (INNER JOIN)
Show all orders including those with no matching customer (LEFT JOIN)
Show only orders that have no matching customer
Show total amount spent per customer name
Show orders for customers in Delhi or Pune
How many orders has each customer made?
Which customer has the highest total spend?

Answers

💡 Key Things to Remember

INNER JOIN = only matching rows from both sides
LEFT JOIN = all rows from left, NULLs where no match on right
Always use aliases (o, c) for cleaner queries
Join condition goes after ON — usually matching IDs

→ Next Week

JOINs Part 2 — RIGHT JOIN, FULL OUTER JOIN, self joins, joining 3+ tables

Week 9 PHASE 2

JOINs — RIGHT, FULL & CROSS

Week 9 — JOINs Part 2: RIGHT JOIN, FULL OUTER JOIN, Self Joins, Multi-table JOINs

📌 What You Already Know

INNER JOIN — only matching rows
LEFT JOIN — all left rows + matching right rows

RIGHT JOIN — all rows from right table

Same as LEFT JOIN but flipped. Returns all rows from the right table, and matches from the left.

SQL

SELECT o.order_id, c.name, o.product
FROM orders o
RIGHT JOIN customers c ON o.customer_id = c.customer_id;

Result:

order_id	name	product
1	Dattatray	Laptop
3	Dattatray	Monitor
2	Priya	Phone
4	Rahul	Tablet
NULL	Sneha	NULL

Sneha is included even though she has no orders.

> Note: SQLite does not natively support RIGHT JOIN. Flip your tables and use LEFT JOIN instead:

> FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id

FULL OUTER JOIN — all rows from both tables

Returns all rows from both tables. Where there's no match, the missing side is NULL.

SQL

SELECT o.order_id, c.name, o.product
FROM orders o
FULL OUTER JOIN customers c ON o.customer_id = c.customer_id;

Returns: all orders (including orphans) + all customers (including those with no orders).

> SQLite does not support FULL OUTER JOIN natively. Simulate with UNION of LEFT JOIN + RIGHT JOIN:

SQL

SELECT o.order_id, c.name, o.product
FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id
UNION
SELECT o.order_id, c.name, o.product
FROM customers c LEFT JOIN orders o ON c.customer_id = o.customer_id;

Self Join — joining a table to itself

Used when rows in a table reference other rows in the same table.

Classic example: employees table with a manager_id column.

SQL

-- employees table: id, name, manager_id
SELECT
    e.name AS employee,
    m.name AS manager
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

This reads the same table twice with different aliases (e for employee, m for manager).

Joining 3+ Tables

SQL

-- orders → customers → regions (3 tables)
SELECT
    o.order_id,
    c.name AS customer,
    r.region_name,
    o.amount
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
INNER JOIN regions r ON c.region_id = r.region_id;

Just keep chaining JOIN ... ON ... for each additional table.

JOIN Cheat Sheet

JOIN Type	Returns
INNER JOIN	Only rows with matches in both tables
LEFT JOIN	All left rows + matching right rows (NULL if no match)
RIGHT JOIN	All right rows + matching left rows (NULL if no match)
FULL OUTER JOIN	All rows from both tables (NULL where no match)
Self JOIN	Table joined to itself using aliases

Practice Exercises

Using orders, customers, and this new products table:

`products`

product_id	product_name	category	list_price
1	Laptop	Electronics	90000
2	Phone	Electronics	48000
3	Monitor	Electronics	25000
4	Tablet	Electronics	32000
5	Keyboard	Electronics	6000

Show all customers including those with no orders (simulate RIGHT JOIN using LEFT JOIN)
Show orders joined with customers and city
Show all customers who have never placed an order
Join orders to customers to regions (add a region column to customers for this: Mumbai=West, Delhi=North, Pune=West, Bangalore=South)
Show each customer's name alongside their total spend, ordered by spend descending

Answers

💡 Key Things to Remember

SQLite doesn't support RIGHT JOIN or FULL OUTER JOIN — flip tables and use LEFT JOIN
Self JOIN needs two aliases for the same table
Chain as many JOINs as you need — just keep matching the right columns
Always qualify column names with table aliases when joining (avoids ambiguity)

→ Next Week

Subqueries — queries inside queries

Week 10 PHASE 2

String Functions

Week 10 — Subqueries

📌 What You Already Know

JOINs — combining tables
GROUP BY + HAVING — aggregating groups

A subquery is a query inside another query. It lets you use the result of one query as input to another.

Subquery in WHERE

Answer questions like: "Show employees who earn more than the average salary."

SQL

-- Step 1: what's the average?
SELECT AVG(salary) FROM employees;  -- returns 67000

-- Step 2: use it as a filter
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The inner query runs first, returns a value, and the outer query uses it.

Subquery with IN

SQL

-- Show orders for customers in Mumbai
SELECT * FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE city = 'Mumbai'
);

The subquery returns a list of IDs, and IN checks against that list.

Subquery with NOT IN

SQL

-- Show customers who have never placed an order
SELECT * FROM customers
WHERE customer_id NOT IN (
    SELECT customer_id FROM orders WHERE customer_id IS NOT NULL
);

Always add WHERE customer_id IS NOT NULL in the subquery — NOT IN breaks silently if the list contains NULL.

Subquery in FROM (Derived Table)

Use a subquery as if it were a table. Must give it an alias.

SQL

-- Average of department averages
SELECT AVG(dept_avg) AS overall_avg
FROM (
    SELECT department, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY department
) AS dept_summary;

Subquery in SELECT (Scalar Subquery)

Returns a single value used as a column in the output.

SQL

-- Show each employee's salary alongside the company average
SELECT
    name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;

Correlated Subquery

A subquery that references the outer query. Runs once per row — slower but powerful.

SQL

-- Show employees who earn more than their department's average
SELECT name, department, salary
FROM employees e
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department = e.department  -- references outer query's row
);

Subquery vs JOIN — when to use which

Use subquery when...	Use JOIN when...
You need a single value (scalar)	You need columns from both tables
You're filtering with IN/NOT IN	You're combining data
The logic reads more clearly	Performance matters (JOINs are usually faster)

Practice Exercises

Show employees who earn more than the average salary
Show orders placed by customers in Pune
Show customers who have never placed an order (using NOT IN)
Show each employee's salary and the company-wide average salary as a second column
Show departments where at least one employee earns more than 80000
Show the highest-paid employee in each department (using a subquery in WHERE)

Answers

💡 Key Things to Remember

Subquery in WHERE returns a value or list to filter against
Subquery in FROM creates a temporary table — always alias it
Subquery in SELECT returns a single value per row
NOT IN with NULLs in the list returns no results — always filter NULLs in the subquery
Correlated subqueries are powerful but slow — prefer JOINs when possible

→ Next Week

CASE WHEN — conditional logic inside queries

Week 11 PHASE 2

Date & Time Functions

Week 11 — CASE WHEN

📌 What You Already Know

Subqueries, JOINs, aggregations

CASE WHEN is SQL's if/else. It lets you add conditional logic inside a query.

Basic Syntax

SQL

CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ELSE default_result
END

Simple Example — Salary Bands

SQL

SELECT
    name,
    salary,
    CASE
        WHEN salary >= 80000 THEN 'High'
        WHEN salary >= 55000 THEN 'Mid'
        ELSE 'Low'
    END AS salary_band
FROM employees;

Result:

name	salary	salary_band
Dattatray	60000	Mid
Rahul	80000	High
Priya	55000	Mid
Amit	45000	Low
Sneha	90000	High

CASE WHEN with AND / OR

SQL

SELECT
    name,
    department,
    salary,
    CASE
        WHEN department = 'Engineering' AND salary > 80000 THEN 'Top Engineer'
        WHEN department = 'Finance' THEN 'Finance Team'
        ELSE 'Other'
    END AS label
FROM employees;

CASE WHEN inside aggregate functions

Conditional aggregation — count or sum only rows that match a condition.

SQL

-- Count high-salary and low-salary employees in one query
SELECT
    COUNT(CASE WHEN salary >= 70000 THEN 1 END) AS high_earners,
    COUNT(CASE WHEN salary < 70000 THEN 1 END) AS low_earners
FROM employees;

-- Total payroll split by department
SELECT
    SUM(CASE WHEN department = 'Engineering' THEN salary ELSE 0 END) AS eng_payroll,
    SUM(CASE WHEN department = 'Finance' THEN salary ELSE 0 END) AS fin_payroll
FROM employees;

CASE WHEN in ORDER BY

SQL

-- Sort Finance first, then everything else alphabetically
SELECT * FROM employees
ORDER BY
    CASE WHEN department = 'Finance' THEN 0 ELSE 1 END,
    name;

CASE WHEN in GROUP BY

SQL

-- Group by salary band
SELECT
    CASE
        WHEN salary >= 80000 THEN 'High'
        WHEN salary >= 55000 THEN 'Mid'
        ELSE 'Low'
    END AS salary_band,
    COUNT(*) AS headcount
FROM employees
GROUP BY salary_band;

Simple CASE (matching a single column)

SQL

-- Abbreviated syntax when checking one column
SELECT
    name,
    CASE department
        WHEN 'Finance' THEN 'FIN'
        WHEN 'Engineering' THEN 'ENG'
        WHEN 'Marketing' THEN 'MKT'
        ELSE 'OTHER'
    END AS dept_code
FROM employees;

Practice Exercises

Add a column salary_band — 'High' (≥80k), 'Mid' (55k–79k), 'Low' (<55k)
Add a column is_senior — 'Yes' if salary > 70000, 'No' otherwise
Count how many employees are in each salary band
Show total payroll split into Engineering vs Non-Engineering in one row
Show employees — label Finance employees as 'Priority', others as 'Standard'
Sort employees: Engineering first, Finance second, rest last

Answers

💡 Key Things to Remember

CASE WHEN evaluates top to bottom — first match wins
Always end with END (and optionally AS column_name)
ELSE is optional — without it, unmatched rows return NULL
You can use CASE WHEN inside COUNT, SUM, ORDER BY, GROUP BY

→ Next Week

UNION + INTERSECT — combining results from multiple queries

Week 12 PHASE 2

CASE Statements

Week 12 — UNION, INTERSECT, EXCEPT

📌 What You Already Know

JOINs combine columns side-by-side
UNION and friends stack results on top of each other

UNION — combine rows from two queries

SQL

SELECT name, city FROM customers_2023
UNION
SELECT name, city FROM customers_2024;

Rules:

Both queries must have the same number of columns
Columns must have compatible data types
Column names come from the first query
UNION removes duplicate rows automatically

UNION ALL — keep duplicates

SQL

SELECT name FROM employees_india
UNION ALL
SELECT name FROM employees_usa;

UNION ALL is faster than UNION because it skips the deduplication step. Use it when you know there are no duplicates or you want to count them.

UNION vs JOIN — the difference

UNION	JOIN
Stacks rows vertically	Combines columns horizontally
Both queries need same columns	Tables linked by a shared key
Used to combine similar datasets	Used to enrich data from related tables

INTERSECT — rows that appear in both queries

SQL

-- Customers who placed orders in both 2023 AND 2024
SELECT customer_id FROM orders_2023
INTERSECT
SELECT customer_id FROM orders_2024;

Returns only rows that exist in both result sets.

EXCEPT — rows in first query but not the second

SQL

-- Customers from 2023 who did NOT order in 2024
SELECT customer_id FROM orders_2023
EXCEPT
SELECT customer_id FROM orders_2024;

Practical Example: Monthly Report

SQL

-- Combine Q1 and Q2 sales into one result
SELECT 'Q1' AS quarter, product, amount FROM sales_q1
UNION ALL
SELECT 'Q2' AS quarter, product, amount FROM sales_q2
ORDER BY quarter, amount DESC;

Using UNION with same table (different filters)

SQL

-- Top earners vs specific department — tagged rows
SELECT name, salary, 'Top Earner' AS tag
FROM employees WHERE salary > 80000
UNION
SELECT name, salary, 'Finance' AS tag
FROM employees WHERE department = 'Finance';

Practice Exercises

For these exercises, imagine two tables:

orders_jan — orders from January
orders_feb — orders from February

Both have: order_id, customer_id, product, amount

Show all orders from January and February combined (no duplicates)
Show all orders from both months including duplicates
Show customer IDs that placed orders in both months
Show customer IDs that ordered in January but NOT in February
From the employees table, show all employees earning > 80000 labeled 'High' and all in Finance labeled 'Finance' using UNION

Answers

💡 Key Things to Remember

UNION removes duplicates; UNION ALL keeps them (and is faster)
Both queries must have the same number and type of columns
INTERSECT = in both; EXCEPT = in first but not second
SQLite supports UNION, UNION ALL, INTERSECT, EXCEPT

→ Next Week

INSERT, UPDATE, DELETE — modifying data in tables

Week 13 PHASE 2

Views

Week 13 — INSERT, UPDATE, DELETE

📌 What You Already Know

All SELECT-based querying
You've been reading data — now you'll write it

INSERT — add new rows

Insert a single row

SQL

INSERT INTO employees (name, department, salary)
VALUES ('Kavya', 'HR', 52000);

Insert multiple rows at once

SQL

INSERT INTO employees (name, department, salary)
VALUES
    ('Vikram', 'Engineering', 78000),
    ('Meera', 'Finance', 63000),
    ('Arjun', 'Marketing', 47000);

Insert without column names (match exact column order)

SQL

INSERT INTO employees VALUES (8, 'Kavya', 'HR', 52000, NULL);

Risky — if column order changes, this breaks. Always name columns.

Insert from another table

SQL

INSERT INTO employees_archive
SELECT * FROM employees WHERE hire_date < '2020-01-01';

UPDATE — modify existing rows

Always use WHERE — or you update every row

SQL

-- Give Amit a raise
UPDATE employees
SET salary = 50000
WHERE name = 'Amit';

-- Update multiple columns at once
UPDATE employees
SET salary = 50000, department = 'Sales'
WHERE name = 'Amit';

-- Give everyone in Finance a 10% raise
UPDATE employees
SET salary = salary * 1.10
WHERE department = 'Finance';

Rule: Always write the WHERE clause first mentally, then add SET. Forgetting WHERE updates every row.

DELETE — remove rows

SQL

-- Delete one employee
DELETE FROM employees WHERE name = 'Amit';

-- Delete all Marketing employees
DELETE FROM employees WHERE department = 'Marketing';

WARNING: DELETE FROM employees; (no WHERE) deletes all rows. The table structure stays, but all data is gone.

Safe workflow before UPDATE/DELETE

Before running a destructive query, run the SELECT version first:

SQL

-- Step 1: preview what you're about to change
SELECT * FROM employees WHERE department = 'Marketing';

-- Step 2: once you're sure, run the DELETE
DELETE FROM employees WHERE department = 'Marketing';

Transactions — undo mistakes

Wrap changes in a transaction so you can roll back if something goes wrong:

SQL

BEGIN;

UPDATE employees SET salary = salary * 1.10 WHERE department = 'Finance';

-- Check the result
SELECT * FROM employees WHERE department = 'Finance';

-- If happy: confirm
COMMIT;

-- If wrong: undo everything
ROLLBACK;

TRUNCATE vs DELETE

Command	What it does
`DELETE FROM table`	Removes all rows, can be rolled back
`DELETE FROM table WHERE ...`	Removes matching rows
`DROP TABLE table`	Removes the entire table including structure

SQLite doesn't have TRUNCATE — use DELETE FROM table; instead.

Practice Exercises

Insert a new employee: 'Kedar', Engineering, salary 72000
Insert 3 employees at once into the employees table
Give all Engineering employees a 15% salary increase
Change Priya's department from Finance to HR
Delete all employees with salary below 50000
Use a transaction to safely delete Marketing employees (preview first, then commit)

Answers

💡 Key Things to Remember

INSERT INTO table (cols) VALUES (...) — always name your columns
UPDATE without WHERE = updates every row (dangerous)
DELETE without WHERE = deletes every row (dangerous)
Preview with SELECT before running UPDATE or DELETE
Use transactions (BEGIN / COMMIT / ROLLBACK) for risky changes

→ Next Week

CREATE TABLE + constraints — designing your own tables

Week 14 PHASE 2

Stored Procedures

Week 14 — CREATE TABLE + Constraints

📌 What You Already Know

INSERT, UPDATE, DELETE
How to query and modify data

This week you learn to design tables from scratch — and enforce rules on the data.

CREATE TABLE

SQL

CREATE TABLE employees (
    id INTEGER,
    name TEXT,
    department TEXT,
    salary REAL,
    hire_date TEXT
);

Common Data Types (SQLite)

Type	Use for
`INTEGER`	Whole numbers: 1, 42, -5
`REAL`	Decimal numbers: 3.14, 99.9
`TEXT`	Strings: names, emails, dates
`BLOB`	Binary data: images, files
`NULL`	Missing/unknown value

Constraints — rules that protect your data

PRIMARY KEY — unique identifier for each row

SQL

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT,
    salary INTEGER
);

Each value must be unique
Cannot be NULL
AUTOINCREMENT makes it auto-generate

SQL

CREATE TABLE employees (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT
);

NOT NULL — column must always have a value

SQL

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,      -- required
    department TEXT NOT NULL,
    salary INTEGER           -- optional
);

UNIQUE — all values in column must be different

SQL

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE,       -- no two employees can share an email
    name TEXT
);

DEFAULT — use a value if none is provided

SQL

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    status TEXT DEFAULT 'active',
    created_at TEXT DEFAULT (DATE('now'))
);

CHECK — custom validation rule

SQL

CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    salary INTEGER CHECK (salary > 0),
    age INTEGER CHECK (age >= 18 AND age <= 70)
);

FOREIGN KEY — link to another table

SQL

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    amount REAL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Prevents inserting an order with a customer_id that doesn't exist in customers.

> SQLite: foreign key enforcement is OFF by default. Enable with:

> PRAGMA foreign_keys = ON;

Full Example: Designing a Finance App Schema

SQL

CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    email TEXT UNIQUE NOT NULL,
    city TEXT,
    created_at TEXT DEFAULT (DATE('now'))
);

CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_id INTEGER NOT NULL,
    account_type TEXT CHECK (account_type IN ('savings', 'current', 'investment')),
    balance REAL DEFAULT 0 CHECK (balance >= 0),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

CREATE TABLE transactions (
    txn_id INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id INTEGER NOT NULL,
    amount REAL NOT NULL,
    txn_type TEXT CHECK (txn_type IN ('credit', 'debit')),
    txn_date TEXT DEFAULT (DATE('now')),
    FOREIGN KEY (account_id) REFERENCES accounts(account_id)
);

ALTER TABLE — modify an existing table

SQLite has limited ALTER TABLE support:

SQL

-- Add a column
ALTER TABLE employees ADD COLUMN bonus INTEGER DEFAULT 0;

-- Rename the table
ALTER TABLE employees RENAME TO staff;

-- Rename a column (SQLite 3.25+)
ALTER TABLE employees RENAME COLUMN salary TO base_salary;

DROP TABLE — delete a table completely

SQL

DROP TABLE employees;

-- Safe version — no error if table doesn't exist
DROP TABLE IF EXISTS employees;

CREATE TABLE IF NOT EXISTS

SQL

CREATE TABLE IF NOT EXISTS employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

Safe to run multiple times — won't error if table already exists.

Practice Exercises

Create a products table with: id, name, category, price (must be > 0), stock (default 0)
Create a sales table linked to products via foreign key
Add a discount column to products with default 0
Create a users table with unique email and NOT NULL name
Write the schema for a simple personal finance tracker: accounts + transactions

Answers

💡 Key Things to Remember

PRIMARY KEY = unique + not null identifier for each row
NOT NULL = mandatory field
UNIQUE = no duplicates allowed
DEFAULT = auto-fill when value not provided
CHECK = custom validation
FOREIGN KEY = links rows across tables (enable with PRAGMA in SQLite)

→ Next Week

Mini Project 2 — Financial data analysis using JOINs + aggregations

Week 15 PROJECT

Mini Project 2

Week 15 — Mini Project 2: Financial Data Analysis

What This Tests

Everything from Phase 2:

INNER JOIN, LEFT JOIN, multi-table JOINs
Subqueries in WHERE, FROM, SELECT
CASE WHEN — conditional logic
UNION, INTERSECT, EXCEPT
INSERT, UPDATE, DELETE
CREATE TABLE + constraints

The Dataset: Personal Finance Tracker

You're building and querying a personal finance database.

Schema

SQL

CREATE TABLE accounts (
    account_id INTEGER PRIMARY KEY,
    account_name TEXT NOT NULL,
    account_type TEXT CHECK (account_type IN ('savings', 'checking', 'investment')),
    balance REAL DEFAULT 0
);

CREATE TABLE categories (
    category_id INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL,
    type TEXT CHECK (type IN ('income', 'expense'))
);

CREATE TABLE transactions (
    txn_id INTEGER PRIMARY KEY,
    account_id INTEGER,
    category_id INTEGER,
    amount REAL NOT NULL,
    txn_date TEXT,
    note TEXT,
    FOREIGN KEY (account_id) REFERENCES accounts(account_id),
    FOREIGN KEY (category_id) REFERENCES categories(category_id)
);

Setup Data

SQL

INSERT INTO accounts VALUES
(1, 'HDFC Savings', 'savings', 150000),
(2, 'SBI Checking', 'checking', 35000),
(3, 'Zerodha', 'investment', 200000);

INSERT INTO categories VALUES
(1, 'Salary', 'income'),
(2, 'Freelance', 'income'),
(3, 'Food', 'expense'),
(4, 'Rent', 'expense'),
(5, 'Transport', 'expense'),
(6, 'Entertainment', 'expense'),
(7, 'Investments', 'expense');

INSERT INTO transactions VALUES
(1, 1, 1, 85000, '2024-01-01', 'Monthly salary'),
(2, 1, 3, -8500, '2024-01-05', 'Groceries'),
(3, 1, 4, -20000, '2024-01-07', 'Rent January'),
(4, 2, 5, -2500, '2024-01-10', 'Uber rides'),
(5, 1, 6, -3000, '2024-01-15', 'Movies + dinner'),
(6, 1, 2, 15000, '2024-01-20', 'Freelance project'),
(7, 3, 7, -10000, '2024-01-25', 'SIP investment'),
(8, 1, 1, 85000, '2024-02-01', 'Monthly salary'),
(9, 1, 3, -9200, '2024-02-05', 'Groceries'),
(10, 1, 4, -20000, '2024-02-07', 'Rent February'),
(11, 2, 5, -1800, '2024-02-12', 'Metro + auto'),
(12, 1, 6, -1500, '2024-02-18', 'OTT subscriptions'),
(13, 3, 7, -10000, '2024-02-25', 'SIP investment'),
(14, 1, 1, 85000, '2024-03-01', 'Monthly salary'),
(15, 1, 3, -7800, '2024-03-05', 'Groceries');

The Questions

Basic Joins (4 questions)

Show all transactions with account name and category name
Show all transactions for the 'HDFC Savings' account
Show total amount spent per category (expense categories only)
Show all transactions in January 2024

Aggregations (4 questions)

What is the total income vs total expense for each month?
Which category has the highest total spending?
What is the average transaction amount per account?
Show months where total expenses exceeded 35000

Subqueries + CASE WHEN (4 questions)

Show transactions that are above the average transaction amount
Add a column: label each transaction 'Income' or 'Expense' based on whether amount is positive or negative
Show categories that have more than 2 transactions
Show the account with the highest current balance

Data Modification (3 questions)

Add a new transaction: salary credit of 90000 on 2024-04-01 to HDFC Savings
Update the Zerodha balance to 210000 (after investment gains)
Delete all Entertainment transactions

Answers

SQL

-- 1
SELECT t.txn_id, a.account_name, c.category_name, t.amount, t.txn_date, t.note
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
INNER JOIN categories c ON t.category_id = c.category_id;

-- 2
SELECT t.*, c.category_name
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
INNER JOIN categories c ON t.category_id = c.category_id
WHERE a.account_name = 'HDFC Savings';

-- 3
SELECT c.category_name, SUM(ABS(t.amount)) AS total_spent
FROM transactions t
INNER JOIN categories c ON t.category_id = c.category_id
WHERE c.type = 'expense'
GROUP BY c.category_name
ORDER BY total_spent DESC;

-- 4
SELECT t.txn_id, a.account_name, c.category_name, t.amount, t.txn_date
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
INNER JOIN categories c ON t.category_id = c.category_id
WHERE t.txn_date BETWEEN '2024-01-01' AND '2024-01-31';

-- 5
SELECT
    strftime('%Y-%m', txn_date) AS month,
    SUM(CASE WHEN amount > 0 THEN amount ELSE 0 END) AS total_income,
    SUM(CASE WHEN amount < 0 THEN ABS(amount) ELSE 0 END) AS total_expense
FROM transactions
GROUP BY month
ORDER BY month;

-- 6
SELECT c.category_name, SUM(ABS(t.amount)) AS total_spent
FROM transactions t
INNER JOIN categories c ON t.category_id = c.category_id
WHERE c.type = 'expense'
GROUP BY c.category_name
ORDER BY total_spent DESC
LIMIT 1;

-- 7
SELECT a.account_name, AVG(ABS(t.amount)) AS avg_txn
FROM transactions t
INNER JOIN accounts a ON t.account_id = a.account_id
GROUP BY a.account_name;

-- 8
SELECT strftime('%Y-%m', txn_date) AS month, SUM(ABS(amount)) AS total_expense
FROM transactions
WHERE amount < 0
GROUP BY month
HAVING SUM(ABS(amount)) > 35000;

-- 9
SELECT * FROM transactions
WHERE ABS(amount) > (SELECT AVG(ABS(amount)) FROM transactions);

-- 10
SELECT
    txn_id,
    amount,
    txn_date,
    CASE WHEN amount > 0 THEN 'Income' ELSE 'Expense' END AS txn_type
FROM transactions;

-- 11
SELECT c.category_name, COUNT(*) AS txn_count
FROM transactions t
INNER JOIN categories c ON t.category_id = c.category_id
GROUP BY c.category_name
HAVING COUNT(*) > 2;

-- 12
SELECT account_name, balance
FROM accounts
ORDER BY balance DESC
LIMIT 1;

-- 13
INSERT INTO transactions (account_id, category_id, amount, txn_date, note)
VALUES (1, 1, 90000, '2024-04-01', 'Monthly salary');

-- 14
UPDATE accounts SET balance = 210000 WHERE account_name = 'Zerodha';

-- 15
DELETE FROM transactions
WHERE category_id = (
    SELECT category_id FROM categories WHERE category_name = 'Entertainment'
);

You Passed Phase 2 If You Can...

[ ] JOIN 3 tables together confidently
[ ] Write a subquery in WHERE and FROM
[ ] Use CASE WHEN for conditional columns and conditional aggregation
[ ] Write INSERT, UPDATE, DELETE safely (with WHERE)
[ ] Design a basic table with appropriate constraints

Next Up: Phase 3 — Advanced SQL

Week 16: Window Functions Part 1 — ROW_NUMBER, RANK, DENSE_RANK

Week 16 PHASE 3

Window Functions — Ranking

Week 16 — Window Functions Part 1: ROW_NUMBER, RANK, DENSE_RANK

📌 What You Already Know

Aggregates with GROUP BY collapse rows into one row per group
Window functions are different — they add a computed column without collapsing rows

What is a Window Function?

SQL

-- GROUP BY collapses rows — you lose individual employee rows
SELECT department, AVG(salary) FROM employees GROUP BY department;

-- Window function keeps all rows AND adds the average alongside each
SELECT name, department, salary,
    AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

Both calculate the average per department — but window functions keep every row visible.

Syntax

SQL

function_name() OVER (
    PARTITION BY column   -- divide into groups (optional)
    ORDER BY column       -- order within each group (required for ranking)
)

ROW_NUMBER — sequential number per partition

SQL

-- Rank employees within each department by salary (1 = highest)
SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

Result:

name	department	salary	row_num
Sneha	Engineering	90000	1
Rahul	Engineering	80000	2
Dattatray	Finance	60000	1
Priya	Finance	55000	2

Ties get different numbers (arbitrary order for ties).

RANK — rank with gaps for ties

SQL

SELECT
    name,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS rnk
FROM employees;

If two employees tie at rank 2, the next rank is 4 (gap).

name	salary	rnk
Sneha	90000	1
Rahul	80000	2
Dattatray	60000	3
Priya	55000	4
Amit	45000	5

DENSE_RANK — rank without gaps

SQL

SELECT
    name,
    salary,
    DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
FROM employees;

If two employees tie at rank 2, the next rank is still 3 (no gap).

ROW_NUMBER vs RANK vs DENSE_RANK

Imagine salaries: 90000, 80000, 80000, 55000

Function	Result
ROW_NUMBER	1, 2, 3, 4
RANK	1, 2, 2, 4
DENSE_RANK	1, 2, 2, 3

Classic Use Case: Top N per Group

Get the top 1 (or top 2) earner per department:

SQL

SELECT name, department, salary
FROM (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
WHERE rn = 1;

This is one of the most common interview questions in SQL.

Without PARTITION BY — rank across entire table

SQL

SELECT
    name,
    salary,
    RANK() OVER (ORDER BY salary DESC) AS company_rank
FROM employees;

No PARTITION BY = the window is the entire table.

Practice Exercises

Rank all employees by salary (highest = rank 1) using RANK
Assign row numbers to employees within each department (ordered by salary desc)
Show employees with their company-wide DENSE_RANK by salary
Find the top earner in each department
Find the top 2 earners in each department
Show each employee's salary rank within their department and their overall company rank

Answers

💡 Key Things to Remember

Window functions add a column — they don't reduce rows like GROUP BY
PARTITION BY = how to split the window (like GROUP BY for window functions)
ORDER BY inside OVER = how to order within each partition
ROW_NUMBER = always unique; RANK = gaps on ties; DENSE_RANK = no gaps
Top-N per group pattern: ROW_NUMBER in subquery, then filter by rn <= N

→ Next Week

Window Functions Part 2 — LAG, LEAD, SUM/AVG OVER, running totals

Week 17 PHASE 3

Window Functions — LAG/LEAD

Week 17 — Window Functions Part 2: LAG, LEAD, Running Totals

📌 What You Already Know

ROW_NUMBER, RANK, DENSE_RANK
PARTITION BY and ORDER BY inside OVER()

LAG — look at the previous row's value

SQL

SELECT
    sale_date,
    amount,
    LAG(amount) OVER (ORDER BY sale_date) AS prev_amount
FROM sales;

Result:

sale_date	amount	prev_amount
2024-01-01	50000	NULL
2024-02-01	65000	50000
2024-03-01	45000	65000

First row has NULL — no previous row exists.

LAG with default value

SQL

LAG(amount, 1, 0) OVER (ORDER BY sale_date)
-- 1 = look 1 row back, 0 = default if no row exists

LEAD — look at the next row's value

SQL

SELECT
    sale_date,
    amount,
    LEAD(amount) OVER (ORDER BY sale_date) AS next_amount
FROM sales;

Last row has NULL — no next row.

Month-over-month change using LAG

SQL

SELECT
    sale_date,
    amount,
    LAG(amount) OVER (ORDER BY sale_date) AS prev_month,
    amount - LAG(amount) OVER (ORDER BY sale_date) AS mom_change,
    ROUND(
        100.0 * (amount - LAG(amount) OVER (ORDER BY sale_date))
        / LAG(amount) OVER (ORDER BY sale_date),
        2
    ) AS pct_change
FROM monthly_sales;

This is a classic interview question — calculate MoM growth.

Running Total with SUM OVER

SQL

SELECT
    sale_date,
    amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;

Result:

sale_date	amount	running_total
2024-01-01	50000	50000
2024-02-01	65000	115000
2024-03-01	45000	160000

Running Average

SQL

SELECT
    sale_date,
    amount,
    AVG(amount) OVER (ORDER BY sale_date) AS running_avg
FROM sales;

Partitioned Running Total

SQL

-- Running total per department
SELECT
    name,
    department,
    salary,
    SUM(salary) OVER (PARTITION BY department ORDER BY salary) AS dept_running_total
FROM employees;

Moving Average (rolling window)

SQL

-- 3-month moving average
SELECT
    sale_date,
    amount,
    AVG(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM monthly_sales;

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW = current row + 2 rows before it.

NTILE — divide rows into N buckets

SQL

-- Divide employees into 4 salary quartiles
SELECT
    name,
    salary,
    NTILE(4) OVER (ORDER BY salary DESC) AS quartile
FROM employees;

Quartile 1 = top 25%, quartile 4 = bottom 25%.

FIRST_VALUE and LAST_VALUE

SQL

SELECT
    name,
    department,
    salary,
    FIRST_VALUE(name) OVER (PARTITION BY department ORDER BY salary DESC) AS top_earner
FROM employees;

Practice Exercises

Using a monthly_sales table:

month	revenue
2024-01	120000
2024-02	145000
2024-03	98000
2024-04	167000
2024-05	152000

Show each month's revenue alongside the previous month's revenue
Calculate month-over-month change in revenue
Show running total of revenue
Show 3-month moving average of revenue
Using employees, show each employee's salary and the highest salary in their department

Answers

💡 Key Things to Remember

LAG(col) = previous row's value; LEAD(col) = next row's value
SUM() OVER (ORDER BY ...) = running total (cumulative)
AVG() OVER (ORDER BY ...) = running average
ROWS BETWEEN = define a rolling window
MoM change = amount - LAG(amount) OVER (ORDER BY date)

→ Next Week

CTEs (Common Table Expressions) — the WITH clause for readable complex queries

Week 18 PHASE 3

CTEs

Week 18 — CTEs (Common Table Expressions)

📌 What You Already Know

Subqueries — queries inside queries
Window functions

A CTE is a named, temporary result set defined with WITH. It makes complex queries readable.

Basic CTE Syntax

SQL

WITH cte_name AS (
    SELECT ...
    FROM ...
    WHERE ...
)
SELECT * FROM cte_name;

CTE vs Subquery — same result, different readability

Subquery (hard to read)

SQL

SELECT name, department, salary
FROM (
    SELECT name, department, salary,
        AVG(salary) OVER (PARTITION BY department) AS dept_avg
    FROM employees
)
WHERE salary > dept_avg;

CTE (readable)

SQL

WITH dept_averages AS (
    SELECT name, department, salary,
        AVG(salary) OVER (PARTITION BY department) AS dept_avg
    FROM employees
)
SELECT name, department, salary
FROM dept_averages
WHERE salary > dept_avg;

Same result. The CTE version is far easier to understand and debug.

Multiple CTEs

Chain multiple CTEs with commas:

SQL

WITH
monthly_revenue AS (
    SELECT
        strftime('%Y-%m', sale_date) AS month,
        SUM(amount) AS revenue
    FROM sales
    GROUP BY month
),
mom_growth AS (
    SELECT
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
        revenue - LAG(revenue) OVER (ORDER BY month) AS growth
    FROM monthly_revenue
)
SELECT * FROM mom_growth
WHERE growth IS NOT NULL
ORDER BY month;

Each CTE builds on the previous one. This replaces deeply nested subqueries.

CTEs for Top-N per Group (cleaner version)

SQL

WITH ranked_employees AS (
    SELECT
        name,
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
SELECT name, department, salary
FROM ranked_employees
WHERE rn = 1;

Recursive CTEs — generate sequences

A powerful but advanced use: a CTE that references itself.

SQL

-- Generate numbers 1 to 10
WITH RECURSIVE counter(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM counter WHERE n < 10
)
SELECT n FROM counter;

Practical use: generate a calendar of dates.

SQL

WITH RECURSIVE dates(d) AS (
    SELECT DATE('2024-01-01')
    UNION ALL
    SELECT DATE(d, '+1 day') FROM dates WHERE d < '2024-01-31'
)
SELECT d FROM dates;

CTEs for Reusability Within a Query

SQL

WITH high_earners AS (
    SELECT * FROM employees WHERE salary > 70000
)
-- Use it multiple times
SELECT department, COUNT(*) FROM high_earners GROUP BY department
UNION ALL
SELECT 'TOTAL', COUNT(*) FROM high_earners;

When to Use CTEs vs Subqueries

Use CTE when...	Use subquery when...
Logic is complex or multi-step	It's a simple one-liner
You need to reuse the same result	It's used only once
You want readable, debuggable code	You're optimizing for brevity
You're building on previous steps	Query is already simple

In interviews and real work: default to CTEs. They signal clean thinking.

Practice Exercises

Use a CTE to find employees who earn above the company average salary
Use two CTEs: first get total revenue per month, then calculate MoM change
Rewrite this subquery as a CTE:

SQL

   SELECT * FROM (
       SELECT name, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
       FROM employees
   ) WHERE rnk <= 2;

Use a recursive CTE to generate dates for all of February 2024

Answers

💡 Key Things to Remember

WITH name AS (...) defines a CTE — use it like a table below
Multiple CTEs: WITH a AS (...), b AS (...) — separate with commas
CTEs make complex queries readable and debuggable
Recursive CTEs use UNION ALL and reference themselves
Default to CTEs over nested subqueries in interviews and real work

→ Next Week

Indexes + Query Performance — why queries slow down and how to fix them

Week 19 PHASE 3

Indexes & Performance

Week 19 — Indexes + Query Performance

📌 What You Already Know

All SQL querying techniques
CTEs and window functions

This week you learn why queries go slow — and how to fix them.

Why Queries Slow Down

When you run SELECT * FROM orders WHERE customer_id = 42, the database has two choices:

Full table scan — read every single row and check if customer_id = 42
Index lookup — jump directly to rows where customer_id = 42

Without an index, the database always does a full table scan. With millions of rows, this is slow.

What is an Index?

An index is a separate data structure (like a book's index) that maps column values to row locations. The database can jump straight to matching rows.

CREATE INDEX

SQL

-- Basic index on one column
CREATE INDEX idx_customer_id ON orders(customer_id);

-- Index on multiple columns (composite index)
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- Unique index (also enforces uniqueness)
CREATE UNIQUE INDEX idx_email ON users(email);

DROP INDEX

SQL

DROP INDEX idx_customer_id;

When Indexes Help

Query pattern	Index helps?
`WHERE customer_id = 42`	Yes — equality lookup
`WHERE department = 'Finance'`	Yes
`WHERE salary > 50000`	Yes — range scan
`ORDER BY hire_date`	Yes
`JOIN ON orders.customer_id = customers.customer_id`	Yes — both sides
`WHERE UPPER(name) = 'RAHUL'`	No — function on column defeats index
`WHERE name LIKE '%rah%'`	No — leading wildcard defeats index
`WHERE name LIKE 'rah%'`	Yes — trailing wildcard uses index

EXPLAIN QUERY PLAN (SQLite)

See how SQLite executes a query — does it use an index or scan the whole table?

SQL

EXPLAIN QUERY PLAN
SELECT * FROM orders WHERE customer_id = 42;

Output without index:

SQL

SCAN TABLE orders

Output with index:

SQL

SEARCH TABLE orders USING INDEX idx_customer_id (customer_id=?)

SEARCH = uses index (fast). SCAN = full table scan (slow).

Index Trade-offs

Benefit	Cost
SELECT queries are faster	INSERT/UPDATE/DELETE are slower
ORDER BY is faster	More disk space used
JOINs are faster	Too many indexes = slow writes

Rule: Index columns used frequently in WHERE, JOIN ON, and ORDER BY. Don't index everything.

Query Optimization Tips

1. Avoid functions on indexed columns in WHERE

SQL

-- BAD — index not used
WHERE YEAR(hire_date) = 2023

-- GOOD — index used
WHERE hire_date BETWEEN '2023-01-01' AND '2023-12-31'

2. Avoid leading wildcards in LIKE

SQL

-- BAD — full scan
WHERE name LIKE '%aya'

-- GOOD — index used
WHERE name LIKE 'Dat%'

3. Use LIMIT for large result sets

SQL

SELECT * FROM orders ORDER BY order_date DESC LIMIT 100;

4. Select only what you need

SQL

-- BAD
SELECT * FROM orders WHERE customer_id = 42;

-- GOOD
SELECT order_id, amount, order_date FROM orders WHERE customer_id = 42;

5. Index both sides of a JOIN

SQL

-- Make sure both customer_id columns are indexed
CREATE INDEX idx_orders_cust ON orders(customer_id);
-- customers.customer_id is usually a PRIMARY KEY (auto-indexed)

Composite Index Column Order Matters

SQL

CREATE INDEX idx_dept_salary ON employees(department, salary);

This index helps:

WHERE department = 'Finance'
WHERE department = 'Finance' AND salary > 60000

This index does NOT help:

WHERE salary > 60000 (without department)

The leftmost column in the index must be in the WHERE clause.

Practice Exercises

Create an index on orders.customer_id
Create a composite index on employees(department, salary)
Use EXPLAIN QUERY PLAN to check if a query uses an index
Rewrite this slow query to use an index:

WHERE STRFTIME('%Y', hire_date) = '2023'

Which of these needs an index: a table with 100 rows or 10 million rows?

Answers

💡 Key Things to Remember

Index = lookup table for column values → speeds up reads, slows writes
EXPLAIN QUERY PLAN shows you if the index is being used
Avoid functions on indexed columns in WHERE — they bypass the index
Leading wildcards (LIKE '%term') defeat indexes
Composite index: leftmost column must be in WHERE to get a benefit
Don't over-index — every index adds write overhead

→ Next Week

Views + Stored Procedures — saving reusable query logic

Week 20 PHASE 3

Transactions & ACID

Week 20 — Views + Stored Procedures

📌 What You Already Know

CTEs for reusable in-query logic
Indexes for performance

This week you learn to save query logic permanently — views for reusable queries, stored procedures for reusable programs.

Views — saved SELECT queries

A view is a stored SELECT query that you can use like a table.

SQL

-- Create a view
CREATE VIEW high_earners AS
SELECT name, department, salary
FROM employees
WHERE salary > 70000;

-- Use it like a table
SELECT * FROM high_earners;
SELECT department, COUNT(*) FROM high_earners GROUP BY department;

The view doesn't store data — it reruns the SELECT every time you query it.

CREATE VIEW

SQL

CREATE VIEW dept_summary AS
SELECT
    department,
    COUNT(*) AS headcount,
    AVG(salary) AS avg_salary,
    SUM(salary) AS total_payroll
FROM employees
GROUP BY department;

Now anyone can run SELECT * FROM dept_summary without knowing the underlying query.

DROP VIEW

SQL

DROP VIEW high_earners;
DROP VIEW IF EXISTS high_earners;

CREATE OR REPLACE VIEW (PostgreSQL/MySQL)

SQL

CREATE OR REPLACE VIEW high_earners AS
SELECT name, department, salary, bonus
FROM employees
WHERE salary > 70000;

SQLite doesn't support OR REPLACE for views — drop and recreate.

When to Use Views

Use case	Example
Simplify complex queries	One view for a 5-table JOIN
Hide sensitive columns	View without salary column for non-HR users
Consistent reporting	Standard monthly summary everyone uses
Simplify permissions	Grant access to a view, not the raw table

Stored Procedures (PostgreSQL/MySQL concept)

SQLite doesn't support stored procedures. But you'll encounter them in PostgreSQL and MySQL at work.

A stored procedure is a named, saved program that can run multiple SQL statements:

SQL

-- PostgreSQL syntax
CREATE OR REPLACE PROCEDURE give_raise(dept TEXT, pct REAL)
LANGUAGE plpgsql AS $$
BEGIN
    UPDATE employees
    SET salary = salary * (1 + pct / 100)
    WHERE department = dept;
    COMMIT;
END;
$$;

-- Call it
CALL give_raise('Engineering', 10);

Functions (available in SQLite with extensions)

SQLite supports user-defined functions via application code, but not natively via SQL.

In PostgreSQL:

SQL

CREATE FUNCTION get_headcount(dept TEXT)
RETURNS INTEGER AS $$
    SELECT COUNT(*) FROM employees WHERE department = dept;
$$ LANGUAGE sql;

SELECT get_headcount('Finance');  -- returns 2

Materialized Views (PostgreSQL concept)

A regular view reruns the query every time. A materialized view stores the result physically — like a cache.

SQL

-- PostgreSQL
CREATE MATERIALIZED VIEW monthly_summary AS
SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS revenue
FROM sales GROUP BY month;

-- Refresh when data changes
REFRESH MATERIALIZED VIEW monthly_summary;

Use when the underlying query is expensive and data doesn't change every second.

Practice Exercises (SQLite — views only)

Create a view finance_team showing only Finance department employees
Create a view order_summary that joins orders + customers and shows: order_id, customer name, product, amount
Create a view dept_stats showing headcount, avg salary, and total payroll per department
Query dept_stats to find departments with avg salary > 60000
Drop the finance_team view

Answers

💡 Key Things to Remember

A view is a named SELECT query — query it like a table
Views don't store data — they re-execute on every query
Use views to simplify complex joins and standardize reporting
SQLite supports views but not stored procedures
PostgreSQL/MySQL have stored procedures + functions + materialized views

→ Next Week

Advanced Aggregations — ROLLUP, CUBE, GROUPING SETS

Week 21 PHASE 3

Advanced Subqueries

Week 21 — Advanced Aggregations: ROLLUP, CUBE, GROUPING SETS

📌 What You Already Know

GROUP BY, HAVING
Window functions
CTEs

These three operators extend GROUP BY to produce subtotals and grand totals in a single query.

> Note: SQLite has limited support for these. Full support in PostgreSQL and MySQL. This lesson prepares you for those environments.

The Problem They Solve

You want a report with:

Revenue per region per product
Subtotal per region
Grand total for all regions

Without ROLLUP, you'd need 3 separate queries and UNION them. ROLLUP does it in one.

ROLLUP — hierarchical subtotals

SQL

SELECT region, product, SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, product);

Result:

region	product	total
North	Laptop	85000
North	Phone	45000
North	NULL	130000	← subtotal for North
South	Tablet	30000
South	NULL	30000	← subtotal for South
NULL	NULL	160000	← grand total

NULL in the output = subtotal row (not actual NULL data).

GROUPING() — detect if a NULL is a subtotal

SQL

SELECT
    CASE WHEN GROUPING(region) = 1 THEN 'ALL REGIONS' ELSE region END AS region,
    CASE WHEN GROUPING(product) = 1 THEN 'ALL PRODUCTS' ELSE product END AS product,
    SUM(revenue) AS total
FROM sales
GROUP BY ROLLUP(region, product);

GROUPING(col) returns 1 if the NULL is from a rollup (subtotal), 0 if it's real data.

CUBE — all combinations of subtotals

ROLLUP gives subtotals along a hierarchy (region → product).

CUBE gives subtotals for every possible combination.

SQL

SELECT region, product, SUM(revenue)
FROM sales
GROUP BY CUBE(region, product);

Generates:

Per region per product
Per region (across all products)
Per product (across all regions)
Grand total

More rows than ROLLUP — useful for pivot-style reports.

GROUPING SETS — choose exactly which subtotals you want

More control than ROLLUP or CUBE:

SQL

SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS (
    (region, product),  -- detail level
    (region),           -- subtotal by region
    ()                  -- grand total
);

Simulating ROLLUP in SQLite with UNION ALL

Since SQLite doesn't support ROLLUP natively:

SQL

-- Detail rows
SELECT department, 'subtotal' AS level, SUM(salary) AS total
FROM employees GROUP BY department

UNION ALL

-- Grand total
SELECT 'ALL', 'grand_total', SUM(salary)
FROM employees;

Practical Example: Sales Dashboard Query

SQL

-- PostgreSQL/MySQL
SELECT
    COALESCE(region, 'TOTAL') AS region,
    COALESCE(category, 'ALL CATEGORIES') AS category,
    SUM(amount) AS revenue,
    COUNT(*) AS transactions
FROM sales
GROUP BY ROLLUP(region, category)
ORDER BY region, category;

Practice Exercises

For these, use PostgreSQL or MySQL if available. In SQLite, simulate with UNION ALL.

Get total sales per department with a grand total row (simulate with UNION ALL in SQLite)
Write the ROLLUP version for sales by region and product
What is the difference between ROLLUP and CUBE?
Write a query using GROUPING SETS to get: per department, per salary band, and overall total

Answers

💡 Key Things to Remember

ROLLUP = subtotals along a hierarchy + grand total
CUBE = subtotals for every combination of dimensions
GROUPING SETS = you pick exactly which subtotals to include
GROUPING(col) = 1 if NULL is from rollup, 0 if real data
SQLite doesn't support these natively — PostgreSQL and MySQL do
In SQLite, simulate with UNION ALL

→ Next Week

Interview Prep — the top SQL questions asked in data analyst and engineering interviews

Week 22 INTERVIEW PREP

Interview Prep

Week 22 — SQL Interview Prep

The Most Common SQL Interview Questions

Categorized by type. Each one appears frequently at companies like Flipkart, Swiggy, Zepto, Goldman Sachs, JP Morgan, and top tech firms.

Category 1: Basic Filtering + Aggregation

QFind the second highest salary

SQL

-- Method 1: OFFSET
SELECT DISTINCT salary FROM employees
ORDER BY salary DESC
LIMIT 1 OFFSET 1;

-- Method 2: Subquery
SELECT MAX(salary) FROM employees
WHERE salary < (SELECT MAX(salary) FROM employees);

-- Method 3: DENSE_RANK (preferred in interviews)
SELECT salary FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
WHERE rnk = 2;

QFind the Nth highest salary (generalized)

SQL

-- For Nth highest, use DENSE_RANK with rnk = N
SELECT salary FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM employees
)
WHERE rnk = N;  -- replace N with the desired rank

QFind employees with salary above the department average

SQL

SELECT e.name, e.department, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) FROM employees WHERE department = e.department
);

-- Cleaner with window function
SELECT name, department, salary FROM (
    SELECT name, department, salary,
        AVG(salary) OVER (PARTITION BY department) AS dept_avg
    FROM employees
)
WHERE salary > dept_avg;

Category 2: Duplicates

QFind duplicate emails (or any duplicate values)

SQL

SELECT email, COUNT(*) AS cnt
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

QDelete duplicate rows, keep the one with the lowest ID

SQL

DELETE FROM users
WHERE id NOT IN (
    SELECT MIN(id) FROM users GROUP BY email
);

QFind rows that appear in one table but not another

SQL

-- Customers who never ordered
SELECT customer_id FROM customers
WHERE customer_id NOT IN (
    SELECT DISTINCT customer_id FROM orders WHERE customer_id IS NOT NULL
);

-- Cleaner with LEFT JOIN
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

Category 3: Rankings + Top N

QTop earner per department

SQL

SELECT name, department, salary FROM (
    SELECT name, department, salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS rn
    FROM employees
)
WHERE rn = 1;

QTop 3 products by revenue

SQL

SELECT product, SUM(amount) AS revenue
FROM sales
GROUP BY product
ORDER BY revenue DESC
LIMIT 3;

QRank employees within their department

SQL

SELECT name, department, salary,
    RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees;

Category 4: Date and Time

QFind records from the last 30 days

SQL

SELECT * FROM orders
WHERE order_date >= DATE('now', '-30 days');

QMonth-over-month revenue growth

SQL

WITH monthly AS (
    SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS revenue
    FROM sales GROUP BY month
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
    ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
        / LAG(revenue) OVER (ORDER BY month), 2) AS pct_growth
FROM monthly;

QFind users who haven't logged in for 90+ days

SQL

SELECT * FROM users
WHERE last_login < DATE('now', '-90 days')
   OR last_login IS NULL;

Category 5: Self Joins + Hierarchies

QFind all managers and their direct reports

SQL

SELECT
    m.name AS manager,
    e.name AS employee
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id;

QFind employees with no manager (top level)

SQL

SELECT * FROM employees WHERE manager_id IS NULL;

Category 6: Running Totals + Cumulative

QRunning total of sales by date

SQL

SELECT sale_date, amount,
    SUM(amount) OVER (ORDER BY sale_date) AS running_total
FROM sales;

QPercentage of total for each row

SQL

SELECT
    product,
    amount,
    ROUND(100.0 * amount / SUM(amount) OVER (), 2) AS pct_of_total
FROM sales;

Category 7: Pivoting Data

QPivot — rows to columns using CASE WHEN

SQL

-- Monthly sales as columns
SELECT
    strftime('%Y', sale_date) AS year,
    SUM(CASE WHEN strftime('%m', sale_date) = '01' THEN amount ELSE 0 END) AS Jan,
    SUM(CASE WHEN strftime('%m', sale_date) = '02' THEN amount ELSE 0 END) AS Feb,
    SUM(CASE WHEN strftime('%m', sale_date) = '03' THEN amount ELSE 0 END) AS Mar
FROM sales
GROUP BY year;

Category 8: Classic Tricky Questions

QFind customers who bought product A but NOT product B

SQL

SELECT DISTINCT customer_id FROM orders WHERE product = 'Laptop'
EXCEPT
SELECT DISTINCT customer_id FROM orders WHERE product = 'Phone';

QFind consecutive days with sales (gap analysis)

SQL

WITH daily AS (
    SELECT DISTINCT sale_date FROM sales
),
ranked AS (
    SELECT sale_date,
        ROW_NUMBER() OVER (ORDER BY sale_date) AS rn
    FROM daily
)
SELECT MIN(sale_date), MAX(sale_date), COUNT(*) AS streak
FROM ranked
GROUP BY DATE(sale_date, '-' || rn || ' days');

QEmployees who joined in the same month as someone else

SQL

SELECT name, hire_date FROM employees
WHERE strftime('%Y-%m', hire_date) IN (
    SELECT strftime('%Y-%m', hire_date)
    FROM employees
    GROUP BY strftime('%Y-%m', hire_date)
    HAVING COUNT(*) > 1
);

💡 Interview Tips

Think out loud — explain your approach before writing
Clarify edge cases — "Are there NULLs? Can salaries be tied?"
Start simple — write a basic version, then optimize
Use CTEs — they show you can write clean, readable SQL
Know DENSE_RANK vs RANK vs ROW_NUMBER — this comes up in every interview
Know two ways to answer — subquery + window function versions
Mention performance — "If this table is large, I'd ensure an index on X"

Quick Reference: Question → Technique

Question type	Best technique
Nth highest value	DENSE_RANK window function
Top N per group	ROW_NUMBER + subquery
Duplicates	GROUP BY + HAVING COUNT > 1
Missing from another table	LEFT JOIN + IS NULL, or NOT IN, or EXCEPT
Running total	SUM() OVER (ORDER BY ...)
MoM change	LAG()
Percentage of total	value / SUM() OVER ()
Conditional count/sum	SUM(CASE WHEN ... THEN 1 ELSE 0 END)
Complex multi-step logic	CTE

→ Next Week

Capstone Project — full dataset analysis combining everything

Week 23 CAPSTONE

Capstone Project

Week 23 — Capstone Project: Stock Market + Finance Analysis

What This Tests

Everything across all 3 phases:

All querying, filtering, aggregation
JOINs, subqueries, CTEs
Window functions (ROW_NUMBER, RANK, LAG, running totals)
CASE WHEN, NULL handling
Table design

This is your portfolio project — write it clean, use CTEs, comment your thinking.

The Dataset: Market + Fundamentals

Schema

SQL

CREATE TABLE stocks (
    ticker TEXT PRIMARY KEY,
    company_name TEXT NOT NULL,
    sector TEXT,
    market_cap REAL
);

CREATE TABLE prices (
    price_id INTEGER PRIMARY KEY,
    ticker TEXT,
    price_date TEXT,
    open_price REAL,
    close_price REAL,
    volume INTEGER,
    FOREIGN KEY (ticker) REFERENCES stocks(ticker)
);

CREATE TABLE financials (
    fin_id INTEGER PRIMARY KEY,
    ticker TEXT,
    fiscal_year INTEGER,
    revenue REAL,
    net_income REAL,
    eps REAL,
    FOREIGN KEY (ticker) REFERENCES stocks(ticker)
);

Setup Data

SQL

INSERT INTO stocks VALUES
('RELIANCE', 'Reliance Industries', 'Energy', 1800000),
('TCS', 'Tata Consultancy Services', 'IT', 1400000),
('HDFCBANK', 'HDFC Bank', 'Banking', 1100000),
('INFY', 'Infosys', 'IT', 650000),
('WIPRO', 'Wipro', 'IT', 280000),
('ICICIBANK', 'ICICI Bank', 'Banking', 750000),
('SBIN', 'State Bank of India', 'Banking', 600000);

INSERT INTO prices VALUES
(1, 'TCS', '2024-01-01', 3800, 3850, 500000),
(2, 'TCS', '2024-02-01', 3850, 3920, 480000),
(3, 'TCS', '2024-03-01', 3920, 3880, 510000),
(4, 'TCS', '2024-04-01', 3880, 4050, 620000),
(5, 'TCS', '2024-05-01', 4050, 4120, 590000),
(6, 'INFY', '2024-01-01', 1600, 1640, 800000),
(7, 'INFY', '2024-02-01', 1640, 1590, 750000),
(8, 'INFY', '2024-03-01', 1590, 1620, 820000),
(9, 'INFY', '2024-04-01', 1620, 1700, 900000),
(10, 'INFY', '2024-05-01', 1700, 1680, 870000),
(11, 'HDFCBANK', '2024-01-01', 1500, 1520, 1200000),
(12, 'HDFCBANK', '2024-02-01', 1520, 1480, 1100000),
(13, 'HDFCBANK', '2024-03-01', 1480, 1510, 1150000),
(14, 'HDFCBANK', '2024-04-01', 1510, 1560, 1300000),
(15, 'HDFCBANK', '2024-05-01', 1560, 1590, 1250000);

INSERT INTO financials VALUES
(1, 'TCS', 2022, 1917540, 422720, 114.50),
(2, 'TCS', 2023, 2254540, 463620, 125.80),
(3, 'TCS', 2024, 2408560, 471980, 128.60),
(4, 'INFY', 2022, 1217310, 224530, 53.40),
(5, 'INFY', 2023, 1467490, 244910, 58.10),
(6, 'INFY', 2024, 1533990, 261290, 62.30),
(7, 'HDFCBANK', 2022, 1684500, 362960, 64.80),
(8, 'HDFCBANK', 2023, 2069770, 445930, 79.20),
(9, 'HDFCBANK', 2024, 2422640, 512830, 91.10);

The Questions (30 total)

Price Analysis (10 questions)

Show the latest price for each stock
Show all stocks with their price change from Jan to May 2024
Calculate the monthly return for TCS (% change each month)
Show the best performing month for each stock (highest close price)
Show MoM price change for all stocks (LAG)
Which stock had the highest single-month price increase?
Show a running total of volume traded for TCS
Show the 3-month moving average price for INFY
Rank stocks by their total % return from Jan to May 2024
Which months did any stock price drop (close < open)?

Fundamental Analysis (8 questions)

Show revenue growth % year-over-year for each company
Which company has the highest EPS in 2024?
Show net income margin (net_income/revenue) for each company in 2024
Which company grew revenue fastest from 2022 to 2024?
Show companies where EPS increased every year (2022→2023→2024)
Rank companies by 2024 revenue
Show the year with the highest revenue for each company
Show companies where net income > 400000 in any year

Combined Analysis (7 questions)

Show each stock's current price alongside its 2024 EPS (P/E ratio = price/EPS)
Show IT sector stocks with their 2024 revenue
Which sector has the highest total market cap?
Show stocks where price in May 2024 > price in Jan 2024 AND revenue grew in 2024
Show each company's sector and their YoY revenue growth
Show the top 2 stocks by EPS within each sector
Create a performance summary: ticker, sector, latest price, 2024 EPS, 2024 net income

Data Management (5 questions)

Add a new stock: 'BAJFINANCE', 'Bajaj Finance', 'Finance', 500000
Add price records for BAJFINANCE for January and February 2024 (prices: 7200, 7450)
Update WIPRO's market cap to 320000
Create a view it_sector_summary showing all IT stocks with their 2024 financials
Create an index on prices(ticker, price_date) for faster queries

Answers

SQL

-- 1
SELECT ticker, MAX(price_date) AS latest_date, close_price
FROM prices GROUP BY ticker;

-- Better with window function
SELECT ticker, price_date, close_price FROM (
    SELECT ticker, price_date, close_price,
        ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price_date DESC) AS rn
    FROM prices
) WHERE rn = 1;

-- 2
WITH jan AS (
    SELECT ticker, close_price AS jan_price FROM prices WHERE price_date = '2024-01-01'
),
may AS (
    SELECT ticker, close_price AS may_price FROM prices WHERE price_date = '2024-05-01'
)
SELECT j.ticker, j.jan_price, m.may_price,
    ROUND(100.0 * (m.may_price - j.jan_price) / j.jan_price, 2) AS total_return_pct
FROM jan j INNER JOIN may m ON j.ticker = m.ticker;

-- 3
SELECT ticker, price_date, close_price,
    LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date) AS prev_close,
    ROUND(100.0 * (close_price - LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date))
        / LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date), 2) AS monthly_return_pct
FROM prices
WHERE ticker = 'TCS';

-- 4
SELECT ticker, price_date, close_price FROM (
    SELECT ticker, price_date, close_price,
        ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY close_price DESC) AS rn
    FROM prices
) WHERE rn = 1;

-- 5
SELECT ticker, price_date, close_price,
    close_price - LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date) AS price_change
FROM prices;

-- 6
WITH changes AS (
    SELECT ticker, price_date,
        close_price - LAG(close_price) OVER (PARTITION BY ticker ORDER BY price_date) AS price_change
    FROM prices
)
SELECT ticker, price_date, price_change FROM changes
WHERE price_change = (SELECT MAX(price_change) FROM changes);

-- 7
SELECT price_date, volume,
    SUM(volume) OVER (ORDER BY price_date) AS running_volume
FROM prices WHERE ticker = 'TCS';

-- 8
SELECT price_date, close_price,
    AVG(close_price) OVER (
        ORDER BY price_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS moving_avg_3m
FROM prices WHERE ticker = 'INFY';

-- 9
WITH returns AS (
    WITH jan AS (SELECT ticker, close_price AS jan_p FROM prices WHERE price_date = '2024-01-01'),
         may AS (SELECT ticker, close_price AS may_p FROM prices WHERE price_date = '2024-05-01')
    SELECT j.ticker, ROUND(100.0*(m.may_p - j.jan_p)/j.jan_p, 2) AS total_return
    FROM jan j JOIN may m ON j.ticker = m.ticker
)
SELECT ticker, total_return, RANK() OVER (ORDER BY total_return DESC) AS performance_rank
FROM returns;

-- 10
SELECT ticker, price_date, open_price, close_price
FROM prices WHERE close_price < open_price;

-- 11
SELECT ticker, fiscal_year, revenue,
    LAG(revenue) OVER (PARTITION BY ticker ORDER BY fiscal_year) AS prev_revenue,
    ROUND(100.0 * (revenue - LAG(revenue) OVER (PARTITION BY ticker ORDER BY fiscal_year))
        / LAG(revenue) OVER (PARTITION BY ticker ORDER BY fiscal_year), 2) AS yoy_growth_pct
FROM financials;

-- 12
SELECT f.ticker, s.company_name, f.eps
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024
ORDER BY f.eps DESC LIMIT 1;

-- 13
SELECT f.ticker, s.company_name,
    ROUND(100.0 * f.net_income / f.revenue, 2) AS net_margin_pct
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024
ORDER BY net_margin_pct DESC;

-- 14
WITH growth AS (
    WITH y2022 AS (SELECT ticker, revenue AS rev_2022 FROM financials WHERE fiscal_year = 2022),
         y2024 AS (SELECT ticker, revenue AS rev_2024 FROM financials WHERE fiscal_year = 2024)
    SELECT a.ticker, ROUND(100.0*(b.rev_2024 - a.rev_2022)/a.rev_2022, 2) AS total_growth_pct
    FROM y2022 a JOIN y2024 b ON a.ticker = b.ticker
)
SELECT * FROM growth ORDER BY total_growth_pct DESC LIMIT 1;

-- 15
SELECT ticker FROM (
    SELECT ticker, fiscal_year, eps,
        LAG(eps) OVER (PARTITION BY ticker ORDER BY fiscal_year) AS prev_eps
    FROM financials
) WHERE prev_eps IS NOT NULL AND eps > prev_eps
GROUP BY ticker HAVING COUNT(*) = 2;

-- 16
SELECT f.ticker, s.company_name, f.revenue,
    RANK() OVER (ORDER BY f.revenue DESC) AS revenue_rank
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.fiscal_year = 2024;

-- 17
SELECT ticker, fiscal_year, revenue FROM (
    SELECT ticker, fiscal_year, revenue,
        ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY revenue DESC) AS rn
    FROM financials
) WHERE rn = 1;

-- 18
SELECT DISTINCT f.ticker, s.company_name
FROM financials f JOIN stocks s ON f.ticker = s.ticker
WHERE f.net_income > 400000;

-- 19
WITH latest_prices AS (
    SELECT ticker, close_price FROM (
        SELECT ticker, close_price,
            ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price_date DESC) AS rn
        FROM prices
    ) WHERE rn = 1
)
SELECT s.ticker, s.company_name, lp.close_price, f.eps,
    ROUND(lp.close_price / f.eps, 2) AS pe_ratio
FROM stocks s
JOIN latest_prices lp ON s.ticker = lp.ticker
JOIN financials f ON s.ticker = f.ticker AND f.fiscal_year = 2024;

-- 20
SELECT s.ticker, s.company_name, s.sector, f.revenue
FROM stocks s JOIN financials f ON s.ticker = f.ticker
WHERE s.sector = 'IT' AND f.fiscal_year = 2024;

-- 21
SELECT sector, SUM(market_cap) AS total_market_cap
FROM stocks GROUP BY sector ORDER BY total_market_cap DESC;

-- 22 (stocks that went up in price AND grew revenue)
WITH price_up AS (
    WITH jan AS (SELECT ticker, close_price AS jan_p FROM prices WHERE price_date = '2024-01-01'),
         may AS (SELECT ticker, close_price AS may_p FROM prices WHERE price_date = '2024-05-01')
    SELECT j.ticker FROM jan j JOIN may m ON j.ticker = m.ticker WHERE m.may_p > j.jan_p
),
rev_grew AS (
    WITH y23 AS (SELECT ticker, revenue AS r23 FROM financials WHERE fiscal_year = 2023),
         y24 AS (SELECT ticker, revenue AS r24 FROM financials WHERE fiscal_year = 2024)
    SELECT a.ticker FROM y23 a JOIN y24 b ON a.ticker = b.ticker WHERE b.r24 > a.r23
)
SELECT ticker FROM price_up
INTERSECT
SELECT ticker FROM rev_grew;

-- 23
SELECT s.ticker, s.sector, f.fiscal_year, f.revenue,
    LAG(f.revenue) OVER (PARTITION BY f.ticker ORDER BY f.fiscal_year) AS prev_revenue,
    ROUND(100.0*(f.revenue - LAG(f.revenue) OVER (PARTITION BY f.ticker ORDER BY f.fiscal_year))
        / LAG(f.revenue) OVER (PARTITION BY f.ticker ORDER BY f.fiscal_year), 2) AS yoy_growth
FROM stocks s JOIN financials f ON s.ticker = f.ticker;

-- 24
SELECT ticker, sector, eps FROM (
    SELECT f.ticker, s.sector, f.eps,
        ROW_NUMBER() OVER (PARTITION BY s.sector ORDER BY f.eps DESC) AS rn
    FROM financials f JOIN stocks s ON f.ticker = s.ticker
    WHERE f.fiscal_year = 2024
) WHERE rn <= 2;

-- 25
WITH latest AS (
    SELECT ticker, close_price FROM (
        SELECT ticker, close_price,
            ROW_NUMBER() OVER (PARTITION BY ticker ORDER BY price_date DESC) AS rn
        FROM prices
    ) WHERE rn = 1
)
SELECT s.ticker, s.sector, l.close_price, f.eps, f.net_income
FROM stocks s
LEFT JOIN latest l ON s.ticker = l.ticker
LEFT JOIN financials f ON s.ticker = f.ticker AND f.fiscal_year = 2024;

-- 26
INSERT INTO stocks VALUES ('BAJFINANCE', 'Bajaj Finance', 'Finance', 500000);

-- 27
INSERT INTO prices (ticker, price_date, open_price, close_price, volume)
VALUES
('BAJFINANCE', '2024-01-01', 7150, 7200, 300000),
('BAJFINANCE', '2024-02-01', 7200, 7450, 320000);

-- 28
UPDATE stocks SET market_cap = 320000 WHERE ticker = 'WIPRO';

-- 29
CREATE VIEW it_sector_summary AS
SELECT s.ticker, s.company_name, s.market_cap, f.fiscal_year, f.revenue, f.net_income, f.eps
FROM stocks s JOIN financials f ON s.ticker = f.ticker
WHERE s.sector = 'IT';

-- 30
CREATE INDEX idx_prices_ticker_date ON prices(ticker, price_date);

Capstone Complete

If you worked through all 30 questions, you can:

[ ] Write complex multi-CTE queries from scratch
[ ] Use window functions for rankings, running totals, and MoM analysis
[ ] Join 3+ tables and analyze at multiple granularities
[ ] Design schemas with proper constraints and indexes
[ ] Write queries that directly answer business questions

You are now job-ready for data analyst and SQL-heavy roles.

What to Do Next

Push this project to GitHub — show it in your portfolio
Practice on LeetCode SQL — Database problems section, Hard level
Set up PostgreSQL — learn the differences from SQLite
Explore Google BigQuery — free tier, industry-scale data
Build Project 2 — Company financial health dashboard with real NSE/BSE data

SQL Mastery Course

What is SQL + Databases

Week 1 — What is SQL + Databases

Key Concepts

Core Syntax

Your First Queries

Show all data

Show specific columns

Filter with WHERE

Multiple conditions

Sort results

Practice Exercises

Answers

Filtering Data

Week 2 — Filtering Data

Combining Conditions

AND — both conditions must be true

OR — either condition can be true

NOT — exclude a condition

IN — match a list of values

BETWEEN — filter a range

LIKE — pattern matching for text

Combining Everything

Practice Exercises

Answers

Aggregate Functions

Week 3 — Aggregate Functions

The 5 Core Aggregate Functions

COUNT

SUM

AVG

MIN and MAX

Combining Multiple Aggregates

Practice Exercises

Answers

GROUP BY + HAVING

Week 4 — GROUP BY + HAVING

GROUP BY — Aggregates Per Group

Rules for GROUP BY

Multiple Columns in GROUP BY

ORDER BY with GROUP BY

HAVING — Filter Groups After Aggregating

WHERE vs HAVING — The Key Difference

Full Query Order

Practice Exercises

Answers

Subqueries

Week 5 — String + Date Functions

String Functions

UPPER and LOWER — change case

LENGTH — count characters

TRIM — remove leading/trailing spaces

SUBSTR — extract part of a string

REPLACE — swap text

Combining string functions

Date Functions (SQLite)

DATE — get today's date

strftime — extract parts of a date

Date arithmetic

Adding hire_date to practice

Practice Exercises

Answers

NULL Values

Week 6 — NULL Handling

What is NULL?

NULL breaks normal comparisons

NULL in aggregate functions

COALESCE — replace NULL with a default

IFNULL — SQLite shorthand for COALESCE

NULLIF — turn a value into NULL

NULL in calculations

Extended employees table for practice

Practice Exercises

Answers

Mini Project 1

Week 7 — Mini Project 1: Sales Data Analysis

What This Tests

The Dataset

Table: sales

Table: reps

Table: `sales`

Table: `reps`

`orders`

`customers`

`products`