Statistics – Complete Course

Module 01 · Central Tendency

Measures of Central Tendency

Where does the data "centre" itself? Mean, Median, Mode — and when to use each.

Concept Overview

What is Central Tendency?

A measure of central tendency gives us a single representative value that summarises an entire dataset. It answers: "What is the typical or central value?" The three main measures are the Mean, Median, and Mode — each with unique strengths and use-cases.

• Arithmetic Mean: Sum of all values ÷ count. Most common; sensitive to outliers.
• Median: Middle value when sorted. Robust to outliers. Best for income data.
• Mode: Most frequent value. Used for categorical data and shoe sizes.
• Geometric Mean: ⁿ√(x₁×x₂×…×xₙ). Used for growth rates and investment returns.
• Harmonic Mean: n ÷ Σ(1/xᵢ). Used for rates and speeds.
• Weighted Mean: Mean where values have different importance weights.

Measure 1

Arithmetic Mean (AM) — The Average

The arithmetic mean is the sum of all observations divided by the number of observations. It is the most widely used measure and is the "balance point" of a distribution.

Arithmetic Mean

x̄ = (x₁ + x₂ + ... + xₙ) / n = Σxᵢ / n

For Grouped Data: x̄ = Σ(fᵢ × mᵢ) / Σfᵢ

1. List all values. Dataset: 12, 18, 25, 30, 15 → n = 5
2. Sum all values. Σx = 12 + 18 + 25 + 30 + 15 = 100
3. Divide by count. x̄ = 100 / 5 = 20

⚠ Outlier Problem: Salaries: ₹20k, ₹22k, ₹19k, ₹21k, ₹500k. Mean = ₹116.4k, which is completely misleading: one CEO salary skews the entire picture, while the median (₹21k) still reflects a typical employee. This is why median household income is commonly reported alongside the mean.
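
The two calculations above can be checked with a minimal Python sketch. It assumes only the standard library `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Worked example: 12, 18, 25, 30, 15
dataset = [12, 18, 25, 30, 15]
dataset_mean = mean(dataset)          # sum of values divided by count

# Outlier example: salaries in thousands of rupees
salaries_k = [20, 22, 19, 21, 500]
salary_mean = mean(salaries_k)        # pulled upward by the single 500k salary
salary_median = median(salaries_k)    # middle value after sorting

print(f"dataset mean  = {dataset_mean}")
print(f"salary mean   = {salary_mean}k")
print(f"salary median = {salary_median}k")
```

The median stays close to the four ordinary salaries, which is the behaviour the warning describes.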

Measure 2

Median — The Middle Value

The median is the middle value when data is arranged in ascending order. For an even number of observations, it's the mean of the two middle values. The median is not affected by extreme values (outliers), making it ideal for skewed distributions like income and house prices.

Median — Ungrouped Data

Odd n: Median = value at position (n+1)/2
Even n: Median = average of values at n/2 and (n/2)+1

For Grouped Data (interpolation formula; the same value can be read graphically from an ogive):
Median = L + [(n/2 − cf) / f] × h

Grouped Data Formula Legend: L = lower boundary of median class | n = total frequency | cf = cumulative frequency before median class | f = frequency of median class | h = class width
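
As a rough sketch of the grouped-data formula, the function below locates the median class and interpolates inside it. The function name `grouped_median` and the frequency table are hypothetical, and equal class widths are assumed.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median = L + ((n/2 - cf) / f) * h, assuming equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cf = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches n/2
            return lower + ((n / 2 - cf) / f) * width
        cf += f
    raise ValueError("lower_bounds and freqs must have the same length")

# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50
lower_bounds = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]            # n = 40, so n/2 = 20 falls in class 20-30

median_value = grouped_median(lower_bounds, freqs, width=10)
print(round(median_value, 2))        # 20 + ((20 - 13) / 12) * 10
```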

Mean vs Median in Skewed Data

Measure 3

Mode — The Most Frequent Value

The mode is the value that appears most often in a dataset. A distribution can be unimodal (one mode), bimodal (two modes), or multimodal. Mode is the only measure applicable to categorical/nominal data (e.g., most popular colour, most common occupation).

Mode — Grouped Data (Czuprow's Formula)

Mode = L + [(f₁ − f₀) / (2f₁ − f₀ − f₂)] × h

L=lower boundary of modal class | f₁=modal class freq | f₀=preceding class freq | f₂=succeeding class freq | h=class width

Real Example: Shoe sizes: 7, 8, 7, 9, 7, 8, 6, 7. Mode = 7. The manufacturer should produce the most size-7 shoes. Mean (7.4) is useless here — you can't make size 7.4 shoes!
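
A short Python sketch of the shoe-size example, assuming the standard library `statistics` module (Python 3.8 or later for `multimode`). The second list is a hypothetical dataset added to show a bimodal case.

```python
from statistics import mean, multimode

shoe_sizes = [7, 8, 7, 9, 7, 8, 6, 7]

modes = multimode(shoe_sizes)     # list of all most-frequent values
average = mean(shoe_sizes)        # 59 / 8, not a size anyone can buy

# Hypothetical bimodal data: two values tie for most frequent
bimodal = multimode([1, 1, 2, 2, 3])

print("modes:", modes)
print("mean:", average)
print("bimodal modes:", bimodal)
```

`multimode` returns a list, so unimodal, bimodal and multimodal data are handled the same way.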

Measure 4

Geometric Mean — For Growth Rates

The geometric mean is the nth root of the product of n values. It is used when values are multiplicative in nature, such as compound interest, population growth rates, and investment returns. For positive values it is always ≤ the arithmetic mean (AM–GM inequality).

Geometric Mean

GM = ⁿ√(x₁ × x₂ × ... × xₙ) = (∏xᵢ)^(1/n)

Equivalent using logarithms:
log(GM) = [log(x₁) + log(x₂) + ... + log(xₙ)] / n

Example: Nifty 50 returns: Year 1 = +20%, Year 2 = −10%, Year 3 = +15%.
Using values: 1.20, 0.90, 1.15 → GM = ∛(1.20 × 0.90 × 1.15) = ∛1.242 ≈ 1.075 → 7.5% CAGR.
Arithmetic mean gives (20−10+15)/3 = 8.33% — which overstates returns!
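
The returns example can be reproduced with a few lines of Python. This is a sketch using `math.prod` (Python 3.8+); the variable names are illustrative.

```python
from math import prod

# Yearly returns of +20%, -10%, +15% expressed as growth factors
growth_factors = [1.20, 0.90, 1.15]
n = len(growth_factors)

geometric_mean = prod(growth_factors) ** (1 / n)   # cube root of the product
cagr = geometric_mean - 1                          # compound annual growth rate

# Arithmetic mean of the percentage returns, for comparison
arithmetic_return = sum(f - 1 for f in growth_factors) / n

print(f"CAGR            = {cagr:.2%}")
print(f"Arithmetic mean = {arithmetic_return:.2%}")
```

Compounding the CAGR for three years recovers the actual end value, which the arithmetic mean does not.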

Measure 5

Harmonic Mean — For Rates & Speeds

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals. It's used when dealing with rates — speed, frequency, P/E ratios in finance. It gives more weight to smaller values.

Harmonic Mean

HM = n / Σ(1/xᵢ) = n / (1/x₁ + 1/x₂ + ... + 1/xₙ)

Classic Speed Problem: A car travels 60 km at 30 km/h and 60 km at 60 km/h. Average speed = HM(30, 60) = 2/(1/30+1/60) = 2/(3/60) = 40 km/h. Arithmetic mean gives 45 km/h — wrong!

Finance Use: Averaging P/E ratios across stocks in a portfolio — HM is more appropriate than AM because P/E is a ratio (price per unit of earnings).

Relationship: AM ≥ GM ≥ HM (for positive values)

Example: Values 2, 8 → AM = 5, GM = 4, HM = 3.2
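
A small Python check of the speed problem and of the AM ≥ GM ≥ HM ordering, assuming the standard library `statistics` module (Python 3.8+). The direct distance-over-time calculation is included only as a cross-check.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Speed problem: equal distances of 60 km at 30 km/h and at 60 km/h
avg_speed = harmonic_mean([30, 60])
direct = (60 + 60) / (60 / 30 + 60 / 60)   # total distance / total time

# Ordering of the three means for the values 2 and 8
values = [2, 8]
am = mean(values)
gm = geometric_mean(values)
hm = harmonic_mean(values)

print(f"average speed = {avg_speed:.1f} km/h (direct check: {direct:.1f})")
print(f"AM = {am}, GM = {gm:.2f}, HM = {hm:.2f}")
```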

Measure 6

Weighted Mean — Importance Matters

When different values carry different levels of importance (weights), we use the weighted mean. It is the foundation of index numbers, GPA calculations, and portfolio return calculations.

Weighted Arithmetic Mean

x̄w = Σ(wᵢ × xᵢ) / Σwᵢ

Example: Marks (values) and credits (weights) for 4 subjects

Subject	Marks (xᵢ)	Credits (wᵢ)	wᵢ × xᵢ
Economics	85	4	340
Statistics	72	3	216
Finance	90	4	360
History	65	2	130
Total	—	13	1046

Weighted Mean = 1046 / 13 = 80.46 (Simple mean = 78 — different!)
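
The table can be reproduced in plain Python. This sketch uses the marks and credits from the table; the variable names are illustrative.

```python
marks = [85, 72, 90, 65]          # Economics, Statistics, Finance, History
credit_weights = [4, 3, 4, 2]

# Weighted mean: sum of (weight * value) divided by sum of weights
weighted_total = sum(w * x for w, x in zip(credit_weights, marks))
weighted_mean = weighted_total / sum(credit_weights)

# Simple mean treats every subject as equally important
simple_mean = sum(marks) / len(marks)

print(f"weighted mean = {weighted_mean:.2f}")
print(f"simple mean   = {simple_mean:.2f}")
```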

Summary

When to Use Which Measure?

Situation	Best Measure	Why
Exam scores, heights, temperatures	Arithmetic Mean	Symmetric, no outliers
Income, house prices, wealth	Median	Skewed distribution, outliers
Shoe sizes, favourite colour, most common profession	Mode	Categorical / nominal data
Investment returns, population growth, CAGR	Geometric Mean	Multiplicative growth
Speeds, rates, P/E ratio averaging	Harmonic Mean	Rate-based data
GPA, portfolio returns, index numbers	Weighted Mean	Unequal importance

All Three on One Distribution — Symmetric vs Skewed

A dataset has values: 2, 4, 4, 4, 5, 5, 7, 9. Which statement is TRUE?

Module 02 · Measures of Dispersion

How Spread Out is the Data?

Central tendency tells where data centres. Dispersion tells how much it varies.

Core Concept

Why Dispersion Matters

Two datasets can have the same mean but completely different spreads. Student A scores: 48, 50, 50, 52 (Mean=50). Student B scores: 10, 50, 90, 50 (Mean=50). Same mean — but Student B is wildly inconsistent!

• Range: Max − Min. Simple but affected by outliers.
• Mean Deviation: Average of |deviations from mean|. More informative than range.
• Variance: Average of squared deviations. Foundation of statistics.
• Standard Deviation: √Variance. Most used measure of spread.
• IQR: Q3 − Q1. Middle 50% range. Robust to outliers.
• CV: (σ/x̄) × 100. Compares spread across datasets.

Range & Mean Deviation

Range = Xmax − Xmin
Mean Deviation (from mean) = Σ|xᵢ − x̄| / n
Mean Deviation (from median) = Σ|xᵢ − M| / n

Mean Deviation is more representative than Range because Range uses only 2 values while MD uses all values. MD from Median is always ≤ MD from Mean.

Variance & Standard Deviation

Standard deviation is the most important measure of dispersion in statistics. It quantifies the average spread of data around the mean. Population vs Sample formulas differ by the denominator (N vs n−1).

Variance & Standard Deviation

Population Variance: σ² = Σ(xᵢ − μ)² / N
Sample Variance: s² = Σ(xᵢ − x̄)² / (n−1) ← Bessel's correction
Standard Deviation: σ = √[Σ(xᵢ − μ)² / N]

Shortcut formula: σ² = Σxᵢ²/N − (Σxᵢ/N)² = E(X²) − [E(X)]²
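
To see the formulas side by side, the sketch below applies them to the Student A and Student B scores from the start of this module. It assumes the standard library `statistics` module; the variable names are illustrative.

```python
from statistics import pvariance, variance, pstdev

student_a = [48, 50, 50, 52]
student_b = [10, 50, 90, 50]

pop_var_a = pvariance(student_a)     # divides by N
sample_var_a = variance(student_a)   # divides by n - 1 (Bessel's correction)
pop_var_b = pvariance(student_b)

# Shortcut formula: mean of squares minus square of mean
n = len(student_a)
shortcut_a = sum(x * x for x in student_a) / n - (sum(student_a) / n) ** 2

print(f"A: population variance = {pop_var_a}, sample variance = {sample_var_a:.3f}")
print(f"A: shortcut formula    = {shortcut_a}")
print(f"A: SD = {pstdev(student_a):.3f}   B: SD = {pstdev(student_b):.3f}")
```

Both students have a mean of 50, but the standard deviations make the difference in consistency explicit.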

Standard Deviation: Low vs High Spread

IQR & Box Plot

The Interquartile Range (IQR) is the range of the middle 50% of data. It is built from the quartiles: Q1 (25th percentile), Q2 (median) and Q3 (75th percentile). A box plot visualises the five-number summary: Min, Q1, Median, Q3, Max.

Quartiles & IQR

Q1 = value at (n+1)/4 position
Q3 = value at 3(n+1)/4 position
IQR = Q3 − Q1
Outlier bounds: < Q1 − 1.5×IQR or > Q3 + 1.5×IQR
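
A possible Python sketch of these rules. It relies on `statistics.quantiles` with `method="exclusive"`, which uses the (n+1)-based positions given above and interpolates between neighbouring values when a position is fractional. The dataset is hypothetical.

```python
from statistics import quantiles

# Hypothetical data with one suspiciously large value
data = [3, 5, 7, 8, 12, 13, 40]

# With n = 7, Q1 sits at position (7+1)/4 = 2 and Q3 at position 3(7+1)/4 = 6
q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_bound or x > upper_bound]

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"bounds = ({lower_bound}, {upper_bound}), outliers = {outliers}")
```

Other software may use slightly different quartile conventions, so results can differ for small samples.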

📦 Box Plot Visualisation

Coefficient of Variation (CV)

CV allows comparison of spread across datasets with different units or scales. It expresses standard deviation as a percentage of the mean. Lower CV = more consistent.

Coefficient of Variation

CV = (σ / x̄) × 100%

Example: Stock A: Mean return 10%, SD = 2% → CV = 20%. Stock B: Mean return 20%, SD = 8% → CV = 40%. Stock A is more consistent relative to its return — better risk-adjusted investment!
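
The comparison above takes only a few lines of Python. The helper name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=2, mean=10)   # Stock A
cv_b = coefficient_of_variation(sd=8, mean=20)   # Stock B

more_consistent = "A" if cv_a < cv_b else "B"
print(f"CV(A) = {cv_a}%, CV(B) = {cv_b}%, more consistent: Stock {more_consistent}")
```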

Which measure of dispersion is most useful for comparing variability between two datasets with different units (e.g., height in cm vs weight in kg)?

Module 03 · Probability Distributions

The Shape of Data

Normal, binomial, Poisson — the mathematical models behind real-world phenomena.

Normal Distribution — The Bell Curve

The normal distribution is the most important distribution in statistics. Many natural phenomena — heights, exam scores, measurement errors — follow it. It is symmetric, bell-shaped, and completely defined by its mean (μ) and standard deviation (σ).

Normal Distribution PDF

f(x) = [1 / (σ√(2π))] × e^[−(x−μ)² / (2σ²)]

Empirical Rule (68-95-99.7 Rule):
μ ± 1σ covers 68.27% of data
μ ± 2σ covers 95.45% of data
μ ± 3σ covers 99.73% of data

🔔 Normal Distribution — 68-95-99.7 Rule

Z-Score (Standardisation): z = (x − μ) / σ. Transforms any normal distribution to Standard Normal (μ=0, σ=1). Used to find probabilities using Z-tables.
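
In place of a Z-table, the standard library class `statistics.NormalDist` (Python 3.8+) can supply the probabilities. The sketch below checks the empirical rule and standardises one hypothetical exam score; the numbers 85, 70 and 10 are assumptions chosen for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mu = 0, sigma = 1

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

# Share of a normal distribution within k standard deviations of the mean
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

# Hypothetical exam: score 85, mean 70, standard deviation 10
z = z_score(85, mu=70, sigma=10)
share_below = standard_normal.cdf(z)

for k, share in coverage.items():
    print(f"within {k} sigma: {share:.2%}")
print(f"z = {z}, P(X <= 85) = {share_below:.4f}")
```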

Skewness & Kurtosis

Skewness measures asymmetry. Kurtosis measures the "tailedness" — how heavy the tails are compared to a normal distribution.

Pearson's Coefficient of Skewness

Sk = 3(Mean − Median) / σ

Positive Skew (right): Mean > Median > Mode → long right tail
Negative Skew (left): Mean < Median < Mode → long left tail
Symmetric: Mean = Median = Mode

Negative | Symmetric | Positive Skew

Kurtosis: Leptokurtic (K>3) = heavy tails, sharp peak (riskier in finance). Platykurtic (K<3) = light tails, flat peak. Mesokurtic (K=3) = normal distribution baseline.

Binomial Distribution

Models the number of successes in n independent Bernoulli trials, where each trial has probability p of success. Used for quality control, election polling, medical trials.

Binomial Distribution

P(X = k) = C(n,k) × p^k × (1−p)^(n−k)
Mean = np | Variance = np(1−p) | SD = √[np(1−p)]
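
A minimal sketch of the binomial formula using `math.comb` (Python 3.8+). The function name `binomial_pmf` is illustrative; n = 10 and p = 0.4 are chosen to match the chart in this section.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_four = pmf[4]                                    # probability of exactly 4 successes
total = sum(pmf)                                   # should be 1
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
var_from_pmf = sum(k * k * pk for k, pk in enumerate(pmf)) - mean_from_pmf**2

print(f"P(X = 4) = {p_four:.4f}")
print(f"mean = {mean_from_pmf:.2f} (np = {n * p}), variance = {var_from_pmf:.2f}")
```

Summing over the full distribution confirms the shortcut results Mean = np and Variance = np(1−p).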

Binomial Distribution (n=10, p=0.4)

Poisson Distribution

Models the number of events occurring in a fixed interval of time or space, when events occur at a constant average rate (λ). Used for: calls per hour, accidents per day, typos per page.

Poisson Distribution

P(X = k) = (e^(−λ) × λ^k) / k!
Mean = λ | Variance = λ | (Mean = Variance is a key property!)

Example: A call centre receives 3 calls/minute on average (λ=3). P(exactly 5 calls in a minute) = (e⁻³ × 3⁵) / 5! = (0.0498 × 243) / 120 ≈ 0.1008 (10.08%)
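
The call-centre figure can be verified with a short Python sketch. The function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) when events arrive at an average rate of lam per interval."""
    return exp(-lam) * lam**k / factorial(k)

rate = 3                               # calls per minute
p_five = poisson_pmf(5, rate)          # exactly 5 calls in a minute
p_at_most_two = sum(poisson_pmf(k, rate) for k in range(3))

print(f"P(X = 5)  = {p_five:.4f}")
print(f"P(X <= 2) = {p_at_most_two:.4f}")
```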

Module 04 · Probability Theory

The Mathematics of Chance

Foundation of statistics, finance, and decision making under uncertainty.

Basic Probability Concepts

Probability is a number between 0 and 1 that measures how likely an event is to occur. P=0 means impossible, P=1 means certain.

Classical Probability (Laplace)

P(A) = Number of favourable outcomes / Total possible outcomes

P(A') = 1 − P(A) (Complement Rule)
0 ≤ P(A) ≤ 1 (Axiom of Probability)

• Addition Rule: P(A∪B) = P(A) + P(B) − P(A∩B). For mutually exclusive events: P(A∪B) = P(A) + P(B).
• Multiplication Rule: P(A∩B) = P(A) × P(B|A). For independent events: P(A∩B) = P(A) × P(B).
• Conditional Probability: P(A|B) = P(A∩B) / P(B). "Probability of A given B has occurred."
• Bayes' Theorem: P(A|B) = P(B|A)×P(A) / P(B). Updates beliefs with new evidence.

Bayes' Theorem — Deep Dive

Bayes' theorem is one of the most powerful ideas in all of statistics. It tells us how to update our prior beliefs when we receive new evidence.

Bayes' Theorem

P(H|E) = P(E|H) × P(H) / P(E)

P(H|E) = Posterior (belief after evidence)
P(H) = Prior (initial belief)
P(E|H) = Likelihood (how well H explains E)
P(E) = Marginal (total probability of E)

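Medical Test Example: Disease prevalence = 1% (Prior P(D)=0.01). The test is 99% accurate, taken here to mean that it detects 99% of true cases and gives a false positive for only 1% of healthy people. You test positive. What's the actual probability you have the disease?

P(D|+) = P(+|D)×P(D) / P(+), where P(+) = P(+|D)×P(D) + P(+|D')×P(D') = 0.99×0.01 + 0.01×0.99 = 0.0198. So P(D|+) = 0.0099/0.0198 = 50%, not 99% as most people intuitively assume.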
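
The same update can be written as a small Python function. This is a sketch: the name `posterior` and its parameters are illustrative, and "99% accurate" is assumed to mean a 99% detection rate together with a 1% false-positive rate.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem."""
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1 - prior)
    evidence = true_positives + false_positives      # P(+), the marginal
    return true_positives / evidence

p_rare = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.01)

# Same test, but a hypothetical population where 10% have the disease
p_common = posterior(prior=0.10, sensitivity=0.99, false_positive_rate=0.01)

print(f"prevalence 1%:  P(D|+) = {p_rare:.1%}")
print(f"prevalence 10%: P(D|+) = {p_common:.1%}")
```

Changing only the prior shows how strongly the base rate drives the result.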

Probability Tree — Medical Test

Expected Value & Variance of a Random Variable

Expected Value (Discrete)

E(X) = Σ xᵢ × P(xᵢ)
Var(X) = E(X²) − [E(X)]² = Σ xᵢ² × P(xᵢ) − μ²
SD(X) = √Var(X)

Expected Return: An investment returns 20% with probability 60% (scenario A) and −5% with probability 40% (scenario B). E(return) = 0.20×0.60 + (−0.05)×0.40 = 0.12 − 0.02 = 10% expected return.
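
A minimal Python sketch of E(X) and Var(X) for a discrete random variable, applied to the return example above. The function names are illustrative.

```python
from math import sqrt

def expected_value(values, probs):
    """E(X) = sum of x * P(x); probabilities must sum to 1."""
    if abs(sum(probs) - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

def variance(values, probs):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mu = expected_value(values, probs)
    return sum(x * x * p for x, p in zip(values, probs)) - mu**2

returns = [0.20, -0.05]
probabilities = [0.60, 0.40]

exp_return = expected_value(returns, probabilities)
var_return = variance(returns, probabilities)
sd_return = sqrt(var_return)

print(f"E(return) = {exp_return:.2%}, SD = {sd_return:.2%}")
```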

A fair die is rolled. What is the Expected Value?

Module 05 · Correlation Analysis

How Variables Move Together

Pearson, Spearman, and the golden rule: correlation ≠ causation.

Pearson's Correlation Coefficient (r)

Pearson's r measures the strength and direction of a linear relationship between two continuous variables. It ranges from −1 to +1.

Pearson's r

r = Σ[(xᵢ−x̄)(yᵢ−ȳ)] / √[Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²]
r = [nΣxy − ΣxΣy] / √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}
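
The deviation form of the formula translates directly into Python. This is a sketch: the function name `pearson_r` and the five data points are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from deviations about the two means."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
r_perfect = pearson_r([1, 2, 3], [2, 4, 6])   # points on an exact line

print(f"r = {r:.4f}, perfect line r = {r_perfect:.1f}")
```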

r = +1.0: Perfect +ve
r = +0.7: Strong +ve
r = +0.3: Weak +ve
r = 0.0: No correlation
r = −0.7: Strong −ve
r = −1.0: Perfect −ve

Scatter Plots — Different r Values

Spearman's Rank Correlation

Spearman's ρ (rho) is a non-parametric measure based on the ranks of data. Use it when data is ordinal, or when the relationship is monotonic but not necessarily linear.

Spearman's Rank Correlation

ρ = 1 − [6 × Σdᵢ²] / [n(n² − 1)]
where dᵢ = rank(xᵢ) − rank(yᵢ)

Use Spearman when: Data is ordinal (ranks, ratings) | Outliers are present | The relationship is monotonic but not linear | You're comparing rankings (e.g., judges' rankings in a competition).
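
A rough Python sketch of the rank formula. It assumes there are no tied values, since the 6Σd² shortcut is exact only in that case; the function names and the judges' rankings are hypothetical.

```python
def ranks(values):
    """Rank from 1 (smallest) upward; ties are rejected in this simple sketch."""
    if len(set(values)) != len(values):
        raise ValueError("this simple formula assumes no tied values")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical rankings of five contestants by two judges
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho_judges = spearman_rho(judge_a, judge_b)

# A monotonic but non-linear relationship (y = x cubed)
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])

print(f"judges: rho = {rho_judges}, cubic: rho = {rho_cubic}")
```

The cubic case gives ρ = 1 even though the relationship is not linear, which is the situation where Spearman is preferred over Pearson.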

Correlation ≠ Causation! Ice cream sales and drowning deaths are highly correlated (r≈0.9) — but ice cream doesn't cause drowning! Both are caused by a confounding variable: hot summer weather.

Module 06 · Regression Analysis

Predicting with Lines & Curves

Build mathematical models that predict one variable from another.

Simple Linear Regression

Regression finds the best-fit line through data points. The "Ordinary Least Squares" (OLS) method minimises the sum of squared residuals (vertical distances from points to the line).

Regression Line: Y on X

ŷ = a + bx
b (slope) = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²] = r × (σy/σx)
a (intercept) = ȳ − b×x̄

Note: Regression line always passes through (x̄, ȳ)

R² (Coefficient of Determination): R² = r² tells what % of variance in Y is explained by X. R²=0.81 means 81% of variation in Y is explained by the regression model.
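
The slope, intercept and R² formulas can be sketched in plain Python as follows. The function names and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y-hat = a + b*x; returns (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x**2)
    a = sum_y / n - b * sum_x / n          # intercept = mean(y) - b * mean(x)
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variation in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
prediction = a + b * 6                     # predicted y at x = 6

print(f"y-hat = {a:.2f} + {b:.2f}x, R^2 = {r2:.2f}, y-hat(6) = {prediction:.2f}")
```

Evaluating the fitted line at the mean of x returns the mean of y, as the note above states.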

Regression Line — Scatter + Best Fit

Two Regression Lines

In statistics, there are two regression lines: Y on X (used to predict Y given X), and X on Y (used to predict X given Y). They are different unless r = ±1.

Two Regression Lines

Y on X: (y−ȳ) = r(σy/σx)(x−x̄) → use to predict Y
X on Y: (x−x̄) = r(σx/σy)(y−ȳ) → use to predict X

Both lines intersect at the point (x̄, ȳ)
Product of regression coefficients = r² → byx × bxy = r²

Finding r from regression coefficients: If byx = 0.8 and bxy = 0.2, then r = √(0.8 × 0.2) = √0.16 = 0.4. Note: the two coefficients always share a sign, and r takes that sign (if both were negative, r would be −0.4).

The regression coefficient of Y on X is 1.6 and of X on Y is 0.4. What is the correlation coefficient r?

Module 07 · Hypothesis Testing

Is it Real or Just Random Chance?

The framework of scientific decision-making under uncertainty.

The Hypothesis Testing Framework

Hypothesis testing is a formal procedure to decide whether sample data provides enough evidence to reject a null hypothesis (H₀). We never "prove" H₀ true — we only reject it or fail to reject it.

1. State Hypotheses. H₀ (null): No effect / status quo. H₁ (alternative): There is an effect.
2. Choose Significance Level. α = 0.05 (5%) is most common. This is the probability of rejecting H₀ when it's true (Type I error).
3. Choose & Calculate Test Statistic. Z-test, t-test, chi-square, F-test: the choice depends on data type and sample size.
4. Find p-value or Critical Value. p-value = probability of getting results at least as extreme as those observed, assuming H₀ is true.
5. Make Decision. If p-value < α → Reject H₀. If p-value ≥ α → Fail to reject H₀.

Common Test Statistics

Z-test (known σ, large n): z = (x̄ − μ₀) / (σ/√n)
t-test (unknown σ, small n): t = (x̄ − μ₀) / (s/√n), df = n−1
Chi-square (goodness of fit): χ² = Σ[(O−E)²/E]
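
As an illustration of the procedure, here is a sketch of a two-tailed Z-test using `statistics.NormalDist` for the p-value. The function name `z_test` and the sample figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n):
    """Two-tailed Z-test for a mean with known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # area in both tails
    return z, p_value

# Hypothetical study: H0 says mu = 50; a sample of 100 has mean 52, sigma = 10
alpha = 0.05
z, p_value = z_test(sample_mean=52, mu0=50, sigma=10, n=100)
decision = "Reject H0" if p_value < alpha else "Fail to reject H0"

print(f"z = {z:.2f}, p-value = {p_value:.4f} -> {decision}")
```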

Type I & Type II Errors

	H₀ is TRUE	H₀ is FALSE
Reject H₀	Type I Error (α) — False Positive	Correct Decision (Power = 1−β)
Fail to Reject H₀	Correct Decision (1−α)	Type II Error (β) — False Negative

Type I Error (α): Convicting an innocent person. Rejecting H₀ when it's true. We control this directly with α.

Type II Error (β): Acquitting a guilty person. Failing to detect a real effect. Minimise by increasing sample size.

Critical Region — One Tailed vs Two Tailed

A drug company claims their drug works. The null hypothesis H₀ is "the drug has no effect." If the drug works but the test fails to detect it, this is a:

Module 08 · Sampling Theory

The Art of Representative Selection

How to draw conclusions about a population from a sample.

Population vs Sample

A population is the entire group of interest. A sample is a subset drawn from it. Statistics (from sample) are used to estimate Parameters (of population).

Measure	Population (Parameter)	Sample (Statistic)
Mean	μ (mu)	x̄ (x-bar)
Variance	σ² (sigma squared)	s²
Std Dev	σ (sigma)	s
Size	N	n
Proportion	P	p̂

Sampling Methods

• Simple Random: Every member has an equal chance of selection, like a lottery draw.
• Systematic: Select every kth element. Simple, but may have periodicity bias.
• Stratified: Divide the population into strata (groups) and sample proportionally from each.
• Cluster: Divide the population into clusters and randomly select entire clusters.
• Convenience: Use what's easiest. Fast but biased; not recommended.

Central Limit Theorem (CLT) — The Most Important Theorem

The CLT states that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has a finite variance. A common rule of thumb treats n ≥ 30 as large enough. This is why the normal distribution is everywhere!

Sampling Distribution of x̄

E(x̄) = μ (sample mean is unbiased estimator of population mean)
SE(x̄) = σ/√n (Standard Error of the Mean)

For large n, x̄ is approximately distributed as N(μ, σ²/n)

Central Limit Theorem — Sample Size Effect

Confidence Interval: x̄ ± z × (σ/√n). For 95% CI, z = 1.96. This means: we are 95% confident the true population mean lies within this range.
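
A short sketch of the interval calculation, with the critical value taken from `statistics.NormalDist.inv_cdf` instead of a table. The function name `mean_confidence_interval` and the sample figures are hypothetical, and σ is assumed to be known.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """x-bar +/- z * sigma / sqrt(n) for a known population sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z * sigma / sqrt(n)                      # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 52, sigma 10, n = 100
low, high = mean_confidence_interval(sample_mean=52, sigma=10, n=100)
z_95 = NormalDist().inv_cdf(0.975)

print(f"z = {z_95:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Quadrupling n halves the margin, because the standard error shrinks with √n.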

Module 09 · Time Series Analysis

Patterns Through Time

Decompose trends, seasonality, and cycles in temporal data.

Components of Time Series

• Trend (T): Long-term upward or downward movement, e.g. GDP growing over decades.
• Seasonal (S): Regular pattern that repeats within a year, e.g. ice cream sales in summer.
• Cyclical (C): Irregular fluctuations over 2–10 years, e.g. business cycles.
• Irregular (I): Random, unpredictable variations, e.g. natural disasters, wars.

Time Series Decomposition Models

Additive: Y = T + S + C + I (when seasonal variation is constant)
Multiplicative: Y = T × S × C × I (when seasonal variation grows with trend)

Moving Averages

A moving average smooths out short-term fluctuations to reveal the underlying trend. A 3-year moving average replaces each value with the average of it and its two neighbours.

Moving Average Smoothing

Simple Moving Average (3-period)

MA₃ = (Yₜ₋₁ + Yₜ + Yₜ₊₁) / 3

For even-period MAs (e.g. 4-point), a second centring average is needed.

Exponential Smoothing: Gives more weight to recent observations. Sₜ = αXₜ + (1−α)Sₜ₋₁, where α is the smoothing constant (0<α<1). Higher α = more responsive to recent changes.
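
Both smoothing methods fit in a few lines of Python. This is a sketch: the function names and the series are hypothetical, and the exponential smoother is assumed to start from the first observation, which is one common convention among several.

```python
def centred_ma3(series):
    """3-period centred moving average; the two end points have no value."""
    return [
        (series[i - 1] + series[i] + series[i + 1]) / 3
        for i in range(1, len(series) - 1)
    ]

def exponential_smoothing(series, alpha):
    """S_t = alpha * X_t + (1 - alpha) * S_(t-1), starting from S_0 = X_0."""
    if not series:
        return []
    smoothed = [series[0]]   # assumption: initialise with the first observation
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical yearly values
sales = [10, 12, 14, 13, 15]

ma3 = centred_ma3(sales)
smooth = exponential_smoothing(sales, alpha=0.5)

print("3-period MA:", ma3)
print("exp. smoothing:", smooth)
```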

Module 10 · Index Numbers

Measuring Change Over Time

The mathematics behind CPI, WPI, Sensex, and cost-of-living indices.

What are Index Numbers?

Index numbers are specialised averages that measure relative change in a variable (or group of variables) over time or between places. They reduce complex data to a single comparable number. The Consumer Price Index (CPI) measures inflation; the Sensex measures stock market performance.

Simple Price Index

P₀₁ = (P₁ / P₀) × 100
where P₀ = price in base year, P₁ = price in current year

Weighted Index Numbers

• Laspeyres Index: Uses BASE YEAR quantities as weights. Tends to overstate inflation.
• Paasche Index: Uses CURRENT YEAR quantities as weights. Tends to understate inflation.
• Fisher's Ideal Index: Geometric mean of Laspeyres & Paasche. Called "ideal" because it satisfies both the time reversal and factor reversal tests.

Key Index Formulas

Laspeyres: L = [Σ(P₁Q₀) / Σ(P₀Q₀)] × 100
Paasche: Pa = [Σ(P₁Q₁) / Σ(P₀Q₁)] × 100
Fisher: F = √(L × Pa) ← Geometric Mean of L and Pa
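
The three formulas can be sketched in Python as below. The function names and the prices and quantities for three commodities are hypothetical, chosen so that buyers shift away from the item whose price rises most.

```python
from math import sqrt

def laspeyres(p0, p1, q0):
    """Price index weighted by base-year quantities."""
    return sum(a * b for a, b in zip(p1, q0)) / sum(a * b for a, b in zip(p0, q0)) * 100

def paasche(p0, p1, q1):
    """Price index weighted by current-year quantities."""
    return sum(a * b for a, b in zip(p1, q1)) / sum(a * b for a, b in zip(p0, q1)) * 100

def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))

# Hypothetical data for three commodities
p0 = [10, 20, 5]       # base-year prices
p1 = [11, 30, 5]       # current-year prices
q0 = [100, 50, 200]    # base-year quantities
q1 = [110, 30, 210]    # current-year quantities

L = laspeyres(p0, p1, q0)
Pa = paasche(p0, p1, q1)
F = fisher(p0, p1, q0, q1)

print(f"Laspeyres = {L:.2f}, Paasche = {Pa:.2f}, Fisher = {F:.2f}")
```

With this data the Laspeyres value sits above the Paasche value and Fisher lies between them, in line with the descriptions above.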

Tests for Index Numbers (Fisher's tests):
• Unit Test: Index should be independent of units of measurement.
• Time Reversal Test: P₀₁ × P₁₀ = 1. Fisher satisfies this; Laspeyres & Paasche don't.
• Factor Reversal Test: Price index × Quantity index = Value index. Of the three indices above, only Fisher satisfies this.

India CPI Trend (approximate)

Which index number is called Fisher's "Ideal" Index?

The Language of Data & Uncertainty
