Learning Resources

Interactive demonstrations and code examples for econometrics, microeconomics, and forecasting.

▶Standard Deviation

Statistics · Descriptive Statistics

Imagine this: Two classes both have an average exam score of 70. In Class A, most students scored between 65 and 75 — pretty close together. In Class B, scores ranged from 30 to 100 — all over the place! Both classes have the same average, but clearly something is different. The standard deviation is the single number that captures this difference. A small SD means scores are tightly clustered around the mean. A large SD means they’re spread out.

Play with the sliders below to see how the spread changes!

The Standard Deviation Formula

s = √( ∑ (x_i − x̄)² / (n−1) )

(xᵢ − x̄)² — squared deviations ∑ / (n−1) — variance √(…) — square root

The formula uses Greek letters as shorthand. Here’s what each one means:

Symbol	Name	Meaning in this context
σ	lowercase sigma	Standard deviation of a population
s	s (Roman)	Standard deviation of a sample (what we usually compute)
σ² / s²	sigma-squared / s-squared	Variance — SD before taking the square root (in squared units)
Σ	uppercase Sigma	Summation — “add up all the following terms”
μ	mu	Population mean (true average for the whole group)
x	x-bar	Sample mean (average computed from collected data)
n	n (Roman)	Sample size — number of observations

Tip: lowercase σ and uppercase Σ look similar but mean very different things. Context matters: σ (or s) is a single number measuring spread; Σ is an instruction to sum a list of numbers.

Normal Distribution of Exam Scores (mean = 70)

Within ±1σ (≈68%) Within ±2σ (≈95%) Beyond ±2σ (≈5%) Mean ±1σ ±2σ

Standard Deviation (s) 10.0

Controls how spread out the scores are (mean is fixed at 70)

SD is in the same units as the data. If scores are in points, the SD is in points. A score of 80 means very different things in a class with SD=2 versus SD=20. Try dragging the SD slider to see how the spread changes!
The 68-95-99.7 rule (Empirical Rule). For bell-shaped data, about 68% of observations fall within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD. Check the stats output above — the percentages match the rule!
Mean and SD are independent. The spread of the bell curve depends only on σ, not on where the curve is centered. A class with mean 60 and SD=10 has the exact same bell shape as one with mean 85 and SD=10 — just shifted on the score axis.
SD vs. Variance. Variance is SD². It’s useful in math (variances add across independent variables) but hard to interpret because it’s in squared units. The SD is just its square root — back in the original units.
Outliers inflate the SD. Because deviations are squared before averaging, extreme points count disproportionately. A few outliers can make the SD much larger than it would otherwise be.

Practice Questions

Question 1 of 5

You answered all 5 questions!
Score: / 5

▶Central Limit Theorem

Statistics · Inference

Imagine this: You sample 30 people and compute their average income. You do this thousands of times. The distribution of income in the population might be wildly skewed — but the distribution of those sample averages will be approximately normal.

This is the Central Limit Theorem: for large enough n, the sampling distribution of the mean is normal regardless of the population shape. It underlies every t-test, confidence interval, and regression inference you will ever run.

Pick a population shape, set the sample size, and simulate.

Population distribution

Shape of the underlying population

Sample size (n) 30

Observations drawn per sample

Population shape preview

Distribution of Sample Means — 2,000 simulations

Run the simulation to see results.

Click “Simulate” to run the demonstration.

Key Takeaways

The CLT is why the normal distribution is everywhere. Individual observations don’t need to be normal — only sample means do. That’s the foundation of t-tests, confidence intervals, and regression inference.
Larger n ︎→︎ faster convergence. At n = 1, sample means look like the raw population. By n = 30, the distribution is usually close to normal. At n = 100, it’s excellent even for severely skewed data.
SE = σ/√n. As n increases, sample means cluster more tightly around μ. Quadrupling the sample size halves the standard error.
Independence matters. The CLT assumes random, independent observations. Clustered or time-series data with autocorrelation can violate this and require different inference methods.

▶Confidence Intervals

Econometrics · Statistical Inference

Imagine this: You want to know the average exam score for every student in the school. But you can't ask everyone — that would take forever! So you ask a smaller group and use their average as your best guess. A confidence interval is like saying: "I'm pretty sure the real answer is somewhere in this range."

The wider the range, the more sure you can be it contains the truth. The narrower the range, the more precise — but the less sure. Play with the controls below to see how it works!

The Confidence Interval Formula

Estimate ± Z-value × SD / √n

Our best guess How confident Standard Error

This is where the Z-value in the formula comes from. The row matching your selected confidence level is highlighted.

Confidence Level	Significance (α)	Tail Area (α/2)	Z* (Critical Value)	Meaning
90%	0.10	0.05	1.645	We allow a 10% chance of being wrong
95%	0.05	0.025	1.960	We allow a 5% chance of being wrong
99%	0.01	0.005	2.576	We allow a 1% chance of being wrong

df	t* at 90%	t* at 95%	t* at 99%
5	2.015	2.571	4.032
10	1.812	2.228	3.169
15	1.753	2.131	2.947
20	1.725	2.086	2.845
25	1.708	2.060	2.787
30	1.697	2.042	2.750
50	1.676	2.009	2.678
100	1.660	1.984	2.626
∞	1.645	1.960	2.576

Z vs. t: The top table shows Z* values (used when the sample is large, roughly n > 30). The bottom table shows t* values, which depend on degrees of freedom (df = n − 1). Notice how t* values get closer to Z* as df increases — with a big enough sample, they're practically the same!

The Confidence Interval for Average Exam Score

78 points

Point Estimate

95% CI

How Sure Do You Want to Be?

More sure = wider range. It's a tradeoff!

Number of Students Asked 30

More students = more reliable guess

Standard Deviation (how spread out the scores are) 5.0

Higher = scores are all over the place. Lower = everyone scored similarly.

Standard Error (SD / √n): 0.91

Ask more students ︎→︎ narrower interval. The more people you ask, the better your guess. Try dragging the sample size slider to the right!
More variation in scores ︎→︎ wider interval. If some students got 20 and others got 100, it's harder to pin down the average. Try increasing the standard error!
More confidence ︎→︎ wider interval. Saying "I'm 99% sure" requires casting a wider net than "I'm 90% sure." Try switching between the confidence level buttons!
It's always a tradeoff. You can be very precise OR very confident, but not both at the same time (unless you get more data!).

Practice Questions

Question 1 of 5

You answered all 5 questions!
Score: / 5

▶Hypothesis Testing & P-Values

Statistics · Inference

Imagine this: You run a regression on wage data and get a coefficient on education of 0.08 — each extra year of school is associated with an 8% wage increase. But is that a real effect, or just random noise in your sample?

The null hypothesis H₀ says the true coefficient is zero: no effect. The t-statistic measures how many standard errors your estimate sits from zero. The p-value asks: if H₀ were true, how often would you see a result this extreme just by chance?

Drag the t-statistic below and watch the p-value and decision update in real time.

The t-Statistic

t = β̂ ⁄ SE(β̂)

β̂ — OLS estimate SE(β̂) — standard error

Standard Normal Distribution — Rejection Regions (shaded red)

t-statistic 0.00

Drag to simulate different test statistics

Significance level (α)

Threshold for rejecting H₀

Test type

Two-tailed: H₁: β ≠ 0 One-tailed right: H₁: β > 0 One-tailed left: H₁: β < 0

Key Takeaways

A p-value is not the probability H₀ is true. It’s the probability of seeing data this extreme or more assuming H₀ were true. These are very different things.
Failing to reject ≠ accepting H₀. You may simply lack enough data (statistical power) to detect a real effect. Absence of evidence is not evidence of absence.
Statistical significance ≠ economic significance. With a large enough sample, even a trivially small effect will produce p < 0.05. Always report the coefficient magnitude and a confidence interval alongside the p-value.
α = 0.05 is conventional, not sacred. It was popularized by Fisher in the 1920s. The right threshold depends on the cost of false positives vs. false negatives in your specific application.
Two-tailed vs. one-tailed. A two-tailed test asks: is β ≠ 0? A one-tailed test asks: is β > 0 or β < 0? Toggle between them above. Note that one-tailed tests have less demanding critical values — use them only when theory strongly predicts a direction before seeing the data.

Quick Check

▶Type I & Type II Errors, Statistical Power

Statistics · Inference

Every hypothesis test can go wrong in two ways. A Type I error (α) is rejecting H₀ when it’s actually true — a false positive. A Type II error (β) is failing to reject H₀ when it’s actually false — a false negative. Power = 1 − β is the probability of correctly detecting a true effect.

The chart below shows the H₀ and H₁ distributions. The critical value separates “reject” from “fail to reject.” Drag the effect size to see how power changes.

α — Type I (false positive) β — Type II (false negative) Power — correct rejection

H₀: N(0,1) vs H₁: N(δ,1) — Two-Tailed Test

Effect size (δ) Large

Distance between H₀ and H₁ means

Significance level (α)

Sets the critical value for rejection

Key Takeaways

α and β trade off against each other. Making the test stricter (lower α) moves the critical value further out, which increases β. You will miss more true effects. There is no free lunch.
Power increases with effect size and sample size. Larger true effects are easier to detect. More data reduces SE, which narrows both distributions and makes them easier to separate.
Underpowered studies are dangerous. A study that finds p > α may simply lack the sample size to detect the effect — not evidence that no effect exists.
Power analysis should happen before data collection. Given a target power (e.g., 80%), a hypothesized effect size, and α, you can calculate the required n. Post-hoc power calculations are generally uninformative.

▶Ordinary Least Squares

Econometrics · Linear Regression

OLS fits a line through data by minimizing the sum of squared residuals. The sliders add measurement error to each variable independently. The hidden variable slider introduces an omitted confounder correlated with both x1 and x2 — watch how the coefficients get pulled as its effect on Y grows.

We usually show 2D projections, but the 3D view is a more complete picture of the regression.

x1 (σ₁): σ₁ = 0.30

x2 (σ₂): σ₂ = 0.30

hidden variable (U): U = 0.00

x1 (blue): adds measurement error to x1 only — blue dots spread horizontally, x1's standard error grows.

x2 (red): adds measurement error to x2 only — red dots spread horizontally, x2's standard error grows.

hidden variable U (purple): U is correlated with both x1 and x2 but excluded from the regression. As U rises, its effect on Y grows and the coefficients on x1 and x2 pick up its influence — omitted variable bias.

▶Instrumental Variables

Econometrics · Causal Inference

OLS is unbiased only when the regressor X is uncorrelated with the error. When X is endogenous — correlated with unobserved determinants of Y — OLS picks up not just the causal effect but also the back-door path through the omitted variable. An instrumental variable Z breaks the endogeneity: it shifts X (relevance) but has no direct effect on Y (exclusion restriction). 2SLS uses only the exogenous part of X's variation — the part explained by Z — to identify β.

First Stage — Z ︎→︎ X

Structural Equation — X ︎→︎ Y

first-stage strength (π): Strong

endogeneity (ρ): ρ = 0.50

first-stage strength π (blue): controls how strongly Z predicts X. Higher π = larger first-stage F-statistic = more precise IV estimate. When F < 10 the instrument is "weak" and 2SLS can be severely biased and unreliable.

endogeneity ρ (purple): the correlation between X and the error term — the source of OLS bias. OLS absorbs the back-door path through the omitted variable; 2SLS strips it out. At ρ = 0 both estimators are consistent; as ρ rises the OLS coefficient drifts while 2SLS stays near 1.

▶Maximum Likelihood Estimation

Econometrics · Estimation

Maximum Likelihood Estimation finds parameters that make the observed data most probable. For binary outcomes we use logistic regression — MLE's answer to OLS for 0/1 data. Instead of minimizing squared residuals, MLE maximizes the log-likelihood. Output shows z-statistics and a Pseudo R² in place of the OLS R².

We usually show 2D projections, but the 3D view is a more complete picture of the regression.

x1 (σ₁): σ₁ = 0.30

x2 (σ₂): σ₂ = 0.30

hidden variable (U): U = 0.00

x1 (blue): adds measurement error to x1 — attenuation bias shrinks the x1 coefficient toward zero.

x2 (red): adds measurement error to x2 — same effect on x2's coefficient.

hidden variable U (purple): U is correlated with both x1 and x2 but excluded from the logit. As U rises, it shifts the true probabilities and the estimated coefficients pick up its influence.

▶Fixed and Random Effects

Panel Data · Econometrics

With panel data — multiple observations per unit over time — unobserved individual characteristics (fixed effects) can bias pooled OLS. Demeaning subtracts each unit's mean from its observations, eliminating any time-invariant omitted variable. The fixed effects estimator exploits only within-unit variation to identify β. Random effects instead treats the individual effect as random and uncorrelated with X, allowing between-unit variation to also inform the estimate — gaining efficiency at the cost of that assumption.

clustered std. error (σ): σ = 0.10

unit heterogeneity (σ_α): σ_α = 0.00

hidden variable (U): U = 0.00

The dashed lines show the regression slope estimated within each group separately — each one is close to the true β. The solid OLS line is driven by between-group variation and diverges when σ_α is large. Clicking Demean collapses the groups to a common origin.

unit heterogeneity (σ_α): scales the time-invariant group intercepts α_i, which are correlated with group-level X. This biases pooled OLS but is removed exactly by demeaning — the FE/RE estimates are unaffected.

hidden variable (U): U is a time-varying omitted variable negatively correlated with within-group X variation. Because it varies within groups, demeaning does not remove it — U biases both pooled OLS and FE/RE estimates. Push far enough and the slope flips sign even after demeaning.

clustered std. error (σ): adds idiosyncratic within-unit noise — widens standard errors but does not bias the slope. This is the variation that clustered standard errors are designed to account for.

▶R-Squared & Goodness of Fit

▶Omitted Variable Bias

Regression · Causal Inference

You regress wages on education and get β̂ = 0.12. But ability also affects wages and is correlated with education. When you omit ability from the model, OLS attributes some of ability’s effect to education. The estimate is biased.

The bias formula: Bias ≈ γ × ρ(X, Z), where γ is the effect of the omitted variable Z on Y, and ρ is how correlated Z is with X. If either is zero, there is no bias.

Move the sliders to see how the biased estimate (dashed red) diverges from the true line (teal).

Correlation ρ(X, Z) 0.00

How correlated is the omitted variable with X?

Effect of Z on Y (γ) 0.0

How much does the omitted variable shift Y?

Key Takeaways

Bias = γ × δ̂, where δ̂ is the coefficient from regressing Z on X. If the omitted variable is uncorrelated with X (ρ = 0), there is no bias even if Z strongly affects Y.
You can sign the bias. If Z has a positive effect on Y (γ > 0) and is positively correlated with X (ρ > 0), β̂ will be upward biased. Knowing the direction helps even if you can’t measure Z.
Adding controls reduces OVB only if they proxy the omitted variable. Adding irrelevant controls doesn’t help and can introduce “bad control” bias.
This is why randomization is valuable. Random assignment of X makes ρ(X, Z) ≈ 0 for all omitted Z, eliminating OVB by design.

▶Heteroskedasticity

▶The Regression Weighting Problem

Aronow & Samii (2016) — Econometrics · Program Evaluation

When you run OLS with control variables, you may assume you are estimating the average treatment effect (ATE) across your full sample. Aronow and Samii (2016) show this is not the case. OLS implicitly assigns each observation a weight based on how much the treatment varies conditional on the covariates. Observations where treatment is nearly perfectly predicted by covariates receive very little weight — they barely influence the estimate at all. The result is that OLS identifies an effect for a subset of the data that may look very different from the full sample.

The core insight: OLS estimates a weighted average treatment effect where the weight for unit i is proportional to the conditional variance of treatment given covariates — not a simple average over the full sample.

The Math

Let D_i be the treatment indicator and X_i be covariates. Define the OLS residual from regressing treatment on covariates:

ê_i = D_i - E[D_i | X_i]

The OLS estimator of the treatment coefficient is a weighted average of unit-level treatment effects, with weights:

w_i = ê_i² / Σ ê_j²

Units whose treatment status is nearly determined by their covariates (ê_i ≈ 0) receive near-zero weight. The effective sample — the observations actually driving your estimate — can be far smaller and systematically different from your full sample.

Why It Matters

External validity: Your estimate may generalize only to the effective sample, not the full population you studied.
Heterogeneous effects: If treatment effects vary by covariates, OLS does not give you the ATE — it gives you the treatment effect for the people in the middle of the covariate distribution.
Reporting: You should report who is in your effective sample alongside your estimates.

Simulation: Drag to See the Problem

The slider controls how strongly a covariate predicts treatment — simulating increasing selection bias. Drag it right to watch the effective sample shrink and diverge from the full sample in real time.

AS Weight by Propensity Score

Where OLS Gets Its Identification

Covariate separation: 0.0 — random assignment

Effective n = — out of 400 (—% of sample driving the estimate)

At low separation (slider left): treated and control look alike, propensity scores cluster near 0.5, weights are roughly uniform — OLS estimates close to the ATE across the full sample.

At high separation (slider right): treated and control barely overlap. Only observations in the narrow middle zone identify the effect — shown in blue on the right. The gray mass of the full sample gets near-zero weight and effectively disappears from the estimate.

Real Data: LaLonde (1986)

The LaLonde dataset from the National Supported Work (NSW) job training experiment contains 445 treated and 260 control observations with covariates including age, education, race, marital status, and prior earnings. The charts below show how OLS weights are distributed and how the effective sample differs from the full sample.

# Install packages if needed:
# install.packages(c("MatchIt", "cobalt"))

library(MatchIt)   # provides lalonde dataset
library(cobalt)    # for love.plot / balance checks

data("lalonde")

# ── Step 1: OLS of treatment on covariates ──────────────────────────────────
fit_ps <- lm(treat ~ age + educ + black + hisp + married + nodegree + re74 + re75,
             data = lalonde)

# ── Step 2: Compute Aronow-Samii weights ────────────────────────────────────
e_hat   <- residuals(fit_ps)          # ê_i = D_i - E[D_i | X_i]
as_wts  <- e_hat^2 / sum(e_hat^2)    # normalize so weights sum to 1

# ── Step 3: Effective sample size ───────────────────────────────────────────
n_eff <- 1 / sum(as_wts^2)
cat("Full sample n:", nrow(lalonde), "\n")
cat("Effective sample n:", round(n_eff, 1), "\n")

# ── Step 4: Compare full sample vs effective sample covariate means ──────────
covs <- c("age", "educ", "black", "hisp", "married", "nodegree", "re74", "re75")

full_means <- colMeans(lalonde[, covs])
eff_means  <- sapply(covs, function(v) weighted.mean(lalonde[[v]], as_wts))

comparison <- data.frame(
  covariate    = covs,
  full_sample  = round(full_means, 3),
  eff_sample   = round(eff_means, 3),
  difference   = round(eff_means - full_means, 3)
)
print(comparison)

# ── Step 5: Run main OLS and note what it actually estimates ─────────────────
ols <- lm(re78 ~ treat + age + educ + black + hisp + married + nodegree + re74 + re75,
          data = lalonde)
summary(ols)

# The treatment coefficient is a weighted avg treatment effect —
# *not* the ATE over the full LaLonde sample.

* ── Load LaLonde data ──────────────────────────────────────────────────────
* Download from: https://users.nber.org/~rdehejia/data/nsw_dw.dta
* or use the version bundled with teffects/Stata 13+

* Using built-in Stata example data (NSW experimental):
* webuse lalonde, clear
* Or load your own:
use "nsw_dw.dta", clear

* ── Step 1: OLS of treatment on covariates ─────────────────────────────────
reg treat age educ black hispanic married nodegree re74 re75

* ── Step 2: Compute Aronow-Samii weights ───────────────────────────────────
predict e_hat, residuals          // ê_i = D_i - E[D_i | X_i]

gen e_hat_sq = e_hat^2
quietly summarize e_hat_sq
gen as_wt = e_hat_sq / r(sum)    // normalize weights to sum to 1

* ── Step 3: Effective sample size ──────────────────────────────────────────
gen as_wt_sq = as_wt^2
quietly summarize as_wt_sq
scalar n_eff = 1 / r(sum)
display "Full sample n = " _N
display "Effective sample n = " n_eff

* ── Step 4: Compare full vs effective sample covariate means ────────────────
foreach v in age educ black hispanic married nodegree re74 re75 {
    quietly summarize `v'
    scalar full_`v' = r(mean)
    quietly summarize `v' [aw = as_wt]
    scalar eff_`v'  = r(mean)
    display "`v':  full = " full_`v' "   effective = " eff_`v' ///
            "   diff = " eff_`v' - full_`v'
}

* ── Step 5: Main OLS regression ─────────────────────────────────────────────
reg re78 treat age educ black hispanic married nodegree re74 re75

* The coefficient on treat is a weighted average treatment effect (WATE)
* over the effective sample, not the ATE over the full sample.

# pip install causaldata numpy pandas statsmodels

import numpy  as np
import pandas as pd
import statsmodels.formula.api as smf
from causaldata import lalonde   # LaLonde NSW experimental data

df = lalonde.load_pandas().data

# ── Step 1: OLS of treatment on covariates ──────────────────────────────────
covariates = "age + educ + black + hisp + married + nodegree + re74 + re75"
ps_model   = smf.ols(f"treat ~ {covariates}", data=df).fit()

# ── Step 2: Aronow-Samii weights ─────────────────────────────────────────────
e_hat   = ps_model.resid                 # ê_i = D_i - Ê[D_i | X_i]
as_wts  = e_hat**2 / (e_hat**2).sum()   # normalize to sum = 1

# ── Step 3: Effective sample size ────────────────────────────────────────────
n_eff = 1 / (as_wts**2).sum()
print(f"Full sample n:      {len(df)}")
print(f"Effective sample n: {n_eff:.1f}")

# ── Step 4: Compare full vs effective sample covariate means ─────────────────
covs = ["age", "educ", "black", "hisp", "married", "nodegree", "re74", "re75"]

comparison = pd.DataFrame({
    "full_sample": df[covs].mean(),
    "eff_sample" : df[covs].apply(lambda col: np.average(col, weights=as_wts)),
})
comparison["difference"] = comparison["eff_sample"] - comparison["full_sample"]
print("\nCovariate balance: full sample vs. effective sample")
print(comparison.round(3))

# ── Step 5: Main OLS — note what it actually estimates ───────────────────────
ols = smf.ols(f"re78 ~ treat + {covariates}", data=df).fit()
print(ols.summary())

# The 'treat' coefficient is the WATE over the effective sample —
# not the ATE over the full LaLonde NSW sample.

Aronow, P. M., & Samii, C. (2016). Does regression produce representative estimates of causal effects? American Journal of Political Science, 60(1), 250–267. · LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4), 604–620.

▶Propensity Score Matching

Causal Inference · Program Evaluation

The selection problem: treated and control units differ not just in treatment, but in everything that drove them toward treatment. A simple comparison of means is biased.

PSM’s solution: compress all observable confounders into a single number — the propensity score P(T=1|X). Then match each treated unit to a control unit with a similar score. If the model is correctly specified, matched pairs are as-good-as-randomly assigned.

Adjust the sliders, pick a matching method, and click Run Matching to see how balance improves and the ATT estimate converges toward the truth.

Before Matching — Covariate X

Propensity Score Distributions

After Matching — Covariate X

Sample size (n per group) 120

n treated units; 3n control pool (realistic 1:3 ratio)

Imbalance (covariate gap) Severe

How far apart treated and control X distributions are

Generate data and run matching to see results.

Click Run Matching to estimate the ATT.

Key Takeaways

Propensity scores reduce dimensionality. Instead of matching on every covariate separately, you match on one number: P(T=1|X). Rosenbaum & Rubin (1983) proved this is sufficient to remove bias from observed confounders.
Common support matters. Matching only works where treated and control PS distributions overlap. Units outside the common support region have no valid match and should be dropped.
PSM estimates the ATT, not the ATE. You are asking: what was the effect for the people who were actually treated? That is different from the average effect across everyone.
PSM only removes selection on observables. If there is an unobserved confounder correlated with both treatment and outcome, PSM cannot fix it. IV or an experiment is needed for that.

Lecture 1. Review of Statistics & Probability ▶

Lecture 2. Simple Linear Regression (OLS) ▶

Lecture 3. Multiple Regression ▶

Lecture 4. Heteroskedasticity & Robust Standard Errors ▶

Lecture 5. Instrumental Variables ▶

Lecture 6. Panel Data & Fixed Effects ▶

Lecture 7. Maximum Likelihood Estimation ▶

Lecture 8. Time Series ▶

Lecture 9. Regression Discontinuity Design ▶

Lecture 10. Difference-in-Differences ▶

Lecture 11. Limited Dependent Variables ▶

This lab uses a non-experimental version of the LaLonde (1986) dataset — 185 NSW job training participants combined with 429 CPS comparison workers drawn from a national survey. The goal is to practice loading data, inspecting its structure, and computing basic summary statistics by group.

Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Description
`treat`	1 = received job training, 0 = control group
`age`	Age in years
`educ`	Years of education completed
`race`	Race: black / hispan / white
`married`	1 = married at time of enrollment
`nodegree`	1 = no high school diploma
`re74`	Real earnings in 1974 (pre-program, $)
`re75`	Real earnings in 1975 (pre-program, $)
`re78`	Real earnings in 1978 (post-program outcome, $)

Instructions

Download the dataset above and save it to a known folder. You will set your working directory to that folder in Step 0 of the code.
Load the data using the code below for the preferred program.
Examine the structure: how many observations? How many variables? What are the variable types?
Compute mean re78 separately for the treated (treat = 1) and control (treat = 0) groups. What patterns emerge?

Stata

* Lab 1: Loading Data in Stata
* ─────────────────────────────────────────────────

* Step 0 — Set your working directory (use FORWARD slashes, not backslashes)
cd "C:/Users/YourName/Downloads"

* Step 1 — Import the CSV
import delimited "lalonde.csv", clear

* Step 2 — Examine the dataset structure
describe
list in 1/5

* Step 3 — Summary statistics for all variables
summarize

* Step 4 — Mean earnings in 1978 by treatment status
tabstat re78, by(treat) stat(mean sd n)

# Lab 1: Loading Data in R
# ─────────────────────────────────────────────────

# Step 0 — Set your working directory
setwd("C:/Users/YourName/Downloads")

# Step 1 — Import the CSV
data <- read.csv("lalonde.csv")

# Step 2 — Examine the dataset structure
str(data)
head(data)

# Step 3 — Summary statistics for all variables
summary(data)

# Step 4 — Mean earnings in 1978 by treatment status
tapply(data$re78, data$treat,
       function(x) c(mean = mean(x), sd = sd(x), n = length(x)))

Python

# Lab 1: Loading Data in Python
# ─────────────────────────────────────────────────
import pandas as pd
import os

# Step 0 — Set your working directory
os.chdir("C:/Users/YourName/Downloads")

# Step 1 — Import the CSV
data = pd.read_csv("lalonde.csv")

# Step 2 — Examine the dataset structure
print(data.dtypes)
print(data.head())

# Step 3 — Summary statistics for all variables
print(data.describe())

# Step 4 — Mean earnings in 1978 by treatment status
print(data.groupby("treat")["re78"].agg(["mean", "std", "count"]))

Raw data rarely arrives analysis-ready. This lab covers the essential cleaning steps for the LaLonde dataset: checking for missing values, inspecting variable types, converting a categorical string variable into indicator (dummy) variables, and inspecting earnings distributions. This dataset combines two very different populations — NSW job training participants and CPS survey workers — so what you find in the pre-program earnings variables will matter more than it might seem.

Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Type	Notes
`race`	String	Needs to be converted to dummies: `black`, `hispan`
`re74`, `re75`	Numeric	Pre-program earnings; many zeros are legitimate (unemployed)
`re78`	Numeric	Outcome variable; check for implausible values
`treat`, `married`, `nodegree`	Binary	Should only take values 0 or 1

Instructions

Check for missing values across all variables. Are there any?
Verify that binary variables (treat, married, nodegree) only contain 0 and 1.
Convert race into two dummy variables: black and hispan (white is the omitted reference category).
Inspect the distribution of re78 for outliers. What is the maximum value?
Inspect the distributions of re74 and re75. What fraction of treated workers had zero pre-program earnings? Does the same pattern hold for the control group?

Stata

* Lab 2: Cleaning the Data in Stata
* ─────────────────────────────────────────────────

* Step 1 — Check for missing values
misstable summarize

* Step 2 — Verify binary variables
tabulate treat
tabulate married
tabulate nodegree

* Step 3 — Create dummy variables from race string
generate black  = (race == "black")
generate hispan = (race == "hispan")
drop race

* Step 4 — Inspect the distribution of re78
summarize re78, detail
histogram re78, bin(30) title("Distribution of 1978 Earnings")

# Lab 2: Cleaning the Data in R
# ─────────────────────────────────────────────────

# Step 1 — Check for missing values
colSums(is.na(data))

# Step 2 — Verify binary variables
lapply(data[, c("treat", "married", "nodegree")], table)

# Step 3 — Create dummy variables from race string
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race   <- NULL

# Step 4 — Inspect the distribution of re78
summary(data$re78)
hist(data$re78, breaks = 30, main = "Distribution of 1978 Earnings",
     xlab = "re78")

Python

# Lab 2: Cleaning the Data in Python
# ─────────────────────────────────────────────────
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("lalonde.csv")

# Step 1 — Check for missing values
print(data.isnull().sum())

# Step 2 — Verify binary variables
for col in ["treat", "married", "nodegree"]:
    print(col, data[col].value_counts().to_dict())

# Step 3 — Create dummy variables from race string
data["black"]  = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])

# Step 4 — Inspect the distribution of re78
print(data["re78"].describe())
data["re78"].hist(bins=30)
plt.title("Distribution of 1978 Earnings")
plt.xlabel("re78")
plt.show()

Before running any regression, a well-constructed summary statistics table is essential. This lab covers computing means, standard deviations, and counts for key variables — overall and by treatment status — and producing a publication-style table. The dataset used here is the non-experimental version of the LaLonde data: NSW treated workers paired with a comparison group drawn from the CPS. Because these groups were not randomly assigned, this lab asks whether they look similar before the program started — and whether a simple comparison of 1978 earnings would be fair.

Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Description
`treat`	Treatment indicator (use to split the sample)
`age`, `educ`	Demographics
`black`, `hispan`, `married`, `nodegree`	Binary characteristics
`re74`, `re75`, `re78`	Earnings in 1974, 1975, 1978 ($)

Instructions

After loading and cleaning the data (Labs 1 & 2), compute the mean and standard deviation of each variable for the full sample.
Split the sample by treat and compute means for the treated and control groups separately.
Produce a table with three columns: Control mean, Treated mean, and Full-sample mean.
Run a t-test for each variable comparing treated and control means. Which variables have p < 0.05?
Compare pre-program earnings (re74, re75) across groups. What does the imbalance tell you about whether a simple comparison of 1978 earnings would be fair?

Stata

* Lab 3: Summary Statistics in Stata
* ─────────────────────────────────────────────────

* Step 0 — Load and prepare data
import delimited "lalonde.csv", clear
generate black  = (race == "black")
generate hispan = (race == "hispan")
drop race

* Step 1 — Full-sample summary
local vars age educ black hispan married nodegree re74 re75 re78
summarize `vars'

* Step 2 — Means by treatment status
tabstat `vars', by(treat) stat(mean sd n) format(%9.2f)

* Step 3 — Test for differences in means (balance check)
foreach v of local vars {
    ttest `v', by(treat)
}

# Lab 3: Summary Statistics in R
# ─────────────────────────────────────────────────

# Step 0 — Load and prepare data
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race   <- NULL

vars <- c("age","educ","black","hispan","married","nodegree","re74","re75","re78")

# Step 1 — Full-sample summary
summary(data[, vars])

# Step 2 — Means by treatment status
do.call(rbind, lapply(vars, function(v) {
  c(variable = v,
    control  = round(mean(data[data$treat == 0, v]), 2),
    treated  = round(mean(data[data$treat == 1, v]), 2),
    full     = round(mean(data[, v]), 2))
}))

# Step 3 — t-tests for balance
lapply(vars, function(v) {
  t.test(data[data$treat == 1, v], data[data$treat == 0, v])
})

Python

# Lab 3: Summary Statistics in Python
# ─────────────────────────────────────────────────
import pandas as pd
from scipy import stats

data = pd.read_csv("lalonde.csv")
data["black"]  = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])

vars = ["age","educ","black","hispan","married","nodegree","re74","re75","re78"]

# Step 1 — Full-sample summary
print(data[vars].describe().round(2))

# Step 2 — Means by treatment status
table = data.groupby("treat")[vars].mean().T.round(2)
table.columns = ["Control", "Treated"]
table["Full"] = data[vars].mean().round(2)
print(table)

# Step 3 — t-tests for balance
for v in vars:
    t, p = stats.ttest_ind(data.loc[data.treat==1, v],
                           data.loc[data.treat==0, v])
    print(f"{v}: t = {t:.3f}, p = {p:.3f}")

Interpreting Your Output

Step 2: Comparing Group Means

Your output will show means for each variable split by treatment status. In Stata, tabstat shows Mean, Std Dev, and N for each group. In R and Python, the code produces a table with Control, Treated, and Full-sample columns. The format differs but the question is the same — look at each row and ask yourself:

Pre-program earnings (re74, re75): Do treated and control have similar earnings before the program? (if they do, re74 and re75 will have similar means)
Demographics (age, educ, race): Are the groups comparable on background characteristics? (if so, they will also have similar means)
Key question: If pre-program earnings are very different between groups, what does that tell you about whether a simple comparison of re78 is fair? Why does fairness matter in this context?

Step 3: t-test Results for Balance

Wait — why is a big p-value good here?
In regression, stars mean you found something. Here, stars mean your groups were already different.

Your output will show a p-value for each variable. Here is how to read them:

p < 0.05 = the two groups are significantly different on this variable
p ≥ 0.05 = the two groups look statistically similar

Answer these questions from your own output:

Which pre-program variables have p < 0.05?
Specifically: what is the p-value for re74? Is it small?
If treated and control had different earnings before the program, can you trust a simple comparison of 1978 earnings?

Understanding Balance

Balance means the treated and control groups are similar on pre-program characteristics. In a randomized experiment, balance happens automatically because treatment is assigned by coin flip. In this dataset, the comparison group is not the experimental control group — it is drawn from the CPS, a general population survey. CPS workers are a fundamentally different population from the disadvantaged workers targeted by the NSW program. That is why the groups look so different even before the program started.

Describing Imbalance: A Practical Framework

When you look at your results, you're looking for imbalance. Here is how to describe what you see:

1. Look at the numbers first (not the p-value)

For each variable: what is the actual difference between treated and control means?
Example: if treated re74 = $2,096 and control re74 = $5,619, the difference is about $3,500.
Write it down: "re74 differs by $3,500."

2. Now check if that difference is "real" or "chance"

Look at the p-value. If p < 0.05, this difference is unlikely due to luck alone.
If p ≥ 0.05, it could easily be chance.

3. Categorize each variable

Big difference + small p-value = clear imbalance (groups differ)
Small difference OR large p-value = balanced on this variable (groups similar)

4. Step back and ask

How many variables show clear imbalance?
Which are most important (pre-program earnings vs. age)?
What story does the imbalance tell about how different these two populations are?

This lab has two parts. First, you will compute the true causal effect of job training using the experimental dataset — NSW treated and control workers who were randomly assigned. Because assignment was random, a simple difference in mean 1978 earnings is an unbiased estimate of the ATT. Record that number: it is your benchmark for Labs 5, 6, and 7. Second, you will switch to the non-experimental dataset (NSW treated + CPS comparison) and run OLS. Because the comparison group is a different population, OLS will not recover the benchmark even with controls. This is the failure LaLonde (1986) documented.

Step 1 Dataset: LaLonde (1986) — NSW Treated + NSW Controls (Experimental) ↓ Download CSV

Variable	Role
`re78`	Outcome — real earnings in 1978 ($)
`treat`	1 if received job training (randomly assigned)
`age`, `educ`	Continuous controls
`black`, `hispan`, `married`, `nodegree`	Binary controls
`re74`, `re75`	Pre-program earnings

Steps 2–4 Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Role
`re78`	Outcome — real earnings in 1978 ($)
`treat`	1 if received job training (not randomly assigned here)
`age`, `educ`	Continuous controls
`race`	String — converted to `black`, `hispan` dummies in code
`married`, `nodegree`	Binary controls
`re74`, `re75`	Pre-program earnings (controls added in Step 4)

Instructions

Load lalonde_experimental.csv. Compute mean re78 for treat=1 and treat=0 separately. The difference is the ATT from LaLonde's randomized experiment. Record this number — it is your benchmark for Labs 5, 6, and 7.
Switch to lalonde.csv. Run OLS of re78 on treat only. The coefficient matches the raw mean difference in this dataset. How does it compare to your Step 1 benchmark?
Add demographic controls (age, educ, black, hispan, married, nodegree). Does the treat coefficient move closer to the benchmark?
Add pre-program earnings (re74, re75). Does the estimate improve further?

Getting It Right

Even with full controls, OLS cannot recover the true ATT because the CPS comparison group is a fundamentally different population. How far is your fully-controlled estimate from your Step 1 benchmark? This gap is what LaLonde (1986) documented. Labs 5–7 will try to close it.

Stata

* Step 1. True ATT from experimental data
import delimited "lalonde_experimental.csv", clear
tabstat re78, by(treat) stat(mean)
* ATT = mean re78 (treat=1) minus mean re78 (treat=0)
* Record this. It is your benchmark for Labs 5, 6, and 7.

* Steps 2-4. OLS on non-experimental data
import delimited "lalonde.csv", clear
generate black  = (race == "black")
generate hispan = (race == "hispan")
drop race

* Step 2. OLS with treatment only
regress re78 treat, robust
* How does this compare to your Step 1 benchmark?

* Step 3. Add demographic controls
regress re78 treat age educ black hispan married nodegree, robust

* Step 4. Add pre-program earnings
regress re78 treat age educ black hispan married nodegree re74 re75, robust

* Step 5. How far is the fully controlled estimate from your Step 1 benchmark?

library(sandwich)
library(lmtest)

# Step 1. True ATT from experimental data
exp_data <- read.csv("lalonde_experimental.csv")
means <- tapply(exp_data$re78, exp_data$treat, mean)
att <- means["1"] - means["0"]
cat("True ATT (benchmark):", round(att, 2), "\n")
# Record this. It is your benchmark for Labs 5, 6, and 7.

# Steps 2-4. OLS on non-experimental data
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race   <- NULL

# Step 2. OLS with treatment only
m1 <- lm(re78 ~ treat, data = data)
coeftest(m1, vcov = vcovHC(m1, type = "HC1"))
# How does this compare to your Step 1 benchmark?

# Step 3. Add demographic controls
m2 <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree,
          data = data)
coeftest(m2, vcov = vcovHC(m2, type = "HC1"))

# Step 4. Add pre-program earnings
m3 <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree +
               re74 + re75, data = data)
coeftest(m3, vcov = vcovHC(m3, type = "HC1"))

# Step 5. How far is the fully controlled estimate from your Step 1 benchmark?
cat("OLS estimate (fully controlled):", round(coef(m3)["treat"], 2), "\n")
cat("Benchmark:", round(att, 2), "\n")
cat("Gap:", round(coef(m3)["treat"] - att, 2), "\n")

Python

import pandas as pd
import statsmodels.formula.api as smf

# Step 1. True ATT from experimental data
exp = pd.read_csv("lalonde_experimental.csv")
att = exp[exp["treat"]==1]["re78"].mean() - exp[exp["treat"]==0]["re78"].mean()
print(f"True ATT (benchmark): ${att:,.2f}")
# Record this. It is your benchmark for Labs 5, 6, and 7.

# Steps 2-4. OLS on non-experimental data
data = pd.read_csv("lalonde.csv")
data["black"]  = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])

# Step 2. OLS with treatment only
m1 = smf.ols("re78 ~ treat", data=data).fit(cov_type="HC1")
print(m1.summary())
# How does this compare to your Step 1 benchmark?

# Step 3. Add demographic controls
m2 = smf.ols("re78 ~ treat + age + educ + black + hispan + married + nodegree",
              data=data).fit(cov_type="HC1")
print(m2.summary())

# Step 4. Add pre-program earnings
m3 = smf.ols("re78 ~ treat + age + educ + black + hispan + married + nodegree"
              " + re74 + re75", data=data).fit(cov_type="HC1")
print(m3.summary())

# Step 5. How far is the fully controlled estimate from your Step 1 benchmark?
ols_est = m3.params["treat"]
print(f"OLS estimate (fully controlled): ${ols_est:,.2f}")
print(f"Benchmark:                       ${att:,.2f}")
print(f"Gap:                             ${ols_est - att:,.2f}")

Probit is a nonlinear regression model for binary outcomes. It estimates the probability that the dependent variable equals 1 as a function of covariates, using the standard normal CDF rather than a straight line. This lab uses probit to model the probability of receiving job training. At the end there is an extension that builds on the probit output to estimate a selection-corrected treatment effect.

Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Role
`treat`	Dependent variable in first stage, 1 if received training
`re78`	Outcome in second stage, earnings in 1978
`age`, `educ`, `black`, `hispan`, `married`, `nodegree`	Covariates in both stages
`re74`, `re75`	Pre-program earnings, strong predictors of selection

Instructions

Run a probit regression of treat on all available covariates. Which variables predict participation? Report marginal effects.
Generate predicted probabilities (propensity scores) from the probit. Compare the mean score for treated vs. control workers. Do the two groups overlap? Poor overlap means the comparison group is too different to be a valid counterfactual.

Getting It Right

Compute the inverse Mills ratio (IMR) for each observation. The formula differs by group: for treated workers (treat=1) use phi(xb) / Phi(xb); for controls (treat=0) use -phi(xb) / (1 - Phi(xb)). Phi is the standard normal CDF, phi is the PDF, and xb is the probit linear index.
Add the IMR as a control in a second-stage OLS of re78 on treat and all covariates. This is the Heckman two-step selection correction.
How far is the Heckman estimate from your Lab 4 Step 1 experimental ATT? Record this and compare to Lab 6 DiD when you get there.

Stata

* Step 0 — Load and prepare data
import delimited "lalonde.csv", clear
generate black  = (race == "black")
generate hispan = (race == "hispan")
drop race

* Step 1. Probit first stage
probit treat age educ black hispan married nodegree re74 re75
margins, dydx(*)

* Step 2. Predicted probabilities (propensity scores)
predict phat, pr
tabstat phat, by(treat) stat(mean min max)

* Step 3. Inverse Mills ratio
predict xb, xb
gen imr = treat * (normalden(xb) / normal(xb)) + (1 - treat) * (-normalden(xb) / (1 - normal(xb)))

* Step 4. Second-stage OLS
regress re78 treat age educ black hispan married nodegree re74 re75 imr, robust

* Step 5. Compare to benchmarks
di "Heckman: " _b[treat]
* Compare to your Lab 4 Step 1 experimental ATT.
* Record this and compare to Lab 6 DiD when you get there.

library(sandwich)
library(lmtest)
library(margins)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race   <- NULL

# Step 1. Probit first stage
probit <- glm(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
               data = data, family = binomial(link = "probit"))
summary(margins(probit))

# Step 2. Predicted probabilities (propensity scores)
data$phat <- predict(probit, type = "response")
tapply(data$phat, data$treat, summary)

# Step 3. Inverse Mills ratio
xb <- predict(probit, type = "link")
data$imr <- ifelse(data$treat == 1, dnorm(xb) / pnorm(xb), -dnorm(xb) / (1 - pnorm(xb)))

# Step 4. Second-stage OLS
heck <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree +
                   re74 + re75 + imr, data = data)
coeftest(heck, vcov = vcovHC(heck, type = "HC1"))

# Step 5. Compare to benchmarks
cat("Heckman:", round(coef(heck)["treat"], 2), "\n")
# Compare to your Lab 4 Step 1 experimental ATT.
# Record this and compare to Lab 6 DiD when you get there.

Python

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import norm

data = pd.read_csv("lalonde.csv")
data["black"]  = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])

# Step 1. Probit first stage
probit = smf.probit(
    "treat ~ age + educ + black + hispan + married + nodegree + re74 + re75",
    data=data
).fit()
print(probit.get_margeff().summary())

# Step 2. Predicted probabilities (propensity scores)
data["phat"] = probit.predict()
print(data.groupby("treat")["phat"].describe())

# Step 3. Inverse Mills ratio
xb = probit.predict(linear=True)
data["imr"] = np.where(data["treat"] == 1, norm.pdf(xb) / norm.cdf(xb), -norm.pdf(xb) / (1 - norm.cdf(xb)))

# Step 4. Second-stage OLS
heck = smf.ols(
    "re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75 + imr",
    data=data
).fit(cov_type="HC1")
print(heck.summary())

# Step 5. Compare to benchmarks
print(f"Heckman: ${heck.params['treat']:,.2f}")
# Compare to your Lab 4 Step 1 experimental ATT.
# Record this and compare to Lab 6 DiD when you get there.

Difference-in-differences (DiD) estimates a treatment effect by comparing the change in outcomes over time for a treated group against the change for an untreated group. It differences out any time-invariant differences between groups and any common time trend, leaving only the treatment effect. The key assumption is parallel trends: in the absence of treatment, both groups would have followed the same trajectory. This lab applies DiD to the LaLonde data using pre-program earnings as the pre-period. At the end you will compare your estimate to the experimental benchmark from Lab 4.

Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Role
`re78`	Outcome, post-period earnings (1978)
`re75`	Pre-period earnings (1975)
`re74`	Second pre-period earnings (1974), for trend check
`treat`	Treatment indicator, 1 if received job training
`age`, `educ`, `black`, `hispan`, `married`, `nodegree`	Controls

Instructions

Compute the four group means: average re75 and re78 for treat=0 and treat=1. Calculate the DiD by hand.
Reshape the data to long format with one row per person per period. Create a post indicator equal to 1 for the 1978 observation. Run a DiD regression. Does the coefficient on the post x treat interaction match your manual calculation?
Add demographic controls. Does the estimate change?
Compute average re74 and re75 for each group. Is the change from 1974 to 1975 the same for treated and control workers?

Getting It Right

Compare your DiD estimate to the Lab 4 benchmark. How close is it? What might a difference in pre-period trends tell you about why DiD did or did not recover the true effect?

Stata

* Step 0 — Load and prepare data
import delimited "lalonde.csv", clear
generate black  = (race == "black")
generate hispan = (race == "hispan")
drop race

* Step 1. Four group means and manual DiD
tabstat re74 re75 re78, by(treat) stat(mean)
* DiD = (re78_treat1 - re75_treat1) - (re78_treat0 - re75_treat0)
* Calculate this by hand from the table above.

* Step 2. Reshape and DiD regression
preserve
gen id = _n
rename re75 earnings0
rename re78 earnings1
reshape long earnings, i(id) j(post)
regress earnings i.post##i.treat, cluster(id)
* Coefficient on 1.post#1.treat is the DiD estimate.

* Step 3. Add controls
regress earnings i.post##i.treat age educ black hispan married nodegree, cluster(id)

* Step 4. Pre-period trend check
restore
tabstat re74 re75, by(treat) stat(mean)
* Is the change from re74 to re75 the same for both groups?

* Step 5. Compare to benchmark
* Record your DiD estimate from Step 2.
* Compare it to your Lab 4 benchmark.
* How large is the gap? What does the re74-to-re75 pattern suggest about why?

library(dplyr)
library(tidyr)
library(sandwich)
library(lmtest)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race   <- NULL

# Step 1. Four group means and manual DiD
data %>% group_by(treat) %>%
  summarise(mean_re74 = mean(re74), mean_re75 = mean(re75), mean_re78 = mean(re78))

did_manual <- (mean(data$re78[data$treat==1]) - mean(data$re75[data$treat==1])) -
               (mean(data$re78[data$treat==0]) - mean(data$re75[data$treat==0]))
cat("Manual DiD:", did_manual, "\n")

# Step 2. Reshape and DiD regression
data_long <- data %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = c(re75, re78), names_to = "year", values_to = "earnings") %>%
  mutate(post = as.integer(year == "re78"))

did <- lm(earnings ~ post * treat, data = data_long)
coeftest(did, vcov = vcovCL(did, cluster = ~id))

# Step 3. Add controls
did_c <- lm(earnings ~ post * treat + age + educ + black + hispan + married + nodegree,
             data = data_long)
coeftest(did_c, vcov = vcovCL(did_c, cluster = ~id))

# Step 4. Pre-period trend check
data %>% group_by(treat) %>%
  summarise(change_74_75 = mean(re75) - mean(re74))
# Is the change the same for both groups?

# Step 5. Compare to benchmark
cat("DiD estimate:", round(coef(did)["post:treat"], 2), "\n")
# Compare to your Lab 4 Step 1 experimental ATT.
# How large is the gap? What does the pre-period pattern suggest about why?

Python

import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("lalonde.csv")
data["black"]  = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])

# Step 1. Four group means and manual DiD
print(data.groupby("treat")[["re74","re75","re78"]].mean())
did_manual = (
    (data.loc[data.treat==1,"re78"].mean() - data.loc[data.treat==1,"re75"].mean()) -
    (data.loc[data.treat==0,"re78"].mean() - data.loc[data.treat==0,"re75"].mean())
)
print("Manual DiD:", did_manual)

# Step 2. Reshape and DiD regression
data_long = pd.melt(
    data.assign(id=range(len(data))),
    id_vars=["id","treat","age","educ","black","hispan","married","nodegree","re74"],
    value_vars=["re75","re78"], var_name="year", value_name="earnings"
)
data_long["post"] = (data_long["year"] == "re78").astype(int)

did = smf.ols("earnings ~ post * treat", data=data_long).fit(cov_type="cluster", cov_kwds={"groups": data_long["id"]})
print(did.summary())

# Step 3. Add controls
did_c = smf.ols(
    "earnings ~ post * treat + age + educ + black + hispan + married + nodegree",
    data=data_long
).fit(cov_type="cluster", cov_kwds={"groups": data_long["id"]})
print(did_c.summary())

# Step 4. Pre-period trend check
trends = data.groupby("treat")[["re74","re75"]].mean()
trends["change_74_75"] = trends["re75"] - trends["re74"]
print(trends)
# Is the change the same for both groups?

# Step 5. Compare to benchmark
print(f"DiD estimate: ${did.params['post:treat']:,.2f}")
# Compare to your Lab 4 Step 1 experimental ATT.
# How large is the gap? What does the pre-period pattern suggest about why?

Propensity score matching estimates a treatment effect by pairing each treated unit with a control unit that had a similar probability of receiving treatment given observed covariates. By matching on the propensity score rather than all covariates individually, we try to construct a comparison group that looks like the treated group on everything we can observe. This lab builds directly on Lab 5: the propensity scores you estimated there are the inputs here. Dehejia and Wahba (1999) revisited LaLonde's data using propensity score matching and found estimates much closer to the experimental benchmark than the methods in Labs 5 and 6 achieved. This lab asks you to see if you can replicate that result.

Dataset: LaLonde (1986) — NSW Treated + CPS Comparison (Non-Experimental) ↓ Download CSV

Variable	Role
`treat`	Treatment indicator, 1 if received job training
`re78`	Outcome, earnings in 1978
`age`, `educ`, `black`, `hispan`, `married`, `nodegree`	Matching covariates
`re74`, `re75`	Pre-program earnings, matching covariates

Instructions

Estimate propensity scores using a probit of treat on all covariates (same specification as Lab 5).
Match each treated worker to the nearest control worker by propensity score (matching with replacement — the same control may match multiple treated workers). How many unique control workers were used? Are any treated units far from their match?
Check covariate balance before and after matching. Are the matched groups more similar on observable characteristics than the unmatched groups?
Estimate the treatment effect on the matched sample. Compare to the Lab 4 benchmark. How close did matching get?

Getting It Right

Try a caliper to discard treated units with no close match (e.g., caliper of 0.01 on the propensity score). Does the estimate change? How many units are dropped? What does sensitivity to the caliper tell you about the quality of the match?

Stata

* Step 0 — Load and prepare data
import delimited "lalonde.csv", clear
generate black  = (race == "black")
generate hispan = (race == "hispan")
drop race

* Step 1. Estimate propensity scores and match (nearest-neighbor, ATT)
teffects psmatch (re78) (treat age educ black hispan married nodegree re74 re75, probit), atet

* Step 2. Balance check before and after matching
tebalance summarize
* Look at the standardized differences. Values below 0.1 indicate good balance.

* Step 3. Treatment effect
* The teffects output reports the ATT directly.
* Record it and compare to your Lab 4 benchmark.

* Step 4. Caliper matching (Getting It Right)
* Use caliper to flag treated units with no close match, drop them,
* then re-estimate ATT on the on-support sample.
capture teffects psmatch (re78) (treat age educ black hispan married nodegree re74 re75, probit), ///
    atet caliper(0.01) osample(unmatched)
count if unmatched == 1
local n_drop = r(N)
drop if unmatched == 1
di "Treated units dropped: " `n_drop'
teffects psmatch (re78) (treat age educ black hispan married nodegree re74 re75, probit), atet
* How does the estimate change? How many treated units were dropped?

library(MatchIt)
library(sandwich)
library(lmtest)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race   <- NULL

# Step 1. Nearest-neighbor matching on propensity score (probit)
m_out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
                 data = data, method = "nearest", distance = "probit", replace = FALSE)

# Step 2. Balance check
summary(m_out, standardize = TRUE)
# Look at standardized mean differences before and after matching.

# Step 3. Treatment effect on matched sample
m_data <- match.data(m_out)
fit <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75,
          data = m_data, weights = weights)
coeftest(fit, vcov = vcovCL(fit, cluster = ~subclass))
cat("Matching ATT:", round(coef(fit)["treat"], 2), "\n")
# Compare to your Lab 4 Step 1 experimental ATT.

# Step 4. Caliper matching (Getting It Right)
m_cal <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
                 data = data, method = "nearest", distance = "probit", caliper = 0.01, replace = FALSE)
summary(m_cal)
m_cal_data <- match.data(m_cal)
fit_cal <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75,
              data = m_cal_data, weights = weights)
coeftest(fit_cal, vcov = vcovCL(fit_cal, cluster = ~subclass))
cat("Caliper ATT:", round(coef(fit_cal)["treat"], 2), "\n")
cat("Units dropped:", nrow(data) - nrow(m_cal_data), "\n")
# How does the estimate change? Does restricting to better matches help?

Python

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from sklearn.neighbors import NearestNeighbors

data = pd.read_csv("lalonde.csv")
data["black"]  = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])

# Step 1. Propensity scores
probit = smf.probit(
    "treat ~ age + educ + black + hispan + married + nodegree + re74 + re75",
    data=data
).fit()
data["pscore"] = probit.predict()

# Step 2. Nearest-neighbor matching
treated = data[data.treat == 1].reset_index(drop=True)
control = data[data.treat == 0].reset_index(drop=True)

nn = NearestNeighbors(n_neighbors=1)
nn.fit(control[["pscore"]])
distances, indices = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[indices.flatten()].reset_index(drop=True)

# Step 3. Balance check
covs = ["age", "educ", "black", "hispan", "married", "nodegree", "re74", "re75"]
before = data.groupby("treat")[covs].mean().T
before.columns = ["control_before", "treated_before"]
after = pd.DataFrame({
    "control_after":  matched_control[covs].mean(),
    "treated_after":  treated[covs].mean()
})
print(pd.concat([before, after], axis=1).round(2))

# Step 4. ATT estimate
att = treated["re78"].mean() - matched_control["re78"].mean()
print(f"Matching ATT: ${att:,.2f}")
# Compare to your Lab 4 Step 1 experimental ATT.

# Getting It Right: caliper matching
caliper = 0.01
mask = distances.flatten() <= caliper
att_cal = treated.loc[mask, "re78"].mean() - matched_control.loc[mask, "re78"].mean()
print("Caliper ATT (0.01):", att_cal)
print("Treated units dropped:", (~mask).sum())

▶Production Possibilities & Comparative Advantage

▶Demand

Microeconomics · Consumer Behavior

The demand curve shows how many units consumers are willing and able to buy at each price. Generally, as price rises, people buy less — that’s the Law of Demand. Use the price slider to move along the curve and see how quantity demanded responds.

But the whole curve can also shift. When something other than price changes — income, tastes, prices of related goods, or expectations — demand at every price changes simultaneously. Use the demand shift slider to see a rightward shift (increase) or leftward shift (decrease).

The Demand Curve

Price (P) $40

Movement along the curve — as price rises, quantity demanded falls

Demand Shift Baseline

Shifts the entire curve — right = demand increases; left = demand decreases

Income. For normal goods, rising income increases demand (right shift). For inferior goods, rising income decreases demand as consumers trade up.
Prices of substitutes. If a substitute becomes more expensive, demand for this good rises. If it becomes cheaper, demand falls.
Prices of complements. If a good used alongside this one gets cheaper, demand for this good rises; more expensive, demand falls.
Tastes and preferences. Health trends, advertising, or fashion can shift demand in either direction without any price change.
Expectations. If consumers expect higher prices in the future, they buy more now — shifting current demand right.
Number of buyers. More consumers in the market means higher demand at every price.

Law of Demand. All else equal, a higher price means lower quantity demanded. This is why demand curves slope downward. Drag the price slider up — the point moves left (less is demanded).
Movement along the curve vs. a shift. A price change moves you along the existing curve. A change in income, tastes, or related prices shifts the entire curve. These are fundamentally different events.
Substitutes and complements. If a substitute gets more expensive, consumers switch here — demand rises. If a complement gets more expensive, people use less of both — demand falls.
Normal vs. inferior goods. For most goods, higher income shifts demand right. For inferior goods (like instant noodles), higher income shifts demand left as people upgrade.
Expectations are part of demand. Anticipated price increases shift today’s demand right. Demand is forward-looking, not just about current prices.

Practice Questions

Question 1 of 5

You answered all 5 questions!
Score: / 5

▶Supply

Microeconomics · Producer Behavior

The supply curve shows how many units producers are willing and able to sell at each price. Generally, as price rises, producing becomes more profitable and producers offer more — that’s the Law of Supply. Use the price slider to move along the curve and see how quantity supplied responds.

The entire curve can also shift. When input costs change, technology improves, the number of sellers changes, or government policy shifts — supply at every price changes simultaneously. Use the supply shift slider to see a rightward shift (more supply) or leftward shift (less supply).

The Supply Curve

Price (P) $40

Movement along the curve — as price rises, quantity supplied increases

Supply Shift Baseline

Shifts the entire curve — right = supply increases (lower costs); left = supply decreases (higher costs)

Input (resource) prices. Cheaper labor, materials, or energy lowers production costs — supply increases (right shift). More expensive inputs shift supply left.
Technology. Better production technology lowers costs and lets producers profitably offer more at every price — supply increases (right shift).
Number of sellers. More firms entering the market increases market supply. Firms exiting decreases it.
Government policies. Subsidies lower costs and increase supply. Taxes raise costs and decrease supply.
Expectations. If producers expect higher future prices, they may withhold supply now, decreasing current supply.
Prices of related goods in production. If a substitute in production becomes more profitable, producers switch, decreasing supply of this good.

Law of Supply. All else equal, a higher price means higher quantity supplied. Supply curves slope upward because higher prices make production more profitable. Drag the price slider up — the point moves right (more is supplied).
Movement along the curve vs. a shift. A change in the good’s own price moves you along the supply curve. A change in costs, technology, or the number of sellers shifts the entire curve.
Lower costs shift supply right. When inputs become cheaper or technology improves, producers can profitably supply more at every price. This is what long-run technological progress looks like.
The supply curve is a marginal cost curve. A producer only supplies a unit if the price covers the marginal cost of producing it. The supply curve tells you the minimum price needed to bring each additional unit to market.
Market supply = sum of firm supply curves. The market supply curve is the horizontal sum of all individual firm supply curves. More sellers means a larger total supply at every price.

Practice Questions

Question 1 of 5

You answered all 5 questions!
Score: / 5

▶Market Equilibrium

Microeconomics · Market Equilibrium

Imagine this: Apple announces a new iPhone — millions of people suddenly want one. That’s a demand shift: more people want the product at every price. Now imagine oil prices spike, making it expensive to run factories and ship goods. That’s a supply shift: producers offer fewer units at every price. In both cases, the market adjusts until supply meets demand again.

A market equilibrium is the price and quantity where buyers and sellers agree — where the amount consumers want to buy exactly equals the amount producers want to sell. Use the buttons below to shift the curves and see what happens!

Supply & Demand

Demand (D₀) 100

Higher = more demand at every price (curve shifts right/up)

Supply (S₀) 15

Lower = more supply at every price (curve shifts right/down)

Demand shift vs. movement along demand. A demand shift happens when something other than price changes — income, tastes, prices of related goods, expectations, or the number of buyers. Moving along the curve is just a response to a price change. Try “Demand ↑” to see the whole curve move!
Supply shift vs. movement along supply. Same idea: a supply shift happens when costs change, technology improves, or the number of sellers changes. “Supply ↑” simulates producers becoming more efficient (lower costs = more offered at every price).
The four core results. Demand ↑ ︎→︎ P↑ Q↑. Demand ↓ ︎→︎ P↓ Q↓. Supply ↑ ︎→︎ P↓ Q↑. Supply ↓ ︎→︎ P↑ Q↓. These four cases are the foundation of ECO 2020.
Shortage vs. surplus. If price is below equilibrium, consumers want more than producers offer — a shortage. Sellers raise prices until the market clears. Above equilibrium: a surplus drives prices back down. Equilibrium is stable.
Both curves shift. Real markets often have both curves shifting simultaneously. Try moving both sliders at once. When both increase, quantity rises for certain — but the price change is ambiguous and depends on which shift is larger.

Practice Questions

Question 1 of 5

You answered all 5 questions!
Score: / 5

▶Consumer & Producer Surplus

▶Elasticity

Microeconomics · Price Elasticity of Demand

Imagine this: A gas station raises prices by 10%. Most drivers still fill up — they need gas to get to work. Quantity demanded barely changes. Now a clothing store raises prices by 10%. Many shoppers switch to competitors or wait for a sale. Quantity demanded drops sharply.

Price Elasticity of Demand measures how sensitive consumers are to price: Price Elasticity of Demand = % change in Q_d ÷ % change in P. elasticity > 1 = elastic (responsive); elasticity < 1 = inelastic (unresponsive).

Drag the price slider to see how elasticity changes along the curve. The shaded rectangle is total revenue (P × Q) — watch how it changes!

Demand Curve — shaded area = Total Revenue (P × Q)

Price (P) $60

Move along the demand curve — watch elasticity and total revenue change

Curve Steepness Moderate

Steeper = less responsive to price (more inelastic); flatter = more responsive (more elastic)

Price Elasticity of Demand = % ΔQ_d ÷ % ΔP. A value of −2 means a 1% price increase causes a 2% drop in quantity demanded. Price Elasticity of Demand is always negative (higher price → lower Qd). We compare the absolute value to 1 to classify elasticity.
Elastic vs. inelastic. Absolute value > 1: elastic (price-sensitive). Absolute value < 1: inelastic. Absolute value = 1: unit elastic. Drag the price slider toward the top of the curve — elastic. Near the bottom — inelastic. Elasticity varies along a single linear curve!
Elasticity and total revenue (TR = P × Q). Elastic demand: price ↑ ︎→︎ TR ↓ (Qd falls more than price rose). Inelastic demand: price ↑ ︎→︎ TR ↑ (Qd barely falls). Watch the shaded rectangle shrink or grow as you move the price slider.
Determinants of elasticity. More elastic when: many substitutes, luxury good, large share of income, time to adjust. More inelastic when: few substitutes, necessity, small share of income, urgently needed.
Slope ≠ elasticity. A steeper demand curve is not the same as a more inelastic one at every point. A linear curve changes from elastic to inelastic as you move down it — even though the slope is constant throughout.

Practice Questions

Question 1 of 5

You answered all 5 questions!
Score: / 5

▶Price Controls

▶Monopoly vs. Perfect Competition

▶Externalities & Pigouvian Policy

▶Tax Incidence

▶Public Goods & the Free-Rider Problem

▶Price Discrimination

Lecture 1. Basic Principles ▶

Lecture 2. Gains from Trade ▶

Lecture 3. Supply and Demand ▶

Lecture 4. The Price System ▶

Lecture 5. Price Controls and Quotas ▶

Lecture 6. Competitive Markets ▶

Lecture 7. Market Power & Monopoly ▶

Lecture 8. Price Discrimination and Strategy ▶

Lecture 9. Externalities & Transaction Costs ▶

Lecture 10. Asymmetric Information Problems ▶

Lecture 11. Public Choice and Regulation ▶

Practice Questions

Question 1 of 4

▶Labor Market: Supply & Demand

Labor Economics · Wages & Employment

How are wages determined? Firms demand labor based on how much revenue an extra worker generates — the Marginal Revenue Product (MRP). Workers supply labor based on the opportunity cost of their time. Where supply meets demand, wages and employment are determined simultaneously.

Use the sliders to shift labor demand (e.g., a technology boom raises MRP) or labor supply (e.g., immigration expands the workforce). Watch how the equilibrium wage W* and employment L* respond.

Labor Demand Shift (productivity / product value) +0.0

Negative → demand falls · Positive → demand rises

Labor Supply Shift (workers available) +0.0

Negative → fewer workers · Positive → more workers

Key Takeaways

Labor demand = firms’ MRP. D_L shifts right when product prices rise, technology improves, or capital complements labor. Both wages and employment increase.
Labor supply reflects workers’ opportunity cost. S_L shifts right when immigration expands the workforce or participation rises — wages fall but employment rises.
Demand and supply shocks have opposite wage effects. A demand increase → W↑ and L↑. A supply increase → W↓ and L↑. Watch employer and worker surplus redistribute between the shaded areas.
Competitive equilibrium maximizes total surplus. W* = MRP at L*: no DWL. Policy interventions (minimum wages, payroll taxes) drive wedges between supply and demand, creating welfare losses analogous to price controls in goods markets.

▶Minimum Wage

▶Monopsony

▶Human Capital & Returns to Education

▶Unions & Collective Bargaining

▶Unemployment: Types & Natural Rate

Labor Economics · Macroeconomic Context

Not all unemployment is the same. Frictional unemployment reflects workers in transition between jobs — search takes time even in a healthy economy. Structural unemployment arises when workers’ skills no longer match available jobs — technology, offshoring, or industry decline. Cyclical unemployment is the extra joblessness caused by a recession: aggregate demand falls short of potential output.

The natural rate (NAIRU) = frictional + structural. It cannot be eliminated by demand stimulus. Use the sliders to build up the unemployment rate from its components.

Frictional Unemployment 4.0%

Structural Unemployment 2.0%

Cyclical Unemployment +0.0%

Positive = recession · Zero = at potential output

Key Takeaways

The natural rate is not zero unemployment. Even at full employment, workers are always between jobs (frictional) or retraining after industry change (structural). The U.S. NAIRU is estimated at roughly 4–5%.
Demand stimulus can only cure cyclical unemployment. Monetary or fiscal expansion raises aggregate demand, pulling the economy back to potential. It cannot retrain structurally unemployed workers or speed up job search.
Okun’s Law: 1% cyclical unemployment ≈ 2% GDP below potential. This empirical regularity links the unemployment gap to the output gap, connecting labor market analysis to macroeconomic measurement.
Duration matters. Long-term unemployment can erode skills and labor force attachment, converting what started as cyclical unemployment into structural unemployment — a phenomenon called hysteresis.

Class slides for Labor Economics coming soon.

Practice Questions

Question 1 of 4

Interactive tools for Intermediate Micro coming soon.

Class slides for Intermediate Micro coming soon.

Practice Questions

Question 1 of 4

Economics of Marriage, Sex & Divorce ▶

▶Simple Exponential Smoothing

▶Moving Average Smoother

▶ACF and PACF Explorer

▶Holt’s Linear Trend Method

▶STL Decomposition

Business Forecasting · Decomposition

STL (Seasonal and Trend decomposition using Loess) separates a series into trend, seasonal, and remainder components using locally-weighted regression (LOESS) rather than a simple moving average. Two key parameters control the fit: the trend window sets how flexible the trend is, and the seasonal window controls whether the seasonal pattern is fixed across all years (periodic) or allowed to evolve slowly.

The chart shows 48 quarters of constructed data. Use the component buttons to overlay each extracted component, or select Seasonally Adjusted to see the series with seasonality removed. Adjust the trend window and seasonal window to see how the components change.

View:

Season window:

Trend window 9

Larger window = smoother trend, smaller = more flexible (must be odd)

Original Series with Selected Component

Trend: Estimated by applying LOESS smoothing to the series with a user-specified window. A larger window gives a smoother trend; a smaller window tracks short-run movements more closely.
Seasonal: After subtracting the trend, LOESS is applied separately to each season position (all Q1s, all Q2s, etc.). With season(window="periodic") the seasonal pattern is fixed over time; a numeric window allows it to change slowly.
Remainder: R_t = y_t − T_t − S_t. A small remainder SD indicates the model captured most of the structure.
Seasonally adjusted: y_t − S_t — removes the seasonal pattern to reveal the underlying trend plus irregular variation.
Trend strength (F_T): 1 − Var(R) / Var(T + R). Values above 0.64 indicate strong trend.
Seasonal strength (F_S): 1 − Var(R) / Var(S + R). Values above 0.64 indicate strong seasonality.

▶Benchmark Forecasting Methods

▶ETS Models

Business Forecasting · Exponential Smoothing

ETS stands for Error, Trend, Seasonal. Every exponential smoothing method (SES, Holt, Holt-Winters) is a special case of the ETS family, distinguished by whether each component is None, Additive, or Multiplicative.

The chart shows 48 quarters of data with a trend and additive seasonal pattern. Use the toggle buttons to change the error type, trend type, and seasonal type. The fitted line (dashed red) and 8-quarter forecast (green) update automatically. Compare AIC values across specifications to see which model fits best.

Error (E):

Trend (T):

Seasonal (S):

Data, ETS Fitted Values, and Forecast

ETS(A,N,N) = Simple Exponential Smoothing. No trend, no seasonality.
ETS(A,A,N) = Holt’s Linear Trend. Additive trend, no seasonality.
ETS(A,A,A) = Holt-Winters Additive. Trend and additive seasonal component.
ETS(A,A,M) = Holt-Winters Multiplicative Seasonal. Use when seasonal swings grow proportionally with the level.
Damped trend (Ad): The trend is multiplied by a damping parameter φ < 1 each step, causing it to flatten out in long-run forecasts. Tends to outperform the full additive trend in practice.
AIC rewards fit but penalizes the number of parameters — a lower AIC indicates a better balance of fit and parsimony.

▶ARIMA Models

Business Forecasting · ARIMA

An ARIMA(p, d, q) model has three parts: AR(p) — the current value depends on its own past p values; I(d) — the series is differenced d times to achieve stationarity; MA(q) — the current value also depends on the past q forecast errors.

Use the buttons to change p, d, and q. The main chart shows the actual series, fitted values (dashed), and an 8-step forecast. The residual ACF chart below checks whether the model has captured all autocorrelation — bars should stay inside the red bounds for a well-specified model.

AR order (p):

Differencing (d):

MA order (q):

Series, ARIMA Fitted Values, and Forecast

Residual ACF — Is Autocorrelation Removed?

Choose d: Difference until the series looks stationary (constant mean). A unit-root test (ADF, KPSS) formalizes this. Most economic series need d = 1.
Choose p: Use the PACF — the AR order equals the lag at which the PACF cuts off sharply. Try ARIMA(p,d,0) first.
Choose q: Use the ACF of the differenced series — the MA order equals the lag at which the ACF cuts off. Try ARIMA(0,d,q) if the ACF cuts off and the PACF tails off.
Check residuals: All residual ACF bars should fall within the 95% bounds (±1.96 / √T). Significant bars indicate the model is still missing structure.
ARIMA(0,1,0): Random walk with no drift — equivalent to the Naive benchmark method.

▶Seasonal ARIMA (SARIMA)

Business Forecasting · ARIMA

SARIMA(p,d,q)(P,D,Q)[m] extends ARIMA with a seasonal layer operating at lag m. The non-seasonal part (p,d,q) handles short-run autocorrelation; the seasonal part (P,D,Q) handles patterns that repeat every m periods. Here m = 12 (monthly data).

The series below has a rising trend and a seasonal pattern that grows with the level — similar to the Lab 8 pharmaceutical data. Use the button rows to set the non-seasonal orders (top) and seasonal orders (bottom). The residual ACF should fall within the bounds if the model is well-specified.

p (AR): d (diff): q (MA):

P (SAR): D (sdiff): Q (SMA):

Monthly Series, SARIMA Fitted Values, and 18-Month Forecast

Residual ACF — Lags 1–24

Notation: ARIMA(p,d,q)(P,D,Q)[m] — lower case for non-seasonal, upper case for seasonal. The seasonal layer operates at lags m, 2m, 3m, …
D (seasonal differencing): Subtract y_t−m from y_t. Use D = 1 when the ACF shows slow decay at the seasonal lags. Apply before regular differencing.
P and Q: After seasonal differencing, look at the ACF and PACF at lags m, 2m, … A spike only at lag m in the PACF suggests P = 1; a spike only at lag m in the ACF suggests Q = 1.
Identification order in fpp3: (1) take log if variance grows with level; (2) choose D with unitroot_nsdiffs; (3) choose d with unitroot_ndiffs; (4) read ACF/PACF of the fully-differenced series for p,q,P,Q.
Residual check: A well-specified SARIMA leaves residuals that look like white noise — all ACF bars inside the 95% bounds, including at lags 12, 24.

Lecture 1. Introduction to Forecasting ▶

Lecture 2. Time Series Patterns & Graphics ▶

Lecture 3. Decomposition & Moving Averages ▶

Lecture 4. Time Series Features ▶

Lecture 5. The Forecaster’s Toolbox ▶

Lecture 6. Time Series Regression ▶

Lecture 7. Exponential Smoothing & ETS ▶

Lecture 8. ARIMA Models ▶

This lab introduces the fpp3 ecosystem and the tsibble data structure used throughout the course. You will load a built-in dataset, filter to a single time series, produce a time plot, and compute basic summary statistics. All fpp3 datasets are included in the package — no download is required.

Dataset: aus_retail — built in to fpp3 (monthly Australian retail trade, 1982–2018)

Variable	Description
`Month`	Monthly time index (tsibble index)
`State`	Australian state (key variable)
`Industry`	Retail industry category (key variable)
`Turnover`	Monthly retail turnover ($AUD millions)

Instructions

Install and load fpp3. Print aus_retail — how many key combinations are there? What is the index?
Filter to New South Wales, Supermarket and grocery stores. How many rows does this leave?
Make a time plot with autoplot(). Does the series show a trend? Seasonality?
Compute the mean, standard deviation, minimum, and maximum of Turnover for this series.
Repeat for a different state or industry. How do the patterns compare?

# Lab 1: Getting Started with fpp3
# ─────────────────────────────────────────────────
# install.packages("fpp3")   # run once
library(fpp3)

# Step 1 — Inspect the tsibble
aus_retail
# Note: key = State + Industry; index = Month

# Step 2 — Filter to one series
nsw_grocery <- aus_retail |>
  filter(State == "New South Wales",
         Industry == "Supermarket and grocery stores")
nsw_grocery

# Step 3 — Time plot
nsw_grocery |>
  autoplot(Turnover) +
  labs(title = "NSW Supermarket & Grocery Turnover",
       y = "Turnover ($AUD millions)", x = NULL)

# Step 4 — Summary statistics
nsw_grocery |>
  as_tibble() |>
  summarise(
    n             = n(),
    mean_turnover = mean(Turnover, na.rm = TRUE),
    sd_turnover   = sd(Turnover,   na.rm = TRUE),
    min_turnover  = min(Turnover,  na.rm = TRUE),
    max_turnover  = max(Turnover,  na.rm = TRUE)
  )

# Step 5 — Try another series for comparison
aus_retail |>
  filter(State == "Victoria",
         Industry == "Clothing, footwear and personal accessory retailing") |>
  autoplot(Turnover) +
  labs(title = "Victoria: Clothing & Footwear Turnover",
       y = "Turnover ($AUD millions)")

Python

# Lab 1: Getting Started with Time Series Data
# ─────────────────────────────────────────────────
# Export from R first:
#   library(fpp3)
#   aus_retail |>
#     filter(State == "New South Wales",
#            Industry == "Supermarket and grocery stores") |>
#     mutate(Month = as.Date(Month)) |>
#     as_tibble() |>
#     write.csv("nsw_grocery.csv", row.names = FALSE)

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("nsw_grocery.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")

# Step 1 — Inspect the data
print(df.head())
print(f"N = {len(df)}, {df.index.min().date()} to {df.index.max().date()}")

# Step 2 — Time plot
df["Turnover"].plot(
    figsize=(10, 4),
    title="NSW Supermarket & Grocery Turnover",
    ylabel="$AUD millions")
plt.tight_layout(); plt.show()

# Step 3 — Summary statistics
print(df["Turnover"].describe())

# Step 4 — Seasonal pattern: average turnover by calendar month
df["month"] = df.index.month
monthly_avg = df.groupby("month")["Turnover"].mean()
monthly_avg.plot(kind="bar", figsize=(8, 4),
    title="Average Turnover by Calendar Month")
plt.xlabel("Month"); plt.tight_layout(); plt.show()

# Step 5 — Compare another series (re-export from R with different filter)

Plotting comes before modeling. This lab walks through fpp3’s suite of visualization tools — time plot, seasonal plot, subseries plot, ACF correlogram, and lag plot — using a series with clear trend and strong monthly seasonality.

Dataset: a_10 — built in to fpp3 (monthly Australian antidiabetic drug sales, 1992–2008)

Variable	Description
`Month`	Monthly time index
`Cost`	Monthly government drug subsidy cost ($AUD millions)

Instructions

Make a time plot with autoplot(). Describe the trend and any seasonal pattern you see.
Use gg_season() to create a seasonal plot. Which months have consistently high sales? Low sales? Why might this be?
Use gg_subseries(). What do the horizontal blue lines represent? Which months are trending up the most?
Compute and plot the ACF with ACF() |> autoplot(). Are there seasonal spikes at lag 12? What does slow decay in the ACF indicate?
Create lag plots with gg_lag(). At which lags is the correlation strongest? Why?

# Lab 2: Time Series Graphics
# ─────────────────────────────────────────────────
library(fpp3)

# Build a_10: total antidiabetic drug cost from PBS data
a_10 <- PBS |>
  filter(ATC2 == "A10") |>
  summarise(Cost = sum(Cost) / 1e6)

# Step 1 — Time plot
a_10 |>
  autoplot(Cost) +
  labs(title = "Australian antidiabetic drug sales",
       y = "$ million", x = NULL)

# Step 2 — Seasonal plot
a_10 |>
  gg_season(Cost, labels = "both") +
  labs(title = "Seasonal plot: antidiabetic drug sales",
       y = "$ million")

# Step 3 — Subseries plot
a_10 |>
  gg_subseries(Cost) +
  labs(title = "Subseries plot: antidiabetic drug sales",
       y = "$ million")
# Blue horizontal lines = seasonal mean for each month

# Step 4 — ACF correlogram (lag_max = 3 years)
a_10 |>
  ACF(Cost, lag_max = 36) |>
  autoplot() +
  labs(title = "ACF: antidiabetic drug sales")
# Slow decay = trend; spikes at multiples of 12 = annual seasonality

# Step 5 — Lag plots
a_10 |>
  gg_lag(Cost, geom = "point", lags = 1:12) +
  labs(title = "Lag plots: antidiabetic drug sales")

Python

# Lab 2: Time Series Graphics
# ─────────────────────────────────────────────────
# Export from R first:
#   library(fpp3)
#   a_10 <- PBS |> filter(ATC2 == "A10") |> summarise(Cost = sum(Cost) / 1e6)
#   a_10 |> mutate(Month = as.Date(Month)) |>
#     as_tibble() |> write.csv("a10.csv", row.names = FALSE)

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from pandas.plotting import lag_plot

df = pd.read_csv("a10.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")

# Step 1 — Time plot
df["Cost"].plot(figsize=(10, 4),
    title="Antidiabetic Drug Sales", ylabel="$AUD millions")
plt.tight_layout(); plt.show()

# Step 2 — Seasonal plot: one line per year
df["year"]  = df.index.year
df["month"] = df.index.month
pivot = df.pivot(columns="year", values="Cost")
pivot.plot(figsize=(10, 4), alpha=0.7,
    title="Seasonal Plot: Drug Sales (one line per year)")
plt.xlabel("Month (1 = Jan)"); plt.tight_layout(); plt.show()

# Step 3 — Subseries means by calendar month
monthly = df.groupby("month")["Cost"].mean()
monthly.plot(kind="bar", figsize=(8, 4),
    title="Mean Drug Sales by Calendar Month")
plt.tight_layout(); plt.show()

# Step 4 — ACF correlogram (36 lags)
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df["Cost"], lags=36, ax=ax)
ax.set_title("ACF: antidiabetic drug sales")
plt.tight_layout(); plt.show()

# Step 5 — Lag scatter plots (lags 1-12)
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for i, ax in enumerate(axes.flat, 1):
    lag_plot(df["Cost"], lag=i, ax=ax)
    ax.set_title(f"Lag {i}")
plt.tight_layout(); plt.show()

STL (Seasonal and Trend decomposition using Loess) separates a time series into trend, seasonal, and remainder components. In this lab you will decompose Australian beer production, extract the seasonally adjusted series, and quantify the strength of trend and seasonality.

Dataset: aus_production — built in to fpp3 (quarterly Australian commodity production, 1956–2010)

Variable	Description
`Quarter`	Quarterly time index
`Beer`	Quarterly beer production (megalitres)

Instructions

Plot beer production over time. Describe the overall shape: does the trend increase then decrease? Is there strong seasonality?
Fit STL decomposition with model(STL(...)) and plot all four components. Which component has the largest scale?
Overlay the seasonally adjusted series on the original. What remains once seasonality is removed?
Use features(Beer, feat_stl) to compute trend strength (F_T) and seasonal strength (F_S). Are both above the 0.64 threshold for “strong”?
Re-fit STL with season(window = 7) instead of "periodic". How does this change the seasonal component?

# Lab 3: STL Decomposition
# ─────────────────────────────────────────────────
library(fpp3)

beer <- aus_production |>
  select(Quarter, Beer) |>
  filter(!is.na(Beer))

# Step 1 — Time plot
beer |>
  autoplot(Beer) +
  labs(title = "Australian quarterly beer production",
       y = "Megalitres", x = NULL)

# Step 2 — STL decomposition
dcmp <- beer |>
  model(STL(Beer ~ trend(window = 9) +
              season(window = "periodic"),
            robust = TRUE))

components(dcmp) |> autoplot()

# Step 3 — Seasonally adjusted overlay
components(dcmp) |>
  as_tsibble() |>
  autoplot(Beer, colour = "gray70") +
  geom_line(aes(y = season_adjust), colour = "#0072B2", linewidth = 0.9) +
  labs(title = "Beer: original (grey) vs. seasonally adjusted (blue)",
       y = "Megalitres")

# Step 4 — Trend and seasonal strength
beer |>
  features(Beer, feat_stl)
# F_T > 0.64 → strong trend; F_S > 0.64 → strong seasonality

# Step 5 — Variable seasonal window (allows seasonal shape to change)
dcmp2 <- beer |>
  model(STL(Beer ~ trend(window = 9) +
              season(window = 7)))
components(dcmp2) |> autoplot()

Python

# Lab 3: STL Decomposition
# ─────────────────────────────────────────────────
# Export from R:
#   aus_production |> select(Quarter, Beer) |> filter(!is.na(Beer)) |>
#     mutate(Quarter = as.Date(Quarter)) |>
#     as_tibble() |> write.csv("beer.csv", row.names = FALSE)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL

df = pd.read_csv("beer.csv", parse_dates=["Quarter"])
df = df.sort_values("Quarter").dropna(subset=["Beer"]).set_index("Quarter")

# Step 1 — Time plot
df["Beer"].plot(figsize=(10, 4),
    title="Australian Quarterly Beer Production", ylabel="Megalitres")
plt.tight_layout(); plt.show()

# Step 2 — STL decomposition (period=4 for quarterly data)
stl = STL(df["Beer"], period=4, robust=True)
res = stl.fit()
res.plot()
plt.suptitle("STL Decomposition: Beer Production")
plt.tight_layout(); plt.show()

# Step 3 — Seasonally adjusted overlay
df["sa"] = df["Beer"] - res.seasonal
fig, ax  = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["Beer"], color="gray",      alpha=0.7, label="Original")
ax.plot(df.index, df["sa"],   color="steelblue", lw=1.5,   label="Seasonally adjusted")
ax.legend(); ax.set_title("Beer: Original vs. Seasonally Adjusted")
plt.tight_layout(); plt.show()

# Step 4 — Trend and seasonal strength (Hyndman-Athanasopoulos formulas)
var_r = np.var(res.resid)
F_T   = max(0, 1 - var_r / np.var(res.trend   + res.resid))
F_S   = max(0, 1 - var_r / np.var(res.seasonal + res.resid))
print(f"Trend strength (F_T):    {F_T:.3f}  (>0.64 = strong)")
print(f"Seasonal strength (F_S): {F_S:.3f}  (>0.64 = strong)")

# Step 5 — Variable seasonal window
stl2 = STL(df["Beer"], period=4, seasonal=7, robust=True)
res2 = stl2.fit()
res2.plot()
plt.suptitle("STL: seasonal window = 7 (more flexible)")
plt.tight_layout(); plt.show()

Before fitting complex models, always benchmark against simple ones. This lab fits the four benchmark forecasting methods — mean, naïve, seasonal naïve, and drift — splits the data into training and test sets, and evaluates forecast accuracy using MAE, RMSE, and MAPE.

Dataset: aus_production (Beer) — built in to fpp3

Variable	Description
`Quarter`	Quarterly time index
`Beer`	Quarterly beer production (megalitres)

Instructions

Split the series: train on data before 2005, test on 2005 onward. How many test periods are there?
Fit all four benchmark models on the training set using a single model() call.
Generate forecasts for the test period and plot them alongside the test data. Which method looks most reasonable?
Compute accuracy metrics with accuracy(fc, beer_test). Which model has the lowest RMSE? Lowest MAPE?
Run residual diagnostics on the seasonal naïve model. Is the Ljung-Box test significant? Are residuals white noise?

# Lab 4: Benchmark Methods & Forecast Accuracy
# ─────────────────────────────────────────────────
library(fpp3)

beer <- aus_production |>
  select(Quarter, Beer) |>
  filter(!is.na(Beer))

# Step 1 — Train / test split
beer_train <- beer |> filter(year(Quarter) < 2005)
beer_test  <- beer |> filter(year(Quarter) >= 2005)

cat("Training periods:", nrow(beer_train), "\n")
cat("Test periods:    ", nrow(beer_test),  "\n")

# Step 2 — Fit four benchmark models
fit <- beer_train |>
  model(
    Mean   = MEAN(Beer),
    Naive  = NAIVE(Beer),
    SNaive = SNAIVE(Beer),
    Drift  = RW(Beer ~ drift())
  )

# Step 3 — Forecast and plot
fc <- fit |> forecast(h = nrow(beer_test))

fc |>
  autoplot(beer_train, level = NULL) +
  autolayer(beer_test, Beer, colour = "black") +
  labs(title = "Beer production: benchmark forecasts vs. actual",
       y = "Megalitres") +
  guides(colour = guide_legend(title = "Method"))

# Step 4 — Accuracy on test set
accuracy(fc, beer_test) |>
  arrange(RMSE)

# Step 5 — Residual diagnostics for seasonal naive
fit |> select(SNaive) |> gg_tsresiduals()

augment(fit) |>
  filter(.model == "SNaive") |>
  features(.innov, ljung_box, lag = 8)
# p > 0.05 → residuals are consistent with white noise

Python

# Lab 4: Benchmark Methods & Forecast Accuracy
# ─────────────────────────────────────────────────
# Uses same beer.csv as Lab 3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import acorr_ljungbox

df = pd.read_csv("beer.csv", parse_dates=["Quarter"])
df = df.sort_values("Quarter").dropna(subset=["Beer"]).set_index("Quarter")

# Step 1 — Train/test split (mirror R: train < 2005, test >= 2005)
train = df[df.index.year < 2005].copy()
test  = df[df.index.year >= 2005].copy()
print(f"Training: {len(train)}, Test: {len(test)}")

# Step 2 — Benchmark forecasts
h = len(test)

mean_fc   = pd.Series(train["Beer"].mean(), index=test.index)
naive_fc  = pd.Series(train["Beer"].iloc[-1], index=test.index)
snaive_fc = pd.Series(
    [train["Beer"].iloc[-(4 - i % 4)] for i in range(h)], index=test.index)
slope     = (train["Beer"].iloc[-1] - train["Beer"].iloc[0]) / (len(train) - 1)
drift_fc  = pd.Series(
    [train["Beer"].iloc[-1] + slope * (i + 1) for i in range(h)], index=test.index)

# Step 3 — Plot
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(train.index[-20:], train["Beer"].iloc[-20:], "k-", label="Training")
ax.plot(test.index, test["Beer"], "k--", label="Actual")
for fc, label in [(mean_fc, "Mean"), (naive_fc, "Naive"),
                   (snaive_fc, "Seasonal Naive"), (drift_fc, "Drift")]:
    ax.plot(test.index, fc, label=label)
ax.legend(); ax.set_title("Beer Production: Benchmark Forecasts")
plt.tight_layout(); plt.show()

# Step 4 — Accuracy
def metrics(actual, fc):
    e = actual.values - fc.values
    return {"RMSE": np.sqrt(np.mean(e**2)),
            "MAE":  np.mean(np.abs(e)),
            "MAPE": np.mean(np.abs(e / actual.values)) * 100}

results = {m: metrics(test["Beer"], fc)
           for m, fc in [("Mean", mean_fc), ("Naive", naive_fc),
                          ("Seasonal Naive", snaive_fc), ("Drift", drift_fc)]}
print(pd.DataFrame(results).T.sort_values("RMSE").round(2))

# Step 5 — Residual diagnostics for Seasonal Naive (training residuals)
# SNaive innovation at time t: Beer[t] - Beer[t-4]
resid_train = train["Beer"].iloc[4:].values - train["Beer"].iloc[:-4].values
lb = acorr_ljungbox(resid_train, lags=[8], return_df=True)
print(lb)
# p << 0.05 → residuals are NOT white noise (autocorrelation remains)

ETS models generalize SES, Holt’s, and Holt-Winters methods into a unified state-space framework. fpp3’s ETS() function selects the best error/trend/seasonal specification by minimizing AICc. This lab fits an ETS model to a drug sales series with strong trend and seasonality, diagnoses residuals, and produces a 2-year forecast.

Dataset: h02 — built in to fpp3 (monthly Australian corticosteroid drug sales, 1991–2008)

Variable	Description
`Month`	Monthly time index
`Cost`	Monthly government drug cost ($AUD millions)

Instructions

Plot the h02 series. Describe the trend and seasonal pattern. Does variance grow with the level?
Fit ETS(Cost) with automatic selection. Use report() to identify the chosen model. What do the E, T, and S letters mean?
Run residual diagnostics with gg_tsresiduals() and the Ljung-Box test. Are residuals white noise?
Produce a 2-year forecast. Do the prediction intervals look reasonable given the historical variation?
Manually fit ETS(A,A,A) and ETS(M,Ad,M) and compare AICc values. Which is preferred?

# Lab 5: Exponential Smoothing (ETS)
# ─────────────────────────────────────────────────
library(fpp3)

# Build h02: total corticosteroid drug cost from PBS
h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost) / 1e6)

# Step 1 — Time plot
h02 |>
  autoplot(Cost) +
  labs(title = "Australian corticosteroid drug sales",
       y = "$ million", x = NULL)

# Step 2 — Automatic ETS selection
fit_ets <- h02 |> model(ETS(Cost))
report(fit_ets)
# Read: ETS(error, trend, season) — e.g. ETS(M,Ad,M)
# smoothing parameters: alpha, beta*, gamma, phi

# Step 3 — Residual diagnostics
fit_ets |> gg_tsresiduals()

augment(fit_ets) |>
  features(.innov, ljung_box, lag = 24)
# p << 0.05 → residuals are NOT white noise; autocorrelation remains

# Step 4 — 2-year forecast with prediction intervals
fit_ets |>
  forecast(h = "2 years") |>
  autoplot(h02) +
  labs(title = "ETS forecast: corticosteroid drug sales",
       y = "$ million")

# Step 5 — Compare AICc across ETS specifications
h02 |>
  model(
    Auto = ETS(Cost),
    AAA  = ETS(Cost ~ error("A") + trend("A") + season("A")),
    MAM  = ETS(Cost ~ error("M") + trend("A") + season("M")),
    MAdM = ETS(Cost ~ error("M") + trend("Ad") + season("M"))
  ) |>
  glance() |>
  select(.model, AICc) |>
  arrange(AICc)

Python

# Lab 5: Exponential Smoothing (ETS)
# ─────────────────────────────────────────────────
# Note: uses statsmodels ETSModel — forecast API differs from R.
#   Use get_prediction(start, end) and .summary_frame(alpha) to get intervals.
#   Column names are 'mean', 'pi_lower', 'pi_upper' (not conf_int).
# Export from R:
#   library(fpp3)
#   h02 <- PBS |> filter(ATC2 == "H02") |> summarise(Cost = sum(Cost) / 1e6)
#   h02 |> mutate(Month = as.Date(Month)) |>
#     as_tibble() |> write.csv("h02.csv", row.names = FALSE)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.stats.diagnostic import acorr_ljungbox

df = pd.read_csv("h02.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
df.index.freq = "MS"

# Step 1 — Time plot
df["Cost"].plot(figsize=(10, 4),
    title="H02 Antidiabetic Drug Costs", ylabel="$AUD millions")
plt.tight_layout(); plt.show()

# Step 2 — Fit ETS models and compare AIC
configs = {
    "ETS(A,A,A)": dict(error="add", trend="add",  seasonal="add", seasonal_periods=12),
    "ETS(M,A,M)": dict(error="mul", trend="add",  seasonal="mul", seasonal_periods=12),
    "ETS(M,Ad,M)":dict(error="mul", trend="add",  seasonal="mul", seasonal_periods=12,
                       damped_trend=True),
}
fits = {}
for name, kw in configs.items():
    try:
        fits[name] = ETSModel(df["Cost"], **kw).fit(disp=False)
    except Exception:
        pass

aic_tbl = {n: {"AIC": f.aic, "BIC": f.bic} for n, f in fits.items()}
print(pd.DataFrame(aic_tbl).T.sort_values("AIC").round(1))

# Step 3 — Report best model (assume ETS(M,A,M))
best = fits.get("ETS(M,A,M)", list(fits.values())[0])
print(best.summary())

# Step 4 — Residual diagnostics
resid = best.resid
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(resid); axes[0].axhline(0, color="red", lw=0.8)
axes[0].set_title("Residuals")
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(resid, lags=24, ax=axes[1])
axes[2].hist(resid, bins=20); axes[2].set_title("Distribution")
plt.tight_layout(); plt.show()

lb = acorr_ljungbox(resid, lags=[24], return_df=True)
print(lb)

# Step 5 — 2-year forecast with 80% and 95% intervals
fc   = best.get_prediction(start=len(df), end=len(df) + 23)
sf95 = fc.summary_frame(alpha=0.05)   # 95% PI: columns pi_lower / pi_upper
sf80 = fc.summary_frame(alpha=0.20)   # 80% PI

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index[-60:], df["Cost"].iloc[-60:], color="black")
ax.plot(sf95.index, sf95["mean"], color="steelblue")
ax.fill_between(sf80.index,  sf80["pi_lower"],  sf80["pi_upper"],
                alpha=0.35, color="steelblue")
ax.fill_between(sf95.index, sf95["pi_lower"], sf95["pi_upper"],
                alpha=0.15, color="steelblue")
ax.set_title("H02: 2-Year ETS Forecast")
plt.tight_layout(); plt.show()

ARIMA models capture autocorrelation structure by regressing a series on its own past values (AR) and past forecast errors (MA), after differencing to achieve stationarity (I). This lab uses monthly US retail employment to practice the full ARIMA workflow: stationarity check, automatic model selection, residual diagnostics, and a head-to-head comparison with ETS.

Dataset: us_employment — built in to fpp3 (monthly US employment by sector, 1939–2019)

Variable	Description
`Month`	Monthly time index
`Series_ID`	BLS series identifier (key)
`Title`	Employment category label
`Employed`	Employed persons (thousands)

Instructions

Filter to “Retail Trade” employment. Plot the series and its ACF. Is the series stationary? What does slow ACF decay signal?
Use features(Employed, list(unitroot_ndiffs, unitroot_nsdiffs)) to determine how many differences are needed.
Fit ARIMA(Employed) with automatic selection. Use report() to read the output. Identify p, d, q, P, D, Q, and m.
Run residual diagnostics. Is the Ljung-Box test significant? Are residuals consistent with white noise?
Split on 2018 and compare ARIMA vs. ETS on the test set. Which has lower RMSE? Which has more conservative (wider) prediction intervals?

# Lab 6: ARIMA Models
# ─────────────────────────────────────────────────
library(fpp3)

retail <- us_employment |>
  filter(Title == "Retail Trade") |>
  select(Month, Employed)

# Step 1 — Time plot and ACF
retail |>
  autoplot(Employed) +
  labs(title = "US Retail Trade Employment",
       y = "Thousands of persons", x = NULL)

retail |>
  ACF(Employed, lag_max = 36) |>
  autoplot() +
  labs(title = "ACF: US Retail Employment")
# Slow ACF decay → non-stationary (unit root)

# Step 2 — How many differences?
retail |>
  features(Employed, list(unitroot_ndiffs, unitroot_nsdiffs))

# Step 3 — Automatic ARIMA selection
fit_arima <- retail |> model(ARIMA(Employed))
report(fit_arima)
# Read: ARIMA(p,d,q)(P,D,Q)[m]
# Coefficients, AICc, sigma^2

# Step 4 — Residual diagnostics
fit_arima |> gg_tsresiduals()

augment(fit_arima) |>
  features(.innov, ljung_box, lag = 24)

# Step 5 — ARIMA vs. ETS on test set
retail_train <- retail |> filter(year(Month) < 2018)
retail_test  <- retail |> filter(year(Month) >= 2018)

fit_compare <- retail_train |>
  model(
    ARIMA = ARIMA(Employed),
    ETS   = ETS(Employed)
  )

fc_compare <- fit_compare |>
  forecast(h = nrow(retail_test))

fc_compare |>
  autoplot(retail_train |> tail(60), level = 80) +
  autolayer(retail_test, Employed, colour = "black") +
  labs(title = "ARIMA vs. ETS: US Retail Employment",
       y = "Thousands of persons")

accuracy(fc_compare, retail_test) |>
  select(.model, RMSE, MAE, MAPE) |>
  arrange(RMSE)

Python

# Lab 6: ARIMA Models
# ─────────────────────────────────────────────────
# Note: requires pmdarima for auto_arima — install once with:
#   pip install pmdarima
# Export from R:
#   us_employment |>
#     filter(Title == "Retail Trade") |>
#     select(Month, Employed) |>
#     mutate(Month = as.Date(Month)) |>
#     as_tibble() |> write.csv("retail_employment.csv", row.names = FALSE)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

df = pd.read_csv("retail_employment.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
df.index.freq = "MS"

# Step 1 — Time plot and ACF
df["Employed"].plot(figsize=(10, 4),
    title="US Retail Trade Employment", ylabel="Thousands of persons")
plt.tight_layout(); plt.show()

fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df["Employed"], lags=36, ax=ax)
ax.set_title("ACF: US Retail Employment"); plt.tight_layout(); plt.show()

# Step 2 — Stationarity check (ADF test)
pval = adfuller(df["Employed"].dropna())[1]
print(f"ADF p-value (levels):      {pval:.4f}  (>0.05 = non-stationary)")
pval_d = adfuller(df["Employed"].diff(12).diff(1).dropna())[1]
print(f"ADF p-value (s-diff+diff): {pval_d:.4f}  (<0.05 = stationary)")

# Step 3 — Fit ARIMA (auto-order via pmdarima; install: pip install pmdarima)
from pmdarima import auto_arima
fit_auto = auto_arima(df["Employed"], seasonal=True, m=12,
                       information_criterion="aicc", stepwise=True, trace=True)
print(fit_auto.summary())

# Step 4 — Residual diagnostics
resid = fit_auto.resid()
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(resid); axes[0].axhline(0, color="red", lw=0.8)
axes[0].set_title("Residuals")
plot_acf(resid, lags=24, ax=axes[1])
axes[2].hist(resid, bins=20)
plt.tight_layout(); plt.show()

lb = acorr_ljungbox(resid, lags=[24], return_df=True)
print(lb)

# Step 5 — ARIMA vs. ETS on 2018+ test set
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
train = df[df.index.year < 2018]
test  = df[df.index.year >= 2018]
h     = len(test)

p, d, q       = fit_auto.order
P, D, Q, m    = fit_auto.seasonal_order
arima_fit = SARIMAX(train["Employed"], order=(p,d,q),
                    seasonal_order=(P,D,Q,m)).fit(disp=False)
ets_fit   = ETSModel(train["Employed"], error="mul", trend="add",
                     seasonal="mul", seasonal_periods=12).fit(disp=False)

arima_fc  = arima_fit.forecast(h)
ets_fc    = ets_fit.forecast(h)

rmse = lambda f: np.sqrt(np.mean((test["Employed"].values - f.values)**2))
print(f"ARIMA RMSE: {rmse(arima_fc):.1f}")
print(f"ETS   RMSE: {rmse(ets_fc):.1f}")

Automatic ARIMA selection is convenient, but reading ACF and PACF plots after differencing is the core diagnostic skill. An AR(p) process shows PACF cutting off at lag p with ACF tailing off; an MA(q) process is the mirror image. This lab walks through manual order identification and then checks your choice against the AICc-selected model.

Dataset: aus_production — built in to fpp3 (quarterly Australian production of several commodities, 1956–2010)

Variable	Description
`Quarter`	Quarterly time index (tsibble index)
`Gas`	Gas production (petajoules)

Instructions

Plot Gas production. Describe the trend and any seasonality. Is the series stationary?
Take a log transformation, then a seasonal difference (lag 4). Plot and check the KPSS statistic with features(log_gas, unitroot_kpss). Take a regular first difference if still needed.
Plot the ACF and PACF of the (log, seasonally-differenced) series. Based on the patterns, propose values of p and q for the non-seasonal part.
Fit your manually-chosen ARIMA(p,d,q)(P,D,Q)[4] alongside ARIMA(log(Gas)) auto-selection. Compare the two models by AICc using glance().
Forecast Gas production 8 quarters (2 years) ahead from both models and plot the intervals side by side. Which is narrower?

# Lab 7: ARIMA Order Selection via ACF/PACF
# ─────────────────────────────────────────────────
library(fpp3)

gas <- aus_production |> select(Quarter, Gas)

# Step 1 — Time plot
gas |> autoplot(Gas) +
  labs(title = "Australian Gas Production",
       y = "Petajoules", x = NULL)

# Step 2 — Log + seasonal difference
gas <- gas |> mutate(log_gas = log(Gas))

gas |>
  mutate(sdiff = difference(log_gas, lag = 4)) |>
  features(sdiff, unitroot_kpss)
# If p-value < 0.05, take one more regular difference

gas_diff <- gas |>
  mutate(d = difference(difference(log_gas, lag = 4), 1))

gas_diff |>
  autoplot(d) +
  labs(title = "Log Gas: seasonal + regular differenced")

# Step 3 — ACF and PACF of differenced series
gas_diff |> ACF(d, lag_max = 24)  |> autoplot()
gas_diff |> PACF(d, lag_max = 24) |> autoplot()
# AR(p): PACF cuts off, ACF tails off
# MA(q): ACF cuts off, PACF tails off

# Step 4 — Manual vs. auto ARIMA
fit <- gas |>
  model(
    manual = ARIMA(log(Gas) ~ pdq(1,1,0) + PDQ(1,1,0)),   # adjust after Step 3
    auto   = ARIMA(log(Gas))
  )

glance(fit) |> select(.model, AICc, BIC)
report(fit |> select(auto))

# Step 5 — Forecast 8 quarters ahead
fc <- fit |> forecast(h = 8)

fc |>
  autoplot(gas |> tail(40), level = 80) +
  labs(title = "Gas Production Forecast: Manual vs. Auto ARIMA",
       y = "Petajoules (log scale)")

Python

# Lab 7: ARIMA Order Selection via ACF/PACF
# ─────────────────────────────────────────────────
# Note: requires pmdarima for auto_arima — install once with:
#   pip install pmdarima
# ─────────────────────────────────────────────────
# Export from R:
#   aus_production |>
#     select(Quarter, Gas) |>
#     mutate(Quarter = as.Date(Quarter)) |>
#     as_tibble() |> write.csv("gas.csv", row.names = FALSE)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("gas.csv", parse_dates=["Quarter"])
df = df.sort_values("Quarter").set_index("Quarter")

# Step 1 — Time plot
df["Gas"].plot(figsize=(10, 4),
    title="Australian Gas Production", ylabel="Petajoules")
plt.tight_layout(); plt.show()

# Step 2 — Log + seasonal + regular differencing; KPSS test
df["log_gas"]  = np.log(df["Gas"])
df["sdiff_d1"] = df["log_gas"].diff(4).diff(1)   # lag-4 + lag-1

_, p, *_ = kpss(df["sdiff_d1"].dropna(), regression="c")
print(f"KPSS p-value after log + s-diff + diff: {p:.4f}  (>0.05 = stationary)")

# Step 3 — ACF and PACF to identify p, q
fig, axes = plt.subplots(2, 1, figsize=(10, 7))
plot_acf( df["sdiff_d1"].dropna(), lags=24, ax=axes[0])
plot_pacf(df["sdiff_d1"].dropna(), lags=24, ax=axes[1])
axes[0].set_title("ACF of log-gas (s-diff + diff)")
axes[1].set_title("PACF of log-gas (s-diff + diff)")
plt.tight_layout(); plt.show()

# Step 4 — Manual vs. auto ARIMA (compare AICc)
manual = SARIMAX(df["log_gas"], order=(1,1,0),
                 seasonal_order=(1,1,0,4)).fit(disp=False)
print(f"Manual  AICc: {manual.aicc:.1f}")

from pmdarima import auto_arima   # pip install pmdarima
auto_m = auto_arima(df["log_gas"], seasonal=True, m=4,
                     information_criterion="aicc", stepwise=True, trace=False)
print(f"Auto    AICc: {auto_m.aicc():.1f}")
print("Auto order:", auto_m.order, auto_m.seasonal_order)

# Step 5 — Forecast 8 quarters ahead (back-transform from log)
fc_manual = np.exp(manual.forecast(8))
auto_fit  = SARIMAX(df["log_gas"], order=auto_m.order,
                    seasonal_order=auto_m.seasonal_order).fit(disp=False)
fc_auto   = np.exp(auto_fit.forecast(8))
print("Manual forecast:", fc_manual.round(1).tolist())
print("Auto   forecast:", fc_auto.round(1).tolist())

Seasonal ARIMA extends the ARIMA framework with a seasonal AR/MA layer operating at lag m, written ARIMA(p,d,q)(P,D,Q)[m]. Monthly pharmaceutical data are strongly seasonal and trended, making them ideal for practising seasonal differencing and reading the full SARIMA notation from R output.

Dataset: h02 — built in to fpp3 (monthly Australian PBS scripts for antidiabetic drugs, 1991–2008)

Variable	Description
`Month`	Monthly time index
`Cost`	Monthly drug cost (AUD millions)

Instructions

Plot the series. Identify trend, seasonality, and any calendar effects. Does variance grow with level? Consider a log transformation.
Use features(log(Cost), list(unitroot_ndiffs, unitroot_nsdiffs)) to determine d and D. Take the indicated differences and verify with an ACF plot.
Fit ARIMA(log(Cost)) with automatic selection. Use report() to read the full SARIMA(p,d,q)(P,D,Q)[12] output. Write out the model equation in words: what does each term capture?
Run gg_tsresiduals() and the Ljung-Box test. Do the residuals look like white noise?
Forecast 36 months (3 years) ahead. Plot the forecast with 80% and 95% prediction intervals. Does the seasonal pattern carry forward correctly?

# Lab 8: Seasonal ARIMA (SARIMA)
# ─────────────────────────────────────────────────
library(fpp3)

h02 <- PBS |>
  filter(ATC2 == "H02") |>
  summarise(Cost = sum(Cost) / 1e6)

# Step 1 — Plot and check for variance growth
h02 |> autoplot(Cost) +
  labs(title = "H02 Antidiabetic Drug Costs",
       y = "AUD millions", x = NULL)

h02 |> autoplot(log(Cost)) +
  labs(title = "H02 Drug Costs (log scale)")

# Step 2 — How many differences needed?
h02 |>
  features(log(Cost), list(unitroot_ndiffs, unitroot_nsdiffs))

# Visually check after seasonal + regular differencing
h02 |>
  mutate(d = difference(difference(log(Cost), lag = 12), 1)) |>
  autoplot(d)

h02 |>
  mutate(d = difference(difference(log(Cost), lag = 12), 1)) |>
  ACF(d, lag_max = 36) |>
  autoplot()

# Step 3 — Automatic SARIMA
fit <- h02 |> model(ARIMA(log(Cost)))
report(fit)
# Look for: ARIMA(p,d,q)(P,D,Q)[12]
# Non-seasonal: captures short-run autocorrelation
# Seasonal:     captures lag-12 autocorrelation

# Step 4 — Residual diagnostics
fit |> gg_tsresiduals(lag_max = 36)

augment(fit) |>
  features(.innov, ljung_box, lag = 36, dof = 4)

# Step 5 — 3-year forecast
fc <- fit |> forecast(h = 36)

fc |>
  autoplot(h02 |> tail(60)) +
  labs(title = "H02 Drug Costs: 3-Year SARIMA Forecast",
       y = "AUD millions")

Python

# Lab 8: Seasonal ARIMA (SARIMA)
# ─────────────────────────────────────────────────
# Note: requires pmdarima for auto_arima — install once with:
#   pip install pmdarima
# ─────────────────────────────────────────────────
# Export from R: (same h02.csv as Lab 5)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import kpss
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox
from pmdarima import auto_arima   # pip install pmdarima

df = pd.read_csv("h02.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
df.index.freq = "MS"

# Step 1 — Original and log-scale plots
fig, axes = plt.subplots(2, 1, figsize=(10, 7))
df["Cost"].plot(ax=axes[0], title="H02 Drug Costs (original)")
np.log(df["Cost"]).plot(ax=axes[1], title="H02 Drug Costs (log scale)")
plt.tight_layout(); plt.show()

# Step 2 — Stationarity: KPSS after seasonal then regular differencing
log_cost = np.log(df["Cost"])
_, p1, *_ = kpss(log_cost.diff(12).dropna(),      regression="c")
_, p2, *_ = kpss(log_cost.diff(12).diff(1).dropna(), regression="c")
print(f"KPSS after seasonal diff only:     p = {p1:.4f}")
print(f"KPSS after seasonal + first diff:  p = {p2:.4f}")
print("(p > 0.05 means stationary)")

# Step 3 — Auto SARIMA selection on log scale
fit = auto_arima(log_cost, seasonal=True, m=12,
                  information_criterion="aicc", stepwise=True, trace=True)
print(fit.summary())

# Step 4 — Residual diagnostics
resid = fit.resid()
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(resid); axes[0].axhline(0, color="red", lw=0.8)
axes[0].set_title("Residuals")
plot_acf(resid, lags=36, ax=axes[1])
axes[2].hist(resid, bins=20); axes[2].set_title("Distribution")
plt.tight_layout(); plt.show()

print(acorr_ljungbox(resid, lags=[36], return_df=True))

# Step 5 — 3-year forecast (36 months, back-transform)
p, d, q = fit.order; P, D, Q, m = fit.seasonal_order
sarima  = SARIMAX(log_cost, order=(p,d,q),
                  seasonal_order=(P,D,Q,m)).fit(disp=False)
fc_log  = sarima.get_forecast(36)
fc_mean = np.exp(fc_log.predicted_mean)
ci      = np.exp(fc_log.conf_int())

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index[-60:], df["Cost"].iloc[-60:], color="black", label="Observed")
ax.plot(fc_mean.index, fc_mean, color="steelblue", label="Forecast")
ax.fill_between(ci.index, ci.iloc[:,0], ci.iloc[:,1], alpha=0.2)
ax.legend(); ax.set_title("H02 Drug Costs: 3-Year SARIMA Forecast")
plt.tight_layout(); plt.show()

Dynamic regression combines regression predictors with an ARIMA error structure, written ARIMA(y ~ x + pdq() + PDQ()) in fpp3. This is essential when a series has external drivers (temperature, prices, calendar effects) and residual autocorrelation. Weekly gasoline production has a complex seasonal period (~52 weeks), so Fourier terms replace seasonal ARIMA to handle the non-integer periodicity.

Dataset: us_gasoline — built in to fpp3 (weekly US finished motor gasoline production, 1991–2017)

Variable	Description
`Week`	Weekly time index
`Barrels`	Production (millions of barrels per day)

Instructions

Plot the series. Describe the trend and the complex weekly seasonality (period ≈ 52.18 weeks).
Fit four models using K = 1, 3, 6, and 9 Fourier terms combined with ARIMA(Barrels ~ fourier(K=K) + PDQ(0,0,0)). Use a loop or model() with named entries. Extract AICc from glance() and select the best K.
Inspect the best model with report(). What ARIMA order was chosen for the errors? Run gg_tsresiduals().
Forecast 2 years (104 weeks) ahead using the best model. Plot the result.
Compare the Fourier + ARIMA forecast against a plain ARIMA(Barrels) model on a 2015–2017 test set. Which has lower RMSE?

# Lab 9: Dynamic Regression with ARIMA Errors
# ─────────────────────────────────────────────────
library(fpp3)

# Step 1 — Plot
us_gasoline |> autoplot(Barrels) +
  labs(title = "US Weekly Gasoline Production",
       y = "Millions of barrels/day", x = NULL)

# Step 2 — Select best number of Fourier terms by AICc
fit_fourier <- us_gasoline |>
  model(
    K1 = ARIMA(Barrels ~ fourier(K = 1)  + PDQ(0,0,0)),
    K3 = ARIMA(Barrels ~ fourier(K = 3)  + PDQ(0,0,0)),
    K6 = ARIMA(Barrels ~ fourier(K = 6)  + PDQ(0,0,0)),
    K9 = ARIMA(Barrels ~ fourier(K = 9)  + PDQ(0,0,0))
  )

glance(fit_fourier) |>
  select(.model, AICc) |>
  arrange(AICc)
# Lower AICc = better fit/parsimony trade-off

# Step 3 — Best model diagnostics (assuming K6 wins; adjust if needed)
best_fit <- us_gasoline |>
  model(ARIMA(Barrels ~ fourier(K = 6) + PDQ(0,0,0)))

report(best_fit)
best_fit |> gg_tsresiduals()

# Step 4 — Forecast 2 years ahead
fc <- best_fit |> forecast(h = 104)

fc |>
  autoplot(us_gasoline |> tail(104), level = 80) +
  labs(title = "US Gasoline Production: 2-Year Forecast",
       y = "Millions of barrels/day")

# Step 5 — Compare vs. plain ARIMA on 2015-2017 test set
gas_train <- us_gasoline |> filter(year(Week) < 2015)
gas_test  <- us_gasoline |> filter(year(Week) >= 2015)

fit_compare <- gas_train |>
  model(
    Fourier_ARIMA = ARIMA(Barrels ~ fourier(K = 6) + PDQ(0,0,0)),
    Plain_ARIMA   = ARIMA(Barrels)
  )

fc_compare <- fit_compare |>
  forecast(h = nrow(gas_test))

accuracy(fc_compare, gas_test) |>
  select(.model, RMSE, MAE, MAPE) |>
  arrange(RMSE)

Python

# Lab 9: Dynamic Regression with ARIMA Errors
# ─────────────────────────────────────────────────
# Note: Python may select a different best K than R in Step 2.
#   R's ARIMA() auto-selects the error order separately for each K value,
#   while Python uses a fixed ARIMA(0,1,1) error structure throughout.
#   This changes the AICc surface, so the winning K may differ.
#   The key conclusion in Step 5 (Fourier+ARIMA beats plain ARIMA) holds either way.
# ─────────────────────────────────────────────────
# Export from R:
#   us_gasoline |>
#     mutate(Week = as.Date(Week)) |>
#     as_tibble() |> write.csv("us_gasoline.csv", row.names = FALSE)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

df = pd.read_csv("us_gasoline.csv", parse_dates=["Week"])
df = df.sort_values("Week").set_index("Week")

# Step 1 — Time plot
df["Barrels"].plot(figsize=(10, 4),
    title="US Weekly Gasoline Production",
    ylabel="Millions of barrels/day")
plt.tight_layout(); plt.show()

# Step 2 — Build Fourier terms (period ≈ 52.18 weeks = 365.25/7)
period = 365.25 / 7
n = len(df)
t = np.arange(n)

def fourier_df(t_vals, K, index):
    cols = {}
    for k in range(1, K + 1):
        cols[f"sin{k}"] = np.sin(2 * np.pi * k * t_vals / period)
        cols[f"cos{k}"] = np.cos(2 * np.pi * k * t_vals / period)
    return pd.DataFrame(cols, index=index)

def aicc(fit):
    k, n = len(fit.params), fit.nobs
    return fit.aic + 2 * k * (k + 1) / (n - k - 1)

aicc_results = {}
for K in [1, 3, 6, 9]:
    X = fourier_df(t, K, df.index)
    fit = SARIMAX(df["Barrels"], exog=X,
                  order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
    aicc_results[f"K={K}"] = aicc(fit)

print("AICc by number of Fourier terms:")
print(pd.Series(aicc_results).sort_values())
# Note: R auto-selects the ARIMA error order for each K; Python fixes it at (0,1,1).
# Python may select a different best K than R — both correctly apply AICc minimization.

# Step 3 — Fit best K and diagnose residuals
best_K = int(min(aicc_results, key=aicc_results.get).split("=")[1])
X_best = fourier_df(t, best_K, df.index)
best   = SARIMAX(df["Barrels"], exog=X_best,
                  order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
print(best.summary())

lb = acorr_ljungbox(best.resid, lags=[52], return_df=True)
print(lb)

# Step 4 — Forecast 2 years (104 weeks)
t_fc  = np.arange(n, n + 104)
X_fc  = fourier_df(t_fc, best_K, index=range(104))
fc    = best.get_forecast(104, exog=X_fc)
mean  = fc.predicted_mean
ci    = fc.conf_int()

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index[-104:], df["Barrels"].iloc[-104:], color="black")
ax.plot(mean.index, mean.values, color="steelblue")
ax.fill_between(ci.index, ci.iloc[:,0], ci.iloc[:,1], alpha=0.2)
ax.set_title("US Gasoline: 2-Year Fourier + ARIMA Forecast")
plt.tight_layout(); plt.show()

# Step 5 — Compare vs. plain ARIMA on 2015-2017 test set
train_idx = df.index.year < 2015
test_idx  = df.index.year >= 2015
n_train   = train_idx.sum()
t_train, t_test = t[:n_train], t[n_train:]

X_tr = fourier_df(t_train, best_K, df.index[train_idx])
X_te = fourier_df(t_test,  best_K, df.index[test_idx])

f_fit = SARIMAX(df.loc[train_idx, "Barrels"], exog=X_tr,
                 order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
p_fit = SARIMAX(df.loc[train_idx, "Barrels"],
                 order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)

n_test = int(test_idx.sum())
fc_f = f_fit.forecast(n_test, exog=X_te)
fc_p = p_fit.forecast(n_test)
actual = df.loc[test_idx, "Barrels"]

rmse = lambda f: np.sqrt(np.mean((actual.values - np.array(f))**2))
print(f"Fourier+ARIMA RMSE: {rmse(fc_f):.3f}")
print(f"Plain ARIMA RMSE:   {rmse(fc_p):.3f}")

Learning Resources

▶Standard Deviation

Normal Distribution of Exam Scores (mean = 70)

▶Central Limit Theorem

Key Takeaways

▶Confidence Intervals

▶Hypothesis Testing & P-Values

Key Takeaways

Quick Check

▶Type I & Type II Errors, Statistical Power

Key Takeaways

▶Ordinary Least Squares

▶Instrumental Variables

First Stage — Z ︎→︎ X

Structural Equation — X ︎→︎ Y

▶Maximum Likelihood Estimation

▶Fixed and Random Effects

▶R-Squared & Goodness of Fit

Key Takeaways

▶Omitted Variable Bias

Key Takeaways

▶Heteroskedasticity

Key Takeaways

▶The Regression Weighting Problem

The Math

Why It Matters

Simulation: Drag to See the Problem

AS Weight by Propensity Score

Where OLS Gets Its Identification

Real Data: LaLonde (1986)

▶Propensity Score Matching

Key Takeaways

▶Production Possibilities & Comparative Advantage

Key Takeaways

▶Demand

The Demand Curve

▶Supply

The Supply Curve

▶Market Equilibrium

Supply & Demand

▶Consumer & Producer Surplus

Key Takeaways

▶Elasticity

Demand Curve — shaded area = Total Revenue (P × Q)

▶Price Controls

Key Takeaways

▶Monopoly vs. Perfect Competition

Key Takeaways

▶Externalities & Pigouvian Policy

Key Takeaways

▶Tax Incidence

Key Takeaways

▶Public Goods & the Free-Rider Problem

Key Takeaways

▶Price Discrimination

Key Takeaways

▶Labor Market: Supply & Demand

Key Takeaways

▶Minimum Wage

Key Takeaways

▶Monopsony

Key Takeaways

▶Human Capital & Returns to Education

Key Takeaways

▶Unions & Collective Bargaining

Key Takeaways

▶Unemployment: Types & Natural Rate

Key Takeaways

▶Simple Exponential Smoothing

Retail Sales vs. SES One-Step-Ahead Forecast

▶Moving Average Smoother

Quarterly Sales: Original Series vs. MA(k) Smoother

▶ACF and PACF Explorer

ACF — Lags 0–20

PACF — Lags 1–20

▶Holt’s Linear Trend Method

Retail Sales — Holt’s Linear Trend Fit and Forecast

▶STL Decomposition

Original Series with Selected Component

▶Benchmark Forecasting Methods