Learning Resources
Interactive demonstrations and code examples for econometrics, microeconomics, and forecasting.
▶Standard Deviation
Statistics · Descriptive Statistics
Play with the sliders below to see how the spread changes!
For each data point xᵢ, subtract the mean x̄ to find how far it sits from the center. Squaring does two things: it makes negatives positive (so deviations above and below don't cancel), and it penalizes large outliers more heavily. A score 20 points away from the mean contributes 400 to the sum; a score 2 points away contributes only 4.
We sum all the squared deviations and divide by n−1 to get an average. The result is called the variance (s²). It measures spread in squared units (e.g., points²), which is hard to interpret directly.
Why n−1 instead of n? Because we estimated the mean from the same data, which “uses up” one degree of freedom. Using n−1 gives an unbiased estimate of the true population variance. This is called Bessel’s correction.
Taking the square root of the variance returns us to the original units (points, dollars, etc.) instead of squared units. This makes the SD directly interpretable alongside the data. For example, if exam scores have an SD of 10 points, you can say “a typical score is within about 10 points of the mean” — a statement that makes intuitive sense.
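If it helps to see the arithmetic, here is a minimal Python sketch of the same steps — the exam scores are hypothetical, not taken from the demo above:
import math
scores = [55, 62, 70, 71, 78, 84]          # hypothetical exam scores
n = len(scores)
mean = sum(scores) / n
# Squared deviations from the mean
sq_dev = [(x - mean) ** 2 for x in scores]
variance = sum(sq_dev) / (n - 1)           # sample variance, with Bessel's correction
sd = math.sqrt(variance)                   # back to the original units (points)
print(f"mean = {mean:.2f}, variance = {variance:.2f} points², SD = {sd:.2f} points")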
The formula uses Greek letters as shorthand. Here’s what each one means:
| Symbol | Name | Meaning in this context |
|---|---|---|
| σ | lowercase sigma | Standard deviation of a population |
| s | s (Roman) | Standard deviation of a sample (what we usually compute) |
| σ² / s² | sigma-squared / s-squared | Variance — SD before taking the square root (in squared units) |
| Σ | uppercase Sigma | Summation — “add up all the following terms” |
| μ | mu | Population mean (true average for the whole group) |
| x̄ | x-bar | Sample mean (average computed from collected data) |
| n | n (Roman) | Sample size — number of observations |
Tip: lowercase σ and uppercase Σ look similar but mean very different things. Context matters: σ (or s) is a single number measuring spread; Σ is an instruction to sum a list of numbers.
Normal Distribution of Exam Scores (mean = 70)
- SD is in the same units as the data. If scores are in points, the SD is in points. A score of 80 means very different things in a class with SD=2 versus SD=20. Try dragging the SD slider to see how the spread changes!
- The 68-95-99.7 rule (Empirical Rule). For bell-shaped data, about 68% of observations fall within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD. Check the stats output above — the percentages match the rule!
- Mean and SD are independent. The spread of the bell curve depends only on σ, not on where the curve is centered. A class with mean 60 and SD=10 has the exact same bell shape as one with mean 85 and SD=10 — just shifted on the score axis.
- SD vs. Variance. Variance is SD². It’s useful in math (variances add across independent variables) but hard to interpret because it’s in squared units. The SD is just its square root — back in the original units.
- Outliers inflate the SD. Because deviations are squared before averaging, extreme points count disproportionately. A few outliers can make the SD much larger than it would otherwise be.
▶Central Limit Theorem
Statistics · Inference
This is the Central Limit Theorem: for large enough n, the sampling distribution of the sample mean is approximately normal regardless of the population shape. It underlies every t-test, confidence interval, and regression inference you will ever run.
Pick a population shape, set the sample size, and simulate.
Run the simulation to see results.
Key Takeaways
- The CLT is why the normal distribution is everywhere. Individual observations don’t need to be normal — only sample means do. That’s the foundation of t-tests, confidence intervals, and regression inference.
- Larger n → faster convergence. At n = 1, sample means look like the raw population. By n = 30, the distribution is usually close to normal. At n = 100, it’s excellent even for severely skewed data.
- SE = σ/√n. As n increases, sample means cluster more tightly around μ. Quadrupling the sample size halves the standard error.
- Independence matters. The CLT assumes random, independent observations. Clustered or time-series data with autocorrelation can violate this and require different inference methods.
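A minimal simulation sketch of the CLT in Python — the skewed exponential population, sample size, and number of replications are arbitrary choices, not the ones wired into the demo:
import numpy as np
rng = np.random.default_rng(0)
pop_mean, n, reps = 1.0, 30, 5_000                     # exponential(1) population, n = 30
# Draw many samples and record each sample mean
sample_means = rng.exponential(pop_mean, size=(reps, n)).mean(axis=1)
print("mean of sample means:", sample_means.mean().round(3))       # ≈ μ = 1
print("SD of sample means:  ", sample_means.std(ddof=1).round(3))  # ≈ σ/√n = 1/√30 ≈ 0.18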
▶Confidence Intervals
Econometrics · Statistical Inference
The wider the range, the more sure you can be it contains the truth. The narrower the range, the more precise — but the less sure. Play with the controls below to see how it works!
This is the average exam score from the students we actually asked — our single best guess for the true average. In our example, we surveyed a group and got an average of 78. It's the center of our confidence interval. The more students we ask, the closer this number tends to be to the real answer.
This number comes from the normal distribution table and controls how confident we want to be. A higher confidence level means a bigger Z-value, which stretches the interval wider. For 90% confidence it's 1.645, for 95% it's 1.960, and for 99% it's 2.576. Think of it as a "confidence multiplier" — the more sure you want to be, the bigger the multiplier.
The standard error tells us how much our sample average is likely to wiggle from sample to sample. It comes from two things: the standard deviation (how spread out individual scores are) divided by the square root of the sample size. This is why collecting more data helps — a bigger n means a smaller standard error, which means a narrower interval.
| Confidence Level | Significance (α) | Tail Area (α/2) | Z* (Critical Value) | Meaning |
|---|---|---|---|---|
| 90% | 0.10 | 0.05 | 1.645 | We allow a 10% chance of being wrong |
| 95% | 0.05 | 0.025 | 1.960 | We allow a 5% chance of being wrong |
| 99% | 0.01 | 0.005 | 2.576 | We allow a 1% chance of being wrong |
| df | t* at 90% | t* at 95% | t* at 99% |
|---|---|---|---|
| 5 | 2.015 | 2.571 | 4.032 |
| 10 | 1.812 | 2.228 | 3.169 |
| 15 | 1.753 | 2.131 | 2.947 |
| 20 | 1.725 | 2.086 | 2.845 |
| 25 | 1.708 | 2.060 | 2.787 |
| 30 | 1.697 | 2.042 | 2.750 |
| 50 | 1.676 | 2.009 | 2.678 |
| 100 | 1.660 | 1.984 | 2.626 |
| ∞ | 1.645 | 1.960 | 2.576 |
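As a rough sketch of how the pieces combine, here is a 95% interval computed in Python using the z critical value from the table above. Only the sample mean of 78 comes from the example; the sample SD and sample size are made-up numbers:
import math
x_bar, s, n = 78, 12, 50         # mean from the example; s and n are hypothetical
z = 1.960                         # 95% critical value from the table above
se = s / math.sqrt(n)             # standard error of the mean
lo, hi = x_bar - z * se, x_bar + z * se
print(f"SE = {se:.2f}; 95% CI = ({lo:.1f}, {hi:.1f})")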
- Ask more students → narrower interval. The more people you ask, the better your guess. Try dragging the sample size slider to the right!
- More variation in scores → wider interval. If some students got 20 and others got 100, it's harder to pin down the average. Try increasing the standard error!
- More confidence → wider interval. Saying "I'm 99% sure" requires casting a wider net than "I'm 90% sure." Try switching between the confidence level buttons!
- It's always a tradeoff. You can be very precise OR very confident, but not both at the same time (unless you get more data!).
▶Hypothesis Testing & P-Values
Statistics · Inference
The null hypothesis H0 says the true coefficient is zero: no effect. The t-statistic measures how many standard errors your estimate sits from zero. The p-value asks: if H0 were true, how often would you see a result this extreme just by chance?
Drag the t-statistic below and watch the p-value and decision update in real time.
This is your regression coefficient — the sample estimate of the true population parameter β. It tells you the estimated effect of X on Y. Because it comes from a sample, it carries uncertainty. Under H0: β = 0, a t-stat far from zero suggests the true effect is probably nonzero.
The standard error measures uncertainty in your estimate. A larger sample reduces SE (SE ∝ 1/√n), making your t-statistic larger and your test more powerful. A noisier dataset (higher σ²) inflates SE, making it harder to reject H0.
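A quick Python sketch of the arithmetic, with a hypothetical coefficient and standard error (the normal approximation stands in for the exact t distribution in large samples):
from scipy import stats
beta_hat, se = 2.4, 1.1                                # hypothetical coefficient and SE
t_stat = beta_hat / se                                 # how many SEs the estimate sits from zero
p_two_sided = 2 * (1 - stats.norm.cdf(abs(t_stat)))    # large-sample normal approximation
print(f"t = {t_stat:.2f}, two-sided p = {p_two_sided:.3f}")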
Key Takeaways
- A p-value is not the probability H0 is true. It’s the probability of seeing data this extreme or more assuming H0 were true. These are very different things.
- Failing to reject ≠ accepting H0. You may simply lack enough data (statistical power) to detect a real effect. Absence of evidence is not evidence of absence.
- Statistical significance ≠ economic significance. With a large enough sample, even a trivially small effect will produce p < 0.05. Always report the coefficient magnitude and a confidence interval alongside the p-value.
- α = 0.05 is conventional, not sacred. It was popularized by Fisher in the 1920s. The right threshold depends on the cost of false positives vs. false negatives in your specific application.
- Two-tailed vs. one-tailed. A two-tailed test asks: is β ≠ 0? A one-tailed test asks: is β > 0 or β < 0? Toggle between them above. Note that one-tailed tests have less demanding critical values — use them only when theory strongly predicts a direction before seeing the data.
▶Type I & Type II Errors, Statistical Power
Statistics · Inference
The chart below shows the H0 and H1 distributions. The critical value separates “reject” from “fail to reject.” Drag the effect size to see how power changes.
Key Takeaways
- α and β trade off against each other. Making the test stricter (lower α) moves the critical value further out, which increases β. You will miss more true effects. There is no free lunch.
- Power increases with effect size and sample size. Larger true effects are easier to detect. More data reduces SE, which narrows both distributions and makes them easier to separate.
- Underpowered studies are dangerous. A study that finds p > α may simply lack the sample size to detect the effect — not evidence that no effect exists.
- Power analysis should happen before data collection. Given a target power (e.g., 80%), a hypothesized effect size, and α, you can calculate the required n. Post-hoc power calculations are generally uninformative.
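Here is a small Python sketch of a power calculation under the normal approximation; the effect size, standard error, and α are hypothetical:
from scipy import stats
alpha, effect, se = 0.05, 2.0, 0.8             # hypothetical true effect and standard error
z_crit = stats.norm.ppf(1 - alpha / 2)         # two-sided critical value under H0
# Power: probability the test statistic falls in the rejection region when H1 is true
power = 1 - stats.norm.cdf(z_crit - effect / se) + stats.norm.cdf(-z_crit - effect / se)
print(f"power = {power:.2f}")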
▶Ordinary Least Squares
Econometrics · Linear Regression
OLS fits a line through data by minimizing the sum of squared residuals. The sliders add measurement error to each variable independently. The hidden variable slider introduces an omitted confounder correlated with both x1 and x2 — watch how the coefficients get pulled as its effect on Y grows.
x2 (red): adds measurement error to x2 only — red dots spread horizontally, x2's coefficient attenuates toward zero, and its standard error grows.
hidden variable U (purple): U is correlated with both x1 and x2 but excluded from the regression. As U rises, its effect on Y grows and the coefficients on x1 and x2 pick up its influence — omitted variable bias.
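The demo's exact data-generating process isn't reproduced here, but a small Python simulation sketches the two forces at work — an omitted confounder and classical measurement error (all parameters are made up):
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(0)
n = 5_000
u  = rng.normal(size=n)                          # hidden confounder U
x1 = rng.normal(size=n) + 0.7 * u                # x1 correlated with U
y  = 1.0 * x1 + 2.0 * u + rng.normal(size=n)     # true causal effect of x1 is 1.0
x1_noisy = x1 + rng.normal(scale=1.0, size=n)    # the measurement-error slider
for label, x in [("clean x1", x1), ("noisy x1", x1_noisy)]:
    b = sm.OLS(y, sm.add_constant(x)).fit().params[1]
    print(f"{label}: slope = {b:.2f}   (true effect = 1.0)")
# Both runs omit U, so the slope absorbs U's influence (omitted variable bias);
# the noisy version is additionally attenuated toward zero.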
▶Instrumental Variables
Econometrics · Causal Inference
OLS is unbiased only when the regressor X is uncorrelated with the error. When X is endogenous — correlated with unobserved determinants of Y — OLS picks up not just the causal effect but also the back-door path through the omitted variable. An instrumental variable Z breaks the endogeneity: it shifts X (relevance) but has no direct effect on Y (exclusion restriction). 2SLS uses only the exogenous part of X's variation — the part explained by Z — to identify β.
First Stage — Z → X
Structural Equation — X → Y
endogeneity ρ (purple): the correlation between X and the error term — the source of OLS bias. OLS absorbs the back-door path through the omitted variable; 2SLS strips it out. At ρ = 0 both estimators are consistent; as ρ rises the OLS coefficient drifts while 2SLS stays near 1.
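A minimal hand-rolled 2SLS sketch in Python on simulated data (not the demo's internals; the true β, instrument strength, and ρ are arbitrary):
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(1)
n, beta, rho = 5_000, 1.0, 0.6
z = rng.normal(size=n)                         # instrument: relevant, excluded from Y
e = rng.normal(size=n)                         # structural error
x = 0.8 * z + rho * e + rng.normal(size=n)     # X endogenous: correlated with e
y = beta * x + e
ols = sm.OLS(y, sm.add_constant(x)).fit().params[1]
# Manual 2SLS: first stage, then regress y on the fitted (exogenous) part of x
x_hat = sm.OLS(x, sm.add_constant(z)).fit().fittedvalues
tsls  = sm.OLS(y, sm.add_constant(x_hat)).fit().params[1]
print(f"OLS:  {ols:.2f}   (drifts away from 1 as rho rises)")
print(f"2SLS: {tsls:.2f}  (stays near the true beta = 1)")
# Note: standard errors from this two-step-by-hand approach are wrong;
# use a dedicated IV routine (e.g., linearmodels' IV2SLS) for inference.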
▶Maximum Likelihood Estimation
Econometrics · Estimation
Maximum Likelihood Estimation finds parameters that make the observed data most probable. For binary outcomes we use logistic regression — MLE's answer to OLS for 0/1 data. Instead of minimizing squared residuals, MLE maximizes the log-likelihood. Output shows z-statistics and a Pseudo R² in place of the OLS R².
x2 (red): adds measurement error to x2 — same effect on x2's coefficient.
hidden variable U (purple): U is correlated with both x1 and x2 but excluded from the logit. As U rises, it shifts the true probabilities and the estimated coefficients pick up its influence.
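A small Python sketch of logit estimation by maximum likelihood on simulated data (coefficients chosen arbitrarily):
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(2)
n = 2_000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1 / (1 + np.exp(-(0.5 + 1.0 * x1 - 0.8 * x2)))   # true logit probabilities
y = rng.binomial(1, p)
X = sm.add_constant(np.column_stack([x1, x2]))
logit = sm.Logit(y, X).fit(disp=0)                   # MLE maximizes the log-likelihood
print(logit.params.round(2))                         # ≈ [0.5, 1.0, -0.8]
print("log-likelihood:", round(logit.llf, 1), " pseudo R²:", round(logit.prsquared, 3))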
▶Fixed and Random Effects
Panel Data · Econometrics
With panel data — multiple observations per unit over time — unobserved individual characteristics (fixed effects) can bias pooled OLS. Demeaning subtracts each unit's mean from its observations, eliminating any time-invariant omitted variable. The fixed effects estimator exploits only within-unit variation to identify β. Random effects instead treats the individual effect as random and uncorrelated with X, allowing between-unit variation to also inform the estimate — gaining efficiency at the cost of that assumption.
unit heterogeneity (σα): scales the time-invariant group intercepts αi, which are correlated with group-level X. This biases pooled OLS but is removed exactly by demeaning — the FE/RE estimates are unaffected.
hidden variable (U): U is a time-varying omitted variable negatively correlated with within-group X variation. Because it varies within groups, demeaning does not remove it — U biases both pooled OLS and FE/RE estimates. Push far enough and the slope flips sign even after demeaning.
clustered std. error (σ): adds idiosyncratic within-unit noise — widens standard errors but does not bias the slope. This is the variation that clustered standard errors are designed to account for.
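A minimal Python sketch of the within (demeaning) transformation on simulated panel data — the true β and the strength of the unit effects are arbitrary:
import numpy as np
import pandas as pd
import statsmodels.api as sm
rng = np.random.default_rng(3)
n_units, t = 200, 5
unit = np.repeat(np.arange(n_units), t)
alpha = rng.normal(size=n_units)[unit] * 2.0          # unit intercepts, correlated with x
x = 1.5 * alpha + rng.normal(size=n_units * t)
y = 1.0 * x + alpha + rng.normal(size=n_units * t)    # true beta = 1
df = pd.DataFrame({"unit": unit, "x": x, "y": y})
pooled = sm.OLS(df.y, sm.add_constant(df.x)).fit().params["x"]
# Within transformation: subtract each unit's mean from its observations
dm = df[["x", "y"]] - df.groupby("unit")[["x", "y"]].transform("mean")
fe = sm.OLS(dm.y, dm.x).fit().params["x"]
print(f"pooled OLS:    {pooled:.2f}   (biased by the unit intercepts)")
print(f"fixed effects: {fe:.2f}   (close to the true beta = 1)")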
▶R-Squared & Goodness of Fit
Regression · Model Fit
But R² can mislead. A model with high R² may be spurious, overfit, or irrelevant for causal questions. And a valid causal estimate from an IV regression might have R² = 0.02. R² measures fit — not correctness, not causality.
Adjust the sliders to see how R² responds to noise and slope.
Key Takeaways
- R² = 1 − RSS/TSS. TSS is total variation in Y; RSS is the variation left over after the model. R² is the fraction the model accounts for.
- High R² does not mean the regression is valid. A spurious regression of two trending time series can produce R² > 0.99 even when there’s no real relationship.
- R² always increases when you add variables, even irrelevant ones. Use adjusted R² or information criteria (AIC, BIC) when comparing models with different numbers of regressors.
- In causal inference, R² is often unimportant. What matters is whether β̂ is unbiased, not how much of Y’s variance is explained.
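The definition in one short Python sketch, with made-up outcomes and fitted values:
import numpy as np
y = np.array([3.1, 4.0, 5.2, 6.1, 6.8])       # hypothetical outcomes
y_hat = np.array([3.0, 4.2, 5.0, 6.0, 7.0])   # hypothetical fitted values
rss = np.sum((y - y_hat) ** 2)                # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)             # total sum of squares
print("R² =", round(1 - rss / tss, 3))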
▶Omitted Variable Bias
Regression · Causal Inference
The bias formula: Bias ≈ γ × ρ(X, Z), where γ is the effect of the omitted variable Z on Y, and ρ is how correlated Z is with X. If either is zero, there is no bias.
Move the sliders to see how the biased estimate (dashed red) diverges from the true line (teal).
Key Takeaways
- Bias = γ × δ̂, where δ̂ is the coefficient from regressing Z on X. If the omitted variable is uncorrelated with X (ρ = 0), there is no bias even if Z strongly affects Y.
- You can sign the bias. If Z has a positive effect on Y (γ > 0) and is positively correlated with X (ρ > 0), β̂ will be upward biased. Knowing the direction helps even if you can’t measure Z.
- Adding controls reduces OVB only if they proxy the omitted variable. Adding irrelevant controls doesn’t help and can introduce “bad control” bias.
- This is why randomization is valuable. Random assignment of X makes ρ(X, Z) ≈ 0 for all omitted Z, eliminating OVB by design.
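A small Python simulation sketching the bias formula — γ, the X–Z correlation, and the sample size are arbitrary:
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(4)
n, beta, gamma = 10_000, 1.0, 2.0
z = rng.normal(size=n)                             # omitted variable Z
x = 0.5 * z + rng.normal(size=n)                   # X correlated with Z
y = beta * x + gamma * z + rng.normal(size=n)
short = sm.OLS(y, sm.add_constant(x)).fit().params[1]    # regression that omits Z
delta = sm.OLS(z, sm.add_constant(x)).fit().params[1]    # coefficient from Z on X
print(f"short-regression slope: {short:.2f}")
print(f"beta + gamma*delta:     {beta + gamma * delta:.2f}")   # the OVB formula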
▶Heteroskedasticity
Regression · Diagnostics
OLS is still unbiased under heteroskedasticity, but the usual standard errors are wrong. In the common case where the error variance rises with the regressors, they are too small, making your t-stats too large and p-values too small. The fix is heteroskedasticity-robust standard errors (HC1/HC2/HC3).
Move the severity slider to see the residual fan grow.
Key Takeaways
- OLS coefficients are still unbiased. Heteroskedasticity affects inference (standard errors, t-stats, p-values), not the point estimates themselves.
- Always use robust standard errors. In R: `lm_robust()` or `coeftest(m, vcov = vcovHC(m))`. In Stata: `reg y x, robust`. The cost is zero; there is no reason not to.
- Visual check: residuals vs. fitted values plot. If the spread of residuals increases with fitted values, heteroskedasticity is likely. The Breusch–Pagan or White test provides a formal check.
- Weighted Least Squares (WLS) can be more efficient than OLS when the form of heteroskedasticity is known, by down-weighting high-variance observations.
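For completeness, here is a Python sketch of the robust-SE fix with statsmodels, on simulated data whose error variance grows with x (the data-generating process is made up):
import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(5)
x = rng.uniform(0.5, 10, size=500)
y = 1 + 2 * x + rng.normal(scale=x, size=500)   # error variance grows with x
m = sm.OLS(y, sm.add_constant(x)).fit()
robust = m.get_robustcov_results(cov_type="HC1")
print("conventional SE:", round(float(m.bse[1]), 3))
print("HC1 robust SE:  ", round(float(robust.bse[1]), 3))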
▶The Regression Weighting Problem
Aronow & Samii (2016) — Econometrics · Program Evaluation
When you run OLS with control variables, you may assume you are estimating the average treatment effect (ATE) across your full sample. Aronow and Samii (2016) show this is not the case. OLS implicitly assigns each observation a weight based on how much the treatment varies conditional on the covariates. Observations where treatment is nearly perfectly predicted by covariates receive very little weight — they barely influence the estimate at all. The result is that OLS identifies an effect for a subset of the data that may look very different from the full sample.
The Math
Let Dᵢ be the treatment indicator and Xᵢ the covariates. Define the OLS residual from regressing treatment on covariates: êᵢ = Dᵢ − Ê[Dᵢ | Xᵢ].
The OLS estimator of the treatment coefficient is a weighted average of unit-level treatment effects, with weights proportional to êᵢ² — the squared part of treatment not explained by the covariates.
Units whose treatment status is nearly determined by their covariates (êᵢ ≈ 0) receive near-zero weight. The effective sample — the observations actually driving your estimate — can be far smaller and systematically different from your full sample.
Why It Matters
- External validity: Your estimate may generalize only to the effective sample, not the full population you studied.
- Heterogeneous effects: If treatment effects vary by covariates, OLS does not give you the ATE — it gives you a variance-weighted average that over-represents units whose treatment status is least predictable from their covariates.
- Reporting: You should report who is in your effective sample alongside your estimates.
Simulation: Drag to See the Problem
The slider controls how strongly a covariate predicts treatment — simulating increasing selection bias. Drag it right to watch the effective sample shrink and diverge from the full sample in real time.
AS Weight by Propensity Score
Where OLS Gets Its Identification
At high separation (slider right): treated and control barely overlap. Only observations in the narrow middle zone identify the effect — shown in blue on the right. The gray mass of the full sample gets near-zero weight and effectively disappears from the estimate.
Real Data: LaLonde (1986)
The LaLonde dataset from the National Supported Work (NSW) job training experiment contains 445 observations — 185 treated and 260 control — with covariates including age, education, race, marital status, and prior earnings. The charts below show how OLS weights are distributed and how the effective sample differs from the full sample.
# Install packages if needed:
# install.packages(c("MatchIt", "cobalt"))
library(MatchIt) # provides lalonde dataset
library(cobalt) # for love.plot / balance checks
data("lalonde")
# ── Step 1: OLS of treatment on covariates ──────────────────────────────────
fit_ps <- lm(treat ~ age + educ + black + hisp + married + nodegree + re74 + re75,
data = lalonde)
# ── Step 2: Compute Aronow-Samii weights ────────────────────────────────────
e_hat <- residuals(fit_ps) # ê_i = D_i - E[D_i | X_i]
as_wts <- e_hat^2 / sum(e_hat^2) # normalize so weights sum to 1
# ── Step 3: Effective sample size ───────────────────────────────────────────
n_eff <- 1 / sum(as_wts^2)
cat("Full sample n:", nrow(lalonde), "\n")
cat("Effective sample n:", round(n_eff, 1), "\n")
# ── Step 4: Compare full sample vs effective sample covariate means ──────────
covs <- c("age", "educ", "black", "hispan", "married", "nodegree", "re74", "re75")
full_means <- colMeans(lalonde[, covs])
eff_means <- sapply(covs, function(v) weighted.mean(lalonde[[v]], as_wts))
comparison <- data.frame(
covariate = covs,
full_sample = round(full_means, 3),
eff_sample = round(eff_means, 3),
difference = round(eff_means - full_means, 3)
)
print(comparison)
# ── Step 5: Run main OLS and note what it actually estimates ─────────────────
ols <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75,
data = lalonde)
summary(ols)
# The treatment coefficient is a weighted avg treatment effect —
# *not* the ATE over the full LaLonde sample.
* ── Load LaLonde data ──────────────────────────────────────────────────────
* Download from: https://users.nber.org/~rdehejia/data/nsw_dw.dta
* or use the version bundled with teffects/Stata 13+
* Using built-in Stata example data (NSW experimental):
* webuse lalonde, clear
* Or load your own:
use "nsw_dw.dta", clear
* ── Step 1: OLS of treatment on covariates ─────────────────────────────────
reg treat age educ black hispanic married nodegree re74 re75
* ── Step 2: Compute Aronow-Samii weights ───────────────────────────────────
predict e_hat, residuals // ê_i = D_i - E[D_i | X_i]
gen e_hat_sq = e_hat^2
quietly summarize e_hat_sq
gen as_wt = e_hat_sq / r(sum) // normalize weights to sum to 1
* ── Step 3: Effective sample size ──────────────────────────────────────────
gen as_wt_sq = as_wt^2
quietly summarize as_wt_sq
scalar n_eff = 1 / r(sum)
display "Full sample n = " _N
display "Effective sample n = " n_eff
* ── Step 4: Compare full vs effective sample covariate means ────────────────
foreach v in age educ black hispanic married nodegree re74 re75 {
quietly summarize `v'
scalar full_`v' = r(mean)
quietly summarize `v' [aw = as_wt]
scalar eff_`v' = r(mean)
display "`v': full = " full_`v' " effective = " eff_`v' ///
" diff = " eff_`v' - full_`v'
}
* ── Step 5: Main OLS regression ─────────────────────────────────────────────
reg re78 treat age educ black hispanic married nodegree re74 re75
* The coefficient on treat is a weighted average treatment effect (WATE)
* over the effective sample, not the ATE over the full sample.
# pip install numpy pandas statsmodels
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
# Load the same lalonde.csv used in the labs below and create the race dummies
df = pd.read_csv("lalonde.csv")
df["black"] = (df["race"] == "black").astype(int)
df["hispan"] = (df["race"] == "hispan").astype(int)
# ── Step 1: OLS of treatment on covariates ──────────────────────────────────
covariates = "age + educ + black + hispan + married + nodegree + re74 + re75"
ps_model = smf.ols(f"treat ~ {covariates}", data=df).fit()
# ── Step 2: Aronow-Samii weights ─────────────────────────────────────────────
e_hat = ps_model.resid # ê_i = D_i - Ê[D_i | X_i]
as_wts = e_hat**2 / (e_hat**2).sum() # normalize to sum = 1
# ── Step 3: Effective sample size ────────────────────────────────────────────
n_eff = 1 / (as_wts**2).sum()
print(f"Full sample n: {len(df)}")
print(f"Effective sample n: {n_eff:.1f}")
# ── Step 4: Compare full vs effective sample covariate means ─────────────────
covs = ["age", "educ", "black", "hispan", "married", "nodegree", "re74", "re75"]
comparison = pd.DataFrame({
"full_sample": df[covs].mean(),
"eff_sample" : df[covs].apply(lambda col: np.average(col, weights=as_wts)),
})
comparison["difference"] = comparison["eff_sample"] - comparison["full_sample"]
print("\nCovariate balance: full sample vs. effective sample")
print(comparison.round(3))
# ── Step 5: Main OLS — note what it actually estimates ───────────────────────
ols = smf.ols(f"re78 ~ treat + {covariates}", data=df).fit()
print(ols.summary())
# The 'treat' coefficient is the WATE over the effective sample —
# not the ATE over the full LaLonde NSW sample.
This lab uses the LaLonde (1986) dataset — a workhorse of program evaluation that tracks earnings outcomes for participants in a 1970s job training program. The goal is to practice loading data, inspecting its structure, and computing basic summary statistics by group.
| Variable | Description |
|---|---|
| treat | 1 = received job training, 0 = control group |
| age | Age in years |
| educ | Years of education completed |
| race | Race: black / hispan / white |
| married | 1 = married at time of enrollment |
| nodegree | 1 = no high school diploma |
| re74 | Real earnings in 1974 (pre-program, $) |
| re75 | Real earnings in 1975 (pre-program, $) |
| re78 | Real earnings in 1978 (post-program outcome, $) |
- Download the dataset above and save it to a working directory.
- Load the data using the code below for your preferred program.
- Examine the structure: how many observations? How many variables? What are the variable types?
- Compute mean `re78` separately for the treated (`treat = 1`) and control (`treat = 0`) groups. What patterns emerge?
* Lab 1: Loading Data in Stata
* ─────────────────────────────────────────────────
* Step 1 — Import the CSV
import delimited "lalonde.csv", clear
* Step 2 — Examine the dataset structure
describe
list in 1/5
* Step 3 — Summary statistics for all variables
summarize
* Step 4 — Mean earnings in 1978 by treatment status
tabstat re78, by(treat) stat(mean sd n)
# Lab 1: Loading Data in R
# ─────────────────────────────────────────────────
# Step 1 — Import the CSV
data <- read.csv("lalonde.csv")
# Step 2 — Examine the dataset structure
str(data)
head(data)
# Step 3 — Summary statistics for all variables
summary(data)
# Step 4 — Mean earnings in 1978 by treatment status
tapply(data$re78, data$treat,
function(x) c(mean = mean(x), sd = sd(x), n = length(x)))
# Lab 1: Loading Data in Python
# ─────────────────────────────────────────────────
import pandas as pd
# Step 1 — Import the CSV
data = pd.read_csv("lalonde.csv")
# Step 2 — Examine the dataset structure
print(data.dtypes)
print(data.head())
# Step 3 — Summary statistics for all variables
print(data.describe())
# Step 4 — Mean earnings in 1978 by treatment status
print(data.groupby("treat")["re78"].agg(["mean", "std", "count"]))
Raw data rarely arrives analysis-ready. This lab covers the essential cleaning steps for the LaLonde dataset: checking for missing values, inspecting variable types, converting a categorical string variable into indicator (dummy) variables, and flagging potential outliers.
| Variable | Type | Notes |
|---|---|---|
| race | String | Needs to be converted to dummies: black, hispan |
| re74, re75 | Numeric | Pre-program earnings; many zeros are legitimate (unemployed) |
| re78 | Numeric | Outcome variable; check for implausible values |
| treat, married, nodegree | Binary | Should only take values 0 or 1 |
- Check for missing values across all variables. Are there any?
- Verify that binary variables (`treat`, `married`, `nodegree`) only contain 0 and 1.
- Convert `race` into two dummy variables: `black` and `hispan` (white is the omitted reference category).
- Inspect the distribution of `re78` for outliers. What is the maximum value?
* Lab 2: Cleaning the Data in Stata
* ─────────────────────────────────────────────────
* Step 1 — Check for missing values
misstable summarize
* Step 2 — Verify binary variables
tabulate treat
tabulate married
tabulate nodegree
* Step 3 — Create dummy variables from race string
generate black = (race == "black")
generate hispan = (race == "hispan")
drop race
* Step 4 — Inspect the distribution of re78
summarize re78, detail
histogram re78, bin(30) title("Distribution of 1978 Earnings")
# Lab 2: Cleaning the Data in R
# ─────────────────────────────────────────────────
# Step 1 — Check for missing values
colSums(is.na(data))
# Step 2 — Verify binary variables
lapply(data[, c("treat", "married", "nodegree")], table)
# Step 3 — Create dummy variables from race string
data$black <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
data$race <- NULL
# Step 4 — Inspect the distribution of re78
summary(data$re78)
hist(data$re78, breaks = 30, main = "Distribution of 1978 Earnings",
xlab = "re78")
# Lab 2: Cleaning the Data in Python
# ─────────────────────────────────────────────────
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("lalonde.csv")
# Step 1 — Check for missing values
print(data.isnull().sum())
# Step 2 — Verify binary variables
for col in ["treat", "married", "nodegree"]:
print(col, data[col].value_counts().to_dict())
# Step 3 — Create dummy variables from race string
data["black"] = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])
# Step 4 — Inspect the distribution of re78
print(data["re78"].describe())
data["re78"].hist(bins=30)
plt.title("Distribution of 1978 Earnings")
plt.xlabel("re78")
plt.show()
Before running any regression, a well-constructed summary statistics table is essential. This lab covers computing means, standard deviations, and counts for key variables — overall and by treatment status — and producing a publication-style table.
| Variable | Description |
|---|---|
| treat | Treatment indicator (use to split the sample) |
| age, educ | Demographics |
| black, hispan, married, nodegree | Binary characteristics |
| re74, re75, re78 | Earnings in 1974, 1975, 1978 ($) |
- After loading and cleaning the data (Labs 1 & 2), compute the mean and standard deviation of each variable for the full sample.
- Split the sample by `treat` and compute means for the treated and control groups separately.
- Produce a table with three columns: Control mean, Treated mean, and Full-sample mean.
- Compare pre-program earnings (`re74`, `re75`) across groups. What does balance (or imbalance) suggest about the study design?
* Lab 3: Summary Statistics in Stata
* ─────────────────────────────────────────────────
* Step 1 — Full-sample summary
local vars age educ black hispan married nodegree re74 re75 re78
summarize `vars'
* Step 2 — Means by treatment status
tabstat `vars', by(treat) stat(mean sd n) format(%9.2f)
* Step 3 — Test for differences in means (balance check)
foreach v of local vars {
ttest `v', by(treat)
}
# Lab 3: Summary Statistics in R
# ─────────────────────────────────────────────────
vars <- c("age","educ","black","hispan","married","nodegree","re74","re75","re78")
# Step 1 — Full-sample summary
summary(data[, vars])
# Step 2 — Means by treatment status
do.call(rbind, lapply(vars, function(v) {
c(variable = v,
control = round(mean(data[data$treat == 0, v]), 2),
treated = round(mean(data[data$treat == 1, v]), 2),
full = round(mean(data[, v]), 2))
}))
# Step 3 — t-tests for balance
lapply(vars, function(v) {
t.test(data[data$treat == 1, v], data[data$treat == 0, v])
})
# Lab 3: Summary Statistics in Python
# ─────────────────────────────────────────────────
import pandas as pd
from scipy import stats
data = pd.read_csv("lalonde.csv")
data["black"] = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])
vars = ["age","educ","black","hispan","married","nodegree","re74","re75","re78"]
# Step 1 — Full-sample summary
print(data[vars].describe().round(2))
# Step 2 — Means by treatment status
table = data.groupby("treat")[vars].mean().T.round(2)
table.columns = ["Control", "Treated"]
table["Full"] = data[vars].mean().round(2)
print(table)
# Step 3 — t-tests for balance
for v in vars:
t, p = stats.ttest_ind(data.loc[data.treat==1, v],
data.loc[data.treat==0, v])
print(f"{v}: t = {t:.3f}, p = {p:.3f}")
LaLonde (1986) uses a randomized experiment to measure the earnings effect of job training. Because the NSW program randomly assigned workers to treatment and control, a simple comparison of mean 1978 earnings across groups gives an unbiased estimate of the causal effect. This lab asks you to compute that estimate using OLS and then add control variables to see how much the answer moves. The estimate from this lab is the benchmark you will compare all future methods against.
| Variable | Role |
|---|---|
| re78 | Outcome — real earnings in 1978 ($) |
| treat | Key regressor — 1 if received job training |
| age, educ | Continuous controls |
| black, hispan, married, nodegree | Binary controls |
| re74, re75 | Pre-program earnings controls |
- Compute mean `re78` for `treat = 1` and `treat = 0` separately. The difference is LaLonde's experimental estimate.
- Run OLS of `re78` on `treat` only. The coefficient should exactly match your mean difference. Why?
- Add demographic controls (`age`, `educ`, `black`, `hispan`, `married`, `nodegree`). Does the coefficient on `treat` change?
- Add pre-program earnings (`re74`, `re75`). Does the estimate move?
- Does the coefficient on `treat` stay stable across all four models? Record the fully controlled estimate as your benchmark. In Labs 5 and 6 you will try to recover this number using methods that do not rely on random assignment. What would large instability here tell you about whether randomization actually worked?
* Step 1. Mean earnings by treatment group (the experimental estimate)
tabstat re78, by(treat) stat(mean)
* Step 2. OLS with treatment only (should match the mean difference exactly)
regress re78 treat, robust
* Step 3. Add demographic controls
regress re78 treat age educ black hispan married nodegree, robust
* Step 4. Add pre-program earnings
regress re78 treat age educ black hispan married nodegree re74 re75, robust
* Step 5. Record your benchmark
* Record the treat coefficient from Step 4.
* This is your experimental benchmark for Labs 5 and 6.
* Does it stay stable across Steps 2, 3, and 4?
library(sandwich)
library(lmtest)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
# Step 1. Mean earnings by treatment group (the experimental estimate)
tapply(data$re78, data$treat, mean)
# Step 2. OLS with treatment only (coefficient on treat = mean difference)
m1 <- lm(re78 ~ treat, data = data)
coeftest(m1, vcov = vcovHC(m1, type = "HC1"))
# Step 3. Add demographic controls
m2 <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree,
data = data)
coeftest(m2, vcov = vcovHC(m2, type = "HC1"))
# Step 4. Add pre-program earnings
m3 <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree +
re74 + re75, data = data)
coeftest(m3, vcov = vcovHC(m3, type = "HC1"))
# Step 5. Record your benchmark
cat("Benchmark estimate:", coef(m3)["treat"], "\n")
# Record this. It is your experimental benchmark for Labs 5 and 6.
# Does it stay stable across Steps 2, 3, and 4?
import pandas as pd
import statsmodels.formula.api as smf
data = pd.read_csv("lalonde.csv")
data["black"] = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])
# Step 1. Mean earnings by treatment group (the experimental estimate)
print(data.groupby("treat")["re78"].mean())
# Step 2. OLS with treatment only (coefficient = mean difference)
m1 = smf.ols("re78 ~ treat", data=data).fit(cov_type="HC1")
print(m1.summary())
# Step 3. Add demographic controls
m2 = smf.ols("re78 ~ treat + age + educ + black + hispan + married + nodegree",
data=data).fit(cov_type="HC1")
print(m2.summary())
# Step 4. Add pre-program earnings
m3 = smf.ols("re78 ~ treat + age + educ + black + hispan + married + nodegree"
" + re74 + re75", data=data).fit(cov_type="HC1")
print(m3.summary())
# Step 5. Record your benchmark
print("Benchmark estimate:", m3.params["treat"])
# Record this. It is your experimental benchmark for Labs 5 and 6.
# Does it stay stable across Steps 2, 3, and 4?
Probit is a nonlinear regression model for binary outcomes. It estimates the probability that the dependent variable equals 1 as a function of covariates, using the standard normal CDF rather than a straight line. This lab uses probit to model the probability of receiving job training. At the end there is an extension that builds on the probit output to estimate a selection-corrected treatment effect.
| Variable | Role |
|---|---|
| treat | Dependent variable in first stage, 1 if received training |
| re78 | Outcome in second stage, earnings in 1978 |
| age, educ, black, hispan, married, nodegree | Covariates in both stages |
| re74, re75 | Pre-program earnings, strong predictors of selection |
- Run a probit regression of `treat` on all available covariates. Which variables predict participation? Report marginal effects.
- Generate predicted probabilities from the probit. Compare the distribution for treated vs. control workers. Do the groups overlap well?
- Compare your probit estimate of the treatment effect to the Lab 4 benchmark. How close is it?
- Compute the inverse Mills ratio (IMR) for each observation as φ(xb) / Φ(xb), where φ is the standard normal PDF, Φ is the CDF, and xb is the probit linear index. Add it as a control in a second-stage OLS of `re78` on `treat` and all covariates. This is the Heckman two-step selection correction.
- Compare the Heckman estimate to both the Lab 4 benchmark and your Lab 6 DiD. Which method came closest to the true effect?
* Step 1. Probit first stage
probit treat age educ black hispan married nodegree re74 re75
margins, dydx(*)
* Step 2. Predicted probabilities (propensity scores)
predict phat, pr
tabstat phat, by(treat) stat(mean min max)
* Step 3. Inverse Mills ratio
predict xb, xb
gen imr = normalden(xb) / normal(xb)
* Step 4. Second-stage OLS
regress re78 treat age educ black hispan married nodegree re74 re75 imr, robust
* Step 5. Compare to benchmarks
di "Heckman: " _b[treat]
* Record this and compare to your Lab 4 and Lab 6 estimates.
* Which method came closest to the true effect?
library(sandwich)
library(lmtest)
library(margins)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
# Step 1. Probit first stage
probit <- glm(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
data = data, family = binomial(link = "probit"))
summary(margins(probit))
# Step 2. Predicted probabilities (propensity scores)
data$phat <- predict(probit, type = "response")
tapply(data$phat, data$treat, summary)
# Step 3. Inverse Mills ratio
xb <- predict(probit, type = "link")
data$imr <- dnorm(xb) / pnorm(xb)
# Step 4. Second-stage OLS
heck <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree +
re74 + re75 + imr, data = data)
coeftest(heck, vcov = vcovHC(heck, type = "HC1"))
# Step 5. Compare to benchmarks
cat("Heckman: ", coef(heck)["treat"], "\n")
cat("Lab 4 benchmark: record your value here\n")
cat("Lab 6 DiD: record your value here\n")
# Which method came closest to the true effect?
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import norm
data = pd.read_csv("lalonde.csv")
data["black"] = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])
# Step 1. Probit first stage
probit = smf.probit(
"treat ~ age + educ + black + hispan + married + nodegree + re74 + re75",
data=data
).fit()
print(probit.get_margeff().summary())
# Step 2. Predicted probabilities (propensity scores)
data["phat"] = probit.predict()
print(data.groupby("treat")["phat"].describe())
# Step 3. Inverse Mills ratio
xb = probit.fittedvalues
data["imr"] = norm.pdf(xb) / norm.cdf(xb)
# Step 4. Second-stage OLS
heck = smf.ols(
"re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75 + imr",
data=data
).fit(cov_type="HC1")
print(heck.summary())
# Step 5. Compare to benchmarks
print("Heckman: ", heck.params["treat"])
print("Lab 4 benchmark: record your value here")
print("Lab 6 DiD: record your value here")
# Which method came closest to the true effect?
Difference-in-differences (DiD) estimates a treatment effect by comparing the change in outcomes over time for a treated group against the change for an untreated group. It differences out any time-invariant differences between groups and any common time trend, leaving only the treatment effect. The key assumption is parallel trends: in the absence of treatment, both groups would have followed the same trajectory. This lab applies DiD to the LaLonde data using pre-program earnings as the pre-period. At the end you will compare your estimate to the experimental benchmark from Lab 4.
| Variable | Role |
|---|---|
| re78 | Outcome, post-period earnings (1978) |
| re75 | Pre-period earnings (1975) |
| re74 | Second pre-period earnings (1974), for trend check |
| treat | Treatment indicator, 1 if received job training |
| age, educ, black, hispan, married, nodegree | Controls |
- Compute the four group means: average `re75` and `re78` for `treat = 0` and `treat = 1`. Calculate the DiD by hand.
- Reshape the data to long format with one row per person per period. Create a `post` indicator equal to 1 for the 1978 observation. Run a DiD regression. Does the coefficient on the `post × treat` interaction match your manual calculation?
- Add demographic controls. Does the estimate change?
- Compare your DiD estimate to the Lab 4 benchmark. How close is it?
- Compute average `re74` and `re75` for each group. Is the change from 1974 to 1975 the same for treated and control workers? What might a difference in pre-period trends tell you about why DiD did or did not recover the true effect?
* Step 1. Four group means and manual DiD
tabstat re74 re75 re78, by(treat) stat(mean)
* DiD = (re78_treat1 - re75_treat1) - (re78_treat0 - re75_treat0)
* Calculate this by hand from the table above.
* Step 2. Reshape and DiD regression
gen id = _n
rename re75 earnings0
rename re78 earnings1
reshape long earnings, i(id) j(post)
regress earnings i.post##i.treat, robust
* Coefficient on 1.post#1.treat is the DiD estimate.
* Step 3. Add controls
regress earnings i.post##i.treat age educ black hispan married nodegree, robust
* Step 4. Pre-period trend check
* Run in original wide data before the reshape:
tabstat re74 re75, by(treat) stat(mean)
* Is the change from re74 to re75 the same for both groups?
* Step 5. Compare to benchmark
* Record your DiD estimate from Step 2.
* Compare it to your Lab 4 benchmark.
* How large is the gap? What does the re74-to-re75 pattern suggest about why?
library(dplyr)
library(tidyr)
library(sandwich)
library(lmtest)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
# Step 1. Four group means and manual DiD
data %>% group_by(treat) %>%
summarise(mean_re74 = mean(re74), mean_re75 = mean(re75), mean_re78 = mean(re78))
did_manual <- (mean(data$re78[data$treat==1]) - mean(data$re75[data$treat==1])) -
(mean(data$re78[data$treat==0]) - mean(data$re75[data$treat==0]))
cat("Manual DiD:", did_manual, "\n")
# Step 2. Reshape and DiD regression
data_long <- data %>%
mutate(id = row_number()) %>%
pivot_longer(cols = c(re75, re78), names_to = "year", values_to = "earnings") %>%
mutate(post = as.integer(year == "re78"))
did <- lm(earnings ~ post * treat, data = data_long)
coeftest(did, vcov = vcovHC(did, type = "HC1"))
# Step 3. Add controls
did_c <- lm(earnings ~ post * treat + age + educ + black + hispan + married + nodegree,
data = data_long)
coeftest(did_c, vcov = vcovHC(did_c, type = "HC1"))
# Step 4. Pre-period trend check
data %>% group_by(treat) %>%
summarise(change_74_75 = mean(re75) - mean(re74))
# Is the change the same for both groups?
# Step 5. Compare to benchmark
cat("DiD estimate: ", coef(did)["post:treat"], "\n")
cat("Lab 4 benchmark: record your value here\n")
# How large is the gap? What does the pre-period pattern suggest about why?
import pandas as pd
import statsmodels.formula.api as smf
data = pd.read_csv("lalonde.csv")
data["black"] = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])
# Step 1. Four group means and manual DiD
print(data.groupby("treat")[["re74","re75","re78"]].mean())
did_manual = (
(data.loc[data.treat==1,"re78"].mean() - data.loc[data.treat==1,"re75"].mean()) -
(data.loc[data.treat==0,"re78"].mean() - data.loc[data.treat==0,"re75"].mean())
)
print("Manual DiD:", did_manual)
# Step 2. Reshape and DiD regression
data_long = pd.melt(
data.assign(id=range(len(data))),
id_vars=["id","treat","age","educ","black","hispan","married","nodegree","re74"],
value_vars=["re75","re78"], var_name="year", value_name="earnings"
)
data_long["post"] = (data_long["year"] == "re78").astype(int)
did = smf.ols("earnings ~ post * treat", data=data_long).fit(cov_type="HC1")
print(did.summary())
# Step 3. Add controls
did_c = smf.ols(
"earnings ~ post * treat + age + educ + black + hispan + married + nodegree",
data=data_long
).fit(cov_type="HC1")
print(did_c.summary())
# Step 4. Pre-period trend check
trends = data.groupby("treat")[["re74","re75"]].mean()
trends["change_74_75"] = trends["re75"] - trends["re74"]
print(trends)
# Is the change the same for both groups?
# Step 5. Compare to benchmark
print("DiD estimate: ", did.params["post:treat"])
print("Lab 4 benchmark: record your value here")
# How large is the gap? What does the pre-period pattern suggest about why?
Propensity score matching estimates a treatment effect by pairing each treated unit with a control unit that had a similar probability of receiving treatment given observed covariates. By matching on the propensity score rather than all covariates individually, we try to construct a comparison group that looks like the treated group on everything we can observe. This lab builds directly on Lab 5: the propensity scores you estimated there are the inputs here. Dehejia and Wahba (1999) revisited LaLonde's data using propensity score matching and found estimates much closer to the experimental benchmark than the methods in Labs 5 and 6 achieved. This lab asks you to see if you can replicate that result.
| Variable | Role |
|---|---|
| treat | Treatment indicator, 1 if received job training |
| re78 | Outcome, earnings in 1978 |
| age, educ, black, hispan, married, nodegree | Matching covariates |
| re74, re75 | Pre-program earnings, matching covariates |
- Estimate propensity scores using a probit of `treat` on all covariates (same specification as Lab 5).
- Match each treated worker to the nearest control worker by propensity score. How many control units were matched? Are any treated units far from their match?
- Check covariate balance before and after matching. Are the matched groups more similar on observable characteristics than the unmatched groups?
- Estimate the treatment effect on the matched sample. Compare to the Lab 4 benchmark. How close did matching get?
- Try a caliper to discard treated units with no close match (e.g., caliper of 0.01 on the propensity score). Does the estimate change? How many units are dropped? What does sensitivity to the caliper tell you about the quality of the match?
* Step 1. Estimate propensity scores and match (nearest-neighbor, ATT)
teffects psmatch (re78) (treat age educ black hispan married nodegree re74 re75), atet
* Step 2. Balance check before and after matching
tebalance summarize
* Look at the standardized differences. Values below 0.1 indicate good balance.
* Step 3. Treatment effect
* The teffects output reports the ATT directly.
* Record it and compare to your Lab 4 benchmark.
* Step 4. Caliper matching (Getting It Right)
* Use caliper to flag treated units with no close match, drop them,
* then re-estimate ATT on the on-support sample.
capture teffects psmatch (re78) (treat age educ black hispan married nodegree re74 re75), ///
    atet caliper(0.01) osample(unmatched)
quietly count if unmatched == 1
display "Treated units dropped: " r(N)
drop if unmatched == 1
teffects psmatch (re78) (treat age educ black hispan married nodegree re74 re75), atet
* How does the estimate change? How many treated units were dropped?
library(MatchIt)
library(sandwich)
library(lmtest)
data <- read.csv("lalonde.csv")
data$black  <- as.integer(data$race == "black")
data$hispan <- as.integer(data$race == "hispan")
# Step 1. Nearest-neighbor matching on propensity score (probit)
m_out <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
data = data, method = "nearest", distance = "probit")
# Step 2. Balance check
summary(m_out, standardize = TRUE)
# Look at standardized mean differences before and after matching.
# Step 3. Treatment effect on matched sample
m_data <- match.data(m_out)
fit <- lm(re78 ~ treat + age + educ + black + hispan + married + nodegree + re74 + re75,
data = m_data, weights = weights)
coeftest(fit, vcov = vcovCL(fit, cluster = ~subclass))
cat("Matching ATT:", coef(fit)["treat"], "\n")
cat("Lab 4 benchmark: record your value here\n")
# Step 4. Caliper matching (Getting It Right)
m_cal <- matchit(treat ~ age + educ + black + hispan + married + nodegree + re74 + re75,
data = data, method = "nearest", distance = "probit", caliper = 0.01)
summary(m_cal)
# How many treated units were dropped? Does the estimate change?
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
from sklearn.neighbors import NearestNeighbors
data = pd.read_csv("lalonde.csv")
data["black"] = (data["race"] == "black").astype(int)
data["hispan"] = (data["race"] == "hispan").astype(int)
data = data.drop(columns=["race"])
# Step 1. Propensity scores
probit = smf.probit(
"treat ~ age + educ + black + hispan + married + nodegree + re74 + re75",
data=data
).fit()
data["pscore"] = probit.predict()
# Step 2. Nearest-neighbor matching
treated = data[data.treat == 1].reset_index(drop=True)
control = data[data.treat == 0].reset_index(drop=True)
nn = NearestNeighbors(n_neighbors=1)
nn.fit(control[["pscore"]])
distances, indices = nn.kneighbors(treated[["pscore"]])
matched_control = control.iloc[indices.flatten()].reset_index(drop=True)
# Step 3. Balance check
covs = ["age", "educ", "black", "hispan", "married", "nodegree", "re74", "re75"]
before = data.groupby("treat")[covs].mean().T
before.columns = ["control_before", "treated_before"]
after = pd.DataFrame({
"control_after": matched_control[covs].mean(),
"treated_after": treated[covs].mean()
})
print(pd.concat([before, after], axis=1).round(2))
# Step 4. ATT estimate
att = treated["re78"].mean() - matched_control["re78"].mean()
print("Matching ATT:", att)
print("Lab 4 benchmark: record your value here")
# Getting It Right: caliper matching
caliper = 0.01
mask = distances.flatten() <= caliper
att_cal = treated.loc[mask, "re78"].mean() - matched_control.loc[mask, "re78"].mean()
print("Caliper ATT (0.01):", att_cal)
print("Treated units dropped:", (~mask).sum())
▶Production Possibilities & Comparative Advantage
Microeconomics · International Trade
Comparative advantage says: even if one country is better at producing everything, both benefit by specializing in the good with the lower opportunity cost. Click the button to see consumption expand beyond the PPF.
Key Takeaways
- OC, not absolute productivity, drives comparative advantage. A has CA in wheat (OC = 0.5 cloth < 2). B has CA in cloth (OC = 0.5 wheat < 2). Even if one country is better at both goods, trade still benefits both.
- Specialization + trade pushes consumption outside the PPF. After trading at terms between both OC ratios (here 1 : 1), both countries consume bundles that were individually unattainable.
- The PPF slope = −(OC of the good on the x-axis). A steeper PPF means a higher OC — comparative disadvantage in that good.
- Trade is not zero-sum. Specialization creates new value; gains accrue to both sides simultaneously.
▶Demand
Microeconomics · Consumer Behavior
But the whole curve can also shift. When something other than price changes — income, tastes, prices of related goods, or expectations — demand at every price changes simultaneously. Use the demand shift slider to see a rightward shift (increase) or leftward shift (decrease).
The Demand Curve
- Income. For normal goods, rising income increases demand (right shift). For inferior goods, rising income decreases demand as consumers trade up.
- Prices of substitutes. If a substitute becomes more expensive, demand for this good rises. If it becomes cheaper, demand falls.
- Prices of complements. If a good used alongside this one gets cheaper, demand for this good rises; more expensive, demand falls.
- Tastes and preferences. Health trends, advertising, or fashion can shift demand in either direction without any price change.
- Expectations. If consumers expect higher prices in the future, they buy more now — shifting current demand right.
- Number of buyers. More consumers in the market means higher demand at every price.
- Law of Demand. All else equal, a higher price means lower quantity demanded. This is why demand curves slope downward. Drag the price slider up — the point moves left (less is demanded).
- Movement along the curve vs. a shift. A price change moves you along the existing curve. A change in income, tastes, or related prices shifts the entire curve. These are fundamentally different events.
- Substitutes and complements. If a substitute gets more expensive, consumers switch here — demand rises. If a complement gets more expensive, people use less of both — demand falls.
- Normal vs. inferior goods. For most goods, higher income shifts demand right. For inferior goods (like instant noodles), higher income shifts demand left as people upgrade.
- Expectations are part of demand. Anticipated price increases shift today’s demand right. Demand is forward-looking, not just about current prices.
▶Supply
Microeconomics · Producer Behavior
The entire curve can also shift. When input costs change, technology improves, the number of sellers changes, or government policy shifts — supply at every price changes simultaneously. Use the supply shift slider to see a rightward shift (more supply) or leftward shift (less supply).
The Supply Curve
- Input (resource) prices. Cheaper labor, materials, or energy lowers production costs — supply increases (right shift). More expensive inputs shift supply left.
- Technology. Better production technology lowers costs and lets producers profitably offer more at every price — supply increases (right shift).
- Number of sellers. More firms entering the market increases market supply. Firms exiting decreases it.
- Government policies. Subsidies lower costs and increase supply. Taxes raise costs and decrease supply.
- Expectations. If producers expect higher future prices, they may withhold supply now, decreasing current supply.
- Prices of related goods in production. If a substitute in production becomes more profitable, producers switch, decreasing supply of this good.
- Law of Supply. All else equal, a higher price means higher quantity supplied. Supply curves slope upward because higher prices make production more profitable. Drag the price slider up — the point moves right (more is supplied).
- Movement along the curve vs. a shift. A change in the good’s own price moves you along the supply curve. A change in costs, technology, or the number of sellers shifts the entire curve.
- Lower costs shift supply right. When inputs become cheaper or technology improves, producers can profitably supply more at every price. This is what long-run technological progress looks like.
- The supply curve is a marginal cost curve. A producer only supplies a unit if the price covers the marginal cost of producing it. The supply curve tells you the minimum price needed to bring each additional unit to market.
- Market supply = sum of firm supply curves. The market supply curve is the horizontal sum of all individual firm supply curves. More sellers means a larger total supply at every price.
▶Market Equilibrium
Microeconomics · Market Equilibrium
A market equilibrium is the price and quantity where buyers and sellers agree — where the amount consumers want to buy exactly equals the amount producers want to sell. Use the buttons below to shift the curves and see what happens!
Supply & Demand
- Demand shift vs. movement along demand. A demand shift happens when something other than price changes — income, tastes, prices of related goods, expectations, or the number of buyers. Moving along the curve is just a response to a price change. Try “Demand ↑” to see the whole curve move!
- Supply shift vs. movement along supply. Same idea: a supply shift happens when costs change, technology improves, or the number of sellers changes. “Supply ↑” simulates producers becoming more efficient (lower costs = more offered at every price).
- The four core results. Demand ↑ → P↑ Q↑. Demand ↓ → P↓ Q↓. Supply ↑ → P↓ Q↑. Supply ↓ → P↑ Q↓. These four cases are the foundation of ECO 2020.
- Shortage vs. surplus. If price is below equilibrium, consumers want more than producers offer — a shortage. Sellers raise prices until the market clears. Above equilibrium: a surplus drives prices back down. Equilibrium is stable.
- Both curves shift. Real markets often have both curves shifting simultaneously. Try moving both sliders at once. When both increase, quantity rises for certain — but the price change is ambiguous and depends on which shift is larger.
▶Consumer & Producer Surplus
Microeconomics · Welfare
The competitive equilibrium maximizes total surplus. Any price above or below equilibrium destroys some of that value as deadweight loss.
Move the price slider above or below equilibrium to see what happens.
Key Takeaways
- CS = area above price, below demand curve. For a linear demand P = a − bQ, CS at price P* is ½(a − P*) × Q*.
- PS = area below price, above supply curve. For a linear supply P = c + dQ, PS at price P* is ½(P* − c) × Q*.
- Price controls create deadweight loss. A binding ceiling (below equilibrium) causes a shortage; a binding floor (above equilibrium) causes a surplus. Both prevent mutually beneficial trades.
- Total surplus is maximized at competitive equilibrium. This is the First Welfare Theorem: free markets allocate resources efficiently (under idealized conditions).
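A worked numerical sketch of the two triangle formulas above, with hypothetical demand and supply parameters:
# Linear demand P = a - bQ and supply P = c + dQ (hypothetical numbers)
a, b, c, d = 20, 1.0, 2, 0.5
q_star = (a - c) / (b + d)          # equilibrium quantity
p_star = a - b * q_star             # equilibrium price
cs = 0.5 * (a - p_star) * q_star    # triangle above price, below demand
ps = 0.5 * (p_star - c) * q_star    # triangle below price, above supply
print(f"Q* = {q_star:.1f}, P* = {p_star:.1f}, CS = {cs:.1f}, PS = {ps:.1f}")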
▶Elasticity
Microeconomics · Price Elasticity of Demand
Price Elasticity of Demand measures how sensitive consumers are to price: Price Elasticity of Demand = % change in Qd ÷ % change in P. |elasticity| > 1 = elastic (responsive); |elasticity| < 1 = inelastic (unresponsive).
Drag the price slider to see how elasticity changes along the curve. The shaded rectangle is total revenue (P × Q) — watch how it changes!
Demand Curve — shaded area = Total Revenue (P × Q)
- Price Elasticity of Demand = % ΔQd ÷ % ΔP. A value of −2 means a 1% price increase causes a 2% drop in quantity demanded. Price Elasticity of Demand is always negative (higher price → lower Qd). We compare the absolute value to 1 to classify elasticity.
- Elastic vs. inelastic. Absolute value > 1: elastic (price-sensitive). Absolute value < 1: inelastic. Absolute value = 1: unit elastic. Drag the price slider toward the top of the curve — elastic. Near the bottom — inelastic. Elasticity varies along a single linear curve!
- Elasticity and total revenue (TR = P × Q). Elastic demand: price ↑ → TR ↓ (Qd falls more than price rose). Inelastic demand: price ↑ → TR ↑ (Qd barely falls). Watch the shaded rectangle shrink or grow as you move the price slider.
- Determinants of elasticity. More elastic when: many substitutes, luxury good, large share of income, time to adjust. More inelastic when: few substitutes, necessity, small share of income, urgently needed.
- Slope ≠ elasticity. A steeper demand curve is not the same as a more inelastic one at every point. A linear curve changes from elastic to inelastic as you move down it — even though the slope is constant throughout.
Score: / 5
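A short sketch of how point elasticity and total revenue move along one linear demand curve. The curve Q = 20 − 2P is hypothetical; the pattern (elastic near the top, inelastic near the bottom, revenue peaking at unit elasticity) is general.
# Hypothetical linear demand: Q = 20 - 2P, so dQ/dP = -2 at every point
for p in [1, 3, 5, 7, 9]:
    q = 20 - 2 * p
    e = -2 * p / q                  # point elasticity: (dQ/dP) * (P/Q)
    tr = p * q                      # total revenue
    if abs(e) > 1:
        label = "elastic"
    elif abs(e) < 1:
        label = "inelastic"
    else:
        label = "unit elastic"
    print(f"P = {p}: Q = {q}, elasticity = {e:.2f} ({label}), TR = {tr}")
# The slope never changes, yet elasticity runs from -0.11 to -9.0 and TR peaks at P = 5.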
▶Price Controls
Microeconomics · Government Intervention
Drag below $7 for a ceiling — drag above $7 for a floor.
Key Takeaways
- Non-binding controls have no effect. A ceiling above P* or a floor below P* is ignored by the market.
- Price ceilings create shortages. Qd > Qs. The quantity traded is limited by supply. Examples: rent control, wartime rationing.
- Price floors create surpluses. Qs > Qd. The quantity traded is limited by demand. Examples: minimum wage, agricultural supports.
- Both create DWL. The prevented trades — the DWL triangle — represent mutually beneficial exchanges that can’t happen because the price is stuck at the wrong level.
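The shortage and deadweight-loss arithmetic can be sketched with hypothetical curves (demand P = 10 − Q, supply P = 2 + Q, so P* = 6 and Q* = 4) and a binding $4 ceiling.
ceiling = 4                          # binding: below the $6 equilibrium
qd = 10 - ceiling                    # quantity demanded at the ceiling = 6
qs = ceiling - 2                     # quantity supplied at the ceiling = 2
q_traded = min(qd, qs)               # the short side of the market trades
shortage = qd - qs
# DWL triangle between demand and supply over the units no longer traded
p_demand = 10 - q_traded             # buyers' willingness to pay at q_traded
p_supply = 2 + q_traded              # sellers' marginal cost at q_traded
dwl = 0.5 * (p_demand - p_supply) * (4 - q_traded)
print(f"Qd = {qd}, Qs = {qs}, shortage = {shortage}, DWL = {dwl}")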
▶Monopoly vs. Perfect Competition
Microeconomics · Market Structure
Move the MC slider to see how cost conditions change the efficiency gap.
Key Takeaways
- MR < P for a monopolist. To sell one more unit the monopolist must lower price on all units. For P = a − bQ, MR = a − 2bQ — same intercept, double the slope.
- Qm < Qc, Pm > MC. The monopolist produces less and charges more than a competitive market would.
- DWL = ½(Pm − MC)(Qc − Qm). This triangle is the value of unrealized mutually beneficial trades.
- Market power is about price-setting, not just profit. Antitrust policy targets the DWL caused by output restriction, not profit itself.
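The MR and DWL formulas above can be run end-to-end. The demand curve P = 10 − 0.1Q and constant MC = $2 are hypothetical choices for illustration.
a, b, mc = 10, 0.1, 2                # demand P = a - bQ, constant marginal cost
q_m = (a - mc) / (2 * b)             # monopoly output: set MR = a - 2bQ equal to MC
p_m = a - b * q_m                    # monopoly price read off the demand curve
q_c = (a - mc) / b                   # competitive output: P = MC
dwl = 0.5 * (p_m - mc) * (q_c - q_m)
print(f"Monopoly:    Q = {q_m}, P = {p_m}")      # Q = 40, P = 6
print(f"Competition: Q = {q_c}, P = {mc}")       # Q = 80, P = 2
print(f"DWL = {dwl}")                            # 80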
▶Externalities & Pigouvian Policy
Microeconomics · Market Failures
A Pigouvian tax (for a negative externality) or subsidy (for a positive externality) equal to the size of the externality internalizes it and restores efficiency. Toggle type and drag the slider.
Key Takeaways
- Negative externalities cause over-production. MSC > MPC ⇒ the supply curve understates the true social cost. Market Q exceeds efficient Q.
- Positive externalities cause under-provision. MSB > MPB ⇒ the demand curve understates the true social benefit. Market Q falls short of efficient Q.
- Pigouvian tax/subsidy internalizes the externality. A tax = marginal external cost shifts supply up to MSC; a subsidy = marginal external benefit shifts demand up to MSB. In both cases the corrected equilibrium equals the social optimum.
- The Coase theorem is an alternative. If property rights are well-defined and transaction costs are low, private bargaining can achieve the efficient outcome without government intervention.
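A minimal sketch of how a Pigouvian tax restores the efficient quantity, assuming hypothetical linear curves (MPB: P = 20 − Q, MPC: P = 4 + Q) and a constant marginal external cost of 4.
mec = 4                                # marginal external cost per unit
q_market = (20 - 4) / 2                # MPB = MPC        -> Q = 8 (over-production)
q_social = (20 - (4 + mec)) / 2        # MPB = MPC + MEC  -> Q = 6 (efficient)
# A tax equal to MEC shifts private supply up to MSC, so the taxed market
# clears at the same quantity as the social optimum.
q_with_tax = (20 - (4 + mec)) / 2
print(f"Market Q = {q_market}, efficient Q = {q_social}, Q with Pigouvian tax = {q_with_tax}")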
▶Tax Incidence
Microeconomics · Policy
The burden is shared according to relative elasticities: the more inelastic side bears more of the tax. If demand is perfectly inelastic, buyers pay 100%. If supply is perfectly inelastic, sellers bear 100%.
Drag the tax slider and watch the burden split and deadweight loss grow.
Key Takeaways
- Statutory incidence ≠ economic incidence. It doesn’t matter whether the tax is levied on buyers or sellers — the economic burden is determined solely by relative elasticities.
- The more inelastic side bears more of the tax. If |εD| < |εS|, buyers pay more. Gasoline taxes fall mostly on consumers because fuel demand is inelastic.
- Taxes create deadweight loss. By reducing quantity below the efficient level, some mutually beneficial trades are prevented. DWL grows with the square of the tax rate.
- Tax revenue peaks before DWL explodes. There is a Laffer-curve logic at the micro level: very high tax rates shrink the tax base so much that revenue may fall even as DWL continues to grow.
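The burden split can be checked numerically. The curves below (demand P = 10 − Q, supply P = 2 + Q) and the $2 per-unit tax are hypothetical; with equal slopes the burden splits evenly.
a, b, c, d, t = 10, 1, 2, 1, 2          # demand P = a - bQ, supply P = c + dQ, tax t
p_star = (a * d + b * c) / (b + d)      # pre-tax equilibrium price = 6
q_star = (a - c) / (b + d)              # pre-tax quantity = 4
q_tax = (a - c - t) / (b + d)           # quantity with the tax wedge = 3
p_buyers = a - b * q_tax                # price buyers pay = 7
p_sellers = p_buyers - t                # price sellers keep = 5
print(f"Buyer burden = {p_buyers - p_star}, seller burden = {p_star - p_sellers}")
print(f"DWL = {0.5 * t * (q_star - q_tax)}")   # 1.0, and it grows with the square of t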
▶Public Goods & the Free-Rider Problem
Microeconomics · Market Failures
For public goods, efficiency requires vertical summation of demand: at any quantity, the social marginal benefit equals the sum of each individual’s marginal benefit. The Nash equilibrium falls short of this optimum.
Key Takeaways
- Private goods: horizontal summation. At each price, sum quantities demanded across consumers. This gives market demand.
- Public goods: vertical summation. At each quantity, sum marginal benefits across consumers (all enjoy the same unit). This gives social MB.
- The free-rider problem causes under-provision. In Nash equilibrium, the lower-valuation consumer free-rides entirely; the dominant consumer stops at their own MB = MC, well below the social optimum.
- Remedies: government provision, corrective subsidies, or Coasian bargaining. If all three conditions are met (non-rivalry, non-excludability, low transactions costs), negotiated cost-sharing can achieve Q*.
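A tiny numeric example of vertical summation and the free-rider shortfall. The two marginal-benefit schedules and the marginal cost below are hypothetical.
mc = 10                                 # marginal cost of the public good
# Efficient quantity: vertical sum MB1 + MB2 = MC, with MB1 = 12 - Q and MB2 = 8 - Q
q_efficient = (12 + 8 - mc) / 2         # = 5
# Free-rider outcome: only the high-valuation consumer contributes,
# stopping where her own MB equals MC (12 - Q = 10)
q_nash = 12 - mc                        # = 2
print(f"Efficient Q = {q_efficient}, Nash (free-rider) Q = {q_nash}")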
▶Price Discrimination
Microeconomics · Monopoly Pricing
Toggle between regimes to see how the efficiency–equity trade-off plays out.
Key Takeaways
- Single-price: DWL but some CS. The monopolist charges everyone P = $6. Buyers with WTP > $6 keep consumer surplus; units 40–80 go unsold (DWL = $80).
- Perfect discrimination: efficient but no CS. Every unit is sold at the buyer’s WTP. Output reaches the competitive level (Q = 80, P = MC = $2). DWL = $0, but CS = $0 too.
- Real-world discrimination is imperfect (2nd & 3rd degree). Airlines, coupons, student/senior discounts, and bundling are all partial forms that partially close the DWL gap without extracting all CS.
- Price discrimination requires market power. Competitive firms cannot price-discriminate: buyers would simply switch to a cheaper rival.
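The single-price and perfect-discrimination figures quoted above are consistent with a demand curve P = 10 − 0.1Q and MC = $2; taking those as the assumed primitives, the surplus accounting works out as follows.
a, b, mc = 10, 0.1, 2                       # assumed demand P = a - bQ and constant MC
# Single price: MR = MC gives Q = 40, P = $6
q_sp, p_sp = 40, 6
cs_sp = 0.5 * (a - p_sp) * q_sp             # consumer surplus = $80
profit_sp = (p_sp - mc) * q_sp              # producer surplus = $160
dwl_sp = 0.5 * (p_sp - mc) * (80 - q_sp)    # unsold units 40-80 = $80
# Perfect discrimination: every unit priced at WTP, output reaches Q = 80
cs_pd, dwl_pd = 0.0, 0.0
profit_pd = 0.5 * (a - mc) * 80             # the entire $320 of surplus goes to the seller
print(f"Single price:     CS = {cs_sp}, profit = {profit_sp}, DWL = {dwl_sp}")
print(f"Perfect discrim.: CS = {cs_pd}, profit = {profit_pd}, DWL = {dwl_pd}")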
▶Labor Market: Supply & Demand
Labor Economics · Wages & Employment
Use the sliders to shift labor demand (e.g., a technology boom raises MRP) or labor supply (e.g., immigration expands the workforce). Watch how the equilibrium wage W* and employment L* respond.
Key Takeaways
- Labor demand = firms’ MRP. DL shifts right when product prices rise, technology improves, or capital complements labor. Both wages and employment increase.
- Labor supply reflects workers’ opportunity cost. SL shifts right when immigration expands the workforce or participation rises — wages fall but employment rises.
- Demand and supply shocks have opposite wage effects. A demand increase → W↑ and L↑. A supply increase → W↓ and L↑. Watch employer and worker surplus redistribute between the shaded areas.
- Competitive equilibrium maximizes total surplus. W* = MRP at L*: no DWL. Policy interventions (minimum wages, payroll taxes) drive wedges between supply and demand, creating welfare losses analogous to price controls in goods markets.
Class slides for Labor Economics coming soon.
Interactive tools for Intermediate Micro coming soon.
Class slides for Intermediate Micro coming soon.
▶Simple Exponential Smoothing
Business Forecasting · Exponential Smoothing
The chart shows 36 months of retail sales data (blue) alongside the SES one-step-ahead forecast (red). Drag the α slider and watch how the forecast’s responsiveness changes.
Retail Sales vs. SES One-Step-Ahead Forecast
- Initialize: Set the level equal to the first observation: ℓ₁ = y₁.
- Update level: ℓₜ = α yₜ + (1 − α) ℓₜ₋₁ — a weighted average of the new observation and the prior level.
- Forecast: ŷₜ₊₁|ₜ = ℓₜ — all future forecasts equal the current level (flat forecast line).
- Why “exponential”? Substituting the level equation repeatedly shows that weights decay geometrically: α, α(1−α), α(1−α)², …
- Optimal α is estimated by minimizing the sum of squared one-step forecast errors (SSE).
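A minimal Python sketch of the recursion above, run on a synthetic series; in practice α would be chosen by minimizing the SSE computed at the end.
import numpy as np
def ses_forecast(y, alpha):
    # One-step-ahead SES: level_t = alpha*y_t + (1 - alpha)*level_{t-1}
    level = y[0]                       # initialize with the first observation
    forecasts = [np.nan]               # no forecast exists for the first period
    for t in range(1, len(y)):
        forecasts.append(level)        # forecast of y_t made at time t-1
        level = alpha * y[t] + (1 - alpha) * level
    return np.array(forecasts), level  # final level = flat forecast for all future periods
rng = np.random.default_rng(1)
y = 100 + rng.normal(0, 5, size=36)    # synthetic stand-in for the retail sales series
fc, final_level = ses_forecast(y, alpha=0.3)
sse = np.nansum((y - fc) ** 2)         # the criterion minimized when estimating alpha
print(round(final_level, 2), round(sse, 1))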
▶Moving Average Smoother
Business Forecasting · Decomposition
This series has 48 quarters of data with a clear seasonal swing and upward trend. Try k = 4 to watch the seasonal pattern disappear. At larger k, the trend itself is smoothed — but the smoother lags further behind the data.
Quarterly Sales: Original Series vs. MA(k) Smoother
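A sketch of the smoother on synthetic quarterly data. A plain centered rolling mean of window k is used; for even k the textbook version takes one further 2-term average (a "2×k" MA) to re-center it.
import numpy as np
import pandas as pd
quarters = pd.period_range("2013Q1", periods=48, freq="Q")
trend = np.linspace(100, 160, 48)                  # upward trend
seasonal = np.tile([10, -5, -12, 7], 12)           # quarterly swing that sums to zero
y = pd.Series(trend + seasonal, index=quarters)
ma4 = y.rolling(window=4, center=True).mean()      # k = 4 spans one full cycle, so the swing cancels
ma2x4 = ma4.rolling(window=2).mean()               # "2x4" MA re-centers the even-order average
print(pd.DataFrame({"y": y, "MA(4)": ma4, "2x4 MA": ma2x4}).head(8).round(1))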
▶ACF / Correlogram Explorer
Business Forecasting · Time Series Graphics & ARIMA
Select a series type below. Red dashed lines are 95% significance bounds (±1.96 / √T). Colored bars exceed the bounds and indicate significant autocorrelation at that lag.
Sample ACF — Lags 0–20
- AR(p): ACF tails off with exponential decay; PACF cuts off sharply after lag p.
- MA(q): ACF cuts off sharply after lag q; PACF tails off.
- Non-stationary (unit root): ACF decays very slowly toward zero — first-difference the series before modeling.
- Seasonal pattern: Significant spikes at lags m, 2m, 3m, … (where m is the seasonal period).
- White noise residuals: All ACF bars inside the 95% bounds — the model has captured all structure.
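A quick way to reproduce the ±1.96/√T bounds and contrast white noise with an autocorrelated series; both series below are simulated.
import numpy as np
from statsmodels.tsa.stattools import acf
rng = np.random.default_rng(7)
T = 200
white_noise = rng.normal(size=T)
ar1 = np.zeros(T)                       # AR(1) with phi = 0.8: its ACF should tail off
for t in range(1, T):
    ar1[t] = 0.8 * ar1[t - 1] + rng.normal()
bound = 1.96 / np.sqrt(T)               # the 95% significance bounds from the text
for name, series in [("white noise", white_noise), ("AR(1), phi = 0.8", ar1)]:
    r = acf(series, nlags=10)
    n_sig = int(np.sum(np.abs(r[1:]) > bound))
    print(f"{name}: {n_sig} of 10 lags outside ±{bound:.3f}")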
This lab introduces the fpp3 ecosystem and the tsibble data structure used throughout the course. You will load a built-in dataset, filter to a single time series, produce a time plot, and compute basic summary statistics. All fpp3 datasets are included in the package — no download is required.
| Variable | Description |
|---|---|
| Month | Monthly time index (tsibble index) |
| State | Australian state (key variable) |
| Industry | Retail industry category (key variable) |
| Turnover | Monthly retail turnover ($AUD millions) |
- Install and load fpp3. Print aus_retail — how many key combinations are there? What is the index?
- Filter to New South Wales, Supermarket and grocery stores. How many rows does this leave?
- Make a time plot with autoplot(). Does the series show a trend? Seasonality?
- Compute the mean, standard deviation, minimum, and maximum of Turnover for this series.
- Repeat for a different state or industry. How do the patterns compare?
# Lab 1: Getting Started with fpp3
# ─────────────────────────────────────────────────
# install.packages("fpp3") # run once
library(fpp3)
# Step 1 — Inspect the tsibble
aus_retail
# Note: key = State + Industry; index = Month
# Step 2 — Filter to one series
nsw_grocery <- aus_retail |>
filter(State == "New South Wales",
Industry == "Supermarket and grocery stores")
nsw_grocery
# Step 3 — Time plot
nsw_grocery |>
autoplot(Turnover) +
labs(title = "NSW Supermarket & Grocery Turnover",
y = "Turnover ($AUD millions)", x = NULL)
# Step 4 — Summary statistics
nsw_grocery |>
as_tibble() |>
summarise(
n = n(),
mean_turnover = mean(Turnover, na.rm = TRUE),
sd_turnover = sd(Turnover, na.rm = TRUE),
min_turnover = min(Turnover, na.rm = TRUE),
max_turnover = max(Turnover, na.rm = TRUE)
)
# Step 5 — Try another series for comparison
aus_retail |>
filter(State == "Victoria",
Industry == "Clothing, footwear and personal accessory retailing") |>
autoplot(Turnover) +
labs(title = "Victoria: Clothing & Footwear Turnover",
y = "Turnover ($AUD millions)")
# Lab 1: Getting Started with Time Series Data
# ─────────────────────────────────────────────────
# Export from R first:
# library(fpp3)
# aus_retail |>
# filter(State == "New South Wales",
# Industry == "Supermarket and grocery stores") |>
# mutate(Month = as.Date(Month)) |>
# as_tibble() |>
# write.csv("nsw_grocery.csv", row.names = FALSE)
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("nsw_grocery.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
# Step 1 — Inspect the data
print(df.head())
print(f"N = {len(df)}, {df.index.min().date()} to {df.index.max().date()}")
# Step 2 — Time plot
df["Turnover"].plot(
figsize=(10, 4),
title="NSW Supermarket & Grocery Turnover",
ylabel="$AUD millions")
plt.tight_layout(); plt.show()
# Step 3 — Summary statistics
print(df["Turnover"].describe())
# Step 4 — Seasonal pattern: average turnover by calendar month
df["month"] = df.index.month
monthly_avg = df.groupby("month")["Turnover"].mean()
monthly_avg.plot(kind="bar", figsize=(8, 4),
title="Average Turnover by Calendar Month")
plt.xlabel("Month"); plt.tight_layout(); plt.show()
# Step 5 — Compare another series (re-export from R with different filter)
Plotting comes before modeling. This lab walks through fpp3’s suite of visualization tools — time plot, seasonal plot, subseries plot, ACF correlogram, and lag plot — using a series with clear trend and strong monthly seasonality.
| Variable | Description |
|---|---|
| Month | Monthly time index |
| Cost | Monthly government drug subsidy cost ($AUD millions) |
- Make a time plot with autoplot(). Describe the trend and any seasonal pattern you see.
- Use gg_season() to create a seasonal plot. Which months have consistently high sales? Low sales? Why might this be?
- Use gg_subseries(). What do the horizontal blue lines represent? Which months are trending up the most?
- Compute and plot the ACF with ACF() |> autoplot(). Are there seasonal spikes at lag 12? What does slow decay in the ACF indicate?
- Create lag plots with gg_lag(). At which lags is the correlation strongest? Why?
# Lab 2: Time Series Graphics
# ─────────────────────────────────────────────────
library(fpp3)
# Build a_10: total antidiabetic drug cost from PBS data
a_10 <- PBS |>
filter(ATC2 == "A10") |>
summarise(Cost = sum(Cost) / 1e6)
# Step 1 — Time plot
a_10 |>
autoplot(Cost) +
labs(title = "Australian antidiabetic drug sales",
y = "$ million", x = NULL)
# Step 2 — Seasonal plot
a_10 |>
gg_season(Cost, labels = "both") +
labs(title = "Seasonal plot: antidiabetic drug sales",
y = "$ million")
# Step 3 — Subseries plot
a_10 |>
gg_subseries(Cost) +
labs(title = "Subseries plot: antidiabetic drug sales",
y = "$ million")
# Blue horizontal lines = seasonal mean for each month
# Step 4 — ACF correlogram (lag_max = 3 years)
a_10 |>
ACF(Cost, lag_max = 36) |>
autoplot() +
labs(title = "ACF: antidiabetic drug sales")
# Slow decay = trend; spikes at multiples of 12 = annual seasonality
# Step 5 — Lag plots
a_10 |>
gg_lag(Cost, geom = "point", lags = 1:12) +
labs(title = "Lag plots: antidiabetic drug sales")
# Lab 2: Time Series Graphics
# ─────────────────────────────────────────────────
# Export from R first:
# library(fpp3)
# a_10 <- PBS |> filter(ATC2 == "A10") |> summarise(Cost = sum(Cost) / 1e6)
# a_10 |> mutate(Month = as.Date(Month)) |>
# as_tibble() |> write.csv("a10.csv", row.names = FALSE)
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from pandas.plotting import lag_plot
df = pd.read_csv("a10.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
# Step 1 — Time plot
df["Cost"].plot(figsize=(10, 4),
title="Antidiabetic Drug Sales", ylabel="$AUD millions")
plt.tight_layout(); plt.show()
# Step 2 — Seasonal plot: one line per year
df["year"] = df.index.year
df["month"] = df.index.month
pivot = df.pivot(index="month", columns="year", values="Cost")  # month-of-year on the x-axis, one column per year
pivot.plot(figsize=(10, 4), alpha=0.7,
title="Seasonal Plot: Drug Sales (one line per year)")
plt.xlabel("Month (1 = Jan)"); plt.tight_layout(); plt.show()
# Step 3 — Subseries means by calendar month
monthly = df.groupby("month")["Cost"].mean()
monthly.plot(kind="bar", figsize=(8, 4),
title="Mean Drug Sales by Calendar Month")
plt.tight_layout(); plt.show()
# Step 4 — ACF correlogram (36 lags)
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df["Cost"], lags=36, ax=ax)
ax.set_title("ACF: antidiabetic drug sales")
plt.tight_layout(); plt.show()
# Step 5 — Lag scatter plots (lags 1-12)
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for i, ax in enumerate(axes.flat, 1):
lag_plot(df["Cost"], lag=i, ax=ax)
ax.set_title(f"Lag {i}")
plt.tight_layout(); plt.show()
STL (Seasonal and Trend decomposition using Loess) separates a time series into trend, seasonal, and remainder components. In this lab you will decompose Australian beer production, extract the seasonally adjusted series, and quantify the strength of trend and seasonality.
| Variable | Description |
|---|---|
| Quarter | Quarterly time index |
| Beer | Quarterly beer production (megalitres) |
- Plot beer production over time. Describe the overall shape: does the trend increase then decrease? Is there strong seasonality?
- Fit STL decomposition with model(STL(...)) and plot all four components. Which component has the largest scale?
- Overlay the seasonally adjusted series on the original. What remains once seasonality is removed?
- Use features(Beer, feat_stl) to compute trend strength (FT) and seasonal strength (FS). Are both above the 0.64 threshold for “strong”?
- Re-fit STL with season(window = 7) instead of "periodic". How does this change the seasonal component?
# Lab 3: STL Decomposition
# ─────────────────────────────────────────────────
library(fpp3)
beer <- aus_production |>
select(Quarter, Beer) |>
filter(!is.na(Beer))
# Step 1 — Time plot
beer |>
autoplot(Beer) +
labs(title = "Australian quarterly beer production",
y = "Megalitres", x = NULL)
# Step 2 — STL decomposition
dcmp <- beer |>
model(STL(Beer ~ trend(window = 9) +
season(window = "periodic"),
robust = TRUE))
components(dcmp) |> autoplot()
# Step 3 — Seasonally adjusted overlay
components(dcmp) |>
as_tsibble() |>
autoplot(Beer, colour = "gray70") +
geom_line(aes(y = season_adjust), colour = "#0072B2", linewidth = 0.9) +
labs(title = "Beer: original (grey) vs. seasonally adjusted (blue)",
y = "Megalitres")
# Step 4 — Trend and seasonal strength
beer |>
features(Beer, feat_stl)
# F_T > 0.64 → strong trend; F_S > 0.64 → strong seasonality
# Step 5 — Variable seasonal window (allows seasonal shape to change)
dcmp2 <- beer |>
model(STL(Beer ~ trend(window = 9) +
season(window = 7)))
components(dcmp2) |> autoplot()
# Lab 3: STL Decomposition
# ─────────────────────────────────────────────────
# Export from R:
# aus_production |> select(Quarter, Beer) |> filter(!is.na(Beer)) |>
# mutate(Quarter = as.Date(Quarter)) |>
# as_tibble() |> write.csv("beer.csv", row.names = FALSE)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import STL
df = pd.read_csv("beer.csv", parse_dates=["Quarter"])
df = df.sort_values("Quarter").dropna(subset=["Beer"]).set_index("Quarter")
# Step 1 — Time plot
df["Beer"].plot(figsize=(10, 4),
title="Australian Quarterly Beer Production", ylabel="Megalitres")
plt.tight_layout(); plt.show()
# Step 2 — STL decomposition (period=4 for quarterly data)
stl = STL(df["Beer"], period=4, robust=True)
res = stl.fit()
res.plot()
plt.suptitle("STL Decomposition: Beer Production")
plt.tight_layout(); plt.show()
# Step 3 — Seasonally adjusted overlay
df["sa"] = df["Beer"] - res.seasonal
fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["Beer"], color="gray", alpha=0.7, label="Original")
ax.plot(df.index, df["sa"], color="steelblue", lw=1.5, label="Seasonally adjusted")
ax.legend(); ax.set_title("Beer: Original vs. Seasonally Adjusted")
plt.tight_layout(); plt.show()
# Step 4 — Trend and seasonal strength (Hyndman-Athanasopoulos formulas)
var_r = np.var(res.resid)
F_T = max(0, 1 - var_r / np.var(res.trend + res.resid))
F_S = max(0, 1 - var_r / np.var(res.seasonal + res.resid))
print(f"Trend strength (F_T): {F_T:.3f} (>0.64 = strong)")
print(f"Seasonal strength (F_S): {F_S:.3f} (>0.64 = strong)")
# Step 5 — Variable seasonal window
stl2 = STL(df["Beer"], period=4, seasonal=7, robust=True)
res2 = stl2.fit()
res2.plot()
plt.suptitle("STL: seasonal window = 7 (more flexible)")
plt.tight_layout(); plt.show()
Before fitting complex models, always benchmark against simple ones. This lab fits the four benchmark forecasting methods — mean, naïve, seasonal naïve, and drift — splits the data into training and test sets, and evaluates forecast accuracy using MAE, RMSE, and MAPE.
| Variable | Description |
|---|---|
| Quarter | Quarterly time index |
| Beer | Quarterly beer production (megalitres) |
- Split the series: train on data before 2005, test on 2005 onward. How many test periods are there?
- Fit all four benchmark models on the training set using a single model() call.
- Generate forecasts for the test period and plot them alongside the test data. Which method looks most reasonable?
- Compute accuracy metrics with accuracy(fc, beer_test). Which model has the lowest RMSE? Lowest MAPE?
- Run residual diagnostics on the seasonal naïve model. Is the Ljung-Box test significant? Are residuals white noise?
# Lab 4: Benchmark Methods & Forecast Accuracy
# ─────────────────────────────────────────────────
library(fpp3)
beer <- aus_production |>
select(Quarter, Beer) |>
filter(!is.na(Beer))
# Step 1 — Train / test split
beer_train <- beer |> filter(year(Quarter) < 2005)
beer_test <- beer |> filter(year(Quarter) >= 2005)
cat("Training periods:", nrow(beer_train), "\n")
cat("Test periods: ", nrow(beer_test), "\n")
# Step 2 — Fit four benchmark models
fit <- beer_train |>
model(
Mean = MEAN(Beer),
Naive = NAIVE(Beer),
SNaive = SNAIVE(Beer),
Drift = RW(Beer ~ drift())
)
# Step 3 — Forecast and plot
fc <- fit |> forecast(h = nrow(beer_test))
fc |>
autoplot(beer_train, level = NULL) +
autolayer(beer_test, Beer, colour = "black") +
labs(title = "Beer production: benchmark forecasts vs. actual",
y = "Megalitres") +
guides(colour = guide_legend(title = "Method"))
# Step 4 — Accuracy on test set
accuracy(fc, beer_test) |>
arrange(RMSE)
# Step 5 — Residual diagnostics for seasonal naive
fit |> select(SNaive) |> gg_tsresiduals()
augment(fit) |>
filter(.model == "SNaive") |>
features(.innov, ljung_box, lag = 8)
# p > 0.05 → residuals are consistent with white noise
# Lab 4: Benchmark Methods & Forecast Accuracy
# ─────────────────────────────────────────────────
# Uses same beer.csv as Lab 3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import acorr_ljungbox
df = pd.read_csv("beer.csv", parse_dates=["Quarter"])
df = df.sort_values("Quarter").dropna(subset=["Beer"]).set_index("Quarter")
# Step 1 — Train/test split (mirror R: train < 2005, test >= 2005)
train = df[df.index.year < 2005].copy()
test = df[df.index.year >= 2005].copy()
print(f"Training: {len(train)}, Test: {len(test)}")
# Step 2 — Benchmark forecasts
h = len(test)
mean_fc = pd.Series(train["Beer"].mean(), index=test.index)
naive_fc = pd.Series(train["Beer"].iloc[-1], index=test.index)
snaive_fc = pd.Series(
[train["Beer"].iloc[-(4 - i % 4)] for i in range(h)], index=test.index)
slope = (train["Beer"].iloc[-1] - train["Beer"].iloc[0]) / (len(train) - 1)
drift_fc = pd.Series(
[train["Beer"].iloc[-1] + slope * (i + 1) for i in range(h)], index=test.index)
# Step 3 — Plot
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(train.index[-20:], train["Beer"].iloc[-20:], "k-", label="Training")
ax.plot(test.index, test["Beer"], "k--", label="Actual")
for fc, label in [(mean_fc, "Mean"), (naive_fc, "Naive"),
(snaive_fc, "Seasonal Naive"), (drift_fc, "Drift")]:
ax.plot(test.index, fc, label=label)
ax.legend(); ax.set_title("Beer Production: Benchmark Forecasts")
plt.tight_layout(); plt.show()
# Step 4 — Accuracy
def metrics(actual, fc):
e = actual.values - fc.values
return {"RMSE": np.sqrt(np.mean(e**2)),
"MAE": np.mean(np.abs(e)),
"MAPE": np.mean(np.abs(e / actual.values)) * 100}
results = {m: metrics(test["Beer"], fc)
for m, fc in [("Mean", mean_fc), ("Naive", naive_fc),
("Seasonal Naive", snaive_fc), ("Drift", drift_fc)]}
print(pd.DataFrame(results).T.sort_values("RMSE").round(2))
# Step 5 — Residual diagnostics for Seasonal Naive (training residuals)
# SNaive innovation at time t: Beer[t] - Beer[t-4]
resid_train = train["Beer"].iloc[4:].values - train["Beer"].iloc[:-4].values
lb = acorr_ljungbox(resid_train, lags=[8], return_df=True)
print(lb)
# p << 0.05 → residuals are NOT white noise (autocorrelation remains)
ETS models generalize SES, Holt’s, and Holt-Winters methods into a unified state-space framework. fpp3’s ETS() function selects the best error/trend/seasonal specification by minimizing AICc. This lab fits an ETS model to a drug sales series with strong trend and seasonality, diagnoses residuals, and produces a 2-year forecast.
| Variable | Description |
|---|---|
| Month | Monthly time index |
| Cost | Monthly government drug cost ($AUD millions) |
- Plot the h02 series. Describe the trend and seasonal pattern. Does variance grow with the level?
- Fit ETS(Cost) with automatic selection. Use report() to identify the chosen model. What do the E, T, and S letters mean?
- Run residual diagnostics with gg_tsresiduals() and the Ljung-Box test. Are residuals white noise?
- Produce a 2-year forecast. Do the prediction intervals look reasonable given the historical variation?
- Manually fit ETS(A,A,A) and ETS(M,Ad,M) and compare AICc values. Which is preferred?
# Lab 5: Exponential Smoothing (ETS)
# ─────────────────────────────────────────────────
library(fpp3)
# Build h02: total corticosteroid drug cost from PBS
h02 <- PBS |>
filter(ATC2 == "H02") |>
summarise(Cost = sum(Cost) / 1e6)
# Step 1 — Time plot
h02 |>
autoplot(Cost) +
labs(title = "Australian corticosteroid drug sales",
y = "$ million", x = NULL)
# Step 2 — Automatic ETS selection
fit_ets <- h02 |> model(ETS(Cost))
report(fit_ets)
# Read: ETS(error, trend, season) — e.g. ETS(M,Ad,M)
# smoothing parameters: alpha, beta*, gamma, phi
# Step 3 — Residual diagnostics
fit_ets |> gg_tsresiduals()
augment(fit_ets) |>
features(.innov, ljung_box, lag = 24)
# p << 0.05 → residuals are NOT white noise; autocorrelation remains
# Step 4 — 2-year forecast with prediction intervals
fit_ets |>
forecast(h = "2 years") |>
autoplot(h02) +
labs(title = "ETS forecast: corticosteroid drug sales",
y = "$ million")
# Step 5 — Compare AICc across ETS specifications
h02 |>
model(
Auto = ETS(Cost),
AAA = ETS(Cost ~ error("A") + trend("A") + season("A")),
MAM = ETS(Cost ~ error("M") + trend("A") + season("M")),
MAdM = ETS(Cost ~ error("M") + trend("Ad") + season("M"))
) |>
glance() |>
select(.model, AICc) |>
arrange(AICc)
# Lab 5: Exponential Smoothing (ETS)
# ─────────────────────────────────────────────────
# Note: uses statsmodels ETSModel — forecast API differs from R.
# Use get_prediction(start, end) and .summary_frame(alpha) to get intervals.
# Column names are 'mean', 'pi_lower', 'pi_upper' (not conf_int).
# Export from R:
# library(fpp3)
# h02 <- PBS |> filter(ATC2 == "H02") |> summarise(Cost = sum(Cost) / 1e6)
# h02 |> mutate(Month = as.Date(Month)) |>
# as_tibble() |> write.csv("h02.csv", row.names = FALSE)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.stats.diagnostic import acorr_ljungbox
df = pd.read_csv("h02.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
df.index.freq = "MS"
# Step 1 — Time plot
df["Cost"].plot(figsize=(10, 4),
title="H02 Antidiabetic Drug Costs", ylabel="$AUD millions")
plt.tight_layout(); plt.show()
# Step 2 — Fit ETS models and compare AIC
configs = {
"ETS(A,A,A)": dict(error="add", trend="add", seasonal="add", seasonal_periods=12),
"ETS(M,A,M)": dict(error="mul", trend="add", seasonal="mul", seasonal_periods=12),
"ETS(M,Ad,M)":dict(error="mul", trend="add", seasonal="mul", seasonal_periods=12,
damped_trend=True),
}
fits = {}
for name, kw in configs.items():
try:
fits[name] = ETSModel(df["Cost"], **kw).fit(disp=False)
except Exception:
pass
aic_tbl = {n: {"AIC": f.aic, "BIC": f.bic} for n, f in fits.items()}
print(pd.DataFrame(aic_tbl).T.sort_values("AIC").round(1))
# Step 3 — Report best model (assume ETS(M,A,M))
best = fits.get("ETS(M,A,M)", list(fits.values())[0])
print(best.summary())
# Step 4 — Residual diagnostics
resid = best.resid
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(resid); axes[0].axhline(0, color="red", lw=0.8)
axes[0].set_title("Residuals")
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(resid, lags=24, ax=axes[1])
axes[2].hist(resid, bins=20); axes[2].set_title("Distribution")
plt.tight_layout(); plt.show()
lb = acorr_ljungbox(resid, lags=[24], return_df=True)
print(lb)
# Step 5 — 2-year forecast with 80% and 95% intervals
fc = best.get_prediction(start=len(df), end=len(df) + 23)
sf95 = fc.summary_frame(alpha=0.05) # 95% PI: columns pi_lower / pi_upper
sf80 = fc.summary_frame(alpha=0.20) # 80% PI
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index[-60:], df["Cost"].iloc[-60:], color="black")
ax.plot(sf95.index, sf95["mean"], color="steelblue")
ax.fill_between(sf80.index, sf80["pi_lower"], sf80["pi_upper"],
alpha=0.35, color="steelblue")
ax.fill_between(sf95.index, sf95["pi_lower"], sf95["pi_upper"],
alpha=0.15, color="steelblue")
ax.set_title("H02: 2-Year ETS Forecast")
plt.tight_layout(); plt.show()
ARIMA models capture autocorrelation structure by regressing a series on its own past values (AR) and past forecast errors (MA), after differencing to achieve stationarity (I). This lab uses monthly US retail employment to practice the full ARIMA workflow: stationarity check, automatic model selection, residual diagnostics, and a head-to-head comparison with ETS.
| Variable | Description |
|---|---|
| Month | Monthly time index |
| Series_ID | BLS series identifier (key) |
| Title | Employment category label |
| Employed | Employed persons (thousands) |
- Filter to “Retail Trade” employment. Plot the series and its ACF. Is the series stationary? What does slow ACF decay signal?
- Use features(Employed, list(unitroot_ndiffs, unitroot_nsdiffs)) to determine how many differences are needed.
- Fit ARIMA(Employed) with automatic selection. Use report() to read the output. Identify p, d, q, P, D, Q, and m.
- Run residual diagnostics. Is the Ljung-Box test significant? Are residuals consistent with white noise?
- Split on 2018 and compare ARIMA vs. ETS on the test set. Which has lower RMSE? Which has more conservative (wider) prediction intervals?
# Lab 6: ARIMA Models
# ─────────────────────────────────────────────────
library(fpp3)
retail <- us_employment |>
filter(Title == "Retail Trade") |>
select(Month, Employed)
# Step 1 — Time plot and ACF
retail |>
autoplot(Employed) +
labs(title = "US Retail Trade Employment",
y = "Thousands of persons", x = NULL)
retail |>
ACF(Employed, lag_max = 36) |>
autoplot() +
labs(title = "ACF: US Retail Employment")
# Slow ACF decay → non-stationary (unit root)
# Step 2 — How many differences?
retail |>
features(Employed, list(unitroot_ndiffs, unitroot_nsdiffs))
# Step 3 — Automatic ARIMA selection
fit_arima <- retail |> model(ARIMA(Employed))
report(fit_arima)
# Read: ARIMA(p,d,q)(P,D,Q)[m]
# Coefficients, AICc, sigma^2
# Step 4 — Residual diagnostics
fit_arima |> gg_tsresiduals()
augment(fit_arima) |>
features(.innov, ljung_box, lag = 24)
# Step 5 — ARIMA vs. ETS on test set
retail_train <- retail |> filter(year(Month) < 2018)
retail_test <- retail |> filter(year(Month) >= 2018)
fit_compare <- retail_train |>
model(
ARIMA = ARIMA(Employed),
ETS = ETS(Employed)
)
fc_compare <- fit_compare |>
forecast(h = nrow(retail_test))
fc_compare |>
autoplot(retail_train |> tail(60), level = 80) +
autolayer(retail_test, Employed, colour = "black") +
labs(title = "ARIMA vs. ETS: US Retail Employment",
y = "Thousands of persons")
accuracy(fc_compare, retail_test) |>
select(.model, RMSE, MAE, MAPE) |>
arrange(RMSE)
# Lab 6: ARIMA Models
# ─────────────────────────────────────────────────
# Note: requires pmdarima for auto_arima — install once with:
# pip install pmdarima
# Export from R:
# us_employment |>
# filter(Title == "Retail Trade") |>
# select(Month, Employed) |>
# mutate(Month = as.Date(Month)) |>
# as_tibble() |> write.csv("retail_employment.csv", row.names = FALSE)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox
df = pd.read_csv("retail_employment.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
df.index.freq = "MS"
# Step 1 — Time plot and ACF
df["Employed"].plot(figsize=(10, 4),
title="US Retail Trade Employment", ylabel="Thousands of persons")
plt.tight_layout(); plt.show()
fig, ax = plt.subplots(figsize=(10, 4))
plot_acf(df["Employed"], lags=36, ax=ax)
ax.set_title("ACF: US Retail Employment"); plt.tight_layout(); plt.show()
# Step 2 — Stationarity check (ADF test)
pval = adfuller(df["Employed"].dropna())[1]
print(f"ADF p-value (levels): {pval:.4f} (>0.05 = non-stationary)")
pval_d = adfuller(df["Employed"].diff(12).diff(1).dropna())[1]
print(f"ADF p-value (s-diff+diff): {pval_d:.4f} (<0.05 = stationary)")
# Step 3 — Fit ARIMA (auto-order via pmdarima; install: pip install pmdarima)
from pmdarima import auto_arima
fit_auto = auto_arima(df["Employed"], seasonal=True, m=12,
information_criterion="aicc", stepwise=True, trace=True)
print(fit_auto.summary())
# Step 4 — Residual diagnostics
resid = fit_auto.resid()
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(resid); axes[0].axhline(0, color="red", lw=0.8)
axes[0].set_title("Residuals")
plot_acf(resid, lags=24, ax=axes[1])
axes[2].hist(resid, bins=20)
plt.tight_layout(); plt.show()
lb = acorr_ljungbox(resid, lags=[24], return_df=True)
print(lb)
# Step 5 — ARIMA vs. ETS on 2018+ test set
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
train = df[df.index.year < 2018]
test = df[df.index.year >= 2018]
h = len(test)
p, d, q = fit_auto.order
P, D, Q, m = fit_auto.seasonal_order
arima_fit = SARIMAX(train["Employed"], order=(p,d,q),
seasonal_order=(P,D,Q,m)).fit(disp=False)
ets_fit = ETSModel(train["Employed"], error="mul", trend="add",
seasonal="mul", seasonal_periods=12).fit(disp=False)
arima_fc = arima_fit.forecast(h)
ets_fc = ets_fit.forecast(h)
rmse = lambda f: np.sqrt(np.mean((test["Employed"].values - f.values)**2))
print(f"ARIMA RMSE: {rmse(arima_fc):.1f}")
print(f"ETS RMSE: {rmse(ets_fc):.1f}")
Automatic ARIMA selection is convenient, but reading ACF and PACF plots after differencing is the core diagnostic skill. An AR(p) process shows PACF cutting off at lag p with ACF tailing off; an MA(q) process is the mirror image. This lab walks through manual order identification and then checks your choice against the AICc-selected model.
| Variable | Description |
|---|---|
| Quarter | Quarterly time index (tsibble index) |
| Gas | Gas production (petajoules) |
- Plot Gas production. Describe the trend and any seasonality. Is the series stationary?
- Take a log transformation, then a seasonal difference (lag 4). Plot and check the KPSS statistic with features(log_gas, unitroot_kpss). Take a regular first difference if still needed.
- Plot the ACF and PACF of the (log, seasonally-differenced) series. Based on the patterns, propose values of p and q for the non-seasonal part.
- Fit your manually-chosen ARIMA(p,d,q)(P,D,Q)[4] alongside ARIMA(log(Gas)) auto-selection. Compare the two models by AICc using glance().
- Forecast Gas production 8 quarters (2 years) ahead from both models and plot the intervals side by side. Which is narrower?
# Lab 7: ARIMA Order Selection via ACF/PACF
# ─────────────────────────────────────────────────
library(fpp3)
gas <- aus_production |> select(Quarter, Gas)
# Step 1 — Time plot
gas |> autoplot(Gas) +
labs(title = "Australian Gas Production",
y = "Petajoules", x = NULL)
# Step 2 — Log + seasonal difference
gas <- gas |> mutate(log_gas = log(Gas))
gas |>
mutate(sdiff = difference(log_gas, lag = 4)) |>
features(sdiff, unitroot_kpss)
# If p-value < 0.05, take one more regular difference
gas_diff <- gas |>
mutate(d = difference(difference(log_gas, lag = 4), 1))
gas_diff |>
autoplot(d) +
labs(title = "Log Gas: seasonal + regular differenced")
# Step 3 — ACF and PACF of differenced series
gas_diff |> ACF(d, lag_max = 24) |> autoplot()
gas_diff |> PACF(d, lag_max = 24) |> autoplot()
# AR(p): PACF cuts off, ACF tails off
# MA(q): ACF cuts off, PACF tails off
# Step 4 — Manual vs. auto ARIMA
fit <- gas |>
model(
manual = ARIMA(log(Gas) ~ pdq(1,1,0) + PDQ(1,1,0)), # adjust after Step 3
auto = ARIMA(log(Gas))
)
glance(fit) |> select(.model, AICc, BIC)
report(fit |> select(auto))
# Step 5 — Forecast 8 quarters ahead
fc <- fit |> forecast(h = 8)
fc |>
autoplot(gas |> tail(40), level = 80) +
labs(title = "Gas Production Forecast: Manual vs. Auto ARIMA",
y = "Petajoules (log scale)")
# Lab 7: ARIMA Order Selection via ACF/PACF
# ─────────────────────────────────────────────────
# Note: requires pmdarima for auto_arima — install once with:
# pip install pmdarima
# ─────────────────────────────────────────────────
# Export from R:
# aus_production |>
# select(Quarter, Gas) |>
# mutate(Quarter = as.Date(Quarter)) |>
# as_tibble() |> write.csv("gas.csv", row.names = FALSE)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX
df = pd.read_csv("gas.csv", parse_dates=["Quarter"])
df = df.sort_values("Quarter").set_index("Quarter")
# Step 1 — Time plot
df["Gas"].plot(figsize=(10, 4),
title="Australian Gas Production", ylabel="Petajoules")
plt.tight_layout(); plt.show()
# Step 2 — Log + seasonal + regular differencing; KPSS test
df["log_gas"] = np.log(df["Gas"])
df["sdiff_d1"] = df["log_gas"].diff(4).diff(1) # lag-4 + lag-1
_, p, *_ = kpss(df["sdiff_d1"].dropna(), regression="c")
print(f"KPSS p-value after log + s-diff + diff: {p:.4f} (>0.05 = stationary)")
# Step 3 — ACF and PACF to identify p, q
fig, axes = plt.subplots(2, 1, figsize=(10, 7))
plot_acf( df["sdiff_d1"].dropna(), lags=24, ax=axes[0])
plot_pacf(df["sdiff_d1"].dropna(), lags=24, ax=axes[1])
axes[0].set_title("ACF of log-gas (s-diff + diff)")
axes[1].set_title("PACF of log-gas (s-diff + diff)")
plt.tight_layout(); plt.show()
# Step 4 — Manual vs. auto ARIMA (compare AICc)
manual = SARIMAX(df["log_gas"], order=(1,1,0),
seasonal_order=(1,1,0,4)).fit(disp=False)
print(f"Manual AICc: {manual.aicc:.1f}")
from pmdarima import auto_arima # pip install pmdarima
auto_m = auto_arima(df["log_gas"], seasonal=True, m=4,
information_criterion="aicc", stepwise=True, trace=False)
print(f"Auto AICc: {auto_m.aicc():.1f}")
print("Auto order:", auto_m.order, auto_m.seasonal_order)
# Step 5 — Forecast 8 quarters ahead (back-transform from log)
fc_manual = np.exp(manual.forecast(8))
auto_fit = SARIMAX(df["log_gas"], order=auto_m.order,
seasonal_order=auto_m.seasonal_order).fit(disp=False)
fc_auto = np.exp(auto_fit.forecast(8))
print("Manual forecast:", fc_manual.round(1).tolist())
print("Auto forecast:", fc_auto.round(1).tolist())
Seasonal ARIMA extends the ARIMA framework with a seasonal AR/MA layer operating at lag m, written ARIMA(p,d,q)(P,D,Q)[m]. Monthly pharmaceutical data are strongly seasonal and trended, making them ideal for practising seasonal differencing and reading the full SARIMA notation from R output.
| Variable | Description |
|---|---|
| Month | Monthly time index |
| Cost | Monthly drug cost (AUD millions) |
- Plot the series. Identify trend, seasonality, and any calendar effects. Does variance grow with level? Consider a log transformation.
- Use features(log(Cost), list(unitroot_ndiffs, unitroot_nsdiffs)) to determine d and D. Take the indicated differences and verify with an ACF plot.
- Fit ARIMA(log(Cost)) with automatic selection. Use report() to read the full SARIMA(p,d,q)(P,D,Q)[12] output. Write out the model equation in words: what does each term capture?
- Run gg_tsresiduals() and the Ljung-Box test. Do the residuals look like white noise?
- Forecast 36 months (3 years) ahead. Plot the forecast with 80% and 95% prediction intervals. Does the seasonal pattern carry forward correctly?
# Lab 8: Seasonal ARIMA (SARIMA)
# ─────────────────────────────────────────────────
library(fpp3)
h02 <- PBS |>
filter(ATC2 == "H02") |>
summarise(Cost = sum(Cost) / 1e6)
# Step 1 — Plot and check for variance growth
h02 |> autoplot(Cost) +
labs(title = "H02 Antidiabetic Drug Costs",
y = "AUD millions", x = NULL)
h02 |> autoplot(log(Cost)) +
labs(title = "H02 Drug Costs (log scale)")
# Step 2 — How many differences needed?
h02 |>
features(log(Cost), list(unitroot_ndiffs, unitroot_nsdiffs))
# Visually check after seasonal + regular differencing
h02 |>
mutate(d = difference(difference(log(Cost), lag = 12), 1)) |>
autoplot(d)
h02 |>
mutate(d = difference(difference(log(Cost), lag = 12), 1)) |>
ACF(d, lag_max = 36) |>
autoplot()
# Step 3 — Automatic SARIMA
fit <- h02 |> model(ARIMA(log(Cost)))
report(fit)
# Look for: ARIMA(p,d,q)(P,D,Q)[12]
# Non-seasonal: captures short-run autocorrelation
# Seasonal: captures lag-12 autocorrelation
# Step 4 — Residual diagnostics
fit |> gg_tsresiduals(lag_max = 36)
augment(fit) |>
features(.innov, ljung_box, lag = 36, dof = 4)
# Step 5 — 3-year forecast
fc <- fit |> forecast(h = 36)
fc |>
autoplot(h02 |> tail(60)) +
labs(title = "H02 Drug Costs: 3-Year SARIMA Forecast",
y = "AUD millions")
# Lab 8: Seasonal ARIMA (SARIMA)
# ─────────────────────────────────────────────────
# Note: requires pmdarima for auto_arima — install once with:
# pip install pmdarima
# ─────────────────────────────────────────────────
# Export from R: (same h02.csv as Lab 5)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import kpss
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox
from pmdarima import auto_arima # pip install pmdarima
df = pd.read_csv("h02.csv", parse_dates=["Month"])
df = df.sort_values("Month").set_index("Month")
df.index.freq = "MS"
# Step 1 — Original and log-scale plots
fig, axes = plt.subplots(2, 1, figsize=(10, 7))
df["Cost"].plot(ax=axes[0], title="H02 Drug Costs (original)")
np.log(df["Cost"]).plot(ax=axes[1], title="H02 Drug Costs (log scale)")
plt.tight_layout(); plt.show()
# Step 2 — Stationarity: KPSS after seasonal then regular differencing
log_cost = np.log(df["Cost"])
_, p1, *_ = kpss(log_cost.diff(12).dropna(), regression="c")
_, p2, *_ = kpss(log_cost.diff(12).diff(1).dropna(), regression="c")
print(f"KPSS after seasonal diff only: p = {p1:.4f}")
print(f"KPSS after seasonal + first diff: p = {p2:.4f}")
print("(p > 0.05 means stationary)")
# Step 3 — Auto SARIMA selection on log scale
fit = auto_arima(log_cost, seasonal=True, m=12,
information_criterion="aicc", stepwise=True, trace=True)
print(fit.summary())
# Step 4 — Residual diagnostics
resid = fit.resid()
fig, axes = plt.subplots(3, 1, figsize=(10, 8))
axes[0].plot(resid); axes[0].axhline(0, color="red", lw=0.8)
axes[0].set_title("Residuals")
plot_acf(resid, lags=36, ax=axes[1])
axes[2].hist(resid, bins=20); axes[2].set_title("Distribution")
plt.tight_layout(); plt.show()
print(acorr_ljungbox(resid, lags=[36], return_df=True))
# Step 5 — 3-year forecast (36 months, back-transform)
p, d, q = fit.order; P, D, Q, m = fit.seasonal_order
sarima = SARIMAX(log_cost, order=(p,d,q),
seasonal_order=(P,D,Q,m)).fit(disp=False)
fc_log = sarima.get_forecast(36)
fc_mean = np.exp(fc_log.predicted_mean)
ci = np.exp(fc_log.conf_int())
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index[-60:], df["Cost"].iloc[-60:], color="black", label="Observed")
ax.plot(fc_mean.index, fc_mean, color="steelblue", label="Forecast")
ax.fill_between(ci.index, ci.iloc[:,0], ci.iloc[:,1], alpha=0.2)
ax.legend(); ax.set_title("H02 Drug Costs: 3-Year SARIMA Forecast")
plt.tight_layout(); plt.show()
Dynamic regression combines regression predictors with an ARIMA error structure, written ARIMA(y ~ x + pdq() + PDQ()) in fpp3. This is essential when a series has external drivers (temperature, prices, calendar effects) and residual autocorrelation. Weekly gasoline production has a complex seasonal period (~52 weeks), so Fourier terms replace seasonal ARIMA to handle the non-integer periodicity.
| Variable | Description |
|---|---|
| Week | Weekly time index |
| Barrels | Production (millions of barrels per day) |
- Plot the series. Describe the trend and the complex weekly seasonality (period ≈ 52.18 weeks).
- Fit four models using K = 1, 3, 6, and 9 Fourier terms combined with ARIMA(Barrels ~ fourier(K=K) + PDQ(0,0,0)). Use a loop or model() with named entries. Extract AICc from glance() and select the best K.
- Inspect the best model with report(). What ARIMA order was chosen for the errors? Run gg_tsresiduals().
- Forecast 2 years (104 weeks) ahead using the best model. Plot the result.
- Compare the Fourier + ARIMA forecast against a plain ARIMA(Barrels) model on a 2015–2017 test set. Which has lower RMSE?
# Lab 9: Dynamic Regression with ARIMA Errors
# ─────────────────────────────────────────────────
library(fpp3)
# Step 1 — Plot
us_gasoline |> autoplot(Barrels) +
labs(title = "US Weekly Gasoline Production",
y = "Millions of barrels/day", x = NULL)
# Step 2 — Select best number of Fourier terms by AICc
fit_fourier <- us_gasoline |>
model(
K1 = ARIMA(Barrels ~ fourier(K = 1) + PDQ(0,0,0)),
K3 = ARIMA(Barrels ~ fourier(K = 3) + PDQ(0,0,0)),
K6 = ARIMA(Barrels ~ fourier(K = 6) + PDQ(0,0,0)),
K9 = ARIMA(Barrels ~ fourier(K = 9) + PDQ(0,0,0))
)
glance(fit_fourier) |>
select(.model, AICc) |>
arrange(AICc)
# Lower AICc = better fit/parsimony trade-off
# Step 3 — Best model diagnostics (assuming K6 wins; adjust if needed)
best_fit <- us_gasoline |>
model(ARIMA(Barrels ~ fourier(K = 6) + PDQ(0,0,0)))
report(best_fit)
best_fit |> gg_tsresiduals()
# Step 4 — Forecast 2 years ahead
fc <- best_fit |> forecast(h = 104)
fc |>
autoplot(us_gasoline |> tail(104), level = 80) +
labs(title = "US Gasoline Production: 2-Year Forecast",
y = "Millions of barrels/day")
# Step 5 — Compare vs. plain ARIMA on 2015-2017 test set
gas_train <- us_gasoline |> filter(year(Week) < 2015)
gas_test <- us_gasoline |> filter(year(Week) >= 2015)
fit_compare <- gas_train |>
model(
Fourier_ARIMA = ARIMA(Barrels ~ fourier(K = 6) + PDQ(0,0,0)),
Plain_ARIMA = ARIMA(Barrels)
)
fc_compare <- fit_compare |>
forecast(h = nrow(gas_test))
accuracy(fc_compare, gas_test) |>
select(.model, RMSE, MAE, MAPE) |>
arrange(RMSE)
# Lab 9: Dynamic Regression with ARIMA Errors
# ─────────────────────────────────────────────────
# Note: Python may select a different best K than R in Step 2.
# R's ARIMA() auto-selects the error order separately for each K value,
# while Python uses a fixed ARIMA(0,1,1) error structure throughout.
# This changes the AICc surface, so the winning K may differ.
# The key conclusion in Step 5 (Fourier+ARIMA beats plain ARIMA) holds either way.
# ─────────────────────────────────────────────────
# Export from R:
# us_gasoline |>
# mutate(Week = as.Date(Week)) |>
# as_tibble() |> write.csv("us_gasoline.csv", row.names = FALSE)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox
df = pd.read_csv("us_gasoline.csv", parse_dates=["Week"])
df = df.sort_values("Week").set_index("Week")
# Step 1 — Time plot
df["Barrels"].plot(figsize=(10, 4),
title="US Weekly Gasoline Production",
ylabel="Millions of barrels/day")
plt.tight_layout(); plt.show()
# Step 2 — Build Fourier terms (period ≈ 52.18 weeks = 365.25/7)
period = 365.25 / 7
n = len(df)
t = np.arange(n)
def fourier_df(t_vals, K, index):
cols = {}
for k in range(1, K + 1):
cols[f"sin{k}"] = np.sin(2 * np.pi * k * t_vals / period)
cols[f"cos{k}"] = np.cos(2 * np.pi * k * t_vals / period)
return pd.DataFrame(cols, index=index)
def aicc(fit):
k, n = len(fit.params), fit.nobs
return fit.aic + 2 * k * (k + 1) / (n - k - 1)
aicc_results = {}
for K in [1, 3, 6, 9]:
X = fourier_df(t, K, df.index)
fit = SARIMAX(df["Barrels"], exog=X,
order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
aicc_results[f"K={K}"] = aicc(fit)
print("AICc by number of Fourier terms:")
print(pd.Series(aicc_results).sort_values())
# Note: R auto-selects the ARIMA error order for each K; Python fixes it at (0,1,1).
# Python may select a different best K than R — both correctly apply AICc minimization.
# Step 3 — Fit best K and diagnose residuals
best_K = int(min(aicc_results, key=aicc_results.get).split("=")[1])
X_best = fourier_df(t, best_K, df.index)
best = SARIMAX(df["Barrels"], exog=X_best,
order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
print(best.summary())
lb = acorr_ljungbox(best.resid, lags=[52], return_df=True)
print(lb)
# Step 4 — Forecast 2 years (104 weeks)
t_fc = np.arange(n, n + 104)
X_fc = fourier_df(t_fc, best_K, index=range(104))
fc = best.get_forecast(104, exog=X_fc)
mean = fc.predicted_mean
ci = fc.conf_int()
fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df.index[-104:], df["Barrels"].iloc[-104:], color="black")
ax.plot(mean.index, mean.values, color="steelblue")
ax.fill_between(ci.index, ci.iloc[:,0], ci.iloc[:,1], alpha=0.2)
ax.set_title("US Gasoline: 2-Year Fourier + ARIMA Forecast")
plt.tight_layout(); plt.show()
# Step 5 — Compare vs. plain ARIMA on 2015-2017 test set
train_idx = df.index.year < 2015
test_idx = df.index.year >= 2015
n_train = train_idx.sum()
t_train, t_test = t[:n_train], t[n_train:]
X_tr = fourier_df(t_train, best_K, df.index[train_idx])
X_te = fourier_df(t_test, best_K, df.index[test_idx])
f_fit = SARIMAX(df.loc[train_idx, "Barrels"], exog=X_tr,
order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
p_fit = SARIMAX(df.loc[train_idx, "Barrels"],
order=(0,1,1), seasonal_order=(0,0,0,0)).fit(disp=False)
n_test = int(test_idx.sum())
fc_f = f_fit.forecast(n_test, exog=X_te)
fc_p = p_fit.forecast(n_test)
actual = df.loc[test_idx, "Barrels"]
rmse = lambda f: np.sqrt(np.mean((actual.values - np.array(f))**2))
print(f"Fourier+ARIMA RMSE: {rmse(fc_f):.3f}")
print(f"Plain ARIMA RMSE: {rmse(fc_p):.3f}")