Learning Resources

Interactive demonstrations and code examples for econometrics, microeconomics, and forecasting.

Standard Deviation

Statistics · Descriptive Statistics

Imagine this: Two classes both have an average exam score of 70. In Class A, most students scored between 65 and 75 — pretty close together. In Class B, scores ranged from 30 to 100 — all over the place! Both classes have the same average, but clearly something is different. The standard deviation is the single number that captures this difference. A small SD means scores are tightly clustered around the mean. A large SD means they’re spread out.

Play with the sliders below to see how the spread changes!
The Standard Deviation Formula
s = √( Σ(xᵢ − x̄)² / (n − 1) )
(xᵢ − x̄)² — squared deviations · Σ(…)/(n−1) — variance · √(…) — square root
(xᵢ − x̄)² — Squared Deviation
For each data point xᵢ, subtract the mean x̄ to find how far it sits from the center. Squaring does two things: it makes negatives positive (so deviations above and below don't cancel), and it penalizes large outliers more heavily. A score 20 points away from the mean contributes 400 to the sum; a score 2 points away contributes only 4.
∑(…) / (n−1) — Variance
We sum all the squared deviations and divide by n−1 to get an average. The result is called the variance (s²). It measures spread in squared units (e.g., points²), which is hard to interpret directly.

Why n−1 instead of n? Because we estimated the mean from the same data, which “uses up” one degree of freedom. Using n−1 gives an unbiased estimate of the true population variance. This is called Bessel’s correction.
√(…) — Back to Original Units
Taking the square root of the variance returns us to the original units (points, dollars, etc.) instead of squared units. This makes the SD directly interpretable alongside the data. For example, if exam scores have an SD of 10 points, you can say “a typical score is within about 10 points of the mean” — a statement that makes intuitive sense.
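
To make the three steps concrete, here is a minimal Python sketch (the scores are made up; numpy is assumed) that computes the sample SD exactly as the formula prescribes and checks the result against numpy's built-in:

```python
import numpy as np

scores = np.array([62, 68, 70, 71, 74, 75, 79, 81])  # hypothetical exam scores

mean = scores.mean()                          # x-bar
sq_dev = (scores - mean) ** 2                 # (x_i - x-bar)^2: squared deviations
variance = sq_dev.sum() / (len(scores) - 1)   # divide by n - 1 (Bessel's correction)
sd = np.sqrt(variance)                        # square root: back to points

print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {sd:.2f}")
print(np.isclose(sd, scores.std(ddof=1)))     # ddof=1 uses the same n - 1 formula
```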

The formula uses Greek letters as shorthand. Here’s what each one means:

Symbol | Name | Meaning in this context
σ | lowercase sigma | Standard deviation of a population
s | s (Roman) | Standard deviation of a sample (what we usually compute)
σ² / s² | sigma-squared / s-squared | Variance — the SD before taking the square root (in squared units)
Σ | uppercase Sigma | Summation — “add up all the following terms”
μ | mu | Population mean (true average for the whole group)
x̄ | x-bar | Sample mean (average computed from collected data)
n | n (Roman) | Sample size — number of observations

Tip: lowercase σ and uppercase Σ look similar but mean very different things. Context matters: σ (or s) is a single number measuring spread; Σ is an instruction to sum a list of numbers.

[Interactive chart: normal distribution of exam scores (mean = 70), shaded to show the regions within ±1σ (≈68%), within ±2σ (≈95%), and beyond ±2σ (≈5%). An SD slider (default 10.0) controls how spread out the scores are; the mean is fixed at 70.]
  • SD is in the same units as the data. If scores are in points, the SD is in points. A score of 80 means very different things in a class with SD=2 versus SD=20. Try dragging the SD slider to see how the spread changes!
  • The 68-95-99.7 rule (Empirical Rule). For bell-shaped data, about 68% of observations fall within ±1 SD of the mean, 95% within ±2 SD, and 99.7% within ±3 SD. Check the stats output above — the percentages match the rule!
  • Mean and SD are independent. The spread of the bell curve depends only on σ, not on where the curve is centered. A class with mean 60 and SD=10 has the exact same bell shape as one with mean 85 and SD=10 — just shifted on the score axis.
  • SD vs. Variance. Variance is SD². It’s useful in math (variances add across independent variables) but hard to interpret because it’s in squared units. The SD is just its square root — back in the original units.
  • Outliers inflate the SD. Because deviations are squared before averaging, extreme points count disproportionately. A few outliers can make the SD much larger than it would otherwise be.
Practice Questions
[Interactive quiz: 5 questions]

Central Limit Theorem

Statistics · Inference

Imagine this: You sample 30 people and compute their average income. You do this thousands of times. The distribution of income in the population might be wildly skewed — but the distribution of those sample averages will be approximately normal.

This is the Central Limit Theorem: for large enough n, the sampling distribution of the mean is normal regardless of the population shape. It underlies every t-test, confidence interval, and regression inference you will ever run.

Pick a population shape, set the sample size, and simulate.
[Interactive simulation: choose the shape of the underlying population, set the observations drawn per sample (default 30), and simulate. The chart plots the distribution of sample means across 2,000 simulations.]
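
A minimal Python sketch of the same experiment (numpy assumed; the population here is a heavily right-skewed exponential, but any shape works):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 30, 2000

# Draw 2,000 samples of size 30 from a skewed population and average each one.
sample_means = rng.exponential(scale=1.0, size=(n_sims, n)).mean(axis=1)

print(f"mean of sample means: {sample_means.mean():.3f} (population mean = 1.0)")
print(f"SD of sample means:   {sample_means.std(ddof=1):.3f} "
      f"(theory: sigma/sqrt(n) = {1 / np.sqrt(n):.3f})")
# A histogram of sample_means already looks approximately normal at n = 30.
```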

Key Takeaways

  • The CLT is why the normal distribution is everywhere. Individual observations don’t need to be normal — only sample means do. That’s the foundation of t-tests, confidence intervals, and regression inference.
  • Larger n → faster convergence. At n = 1, sample means look like the raw population. By n = 30, the distribution is usually close to normal. At n = 100, it’s excellent even for severely skewed data.
  • SE = σ/√n. As n increases, sample means cluster more tightly around μ. Quadrupling the sample size halves the standard error.
  • Independence matters. The CLT assumes random, independent observations. Clustered or time-series data with autocorrelation can violate this and require different inference methods.

Confidence Intervals

Econometrics · Statistical Inference

Imagine this: You want to know the average exam score for every student in the school. But you can't ask everyone — that would take forever! So you ask a smaller group and use their average as your best guess. A confidence interval is like saying: "I'm pretty sure the real answer is somewhere in this range."

The wider the range, the more sure you can be it contains the truth. The narrower the range, the more precise — but the less sure. Play with the controls below to see how it works!
The Confidence Interval Formula
Estimate ± Z-value × (SD / √n)
Estimate — our best guess · Z-value — how confident we want to be · SD/√n — the standard error
The Estimate (Point Estimate)
This is the average exam score from the students we actually asked — our single best guess for the true average. In our example, we surveyed a group and got an average of 78. It's the center of our confidence interval. The more students we ask, the closer this number tends to be to the real answer.
The Z-value (Critical Value)
This number comes from the normal distribution table and controls how confident we want to be. A higher confidence level means a bigger Z-value, which stretches the interval wider. For 90% confidence it's 1.645, for 95% it's 1.960, and for 99% it's 2.576. Think of it as a "confidence multiplier" — the more sure you want to be, the bigger the multiplier.
Standard Error (SD / √n)
The standard error tells us how much our sample average is likely to wiggle from sample to sample. It comes from two things: the standard deviation (how spread out individual scores are) divided by the square root of the sample size. This is why collecting more data helps — a bigger n means a smaller standard error, which means a narrower interval.
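
Here is a minimal Python sketch of the whole calculation, using the demo's numbers (mean 78, SD 5, n = 30) and scipy for the critical values:

```python
import numpy as np
from scipy import stats

x_bar, sd, n = 78.0, 5.0, 30          # sample mean, sample SD, sample size
conf = 0.95

se = sd / np.sqrt(n)                            # standard error, about 0.91
z = stats.norm.ppf(1 - (1 - conf) / 2)          # 1.960 at 95%
t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)   # 2.045 at df = 29

print(f"SE = {se:.2f}")
print(f"z-interval: {x_bar - z * se:.2f} to {x_bar + z * se:.2f}")
print(f"t-interval: {x_bar - t * se:.2f} to {x_bar + t * se:.2f}  (slightly wider)")
```
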
This is where the Z-value in the formula comes from. The row matching your selected confidence level is highlighted.
Confidence Level | Significance (α) | Tail Area (α/2) | Z* (Critical Value) | Meaning
90% | 0.10 | 0.05 | 1.645 | We allow a 10% chance of being wrong
95% | 0.05 | 0.025 | 1.960 | We allow a 5% chance of being wrong
99% | 0.01 | 0.005 | 2.576 | We allow a 1% chance of being wrong

df | t* at 90% | t* at 95% | t* at 99%
5 | 2.015 | 2.571 | 4.032
10 | 1.812 | 2.228 | 3.169
15 | 1.753 | 2.131 | 2.947
20 | 1.725 | 2.086 | 2.845
25 | 1.708 | 2.060 | 2.787
30 | 1.697 | 2.042 | 2.750
50 | 1.676 | 2.009 | 2.678
100 | 1.660 | 1.984 | 2.626
∞ | 1.645 | 1.960 | 2.576
Z vs. t: The top table shows Z* values (used when the sample is large, roughly n > 30). The bottom table shows t* values, which depend on degrees of freedom (df = n − 1). Notice how t* values get closer to Z* as df increases — with a big enough sample, they're practically the same!
The Confidence Interval for Average Exam Score
[Interactive: the point estimate is fixed at 78 points. Controls: confidence level (90/95/99%; more sure = wider range, it's a tradeoff!), sample size (default 30; more students = more reliable guess), and SD (default 5.0; higher = scores are all over the place, lower = everyone scored similarly). With the defaults, Standard Error = SD/√n = 5/√30 ≈ 0.91.]
  • Ask more students → narrower interval. The more people you ask, the better your guess. Try dragging the sample size slider to the right!
  • More variation in scores → wider interval. If some students got 20 and others got 100, it's harder to pin down the average. Try increasing the standard error!
  • More confidence → wider interval. Saying "I'm 99% sure" requires casting a wider net than "I'm 90% sure." Try switching between the confidence level buttons!
  • It's always a tradeoff. You can be very precise OR very confident, but not both at the same time (unless you get more data!).
Practice Questions
[Interactive quiz: 5 questions]

Hypothesis Testing & P-Values

Statistics · Inference

Imagine this: You run a regression on wage data and get a coefficient on education of 0.08 — each extra year of school is associated with an 8% wage increase. But is that a real effect, or just random noise in your sample?

The null hypothesis H0 says the true coefficient is zero: no effect. The t-statistic measures how many standard errors your estimate sits from zero. The p-value asks: if H0 were true, how often would you see a result this extreme just by chance?

Drag the t-statistic below and watch the p-value and decision update in real time.
The t-Statistic
t = β̂ / SE(β̂)
β̂ — OLS estimate · SE(β̂) — standard error
β̂ — The OLS Estimate
This is your regression coefficient — the sample estimate of the true population parameter β. It tells you the estimated effect of X on Y. Because it comes from a sample, it carries uncertainty. Under H0: β = 0, a t-stat far from zero suggests the true effect is probably nonzero.
SE(β̂) — Standard Error
The standard error measures uncertainty in your estimate. A larger sample reduces SE (SE ∝ 1/√n), making your t-statistic larger and your test more powerful. A noisier dataset (higher σ²) inflates SE, making it harder to reject H0.
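
A small Python sketch of the decision rule the demo animates (scipy assumed; the normal approximation matches the chart, and the t-statistic here is an arbitrary example):

```python
from scipy import stats

t_stat, alpha = 2.10, 0.05              # example test statistic and threshold

p_two = 2 * stats.norm.sf(abs(t_stat))  # two-tailed: P(|Z| >= |t|) under H0
p_right = stats.norm.sf(t_stat)         # one-tailed right: P(Z >= t) under H0

decision = "reject" if p_two < alpha else "fail to reject"
print(f"two-tailed p = {p_two:.4f} -> {decision} H0 at alpha = {alpha}")
print(f"right-tailed p = {p_right:.4f}")
```
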
[Interactive chart: standard normal distribution with rejection regions shaded red. Drag the t-statistic (default 0.00) to simulate different test statistics; set α, the threshold for rejecting H0; and choose the alternative: two-tailed (H1: β ≠ 0), one-tailed right (H1: β > 0), or one-tailed left (H1: β < 0).]

Key Takeaways

  • A p-value is not the probability H0 is true. It’s the probability of seeing data this extreme or more assuming H0 were true. These are very different things.
  • Failing to reject ≠ accepting H0. You may simply lack enough data (statistical power) to detect a real effect. Absence of evidence is not evidence of absence.
  • Statistical significance ≠ economic significance. With a large enough sample, even a trivially small effect will produce p < 0.05. Always report the coefficient magnitude and a confidence interval alongside the p-value.
  • α = 0.05 is conventional, not sacred. It was popularized by Fisher in the 1920s. The right threshold depends on the cost of false positives vs. false negatives in your specific application.
  • Two-tailed vs. one-tailed. A two-tailed test asks: is β ≠ 0? A one-tailed test asks: is β > 0 or β < 0? Toggle between them above. Note that one-tailed tests have less demanding critical values — use them only when theory strongly predicts a direction before seeing the data.

Quick Check

Type I & Type II Errors, Statistical Power

Statistics · Inference

Every hypothesis test can go wrong in two ways. A Type I error (α) is rejecting H0 when it’s actually true — a false positive. A Type II error (β) is failing to reject H0 when it’s actually false — a false negative. Power = 1 − β is the probability of correctly detecting a true effect.

The chart below shows the H0 and H1 distributions. The critical value separates “reject” from “fail to reject.” Drag the effect size to see how power changes.
[Interactive chart: H0: N(0,1) vs H1: N(δ,1), two-tailed test, with regions shaded for α (Type I, false positive), β (Type II, false negative), and power (correct rejection). Sliders: effect size δ (default 1.50), the distance between the H0 and H1 means, and α, which sets the critical value for rejection.]

Key Takeaways

  • α and β trade off against each other. Making the test stricter (lower α) moves the critical value further out, which increases β. You will miss more true effects. There is no free lunch.
  • Power increases with effect size and sample size. Larger true effects are easier to detect. More data reduces SE, which narrows both distributions and makes them easier to separate.
  • Underpowered studies are dangerous. A study that finds p > α may simply lack the sample size to detect the effect — not evidence that no effect exists.
  • Power analysis should happen before data collection. Given a target power (e.g., 80%), a hypothesized effect size, and α, you can calculate the required n. Post-hoc power calculations are generally uninformative.
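
To make that concrete, here is a minimal sketch of the textbook approximation for a two-sided one-sample z-test (scipy assumed; `required_n` and `power_at` are illustrative helpers, not a standard API):

```python
import numpy as np
from scipy import stats

def required_n(effect_sd, power=0.80, alpha=0.05):
    """n needed to detect an effect of `effect_sd` standard deviations,
    ignoring the (negligible) far rejection tail."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(((z_a + z_b) / effect_sd) ** 2))

def power_at(effect_sd, n, alpha=0.05):
    """Power of the same test at a given sample size."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    return stats.norm.sf(z_a - effect_sd * np.sqrt(n))

print(required_n(0.3))               # about 88 observations for a 0.3 SD effect
print(f"{power_at(0.3, 88):.3f}")    # roughly 0.80, as designed
```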

Ordinary Least Squares

Econometrics · Linear Regression

OLS fits a line through data by minimizing the sum of squared residuals. The sliders add measurement error to each variable independently. The hidden variable slider introduces an omitted confounder correlated with both x1 and x2 — watch how the coefficients get pulled as its effect on Y grows.

[Interactive 3D scatter — we usually show 2D projections, but the 3D view is a more complete picture of the regression. Sliders: σ1 (default 0.30), σ2 (default 0.30), hidden variable U (default 0.00).]
x1 (blue): adds measurement error to x1 only — blue dots spread horizontally, x1's standard error grows.

x2 (red): adds measurement error to x2 only — red dots spread horizontally, x2's standard error grows.

hidden variable U (purple): U is correlated with both x1 and x2 but excluded from the regression. As U rises, its effect on Y grows and the coefficients on x1 and x2 pick up its influence — omitted variable bias.
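
A quick simulation sketch of what the measurement-error sliders do (numpy assumed; parameters are made up): classical measurement error in a regressor attenuates its coefficient toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 5000, 1.0

x_true = rng.normal(size=n)
y = beta * x_true + rng.normal(scale=0.5, size=n)

for sigma in [0.0, 0.3, 1.0]:
    x_obs = x_true + rng.normal(scale=sigma, size=n)  # add measurement error
    slope = np.polyfit(x_obs, y, 1)[0]                # simple OLS slope
    # theory: plim of the slope is beta * var(x) / (var(x) + sigma^2)
    print(f"sigma = {sigma:.1f}: slope = {slope:.3f}, "
          f"theory = {beta / (1 + sigma**2):.3f}")
```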

Instrumental Variables

Econometrics · Causal Inference

OLS is unbiased only when the regressor X is uncorrelated with the error. When X is endogenous — correlated with unobserved determinants of Y — OLS picks up not just the causal effect but also the back-door path through the omitted variable. An instrumental variable Z breaks the endogeneity: it shifts X (relevance) but has no direct effect on Y (exclusion restriction). 2SLS uses only the exogenous part of X's variation — the part explained by Z — to identify β.

[Interactive charts: First Stage — Z → X, and Structural Equation — X → Y. Sliders: first-stage strength π (default 1.50) and endogeneity ρ (default 0.50).]
first-stage strength π (blue): controls how strongly Z predicts X. Higher π = larger first-stage F-statistic = more precise IV estimate. When F < 10 the instrument is "weak" and 2SLS can be severely biased and unreliable.

endogeneity ρ (purple): the correlation between X and the error term — the source of OLS bias. OLS absorbs the back-door path through the omitted variable; 2SLS strips it out. At ρ = 0 both estimators are consistent; as ρ rises the OLS coefficient drifts while 2SLS stays near 1.
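
A compact simulation sketch of the same setup (numpy assumed; π and ρ play the role of the sliders, true β = 1): OLS drifts as ρ rises, while a hand-rolled 2SLS stays close to 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, beta, pi, rho = 10_000, 1.0, 1.5, 0.5

z = rng.normal(size=n)                        # instrument: exogenous
u = rng.normal(size=n)                        # structural error
v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)
x = pi * z + v                                # x endogenous: correlated with u
y = beta * x + u

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # biased upward when rho > 0

pi_hat = np.cov(z, x)[0, 1] / np.var(z, ddof=1)
x_hat = pi_hat * z                            # first stage: exogenous part of x
tsls = np.cov(x_hat, y)[0, 1] / np.var(x_hat, ddof=1)  # second stage

print(f"OLS: {ols:.3f}   2SLS: {tsls:.3f}   (true beta = 1.0)")
```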

Maximum Likelihood Estimation

Econometrics · Estimation

Maximum Likelihood Estimation finds parameters that make the observed data most probable. For binary outcomes we use logistic regression — MLE's answer to OLS for 0/1 data. Instead of minimizing squared residuals, MLE maximizes the log-likelihood. Output shows z-statistics and a Pseudo R² in place of the OLS R².

[Interactive 3D scatter — we usually show 2D projections, but the 3D view is a more complete picture of the regression. Sliders: σ1 (default 0.30), σ2 (default 0.30), hidden variable U (default 0.00).]
x1 (blue): adds measurement error to x1 — attenuation bias shrinks the x1 coefficient toward zero.

x2 (red): adds measurement error to x2 — same effect on x2's coefficient.

hidden variable U (purple): U is correlated with both x1 and x2 but excluded from the logit. As U rises, it shifts the true probabilities and the estimated coefficients pick up its influence.
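
To see what "maximize the log-likelihood" means mechanically, here is a minimal sketch that fits a logit by handing the negative log-likelihood to scipy's optimizer (coefficients are made up; a canned routine such as statsmodels' Logit gives the same answer):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept, x1, x2
true_b = np.array([-0.5, 1.0, -1.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_b)))          # 0/1 outcomes

def neg_loglik(b):
    xb = X @ b
    # log L = sum_i [ y_i * xb_i - log(1 + exp(xb_i)) ], in a stable form
    return np.sum(np.logaddexp(0, xb)) - y @ xb

res = minimize(neg_loglik, np.zeros(3), method="BFGS")
print("MLE:", res.x.round(3), "  true:", true_b)
```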

Fixed and Random Effects

Panel Data · Econometrics

With panel data — multiple observations per unit over time — unobserved individual characteristics (fixed effects) can bias pooled OLS. Demeaning subtracts each unit's mean from its observations, eliminating any time-invariant omitted variable. The fixed effects estimator exploits only within-unit variation to identify β. Random effects instead treats the individual effect as random and uncorrelated with X, allowing between-unit variation to also inform the estimate — gaining efficiency at the cost of that assumption.

[Interactive chart with a Demean toggle. Sliders: clustered noise σ (default 0.10), unit heterogeneity σα (default 0.00), hidden variable U (default 0.00).]
The dashed lines show the regression slope estimated within each group separately — each one is close to the true β. The solid OLS line is driven by between-group variation and diverges when σα is large. Clicking Demean collapses the groups to a common origin.

unit heterogeneity (σα): scales the time-invariant group intercepts αi, which are correlated with group-level X. This biases pooled OLS but is removed exactly by demeaning — the FE/RE estimates are unaffected.

hidden variable (U): U is a time-varying omitted variable negatively correlated with within-group X variation. Because it varies within groups, demeaning does not remove it — U biases both pooled OLS and FE/RE estimates. Push far enough and the slope flips sign even after demeaning.

clustered std. error (σ): adds idiosyncratic within-unit noise — widens standard errors but does not bias the slope. This is the variation that clustered standard errors are designed to account for.
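
A sketch of the demeaning trick (numpy assumed; all parameters made up): the unit intercepts are built to be correlated with X, so pooled OLS is biased while the within estimator recovers β.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, t_per, beta = 50, 10, 1.0

alpha = rng.normal(scale=2.0, size=n_units)             # unit fixed effects
x = alpha[:, None] + rng.normal(size=(n_units, t_per))  # X correlated with alpha
y = beta * x + alpha[:, None] + rng.normal(scale=0.1, size=x.shape)

pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]         # absorbs alpha: biased

x_d = x - x.mean(axis=1, keepdims=True)                 # demean within each unit
y_d = y - y.mean(axis=1, keepdims=True)
within = np.polyfit(x_d.ravel(), y_d.ravel(), 1)[0]     # fixed effects estimate

print(f"pooled OLS: {pooled:.3f}   fixed effects: {within:.3f}   (true beta = 1.0)")
```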

R-Squared & Goodness of Fit

Regression · Model Fit

After running a regression, R² is the first thing everyone looks at. It measures what fraction of the total variation in Y is explained by the model. R² = 1 means a perfect fit; R² = 0 means the model explains nothing.

But R² can mislead. A model with high R² may be spurious, overfit, or irrelevant for causal questions. And a valid causal estimate from an IV regression might have R² = 0.02. R² measures fit — not correctness, not causality.

Adjust the sliders to see how R² responds to noise and slope.
[Interactive chart. Sliders: noise (default 1.5; higher noise = more scatter around the line) and slope (default 1.0; the true effect of X on Y).]

Key Takeaways

  • R² = 1 − RSS/TSS. TSS is total variation in Y; RSS is the variation left over after the model. R² is the fraction the model accounts for, computed by hand in the sketch below.
  • High R² does not mean the regression is valid. A spurious regression of two trending time series can produce R² > 0.99 even when there’s no real relationship.
  • R² always increases when you add variables, even irrelevant ones. Use adjusted R² or information criteria (AIC, BIC) when comparing models with different numbers of regressors.
  • In causal inference, R² is often unimportant. What matters is whether β̂ is unbiased, not how much of Y’s variance is explained.
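
A sketch of that first takeaway (numpy assumed; made-up data): compute RSS and TSS directly and watch R² fall as noise grows while the slope stays fixed.

```python
import numpy as np

rng = np.random.default_rng(5)
n, slope = 200, 1.0
x = rng.normal(size=n)

for noise in [0.5, 1.5, 3.0]:
    y = slope * x + rng.normal(scale=noise, size=n)
    b, a = np.polyfit(x, y, 1)                 # fitted line: y_hat = a + b*x
    rss = np.sum((y - (a + b * x)) ** 2)       # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)          # total sum of squares
    print(f"noise = {noise}: R^2 = {1 - rss / tss:.3f}")
```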

Omitted Variable Bias

Regression · Causal Inference

You regress wages on education and get β̂ = 0.12. But ability also affects wages and is correlated with education. When you omit ability from the model, OLS attributes some of ability’s effect to education. The estimate is biased.

The bias formula: Bias ≈ γ × ρ(X, Z), where γ is the effect of the omitted variable Z on Y, and ρ is how correlated Z is with X. If either is zero, there is no bias.

Move the sliders to see how the biased estimate (dashed red) diverges from the true line (teal).
[Interactive chart. Sliders: correlation (default 0.00; how correlated the omitted variable is with X) and effect (default 0.0; how much the omitted variable shifts Y).]

Key Takeaways

  • Bias = γ × δ̂, where δ̂ is the coefficient from regressing Z on X. If the omitted variable is uncorrelated with X (ρ = 0), there is no bias even if Z strongly affects Y. The sketch below verifies this by simulation.
  • You can sign the bias. If Z has a positive effect on Y (γ > 0) and is positively correlated with X (ρ > 0), β̂ will be upward biased. Knowing the direction helps even if you can’t measure Z.
  • Adding controls reduces OVB only if they proxy the omitted variable. Adding irrelevant controls doesn’t help and can introduce “bad control” bias.
  • This is why randomization is valuable. Random assignment of X makes ρ(X, Z) ≈ 0 for all omitted Z, eliminating OVB by design.
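
A simulation sketch of the bias formula (numpy assumed; β, γ, and ρ are made up): the short regression that omits Z lands almost exactly on β + γ·δ̂.

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta, gamma, rho = 50_000, 1.0, 2.0, 0.6

x = rng.normal(size=n)
z = rho * x + np.sqrt(1 - rho**2) * rng.normal(size=n)  # omitted variable
y = beta * x + gamma * z + rng.normal(size=n)

b_short = np.polyfit(x, y, 1)[0]    # regression omitting z: biased
delta = np.polyfit(x, z, 1)[0]      # auxiliary regression of z on x

print(f"short-regression slope: {b_short:.3f}")
print(f"beta + gamma * delta:   {beta + gamma * delta:.3f}")  # matches
```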

Heteroskedasticity

Regression · Diagnostics

Homoskedasticity means the variance of regression residuals is constant across all values of X. Heteroskedasticity means it isn’t — the scatter fans out (or in) as X changes.

OLS is still unbiased under heteroskedasticity, but standard errors are wrong. The usual SEs are too small where variance is high, making your t-stats too large and p-values too small. The fix is heteroskedasticity-robust standard errors (HC1/HC2/HC3).

Move the severity slider to see the residual fan grow.
[Interactive chart. Severity slider (default 0.0): 0 = homoskedastic, 2 = severe fanning.]

Key Takeaways

  • OLS coefficients are still unbiased. Heteroskedasticity affects inference (standard errors, t-stats, p-values), not the point estimates themselves.
  • Always use robust standard errors. In R: lm_robust() or coeftest(m, vcov=vcovHC(m)). In Stata: reg y x, robust. In Python: see the sketch below. The cost is zero; there is no reason not to.
  • Visual check: residuals vs. fitted values plot. If the spread of residuals increases with fitted values, heteroskedasticity is likely. The Breusch–Pagan or White test provides a formal check.
  • Weighted Least Squares (WLS) can be more efficient than OLS when the form of heteroskedasticity is known, by down-weighting high-variance observations.
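
In Python the same fix is one argument to statsmodels (a sketch with simulated fanning data; HC1 is the flavor that mirrors Stata's `robust`):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 500
x = rng.uniform(0, 10, size=n)
y = 1 + 0.5 * x + rng.normal(scale=0.3 * x, size=n)  # residual SD grows with x

X = sm.add_constant(x)
classical = sm.OLS(y, X).fit()                 # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")      # heteroskedasticity-robust

print(f"slope SE, classical: {classical.bse[1]:.4f}")
print(f"slope SE, robust:    {robust.bse[1]:.4f}")   # larger here, and valid
```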

The Regression Weighting Problem

Aronow & Samii (2016) — Econometrics · Program Evaluation

When you run OLS with control variables, you may assume you are estimating the average treatment effect (ATE) across your full sample. Aronow and Samii (2016) show this is not the case. OLS implicitly assigns each observation a weight based on how much the treatment varies conditional on the covariates. Observations where treatment is nearly perfectly predicted by covariates receive very little weight — they barely influence the estimate at all. The result is that OLS identifies an effect for a subset of the data that may look very different from the full sample.

The core insight: OLS estimates a weighted average treatment effect where the weight for unit i is proportional to the conditional variance of treatment given covariates — not a simple average over the full sample.

The Math

Let Di be the treatment indicator and Xi be covariates. Define the OLS residual from regressing treatment on covariates:

êᵢ = Dᵢ − E[Dᵢ | Xᵢ]

The OLS estimator of the treatment coefficient is a weighted average of unit-level treatment effects, with weights:

wᵢ = êᵢ² / Σⱼ êⱼ²

Units whose treatment status is nearly determined by their covariates (êᵢ ≈ 0) receive near-zero weight. The effective sample — the observations actually driving your estimate — can be far smaller and systematically different from your full sample.
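
A sketch of the computation on simulated data (numpy/statsmodels assumed; the logistic selection rule is made up, and the Kish formula is just one common way to summarize an effective sample size):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 400
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-2.5 * x)))  # treatment predicted by x

# Residualize treatment on covariates: e_hat = D - E[D|X] (linear projection)
e_hat = sm.OLS(d, sm.add_constant(x)).fit().resid
w = e_hat**2 / np.sum(e_hat**2)                  # Aronow-Samii weights

n_eff = 1.0 / np.sum(w**2)                       # Kish effective sample size
print(f"effective n ~ {n_eff:.0f} out of {n}")
```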

Why It Matters

  1. External validity: Your estimate may generalize only to the effective sample, not the full population you studied.
  2. Heterogeneous effects: If treatment effects vary by covariates, OLS does not give you the ATE — it gives you the treatment effect for the people in the middle of the covariate distribution.
  3. Reporting: You should report who is in your effective sample alongside your estimates.

Simulation: Drag to See the Problem

The slider controls how strongly a covariate predicts treatment — simulating increasing selection bias. Drag it right to watch the effective sample shrink and diverge from the full sample in real time.

[Interactive charts: "AS Weight by Propensity Score" and "Where OLS Gets Its Identification." A separation slider (default 0.0, random assignment) controls how strongly the covariate predicts treatment; a readout reports the effective n out of 400, the share of the sample driving the estimate.]
At low separation (slider left): treated and control look alike, propensity scores cluster near 0.5, weights are roughly uniform — OLS estimates close to the ATE across the full sample.

At high separation (slider right): treated and control barely overlap. Only observations in the narrow middle zone identify the effect — shown in blue on the right. The gray mass of the full sample gets near-zero weight and effectively disappears from the estimate.

Real Data: LaLonde (1986)

The LaLonde dataset from the National Supported Work (NSW) job training experiment contains 445 observations — 185 treated and 260 control — with covariates including age, education, race, marital status, and prior earnings. The charts below show how OLS weights are distributed and how the effective sample differs from the full sample.

Aronow, P. M., & Samii, C. (2016). Does regression produce representative estimates of causal effects? American Journal of Political Science, 60(1), 250–267.  ·  LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. American Economic Review, 76(4), 604–620.

Production Possibilities & Comparative Advantage

Microeconomics · International Trade

Every choice has a cost. The Production Possibilities Frontier (PPF) shows all efficient combinations of two goods a country can produce with its current resources. Points on the curve are efficient; inside is wasteful; outside is unattainable — unless there is trade.

Comparative advantage says: even if one country is better at producing everything, both benefit by specializing in the good with the lower opportunity cost. Click the button to see consumption expand beyond the PPF.

[Interactive chart: two-country PPFs with a specialize-and-trade toggle.]

Key Takeaways

  • OC, not absolute productivity, drives comparative advantage. A has CA in wheat (OC = 0.5 cloth < 2). B has CA in cloth (OC = 0.5 wheat < 2). Even if one country is better at both goods, trade still benefits both.
  • Specialization + trade pushes consumption outside the PPF. After trading at terms between both OC ratios (here 1 : 1), both countries consume bundles that were individually unattainable.
  • The PPF slope = −(OC of the good on the x-axis). A steeper PPF means a higher OC — comparative disadvantage in that good.
  • Trade is not zero-sum. Specialization creates new value; gains accrue to both sides simultaneously.

Demand

Microeconomics · Consumer Behavior

The demand curve shows how many units consumers are willing and able to buy at each price. Generally, as price rises, people buy less — that’s the Law of Demand. Use the price slider to move along the curve and see how quantity demanded responds.

But the whole curve can also shift. When something other than price changes — income, tastes, prices of related goods, or expectations — demand at every price changes simultaneously. Use the demand shift slider to see a rightward shift (increase) or leftward shift (decrease).

The Demand Curve

[Interactive chart. Price slider (default $40): movement along the curve; as price rises, quantity demanded falls. Demand shift slider (default Baseline): shifts the entire curve; right = demand increases, left = demand decreases.]
  • Income. For normal goods, rising income increases demand (right shift). For inferior goods, rising income decreases demand as consumers trade up.
  • Prices of substitutes. If a substitute becomes more expensive, demand for this good rises. If it becomes cheaper, demand falls.
  • Prices of complements. If a good used alongside this one gets cheaper, demand for this good rises; more expensive, demand falls.
  • Tastes and preferences. Health trends, advertising, or fashion can shift demand in either direction without any price change.
  • Expectations. If consumers expect higher prices in the future, they buy more now — shifting current demand right.
  • Number of buyers. More consumers in the market means higher demand at every price.
  • Law of Demand. All else equal, a higher price means lower quantity demanded. This is why demand curves slope downward. Drag the price slider up — the point moves left (less is demanded).
  • Movement along the curve vs. a shift. A price change moves you along the existing curve. A change in income, tastes, or related prices shifts the entire curve. These are fundamentally different events.
  • Substitutes and complements. If a substitute gets more expensive, consumers switch here — demand rises. If a complement gets more expensive, people use less of both — demand falls.
  • Normal vs. inferior goods. For most goods, higher income shifts demand right. For inferior goods (like instant noodles), higher income shifts demand left as people upgrade.
  • Expectations are part of demand. Anticipated price increases shift today’s demand right. Demand is forward-looking, not just about current prices.
Practice Questions
[Interactive quiz: 5 questions]

Supply

Microeconomics · Producer Behavior

The supply curve shows how many units producers are willing and able to sell at each price. Generally, as price rises, producing becomes more profitable and producers offer more — that’s the Law of Supply. Use the price slider to move along the curve and see how quantity supplied responds.

The entire curve can also shift. When input costs change, technology improves, the number of sellers changes, or government policy shifts — supply at every price changes simultaneously. Use the supply shift slider to see a rightward shift (more supply) or leftward shift (less supply).

The Supply Curve

[Interactive chart. Price slider (default $40): movement along the curve; as price rises, quantity supplied increases. Supply shift slider (default Baseline): shifts the entire curve; right = supply increases (lower costs), left = supply decreases (higher costs).]
  • Input (resource) prices. Cheaper labor, materials, or energy lowers production costs — supply increases (right shift). More expensive inputs shift supply left.
  • Technology. Better production technology lowers costs and lets producers profitably offer more at every price — supply increases (right shift).
  • Number of sellers. More firms entering the market increases market supply. Firms exiting decreases it.
  • Government policies. Subsidies lower costs and increase supply. Taxes raise costs and decrease supply.
  • Expectations. If producers expect higher future prices, they may withhold supply now, decreasing current supply.
  • Prices of related goods in production. If a substitute in production becomes more profitable, producers switch, decreasing supply of this good.
  • Law of Supply. All else equal, a higher price means higher quantity supplied. Supply curves slope upward because higher prices make production more profitable. Drag the price slider up — the point moves right (more is supplied).
  • Movement along the curve vs. a shift. A change in the good’s own price moves you along the supply curve. A change in costs, technology, or the number of sellers shifts the entire curve.
  • Lower costs shift supply right. When inputs become cheaper or technology improves, producers can profitably supply more at every price. This is what long-run technological progress looks like.
  • The supply curve is a marginal cost curve. A producer only supplies a unit if the price covers the marginal cost of producing it. The supply curve tells you the minimum price needed to bring each additional unit to market.
  • Market supply = sum of firm supply curves. The market supply curve is the horizontal sum of all individual firm supply curves. More sellers means a larger total supply at every price.
Practice Questions
[Interactive quiz: 5 questions]

Market Equilibrium

Microeconomics · Market Equilibrium

Imagine this: Apple announces a new iPhone — millions of people suddenly want one. That’s a demand shift: more people want the product at every price. Now imagine oil prices spike, making it expensive to run factories and ship goods. That’s a supply shift: producers offer fewer units at every price. In both cases, the market adjusts until supply meets demand again.

A market equilibrium is the price and quantity where buyers and sellers agree — where the amount consumers want to buy exactly equals the amount producers want to sell. Use the buttons below to shift the curves and see what happens!

Supply & Demand

[Interactive chart. Demand slider (default 100): higher = more demand at every price (curve shifts right/up). Supply slider (default 15): lower = more supply at every price (curve shifts right/down).]
  • Demand shift vs. movement along demand. A demand shift happens when something other than price changes — income, tastes, prices of related goods, expectations, or the number of buyers. Moving along the curve is just a response to a price change. Try “Demand ↑” to see the whole curve move!
  • Supply shift vs. movement along supply. Same idea: a supply shift happens when costs change, technology improves, or the number of sellers changes. “Supply ↑” simulates producers becoming more efficient (lower costs = more offered at every price).
  • The four core results. Demand ↑ → P↑ Q↑.   Demand ↓ → P↓ Q↓.   Supply ↑ → P↓ Q↑.   Supply ↓ → P↑ Q↓. These four cases are the foundation of ECO 2020.
  • Shortage vs. surplus. If price is below equilibrium, consumers want more than producers offer — a shortage. Sellers raise prices until the market clears. Above equilibrium: a surplus drives prices back down. Equilibrium is stable.
  • Both curves shift. Real markets often have both curves shifting simultaneously. Try moving both sliders at once. When both increase, quantity rises for certain — but the price change is ambiguous and depends on which shift is larger.
Practice Questions
[Interactive quiz: 5 questions]

Consumer & Producer Surplus

Microeconomics · Welfare

Markets don’t just clear — they create value. Consumer surplus (CS) is what buyers gain: the difference between what they were willing to pay and what they actually paid. Producer surplus (PS) is what sellers gain: the difference between the price they received and their minimum willingness to accept. Together, CS + PS is total surplus — the total economic value created by trade.

The competitive equilibrium maximizes total surplus. Any price above or below equilibrium destroys some of that value as deadweight loss.

Move the price slider above or below equilibrium to see what happens.
[Interactive chart. Price slider (default $7.00): below equilibrium = price ceiling; above = price floor.]

Key Takeaways

  • CS = area above price, below demand curve. For a linear demand P = a − bQ, CS at price P* is ½(a − P*) × Q*.
  • PS = area below price, above supply curve. For a linear supply P = c + dQ, PS at price P* is ½(P* − c) × Q*.
  • Price controls create deadweight loss. A binding ceiling (below equilibrium) causes a shortage; a binding floor (above equilibrium) causes a surplus. Both prevent mutually beneficial trades.
  • Total surplus is maximized at competitive equilibrium. This is the First Welfare Theorem: free markets allocate resources efficiently (under idealized conditions).
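
For linear curves the areas are simple triangles. A sketch with made-up coefficients (chosen so the equilibrium lands at $7, like the slider's midpoint):

```python
# Demand P = a - b*Q, supply P = c + d*Q (hypothetical coefficients)
a, b, c, d = 12.0, 0.05, 2.0, 0.05

q_star = (a - c) / (b + d)         # equilibrium quantity: demand = supply
p_star = a - b * q_star            # equilibrium price

cs = 0.5 * (a - p_star) * q_star   # area above price, below demand
ps = 0.5 * (p_star - c) * q_star   # area below price, above supply

print(f"Q* = {q_star:.0f}, P* = {p_star:.2f}")
print(f"CS = {cs:.1f}, PS = {ps:.1f}, total surplus = {cs + ps:.1f}")
```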

Elasticity

Microeconomics · Price Elasticity of Demand

Imagine this: A gas station raises prices by 10%. Most drivers still fill up — they need gas to get to work. Quantity demanded barely changes. Now a clothing store raises prices by 10%. Many shoppers switch to competitors or wait for a sale. Quantity demanded drops sharply.

Price Elasticity of Demand measures how sensitive consumers are to price: elasticity = % change in Qd ÷ % change in P. |elasticity| > 1 = elastic (responsive); |elasticity| < 1 = inelastic (unresponsive).

Drag the price slider to see how elasticity changes along the curve. The shaded rectangle is total revenue (P × Q) — watch how it changes!
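
A sketch of point elasticity along one made-up linear demand curve (Q = 100 − P), showing the elastic-to-inelastic transition and what it does to revenue:

```python
# Point elasticity = (dQ/dP) * P/Q; for Q = 100 - P, dQ/dP = -1
for p in [20, 50, 80]:
    q = 100 - p
    eps = -1.0 * p / q
    tr = p * q                                   # total revenue
    kind = "elastic" if abs(eps) > 1 else ("unit" if abs(eps) == 1 else "inelastic")
    print(f"P = {p}: Q = {q}, elasticity = {eps:+.2f} ({kind}), TR = {tr}")
# TR peaks at the unit-elastic midpoint (P = 50) and falls on either side.
```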

Demand Curve — shaded area = Total Revenue (P × Q)

[Interactive chart. Price slider (default $60): move along the demand curve and watch elasticity and total revenue change. Slope setting (default Moderate): steeper = less responsive to price (more inelastic); flatter = more responsive (more elastic).]
  • Price Elasticity of Demand = % ΔQd ÷ % ΔP. A value of −2 means a 1% price increase causes a 2% drop in quantity demanded. Price Elasticity of Demand is always negative (higher price → lower Qd). We compare the absolute value to 1 to classify elasticity.
  • Elastic vs. inelastic. Absolute value > 1: elastic (price-sensitive). Absolute value < 1: inelastic. Absolute value = 1: unit elastic. Drag the price slider toward the top of the curve — elastic. Near the bottom — inelastic. Elasticity varies along a single linear curve!
  • Elasticity and total revenue (TR = P × Q). Elastic demand: price ↑ → TR ↓ (Qd falls more than price rose). Inelastic demand: price ↑ → TR ↑ (Qd barely falls). Watch the shaded rectangle shrink or grow as you move the price slider.
  • Determinants of elasticity. More elastic when: many substitutes, luxury good, large share of income, time to adjust. More inelastic when: few substitutes, necessity, small share of income, urgently needed.
  • Slope ≠ elasticity. A steeper demand curve is not the same as a more inelastic one at every point. A linear curve changes from elastic to inelastic as you move down it — even though the slope is constant throughout.
Practice Questions
[Interactive quiz: 5 questions]

Price Controls

Microeconomics · Government Intervention

Governments sometimes override market prices. A price ceiling caps the price below equilibrium (e.g., rent control, gas-price caps) — causing a shortage. A price floor sets a minimum price above equilibrium (e.g., minimum wage, farm price supports) — causing a surplus. Both prevent the market from clearing and create deadweight loss.

Drag below $7 for a ceiling — drag above $7 for a floor.
[Interactive chart. Price slider (default $7.00): below $7 = price ceiling · above $7 = price floor.]

Key Takeaways

  • Non-binding controls have no effect. A ceiling above P* or a floor below P* is ignored by the market.
  • Price ceilings create shortages. Qd > Qs. The quantity traded is limited by supply. Examples: rent control, wartime rationing.
  • Price floors create surpluses. Qs > Qd. The quantity traded is limited by demand. Examples: minimum wage, agricultural supports.
  • Both create DWL. The prevented trades — the DWL triangle — represent mutually beneficial exchanges that can’t happen because the price is stuck at the wrong level.

Monopoly vs. Perfect Competition

Microeconomics · Market Structure

Market structure determines welfare. A competitive firm is a price-taker: it sets P = MC, maximizing total surplus. A monopolist faces the entire demand curve and maximizes profit by equating marginal revenue to MC — restricting output and raising price above MC. The gap between Qm and Qc is pure deadweight loss: units buyers value more than MC that simply don’t get produced.

Move the MC slider to see how cost conditions change the efficiency gap.
[Interactive chart. MC slider (default $2.00): lower MC → wider Qm–Qc gap · higher MC → smaller DWL.]

Key Takeaways

  • MR < P for a monopolist. To sell one more unit the monopolist must lower price on all units. For P = a − bQ, MR = a − 2bQ — same intercept, double the slope.
  • Qm < Qc, Pm > MC. The monopolist produces less and charges more than a competitive market would.
  • DWL = ½(Pm − MC)(Qc − Qm). This triangle is the value of unrealized mutually beneficial trades. A numeric check follows this list.
  • Market power is about price-setting, not just profit. Antitrust policy targets the DWL caused by output restriction, not profit itself.
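
A numeric check of those formulas (a sketch; the demand intercept and slope are assumptions chosen to match the $2 default MC and the numbers quoted in the price-discrimination section below):

```python
# Demand P = a - b*Q, constant marginal cost mc (hypothetical values)
a, b, mc = 10.0, 0.1, 2.0

q_m = (a - mc) / (2 * b)            # monopolist: set MR = a - 2bQ equal to MC
p_m = a - b * q_m
q_c = (a - mc) / b                  # competition: P = MC
dwl = 0.5 * (p_m - mc) * (q_c - q_m)

print(f"monopoly:    Q = {q_m:.0f}, P = {p_m:.2f}")
print(f"competitive: Q = {q_c:.0f}, P = {mc:.2f}")
print(f"DWL = {dwl:.0f}")           # units valued above MC that go unproduced
```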

Externalities & Pigouvian Policy

Microeconomics · Market Failures

When transactions affect third parties, markets fail. A negative externality (e.g., pollution, congestion) means social cost exceeds private cost — the market over-produces. A positive externality (e.g., education, vaccination) means social benefit exceeds private benefit — the market under-provides. In both cases the private equilibrium is not socially efficient.

A Pigouvian tax (negative) or subsidy (positive) equal to the externality size internalizes it and restores efficiency. Toggle type and drag the slider.
[Interactive chart with a negative/positive externality toggle. Externality slider (default $0.00 per unit).]

Key Takeaways

  • Negative externalities cause over-production. MSC > MPC ⇒ the supply curve understates the true social cost. Market Q exceeds efficient Q.
  • Positive externalities cause under-provision. MSB > MPB ⇒ the demand curve understates the true social benefit. Market Q falls short of efficient Q.
  • Pigouvian tax/subsidy internalizes the externality. A tax = marginal external cost shifts supply up to MSC; a subsidy = marginal external benefit shifts demand up to MSB. In both cases the corrected equilibrium equals the social optimum.
  • The Coase theorem is an alternative. If property rights are well-defined and transaction costs are low, private bargaining can achieve the efficient outcome without government intervention.

Tax Incidence

Microeconomics · Policy

Who actually pays a tax? The legal answer (who writes the check) often differs from the economic answer (who bears the burden). A per-unit tax creates a wedge between the price buyers pay (Pb) and the price sellers receive (Ps): Pb − Ps = t.

The burden is shared according to relative elasticities: the more inelastic side bears more of the tax. If demand is perfectly inelastic, buyers pay 100%. If supply is perfectly inelastic, sellers bear 100%.

Drag the tax slider and watch the burden split and deadweight loss grow.
[Interactive chart. Tax slider (default $0.00): higher tax → larger wedge between buyer and seller price.]

Key Takeaways

  • Statutory incidence ≠ economic incidence. It doesn’t matter whether the tax is levied on buyers or sellers — the economic burden is determined solely by relative elasticities.
  • The more inelastic side bears more of the tax. If |εD| < |εS|, buyers pay more. Gasoline taxes fall mostly on consumers because fuel demand is inelastic.
  • Taxes create deadweight loss. By reducing quantity below the efficient level, some mutually beneficial trades are prevented. DWL grows with the square of the tax rate, as the sketch below shows.
  • Tax revenue peaks before DWL explodes. There is a Laffer-curve logic at the micro level: very high tax rates shrink the tax base so much that revenue may fall even as DWL continues to grow.
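
A sketch of the wedge algebra for linear curves (made-up slopes; with equal slopes the burden splits 50/50, and the slope ratio stands in for the elasticity ratio):

```python
# Demand P = a - b*Q, supply P = c + d*Q, per-unit tax t (hypothetical values)
a, b, c, d, t = 12.0, 0.05, 2.0, 0.05, 2.0

q0 = (a - c) / (b + d)          # quantity without the tax
qt = (a - c - t) / (b + d)      # quantity with the tax wedge
pb = a - b * qt                 # price buyers pay
ps = pb - t                     # price sellers receive
p0 = a - b * q0                 # pre-tax equilibrium price

dwl = 0.5 * t * (q0 - qt)       # grows with the square of t
print(f"Pb = {pb:.2f}, Ps = {ps:.2f}, buyers bear {(pb - p0) / t:.0%} of the tax")
print(f"Q: {q0:.0f} -> {qt:.0f}, DWL = {dwl:.1f}")
```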

Public Goods & the Free-Rider Problem

Microeconomics · Market Failures

Public goods are non-rival and non-excludable. Once provided, no one can be stopped from consuming them, and one person’s consumption doesn’t reduce availability for others. Private markets systematically under-provide them because everyone wants someone else to pay — the free-rider problem.

For public goods, efficiency requires vertical summation of demand: at any quantity, the social marginal benefit equals the sum of each individual’s marginal benefit. The Nash equilibrium falls short of this optimum.
[Interactive chart. MC slider (default $3.00). Efficient Q = intersection of Social MB (purple) and MC (amber) · try MC = $4–$8 to see free-rider DWL grow.]

Key Takeaways

  • Private goods: horizontal summation. At each price, sum quantities demanded across consumers. This gives market demand.
  • Public goods: vertical summation. At each quantity, sum marginal benefits across consumers (all enjoy the same unit). This gives social MB; the sketch after this list does the summation numerically.
  • The free-rider problem causes under-provision. In Nash equilibrium, the lower-valuation consumer free-rides entirely; the dominant consumer stops at their own MB = MC, well below the social optimum.
  • Remedies: government provision, corrective subsidies, or Coasian bargaining. If property rights can be assigned and transaction costs are low, negotiated cost-sharing can achieve Q*.
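
A tiny numeric sketch of vertical summation (numpy assumed; the two marginal-benefit schedules are made up, and MC = $5 sits in the $4–$8 range the demo suggests):

```python
import numpy as np

q = np.linspace(0, 10, 2001)
mb1 = np.clip(10 - q, 0, None)     # consumer 1's marginal benefit (hypothetical)
mb2 = np.clip(6 - q, 0, None)      # consumer 2's marginal benefit (hypothetical)
mc = 5.0

social_mb = mb1 + mb2              # vertical sum: everyone enjoys the same unit
q_eff = q[np.argmin(np.abs(social_mb - mc))]   # efficient Q: sum of MBs = MC
q_nash = q[np.argmin(np.abs(mb1 - mc))]        # consumer 1 alone; 2 free-rides

print(f"efficient Q = {q_eff:.2f}, private (free-rider) Q = {q_nash:.2f}")
```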

Price Discrimination

Microeconomics · Monopoly Pricing

Can a monopolist do better than a single price? Price discrimination means charging different prices to different buyers or for different units. Under a single price, the monopolist restricts output and creates DWL. Under perfect (1st-degree) price discrimination, each buyer pays exactly their willingness to pay — output expands to the competitive level, DWL disappears, but all consumer surplus is extracted.

Toggle between regimes to see how the efficiency–equity trade-off plays out.

[Interactive chart: single-price monopoly vs. perfect price discrimination.]

Key Takeaways

  • Single-price: DWL but some CS. The monopolist charges everyone P = $6. Buyers with WTP > $6 keep consumer surplus; units 40–80 go unsold (DWL = $80).
  • Perfect discrimination: efficient but no CS. Every unit is sold at the buyer’s WTP. Output reaches the competitive level (Q = 80, P = MC = $2). DWL = $0, but CS = $0 too.
  • Real-world discrimination is imperfect (2nd & 3rd degree). Airlines, coupons, student/senior discounts, and bundling are all partial forms that partially close the DWL gap without extracting all CS.
  • Price discrimination requires market power. Competitive firms cannot price-discriminate: buyers would simply switch to a cheaper rival.
Practice Questions
[Interactive quiz: 4 questions]

Simple Exponential Smoothing

Business Forecasting · Exponential Smoothing

The smoothing parameter α controls how quickly the SES forecast responds to new information. A high α (near 1) places almost all weight on the most recent observation — the forecast tracks the data closely but is noisy. A low α (near 0) spreads weight broadly across all past observations — the forecast is very smooth but slow to react to level changes.

The chart shows 36 months of retail sales data (blue) alongside the SES one-step-ahead forecast (red). Drag the α slider and watch how the forecast’s responsiveness changes.

Retail Sales vs. SES One-Step-Ahead Forecast

[Interactive chart. α slider (default 0.30): low α = smooth, slow forecast; high α = reactive, closely tracking forecast.]
  • Initialize: set the level equal to the first observation: ℓ1 = y1.
  • Update level: ℓt = α yt + (1 − α) ℓt−1 — a weighted average of the new observation and the prior level.
  • Forecast: ŷt+1|t = ℓt — all future forecasts equal the current level (flat forecast line).
  • Why “exponential”? Substituting the level equation into itself repeatedly shows that the weights decay geometrically: α, α(1−α), α(1−α)², …
  • The optimal α is estimated by minimizing the sum of squared one-step forecast errors (SSE).
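
A minimal implementation sketch of the recursion (numpy assumed; the series is made up and `ses_forecast` is an illustrative helper):

```python
import numpy as np

def ses_forecast(y, alpha):
    """Return the level l_t after each observation; y_hat(t+1) = l_t."""
    level = y[0]                      # initialize: l_1 = y_1
    levels = [level]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level   # update the level
        levels.append(level)
    return np.array(levels)

y = np.array([112.0, 118, 132, 129, 121, 135, 148, 148, 136, 119])
print(ses_forecast(y, alpha=0.3).round(1))   # smooth, lagging at alpha = 0.3
print(ses_forecast(y, alpha=0.9).round(1))   # reactive, tracks y closely
```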

Moving Average Smoother

Business Forecasting · Decomposition

A moving average of order k replaces each observation with the simple average of the most recent k observations. Choosing k equal to the seasonal period (4 for quarterly data, 12 for monthly) averages exactly one full cycle — eliminating the seasonal pattern and revealing the underlying trend.

This series has 48 quarters of data with a clear seasonal swing and upward trend. Try k = 4 to watch the seasonal pattern disappear. At larger k, the trend itself is smoothed — but the smoother lags further behind the data.
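
A sketch of the trailing MA(k) described above (numpy assumed; the toy series is four years of quarterly data with a zero-sum seasonal swing):

```python
import numpy as np

def moving_average(y, k):
    """Trailing MA(k): average of the k most recent observations.
    The first k - 1 positions have no full window and stay NaN."""
    out = np.full(len(y), np.nan)
    out[k - 1:] = np.convolve(y, np.ones(k) / k, mode="valid")
    return out

t = np.arange(16)
seasonal = np.array([8, -3, -9, 4] * 4)      # repeats every 4 quarters, sums to 0
y = 100 + 2 * t + seasonal                   # upward trend plus seasonality

print(moving_average(y, 4).round(1))  # k = 4 averages one full cycle: swing gone
```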

Quarterly Sales: Original Series vs. MA(k) Smoother

[Interactive chart. k slider (default 1): k = 1 is no smoothing; k = 4 removes quarterly seasonality; k = 12 is heavy smoothing.]

ACF / Correlogram Explorer

Business Forecasting · Time Series Graphics & ARIMA

The autocorrelation function (ACF) measures how correlated a time series is with its own past values. The bar at lag k shows rk — the correlation between yt and yt−k. Different data-generating processes produce distinctive ACF patterns, and learning to read them is the foundation of ARIMA model identification.

Select a series type below. Red dashed lines are 95% significance bounds (±1.96 / √T). Colored bars exceed the bounds and indicate significant autocorrelation at that lag.
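
A sketch of the sample ACF and its bounds (numpy assumed; `sample_acf` is an illustrative helper, and the AR(1) series is simulated):

```python
import numpy as np

def sample_acf(y, max_lag=20):
    y = y - y.mean()
    denom = np.sum(y**2)
    return np.array([np.sum(y[k:] * y[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(9)
T, phi = 400, 0.7
y = np.zeros(T)
for t in range(1, T):                 # AR(1): y_t = phi * y_{t-1} + noise
    y[t] = phi * y[t - 1] + rng.normal()

bound = 1.96 / np.sqrt(T)             # the 95% significance bounds
for k, r in enumerate(sample_acf(y)[:6]):
    flag = "significant" if abs(r) > bound else ""
    print(f"lag {k:2d}: r = {r:+.3f} {flag}")   # decays roughly like phi^k
```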

[Interactive chart: sample ACF, lags 0–20, for the selected series type.]

  • AR(p): ACF tails off with exponential decay; PACF cuts off sharply after lag p.
  • MA(q): ACF cuts off sharply after lag q; PACF tails off.
  • Non-stationary (unit root): ACF decays very slowly toward zero — first-difference the series before modeling.
  • Seasonal pattern: Significant spikes at lags m, 2m, 3m, … (where m is the seasonal period).
  • White noise residuals: All ACF bars inside the 95% bounds — the model has captured all structure.