Regression Discontinuity Design

Lecture 9

What if treatment is assigned by a rule: everyone above a threshold gets it and everyone below does not?

Near the cutoff, treatment is as good as randomly assigned.

People just above and just below a threshold are, on average, nearly identical in both observable and unobservable characteristics. The only systematic difference between them is whether they received treatment. This near-randomness at the margin is the source of identification.

Examples:

Scholarship at GPA ≥ 3.0. Students just below 3.0 are the control group for students just above.
Medicare eligibility at age 65. People who just turned 65 vs. those who are 64.
Angrist & Lavy (1999): class size rule (Maimonides’ rule), a new class opens when enrollment exceeds 40.

The running variable (also called the forcing variable) is the score that determines treatment. The cutoff c is the threshold.

Sharp RDD: treatment switches deterministically at the cutoff.

D_i = 1[X_i ≥ c]

Everyone with running variable X_i ≥ c is treated, everyone below is not. Treatment is a deterministic step function of the running variable.

The RDD estimand is the treatment effect at the cutoff:

τ_RDD = lim_x↓c E[Y_i | X_i = x] − lim_x↑c E[Y_i | X_i = x]

This is the jump in the conditional mean of Y at the cutoff: the gap between the right and left limits. We never observe both limits for the same unit, so we estimate each one by extrapolating from nearby observations on its side of the cutoff.
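To make the limit definition concrete, here is a minimal Python sketch that assumes a hypothetical conditional mean function (a smooth quadratic plus a jump of τ = 2 at c = 0). Real data never let us evaluate E[Y | X = x] directly; the point is only that the right-minus-left gap converges to τ as we approach c from both sides. The function names are illustrative.

```python
def cond_mean(x, c=0.0, tau=2.0):
    """Hypothetical E[Y | X = x]: smooth in x except for a jump of tau at c."""
    d = 1 if x >= c else 0  # sharp assignment rule D = 1[X >= c]
    return 1.0 + 0.5 * x + 0.3 * x ** 2 + tau * d


def gap_at_cutoff(f, c, eps):
    """Approximate lim_{x↓c} f(x) - lim_{x↑c} f(x) by stepping eps to each side of c."""
    return f(c + eps) - f(c - eps)


for eps in (0.1, 0.01, 0.001):
    print(f"eps={eps}: gap={gap_at_cutoff(cond_mean, 0.0, eps):.4f}")
```

With this quadratic the gap equals 2 + eps (printed as 2.1000, 2.0100, 2.0010), so it approaches τ = 2 as eps shrinks.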

The key assumption: continuity.

Continuity assumption

The conditional expectation functions E[Y(0) | X] and E[Y(1) | X] are continuous in X at c. In the absence of treatment, the expected outcome would have changed smoothly with X through the cutoff.

What this rules out.

Any other variable that also jumps at exactly the same cutoff (compound treatment).
Manipulation: units sorting to just above (or below) the cutoff to select into treatment.

What it does NOT require.

The conditional expectation functions need not be continuous in X everywhere, only at c.
No assumption about unobservables far from the cutoff.

Local interpretation.

RDD identifies the treatment effect only for units at the margin, those with X_i ≈ c. External validity to units far from the cutoff requires additional assumptions.

Estimate the jump using local linear regression on each side of the cutoff.

Fit a separate linear regression in X on each side, using only observations within a bandwidth h of the cutoff:

Y_i = α + τD_i + β₁(X_i − c) + β₂D_i(X_i − c) + u_i

τ̂ is the estimated jump at the cutoff. The interaction term D_i(X_i − c) allows the slope to differ on each side.
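A minimal pure-Python sketch of this estimator, assuming a uniform (rectangular) kernel and simulated data with a true jump of 2 at c = 0. Because the regression is fully interacted, it is equivalent to running separate OLS fits on each side, so the sketch fits each side separately and maps the results back to α, τ, β₁, β₂. The data-generating process and the helper names ols and local_linear_rd are illustrative assumptions, not part of any library.

```python
import random


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def local_linear_rd(xs, ys, c, h):
    """Uniform-kernel local linear RD within bandwidth h; returns (alpha, tau, beta1, beta2)."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    a_left, b_left = ols(*zip(*left))     # intercept = fitted left limit at c
    a_right, b_right = ols(*zip(*right))  # intercept = fitted right limit at c
    return a_left, a_right - a_left, b_left, b_right - b_left


# Simulated sharp RDD: smooth quadratic in X plus a jump of 2.0 at c = 0
random.seed(1)
c, true_tau = 0.0, 2.0
xs = [random.uniform(-1, 1) for _ in range(2000)]
ys = [1 + 0.5 * x + 0.3 * x ** 2 + (true_tau if x >= c else 0.0) + random.gauss(0, 0.5) for x in xs]

alpha, tau_hat, beta1, beta2 = local_linear_rd(xs, ys, c, h=0.3)
print(f"tau_hat = {tau_hat:.3f}")
```

Centering the running variable at c is what makes each intercept the fitted value at the cutoff, so τ̂ is simply the difference of the two intercepts.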

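Why local linear rather than local constant (Nadaraya-Watson) kernel regression? Local linear regression has better boundary properties. A local constant estimator compares average outcomes on each side, so any slope in E[Y | X] biases it at the edge of its support, and the cutoff is always a boundary point. Local linear regression removes this first-order boundary bias.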
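A small deterministic illustration of the boundary problem, using hypothetical noise-free data with a steep slope and no jump at all at c = 0. The local constant estimate is just the difference in window means; the local linear estimate is the difference in fitted intercepts. The helper ols is an illustrative assumption.

```python
def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Deterministic grid on [-1, 1] with slope 2 and NO jump at c = 0
xs = [i / 100 for i in range(-100, 101)]
ys = [1 + 2 * x for x in xs]
c, h = 0.0, 0.2
left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]

# Local constant: difference in mean Y between the two windows
local_constant = sum(y for _, y in right) / len(right) - sum(y for _, y in left) / len(left)
# Local linear: difference in fitted intercepts at the cutoff
local_linear = ols(*zip(*right))[0] - ols(*zip(*left))[0]
print(f"local constant: {local_constant:.3f}, local linear: {local_linear:.6f}")
```

The local constant estimate reports a spurious jump of 0.41 (the slope of 2 times the 0.205 gap between the mean X in the two windows), while the local linear estimate is zero up to floating-point rounding.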

Higher-order polynomials (cubic, quartic) are sometimes used to capture curvature, but Gelman and Imbens (2019) warn that global high-order polynomials can badly overfit near the cutoff and produce misleading estimates.

Bandwidth choice trades off bias and variance. Use the Imbens-Kalyanaraman (IK) or Calonico-Cattaneo-Titiunik (CCT) optimal bandwidth.

Narrow bandwidth: observations are very similar to those at the cutoff → low bias, but few observations → high variance.

Wide bandwidth: more observations, lower variance, but units far from the cutoff may be systematically different → higher bias.
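The trade-off can be seen in a small Monte Carlo sketch. It assumes a hypothetical data-generating process with a true jump of 2 and curvature only to the right of the cutoff, uses a uniform-kernel local linear estimator (not the CCT procedure), and compares three fixed bandwidths. The helpers ols, rd_jump, and simulate are illustrative.

```python
import random
import statistics


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_jump(xs, ys, c, h):
    """Uniform-kernel local linear estimate of the jump at c."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    return ols(*zip(*right))[0] - ols(*zip(*left))[0]


def simulate(n, rng):
    """Hypothetical DGP: true jump 2.0 at c = 0, curvature only to the right of c."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [1 + 0.5 * x + (2.0 + 1.5 * x * x if x >= 0 else 0.0) + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(42)
estimates = {h: [] for h in (0.1, 0.4, 0.8)}
for _ in range(200):  # Monte Carlo replications
    xs, ys = simulate(1000, rng)
    for h in estimates:
        estimates[h].append(rd_jump(xs, ys, 0.0, h))

for h, est in estimates.items():
    print(f"h={h}: bias={statistics.mean(est) - 2.0:+.3f}, sd={statistics.stdev(est):.3f}")
```

Under this design the narrowest window should show a bias near zero but the largest standard deviation, and the widest window the smallest standard deviation but a clearly negative bias from the unmodeled curvature.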

The CCT (2014) method selects the mean-squared-error optimal bandwidth and provides a bias-corrected, robust confidence interval. This is now standard practice in applied work.

Robustness check: always report results at half and double the chosen bandwidth. Estimates should not change dramatically. If they do, the functional form assumption within the bandwidth is doing a lot of work.
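A minimal sketch of this check on simulated data. The chosen bandwidth h_star = 0.3 is an assumed stand-in for a data-driven (for example CCT) bandwidth; it is not computed here. The estimator is the same uniform-kernel local linear fit, and the helper names are illustrative.

```python
import random


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_jump(xs, ys, c, h):
    """Uniform-kernel local linear estimate of the jump at c."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    return ols(*zip(*right))[0] - ols(*zip(*left))[0]


# Hypothetical sharp RDD with a true jump of 2.0 at c = 0
random.seed(2)
xs = [random.uniform(-1, 1) for _ in range(4000)]
ys = [1 + 0.5 * x + 0.3 * x ** 2 + (2.0 if x >= 0 else 0.0) + random.gauss(0, 0.5) for x in xs]

h_star = 0.3  # assumed "chosen" bandwidth, standing in for a data-driven choice
robustness = {h: rd_jump(xs, ys, 0.0, h) for h in (h_star / 2, h_star, 2 * h_star)}
for h, est in robustness.items():
    print(f"h={h:.2f}: tau_hat={est:.3f}")
```

In applied work each row would also carry its own standard error, so that differences across bandwidths can be judged against sampling noise.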

Testing the validity of a regression discontinuity.

Density (manipulation) test, McCrary (2008) / rddensity.

If units can sort across the cutoff, there may be bunching of mass just above (or below) c.
Test for a discontinuity in the density of the running variable at c. A significant jump is evidence of manipulation.
Example: if an exam score is the running variable and teachers can round up borderline students, there will be too many students just above the passing threshold.
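The idea behind the density test can be sketched with a crude count comparison. This is not the McCrary (2008) estimator or rddensity, which fit local polynomial density estimates on each side of c; it simply compares counts in equal windows and is only sensible when the density is roughly flat near the cutoff. The function bunching_z and the manipulation scheme are hypothetical.

```python
import math
import random


def bunching_z(xs, c, h):
    """Crude check: z-statistic for excess mass in [c, c+h) relative to [c-h, c).

    Assumes the density is roughly flat within h of c, so that without
    manipulation each unit in the window is equally likely to fall on either side.
    """
    n_left = sum(1 for x in xs if c - h <= x < c)
    n_right = sum(1 for x in xs if c <= x < c + h)
    n = n_left + n_right
    # Binomial(n, 1/2) null: mean n/2, variance n/4
    return (n_right - n / 2) / math.sqrt(n / 4)


rng = random.Random(7)
clean = [rng.uniform(-1, 1) for _ in range(5000)]
# Hypothetical manipulation: every unit in [-0.03, 0) is nudged just over the cutoff
manipulated = [x + 0.03 if -0.03 <= x < 0 else x for x in clean]
print(f"clean z = {bunching_z(clean, 0.0, 0.1):.2f}, "
      f"manipulated z = {bunching_z(manipulated, 0.0, 0.1):.2f}")
```

When the density slopes near c, equal counts are not expected even without manipulation, which is why the real tests model the density on each side instead of comparing raw counts.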

Covariate smoothness test.

Run the RDD regression with predetermined covariates (age, gender, prior income) as the outcome.
There should be no jump in predetermined characteristics at c. Any jump suggests the cutoff is not cleanly separating comparable units.
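A minimal sketch of the covariate smoothness check: the same local linear RD estimator, with a hypothetical predetermined covariate (age) used as the outcome. In this simulation age varies smoothly with the running variable, so the estimated jump should be close to zero. The helpers and data are illustrative, and standard errors are omitted.

```python
import random


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_jump(xs, ys, c, h):
    """Uniform-kernel local linear estimate of the jump at c."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    return ols(*zip(*right))[0] - ols(*zip(*left))[0]


rng = random.Random(3)
xs = [rng.uniform(-1, 1) for _ in range(3000)]
# Hypothetical predetermined covariate that is smooth in the running variable
ages = [40 + 5 * x + rng.gauss(0, 1) for x in xs]
age_jump = rd_jump(xs, ages, 0.0, 0.3)
print(f"estimated jump in age at the cutoff: {age_jump:.3f}")
```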

Placebo cutoffs.

Run the same RDD at fake cutoffs away from the true c. There should be no jump. Significant jumps at placebo cutoffs undermine credibility.
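A sketch of placebo cutoffs on simulated data with a true jump of 2 at c = 0. Each placebo is estimated using only observations on one side of the true cutoff, so the real discontinuity cannot leak into a placebo window; the placebo locations (±0.5), bandwidth, and helper names are illustrative choices.

```python
import random


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_jump(xs, ys, c, h):
    """Uniform-kernel local linear estimate of the jump at c."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    return ols(*zip(*right))[0] - ols(*zip(*left))[0]


rng = random.Random(5)
xs = [rng.uniform(-1, 1) for _ in range(4000)]
ys = [1 + 0.5 * x + 0.3 * x ** 2 + (2.0 if x >= 0 else 0.0) + rng.gauss(0, 0.5) for x in xs]

# Untreated side only for the placebo below c, treated side only for the one above
below = [(x, y) for x, y in zip(xs, ys) if x < 0]
above = [(x, y) for x, y in zip(xs, ys) if x >= 0]
placebo = {
    -0.5: rd_jump(*zip(*below), -0.5, 0.3),
    0.5: rd_jump(*zip(*above), 0.5, 0.3),
}
true_cutoff = rd_jump(xs, ys, 0.0, 0.3)
print(placebo, true_cutoff)
```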

Always plot the data. A credible RDD should be visible to the naked eye.

The standard RDD graph bins observations into equal-width bins along the running variable and plots the bin means of Y. Superimpose the fitted local linear regression lines on each side.
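The binning step can be sketched as follows, assuming equal-width bins whose edges are anchored at c so that no bin mixes treated and untreated units. The function binned_means and the toy data are illustrative; plotting is left out.

```python
import math


def binned_means(xs, ys, c, width, lo, hi):
    """Equal-width bin means of Y on [lo, hi).

    Bin edges are c + k * width for integer k, so c is always an edge and no bin
    mixes treated and untreated units.
    """
    cells = {}
    for x, y in zip(xs, ys):
        if lo <= x < hi:
            k = math.floor((x - c) / width)
            total, count = cells.get(k, (0.0, 0))
            cells[k] = (total + y, count + 1)
    # (bin midpoint, mean of Y) pairs, ordered along the running variable
    return [(c + (k + 0.5) * width, total / count) for k, (total, count) in sorted(cells.items())]


xs = [-0.9, -0.6, -0.4, -0.1, 0.1, 0.3, 0.7]
ys = [1.0, 1.2, 1.1, 1.3, 2.4, 2.2, 2.6]
print(binned_means(xs, ys, c=0.0, width=0.5, lo=-1.0, hi=1.0))
```

In practice these midpoint and mean pairs are scattered and the local linear fits from each side are overlaid; rdplot in the rdrobust suite automates this and also offers data-driven choices for the number of bins.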

A convincing graph shows:

A smooth, continuous relationship between X and Y on each side of the cutoff.
A clear visual jump at c that matches the regression estimate in sign and rough magnitude.
No sharp jumps at other points in the support.

Red flags in the graph:

The data look noisy and the “jump” is not visible without the regression lines: the estimate may be fragile.
The fitted lines curve sharply near the cutoff, a sign that a high-order polynomial is driving the result.
Bins just below and just above the cutoff have nearly identical means: there may be no real effect.

Fuzzy RDD: treatment probability jumps at the cutoff but does not go from 0 to 1.

Sharp RDD

P(D=1 | X=x) jumps from 0 to 1 at c. Treatment is a deterministic function of the running variable. Estimate directly.

Fuzzy RDD

P(D=1 | X=x) jumps at c, but by less than 1. Some units on one or both sides of the cutoff do not comply with the assignment rule.

Fuzzy RDD is an IV problem. The cutoff indicator 1[X_i ≥ c] serves as the instrument for actual treatment D_i.

τ_Fuzzy = (jump in E[Y] at c) / (jump in E[D] at c)

This is a Wald estimator. The interpretation is LATE: the effect on compliers near the cutoff, units whose treatment status is changed by crossing the threshold.
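A minimal sketch of the Wald ratio with local linear fits on both the outcome and the treatment indicator, using the same bandwidth for each. The simulation is hypothetical: take-up is 20% below the cutoff and 80% above it, and the treatment effect is a constant 2, so the LATE at the cutoff is also 2 here. Helper names are illustrative, and inference is omitted.

```python
import random


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_jump(xs, ys, c, h):
    """Uniform-kernel local linear estimate of the jump at c."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    return ols(*zip(*right))[0] - ols(*zip(*left))[0]


rng = random.Random(11)
c, h = 0.0, 0.3
xs = [rng.uniform(-1, 1) for _ in range(10000)]
# Hypothetical take-up: 20% treated below c, 80% above c (first-stage jump of 0.6)
ds = [1 if rng.random() < (0.8 if x >= c else 0.2) else 0 for x in xs]
# Constant treatment effect of 2.0
ys = [1 + 0.5 * x + 2.0 * d + rng.gauss(0, 0.5) for x, d in zip(xs, ds)]

reduced_form = rd_jump(xs, ys, c, h)    # jump in E[Y] at c
first_stage = rd_jump(xs, ds, c, h)     # jump in E[D] at c
tau_fuzzy = reduced_form / first_stage  # Wald ratio
print(f"first stage = {first_stage:.3f}, fuzzy RD estimate = {tau_fuzzy:.3f}")
```

A weak first stage makes this ratio unstable, which is why the first-stage jump should always be reported alongside the estimate.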

Extensions: regression kink design and geographic RDD.

Regression kink design (RKD).

Instead of a jump in the level of treatment, there is a kink, a change in the slope of treatment as a function of the running variable.
Example: unemployment benefits are a linear function of prior earnings up to a cap. The slope changes at the cap.
Identification exploits the change in slope of the policy rule rather than a discontinuity in levels.
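A sketch of the RKD logic with a deterministic, hypothetical benefit rule: 50% of earnings up to a cap of 1.0, flat above it. The estimate divides the change in the outcome's slope at the cap by the change in the policy's slope, both taken from uniform-kernel local linear fits. The outcome model, bandwidth, and helper names are illustrative assumptions.

```python
import random


def ols(xs, ys):
    """Simple OLS of ys on xs; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def side_slopes(xs, ys, c, h):
    """Slopes of local linear fits just below and just above c (uniform kernel)."""
    left = [(x - c, y) for x, y in zip(xs, ys) if c - h <= x < c]
    right = [(x - c, y) for x, y in zip(xs, ys) if c <= x <= c + h]
    return ols(*zip(*left))[1], ols(*zip(*right))[1]


rng = random.Random(13)
cap = 1.0
earnings = [rng.uniform(0, 2) for _ in range(10000)]
# Hypothetical benefit rule: 50% of earnings up to the cap, flat above it
benefits = [0.5 * min(e, cap) for e in earnings]
# Hypothetical outcome: true effect of benefits is 0.8, plus a smooth earnings term
ys = [1 + 0.8 * b + 0.3 * e + rng.gauss(0, 0.1) for e, b in zip(earnings, benefits)]

y_left, y_right = side_slopes(earnings, ys, cap, 0.5)
b_left, b_right = side_slopes(earnings, benefits, cap, 0.5)
tau_rkd = (y_right - y_left) / (b_right - b_left)  # kink in outcome / kink in policy
print(f"RKD estimate: {tau_rkd:.3f}")
```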

Geographic (spatial) RDD.

Policy treatment assigned by which side of a geographic boundary a unit is on (state border, school district line).
Running variable is the signed distance from the boundary (for example, positive on the treated side).
Key concern: boundaries themselves may create differences (infrastructure, culture) that confound the policy effect.

Multi-cutoff and multi-score RDD.

Different cutoffs for different subgroups, or two running variables simultaneously. Requires care in estimation and interpretation.

How RDD compares to other causal strategies.

RDD vs. IV.

Fuzzy RDD is IV with the cutoff as the instrument. Sharp RDD is IV with a perfect first stage.
RDD is more transparent: the instrument and the discontinuity are both directly visible in the data.

RDD vs. panel FE.

FE removes time-invariant unobservables. RDD balances all unobservables, time-varying or not, at the cutoff via continuity.
RDD does not require panel data, but estimates only a local effect. FE estimates an average effect across treated units.

RDD vs. DiD.

DiD requires parallel trends before treatment. RDD requires continuity at the cutoff.
Both identify ATT-like parameters; which design is more credible depends on the setting.

When to use RDD.

Whenever a known rule assigns treatment based on a threshold. The more arbitrary the threshold (not chosen by units), the more credible the design.

Classic RDD applications in economics.

Angrist & Lavy (1999), class size and achievement.

Israeli Maimonides’ Rule: a new class opens when enrollment crosses a multiple of 40. Class size jumps discontinuously downward at each threshold.
Smaller classes improve reading and math scores. One of the most cited RDD papers.

Card et al. (2008), Medicare at 65.

Health insurance coverage jumps sharply at age 65. Hospital admissions and mortality improve for previously uninsured.

Lee (2008), electoral incumbency advantage.

Candidates who barely win (just above 50%) vs. barely lose. Incumbents win reelection at much higher rates, identifying a large incumbency advantage.

Carpenter & Dobkin (2009), minimum legal drinking age.

Mortality jumps discontinuously at age 21. Alcohol access causes a measurable increase in death rates.

Common mistakes in RDD.

Using high-order global polynomials.

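Global polynomials of degree 4 or higher let observations far from the cutoff drive the fit, can create spurious jumps, and tend to behave erratically near the boundary, which is exactly where the jump is estimated. Prefer local linear regression within a narrow bandwidth.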

Choosing bandwidth after seeing the results.

Searching over bandwidths to find a significant estimate is data mining. Use the CCT optimal bandwidth or pre-register the bandwidth choice.

Failing to test for manipulation.

Always run the density test and the covariate smoothness check. Report them in the paper.

Extrapolating the RDD estimate.

The RDD estimate applies only to units near c. Do not claim a population-average treatment effect without additional assumptions.

RDD in practice: a checklist.

1. Plot the raw data and the running variable density.

2. Run the McCrary / rddensity manipulation test.

3. Check covariate smoothness at the cutoff.

4. Estimate with local linear regression using the CCT optimal bandwidth.

5. Report results at half and double the bandwidth as robustness checks.

6. Run placebo cutoff tests.

7. If fuzzy, show the first stage jump and report the IV (Wald) estimate.

8. Discuss the local nature of the estimate. Who are the marginal units?

Software for RDD.

Stata: rdrobust, rddensity, rdplot.

Calonico, Cattaneo, and Titiunik’s suite. Computes optimal bandwidth, bias-corrected estimates, and robust confidence intervals. The current gold standard.

R: rdrobust, rddensity packages.

Same suite available on CRAN. rdplot() produces the standard RDD graph.

Key outputs to report.

Conventional estimate, bias-corrected estimate, and robust bias-corrected confidence interval (the three rows rdrobust reports).
Bandwidth used, effective sample size within bandwidth.
First-stage jump and F-statistic if fuzzy.


RDD turns a policy rule into a natural experiment by comparing units just above and just below the threshold.

Sharp RDD: treatment switches deterministically at the cutoff. Estimate the jump directly.
Fuzzy RDD: treatment probability jumps at the cutoff. Use IV (Wald estimator), interpret as LATE for compliers near the cutoff.
Key assumption: continuity of potential outcomes at c. No manipulation, no compound treatment.