Instrumental Variables

Lecture 5

What do we do when CLM 4 fails and no set of controls can fix it?

Endogeneity: when Cov(X, ε) ≠ 0

Omitted variables.

Something in ε affects both X and Y. We covered this in Lecture 3. Controls help only if the omitted variable is observed.

Simultaneity (reverse causality).

Supply and demand: price affects quantity, but quantity demanded affects price. The regressor and outcome are determined jointly.

Measurement error in X.

If we observe X* = X + v instead of the true X, the measurement error v ends up in the error term and is correlated with the mismeasured regressor X*.
Under classical measurement error (v unrelated to X and ε), this causes attenuation bias: β̂₁ is biased toward zero.
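
A small simulation can make attenuation concrete. The sketch below assumes classical measurement error (v independent of X and ε). The sample size, variances, and true slope are illustrative choices, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta1 = 2.0                            # true slope (illustrative assumption)

x_true = rng.normal(0.0, 1.0, n)       # true regressor, variance 1
v = rng.normal(0.0, 1.0, n)            # classical measurement error, variance 1
eps = rng.normal(0.0, 1.0, n)

y = 1.0 + beta1 * x_true + eps
x_obs = x_true + v                     # the mismeasured regressor X*

def ols_slope(x, y):
    """Slope from a simple regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

slope_true = ols_slope(x_true, y)
slope_obs = ols_slope(x_obs, y)
reliability = 1.0 / (1.0 + 1.0)        # Var(X) / (Var(X) + Var(v))

print(f"slope using true X:     {slope_true:.3f}")
print(f"slope using observed X: {slope_obs:.3f}")
print(f"beta1 * reliability:    {beta1 * reliability:.3f}")
```

With equal variances for X and v, the slope on the observed regressor should settle near half the true slope, which is the reliability ratio at work.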

The instrument: a variable that moves X but has no direct effect on Y.

An instrumental variable Z provides exogenous variation in X, variation that is uncontaminated by the endogeneity problem.

Intuition: we cannot use all variation in X to identify β₁ because some of it is correlated with ε. But the variation in X driven by Z is clean. We use only that.

IV is not a statistical trick. It requires finding a genuine source of exogenous variation in the world, which is why good instruments are hard to find and highly valued in applied work.

A valid instrument must satisfy two conditions.

Condition 1, Relevance

Cov(Z, X) ≠ 0. The instrument must actually shift the endogenous variable. This is testable: run the first-stage regression and check.

Condition 2, Exclusion Restriction

Cov(Z, ε) = 0. The instrument affects Y only through its effect on X. This is not directly testable; it must be defended on theoretical grounds.

Classic instruments in economics

Quarter of birth (Angrist & Krueger 1991) for education.

Compulsory schooling laws interact with birth quarter to create variation in years of schooling that is unrelated to ability.

Distance to college (Card 1995) for education.

Proximity to a college raises the probability of attending, but is unlikely to directly affect wages conditional on education.

Vietnam draft lottery (Angrist 1990) for military service.

A randomly assigned draft lottery number determines draft eligibility, which shifts the probability of military service. Used to estimate the earnings effect of veteran status.

Rainfall (Miguel et al. 2004) for economic shocks in Africa.

Rainfall shifts income in agrarian economies, so it is used to instrument for income in studies of conflict.

The simplest IV estimator: the Wald estimator.

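When Z is binary (0/1), the IV estimator has a clean form, where Ȳ₁ and X̄₁ are sample means among observations with Z = 1, and Ȳ₀ and X̄₀ are sample means among those with Z = 0: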

β̂_IV = (Ȳ₁ − Ȳ₀) / (X̄₁ − X̄₀)

The numerator is the reduced-form effect: how much does Y change when Z switches from 0 to 1?

The denominator is the first-stage effect: how much does X change when Z switches from 0 to 1?

We scale the reduced form by the first stage to get the effect of X on Y. If the instrument moves X only weakly, the denominator is small and the estimate is imprecise.
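
The Wald formula can be checked on simulated data. This is a minimal sketch under assumed values: the confounder u, the coefficients, and the sample size are all hypothetical, chosen so that OLS is visibly biased while the instrument is valid by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
beta1 = 1.5                               # true effect of X on Y (illustrative)

z = rng.integers(0, 2, n)                 # binary instrument, randomly assigned
u = rng.normal(size=n)                    # unobserved confounder, part of the error
x = 0.8 * z + u + rng.normal(size=n)      # X depends on Z and on the confounder
y = 2.0 + beta1 * x + 2.0 * u + rng.normal(size=n)

# OLS uses all variation in X, including the part driven by u.
ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Wald estimator: reduced-form difference divided by first-stage difference.
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = x[z == 1].mean() - x[z == 0].mean()
wald = reduced_form / first_stage

print(f"true beta1: {beta1:.2f}")
print(f"OLS:        {ols:.3f}")
print(f"Wald IV:    {wald:.3f}")
```

In this setup OLS overstates the effect because u raises both X and Y, while the Wald ratio uses only the difference in means created by Z.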

Two-Stage Least Squares (2SLS) generalizes IV to models with controls and multiple instruments.

Stage 1: regress the endogenous variable X on the instrument Z and all exogenous controls:

X_i = π₀ + π₁Z_i + π₂W_i + v_i

Save the fitted values X̂_i. This is the “clean” part of X, the variation driven by Z, not by ε.

Stage 2: regress Y on X̂_i and the controls:

Y_i = β₀ + β₁X̂_i + β₂W_i + ε_i

The coefficient β̂₁ from Stage 2 is the 2SLS estimate. Important: do not do this manually. The standard errors from the manual Stage 2 are wrong. Use your software’s ivregress or ivreg command.
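
To see why the manual Stage 2 standard errors are wrong, the two stages can be written out with plain linear algebra. The sketch below is for illustration only and is not a substitute for ivregress or ivreg. The helper names (ols, tsls), the simulated data, and the homoskedastic variance formula with an n − k degrees-of-freedom correction are assumptions of this example; packaged routines may use a different correction.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def tsls(y, x_endog, Z_excl, W):
    """2SLS with one endogenous regressor (hypothetical helper).

    W holds the exogenous controls, including a constant column.
    Z_excl holds the excluded instrument(s).
    Coefficients are ordered [x_endog, columns of W].
    Standard errors assume homoskedastic errors.
    """
    n = len(y)
    inst = np.column_stack([Z_excl, W])        # full instrument set
    x_hat = inst @ ols(inst, x_endog)          # Stage 1 fitted values
    X_hat = np.column_stack([x_hat, W])        # Stage 2 regressors
    X = np.column_stack([x_endog, W])          # actual regressors
    beta = ols(X_hat, y)                       # 2SLS coefficients

    XtX_inv = np.linalg.inv(X_hat.T @ X_hat)
    dof = n - X.shape[1]
    resid_correct = y - X @ beta               # residuals built from actual X
    resid_manual = y - X_hat @ beta            # residuals a hand-run Stage 2 uses
    se_correct = np.sqrt(np.diag(XtX_inv) * (resid_correct @ resid_correct) / dof)
    se_manual = np.sqrt(np.diag(XtX_inv) * (resid_manual @ resid_manual) / dof)
    return beta, se_correct, se_manual

rng = np.random.default_rng(2)
n = 100_000
z = rng.normal(size=n)                         # instrument
w = rng.normal(size=n)                         # exogenous control
u = rng.normal(size=n)                         # unobserved confounder
x = 0.5 * z + 0.3 * w + u + rng.normal(size=n)
y = 1.0 + 1.5 * x - 0.7 * w + 2.0 * u + rng.normal(size=n)

W = np.column_stack([np.ones(n), w])
beta, se_correct, se_manual = tsls(y, x, z, W)

print(f"2SLS estimate of beta1: {beta[0]:.3f}")
print(f"correct SE:             {se_correct[0]:.4f}")
print(f"manual Stage-2 SE:      {se_manual[0]:.4f}")
```

The point estimates are identical either way. Only the residual variance differs, and in general the manual version can be too large or too small.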

What happens when the instrument is only weakly correlated with X?

Weak instruments

A weak instrument has a small first-stage coefficient.

The Wald estimator divides by X̄₁ − X̄₀. If this is near zero, the estimator explodes in variance.

Weak instruments cause 2SLS to be badly biased in finite samples.

The usual asymptotic approximation breaks down, and 2SLS is biased toward OLS. With a very weak first stage, it can be nearly as biased as OLS.
Standard errors are also unreliable: confidence intervals are too narrow.
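
A small Monte Carlo exercise illustrates the finite-sample problem. This is a sketch under assumed values: the function name, sample size, number of replications, and coefficients are hypothetical. It reports the median of the IV estimates because the just-identified IV estimator has very heavy tails, so the mean across replications is not informative.

```python
import numpy as np

def median_iv_estimate(pi1, n=200, reps=2000, beta1=1.0, seed=0):
    """Median IV and OLS slope estimates across simulated samples.

    pi1 is the first-stage coefficient on the instrument.
    Each row of the arrays below is one simulated sample of size n.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))
    u = rng.normal(size=(reps, n))                       # confounder
    x = pi1 * z + u + rng.normal(size=(reps, n))
    y = beta1 * x + 2.0 * u + rng.normal(size=(reps, n))

    zc = z - z.mean(axis=1, keepdims=True)               # demeaned instrument
    xc = x - x.mean(axis=1, keepdims=True)               # demeaned regressor
    iv = (zc * y).sum(axis=1) / (zc * x).sum(axis=1)     # Cov(Z,Y) / Cov(Z,X)
    ols = (xc * y).sum(axis=1) / (xc * x).sum(axis=1)    # Cov(X,Y) / Var(X)
    return np.median(iv), np.median(ols)

strong_iv, strong_ols = median_iv_estimate(pi1=1.0)
weak_iv, weak_ols = median_iv_estimate(pi1=0.02)

print(f"true beta1 = 1.0")
print(f"strong instrument: median IV = {strong_iv:.3f}, median OLS = {strong_ols:.3f}")
print(f"weak instrument:   median IV = {weak_iv:.3f}, median OLS = {weak_ols:.3f}")
```

With a strong first stage the IV estimates center near the true value. With a very weak one they drift toward the OLS estimate.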

The first-stage F-statistic is the standard diagnostic.

Rule of thumb (Stock, Wright & Yogo 2002): first-stage F > 10 when there is a single endogenous regressor.
More precise critical values depend on the number of instruments and desired bias tolerance.
Always report the first-stage F. Reviewers will ask for it.
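
The first-stage F is the F-statistic for the hypothesis that the coefficients on the excluded instruments are all zero in the first-stage regression. The sketch below computes it from restricted and unrestricted sums of squared residuals. The function name and simulated data are hypothetical, and the formula assumes homoskedastic errors; with heteroskedasticity or clustering, a robust version from your software is the appropriate choice.

```python
import numpy as np

def first_stage_F(x_endog, Z_excl, W):
    """F-statistic for excluding Z_excl from the first-stage regression.

    W holds the exogenous controls, including a constant column.
    Assumes homoskedastic errors.
    """
    n = len(x_endog)
    Z_excl = np.asarray(Z_excl).reshape(n, -1)   # accept 1-D or 2-D input
    full = np.column_stack([Z_excl, W])

    def ssr(X):
        coef = np.linalg.lstsq(X, x_endog, rcond=None)[0]
        resid = x_endog - X @ coef
        return resid @ resid

    q = Z_excl.shape[1]                          # number of excluded instruments
    ssr_restricted = ssr(W)                      # controls only
    ssr_full = ssr(full)                         # controls plus instruments
    return ((ssr_restricted - ssr_full) / q) / (ssr_full / (n - full.shape[1]))

rng = np.random.default_rng(4)
n = 1000
z = rng.normal(size=n)
w = rng.normal(size=n)
W = np.column_stack([np.ones(n), w])

x_strong = 0.5 * z + 0.3 * w + rng.normal(size=n)    # instrument moves X a lot
x_weak = 0.02 * z + 0.3 * w + rng.normal(size=n)     # instrument barely moves X

F_strong = first_stage_F(x_strong, z, W)
F_weak = first_stage_F(x_weak, z, W)
print(f"first-stage F, strong instrument: {F_strong:.1f}")
print(f"first-stage F, weak instrument:   {F_weak:.1f}")
```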

IV does not estimate the Average Treatment Effect (ATE).

It estimates the Local Average Treatment Effect (LATE), the effect for compliers: observations whose treatment status changes because of the instrument.

In the draft lottery example: compliers are men who served because their lottery number made them draft-eligible and who would not have served otherwise. IV estimates the earnings effect of military service for this group, not for all veterans.

LATE is internally valid: it is a real causal effect. But it may not generalize. Always ask: who are the compliers in my setting, and is the LATE the parameter I care about?

The four types of units

Compliers: take the treatment when Z = 1, not when Z = 0.

IV identifies the treatment effect for this group. They are the margin that the instrument moves.

Always-takers: take the treatment regardless of Z.

The instrument does not affect them. They contribute nothing to the IV estimate.

Never-takers: never take the treatment regardless of Z.

As with always-takers, the instrument does not affect them.

Defiers: do the opposite of what Z says.

IV assumes there are no defiers (monotonicity assumption). Usually plausible.
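
A simulation with known unit types shows that the Wald estimator recovers the complier effect rather than the ATE. Everything here is an assumption made for illustration: the type shares, the treatment effects by type, and the higher baseline outcome for always-takers. There are no defiers, so monotonicity holds by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000

# Latent types with illustrative shares; in real data these are unobserved.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.4, 0.3, 0.3])
z = rng.integers(0, 2, n)                       # randomly assigned binary instrument

# Treatment taken: always-takers 1, never-takers 0, compliers follow Z.
d = np.where(types == "always", 1, np.where(types == "never", 0, z))

# Heterogeneous treatment effects by type (hypothetical values).
effect = np.select(
    [types == "complier", types == "always", types == "never"],
    [2.0, 5.0, 0.5],
)
y0 = rng.normal(size=n) + 1.0 * (types == "always")   # baseline differs by type
y = y0 + effect * d

wald = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
ate = effect.mean()                                    # average effect over everyone
naive = y[d == 1].mean() - y[d == 0].mean()            # treated vs untreated comparison

print(f"complier effect (by construction): 2.00")
print(f"Wald / IV estimate:                {wald:.3f}")
print(f"ATE in this population:            {ate:.3f}")
print(f"naive treated-untreated gap:       {naive:.3f}")
```

Always-takers and never-takers cancel out of both the numerator and the denominator, which is why their (different) effects leave the IV estimate unchanged.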

With multiple instruments, we can partially test the exclusion restriction.

If we have more instruments than endogenous variables (overidentified), each instrument implies a different IV estimate. If the exclusion restriction holds for all of them, these estimates should be similar.

Sargan-Hansen test (J-test): tests whether the overidentifying restrictions are valid. H₀: all instruments are valid. A significant J-statistic suggests at least one instrument violates the exclusion restriction.

Limitation: the test requires at least one instrument to be valid to serve as the baseline. It cannot detect if all instruments are invalid in the same direction. Passing it is necessary but not sufficient for instrument validity.
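
One common way to compute the Sargan statistic is n·R² from a regression of the 2SLS residuals on the full instrument set. The sketch below implements that version under homoskedasticity; it is not the heteroskedasticity-robust Hansen J. The function name and simulated data are hypothetical. With two instruments and one endogenous regressor there is one overidentifying restriction, so the 5% critical value is the χ²(1) value of about 3.84.

```python
import numpy as np

def sargan_stat(y, x_endog, Z_excl, W):
    """Sargan statistic: n * R^2 from regressing 2SLS residuals on all instruments.

    W holds the exogenous controls, including a constant column.
    """
    n = len(y)
    lstsq = lambda X, v: np.linalg.lstsq(X, v, rcond=None)[0]
    inst = np.column_stack([Z_excl, W])
    x_hat = inst @ lstsq(inst, x_endog)                    # Stage 1 fitted values
    beta = lstsq(np.column_stack([x_hat, W]), y)           # 2SLS coefficients
    resid = y - np.column_stack([x_endog, W]) @ beta       # residuals use actual X
    fitted = inst @ lstsq(inst, resid)
    ss_res = (resid - fitted) @ (resid - fitted)
    ss_tot = (resid - resid.mean()) @ (resid - resid.mean())
    return n * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(5)
n = 5000
Z = rng.normal(size=(n, 2))                                # two instruments
u = rng.normal(size=n)                                     # unobserved confounder
x = 0.5 * Z[:, 0] + 0.5 * Z[:, 1] + u + rng.normal(size=n)
W = np.ones((n, 1))                                        # constant only
noise = rng.normal(size=n)

y_valid = 1.0 + 1.5 * x + 2.0 * u + noise                  # both instruments excluded
y_invalid = y_valid + 0.5 * Z[:, 1]                        # second instrument enters Y directly

j_valid = sargan_stat(y_valid, x, Z, W)
j_invalid = sargan_stat(y_invalid, x, Z, W)
print(f"Sargan statistic, both instruments valid: {j_valid:.2f}")
print(f"Sargan statistic, one instrument invalid: {j_invalid:.2f}")
print("5% critical value, chi-squared(1): 3.84")
```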

When is IV better than OLS, and when is it worse?

IV is better when endogeneity is severe.

If OLS is badly biased, even an imprecise IV estimate may be more informative.
The bias-variance tradeoff: IV trades reduced bias for increased variance.

OLS is better when endogeneity is mild and the instrument is weak.

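A weak instrument with even a small exclusion restriction violation can produce a worse estimate than biased OLS: the inconsistency of IV is Cov(Z, ε) / Cov(Z, X), so a weak first stage magnifies any violation.
This is related to, but distinct from, the “many weak instruments” problem, in which adding many weak instruments pulls 2SLS toward OLS. Both have generated substantial recent econometric research.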
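
The interaction between a weak first stage and a small exclusion restriction violation can be illustrated numerically. In the sketch below all values are assumptions: endogeneity is mild (the confounder enters Y with coefficient 0.3), the instrument has a small direct effect on Y (0.05), and the first-stage coefficient is also small (0.05). The large-sample IV error is the ratio of the two, here 0.05 / 0.05 = 1.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
beta1 = 1.0                                  # true effect (illustrative)
pi1 = 0.05                                   # weak first stage
delta = 0.05                                 # small direct effect of Z on Y

z = rng.normal(size=n)
u = rng.normal(size=n)                       # confounder with a mild effect on Y
x = pi1 * z + u + rng.normal(size=n)
y = beta1 * x + delta * z + 0.3 * u + rng.normal(size=n)

ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"true beta1:  {beta1:.2f}")
print(f"OLS:         {ols:.3f}   (error {ols - beta1:+.3f})")
print(f"IV:          {iv:.3f}   (error {iv - beta1:+.3f})")
print(f"delta / pi1: {delta / pi1:.2f}")
```

The sample is deliberately large so that the gap reflects inconsistency rather than sampling noise.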

Hausman test: formally test whether OLS and IV estimates differ significantly.

H₀: OLS is consistent (no endogeneity). Rejection suggests endogeneity is present. The test presumes the instruments themselves are valid.
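
A convenient way to run this comparison is the regression-based (Durbin-Wu-Hausman) version, used here in place of the classic contrast of the two coefficient vectors: add the first-stage residuals to the OLS regression and test whether their coefficient is zero. The sketch below assumes homoskedastic errors and valid instruments. The function name and simulated data are hypothetical.

```python
import numpy as np

def dwh_tstat(y, x_endog, Z_excl, W):
    """t-statistic on first-stage residuals added to the OLS regression.

    W holds the exogenous controls, including a constant column.
    A large |t| is evidence against exogeneity of x_endog.
    """
    n = len(y)
    lstsq = lambda X, v: np.linalg.lstsq(X, v, rcond=None)[0]
    inst = np.column_stack([Z_excl, W])
    v_hat = x_endog - inst @ lstsq(inst, x_endog)      # first-stage residuals
    X_aug = np.column_stack([x_endog, W, v_hat])       # augmented regression
    coef = lstsq(X_aug, y)
    resid = y - X_aug @ coef
    sigma2 = resid @ resid / (n - X_aug.shape[1])
    cov = sigma2 * np.linalg.inv(X_aug.T @ X_aug)
    return coef[-1] / np.sqrt(cov[-1, -1])

rng = np.random.default_rng(7)
n = 5000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.5 * z + u + rng.normal(size=n)
W = np.ones((n, 1))
noise = rng.normal(size=n)

y_endog = 1.0 + 1.5 * x + 2.0 * u + noise    # u drives both X and Y
y_exog = 1.0 + 1.5 * x + noise               # no confounding

t_endog = dwh_tstat(y_endog, x, z, W)
t_exog = dwh_tstat(y_exog, x, z, W)
print(f"t-statistic with endogenous X: {t_endog:.2f}")
print(f"t-statistic with exogenous X:  {t_exog:.2f}")
```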

Common mistakes with IV

Asserting the exclusion restriction without defending it.

The exclusion restriction is never directly testable. You must argue it on economic grounds. “I tested it” is not an acceptable answer.

Ignoring the first-stage F-statistic.

IV with a weak first stage can be worse than OLS. Always check the first-stage F and report it.

Doing 2SLS manually and using Stage-2 standard errors.

The SEs from the manual second stage are incorrect because the residuals are computed with X̂ rather than X. Use ivregress 2sls in Stata or ivreg in R.

Forgetting that IV estimates LATE, not ATE.

Think carefully about who the compliers are and whether the LATE is the parameter of interest for your question.

Always report the reduced form and the first stage.

First stage: X = π₀ + π₁Z + … Does the instrument move the endogenous variable?

Reduced form: Y = γ₀ + γ₁Z + … Does the instrument move the outcome?

With a single instrument and the same controls in both regressions, the 2SLS estimate is exactly β̂_IV = γ̂₁ / π̂₁.

If the reduced form is insignificant but the first stage is strong, IV will also be insignificant: there is no effect to find. If neither is significant, your instrument is likely weak. Report all three regressions: first stage, reduced form, and 2SLS.
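
The ratio relationship between the three regressions can be verified directly in the just-identified case (one instrument, one endogenous regressor, identical controls). The sketch below uses simulated data with hypothetical coefficients. The equality is algebraic, so it holds up to floating-point error in any sample.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(8)
n = 5000
z = rng.normal(size=n)                         # single instrument
w = rng.normal(size=n)                         # exogenous control
u = rng.normal(size=n)                         # unobserved confounder
x = 0.6 * z + 0.3 * w + u + rng.normal(size=n)
y = 1.0 + 1.5 * x - 0.7 * w + 2.0 * u + rng.normal(size=n)

W = np.column_stack([np.ones(n), w])           # constant and control
ZW = np.column_stack([z, W])                   # instrument first, then controls

pi1_hat = ols(ZW, x)[0]                        # first stage: coefficient on Z
gamma1_hat = ols(ZW, y)[0]                     # reduced form: coefficient on Z

x_hat = ZW @ ols(ZW, x)                        # Stage 1 fitted values
beta_2sls = ols(np.column_stack([x_hat, W]), y)[0]

ratio = gamma1_hat / pi1_hat
print(f"first stage pi1:      {pi1_hat:.4f}")
print(f"reduced form gamma1:  {gamma1_hat:.4f}")
print(f"gamma1 / pi1:         {ratio:.6f}")
print(f"2SLS beta1:           {beta_2sls:.6f}")
```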

Practice Questions
