Limited Dependent Variables

Lecture 11

What happens when we can only observe the outcome for part of the sample, or when the outcome is bounded at zero?

Three distinct data problems.

Censoring.

We observe all units, but the outcome is recorded only up to a limit for some of them.
Example: hours worked is observed as zero for non-workers, but we know they exist. Wages are top-coded at $150,000 in survey data.
The censored observations are in the sample; we just do not see their true values.

Truncation.

Observations below (or above) a threshold are entirely absent from the data.
Example: a study of high earners that only surveys people with income > $50,000. We do not observe lower-income people at all.

Sample selection.

Whether we observe Y depends on a separate selection process that is correlated with Y.
Example: wages are only observed for employed workers. Employment is not random: it is correlated with the wage offer.

Censoring and truncation are different problems with different solutions.

Censoring

All units are in the data. The outcome is known to be above/below a limit for censored observations, but the exact value is unknown. Solution: Tobit.

Truncation

Some units are entirely absent from the sample. The analyst never sees them. Solution: truncated regression (MLE on the truncated normal distribution).
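To make the truncated-normal likelihood concrete, here is a minimal pure-Python sketch of one observation's log-likelihood contribution. It assumes truncation from below at zero and Y* ~ N(x_iβ, σ²); the function names are illustrative, not from any library. Each observed y_i gets the normal density rescaled by the probability of being in the sample, Φ(x_iβ/σ):

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def truncreg_loglik_i(y, xb, sigma):
    """Log-likelihood contribution of one observed y > 0 when the sample is
    truncated from below at 0 and Y* ~ N(xb, sigma^2) (assumed setup)."""
    density = norm_pdf((y - xb) / sigma) / sigma  # untruncated normal density of y
    p_observed = norm_cdf(xb / sigma)             # P(Y* > 0): chance of entering the sample
    return math.log(density) - math.log(p_observed)

# Sanity check: the truncated density should integrate to 1 over y > 0
# (midpoint rule on [0, 20]; the values of xb and sigma are arbitrary).
xb, sigma, h = 0.5, 1.5, 0.001
total = sum(math.exp(truncreg_loglik_i((k + 0.5) * h, xb, sigma)) * h
            for k in range(int(20 / h)))
print(total)
```

Summing these contributions over the sample and maximizing over (β, σ) gives the truncated regression estimator; the optimizer is omitted here.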

Why OLS fails with censoring: treating all censored observations as if their outcome equaled the censoring point biases the slope toward zero. Treating them as missing and dropping them discards information and leaves a truncated sample, which creates its own bias. Both approaches produce inconsistent estimates.
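A quick simulation illustrates both failures. This is a sketch under assumed values (true slope 1, standard normal x and u, censoring at zero), using only the Python standard library:

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
beta = 1.0                                    # true slope on the latent Y*
x = [random.gauss(0, 1) for _ in range(20000)]
y_star = [beta * xi + random.gauss(0, 1) for xi in x]
y = [max(0.0, ys) for ys in y_star]           # observed outcome, censored at zero

slope_latent = ols_slope(x, y_star)           # infeasible benchmark: uses unobserved Y*
slope_censored = ols_slope(x, y)              # censored values kept at the limit (0)
pos = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
slope_dropped = ols_slope([p[0] for p in pos], [p[1] for p in pos])  # censored units dropped

print(slope_latent, slope_censored, slope_dropped)
```

With these settings the regression that keeps censored values at zero should recover roughly half the true slope, and the regression on positive outcomes only is also attenuated; the exact numbers depend on the seed.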

Censoring is far more common in practice. The rest of this lecture focuses mainly on censoring (Tobit) and sample selection (Heckman), with a brief look at duration models.

The Tobit model (Tobin 1958) handles censoring via MLE on a latent variable.

Assume a latent (unobserved) continuous variable Y*_i = X_iβ + u_i, u_i ~ N(0, σ²). We observe:

Y_i = Y*_i if Y*_i > 0, Y_i = 0 if Y*_i ≤ 0

The log-likelihood mixes a continuous density (for uncensored observations) with a probability mass (for censored ones):

ℓ = ∑_{y_i>0} log[φ((y_i − x_iβ)/σ)/σ] + ∑_{y_i=0} log Φ(−x_iβ/σ)
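The log-likelihood translates directly into code. A minimal sketch, assuming censoring from below at zero and that the indices x_iβ have already been computed; in practice (β, σ) would be chosen by a numerical optimizer, which is omitted here:

```python
import math

def norm_logpdf(t):
    return -0.5 * t * t - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def tobit_loglik(y, xb, sigma):
    """Tobit log-likelihood with censoring from below at 0.
    y: observed outcomes (0 for censored units); xb: indices x_i*beta."""
    ll = 0.0
    for yi, xbi in zip(y, xb):
        if yi > 0:   # uncensored: log of the normal density phi((y - xb)/sigma)/sigma
            ll += norm_logpdf((yi - xbi) / sigma) - math.log(sigma)
        else:        # censored: log P(Y* <= 0) = log Phi(-xb/sigma)
            ll += math.log(norm_cdf(-xbi / sigma))
    return ll

# One uncensored unit at its mean plus one censored unit with index 0:
# the result is log phi(0) + log Phi(0) = -0.5*log(2*pi) + log(0.5).
print(tobit_loglik([1.0, 0.0], [1.0, 0.0], 1.0))
```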

Tobit assumes normality and homoskedasticity of u_i. It is not robust to violations of either: if they fail, the MLE is inconsistent, and the inconsistency carries over fully to the slope estimates.

Three distinct Tobit marginal effects — know which one you want.

Effect on the latent variable Y*.

∂E[Y*] / ∂X_k = β_k. The raw Tobit coefficient. Useful for structural interpretation of the latent index.

Effect on the observed outcome Y (unconditional).

∂E[Y] / ∂X_k = β_k Φ(xβ/σ). Scaled down by the probability of being uncensored.
This is the appropriate marginal effect if you care about the average outcome in the full population.

Effect conditional on being uncensored.

∂E[Y | Y > 0] / ∂X_k = β_k[1 − λ(λ + xβ/σ)], where λ = φ(xβ/σ)/Φ(xβ/σ) is the inverse Mills ratio.
The appropriate marginal effect if you care only about participants (e.g., how hours respond among those who already work).
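The three formulas can be checked against numerical derivatives of E[Y] = Φ(z)xβ + σφ(z) and E[Y | Y > 0] = xβ + σλ(z), with z = xβ/σ. A sketch with illustrative parameter values (β_k = 0.8, xβ = 0.3, σ = 1.2 are assumptions, not estimates); the function names are hypothetical:

```python
import math

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def tobit_means(xb, sigma):
    """E[Y] and E[Y | Y > 0] for a Tobit censored from below at 0."""
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)                # inverse Mills ratio
    e_y = norm_cdf(z) * xb + sigma * norm_pdf(z)   # unconditional mean
    e_y_pos = xb + sigma * lam                     # mean among uncensored units
    return e_y, e_y_pos

def tobit_marginal_effects(beta_k, xb, sigma):
    """Latent, unconditional and conditional marginal effects of x_k."""
    z = xb / sigma
    lam = norm_pdf(z) / norm_cdf(z)
    latent = beta_k
    unconditional = beta_k * norm_cdf(z)
    conditional = beta_k * (1.0 - lam * (lam + z))
    return latent, unconditional, conditional

# Central finite differences: moving x_k by h moves the index by beta_k * h.
beta_k, xb, sigma, h = 0.8, 0.3, 1.2, 1e-6
lat, unc, cond = tobit_marginal_effects(beta_k, xb, sigma)
up, down = tobit_means(xb + beta_k * h, sigma), tobit_means(xb - beta_k * h, sigma)
num_unc = (up[0] - down[0]) / (2 * h)
num_cond = (up[1] - down[1]) / (2 * h)
print(lat, unc, num_unc, cond, num_cond)
```

At xβ = 0 and σ = 1 the conditional effect reduces to β_k(1 − 2/π), about 0.36 β_k, while the unconditional effect is 0.5 β_k.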

We only observe wages for people who work. Workers are not a random draw from the population. How do we estimate the wage equation without selection bias?

Sample selection: the outcome is missing for a non-random subset of the population.

Let S_i = 1 if we observe Y_i. We want E[Y_i | X_i] for the full population, but we can only use the selected subsample.

OLS on the selected sample estimates E[Y_i | X_i, S_i = 1]. If selection is correlated with the outcome (e.g., high-wage people are more likely to work), this differs from the population expectation. The difference is selection bias.

Classic examples:

Wages: only employed workers have observed wages. Employment is correlated with unobserved productivity.
Test scores: students who drop out don’t take the test. Dropouts are negatively selected on ability.
Loan repayment: repayment is observed only for approved borrowers. Approval is correlated with creditworthiness.

Heckman (1979): model selection explicitly, then correct the outcome equation for selection bias.

Step 1 — Selection equation. Estimate a probit for P(S_i = 1 | Z_i) using all observations (selected and unselected). From the probit, compute the inverse Mills ratio for each selected observation:

λ̂_i = φ(Z_iγ̂) / Φ(Z_iγ̂)

Step 2 — Outcome equation. Run OLS on the selected sample, adding λ̂_i as a regressor:

Y_i = X_iβ + ρσλ̂_i + v_i

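The coefficient on λ̂_i is ρσ, where ρ is the correlation between the selection and outcome errors and σ is the standard deviation of the outcome error. A significant coefficient signals selection bias. Under joint normality of the two errors, the corrected β̂ is consistent. Because λ̂_i is itself estimated, the default step-2 OLS standard errors are invalid; use the corrected standard errors that Heckman routines report, or bootstrap.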
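The two steps can be sketched in pure Python on simulated data. Everything here is an assumption for illustration: the data-generating process (y = 1 + 2x + u, observed only if x + z + v > 0, corr(u, v) = 0.7, so ρσ = 0.7), the helper names, and the use of Fisher scoring to fit the probit. Standard errors are not computed.

```python
import math
import random

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def solve(A, b):
    """Solve the small linear system A v = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (M[r][n] - sum(M[r][c] * v[c] for c in range(r + 1, n))) / M[r][r]
    return v

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(Z, s, iters=15):
    """Probit MLE by Fisher scoring (Newton steps with the expected information)."""
    k = len(Z[0])
    g = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for row, si in zip(Z, s):
            idx = sum(a * b for a, b in zip(row, g))
            F = min(max(norm_cdf(idx), 1e-12), 1.0 - 1e-12)
            f = norm_pdf(idx)
            w = f / (F * (1.0 - F))
            for i in range(k):
                score[i] += w * (si - F) * row[i]
                for j in range(k):
                    info[i][j] += w * f * row[i] * row[j]
        g = [a + b for a, b in zip(g, solve(info, score))]
    return g

def heckman_two_step(x, z, y, s):
    """Step 1: probit of s on (1, x, z), all units.
    Step 2: OLS of y on (1, x, lambda_hat), selected units only."""
    Z = [[1.0, xi, zi] for xi, zi in zip(x, z)]
    gamma = probit(Z, s)
    X2, y2 = [], []
    for row, xi, yi, si in zip(Z, x, y, s):
        if si:
            idx = sum(a * b for a, b in zip(row, gamma))
            X2.append([1.0, xi, norm_pdf(idx) / norm_cdf(idx)])  # inverse Mills ratio
            y2.append(yi)
    return gamma, ols(X2, y2)

random.seed(0)
n, rho = 5000, 0.7
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]   # excluded from the outcome equation
v = [random.gauss(0, 1) for _ in range(n)]
u = [rho * vi + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for vi in v]
s = [1 if xi + zi + vi > 0 else 0 for xi, zi, vi in zip(x, z, v)]
y = [1.0 + 2.0 * xi + ui if si else None for xi, ui, si in zip(x, u, s)]  # Y missing if not selected

naive = ols([[1.0, xi] for xi, si in zip(x, s) if si], [yi for yi, si in zip(y, s) if si])
gamma, beta = heckman_two_step(x, z, y, s)
print("naive slope:", naive[1], "Heckman slope:", beta[1], "rho*sigma:", beta[2])
```

In this design OLS on the selected sample understates the slope of 2, because selected units with low x tend to have high v and hence high u; adding λ̂ should move the estimate back toward 2.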

The Heckman model requires an exclusion restriction.

Exclusion Restriction

At least one variable in the selection equation (Z_i) must be excluded from the outcome equation (X_i): a variable that affects selection but has no direct effect on the outcome.

Why it matters.

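Without an exclusion restriction, identification relies only on the non-linearity of λ̂ (the Mills ratio). This is fragile: over much of its range λ̂ is close to a linear function of the selection index, so it is highly collinear with X. That collinearity inflates standard errors and makes estimates sensitive to functional form assumptions.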
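The near-linearity of the Mills ratio is easy to check numerically. This sketch regresses λ(c) on c over a grid from −2 to 2 (an assumed range for fitted probit indices) and reports the R²; a value close to 1 means λ̂ carries little variation beyond a linear function of the index:

```python
import math

def inv_mills(c):
    """Inverse Mills ratio lambda(c) = phi(c) / Phi(c)."""
    pdf = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))
    return pdf / cdf

def r_squared_linear_fit(xs, ys):
    """R^2 from regressing ys on xs with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy * sxy / (sxx * syy)

grid = [-2.0 + 0.01 * k for k in range(401)]   # index values from -2 to 2
r2 = r_squared_linear_fit(grid, [inv_mills(c) for c in grid])
print(round(r2, 3))
```

If X contains every variable in the selection index, λ̂ is then nearly a linear combination of X, which is the collinearity problem described above.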

Classic exclusion restrictions.

Number of children / young children at home (affects employment decisions but not the wage rate).
Spousal income (affects work participation but typically not the wage offer).
Local unemployment rate (affects whether a job is found, but not individual productivity).

Think of it as an IV for selection.

The exclusion restriction plays the same role as an instrument's relevance and exclusion conditions, applied to the selection equation rather than a treatment equation.

Tobit vs. Heckman: when to use which.

Use Tobit when censoring is the only problem.

The same latent process governs both whether Y > 0 and the level of Y when positive.
Examples: hours worked (same utility function drives both the extensive and intensive margins), charitable donations.

Use Heckman (or a two-part model) when selection and outcome are driven by different processes.

The decision to participate and the outcome conditional on participation have different determinants.
Example: whether to buy a car (selection) and how much to spend on the car (outcome). Different variables matter for each.

Two-part model (Cragg 1971).

Estimate a probit for participation and a separate OLS (or log-normal MLE) for the positive outcomes. The model does not impose that the same process governs both margins. It is more robust to distributional misspecification than Tobit, but it ignores any correlation between the participation and outcome errors.
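To predict the overall mean, the two parts are combined as E[Y | x] = P(Y > 0 | x) × E[Y | Y > 0, x]. A minimal sketch, assuming a probit first part and either a linear or a log-normal second part (for the log-normal case, E[Y | Y > 0, x] = exp(xβ + σ²/2)); the function name and arguments are hypothetical:

```python
import math

def norm_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def two_part_mean(x_gamma, x_beta, sigma=None):
    """E[Y | x] = P(Y > 0 | x) * E[Y | Y > 0, x] in a two-part model.
    x_gamma: probit index for participation.
    x_beta:  index from the positive-outcome equation.
    sigma:   None for a linear second part (E[Y | Y > 0, x] = x_beta);
             otherwise the error s.d. of a log-normal second part
             (log Y = x*beta + e), so E[Y | Y > 0, x] = exp(x_beta + sigma^2 / 2)."""
    p_participate = norm_cdf(x_gamma)
    if sigma is None:
        conditional_mean = x_beta
    else:
        conditional_mean = math.exp(x_beta + 0.5 * sigma ** 2)
    return p_participate * conditional_mean

print(two_part_mean(0.0, 10.0))                 # linear second part
print(two_part_mean(0.0, math.log(10.0), 0.5))  # log-normal second part
```

Because each part has its own index, a variable can raise participation while lowering the conditional amount, which Tobit rules out.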

Duration (survival) models handle another form of limited data: outcomes that are times until an event.

Examples: time until re-employment, time until firm exits, time until loan defaults, patient survival time. The outcome T_i is a non-negative duration.

Censoring in durations: many observations have not yet experienced the event by the end of the study (“right-censored”). We know their duration exceeds the observation window but not by how much. OLS ignores this information.

Hazard function h(t) = lim_{Δ→0} P(t ≤ T < t + Δ | T ≥ t) / Δ: the instantaneous rate of the event at duration t, conditional on surviving to t.
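When the survival function S(t) = P(T ≥ t) is known, the limit can be approximated with a small Δ, since P(t ≤ T < t + Δ | T ≥ t) = [S(t) − S(t + Δ)] / S(t). This sketch uses two assumed textbook distributions, not estimates: an exponential duration (constant hazard) and a Weibull with shape 2 (hazard rising with duration):

```python
import math

def hazard_numeric(survival, t, delta=1e-6):
    """Approximate h(t) = P(t <= T < t + delta | T >= t) / delta
    from a survival function S(t) = P(T >= t)."""
    return (survival(t) - survival(t + delta)) / (survival(t) * delta)

def exp_surv(t):
    """Exponential with rate 0.5: S(t) = exp(-0.5 t), constant hazard 0.5."""
    return math.exp(-0.5 * t)

def weib_surv(t):
    """Weibull with shape 2 and scale 1: S(t) = exp(-t^2), hazard 2t."""
    return math.exp(-t ** 2)

for t in (0.5, 2.0, 5.0):
    print(t, hazard_numeric(exp_surv, t), hazard_numeric(weib_surv, t))
```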

Cox proportional hazard model: h(t | X) = h₀(t) exp(Xβ). The baseline hazard h₀(t) is left unspecified; only the proportional effect of covariates is modeled, and β is estimated by maximizing the partial likelihood. The model is semiparametric and robust to misspecification of the baseline hazard.
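A sketch of the partial-likelihood idea for a single covariate, using Newton-Raphson and assuming no tied event times. The simulated data (exponential baseline hazard 0.1, true β = 0.8, administrative right-censoring at t = 10) and the function name are assumptions for illustration. Note that h₀(t) never appears in the estimation code:

```python
import math
import random

def cox_fit_1d(times, events, x, iters=25):
    """Cox PH coefficient for one covariate by Newton-Raphson on the
    partial log-likelihood (assumes no tied event times)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    d = [events[i] for i in order]
    xs = [x[i] for i in order]
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        s0 = s1 = s2 = 0.0
        # Walk from the longest duration down, so the running sums always
        # cover the risk set {j : t_j >= t_i} of the current unit i.
        for i in range(len(xs) - 1, -1, -1):
            w = math.exp(beta * xs[i])
            s0 += w
            s1 += w * xs[i]
            s2 += w * xs[i] ** 2
            if d[i]:                       # only event times contribute
                mean_x = s1 / s0           # risk-set weighted mean of x
                score += xs[i] - mean_x
                info += s2 / s0 - mean_x ** 2
        beta += score / info               # Newton step (info = -second derivative)
    return beta

random.seed(2)
n, true_beta, window = 2000, 0.8, 10.0
x = [random.gauss(0, 1) for _ in range(n)]
latent_t = [random.expovariate(0.1 * math.exp(true_beta * xi)) for xi in x]
times = [min(t, window) for t in latent_t]             # right-censored durations
events = [1 if t <= window else 0 for t in latent_t]   # 0 = still "alive" at the window
beta_hat = cox_fit_1d(times, events, x)
print(beta_hat)
```

Censored units enter only through the risk sets, which is how the model uses the information that their duration exceeds the observation window.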

Common mistakes with limited dependent variable models.

Using OLS on a censored outcome.

Treating all censored values as if they equal the censoring point compresses the variation in Y and biases slope coefficients toward zero. Use Tobit.

Using Heckman without a credible exclusion restriction.

Identification from functional form alone (non-linearity of the Mills ratio) is fragile. Always have an economic argument for at least one excluded variable.

Reporting Tobit coefficients as if they were marginal effects.

The raw β from Tobit is the latent-variable effect. Compute and report the appropriate marginal effect for your question (unconditional or conditional on participation).

Ignoring non-normality in Tobit.

Tobit is sensitive to distributional assumptions. If errors are fat-tailed or heteroskedastic, use censored quantile regression or Powell’s CLAD estimator instead.

Choosing and using LDV models in practice.

1. Diagnose the data problem first.

Censoring (all units observed, some outcomes at a limit)? → Tobit.
Selection (outcome missing for a non-random subset)? → Heckman or two-part model.
Duration / survival outcomes? → Cox model.

2. For Heckman: find a credible exclusion restriction before running the model.

3. Always compute and report marginal effects, not raw MLE coefficients.

4. Check sensitivity to distributional assumptions.

Compare Tobit with a two-part model. If they diverge, the normality assumption may be driving results.

5. Report the test for selection (t-test on the Mills ratio coefficient in Heckman step 2).

Software for LDV models.

Stata.

tobit: Tobit MLE. margins: compute any of the three marginal effects.
heckman: Heckman two-step and MLE. twostep option for the two-step estimator.
stset + stcox: Cox proportional hazard model.

R.

tobit() in AER, censReg() in censReg.
heckit() in sampleSelection for Heckman two-step.
coxph() in survival for duration models.

Key outputs to always report.

For Tobit: σ̂ (scale parameter) and the marginal effects at the mean or average marginal effects (AMEs).
For Heckman: first-stage probit results, ρ̂ (selection correlation), t-statistic on Mills ratio.

Sample selection is a form of endogeneity. All the tools from earlier lectures apply.

If the selection mechanism is correlated with the error in the outcome equation, OLS on the selected sample is inconsistent, for the same reason that OVB makes OLS inconsistent. Selection bias is a missing-variable problem where the missing variable is the propensity to be selected.

Alternative strategies for selection:

IV: instrument for selection using a variable that shifts participation but not the outcome (this is exactly the exclusion restriction in Heckman).
Panel FE: if selection is driven by a time-invariant characteristic (α_i), within-unit changes difference it out.
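Bounding: Manski bounds give worst-case and best-case values of the parameter of interest (e.g., a population mean or a treatment effect) without functional form assumptions, provided the outcome has bounded support. Useful when no credible exclusion restriction exists.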
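In the simplest case, bounding a population mean when Y is missing for some units and is known to lie in [y_min, y_max], the worst-case bounds impute y_min (lower bound) or y_max (upper bound) to every missing unit. A sketch with made-up inputs; the function name is hypothetical:

```python
def manski_bounds_mean(p_observed, mean_observed, y_min, y_max):
    """Worst-case bounds on E[Y] when Y is observed for a share p_observed
    of units and is known to lie in [y_min, y_max]."""
    p_missing = 1.0 - p_observed
    lower = p_observed * mean_observed + p_missing * y_min  # all missing Y at the floor
    upper = p_observed * mean_observed + p_missing * y_max  # all missing Y at the ceiling
    return lower, upper

# Outcome observed for 60% of units, observed mean 0.5, Y in [0, 1].
lo, hi = manski_bounds_mean(0.6, 0.5, 0.0, 1.0)
print(lo, hi)
```

The width of the interval is P(S = 0) × (y_max − y_min), so the bounds are informative only when the share of missing outcomes is small or the support is narrow.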

The Heckman correction is powerful but parametric. When its assumptions are questionable, prefer an IV or panel strategy.

Limited dependent variable problems arise when outcomes are bounded, censored, or missing non-randomly.

Censoring: Tobit MLE. Report marginal effects, not raw coefficients.
Sample selection: Heckman two-step. Requires a credible exclusion restriction; test for selection with the t-statistic on the Mills ratio coefficient.
The standard LDV models (Tobit, Heckman) are parametric. Check sensitivity to distributional assumptions.