
Heteroskedasticity

Lecture 4
Recall CLM 5: the errors have constant variance.
Homoskedasticity: Var(εi | Xi) = σ² for all i.
The spread of the errors around the regression line is the same regardless of where you are on X.
Heteroskedasticity is when this fails: Var(εi | Xi) = σi². The variance of the error depends on X.

What does heteroskedasticity look like?

Classic fan shape in a residual plot.
  • Plot residuals ei against fitted values Ŷi or against X.
  • If the spread fans out (or in) as X increases, CLM 5 is violated.
Common in cross-sectional economic data.
  • Household consumption vs. income: wealthier households have more variable spending.
  • Firm profits vs. firm size: larger firms have more variable outcomes.
  • County crime rates vs. population: smaller counties have noisier rates.
The residual plot is your first diagnostic tool — always look at it.
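A minimal residual-plot check in R (the data frame households and the variables consumption and income are hypothetical, used only for illustration):

    # Fit a hypothetical regression and inspect the residuals
    model <- lm(consumption ~ income, data = households)
    plot(fitted(model), resid(model),
         xlab = "Fitted values", ylab = "Residuals")
    abline(h = 0, lty = 2)   # a fan shape in this plot signals heteroskedasticity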

If OLS is still unbiased under heteroskedasticity, why do we care?

What heteroskedasticity does to OLS

OLS coefficients are still unbiased.
  • Heteroskedasticity does not affect CLM 1–4. The estimates β̂j still center on the truth.
But the standard OLS standard errors are wrong.
  • The standard formula SE(β̂1) = s / √∑(Xi − X̄)² assumes σ² is constant. It isn’t.
  • The reported SEs can be too small or too large; we don’t know which direction without more information.
Wrong SEs mean wrong t-statistics and wrong p-values.
  • We may reject H0 when we should not, or fail to reject when we should.
  • OLS is no longer BLUE: it is not the most efficient linear unbiased estimator.

Testing for heteroskedasticity

Breusch-Pagan test.
  • Regress the squared residuals ei² on the regressors X1, …, Xk.
  • H0: all slope coefficients are zero, variance does not depend on X.
  • Test statistic: LM = n · R² from this auxiliary regression, distributed χ² with k degrees of freedom under H0.
White test.
  • More general: includes squares and cross-products of all regressors in the auxiliary regression.
  • Detects non-linear forms of heteroskedasticity that Breusch-Pagan misses.
  • More flexible, but uses more degrees of freedom.
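A sketch of both tests in R with the lmtest package, assuming a hypothetical fitted model with two regressors x1 and x2 in a data frame dat:

    library(lmtest)
    model <- lm(y ~ x1 + x2, data = dat)       # hypothetical model and data
    # Breusch-Pagan, classic LM = n * R^2 version
    bptest(model, studentized = FALSE)
    # White-style test: squares and cross-products in the auxiliary regression
    bptest(model, ~ x1 * x2 + I(x1^2) + I(x2^2), data = dat)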

How do we fix the standard errors without changing the coefficients?

Heteroskedasticity-consistent (HC) robust standard errors.
Proposed by Eicker (1967), popularized by White (1980). Often called “White standard errors” or simply “robust SEs.”
The key idea: instead of assuming Var(εi) = σ², use each observation’s squared residual ei² as an estimate of its own variance.
V̂HC(β̂) = (X′X)⁻¹ X′ diag(ei²) X (X′X)⁻¹
The coefficients β̂ are unchanged. Only the estimated covariance matrix, and thus the standard errors, changes.
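A hand-rolled version of the formula in R makes the sandwich structure concrete (assuming the hypothetical fitted model `model` from above):

    X <- model.matrix(model)
    e <- resid(model)
    bread <- solve(crossprod(X))        # (X'X)^(-1)
    meat  <- crossprod(X * e)           # X' diag(e_i^2) X
    V_hc0 <- bread %*% meat %*% bread   # HC0 sandwich estimate of Var(beta-hat)
    sqrt(diag(V_hc0))                   # robust SEs; the coefficients themselves are untouched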

HC variants: HC0, HC1, HC2, HC3

HC0 (White 1980): the baseline.
  • Uses ei² directly. Consistent in large samples.
HC1: finite-sample correction.
  • Multiplies HC0 by n / (n − k − 1). Default in Stata’s vce(robust).
HC2 and HC3: leverage-adjusted.
  • Divide ei² by functions of the hat matrix diagonal (leverage). HC3 is more conservative and often preferred in small samples.
  • HC3 is the default in R’s sandwich package (vcovHC).
In practice: HC1 and HC3 are most common. The differences rarely matter in large samples.
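A quick comparison in R with the sandwich package (same hypothetical model as above):

    library(sandwich)
    library(lmtest)
    for (t in c("HC0", "HC1", "HC2", "HC3")) {
      cat(t, ":", sqrt(diag(vcovHC(model, type = t))), "\n")   # robust SEs under each variant
    }
    coeftest(model, vcov = vcovHC(model, type = "HC3"))        # inference with HC3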
Robust SEs can be larger or smaller than OLS SEs.

Robust SE > OLS SE

The variance fan grows with X. Standard OLS underestimates uncertainty. The robust correction inflates the SE; t-statistics shrink and p-values rise.

Robust SE < OLS SE

The variance fan shrinks with X. Standard OLS overestimates uncertainty. The robust correction deflates the SE; t-statistics grow and p-values fall.

You cannot know the direction in advance. Always use robust SEs in cross-sectional data: if homoskedasticity holds, HC SEs converge to the standard SEs anyway.
A related problem: clustering.
Heteroskedasticity means Var(εi) varies across observations. A different problem: errors within groups are correlated.
Example: students in the same classroom share a teacher, a classroom environment, unobserved peer effects. Their errors are not independent, even if their variances are equal.
Cluster-robust standard errors account for arbitrary correlation within clusters (classrooms, firms, states) while assuming independence across clusters.
Rule of thumb: cluster at the level of treatment assignment. If a policy varies at the state level, cluster by state.
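A cluster-robust sketch in R with the sandwich package, assuming hypothetical student-level data with a classroom identifier:

    library(sandwich)
    library(lmtest)
    m <- lm(test_score ~ treatment + age, data = students)   # hypothetical data
    coeftest(m, vcov = vcovCL(m, cluster = ~ classroom))     # cluster by classroom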
An alternative fix: Weighted Least Squares (WLS).
If we know the form of heteroskedasticity, say Var(εi | Xi) = σ² h(Xi) for a known function h, we can exploit it.
WLS divides each observation by √h(Xi) before running OLS. This restores homoskedasticity in the transformed model and makes WLS BLUE.
In practice, the form of h is rarely known. Feasible GLS (FGLS) estimates h from the data first, then applies WLS. But if h is estimated with error, FGLS is no longer guaranteed to outperform robust OLS.
Modern practice: default to robust SEs. Use WLS only when the heteroskedasticity structure is well-motivated by economic theory.
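For illustration only, a WLS sketch in R under the assumed form Var(εi | Xi) = σ² · incomei; lm() takes weights proportional to the inverse of h(Xi):

    # h(income) = income is an assumption here, not something estimated from the data
    wls <- lm(consumption ~ income, data = households, weights = 1 / income)
    summary(wls)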

Practical guidance

Always plot your residuals.
  • Residuals vs. fitted values. Residuals vs. each regressor. Look for fan shapes, trends, outliers.
In cross-sectional data, always use robust SEs as the default.
  • There is essentially no cost: if errors are homoskedastic, HC SEs are asymptotically equivalent to standard SEs.
  • In Stata: reg y x1 x2, robust. In R: coeftest(model, vcov = vcovHC(model, type = "HC1")) with the sandwich and lmtest packages.
If your data have a natural cluster structure, cluster your SEs.
  • In Stata: reg y x1 x2, vce(cluster groupvar).
  • Cluster-robust SEs subsume HC robust SEs: they fix both heteroskedasticity and within-cluster correlation.
Logarithms often reduce heteroskedasticity.
Many economic variables (wages, income, firm size) have right-skewed distributions and variance that grows with the level of the variable.
Taking logs compresses the upper tail and often stabilizes variance. This is one practical reason the log-wage specification is nearly universal in labor economics.
Check: compare the residual plot from a levels regression to a log regression. If the fan shape largely disappears after logging, the transformation has done its job.
But logging is a modeling choice, not a correction for heteroskedasticity per se. The coefficient interpretation changes too.
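A side-by-side check in R, assuming a hypothetical wage data set cps with educ and exper:

    m_lev <- lm(wage ~ educ + exper, data = cps)
    m_log <- lm(log(wage) ~ educ + exper, data = cps)
    par(mfrow = c(1, 2))
    plot(fitted(m_lev), resid(m_lev), main = "Levels"); abline(h = 0, lty = 2)
    plot(fitted(m_log), resid(m_log), main = "Logs");   abline(h = 0, lty = 2)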
In time series, heteroskedasticity takes a special form: ARCH.
Financial returns are often conditionally heteroskedastic: the variance of today’s return depends on how volatile yesterday was.
ARCH (Autoregressive Conditional Heteroskedasticity, Engle 1982): Var(εt | εt−1, …) = α0 + α1εt−1².
This is widely used in finance for modeling volatility. Robert Engle won the 2003 Nobel Prize partly for this contribution.
For this course we focus on cross-sectional data, so standard HC robust SEs are the relevant fix.
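Purely as an illustration of the ARCH(1) recursion above (not something we need for cross-sectional work), a short simulation in R shows the volatility clustering it generates:

    set.seed(1)
    n <- 500; a0 <- 0.1; a1 <- 0.6           # illustrative parameter values
    eps <- numeric(n)
    for (t in 2:n) {
      sigma2_t <- a0 + a1 * eps[t - 1]^2     # Var(eps_t | eps_{t-1}) = a0 + a1 * eps_{t-1}^2
      eps[t] <- rnorm(1, sd = sqrt(sigma2_t))
    }
    plot(eps, type = "l", main = "ARCH(1) errors: calm spells and volatile spells")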

Common misconceptions

Heteroskedasticity does not bias the coefficients.
  • The estimates β̂ are still centered on the truth. Only inference is affected.
  • Students often confuse this with omitted variable bias. They are different problems with different fixes.
A non-normal residual distribution is not the same as heteroskedasticity.
  • Heavy tails or skewness in residuals is a separate issue. CLT still delivers approximate normality of β̂ in large samples regardless of residual shape.
Passing a Breusch-Pagan test does not mean you are safe.
  • The test has low power in small samples. The absence of evidence is not evidence of absence. Default to robust SEs regardless.