
Review of Statistics & Probability

Lecture 1

Why does econometrics need statistics?

Data contains noise.
  • Measurement error, omitted factors, and randomness all cloud the signal.
Statistics separates signal from noise.
  • We want to know what is systematic vs. what is random.
Our goal: credible claims about the world from imperfect data.
  • Does education raise wages? Does a policy reduce crime? By how much?
  • Statistics gives us the language to answer these questions honestly.
There are two branches of statistics.
Descriptive statistics summarize what is in the data: means, medians, standard deviations, histograms.
Inferential statistics draw conclusions about a population using only a sample.
Econometrics lives almost entirely in the inferential branch.
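As a quick illustration of the descriptive branch, the summaries above take only a few lines of NumPy (the wage numbers below are made up for illustration):

```python
import numpy as np

# Hypothetical hourly wages (illustrative numbers, not real data)
wages = np.array([12.5, 18.0, 22.3, 15.7, 45.0, 19.8, 16.2, 28.4])

mean_wage = wages.mean()         # center of the data
median_wage = np.median(wages)   # robust center: barely moved by the 45.0 outlier
std_wage = wages.std(ddof=1)     # sample standard deviation (n - 1 denominator)
```

The mean exceeds the median here because one large wage pulls it up; that asymmetry is exactly the kind of pattern descriptive statistics are meant to reveal.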

What is a random variable?

A variable whose value is determined by a random process.
  • Before you observe it, it could take many possible values.
  • Once observed, it is just a number; the randomness was in the draw.
Discrete: takes countable values.
  • Number of job offers received, number of children, employed (0/1).
Continuous: takes any value in a range.
  • Hourly wages, GDP growth, test scores.
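The distinction is easy to see in simulation; a minimal sketch with NumPy (the distribution choices are illustrative assumptions, not estimates):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete: employed (0/1), a single Bernoulli draw per person
employed = rng.binomial(n=1, p=0.9, size=10_000)

# Continuous: hourly wages, sketched as a log-normal draw
wages = rng.lognormal(mean=3.0, sigma=0.5, size=10_000)
```

`employed` only ever takes the values 0 and 1, while `wages` can land anywhere on the positive real line.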
Every random variable has a probability distribution.
The distribution describes how likely each value (or range of values) is.
For a continuous variable, the probability density function (PDF) f(x) tells us the relative likelihood of each value. Probabilities are areas under the curve.
Two numbers summarize most of what we need: the mean (center) and the variance (spread).
The expected value is the population mean.
E[X] = μ, the probability-weighted average of all possible values.
Interpretation: if the random experiment were repeated infinitely many times, the average of all outcomes would converge to μ.
Key property: E[a + bX] = a + bE[X]; expectations are linear.
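Linearity is easy to verify by simulation; a small sketch (the values of a and b and the distribution of X are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=1_000_000)  # X with E[X] = 5

a, b = 3.0, 2.0
lhs = (a + b * x).mean()   # sample analogue of E[a + bX]
rhs = a + b * x.mean()     # a + b * E[X]
# Both are close to a + b * 5 = 13, and they agree with each other term by term.
```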
Variance measures spread around the mean.
Var(X) = σ² = E[(X − μ)²]
The average squared deviation from the mean. Squaring penalizes large deviations and makes the variance always non-negative.
The standard deviation σ = √Var(X) is in the same units as X, easier to interpret.
Key property: Var(a + bX) = b² · Var(X)
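The same kind of simulation check works for the variance rule; a sketch with arbitrary a and b:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=3.0, size=1_000_000)  # Var(X) ≈ 9

a, b = 100.0, -2.0
var_transformed = np.var(a + b * x)  # the shift a drops out entirely
var_predicted = b**2 * np.var(x)     # b² · Var(X) = 4 · Var(X)
```

Note that b enters squared, so b = −2 inflates the variance just as much as b = +2 would.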

Why is the normal distribution everywhere in statistics?

The normal distribution

X ∼ N(μ, σ²), symmetric, bell-shaped curve.
  • Fully characterized by two parameters: mean μ and variance σ².
  • The standard normal has μ = 0, σ² = 1, written Z ∼ N(0, 1).
The 68–95–99.7 rule.
  • ~68% of observations fall within ±1σ of the mean.
  • ~95% fall within ±2σ (exactly: ±1.96σ).
  • ~99.7% fall within ±3σ.
Why it matters for econometrics:
  • OLS estimators are approximately normal in large samples, so we can do inference.
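The 68–95–99.7 rule itself can be checked directly by simulating from a normal distribution (the values of μ and σ below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 10.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

within_1sd = np.mean(np.abs(x - mu) <= 1 * sigma)  # ≈ 0.683
within_2sd = np.mean(np.abs(x - mu) <= 2 * sigma)  # ≈ 0.954
within_3sd = np.mean(np.abs(x - mu) <= 3 * sigma)  # ≈ 0.997
```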

What is the difference between a population and a sample?

Population vs. sample

The population is everyone (or everything) we care about.
  • All U.S. workers. All counties in a state. All firms in an industry.
  • We almost never observe the full population.
The sample is the subset we actually observe.
  • We use sample statistics to estimate unknown population parameters.
  • The quality of inference depends on how the sample was drawn.
Population parameters vs. sample statistics.
Quantity            Population    Sample
Mean                μ             x̄
Variance            σ²            s²
Std. deviation      σ             s
Size                N             n
Regression coeff.   β             β̂ (beta-hat)
The sample mean is an estimator of μ.
x̄ = (1/n) ∑ xi
Unbiased: E[x̄] = μ. On average, across many samples, x̄ hits the target.
Consistent: as n → ∞, x̄ → μ. Larger samples give more precise estimates.
Law of Large Numbers: the formal statement that x̄ converges to μ as the sample grows.
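A quick simulation makes the Law of Large Numbers concrete (a Uniform(0, 1) population, whose true mean is 0.5, is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = 0.5  # true mean of the Uniform(0, 1) population

# Sample means at increasing sample sizes: the estimate tightens around mu
means = {n: rng.uniform(0, 1, size=n).mean() for n in (10, 1_000, 100_000)}
```

Any single small sample can miss badly; the theorem is about what happens as n grows.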
The Central Limit Theorem is why statistics works.
Regardless of the shape of the population distribution, the sampling distribution of x̄ is approximately normal for large n:
(x̄ − μ) / (σ / √n)  ∼  N(0, 1)
The term σ / √n is the standard error of the mean; it shrinks as the sample grows.
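To see the CLT at work, draw repeated samples from a population that looks nothing like a bell; a sketch using an Exp(1) population, for which μ = σ = 1:

```python
import numpy as np

rng = np.random.default_rng(11)
n, reps = 50, 20_000

# Each row is one sample of size n from a heavily right-skewed population
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# Standardize: (xbar - mu) / (sigma / sqrt(n)), with mu = sigma = 1 for Exp(1)
z = (sample_means - 1.0) / (1.0 / np.sqrt(n))

coverage = np.mean(np.abs(z) <= 1.96)  # close to 0.95 despite the skew
```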

Why the CLT matters for econometrics

We can construct confidence intervals.
  • A 95% CI for μ: x̄ ± 1.96 · (s / √n)
We can test hypotheses.
  • Is the true mean zero? Is the true regression coefficient zero?
  • Under H0, the test statistic follows a known distribution, so we can compute p-values.
OLS estimators inherit this normality.
  • β̂ is itself a sample average (of weighted data). CLT applies to it too.
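Both uses come down to the same two ingredients, a point estimate and its standard error; a minimal sketch on simulated data (the true mean of 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=2.0, scale=4.0, size=400)  # sample whose true mean is 2

xbar = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))  # standard error of the mean

ci_low, ci_high = xbar - 1.96 * se, xbar + 1.96 * se  # 95% CI for mu

# Test H0: mu = 0 against a two-sided alternative
t_stat = (xbar - 0.0) / se
reject = abs(t_stat) > 1.96  # True here: the sample mean is many SEs from zero
```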
Covariance measures how two variables move together.
Cov(X, Y) = E[(X − μ_X)(Y − μ_Y)]
Positive: X and Y tend to rise and fall together. Negative: they move in opposite directions.
Correlation: ρ = Cov(X, Y) / (σ_X σ_Y)
ρ is scale-free and bounded: −1 ≤ ρ ≤ 1. A value of 0 means no linear relationship; it does not mean independence.
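A simulation shows both the co-movement and the scale-invariance; the data-generating process below is an arbitrary construction:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)  # built to co-move with x

cov_xy = np.cov(x, y)[0, 1]    # near the true value 0.5
rho = np.corrcoef(x, y)[0, 1]  # near 0.5 / sqrt(1.25) ≈ 0.447

# Correlation is scale-free: rescaling x and y only ever flips the sign
rho_rescaled = np.corrcoef(100 * x, -0.01 * y)[0, 1]
```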
Regression is about conditional expectations.
Everything we just covered (random variables, means, variances, the normal distribution) feeds directly into regression.
OLS estimates E[Y | X]: the mean of Y given a value of X.
The regression coefficient β̂ is a sample statistic. It has an expected value, a variance, and, via the CLT, a sampling distribution we can use for inference.
Understanding that is what this course is about.
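To close the loop, the simple-regression slope is built out of the pieces above, a sample covariance divided by a sample variance; a sketch on simulated education/wage data (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

educ = rng.uniform(8, 20, size=n)                   # years of schooling (simulated)
wage = 1.0 + 0.8 * educ + rng.normal(0, 2, size=n)  # true intercept 1, slope 0.8

# OLS in one line: beta_hat = Cov(X, Y) / Var(X), a ratio of sample statistics
beta_hat = np.cov(educ, wage)[0, 1] / np.var(educ, ddof=1)
alpha_hat = wage.mean() - beta_hat * educ.mean()
```

Because β̂ is a sample statistic, rerunning this with a different seed gives a slightly different estimate; that variation is exactly the sampling distribution the CLT describes.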