Correlation
"Correlation does not imply causation. But it often whispers where to look."
1. Chapter Overview
CORRELATION measures the STRENGTH and DIRECTION of the RELATIONSHIP between TWO variables. This chapter covers: types of correlation (positive/negative, linear/non-linear, simple/multiple/partial), DEGREES of correlation (perfect, high, moderate, low, zero), the SCATTER DIAGRAM, KARL PEARSON'S COEFFICIENT OF CORRELATION (r), and why correlation does not imply CAUSATION.
2. What Is Correlation?
- A STATISTICAL MEASURE of the relationship between two variables
- Answers: When X changes, does Y change SYSTEMATICALLY? In which DIRECTION? How STRONGLY?
Types of Correlation
| By Direction | By Form | By Number of Variables |
|---|---|---|
| Positive: X↑ → Y↑ (height & weight) | Linear: Points cluster around a straight line | Simple: Two variables |
| Negative: X↑ → Y↓ (price & quantity demanded) | Non-linear: Relationship is curved | Multiple: 3+ variables (one variable against several others jointly) |
| Zero: No linear relationship | | Partial: Two variables, controlling for the other variables |
Degree of Correlation
- Perfect (r = +1 or -1): ALL points fall exactly on a straight line
- High (r ~ ±0.7 to ±0.99): Strong relationship
- Moderate (r ~ ±0.3 to ±0.7): Moderate relationship
- Low (r ~ 0 to ±0.3): Weak relationship
- Zero (r = 0): No linear relationship (a non-linear relationship may still exist)
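The degree bands above can be applied mechanically. The sketch below assumes the approximate cut-offs listed, with a boundary value such as 0.7 going to the higher band. Textbooks differ on exact boundaries. The helper name `degree_of_correlation` is hypothetical.

```python
def degree_of_correlation(r: float) -> str:
    """Label r using the approximate bands above (an assumed convention)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 1.0:
        label = "perfect"
    elif size >= 0.7:   # assumed: 0.7 itself counts as high
        label = "high"
    elif size >= 0.3:   # assumed: 0.3 itself counts as moderate
        label = "moderate"
    elif size > 0.0:
        label = "low"
    else:
        return "zero"   # no direction for r = 0
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"


for value in (1.0, -0.85, 0.45, -0.1, 0.0):
    print(value, degree_of_correlation(value))
```

The sign gives the direction and the absolute value gives the degree, so the two are handled separately.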
3. Methods of Measuring Correlation
1. Scatter Diagram
- Plot (X, Y) pairs on a graph. Each point = one observation.
- The pattern of dots SHOWS the relationship VISUALLY
- Rough idea. Not precise. Good for FIRST GLANCE.
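A scatter diagram is usually drawn with graph paper or a plotting library. The sketch below avoids any dependency by printing a rough text version instead. The helper name `ascii_scatter`, the grid size, and the height/weight figures are hypothetical values chosen for illustration. They are not data from this chapter.

```python
def ascii_scatter(xs, ys, width=20, height=10):
    """Return text rows of a rough scatter diagram; '*' marks each (x, y) pair."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equal-length, non-empty x and y lists")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x_span = (x_max - x_min) or 1  # avoid dividing by zero if all x are equal
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so larger y prints higher
    # "|" on the left stands in for the Y axis, the last row for the X axis
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical heights (cm) and weights (kg) for seven people
heights = [150, 155, 160, 165, 170, 175, 180]
weights = [50, 54, 57, 63, 66, 71, 75]
rows = ascii_scatter(heights, weights)
for line in rows:
    print(line)
```

The stars rise from bottom-left to top-right, which is the visual signature of positive correlation. As the notes say, this shows the pattern but gives no precise measure.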
2. Karl Pearson's Coefficient of Correlation (r)
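$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}}$$
Or equivalently, using raw sums (convenient for hand calculation):
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - \left(\sum X\right)^2}\,\sqrt{n\sum Y^2 - \left(\sum Y\right)^2}}$$
where n is the number of (X, Y) pairs and X̄, Ȳ are the means of X and Y.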
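Pearson's r can be computed in two ways: from deviations about the means, or from the raw totals n, ΣX, ΣY, ΣXY, ΣX² and ΣY². The sketch below implements both forms and checks that they agree on a small made-up data set. The function names are hypothetical. For these data, ΣXY = 66, ΣX² = 55 and ΣY² = 86, so r = 30 / √(50 × 30) ≈ 0.7746.

```python
import math


def pearson_deviation(xs, ys):
    """r from deviations about the means (first form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_raw_sums(xs, ys):
    """r from raw totals: n, sum X, sum Y, sum XY, sum X^2, sum Y^2 (second form)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


X = [1, 2, 3, 4, 5]  # made-up illustration data
Y = [2, 4, 5, 4, 5]
r_dev = pearson_deviation(X, Y)
r_raw = pearson_raw_sums(X, Y)
print(round(r_dev, 4), round(r_raw, 4))  # 0.7746 0.7746
```

Both forms are algebraically identical. The raw-sum form avoids subtracting means from every value, which is why it suits hand calculation. Python 3.10 and later also ship `statistics.correlation`, which computes Pearson's r directly and can serve as a cross-check.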
Properties of r
- Always between -1 and +1 (inclusive)
- +1: Perfect positive correlation. All points on a RISING line.
- -1: Perfect negative correlation. All points on a FALLING line.
- 0: No LINEAR correlation
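- r is UNIT-FREE: it does not depend on the units of X or Y. Changing the origin or scale of either variable leaves r unchanged, except that multiplying by a negative number flips its sign.
- r is SYMMETRIC: Correlation(X,Y) = Correlation(Y,X)
- r is SENSITIVE to outliers: a single extreme point can sharply raise or lower it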
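The properties above can be checked numerically. The sketch below uses hypothetical height/weight data and a small `pearson` helper in deviation form. It confirms four things:
- converting units leaves r unchanged;
- swapping X and Y leaves r unchanged;
- negating X flips the sign;
- an exact curved relationship (y = x² on values symmetric about zero) gives r = 0, and one outlier drags r down sharply.

```python
import math


def pearson(xs, ys):
    """Pearson's r in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


heights_cm = [150, 155, 160, 165, 170, 175, 180]  # hypothetical data
weights_kg = [50, 54, 57, 63, 66, 71, 75]

r = pearson(heights_cm, weights_kg)
# Unit-free: inches and pounds give the same r as cm and kg
r_units = pearson([h / 2.54 for h in heights_cm], [w * 2.20462 for w in weights_kg])
# Symmetric: swapping the roles of X and Y
r_swapped = pearson(weights_kg, heights_cm)
# Negative scale factor: negating X flips the sign of r
r_flipped = pearson([-h for h in heights_cm], weights_kg)

# Exact non-linear relationship, yet no LINEAR correlation
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson(xs, [x * x for x in xs])

# One extreme point added to otherwise tightly linear data
r_outlier = pearson(heights_cm + [185], weights_kg + [40])

print(round(r, 4))            # roughly 0.998
print(math.isclose(r, r_units), math.isclose(r, r_swapped), math.isclose(-r, r_flipped))
print(r_curve)                # 0.0
print(round(r_outlier, 4))    # drops to roughly 0.20
```

The y = x² case is why "r = 0" should be read as "no LINEAR relationship", not "no relationship".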
4. Correlation vs Causation — THE MOST IMPORTANT CAVEAT
Correlation ≠ Causation
- Just because X and Y move together does NOT mean X CAUSES Y
- Examples:
- Ice cream sales and drowning deaths are POSITIVELY correlated. Does ice cream cause drowning? NO. Both increase in SUMMER (the hidden variable: temperature).
- Shoe size and reading ability in children are positively correlated. Do big feet make you read better? NO. Older children have bigger feet AND read better. AGE is the hidden variable.
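A small simulation shows how a hidden variable manufactures a correlation. In the sketch below, simulated ice cream sales and drownings each depend ONLY on simulated temperature plus independent noise, so neither affects the other by construction. The numbers, noise levels, and seed are all hypothetical. The two outcomes still end up strongly correlated. The standard first-order partial correlation, which controls for temperature, falls to roughly zero. This links back to "Partial" correlation in the types table.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's r in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of X and Y after controlling for Z (first-order partial)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(42)  # fixed seed so the run is repeatable
temperature = [rng.uniform(15, 35) for _ in range(2000)]        # hidden variable Z
ice_cream = [20 * t + rng.gauss(0, 40) for t in temperature]    # X depends only on Z
drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]    # Y depends only on Z

r_xy = pearson(ice_cream, drownings)
r_xz = pearson(ice_cream, temperature)
r_yz = pearson(drownings, temperature)
r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)

print("ice cream vs drownings:", round(r_xy, 2))
print("controlling for temperature:", round(r_xy_given_z, 2))
```

The raw r is high only because both series follow temperature. Once temperature is held constant, the apparent link disappears.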
Spurious Correlation
- A correlation that appears real but is COINCIDENTAL or explained by a THIRD VARIABLE
- Always ask: is there a LOGICAL CONNECTION? Could a THIRD VARIABLE explain this?
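Purely COINCIDENTAL correlation is easy to produce with small samples. The sketch below repeatedly draws five X values and five Y values completely independently and counts how often |r| exceeds 0.8 anyway. The sample size, threshold, trial count, and seed are hypothetical choices. With only five pairs, a sizeable share of trials (on the order of one in ten) clear that bar by chance alone.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's r in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(7)  # fixed seed so the run is repeatable
trials, n = 1000, 5
strong = 0
for _ in range(trials):
    xs = [rng.random() for _ in range(n)]
    ys = [rng.random() for _ in range(n)]  # generated independently of xs
    if abs(pearson(xs, ys)) > 0.8:
        strong += 1

share_strong = strong / trials
print("share of unrelated samples with |r| > 0.8:", share_strong)
```

A high r from a handful of observations therefore deserves suspicion before anyone looks for a mechanism.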
5. Exam Focus
- Types — positive/negative, linear/non-linear, simple/multiple/partial
- Degree — perfect, high, moderate, low, zero
- Scatter diagram — visual inspection of correlation
- Karl Pearson's r — formula, properties (range -1 to +1, unit-free, symmetric)
- Correlation ≠ Causation — examples
6. Conclusion
Correlation is the first step in understanding relationships between variables:
- SCATTER DIAGRAM: Look at the dots. Do they suggest a pattern?
- PEARSON'S r: The numerical measure. -1 to +1. The STRENGTH and DIRECTION of the linear relationship.
- CAUSATION: Correlation is a CLUE, not a CONCLUSION. Always ask: WHY? What's the MECHANISM? Is there a THIRD VARIABLE?
'The data say A and B go together. The scientist asks: but WHY? Correlation opens the door. Causation walks through it.'
