Probability and Data Analysis

MYP Unit Framework

Key Concept: Logic

Related Concepts: Representation, Validity, Models

Global Context: Identities and Relationships (How do data and probability help us understand ourselves and our communities?)

Statement of Inquiry: Statistical methods allow us to draw valid conclusions from data while recognising uncertainty and bias.


Inquiry Questions

Type | Question
Factual | What is the difference between mean, median, and mode? How do you calculate the probability of simple events?
Conceptual | How do different representations of data affect interpretation? Why is randomness important in probability and statistics?
Debatable | Can statistics be trusted, or are they easily manipulated to support any argument? Should probability influence life decisions, or is intuition a better guide?

ATL Skills

  • Thinking: Critically evaluate statistical claims; distinguish between correlation and causation
  • Research: Collect, organise, and analyse data; design surveys and experiments
  • Communication: Present data using appropriate graphs and charts; write data-driven conclusions
  • Social: Collaborate on data collection and analysis projects
  • Self-Management: Manage time for extended data investigation

1. Descriptive Statistics

Measures of Central Tendency

Mean: The arithmetic average. Sum of all values divided by number of values.

Median: The middle value when the data are arranged in order (with an even number of values, the mean of the two middle values). Less affected by outliers than the mean.

Mode: The most frequently occurring value. Useful for categorical data.

Choosing the Right Measure

  • Mean: Use when data is symmetrically distributed without outliers
  • Median: Use when data is skewed or has outliers
  • Mode: Use for categorical data or when identifying the most common value
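The effect of an outlier on each measure can be checked with a short sketch. The pocket-money figures below are made up for illustration, and `summarise` is a hypothetical helper built on Python's standard `statistics` module.

```python
from statistics import mean, median, multimode

# Hypothetical data: weekly pocket money (in dollars) for seven students.
pocket_money = [10, 12, 12, 13, 15, 16, 20]

# The same data with one extreme value (an outlier) added.
with_outlier = pocket_money + [200]

def summarise(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(values), median(values), multimode(values)

print(summarise(pocket_money))   # mean 14, median 13, mode 12
print(summarise(with_outlier))   # mean 37.25, median 14, mode 12
```

In this made-up data set the single outlier moves the mean from 14 to 37.25, while the median only shifts from 13 to 14 and the mode does not change. That is the reason the median is preferred for skewed data.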

Measures of Dispersion (Spread)

Range: Maximum value - Minimum value. Simple but affected by outliers.

Interquartile Range (IQR): Q3 - Q1. Represents the spread of the middle 50% of data. Largely unaffected by outliers, because it ignores the lowest and highest quarters of the data.

Quartiles:

  • Q1: Median of the lower half of data
  • Q2: Median of all data
  • Q3: Median of the upper half of data
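The quartile definitions above can be sketched in code. This follows the median-of-halves method described in the list and assumes that, for an odd number of values, the middle value is left out of both halves. Other conventions exist (calculators and spreadsheets may give slightly different quartiles). The scores are invented and `quartiles` is a hypothetical helper.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the middle value is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

# Hypothetical test scores (nine values).
scores = [4, 6, 7, 9, 10, 12, 15, 18, 20]

q1, q2, q3 = quartiles(scores)
iqr = q3 - q1                            # spread of the middle 50%
data_range = max(scores) - min(scores)   # maximum - minimum

print(q1, q2, q3)        # 6.5 10 16.5
print(iqr, data_range)   # 10.0 16
```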

Box-and-Whisker Plots

Visual representation showing minimum, Q1, median, Q3, and maximum. Useful for comparing distributions.


2. Data Representation

Types of Graphs and Their Uses

Graph Type | Best Used For
Bar chart | Comparing categories
Histogram | Distribution of continuous data
Pie chart | Showing proportions of a whole
Line graph | Trends over time
Scatter plot | Relationship between two variables
Box plot | Comparing distributions; showing spread

Misleading Graphs

Graphs can mislead through:

  • Truncated y-axis (not starting at zero)
  • Inconsistent scales
  • Cherry-picked time frames
  • 3D effects distorting proportions
  • Leaving out data points that do not support a narrative
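A small calculation shows why a truncated y-axis is misleading. The sketch below compares how tall two bars look relative to each other when the axis starts at zero and when it starts just below the data. The values 50 and 52 are invented, and `apparent_ratio` is a hypothetical helper.

```python
def apparent_ratio(smaller, larger, axis_start=0):
    """Ratio of the drawn bar heights when the y-axis begins at axis_start."""
    return (larger - axis_start) / (smaller - axis_start)

# Two hypothetical values that differ by only 4%.
honest = apparent_ratio(50, 52)                    # axis starts at 0
truncated = apparent_ratio(50, 52, axis_start=48)  # axis starts at 48

print(honest)     # about 1.04: the bars look almost equal
print(truncated)  # 2.0: the second bar looks twice as tall
```

The data are identical in both cases; only the starting point of the axis changes the visual impression.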

Analysing Graphical Data

When interpreting a graph, ask:

  1. What does the x-axis represent? The y-axis?
  2. What is the scale? Does it start at zero?
  3. What trends or patterns are visible?
  4. What conclusions can be drawn?
  5. What information is missing?

3. Probability Basics

What Is Probability?

Probability measures the likelihood of an event occurring. It ranges from 0 (impossible) to 1 (certain).

Formula (when all outcomes are equally likely): P(event) = Number of favourable outcomes / Total number of possible outcomes

Key Terminology

  • Experiment: A process with uncertain outcomes (e.g., rolling a die)
  • Outcome: A single result of an experiment
  • Event: A set of outcomes (e.g., rolling an even number)
  • Sample Space: All possible outcomes
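The formula and the terminology above can be tied together in a short sketch. It assumes every outcome in the sample space is equally likely (a fair die); `probability` is a hypothetical helper, and `Fraction` keeps answers exact.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(event) = favourable outcomes / total outcomes (equally likely outcomes)."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

# Experiment: rolling a fair six-sided die.
sample_space = [1, 2, 3, 4, 5, 6]

# Event: rolling a number greater than 4.
greater_than_4 = {5, 6}

print(probability(greater_than_4, sample_space))   # 1/3
print(probability(set(), sample_space))            # 0 (impossible event)
print(probability(set(sample_space), sample_space))  # 1 (certain event)
```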

Types of Probability

Theoretical Probability: Based on reasoning (e.g., probability of rolling a 6 on a fair die is 1/6)

Experimental Probability: Based on actual trials (e.g., 12 sixes in 60 rolls gives 12/60 = 0.2). As the number of trials increases, experimental probability tends to get closer to theoretical probability (Law of Large Numbers).
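The Law of Large Numbers can be explored with a simulation. This is a minimal sketch that uses Python's pseudo-random number generator in place of a real die; the function name and the fixed `seed` (used only to make runs repeatable) are assumptions for illustration.

```python
import random

def experimental_probability(trials, target=6, seed=0):
    """Estimate P(rolling `target`) on a fair die from `trials` simulated rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == target)
    return hits / trials

theoretical = 1 / 6

# Compare the estimate with the theoretical value as the number of rolls grows.
for n in (60, 600, 6000, 60000):
    estimate = experimental_probability(n)
    print(n, round(estimate, 4), "difference:", round(abs(estimate - theoretical), 4))
```

With few rolls the estimate can be noticeably off. With many rolls it usually settles close to 1/6, although any single run may still wobble.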

The Complement Rule

P(not A) = 1 - P(A)

Addition Rule (OR)

For mutually exclusive events: P(A or B) = P(A) + P(B)

For non-mutually exclusive events: P(A or B) = P(A) + P(B) - P(A and B)
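The complement and addition rules can be checked by listing outcomes for a fair die. The events below are chosen for illustration and `p` is a hypothetical helper that assumes equally likely outcomes.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # faces of a fair die

def p(event):
    """Probability of an event (a set of faces), assuming equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

even = {2, 4, 6}
high = {5, 6}   # overlaps with `even` at 6, so not mutually exclusive
one = {1}       # shares no outcome with `high`, so mutually exclusive

# Complement rule: P(not even) = 1 - P(even)
print(p(SAMPLE_SPACE - even), 1 - p(even))               # 1/2 1/2

# Mutually exclusive events: P(A or B) = P(A) + P(B)
print(p(one | high), p(one) + p(high))                   # 1/2 1/2

# Overlapping events: subtract P(A and B) so the 6 is not counted twice
print(p(even | high), p(even) + p(high) - p(even & high))  # 2/3 2/3
```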

Multiplication Rule (AND)

For independent events: P(A and B) = P(A) × P(B)

For dependent events: P(A and B) = P(A) × P(B | A), where P(B | A) means the probability of B given that A has occurred
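The two multiplication rules can be compared with a hypothetical bag of 3 red and 2 blue marbles. Drawing with replacement gives independent events; drawing without replacement gives dependent events. The sketch below applies each rule and then confirms the answer by counting every ordered pair of draws.

```python
from fractions import Fraction
from itertools import permutations, product

bag = ["R", "R", "R", "B", "B"]  # hypothetical bag: 3 red, 2 blue

# Independent (with replacement): P(red and red) = 3/5 * 3/5
rule_independent = Fraction(3, 5) * Fraction(3, 5)
pairs = list(product(bag, repeat=2))       # all 25 ordered draws
count_independent = Fraction(sum(a == b == "R" for a, b in pairs), len(pairs))

# Dependent (without replacement): P(red) * P(red | first was red) = 3/5 * 2/4
rule_dependent = Fraction(3, 5) * Fraction(2, 4)
pairs = list(permutations(bag, 2))         # all 20 ordered draws
count_dependent = Fraction(sum(a == b == "R" for a, b in pairs), len(pairs))

print(rule_independent, count_independent)  # 9/25 9/25
print(rule_dependent, count_dependent)      # 3/10 3/10
```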

Expected Value

Expected value = sum of (each outcome × its probability)

Real-world application: Insurance companies use expected value to set premiums; casinos use it to ensure profitability.
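As a worked illustration of expected value, the sketch below handles a fair die and an invented raffle (a $2 ticket with a 1-in-100 chance of a $50 prize, so the net gain is either +48 or -2). `expected_value` is a hypothetical helper.

```python
from fractions import Fraction

def expected_value(distribution):
    """distribution: list of (outcome value, probability) pairs."""
    assert sum(prob for _, prob in distribution) == 1, "probabilities must sum to 1"
    return sum(value * prob for value, prob in distribution)

# Fair die: each face 1 to 6 has probability 1/6.
fair_die = [(face, Fraction(1, 6)) for face in range(1, 7)]

# Hypothetical raffle: net gain of +48 with probability 1/100, otherwise -2.
raffle = [(48, Fraction(1, 100)), (-2, Fraction(99, 100))]

print(expected_value(fair_die))  # 7/2, i.e. 3.5
print(expected_value(raffle))    # -3/2, i.e. an average loss of $1.50 per ticket
```

The expected value of 3.5 is not a face of the die. It describes the long-run average over many rolls, not a single result.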


4. Probability in Practice

Tree Diagrams

Tree diagrams help visualise multi-stage events with probabilities at each branch.
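A tree diagram can be modelled as nested branches: multiply along a path, then add the paths that make up the event. The sketch below uses an invented two-stage situation (weather, then whether a bus is late); all probabilities and the helper `path_probabilities` are assumptions for illustration.

```python
from fractions import Fraction

# Each branch maps a label to (branch probability, sub-tree or None for a leaf).
tree = {
    "rain": (Fraction(3, 10), {"late": (Fraction(1, 2), None),
                               "on time": (Fraction(1, 2), None)}),
    "dry": (Fraction(7, 10), {"late": (Fraction(1, 10), None),
                              "on time": (Fraction(9, 10), None)}),
}

def path_probabilities(subtree, path=(), prob=Fraction(1)):
    """Multiply along each branch; return {path: probability} for every leaf."""
    results = {}
    for label, (branch_prob, children) in subtree.items():
        new_path, new_prob = path + (label,), prob * branch_prob
        if children is None:
            results[new_path] = new_prob
        else:
            results.update(path_probabilities(children, new_path, new_prob))
    return results

paths = path_probabilities(tree)

# Add the probabilities of every path that ends in "late".
p_late = sum(p for path, p in paths.items() if path[-1] == "late")

print(paths[("rain", "late")])  # 3/20
print(p_late)                   # 11/50
print(sum(paths.values()))      # 1 (all paths together cover the sample space)
```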

Venn Diagrams

Venn diagrams show relationships between sets and can be used to calculate probabilities involving unions and intersections.

Two-Way Tables

Two-way tables organise data by two categories and allow calculation of conditional probabilities.

Conditional Probability

The probability of event A given that event B has occurred: P(A|B) = P(A and B) / P(B), provided P(B) is greater than 0
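Conditional probability is often easiest to see in a two-way table. The sketch below uses invented survey counts (year group against way of travelling to school) and applies the formula above; the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical two-way table of counts: (year group, travel method).
counts = {
    ("Year 9", "Bus"): 18, ("Year 9", "Walk"): 12,
    ("Year 10", "Bus"): 10, ("Year 10", "Walk"): 20,
}
total = sum(counts.values())  # 60 students in all

def p_both(year, travel):
    """P(year and travel): one cell divided by the grand total."""
    return Fraction(counts[(year, travel)], total)

def p_year(year):
    """P(year): the row total divided by the grand total."""
    return Fraction(sum(c for (y, _), c in counts.items() if y == year), total)

def p_travel_given_year(travel, year):
    """P(travel | year) = P(travel and year) / P(year)."""
    return p_both(year, travel) / p_year(year)

print(p_travel_given_year("Bus", "Year 9"))   # 3/5
print(p_travel_given_year("Bus", "Year 10"))  # 1/3
```

The same answers come from reading along a single row: 18 of the 30 Year 9 students take the bus, which is 3/5.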


5. Correlation and Causation

Correlation

Correlation measures the strength and direction of the linear relationship between two variables.

  • Positive correlation: As one variable increases, the other increases
  • Negative correlation: As one variable increases, the other decreases
  • No correlation: No linear relationship between the variables

Correlation Coefficient (r)

Ranges from -1 to +1:

  • r = +1: Perfect positive correlation
  • r = -1: Perfect negative correlation
  • r = 0: No linear correlation (the variables may still be related in a non-linear way)
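For readers who want to see where r comes from, here is a minimal sketch of the standard Pearson formula written out by hand. The data sets are invented and `pearson_r` is a hypothetical helper; in practice a calculator or spreadsheet would do this step.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2, 4, 6, 8, 10]))   # 1.0: perfect positive
print(pearson_r(xs, [10, 8, 6, 4, 2]))   # -1.0: perfect negative

# y = x squared is a clear pattern, but it is not linear, so r comes out as 0.
curve_x = [-2, -1, 0, 1, 2]
curve_y = [x ** 2 for x in curve_x]
print(pearson_r(curve_x, curve_y))       # 0.0
```

The last example is a reminder that r measures only the linear relationship: a value of 0 does not rule out a curved pattern.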

Correlation Does NOT Imply Causation

A common statistical fallacy: assuming that because two variables are correlated, one causes the other.

Example: Ice cream sales and drowning incidents are correlated. But ice cream does not cause drowning. Both are caused by a third variable: hot weather (more people swim AND more people eat ice cream).

Spurious Correlations

Sometimes correlations are purely coincidental. The website 'Spurious Correlations' shows examples like the correlation between margarine consumption and the divorce rate in Maine.


6. Data Investigation Project

The Statistical Investigation Cycle (PPDAC)

  1. Problem: Define the question you want to answer
  2. Plan: Design how to collect data
  3. Data: Collect the data
  4. Analysis: Organise, represent, and analyse the data
  5. Conclusion: Draw conclusions and communicate findings

Designing a Survey

  • Define a clear research question
  • Choose an appropriate sample size
  • Avoid biased or leading questions
  • Ensure anonymity and ethical data collection
  • Consider sampling method (random, stratified, convenience)
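The difference between simple random and stratified sampling can be sketched as follows. The student lists are invented, both helper functions are hypothetical, and the fixed `seed` is only there to make the example repeatable.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Pick k members so that every member has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_sample(strata, k, seed=0):
    """Sample each group in proportion to its size.

    strata maps a group name to a list of members. Shares are rounded,
    so for awkward group sizes the total may differ slightly from k.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    return {name: rng.sample(members, round(k * len(members) / total))
            for name, members in strata.items()}

# Hypothetical school: 60 Year 9 students and 40 Year 10 students.
strata = {
    "Year 9": [f"Y9-{i}" for i in range(60)],
    "Year 10": [f"Y10-{i}" for i in range(40)],
}
everyone = strata["Year 9"] + strata["Year 10"]

random_sample = simple_random_sample(everyone, 10)
by_year = stratified_sample(strata, 10)

print(random_sample)
print({name: len(chosen) for name, chosen in by_year.items()})  # 6 and 4
```

A simple random sample may over- or under-represent a year group by chance, while the stratified sample keeps the 60:40 split. A convenience sample (for example, asking only friends) has no such safeguard and is more open to bias.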

Ethical Considerations

  • Informed consent
  • Privacy and confidentiality
  • Honest representation of data
  • Avoiding manipulation of statistics for persuasion

Summative Assessment

Task: Statistical investigation (800-1000 words equivalent) involving data collection, analysis, and interpretation.

Criteria:

  • A: Knowing and Understanding — Apply statistical and probability concepts correctly
  • B: Investigating Patterns — Collect data, identify patterns, and draw valid conclusions
  • C: Communicating — Present data using appropriate representations; communicate reasoning clearly
  • D: Applying Mathematics in Real-World Contexts — Apply statistics to a real-world question; evaluate limitations

Option 1: Design and conduct a survey on a topic of interest. Analyse the data using measures of central tendency and dispersion. Present findings with appropriate graphs.

Option 2: Investigate a claim made in the media using statistical analysis. Is the claim supported by evidence? How might data be misleading?

Option 3: Conduct a probability experiment (e.g., rolling dice, spinning spinners). Compare experimental results with theoretical probability. Discuss the Law of Large Numbers.


Formative Assessment

  • Calculating mean, median, mode, range, and IQR from data sets
  • Creating and interpreting different types of graphs
  • Probability problem sets (single and multi-stage events)
  • Identifying misleading graphs (media analysis)
  • Correlation vs. causation exercises
  • Tree diagram and Venn diagram construction

Interdisciplinary Connections

  • Science: Analysing experimental data; understanding statistical significance
  • Economics: Risk assessment; market analysis; probability in financial decisions
  • Psychology: Understanding statistical claims in psychological research
  • Media Studies: Critical analysis of statistics in news and advertising
  • Sports: Analysing player and team statistics; probability in game strategy

Service as Action

  • Data for Good: Collect and analyse data on an issue in your school (waste, energy use, wellbeing). Present findings to the school leadership with recommendations.
  • Statistical Literacy: Create posters or a workshop for younger students about how to spot misleading statistics in the media.

IB Learner Profile

  • Inquirers: Ask questions that can be answered through data
  • Thinkers: Critically evaluate statistical claims and distinguish valid conclusions from misleading ones
  • Knowledgeable: Understand concepts of probability and data analysis
  • Principled: Use data ethically; represent findings honestly
  • Reflective: Consider the limitations of statistical methods and the role of uncertainty

Self-Test

  1. Find the mean, median, and mode of: 3, 7, 2, 8, 3, 9, 5.
  2. For the data set in question 1, what is the range? What is the interquartile range?
  3. When should you use the median instead of the mean?
  4. List THREE ways graphs can be misleading.
  5. What is the probability of rolling an even number on a fair six-sided die?
  6. Two dice are rolled. What is the probability of rolling a sum of 7?
  7. What is the complement rule in probability?
  8. Explain the difference between mutually exclusive and independent events.
  9. What is the difference between correlation and causation? Give an example.
  10. What are the five stages of the PPDAC statistical investigation cycle?

This unit aligns with the IB MYP Mathematics guide and was developed for Year 4 (Class 9) students.

Verified by the tuition.in editorial team