🔵 Data Visualisation

Scatter Plots & Correlation Explained

Discover the relationship between two variables: plot the points, describe the pattern, and draw the line of best fit.


๐Ÿ• 10 min read  |  Class 9โ€“12  |  FBISE ยท CBSE ยท IGCSE ยท O-Levels ยท IB

Does studying more hours lead to higher exam scores? Do taller students tend to have longer feet? These are questions about the relationship between two variables, and a scatter plot is the tool designed to answer them visually. A scatter plot shows paired data as individual points on a grid. The pattern formed by those points reveals whether the two variables are related, how strongly, and in which direction. Understanding scatter plots and correlation is one of the most practically useful statistical skills you will learn.

What Is a Scatter Plot?

A scatter plot (or scatter diagram) displays pairs of values for two variables. One variable is plotted on the x-axis (typically the independent variable) and the other on the y-axis (the dependent variable). Each pair of values becomes one point on the graph. The overall pattern of points indicates the type and strength of the relationship, known as correlation.

Unlike a line graph, the points in a scatter plot are not connected. The pattern is read as a whole, not point to point.

Types of Correlation

Type | What the Points Show | Real-World Example
Strong positive | Points cluster tightly around a line going up-right | Study hours vs exam score
Weak positive | Points loosely trend up-right, widely scattered | Height vs shoe size
Strong negative | Points cluster tightly around a line going down-right | Speed vs journey time
Weak negative | Points loosely trend down-right | Temperature vs heating cost
No correlation | Points scattered randomly, no visible trend | Shoe size vs IQ score

💡 Correlation does not imply causation. Two variables can be correlated without one causing the other. For example, ice cream sales and drowning rates both rise in summer, but ice cream does not cause drowning. Both are caused by a third factor (hot weather).

Step-by-Step: Drawing a Scatter Plot

The table shows the number of hours five students studied and their test scores.

Student | Hours Studied | Test Score (%)
A | 2 | 45
B | 4 | 58
C | 5 | 65
D | 7 | 78
E | 9 | 90

📋 Plotting the Scatter Diagram

1. Identify the variables. Hours studied is the independent variable → x-axis. Test score is the dependent variable → y-axis.

2. Draw and label axes. x-axis: 0 to 10 hours. y-axis: 0 to 100%. Mark equal intervals.

3. Plot each pair as a single point (×): (2, 45) · (4, 58) · (5, 65) · (7, 78) · (9, 90)

4. Describe the correlation. The points rise from left to right in a fairly tight cluster, showing strong positive correlation: students who studied more tended to score higher.

5. Draw the line of best fit. Draw a straight line that passes through the middle of the data, with roughly equal numbers of points above and below. The line should pass through or near the mean point (x̄, ȳ) = (5.4, 67.2).
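
The mean point quoted in step 5 can be checked with a few lines of Python. This is only a minimal sketch: the variable names (`data`, `hours`, `scores`, `mean_point`) are illustrative and not part of the original worked example.

```python
from statistics import mean

# Paired data from the table: (hours studied, test score in %)
data = [(2, 45), (4, 58), (5, 65), (7, 78), (9, 90)]

# Split the pairs into x-values and y-values
hours = [x for x, _ in data]
scores = [y for _, y in data]

# The mean point is (mean of the x-values, mean of the y-values)
mean_point = (mean(hours), mean(scores))
print(mean_point)
```

The two means come from 27 ÷ 5 and 336 ÷ 5, which matches the point (5.4, 67.2) used when drawing the line of best fit.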

The Line of Best Fit

The line of best fit (regression line) is a straight line drawn through a scatter plot to represent the trend of the data. It is used for interpolation (predicting values within the data range) and, with caution, extrapolation (predicting outside the range).

To draw it by eye: imagine the cloud of points as a rugby ball shape and draw a line along its longest axis so that about half the points are above and half below. Always draw through the mean point (x̄, ȳ) when possible.

⚠️ Extrapolation is unreliable. Predicting far outside the range of your data using the line of best fit assumes the trend continues, which may not be true. Exam questions often ask you to comment on the reliability of an extrapolated prediction.

Real-Life Applications

  • ๐Ÿ‹๏ธ
    Sports science: Coaches scatter-plot training load against performance metrics to find the optimal training intensity for each athlete.
  • ๐Ÿ 
    Real estate: Property analysts plot house size against sale price to identify whether larger homes command proportionally higher prices in a given area.
  • ๐Ÿ”ฌ
    Science experiments: In physics or chemistry practicals, students plot experimental data as a scatter plot and use the line of best fit to determine gradients and intercepts.
  • ๐Ÿ“Š
    Public health: Epidemiologists scatter-plot variables like air pollution levels and respiratory illness rates across different cities to identify associations.

Common Mistakes Students Make

โš ๏ธ Connecting the dots. Scatter plot points are never connected with lines. Connecting them turns the graph into a line graph, which implies time sequence โ€” not what a scatter plot shows.
โš ๏ธ Drawing the line of best fit through the origin. The line of best fit does not have to pass through (0, 0) unless you have strong theoretical reasons to expect a proportional relationship.
โš ๏ธ Describing correlation as "cause and effect." Always say "there is a positive correlation between X and Y" โ€” never "X causes Y" based on a scatter plot alone.
โš ๏ธ Ignoring outliers when describing the overall pattern. Describe the general trend first, then mention any outliers separately. Do not let one unusual point dominate your description.

Frequently Asked Questions

What is Pearson's correlation coefficient (r)?
Pearson's r is a numerical measure of the strength and direction of linear correlation, ranging from −1 (perfect negative) to +1 (perfect positive). A value near 0 indicates no linear correlation. It is introduced at IB and A-Level, but the visual interpretation of scatter plots is expected at IGCSE and O-Level.

How do I find the equation of the line of best fit?
Use the form y = mx + c. Identify two points on your drawn line (not necessarily data points), calculate the gradient m = (y₂ − y₁)/(x₂ − x₁), then substitute one point to find c. This is tested at IGCSE and IB level.

What is an outlier?
An outlier is a point that lies noticeably away from the general pattern of the scatter plot, significantly above or below the line of best fit. In exam questions, you may be asked to identify outliers, explain possible reasons for them, and comment on whether they should be included in the analysis.
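
Two of the answers above can be tried out in code. The sketch below computes Pearson's r from its usual deviation formula and finds y = mx + c from two points on a line. The helper names `pearson_r` and `line_through` are hypothetical, and the two points (2, 45) and (8, 84) are assumed readings from a hand-drawn line, not values given in the article.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired lists xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def line_through(p1, p2):
    """Gradient m and intercept c of the straight line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = (y2 - y1) / (x2 - x1)  # m = (y2 - y1) / (x2 - x1)
    c = y1 - m * x1            # substitute one point into y = mx + c
    return m, c


# Study-hours data from the worked example
hours = [2, 4, 5, 7, 9]
scores = [45, 58, 65, 78, 90]
r = pearson_r(hours, scores)

# Assumed points read off a hand-drawn line of best fit
m, c = line_through((2, 45), (8, 84))

print(f"r = {r:.4f}")
print(f"y = {m}x + {c}")
```

For the study-hours data, r comes out very close to +1, which agrees with the "strong positive" description. The two assumed points give y = 6.5x + 32, and substituting x = 5.4 gives 67.1, close to the mean point's y-value of 67.2.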

Try the Scatter Plot Generator

Enter your paired data and instantly generate a fully labelled scatter plot, complete with line of best fit and correlation description. Ideal for checking assignments or exploring data sets.

🔵 Open the Scatter Plot Generator →