Scatter plots are powerful data visualization tools that can convey a lot of information. This tutorial will explain what they are and when to use them.
What is a Scatter Plot?
A scatter plot is a two-dimensional data visualization that uses dots to represent the values of two different variables, one plotted along the x-axis and the other along the y-axis. For example, this scatter plot shows the height and weight of a fictitious set of children.
Each dot represents one child, with his or her height measured along the x-axis and weight measured along the y-axis.
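As a rough illustration of how such a chart could be built, here is a minimal sketch in Python using matplotlib. The choice of tool is an assumption, since this tutorial does not name one. The heights and weights are invented for the example, and plot_height_weight is a hypothetical helper name.

```python
# Invented sample data: one (height, weight) pair per fictitious child.
heights_cm = [110, 115, 120, 125, 130, 135, 140, 145]
weights_kg = [19, 22, 21, 25, 27, 26, 31, 34]


def plot_height_weight(heights, weights, path="height_weight.png"):
    """Draw one dot per child and save the chart to `path`."""
    # matplotlib is imported inside the function so the data above
    # can be inspected even where matplotlib is not installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(heights, weights)  # x = height, y = weight
    ax.set_xlabel("Height (cm)")
    ax.set_ylabel("Weight (kg)")
    ax.set_title("Height and weight of a fictitious set of children")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling plot_height_weight(heights_cm, weights_kg) should write the chart to height_weight.png, with one dot for each of the eight made-up children.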
When to Use Scatter Plots
Use a scatter plot when you want to show the relationship between two variables. Scatter plots are sometimes called correlation plots because they show how two variables are correlated. The height and weight chart is more than a simple log of each child’s height and weight: it also visualizes the relationship between the two, namely that weight increases as height increases. Notice that the relationship isn’t perfect. Some taller children weigh less than some shorter children, but the general trend is strong, and we can see that weight is correlated with height.
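One common way to put a number on a trend like this is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it by hand for an invented height and weight data set. The data and the function name pearson_r are assumptions for illustration, not part of the original example.

```python
from math import sqrt

# Invented sample data: note that some taller children weigh less
# than some shorter ones (for example 120 cm vs 115 cm).
heights_cm = [110, 115, 120, 125, 130, 135, 140, 145]
weights_kg = [19, 22, 21, 25, 27, 26, 31, 34]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(heights_cm, weights_kg)
print(f"r = {r:.2f}")  # prints r = 0.96
```

A value close to 1 matches what the eye sees in the chart: a strong, but not perfect, positive relationship.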
Not all relationships are linear. For example, this plot shows the average daily high temperature, measured over seven years, against the time of year. It shows a familiar, roughly parabolic relationship in which temperature peaks in the summer months.
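A curved relationship like this is one reason to look at the plot rather than rely on a single summary number. The sketch below uses an idealized, invented temperature curve (a perfect parabola peaking in mid-year, not real measurements) and shows that a straight-line measure, the Pearson correlation coefficient, comes out at essentially zero even though temperature clearly depends on the month. The function name pearson_r is an assumption for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Idealized, made-up data: temperature rises to a mid-year peak and
# falls again symmetrically.
months = list(range(1, 13))
temps_c = [30 - (m - 6.5) ** 2 for m in months]

r = pearson_r(months, temps_c)
print(f"|r| = {abs(r):.3f}")  # essentially zero despite the clear curve
```

Because the rising and falling halves of the curve cancel each other out, the linear correlation is negligible, yet a scatter plot of the same points would show the pattern immediately.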
Data can also show no discernible relationship at all (no correlation), which can itself be an interesting finding.
Common Extensions of Scatter Plots
Many visualization tools let you extend the basic scatter plot to show more information.
Scatter plots often include a trendline to make the relationship clearer, as in the following graph.
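A straight trendline is commonly computed with an ordinary least-squares fit, although this tutorial does not say which method its chart uses. The sketch below fits such a line to invented height and weight data. The data and the function name fit_line are assumptions for illustration.

```python
# Invented sample data for a fictitious set of children.
heights_cm = [110, 115, 120, 125, 130, 135, 140, 145]
weights_kg = [19, 22, 21, 25, 27, 26, 31, 34]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights_cm, weights_kg)

# Trendline value at each observed height, ready to draw over the dots.
trend_kg = [slope * x + intercept for x in heights_cm]
print(f"slope = {slope:.3f} kg per cm, intercept = {intercept:.2f} kg")
```

With matplotlib, for instance, the line could then be drawn over an existing scatter plot with ax.plot(heights_cm, trend_kg).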
Additionally, the size, shape, or color of each dot can represent a third (or even a fourth) variable. For example, this chart shows the same height and weight data but uses the color of each dot to show the child’s gender.
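One way to encode a third variable as color is to split the points into groups and draw each group in its own color. The sketch below does this in Python with matplotlib, which is an assumed choice of tool. The data, the color palette, and the helper names group_points and plot_by_gender are all invented for illustration.

```python
# Invented sample data: height, weight, and gender for each fictitious child.
heights_cm = [110, 115, 120, 125, 130, 135, 140, 145]
weights_kg = [19, 22, 21, 25, 27, 26, 31, 34]
genders = ["girl", "boy", "girl", "boy", "boy", "girl", "girl", "boy"]
palette = {"girl": "tab:orange", "boy": "tab:blue"}  # arbitrary colors


def group_points(xs, ys, labels):
    """Group (x, y) points by label: {label: ([xs...], [ys...])}."""
    groups = {}
    for x, y, label in zip(xs, ys, labels):
        group_xs, group_ys = groups.setdefault(label, ([], []))
        group_xs.append(x)
        group_ys.append(y)
    return groups


groups = group_points(heights_cm, weights_kg, genders)


def plot_by_gender(groups, palette, path="height_weight_gender.png"):
    """Draw each group in its own color and save the chart to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # One scatter call per group so each group gets its own legend entry.
    for label, (xs, ys) in groups.items():
        ax.scatter(xs, ys, color=palette[label], label=label)
    ax.set_xlabel("Height (cm)")
    ax.set_ylabel("Weight (kg)")
    ax.legend(title="Gender")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling plot_by_gender(groups, palette) should save a chart in which the legend maps each color to a gender. The same grouping approach would work for marker shape or size.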
Scatter plots are very useful tools for conveying the relationship between two variables, but you need to know how to use them and interpret them properly.