Charts are an essential part of working with data, as they condense large amounts of data into an easy-to-understand format. Visualizations of data can bring out insights to someone looking at the data for the first time, as well as convey findings to others who won’t see the raw data. There are countless chart types out there, each with different use cases. Often, the most difficult part of creating a data visualization is figuring out which chart type is best for the task at hand.
Your choice of chart type will depend on multiple factors. What are the types of metrics, features, or other variables that you plan on plotting? Who is the audience that you plan on presenting to – is it just an initial exploration for yourself, or are you presenting to a broader audience? What is the kind of conclusion that you want the reader to draw?
In this article, we’ll provide an overview of essential chart types that you’ll see most frequently offered by visualization tools. With these charts, you will have a broad toolkit to be able to handle your data visualization needs. Guidance on when to select each one based on use case is covered in a follow-up article.
The Foundational Four
In his book Show Me the Numbers, Stephen Few suggests four major encodings for numeric values, indicating positional value via bars, lines, points, and boxes. So we’ll start off with four basic chart types, one for each of these value-encoding means.
In a bar chart, values are indicated by the length of bars, each of which corresponds with a measured group. Bar charts can be oriented vertically or horizontally; vertical bar charts are sometimes called column charts. Horizontal bar charts are a good option when you have a lot of bars to plot, or the labels on them require additional space to be legible.
Line charts show changes in value across continuous measurements, such as those made over time. Movement of the line up or down helps bring out positive and negative changes, respectively. It can also expose overall trends, to help the reader make predictions or projections for future outcomes. Multiple line charts can also give rise to other related charts like the sparkline or ridgeline plot.
A scatter plot displays values for two numeric variables using points positioned on two axes, one for each variable. Scatter plots are a versatile way to show the relationship between the plotted variables, whether that relationship is strong or weak, positive or negative, linear or non-linear. Scatter plots are also great for identifying outlier points and possible gaps in the data.
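One way to put a number on the strength and direction of the relationship seen in a scatter plot is a correlation coefficient. The sketch below computes Pearson’s r in plain Python. The function name `pearson_r` and the sample values are ours, chosen only for illustration. Keep in mind that r captures linear association only, so a clear non-linear pattern can still score near zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Illustrative (made-up) data: a rising line, a falling line, and a U shape.
rising = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
falling = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
u_shape = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(rising, falling, u_shape)
```

The U-shaped example is the reason to look at the plot as well as the coefficient: the points follow an obvious curve, yet the linear correlation is essentially zero.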
A box plot uses boxes and whiskers to summarize the distribution of values within measured groups. The positions of the box and whisker ends show the regions where the majority of the data lies. We most commonly see box plots when we have multiple groups to compare to one another; other charts with more detail are preferred when we have only one group to plot.
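To make the box and whisker positions concrete, here is a minimal sketch of the summary values a box plot typically draws for one group. It assumes the common Tukey convention, where whiskers reach the most extreme points within 1.5 times the interquartile range of the box; charting tools differ on this, so treat the rule and the function name `box_plot_summary` as assumptions rather than a standard.

```python
import statistics

def box_plot_summary(values):
    """Return the positions a Tukey-style box plot would draw for one group."""
    ordered = sorted(values)
    # quantiles(n=4) gives the three quartile cut points (Q1, median, Q3).
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: fences sit 1.5 * IQR beyond the box edges.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in ordered if v < low_fence or v > high_fence],
    }

# Made-up sample with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

In this sample the value 100 falls outside the upper fence, so it would be drawn as an individual outlier point while the upper whisker stops at 9.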
Tables and single values
Before moving on to other chart types, it’s worth taking a moment to appreciate the option of just showing the raw numbers. In particular, when you only have one number to show, just displaying the value is a sensible approach to depicting the data. When exact values are of interest in an analysis, you can include them in an accompanying table or through annotations on a graphical visualization.
Additional chart types can come about from changing the ways encodings are used, or by including additional encodings. Secondary encodings like area, shape, and color can be useful for adding further variables to more basic chart types.
If the groups depicted in a bar chart are actually continuous numeric ranges, we can push the bars together to generate a histogram. Bar lengths in histograms typically correspond to counts of data points, and their patterns demonstrate the distribution of variables in your data. A different chart type, like a line chart, tends to be used when the vertical value is not a frequency count.
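The counting step behind a histogram can be sketched in a few lines. This example assumes equal-width bins spanning the minimum to the maximum value, which is a common default but not the only choice. The function name `histogram_counts` and the data are illustrative.

```python
def histogram_counts(values, bin_count):
    """Split the value range into equal-width bins and count points per bin."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; no range to divide into bins")
    width = (high - low) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((v - low) / width), bin_count - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(bin_count + 1)]
    return edges, counts

edges, counts = histogram_counts([1, 2, 2, 3, 5, 8, 9, 10], bin_count=3)
print(edges)   # bin boundaries
print(counts)  # bar heights
```

Each entry in `counts` becomes one bar height, and adjacent `edges` give the numeric range that bar covers.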
One modification of the standard bar chart is to divide each bar into multiple smaller bars based on values of a second grouping variable, called a stacked bar chart. This allows you to not only compare primary group values like in a regular bar chart, but also illustrate a relative breakdown of each group’s whole into its constituent parts.
If, on the other hand, the sub-bars were placed side-by-side into clusters instead of kept in their stacks, we would obtain the grouped bar chart. The grouped bar chart does not allow for comparison of primary group totals, but does a much better job of allowing for comparison of the sub-groups.
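The difference between stacking and grouping comes down to where each sub-bar is placed. The sketch below computes the vertical extent of each segment in one stacked bar, and the horizontal offsets of sub-bars within one grouped cluster. The function names and the `cluster_width` default of 0.8 are assumptions for illustration, not values from any particular charting library.

```python
def stacked_segments(sub_values):
    """Return (bottom, top) for each sub-bar; each starts where the last ended."""
    segments = []
    bottom = 0
    for value in sub_values:
        segments.append((bottom, bottom + value))
        bottom += value
    return segments

def grouped_offsets(sub_count, cluster_width=0.8):
    """Return the horizontal center offset of each sub-bar within a cluster."""
    bar_width = cluster_width / sub_count
    left_edge = -cluster_width / 2
    return [left_edge + bar_width * (i + 0.5) for i in range(sub_count)]

# One primary group with three sub-group values (made-up numbers).
print(stacked_segments([3, 5, 2]))  # the last top equals the group total
print(grouped_offsets(3))           # offsets are centered around zero
```

In the stacked layout the top of the final segment is the group total, which is why totals are easy to compare. In the grouped layout every sub-bar starts at the shared baseline, which is why sub-groups are easier to compare.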
A dot plot is like a bar chart in that it indicates values for different categorical groupings, but encodes values based on a point’s position rather than a bar’s length. Dot plots are useful when you need to compare across categories, but the zero baseline is not informative or useful. You can also think of a dot plot as a line plot with the line removed, so that it can be used with unordered categories rather than just continuous or ordered variables.
An area chart starts with the same foundation as a line chart – value points connected by line segments – but adds in a concept from the bar chart with shading between the line and a baseline. This chart is most often seen when combined with the concept of stacking, to show both how a total has changed over time and how its components’ contributions have changed.
Dual-axis charts overlay two different charts with a shared horizontal axis, but potentially different vertical axis scales (one for each component chart). This can be useful to show a direct comparison between the two sets of vertical values, while also including the context of the horizontal-axis variable. It is common to use different base chart types, like the bar and line combination, to reduce confusion of the different axis scales for each component chart.
Another way of showing the relationship between three variables is through modification of a scatter plot. When a third variable is categorical, points can use different shapes or colors to indicate group membership. If the data points are ordered in some way, points can also be connected with line segments to show the sequence of values. When the third variable is numeric in nature, that is where the bubble chart comes in. A bubble chart builds on the base scatter plot by having the third variable’s value determine the size of each point.
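When sizing bubbles, a widely recommended practice is to make each point’s area, not its radius, proportional to the third variable, since readers tend to judge circles by area. The source text does not prescribe this, so treat it as an assumption. The sketch below shows the square-root scaling that follows from it, with a hypothetical `bubble_radii` helper and made-up values.

```python
import math

def bubble_radii(values, max_radius=20.0):
    """Scale radii so that bubble AREA is proportional to each value.

    Assumes all values are non-negative and at least one is positive.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("need at least one positive value")
    # Area grows with radius squared, so radius grows with sqrt(value).
    return [max_radius * math.sqrt(v / largest) for v in values]

# A value four times larger gets twice the radius, hence four times the area.
print(bubble_radii([25, 100], max_radius=10.0))
```

Scaling the radius directly would instead make the larger bubble look sixteen times bigger for a fourfold difference in value.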
The density curve, or kernel density estimate, is an alternative to the histogram for showing distributions of data. Rather than collecting data points into frequency bins, each data point contributes a small volume of data whose collected whole becomes the density curve. While density curves may imply some data values that do not exist, they can be a good way to smooth out noise in the data to get an understanding of the distribution signal.
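The idea of each point contributing a “small volume” can be sketched as follows. This version assumes a Gaussian kernel and a hand-picked bandwidth; real tools usually choose the bandwidth automatically and may use other kernels. The name `gaussian_kde` here is our own plain-Python helper, not a library call.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Each point contributes a bell curve with total area 1 / len(data).
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return scale * sum(
            math.exp(-0.5 * ((x - point) / bandwidth) ** 2) for point in data
        )

    return density

# Made-up sample: a cluster near 2 and a lone point at 7.
curve = gaussian_kde([1.0, 2.0, 2.0, 3.0, 7.0], bandwidth=0.8)
print(curve(2.0))  # high: several points nearby
print(curve(5.0))  # small but non-zero, although no data point equals 5
```

The non-zero density at 5 is the effect described above: the smooth curve implies values that never appear in the data, and a wider bandwidth makes this more pronounced.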
An alternative to the box plot’s approach to comparing value distributions between groups is the violin plot. In a violin plot, each set of box and whiskers is replaced with a density curve built around a central baseline. This can provide a better comparison of data shapes between groups, though this does lose out on comparisons of precise statistical values. A frequent variation for violin plots is to include box-style markings on top of the violin plot to get the best of both worlds.
The heatmap presents a grid of values based on two variables of interest. The axis variables can be numeric or categorical; the grid is created by dividing each variable into ranges or levels like a histogram or bar chart. Grid cells are colored based on value, often with darker colors corresponding with higher values. A heatmap can be an interesting alternative to a scatter plot when there are a lot of data points to plot and the point density makes it difficult to see the true relationship between variables.
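For two numeric variables, the grid behind a heatmap can be built by binning both axes and counting the points that fall in each cell. The sketch below assumes equal-width bins and uses counts as the cell value; the helper name `heatmap_counts` and the data are illustrative only.

```python
def heatmap_counts(xs, ys, x_bins, y_bins):
    """Count points per cell; grid[row][col] with rows along y, columns along x."""

    def bin_index(value, low, high, bins):
        if high == low:
            return 0  # degenerate axis: everything shares one bin
        # Clamp so the maximum value falls in the last bin.
        return min(int((value - low) / (high - low) * bins), bins - 1)

    x_low, x_high = min(xs), max(xs)
    y_low, y_high = min(ys), max(ys)
    grid = [[0] * x_bins for _ in range(y_bins)]
    for x, y in zip(xs, ys):
        col = bin_index(x, x_low, x_high, x_bins)
        row = bin_index(y, y_low, y_high, y_bins)
        grid[row][col] += 1
    return grid

# Five made-up points along a diagonal, binned into a 2 x 2 grid.
for row in heatmap_counts([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 2, 2):
    print(row)
```

Each count would then be mapped to a color, so heavily overlapping points show up as one dark cell rather than an unreadable cluster of markers.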
There are plenty of additional charts out there that encode data in other ways for particular use cases. Xenographics includes a collection of some fanciful charts that have been driven by very particular purposes. Still, some of these charts have use cases that are common enough that they can be considered essential to know.
You might be surprised to see pie charts being sequestered here in the ‘specialist’ section, considering how commonly they are utilized. However, pie charts use an uncommon encoding, depicting values as areas sliced from a circular form. Since a pie chart typically lacks value markings around its perimeter, it is usually difficult to get a good idea of exact slice sizes. However, the pie chart and its cousin the donut plot excel at telling the reader that the part-to-whole comparison should be the main takeaway from the visualization.
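The part-to-whole encoding of a pie chart amounts to converting each value into a share of the total and then into an angular sweep. A minimal sketch, assuming slices are laid out in order starting from zero degrees (the helper name `pie_slices` and the values are made up):

```python
def pie_slices(values):
    """Return (share, start_angle, end_angle) in degrees for each slice."""
    total = sum(values)
    if total <= 0:
        raise ValueError("values must sum to a positive total")
    slices = []
    start = 0.0
    for value in values:
        share = value / total
        sweep = share * 360.0
        slices.append((share, start, start + sweep))
        start += sweep
    return slices

for share, start, end in pie_slices([50, 25, 25]):
    print(f"{share:.0%} of the whole, from {start:.0f} to {end:.0f} degrees")
```

Printing the share next to each slice, as done here, is one way to work around the difficulty of reading exact sizes from the angles alone.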
A funnel chart is often seen in business contexts where visitors or users need to be tracked in a pipeline flow. The chart shows how many users make it to each stage of the tracked process from the width of the funnel at each stage division. The tapering of the funnel helps to sell the analogy, but can muddle what the true conversion rates are. A bar chart can often fulfill the same purpose as a funnel chart, but with a cleaner representation of data.
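Because the funnel shape can obscure the actual conversion rates, it can help to compute them explicitly alongside the chart. The sketch below derives the step-to-step and overall rates from stage counts. The stage names and numbers are hypothetical, as is the `funnel_rates` helper.

```python
def funnel_rates(stage_counts):
    """For each (name, count) stage, add conversion from the previous stage
    and from the first stage."""
    first = stage_counts[0][1]
    previous = first
    rows = []
    for name, count in stage_counts:
        rows.append((name, count, count / previous, count / first))
        previous = count
    return rows

# Hypothetical pipeline: each stage keeps only some of the prior stage's users.
stages = [("Visited", 1000), ("Signed up", 250), ("Purchased", 50)]
for name, count, step_rate, overall_rate in funnel_rates(stages):
    print(f"{name:<10} {count:>5}  step {step_rate:.0%}  overall {overall_rate:.0%}")
```

These are the same numbers a plain bar chart of stage counts would show directly through bar length.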
The bullet chart enhances a single bar with additional markings for how to contextualize that bar’s value. This usually means a perpendicular line showing a target value, along with background shading to provide additional performance benchmarks. Bullet charts are usually used for multiple metrics, and are more compact to render than other types of more fanciful gauges.
There are a number of families of specialist plots grouped by usage, but we’ll close this article out by touching upon one of them: map-based or geospatial plots. When values in a dataset correspond to geographic locations, it can be valuable to plot them on some kind of map. A common example of this type of map is the choropleth. This takes a heatmap approach to depicting value through the use of color, but instead of values being plotted in a grid, they are filled into regions on a map.