The bar chart is often regarded as the king of all charts. It’s easy to read and most people are familiar with it. But research conducted by William Cleveland and Robert McGill in the early 1980s calls this into question. It showed that people decoded a scatter plot (the basis of a bubble chart) more accurately and read it more quickly than any other chart.
Bubble charts and scatter plots are regarded as the best choice for discovering correlations in data. Ironically, it was the scatter plot itself that probably helped create the notion of correlation and the basic principles of modern statistics. But let’s start from the very beginning of the scatter plot history.
It all begins in the 17th century, when René Descartes created the Cartesian coordinate system: a 2D plane with X- and Y-axes. It seems that this is all you need for a proper scatter plot, but at that time it was used solely for mathematics. It was only around the turn of the 19th century that William Playfair started using bar, line, and pie charts to visualize real-world data. Only a small step seems to separate the line chart from the scatter plot. But Playfair was more interested in exploring time series than the relationships between variables, so he had little motivation to alter the charts he was using.
So when was the first scatter plot actually created? There’s no consensus about that. What we know today as a scatter plot developed gradually over the years, which is why it’s practically impossible to give full credit to just one person. There are, in fact, ongoing discussions in this regard. Michael Friendly and Daniel Denis suggested that the first scatter plot was created by John F. W. Herschel in 1833. He used a scatter plot in a scientific article to show the relationship between the position angle of double stars and the year of measurement.
The scatter plot took off in 1870, when Francis Galton, the originator of the correlation concept, started using scatter plots in his studies of genetics. From that point on scatter plots spread, and today the scatter plot and its variations are considered the most popular chart type in scientific papers.
But what about bubble charts, you might ask? It’s really hard to identify the exact moment when the bubble chart evolved from the scatter plot. The only thing we know for sure is when bubble charts started to stand out as a separate chart type. That was when Hans Rosling, in his TED talk in 2007, presented the bubble chart animation that is now widely known as the Hans Rosling or Gapminder chart.
Gapminder Foundation, CC BY 3.0, via Wikimedia Commons
This is when the bubble chart became a data visualization star on its own. And that’s why we realized it deserved its own ‘deep dive into’, separately from the scatter plot.
🫧 Discover our bubble chart resource page with more helpful information and design tips.
Every bubble chart starts with a classical Cartesian plane with X- and Y-axes. As a first step, you need X and Y coordinates to place every dot on the plane. To turn the dots into bubbles, you need a third variable that will represent the size of each bubble. All the variables used should be organized as a flat table, meaning every column represents a variable and every row represents an observation (or vice versa).
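To make the flat-table idea concrete, here is a minimal Python sketch that reads such a table and binds three of its columns to X, Y, and bubble size. The column names and the numbers are invented for illustration only, and `read_bubbles` is a hypothetical helper, not part of any charting tool.

```python
import csv
import io

# Invented flat table: one row per observation, one column per variable.
FLAT_TABLE = """country,gdp_per_capita,life_expectancy,population
A,1200,58.5,4000000
B,15000,74.1,52000000
C,42000,81.3,9000000
"""

def read_bubbles(text, x_col, y_col, size_col, label_col):
    """Turn a flat CSV table into a list of bubble records."""
    bubbles = []
    for row in csv.DictReader(io.StringIO(text)):
        bubbles.append({
            "label": row[label_col],
            "x": float(row[x_col]),        # position on the X-axis
            "y": float(row[y_col]),        # position on the Y-axis
            "size": float(row[size_col]),  # variable bound to bubble size
        })
    return bubbles

bubbles = read_bubbles(
    FLAT_TABLE, "gdp_per_capita", "life_expectancy", "population", "country"
)
for bubble in bubbles:
    print(bubble)
```

Because each binding is just a column name, swapping the variable behind an axis or behind the bubble size means changing one argument.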
The bubble chart has a lot of variations. The first one worth mentioning is the categorical bubble chart. It differs from the classical bubble chart by the type of variables that are bound to X- and Y-axes. You guessed it: both of them are categorical. This makes the categorical bubble chart look more like a table with bubbles instead of numbers. It’s comparable to a heatmap which also looks like a table, although a heatmap will use colors to represent values.
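One way to picture the "table with bubbles" layout is to give every category an integer position on its axis, so each observation lands in a grid cell. The sketch below assumes first-seen ordering of categories and uses made-up data; `category_positions` is a hypothetical helper.

```python
def category_positions(values):
    """Map each distinct category to an integer axis position, in first-seen order."""
    positions = {}
    for value in values:
        if value not in positions:
            positions[value] = len(positions)
    return positions

# Invented observations: (X category, Y category, value bound to bubble size)
rows = [
    ("Mon", "North", 12),
    ("Tue", "North", 30),
    ("Mon", "South", 7),
    ("Tue", "South", 21),
]

x_pos = category_positions(r[0] for r in rows)
y_pos = category_positions(r[1] for r in rows)

# Each bubble now sits at a (column, row) cell of the grid.
grid = [(x_pos[x], y_pos[y], size) for x, y, size in rows]
print(grid)
```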
A counts plot can be seen as a variation of both a bubble chart and a dot plot. It is similar to a dot plot because both have one axis for a categorical variable and one for a numerical variable. But unlike a simple dot plot, a counts plot (and its close variation, the jitter plot) can use an additional variable to represent the bubble size.
Tilemaps can be considered an alternative to bubble maps or choropleth maps. They’re often used for comparing data between, for example, different regions of a country. They can work pretty well when the dots, or bubbles, form a recognizable whole, just like the United States of America below. To make a tilemap, you first need a stylized map in which every region is represented by X and Y values, plus a variable that will represent the bubble size of every region. Tilemaps can also use all the coloring possibilities that we will explore below.
The first step in making a good bubble chart is to choose the right variables for the X- and Y-axis. They communicate the main idea and features that should be represented in the chart. The size of a bubble plays a secondary role, but we must pay attention to the range of the variable that will be represented by the bubble – it should be wide. If the range is narrow the difference between the bubbles will be barely visible which negates the visual usefulness of bubbles.
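The effect of a narrow range can be checked with a few lines of Python. This sketch assumes that bubble area (not radius) is proportional to the value, which is a common convention but not the only possible one; the values and the `max_radius` of 30 are arbitrary.

```python
import math

def radius_for(value, max_value, max_radius=30.0):
    """Radius chosen so that bubble AREA is proportional to value (assumed encoding)."""
    return max_radius * math.sqrt(value / max_value)

narrow = [90, 95, 100]  # invented values with a narrow range
wide = [1, 25, 100]     # invented values with a wide range

for name, values in (("narrow", narrow), ("wide", wide)):
    radii = [round(radius_for(v, max(values)), 1) for v in values]
    print(name, radii)
```

With the narrow range, all three radii end up within a couple of units of each other, so the bubbles look nearly identical. With the wide range, the differences are obvious at a glance.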
Another thing to consider when designing a bubble chart is the styling of the X- and Y-axes, including ticks, gridlines, and labels. This is important because bubble charts are used for datasets that are larger than average, so it makes little sense to place a data label near each bubble. We need to find the right balance between the number of ticks and gridlines and the data-ink ratio. In some cases, when there are a few outliers in the data, it might be beneficial to try a logarithmic scale so that the bubbles are spread more evenly.
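To see why a logarithmic scale helps with outliers, compare where the same values fall on a linear axis and on a log axis. The sketch below uses invented values and two hypothetical helpers that return a position between 0 and 1 along the axis. A log scale only works for strictly positive values.

```python
import math

def linear_position(value, lo, hi):
    """Relative position (0..1) of value on a linear axis from lo to hi."""
    return (value - lo) / (hi - lo)

def log_position(value, lo, hi):
    """Relative position (0..1) of value on a base-10 log axis; needs lo > 0."""
    return (math.log10(value) - math.log10(lo)) / (math.log10(hi) - math.log10(lo))

values = [1, 10, 100, 1000, 100000]  # invented data with one large outlier
lo, hi = min(values), max(values)

for v in values:
    print(v, round(linear_position(v, lo, hi), 4), round(log_position(v, lo, hi), 4))
```

On the linear axis the first four values are squeezed into about the first 1% of the axis, while on the log axis they are spaced evenly.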
Overlapping bubbles are a very characteristic feature of bubble charts. Sometimes a single bubble chart can have quite a few such overlaps. To keep every bubble clear and visible, we need to lower the opacity of the bubbles. In most cases, an opacity value between 0.5 and 0.8 (on a scale from 0 to 1) should work pretty well.
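A quick way to reason about the opacity value is to compute how opaque a spot becomes when several bubbles overlap there. The sketch below assumes the renderer uses standard "source-over" alpha blending, where each layer covers a fraction `alpha` of whatever is still visible underneath; `stacked_opacity` is a hypothetical helper.

```python
def stacked_opacity(alpha, layers):
    """Combined opacity of `layers` overlapping bubbles that share the same alpha.

    Assumes standard source-over blending: each layer lets (1 - alpha)
    of what lies beneath it show through.
    """
    return 1 - (1 - alpha) ** layers

for alpha in (0.5, 0.8):
    print(alpha, [round(stacked_opacity(alpha, n), 3) for n in (1, 2, 3)])
```

At 0.5, two overlapping bubbles still leave a quarter of the background showing, so overlaps stay readable. At 0.8, a double overlap is already almost fully opaque, which is why going higher makes bubbles underneath hard to see.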
To make a good bubble chart that is easy to read, it’s important to apply a compelling and obvious coloring scheme. For that reason, the most reasonable choice is categorical coloring. Unlike numerical coloring, which can be overwhelming for the viewer, categorical coloring adds some necessary structure. It also allows us to see the bubble groups clearly. In the example below, color makes it easy to discover that there is a “red bubbles” cluster.
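Categorical coloring boils down to a lookup from category to color. Here is a minimal sketch, assuming a small hand-picked palette of hex colors and invented category names; `assign_colors` is a hypothetical helper that reuses colors if there are more categories than palette entries.

```python
# Hypothetical palette; any set of clearly distinct colors would do.
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3"]

def assign_colors(categories, palette):
    """Give each distinct category one palette color, in first-seen order."""
    mapping = {}
    for category in categories:
        if category not in mapping:
            # Wrap around if there are more categories than colors.
            mapping[category] = palette[len(mapping) % len(palette)]
    return mapping

# Invented category column of a flat table, one entry per bubble.
continents = ["Europe", "Asia", "Europe", "Africa", "Asia"]

colors = assign_colors(continents, PALETTE)
bubble_colors = [colors[c] for c in continents]
print(colors)
print(bubble_colors)
```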
As mentioned above, numerical coloring is not particularly advised for bubble charts, because it can overwhelm the viewer. However, if you don’t mind challenging the audience’s chart-reading skills, it is possible to color bubbles numerically. There are two ways to do it. The first is to bind a fourth variable to color, adding another dimension to the chart. This could be another column of numerical values or a column of datetime values. With datetime values we get a “timeline” in color, where one end of the palette corresponds to the beginning of a period and the other end to its close.
The second way is to bind one of the already bound variables to color in order to highlight its changes. It can be either the X- or Y-axis values or the bubble size values.
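The datetime "timeline" coloring can be sketched as a linear interpolation between the two ends of a sequential palette. The palette colors and dates below are arbitrary, the helper names are hypothetical, and the sketch assumes the period's last date is later than its first.

```python
from datetime import date

def lerp_color(start_rgb, end_rgb, t):
    """Linearly interpolate between two RGB colors; t runs from 0 to 1."""
    return tuple(round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb))

def date_to_color(d, first, last, start_rgb, end_rgb):
    """Map a date inside [first, last] onto a two-point sequential palette."""
    t = (d - first).days / (last - first).days  # assumes last > first
    return lerp_color(start_rgb, end_rgb, t)

# Arbitrary sequential palette: light blue at the start, dark blue at the end.
START, END = (222, 235, 247), (8, 48, 107)
first, last = date(2020, 1, 1), date(2020, 12, 31)

for d in (first, date(2020, 7, 1), last):
    print(d, date_to_color(d, first, last, START, END))
```

The same `lerp_color` mapping works for the second approach too: pass in the X, Y, or size value instead of a date, normalized to the 0 to 1 range.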
Using numerical diverging coloring can be useful if we want to visually separate bubbles based on one of the values. In some of the numerical palettes, a pale color may occur. In this case, it might be a good idea to add a stroke to the bubble so it won’t be lost on a white background. The binding possibilities of diverging coloring palettes are similar to sequential coloring palettes. The main difference is that diverging palettes have 3 key points: start, center, and end while sequential palettes have only two: start and end.
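The three key points of a diverging palette can be sketched as two linear interpolations joined at the center. The colors and the value range below are arbitrary, and `diverging_color` is a hypothetical helper that assumes `lo < center < hi`.

```python
def diverging_color(value, lo, center, hi, start_rgb, center_rgb, end_rgb):
    """Map a value onto a three-point palette: start -> center -> end.

    Assumes lo < center < hi and lo <= value <= hi.
    """
    if value <= center:
        t = (value - lo) / (center - lo)
        a, b = start_rgb, center_rgb
    else:
        t = (value - center) / (hi - center)
        a, b = center_rgb, end_rgb
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

# Arbitrary palette: blue for low values, white at the center, red for high values.
BLUE, WHITE, RED = (33, 102, 172), (255, 255, 255), (178, 24, 43)

for v in (-10, -5, 0, 5, 10):
    print(v, diverging_color(v, -10, 0, 10, BLUE, WHITE, RED))
```

Values near the center come out almost white here, which is exactly the case where adding a stroke keeps the bubble visible on a white background.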
Would you like to learn more about bubble plot variations and alternatives? Or are you looking for more pro tips? Then you should check out our dedicated bubble chart resource page. It's full of really interesting information that will help you design a really good bubble chart.
We have also prepared a chart properties overview for Datylon users. You can find the full user documentation in the Datylon Help Center.