How To Choose The Right Data Visualization
by Mike Yi
Introduction
Data visualizations are a vital component of data analysis because they can
efficiently summarize large amounts of data in a graphical format. There
are many chart types available, each with its own strengths and use cases.
One of the trickiest parts of the analysis process is choosing the right
way to represent your data using one of these visualizations.
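When deciding on a chart type, first think about the role the chart will
serve. Common roles for data visualization include showing change over
time, showing part-to-whole composition, depicting flows and processes,
looking at how data is distributed, comparing values between groups,
showing relationships between variables, and looking at geographical data.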
Next, consider the types of data you want to plot. The type of chart you
use will depend on whether the data is categorical, numeric, or some
combination of both. Certain visualizations can also be used for multiple purposes
depending on these factors. This book is organized with this approach
in mind, with one chapter for each visualization role, each with multiple
chart types to cover common types of data and subtasks.
Note that this document should only serve as a general guideline: it is
possible that breaking out of the standard modes will help you gain additional
insights. Experiment with not just different chart types, but also how the
variables are encoded in each chart. It’s also good to keep in mind that you
aren’t limited to showing everything in just one plot. It is often better to
keep each individual plot as simple and clear as possible, and instead use
multiple plots to make comparisons, show trends, and demonstrate
relationships between multiple variables.
How this book is organized
This book is divided into chapters, one for each of the main categories for
using a data visualization. Each chapter is headed by a short introduction,
followed by a list of chart types falling in that category. Each chart type is
accompanied by a short description and one or more icons. Below is a key
for decoding these symbols:
BASIC: Chart types with this icon represent typical or standard chart
types. When you need to create a data visualization, try to see if one
of these chart types works first, before deciding on an uncommon or
advanced type.
UNCOMMON: Chart types with this icon are slightly more unusual
than the most common chart types. Their use cases are more
specialized than those of other chart types in the same category, or
the charts are more frequently seen in other roles.
ADVANCED: Chart types with this icon are even more specialized in
their roles. Make sure that the chart type is the best one for your use
case before implementing it. Sometimes, these chart types will not be
built into visualization software or libraries, and additional work will
need to be done to put these types of chart together.
It is important to keep in mind that you don’t always need to use a chart to
depict your data. Sometimes, just showing the data as text is the most
effective way of conveying information.
Bullet chart
Chart type comparing a single value to another number,
often a benchmark rather than another data point. The
single value is shown with a bar’s length, while comparison
points are shown as shaded regions or a perpendicular line.
Table
Compares data points (rows) across multiple different
attributes (columns). Usually sorted by an important or
prominent attribute to improve utility.
Charts for showing change over time
One of the most common applications for visualizing data is to see the
change in numeric value for a feature or metric across time. These charts
usually have time on the horizontal axis, moving from left to right, with
the variable of interest’s values on the vertical axis.
Line chart
Most common chart type for showing change over time. A
point is plotted for each time period from left to right; each
point’s vertical position indicates the feature’s value. Points
are connected by line segments to emphasize progression
across time.
Sparkline
A miniature line chart with little to no labeling, designed to
be placed alongside text or in tables. Provides a high-level
overview without attracting too much attention. Can also
be seen in a sparkbar form, or miniature bar chart (see
below).
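As a rough illustration of the sparkline idea, the sketch below renders a series as a single line of text using Unicode block characters. The helper name sparkline and the sample data are made up for this example; a charting library would normally draw the real thing.

```python
# Hypothetical helper: render a list of numbers as a one-line text sparkline.
BLOCKS = "▁▂▃▄▅▆▇█"  # eight block characters, shortest to tallest


def sparkline(values):
    """Map each value to a block character by its relative height."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series is drawn as a flat line at mid height.
        return BLOCKS[3] * len(values)
    chars = []
    for v in values:
        # Scale to an index 0..7; the maximum maps to the tallest block.
        idx = int((v - lo) / span * (len(BLOCKS) - 1))
        chars.append(BLOCKS[idx])
    return "".join(chars)


monthly_signups = [12, 15, 11, 18, 25, 31, 28]  # made-up sample data
line = sparkline(monthly_signups)
print(line)
```

There are no axes or labels here: the output is meant to sit next to a number in a sentence or table cell, which matches the low-attention role described above.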
Bar chart
Each time period is associated with a bar; each bar’s value
is represented in its height above (or below) a zero-baseline.
Works best when there aren’t too many time periods to
show.
Box plot
Each time period is associated with a box and whiskers;
each set of box and whiskers shows the range of the most
common data values. Best when there are multiple recordings
for each time period and a distribution of values needs
to be plotted.
Tracking change over time is of key interest in the financial domain. One
specialist chart developed for this field is the following:
Candlestick chart
Looks like a box plot, but each box and whiskers encodes
different statistics. The box ends indicate opening and
closing prices, while color indicates the direction of change.
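To make the candlestick encoding concrete, here is a minimal sketch that turns one period's prices into the pieces of a candle. The box ends and the direction-based color follow the description above; placing the whiskers at the period's high and low and using green and red are common conventions assumed here, and the names Candle and make_candle are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    box_bottom: float
    box_top: float
    whisker_low: float
    whisker_high: float
    color: str


def make_candle(open_, high, low, close):
    """Build the drawing parameters for a single candlestick."""
    # Box ends are the opening and closing prices.
    # Color shows the direction of change (green/red is an assumed convention).
    color = "green" if close >= open_ else "red"
    # Whiskers at the high and low prices are an assumed convention.
    return Candle(min(open_, close), max(open_, close), low, high, color)


# Made-up prices for one trading day.
day = make_candle(open_=101.0, high=106.5, low=99.2, close=104.3)
print(day)
```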
Charts for showing part-to-whole composition
Sometimes, we need to know not just a total, but the components that
comprise that total. While other charts like a standard bar chart can be
used to compare the values of the components, the following charts put
the part-to-whole decomposition at the forefront.
Pie chart
The whole is represented by a filled circle. Parts are proportional
slices from that circle, one for each categorical group.
Best with five or fewer slices with distinct proportions.
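The proportional slicing can be sketched as a small calculation: each group's share of the total becomes the same share of 360 degrees. The function name slice_angles and the traffic figures are invented for illustration.

```python
def slice_angles(counts):
    """Convert each group's value into its slice angle in degrees."""
    total = sum(counts.values())
    return {name: 360.0 * value / total for name, value in counts.items()}


# Made-up sample data: visits by traffic source.
traffic = {"search": 50, "direct": 30, "social": 20}
angles = slice_angles(traffic)
for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```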
Doughnut chart
A pie chart with a hole in the center. This central area can
be used to show a relevant single numeric value.
Treemap
Can be thought of as a more generalized Marimekko plot.
Sub-boxes do not need to have a consistent cut direction
at a particular hierarchy level, and there can be more than
two levels of hierarchy.
Charts for depicting flows and processes
Funnel chart
Seen in business contexts, showing how people encounter
a product and eventually become users or customers.
One bar is plotted for each stage, with its length reflecting the
number of users at that stage. Connecting regions emphasize the
links between stages and give the chart its namesake funnel shape.
Sankey diagram
The width of the colored region shows the relative volume
at each part of a process. Allows for multiple sources of
inputs and outputs to be visualized.
Gantt chart
Used for project scheduling, breaking a project down into
individual tasks. Each task is associated with a bar, providing a
timeline for when each task should begin and end.
Charts for looking at how data is distributed
One important use for visualizations is to show how data points’ values
are distributed. This is particularly useful during the exploration process,
when trying to build an understanding of the properties of data features.
Note: Charts for visualizing data distributions across two or more variables
are covered in the Relationships chapter.
Bar chart
Used when a variable is qualitative or takes discrete values.
The height of each bar indicates the number of data points in
each categorical group.
Histogram
Similar to a bar chart, but used when a variable takes
continuous numeric values. The variable’s numeric range
is divided into bins for aggregating counts. Bars are plotted
flush against each other to emphasize the variable’s
continuous nature.
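The binning step behind a histogram can be sketched in a few lines. This version assumes equal-width bins spanning the data's minimum to maximum, with the maximum value counted in the last bin; the name histogram_counts and the sample values are hypothetical.

```python
def histogram_counts(values, bin_count):
    """Split the numeric range into equal-width bins and count values per bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no range to divide")
    width = (hi - lo) / bin_count
    counts = [0] * bin_count
    for v in values:
        # The maximum value would land one past the end, so clamp it
        # into the last bin.
        idx = min(int((v - lo) / width), bin_count - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bin_count + 1)]
    return edges, counts


# Made-up sample data.
measurements = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
edges, counts = histogram_counts(measurements, bin_count=4)
print(edges)
print(counts)
```

The choice of bin count changes the picture considerably, so it is worth trying a few values before settling on one.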
Density curve
An alternative to the histogram when a variable takes
numeric values. Each data point contributes a small amount
of local area; the areas are summed across all points to form
the full curve.
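The "small amount of local area per point" idea corresponds to kernel density estimation. The sketch below assumes a Gaussian bump for each point and a hand-picked bandwidth; both are illustrative choices rather than anything the guide prescribes, and the names are hypothetical.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function built by summing one Gaussian bump per point."""
    n = len(data)
    # Each bump has area 1/n, so the full curve has a total area of 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data
        )

    return density


data = [2.1, 2.4, 3.0, 3.2, 5.5]  # made-up sample data
density = gaussian_kde(data, bandwidth=0.5)

# Check the total area numerically with a simple Riemann sum over [-5, 10].
step = 0.01
grid = [-5 + i * step for i in range(1501)]
area = sum(density(x) for x in grid) * step
print(round(area, 3))
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one smooths them together.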
Box plot
A box and whiskers shows the range of the most common
data values. The ends of the box outline the central 50% of
the data. More often used to compare distributions
between groups rather than as an overall summary.
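The statistics behind a box plot can be sketched with Python's statistics module. The box ends are the first and third quartiles, which bound the central 50% of the data as described above. The whisker rule used here (the most extreme points within 1.5 times the interquartile range of the box) is a widely used convention assumed for this example, and box_plot_summary is a hypothetical name.

```python
import statistics


def box_plot_summary(values):
    """Compute the box, whisker, and outlier values for a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: whiskers stop at the most extreme data points
    # lying within 1.5 * IQR of the box; anything beyond is an outlier.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value
summary = box_plot_summary(sample)
print(summary)
```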
Letter-value plot
Extends the box plot’s marking of quartiles with additional
boxes that denote eighths, sixteenths, and smaller
quantiles. Best when there are lots of data available to make
estimates stable.
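The nested boxes of a letter-value plot can be approximated by computing quantile pairs at successively smaller tail fractions. This is a simplified sketch that uses ordinary interpolated quantiles rather than the exact letter-value definition, and the tiny sample is only for readability; as noted above, real use calls for much more data. The name letter_value_boxes is hypothetical.

```python
import statistics


def letter_value_boxes(values, depth=4):
    """Return (tail fraction, lower end, upper end) for each nested box.

    depth=2 gives quartiles only, depth=3 adds eighths, depth=4 adds
    sixteenths, and so on.
    """
    boxes = []
    for k in range(2, depth + 1):
        parts = 2 ** k
        # Cut points at multiples of 1/parts; the first and last cut
        # points bound this level's box.
        cuts = statistics.quantiles(values, n=parts, method="inclusive")
        boxes.append((1 / parts, cuts[0], cuts[-1]))
    return boxes


sample = list(range(17))  # made-up data: 0, 1, ..., 16
boxes = letter_value_boxes(sample)
for tail, low, high in boxes:
    print(f"tail {tail}: box from {low} to {high}")
```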
Violin plot
Combines a density curve plotted on a center line with
a box plot as a statistical summary. More often used to
compare distributions between groups rather than as an
overall summary.
The violin plot usually includes a box plot to provide statistical detail to
the density curve. The internal box plot may sometimes be excluded, or
another type of linear distribution chart can also be used instead. All of the
below are best with few or a moderate number of data points; with many
data points, a summary like the box plot is best.
Rug plot
All data points are plotted as tick marks on a straight line
with value corresponding precisely with position.
Strip plot
Like a rug plot, but with dots instead of tick marks.
Sometimes plotted with points randomly jittered up or down to
reduce overlapping.
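Random jitter is simple to sketch: each point keeps its exact value along the main axis and receives a small random offset on the other axis. The function name, jitter width, and seed below are illustrative assumptions.

```python
import random


def jittered_positions(values, jitter=0.2, seed=0):
    """Pair each value with a small random offset perpendicular to the axis."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [(v, rng.uniform(-jitter, jitter)) for v in values]


# Made-up data with repeated values that would otherwise overlap.
points = jittered_positions([1.0, 1.0, 2.5], jitter=0.2, seed=0)
for value, offset in points:
    print(value, round(offset, 3))
```

The offset carries no meaning; it only spreads out points that share the same or nearby values.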
Swarm plot
Like a strip plot, but deliberate shifting is performed to
prevent overlapping. Some horizontal jitter may be needed
in order to keep the dot swarm compact.
Charts for comparing values between groups
Bar chart
Most basic way of comparing numeric values between
groups or categories. Each group is assigned a bar; each
bar’s value is represented in its height above (or below) a
zero-baseline.
Lollipop chart
Replaces the bars of a bar chart with lines and dots. Useful
for when there are a lot of groups or categories to plot.
Dot plot
Replaces the bars of a bar chart with just dots. Since value
is indicated by position instead of length, the dot plot can
be good when a zero baseline is not useful.
Line chart
Each line in a line chart shows how values (vertical
position) change across time (horizontal). One line is plotted
for each group to be compared. Best when there are five or
fewer groups to plot.
Sparkline
Smaller line charts typically with little to no labeling.
Designed to show a high-level overview inline with text or
tables, but also useful when there are many groups to plot.
Ridgeline
A series of line charts or density curves (see Distributions)
with partially offset axes used to compare distributions
between groups. Best when there are distinct patterns
across groups.
Box plot
Compares a statistical summary of numeric values
between groups. A set of box and whiskers depicting the
range of the most common data values (see Distributions) is
assigned to each group or category.
Letter-value plot
Used in a similar way as the box plot, but a letter-value
plot (see Distributions) is assigned to each group instead.
Best used when there are lots of data in each group so that
statistical estimates are stable.
Violin plot
Compares distributions between groups. A violin assembly
of density curve and box plot (see Distributions) is assigned
to each group or category.
One sub-category of comparison charts comes from the comparison of
values between groups for multiple attributes.
Slope chart
Specialized type of line chart. Two parallel lines indicate
different times, with vertical position indicating value. One
line segment is drawn between the two times for each data
point. Useful for when there are many data points; line
slopes provide a quick indicator for direction of change for
each one.
Dumbbell plot
Used to compare two data points across multiple variables.
Similar to parallel coordinates, each data point has a value
plotted on each line. In contrast, line segments connect
points within each variable, emphasizing the difference
in value. Can be used as an alternative to the slope chart
to show change between two time periods for multiple
groups.
In certain cases, you might be interested in just the ranking between
groups without needing to see the actual values.
Bump chart
Modified version of a line chart where vertical position
corresponds to rank rather than value. This change allows
it to support more categories than a standard line chart.
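The only data preparation a bump chart needs beyond a line chart is converting values to ranks within each time period. A minimal sketch, assuming every group has a value for every period and that rank 1 means the largest value; the names and sales figures are invented, and ties fall back to the original group order.

```python
def ranks_per_period(series):
    """Convert each group's values into ranks (1 = highest) per time period."""
    groups = list(series)
    n_periods = len(next(iter(series.values())))
    ranks = {g: [] for g in groups}
    for t in range(n_periods):
        # Sort groups from highest to lowest value in this period.
        ordered = sorted(groups, key=lambda g: series[g][t], reverse=True)
        for position, g in enumerate(ordered, start=1):
            ranks[g].append(position)
    return ranks


# Made-up sample data: sales per group over three periods.
sales = {"A": [10, 30, 20], "B": [20, 10, 30], "C": [15, 20, 25]}
ranks = ranks_per_period(sales)
print(ranks)
```

Plotting these ranks instead of the raw values gives evenly spaced lines, which is what lets the chart accommodate more groups.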
Charts for showing relationships between variables
Scatter plot
Standard chart type for showing relationships between
two numeric variables. Each point’s position on the
horizontal and vertical axes indicates its value on the associated
variable.
Bubble chart
Scatter plot with point size dictated by a third numeric
variable. Scatter plots can be extended in other ways: point
shapes can encode a categorical variable, and color can be
used to indicate either categorical or numeric data. It is best
to keep a scatter plot to a maximum of three variables to
maintain understandability.
Heatmap
Extension of bar charts and histograms (see Distributions)
to two variables, each of which can be categorical
or numeric. Each axis represents groups or bins of values
for one of the variables, forming a grid. Cell colors indicate
data frequency or a summary of a third variable for each
intersection of axis variables.
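The grid of cell values behind a frequency heatmap is a two-way count. The sketch below covers the case of two categorical variables; numeric variables would first need binning, as in a histogram. The function name, field names, and records are made up for illustration.

```python
from collections import Counter


def heatmap_counts(rows, x_key, y_key):
    """Count how many records fall into each (x, y) cell of the grid."""
    return Counter((row[x_key], row[y_key]) for row in rows)


# Made-up sample records.
orders = [
    {"day": "Mon", "size": "S"},
    {"day": "Mon", "size": "S"},
    {"day": "Mon", "size": "L"},
    {"day": "Tue", "size": "L"},
]
cells = heatmap_counts(orders, "day", "size")

# Print the grid; cells with no records report a count of zero.
for day in ["Mon", "Tue"]:
    print(day, [cells[(day, size)] for size in ["S", "L"]])
```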
Dendrogram
Specialized chart type to show similarity between data
points. The lower the branch connecting two data points
is, the more similar they are. Sometimes plotted with an
accompanying heatmap to depict the underlying data.
Sometimes the form of a relationship is that of a network of connections. A
mathematical graph consisting of nodes connected by edges is a basic form,
but other chart types exist for showing this type of data.
Network diagram
Points (nodes or vertices) represent individual entities.
Lines (edges) connect entities with a particular relationship.
Line thickness may be used to encode value. Vertex
positions do not necessarily have any inherent meaning,
and may simply be placed just to make connections as clear
as possible.
Transit map
Practical application of network diagrams for train and
subway systems. Frequently, these take a fair level of
abstraction, emphasizing connections between stations
rather than their actual geographical locations.
Chord diagram
Like a standard network diagram, but vertices are
arranged in a circle.
Tree diagram
A network diagram organized to show hierarchical
relationships. The direction of each edge corresponds to a
relationship between the connected nodes, such as
parent-child or senior-junior relationships.
Charts for looking at geographical data
Scatter map
Scatter plot built on top of a geographical map, using
geographic coordinates as point positions.
Bubble map
Bubble chart built on top of a geographic map, where point
size is an indicator of value. Can also be used to group
together points in a scatter map if they are too dense.
2-d histogram
Heatmaps can be built on top of geographic areas.
Sometimes seen with a hexagon-shaped grid rather than a
rectangular grid. May distort the geography on its edges.
Connection map
Network information and flows built on top of a
geographic map.
Choropleth
Similar to a heatmap, but colors are assigned to geopolitical
regions rather than an arbitrary grid. Values are often in
the form of rates or ratios to avoid distortion due to
population density.
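The reason for plotting rates instead of raw counts can be shown with a small calculation. In this sketch two regions have the same raw count but very different populations; the region names, figures, and the per-100,000 scaling are all assumptions made for illustration.

```python
def rate_per_100k(counts, populations):
    """Normalize raw counts by population so regions are comparable."""
    return {
        region: 100_000 * counts[region] / populations[region]
        for region in counts
    }


# Made-up sample data: identical counts, different populations.
cases = {"North": 500, "South": 500}
population = {"North": 1_000_000, "South": 250_000}
rates = rate_per_100k(cases, population)
print(rates)
```

Colored by raw count, the two regions would look identical; colored by rate, the smaller region stands out as having four times the incidence.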
Cartogram
Geopolitical regions sized by value. This necessarily
requires distortion in shapes and topology.
Appendix A: Essential charts for data analysis
This guide covers dozens of chart types, and many more exist for even
more specialized use cases. It can sometimes be daunting to figure out
which chart will work best for the data at hand.
To help with the chart choosing process, the next page contains a full-page
graphic featuring eighteen common chart types for data analysis. Most
visualizations for dashboards and reports will be served well by one of these
chart types. Feel free to print the graphic out and use it as a quick
reference for any time you need to visualize your data.
When using the chart picker, don’t forget to keep in mind three points:
[Chart picker graphic. Text recovered from the graphic: the categories
Relationship, Change over Time, Distribution, and Geospatial; and the chart
types Table (show raw numbers for multiple data points on multiple
variables), Scatter Plot (relationship between two numeric variables),
Line Chart, Bubble Chart, Heatmap (distribution by two binned variables,
categorical or numeric), Bar Chart (comparison or distribution by a single
categorical variable), Histogram, and Bubble Map.]
There are a few chart types excluded from the guide that probably
wouldn’t be considered too rare or specialized. Chart types like the ones
in this section have been excluded since they are less efficient than other,
more common chart types, or have flaws that make them more difficult
to understand. Only use these charts when you have a unique or specific
point that would benefit from an alternative representation.
Pictogram / isotype
Used to compare values between groups. Each icon
represents a specific quantity; values are usually rounded to
the nearest whole number of icons. Thus, this loses some
precision compared to the more common bar chart.