Principles of Data Visualization
Eamonn Maguire
CERN School of Computing, Israel
October 2018
A lot of the content for this introduction
comes from this book from Prof. Tamara
Munzner (UBC, Vancouver, Canada) which I
created the illustrations for.
Tamara Munzner
A Visualization should:
1. Save time
2. Have a clear purpose*
3. Include only the relevant content*
4. Encode data/information appropriately
Visualization
The role of visualization systems is to provide visual representations of datasets
that help people carry out tasks more effectively.
External representation:
replace cognition with
perception
[Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context. Barsky, Munzner, Gardy, and Kincaid. IEEE TVCG (Proc. InfoVis) 14(6): 1253-1260, 2008.]
What are we visualising?
Why are we visualising it? Why do the users need this, and what do they need to be able to do with it?
How can we visualise? The components of a visualization. Good and bad practices.
What are we visualising?
What are you visualising?
The branches of data visualization
Why are we visualising?
The role of visualisation systems is to provide visual representations of
datasets that help people carry out tasks more effectively.
Given a large matrix, or even a large series of numbers, it’s difficult for humans to
‘see’ patterns in the data.
https://www.ipcc.ch/ipccreports/tar/wg1/fig2-32.htm
Every visualisation should be thought of as a product of the actions the user needs to take to reach their objective (target).
Always keep in mind why you’re doing something. If what you create does not show what you intended, confuses, or misleads, it’s time to rethink :)
Discover
Finding new insights in your data
Implies a level of interactivity to query, compare, correlate etc.
How can you encode information optimally?
If we don’t follow grammatical rules or spell correctly, the
meaning of text can be lost.
Figure: Data → Encoder (us) → Decoder (user) → Insights, in service of a Task. We want to maximise the information gained and minimise the error.
JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
2 10 4 5 6 9 1 3 5 3 4 7
Figure: the same twelve values drawn as eight small panels, each with JAN to DEC on the horizontal axis (and a 0 to 10 value axis where one is used). Labelled panels: Scatter, Line, Histogram, Area, Size, Saturation; two further panels are unlabelled.
And that’s just a really simple low dimensional example
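As a rough sketch of the panels above, the twelve monthly values can be drawn several ways with matplotlib. The function name `plot_encodings` and the specific channel choices (marker area for "Size", the `Blues` colormap standing in for "Saturation") are illustrative assumptions, not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
VALUES = [2, 10, 4, 5, 6, 9, 1, 3, 5, 3, 4, 7]


def plot_encodings(months=MONTHS, values=VALUES):
    """Draw the same series with six different visual encodings."""
    x = list(range(len(months)))
    fig, axes = plt.subplots(3, 2, figsize=(10, 8), sharex=True)
    scatter, line, bars, area, size, shade = axes.flat

    scatter.scatter(x, values)            # position only
    scatter.set_title("Scatter")
    line.plot(x, values)                  # position plus connection
    line.set_title("Line")
    bars.bar(x, values)                   # bar length (the slide's "Histogram")
    bars.set_title("Histogram")
    area.fill_between(x, values)          # filled area under the line
    area.set_title("Area")
    size.scatter(x, [0] * len(x), s=[40 * v for v in values])  # marker area
    size.set_title("Size")
    shade.scatter(x, [0] * len(x), c=values, cmap="Blues", s=300,
                  marker="s", vmin=0, vmax=max(values))  # colour intensity
    shade.set_title("Saturation")

    for ax in (size, shade):
        ax.set_yticks([])                 # vertical position carries no data here
    for ax in axes[-1]:
        ax.set_xticks(x)
        ax.set_xticklabels(months)
    fig.tight_layout()
    return fig


fig = plot_encodings()
```

Calling `fig.savefig("encodings.png")` afterwards writes the panels to a file for side-by-side comparison.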
But, why?
Our perception system does not behave linearly.
Some stimuli are perceived less or more than
intended.
Stevens, 1975
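Stevens' power law is usually written as S = I^n, where I is the physical intensity and S the perceived sensation. The sketch below shows what that means for a fourfold increase in a stimulus. The exponents are approximate, commonly quoted values and are an assumption here, not figures taken from the slides.

```python
# Approximate, commonly quoted exponents for Stevens' power law (assumed values).
EXPONENTS = {
    "length": 1.0,
    "area": 0.7,
    "brightness": 0.5,
}


def perceived_ratio(actual_ratio, exponent):
    """Perceived magnitude ratio for a given physical ratio, S = I ** n."""
    return actual_ratio ** exponent


for channel, n in EXPONENTS.items():
    ratio = perceived_ratio(4, n)
    print(f"{channel}: a 4x increase is perceived as about {ratio:.2f}x")
```

With an exponent below 1 the change is under-perceived, which is one reason to prefer length or position over area for quantitative data.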
We have to be careful when mapping data
to the visual world
Some visual channels are more effective for some data types
over others.
Figure: log error (axis from 1.0 to 3.0) for judgements of positions, angles, circular areas, and rectangular areas (aligned or in a treemap).
It’s quite clear that bar charts are a more effective visual encoding
here than pie charts… our visual system is very good at judging
lengths, but not so much at judging angles and areas.
https://commons.wikimedia.org/wiki/File:Piecharts.svg
Robert Kosara and Drew Skau. 2016. Judgment error in pie chart variations. In Proceedings of the Eurographics: Short Papers (EuroVis '16).
Eurographics Association, Goslar, Germany, 91-95. DOI: https://doi.org/10.2312/eurovisshort.20161167
Drew Skau and Robert Kosara. 2016. Arcs, Angles, or Areas: Individual Data Encodings in Pie and Donut Charts. Comput. Graph.
Forum 35, 3 (June 2016), 121-130. DOI: https://doi.org/10.1111/cgf.12888
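The "Log Error" axis in the figure earlier in this section is presumably the measure used in graphical perception studies following Cleveland and McGill: log2(|judged - true| + 1/8), with both values in percentage points. That this is the exact measure plotted is an assumption, and the judgements below are hypothetical numbers for illustration only.

```python
import math


def log_error(judged_percent, true_percent):
    """Cleveland-McGill style error: log2(|judged - true| + 1/8)."""
    return math.log2(abs(judged_percent - true_percent) + 1 / 8)


# Hypothetical judgements of a value whose true share is 40%.
for label, judged in [("judgement A", 42), ("judgement B", 48)]:
    print(label, round(log_error(judged, 40), 2))
```

The 1/8 offset keeps the logarithm defined when a judgement is exactly right.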
Things aren’t so bad :)
T1/T7: Bar charts are better than areas…
Figure: log error (axis from 1.0 to 3.0) for positions, angles, circular areas, and rectangular areas (aligned or in a treemap).
The Shape Parameter of a Two-Variable Graph. William Cleveland, Marylyn McGill, and Robert McGill. Journal of the American Statistical Association, 83, 289–300, 1988
Multi-Scale Banking to 45 Degrees. Jeffrey Heer, Maneesh Agrawala. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 12(5), 701–708, 2006
An Empirical Model of Slope Ratio Comparisons. Justin Talbot, John Gerth, Pat Hanrahan. IEEE Trans. Visualization & Comp. Graphics (Proc. InfoVis), 2012
HOW
Some data has a natural mapping that our brains expect given
certain types of data
Natural Mappings
There are many intricacies of the visual system that must be considered
The pop-out effect
We pre-attentively process a scene, and some visual elements
stand out more than others.
Not all exhibit the pop-out effect!
Parallel line pairs do not pop out from tilted pairs…
And not all visual channels pop out as quickly as others. E.g. colour is always on top.
Relative Comparison
Figure: judging relative values when they are aligned versus unordered, shown for 4, 8, and 20 values.
A) Known and Unknown Target Search (random vs. grouped layouts)
Target shown beforehand (known) or not shown (unknown). The unique colour here is the orange square.
B) Subitizing (how many colours?) (random vs. grouped layouts)
Which grid has more colours? (7, 8)
A. Law of Closure
B. Law of Similarity
C. Law of Proximity
D. Law of Connectedness
E. Law of Symmetry, e.g. [ ]{ }( )
F. Law of Good Continuation
G. Contour Saliency
H. Law of Common Fate
I. Law of Past Experience
J. Law of Pragnanz
K. Figure/Ground
Integral/Separable Dimensions
2D always wins…
These examples, taken at random from Google image searches, show how widely 3D is abused in
information visualisation. All of these charts manipulate our perception of the data by
using the Z axis to occlude information… this would be avoided in 2D.
http://cms-results.web.cern.ch/cms-results/public-results/preliminary-results/BPH-14-008/index.html
Colour
The simplest, yet most abused of all visual encodings.
http://graphics.wsj.com/infectious-diseases-and-vaccines/
Figure: the visible spectrum by wavelength (nm), between UV and IR.
Luminosity is also not stable across the colours, meaning some colours
will pop out more than others… and not always intentionally.
https://mycarta.wordpress.com/2012/10/06/the-rainbow-is-deadlong-live-the-rainbow-part-3/
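To make the uneven brightness concrete, the sketch below computes the relative luminance of fully saturated hues using the standard sRGB formula. Treating relative luminance as a proxy for the "luminosity" discussed here is an assumption, and the six sampled hues are only a stand-in for a rainbow colour map.

```python
import colorsys


def relative_luminance(r, g, b):
    """Relative luminance of an sRGB colour with channels in [0, 1]."""
    def linearise(c):
        # Undo the sRGB transfer curve before weighting the channels.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * linearise(r) + 0.7152 * linearise(g) + 0.0722 * linearise(b)


# Fully saturated hues sampled around the colour wheel.
for name, hue_deg in [("red", 0), ("yellow", 60), ("green", 120),
                      ("cyan", 180), ("blue", 240), ("magenta", 300)]:
    rgb = colorsys.hsv_to_rgb(hue_deg / 360, 1.0, 1.0)
    print(f"{name:8s} luminance = {relative_luminance(*rgb):.2f}")
```

Yellow comes out more than ten times as luminous as blue, so a yellow band in a rainbow map draws the eye whether or not the underlying values are special.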
And how we perceive changes in hue also varies across the spectrum.
Gregory compared the wavelength of light with the smallest observable difference
in hue (expressed as wavelength difference).
Is there a colour palette for scientific visualisation
that works?
HSL linear L rainbow palette
https://mycarta.wordpress.com/2012/10/06/the-rainbow-is-deadlong-live-the-rainbow-part-3/
Kindlmann, G., Reinhard, E., and Creem, S., 2002, Face-based Luminance Matching for Perceptual Colormap
Generation, IEEE Proceedings of the conference on Visualization ’02
These are available in matplotlib and therefore in seaborn, etc, so there’s no excuse :)
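A minimal sketch of how one might check a matplotlib colour map for this property, assuming the palettes meant include matplotlib's perceptually uniform maps such as `viridis`. It uses a crude luminance estimate (Rec. 709 weights applied directly to the RGB values) and compares `viridis` with the rainbow-style `jet`. The helper name `luminance_profile` is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def luminance_profile(cmap_name, samples=9):
    """Approximate luminance at evenly spaced points along a colormap."""
    cmap = plt.get_cmap(cmap_name)
    profile = []
    for i in range(samples):
        r, g, b, _alpha = cmap(i / (samples - 1))
        profile.append(0.2126 * r + 0.7152 * g + 0.0722 * b)
    return profile


for name in ("viridis", "jet"):
    profile = luminance_profile(name)
    brightest = profile.index(max(profile))
    print(f"{name}: brightest sample is #{brightest} of {len(profile) - 1}")
```

A map whose brightest point sits at one end can be read as "more is lighter"; a map that peaks in the middle highlights mid-range values for no data-driven reason.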
There are also lots of default colour maps that can be applied to
particular data types.
Palette types: categorical, binary, diverging, sequential.
http://colorbrewer2.org/
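One way to think about matching a palette type to a data type is sketched below. The heuristic and the function name `suggest_palette` are hypothetical, and binary data is simply treated as a two-class categorical case. The colour map names (`tab10`, `RdBu`, `Blues`) are ones that ship with matplotlib.

```python
def suggest_palette(values):
    """Suggest a palette type and an example matplotlib colormap name.

    Hypothetical heuristic: non-numeric data is categorical, numeric data
    that spans both sides of zero is diverging, anything else is sequential.
    """
    if not values:
        raise ValueError("values must not be empty")
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in values)
    if not numeric:
        return "categorical", "tab10"
    if min(values) < 0 < max(values):
        return "diverging", "RdBu"
    return "sequential", "Blues"


print(suggest_palette(["muon", "electron", "photon"]))
print(suggest_palette([-2.5, -0.1, 0.4, 3.0]))
print(suggest_palette([2, 10, 4, 5, 6, 9]))
```

In practice the choice also depends on meaning: a diverging palette only makes sense when the data has a meaningful midpoint.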
Semantic relevance
Or just consistency
What are semantically resonant colours?
But, if you are going to use colour, try to think how you
can make it easier for users to decode the colour to the
category without constantly having to look up a legend.
That way, decoding takes less time.
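A small sketch of the consistency idea: keep one category-to-colour mapping and reuse it for every chart, optionally seeding it with semantically chosen colours. The class name `ColourRegistry`, the example categories, and the hex values are all illustrative assumptions.

```python
from itertools import cycle

# Arbitrary example colours; semantically resonant choices depend on the domain.
DEFAULT_COLOURS = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a"]


class ColourRegistry:
    """Give each category one fixed colour so that every chart agrees."""

    def __init__(self, colours=DEFAULT_COLOURS, preset=None):
        # Preset entries hold colours picked for their meaning.
        self._assigned = dict(preset or {})
        taken = set(self._assigned.values())
        self._pool = cycle([c for c in colours if c not in taken])

    def colour_for(self, category):
        """Return the category's colour, assigning one on first use."""
        if category not in self._assigned:
            self._assigned[category] = next(self._pool)
        return self._assigned[category]


registry = ColourRegistry(preset={"banana": "#ffd92f"})
first_chart = [registry.colour_for(c) for c in ["apple", "banana"]]
second_chart = [registry.colour_for(c) for c in ["banana", "cherry", "apple"]]
print(first_chart, second_chart)
```

Because both charts draw from the same registry, a reader who has learned that "apple" is green in one chart does not need the legend in the next.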