Analyzing and Visualizing Data
Figure: Data pipeline (data sources, ingestion, storage, processing, analysis & visualization)
Factors to consider when selecting tools
• Business needs: Fully understand the business needs so that you can determine which data analyses and visualizations will help develop insights.
• Data characteristics: Understand the type and quality of the data, and how often it is updated and processed.
• Access to data: Consider the data pipeline and who needs access to analyze and visualize the data.
Business needs
Business needs: Granularity of insight
Finance
• Manager: A finance manager wants information (such as revenue, costs, and profit margins) about their line of business.
• Executive: A CFO wants similar metrics at an aggregate level across all lines of business, with the ability to drill down to any line of business.
Marketing
• Manager: A marketing manager wants to know the number of leads, opportunities, and closed deals within an area (such as a postal code or city).
• Executive: A CMO is interested in related metrics but at a broader level (such as a state or region).
Sales
• Manager: A sales manager is focused on their sales pipeline and wants to know how long it takes to close an opportunity. They want to assess how many opportunities are needed to achieve quota targets.
• Executive: A VP of sales wants similar information at an aggregate level, with the ability to drill down to a sales representative or sales territory.
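The manager and executive views above differ mainly in aggregation level. A minimal sketch, assuming hypothetical transaction records with `line_of_business`, `revenue`, and `cost` fields (the names and figures are illustrative, not from the source), computes metrics per line of business (the drill-down level) and then rolls them up into one aggregate view:

```python
from collections import defaultdict

# Hypothetical records; field names and values are illustrative only.
records = [
    {"line_of_business": "Retail", "revenue": 120.0, "cost": 90.0},
    {"line_of_business": "Retail", "revenue": 80.0, "cost": 50.0},
    {"line_of_business": "Wholesale", "revenue": 200.0, "cost": 170.0},
]

def metrics(revenue, cost):
    """Return revenue, cost, and profit margin (profit divided by revenue)."""
    margin = (revenue - cost) / revenue if revenue else 0.0
    return {"revenue": revenue, "cost": cost, "margin": margin}

def by_line_of_business(rows):
    """Manager-level view: one set of metrics per line of business."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [revenue, cost]
    for row in rows:
        totals[row["line_of_business"]][0] += row["revenue"]
        totals[row["line_of_business"]][1] += row["cost"]
    return {lob: metrics(rev, cost) for lob, (rev, cost) in totals.items()}

def aggregate(rows):
    """Executive-level view: the same metrics across all lines of business."""
    revenue = sum(row["revenue"] for row in rows)
    cost = sum(row["cost"] for row in rows)
    return metrics(revenue, cost)

per_lob = by_line_of_business(records)  # drill-down level
overall = aggregate(records)            # aggregate level
print(per_lob["Retail"])
print(overall)
```

Keeping a single `metrics` function for both levels means the drill-down and aggregate figures are always computed the same way.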
Business needs: Visualizing insights
• Show performance in a particular area or function.
• Establish or prove whether a relationship exists between two or more variables.
• Show or examine how different variables change over time, or provide a static snapshot of how different variables compare.
• Show how your data is distributed over certain intervals (based on clustering or grouping of data).
• Highlight the various elements that make up your data: its composition.
Visualize Relationship
• Bar/column chart
• Scatter plot
• Connected scatter plot
• Bubble chart
• Word cloud chart
Bar chart
• Useful for displaying absolute values, including negative values.
• One axis contains categories, and the other axis represents values.
Use cases:
• Volume of Google searches by region
• Market share in revenue by product
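Handling negative values in a bar chart comes down to drawing bars on both sides of a zero baseline. A plotting library such as matplotlib would normally do this; the dependency-free text rendering below is only an illustrative sketch, and the `profit` figures are hypothetical:

```python
def text_bar_chart(data, width=10):
    """Render a horizontal bar chart as text.

    Positive values extend right of the zero axis ('|'); negative values extend left.
    """
    scale = max(abs(v) for v in data.values()) or 1  # longest bar gets `width` chars
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        length = round(abs(value) / scale * width)
        left = ("#" * length if value < 0 else "").rjust(width)
        right = "#" * length if value >= 0 else ""
        lines.append(f"{label.ljust(label_width)} {left}|{right}")
    return "\n".join(lines)

# Hypothetical quarterly profit (negative = loss).
profit = {"Q1": 40, "Q2": -20, "Q3": 20}
print(text_bar_chart(profit))
```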
Scatter plot
Connected scatter plot
Bubble chart
• Bubble charts show each data point as a circle. Two variables are plotted on the x-axis and y-axis, and the size of the circle represents a third measure.
Use cases:
• Relationship between life expectancy, GDP per capita, and population size
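Readers compare bubbles by area, so a common practice is to scale the radius with the square root of the measure; otherwise a value twice as large looks four times as big. A minimal sketch, assuming hypothetical population figures and a hypothetical `max_radius` parameter (in the life-expectancy example, x would be GDP per capita and y life expectancy):

```python
import math

def bubble_radii(values, max_radius=40.0):
    """Map each value to a radius so that circle area is proportional to the value."""
    largest = max(values)
    # area is proportional to value, so radius is proportional to sqrt(value)
    return [max_radius * math.sqrt(v / largest) for v in values]

# Hypothetical population sizes (millions) for three countries.
populations = [100.0, 25.0, 4.0]
radii = bubble_radii(populations)
print(radii)
```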
Word cloud chart
• Visualizes the most prevalent words in a text. It can be used to show the relationship between different words that appear together, or to capture a trend in the most commonly used words.
Use cases:
• Top 100 words used by customers in customer service tickets
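Word cloud tools typically size each word by how often it occurs, so the core preparation step is a frequency count. A minimal sketch of that step (rendering is left to a charting library); the stop-word list and the ticket texts are hypothetical:

```python
import re
from collections import Counter

# Hypothetical stop words that would otherwise dominate the cloud.
STOP_WORDS = {"the", "a", "an", "and", "is", "to", "my", "it", "i", "was"}

def top_words(texts, n=100):
    """Count word frequencies across documents; counts drive font size in a word cloud."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical customer service tickets.
tickets = [
    "My order arrived late and the box was damaged",
    "Order still not arrived, please refund",
    "Refund request: damaged item",
]
print(top_words(tickets, n=3))  # [('order', 2), ('arrived', 2), ('damaged', 2)]
```

Ties are returned in the order the words were first seen, which is how `Counter.most_common` orders equal counts.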
Visualize Trend
• Line chart
• Multi-line chart
• Area chart
• Stacked area chart
• Spline chart
Line chart
Multi-line chart
• Captures multiple numeric variables over time.
Use cases:
• Apple vs. Amazon stock prices over time
Area chart
• Shows how values change over time and draws the audience's attention to the total change across the trend.
Use cases:
• Total sales over time
• Active users over time
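In a stacked area chart, each series is drawn on top of the previous one, so the top edge of the last band traces the total over time. A minimal sketch of the stacking computation; the channel names and monthly sales figures are hypothetical:

```python
def stack_series(series):
    """Return (lower, upper) bounds for each series in a stacked area chart.

    `series` maps a series name to a list of values, one per time period.
    The upper bound of the last series equals the total at each period.
    """
    names = list(series)
    periods = len(series[names[0]])
    baseline = [0.0] * periods
    bands = {}
    for name in names:
        upper = [b + v for b, v in zip(baseline, series[name])]
        bands[name] = (baseline, upper)
        baseline = upper  # the next series starts where this one ends
    return bands

# Hypothetical monthly sales by channel.
sales = {"online": [10, 12, 15], "retail": [20, 18, 17]}
bands = stack_series(sales)
print(bands["retail"][1])  # total sales per month: [30.0, 30.0, 32.0]
```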
Visualize Part of a Whole
• Pie chart
• Donut chart
• Heatmap
• Stacked column chart
• Treemap chart
Pie chart
• One of the most common ways to show part-to-whole data. It is also commonly used with percentages.
Use cases:
• Voting preference by age group
• Market share of cloud providers
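Pie and donut charts turn each category's share of the total into a percentage label and a slice angle. A minimal sketch of that conversion; the age groups and counts are hypothetical:

```python
def pie_slices(counts):
    """Return (percentage, angle in degrees) per category for a pie or donut chart."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {
        label: (100.0 * value / total, 360.0 * value / total)
        for label, value in counts.items()
    }

# Hypothetical survey responses by age group.
votes = {"18-29": 50, "30-49": 30, "50+": 20}
slices = pie_slices(votes)
print(slices)  # {'18-29': (50.0, 180.0), '30-49': (30.0, 108.0), '50+': (20.0, 72.0)}
```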
Donut chart
• The donut chart is a variant of the pie chart.
Use cases:
• Android OS market share
• Monthly sales by channel
Heatmap
• Heatmaps are two-dimensional charts that use color shading to present data trends.
Use cases:
• Departments with the highest attrition over time
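A heatmap needs the data pivoted into a grid (here, departments by year) and each cell scaled onto a color range. A minimal sketch of both steps; the attrition records are hypothetical, and mapping the 0.0 to 1.0 shade to an actual color is left to the charting tool:

```python
def build_heatmap(records, rows, cols):
    """Pivot (row, column, value) records into a matrix; missing cells are 0."""
    row_index = {r: i for i, r in enumerate(rows)}
    col_index = {c: j for j, c in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for row, col, value in records:
        matrix[row_index[row]][col_index[col]] += value
    return matrix

def to_shades(matrix):
    """Scale values to 0.0-1.0 so they can be mapped onto a color scale."""
    lo = min(min(r) for r in matrix)
    hi = max(max(r) for r in matrix)
    span = (hi - lo) or 1  # avoid dividing by zero when all cells are equal
    return [[(v - lo) / span for v in r] for r in matrix]

# Hypothetical attrition counts: (department, year, employees who left).
attrition = [
    ("Sales", 2022, 12), ("Sales", 2023, 18),
    ("Support", 2022, 6), ("Support", 2023, 4),
    ("Engineering", 2023, 3),
]
departments = ["Sales", "Support", "Engineering"]
years = [2022, 2023]
counts = build_heatmap(attrition, departments, years)
shades = to_shades(counts)
print(counts)  # [[12, 18], [6, 4], [0, 3]]
```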
Stacked column chart
Treemap chart
Visualize Distribution
• Histogram
• Box plot
• Violin plot
• Density plot
Histogram
Box plot
• Shows the distribution of a variable using five key summary statistics: minimum, first quartile, median, third quartile, and maximum.
Use cases:
• Time spent reading across readers
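The five statistics a box plot draws can be computed with Python's standard `statistics` module. A minimal sketch; the reading times are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since tools differ in how they compute Q1 and Q3:

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), the statistics a box plot draws."""
    data = sorted(values)
    # 'inclusive' treats the data as the whole population; other quartile
    # conventions (e.g. 'exclusive', the default) can give slightly different Q1/Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical minutes spent reading, one value per reader.
reading_minutes = [25, 12, 40, 20, 90, 15, 35, 22, 30]
print(five_number_summary(reading_minutes))  # (12, 20.0, 25.0, 35.0, 90)
```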
Violin plot
Density plot
Visualize a Flow
• Sankey chart
• Chord chart
• Network chart
Sankey chart
Chord chart
Network chart
• Similar to a graph, it consists of nodes and interconnecting edges. It illustrates how different items relate to each other.
Use cases:
• How different airports are connected worldwide
• Social media friend group analysis
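Underneath a network chart is a graph of nodes and edges; a common design choice is to size each node by its degree (number of connections) so hubs stand out. A minimal sketch with hypothetical airport routes:

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency list (node -> set of neighbours) from edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

# Hypothetical direct routes between airports (IATA codes).
routes = [("JFK", "LHR"), ("JFK", "LAX"), ("LHR", "DXB"), ("LAX", "NRT"), ("JFK", "NRT")]
graph = build_graph(routes)
degree = {node: len(neighbours) for node, neighbours in graph.items()}
hub = max(degree, key=degree.get)  # most connected node, drawn largest
print(hub, degree[hub])  # JFK 3
```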
Data characteristics
Data characteristics: Examples of data types
Data characteristics: Two fraud detection use cases
Use case 1: Rule-based (batch pipeline)
• How much data is there? Millions of transactions (kilobytes to terabytes)
• At what speed and volume is it arriving? In predefined intervals (minutes to multiple days)
• How quickly is it processed? Minutes to hours
• What type of data is it? Structured and semistructured data
• What value do insights from the data provide? Historical reporting of fraud cases (reactive approach)
Use case 2: ML in real time (streaming pipeline)
• How much data is there? Millions of transactions (bytes to megabytes)
• At what speed and volume is it arriving? In real time (milliseconds to seconds)
• How quickly is it processed? Milliseconds to seconds
• What type of data is it? Unstructured and semistructured data
• What value do insights from the data provide? Ability to detect fraud in real time (proactive approach)
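Use case 1 can be pictured as a batch job that applies fixed rules to the transactions collected during an interval and reports what it flags. A minimal sketch; the thresholds, field names, accounts, and rules are all hypothetical. The streaming ML approach of use case 2 would instead score each transaction with a trained model as it arrives, which is not shown here:

```python
from dataclasses import dataclass

# Hypothetical thresholds and reference data, for illustration only.
AMOUNT_LIMIT = 10_000.0
HOME_COUNTRY = {"acct-1": "US", "acct-2": "DE"}

@dataclass
class Transaction:
    tx_id: str
    account: str
    amount: float
    country: str

def check_rules(tx):
    """Return the names of the rules a transaction violates."""
    reasons = []
    if tx.amount > AMOUNT_LIMIT:
        reasons.append("amount_over_limit")
    home = HOME_COUNTRY.get(tx.account)
    if home is not None and tx.country != home:
        reasons.append("unexpected_country")
    return reasons

def run_batch(transactions):
    """Batch job: scan everything collected in the interval and report flagged transactions."""
    report = {}
    for tx in transactions:
        reasons = check_rules(tx)
        if reasons:
            report[tx.tx_id] = reasons
    return report

batch = [
    Transaction("t1", "acct-1", 120.0, "US"),
    Transaction("t2", "acct-1", 15_000.0, "US"),
    Transaction("t3", "acct-2", 300.0, "BR"),
    Transaction("t4", "acct-2", 25_000.0, "FR"),
]
print(run_batch(batch))
# {'t2': ['amount_over_limit'], 't3': ['unexpected_country'], 't4': ['amount_over_limit', 'unexpected_country']}
```

Because the report is produced only after the interval closes, this approach supports historical reporting rather than stopping fraud as it happens.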
Access to data
Access to data: Consider authorization level based on role
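Role-based authorization can mirror the manager and executive views from the granularity examples: a manager sees only their own line of business, while an executive sees everything. A minimal sketch; the role names, scopes, and rows are hypothetical, and a real deployment would rely on the access controls of its analytics or BI platform rather than hand-written filtering:

```python
# Hypothetical role-to-scope mapping.
ROLE_SCOPE = {
    "finance_manager": "own_line_of_business",
    "cfo": "all_lines_of_business",
}

def visible_rows(rows, role, user_line_of_business=None):
    """Return only the rows that a user with the given role may analyze."""
    scope = ROLE_SCOPE.get(role)
    if scope == "all_lines_of_business":
        return list(rows)
    if scope == "own_line_of_business":
        return [r for r in rows if r["line_of_business"] == user_line_of_business]
    return []  # deny by default for unknown roles

rows = [
    {"line_of_business": "Retail", "revenue": 200.0},
    {"line_of_business": "Wholesale", "revenue": 200.0},
]
print(len(visible_rows(rows, "cfo")))                        # 2
print(len(visible_rows(rows, "finance_manager", "Retail")))  # 1
print(len(visible_rows(rows, "intern")))                     # 0
```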