Unit 3 - Data Visualization

The document discusses data visualization, including its definition as transforming data into visual representations to gain insights. It notes the importance of visualization for exploratory data analysis and decision making. Finally, it outlines several popular data visualization tools like Tableau, Power BI, and Matplotlib and describes the architecture and components of figures in Matplotlib.

Uploaded by VivuEtukuru
Copyright © All Rights Reserved

Data Visualization

Unit 3

By
Dr. G. Sunitha
Professor
Department of AI & ML

School of Computing

Sree Sainath Nagar, A. Rangampet, Tirupati – 517 102


What is Data Visualization

• Data Visualization is the process of transforming data into visual representations, such as charts, graphs, diagrams, pictures, and videos, that explain the data and allow us to gain insights from it.

• Data Visualization is the process of finding, interpreting, and comparing data so that complex ideas can be communicated more clearly, making it easier to identify logical patterns once the data has been analyzed.

2
Importance of Data Visualization
• “A Picture is worth a Thousand Words”
• “Data is the new oil? No, data is the new soil.” — David McCandless

3
Importance of Data Visualization

Exploratory Data Analysis

4
Importance of Data Visualization . . .
• The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets.
• DataViz can improve understanding and analysis, enabling better and faster decision making.
• In this era, visual communication is a must-have skill for all managers.
• Through visualization, business analysts can quickly analyze the data and prepare
reports to make business decisions effectively.
• Visualization is an integral part of Data Science & Data Analysis.

5
Importance of Data Visualization . . .
• Exploratory Data Analysis (EDA) is crucial for the success of all data science projects. It is an approach to analyzing and gathering insights into the various aspects of the data. Data Visualization is a part of EDA.
• Visualizing Machine Learning Models (Data Summaries, Test Data Analysis)
• Visualizing ML Model Results (Model Output Analysis)
• Having one central data visualization (dashboard) with KPIs is essential for a data-driven business.
• Actionable insights can be better conveyed to clients or non-technical persons through visualization.

"We humans are a visual species. Graphics and illustrations can be such a powerful weapon for understanding and communicating."
- Prof. Alberto Cairo

6
Data Visualization Tools
• Before data visualization tools existed, much of the data used to guide digital strategies was static and had to be recollected, reanalyzed, and revisualized on a seasonal basis. This gave teams a huge amount of manual work with spreadsheets and other tools.

• As dashboards gained wider popularity, dynamic, customizable, and always up-to-date data became available, which brought the work of data-related professionals to a higher level.

7
Data Visualization Tools . . .
Top Data Visualization Tools in the Market:
• Google Data Studio
– since 2016
– Paid & Free
– Simple Interface
– Data collection is easy through the Google ecosystem.

• Salesforce Tableau
– since 2003
– Paid & Free (if you use the public version, the data can’t be private)
– Outstanding visual library
– Less coding is required
– More powerful and detailed analysis
– Data collection is more complicated because data must be collected from external sources.

8
Data Visualization Tools . . .
• Microsoft Power BI
– since 2015
– Paid
– Cloud-based
– Steep learning curve
– For digital marketing, no integrated connectors are available, so data must be collected externally.
– No native connectors to widely used digital tools.
• IBM Watson, Datawrapper, Grafana, Zoho Analytics, R Libraries
• Python Libraries
– Since 2003
– Low learning curve, ease of use, flexible, open source
– Well-supported documentation and functionality
– Excellent tool for Enterprise Application Integration (EAI)
– The Python ecosystem is ever-expanding.

9
Types of Data Visualizations

10
Line Plot

Density Plot

11
Scatter Plot

12
Bar Chart

Stacked Bar Chart

Grouped Bar Chart

13
Histogram

Waterfall

14
Pie Chart

Doughnut Chart

Gauge Chart

15
Funnel Chart

Bubble Chart

16
Radar Chart

Area Chart

17
Pair Plot

18
Mind Map

19
Heat Map

20
Visualizing Clusters

21
Box Plot

22
Voronoi Diagram

23
Violin Plot

Beeswarm Plot

24
Quadtree

Markers

25
KDE Plot

26
Dashboard

27
Geo-Visualization of Covid Statistics

https://geographicinsights.iq.harvard.edu/covidindiapc

28
Python Visualization Libraries

Geoplotlib

29
30
Matplotlib
• Comprehensive library for creating static, animated, and interactive visualizations in
Python.
• Multiplatform data visualization library built on NumPy arrays.
• 2D & 3D plotting library developed in 2003.
• Works well with many operating systems and graphics backends.
• Latest version at the time of writing: 3.5.2
• https://matplotlib.org

Layers in Matplotlib Architecture (source: learnbay.co)
31
Architecture of Matplotlib
• Back-end Layer: Provides implementations of three abstract interface classes.
– FigureCanvas:
• Provides the area onto which the figure is drawn.
• Example: to draw in the real world, we need paper to draw on. FigureCanvas is similar to paper.
– Renderer:
• The Renderer object draws on the FigureCanvas object.
• Example: just as we need a paintbrush, pen, or pencil to draw on paper, we use a Renderer object to draw on the FigureCanvas.
– Event:
• Event instances handle user input such as keystrokes and mouse clicks.
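The three back-end classes can be seen working together in a minimal sketch that bypasses pyplot. It assumes the non-interactive Agg back end (so it runs without a display); the click handler only shows how an Event callback is registered, since Agg never delivers real clicks.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless, non-interactive back end
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure

# The drawing (an Artist-layer Figure) is attached to a FigureCanvas, the "paper"
fig = Figure(figsize=(4, 3), dpi=100)
canvas = FigureCanvasAgg(fig)

ax = fig.add_subplot()
ax.plot([0, 1, 2], [0, 1, 4])

# draw() makes the Agg Renderer (the "paintbrush") paint every Artist onto the canvas
canvas.draw()
renderer = canvas.get_renderer()
pixels = np.asarray(canvas.buffer_rgba())  # rendered image as a (height, width, 4) RGBA array

# Event handling: register a callback for mouse clicks.
# With Agg no clicks ever arrive; this only shows how a handler is wired up.
clicks = []

def on_click(event):
    # a MouseEvent carries the click position in data coordinates
    clicks.append((event.xdata, event.ydata))

cid = canvas.mpl_connect("button_press_event", on_click)
```

With an interactive back end (for example a Qt or Tk canvas) the same `mpl_connect` call would receive real mouse events.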

32
Architecture of Matplotlib . . .
• Artist Layer: Allows full control and fine-tuning of the Matplotlib figure, which is the top-level container for all plot elements.
– This layer consists of one main object, Artist, which uses the Renderer to draw on the canvas.
– It allows more customization than the Scripting layer and is more convenient for advanced plots, especially when handling multiple figures/axes.
– Everything we see on a Matplotlib figure is an instance of Artist, for example the title, lines, labels, and images.
– There are two types of Artist objects:
• Primitive: Line2D, Rectangle, Circle, Text.
• Composite: Axis, Axes, Tick, and Figure.
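A short sketch can confirm that primitives and composites alike are Artists, and that composites contain other Artists. The plotted values and the title text are arbitrary; the Agg back end is assumed so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.patches import Rectangle

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [2, 4, 3])               # primitive Artist: Line2D
box = ax.add_patch(Rectangle((1.5, 2.5), 0.5, 1.0))  # primitive Artist: Rectangle
title = ax.set_title("Everything is an Artist")       # primitive Artist: Text

# Composites (Figure, Axes, Axis) and primitives all derive from Artist
objects = {"Figure": fig, "Axes": ax, "XAxis": ax.xaxis,
           "Line2D": line, "Rectangle": box, "Text": title}
is_artist = {name: isinstance(obj, Artist) for name, obj in objects.items()}

# Composite Artists hold other Artists, forming a tree
line_in_axes = line in ax.get_children()
axes_in_figure = ax in fig.get_children()
```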

33
Architecture of Matplotlib . . .
• Scripting Layer:
– The lightest scripting interface among the three layers.
– It is a collection of command-style functions and is therefore considered the easiest layer to use.
– By contrast, the Artist layer is syntactically heavy, as it is meant for developers and not for individuals whose goal is to perform quick exploratory analysis of some data.
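In the scripting layer, each pyplot command acts on the "current" Figure and Axes, which pyplot creates and tracks behind the scenes. A minimal sketch (Agg back end assumed, data arbitrary):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt

# Command-style calls: no Figure or Axes objects are handled explicitly
plt.figure()
plt.plot([1, 2, 3], [1, 4, 9], label="squares")
plt.title("Scripting layer")
plt.xlabel("x")
plt.legend()

# pyplot built ordinary Artist-layer objects for us; they can still be retrieved
ax = plt.gca()   # current Axes
fig = plt.gcf()  # current Figure
```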

34
Figure

• Figure is a top-level Artist (container) and is used as a canvas to draw on.

• It holds all the plot elements.
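A sketch of the Figure as a container: it starts empty and then holds the Axes and figure-level text added to it. The size and title are illustrative values; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.figure import Figure

fig = plt.figure(figsize=(5, 3))  # an empty top-level container
axes_before = len(fig.axes)       # no plot elements yet

ax = fig.add_subplot()            # the Figure now holds one Axes...
fig.suptitle("A Figure holds everything")  # ...and a figure-level title
ax.plot([0, 1], [0, 1])
```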

35
Anatomy of a Figure

36
Anatomy of a Figure . . .
Taking a deeper look into the anatomy of a Figure object, we can observe the following components:
• Title: Text label of the whole Figure object
• X/Y axis label: Text label for the X/Y axis below the spines
• Major tick: Major value indicators on the spines
• Major tick label: Text label that will be displayed at the major ticks
• Minor tick: Small value indicators between the major tick marks
• Minor tick label: Text label that will be displayed at the minor ticks
• Spines: Lines connecting the axis tick marks
• Grid: Vertical and horizontal lines used as an extension of the tick marks
• Legend: Describes the content of the plot
• Line: Plotting type that connects data points with a line
• Markers: Plotting type that plots every data point with a defined marker

Note: Major ticks separate the axis into major units. On category axes, major ticks are
the only ticks available (you cannot show minor ticks on a category axis). On value axes,
one major tick appears for every major axis division.
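The components listed above can each be set explicitly. The sketch below maps one call to each component (data and labels are arbitrary, the Agg back end is assumed); `fig.suptitle` is used for the title of the whole Figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import AutoMinorLocator

x = np.linspace(0, 10, 50)
fig, ax = plt.subplots(figsize=(6, 4))
fig.suptitle("Anatomy demo")                                   # Title of the whole Figure
ax.plot(x, np.sin(x), label="sin(x)")                          # Line
ax.plot(x[::5], np.cos(x[::5]), "o", label="cos(x) samples")   # Markers only
ax.set_xlabel("x")                                             # X axis label
ax.set_ylabel("value")                                         # Y axis label
ax.set_xticks([0, 5, 10])                                      # Major ticks (labels generated)
ax.xaxis.set_minor_locator(AutoMinorLocator(5))                # Minor ticks: 5 intervals per major
ax.grid(True, which="major", linestyle=":")                    # Grid extends the major ticks
ax.legend()                                                    # Legend describes each item
ax.spines["top"].set_visible(False)                            # hide the top and right spines
ax.spines["right"].set_visible(False)
```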
37
Anatomy of a Figure . . .

38
Plots
• Plots in Matplotlib have a hierarchical structure that nests Python objects to create a tree-
like structure.
• A figure may embed multiple subplots.

39
Axes Class
• A Figure may contain one or more Axes.
• The Axes is an actual plot, or subplot,
depending on whether you want to plot single
or multiple visualizations.
• The Axes Class contains most of the figure
elements: Axis, Tick, Line2D, Text, Polygon,
etc., and sets the coordinate system.
• The region of the image that contains the data space is commonly known as the Axes object.
• Axes allow placement of plots at any location in the figure; thus they provide a flexible way to create subplots.
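Placing Axes "at any location" can be sketched with `Figure.add_axes`, which takes `[left, bottom, width, height]` as fractions of the figure size. The positions below are arbitrary choices that put a small inset inside a main plot; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 4))
# [left, bottom, width, height] in figure-fraction coordinates (0 to 1)
main_ax = fig.add_axes([0.1, 0.1, 0.85, 0.85])
inset_ax = fig.add_axes([0.6, 0.6, 0.3, 0.25])  # a small plot placed inside the main one

main_ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
inset_ax.plot([0, 1, 2, 3], [9, 4, 1, 0], color="tab:red")
inset_ax.set_title("inset", fontsize=8)
```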

40
Ways to use Matplotlib
• There are essentially two ways to use Matplotlib: OO style and Pyplot style.
– Explicitly create Figures and Axes, and call methods on them (the "object-
oriented (OO) style").

Ex: using modules like matplotlib.figure, matplotlib.axes, matplotlib.lines etc.

– Rely on pyplot to automatically create and manage the Figures and Axes, and
use pyplot functions for plotting.
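The two styles can produce the same chart; the sketch below draws it once each way so the difference is only in who manages the Figure and Axes. Data and titles are illustrative; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 100)

# OO style: create the Figure and Axes explicitly, then call methods on them
fig, ax = plt.subplots()
ax.plot(x, x, label="linear")
ax.plot(x, x**2, label="quadratic")
ax.set_xlabel("x")
ax.set_title("OO style")
ax.legend()

# pyplot style: pyplot creates and tracks the current Figure and Axes
plt.figure()
plt.plot(x, x, label="linear")
plt.plot(x, x**2, label="quadratic")
plt.xlabel("x")
plt.title("pyplot style")
plt.legend()
```

The OO style is usually easier to maintain once a figure contains several Axes, because every call names the Axes it targets.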

41
pyplot
• Pyplot is an interface to Matplotlib.
• Most of the Matplotlib utilities lie under the pyplot submodule.
• pyplot contains a simpler interface for creating static, animated, and interactive
visualizations in Python.
• Pyplot allows the users to plot the data without explicitly configuring the Figure and
Axes themselves. They are implicitly and automatically configured to achieve the
desired output.
• It is handy to use plt to reference the imported submodule, as follows:
import matplotlib.pyplot as plt

42
Pyplot . . .

• The simplest way of creating a Figure is using pyplot.figure.

fig = plt.figure()

Creates an instance of a new Figure and returns a handle to it.

• The simplest way of creating a Figure with an Axes is using pyplot.subplots.

# Create a figure containing a single axes.

fig, ax = plt.subplots()
OR
fig, ax = plt.subplots(1, 1)

• It is often convenient to create the Axes together with the Figure, but it can also be
added to the Figure manually later.
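Both routes mentioned above are sketched here: Axes added manually to an existing Figure with `add_subplot(nrows, ncols, index)`, and a Figure created together with its Axes. Data is arbitrary; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure()              # Figure with no Axes yet
ax1 = fig.add_subplot(1, 2, 1)  # added later: 1 row, 2 columns, first cell
ax2 = fig.add_subplot(1, 2, 2)  # second cell
ax1.plot([1, 2, 3])
ax2.bar(["a", "b"], [3, 5])

fig2, ax = plt.subplots()       # Figure and a single Axes created together
```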

43
plt.figure() Parameters

44
plt.figure() Parameters

45
plt.figure() Parameters . . .

Illustrated parameters: linewidth, edgecolor, facecolor

Example
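A sketch of the `plt.figure()` parameters shown in the illustration, plus `figsize` and `dpi`; the colour names and sizes are arbitrary choices, and the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

fig = plt.figure(
    figsize=(6, 3),           # width, height in inches
    dpi=100,                  # resolution in dots per inch
    facecolor="lightyellow",  # background colour of the figure
    edgecolor="navy",         # colour of the figure border
    linewidth=4,              # width of the figure border, in points
)
ax = fig.add_subplot()
ax.plot([1, 2, 3], [3, 1, 2])

expected_face = to_rgba("lightyellow")  # RGBA tuple the figure should now report
```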

46
Types of Plots
• Comparison plots –
– For comparing multiple variables over time.
– Ex: line charts, bar charts, and radar charts.
• Relation plots –
– To show relationships among variables.
– Ex: scatter plots for showing the relationship between two variables, bubble plots for
three variables, correlograms for variable pairs, heatmaps etc.
• Composition plots –
– Used to visualize variables that are part of a whole.
– Ex: pie charts, stacked bar charts, stacked area charts, Venn diagrams, etc.
• Distribution plots –
– To get a deeper insight into the distribution of variables.
– Ex: histograms, density plots, box plots, violin plots, etc.
• Geo Plots –
– For visualizing geospatial data.
– Ex: dot maps, connection maps, marker maps, choropleth maps etc.
47
Types of plots in Matplotlib
• https://matplotlib.org/stable/plot_types/index.html

48
Types of inputs to plotting functions
• Plotting functions expect numpy.array or numpy.ma.masked_array as input, or
objects that can be passed to numpy.asarray.

• Classes that are similar to arrays ('array-like'), such as pandas data objects and numpy.matrix, may not work as intended.

• Most methods will also parse an addressable object like a dict or a pandas.DataFrame. Matplotlib allows you to provide the data keyword argument and generate plots by passing the strings corresponding to the x and y variables.
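The `data` keyword lets you pass column names instead of arrays. In this sketch the dict and its numbers are hypothetical; a pandas DataFrame with the same column names would work the same way. The Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = {
    "year": np.arange(2015, 2021),
    "sales": np.array([12, 15, 14, 18, 21, 25]),  # illustrative numbers, not real data
}

fig, ax = plt.subplots()
# With data=..., the strings "year" and "sales" are looked up as keys in `data`
(line,) = ax.plot("year", "sales", data=data)
ax.set_xlabel("year")
ax.set_ylabel("sales")
```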

49
Line Plot
• Line charts are great for visualizing variables over time.
• Line charts are used to display quantitative values over a continuous time period and
show information as a series.
• A line chart is ideal for a time series, whose data points are connected by straight-line segments. The value is placed on the y-axis, while the x-axis is the timescale.
• Uses :
– Track/show changes over short and long periods of time.
– Compare changes over the same period of time for more than one group.
– Line charts are well-suited for showing data trends.

50
Line Plot . . .
• Design practices :
– Avoid too many lines per chart
– Adjust scale such that the trend is clearly visible
– A legend should be available to describe each variable.
– The following diagram shows a trend of real-estate prices (in million US dollars)
for two decades.

51
Line Plot . . .
• matplotlib.pyplot.plot()
• Parameters for plot() function
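A sketch of common `plot()` parameters (`color`, `linestyle`, `linewidth`, `marker`, `markersize`, `label`) that also follows the design practices above: few lines, a clear scale, and a legend. The two price series are hypothetical formulas, not the real-estate data from the slide; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2000, 2021)
city_a = 0.30 + 0.02 * (years - 2000)          # hypothetical prices (million USD)
city_b = 0.25 + 0.015 * (years - 2000) ** 1.1  # hypothetical prices (million USD)

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(years, city_a, color="tab:blue", linestyle="-", linewidth=2,
        marker="o", markersize=4, label="City A")
ax.plot(years, city_b, color="tab:orange", linestyle="--", linewidth=2,
        marker="s", markersize=4, label="City B")
ax.set_xlabel("Year")
ax.set_ylabel("Price (million USD)")
ax.legend()  # one legend entry per variable
```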

52
Colors, Markers, and Line Styles
• Matplotlib offers a neat way to specify colors, marker types, line styles, and line widths.
• matplotlib.colors, matplotlib.markers, matplotlib.lines
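Styles can be given either as a compact format string (marker, line style, and colour codes combined, for example `"go--"`) or as explicit keyword arguments. A sketch of both (data arbitrary, Agg back end assumed):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots()

# Format string: "g" = green, "o" = circle markers, "--" = dashed line
(l1,) = ax.plot(x, x, "go--")

# Equivalent kind of styling spelled out with keyword arguments
(l2,) = ax.plot(x, x + 2, color="k", marker="^", linestyle=":", linewidth=1.5)
```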

53
Ticks, Labels, and Legends
• xlim, xticks, and xticklabels methods.
• These control the plot range, tick locations, and tick labels, respectively.
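In the OO style these controls are Axes methods (`set_xlim`, `set_xticks`, `set_xticklabels`); pyplot offers `plt.xlim` and `plt.xticks` for the same purpose. The monthly visitor numbers below are hypothetical; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
visitors = [12, 15, 20, 18, 25, 30, 28, 26, 22, 19, 16, 14]  # hypothetical values

fig, ax = plt.subplots()
ax.plot(months, visitors, label="visitors")
ax.set_xlim(1, 12)                                # plot range
ax.set_xticks([1, 4, 7, 10])                      # tick locations
ax.set_xticklabels(["Jan", "Apr", "Jul", "Oct"])  # tick labels
ax.legend(loc="upper left")
```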

54
Activity - Visualizing Stock Trends by Using a Line Plot

• Dataset –
– Google Stock Prices: the stock price dataset of Google for 5 years, from 2013 to 2018.
– https://www.kaggle.com/medharawat/google-stock-price
• Dataset Description –
– 1259 records, 7 features (file: GOOGL_data.csv)

Feature Name   Description              Datatype
date           Date                     String in format dd-mm-yyyy
open           Opening price of stock   Numeric
high           Highest price of stock   Numeric
low            Lowest price of stock    Numeric
close          Closing price of stock   Numeric
volume         Volume of shares         Numeric
Name           Name of the company      String

55
Activity - Visualizing Stock Trends by Using a Line Plot . . .

• Stock trend means “How did the stock price change over a time period?”
– X-axis is the timescale (feature on the x-axis will be ‘date’)
– Y-axis is the stock price (feature on the y-axis will be ‘close’)
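A possible solution sketch for the activity. To stay self-contained it reads a few placeholder rows from an in-memory string that mimics the columns of GOOGL_data.csv; the prices are made up, not real Google data. With the actual file you would call `pd.read_csv("GOOGL_data.csv")` instead, and adjust the date format string if the file does not use dd-mm-yyyy as the table states. The Agg back end is assumed.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder rows with the dataset's columns (values are invented)
csv_text = """date,open,high,low,close,volume,Name
08-02-2013,100.0,102.0,99.0,101.0,1000,GOOGL
11-02-2013,101.0,103.0,100.0,102.5,1200,GOOGL
12-02-2013,102.5,104.0,101.5,103.0,900,GOOGL
13-02-2013,103.0,103.5,100.5,101.5,1100,GOOGL
"""

df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")  # dd-mm-yyyy strings -> datetimes
df = df.sort_values("date")

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(df["date"], df["close"], label="close")  # x: time scale, y: closing price
ax.set_title("GOOGL closing price")
ax.set_xlabel("date")
ax.set_ylabel("close price")
ax.legend()
fig.autofmt_xdate()  # rotate date labels so they do not overlap
```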

Example

56
Subplots
• A set of subplots can be plotted on the same figure.
• Parameters to the subplots() function
Example
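A sketch of `plt.subplots()` with its grid parameters: `nrows`/`ncols` set the layout, `sharex=True` links the x-axes, and the returned `axs` is a 2-D NumPy array of Axes. The four functions are arbitrary examples; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
funcs = {"sin": np.sin, "cos": np.cos, "tanh": np.tanh, "exp(-x)": lambda t: np.exp(-t)}

# 2 rows x 2 columns of Axes sharing the x-axis
fig, axs = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(8, 5))
for ax, (name, f) in zip(axs.flat, funcs.items()):
    ax.plot(x, f(x))
    ax.set_title(name)
fig.tight_layout()  # reduce overlap between subplot titles and labels
```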

57
Matplotlib Patches
• Patch is a Matplotlib class whose subclasses encapsulate many different shapes and informative symbols, such as arrows, boxes, circles, and polygons. Here's what the hierarchy of patches looks like:
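A sketch that adds a few Patch subclasses to an Axes with `add_patch`; positions, sizes, and colours are arbitrary, and the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Circle, FancyArrow, Patch, Polygon, Rectangle

fig, ax = plt.subplots()
shapes = [
    Rectangle((0.1, 0.1), 0.3, 0.2, facecolor="tab:blue"),  # lower-left corner, width, height
    Circle((0.7, 0.3), 0.15, facecolor="tab:orange"),        # centre, radius
    Polygon([[0.2, 0.6], [0.4, 0.9], [0.6, 0.6]], closed=True, facecolor="tab:green"),
    FancyArrow(0.6, 0.7, 0.25, 0.0, width=0.03, color="k"),  # x, y, dx, dy
]
for shape in shapes:
    ax.add_patch(shape)

# add_patch does not autoscale to fit shapes in every case, so set limits explicitly
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_aspect("equal")  # keep the circle round
```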

58
Save Figure to Image File
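A figure can be written to disk with `Figure.savefig` (or `plt.savefig` for the current figure); the output format is inferred from the file extension. The file names and temporary directory below are example choices; the Agg back end is assumed.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])

out_dir = tempfile.mkdtemp()                        # example output location
png_path = os.path.join(out_dir, "line_plot.png")   # example file name
fig.savefig(png_path, dpi=150, bbox_inches="tight") # raster image, trimmed margins

pdf_path = os.path.join(out_dir, "line_plot.pdf")
fig.savefig(pdf_path)                               # vector output
plt.close(fig)                                      # free the figure's memory
```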

59
Plotting with Pandas and Seaborn
• Pandas and Seaborn libraries support data visualization. They provide a simple API, and both are built on Matplotlib.
• In pandas, a dataset may have multiple columns of data, along with row and column labels. pandas itself has built-in methods that simplify creating visualizations from DataFrame and Series objects.
• Another library is seaborn, a statistical graphics library created by Michael Waskom. Seaborn simplifies creating many common visualization types. In older seaborn versions, importing seaborn modified the default Matplotlib color schemes and plot styles to improve readability and aesthetics; in current versions, calling seaborn.set_theme() applies this styling. Even if you do not use the seaborn API, you may prefer to apply seaborn's theme as a simple way to improve the visual aesthetics of general Matplotlib plots.

60
Visualization with Pandas – Series Lineplot
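A sketch of `Series.plot()`, which draws a line plot with the index on the x-axis and the values on the y-axis and returns the Matplotlib Axes. The series is hypothetical; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import numpy as np
import pandas as pd

# Hypothetical series: a running total indexed 0, 10, ..., 90
s = pd.Series(np.cumsum([1, 3, -2, 4, 0, 5, -1, 2, 3, 1]),
              index=np.arange(0, 100, 10))

ax = s.plot(title="Series.plot()", style="o-")  # line with circle markers
```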

61
Visualization with Pandas – DataFrame Line Plot
• DataFrame has a number of options allowing some flexibility with how the columns
are handled; for example, whether to plot them all on the same subplot or to create
separate subplots.
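A sketch of both options: all columns on one Axes (the default, with a legend) versus `subplots=True`, which gives each column its own subplot. The random-walk data is synthetic (seeded for repeatability); the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(
    rng.standard_normal((10, 4)).cumsum(axis=0),  # synthetic random walks
    columns=["A", "B", "C", "D"],
    index=np.arange(0, 100, 10),
)

ax = df.plot(title="All columns on one Axes")               # one line per column
axes = df.plot(subplots=True, figsize=(6, 6), sharex=True)  # one subplot per column
```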

62
Visualization with Pandas – Bar Plot
• DataFrame has a number of options allowing some flexibility with how the columns
are handled; for example, whether to plot them all on the same subplot or to create
separate subplots.
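For bar plots the same column handling applies: by default each row becomes a group of bars (one per column), and `stacked=True` stacks the columns instead. The regional sales figures are hypothetical; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import pandas as pd

df = pd.DataFrame(
    {"Q1": [10, 14, 8], "Q2": [12, 9, 11]},                # hypothetical sales figures
    index=pd.Index(["North", "South", "West"], name="region"),
)

ax1 = df.plot.bar(rot=0, title="Grouped bars")            # one group of bars per row
ax2 = df.plot.barh(stacked=True, title="Stacked horizontal bars")
```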

63
Scatter Plot
• Relation plots are perfectly suited to show relationships among variables.
• A scatter plot visualizes the linear correlation between two variables for one group or multiple groups.
• Uses:
– Can detect whether a linear correlation (relationship) exists between two variables.
– Allows plotting the relationship for multiple groups or categories using different colors.
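A sketch of a multi-group scatter plot, with Pearson's correlation coefficient computed alongside to quantify the linear relationship the plot suggests. Both groups are synthetic data generated with fixed linear trends plus noise; the Agg back end is assumed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive back end so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic groups with different linear relationships plus noise
x_a = rng.uniform(0, 10, 50)
y_a = 2.0 * x_a + rng.normal(0, 2, 50)
x_b = rng.uniform(0, 10, 50)
y_b = -1.0 * x_b + 15 + rng.normal(0, 2, 50)

fig, ax = plt.subplots()
ax.scatter(x_a, y_a, color="tab:blue", label="group A", alpha=0.7)
ax.scatter(x_b, y_b, color="tab:red", marker="^", label="group B", alpha=0.7)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# Pearson's r: near +1 or -1 means a strong linear relationship
r_a = np.corrcoef(x_a, y_a)[0, 1]
r_b = np.corrcoef(x_b, y_b)[0, 1]
```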

64
