Unit 3 (Python)

Uploaded by nandiniasadi01
Copyright © All Rights Reserved

Python Programming

Unit 3
Data Visualization
What is data visualization?
• Data visualization is the representation of data through the use of common graphics, such
as charts, plots, infographics, and even animations.
• These visual displays of information communicate complex data relationships and
data-driven insights in a way that is easy to understand.
• Data analysts and data scientists use it to discover and explain patterns and trends.
• Numbers and tables can be overwhelming, but visuals like charts and graphs can reveal
patterns and trends that would be difficult to see otherwise. For example, a bar chart
can show you which products are selling best, while a line graph can show you how
sales have changed over time.

Need for data visualization


The purpose of data visualization is to bridge the gap between raw data and actionable
insights. By making data accessible and engaging, we empower individuals and organizations
to understand, communicate, and use information effectively for informed decision-making.
The specific purpose of data visualization will vary depending
on the context and audience. The following are some uses of data visualization.

• Identify patterns and trends: Visuals highlight relationships and insights buried within
numbers, making it easier to spot trends, outliers, and correlations.
• Simplify complex information: Graphs and charts break down complex data into
digestible chunks, making it easier for people to grasp the overall picture.
• Reveal hidden insights: Certain patterns might not be apparent in raw data, but
become clear when represented visually.
• Communicate findings effectively: Visuals capture and hold attention, making data
presentations more engaging and impactful than relying on text or numbers alone.
• Share information with diverse audiences: Well-designed visualizations depend less on
language and technical background than tables of numbers, so they help a wider audience
understand the data.
• Inform better decisions: By readily understanding patterns and relationships in
data, individuals and organizations can make more informed, evidence-based choices.
• Track progress and identify areas for improvement: Visualizing data over time reveals
progress towards goals and helps identify areas needing adjustment.

Python for data visualization


Python's readability, comprehensive libraries, integration with data analysis tools, flexibility,
support for interactive visualizations, and thriving community make it an excellent choice
for data visualization projects across many domains. Here are some reasons that make
Python a top choice for data visualization:
• Readability and Ease of Use:
o Python's straightforward syntax and English-like readability make it easy to learn
and use, even for those without extensive programming backgrounds.
o You can create visualizations rapidly, enabling you to explore data and iterate
quickly on your visualizations.
• Comprehensive libraries:
Python offers a vast array of powerful visualization libraries, catering to diverse needs
and preferences:

o Matplotlib: The foundation for many other libraries, providing low-level building
blocks for creating static, animated, and interactive visualizations.
o Seaborn: Built on top of Matplotlib, it offers a high-level interface for creating
aesthetically pleasing and informative statistical graphics.
o Plotly: Excels in creating interactive, web-based visualizations, ideal for sharing
and embedding in web applications.
o Bokeh: Another option for interactive visualizations, with a focus on web-based
plots and dashboards.
o Pandas Visualization: Built into the Pandas data analysis library, it offers
convenient plotting functions directly from DataFrames.
• Seamless Integration with Data Analysis Tools:
o Python's powerful data analysis libraries like NumPy and Pandas make it easy to
clean, manipulate, and prepare data for visualization.
o Integration with libraries like SciPy and Statsmodels enables statistical modeling
and analysis within the same Python environment.
o This smooth integration streamlines the whole workflow, from data preparation to
visualization and interpretation.
• Interactive Visualizations:
o Plotly and Bokeh enable the creation of dynamic and interactive visualizations
that engage users and facilitate exploration of data.
o These visualizations can be easily shared and embedded in web applications for
widespread dissemination.
• Strong Community and Support:
o Python boasts a large and active community of data scientists and
developers, providing extensive resources, tutorials, and support for
visualization tasks.
o Ongoing development and regular releases keep Python's visualization
libraries up to date.
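As a small illustration of this integration, the sketch below builds a pandas DataFrame and plots it with DataFrame.plot(), which draws with Matplotlib and returns a Matplotlib Axes that can be customized further with the usual Axes methods. The column names and numbers are hypothetical, chosen only for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly figures, used only for illustration
df = pd.DataFrame(
    {"sales": [120, 135, 150, 145, 170, 190],
     "costs": [100, 110, 115, 120, 130, 140]},
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
)

# pandas draws the two columns as lines using Matplotlib
# and hands back the Matplotlib Axes it drew on
ax = df.plot(kind="line", marker="o", title="Sales vs. costs")
ax.set_ylabel("Amount")   # further customization uses the normal Axes API
plt.show()
```

Because the return value is an ordinary Axes, everything described later for Matplotlib (labels, limits, legends) also applies to pandas plots.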
Python libraries for data visualization
Matplotlib

• Matplotlib is a Python plotting library for creating static, animated, and interactive
visualizations. It is built on NumPy, Python's library for numerical computing.
• Despite dating back to 2003, it remains one of the most popular plotting libraries in the
Python world.
• Because matplotlib was one of the first Python data visualization libraries, many other
libraries have been built on top of it or are designed to work alongside it.

• Although matplotlib gives fine control over every detail, producing polished,
publication-quality charts with it quickly often takes more code than with higher-level libraries.

Plotly
• Plotly is a free, open-source graphing library for creating data visualizations.
• Plotly (plotly.py) is a Python library built on top of the Plotly JavaScript library
(plotly.js). It creates web-based data visualizations that can be displayed in Jupyter
notebooks, shown in web applications built with Dash, or saved as standalone
HTML files.
• Plotly supports scatter plots, histograms, line charts, bar charts, pie charts, error bars,
box plots, multiple axes, sparklines, dendrograms, 3-D charts, and other chart types.
• Plotly also supports contour plots, and it can be used without an internet connection.

Seaborn

• Seaborn is a Python library for creating statistical graphics.
• It provides a high-level interface for producing visually appealing and informative
statistical graphics.
• Matplotlib is often used when fine-grained control over a plot is needed, while
Seaborn is often preferred for quickly producing polished statistical plots for
publications and presentations.
• Seaborn builds on matplotlib to create attractive charts with just a few
lines of code. The main difference is that its default themes and color palettes are
designed to be more visually appealing and modern.
• Its dataset-oriented plotting functions work with data frames and arrays that
contain entire datasets, internally performing the semantic mapping and statistical
aggregation needed to produce informative plots.

ggplot

• ggplot is a plotting library for Python modelled on ggplot2, a package originally
implemented in R.
• It provides a domain-specific language, based on the Grammar of Graphics, for building
visualisations, primarily for data analysis.
• With ggplot, many graphs can be produced in a straightforward manner with only a couple
of lines of code.
• The same graph written with matplotlib often involves considerably more code. As a
result, ggplot can make graph coding easier.

Bokeh

• Bokeh is a library that, like ggplot, takes inspiration from the Grammar of Graphics.
• It produces interactive, web-ready plots that can be output in a variety of formats,
including HTML documents and JSON objects.
• Bokeh is well suited to real-time streaming data, because plots can be updated in place
as new data arrives.

Matplotlib for data visualization:

Matplotlib is a multiplatform data visualization library built on NumPy arrays, and designed
to work with the broader SciPy stack.

One of Matplotlib’s most important features is its ability to play well with many operating
systems and graphics backends. Matplotlib supports dozens of backends and output types,
which means you can count on it to work regardless of which operating system you are using
or which output format you wish. This cross-platform, everything-to-everyone approach has
been one of the great strengths of Matplotlib.

In recent years, however, the interface and style of Matplotlib have begun to show their age.
Newer tools like ggplot and ggvis, along with web visualization toolkits based on D3.js and
HTML5 canvas, often make Matplotlib feel clunky and old-fashioned.

Importing matplotlib

Just as we use the np shorthand for NumPy and the pd shorthand for
Pandas, we will use some standard shorthands for Matplotlib imports:

import matplotlib as mpl
import matplotlib.pyplot as plt

Using style sheets


Matplotlib provides various stylesheets that allow you to easily change the appearance of
your plots. Stylesheets control the default styles of elements like lines, markers, and text in
your plots.
You can easily switch between different styles by changing the active style sheet
with plt.style.use(), for example plt.style.use('classic').

Some available style-names are as follows:

• 'classic'
• 'dark_background'
• 'fast'
• 'fivethirtyeight'
• 'ggplot'
• 'grayscale'
• 'seaborn-v0_8'
• 'seaborn-v0_8-bright'
You can get the list of all available styles with the code:
plt.style.available
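A minimal sketch of working with style sheets: it prints the styles available in your installation and then uses plt.style.context() to apply the 'ggplot' style to a single figure, so the global style that plt.style.use() would set is left unchanged. The sine/cosine data is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Names of all style sheets shipped with this Matplotlib installation
print(plt.style.available)

x = np.linspace(0, 10, 100)

# Apply 'ggplot' only inside the with-block; the previous style is
# restored automatically when the block ends
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="sin(x)")
    ax.plot(x, np.cos(x), label="cos(x)")
    ax.set_title("ggplot style")
    ax.legend()

plt.show()
```

To change the style for every later plot instead, call plt.style.use('ggplot') once near the top of the script.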

Various customization options in matplotlib


Matplotlib provides a wide range of customization options to tailor your plots according to
your preferences. Here are various customization options in Matplotlib:

1. Plot Style:

Matplotlib has various predefined styles that can be applied to change the overall
appearance of plots.
Example:
plt.style.use('ggplot')

2. Axes Labels and Title:

Add labels to the x and y axes, and a title to the entire plot.
Example:
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Title of the Plot')

3. Ticks and Tick Labels:


Customize tick locations, labels, and formatting for better readability.
Example:
plt.xticks([1, 2, 3], ['One', 'Two', 'Three'])

4. Legend:
Add a legend to the plot to label different elements.
Example:
plt.plot(x1, y1, label='Line 1')
plt.plot(x2, y2, label='Line 2')
plt.legend()

5. Grid:
Display grid lines to aid in reading values from the plot.
Example:
plt.grid(True)

6. Figure Size
Adjust the size and aspect ratio of the entire figure.
Example:
plt.figure(figsize=(8, 6))

7. Axis Limits:
Set limits for the x and y axes to focus on a specific region of interest.
Example:
plt.xlim(0, 10)
plt.ylim(0, 20)

8. Background Color:
Change the background color of the entire figure.
Example:
plt.figure(facecolor='lightgray')

These are just a few examples of the many customization options available in Matplotlib.
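The options above can be combined in a single figure. The sketch below uses made-up data for two lines; the numbered comments refer to the numbered options listed above, and every call is a standard pyplot function.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y1 = x            # hypothetical data for two lines
y2 = 0.2 * x**2

plt.style.use("ggplot")                              # 1. plot style
plt.figure(figsize=(8, 6), facecolor="lightgray")    # 6. size, 8. background
plt.plot(x, y1, label="Line 1")                      # 4. labelled lines...
plt.plot(x, y2, label="Line 2")
plt.legend()                                         #    ...and their legend
plt.xlabel("X-axis Label")                           # 2. axis labels
plt.ylabel("Y-axis Label")
plt.title("Title of the Plot")                       #    and title
plt.xticks([0, 5, 10], ["Zero", "Five", "Ten"])      # 3. tick positions and labels
plt.grid(True)                                       # 5. grid lines
plt.xlim(0, 10)                                      # 7. axis limits
plt.ylim(0, 20)
ax = plt.gca()   # keep a handle to the current Axes for later inspection
plt.show()
```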

Line plots
Line plots are a type of data visualization that uses lines to connect individual data points,
providing a clear representation of how a particular variable changes over a continuous
interval. Line plots can be used for the following:
• Visualizing Trends: Line plots are highly effective in revealing trends or patterns in
data. The connected lines make it easy to observe how a variable changes over a
continuous range, providing insights into the overall direction of the data.
• Showing Relationships: Line plots are particularly useful for illustrating relationships
between two variables, especially when one variable is dependent on the other. The
slope and direction of the line indicate the nature of the relationship.
• Highlighting Time Series Data: Line plots are commonly used for visualizing time
series data, where the x-axis represents time. This allows analysts to observe changes
in a variable over time, making trends, cycles, and seasonality more apparent.
• Comparing Multiple Series: Line plots can accommodate multiple lines on the same
graph, making it easy to compare the trends of different variables or multiple groups.
This is useful for identifying patterns and differences across categories.

Example:
1. For analyzing the monthly sales performance of a retail store over the course of a year,
a line plot would be particularly useful. The x-axis would represent the months
(January through December), and the y-axis would represent the total sales for each
month.
2. For analyzing the historical stock prices of a company over several years, a line plot would
provide a clear depiction of how the stock prices have changed over time, allowing you
to identify upward or downward trends.

Line plots can be created in Python with Matplotlib's pyplot module. To build a line plot, first
import Matplotlib. It is a standard convention to import Matplotlib's pyplot module as plt.

• To define a plot, you need some values, the matplotlib.pyplot module, and an idea of what
you want to display.

import matplotlib.pyplot as plt


plt.plot([1,2,3],[5,7,4])
plt.show()


More than one line can be drawn in the same plot. To add another line, simply call
plt.plot(x, y) again.
Example:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-1, 1, 50)     # 50 evenly spaced points from -1 to 1
y1 = 2*x + 1                   # a straight line
y2 = 2**x + 1                  # an exponential curve
plt.figure(num=3, figsize=(8, 5))
plt.plot(x, y2)
plt.plot(x, y1, linewidth=1.0, linestyle='--')   # thinner, dashed line
plt.show()

Some key parameters:

• x, y : array-like or scalar
The horizontal / vertical coordinates of the data points. x values are optional and
default to range(len(y)).
• linestyle : {'-', '--', '-.', ':', '', (offset, on-off-seq), ...}
The style of the line.
• linewidth : float
The width of the line, in points.
• alpha : scalar or None
The transparency of the line, from 0 (transparent) to 1 (opaque).
• color or c : the color of the line.
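A short sketch of the parameters listed above, assuming only Matplotlib is installed: x is omitted, so it defaults to range(len(y)), and the line's style, width, transparency and color are set explicitly. The values in y are arbitrary. plt.plot returns a list of Line2D objects, so `line, = ...` unpacks the single line that was drawn.

```python
import matplotlib.pyplot as plt

y = [3, 1, 4, 1, 5]   # x omitted: Matplotlib uses 0, 1, 2, 3, 4

# Dash-dot green line, 2.5 points wide, 60% opaque
line, = plt.plot(y, color="green", linestyle="-.", linewidth=2.5, alpha=0.6)

# 'c' is the short alias of 'color'; ':' draws a dotted line
plt.plot([0, 4], [2, 2], c="red", linestyle=":")
plt.show()
```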

Scatter plots
A scatter plot is a visual representation of how two variables relate to each other. We can use
scatter plots to explore the relationship between two variables, for example by looking for
any correlation between them.

Key Features:
Scatter plots display individual data points on a two-dimensional graph, where each point
represents the values of two variables.
Each data point is usually marked with a dot or another symbol, making it easy to distinguish
individual observations.
Unlike line plots, scatter plots do not connect data points with lines. This lack of connection
allows for a clear view of the distribution of points.
Scatter plots are effective for visualizing the spread or dispersion of data points along both
axes. They provide insight into the range of values for each variable.

Scatter plots can be used for:

• Identification of Patterns: Scatter plots are excellent for identifying patterns and trends
in data. The arrangement of points on the plot can reveal the nature of the relationship
between the two variables.
• Correlation Assessment: By observing the general direction of the points, you can
assess the correlation between the two variables. Points that trend upward from left to
right indicate a positive correlation, while points that trend downward indicate a
negative correlation.
• Outlier Detection: Outliers, or data points that deviate significantly from the overall
pattern, are easily identifiable in scatter plots. This is crucial for understanding the data
distribution and potential anomalies.
• Non-Linear Relationships: Scatter plots allow for the visualization of non-linear
relationships, where the pattern doesn't follow a straight line. This is important for
understanding complex interactions between variables.
• Visualizing Dispersion: The spread or dispersion of points around a central trend can
be easily observed, providing insights into the variability of the data.

Scatter plot using matplotlib


We can create a simple scatter plot using matplotlib by passing x and y values to plt.scatter()
# scatter_plotting.py
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
x = [2, 4, 6, 6, 9, 2, 7, 2, 6, 1, 8, 4, 5, 9, 1, 2, 3, 7, 5, 8, 1, 3]
y = [7, 8, 2, 4, 6, 4, 9, 5, 9, 3, 6, 7, 2, 4, 6, 7, 1, 9, 4, 3, 6, 9]
plt.scatter(x, y)
plt.show()

Some key parameters:

• x, y : float or array-like, shape (n, )
The data positions.
• s : float or array-like, shape (n, ), optional
The marker size, in points**2.
• c : array-like or list of colors or color, optional
The marker colors.
• marker : (default: 'o')
The marker style.
• cmap : (default: 'viridis')
The Colormap instance or registered colormap name used to map scalar data to colors.
• alpha : float, default: None
The alpha blending value, between 0 (transparent) and 1 (opaque).
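The sketch below exercises the scatter parameters listed above on random, seeded data (the data itself is purely illustrative): s sets each marker's area, c supplies one scalar per point that cmap turns into a color, and a colorbar shows that mapping.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)    # seeded so the example is reproducible
x = rng.random(50)
y = rng.random(50)
colors = rng.random(50)           # one scalar per point, mapped through cmap
sizes = 1000 * rng.random(50)     # marker areas in points**2

sc = plt.scatter(x, y, s=sizes, c=colors, cmap="viridis",
                 marker="o", alpha=0.5)
plt.colorbar(sc, label="color value")   # shows how scalars map to colors
plt.show()
```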

Visualizing errors
Error bars can be added to Matplotlib line plots and other graphs. An error is the difference
between a calculated (or measured) value and the actual value.

Without error bars, a graph gives the impression that every measured or computed value is
known with high precision. The function matplotlib.pyplot.errorbar() plots y versus x as
lines and/or markers with attached error bars.

Adding error bars in Matplotlib is simple: we only have to supply the error values. We use
the command:
plt.errorbar(x, y, yerr=2, capsize=3)

Where:

x = The data of the X axis.

y = The data of the Y axis.

yerr = The error value of the Y axis. A single number applies the same error to every point;
an array gives each point its own error value.

xerr = The error value of the X axis.

capsize = The size of the lower and upper lines of the error bar

A simple example, where we plot only one point. The error is 10% of the y value.

import matplotlib.pyplot as plt

x = 1
y = 20
y_error = 20 * 0.10   # 10% error
plt.errorbar(x, y, yerr=y_error, fmt='o', capsize=3)   # 'o' marks the point itself
plt.show()

• Parameters of errorbar():

a) yerr is the error value for each point.

b) linestyle is the style of the line connecting the points; linestyle='None' draws no line.

c) fmt is the format string for the markers and line; for example, "bo" draws blue ("b")
circular point ("o") markers.

d) capsize is the size of the lower and upper caps of each error bar.

e) ecolor is the color of the error bars. By default, it is the marker color.
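A sketch that uses the parameters above on several points, assuming hypothetical measurements with a 10% error on each. Here fmt='bo' already means "blue circles, no connecting line", so no separate linestyle argument is needed.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
y = np.array([20, 24, 19, 27, 30])   # hypothetical measurements
y_error = 0.10 * y                    # a different 10% error for each point

container = plt.errorbar(
    x, y,
    yerr=y_error,    # one error value per point
    fmt="bo",        # blue circle markers, no connecting line
    ecolor="gray",   # error bar color (defaults to the marker color)
    capsize=3,       # size of the caps at both ends of each bar
)
plt.show()
```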

Density and contour plots


• It is useful to display three-dimensional data in two dimensions using contours or color-
coded regions. Three Matplotlib functions are used for this purpose. They are :
a) plt.contour for contour plots,

b) plt.contourf for filled contour plots,

c) plt.imshow for showing images.

• A contour line or isoline of a function of two variables is a curve along which the function
has a constant value. It is a cross-section of the three-dimensional graph of the function f(x,
y) parallel to the x, y plane.

• Contour lines are used e.g. in geography and meteorology. In cartography, a contour line
joins points of equal height above a given level, such as mean sea level.

Example:
import numpy as np
import matplotlib.pyplot as plt

xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)    # 2-D grids of x and y coordinates
Z = np.sqrt(X**2 + Y**2)            # example function (assumed): distance from the origin
plt.figure()
cp = plt.contour(X, Y, Z, colors='black', linestyles='dashed')
plt.clabel(cp, inline=True, fontsize=10)   # label each contour line with its value
plt.title('Contour Plot')
plt.xlabel('x (cm)')
plt.ylabel('y (cm)')
plt.show()


When creating a contour plot, we can also specify the color map.
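As a sketch of the color map option and of the other two functions listed earlier, the example below draws the same function as filled contours (the Axes method equivalent to plt.contourf) and as an image (equivalent to plt.imshow), both with the 'viridis' color map. The function, a distance from the origin, is an assumed example.

```python
import matplotlib.pyplot as plt
import numpy as np

xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)          # assumed example: distance from the origin

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Filled contours with 20 levels, colored by the 'viridis' color map
cf = ax1.contourf(X, Y, Z, levels=20, cmap="viridis")
fig.colorbar(cf, ax=ax1)
ax1.set_title("contourf")

# The same grid shown as an image; extent and origin align it with x and y
im = ax2.imshow(Z, extent=[-3, 3, -3, 3], origin="lower", cmap="viridis")
fig.colorbar(im, ax=ax2)
ax2.set_title("imshow")

plt.show()
```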

Histograms
In a histogram, the data are grouped into ranges (e.g. 10 - 19, 20 - 29) and then plotted as
adjacent bars. Each bar represents one range of data: its width corresponds to the width of
that range, and its height is proportional to the number (or percentage) of values that fall
in it.

It provides a visual interpretation of numerical data by showing the number of data points
that fall within each specified range of values, called a "bin".
Histograms can summarize a large amount of data and show how often values occur. A
histogram gives a sense of the center (for example, the median) and the shape of the
distribution. In addition, it can show outliers or gaps in the data.

• Matplotlib provides a dedicated function to compute and display histograms: plt.hist()

• Code for creating a histogram with randomized data:
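A minimal sketch, assuming normally distributed random values generated with NumPy (seeded so the result is repeatable):

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)                # seeded random generator
data = rng.normal(loc=0, scale=1, size=1000)   # 1000 normally distributed values

# plt.hist returns the bin counts, the bin edges, and the drawn bar patches
counts, edges, patches = plt.hist(data, bins=30, color="steelblue",
                                  edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram of random data")
plt.show()
```

With bins=30 there are 31 bin edges, and because the bins span the full data range the counts add up to all 1000 values.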


Parameters: This method accepts the following parameters, described below:
• x : The input data, either a single array or a sequence of arrays.
• bins : Optional. An integer (number of bins), a sequence (bin edges), or a string
naming a binning strategy.
• range : Optional. The lower and upper range of the bins.
• bottom : The location of the bottom baseline of each bin.
• histtype : Optional. The type of histogram to draw: {'bar', 'barstacked', 'step',
'stepfilled'}.
• color : Optional. A color spec or sequence of color specs, one per dataset.
• label : Optional. A string, or sequence of strings to match multiple datasets.
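The sketch below tries several of these parameters at once on two hypothetical random datasets: passing a list of arrays as x draws one histogram per dataset, and color and label then take one entry per dataset.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(0, 1, 500)    # two hypothetical datasets
b = rng.normal(2, 1, 500)

counts, edges, patches = plt.hist(
    [a, b],                            # x: a sequence of datasets
    bins=20,                           # number of bins
    range=(-4, 6),                     # lower and upper limits of the bins
    histtype="step",                   # outline only; try 'bar', 'barstacked', 'stepfilled'
    color=["tab:blue", "tab:orange"],  # one color per dataset
    label=["a", "b"],                  # one label per dataset
)
plt.legend()
plt.show()
```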

Customizing plot legend


Plot legends give meaning to a visualization by labelling the various plot
elements. We previously saw how to create a simple legend; here we'll take a look at
customizing the placement and aesthetics of the legend in Matplotlib.

The simplest legend can be created with the plt.legend() command, which automatically
creates a legend for any labeled plot elements.

There are many ways we might want to customize such a legend. For example, we can specify
the location and turn off the frame.

We can also use a rounded box (fancybox), add a shadow, change the transparency (alpha
value) of the frame, or change the padding around the text.
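The three kinds of legend described above can be sketched side by side. The keyword arguments used (loc, frameon, fancybox, shadow, framealpha, borderpad) are standard parameters of Matplotlib's legend(); the sine/cosine data is only for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 1000)
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3.5))

for ax in (ax1, ax2, ax3):
    ax.plot(x, np.sin(x), "-b", label="Sine")
    ax.plot(x, np.cos(x), "--r", label="Cosine")

# Simplest legend: built automatically from the label= arguments
ax1.legend()

# Custom location, with the frame turned off
ax2.legend(loc="upper left", frameon=False)

# Rounded box, shadow, fully opaque frame and extra padding around the text
leg3 = ax3.legend(fancybox=True, shadow=True, framealpha=1, borderpad=1)

plt.show()
```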

Multiple subplots

Subplots are groups of axes that can exist in a single matplotlib figure. The subplots()
function in the matplotlib library helps in creating multiple layouts of subplots. It provides
control over all the individual plots that are created.
subplots() without arguments returns a Figure and a single Axes. This is actually the simplest
and recommended way of creating a single Figure and Axes.

There are (at least) three different ways to create plots (called axes) in matplotlib. They
are: plt.axes(), fig.add_axes() and plt.subplots().

• plt.axes(): The most basic method of creating an axes is to use the plt.axes function. It takes
an optional argument giving a rectangle in the figure coordinate system. These four numbers
represent [left, bottom, width, height] in the figure coordinate system, which ranges from 0
at the bottom left of the figure to 1 at the top right of the figure.

• fig.add_axes(): The equivalent of this command within the object-oriented interface
is fig.add_axes().

• By calling plt.subplot(n, m, k), we subdivide the figure into n rows and m columns and
specify that plotting should be done on subplot number k. Subplots are numbered row by row,
from left to right, starting at 1.
• plt.subplots: The Whole Grid in One Go

The approach just described can become quite tedious when creating a large grid of subplots,
especially if you'd like to hide the x- and y-axis labels on the inner plots. For this purpose,
plt.subplots() is the easier tool to use (note the s at the end of subplots). Rather than creating
a single subplot, this function creates a full grid of subplots in a single line, returning them in
a NumPy array. The arguments are the number of rows and number of columns, along with
optional keywords sharex and sharey, which allow you to specify the relationships between
different axes.
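The sketch below shows the ways of creating axes described in this section, one figure each. The rectangle values and the 2 x 3 grid size are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# 1) plt.axes: a rectangle [left, bottom, width, height] in figure coordinates
fig1 = plt.figure()
ax_main = plt.axes()                          # default axes on the current figure
ax_inset = plt.axes([0.65, 0.65, 0.2, 0.2])   # small inset near the top right
ax_main.plot(x, np.sin(x))
ax_inset.plot(x, np.cos(x))

# 2) fig.add_axes: the object-oriented equivalent
fig2 = plt.figure()
ax_top = fig2.add_axes([0.1, 0.5, 0.8, 0.4])
ax_bottom = fig2.add_axes([0.1, 0.1, 0.8, 0.4])
ax_top.plot(x, np.sin(x))
ax_bottom.plot(x, np.cos(x))

# 3) plt.subplot(nrows, ncols, index): index counts row by row, starting at 1
fig3 = plt.figure()
for i in range(1, 7):
    plt.subplot(2, 3, i)
    plt.text(0.5, 0.5, str(i), ha="center", va="center", fontsize=16)

# 4) plt.subplots: the whole grid at once, returned as a NumPy array of Axes;
#    sharex='col' / sharey='row' share axes within each column / row
fig4, axes = plt.subplots(2, 3, sharex="col", sharey="row")
for row in range(2):
    for col in range(3):
        axes[row, col].text(0.5, 0.5, str((row, col)), ha="center")

plt.show()
```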

Text and Annotations


When drawing large and complex plots in Matplotlib, we need a way of labelling certain
portions or points of interest on the graph. To do so, Matplotlib provides the
"Annotation" feature, which allows us to plot arrows and text labels on the graphs to give
them more meaning.

There are four important parameters commonly used with annotate(); only text and xy are
required.

a) text: This defines the text label. Takes a string as a value.

b) xy: The place where you want your arrowhead to point to. In other words, the place you
want to annotate. This is a tuple containing two values, x and y.

c) xytext: The coordinates where you want the text to be displayed. If omitted, the text is
placed at xy.

d) arrowprops: A dictionary of key-value pairs which define various properties for the arrow,
such as color, size and arrowhead type. If omitted, no arrow is drawn.

Example :

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.arange(0.0, 5.0, 0.01)
y = np.sin(2 * np.pi * x)
# Annotation: arrow pointing at (3.3, 1), label placed at (3, 1.8)
ax.annotate('Local Max',
            xy=(3.3, 1),
            xytext=(3, 1.8),
            arrowprops=dict(facecolor='green', shrink=0.05))
ax.set_ylim(-2, 2)
ax.plot(x, y)
plt.show()

Text can also be placed without an arrow using the plt.text/ax.text command, which places
text at a particular x/y position:
Example :

import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.arange(0.0, 5.0, 0.01)
y = np.sin(2 * np.pi * x)
# Text: place the label at data coordinates (3.3, 1)
ax.text(3.3, 1, 'Local Max')
ax.set_ylim(-2, 2)
ax.plot(x, y)
plt.show()
Three-Dimensional plotting in Matplotlib
• Matplotlib is one of the most popular choices for data visualization in Python. While
initially developed for plotting 2-D charts like histograms, bar charts, scatter plots, line
plots, etc., Matplotlib has extended its capabilities to offer 3D plotting modules as well.

• First import the library :

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

• The first one is the standard import statement for plotting with matplotlib, which you would
also see for 2D plotting. The second import, of the Axes3D class, enables 3D projections in
older versions; it is not used anywhere else. Since Matplotlib 3.2, projection='3d' works even
without this import.

• Create figure and axes

fig = plt.figure(figsize=(4,4))

ax = fig.add_subplot(111, projection='3d')
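Continuing from this figure-and-axes setup, the sketch below draws a 3D line, a 3D scatter plot and a 3D surface. It assumes Matplotlib 3.2 or later, where projection='3d' works without the Axes3D import; on older versions, keep that import. The helix and the surface function are arbitrary examples.

```python
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(8, 4))

# 3D line and scatter plot along a helix
ax1 = fig.add_subplot(121, projection="3d")
t = np.linspace(0, 4 * np.pi, 200)
ax1.plot(np.cos(t), np.sin(t), t, color="gray")
ax1.scatter(np.cos(t[::10]), np.sin(t[::10]), t[::10],
            c=t[::10], cmap="viridis")
ax1.set_title("3D line and scatter")

# 3D surface of z = sin(sqrt(x^2 + y^2)) over a grid
ax2 = fig.add_subplot(122, projection="3d")
x = np.linspace(-5, 5, 50)
y = np.linspace(-5, 5, 50)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))
ax2.plot_surface(X, Y, Z, cmap="viridis")
ax2.set_xlabel("x")
ax2.set_ylabel("y")
ax2.set_zlabel("z")
ax2.set_title("3D surface")

plt.show()
```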

Geographic map projections

The purpose of plotting geographic maps in data visualization is to spatially represent and
analyze data across geographical locations. Geographic maps provide a visual and intuitive
way to explore patterns, trends, and relationships related to location-based data. Here are
several key purposes for plotting geographic maps:
• Geographic maps allow for the visualization of how data is distributed across different
regions or locations. Example: Analyzing the spatial distribution of population density,
income levels, or disease outbreaks in different areas.
• Maps help in identifying spatial patterns and trends that may not be immediately
apparent in tabular data. Example: Visualizing the migration patterns of wildlife,
identifying hotspots of criminal activity, or observing regional variations in climate.
• Maps facilitate the exploration of relationships between different geographic entities.
Example: Understanding trade relationships between countries, exploring
transportation networks, or analyzing connections between different landmarks.
• Geographic maps are useful for analyzing demographic data in different regions.
Example: Visualizing age distribution, educational attainment, or cultural diversity across
cities or countries.
• Maps help identify spatial correlation and clustering of data points. Example: Mapping
the distribution of customer locations to identify potential markets, or visualizing
clusters of disease cases for epidemiological studies.

• Basemap is a toolkit under the Python visualization library Matplotlib. Its main function is
to draw 2D maps, which are important for visualizing spatial data. Basemap itself does not
do any plotting, but provides the ability to transform coordinates into one of 25 different
map projections.

• Matplotlib can also be used to plot contours, images, vectors, lines or points in transformed
coordinates. Basemap includes the GSHHS coastline dataset, as well as datasets from GMT for
rivers, states and national boundaries.

• These datasets can be used to plot coastlines, rivers and political boundaries on a map at
several different resolutions. Basemap uses the Geometry Engine - Open Source (GEOS)
library under the hood to clip coastline and boundary features to the desired map projection
area. In addition, basemap provides the ability to read shapefiles.

• Example objects in basemap:

a) contour(): Draw contour lines.

b) contourf(): Draw filled contours.

c) imshow(): Draw an image.

d) pcolor(): Draw a pseudocolor plot.

e) pcolormesh(): Draw a pseudocolor plot (faster version for regular meshes).

f) plot(): Draw lines and/or markers.

g) scatter(): Draw points with markers.


h) quiver(): Draw vectors (a vector field map).

i) barbs(): Draw wind barbs (a wind barb map).

j) drawgreatcircle(): Draw a great circle (draws a great circle route)

• For example, if we wanted to show all the different types of endangered plants within a
region, we would use a base map showing roads, provincial and state boundaries, waterways
and elevation. Onto this base map, we could add layers that show the location of different
categories of endangered plants. One added layer could be trees, another layer could be
mosses and lichens, another layer could be grasses.

Basemap basic usage:

from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt

m = Basemap()          # default map: whole globe, cylindrical equidistant projection
m.drawcoastlines()     # draw the coastline dataset bundled with Basemap
plt.show()

The most useful piece of the Basemap toolkit is the ability to over-plot a variety of data onto
a map background. For simple plotting and text, any plt function works on the map; you can
use the Basemap instance to project latitude and longitude coordinates to (x, y) coordinates
for plotting with plt.

Seaborn plots
• Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics. Seaborn is an open-
source Python library.

• Seaborn helps you explore and understand your data. Its plotting functions operate on
dataframes and arrays containing whole datasets and internally perform the necessary
semantic mapping and statistical aggregation to produce informative plots.

• Key features:
a) Seaborn is a statistical plotting library.

b) It has attractive default styles.

c) It is also designed to work very well with Pandas DataFrame objects.

Seaborn works easily with dataframes and the Pandas library. The graphs created can also be
customized easily.

• Functionality that seaborn offers:

a) A dataset-oriented API for examining relationships between multiple variables

b) Convenient views onto the overall structure of complex datasets

c) Specialized support for using categorical variables to show observations or aggregate
statistics

d) Options for visualizing univariate or bivariate distributions and for comparing them
between subsets of data

e) Automatic estimation and plotting of linear regression models for different kinds of
dependent variables

f) High-level abstractions for structuring multi-plot grids that let you easily build complex
visualizations

g) Concise control over matplotlib figure styling with several built-in themes

h) Tools for choosing color palettes that faithfully reveal patterns in your data.

Some plots:
1.) Scatter plot
2.) Heatmap

3.) Pairplot
Matplotlib vs seaborn:

Limitations of Seaborn:

Focused on Statistics: While Seaborn is excellent for statistical data visualization, it may not
offer the same level of flexibility as Matplotlib for general-purpose plotting.

Learning Curve: Users familiar with Matplotlib might need some time to adjust to Seaborn's
conventions and functions, especially if they are used to more granular control over plot
customization.

Plotly plots

Plotly library in Python is an open-source library that can be used for data visualization and
understanding data simply and easily. Plotly supports various types of plots like line charts,
scatter plots, histograms, box plots, etc.
While Matplotlib has long been the foundation for static visualizations, Plotly steps onto
the stage with a focus on interactivity and modern web-based charts, offering a vibrant
alternative to traditional data plotting tools.

Key features:

1. Interactivity: Plotly excels in creating interactive visualizations, enabling user


interactions like hovering, clicking, zooming, and panning.
2. Web Integration: Plotly is designed for seamless integration into web applications,
blogs, and dashboards, making it ideal for online data sharing.
3. Versatile Charts: Plotly supports a wide range of chart types, from scatter plots to 3D
plots, allowing for diverse and engaging visualizations.
4. Easy Customization: Plotly offers extensive customization options for colours, fonts,
markers, and labels, making it suitable for users of all skill levels.
5. Documentation: Plotly's documentation is widely regarded as easier to navigate than Matplotlib's. Each plot type has its own page with worked examples, so it is easy to find information about the kind of plot you are trying to create, whereas beginners often need more time to find what they are looking for on Matplotlib's website.

Drawbacks:

1. Web Dependencies: While Plotly is excellent for web-based projects, it is less convenient for static, publication-quality images: exporting a figure to PNG, SVG, or PDF requires an additional image-export package such as Kaleido.

2. Performance with Large Datasets: Plotly may experience performance issues with
extremely large datasets or complex visualizations, potentially leading to slower
rendering times.

3. Dependency on Plotly's online services: The core open-source library renders charts entirely offline (the default since plotly version 4), but hosting and sharing charts online through Chart Studio needs the separate chart-studio package and a Plotly account, and Plotly's enterprise offerings are commercial. Users who prefer to work offline or in secure environments cannot use these hosted features.

Some graphs using plotly:
1.) Line graph
2.) Scatter plot
3.) 3D Scatter plot

Matplotlib vs Plotly

| Feature | Plotly | Matplotlib |
|---|---|---|
| Interactivity | Offers high interactivity with zooming, panning, and hover features out of the box. Well-suited for web-based applications. | Basic interactivity capabilities. May require additional effort to achieve high interactivity. |
| Ease of Use | Plotly Express provides a high-level interface with concise syntax for creating complex visualizations. | More granular control over plot elements, but often requires more code for complex visualizations. |
| Web Integration | Well-suited for web-based applications. Can be easily embedded in web pages and used to create interactive dashboards. | Primarily designed for static plotting. Integration into web applications may require additional steps. |
| Plot Types | Supports a wide range of plot types, including 3D plots, geographic maps, and various interactive charts. | Offers standard 2D plots and limited 3D capabilities. |
| Community Support | Has a growing community and is widely used in data science and web development. | Well-established, with a large and mature community. |

Exploring ggplot
ggplot2, a plotting package for the R language, is built on the philosophy of the Grammar of Graphics, a systematic approach to describing and building complex visualizations through a consistent and structured grammar.

Advantages of ggplot2:

1. Declarative Syntax: ggplot2 uses a declarative syntax, allowing users to express what
they want in their plot rather than specifying how to achieve it. This results in concise
and expressive code.
2. Layered Structure: Plots in ggplot2 are built layer by layer, making it easy to add
components such as points, lines, and annotations. This layering system contributes to
the flexibility and extensibility of the library.
3. Faceting: ggplot2 supports faceting, allowing users to create multiple plots based on
the levels of a categorical variable. This is useful for exploring how relationships differ
across subgroups.
4. Themes and Customization: The library provides a variety of themes to control the
overall appearance of plots. Additionally, users can customize almost every aspect of
a plot to match specific requirements.

Limitations of ggplot2:
• Learning Curve: For users new to the Grammar of Graphics philosophy, there might be a learning curve in understanding the different components and layers. However, once mastered, it can lead to more efficient and expressive code.

• Availability in R: ggplot2 is specific to the R programming language, so users working in other languages need a separate library. In Python, plotnine reimplements the same grammar with very similar syntax, but it is a different package and the R extensions built for ggplot2 do not work with it.
• Limited 3D Plotting: While ggplot2 excels at 2D plots, it has limited support for creating 3D plots compared to some other specialized libraries.

Comparison with Matplotlib (Python):

• Declarative vs. Procedural: ggplot2 follows a declarative approach, emphasizing what the plot should look like, while Matplotlib is more procedural, requiring users to specify each step in the plot creation process.
• Ease of Use: ggplot2's declarative syntax often leads to more concise and expressive code, making it easier to create complex visualizations with fewer lines of code. Matplotlib, while powerful, may require more code for certain tasks.
• Faceting: ggplot2 has faceting built in (facet_wrap() and facet_grid()). Matplotlib has no faceting layer of its own: facets are built by hand with subplots, or through Seaborn's FacetGrid, so ggplot2's approach is usually considered more intuitive and user-friendly.
• Customization: ggplot2 provides a high level of customization, and its themes make it easy to control the overall appearance of plots. Matplotlib offers fine-grained control over individual plot elements but may require more effort for customization.
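To make the declarative/procedural contrast concrete, here is a faceted scatter plot built step by step with Matplotlib: the code itself splits the data by category, creates the grid of axes, and labels every panel, work that ggplot2 does with a single faceting layer. The data and column names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data with a categorical "species" column
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "length": rng.normal(5, 1, 90),
    "width": rng.normal(3, 0.5, 90),
    "species": np.repeat(["a", "b", "c"], 30),
})

levels = sorted(df["species"].unique())

# Step 1: create one subplot per category level, sharing both axes
fig, axes = plt.subplots(1, len(levels), figsize=(9, 3), sharex=True, sharey=True)

# Step 2: filter, draw and label each panel explicitly
for ax, level in zip(axes, levels):
    subset = df[df["species"] == level]
    ax.scatter(subset["length"], subset["width"], s=12)
    ax.set_title(f"species = {level}")
    ax.set_xlabel("length")
axes[0].set_ylabel("width")

# Step 3: tidy the layout and save
fig.tight_layout()
fig.savefig("facets_matplotlib.png")
```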

Exploring PyViz
PyViz is not a single library but a collection of tools and libraries that work together to create a holistic visualization ecosystem (the project has since been renamed HoloViz). It includes libraries such as HoloViews, GeoViews, Panel, and others.
PyViz libraries often use a declarative syntax, allowing users to express what they want to
visualize rather than specifying how to create the visualization. This can lead to more concise
and expressive code.
PyViz emphasizes the creation of interactive dashboards with widgets for user interaction.
This is particularly useful for creating dynamic and responsive visualizations.

Advantages of PyViz

1. High-Level Abstraction: PyViz simplifies the process of creating complex visualizations by providing high-level abstractions and concise syntax. This can lead to more readable and maintainable code.
2. Interactivity: PyViz tools, especially those built on Bokeh, offer interactive capabilities
for exploring data. This includes zooming, panning, and interactive widgets for filtering
and exploring datasets.
3. Dashboard Creation: PyViz excels in creating dashboards that allow users to interact
with visualizations dynamically. This is valuable for presenting data in an engaging and
user-friendly manner.
4. Multi-Platform Support: PyViz tools can be used in various environments, including
Jupyter notebooks, standalone scripts, and web applications. This flexibility allows
users to choose the environment that best suits their needs.

Limitations of PyViz

1. Learning Curve: While PyViz aims to simplify the process of creating visualizations,
there might still be a learning curve for users new to the ecosystem, especially when
working with multiple libraries.
2. Community Size: The PyViz community, while growing, might not be as extensive as
some other visualization ecosystems. This can impact the availability of community-
generated resources and support.
3. Matplotlib Integration: While PyViz can integrate with Matplotlib, it might not provide
the same level of fine-grained control over plot elements as using Matplotlib directly.

Exploring Bokeh and Panel

Bokeh

Bokeh specializes in creating interactive and dynamic visualizations with features like
zooming, panning, and hovering over data points.
Bokeh is designed to be embedded in web applications, making it suitable for creating
interactive dashboards and web-based data visualizations.
Advantages:

1. Interactivity: Bokeh excels in providing interactivity, allowing users to create engaging and dynamic visualizations for exploration and presentation.
2. Web Integration: Well-suited for embedding plots into web applications, making it a
preferred choice for building interactive and web-based data dashboards.
3. Server Capability: Bokeh Server enables the creation of real-time data applications and
dashboards that can update dynamically as new data arrives.
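A minimal sketch of advantages 1 and 2: a scatter plot with pan/zoom tools and a hover tooltip, rendered into a standalone HTML page. The random data is hypothetical; `figure`, `ColumnDataSource`, `HoverTool`, `file_html` and `CDN` are real Bokeh names.

```python
import numpy as np
from bokeh.embed import file_html
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure
from bokeh.resources import CDN

# Hypothetical data held in a ColumnDataSource, Bokeh's column store for glyphs
rng = np.random.default_rng(6)
source = ColumnDataSource(data={"x": rng.uniform(0, 10, 40),
                                "y": rng.uniform(0, 10, 40)})

# Pan, zoom and reset tools provide the interactive behaviour
p = figure(title="Random points", width=500, height=350,
           tools="pan,wheel_zoom,box_zoom,reset")
p.scatter("x", "y", source=source, size=8, alpha=0.6)

# Hovering shows the x and y value of the point under the cursor
p.add_tools(HoverTool(tooltips=[("x", "@x{0.00}"), ("y", "@y{0.00}")]))

# Web integration: render the plot into a complete HTML document
html = file_html(p, CDN, "Bokeh example")
with open("bokeh_plot.html", "w", encoding="utf-8") as f:
    f.write(html)
```

For advantage 3, the same figure would instead be placed in a script run with `bokeh serve`, where a periodic callback can call `source.stream(...)` to append new points as they arrive; that part is not shown here.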

Limitations:

1. Learning Curve: Bokeh may have a learning curve, especially for users new to web-
based plotting libraries, as it involves understanding concepts like Bokeh models and
layouts.
2. Plot Customization: While Bokeh offers good customization options, achieving highly
specific plot configurations may require more effort compared to more granular
libraries.
Panel
Panel is a high-level app and dashboarding framework built on top of Bokeh, making it easy
to create complex dashboards and applications.
Panel provides a component-based layout system that allows users to build dashboards using
a combination of charts, widgets, and custom HTML components.
Advantages:

1. Rapid Dashboard Prototyping: Panel is designed for rapid prototyping of dashboards and applications, making it easy for users to quickly build and iterate on interactive displays.
2. Integration with Other Libraries: Panel seamlessly integrates with other data science
libraries like Pandas, Matplotlib, and Plotly, allowing for a flexible and versatile
approach to building dashboards.

Limitations:

1. Dependency on Bokeh: As a high-level framework built on Bokeh, Panel inherits some of Bokeh's limitations. Users looking for more granular control may need to work directly with Bokeh.
2. Learning Curve: While Panel simplifies the creation of dashboards, users may still need
to understand Bokeh concepts for more advanced customizations and interactions.

Exploring Yellowbrick

Yellowbrick is a Python library specifically designed for visual model evaluation. It provides a
set of visualizers to help assess the performance and behavior of machine learning models.
Yellowbrick seamlessly integrates with the scikit-learn library, enhancing the model
evaluation process with visualizations that complement scikit-learn's functionality.
Yellowbrick offers a diverse collection of visualizers for different aspects of model evaluation,
including classification, regression, clustering, feature analysis, and text visualization.

Advantages of Yellowbrick

1. Intuitive Visualizations: Yellowbrick simplifies the process of model evaluation by providing intuitive visualizations that make it easier to interpret and communicate complex information about model performance.
2. Seamless Integration: Works seamlessly with scikit-learn, allowing users to
incorporate visualizations into their existing machine learning workflows without
significant modifications.
3. Comprehensive Coverage: Offers a comprehensive suite of visualizers covering various
aspects of model evaluation, enabling users to explore different facets of their models.
4. User-Friendly API: Yellowbrick's API is designed to be user-friendly, making it
accessible to both beginners and experienced machine learning practitioners. The
library emphasizes simplicity and ease of use.

Limitations of Yellowbrick
1. Domain-Specific: Yellowbrick is primarily focused on visualizations for model
evaluation. Users seeking more general-purpose data visualization or plotting
capabilities may need to complement it with other libraries like Matplotlib or Seaborn.
2. Learning Curve: While Yellowbrick aims to simplify the process of model evaluation,
users who are not familiar with the underlying concepts of machine learning evaluation
metrics may still require some learning to interpret the visualizations effectively.
3. Availability of Visualizers: Some specific models or tasks may not have dedicated
visualizers within the Yellowbrick library. Users might need to explore additional tools
or create custom visualizations for specific requirements.
