Data Visualization

Uploaded by

sr5824241
Understanding Data Visualization

Data visualization translates complex data sets into visual formats that are easier
for the human brain to understand. This can include a variety of visual tools such
as:
●​ Charts: Bar charts, line charts, pie charts, etc.
●​ Graphs: Scatter plots, histograms, etc.
●​ Maps: Geographic maps, heat maps, etc.
●​ Dashboards: Interactive platforms that combine multiple visualizations.

UNIT 1 INTRODUCTION TO AI​ ​ ​ ​ ​ ​ NMK


The primary goal of data visualization is to make data more accessible and easier
to interpret, allowing users to identify patterns, trends, and outliers quickly. This is
particularly important in big data, where the sheer volume of information can be
confusing without effective visualization techniques.

Data visualization is extremely useful for understanding data and obtaining
insights. It gives an at-a-glance understanding that is just not possible when
scanning rows of data in a table. That’s what makes it so important in Data
Science!

Data visualization is a critical aspect of data science because it allows data
scientists to present complex data in an easily understandable and digestible
format. Here are a few reasons why data visualization is so important in data
science:

● Helps with data exploration: Data visualization enables data scientists to
explore the data and identify patterns, trends, and outliers quickly. By
visualizing data, one can identify relationships between variables that
may not be apparent when looking at data in a tabular format.
● Communicates insights: Data visualization helps data scientists
communicate insights to stakeholders in an easy-to-understand format.
A well-designed visualization can help non-technical stakeholders
understand complex data and make informed decisions based on that
data.
● Aids in decision-making: Data visualization is critical to decision-making
because it enables stakeholders to see trends and patterns that may not
be immediately apparent in a raw data set. By presenting data in a visual
format, data scientists can help stakeholders make informed decisions
quickly.
● Facilitates storytelling: Data visualization is an essential component of
data storytelling. By creating compelling visualizations, data scientists
can create a narrative around the data that resonates with stakeholders.

Tools Used in Data Visualization
Some of the tools most widely used in industry today:
● Tableau
● Jupyter
● Plotly
● Power BI
● QlikView
❖ Data Types in Data Visualization
In data visualization, data types play a crucial role in determining how data should
be represented visually. Understanding the different types of data helps in
choosing the right visualization technique. Below are the primary data types used
in data visualization:

1. Categorical Data (Qualitative Data):


● Definition: Data that represents categories or groups. These values are
distinct labels rather than measured quantities; they may be unordered
(nominal) or ordered (ordinal).
●​ Examples: Colors, Names, Brands, Gender, Countries.
●​ Visualizations: Bar charts, pie charts, stacked bar charts, and dot plots.
Subtypes:
●​ Nominal: Categories without a specific order (e.g., countries, colors).
●​ Ordinal: Categories with a meaningful order but no consistent difference
between them (e.g., ratings like "poor", "fair", "good").

2. Numerical Data (Quantitative Data):


●​ Definition: Data that represents quantities and can be measured
numerically. It can be continuous or discrete.
●​ Examples: Age, Temperature, Income, Height, Sales figures.
●​ Visualizations: Line charts, histograms, scatter plots, area charts, box plots.
Subtypes:
●​ Discrete Data: Data that can only take distinct, separate values (e.g.,
number of people, count of cars).
●​ Continuous Data: Data that can take any value within a range and can be
measured with greater precision (e.g., weight, temperature).

3. Time-Series Data:
●​ Definition: Data points indexed or arranged in time order. Typically used for
data that is collected over time.
●​ Examples: Stock prices, weather data, sales over months.
●​ Visualizations: Line charts, area charts, time-series plots.

4. Geospatial Data:
●​ Definition: Data that is related to geographic locations. Often, this type of
data includes longitude, latitude, and other spatial attributes.
●​ Examples: Population density, sales by region, weather patterns.
●​ Visualizations: Maps (choropleth maps, scatter plots on maps, heatmaps).

5. Textual Data:
●​ Definition: Data in the form of text, often used to extract sentiment, topics,
or relationships from unstructured sources.
●​ Examples: Customer reviews, social media posts, survey responses.
●​ Visualizations: Word clouds, text frequency plots, network diagrams.

6. Boolean Data:
●​ Definition: Data that represents binary outcomes (true/false, yes/no, 1/0).
●​ Examples: A customer’s subscription status, presence or absence of a
feature.
●​ Visualizations: Bar charts, pie charts, heatmaps.

7. Mixed Data:
●​ Definition: Datasets that include a mix of both categorical and numerical
data.
●​ Examples: A dataset with "age" (numerical) and "gender" (categorical).
●​ Visualizations: Scatter plots, box plots, stacked bar charts.
Understanding these data types helps in selecting the appropriate type of chart or
graph, ensuring clarity and meaningful insights from the data visualization.
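As a rough illustration of how the data types above drive chart selection, the sketch below guesses a column's type from its Python values and looks up the chart suggestions listed in this section. The helper name `infer_data_type`, the type labels, and the `SUGGESTED_CHARTS` table are assumptions made for this example, not part of any library.

```python
from datetime import date, datetime

# Chart suggestions taken from the lists in this section (labels are illustrative).
NUMERICAL_CHARTS = ["line chart", "histogram", "scatter plot", "area chart", "box plot"]
SUGGESTED_CHARTS = {
    "boolean": ["bar chart", "pie chart", "heatmap"],
    "time-series": ["line chart", "area chart", "time-series plot"],
    "discrete numerical": NUMERICAL_CHARTS,
    "continuous numerical": NUMERICAL_CHARTS,
    "categorical": ["bar chart", "pie chart", "stacked bar chart", "dot plot"],
}


def infer_data_type(values):
    """Guess the data type of a column from its values (hypothetical helper)."""
    if not values:
        raise ValueError("cannot infer a type from an empty column")
    # bool is checked first because Python treats True/False as integers.
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, (date, datetime)) for v in values):
        return "time-series"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete numerical"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous numerical"
    # Anything else (for example strings) is treated as category labels.
    return "categorical"


column = [23.5, 19.0, 30.2]
kind = infer_data_type(column)
print(kind, "->", SUGGESTED_CHARTS[kind])
```

A real dataset needs more care than this (for example, integer codes that are really categories), so the result is a starting point, not a rule.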

❖ Data Visualization Scales
In data visualization, scales are used to map data values to visual aesthetics like
position, color, size, shape, or opacity.
The way scales are applied depends on the type of data and the type of visual
representation you're using.
The goal of using scales is to represent data accurately and meaningfully while
making the visualization intuitive and easy to interpret.

Types of Scales in Data Visualization


There are primarily four types of scales used in data visualization, each
corresponding to different types of data:

1. Nominal (Categorical) Scale:


●​ Purpose: Used for categorical or qualitative data where the values
represent distinct groups or categories.
●​ Mapping Aesthetics: Categories are mapped to colors, shapes, or other
discrete visual features.
●​ Examples:
○​ Mapping categories like "Region" to different colors in a map or bar
chart.
○​ Mapping "Gender" to different shapes (e.g., circle for males, square
for females) in a scatter plot.
●​ Visual Example: A pie chart showing the percentage of sales for different
product categories, where each category has a unique color.

2. Ordinal Scale:
●​ Purpose: Used for ordered categories (where values have a natural order
but no consistent interval).
●​ Mapping Aesthetics: Ordinal values are mapped to discrete colors, sizes, or
shapes, with the order reflected in the visual hierarchy.
●​ Examples:
○​ Mapping survey responses ("Very Unsatisfied" to "Very Satisfied") to
different shades of color or varying sizes in a bar chart.
○​ Ranking products based on customer satisfaction in a bar plot, where
colors represent the satisfaction levels.
●​ Visual Example: A bar chart showing satisfaction levels, with darker shades
representing higher satisfaction.

3. Quantitative (Continuous) Scale:
●​ Purpose: Used for continuous or numerical data where values have a
meaningful order and are measured on a numeric scale.
●​ Mapping Aesthetics: Numeric values are mapped to continuous gradients
of color, size, or position. This scale is used for continuous variables like age,
income, or temperature.
●​ Examples:
○​ Mapping the temperature to a gradient from blue (cold) to red (hot)
on a heatmap.
○​ Mapping the value of "Income" to the size of a point in a scatter plot,
where higher income values result in larger points.
●​ Visual Example: A line chart showing temperature trends over time, with
the color of the line changing according to temperature values.

4. Time Scale:
●​ Purpose: Used for data points that are indexed by time, such as dates or
timestamps.
●​ Mapping Aesthetics: Time data is usually mapped to the x-axis in plots, with
values corresponding to specific time intervals (days, months, years).
●​ Examples:
○​ Plotting stock prices over time, where each point on the x-axis
represents a specific date.
○​ Mapping event occurrences to the timeline of a Gantt chart.
●​ Visual Example: A time-series plot showing the growth of a company’s
revenue over several years.
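A scale is, in essence, a function from data values to a visual property. The sketch below shows one possible way to write a quantitative (linear) scale and an ordinal scale in plain Python. The function names and the example numbers (a 0 to 40 degree domain, a 400-pixel axis, opacity shades) are assumptions chosen for illustration.

```python
def linear_scale(domain, output_range):
    """Return a function mapping a numeric domain onto an output range linearly."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span more than one value")

    def scale(value):
        # Fraction of the way through the domain, applied to the output range.
        t = (value - d0) / (d1 - d0)
        return r0 + t * (r1 - r0)

    return scale


def ordinal_scale(categories, outputs):
    """Return a function mapping ordered categories to ordered visual values."""
    mapping = dict(zip(categories, outputs))

    def scale(category):
        return mapping[category]

    return scale


# Quantitative scale: temperature (0 to 40 degrees) to an x position (0 to 400 px).
temperature_to_x = linear_scale((0, 40), (0, 400))
print(temperature_to_x(10))

# Ordinal scale: ratings to progressively darker shades (expressed as opacity).
rating_to_shade = ordinal_scale(["poor", "fair", "good"], [0.3, 0.6, 0.9])
print(rating_to_shade("fair"))
```

A time scale works the same way as the linear scale once dates are converted to numbers, for example timestamps.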

Key Aesthetic Elements Mapped Using Scales


●​ Color: Colors can represent different categories (nominal), represent
gradients (quantitative), or show progression in a sequence (ordinal).
○​ Color Scales: Categorical colors for discrete groups, continuous
gradients for numerical data, or diverging colors to show
positive/negative values.
●​ Size: Numerical values are mapped to the size of visual elements like points,
bars, or bubbles.
○​ Size Scales: Used in scatter plots, bubble charts, or bar charts to show
relationships between data points.
●​ Shape: Different shapes can be used to represent different categories.
○​ Shape Scales: Used in scatter plots or symbol maps, where different
groups are represented by different shapes.
●​ Position: The position of data points on axes (x, y) typically represents
continuous or ordered data values.
○​ Position Scales: Commonly used in scatter plots, bar charts, and line
charts.
●​ Opacity: Opacity or transparency can be used to reflect density or the
magnitude of data values.
○​ Opacity Scales: Often used in heatmaps, where areas with higher
data values appear more opaque.

Examples of Applying Scales in Visualization


●​ Bar Chart (Nominal Scale): Categories (e.g., countries) on the x-axis, with
the height of each bar representing a numerical value (e.g., population
size), and each country could have a unique color.
●​ Heatmap (Quantitative Scale): A heatmap where color intensity represents
the magnitude of values, such as higher sales values being darker or
warmer colors (red), and lower sales being lighter or cooler colors (blue).
● Scatter Plot (Quantitative and Nominal Scales): A scatter plot where
numerical values like "price" and "quantity" are mapped to axes, and
"product category" is mapped to color or shape (nominal scale). An ordered
variable, such as a rating tier, could instead be mapped to size (ordinal
scale).

Conclusion
Scales in data visualization are essential for effectively mapping raw data to visual
elements that are easy to understand.
The choice of scale depends on the type of data being represented (categorical,
ordinal, continuous, or time) and the type of aesthetic being used (color, size,
position, etc.).
The goal is to make the data both informative and intuitive, allowing the audience
to quickly grasp insights from the visualization.

The coordinate system plays a crucial role in data visualization by providing a
structured way to represent data in a two- or three-dimensional space. It allows
the mapping of data points to specific locations on a graph, which helps viewers
interpret the data in a clear and meaningful way.

❖​Types of Coordinate Systems Used in Data Visualization
1.​ Cartesian Coordinate System:
○​ Description: The most common coordinate system used in data
visualization, consisting of two axes—an x-axis (horizontal) and a
y-axis (vertical). It is used for plotting data in two dimensions.
○ Use: This system is ideal for plotting numerical data in scatter plots, line
charts, bar charts, and histograms.
○​ Applications:
■​ Scatter plots: Where each point is plotted based on two
variables (e.g., x-axis for time and y-axis for sales).
■​ Line charts: Where data points are connected to show trends
over time or other continuous variables.
○​ Example: A line chart showing stock prices over time, where time is
plotted along the x-axis and price is plotted along the y-axis.

2. Polar Coordinate System:
○ Description: A coordinate system that uses a radial distance from a
central point (origin) and an angle to position data points. This system
is commonly used for visualizing data with cyclical patterns, such as
angular data (e.g., directions, time of day) or circular distributions.
○ Use: Used in polar plots, radar charts, and windrose diagrams.
○ Applications:
■ Radar charts: Used to display multi-dimensional data in a way
that highlights the relationships between several variables
(e.g., performance metrics for different departments of a
company).
■ Windrose diagrams: Used to visualize wind direction and
speed.
○ Example: A radar chart showing performance of a car in different
categories like speed, fuel efficiency, and safety, with each axis
representing a different category.

3. Logarithmic Coordinate System:


○​ Description: A variant of the Cartesian coordinate system, where one
or both axes follow a logarithmic scale instead of a linear one. This is
useful when data spans several orders of magnitude.
○​ Use: Often used to represent data that grows exponentially or has a
large variance (e.g., population growth, financial data, scientific
measurements).
○ Applications:
■ Logarithmic line charts: Used to represent financial data like
stock prices, or scientific data like earthquake energy release.
■ Exponential growth and decay patterns: Representing data such
as viral infections, compound interest, or decaying radioactive
substances.
○ Example: A line chart showing the growth of a viral infection over
time, where the y-axis (cases) uses a logarithmic scale and the x-axis
(time) stays linear, so exponential growth appears as a straight line.
4. 3D Coordinate System:
○​ Description: A coordinate system that includes three axes: x, y, and z.
It is used to plot data in three dimensions, providing a more detailed
view of complex data relationships.
○​ Use: Often used in scientific, engineering, and geospatial data
visualization, where there are multiple variables.
○​ Applications:
■​ 3D scatter plots: Used for visualizing datasets with three
variables.
■​ Surface plots: Used for visualizing 3D functions, such as
geographic data or scientific models.
■​ Geospatial data visualization: Mapping geographic coordinates
(latitude, longitude, and elevation).
○​ Example: A 3D scatter plot of scientific data, such as the relationship
between temperature, pressure, and volume in a gas system.
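To make the logarithmic coordinate system described above concrete, here is a minimal sketch of a base-10 log axis. It positions values by their logarithm, so each power of ten takes up the same distance. The function name `log_scale` and the 300-pixel axis length are assumptions for this example.

```python
import math


def log_scale(domain, output_range):
    """Return a function placing positive values on an axis by their base-10 log."""
    d0, d1 = domain
    if d0 <= 0 or d1 <= 0 or d0 == d1:
        raise ValueError("log scale needs two distinct positive domain bounds")
    r0, r1 = output_range
    log_d0, log_d1 = math.log10(d0), math.log10(d1)

    def scale(value):
        if value <= 0:
            raise ValueError("log scale cannot place zero or negative values")
        # Fraction of the way through the domain, measured in log units.
        t = (math.log10(value) - log_d0) / (log_d1 - log_d0)
        return r0 + t * (r1 - r0)

    return scale


# Case counts from 1 to 10,000 mapped onto a 300-pixel axis.
cases_to_y = log_scale((1, 10_000), (0, 300))
for cases in (1, 10, 100, 1_000, 10_000):
    print(cases, round(cases_to_y(cases), 1))
```

Each tenfold increase moves the point the same distance along the axis, which is why data spanning several orders of magnitude stays readable.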

Coordinate System in Data Visualization – Key Uses
1.​ Positioning Data Points:
○​ The most obvious use of a coordinate system is to position individual
data points. For example, in a scatter plot, each point's position
corresponds to its values on the x and y axes, allowing the user to
visualize the relationship between two variables.
2.​ Creating Axes and Gridlines:
○​ Coordinate systems define where the axes are placed (typically with a
zero or origin point) and whether the data values are spaced
consistently (linear or logarithmic). Gridlines often help the viewer
visually align data points with their corresponding values.
3.​ Facilitating Trend Analysis:
○​ The Cartesian system helps visualize trends, such as how one variable
changes over time or in relation to another. For instance, a time
series plot uses a coordinate system to show how a variable (like
stock price) changes over time.
4.​ Supporting Complex Data:
○ In the case of multidimensional data, a 3D coordinate system or polar
coordinate system can be used to show more than two variables at
once. This supports a better understanding of multi-variable
relationships in fields such as data science, engineering, and geography.
5. Facilitating Comparisons:
○​ Coordinate systems also allow for comparisons between different
data series. For example, you can use different markers (shapes or
colors) in a scatter plot to represent different groups or categories of
data, aiding in comparative analysis.
6.​ Enabling Geospatial Mapping:
○​ In geographic visualizations, the coordinate system is based on
geographic coordinates (latitude and longitude). This enables the
visualization of geospatial data, such as mapping the spread of
disease or the distribution of resources across a country.

Examples of Coordinate Systems in Different Types of Visualizations


●​ Bar Charts (Cartesian system): Data points are placed along the x-axis
(categories) and y-axis (values), making it easy to compare discrete
categories.
●​ Scatter Plots (Cartesian system): Data points are plotted using a Cartesian
coordinate system to show the relationship between two continuous
variables (e.g., height vs. weight).
●​ Radar Charts (Polar system): Each axis represents a different variable, and
the data points are plotted based on their angular position and distance
from the center.
●​ Geospatial Maps (Geographic coordinate system): Data points such as city
locations or distribution of events are placed based on latitude and
longitude.
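The radar chart placement described above (angular position plus distance from the center) still has to be converted to x and y coordinates before anything is drawn on screen. The sketch below shows that conversion, assuming the axes are spaced evenly around the circle. The function names are hypothetical.

```python
import math


def polar_to_cartesian(radius, angle_degrees):
    """Convert a polar position (radius, angle) to Cartesian (x, y)."""
    theta = math.radians(angle_degrees)
    return radius * math.cos(theta), radius * math.sin(theta)


def radar_points(scores):
    """Place one score per axis, with axes spaced evenly around the circle."""
    step = 360 / len(scores)
    return [polar_to_cartesian(score, i * step) for i, score in enumerate(scores)]


# Example scores for three categories such as speed, fuel efficiency, and safety.
for x, y in radar_points([8, 6, 9]):
    print(round(x, 2), round(y, 2))
```

Connecting the resulting points in order gives the familiar radar polygon.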

Conclusion
The coordinate system is fundamental to how data is represented visually.
Whether using a Cartesian, polar, or logarithmic system, the coordinate system
allows data to be positioned accurately, making it possible to reveal patterns,
relationships, and trends. The choice of coordinate system depends on the type of
data and the visualization goal—whether it's displaying a time series, geographic
data, or exploring complex multi-variable relationships.

Color is a powerful tool in data visualization, used to represent data values
effectively. By associating data with different colors, we can help viewers quickly
identify patterns, relationships, and trends, and understand the meaning behind
the data. Colors provide a way to encode information visually, making complex
datasets easier to interpret.

❖​Key Uses of Color to Represent Data Values
1.​ Categorical Data (Qualitative Data):
○​ Purpose: Color is used to differentiate between distinct categories or
groups, making it easier to compare and identify patterns in the data.
○​ Color Mapping: Each category or group is assigned a unique color, so
viewers can quickly distinguish between them.
○​ Examples:
■​ In a bar chart showing sales by region, different regions (e.g.,
North, South, East, West) might be represented by distinct
colors.
■​ In a pie chart representing market share, each segment could
be assigned a different color to represent different companies
or products.
○ Best Practice: Use a set of colors that are easily distinguishable, and avoid
using too many colors to prevent confusion.
2. Ordinal Data:
○​ Purpose: Color is used to represent ordered categories, where the
color gradient or intensity reflects the ranking or level of the data.
○​ Color Mapping: The colors used can be from a continuous scale or a
sequence where darker or lighter shades represent higher or lower
values.
○​ Examples:
■​ In a bar chart showing customer satisfaction ratings, different
shades of color could represent levels of satisfaction, from
"very unsatisfied" (light color) to "very satisfied" (dark color).
■​ In a heatmap, color intensity (e.g., from pale yellow to deep
red) might be used to indicate the severity or frequency of an
event.
○ Best Practice: Choose color gradients that convey progression clearly,
making sure the differences between the levels are visible and intuitive.
3. Quantitative Data (Numerical Data):
○​ Purpose: Color is used to represent numerical values, typically
through continuous color gradients. This allows viewers to see the
variation in data values at a glance.
○​ Color Mapping: A gradient scale, often using colors like blue-to-red or
green-to-red, is applied, where low values are assigned cooler colors
(like blue or green) and high values are assigned warmer colors (like
yellow, orange, or red).
○​ Examples:
■​ In a scatter plot, numerical values (e.g., income or
temperature) might be represented by color, with cooler colors
indicating lower values and warmer colors indicating higher
values.
■​ In a choropleth map, regions with higher population density
may be shaded darker red, while regions with lower population
density are shaded lighter.
○ Best Practice: Use perceptually uniform color gradients to avoid
misinterpretation. A continuous gradient helps in representing numerical
data more effectively.
4. Diverging Data:
○​ Purpose: Color is used to represent data that has both positive and
negative values, or data that diverges from a central neutral value.
The goal is to highlight deviations from a baseline or expected value.
○​ Color Mapping: A diverging color scale is often used, where one side
of the scale (e.g., negative values) is represented with one color (e.g.,
blue) and the other side (positive values) is represented with another
color (e.g., red).
○​ Examples:
■​ In a heatmap showing changes in revenue over time, negative
growth (loss) may be represented with shades of blue, while
positive growth (gain) may be shown with shades of red.
■​ A line chart showing temperature deviation from average,
where negative temperature deviations are colored blue and
positive deviations are colored red.
○ Best Practice: Ensure that the central color (typically white or gray) is clearly
distinguishable from the colors used for positive and negative values to
avoid confusion.
5. Geospatial Data:
○​ Purpose: Color is used to represent geospatial data on maps, often
indicating variations in geographical features, population density, or
other spatial variables.
○​ Color Mapping: Colors are mapped to data values based on location,
often using a gradient or discrete color bands to represent different
ranges of data.
○​ Examples:
■​ Choropleth maps use color gradients to represent the intensity
of a variable (e.g., population density, median income) across
different geographic regions, such as countries or states.
■​ Heatmaps may show concentrations of events (e.g., traffic
accidents, disease outbreaks) in specific regions, with darker
colors indicating higher densities.
○ Best Practice: Use contrasting colors to ensure the data is easily
interpretable, and avoid overuse of bright or jarring colors that can confuse
viewers.
6. Emphasizing Specific Data Points:
○​ Purpose: Color is used to highlight or draw attention to specific
points or data ranges, helping users focus on particular areas of
interest within the data.
○​ Color Mapping: Specific colors can be used to emphasize outliers,
anomalies, or particular data points, making them stand out from the
rest.
○​ Examples:
■​ In a scatter plot, outliers might be highlighted in a bright color
(e.g., red) to draw attention to them.
■​ In a line chart, key points or significant data milestones (e.g.,
peaks, troughs, or breaks) can be highlighted in a different
color to make them stand out.
○ Best Practice: Use highlighting colors sparingly to prevent overwhelming
the viewer and maintain clarity.
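The diverging mapping described above can be sketched as a small function: values below the midpoint blend from a neutral color toward blue, and values above it blend toward red. The RGB values, the function names, and the `limit` parameter (the deviation at which the color saturates) are illustrative assumptions, not a standard palette.

```python
# Illustrative end and midpoint colors as (red, green, blue) tuples.
BLUE = (33, 102, 172)
NEUTRAL = (247, 247, 247)
RED = (178, 24, 43)


def blend(start, end, t):
    """Linearly interpolate between two RGB colors, with t between 0 and 1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def diverging_color(value, limit, midpoint=0.0):
    """Map a value to a hex color: blue below the midpoint, red above it."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Normalize the deviation to [-1, 1], clamping values beyond the limit.
    t = max(-1.0, min(1.0, (value - midpoint) / limit))
    rgb = blend(NEUTRAL, RED, t) if t >= 0 else blend(NEUTRAL, BLUE, -t)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Revenue growth in percent, saturating at plus or minus 10.
for growth in (-10, -5, 0, 5, 10):
    print(growth, diverging_color(growth, limit=10))
```

Straight-line RGB interpolation is used here only for brevity. It is not perceptually uniform, so a published palette is usually a better choice for real charts.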

❖​Best Practices for Using Color in Data Visualization


1.​ Ensure Accessibility: Consider colorblind users by choosing color palettes
that are distinguishable to all viewers. Tools like ColorBrewer provide color
schemes that are colorblind-friendly.
2.​ Maintain Consistency: Keep color usage consistent throughout the
visualization. The same color should represent the same data category or
value across different charts or views.
3.​ Avoid Overuse of Colors: Too many colors can confuse viewers. Aim to limit
the number of distinct colors to make the chart easier to interpret.
4.​ Consider Color Perception: Not all colors are perceived equally by the
human eye. For example, reds and greens can be difficult to distinguish in
certain contexts, especially for colorblind individuals. Choose colors that
have distinct perceptual differences.
5.​ Use Meaningful Color Choices: In some cases, using culturally or
contextually meaningful colors can help enhance the understanding of the
data. For example, red for "stop" or "alert" and green for "go" or "safe" can
make data interpretation more intuitive.

Examples of Common Color Schemes:


●​ Sequential Color Scheme: For ordered data (e.g., from low to high values),
using shades of a single color (e.g., light blue to dark blue) can represent
increasing magnitude.
● Diverging Color Scheme: For data with a meaningful midpoint (e.g., positive
vs. negative), a two-color gradient (e.g., blue for negative and red for
positive) can help to show deviations from a central point.
●​ Categorical Color Scheme: For distinct categories or labels, using different
colors for each category (e.g., red for "A", blue for "B", green for "C") can
make it easy to differentiate between them.
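One simple way to follow the consistency and "avoid overuse" practices for a categorical scheme is to hand out colors from a fixed palette and remember each assignment, so the same category keeps the same color in every chart. The class below is a hypothetical sketch of that idea, not an API from any charting library.

```python
class CategoryColors:
    """Assign each category a color once and reuse it across charts."""

    def __init__(self, palette):
        self._palette = list(palette)
        self._assigned = {}

    def color_for(self, category):
        if category not in self._assigned:
            # Refuse to invent extra colors: too many categories hurts readability.
            if len(self._assigned) >= len(self._palette):
                raise ValueError("more categories than colors in the palette")
            self._assigned[category] = self._palette[len(self._assigned)]
        return self._assigned[category]


colors = CategoryColors(["red", "blue", "green"])
print(colors.color_for("A"))  # first category gets the first color
print(colors.color_for("B"))
print(colors.color_for("A"))  # same category, same color as before
```

Raising an error when the palette runs out is one design choice. Grouping the smallest categories into "Other" is a common alternative.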

Conclusion
Color is an essential aspect of data visualization, enabling the representation of
data in an easily understandable and visually appealing way. When used
effectively, color can highlight patterns, indicate relationships, and improve the
interpretability of complex datasets. By considering factors like data type, color
contrast, accessibility, and consistency, you can enhance the communication of
data insights and make your visualizations more engaging and informative.

Representing amounts, distributions, and proportions effectively in data
visualization is essential for conveying insights in a clear, intuitive, and accurate
way.
Each type of data visualization serves a unique purpose in helping to highlight the
patterns, trends, or relationships in the dataset.
Below are some of the best visual techniques for representing these three
important aspects of data.

1. Representing Amounts
When visualizing amounts or magnitudes (quantitative data), the goal is to show
the actual size or value of a data point or group. Here, the focus is on comparing
the size of different values.
Recommended Visualizations:
●​ Bar Charts:
○ Usage: Best for comparing individual values or groups. Each category
or group gets one bar whose length encodes its value; bars may be
drawn horizontally or vertically.
○​ Example: A bar chart showing the total sales by different regions.
○​ Best for: Comparing amounts across categories.
●​ Column Charts:
○​ Usage: Similar to bar charts but with vertical bars. These are ideal for
displaying data over time or comparing amounts between different
categories.
○​ Example: Monthly revenue shown as columns.
○​ Best for: Visualizing changes over time or across categories.
●​ Pie Charts:
○ Usage: Show how a total amount divides among categories; each
slice represents one category's share of that total.
○ Example: A pie chart showing the market share of different
companies.
○ Best for: Showing proportions within a single total.
●​ Stacked Bar/Area Charts:
○​ Usage: These charts show the total amount while also breaking it
down into subcategories. This helps in understanding both the overall
amount and the composition.
○​ Example: A stacked bar chart showing total revenue broken down by
different products over several years.
○​ Best for: Showing both the total amount and its components.
Example Visualizations:
●​ A bar chart for comparing sales figures by product category.
●​ A stacked area chart for showing the total expenditure in different
categories (e.g., food, transport, entertainment) over time.
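A stacked bar shows both the total and its components because each segment starts where the previous one ended. The sketch below computes those start and end positions for a single bar. The function name and the expenditure figures are made up for illustration.

```python
def stack_segments(components):
    """Return (name, bottom, top) for each component, plus the bar's total."""
    segments = []
    bottom = 0
    for name, amount in components.items():
        # Each segment sits on top of everything stacked before it.
        segments.append((name, bottom, bottom + amount))
        bottom += amount
    return segments, bottom


# Hypothetical monthly expenditure by category.
spending = {"food": 300, "transport": 120, "entertainment": 80}
segments, total = stack_segments(spending)
for name, bottom, top in segments:
    print(f"{name}: from {bottom} to {top}")
print("total:", total)
```

The height of the last segment's top edge equals the total, which is what lets the reader compare whole bars as well as their parts.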

2. Representing Distributions
When visualizing distributions, the goal is to display how data points are spread
across a range, and to highlight key statistical properties such as the spread,
skewness, and outliers.
Recommended Visualizations:
●​ Histograms:
○​ Usage: Perfect for displaying the distribution of continuous data. The
x-axis shows different ranges (bins), and the y-axis shows the
frequency of data points within each range.
○​ Example: A histogram of exam scores showing how students' scores
are distributed.
○​ Best for: Understanding the frequency of values in a dataset, such as
the distribution of test scores or income levels.
●​ Box Plots (Box-and-Whisker Plots):
○ Usage: Show the distribution of data by displaying the median,
quartiles, and potential outliers. The box represents the interquartile
range (IQR), and the "whiskers" extend to the range of the data
(excluding outliers).
○​ Example: A box plot showing the distribution of salaries in different
departments of a company.
○​ Best for: Comparing distributions across multiple groups or
identifying skewness and outliers.
●​ Violin Plots:
○​ Usage: Similar to box plots, but also show the kernel density estimate
(smoothed distribution), allowing you to see the distribution's shape.
○​ Example: A violin plot showing the distribution of income levels
across different regions.
○​ Best for: Visualizing the distribution and density of data, especially
when comparing across multiple groups.
●​ Density Plots:
○​ Usage: A smoothed version of a histogram that shows the probability
density of continuous data.
○​ Example: A density plot showing the distribution of house prices in a
city.
○​ Best for: Visualizing continuous data distribution without the need
for discrete bins.
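The binning step behind a histogram can be sketched in a few lines: split the range of the data into equal-width bins and count how many values fall in each. The function name, the choice of five bins, and the exam scores are assumptions for this example.

```python
def histogram(values, bin_count):
    """Count values in equal-width bins; returns (bin_edges, counts)."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        raise ValueError("all values are identical; no range to bin")
    width = (highest - lowest) / bin_count
    counts = [0] * bin_count
    for value in values:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((value - lowest) / width), bin_count - 1)
        counts[index] += 1
    edges = [lowest + i * width for i in range(bin_count + 1)]
    return edges, counts


scores = [50, 55, 62, 68, 71, 75, 78, 83, 90, 100]
edges, counts = histogram(scores, bin_count=5)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {'#' * count}")
```

Changing `bin_count` changes the picture considerably, so it is worth trying a few values before settling on one.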
Example Visualizations:
●​ A histogram showing the distribution of ages within a population.
●​ A box plot comparing salary distributions between different job positions in
a company.
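The numbers a box plot draws can be computed directly. The sketch below uses the standard library's `statistics.quantiles` for the quartiles and assumes the common 1.5 × IQR rule for deciding which points count as outliers. That rule is a convention chosen for this example, since the text above does not specify one.

```python
import statistics


def box_plot_stats(values):
    """Compute quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inliers[0],
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 40]))
```

The whiskers stop at the most extreme values that are not outliers, matching the description of the whiskers given earlier.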

3. Representing Proportions
When visualizing proportions, the goal is to show how one part compares to the
whole or how different categories contribute to the total.
Recommended Visualizations:
●​ Pie Charts:
○​ Usage: Show the proportion of each part in relation to the whole.
Each slice represents a category, and the size of the slice indicates its
relative proportion.
○​ Example: A pie chart showing the market share of different
smartphone brands.
○​ Best for: Representing parts of a whole in a visually intuitive way,
particularly when there are fewer categories.
●​ Donut Charts:
○​ Usage: A variation of the pie chart, but with a hollow center, often
used to display proportions in a similar way.
○​ Example: A donut chart showing the budget allocation across
different departments in a company.
○​ Best for: Similar to pie charts, but often considered more
aesthetically pleasing.
●​ Stacked Bar Charts:
○​ Usage: Show the proportion of each category in a total, where each
bar represents the total amount and is divided into segments to
represent the individual categories or subgroups.
○​ Example: A stacked bar chart showing sales across multiple years,
where each bar is divided into product categories.
○​ Best for: Comparing parts of a whole across categories or over time.
●​ Treemaps:
○​ Usage: Display hierarchical data as nested rectangles. The area of
each rectangle is proportional to its value.
○​ Example: A treemap showing the distribution of sales revenue across
different regions and sub-regions.
○​ Best for: Showing proportions of parts within a hierarchy or complex
nested structure.
●​ 100% Stacked Bar Charts:
○​ Usage: These charts are similar to stacked bar charts but normalized
so that each bar represents 100% of the total. Each segment shows a
percentage of the whole.
○​ Example: A 100% stacked bar chart showing the proportion of
students passing or failing across different courses.
○​ Best for: Showing the relative proportion of categories in comparison
to the total across different groups.
Example Visualizations:
●​ A pie chart showing the market share of different companies.
●​ A stacked bar chart showing sales revenue broken down by region, with
each bar representing total sales for a year, divided by region.
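Pie charts and 100% stacked bars both start from the same calculation: each category's share of the total. The sketch below computes those shares and the matching pie slice angles. The function names and the market share figures are invented for illustration.

```python
def proportions(amounts):
    """Return each category's share of the total as a fraction between 0 and 1."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("total must be positive to compute proportions")
    return {name: amount / total for name, amount in amounts.items()}


def pie_angles(amounts):
    """Return the angle in degrees of each pie slice (all slices sum to 360)."""
    return {name: share * 360 for name, share in proportions(amounts).items()}


# Hypothetical units sold by three companies.
units = {"A": 50, "B": 30, "C": 20}
for name, share in proportions(units).items():
    print(f"{name}: {share:.0%}")
print(pie_angles(units))
```

For a 100% stacked bar chart, the same `proportions` call would be applied once per bar so that every bar sums to 100%.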

Choosing the Right Visualization
●​ For Amounts: Use bar charts, column charts, or pie charts. They help you
compare values between categories and show individual amounts clearly.

●​ For Distributions: Use histograms, box plots, violin plots, or density plots.
These allow you to see how data points are spread out and identify patterns
like skewness or central tendency.

●​ For Proportions: Use pie charts, donut charts, stacked bar charts, or
treemaps. These are ideal for visualizing how different parts contribute to a
whole.

Conclusion
Choosing the right visualization type is key to effectively communicating the story
behind the data. By selecting the appropriate chart type for representing
amounts, distributions, and proportions, you can provide clearer insights and
help your audience interpret the data more easily. The visualization type should
match the nature of the data (e.g., categorical vs. numerical, individual vs.
grouped data) and the insight you want to convey.

[Figure: Pie chart]