Lesson-2-Data-Presentation

Uploaded by

ayahayes90
Copyright
© © All Rights Reserved

DATA PRESENTATION
EDU 641

Lesson 2

Jay-R A. Manamtam
October 17, 2024
TABLE OF CONTENTS

01 Tabular Description of Data
02 Graphical Description of Data
03 Measures of Central Tendency
01

Tabular Description
of Data
DATA

Data refers to factual information (such as measurements or
statistics) used as a basis for reasoning, discussion, or
calculation.

Presentation of Data refers to the organization of data
into tables, graphs, or charts so that logical and statistical
conclusions can be derived from the collected
measurements. Data may be presented in three ways:
textual, tabular, or graphical.


TABULAR

Tables are the simplest way to represent data. A
table compiles all the data into columns and rows so
that it can be easily interpreted.

A table is a chart that organizes information in rows
and columns. Information presented in
a table format is tabular.
Objectives of
Tabular Presentation of Data

✓ A tabular presentation helps in the simplification of
complex data.
✓ It helps in the comparison of different data sets and
brings out their essential aspects.
✓ Statistical analysis can be undertaken from a tabular
presentation.
✓ A tabular presentation of data further aids in the formation
of graphs and diagrams for data analysis.
Parts of a Table

1. Table Number
2. The Title
3. The Box Head (column captions)
4. The Stub (row captions)
5. The Body
6. Prefatory Notes
7. Footnotes
8. Source Notes
Types of Table

1. Simple or One-way Table – if only one characteristic
is presented in the table.

2. Two-way Table – if two characteristics are
presented in the table.

3. Manifold Table – if more than two
characteristics are presented in the table.
Table 1: Number of Students in Different Rooms

Room        No. of Students
Arts        25
Commerce    20
Science     28
Total       73
Table 2: Number of Students in Different Rooms and Gender

            No. of Students
Room        Male   Female   Total
Arts        10     15       25
Commerce    10     10       20
Science     18     10       28
Total       38     35       73
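As a rough illustration of how a two-way table such as Table 2 can be tallied from individual records, here is a minimal Python sketch. The per-student records and the helper name `two_way_table` are assumptions made for this example; the records are generated only so that the counts match Table 2.

```python
from collections import Counter

# Hypothetical (room, gender) records, generated to match the counts in Table 2.
cell_counts = {
    ("Arts", "Male"): 10, ("Arts", "Female"): 15,
    ("Commerce", "Male"): 10, ("Commerce", "Female"): 10,
    ("Science", "Male"): 18, ("Science", "Female"): 10,
}
records = [key for key, n in cell_counts.items() for _ in range(n)]


def two_way_table(records):
    """Count records by (row, column) and append row, column, and grand totals."""
    cell = Counter(records)
    rows = sorted({r for r, _ in cell})
    cols = sorted({c for _, c in cell})
    table = {r: {c: cell[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["Total"] = sum(table[r][c] for c in cols)
    # Column totals, including the grand total under the "Total" column.
    table["Total"] = {c: sum(table[r][c] for r in rows) for c in cols + ["Total"]}
    return table


table = two_way_table(records)
for row_name, row in table.items():
    print(row_name, row)
```

The stub (row captions) comes from the first field of each record and the box head (column captions) from the second, which mirrors the parts of a table listed earlier.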
Table 3: Number of Students in Different Rooms, Gender and Sections

            Section I                 Section II
Room        Male   Female   Total    Male   Female   Total
Arts        10     15       25       12     15       27
Commerce    10     10       20       8      12       20
Science     18     10       28       15     13       28
Total       38     35       73       35     40       75
02

Graphical
Description of Data
Graphical Representation

➢ refers to the use of intuitive charts to clearly visualize and
simplify data sets. Data is loaded into graphing software
and then represented by
a variety of symbols, such as lines on a line chart, bars on a
bar chart, or slices on a pie chart, from which users can
gain greater insight than by numerical analysis alone.
Why graphical representation of data is
important?
✓ Graphical representation is a crucial component in
understanding and identifying patterns and trends in
the ever-increasing flow of data.
✓ It enables the quick analysis of large amounts of data at
one time and can aid in making predictions and informed
decisions.
✓ Graphical representation also makes collaboration
significantly more efficient by using familiar visual
metaphors to illustrate relationships and highlight
meaning, eliminating complex, long-winded explanations
of an otherwise chaotic-looking array of figures.
✓ Data only has value once its significance has been
revealed and consumed, and its consumption is best
facilitated with graphical representation tools that are
designed with human cognition and perception in mind.
✓ Human visual processing is very efficient at detecting
relationships and changes between sizes, shapes, colors,
and quantities.
✓ Attempting to gain insight from numerical data alone,
especially in big data instances in which there may be
billions of rows of data, is exceedingly cumbersome and
inefficient.
Common Types of
Graphical Representation

❖ Pie Chart
❖ Bar Graph
❖ Histogram
❖ Line Graph (Frequency Polygon)
❖ Pictograph
❖ Scatter plots
❖ Heatmaps
Pie Chart

❑ Is a type of graph that
displays data in a circular
graph.
❑ It is also known as a circle
graph. Numerical information
is represented as slices, in
fractional form or as
percentages, where the
whole circle is 100%.
Bar Graph

❑ is a chart that plots data
using rectangular bars or
columns that represent the
total number of observations
in the data for each category.
❑ can be displayed with
vertical columns, horizontal
bars, comparative bars, or
stacked bars.
How to Choose Between a Bar
Chart and Pie Chart

Pie Chart
❑ A pie chart can only be used if the sum of the individual parts adds up
to a meaningful whole.
❑ It is built for visualizing how each part contributes to that whole.
❑ A part-to-whole comparison must be of interest, rather than a
group-to-group comparison.
❑ The number of slices should be relatively small, about five at most.
❑ Slices of interest should carve out identifiable proportions of
regions, multiples of 1/4 or 1/3.

Bar Chart
❑ A bar chart can be used for a broader range of data types, not
just for breaking down a whole into components.
❑ It can easily compare two or three data sets.
❑ It is the most widely used method of data representation.
❑ Bar graphs are better for comparing larger changes or
differences in data among groups.
❑ Not suitable if there is a large number of categories.
Histogram

❑ Is a graph in which the
information is represented
by the height of
rectangular bars.
❑ Though it looks like a bar
graph, a histogram
represents ranges of
quantitative data, whereas a
bar graph represents
categorical variables.
Comparison between Bar Graph
and Histogram
Line Graph

❑ A line graph, also known as
a line chart or time series
plot, represents data
points using a series of
connected line segments.
❑ Line graphs are useful for
displaying smaller changes
in a trend over time.
Pictograph
❑ Pictographs are charts that represent data using icons and
images relevant to the data.
❑ A key is often included in a pictograph that indicates what each icon
or image represents.
❑ All icons in a pictograph must be of the same size, but a fraction
of an icon can be used to show the corresponding fraction of that amount.

Reference:
https://thirdspacelearning.com/gcse-maths/statistics/pictograph/
Scatter plot
❑ A scatter plot is a data visualization tool used to display individual
data points in a two-dimensional coordinate system.
❑ Scatter plots are commonly used for identifying relationships,
detecting outliers, pattern recognition, and prediction.

Reference:
https://www.math.net/scatter-plot
Heatmap
❑ A heatmap is a data visualization technique that represents data
values using colors on a grid.
❑ Heatmaps are particularly useful for displaying the intensity,
concentration, or relationships between data points within a matrix or
two-dimensional dataset.
❑ They are a powerful tool for exploratory data analysis and
communication of complex datasets.

Reference:
https://en.wikipedia.org/wiki/Heat_map
03

Measures of
Central Tendency
Measures of Central Tendency
or Average
A measure of central tendency is a single value that attempts to describe a set of
data by identifying the central position within that set of data.

Measures of central tendency or averages give us one value for the distribution
and this value represents the entire distribution. In this way averages convert a
group of figures into one value.

Collected and classified figures are vast. To condense these figures we use
an average. An average converts the whole set of figures into just one figure and thus
helps in condensation.
To make comparisons of two or more than two distributions, we have to find the
representative values of these distributions. These representative values are
found with the help of measures of the central tendency.
Three Common
Measures of Central Tendency
Mean
• is equal to the sum of all the values in the data set divided by the number of values in the
data set.

The formula for the sample mean is:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Where:
$\bar{x}$ = mean
$x_i$ = score of each respondent
$\sum_{i=1}^{n} x_i$ = sum of all scores of the respondents
$n$ = total number of data points
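The sample mean formula can be sketched in a few lines of Python. The scores below are hypothetical and the function name `sample_mean` is chosen only for this example.

```python
def sample_mean(scores):
    """Return the sum of the scores divided by the number of scores."""
    if not scores:
        raise ValueError("mean is undefined for an empty data set")
    return sum(scores) / len(scores)


# Hypothetical scores of five respondents.
scores = [80, 85, 90, 75, 70]
mean_score = sample_mean(scores)
print(mean_score)
```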
Weighted Mean
• is a kind of average in which each data point contributes in proportion to an assigned
weight. It is often used to determine the central tendency of the responses to each item of
an instrument.

The formula can be written as:

$$M_w = \frac{\sum (v w)}{\sum w}$$

Where:
$M_w$ = computed mean
$v$ = value, score, or actual data point
$w$ = weight assigned to each data point
$\sum (v w)$ = the sum of the products of each data point and its weight
$\sum w$ = the sum of weights
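A minimal Python sketch of the weighted mean follows. It assumes, purely for illustration, a five-point rating item where the values are the scale points and the weights are the number of respondents who chose each point; these numbers are hypothetical.

```python
def weighted_mean(values, weights):
    """Return sum(v * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the sum of weights must not be zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical rating item: scale points and how many respondents chose each.
values = [5, 4, 3, 2, 1]
weights = [10, 20, 5, 3, 2]
m_w = weighted_mean(values, weights)
print(m_w)
```

Here the sum of products is 153 and the sum of weights is 40, so the weighted mean is 153 / 40.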
Median
• The "middle" of a sorted list of numbers.
• Is the number that separates the higher half from the lower half of scores.
• It is the middle value in a list of scores sorted in either ascending or
descending order.
• To find the median, arrange the scores in ascending or descending order and
get the middle score. The middle score is the median value. If there are two
middle scores, the median is the mean or average of these two middle scores.
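The procedure for finding the median can be sketched as follows. The function name `median` and the sample scores are hypothetical.

```python
def median(scores):
    """Sort the scores and return the middle one, or the mean of the two middle ones."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: a single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle scores


odd_case = median([70, 85, 75, 90, 80])
even_case = median([70, 85, 75, 80])
print(odd_case, even_case)
```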
Mode
• Is the number that appears most often in a set of scores.
• A set of scores may have one mode, more than one mode, or no
mode at all.
• In a frequency distribution table, chart, or graph, the mode is
the value with the highest frequency.
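A short Python sketch of finding the mode is given below. It assumes one common convention: when every distinct score occurs equally often (and there is more than one distinct score), the set is treated as having no mode. The function name `modes` is hypothetical.

```python
from collections import Counter


def modes(scores):
    """Return a sorted list of the most frequent scores (empty if there is no mode)."""
    counts = Counter(scores)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if all distinct scores tie, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


one_mode = modes([1, 2, 2, 3])
two_modes = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3])
print(one_mode, two_modes, no_mode)
```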
Advantages and
Disadvantages of Mean

Advantages
• Takes account of all values in the series.
• It is rigidly defined.
• It is suitable for further algebraic treatment.
• It is least affected by fluctuations of sampling.
• Most popular and well-known average.

Disadvantages
• It is highly affected by extreme values.
• It cannot be determined by inspection.
• It cannot be computed accurately if any value/score is missing.
• It cannot be used when we are dealing with qualitative characteristics such as
honesty, beauty, etc.
• It cannot be calculated for an open-ended distribution.
• Two data sets can have the same arithmetic mean while having completely
different implications.
Advantages and
Disadvantages of Median

Advantages
• Simple to determine and easy to understand.
• Less affected by outliers and skewed data.
• Can be easily represented graphically.
• Suitable for open-ended distributions.
• Suitable for qualitative phenomena.

Disadvantages
• The process becomes tedious if the series contains a large number of items.
• It is a less representative average because it does not depend on all the
items in the series.
• It is affected much by fluctuations of sampling.
• It is not capable of algebraic treatment.
• In the case of an even number of observations, the median cannot be
determined exactly.
Advantages and
Disadvantages of Mode

Advantages
• Simple to determine and easy to understand.
• Can be easily located and represented graphically.
• Less affected by outliers and skewed data.
• Suitable for open-ended distributions.
• Suitable for qualitative phenomena.
• The only average that can be used with nominal-level data.

Disadvantages
• It is a less representative average because it does not depend on all the
items in the series.
• It is affected much by fluctuations of sampling.
• It is not capable of algebraic treatment.
• Mode is ill-defined.
Two ways to represent and analyze data
in Statistics

Ungrouped Data (Individual Data)
• Ungrouped data refers to a raw dataset where individual data points
are presented individually or without any grouping into intervals or classes.

Grouped Data (Interval Data)
• Grouped data involves categorizing or grouping individual data points into
intervals or classes.
Frequency Distribution
Table (FDT)
• A frequency distribution table is a tabular representation of data
that summarizes the frequency or count of each value or interval
in a dataset.
• It organizes data into distinct categories or intervals, and for
each category, it shows how many data points fall into that
category.
• Frequency distribution tables are commonly used in statistics to
understand the distribution of data and identify patterns, central
tendencies, and variations within a dataset.
Example (FDT)
Frequency distribution tables
are useful for various purposes

• Describing the distribution and central tendencies of a


dataset.
• Identifying patterns, outliers, and gaps in the data.
• Visualizing the data with histograms or other graphical
representations.
• Comparing different datasets.
• Preparing data for further statistical analysis.
Steps to Create a Frequency
Distribution Table

1. Determine the Number of Categories (Intervals):
Decide on the number of categories or intervals you
want to use. The choice of intervals depends on the
nature of your data and the level of detail you want. You
can use rules like Sturges' Rule, the square root rule, or
domain knowledge to help determine the number of
intervals.
Determine the Number
of Categories (Intervals)

Sturges' Rule:
Sturges' Rule is a simple formula to estimate the number of intervals (𝑘) in a frequency
distribution. It is given by the formula:

$$k = 1 + 3.3 \cdot \log_{10}(N)$$

Where 𝑁 is the number of data points in your dataset. Sturges' Rule is a quick and
straightforward way to get an estimate, but it may not work well for small or highly skewed
datasets.
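As a quick illustration, Sturges' Rule can be evaluated with Python's standard `math` module. The text does not say how to round the result, so rounding to the nearest whole number is an assumption of this sketch, as is the function name `sturges_intervals`.

```python
import math


def sturges_intervals(n_points):
    """Estimate the number of intervals k = 1 + 3.3 * log10(N), rounded to the nearest integer."""
    if n_points < 1:
        raise ValueError("the dataset must contain at least one data point")
    # Rounding to the nearest whole number is an assumption of this sketch.
    return round(1 + 3.3 * math.log10(n_points))


k_100 = sturges_intervals(100)  # 1 + 3.3 * 2 = 7.6
k_50 = sturges_intervals(50)    # 1 + 3.3 * 1.699 is about 6.6
print(k_100, k_50)
```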
Square Root Rule:


The square root rule suggests taking the square root of the number of data points (𝑁) and
rounding to the nearest whole number to determine the number of intervals (𝑘). The formula
is:

$$k = \sqrt{N}$$

For datasets larger than roughly 40 points, this rule gives more intervals than Sturges' Rule,
and it can be a better choice for moderately sized datasets.
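The square root rule is equally short in Python. The function name `square_root_intervals` is hypothetical.

```python
import math


def square_root_intervals(n_points):
    """Return the square root of N rounded to the nearest whole number."""
    if n_points < 1:
        raise ValueError("the dataset must contain at least one data point")
    return round(math.sqrt(n_points))


k_100 = square_root_intervals(100)
k_50 = square_root_intervals(50)  # sqrt(50) is about 7.07
print(k_100, k_50)
```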
Scott's Rule:
Scott's Rule takes into account both the number of data points and the standard deviation of
the dataset. It is given by the formula:

$$h = \frac{3.5 \cdot (\text{standard deviation})}{N^{1/3}}$$

Where ℎ is the width of the class interval. Once you calculate ℎ, you can determine the
number of intervals by dividing the range of the data by ℎ.
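The following sketch applies Scott's Rule to a small hypothetical dataset. The text does not specify whether the sample or the population standard deviation is meant; this example assumes the sample standard deviation (`statistics.stdev`) and rounds the number of intervals up, both of which are choices made only for illustration.

```python
import math
import statistics


def scott_width(data):
    """Interval width h = 3.5 * s / N^(1/3), using the sample standard deviation (assumed)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data points are needed")
    return 3.5 * statistics.stdev(data) / n ** (1 / 3)


def intervals_from_width(data, width):
    """Number of intervals = range / h, rounded up (rounding up is an assumption)."""
    return math.ceil((max(data) - min(data)) / width)


# Hypothetical dataset of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]
h = scott_width(data)
k = intervals_from_width(data, h)
print(round(h, 4), k)
```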
Freedman-Diaconis Rule:
Similar to Scott's Rule, the Freedman-Diaconis Rule uses the interquartile range (IQR) instead
of the standard deviation. The formula for the interval width (ℎ) is:

$$h = \frac{2 \cdot IQR}{N^{1/3}}$$

Like Scott's Rule, you can determine the number of intervals by dividing the range of the data
by ℎ.
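A comparable sketch for the Freedman-Diaconis Rule is shown below. Quartiles can be computed in several ways; this example assumes the default "exclusive" method of `statistics.quantiles`, so other software may report a slightly different IQR for the same data. Rounding the number of intervals up is also an assumption.

```python
import math
import statistics


def freedman_diaconis_width(data):
    """Interval width h = 2 * IQR / N^(1/3)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data points are needed")
    # Default 'exclusive' quartile method; other methods can give a different IQR.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) / n ** (1 / 3)


# Hypothetical dataset of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]
h = freedman_diaconis_width(data)
k = math.ceil((max(data) - min(data)) / h)  # range / h, rounded up (assumed)
print(h, k)
```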
Expert Judgment:

Sometimes, it's best to rely on the expertise of a subject matter expert or domain knowledge.
If you have insights into the nature of your data or the research objectives, you may choose to
define custom intervals that make sense for your analysis.
2. Determine the Range: Find the range of your data,
which is the difference between the maximum and
minimum values. This will help you define the width of
the intervals.

$$Range = HV - LV$$

Where:
$HV$ = highest value
$LV$ = lowest value
3. Calculate the Interval Width: Divide the range by the
number of intervals to determine the width of each
interval. Round this number to a convenient value that
makes sense for your data.

$$h = \frac{Range}{k}$$

Where:
$h$ = interval width
$k$ = number of intervals
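Steps 2 and 3 can be sketched together in Python. The text only says to round the width to a convenient value, so rounding up to the next whole number is an assumption of this example, and the scores are hypothetical.

```python
import math


def interval_width(data, k):
    """Return (range, width): range = highest - lowest, width = range / k rounded up."""
    if k < 1:
        raise ValueError("the number of intervals must be at least 1")
    data_range = max(data) - min(data)
    # Rounding up to a whole number is one possible 'convenient value' (assumed).
    return data_range, math.ceil(data_range / k)


# Hypothetical scores ranging from 60 to 99, split into 4 intervals.
scores = [60, 65, 71, 78, 79, 82, 88, 90, 95, 99]
data_range, h = interval_width(scores, 4)
print(data_range, h)
```

With a range of 39 and 4 intervals, the raw width is 9.75, which rounds up to 10.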
4. Create Categories (Intervals): Define the intervals based on the width
and starting point. For example, if your data ranges from 60 to 99 and
you want 4 intervals, each with a width of 10, you can create the
following intervals: 60-69, 70-79, 80-89, 90-99.
5. Tally the Data: Go through your dataset and tally the data points that
belong to each interval. For ungrouped data, simply count how many
data points fall into each category. For grouped data, place each data
point into the appropriate interval.
6. Construct the Table: Create the frequency distribution table with two
main columns: "Categories (Intervals)" and "Frequency." List the
categories and write down the frequency for each category based on
your tallies.
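Steps 4 to 6 can be sketched as a small Python function that defines the intervals, tallies the data, and returns the table rows. It assumes whole-number scores and inclusive interval limits such as 60-69; the scores and the function name `frequency_table` are hypothetical.

```python
def frequency_table(data, start, width, k):
    """Return a list of ((lower, upper), frequency) rows for k equal-width intervals."""
    intervals = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]
    frequencies = [0] * k
    for x in data:
        index = (x - start) // width  # which interval this whole-number score falls into
        if not 0 <= index < k:
            raise ValueError(f"{x} lies outside the defined intervals")
        frequencies[index] += 1
    return list(zip(intervals, frequencies))


# Hypothetical scores ranging from 60 to 99.
scores = [60, 65, 71, 78, 79, 82, 88, 90, 95, 99]
table = frequency_table(scores, start=60, width=10, k=4)
for (lower, upper), freq in table:
    print(f"{lower}-{upper}: {freq}")
```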
7. Calculate Relative Frequency: Optionally, you can calculate
relative frequency by dividing the frequency for each category
by the total number of data points. This shows the proportion
of data in each category relative to the whole dataset.
8. Calculate Cumulative Frequency: If needed, calculate
cumulative frequency, which is the running total of frequencies
as you move down the table. The cumulative frequency for the
last category should be equal to the total number of data
points.
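Steps 7 and 8 can be illustrated with a short Python sketch. The frequencies below are hypothetical counts for four intervals.

```python
from itertools import accumulate

# Hypothetical frequencies for the intervals 60-69, 70-79, 80-89, 90-99.
frequencies = [2, 3, 2, 3]
total = sum(frequencies)

# Relative frequency: each frequency divided by the total number of data points.
relative = [f / total for f in frequencies]

# Cumulative frequency: running total of the frequencies down the table.
cumulative = list(accumulate(frequencies))

print(relative)
print(cumulative)
```

The last cumulative frequency equals the total number of data points, which is a quick check that the tally is complete.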
Measures of Central Tendency
for Grouped Data

Mean
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i \cdot x_i}{n}$$

Where:
$\bar{x}$ = mean
$x_i$ = the midpoint of each interval
$f_i$ = the frequency of each interval
$k$ = the number of intervals
$\sum_{i=1}^{k} f_i \cdot x_i$ = sum of the products of each interval's frequency and midpoint
$n$ = total number of data points
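A minimal sketch of the grouped mean follows. Each class is given as a hypothetical (lower limit, upper limit, frequency) triple, and the midpoint is taken as the average of the two limits.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency). Returns sum(f * midpoint) / n."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the total frequency must be greater than zero")
    return sum(f * (lower + upper) / 2 for lower, upper, f in classes) / n


# Hypothetical frequency distribution table.
classes = [(60, 69, 2), (70, 79, 3), (80, 89, 2), (90, 99, 3)]
mean_grouped = grouped_mean(classes)
print(mean_grouped)
```

The midpoints are 64.5, 74.5, 84.5, and 94.5, so the sum of products is 805 and the mean is 805 / 10.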
Median
$$\tilde{x} = LCB_{Me} + h \left( \frac{\frac{N+1}{2} - LCF_{bM}}{f_{Me}} \right)$$

Where:
$\tilde{x}$ = median
$LCB_{Me}$ = lower class boundary of the median class
$N$ = total number of data points
$LCF_{bM}$ = less than cumulative frequency below median class
$f_{Me}$ = frequency of the median class
$h$ = width of class interval
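The grouped median formula can be sketched as below. The example assumes whole-number class limits, so each class boundary lies half a unit beyond the limit (for example, 79.5 for the class 80-89), and it takes the median class to be the first class whose cumulative frequency reaches (N + 1) / 2. The classes are hypothetical.

```python
def grouped_median(classes, gap=1):
    """classes: list of (lower_limit, upper_limit, frequency), in ascending order.

    gap is the distance between one class's upper limit and the next class's
    lower limit (1 for whole-number limits such as 60-69, 70-79).
    """
    n = sum(f for _, _, f in classes)
    target = (n + 1) / 2  # position of the median, as in the formula above
    cumulative_below = 0  # less-than cumulative frequency below the current class
    for lower, upper, f in classes:
        if cumulative_below + f >= target:
            lcb = lower - gap / 2            # lower class boundary of the median class
            width = (upper + gap / 2) - lcb  # class width h
            return lcb + width * (target - cumulative_below) / f
        cumulative_below += f
    raise ValueError("the total frequency must be greater than zero")


# Hypothetical frequency distribution table.
classes = [(60, 69, 2), (70, 79, 3), (80, 89, 2), (90, 99, 3)]
median_grouped = grouped_median(classes)
print(median_grouped)
```

Here N = 10, so the target position is 5.5. The median class is 80-89, giving 79.5 + 10 × (5.5 − 5) / 2.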
Mode
$$\hat{x} = LCB_{Mo} + h \left( \frac{d_1}{d_1 + d_2} \right)$$

Where:
$\hat{x}$ = mode
$LCB_{Mo}$ = lower class boundary of the modal class
$d_1$ = positive difference between the frequency of the modal class and the frequency
of the class below the modal class
$d_2$ = positive difference between the frequency of the modal class and the frequency
of the class above the modal class
$h$ = width of class interval
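A sketch of the grouped mode formula is given below. It assumes whole-number class limits, treats a missing neighboring class as having frequency 0, and picks the first class when several share the highest frequency; these are choices made for this example, and the classes are hypothetical.

```python
def grouped_mode(classes, gap=1):
    """classes: list of (lower_limit, upper_limit, frequency), in ascending order."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # modal class; first one wins on a tie (assumed)
    lower, upper, f_modal = classes[i]
    f_before = freqs[i - 1] if i > 0 else 0              # class below the modal class
    f_after = freqs[i + 1] if i < len(freqs) - 1 else 0  # class above the modal class
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    if d1 + d2 == 0:
        raise ValueError("the mode is undefined when d1 + d2 is zero")
    lcb = lower - gap / 2            # lower class boundary of the modal class
    width = (upper + gap / 2) - lcb  # class width h
    return lcb + width * d1 / (d1 + d2)


# Hypothetical frequency distribution table with a single modal class (70-79).
classes = [(60, 69, 2), (70, 79, 5), (80, 89, 3), (90, 99, 1)]
mode_grouped = grouped_mode(classes)
print(mode_grouped)
```

With d1 = 5 − 2 = 3 and d2 = 5 − 3 = 2, the result is 69.5 + 10 × 3 / 5.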
