Mathematics in the Modern World
Chapter 4: Data Management
4.1 Descriptive Statistics
Statistics is a branch of mathematics that deals
with data collection, organization, analysis,
interpretation and presentation.

Data collection is defined as the procedure of collecting, measuring and analyzing accurate insights for research using standard validated techniques.

Data organization refers to the method of classifying and organizing data sets to make them more useful; it can be applied to physical records or digital records.

Data analysis is a process of inspecting, cleansing,
transforming, and modeling data with the goal of
discovering useful information, informing
conclusions, and supporting decision-making.

Interpretation of data is the process of assigning meaning to the collected information and determining the conclusions, significance, and implications of the findings.

Presentation of data refers to the organization of data into tables, graphs or charts, so that logical and statistical conclusions can be derived from the collected measurements.

Descriptive statistics describes the characteristics of a specific data set by giving short summaries of the sample and measures of the data.

Basic Statistical Concepts

A population consists of the totality of the observations, and a sample is a part of the population. A variable is any characteristic, number, or quantity that can be measured or counted.
Two kinds of variables:

1. Qualitative variables, also called categorical variables, are variables that are not numerical. They describe data that fit into categories.

2. Quantitative variables are numerical. They can be ranked and have order.
Quantitative variables can be further classified into discrete variables and continuous variables.

A discrete variable is a variable whose value is obtained by counting.

Continuous variables can assume an infinite number of values between any two specific values. They are obtained by measuring and often include fractions and decimals.
Examples
Discrete
number of students present
number of red marbles in a jar
number of heads when flipping three coins
students’ grade level

Continuous
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes
Types of Statistical Data

1. Numerical data. These data have meaning as a measurement, such as a person’s height, weight, IQ, or blood pressure, or the number of shares of stock a person owns.

2. Categorical data. Categorical data represent characteristics such as a person’s gender, marital status, hometown, or the types of movies they like. Categorical data can take on numerical values (such as “1” indicating male and “2” indicating female), but those numbers don’t have mathematical meaning.
Four Levels of Measurement
1. Nominal – the lowest of the four ways to characterize data. It deals with names, categories, or labels (e.g., eye colors, yes or no responses to a survey, favorite breakfast cereal, and the number on the back of a football jersey).

2. Ordinal – the data at this level can be ordered, but differences between data values are not meaningful (e.g., ten cities ranked from one to ten, where differences between the ranks don't make much sense; letter grades, where A is higher than B but no other information is given).

3. Interval – deals with data that can be ordered and in which differences between the data do make sense. However, data at this level have no true zero or starting point (e.g., the Fahrenheit and Celsius temperature scales).

4. Ratio – the highest level of measurement. Data possess all of the features of the interval level, in addition to an absolute zero. Because of this zero, it makes sense to compare the ratios of measurements.
4.2 Data Collection Method
Methods of Collecting Data
1. In-Person Interviews
Pros: In-depth, and a high degree of confidence in the data
Cons: Time consuming, expensive, and can be dismissed as anecdotal

2. Mail Surveys
Pros: Can reach anyone and everyone – no barrier
Cons: Expensive, data collection errors, lag time

3. Phone Surveys
Pros: High degree of confidence in the data collected; can reach almost anyone
Cons: Expensive, cannot self-administer, need to hire an agency

4. Web/Online Surveys
Pros: Cheap, can self-administer, very low probability of data errors
Cons: Not all your customers might have an email address or be on the internet; customers may be wary of divulging information online
Three Ways of Presenting Data

1. Textual – this method presents data in a paragraph or a number of paragraphs.

2. Tabular – the method of presenting data using a statistical table, a systematic organization of data in columns and rows.

3. Graphical – a chart representing the quantitative variations or changes of variables in pictorial or diagrammatic form.
4.3 Frequency Distribution
Frequency is the rate that measures how often
something occurs.

Example 1
Jack joins football practice every Wednesday morning,
Sunday morning and afternoon.

The frequency of Jack’s football practice every week is 3 (2 on Sunday and 1 on Wednesday).

By counting frequencies, we can make a frequency distribution table.
Example 2

Jack’s team has scored the following numbers of goals in their games,
3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2.

Jack put the numbers in order, then added up:


how often 1 occurs (2 times),
how often 2 occurs (5 times),
how often 3 occurs (4 times),
how often 4 occurs (2 times),
how often 5 occurs (1 time)
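Jack's tally can be reproduced with a short Python sketch. It uses only the goal data above and the standard library's collections.Counter; the variable names are illustrative.

```python
from collections import Counter

# Goals scored by Jack's team (data from Example 2)
goals = [3, 1, 2, 1, 3, 2, 4, 2, 3, 2, 5, 4, 3, 2]

# Counter tallies how often each value occurs
frequency = Counter(goals)

# Print the frequency distribution table in increasing order of the value
print("Goals  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")

print("Total ", sum(frequency.values()))
```

Sorting the keys mirrors Jack's step of putting the numbers in order before counting.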
Graphical Representation of Frequency Distribution
A. Bar Graph is a pictorial representation of statistical data in such a way that the length of the rectangles in the graph represents the proportional value of the variable. Bar graphs are generally used to compare the values of several variables at a time. The length of the bars (horizontal or vertical) represents the frequency of the variable, and bar graphs are applicable to discrete categories only.
B. Line graph or Line chart is a graphical display of information that changes continuously over time. In a line graph, data points are connected by line segments to show a continuous change; the lines can ascend or descend based on the data. Line graphs can also be used to compare different events, situations, and information.
C. Pie Chart is a type of graph that displays data in a circular graph. The
pieces of the graph are proportional to the fraction of the whole in each
category. Each slice of the pie is relative to the size of that category in the
group as a whole. The entire “pie” represents 100 percent of a whole, while
the pie “slices” represent portions of the whole.
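As a worked illustration (not part of the original module), each slice's central angle equals its share of the full circle, that is, (f / n) × 360°. The sketch below applies this to the goals data from Example 2 in section 4.3.

```python
# Frequency distribution of goals from Example 2 in section 4.3
frequency = {1: 2, 2: 5, 3: 4, 4: 2, 5: 1}
n = sum(frequency.values())  # total number of games

# Each slice's central angle is proportional to its frequency
angles = {goals: f / n * 360 for goals, f in frequency.items()}

for goals, angle in angles.items():
    percent = frequency[goals] / n * 100
    print(f"{goals} goals: {percent:5.1f}% of games, {angle:6.2f} degrees")

# The slices together make up the whole pie (100 percent, 360 degrees)
print(f"Total: {sum(angles.values()):.2f} degrees")
```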
4.4 Measures of Central Tendency
A. Mean
It is the most common measure of central location. It is obtained by dividing the sum of all the observed values by the number of observations. In computing the mean, we use

$\bar{x} = \dfrac{\sum x}{n}$

where x is the value of each observation in the sample, and
n is the total number of observations in the sample.

It is worth noting that the mean has the following characteristics:

1. The mean is affected by the presence of extreme values.
2. The sum of the deviations of the observations from the mean is zero.
3. The sum of the squared deviations of the observations from the mean is a minimum; it is smaller than the sum of squared deviations about any other value.
4. It is a good measure for interval and ratio data.
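The characteristics above can be checked numerically. This sketch uses a small hypothetical data set, 2, 4, 6, 8, 30, in which 30 plays the role of an extreme value; it illustrates characteristics 1 to 3 and is not a proof.

```python
from statistics import mean

# Hypothetical data set; 30 is an extreme value
data = [2, 4, 6, 8, 30]
x_bar = mean(data)  # (2 + 4 + 6 + 8 + 30) / 5 = 10

# Characteristic 1: the extreme value pulls the mean upward
print("mean with 30:", x_bar, "| mean without 30:", mean(data[:-1]))

# Characteristic 2: deviations from the mean sum to zero
deviations = [x - x_bar for x in data]
print("sum of deviations:", sum(deviations))


# Characteristic 3: squared deviations are smallest about the mean
def sum_sq(c):
    """Sum of squared deviations of the data about the point c."""
    return sum((x - c) ** 2 for x in data)


for c in (x_bar - 1, x_bar, x_bar + 1):
    print(f"sum of squared deviations about {c}: {sum_sq(c)}")
```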
B. Median
It is the middle value of a set of observations arranged in increasing or decreasing order. This measure divides the data into two halves with an equal number of observations.

The median has the following characteristics:

1. It is not affected by the presence of extreme observations.
2. The sum of the absolute deviations of the observations from the median is a minimum.
3. It is an appropriate measure for ordinal data.
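Using a hypothetical data set (2, 4, 6, 8, 30), this sketch contrasts the median with the mean when the extreme value is replaced, and checks characteristic 2 at a few candidate points. It is an illustration, not a proof.

```python
from statistics import mean, median

# Hypothetical data set with an extreme value (30)
data = [2, 4, 6, 8, 30]
md = median(data)  # middle value of the sorted data: 6

# Characteristic 1: replacing the extreme value leaves the median unchanged,
# while the mean moves noticeably
less_extreme = [2, 4, 6, 8, 10]
print("median:", md, "->", median(less_extreme))
print("mean:  ", mean(data), "->", mean(less_extreme))


# Characteristic 2: absolute deviations are smallest about the median
def sum_abs(c):
    """Sum of absolute deviations of the data about the point c."""
    return sum(abs(x - c) for x in data)


for c in (md - 1, md, md + 1, mean(data)):
    print(f"sum of absolute deviations about {c}: {sum_abs(c)}")
```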
C. Mode
It is the value that occurs most often. Note that a data set can have two modes; in that case, the distribution of the data set is bimodal. When a data set has more than two modes, the distribution is called multimodal.

The mode has the following characteristics:

1. The mode is determined by frequency.
2. It is an appropriate measure for nominal data.
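Python's standard statistics.multimode (available since Python 3.8) returns every value tied for the highest frequency, which makes the bimodal and multimodal cases visible. The data sets below are hypothetical.

```python
from statistics import multimode

# Hypothetical data sets (not from the module) illustrating the cases above
unimodal = ["red", "blue", "red", "green"]  # nominal data, one mode
bimodal = [1, 2, 2, 3, 3, 4]                # two modes
multimodal = [5, 5, 6, 6, 7, 7, 8]          # more than two modes

# multimode returns every value that has the highest frequency,
# in the order the values first appear
for name, data in [("unimodal", unimodal), ("bimodal", bimodal), ("multimodal", multimodal)]:
    print(name, multimode(data))
```

Because multimode only counts occurrences, it also works on nominal labels such as colors, matching characteristic 2.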
Example 1 (for ungrouped data)

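The following are the 3rd year math grades of an applied math student:
1.6 1.2 1.9 1.5 1.5 1.5 1.0 1.3 1.0

Mean:
$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_9}{9} = \dfrac{1.6 + 1.2 + 1.9 + 1.5 + 1.5 + 1.5 + 1.0 + 1.3 + 1.0}{9} = \dfrac{12.5}{9} \approx 1.39$

Median:
Arranged in increasing order: 1.0 1.0 1.2 1.3 1.5 1.5 1.5 1.6 1.9
The median is the 5th (middle) value of the 9 observations: 1.5

Mode: 1.5 (it occurs three times)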
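The same results can be obtained with Python's standard statistics module. This is a minimal check of the hand computation, not part of the original module.

```python
from statistics import mean, median, mode

# 3rd year math grades of the applied math student (Example 1)
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]

print("Sorted:", sorted(grades))
print("Mean:  ", round(mean(grades), 2))  # 12.5 / 9
print("Median:", median(grades))          # 5th of the 9 sorted values
print("Mode:  ", mode(grades))            # occurs 3 times
```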
Example 2 (for grouped data)

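Class limit   f    x      fx      <cf   Class boundaries
60 – 67       2    63.5   127     30    59.5 – 67.5
52 – 59       2    55.5   111     28    51.5 – 59.5
44 – 51       6    47.5   285     26    43.5 – 51.5
36 – 43       10   39.5   395     20    35.5 – 43.5
28 – 35       7    31.5   220.5   10    27.5 – 35.5
20 – 27       3    23.5   70.5    3     19.5 – 27.5
Total         30          1209

(f = frequency, x = class mark, fx = f times x, <cf = less-than cumulative frequency, accumulated from the lowest class)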

The mean for grouped data is given by

$\bar{x} = \dfrac{\sum f_i x_i}{n}$

where f_i is the frequency of the ith class interval and
x_i is the class mark of the ith interval.

Solving for the mean:
$\bar{x} = \dfrac{127 + 111 + 285 + 395 + 220.5 + 70.5}{30} = \dfrac{1209}{30} = 40.3$
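The grouped-mean computation can be sketched in Python as follows, assuming each class is stored as a (lower limit, upper limit, frequency) tuple, a representation chosen here for illustration. The class mark is the midpoint of the class limits, as in the table.

```python
# Grouped data from Example 2: (lower class limit, upper class limit, frequency)
classes = [
    (60, 67, 2),
    (52, 59, 2),
    (44, 51, 6),
    (36, 43, 10),
    (28, 35, 7),
    (20, 27, 3),
]

n = sum(f for _, _, f in classes)  # total frequency

# Class mark x_i is the midpoint of each class; multiply it by f_i and add up
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)

grouped_mean = sum_fx / n
print(f"n = {n}, sum of fx = {sum_fx}, mean = {grouped_mean:.1f}")
```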
The median for grouped data is given by

$Md = LCB + \left(\dfrac{\frac{n}{2} - cf_p}{f_m}\right) i$

where LCB is the lower boundary of the median class,
i is the size of the class interval,
cf_p is the cumulative frequency of the interval preceding the median class, and
f_m is the frequency of the median class.

Median class – the class whose cumulative frequency is equal to n/2 or next higher.
Solving for the median:
n/2 = 30/2 = 15, so the median class is 36 – 43 (<cf = 20).

Lower boundary of the median class: LCB = 35.5
Cumulative frequency before the median class: cf_p = 10
Frequency of the median class: f_m = 10
Class size: i = 8

$Md = 35.5 + \left(\dfrac{15 - 10}{10}\right)(8) = 35.5 + 4 = 39.5$
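A sketch of the grouped median formula under two stated assumptions: classes are stored as hypothetical (lower limit, upper limit, frequency) tuples, and because the limits are whole numbers, each class boundary lies 0.5 beyond its limit (as in the table's boundary column). Cumulative frequencies are accumulated from the lowest class.

```python
# Grouped data from Example 2: (lower class limit, upper class limit, frequency)
classes = [(60, 67, 2), (52, 59, 2), (44, 51, 6), (36, 43, 10), (28, 35, 7), (20, 27, 3)]


def grouped_median(classes):
    """Md = LCB + ((n/2 - cf_p) / f_m) * i, for whole-number class limits."""
    ordered = sorted(classes)  # ascending by lower class limit
    n = sum(f for _, _, f in ordered)
    half = n / 2
    cf = 0  # cumulative frequency of the classes below the current one
    for lo, hi, f in ordered:
        if cf + f >= half:   # first class whose <cf reaches n/2: the median class
            lcb = lo - 0.5   # assumption: boundary 0.5 below the lower limit
            i = hi - lo + 1  # class size, 8 here
            return lcb + (half - cf) / f * i
        cf += f
    raise ValueError("no classes given")


print(f"Median = {grouped_median(classes):.1f}")
```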
The mode for grouped data is given by

$Mo = LCB + \left(\dfrac{f_m - f_1}{2f_m - f_1 - f_2}\right) i$

where LCB is the lower boundary of the modal class,
i is the size of the class interval,
f_m is the frequency of the modal class,
f_1 is the frequency of the class preceding the modal class, and
f_2 is the frequency of the class following the modal class.

Modal class – the class with the highest frequency.

Solving for the mode:
The modal class is 36 – 43 (f_m = 10), so LCB = 35.5, f_1 = 7 (class 28 – 35), f_2 = 6 (class 44 – 51), and i = 8.

$Mo = 35.5 + \left(\dfrac{10 - 7}{20 - 7 - 6}\right)(8) = 35.5 + \dfrac{24}{7} \approx 38.9$
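The grouped mode formula can be sketched in the same way. Assumptions: classes are stored as hypothetical (lower limit, upper limit, frequency) tuples, boundaries lie 0.5 beyond whole-number limits, and a missing neighboring class (when the modal class is the first or last class) is treated as frequency 0, a case the module does not cover.

```python
# Grouped data from Example 2: (lower class limit, upper class limit, frequency)
classes = [(60, 67, 2), (52, 59, 2), (44, 51, 6), (36, 43, 10), (28, 35, 7), (20, 27, 3)]


def grouped_mode(classes):
    """Mo = LCB + ((f_m - f_1) / (2 f_m - f_1 - f_2)) * i, for whole-number class limits."""
    ordered = sorted(classes)  # ascending by lower class limit
    # Modal class: the class with the highest frequency (first one if tied)
    k = max(range(len(ordered)), key=lambda j: ordered[j][2])
    lo, hi, f_m = ordered[k]
    # Assumption: a missing neighboring class counts as frequency 0
    f_1 = ordered[k - 1][2] if k > 0 else 0                 # class preceding (below)
    f_2 = ordered[k + 1][2] if k + 1 < len(ordered) else 0  # class following (above)
    lcb = lo - 0.5   # assumption: boundary 0.5 below the lower limit
    i = hi - lo + 1  # class size, 8 here
    return lcb + (f_m - f_1) / (2 * f_m - f_1 - f_2) * i


print(f"Mode = {grouped_mode(classes):.1f}")
```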
4.5 Measures of Variability
Variability for Ungrouped Data

• Range - The range (R) is defined as the difference between the highest value (HV) and the lowest value (LV) in the data. That is,

$R = HV - LV$
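• Variance
It is defined as the average of the squared deviations from the mean (for a sample, the sum of squared deviations is divided by n − 1). It is the measure that considers the position of each observation relative to the mean.

$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$ or $s^2 = \dfrac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$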
• Standard Deviation (the most widely encountered) - It is the measure of the spread or dispersion of scores from the mean of the distribution. It is the square root of the variance.

$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\dfrac{n\sum x^2 - \left(\sum x\right)^2}{n(n - 1)}}$
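The module gives no numeric example here, so this sketch reuses the grades from Example 1 in section 4.4 to compute the range, the variance by both formulas, and the standard deviation. Python's statistics.variance and statistics.stdev also divide by n − 1, so they serve as a cross-check.

```python
import math
import statistics

# Reusing the grades from Example 1 in section 4.4 as sample data
grades = [1.6, 1.2, 1.9, 1.5, 1.5, 1.5, 1.0, 1.3, 1.0]
n = len(grades)

# Range: highest value minus lowest value
R = max(grades) - min(grades)

# Definitional formula: squared deviations from the mean, divided by n - 1
x_bar = sum(grades) / n
s2_def = sum((x - x_bar) ** 2 for x in grades) / (n - 1)

# Computational formula: (n * sum(x^2) - (sum x)^2) / (n (n - 1))
sum_x = sum(grades)
sum_x2 = sum(x * x for x in grades)
s2_comp = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Standard deviation is the square root of the variance
s = math.sqrt(s2_def)
print(f"R = {R:.1f}, s^2 = {s2_def:.4f} (check: {s2_comp:.4f}), s = {s:.4f}")
print(statistics.variance(grades), statistics.stdev(grades))
```

The two variance formulas agree up to floating-point rounding; the computational form only needs the running totals of x and x².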

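Variability for Grouped Data

Range: $R = \text{Highest class mark} - \text{Lowest class mark}$

Variance: $s^2 = \dfrac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n - 1)}$

Standard Deviation: $s = \sqrt{\dfrac{n\sum fx^2 - \left(\sum fx\right)^2}{n(n - 1)}}$

where f is the frequency and x is the class mark of each class.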
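Applying these formulas to the grouped data of Example 2 in section 4.4 (a computation the module does not carry out) gives a sketch like the following. Classes are stored as hypothetical (lower limit, upper limit, frequency) tuples, and class marks are the midpoints of the limits.

```python
import math

# Grouped data from Example 2 in section 4.4: (lower limit, upper limit, frequency)
classes = [(60, 67, 2), (52, 59, 2), (44, 51, 6), (36, 43, 10), (28, 35, 7), (20, 27, 3)]

marks = [(lo + hi) / 2 for lo, hi, _ in classes]  # class marks x
freqs = [f for _, _, f in classes]
n = sum(freqs)

# Range: highest class mark minus lowest class mark
R = max(marks) - min(marks)

# Totals needed by the computational formula
sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))

s2 = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
s = math.sqrt(s2)
print(f"R = {R}, s^2 = {s2:.2f}, s = {s:.2f}")
```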
4.6 Testing a Statistical Hypothesis
Hypothesis testing is the most significant area of statistical inference. It is a step-by-step process for making inferences (conclusions) about a population.

In practice, the truth value of a statistical hypothesis is assessed by taking a portion (a sample) of the population of interest and using the information obtained from this sample to decide whether the statistical hypothesis is likely to be true or false. We either “reject” the statistical hypothesis when the sample is inconsistent with it, or “do not reject” it otherwise. Note that rejecting a statistical hypothesis indicates that it is likely false, but not rejecting it does not necessarily mean it is true; it only implies that there is not enough evidence to reject it.
Types of Statistical Hypothesis

We use the term null hypothesis, denoted by H0, for the hypothesis we want to test, that is, to either reject or not reject. If the null hypothesis is rejected, the alternative hypothesis, denoted by H1, is accepted. The null hypothesis H0 is stated so that it specifies an exact value, while the alternative hypothesis H1 allows for a range of possible values. For example, if the null hypothesis is H0: μ = 8, the alternative hypothesis H1 might be μ < 8, μ > 8, or μ ≠ 8.
Types of Statistical Tests

If the alternative hypothesis of a statistical test is one-sided, for example, H1: μ < 8 or H1: μ > 8, the test is said to be one-tailed. On the other hand, if the alternative hypothesis is two-sided, for example, H1: μ ≠ 8, the test is said to be two-tailed.

Types of Error

Because the decision to reject or not reject a statistical hypothesis about a population parameter is based on a sample, it might lead to wrong conclusions. For instance, a researcher could reject H0 when, in fact, it is true. This is called a type I error. Also, one might accept H0 even when it is false. In this case, a type II error has occurred.
Constructing the Null and Alternative Hypothesis

A.Testing for Means

In hypothesis testing, means, variances, or proportions may be compared to decide whether to reject or accept the null hypothesis. In many instances, the sample means of experimental and control groups are compared.
Example 1

1. A researcher wants to know if the average test score of the students taking a particular examination is 80.

H0: μ = 80 (the average test score of the students taking a particular examination is 80)
H1: μ ≠ 80 (the average test score of the students taking a particular examination is not 80)

2. A small group of researchers is conducting a study to determine whether the average number of hours a student spends on social media sites per day is greater than 10.

H0: μ = 10 (the average number of hours a student spends on social media sites per day is 10)
H1: μ > 10 (the average number of hours a student spends on social media sites per day is greater than 10)

3. A teacher wants to know if there is a difference in the performance of his two classes based on their average grades.

H0: μ1 = μ2 (there is no difference in the performance of his two classes based on their average grades)
H1: μ1 ≠ μ2 (there is a difference in the performance of his two classes based on their average grades)

4. A researcher wants to study if the customer satisfaction level of cable television company A is greater than that of cable television company B.

H0: μ1 = μ2 (the customer satisfaction levels of the two competing cable television companies are the same)
H1: μ1 > μ2 (the customer satisfaction level of cable television company A is greater than that of cable television company B)

5. A clinical trial is conducted to compare three different weight loss programs based on the average weight measured among three groups at the end of the program.

H0: μ1 = μ2 = μ3 (there is no difference among the three weight loss programs)
H1: at least two of the means are not equal (there is a difference among the three weight loss programs)
B. Testing for Independence

The chi-square (χ²) test is used to test the independence of two variables. In other words, this test is used to determine whether two variables are related or not, based on the sample data collected on both variables.
Example 2
1. A survey is conducted to test if the grades of the students are associated with the number of hours they spend on social media sites.
H0: The grades of the students are not associated with the number of hours they spend on social media sites.
H1: The grades of the students are associated with the number of hours they spend on social media sites.

2. A study is conducted to determine whether daily consumption depends on the age level of a person.
H0: The daily consumption does not depend on the age level of a person.
H1: The daily consumption depends on the age level of a person.
C. Correlation

To determine whether two variables (usually x and y) are linearly related, correlation is the statistical method to be used. In this method, the data collected on two numerical variables are tested to determine the strength of their relationship, estimated by the sample correlation coefficient r given by

$r = \dfrac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}$

where −1 ≤ r ≤ 1 and n is the number of data pairs.
If the value of r is close to +1, there is a strong positive linear relationship between the two variables. If r is close to −1, there is a strong negative linear relationship between them. However, if the two variables have a weak or no linear relationship, r is close to 0.

Example 3
1. A study is conducted to show how strong the relationship is between the sleeping habits of employees and their level of performance at work.

H0: The sleeping habits of employees are not related to their level of performance at work.
H1: The sleeping habits of employees are related to their level of performance at work.

2. A student wants to know if his grade in Mathematics is associated with his grade in English.

H0: His grade in Mathematics is not associated with his grade in English.
H1: His grade in Mathematics is associated with his grade in English.
3. A researcher wishes to see whether there is a relationship between the number of hours of study and test scores on an exam. The following data were obtained.

Student   Hours of Study   Grade
A         7                83
B         3                63
C         2                60
D         6                88
E         3                68
F         4                75
Solution:
To solve for the correlation coefficient r, we must first find the values of xy, x², and y².

Student   Hours of Study (x)   Grade (y)   xy     x²    y²
A         7                    83          581    49    6889
B         3                    63          189    9     3969
C         2                    60          120    4     3600
D         6                    88          528    36    7744
E         3                    68          204    9     4624
F         4                    75          300    16    5625
Σ         25                   437         1922   123   32451
Substituting the values into the formula,

$r = \dfrac{6(1922) - (25)(437)}{\sqrt{\left[6(123) - 25^2\right]\left[6(32451) - 437^2\right]}} = \dfrac{607}{\sqrt{(113)(3737)}}$

r ≈ 0.934

Since the correlation coefficient is close to +1, it indicates a strong positive linear relationship between the number of hours of study and the exam scores of the students.
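The hand computation above can be checked with a short Python sketch that builds the column totals and applies the formula for r directly. On Python 3.10 or later, statistics.correlation(x, y) should give the same Pearson coefficient.

```python
import math

# Hours of study (x) and exam grades (y) for students A to F (Example 3)
x = [7, 3, 2, 6, 3, 4]
y = [83, 63, 60, 88, 68, 75]
n = len(x)  # number of data pairs

# Column totals of the worked table
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_x2 = sum(a * a for a in x)
sum_y2 = sum(b * b for b in y)

# Sample correlation coefficient
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(f"r = {r:.3f}")
```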
D. Regression

Computing the correlation coefficient means determining the strength of the relationship between two numerical variables. When the resulting correlation coefficient is significant, regression analysis can be done. Regression is used to understand the movement or trend of the given data so that predictions can be made.
The regression equation is given by $y' = a + bx$

where

$a = \dfrac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$

$b = \dfrac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$
Example 4

Let us take the example in the correlation section, since a strong linear relationship exists between the number of hours of study and the exam scores of the students.
Solution:
Since the sums Σx, Σy, Σxy, and Σx² are needed to solve for a and b, we compute them first.

Student   Hours of Study (x)   Grade (y)   xy     x²    y²
A         7                    83          581    49    6889
B         3                    63          189    9     3969
C         2                    60          120    4     3600
D         6                    88          528    36    7744
E         3                    68          204    9     4624
F         4                    75          300    16    5625
Σ         25                   437         1922   123   32451
Then we have,

$a = \dfrac{(437)(123) - (25)(1922)}{6(123) - (25)^2} = \dfrac{5701}{113} \approx 50.451$

$b = \dfrac{6(1922) - (25)(437)}{6(123) - (25)^2} = \dfrac{607}{113} \approx 5.372$

Hence, the equation of the regression line is
y′ = 50.451 + 5.372x
Suppose we want to know the predicted grade (y′) of a student who studies for x hours. For example, let x = 9. Then,
y′ = 50.451 + 5.372(9)
y′ = 98.80
Let x = 5. Then,
y′ = 50.451 + 5.372(5)
y′ = 77.31
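The regression coefficients and the two predictions can be reproduced with the sketch below, which applies the formulas for a and b from this section to the data of Example 3. The helper name predict is hypothetical.

```python
# Hours of study (x) and exam grades (y) for students A to F
x = [7, 3, 2, 6, 3, 4]
y = [83, 63, 60, 88, 68, 75]
n = len(x)

# Sums needed for a and b
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(p * q for p, q in zip(x, y))
sum_x2 = sum(p * p for p in x)

denominator = n * sum_x2 - sum_x ** 2                # 6(123) - 25^2 = 113
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope


def predict(hours):
    """Predicted grade y' = a + b * hours (hypothetical helper)."""
    return a + b * hours


print(f"y' = {a:.3f} + {b:.3f}x")
for hours in (9, 5):
    print(f"x = {hours}: y' = {predict(hours):.2f}")
```

On Python 3.10 or later, statistics.linear_regression(x, y) returns the slope and intercept and can serve as a cross-check.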
