Data-Management-Lecture-Notes

About data management

Uploaded by inocencioandreanicole356

Filamer Christian University

Roxas Avenue, Roxas City

DATA MANAGEMENT

I- Basic Concepts in Statistics

Definition of Terms:

• Statistics - the science of collecting, organizing, summarizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
• Descriptive Statistics - methods of collecting, organizing, summarizing, and presenting data in an informative way.
• Inferential Statistics - methods concerned with analyzing a subset of the data to make predictions or inferences about the entire set of data, that is, to generalize results beyond the data collected.
• Population - the entire set of individuals or objects being studied.
• Sample - a portion or part of the population of interest.
• Statistic - a descriptive measure or characteristic of a sample.
• Census - the process of gathering information from every element of the population.
• Survey - the process of gathering information from every element of a sample.
• Variable - an observable characteristic of a person or object that can assume different values.
• Data - the measurements or observations recorded for a variable.

Types of Variables

1. Qualitative - non-numeric

2. Quantitative - can be reported numerically

Qualitative          Quantitative
Civil status         Age
Nationality          No. of children
Degree earned        Ounces of ice cream
Gender
Letter grade
Job/occupation

Types of Quantitative Variables

1. Discrete Variables

- can assume only values that can be counted (e.g., number of children)

2. Continuous Variables

- can assume an infinite number of values between any two specific values (e.g., ounces of ice cream)

Levels of Measurement

Nominal Level - characterized by data that consist of names, labels, or categories only. The data cannot
be arranged in an ordering scheme (such as low to high).
Example:
Survey responses as yes, no, undecided
Ordinal Level - involves data that may be arranged in some order, but differences between data values
either cannot be determined or are meaningless.
Example:
Course grades A, B+, B, C+, C, D, F or AF
Interval Level - like the ordinal level, with the additional property that the difference between any two
data values is meaningful. However, there is no natural zero starting point (where none of the quantity
is present).
Example:
IQ, Temperature
Ratio Level - possesses the characteristics of the interval level, and there exists a true zero. Both differences and ratios are meaningful.
Example:
Height, Salary, Time

Table 1: Examples of Levels of Measurement

Nominal Level            Ordinal Level                       Interval Level   Ratio Level
Zip code                 Judging (1st, 2nd, 3rd)             SAT score        Weight
Gender                   Rating scale (Average, Good, VG)    SAI              Age
Eye color
Political affiliation
Major field
Nationality

Methods of Collecting Data

1. Direct or Interview method - one of the most effective methods of collecting original data. To obtain accurate responses, the interviews should be conducted by well-trained interviewers, who can also help respondents with questions they do not understand.

2. Indirect or Questionnaire method - one of the easiest methods of data gathering, although questionnaires take time to prepare because they need to be attractive. They can include illustrations, pictures, and sketches, and their contents, especially the directions, must be precise, clear, and self-explanatory.

3. Registration method - through this method, respondents provide information in compliance with certain laws, policies, rules, regulations, decrees, or standard practices. Examples are marriage contracts, birth certificates, motor vehicle registrations, firearm licenses, and registrations of corporations, real estate, voters, etc.
4. Other methods

4.1. Observation - used to gather data on the attitudes, behavior, values, and cultural patterns of the samples under investigation.

4.2. Telephone interview - used when the questions to be asked are few and brief. An example is checking which radio programs people listen to by asking each listener what program the radio is tuned to.

4.3. Experiments - used to gather data when the investigator wants to control the factors affecting the variable being studied. An example is a study that aims to determine the factors affecting students' academic performance, such as the methods or approaches used in teaching.

Presenting and Describing Data

1. Textual form - the data are presented in narrative or paragraph form, within the text of the paragraph. This form may not immediately capture the reader's interest; however, it can give a more comprehensive picture of the data because the nature of the data is explained in writing.

2. Tabular form - uses rows and columns, as in a frequency table or frequency distribution. The data are presented in a systematic and orderly manner that catches the reader's attention and may make the data easier to comprehend and analyze.

3. Graphical form - the numerical data in a frequency distribution can be made more interesting and easier to understand when depicted in a graph. A graph is a pictorial or geometric representation of the given data.

Organizing Data

Categorical Distribution

Twenty-five inductees were given a blood test to determine their blood type.

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

Categorical Frequency Distribution Table

Class   Tally          Frequency   Percent
A       IIIII          5           20%
B       IIIII II       7           28%
O       IIIII IIII     9           36%
AB      IIII           4           16%
Total                  25          100%

More inductees have type O blood than any other type.
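The tally can be checked with a short Python sketch (Python is chosen here only for illustration; the notes themselves use no code). It assumes the 25 blood types are entered row by row exactly as listed in the data above.

```python
from collections import Counter

# Blood types of the 25 inductees, entered row by row from the data above
blood_types = """
A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A
""".split()

counts = Counter(blood_types)
total = len(blood_types)

# One row per class: class, frequency, percent of the total
for blood_type in ["A", "B", "O", "AB"]:
    freq = counts[blood_type]
    percent = 100 * freq / total
    print(f"{blood_type:<3} {freq:>3} {percent:>5.0f}%")
```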
Grouped Frequency Distribution

Ages of Patients in Hospital A

25 28 27 30 32 25 31 26 29
31 20 21 32 18 50 53 60 50
45 40 37 25 20 27 32 24 29
25 24 10 12 15 28 6 54 30

Grouped Frequency Distribution Table

Class Limits   Class Boundaries   Tally                      Frequency
6 – 14         5.5 – 14.5         III                        3
15 – 23        14.5 – 23.5        IIIII                      5
24 – 32        23.5 – 32.5        IIIII IIIII IIIII IIIII    20
33 – 41        32.5 – 41.5        II                         2
42 – 50        41.5 – 50.5        III                        3
51 – 59        50.5 – 59.5        II                         2
60 – 68        59.5 – 68.5        I                          1
Total                                                        36
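A minimal Python sketch of the grouped table, assuming the class width of 9 and the first lower limit of 6 used in the table. The class boundaries are taken 0.5 beyond each class limit because the ages are whole numbers.

```python
# Ages of the 36 patients in Hospital A
ages = [25, 28, 27, 30, 32, 25, 31, 26, 29,
        31, 20, 21, 32, 18, 50, 53, 60, 50,
        45, 40, 37, 25, 20, 27, 32, 24, 29,
        25, 24, 10, 12, 15, 28, 6, 54, 30]

lowest_limit = 6   # lower limit of the first class
width = 9          # class width (6-14, 15-23, ...)
num_classes = 7

table = []
for k in range(num_classes):
    lower = lowest_limit + k * width
    upper = lower + width - 1
    # boundaries sit halfway between consecutive classes (0.5 beyond each limit)
    lower_boundary, upper_boundary = lower - 0.5, upper + 0.5
    # count the ages that fall inside the class limits
    freq = sum(lower <= age <= upper for age in ages)
    table.append((lower, upper, lower_boundary, upper_boundary, freq))
    print(f"{lower}-{upper}\t{lower_boundary}-{upper_boundary}\t{freq}")

print("Total", sum(row[4] for row in table))
```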
Data Presentation

Histogram - a bar graph in which the horizontal scale represents classes of data values and the vertical scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars are drawn adjacent to each other (without gaps).

Figure 1: Histogram

Frequency Polygon

- uses line segments connected to points located directly above class midpoint values.
Ogive

- a line graph that depicts cumulative frequencies, just as a cumulative frequency distribution does.

- uses class boundaries along the horizontal scale; the graph begins at the lower boundary of the first class and ends at the upper boundary of the last class.
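As a sketch, the points of an ogive for the Hospital A ages can be computed from the class boundaries and frequencies of the grouped table (values copied from that table; the plotting step itself is left out).

```python
from itertools import accumulate

# Class boundaries and frequencies from the Hospital A grouped frequency table
boundaries = [5.5, 14.5, 23.5, 32.5, 41.5, 50.5, 59.5, 68.5]
frequencies = [3, 5, 20, 2, 3, 2, 1]

# The ogive starts at the first lower boundary with cumulative frequency 0,
# then rises to the running total at each upper boundary
cumulative = [0] + list(accumulate(frequencies))
points = list(zip(boundaries, cumulative))
for x, cf in points:
    print(x, cf)
```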

Pie Graph

- used to visually depict qualitative data

- a circle divided into sections according to the percentage of frequencies in each category of
the distribution
Bar Graph

- represents the data by using vertical or horizontal bars whose heights or lengths represent
the frequencies of the data

Line Graph

- used to show trends, such as increases or decreases in sales, scores, or population per year.
Pareto Chart

- a type of chart that contains both bars and a line graph, where individual values are
represented in descending order by bars, and the cumulative total is represented by the line
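A rough sketch of the numbers behind a Pareto chart, reusing the blood-type frequencies from the categorical table as an illustrative input (the notes do not give a Pareto example of their own).

```python
from itertools import accumulate

# Blood-type frequencies from the categorical frequency distribution
frequencies = {"A": 5, "B": 7, "O": 9, "AB": 4}
total = sum(frequencies.values())

# Bars: categories sorted in descending order of frequency
bars = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
# Line: cumulative percentage of the total, in the same order as the bars
cumulative_pct = [100 * c / total for c in accumulate(f for _, f in bars)]

for (category, freq), pct in zip(bars, cumulative_pct):
    print(f"{category:<3}{freq:>3}{pct:>7.0f}%")
```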

Time Series Graph

- displays data that have been collected at different points in time

Measures of Central Tendency

Once data are collected, it is useful to summarize the data set by identifying a value around which data
are centered.

Arithmetic Mean or Average

- numerical balancing point of the data set. It is calculated by adding all the data values and dividing the
sum by the total number of data points.

Sample Mean (x̄)

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Population Mean (µ)

$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example: 6 test scores in Statistics

12 10 15 8 5 18

$$\bar{x} = \frac{12 + 10 + 15 + 8 + 5 + 18}{6} = \frac{68}{6} = 11.333\ldots$$

The mean test score in Statistics is approximately 11.3.
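The same computation in Python, as a quick check of the arithmetic above:

```python
# The six Statistics test scores from the example
scores = [12, 10, 15, 8, 5, 18]

# Sample mean: sum of the values divided by the number of values, n
n = len(scores)
mean = sum(scores) / n
print(f"sum = {sum(scores)}, n = {n}, mean = {mean:.2f}")  # sum = 68, n = 6, mean = 11.33
```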

Median (Md)
- simply the middle number in an ordered set of data
If n is odd: Md = the middle score
If n is even: Md = (sum of the two middle scores) / 2
Example:
1. Six students borrowed these numbers of books from a library:
1, 2, 3, 4, 6, 7 (3 and 4 are the middle scores)
Md = (3 + 4) / 2 = 3.5
2. Five students borrowed these numbers of books from a library:
2, 2, 3, 3, 3
Md = 3
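A small Python function that follows the odd/even rule above. It is a hypothetical helper written for illustration; Python's standard statistics.median gives the same results.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n odd: the middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: mean of the two middle scores

print(median([1, 2, 3, 4, 6, 7]))  # 3.5
print(median([2, 2, 3, 3, 3]))     # 3
```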

Mode (Mo)
- most frequently occurring number in a data set
Examples:
1. 2 5 8 3 2 2 3 1
(Mode is 2)
2. 1 1 3 5 7 8 3 4
(Bimodal; the modes are 1 and 3)
3. 2 4 6 8 9 3 1 5
(No mode)
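A sketch of the mode rule in Python. The helper `modes` is hypothetical; it returns an empty list when no value repeats, matching the "no mode" convention in the examples. Python's statistics.multimode would instead return every value in that case.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency; an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 5, 8, 3, 2, 2, 3, 1]))  # [2]
print(modes([1, 1, 3, 5, 7, 8, 3, 4]))  # [1, 3]  (bimodal)
print(modes([2, 4, 6, 8, 9, 3, 1, 5]))  # []      (no mode)
```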

Midrange (MR)

- the value midway between the highest and lowest values in the original data set

$$MR = \frac{\text{highest score} + \text{lowest score}}{2}$$
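For illustration, the midrange formula applied to the six Statistics test scores from the mean example (this data set is reused here only as an example; it is not part of the midrange notes).

```python
scores = [12, 10, 15, 8, 5, 18]  # test scores from the mean example

# Midrange: halfway between the highest and lowest values
midrange = (max(scores) + min(scores)) / 2
print(midrange)  # 11.5
```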

Weighted Mean

- multiply each value by its corresponding weight and divide the sum of the products by the sum of the weights.

$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

where $w_1, w_2, w_3, \ldots, w_n$ are the weights and $x_1, x_2, x_3, \ldots, x_n$ are the values.

Example:

Subject    Units (w)   Letter Grade   Numerical Value (x)
PE         2           A              4
Calculus   5           C+             2.5
English    3           B              3

$$\bar{x} = \frac{2(4) + 5(2.5) + 3(3)}{2 + 5 + 3} = \frac{29.5}{10} = 2.95$$

The QPI is 2.95.
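The QPI computation as a short Python sketch, with each subject stored as a (units, numerical value) pair:

```python
# (units, numerical grade) for each subject in the QPI example
subjects = {"PE": (2, 4.0), "Calculus": (5, 2.5), "English": (3, 3.0)}

weighted_sum = sum(w * x for w, x in subjects.values())  # 8 + 12.5 + 9 = 29.5
total_weight = sum(w for w, _ in subjects.values())       # 10 units
qpi = weighted_sum / total_weight
print(round(qpi, 2))  # 2.95
```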

Measures of Dispersion

• Range - difference between the highest and lowest values

• Standard Deviation - never negative; equals zero only if all the data values are the same; uses the same units as the original data set; can be influenced by outliers

• Variance - the square of the standard deviation; it does not use the same units as the data

• Population Standard Deviation (σ)

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

where X - individual value

µ - population mean

N - population size

Sample Standard Deviation (s)

$$s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

where X - individual value

X̄ - sample mean
n - sample size
Let A = 5, 5, 5, 5, 5, 5, 5, 5, 5, 5

B = 4, 4, 4, 5, 5, 5, 5, 6, 6, 6

C = 0, 0, 0, 0, 0, 10, 10, 10, 10, 10

D = 0, 5, 5, 5, 5, 5, 5, 5, 5, 10

• Find the mean and median for each data set.

• Find the range and the standard deviation of each data set.
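One way to check answers to this exercise is with Python's standard statistics module. Since the notes do not say whether A to D are samples or populations, this sketch reports both s (statistics.stdev, divisor n - 1) and σ (statistics.pstdev, divisor N).

```python
import statistics

data_sets = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "C": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "D": [0, 5, 5, 5, 5, 5, 5, 5, 5, 10],
}

results = {}
for name, values in data_sets.items():
    results[name] = {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "s": statistics.stdev(values),       # sample SD, divides by n - 1
        "sigma": statistics.pstdev(values),  # population SD, divides by N
    }
    r = results[name]
    print(f"{name}: mean={r['mean']}, median={r['median']}, range={r['range']}, "
          f"s={r['s']:.3f}, sigma={r['sigma']:.3f}")
```

All four sets share a mean and median of 5, yet their ranges and standard deviations differ, which is what the measures of dispersion are meant to reveal.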

Measures of Position

• Percentiles - divide the data set into 100 equal groups

• Deciles - divide the data set into 10 equal groups

• Quartiles - divide the data set into quarters

• z-score (standard score)

- the z-score for a given data value x is the number of standard deviations that x is above or below the mean of the data

$$\text{population: } z = \frac{x - \mu}{\sigma} \qquad \text{sample: } z = \frac{x - \bar{x}}{s}$$
Example:

A basketball player, Carl, is 78 inches tall and a volleyball player, Jane, is 76 inches tall. Carl is obviously taller by 2 inches, but which player is relatively taller? Does Carl's height among men exceed Jane's height among women? Men have a mean height of 69 inches and a standard deviation of 2.8 inches, while women have a mean height of 63.6 inches and a standard deviation of 2.5 inches.

$$\text{Carl: } z = \frac{x - \bar{x}}{s} = \frac{78 - 69}{2.8} = 3.21$$

$$\text{Jane: } z = \frac{x - \bar{x}}{s} = \frac{76 - 63.6}{2.5} = 4.96$$

Carl's height is 3.21 standard deviations above the mean, but Jane's height is a whopping 4.96 standard deviations above the mean.

∴ Jane's height among women is relatively greater than Carl's height among men.
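The z-score comparison as a Python sketch, using the men's mean of 69 inches that the worked calculation uses; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

carl = z_score(78, 69, 2.8)    # men: mean 69 in, sd 2.8 in
jane = z_score(76, 63.6, 2.5)  # women: mean 63.6 in, sd 2.5 in
print(f"Carl: {carl:.2f}, Jane: {jane:.2f}")  # Carl: 3.21, Jane: 4.96
print("Jane is relatively taller" if jane > carl else "Carl is relatively taller")
```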

Exercises
1. The average teacher's salary in a particular city is P54,166. If the standard deviation is P10,200, find the salaries corresponding to the following z-scores.

a. 2

b. −1.6

c. 2.5

2. The mean time to download a PDF file is 12 min with a standard deviation of 4 min. Belle's download time is 20 min. John's download time is 6 min. How does Belle's download time compare with John's?

3. Cheryl has taken two quizzes in her history class. She scored 15 on the first quiz, for which the mean of all scores was 12 and the standard deviation was 2.4. Her score on the second quiz, for which the mean of all scores was 11 and the standard deviation was 2.0, was 14. In comparison to her classmates, did Cheryl do better on the first quiz or the second quiz?

4. Roland received a score of 70 on a test for which the mean score was 65.5. Roland has learned that the z-score for his test is 0.6. What is the standard deviation for this set of test scores?
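The z-score formula can be rearranged to solve for a data value, x = mean + z·s, or for the standard deviation, s = (x - mean) / z. A hedged Python sketch of both rearrangements, with hypothetical helper names, applied to part (a) of Exercise 1:

```python
def value_from_z(z, mean, sd):
    """Solve z = (x - mean) / sd for x."""
    return mean + z * sd

def sd_from_z(x, mean, z):
    """Solve z = (x - mean) / sd for sd (z must not be 0)."""
    return (x - mean) / z

# Exercise 1(a): salary with z = 2, mean P54,166, standard deviation P10,200
print(value_from_z(2, 54166, 10200))  # 74566
```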

Box-and-Whisker Plot (box plot)

- used to provide a visual summary of a set of data

- it shows the median, the 1st and 3rd quartiles, and the minimum and maximum values of a data set

Constructing a Box-and-Whisker Plot

1. Find the five-number summary for the data values, that is, the minimum and maximum data values, Q1 and Q3, and the median.

2. Draw a horizontal line with a scale that includes the minimum and maximum data values.

3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median.

4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.

Example:
Construct a box-and-whisker plot for the following data:

Number of Rooms Occupied in a Resort during an 18-day period

86 77 58 45 94 96 83 76 75

65 68 72 78 85 87 92 55 61
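A sketch of step 1 (the five-number summary) for the resort data. It assumes the common "median of each half" method for Q1 and Q3; other quartile conventions, such as the default method of Python's statistics.quantiles, can give slightly different quartiles.

```python
import statistics

rooms = [86, 77, 58, 45, 94, 96, 83, 76, 75,
         65, 68, 72, 78, 85, 87, 92, 55, 61]

def five_number_summary(values):
    """Min, Q1, median, Q3, max, with quartiles as medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median
    upper_half = ordered[(n + 1) // 2 :]  # values above the median (middle value excluded when n is odd)
    return (ordered[0],
            statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half),
            ordered[-1])

print(five_number_summary(rooms))  # (45, 65, 76.5, 86, 96)
```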
