Data-Management-Lecture-Notes


Filamer Christian University

Roxas Avenue, Roxas City

DATA MANAGEMENT

I- Basic Concepts in Statistics

Definition of Terms:

• Statistics - the science of collecting, organizing, summarizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions.
• Descriptive Statistics - methods of collecting, organizing, summarizing, and presenting data in
an informative way.
• Inferential Statistics - methods concerned with the analysis of a subset of data leading to
predictions or inferences about the entire set of data, that is, generalizing results beyond the
data collected.
• Population - the entire set of individuals or objects being studied.
• Sample - a portion or part of the population of interest.
• Statistic - a descriptive measure or characteristic of a sample.
• Census - the process of gathering information from every element of the population.
• Survey - the process of gathering information from every element of a sample.
• Variable - an observable characteristic of a person or object that can assume different values.
• Data - measurements or observations for a variable.

Types of Variables

1. Qualitative - non-numeric

2. Quantitative - can be reported numerically

Qualitative        Quantitative
Civil Status       Age
Nationality        No. of Children
Degree earned      Ounces of ice cream
Gender
Letter Grade
Job/occupation

Types of Quantitative Variables

1. Discrete Variables

- can assume values that can be counted

2. Continuous Variables

- can assume an infinite number of values between two specific values.


Levels of Measurement

Nominal Level - characterized by data that consist of names, labels, or categories only. The data cannot
be arranged in an ordering scheme (such as low to high).
Example:
Survey responses as yes, no, undecided
Ordinal Level - involves data that may be arranged in some order, but differences between data values
either cannot be determined or are meaningless.
Example:
Course grades A, B+, B, C+, C, D, F or AF
Interval Level - like the ordinal level, with the additional property that the difference between any two
data values is meaningful. However, there is no natural zero starting point (where none of the quantity
is present).
Example:
IQ, Temperature
Ratio Level - possesses the characteristics of the interval level, and there exists a true zero. Differences and
ratios are meaningful.
Example:
Height, Salary, Time

Table 1: Examples of Levels of Measurement

Nominal Level           Ordinal Level                      Interval Level   Ratio Level
Zip code                Judging (1st, 2nd, 3rd)            SAT score        Weight
Gender                  Rating scale (Average, good, VG)   SAI              Age
Eye color
Political affiliation
Major field
Nationality

Methods of Collecting Data

1. Direct or Interview method - This is one of the most effective methods of collecting original data. To
obtain accurate responses, well-trained interviewers should conduct the interview. The interviewers can be of
great help to the respondents in answering questions that the respondents do not understand.

2. Indirect or Questionnaire method - This is one of the easiest methods of data gathering. It takes time to
prepare because questionnaires need to be attractive. A questionnaire can include illustrations, pictures, and
sketches. Its contents, especially the directions, must be precise, clear, and self-explanatory.

3. Registration method - Through this method, the respondents provide information in compliance with
certain laws, policies, rules, regulations, decrees, or standard practices. Examples are marriage contracts,
birth certificates, motor vehicle registrations, firearm licenses, and the registration of corporations, real
estate, voters, etc.

4. Other methods

4.1. Observation - This method is used to gather data regarding the attitudes, behavior, values, and cultural
patterns of the samples under investigation.

4.2. Telephone Interview - This method is employed if the questions to be asked are brief and few. An
example is a check made on listeners of certain radio programs, such as asking a listener what program the
radio is tuned in to.

4.3. Experiments - This method is applied to gather data if the investigator wants to control the
factors affecting the variable being studied. An example is when the researcher aims to determine the
different factors affecting the academic performance of students, such as the methods or approaches
used in teaching.

Presenting and Describing Data

1. Textual form - The presentation is in narrative or paragraph form. The data are within the text of the
paragraph. This form may not catch the immediate interest of the reader. However, it can present a more
comprehensive picture of the data because of the further written explanation of its nature.

2. Tabular form - Makes use of rows and columns, as in a frequency table or frequency distribution. The data
are presented in a systematic and orderly manner, which catches one’s attention and may facilitate the
comprehension and analysis of the data presented.

3. Graphical form - The numerical data provided in a frequency distribution can be made more
interesting and easier to understand when depicted in graphical form. A graph is a pictorial or
geometrical representation of a given set of data.

Organizing Data

Categorical Distribution

Twenty-five inductees were given a blood test to determine their blood type.

A B B AB O
O O B AB B
B B O A O
A O O O AB
AB A O B A

Categorical Frequency Distribution Table

Class   Tally         Frequency   Percent
A       IIIII         5           20%
B       IIIII-II      7           28%
O       IIIII-IIII    9           36%
AB      IIII          4           16%

More people have type O blood than any other type.
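
The tally above can be checked with a short script. The following is only a minimal sketch in Python; the function name categorical_distribution is an illustrative choice and not part of the notes.

```python
from collections import Counter

# Blood types of the 25 inductees, row by row as listed in the notes.
blood_types = (
    "A B B AB O "
    "O O B AB B "
    "B B O A O "
    "A O O O AB "
    "AB A O B A"
).split()


def categorical_distribution(values):
    """Return {class: (frequency, percent)} for a non-empty list of labels.

    Hypothetical helper written for this illustration.
    """
    counts = Counter(values)
    total = len(values)
    return {label: (freq, 100 * freq / total) for label, freq in counts.items()}


table = categorical_distribution(blood_types)
for label in ("A", "B", "O", "AB"):
    freq, pct = table[label]
    print(f"{label:<4}{freq:>3}{pct:>6.0f}%")
```

Each percent is the class frequency divided by the total number of observations (25), multiplied by 100.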
Grouped Frequency Distribution

Ages of Patients in Hospital A

25 28 27 30 32 25 31 26 29

31 20 21 32 18 50 53 60 50

45 40 37 25 20 27 32 24 29

25 24 10 12 15 28 6 54 30

Grouped Frequency Distribution Table

Class Limits   Class Boundaries   Tally                     Frequency
6 – 14         5.5 – 14.5         III                       3
15 – 23        14.5 – 23.5        IIIII                     5
24 – 32        23.5 – 32.5        IIIII-IIIII-IIIII-IIIII   20
33 – 41        32.5 – 41.5        II                        2
42 – 50        41.5 – 50.5        III                       3
51 – 59        50.5 – 59.5        II                        2
60 – 68        59.5 – 68.5        I                         1
Total                                                       36
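
The grouped table can be reproduced programmatically. This is a minimal sketch in Python, assuming integer data, a class width of 9, and class boundaries placed 0.5 below and above the class limits, as in the table; the function name grouped_distribution is hypothetical.

```python
# Ages of the 36 patients in Hospital A, as listed in the notes.
ages = [
    25, 28, 27, 30, 32, 25, 31, 26, 29,
    31, 20, 21, 32, 18, 50, 53, 60, 50,
    45, 40, 37, 25, 20, 27, 32, 24, 29,
    25, 24, 10, 12, 15, 28, 6, 54, 30,
]


def grouped_distribution(values, lowest_limit, width, num_classes):
    """Count integer values in equal-width classes.

    Returns a list of tuples:
    (lower limit, upper limit, lower boundary, upper boundary, frequency).
    """
    rows = []
    for k in range(num_classes):
        lower = lowest_limit + k * width
        upper = lower + width - 1
        freq = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, lower - 0.5, upper + 0.5, freq))
    return rows


rows = grouped_distribution(ages, lowest_limit=6, width=9, num_classes=7)
for lower, upper, lb, ub, freq in rows:
    print(f"{lower:>2} - {upper:<2}   {lb:>4} - {ub:<4}   {freq:>2}")
print("Total", sum(row[4] for row in rows))
```

The frequencies should add up to the number of observations, which is a quick check that no value fell outside the classes.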
Data Presentation

Histogram - a bar graph in which the horizontal scale represents classes of data values and the vertical
scale represents frequencies. The heights of the bars correspond to the frequency values, and the bars
are drawn adjacent to each other (without gaps).

Figure 1: Histogram

Frequency Polygon

- uses line segments connected to points located directly above class midpoint values.
Ogive

- a line graph that depicts cumulative frequencies, just as a cumulative frequency
distribution lists them.

- uses class boundaries along the horizontal scale; the graph begins with the lower boundary of
the first class and ends with the upper boundary of the last class.

Pie Graph

- used to visually depict qualitative data

- a circle divided into sections according to the percentage of frequencies in each category of
the distribution
Bar Graph

- represents the data by using vertical or horizontal bars whose heights or lengths represent
the frequencies of the data

Line Graph

- used to show trends and increases or decreases in sales, scores, population per year, etc.
Pareto Chart

- a type of chart that contains both bars and a line graph, where individual values are
represented in descending order by bars, and the cumulative total is represented by the line

Time Series Graph

- depicts data that have been collected at different points in time


Measures of Central Tendency

Once data are collected, it is useful to summarize the data set by identifying a value around which data
are centered.

Arithmetic Mean or Average

- numerical balancing point of the data set. It is calculated by adding all the data values and dividing the
sum by the total number of data points.

Sample Mean (\(\bar{x}\))

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]

Population Mean (µ)

\[ \mu = \frac{x_1 + x_2 + x_3 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \]

Example: 6 test scores in Statistics
12 10 15 8 5 18

\[ \bar{x} = \frac{12 + 10 + 15 + 8 + 5 + 18}{6} = \frac{68}{6} = 11.333\ldots \]

The mean score in Statistics is 11.3.
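
The same computation can be sketched in Python. The function name mean is an illustrative choice; Python's standard statistics module offers an equivalent function.

```python
scores = [12, 10, 15, 8, 5, 18]  # the six test scores from the example


def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


sample_mean = mean(scores)
print(round(sample_mean, 1))  # 11.3
```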

Median (Md)
- simply the middle number in an ordered set of data
If n is odd: Md = middle score
If n is even: Md = (sum of the two middle scores) / 2
Example:
1. Six students borrowed these numbers of books in a library:
1, 2, 3, 4, 6, 7 (3 and 4 are the middle scores)
Md = (3 + 4) / 2 = 3.5
2. Five students borrowed these numbers of books in a library:
2, 2, 3, 3, 3
Md = 3
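
The odd and even cases can be written as one short function. This is a minimal sketch in Python that assumes a non-empty list; the name median is illustrative.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n odd: single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: mean of two middle scores


print(median([1, 2, 3, 4, 6, 7]))  # 3.5
print(median([2, 2, 3, 3, 3]))     # 3
```

The data must be ordered first, which is why the function sorts its input before picking the middle.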

Mode (Mo)
- most frequently occurring number in a data set
Examples:
1. 2 5 8 3 2 2 3 1
(Mode is 2)
2. 1 1 3 5 7 8 3 4
(Bimodal; the modes are 1 and 3)
3. 2 4 6 8 9 3 1 5
(No mode)
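
The three cases above (one mode, two modes, no mode) can be handled as in the following minimal Python sketch. It follows the convention of these notes that a data set in which no value repeats has no mode; the function name modes is hypothetical, and the input is assumed to be non-empty.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, 5, 8, 3, 2, 2, 3, 1]))  # [2]
print(modes([1, 1, 3, 5, 7, 8, 3, 4]))  # [1, 3]
print(modes([2, 4, 6, 8, 9, 3, 1, 5]))  # []
```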

Midrange (MR)

- the value midway between the highest and lowest values in the original data set

MR = (highest score + lowest score) / 2

Weighted Mean

- computed by multiplying each value by its corresponding weight and dividing the sum of the products by the
sum of the weights.

\[ \bar{x} = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]

where \(w_1, w_2, w_3, \dots, w_n\) are the weights and \(x_1, x_2, x_3, \dots, x_n\) are the values.

Example:

Subject Units (w) Letter Grade Numerical Value (X)

PE 2 A 4

Calculus 5 C+ 2.5

English 3 B 3

\[ \bar{x} = \frac{2 \cdot 4 + 5 \cdot 2.5 + 3 \cdot 3}{10} = 2.95 \]

The QPI is 2.95.
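
The QPI example can be checked with a minimal Python sketch. The function name weighted_mean is illustrative; the units serve as the weights and the numerical grade values as the data.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


units = [2, 5, 3]            # weights: PE, Calculus, English
grade_points = [4, 2.5, 3]   # numerical values of A, C+, B

qpi = weighted_mean(grade_points, units)
print(round(qpi, 2))  # 2.95
```

Calculus pulls the result toward 2.5 because it carries the largest weight (5 of the 10 units).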

Measures of Dispersions

• Range - difference between highest and lowest value


• Standard Deviation - never negative; zero only if all the data values are the same; uses the
same units as the original data set; can be strongly influenced by outliers

• Variance - the square of the standard deviation; expressed in squared units, not the units of the data

• Population Standard Deviation (σ)

\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]

where X - individual value

µ - population mean

N - population size

• Sample Standard Deviation (s)

\[ s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \]

where X - individual value

\(\bar{X}\) - sample mean

n - sample size

Let A = 5, 5, 5, 5, 5, 5, 5, 5, 5, 5

B = 4, 4, 4, 5, 5, 5, 5, 6, 6, 6

C = 0, 0, 0, 0, 0, 10, 10, 10, 10, 10

D = 0, 5, 5, 5, 5, 5, 5, 5, 5, 10

• Find the mean and median for each data set.

• Find the range and the standard deviation of each data set.
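
One way to check answers to this exercise is the minimal Python sketch below. It computes both standard deviations from the two formulas given earlier, since the exercise does not say whether A to D are populations or samples; the function and dictionary names are illustrative.

```python
import math
import statistics


def mean(values):
    return sum(values) / len(values)


def population_sd(values):
    """Square root of the sum of squared deviations divided by N."""
    mu = mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


def sample_sd(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    x_bar = mean(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))


data_sets = {
    "A": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5],
    "B": [4, 4, 4, 5, 5, 5, 5, 6, 6, 6],
    "C": [0, 0, 0, 0, 0, 10, 10, 10, 10, 10],
    "D": [0, 5, 5, 5, 5, 5, 5, 5, 5, 10],
}

summary = {}
for name, values in data_sets.items():
    summary[name] = {
        "mean": mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "population_sd": population_sd(values),
        "sample_sd": sample_sd(values),
    }
    print(name, summary[name])
```

All four sets share the same mean and median, so only the range and standard deviation show how differently the values are spread.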

Measures of Positions

Percentiles - divide the data set into 100 equal groups

Deciles - divide the data set into 10 equal groups


• Quartiles - divide the data set into quarters

• z-score (standard score)

- the z-score for a given data value x is the number of standard deviations that x is
above or below the mean of the data

population: \( z = \frac{x - \mu}{\sigma} \)

sample: \( z = \frac{x - \bar{x}}{s} \)

Example:

A basketball player, Carl, is 78 inches tall, and a volleyball player, Jane, is 76 inches tall. Carl is obviously
taller by 2 inches, but which player is relatively taller? Does Carl’s height among men exceed Jane’s
height among women? Men have a mean height of 69 inches and a standard deviation of 2.8 inches, while
women have a mean height of 63.6 inches and a standard deviation of 2.5 inches.

Carl: \[ z = \frac{x - \bar{x}}{s} = \frac{78 - 69}{2.8} = 3.21 \]

Jane: \[ z = \frac{x - \bar{x}}{s} = \frac{76 - 63.6}{2.5} = 4.96 \]

Carl’s height is 3.21 standard deviations above the mean, but Jane’s height is 4.96 standard
deviations above the mean.

∴ Jane’s height among women is relatively greater than Carl’s height among men.
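
The comparison can be reproduced with a minimal Python sketch. It uses the mean of 69 inches for men that appears in the worked computation; the function name z_score is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


carl = z_score(78, 69, 2.8)     # Carl among men
jane = z_score(76, 63.6, 2.5)   # Jane among women

print(round(carl, 2), round(jane, 2))  # 3.21 4.96
print("Jane is relatively taller" if jane > carl else "Carl is relatively taller")
```

Because z-scores are unit-free, values from two different distributions can be compared directly.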

Exercises
1. The average teacher’s salary in a particular city is P54,166. If the standard deviation is P10,200,
find the salaries corresponding to the following z scores.

a. 2

b. −1.6

c. 2.5

2. The mean time to download a PDF file is 12 min with a standard deviation of 4 min. Belle’s download
time is 20 min. John’s download time is 6 min. How does Belle’s download time compare
with John’s?

3. Cheryl has taken two quizzes in her history class. She scored 15 on the first quiz, for which the mean
of all scores was 12 and the standard deviation was 2.4. Her score on the second quiz, for which the
mean of all scores was 11 and the standard deviation was 2.0, was 14. In comparison to her classmates,
did Cheryl do better on the first quiz or the second quiz?

4. Roland received a score of 70 on a test for which the mean score was 65.5. Roland has learned that
the z-score for his test is 0.6. What is the standard deviation for this set of test scores?

Box and Whisker Plot (box plot)

- used to provide a visual summary of a set of data

- it shows the median, the 1st and 3rd quartiles, and the minimum and maximum values of a
data set

Constructing a Box-and-Whisker Plot

1. Find the five-number summary for the data values, that is, the maximum and minimum data
values, Q1 and Q3, and the median.

2. Draw a horizontal line with a scale such that it includes the maximum and minimum data values.

3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the
median.

4. Draw a line from the minimum data value to the left side of the box and a line from the
maximum data value to the right side of the box.

Example:
Construct a box-and-whisker plot for the following data:

Number of Rooms Occupied in a Resort during an 18-day period

86 77 58 45 94 96 83 76 75

65 68 72 78 85 87 92 55 61
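
Step 1 of the construction, the five-number summary, can be sketched in Python as follows. Textbooks and software use several quartile conventions; this sketch assumes that Q1 and Q3 are the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd), so other tools may report slightly different quartiles. The function name five_number_summary is illustrative.

```python
import statistics

# Number of rooms occupied during the 18-day period, as listed in the notes.
rooms = [86, 77, 58, 45, 94, 96, 83, 76, 75,
         65, 68, 72, 78, 85, 87, 92, 55, 61]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # skip the middle value when n is odd
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )


print(five_number_summary(rooms))  # (45, 65, 76.5, 86, 96)
```

Under this convention the box runs from 65 to 86 with the median line at 76.5, and the whiskers extend to 45 and 96.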
