MMW Data Management
1. Descriptive statistics – collection, description, and analysis of data without drawing conclusions. The focus is to describe the data set to make obscure information clear.
2. Inferential statistics – makes estimates, decisions, predictions, or generalizations about a larger data set using sample data.

Variables – specific characteristics or attributes of a subject. Each such attribute may assume two or more different values. Ex. Gender, height, etc.

Types of Variables
1. Qualitative Variables – measured not in numbers but categorically, by means of descriptions or categories.
2. Quantitative Variables – always associated with numbers or a scale of measurement.

The measurement of a variable, either discrete (integer) or continuous, is classified into one of the following:
1. Nominal – characterized by data that consist of names, labels, codes, or categories only. Such data cannot be arranged in any order or used in calculations. Ex. Gender, race, etc.
2. Ordinal – involves data that may be arranged in some order. Ex. Sizes (S, M, L, XL), socio-economic status (working, middle, upper), Likert scale.
3. Interval – measurements where the difference between values is meaningful. Ex. Temperature in Celsius and Fahrenheit, pH level, IQ.
4. Ratio – measurements are ordered according to the amount of the attribute they possess, and 0 means the absence of that attribute. Temperature in °F or °C is not ratio since their 0 does not mean the absence of temperature, while kelvin (K) is ratio since 0 K means the absence of heat. Ex. Height, weight, age, etc.

Nominal and Ordinal – Qualitative

Population versus Sample
Population – the entire set of all objects.
Sample – a subset of a population.

Organizing Data
Data Collection (Phase 1) – each element of the data is called a data point. Generally, in this phase, raw data may not show apparent patterns or trends.

Organizing Data into an FDT (Phase 2)
FDT (frequency distribution table) – the most common way of organizing data. It is a table that lists all data points, along with how many times each data point occurs (frequency, f) and its percentage of the total number of data (relative frequency, rf):
$rf = \frac{f}{n} \times 100\%$
where n is the total frequency, i.e., the size of the population or sample.

For ungrouped data, simply tally the data points, then compute f and rf.

For grouped data, the upper limit of each class is found from its lower limit and the class interval:
Upper Limit = Lower Limit + Class Interval - 1
Ex. Assuming LL = 16 and C = 8:
UL = 16 + 8 - 1 = 23

Histogram – data grouped in intervals can be depicted by a histogram, a bar graph that shows how the data are distributed. A histogram should show an accurate comparison of the data: the height of each bar should correspond to the frequency of its interval, and all rectangles should have the same width since they use the same class interval.

Pie Charts
For qualitative (categorical) data, an easy way to summarize the data is through a pie chart. Pie charts clearly show what part of the whole is accounted for by a specific characteristic. You can arrange the sectors either clockwise or counterclockwise; place the sector with the highest frequency at 12 o'clock and arrange the rest in decreasing order. The reason for constructing a pie chart is to convey information visually: it should enable the reader to compare the relative proportions of the categorical data. Each slice corresponds to a relative frequency, which is written in its label. Using a different color for each slice helps in depicting the proportions, and a legend may be used if the category names are long.

Measures of Central Tendency
A measure of central tendency is a value that indicates where the center of a distribution tends to be located, or simply the average of the data. It is said to form the basis of statistics. The most common measures of central tendency are the mean, median, and mode. In a perfectly normal distribution, all three are located at the same score, which is at the center of the distribution.

Mean – the most commonly used measure of central tendency. The mean of a data set is the sum of the data points divided by the number of data points, i.e., the average of the data points. It is strongly influenced by outliers (data points that are extremely high or low compared to the other data points). The population mean is denoted by μ, and the sample mean is denoted by x̄.
$\bar{x} = \frac{\sum x}{n}$
where Σx is the sum of the data points and n is the number of data points.

Characteristics of the Mean
1. The sum of the deviations of the data points from the mean is zero. (A deviation is the difference between a data point and a given value, here the mean.)
2. The sum of the squared deviations of the data points is minimum when the deviations are taken from the mean.
3. If a constant c is added to (or subtracted from) every data point, the new mean is the original mean increased (or decreased) by c.
4. If every data point is multiplied (or divided) by a constant c, the new mean is the original mean multiplied (or divided) by c.
5. Since the mean is a calculated number, it may not be
an actual value in the data points.
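As a quick numerical check of the mean formula and its characteristics, here is a minimal Python sketch. The data values and the helper names `mean` and `sum_sq_dev` are illustrative assumptions, not part of the original notes.

```python
# Minimal sketch: the sample mean and a numerical check of its characteristics.
# The data set is hypothetical, chosen only for illustration.

def mean(data):
    """Sum of the data points divided by the number of data points."""
    return sum(data) / len(data)

def sum_sq_dev(data, a):
    """Sum of the squared deviations of the data points from the value a."""
    return sum((x - a) ** 2 for x in data)

data = [4, 8, 6, 5, 7]
x_bar = mean(data)
print(x_bar)  # 6.0

# 1. Deviations from the mean sum to zero
deviations = [x - x_bar for x in data]
print(sum(deviations))  # 0.0

# 2. Squared deviations are smallest when taken from the mean
print(sum_sq_dev(data, x_bar), sum_sq_dev(data, 5), sum_sq_dev(data, 7))  # 10.0 15 15

# 3. Adding a constant c to every data point increases the mean by c
c = 3
print(mean([x + c for x in data]))  # 9.0

# 4. Multiplying every data point by c multiplies the mean by c
print(mean([x * c for x in data]))  # 18.0

# 5. The mean need not be one of the data points
other = [1, 2, 4, 5]
print(mean(other), mean(other) in other)  # 3.0 False
```

Item 1 also follows directly from the formula: Σ(x - x̄) = Σx - n·x̄ = 0, since n·x̄ = Σx.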
Mean of Grouped data
In grouped data, we do not know the individual data points, so we use the midpoints of the intervals to represent the individual scores. Consequently, the mean of grouped data is only an approximation.
$\bar{x} = \frac{\sum fx}{\sum f}$
where f is the frequency of each interval and x is the midpoint of that interval.
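The grouped-data steps can be tied together in a short Python sketch. It builds class limits with UL = LL + C - 1, tallies the frequency of each class, computes the relative frequency and midpoint, and then approximates the mean as Σfx / Σf. The scores, the starting lower limit of 16, the class interval of 8, and the function name `grouped_fdt` are all hypothetical choices for illustration. The sketch also assumes integer-valued data, which is what the "- 1" in the upper-limit formula presumes.

```python
# Minimal sketch: build a grouped frequency distribution table (FDT) and
# approximate the mean from the class midpoints. Assumes integer-valued data.

def grouped_fdt(data, lower, interval):
    """Return rows of (LL, UL, f, rf %, midpoint) for classes starting at `lower`."""
    n = len(data)
    rows = []
    lower_limit = lower
    while lower_limit <= max(data):
        upper_limit = lower_limit + interval - 1                  # UL = LL + C - 1
        f = sum(1 for x in data if lower_limit <= x <= upper_limit)  # tally
        rf = f / n * 100                                          # relative frequency (%)
        midpoint = (lower_limit + upper_limit) / 2                # stands in for every score
        rows.append((lower_limit, upper_limit, f, rf, midpoint))
        lower_limit = upper_limit + 1                             # next class starts after UL
    return rows

# Hypothetical scores
scores = [16, 18, 21, 23, 24, 27, 30, 31, 35, 39]
table = grouped_fdt(scores, lower=16, interval=8)

for ll, ul, f, rf, mid in table:
    print(f"{ll}-{ul}  f={f}  rf={rf:.1f}%  x={mid}")

# Grouped mean: sum of f*x divided by the sum of f
sum_f = sum(row[2] for row in table)
sum_fx = sum(row[2] * row[4] for row in table)
print(sum_fx / sum_f)               # 25.9 (approximation)

# Exact mean of the raw scores, for comparison
print(sum(scores) / len(scores))    # 26.4
```

The first class is 16-23, matching the LL = 16, C = 8 example in the notes. The gap between 25.9 and 26.4 illustrates why the grouped mean is only an approximation: every score in a class is treated as if it sat at the class midpoint.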