01 AE9 Module2 Lesson3 BasicDataManagementusingSPSS
Software Application
Module 2: Variables and Data
Learning objectives:
- Define Variables, Data, and Datasets
- Identify the different types of data
- Encode variables and data in SPSS
AE 9
Outline
3.1 Variables, Data, and Datasets
3.2 Data Types
3.3 Scales of Measurement
3.4 Encoding variables in SPSS
3.5 Encoding Data in SPSS
Variable
- a quantity that may assume any one of a set of values; something that varies (Merriam-Webster)
- is a symbol, commonly a single letter, that represents a number, called the value of
the variable, which is either arbitrary, not fully specified, or unknown.
- can be in any form as long as it replaces an unknown value (age, gender, civil status,
income/salary, sales, expenditures, etc.).
Data
- factual information (such as measurements or statistics) used as a basis for
reasoning, discussion, or calculation (Merriam-Webster)
Dataset
- refers to a file that contains one or more records (IBM)
- a collection of data
According to source
1. Primary data – data are taken directly from the respondents/samples; mostly
involves person-to-person contact.
2. Secondary data – data taken from published articles, compiled and processed
by personnel of institutions and/or government agencies.
According to nature
1. Quantitative data – numerical data (e.g., age, household size, income)
3. Panel data (cross-sectional time series data) – data collected from the same set
of observations/participants repeatedly over time.
According to measurement
1. Continuous data – data that can be divided into smaller units (e.g., height, weight,
distance).
2. Discrete data – data that cannot be divided into smaller units (e.g., number of students,
number of faculty in a school).
According to arrangement
1. Ranked data – data which can be arranged into a set of ordered categories (e.g.,
first, second, third)
2. Nominal data – discrete data which cannot be ordered (e.g., sex/gender, civil
status)
2. Median
- the number that divides the data set into two (2) equal parts. First, arrange the data in
order (an array), then find the center value. For an odd number of cases, the median is
simply the middle value; for an even number of cases, add the two middle values, then
divide by two.
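The median procedure above can be sketched in Python (used here only for illustration; the lesson itself works in SPSS). The function name `median` is a hypothetical helper written for this example; Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Hypothetical helper: sort the data, then take the middle value
    (odd count) or the average of the two middle values (even count)."""
    data = sorted(values)  # put the data into an array (ascending order)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]  # odd number of cases: the single center value
    return (data[mid - 1] + data[mid]) / 2  # even: average the middle pair


print(median([7, 3, 9, 5, 1]))      # sorted 1 3 5 7 9 -> 5
print(median([7, 3, 9, 5, 1, 11]))  # sorted 1 3 5 7 9 11 -> (5 + 7) / 2 = 6.0
```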
3. Mode
- is the most frequently occurring value in a dataset. It is determined simply by
counting how many times each value appears and then finding the value with the
highest frequency.
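A minimal Python sketch of this counting procedure, assuming the convention that a dataset in which no value repeats has no mode (Python's own `statistics.multimode` would instead return every value in that case). The helper `modes` is hypothetical and returns all tied values, so bimodal and multimodal data are visible.

```python
from collections import Counter


def modes(values):
    """Hypothetical helper: return every value tied for the highest frequency.

    An empty list means "no mode" (no value repeats); two or more results
    mean the data are bimodal or multimodal.
    """
    counts = Counter(values)  # how many times each value appears
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # every value appears once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 5, 8, 9]))  # [5]     single mode
print(modes([1, 1, 2, 3, 3]))  # [1, 3]  bimodal
print(modes([2, 4, 6, 8]))     # []      no mode
```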
30 14 28 7 12 4 21
4 22 8 16 20 2 10
Mean?
Median?
Mode?
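To check answers to the exercise, the three measures can be computed with Python's standard `statistics` module. This is a sketch for verification only; the list holds the 14 values given above.

```python
import statistics

# The 14 values from the exercise above
data = [30, 14, 28, 7, 12, 4, 21,
        4, 22, 8, 16, 20, 2, 10]

mean = statistics.mean(data)      # sum / number of cases = 198 / 14
median = statistics.median(data)  # even count: average of the 7th and 8th sorted values
mode = statistics.mode(data)      # most frequently occurring value

print(sorted(data))                  # [2, 4, 4, 7, 8, 10, 12, 14, 16, 20, 21, 22, 28, 30]
print(round(mean, 2), median, mode)  # 14.14 13.0 4
```

With 14 cases, the median is the average of the 7th and 8th sorted values, (12 + 14) / 2 = 13, and 4 is the only value that appears twice.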
What measure is the most reliable in terms of determining the central tendency of a
given dataset?
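Answer: Median, because, unlike the mean, it is not pulled toward extreme values (outliers).
Mode simply relies on frequency of occurrence, so a dataset may have no mode, two modes
(bimodal), or several modes (multimodal).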
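One way to see why the median is preferred: a single extreme value moves the mean a great deal but leaves the median unchanged. The incomes below are hypothetical numbers chosen for illustration.

```python
import statistics

# Hypothetical monthly incomes of five households
incomes = [12_000, 15_000, 15_500, 18_000, 20_000]
# Same households, with the last income replaced by one extreme value
with_outlier = incomes[:-1] + [500_000]

print(statistics.mean(incomes), statistics.median(incomes))            # 16100 15500
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 112100 15500
```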
1. Nominal
- a scale used to label variables that have no quantitative values.
- the values only “name” the attribute uniquely; no ordering of the cases is implied.
For example, jersey numbers in basketball are measured at the nominal level. A player
with number 30 is not more of anything than a player with number 15 and is certainly
not twice whatever number 15 is.
Examples: gender, hair/eye color, blood type, place/address, full name, etc.
Properties
- They have no natural order. For example, we can’t arrange eye colors in order of
worst to best or lowest to highest.
- Categories are mutually exclusive. For example, an individual can’t have both blue
and brown eyes.
Similarly, an individual can’t live both in the city and in a rural area.
- The only numbers we can calculate for these variables are counts. For example,
we can count how many individuals have blonde hair, how many have black hair, how
many have brown hair, etc.
- The only measure of central tendency we can calculate for these variables is the
mode. The mode tells us which category had the most counts. For example, we could
find which eye color occurred most frequently.
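For a nominal variable, the only operations that make sense are counting each category and reporting the most frequent one. A Python sketch, using made-up eye-color responses for illustration:

```python
from collections import Counter

# Hypothetical eye-color responses from ten individuals (nominal data)
eye_colors = ["brown", "black", "brown", "blue", "brown",
              "black", "hazel", "brown", "black", "blue"]

counts = Counter(eye_colors)  # counts are the only numbers we can compute
print(counts.most_common())   # [('brown', 4), ('black', 3), ('blue', 2), ('hazel', 1)]

mode_category, mode_count = counts.most_common(1)[0]
print(mode_category, mode_count)  # brown 4
```

Note that `most_common(1)` returns only one category even when several are tied, so a tie should be checked for separately.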
2. Ordinal
- a scale used to label variables that have a natural order, but no quantifiable
difference between values.
Properties
- They have a natural order. For example, “very satisfied” is better than “satisfied,”
which is better than “neutral,” etc.
- The difference between values can’t be evaluated. For example, we can’t say that
the difference between “very satisfied” and “satisfied” is exactly the same as the
difference between “satisfied” and “neutral.”
- The two measures of central tendency we can calculate for these variables are
the mode and the median. The mode tells us which category had the most counts
and the median tells us the “middle” value.
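Ordinal categories are often stored as numeric codes with value labels (as SPSS does). The sketch below assumes a hypothetical 1 to 5 coding of the satisfaction scale; the codes preserve order only, so the mode and median are meaningful but the spacing between codes is not. With an even number of cases, the two middle responses can differ and the ordinary median then lands between categories. Reporting the lower middle category with `statistics.median_low` is one convention, assumed here, for keeping the answer on the scale.

```python
import statistics

# Hypothetical value labels for a 5-point satisfaction item
LABELS = {1: "very dissatisfied", 2: "dissatisfied", 3: "neutral",
          4: "satisfied", 5: "very satisfied"}

# Hypothetical coded answers from ten respondents
responses = [4, 2, 5, 3, 4, 1, 4, 3, 2, 4]

mode_code = statistics.mode(responses)          # most frequent category
plain_median = statistics.median(responses)     # may fall between two categories
median_code = statistics.median_low(responses)  # always an observed category

print("mode:", LABELS[mode_code])          # mode: satisfied
print("median:", plain_median)             # median: 3.5 (no label exists for 3.5)
print("median_low:", LABELS[median_code])  # median_low: neutral
```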
3. Interval
- a scale used to label variables that have a natural order and a quantifiable difference
between values, but no “true zero” value.
Properties
- These variables have a natural order.
- We can calculate the mean, median, mode, and standard deviation of these
variables.
- These variables have an exact difference between values. Recall that ordinal
variables have no exact difference between values – we don’t know if the difference
between “very satisfied” and “satisfied” is the same as the difference between
“satisfied” and “neutral.”
For variables on an interval scale, though, we know that the difference between a
credit score of 850 and 800 is the exact same as the difference between 800 and 750.
- These variables have no “true zero” value. For example, it’s impossible to have a
credit score of zero. It’s also impossible to have an SAT score of zero.
For temperatures in °F or °C, negative values are possible (e.g., -10 °F), so zero is not
a true zero below which values cannot go; 0° does not mean “no temperature.”
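Temperature in °C or °F shows what an interval scale does and does not support. In this Python sketch (the helper names `c_to_f` and `c_to_k` are hypothetical), differences survive a change of units but ratios do not, because 0 °C and 0 °F are not true zeros. The kelvin scale, which does have a true zero, is included for contrast.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit (both interval scales)."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert degrees Celsius to kelvins (a ratio scale with a true zero)."""
    return celsius + 273.15


low_c, high_c = 10.0, 20.0

# Differences are preserved: a 10 °C gap is always an 18 °F gap
print(c_to_f(high_c) - c_to_f(low_c))  # 18.0
print(c_to_f(30.0) - c_to_f(20.0))     # 18.0

# Ratios are not: "20 °C is twice 10 °C" does not survive a change of units
print(high_c / low_c)                             # 2.0
print(c_to_f(high_c) / c_to_f(low_c))             # 1.36  (68 °F / 50 °F)
print(round(c_to_k(high_c) / c_to_k(low_c), 3))  # 1.035 (293.15 K / 283.15 K)
```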
4. Ratio
- a scale used to label variables that have a natural order, a quantifiable difference
between values, and a “true zero” value.
Properties
- These variables have a natural order.
- We can calculate the mean, median, mode, standard deviation, and a variety of
other descriptive statistics for these variables.
- These variables have a “true zero” value. For example, length, weight, and height
all have a minimum value (zero) that values can’t go below.
Because zero means a complete absence of the quantity, ratio variables can’t take on
negative values, and the ratio between two values is meaningful.
For example, someone who weighs 200 lbs. can be said to weigh twice as much
as someone who weighs 100 lbs. Likewise, someone who is 6 feet tall is 1.5 times
as tall as someone who is 4 feet tall.
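By contrast, ratios on a ratio scale do not depend on the unit of measurement. The Python sketch below reuses the weight and height examples above and converts them with the exact international definitions of the pound (0.45359237 kg) and the foot (0.3048 m); the ratios stay the same.

```python
LB_TO_KG = 0.45359237  # 1 international pound, in kilograms (exact by definition)
FT_TO_M = 0.3048       # 1 international foot, in meters (exact by definition)

heavier_lb, lighter_lb = 200, 100
taller_ft, shorter_ft = 6, 4

weight_ratio_lb = heavier_lb / lighter_lb
weight_ratio_kg = (heavier_lb * LB_TO_KG) / (lighter_lb * LB_TO_KG)
height_ratio_ft = taller_ft / shorter_ft
height_ratio_m = (taller_ft * FT_TO_M) / (shorter_ft * FT_TO_M)

# Round away tiny floating-point noise before printing
print(weight_ratio_lb, round(weight_ratio_kg, 9))  # 2.0 2.0
print(height_ratio_ft, round(height_ratio_m, 9))   # 1.5 1.5
```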
https://www.graphpad.com/support/faq/what-is-the-difference-between-ordinal-interval-and-ratio-variables-why-should-i-care/
Demonstration
References:
Kent State University Libraries. (2017, May 15). SPSS tutorials. Retrieved November 17, 2020, from
https://libguides.library.kent.edu/SPSS/
Ho, R. (2006). Handbook of Univariate and Multivariate Data Analysis and Interpretation with SPSS. Chapman and
Hall/CRC.
https://www.spss-tutorials.com/basics/
THANK YOU!