Econ 115a, Module 2, Lesson 3: Basic Data Management Using SPSS
Learning objectives:
- Define Variables, Data, and Datasets
- Identify the different types of data
- Encode variables and data in SPSS
Econ 115a
Outline
3.1 Variables, Data, and Datasets
3.2 Data Types
3.3 Scales of Measurement
3.4 Encoding variables in SPSS
3.5 Encoding Data in SPSS
Variable
- a quantity that may assume any one of a set of values; something that varies
(Merriam-Webster)
- is a symbol, commonly a single letter, that represents a number, called the value of
the variable, which is either arbitrary, not fully specified, or unknown.
- can be in any form as long as it replaces an unknown value (age, gender, civil status,
income/salary, sales, expenditures, etc.).
Data
- factual information (such as measurements or statistics) used as a basis for
reasoning, discussion, or calculation (Merriam-Webster)
Dataset
- refers to a file that contains one or more records (IBM)
- a collection of data
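To make the three terms concrete, here is a minimal Python sketch (not SPSS syntax). The respondents, variable names, and values are hypothetical and serve only as an illustration.

```python
# Hypothetical respondents; the names and values are illustrative only.
dataset = [
    {"age": 21, "civil_status": "single", "income": 18000},
    {"age": 35, "civil_status": "married", "income": 42000},
    {"age": 29, "civil_status": "single", "income": 27500},
]

# Each key is a variable; each dictionary is one record (one respondent).
variables = list(dataset[0].keys())

# The data for one variable are the values it takes across all records.
ages = [record["age"] for record in dataset]

print(variables)  # ['age', 'civil_status', 'income']
print(ages)       # [21, 35, 29]
```

In a statistical package, the same idea usually appears as a table: one column per variable and one row per record.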
According to source
1. Primary data – data taken directly from the respondents/samples; mostly
involves person-to-person contact.
2. Secondary data – data taken from published articles compiled and processed
by personnel from an institution and/or government agencies.
According to nature
1. Quantitative data – numerical data (e.g., age, household size, income)
3. Panel data (cross-sectional time series data) – data that is derived from a
number of observations/participants over time.
According to measurement
1. Continuous data – data that can be divided into smaller units (e.g., height, weight,
distance)
2. Discrete data – data that cannot be divided into smaller units (e.g., number of students,
number of faculty in a school)
According to arrangement
1. Ranked data – data which can be arranged into a set of ordered categories (e.g.,
first, second, third)
2. Nominal data – discrete data which cannot be ordered (e.g., sex/gender, civil
status)
2. Median
- the number that divides the dataset into two (2) equal parts. First, arrange the data
in order (an array), then find the center value. For an odd number of cases, the median
is the single middle value; for an even number of cases, add the two middle values and
divide by two.
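The procedure above can be sketched in Python as follows. The function name `median` and the sample values are hypothetical and chosen only for illustration.

```python
def median(values):
    """Return the middle value of a list of numbers."""
    ordered = sorted(values)          # step 1: arrange the data in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    middle = n // 2
    if n % 2 == 1:                    # odd number of cases: one center value
        return ordered[middle]
    # even number of cases: add the two middle values and divide by two
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 1, 9]))      # 5
print(median([5, 1, 9, 3]))   # 4.0
```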
3. Mode
- is the most frequently occurring value in a dataset. It is determined simply by
counting how many times each value appears and then finding the value with the
highest frequency.
Example: 30 14 28 7 12 4 21
4 22 8 16 20 2 10
Mean?
Median?
Mode?
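One way to check the answers is with Python's standard `statistics` module. This is a sketch, not part of the SPSS workflow, and it assumes "mean" refers to the arithmetic mean (the sum of the values divided by the number of values).

```python
import statistics

# The 14 values from the example above.
data = [30, 14, 28, 7, 12, 4, 21,
        4, 22, 8, 16, 20, 2, 10]

mean = statistics.mean(data)       # sum of the values / number of values
median = statistics.median(data)   # average of the 7th and 8th sorted values
mode = statistics.mode(data)       # most frequently occurring value

print(round(mean, 2))  # 14.14
print(median)          # 13.0
print(mode)            # 4
```

The values sum to 198, so the mean is 198 / 14. Sorted, the two middle values are 12 and 14, and 4 is the only value that appears twice.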
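Which measure is the most reliable for determining the central tendency of a
given dataset?
Answer: Median
The mean is pulled toward extreme values (outliers), while the median is not.
The mode relies only on frequency of occurrence, so a dataset can have no mode,
two modes (bimodal), or several modes (multimodal).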
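A small Python sketch can illustrate why the median is often preferred. The numbers are hypothetical: adding a single extreme value shifts the mean a great deal but barely moves the median, and a dataset can easily have more than one mode.

```python
import statistics

values = [20, 22, 25, 27, 30]     # hypothetical observations
with_outlier = values + [500]     # the same data plus one extreme value

mean_before = statistics.mean(values)
median_before = statistics.median(values)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)   # 24.8 25
print(mean_after, median_after)     # 104 26.0

# multimode returns every value tied for the highest frequency.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
print(two_modes)                    # [1, 2]
```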
1. Nominal
- a scale used to label variables that have no quantitative values.
- the values just “name” the attribute uniquely, no ordering of the cases is implied.
For example, jersey numbers in basketball are measured at the nominal level. A player
with number 30 is not more of anything than a player with number 15 and is certainly
not twice whatever number 15 is.
Examples: gender, hair/eye color, blood type, place/address, full name, etc.
Properties
They have no natural order. For example, we can’t arrange eye colors in order of
worst to best or lowest to highest.
Categories are mutually exclusive. For example, an individual can’t have both blue
and brown eyes. Similarly, an individual can’t live both in the city and in a rural area.
The only numbers we can calculate for these variables are counts. For example, we
can count how many individuals have blonde hair, how many have black hair, how
many have brown hair, etc.
The only measure of central tendency we can calculate for these variables is the
mode. The mode tells us which category had the most counts. For example, we could
find which eye color occurred most frequently.
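As a rough sketch of counting categories and finding the mode of a nominal variable, the Python snippet below uses `collections.Counter`. The eye-color responses are hypothetical.

```python
from collections import Counter

# Hypothetical eye-color responses (nominal data: labels only, no order).
eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colors)                       # count per category
mode_color, mode_count = counts.most_common(1)[0]  # category with the most counts

print(counts["brown"], counts["blue"], counts["green"])  # 3 2 1
print(mode_color)                                        # brown
```

A mean or median is not computed here because the categories have no numeric value or order.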
2. Ordinal
- a scale used to label variables that have a natural order, but no quantifiable
difference between values.
Properties
They have a natural order. For example, “very satisfied” is better than “satisfied,”
which is better than “neutral,” etc.
The difference between values can’t be evaluated. For example, we can’t exactly
say that the difference between “very satisfied” and “satisfied” is the same as the
difference between “satisfied” and “neutral.”
The two measures of central tendency we can calculate for these variables are
the mode and the median. The mode tells us which category had the most counts
and the median tells us the “middle” value.
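The mode and median of an ordinal variable can be sketched in Python by ranking the categories. The five-point satisfaction scale and the responses below are hypothetical.

```python
from collections import Counter

# Hypothetical five-point scale, listed from lowest to highest.
scale = ["very dissatisfied", "dissatisfied", "neutral",
         "satisfied", "very satisfied"]
rank = {label: position for position, label in enumerate(scale)}

responses = ["satisfied", "neutral", "very satisfied",
             "satisfied", "dissatisfied"]

# Sort the responses by their position on the scale, then take the middle one.
ordered = sorted(responses, key=rank.get)
median_response = ordered[len(ordered) // 2]   # valid for an odd number of cases

mode_response = Counter(responses).most_common(1)[0][0]

print(median_response)  # satisfied
print(mode_response)    # satisfied
```

With an even number of responses, the two middle categories may differ. They cannot be averaged, because the distance between categories is not defined.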
3. Interval
- a scale used to label variables that have a natural order and a quantifiable difference
between values, but “no true zero” value.
Properties
These variables have a natural order.
These variables have an exact difference between values. Recall that ordinal
variables have no exact difference between values – we don’t know if the difference
between “very satisfied” and “satisfied” is the same as the difference between
“satisfied” and “neutral.”
For variables on an interval scale, though, we know that the difference between a
credit score of 850 and 800 is exactly the same as the difference between 800 and 750.
These variables have no “true zero” value. For example, it’s impossible to have a
credit score of zero. It’s also impossible to have an SAT score of zero.
For temperatures, it’s possible to have negative values (e.g., -10° F), which means
zero is not a true minimum that values can’t go below.
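The following Python sketch illustrates why differences are meaningful on an interval scale while ratios are not. It converts hypothetical Celsius readings to Fahrenheit. The Celsius values and the function name `celsius_to_fahrenheit` are assumptions added for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Three hypothetical readings, 10 degrees Celsius apart.
low_c, mid_c, high_c = 10.0, 20.0, 30.0
low_f, mid_f, high_f = (celsius_to_fahrenheit(t) for t in (low_c, mid_c, high_c))

# Equal differences stay equal after changing units.
print(mid_f - low_f, high_f - mid_f)   # 18.0 18.0

# Ratios do not survive the change of units, so "twice as hot" has no meaning.
ratio_c = mid_c / low_c                # 2.0
ratio_f = mid_f / low_f                # 68 / 50, about 1.36
print(ratio_c, round(ratio_f, 2))      # 2.0 1.36
```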
4. Ratio
- a scale used to label variables that have a natural order, a quantifiable difference
between values, and a “true zero” value.
Properties
These variables have a natural order.
We can calculate the mean, median, mode, and a variety of other descriptive
statistics for these variables.
These variables have a “true zero” value. For example, length, weight, and height
all have a minimum value (zero) that values can’t go below.
Because zero is a true minimum, these variables usually can’t take on negative values.
For this reason, the ratio between values can be calculated.
For example, someone who weighs 200 lbs. can be said to weigh two times as much
as someone who weighs 100 lbs. Likewise, someone who is 6 feet tall is 1.5 times
as tall as someone who is 4 feet tall.
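As a quick check, the Python sketch below shows that ratios on a ratio scale do not depend on the unit of measurement. The weights and heights come from the example above. The conversion to kilograms and centimeters is added here only for illustration.

```python
import math

LB_TO_KG = 0.45359237   # kilograms per pound
FT_TO_CM = 30.48        # centimeters per foot

weight_ratio_lb = 200 / 100
weight_ratio_kg = (200 * LB_TO_KG) / (100 * LB_TO_KG)

height_ratio_ft = 6 / 4
height_ratio_cm = (6 * FT_TO_CM) / (4 * FT_TO_CM)

# The ratio is the same in either unit because zero means "none" in both.
print(weight_ratio_lb, math.isclose(weight_ratio_lb, weight_ratio_kg))  # 2.0 True
print(height_ratio_ft, math.isclose(height_ratio_ft, height_ratio_cm))  # 1.5 True
```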
Demonstration