
Econometrics


Simple Linear Regression

Regression analysis provides detailed insight that can be applied to further improve products and services.
Here at Alchemer, we offer hands-on application training events during which
customers learn how to become super users of our software.
In order to understand the value being delivered at these training events, we
distribute follow-up surveys to attendees with the goals of learning what they
enjoyed, what they didn’t, and what we can improve on for future sessions.
The data collected from these feedback surveys allows us to measure the
levels of satisfaction that our attendees associate with our events, and what
variables influence those levels of satisfaction.
Could it be the topics covered in the individual sessions of the event? The
length of the sessions? The food or catering services provided? The cost to
attend? Any of these variables have the potential to impact an attendee’s level
of satisfaction.

By performing a regression analysis on this survey data, we can determine whether or not these variables have impacted overall attendee satisfaction, and if so, to what extent.

This information then informs us about which elements of the sessions are
being well received, and where we need to focus attention so that attendees
are more satisfied in the future.

What is regression analysis and what does it mean to perform a regression?
Regression analysis is a reliable method of identifying which variables have
impact on a topic of interest. The process of performing a regression allows
you to confidently determine which factors matter most, which factors can be
ignored, and how these factors influence each other.

In order to understand regression analysis fully, it’s essential to comprehend the following terms:

• Dependent Variable: This is the main factor that you’re trying to understand or predict.
• Independent Variables: These are the factors that you hypothesize
have an impact on your dependent variable.
In our application training example above, attendees’ satisfaction with the
event is our dependent variable. The topics covered, length of sessions, food
provided, and the cost of a ticket are our independent variables.
How does regression analysis work?
In order to conduct a regression analysis, you’ll need to define a dependent
variable that you hypothesize is being influenced by one or several
independent variables.

You’ll then need to establish a comprehensive dataset to work with. Administering surveys to your audiences of interest is a terrific way to establish this dataset. Your survey should include questions addressing all of the independent variables that you are interested in.

Let’s continue using our application training example. In this case, we’d want to measure historical levels of satisfaction with the events from the past three years or so (or for whatever period you deem provides a sufficient sample), as well as any available information regarding the independent variables.

Perhaps we’re particularly curious about how the price of a ticket to the event
has impacted levels of satisfaction.

To investigate whether or not there is a relationship between these two variables, we would begin by plotting the data points on a scatterplot.

What is simple linear regression?
Simple linear regression is a regression model that estimates the relationship between one
independent variable and one dependent variable using a straight line. Both variables
should be quantitative.

Simple Linear Regression | An Easy Introduction & Examples


Regression models describe the relationship between variables by fitting a line to the observed
data. Linear regression models use a straight line, while logistic and nonlinear regression models
use a curved line. Regression allows you to estimate how a dependent variable changes as the
independent variable(s) change.

Simple linear regression is used to estimate the relationship between two quantitative
variables. You can use simple linear regression when you want to know:
1. How strong the relationship is between two variables (e.g. the relationship between rainfall and
soil erosion).
2. The value of the dependent variable at a certain value of the independent variable (e.g. the
amount of soil erosion at a certain level of rainfall).

Example: You are a social researcher interested in the relationship between income and happiness. You survey 500 people whose incomes range from 15k to 75k and ask them to rank their happiness on a scale from 1 to 10.
Your independent variable (income) and dependent variable (happiness) are
both quantitative, so you can do a regression analysis to see if there is a linear
relationship between them.

Assumptions of simple linear regression


Simple linear regression is a parametric test, meaning that it makes certain
assumptions about the data. These assumptions are:

1. Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the independent variable.
2. Independence of observations: the observations in the dataset were collected
using statistically valid sampling methods, and there are no hidden relationships
among observations.
3. Normality: The data follows a normal distribution.

Multiple Linear Regression | A Quick Guide (Examples)


Published on February 20, 2020 by Rebecca Bevans. Revised on June 1, 2022.
Regression models are used to describe relationships between variables by fitting a line to the
observed data. Regression allows you to estimate how a dependent variable changes as the
independent variable(s) change.

Multiple linear regression is used to estimate the relationship between two or more
independent variables and one dependent variable. You can use multiple linear regression
when you want to know:

1. How strong the relationship is between two or more independent variables and one dependent
variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
2. The value of the dependent variable at a certain value of the independent variables (e.g. the
expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).

Example: You are a public health researcher interested in social factors that influence
heart disease. You survey 500 towns and gather data on the percentage of people in
each town who smoke, the percentage of people in each town who bike to work, and
the percentage of people in each town who have heart disease.
Because you have two independent variables and one dependent variable, and all your
variables are quantitative, you can use multiple linear regression to analyze the
relationship between them.

Assumptions of multiple linear regression


Multiple linear regression makes all of the same assumptions as simple linear
regression:

Homogeneity of variance (homoscedasticity): the size of the error in our prediction doesn’t change significantly across the values of the independent variable.

Independence of observations: the observations in the dataset were collected using statistically valid methods, and there are no hidden relationships among variables.

In multiple linear regression, it is possible that some of the independent variables are
actually correlated with one another, so it is important to check these before developing
the regression model. If two independent variables are too highly correlated (r² > ~0.6),
then only one of them should be used in the regression model.

Normality: The data follows a normal distribution.

Linearity: the line of best fit through the data points is a straight line, rather than a curve
or some sort of grouping factor.

How to perform a multiple linear regression


Multiple linear regression formula
The formula for a multiple linear regression is:

ŷ = β0 + β1X1 + … + βnXn + ε

• ŷ = the predicted value of the dependent variable
• β0 = the y-intercept (value of y when all other parameters are set to 0)
• β1X1 = the regression coefficient (β1) of the first independent variable (X1) (a.k.a. the effect that increasing the value of the independent variable has on the predicted y value)
• … = do the same for however many independent variables you are testing
• βnXn = the regression coefficient of the last independent variable
• ε = model error (a.k.a. how much variation there is in our estimate of ŷ)

To find the best-fit line for each independent variable, multiple linear regression
calculates three things:

• The regression coefficients that lead to the smallest overall model error.
• The t-statistic of the overall model.
• The associated p-value (how likely it is that the t-statistic would have occurred by
chance if the null hypothesis of no relationship between the independent and
dependent variables was true).
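To make the coefficient-finding step concrete, here is a minimal sketch of ordinary least squares via the normal equations, using only Python’s standard library. The crop-yield data (rainfall and fertilizer per plot) are invented and constructed to lie exactly on a plane, so the recovered coefficients are known in advance:

```python
def fit_ols(X, y):
    """Return [b0, b1, ..., bk]: intercept plus one coefficient per predictor."""
    n = len(X)
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = len(A[0])
    # Normal equations: (A^T A) b = A^T y
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    Aty = [sum(A[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Solve the k x k system by Gauss-Jordan elimination with partial pivoting.
    M = [AtA[i] + [Aty[i]] for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

# Hypothetical plots: rainfall (mm) and fertilizer (kg) per plot.
X = [[10, 1], [20, 1], [20, 2], [30, 2], [30, 3], [40, 3]]
y = [5.0 + 0.2 * r + 1.5 * f for r, f in X]  # constructed to be exactly linear

b0, b1, b2 = fit_ols(X, y)  # recovers intercept 5.0 and coefficients 0.2, 1.5
```

In practice a statistics package handles this (and also reports the t-statistic and p-value); the sketch only shows where the "smallest overall model error" coefficients come from.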

Logistic Regression was used in the biological sciences in the early twentieth century. It was then used in many social science applications. Logistic Regression is used when the dependent variable (target) is categorical.

For example,

• To predict whether an email is spam (1) or not spam (0)

• Whether the tumor is malignant (1) or not (0)

Consider a scenario where we need to classify whether an email is spam or not. If we used linear regression for this problem, we would need to set a threshold on which to base the classification. Say the actual class is spam, the predicted continuous value is 0.4, and the threshold is 0.5: the email would be classified as not spam, which could have serious consequences. From this example, it can be inferred that linear regression is not suitable for classification problems. Linear regression is unbounded, which is what brings logistic regression into the picture: its predicted values strictly range from 0 to 1.
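The piece that keeps logistic regression bounded is the logistic (sigmoid) function, which squashes any real-valued linear score into (0, 1). A minimal sketch (the scores and the 0.5 threshold are illustrative):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real-valued score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# A linear score (e.g. w*x + b from a fitted model) is unbounded; the
# sigmoid turns it into a probability, which is then thresholded.
def predict_spam(score, threshold=0.5):
    return 1 if sigmoid(score) >= threshold else 0  # 1 = spam, 0 = not spam
```

Because the output is a probability rather than an arbitrary continuous value, the threshold has a fixed, interpretable meaning regardless of how extreme the inputs are.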

Chapter 6 / Validity and Reliability

Again, measurement involves assigning scores to individuals so that they represent some characteristic of the individuals. But how do researchers know
that the scores actually represent the characteristic, especially when it is a
construct like intelligence, self-esteem, depression, or working memory
capacity? The answer is that they conduct research using the measure to
confirm that the scores make sense based on their understanding of the
construct being measured. This is an extremely important point. Psychologists
do not simply assume that their measures work. Instead, they collect data
to demonstrate that they work. If their research does not demonstrate that a
measure works, they stop using it.

As an informal example, imagine that you have been dieting for a month. Your
clothes seem to be fitting more loosely, and several friends have asked if you
have lost weight. If at this point your bathroom scale indicated that you had
lost 10 pounds, this would make sense and you would continue to use the
scale. But if it indicated that you had gained 10 pounds, you would rightly
conclude that it was broken and either fix it or get rid of it. In evaluating a
measurement method, psychologists consider two general dimensions:
reliability and validity.

RELIABILITY

Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).

Test-Retest Reliability

When researchers measure a construct that they assume to be consistent across time, then the scores they obtain should also be consistent across time. Test-retest reliability is the extent to which this is actually the case.
For example, intelligence is generally thought to be consistent across time. A
person who is highly intelligent today will be highly intelligent next week. This
means that any good measure of intelligence should produce roughly the same
scores for this individual next week as it does today. Clearly, a measure that
produces highly inconsistent scores over time cannot be a very good measure
of a construct that is supposed to be consistent.

Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores.
This is typically done by graphing the data in a scatterplot and computing the
correlation coefficient. Figure 4.2 shows the correlation between two sets of
scores of several university students on the Rosenberg Self-Esteem Scale,
administered two times, a week apart. The correlation coefficient for these
data is +.95. In general, a test-retest correlation of +.80 or greater is
considered to indicate good reliability.

Again, high test-retest correlations make sense when the construct being
measured is assumed to be consistent over time, which is the case for
intelligence, self-esteem, and the Big Five personality dimensions. But other
constructs are not assumed to be stable over time. The very nature of mood,
for example, is that it changes. So a measure of mood that produced a low test-
retest correlation over a period of a month would not be a cause for concern.

Another kind of reliability is internal consistency, which is the consistency of people’s responses across the items on a multiple-item measure. In general,
all the items on such measures are supposed to reflect the same underlying
construct, so people’s scores on those items should be correlated with each
other. On the Rosenberg Self-Esteem Scale, people who agree that they are a
person of worth should tend to agree that they have a number of good
qualities. If people’s responses to the different items are not correlated with
each other, then it would no longer make sense to claim that they are all
measuring the same underlying construct. This is as true for behavioral and
physiological measures as for self-report measures. For example, people might
make a series of bets in a simulated game of roulette as a measure of their
level of risk seeking. This measure would be internally consistent to the extent
that individual participants’ bets were consistently high or low across trials.

Like test-retest reliability, internal consistency can only be assessed by collecting and analyzing data. One approach is to look at a split-half correlation. This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items.
Then a score is computed for each set of items, and the relationship between
the two sets of scores is examined. For example, Figure 4.3 shows the split-
half correlation between several university students’ scores on the even-
numbered items and their scores on the odd-numbered items of the
Rosenberg Self-Esteem Scale. The correlation coefficient for these data is
+.88. A split-half correlation of +.80 or greater is generally considered good
internal consistency.

Perhaps the most common measure of internal consistency used by researchers in psychology is a statistic called Cronbach’s α (the Greek letter alpha). Conceptually, α is the mean of all possible split-half correlations for a
set of items. For example, there are 252 ways to split a set of 10 items into two
sets of five. Cronbach’s α would be the mean of the 252 split-half correlations.
Note that this is not how α is actually computed, but it is a correct way of
interpreting the meaning of this statistic. Again, a value of +.80 or greater is
generally taken to indicate good internal consistency.

Interrater Reliability

Many behavioral measures involve significant judgment on the part of an observer or a rater. Inter-rater reliability is the extent to which different
observers are consistent in their judgments. For example, if you were
interested in measuring university students’ social skills, you could make
video recordings of them as they interacted with another student whom they
are meeting for the first time. Then you could have two or more observers
watch the videos and rate each student’s level of social skills. To the extent
that each participant does, in fact, have some level of social skills that can be
detected by an attentive observer, different observers’ ratings should be highly
correlated with each other. Inter-rater reliability would also have been
measured in Bandura’s Bobo doll study. In this case, the observers’ ratings of
how many acts of aggression a particular child committed while playing with
the Bobo doll should have been highly positively correlated. Interrater
reliability is often assessed using Cronbach’s α when the judgments are
quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa)
when they are categorical.

VALIDITY
Validity is the extent to which the scores from a measure represent the variable they are
intended to. But how do researchers make this judgment? We have already considered
one factor that they take into account—reliability. When a measure has good test-retest
reliability and internal consistency, researchers should be more confident that the scores
represent what they are supposed to. There has to be more to it, however, because a
measure can be extremely reliable but have no validity whatsoever. As an absurd example,
imagine someone who believes that people’s index finger length reflects their self-esteem
and therefore tries to measure self-esteem by holding a ruler up to people’s index fingers.
Although this measure would have extremely good test-retest reliability, it would have
absolutely no validity. The fact that one person’s index finger is a centimeter longer than
another’s would indicate nothing about which one had higher self-esteem.

Discussions of validity usually divide it into several distinct “types.” But a good way to
interpret these types is that they are other kinds of evidence—in addition to reliability—
that should be taken into account when judging the validity of a measure. Here we
consider three basic kinds: face validity, content validity, and criterion validity.

Face validity is the extent to which a measurement method appears “on its face” to
measure the construct of interest. Most people would expect a self-esteem questionnaire
to include items about whether they see themselves as a person of worth and whether they
think they have good qualities. So a questionnaire that included these kinds of items
would have good face validity. The finger-length method of measuring self-esteem, on the
other hand, seems to have nothing to do with self-esteem and therefore has poor face
validity. Although face validity can be assessed quantitatively—for example, by having a
large sample of people rate a measure in terms of whether it appears to measure what it
is intended to—it is usually assessed informally.

Face validity is at best a very weak kind of evidence that a measurement method is
measuring what it is supposed to. One reason is that it is based on people’s intuitions
about human behavior, which are frequently wrong. It is also the case that many
established measures in psychology work quite well despite lacking face validity. The
Minnesota Multiphasic Personality Inventory-2 (MMPI-2) measures many personality
characteristics and disorders by having people decide whether each of over 567 different
statements applies to them—where many of the statements do not have any obvious
relationship to the construct that they measure. For example, the items “I enjoy detective
or mystery stories” and “The sight of blood doesn’t frighten me or make me sick” both
measure the suppression of aggression. In this case, it is not the participants’ literal
answers to these questions that are of interest, but rather whether the pattern of the
participants’ responses to a series of questions matches those of individuals who tend to
suppress their aggression.

Content validity is the extent to which a measure “covers” the construct of interest. For
example, if a researcher conceptually defines test anxiety as involving both sympathetic
nervous system activation (leading to nervous feelings) and negative thoughts, then his
measure of test anxiety should include items about both nervous feelings and negative
thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings,
and actions toward something. By this conceptual definition, a person has a positive
attitude toward exercise to the extent that he or she thinks positive thoughts about
exercising, feels good about exercising, and actually exercises. So to have good content
validity, a measure of people’s attitudes toward exercise would have to reflect all three of
these aspects. Like face validity, content validity is not usually assessed quantitatively.
Instead, it is assessed by carefully checking the measurement method against the
conceptual definition of the construct.

Criterion validity is the extent to which people’s scores on a measure are correlated
with other variables (known as criteria) that one would expect them to be correlated
with. For example, people’s scores on a new measure of test anxiety should be negatively
correlated with their performance on an important school exam. If it were found that
people’s scores were in fact negatively correlated with their exam performance, then this
would be a piece of evidence that these scores really represent people’s test anxiety. But if
it were found that people scored equally well on the exam regardless of their test anxiety
scores, then this would cast doubt on the validity of the measure.

A criterion can be any variable that one has reason to think should be correlated with the
construct being measured, and there will usually be many of them. For example, one
would expect test anxiety scores to be negatively correlated with exam performance and
course grades and positively correlated with general anxiety and with blood pressure
during an exam. Or imagine that a researcher develops a new measure of physical risk
taking. People’s scores on this measure should be correlated with their participation in
“extreme” activities such as snowboarding and rock climbing, the number of speeding
tickets they have received, and even the number of broken bones they have had over the
years. When the criterion is measured at the same time as the construct, criterion validity
is referred to as concurrent validity; however, when the criterion is measured at some
point in the future (after the construct has been measured), it is referred to as predictive
validity (because scores on the measure have “predicted” a future outcome).

Criteria can also include other measures of the same construct. For example, one would
expect new measures of test anxiety or physical risk taking to be positively correlated with
existing established measures of the same constructs. This is known as convergent
validity.

Assessing convergent validity requires collecting data using the measure. Researchers
John Cacioppo and Richard Petty did this when they created their self-report Need for
Cognition Scale to measure how much people value and engage in thinking (Cacioppo &
Petty, 1982)[1]. In a series of studies, they showed that people’s scores were positively
correlated with their scores on a standardized academic achievement test, and that their
scores were negatively correlated with their scores on a measure of dogmatism (which
represents a tendency toward obedience). In the years since it was created, the Need for
Cognition Scale has been used in literally hundreds of studies and has been shown to be
correlated with a wide variety of other variables, including the effectiveness of an
advertisement, interest in politics, and juror decisions (Petty, Briñol, Loersch, &
McCaslin, 2009)[2].
Discriminant validity, on the other hand, is the extent to which scores on a measure
are not correlated with measures of variables that are conceptually distinct. For example,
self-esteem is a general attitude toward the self that is fairly stable over time. It is not the
same as mood, which is how good or bad one happens to be feeling right now. So people’s
scores on a new measure of self-esteem should not be very highly correlated with their
moods. If the new measure of self-esteem were highly correlated with a measure of mood,
it could be argued that the new measure is not really measuring self-esteem; it is
measuring mood instead.

When they created the Need for Cognition Scale, Cacioppo and Petty also provided
evidence of discriminant validity by showing that people’s scores were not correlated with
certain other variables. For example, they found only a weak correlation between people’s
need for cognition and a measure of their cognitive style—the extent to which they tend
to think analytically by breaking ideas into smaller parts or holistically in terms of “the
big picture.” They also found no correlation between people’s need for cognition and
measures of their test anxiety and their tendency to respond in socially desirable ways.
All these low correlations provide evidence that the measure is reflecting a conceptually
distinct construct.
