Statistics Notes (Final)
Statistics: the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more
effective decisions.
} Descriptive statistics: methods of organizing, summarizing, and presenting data in an informative way.
(They can be used to organize data into a meaningful form and to summarize data into information that is easy to understand.)
} Inferential statistics: the methods used to estimate a property of a population on the basis of a sample.
(They can be used to estimate properties of a population and to make decisions based on a limited set of data.)
Population: The entire set of individuals/objects of interest or the
measurements obtained from all individuals/objects of interest.
Types of Variables
1. Qualitative variable: observed and recorded as a non-numeric characteristic or attribute.
Ex: gender, state of birth, eye color
2. Quantitative variable: reported numerically; can be discrete or continuous.
Ex: balance in your checking account, the life of a car battery
} Discrete = typically the result of counting; there are “gaps” between the values. Examples: the number of rooms
(1, 2, etc.), the number of students in a class (326, 421, etc.), always whole numbers. Not possible: 1.5 persons or 2.3 rooms.
} Continuous = the result of measuring something, can assume any value within a specific range. Examples:
Duration of flights from A to B (5.25 hours), grade point average (3.258).
Levels of Measurement (determines the type of statistical analysis that can be performed)
1. Nominal Level of Measurement: data is represented as labels, can only be classified & counted.
Examples: classifying M&M candies by color, identifying students at a football game by gender.
2. Ordinal Level of Measurement: data is based on a relative ranking or rating of items based on a
defined attribute or qualitative variable. Variables based on this level of measurement are only ranked
and counted. (The rankings are known but not the magnitude of differences between groups)
Examples: the list of top ten states for best business climate, student ratings of professors.
3. Interval Level of Measurement: data for which the interval, or distance, between values is meaningful, based
on a scale with a known unit of measurement. (This data has all the characteristics of ordinal-level
data, plus the differences between values are meaningful; there is no natural zero point.) Examples: the
Fahrenheit temperature scale (a temperature of zero does not mean there is no temperature at all), dress sizes.
4. Ratio Level of Measurement: data based on a scale with a known unit of measurement and a
meaningful interpretation of zero on the scale. (The data has all the characteristics of the interval
scale, plus ratios between numbers are meaningful; the zero point represents the absence of the
characteristic.) Examples: wages (zero dollars = no money), changes in stock prices, and height.
C2: DESCRIBING DATA: FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS, &
GRAPHIC PRESENTATION
Frequency Table: a grouping of qualitative data into mutually exclusive and collectively exhaustive classes
showing the number of observations in each class.
} Mutually exclusive means the data fit in just one class. (In frequency distributions, classes are mutually
exclusive if each individual, object, or measurement is included in only one category.)
} Collectively exhaustive means there is a class for each value.
Example
1) A frequency distribution is a grouping of quantitative data into overlapping classes showing the
number of observations in each class
⊚ True
⊚ False -> correct: classes in a frequency distribution may not overlap; they must be mutually exclusive.
2) What is the relative class frequency for the $25 up to $35 class?
9/50 = 0.18
Bar Chart: A graph that shows the qualitative classes on the horizontal axis and the
class frequencies on the vertical axis. The class frequencies are proportional to the
heights of the bars. Use a bar chart when you wish to compare the number of
observations for each class of a qualitative variable.
Pie Chart: A chart that shows the proportion or percentage that each class represents of the
total number of frequencies. Use a pie chart when you wish to compare relative differences
in the percentage of observations for each class of a qualitative variable.
Frequency Distribution: A grouping of quantitative data into mutually exclusive and collectively exhaustive
classes showing the number of observations in each class.
Constructing Frequency Distributions
Step 3 Set the individual class limits. Lower limits should be rounded to an easy-to-read
number when possible
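A minimal Python sketch of the construction, under stated assumptions: Steps 1 and 2 did not survive in these notes, so the sketch assumes the common textbook rules of choosing the smallest number of classes k with 2^k > n and a whole-number class interval wide enough that k classes cover the range. The helper `frequency_distribution` and the data are hypothetical.

```python
import math
from collections import Counter


def frequency_distribution(data):
    """Group quantitative data into equal-width classes (hypothetical helper)."""
    n = len(data)
    # Assumed Step 1: smallest k such that 2**k > n
    k = 1
    while 2 ** k <= n:
        k += 1
    # Assumed Step 2: smallest whole-number interval with k * width > max - min,
    # so every value falls into one of the k classes
    width = (max(data) - min(data)) // k + 1
    # Step 3: first lower limit (in practice, round down to an easy-to-read number)
    lower = min(data)
    # Tally each value into the class "lo up to lo + width"
    counts = Counter((x - lower) // width for x in data)
    table = []
    for i in range(k):
        lo = lower + i * width
        f = counts.get(i, 0)
        table.append((lo, lo + width, f, f / n))  # (lower, upper, frequency, relative frequency)
    return table


for lo, hi, f, rel in frequency_distribution([3, 7, 8, 12, 15, 15, 18, 21, 22, 25]):
    print(f"{lo} up to {hi}: {f} ({rel:.2f})")
```

The relative frequency column is each class frequency divided by n, as in the 9/50 example above.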
Relative Frequency Distributions
Histogram: A graph in which the classes are marked on the horizontal axis and the class frequencies on the
vertical axis. The class frequencies are represented by the heights of the bars, and the bars are drawn adjacent
to each other. A histogram shows the shape of a distribution.
Frequency polygon: similar to a histogram, it also shows the shape of a distribution. It is good to use when
comparing two or more distributions.
C3: DESCRIBING DATA: NUMERICAL MEASURES
Example:
There are 42 exits on I-75 through the state of Kentucky.
Listed below are the distances between exits (in miles).
Example:
Median: The midpoint of the values after they have been ordered from the minimum to the maximum values.
Example:
The Carter Construction Company pays its hourly employees $16.50, $19.00, or $25.00 per hour. There are
26 hourly employees: 14 are paid at the $16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is
the mean hourly rate paid for the 26 employees?
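The Carter Construction question calls for a weighted mean, Σ(wx)/Σw, where each hourly rate is weighted by the number of employees paid that rate. A short Python check (the helper name `weighted_mean` is illustrative):

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hourly rates and the number of employees paid each rate
rate = weighted_mean([16.50, 19.00, 25.00], [14, 10, 2])
print(f"${rate:.2f}")  # $18.12
```

That is, ($231.00 + $190.00 + $50.00)/26 = $471.00/26, about $18.12 per hour.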
Example: Population Variance
The number of traffic citations issued last year by month in Beaufort County, South Carolina, is reported in the accompanying table.
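Because the monthly citation counts are not reproduced in these notes, the sketch below applies the population variance formula σ² = Σ(x − μ)²/N to a small hypothetical data set and cross-checks it against Python's standard `statistics.pvariance`.

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N: divide by N, not N - 1, for a population."""
    N = len(values)
    mu = sum(values) / N
    return sum((x - mu) ** 2 for x in values) / N


# Hypothetical monthly counts, not the Beaufort County data
counts = [2, 4, 4, 4, 5, 5, 7, 9]
var = population_variance(counts)
print(var, math.sqrt(var))  # 4.0 2.0
```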
If we have a symmetrical, bell-shaped distribution, we can use the Empirical Rule, sometimes called the Normal Rule: about 68% of the
observations will lie within plus or minus 1 standard deviation of the mean, about 95% will lie within plus or minus 2 standard
deviations of the mean, and practically all (99.7%) will lie within 3 standard deviations of the mean.
To compute the standard deviation of grouped data, begin by estimating the mean (in this example, the mean is $1,851), then find the
deviation of each class midpoint from the mean, square these deviations, and multiply each by its class frequency. Sum these
products, divide the sum by n − 1, and finally take the square root of the result.
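The grouped-data steps can be sketched in Python as follows. The midpoints and frequencies are hypothetical (the $1,851 example's table is not reproduced here), and the helper name `grouped_std` is illustrative.

```python
import math


def grouped_std(midpoints, freqs):
    """Mean and sample standard deviation of grouped data from class midpoints."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    # Square each midpoint's deviation from the mean, weight by frequency, and sum
    ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))
    return mean, math.sqrt(ss / (n - 1))


mean, s = grouped_std([10, 20, 30], [2, 3, 5])  # hypothetical classes
print(mean, round(s, 2))  # 23.0 8.23
```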
C4 DESCRIBING DATA: DISPLAYING AND EXPLORING DATA
Dot Plots: use dot plots to compare two data sets, such as the number of vehicles serviced last
month at two different dealerships.
Measures of Position
} n= Number of observations
} P= Percentile
Example
Morgan Stanley is an investment company with offices located throughout the United States. Listed below
are the commissions earned last month by a sample of 15 brokers
} First, sort the data from smallest to largest
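The percentile formula itself did not survive in these notes. The sketch below assumes the location formula often paired with the symbols n and P, L_p = (n + 1)P/100, and interpolates between neighboring sorted values when the location is not a whole number. The data and the helper name `percentile` are hypothetical.

```python
def percentile(sorted_data, p):
    """Value at location L_p = (n + 1) * p / 100 (1-based), with linear interpolation."""
    n = len(sorted_data)
    loc = (n + 1) * p / 100
    whole = int(loc)      # integer part of the location
    frac = loc - whole    # fractional part used for interpolation
    if whole < 1:
        return sorted_data[0]
    if whole >= n:
        return sorted_data[-1]
    lower, upper = sorted_data[whole - 1], sorted_data[whole]
    return lower + frac * (upper - lower)


data = sorted([30, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])  # 15 hypothetical values
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))  # 8.0 16.0 24.0
```

With 15 observations, Q1, the median, and Q3 sit at locations 4, 8, and 12.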
} If a distribution of wages, incomes, turnover, etc. is arranged in order, the quartiles are the values that
divide the distribution into four equal parts. Thus, for a distribution of wages:
• The first quartile (Q1) is the wage below which 25% of the wages are situated, or equivalently the wage
above which 75% of the wages are situated
• The second quartile (Q2, the median) is the wage below which 50% of the wages are situated, or the
wage above which 50% of the wages are situated
• The third quartile (Q3) is the wage below which 75% of the wages are situated, or above
which 25% of the wages are situated.
Box Plot: a graphic display that shows the general shape of a variable’s distribution. It is based on five
descriptive statistics: the maximum and minimum values, the first and third quartiles, and the median.
Example
Alexander’s Pizza offers free delivery of its pizza within 15 miles. How long does a typical delivery take?
Within what range will most deliveries be completed?
Using a sample of 20 deliveries,
Alexander determined the following:
} Minimum value = 13 minutes
} Q1 = 15 minutes
} Median = 18 minutes
} Q3 = 22 minutes
} Maximum value = 30 minutes
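From the five-number summary, the middle 50% of deliveries take between Q1 = 15 and Q3 = 22 minutes. The sketch below also applies the common 1.5 × IQR outlier fences. That rule is an assumption here, since these notes do not state it, and the helper name `box_plot_summary` is illustrative.

```python
def box_plot_summary(minimum, q1, median, q3, maximum):
    """Interquartile range, 1.5 * IQR fences (assumed convention), and an outlier flag."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    has_outliers = minimum < lower_fence or maximum > upper_fence
    return iqr, (lower_fence, upper_fence), has_outliers


# Alexander's Pizza delivery times (minutes)
print(box_plot_summary(13, 15, 18, 22, 30))  # (7, (4.5, 32.5), False)
```

Since the maximum of 30 minutes is below the upper fence of 32.5, no delivery time would be flagged as an outlier under this rule.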
Skewness
Skewness Example
Following are the earnings per share for a sample of 15 software companies for the year 2018. The
earnings per share are arranged from smallest to largest.
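The 15 earnings-per-share values are not reproduced here, so this sketch uses hypothetical data. It assumes Pearson's coefficient of skewness, sk = 3(mean − median)/s, one common measure. A positive value indicates a right (positively) skewed distribution.

```python
import statistics


def pearson_skewness(data):
    """sk = 3 * (mean - median) / s, where s is the sample standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)


# Hypothetical values with one large observation pulling the mean above the median
print(round(pearson_skewness([1, 2, 3, 4, 10]), 3))  # 0.849
```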
Correlation Coefficient
Contingency Tables: A table used to classify observations according to two identifiable characteristics.
} It is a cross-tabulation that simultaneously summarizes two variables of interest
} Both variables need only be nominal or ordinal
Example
Applewood Auto Group’s profit comparison
} 90 of the 180 cars sold had a profit above the median and half (90) had a profit below it. This meets the
definition of the median.
} The percentages of profits above the median are Kane 48%, Olean 50%, Sheffield 42%, and Tionesta 60%.
INDEX NUMBER: number that expresses the relative change in price, quantity, or value compared to a
base period.
Construction of Index Numbers
Suppose the price of a fall weekend package at Tryon Mountain Lodge in western North Carolina in 2000
was $450. The price rose to $795 in 2019. What is the price index for 2019 using 2000 as the base period
and 100 as the base value?
$$P = \frac{p_t}{p_0}(100) = \frac{\$795}{\$450}(100) = 176.7$$
The fall weekend package increased 76.7% from 2000 to 2019.
When several years are used as the base, their prices are averaged: with 2015-2017 as the base, the prices $20, $22,
and $23 average to $21.67, and each year’s price is divided by 21.67 (then multiplied by 100) to obtain the price index.
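A small Python sketch of the simple price index, (p_t / p_0)(100), covering both the single base year and the averaged three-year base described above. The helper name `price_index` is illustrative.

```python
def price_index(p_t, p_0):
    """Simple price index: (p_t / p_0) * 100."""
    return p_t / p_0 * 100


# Tryon Mountain Lodge: $795 in 2019 with 2000 ($450) as the base period
print(round(price_index(795, 450), 1))  # 176.7

# Multi-year base: average the base-period prices first
base = sum([20, 22, 23]) / 3            # about 21.67
print([round(price_index(p, base), 1) for p in (20, 22, 23)])  # [92.3, 101.5, 106.2]
```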
Unweighted Indexes
} In an unweighted index, we do not consider the quantities
$$\bar{P} = \frac{\sum P_i}{n} = \frac{92.3 + \dots + 147.9}{6} = \frac{591.0}{6} = 98.5$$
The mean price of food decreased 1.5% from 2009 to 2019.
Simple Aggregate Price Index
The simple aggregate index for the food items in the previous example is found by dividing the sum of the prices in
2019 by the sum of the prices in 2009:
$$P = \frac{\sum p_t}{\sum p_0}(100) = 103.7$$
This means that the aggregate group of prices increased 3.7% from 2009 to 2019.
In a weighted index, the quantities are considered. In the Laspeyres method, base-period quantities are
used in both the base period and the given period:
$$P = \frac{\sum p_t q_0}{\sum p_0 q_0}(100)$$
} Advantage: only quantity data from the base period are needed, which allows a more meaningful
comparison over time
} Disadvantage: it does not reflect changes in buying patterns over time, and it may overweight goods
whose prices increase
We conclude the price of this group of items has increased 3.0% from 2009 to 2019.
Paasche Price Index Example
The following table shows the calculations to determine the Paasche index.
$$P = \frac{\sum p_t q_t}{\sum p_0 q_t}(100) = \frac{\$667.78}{\$753.00}(100) = 88.7$$
This result indicates that there has been a decrease of 11.3% in the price of this “market basket” of goods
between 2009 and 2019.
Value Index
$$V = \frac{\sum p_t q_t}{\sum p_0 q_0}(100) = \frac{\$10{,}600}{\$9{,}000}(100) = 117.8$$
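The aggregate, Laspeyres, Paasche, and value index formulas above differ only in which prices and quantities enter the sums. The Python sketch below puts them side by side. The two-item price and quantity lists are hypothetical, not the food basket from these notes, and the function names are illustrative.

```python
def _dot(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def simple_aggregate(p0, pt):
    return sum(pt) / sum(p0) * 100


def laspeyres(p0, pt, q0):
    """Base-period quantities q0 weight both periods."""
    return _dot(pt, q0) / _dot(p0, q0) * 100


def paasche(p0, pt, qt):
    """Given-period quantities qt weight both periods."""
    return _dot(pt, qt) / _dot(p0, qt) * 100


def value_index(p0, pt, q0, qt):
    """Both prices and quantities change between periods."""
    return _dot(pt, qt) / _dot(p0, q0) * 100


# Hypothetical two-item basket
p0, pt = [1.0, 2.0], [2.0, 2.0]
q0, qt = [10, 5], [5, 10]
print(simple_aggregate(p0, pt), laspeyres(p0, pt, q0), paasche(p0, pt, qt), value_index(p0, pt, q0, qt))
```

The sums reported in these notes reproduce the same arithmetic: 667.78/753.00 × 100 rounds to 88.7 and 10,600/9,000 × 100 rounds to 117.8.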
Special-Purpose Indexes
example
The Seattle Chamber of Commerce wants to develop a measure of general business activity. It will be
called the General Business Activity Index of the Northwest and will include department store sales (40%),
regional employment (30%), freight car loadings (10%), and exports from Seattle harbor (20%).
Business activity has increased
57.0% from 2005 to 2010 and 57.1%
from 2005 to 2018.
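A special-purpose index like this one is a weighted combination of component indexes, each expressed relative to its base value (base = 100). The sketch uses the weights stated above. The component names and the later-year component indexes are hypothetical, since the Seattle data are not reproduced here.

```python
# Weights from the General Business Activity Index description; key names are hypothetical labels
WEIGHTS = {
    "department_store_sales": 0.40,
    "regional_employment": 0.30,
    "freight_car_loadings": 0.10,
    "harbor_exports": 0.20,
}


def business_activity_index(component_indexes):
    """Weighted sum of component indexes, each already scaled so the base period = 100."""
    return sum(WEIGHTS[name] * index for name, index in component_indexes.items())


later_year = {  # hypothetical component indexes, not the published figures
    "department_store_sales": 150,
    "regional_employment": 120,
    "freight_car_loadings": 90,
    "harbor_exports": 200,
}
print(round(business_activity_index(later_year), 1))  # 145.0
```

In the base period every component index equals 100, so the composite index is also 100.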
Real Income
Suppose Ms. Watts earned $20,000 per year in the base period of 1982, 1983, and 1984. She has a current
income of $40,000. Note that although her money income has doubled since the base period of 1982-84,
the prices she pays for food, gasoline, clothing, and other items have also doubled. Compute her real
income.
Deflating Sales
The sales of Hill Enterprises, a small injection molding company in upstate New York, increased from
1982 to 2018. The owner, Harry Hill, realizes that the price of raw materials used in the production process
has also increased, so Mr. Hill wants to deflate sales to account for the increase in raw materials. What are
the deflated sales for 1990, 2000, 2005, 2010, 2015, and 2018 expressed in constant 1982 dollars?
Purchasing Power of the Dollar
Suppose the Consumer Price Index this month is 200.0 (1982-84 = 100). What is the purchasing power of
the dollar?
The CPI of 200 indicates that prices have doubled from the years 1982-84 to this month. Thus, the
purchasing power of the dollar has been cut in half. That is, a 1982-84 dollar is only worth 50 cents this
month.
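Real income, deflated sales, and purchasing power all divide by a price index and multiply by 100. The sketch below assumes the standard forms (value / index) × 100 and (1 / index) × 100. The helper names are illustrative, and the sales figures and index value in the second example are hypothetical.

```python
def real_value(money_value, index):
    """Deflate a money amount to base-period dollars: (value / index) * 100."""
    return money_value * 100 / index


def purchasing_power(cpi):
    """Value of $1 in base-period dollars: (1 / CPI) * 100."""
    return 100 / cpi


# Ms. Watts: money income $40,000, and prices doubled, so the CPI is 200
print(real_value(40_000, 200))   # 20000.0

# Deflating sales: hypothetical $500,000 of sales with a hypothetical index of 125
print(real_value(500_000, 125))  # 400000.0

# CPI of 200.0 (1982-84 = 100)
print(purchasing_power(200.0))   # 0.5
```

Her real income is $20,000, unchanged from the base period, because her money income and prices both doubled.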
Classical Probability
Mutually Exclusive: The occurrence of one event means that none of the other events can occur at the
same time.
Collectively Exhaustive: At least one of the events must occur when an experiment is conducted.
Empirical Probability
Under the empirical definition, the probability of an event is the number of times the event happened divided by
the total number of observations.
Empirical Probability: The probability of an event happening is the fraction of the time similar events
happened in the past.
Law of Large Numbers: Over a large number of trials, the empirical probability of an event will approach
its true probability
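The Law of Large Numbers can be illustrated with a simulated fair coin: as the number of tosses grows, the empirical probability of heads settles near the true value of 0.5. This is a minimal sketch using Python's `random` module with a fixed seed. The helper name is illustrative.

```python
import random


def empirical_probability(trials, seed=0):
    """Fraction of simulated fair-coin tosses that come up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


for n in (10, 1_000, 100_000):
    print(n, empirical_probability(n))
```

Small runs can stray well away from 0.5. The large runs stay close to it.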
Subjective Probability
Subjective Concept of Probability: the likelihood
(probability) of a particular event happening that is
assigned by an individual based on whatever
information is available.
Examples of subjective probability are:
} Estimating the likelihood the New England
Patriots will be in the Super Bowl next year
} Estimating the likelihood the U.S. budget deficit
will be reduced by half in the next 10 years
Rules of Addition
} The rules of addition give the probability that one or another of two or more
events occurs
} The special rule of addition is used when the events are mutually
exclusive: P(A or B) = P(A) + P(B)
Complement Rule
} The complement rule determines the probability of an event happening by subtracting the
probability of the event not happening from 1: P(A) = 1 − P(~A)
Joint Probability: a probability that measures the likelihood two or more events will happen concurrently.
General Rule of Addition Example:
A sample of 200 tourists in Florida shows 120 went to Disney, 100 went to Busch Gardens, and 60 visited both.
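Because 60 tourists visited both parks, the events are not mutually exclusive. The general rule of addition subtracts the joint probability, so P(Disney or Busch) = 120/200 + 100/200 − 60/200. The sketch below uses exact fractions and also applies the complement rule.

```python
from fractions import Fraction

n = 200
p_disney = Fraction(120, n)
p_busch = Fraction(100, n)
p_both = Fraction(60, n)          # joint probability

p_either = p_disney + p_busch - p_both   # general rule of addition
p_neither = 1 - p_either                 # complement rule
print(float(p_either), float(p_neither))  # 0.8 0.2
```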
Special Rule of Multiplication
} The rules of multiplication are applied when two or more events occur simultaneously
} The special rule of multiplication refers to events that are independent
Independence: The occurrence of one event has no effect on the probability of the occurrence of another event.
A survey by the American Automobile Association (AAA) revealed 60% of its members made airline
reservations last year. Two members are selected at random. What is the probability both made airline
reservations last year?
P(R1 and R2) = P(R1)P(R2) = (.60)(.60) = 0.36
Conditional Probability: The probability of a particular event occurring, given that another event has occurred.
} The conditional probability is represented as P(B|A) and is read “the probability of B given A”
A golfer has 12 golf shirts in his closet. Suppose 9 of these shirts are white and the others are blue. He gets
dressed in the dark, so he just grabs a shirt and puts it on. He plays golf two days in a row and does not return
the shirts to the closet. What is the probability both shirts are white?
P(W1 and W2) = P(W1)P(W2|W1) = (9/12)(8/11) = 0.55
So the likelihood of selecting two shirts and finding them both to be white is .55. This can be extended to more
than two events.
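The golf-shirt calculation applies the general rule of multiplication when drawing without replacement. Each factor is the conditional probability that the next item drawn is of the target kind, given that all earlier draws were. This is an exact-fraction Python sketch. The helper name `prob_all_target` is illustrative.

```python
from fractions import Fraction


def prob_all_target(target, total, draws):
    """P(every draw is a target item), drawing without replacement."""
    p = Fraction(1)
    for i in range(draws):
        # P(next draw is a target | all previous draws were targets)
        p *= Fraction(target - i, total - i)
    return p


p = prob_all_target(9, 12, 2)   # two white shirts
print(p, round(float(p), 2))    # 6/11 0.55
```

The same helper covers the defective-battery question: `prob_all_target(7, 25, 2)` gives (7/25)(6/24) = 7/100.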
Contingency Tables
Contingency Table: A table used to classify sample observations according to two or more identifiable
categories or classes.
One hundred fifty adults were asked if they were older than
50 years of age and the number of Facebook accounts they
used. The following table summarizes the results.
Independence: age and watching movies would be independent if they were unrelated. The answer is that they are not
independent, because age affects the decision to watch a movie.
Tree Diagrams
Bayes’ Theorem
} Bayes’ Theorem is a method of revising a probability, given that additional information is obtained
Prior Probability: The initial probability based on the present level of information.
Posterior Probability: A revised probability based on additional information.
Suppose 5% of the population of Umen have a disease. A1 represents the part of the population that has the
disease and A2 represents those who do not. Let B denote a test result that shows the disease is present.
P(A1) = 0.05 Individual has the disease
P(A2) = 0.95 Individual does not have the disease
P(B|A1) = 0.90 Test shows the individual has the disease and is correct
P(B|A2) = 0.15 Test incorrectly shows the individual has the disease
Randomly select an individual and perform the test. The test results indicate the disease is present. What is
the probability the test is correct?
Use Bayes’ theorem to solve.
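Bayes' theorem for the Umen example gives P(A1|B) = P(A1)P(B|A1) / [P(A1)P(B|A1) + P(A2)P(B|A2)] = 0.045/0.1875 = 0.24. The Python sketch below handles any number of mutually exclusive, collectively exhaustive events. The helper name `posterior` is illustrative.

```python
def posterior(priors, likelihoods):
    """Revise prior probabilities P(Ai) given likelihoods P(B | Ai)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(Ai and B)
    total = sum(joints)                                     # P(B)
    return [j / total for j in joints]


post = posterior([0.05, 0.95], [0.90, 0.15])
print(round(post[0], 4))  # 0.24
```

Even though the test detects the disease 90% of the time, only about 24% of positive results come from people who actually have it. This is because the disease is rare and the false-positive rate is 15%.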
Multiplication Formula
} The multiplication formula states that if there are m ways of doing one thing and n ways of doing
another thing, then there are (m)(n) ways of doing both
} This can be extended to more than two events. For three events m, n, and x:
} Total number of arrangements = (m)(n)(x)
The Permutation Formula
Permutation: Any arrangement of r objects selected from a single group of n possible objects.
There are three electronic parts to be assembled, so n = 3. Because all three are to be inserted into the plug-in
component, r = 3.
$${}_nP_r = \frac{n!}{(n-r)!}$$
Note: 0! = 1! = 1
$${}_3P_3 = \frac{3!}{(3-3)!} = \frac{3 \times 2 \times 1}{0!} = 6$$
Label the parts A, B, and C -> ABC BAC CAB ACB BCA CBA
The Grand 16 movie theater uses teams of three employees to work the concession stand each evening.
There are seven employees available to work. How many different teams can be scheduled?
$${}_7C_3 = \frac{n!}{r!(n-r)!} = \frac{7!}{3!(7-3)!} = \frac{7!}{3!\,4!} = 35$$
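Python's standard library (3.8 or later) has `math.perm` and `math.comb` for these counts, and `itertools.permutations` lists the arrangements themselves. This short sketch checks both worked examples.

```python
import math
from itertools import permutations

# Permutations: order matters (3 parts into 3 slots)
print(math.perm(3, 3))                                # 6
print(["".join(p) for p in permutations("ABC")])      # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# Combinations: order does not matter (teams of 3 from 7 employees)
print(math.comb(7, 3))                                # 35
```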
The three events (reading Time, Newsweek, or U.S. News & World Report) are not mutually exclusive because
executives can read more than one of the magazines. Therefore,
P(Time or U.S. News) = P(Time) + P(U.S. News) − P(Time and U.S. News)
= 0.35 + 0.40 − 0.10 = 0.65.
} There are 25 AAA batteries in a box and 7 are defective. Two batteries are selected without
replacement. What is the probability of selecting a defective battery followed by another defective
battery?
A) 1/2 or 0.50 B) 1/8 or 0.13
C) 1/700 or about 0.0014 D) 7/100 or about 0.07
We use the general rule of multiplication to solve this problem because the selections are not independent.
The probability of a defective battery on the first selection is 7/25 = 0.28. The probability of selecting
a second defective battery is a conditional probability that assumes the first selection was defective,
so the probability of a second defective battery is 6/24. The joint probability is
(7/25)(6/24) = 42/600 = 0.07 (answer D).
} A developer of a new subdivision wants to build homes that are all different. There are three different
interior plans that can be combined with any of five different home exteriors. How many different
homes can be built?
A) 8 B) 10
C) 15 D) 30
} The ABCD football association is considering a Super Ten Football Conference. The top 10 football
teams in the country, based on past records, would be members of the Super Ten Conference. Each
team would play every other team in the conference during the season and the team winning the most
games would be declared the national champion. How many games would the conference
commissioner have to schedule each year? (Remember, Oklahoma versus Michigan is the same as
Michigan versus Oklahoma.)
A) 45 B) 50
C) 125 D) 14
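The two counting questions above work out as follows. By the multiplication formula, the number of homes is 3 × 5 = 15. Each conference game is an unordered pair of teams, so the number of games is a combination, ₁₀C₂ = 45. A short Python check:

```python
import math

homes = 3 * 5              # interior plans × exterior designs (multiplication formula)
games = math.comb(10, 2)   # unordered pairs of teams: Oklahoma vs. Michigan = Michigan vs. Oklahoma
print(homes, games)        # 15 45
```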
} Alonzo, Bob, and Casper work bussing tables at a restaurant. Alonzo has a 40% chance, Bob has a
25% chance, and Casper has a 35% chance of bussing tables in the middle area of the restaurant. If
Alonzo is bussing tables, he has a 5% chance of breaking a dish. If Bob is bussing tables, he has a
2% chance of breaking a dish. Finally, if Casper is bussing tables, he has a 4% chance of breaking a
dish. If there is a broken dish in the middle of the restaurant, what is the probability it was broken by
Bob?
A) 0.014 B) 0.128
C) 0.359 D) 0.513
P(Bob | Break) = (0.25)(0.02) / [(0.40)(0.05) + (0.25)(0.02) + (0.35)(0.04)] = 0.005/0.039 = 0.128 (answer B)
C6 DISCRETE PROBABILITY DISTRIBUTIONS
Probability Distribution: a listing of all the outcomes of an experiment and the probability associated with
each outcome.
Random Variables
Random Variable: quantity resulting from an experiment that, by chance, can assume different values.
Examples
} The number of employees absent from the day shift on Monday: the number might be 0, 1, 2, 3, …
The number absent is the random variable
} The grade level (Freshman, Sophomore, Junior, or Senior) of the members of the St. James High
School Varsity girls’ basketball team. Grade level is the random variable (and notice that it is a
qualitative variable).
Two Types of Random Variables
1. Discrete Random Variable:
A random variable that can assume only certain clearly separated values. (usually the result of counting)
Example: Tossing a coin three times and counting the number of heads
For example, the Bank of the Carolinas counts the number of credit cards carried by a group of
customers. The number of cards carried is the discrete random variable.
} The mean is a typical value used to represent the central location of the data
} The mean is also referred to as the expected value
} The amount of spread (or variation) in the data is described by the variance
The standard deviation of the probability distribution is the positive square root of the variance
2. How many cars does John expect to sell on a typical Saturday? We find John can expect to sell 2.1
cars on a typical Saturday
In other words, over the long run, say 50 Saturdays in a year,
he can expect to sell (50)(2.1) = 105 cars.
Take the (positive) square root of the variance to get the standard deviation.
The mean is 2.1, the variance is 1.290, and the standard deviation is 1.136 cars.
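The mean (expected value), variance, and standard deviation of a discrete probability distribution follow μ = Σ x P(x), σ² = Σ (x − μ)² P(x), and σ = √σ². John's probability table is not reproduced in these notes, so the sketch below uses a hypothetical distribution. The helper name `discrete_summary` is illustrative.

```python
import math


def discrete_summary(xs, probs):
    """Mean, variance, and standard deviation of a discrete probability distribution."""
    mu = sum(x * p for x, p in zip(xs, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(xs, probs))
    return mu, var, math.sqrt(var)


# Hypothetical distribution: P(0) = .25, P(1) = .50, P(2) = .25
mu, var, sd = discrete_summary([0, 1, 2], [0.25, 0.50, 0.25])
print(mu, var, round(sd, 3))  # 1.0 0.5 0.707
```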
Binomial Distribution
There are four requirements of a binomial probability distribution
1. There are only two possible outcomes and the outcomes are mutually exclusive, as either a success or a
failure
2. The number of trials is fixed and known
3. The probability of a success is the same for each trial
4. Each trial is independent of any other trial
The binomial distribution is a widely occurring discrete probability distribution.
} Example: A young family has two children, both boys. The probability of the third birth being a boy
is still .50. The gender of the third child is independent of the gender of the other two.
} The probability of winning the lottery or not
} The probability of hitting a red light on your way to work or not
} The probability of developing a side effect from a certain drug or not
} The probability she says “Yes” or not
4. The trials are independent, meaning that the outcome of one trial does not affect the outcome of any
other trial.
Note: Do not confuse the symbol π (the probability of a success) with the mathematical constant 3.1416.
Recently, www.creditcards.com reported that 28% of purchases at coffee shops were made with a debit
card. For 10 randomly selected purchases at the Starbucks on the corner of 12th Street and Main:
a. What is the probability that no purchases were made with a debit card?
b. What is the probability that exactly one was made with a debit card?
Using $P(x) = {}_nC_x\,\pi^x(1-\pi)^{n-x}$, the probability that exactly one of the 10 purchases is made with a debit
card is 0.1456, or 14.56 percent.
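The binomial formula translates directly into Python with `math.comb`. The sketch answers both parts of the debit-card question. It also computes the mean and variance using the standard shortcuts μ = nπ and σ² = nπ(1 − π). These are assumed to be the shortcut formulas meant below, since they did not survive in these notes.

```python
import math


def binomial_pmf(x, n, pi):
    """P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 10, 0.28
print(round(binomial_pmf(0, n, pi), 4))  # 0.0374
print(round(binomial_pmf(1, n, pi), 4))  # 0.1456

mean = n * pi                 # shortcut: n * pi
variance = n * pi * (1 - pi)  # shortcut: n * pi * (1 - pi)
```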
Shortcut Formulas
Binomial Probability Tables
In the Southwest, 5% of all cell phone calls are dropped. What is the probability that out of six randomly
selected calls, none was dropped? Exactly one?
And the probability of selecting 12 cars and finding that the occupants of 7 or more vehicles were
wearing seat belts is .9562
Hypergeometric Distribution
} When sampling from relatively small populations without replacement, use the hypergeometric
distribution
Example Hypergeometric Distribution:
Play Time Toys Inc. employs 50 people in the Assembly Dept. 40 of the employees belong to a union and
10 do not. Five employees are selected at random to form a committee. What is the probability that four of
the five belong to a union?
$$P(x) = \frac{({}_SC_x)({}_{N-S}C_{n-x})}{{}_NC_n} \qquad P(4) = \frac{({}_{40}C_4)({}_{50-40}C_{5-4})}{{}_{50}C_5} = \frac{({}_{40}C_4)({}_{10}C_1)}{{}_{50}C_5} = .431$$
Thus, the probability of selecting 5 assembly workers at random from the 50 workers and finding 4 of the
5 are union members is 0.431.
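The hypergeometric formula can be checked in Python with `math.comb`. Here N is the population size, S is the number of successes in the population, n is the sample size, and x is the number of successes in the sample. The helper name is illustrative.

```python
import math


def hypergeometric_pmf(x, N, S, n):
    """P(x) = (S C x)(N - S C n - x) / (N C n), sampling without replacement."""
    return math.comb(S, x) * math.comb(N - S, n - x) / math.comb(N, n)


# Play Time Toys: 50 employees, 40 union members, committee of 5
print(round(hypergeometric_pmf(4, N=50, S=40, n=5), 3))  # 0.431
```

The same function gives the committee problem later in this chapter: `hypergeometric_pmf(2, N=8, S=3, n=3)` = 15/56, or about 0.268.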
Poisson Distribution
Poisson Probability Distribution Tables
NewYork-LA Trucking company finds the mean number of breakdowns on the New York to Los Angeles
route is 0.30. From the table, we can locate the probability of no breakdowns on a particular run. Find the
column 0.3, then read down that column to the row labeled 0; the value is .7408. The probability of 1
breakdown is .2222
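Instead of reading the Poisson table, the probabilities can be computed from the Poisson formula P(x) = μ^x e^(−μ) / x!. The formula itself is not shown in these notes, so treat this as a sketch of the standard definition.

```python
import math


def poisson_pmf(x, mu):
    """P(x) = mu**x * e**(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)


# NewYork-LA Trucking: mean of 0.30 breakdowns per run
print(round(poisson_pmf(0, 0.3), 4))  # 0.7408
print(round(poisson_pmf(1, 0.3), 4))  # 0.2222
```

With a mean of 5, the same function gives `poisson_pmf(5, 5)` of about 0.1755, matching the e-mail question later in this chapter.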
A total of 60% of the customers of a fast-food chain order a hamburger, French fries, and a drink. If a
random sample of 15 cash register receipts is selected, what is the probability that 10 or more will show
that the above three food items were ordered?
A) 1.000 B) 0.186
C) 0.403 D) 0.000
Applying the binomial distribution, go to the binomial probability table, find the case where the number of
trials is n = 15, and the probability of success is π = 0.60. Find the row where x, the number of successes, is
10. Finally, add the probabilities for 10 through 15 successes
(0.186 + 0.127 + 0.063 + 0.022 + 0.005 + 0.000). The result is 0.403.
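Summing the binomial table entries for x = 10 through 15 is the same as computing P(x ≥ 10) directly. This Python sketch reproduces the 0.403 answer. The helper name is illustrative.

```python
import math


def binomial_at_least(k, n, pi):
    """P(x >= k) for a binomial distribution with n trials and success probability pi."""
    return sum(math.comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(k, n + 1))


print(round(binomial_at_least(10, 15, 0.60), 3))  # 0.403
```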
A management professor receives an average of five e-mail messages per day from students. Assume the
number of messages approximates a Poisson distribution. What is the probability that on a randomly
selected day she will have five messages?
A) 0.0067 B) 0.8750
C) 0.1755 D) 1.0000
Applying the Poisson probability distribution, the mean of the distribution is 5. Referring to the Poisson
probability tables or using the Poisson probability formula for x = 5 and a mean of 5, the probability of five
messages is 0.1755.
A committee of three people needs to be chosen. There are three men and five women available to serve on
the committee. If the committee members are randomly chosen, what is the probability that two of the three
people chosen on the committee are men?
A) 0.667
B) 0.536
C) 0.268
D) 0.376
The hypergeometric distribution applies here as selection is without replacement from a finite population.
Here, N = 8 (three men, five women), n = 3 (the committee size), S = 3 (the number of men in the population),
and x = 2 (the number of men selected for the committee). Using formula 6-6, P(x=2) = [(3C2) × (5C1)] /(8C3)
= 15/56 = 0.268.
Uniform Distribution
} It is rectangular in shape: the height of the distribution
is constant, or uniform, for all values between a and b.
} The mean and the median are equal
} It is completely described by its minimum value a and
its maximum value b
Uniform Distribution Example
Southwest Arizona State University provides bus service to students while they are on campus. A bus
arrives at the North Main Street and College Drive stop every 30 minutes between 6 a.m. and 11 p.m.
during weekdays. Students arrive at the bus stop at random times. The time that a student waits is
uniformly distributed from 0 to 30 minutes.
The minimum wait time is 0 minutes and the maximum wait time is 30 minutes, so the range of the
distribution is 30 minutes. The height is
1/(b-a)= 1/30 = .0333
} The time to fly between New York City and Chicago is uniformly distributed with a minimum of 120
minutes and a maximum of 150 minutes. What is the probability that a flight is between 125 and 140
minutes?
A) 1.00 B) 0.50 C) 0.33 D) 0.67
The probability is computed as the area under the curve. For a uniform distribution, it is the area of a
defined rectangle. In this case, the base is (140 − 125) and the height is 1/(150 − 120), so the probability is
(140 − 125) × 1/(150 − 120) = 15/30 = 0.5.
Similarly, for a flight between 140 and 150 minutes, the base is (150 − 140) and the height is 1/(150 − 120),
so the probability is (150 − 140) × 1/(150 − 120) = 10/30 = 0.333.
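For a uniform distribution on [a, b], the height is 1/(b − a), the mean is (a + b)/2, and probabilities are rectangle areas. The sketch also uses the standard deviation √((b − a)²/12), which matches the 1.1547 given for the interval 6 to 10 in the practice problems. The helper names are illustrative.

```python
import math


def uniform_summary(a, b):
    """Height, mean, and standard deviation of a uniform distribution on [a, b]."""
    return 1 / (b - a), (a + b) / 2, math.sqrt((b - a) ** 2 / 12)


def uniform_prob(lo, hi, a, b):
    """P(lo < x < hi): base times height, with the interval clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0) / (b - a)


print(uniform_prob(125, 140, 120, 150))            # 0.5
print(round(uniform_prob(140, 150, 120, 150), 3))  # 0.333
print(round(uniform_summary(0, 30)[0], 4))         # 0.0333  (bus wait height)
```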
The normal distribution has a complex formula for finding probabilities, but we will not need to use it!
Instead, use the table given in Appendix B.3.
Family of Normal Probability Distributions
Any normal probability distribution can be converted into a standard normal probability distribution
by subtracting the mean from each observation and dividing this difference by the standard deviation.
The results are called z values or z scores
Once the normally distributed observations are standardized, the z values are normally distributed
with a mean of 0 and a standard deviation of 1
Areas Under the Normal Curve
} Here is a portion of the “z” Table
} For example, if you have a z=1.50, this reflects an area (or probability) of .4332.
The entire table can be found in Appendix B.3.
Standard Normal Probability Example
Rideshare services are available internationally. A customer uses a smartphone app to request a ride.
Then, a driver receives the request, picks up the customer, and takes the customer to the desired
location. No cash is involved; the payment for the transaction is handled digitally.
Suppose the weekly income of rideshare drivers follows the normal probability distribution with a
mean of $1,000 and a standard deviation of $100. What is the z value of income for a driver who
earns $1,100 per week? For a driver who earns $900 per week?
For $1,100: z = (x − μ)/σ = ($1,100 − $1,000)/$100 = 1.00. For $900: z = ($900 − $1,000)/$100 = −1.00.
Regardless of whether z is +1 or −1, the area under the curve between the mean and z is .3413.
A z of 1.00 indicates that a weekly income of $1,100 is one standard deviation above the mean and
a z of -1.00 shows that a $900 weekly income is one standard deviation below the mean. Both
incomes are the same distance from the mean.
The Empirical Rule Example
As part of its quality assurance program, the Autolite Battery Company conducts tests on battery life. For a
particular D-cell alkaline battery, the mean life is 19 hours. The useful life of the battery follows a normal
distribution with a standard deviation of 1.2 hours.
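The question for this example did not survive in these notes. Applying the Empirical Rule as stated earlier, μ ± 1σ, μ ± 2σ, and μ ± 3σ bracket about 68%, about 95%, and practically all (99.7%) of battery lives. The sketch below computes those intervals for μ = 19 hours and σ = 1.2 hours. The helper name is illustrative.

```python
def empirical_rule(mu, sigma):
    """Intervals mu ± k*sigma for k = 1, 2, 3 (about 68%, 95%, and 99.7% of observations)."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}


for k, (lo, hi) in empirical_rule(19, 1.2).items():
    print(f"within {k} sd: {lo:.1f} to {hi:.1f} hours")
```

This gives about 68% of batteries between 17.8 and 20.2 hours, about 95% between 16.6 and 21.4 hours, and practically all between 15.4 and 22.6 hours.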
Using the weekly incomes of Uber drivers:
P($1,000 < weekly income < $1,100) = 0.3413
P(weekly income < $1,100) = 0.3413 + 0.5000 = 0.8413
Therefore, 34.13% of drivers earn between $1,000 and $1,100, and 84.13% of drivers earn less than $1,100.
Using the weekly incomes of Uber drivers:
P($790 < weekly income < $1,000) = 0.4821
P(weekly income < $790) = 0.5000 − 0.4821 = 0.0179
Therefore, 48.21% of drivers earn between $790 and $1,000, and 1.79% of drivers earn less than $790.
What is the z value of income for a driver who earns $1,250? For a driver who earns $1,150?
$$z = \frac{x-\mu}{\sigma} = \frac{\$1{,}250 - \$1{,}000}{\$100} = 2.50 \qquad z = \frac{x-\mu}{\sigma} = \frac{\$1{,}150 - \$1{,}000}{\$100} = 1.50$$
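Python's standard `statistics.NormalDist` (3.8 or later) can stand in for the z table. Its `cdf` method gives the area to the left of a value, so the "area between the mean and z" from the table is `cdf(x) - 0.5` for x above the mean. The helper name `z_score` is illustrative.

```python
from statistics import NormalDist

income = NormalDist(mu=1_000, sigma=100)   # weekly rideshare income


def z_score(x, dist):
    """z = (x - mu) / sigma."""
    return (x - dist.mean) / dist.stdev


print(z_score(1_250, income), z_score(1_150, income))  # 2.5 1.5

between_1000_1100 = income.cdf(1_100) - income.cdf(1_000)  # about 0.3413
below_1100 = income.cdf(1_100)                             # about 0.8413
below_790 = income.cdf(790)                                # about 0.0179
```

The exact values differ from the four-digit table only in later decimal places.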
Finding a Value for x Using z
Layton Tire and Rubber Company wishes to set a minimum mileage guarantee on its new MX100 tire. Tests
reveal the mean mileage is 67,900 with a standard deviation of 2,050 miles and that the distribution follows
the normal distribution. Let x represent the minimum guaranteed mileage and use the formula for z to solve
so that no more than 4% of tires need to be replaced.
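Solving z = (x − μ)/σ for x gives x = μ + zσ. The guarantee sits where 4% of the area lies to the left, which is z ≈ −1.75 from the table, so x ≈ 67,900 − 1.75(2,050) = 64,312.5 miles. The sketch below uses `NormalDist.inv_cdf` from Python's standard library. Because it uses the exact z value rather than the rounded table value, its result differs slightly.

```python
from statistics import NormalDist

mileage = NormalDist(mu=67_900, sigma=2_050)
guarantee = mileage.inv_cdf(0.04)   # mileage below which only 4% of tires fall
print(round(guarantee))
```

The same idea answers the test-score cutoff question at the end of this section, using `NormalDist(500, 50).inv_cdf(0.94)`, which comes out near 578.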
Poisson vs Exponential Distribution
} To explain the relationship between the Poisson and the exponential distributions, suppose customers
arrive at a family restaurant during the dinner hour at a rate of six per hour
} The Poisson distribution would have a mean of 6. For a time interval of 1 hour, we can use the Poisson
distribution to find the probability that one, or two, or ten customers arrive.
} But suppose instead of studying the number of customers arriving in an hour, we wish to study the time
between their arrivals
} The time between arrivals is a continuous distribution because time is measured as a continuous random
variable.
b. Find the probability the next order arrives in more than 40 seconds.
CHAPTER 7 PRACTICE PROBLEMS
A uniform distribution is defined over the interval from 6 to 10.
a. What are the values for a and b?
a = 6, b = 10
b. What is the mean of this uniform distribution?
(6 + 10)/2 = 8
c. What is the standard deviation?
\( \sqrt{(10 - 6)^2/12} = 1.1547 \)
d. Show that the probability of any value between 6 and 10 is equal to 1.0.
The area is a rectangle, so height × base = [1/(10 – 6)](10 – 6) = 1.
e. What is the probability that the random variable is more than 7?
[1/(10 – 6)](10 – 7) = 0.75
f. What is the probability that the random variable is between 7 and 9?
[1/(10 – 6)](9 – 7) = 0.5
g. What is the probability that the random variable is equal to 7.91?
P(x = 7.91) = 0. For a continuous probability distribution, the area for a point value is zero. (LO7-1)
The mean of a normal probability distribution is 500; the standard deviation is 10.
a. About 68% of the observations lie between what two values?
490 and 510, found by 500 ± 1(10)
b. About 95% of the observations lie between what two values?
480 and 520, found by 500 ± 2(10)
c. Practically all of the observations lie between what two values?
470 and 530, found by 500 ± 3(10)
The weekly mean income of a group of executives is $1,000 and the standard deviation of this group is $100.
The distribution is normal. What percent of the executives have an income of $925 or less?
A) 27.34% B) 77.34% C) 7.5% D) 22.66%
\( z = \frac{925 - 1000}{100} = -0.75 \). The area between z = −0.75 and the mean is 0.2734, so the area below
$925 is 0.5 − 0.2734 = 0.2266, or 22.66% (answer D).
Waiting times to receive food after placing an order at the local Subway sandwich shop follow an exponential
distribution with a mean of 60 seconds. Calculate the probability a customer waits:
a. Less than 30 seconds: \( 1 - e^{(-1/60)(30)} = 0.3935 \)
b. More than 120 seconds: \( e^{(-1/60)(120)} = 0.1353 \)
c. Between 45 and 75 seconds: \( e^{(-1/60)(45)} - e^{(-1/60)(75)} = 0.1859 \)
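A quick cross-check of the Chapter 7 answers, sketched with Python's standard library (`math` and `statistics.NormalDist`) instead of the printed tables; the variable names are illustrative. Small differences in the fourth decimal versus the table values would come from table rounding.

```python
import math
from statistics import NormalDist

# Uniform distribution on [a, b] = [6, 10]
a, b = 6, 10
height = 1 / (b - a)                       # height of the rectangle
uniform_mean = (a + b) / 2                 # 8.0
uniform_sd = math.sqrt((b - a) ** 2 / 12)  # about 1.1547
p_more_than_7 = height * (b - 7)           # 0.75
p_7_to_9 = height * (9 - 7)                # 0.5

# Executive incomes: normal with mean $1,000 and standard deviation $100
p_925_or_less = NormalDist(mu=1_000, sigma=100).cdf(925)  # about 0.2266

# Subway waiting times: exponential with mean 60 s, so P(T > t) = exp(-t / 60)
def p_wait_more_than(t: float, mean: float = 60) -> float:
    return math.exp(-t / mean)

p_less_than_30 = 1 - p_wait_more_than(30)                 # about 0.3935
p_more_than_120 = p_wait_more_than(120)                   # about 0.1353
p_45_to_75 = p_wait_more_than(45) - p_wait_more_than(75)  # about 0.1859

results = [("uniform mean", uniform_mean), ("uniform sd", uniform_sd),
           ("P(X > 7)", p_more_than_7), ("P(7 < X < 9)", p_7_to_9),
           ("P(income <= 925)", p_925_or_less), ("P(wait < 30)", p_less_than_30),
           ("P(wait > 120)", p_more_than_120), ("P(45 < wait < 75)", p_45_to_75)]
for name, value in results:
    print(f"{name}: {value:.4f}")
```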
The Internal Revenue Service reported the average refund in 2017 was $2,878 with a standard deviation of
$520. Assume the amount refunded is normally distributed.
A large manufacturing firm tests job applicants. Test scores are normally distributed with a mean of 500 and
a standard deviation of 50. Management is considering placing a new hire in an upper-level management
position if the person scores in the upper 6% of the distribution. What is the lowest score a new hire must
earn to qualify for a responsible position?
A) 50 B) 625 C) 460 D) 578
Recall that the area under the normal curve to the right of the mean is 0.5000.
The area between the mean and the desired "cutoff" score is 0.5000 − 0.0600 = 0.4400.
Now refer to the "areas under the normal curve" table.
Search the body of the table for the area closest to 0.4400. The closest area is 0.4406.
Move to the margins from this value and read the z-value of 1.56.
Finally, the lowest score a new hire must earn to qualify is
x = μ + zσ = 500 + 1.56(50) = 578 (answer D).
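The table lookup above can be sketched with `statistics.NormalDist.inv_cdf`, which returns the z-value that leaves 94% of the area below it (so 6% above). The exact z is about 1.555 rather than the table's 1.56, and the cutoff still rounds to 578.

```python
from statistics import NormalDist

mu, sigma = 500, 50
upper_share = 0.06  # qualify only if in the top 6% of scores

z = NormalDist().inv_cdf(1 - upper_share)  # about 1.555
cutoff = mu + z * sigma                    # about 577.7
print(f"z = {z:.3f}, lowest qualifying score = {cutoff:.0f}")
```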
2. Systematic Random Sample: A random starting point is selected, and then every kth member of the
population is selected.
} Example: Stood’s Grocery Store wants to study the length of time customers spend in the store.
Randomly select the days of the week, the times, and the starting point of the study; then systematically
select the customers and measure the time each spends in the store.
Caution: If the population is in some order already, like invoices arranged in increasing dollar amounts, the
systematic procedure should not be used.
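A minimal sketch of systematic sampling, assuming a hypothetical list of 100 customers in arrival order and k = 10: pick a random start among the first k members, then take every kth member after it. The function name `systematic_sample` is illustrative, not from the source.

```python
import random

def systematic_sample(population: list, k: int, rng: random.Random) -> list:
    """Random start among the first k members, then every kth member."""
    if k < 1:
        raise ValueError("k must be at least 1")
    start = rng.randrange(k)  # random starting position 0..k-1
    return population[start::k]

# Hypothetical: 100 customers numbered in the order they enter the store
customers = list(range(1, 101))
sample = systematic_sample(customers, k=10, rng=random.Random(42))
print(sample)
```

If the list were already sorted on the variable being studied (as the caution above notes for invoices), every kth element would inherit that order, which is why the procedure should not be used then.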
3. Stratified Random Sample: A population is divided into subgroups, called strata, and a sample is randomly
selected from each stratum.
} Example: A study of the advertising expenditures of 50 of the 352 largest
US firms. Begin by identifying the strata, then
randomly sample within each stratum, in
proportion to its relative frequency, to
collect the sample.
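A sketch of stratified sampling with proportional allocation. The four strata and their sizes (summing to 352 firms) are hypothetical, since the notes do not list them; leftover slots after rounding down go to the strata with the largest fractional remainders, which is one common rounding choice among several.

```python
import random
from itertools import islice

def proportional_allocation(sizes: dict, n: int) -> dict:
    """Split n sample slots across strata in proportion to stratum size."""
    total = sum(sizes.values())
    exact = {name: n * size / total for name, size in sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}
    # Hand leftover slots to the strata with the largest fractional parts
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc

def stratified_sample(strata: dict, n: int, rng: random.Random) -> dict:
    """Simple random sample within each stratum, sized proportionally."""
    alloc = proportional_allocation({name: len(m) for name, m in strata.items()}, n)
    return {name: rng.sample(members, alloc[name]) for name, members in strata.items()}

# Hypothetical strata sizes for the 352 firms (not from the source)
sizes = {"Tier A": 40, "Tier B": 90, "Tier C": 150, "Tier D": 72}
firm_ids = iter(range(1, 353))
strata = {name: list(islice(firm_ids, size)) for name, size in sizes.items()}

sample = stratified_sample(strata, n=50, rng=random.Random(1))
print({name: len(chosen) for name, chosen in sample.items()})
```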
4. Cluster Sampling: A population is divided into clusters using naturally occurring geographic or other
boundaries. Then clusters are randomly selected and a sample is collected by randomly selecting from each
cluster.
} Example
Suppose we wish to survey residents of the 12 counties in the greater Chicago area
about government policy. Randomly select 3 counties and then select a random
sample of the residents in each of the 3 counties.
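A two-stage sketch of the cluster example. The county names, the 500 residents per county, and the 20 residents drawn per chosen county are all hypothetical placeholders; only "12 counties, choose 3" comes from the notes.

```python
import random

# Hypothetical frame: 12 counties, each with a list of resident IDs
counties = {f"County {i}": [f"C{i}-R{j}" for j in range(1, 501)] for i in range(1, 13)}

rng = random.Random(7)
# Stage 1: randomly choose 3 of the 12 counties (clusters)
chosen = rng.sample(sorted(counties), 3)
# Stage 2: simple random sample of residents inside each chosen county
sample = {county: rng.sample(counties[county], 20) for county in chosen}

for county, residents in sample.items():
    print(county, residents[:3], "...")
```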
SAMPLING ERROR: The difference between a sample statistic and its corresponding population parameter.
} Sampling error: \( \bar{x} - \mu \)
Example: The Foxtrot Inn’s number of rooms rented in June. The population mean number of rooms rented, μ, is 3.13.
2. What is the sampling distribution of the sample mean for samples of size 2?
4. What observations can be made about the population and the sampling distribution?
} The mean of the distribution of the sample mean ($15.43) is equal to the mean of the population:
\( \mu_{\bar{x}} = \mu \). (The sample means range from $14 to $17, while the population values range from $14 to $18.)
} The spread in the distribution of the sample mean is less than the spread in the population values.
} The shapes of the population distribution and the sampling distribution of the sample mean are different.
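The observations above can be checked by brute force: list every possible sample of size 2, average each one, and compare. The example's own population values are not in these notes, so the six wages below are hypothetical; the equality of the means holds for any population when all samples of the same size are enumerated.

```python
from itertools import combinations
from statistics import fmean, pstdev

# Hypothetical population of hourly wages (not the values from the example)
population = [14, 14, 15, 16, 17, 18]

# Every possible sample of size 2, drawn without replacement
samples = list(combinations(population, 2))
sample_means = [fmean(s) for s in samples]

print(f"{len(samples)} possible samples")
print(f"population mean = {fmean(population):.4f}, "
      f"mean of sample means = {fmean(sample_means):.4f}")
print(f"population sd = {pstdev(population):.4f}, "
      f"sd of sample means = {pstdev(sample_means):.4f}")
```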
The mean of the distribution of sample means will be exactly equal to the population mean if we select all
possible samples of the same size from the population: \( \mu_{\bar{x}} = \mu \).
The standard deviation of the sampling distribution of the sample mean is also called the standard error of
the mean.
Normal Distribution
} If the population follows a normal distribution, the sampling distribution of the sample mean will also
follow the normal distribution for samples of any size
} If the population is not normally distributed, the sampling distribution of the sample mean will be
approximately normal when the sample size is at least 30 (the central limit theorem)
} Assume the population standard deviation is known.
} To determine the probability that a sample mean falls in a particular region, use the following formula:
\( z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{31.38 - 31.20}{0.4/\sqrt{16}} = 1.80 \)
We conclude that it is unlikely; there is less than a 4% chance. The process is putting too much soda in the
bottles.
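A minimal sketch of the calculation behind "less than a 4% chance", assuming the soda-bottle example's values μ = 31.20 ounces, σ = 0.4 ounces, n = 16 and x̄ = 31.38 ounces (the figures that produce z = 1.80). The helper name `z_for_sample_mean` is illustrative.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(x_bar: float, mu: float, sigma: float, n: int) -> float:
    """z = (x_bar - mu) / (sigma / sqrt(n)) for the sampling distribution of the mean."""
    return (x_bar - mu) / (sigma / math.sqrt(n))

z = z_for_sample_mean(x_bar=31.38, mu=31.20, sigma=0.4, n=16)
p_upper = 1 - NormalDist().cdf(z)  # P(sample mean >= 31.38)
print(f"z = {z:.2f}, P = {p_upper:.4f}")
```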
• What is the likelihood the sample mean is greater than 320 minutes?
\( z = \frac{320 - 330}{80/\sqrt{40}} = -0.79 \); the area between 320 and 330 minutes is 0.2852.
Total likelihood = 0.5 + 0.2852 = 0.7852
The likelihood that the sample mean is greater than 320 minutes is 78.52 percent.
• What is the likelihood the sample mean is between 320 and 350 minutes?
Probability is 0.7281, found by 0.2852 + 0.4429.
• What is the likelihood the sample mean is greater than 350 minutes?
0.0571, found by 0.5000 - 0.4429.
• What is the probability that the sampling error would be more than 20 minutes?
A sampling error of more than 20 minutes corresponds to times of less than 310 or more than 350 minutes:
\( z = \frac{310 - 330}{80/\sqrt{40}} = -1.58 \); \( z = \frac{350 - 330}{80/\sqrt{40}} = 1.58 \)
Subtracting: 0.5 − 0.4429 = 0.0571 in each tail.
Multiplying by 2, the final probability is 0.1142.
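The four answers can be cross-checked with a sketch that models the sampling distribution of x̄ directly, assuming μ = 330 minutes, σ = 80 minutes and n = 40 (the values that reproduce z = −0.79 and ±1.58 above). Because `NormalDist` does not round z to two decimals, results differ from the table answers in the fourth decimal place.

```python
import math
from statistics import NormalDist

mu, sigma, n = 330, 80, 40
# Sampling distribution of the sample mean: normal, mean mu, sd sigma / sqrt(n)
x_bar_dist = NormalDist(mu, sigma / math.sqrt(n))

p_more_than_320 = 1 - x_bar_dist.cdf(320)                         # table: 0.7852
p_320_to_350 = x_bar_dist.cdf(350) - x_bar_dist.cdf(320)          # table: 0.7281
p_more_than_350 = 1 - x_bar_dist.cdf(350)                         # table: 0.0571
p_error_over_20 = x_bar_dist.cdf(310) + (1 - x_bar_dist.cdf(350))  # table: 0.1142

for label, p in [("P(x-bar > 320)", p_more_than_320),
                 ("P(320 < x-bar < 350)", p_320_to_350),
                 ("P(x-bar > 350)", p_more_than_350),
                 ("P(|x-bar - mu| > 20)", p_error_over_20)]:
    print(f"{label}: {p:.4f}")
```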
Confidence Interval: A range of values constructed from sample data so that the population parameter is
likely to occur within that range at a specified probability. The specified probability is called the level of
confidence.
The factors that determine the width of a confidence interval for a mean are:
} The number of observations in the sample, n
} The variability in the population, usually estimated by the sample standard deviation, s
} The desired level of confidence
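Confidence interval for a population mean (σ known): \( \bar{x} \pm z\frac{\sigma}{\sqrt{n}} \), where
x̄ - the sample mean
z - the z-value for a particular confidence level
σ - the population standard deviation
n - the number of observations in the sample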
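A minimal sketch of the interval x̄ ± z(σ/√n), with z taken from `statistics.NormalDist.inv_cdf` rather than the table. The inputs (n = 49, x̄ = 45, σ = 7) are hypothetical and chosen so that σ/√n = 1.

```python
import math
from statistics import NormalDist

def z_interval(x_bar: float, sigma: float, n: int, confidence: float = 0.95) -> tuple:
    """Confidence interval x_bar +/- z * sigma / sqrt(n) when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: 49 observations, sample mean 45, known sigma 7
low, high = z_interval(x_bar=45.0, sigma=7.0, n=49)
print(f"95% CI: {low:.2f} to {high:.2f}")
```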
Finding a Value of z
The method for finding z for a 95% confidence interval: divide 0.95 in half to get 0.4750, find the area closest
to 0.4750 in the body of the "areas under the normal curve" table, and read the z-value of 1.96 from the margins.
Interpretation: about 95% of all confidence intervals computed from random samples selected from a population
will contain the population mean. To illustrate, suppose we select many samples of 49 store managers,
perhaps several hundred, and compute a confidence interval from each. We could expect about 95% of these
confidence intervals to contain the population mean and about 5% not to. This is due to sampling error and is
the risk we assume when we select the level of confidence.
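The "about 95% of intervals" interpretation can be checked by simulation. This sketch assumes a hypothetical normal population (μ = 100, σ = 15) and samples of 49; it builds 2,000 intervals and counts how many contain the true mean. The exact percentage varies with the random seed but should sit near 95%.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(2024)
true_mu, sigma, n = 100.0, 15.0, 49  # hypothetical population and sample size
z = NormalDist().inv_cdf(0.975)      # about 1.96 for 95% confidence
margin = z * sigma / math.sqrt(n)

trials = 2_000
hits = 0
for _ in range(trials):
    sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
    x_bar = fmean(sample)
    # Does this sample's interval capture the true population mean?
    if x_bar - margin <= true_mu <= x_bar + margin:
        hits += 1

coverage = hits / trials
print(f"{coverage:.1%} of {trials} intervals contain the population mean")
```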
Finding a Value of t
} First assume the population is normal
} Using Appendix B.5, move across the columns identified for confidence intervals
} In the next example, we want to use the 95% level of confidence, so move to that column
} Then find the row for the degrees of freedom, df = n − 1 (the sample size minus 1)
Thus, 1 degree of freedom is lost in a
sampling problem involving the standard
deviation of the sample because one
number (the arithmetic mean) is known.
For a 95% level of confidence and 9 degrees
of freedom, we select the row with 9
degrees of freedom. The value of t is 2.262.
The endpoints of the confidence interval are 0.256 and 0.384. The margin of error is 0.064. The manufacturer
can be reasonably sure (95% confident) that the mean remaining tread depth is between 0.256 and 0.384
inch. Because the value 0.30 is in this interval, it is possible that the mean of the population is 0.30.
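A sketch of the tread-depth interval x̄ ± t(s/√n). The notes give t = 2.262 with 9 degrees of freedom (so n = 10) and the endpoints 0.256 and 0.384 (so x̄ = 0.32, their midpoint); the sample standard deviation s = 0.09 inch is an assumption, back-solved from the 0.064 margin of error. Python's standard library has no t distribution, so the table value is hard-coded.

```python
import math

x_bar = 0.32  # midpoint of 0.256 and 0.384
s = 0.09      # assumed sample sd, back-solved from the 0.064 margin
n = 10        # df = 9, so n = 10
t = 2.262     # t for 95% confidence and 9 df, from the t table

margin = t * s / math.sqrt(n)
print(f"margin = {margin:.3f}, interval = {x_bar - margin:.3f} to {x_bar + margin:.3f}")
```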
How do we interpret this result? If we repeated this study 200 times, calculating the 95% confidence interval
with each sample’s mean and standard deviation, we would expect about 190 of the intervals to include the
population mean and about 10 not to. This is the effect of sampling
error.
What is the interpretation of a 96% confidence level?
A. Approximately 96 out of 100 such intervals would include the true value of the population
parameter
B. There's a 4% chance that the given interval does not include the true value of the population
parameter
C. The interval contains 96% of all sample means.
Answer: A. If 100 samples were collected from the same population, a sample mean was calculated from each,
and each sample was used to construct a confidence interval, about 96 of the 100 confidence intervals (96%)
are expected to include the population mean. About 4 of the 100 confidence intervals (4%) are not expected to
include the population mean.
Examples
} Southern Tech career services reports that 80% of its graduates enter the job market in a position
related to their field of study.
} A recent study of married men between the ages of 35 and 50 found that 63% felt that both partners
should earn a living.
Confidence Interval, π Example
The union representing the Bottle Blowers of America (BBA) is considering a proposal to merge with the
Teamsters Union. At least three-fourths of the BBA membership must approve any merger. A random
sample of 2,000 current members reveals 1,600 plan to vote for the merger proposal. What is the
estimate of the population proportion? Can you conclude that the necessary proportion of BBA members
favor the merger? Why?
p = 1,600/2,000 = 0.80
The endpoints of the confidence interval are 0.782 and 0.818, so we conclude the merger will likely pass
because the entire interval estimate lies above 0.75 (three-fourths of the union membership).
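A sketch of the proportion interval p ± z√(p(1 − p)/n). The notes do not state the confidence level; 95% is assumed here because it reproduces the endpoints 0.782 and 0.818.

```python
import math
from statistics import NormalDist

def proportion_interval(successes: int, n: int, confidence: float = 0.95) -> tuple:
    """Normal-approximation interval p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

low, high = proportion_interval(1600, 2000)
print(f"{low:.3f} to {high:.3f}")  # 0.782 to 0.818
print("entire interval above 0.75:", low > 0.75)
```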
Determining Sample Size for Proportions
There are three factors that determine the sample size when we wish to estimate a proportion:
the margin of error, the desired level of confidence, and a value for π to calculate the variance in the population.
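The three factors combine in the standard formula n = π(1 − π)(z/E)², rounded up. This sketch assumes a hypothetical margin of error of 0.03 at 95% confidence and uses π = 0.5, which maximizes π(1 − π) and so gives the most conservative (largest) n when no estimate of π is available.

```python
import math
from statistics import NormalDist

def sample_size_proportion(margin: float, confidence: float, pi: float = 0.5) -> int:
    """n = pi(1 - pi)(z / E)^2, rounded up to the next whole number."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(pi * (1 - pi) * (z / margin) ** 2)

# Hypothetical: within 3 percentage points at 95% confidence, no prior estimate of pi
print(sample_size_proportion(margin=0.03, confidence=0.95))
```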
Using Appendix B.5, move across the top row to 90% and then down to the row for df = 39; the t value is 1.685.
The population mean is more than $431.65 but less than $468.35.
3. Using the confidence interval, explain why the population mean could be $445. Could the population
mean be $425? Why?
The endpoints are $431.65 and $468.35, so the population mean could be $445. It is not likely the
population mean is $425 since $425 is not within the confidence interval.
Using the t distribution:
n = 50
df = 50 − 1 = 49
t = 2.680
A group of statistics students decided to conduct a survey at their university to find the average (mean)
amount of time students spent studying per week. Assuming a population standard deviation of three
hours, what is the required sample size if the error should be less than a half hour with a 99% level of
confidence?
A) 196 B) 239
C) 15 D) 554
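Using the formula \( n = \left(\frac{z\sigma}{E}\right)^2 \), where E is the allowable error. When determining sample size,
remember to round any partial value up to the next whole number.
z value for a 99% confidence interval: 0.99/2 = 0.4950
An area of 0.4950 lies between z = 2.57 and z = 2.58, so z ≈ 2.576.
\( n = \left(\frac{2.576 \times 3}{0.5}\right)^2 = 238.9 \), rounded up to 239 (answer B).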
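A small sketch of n = (zσ/E)² with the round-up rule, using σ = 3 hours and E = 0.5 hour from the problem. It also shows why the choice of z matters here: rounding z to 2.57 or 2.58 moves the answer to 238 or 240, while 2.576 gives 239.

```python
import math

def sample_size_mean(z: float, sigma: float, margin: float) -> int:
    """n = (z * sigma / E)^2, always rounded up to the next whole number."""
    return math.ceil((z * sigma / margin) ** 2)

for z in (2.57, 2.576, 2.58):
    print(f"z = {z}: n = {sample_size_mean(z, sigma=3, margin=0.5)}")
```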
Step 1 State the null hypothesis (H0) and the alternate hypothesis (H1)
NULL HYPOTHESIS: A statement about the value of a population parameter developed for the purpose of
testing numerical evidence.
} The null hypothesis always includes the equal sign
} For example, =, ≥, or ≤ is used in H0.
ALTERNATE HYPOTHESIS: A statement that is accepted if the sample data provide sufficient evidence that
the null hypothesis is false.
} The alternate hypothesis never includes the equal sign.
} For example, ≠, <, or > is used in H1.
In hypothesis testing for the mean, μ, when σ is known, the test statistic z is computed as
\( z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \).
The region or area of rejection defines the location of all the values that are either so large or so small
that their probability of occurrence under a true null hypothesis is remote.
Critical Value: The dividing point between the region where the null hypothesis is rejected and the region
where it is not rejected.
} The sampling distribution of the statistic z
follows the normal distribution
} Here, an α of .05 is used in a one-tailed
test
} The value 1.645 separates the regions
where the null hypothesis is rejected and
where it is not rejected
} The value 1.645 is the critical value
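The critical values used in these notes can be reproduced with `statistics.NormalDist.inv_cdf`: a one-tailed test puts all of α in one tail, while a two-tailed test splits it, α/2 per tail. The helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

def z_critical(alpha: float, tails: int) -> float:
    """Upper critical z for a one- or two-tailed test at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return std_normal.inv_cdf(1 - alpha / tails)

print(f"alpha = .05, one-tailed: {z_critical(0.05, 1):.3f}")   # 1.645
print(f"alpha = .01, one-tailed: {z_critical(0.01, 1):.3f}")   # 2.326
print(f"alpha = .01, two-tailed: ±{z_critical(0.01, 2):.3f}")  # 2.576
```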
Two-Tailed Test Example, σ Known
Jamestown Steel Company manufactures and assembles desks and other office equipment at several
plants in New York State. At the Fredonia plant, the weekly production of the Model A325 desk follows a
normal distribution with a mean of 200 and a standard deviation of 16. New production methods have
been introduced and the vice president of manufacturing would like to investigate whether there has been
a change in weekly production of the Model A325. Is the mean number of desks produced different from
200 at the 0.01 significance level?
Step 4: Formulate the decision rule by first determining the critical values of z.
Decision Rule:
If the computed value of z is not between −2.576 and 2.576, reject the null hypothesis.
If z falls between −2.576 and 2.576, do not reject the null hypothesis.
\( z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{203.5 - 200}{16/\sqrt{50}} = 1.547 \)
Decision: Because 1.547 does not fall in the rejection region, we decide not to reject H0.
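A minimal sketch of the full two-tailed decision for the desk example (x̄ = 203.5, μ = 200, σ = 16, n = 50, α = 0.01). The function name `z_test_two_tailed` is illustrative.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(x_bar: float, mu0: float, sigma: float, n: int, alpha: float):
    """Return z, the upper critical value, and whether H0: mu = mu0 is rejected."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # alpha split between two tails
    return z, z_crit, abs(z) > z_crit

z, z_crit, reject = z_test_two_tailed(x_bar=203.5, mu0=200, sigma=16, n=50, alpha=0.01)
print(f"z = {z:.3f}, critical values = ±{z_crit:.3f}, reject H0: {reject}")
```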
One-Tailed Test
Suppose instead of wanting to know if there had been a change in the mean number of desks assembled,
the vice president wanted to know if there had been an increase in the number of units assembled. Can
we conclude, because of the improved production methods, that the mean number of desks assembled in
the last 50 weeks was more than 200?
Before:
A two-tailed test
H0: μ = 200 desks
H1: μ ≠ 200 desks
Now:
A one-tailed test
H0: μ ≤ 200 desks
H1: μ > 200 desks
The critical values for a one-tailed test are different from a two-tailed test at the same significance level.
In the two-tailed test, we split the significance level in half and put half in the lower tail and half in the
upper tail. In a one-tailed test, we put all the rejection region in one tail. Using Appendix B.5 again, move
to the top heading called “Level of Significance,” select the column with α = .01 for a one-tailed test, and
move to the last row, which is labeled z. The critical z value is 2.326.
Finding a p-Value
} In the previous example about desk production, the computed z was 1.547 and H0 was not
rejected
} Round the computed z-value to two decimal places: 1.55
} Using the z-table, the probability of a z-value of 1.55
or more is 0.5000 − 0.4394 = 0.0606
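A sketch of the p-value computation. The tail area for z ≥ 1.55 is 0.0606; since the original desk test was two-tailed (H1: μ ≠ 200), its p-value counts both tails and doubles that area, while the one-tailed version (H1: μ > 200) uses the single tail. Both exceed α = 0.01, consistent with not rejecting H0.

```python
from statistics import NormalDist

z = 1.55                              # computed z rounded to two decimals
upper_tail = 1 - NormalDist().cdf(z)  # about 0.0606
p_one_tailed = upper_tail             # H1: mu > 200
p_two_tailed = 2 * upper_tail         # H1: mu != 200, both tails count
alpha = 0.01

print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
print("reject H0 (two-tailed):", p_two_tailed < alpha)
print("reject H0 (one-tailed):", p_one_tailed < alpha)
```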
Hypothesis Testing, σ Unknown
The test statistic of 2.818 is greater than our critical value of 1.796.
Therefore, our decision is: Reject H0
Step 6: Interpret the result
We conclude that the time customers spend in the lot is more than 15 minutes. This result indicates that the
airport may need to add more parking places.
CHAPTER 10 PRACTICE PROBLEMS
The average cost of tuition plus room and board for a small private liberal arts college is reported to be
$9,500 per term, but a financial administrator believes that the average cost is higher. A study conducted
using 350 small liberal arts colleges showed that the average cost per term is $9,845. The population
standard deviation is $1,200. Let α = 0.05. What is our decision about the average cost?
A) Equal to $9,500 B) Greater than $9,500
C) Less than $9,500 D) Not equal to $9,500
Based on the sample information, the test statistic is \( z = \frac{9845 - 9500}{1200/\sqrt{350}} = 5.38 \). Because
this test statistic is greater than the critical value of +1.645,
we reject the null hypothesis and conclude the mean is greater than $9,500 (answer B).
Alternatively, the p-value is the probability of observing a sample value as extreme as, or more extreme
than, the value observed, given that the null hypothesis is true.
The probability of getting a sample mean of $9,845 or greater, assuming a population mean of $9,500
corresponds to the probability of obtaining a z-value greater than 5.38.
This probability is beyond the range of the "areas under the normal curve" table, so the probability is
extremely small or virtually zero.
The p-value, 0.0000, is less than the significance level 0.05, so the decision is to reject the null hypothesis
and conclude the mean is greater than $9,500.
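A sketch of this one-tailed z-test (H0: μ ≤ 9,500, H1: μ > 9,500) using the problem's figures, with the critical value and p-value from `statistics.NormalDist` instead of the table. The p-value comes out around 4 × 10⁻⁸, which prints as 0.0000 to four places.

```python
import math
from statistics import NormalDist

mu0 = 9_500    # claimed mean cost per term
x_bar = 9_845  # sample mean from 350 colleges
sigma = 1_200  # known population standard deviation
n = 350
alpha = 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))  # about 5.38
z_crit = NormalDist().inv_cdf(1 - alpha)    # one-tailed critical value, about 1.645
p_value = 1 - NormalDist().cdf(z)           # effectively zero

print(f"z = {z:.2f}, critical z = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0:", z > z_crit)
```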
The mean income per person in the United States is $60,000, and the distribution of incomes follows a
normal distribution. A random sample of 10 residents of Wilmington, Delaware, had a mean of $70,000
with a standard deviation of $10,000. At the .05 level of significance, is that enough evidence to conclude
that residents of Wilmington, Delaware, have more income than the national average?
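The notes leave this problem unanswered. One way to work it, sketched under stated assumptions: a one-tailed t-test (H0: μ ≤ $60,000, H1: μ > $60,000), using t because only the sample standard deviation is known. The critical value 1.833 for df = 9 at α = .05 is taken from a standard t table and hard-coded, since Python's standard library has no t distribution.

```python
import math

mu0 = 60_000     # national mean income
x_bar = 70_000   # Wilmington sample mean
s = 10_000       # sample standard deviation
n = 10
t_crit = 1.833   # one-tailed, alpha = .05, df = n - 1 = 9 (from a t table)

t = (x_bar - mu0) / (s / math.sqrt(n))  # about 3.162
print(f"t = {t:.3f}, critical t = {t_crit}, reject H0: {t > t_crit}")
```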