Engineering Data Analysis Module 1-4
College of Engineering and Industrial Technology
Department of Agricultural and Biosystems Engineering
Engineering Data Analysis
Key Terms:
In statistics, we generally want to study a population. You can think of a population as a collection
of persons, things, or objects under study. To study the population, we select a sample. The idea of sampling
is to select a portion (or subset) of the larger population and study that portion (the sample) to gain
information about the population. Data are the result of sampling from a population.
Because it takes a lot of time and money to examine an entire population, sampling is a very practical
technique. If you wished to compute the overall grade point average at your school, it would make sense to
select a sample of students who attend the school. The data collected from the sample would be the students'
grade point averages.
In presidential elections, opinion poll samples of 1,000–2,000 people are taken. The opinion poll is
supposed to represent the views of the people in the entire country. Manufacturers of canned carbonated
drinks take samples to determine whether a 16-ounce can contains 16 ounces of carbonated drink. From the sample
data, we can calculate a statistic.
A statistic is a number that represents a property of the sample. For example, if we consider one math
class to be a sample of the population of all math classes, then the average number of points earned by
students in that one math class at the end of the term is an example of a statistic. The statistic is an estimate
of a population parameter.
A parameter is a numerical characteristic of the whole population that can be estimated by a statistic.
Since we considered all math classes to be the population, then the average number of points earned per
student over all the math classes is an example of a parameter. One of the main concerns in the field of
statistics is how accurately a statistic estimates a parameter. The accuracy really depends on how well the
sample represents the population. The sample must contain the characteristics of the population in order to
be a representative sample. We are interested in both the sample statistic and the population parameter in
inferential statistics. In a later chapter, we will use the sample statistic to test the validity of the established
population parameter.
A variable, usually notated by capital letters such as X and Y, is a characteristic or measurement that
can be determined for each member of a population.
Variables may be numerical or categorical.
• Numerical variables take on values with equal units such as weight in pounds and time in hours.
• Categorical variables place the person or thing into a category.
If we let X equal the number of points earned by one math student at the end of a term, then X is a numerical variable.
If we let Y be a person's party affiliation, then some examples of Y include Republican, Democrat, and Independent. Y is
a categorical variable.
We could do some math with values of X (calculate the average number of points earned, for
example), but it makes no sense to do math with values of Y (calculating an average party affiliation makes
no sense).
Data are the actual values of the variable. They may be numbers or they may be words. Datum is a
single value. Two words that come up often in statistics are mean and proportion.
If you were to take three exams in your math classes and obtain scores of 86, 75, and 92, you would calculate
your mean score by adding the three exam scores and dividing by three (your mean score would be 84.3 to one decimal
place). If, in your math class, there are 40 students and 22 are men and 18 are women, then the proportion of men
students is 22/40 and the proportion of women students is 18/40.
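The mean and proportion calculations in the example above can be checked with a short Python sketch. The variable names are illustrative only; the numbers come from the example.

```python
from fractions import Fraction
from statistics import mean

# Exam scores from the example
exam_scores = [86, 75, 92]
mean_score = mean(exam_scores)  # (86 + 75 + 92) / 3

# Class composition from the example
class_size = 40
men, women = 22, 18
proportion_men = Fraction(men, class_size)      # 22/40
proportion_women = Fraction(women, class_size)  # 18/40

print(f"mean score: {mean_score:.1f}")
print(f"proportion of men: {float(proportion_men):.2f}")
print(f"proportion of women: {float(proportion_women):.2f}")
```

Note that the two proportions add up to 1, because every student in the class falls into exactly one of the two categories.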
Data is collected every second of every day from a vast array of sources. From the security cameras
that deploy facial recognition technology when people enter a building to the mobile devices that track
shopping, media, and communication habits, images and numbers are continually being collected by
government agencies, consumer groups, and other organizations from all around the world. This data
contains information that can help businesses operate more efficiently and reach the right customers.
However, to be of any value, data must be correctly interpreted. Misinterpreted data can lead to flawed
insights that could disrupt an organization’s growth and stability strategies. To ensure that these copious
amounts of information are leveraged effectively, businesses and other groups are hiring data scientists to
help collect, store, and analyze pertinent information. While these professionals come from a variety of
backgrounds, the growing field of data science provides a number of rewarding opportunities specifically for
engineers. The fields overlap in significant ways, which often makes professionals with an engineering
background a good fit for a role working with data.
Data analysis involves gathering and studying data to form insights that can be used to make
decisions. The information derived can be useful in several different ways, such as for building a business
strategy or ensuring the safety and efficiency of an engineering project. Data collection and analysis are
becoming increasingly important across nearly every industry. Fields that collect this information include
marketing, sports, entertainment, medicine, communications, government, criminal justice, electronics, and
aerospace. Data can help companies make decisions on issues as diverse as how to engage their target
audiences, what purchases to make, and how to organize their staff members. Ultimately, data science is not
just about collecting and analyzing information. It is about being able to predict the future and verify the
results of past decisions.
Engineering is one industry that has been particularly influenced by the growing need for data
collection and analysis. As big data has begun to play a larger role in industries around the world, engineers
have been called on to play an influential role in the way this information is gathered, stored, and leveraged.
Professionals with an engineering background generally prove to be particularly adept at developing
techniques for analyzing data groups to extract valuable information.
To succeed in a career as a data scientist, an engineer should possess the following qualifications:
Analytics expertise: Experience extrapolating information from large quantities of numbers will help you
succeed in this role. Depending on where you work, knowledge of specific analytic tools will also
likely be required.
Computer knowledge: Gone are the days of crunching numbers on a hand-held calculator — much less with
pen and paper. The vast majority of your day will be spent working on a computer, so knowledge of
coding, unstructured data, and cloud tools will increase your marketability.
Communication skills: It is important to be able to present your findings in a clear and concise way to ensure
that your employer understands the information and can act accordingly.
Strong drive: In data science, you should regularly be looking for ways to improve how information is
collected and processed. Being an intellectually curious self-starter will take you far in this role.
Exercise No. 1
Determine what the key terms refer to in the following study. We want to know the average (mean) amount
of money first year college students spend at ABC College on school supplies that do not include books.
We randomly surveyed 100 first year students at the college. Three of those students spent $150, $200,
and $225, respectively.
Answer:
• The population is all first-year students attending ABC College this term.
• The sample could be all students enrolled in one section of a beginning statistics course at ABC
College (although this sample may not represent the entire population).
• The parameter is the average (mean) amount of money spent (excluding books) by first year college
students at ABC College this term.
• The statistic is the average (mean) amount of money spent (excluding books) by first year college
students in the sample.
• The variable could be the amount of money spent (excluding books) by one first year student. Let X =
the amount of money spent (excluding books) by one first year student attending ABC College.
• The data are the dollar amounts spent by the first-year students. Examples of the data are $150, $200,
and $225.
Exercise No. 2
Determine what the key terms refer to in the following study.
A study was conducted at a local college to analyze the average cumulative GPAs of students who graduated
last year. Fill in the letter of the phrase that best describes each of the items below.
1. Population_____ 2. Statistic _____ 3. Parameter _____ 4. Sample _____ 5. Variable _____
6. Data _____
a) all students who attended the college last year
b) the cumulative GPA of one student who graduated from the college last year
c) 3.65, 2.80, 1.50, 3.90
d) a group of students who graduated from the college last year, randomly selected
e) the average cumulative GPA of students who graduated from the college last year
f) all students who graduated from the college last year
g) the average cumulative GPA of students in the study who graduated from the college last year
Answer:
1. f; 2. g; 3. e; 4. d; 5. b; 6. c
Scales of Measurement
When gathering data by any method, measurements are usually obtained (e.g., height in inches,
weight in pounds, age in years, I.Q. scores, temperature in degrees Celsius, incidence rates, mortality rates,
etc.). Measurements are classified into four scales. In selecting the statistical tool to be used for drawing
inferences from a random sample, the measurement scale of the data must be carefully identified.
1. Nominal Scale
A measurement that classifies elements into two or more categories or classes. The numbers indicate that
the elements are different, but the difference is not according to order or magnitude.
Example: Distribution of Patients in XYZ Hospital According to Religion and Gender
2. Ordinal Scale
A measurement scale that ranks individuals in terms of the degree to which they possess a characteristic.
Legend:
0= not anxious
1= low anxiety level
2= moderate anxiety level
3= high anxiety level
3. Interval Scale
A measurement scale that, in addition to ordering scores from highest to lowest, establishes a uniform
unit in the scale so that any distance between two consecutive scores is of equal magnitude.
Example:
The difference between aptitude scores of 80 and 90 is equal to the difference between aptitude scores of 90 and 100.
There is also no absolute zero in the scale. For example, a temperature reading of 0 degrees Celsius in a place
does not mean that there is no temperature in that place.
4. Ratio Scale
A measurement scale that, in addition to being an interval scale, also has an absolute zero in the scale.
Examples:
Height, weight, area, volume, speed, rate of doing work, amount of money deposited in a bank.
There is no formula for selecting the best method to be used when gathering data. It depends on the
researcher’s design of the study, the type of data, the time allotment to complete the study, and the
researcher’s financial capacity.
There are several ways of collecting data, among which are clerical tools and mechanical devices.
A. Clerical Tools
1. Questionnaire - Defined by Good as a list of planned, written questions related
to a particular topic, with space provided for indicating the response to each question, intended for
submission to a number of persons for reply; commonly used in normative surveys and in the
measurement of attitudes and opinions.
Construction of Questionnaires:
1. Doing library search
2. Talking to knowledgeable people
3. Mastering the guidelines
4. Writing the questionnaire
5. Editing the questionnaire
6. Rewriting the questionnaire
7. Pretesting the questionnaire (dry run)
e. Understanding
2. Interview - One of the major techniques in gathering data or information. It is a purposeful face-to-face
relationship between two persons.
Types of Interviews
a. Direct Method- The researcher personally interviews the respondents.
b. Indirect Method- The researchers may use a telephone to interview the respondents.
Classes of Interview
a. Standardized- The interviewer is not allowed to change the specific wordings of the questions in the
interview schedule. He must conduct all interviews in the same manner, and he cannot adapt
questions for specific situations or pursue statements
b. Non-standardized- The interviewer has complete freedom to develop each interview in the most
appropriate manner for each situation. He is not held to any specific questions. This is the same as
so-called informal interview.
c. Semi-standardized- The interviewer is required to ask a number of specific major questions, and
beyond these he is free to probe as he chooses. There are prepared principal questions to be asked,
and once these are asked and answered the interviewer is free to ask any questions he sees fit for
the situation.
d. Focused- Also called depth interview. Similar to non-standardized interview, the researcher asks a
series of questions based on his previous understanding and insight of the situation. The interview
is focused on specific topics that are to be investigated in depth.
e. Nondirective- The interviewee or subject is allowed and even encouraged to express his feelings
without the fear of disapproval. The subject can express his feelings and views on certain topics even
without waiting to be questioned or even without pressure from the interviewer.
3. Empirical Observation Method- A means of gathering information for research; it may be defined as
perceiving data through the senses: sight, hearing, taste, touch, and smell. The sense of sight is the most
important and the most used among the senses.
Types of Observation
a. Participant and non-participant observation
1. Participant- Observer takes active part in the activities of the group being observed.
Exercise No. 3
Identify each quantitative variable as discrete or continuous. Write D if discrete or C if continuous
1. The boiling point of water is 100 deg. Cel.
2. Length of hair of female students.
3. Number of foreigners migrating to the Philippines every year
4. Her home telephone number is 2581376.
5. The number of children with missing/decayed teeth in barangay A is 2000.
6. John’s height is 168 cm.
7. The following data are the densities of sample substances taken from Tabing-Ilog River (g/cc):
23.6, 19.8, 15.0, 7.8 and 2.4.
8. Weights in pounds of the Math quiz contestants.
9. The average speed of motorboats cruising in Manila Bay every day is 50m/s.
10. Scores of 16 students in a Statistics Quiz.
Engineering Experiments
If we had infinite time and resource budgets there probably wouldn't be a big fuss made over designing
experiments. In production and quality control we want to control the error and learn as much as we can
about the process or the underlying theory with the resources at hand. From an engineering perspective we're
trying to use experimentation for the following purposes:
a. reduce time to design/develop new products & processes
b. improve performance of existing processes
c. improve reliability and performance of products
d. achieve product & process robustness
e. perform evaluation of materials, design alternatives, setting component & system tolerances, etc.
We always want to fine-tune or improve the process. In today's global world this drive for
competitiveness affects all of us both as consumers and producers.
Robustness is a concept that enters into statistics at several points. At the analysis stage, robustness refers
to a technique that isn't overly influenced by bad data. Even if there is an outlier or bad data, you still want to
get the right answer. A robust process likewise still works regardless of who or what is involved in it.
Every experiment design has inputs. Consider a cake-baking example: we have our ingredients, such
as flour, sugar, milk, and eggs. Regardless of the quality of these ingredients, we still want our cake to come
out successfully. In every experiment there are inputs and, in addition, there are factors (such as time of baking,
temperature, geometry of the cake pan, etc.), some of which you can control and others that you can't control.
The experimenter must think about factors that affect the outcome. We also talk about the output and the
yield or the response to your experiment. For the cake, the output might be measured as texture, flavor,
height, or size.
Randomization
This is an essential component of any experiment that is going to have validity. If you are doing a comparative
experiment where you have two treatments, a treatment and a control, for instance, you need to include in
your experimental process the assignment of those treatments by some random process. An experiment
includes experimental units. You need to have a deliberate process to eliminate potential biases from the
conclusions, and random assignment is a critical step.
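Random assignment of a treatment and a control to experimental units can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name `assign_treatments`, the unit labels, and the seed are all hypothetical choices made for this example.

```python
import random

def assign_treatments(units, treatments=("treatment", "control"), seed=None):
    """Randomly assign treatments to experimental units in near-equal groups."""
    rng = random.Random(seed)   # a seeded generator makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)       # random order removes any systematic pattern
    # Deal the shuffled units out to the treatments in turn.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

# Ten hypothetical experimental units
units = [f"unit_{i}" for i in range(1, 11)]
assignment = assign_treatments(units, seed=42)

for unit in units:
    print(unit, "->", assignment[unit])
```

Because the order is shuffled before the units are dealt out, no characteristic of the units can systematically determine which treatment they receive.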
Replication
Replication is in some sense the heart of all of statistics. To make this point, remember what the standard
error of the mean is: it is the square root of the estimate of the variance of the sample mean, i.e., $\sqrt{s^2/n}$, where $s^2$ is the sample variance and $n$ is the sample size. The width
of the confidence interval is determined by this statistic. Our estimates of the mean become less variable as
the sample size increases. Replication is the basic issue behind every method we will use in order to get a
handle on how precise our estimates are at the end. We always want to estimate or control the uncertainty in
our results. We achieve this estimate through replication. Another way we can achieve short confidence
intervals is by reducing the error variance itself. However, when that isn't possible, we can reduce the error in
our estimate of the mean by increasing n.
Another way to reduce the length of the confidence interval is to reduce the error variance,
which brings us to blocking.
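The effect of replication on precision can be illustrated with a short Python sketch. It assumes the usual estimate of the standard error of the mean, s divided by the square root of n, and uses a hypothetical sample standard deviation of 2.0.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 2.0  # hypothetical sample standard deviation, held fixed for illustration

for n in (4, 16, 64):
    print(f"n = {n:2d}  standard error = {standard_error(s, n)}")
```

With the error variance held fixed, quadrupling the number of replicates halves the standard error, which is why increasing n shortens the confidence interval.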
Blocking
Blocking is a technique to include other factors in our experiment which contribute to undesirable variation.
Much of the focus in this class will be to creatively use various blocking techniques to control sources of
variation that will reduce error variance. For example, in human studies, the gender of the subjects is often
an important factor. Age is another factor affecting the response. Age and gender are often considered
nuisance factors which contribute to variability and make it difficult to assess systematic effects of a treatment.
By using these as blocking factors, you can avoid biases that might arise from differences in the
allocation of subjects to the treatments, and you can account for some of the noise in the experiment. We
want the unknown error variance at the end of the experiment to be as small as possible. Our goal is usually
to find out something about a treatment factor (or a factor of primary interest), but in addition to this, we
want to include any blocking factors that will explain variation.
Factors
We usually talk about "treatment" factors, which are the factors of primary interest to you. In addition to
treatment factors, there are nuisance factors which are not your primary focus, but you have to deal with
them. Sometimes these are called blocking factors, mainly because we will try to block on these factors to
prevent them from influencing the results.
There are other ways that we can categorize factors: Experimental vs. Classification Factors
Experimental Factors
These are factors that you can specify (and set the levels) and then assign at random as the treatment to the
experimental units. Examples would be temperature, level of an additive, fertilizer amount per acre, etc.
Classification Factors
These can't be changed or assigned; they come as labels on the experimental units. The age and sex of the
participants are classification factors which can't be changed or randomly assigned. But you can select
individuals from these groups randomly.
Quantitative Factors
You can assign any specified level of a quantitative factor. Examples: percent or pH level of a chemical.
Qualitative Factors
These factors have categories which are different types. Examples might be species of a plant or animal, a
brand in the marketing field, or gender. These are not ordered or continuous but are arranged perhaps in sets.
References:
Barbara Illowsky and Susan Dean, 2018, Introductory to Statistics
Calderon, J.F., and Gonzales, E.C., (2016) Methods of Research and Thesis Writing
De Belen, R., and Feliciano, P., (2015) 1st Edition Basic Statistics for Research
Pareño, E., and Jimenez, R., (2006) Basic Statistics: A Worktext
https://online.stat.psu.edu/stat503/book/export/html/632
Probability
Engr. Sheila Jane Margaret C. Peña
Instructor
PROBABILITY
◉ The strand of mathematics looking at
the chance of events occurring.
◉ The chance that a given event will
occur.
EXPERIMENT
◉ Is the process by which an observation (or measurement) is
obtained.
Examples: recording a test grade; measuring daily rainfall; flipping a coin and observing the face that
appears (possible outcomes: Head, Tail); tossing a die (possible outcomes: 1, 2, 3, 4, 5, 6).
Each experiment may result in an outcome, which is
called an Event and is denoted by a capital letter.
SAMPLE SPACE
◉ A set in which all of the possible
outcomes of a statistical experiment are
represented as points. It is also
represented by the symbol S.
Example
◉ The sample space when a coin is flipped is S = {H,T}
◉ The sample space of tossing a die is S = {1,2,3,4,5,6}
ELEMENTS
◉ A member or object in a set
◉ An item or term contained within a set
EVENT
◉ Is an occurrence or the possibility of an
occurrence that is being investigated.
◉ It is a set of outcomes from a given experiment.
◉ An Event is a subset of a sample space.
Example
◉ If A is the event that an odd number comes out in a single toss
of a die, then A = {1,3,5} is a subset of the sample space
S = {1,2,3,4,5,6}
SUBSET
◉ A subset of a given set is a collection of
things that belong to the original set.
◉ A set whose members are part of a bigger
set.
COMPLEMENT
◉ The complement of an event A with respect
to the sample space S is the set of all
elements of S that are not in event A. We
denote the complement of A by the symbol
A'.
Example
◉ Let R be the event that a red card is selected from an ordinary
deck of 52 playing cards, and let S be the entire deck. Then R’
is the event that the card selected from the deck is not a red
card but a black card.
◉ Consider the sample space S = {book, cell phone, mp3, paper,
stationery, laptop}.
Let A = {book, stationery, laptop, paper}.
Then, the complement of A, A’ = {cell phone, mp3}.
INTERSECTION
◉ The intersection of two events A and B,
denoted by the symbol A ∩ B, is the event
containing all elements that are common to
A and B.
Example
◉ Let E be the event that a person selected at random in a
classroom is majoring in engineering, and let F be the event
that the person is female. Then E ∩F is the event of all female
engineering students in the classroom.
Two events A and B are mutually
exclusive events if A ∩ B = ∅, that is, A and
B have no elements in common.
Ex. In the die-tossing experiment, if
A = {1,2,3} and B = {4,5,6}, then A ∩ B = ∅
The union of the two events A and B,
denoted by the symbol A∪B, is the event
containing all the elements that belong to
A or B or both.
In the Venn diagram of events A, B, and C (figure not reproduced here), A∩B∩C = region 1 and (A∪B)∩C’ = regions 2, 6, and 7.
EXERCISES:
1. List the elements of each of the sample spaces
(a) the set of integers between 1 and 50 divisible by 8;
(b) the set S = {x | x² + 4x − 5 = 0};
(c) the set of outcomes when a coin is tossed until a tail
or three heads appear;
(d) the set S = {x | x is a continent};
(e) the set S = {x | 2x−4 ≥ 0 and x<1}.
2. An experiment consists of choosing a number
from 0 to 9 at random. Let A be the event of
choosing an even number and B be the event of
choosing an odd number. Let C be the event of
choosing the number 2, 3, 4 or 5 and D be the
event of choosing 1, 6, or 7. List the elements of
the sets corresponding to the following:
Sample space S, A, B, C, D, A ∪ C, C', A ∩ B,
(S ∩ C)', A ∩ B ∩ D'
Solution:
S = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
A = even numbers {0, 2, 4, 6, 8}
B = odd numbers {1, 3, 5, 7, 9}
C = {2, 3, 4, 5}
D = {1, 6, 7}
A ∪ C = {0, 2, 3, 4, 5, 6, 8}
C' = {0, 1, 6, 7, 8, 9}
A ∩ B = ∅
(S ∩ C)' = {0, 1, 6, 7, 8, 9}
A ∩ B ∩ D' = ∅
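The set operations in this exercise can be checked with Python's built-in `set` type. This sketch makes two assumptions: the operator symbols lost from the exercise statement are read as A ∪ C, A ∩ B, (S ∩ C)', and A ∩ B ∩ D', and D is taken as {1, 6, 7}, as given in the problem. The helper `complement` is a hypothetical name introduced here.

```python
# Sample space and events from the exercise
S = set(range(10))                   # digits 0 through 9
A = {x for x in S if x % 2 == 0}     # even numbers
B = {x for x in S if x % 2 == 1}     # odd numbers
C = {2, 3, 4, 5}
D = {1, 6, 7}                        # as stated in the problem

def complement(event, space=S):
    """Elements of the sample space that are not in the event."""
    return space - event

print("A ∪ C      =", sorted(A | C))
print("C'         =", sorted(complement(C)))
print("A ∩ B      =", sorted(A & B))
print("(S ∩ C)'   =", sorted(complement(S & C)))
print("A ∩ B ∩ D' =", sorted(A & B & complement(D)))
```

Under this reading, 7 does not belong to A ∪ C, since 7 is neither even nor an element of C. An empty list in the output corresponds to the empty set ∅.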
Counting Rules Useful in Probability
Counting Sample Points - The fundamental principle of counting,
often referred to as the multiplication rule
◉ Rule 1. If an operation can be performed in n1 ways, and if for each
of these ways a second operation can be performed in n2 ways, then
the two operations can be performed together in 𝑛1 𝑛2 ways.
Ex1. How many sample points are there in the sample space
when a pair of dice is thrown once?
The first die can land face-up in any one of n1 = 6 ways. For
each of these 6 ways, the second die can also land face-up in n2 = 6
ways. Therefore, the pair of dice can land in n1n2 = (6)(6) = 36 possible
ways.
Ex.2. A developer of a new subdivision offers
prospective home buyers a choice of Tudor, rustic,
colonial, and traditional exterior styling in ranch, two-
story, and split-level floor plans. In how many different
ways can a buyer order one of these homes?
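Both examples of the multiplication rule can be verified by listing the sample points explicitly. The sketch below uses `itertools.product` from the Python standard library; the list names are illustrative.

```python
from itertools import product

# Ex1: a pair of dice thrown once, n1 = 6 and n2 = 6
dice_outcomes = list(product(range(1, 7), repeat=2))

# Ex.2: exterior styling (n1 = 4) combined with floor plan (n2 = 3)
styles = ["Tudor", "rustic", "colonial", "traditional"]
plans = ["ranch", "two-story", "split-level"]
home_choices = list(product(styles, plans))

print(len(dice_outcomes))   # n1 * n2 = 6 * 6
print(len(home_choices))    # n1 * n2 = 4 * 3
```

Enumerating the pairs gives 36 sample points for the dice and 12 ways to order a home, in agreement with Rule 1.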
Rule 2. If an operation can be performed in 𝑛1 ways, and if
for each of these a second operation can be performed in
𝑛2 ways, and for each of the first two a third operation can
be performed in 𝑛3 ways, and so forth, then the sequence
of k operations can be performed in 𝑛1 𝑛2 … 𝑛𝑘 ways.
Ex1. Sam is going to assemble a computer by himself.
He has the choice of chips from two brands, a hard drive from
four, memory from three, and an accessory bundle from five
local stores. How many different ways can Sam order the
parts?
Solution:
n1n2n3n4 = (2)(4)(3)(5)
= 120 different ways to order parts.
PERMUTATION
◉ is an arrangement of all or part of a set of objects.
Theorems:
◉ For any non-negative integer n, n!, called “n
factorial,” is defined as n!=n(n−1)···(2)(1), with
special case 0! = 1.
Now consider the number of permutations that are
possible by taking two letters at a time from abcd.
These would be ab, ac, ad, ba, bc, bd, ca, cb, cd, da,
db, and dc.
In general, n distinct objects taken r at a time can be
arranged in n(n−1)(n−2)···(n−r + 1) ways. We represent
this product by the symbol $_nP_r$.
Theorem 2. The number of permutations of n distinct
objects taken r at a time is
$_nP_r = \frac{n!}{(n-r)!}$
Example
1. In one year, three awards (research, teaching, and service)
will be given to a class of 25 graduate students in a statistics
department. If each student can receive at most one award,
how many possible selections are there?
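The permutation count can be checked in Python, both by listing the arrangements and by using `math.perm` (available in Python 3.8 and later). The awards example is treated as a permutation because the three awards are distinct, so the order in which students are matched to awards matters.

```python
import math
from itertools import permutations

# Two letters at a time from abcd
pairs = ["".join(p) for p in permutations("abcd", 2)]
print(pairs)
print(len(pairs), math.perm(4, 2))   # both equal 4!/(4-2)!

# Three distinct awards among 25 students, at most one award each
award_selections = math.perm(25, 3)  # 25 * 24 * 23
print(award_selections)
```

The listing reproduces the twelve arrangements ab, ac, ..., dc given above, and the awards example yields 13,800 possible selections.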
Theorem 3. The number of permutations of n
objects arranged in a circle is (n−1)!.
We see that there are 5 ways to partition a set of 5 elements into two
subsets, or cells, containing 4 elements in the first cell and 1 element in
the second.
◉ Theorem 5. The number of ways of partitioning a set of n
objects into r cells with n1 elements in the first cell, n2
elements in the second, and so forth, is
$\binom{n}{n_1, n_2, \ldots, n_r} = \frac{n!}{n_1! \, n_2! \cdots n_r!}$,
where n1 + n2 + ··· + nr = n.
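A small Python sketch of the partition formula, n! divided by the product of the cell-size factorials, is given below. The function name `multinomial` is a hypothetical helper, and the second call uses made-up cell sizes purely for illustration.

```python
import math

def multinomial(*cell_sizes):
    """Number of ways to partition n = sum(cell_sizes) objects into cells
    of the given sizes: n! / (n1! * n2! * ... * nr!)."""
    n = sum(cell_sizes)
    denominator = math.prod(math.factorial(k) for k in cell_sizes)
    return math.factorial(n) // denominator

# A set of 5 elements split into cells of 4 and 1 elements
print(multinomial(4, 1))

# Illustrative case: 7 objects into cells of sizes 3, 2, and 2
print(multinomial(3, 2, 2))
```

The first call gives 5, the count for splitting five elements into cells of sizes 4 and 1, and the second gives 7!/(3! 2! 2!) = 210.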
In many problems, we are interested in the number of ways of
selecting r objects from n without regard to order. These selections
are called combinations. A combination is actually a partition with
two cells, the one cell containing the r objects selected and the other
cell containing the (n−r) objects that are left. The number of such
combinations, denoted by $\binom{n}{r, n-r}$ or $\binom{n}{r}$, is
$\binom{n}{r} = \frac{n!}{r! \, (n-r)!}$
Example
◉ A young boy asks his mother to get 5 picture cards from his collection of
10 flower picture cards and 5 sports picture cards. In how many ways
can his mother get 3 flower and 2 sports picture cards?
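One way to work this example is to count the flower selections and the sports selections separately with the combination formula and then apply the multiplication rule. The sketch below uses `math.comb` (Python 3.8 and later); the variable names are illustrative.

```python
import math

flower_ways = math.comb(10, 3)   # choose 3 of the 10 flower cards
sports_ways = math.comb(5, 2)    # choose 2 of the 5 sports cards

# Multiplication rule: each flower selection pairs with each sports selection
total_ways = flower_ways * sports_ways

print(flower_ways, sports_ways, total_ways)
```

This gives 120 × 10 = 1,200 ways to pick 3 flower and 2 sports picture cards.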
Exercises:
1. If an experiment consists of throwing a die and then drawing
a letter at random from the English alphabet, how many points
are there in the sample space?
2. How many ways are there to select 3 candidates from 8
equally qualified recent graduates for openings in an
accounting firm?
EXERCISE 2.
Answer:
Step-by-step explanation:
First, we state the given facts:
r = 3 candidates to be selected
n = 8 possible candidates
Because the order of selection does not matter, this problem uses
the combination formula. This is the
formula for the number of possible
combinations of r objects from a set of n
objects, regardless of order:
$\binom{8}{3} = \frac{8!}{3! \, 5!} = 56$
Therefore, the accounting firm has 56 possible combinations for selecting 3
candidates from a pool of 8 qualified recent graduates.
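Both exercises can be checked numerically. The sketch below treats Exercise 1 as an application of the multiplication rule and Exercise 2 as a combination, since the three openings are not distinguished and the order of selection does not matter. For comparison it also prints the ordered count, 8P3, which would apply only if the three positions were distinct.

```python
import math
from itertools import product
from string import ascii_uppercase

# Exercise 1: throw a die, then draw a letter from the English alphabet
sample_space = list(product(range(1, 7), ascii_uppercase))
print(len(sample_space))         # 6 * 26

# Exercise 2: select 3 candidates from 8, order not important
unordered = math.comb(8, 3)      # 8! / (3! * 5!)
ordered = math.perm(8, 3)        # 8! / 5!, for comparison only
print(unordered, ordered)
```

The sample space in Exercise 1 has 156 points. For Exercise 2, the unordered count is 56; the ordered count of 336 is exactly 3! = 6 times larger, because each group of 3 candidates can be arranged in 6 orders.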
Probability of an Event, P[Event]
◉ Postulates of Probability:
0 ≤ P[Event] ≤ 1
P [impossible event] = 0
P[sure event] = 1
The sum of the probabilities for all simple events in S
is equal to 1.
The probability of an event A is the sum of the weights of all
sample points in A. Therefore,
0 ≤ P(A) ≤ 1, P(∅) = 0, and P(S) = 1.
Example 1: A coin is tossed twice. What is the probability that at least
1 head occurs?
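Answer: S = {HH, HT, TH, TT}. If A is the event that at least 1 head occurs, then A = {HH, HT, TH}.
Assuming a balanced coin, each of the four outcomes is equally likely, so
P(A) = 3/4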
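The answer can be reproduced by enumerating the sample space, under the assumption that the coin is balanced so that all four outcomes are equally likely. The names below are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a coin
S = ["".join(outcome) for outcome in product("HT", repeat=2)]

# Event A: at least one head occurs
A = [outcome for outcome in S if "H" in outcome]

# Equally likely outcomes: P(A) = (outcomes in A) / (outcomes in S)
p_A = Fraction(len(A), len(S))

print(S)
print(A)
print(p_A)
```

Three of the four sample points contain at least one head, which gives P(A) = 3/4.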
◉ Rule 3.
If an experiment can result in any one of N
different equally likely outcomes, and if exactly n
of these outcomes correspond to event A, then
the probability of event A is
$P(A) = \frac{n}{N}$
Example
◉ In a poker hand consisting of 5 cards, find the probability of holding 2 aces and 3
jacks.
Solution: The number of ways of being dealt 2 aces from the 4 aces in the deck is
$\binom{4}{2} = \frac{4!}{2! \, 2!} = 6$
The number of ways of being dealt 3 jacks from the 4 jacks in the deck is
$\binom{4}{3} = \frac{4!}{3! \, 1!} = 4$
By the multiplication rule (Rule 1), there are n = (6)(4) = 24 hands with 2 aces and 3
jacks. The total number of 5-card poker hands, all of which are equally likely, is
$N = \binom{52}{5} = \frac{52!}{5! \, 47!} = 2,598,960$
Therefore,
$P(E) = \frac{24}{2,598,960} \approx 9.2 \times 10^{-6}$
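This example can be checked with `math.comb`. The sketch relies on the fact that an ordinary 52-card deck contains 4 aces and 4 jacks, so the aces are chosen from 4 cards and the jacks from 4 cards, not from the 5 cards of the hand.

```python
import math
from fractions import Fraction

ways_aces = math.comb(4, 2)      # 2 aces from the 4 aces in the deck
ways_jacks = math.comb(4, 3)     # 3 jacks from the 4 jacks in the deck
favorable = ways_aces * ways_jacks   # multiplication rule

total_hands = math.comb(52, 5)   # all equally likely 5-card hands

p = Fraction(favorable, total_hands)
print(favorable, total_hands)
print(p, float(p))
```

With these counts there are 24 favorable hands out of 2,598,960, a probability of about 9.2 × 10⁻⁶.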
Additive Rules
Theorem 6. If A and B are two events, then
P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Corollary 1: If A and B are mutually exclusive, then
P(A ∪ B) = P(A) + P(B).
Corollary 2: If A1, A2,…, An are mutually exclusive, then
P(A1 ∪ A2 ∪ ···∪ An) = P(A1) + P(A2) + ···+ P(An).
Corollary 3: If A1, A2,…, An is a partition of sample space S,
then
P(A1 ∪ A2 ∪ ··· ∪ An) = P(A1) + P(A2) + ···+ P(An) = P(S) = 1.
Theorem 7.
For three events A, B, and C,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) −
P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
Example
Theorem 8.
If A and A’ are complementary
events, then
P(A) + P(A’) = 1.
Examples
1. If the probabilities that an automobile mechanic
will service 3, 4, 5, 6, 7, or 8 or more cars on any given
workday are, respectively, 0.12, 0.19, 0.28, 0.24, 0.10,
and 0.07, what is the probability that he will service
at least 5 cars on his next day at work?
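One way to work this example is with the complement rule of Theorem 8: the event "at least 5 cars" is the complement of "fewer than 5 cars". The sketch below uses the probabilities given in the problem. The dictionary keys are illustrative labels.

```python
# Probabilities from the problem statement, keyed by number of cars serviced.
probabilities = {
    "3": 0.12,
    "4": 0.19,
    "5": 0.28,
    "6": 0.24,
    "7": 0.10,
    "8 or more": 0.07,
}

# Complement rule: P(at least 5) = 1 - P(fewer than 5).
p_fewer_than_5 = probabilities["3"] + probabilities["4"]
p_at_least_5 = 1 - p_fewer_than_5

# Direct sum over the outcomes in the event, as a cross-check.
p_direct = sum(probabilities[k] for k in ("5", "6", "7", "8 or more"))

print(round(p_at_least_5, 2))  # 0.69
print(round(p_direct, 2))      # 0.69
```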
Exercises:
1. In a high school graduating class of 100 students, 54 studied
mathematics, 69 studied history, and 35 studied both
mathematics and history. If one of these students is selected
at random, find the probability that
(a) the student took mathematics or history;
(b) the student did not take either of these subjects;
(c) the student took history but not mathematics.
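Answers (M = studied mathematics, H = studied history):
(a) P(M ∪ H) = P(M) + P(H) − P(M ∩ H) = 54/100 + 69/100 − 35/100 = 88/100 or 0.88
(b) P[(M ∪ H)’] = 1 − P(M ∪ H) = 12/100 or 0.12
(c) P(H ∩ M’) = P(H) − P(M ∩ H) = 69/100 − 35/100 = 34/100 or 0.34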
2. In a poker hand consisting of 5 cards,
find the probability of holding
(a) 3 aces;
(b) 4 hearts and 1 club.
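A possible approach to this exercise applies Rule 3 with combinations. The sketch assumes a standard 52-card deck (4 aces, 13 cards per suit) and reads part (a) as exactly 3 aces, so the remaining 2 cards come from the 48 non-aces. Names are illustrative.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)  # all equally likely 5-card hands

# (a) exactly 3 of the 4 aces, plus 2 of the 48 non-aces.
p_three_aces = Fraction(comb(4, 3) * comb(48, 2), total_hands)

# (b) 4 of the 13 hearts and 1 of the 13 clubs.
p_four_hearts_one_club = Fraction(comb(13, 4) * comb(13, 1), total_hands)

print(p_three_aces)            # 94/54145
print(p_four_hearts_one_club)  # 143/39984
```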
(+63)9178295308 @Margaret Peña scpena@carsu.edu.ph
In this Lesson, we take the next step toward inference. In Lesson 2, we introduced events and probability
properties. In this Lesson, we will learn how to numerically quantify the outcomes into a random variable. Then
we will use the random variable to create mathematical functions to find probabilities of the random variable.
Two of the most important distributions are the binomial distribution, for discrete random variables, and the
normal distribution, for continuous random variables. Both will be discussed in this lesson. We will also
talk about how to compute probabilities for random variables that follow these two distributions.
Learning Objectives
Random Variable
A random variable is a variable that takes on different values determined by chance. In other words, it
is a numerical quantity that varies at random.
Probability Functions
Transforming the outcomes to a random variable allows us to quantify the outcomes and determine certain
characteristics. If we have a random variable, we can find its probability function.
Probability Function
A probability function is a mathematical function that provides probabilities for the possible outcomes of the
random variable, X. It is typically denoted as f(x).
There are two classes of probability functions: Probability Mass Functions and Probability Density Functions.
The probability of a random variable being less than or equal to a given value is calculated using another
probability function called the cumulative distribution function.
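To make these definitions concrete, here is a small sketch of a probability mass function and its cumulative distribution function. It assumes the random variable X is the number of heads in two tosses of a fair coin, an example chosen for illustration rather than taken from this lesson.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses; each outcome is assumed equally likely.
outcomes = list(product("HT", repeat=2))

# Probability mass function f(x) = P(X = x), where X counts heads.
pmf = {}
for outcome in outcomes:
    x = outcome.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

def cdf(x):
    """Cumulative distribution function F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)

print(pmf[0], pmf[1], pmf[2])  # 1/4 1/2 1/4
print(cdf(1))                  # 3/4
```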
A binary variable is a variable that has two possible outcomes. For example, sex (male/female) and having
a tattoo (yes/no) are both binary categorical variables.
A random variable can be transformed into a binary variable by defining a “success” and a “failure”. For
example, consider rolling a fair six-sided die and recording the value of the face. The random variable, value of
the face, is not binary. If we are interested, however, in the event A= {3 is rolled}, then the “success” is rolling a
three. The failure would be any value not equal to three. Therefore, we can create a new variable with two
outcomes, namely A = {3} and B = {not a three} or {1, 2, 4, 5, 6}. This new variable is now a binary variable.
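The die example above can be sketched in a few lines. The code assumes a fair six-sided die and codes a "success" (rolling a three) as 1 and a "failure" as 0; the function name `is_success` is hypothetical.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]  # equally likely faces of a fair die

def is_success(face):
    """Return 1 if the face is a three (success), otherwise 0 (failure)."""
    return 1 if face == 3 else 0

# The new binary variable takes the value 1 or 0 for each face.
binary_values = [is_success(face) for face in faces]

p_success = Fraction(sum(binary_values), len(faces))
p_failure = 1 - p_success

print(binary_values)         # [0, 0, 1, 0, 0, 0]
print(p_success, p_failure)  # 1/6 5/6
```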
The standard normal is important because we can use it to find probabilities for a normal random
variable with any mean and any standard deviation.
But first, we need to explain Z-scores.
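As a preview, the sketch below shows why the standard normal is enough: standardizing a value with z = (x − μ)/σ gives the same probability from the standard normal as from the original normal distribution. The mean, standard deviation, and x value here are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical normal random variable X with mean 100 and SD 15.
mu, sigma = 100.0, 15.0
x = 115.0

# Z-score: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# P(X <= x) computed directly, and via the standard normal.
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z)
print(round(p_direct, 4), round(p_standard, 4))
```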
Notation for joint probability can take a few different forms. The following formula represents the probability
of the intersection of two events X and Y:
P(X ∩ Y), also written P(X and Y) or P(XY)
Although joint probability can help you determine the likelihood of two different events happening at the
same time, it does not indicate how the two events may influence each other.
Probability is a field closely related to statistics that deals with the likelihood of an event or phenomenon
occurring. It is quantified as a number between 0 and 1, where 0 indicates an impossible chance of occurrence
and 1 denotes the certain outcome of an event.
For example, the probability of drawing a red card from a deck of cards is 1/2 = 0.5. This means there is an
equal chance of drawing a red and black card since there are 26 of each in a deck. As such, there is a 50-50
probability of drawing a red card versus a black card.
Joint probability measures two events that happen at the same time. It can only be applied to situations where
more than one observation can occur at the same time. So the joint probability of picking a card that is both
red and 6 from a deck is P(6 ∩ red) = 2/52 = 1/26 since a deck of cards has two red sixes—the six of hearts and
the six of diamonds. Because the events red and 6 are independent, you can also use the following formula to
calculate the joint probability:
P(6 ∩ red) = P(6) × P(red) = 4/52 × 26/52 = 1/26
The symbol “∩” in a joint probability is referred to as an intersection. The probability of event X and event Y
happening is the same thing as the point where X and Y intersect. Therefore, the joint probability is also called
the intersection of two or more events. A Venn diagram is perhaps the best visual tool to explain an
intersection:
17 | P a g e
College of Engineering and Industrial Technology
Department of Agricultural and Biosystems Engineering
Engineering Data Analysis
From the Venn above, the point where both circles overlap is the intersection, which has two observations: the
six of hearts and the six of diamonds.
Joint probability should not be confused with conditional probability, which is the probability that one event
will happen given that another action or event happens. The conditional probability formula is as follows:
P(X│Y) = P(X ∩ Y) / P(Y)
This is to say that the chance of one event happening is conditional on another event happening. For example,
from a deck of cards, the probability that you get a six, given that you drew a red card is P(6│red) = 2/26 = 1/13,
since there are two sixes out of 26 red cards.
Joint probability only factors in the likelihood of both events occurring. Conditional probability can be used to
calculate joint probability, as seen in this formula:
P(X ∩ Y) = P(X│Y) × P(Y)
The probability that X and Y both occur is the probability of X occurring, given that Y occurs, multiplied by the
probability that Y occurs. Given this formula, the probability of drawing a 6 and a red at the same time will be as
follows:
P(6 ∩ red) = P(6│red) × P(red) = 1/13 × 26/52 = 1/26
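The card probabilities discussed here can be verified by enumerating a deck. This is a minimal sketch, assuming a standard 52-card deck with hearts and diamonds as the red suits; the helper names `prob`, `is_six`, and `is_red` are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colors = {"hearts": "red", "diamonds": "red",
               "clubs": "black", "spades": "black"}
deck = list(product(ranks, suit_colors))  # 52 (rank, suit) pairs

def prob(event, space):
    """Probability of an event over equally likely outcomes in space."""
    return Fraction(sum(1 for card in space if event(card)), len(space))

def is_six(card):
    return card[0] == "6"

def is_red(card):
    return suit_colors[card[1]] == "red"

# Joint probability: the card is both a six and red.
p_six_and_red = prob(lambda card: is_six(card) and is_red(card), deck)

# Conditional probability: restrict the sample space to red cards.
red_cards = [card for card in deck if is_red(card)]
p_six_given_red = prob(is_six, red_cards)
p_red = prob(is_red, deck)

print(p_six_and_red)            # 1/26
print(p_six_given_red)          # 1/13
print(p_six_given_red * p_red)  # 1/26
```

The last line reproduces the joint probability from the conditional probability, and multiplying `prob(is_six, deck)` by `p_red` gives the same value because the two events are independent.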
Let's highlight another example to show how joint probability works. This example uses dice and we want to find
out what the probability is that you'll roll a four on each die when you roll them. Remember, there are six sides
to each one.
To determine the joint probability, we first need the probability of each roll, which is 1/6 for a four on either
die. Because the two rolls are independent, we multiply the two probabilities:
1/6 × 1/6 = 1/36
This means that there is a 1/36 chance of rolling two fours using a pair of dice.
References:
https://www.investopedia.com/terms/j/jointprobability.asp
https://online.stat.psu.edu/stat500/lesson/3
https://www.knime.com/blog/continuous-probability-distribution
Walpole, R.E. et al. (2011). Probability & Statistics for Engineers & Scientists (9th ed.).
Objectives
Since we know the weights from the population, we can find the population mean.
$\mu = \frac{19 + 14 + 15 + 9 + 10 + 17}{6} = 14$
To demonstrate the sampling distribution, let’s start with obtaining all of the possible samples of size 𝑛 = 2
from the population, sampling without replacement. The table below shows all the possible samples, the weights for
the chosen pumpkins, the sample mean and the probability of obtaining each sample. Since we are drawing at
random, each sample will have the same probability of being chosen.
We can combine all of the values and create a table of the possible values and their respective probabilities.
The table is the probability table for the sample mean and it is the sampling distribution of the sample mean weights
of the pumpkins when the sample size is 2. It is also worth noting that the sum of all the probabilities equals 1. It might
be helpful to graph these values.
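The enumeration described above can be sketched as follows, using the six pumpkin weights from the population mean calculation. The code lists every sample of size 2 drawn without replacement, treats each sample as equally likely, and builds the probability table of the sample mean. Variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

weights = [19, 14, 15, 9, 10, 17]  # pumpkin weights (the whole population)
population_mean = Fraction(sum(weights), len(weights))

# All samples of size 2 without replacement; each is equally likely.
samples = list(combinations(weights, 2))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Sampling distribution: each possible sample mean and its probability.
counts = Counter(sample_means)
sampling_distribution = {
    m: Fraction(c, len(samples)) for m, c in sorted(counts.items())
}

# Expected value of the sample mean.
mean_of_sample_means = sum(m * p for m, p in sampling_distribution.items())

print(len(samples))                             # 15
print(sampling_distribution[population_mean])   # 1/15
print(mean_of_sample_means)                     # 14
```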
One can see that the chance that the sample mean is exactly the population mean is only 1 in 15, very small.
(In some other examples, it may happen that the sample mean can never be the same value as the population mean.)
When using the sample mean to estimate the population mean, some possible error will be involved since the sample
mean is random.
Now that we have the sampling distribution of the sample mean, we can calculate the mean of all the sample
means. In other words, we can find the mean (or expected value) of all the possible x̄’s.
The mean of the sample means is:
$\mu_{\bar{x}} = \sum \bar{x}\,P(\bar{x}) = 14$
Even though each sample may give you an answer involving some error, the expected value is right at the target:
exactly the population mean. In other words, if one does the experiment over and over again, the overall average of
the sample mean is exactly the population mean.
Now, let's do the same thing as above but with sample size 𝑛 = 5
The following dot plots show the distribution of the sample means corresponding to sample sizes of 𝑛 = 2
and of 𝑛 = 5.
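In place of the dot plots, the spread of the possible sample means for the two sample sizes can be compared numerically. This sketch reuses the six pumpkin weights and samples without replacement; the function name `sample_means` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

weights = [19, 14, 15, 9, 10, 17]  # pumpkin weights, population mean 14

def sample_means(population, n):
    """Sorted means of every size-n sample drawn without replacement."""
    return sorted(Fraction(sum(s), n) for s in combinations(population, n))

for n in (2, 5):
    means = sample_means(weights, n)
    print(n, len(means), float(min(means)), float(max(means)))
# n = 2: 15 samples, means range from 9.5 to 18.0
# n = 5: 6 samples, means range from 13.0 to 15.0
```

With the larger sample size, every possible sample mean lies within 1 unit of the population mean of 14.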
Sampling Error
Sampling error is the error that results from using a sample characteristic to estimate a population characteristic.
Sample size and sampling error: As the dot plots above show, the possible sample means cluster more closely around
the population mean as the sample size increases. Thus, the possible sampling error decreases as sample size increases.
What happens when the population is not small, unlike the pumpkin example?
An instructor of an introduction to statistics course has 200 students. The scores out of 100 points are shown in the
histogram.
The population mean is 𝜇 = 71.18 and the population standard deviation is 𝜎 = 10.73
What happens when the sample comes from a population that is not normally distributed? This is where the Central
Limit Theorem comes in.
The Central Limit Theorem applies to a sample mean from any distribution. We could have a left-skewed or a
right-skewed distribution. As long as the sample size is large, the distribution of the sample means will follow an
approximate Normal distribution. For the purposes of this course, a sample size of 𝑛 > 30 is considered a large sample.
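A small simulation can illustrate the Central Limit Theorem. This is only a sketch under stated assumptions: the population is taken to be a right-skewed exponential distribution with mean 1 and standard deviation 1 (not a population from this lesson), the sample size is n = 40, and the seed is fixed so the run is repeatable.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)  # fixed seed for a repeatable run

n = 40              # sample size, larger than 30
num_samples = 5000  # number of repeated samples

# Assumed population: exponential with rate 1 (mean 1, SD 1), right-skewed.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

center = mean(sample_means)   # should be close to the population mean, 1
spread = stdev(sample_means)  # should be close to sigma / sqrt(n)
expected_spread = 1.0 / n ** 0.5

print(round(center, 3), round(spread, 3), round(expected_spread, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.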
The following example will illustrate how to find the sampling distribution for an example where the population is
small.
In a particular family, there are five children. Their names are Alex (A), Betina (B), Carly (C), Debbie (D), and
Edward (E). The table below shows the child’s name and their favorite color.
We are interested in the proportion of children in the family who prefer the color blue, and from the table, we can see
that 𝑝 = .40 of the children prefer blue.
Similar to the pumpkin example earlier in the lesson, let's say we didn't know the proportion of children who like blue
as their favorite color. We'll use resampling methods to estimate the proportion. Let’s take repeated samples of size 𝑛 = 2,
taken without replacement. Here are all the possible samples of size 𝑛 = 2, the proportion of children in each sample who
like blue, and the probability of each sample.
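The sampling distribution of the sample proportion can be sketched by enumeration. Since 𝑝 = .40 means 2 of the 5 children prefer blue, the code marks two children as preferring blue; which two is a hypothetical assignment made here for illustration, and the resulting distribution does not depend on that choice.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical assignment: exactly 2 of the 5 children prefer blue (p = 0.40).
prefers_blue = {"A": True, "B": False, "C": True, "D": False, "E": False}

# All samples of size 2 without replacement; each is equally likely.
samples = list(combinations(prefers_blue, 2))

# Sample proportion of children who prefer blue in each sample.
p_hats = [Fraction(sum(prefers_blue[c] for c in s), len(s)) for s in samples]

counts = Counter(p_hats)
sampling_distribution = {
    p_hat: Fraction(c, len(samples)) for p_hat, c in sorted(counts.items())
}
mean_of_p_hats = sum(p_hat * prob for p_hat, prob in sampling_distribution.items())

for p_hat, prob in sampling_distribution.items():
    print(p_hat, prob)   # 0 3/10, then 1/2 3/5, then 1 1/10
print(mean_of_p_hats)    # 2/5
```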
For the sampling distribution of the sample mean, we learned how to apply the Central Limit Theorem when the
underlying distribution is not normal. In this section, we will present how we can apply the Central Limit Theorem to
find the sampling distribution of the sample proportion. Let’s start by defining a Bernoulli random variable, 𝑌.
Try it!
If a random sample of size 75 was surveyed, what is the probability that the sample proportion of
Americans with an iPhone would be more than 50%?
In the previous lesson, the sampling distributions for the sample statistics assumed we knew the population
parameters (fantasy land). In real life, we do not know these parameters (or we would not need statistics!). In this
lesson, we switch from "fantasy land" to real life. We know what to do when the parameters are known; now let's see
how we can use that information when they are unknown.
Objectives
Introduction to Inferences
The real power of statistics comes from applying the concepts of probability to situations where you have
data but not necessarily the whole population. The results, called statistical inference, give you probability statements
about the population of interest based on that set of data.
1. Estimation
Use information from the sample to estimate (or predict) the parameter of interest.
For instance, using the result of a poll about the president's current approval rating to estimate (or predict) his or
her true current approval rating nationwide.
2. Statistical Tests
Use information from the sample to determine whether a certain statement about the parameter of interest is
true. Statistical tests are also referred to as hypothesis tests.
For instance, suppose a news station claims that the President’s current approval rating is more than 75%. We
want to determine whether that statement is supported by the poll data.
Estimation: Two common estimation methods are point and interval estimates.
Point Estimates
An estimate for a parameter that is one numerical value. An example of a point estimate is the sample mean
or the sample proportion.
Interval Estimates
Interval estimates give an interval as the estimate for a parameter. This is a new concept which is the focus
of this lesson. Such intervals are built around point estimates which is why understanding point estimates is important
to understanding interval estimates. In this course, the interval estimates we find are referred to as confidence
intervals.
Confidence Interval
An interval of values computed from sample data that is likely to cover the true parameter of interest. There
are many estimators for population parameters. For example, if we want to know the "center" of a distribution, why
use the mean? Could we use the median? How about using the middle value, i.e. (max+min)/2? We choose particular
estimators for various reasons with information based on their sampling distributions. Here are some properties of
"good" estimators.
In putting the two properties above together, the center of our interval should be the point estimate for the
parameter of interest. With the estimated standard error of the point estimate, we can include a measure of
confidence to our estimate by forming a margin of error.
You may have seen this whenever you have heard or read a sample survey result (e.g. a survey of the
current approval rating of the President, or of the attitude citizens have toward some new policy). In such surveys,
you may hear that "44% of those surveyed approved of the President's reaction" (this is the sample proportion), and
that "the survey had a 3.5% margin of error, or ± 3.5%." This latter number is the margin of error.
With the point estimate and the margin of error, we have an interval in which the group conducting the
survey is confident the parameter value falls (i.e. the proportion of U.S. citizens who approve of the President's
reaction). In this example, that interval would be from 40.5% to 47.5%.
The interpretation of a confidence interval has the basic template: "We are 'some level of percent
confident' that the 'population parameter of interest' is from 'lower bound to upper bound'." The phrases in single
quotes are replaced with the specific language of the problem. We will discuss more about the interpretation of a
confidence interval after we provide a few more examples.
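The survey interval above is simply the point estimate plus or minus the margin of error, which the sketch below reproduces. The second function shows one common way a margin of error for a proportion is obtained, the large-sample formula z × sqrt(p̂(1 − p̂)/n). The multiplier z = 1.96 (95% confidence) and the sample size n = 773 are assumptions made for illustration, not values given in the survey example, and both function names are hypothetical.

```python
from math import sqrt

def interval_from_margin(point_estimate, margin_of_error):
    """Return (lower, upper) as point estimate minus/plus the margin."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error

def margin_of_error(p_hat, n, z=1.96):
    """Large-sample margin of error for a proportion (z = 1.96 assumes 95%)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Survey example from the text: 44% approval, 3.5% margin of error.
lower, upper = interval_from_margin(44.0, 3.5)
print(lower, upper)  # 40.5 47.5

# Assumed sample size of 773 gives a margin close to 3.5 percentage points.
print(round(100 * margin_of_error(0.44, 773), 2))
```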
To construct a confidence interval for a population mean, we're going to apply the same three steps as with the
population proportion, but first, let's look at the two possible cases.
References:
https://www.investopedia.com/terms/j/jointprobability.asp
https://online.stat.psu.edu/stat500/lesson/3
https://www.knime.com/blog/continuous-probability-distribution
Walpole, R.E. et al. (2011). Probability & Statistics for Engineers & Scientists (9th ed.).