Module 1 EE Data Analysis
Module no. 1
Obtaining Data
Introduction:
Historically, measurements were obtained from a sample of people and generalized to a
population, and the terminology has remained. Sometimes the data are all of the
observations in the population. This results in a census. However, in the engineering
environment, the data are almost always a sample that has been selected from the
population. Three basic methods of collecting data are the retrospective study, the observational study, and the designed experiment.
An effective data-collection procedure can greatly simplify the analysis and lead to
improved understanding of the population or process that is being studied. We now
consider some examples of these data-collection methods.
Objectives:
At the end of this topic, the students should be able to
1. Discuss the different methods that engineers use to collect data;
2. Describe the different methods of sampling in planning and conducting surveys;
3. Identify the advantages that designed experiments have in comparison to other
methods of collecting engineering data.
Learning Activities:
1.1 Methods of Data Collection
1.1.1 Retrospective Study
Montgomery, Peck, and Vining (2012) describe an acetone-butyl alcohol distillation column (a distillation column separates a liquid mixture into its component parts, or fractions, based on differences in volatility) for which the concentration of acetone in the distillate (the output product stream) is an important variable. Factors that may affect this concentration are the reboil temperature, the condensate temperature, and the reflux rate. Production personnel obtain and archive the following records:
• The concentration of acetone in an hourly test sample of output product
• The reboil temperature log, which is a record of the reboil temperature over time
• The condenser temperature controller log
• The nominal reflux rate each hour
The reflux rate should be held constant for this process. Consequently, production
personnel change this very infrequently.
A retrospective study would use either all or a sample of the historical process data
archived over some period of time. The study objective might be to discover how the two temperatures and the reflux rate affect the acetone concentration in the output product stream. However, this type of study presents some problems:
1. We may not be able to see the relationship between the reflux rate and acetone
concentration because the reflux rate did not change much over the historical
period.
2. The archived data on the two temperatures (which are recorded almost
continuously) do not correspond perfectly to the acetone concentration
measurements (which are made hourly). It may not be obvious how to construct
an approximate correspondence.
3. Production maintains the two temperatures as closely as possible to desired
targets or set points. Because the temperatures change so little, it may be
difficult to assess their real impact on acetone concentration.
4. In the narrow ranges within which they do vary, the condensate temperature
tends to increase with the reboil temperature. Consequently, the effects of these
two process variables on acetone concentration may be difficult to separate.
As you can see, a retrospective study may involve a significant amount of data, but
those data may contain relatively little useful information about the problem.
Furthermore, some of the relevant data may be missing, there may be transcription or
recording errors resulting in outliers (or unusual values), or data on other important
factors may not have been collected and archived.
1.1.2 Observational Study
In an observational study, the engineer observes the process or population, disturbing it as little as possible, and records the quantities of interest. Because these studies are usually conducted for a relatively short time period, sometimes variables that are not
routinely measured can be included. In the distillation column, the engineer would
design a form to record the two temperatures and the reflux rate when acetone
concentration measurements are made. It may even be possible to measure the input
feed stream concentrations so that the impact of this factor could be studied.
Generally, an observational study tends to solve problems 1 and 2 and goes a long
way toward obtaining accurate and reliable data. However, observational studies may
not help resolve problems 3 and 4.
Example:
Suppose that an engineer is designing a nylon connector to be used in an automotive
engine application. The engineer is considering establishing the design specification on
wall thickness at 3/32 inch but is somewhat uncertain about the effect of this decision on
the connector pull-off force. If the pull-off force is too low, the connector may fail when it
is installed in an engine. Eight prototype units are produced and their pull-off forces
measured, resulting in the following data (in pounds): 12.6, 12.9, 13.4, 12.3, 13.6, 13.5,
12.6, 13.1. As we anticipated, not all of the prototypes have the same pull-off force. We
say that there is variability in the pull-off force measurements.
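To see this variability numerically, here is a minimal Python sketch (standard library only) that computes the sample mean and sample standard deviation of the eight measurements from the text:

```python
# Summarize the eight measured pull-off forces (in pounds) from the
# nylon-connector prototypes to quantify their variability.
import statistics

pull_off = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

mean = statistics.mean(pull_off)     # sample mean
stdev = statistics.stdev(pull_off)   # sample standard deviation (n - 1 divisor)

print(f"mean = {mean:.2f} lb, std dev = {stdev:.2f} lb")
```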
1.1.3 Designed Experiments
Much of what we know in the engineering and physical-chemical sciences is developed through testing or experimentation. Designed experiments play a very important role in engineering design and development and in the improvement of manufacturing processes.
Population
A population is the entire group of individuals, scores, measurements, etc. about which
we want information.
Sample
The part of the population from which we actually collect information; it is used to draw conclusions about the whole population.
Random Selection
A process of gathering a representative sample for a particular study. Random means that the individuals are chosen by chance; each has the same probability of being chosen.
Often, physical laws (such as Ohm’s law and the ideal gas law) are applied to help
design products and processes. We are familiar with this reasoning from general laws to
specific cases. But it is also important to reason from a specific set of measurements to
more general cases to answer the previous questions. This reasoning comes from a
sample (such as the eight connectors) to a population (such as the connectors that will
be in the products that are sold to customers). The reasoning is referred to as statistical
inference. See Figure 3.1. Clearly, reasoning based on measurements from some objects to measurements on all objects can result in errors (called sampling errors).
However, if the sample is selected properly, these risks can be quantified and an
appropriate sample size can be determined.
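The following Python sketch illustrates this idea on hypothetical data: it draws a simple random sample from a synthetic population of pull-off forces and compares the sample mean with the population mean; the difference is the sampling error. All values are simulated, not taken from the text.

```python
# Estimate a population mean from a simple random sample and observe
# the resulting sampling error. The population below is hypothetical.
import random
import statistics

random.seed(1)
population = [random.gauss(13.0, 0.5) for _ in range(10_000)]  # synthetic pull-off forces

sample = random.sample(population, k=8)   # simple random sample: each unit equally likely

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
print(f"population mean = {pop_mean:.3f}")
print(f"sample mean     = {sample_mean:.3f}")
print(f"sampling error  = {sample_mean - pop_mean:+.3f}")
```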
Figure 3.1: Statistical inference (reasoning from a sample to a population).
The objectives of an experiment may include determining which factors most influence the response and finding the settings of those factors that produce a desirable response. Three strategies of experimentation are used in practice:
1. Best-guess approach
• select a combination of factor levels based on experience, run a test, and adjust
2. One-factor-at-a-time (OFAT)
• used extensively in practice
• disadvantage: fails to consider interaction between the factors and is less efficient
3. Factorial experiment
• all possible combinations of the factor levels are tested
Example: In a golf experiment, a factorial design would test all possible combinations of the factor levels.
Consider again the problem involving the choice of wall thickness for the nylon
connector. This is a simple illustration of a designed experiment. The engineer chose
two wall thicknesses for the connector and performed a series of tests to obtain pull-off
force measurements at each wall thickness.
Designed experiments offer a very powerful approach to studying complex systems,
such as the distillation column (section 1.1). This process has three factors—the two
temperatures and the reflux rate—and we want to investigate the effect of these three
factors on output acetone concentration. A good experimental design for this problem
must ensure that we can separate the effects of all three factors on the acetone
concentration. The specified values of the three factors used in the experiment are
called factor levels. Typically, we use a small number of levels such as two or three for
each factor. For the distillation column problem, suppose that we use two levels, "high" and "low" (denoted +1 and −1, respectively), for each of the three factors. A very
reasonable experiment design strategy uses every possible combination of the factor
levels to form a basic experiment with eight different settings for the process. See Table
1.1 for this experimental design.
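A short Python sketch can enumerate the eight settings of this design (the same combinations Table 1.1 lists):

```python
# Enumerate the 2^3 = 8 factor-level combinations of the basic factorial
# design for the distillation column: reboil temperature, condensate
# temperature, and reflux rate, each at levels -1 (low) and +1 (high).
from itertools import product

factors = ["reboil_temp", "condensate_temp", "reflux_rate"]

for run, levels in enumerate(product([-1, +1], repeat=3), start=1):
    settings = dict(zip(factors, levels))
    print(run, settings)
```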
Figure 3.4 illustrates that this design forms a cube in terms of these high and low
levels. With each setting of the process conditions, we allow the column to reach
equilibrium, take a sample of the product stream, and determine the acetone
concentration. We then can draw specific inferences about the effect of these factors.
Such an approach allows us to proactively study a population or process.
An important advantage of factorial experiments is that they allow one to detect an
interaction between factors. Consider only the two temperature factors in the distillation
experiment. Suppose that the response concentration is poor when the reboil
temperature is low, regardless of the condensate temperature. That is, the condensate
temperature has no effect when the reboil temperature is low. However, when the reboil
temperature is high, a high condensate temperature generates a good response, but a
low condensate temperature generates a poor response. That is, the condensate
temperature changes the response when the reboil temperature is high. The effect of
condensate temperature depends on the setting of the reboil temperature, and these
two factors are said to interact in this case.
If the four combinations of high and low reboil and condensate temperatures were not tested, such an interaction would not be detected.
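The following sketch makes this concrete with hypothetical acetone concentrations at the four reboil/condensate combinations; the numbers are invented for illustration and chosen so that the condensate effect appears only at the high reboil level:

```python
# Hypothetical acetone concentrations (%) at the four combinations of
# reboil (R) and condensate (C) temperature levels; illustrative values,
# not measured data.
y = {(-1, -1): 80.0,   # R low,  C low  -> poor
     (-1, +1): 80.5,   # R low,  C high -> still poor (C has ~no effect)
     (+1, -1): 81.0,   # R high, C low  -> poor
     (+1, +1): 91.0}   # R high, C high -> good

# Effect of condensate temperature at each reboil level:
effect_C_at_R_low = y[(-1, +1)] - y[(-1, -1)]    # ~0: no effect
effect_C_at_R_high = y[(+1, +1)] - y[(+1, -1)]   # large: strong effect

# A difference between these two conditional effects signals an interaction.
interaction = (effect_C_at_R_high - effect_C_at_R_low) / 2
print(effect_C_at_R_low, effect_C_at_R_high, interaction)
```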
We can easily extend the factorial strategy to more factors. Suppose that the
engineer wants to consider a fourth factor, type of distillation column. There are two
types: the standard one and a newer design. Figure 3.5 illustrates how all four factors—
reboil temperature, condensate temperature, reflux rate, and column design—could be
investigated in a factorial design. Because all four factors are still at two levels, the
experimental design can still be represented geometrically as a cube (actually, it’s a
hypercube). Notice that as in any factorial design, all possible combinations of the four
factors are tested. The experiment requires 16 trials.
Generally, if there are k factors and each has two levels, a factorial experimental design will require 2^k runs. For example, with k = 4, the design in Figure 3.5 requires 2^4 =
16 tests. Clearly, as the number of factors increases, the number of trials required in a
factorial experiment increases rapidly. This quickly becomes infeasible from the
viewpoint of time and other resources. Fortunately, with four to five or more factors, it is
usually unnecessary to test all possible combinations of factor levels. A fractional
factorial experiment is a variation of the basic factorial arrangement in which only a
subset of the factor combinations is actually tested. Figure 3.6 shows a fractional
factorial experimental design for the distillation column. The circled test combinations in
this figure are the only test combinations that need to be run. This experimental design
requires only 8 runs instead of the original 16; consequently it would be called a one-
half fraction.
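As an illustration, the sketch below generates one standard half fraction by aliasing the fourth factor with the product of the other three (the defining relation D = ABC); whether this matches the particular runs circled in Figure 3.6 is an assumption, since D = ABC is simply the usual choice.

```python
# Generate a 2^(4-1) half fraction (8 of the 16 runs) for four two-level
# factors using the defining relation D = A*B*C.
from itertools import product

runs = []
for a, b, c in product([-1, +1], repeat=3):
    d = a * b * c               # fourth factor's level is aliased with ABC
    runs.append((a, b, c, d))

for run in runs:
    print(run)                  # eight runs instead of the full sixteen
```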
This is an excellent experimental design in which to study all four factors. It will provide
good information about the individual effects of the four factors and some information
about how these factors interact.
Factorial and fractional factorial experiments are used extensively by engineers and
scientists in industrial research and development, where new technology, products, and
processes are designed and developed and where existing products and processes are
improved.
Basic Principles
1. Randomization: running the trials in random order. Both the allocation of the experimental material and the order in which the individual runs of the experiment are performed are randomly determined.
2. Replication: an independent repeat run of each factor combination.
• allows us to obtain an estimate of the experimental error
• permits a more precise estimate of the true mean response for one of the factor levels
• reflects sources of variability both between runs and within runs
• note the distinction between replication and repeated measurements: repeated measurements on the same run capture only within-run variability
3. Blocking: a design technique used to reduce the variability transmitted from nuisance factors (factors that may influence the response but are not of direct interest).
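Here is a minimal sketch of the first two principles applied to the distillation design: replicate each of the eight factor combinations and run the resulting sixteen trials in random order. The replicate count and seed are arbitrary choices for illustration.

```python
# Build a randomized, replicated run sheet for the 2^3 distillation design.
import random
from itertools import product

random.seed(42)                                # fixed seed for a reproducible plan
design = list(product([-1, +1], repeat=3))     # the 8 factor-level combinations

n_replicates = 2
run_sheet = design * n_replicates              # independent repeat of every combination
random.shuffle(run_sheet)                      # randomize the run order

for i, s in enumerate(run_sheet, start=1):
    print(f"run {i:2d}: reboil={s[0]:+d}, condensate={s[1]:+d}, reflux={s[2]:+d}")
```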
Mechanistic Model
Mechanistic model is built from our underlying knowledge of the basic physical
mechanism.
As a simple example, suppose that we are measuring the flow of current in a thin
copper wire. Our model for this phenomenon might be Ohm’s law:

I = E/R (current = voltage/resistance)

Repeated observations of current will not fall exactly on this line, so a more realistic model adds a term ε for measurement error and other sources of variability:

I = E/R + ε
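As a small illustration, here is a sketch simulating observations from this mechanistic model; the voltage, resistance, and error standard deviation are arbitrary illustrative values:

```python
# Simulate the mechanistic model I = E/R + eps: the deterministic part comes
# from Ohm's law, and eps represents measurement error and other unmodeled
# sources of variability.
import random

random.seed(0)
E = 5.0          # applied voltage (volts), illustrative
R = 2.0          # wire resistance (ohms), illustrative

for _ in range(5):
    eps = random.gauss(0.0, 0.02)    # small random disturbance
    I = E / R + eps                  # observed current (amperes)
    print(f"I = {I:.3f} A")
```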
Empirical Model
Empirical model uses our engineering and scientific knowledge of a phenomenon, but it
is not directly developed from our theoretical or first-principles understanding of the
underlying mechanism.
For instance, suppose that we are interested in the number average molecular weight (Mn) of a polymer. Now we know that Mn is related to the viscosity of the material (V), and it also depends on the amount of catalyst (C) and the temperature (T) in the polymerization reactor when the material is manufactured. The relationship between Mn and these variables is

Mn = f(V, C, T),

say, where the form of the function f is unknown. Perhaps a working model could be developed from a first-order Taylor series expansion, which would produce a model of the form

Mn = β0 + β1V + β2C + β3T,

where the β's are unknown parameters. Now just as in Ohm’s law, this model will not exactly describe the phenomenon, so we should account for the other sources of variability that may affect the molecular weight by adding another term to the model; therefore,

Mn = β0 + β1V + β2C + β3T + ε

is the model that we will use to relate molecular weight to the other three variables.
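To show how the unknown β's could be estimated, here is a sketch that fits this empirical model by least squares to simulated data; the coefficient values, variable ranges, and noise level are all hypothetical, chosen only to demonstrate the mechanics.

```python
# Fit the empirical model Mn = b0 + b1*V + b2*C + b3*T by least squares
# on simulated (hypothetical) data.
import numpy as np

rng = np.random.default_rng(0)
n = 20
V = rng.uniform(1.0, 3.0, n)          # viscosity
C = rng.uniform(0.5, 1.5, n)          # amount of catalyst
T = rng.uniform(150.0, 200.0, n)      # reactor temperature

# Hypothetical "true" coefficients plus random error eps:
Mn = 500 + 120 * V + 80 * C + 2.5 * T + rng.normal(0.0, 10.0, n)

X = np.column_stack([np.ones(n), V, C, T])        # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, Mn, rcond=None)     # least-squares estimates

print("estimated betas:", np.round(beta, 2))
```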
Self-Evaluation:
1. Which of the three basic methods of data collection is the least useful? Why?
2. Which of the two basic sampling methods is the most useful in conducting
surveys? Why?
3. What advantages does the designed-experiment method of collecting data have compared with the other two methods?
Review of Concepts:
1. The three methods of data collection are
a. retrospective study
b. observational study
c. designed experiment
2. A population is the entire group of individuals, scores, measurements, etc. about
which we want information.
3. Sample is that part of the population from which we actually collect information
and is used to draw conclusions about the whole.
4. The four types of probability sampling method:
a. Simple Random Sampling
b. Systematic Sampling
c. Stratified Random Sampling
d. Cluster Sampling
5. The four types of non-probability sampling method:
a. Convenience sampling
b. Voluntary response sampling
c. Purposive sampling
d. Snowball sampling
6. The three strategies of experimentation are
a. Best-guess approach
b. One-factor-at-a-time (OFAT)
c. Factorial experiment
7. The basic principles of conducting experiments (design of experiments) are
Randomization, replication and blocking.
8. Mechanistic model is built from our underlying knowledge of the basic physical
mechanism.
9. Empirical model uses our engineering and scientific knowledge of a
phenomenon, but it is not directly developed from our theoretical or first-
principles understanding of the underlying mechanism.
References:
Montgomery, Douglas C. & Runger, George C. Applied Statistics and Probability for Engineers, 7th ed. John Wiley & Sons, 2018.
Ahn, Hongshik. Probability and Statistics for Sciences & Engineering with Examples in R, 2nd ed. Cognella, Inc., 2018.
Devore, Jay L. Probability and Statistics for Engineering and the Sciences, 9th ed. Cengage Learning, 2016.