
EDA-Chapter-1 Batangas State University


Engineering Data Analysis
MATH 403
CHAPTER 1
STATISTICS
Statistics may be defined as the science that deals with the collection,
organization, presentation, analysis, and interpretation of data in order to
be able to draw judgments or conclusions that help in the decision-making
process. The two parts of this definition correspond to the two main
divisions of Statistics: Descriptive Statistics and Inferential Statistics.
TWO MAIN DIVISIONS OF STATISTICS

Descriptive Statistics, referred to in the first part of the definition,
deals with the procedures that organize, summarize, and describe
quantitative data. It seeks merely to describe data.

Inferential Statistics, implied in the second part of the definition, deals
with making a judgment or a conclusion about a population based on the
findings from a sample that is taken from the population.
Statistical Terms
Before proceeding to the discussion of the different methods of obtaining data,
let us first define some statistical terms:
Population or Universe
refers to the totality of objects, persons, places, things used in a particular
study. All members of a particular group of objects (items) or people
(individual), etc. which are subjects or respondents of a study.
Sample
is any subset of a population, or a few members of a population.

Data
are facts, figures and information collected on some characteristics of a
population or sample. These can be classified as qualitative or quantitative
data.
Ungrouped
(or raw) data are data which are not organized in any specific way. They are
simply the collection of data as they are gathered.
Grouped Data
are raw data organized into groups or categories with corresponding
frequencies. Organized in this manner, the data is referred to as frequency
distribution.
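To make the distinction concrete, the sketch below (with hypothetical quiz scores) turns ungrouped raw data into a frequency distribution by counting how often each value occurs:

```python
from collections import Counter

# Hypothetical raw (ungrouped) data: quiz scores from a sample of 12 students
raw_scores = [7, 8, 7, 9, 6, 8, 8, 7, 10, 6, 9, 8]

# Grouping the raw data into categories with corresponding frequencies
# yields a frequency distribution.
frequency_distribution = Counter(raw_scores)

for score in sorted(frequency_distribution):
    print(score, frequency_distribution[score])
```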
Parameter
is a descriptive measure of a characteristic of a population.
Statistic
is a descriptive measure of a characteristic of a sample.
Constant
is a characteristic or property of a population or sample which is
common to all members of the group.

Variable
is a measure, characteristic, or property of a population or sample that
may take a number of different values. It differentiates a particular
member from the rest of the group and is the characteristic or property
that is measured, controlled, or manipulated in research. Variables differ
in many respects, most notably in the role they are given in the research
and in the type of measures that can be applied to them.
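The parameter/statistic distinction above can be illustrated with a small sketch using hypothetical data: the mean of the whole population is a parameter, while the mean of a sample drawn from it is a statistic that estimates it.

```python
import random
import statistics

random.seed(0)  # for reproducibility

# Hypothetical population: exam scores of all 500 students in a course
population = [random.gauss(75, 10) for _ in range(500)]

# Parameter: a descriptive measure computed from the entire population
mu = statistics.mean(population)

# Statistic: the same measure computed from a sample of the population
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

The sample mean varies from sample to sample, while the population mean is a fixed (usually unknown) quantity; inferential statistics is about using the former to draw conclusions about the latter.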
1.1 Methods of Data Collection
Collection of data is the first step in conducting a statistical inquiry. It
simply refers to data gathering, a systematic method of collecting and
measuring data from different sources of information in order to provide
answers to relevant questions. This involves acquiring information from
published literature, surveys through questionnaires or interviews,
experimentation, documents and records, tests or examinations, and other
data-gathering instruments.
The person who conducts the inquiry is an investigator.
The one who helps in collecting information is an enumerator, and
information is collected from a respondent.
Data can be primary or secondary.
According to Wessel, “Data collected in the process of
investigation are known as primary data.” These are collected
for the investigator’s use from the primary source.
Secondary data, on the other hand, is collected by some
other organization for their own use but the investigator also
gets it for his use.
According to M.M. Blair, “Secondary data are those already
in existence for some other purpose than answering the
question in hand.”
In the field of engineering, the three basic methods of collecting data are
through retrospective study, observational study and through a designed
experiment.
A retrospective study uses a population or sample of historical data that
has been archived over some period of time. It may involve a significant
amount of data, but those data may contain relatively little useful
information about the problem: some of the relevant data may be missing,
recording or transcription errors may be present, or other important data
may not have been gathered and archived. As a result, statistical analysis
of historical data can identify interesting phenomena, but obtaining solid
and reliable explanations for them is difficult.
In an observational study, by contrast, the process or population is observed
and disturbed as little as possible, and the quantities of interest are
recorded.
In a designed experiment, deliberate or purposeful changes are made to the
controllable variables of the system or process. The resulting system
output data are observed, and an inference or decision is made about which
variables are responsible for the observed changes in output performance.
Experiments designed with basic principles such as randomization are needed
to establish cause-and-effect relationships.
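Randomization in a designed experiment can be as simple as shuffling the experimental units before assigning them to treatment levels. The sketch below is a minimal, hypothetical example: twelve test specimens are randomly assigned to three levels of one controllable variable.

```python
import random

random.seed(42)  # reproducible assignment

# Hypothetical designed experiment: randomly assign 12 test specimens
# to three levels of a controllable variable (e.g., cure temperature).
specimens = list(range(1, 13))
treatments = ["low", "medium", "high"]

random.shuffle(specimens)  # randomization guards against systematic bias

# Deal the shuffled specimens evenly among the three treatments
assignment = {t: specimens[i::3] for i, t in enumerate(treatments)}
print(assignment)
```

Because the assignment is random, any lurking variable (specimen age, machine wear, and so on) is spread across the treatments rather than confounded with one of them.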
Much of what we know in the engineering and physical-chemical sciences
is developed through testing or experimentation.
In engineering, there are problem areas with no scientific or
engineering theory that is directly or completely applicable, so
experimentation and observation of the resulting data is the only way to
solve them. At other times there is a good underlying scientific theory to
explain the phenomena of interest, yet tests or experiments are almost always
necessary to confirm the applicability and validity of the
theory in a specific situation or environment. Designed experiments are
very important in engineering design and development and in the
improvement of manufacturing processes in which statistical thinking and
statistical methods play an important role in planning, conducting, and
analyzing the data. (Montgomery, et al., 2018)
1.2 Planning and Conducting Surveys
A survey is a method of asking respondents some well-constructed
questions. It is an efficient way of collecting information, is easy to
administer, and allows a wide variety of information to be collected. The
researcher can stay focused and stick to the questions that interest
him and are necessary to his statistical inquiry or study.

However, surveys depend on the respondents' honesty, motivation,
memory, and ability to respond. Sometimes answers may lead to
vague data. Surveys can be done through face-to-face interviews or
self-administered through the use of questionnaires.
ADVANTAGES
The advantages of face-to-face interviews include fewer misunderstood
questions, fewer incomplete responses, higher response rates, and greater
control over the environment in which the survey is administered; also,
the researcher can collect additional information if any of the
respondents' answers need clarifying.

DISADVANTAGES
The disadvantages of face-to-face interviews are that they can be
expensive and time-consuming and may require a large staff of trained
interviewers. In addition, the responses can be biased by the appearance
or attitude of the interviewer.
Self-administered surveys are less expensive than interviews. They
can be administered in large numbers, do not require many
interviewers, and put less pressure on respondents. However, in
self-administered surveys, respondents are more likely to stop
participating midway through the survey and cannot ask for
clarification of the questions. Response rates are also lower than in
personal interviews.
When designing a survey, the following steps are useful:
1. Determine the objectives of your survey: What questions do you want
to answer?
2. Identify the target population sample: Whom will you interview? Who
will be the respondents? What sampling method will you use?
3. Choose an interviewing method: face-to-face interview, phone
interview, self-administered paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase
them.
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
In choosing the respondents, sampling techniques are necessary.
Sampling is the process of selecting units (e.g., people,
organizations) from a population of interest. The sample must be
representative of the target population. The target population is the
entire group a researcher is interested in: the group about which the
researcher wishes to draw conclusions.

There are two ways of selecting a sample: non-probability sampling and
probability sampling.
Non-Probability Sampling

Non-probability sampling is also called judgment or subjective sampling. This
method is convenient and economical, but the inferences made from the
findings are not so reliable. The most common types of non-probability sampling
are convenience sampling, purposive sampling, and quota sampling.

In convenience sampling, the researcher obtains information from whichever
respondents are easiest to reach, which favors the researcher but can bias
the results.

In purposive sampling, the selection of respondents is predetermined
according to the characteristic of interest chosen by the researcher.
Randomization is absent in this type of sampling.
There are two types of quota sampling: proportional and non-proportional.
In proportional quota sampling, the major characteristics of the population
are represented by sampling a proportional amount of each.

Non-proportional quota sampling is a bit less restrictive. In this method,
a minimum number of sampled units is specified for each category, without
concern for matching the proportions in the population.
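The proportional case reduces to a simple calculation: each category's quota is its share of the population scaled to the sample size. A minimal sketch, using a hypothetical population breakdown by year level:

```python
# Hypothetical population breakdown by year level
population_counts = {"freshman": 400, "sophomore": 300, "junior": 200, "senior": 100}
total = sum(population_counts.values())  # 1000

sample_size = 50

# Proportional quota: each category's share of the sample matches
# its share of the population.
quotas = {group: round(sample_size * n / total)
          for group, n in population_counts.items()}
print(quotas)  # freshman: 20, sophomore: 15, junior: 10, senior: 5
```

Note that, unlike probability sampling, quota sampling only fixes *how many* units come from each category; the units themselves are still chosen non-randomly.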
Probability Sampling
In probability sampling, every member of the population is given an equal
chance of being selected as part of the sample. There are several probability
sampling techniques. Among these are simple random sampling, stratified
sampling, and cluster sampling.

Simple Random Sampling

Simple random sampling is the basic sampling technique in which a group of
subjects (a sample) is selected for study from a larger group (a population).
Each individual is chosen entirely by chance, and each member of the
population has an equal chance of being included in the sample. Every
possible sample of a given size has the same chance of selection; i.e., each
member of the population is equally likely to be chosen at any stage in the
sampling process.
Stratified Sampling
There may often be factors which divide up the population into
sub-populations (groups / strata) and the measurement of interest may vary
among the different sub-populations. This has to be accounted for when a
sample from the population is selected in order to obtain a sample that is
representative of the population. This is achieved by stratified sampling.
A stratified sample is obtained by taking samples from each stratum or
sub-group of a population. When a sample is to be taken from a population
with several strata, the proportion of each stratum in the sample should be the
same as in the population.
Stratified sampling techniques are generally used when the population is
heterogeneous, or dissimilar, where certain homogeneous, or similar,
sub-populations can be isolated (strata). Simple random sampling is most
appropriate when the entire population from which the sample is taken is
homogeneous.

Some reasons for using stratified sampling over simple random sampling are:
1. the cost per observation in the survey may be reduced;
2. estimates of the population parameters may be wanted for each
subpopulation;
3. increased accuracy at a given cost.
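The rule that each stratum's share of the sample should match its share of the population can be sketched directly. The strata below (two work shifts) are hypothetical:

```python
import random

random.seed(7)  # for reproducibility

# Hypothetical heterogeneous population split into two homogeneous strata
strata = {
    "day_shift":   [f"D{i}" for i in range(60)],
    "night_shift": [f"N{i}" for i in range(40)],
}
population_size = sum(len(s) for s in strata.values())  # 100
sample_size = 20

# Take from each stratum the same proportion it holds in the population
stratified_sample = []
for name, members in strata.items():
    n = round(sample_size * len(members) / population_size)
    stratified_sample.extend(random.sample(members, n))

print(len(stratified_sample))  # 20 (12 from day_shift, 8 from night_shift)
```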
Cluster Sampling
Cluster sampling is a sampling technique where the entire
population is divided into groups, or clusters, and a random sample of
these clusters is selected. All observations in the selected clusters
are included in the sample.
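The key difference from stratified sampling is that whole clusters are drawn at random and then every member of a chosen cluster is included. A minimal sketch, with hypothetical class sections as clusters:

```python
import random

random.seed(3)  # for reproducibility

# Hypothetical population grouped into clusters (e.g., class sections)
clusters = {
    "section_A": ["A1", "A2", "A3"],
    "section_B": ["B1", "B2"],
    "section_C": ["C1", "C2", "C3", "C4"],
    "section_D": ["D1", "D2", "D3"],
}

# Randomly select whole clusters, then include every observation in them
chosen = random.sample(list(clusters), 2)
cluster_sample = [member for name in chosen for member in clusters[name]]
print(chosen, cluster_sample)
```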
1.3 Planning and Conducting Experiments:
Introduction to Design of Experiments
The products and processes in the engineering and scientific
disciplines are mostly derived from experimentation. An experiment is
a series of tests conducted in a systematic manner to increase the
understanding of an existing process or to explore a new product or
process. Design of Experiments, or DOE, is a tool to develop an
experimentation strategy that maximizes learning using minimum
resources.
Design of Experiments is widely and extensively used by engineers and
scientists to improve existing processes by maximizing yield and
decreasing variability, or to develop new products and processes. It is
a technique needed to identify the "vital few" factors in the most efficient
manner and then direct the process to its best setting to meet the
ever-increasing demand for improved quality and increased productivity.

The methodology of DOE ensures that all factors and their interactions
are systematically investigated, resulting in reliable and complete information.
There are five stages to be carried out for the design of experiments. These
are planning, screening, optimization, robustness testing and verification.
1. Planning
It is important to carefully plan the course of experimentation before
embarking upon the process of testing and data collection. At this stage,
the objectives of conducting the experiment or investigation are identified,
and the time and resources available to achieve those objectives are assessed.
Individuals from different disciplines related to the product or process should
compose a team who will conduct the investigation. They are to identify possible
factors to investigate and the most appropriate responses to measure. A team
approach promotes synergy that gives a richer set of factors to study and thus a
more complete experiment. Experiments which are carefully planned always lead
to increased understanding of the product or process. Well planned experiments
are easy to execute and analyze using the available statistical software.
2. Screening
Screening experiments are used to identify, out of a large pool of potential
factors, the important factors that affect the process under investigation.
The screening process eliminates unimportant factors so that attention can
be focused on the key factors. Screening experiments are usually efficient
designs that require few runs and focus on the vital factors rather than
on interactions.
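Screening designs are commonly built by crossing each candidate factor at two coded levels. As an illustration only (the factor names are hypothetical, and a real screening design would often use a fraction of these runs), the full 2^3 factorial design can be enumerated as:

```python
from itertools import product

# Hypothetical screening design: three candidate factors, each at two
# coded levels (-1 = "low", +1 = "high"), giving a 2**3 full factorial.
factors = ["temperature", "pressure", "catalyst"]
levels = (-1, +1)

design = list(product(levels, repeat=len(factors)))  # 8 runs

for run, settings in enumerate(design, start=1):
    print(run, dict(zip(factors, settings)))
```

In practice the run order would also be randomized, for the reasons discussed in Section 1.1.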

3. Optimization
After narrowing down the important factors affecting the process, the next
step is to determine the best settings of these factors to achieve the
objectives of the investigation. Depending on the product or process under
investigation, the objective may be to increase yield, to decrease
variability, or to find settings that achieve both at the same time.
4. Robustness Testing
Once the optimal settings of the factors have been determined, it is important
to make the product or process insensitive to variations resulting from changes
in factors that affect the process but are beyond the control of the analyst. Such
factors are referred to as noise or uncontrollable factors that are likely to be
experienced in the application environment. It is important to identify such
sources of variation and take measures to ensure that the product or process is
made robust or insensitive to these factors.

5. Verification
This final stage involves validation of the optimum settings by conducting a few
follow-up experimental runs. This confirms that the process functions as
expected and that all objectives are achieved.
