Module 1
OBTAINING DATA
Introduction
Hello dear young engineers!
Welcome to this module on Engineering Data Analysis. This module will help you understand
how to obtain data: the methods of data collection, and how to plan and conduct surveys and
experiments. But first, we will define statistics.
Statistics may be defined as the science that deals with the collection, organization, presentation,
analysis, and interpretation of data in order to be able to draw judgments or conclusions that help
in the decision-making process. The two parts of this definition correspond to the two main
divisions of Statistics. These are Descriptive Statistics and Inferential Statistics. Descriptive
Statistics, which is referred to in the first part of the definition, deals with the procedures that
organize, summarize and describe quantitative data. It seeks merely to describe data. Inferential
Statistics, implied in the second part of the definition, deals with making a judgment or a
conclusion about a population based on the findings from a sample that is taken from the
population.
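The two divisions can be contrasted with a minimal sketch. The measurements below are hypothetical values invented for illustration, and the interval uses a rough normal approximation (the multiplier 1.96), which is an assumption and not a method prescribed by this module.

```python
import math
import statistics

# Hypothetical sample: diameters (mm) of eight bolts taken from a production lot.
sample = [12.02, 11.98, 12.05, 11.97, 12.01, 12.00, 11.99, 12.03]

# Descriptive statistics: organize and summarize the sample itself.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)

# Inferential statistics: use the sample to say something about the population.
# The standard error measures how much the sample mean varies from sample to sample.
standard_error = sample_sd / math.sqrt(len(sample))

# Rough 95% interval for the population mean (normal approximation, assumed here).
low = sample_mean - 1.96 * standard_error
high = sample_mean + 1.96 * standard_error

print(f"sample mean = {sample_mean:.3f} mm, sample sd = {sample_sd:.3f} mm")
print(f"approximate interval for the population mean: {low:.3f} to {high:.3f} mm")
```

The first two numbers only describe the eight bolts that were measured; the interval is a judgment about the whole lot based on that sample.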
WHAT IS DATA ?
Practice Activity 01
What have you observed about the first three statements? How about statements
four and five?
The general rule of thumb: if you can add it, it is quantitative; if you cannot
add it, it is qualitative.
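The rule of thumb can be sketched as a first-pass check. The function name `classify` and the sample values are hypothetical. Note that numeric codes such as phone numbers would pass this check even though adding them is meaningless, so the result still needs human judgment.

```python
from numbers import Number

def classify(values):
    """Label a list of observations using the 'can you add it?' rule of thumb."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    addable = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if addable else "qualitative"

heights_cm = [160.5, 172.0, 158.2]         # hypothetical measurements
favorite_colors = ["red", "blue", "red"]   # hypothetical categories

print("heights:", classify(heights_cm))
print("colors:", classify(favorite_colors))
```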
WHAT DO DATA LOOK LIKE ?
Practice Activity 02
___________________________
___________________________
___________________________
___________________________
___________________________
METHODS OF DATA COLLECTION
Data collection is the process of gathering and measuring information on variables of interest, in
an established systematic fashion that enables one to answer stated research questions, test
hypotheses, and evaluate outcomes.
TYPES OF DATA
PRIMARY DATA: data that are collected fresh and for the first time, and thus happen to
be original in character.
SECONDARY DATA: data that have already been collected by someone else and have already
been passed through the statistical process.
METHODS OF DATA COLLECTION |Primary Data
1. Observation
Advantages Disadvantages
Types of Observation
1c. Participant
1d. Non-participant
• when the observer is observing people without giving any information to them.
• Advantages:
1. Objectivity and neutrality.
2. More willingness of the respondent.
1e. Uncontrolled
• when the observation takes place in natural conditions. It is done to get a spontaneous
picture of life and persons.
1f. Controlled
• when the observation takes place according to definite pre-arranged plans involving an
experimental procedure. It is generally done in the laboratory under controlled
conditions.
2. Interview
• This method of collecting data involves the presentation of oral-verbal stimuli and replies
in terms of oral-verbal responses.
• The interview method is an oral-verbal communication in which the interviewer asks the
respondent questions aimed at getting the information required for the study.
Types of Interview
8. Individual interviews
The interviewer meets a single person and interviews him.
9. Selection interviews
Done for the selection of people for certain jobs.
3. Questionnaire
• This method of data collection is quite popular, particularly in the case of big
enquiries.
• The questionnaire is mailed to respondents, who are expected to read and understand the
questions and write down their replies in the space provided for that purpose in the
questionnaire itself.
• The respondents have to answer the questions on their own.
Advantages:
1. Low cost, even if the geographical area is very large.
2. Answers are in the respondents' own words, so they are free from the interviewer's bias.
3. Adequate time to think about answers.
4. Respondents who are not easily approachable may be conveniently contacted.
5. Large samples can be used, so results are more reliable.
Disadvantages:
1. Low rate of return of duly filled questionnaires.
2. Slowest method of data collection.
3. Difficult to know whether the intended respondent filled in the form or someone else
did.
4. Case Study
Advantages:
1. They are less costly and less time-consuming; they are advantageous when exposure data
is expensive or hard to obtain.
2. They are advantageous when studying dynamic populations in which follow-up is difficult.
Disadvantages:
1. They are subject to selection bias.
2. They generally do not allow calculation of incidence (absolute risk).
5. Survey
Advantages Disadvantages
Practice Activity 03
Define the following:
• Registration Method
• Experimentation Method
METHODS OF DATA COLLECTION |Secondary Data
Sources of Data
Factors to Consider When Choosing a Data Collection Method
There are various factors to consider when choosing a data collection method. As such, the
researcher must judiciously select the method or methods for the study, keeping in view the
following factors:
Nature, scope, and object of enquiry
This constitutes the most important factor affecting the choice of a particular method. The
method selected should be such that it suits the type of enquiry that is to be conducted by the
researcher. This factor is also important in deciding whether the data already available (secondary
data) are to be used or the data not yet available (primary data) are to be collected.
Availability of funds
The availability of funds for the research project determines to a large extent the method to be
used for the collection of data. When funds at the disposal of the researcher are very limited, he
will have to select a comparatively cheaper method which may not be as efficient and effective as
some other costly method. Finance, in fact, is a big constraint in practice and the researcher has
to act within this limitation.
Time factor
Availability of time has also to be taken into account in deciding a particular method of data
collection. Some methods take relatively more time, whereas with others the data can be
collected in a comparatively shorter duration. The time at the disposal of the researcher, thus,
affects the selection of the method by which the data are to be collected.
Precision required
Precision required is yet another important factor to be considered at the time of selecting the
method of collection of data.
Designing a Survey
Surveys can take different forms. They can be used to ask only one question or they can ask a
series of questions. We can use surveys to test out people’s opinions or to test a hypothesis.
1. Determine the goal of your survey: What question do you want to answer?
2. Identify the sample population: Whom will you interview?
3. Choose an interviewing method: face-to-face interview, phone interview, self-administered
paper survey, or internet survey.
4. Decide what questions you will ask in what order, and how to phrase them. (This is
important if there is more than one piece of information you are looking for.)
5. Conduct the interview and collect the information.
6. Analyze the results by making graphs and drawing conclusions.
Example:
Martha wants to construct a survey that shows which sports students at her school like to play
the most.
Step 1: List the goal of the survey
Step 2: What population should she interview?
Step 3: How should she administer the survey?
Step 4: Create a data collection sheet that she can use to record her results
Step 1: GOAL
The goal of the survey is to find the answer to the question: “Which sports do students
at Martha’s school like to play the most?”
Step 2: POPULATION
The sample would be a random sample of the student population in
Martha’s school. A good strategy would be to randomly select students (using dice or a
random number generator) as they walk into an all-school assembly.
Step 3: METHODS
Face-to-face interviews are a good choice in this case. Interviews will be easy to conduct
since the survey consists of only one question which can be quickly answered and
recorded, and asking the question face to face will help eliminate non-response bias.
Step 4: DATA
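A minimal sketch of Steps 2 and 4, assuming a numbered student roster and a fixed list of answer options. The roster size, the sport names, the sample size, and the function names `select_sample` and `tally_sheet` are all hypothetical choices made for illustration; they are not taken from Martha's actual survey.

```python
import random
from collections import Counter

def select_sample(student_ids, sample_size, seed=None):
    """Draw a simple random sample of students without replacement."""
    rng = random.Random(seed)
    return rng.sample(student_ids, sample_size)

def tally_sheet(responses, sports):
    """Build a data collection sheet: one row per sport with its count."""
    counts = Counter(responses)
    return {sport: counts.get(sport, 0) for sport in sports}

# Hypothetical roster and answer options, for illustration only.
student_ids = list(range(1, 501))
sports = ["basketball", "volleyball", "badminton", "swimming", "other"]

# Step 2: a random number generator picks who is interviewed.
sample = select_sample(student_ids, sample_size=30, seed=1)

# Step 4: each interview adds one answer; these few answers are placeholders.
responses = ["basketball", "swimming", "basketball", "other"]

for sport, count in tally_sheet(responses, sports).items():
    print(f"{sport:<12} {'|' * count:<6} {count}")
```

Listing every sport on the sheet, including those with zero tallies so far, keeps the sheet the same shape from the first interview to the last.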
Basics of Conducting an Experiment
1. With an experiment, the researcher is trying to learn something new about the world, an
explanation of 'why' something happens.
2. The experiment must maintain internal and external validity, or the results will be useless.
3. When designing an experiment, a researcher must follow all of the steps of the scientific
method, from making sure that the hypothesis is valid and testable, to using controls and
statistical tests.
Introduction to Design of Experiments (DOE)
Do you remember learning about the scientific method back in high school, or even junior
high? What were those steps again?
How many of you have baked a cake? What are the factors involved to ensure a successful
cake? Factors might include preheating the oven, baking time, ingredients, amount of
moisture, baking temperature, etc. What else? You probably follow a recipe, so there are
many additional factors that control the ingredients, i.e., a mixture. In other words,
someone did the experiment in advance! What parts of the recipe did they vary to make the
recipe a success? Probably many factors: temperature and moisture, various ratios of
ingredients, and the presence or absence of many additives. Now, should one keep all the
factors involved in the experiment at a constant level and vary just one to see what would
happen? This is a strategy that works but is not very efficient. This is one of the concepts
that we will address in this course.
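The two strategies can be compared with a small sketch. The factor names and levels below are hypothetical cake-baking settings chosen for illustration. The one-factor-at-a-time plan needs fewer runs, but each effect is judged from a single comparison at the baseline settings of the other factors, and it cannot show whether factors interact. In the full factorial plan, every run contributes to the estimate of every factor's effect.

```python
from itertools import product

# Hypothetical cake-baking factors, each at two levels (illustrative values).
factors = {
    "temperature_C": [160, 180],
    "baking_time_min": [30, 40],
    "sugar_g": [150, 200],
}

def one_factor_at_a_time(factors):
    """Baseline run, then one extra run per changed level of a single factor."""
    baseline = {name: levels[0] for name, levels in factors.items()}
    runs = [dict(baseline)]
    for name, levels in factors.items():
        for level in levels[1:]:
            run = dict(baseline)
            run[name] = level  # vary only this factor, hold the rest constant
            runs.append(run)
    return runs

def full_factorial(factors):
    """Every combination of every level of every factor."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

ofat = one_factor_at_a_time(factors)
factorial = full_factorial(factors)

print(len(ofat), "one-factor-at-a-time runs")
print(len(factorial), "full factorial runs")
```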
Engineering Experiments
If we had infinite time and resource budgets there probably wouldn't be a big fuss made over
designing experiments. In production and quality control we want to control the error and learn
as much as we can about the process or the underlying theory with the resources at hand. From
an engineering perspective we're trying to use experimentation for the following purposes:
• reduce time to design/develop new products & processes
• improve performance of existing processes
• improve reliability and performance of products
• achieve product & process robustness
• perform an evaluation of materials, design alternatives, setting component & system
tolerances, etc.
We always want to fine-tune or improve the process. In today's global world, this drive for
competitiveness affects all of us, both as consumers and as producers.
Robustness is a concept that enters into statistics at several points. At the analysis stage,
robustness refers to a technique that isn't overly influenced by bad data. Even if there is an
outlier or bad data, you still want to get the right answer. Likewise, a robust process still
works regardless of who or what is involved in it.
Every experimental design has inputs. Back to the cake baking example: we have our ingredients
such as flour, sugar, milk, eggs, etc. Regardless of the quality of these ingredients we still want
our cake to come out successfully. In every experiment there are inputs and in addition, there are
factors (such as time of baking, temperature, the geometry of the cake pan, etc.), some of which
you can control and others that you can't control. The experimenter must think about factors
that affect the outcome. We also talk about the output and the yield or the response to your
experiment. For the cake, the output might be measured as texture, height, size, or flavor.
The Basic Principles of DOE
Randomization
This is an essential component of any experiment that is going to have validity. If you are doing a
comparative experiment where you have two treatments, a treatment and a control, for instance,
you need to include in your experimental process the assignment of those treatments by some
random process. An experiment includes experimental units. You need to have a deliberate
process to eliminate potential biases from the conclusions, and random assignment is a critical
step.
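Random assignment can be sketched as follows. The unit labels, the fixed seed, and the function name `randomize` are hypothetical; the sketch assumes equal-sized groups, which is a simplification.

```python
import random

def randomize(units, treatments, seed=None):
    """Completely randomized design: shuffle the units, then deal them out
    to the treatments in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        treatment: sorted(shuffled[i * group_size:(i + 1) * group_size])
        for i, treatment in enumerate(treatments)
    }

# Hypothetical experimental units.
units = [f"unit_{i:02d}" for i in range(1, 13)]
assignment = randomize(units, ["treatment", "control"], seed=42)

for treatment, group in assignment.items():
    print(treatment, group)
```

Because the shuffle, not the experimenter, decides which unit gets which treatment, no personal preference can systematically favor one group.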
Replication
Blocking
Blocking is a technique to include other factors in our experiment which contribute to
undesirable variation. Much of the focus in this class will be to creatively use various blocking
techniques to control sources of variation that will reduce error variance. For example, in human
studies, the gender of the subjects is often an important factor. Age is another factor affecting
the response. Age and gender are often considered nuisance factors which contribute to the
variability and make it difficult to assess the systematic effects of treatment. By using these as
blocking factors, you can avoid biases that might occur due to differences between the allocation
of subjects to the treatments, and as a way of accounting for some noise in the experiment. We
want the unknown error variance at the end of the experiment to be as small as possible. Our
goal is usually to find out something about a treatment factor (or a factor of primary interest),
but in addition to this, we want to include any blocking factors that will explain variation.
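Blocking on a nuisance factor can be sketched by randomizing separately inside each block. The subject labels, the two treatments "A" and "B", and the function name `blocked_assignment` are hypothetical. Gender is used as the blocking factor only because the text uses it as an example.

```python
import random
from collections import Counter, defaultdict

def blocked_assignment(subjects, treatments, seed=None):
    """Randomized block design: randomize treatments within each block.

    `subjects` is a list of (subject_id, block_label) pairs.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject_id, block in subjects:
        blocks[block].append(subject_id)

    assignment = {}
    for block, members in blocks.items():
        rng.shuffle(members)  # random order within the block
        for position, subject_id in enumerate(members):
            # Deal treatments out in turn so each block is split evenly.
            treatment = treatments[position % len(treatments)]
            assignment[subject_id] = (block, treatment)
    return assignment

# Hypothetical subjects: 10 labeled "female" and 10 labeled "male".
subjects = [(f"S{i:02d}", "female" if i % 2 else "male") for i in range(1, 21)]
assignment = blocked_assignment(subjects, ["A", "B"], seed=7)

counts = Counter(assignment.values())
for (block, treatment), n in sorted(counts.items()):
    print(block, treatment, n)
```

Each block ends up evenly split between the treatments, so a difference between blocks cannot masquerade as a treatment effect.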
Multi-factor Designs
Confounding
Confounding is something that is usually considered bad! Here is an example. Let's say we are
doing a medical study with drugs A and B. We put 10 subjects on drug A and 10 on drug B. If
we categorize our subjects by gender, how should we allocate our drugs to our subjects? Let's
make it easy and say that there are 10 male and 10 female subjects. A balanced way of doing this
study would be to put five males on drug A and five males on drug B, five females on drug A
and five females on drug B. This is a perfectly balanced experiment such that if there is a
difference between males and females at least it will equally influence the results from drug A
and the results from drug B.
An alternative scenario might occur if patients were randomly assigned treatments as they came
in the door. At the end of the study, the researchers might realize that drug A had been given
only to the male subjects and drug B only to the female subjects. We would call this design
totally confounded. This refers to the fact that if you analyze the difference between the
average response of the subjects on A and the average response of the subjects on B, this is
exactly the same as the difference between the average response of males and the average
response of females. You could not draw any reliable conclusion from this study at all. The
difference between the two drugs A and B might just as well be due to the gender of the
subjects, since the two factors are totally confounded.
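The effect of total confounding can be shown with a deliberately artificial sketch. The response values are hypothetical: gender shifts the response by 10 units and the two drugs are assumed to have no real difference at all, so any apparent drug difference is spurious.

```python
from statistics import mean

def response(gender, drug):
    """Hypothetical response model: only gender matters.

    The `drug` argument is deliberately ignored, so the true drug effect is zero.
    """
    return 50 + (10 if gender == "M" else 0)

def drug_difference(design):
    """Average response on drug A minus average response on drug B."""
    on_a = [response(g, d) for g, d in design if d == "A"]
    on_b = [response(g, d) for g, d in design if d == "B"]
    return mean(on_a) - mean(on_b)

# Balanced design: five males and five females on each drug.
balanced = (
    [("M", "A")] * 5 + [("M", "B")] * 5 + [("F", "A")] * 5 + [("F", "B")] * 5
)
# Totally confounded design: all males on A, all females on B.
confounded = [("M", "A")] * 10 + [("F", "B")] * 10

print("balanced design, apparent drug difference:", drug_difference(balanced))
print("confounded design, apparent drug difference:", drug_difference(confounded))
```

In the balanced design the gender shift cancels out of the comparison. In the confounded design the whole gender shift shows up as if it were a drug effect.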
Confounding is something we typically want to avoid, but when we are building complex
experiments we can sometimes use confounding to our advantage. We will confound things we
are not interested in, in order to have more efficient experiments for the things we are
interested in. This will come up in multiple factor experiments later on. We may be interested
in main effects but not interactions, so we will confound the interactions in order to reduce
the sample size, and thus the cost of the experiment, while still obtaining good information on
the main effects.
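One common way to confound deliberately is a fractional factorial design. The text does not name a specific design, so the half fraction below (three two-level factors, generator C = AB) is an assumption chosen for illustration. It halves the number of runs, at the price that the main effect of C can no longer be separated from the A-by-B interaction.

```python
from itertools import product

# Full factorial for three two-level factors in coded units
# (-1 = low level, +1 = high level): 8 runs.
full = list(product([-1, 1], repeat=3))

# Half fraction: keep only the runs where C equals A*B (generator C = AB).
half = [(a, b, c) for a, b, c in full if c == a * b]

print(len(full), "runs in the full design,", len(half), "runs in the half fraction")
for a, b, c in half:
    # The C column and the A*B column are identical in every run,
    # which is exactly what "C is confounded with AB" means.
    print(f"A={a:+d}  B={b:+d}  C={c:+d}  A*B={a * b:+d}")
```

If the interaction between A and B is believed to be negligible, the four runs still give usable information about all three main effects.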
Steps for Planning, Conducting and Analyzing an Experiment
The practical steps needed for planning and conducting an experiment include: recognizing the
goal of the experiment, choice of factors, choice of response, choice of the design, analysis and
then drawing conclusions. This pretty much covers the steps involved in the scientific method.
What this topic will deal with primarily is the choice of design. This focus includes all the related
issues about how we handle these factors in conducting our experiments.
Factors
We usually talk about "treatment" factors, which are the factors of primary interest to you. In
addition to treatment factors, there are nuisance factors which are not your primary focus, but
you have to deal with them. Sometimes these are called blocking factors, mainly because we will
try to block these factors to prevent them from influencing the results.
Experimental factors: These are factors that you can specify (and set the levels) and then
assign at random as the treatment to the experimental units. Examples would be temperature,
level of an additive, fertilizer amount per acre, etc.
Classification factors: These can't be changed or assigned; they come as labels on the
experimental units. The age and sex of the participants are classification factors which can't
be changed or randomly assigned. But you can select individuals from these groups randomly.
Quantitative factors: You can assign any specified level of a quantitative factor. Examples:
percent or pH level of a chemical.
Qualitative factors: These factors have categories which are different types. Examples might
be species of a plant or animal, a brand in the marketing field, or gender. These are not
ordered or continuous but are arranged perhaps in sets.