Engineering Data Analysis
Chapter 1
Obtaining Data
1.1 Methods of Data Collection
1.2 Planning and Conducting Surveys
1.3 Planning and Conducting Experiments: Introduction to Design of Experiments
After careful study of this chapter, the students will be able to:
The task of data collection begins after a research problem has been defined and the research design or plan has
been worked out. When deciding which method of data collection to use for the study, the researcher should
keep in mind two types of data: primary and secondary. Primary data are those which are collected afresh
and for the first time, and are thus original in character. Secondary data, on the other hand, are those
which have already been collected by someone else and which have already passed through the statistical
process. The researcher has to decide which sort of data to use (and thus collect) for the study
and select a method of data collection accordingly.
Observation Method
The observation method is the most commonly used method especially in studies relating to behavioural
sciences. In a way we all observe things around us, but this sort of observation is not scientific observation.
Observation becomes a scientific tool and the method of data collection for the researcher, when it serves a
formulated research purpose, is systematically planned and recorded and is subjected to checks and controls on
validity and reliability. Under the observation method, the information is sought by way of the investigator's own
direct observation, without asking the respondent. For instance, in a study relating to consumer behaviour, the
investigator, instead of asking which brand of wrist watch the respondent uses, may simply look at the watch.
Interview Method
The interview method of collecting data involves presentation of oral-verbal stimuli and reply in terms of
oral-verbal responses. This method can be used through personal interviews and, if possible, through telephone
interviews.
(a) Personal interviews: The personal interview method requires a person, known as the interviewer, to ask
questions of the other person or persons, generally in face-to-face contact.
(b) Telephone interviews: This method of collecting information consists of contacting respondents by
telephone. It is not a very widely used method, but it plays an important part in industrial surveys,
particularly in developed regions.
Collection of data through questionnaires
This method of data collection is quite popular, particularly in the case of large enquiries. It is used by
private individuals, research workers, private and public organizations and even by governments. In this method a
questionnaire is sent (usually by post) to the persons concerned with a request to answer the questions and return
the questionnaire. A questionnaire consists of a number of questions printed or typed in a definite order on a form
or set of forms. The questionnaire is mailed to respondents who are expected to read and understand the questions
and write down the reply in the space meant for the purpose in the questionnaire itself. The respondents have to
answer the questions on their own.
Randomization - this is an essential component of any experiment that is going to have validity. If you
are doing a comparative experiment where you have two treatments, a treatment and a control for instance, you
need to assign those treatments to the experimental units by some random process. You need a deliberate
process to eliminate potential biases from the conclusions, and random assignment is a critical step.
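The random assignment described above can be sketched in a few lines of Python. This is only a minimal
illustration: the unit labels, the function name randomize_two_groups and the equal split into two groups
are assumptions made for the example, not part of the text.

```python
import random

def randomize_two_groups(units, seed=None):
    """Randomly split experimental units into a treatment and a control group."""
    rng = random.Random(seed)      # a fixed seed makes the allocation reproducible
    shuffled = list(units)         # copy so the caller's list is left untouched
    rng.shuffle(shuffled)          # the random process that decides the allocation
    half = len(shuffled) // 2
    return {
        "treatment": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }

# Hypothetical labels for ten experimental units.
units = [f"unit_{i:02d}" for i in range(1, 11)]
groups = randomize_two_groups(units, seed=42)
print(groups)
```

Recording the seed alongside the results lets the allocation be audited later without letting the
investigator choose which unit receives which treatment.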
Replication - is in some sense the heart of all of statistics. To make this point, recall what the
standard error of the mean is: it is the square root of the estimate of the variance of the sample mean, i.e.,
$\sqrt{s^2/n}$, where $s^2$ is the sample variance and $n$ is the sample size. The width of the confidence
interval is determined by this statistic, so our estimates of the mean become less variable as the sample size
increases.
Replication is the basic issue behind every method we will use in order to get a handle on how precise
our estimates are at the end. We always want to estimate or control the uncertainty in our results, and we achieve
this estimate through replication. One way to shorten the confidence interval is to reduce the error in our
estimate of the mean by increasing n. Another way is to reduce the error variance itself - which brings us to
blocking.
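The effect of replication on the standard error of the mean can be checked numerically. The sketch below
assumes a normally distributed response with mean 50 and standard deviation 10; these values, the sample
sizes and the function names are chosen purely for illustration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def theoretical_se(sigma, n):
    """Standard error of the mean when the true standard deviation is known."""
    return sigma / math.sqrt(n)

rng = random.Random(0)  # fixed seed so the simulated samples are reproducible
for n in (5, 20, 80, 320):
    # Simulated measurements; the mean and sigma are assumed values.
    sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
    print(n, round(standard_error(sample), 3), round(theoretical_se(10.0, n), 3))
```

Because the standard error shrinks with the square root of n, quadrupling the number of replicates only
halves it, which is why reducing the error variance itself is often the cheaper route.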
Blocking - is a technique for including in our experiment other factors which contribute to undesirable
variation. Much of the focus in this class will be on creatively using various blocking techniques to control
sources of variation and so reduce the error variance. For example, in human studies, the gender of the subjects
is often an important factor. Age is another factor affecting the response. Age and gender are often considered
nuisance factors which contribute to variability and make it difficult to assess systematic effects of a
treatment. By using these as blocking factors, you can avoid biases that might occur due to differences in the
allocation of subjects to the treatments, and you can account for some of the noise in the experiment. We want
the unknown error variance at the end of the experiment to be as small as possible. Our goal is usually to find
out something about a treatment factor (or a factor of primary interest), but in addition to this we want to
include any blocking factors that will explain variation.
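One common way to use a blocking factor is to randomize the treatments separately within each block. The
sketch below assumes gender is the blocking factor and that there are two treatments labelled A and B; the
subject identifiers and the function name randomize_within_blocks are hypothetical.

```python
import random
from collections import Counter, defaultdict

def randomize_within_blocks(subjects, treatments, seed=None):
    """subjects is a list of (subject_id, block) pairs; returns {subject_id: treatment}."""
    rng = random.Random(seed)
    by_block = defaultdict(list)
    for subject_id, block in subjects:
        by_block[block].append(subject_id)
    assignment = {}
    for members in by_block.values():
        rng.shuffle(members)  # random order within the block
        # Deal the treatments out in turn so each block is split as evenly as possible.
        for i, subject_id in enumerate(members):
            assignment[subject_id] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects: ten males and ten females.
subjects = [(f"M{i}", "male") for i in range(1, 11)]
subjects += [(f"F{i}", "female") for i in range(1, 11)]

assignment = randomize_within_blocks(subjects, ["A", "B"], seed=1)
block_of = dict(subjects)
counts = Counter((block_of[s], t) for s, t in assignment.items())
print(dict(counts))  # every (block, treatment) cell holds five subjects
```

Which subject receives which treatment is still random, but every block contributes equally to each
treatment, so differences between blocks cannot masquerade as a treatment effect.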
Multi-factor Designs - $2^k$ designs, $3^k$ designs, response surface designs, etc. The point of all of these
multi-factor designs runs contrary to the traditional view of the scientific method, in which everything is held
constant except one factor which is varied. The one-factor-at-a-time method is a very inefficient way of making
scientific advances. It is much better to design an experiment that simultaneously includes combinations of
multiple factors that may affect the outcome. Then you learn not only about the primary factors of interest but
also about these other factors. These may be blocking factors which deal with nuisance parameters, or they may
just help you understand the interactions or the relationships between the factors that influence the response.
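A full factorial design simply runs every combination of the factor levels. The sketch below enumerates
the runs of such a design; the factor names (temperature, pressure, catalyst) and the coded levels -1 and
+1 are assumptions for illustration only.

```python
from itertools import product

def full_factorial(factors):
    """factors maps each factor name to its list of levels; returns one dict per run."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]

# Hypothetical two-level factors in coded units (-1 = low, +1 = high).
factors = {
    "temperature": [-1, +1],
    "pressure": [-1, +1],
    "catalyst": [-1, +1],
}

runs = full_factorial(factors)
for run in runs:
    print(run)
print(len(runs), "runs")  # 2 levels ** 3 factors
```

With k two-level factors the list has 2**k runs, and with three-level factors it has 3**k, which is where
the names of these designs come from. In practice the run order would also be randomized.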
Confounding - is something that is usually considered bad! Here is an example. Let's say we are doing
a medical study with drugs A and B. We put 10 subjects on drug A and 10 on drug B. If we categorize our
subjects by gender, how should we allocate our drugs to our subjects? Let's make it easy and say that there are 10
male and 10 female subjects. A balanced way of doing this study would be to put five males on drug A and five
males on drug B, five females on drug A and five females on drug B. This is a perfectly balanced experiment: if
there is a difference between males and females, it will at least influence the results from drug A and the
results from drug B equally.
An alternative scenario might occur if patients were randomly assigned treatments as they came in the
door. At the end of the study the investigators might realize that drug A had only been given to the male
subjects and drug B had only been given to the female subjects. We would call this design totally confounded.
This refers to the fact that the difference between the average response of the subjects on A and the average
response of the subjects on B is exactly the same as the difference between the average response of the males
and the average response of the females. You would not have any reliable conclusion from this study at all. The
difference between the two drugs A and B might just as well be due to the gender of the subjects, since the two
factors are totally confounded.
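The drug example can be made concrete with a small noise-free calculation. The numbers below are invented
for illustration: the two drugs are assumed to have no effect at all, while males and females are assumed
to differ by 4 units in their baseline response.

```python
from statistics import mean

# Assumed values for illustration only.
GENDER_EFFECT = {"male": 10.0, "female": 14.0}  # baseline response by gender
DRUG_EFFECT = {"A": 0.0, "B": 0.0}              # the drugs truly do not differ

def apparent_drug_difference(design):
    """design is a list of (gender, drug) pairs; returns mean(B) - mean(A)."""
    responses = {"A": [], "B": []}
    for gender, drug in design:
        responses[drug].append(GENDER_EFFECT[gender] + DRUG_EFFECT[drug])
    return mean(responses["B"]) - mean(responses["A"])

# Five males and five females on each drug.
balanced = ([("male", "A")] * 5 + [("male", "B")] * 5
            + [("female", "A")] * 5 + [("female", "B")] * 5)
# All males on drug A, all females on drug B.
confounded = [("male", "A")] * 10 + [("female", "B")] * 10

print(apparent_drug_difference(balanced))    # 0.0
print(apparent_drug_difference(confounded))  # 4.0
```

In the confounded design the whole gender difference shows up as an apparent drug difference, and nothing
in the data can separate the two.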
Confounding is something we typically want to avoid, but when we are building complex experiments
we can sometimes use confounding to our advantage. We will confound things we are not interested in, in order
to have more efficient experiments for the things we are interested in. This will come up in multiple-factor
experiments later on. We may be interested in main effects but not in interactions, so we will confound the
interactions in order to reduce the sample size, and thus the cost of the experiment, while still having good
information on the main effects.
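As a rough preview of deliberate confounding, the sketch below builds a half fraction of a two-level design
with three factors. The factor labels A, B and C and the rule C = A*B are assumptions chosen for
illustration; they are one standard way to halve the number of runs, not necessarily the construction used
later in the course.

```python
from itertools import product

# Levels are coded -1 (low) and +1 (high).
# Choose A and B freely, then set C equal to the product A*B.
half_fraction = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

for run in half_fraction:
    print(run)

# The column for C is identical to the column for the A-by-B interaction,
# so the main effect of C is confounded with that interaction.
ab_column = [a * b for a, b, _ in half_fraction]
c_column = [c for _, _, c in half_fraction]
print(ab_column == c_column)  # True

# The columns for A and B remain balanced against each other.
print(sum(a * b for a, b, _ in half_fraction))  # 0
```

Four runs are used instead of eight. The price is that the effect of C cannot be distinguished from the
A-by-B interaction, which is acceptable only if that interaction is believed to be negligible.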
Factors
Researchers usually talk about "treatment" factors, which are the factors of primary interest to them. In
addition to treatment factors, there are nuisance factors, which are not the primary focus but which they have
to deal with. Sometimes these are called blocking factors, mainly because the researcher will try to block on
these factors to prevent them from influencing the results.