Empirical Software Engineering (SE-404)
LAB A1-G1
Laboratory Manual
3. (27-01-2022) Defect detection activities like reviews and testing help in identifying the defects in the artifacts (deliverables). These defects must be classified into various buckets before carrying out the root cause analysis. Following are some of the defect categories: Logical, User interface, Maintainability, and Standards. In the context of the above defect categories, classify the following statements under the defect categories.
4. (03-02-2022) Why is version control important? How many types of version control systems are there? Demonstrate how version control is used in a proper sequence (stepwise).
7. (25-02-2022) Validate the results obtained in experiment 6 using 10-fold cross-validation.
8. (08-03-2022) Online loan system has two modules for the two basic services, namely Car loan service and House loan service. The two modules have been named as Car_Loan_Module and House_Loan_Module. Car_Loan_Module has 2000 lines of uncommented source code. House_Loan_Module has 3000 lines of uncommented source code. Car_Loan_Module was completely implemented by Mike. House_Loan_Module was completely implemented by John. Mike took 100 person hours to implement the Car_Loan_Module. John took 200 person hours to implement the House_Loan_Module. Mike's module had 5 defects. John's module had 6 defects. With respect to the context given, which among the following is an INCORRECT statement? Identify the null and alternate hypothesis for the following options.
Justify and choose one:
a. John's Quality is better than Mike's Quality
b. John's Productivity is more than Mike's Productivity
c. John introduced more defects than Mike
d. John's Effort is more than Mike's Effort
Introduction:-
1. WEKA: WEKA (Waikato Environment for Knowledge Analysis) is a machine learning software developed at the University of Waikato in New Zealand. WEKA is a free and open-source software package that assembles a wide range of data mining and model-building algorithms.
It is best suited for data analysis and predictive modeling. It contains algorithms and visualization tools that support machine learning. WEKA has a GUI that facilitates easy access to all its features. It is written in the Java programming language and runs on almost any platform. WEKA supports the major data mining tasks, including preprocessing, classification, regression, clustering, association rule mining, and visualization. It works on the assumption that data is available in the form of a flat file. WEKA can provide access to SQL databases through database connectivity and can further process the data/results returned by a query.
2. KEEL: KEEL is a free Java software tool which empowers the user to assess the behavior of evolutionary learning and soft-computing-based techniques for different kinds of data mining problems: regression, classification, clustering, pattern mining, and so on. KEEL is a data mining tool used by many EDM researchers. KEEL has extremely extensive support for discretization algorithms, but limited support for other methods of engineering new features out of existing features. It has excellent support for feature selection, with a wider range of algorithms than any other package. It also has extensive support for imputation of missing data, and considerable support for data re-sampling. KEEL is open source and free for use under a GNU license.
For modeling, KEEL has an extensive set of classification and regression algorithms, with a strong focus on evolutionary algorithms. Its support for other types of data mining algorithms, such as clustering and factor analysis, is more limited than in other packages. Support for association rule mining is decent, though not as extensive as in some other packages.
• It contains a large collection of evolutionary algorithms for predicting models, preprocessing methods
(evolutionary feature and instance selection among others) and postprocessing procedures (evolutionary tuning
of fuzzy rules). It also presents many state-of-the-art methods for different areas of data mining such as decision
trees, fuzzy rule based systems or crisp rule learning.
• It includes around 100 data preprocessing algorithms proposed in the specialized literature: data
transformation, discretization, instance and feature selection, noise filtering and so forth.
• It incorporates a statistical library to analyze the results of the algorithms.
• It comprises a set of statistical tests for analyzing the suitability of the results and for performing parametric and nonparametric comparisons among the algorithms.
• It provides a user-friendly interface, oriented to the analysis of algorithms.
3. SPSS: SPSS stands for “Statistical Package for the Social Sciences”. It is an IBM tool, first launched in 1968. It is a software package mainly used for statistical analysis of data.
SPSS is mainly used in areas like healthcare, marketing, and educational research, and by market researchers, health researchers, survey companies, education researchers, government, marketing organizations, data miners, and many others. It provides data analysis for descriptive statistics, numeral outcome predictions, and identifying groups. The software also provides data transformation, graphing, and direct marketing features to manage data smoothly.
Advantages of SPSS are:
• The data from any survey collected via Survey Gizmo gets easily exported to SPSS for detailed and good analysis.
• In SPSS, data is stored in .SAV format. This data mostly comes from surveys. This makes the process of manipulating, analyzing, and pulling data very simple.
• SPSS has easy access to data with different variable types. These variable data are easy to understand. SPSS helps researchers to set up models easily because most of the process is automated.
• SPSS allows opening data files, either in SPSS' own file format or many others.
• SPSS allows editing data, such as computing sums and means over columns or rows of data. SPSS has outstanding options for more complex operations as well.
• SPSS has options for creating tables and charts containing frequency counts or summary statistics over (groups of) cases and variables.
• SPSS has a unique way to get insight even from critical data. Trend analysis, assumptions, and predictive models are some of the characteristics of SPSS.
• SPSS is easy to learn, use, and apply.
• It provides a handy data management system and editing tools.
• SPSS offers in-depth statistical capabilities for analyzing the exact outcome.
• SPSS provides design, plotting, reporting, and presentation features for more clarity.
Limitations are:
4. MATLAB: MATLAB stands for Matrix Laboratory. It was developed by MathWorks, and it is a multipurpose (multi-paradigm) programming language. It allows matrix manipulations and helps us to plot different types of functions and data. It can also be used for the analysis and design of systems such as control systems.
Advantages of MATLAB:
● Easy-to-use interface: A user-friendly interface with the features you want to use one click away.
● A large inbuilt database of algorithms: MATLAB has numerous important algorithms already built in, and you just have to call them in your code.
● Extensive data visualization and processing: We can process a large amount of data in MATLAB and visualize it using plots and figures.
● Debugging of code is easy: There are many inbuilt tools, like the analyzer and debugger, for analysis and debugging of code written in MATLAB.
● Easy symbolic manipulation: We can perform symbolic math operations in MATLAB using the symbolic manipulation algorithms and tools in MATLAB.
Disadvantages of MATLAB:
● MATLAB can be slow since it is an interpreted language: MATLAB programs are not compiled into machine language but are executed by an interpreter, so they can sometimes be slow.
● We cannot create standalone output (application) files directly in MATLAB.
● One cannot use graphics in MATLAB with the -nojvm option; doing so results in a runtime error.
5. R: R is a popular and powerful open-source programming language and software environment for statistical computing and graphical representation. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R implements various statistical techniques like linear and non-linear modeling, machine learning algorithms, time series analysis, classical statistical tests, and so on. R consists of a language and a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files.
Advantages of R:
1) Open Source: R is an open-source language. We can contribute to the development of R by optimizing our
packages, developing new ones, and resolving issues.
2) Platform Independent: R is a platform-independent language or cross-platform programming language which
means its code can run on all operating systems. R can run quite easily on Windows, Linux, and Mac.
3) Machine Learning Operations: R allows us to do various machine learning operations such as classification
and regression. R is used by the best data scientists in the world.
4) Highly Compatible: R is highly compatible and can be paired with many other programming languages like
C, C++, Java, and Python. It can also be integrated with technologies like Hadoop and various other database
management systems as well.
Disadvantages of R:
1) Data Handling: In R, objects are stored in physical memory, in contrast with other programming languages like Python. R utilizes more memory as compared to Python and requires the entire dataset to be held in memory at once, so it is not an ideal option when dealing with Big Data.
2) Basic Security: R lacks basic security. Because of this, there are many restrictions with R as it cannot be
embedded in a web-application.
3) Complicated Language: R is a very complicated language, and it has a steep learning curve. The people who
don't have prior knowledge or programming experience may find it difficult to learn R.
4) Lower Speed: The R programming language is much slower than other programming languages such as MATLAB and Python, and many R packages are slower than their counterparts in other languages.
Learning from experiment:- We have successfully discussed and learnt about various data analysis tools. We have compared five different analysis tools (WEKA, KEEL, SPSS, MATLAB, and R) and noted their advantages and disadvantages as well.
EXPERIMENT-2
OBJECTIVE: Consider any empirical study of your choice (Experiments, Survey Research, Systematic
Review, Postmortem analysis and case study). Identify the following components for an empirical study.
INTRODUCTION:
Parametric Test
Parametric tests are used for data samples having a normal distribution (bell-shaped curve).
Non-Parametric Test
Nonparametric tests are used when the distribution of the data samples is highly skewed. If the assumptions of parametric tests are met, parametric tests are more powerful because they use more of the information in the data during computation.
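As an illustration, a minimal sketch in Python (using the SciPy library and small hypothetical samples, not data from any experiment in this manual) of running a parametric test and its nonparametric counterpart on the same two groups:

# Minimal sketch: parametric vs. nonparametric two-sample tests on hypothetical data.
from scipy import stats

group_a = [3.1, 3.4, 2.9, 3.8, 3.5, 3.2]  # hypothetical sample 1
group_b = [2.7, 3.0, 2.8, 3.1, 2.6, 2.9]  # hypothetical sample 2

# Parametric: independent-samples t-test, assumes approximately normal samples.
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Nonparametric: Mann-Whitney U test, works on ranks, no normality assumption.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print("t-test:       t = %.3f, p = %.3f" % (t_stat, t_p))
print("Mann-Whitney: U = %.3f, p = %.3f" % (u_stat, u_p))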
Dependent Variables
The dependent variable (or response variable) is the output produced by analyzing the effect of the independent
variables. The dependent variables are presumed to be influenced by the independent variables. For example- effort,
cost, faults, and productivity.
Independent Variables
Independent variables (or predictor variables) are input variables that are manipulated or controlled by the
researcher to measure the response of the dependent variable. For example- lines of source code, number of
methods, and number of attributes.
Confounding Variables
Apart from the independent variables, unknown variables or confounding variables (extraneous variables) may
affect the outcome (dependent) variable.
Within-Company Analysis
In within-company analysis, the empirical study collects the data from the old versions/releases of the same software, builds prediction models, and applies the predicted models to the future versions of the same project.
Cross-Company Analysis
In practice, the old data may not be available. In such cases, the data obtained from similar earlier projects developed by different companies is used for prediction in new projects. The process of validating the predicted model using data collected from projects different from those from which the model was derived is known as cross-company analysis.
Proprietary Dataset
A proprietary dataset is a licensed dataset owned by a company, for example, the datasets behind proprietary products such as Microsoft Office, Adobe Acrobat, and IBM SPSS. In practice, obtaining data from proprietary datasets for research validation is difficult, as software companies are usually not willing to share information about their software systems.
Open-Source Dataset
An open-source dataset is usually a freely available dataset, developed by many developers from different places in a collaborative manner, for example, Kaggle, Google Cloud, and NASA datasets.
For this experiment we take a survey paper titled "Effect of Music on the Human Body and Mind"
(https://digitalcommons.liberty.edu/cgi/viewcontent.cgi?article=1162&context=honors) by Dawn Kent.
The research project consisted of a 100-count survey given to students at Liberty University. The 10-question
surveys included some demographic data, and questions about average time spent studying in high school and in
college, average high school and college GPAs, and whether or not students listened to music while studying, and if
so, what types of music. The data was then compiled and analyzed. All surveys were completed voluntarily and all
participants had the understanding that their anonymous information would be used for research. Two extra surveys
were taken to allow for the exclusion of two surveys that were missing answers or failed to follow directions.
The survey was distributed on the campus of Liberty University. All surveys were taken voluntarily and without
recompense. Surveys were distributed within several classes, including the concert band, a large chemistry class
containing mostly biology and health sciences majors, and a statistics in psychology class. This contributed to a
higher concentration of music, biology, health, and psychology majors. Other surveys were distributed in common
areas such as the computer lab and dining hall and dormitories, where a variety of students could be found. Students
of every academic standing (i.e. freshman, sophomore, etc.) and a wide variety of ages were represented, although
most were concentrated in the average age range of college students at Liberty University. Gender ratios were
similar. There were 43 males and 57 females completing the survey. Ninety percent of the study participants were
between the ages of 17 and 22, while 7% were between 23 and 25 years of age, and 3% were 26 years or older.
Ratios between classes were fairly similar. Participants included 26 freshmen, 21 sophomores, 23 juniors and 30
seniors. Fifteen different majors were represented, although the survey contained greater concentrations in the
aforementioned areas. Also affecting this factor is the size of each major department at Liberty University. For
instance, after the top three majors, communications was next, which is the largest major at Liberty University. The
ratios of majors were as follows: music: 15; biology: 12; psychology: 11; communications: 10; business: 8;
education: 8; health: 7; computer science: 6; religion: 6; nursing: 5; government: 4; undecided: 3; math: 2; family
and consumer science: 2; and history: 1.
RESULT:-
Parametric Test
T-test
The initial independent-samples t-test run on the hypothesis that listening to music while studying would positively affect GPA showed no significance (t(98) = -1.182; p = .240). However, the mean GPA of those who did not listen to
music while studying was slightly lower than those who did. Closer examination of the mean GPAs of each type of
music revealed that listeners of each individual type of music had lower, but not significantly lower GPAs than
those who did not, except in the case of two types of music. Those that listened to easy listening music had a much
higher, though not significantly higher, GPA mean than any other group. Those who listened to rock music had a
slightly higher GPA than those who did not listen to music. Also, two types of music showed notably lower GPA
means than the no-music group. Hip-hop/R&B had the lowest GPA mean, while rap followed closely behind. Jazz
and country also had moderately lower GPA means. Alternative, classical, and gospel music stayed very close to the
no-music control group.
ANOVA
A one-way ANOVA test was run on this set of information. No significant effect on GPA was found between these
three groups (F(2, 97) = 2.202; p = .116). Next, the data was divided into a different three-group comparison. This
time, the focus turned to the negative end of the spectrum. The first group included all students who listened to any
type of music while studying excluding country, rap, hip-hop/R&B, and jazz. These four types of music showed
considerably lower GPA means than the no music group. The second group included all students who listened to
any type of music while studying that included at least one of country, rap, hip-hop/R&B, and jazz. The third group
remained those students that listen to no music while studying. An ANOVA test was run. The comparison between
the three groups was not significant (F(2, 97) = 3.003; p = .054).
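The one-way ANOVA reported above could be reproduced in Python with SciPy; the sketch below uses small hypothetical GPA values as placeholders, since the original survey data (analyzed in SPSS) is not available here.

# Sketch: one-way ANOVA across three hypothetical groups (placeholder GPA values).
from scipy import stats

group1 = [3.2, 3.4, 3.1, 3.5]  # e.g. music listeners excluding the low-GPA genres
group2 = [2.8, 2.9, 3.0, 2.7]  # e.g. listeners of country/rap/hip-hop/jazz
group3 = [3.1, 3.3, 3.2, 3.0]  # e.g. no music while studying

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print("F = %.3f, p = %.3f" % (f_stat, p_value))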
No nonparametric test was used in the survey, but the Wilcoxon-Mann-Whitney test, which is a nonparametric alternative to the t-test, and the Kruskal-Wallis test could be used to find the significance (p-value) of differences between the various music-listening groups.
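A minimal sketch of the Kruskal-Wallis test suggested above (SciPy, hypothetical placeholder data; the Mann-Whitney variant is sketched earlier in the parametric/nonparametric introduction):

# Sketch: Kruskal-Wallis test, the nonparametric alternative to one-way ANOVA.
from scipy import stats

music          = [3.1, 2.9, 3.4, 3.0, 3.3]  # hypothetical GPAs, music group
no_music       = [3.2, 3.0, 3.5, 3.1, 3.4]  # hypothetical GPAs, no-music group
easy_listening = [3.6, 3.5, 3.7, 3.4]       # hypothetical GPAs, easy-listening group

h_stat, p_value = stats.kruskal(music, no_music, easy_listening)
print("H = %.3f, p = %.3f" % (h_stat, p_value))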
Independent variables
Independent variables used in the survey are demographic data such as nationality, major being pursued, gender, and age, along with questions about average time spent studying in high school and in college, average high school GPA, and whether or not students listened to music while studying, and if so, what types of music.
Dependent Variables
The dependent variable is the outcome of the survey, i.e., the GPA the students scored in college.
Confounding Variables
Confounding variables are variables that affect the outcome indirectly. In this survey, these could be the subject being studied while listening to music, the volume of the music, and disturbances in the background.
It is a within-company analysis, as the survey is conducted on data reported by the participants about their past experiences, and this data is then used to analyze the GPA they have scored.
The dataset used in the survey is an open-source dataset, as it is not licensed; the results were calculated using the SPSS Student Version 12.0 for Windows software program.
Learning from experiment:- We have successfully discussed and learnt about parametric and nonparametric tests; independent, dependent, and confounding variables; within-company and cross-company analysis; and proprietary and open-source datasets.
EXPERIMENT-3
Objective:- Defect detection activities like reviews and testing help in identifying the defects in the
artifacts (deliverables). These defects must be classified into various buckets before carrying out the root
cause analysis. Following are some of the defect categories: Logical, User interface, Maintainability, and
Standards. In the context of the above defect categories, classify the following statements under the defect
categories.
Result:-
d. A pointer is declared but not initialized. It is used in the program for storing a value.
Logical Defect and Standards:
A pointer is a variable that holds the address of another variable. So the solution would be to initialize the pointer
properly and assign a value to it.
e. A program designed to handle 1000 simultaneous users crashed when the 1001st user logged in.
Logical Defect, User Interface:
Solution would be to do proper load and stress testing before hosting to the internet for user usage.
f. A “while” loop never exits
Logical Defect:
Solution would be to write an appropriate condition to end the loop; otherwise, the loop would run infinitely.
g. User interface displays “MALFUNCTION 54” when something goes wrong in the back-end
User Interface:
Solution would be to do proper error handling and make an appropriate display message so that the user would
get to know what the exact problem is. One of the main aspects of GUI testing is checking whether correct error
messages are being displayed. UI Testing is done to uncover such User Interface Bugs.
i. Hungarian Notation not followed while coding, even though the coding guidelines mandate to use
Hungarian Notation
Standards:
Solution: It is always good to follow standards i.e. Hungarian Notation while coding. All coding guidelines
should be followed for successful software development.
j. Pressing the “Tab” key moves the cursor in different fields of a web form randomly.
User Interface:
Solution would be to design forms properly and to perform form-based testing before hosting to the internet.
Learning from experiment:- We have successfully learned about root cause analysis (RCA), and in the process learned about different types of testing and identified different defects in different scenarios. We also discussed solutions to prevent these defects.
EXPERIMENT-4
Objective:- Why is version control important? How many types of version control systems are there?
Demonstrate how version control is used in a proper sequence (stepwise).
Introduction:- In software engineering, version control (also known as revision control, source control, or
source code management) is a class of systems responsible for managing changes to computer programs,
documents, large web sites, or other collections of information.
Version control allows you to keep track of your work and helps you to easily explore the changes you have
made, be it data, coding scripts, notes, etc. With version control software such
as Git, version control is much smoother and easier to implement. Using an online platform like Github to store
your files means that you have an online backup of your work, which is beneficial for both you and your
collaborators.
1. Local Version Control Systems: This is one of the simplest forms and uses a database that keeps all the changes to files under revision control. RCS is one of the most common local VCS tools. It keeps patch sets (differences between files) in a special format on disk. By adding up all the patches, it can then re-create what any file looked like at any point in time.
2. Centralized version control System: With centralized version control systems, you have a single
“central” copy of your project on a server and commit your changes to this central copy. You pull the files that
you need, but you never have a full copy of your project locally. Some of the most common version control
systems are centralized, including Subversion (SVN) and Perforce.
3. Distributed version control systems: With distributed version control systems (DVCS), you don't rely on
a central server to store all the versions of a project’s files. Instead, you clone a copy of a repository locally so that
you have the full history of the project. Two common distributed version control systems are Git and Mercurial.
Learning from experiment:- We have successfully learned about version control systems (VCS) and their benefits. We have also learned about the types of VCS and the stepwise sequence of using a VCS.
EXPERIMENT-5
Objective:- Demonstrate how Git can be used to perform version control.
Introduction:- Version control allows you to keep track of your work and helps you to easily explore the
changes you have made, be it data, coding scripts, notes, etc. With version control software such as Git, version
control is much smoother and easier to implement. Using an online platform like Github to store your files means
that you have an online backup of your work, which is beneficial for both you and your collaborators.
GITHUB WORKFLOW:
The GitHub workflow can be summarized by the "commit-pull-push" mantra; a short sketch of this sequence follows the list below.
● Commit: Once you’ve saved your files, you need to commit them - this means the changes you have made to
files in your repo will be saved as a version of the repo, and your changes are now ready to go up on GitHub (the
online copy of the repository).
● Pull: Now, before you send your changes to Github, you need to pull, i.e. make sure you are completely up to
date with the latest version of the online version of the files - other people could have been working on them even
if you haven’t. You should always pull before you start editing and before you push.
● Push: Once you are up to date, you can push your changes – at this point in time your local copy and the online
copy of the files will be the same.
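A minimal sketch of this commit-pull-push sequence, driven from Python with the standard subprocess module (it assumes git is installed, the script runs inside an existing local clone with a remote named origin and a main branch, and the file name and commit message are hypothetical):

# Sketch: the commit-pull-push workflow run through git from Python.
import subprocess

def git(*args):
    # Run a git command and raise an error if it fails.
    subprocess.run(["git", *args], check=True)

git("add", "analysis_notes.txt")              # stage a (hypothetical) changed file
git("commit", "-m", "Update analysis notes")  # commit: save a version locally
git("pull", "origin", "main")                 # pull: fetch and merge the latest online version first
git("push", "origin", "main")                 # push: upload the local commits to GitHub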
Learning from experiment:- In this Experiment we learned about how to work with git and understand its
workflow.
EXPERIMENT-6
OBJECTIVE: Consider any prediction model of your choice.
1. Analyze the dataset that is given as an input to the prediction model
2. Find out the quartiles for the used dataset
3. Analyze the performance of a model using various performance metrics.
INTRODUCTION:
For the experiment we have used Iris dataset which can be downloaded from
https://archive.ics.uci.edu/ml/datasets/iris
The dataset is evaluated using pre-defined models and techniques in the sklearn library of python.
We first generated a summary of the Iris dataset, then reduced the dimensionality for the prediction model and created the input data and output labels; a sketch of these steps follows.
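A minimal sketch of this loading and summarizing step, assuming scikit-learn and pandas are installed (the exact code used in the lab session is not reproduced here):

# Sketch: load the Iris dataset, summarize it, and split it into input data (X) and labels (y).
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)   # the UCI Iris data bundled with scikit-learn
df = iris.frame                   # features plus target as a pandas DataFrame

print(df.describe())              # summary statistics: count, mean, std, quartiles, ...

X = df[iris.feature_names]        # input data: sepal/petal measurements
y = df["target"]                  # output label: iris species (0, 1, 2)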
2. Find out the quartiles for the used dataset
The quantile() function of NumPy is used to find the quartiles of the dataset for each of the attributes used. Quantiles are the set of values/points that divide the dataset into groups of equal size.
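A minimal sketch of the quartile computation with numpy.quantile (it reuses X and iris from the loading sketch above):

# Sketch: quartiles (Q1, Q2/median, Q3) of each Iris attribute using numpy.quantile.
import numpy as np

quartiles = np.quantile(X, [0.25, 0.5, 0.75], axis=0)  # rows: Q1, Q2, Q3; columns: attributes
for name, (q1, q2, q3) in zip(iris.feature_names, quartiles.T):
    print("%-20s Q1=%.2f  Q2=%.2f  Q3=%.2f" % (name, q1, q2, q3))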
Precision: Precision is defined as the number of true positives divided by the number of true positives plus the
number of false positives. False positives are cases the model incorrectly labels as positive that are actually
negative.
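A minimal sketch of training a decision tree (the model used in this experiment) on a held-out split and reporting precision alongside other common metrics; it reuses X and y from the loading sketch above, and the split ratio and random_state are arbitrary choices, not values prescribed by the lab:

# Sketch: evaluate a decision tree with accuracy, precision, recall, and a confusion matrix.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="macro"))  # TP / (TP + FP), averaged over classes
print("Recall   :", recall_score(y_test, y_pred, average="macro"))     # TP / (TP + FN), averaged over classes
print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))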
Learning from experiment:- Using the Decision Tree Predictive Model for iris dataset, we analyzed the
dataset, found out quartiles and analyzed the performance of the model using various performance metrics using
Python language.
EXPERIMENT-7
OBJECTIVE: Validate the results obtained in experiment 6 using 10-fold cross-validation, hold-out validation, or leave-one-out cross-validation.
INTRODUCTION:-For the experiment we have used Iris dataset which can be downloaded from
https://archive.ics.uci.edu/ml/datasets/iris
In the case of hold-out validation, the dataset is randomly split into training and validation data. Generally, the share of training data is larger than that of the test data. The training data is used to induce the model, and the validation data evaluates the performance of the model.
The more data used to train the model, the better the model is; with the hold-out method, however, a good amount of data is withheld from training.
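A minimal sketch of hold-out validation on the Iris dataset with a decision tree (the 70/30 split and random_state are arbitrary assumptions):

# Sketch: hold-out validation of a decision tree on the Iris dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Hold-out accuracy:", accuracy_score(y_val, model.predict(X_val)))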
10-fold cross-validation
In k-fold cross-validation, the original dataset is equally partitioned into k subparts or folds. Out of the k folds or groups, for each iteration, one group is selected as validation data, and the remaining (k-1) groups are selected as training data.
The process is repeated k times until each group has been treated as validation data and the remaining groups as training data.
The final accuracy of the model is computed by taking the mean accuracy over the k validation folds.
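A minimal sketch of 10-fold cross-validation for the same kind of model (scikit-learn; the choice of classifier and random_state are assumptions):

# Sketch: 10-fold cross-validation; the final score is the mean over the 10 validation folds.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("Fold accuracies:", scores)
print("Mean accuracy  :", scores.mean())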
Leave-one-out cross-validation
For a dataset having n rows, the 1st row is selected for validation, and the rest (n-1) rows are used to train the model. For the next iteration, the 2nd row is selected for validation and the rest to train the model. The process is repeated in this way for n iterations.
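A minimal sketch of leave-one-out cross-validation (scikit-learn; same assumed classifier as above):

# Sketch: leave-one-out cross-validation; n iterations, one row held out each time.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=LeaveOneOut())
print("LOOCV mean accuracy:", scores.mean())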
OUTPUT:-
The dataset is evaluated using pre-defined models and techniques in the sklearn library of python.
RESULT:- Hold-out validation gives the best accuracy score (1.0), higher than 10-fold cross-validation and leave-one-out cross-validation.
Learning from experiment:- We have learned about hold-out validation, k-fold cross-validation, and leave-one-out cross-validation.
EXPERIMENT-8
OBJECTIVE:-Online loan system has two modules for the two basic services, namely Car loan service and
House loan service. The two modules have been named as Car_Loan_Module and House_Loan_Module.
Car_Loan_Module has 2000 lines of uncommented source code.
House_Loan_Module has 3000 lines of uncommented source code. Car_Loan_Module was completely
implemented by Mike. House_Loan_Module was completely implemented by John. Mike took 100 person hours
to implement the Car_Loan_Module. John took 200 person hours to implement the House_Loan_Module. Mike’s
module had 5 defects. John’s module had 6 defects. With respect to the context given, which among the following
is an INCORRECT statement?
Justify and choose one:
a) John’s Quality is better than Mike’s Quality
b) John’s Productivity is more than Mike’s Productivity
c) John introduced more defects than Mike
d) John’s Effort is more than Mike’s Effort.
NULL hypothesis: The null hypothesis states that a population parameter (such as the mean, the standard
deviation, and so on) is equal to a hypothesized value. The null hypothesis is often an initial claim that is based on
previous analyses or specialized knowledge.
Alternate hypothesis: The alternative hypothesis states that a population parameter is smaller, greater, or
different than the hypothesized value in the null hypothesis. The alternative hypothesis is what you might believe
to be true or hope to prove true.
CALCULATIONS:-
For John,
● Size = 3000 LOC
● Effort = 200 person-hours; Defects = 6
● Productivity = size/effort = 3000/200 = 15 LOC/person-hour
● Quality (defect density) = defects/size = 6 defects / 3000 LOC = 0.002 defects/LOC
For Mike,
● Size = 2000 LOC
● Effort = 100 person-hours; Defects = 5
● Productivity = size/effort = 2000/100 = 20 LOC/person-hour
● Quality (defect density) = defects/size = 5 defects / 2000 LOC = 0.0025 defects/LOC
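The same calculations can be expressed as a short Python sketch (a cross-check of the arithmetic above, not part of the original solution):

# Sketch: productivity and defect density (quality) for the two modules.
modules = {
    "Mike (Car_Loan_Module)":   {"loc": 2000, "effort": 100, "defects": 5},
    "John (House_Loan_Module)": {"loc": 3000, "effort": 200, "defects": 6},
}

for name, m in modules.items():
    productivity   = m["loc"] / m["effort"]    # LOC per person-hour
    defect_density = m["defects"] / m["loc"]   # defects per LOC (lower means better quality)
    print("%s: productivity = %.0f LOC/person-hour, defect density = %.4f defects/LOC"
          % (name, productivity, defect_density))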
(b) is the incorrect statement. Below is the reasoning.
(a) The quality of the code can be expressed in terms of defect density, i.e., the number of defects per line of code.
● Mike's defect density = 5/2000 = 0.0025 defects/SLOC
● John's defect density = 6/3000 = 0.0020 defects/SLOC
The higher the defect density, the lower the quality of the code. So, John's quality is better than Mike's. Hence, (a) is correct.
(b) Mike's productivity (20 LOC/person-hour) is higher than John's (15 LOC/person-hour), so the statement that John's productivity is more than Mike's is incorrect.
(c) John introduced 6 defects while Mike introduced 5. Clearly, John introduced more defects than Mike. Hence, (c) is correct.
(d) John's effort is 200 person-hours, while Mike's effort is 100 person-hours. Clearly, John's effort is more than Mike's. Hence, (d) is correct.
RESULT:- Mike's productivity (20 LOC/person-hour) is more than that of John (15 LOC/person-hour). Hence, option (b) is the INCORRECT statement.
EXPERIMENT-9
THEORY:-
1. NULL HYPOTHESIS: A null hypothesis is a type of hypothesis used in statistics that proposes that no
statistical significance exists in a set of given observations.
2. HYPOTHESIS TESTING: All hypothesis tests are conducted the same way. The researcher states a hypothesis
to be tested, formulates an analysis plan, analyzes sample data according to the plan, and accepts or rejects the
null hypothesis, based on results of the analysis.
PROCEDURE:-
State the hypotheses. Every hypothesis test requires the analyst to state a null hypothesis and an alternative
hypothesis. The hypotheses are stated in such a way that they are mutually exclusive. That is, if one is true, the
other must be false; and vice versa.
Formulate an analysis plan. The analysis plan describes how to use sample data to accept or reject the null
hypothesis. It should specify the following elements.
Significance level. Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any value
between 0 and 1 can be used.
Test method. Typically, the test method involves a test statistic and a sampling distribution. Computed from
sample data, the test statistic might be a mean score, proportion, difference between means, difference between
proportions, z-score, t statistic, chi-square, etc. Given a test statistic and its sampling distribution, a researcher can
assess probabilities associated with the test statistic. If the test statistic probability is less than the significance
level, the null hypothesis is rejected.
Analyze sample data. Using sample data, perform computations called for in the analysis plan.
Test statistics. When the null hypothesis involves a mean or proportion, use either of the following equations to
compute the test statistic.
Test statistic = (Statistic - Parameter) / (Standard deviation of statistic)
Test statistic = (Statistic - Parameter) / (Standard error of statistic)
where Parameter is the value appearing in the null hypothesis, and Statistic is the point estimate of Parameter. As
part of the analysis, you may need to compute the standard deviation or standard error of the statistic. Previously,
we presented common formulas for the standard deviation and standard error. When the parameter in the null
hypothesis involves categorical data, you may use a chi-square statistic as the test statistic. Instructions for
computing a chi-square test statistic are presented in the lesson on the chi-square goodness of fit test.
P-value. The P-value is the probability of observing a sample statistic as extreme as the test statistic, assuming the
null hypothesis is true.
Interpret the results. If the sample findings are unlikely, given the null hypothesis, the researcher rejects the null
hypothesis. Typically, this involves comparing the P-value to the significance level, and rejecting the null
hypothesis when the P-value is less than the significance level.
Type I and Type II errors can be defined in terms of hypothesis testing. A Type I error is the rejection of a true null hypothesis; a Type II error is the failure to reject a false null hypothesis.
Problem Taken:- Within a school district, students were randomly assigned to one of two Math teachers - Mrs.
Smith and Mrs. Jones. After the assignment, Mrs. Smith had 30 students, and Mrs. Jones had 25 students.
At the end of the year, each class took the same standardized test. Mrs. Smith's students had an average test score
of 78, with a standard deviation of 10; and Mrs. Jones' students had an average test score of 85, with a standard
deviation of 15.
Test the hypothesis that Mrs. Smith and Mrs. Jones are equally effective teachers. Use a 0.10 level of significance.
(Assume that student performance is approximately normal.)
OUTPUT / ANSWERS:-
State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis.
Null hypothesis: μ1 - μ2 = 0
Alternative hypothesis: μ1 - μ2 ≠ 0
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the difference
between sample means is too big or if it is too small.
Formulate an analysis plan. For this analysis, the significance level is 0.10. Using sample data, we will conduct
a two-sample t-test of the null hypothesis.
Analyze sample data. Using sample data, we compute the standard error (SE), the degrees of freedom (DF), and the t statistic (t).
SE = sqrt[ (s1^2/n1) + (s2^2/n2) ]
SE = sqrt[ (10^2/30) + (15^2/25) ] = sqrt(3.33 + 9)
SE = sqrt(12.33) = 3.51
DF = (s1^2/n1 + s2^2/n2)^2 / { [ (s1^2/n1)^2 / (n1 - 1) ] + [ (s2^2/n2)^2 / (n2 - 1) ] }
DF = (10^2/30 + 15^2/25)^2 / { [ (10^2/30)^2 / 29 ] + [ (15^2/25)^2 / 24 ] }
DF = (3.33 + 9)^2 / { [ (3.33)^2 / 29 ] + [ (9)^2 / 24 ] } = 152.03 / (0.382 + 3.375) = 152.03 / 3.757 = 40.47
t = [ (x1 - x2) - d ] / SE = [ (78 - 85) - 0 ] / 3.51 = -7/3.51 = -1.99
where s1 is the standard deviation of sample 1, s2 is the standard deviation of sample 2, n1 is the size of sample 1,
n2 is the size of sample 2, x1 is the mean of sample 1, x2 is the mean of sample 2, d is the hypothesized difference
between the population means, and SE is the standard error.
Since we have a two-tailed test, the P-value is the probability that a t statistic having 40 degrees of freedom is
more extreme than -1.99; that is, less than -1.99 or greater than 1.99.
We use the t Distribution Calculator to find P(t < -1.99) = 0.027, and P(t > 1.99) = 0.027. Thus, the P-value =
0.027 + 0.027 = 0.054.
Interpret results. Since the P-value (0.054) is less than the significance level (0.10), we reject the null hypothesis.
Type I error. The probability of a Type I error equals the significance level, which is 0.10 in this case.
Specifically, the approach is appropriate because the sampling method was simple random sampling, the samples
were independent, the sample size was much smaller than the population size, and the samples were drawn from a
normal population.
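The calculation above can be cross-checked in Python (a sketch assuming SciPy is available) with scipy.stats.ttest_ind_from_stats, which runs the two-sample t-test directly from the summary statistics; equal_var=False selects Welch's version, matching the DF formula used above:

# Sketch: Welch two-sample t-test from the summary statistics in the problem.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=78, std1=10, nobs1=30,   # Mrs. Smith's class
    mean2=85, std2=15, nobs2=25,   # Mrs. Jones' class
    equal_var=False)               # Welch's t-test, no equal-variance assumption

print("t = %.2f, p = %.3f" % (t_stat, p_value))  # expected: t of about -1.99, p of about 0.054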
Learning from experiment:- This experiment gives insights into the use of hypothesis testing on real-life examples.