Assignment Problems

The document discusses hypothesis testing concepts and provides examples involving pharmaceutical claims data, web page download times, bottling machine fill amounts, food store sales data, automobile fuel efficiency, and international student admission factors. The examples cover setting up null and alternative hypotheses, calculating test statistics and p-values, and determining if sample results are statistically significant.

Uploaded by Hari Haran

Hypothesis Testing:

1. A pharmaceutical company claims that four out of five doctors
prescribe the pain medicine it produces. If you wish to test this
claim, how would you set up the null and alternative hypotheses?

2. It is found that Web surfers will lose interest in a Web page if
downloading takes more than 12 seconds. If you wish to test the
effectiveness of a newly designed Web page in regard to its
download time, how will you set up the null and alternative
hypotheses?
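
One possible way to write the hypotheses for problems 1 and 2 is sketched below. The symbols are notation introduced here, not part of the assignment: p is the true proportion of doctors who prescribe the medicine, and \mu is the mean download time in seconds. The one-sided direction for problem 2 assumes the designer wants evidence that the new page downloads in under 12 seconds on average; other framings are defensible.

```latex
\begin{aligned}
\text{Problem 1:}\quad & H_0: p = 0.80   & \text{vs.}\quad & H_1: p \neq 0.80 \\
\text{Problem 2:}\quad & H_0: \mu \geq 12 & \text{vs.}\quad & H_1: \mu < 12
\end{aligned}
```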

3. A bottling machine is to be tested for the accuracy of the amount it
fills in 2-liter bottles. A random sample of 37 bottles is taken and the
contents are measured. The data are shown below. Conduct the
test at an alpha of 5%.
State the null and alternative hypotheses.

   1. Assume sigma = 1.8 cm³. What is the test statistic and what is
      its value? What is the p-value?
   2. Assume sigma is not known and the population is normal.
      What is the test statistic and what is its value?
      What is the p-value?
   3. Looking at the answers to parts 1 and 2, comment on any
      difference in the two results.

Sample Data (cm³)
2001.68
2000.98
1998.41
2000.34
2000.89
2001.07
1997.01
2000.34
1997.86
1998.43
1998.12
1997.85
2000.25
1997.65
2001.17
1997.44
1998.7
1998.67
1997.58
2000.28
1998.89
2000.13
2000.1
2000.39
2001.27
1998.98
2000.21
2000.36
2000.17
1998.67
2001.68
2000.76
1998.53
1998.24
1998.18
2000.67
2001.11
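
Parts 1 and 2 of problem 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a model answer: the null value of 2000 cm³ (2 liters) and a two-sided alternative are assumed, and `t_upper_tail` is a helper written here that integrates the Student's t density numerically so that only the standard library is needed.

```python
import math
from statistics import NormalDist, mean, stdev

# The 37 fill measurements from the sample-data listing above.
fills = [
    2001.68, 2000.98, 1998.41, 2000.34, 2000.89, 2001.07, 1997.01, 2000.34,
    1997.86, 1998.43, 1998.12, 1997.85, 2000.25, 1997.65, 2001.17, 1997.44,
    1998.70, 1998.67, 1997.58, 2000.28, 1998.89, 2000.13, 2000.10, 2000.39,
    2001.27, 1998.98, 2000.21, 2000.36, 2000.17, 1998.67, 2001.68, 2000.76,
    1998.53, 1998.24, 1998.18, 2000.67, 2001.11,
]

MU_0 = 2000.0   # assumed null value: 2 liters expressed in cm^3
SIGMA = 1.8     # population standard deviation given in part 1


def t_upper_tail(t, df, steps=20_000):
    """P(T > t) for Student's t with df degrees of freedom, for t >= 0.

    Integrates the density from 0 to t with Simpson's rule (steps must
    be even) and subtracts the result from 0.5.
    """
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


n = len(fills)
x_bar = mean(fills)

# Part 1: sigma is known, so the test statistic is a z-score.
z = (x_bar - MU_0) / (SIGMA / math.sqrt(n))
p_z = 2 * (1 - NormalDist().cdf(abs(z)))

# Part 2: sigma is unknown, so use the sample standard deviation
# and a t statistic with n - 1 degrees of freedom.
s = stdev(fills)
t = (x_bar - MU_0) / (s / math.sqrt(n))
p_t = 2 * t_upper_tail(abs(t), n - 1)

print(f"n={n}, mean={x_bar:.3f}, s={s:.3f}")
print(f"z={z:.3f}, two-sided p={p_z:.4f}")
print(f"t={t:.3f}, two-sided p={p_t:.4f}")
```

Comparing `p_z` and `p_t` against alpha = 0.05 gives the material for part 3: the two tests differ only in the standard deviation used and in the reference distribution.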

4. Average total daily sales at a small food store are known to be
$452.80. The store’s management recently implemented some
changes in displays of goods, order within aisles, and other
changes, and it now wants to know whether average sales
volume has changed. A random sample of 12 days shows
average sales of $501.90 and a standard deviation of
$65.00.
Using alpha = 0.05, is the sampling result significant? Explain.

5. An automobile manufacturer substitutes a different engine in
cars that were known to have an average miles-per-gallon rating
of 31.5 on the highway.
The manufacturer wants to test whether the new engine changes
the miles-per-gallon rating of the automobile model. A random
sample of 100 trial runs gives a mean of 29.8 miles per
gallon and a standard deviation of 6.6 miles per gallon.
Using the 0.05 level of significance, is the average
miles-per-gallon rating on the highway for cars using the new
engine different from the rating for cars using the old engine?
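
Problems 4 and 5 both give summary statistics rather than raw data, so one way to approach them is a two-sided one-sample t test with the rejection-region method. The sketch below assumes that approach. The function names are illustrative, and the critical values are the usual t-table entries for a two-sided test at alpha = 0.05 (2.201 for 11 degrees of freedom, about 1.984 for 99).

```python
import math


def t_statistic(sample_mean, mu_0, sample_sd, n):
    """One-sample t statistic computed from summary statistics."""
    return (sample_mean - mu_0) / (sample_sd / math.sqrt(n))


def reject_two_sided(t_stat, critical_value):
    """Reject H0 when |t| exceeds the two-sided critical value."""
    return abs(t_stat) > critical_value


# Problem 4: n = 12, so df = 11; table value t(0.025, 11) = 2.201.
t_sales = t_statistic(501.90, 452.80, 65.00, 12)
sales_changed = reject_two_sided(t_sales, 2.201)

# Problem 5: n = 100, so df = 99; table value t(0.025, 99) is about 1.984.
t_mpg = t_statistic(29.8, 31.5, 6.6, 100)
mpg_changed = reject_two_sided(t_mpg, 1.984)

print(f"sales: t = {t_sales:.3f}, reject H0: {sales_changed}")
print(f"mpg:   t = {t_mpg:.3f}, reject H0: {mpg_changed}")
```

With n = 100 in problem 5, a z critical value of 1.96 would lead to the same decision; the t value is used here only to keep the two problems parallel.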

6. Background and Objective:

Every year, thousands of international students apply for
admission to colleges in the USA. The Education Department must
repeatedly tally the applications received and compare that figure with
the number of applications accepted and visas
processed. To simplify this process, the education
department in the US analyzes the factors that influence the admission
of a student into college. The objective of this exercise is to carry out
that analysis.
Domain: Education
Dataset Description:

Attribute     Description

GRE           Graduate Record Exam scores.
GPA           Grade point average.
Rank          Prestige of the undergraduate institution. The variable
              takes the values 1 through 4. Institutions with a rank of 1
              have the highest prestige, while those with a rank of 4
              have the lowest.
Admit         Response variable (binary): 1 indicates that the student is
              admitted and 0 indicates that the student is not admitted.
SES           Socioeconomic status: 1 - low, 2 - medium, 3 - high.
Gender_male   0 -> Female, 1 -> Male.
Race          1, 2, and 3 represent Hispanic, Asian, and
              African-American.

Analysis Tasks: Analyze the historical data and determine the key
drivers for admission.

Predictive:
• Find the missing values (if any, perform missing value treatment).
• Find outliers (if any, perform outlier treatment).
• Find the structure of the data set and, if required, transform the
  numeric data type to factor and vice versa.
• Find whether the data is normally distributed. Use a plot
  to determine this.
• Normalize the data if it is not normally distributed.
• Use variable reduction techniques to identify significant
  variables.
• Run a logistic model to determine the factors that influence the
  admission process of a student (drop insignificant variables).
• Calculate the accuracy of the model and run validation
  techniques.
• Try other modelling techniques, such as decision tree and SVM, and
  select a champion model.
• Determine the accuracy rates for each kind of model.
• Select the most accurate model.
• Identify other machine learning or statistical techniques.
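
The first two predictive steps (missing values and outliers) can be illustrated with a small standard-library sketch. It assumes the 1.5 × IQR rule for flagging outliers, which the assignment does not prescribe, and the function names and example rows are hypothetical.

```python
from statistics import quantiles


def count_missing(rows, column):
    """Count rows whose value for `column` is absent or blank."""
    return sum(1 for row in rows if (row.get(column) or "").strip() == "")


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical rows shaped like csv.DictReader output; not real data.
rows = [{"gre": "380"}, {"gre": ""}, {"gpa": "3.1"}]
print(count_missing(rows, "gre"))

# Hypothetical numeric column with one extreme value.
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Whether a flagged value should be dropped, capped, or kept is a judgment call that depends on the variable, so the sketch stops at detection.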
Descriptive:
Categorize the grade point average into High, Medium, and Low
(with admission probability percentages) and plot it on a point chart.
The cross grid for admission variables with GRE categorization is shown
below:

GRE        Categorized

0-440      Low
440-580    Medium
580+       High
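
The GRE grid above can be turned into a small categorization helper. The sketch below is one possible reading: the grid does not say which band owns a boundary score, so it assumes that 440 and 580 fall into the higher band. The function names and the example records are hypothetical.

```python
def categorize_gre(score):
    """Map a GRE score to the Low / Medium / High bands of the grid.

    Assumption: a score exactly on a boundary (440 or 580) goes to the
    higher band, because the grid leaves the edges ambiguous.
    """
    if score < 440:
        return "Low"
    if score < 580:
        return "Medium"
    return "High"


def admission_rate_by_band(records):
    """Return {band: share admitted} for (gre_score, admit) pairs."""
    totals, admitted = {}, {}
    for score, admit in records:
        band = categorize_gre(score)
        totals[band] = totals.get(band, 0) + 1
        admitted[band] = admitted.get(band, 0) + admit
    return {band: admitted[band] / totals[band] for band in totals}


# Hypothetical records for illustration only, not rows from the dataset.
example = [(380, 0), (420, 1), (500, 0), (560, 1), (620, 1), (700, 1)]
print(admission_rate_by_band(example))
```

The same pattern would apply to the grade point average once its High, Medium, and Low cut-offs are chosen.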
The dataset is given below:

College_admission.csv

DESCRIPTION

Background and Objective:


The data give the details of third-party motor insurance claims in Sweden for the year 1977. In
Sweden, all motor insurance companies apply identical risk arguments to classify customers, and
thus their portfolios and their claims statistics can be combined. The data were compiled by a
Swedish Committee on the Analysis of Risk Premium in Motor Insurance. The Committee was asked
to look into the problem of analyzing the real influence on the claims of the risk arguments and to
compare this structure with the actual tariff.
Domain: Insurance
Dataset Description: 
The insurance dataset holds 7 variables; their descriptions are given below:

Attribute    Description

Kilometers   Kilometers travelled per year:
             1: < 1000
             2: 1000-15000
             3: 15000-20000
             4: 20000-25000
             5: > 25000
Zone         Geographical zone:
             1: Stockholm, Göteborg, and Malmö with surroundings
             2: Other large cities with surroundings
             3: Smaller cities with surroundings in southern Sweden
             4: Rural areas in southern Sweden
             5: Smaller cities with surroundings in northern Sweden
             6: Rural areas in northern Sweden
             7: Gotland
Bonus        No-claims bonus; equal to the number of years, plus one, since the last claim.
Make         1-8 represent eight different common car models. All other models are
             combined in class 9.
Insured      The number of insured in policy-years.
Claims       Number of claims.
Payment      The total value of payments in Skr (Swedish krona).

Analysis Tasks: After understanding the data, you need to help the committee with the following,
using the R tool:

• The committee wants to understand each field of the data through descriptive
  analysis, to gain basic insights into the data set and to prepare for further analysis.
• The total value of payments by an insurance company is an important factor to monitor,
  so the committee has decided to find out whether this payment is related to the number of claims and
  the number of insured policy-years. They also want to visualize the results for better understanding.
• The committee wants to identify the reasons insurance payments increase and
  decrease, so they have decided to find out whether distance, location, bonus, make, and insured
  amount or claims affect the payment, and whether all or only some of these do.
• The insurance company is planning to establish a new branch office, so they want
  to find out at which location, kilometre, and bonus level their insured amount, claims, and payment
  increase. (Hint: aggregate the dataset.)
• The committee wants to understand what affects claim rates so as to decide the right
  premiums for a given set of situations. Hence, they need to find out whether the insured amount, zone,
  kilometre, bonus, or make affects the claim rates, and to what extent.
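
The aggregation hint and the claim-rate task can be sketched as follows. The assignment asks for R; this Python version only illustrates the logic. It assumes the column names listed in the attribute table, defines the claim rate as claims per insured policy-year (an assumption), and uses hypothetical rows and function names.

```python
from collections import defaultdict


def aggregate_by(rows, key):
    """Sum Insured, Claims and Payment for each level of `key`."""
    totals = defaultdict(lambda: {"Insured": 0.0, "Claims": 0, "Payment": 0})
    for row in rows:
        group = totals[row[key]]
        group["Insured"] += row["Insured"]
        group["Claims"] += row["Claims"]
        group["Payment"] += row["Payment"]
    return dict(totals)


def claim_rate(group):
    """Claims per insured policy-year for one aggregated group."""
    return group["Claims"] / group["Insured"]


# Hypothetical rows for illustration only, not taken from the dataset.
rows = [
    {"Zone": 1, "Insured": 100.0, "Claims": 10, "Payment": 50000},
    {"Zone": 1, "Insured": 300.0, "Claims": 20, "Payment": 90000},
    {"Zone": 2, "Insured": 200.0, "Claims": 8, "Payment": 30000},
]
by_zone = aggregate_by(rows, "Zone")
for zone, group in sorted(by_zone.items()):
    print(zone, group, round(claim_rate(group), 4))
```

Passing "Kilometers" or "Bonus" as the key gives the other two groupings the branch-office task mentions; in R, `aggregate()` would play the same role.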

The dataset is:

Insurance_factor_identification.csv
