
All The Previous Questions


Debark University

College of Natural and Computational Science


Department of Statistics

Regression analysis questions

1. In a multiple regression analysis, if the model provides a poor fit, this indicates that:

A) The sum of squares for error will be large

B) The standard error of estimate will be large

C) The multiple coefficient of determination will be close to zero

D) All of the above

Answer: D

2. The problem of multi-collinearity arises when the:

A) Dependent variables are highly correlated with one another

B) Independent variables are highly correlated with one another

C) Independent variables are highly correlated with the dependent variable

D) None of the above

Answer: B

3. In simple linear regression, when β is not significantly different from zero, we conclude that:

A) X is a good predictor of Y

B) There is no linear relationship between X and Y

C) The relationship between X and Y is quadratic

D) There is no relationship between X and Y

Answer: B
Questions 4 – 10

A scientific foundation wanted to evaluate the relation between y= salary of researcher (in
thousands of dollars), x1= number of years of experience, x2= an index of publication quality,
x3=sex (M=1, F=0) and x4= an index of success in obtaining grant support. A sample of 35
randomly selected researchers was used to fit the multiple regression model. Part of the
computer output appears below.

Predictor   Coef        SE Coef     T       P
Constant    17.846931   2.001876    8.915   0.0001
Years        1.103130   0.359573    3.068   0.0032
Papers       0.321520   0.037109            0.0002
Sex          1.593400   0.687724    2.317   0.0083
Grants       1.288941   0.298479    4.318   0.0003

s = 1.75276 R-sq = 92.3% adj R-sq = 91.4%

4. The least squares line fitted to the data is:

A) Salary = 2.001 + 0.33 x1 + 0.04 x2 + 0.69 x3 + 0.30 x4 + ε

B) Salary = 17.85 + 1.10 x1 + 0.32 x2 + 1.59 x3 + 1.29 x4 + ε

C) Salary = 2.001 + 0.33 x1 + 0.04 x2 + 0.69 x3 + 0.30 x4

D) Salary = 17.85 + 1.10 x1 + 0.32 x2 + 1.59 x3 + 1.29 x4

Answer: D

7. The variable that helps the most in predicting salary is:

A) Intercept B) years C) papers D) sex E) grants


Answer: C

8. Which of the following gives a 95% CI for β1?

A) 17.847± t* (2.002) B) 17.847± t* (8.915)

C) 1.1031± t* (.3596) D) 1.1031 ± t* (3.068)

Answer: C

9. How many degrees of freedom does the t* value from the previous question have?

A) 34 B) 33 C) 30 D) 4

Answer: C
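
Questions 8 and 9 can be checked numerically. The sketch below uses the Years row of the output above together with n = 35 and four predictors from the problem statement. The critical value 2.042 is the usual t-table value for 30 degrees of freedom at the 0.975 level; it is hardcoded here as an assumption rather than computed, and all variable names are illustrative.

```python
# Values copied from the "Years" row of the regression output above.
coef_years = 1.103130
se_years = 0.359573

n = 35          # researchers in the sample
p = 4           # predictors: years, papers, sex, grants
df = n - p - 1  # residual degrees of freedom

# The T column is the coefficient divided by its standard error.
t_years = coef_years / se_years

# Two-sided 95% critical value for 30 df, taken from a t table (assumed, not computed).
t_star = 2.042

lower = coef_years - t_star * se_years
upper = coef_years + t_star * se_years

print(f"df = {df}, t = {t_years:.3f}")
print(f"95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```

The interval lies entirely above zero, which agrees with the small P-value reported for Years.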

10. According to the assumptions, what has to have a Normal distribution and constant variance?

A) The researchers B) the years C) the variables D) the salaries

Answer: D

11. Should anything be done to improve this model?

A) No, it has very good ANOVA p-value, R-sq and R-sq adjusted.

B) No, it has a lot of parameters so it does a good job of predicting sales.

C) Yes, not all the variables included in the model are good predictors of the dependent variable.

D) Yes, "price" and "display" should be taken out since they have negative coefficients

Answer: C

12. Both the prediction interval for a new response and the confidence interval for the mean
response are narrower when made for values of x that are:

A) Closer to the mean of the x’s B) further from the mean of the x’s

C) Closer to the mean of the y’s D) further from the mean of the y’s
Answer: A

13. In multiple regression with p predictor variables, when constructing a confidence interval
for any βi, the degrees of freedom for the tabulated value of t should be:

A) n-1 B) n-2 C) n-p-1 D) p-1

Answer: C

14. In a regression study, a 95% confidence interval for β1 was given as: (-5.65, 2.61). What
would a test for H0: β1 = 0 vs. Ha: β1 ≠ 0 conclude?

A) Reject the null hypothesis at α=0.05 and all smaller α

B) Fail to reject the null hypothesis at α=0.05 and all smaller α

C) Reject the null hypothesis at α=0.05 and all larger α

D) Fail to reject the null hypothesis at α=0.05 and all larger α

Answer: B

15. Which of the following criteria is the best for assessing the goodness of fit of a
multiple linear regression model?

A) Adjusted R2
B) R2
C) The intercept
D) The coefficient

Answer: A

16. Which of the following are correct?

A) The intercept/constant (β0) is the mean-Y when X=0


B) The intercept is the amount of change in mean-Y when X=0

C) The coefficient is mean-Y at a certain value of X

D) The coefficient (β) is the amount of change in mean-Y for every unit increase in the
dependent variables

Answer: A

17. What exactly does the least squares method do?

A) Minimizes the distance between the data points


B) Finds the least problematic regression line
C) Finds those (best) values of the intercept and slope that provide us with the smallest value
of the residual sum of squares
D) Finds those (best) values of the intercept and slope that provide us with the largest value
of the sum of residuals

Answer: C
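
The idea in option C can be illustrated with a short sketch. It uses the standard closed-form formulas for simple linear regression and a small made-up data set; the function names and the data are hypothetical and only serve to show that the fitted line has a smaller residual sum of squares than a nearby line.

```python
def residual_sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form intercept and slope that minimise the residual sum of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

b0, b1 = least_squares_line(xs, ys)
best_rss = residual_sum_of_squares(xs, ys, b0, b1)

# A slightly different line fits worse (larger residual sum of squares).
other_rss = residual_sum_of_squares(xs, ys, b0 + 0.5, b1 - 0.2)
print(b0, b1, best_rss, other_rss)
```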

20. Suppose the variable x2 has been omitted from the regression equation
Y = β0 + β1x1 + β2x2 + u, and β̃1 is the estimator of β1 obtained when x2 is omitted from the
equation. The bias in β̃1 is negative if _____

A) β2 > 0 and x1 and x2 are positively correlated

B) β2 < 0 and x1 and x2 are positively correlated

C) β2 = 0 and x1 and x2 are negatively correlated

D) β2 ≠ 0 and x1 and x2 are negatively correlated

Answer: B

21. Suppose the variable x2 has been omitted from the regression equation
Y = β0 + β1x1 + β2x2 + u, and β̃1 is the estimator of β1 obtained when x2 is omitted from the
equation. The bias in β̃1 is positive if _____

A) β2 > 0 and x1 and x2 are positively correlated

B) β2 < 0 and x1 and x2 are positively correlated

C) β2 = 0 and x1 and x2 are negatively correlated

D) β2 ≠ 0 and x1 and x2 are negatively correlated

Answer: A
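
The sign rule behind the two omitted-variable questions can be checked with a small simulation. When x2 is left out, the slope on x1 absorbs the coefficient of x2 multiplied by the slope of x2 on x1, so the bias takes the sign of that product. The sketch below is only an illustration: the true coefficients, the noise levels, the sample size and the seed are all hypothetical choices, not values from the questions.

```python
import random


def simple_slope(xs, ys):
    """OLS slope from regressing ys on xs alone (with an intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


def short_regression_bias(beta2, n=5000, seed=0):
    """Estimated bias in the x1 slope when x2 is left out of the model."""
    rng = random.Random(seed)
    beta0, beta1 = 1.0, 2.0  # hypothetical true values
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    # x2 is built to be positively correlated with x1.
    x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
    y = [beta0 + beta1 * a + beta2 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return simple_slope(x1, y) - beta1


bias_neg = short_regression_bias(beta2=-1.5)  # coefficient of x2 below zero
bias_pos = short_regression_bias(beta2=1.5)   # coefficient of x2 above zero
print(bias_neg, bias_pos)
```

With x1 and x2 positively correlated, the first bias comes out negative and the second positive, each close to 0.8 times the chosen coefficient of x2.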

23. In testing the validity of a multiple regression model, a large value of the F-test statistic
indicates that:

A) Most of the variation in the independent variables is explained by the variation in y

B) Most of the variation in y is explained by the regression equation

C) Most of the variation in y is unexplained by the regression equation

D) The model provides a poor fit

Answer: B

25. Starting with a model that contains all K candidate regressors and dropping the regressor with
the smallest F-statistic when it is below FOUT is called:

A) Forward selection

B) Backward selection

C) Stepwise selection

D) None

Answer: B
SQC Model Q & A
1. The no-inspection alternative of sampling is used when ______________
a. The supplier’s process is so good that defective units are never encountered
b. The supplier’s process is so bad that almost every unit is defective
c. The component is extremely critical
d. The component is moderately critical
Answer: a
Clarification: The no-inspection alternative is used when the supplier’s process is so good
that defective units are never encountered, which shows that the supplier’s process capability
is quite high.
2. The term AQL as used in sampling inspection, means
a. that level of lot quality for which there is a large risk of rejecting the lot.
b. the Average Quality Limit.
c. the maximum percent defective that can be considered satisfactory as a
process average.
d. the quality level.
3. Which of the following best describes what an operating characteristic curve shows?
a. The probability of accepting lots of various quality levels by sampling methods.
b. The operating characteristics of a machine.
c. How to operate a machine for best quality results.
d. The probability that a lot contains a certain number of rejects.
4. Which of these steps is not carried out when the process goes out of control while
using cusum control charts?
a) Search for an assignable cause
b) Taking corrective action
c) Restarting the control chart from zero
d) Continuing the control chart
Answer: d
Clarification: When an out-of-control situation is encountered while using cusum
control charts, we follow the same procedure as with Shewhart control charts: first
identify the assignable cause, then take corrective action, and then restart
the control chart from zero.
5. The primary use of a control chart is to
a. detect assignable causes of variation in the process.
b. detect non-conforming product.
c. measure the performance of all quality characteristics of a process.
d. detect the presence of random variation in the process.
6. Statistical quality control is best described as
a. keeping product characteristics within certain bounds.
b. calculating the mean and standard deviation.
c. the study of the characteristics of a product or process, with the help of
numbers, to make them behave the way we want them to behave.
d. the implementation of ISO 9000.
7. Degree to which design specifications are followed in manufacturing the product is called
a) Quality Control
b) Quality of conformance
c) Quality Assurance
d) None of the mentioned
Answer: b
8. When there are very tight specifications or overlapping assembly tolerance problems,
which control charts are used?
a) Attributes control charts
b) Variables control charts
c) Both, attributes control charts and the variables control charts can be used
d) Neither one of attributes or variables control charts can be used
Answer: b
Explanation: Variables control charts are favored when there are very tight
specifications or overlapping assembly tolerance problems. This is because variables
control charts deal better with specifications of quality characteristics than attributes
control charts.
9. 0.27 percent outside the normal tolerances can be obtained using ____________
a. 6-sigma both sides of mean
b. 3-sigma both sides of mean
c. 2-sigma both sides of mean
d. 8-sigma both sides of mean
Answer: b
Explanation: When the natural tolerance limits (NTLs) are μ±3σ, i.e. 3-sigma on both sides
of the mean of the variable, 99.73% of products fall within the limits. So 0.27% fall
outside the normal tolerances.
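
The 0.27% figure follows from the normal distribution and can be reproduced with the standard library. This is a minimal sketch that assumes the quality characteristic is normally distributed; the function name is illustrative.

```python
import math


def fraction_outside(k):
    """Share of a normal distribution lying more than k standard deviations from the mean."""
    inside = math.erf(k / math.sqrt(2))
    return 1 - inside


for k in (2, 3, 6):
    print(f"{k}-sigma limits: {100 * fraction_outside(k):.4f}% outside")
```

For k = 3 the result is about 0.27%, while 2-sigma limits leave roughly 4.55% outside.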
10. The basic requirement for the success of any quality control plan is the availability of
experienced, knowledgeable and trained personnel at all levels.
a) True
b) False
Answer: a
Clarification: The availability of experienced, knowledgeable and trained personnel at all
levels is the most basic requirement for the success of any quality control plan.
11. Which of these is an advantage of variable control chart?
a) Numerous quality characteristics considered at a time
b) To achieve the information very easily about the mean and variability
c) To have analyses of units nonconforming
d) To analyze the defects in one unit
Answer: b
Explanation: The main advantage of variables control charts is that they provide much
useful information about process performance. They give more functional
information about the process mean and variability.
12. Double-sampling plan is __________
a. Only 2 units are checked
b. Only the first and last lot is checked 100%
c. Only two samples of n units are checked (necessarily)
d. Only two samples of n units are checked (conditionally)
Answer: d
Clarification: In the double-sampling plan, at most two samples of n units are
checked. The lot disposition is based upon the information from the first and
second samples, but the second sample is not always needed to make a decision.
13. Once a set of reliable control limits is obtained, we use the control chart for monitoring
future production. This is called __________
a. Phase I control chart usage
b. Phase II control chart usage
c. Phase III control chart usage
d. Phase IV control chart usage
Answer: b
Explanation: The use of reliable control limits to monitor the future production from a
process, is generally mentioned as the Phase II application of control chart while, setting
the trial control limits to monitor the process is called the Phase I application.
14. What type of variation occurs when a process is in control:
a. Variable
b. Attribute
c. Common Cause
d. Special Cause
e. None of the above
Answer: c
15. The control chart used to inspect the process state by using the average number of
nonconformities per unit data, is called _______________
a) u-chart
b) c-chart
c) p-chart
d) R-chart
Answer: a
Explanation: The control chart that uses the average number of nonconformities
per unit to analyze the process state is generally termed the control chart for the
average number of nonconformities per unit (u-chart).
16. There are necessarily 2 samples of n units taken and checked in the case of a double
sampling plan.
a. True
b. False
Answer: b
Clarification: In the double sampling plan, it is not always necessary to take two
samples. Sometimes we can accept or reject the lot based upon the information
from the first sample.
17. What is done in single sampling plan?
a. Only one unit is checked
b. Only the first lot is checked 100%
c. Only n samples of 1 unit are checked
d. Only one sample of n units is checked
Answer: d
Clarification: A single-sampling plan is a lot-sentencing procedure in which one sample of
n units is selected at random from the lot and checked. The lot is sentenced based upon only
this sample.
18. Which of these does not require sampling documentation at all?
a. 0% sampling
b. 100% inspection
c. Acceptance inspection
d. 50% inspection
Answer: a
Clarification: As there is no sampling done in the case of 0% inspection or
sampling, there is no need to do the documentation for the same. So it doesn’t
require any sampling documentation at all.
19. Which of these is not used for a lot quality inspection purposes?
a. EWMA Control chart
b. Cusum chart
c. Shewhart control charts
d. Acceptance Sampling
Answer: d
Clarification: The acceptance sampling procedure is used to decide whether to
accept or reject a lot. It can’t be used as a lot quality estimator.
20. Reliability prediction is
a. the process of estimating performance.
b. the process of estimating the probability that a product will perform its
intended function for a stated time.
c. the process of telling "how you can get there from here."
d. All of the above.
21. A set of components has a MTBF of 1000 hours. What percentage will fail if the
components are tested for 500 hours?
a. 25%
b. 39%
c. 61%
d. 50%
22. What is the reliability of a system at 850 hours, if the average usage on the system was
400 hours for 1650 items and the total number of failures was 145? Assume an
exponential distribution.
a. 0%
b. 36%
c. 18%
d. 83%
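
Questions 21 and 22 can be worked with the exponential reliability function R(t) = exp(−λt). Question 22 states the exponential assumption; applying it to question 21 as well is an assumption of this sketch, as is estimating the failure rate as the number of failures divided by total unit-hours. The function and variable names are illustrative.

```python
import math


def reliability(hours, failure_rate):
    """Probability of surviving `hours` under a constant failure rate (exponential model)."""
    return math.exp(-failure_rate * hours)


# Question 21: MTBF of 1000 hours, test lasting 500 hours.
fraction_failed = 1 - reliability(500, 1 / 1000)

# Question 22: 145 failures over 1650 items x 400 hours of average usage.
failure_rate = 145 / (1650 * 400)
reliability_850 = reliability(850, failure_rate)

print(f"failed by 500 h: {fraction_failed:.1%}")
print(f"reliability at 850 h: {reliability_850:.1%}")
```

Under these assumptions about 39% of components fail within 500 hours, and the system reliability at 850 hours is about 83%.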

23. Which of these is best described by the following statement?
“The rejection of entire lots as opposed to the simple return of defectives often provides
stronger motivation to the supplier for quality improvements.”
a. 0% sampling
b. 100% sampling
c. Random % of the lot sampling
d. Acceptance sampling
Answer: d
Clarification: In acceptance sampling, a lot that fails inspection is rejected and
returned to the supplier as a whole, rather than only the defective units being
returned. This motivates the supplier to improve quality.
24. The Shewhart control charts for sample size n=1 are called ________
a. Single Sample control charts
b. Stationary control charts
c. Control charts for zero variance
d. Control charts for individual measurements
Answer: d
Explanation: The Shewhart control charts for sample size n=1 are called control
charts for individual measurements.
25. Which of these is not a major use of the PCA (Process Capability Analysis)?
a. Prediction of how well the process will hold the tolerances
b. Reduction of variability in process
c. Establishing an interval between sampling
d. Stating the need of the Acceptance sampling
Answer: d
Explanation: PCA or the Process Capability Analysis is used to predict how well
the process will hold the tolerance and to reduce the variability. It also gives us
an interval between sampling.
Statistical computing I

1. Which one of the following is a use of SPSS?
A. Manipulate data
B. Produce descriptive and inferential statistics
C. Produce tables and graphs
D. All
2. Which one is a key feature of SPSS?
A. Spreadsheet view of data
B. Good graphic capability
C. Easy to document
D. All
3. Which one is not a type of window in SPSS?
A. Output viewer
B. Data editor
C. Syntax editor
D. None
4. Which one is true about the data editor window?
A. Used for entering data
B. Used to display results
C. Used to edit results
D. All
5. In the variable view window, you can do the following, except
A. Choose nominal, ordinal, or scale data
B. Enter data
C. Specify the type of variable
D. Create a new variable
6. Variable naming in SPSS requires the following, except
A. Names are case sensitive
B. Names cannot contain spaces
C. Names do not end with a period
D. Names should be up to 8 characters long
7. Which variable attribute is used to set the level of measurement of the variable?
A. Align
B. Type
C. Label
D. Measure
8. In the output viewer
A. Only tables are displayed
B. Only charts are displayed
C. All results are displayed
D. None
9. In the chart editor window you can modify high-resolution
A. Charts
B. Plots
C. Tables
D. All
10. Of the variable attributes, ______ is used to describe the variable.
A. Label
B. Align
C. Measure
D. Type
11. Which of the following is false about cross tabulation? It is used to
A. Examine the relationship between two continuous variables
B. Test for significant differences in column proportions in the cross tabulation table
C. Test independence and measures of association for nominal and ordinal data
D. All
12. Which menu would you select to run statistical procedures?
A. Graph menu
B. Analyze menu
C. Data menu
D. Transform menu
13. Which of the following variable names will be rejected by SPSS?
A. @edu
B. with
C. Age
D. sex
14. How does a variable name differ from a variable label?
A. Shorter and less detailed
B. Longer and more detailed
C. Abstract and unspecific
D. Refers to codes rather than variables
15. How would you use the drop-down menus in SPSS to generate frequency tables?
A. Analyze, descriptive, descriptive
B. Analyze, descriptive, frequency
C. Analyze, frequency, pearson
D. All
16. In which sub-dialog box can the chi-square test be found?
A. Frequency, percentage
B. Cross tab, statistics
C. Bivariate, pearson
D. Gender, female
17. In assigning variable names, the variable names in a given file must be
A. Unique
B. Duplication allowed
C. Duplication not allowed
D. Except B
18. In the data view,
A. Columns are cases
B. Rows are variable names
C. Columns are variable names
D. None
19. In SPSS, what is the data viewer?
A. A table summarizing the frequency of data for one variable
B. A spreadsheet into which data can be entered
C. A dialog box that allows you to choose a statistical test
D. A screen in which variables can be defined and labeled
20. Which SPSS window is useful for entering data?
A. Data view
B. Variable view
C. Output view
D. Syntax editor
21. When SPSS is first opened, at the top of the screen is the
A. Status bar
B. Menu bar
C. Syntax window
D. Output window
22. When SPSS is first opened, at the bottom of the screen is the
A. Status bar
B. Menu bar
C. Syntax window
D. Output window
23. The paired t-test is used to test statistical significance between ____ population means.
A. Three
B. One
C. Two
D. Five
24. The ________ provides extensive facilities for editing contents
A. Numerical variable
B. Categorical variable
C. Syntax editor
D. Output viewer
25. A graphical user interface makes SPSS
A. Complicated
B. Simple
C. Easy
D. B&C
Computing II
1. Which of the following is not true about the R Console window?
a. used to write R commands
b. displays all the commands R has run
c. displays all the results
d. opens automatically when you create a graph
2. R functionality is divided into a number of ________
a. Packages
b. Functions
c. Domains
d. Classes
3. Which of the following is the default prompt for the UNIX environment in R?
a. >
b. <
c. <<
d. >>
4. If a command is not complete at the end of a line, R will give a different prompt; by
default it is ____________
a. +
b. ?
c. /
d. –
5. The primary R system is available from the ______
a. CRDO
b. CRAN
c. CRWO
d. GNU
6. Point out the wrong statement.
a. A key feature of R is that its syntax is very similar to S
b. R runs only on the Windows computing platform and operating system
c. R has been reported to be running on modern tablets, phones, PDAs, and game
consoles
d. R functionality is divided into a number of packages
7. A matrix is a ___-dimensional rectangular data set.
a. 2
b. 3
c. 4
d. 1
8. Data frames can be converted to a matrix by calling data._______
a. matr()
b. mat()
c. matrix()
d. none
9. R is an __________ programming language.
a. Closed source
b. Open source
c. Definite source
d. GPL
10. Which function is used to create a vector with more than one element in R?
a. library()
b. plot()
c. c()
d. par()
11. Which of the following is an example of a vectorized operation as far as subtraction is
concerned in R?
a. x+y
b. x-y
c. x*y
d. x/y
12. What would be the output of the following code?
> x <- 1:4
> z <- x^2
> z
a. 1 2 3 4
b. 16 9 4 1
c. 1 4 9 16
d. all
13. What would be the output of the following code?
> x <- 1:4
> x > 2
a. 1 2 3 4
b. 4 3 2 1
c. FALSE FALSE TRUE TRUE
d. TRUE TRUE FALSE FALSE
14. Which one is used to quit the R program?
a. q()
b. qt()
c. stop()
d. none

15. The R language is a dialect of which of the following languages?
a. S
b. C
c. SAS
d. MATLAB
16. Which one of the following graphical parameters is used to change line width?
a. lty
b. col
c. lwd
d. font
17. If x is a time series, which of the following plot functions is used to plot the time series
data in R?
a. plottime(x)
b. plot(x)
c. series(x)
d. plot(x,y)
18. Which of the following steps is typically used to generate reports and graphs in SAS?
a. DATA
b. PROC
c. REPORT
d. RUN
19. Which of the following windows is not a SAS programming window?
a. Editor window
b. Log window
c. Output window
d. Chart window
20. Which PROC statement is used to calculate the mean of a data set in SAS?
a. PROC CONTENTS
b. PROC PRINTS
c. PROC MEANS
d. PROC REG

21. t.test(x, y, alternative="less", var.equal=TRUE) is used to calculate _______
a. paired t-test
b. one sample t-test
c. regression
d. independent t-test
22. A linear regression of Y on X1, X2, …, Xp is executed by the following command
a. Fitted.model<-lm(Y~X1+X2)
b. Fitted.model<-lm(Y~X1+X2+…..+Xp)
c. Fitted.model<-(Y~X1+X2)
d. Fitted.model<-(Y~X1+X2+……+Xp)
23. Which R function is used to store a categorical variable compactly?
a. vector
b. array
c. factor
d. data frames
24. Which R function is used to display the structure of each column of a data frame?
a. structure()
b. str()
c. print()
d. seq()
25. Which of the following R windows is used to display graphs?
a. R console
b. R graphics
c. R editor
d. R output

CDA Model Questions


1. In the United States, the estimated annual probability that a woman over the age of 35
dies of lung cancer equals 0.001304 for current smokers and 0.000121 for nonsmokers.
Which is the correct interpretation of the odds ratio?
a. For women over the age of 35, the odds of dying of lung cancer for smokers are 10.7896
times the odds for nonsmokers
b. The probability of dying of lung cancer for smokers is 10.7769 times that for nonsmokers
c. Women who smoke and are over the age of 35 are less likely to die of lung cancer
than women who do not smoke and are over the age of 35
d. For women over the age of 35, the odds of dying of lung cancer for smokers are 0.5836
times the odds for nonsmokers
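
The two numbers that appear in the options, 10.7896 and 10.7769, can be reproduced from the probabilities in the question. The sketch below is a plain calculation; the helper name `odds` is illustrative.

```python
p_smoker = 0.001304     # annual probability of death, current smokers
p_nonsmoker = 0.000121  # annual probability of death, nonsmokers


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


relative_risk = p_smoker / p_nonsmoker
odds_ratio = odds(p_smoker) / odds(p_nonsmoker)

print(f"relative risk = {relative_risk:.4f}")
print(f"odds ratio    = {odds_ratio:.4f}")
```

Because both probabilities are very small, the odds ratio and the relative risk are nearly equal, but they remain different quantities with different interpretations.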
2. Which one of the following is false?
a. Logistic regression for binary Y is a special case of the baseline-category logit and
cumulative logit model with J = 2.
b. The cumulative logit model for J response categories corresponds to a logistic
regression model holding for each of the J-1 cumulative probabilities, such that
the curves for each cumulative probability have exactly the same shape (i.e., the
same β parameter); that is, they increase or decrease at the same rate, so one can
use β to describe effects that apply to all J-1 of the cumulative probabilities.

c. In the example just mentioned, at the lowest cholesterol level, the observed
number of heart disease cases equals 31. The standardized residual equals 1.35.
This means that the model predicted 29.65 cases (i.e., 1.35 = 31- 29.65).
d. Suppose that income (high, low) and gender are conditionally independent,
given type of job (secretarial, construction, service, professional, etc.). Then,
income and gender are also independent in the 2 × 2 marginal table (i.e.,
ignoring, rather than controlling, type of job).
3. FBI website (www.fbi.gov) stated that of all blacks slain in 2005, 91% were slain by
blacks, and of all whites slain in 2005, 83% were slain by whites. Let Y denote race of
victim and X denote race of murderer. Which conditional distribution do these statistics
refer to?

a. Y given X
b. X given Y
c. p(Y/X)
d. P(X/Y)
4. For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender
(female, male) and survival (yes, no) was 11.4. Based on this, the relative risk
interpretation is:
a. The probability of survival for females was 11.4 times that for males
b. The odds of survival for females were 11.4 times the odds of survival for males
c. The probability of survival for females was 3.6667 times that for males
d. The odds of survival for females were higher than the odds of survival for males
5. Based on the estimated expected frequencies and standardized residuals in Table 1 below,
which one is the interpretation of the standardized residuals in the corner cells having counts
21 and 83?
a. The original table has large negative residuals for subjects who have above
average income and are pretty happy, and subjects who have below average income
and are very happy. Thus, there were fewer subjects of these types than the
hypothesis of independence predicts.
b. The original table has large positive residuals for subjects who have average
income and are very happy, and subjects who have below average income and are not
too happy. Thus, there were more subjects of these types than the hypothesis of
independence predicts.
c. The original table has large negative residuals for subjects who have above
average income and are not too happy, and subjects who have below average
income and are very happy. Thus, there were fewer subjects of these types than
the hypothesis of independence predicts.
d. The original table has large positive residuals for subjects who have above
average income and are very happy, and subjects who have below average income
and are not too happy. Thus, there were more subjects of these types than the
hypothesis of independence predicts.

Table 1:

6. Which one of the following is false?


a. In 2 × 2 tables, statistical independence is equivalent to a population odds ratio value
of θ = 1.0.
b. We found that a 95% confidence interval for the odds ratio relating having a heart
attack (yes, no) to drug (placebo, aspirin) is (1.44, 2.33). If we had formed the table
with aspirin in the first row (instead of placebo), then the 95% confidence interval
would have been (1/2.33, 1/1.44) = (0.43, 0.69).
c. Using a survey of college students, we study the association between opinion
about whether it should be legal to (1) use marijuana, (2) drink alcohol if you
are 18 years old. We may get a different value for the odds ratio if we treat
opinion about marijuana use as the response variable than if we treat alcohol use
as the response variable.
d. Interchanging two rows or interchanging two columns in a contingency table has no
effect on the value of the X² or G² chi-squared statistics. Thus, these tests treat both
the rows and the columns of the contingency table as nominal scale, and if either or
both variables are ordinal, the test ignores that information.
7. A difference between logit and loglinear models is
a. When both are fitted to a contingency table having 100 cells, the logit model treats the
cell counts as 25 binomial observations whereas the loglinear model treats the cell
counts as 50 Poisson observations.
b. The logit model is a generalized linear model assuming a binomial random
component whereas the loglinear model is a generalized linear model assuming a
Poisson random component.
c. When both are fitted to a contingency table having 50 cells, the logit model treats the
cell counts as 25 binomial observations whereas the loglinear model treats the cell
counts as 50 Poisson observations.
d. b & c
8. Of smokers who get lung cancer, “women were 1.7 times more vulnerable than men to
get small-cell lung cancer.” The value 1.7 is a(n)
a. odds ratio b. difference of proportions c. relative risk d. odds of smoker
9. Based on the table below, which one of the following is false?
a. logit(π̂(x)) = −1.0736 − 0.7195 AZT + 0.055 Race is the logit model
b. The estimated odds of developing AIDS symptoms for those who received AZT treatment are
0.49 times the estimated odds for those who did not receive AZT treatment.
c. The effects of race and AZT are not significant
d. none
Table 2:
Computer output for logit model with AIDS Symptoms Data
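
The figure 0.49 in option b is the exponential of the AZT coefficient quoted in option a. The sketch below reproduces it and also evaluates one fitted probability. The 0/1 coding of AZT and Race is an assumption, since the output table itself is not reproduced here, and the function name is illustrative.

```python
import math

# Coefficients as quoted in option a of question 9.
intercept = -1.0736
beta_azt = -0.7195
beta_race = 0.055

# Multiplicative effect of AZT on the odds of symptoms, holding race fixed.
odds_ratio_azt = math.exp(beta_azt)


def estimated_probability(azt, race):
    """Inverse logit of the linear predictor (azt and race assumed coded 0/1)."""
    eta = intercept + beta_azt * azt + beta_race * race
    return 1 / (1 + math.exp(-eta))


print(round(odds_ratio_azt, 2))
print(round(estimated_probability(azt=1, race=1), 3))
```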

Use the following information to answer questions 10, 11 & 12.


An article in the New York Times (February 17, 1999) about the PSA blood test for
detecting prostate cancer stated that, of men who had this disease, the test fails to detect
prostate cancer in 1 in 4 (so called false-negative results), and of men who did not have it,
as many as two-thirds receive false-positive results. Let C (C¯) denote the event of
having (not having) prostate cancer and let + (-) denote a positive (negative) test result.
10. Which one of the following is true
a. P(-|C) = 1/4 & P(C¯|+) = 2/3
b. P(+|C¯) = 2/3 & P(C|-) = 1/4
c. P(-|C) = 1/4 & P(+|C¯) = 2/3
11. The sensitivity of this test is
a. 0.3325 b. 0.66 c. 0.75 d. 0.5
12. The probability that a man has prostate cancer given that he tested positive is
a. 0.01124 b. 0.3325 c. 0.01 d. 0.0075
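
Questions 11 and 12 are an application of Bayes' theorem. The sketch below uses the false-negative and false-positive rates from the article. The prevalence P(C) = 0.01 is an assumption made here for illustration, because the text above does not state it; with a different prevalence the result for question 12 changes.

```python
false_negative_rate = 1 / 4  # P(-|C), from the article
false_positive_rate = 2 / 3  # P(+|not C), from the article
prevalence = 0.01            # ASSUMPTION: P(C) is not given in the text above

sensitivity = 1 - false_negative_rate  # P(+|C)

# Total probability of a positive test, then Bayes' theorem for P(C|+).
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_cancer_given_positive = sensitivity * prevalence / p_positive

print(f"sensitivity = {sensitivity:.2f}")
print(f"P(C|+)      = {p_cancer_given_positive:.5f}")
```

Under the assumed prevalence, a positive result still leaves only about a 1.1% chance of disease, because false positives among healthy men far outnumber true positives.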

13. For the 23 space shuttle flights before the Challenger mission disaster in 1986, Table 4.10
shows the temperature (°F) at the time of the flight and whether at least one primary
O-ring suffered thermal distress. Which is false?
a. The estimated odds of thermal distress multiply by 0.79 for each 1° increase in
temperature.
b. At temperature = 31, π̂ = 0.543
c. Temperature is significant
d. Logit model: logit(π̂) = 15.043 − 0.232x
Table 3:
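
The statements in question 13 can be checked against the fitted equation quoted in option d. The sketch below evaluates the odds multiplier per degree and the fitted probability at 31°F; it takes the two coefficients from option d as given, and the function name is illustrative.

```python
import math

alpha = 15.043  # intercept quoted in option d
beta = -0.232   # temperature coefficient quoted in option d


def p_distress(temperature):
    """Fitted probability of thermal distress from the logit equation."""
    eta = alpha + beta * temperature
    return 1 / (1 + math.exp(-eta))


odds_multiplier = math.exp(beta)  # change in odds per 1 degree increase

print(f"odds multiplier per degree: {odds_multiplier:.2f}")
print(f"fitted probability at 31 degrees: {p_distress(31):.4f}")
```

The odds multiplier is about 0.79, and the fitted probability at 31°F is very close to 1, far from 0.543.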

14. Consider the happiness categories (not, pretty, very) and the income categories (below average
income, average income, above average income). Table 4 shows output for a baseline-category
logit model with very happy as the baseline category and scores {1, 2, 3} for the income
categories. Which is false?
a. The estimated odds of being in the lower category (less happy) decreases as
income increases.
b. model fits adequately
c. the probability that a person with average family income reports a very happy
marriage is 0.61
d. none
Table 4. Output on Modeling Happiness

15. A multinomial distribution has several categories; the distribution of the count in each
category is
a. Poisson
b. binomial
c. normal
d. multinomial
16. Table 5 shows output for a cumulative logit model with scores {1, 2, 3} for the income
categories. Why does the output report two intercepts but only one income effect? With 3
response categories, there are two cumulative probabilities to model, and hence two intercept
parameters. The proportional odds form of the model has the same predictor effects for each
cumulative probability (i.e., the curves have the same shape), so only one effect is
reported for income.
a. use one category as reference
b. With 3 response categories, there are two cumulative probabilities to model
c. income is ordinal category
d. all
Table 5:

17. A third variable is introduced in a two-variable table to
a. Refine the association that was observed originally between the two variables.
b. Show that there was no association between the original two variables.
c. Reveal an association between the two original variables although initially no
relationship was found between them.
d. All of the above are true
18. Let Y = political ideology (on an ordinal scale from 1 = very liberal to 5 = very
conservative), x1 = gender (1 = female, 0 = male), x2 = political party (1 = Democrat, 0 =
Republican). How many cumulative logits does the model have?
a. 1 b. 3 c. 4 d. 5 e. 2
Use the following information to answer questions 19 and 20
Suppose that customers enter a waiting line at random at a rate of 4 per minute.
19. The number of customers entering the line during a given time interval is assumed to have a
a. binomial distribution b. Poisson distribution c. geometric distribution e. none
20. The probability that at least one customer enters during a given half-minute time interval is
a. 0.14 b. 0.6 c. 0.87 d. 0.5
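A short numerical check for question 20, assuming arrivals follow a Poisson process at the stated rate of 4 per minute (variable names are illustrative only):

```python
import math

RATE_PER_MINUTE = 4.0    # arrival rate given in the problem
INTERVAL_MINUTES = 0.5   # half-minute window

# Poisson mean for the window
mean_arrivals = RATE_PER_MINUTE * INTERVAL_MINUTES

# P(at least one arrival) = 1 - P(no arrivals) = 1 - exp(-mean)
p_at_least_one = 1.0 - math.exp(-mean_arrivals)

print(round(p_at_least_one, 4))
```

The result is about 0.865, which is closest to option c.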
21. Which one of the following is false?
a. One reason it is usually wise to treat an ordinal variable with methods that use the
ordering is that in tests about effects, chi-squared statistics have smaller df values, so
it is easier for them to be farther out in the tail and give small P-values; that is, the
ordinal tests tend to be more powerful.
b. The cumulative logit model assumes that the response variable Y is ordinal; it should
not be used with nominal variables. By contrast, the baseline category logit model
treats Y as nominal. It can be used with ordinal Y, but it then ignores the ordering
information.

c. If political ideology tends to be mainly in the moderate category in New Zealand
and mainly in the liberal and conservative categories in Australia, then the
cumulative logit model with proportional odds assumption should fit well for
comparing these countries.
d. none
22. For General Social Survey data on Y = political ideology (categories liberal, moderate,
conservative), x1 = gender (1 = female, 0 = male), and x2 = political party (1 = Democrat, 0 =
Republican), the ML fit of the cumulative logit model is logit[P(Y ≤ j)] =
. Hence, for each gender, according to this model
a. The estimated odds that a Democrat's response is liberal rather than moderate or
conservative are 2.6 times the estimated odds for a Republican.
b. The estimated odds that a Democrat's response is liberal or moderate rather than
conservative are 2.6 times the estimated odds for a Republican.
c. The estimated odds that a liberal response comes from a Democrat rather than a
Republican are 2.6 times the estimated odds for a conservative response.
d. a & b
23. If X and Y are binary, and Z has K categories, so the data can be summarized in a
2 × 2 × K contingency table, one can test conditional independence of X and Y,
controlling for Z, using
a. likelihood-ratio test
b. Pearson’s chi-square test
c. standardized residual test
d. all
24. Which one of the following is false?
Let π denote the probability that a randomly selected respondent supports current laws
legalizing abortion, predicted using gender of respondent (G = 0, male; G = 1, female),
religious affiliation (R1 = 1, Protestant, 0 otherwise; R2 = 1, Catholic, 0 otherwise;
R1 = R2 = 0, Jewish), and political party affiliation (P1 = 1, Democrat, 0 otherwise; P2 = 1,
Republican, 0 otherwise; P1 = P2 = 0, Independent). The logit model with main effects
has prediction equation
logit(π̂) = 0.11 + 0.16G - 0.57R1 - 0.66R2 + 0.47P1 - 1.67P2
For this prediction equation,
a. Females are estimated to be more likely than males to support legalized abortion,
controlling for religious affiliation and political party affiliation.
b. Controlling for gender and religious affiliation, the estimated odds that a Democrat
supports legalized abortion equal times the estimated odds that a
Republican supports legalized abortion.
c. The estimated probability that a male Jewish Independent supports legalized
abortion equals

d. The estimated probability of supporting legalized abortion is highest for female
Jewish Independents.
25. Which of the following is not applicable as a significance test for a parameter in a
statistical model when the sample size is less than 30 (n < 30)?
a. Wald test
b. Pearson chi-square test
c. score test
d. likelihood ratio test

Basic statistics
1. You asked ten of your classmates about their age. On the basis of this information, you stated
that the average age of all students in Debark University is 25 years. This is an example of:

A, Descriptive statistics

B, Inferential Statistics

C, Parameter

D, Population

2. In statistics, a population consists of:

A, All People living in the area under study

B, All People living in a country

C, All subjects or objects whose characteristics are being studied

D, None of the above

3. Parameter is a measure which is computed from

A, Sample data

B, Population data

C, Test statistics

D, None of these

4. A variable that assumes any value within a range is called

A, Discrete variable

B, Dependent variable

C, Independent variable

D, Continuous variable

5. The data collected by the researcher himself/herself is called

A, Discrete data

B, Primary data

C, Inferential data

D, Secondary data

6. Which one of the following is incorrect about graphical presentation of data?

A, It is easier to understand

B, It can present the data in a simple way

C, It is the most popular way of describing the data

D, It is used for prediction

E, None

7. If there is no gap between the consecutive classes, the limits are called

A, Class limits

B, Class intervals

C, Class boundaries

D, Class marks

8. Data must be arranged in either ascending or descending order if one wants to compute the

A, Mode

B, Mean

C, Geometric Mean

D, Median

9. The measure of location which is the most likely to be influenced by extreme values in the
data set is

A, range

B, median

C, mode

D, mean

10. Which is the appropriate measure for finding the average speed of a car?

A, Mean

B, Geometric Mean

C, Harmonic Mean

D, Weighted Mean

11. The sum of squares of deviations of the values is least when deviations are taken from the

A, Mean

B, Median

C, Mode

D, Harmonic mean

12. The mean of a distribution is 10, the median is 15, and the mode is 20. It is most likely that
this distribution is:

A, Positively Skewed

B, Negatively Skewed

C, Symmetrical

D, Asymptotic

13. The range of a sample gives an indication of the

A, Number of observations bearing the same value

B, Way in which the values cluster about a particular point

C, Degree to which the mean value differs from its expected value

D, Maximum variation in the sample

14. Which of these is a relative measure of dispersion?

A, Standard Deviation

B, Variance

C, Coefficient of Variation

D, Median

15. Sum of deviations will be zero if it is taken from

A, Mean

B, Mode

C, Median

D, Standard Deviation

16. The variance of 5 numbers is 20. If each number is divided by 2, then the variance of new
numbers is

A, 0

B, 5

C, 18

D, 20
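Question 16 rests on the scaling rule Var(X / c) = Var(X) / c². The sketch below applies the rule to the stated variance and then confirms the ratio on a small illustrative data set (the five numbers are invented for the check and are not the data in the question):

```python
import statistics

original_variance = 20.0
divisor = 2.0

# Dividing every value by c divides the variance by c squared
new_variance = original_variance / divisor ** 2

# Sanity check of the rule on illustrative data (not from the question)
data = [2.0, 4.0, 6.0, 8.0, 10.0]
halved = [x / divisor for x in data]
ratio = statistics.pvariance(halved) / statistics.pvariance(data)

print(new_variance, ratio)
```

The ratio of the two variances is 1/4 whichever variance definition (population or sample) is used, so the same rule applies either way.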

17. If a distribution is abnormally tall and peaked, then it can be said that the distribution is:

A, Leptokurtic

B, Pyrokurtic

C, Platykurtic

D, Mesokurtic

18. The mean of a distribution is 20 and the variance is 25. What is the value of the coefficient of
variation?

A, 4%

B, 20%

C, 25 %

D, 22.5%
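For question 18, the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. A minimal check with the stated values (variable names are illustrative):

```python
import math

mean = 20.0
variance = 25.0

# Coefficient of variation = (standard deviation / mean) * 100
cv_percent = math.sqrt(variance) / mean * 100

print(cv_percent)
```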

Answer Q19 and Q20 using the information given below

Mark      No. of students
1-5       3
6-10      2
11-15     10
16-20     8

19. The median class is

A, 1-5

B, 6-10

C, 11-15

D, 16-20

20. The mode of the above data is

A, 10.5

B, 14.5

C, 16.5

D, No mode
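Questions 19 and 20 can be worked through with the usual grouped-data formulas. The sketch below assumes the class boundaries lie 0.5 below and above the stated limits and uses the interpolation formula mode = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) × h; the function names are illustrative.

```python
# Class limits and frequencies from the table for Q19 and Q20
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]
freqs = [3, 2, 10, 8]

def median_class(classes, freqs):
    """Return the first class whose cumulative frequency reaches n/2."""
    half = sum(freqs) / 2
    running = 0
    for limits, f in zip(classes, freqs):
        running += f
        if running >= half:
            return limits

def grouped_mode(classes, freqs):
    """Interpolated mode within the class that has the highest frequency."""
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    lower_boundary = classes[i][0] - 0.5          # assumes a gap of 1 between classes
    width = classes[i][1] - classes[i][0] + 1     # class width h
    return lower_boundary + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

print(median_class(classes, freqs))
print(grouped_mode(classes, freqs))
```

With n = 23 the cumulative frequencies are 3, 5, 15, 23, so the class 11-15 is the first to reach n/2 = 11.5; the same class is modal, which gives 10.5 + (8 / 10) × 5.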

Answer sheet basic statistics

1. B 11.A

2. C 12.B

3. B 13.D

4. D 14.C

5. B 15.A

6. D 16.B

7. C 17.A

8. D 18.C

9. D 19.C

10. C 20.B

TIME SERIES ANALYSIS
1. Xt = 0.8Xt-1 + εt + 0.5εt-1, εt ~ WN(0, σ²). Which of the following statements is true about
the process?
A. It is ARIMA(1,1,1) B. It is stationary C. A and B D. none
2. Which of the following is true about the AR(2) model?
A. ACF cuts off after lag 2
B. PACF cuts off after lag 2
C. PACF tails off
D. ACF cuts off after lag 1
3. Prosperity, Recession, and depression in a business is an example of
A. Irregular Trend
B. Secular Trend
C. Cyclical Trend
D. Seasonal Trend
4. A fire in a factory delaying production for some weeks is
A. Secular Trend
B. Cyclical Trend
C. Irregular Trend
D. Seasonal Trend

5. For the MA(3) process yt = μ + εt + θ1εt-1 + θ2εt-2 + θ3εt-3, where εt is a zero-mean
white noise process with variance σ², which of the following is true?
A. ACF = 0 at lag 3
B. ACF =0 at lag 5
C. ACF =1 at lag 1
D. ACF =0 at lag 2
6. The PACF (partial autocorrelation function) is necessary for distinguishing between ______?
A. An AR and MA model
B. An AR and an ARMA
C. An MA and an ARMA
D. Different models from within the ARMA family

7. Which of the following statements is correct?

1. If the autoregressive parameter (p) in an ARIMA model is 1, it means that there is no
autocorrelation in the series.
2. If the moving average component (q) in an ARIMA model is 1, it means that there is
autocorrelation in the series with lag 1.
3. If the integrated component (d) in an ARIMA model is 0, it means that the series is not
stationary.
A. Only 1
B. Both 1 and 2
C. Only 2
D. All of the statements
8. In a time-series forecasting problem, the seasonal indices for quarters 1, 2, and 3 are 0.80,
0.90, and 0.95 respectively. What can you say about the seasonal index of quarter 4?
A. It will be less than 1

B. It will be greater than 1

C. It will be equal to 1

D. Seasonality does not exist
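Question 8 can be checked with simple arithmetic, assuming the usual normalisation in which multiplicative seasonal indices average to 1 over the year (so four quarterly indices sum to 4). The names below are illustrative.

```python
known_indices = [0.80, 0.90, 0.95]   # quarters 1 to 3, from the question
PERIODS_PER_YEAR = 4

# Under the normalisation assumption the indices sum to the number of periods
q4_index = PERIODS_PER_YEAR - sum(known_indices)

print(round(q4_index, 2))
```

Because the first three indices are all below 1, the fourth has to be above 1 to restore the average.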

9. The ratio-to-trend method is commonly used to
A. Deseasonalize data
B. Take moving average
C. Remove multicollinearity
D. Represent graphical curve
10. In the moving average method, we cannot find the trend values of some
A. End Periods
B. Middle Period
C. Starting and End Periods
D. Starting Periods

11. What is the variance of the process (1 - 0.3B)Yt = εt if σε² = 2?


a. 2.85
b. 22.2
c. 2.2
d. 1.4
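For a stationary AR(1) process Yt = φYt-1 + εt, the process variance is σε² / (1 - φ²). The sketch below applies this to question 11, reading the process as AR(1) with φ = 0.3 and white-noise variance 2 (variable names are illustrative):

```python
phi = 0.3            # AR(1) coefficient read from (1 - 0.3B)
noise_variance = 2.0  # white-noise variance

# The formula is valid only when |phi| < 1 (stationarity)
assert abs(phi) < 1
process_variance = noise_variance / (1.0 - phi ** 2)

print(round(process_variance, 2))
```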

12. Which one of the following expresses, using the backward shift operator B, the ARMA(p, q)
process Zt = Zt-1 + εt - θ1εt-1, where εt ~ WN(0, σε²)?
a. (1 - B)Zt = (1 - B)εt
b. (1 - θ1B)Zt = (1 - B)εt
c. (1 - εtB)Zt = (1 - θ1B)εt-1
d. (1 - B)Zt = (1 - θ1B)εt
13. What is the value of the sample partial autocorrelation φ̂kk at k = 2 when r1 = 0.3 and r2 = 0.2?

a. 0.01
b. 0.12
c. 0.33
d. 0.49
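The lag-2 partial autocorrelation follows from the first two sample autocorrelations through the standard relation φ22 = (r2 - r1²) / (1 - r1²). A minimal check with the values in question 13 (variable names are illustrative):

```python
r1 = 0.3
r2 = 0.2

# Lag-2 partial autocorrelation from the Yule-Walker (Durbin-Levinson) recursion
phi_22 = (r2 - r1 ** 2) / (1.0 - r1 ** 2)

print(round(phi_22, 2))
```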
14. Which one of the following is the multiplicative seasonal ARIMA(0, 1, 1) × (0, 1, 1)12 model?
A. ∇Yt = Wt = (1 - 0.4B)(1 - 0.6B^12)εt
B. ∇²∇_12 Yt = Wt = (1 - 0.4B)(1 - 0.6B^12)εt
C. ∇∇_12 Yt = Wt = (1 - 0.4B)(1 - 0.6B)εt
D. ∇∇_12 Yt = Wt = (1 - 0.4B)(1 - 0.6B^12)εt

15. What is the lag-1 autocorrelation ρ1 (k = 1) of the moving average process Yt = (1 - 0.6B)εt?

a. 0.44
b. -0.44
c. 0.85
d. 0.36
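For an MA(1) process written as Yt = εt - θεt-1, the lag-1 autocorrelation is ρ1 = -θ / (1 + θ²). The sketch below evaluates this for question 15, reading θ = 0.6 from the factor (1 - 0.6B); the variable names are illustrative.

```python
theta = 0.6   # from Yt = (1 - 0.6B) eps_t, i.e. Yt = eps_t - 0.6 * eps_{t-1}

# Lag-1 autocorrelation of an MA(1) process in the "minus theta" convention
rho_1 = -theta / (1.0 + theta ** 2)

print(round(rho_1, 2))
```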

18. Classify the following process: (1 + 0.3B)(1 - B)Yt = εt

a. ARIMA (0, 0, 2)

b. ARIMA (1, 0, 0)

c. ARIMA (1, 1, 1)

d. ARIMA (1, 1, 0)

19. Suppose the process {Zt} follows the MA(2) model Zt = εt + 0.2εt-1 - 0.3εt-2 with σε² = 1. What is the variance of the process?

a) 0.87
b) 0.33
c) 0.5
d) 0.6

20. Which of the following processes are stationary?

a) Yt + 1.7Yt-1 + 0.8Yt-2 = εt
b) Yt + 1.9Yt-1 + 0.6Yt-2 = εt + 0.2εt-1 + 0.7εt-2
c) Yt + 0.6Yt-1 = εt + 1.2εt-1
d) None
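Stationarity in question 20 depends only on the autoregressive side of each equation: the process is stationary when every root of the AR polynomial 1 + a1·z + a2·z² lies outside the unit circle. The sketch below checks the three candidates under that criterion; the helper names are illustrative and the moving-average terms are deliberately ignored because they do not affect stationarity.

```python
import cmath

def ar_polynomial_roots(a1, a2=0.0):
    """Roots of 1 + a1*z + a2*z**2 = 0 for Yt + a1*Yt-1 + a2*Yt-2 = (MA terms)."""
    if a2 == 0.0:
        return [complex(-1.0 / a1)]
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return [(-a1 + disc) / (2.0 * a2), (-a1 - disc) / (2.0 * a2)]

def is_stationary(a1, a2=0.0):
    """Stationary when every root lies strictly outside the unit circle."""
    return all(abs(root) > 1.0 for root in ar_polynomial_roots(a1, a2))

results = {
    "a": is_stationary(1.7, 0.8),
    "b": is_stationary(1.9, 0.6),
    "c": is_stationary(0.6),
}
print(results)
```

For process b the AR polynomial has real roots at about -0.667 and -2.5, and the first of these lies inside the unit circle.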

21. Which of the following are necessary steps in building a quality time series model?
A. Making the data stationary
B. Selecting the right model
C. Evaluating model accuracy
D. All
22. Which of the following are the key characteristics of time series data?
A. Validity.
B. Reliability.
C. Timeliness
D. All
23. Which of the following are parameters of time series models?
A. Alpha
B. Gamma
C. Phi
D. All
24. What is the basic common assumption about time series data?
A. Differencing
B. Phi
C. Stationarity
D. None
25. Which one of the following is used to determine the order of the model?
A. Phi and Delta
B. Stationarity
C. Differencing
D. ACF and PACF
26. Which one of the following is used to determine the order of a moving average process?

A. ACF
B. PACF
C. q
D. P
