
Practice Problems On Descriptive Statistics


SIVA SIVANI INSTITUTE OF MANAGEMENT, KOMPALLY

COURSE: PGDM SECTION – A


Practice Problems
--------------------------------------------------------------------------------------------------------------------------------------
Question: 1
The file P02_17.xlsx contains salaries of 200 recent graduates from a (fictional) MBA program.
a. What salary level is most indicative of those earned by students graduating from this MBA
program this year?
b. Do the empirical rules for standard deviations apply to these data? Can you tell, or at least make an
educated guess, by looking at the shape of the histogram? Why?
c. If the empirical rules apply here, between which two numbers can you be about 68% sure that the
salary of any one of these 200 students will fall?
d. If the MBA program wants to make a statement such as “Some of our recent graduates started out
making X dollars or more, and almost all of them started out making at least Y dollars” for their
promotional materials, what values of X and Y would you suggest they use? Defend your choice.
e. As an admissions officer of this MBA program, how would you proceed to use these findings to
market the program to prospective students?
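The empirical-rule check in parts b and c can be sketched in Python as follows. This is only an illustration: the salary values are hypothetical and are not taken from P02_17.xlsx, and the function name empirical_rule_check is an assumption introduced for this sketch. It computes the mean and sample standard deviation, then reports the share of values within 1, 2, and 3 standard deviations of the mean, which can be compared with the nominal 68%, 95%, and 99.7%.

```python
import statistics

def empirical_rule_check(values):
    """Return the mean, the sample SD, and the share of values
    lying within 1, 2, and 3 SDs of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    coverage = {}
    for k in (1, 2, 3):
        low, high = mean - k * sd, mean + k * sd
        inside = sum(1 for v in values if low <= v <= high)
        coverage[k] = inside / len(values)
    return mean, sd, coverage

# Hypothetical salaries for illustration only (NOT from P02_17.xlsx).
salaries = [52000, 55000, 58000, 60000, 61000,
            63000, 65000, 68000, 72000, 96000]

mean, sd, coverage = empirical_rule_check(salaries)
print(f"mean = {mean:.0f}, sd = {sd:.0f}")
print(f"about 68% interval if the rule applies: {mean - sd:.0f} to {mean + sd:.0f}")
print("observed coverage by number of SDs:", coverage)
```

If the observed shares are far from 68%, 95%, and 99.7%, or the histogram is clearly skewed, the interval in part c should be treated with caution.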

Question No: 2
Sometimes it is possible that missing data are predictive in the sense that rows with missing data are
somehow different from rows without missing data. Check this with the file P02_32.xlsx, which
contains blood pressures for 1000 (fictional) people, along with variables that can be related to blood
pressure. These other variables have a number of missing values, presumably because the people didn’t
want to report certain information.
a. For each of these other variables, find the mean and standard deviation of blood pressure for all
people without missing values and for all people with missing values. Can you conclude that the
presence or absence of data for any of these other variables has anything to do with blood pressure?
b. Some analysts suggest filling in missing data for a variable with the mean of the nonmissing values
for that variable. Do this for the missing data in the blood pressure data. In general, do you think this
is a valid way of filling in missing data? Why or why not?
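A minimal Python sketch of both parts follows, assuming missing values are represented as None. The rows, the field names bp and alcohol, and the helper names are all hypothetical and do not come from P02_32.xlsx. The first helper splits blood pressure by whether another variable is missing; the second fills missing values with the mean of the observed ones.

```python
import statistics

# Hypothetical rows for illustration only (NOT from P02_32.xlsx).
# None marks a value the person did not report.
rows = [
    {"bp": 118, "alcohol": 2},
    {"bp": 135, "alcohol": None},
    {"bp": 122, "alcohol": 0},
    {"bp": 141, "alcohol": None},
    {"bp": 127, "alcohol": 5},
    {"bp": 130, "alcohol": 1},
]

def bp_by_missingness(rows, column):
    """Mean and sample SD of blood pressure, split by whether `column` is missing."""
    groups = {
        "present": [r["bp"] for r in rows if r[column] is not None],
        "missing": [r["bp"] for r in rows if r[column] is None],
    }
    # stdev needs at least two values, so smaller groups report None for the SD.
    return {
        name: (statistics.mean(bps), statistics.stdev(bps) if len(bps) > 1 else None)
        for name, bps in groups.items() if bps
    }

def mean_impute(values):
    """Replace each None with the mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

print(bp_by_missingness(rows, "alcohol"))

alcohol = [r["alcohol"] for r in rows]
observed = [v for v in alcohol if v is not None]
imputed = mean_impute(alcohol)
print("SD before imputation:", statistics.stdev(observed))
print("SD after imputation: ", statistics.stdev(imputed))
```

Comparing the two standard deviations printed at the end is one way to start thinking about part b: every filled-in value sits exactly at the mean.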

Question No 2

The file P02_03.xlsx contains data from a survey of 399 people regarding an environmental policy.
Use filters for each of the following.
a. Identify all respondents who are female, middle aged, and have two children. What is the average
salary of these respondents?
b. Identify all respondents who are elderly and strongly disagree with the environmental policy. What
is the average salary of these respondents?
c. Identify all respondents who strongly agree with the environmental policy. What proportion of
these individuals are young?
d. Identify all respondents who are either (1) middle-aged men with at least one child and an annual
salary of at least $50,000, or (2) middle-aged women with two or fewer children and an annual salary
of at least $30,000.
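Outside Excel, the compound filter in part d can be written as one Boolean condition. The sketch below is in Python; the respondents, field names, and category labels are assumptions made for illustration, and the actual workbook may code these variables differently.

```python
# Hypothetical respondents for illustration only (NOT from P02_03.xlsx).
# Field names and category labels are assumptions.
respondents = [
    {"gender": "Male", "age": "Middle-aged", "children": 1, "salary": 62000},
    {"gender": "Male", "age": "Middle-aged", "children": 0, "salary": 75000},
    {"gender": "Female", "age": "Middle-aged", "children": 2, "salary": 31000},
    {"gender": "Female", "age": "Young", "children": 1, "salary": 48000},
    {"gender": "Female", "age": "Middle-aged", "children": 3, "salary": 90000},
]

def matches_part_d(r):
    """True if the respondent meets condition (1) or condition (2) of part d."""
    middle_aged = r["age"] == "Middle-aged"
    # (1) men with at least one child and salary of at least $50,000
    men = r["gender"] == "Male" and r["children"] >= 1 and r["salary"] >= 50000
    # (2) women with two or fewer children and salary of at least $30,000
    women = r["gender"] == "Female" and r["children"] <= 2 and r["salary"] >= 30000
    return middle_aged and (men or women)

selected = [r for r in respondents if matches_part_d(r)]
print(len(selected), "respondents match")
```

Both conditions require a middle-aged respondent, so that test is factored out and combined with an OR over the two gender-specific conditions.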

Question 3
The file P02_10.xlsx contains midterm and final exam scores for 96 students in a corporate
finance course.
a. Create a histogram for each of the two sets of exam scores.
b. What are the mean and median scores on each of these exams?
c. Explain why the mean and median values are different for these data.
d. Based on your previous answers, how would you characterize this group’s performance on
the midterm and on the final exam?
e. Create a new column of differences (final exam score minus midterm score). A positive
value means the student improved, and a negative value means the student did the
opposite. What are the mean and median of the differences? What does a histogram of the
differences indicate?
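The difference column in part e can be sketched in Python as below. The scores are hypothetical and are not taken from P02_10.xlsx; they only show the mechanics of subtracting paired values and summarizing the result.

```python
import statistics

# Hypothetical paired scores for illustration only (NOT from P02_10.xlsx).
midterm = [70, 82, 65, 90, 78]
final = [75, 80, 72, 88, 40]

# Final exam score minus midterm score, student by student.
differences = [f - m for m, f in zip(midterm, final)]

print("differences:", differences)
print("mean difference:  ", statistics.mean(differences))
print("median difference:", statistics.median(differences))
```

In this made-up sample a single large drop pulls the mean well below the median, which is the kind of pattern a histogram of the differences would make visible.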

Question No 4
The file P02_09.xlsx lists the times required to service 200 consecutive customers at a (fictional) fast-
food restaurant.
a. Create a histogram of the customer service times. How would you characterize the distribution of
service times?
b. Calculate the mean, median, and first and third quartiles of this distribution.
c. Which measure of central tendency, the mean or the median, is more appropriate in describing this
distribution? Explain your reasoning.
d. Find and interpret the variance and standard deviation of these service times.
e. Are the empirical rules for standard deviations applicable for these service times? If not, explain
why. Can you tell whether they apply, or at least make an educated guess, by looking at the shape of
the histogram? Why?

Question – 5
The file P02_12.xlsx includes data on the 50 top graduate programs in the United States,
according to a 2009 U.S. News & World Report survey.
a. Indicate the type of data for each of the 10 variables considered in the formulation of the
overall ranking.
b. Create a histogram for each of the numerical variables in this data set. Indicate whether
each of these distributions is approximately symmetric or skewed. Which, if any, of these
distributions are skewed to the right? Which, if any, are skewed to the left?
c. Identify the schools with the largest and smallest annual out-of-state tuition and fee levels.
d. Find the annual out-of-state tuition and fee levels at each of the 25th, 50th, and 75th
percentiles for these schools. For post-2007 Excel users only, find these percentiles using
both the PERCENTILE.INC and PERCENTILE.EXC functions. Can you explain how and
why they are different (if they are indeed different)?
e. Create a box plot to characterize this distribution of these MBA salaries. Is this distribution
essentially symmetric or skewed? If there are any outliers on either end, which schools do
they correspond to? Are these same schools outliers in box plots of any of the other numerical
variables (from columns E to L)?
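For part d, the two percentile definitions can be compared outside Excel. The Python sketch below assumes the usual interpolation rules: the inclusive method places the p-th percentile at rank p(n - 1) + 1 and the exclusive method at rank p(n + 1), counting sorted values from 1. The function names and the tuition figures are hypothetical and are not taken from P02_12.xlsx. The standard library's statistics.quantiles is called with method="inclusive" and method="exclusive" as a cross-check.

```python
import statistics

def percentile_inc(values, p):
    """Inclusive percentile: interpolate at 0-based position p * (n - 1)."""
    s = sorted(values)
    pos = p * (len(s) - 1)
    lo = int(pos)
    if lo == len(s) - 1:
        return s[-1]
    return s[lo] + (pos - lo) * (s[lo + 1] - s[lo])

def percentile_exc(values, p):
    """Exclusive percentile: interpolate at 1-based rank p * (n + 1)."""
    s = sorted(values)
    n = len(s)
    rank = p * (n + 1)
    if rank < 1 or rank > n:
        raise ValueError("p is outside the range this method supports")
    lo = int(rank)
    if lo == n:
        return s[-1]
    return s[lo - 1] + (rank - lo) * (s[lo] - s[lo - 1])

# Hypothetical tuition levels for illustration only (NOT from P02_12.xlsx).
tuition = [21000, 24500, 26000, 30000, 31500, 35000, 38000, 42000, 47000]

for p in (0.25, 0.50, 0.75):
    print(p, percentile_inc(tuition, p), percentile_exc(tuition, p))

# Cross-check against the standard library's quartile cut points.
print(statistics.quantiles(tuition, n=4, method="inclusive"))
print(statistics.quantiles(tuition, n=4, method="exclusive"))
```

The two methods agree at the median and move apart toward the tails, with a gap that shrinks as the number of observations grows.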

Question No: 6
The file P02_45.xlsx contains the salaries of 135 business school professors at a (fictional) large state
university.
a. If you increased every professor’s salary by $1000, what would happen to the mean and median
salary?
b. If you increased every professor’s salary by $1000, what would happen to the standard deviation of
the salaries?
c. If you increased every professor’s salary by 5%, what would happen to the standard deviation of the
salaries?
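One way to check an answer to this question is to try the two transformations on a small sample. The Python sketch below uses hypothetical salaries, not the values in P02_45.xlsx, and prints the mean, median, and sample standard deviation before and after adding $1000 and before and after a 5% raise.

```python
import statistics

def summarize(values):
    """Return (mean, median, sample standard deviation)."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.stdev(values))

# Hypothetical salaries for illustration only (NOT from P02_45.xlsx).
salaries = [80000, 95000, 100000, 120000, 150000]

shifted = [s + 1000 for s in salaries]  # every salary raised by $1000
scaled = [s * 1.05 for s in salaries]   # every salary raised by 5%

for label, data in (("original", salaries),
                    ("plus $1000", shifted),
                    ("plus 5%", scaled)):
    mean, median, sd = summarize(data)
    print(f"{label:>10}: mean={mean:.2f} median={median:.2f} sd={sd:.2f}")
```

Adding a constant moves every value by the same amount, so the spread around the mean is unchanged; multiplying by a constant stretches the spread by that same factor.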

Question No 7
The file P02_01.xlsx indicates the gender and nationality of the MBA incoming class in two successive years at the Kelley
School of Business at Indiana University.
a. For each year, create tables of counts of gender and of nationality. Then create column charts of these counts. Do they
indicate any noticeable change in the composition of the two classes?
b. Repeat part a for nationality, but recode this variable so that all nationalities that have counts of 1 or 2 are classified as
Other.

Question No 8

The file P02_02.xlsx contains information on over 200 movies that were released in 2006 and 2007.
a. Create two column charts of counts, one of the different genres and one of the different distributors.
b. Recode the Genre column so that all genres with a count of 10 or less are lumped into a category called Other. Then create
a column chart of counts for this recoded variable. Repeat similarly for the Distributor variable.
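The recoding step in part b can be sketched in Python as below. The genre list is hypothetical and is not taken from P02_02.xlsx, and recode_rare with its max_rare_count parameter is a name introduced for this sketch. For the exercise the threshold would be 10; the tiny sample uses 1 so the effect is visible.

```python
from collections import Counter

def recode_rare(values, max_rare_count, other_label="Other"):
    """Replace every category whose count is max_rare_count or less with other_label."""
    counts = Counter(values)
    return [other_label if counts[v] <= max_rare_count else v for v in values]

# Hypothetical genre column for illustration only (NOT from P02_02.xlsx).
genres = ["Drama", "Drama", "Drama", "Comedy", "Comedy", "Western"]

recoded = recode_rare(genres, max_rare_count=1)
print(Counter(genres))   # counts before recoding
print(Counter(recoded))  # counts after recoding, ready for a column chart
```

The counts are taken from the original column before any value is replaced, so the lumped category itself never affects which genres are treated as rare.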

Question No 9

The file P02_03.xlsx contains data from a survey of 399 people regarding a government environmental policy.
a. Which of the variables in this data set are categorical? Which of these are nominal; which are ordinal?
b. For each categorical variable, create a column chart of counts.
c. Recode the data into a new data set, making four transformations: (1) change Gender to list “Male”

Question No 10
The file P02_06.xlsx lists the average time (in minutes) it takes citizens of 379 metropolitan areas to travel to work and back
home each day.
a. Create a histogram of the daily commute times.
b. Find the most representative average daily commute time across this distribution.
c. Find a useful measure of the variability of these average commute times around the mean.
d. The empirical rule for standard deviations indicates that approximately 95% of these average travel times will fall
between which two values? For this particular data set, is this empirical rule at least approximately correct?

Question No 11
The file P02_07.xlsx includes data on 204 employees at the (fictional) company Beta Technologies.
a. Indicate the data type for each of the six variables included in this data set.
b. Create a histogram of the Age variable. How would you characterize the age distribution for these employees?
c. What proportion of these full-time Beta employees are female?
d. Find appropriate summary measures for each of the numerical variables in this data set.
e. For the Salary variable, explain why the empirical rules for standard deviations do or do not apply.

Question 12
The file P02_08.xlsx contains data on 500 shipments of one of the computer components that a company manufactures.
Specifically, the proportion of items that are defective is listed for each shipment.
a. Create a histogram that will help a production manager understand the variation of the proportion of defective components
in the company’s shipments.
b. Is the mean or median the most appropriate measure of central location for this data set? Explain your reasoning.
c. Discuss whether the empirical rules for standard deviations apply. Can you tell, or at least make an educated guess, by
looking at the shape of the histogram? Why?

Question No 13
The file P02_09.xlsx lists the times required to service 200 consecutive customers at a (fictional) fast-food restaurant.
a. Create a histogram of the customer service times. How would you characterize the distribution of service times?
b. Calculate the mean, median, and first and third quartiles of this distribution.
c. Which measure of central tendency, the mean or the median, is more appropriate in describing this distribution? Explain
your reasoning.
d. Find and interpret the variance and standard deviation of these service times.
e. Are the empirical rules for standard deviations applicable for these service times? If not, explain why. Can you tell
whether they apply, or at least make an educated guess, by looking at the shape of the histogram? Why?

Question No 14
The file P02_10.xlsx contains midterm and final exam scores for 96 students in a corporate finance course.
a. Create a histogram for each of the two sets of exam scores.
b. What are the mean and median scores on each of these exams?
c. Explain why the mean and median values are different for these data.
d. Based on your previous answers, how would you characterize this group’s performance on the midterm and on the final
exam?
e. Create a new column of differences (final exam score minus midterm score). A positive value means the student
improved, and a negative value means the student did the opposite. What are the mean and median of the differences? What
does a histogram of the differences indicate?
