Assignment 1,2,3

The document outlines the instructions and questions for Assignment 1 in an Introduction to Statistics course, due on March 16, 2025. It includes questions on distinguishing statistical terms, analyzing user data from a media study, evaluating climate data sources, and various statistical analyses on datasets related to soft drinks, audit times, student learning times, and sales revenue. The assignment emphasizes original work, clarity, and completeness in responses.

Uploaded by

wolverine40183
Copyright
© All Rights Reserved

Assignment 1

Course Name: Introduction to Statistics (STA101)


Deadline: 16.03.2025
Instructions:

1. Add a cover page.
2. Answer all questions provided below.
3. Show all steps in your calculations and include relevant explanations where needed.
4. Ensure all responses are your original work.
5. Marks will be awarded for accuracy, clarity, and completeness.

Question 1: Distinguish between the following terms: [1*5=5]


i. Population and sample
ii. Parameter and statistic
iii. Bar graph and histogram
iv. Histogram and ogive
v. Descriptive statistics and inferential statistics

Question 2: A media company conducted a study on 10 users to understand their streaming habits. Each participant
was assigned a User ID for anonymity. The company recorded the Age Group of each user (e.g., 18-25, 26-35, etc.)
to analyze generational preferences. To track engagement, the study measured the Number of Hours Spent
Streaming per Week and the Average Number of Movies Watched per Month. Users also reported their Preferred
Streaming Platform (Netflix, Hulu, Disney+, etc.) to understand market share. The company wanted to analyze
content preferences, so users indicated their Favorite Movie Genre (Action, Comedy, Drama, etc.) and whether they
Watch Movies Alone or with Others (Alone, Family, Friends).

Subscription habits were also examined, including the Subscription Cost ($ per month) and whether users Share
Their Account with Others (Yes or No). To evaluate satisfaction, participants rated their Streaming Experience on a
scale from 1 to 10. The study also included an interval-scale variable, the User's Monthly Temperature Preference
for Watching Movies (in °C or °F), because the company hypothesized that environmental factors such as room
temperature could influence viewing habits and comfort levels. Finally, the study recorded whether users Prefer
Subtitles or Dubbing (Subtitles, Dubbing, or No Preference). The media company planned to use these insights to
improve content recommendations and marketing strategies.

From the above scenario, identify the variables and classify each as qualitative or quantitative; for each quantitative
variable, state whether it is discrete or continuous. Also identify the level of measurement of each variable. [5]

Question 3: A scientist is studying changes in local climate patterns over the last 50 years. To gather data, they use
two different sources: [1+1=2]

o Method A: The scientist sets up weather stations in specific areas to monitor temperature, precipitation,
and humidity levels over the next year.
o Method B: The scientist uses historical climate data from a government database that tracks
temperature, precipitation, and humidity levels over the past 50 years.

a) Is Method A a primary or secondary source of data? Explain.
b) Is Method B a primary or secondary source of data? Explain.

Question 4: Coca-Cola, Diet Coke, Dr. Pepper, Pepsi, and Sprite are five popular soft drinks. Assume that the data
show the soft drink selected in a sample of 50 soft drink purchases.

Summarize the data by constructing the following: [1.5+3+0.5=5]

a) Relative and percent frequency distributions.
b) Draw a bar graph for the relative frequency distribution and a pie chart for the percent frequency distribution.
Show the calculations used to determine the angle of each segment.
c) Based on these data, what are the three most popular soft drinks?
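A short Python sketch can check the hand calculations for parts (a) to (c). The data table is not reproduced here, so it assumes the 50 purchases are supplied as a hypothetical list of drink names, `purchases`. Each pie-chart angle is the relative frequency multiplied by 360°.

```python
from collections import Counter


def soft_drink_table(purchases):
    """Return (drink, frequency, relative frequency, percent, pie angle) rows,
    ordered from most to least frequently purchased."""
    n = len(purchases)
    rows = []
    for drink, freq in Counter(purchases).most_common():
        rel = freq / n  # relative frequency
        # percent frequency and pie-chart angle in degrees
        rows.append((drink, freq, rel, rel * 100, rel * 360))
    return rows


def print_table(rows):
    """Print the distribution as an aligned text table."""
    print(f"{'Drink':<12}{'f':>4}{'Rel. f':>9}{'%':>8}{'Angle':>9}")
    for drink, freq, rel, pct, angle in rows:
        print(f"{drink:<12}{freq:>4}{rel:>9.2f}{pct:>8.1f}{angle:>9.1f}")
```

Calling `print_table(soft_drink_table(purchases))` on the recorded purchases lists the drinks in descending order of frequency, so the first three rows answer part (c); `Counter.most_common` lists tied drinks in the order they were first encountered.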

Question 5: The following data show the time in days required to complete year-end audits for a sample of 20
clients of Sanderson and Clifford, a small public accounting firm. [3+1+1+1=6]

a) How many classes would you suggest? What value would you suggest for a class interval?
Organize the data into a frequency distribution. Develop a relative frequency distribution and a percent
frequency distribution.
b) Construct a cumulative frequency distribution.
c) Draw a histogram and comment on the shape of the distribution.
d) Draw a frequency polygon and interpret the audit times.
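One common rule for part (a), the "2 to the k" rule, picks the smallest number of classes k with 2^k ≥ n; for n = 20 it gives k = 5. The sketch below applies that rule and builds the frequency, relative, percent, and cumulative columns for parts (a) and (b). It assumes the audit times are whole numbers of days (the data table is not reproduced here) and rounds the class width up so that k classes always reach the maximum; other textbook rules for choosing k and the width are equally acceptable.

```python
import math


def suggest_num_classes(n):
    """Smallest k such that 2**k >= n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def frequency_distribution(data, k):
    """Rows of ((lower, upper), f, relative f, percent f, cumulative f)
    for integer-valued data split into k classes of equal width."""
    lo, hi, n = min(data), max(data), len(data)
    width = math.ceil((hi - lo + 1) / k)  # round up so k classes cover lo..hi
    rows, cumulative = [], 0
    for i in range(k):
        lower = lo + i * width
        upper = lower + width - 1  # inclusive integer class limits
        f = sum(lower <= x <= upper for x in data)
        cumulative += f
        rows.append(((lower, upper), f, f / n, 100 * f / n, cumulative))
    return rows
```

With the 20 audit times in a list `audit_days`, `frequency_distribution(audit_days, suggest_num_classes(20))` returns one row per class; the last cumulative value should equal 20.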

Question 6: Refer to Question 5. [1.5+3+2.5=7]

a) Calculate and interpret the mean, median, and mode using the raw data.
b) Calculate and interpret the mean, median, and mode using the grouped data. Compare these results with those
obtained from the raw data.
c) Calculate the variance and standard deviation using the frequency distribution table and interpret your
results.
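The usual grouped-data formulas for parts (b) and (c) are summarized below. The notation is introduced here for illustration: m_i and f_i are the midpoint and frequency of class i, N = Σ f_i, h is the class width, L is the lower class boundary of the median class (in the median formula) or of the modal class (in the mode formula), F is the cumulative frequency before the median class, f_m is the frequency of the median class, and f_0, f_1, f_2 are the frequencies of the class before the modal class, the modal class, and the class after it. The divisor N − 1 treats the 20 clients as a sample, as the question states.

```latex
\bar{x} = \frac{\sum_i f_i m_i}{N}, \qquad
\text{Median} = L + \frac{N/2 - F}{f_m}\, h, \qquad
\text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)}\, h

s^2 = \frac{\sum_i f_i \left(m_i - \bar{x}\right)^2}{N - 1}, \qquad s = \sqrt{s^2}
```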

Question 7: A university conducted a study to analyze the time (in minutes) that 50 students spent on a new online
learning module. The administration aims to understand the distribution of these times to assess the module's
effectiveness and identify areas for improvement. The recorded times (in minutes) are as follows:

42.15, 45.32, 47.89, 49.56, 44.78, 46.23, 48.67, 50.12, 45.89, 43.56, 47.34, 49.01, 51.23, 48.56, 46.12, 44.90, 50.45,
48.78, 51.67, 49.34, 45.78, 46.89, 49.56, 50.89, 52.12, 51.34, 48.23, 44.67, 46.78, 50.34, 45.12, 47.23, 49.45, 50.78,
51.89, 49.12, 52.45, 47.56, 45.34, 46.78, 50.23, 51.67, 44.56, 47.90, 52.01, 51.23, 49.56, 48.12, 46.67, 50.56.

Based on this data, the administration has posed the following questions: [2.5+5+0.5=8]

a) Organize the data into a grouped frequency distribution using the class intervals: 42.0–43.9, 44.0–45.9,
46.0–47.9, 48.0–49.9, 50.0–51.9, 52.0–53.9.
b) Calculate the mean, median, mode, variance, and standard deviation of the time spent on the module, and
interpret the results.
c) Determine the number of students who took more than 50 minutes to complete the module.
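The Python sketch below checks parts (a) to (c) using the 50 times listed above. It assumes class boundaries 0.05 beyond each stated limit (so 42.0-43.9 has boundaries 41.95 and 43.95), a class width of 2, class midpoints as representative values, and the sample variance (divisor N − 1); the median and mode use the standard interpolation formulas for grouped data.

```python
import math

# Times (minutes) for the 50 students, copied from the question.
times = [
    42.15, 45.32, 47.89, 49.56, 44.78, 46.23, 48.67, 50.12, 45.89, 43.56,
    47.34, 49.01, 51.23, 48.56, 46.12, 44.90, 50.45, 48.78, 51.67, 49.34,
    45.78, 46.89, 49.56, 50.89, 52.12, 51.34, 48.23, 44.67, 46.78, 50.34,
    45.12, 47.23, 49.45, 50.78, 51.89, 49.12, 52.45, 47.56, 45.34, 46.78,
    50.23, 51.67, 44.56, 47.90, 52.01, 51.23, 49.56, 48.12, 46.67, 50.56,
]

# Stated class limits from part (a); boundaries lie 0.05 beyond each limit.
limits = [(42.0, 43.9), (44.0, 45.9), (46.0, 47.9),
          (48.0, 49.9), (50.0, 51.9), (52.0, 53.9)]
h = 2.0  # class width: distance between successive lower limits

freq = [sum(lo <= t <= hi for t in times) for lo, hi in limits]
mids = [(lo + hi) / 2 for lo, hi in limits]
N = sum(freq)

cum = []  # cumulative frequencies
running = 0
for f in freq:
    running += f
    cum.append(running)

# Grouped mean using class midpoints.
mean = sum(f * m for f, m in zip(freq, mids)) / N

# Median: first class whose cumulative frequency reaches N/2, then interpolate.
i = next(j for j, c in enumerate(cum) if c >= N / 2)
F = cum[i - 1] if i > 0 else 0
median = (limits[i][0] - 0.05) + (N / 2 - F) / freq[i] * h

# Mode: interpolate within the class of highest frequency.
k = freq.index(max(freq))
f0 = freq[k - 1] if k > 0 else 0
f1 = freq[k]
f2 = freq[k + 1] if k + 1 < len(freq) else 0
mode = (limits[k][0] - 0.05) + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * h

# Sample variance and standard deviation from the grouped data.
variance = sum(f * (m - mean) ** 2 for f, m in zip(freq, mids)) / (N - 1)
std_dev = math.sqrt(variance)

# Part (c) counts the raw times, not the grouped table.
over_50 = sum(t > 50 for t in times)

for (lo, hi), f, c in zip(limits, freq, cum):
    print(f"{lo:.1f}-{hi:.1f}: f = {f:2d}, cumulative = {c:2d}")
print(f"mean = {mean:.2f}, median = {median:.2f}, mode = {mode:.2f}")
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
print(f"students over 50 minutes: {over_50}")
```

Dividing by N instead of N − 1 gives the population variance; which one is expected depends on whether the 50 students are treated as a sample or as the whole group of interest.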

Question 8: The following data result from a 150-question aptitude test given to 50 individuals recently
interviewed for a position at Haskens Manufacturing. The data indicate the number of questions answered
correctly. [2+2+2=6]

a) Construct a stem-and-leaf display and comment on the findings.
b) Determine the five-number summary for the dataset. Prepare a box-and-whisker plot. Are the data
symmetric or skewed?
c) Determine whether the dataset contains any outliers, providing detailed mathematical calculations to
support your conclusion.
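A sketch along these lines can check the stem-and-leaf display, five-number summary, and outlier test in parts (a) to (c). It assumes integer scores (stems are the tens, leaves the units digit) and uses `statistics.quantiles` with its default 'exclusive' method, which places the p-th quantile at position (n + 1)p; textbooks differ on quartile conventions, so hand-computed quartiles may differ slightly. Outliers are flagged with the common 1.5 × IQR fences. The 50 scores are not reproduced here, so `scores` is a hypothetical input.

```python
import statistics
from collections import defaultdict


def stem_and_leaf(scores):
    """Map each stem (score // 10) to its sorted leaves (score % 10)."""
    display = defaultdict(list)
    for s in sorted(scores):
        display[s // 10].append(s % 10)
    return dict(display)


def print_stem_and_leaf(scores):
    """Print the display, keeping empty stems visible."""
    display = stem_and_leaf(scores)
    for stem in range(min(display), max(display) + 1):
        leaves = display.get(stem, [])
        print(f"{stem:>3} | {' '.join(map(str, leaves))}")


def five_number_summary(scores):
    """(minimum, Q1, median, Q3, maximum) using (n + 1)p quartiles."""
    q1, q2, q3 = statistics.quantiles(scores, n=4)  # 'exclusive' is the default
    return min(scores), q1, q2, q3, max(scores)


def iqr_outliers(scores):
    """Return the values outside the 1.5 * IQR fences, and the fences."""
    _, q1, _, q3, _ = five_number_summary(scores)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in scores if s < lower or s > upper], (lower, upper)
```

Comparing Q3 − Q2 with Q2 − Q1 (and the two whisker lengths) from `five_number_summary(scores)` gives a quick check on the symmetry question in part (b).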

Question 9: Exposure to microbial products, especially endotoxin, may have an impact on vulnerability to allergic
diseases. The article “Dust Sampling Methods for Endotoxin—An Essential, But Underestimated Issue” (Indoor Air,
2006: 20–27) considered various issues associated with determining endotoxin concentration. The following data
on concentration (EU/mg) in settled dust for one sample of urban homes and another of farm homes were kindly
supplied by the authors of the cited article. [0.25*3+2+2.5=6]

U 6.0 5.0 11.0 33.0 4.0 5.0 80.0 18.0 35.0 17.0 23.0
F 4.0 14.0 11.0 9.0 9.0 8.0 4.0 20.0 5.0 8.9 21.0 9.2 3.0 2.0 0.3

a) Determine the sample mean for each sample. How do they compare?
b) Determine the sample median for each sample. How do they compare? Why is the median for the urban
sample so different from the mean for that sample?
c) Does either of the datasets exhibit a distinct modal value? If a distinct modal value exists, calculate the
mode. Justify your response with an explanation.
d) Calculate range, inter-quartile range, variance and standard deviation for each dataset.
e) Calculate the coefficient of variation (CV) for each group. Based on these CVs, determine which group
exhibits greater relative stability in endotoxin concentrations.
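The Python sketch below checks parts (a) to (e) for both samples using the standard-library `statistics` module. It assumes sample (n − 1) variances and standard deviations, the 'exclusive' (n + 1)p quartile convention that `statistics.quantiles` uses by default for the IQR, and CV = s / x̄ × 100%.

```python
import statistics

# Endotoxin concentrations (EU/mg), copied from the question.
urban = [6.0, 5.0, 11.0, 33.0, 4.0, 5.0, 80.0, 18.0, 35.0, 17.0, 23.0]
farm = [4.0, 14.0, 11.0, 9.0, 9.0, 8.0, 4.0, 20.0, 5.0, 8.9, 21.0, 9.2,
        3.0, 2.0, 0.3]


def summarize(data):
    """Descriptive statistics for one sample (sample variance and s)."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)                   # divisor n - 1
    q1, _, q3 = statistics.quantiles(data, n=4)   # 'exclusive' method
    return {
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),      # all values tied for top count
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": statistics.variance(data),
        "sd": sd,
        "cv_percent": sd / mean * 100,
    }


for name, data in (("Urban", urban), ("Farm", farm)):
    stats = summarize(data)
    print(name, {key: value for key, value in stats.items()})
```

Whether a value that appears only twice counts as a "distinct" mode is the judgment part (c) asks you to justify; `multimode` simply reports every value tied for the highest count. A smaller CV indicates less variability relative to the mean.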

Question 10: The following data represent the daily sales revenue (in dollars) of a small retail store over a three-week
period (Monday to Sunday). Each value corresponds to the total sales for a particular day. [3+2+2+3=10]

83.47, 72.15, 68.29, 15.93, 90.56, 24.78, 37.62, 58.34, 49.05, 66.81, 12.47, 79.23,
53.89, 31.76, 47.58, 62.94, 25.37, 84.61, 39.28, 71.05, 56.72

a) Calculate the quartiles for the dataset. Determine the values corresponding to the first decile (D₁), fifth
decile (D₅), and ninth decile (D₉) of the dataset.
b) Find the 25th percentile (P₂₅), 50th percentile (P₅₀), and 75th percentile (P₇₅) of the dataset. Show that the
75th percentile is equal to the third quartile for this dataset.
c) Determine the relationship between the measures Me, D₅, P₅₀, and Q₂.
d) Find the coefficient of skewness using Pearson’s estimate. What is your conclusion regarding the shape of
the distribution?
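A sketch for checking parts (a) to (d). It locates the p-th quantile at position (n + 1)p in the ordered data and interpolates linearly between neighbouring values; this is one of several textbook conventions, so answers based on a different rule may differ slightly. For part (d) it assumes Pearson's second (median-based) coefficient, Sk = 3(x̄ − Me)/s, with the sample standard deviation; Pearson's first coefficient uses (x̄ − Mode)/s instead.

```python
import math
import statistics

# Daily sales revenue (dollars) for the 21 days, copied from the question.
sales = [
    83.47, 72.15, 68.29, 15.93, 90.56, 24.78, 37.62, 58.34, 49.05, 66.81, 12.47,
    79.23, 53.89, 31.76, 47.58, 62.94, 25.37, 84.61, 39.28, 71.05, 56.72,
]


def position_quantile(data, p):
    """Value at fraction p (0 < p < 1), located at position (n + 1) * p in the
    ordered data, with linear interpolation between neighbouring values."""
    xs = sorted(data)
    n = len(xs)
    pos = (n + 1) * p  # 1-based position in the ordered data
    k = math.floor(pos)
    if k < 1:
        return xs[0]
    if k >= n:
        return xs[-1]
    return xs[k - 1] + (pos - k) * (xs[k] - xs[k - 1])


q1, q2, q3 = (position_quantile(sales, p) for p in (0.25, 0.50, 0.75))
d1, d5, d9 = (position_quantile(sales, p) for p in (0.10, 0.50, 0.90))
p25, p50, p75 = (position_quantile(sales, p) for p in (0.25, 0.50, 0.75))

mean = statistics.mean(sales)
median = statistics.median(sales)
s = statistics.stdev(sales)               # sample standard deviation
pearson_sk = 3 * (mean - median) / s      # Pearson's median-based coefficient

print(f"Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}")
print(f"D1 = {d1:.3f}, D5 = {d5:.3f}, D9 = {d9:.3f}")
print(f"P25 = {p25:.3f}, P50 = {p50:.3f}, P75 = {p75:.3f}")
print(f"mean = {mean:.2f}, median = {median:.2f}, s = {s:.3f}, Sk = {pearson_sk:.3f}")
```

Under the (n + 1)p rule, P₇₅ and Q₃ share the same position, and with 21 observations Me, D₅, P₅₀, and Q₂ all fall at the 11th ordered value.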

Good Luck!
