Overview
Topics:
Learning Objectives:
Describe the sources of variability in a dataset, how variability relates to outliers, and how it
can be addressed.
Check List:
Introduction
This unit deals with numerical and graphical ways to describe and display data. This area of
statistics is called descriptive statistics. You will learn to calculate and interpret these measures
and graphs.
This unit discusses stem plots, histograms, and box plots. These plots display the distribution
of a numeric variable. The bar plot, introduced in previous units, can be used to display the
distribution of factors (categorical variables).
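A histogram is built by counting how many values fall into each of several equal-width bins. The short Python sketch below shows only that counting step, printing one text bar per bin. The data values, the starting point, and the bin width are hypothetical choices for illustration; in practice you would pick a starting point and width that suit your own data.

```python
# A minimal text histogram: count how many values fall into each
# equal-width bin [start + i*width, start + (i+1)*width), then print
# one bar of '*' per bin. All values below are hypothetical.

def histogram_counts(data, start, width, n_bins):
    """Return a list of counts, one per equal-width bin."""
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)  # index of the bin containing x
        if 0 <= i < n_bins:
            counts[i] += 1
        # values outside the chosen bin range are ignored in this sketch
    return counts

def print_histogram(counts, start, width):
    """Print each bin's interval followed by a bar of '*' characters."""
    for i, c in enumerate(counts):
        lo = start + i * width
        print(f"[{lo}, {lo + width}) {'*' * c}")

data = [3, 7, 8, 12, 13, 13, 15, 18, 21, 22, 27]  # hypothetical values
counts = histogram_counts(data, start=0, width=5, n_bins=6)
print_histogram(counts, start=0, width=5)
```

Silently skipping out-of-range values keeps the sketch short; for real work it is safer to choose the bins so that every value is covered.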
Numerical summaries of a variable characterize important features of its distribution. The mean
and the median identify the location (center) of the distribution, while the standard deviation
(and variance) and the interquartile range measure the spread of the data.
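As an illustration of these measures, the Python sketch below computes the mean, median, sample variance, sample standard deviation, and interquartile range with the standard-library `statistics` module. The data values are hypothetical. Quartiles are computed with `method="inclusive"`, which interpolates linearly between ordered values, similar to spreadsheet QUARTILE functions; the textbook's hand method (median of each half) can give slightly different quartiles, so use the convention your course expects.

```python
import statistics

# Hypothetical sample, used only to illustrate the measures named above.
data = [4, 8, 6, 5, 3, 7, 9, 5]

# Measures of location
mean = statistics.mean(data)
median = statistics.median(data)

# Measures of spread
variance = statistics.variance(data)  # sample variance (divides by n - 1)
sd = statistics.stdev(data)           # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                         # interquartile range

print(f"mean={mean}, median={median}")
print(f"variance={variance:.4f}, sd={sd:.4f}, IQR={iqr}")
```

For this sample the inclusive rule gives Q1 = 4.75 and Q3 = 7.25, whereas taking the medians of the lower and upper halves by hand gives 4.5 and 7.5, which illustrates why the method should be stated.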
Reading Assignment
Illowsky, B., Dean, S., Birmajer, D., Blount, B., Boyd, S., Einsohn, M., Helmreich, Kenyon,
L., Lee, S., & Taub, J. (2022). Introductory statistics. OpenStax.
https://openstax.org/details/books/introductory-statistics
o To get started, thoroughly read the Syllabus for this course accessible on the
course homepage in the General Information section.
o View the online book.
o Read Chapter 2 - Descriptive Statistics
Section 2.1 - Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs
Section 2.2 - Histograms, Frequency Polygons and Time Series Graphs
Section 2.3 - Measures of the Location of the Data
Section 2.4 - Box Plots
Section 2.5 - Measures of the Center of the Data
Section 2.6 - Skewness and the Mean, Median and Mode
Section 2.7 - Measures of the Spread of the Data
Section 2.8 - Descriptive Statistics
o Solve the practice exercises in the attached file Practice Exercise- Unit 2.pdf as
homework.
Video:
o This statistics video tutorial presents boxplots and the useful information they
convey.
o A boxplot is a commonly seen plot and conveys a lot of information in a single
plot.
o They are useful for describing the distribution of a numeric variable, as well
as indicating a few key points, such as the median and quartiles.
o They are also useful in identifying outliers.
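To make the outlier idea concrete, the Python sketch below computes the five-number summary that a box plot draws (minimum, Q1, median, Q3, maximum) and flags values outside the fences Q1 − 1.5·IQR and Q3 + 1.5·IQR, a common convention for marking potential outliers. The data values are hypothetical, the helper names are illustrative, and the quartiles use the `statistics.quantiles` "inclusive" rule, which may differ slightly from quartiles computed by hand.

```python
import statistics

def five_number_summary(data):
    """Min, Q1, median, Q3, max, using the 'inclusive' quartile rule."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

def iqr_outliers(data, k=1.5):
    """Return values below Q1 - k*IQR or above Q3 + k*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr  # the box plot "fences"
    return [x for x in data if x < low or x > high]

data = [12, 15, 14, 10, 13, 16, 15, 14, 11, 42]  # hypothetical; 42 looks suspicious
print(five_number_summary(data))
print(iqr_outliers(data))
```

A flagged value is only a candidate for closer inspection; whether it is a typo, a faulty measurement, or a genuine extreme observation still has to be decided from context.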
MarinStatsLectures-R Programming & Statistics. (2019, September 3). Mean, median and
mode in statistics | statistics tutorial | MarinStatsLectures [Video]. YouTube.
o The tutorial introduces the mean, median, mode, and other measures of
“center” for a numeric variable.
MarinStatsLectures-R Programming & Statistics. (2018, June 19). Standard deviation &
degrees of freedom explained | statistics tutorial | MarinStatsLectures [Video]. YouTube.
o In this statistics video tutorial, you will learn the concept underlying the sample
standard deviation and what it actually measures.
o It also explains degrees of freedom and why we divide by (n - 1).
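The Python sketch below spells out the sample standard deviation formula, s = sqrt(Σ(x − x̄)² / (n − 1)), and checks it against the standard library's `statistics.stdev`. It also shows `statistics.pstdev`, which divides by n instead (the population version), so the effect of the n − 1 denominator is visible. The sample values are hypothetical.

```python
import math
import statistics

def sample_sd(data):
    """Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    return math.sqrt(ss / (n - 1))           # n - 1 degrees of freedom

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
print(sample_sd(data), statistics.stdev(data))  # both divide by n - 1
print(statistics.pstdev(data))                  # divides by n (population)
```

Dividing by the smaller number n − 1 makes the sample value slightly larger than the population-style value; the difference shrinks as n grows.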
The Organic Chemistry Tutor. (2019, January 11). Stem and leaf plots [Video]. YouTube.
o This video tutorial explains how to make a simple stem and leaf plot.
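A stem-and-leaf plot splits each value into a stem (here, the tens digit) and a leaf (the ones digit) and lists the leaves for each stem in order. The Python sketch below does this for non-negative whole numbers and keeps empty stems visible so gaps in the data stay visible too. The data values are hypothetical, and the tens/ones split is an assumption that suits two-digit data.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return a stem-and-leaf plot (stem = tens, leaf = ones) as a string.

    Assumes a non-empty list of non-negative integers.
    """
    stems = defaultdict(list)
    for x in sorted(data):
        stems[x // 10].append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

data = [32, 35, 41, 47, 47, 58, 63, 65, 65, 69, 81]  # hypothetical values
print(stem_and_leaf(data))
```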
Discussion Assignment
In the discussion forum, you are expected to participate often and engage in deep levels of
discourse. You are required to post an initial response to the question/issue presented in
the Forum by Sunday evening and then respond to at least 3 of your classmates’ initial
posts. You should also respond to anyone who has responded to you.
An important practice is to check the validity of any data set that you analyze. One
goal is to detect typos in the data; another is to detect faulty measurements. Recall
that outliers are observations with values outside the "normal" range of values of the
rest of the observations.
o Specify a large population that you might want to study and describe
the type of numeric measurement that you will collect (examples: a
count of things, the height of people, a score on a survey, the weight of
something) for your study.
o What is the best course of action statistically if you find a few outliers
in a sample of size 100?
Your discussion post should be at least 200 words and no more than 500 words. Please
include a word count. Following the APA standard, use references and in-text citations
for the textbook and any other sources.
Always prioritize using LibreOffice Calc to obtain values, as it will be a key tool for
the proctored final exam.
Written Assignment
"You are required to submit a substantial response to all questions and provide all
necessary theorems and techniques for the solutions"
Forty randomly selected students were asked the number of pairs of sneakers they owned.
Let X = the number of pairs of sneakers owned. The results are as follows:
X    Frequency    Relative Frequency    Cumulative Relative Frequency
1    2
2    5
3    8
4    12
5    12
6    0
7    1
1. Find the mean, x̄.
2. Find the sample standard deviation, s.
3. Complete the Relative Frequency column and the Cumulative Relative Frequency
column.
4. Find the first quartile.
5. Find the median.
6. Find the third quartile.
7. What percent of the students owned at least five pairs?
8. Find the 40th percentile.
9. Find the 90th percentile.
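The Python sketch below walks through the general steps for a frequency table: relative frequency (frequency ÷ n), cumulative relative frequency (running total), the mean and sample standard deviation computed from frequencies, a share such as "at least 3", and percentiles after expanding the table back into individual observations. The table is hypothetical and is not the sneaker data above, so the assignment still has to be worked from its own table. Percentiles use `statistics.quantiles(..., method="inclusive")`, which interpolates between ordered values much as spreadsheet PERCENTILE functions do; the textbook's hand method may give slightly different percentiles.

```python
import math
import statistics

# Hypothetical frequency table (value -> frequency); NOT the sneaker data above.
table = {0: 3, 1: 6, 2: 7, 3: 3, 4: 1}

n = sum(table.values())  # total number of observations

# Relative frequency = frequency / n; cumulative relative frequency is the running total.
cumulative = 0.0
print("X  Freq  RelFreq  CumRelFreq")
for x, f in sorted(table.items()):
    rel = f / n
    cumulative += rel
    print(f"{x}  {f:4d}  {rel:7.3f}  {cumulative:10.3f}")

# Mean and sample standard deviation computed directly from the table.
mean = sum(x * f for x, f in table.items()) / n
ss = sum(f * (x - mean) ** 2 for x, f in table.items())  # weighted squared deviations
s = math.sqrt(ss / (n - 1))

# Share of observations that are at least 3.
share_at_least_3 = sum(f for x, f in table.items() if x >= 3) / n

# Percentiles: expand the table back into raw observations first.
values = [x for x, f in sorted(table.items()) for _ in range(f)]
deciles = statistics.quantiles(values, n=10, method="inclusive")
p40, p90 = deciles[3], deciles[8]  # cut points at 40% and 90%

print(f"n={n}, mean={mean}, s={s:.4f}, share(X>=3)={share_at_least_3}")
print(f"40th percentile={p40}, 90th percentile={p90}")
```

The final cumulative relative frequency should come out as 1 (up to rounding), which is a quick check that the relative frequency column was filled in correctly.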
Reference:
Problem No. 114. Illowsky, B., Dean, S., Birmajer, D., Blount, B., Boyd, S., Einsohn, M.,
Helmreich, Kenyon, L., Lee, S., & Taub, J. (2022). Introductory statistics. OpenStax.
https://openstax.org/books/introductory-statistics/pages/2-bringing-it-together-homework
Always prioritize using LibreOffice Calc to obtain values, as it will be a key tool for
the proctored final exam.
Learning Journal
The table below gives data for the city of Detroit, Michigan, USA, for the period 1961 to
1973: the number of full-time police per 100,000 citizens and the number of homicides per
100,000 citizens.
Year    Police (per 100,000)    Homicides (per 100,000)
1961    260.35                  8.6
1962    269.8                   8.9
1963    272.04                  8.52
1964    272.96                  8.89
1965    272.51                  13.07
1966    261.34                  14.57
1967    268.89                  21.36
1968    295.99                  28.03
1969    319.87                  31.49
1970    341.43                  37.39
1971    356.59                  46.26
1972    376.69                  47.24
1973    390.19                  52.33
Using a common x-axis for the year, construct the time series of the number of police
and the number of homicides on the same graph.
Identify which variable (police or homicides) showed the higher percentage increase
relative to its initial value, and share your conclusions.
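The percentage increase relative to the initial value compares the change with the starting level. In general form (the example numbers below are hypothetical and are not taken from the Detroit table):

```latex
\text{percentage increase}
  = \frac{\text{final value} - \text{initial value}}{\text{initial value}} \times 100\%,
\qquad
\text{e.g. } \frac{80 - 50}{50} \times 100\% = 60\%.
```

Because each variable is divided by its own starting value, the two percentages can be compared even though police and homicides are measured on very different scales.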
Did the increase in police officers have an impact on the homicide rate? Why?
o Use the concepts introduced in this unit to solve the problem.
o The submission must include a clear copy of the graph. A handwritten graph
is acceptable if each data point is visible on the graph.
o Your submission can be a Word document, a spreadsheet, or a PDF.
Always prioritize using LibreOffice Calc to obtain values, as it will be a key tool for
the proctored final exam.