
Descriptive Statistics


Simple Descriptive Statistics


A statistic is any single value that summarizes an entire dataset.
Sample Statistics
• A sample statistic is any number computed from your sample data.
Examples include the sample average, sample median, sample standard
deviation, and sample percentiles.
Population Parameter
• A population parameter is defined as any number computed for the
entire population. Examples include the population mean and
population standard deviation.
Types of Data
Qualitative or Categorical Data
• Qualitative data, also known as categorical data, describes data
that fits into categories. Qualitative data are not numerical.
• Categorical information involves categorical variables that
describe features such as a person’s gender, home town etc.
• Categorical measures are defined in terms of natural language
specifications, not in terms of numbers.
• Examples of categorical data are birthdate, favorite sport, and school
postcode. The birthdate and school postcode are written with digits,
but they act as labels and carry no numerical meaning.
• Nominal Data
• Nominal data is a type of qualitative data that labels variables without
assigning them a numerical value.
• Nominal data is also said to be measured on the nominal scale. It cannot be
ordered or measured. A nominal value may sometimes look numeric (for example, a
postcode), but the number serves only as a label. Examples of nominal data are
letters, symbols, words, gender etc.
• Nominal data is examined using the grouping method.
• In this method, the data are grouped into categories, and then the frequency or
the percentage of each category is calculated.
• These data are often represented visually using pie charts.
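The grouping method can be sketched in a few lines of Python. This is only an illustration: the function name `frequency_table` and the sample list `sports` are made up for the example and do not come from the slides.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percentage)} for a list of nominal labels."""
    counts = Counter(values)   # group identical labels and count them
    total = len(values)
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}

# Hypothetical sample: favorite sport of six respondents.
sports = ["football", "tennis", "football", "cricket", "football", "tennis"]
table = frequency_table(sports)
for label, (count, pct) in table.items():
    print(f"{label}: {count} ({pct:.1f}%)")
```

The percentages are exactly the shares that a pie chart of the same data would display.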
• Ordinal Data
• Ordinal data/variable is a type of data that follows a natural order.
• The significant feature of ordinal data is that the difference
between the data values is not determined: the categories can be
ranked, but the spacing between them is unknown.
• This type of variable is common in surveys and questionnaires, and
in fields such as finance and economics.
• Example: low, medium, high.
• Ordinal data is commonly represented using a bar chart, although
other visualization tools can also be used.
• The information may also be presented in a table in which each row
shows a distinct category.
• Quantitative or Numerical Data
• Quantitative data is also known as numerical data which represents the
numerical value (i.e., how much, how often, how many).
• Numerical data gives information about the quantities of a specific thing.
• Some examples of numerical data are height, length, size, weight, and so on. The
quantitative data can be classified into two different types based on the data sets.
• The two different classifications of numerical data are discrete data and
continuous data.
Discrete Data
• Discrete data can take only distinct, separate values. The possible
values can be counted, and they cannot be subdivided meaningfully.
Here, things are counted in whole numbers.
• Example: Number of students in the class
Continuous Data
• Continuous data is data that can be measured. It has an infinite number of
possible values within any given range.
• Example: Temperature range
Descriptive and Inferential Statistics

Descriptive statistics summarize information already present in data


• Visualizations like bar charts, histograms, etc.
• Summary measures like averages, standard deviation, median, etc.
Inferential statistics use a sample of data to make predictions about
larger populations or about unobserved/future trends
• Generalizations from a sample to a population: Confidence intervals,
hypothesis tests, etc.
• Comparisons made between datasets: Comparisons, correlations,
regression, etc.
Measures of Central Tendency
Central tendency is a descriptive summary of a dataset through a single value
that reflects the center of the data distribution. Along with the variability
(dispersion) of a dataset, central tendency is a branch of descriptive statistics.
Generally, the central tendency of a dataset can be described using the
following measures:
• Mean
• Median
• Mode
Mean (x̄)
• Add values in the data set and divide by the number of values
Population Mean (μ)
• Population Mean (μ) = ∑X / N
• Where, ∑X is the summation of data in X
• And, N is the count of data in X.
Formula for Sample Mean (x̄)
• x̄ = ∑x / n, where ∑x is the sum of the sample values and n is the number of values in the sample
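As a minimal sketch of the formula, the mean can be computed by hand in Python. The helper name `mean` and the list `scores` are illustrative only.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data is undefined")
    return sum(values) / len(values)

# Hypothetical data set.
scores = [4, 8, 15, 16, 23, 42]
print(mean(scores))  # 108 / 6 = 18.0
```

The same arithmetic applies to μ and x̄. The only difference is whether the values are the whole population or a sample.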
Median
• The median of a set of data is the middlemost number or centre value
in the set.
• The median is also the number that is halfway into the set.
• To find the median, the data should be arranged first in order of least
to greatest or greatest to the least value.
• The median is the value that separates the higher half of a data
sample, a population or a probability distribution from the lower half.
Odd Number of Observations

• If the total number of observations n is odd, the median is the
((n+1)/2)th observation of the sorted data.
Even Number of Observations

• If the total number of observations n is even, the median is the
average of the two middle values of the sorted data, i.e. the (n/2)th
and (n/2 + 1)th observations:
• Median = Sum of two Middle Values / 2
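The odd and even cases can be combined in one small function. This is a sketch with an illustrative name (`median`). Python lists are 0-indexed, so the ((n+1)/2)th observation sits at index n // 2.

```python
def median(values):
    """Median of a list: middle value (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)          # the data must be sorted first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # the ((n+1)/2)th observation, 1-based
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the two middle values

print(median([7, 1, 5]))      # sorted: 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))   # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```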
Mode
• The mode is the most frequently occurring value in a dataset
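A short sketch of finding the mode in Python follows. A dataset can have more than one most frequent value, so this illustrative helper (`modes`, a name chosen for the example) returns all of them.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 1, 2]))            # [1]
print(modes([2, 3, 3, 5, 5, 1]))   # 3 and 5 both occur twice -> [3, 5]
```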
Measures of Dispersion
• Measures of dispersion describe how scattered the data are.
• They are numbers that quantify different aspects of the spread of
the data, such as the distance between the extremes or the typical
deviation from the mean.
Some absolute measures of dispersion are:
• Range
• Inter-Quartile Range
• Variance
• Standard Deviation
Range
• A measure of the spread of the data
• Range = maximum – minimum
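As a quick illustration, the range is a one-line computation. The function name `data_range` and the sample values are hypothetical.

```python
def data_range(values):
    """Range of a dataset: largest value minus smallest value."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 20, 9]))  # 20 - 3 = 17
```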
Quartiles
• Quartiles are three values that split sorted data into four parts, each
with an equal number of observations.
• Quartiles are a type of quantile.
• First quartile: Also known as Q1, or the lower quartile.
• Second quartile: Also known as Q2, or the median.
• Third quartile: Also known as Q3, or the upper quartile.
Inter-Quartile Range (IQR)
• IQR = 3rd Quartile – 1st Quartile
• The inter-quartile range measures the spread of the middle 50 percent
of the data
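The quartiles and the IQR can be sketched with Python's standard library. Several conventions exist for interpolating quartiles and the slides do not say which one is intended. The choice of `statistics.quantiles` with `method="inclusive"` (Python 3.8+) is therefore an assumption, and other methods can give slightly different values. The sample data is made up.

```python
import statistics

# Hypothetical data set with nine observations.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the sorted data into four parts.
# method="inclusive" is an assumed convention, not one stated in the slides.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 5.0 7.0
print(iqr)         # 4.0
```

Here Q2 equals the median of the data, as the definition of the second quartile requires.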
Variance
• The average of the squared differences from the mean (for a sample, the sum of squared differences divided by n − 1)
• Measured in units squared

Formula: s² = ∑(Xᵢ − X̄)² / (n − 1)
where X̄ is the sample mean and n is the number of observations
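The sample variance formula, with n − 1 in the denominator, can be written out step by step. The function name `sample_variance` and the data values are illustrative.

```python
def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data))  # 32 / 7, about 4.571
```

For comparison, `statistics.variance` in the Python standard library uses the same n − 1 convention.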
Standard Deviation
• Square root of the variance
• Returns the units to those of the original variable

Formula: s = √( ∑(Xᵢ − X̄)² / (n − 1) )
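A minimal sketch of the sample standard deviation, taken as the square root of the n − 1 variance, follows. The name `sample_std` and the data are illustrative.

```python
import math

def sample_std(values):
    """Sample standard deviation: square root of the n - 1 sample variance."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two observations")
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)

# Hypothetical data: variance is 32 / 7, so the result is sqrt(32 / 7), about 2.138.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_std(data), 3))  # 2.138
```

Because of the square root, the result is expressed in the same units as the data, unlike the variance.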