Lecture-01 What Is Statistics

Statistics and Data Science

STAT 101: Basic Statistics

Department of Statistics
Jahangirnagar University

Md. Atikur Rahman


Email: arahman@juniv.edu
Contact: 01779734089
Statistics
What is Statistics?

•Lies, damned lies, and statistics


Introduction of Statistics
The study of statistics has become more popular than ever over the past
four decades or so. The increasing availability of computers and
statistical software packages has enlarged the role of statistics as a tool
for empirical research. As a result, statistics is used for research in
almost all professions, from medicine to sports, as well as in the physical sciences.
Today, university students in almost all disciplines are required to take
at least one statistics course.
Introduction of Statistics
Every day we make decisions that may be personal, business related, or of some
other kind. Usually these decisions are made under conditions of uncertainty. Many
times, the situations or problems we face in the real world have no precise or
definite solution. Statistical methods help us make scientific and intelligent
decisions in such situations.

For example, whether a large store succeeds in an area depends heavily on the local
demand for it. Statistics helps us assess that demand before deciding whether to open the store.
Definition of Statistics

Statistics is a group of methods used to collect, analyze, present, and
interpret data and to make decisions.
Importance of Statistics
•Statistics in planning
•Statistics in economics
•Statistics in business
•Statistics in the medical sector, etc.
Types of Statistics

•Descriptive Statistics

•Inferential Statistics
Descriptive Statistics
Suppose we have information on the test scores of students enrolled in a statistics class. In
statistical terminology, the whole set of numbers that represents the scores of students is
called a data set, the name of each student is called an element, and the score of each student
is called an observation. A data set in its original form is usually very large. Consequently,
such a data set is not very helpful in drawing conclusions or making decisions. It is easier to
draw conclusions from summary tables and diagrams than from the original version of a data
set. So, we reduce data to a manageable size by constructing tables, drawing graphs, or
calculating summary measures such as averages. The portion of statistics that helps us do this
type of statistical analysis is called descriptive statistics.
Definition of Descriptive Statistics
Descriptive statistics consists of methods for organizing,
displaying, and describing data by using tables, graphs, and
summary measures.
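
As a rough illustration of organizing, displaying, and describing data, the sketch below builds a frequency table, a text bar chart, and one summary measure. The test scores and the `score_band` helper are hypothetical; the lecture does not give actual values.

```python
from collections import Counter
from statistics import mean

# Hypothetical test scores (assumption: not taken from the lecture).
scores = [55, 62, 68, 71, 71, 74, 78, 80, 83, 85, 88, 91]

def score_band(score, width=10):
    """Return the lower bound of the band a score falls in (70 for 70-79)."""
    return (score // width) * width

# Organize: count how many scores fall in each 10-point band.
frequency = Counter(score_band(s) for s in scores)

# Display: a simple frequency table with a text bar chart.
for lower in sorted(frequency):
    count = frequency[lower]
    print(f"{lower}-{lower + 9}: {'#' * count} ({count})")

# Describe: one summary measure for the whole data set.
average = mean(scores)
print(f"mean score = {average:.2f}")
```

The table and the mean are far easier to read than the twelve raw scores, which is the point of descriptive statistics.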
Inferential Statistics

Inferential statistics consists of methods that use sample results to help
make decisions or predictions about a population.
Population vs Sample
Population

A population consists of all elements (individuals, items, or objects)
whose characteristics are being studied.

Sample

A representative portion of the population selected for study is referred
to as a sample.
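
The relationship between a population and a sample can be sketched as follows. The population of heights is simulated and purely hypothetical, and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: heights (cm) of 1,000 individuals.
population = [rng.gauss(165, 8) for _ in range(1000)]

# Sample: 50 elements drawn at random, without replacement.
sample = rng.sample(population, 50)

population_mean = mean(population)
sample_mean = mean(sample)
print(f"population mean = {population_mean:.1f}")
print(f"sample mean     = {sample_mean:.1f}")
```

In practice the population mean is usually unknown; the sample mean serves as an estimate of it.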
Variable
A variable is a characteristic under study that assumes different values
for different elements; that is, it varies from person to person or from
unit to unit.

For example: height, volume, time, exam score, hair colour, eye colour.
Types of variable
❑Qualitative or Categorical Variables

❑Quantitative Variables

• Discrete Variables

• Continuous Variables
Qualitative or Categorical Variables or Attributes

A variable that cannot assume a numerical value but can be
classified into two or more nonnumeric categories is called a
qualitative or categorical variable. The data collected on such a
variable are called qualitative data.

For example: gender, hair colour, family status etc.


Quantitative Variables
A variable that can be measured numerically is called a
quantitative variable. The data collected on a quantitative
variable are called quantitative data.

For example: the amount of money you have, the number of
students present in a class, etc.
Types of Quantitative Variables
Discrete variable

A variable whose values are countable is called a discrete variable. In other
words, a discrete variable can assume only certain values with no
intermediate values.

Example: The number of traffic citations a person received during the last
year, the number of customers arriving for service during a particular period.
Types of Quantitative Variables
Continuous variable

A variable that can assume any numerical value over a
certain interval is called a continuous variable.

For example: weight of an individual, height, time.


Types of variables
Level of measurements or scales of measurements
To perform statistical analysis of data, it is important first to understand the variables and
what they measure. Statistics distinguishes several levels of measurement, and data
measured at these levels can be broadly classified into qualitative and quantitative data.

For example, it is practically impossible to calculate the average hourly rate of all workers in
the US. So, a sample of workers is randomly selected such that it represents the larger
population appropriately. Then the average hourly rate of this sample is calculated. Using
statistical tests, you can draw conclusions about the average hourly rate of the larger
population. The level of measurement of a variable determines which type of statistical test
can be used.
Level of measurements or scales of measurements

Four fundamental levels of measurement are:


•Nominal
•Ordinal
•Interval
•Ratio
Nominal Scale
• Nominal scale is a naming scale, where variables are simply “named” or labeled,
with no specific order.
Example: Where do you live?
• 1- Dhaka
• 2- Jahangirnagar University
• 3- Near Jahangirnagar University
Another example: A customer survey asking “Which brand of smartphone do you
prefer?”
1. Apple 2. Samsung 3. OnePlus 4. Others
Nominal Scale Examples
•Gender

•Political preferences

•Place of residence etc.


Nominal Scale Data and Analysis
There are two primary ways in which nominal scale data can be collected:

• By asking an open-ended question, the answers to which can be coded to a
respective number or label decided by the researcher.

• By including a multiple-choice question in which the answers are labeled.

In both cases, the analysis of gathered data will happen using percentages or
mode, i.e., the most common answer received for the question.
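
A minimal sketch of this kind of analysis, using percentages and the mode. The survey responses below are hypothetical and only reuse the brand labels from the smartphone example.

```python
from collections import Counter

# Hypothetical answers to "Which brand of smartphone do you prefer?"
responses = ["Apple", "Samsung", "Samsung", "OnePlus", "Apple",
             "Samsung", "Others", "Samsung", "Apple", "OnePlus"]

counts = Counter(responses)
total = len(responses)

# Percentage of respondents in each category.
percentages = {brand: 100 * n / total for brand, n in counts.items()}
for brand, pct in percentages.items():
    print(f"{brand}: {pct:.0f}%")

# Mode: the most common answer.
mode_brand, mode_count = counts.most_common(1)[0]
print(f"mode = {mode_brand} ({mode_count} responses)")
```

Note that a mean of the numeric codes (1, 2, 3, 4) would be meaningless here, because the codes are only labels.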
Ordinal Scale
Ordinal Scale is defined as a variable measurement scale used to simply
depict the order of variables and not the difference between each of the
variables.

For example: How satisfied are you with today’s class?

1. Very Unsatisfied
2. Unsatisfied
3. Neutral
4. Satisfied
5. Very Satisfied
Ordinal Data and Analysis
• Ordinal scale data can be presented in tabular or graphical formats for a researcher.

• Methods such as the Mann-Whitney U test and the Kruskal–Wallis H test can also
be used to analyze ordinal data. These methods are generally implemented to
compare two or more ordinal groups.

• In the Mann-Whitney U test, researchers can assess whether the values in one group
tend to be larger or smaller than the values in another, independently sampled group.
In the Kruskal–Wallis H test, researchers can analyze whether two or more ordinal
groups have the same median or not.
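
The idea behind the Mann-Whitney U statistic can be sketched by comparing every pair of observations across two groups. The ratings and the `mann_whitney_u` helper below are hypothetical, and the sketch computes only the U statistic, not a p-value; a statistics package would normally be used for the full test.

```python
from statistics import median

# Hypothetical satisfaction ratings (1 = Very Unsatisfied ... 5 = Very Satisfied).
section_a = [4, 5, 3, 4, 5, 4]
section_b = [2, 3, 3, 1, 4, 2]

def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs where x is larger, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yi in y:
            if xi > yi:
                u += 1.0
            elif xi == yi:
                u += 0.5
    return u

u_a = mann_whitney_u(section_a, section_b)
u_b = mann_whitney_u(section_b, section_a)

print(f"U for section A = {u_a}, U for section B = {u_b}")
print(f"median A = {median(section_a)}, median B = {median(section_b)}")
```

The two U values always add up to the number of pairs (here 6 × 6 = 36). A U value close to that total for one group suggests its ratings tend to be higher than the other group's.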
Interval Scale (Distance is meaningful)
Interval Scale is defined as a numerical scale where the order of the variables is
known as well as the difference between these variables.

For example: 80 degrees is always higher than 50 degrees and the difference between
these two temperatures is the same as the difference between 70 degrees and 40
degrees.

Another example: A student anxiety score could be treated as an interval-level
measurement if the step from a score of 10 to a score of 11 represents the same change
in anxiety as the step from a score of 40 to a score of 41.
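
The temperature example can be checked numerically. The sketch below assumes the "degrees" are Fahrenheit (the slide does not say) and converts them to Celsius: equal differences stay equal after the change of unit, but ratios do not, which is why an interval scale supports differences and not statements such as "twice as hot".

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Assumption: the temperatures in the example are in Fahrenheit.
pairs_f = [(80, 50), (70, 40)]

# Differences: both pairs are 30 degrees apart in Fahrenheit.
diffs_c = [fahrenheit_to_celsius(a) - fahrenheit_to_celsius(b) for a, b in pairs_f]
print("differences equal in Celsius too:", math.isclose(diffs_c[0], diffs_c[1]))

# Ratios: 80 / 40 looks like "twice as hot", but the ratio changes with the unit.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)
print(f"ratio in Fahrenheit = {ratio_f:.1f}, ratio in Celsius = {ratio_c:.1f}")
```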
Interval Data and Analysis
• All the techniques applicable to nominal and ordinal data analysis are
applicable to interval data as well. Apart from those techniques, there are
a few analysis methods, such as descriptive statistics and correlation and
regression analysis, which are used extensively for analyzing interval data.

• Descriptive statistics is the term given to the analysis of numerical data
that helps to describe or summarize data in a meaningful manner; it
includes the calculation of the mean, median, and mode.
Ratio Scale (Absolute zero)
• Ratio Scale is defined as a variable measurement scale that not only produces the order
of variables but also makes the difference between variables known along with
information on the value of true zero.

• Example: What is your daughter’s current height?
• Less than 5 feet
• 5 feet 1 inch – 5 feet 5 inches
• 5 feet 6 inches – 6 feet
• More than 6 feet
Ratio Data and Analysis
• Ratio scale data is quantitative in nature, so all quantitative
analysis techniques, such as cross-tabulation and conjoint analysis, can be
used to analyze ratio data.
Specify the levels of measurement
• Number of text messages you send daily

• Your monthly cellular phone bill

• IQ score

• The temperature in Dhaka city

• Faculty ranks (like Professor, Associate Professor, and Assistant Professor)

• Judging (first place, second place, third place, etc.)

• Department

• Postal Code
Data
A data set is a collection of observations or information on one or
more variables.
Types of Data
•Qualitative Data
•Quantitative Data
✔Discrete data
✔Continuous data
Quantitative and Qualitative data
1st year GPA of 20 students

3.23 3.33 3.45 3.67 3.89 3.92 3.98 3.45 3.23 3.45
3.56 3.67 3.13 3.13 2.59 2.58 3.09 3.01 3.43 3.23
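
As a small worked example, the 20 GPA values above can be reduced to a few summary measures. The choice of measures (mean, median, mode, range) is an illustration and is not prescribed by the slide.

```python
from statistics import mean, median, multimode

# The 20 first-year GPA values listed above.
gpa = [3.23, 3.33, 3.45, 3.67, 3.89, 3.92, 3.98, 3.45, 3.23, 3.45,
       3.56, 3.67, 3.13, 3.13, 2.59, 2.58, 3.09, 3.01, 3.43, 3.23]

n = len(gpa)
gpa_mean = mean(gpa)            # arithmetic average
gpa_median = median(gpa)        # middle value of the sorted data
gpa_modes = multimode(gpa)      # most frequent value(s); there may be more than one
gpa_range = max(gpa) - min(gpa) # spread between lowest and highest GPA

print(f"n = {n}")
print(f"mean = {gpa_mean:.3f}, median = {gpa_median:.2f}")
print(f"mode(s) = {gpa_modes}, range = {gpa_range:.2f}")
```

GPA is a quantitative variable, so averages are meaningful here; for a qualitative variable only counts, percentages, and the mode would apply.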

Status of student
Time Series Data
• Time series data, also referred to as time-stamped data, is a sequence of data
points indexed in time order. Time-stamped data is data collected at different
points in time.
• These data points typically consist of successive measurements made from the
same source over a time interval and are used to track change over time.
• Time series data is a collection of observations obtained through repeated
measurements over time.
Examples of time series data
• Monthly production of rice
• Rainfall measurements
• Stock prices
• Number of sunspots
• Annual retail sales
• Monthly subscribers
• Heartbeats per minute
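
Tracking change over time can be sketched with a short series. The monthly rice production figures below are hypothetical (in thousand tonnes) and serve only to show how successive measurements are compared.

```python
# Hypothetical monthly rice production (thousand tonnes), in time order.
production = [
    ("2023-01", 120), ("2023-02", 125), ("2023-03", 123),
    ("2023-04", 130), ("2023-05", 138), ("2023-06", 135),
]

# Month-over-month change: each value minus the value one month earlier.
changes = [
    (later[0], later[1] - earlier[1])
    for earlier, later in zip(production, production[1:])
]

for month, change in changes:
    print(f"{month}: {change:+d}")
```

The order of the observations matters here: shuffling the list would change every computed difference, which is what distinguishes time series data from an unordered data set.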
Cross-sectional Data
Cross-sectional data is data collected by observing various subjects (such as firms,
countries, regions, or individuals) at the same point in time.

In other words, cross-sectional data is collected from all the participants at the same
time.

For example: suppose I want to measure current blood pressure levels in a population. The
underlying population should have members with similar characteristics. 20 people will be
selected randomly from that population and their blood pressure measured. Their height,
weight and other health factors will also be noted.
Data Collection and Sampling Techniques
•Random Sampling

•Systematic Sampling

•Stratified Sampling

•Cluster Sampling
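
The four techniques can be sketched on a small sampling frame. Everything below is hypothetical: the 100 students, the four departments, and the sample size of 20 are assumptions chosen for illustration, and departments are used both as strata and as clusters only to keep the example short.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical sampling frame: 100 students, each in one of 4 departments.
departments = ["Statistics", "Economics", "Physics", "Chemistry"]
students = [(i, departments[i % 4]) for i in range(100)]

# Random sampling: every student has the same chance of selection.
simple = rng.sample(students, 20)

# Systematic sampling: random start, then every k-th student in the list.
k = len(students) // 20
start = rng.randrange(k)
systematic = students[start::k]

# Stratified sampling: draw 5 students at random from each department.
stratified = []
for dept in departments:
    stratum = [s for s in students if s[1] == dept]
    stratified.extend(rng.sample(stratum, 5))

# Cluster sampling: pick one department at random and take everyone in it.
chosen = rng.sample(departments, 1)
cluster = [s for s in students if s[1] in chosen]

print(len(simple), len(systematic), len(stratified), len(cluster))
```

Stratified sampling guarantees that every department is represented, whereas cluster sampling studies whole groups and leaves the other groups out.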
