Lecture-01 What Is Statistics
Department of Statistics
Jahangirnagar University
Whether or not the demand in an area is assessed before a large store is opened
there may affect the store's success. In such situations, statistics helps us make decisions.
Definition of Statistics
•Descriptive Statistics
•Inferential Statistics
Descriptive Statistics
Suppose we have information on the test scores of students enrolled in a statistics class. In
statistical terminology, the whole set of numbers that represents the scores of students is
called a data set, the name of each student is called an element, and the score of each student
is called an observation. A data set in its original form is usually very large. Consequently,
such a data set is not very helpful in drawing conclusions or making decisions. It is easier to
draw conclusions from summary tables and diagrams than from the original version of a data
set. So, we reduce data to a manageable size by constructing tables, drawing graphs, or
calculating summary measures such as averages. The portion of statistics that helps us do this
type of statistical analysis is called descriptive statistics.
Definition of Descriptive Statistics
Descriptive statistics consists of methods for organizing,
displaying, and describing data by using tables, graphs, and
summary measures.
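As a minimal sketch of organizing and summarizing a data set, the Python snippet below builds a frequency table and an average from ten test scores. The student IDs and scores are hypothetical, made up only for illustration.

```python
from collections import Counter
from statistics import mean

# Hypothetical data set: each key is an element (a student),
# each value is an observation (that student's score).
scores = {
    "S01": 82, "S02": 67, "S03": 91, "S04": 74, "S05": 58,
    "S06": 88, "S07": 73, "S08": 65, "S09": 79, "S10": 95,
}


def band(score):
    """Return the ten-point band a score falls in, e.g. 74 -> '70-79'."""
    low = score // 10 * 10
    return f"{low}-{low + 9}"


# Organize: a frequency table of scores by band.
frequency = Counter(band(s) for s in scores.values())

# Summarize: one summary measure, the average score.
average = mean(scores.values())

for label in sorted(frequency):
    print(label, frequency[label])
print("average:", average)
```

The table and the average are far easier to read than the raw list, which is the point of descriptive statistics.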
Inferential Statistics
Sample
For example: height, volume, time, exam score, hair colour, eye colour.
Types of variable
❑Qualitative or Categorical Variables
❑Quantitative Variables
• Discrete Variables
• Continuous Variables
Qualitative or Categorical Variables or Attributes
Example: The number of traffic citations a person received during the last
year, the number of customers arriving for service during a particular period.
Types of Quantitative Variables
Continuous variable
For example, it is practically impossible to calculate the average hourly rate of every worker
in the US. So, a sample is randomly selected such that it represents the larger population
appropriately. Then the average hourly rate of this sample is calculated. Using
statistical tests, you can draw conclusions about the average hourly rate of the larger
population. The level of measurement of a variable determines which type of statistical test to use.
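The idea of estimating a population average from a random sample can be sketched in Python as follows. The population here is simulated with made-up numbers (it is not real wage data), and the interval uses the usual normal approximation, which is an assumption of this sketch.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of hourly rates (simulated, not real data).
population = [random.gauss(25.0, 5.0) for _ in range(100_000)]

# Draw a simple random sample and compute its average.
sample = random.sample(population, 400)
sample_mean = mean(sample)

# Standard error of the sample mean and an approximate 95% interval
# (normal approximation, assumed reasonable for a sample of this size).
standard_error = stdev(sample) / len(sample) ** 0.5
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print("sample mean:", round(sample_mean, 2))
print("approx. 95% interval:", round(ci_low, 2), "to", round(ci_high, 2))
print("population mean (known only because it is simulated):",
      round(mean(population), 2))
```

In practice the population mean is unknown; it is printed here only to show how close the sample estimate tends to be.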
Levels of measurement or scales of measurement
•Political preferences
In both cases, the analysis of gathered data will happen using percentages or
mode, i.e., the most common answer received for the question.
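A minimal Python sketch of this kind of analysis, using percentages and the mode on categorical answers, is shown below. The responses and party names are hypothetical.

```python
from collections import Counter

# Hypothetical answers to a political preference question.
responses = [
    "Party A", "Party B", "Party A", "Independent",
    "Party A", "Party B", "Independent", "Party A",
]

counts = Counter(responses)
total = len(responses)

# Percentage of respondents giving each answer.
percentages = {answer: 100 * n / total for answer, n in counts.items()}

# Mode: the most common answer.
mode_answer = counts.most_common(1)[0][0]

for answer, pct in percentages.items():
    print(f"{answer}: {pct:.1f}%")
print("mode:", mode_answer)
```

Note that no average is computed: the categories have no numeric value or order, so only counts, percentages, and the mode make sense.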
Ordinal Scale
Ordinal Scale is defined as a variable measurement scale that depicts the
order of the values of a variable but not the size of the difference between
them.
1. Very Unsatisfied
2. Unsatisfied
3. Neutral
4. Satisfied
5. Very Satisfied
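The five-point scale above can be sketched in Python by mapping each label to its rank. The survey responses are hypothetical. Because only the order is meaningful, the sketch reports the median category rather than a mean.

```python
from statistics import median_low

# The ordered categories of the satisfaction scale.
levels = ["Very Unsatisfied", "Unsatisfied", "Neutral",
          "Satisfied", "Very Satisfied"]
rank = {label: position for position, label in enumerate(levels, start=1)}

# Hypothetical survey responses.
responses = ["Satisfied", "Neutral", "Very Satisfied", "Satisfied",
             "Unsatisfied", "Satisfied", "Neutral"]

# Ranks let us sort and compare responses, but the gap between
# rank 1 and rank 2 need not equal the gap between rank 4 and rank 5.
ranks = [rank[r] for r in responses]
median_rank = median_low(ranks)
median_label = levels[median_rank - 1]

print("median response:", median_label)
```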
Ordinal Data and Analysis
• Ordinal scale data can be presented in tabular or graphical formats.
• Methods such as the Mann-Whitney U test and the Kruskal–Wallis H test can also
be used to analyze ordinal data. These methods are generally used to
compare two or more ordinal groups.
• In the Mann-Whitney U test, researchers can assess whether values in one group
tend to be larger or smaller than values in another, independent group. In
the Kruskal–Wallis H test, researchers can assess whether two or more ordinal
groups have the same median or not.
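To make the Mann-Whitney idea concrete, the Python sketch below computes only the U statistic by direct pairwise comparison; it does not compute a p-value. The two groups of ratings are hypothetical, and the helper name `mann_whitney_u` is introduced here for illustration. Statistical libraries such as SciPy provide a full test.

```python
def mann_whitney_u(group_a, group_b):
    """Return the U statistic for group_a.

    Counts the pairs (a, b) in which a is larger than b,
    with each tie counted as half a pair.
    """
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical satisfaction ratings (1 to 5) from two independent groups.
group_a = [4, 5, 3, 4, 5]
group_b = [2, 3, 1, 3, 2]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two U values always sum to the number of pairs.
print("U for group A:", u_a)
print("U for group B:", u_b)
print("number of pairs:", len(group_a) * len(group_b))
```

A U value close to the total number of pairs means that values in that group are larger in almost every pairwise comparison.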
Interval Scale (Distance is meaningful)
Interval Scale is defined as a numerical scale where the order of the variables is
known as well as the difference between these variables.
For example: 80 degrees is always higher than 50 degrees and the difference between
these two temperatures is the same as the difference between 70 degrees and 40
degrees.
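The temperature example can be checked with a short Python sketch. It assumes the degrees are Fahrenheit (the text does not say) and converts to Celsius to show that differences stay comparable on an interval scale while ratios do not.

```python
import math


def f_to_c(fahrenheit):
    """Convert Fahrenheit to Celsius."""
    return (fahrenheit - 32) * 5 / 9


# Differences are meaningful: both pairs are 30 degrees apart.
same_gap_f = (80 - 50) == (70 - 40)

# The two gaps are still equal to each other after converting units.
gap_1_c = f_to_c(80) - f_to_c(50)
gap_2_c = f_to_c(70) - f_to_c(40)
same_gap_c = math.isclose(gap_1_c, gap_2_c)

# Ratios are not meaningful: the "ratio" depends on the unit chosen,
# because zero on either scale is arbitrary.
ratio_f = 80 / 50
ratio_c = f_to_c(80) / f_to_c(50)

print("equal gaps in Fahrenheit:", same_gap_f)
print("equal gaps in Celsius:", same_gap_c)
print("ratio in Fahrenheit:", round(ratio_f, 2))
print("ratio in Celsius:", round(ratio_c, 2))
```

Because the ratio changes with the unit, a statement such as "80 degrees is 1.6 times as hot as 50 degrees" has no meaning on an interval scale.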
• IQ score
• Department
• Postal Code
Data
A data set is a collection of observations or information on one or
more variables.
Types of Data
•Qualitative Data
•Quantitative Data
✔Discrete data
✔Continuous data
Quantitative and Qualitative data
1st year GPA of 20 students
3.23 3.33 3.45 3.67 3.89 3.92 3.98 3.45 3.23 3.45
3.56 3.67 3.13 3.13 2.59 2.58 3.09 3.01 3.43 3.23
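As a small illustration, the 20 GPA values above can be summarized with Python's standard `statistics` module. The choice of summary measures here is ours, not part of the lecture.

```python
from statistics import mean, median, multimode

# First-year GPA of 20 students, copied from the data above.
gpas = [
    3.23, 3.33, 3.45, 3.67, 3.89, 3.92, 3.98, 3.45, 3.23, 3.45,
    3.56, 3.67, 3.13, 3.13, 2.59, 2.58, 3.09, 3.01, 3.43, 3.23,
]

gpa_mean = mean(gpas)
gpa_median = median(gpas)
gpa_modes = multimode(gpas)  # every value tied for most frequent

print("n:", len(gpas))
print("mean:", round(gpa_mean, 3))
print("median:", round(gpa_median, 2))
print("modes:", gpa_modes)
print("range:", min(gpas), "to", max(gpas))
```

GPA is quantitative data, so averages and ranges are meaningful here.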
Status of student
Time Series Data
• Time series data, also referred to as time-stamped data, is a sequence of data
points indexed in time order. Time-stamped data is data collected at different
points in time.
• These data points typically consist of successive measurements made from the
same source over a time interval and are used to track change over time.
• Time series data is a collection of observations obtained through repeated
measurements over time.
Examples of time series data
• Monthly production of rice
• Rainfall measurements
• Stock prices
• Number of sunspots
• Annual retail sales
• Monthly subscribers
• Heartbeats per minute
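A time series can be sketched in Python as a list of (date, value) pairs kept in time order. The monthly rice production figures below are hypothetical, in arbitrary units.

```python
from datetime import date

# Hypothetical monthly rice production, deliberately listed out of order.
observations = [
    (date(2023, 3, 1), 118.0),
    (date(2023, 1, 1), 120.0),
    (date(2023, 4, 1), 130.0),
    (date(2023, 2, 1), 125.0),
]

# A time series is indexed in time order, so sort by date first.
series = sorted(observations)

# Track change over time: difference between successive measurements.
changes = [
    later_value - earlier_value
    for (_, earlier_value), (_, later_value) in zip(series, series[1:])
]

for (when, value) in series:
    print(when.isoformat(), value)
print("month-to-month changes:", changes)
```

The order of the observations carries information here: shuffling the rows would change the computed changes.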
Cross-sectional Data
Cross-sectional data is data collected by observing various subjects (such as firms,
countries, regions, or individuals) at the same point in time.
In other words, cross-sectional data is collected from all the participants at the same
time.
For example, suppose we want to measure current blood pressure levels in a population. The
underlying population should have members with similar characteristics. Twenty people are
selected randomly from that population and their blood pressure is measured. Their height,
weight, and other health factors are also noted.
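The blood pressure example can be sketched in Python as follows. The population is simulated with made-up values, and the field names (`systolic`, `height_cm`, `weight_kg`) are assumptions chosen for illustration.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: one record per person, all measured
# at the same point in time (simulated values, not real data).
population = [
    {
        "id": person_id,
        "systolic": random.gauss(120, 12),
        "height_cm": random.gauss(165, 8),
        "weight_kg": random.gauss(65, 10),
    }
    for person_id in range(1000)
]

# Randomly select 20 people, without replacement.
sample = random.sample(population, 20)

# One row per subject, several variables per row, a single time point.
average_systolic = mean(person["systolic"] for person in sample)
print("people sampled:", len(sample))
print("average systolic pressure:", round(average_systolic, 1))
```

Unlike a time series, each subject appears exactly once, so the data describes differences between subjects rather than change over time.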
Data Collection and Sampling Techniques
•Random Sampling
•Systematic Sampling
•Stratified Sampling
•Cluster Sampling
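The four techniques listed above can be sketched in Python on a small hypothetical population of 100 numbered units. The strata and cluster boundaries are arbitrary choices made for this illustration.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: units numbered 1 to 100.
population = list(range(1, 101))

# Random sampling: every unit has the same chance of selection.
simple = random.sample(population, 10)

# Systematic sampling: pick a random start, then take every k-th unit.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: split the population into strata, then
# draw a random sample from every stratum (here 10% of each).
strata = {"A": population[:60], "B": population[60:]}
stratified = [
    unit
    for units in strata.values()
    for unit in random.sample(units, len(units) // 10)
]

# Cluster sampling: split the population into clusters, choose
# whole clusters at random, and keep every unit in the chosen ones.
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = random.sample(clusters, 2)
cluster_sample = [unit for cluster in chosen_clusters for unit in cluster]

print("random:", sorted(simple))
print("systematic:", systematic)
print("stratified:", sorted(stratified))
print("cluster:", cluster_sample)
```

The key contrast is between the last two: stratified sampling takes some units from every group, while cluster sampling takes all units from some groups.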