In-Class Exercise #1 Notes

Uploaded by CSstudent

Chapter 0: First Things First

DCOVA FRAMEWORK
To minimise errors, use a framework that organises the tasks you follow to apply statistics correctly:
- Define the data that you want to study to solve a problem.
- Collect the data from appropriate sources.
- Organise the collected data by developing tables.
- Visualise the collected data by developing charts.
- Analyse the collected data, reach conclusions and present results.
OPERATIONAL DEFINITIONS
Big Data are data collected in large volumes, at very fast velocities and in near real time.
Unstructured Data have little repeating internal structure and an irregular pattern, so they require pre-processing before analysis.
Variable defines a characteristic or property of an item or individual that can
vary among the occurrences of those items or individuals.
Descriptive Statistics are the methods that primarily help summarise and
present data.
Inferential Statistics are the methods that use data collected from a small
group to reach conclusions about a larger group.
Statistic refers to a value that summarises the data of a particular variable.
Logical Causality means that you can plausibly claim something directly causes
something else.
Data are numerical or textual facts and figures that are collected through some
type of measurement process.
Information is the result of analysing data, i.e. extracting meaning from data to
support evaluation and decision making.
Chapter 1: Defining and Collecting Data
CLASSIFYING VARIABLES BY TYPE
Numerical – variables whose data represent a counted or measured quantity.
Categorical – variables whose data represent categories, e.g. ‘yes’ or ‘no’.
Discrete – numerical variables whose data arise from a counting process.
Continuous – numerical variables whose data arise from a measuring process.
MEASUREMENT SCALES
Measurement Scale defines the ordering of values and determines whether differences among pairs of values for a variable are equivalent and whether one value can be expressed in terms of another.
Interval Scale is an ordered scale in which the difference between measurements is meaningful but which does not include a true zero point.
Ratio Scale is an ordered scale in which the difference between measurements is meaningful and which includes a true zero point.
Nominal Scale classifies data into distinct categories in which no order or ranking is implied.
Ordinal Scale classifies data into distinct categories in which ranking is implied.
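The following minimal Python sketch makes the scales concrete. It uses temperature as the example variable, which is an assumption and not taken from the notes. Celsius is an interval scale because 0 °C does not mean "no temperature". Kelvin has a true zero, so it is a ratio scale. The satisfaction levels used for the ordinal case are hypothetical.

```python
# Interval vs ratio scale, using temperature as an assumed example variable.
def celsius_to_kelvin(celsius: float) -> float:
    """Convert an interval-scale reading (Celsius) to a ratio-scale one (Kelvin)."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on both scales: the readings are 10 degrees apart.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are only meaningful on a ratio scale. Dividing Celsius readings
# wrongly suggests that 20 °C is "twice as hot" as 10 °C.
naive_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Nominal vs ordinal: ordinal categories carry a ranking, nominal ones do not.
# Hypothetical satisfaction levels, listed from lowest to highest.
SATISFACTION_LEVELS = ["low", "medium", "high"]

def satisfaction_rank(level: str) -> int:
    """Position of an ordinal category in its ranking (0 = lowest)."""
    return SATISFACTION_LEVELS.index(level)

print(f"difference: {diff_c:.1f} C = {diff_k:.1f} K")
print(f"naive Celsius ratio: {naive_ratio:.3f}, Kelvin ratio: {true_ratio:.3f}")
print(satisfaction_rank("high") > satisfaction_rank("low"))
```

A nominal variable such as eye colour has no such ranking. Comparing the positions of its categories in a list would therefore be meaningless.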
POPULATIONS AND SAMPLES
A population contains all the items or individuals of interest that one seeks to
study.
A sample contains only a portion of a population of interest.
A population parameter summarises the value of a specific variable for the entire population.
A sample statistic summarises the value of a specific variable for sample data.
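The difference between a parameter and a statistic can be sketched with Python's standard library. The population is the ages of a hypothetical 1,000-member club, randomly generated for illustration. The mean serves as the summary measure, and the sample is a simple random sample of 50.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 members of a club.
population_rng = random.Random(42)  # fixed seed for a reproducible example
population = [population_rng.randint(18, 80) for _ in range(1000)]

# Population parameter: the mean age computed over the entire population.
mu = statistics.mean(population)

# Sample statistic: the same summary computed on a portion of the population.
sample_rng = random.Random(1)
sample = sample_rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean): {mu:.2f}")
print(f"statistic (sample mean):     {x_bar:.2f}")
```

Changing the seed of `sample_rng` changes the statistic, but the parameter stays the same. The statistic varies from sample to sample, while the parameter is fixed.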
SOURCES OF DATA
Primary Data Source is when the data collector is also the person using the data for analysis.
Secondary Data Source is when the person performing the data analysis is not the data collector.
A treatment is a change whose effect on a variable of interest is being studied by the researchers collecting the data.
TYPES OF SAMPLING METHODS
The frame is a complete or partial listing of the items that make up the
population from which the sample will be selected.
Non-probability sample is where items or individuals are selected without
knowing their probabilities of selection.
Probability sample is where items or individuals are selected based on known
probabilities.
Convenience sample is where items are selected because they are easy, inexpensive or convenient to sample.
Judgement sample is where the opinions of pre-selected experts in the subject matter are obtained.
SIMPLE RANDOM SAMPLE
- Every individual or item in the frame has an equal chance of being selected.
- Selection may be with or without replacement.
- Samples are obtained from a table of random numbers or from computer random number generators.
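Python's standard `random` module can serve as the computer random number generator. The sketch below is minimal. The function name `simple_random_sample` and the numbered frame are illustrative assumptions.

```python
import random

def simple_random_sample(frame, n, replace=False, seed=None):
    """Select n items from the frame, each item having an equal chance of selection."""
    rng = random.Random(seed)  # a computer random number generator
    if replace:
        # With replacement: an item can appear in the sample more than once.
        return rng.choices(frame, k=n)
    # Without replacement: each item can appear at most once.
    return rng.sample(frame, n)

frame = list(range(1, 101))  # hypothetical frame of 100 numbered items
print(simple_random_sample(frame, 5, seed=1))
print(simple_random_sample(frame, 5, replace=True, seed=1))
```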

SYSTEMATIC SAMPLE
- Partition the N items in the frame into n groups of k items, where k = N / n.
- Round k to the nearest integer. To select a systematic sample, choose the first item at random from the first k items in the frame.
- Then select the remaining n - 1 items by taking every kth item thereafter from the entire frame.
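The steps above translate almost directly into code. This minimal Python sketch assumes the frame is an ordered list. The function name and the way it handles a rounded-up k are illustrative choices, not part of the notes.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample of n items from an ordered frame."""
    N = len(frame)
    # k = N / n rounded to the nearest integer (at least 1). Python's round()
    # sends exact halves to the nearest even integer.
    k = max(1, round(N / n))
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start among the first k items (0-based)
    # The start item plus the next n - 1 items, each k positions apart.
    indices = [start + i * k for i in range(n)]
    # When k was rounded up, the last indices can fall past the end of the
    # frame; this sketch simply drops them.
    return [frame[i] for i in indices if i < N]

frame = list(range(1, 101))  # hypothetical frame of N = 100 items
print(systematic_sample(frame, 10, seed=3))  # n = 10, so k = 10
```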

STRATIFIED SAMPLE
- Divide the population into two or more subgroups (known as strata) according to some common characteristic.
- Select a simple random sample from each subgroup, with sample sizes proportional to the strata sizes.
- Combine the samples from the subgroups into one.
- Application: Population of voters
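Here is a minimal Python sketch of proportional stratified sampling. The voter records and their urban/rural strata are hypothetical. `stratum_of` is any function that returns the common characteristic used to form the strata. Each stratum's sample size is rounded, so in general the combined sample can differ from n by a few items.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, n, seed=None):
    """Proportional stratified sample of approximately n items."""
    rng = random.Random(seed)
    # Divide the population into strata by a common characteristic.
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    N = len(items)
    combined = []
    for members in strata.values():
        # Sample size proportional to the stratum's share of the population.
        n_h = round(n * len(members) / N)
        # Simple random sample within the stratum, added to the combined sample.
        combined.extend(rng.sample(members, n_h))
    return combined

# Hypothetical voters tagged with a region: 600 urban and 400 rural.
voters = [("urban", i) for i in range(600)] + [("rural", i) for i in range(400)]
sample = stratified_sample(voters, stratum_of=lambda v: v[0], n=50, seed=1)
print(sum(v[0] == "urban" for v in sample), sum(v[0] == "rural" for v in sample))  # 30 20
```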

CLUSTER SAMPLE
- The population is divided into several "clusters", each representative of the population.
- A simple random sample of clusters is selected.
- All items in the selected clusters can be used, or items can be chosen from a cluster using alternative techniques.
- Application: Election exit polls
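Here is a minimal Python sketch of one-stage cluster sampling, in which every item in each selected cluster is kept. The polling stations and voter IDs are hypothetical and loosely follow the exit-poll application. The clusters are passed as a dictionary that maps each cluster label to its members.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Select m clusters by simple random sampling and keep all of their items."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # SRS of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical clusters: 10 polling stations with 20 voters each.
stations = {f"station_{i}": [f"s{i}_voter{j}" for j in range(20)] for i in range(10)}
picked = cluster_sample(stations, 3, seed=5)
respondents = [voter for members in picked.values() for voter in members]
print(sorted(picked), len(respondents))
```

A two-stage design, which picks only some items within each chosen cluster, would replace the final dictionary comprehension with a random sample drawn from each cluster's list.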
DATA CLEANING
- Data Cleaning corrects defects in inconsistent data and ensures the data are of suitable quality for analysis.
- Invalid Variable Values can be identified as incorrect by simple scanning techniques, provided that operational definitions exist for the variables the data represent.
- Coding Errors can result from poor recording or entry of data values, or from computerised operations such as copy-and-paste or data import.
- Data Integration Errors arise when data from two different computerised sources, such as two different data repositories, are combined into one data set for analysis.
- Missing Values are values that were not collected for a variable.
- Outliers are values that seem excessively different from most of the other values.
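Once operational definitions exist, several of these checks can be automated. This minimal Python sketch makes three assumptions. Age is operationally defined as an integer from 18 to 99 (a hypothetical definition). `None` marks a missing value. Outliers are flagged with the 1.5 × IQR rule, which is one common convention; the notes do not prescribe a rule.

```python
import statistics

# Hypothetical survey responses for "age", operationally defined here as an
# integer from 18 to 99. None marks a value that was not collected.
ages = [34, 29, None, 41, 38, 250, 36, -5, 33, None, 40, 95]

# Missing values: positions where nothing was collected.
missing = [i for i, a in enumerate(ages) if a is None]

# Invalid variable values: outside the range the operational definition allows.
invalid = [i for i, a in enumerate(ages) if a is not None and not 18 <= a <= 99]

# Outliers among the valid values, flagged with the 1.5 x IQR rule.
valid = [a for a in ages if a is not None and 18 <= a <= 99]
q1, _, q3 = statistics.quantiles(valid, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in valid if a < lower or a > upper]

print("missing at positions:", missing)   # [2, 9]
print("invalid at positions:", invalid)   # [5, 7]
print("outliers:", outliers)              # [95]
```

An age of 95 is valid under the definition. Flagging it as an outlier only means it deserves a second look, not that it is wrong.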

DATA PRE-PROCESSING TASKS

- Data Formatting includes rearranging the structure of the data, changing the electronic encoding of the data, or both.
- Stacking and Unstacking Data – when collecting data for a numerical variable, you may need to subdivide the data into two or more groups for analysis.
  - Unstacked Arrangement – create a separate numerical variable for each group.
  - Stacked Arrangement – pair the single numerical variable with a second, categorical variable whose categories identify the group to which each value belongs.
- Recoding Variables – after data have been collected, the categories defined for a categorical variable may need to be reconsidered, or a numerical variable may need to be transformed into a categorical variable by assigning individual numeric values to one of several groups.
  - A Recoded Variable – one that supplements or replaces the original variable in your analysis.
  - Mutually Exclusive – one and only one of the new categories can be assigned to any particular value being recoded.
  - Collectively Exhaustive – every value can be recoded into one of the new categories.
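Both pre-processing tasks can be sketched in plain Python. The exam-score data, the section labels and the grade bins are all hypothetical. The sketch converts between the unstacked and stacked arrangements. It then recodes the numerical score into a categorical grade and rejects any value that does not fall into exactly one category.

```python
# Hypothetical exam scores (a numerical variable) for two course sections.
unstacked = {
    "section_A": [72, 85, 90],
    "section_B": [64, 78],
}

def stack(groups):
    """Unstacked -> stacked: pair each value with a categorical group label."""
    return [(label, value) for label, values in groups.items() for value in values]

def unstack(pairs):
    """Stacked -> unstacked: a separate list of values for each group."""
    groups = {}
    for label, value in pairs:
        groups.setdefault(label, []).append(value)
    return groups

stacked = stack(unstacked)

# Recoding the numerical score into a categorical grade. These bins are
# hypothetical; they are mutually exclusive (half-open intervals never overlap)
# and collectively exhaustive for scores from 0 to 100.
GRADE_BINS = [(0, 50, "fail"), (50, 70, "pass"), (70, 101, "merit")]

def recode(score):
    matches = [grade for low, high, grade in GRADE_BINS if low <= score < high]
    if len(matches) != 1:
        raise ValueError(f"score {score} is not covered by exactly one category")
    return matches[0]

# The recoded variable supplements the original one here rather than replacing it.
recoded = [(label, score, recode(score)) for label, score in stacked]
print(stacked)
print(recoded)
```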
TYPES OF SURVEY ERRORS

- Coverage Error occurs if:

1. Certain groups of items are excluded from the frame so that they have no chance of being selected in the sample, or
2. Items from outside the frame are included.
Coverage error may result in selection bias.

- Nonresponse Error arises from failure to collect data on all items in the sample and results in nonresponse bias.

- Sampling Error reflects the variation, or “chance differences”, from sample to sample, based on the probability of particular individuals or items being selected in particular samples. The margin of error reported with survey results is a measure of this sampling error.

- Measurement Error can arise from surveys relying on self-reported information, from the mode of data collection, or from the respondent to the survey.
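Sampling error becomes visible when several samples of the same size are drawn from one population. This minimal Python sketch assumes a hypothetical population of 10,000 voters, 40% of whom support a proposal. The margin-of-error function uses the standard large-sample approximation for a proportion at 95% confidence (z = 1.96), which the notes do not state explicitly.

```python
import math
import random

# Hypothetical population: 10,000 voters, 1 = supports the proposal, 0 = does not.
population = [1] * 4000 + [0] * 6000
rng = random.Random(7)  # fixed seed for a reproducible example

def sample_proportion(n):
    """Share of supporters in one simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

# Sampling error: same population, same n, different answers each time.
proportions = [sample_proportion(400) for _ in range(5)]
print([round(p, 3) for p in proportions])

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

print(round(margin_of_error(0.40, 400), 3))   # 0.048
print(round(margin_of_error(0.40, 1600), 3))  # 0.024
```

Quadrupling the sample size halves the margin of error. This is why findings reported without the sample size and margin of error can mislead.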

ETHICAL ISSUES
Coverage error can result in selection bias and becomes an ethical issue if
particular groups or individuals are purposely excluded from the frame so that
the survey results are more favourable to the survey’s sponsor.

Nonresponse error can lead to nonresponse bias and becomes an ethical issue
if the sponsor knowingly designs the survey so that particular groups or
individuals are less likely than others to respond.

Sampling error becomes an ethical issue if the findings are purposely presented without reference to sample size and margin of error so that the sponsor can promote a viewpoint that might otherwise be inappropriate.

Measurement error can become an ethical issue in one of three ways:

(1) a survey sponsor chooses leading questions that guide the respondent in a
particular direction.
(2) an interviewer, through mannerisms and tone, purposely makes a
respondent obligated to please the interviewer or otherwise guides the
respondent in a particular direction.
(3) a respondent willfully provides false information.
