
"The Way of those on whom You have bestowed Your Grace, not (the way) of those who earned Your Anger, nor of those who went astray."
(The Qur'an, Surah Al-Fatihah)
Subject
MULTIVARIATE DATA
ANALYSIS

Presented To: Dr. Muhammad Shoaib

Presented By: Abdullah Afzal

Roll Number: L-21305, MBA (1.5)


TOPIC

What is data screening, and what techniques are available to screen data?
What is Data Screening
• Data screening is essential to make sure you have dealt with assumption violations, outliers, and errors in your data. Each type of analysis calls for its own kind of data screening.
• It is very easy to make mistakes when entering data, and some errors can mess up your analysis.
• So it is worth spending time checking for mistakes up front, rather than trying to repair the damage later. If possible, have another person check your data as well.
TECHNIQUES OF SCREENING DATA:

IN THIS ORDER:
• Accuracy
• Missing data
• Outliers

Assumptions:
• Additivity
• Normality
• Linearity
• Homogeneity / Homoscedasticity
Accuracy:
• Look for problems with the database.
• In general, you are looking for values that fall outside the possible range; check the minimum and maximum of each variable to see whether they match your expectations.
• Fix the error, or delete just that data point. Do not delete the entire case, just the erroneous value.
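For example, a quick accuracy check in R is to look at the summary of each variable; the data frame and variable names below (dataset, item1) are hypothetical, and the 1-7 range is just an assumed scale:

# Minimum and maximum of every variable; compare against expectations
summary(dataset)

# Flag impossible values, e.g. on an assumed 1-7 Likert item
which(dataset$item1 < 1 | dataset$item1 > 7)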
Missing Data:
• Values in a data set are missing completely at random (MCAR) if the events that lead to any particular data item being missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random. When data are MCAR, the analysis performed on the data is unbiased; however, data are rarely MCAR. In the case of MCAR, the missingness of data is unrelated to any study variable; thus, the participants with completely observed data are in effect a random sample of all the participants assigned a particular intervention. With MCAR, the random assignment of treatments is assumed to be preserved, but that is usually an unrealistically strong assumption in practice.
Missing at random (MAR)
• Occurs when the missingness is not random, but can be fully accounted for by variables on which there is complete information. Since MAR is an assumption that is impossible to verify statistically, we must rely on its substantive reasonableness.
Missing not at random (MNAR)
• Data that are neither MAR nor MCAR (i.e., the value of the missing variable is related to the reason it is missing). For example, this would occur if men failed to fill in a depression survey because of their level of depression.
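A minimal sketch for locating missing data in R, again assuming a hypothetical data frame called dataset:

# Number of missing values in each variable
colSums(is.na(dataset))

# Percentage of missing values per variable
round(colMeans(is.na(dataset)) * 100, 1)

# Cases (rows) with any missing values
which(rowSums(is.na(dataset)) > 0)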
Univariate
• Univariate is a term commonly used in statistics to describe data that consist of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in an industry. Like other data, univariate data can be visualized using graphs, images, or other analysis tools after the data are measured, collected, reported, and analyzed.
Multivariate
• Multivariate outliers refer to records that do not fit the standard pattern of correlations exhibited by the other records in the dataset, with regard to your causal model. So, if all but one person in the dataset reports that dieting has a positive effect on weight loss, but this one person reports that he gains weight when he diets, then his record would be considered a multivariate outlier.
OUTLIERS
• Outliers are cases with extreme values on one variable or on multiple variables. – Univariate outliers: a case is an outlier on one variable. – Multivariate outliers: a case is an outlier across multiple variables; its overall pattern of data is unusual.
OUTLIERS
• You can check for univariate outliers, but when you have a big dataset, that is not usually necessary. – We will cover how to check for univariate outliers when needed, since it really only applies to analyses with a single DV (such as ANOVA).
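When a univariate check is needed, one common rule flags standardized scores beyond |z| = 3. A sketch in R, with dataset$item1 as a hypothetical variable:

# Standardize the variable, then flag cases with |z| > 3
z <- scale(dataset$item1)
which(abs(z) > 3)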
OUTLIERS
• Multivariate outliers: check with Mahalanobis distance. – Mahalanobis distance is the distance of a case from the centroid of the rest of the cases. The centroid is created by plotting the means of all the variables (like an average of averages) and then seeing how far each case's scores are from that middle point.
OUTLIERS
• How to check: – Create Mahalanobis scores. – Use the chi-square function to find the cut-off score (anything past this score is an outlier). – df = the number of variables you are testing (important: the variables you are testing, not all the variables!). – Use the p < .001 value.
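A sketch of these steps in R; the column names are hypothetical, and df is set to the number of variables being tested:

# Mahalanobis distance of each case from the centroid
vars  <- dataset[ , c("item1", "item2", "item3")]   # variables being tested
mahal <- mahalanobis(vars,
                     center = colMeans(vars, na.rm = TRUE),
                     cov    = cov(vars, use = "pairwise.complete.obs"))

# Chi-square cut-off at p < .001, df = number of variables tested
cutoff <- qchisq(1 - .001, df = ncol(vars))
summary(mahal > cutoff)   # TRUE = multivariate outlier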
OUTLIERS
• Regression-based outlier analyses: – Leverage: a score that is far out on the line but does not influence the regression slope (measured by leverage values). – Discrepancy: a score that is far away from everyone else in a way that affects the slope. – Influence: the product of leverage and discrepancy (measured by Cook's values).
OUTLIERS
• What do I do with them when I find them? – Ask yourself: Did they do the study correctly? Are they part of the population you wanted? – Then either eliminate them or leave them in.
Assumptions Checks
•Additivity
•Normality
•Linearity
•Homogeneity
•Homoscedasticity
Additivity
• Additivity is the assumption that each variable adds something to the analysis. Often, this assumption is thought of as the absence of multicollinearity, which is when variables are too highly correlated. – What is too high? General rule: r > .90.
Additivity
• So, we do not want to use two variables in an analysis when they are essentially the same: it lowers power, the analysis may not run, and it wastes time. If you get a "singular matrix" error, you have used two variables in an analysis that are too similar, or the variance of a variable is essentially zero.
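To check, inspect the correlation matrix for |r| > .90; a short R sketch with hypothetical column names:

# Correlation matrix of the variables going into the analysis
corrs <- cor(dataset[ , c("item1", "item2", "item3")],
             use = "pairwise.complete.obs")
round(corrs, 2)

# Flag pairs above the r > .90 rule of thumb (excluding the diagonal)
which(abs(corrs) > .90 & corrs < 1, arr.ind = TRUE)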
Assumption Checks
• A quick note: many statistical tests have their own diagnostic plots and assumption checks built in. This guide lets you apply data screening to any analysis, so that you learn one set of rules rather than a separate set for each analysis. (But there are still checks that apply only to ANOVA, which you would add when you run an ANOVA.)
Assumption Checks
• For ANOVA, t-tests, and correlation, you will use a "fake" regression analysis. It is considered fake because it is not the real analysis, just a way to get the information you need for data screening. For regression-based tests, you can run the real regression analysis to get the same information. The rules are altered slightly, so make sure you note in the regression section what is different.
Assumption Checks
• The random variable and why chi-square: – For many of these assumptions, the errors should be chi-square distributed (i.e., lots of small errors and only a few big ones). – However, the standardized errors should be normally distributed around zero. (Don't confuse these two things: we want the actual error values to be chi-square distributed and the standardized ones to be normal.)
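A minimal sketch of the fake regression in R, assuming a hypothetical data frame dataset containing only the variables being screened; df = 7 for the random variable is an arbitrary but typical choice:

# "Fake" regression: predict a chi-square random variable from your data,
# purely to obtain residuals and fitted values for screening
random <- rchisq(nrow(dataset), df = 7)
fake   <- lm(random ~ ., data = dataset)

standardized <- rstudent(fake)            # standardized residuals
fitted_vals  <- scale(fake$fitted.values) # standardized fitted values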
Normality
•The assumption of normality is that the
sampling distribution is normal. – Not the
sample. – Not the population. – Distribution
bunnies to the rescue!
Normality
• Multivariate normality: each variable, and all linear combinations of the variables, are normally distributed. – Given the Central Limit Theorem, with N > 30 tests are robust (meaning you can violate this assumption and still get reasonably accurate statistics).
Normality
• So, the way to check for normality in practice is to use the sample. – If N > 30, we tend not to worry too much. – If N < 30 and it looks bad, you may consider changing analyses.
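Assuming the standardized residuals from the fake regression sketched above, a quick visual check in R:

# Normality: the standardized residuals should look roughly normal,
# centred on zero
hist(standardized, breaks = 15)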
Linearity
• The assumption that there is a straight-line relationship between two variables (or the combination of all the variables).
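One common visual check is a normal probability plot of the same standardized residuals; points falling close to the reference line suggest linearity is reasonable:

# Linearity: normal probability (Q-Q) plot of standardized residuals
qqnorm(standardized)
abline(0, 1)   # reference line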
HomoG + S
•Homogeneity: equal variances – the
variables (or groups) have roughly equal
variances. Homoscedasticity: spread of the
variance of a variable is the same across all
values of the other variables.
HomoG + S
• How to check them: – Both of these can be checked by looking at the residual scatterplot. – Fitted values = the predicted score for each person in your regression. – Residuals = the difference between the predicted score and a person's actual score in the regression (y − ŷ).
HomoG + S
• How to check them: – We plot them against each other. In theory, the residuals should be randomly distributed (hence why we created a random variable to test with). – Therefore, they should look like a bunch of random dots.
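Continuing the sketch from the fake regression, the residual scatterplot in R:

# Residual scatterplot: standardized fitted values vs. standardized
# residuals; an even, random cloud of dots suggests homogeneity and
# homoscedasticity hold
plot(fitted_vals, standardized)
abline(h = 0)   # reference lines through zero
abline(v = 0)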
R-Tips
• Multiple datasets will be created along the way, so remember to use the right dataset at each step. – Sometimes you won't have a problem at a given step; in that case, just SKIP that step, making sure to use the right dataset afterwards.
