SAS Programming For Data Mining: AUC Calculation Using Wilcoxon Rank Sum Test
4/28/15, 7:02 PM
The relationship between AUC and the Wilcoxon Rank Sum test statistic is: AUC = (W-W0)/(N1*N0)+0.5, where N1 and
N0 are the frequencies of class 1 and class 0, W is the Wilcoxon Rank Sum of the scores in class 1, and W0 is the
Expected Sum of Ranks under H0 (the scores are randomly ordered with respect to class).
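One way to see why this holds is to go through the Mann-Whitney U statistic. The derivation below is a sketch that assumes W is the rank sum of the class 1 scores and that tied scores receive average ranks; the symbol U does not appear in the text above and is introduced only for this derivation.

```latex
\begin{align*}
W_0 &= \frac{N_1 (N_1 + N_0 + 1)}{2}, \\
U &= W - \frac{N_1 (N_1 + 1)}{2}, \\
W - W_0 &= U - \frac{N_1 N_0}{2}, \\
\mathrm{AUC} &= \frac{U}{N_1 N_0} = \frac{W - W_0}{N_1 N_0} + 0.5 .
\end{align*}
```

Here U counts the (class 1, class 0) pairs in which the class 1 observation has the higher score, with ties counted as one half, so U/(N1*N0) is the fraction of concordant pairs.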
In one application example shown below, PROC LOGISTIC reports c=0.911960, while this method calculates
AUC=0.9119491555.
http://www.sas-programming.com/2009/10/auc-calculation-using-wilcoxon-rank-sum.html
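/* Print the c-statistic from the PROC LOGISTIC association table */
data _null_;
   set Asso;
   if Label2='c' then put 'c-stat=' nValue2;
run;
/* Compute AUC from the scored data: data set, target, score variable */
%AUC( predicted, y, p_0);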
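The %AUC macro called above is not defined in the text. A minimal sketch of such a macro might look like the following, under the assumption that it wraps PROC NPAR1WAY and reads the WilcoxonScores ODS table. The macro name %auc_sketch and the variables diff and nprod are hypothetical; the data set names WilcoxonScore and AUC follow the log shown later in this post.

```sas
/* Sketch only: an assumed reconstruction, not the original macro */
%macro auc_sketch(dsn, target, score);
   /* Capture the Wilcoxon scores table instead of printing it */
   ods select none;
   ods output WilcoxonScores=WilcoxonScore;
   proc npar1way wilcoxon data=&dsn;
      where &target is not missing;
      class &target;
      var &score;
   run;
   ods select all;

   /* WilcoxonScore has one row per class: N, SumOfScores, ExpectedSum */
   data AUC;
      set WilcoxonScore end=eof;
      retain diff 0 nprod 1;
      if _n_ = 1 then diff = abs(SumOfScores - ExpectedSum);  /* |W - W0| */
      nprod = nprod * N;                                      /* N1*N0    */
      if eof then do;
         AUC  = diff / nprod + 0.5;
         Gini = 2 * (AUC - 0.5);
         put AUC= Gini=;
         keep AUC Gini;
         output;
      end;
   run;
%mend auc_sketch;
```

Because it takes the absolute value of W - W0, this sketch always returns AUC >= 0.5, regardless of which class has the higher scores.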
For this example, NPAR1WAY gets AUC = 0.91766634744, while LOGISTIC reports c-statistic = 0.917659.
So, which one is more accurate? I would say NPAR1WAY. The reason is that we can use yet another procedure,
PROC FREQ, to verify the Gini value, which is 2*(AUC-0.5). The Gini index is called Somers' D in PROC FREQ. Here,
the Gini value calculated from NPAR1WAY is 0.83533269487, the same as the Somers' D R|C (since the column
variable is the predictor) reported by PROC FREQ.
Then why not just use PROC FREQ, since the coding is so simple? Well, the answer is really about SPEED!
Check the log below for a data set with only 100,000 observations: 37.63 sec for PROC FREQ vs. 0.15 sec for the
NPAR1WAY approach, in real time:
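data one;
   /* STREAMINIT seeds the RAND function only; it has no effect on */
   /* RANUNI(0)/RANNOR(0), so RAND is used to make the data reproducible */
   call streaminit(98676876);
   do id=1 to 1e5;
      score=rand('uniform')*1000;
      if score+rand('normal')>0 then y=1;
      else y=0;
      output;
   end;
   drop id;
run;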
NOTE: The data set WORK.ONE has 100000 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds
NOTE: There were 100000 observations read from the data set WORK.ONE.
NOTE: The data set WORK._FREQ_OUT has 1 observations and 27 variables.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           37.63 seconds
      cpu time            37.56 seconds
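The PROC FREQ step that produced this log is not shown. A minimal sketch of a step that would create a data set named _freq_out containing _smdrc_ is given below. The TABLES layout (y as the row variable, score as the column variable) is an assumption, chosen so that Somers' D R|C treats y as the dependent variable.

```sas
/* Sketch only: the TABLES layout is an assumption, not the original code */
proc freq data=one noprint;
   tables y*score / measures;        /* row = y, column = score */
   output out=_freq_out measures;    /* writes _SMDRC_ among other measures */
run;
```

With one distinct score value per observation, the crosstabulation has as many columns as there are observations, which is consistent with the long run time reported in the log.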
data _null_;
   set _freq_out;
   AUC=_smdrc_/2 + 0.5;   /* AUC = Gini/2 + 0.5 */
   put "AUC = " AUC "   SOMER'S D R|C = " _smdrc_;
run;
AUC = 0.9995285252    SOMER'S D R|C = 0.9990570504
NOTE: There were 1 observations read from the data set WORK._FREQ_OUT.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
%AUC(one, y, score);
NOTE: There were 100000 observations read from the data set WORK.ONE.
      WHERE y not = .;
NOTE: PROCEDURE NPAR1WAY used (Total process time):
      real time           0.10 seconds
      cpu time            0.09 seconds
AUC=0.9995285252 Gini=0.9990570504
NOTE: There were 2 observations read from the data set WORK.WILCOXONSCORE.
NOTE: The data set WORK.AUC has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
6 comments:
Charlie Shipp Family said...
Looks great .!.
Charlie Shipp
I'd like to know where you found this relationship between AUC and the Wilcoxon Rank Sum Test. I'm trying
to study more about it and it would really help!
Thanks
If you are interested in discussing this issue, then contact me via LinkedIn.
Jon Dickens
Copyright (c). Liang Xie. Awesome Inc. template. Powered by Blogger.