The code in this repo produces all of the plots and tables in the paper "The Power of Batching in Multiple Hypothesis Testing", which will appear in AISTATS 2020.
This code was developed using python 3.7.3. The code would likely run with anything as new as or newer than python 3.6, but this is untested. For the remainder of the README, `pip` refers to your pip installation for python3, and `python3` refers to your installation of python3.
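If you are unsure which interpreter `python3` resolves to, a small check like the one below can help before you install anything. This is a minimal sketch and not part of the repo: the function name `check_python` and the status strings are made up for illustration, and the only facts taken from this README are the development version (3.7.3) and the untested lower bound (3.6).

```python
import sys

# Values taken from the README: developed on 3.7.3, likely fine on 3.6+ (untested).
MIN_VERSION = (3, 6)
DEV_VERSION = (3, 7, 3)


def check_python(version_info=None):
    """Return a short status string for the given (or the running) interpreter."""
    if version_info is None:
        version_info = sys.version_info
    version = tuple(version_info[:3])
    if version < MIN_VERSION:
        return "too old"
    if version[:2] == DEV_VERSION[:2]:
        return "same minor release as development"
    return "untested but likely fine"


if __name__ == "__main__":
    # Print the running interpreter's version next to its status.
    print(sys.version.split()[0], "->", check_python())
```

Running this file with the interpreter you plan to use prints its version and one of the three status strings.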
The required packages are in `requirements.txt` and can be installed through pip with `pip install -r requirements.txt`.
Before installing the packages, you may want to create a virtual environment using conda or another tool in order to avoid overwriting any of your current python packages.
The following steps will generate/save the plots and print the tables to standard out.
- Clone the repo.
- Download the `creditcard.csv` file from Kaggle and place the file in the top level directory of the repo.
- Run `python creditcard.py [--n-cpus CPUS]`, where `CPUS` is the number of cpus you would like to use. If `--n-cpus` is not present, the code will use all available cpus.
- Run `bash run.sh CPUS`, where `CPUS` is the number of cpus you would like to use. If `CPUS` is not specified, the code will use all available cpus.
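The "use all available cpus unless told otherwise" behavior described in the steps above can be sketched with `argparse` as follows. This is an illustration under assumptions, not the code in `creditcard.py`: the helper name `resolve_n_cpus` is hypothetical, and only the `--n-cpus` flag name and its default behavior come from this README.

```python
import argparse
import os


def resolve_n_cpus(argv=None):
    """Parse an optional --n-cpus flag and fall back to every available CPU.

    Hypothetical helper; the real scripts may handle this differently.
    """
    parser = argparse.ArgumentParser(description="Sketch of CPU-count handling.")
    parser.add_argument(
        "--n-cpus",
        type=int,
        default=None,
        help="number of cpus to use (default: all available)",
    )
    args = parser.parse_args(argv)

    # os.cpu_count() can return None on some platforms, so fall back to 1.
    available = os.cpu_count() or 1

    if args.n_cpus is None:
        return available
    if args.n_cpus < 1:
        parser.error("--n-cpus must be a positive integer")
    return args.n_cpus
```

For example, `resolve_n_cpus(["--n-cpus", "4"])` returns 4, while `resolve_n_cpus([])` returns the number of cpus reported by the operating system.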
The two tables that are printed to stdout are the two tables that appear in the paper for the credit card fraud experiments.
All of the plots that appear in the paper are saved to `out/imgs`. The plot file names can be interpreted as follows:
- `[mean3|mean0]_[bh|sbh|bbh|bsbh]_[pi1s|pi1_1|pi1_5].png`: `mean3` indicates the task where the alternative mean is 3, and `mean0` indicates the task where the alternative mean is random and centered at 0. `bh` indicates that the algorithm is BH, `sbh` indicates that the algorithm is Storey-BH, `bbh` indicates that the algorithms are Batch BH and LORD, and `bsbh` indicates that the algorithms are Batch St-BH and SAFFRON. `pi1s` indicates that the x-axis represents pi1, `pi1_1` indicates that pi1 = 0.1, and `pi1_5` indicates that pi1 = 0.5.
- `monotone_[mean3|mean0]_[bbh|bsbh].png`: These plots show the empirical percent of trials where an algorithm was monotone. `mean3` indicates the task where the alternative mean is 3, and `mean0` indicates the task where the alternative mean is random and centered at 0. `bbh` indicates that the algorithms are Batch BH and LORD, and `bsbh` indicates that the algorithms are Batch St-BH and SAFFRON.
- `rdiff[10|100|1000].png`: These plots show empirical values of R_t^+ - R_t for Batch BH.
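The naming scheme above can also be decoded programmatically, which may be handy when browsing `out/imgs`. The sketch below is not part of the repo: the function `describe_plot` and the wording of its output are illustrative assumptions, and the lookup tables simply restate the descriptions given in this README. The meaning of the numeric suffix in the `rdiff` files is not stated here, so the helper reports it without interpreting it.

```python
import re

# Lookup tables restating the README's description of each file name component.
TASKS = {
    "mean3": "alternative mean is 3",
    "mean0": "alternative mean is random and centered at 0",
}
ALGORITHMS = {
    "bh": "BH",
    "sbh": "Storey-BH",
    "bbh": "Batch BH and LORD",
    "bsbh": "Batch St-BH and SAFFRON",
}
PI1 = {
    "pi1s": "x-axis is pi1",
    "pi1_1": "pi1 = 0.1",
    "pi1_5": "pi1 = 0.5",
}

MAIN = re.compile(r"(mean3|mean0)_(bh|sbh|bbh|bsbh)_(pi1s|pi1_1|pi1_5)\.png")
MONOTONE = re.compile(r"monotone_(mean3|mean0)_(bbh|bsbh)\.png")
RDIFF = re.compile(r"rdiff(10|100|1000)\.png")


def describe_plot(filename):
    """Return a one-line description of a plot file name, or None if unrecognized."""
    match = MAIN.fullmatch(filename)
    if match:
        task, algo, pi1 = match.groups()
        return f"{ALGORITHMS[algo]}; {TASKS[task]}; {PI1[pi1]}"

    match = MONOTONE.fullmatch(filename)
    if match:
        task, algo = match.groups()
        return f"monotonicity of {ALGORITHMS[algo]}; {TASKS[task]}"

    match = RDIFF.fullmatch(filename)
    if match:
        # The README does not say what the number means, so it is passed through as-is.
        return f"R_t^+ - R_t for Batch BH (suffix {match.group(1)})"

    return None
```

For instance, `describe_plot("mean3_bbh_pi1_5.png")` yields a string naming Batch BH and LORD, the fixed alternative mean of 3, and pi1 = 0.5; file names that do not follow the scheme return `None`.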
Depending on how python is installed on your system, you may have to edit `run.sh` to use your desired python installation.