Analyzing the Potential of Active Learning for Document Image Classification

This repository contains the code for the paper "Analyzing the Potential of Active Learning for Document Image Classification" by Saifullah Saifullah, Stefan Agne, Andreas Dengel, and Sheraz Ahmed.

Installation and Dependencies

Clone the repository along with sub-modules:

git clone https://github.com/saifullah3396/doc_al.git --recursive

Install the project dependencies:

pip install -r requirements.txt

Set up the environment variables for running the code:

export PYTHONPATH=`pwd`/src:`pwd`/external/xai_torch/src

Set the output directory where checkpoints and logs are written:

export XAI_TORCH_OUTPUT_DIR=<path-to-output-dir>

Set the cache directory where datasets and models are stored:

export XAI_TORCH_CACHE_DIR=<path-to-cache-dir>
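Before launching a run, it can help to confirm that the three variables above are set in the current shell. The following is a minimal sketch for a POSIX shell; check_env is a hypothetical helper written for illustration and is not part of the repository.

```shell
# check_env: hypothetical helper (not part of the repository).
# Prints one line per unset or empty variable and returns non-zero if any is missing.
check_env() {
  missing=0
  for name in PYTHONPATH XAI_TORCH_OUTPUT_DIR XAI_TORCH_CACHE_DIR; do
    # Look up the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return $missing
}

# Report the result without aborting the shell session.
check_env || echo "Set the variables listed above before running an experiment."
```

Defining the check as a function keeps it usable both interactively and from a wrapper script.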

Running an experiment directly

To run an experiment, call the main training script and set args/al_args= to the desired query strategy. For example, to run active learning with entropy sampling on the Tobacco3482 dataset:

./scripts/al_train.sh --config-path ../../../cfg/al_adv +experiment=active_learning/tobacco3482/resnet50 args/al_args=entropy_sampling args.data_args.dataset_dir=<path-to-dataset-dir>

Running experiments through a helper script

To run different experiments, use the helper scripts ./scripts/experiments/tobacco3482.sh, ./scripts/experiments/tobacco3482_pre.sh, and ./scripts/experiments/rvlcdip.sh. For example, to run an experiment on the Tobacco3482 dataset with class imbalance m=2 and entropy sampling:

./scripts/experiments/tobacco3482.sh --query=entropy_sampling --exp=imb_2 --data-path=<path-to-dataset-dir> --seed=0
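To repeat the same configuration over several seeds, the command above can be wrapped in a loop. The sketch below is a dry run that only prints the commands it would execute. build_cmd, DATA_PATH, and SEEDS are hypothetical names introduced for illustration, and seed values other than 0 are an assumption, since only --seed=0 appears above.

```shell
# Hypothetical placeholders; replace with your own dataset path and seed list.
DATA_PATH="${DATA_PATH:-/path/to/dataset}"
SEEDS="${SEEDS:-0 1 2}"

# build_cmd: hypothetical helper that formats one helper-script invocation.
# Arguments: query strategy, experiment name, dataset path, seed.
build_cmd() {
  printf './scripts/experiments/tobacco3482.sh --query=%s --exp=%s --data-path=%s --seed=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Dry run: print one command per seed instead of executing it.
for seed in $SEEDS; do
  build_cmd entropy_sampling imb_2 "$DATA_PATH" "$seed"
done
```

Once the printed commands look right, they can be run from the repository root, for example by piping the output to sh.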

Citation

If you find this paper helpful in your work, please consider citing:

Saifullah, S., Agne, S., Dengel, A. et al. Analyzing the potential of active learning for document image classification. IJDAR 26, 187–209 (2023). https://doi.org/10.1007/s10032-023-00429-8

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
