CIAug: Equipping Interpolative Augmentation with Curriculum Learning

Official PyTorch implementation of CIAug (NAACL 2022 Main conference).

[CIAug overview figure]

Instructions for training the model:

  • Install the required packages listed in requirements.txt.
  • Replace the dataset PATH with the path where the dataset is stored locally.
  • Replace the probability matrix path with the location where the matrix is stored locally.
  • Set the curriculum threshold values.
  • Change bert-base-uncased to bert-base-multilingual-uncased when running on languages other than English.
  • Replace num_label with the number of labels in the dataset.
  • Set the number of training samples in the dataframe (see the configuration sketch below).
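The snippet below is a minimal configuration sketch of the edits described above. The variable names (DATASET_PATH, PROB_MATRIX_PATH, CURRICULUM_THRESHOLDS, MODEL_NAME, NUM_LABELS, NUM_TRAIN_SAMPLES) and the example values are illustrative assumptions and may not match the identifiers used in the training scripts.

```python
# Illustrative configuration sketch; actual identifiers in the scripts may differ.
import pandas as pd

DATASET_PATH = "/path/to/dataset.csv"                  # local copy of the dataset
PROB_MATRIX_PATH = "/path/to/probability_matrix.npy"   # matrix produced by matrix.py
CURRICULUM_THRESHOLDS = [0.3, 0.6, 0.9]                # curriculum threshold values (example values)
MODEL_NAME = "bert-base-uncased"                       # use "bert-base-multilingual-uncased" for non-English data
NUM_LABELS = 6                                         # number of labels in the dataset (e.g. 6 for TREC coarse labels)

df = pd.read_csv(DATASET_PATH)
NUM_TRAIN_SAMPLES = len(df)                            # number of training samples in the dataframe
```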

We have used some standard GLUE datasets, the TREC dataset, the CoNLL dataset, and other standard Turkish and Arabic datasets, which can be downloaded using the Hugging Face datasets library or from their official webpages.
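For example, several of these benchmarks can be pulled directly with the Hugging Face datasets library; the identifiers below are standard Hub names, not repository-specific paths.

```python
# Load a few of the benchmark datasets from the Hugging Face Hub.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")       # a GLUE text-classification task
trec = load_dataset("trec")               # TREC question classification
conll = load_dataset("conll2003")         # CoNLL-2003 named-entity recognition
```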

To calculate the probability matrix, run the code in matrix.py.
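As a rough illustration, the matrix is in the spirit of a pairwise distance matrix over sentence embeddings in hyperbolic space (the complexity measure described in the paper). The sketch below computes Poincaré distances under that assumption; the actual computation in matrix.py may differ.

```python
# Sketch of a pairwise Poincare (hyperbolic) distance matrix over sentence
# embeddings; an approximation of the complexity measure, not matrix.py itself.
import numpy as np

def poincare_distance(u, v, eps=1e-5):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_u = np.clip(np.sum(u * u), 0.0, 1.0 - eps)
    sq_v = np.clip(np.sum(v * v), 0.0, 1.0 - eps)
    sq_diff = np.sum((u - v) ** 2)
    return np.arccosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))

def pairwise_matrix(embeddings):
    """Symmetric matrix of hyperbolic distances between all sample pairs."""
    n = len(embeddings)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mat[i, j] = mat[j, i] = poincare_distance(embeddings[i], embeddings[j])
    return mat

# embeddings: (num_samples, dim) array of sentence embeddings projected into the unit ball
# np.save("probability_matrix.npy", pairwise_matrix(embeddings))
```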

The code is well documented; see the inline comments for further explanation.

Cite using:

@inproceedings{sawhney-etal-2022-ciaug,
    title = "{CIA}ug: Equipping Interpolative Augmentation with Curriculum Learning",
    author = "Sawhney, Ramit  and
      Soun, Ritesh  and
      Pandit, Shrey  and
      Thakkar, Megh  and
      Malaviya, Sarvagya  and
      Pinter, Yuval",
    booktitle = "Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jul,
    year = "2022",
    address = "Seattle, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.naacl-main.127",
    pages = "1758--1764",
    abstract = "Interpolative data augmentation has proven to be effective for NLP tasks. Despite its merits, the sample selection process in mixup is random, which might make it difficult for the model to generalize better and converge faster. We propose CIAug, a novel curriculum-based learning method that builds upon mixup. It leverages the relative position of samples in hyperbolic embedding space as a complexity measure to gradually mix up increasingly difficult and diverse samples along training. CIAug achieves state-of-the-art results over existing interpolative augmentation methods on 10 benchmark datasets across 4 languages in text classification and named-entity recognition tasks. It also converges and achieves benchmark F1 scores 3 times faster. We empirically analyze the various components of CIAug, and evaluate its robustness against adversarial attacks.",
}
