The code and data for "Can Language Models Be Specific? How?"
S-TEST is a benchmark for measuring the specificity of the language produced by pre-trained language models.
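For intuition, here is a minimal sketch of the kind of probe such a benchmark relies on: query a masked LM and compare the probability it assigns to a more specific answer against a more general one. The Toronto/Ontario/Canada example follows the paper's motivating example, but this code is illustrative only (assuming the Hugging Face transformers library), not the repo's implementation:

```python
# Illustrative probe, NOT the repo's code: compare the probability a
# masked LM assigns to a more specific answer ("Ontario") vs. a more
# general one ("Canada") for the same prompt.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertForMaskedLM.from_pretrained("bert-base-cased")
model.eval()

prompt = "Toronto is located in [MASK]."
inputs = tokenizer(prompt, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits
probs = logits[0, mask_pos].softmax(dim=-1)

# Assumes each answer is a single token in the model's vocabulary.
for answer in ["Ontario", "Canada"]:  # more specific vs. more general
    token_id = tokenizer.convert_tokens_to_ids(answer)
    print(answer, round(probs[token_id].item(), 4))
```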
Currently, S-TEST contains a set of connectors to the following pre-trained language models:
- GPT-2 (Radford et al., 2019)
- BERT-Base (Devlin et al., 2019)
- BERT-Large (Devlin et al., 2019)
- RoBERTa-Base (Liu et al., 2019)
- RoBERTa-Large (Liu et al., 2019)
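As a hedged aside (not repo code), the listed models correspond to standard Hugging Face checkpoints and can be loaded directly for a quick sanity check. The exact identifiers below (e.g., cased vs. uncased BERT) are assumptions; the repo ships its own copies via download_models.sh, described later:

```python
# Sanity-check sketch only; identifiers are standard Hugging Face names
# and may differ from the copies download_models.sh installs locally.
from transformers import AutoModelForCausalLM, AutoModelForMaskedLM, AutoTokenizer

for name in ["bert-base-cased", "bert-large-cased", "roberta-base", "roberta-large"]:
    AutoTokenizer.from_pretrained(name)
    AutoModelForMaskedLM.from_pretrained(name)

# GPT-2 is a causal (left-to-right) LM, so it uses a different head.
AutoModelForCausalLM.from_pretrained("gpt2")
```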
This repo is built upon the LAMA benchmark.
To reproduce the results, first create a conda environment and install the requirements:
conda create -n stest37 -y python=3.7 && conda activate stest37
python setup.py install
pip install -r requirements.txt
Install the spaCy model:
python3 -m spacy download en
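A quick way to verify the download, assuming spaCy 2.x (where `en` is a shortcut link, consistent with the Python 3.7 environment above):

```python
import spacy

nlp = spacy.load("en")  # the shortcut created by `spacy download en`
print([(t.text, t.pos_) for t in nlp("Toronto is located in Ontario.")])
```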
Download the models:
chmod +x download_models.sh
./download_models.sh
The script will create and populate a pre-trained_language_models folder.
If you are only interested in a particular model, please edit the script.
Run the experiments:
python scripts/run_experiments.py
Temporary results will be logged in output/ and last_results.csv.
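If you want to inspect the logged results programmatically, something like the following works; the column layout depends on the run and is not documented here:

```python
import pandas as pd

# Peek at the summary file produced by the run; columns depend on the setup.
print(pd.read_csv("last_results.csv").head())
```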
Finally, evaluate the results:
python eval.py
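Conceptually, the evaluation asks how often a model ranks the more specific answer above the more general one. The toy function below illustrates that preference rate; it is a sketch of the idea, not the code in eval.py:

```python
# Illustrative only, NOT eval.py: fraction of prompts where the model
# assigns the specific answer a higher probability than the general one.
def preference_rate(pairs):
    """pairs: iterable of (p_specific, p_general) probability pairs."""
    pairs = list(pairs)
    return sum(ps > pg for ps, pg in pairs) / len(pairs)

print(preference_rate([(0.02, 0.31), (0.11, 0.07)]))  # toy input -> 0.5
```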
The details of this repo are described in the following paper. If you find the repo useful, please cite it:
@article{huang2022can,
  title={Can Language Models Be Specific? How?},
  author={Huang, Jie and Chang, Kevin Chen-Chuan and Xiong, Jinjun and Hwu, Wen-mei},
  journal={arXiv preprint arXiv:2210.05159},
  year={2022}
}