FinBERT-QA is a Question Answering system for retrieving opinionated financial passages from task 2 of the FiQA dataset. Please see FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models for further information.
The system combines techniques from information retrieval and natural language processing: it first retrieves the top-50 answer candidates for each query using the Lucene-based toolkit Anserini, then re-ranks the candidates using variants of pre-trained BERT models.
Built with Huggingface's transformers library and following the Transfer and Adapt (TANDA) method, FinBERT-QA first transfers a pre-trained BERT model by fine-tuning it on a general QA task, then adapts the model to the financial domain using the FiQA dataset. The transfer step uses Nogueira et al.'s BERT model fine-tuned on the MS MARCO Passage Retrieval dataset, converted from TensorFlow to PyTorch.
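The TensorFlow-to-PyTorch conversion can be reproduced with the transformers library; a minimal sketch, assuming the MS MARCO checkpoint has been downloaded locally (the checkpoint path below is a placeholder):

from transformers import BertForSequenceClassification

# Load Nogueira et al.'s TensorFlow checkpoint and convert it to PyTorch weights.
# "path/to/ms_marco_tf_checkpoint" is a placeholder for the downloaded checkpoint.
model = BertForSequenceClassification.from_pretrained("path/to/ms_marco_tf_checkpoint", from_tf=True)
model.save_pretrained("model/bert-qa")  # PyTorch weights, ready for fine-tuning on FiQA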
This approach improved the previous state-of-the-art (SOTA) results by an average of roughly 20% across three ranking evaluation metrics (nDCG, MRR, Precision).
Overview of the QA pipeline:
1. The Anserini Answer Retriever first retrieves the top-50 candidate answers
2. A pre-trained BERT model is transferred to the large-scale MS MARCO dataset
3. The transferred BERT model is then adapted to the target FiQA dataset to create the Answer Re-ranker
4. The final Answer Re-ranker outputs the top-10 most relevant answers
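At inference time, the pipeline reduces to a retrieve-then-rerank loop. A minimal sketch (the function and attribute names here are illustrative, not the repo's actual API):

def answer_question(query, retriever, reranker, k=10):
    # Step 1: BM25 retrieval of the top-50 candidate answers (Anserini).
    candidates = retriever.search(query, k=50)
    # Step 2: score each (query, candidate) pair with the fine-tuned BERT re-ranker.
    scores = [reranker.score(query, cand.text) for cand in candidates]
    # Step 3: return the k highest-scoring answers.
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [cand.text for cand, _ in ranked[:k]]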
If no GPU is available, an alternative, low-effort way to train and evaluate the models, as well as to predict results, is through the following online Colab notebooks.
This repo can be used as a container with Docker. This does not require a locally checked-out copy of FinBERT-QA. Run the commands as root if Docker is not configured for non-root use.
docker pull yuanbit/finbert_qa:3.0
Run with GPU:
docker run --runtime=nvidia -it yuanbit/finbert_qa:3.0
Run the following to query the top-k opinionated answers from the financial domain:
python3 src/predict.py --user_input --top_k 5
Sample questions:
• Getting financial advice: Accountant vs. Investment Adviser vs. Internet/self-taught?
• Are individual allowed to use accrual based accounting for federal income tax?
• What are 'business fundamentals'?
• How would IRS treat reimbursement in a later year of moving expenses?
• Can I claim mileage for traveling to a contract position?
• Tax planning for Indian TDS on international payments
• Should a retail trader bother about reading SEC filings
• Why are American Express cards not as popular as Visa or MasterCard?
• Why do companies have a fiscal year different from the calendar year?
• Are credit histories/scores international?
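A query can also be passed directly on the command line instead of entering it interactively (see the detailed usage of src/predict.py below):

python3 src/predict.py --query "What are 'business fundamentals'?" --top_k 5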
The retriever uses the BM25 implementation from Anserini. To replicate the creation of the Lucene index for the FiQA dataset, run the following inside the Docker container:
cd retriever
git clone https://github.com/castorini/anserini.git
sh indexer.sh
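For reference, BM25 scores a document against a query as a sum of per-term contributions. A self-contained sketch of the scoring formula (illustrative only, not Anserini's actual implementation):

import math

def bm25_score(query_terms, doc_terms, df, N, avgdl, k1=0.9, b=0.4):
    # df: term -> document frequency; N: collection size; avgdl: average doc length.
    # k1=0.9 and b=0.4 are Anserini's default BM25 parameters.
    dl = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)
        if tf == 0 or term not in df:
            continue
        idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score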
The raw dataset has been cleaned and split into training, validation, and test sets in the form of lists, where each sample is a list of [question id, [label answer ids], [answer candidate ids]]. The datasets are stored as pickle files in data/data_pickle. The generation of the datasets can be replicated by running the src/generate_data.py script; for more details, see Generate data. data/data_pickle/labels.pickle is a pickle file containing a Python dictionary where the keys are question ids and the values are lists of relevant answer ids.
Since creating inputs to fine-tune a pre-trained BERT model can take some time, sample datasets are provided in data/sample/ for testing.
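The pickle files can be inspected directly to verify the format described above:

import pickle

# Each sample: [question id, [label answer ids], [answer candidate ids]]
with open("data/data_pickle/train_set_50.pickle", "rb") as f:
    train_set = pickle.load(f)
qid, label_ans_ids, cand_ans_ids = train_set[0]

# labels.pickle: {question id: [relevant answer ids]}
with open("data/data_pickle/labels.pickle", "rb") as f:
    labels = pickle.load(f)
print(qid, labels[qid])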
Example QA pair:
Question: Why are big companies like Apple or Google not included in the Dow Jones Industrial Average (DJIA) index?
Answer: That is a pretty exclusive club and for the most part they are not interested in highly volatile companies like Apple and Google. Sure, IBM is part of the DJIA, but that is about as stalwart as you can get these days. The typical profile for a DJIA stock would be one that pays fairly predictable dividends, has been around since money was invented, and are not going anywhere unless the apocalypse really happens this year. In summary, DJIA is the boring reliable company index.
Downloading the pre-trained and fine-tuned models is automated by the scripts. Alternatively, they can be downloaded manually. Make sure you are inside the FinBERT-QA directory.
For training usage:
• 'bert-qa': pre-trained BERT model fine-tuned on the MS MARCO passage dataset of Nogueira et al.
• 'finbert-domain': BERT model of Araci, further pre-trained on a large financial corpus
• 'finbert-task': BERT model further pre-trained on the FiQA dataset
• 'bert-base': the 'bert-base-uncased' model from transformers
from src.utils import *
get_model('bert-qa')
The model will be downloaded to model/bert-qa/
For evaluation and prediction usage:
• 'finbert-qa': 'bert-qa' fine-tuned on FiQA
• 'finbert-domain': 'finbert-domain' fine-tuned on FiQA
• 'finbert-task': 'finbert-task' fine-tuned on FiQA
• 'bert-pointwise': 'bert-base-uncased' fine-tuned on FiQA using the cross-entropy loss
• 'bert-pairwise': 'bert-base-uncased' fine-tuned on FiQA using a pairwise loss
• 'qa-lstm': QA-LSTM model
from src.utils import *
get_trained_model('finbert-qa')
The model will be downloaded to model/trained/finbert-qa/
This example code further fine-tunes Nogueira et al.'s BERT model on the FiQA dataset using the pointwise learning approach (cross-entropy loss).
python3 src/train_models.py --model_type 'bert' \
--train_pickle data/data_pickle/train_set_50.pickle \
--valid_pickle data/data_pickle/valid_set_50.pickle \
--bert_model_name 'bert-qa' \
--learning_approach 'pointwise' \
--max_seq_len 512 \
--batch_size 16 \
--n_epochs 3 \
--lr 3e-6 \
--weight_decay 0.01 \
--num_warmup_steps 10000
Training with these hyperparameters produced the SOTA results:
MRR@10: 0.436
nDCG@10: 0.482
P@1: 0.366
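The ranking metrics are implemented in src/evaluate.py. As an illustration, MRR@k and P@1 can be computed per question as follows (a sketch, not the repo's exact code):

def mrr_at_k(ranked_ids, relevant_ids, k=10):
    # Reciprocal rank of the first relevant answer within the top k, else 0.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def precision_at_1(ranked_ids, relevant_ids):
    # 1 if the top-ranked answer is relevant, else 0.
    return 1.0 if ranked_ids and ranked_ids[0] in relevant_ids else 0.0

The reported scores are these values averaged over all test questions.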
python3 src/train_models.py --model_type 'qa-lstm' \
--train_pickle data/data_pickle/train_set_50.pickle \
--valid_pickle data/data_pickle/valid_set_50.pickle \
--emb_dim 100 \
--hidden_size 256 \
--max_seq_len 128 \
--batch_size 64 \
--n_epochs 3 \
--lr 1e-3 \
--dropout 0.2
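The QA-LSTM baseline encodes the question and the answer with a shared biLSTM followed by max pooling and scores the pair by cosine similarity. A minimal PyTorch sketch with the hyperparameters above (a simplification, not the exact code in src/qa_lstm.py):

import torch
import torch.nn as nn

class QALSTM(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_size=256, dropout=0.2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)

    def encode(self, token_ids):
        out, _ = self.lstm(self.emb(token_ids))      # (batch, seq_len, 2*hidden_size)
        return self.dropout(out.max(dim=1).values)   # max pooling over time

    def forward(self, question_ids, answer_ids):
        q = self.encode(question_ids)
        a = self.encode(answer_ids)
        return torch.cosine_similarity(q, a, dim=1)  # relevance score per pair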
Detailed Usage
python3 src/train_models.py [--model_type MODEL_TYPE] [--train_pickle TRAIN_PICKLE] \
[--valid_pickle VALID_PICKLE] [--device DEVICE] \
[--max_seq_len MAX_SEQ_LEN] [--batch_size BATCH_SIZE] \
[--n_epochs N_EPOCHS] [--lr LR] [--emb_dim EMB_DIM] \
[--hidden_size HIDDEN_SIZE] [--dropout DROPOUT] \
[--bert_model_name BERT_MODEL_NAME] \
[--learning_approach LEARNING_APPROACH] \
[--margin MARGIN] [--weight_decay WEIGHT_DECAY] \
[--num_warmup_steps NUM_WARMUP_STEPS]
Arguments:
MODEL_TYPE - Specify model type as 'qa-lstm' or 'bert'
TRAIN_PICKLE - Path to training data in .pickle format
VALID_PICKLE - Path to validation data in .pickle format
DEVICE - Specify 'gpu' or 'cpu'
MAX_SEQ_LEN - Maximum sequence length for a given input
BATCH_SIZE - Batch size
N_EPOCHS - Number of epochs
LR - Learning rate
EMB_DIM - Embedding dimension. Specify only if model_type is 'qa-lstm'
HIDDEN_SIZE - Hidden size. Specify only if model_type is 'qa-lstm'
DROPOUT - Dropout rate. Specify only if model_type is 'qa-lstm'
BERT_MODEL_NAME - Specify the pre-trained BERT model to use from 'bert-base', 'finbert-domain', 'finbert-task', 'bert-qa'
LEARNING_APPROACH - Learning approach. Specify 'pointwise' or 'pairwise' only if model_type is 'bert'
MARGIN - Margin for the pairwise loss. Specify only if model_type is 'bert'
WEIGHT_DECAY - Weight decay. Specify only if model_type is 'bert'
NUM_WARMUP_STEPS - Number of warmup steps. Specify only if model type is 'bert'
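The two learning approaches differ only in the loss. Pointwise learning treats re-ranking as binary relevance classification with cross-entropy, while pairwise learning pushes a relevant answer's score above an irrelevant one's by at least MARGIN. A sketch of both losses (the tensors here are illustrative stand-ins for model outputs):

import torch
import torch.nn.functional as F

# Pointwise: classify each (question, answer) pair as relevant (1) or not (0).
logits = torch.randn(16, 2)          # model outputs for a batch of QA pairs
labels = torch.randint(0, 2, (16,))  # binary relevance labels
pointwise_loss = F.cross_entropy(logits, labels)

# Pairwise: the relevant answer should outscore the irrelevant one by MARGIN.
pos_scores = torch.randn(16)         # scores for relevant answers
neg_scores = torch.randn(16)         # scores for irrelevant answers
pairwise_loss = F.margin_ranking_loss(pos_scores, neg_scores,
                                      torch.ones(16), margin=0.2)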
python3 src/evaluate_models.py --test_pickle data/data_pickle/test_set_50.pickle \
--model_type 'bert' \
--max_seq_len 512 \
--bert_finetuned_model 'finbert-qa' \
--use_trained_model
Detailed Usage
python3 src/evaluate_models.py [--model_type MODEL_TYPE] [--test_pickle TEST_PICKLE] \
[--bert_model_name BERT_MODEL_NAME] \
[--bert_finetuned_model BERT_FINETUNED_MODEL] \
[--model_path MODEL_PATH] [--device DEVICE] \
[--max_seq_len MAX_SEQ_LEN] [--emb_dim EMB_DIM] \
[--hidden_size HIDDEN_SIZE] [--dropout DROPOUT]
Arguments:
MODEL_TYPE - Specify model type as 'qa-lstm' or 'bert'
TEST_PICKLE - Path to test data in .pickle format
BERT_MODEL_NAME - Specify the pre-trained BERT model to use from 'bert-base', 'finbert-domain', 'finbert-task', 'bert-qa'
BERT_FINETUNED_MODEL - Specify the name of the fine-tuned model from 'bert-pointwise', 'bert-pairwise', 'finbert-domain', 'finbert-task', 'finbert-qa'
MODEL_PATH - Specify model path if use_trained_model is not used
DEVICE - Specify 'gpu' or 'cpu'
MAX_SEQ_LEN - Maximum sequence length for a given input
EMB_DIM - Embedding dimension. Specify only if model_type is 'qa-lstm'
HIDDEN_SIZE - Hidden size. Specify only if model_type is 'qa-lstm'
DROPOUT - Dropout rate. Specify only if model_type is 'qa-lstm'
src/predict.py: given a query, retrieves the top-50 candidate answers and re-ranks them with the FinBERT-QA model.
Retrieve the top-5 answers for a user-given query:
python3 src/predict.py --user_input --top_k 5
Detailed usage
python3 src/predict.py [--user_input] [--query QUERY] \
[--top_k TOP_K] [--device DEVICE]
Arguments:
QUERY - Specify query if user_input is not used
TOP_K - Top-k answers to output
DEVICE - Specify 'gpu' or 'cpu'
python3 src/generate_data.py --query_path data/raw/FiQA_train_question_final.tsv \
--label_path data/raw/FiQA_train_question_doc_final.tsv
The data will be stored in data/data_pickle
Detailed usage:
python3 src/generate_data.py [--query_path QUERY_PATH] [--label_path LABEL_PATH] \
[--cands_size CANDS_SIZE] [--output_dir OUTPUT_DIR]
Arguments:
QUERY_PATH - Path to the question id to text data in .tsv format. Each line should have at least two columns named (qid, question) separated by tab
LABEL_PATH - Path to the question id and answer id data in .tsv format. Each line should have at least two columns named (qid, docid) separated by tab
CANDS_SIZE - Number of candidates to retrieve per question.
OUTPUT_DIR - The output directory where the generated data will be stored.
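The two input files can be checked against the expected format with a few lines of Python (column names as described above):

import csv

# Question file: each row has at least (qid, question).
with open("data/raw/FiQA_train_question_final.tsv") as f:
    row = next(csv.DictReader(f, delimiter="\t"))
    print(row["qid"], row["question"])

# Label file: each row has at least (qid, docid).
with open("data/raw/FiQA_train_question_doc_final.tsv") as f:
    row = next(csv.DictReader(f, delimiter="\t"))
    print(row["qid"], row["docid"])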
.
├── data                          # Files for FinBERT-QA
│   └── ...
├── notebooks                     # Jupyter notebooks
│   ├── Process_Data.ipynb        # Loads, cleans, and processes data
│   └── Retriever_Analysis.ipynb  # Evaluates and analyzes the results from the retriever
├── retriever                     # Files for the retriever
│   └── ...
├── src                           # Source files
│   ├── evaluate.py               # Evaluation metrics - nDCG@k, MRR@k, Precision@k
│   ├── evaluate_models.py        # Configures evaluation parameters
│   ├── finbert_qa.py             # Creates pre-trained BERT model, fine-tunes, evaluates, and makes predictions
│   ├── generate_data.py          # Generates train, validation, and test sets using the retriever
│   ├── predict.py                # Configures prediction parameters
│   ├── process_data.py           # Functions to process data, create vocabulary, and tokenizers for the QA-LSTM model
│   ├── qa_lstm.py                # Creates, trains, and evaluates a QA-LSTM model
│   ├── train_models.py           # Configures training parameters
│   └── utils.py                  # Helper functions
└── ...
[Bithiah Yuan](bithiahy[at]gmail.com)