Named entity recognition with BERT

Tools for fine-tuning BERT for named entity recognition, originally tested on PharmaCoNER data. The trained PharmaCoNER model is available on request.

Requirements:

A pretrained BERT model (model paths are currently hardcoded in the code)
The conlleval.py file from https://github.com/sighsmile/conlleval
Train and devel files in the format below (paths are currently hardcoded)

Data should be in a CoNLL-like format with one token per line: the first column is the tag and the last column is the token (tokens are retokenized into subword units):

I-MISC West
I-MISC Indian
O all-rounder
I-PER Phil
I-PER Simmons
O took
O four
O for
O 38
O on
O Friday
O as
I-ORG Leicestershire
O beat
I-ORG Somerset
O by
O an
O innings
O and
O 39
O runs
O in
O two
O days
O to
O take
O over
O at
O the
O head
O of
O the
O county
O championship
O .
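
As an illustration of the format above, here is a minimal Python sketch of reading such a file and carrying word-level tags over to subword pieces. It is not taken from train.py or predict.py. The function names (read_conllish, expand_to_subwords, toy_tokenize) are hypothetical, the blank-line sentence separator is an assumption, and the fixed-size chunker only stands in for the real BERT tokenizer. Repeating the word's tag on every piece is one common convention and may differ from what the scripts actually do.

```python
import io


def read_conllish(lines):
    """Yield sentences as lists of (tag, token) pairs.

    Assumes whitespace-separated columns, the tag in the first column,
    the token in the last column, and a blank line between sentences.
    """
    sentence = []
    for line in lines:
        columns = line.split()
        if not columns:  # blank line: sentence boundary (assumed)
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append((columns[0], columns[-1]))
    if sentence:  # final sentence without a trailing blank line
        yield sentence


def expand_to_subwords(sentence, tokenize):
    """Repeat each word's tag for every subword piece of that word."""
    expanded = []
    for tag, token in sentence:
        for piece in tokenize(token):
            expanded.append((tag, piece))
    return expanded


def toy_tokenize(token, size=4):
    """Hypothetical stand-in for a subword tokenizer: fixed-size chunks."""
    pieces = [token[i:i + size] for i in range(0, len(token), size)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


sample = io.StringIO("I-PER Phil\nI-PER Simmons\nO took\n\nO runs\n")
sentences = list(read_conllish(sample))
subwords = expand_to_subwords(sentences[0], toy_tokenize)
print(sentences)
print(subwords)
```

Passing the tokenizer in as a plain callable keeps the sketch independent of any particular BERT library; a real subword tokenizer could be swapped in for toy_tokenize.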

train.py trains a new model. It takes the model save path and the weight decay as arguments, e.g.

python3 train.py ./models/model.h5 0.01 | tee ./logs/model.log

predict.py makes predictions with an existing model. It takes the model path, the input path and the output path as arguments, e.g.

python3 predict.py ./models/model.h5 ./data/PharmaCoNER-dev-1.1.nersuite ./predictions.nersuite

The predicted tags are inserted into the last column of the input file.

chaanim/pharmaconer

Folders and files

Latest commit

History

Repository files navigation

Named entity recognition with BERT

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages