chaanim/pharmaconer

Named entity recognition with BERT

Tools for finetuning BERT for named entity recognition, originally tested on PharmaCoNER data. The trained model for PharmaCoNER is available on request.

Requirements:

  1. A pretrained BERT model (model paths are currently hardcoded in the code)
  2. The conlleval.py file from https://github.com/sighsmile/conlleval
  3. Train and devel files in the format below (paths are currently hardcoded)

Data should be in a CoNLL-like format, with the tag in the first column and the token in the last column (tokens are retokenized into subword units):

I-MISC West
I-MISC Indian
O all-rounder
I-PER Phil
I-PER Simmons
O took
O four
O for
O 38
O on
O Friday
O as
I-ORG Leicestershire
O beat
I-ORG Somerset
O by
O an
O innings
O and
O 39
O runs
O in
O two
O days
O to
O take
O over
O at
O the
O head
O of
O the
O county
O championship
O .
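
The format above can be read with a few lines of Python. The following is only a minimal sketch, not the code used in this repository: the function name read_conllish is hypothetical, and it assumes whitespace-separated columns and blank lines between sentences (the usual CoNLL convention, which the description above does not state explicitly).

```python
from typing import Iterable, Iterator, List, Tuple


def read_conllish(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield sentences as lists of (token, tag) pairs.

    Assumptions (not confirmed by the repository):
    - columns are separated by whitespace,
    - the tag is the first column and the token is the last column,
    - sentences are separated by blank lines.
    """
    sentence: List[Tuple[str, str]] = []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split()
        sentence.append((fields[-1], fields[0]))
    # Emit the final sentence when the input does not end with a blank line.
    if sentence:
        yield sentence


sample = [
    "I-PER Phil",
    "I-PER Simmons",
    "O took",
    "",
    "O by",
    "O an",
    "O innings",
]
sentences = list(read_conllish(sample))
print(sentences)
```

Taking the first and last fields, rather than fixed positions 0 and 1, keeps the sketch usable for files that carry extra columns between the tag and the token.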

train.py trains a new model. It requires the model save path and the weight decay as arguments, e.g.

python3 train.py ./models/model.h5 0.01 | tee ./logs/model.log

predict.py makes predictions with an existing model. It requires the model path, the input path and the output path as arguments, e.g.

python3 predict.py ./models/model.h5 ./data/PharmaCoNER-dev-1.1.nersuite ./predictions.nersuite

The predicted tags are inserted into the last column of the input file.
