…relations between coreferent mentions of entities in a document. A popular approach involves graph-based methods, which have the advantage of naturally modelling inter-sentence relations (Peng et al., 2017; Song et al., 2018; Christopoulou et al., 2019; Nan et al., 2020; Minh Tran et al., 2020). However, like all pipeline-based approaches, these methods assume that the entities within the text are known. As previous work has demonstrated, and as we show in §5.2, jointly learning to extract entities and relations can improve performance (Miwa and Sasaki, 2014; Miwa and Bansal, 2016; Gupta et al., 2016; Li et al., 2016a, 2017; Nguyen and Verspoor, 2019a; Yu et al., 2020) and may be more efficient due to shared parameters and training steps. Existing end-to-end methods typically combine task-specific components for entity detection, coreference resolution, and relation extraction that are trained jointly. Most approaches are restricted to intra-sentence RE (Bekoulis et al., 2018; Luan et al., 2018; Nguyen and Verspoor, 2019b; Wadden et al., 2019; Giorgi et al., 2019) and have only recently been extended to DocRE (Eberts and Ulges, 2021). However, they still focus on binary relations. Ideally, DocRE methods would be capable of modelling the complexities mentioned above without strictly requiring entities to be known.

A less popular end-to-end approach is to frame RE as a generative task with sequence-to-sequence (seq2seq) learning (Sutskever et al., 2014). This framing simplifies RE by removing the need for task-specific components and explicit negative training examples, i.e. pairs of entities that do not express a relation. If the information to extract is appropriately linearized to a string, seq2seq methods are flexible enough to model all complexities discussed thus far. However, existing work stops short, focusing on intra-sentence binary relations (Zeng et al., 2018; Zhang et al., 2020; Nayak and Ng, 2020; Zeng et al., 2020). In this paper, we extend work on seq2seq methods for RE to the document level, with several important contributions:

• We propose a novel linearization schema that can handle complexities overlooked by previous seq2seq approaches, like coreferent mentions and n-ary relations (§3.1).

• Using this linearization schema, we demonstrate that a seq2seq approach is able to learn the subtasks of DocRE (entity extraction, coreference resolution and relation extraction) jointly, and report the first end-to-end results on several popular biomedical datasets (§5.1).

• We devise a simple strategy, referred to as "entity hinting" (§3.3), to compare our model to existing pipeline-based approaches, in some cases exceeding their performance (§5.1).

2 Task definition: document-level relation extraction

Given a source document of S tokens, a model must extract all tuples corresponding to a relation, R, expressed between the entities, E, in the document, (E1, ..., En, R), where n is the number of participating entities, or arity, of the relation. Each entity Ei is represented as the set of its coreferent mentions {e_i^j} in the document, which are often expressed as aliases, abbreviations or acronyms. All entities appearing in a tuple have at least one mention in the document. The mentions that express a given relation are not necessarily contained within the same sentence.
Figure 2: A sequence-to-sequence model for document-level relation extraction. Special tokens are generated by the decoder. Entity mentions are copied from the input via a copy mechanism (not shown). Decoding is initiated by a @START@ token and terminated when the model generates the @END@ token. Attention connections shown only for the second timestep to reduce clutter. CID: chemical-induced disease.
Commonly, E is assumed to be known and provided as input to a model. We will refer to these methods as "pipeline-based". In this paper, we are primarily concerned with the situation where E is not given and must be predicted by a model, which we will refer to as "end-to-end".

3 Our approach: seq2rel

3.1 Linearization

To use seq2seq learning for RE, the information to be extracted must be linearized to a string. This linearization should be expressive enough to model the complexities of entity and relation extraction without being overly verbose. We propose the following schema, illustrated with an example:

X: Variants in the estrogen receptor alpha (ESR1) gene and its mRNA contribute to risk for schizophrenia.
Y: estrogen receptor alpha ; ESR1 @GENE@ schizophrenia @DISEASE@ @GDA@

The input text X expresses a gene-disease association (GDA) between ESR1 and schizophrenia. In the corresponding target string Y, each relation begins with its constituent entities. A semicolon (;) separates coreferent mentions, and entities are terminated with a special token denoting their type (e.g. @GENE@). Similarly, relations are terminated with a special token denoting their type (e.g. @GDA@). This schema can be used to model various complexities, like coreferent entity mentions and n-ary relations.
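To make the schema concrete, the following is a minimal sketch of the linearization in code; the data structures and the linearize helper are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the linearization schema described above. The input
# format (entities as lists of coreferent mention strings, plus special type
# tokens) is an assumption for illustration.

def linearize(relations):
    """Linearize a list of (entities, relation_token) pairs to a target string.

    Each entity is a (mentions, entity_token) pair: coreferent mentions are
    joined with ";" and terminated by the entity's type token; the relation
    token terminates the full tuple.
    """
    parts = []
    for entities, relation_token in relations:
        for mentions, entity_token in entities:
            parts.append(" ; ".join(mentions))
            parts.append(entity_token)
        parts.append(relation_token)
    return " ".join(parts)

relations = [
    ([(["estrogen receptor alpha", "ESR1"], "@GENE@"),
      (["schizophrenia"], "@DISEASE@")], "@GDA@"),
]
print(linearize(relations))
# estrogen receptor alpha ; ESR1 @GENE@ schizophrenia @DISEASE@ @GDA@
```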
3.2 Model

The model follows a canonical seq2seq setup. An encoder maps each token in the input to a contextual embedding. An autoregressive decoder generates an output, token-by-token, attending to the outputs of the encoder at each timestep (Figure 2). Decoding proceeds until a special "end-of-sequence" token (@END@) is generated, or a maximum number of tokens have been generated. Formally, X is the source sequence of length S, which is some text we would like to extract relations from. Y is the corresponding target sequence of length T, a linearization of the relations contained in the source. We model the conditional probability

    p(Y \mid X) = \prod_{t=1}^{T} p(y_t \mid X, y_{<t})    (1)

During training, we optimize over the model parameters θ the sequence cross-entropy loss

    \ell(\theta) = -\sum_{t=1}^{T} \log p(y_t \mid X, y_{<t}; \theta)    (2)
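As a concrete illustration, the loss in Eq. (2) is a standard token-level cross-entropy over the decoder's outputs. A PyTorch sketch follows, where the tensor shapes and padding id are assumptions:

```python
import torch.nn.functional as F

def sequence_cross_entropy(logits, targets, pad_id=0):
    """logits: (batch, T, vocab) decoder scores; targets: (batch, T) gold ids."""
    # Sums -log p(y_t | X, y_<t) over every non-padding target position,
    # i.e. Eq. (2) summed across the batch.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,
        reduction="sum",
    )
```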
Method                            |      CDR       |      GDA
                                  |  P    R    F1  |  P    R    F1
Christopoulou et al. (2019)       | 62.1 65.2 63.6 |  –    –   81.5
Nan et al. (2020)                 |  –    –   64.8 |  –    –   82.2
Minh Tran et al. (2020)           |  –    –   66.1 |  –    –   82.8
Lai and Lu (2021)                 | 64.9 67.1 66.0 |  –    –    –
Xu et al. (2021)                  |  –    –   68.7 |  –    –   83.7
Zhou et al. (2021)                |  –    –   69.4 |  –    –   83.9
seq2rel (entity hinting)          | 68.2 66.2 67.2 | 84.4 85.3 84.9
seq2rel (entity hinting, relaxed) | 68.2 66.2 67.2 | 84.5 85.4 85.0
seq2rel (end-to-end)              | 43.5 37.5 40.2 | 55.0 55.4 55.2
seq2rel (end-to-end, relaxed)     | 56.6 48.8 52.4 | 70.3 70.8 70.5

Figure 3: Effect of training set size on performance. Performance is reported as the median micro F1-score obtained over five runs with different random seeds on the CDR and GDA validation sets, with and without entity hinting. Error bands correspond to the standard deviation over the five runs. The absolute number of training examples is displayed for each corpus. Some labels are excluded to reduce clutter.
John Giorgi, Xindi Wang, Nicola Sahar, Won Young Shin, Gary D Bader, and Bo Wang. 2019. End-to-end named entity recognition and relation extraction using pre-trained language models. ArXiv preprint, abs/1912.13415.

Alex Graves. 2012. Sequence transduction with recurrent neural networks. ArXiv preprint, abs/1211.3711.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016a. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1631–1640, Berlin, Germany. Association for Computational Linguistics.

Robin Jia, Cliff Wong, and Hoifung Poon. 2019. Document-level n-ary relation extraction with multiscale representation learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3693–3704, Minneapolis, Minnesota. Association for Computational Linguistics.

Jin-Dong Kim, T. Ohta, Yuka Tateisi, and Jun'ichi Tsujii. 2003. GENIA corpus - a semantically annotated corpus for bio-textmining. Bioinformatics, 19 Suppl 1:i180–2.

Po-Ting Lai and Zhiyong Lu. 2021. BERT-GT: Cross-sentence n-ary relation extraction with BERT and graph transformer. Bioinformatics.

Yi Luan, Luheng He, Mari Ostendorf, and Hannaneh Hajishirzi. 2018. Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3219–3232, Brussels, Belgium. Association for Computational Linguistics.

Hieu Minh Tran, Minh Trung Nguyen, and Thien Huu Nguyen. 2020. The dots have their values: Exploiting the node-edge connections in graph-based neural models for document-level relation extraction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4561–4567, Online. Association for Computational Linguistics.

Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1105–1116, Berlin, Germany. Association for Computational Linguistics.

Makoto Miwa and Yutaka Sasaki. 2014. Modeling joint entity and relation extraction with table representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1858–1869, Doha, Qatar. Association for Computational Linguistics.

Guoshun Nan, Zhijiang Guo, Ivan Sekulic, and Wei Lu. 2020. Reasoning with latent structure refinement for document-level relation extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1546–1557, Online. Association for Computational Linguistics.

Tapas Nayak and Hwee Tou Ng. 2020. Effective modeling of encoder-decoder architecture for joint entity and relation extraction. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, pages 8528–8535. AAAI Press.

Dat Quoc Nguyen and Karin Verspoor. 2019a. End-to-end neural relation extraction using deep biaffine attention. In Advances in Information Retrieval, pages 729–738, Cham. Springer International Publishing.

Dat Quoc Nguyen and Karin Verspoor. 2019b. End-to-end neural relation extraction using deep biaffine attention. In European Conference on Information Retrieval, pages 729–738. Springer.

Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5:101–115.

Long N Phan, James T Anibal, Hieu Tran, Shaurya Chanana, Erol Bahadroglu, Alec Peltekian, and Grégoire Altan-Bonnet. 2021. SciFive: a text-to-text transformer model for biomedical literature. ArXiv preprint, abs/2106.03598.

Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21:1–67.

Linfeng Song, Yue Zhang, Zhiguo Wang, and Daniel Gildea. 2018. N-ary relation extraction using graph-state LSTM. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2226–2235, Brussels, Belgium. Association for Computational Linguistics.

Nitish Srivastava, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.

Dianbo Sui, Yubo Chen, Kang Liu, Jun Zhao, Xiangrong Zeng, and Shengping Liu. 2020. Joint entity and relation extraction with set prediction networks. ArXiv preprint, abs/2011.01675.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27, pages 3104–3112.

Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, and Donald Metzler. 2021. Long Range Arena: A benchmark for efficient transformers. In 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008.

Patrick Verga, Emma Strubell, and Andrew McCallum. 2018. Simultaneously self-attending to all mentions for full-abstract biological relation extraction. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 872–884, New Orleans, Louisiana. Association for Computational Linguistics.

Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. 2016. Order matters: Sequence to sequence for sets. In 4th International Conference on Learning Representations, ICLR 2016.

David Wadden, Ulme Wennberg, Yi Luan, and Hannaneh Hajishirzi. 2019. Entity, relation, and event extraction with contextualized span representations. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5784–5789, Hong Kong, China. Association for Computational Linguistics.

Li Wan, Matthew D. Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. 2013. Regularization of neural networks using DropConnect. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, volume 28 of JMLR Workshop and Conference Proceedings, pages 1058–1066. JMLR.org.

Yucheng Wang, Bowen Yu, Hongsong Zhu, Tingwen Liu, Nan Yu, and Limin Sun. 2021. Discontinuous named entity recognition as maximal clique discovery. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 764–774, Online. Association for Computational Linguistics.

Chih-Hsuan Wei, Hung-Yu Kao, and Zhiyong Lu. 2013. PubTator: a web-based text mining tool for assisting biocuration. Nucleic Acids Research, 41(W1):W518–W522.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Ye Wu, Ruibang Luo, Henry C. M. Leung, Hing-Fung Ting, and Tak-Wah Lam. 2019. RENET: A deep learning approach for extracting gene-disease associations from literature. In Research in Computational Molecular Biology, pages 272–284, Cham. Springer International Publishing.

Benfeng Xu, Quan Wang, Yajuan Lyu, Yong Zhu, and Zhendong Mao. 2021. Entity structure within and throughout: Modeling mention dependencies for document-level relation extraction. In AAAI.

Pengcheng Yang, Fuli Luo, Shuming Ma, Junyang Lin, and Xu Sun. 2019. A deep reinforced sequence-to-set model for multi-label classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5252–5258, Florence, Italy. Association for Computational Linguistics.

Yuan Yao, Deming Ye, Peng Li, Xu Han, Yankai Lin, Zhenghao Liu, Zhiyuan Liu, Lixin Huang, Jie Zhou, and Maosong Sun. 2019. DocRED: A large-scale document-level relation extraction dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 764–777, Florence, Italy. Association for Computational Linguistics.

Bowen Yu, Zhenyu Zhang, Xiaobo Shu, Tingwen Liu, Yubin Wang, Bin Wang, and Sujian Li. 2020. Joint extraction of entities and relations based on a novel decomposition strategy. In ECAI 2020, pages 2282–2289. IOS Press.

Daojian Zeng, Haoran Zhang, and Qianying Liu. 2020. CopyMTL: Copy mechanism for joint extraction of entities and relations with multi-task learning. In The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, pages 9507–9514. AAAI Press.

Xiangrong Zeng, Shizhu He, Daojian Zeng, Kang Liu, Shengping Liu, and Jun Zhao. 2019. Learning the extraction order of multiple relational facts in a sentence with reinforcement learning. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 367–377, Hong Kong, China. Association for Computational Linguistics.

Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 506–514, Melbourne, Australia. Association for Computational Linguistics.

Ranran Haoran Zhang, Qianying Liu, Aysa Xuemo Fan, Heng Ji, Daojian Zeng, Fei Cheng, Daisuke Kawahara, and Sadao Kurohashi. 2020. Minimize exposure bias of Seq2Seq models in joint entity and relation extraction. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 236–246, Online. Association for Computational Linguistics.

Sheng Zhang, Cliff Wong, Naoto Usuyama, Sarthak Jain, Tristan Naumann, and Hoifung Poon. 2021a. Modular self-supervision for document-level relation extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5291–5302, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, and Yoav Artzi. 2021b. Revisiting few-sample BERT fine-tuning. In 9th International Conference on Learning Representations, ICLR 2021. OpenReview.net.

Deyu Zhou, Dayou Zhong, and Yulan He. 2014. Biomedical relation extraction: from binary to complex. Computational and Mathematical Methods in Medicine, 2014.

Wenxuan Zhou and Muhao Chen. 2021. An improved baseline for sentence-level relation extraction. ArXiv preprint, abs/2102.01373.

Wenxuan Zhou, Kevin Huang, Tengyu Ma, and Jing Huang. 2021. Document-level relation extraction with adaptive thresholding and localized context pooling. In AAAI.
A Constrained decoding

In Figure 4, we illustrate the rules used to constrain decoding. At each timestep t, given the prediction of the previous timestep t − 1, the predicted class probabilities of tokens that would generate a syntactically invalid target string are set to a tiny value. In practice, we found that a model rarely generates invalid target strings, so these constraints have little effect on final performance (see §3.2.3 and §5.3).
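A minimal sketch of this masking step, assuming a hypothetical valid_next_ids lookup that encodes the transition rules illustrated in Figure 4:

```python
import torch

def constrain_logits(logits, prev_token_id, valid_next_ids, tiny=-1e32):
    """logits: (vocab_size,) decoder scores for the current timestep t."""
    # Tokens that would produce a syntactically invalid target string keep a
    # tiny log probability, so they are effectively never selected.
    constrained = torch.full_like(logits, tiny)
    allowed = valid_next_ids(prev_token_id)  # rule table from Figure 4
    constrained[allowed] = logits[allowed]
    return constrained
```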
B Details about dataset annotations

In Table 6, we list which complexities (e.g. nested & discontinuous mentions, n-ary relations) are contained within each dataset used in our evaluations. We also report the fraction of relations in the test set that are inter-sentence. We consider a relation intra-sentence if any sentence in the document contains at least one mention of each entity in the relation, and inter-sentence otherwise. This produces an estimate that matches previously reported numbers for CDR (∼30%). In Yao et al. (2019), the fraction of inter-sentence relations in DocRED is reported as ∼40.7%. We can reproduce this value if we consider relations intra-sentence when all mentions of an entity exist within a single sentence, and inter-sentence otherwise.
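A sketch of this criterion, assuming entities are given as collections of mention strings and using naive substring matching for illustration:

```python
# A relation is intra-sentence if any single sentence contains at least one
# mention of every participating entity; otherwise it is inter-sentence.
# Data structures are illustrative assumptions, not the paper's code.

def is_intra_sentence(relation_entities, sentences):
    """relation_entities: list of entities, each a set of mention strings.
    sentences: list of sentence strings from the document."""
    for sentence in sentences:
        if all(any(m in sentence for m in entity) for entity in relation_entities):
            return True
    return False
```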
C Hypernym filtering

The CDR dataset is annotated for chemical-induced disease (CID) relationships between the most specific chemical and disease mentions in an abstract. Take the following example from the corpus:

    Carbamazepine-induced cardiac dysfunction [...] A patient with sinus bradycardia and atrioventricular block, induced by carbamazepine, prompted an extensive literature review of all previously reported cases.

In this example (PMID: 1728915), only (carbamazepine, bradycardia) and (carbamazepine, atrioventricular block) are labelled as true relations. The relation (carbamazepine, cardiac dysfunction), although true, is not labelled, as cardiac dysfunction is a hypernym of both bradycardia and atrioventricular block. This can harm evaluation performance, as the prediction (carbamazepine, cardiac dysfunction) will be considered a false positive. Therefore, we follow previous work (Gu et al., 2016b, 2017; Verga et al., 2018; Christopoulou et al., 2019; Zhou et al., 2021) by filtering negative relations like these, whose disease entities are hypernyms of a corresponding true relation's disease entity within the same abstract, according to the hierarchy in the MeSH vocabulary.10

10 https://meshb.nlm.nih.gov
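A sketch of this filtering step; mesh_ancestors, a lookup returning the MeSH ancestors (hypernyms) of a disease, is a hypothetical helper:

```python
# Drop a candidate negative (chemical, disease) pair if its disease is a MeSH
# hypernym of the disease in some gold relation with the same chemical.

def filter_hypernyms(negative_pairs, gold_pairs, mesh_ancestors):
    kept = []
    for chemical, disease in negative_pairs:
        is_hypernym = any(
            chem == chemical and disease in mesh_ancestors(gold_disease)
            for chem, gold_disease in gold_pairs
        )
        if not is_hypernym:
            kept.append((chemical, disease))
    return kept
```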
D Parsing the model's output

At test time, our model autoregressively generates an output, token-by-token, using beam search decoding (see §3.2). In order to extract the predicted relations from this output, we apply the following steps. First, predicted token ids are converted to a string. We use the decode()11 method of the HuggingFace Transformers tokenizer (Wolf et al., 2020) to do this. For example, after calling decode() on the predicted token ids, this string might look like:

    monoamine oxidase b ; maob @GENE@ parkinson's

11 https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizerBase.decode
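A sketch of how such a decoded string could be parsed back into relation tuples under the schema of §3.1; the special-token inventories are assumptions for illustration:

```python
# Parse a decoded target string back into relation tuples, following the
# linearization schema of §3.1. The token inventories below are illustrative.

ENTITY_TOKENS = {"@GENE@", "@DISEASE@", "@CHEMICAL@"}
RELATION_TOKENS = {"@GDA@", "@CID@"}

def parse(decoded):
    relations, entities, buffer = [], [], []
    for token in decoded.split():
        if token in ENTITY_TOKENS:
            # Everything since the last special token is one entity; its
            # coreferent mentions are separated by ";".
            mentions = tuple(m.strip() for m in " ".join(buffer).split(";"))
            entities.append((mentions, token))
            buffer = []
        elif token in RELATION_TOKENS:
            relations.append((tuple(entities), token))
            entities = []
        else:
            buffer.append(token)
    return relations

parse("estrogen receptor alpha ; ESR1 @GENE@ schizophrenia @DISEASE@ @GDA@")
# -> [(((("estrogen receptor alpha", "ESR1"), "@GENE@"),
#       (("schizophrenia",), "@DISEASE@")),
#      "@GDA@")]
```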
Figure 4: A diagram depicting syntactically valid predictions during decoding at each timestep t. The log probabilities of all other possible predictions are set to a tiny value to prevent the model from producing a syntactically invalid target string. BOS is the special beginning-of-sequence token, COPY denotes any token copied from the source text, and COREF is the special token used to separate coreferent mentions (i.e. ;). ENTITY is any special entity token (e.g. @GENE@) and RELATION any special relation token (e.g. @GDA@ for gene-disease association). n̂_ents denotes the number of entities predicted by the current timestep and n_ents the expected arity of the relation. The special end-of-sequence token (not shown) is always considered valid and its log probability is never modified.
Table 6: Evaluation datasets used in this paper with details about their annotations. Inter-sentence relations (%) are the fraction of relations in the test set that cross sentence boundaries. We consider a relation intra-sentence if any sentence in the document contains at least one mention of each entity in the relation, and inter-sentence otherwise. *This differs from the estimate in Yao et al. (2019); see Appendix B.

Corpus                    | Nested mentions? | Discontinuous mentions? | Coreferent mentions? | n-ary relations? | Inter-sentence relations (%)
CDR (Li et al., 2016b)    | Yes              | Yes                     | Yes                  | No               | 29.8
GDA (Wu et al., 2019)     | Yes              | No                      | Yes                  | No               | 15.6
DGM (Jia et al., 2019)    | No               | No                      | Yes                  | Yes              | 63.5
DocRED (Yao et al., 2019) | No               | No                      | Yes                  | No               | 12.5*
…using Optuna (Akiba et al., 2019). The tuning process selects the best hyperparameters according to the validation set micro F1-score using the TPE (Tree-structured Parzen Estimator) algorithm (Bergstra et al., 2011).12 During tuning, we use greedy decoding (i.e. a beam size of one). Once optimal hyperparameters are found, we tune the beam size (bs) and length penalty (α) using a grid search over the values bs = {2, ..., 10}, with a step size of 1, and α = {0.2, ..., 2.0}, with a step size of 0.2.

12 https://optuna.readthedocs.io/en/stable/
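A sketch of this tuning loop with Optuna's TPE sampler; the search space and the train_and_eval stub are illustrative assumptions, not the paper's actual configuration:

```python
import optuna

def train_and_eval(lr, dropout):
    # Placeholder: train seq2rel with these hyperparameters and return the
    # validation set micro F1-score. Stubbed here for illustration.
    return 0.0

def objective(trial):
    # Hypothetical search space; the paper's actual space may differ.
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    dropout = trial.suggest_float("dropout", 0.0, 0.5)
    return train_and_eval(lr=lr, dropout=dropout)

# TPE sampler, maximizing validation micro F1, as described above.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=50)
```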
G Baselines

This section contains detailed descriptions of all methods we compare to in this paper.

G.1 Pipeline-based methods

These methods are pipeline-based, assuming the entities are provided as input. Many of them construct a document-level graph using dependency parsing, heuristics, or structured attention, and then update node and edge representations using propagation.

• Christopoulou et al. (2019) propose EoG, an edge-orientated graph neural model. The nodes of the graph are constructed from mentions, entities, and sentences. Edges between nodes are initially constructed using heuristics. An iterative algorithm is then used to generate edges between nodes in the graph. Finally, a classification layer takes the representation of entity-to-entity edges as input to determine whether those entities express a relation or not. We compare to EoG in the pipeline-based setting on the CDR and GDA corpora.

• Nan et al. (2020) propose LSR (Latent Structure Refinement). A "node constructor" encodes each sentence of an input document and outputs contextual representations. Representations that correspond to mentions and tokens on the shortest dependency path in a sentence are extracted as nodes. A "dynamic reasoner" is then applied to induce a document-level graph based on the extracted nodes. The classifier uses the final representations of nodes for relation classification. We compare to LSR in the pipeline-based setting on the CDR and GDA corpora.

• Lai and Lu (2021) propose BERT-GT, which combines BERT with a graph transformer. Both BERT and the graph transformer accept the document text as input, but the graph transformer requires the neighbouring positions for each token, and the self-attention mechanism is replaced with a neighbour-attention mechanism. The hidden states of the two transformers are aggregated before classification. We compare to BERT-GT in the pipeline-based setting on the CDR and GDA corpora.

• Minh Tran et al. (2020) propose EoGANE (EoG model Augmented with Node Representations), which extends the edge-orientated model proposed by Christopoulou et al. (2019) …