BERT: Pre-training of deep bidirectional transformers for language understanding

J Devlin - arXiv preprint arXiv:1810.04805, 2018