[PDF] Attention is all you need

A Vaswani - arXiv preprint arXiv:1706.03762, 2017 - user.phil.hhu.de
The dominant sequence transduction models are based on complex recurrent
or convolutional neural networks in an encoder and decoder configuration. The best
performing such models also connect the encoder and decoder through an attention
mechanism. We propose a novel, simple network architecture based solely on an attention
mechanism, dispensing with recurrence and convolutions entirely. Experiments on two
machine translation tasks show these models to be superior in quality while being more …