Google Scholar

[PDF][PDF] Attention is all you need

A Vaswani - arXiv preprint arXiv:1706.03762, 2017 - user.phil.hhu.de

The dominant sequence transduction models are based on complex recurrent
orconvolutional neural networks in an encoder and decoder configuration. The best
performing such models also connect the encoder and decoder through an attentionm
echanisms. We propose a novel, simple network architecture based solely onan attention
mechanism, dispensing with recurrence and convolutions entirely. Experiments on two
machine translation tasks show these models to be superiorin quality while being more …

Save Cite Cited by 129569 Related articles Cached

Cite

Advanced search

Saved to My library

[PDF][PDF] Attention is all you need