Paper of the week - Transformer-XL

https://arxiv.org/pdf/1901.02860.pdf

Transformers have recently had considerable success in Natural Language Processing, thanks in large part to their much lower computational cost compared to recurrent models. They replace recurrence with self-attention, which, however, is limited to a fixed-length context window. This new paper extends that context to a much longer history by reintroducing a recurrent transition between segments: hidden states from the previous segment are cached and reused as additional context, but without back-propagating through them, together with a relative positional encoding scheme. The results on language modeling benchmarks are impressive.
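To make the recurrence idea concrete, here is a minimal sketch (not the authors' code) of segment-level memory with a stop-gradient, using a single attention head and omitting the relative positional encoding that the paper adds on top. All names and shapes below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attend_with_memory(segment, memory, w_q, w_k, w_v):
    """Sketch of segment-level recurrence.

    segment: (seg_len, d_model) hidden states of the current segment
    memory:  (mem_len, d_model) cached hidden states of the previous segment
    w_q, w_k, w_v: (d_model, d_model) projection matrices (single head for brevity)
    """
    # Keys and values see the cached memory plus the current segment;
    # queries come from the current segment only. The .detach() is the
    # stop-gradient: no back-propagation through the cached states.
    context = torch.cat([memory.detach(), segment], dim=0)

    q = segment @ w_q                          # (seg_len, d_model)
    k = context @ w_k                          # (mem_len + seg_len, d_model)
    v = context @ w_v
    scores = q @ k.t() / k.shape[-1] ** 0.5    # scaled dot-product attention

    # Causal mask: position i attends to all of the memory and to
    # positions <= i within the current segment.
    seg_len, ctx_len = segment.shape[0], context.shape[0]
    mask = torch.tril(torch.ones(seg_len, ctx_len), diagonal=ctx_len - seg_len).bool()
    scores = scores.masked_fill(~mask, float("-inf"))

    out = F.softmax(scores, dim=-1) @ v

    # The memory passed to the next segment is the current segment's
    # states, again detached from the graph.
    new_memory = segment.detach()
    return out, new_memory
```

Because each layer's memory itself was computed with the previous segment's memory as context, the effective receptive field grows with depth, even though gradients only flow within the current segment.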

Written on January 23, 2019