State Spaces Aren’t Enough: Machine Translation Needs Attention
In collaboration with University of Amsterdam
Authors: Ali Vardasbi*, Telmo Pessoa Pires*, Robin M. Schmidt, Stephan Peitz
* = Equal Contributors
Structured State Spaces for Sequences (S4) is a recently proposed sequence model with successful applications in various tasks, e.g., vision, language modeling, and audio. Thanks to its mathematical formulation, it compresses its input to a single hidden state and is able to capture long-range dependencies while avoiding the need for an attention mechanism. In this work, we apply S4 to Machine Translation (MT) and evaluate several encoder-decoder variants on WMT'14 and WMT'16. In contrast with its success in language modeling, we find that S4 lags behind the Transformer by approximately 4 BLEU points and, counter-intuitively, struggles with long sentences. Finally, we show that this gap is caused by S4's inability to summarize the full source sentence in a single hidden state, and that the gap can be closed by introducing an attention mechanism.
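To make the "single hidden state" point concrete, below is a minimal NumPy sketch of a discretized linear state-space recurrence, the core mechanism underlying S4. This is an illustrative toy, not the paper's actual S4 implementation (which uses a structured, HiPPO-initialized state matrix and an equivalent convolutional form for training); the function and parameter names here are hypothetical. The key property it illustrates is that the entire input prefix must be compressed into one fixed-size state vector, which is exactly what the abstract argues breaks down for translation without attention.

```python
import numpy as np

def ssm_recurrence(u, A, B, C, D):
    """Run a (already discretized) linear state-space model over a 1-D input.

        x_k = A @ x_{k-1} + B * u_k   # single hidden state carries all history
        y_k = C @ x_k     + D * u_k
    """
    x = np.zeros(A.shape[0])          # the single fixed-size hidden state
    ys = []
    for u_k in u:                     # O(L) sequential scan, no attention
        x = A @ x + B * u_k           # compress the entire prefix into x
        ys.append(C @ x + D * u_k)
    return np.array(ys)

# Toy usage with random parameters (state size N=4, sequence length L=10).
rng = np.random.default_rng(0)
N, L = 4, 10
A = rng.standard_normal((N, N)) * 0.1
B, C = rng.standard_normal(N), rng.standard_normal(N)
D = rng.standard_normal()
y = ssm_recurrence(rng.standard_normal(L), A, B, C, D)
print(y.shape)  # (10,)
```

However long the input is, the decoder in a pure state-space encoder-decoder only ever sees the final `x`, whereas an attention mechanism lets every decoding step look back at all encoder states.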