Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

William Fedus, Barret Zoph, Noam Shazeer

2021-01-01
moe, scaling

Abstract

This paper introduces the Switch Transformer, a sparsely activated model that simplifies Mixture-of-Experts (MoE) routing by sending each token to a single expert rather than several. This top-1 ("switch") routing reduces routing computation and cross-device communication, allowing the model to scale to over a trillion parameters while keeping the per-token compute cost roughly constant, and yields faster pre-training than dense T5 baselines at equal FLOPs. The empirical results reported here helped shape subsequent work on MoE and scaling.
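The core mechanism, top-1 routing, can be sketched as follows. This is an illustrative NumPy sketch, not the paper's implementation: the function and parameter names (`switch_route`, `w_router`, `expert_ffns`) are hypothetical, and details such as the load-balancing auxiliary loss and capacity factor are omitted.

```python
import numpy as np

def switch_route(x, w_router, expert_ffns):
    """Illustrative top-1 ("switch") routing over a batch of tokens.

    x:           (tokens, d_model) token representations
    w_router:    (d_model, num_experts) router weight matrix (hypothetical name)
    expert_ffns: one feed-forward callable per expert
    """
    logits = x @ w_router                          # (tokens, num_experts)
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)                 # each token picks ONE expert
    gate = probs[np.arange(len(x)), choice]        # probability of chosen expert
    out = np.empty_like(x)
    for e, ffn in enumerate(expert_ffns):
        mask = choice == e
        if mask.any():
            # Scale the expert output by the gate value so the router
            # receives a gradient through the (differentiable) softmax.
            out[mask] = gate[mask, None] * ffn(x[mask])
    return out
```

Because only one expert runs per token, the per-token compute stays constant as the number of experts (and thus total parameters) grows, which is the key to the paper's scaling argument.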