
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
William Fedus, Barret Zoph, Noam Shazeer
2021-01-01
moescaling
Abstract
This paper introduces the Switch Transformer, a sparsely activated model that simplifies Mixture-of-Experts (MoE) routing by sending each token to a single expert, keeping the computational cost per token roughly constant while scaling parameter counts into the trillions. Its empirical results helped shape subsequent work on MoE and scaling.
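
The core mechanism is top-1 ("switch") routing: a learned router scores each token against every expert, and the token is processed only by the highest-scoring expert, with the output scaled by that expert's gate probability. The sketch below is a minimal NumPy illustration of this idea; the function and variable names (`switch_route`, `w_router`) are illustrative and not taken from the paper's codebase.

```python
import numpy as np

def switch_route(tokens, w_router, experts):
    """Top-1 (switch) routing: each token is sent to exactly one expert.

    tokens:   (n, d) array of token representations
    w_router: (d, num_experts) router weight matrix (illustrative name)
    experts:  list of callables, one per expert, each mapping (m, d) -> (m, d)
    """
    logits = tokens @ w_router                         # (n, num_experts)
    # Softmax over experts, shifted for numerical stability.
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    choice = probs.argmax(axis=-1)                     # top-1 expert per token
    gate = probs[np.arange(len(tokens)), choice]       # probability of chosen expert
    out = np.empty_like(tokens)
    for e, expert in enumerate(experts):
        mask = choice == e
        if mask.any():
            # Each expert only processes its assigned tokens;
            # the output is scaled by the gate probability.
            out[mask] = gate[mask, None] * expert(tokens[mask])
    return out, choice
```

Because only one expert runs per token, the FLOPs per token stay flat as more experts are added, which is what lets the parameter count grow without a matching growth in compute.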