Sebastian Jaszczur – Fine-Grained Conditional Computation in Transformers | ML in PL 22

Published: 19 January 2025
on channel: ML in PL

Fine-Grained Conditional Computation in Transformers by Sebastian Jaszczur (IDEAS NCBR, University of Warsaw), 5 November 2022

Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that the use and study of the largest models becomes out of reach for many researchers and end-users. Conditional computation, or sparsity, may help alleviate those problems.

In my work "Sparse is Enough in Scaling Transformers", done at Google Research and published at NeurIPS 2021, we showed that sparse layers leveraging fine-grained conditional computation enable Transformers to scale efficiently and perform unbatched decoding much faster than a standard Transformer. Importantly, in contrast to standard Mixture-of-Experts methods, this fine-grained sparsity achieves the speed-up without decreasing model quality and with the same number of model parameters.
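To make "fine-grained" concrete, the following is a minimal NumPy sketch of the idea behind a sparse feed-forward layer: the hidden activation is split into small blocks and only one unit per block is kept nonzero, so at decode time only a fraction of the second weight matrix is needed. The function name, shapes, and the use of an argmax over the activations themselves (the paper uses a learned low-rank controller instead) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def sparse_ffn(x, W1, W2, block_size):
    # Dense pre-activation of the feed-forward layer: shape (d_ff,).
    h = x @ W1
    # Split the hidden units into contiguous blocks of `block_size`.
    blocks = h.reshape(-1, block_size)
    # Keep only the strongest unit in each block (a stand-in for the
    # learned controller described in the paper).
    mask = np.zeros_like(blocks)
    mask[np.arange(blocks.shape[0]), np.abs(blocks).argmax(axis=1)] = 1.0
    h_sparse = (blocks * mask).reshape(-1)
    # Only d_ff / block_size rows of W2 actually contribute here, which is
    # where the unbatched-decoding speed-up comes from.
    return np.maximum(h_sparse, 0.0) @ W2
```

Note that the parameter count is unchanged: W1 and W2 are full dense matrices; sparsity only restricts which units fire for a given input.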

My current work on this topic, done at IDEAS NCBR, focuses on adapting those conditional computation methods to the training setting, with the goal of speeding up training as well as inference. This can be achieved by a careful redesign of fine-grained conditional computation that uses only dense tensor operations, which are efficient on modern accelerators. While this is still ongoing work, preliminary results show promise for improving the training speed of Transformers on existing hardware without degrading the quality of the model's predictions.
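One way to express a discrete per-block choice using only dense tensor operations is to build the selection mask with dense matmuls and one-hot arithmetic, avoiding gather/scatter entirely; during training the hard choice can be combined with a differentiable softmax via the straight-through trick. The sketch below is a hypothetical illustration of that general pattern (the controller weight `Wc`, the temperature, and the function names are my assumptions, not the talk's actual design), written in NumPy for clarity even though NumPy itself has no autodiff.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def dense_controller_mask(x, Wc, block_size, temperature=1.0):
    # Controller logits, one per hidden unit, via a single dense matmul.
    logits = (x @ Wc).reshape(-1, block_size)
    soft = softmax(logits / temperature, axis=1)       # differentiable choice
    hard = np.eye(block_size)[soft.argmax(axis=1)]     # one-hot per block
    # Straight-through pattern: the forward pass uses the hard one-hot mask,
    # while in an autodiff framework the gradient would flow through `soft`.
    # Every operation here is dense, so it maps well onto accelerators.
    return (hard - soft) + soft
```

The returned mask can be multiplied elementwise into the blocked hidden activations, exactly as in a dense layer, so the accelerator never sees irregular sparse indexing.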

The talk was delivered during ML in PL Conference 2022 as part of the Contributed Talks track. The conference was organized by a non-profit NGO called the ML in PL Association.

ML in PL Association website: https://mlinpl.org/
ML in PL Conference 2022 website: https://conference2022.mlinpl.org/
ML In PL Conference 2023 website: https://conference2023.mlinpl.org/

---

The ML in PL Association was founded based on the experience of organizing the ML in PL Conference (formerly PL in ML). It is a non-profit organization devoted to fostering the machine learning community in Poland and Europe and to promoting a deep understanding of ML methods. Even though ML in PL is based in Poland, it seeks to provide opportunities for international cooperation.