🦅 Eagle 7B: RNN outperforming Transformers RWKV

Published: 19 January 2025
Channel: Rithesh Sreenivasan

If you would like to support me financially (this is entirely optional and voluntary), you can buy me a coffee here: https://www.buymeacoffee.com/rithesh
Eagle 7B is a 7.52B-parameter model that:
• Is built on the RWKV-v5 architecture (a linear transformer with 10-100x+ lower inference cost)
• Ranks as the world’s greenest 7B model (per token)
• Was trained on 1.1 trillion tokens across 100+ languages
• Outperforms all 7B-class models in multilingual benchmarks
• Approaches the English-eval performance of Falcon (1.5T training tokens), LLaMA2 (2T), and Mistral (2T?)
• Trades blows with MPT-7B (1T) in English evals
• Does all this while being an “Attention-Free Transformer”
• Is a foundation model with only a very small instruct tune, so further fine-tuning is required for various use cases!
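
The “linear transformer” and “Attention-Free Transformer” points come down to RWKV replacing softmax attention with a recurrence whose per-token cost does not grow with context length. As a rough illustration only, here is a minimal sketch of the single-channel WKV operator as described in the earlier RWKV-v4 paper (“RWKV: Reinventing RNNs for the Transformer Era”), written two ways: a quadratic form that re-scans every earlier token, and an equivalent RNN form with a fixed-size state. Assumptions: one scalar channel, a given decay `w` and current-token bonus `u`, and no numerical stabilization. RWKV-v5 (Eagle) uses multi-headed, matrix-valued states, which this sketch does not model, and the function names `wkv_direct` / `wkv_recurrent` are hypothetical.

```python
import math
from typing import List


def wkv_direct(w: float, u: float, k: List[float], v: List[float]) -> List[float]:
    """Quadratic form: step t re-scans every earlier token (cost grows with t)."""
    out = []
    for t in range(len(k)):
        bonus = math.exp(u + k[t])           # extra weight u for the current token
        num, den = bonus * v[t], bonus
        for i in range(t):                   # earlier tokens, decayed by distance
            weight = math.exp(-(t - 1 - i) * w + k[i])
            num += weight * v[i]
            den += weight
        out.append(num / den)
    return out


def wkv_recurrent(w: float, u: float, k: List[float], v: List[float]) -> List[float]:
    """RNN form: a fixed-size state (a, b) is updated once per token."""
    a, b = 0.0, 0.0                          # running weighted sums of v and of weights
    decay = math.exp(-w)
    out = []
    for kt, vt in zip(k, v):
        bonus = math.exp(u + kt)
        out.append((a + bonus * vt) / (b + bonus))
        a = decay * a + math.exp(kt) * vt    # fold the current token into the state
        b = decay * b + math.exp(kt)
    return out


# Illustrative values only (not taken from any real model).
k = [0.1, -0.3, 0.5, 0.2]
v = [1.0, 2.0, -1.0, 0.5]
direct = wkv_direct(0.5, 0.3, k, v)
recurrent = wkv_recurrent(0.5, 0.3, k, v)
print(max(abs(x - y) for x, y in zip(direct, recurrent)))  # floating-point round-off only
```

The recurrent version keeps two numbers per channel regardless of sequence length, which is the property behind the “RNN” framing. Production RWKV kernels also subtract a running maximum from the exponents to avoid overflow, which this sketch omits.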

https://blog.rwkv.com/p/eagle-7b-soar...
https://wiki.rwkv.com/advance/archite...
https://huggingface.co/spaces/BlinkDL...
https://huggingface.co/RWKV/v5-Eagle-7B
https://huggingface.co/RWKV/HF_v5-Eag...
• RWKV: Reinventing RNNs for the Transf...
https://johanwind.github.io/2023/03/2...
https://johanwind.github.io/2023/03/2...


If you like this kind of content, please subscribe to the channel here:
https://www.youtube.com/c/RitheshSree...