If you'd like to support me financially (totally optional and voluntary), you can buy me a coffee here: https://www.buymeacoffee.com/rithesh
Mixtral 8x22B - things we know so far 🫡
*~141B total parameters, ~39B active per token (the naive 8 x 22B = 176B overcounts, since the experts share the attention and embedding weights)
*performance reportedly between GPT-4 and Claude Sonnet (according to their Discord)
*same/similar tokeniser as Mistral 7B
*65,536-token (64K) context length
*8 experts, 2 experts routed per token (see the top-2 routing sketch below)
*would require ~260 GB of VRAM in fp16, or ~73 GB with 4-bit bitsandbytes quantisation (back-of-envelope numbers below)
*uses RoPE (rotary position embeddings; minimal sketch below)
*32,000 vocab size
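
What "8 experts, 2 per token" means: every token is routed to just 2 of the 8 feed-forward experts, so only a fraction of the weights actually run per token. Here's a minimal PyTorch sketch of top-2 routing, with toy dimensions rather than the real config:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Toy top-2 mixture-of-experts layer (toy sizes, not Mixtral's real config)."""
    def __init__(self, dim=64, hidden=128, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):  # x: (tokens, dim)
        logits = self.gate(x)                           # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick 2 experts per token
        weights = F.softmax(weights, dim=-1)            # renormalise over the 2 picked
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                   # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = Top2MoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```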
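And RoPE in a nutshell: pairs of channels in each query/key vector get rotated by a position-dependent angle, so attention scores end up depending on relative position. A minimal sketch (rotate-half variant, toy dimensions; real models apply this per attention head):

```python
import torch

def rope(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (seq, head_dim)."""
    seq, dim = x.shape
    half = dim // 2
    # one frequency per channel pair, falling off geometrically
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq, dtype=torch.float32)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

q = torch.randn(8, 16)
print(rope(q).shape)  # torch.Size([8, 16])
```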
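The memory figures are simple back-of-envelope math, assuming the ~141B total parameter count:

```python
params = 141e9
GiB = 2**30
print(f"fp16  (2 bytes/param):   {params * 2.0 / GiB:.0f} GiB")  # ~263 GiB -> the "~260GB" figure
print(f"4-bit (0.5 bytes/param): {params * 0.5 / GiB:.0f} GiB")  # ~66 GiB; ~73GB once you add overhead
```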
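Loading it in 4-bit with bitsandbytes through transformers would look roughly like this (the repo id here is my guess, since the Hugging Face link below is truncated):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "mistral-community/Mixtral-8x22B-v0.1"  # assumed repo id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",  # still needs ~73 GB of GPU memory in total
)
```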
https://huggingface.co/mistral-commun...
https://twitter.com/MistralAI/status/1777946948617605384
https://twitter.com/MistralAI/status/1778020589225091453
https://www.linkedin.com/posts/philip...
If you like such content, please subscribe to the channel here:
https://www.youtube.com/c/RitheshSree...