This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
We start with a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported to GGUF, a file format optimized for efficient deployment and inference on edge devices via the GGML library (used by llama.cpp).
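To build intuition for why this workflow fits on a laptop, here is a minimal, dependency-free Python sketch (not the video's actual pipeline, which uses libraries like bitsandbytes and PEFT) of the two core ideas: 4-bit quantization maps each weight to one of 16 integer levels, and LoRA trains a small low-rank update B @ A instead of the full weight matrix W. All numbers below are illustrative.

```python
# Illustrative sketch of 4-bit quantization and LoRA's parameter savings.
# Pure Python; real pipelines use bitsandbytes, PEFT, and llama.cpp.

def quantize_4bit(weights):
    """Map each float weight to a small integer code (absmax quantization,
    symmetric int4-style range -7..7)."""
    scale = max(abs(w) for w in weights) / 7
    return [round(w / scale) for w in weights], scale

def dequantize(qweights, scale):
    """Recover approximate float weights from integer codes."""
    return [q * scale for q in qweights]

def lora_param_count(m, n, r):
    """LoRA: instead of updating a full m x n matrix W, learn two small
    matrices B (m x r) and A (r x n); inference uses W + B @ A."""
    full = m * n        # parameters touched by full fine-tuning
    lora = r * (m + n)  # parameters trained by LoRA
    return full, lora

if __name__ == "__main__":
    codes, scale = quantize_4bit([0.12, -0.7, 0.33])
    print(codes, dequantize(codes, scale))
    # One 4096 x 4096 attention projection at LoRA rank 16:
    print(lora_param_count(4096, 4096, 16))
```

The second print shows the point of LoRA: for a hypothetical 4096 x 4096 layer at rank 16, roughly 16.8M full-fine-tuning parameters shrink to about 131k trainable ones, which is what makes fine-tuning an 8B model feasible on modest hardware.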
Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.
Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.
Be sure to subscribe to stay up-to-date on the latest advances in AI.
My Links
Subscribe: / @scott_ingram
X.com: / scott4ai
GitHub: https://github.com/scott4ai
Hugging Face: https://huggingface.co/scott4ai
Links:
Colab Demo: https://colab.research.google.com/dri...
Dataset: https://github.com/scott4ai/llama3-8b...
Unsloth Colab: https://colab.research.google.com/dri...
Unsloth Wiki: https://github.com/unslothai/unsloth/...
Unsloth Web: https://unsloth.ai/