From RLHF with PPO/DPO to ORPO + How to build ORPO on Trainium/Neuron SDK

Published: October 16, 2024
on the channel: Generative AI on AWS

RSVP Webinar: https://www.eventbrite.com/e/webinar-...

Talk #0: Introduction
by Chris Fregly (Principal SA, Generative AI) and Antje Barth (Principal Developer Advocate, Generative AI)

Talk #1: Human Alignment with Reinforcement Learning from Human Feedback (RLHF) with both PPO and DPO
by Antje Barth (Principal Developer Advocate, Generative AI)

Proximal Policy Optimization (PPO): https://arxiv.org/pdf/1707.06347, 2017
Direct Preference Optimization (DPO): https://arxiv.org/pdf/2305.18290, 2023
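
For quick orientation, the DPO objective from the 2023 paper above trains the policy directly on preference pairs (y_w preferred over y_l), with no separately learned reward model; σ is the logistic function and β a temperature-like hyperparameter:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[\log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
      - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}
    \right)\right]
```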

Talk #2: From RLHF with PPO/DPO to ORPO + How to build ORPO on Trainium/Neuron SDK
by Hunter Carlisle (Senior SA, Annapurna ML)

ORPO is a recent fine-tuning technique that combines supervised fine-tuning (SFT) and preference alignment in a single training process (see the sketch after the paper link below).

Odds Ratio Preference Optimization (ORPO) paper: https://arxiv.org/pdf/2403.07691, 2024
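
As a minimal sketch of what that single process optimizes, here is the ORPO loss in PyTorch. The function name, argument names, and the λ default are illustrative assumptions, not code from the talk: ORPO adds an odds-ratio penalty on the rejected response to the ordinary SFT loss, so no reference model or separate alignment stage is needed.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_nll, lam=0.1):
    """Sketch of the ORPO objective (Hong et al., 2024).

    chosen_logps / rejected_logps: average per-token log-probabilities
    of the preferred and rejected responses under the model being
    trained (note: no frozen reference model is involved).
    sft_nll: standard next-token NLL on the preferred responses.
    lam: weight on the odds-ratio term (lambda in the paper; 0.1 is
         an illustrative default, not a recommendation).
    """
    # log odds(y|x) = log p - log(1 - p), computed in log space
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio term: -log sigmoid(log-odds of chosen minus rejected)
    ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)

    # Single objective: SFT loss plus the weighted preference penalty
    return (sft_nll + lam * ratio_loss).mean()
```

In practice this loss is available off the shelf, e.g. as ORPOTrainer in Hugging Face TRL, which is a natural starting point before moving the training loop onto Trainium via the Neuron SDK.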

Zoom link: https://us02web.zoom.us/j/82308186562

Related Links
Generative AI Free Course on DeepLearning.ai: https://bit.ly/gllm
O'Reilly Book: https://www.amazon.com/Generative-AWS...
Website: https://generativeaionaws.com
Meetup: https://meetup.generativeaionaws.com
GitHub Repo: https://github.com/generative-ai-on-aws/
YouTube: https://youtube.generativeaionaws.com