Nick Burch - Laptop-sized ML for Text, with Open Source

Published: 20 June 2023
Channel: Plain Schwarz

AI text models like GPT-3, ChatGPT, Bing AI and GitHub Copilot are getting a lot of buzz right now, both good and bad. Many of the training techniques are public, but the computational and data requirements mean most of us can't build our own. Using these big models typically involves paying for access or sharing your data. What if that's not an option?

Luckily, there are a number of open source language models out there, with pre-trained versions available to download! They won't let you compete with Google or OpenAI, but they're good enough for a number of real-world problems.

We'll start with a quick introduction to the main open ML-for-text systems like Word2vec, GloVe, ELMo and BERT, along with how they differ from traditional text relevancy like TF-IDF. Then, we'll discover how open source ML frameworks let us easily work with those techniques, and how pre-trained models let us quickly get up and running.
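
For readers unfamiliar with the TF-IDF baseline mentioned above, here is a minimal sketch in plain Python. It is not taken from the talk: the function name `tf_idf`, the whitespace tokenizer and the smoothed IDF formula are all assumptions chosen for illustration. It shows that TF-IDF scores exact word matches only, so "cat" and "kitten" are treated as unrelated terms, which is the gap that embedding models such as Word2vec or BERT aim to close.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumptions for this sketch: lowercase whitespace tokenization,
    relative term frequency, and a smoothed IDF of
    log((1 + N) / (1 + document_frequency)).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Count in how many documents each term appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    for doc_weights in tf_idf(["the cat sat", "the dog barked"]):
        print(doc_weights)
```

A term that occurs in every document (here "the") gets a weight of zero, while terms unique to one document get a positive weight.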

With our ML-for-text model running on our laptop (or hefty Docker container!), next it's time to see what kinds of problems we can solve! We'll look at embeddings for search, inference, semantic reasoning, prediction and more, all with (fairly) minimal coding. Finally, we'll see how we can improve the pre-trained models for specific use-cases with our own text.
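
As a rough illustration of "embeddings for search", the sketch below ranks documents by cosine similarity between a query vector and document vectors. It is not the speaker's code. The three-dimensional vectors are invented toy values standing in for what a pre-trained model would produce, and the names `cosine_similarity`, `search`, `TOY_DOC_VECS` and `TOY_QUERY_VEC` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec, doc_vecs, top_k=2):
    """Return the ids of the top_k documents most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]


# Invented toy vectors: a real system would get these from a pre-trained model.
TOY_DOC_VECS = {
    "kitten care guide": [0.9, 0.1, 0.0],
    "dog training tips": [0.7, 0.3, 0.1],
    "tax return checklist": [0.0, 0.1, 0.9],
}
TOY_QUERY_VEC = [0.95, 0.05, 0.0]  # pretend embedding of the query "cat"

if __name__ == "__main__":
    print(search(TOY_QUERY_VEC, TOY_DOC_VECS))
```

With these toy values the query lands closest to the kitten document even though the two share no words, which is the behaviour that distinguishes embedding search from exact-match term weighting.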

It may not run on your phone and it probably won't hallucinate incorrect answers, but there are still a lot of text problems we can solve just with open source on our laptops. And we'll share the code you need to do so!

Speaker: Nick Burch

More: https://2023.berlinbuzzwords.de/sessi...

Web: https://2023.berlinbuzzwords.de/
Fediverse: https://floss.social/@berlinbuzzwords
LinkedIn: / 13978964
Twitter: / berlinbuzzwords