How AI works - LLMs explained - How ChatGPT works

Published: October 30, 2024
on the channel: STARTUP HAKK

https://StartupHakk.com?v=hLuE_p7snDs

Welcome to StartupHakk - where we love training Software Developers. We take people with zero experience and train them to be ready to start as Fullstack Software Developers in just 3 months.

In talking with a lot of people, we've found that most don't understand how ChatGPT and other AI tools actually work. At the core of these tools are LLMs - Large Language Models. So let's talk about how LLMs work.

The inner workings of LLMs are complicated; even researchers don’t fully understand how the models work. But it’s helpful to have a basic understanding of what happens when you use ChatGPT.

Words are complex, and language models store each word in “word space”: a space with more dimensions than the human brain can envision. It helps to picture a three-dimensional space first, to get a sense of how many different points exist within it. Now add a fourth, fifth, and sixth dimension.

You can envision each word stored as a vector in this space. Language models store words in “clusters,” with similar words placed closer together. For instance, the word vectors closest to cat include dog, kitten, and pet, while words like walk and run form a cluster of their own.
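To make “closeness” concrete, here’s a minimal sketch with made-up 4-dimensional vectors (real models learn vectors with thousands of dimensions). Cosine similarity is a standard way to measure how close two word vectors are:

```python
import math

# Toy 4-dimensional word vectors, invented for illustration.
# Real models learn vectors with thousands of dimensions.
vectors = {
    "cat":    [0.90, 0.80, 0.10, 0.00],
    "kitten": [0.85, 0.75, 0.20, 0.05],
    "dog":    [0.80, 0.90, 0.15, 0.00],
    "banana": [0.00, 0.10, 0.90, 0.80],
}

def cosine_similarity(a, b):
    """How aligned two vectors are: 1.0 = same direction, ~0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(vectors["cat"], vectors["kitten"]))  # ~0.99, very close
print(cosine_similarity(vectors["cat"], vectors["banana"]))  # much lower
```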

The language model determines vector placement and clustering using a neural network that’s been trained on heaps and heaps of language: the network gets really good at knowing that dog and cat often occur together, that walk and run are similar in meaning, and that “Trump” often follows “Donald.” It then clusters those words together.

Vectors act as good building blocks for language models because they can capture subtle relationships between words.

If a language model learns something about a cat (for example, it sometimes goes to the vet), the same thing is likely to be true of a kitten or a dog. If a model learns something about the relationship between Paris and France (for example, they share a language), there’s a good chance that the same will be true for Berlin and Germany and for Rome and Italy.
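A classic way to picture this (popularized by word2vec-style models; the numbers below are invented so the arithmetic works out exactly) is that relationships become directions in word space. Subtracting France from Paris gives a “capital-of” direction, and adding that direction to Germany lands near Berlin:

```python
# Toy 3-dimensional vectors, invented so the analogy works out exactly;
# in real models the relationship holds only approximately.
paris   = [0.9, 0.6, 0.1]
france  = [0.8, 0.1, 0.1]
berlin  = [0.4, 0.7, 0.5]
germany = [0.3, 0.2, 0.5]

# The "capital-of" direction: Paris minus France.
capital_of = [p - f for p, f in zip(paris, france)]

# Add that same direction to Germany...
predicted = [g + c for g, c in zip(germany, capital_of)]

print(predicted)  # ~[0.4, 0.7, 0.5] -- Berlin's vector (up to float rounding)
```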

A newly created language model won’t be very good at this. It might struggle to finish the sentence, “I like my coffee with cream and…” But with more and more training, the model improves. Eventually, it becomes good at predicting “sugar” as the word that should finish the sentence. For humans, this comes intuitively; for language models, it happens with a lot of math in the background.
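Under the hood, “predicting the next word” means the model assigns a probability to every word in its vocabulary, and the most likely word wins. Here’s a toy sketch with invented probabilities:

```python
# A language model's real output is a probability for every word in its
# vocabulary. These numbers are invented for illustration.
next_word_probs = {
    "sugar": 0.62,
    "milk": 0.21,
    "honey": 0.09,
    "cement": 0.0001,
}

prompt = "I like my coffee with cream and"
prediction = max(next_word_probs, key=next_word_probs.get)
print(prompt, prediction)  # I like my coffee with cream and sugar
```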

The key is scale. GPT-3 was trained on nearly the entire internet: books, articles, Wikipedia, about 500 billion words (tokens, strictly speaking) in all. By comparison, a human child has absorbed about 100 million words by age 10. That’s a 5,000x multiple on the language digested by a 10-year-old.

What’s complex about language is that it’s full of nuances. Computers were originally built for computation (hence the name), which is more straightforward: 2 + 2 always equals 4. Language is messier. Consider these ambiguities:

In “the customer asked the mechanic to fix his car,” does “his” refer to the customer or the mechanic?

In “the professor urged the student to do her homework,” does “her” refer to the professor or the student?

In “fruit flies like a banana,” is “flies” a verb (referring to fruit soaring across the sky) or a noun (referring to banana-loving insects)?

For that, we need to talk about transformer models.

A transformer is a type of neural network architecture. You can think of it as a series of layers. Each layer synthesizes some information from the input and stores it in new vectors, called a hidden state. Those new vectors are then passed to the next layer in the stack.
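Here’s a heavily simplified sketch of that idea. Each “layer” blends every word’s vector with every other word’s vector, weighted by similarity; that weighted blending is the core of the attention mechanism transformers use (real layers also apply learned weight matrices, which are omitted here, and all numbers below are invented):

```python
import math

def attention_layer(states):
    """One heavily simplified transformer layer: each position's new
    hidden state is a blend of every position's vector, weighted by
    similarity (dot product). Real layers also apply learned weight
    matrices, which are omitted here."""
    new_states = []
    for query in states:
        # Score this position against every position, then softmax.
        scores = [sum(q * k for q, k in zip(query, key)) for key in states]
        exps = [math.exp(s) for s in scores]
        weights = [e / sum(exps) for e in exps]
        # A weighted blend of all vectors becomes this position's new vector.
        dim = len(query)
        new_states.append(
            [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]
        )
    return new_states

# Toy 3-dimensional hidden states, one per input word (invented numbers).
hidden = [
    [0.2, 0.7, 0.1],  # "the"
    [0.9, 0.1, 0.4],  # "customer"
    [0.8, 0.2, 0.5],  # "mechanic"
]

# Stack layers: each layer's output is the next layer's input.
for _ in range(3):  # GPT-3's largest version stacks 96 of these
    hidden = attention_layer(hidden)
print(hidden)
```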

Of course, the real thing is much more complex. The most powerful version of GPT-3 has 96 layers. And, naturally, word vectors are much more complicated than we can show here. The most powerful version of GPT-3 uses word vectors with 12,288 numbers. (Again, things have improved fast: GPT-1, released in 2018, had 12 layers and 768 numbers in its word vectors.)

That’s the crux of how LLMs work: they translate words into (lots of) numbers, then transform those numbers at each layer of the stack to capture context and meaning.

LLMs are much more complex than this short summary lets on. The goal here was to capture a high-level understanding of how LLMs train on vast amounts of data, then perform a series of complex, rapid calculations to translate inputs into outputs. For more in-depth analysis, check out this piece from Timothy B. Lee and Sean Trott, this piece from Madhumita Murgia, or NVIDIA’s deep-dive into transformer models.

#coding #codingbootcamp #softwaredeveloper #CodeYourFuture