2 Amazing Ideas in Latent Diffusion Models (LDM) w/ VAE, U-Net & CLIP: Generative AI

Published: October 20, 2024
Channel: Discover AI

Latent Diffusion Models (LDM; Rombach, Blattmann et al., 2022) run the diffusion process in a compressed latent space instead of pixel space, which lowers training cost and speeds up inference. The video also draws on a theoretical physicist's application of Markov chains (nonequilibrium thermodynamics, 2015) and on the U-Net architecture and its data augmentation (2015). Keywords: stable ai art, generative AI.
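The Markov-chain idea behind diffusion can be sketched in a few lines: a forward chain adds a little Gaussian noise at each step, and composing all steps gives a closed-form Gaussian. This is a minimal NumPy sketch, not the video's or the paper's code; the linear beta schedule (1e-4 to 0.02) and T = 100 are assumptions chosen for illustration.

```python
import numpy as np

def linear_beta_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    # ASSUMED linear noise schedule; the range mirrors common DDPM defaults, not a value from the video.
    return np.linspace(beta_start, beta_end, T)

def forward_markov_chain(x0: np.ndarray, betas: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # One Markov step per beta: q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    x = x0.astype(float)
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

def closed_form_moments(x0: np.ndarray, betas: np.ndarray) -> tuple[np.ndarray, float]:
    # Composing all steps gives q(x_T | x_0) = N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I),
    # where alpha_bar is the product of (1 - beta_t) over all steps.
    alpha_bar = float(np.prod(1.0 - betas))
    return np.sqrt(alpha_bar) * x0, 1.0 - alpha_bar

rng = np.random.default_rng(0)
betas = linear_beta_schedule(T=100)
x0 = np.full(100_000, 2.0)  # 100k independent copies of one scalar "pixel" with value 2.0
xT = forward_markov_chain(x0, betas, rng)
mean, var = closed_form_moments(x0, betas)
print(f"mean: simulated {xT.mean():.3f} vs closed form {mean[0]:.3f}")
print(f"var:  simulated {xT.var():.3f} vs closed form {var:.3f}")
```

The simulated and closed-form moments should agree up to sampling noise, which is why diffusion training can jump straight to any step t instead of running the chain step by step.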

LDM loosely decomposes perceptual compression and semantic compression in generative modeling: it first trims off pixel-level redundancy with an autoencoder, then manipulates and generates semantic concepts with a diffusion process on the learned latent space. Architecture-wise, a latent diffusion model consists of a Variational Autoencoder (VAE), a U-Net, and a text encoder such as CLIP's (or BERT) that conditions generation on a prompt.
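To make "trimming off pixel-level redundancy" concrete, here is a small arithmetic sketch of how much smaller the latent is than the image. It assumes the settings commonly associated with Stable Diffusion v1 (downsampling factor f = 8, 4 latent channels); the LDM paper itself experiments with several values of f, so treat these defaults as illustrative.

```python
def latent_shape(height: int, width: int, f: int = 8, latent_channels: int = 4) -> tuple[int, int, int]:
    # The autoencoder downsamples each spatial side by f (ASSUMED f = 8, 4 channels as in SD v1).
    if height % f or width % f:
        raise ValueError("image size must be divisible by the downsampling factor f")
    return (latent_channels, height // f, width // f)

def compression_ratio(height: int, width: int, image_channels: int = 3,
                      f: int = 8, latent_channels: int = 4) -> float:
    # Number of pixel values divided by number of latent values.
    c, h, w = latent_shape(height, width, f, latent_channels)
    return (image_channels * height * width) / (c * h * w)

print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the diffusion U-Net works on 48x fewer values than a pixel-space model would, which is where the lower training cost and faster inference come from.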

Remember: Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from CompVis, Stability AI and LAION.

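The key difference between standard diffusion and latent diffusion models: a standard diffusion model learns to generate image pixels directly, whereas in latent diffusion the model is trained to generate latent (compressed) representations of the images, which the autoencoder's decoder then maps back to pixels.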
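A single training step shows where the difference lies: the image is encoded first, and noise is added to the latent rather than to the pixels. This is a minimal NumPy sketch under loud assumptions: `toy_encoder` is a HYPOTHETICAL average-pooling stand-in for the learned VAE encoder, `zero_model` is a HYPOTHETICAL placeholder for the U-Net noise predictor, and the noise-prediction (epsilon) MSE objective follows the common DDPM formulation.

```python
import numpy as np

def toy_encoder(image: np.ndarray, f: int = 8) -> np.ndarray:
    # HYPOTHETICAL stand-in for the VAE encoder E: average-pools each f x f patch.
    # A real LDM uses a learned, regularized autoencoder (Stable Diffusion uses 4 latent channels).
    c, h, w = image.shape
    return image.reshape(c, h // f, f, w // f, f).mean(axis=(2, 4))

def latent_diffusion_loss(image, eps_model, alphas_cumprod, rng) -> float:
    z0 = toy_encoder(image)                                  # 1. compress the image: z_0 = E(x)
    t = int(rng.integers(len(alphas_cumprod)))               # 2. sample a timestep uniformly
    eps = rng.standard_normal(z0.shape)                      # 3. sample Gaussian noise
    a_bar = alphas_cumprod[t]
    z_t = np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * eps   # 4. noise the latent, not the pixels
    eps_pred = eps_model(z_t, t)                             # 5. the denoiser predicts the noise
    return float(np.mean((eps - eps_pred) ** 2))             # 6. MSE between true and predicted noise

def zero_model(z_t: np.ndarray, t: int) -> np.ndarray:
    # HYPOTHETICAL untrained placeholder for the U-Net: always predicts zero noise.
    return np.zeros_like(z_t)

rng = np.random.default_rng(0)
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # ASSUMED linear schedule, T = 1000
image = rng.uniform(-1.0, 1.0, size=(3, 512, 512))                # random "image" in [-1, 1]
print(toy_encoder(image).shape)  # (3, 64, 64)
loss = latent_diffusion_loss(image, zero_model, alphas_cumprod, rng)
print(round(loss, 3))
```

With the zero placeholder the loss sits near 1.0 (the variance of standard Gaussian noise); training a real U-Net drives it down. Note that every tensor the model touches is 64 x 64, never 512 x 512.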

There are three main components in latent diffusion models:

1. Variational AutoEncoder (VAE).
2. A U-Net (2015) that serves as the denoising network.
3. A text-encoder, e.g. CLIP's Text Encoder.
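The three components meet inside the U-Net's cross-attention layers: queries come from the latent feature map, while keys and values come from the text encoder's output. The following is a minimal single-head NumPy sketch of that mechanism, not the Stable Diffusion implementation. Random matrices stand in for learned projections and text embeddings, the 77 x 768 text shape assumes a CLIP ViT-L/14 text encoder as used in Stable Diffusion v1, and the 32-channel 8 x 8 feature map is arbitrary.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(latent_tokens, text_tokens, W_q, W_k, W_v):
    # Queries from the U-Net's latent features; keys and values from the text encoder.
    Q = latent_tokens @ W_q  # (n_latent, d)
    K = text_tokens @ W_k    # (n_text, d)
    V = text_tokens @ W_v    # (n_text, d)
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (n_latent, n_text)
    return weights @ V, weights  # each latent position becomes a mix of text-token values

rng = np.random.default_rng(0)
channels, height, width = 32, 8, 8  # toy U-Net feature map (arbitrary sizes)
n_text, d_text = 77, 768            # ASSUMED CLIP ViT-L/14 text output: 77 tokens x 768 dims
d = 32                              # attention width (arbitrary)

feature_map = rng.standard_normal((channels, height, width))
latent_tokens = feature_map.reshape(channels, height * width).T  # flatten grid into 64 tokens
text_tokens = rng.standard_normal((n_text, d_text))              # placeholder text embeddings
W_q = rng.standard_normal((channels, d)) / np.sqrt(channels)     # random stand-ins for learned weights
W_k = rng.standard_normal((d_text, d)) / np.sqrt(d_text)
W_v = rng.standard_normal((d_text, d)) / np.sqrt(d_text)

out, weights = cross_attention(latent_tokens, text_tokens, W_q, W_k, W_v)
print(out.shape, weights.shape)  # (64, 32) (64, 77)
```

Each row of `weights` is a probability distribution over the 77 text tokens, so every spatial position in the latent decides how much each word of the prompt should influence it.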


CompVis, explained:
Machine Vision and Learning research group at Ludwig Maximilian University of Munich (LMU Munich), formerly the Computer Vision Group at Heidelberg University

Notable links:

High-Resolution Image Synthesis with Latent Diffusion Models
https://arxiv.org/pdf/2112.10752.pdf

U-Net: Convolutional Networks for Biomedical Image Segmentation
https://arxiv.org/pdf/1505.04597.pdf

https://lilianweng.github.io/posts/20...
https://deepsense.ai/the-recent-rise-...

00:00 Latent Diffusion Model explained
00:37 Nonequilibrium Thermodynamics 2015
02:32 Generative Markov Chains
05:10 UNet Data Augmentation 2015
06:39 UNet Architecture
08:12 LDM 2022 pretrained Autoencoders w/ cross-attention layers
10:18 Schema of LDM - Latent Diffusion Model
13:07 Summary 5 Videos

#text-to-image
#stablediffusion
#ai
#generativeai