View publication

This paper was accepted at the Mathematics of Modern Machine Learning (M3L) Workshop at NeurIPS 2024.

We investigate the unreasonable effectiveness of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions, by showing that CFG interacts differently with DDPM and DDIM, and neither sampler with CFG generates the gamma-powered distribution. Then, we clarify the behavior of CFG by showing that it is a kind of Predictor-Corrector (PC) method that alternates between denoising and sharpening, which we call Predictor-Corrector Guidance (PCG). We show that in the SDE limit, DDPM-CFG is equivalent to PCG with a DDIM predictor applied to the conditional distribution, and Langevin dynamics corrector applied to a gamma-powered distribution. While the standard PC corrector applies to the conditional distribution and improves sampling accuracy, our corrector sharpens the distribution.

Related readings and updates.

When Does a Predictor Know Its Own Loss?

Given a predictor and a loss function, how well can we predict the loss that the predictor will incur on an input? This is the problem of loss prediction, a key computational task associated with uncertainty estimation for a predictor. In a classification setting, a predictor will typically predict a distribution over labels and hence have its own estimate of the loss that it will incur, given by the entropy of the predicted distribution. Should…
See paper details

Controllable Music Production with Diffusion Models and Guidance Gradients

This paper was accepted at the Diffusion Models workshop at NeurIPS 2023. We demonstrate how conditional generation from diffusion models can be used to tackle a variety of realistic tasks in the production of music in 44.1kHz stereo audio with sampling-time guidance. The scenarios we consider include continuation, inpainting and regeneration of musical audio, the creation of smooth transitions between two different music tracks, and the transfer…
See paper details