
Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the “thriftiest”, i.e. such that the averaged cost $c(\mathbf{x}, T(\mathbf{x}))$ between $\mathbf{x}$ and its image $T(\mathbf{x})$ is as small as possible. Many computational approaches have been proposed to estimate such Monge maps when $c$ is the $\ell_2^2$ distance, e.g., using entropic maps (Pooladian and Niles-Weed, 2021), or neural networks (Makkuva et al., 2020; Korotin et al., 2020). We propose a new model for transport maps, built on a family of translation-invariant costs $c(\mathbf{x},\mathbf{y}):=h(\mathbf{x}-\mathbf{y})$, where $h:=\tfrac{1}{2}\|\cdot\|_2^2+\tau$ and $\tau$ is a regularizer. We propose a generalization of the entropic map suitable for $h$, and highlight a surprising link tying it with the Bregman centroids of the divergence $D_h$ generated by $h$, and the proximal operator of $\tau$. We show that choosing a sparsity-inducing norm for $\tau$ results in maps that apply Occam’s razor to transport, in the sense that the displacement vectors $\Delta(\mathbf{x}):= T(\mathbf{x})-\mathbf{x}$ they induce are sparse, with a sparsity pattern that varies depending on $\mathbf{x}$. We showcase the ability of our method to estimate meaningful OT maps for high-dimensional single-cell transcription data, in the $34{,}000$-dimensional space of gene counts for cells, without using dimensionality reduction, thus retaining the ability to interpret all displacements at the gene level.
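To make the sparsity mechanism concrete: when $\tau$ is taken to be a scaled $\ell_1$ norm, its proximal operator is the familiar soft-thresholding map, which zeroes out small coordinates of a displacement vector. The sketch below (an illustration of this one design choice, not the paper's full estimator) shows how such a proximal step produces sparse displacements:

```python
import numpy as np

def prox_l1(v, lam):
    """Proximal operator of tau = lam * ||.||_1, i.e. soft thresholding:
    each coordinate is shrunk toward zero by lam and clipped at zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# A candidate displacement vector: coordinates with magnitude below the
# threshold are set exactly to zero, so the map only moves a point along
# a few coordinates (a sparsity pattern that can differ per input x).
v = np.array([0.05, -1.3, 0.0, 2.4, -0.02])
delta = prox_l1(v, lam=0.1)  # -> [0.0, -1.2, 0.0, 2.3, 0.0]
```

In the single-cell setting described above, a sparse displacement means only a small, input-dependent subset of the 34,000 gene coordinates change, which is what makes each displacement interpretable at the gene level.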

Related readings and updates.

Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE. These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(\mathbf{x}_0, \mathbf{x}_1)$ and ensuring that the velocity field is aligned, on average, with $\mathbf{x}_1 - \mathbf{x}_0$ when evaluated along a segment linking $\mathbf{x}_0$ to $\mathbf{x}_1$. While these pairs are sampled…
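The training objective just described can be written as a Monte-Carlo regression loss. The following is a minimal sketch (the function names and the oracle field are illustrative, not from the paper): the velocity field is evaluated at the interpolant $\mathbf{x}_t=(1-t)\mathbf{x}_0+t\,\mathbf{x}_1$ and penalized for deviating from the displacement $\mathbf{x}_1-\mathbf{x}_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_loss(v_field, x0, x1, t):
    """Mean squared error between the velocity field evaluated on the
    segment x_t = (1-t) x0 + t x1 and the displacement x1 - x0."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0
    pred = v_field(xt, t)
    return np.mean(np.sum((pred - target) ** 2, axis=-1))

# Sanity check: an oracle that returns the true per-pair displacement
# attains zero loss; a trained network can only match it on average.
x0 = rng.normal(size=(8, 2))   # noise samples
x1 = rng.normal(size=(8, 2))   # target samples
t = rng.uniform(size=8)        # random times in [0, 1]
oracle = lambda xt, t: x1 - x0
loss = flow_matching_loss(oracle, x0, x1, t)  # exactly 0.0
```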


Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem aims for the most efficient way to map one distribution to the other. This efficiency is quantified by defining a cost function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance, $\ell^2_2(x,y)=\tfrac12\|x-y\|_2^2$. The benefits of using elastic costs, defined…
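For discrete measures, the default cost above is typically materialized as a pairwise cost matrix before solving the OT problem. A minimal sketch (the helper name is illustrative):

```python
import numpy as np

def sq_euclidean_cost(X, Y):
    """Pairwise squared-Euclidean cost c(x, y) = 0.5 * ||x - y||_2^2
    between rows of X (n, d) and rows of Y (m, d); returns an (n, m) matrix."""
    diff = X[:, None, :] - Y[None, :, :]
    return 0.5 * np.sum(diff ** 2, axis=-1)

X = np.array([[0.0, 0.0], [1.0, 0.0]])  # two source points
Y = np.array([[0.0, 1.0]])              # one target point
C = sq_euclidean_cost(X, Y)             # [[0.5], [1.0]]
```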
