Emergent Symbolic Structure in Health Foundation Models: Extraction, Alignment, and Cross-Modal Transfer
Authors: Gajendra Katuwal, Advait Koparkar, Salar Abbaspourazad, Anshuman Mishra, Sarvesh Kirthivasan
This paper was accepted at the Mechanistic Interpretability Workshop at ICML 2026.
Health foundation models (FMs) learn useful representations from wearable sensors, but interpreting what they encode and transferring that knowledge across modalities after training remain difficult. We present a post-training framework that decomposes frozen embeddings into interpretable directions, referred to as symbols, and uses these symbols to align the embedding spaces without retraining. We evaluate the framework on three FMs for photoplethysmography (PPG) and accelerometer data, independently pretrained on ∼20M minutes of unlabeled data from ∼172K participants and analyzed on a held-out cohort of 30K subjects. We find that the extracted symbols associate selectively with health conditions and physiological attributes, and that these associations are partially shared across modalities and architectures. Cross-modal transfer via symbols retains more than 95% of in-domain performance, is nearly symmetric across domain directions, and saturates with limited paired data. Together, these findings indicate that alignment recovers a shared low-dimensional subspace rich in physiological information. Overall, the results suggest that health FM embeddings contain an interpretable symbolic organization that is shared across modalities and supports cross-domain transfer without joint training.
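To make the idea of aligning two frozen embedding spaces through a small set of directions concrete, the following is a minimal sketch on synthetic data. It is not the authors' method: the abstract does not specify how symbols are extracted or how spaces are aligned, so this sketch assumes PCA directions as a stand-in for symbols and an ordinary least-squares linear map as the alignment. All names (`extract_directions`, `to_codes`, `fit_alignment`) and all sizes are hypothetical.

```python
import numpy as np


def extract_directions(emb, k):
    """Stand-in for symbol extraction (assumption): top-k principal directions."""
    mean = emb.mean(axis=0)
    _, _, vt = np.linalg.svd(emb - mean, full_matrices=False)
    return mean, vt[:k]  # shapes: (d,), (k, d)


def to_codes(emb, mean, directions):
    """Project centered embeddings onto the extracted directions."""
    return (emb - mean) @ directions.T  # shape: (n, k)


def fit_alignment(codes_src, codes_tgt):
    """Least-squares linear map from source codes to target codes (assumption)."""
    w, *_ = np.linalg.lstsq(codes_src, codes_tgt, rcond=None)
    return w  # shape: (k_src, k_tgt)


# Synthetic "modalities": two embedding spaces driven by one shared latent signal.
rng = np.random.default_rng(0)
n, k, d_a, d_b = 2000, 8, 64, 48
latent = rng.normal(size=(n, k))
emb_a = latent @ rng.normal(size=(k, d_a)) + 0.05 * rng.normal(size=(n, d_a))
emb_b = latent @ rng.normal(size=(k, d_b)) + 0.05 * rng.normal(size=(n, d_b))

# Decompose each frozen embedding space independently.
mean_a, dirs_a = extract_directions(emb_a, k)
mean_b, dirs_b = extract_directions(emb_b, k)
codes_a = to_codes(emb_a, mean_a, dirs_a)
codes_b = to_codes(emb_b, mean_b, dirs_b)

# Fit the alignment on a small paired subset, then evaluate on the rest.
n_pair = 200
w = fit_alignment(codes_a[:n_pair], codes_b[:n_pair])
target = codes_b[n_pair:]
resid = target - codes_a[n_pair:] @ w
r2 = 1.0 - (resid ** 2).sum() / ((target - target.mean(axis=0)) ** 2).sum()
print(f"held-out R^2 of aligned codes: {r2:.4f}")
```

In this toy setting the two spaces share a low-dimensional latent by construction, so a linear map fitted on few pairs transfers well. Whether real health FM embeddings behave this way is the empirical question the paper addresses.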
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
September 22, 2025. Research area: Computer Vision. Conference: NeurIPS.
Despite Contrastive Language-Image Pretraining (CLIP)'s remarkable capability to retrieve content across modalities, a substantial modality gap persists in its feature space. Intriguingly, we discover that off-the-shelf MLLMs (Multimodal Large Language Models) demonstrate powerful inherent modality alignment properties. While recent MLLM-based retrievers with unified architectures partially mitigate this gap, their reliance on coarse modality…
Promoting Cross-Modal Representations to Improve Multimodal Foundation Models for Physiological Signals
October 28, 2024. Research area: Methods and Algorithms. Conference: NeurIPS.
Many healthcare applications are inherently multimodal, involving several physiological signals. As sensors for these signals become more common, improving machine learning methods for multimodal healthcare data is crucial. Pretraining foundation models is a promising avenue for success. However, methods for developing foundation models in healthcare are still in early exploration and it is unclear which pretraining strategies are most effective…