Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Authors: Ran Liu, Ellen Zippi, Hadi Pour Ansari, Chris Sandino, Jingping Nie, Hanlin Goh, Erdrin Azemi, Ali Moin
Inspired by advances in foundation models for language-vision modeling, we explore the use of transformers and large-scale pretraining on biosignals. In this study, we aim to design a general-purpose architecture for biosignals that can be trained on multiple modalities and readily adapted to new modalities or tasks. The proposed model is designed with three key features: (i) A frequency-aware architecture that can efficiently identify local and global information from biosignals by leveraging global filters in the frequency space. (ii) A channel-independent design that shares the encoder's weights across different channels using either general-purpose or modality-specific filters. (iii) A modality-combining transformer capable of effectively combining an arbitrary number of modalities. We demonstrate the robustness of the proposed architecture on multiple biosignal datasets, showing that it not only performs better than single-modality models but also outperforms them in transfer-learning tasks.
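To make feature (i) concrete, the sketch below shows one common way to realize a learnable global filter in frequency space: transform the token sequence with an FFT, multiply each frequency bin by a learnable complex weight, and transform back. This is a minimal illustration under assumed shapes and a PyTorch implementation; the class and parameter names (e.g., GlobalFrequencyFilter) are hypothetical and not the authors' actual code.

```python
import torch
import torch.nn as nn


class GlobalFrequencyFilter(nn.Module):
    """Learnable global filter applied in the frequency domain.

    A minimal sketch of frequency-space filtering; names and shapes are
    illustrative assumptions, not the paper's implementation.
    """

    def __init__(self, seq_len: int, dim: int):
        super().__init__()
        # One complex weight (stored as two reals) per rFFT bin and feature.
        n_freq = seq_len // 2 + 1
        self.filter = nn.Parameter(torch.randn(n_freq, dim, 2) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) token embeddings from one channel.
        x_freq = torch.fft.rfft(x, dim=1)                   # to frequency space
        weight = torch.view_as_complex(self.filter)         # (n_freq, dim)
        x_freq = x_freq * weight                            # global mixing per bin
        return torch.fft.irfft(x_freq, n=x.size(1), dim=1)  # back to time domain


# Usage: the filter mixes information across the entire sequence at once,
# unlike a convolution's local receptive field.
x = torch.randn(8, 256, 64)            # batch of biosignal token embeddings
y = GlobalFrequencyFilter(256, 64)(x)  # same shape, globally filtered
```

Because the multiplication touches every frequency bin, a single such layer has a global receptive field over the sequence, which is one way to capture both local and global structure without attention over all tokens.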