
We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable distance measure is trained to capture motif similarity and domain-specific semantic information such as rotation invariance. The learned distance then measures the semantic similarity between pairs of accelerometry time-series, which we use to train our foundation model to capture relative relationships across time and across subjects. The foundation model is trained on 1 billion segments from 87,376 participants and achieves state-of-the-art performance across multiple downstream tasks, including human activity recognition and gait metric regression. To our knowledge, we are the first to demonstrate the generalizability of a foundation model built on wearable motion data across distinct evaluation tasks.
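The learnable distance is the component that injects domain semantics. As a rough illustration only, the sketch below shows one plausible form of such a measure: a cross-attention reconstruction distance in the spirit of REBAR (referenced in Figure 1), where a candidate that reconstructs the anchor well is considered semantically close. The module structure, dimensions, and hyperparameters here are hypothetical assumptions, not the paper's implementation.

```python
# A simplified, hypothetical reconstruction-based distance: the anchor is
# reconstructed from a candidate via cross-attention, and the reconstruction
# error serves as the distance. Not the authors' architecture.
import torch
import torch.nn as nn

class CrossAttnDistance(nn.Module):
    def __init__(self, in_channels=3, dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Conv1d(in_channels, dim, kernel_size=5, padding=2)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.decode = nn.Linear(dim, in_channels)

    def forward(self, anchor, candidate):
        # anchor, candidate: (batch, time, channels) accelerometry segments
        q = self.embed(anchor.transpose(1, 2)).transpose(1, 2)      # queries from anchor
        kv = self.embed(candidate.transpose(1, 2)).transpose(1, 2)  # keys/values from candidate
        recon, _ = self.attn(q, kv, kv)
        recon = self.decode(recon)
        # Lower reconstruction error => candidate is semantically closer.
        return ((recon - anchor) ** 2).mean(dim=(1, 2))
```

Because the distance is itself learned, invariances such as rotation can in principle be encoded by training it on appropriately transformed pairs rather than hand-coded.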

† University of Illinois Urbana-Champaign (UIUC)

Figure 1: Each sequence color represents a different user’s time-series. RelCon draws candidates from both within- and between-user sequences and ranks them by relative similarity using a learnable distance function. It then iteratively applies a contrastive loss, selecting each candidate in turn as the positive and treating all more distant candidates as negatives. Because the full relative ranking is captured, this helps prevent false positives and negatives. Prior approaches define a single positive/negative set, risking semantic errors if it is misdefined. AugPred and SimCLR construct positive pairs via semantics-preserving augmentations and are thus resistant to false positives, whereas REBAR lacks a semantically constrained pair construction. Furthermore, each method’s candidate sampling varies, affecting how within- and between-user interactions are modeled.
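To make the rank-then-contrast procedure in the caption concrete, here is a minimal sketch of a relative contrastive loss: candidates are sorted by the learned distance, and each candidate in turn serves as the positive while all farther-ranked candidates serve as negatives. The function signature, cosine-similarity logits, and temperature value are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of a relative contrastive objective over a ranked
# candidate set. Assumes `distances` come from a learned distance
# function (smaller = more similar to the anchor).
import torch
import torch.nn.functional as F

def relative_contrastive_loss(anchor_emb, cand_embs, distances, temperature=0.1):
    """anchor_emb: (d,); cand_embs: (n, d); distances: (n,)."""
    order = torch.argsort(distances)  # rank candidates by learned similarity
    sims = F.cosine_similarity(anchor_emb.unsqueeze(0), cand_embs) / temperature
    loss = 0.0
    # Rank i is the positive; every candidate ranked farther away is a
    # negative for that term, so the full ordering is enforced.
    for i in range(len(order) - 1):
        pos = sims[order[i]]
        negs = sims[order[i + 1:]]
        logits = torch.cat([pos.unsqueeze(0), negs])
        target = torch.zeros(1, dtype=torch.long, device=logits.device)
        loss = loss + F.cross_entropy(logits.unsqueeze(0), target)
    return loss / (len(order) - 1)
```

Summing one contrastive term per rank is what distinguishes this from a single positive/negative split: a candidate is never forced to be strictly "positive" or "negative", only closer or farther than its neighbors in the ranking.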

Related readings and updates.

This paper was accepted at the Learning from Time Series for Health workshop at NeurIPS 2025.

Both speech and sensor time series encode information in the time and frequency domains, such as spectral power and waveform shapelets. We show that speech foundation models learn representations that generalize beyond the speech domain, achieving state-of-the-art performance on diverse time-series tasks from wearable sensors. Probes trained…
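The truncated sentence refers to probes trained on frozen representations. A common form of such an evaluation is a linear probe, sketched below under the assumption of a hypothetical frozen encoder `encode` that maps raw sensor segments to fixed-size embeddings; this is an illustration of the general technique, not the paper's exact setup.

```python
# Minimal linear-probe evaluation on frozen foundation-model embeddings.
# `encode` is a hypothetical stand-in for a frozen pretrained encoder.
import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_probe(encode, X_train, y_train, X_test, y_test):
    """encode: callable mapping a raw segment to a fixed-size embedding."""
    Z_train = np.stack([encode(x) for x in X_train])  # frozen features
    Z_test = np.stack([encode(x) for x in X_test])
    probe = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
    return probe.score(Z_test, y_test)  # downstream task accuracy
```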


Modern wearable devices can conveniently record various biosignals in the many different environments of daily living, enabling a rich view of individual health. However, not all biosignals are the same: high-fidelity biosignals, such as photoplethysmogram (PPG), contain more physiological information, but require optical sensors with a high power footprint. Alternatively, a lower-fidelity biosignal such as accelerometry has a significantly…
