Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks
AuthorsBruno Di Giorgi, Matthias Mauch, Mark Levy
Downbeat Tracking with Tempo-Invariant Convolutional Neural Networks
AuthorsBruno Di Giorgi, Matthias Mauch, Mark Levy
The human ability to track musical downbeats is robust to changes in tempo, and it extends to tempi never previously encountered. We propose a deterministic time-warping operation that enables this skill in a convolutional neural network (CNN) by allowing the network to learn rhythmic patterns independently of tempo. Unlike conventional deep learning approaches, which learn rhythmic patterns at the tempi present in the training dataset, the patterns learned in our model are tempo-invariant, leading to better tempo generalisation and more efficient usage of the network capacity. We test the generalisation property on a synthetic dataset created by rendering the Groove MIDI Dataset using FluidSynth, split into a training set containing the original performances and a test set containing tempo-scaled versions rendered with different SoundFonts (test-time augmentation). The proposed model generalises nearly perfectly to unseen tempi (F-measure of 0.89 on both training and test sets), whereas a comparable conventional CNN achieves similar accuracy only for the training set (0.89) and drops to 0.54 on the test set. The generalisation advantage of the proposed model extends to real music, as shown by results on the GTZAN and Ballroom datasets.
AbstRaL: Augmenting LLMs’ Reasoning by Reinforcing Abstract Thinking
June 22, 2025research area Methods and Algorithms
Recent studies have shown that large language models (LLMs), especially smaller ones, often lack robustness in their reasoning. I.e., they tend to experience performance drops when faced with distribution shifts, such as changes to numerical or nominal variables, or insertions of distracting clauses. A possible strategy to address this involves generating synthetic data to further “instantiate” reasoning problems on potential variations. In…
Retrieval-Augmented Correction of Named Entity Speech Recognition Errors
September 11, 2024research area Knowledge Bases and Search, research area Speech and Natural Language Processing
In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant error rate for entity names which appear infrequently in their training data. In parallel to the rise of end-to-end ASR systems, large language models (LLMs) have proven to be a versatile tool for various natural language processing (NLP) tasks. In NLP tasks where a database…