Learning to Detect Novel and Fine-Grained Acoustic Sequences Using Pretrained Audio Representations
Authors: Vasudha Kowtha, Miquel Espi Marques, Jonathan Huang, Yichi Zhang, Carlos Avendano
This work investigates pre-trained audio representations for few-shot Sound Event Detection. We specifically address few-shot detection of novel acoustic sequences, i.e., sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pre-training suitable representations and methods for transferring them to our few-shot learning scenario. Our experiments evaluate the general-purpose utility of the pre-trained representations on AudioSet, and the utility of the proposed few-shot methods on tasks constructed from real-world acoustic sequences. We find that the pre-trained embeddings are well suited to the proposed task and enable multiple aspects of our few-shot framework.
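To make the setting concrete, the sketch below shows one common way to use a frozen pretrained encoder for few-shot detection of a novel sound event: average the embeddings of a handful of labeled target examples into a prototype, then score query frames by similarity to that prototype. This is a hedged illustration of the general approach, not the paper's method; `pretrained_encoder`, the prototype-based scoring, and the similarity threshold are all assumptions introduced here.

```python
# Minimal sketch (not the paper's exact method): few-shot detection of a novel
# sound event using frame embeddings from a frozen pretrained encoder.
# `pretrained_encoder` is a hypothetical callable mapping one audio frame
# to a fixed-dimensional embedding vector.
import numpy as np

def embed(frames, pretrained_encoder):
    """Embed each audio frame with the frozen pretrained model -> (T, D)."""
    return np.stack([pretrained_encoder(f) for f in frames])

def build_prototype(support_clips, pretrained_encoder):
    """Average embeddings of the few labeled target examples -> (D,)."""
    clip_embs = [embed(clip, pretrained_encoder).mean(axis=0) for clip in support_clips]
    return np.mean(clip_embs, axis=0)

def detect(query_frames, prototype, pretrained_encoder, threshold=0.7):
    """Score each query frame by cosine similarity to the prototype."""
    q = embed(query_frames, pretrained_encoder)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    scores = q @ p                  # (T,) per-frame similarity
    return scores > threshold       # frame-level detection mask
```

In this kind of setup the encoder stays fixed and only the prototype (and possibly the threshold) is derived from the few target examples, which is why the quality of the pretrained representation largely determines detection performance.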