Polyphonic sound event detection for highly dense birdsong scenes

Alberto García Arroba Parrilla, Dan Stowell

    Research output: Chapter in Book/Report/Conference proceeding · Conference contribution · Scientific · peer-review

    Abstract

    One hour before sunrise, one can experience the dawn chorus, in which birds of different species sing together. In this scenario, high levels of polyphony (i.e., the number of overlapping sound sources) are likely, resulting in a complex acoustic scene. Sound Event Detection (SED) tasks analyze acoustic scenes in order to identify the events that occur and their temporal boundaries. However, highly dense scenarios can be hard to process and have not been studied in depth. Here we show, using a Convolutional Recurrent Neural Network (CRNN), how polyphonic birdsong scenes can be detected at higher polyphony and how effectively this type of model copes with a very dense scene of up to 10 overlapping birds. We found that models trained with denser examples (i.e., higher polyphony) learn at a similar rate to models that used simpler samples in their training set. Additionally, the model trained with the densest samples maintained a consistent score across all polyphonies, while the model trained with the least dense samples degraded as polyphony increased. Our results demonstrate that highly dense acoustic scenarios can be handled with CRNNs. We expect this study to serve as a starting point for work on highly populated bird scenarios, such as the dawn chorus, and on other dense acoustic problems.
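    In SED, polyphony is commonly measured as the number of events active at the same instant, and models such as CRNNs are usually trained against frame-level multi-label targets (one binary activity value per class per frame). The sketch below illustrates both ideas under stated assumptions: annotations are hypothetical (onset_s, offset_s, label) triples and frames use a fixed hop in seconds. It is an illustration only, not the authors' data pipeline.

```python
import math
from typing import List, Tuple

# Hypothetical annotation format: (onset in seconds, offset in seconds, label)
Event = Tuple[float, float, str]


def max_polyphony(events: List[Event]) -> int:
    """Largest number of events active at the same instant (sweep line)."""
    boundaries = []
    for onset, offset, _ in events:
        boundaries.append((onset, 1))    # event starts
        boundaries.append((offset, -1))  # event ends
    # At equal times, offsets (-1) sort before onsets (+1),
    # so events that merely touch are not counted as overlapping.
    boundaries.sort()
    active = best = 0
    for _, delta in boundaries:
        active += delta
        best = max(best, active)
    return best


def frame_targets(events: List[Event], labels: List[str],
                  duration: float, hop: float) -> List[List[int]]:
    """Frame-by-class binary activity matrix, a typical multi-label SED target."""
    n_frames = int(round(duration / hop))
    index = {lab: i for i, lab in enumerate(labels)}
    targets = [[0] * len(labels) for _ in range(n_frames)]
    for onset, offset, lab in events:
        start = int(onset / hop)                          # first frame touched
        stop = min(n_frames, int(math.ceil(offset / hop)))  # exclusive end frame
        for f in range(start, stop):
            targets[f][index[lab]] = 1
    return targets


if __name__ == "__main__":
    events = [(0.0, 2.0, "robin"), (1.0, 3.0, "wren"),
              (1.5, 2.5, "blackbird"), (3.0, 4.0, "robin")]
    print(max_polyphony(events))
    for row in frame_targets(events, ["robin", "wren", "blackbird"], 4.0, 1.0):
        print(row)
```

    With these example events, three birds overlap around 1.5 to 2.0 s, so the maximum polyphony is 3. Note that the frame-level matrix records which classes are active, so two simultaneous singers of the same species would appear as a single active class in that representation.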
    Original language: English
    Title of host publication: Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2022)
    Pages: 146-150
    ISBN (Electronic): 978-952-03-2677-7
    Publication status: Published - 13 Jul 2022

    Keywords

    • cs.SD
    • eess.AS
    • Sound Event Detection
    • Polyphony
    • Birdsong