Deep perceptual embeddings for unlabelled animal sound events

Research output: Contribution to journal, Article, Scientific, peer-review

16 Citations (Scopus)


Evaluating sound similarity is a fundamental building block in acoustic perception and computational analysis. Traditional data-driven analyses of perceptual similarity are based on heuristics or simplified linear models, and are thus limited. Deep learning embeddings, often using triplet networks, have been useful in many fields. However, such networks are usually trained using large class-labelled datasets. Such labels are not always feasible to acquire. We explore data-driven neural embeddings for sound event representation when class labels are absent, instead utilising proxies of perceptual similarity judgements. Ultimately, our target is to create a perceptual embedding space that reflects animals' perception of sound. We create deep perceptual embeddings for bird sounds using triplet models. In order to deal with the challenging nature of triplet loss training with the lack of class-labelled data, we utilise multidimensional scaling (MDS) pretraining, attention pooling, and a triplet mining scheme. We also evaluate the advantage of triplet learning compared to learning a neural embedding from a model trained on MDS alone. Using computational proxies of similarity judgements, we demonstrate the feasibility of the method to develop perceptual models for a wide range of data based on behavioural judgements, helping us understand how animals perceive sounds.
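
For readers unfamiliar with triplet models, the following is a minimal sketch of a generic margin-based triplet loss. It is not the authors' implementation: the function name, the Euclidean distance, the margin value and the toy embeddings are all illustrative assumptions, since the abstract does not specify them. Each triplet pairs an anchor with an item judged more similar to it (positive) and one judged less similar (negative), which is how similarity judgements can stand in for class labels.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Mean hinge-style triplet loss over a batch of embedding vectors.

    All inputs have shape (batch, dim). The distance metric (Euclidean) and
    the default margin are assumptions made for illustration only.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-negative distance
    # A triplet contributes zero loss once the negative is at least
    # `margin` farther from the anchor than the positive is.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))

# Toy 2-D embeddings for two triplets (hypothetical values).
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[0.0, 1.0], [1.0, 1.0]])
negative = np.array([[3.0, 4.0], [1.0, 2.0]])

loss = triplet_margin_loss(anchor, positive, negative, margin=2.0)
print(loss)
```

In this toy batch the first triplet already satisfies the margin and contributes nothing, while the second does not and is penalised. Triplet mining schemes, in general, aim to select triplets of the second kind so that training keeps receiving a useful gradient.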

Original language: English
Pages (from-to): 2-11
Number of pages: 10
Journal: Journal of the Acoustical Society of America
Issue number: 1
Publication status: Published - 1 Jul 2021

