Abstract
Following earlier work in multimodal distributional semantics, we present the first results of our efforts to build a perceptually grounded semantic model. Rather than using images, our models are built on sound data collected from freesound.org. We compare three models: a bag-of-words model based on user-provided tags, a model based on audio features using a ‘bag-of-audio-words’ approach, and a model that combines the two. Our results show that the models are able to capture semantic relatedness, with the tag-based model scoring higher than both the sound-based model and the combined model. However, semantic relatedness as an evaluation task is biased towards language-based models. Future work will focus on improving the sound-based model, finding ways to combine linguistic and acoustic information, and creating more reliable evaluation data.
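The paper itself does not include code, but a minimal sketch of a bag-of-audio-words pipeline of the kind described above might look as follows. The MFCC frame features, codebook size, averaging over tagged sounds, and the librosa/scikit-learn calls are illustrative assumptions, not the authors' actual setup.

```python
# Hypothetical bag-of-audio-words sketch (not the authors' code).
# Assumes per-frame MFCC features clustered into a k-means codebook of "audio words".
import numpy as np
import librosa
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

N_AUDIO_WORDS = 100  # illustrative codebook size


def mfcc_frames(path):
    """Per-frame MFCC descriptors for one sound file."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
    return mfcc.T  # shape: (n_frames, 13)


def build_codebook(paths, k=N_AUDIO_WORDS):
    """Cluster all MFCC frames from all sounds into k 'audio words'."""
    frames = np.vstack([mfcc_frames(p) for p in paths])
    return KMeans(n_clusters=k, random_state=0).fit(frames)


def bag_of_audio_words(path, codebook):
    """Normalized histogram of audio-word assignments for one sound."""
    labels = codebook.predict(mfcc_frames(path))
    hist = np.bincount(labels, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def word_vectors(tagged_sounds, codebook):
    """Average the audio-word histograms of all sounds carrying each tag."""
    return {
        tag: np.mean([bag_of_audio_words(p, codebook) for p in paths], axis=0)
        for tag, paths in tagged_sounds.items()
    }


# Usage (hypothetical file paths and tags):
# codebook = build_codebook(all_sound_paths)
# vecs = word_vectors({"dog": dog_paths, "bark": bark_paths}, codebook)
# relatedness = cosine_similarity([vecs["dog"]], [vecs["bark"]])[0, 0]
```

A tag-based model could be built analogously by replacing the audio-word histograms with counts of co-occurring user tags, and a combined model by concatenating or otherwise merging the two vector spaces.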
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the 11th International Conference on Computational Semantics |
| Place of Publication | London |
| Publisher | Association for Computational Linguistics |
| Pages | 70-75 |
| Number of pages | 6 |
| Publication status | Published - 1 Apr 2015 |
| Externally published | Yes |