Following earlier work in multimodal distributional semantics, we present the first results of our efforts to build a perceptually grounded semantic model. Rather than using images, our models are built on sound data collected from freesound.org. We compare three models: a bag-of-words model based on user-provided tags; a model based on audio features, using a ‘bag-of-audio-words’ approach; and a model that combines the two. Our results show that the models are able to capture semantic relatedness, with the tag-based model scoring higher than both the sound-based model and the combined model. However, evaluations of semantic relatedness are biased towards language-based models. Future work will focus on improving the sound-based model, finding better ways to combine linguistic and acoustic information, and creating more reliable evaluation data.
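The ‘bag-of-audio-words’ idea mentioned above can be sketched as follows: frame-level audio features are clustered into an acoustic vocabulary, and each sound is then represented as a histogram over its frames' nearest clusters. The code below is a minimal illustrative sketch, not the paper's actual pipeline; the feature dimensionality, clustering method, and function names are all assumptions for illustration (in practice one would use features such as MFCCs).

```python
import numpy as np

def build_vocabulary(frames, k, iters=20, seed=0):
    """Toy k-means over frame-level feature vectors: k 'audio words'."""
    rng = np.random.default_rng(seed)
    centroids = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest centroid
        dists = np.linalg.norm(frames[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned frames
        for j in range(k):
            members = frames[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def bag_of_audio_words(sound_frames, centroids):
    """Represent one sound as a normalised histogram of audio words."""
    dists = np.linalg.norm(sound_frames[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centroids)).astype(float)
    return hist / hist.sum()

# Toy usage with random "feature frames" standing in for real audio features.
rng = np.random.default_rng(1)
all_frames = rng.normal(size=(500, 13))          # 500 frames, 13-dim features
vocab = build_vocabulary(all_frames, k=8)        # acoustic vocabulary of 8 words
vec = bag_of_audio_words(all_frames[:60], vocab) # one sound's representation
print(vec.shape)
```

Two such histogram vectors can then be compared with a similarity measure such as cosine, which is how a sound-based model could score semantic relatedness between concepts.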
Title of host publication: Proceedings of the 11th International Conference on Computational Semantics
Place of publication: London
Publisher: Association for Computational Linguistics
Number of pages: 6
Publication status: Published - 1 Apr 2015