On the difficulty of a distributional semantics of spoken language

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Abstract

The bulk of research in speech processing concerns supervised approaches to transcribing spoken language into text. In unsupervised learning, most work on speech has focused on discovering relatively low-level constructs such as phoneme inventories or word-like units. This contrasts with research on written language, where there is a large body of work on unsupervised induction of semantic representations of words, whole sentences, and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of spoken language semantics becomes possible if we abstract from the surface variability. We simulate this setting by using a dataset of utterances spoken by a realistic but uniform synthetic voice. We evaluate two simple unsupervised models which, with varying degrees of success, learn semantic representations of speech fragments. Finally, we suggest possible routes toward transferring our methods to the domain of unrestricted natural speech.
Original language: English
Title of host publication: Proceedings of the Society for Computation in Linguistics
Volume: 2
DOI: https://doi.org/10.7275/extq-7546
Publication status: Published - 2019
Event: Society for Computation in Linguistics - New York City, United States
Duration: 3 Jan 2019 → …
Internet address: https://blogs.umass.edu/scil/scil-2019/

Conference

Conference: Society for Computation in Linguistics
Country: United States
City: New York City
Period: 3/01/19 → …

Fingerprint

Semantics
Unsupervised learning
Speech processing

Keywords

  • cs.CL
  • cs.LG
  • cs.SD
  • eess.AS

Cite this

Chrupała, Grzegorz ; Gelderloos, Lieke ; Kádár, Ákos ; Alishahi, Afra. / On the difficulty of a distributional semantics of spoken language. Proceedings of the Society for Computation in Linguistics. Vol. 2 2019.
@inproceedings{e74ffb1e116448d28f8864a38f0d70a0,
title = "On the difficulty of a distributional semantics of spoken language",
abstract = "The bulk of research in the area of speech processing concerns itself with supervised approaches to transcribing spoken language into text. In the domain of unsupervised learning most work on speech has focused on discovering relatively low level constructs such as phoneme inventories or word-like units. This is in contrast to research on written language, where there is a large body of work on unsupervised induction of semantic representations of words and whole sentences and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of spoken language semantics becomes possible if we abstract from the surface variability. We simulate this setting by using a dataset of utterances spoken by a realistic but uniform synthetic voice. We evaluate two simple unsupervised models which, to varying degrees of success, learn semantic representations of speech fragments. Finally we suggest possible routes toward transferring our methods to the domain of unrestricted natural speech.",
keywords = "cs.CL, cs.LG, cs.SD, eess.AS",
author = "Grzegorz Chrupała and Lieke Gelderloos and {\'A}kos K{\'a}d{\'a}r and Afra Alishahi",
year = "2019",
doi = "10.7275/extq-7546",
language = "English",
volume = "2",
booktitle = "Proceedings of the Society for Computation in Linguistics",
}

Chrupała, G, Gelderloos, L, Kádár, Á & Alishahi, A 2019, On the difficulty of a distributional semantics of spoken language. in Proceedings of the Society for Computation in Linguistics. vol. 2, Society for Computation in Linguistics, New York City, United States, 3/01/19. https://doi.org/10.7275/extq-7546

On the difficulty of a distributional semantics of spoken language. / Chrupała, Grzegorz; Gelderloos, Lieke; Kádár, Ákos; Alishahi, Afra.

Proceedings of the Society for Computation in Linguistics. Vol. 2 2019.

TY - GEN
T1 - On the difficulty of a distributional semantics of spoken language
AU - Chrupała, Grzegorz
AU - Gelderloos, Lieke
AU - Kádár, Ákos
AU - Alishahi, Afra
PY - 2019
Y1 - 2019
AB - The bulk of research in the area of speech processing concerns itself with supervised approaches to transcribing spoken language into text. In the domain of unsupervised learning most work on speech has focused on discovering relatively low level constructs such as phoneme inventories or word-like units. This is in contrast to research on written language, where there is a large body of work on unsupervised induction of semantic representations of words and whole sentences and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of spoken language semantics becomes possible if we abstract from the surface variability. We simulate this setting by using a dataset of utterances spoken by a realistic but uniform synthetic voice. We evaluate two simple unsupervised models which, to varying degrees of success, learn semantic representations of speech fragments. Finally we suggest possible routes toward transferring our methods to the domain of unrestricted natural speech.
KW - cs.CL
KW - cs.LG
KW - cs.SD
KW - eess.AS
U2 - 10.7275/extq-7546
DO - 10.7275/extq-7546
M3 - Conference contribution
VL - 2
BT - Proceedings of the Society for Computation in Linguistics
ER -

Chrupała G, Gelderloos L, Kádár Á, Alishahi A. On the difficulty of a distributional semantics of spoken language. In Proceedings of the Society for Computation in Linguistics. Vol. 2. 2019 https://doi.org/10.7275/extq-7546