Word Segmentation as Unsupervised Constituency Parsing

Raquel G. Alhama*

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > Scientific > peer-review

    Abstract

    Word identification from continuous input is typically viewed as a segmentation task. Experiments with human adults suggest that familiarity with syntactic structures in their native language also influences word identification in artificial languages; however, the relation between syntactic processing and word identification remains unclear. This work takes one step forward by exploring a radically different approach to word identification, in which segmentation of a continuous input is viewed as a process isomorphic to unsupervised constituency parsing. Besides formalizing the approach, this study reports simulations of human experiments with DIORA (Drozdov et al., 2020), a neural unsupervised constituency parser. Results show that this model can reproduce human behavior in word identification experiments, suggesting that this is a viable approach to study word identification and its relation to syntactic processing.
    Original language: English
    Title of host publication: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
    Place of Publication: Dublin, Ireland
    Pages: 4103–4112
    Volume: 1
    Publication status: Published - May 2022

    Keywords

    • Word Identification
    • Segmentation Task
    • Artificial Languages
