Word Segmentation as Unsupervised Constituency Parsing

Raquel G. Alhama*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Word identification from continuous input is typically viewed as a segmentation task. Experiments with human adults suggest that familiarity with syntactic structures in their native language also influences word identification in artificial languages; however, the relation between syntactic processing and word identification remains unclear. This work takes one step forward by exploring a radically different approach to word identification, in which segmentation of a continuous input is viewed as a process isomorphic to unsupervised constituency parsing. Besides formalizing the approach, this study reports simulations of human experiments with DIORA (Drozdov et al., 2020), a neural unsupervised constituency parser. Results show that this model can reproduce human behavior in word identification experiments, suggesting that this is a viable approach to studying word identification and its relation to syntactic processing.

Original language: English
Title of host publication: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Place of Publication: Dublin, Ireland
Pages: 4103–4112
Volume: 1
Publication status: Published - May 2022

Keywords

  • Word Identification
  • Segmentation Task
  • Artificial Languages
