Improving sequence segmentation learning by predicting trigrams

A. van den Bosch, W. Daelemans

    Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Scientific; peer-review

    7 Citations (Scopus)
    73 Downloads (Pure)

    Abstract

    Symbolic machine-learning classifiers are known to suffer from near-sightedness when performing sequence segmentation (chunking) tasks in natural language processing: without special architectural additions they are oblivious of the decisions they made earlier when making new ones. We introduce a new pointwise-prediction single-classifier method that predicts trigrams of class labels on the basis of windowed input sequences, and uses a simple voting mechanism to decide on the labels in the final output sequence. We apply the method to maximum-entropy, sparse winnow, and memory-based classifiers using three different sentence-level chunking tasks, and show that the method is able to boost generalization performance in most experiments, attaining error reductions of up to 51%. We compare and combine the method with two known alternative methods to combat near-sightedness, viz. a feedback-loop method and a stacking method, using the memory-based classifier. The combination with a feedback loop suffers from the label bias problem, while the combination with a stacking method produces the best overall results.
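
    The trigram-and-voting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `to_trigrams`, `vote`, and the padding symbol `PAD` are hypothetical, the classifier itself is omitted, and the tie-break (falling back to the middle label when no two votes agree) is an assumption made here for illustration. Each position's predicted trigram covers the previous, current, and next label, so every output label receives up to three overlapping votes.

```python
from collections import Counter

PAD = "_"  # hypothetical padding symbol for positions outside the sequence


def to_trigrams(labels):
    """Turn a label sequence into one (previous, current, next) trigram per position."""
    padded = [PAD] + list(labels) + [PAD]
    return [tuple(padded[i:i + 3]) for i in range(len(labels))]


def vote(trigrams):
    """Decide each output label from the overlapping trigram predictions.

    Position i collects the middle element of its own trigram, the right
    element of trigram i-1, and the left element of trigram i+1.
    Assumption: when no label gets at least two votes, keep the middle label.
    """
    n = len(trigrams)
    output = []
    for i in range(n):
        votes = [trigrams[i][1]]
        if i > 0:
            votes.append(trigrams[i - 1][2])
        if i + 1 < n:
            votes.append(trigrams[i + 1][0])
        label, count = Counter(votes).most_common(1)[0]
        output.append(label if count >= 2 else trigrams[i][1])
    return output


# Usage: simulate a classifier that gets one trigram's middle label wrong.
gold = ["B", "I", "O", "B"]
predicted = to_trigrams(gold)
predicted[1] = ("B", "O", "O")  # middle label should have been "I"
repaired = vote(predicted)      # the two neighbouring trigrams outvote the error
```

    In this toy case the neighbouring trigrams both vote for the correct label at the second position, which is the kind of local error the voting step is meant to repair.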
    Original language: English
    Title of host publication: Proceedings of the Ninth Conference on Natural Language Learning, CONLL-2005, June 29-30
    Editors: I. Dagan, D. Gildea
    Place of Publication: Ann Arbor, MI
    Publisher: ACL
    Pages: 80-87
    Number of pages: 8
    Publication status: Published - 2005

  • Projects

    Optimization in machine learning of language

    Daelemans, W. M. P.

    1/01/04 to 1/01/06

    Project: Research project

    Memory models of language

    van den Bosch, A.

    1/07/01 to 1/07/06

    Project: Research project

    Cite this

    van den Bosch, A., & Daelemans, W. (2005). Improving sequence segmentation learning by predicting trigrams. In I. Dagan, & D. Gildea (Eds.), Proceedings of the Ninth Conference on Natural Language Learning, CONLL-2005, June 29-30 (pp. 80-87). ACL.