Improving sequence segmentation learning by predicting trigrams

A. van den Bosch, W. Daelemans

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

    Abstract

    Symbolic machine-learning classifiers are known to suffer from near-sightedness when performing sequence segmentation (chunking) tasks in natural language processing: without special architectural additions they are oblivious of the decisions they made earlier when making new ones. We introduce a new pointwise-prediction single-classifier method that predicts trigrams of class labels on the basis of windowed input sequences, and uses a simple voting mechanism to decide on the labels in the final output sequence. We apply the method to maximum-entropy, sparse winnow, and memory-based classifiers using three different sentence-level chunking tasks, and show that the method is able to boost generalization performance in most experiments, attaining error reductions of up to 51%. We compare and combine the method with two known alternative methods to combat near-sightedness, viz. a feedback-loop method and a stacking method, using the memory-based classifier. The combination with a feedback loop suffers from the label bias problem, while the combination with a stacking method produces the best overall results.
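
    The voting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each token's predicted trigram is a (previous, current, next) tuple of labels, so every token receives up to three votes, one from its own trigram and one from each neighbour. The function name `vote_labels`, the `"_"` padding symbol, and the tie-breaking rule (fall back to the token's own middle label when no label gets more than one vote) are assumptions made for this example; the abstract only states that a simple voting mechanism is used.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation: (label for previous token, current token, next token)
Trigram = Tuple[str, str, str]


def vote_labels(trigrams: Sequence[Trigram]) -> List[str]:
    """Collapse one predicted label trigram per token into one label per token."""
    n = len(trigrams)
    labels: List[str] = []
    for i in range(n):
        own = trigrams[i][1]  # the token's own prediction for itself
        votes = [own]
        if i > 0:
            # The left neighbour's trigram predicts this token as its "next" label.
            votes.append(trigrams[i - 1][2])
        if i + 1 < n:
            # The right neighbour's trigram predicts this token as its "previous" label.
            votes.append(trigrams[i + 1][0])
        label, count = Counter(votes).most_common(1)[0]
        # Assumed tie-break: without a majority, keep the token's own middle label.
        labels.append(label if count > 1 else own)
    return labels


# Toy example with IOB-style chunk labels; "_" pads the sequence boundaries.
predicted = [
    ("_", "B-NP", "I-NP"),
    ("B-NP", "O", "O"),
    ("I-NP", "O", "_"),
]
print(vote_labels(predicted))
```

    In this toy input the middle token's own prediction is `O`, but both neighbours predict `I-NP` for that position, so the vote overrides the isolated decision. This is the sense in which overlapping trigram predictions can compensate for a classifier that does not see its own earlier decisions.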
    Original language: English
    Title of host publication: Proceedings of the Ninth Conference on Natural Language Learning, CONLL-2005, June 29-30
    Editors: I. Dagan, D. Gildea
    Place of Publication: Ann Arbor, MI
    Publisher: ACL
    Pages: 80-87
    Number of pages: 8
    Publication status: Published - 2005

    Fingerprint

    Classifiers
    Labels
    Feedback
    Data storage equipment
    Learning systems
    Entropy
    Processing
    Experiments

    Cite this

    van den Bosch, A., & Daelemans, W. (2005). Improving sequence segmentation learning by predicting trigrams. In I. Dagan, & D. Gildea (Eds.), Proceedings of the Ninth Conference on Natural Language Learning, CONLL-2005, June 29-30 (pp. 80-87). Ann Arbor, MI: ACL.
    @inproceedings{5640229cf6b74240bb66bd1e545afe62,
    title = "Improving sequence segmentation learning by predicting trigrams",
    author = "{van den Bosch}, A. and W. Daelemans",
    note = "Pagination: 8",
    year = "2005",
    language = "English",
    pages = "80--87",
    editor = "I. Dagan and D. Gildea",
    booktitle = "Proceedings of the Ninth Conference on Natural Language Learning, CONLL-2005, June 29-30",
    publisher = "ACL",

    }
