Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking

S.V.M. Canisius, A. van den Bosch, W. Daelemans

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    We present a comparative case study of discrete and probabilistic sequence classification methods applied to two real-world entity chunking tasks in the medical domain. It is shown that a discrete version of maximum-entropy models that does not coordinate its decisions is outperformed by both architecturally-augmented discrete versions, and probabilistic versions combined with an inference step to select the best output label sequence. In addition, we show that among the various sequence-aware methods evaluated in this study, be they discrete or probabilistic, no significant performance difference could be observed. This suggests that probabilistic sequence labelling methods are not fundamentally more suited for the type of sequence-oriented entity chunking tasks evaluated in this study than augmented discrete approaches. Future research should point out whether this result generalises to more types of sequence tasks in natural language processing.
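    As an illustration only, and not taken from the paper, the contrast the abstract draws between a classifier that does not coordinate its decisions and a probabilistic classifier followed by an inference step might be sketched as below. The sketch assumes Viterbi decoding as the inference step, a BIO-style label set, and made-up probabilities; the abstract specifies none of these, and the function names are hypothetical.

```python
import math

LABELS = ["B", "I", "O"]  # assumed BIO chunk labels, not taken from the paper


def _log(p):
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def greedy_decode(emissions):
    """Uncoordinated decoding: pick the most probable label per token."""
    return [max(dist, key=dist.get) for dist in emissions]


def viterbi_decode(emissions, transitions, labels=LABELS):
    """Inference step: find the label sequence with the highest joint score."""
    # best[i][y] is the best log-score of any sequence ending in y at token i.
    best = [{y: _log(emissions[0][y]) for y in labels}]
    back = []
    for dist in emissions[1:]:
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + _log(transitions[p][y]))
            scores[y] = best[-1][prev] + _log(transitions[prev][y]) + _log(dist[y])
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Made-up per-token label distributions for a two-token input.
emissions = [
    {"B": 0.45, "I": 0.0, "O": 0.55},
    {"B": 0.1, "I": 0.6, "O": 0.3},
]
# Made-up transition probabilities; an "I" may never follow an "O".
transitions = {
    "B": {"B": 0.1, "I": 0.6, "O": 0.3},
    "I": {"B": 0.1, "I": 0.5, "O": 0.4},
    "O": {"B": 0.3, "I": 0.0, "O": 0.7},
}

print(greedy_decode(emissions))                # ['O', 'I'], an invalid chunk
print(viterbi_decode(emissions, transitions))  # ['B', 'I']
```

    In this toy input the per-token choice yields an "I" directly after an "O", whereas the sequence-level search prefers the well-formed "B I". The architecturally augmented discrete methods mentioned in the abstract are a different way of coordinating decisions and are not shown here.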
    Original language: English
    Title of host publication: Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006
    Editors: P.-Y. Schobbens, W. Vanhof, G. Schwanen
    Place of Publication: Namur, Belgium
    Publisher: Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie
    Pages: 75-82
    Number of pages: 8
    Publication status: Published - 2006

    Fingerprint

    Labeling
    Labels
    Classifiers
    Entropy
    Processing

    Cite this

    Canisius, S. V. M., van den Bosch, A., & Daelemans, W. (2006). Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking. In P-Y. Schobbens, W. Vanhof, & G. Schwanen (Eds.), Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006 (pp. 75-82). Namur, Belgium: Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie.
    Canisius, S.V.M. ; van den Bosch, A. ; Daelemans, W. / Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking. Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006. editor / P.-Y. Schobbens ; W. Vanhof ; G. Schwanen. Namur, Belgium : Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie, 2006. pp. 75-82
    @inproceedings{f84c79ed32ce4d6cbdfbdeedcdb75f97,
    title = "Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking",
    abstract = "We present a comparative case study of discrete and probabilistic sequence classification methods applied to two real-world entity chunking tasks in the medical domain. It is shown that a discrete version of maximum-entropy models that does not coordinate its decisions is outperformed by both architecturally-augmented discrete versions, and probabilistic versions combined with an inference step to select the best output label sequence. In addition, we show that among the various sequence-aware methods evaluated in this study, be they discrete or probabilistic, no significant performance difference could be observed. This suggests that probabilistic sequence labelling methods are not fundamentally more suited for the type of sequence-oriented entity chunking tasks evaluated in this study than augmented discrete approaches. Future research should point out whether this result generalises to more types of sequence tasks in natural language processing.",
    author = "S.V.M. Canisius and {van den Bosch}, A. and W. Daelemans",
    note = "Pagination: 8",
    year = "2006",
    language = "English",
    pages = "75--82",
    editor = "P.-Y. Schobbens and W. Vanhof and G. Schwanen",
    booktitle = "Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006",
    publisher = "Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie",

    }

    Canisius, SVM, van den Bosch, A & Daelemans, W 2006, Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking. in P-Y Schobbens, W Vanhof & G Schwanen (eds), Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006. Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie, Namur, Belgium, pp. 75-82.

    Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking. / Canisius, S.V.M.; van den Bosch, A.; Daelemans, W.

    Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006. ed. / P.-Y. Schobbens; W. Vanhof; G. Schwanen. Namur, Belgium : Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie, 2006. p. 75-82.

    TY - GEN

    T1 - Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking

    AU - Canisius, S.V.M.

    AU - van den Bosch, A.

    AU - Daelemans, W.

    N1 - Pagination: 8

    PY - 2006

    Y1 - 2006

    AB - We present a comparative case study of discrete and probabilistic sequence classification methods applied to two real-world entity chunking tasks in the medical domain. It is shown that a discrete version of maximum-entropy models that does not coordinate its decisions is outperformed by both architecturally-augmented discrete versions, and probabilistic versions combined with an inference step to select the best output label sequence. In addition, we show that among the various sequence-aware methods evaluated in this study, be they discrete or probabilistic, no significant performance difference could be observed. This suggests that probabilistic sequence labelling methods are not fundamentally more suited for the type of sequence-oriented entity chunking tasks evaluated in this study than augmented discrete approaches. Future research should point out whether this result generalises to more types of sequence tasks in natural language processing.

    M3 - Conference contribution

    SP - 75

    EP - 82

    BT - Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006

    A2 - Schobbens, P.-Y.

    A2 - Vanhof, W.

    A2 - Schwanen, G.

    PB - Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie

    CY - Namur, Belgium

    ER -

    Canisius SVM, van den Bosch A, Daelemans W. Discrete versus Probabilistic Sequence Classifiers for Domain-specific Entity Chunking. In Schobbens P-Y, Vanhof W, Schwanen G, editors, Proceedings of the Eighteenth Belgium-Netherlands Conference on Artificial Intelligence, BNAIC-2006. Namur, Belgium: Belgisch Nederlandse Ver. voor Kunstmatige Intelligentie. 2006. p. 75-82