
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

  • Anna Langedijk*
  • Hosein Mohebbi
  • Gabriele Sarti
  • Willem Zuidema
  • Jaap Jumelet

  *Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review


    Abstract

    In recent years, several interpretability methods have been proposed to interpret the inner workings of Transformer models at different levels of precision and complexity. In this work, we propose a simple but effective technique to analyze encoder-decoder Transformers. Our method, which we name DecoderLens, allows the decoder to cross-attend representations of intermediate encoder activations instead of using the default final encoder output. The method thus maps uninterpretable intermediate vector representations to human-interpretable sequences of words or symbols, shedding new light on the information flow in this popular but understudied class of models. We apply DecoderLens to question answering, logical reasoning, speech recognition and machine translation models, finding that simpler subtasks are solved with high precision by low and intermediate encoder layers.
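
    The core idea in the abstract, decoding from an intermediate encoder layer rather than only the final one, can be illustrated with a toy sketch. This is not the authors' implementation: the rotation "layers", the single-step `toy_decode`, the all-ones query, and the three-word vocabulary are hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import math
from typing import Callable, List

Vector = List[float]
States = List[Vector]            # one vector per source position
Layer = Callable[[States], States]


def encode_all_layers(layers: List[Layer], embeddings: States) -> List[States]:
    """Run the encoder and keep the hidden states after every layer.

    Index 0 holds the input embeddings; the last entry is the usual
    final encoder output.
    """
    states = [embeddings]
    for layer in layers:
        states.append(layer(states[-1]))
    return states


def cross_attend(query: Vector, memory: States) -> Vector:
    """Softmax dot-product attention of one query over the encoder states."""
    scores = [sum(q * m for q, m in zip(query, vec)) for vec in memory]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(memory[0])
    return [
        sum(w / total * vec[i] for w, vec in zip(weights, memory))
        for i in range(dim)
    ]


def toy_decode(memory: States, vocab: List[str]) -> str:
    """Hypothetical one-step decoder: attend, then pick the argmax token."""
    query = [1.0] * len(memory[0])  # assumed fixed query, for illustration
    context = cross_attend(query, memory)
    best = max(range(len(context)), key=context.__getitem__)
    return vocab[best]


def decoder_lens(layers: List[Layer], embeddings: States, vocab: List[str]) -> List[str]:
    """Decode once per encoder layer instead of only from the last layer."""
    return [toy_decode(state, vocab) for state in encode_all_layers(layers, embeddings)]


# Toy encoder: each "layer" rotates every vector by one position.
def rotate(states: States) -> States:
    return [vec[-1:] + vec[:-1] for vec in states]


embeddings = [[3.0, 1.0, 0.0], [2.0, 0.0, 1.0]]
vocab = ["a", "b", "c"]
outputs = decoder_lens([rotate, rotate], embeddings, vocab)
print(outputs)  # ['a', 'b', 'c']
```

    The last entry of `outputs` is what the unmodified toy model would produce; the earlier entries show what the same decoder generates when it reads earlier encoder states, which is the kind of layerwise comparison the abstract describes.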
    Original language: English
    Title of host publication: Findings of the Association for Computational Linguistics: NAACL 2024
    Editors: Kevin Duh, Helena Gomez, Steven Bethard
    Place of Publication: Mexico City, Mexico
    Publisher: Association for Computational Linguistics
    Pages: 4764–4780
    Number of pages: 17
    Publication status: Published - Jun 2024
