Abstract
Establishing whether language models can use contextual information in a human-plausible way is important to ensure their trustworthiness in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, with current plausibility evaluations being practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated model translations to identify context-mediated predictions and highlight instances of (im)plausible context usage throughout generation.
| Original language | English |
|---|---|
| Number of pages | 29 |
| Publication status | Published - 16 Jan 2024 |
| Event | The International Conference on Learning Representations (ICLR), Messe Wien exhibition and congress center, Vienna, Austria. Duration: 7 May 2024 → 11 May 2024. https://iclr.cc/Conferences/2024 |
Conference
| Conference | The International Conference on Learning Representations (ICLR) |
|---|---|
| Country/Territory | Austria |
| City | Vienna |
| Period | 7/05/24 → 11/05/24 |
| Internet address | https://iclr.cc/Conferences/2024 |
Keywords
- neural machine translation
- large language models