Abstract
Native Language Identification (NLI) intends to classify an author's native language based on their writing in another language. Historically, the task has heavily relied on time-consuming linguistic feature engineering, and NLI transformer models have thus far failed to offer effective, practical alternatives. The current work shows input size is a limiting factor, and that classifiers trained using Big Bird embeddings outperform linguistic feature engineering models (for which we reproduce previous work) by a large margin on the Reddit-L2 dataset. Additionally, we provide further insight into input length dependencies, show consistent out-of-sample (Europe subreddit) and out-of-domain (TOEFL-11) performance, and qualitatively analyze the embedding space. Given the effectiveness and computational efficiency of this method, we believe it offers a promising avenue for future NLI work.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation |
| Subtitle of host publication | LREC-COLING 2024 |
| Editors | Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue |
| Pages | 2375–2382 |
| Number of pages | 8 |
| Publication status | Published - 20 May 2024 |
| Event | LREC-COLING 2024: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, Torino, Italy. Duration: 20 May 2024 → 25 May 2024. https://lrec-coling-2024.org/ |
Conference
| Conference | LREC-COLING 2024 |
|---|---|
| Country/Territory | Italy |
| City | Torino |
| Period | 20/05/24 → 25/05/24 |
| Internet address | https://lrec-coling-2024.org/ |
Keywords
- native language identification
- transformer embeddings
- stylometry
- text classification
- natural language processing
- computational linguistics
Fingerprint
Research topics of 'BigNLI: Native Language Identification with Big Bird Embeddings'.
Projects
- GRASP 👊: Gathering Redditors Against Stylometric Profiling
  Emmery, C. (Principal Investigator), Miotto, M. (Researcher), Kramp, S. (Researcher) & Kleinberg, B. (CoPI)
  16/01/23 → 28/07/23
  Project: Research project (finished)