Typoo or orthographic error? Automatic classification of typographic versus orthographic errors using keystroke log

Research output: Contribution to conference, Abstract, Other research output

Abstract

The automatic classification and correction of typing errors in texts have been well studied, e.g., [1], [2]. Yet relatively little work can be found on distinguishing typographic errors (slips of the finger) from orthographic errors. In writing research, these two error types should be treated separately, because they result from cognitively different actions and can strongly influence, for example, fluency analysis and revision counts [3], [4]. The distinction is hard to make from the final writing product alone. By analyzing typing errors during the writing process with keystroke logging, we gain information on both the production and the correction of typing errors, including their timing [5], [6]. Several studies have used keystroke logs to manually code typographic and orthographic errors, e.g., [7], [8]. In this project, we aim to distinguish between typographic and orthographic errors automatically. This presentation shows our first step: the characterization of typographic errors using keystroke logs from a transcription task. In a transcription task, the final text is given to the writer, so we assume that every revision corrects a typographic error. Data from 2,103 Dutch transcription tasks (1,717 unique participants) were collected using Inputlog [9]. Character-level confusion matrices (as in [10]) are constructed and timing patterns are reported. In total, 5,030 corrections were made, of which 59% were single substitutions, 5% single transpositions, 4% single insertions, and 1% single deletions. In 27% of the revisions more than one mutation was involved, and in 4% nothing changed. We invite attendees to discuss our future steps.
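To make the revision categories above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each revision is available as a pair of strings: the text as first typed and the text intended by the writer. The function names `classify_revision` and `confusion_counts` are hypothetical, as is the convention that an "insertion" means one extra character was typed and a "deletion" means one character was left out.

```python
from collections import Counter


def classify_revision(typed: str, intended: str) -> str:
    """Label the mutation separating the typed text from the intended text."""
    if typed == intended:
        return "no change"

    if len(typed) == len(intended):
        # Positions at which the two strings disagree.
        diffs = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
        if len(diffs) == 1:
            return "single substitution"
        swapped = (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and typed[diffs[0]] == intended[diffs[1]]
            and typed[diffs[1]] == intended[diffs[0]]
        )
        return "single transposition" if swapped else "multiple"

    if abs(len(typed) - len(intended)) == 1:
        typed_is_longer = len(typed) > len(intended)
        longer, shorter = (typed, intended) if typed_is_longer else (intended, typed)
        # Dropping exactly one character from the longer string must give the shorter one.
        for i in range(len(longer)):
            if longer[:i] + longer[i + 1:] == shorter:
                # Assumed convention: extra typed character = insertion, omitted = deletion.
                return "single insertion" if typed_is_longer else "single deletion"

    return "multiple"


def confusion_counts(pairs):
    """Count (intended, typed) character pairs over single substitutions only."""
    counts = Counter()
    for typed, intended in pairs:
        if classify_revision(typed, intended) == "single substitution":
            for a, b in zip(typed, intended):
                if a != b:
                    counts[(b, a)] += 1
    return counts


if __name__ == "__main__":
    examples = [("hat", "het"), ("teh", "the"), ("thee", "the"), ("th", "the")]
    for typed, intended in examples:
        print(typed, "->", intended, ":", classify_revision(typed, intended))
    print(confusion_counts(examples))
```

The counter returned by `confusion_counts` is a sparse form of a character-level confusion matrix: each key pairs the intended character with the character that was typed instead. Timing information, which the abstract also reports, is not modeled here.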
Original language: English
Publication status: Published - 2019
Event: The 29th Computational Linguistics in the Netherlands conference (CLIN29) - Groningen, Netherlands
Duration: 31 Jan 2019 → …

Conference

Conference: The 29th Computational Linguistics in the Netherlands conference (CLIN29)
Country: Netherlands
City: Groningen
Period: 31/01/19 → …


Cite this

Conijn, R., van Waes, L., & van Zaanen, M. (2019). Typoo or orthographic error? Automatic classification of typographic versus orthographic errors using keystroke log. Abstract from The 29th Computational Linguistics in the Netherlands conference (CLIN29), Groningen, Netherlands.
@conference{66b9319763734b7a85ed29b2ffcce3fa,
title = "Typoo or orthographic error? Automatic classification of typographic versus orthographic errors using keystroke log",
author = "Rianne Conijn and {van Waes}, Luuk and {van Zaanen}, Menno",
year = "2019",
language = "English",
note = "The 29th Computational Linguistics in the Netherlands conference (CLIN29) ; Conference date: 31-01-2019",

}
