Current Limitations in Cyberbullying Detection

on Evaluation Criteria, Reproducibility, and Data Scarcity

Chris Emmery*, Ben Verhoeven, Guy De Pauw, Gilles Jacobs, Cynthia Van Hee, Els Lefever, Bart Desmet, Véronique Hoste, Walter Daelemans

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific

Abstract

The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.
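Illustration of the cross-domain evaluation described in the abstract: train a classifier on one dataset and score it on a dataset from a different platform, comparing against an in-domain baseline. This is a minimal sketch only, not the authors' released framework; the file names, column names, and the TF-IDF plus logistic regression pipeline are assumptions chosen for brevity.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def domain_transfer_f1(train_df, test_df):
    """Train a bag-of-words classifier on one corpus, report macro F1 on another."""
    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), min_df=2),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(train_df["text"], train_df["label"])
    return f1_score(test_df["label"], clf.predict(test_df["text"]), average="macro")


# Hypothetical corpora from two platforms, each a CSV with "text" and "label" columns.
corpus_a = pd.read_csv("corpus_a.csv")
corpus_b = pd.read_csv("corpus_b.csv")

# In-domain baseline: hold out part of corpus A for testing.
a_train, a_test = train_test_split(
    corpus_a, test_size=0.2, random_state=42, stratify=corpus_a["label"]
)

print("in-domain    (A -> A):", domain_transfer_f1(a_train, a_test))
print("cross-domain (A -> B):", domain_transfer_f1(a_train, corpus_b))

A noticeably lower cross-domain score than the in-domain baseline is the kind of generalization gap the paper reports across publicly available cyberbullying corpora.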
Original language: English
Number of pages: 26
Journal: arXiv
Publication status: E-pub ahead of print - 25 Oct 2019

Keywords

  • cyberbullying detection
  • cybersecurity
  • machine learning
  • benchmarking
  • resource evaluation
  • cross-domain
  • reproducibility
  • crowdsourcing

Cite this

Emmery, C., Verhoeven, B., De Pauw, G., Jacobs, G., Van Hee, C., Lefever, E., ... Daelemans, W. (2019). Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity. arXiv.
Emmery, Chris; Verhoeven, Ben; De Pauw, Guy; Jacobs, Gilles; Van Hee, Cynthia; Lefever, Els; Desmet, Bart; Hoste, Véronique; Daelemans, Walter. / Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity. In: arXiv. 2019.
@article{4ef7db6da33640c6912c6eb52a8ae08c,
title = "Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity",
abstract = "The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.",
keywords = "cyberbullying detection, cybersecurity, machine learning, benchmarking, resource evaluation, cross-domain, reproducibility, crowdsourcing",
author = "Chris Emmery and Ben Verhoeven and {De Pauw}, Guy and Gilles Jacobs and {Van Hee}, Cynthia and Els Lefever and Bart Desmet and V{\'e}ronique Hoste and Walter Daelemans",
year = "2019",
month = "10",
day = "25",
language = "English",
journal = "arXiv",

}

Emmery, C, Verhoeven, B, De Pauw, G, Jacobs, G, Van Hee, C, Lefever, E, Desmet, B, Hoste, V & Daelemans, W 2019, 'Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity', arXiv.

Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity. / Emmery, Chris; Verhoeven, Ben; De Pauw, Guy; Jacobs, Gilles; Van Hee, Cynthia; Lefever, Els; Desmet, Bart; Hoste, Véronique; Daelemans, Walter.

In: arXiv, 25.10.2019.

Research output: Contribution to journal › Article › Scientific

TY - JOUR

T1 - Current Limitations in Cyberbullying Detection

T2 - on Evaluation Criteria, Reproducibility, and Data Scarcity

AU - Emmery, Chris

AU - Verhoeven, Ben

AU - De Pauw, Guy

AU - Jacobs, Gilles

AU - Van Hee, Cynthia

AU - Lefever, Els

AU - Desmet, Bart

AU - Hoste, Véronique

AU - Daelemans, Walter

PY - 2019/10/25

Y1 - 2019/10/25

N2 - The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.

AB - The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field.

KW - cyberbullying detection

KW - cybersecurity

KW - machine learning

KW - benchmarking

KW - resource evaluation

KW - cross-domain

KW - reproducibility

KW - crowdsourcing

M3 - Article

JO - arXiv

JF - arXiv

ER -

Emmery C, Verhoeven B, De Pauw G, Jacobs G, Van Hee C, Lefever E et al. Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity. arXiv. 2019 Oct 25.