Automatic detection of cyberbullying in social media text

Cynthia Van Hee*, Gilles Jacobs, Chris Emmery, Bart Desmet, Els Lefever, Ben Verhoeven, Guy De Pauw, Walter Daelemans, Veronique Hoste

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages, and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We use linear support vector machines exploiting a rich feature set and investigate which information sources contribute most to the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields F1 scores of 64% and 61% for English and Dutch, respectively, and considerably outperforms baseline systems.
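The abstract describes binary classification of posts with a linear support vector machine, evaluated by F1 on the positive (cyberbullying-related) class. As a rough illustration only, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method on unigram counts and scores it with positive-class F1. It is not the authors' system: the toy posts, the helper names (`featurize`, `train_pegasos`, `predict`, `f1_positive`), and the hyperparameter values are hypothetical, and the paper's rich feature set and hyperparameter optimisation are not reproduced.

```python
import random
from collections import Counter

# Hypothetical toy posts (+1 = cyberbullying-related, -1 = not), invented for illustration.
TRAIN = [
    ("you are so ugly", 1),
    ("nobody likes you loser", 1),
    ("you ugly loser", 1),
    ("see you at school", -1),
    ("nice photo", -1),
    ("thanks for the help", -1),
]


def featurize(text):
    """Unigram counts plus a constant bias feature (a stand-in for a richer feature set)."""
    feats = Counter(text.lower().split())
    feats["__bias__"] = 1
    return dict(feats)


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(data, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos: stochastic sub-gradient descent on the
    L2-regularised hinge loss  lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x))."""
    rng = random.Random(seed)
    examples = [(featurize(t), y) for t, y in data]
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(examples, len(examples)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            violated = y * dot(w, x) < 1  # margin check with the pre-update weights
            for f in w:  # shrink every weight (regulariser sub-gradient)
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss sub-gradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 (cyberbullying-related) or -1 from the sign of the decision value."""
    return 1 if dot(w, featurize(text)) >= 0 else -1


def f1_positive(y_true, y_pred, positive=1):
    """F1 score for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    weights = train_pegasos(TRAIN)
    held_out = ["what a loser", "nice to see you", "you are ugly"]  # hypothetical posts
    gold = [1, -1, 1]
    pred = [predict(weights, p) for p in held_out]
    print(pred, "F1 =", round(f1_positive(gold, pred), 2))
```

The pure-Python solver only keeps the example dependency-free; in practice a library solver such as scikit-learn's `LinearSVC` (a wrapper around LIBLINEAR) would be the usual choice, with the regularisation strength tuned on held-out data.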
Original language: English
Article number: 0203794
Pages (from-to): e0203794
Number of pages: 22
Journal: PLoS ONE
Volume: 13
Issue number: 10
Early online date: Jan 2018
DOIs: https://doi.org/10.1371/journal.pone.0203794
Publication status: Published - 8 Oct 2018

Keywords

  • BULLYING BEHAVIOR
  • EXPERIENCES
  • HEALTH
  • IMPACT
  • NETWORK
  • YOUNGSTERS

Cite this

Van Hee, C., Jacobs, G., Emmery, C., Desmet, B., Lefever, E., Verhoeven, B., ... Hoste, V. (2018). Automatic detection of cyberbullying in social media text. PLoS ONE, 13(10), e0203794. [0203794]. https://doi.org/10.1371/journal.pone.0203794
Van Hee, Cynthia ; Jacobs, Gilles ; Emmery, Chris ; Desmet, Bart ; Lefever, Els ; Verhoeven, Ben ; De Pauw, Guy ; Daelemans, Walter ; Hoste, Veronique. / Automatic detection of cyberbullying in social media text. In: PLoS ONE. 2018 ; Vol. 13, No. 10. pp. e0203794.
@article{edeb918a41b944a683c696b845346b6b,
title = "Automatic detection of cyberbullying in social media text",
keywords = "BULLYING BEHAVIOR, EXPERIENCES, HEALTH, IMPACT, NETWORK, YOUNGSTERS",
author = "{Van Hee}, Cynthia and Gilles Jacobs and Chris Emmery and Bart Desmet and Els Lefever and Ben Verhoeven and {De Pauw}, Guy and Walter Daelemans and Veronique Hoste",
year = "2018",
month = "10",
day = "8",
doi = "10.1371/journal.pone.0203794",
language = "English",
volume = "13",
pages = "e0203794",
journal = "PLoS ONE",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "10",

}

Van Hee, C, Jacobs, G, Emmery, C, Desmet, B, Lefever, E, Verhoeven, B, De Pauw, G, Daelemans, W & Hoste, V 2018, 'Automatic detection of cyberbullying in social media text', PLoS ONE, vol. 13, no. 10, 0203794, pp. e0203794. https://doi.org/10.1371/journal.pone.0203794

Automatic detection of cyberbullying in social media text. / Van Hee, Cynthia; Jacobs, Gilles; Emmery, Chris; Desmet, Bart; Lefever, Els; Verhoeven, Ben; De Pauw, Guy; Daelemans, Walter; Hoste, Veronique.

In: PLoS ONE, Vol. 13, No. 10, 0203794, 08.10.2018, p. e0203794.

TY - JOUR

T1 - Automatic detection of cyberbullying in social media text

AU - Van Hee, Cynthia

AU - Jacobs, Gilles

AU - Emmery, Chris

AU - Desmet, Bart

AU - Lefever, Els

AU - Verhoeven, Ben

AU - De Pauw, Guy

AU - Daelemans, Walter

AU - Hoste, Veronique

PY - 2018/10/8

Y1 - 2018/10/8

AB - While social media offer great communication opportunities, they also increase the vulnerability of young people to threatening situations online. Recent studies report that cyberbullying constitutes a growing problem among youngsters. Successful prevention depends on the adequate detection of potentially harmful messages and the information overload on the Web requires intelligent systems to identify potential risks automatically. The focus of this paper is on automatic cyberbullying detection in social media text by modelling posts written by bullies, victims, and bystanders of online bullying. We describe the collection and fine-grained annotation of a cyberbullying corpus for English and Dutch and perform a series of binary classification experiments to determine the feasibility of automatic cyberbullying detection. We make use of linear support vector machines exploiting a rich feature set and investigate which information sources contribute the most for the task. Experiments on a hold-out test set reveal promising results for the detection of cyberbullying-related posts. After optimisation of the hyperparameters, the classifier yields an F1 score of 64% and 61% for English and Dutch respectively, and considerably outperforms baseline systems.

KW - BULLYING BEHAVIOR

KW - EXPERIENCES

KW - HEALTH

KW - IMPACT

KW - NETWORK

KW - YOUNGSTERS

U2 - 10.1371/journal.pone.0203794

DO - 10.1371/journal.pone.0203794

M3 - Article

C2 - 30296299

VL - 13

SP - e0203794

JO - PLoS ONE

JF - PLoS ONE

SN - 1932-6203

IS - 10

M1 - 0203794

ER -

Van Hee C, Jacobs G, Emmery C, Desmet B, Lefever E, Verhoeven B et al. Automatic detection of cyberbullying in social media text. PLoS ONE. 2018 Oct 8;13(10):e0203794. 0203794. https://doi.org/10.1371/journal.pone.0203794