TY - JOUR
T1 - Predicting author profiles from online abuse directed at public figures
AU - van der Vegt, I.
AU - Kleinberg, B.
AU - Gill, P.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - The problem of online threats and abuse directed at public figures could potentially be mitigated with a computational approach, where sources of abusive language are better understood or identified through author profiling. However, abusive language constitutes a specific domain of language that is untested on whether differences emerge based on personality, age, or gender of text authors. The present study presents a unique data set of 789 abusive messages directed at politicians. It examines statistical relationships between the demographics of text authors and (abusive) language, then uses a machine learning approach to predict personality, age, and gender based on language in the texts. Results showed that (a) personality traits could be determined within 10% of their actual value, (b) age was determined with an error margin of 10 years, and (c) gender was classified correctly in 70% of the cases. Even though we found statistically significant relationships between language use and demographics, prediction performance was poor when compared to previous research on author profiling. Therefore, we suggest that further research is needed before author profiling systems can be of significant value within the context of abusive language and threat assessment.
U2 - 10.1037/tam0000172
DO - 10.1037/tam0000172
M3 - Article
SN - 2169-4850
VL - 9
SP - 17
EP - 32
JO - Journal of Threat Assessment and Management
JF - Journal of Threat Assessment and Management
IS - 1
ER -