TY - GEN
T1 - Illicit Darkweb Classification via Natural-language Processing
T2 - 19th International Conference on Security and Cryptography, SECRYPT 2022
AU - Cascavilla, Giuseppe
AU - Catolino, Gemma
AU - Sangiovanni, Mirella
N1 - Publisher Copyright:
© 2021 by SCITEPRESS - Science and Technology Publications, Lda. All rights reserved.
PY - 2022
Y1 - 2022
AB - This work extends previous research on the classification of illegal activities in three steps. First, we created a heterogeneous dataset of 113,995 onion sites and dark marketplaces. Then, we compared pre-trained transferable models, i.e., ULMFit (Universal Language Model Fine-tuning), BERT (Bidirectional Encoder Representations from Transformers), and RoBERTa (Robustly optimized BERT approach), with a traditional text-classification approach based on LSTM (Long Short-Term Memory) neural networks. Finally, we developed two classification approaches for illegal activities: one for illicit content on the Dark Web and one for identifying specific types of drugs. Results show that BERT performed best, classifying general Dark Web content and drug types with accuracies of 96.08% and 91.98%, respectively.
KW - AI
KW - Bert
KW - DarkWeb
KW - LSTM
KW - Machine Learning
KW - Natural-language Processing
KW - RoBERTa
KW - ULMFit
UR - http://www.scopus.com/inward/record.url?scp=85178502961&partnerID=8YFLogxK
U2 - 10.5220/0011298600003283
DO - 10.5220/0011298600003283
M3 - Conference contribution
AN - SCOPUS:85178502961
SN - 9789897585906
T3 - Proceedings of the International Conference on Security and Cryptography
SP - 620
EP - 626
BT - SECRYPT 2022 - Proceedings of the 19th International Conference on Security and Cryptography
A2 - De Capitani di Vimercati, Sabrina
A2 - Samarati, Pierangela
PB - Science and Technology Publications, Lda
Y2 - 11 July 2022 through 13 July 2022
ER -