TY - GEN
T1 - "When the Code becomes a Crime Scene" Towards Dark Web Threat Intelligence with Software Quality Metrics
AU - Cascavilla, G.
AU - Catolino, G.
AU - Ebert, F.
AU - Tamburri, D. A.
AU - Van Den Heuvel, W. J.
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - The growth of illegal online activities on the so-called dark web (that is, the hidden collective of internet sites accessible only through specialized web browsers) has challenged law enforcement agencies in recent years, with only sparse research efforts to help. For example, research has been devoted to supporting law enforcement by employing Natural Language Processing (NLP) to detect illegal activities on the dark web and build models for their classification. However, current approaches rely strongly on the linguistic characteristics used to train the models, e.g., language semantics, which threatens their generalizability. To overcome this limitation, we tackle the problem of predicting illegal and criminal activities on the dark web, a process defined as threat intelligence, from a complementary perspective, that of dark web code maintenance and evolution, and propose a novel approach that uses software quality metrics and dark website appearance parameters instead of linguistic characteristics. We performed a preliminary empirical study on 10,367 web pages and collected more than 40 code metrics and website parameters using SonarQube. Results show an accuracy of up to 82% for predicting the three types of illegal activities (i.e., suspicious, normal, and unknown) and 66% for detecting 26 specific illegal activities, such as drugs or weapons trafficking. We believe our results can influence current trends in detecting illegal activities on the dark web and open a novel research avenue that approaches this problem from a software maintenance and evolution perspective.
AB - The growth of illegal online activities on the so-called dark web (that is, the hidden collective of internet sites accessible only through specialized web browsers) has challenged law enforcement agencies in recent years, with only sparse research efforts to help. For example, research has been devoted to supporting law enforcement by employing Natural Language Processing (NLP) to detect illegal activities on the dark web and build models for their classification. However, current approaches rely strongly on the linguistic characteristics used to train the models, e.g., language semantics, which threatens their generalizability. To overcome this limitation, we tackle the problem of predicting illegal and criminal activities on the dark web, a process defined as threat intelligence, from a complementary perspective, that of dark web code maintenance and evolution, and propose a novel approach that uses software quality metrics and dark website appearance parameters instead of linguistic characteristics. We performed a preliminary empirical study on 10,367 web pages and collected more than 40 code metrics and website parameters using SonarQube. Results show an accuracy of up to 82% for predicting the three types of illegal activities (i.e., suspicious, normal, and unknown) and 66% for detecting 26 specific illegal activities, such as drugs or weapons trafficking. We believe our results can influence current trends in detecting illegal activities on the dark web and open a novel research avenue that approaches this problem from a software maintenance and evolution perspective.
KW - Dark Web
KW - Machine Learning
KW - Software Code Metrics
KW - Software Code Quality
UR - http://www.scopus.com/inward/record.url?scp=85146254864&partnerID=8YFLogxK
U2 - 10.1109/ICSME55016.2022.00055
DO - 10.1109/ICSME55016.2022.00055
M3 - Conference contribution
AN - SCOPUS:85146254864
T3 - Proceedings - 2022 IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
SP - 439
EP - 443
BT - Proceedings - 2022 IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 38th IEEE International Conference on Software Maintenance and Evolution, ICSME 2022
Y2 - 2 October 2022 through 7 October 2022
ER -