Singling the Odd Ones Out: A Novelty Detection Approach to Find Defects in Infrastructure-as-Code

Stefano Dalla Palma, Majeed Mohammadi, Dario Di Nucci, Damian A. Tamburri

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    Infrastructure-as-Code (IaC) is increasingly adopted. However, little is known about how best to maintain and evolve it. Previous studies focused on defining machine-learning models to predict defect-prone blueprints using supervised binary classification. This class of techniques requires both defective and non-defective instances in the training phase, and the high imbalance between the two classes makes training more difficult and leads to unreliable classifiers. In this work, we tackle the defect-prediction problem from a different perspective using novelty detection: we evaluate three techniques, namely OneClassSVM, LocalOutlierFactor, and IsolationForest, and compare their performance with a baseline RandomForest binary classifier. The novelty detection models are trained using only non-defective samples: defective data points are treated as novelties because defective samples are too few compared to non-defective ones. We conduct an empirical study on an extremely imbalanced dataset consisting of 85 real-world Ansible projects containing only small amounts of defective instances. We found that novelty detection techniques can recognize defects with a high level of precision and recall, an AUC-PR up to 0.86, and an MCC up to 0.31. We believe our results can influence the current trends in defect detection and put forward a new research path toward dealing with this problem.
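
The setup described in the abstract (fit a detector on non-defective samples only, then flag unseen defective samples as novelties) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn and NumPy are available, uses synthetic Gaussian data in place of the Ansible dataset, and the feature count, split sizes, and hyperparameters (`nu`, `n_neighbors`) are arbitrary choices made for the example. `average_precision_score` is used here as a common approximation of AUC-PR.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import average_precision_score, matthews_corrcoef
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for per-script metrics (NOT the paper's dataset):
# many non-defective samples near the origin, a few defective ones shifted away.
X_clean = rng.normal(0.0, 1.0, size=(500, 4))
X_defective = rng.normal(4.0, 1.0, size=(15, 4))

# Train on non-defective samples only; hold out some clean samples for testing.
X_train, X_clean_test = X_clean[:400], X_clean[400:]
X_test = np.vstack([X_clean_test, X_defective])
y_test = np.array([0] * len(X_clean_test) + [1] * len(X_defective))  # 1 = defective

# Hyperparameters below are illustrative assumptions, not values from the paper.
detectors = {
    "OneClassSVM": OneClassSVM(nu=0.05, gamma="scale"),
    "LocalOutlierFactor": LocalOutlierFactor(n_neighbors=20, novelty=True),
    "IsolationForest": IsolationForest(random_state=0),
}

results = {}
for name, model in detectors.items():
    model.fit(X_train)
    # predict() returns +1 for inliers and -1 for novelties; map -1 to "defective".
    y_pred = (model.predict(X_test) == -1).astype(int)
    # score_samples() is higher for more normal points, so negate it
    # to obtain a score that ranks likely defects first.
    novelty_score = -model.score_samples(X_test)
    results[name] = {
        "auc_pr": average_precision_score(y_test, novelty_score),
        "mcc": matthews_corrcoef(y_test, y_pred),
    }

for name, metrics in results.items():
    print(f"{name}: AUC-PR={metrics['auc_pr']:.2f}  MCC={metrics['mcc']:.2f}")
```

Because the detectors never see a defective sample during training, class imbalance affects only the evaluation step. That is why the sketch reports AUC-PR and MCC, the two metrics quoted in the abstract, rather than plain accuracy.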
    Original language: English
    Title of host publication: MaLTeSQuE 2020: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation
    Pages: 31-36
    Number of pages: 6
    Publication status: Accepted/In press - 2020
    Event: The 4th edition of the International Workshop on Machine Learning Techniques for Software Quality Evaluation - Sacramento, United States
    Duration: 16 Nov 2020 → 16 Nov 2020
    https://maltesque2020.github.io/

    Conference

    Conference: The 4th edition of the International Workshop on Machine Learning Techniques for Software Quality Evaluation
    Abbreviated title: MALTESQUE2020
    Country/Territory: United States
    City: Sacramento
    Period: 16/11/20 → 16/11/20

    Keywords

    • Infrastructure-as-Code
    • Novelty Detection
    • Defect Prediction
