Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    Written language contains stylistic cues that can be exploited to automatically infer a variety of potentially sensitive author information. Adversarial stylometry intends to attack such models by rewriting an author’s text. Our research proposes several components to facilitate deployment of these adversarial attacks in the wild, where neither data nor target models are accessible. We introduce a transformer-based extension of a lexical replacement attack, and show it achieves high transferability when trained on a weakly labeled corpus, decreasing target model performance below chance. While not completely inconspicuous, our more successful attacks also prove notably less detectable by humans. Our framework therefore provides a promising direction for future privacy-preserving adversarial attacks.
    Original language: English
    Title of host publication: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers
    Place of Publication: Kyiv, Ukraine
    Publisher: Association for Computational Linguistics
    Pages: 2388-2402
    Number of pages: 14
    Publication status: Published - 15 Apr 2021
    Event: The 16th Conference of the European Chapter of the Association for Computational Linguistics - Kyiv, Ukraine
    Duration: 19 Apr 2021 to 23 Apr 2021
    Conference number: 16
    https://2021.eacl.org/

    Conference

    Conference: The 16th Conference of the European Chapter of the Association for Computational Linguistics
    Abbreviated title: EACL 2021
    Country/Territory: Ukraine
    City: Kyiv
    Period: 19/04/21 to 23/04/21
