Project Details
Description
We intend to collect a dataset of user profiles and posts from reddit.com, covering discussions on an extensive set of topics. With this dataset, we can demonstrate how such data can be abused to algorithmically learn the writing style associated with particular personal characteristics. The resulting machine learning models learn generalizable author profiles: they work on new users and on other social media platforms. Hence, the models can be (and currently are) deployed to uncover sensitive information about Internet users without their consent or awareness. This ranges from more mundane features, such as gender or age, to ones raising serious privacy concerns, such as personality type, political affiliation, and mental health issues. This project will highlight the issue with a public online demo, where users can see which features these models exploit and how they can interactively change their writing to break them.
Layman's description
Data Collection: The project involves gathering information from Reddit, including user profiles and the posts they make. Reddit is a website where people discuss a wide range of topics.
Machine Learning Analysis: Using advanced computer algorithms (machine learning), the project aims to show how this collected data can be misused. Specifically, it demonstrates how algorithms can learn to recognize patterns in the way people write, which can reveal personal characteristics.
Learning Writing Styles: These algorithms are designed to learn the unique writing styles associated with different personal traits. This means they can figure out things like gender, age, personality type, political views, and even mental health issues just by analyzing how someone writes.
Privacy Concerns: The project highlights the serious privacy implications of this kind of data analysis. It's not just about knowing someone's age or gender; it's about potentially revealing very personal and sensitive information without the person's knowledge or consent.
Public Demo: To raise awareness, the project will create an online demonstration where users can see firsthand how these algorithms work. Users will be able to see which aspects of their writing the algorithms are picking up on and even learn how to "fool" the algorithms by changing how they write.
In essence, this project is a wake-up call about the risks of online data collection and analysis, showing how seemingly innocent actions like posting on social media can inadvertently reveal a lot about ourselves, often without us even realizing it.
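To make the idea of seeing which words a model picks up on, and then "fooling" it, more concrete, here is a minimal sketch. It is not the project's actual models or demo: the training posts are made up, the labels "a" and "b" are placeholders for any author attribute, and the helper names (`train`, `explain`, `obfuscate`, `SWAPS`) are hypothetical. It uses a simple word-count (naive Bayes style) scorer with add-one smoothing, chosen only because it is easy to inspect.

```python
import math
from collections import Counter

# Toy, made-up training posts; "a" and "b" stand in for any author attribute.
TRAIN = [
    ("honestly i reckon the match was brilliant", "a"),
    ("i reckon we should honestly go tomorrow", "a"),
    ("that film was brilliant honestly", "a"),
    ("i guess the game was awesome", "b"),
    ("i guess we should totally go tomorrow", "b"),
    ("that movie was totally awesome", "b"),
]

# Hypothetical word swaps a user might try in order to hide their style.
SWAPS = {"honestly": "totally", "reckon": "guess",
         "brilliant": "awesome", "film": "movie"}


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Learn one log-odds weight per word; positive values favour label "a"."""
    counts = {"a": Counter(), "b": Counter()}
    for text, label in examples:
        counts[label].update(tokenize(text))
    vocab = set(counts["a"]) | set(counts["b"])
    totals = {label: sum(c.values()) for label, c in counts.items()}
    weights = {}
    for word in vocab:
        # Add-one smoothing so unseen words never give a zero probability.
        p_a = (counts["a"][word] + 1) / (totals["a"] + len(vocab))
        p_b = (counts["b"][word] + 1) / (totals["b"] + len(vocab))
        weights[word] = math.log(p_a / p_b)
    return weights


def explain(weights, text):
    """Return the predicted label and per-word contributions, strongest first."""
    contributions = [(w, weights.get(w, 0.0)) for w in tokenize(text)]
    score = sum(c for _, c in contributions)
    label = "a" if score > 0 else "b"
    contributions.sort(key=lambda item: abs(item[1]), reverse=True)
    return label, contributions


def obfuscate(text, swaps):
    """Replace style-revealing words using the given swap table."""
    return " ".join(swaps.get(t, t) for t in tokenize(text))


if __name__ == "__main__":
    weights = train(TRAIN)
    post = "honestly i reckon the film was brilliant"
    for candidate in (post, obfuscate(post, SWAPS)):
        label, contributions = explain(weights, candidate)
        print(candidate, "->", label)
        for word, weight in contributions[:3]:
            print(f"    {word}: {weight:+.2f}")
```

Running the script prints the predicted label for the original post and for the rewritten one, along with the three words that weigh most heavily in each decision. Real author-profiling models use far more data and richer features, but the same principle applies: a handful of habitual word choices can carry most of the signal, and changing them can change the prediction.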
Short title | Gathering Redditors Against Stylometric Profiling |
---|---|
Acronym | GRASP |
Status | Finished |
Effective start/end date | 16/01/23 to 28/07/23 |
Keywords
- computational stylometry
- author profiling
- adversarial stylometry
- algorithmic bias
- author obfuscation
- social media