Predicting Housing Market Trends Using Twitter Data

Marlon Velthorst, Çiçek Güven

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

In this study, we try to predict Dutch housing market trends using text mining and machine learning, as an application of data science methods in finance. Our main goal is to predict the short-term upward or downward trend of the average house price in the Dutch market using text data collected from Twitter. Twitter is widely used and has proven to be a helpful source of data. However, Twitter data, text mining (tokenization, bag-of-words, n-grams, weighted term frequencies) and machine learning (classification algorithms) have not yet been combined to predict short-term housing market trends. In this study, tweets containing predefined search words, chosen on the basis of domain knowledge, are collected, and the corresponding text is grouped by month into documents. Words and word sequences are then transformed into numerical values. These values serve as attributes for predicting whether the housing market moves up or down; that is, we approach the task as a binary classification problem relating the text data of one month to the (up or down) trend of the following month. Our main results reveal a correlation between the (weighted) frequency of words and short-term housing trends: by combining multiple machine learning and text mining techniques, we were able to make accurate predictions of short-term trends.
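
The feature construction described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the whitespace tokenizer, the particular TF-IDF weighting, the helper names (`ngrams`, `monthly_counts`, `tfidf`, `label_next_month`), and the toy tweets and prices are all assumptions made for the example. In particular, the abstract does not say how the trend label is defined; here month t is labelled by whether the average price rises from month t to month t+1.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all 1..n_max-grams."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def monthly_counts(tweets, n_max=2):
    """Bag-of-n-grams per month; each month ('YYYY-MM') is one document."""
    counts = defaultdict(Counter)
    for month, text in tweets:
        # n-grams are built per tweet so they never span two tweets
        counts[month].update(ngrams(text, n_max))
    return dict(sorted(counts.items()))


def tfidf(counts):
    """Weighted term frequency: (count / document length) * log(N / df)."""
    n_docs = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    weights = {}
    for month, c in counts.items():
        total = sum(c.values())
        weights[month] = {
            term: (k / total) * math.log(n_docs / df[term])
            for term, k in c.items()
        }
    return weights


def label_next_month(prices):
    """Label month t 'up' if the price rises from t to t+1, else 'down'."""
    months = sorted(prices)
    return {
        cur: ("up" if prices[nxt] > prices[cur] else "down")
        for cur, nxt in zip(months, months[1:])
    }


# Hypothetical toy data, invented for illustration only.
tweets = [
    ("2018-01", "huizenprijzen stijgen weer"),
    ("2018-01", "woningmarkt oververhit"),
    ("2018-02", "huizenprijzen dalen licht"),
    ("2018-03", "huizenprijzen stijgen"),
]
prices = {"2018-01": 270000, "2018-02": 268000, "2018-03": 272000}

weights = tfidf(monthly_counts(tweets))
labels = label_next_month(prices)

# Pair the text features of month t with the trend label for month t+1.
dataset = [(weights[m], labels[m]) for m in labels if m in weights]
```

Each entry of `dataset` pairs a sparse feature dictionary with an up/down label, which is the input shape a binary classifier would need. The last month has no label because its following month is unknown, so it is left out of the training pairs.
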
Original language: English
Title of host publication: Proceedings of 2019 6th Swiss Conference on Data Science (SDS)
Publisher: IEEE
DOIs: https://doi.org/10.1109/SDS.2019.00010
Publication status: Published - 8 Aug 2019
Externally published: Yes

  • Cite this

    Velthorst, M., & Güven, Ç. (2019). Predicting Housing Market Trends Using Twitter Data. In Proceedings of 2019 6th Swiss Conference on Data Science (SDS). IEEE. https://doi.org/10.1109/SDS.2019.00010