Language encodes geographical information

Max M Louwerse, Rolf A Zwaan

Research output: Contribution to journal › Article › Scientific › peer-review

71 Citations (Scopus)


Population counts and longitude and latitude coordinates were estimated for the 50 largest cities in the United States by computational linguistic techniques and by human participants. The mathematical technique Latent Semantic Analysis applied to newspaper texts produced similarity ratings between the 50 cities that allowed for a multidimensional scaling (MDS) of these cities. MDS coordinates correlated with the actual longitude and latitude of these cities, showing that cities that are located together share similar semantic contexts. This finding was replicated using a first-order co-occurrence algorithm. The computational estimates of geographical location as well as population were akin to human estimates. These findings show that language encodes geographical information that language users in turn may use in their understanding of language and the world.
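The pipeline described in the abstract (pairwise similarities, then multidimensional scaling, then comparison with real positions) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes classical (Torgerson) MDS rather than whichever MDS variant the paper used, replaces the Latent Semantic Analysis ratings with a made-up similarity that decays with distance, and uses random points instead of real city coordinates. Because an MDS solution is only defined up to rotation and reflection, the sketch compares inter-point distances rather than raw coordinates; that comparison is also an assumption. All function and variable names are hypothetical.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows in `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(dissimilarity, n_components=2):
    """Classical (Torgerson) MDS: eigendecompose the double-centered
    matrix of squared dissimilarities and keep the top components."""
    d = np.asarray(dissimilarity, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues that arise from non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


def upper_triangle(matrix):
    """Values above the diagonal, so each pair is counted once."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]


# Hypothetical toy data: random planar positions, NOT real cities.
rng = np.random.default_rng(0)
true_xy = rng.uniform(0.0, 10.0, size=(8, 2))
true_dist = pairwise_distances(true_xy)

# Assumed stand-in for text-derived similarity: nearby items are more similar.
similarity = np.exp(-true_dist / 20.0)
dissimilarity = 1.0 - similarity

# Recover a 2-D layout from similarities alone, then compare its geometry
# with the true geometry using a rotation-invariant measure.
coords = classical_mds(dissimilarity)
recovered_dist = pairwise_distances(coords)
r = np.corrcoef(upper_triangle(recovered_dist), upper_triangle(true_dist))[0, 1]
print(f"correlation between recovered and true distances: {r:.3f}")
```

In a real replication the `similarity` matrix would come from a text model such as LSA, and the recovered coordinates would be compared against actual longitude and latitude.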

Original language: English
Pages (from-to): 51-73
Number of pages: 23
Journal: Cognitive Science
Issue number: 1
Publication status: Published - Jan 2009
Externally published: Yes

