Language encodes geographical information

Max M Louwerse, Rolf A Zwaan

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Population counts and longitude and latitude coordinates were estimated for the 50 largest cities in the United States by computational linguistic techniques and by human participants. The mathematical technique Latent Semantic Analysis applied to newspaper texts produced similarity ratings between the 50 cities that allowed for a multidimensional scaling (MDS) of these cities. MDS coordinates correlated with the actual longitude and latitude of these cities, showing that cities that are located together share similar semantic contexts. This finding was replicated using a first-order co-occurrence algorithm. The computational estimates of geographical location as well as population were akin to human estimates. These findings show that language encodes geographical information that language users in turn may use in their understanding of language and the world.
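The pipeline the abstract describes (pairwise similarity between city names, multidimensional scaling of those similarities, then correlating the MDS dimensions with real coordinates) can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' code: it swaps Latent Semantic Analysis for a simple first-order co-occurrence score (documents shared by two names, normalized by the Ochiai/cosine coefficient), which may differ from the co-occurrence algorithm used in the article. It uses classical (Torgerson) MDS solved by power iteration, assumes single-token lowercase city names, and runs on an invented toy corpus. All function names are illustrative.

```python
import math
import random


def cooccurrence_similarity(docs, terms):
    """Hypothetical first-order co-occurrence score: number of documents
    containing both terms, divided by sqrt(freq_a * freq_b) (Ochiai/cosine
    on binary document-incidence vectors). Assumes single-token terms."""
    token_sets = [set(doc.lower().split()) for doc in docs]
    freq = {t: sum(t in s for s in token_sets) for t in terms}
    n = len(terms)
    sim = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(terms):
        for j, b in enumerate(terms):
            joint = sum(a in s and b in s for s in token_sets)
            denom = math.sqrt(freq[a] * freq[b])
            sim[i][j] = joint / denom if denom else 0.0
    return sim


def classical_mds(dist, k=2, iters=500, seed=0):
    """Classical (Torgerson) MDS: double-center the squared dissimilarities
    and embed with the top-k eigenvectors, found by power iteration."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in d2]  # symmetric, so column means == row means
    grand = sum(row) / n
    # B = -1/2 * J D^2 J, where J is the centering matrix.
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    # Shift by a Gershgorin bound so B is positive semidefinite and power
    # iteration finds the largest (not largest-magnitude) eigenvalues.
    shift = max(sum(abs(x) for x in r) for r in b)
    for i in range(n):
        b[i][i] += shift
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for m in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam - shift, 0.0))  # undo the shift
        for i in range(n):
            coords[i][m] = v[i] * scale
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (c - my) for a, c in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((c - my) ** 2 for c in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Invented toy corpus; a real run would use a large newspaper corpus.
    docs = ["boston hartford weather", "boston hartford trade",
            "seattle tacoma rain", "seattle tacoma port",
            "hartford boston seattle"]
    cities = ["boston", "hartford", "seattle", "tacoma"]
    sim = cooccurrence_similarity(docs, cities)
    dissim = [[1.0 - s for s in r] for r in sim]
    coords = classical_mds(dissim, k=2)
    longitudes = [-71.06, -72.68, -122.33, -122.44]  # approximate degrees
    # MDS axes are defined only up to rotation and reflection, hence |r|.
    r = pearson([c[0] for c in coords], longitudes)
    print(f"|r| between MDS dimension 1 and longitude: {abs(r):.2f}")
```

Because MDS recovers a configuration only up to rotation and reflection, a fuller analysis would rotate the solution (for example with a Procrustes fit) before comparing it with longitude and latitude; the sketch simply reports the absolute correlation of the first dimension.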

Original language: English
Pages (from-to): 51-73
Number of pages: 23
Journal: Cognitive Science
Volume: 33
Issue number: 1
DOI: 10.1111/j.1551-6709.2008.01003.x
Publication status: Published - Jan 2009
Externally published: Yes

Fingerprint

Language
Semantics
Computational linguistics
Newspapers
Linguistics

Cite this

Louwerse, Max M; Zwaan, Rolf A. / Language encodes geographical information. In: Cognitive Science. 2009; Vol. 33, No. 1. pp. 51-73.
@article{97b45a7c77014cd58baaf9b927efe4b6,
title = "Language encodes geographical information",
abstract = "Population counts and longitude and latitude coordinates were estimated for the 50 largest cities in the United States by computational linguistic techniques and by human participants. The mathematical technique Latent Semantic Analysis applied to newspaper texts produced similarity ratings between the 50 cities that allowed for a multidimensional scaling (MDS) of these cities. MDS coordinates correlated with the actual longitude and latitude of these cities, showing that cities that are located together share similar semantic contexts. This finding was replicated using a first-order co-occurrence algorithm. The computational estimates of geographical location as well as population were akin to human estimates. These findings show that language encodes geographical information that language users in turn may use in their understanding of language and the world.",
author = "Louwerse, {Max M} and Zwaan, {Rolf A}",
note = "Copyright {\circledC} 2009 Cognitive Science Society, Inc.",
year = "2009",
month = "1",
doi = "10.1111/j.1551-6709.2008.01003.x",
language = "English",
volume = "33",
pages = "51--73",
journal = "Cognitive Science",
issn = "0364-0213",
publisher = "Wiley",
number = "1",

}

Language encodes geographical information. / Louwerse, Max M; Zwaan, Rolf A.

In: Cognitive Science, Vol. 33, No. 1, 01.2009, pp. 51-73.


TY - JOUR

T1 - Language encodes geographical information

AU - Louwerse, Max M

AU - Zwaan, Rolf A

N1 - Copyright © 2009 Cognitive Science Society, Inc.

PY - 2009/1

Y1 - 2009/1

N2 - Population counts and longitude and latitude coordinates were estimated for the 50 largest cities in the United States by computational linguistic techniques and by human participants. The mathematical technique Latent Semantic Analysis applied to newspaper texts produced similarity ratings between the 50 cities that allowed for a multidimensional scaling (MDS) of these cities. MDS coordinates correlated with the actual longitude and latitude of these cities, showing that cities that are located together share similar semantic contexts. This finding was replicated using a first-order co-occurrence algorithm. The computational estimates of geographical location as well as population were akin to human estimates. These findings show that language encodes geographical information that language users in turn may use in their understanding of language and the world.

AB - Population counts and longitude and latitude coordinates were estimated for the 50 largest cities in the United States by computational linguistic techniques and by human participants. The mathematical technique Latent Semantic Analysis applied to newspaper texts produced similarity ratings between the 50 cities that allowed for a multidimensional scaling (MDS) of these cities. MDS coordinates correlated with the actual longitude and latitude of these cities, showing that cities that are located together share similar semantic contexts. This finding was replicated using a first-order co-occurrence algorithm. The computational estimates of geographical location as well as population were akin to human estimates. These findings show that language encodes geographical information that language users in turn may use in their understanding of language and the world.

U2 - 10.1111/j.1551-6709.2008.01003.x

DO - 10.1111/j.1551-6709.2008.01003.x

M3 - Article

VL - 33

SP - 51

EP - 73

JO - Cognitive Science

JF - Cognitive Science

SN - 0364-0213

IS - 1

ER -