TY - GEN
T1 - Efficient term cloud generation for streaming web content
AU - Papapetrou, Odysseas
AU - Papadakis, George
AU - Ioannou, Ekaterini
AU - Skoutas, Dimitrios
PY - 2010
Y1 - 2010
N2 - Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although users sometimes rely on a small set of trusted sources from which to get their information, they often also want to get a wider overview and glimpse of what is being reported and discussed in the news and the blogosphere. In this paper, we present an approach for supporting this discovery and exploration process by exploiting term clouds. In particular, we provide an efficient method for dynamically computing the most frequently appearing terms in the posts of monitored online sources, for time intervals specified at query time, without the need to archive the actual published content. An experimental evaluation on a large-scale real-world set of blogs demonstrates the accuracy and the efficiency of the proposed method in terms of computational time and memory requirements.
AB - Large amounts of information are posted daily on the Web, such as articles published online by traditional news agencies or blog posts referring to and commenting on various events. Although users sometimes rely on a small set of trusted sources from which to get their information, they often also want to get a wider overview and glimpse of what is being reported and discussed in the news and the blogosphere. In this paper, we present an approach for supporting this discovery and exploration process by exploiting term clouds. In particular, we provide an efficient method for dynamically computing the most frequently appearing terms in the posts of monitored online sources, for time intervals specified at query time, without the need to archive the actual published content. An experimental evaluation on a large-scale real-world set of blogs demonstrates the accuracy and the efficiency of the proposed method in terms of computational time and memory requirements.
U2 - 10.1007/978-3-642-13911-6_26
DO - 10.1007/978-3-642-13911-6_26
M3 - Conference contribution
T3 - Lecture Notes in Computer Science
SP - 385
EP - 399
BT - International Conference on Web Engineering
PB - Springer
CY - Berlin
T2 - 10th International Conference on Web Engineering
Y2 - 5 July 2010 through 9 July 2010
ER -