TY - JOUR
T1 - A Survey on Multi-document Summarization and Domain-Oriented Approaches
AU - Afsharizadeh, Mahsa
AU - Ebrahimpour-Komleh, Hossein
AU - Bagheri, Ayoub
AU - Chrupala, Grzegorz
N1 - Publisher Copyright:
© 2022, Journal of Information Systems and Telecommunication. All Rights Reserved.
PY - 2022/12
Y1 - 2022/12
AB - Before the advent of the World Wide Web, lack of information was a problem. Today, the web confronts us with an explosive amount of information in every area of search. This excess is troublesome and hinders quick, correct decisions; this is the problem of information overload. Multi-document summarization addresses the problem by quickly producing a brief summary that contains the most important information from a set of documents while preserving their main concepts. When the input documents belong to a specific domain, for example medicine or law, summarization faces additional challenges. Domain-oriented summarization methods use characteristics specific to that domain to generate summaries. This paper introduces the purpose of multi-document summarization systems and discusses domain-oriented approaches. This survey reviews the categorizations that authors have proposed for multi-document summarization methods. We also group these methods into six categories: machine learning, clustering, graph, Latent Dirichlet Allocation (LDA), optimization, and deep learning. We review the methods presented in each group and compare the groups' advantages and disadvantages. We also discuss the standard datasets used in this field, evaluation measures, challenges, and recommendations.
KW - Abstractive
KW - Domain-oriented
KW - Extractive
KW - Multi-document Summarization
KW - Rouge
KW - Single Document Summarization
UR - http://www.scopus.com/inward/record.url?scp=85125529365&partnerID=8YFLogxK
U2 - 10.52547/jist.16245.10.37.68
DO - 10.52547/jist.16245.10.37.68
M3 - Article
AN - SCOPUS:85125529365
SN - 2322-1437
VL - 10
SP - 68
EP - 79
JO - Journal of Information Systems and Telecommunication
JF - Journal of Information Systems and Telecommunication
IS - 37
ER -