The widespread use of the internet and social media allows people to express their opinions about other people and institutions easily and clearly. As this practice spreads, the volume of rich content keeps growing. Consequently, analyzing big data obtained from the internet and transforming it into meaningful, usable information has been studied intensively in recent years, and automatic text summarization has become an important task in this process. This study presents a Helmholtz-based extractive summarization method for building an automatic text summarization system. The proposed method was tested on the BBC News data set, which contains both original full-text documents and summaries of those documents produced by human summarizers. The similarity between each summary produced by the proposed method and the corresponding original summary in the BBC News data set was calculated using the Simhash text similarity algorithm. The results show that the proposed method produces summaries with a Simhash similarity rate of 38.9%. The Experiments section also reports the results obtained with other third-party extractive summarization algorithms.
Automatic Document Summarization, Helmholtz Principle, KL Divergence, Latent Semantic Analysis, LexRank, Summarization Algorithms, TextRank, TF-IDF
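
The Simhash comparison mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tokenization, token weighting, fingerprint width, or the exact formula behind the reported similarity rate, so everything here is an assumption: word unigrams with unit weights, a 64-bit fingerprint, MD5 as the token hash, and similarity defined as one minus the normalized Hamming distance. The function names `simhash` and `simhash_similarity` are hypothetical.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """Return a `bits`-wide Simhash fingerprint of `text`.

    Assumptions (not taken from the paper): lowercase word unigrams,
    unit weight per token occurrence, MD5 as the per-token hash.
    """
    counts = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set when the vote for that position is positive.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def simhash_similarity(a: str, b: str, bits: int = 64) -> float:
    """Fraction of fingerprint bits on which two texts agree (1.0 = same fingerprint)."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits


if __name__ == "__main__":
    generated = "The company reported higher profits for the third quarter."
    reference = "Third quarter profits were higher, the company reported."
    print(f"Simhash similarity: {simhash_similarity(generated, reference):.3f}")
```

Under this definition, a generated summary and a human-written summary that share most of their vocabulary yield fingerprints that differ in few bits, so the score approaches 1.0. Other weighting schemes (for example TF-IDF weights instead of unit weights) would change the scores.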
Primary Language | English |
---|---|
Subjects | Engineering |
Journal Section | Articles |
Authors | |
Publication Date | October 10, 2022 |
Published in Issue | Year 2022 Volume: 5 Issue: 1 |