Research Article

Helmholtz-Based Automatic Document Summarization

Year 2022, Volume: 5 Issue: 1, 13 - 25, 10.10.2022

Abstract

The spread of the internet and social media has made it easy for people to express and interpret opinions about other people and institutions, and the volume of content produced this way keeps growing. As a result, analyzing large-scale data collected from the internet and turning it into usable information has been studied intensively in recent years, and automatic text summarization has become an important task in this process. This study presents a Helmholtz-based extractive summarization method for building an automatic text summarization system. The proposed method was tested on the BBC News data set, which contains both the original full-text documents and summaries of those documents produced by human summarizers. The similarity between the summary produced by the proposed method and the reference summary in the BBC News data set was calculated with the Simhash text similarity algorithm. The results show that the proposed Helmholtz-based extractive summarization method produces summaries with a Simhash similarity of 38.9% to the reference summaries. The Experiments section also reports results obtained with other third-party extractive summarization algorithms.
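
The Simhash comparison mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tokenization, the MD5-based token hash, the 64-bit fingerprint size, and the normalization of Hamming distance into a similarity score are all assumptions made for illustration, and the function names `simhash` and `simhash_similarity` are hypothetical.

```python
import hashlib
import re
from collections import Counter


def simhash(text: str, bits: int = 64) -> int:
    """Return a Simhash fingerprint of `text` as an integer.

    Assumption: tokens are lowercase word characters and each token is
    hashed with MD5, so `bits` must be a multiple of 8 and at most 128.
    """
    tokens = re.findall(r"\w+", text.lower())
    weights = [0] * bits
    for token, count in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +count or -count on every bit position.
            if (h >> i) & 1:
                weights[i] += count
            else:
                weights[i] -= count
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def simhash_similarity(a: str, b: str, bits: int = 64) -> float:
    """Similarity in [0, 1]: 1 minus the normalized Hamming distance."""
    distance = bin(simhash(a, bits) ^ simhash(b, bits)).count("1")
    return 1.0 - distance / bits


if __name__ == "__main__":
    generated = "Shares rose after the company reported strong profits."
    reference = "The company reported strong profits and shares rose."
    print(f"{simhash_similarity(generated, reference):.3f}")
```

Under these assumptions, identical texts score 1.0, and the score falls as the two fingerprints differ in more bit positions. How the paper maps fingerprint distance to the reported percentage is not stated in the abstract, so the normalization above should be read as one plausible choice.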

References

  • Lee, J. H., Park, S., Ahn, C. M., & Kim, D. (2009). Automatic generic document summarization based on non-negative matrix factorization. Information Processing & Management, 45(1), 20-34.
  • Torres-Moreno, J. M. (2014). Automatic text summarization. John Wiley & Sons.
  • Joshi A., Fidalgo E., Alegre E., Fernández-Robles L. 2019. SummCoder: An Unsupervised Framework for Extractive Text Summarization Based on Deep Auto-encoders. Expert Syst Appl., doi: 10.1016/j.eswa.2019.03.045.
  • Cigir C., Kutlu M., Cicekli I. 2009. Generic text summarization for Turkish. 2009 24th International Symposium on Computer and Information Sciences (IEEE), pp: 224-229.
  • Luhn, H. P. (1958). The automatic creation of literature abstracts. IBM Journal of research and development, 2(2), 159-165.
  • A. R. Pal, D. Saha, “An approach to automatic text summarization using WordNet,” 2014 IEEE International Advance Computing Conference (IACC), India, pp. 1169-1173, 2014.
  • D. Gunawan, S.H. Harahap, R.F. Rahmat, “Multi-document Summarization by using TextRank and Maximal Marginal Relevance for Text in Bahasa Indonesia,” 2019 International Conference on ICT for Smart Society (ICISS), Indonesia, pp. 1-5, 2019.
  • Al-Sabahi, Kamal & Zuping, Zhang & Nadher, Mohammed. (2018). A Hierarchical Structured Self-Attentive Model for Extractive Document Summarization (HSSAS). IEEE Access. PP. 1-1. 10.1109/ACCESS.2018.2829199.
  • A. Vartakavi, A. Garg and Z. Rafii, "Audio Summarization for Podcasts," 2021 29th European Signal Processing Conference (EUSIPCO), 2021, pp. 431-435, doi: 10.23919/EUSIPCO54536.2021.9615948.
  • https://www.kaggle.com/pariza/bbc-news-summary
  • B. Dadachev, A. Balinsky, H. Balinsky and S. Simske, “On the Helmholtz Principle for Data Mining,” Third International Conference on Emerging Security Technologies (EST), Lisbon, Portugal, 2012.
  • B. Dadachev, A. Balinsky, H. Balinsky and S. Simske, “On Helmholtz’s Principle for Documents Processing,” Proceedings of the 10th ACM Symposium on Document Engineering, Manchester, England, pp. 283-286, 2010.
  • A. Toprak and M. Turan, "English Automatic Dictionary Creation with Natural Language Processing," 2019 Innovations in Intelligent Systems and Applications Conference (ASYU), 2019, pp. 1-6, doi: 10.1109/ASYU48272.2019.8946431.
  • Khoury, Raphaël & Shi, Lei & Hamou-Lhadj, Abdelwahab. (2016). Key Elements Extraction and Traces Comprehension Using Gestalt Theory and the Helmholtz Principle. 478-482. 10.1109/ICSME.2016.24.
  • Sarkar K., Saraf K., Ghosh A. 2015. Improving graph based multidocument text summarization using an enhanced sentence similarity measure. 2015 IEEE 2nd International Conference on Recent Trends in Information Systems, ReTIS 2015 - Proceedings, pp: 359-365.
  • A. Desolneux, L. Moisan, J.-M. Morel, From Gestalt Theory to Image Analysis, 2006, vol. 34, no. July.
  • A. Toprak and M. Turan, "The Positive Effect of PMI on the Selection of Meaningful Words," 2019 11th International Conference on Electrical and Electronics Engineering (ELECO), 2019, pp. 911-915, doi: 10.23919/ELECO47770.2019.8990666.

There are 17 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Articles
Authors

Ahmet Toprak 0000-0001-7046-8512

Metin Turan

Publication Date October 10, 2022
Published in Issue Year 2022 Volume: 5 Issue: 1

Cite

APA Toprak, A., & Turan, M. (2022). Helmholtz-Based Automatic Document Summarization. Veri Bilimi, 5(1), 13-25.


