Research Article

System of automatic scientific article summarization in Turkish

Year 2024, Volume: 30 Issue: 4, 470 - 481, 30.08.2024

Abstract

The widespread use of the internet today, together with the rapid growth of information, has brought with it substantial information pollution. Extracting meaningful data from this large and noisy mass has become a major problem for internet users. Text summarization, generally applied to texts obtained from digital media, is also used to summarize scientific articles in different fields. In this study, a scientific text summarization system was developed for Turkish articles written in the field of informatics. A large Turkish Informatics Literature dataset was created from articles collected from DergiPark. In addition to the text pre-processing steps available in the literature, a new, original pre-processing function tailored to the scientific article format was developed for this dataset. For summarization, Deep Belief Networks (DBN), whose use in the field of natural language processing is increasing in the literature, were employed. To measure the performance of the developed system, reference summaries were created with BERT, a pre-trained natural language processing model. After the scientific articles were summarized with BERT and with Deep Belief Networks, the summaries were compared using BERTScore, a comparison metric specialized for the BERT model, and BARTScore. The results showed that the developed Turkish Informatics Literature Summarization Method produces a summary of a scientific article with a 0.78 F-score on the BERTScore metric and a 0.68 BARTScore.
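The paper's DBN-based sentence scorer is not reproduced here. As a rough illustration of the extractive idea it builds on, the sketch below ranks sentences by length-normalized term frequency and returns the top-scoring ones in their original order, in the spirit of Luhn [7]. The function name, the scoring rule, and the parameter `k` are illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter

def extractive_summary(text, k=2):
    # Naive sentence split on ., !, ? (a real system would use a proper tokenizer)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    # Document-level term frequencies
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(s):
        # Sum of term frequencies, normalized by sentence length
        # so longer sentences are not automatically favoured
        toks = re.findall(r"\w+", s.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # keep original document order
    return " ".join(sentences[i] for i in chosen)
```

An extractive summarizer like this selects existing sentences rather than generating new text, which is why metrics such as BERTScore and BARTScore, which compare semantic similarity against a reference summary, are a natural fit for evaluation.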

References

  • [1] Erhandi B. Text Summarization with Deep Learning. MSc Thesis, Sakarya University, Sakarya, Turkey, 2020.
  • [2] Guran A. Automatic Text Summarization System. PhD Thesis, Yıldız Technical University, Istanbul, Turkey, 2013.
  • [3] Mutlu B. Text Summarization by Hybrid Intelligent System. PhD Thesis, Gazi University, Ankara, Turkey, 2020.
  • [4] Gupta V, Lehal GS. “A survey of text summarization extractive techniques”. Journal of Emerging Technologies in Web Intelligence, 2(3), 258-268, 2010.
  • [5] Dudak E. Extractive Text Summarization by Gray Wolf Optimization Algorithm and Classification of Abstracts with Deep Learning. MSc Thesis, Duzce University, Duzce, Turkey, 2020.
  • [6] Altmami NI, Menai MEB. “Automatic summarization of scientific articles: a survey”. Journal of King Saud University-Computer and Information Sciences, 34(4), 1011-1028, 2020.
  • [7] Luhn HP. “The automatic creation of literature abstracts”. IBM Journal of Research and Development, 2(2), 159-165, 1958.
  • [8] Altan Z. “A Turkish automatic text summarization system”. IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, 16-18 February 2004.
  • [9] Yolcular Oguz B. Literature Mining: A Real-Time Web-Based Text Mining Application. PhD Thesis, Akdeniz University, Antalya, Turkey, 2016.
  • [10] Kim M, Singh MD, Lee M. “Towards abstraction from extraction: Multiple timescale gated recurrent unit for summarization”. 1st Workshop on Representation Learning for NLP, Berlin, Germany, 11 August 2016.
  • [11] Collins E, Augenstein I, Riedel S. “A supervised approach to extractive summarisation of scientific papers”. 21st Conference on Computational Natural Language Learning, Vancouver, Canada, 3-4 August 2017.
  • [12] Wang L, Yao J, Tao Y, Zhong L, Liu W, Du Q. “A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization”. International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13-19 July 2018.
  • [13] Nikolov NI, Pfeiffer M, Hahnloser RH. “Data-driven summarization of scientific articles”. Language Resources and Evaluation Conference, Miyazaki, Japan, 7-12 May 2018.
  • [14] Nallapati R, Zhou B, Gulcehre C, Xiang B. “Abstractive text summarization using sequence-to-sequence RNNs and beyond”. The SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, 11-12 August 2016.
  • [15] Sirohi NK, Bansal M, Rajan SN. “Recent approaches for text summarization using machine learning & LSTM”. Journal on Big Data, 3(1), 35-47, 2021.
  • [16] Lloret E, Roma-Ferri MT, Palomar M. “Compendium: a text summarization system for generating abstracts of research papers”. Data & Knowledge Engineering, 88, 164-175, 2013.
  • [17] Hinton GE, Osindero S, Teh YW. “A fast learning algorithm for deep belief nets”. Neural Computation, 18(7), 1527-1554, 2006.
  • [18] Hua Y, Guo J, Zhao H. “Deep belief networks and deep learning”. International Conference on Intelligent Computing and Internet of Things, Harbin, China, 17-18 January 2015.
  • [19] Savas BK, Becerikli Y. “A deep learning approach to driver fatigue detection via mouth state analyses and yawning detection”. Journal of Computer Engineering, 23(3), 24-30, 2021.
  • [20] Sar KT. Time Based Sentiment Analysis Using Artificial Neural Networks and BERT Language Model: Exploring Comments on WhatsApp's New Privacy Policy. MSc Thesis, Dokuz Eylul University, Izmir, Turkey, 2021.
  • [21] Devlin J, Chang MW, Lee K, Toutanova K. “BERT: Pre-training of deep bidirectional transformers for language understanding”. arXiv, 2019. https://arxiv.org/pdf/1810.04805.
  • [22] Ozkan B. Dialog Intent Classification Using NLP Methods. MSc Thesis, Bahcesehir University, Istanbul, Turkey, 2021; Miller M. “Leveraging BERT for Extractive Text Summarization on Lectures”. arXiv, 2019. https://arxiv.org/pdf/1906.04165.
  • [23] Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y. “BERTScore: Evaluating text generation with BERT”. arXiv, 2019. https://arxiv.org/pdf/1904.09675.
  • [24] Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Dean J. “Google's neural machine translation system: Bridging the gap between human and machine translation”. arXiv, 2016. https://arxiv.org/pdf/1609.08144.
  • [25] Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Zettlemoyer L. “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension”. arXiv, 2019. https://arxiv.org/pdf/1910.13461.
  • [26] Yuan W, Neubig G, Liu P. “BARTScore: Evaluating generated text as text generation”. Conference on Neural Information Processing Systems, Virtual Conference, 6-14 December 2021.
  • [27] GitHub. “An Approach to Automatic Trending Tweet Summarization”. https://github.com/yuva29/twitter-trends-summarizer (04/20/2018).
  • [28] Das SJ, Murakami R, Chakraborty B. “Development of a two-step LDA based aspect extraction technique for review summarization”. International Journal of Applied Science and Engineering, 18(1), 1-18, 2021.
  • [29] Karlbom H, Clifton A. “Abstractive podcast summarization using BART with Longformer attention”. The 29th Text Retrieval Conference, Virtual Conference, 15-19 November 2021.
  • [30] Slamet C, Atmadja AR, Maylawati DS, Lestari RS, Darmalaksana W, Ramdhani MA. “Automated text summarization for Indonesian article using vector space model”. The 2nd Annual Applied Science and Engineering Conference (AASEC 2017), Bandung, Indonesia, 24 August 2017.

Türkçe otomatik bilimsel makale özetleme sistemi

There are 30 citations in total.

Details

Primary Language English
Subjects Algorithms and Calculation Theory
Journal Section Research Article
Authors

Nazan Kemaloğlu Alagöz

Ecir Uğur Küçüksille

Publication Date August 30, 2024
Published in Issue Year 2024 Volume: 30 Issue: 4

Cite

APA Kemaloğlu Alagöz, N., & Küçüksille, E. U. (2024). System of automatic scientific article summarization in Turkish. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 30(4), 470-481.
AMA Kemaloğlu Alagöz N, Küçüksille EU. System of automatic scientific article summarization in Turkish. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. August 2024;30(4):470-481.
Chicago Kemaloğlu Alagöz, Nazan, and Ecir Uğur Küçüksille. “System of Automatic Scientific Article Summarization in Turkish”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 30, no. 4 (August 2024): 470-81.
EndNote Kemaloğlu Alagöz N, Küçüksille EU (August 1, 2024) System of automatic scientific article summarization in Turkish. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 30 4 470–481.
IEEE N. Kemaloğlu Alagöz and E. U. Küçüksille, “System of automatic scientific article summarization in Turkish”, Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 30, no. 4, pp. 470–481, 2024.
ISNAD Kemaloğlu Alagöz, Nazan - Küçüksille, Ecir Uğur. “System of Automatic Scientific Article Summarization in Turkish”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 30/4 (August 2024), 470-481.
JAMA Kemaloğlu Alagöz N, Küçüksille EU. System of automatic scientific article summarization in Turkish. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2024;30:470–481.
MLA Kemaloğlu Alagöz, Nazan and Ecir Uğur Küçüksille. “System of Automatic Scientific Article Summarization in Turkish”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 30, no. 4, 2024, pp. 470-81.
Vancouver Kemaloğlu Alagöz N, Küçüksille EU. System of automatic scientific article summarization in Turkish. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2024;30(4):470-81.
