Research Article

System of automatic scientific article summarization in Turkish

Year 2024, Volume: 30, Issue: 4, 470-481, 30.08.2024

Abstract

The widespread use of the internet, together with rapidly growing amounts of information, has created substantial information pollution. Obtaining meaningful data from this large and noisy content has become a major problem for internet users. Text summarization, which is generally applied to texts obtained from digital media, has also been used to summarize scientific articles in different fields. In this study, a scientific text summarization system was developed for Turkish articles written in the field of informatics. A large Turkish Informatics Literature dataset was created from articles collected from Dergipark. In addition to the text pre-processing steps available in the literature, a new pre-processing function tailored to the scientific article format was developed for this dataset. For summarization, Deep Belief Networks (DBN), whose use in natural language processing is growing in the literature, were employed. To measure the performance of the developed system, reference summaries were created with BERT, a pre-trained natural language processing model. After the scientific articles were summarized with both BERT and Deep Belief Networks, the resulting summaries were compared using BERTScore, an evaluation metric based on the BERT model, and BARTScore. The results showed that the developed Turkish Informatics Literature Summarization Method produces summaries of scientific articles with an F-score of 0.78 on the BERTScore metric and a BARTScore of 0.68.
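To make the evaluation step concrete, the sketch below shows how candidate summaries could be scored against BERT-generated reference summaries with BERTScore and a BARTScore-style log-likelihood. It is a minimal illustration only, assuming the open-source bert_score package and Hugging Face transformers; the toy example texts, the facebook/bart-large-cnn checkpoint, and the bart_score helper are illustrative assumptions, not the authors' exact configuration.

    # Minimal evaluation sketch (assumed setup, not the authors' pipeline).
    # Requires: pip install bert-score transformers torch
    import torch
    from bert_score import score
    from transformers import BartForConditionalGeneration, BartTokenizer

    # Candidate summaries produced by a summarizer and the reference
    # summaries they are compared against (toy placeholders here).
    candidates = ["Ozet cumlesi bir. Ozet cumlesi iki."]
    references = ["Referans ozet cumlesi bir. Referans ozet cumlesi iki."]

    # BERTScore: token-level similarity computed with contextual BERT
    # embeddings; lang="tr" selects a multilingual backbone for Turkish.
    P, R, F1 = score(candidates, references, lang="tr")
    print(f"BERTScore F1: {F1.mean().item():.2f}")

    # BARTScore-style measure: average log-likelihood of the reference
    # conditioned on the candidate, computed with a BART model
    # (illustrative English checkpoint standing in for the metric).
    tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    bart = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn").eval()

    def bart_score(src: str, tgt: str) -> float:
        src_enc = tok(src, return_tensors="pt", truncation=True, max_length=1024)
        tgt_ids = tok(tgt, return_tensors="pt", truncation=True, max_length=1024)["input_ids"]
        with torch.no_grad():
            out = bart(input_ids=src_enc["input_ids"],
                       attention_mask=src_enc["attention_mask"],
                       labels=tgt_ids)
        # out.loss is the mean token-level cross-entropy; its negation is
        # the average log-likelihood, so values closer to 0 are better.
        return -out.loss.item()

    for cand, ref in zip(candidates, references):
        print(f"BARTScore: {bart_score(cand, ref):.2f}")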

References

  • [1] Erhandi B. Text Summarization with Deep Learning. MSc Thesis, Sakarya University, Sakarya, Turkey, 2020.
  • [2] Guran A. Automatic Text Summarization System. PhD Thesis, Yıldız Technical University, Istanbul, Turkey, 2013.
  • [3] Mutlu B. Text Summarization by Hybrid Intelligent System. PhD Thesis, Gazi University, Ankara, Turkey, 2020.
  • [4] Gupta V, Lehal GS. “A Survey of Text Summarization Extractive Techniques”. Journal of Emerging Technologies in Web Intelligence, 2(3), 258-268, 2010.
  • [5] Dudak E. Extractive Text Summarization by Gray Wolf Optimization Algorithm and Classification of Abstracts with Deep Learning. MSc Thesis, Duzce University, Duzce, Turkey, 2020.
  • [6] Altmami NI, Menai MEB. “Automatic summarization of scientific articles: a survey”. Journal of King Saud University-Computer and Information Sciences, 34(4), 1011-1028, 2020.
  • [7] Luhn HP. “The automatic creation of literature abstracts”. IBM Journal of Research and Development, 2(2), 159-165, 1958.
  • [8] Altan Z. “A Turkish automatic text summarization system”. IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, 16-18 February 2004.
  • [9] Yolcular Oguz B. Literature Mining; A Real-Time WEB-based Text Mining Application. PhD Thesis, Akdeniz University, Antalya, Turkey, 2016.
  • [10] Kim M, Singh MD, Lee M. “Towards abstraction from extraction: Multiple timescale gated recurrent unit for summarization”. 1st Workshop on Representation Learning for NLP, Berlin, Germany, 11 August 2016.
  • [11] Collins E, Augenstein I, Riedel S. “A supervised approach to extractive summarisation of scientific papers”. 21st Conference on Computational Natural Language Learning, Vancouver, Canada, 3-4 August 2017.
  • [12] Wang L, Yao J, Tao Y, Zhong L, Liu W, Du Q. “A reinforced topic-aware convolutional sequence-to-sequence model for abstractive text summarization”. International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13-19 July 2018.
  • [13] Nikolov NI, Pfeiffer M, Hahnloser RH. “Data-driven Summarization of Scientific Articles”. Language Resources and Evaluation Conference, Miyazaki, Japan, 7-12 May 2018.
  • [14] Nallapati R, Zhou B, Gulcehre C, Xiang B. “Abstractive text summarization using sequence-to-sequence RNNs and beyond”. The SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, 11-12 August 2016.
  • [15] Sirohi NK, Bansal M, Rajan SN. “Recent approaches for text summarization using machine learning & LSTM”. Journal on Big Data, 3(1), 35-47, 2021.
  • [16] Lloret E, Roma-Ferri MT, Palomar M. “Compendium: a text summarization system for generating abstracts of research papers”. Data & Knowledge Engineering, 88, 164-175, 2013.
  • [17] Hinton GE, Osindero S, Teh YW. “A fast learning algorithm for deep belief nets”. Neural Computation, 18(7), 1527-1554, 2006.
  • [18] Hua Y, Guo J, Zhao H. “Deep belief networks and deep learning”. International Conference on Intelligent Computing and Internet of Things, Harbin, China, 17-18 January 2015.
  • [19] Savas BK, Becerikli YA. “Deep learning approach to driver fatigue detection via mouth state analyses and yawning detection”. Journal of Computer Engineering, 23(3), 24-30, 2021.
  • [20] Sar KT. Time Based Sentiment Analysis Using Artificial Neural Networks and BERT Language Model: Exploring Comments on WhatsApp's New Privacy Policy. MSc Thesis, Dokuz Eylul University, Izmir, Turkey, 2021.
  • [21] Devlin J, Chang MW, Lee K, Toutanova K. “BERT: Pre-training of deep bidirectional transformers for language understanding”. arXiv, 2019. https://arxiv.org/pdf/1810.04805
  • [22] Ozkan B. Dialog Intent Classification Using NLP Methods. MSc Thesis, Bahcesehir University, Istanbul, Turkey, 2021. Miller M. “Leveraging BERT for Extractive Text Summarization on Lectures”. arXiv, 2019. https://arxiv.org/pdf/1906.04165
  • [23] Zhang T, Kishore V, Wu F, Weinberger KQ, Artzi Y. “BERTScore: Evaluating text generation with BERT”. arXiv, 2019. https://arxiv.org/pdf/1904.09675
  • [24] Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Dean J. “Google's neural machine translation system: Bridging the gap between human and machine translation”. arXiv, 2016. https://arxiv.org/pdf/1609.08144
  • [25] Lewis M, Liu Y, Goyal N, Ghazvininejad M, Mohamed A, Levy O, Zettlemoyer L. “BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension”. arXiv, 2019. https://arxiv.org/pdf/1910.13461
  • [26] Yuan W, Neubig G, Liu P. “BARTScore: Evaluating generated text as text generation”. Conference on Neural Information Processing Systems, Virtual Conference, 6-14 December 2021.
  • [27] GitHub. “An Approach to Automatic Trending Tweet Summarization”. https://github.com/yuva29/twitter-trends-summarizer (04/20/2018).
  • [28] Das SJ, Murakami R, Chakraborty B. “Development of a two-step LDA based aspect extraction technique for review summarization”. International Journal of Applied Science and Engineering, 18(1), 1-18, 2021.
  • [29] Karlbom H, Clifton A. “Abstractive Podcast Summarization using BART with Longformer Attention”. The 29th Text Retrieval Conference, Virtual Conference, 15–19 November 2021.
  • [30] Slamet C, Atmadja AR, Maylawati DS, Lestari RS, Darmalaksana W, Ramdhani MA. “Automated text summarization for Indonesian article using vector space model”. The 2nd Annual Applied Science and Engineering Conference (AASEC 2017), Bandung, Indonesia, 24 August 2017.


Details

Primary Language: English
Subjects: Algorithms and Theory of Computation
Section: Article
Authors

Nazan Kemaloğlu Alagöz

Ecir Uğur Küçüksille

Publication Date: August 30, 2024
Published in Issue: Year 2024, Volume: 30, Issue: 4

How to Cite

APA Kemaloğlu Alagöz, N., & Küçüksille, E. U. (2024). System of automatic scientific article summarization in Turkish. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 30(4), 470-481.





Creative Commons License
This journal is licensed under a Creative Commons Attribution 4.0 International License.