Research Article

Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği

Year 2025, Volume: 8 Issue: 3, 1405 - 1426, 16.06.2025
https://doi.org/10.47495/okufbed.1570542

Abstract

Today, computer-aided tools play an important role in improving access to information and optimizing efficiency in academic research. In this context, artificial intelligence and natural language processing techniques lighten researchers' workload and help them obtain results faster. This study presents a new prototype that aims to increase efficiency in the research process, particularly in the literature review and source-finding stages. The prototype automatically downloads a specified number of academic articles in PDF format from the Google Scholar platform, based on keywords entered by the user. It then summarizes the downloaded articles, offering a choice of three summarization algorithms: BERTSUM, TextRank, and LexRank. Through a user-friendly interface, researchers can enter keywords, supply a text for similarity analysis, and select their preferred summarization algorithm. The resulting summaries and similarity scores are presented clearly in the interface. To compare the content of the downloaded articles with the user-supplied text, the prototype uses TF-IDF (Term Frequency-Inverse Document Frequency), which measures the importance of words in the texts, together with cosine similarity, which measures how similar the texts are. This makes it possible to identify the articles, and the sections within them, that are relevant to the topics the user is looking for. To evaluate the quality of the summaries, the BLEU, ROUGE, and METEOR metrics were used; these measure the overlap, precision, and semantic integrity of the summaries with respect to the reference texts entered by users. The evaluation showed that the summaries produced by the prototype reached high accuracy values. The prototype applies various preprocessing steps to handle the structural differences among PDF files, which can vary in format and layout. As a result, summarization and similarity analysis can be performed consistently on articles from different sources.
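The comparison step described above, TF-IDF weighting followed by cosine similarity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_by_similarity`), the simple regex tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration, and the sketch uses only the Python standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens; a real system would
    # likely add stop-word removal and stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    # Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * idf[t] for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, documents):
    """Return (document index, score) pairs, most similar first."""
    vecs = tfidf_vectors([query] + documents)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return sorted(enumerate(scores), key=lambda p: p[1], reverse=True)
```

Calling `rank_by_similarity(user_text, article_texts)` would order the downloaded articles by how closely their vocabulary matches the user's text. Documents that share no terms with the query score 0.0.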


A Prototype Study on Text Summarization and Similarity Analysis: Google Scholar Example


Abstract

Today, computer-aided tools play an important role in improving access to information and optimizing efficiency in academic research. In this context, the use of artificial intelligence and natural language processing techniques reduces the workload of researchers and enables them to obtain results faster. In this study, a new prototype was developed that aims to increase efficiency, especially in the literature review and source-finding stages of the research process. The prototype automatically downloads a specified number of academic articles in PDF format from the Google Scholar platform, using the keywords entered by users. It then extracts summaries of the downloaded articles, offering three summarization algorithm options: BERTSUM, TextRank, and LexRank. Through a user-friendly interface, researchers can enter the desired keywords, provide a text for similarity analysis, and select the summarization algorithm they prefer. The resulting summaries and similarity scores are presented clearly in the interface. To compare the content of the downloaded articles with the text entered by the user, TF-IDF (Term Frequency-Inverse Document Frequency) and cosine similarity algorithms, which measure the importance and similarity of the words in the texts, were used. In this way, articles and sections relevant to the topics searched by the user can be detected. In addition, to evaluate the quality of the summaries produced by the prototype, the BLEU, ROUGE, and METEOR metrics were used; these measure the overlap, precision, and semantic integrity of the summaries with respect to the reference texts. This evaluation showed that the summaries produced by the prototype reached high accuracy values. The prototype uses various preprocessing steps to address the structural differences of PDF files, which may have different formats and layouts. In this way, summaries and similarity analyses of articles from different sources can be performed consistently.
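As an illustration of the graph-based extractive summarization that TextRank performs, the sketch below ranks sentences by word overlap and keeps the top-scoring ones. It is a simplified approximation under stated assumptions, not the prototype's code: the regex sentence splitter, the use of word sets rather than token lists, the fixed iteration count, and the function names are all hypothetical choices.

```python
import math
import re


def split_sentences(text):
    # Assumption: sentences end with ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def overlap_similarity(a, b):
    """Shared words normalized by the log of each sentence's size."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0.0:  # both sentences contain a single word
        return 0.0
    return len(a & b) / denom


def textrank_summary(text, k=2, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    bags = [word_set(s) for s in sentences]
    # Weighted, undirected sentence graph with no self-loops.
    weights = [
        [overlap_similarity(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # PageRank-style power iteration over the sentence graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the rest of the text receives only the baseline score `(1 - damping) / n`, so it is the first to be dropped from the summary. LexRank follows the same graph-ranking idea but typically builds its edges from TF-IDF cosine similarity between sentences.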


Details

Primary Language Turkish
Subjects Machine Learning (Other)
Section Research Article
Authors

Onur Şahin 0009-0000-8955-658X

Rıdvan Yayla 0000-0002-1105-9169

Submission Date October 20, 2024
Acceptance Date February 12, 2025
Publication Date June 16, 2025
Published in Issue Year 2025 Volume: 8 Issue: 3

Cite

APA Şahin, O., & Yayla, R. (2025). Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 8(3), 1405-1426. https://doi.org/10.47495/okufbed.1570542
AMA 1. Şahin O, Yayla R. Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi. 2025;8(3):1405-1426. doi:10.47495/okufbed.1570542
Chicago Şahin, Onur, and Rıdvan Yayla. 2025. “Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 8 (3): 1405-26. https://doi.org/10.47495/okufbed.1570542.
EndNote Şahin O, Yayla R (June 1, 2025) Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 8 3 1405–1426.
IEEE [1] O. Şahin and R. Yayla, “Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği”, Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 8, no. 3, pp. 1405–1426, Jun. 2025, doi: 10.47495/okufbed.1570542.
ISNAD Şahin, Onur - Yayla, Rıdvan. “Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi 8/3 (June 1, 2025): 1405-1426. https://doi.org/10.47495/okufbed.1570542.
JAMA 1. Şahin O, Yayla R. Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi. 2025;8:1405–1426.
MLA Şahin, Onur, and Rıdvan Yayla. “Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği”. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi, vol. 8, no. 3, June 2025, pp. 1405-26, doi:10.47495/okufbed.1570542.
Vancouver 1. Şahin O, Yayla R. Metin Özetleme ve Benzerlik Analizi Üzerine Prototip Bir Çalışma: Google Scholar Örneği. Osmaniye Korkut Ata Üniversitesi Fen Bilimleri Enstitüsü Dergisi [Internet]. June 1, 2025;8(3):1405-26. Available from: https://izlik.org/JA83NP34PT


* International Peer Reviewed Journal

* No article publication fee or similar charge is requested from the author(s) (free submission and publication).

* Five issues are published per year, in January, March, June, September, and December.

* The journal publishes articles in Turkish and English.

* The journal is open access.

Creative Commons License

This website is licensed under a Creative Commons Attribution 4.0 International License.