Bilimsel Makalelerin Atıf Sayısı Tahmini (Citation Count Prediction of Academic Papers)

Hakan Ezgi Kızılöz

doi:10.31590/ejosat.araconf48

Research Article

Year 2020, Ejosat Special Issue 2020 (ARACONF), 370 - 375, 01.04.2020

Cited By: 8

Abstract

Measuring the impact of scientific papers is neither an easy nor a uniform process. Citation counts play an important role in measuring that impact. However, the citation count of a paper is not available at the moment of publication: the paper must first be published, be noticed by the community, and receive citations, which takes a considerable amount of time. In this study, we simplify the problem of unavailable citation counts and build a deep learning model that predicts whether a paper will receive at least one citation within one year of its publication. Our model uses Long Short-Term Memory (LSTM) to capture the relationships between word sequences. We also analyze how performance changes when the model uses only the abstract of a paper instead of its full text. Our experiments use publicly available datasets. The full text of the papers comes from a dataset available on Kaggle, while the abstracts, metadata features, and first-year citation counts are extracted from Microsoft Academic Graph. The results show that using the full text yields higher accuracy. However, training on the full text takes far longer than training on abstracts. Moreover, paper abstracts are easier to access than full texts. Finally, the trained model predicts that this paper will receive at least one citation in its first year of publication.
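The binary target described in the abstract (at least one citation within one year of publication) and the accuracy measure can be sketched as follows. This is only an illustration: the function names, the 365-day window, and the sample dates are assumptions made here, not details taken from the paper.

```python
from datetime import date, timedelta


def cited_within_first_year(published, citation_dates, window_days=365):
    """Return 1 if any citation falls within the window after publication, else 0.

    The 365-day window is an assumption; the abstract only says "one year".
    """
    deadline = published + timedelta(days=window_days)
    return int(any(published <= d <= deadline for d in citation_dates))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true binary labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical papers: (publication date, dates of citing papers).
papers = [
    (date(2020, 4, 1), [date(2020, 11, 15)]),  # cited in the first year
    (date(2020, 4, 1), [date(2021, 6, 1)]),    # first citation arrives too late
    (date(2020, 4, 1), []),                    # never cited
]
labels = [cited_within_first_year(pub, cites) for pub, cites in papers]
print(labels)
print(accuracy(labels, [1, 1, 0]))
```

Reducing the citation count to a yes/no label turns a regression problem into binary classification, which is the simplification the abstract describes.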

Keywords

Deep Learning, Long Short-Term Memory, Text Mining, Supervised Learning, Citation Prediction

Citation Count Prediction of Academic Papers

Abstract

Measuring the impact of scientific papers is not a straightforward process, but citation counts play a significant role in it. The citation count of a paper, however, is not available until the paper has been published and enough time has passed for it to spread through the community. To work around this issue, we relax the problem and build a deep learning model that predicts whether a paper will receive at least one citation within one year of its publication. Our model employs Long Short-Term Memory (LSTM) to capture the relationships between word sequences. We also analyze how using the abstract instead of the full text of a paper affects performance. We use publicly available datasets in our experiments: Kaggle for the full text of papers, and Microsoft Academic Graph for the abstracts, metadata features, and first-year citation counts. Our results show that using the full text leads to higher accuracy, but at the cost of a much longer training time. Additionally, paper abstracts are easier to access than full texts. Finally, our model predicts that this paper will receive at least one citation during its first year of publication.
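To make the idea of an LSTM reading a word sequence concrete, the following is a minimal pure-Python sketch of a single-layer LSTM followed by a sigmoid output unit. It is not the model from the paper: the class name, the whitespace tokenizer, the layer sizes, and the random (untrained) weights are all assumptions chosen for illustration, so the printed probability carries no meaning beyond showing the data flow.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_vocab(texts):
    """Map each lowercase whitespace token to an integer id; 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    return [vocab.get(word, 0) for word in text.lower().split()]


class TinyLSTMClassifier:
    """Untrained LSTM binary classifier (hypothetical, for illustration only)."""

    GATES = ("input", "forget", "candidate", "output")

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.embed = rand_matrix(vocab_size, embed_dim, rng)
        # One weight matrix and one bias vector per gate.
        self.weights = {k: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng)
                        for k in self.GATES}
        self.biases = {k: [0.0] * hidden_dim for k in self.GATES}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def step(self, x, h, c):
        """One LSTM time step: returns the new hidden state and cell state."""
        z = x + h  # concatenate the word embedding with the previous hidden state
        pre = {k: [a + b for a, b in zip(matvec(self.weights[k], z), self.biases[k])]
               for k in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        o = [sigmoid(v) for v in pre["output"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def predict_proba(self, token_ids):
        """Probability that the paper is cited in its first year (random weights here)."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for token in token_ids:
            h, c = self.step(self.embed[token], h, c)
        logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
        return sigmoid(logit)


abstracts = ["deep learning model predicts citations", "a short note"]
vocab = build_vocab(abstracts)
model = TinyLSTMClassifier(vocab_size=len(vocab))
print(model.predict_proba(encode(abstracts[0], vocab)))
```

Because the recurrence runs once per token, the cost of a forward pass grows with sequence length, which is consistent with the abstract's observation that full-text inputs take much longer to train on than abstracts.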

Keywords

Deep Learning, LSTM, Text Mining, Supervised Learning, Citation Prediction

There are 14 citations in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Hakan Ezgi Kızılöz 0000-0002-4815-9024
Publication Date	April 1, 2020
Published in Issue	Year 2020 Ejosat Special Issue 2020 (ARACONF)

Cite

APA	Kızılöz, H. E. (2020). Bilimsel Makalelerin Atıf Sayısı Tahmini. Avrupa Bilim Ve Teknoloji Dergisi, 370-375. https://doi.org/10.31590/ejosat.araconf48