Research Article

Bilimsel Makalelerin Atıf Sayısı Tahmini (Citation Count Prediction of Scientific Articles)

Year 2020, 370 - 375, 01.04.2020
https://doi.org/10.31590/ejosat.araconf48

Abstract

Measuring the impact of scientific articles is neither easy nor uniform. Citation counts play an important role in measuring that impact. However, an article's citation count is not available at the time of publication: the article must first be published, noticed by the community, and cited, which can take a considerable amount of time. In this study, we simplify the problem of unavailable citation counts and build a deep learning model that predicts whether an article will receive at least one citation within one year of its publication. The model uses Long Short-Term Memory (LSTM) to capture the relationships between word sequences. We also analyze how performance changes when the model is given only the abstract instead of the full text. Our experiments use publicly available datasets. The full text of the articles comes from a dataset hosted on Kaggle, while abstracts, metadata features, and first-year citation counts are extracted from Microsoft Academic Graph. The results show that using the full text yields higher accuracy. However, training on full text takes far longer than training on abstracts. Moreover, abstracts are easier to obtain than full texts. Finally, the trained model predicts that this article will receive at least one citation in its first year of publication.
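The simplified prediction target described in the abstract can be illustrated with a minimal sketch. It assumes each paper comes with a first-year citation count (as the abstract says is extracted from Microsoft Academic Graph) and turns that count into a binary label. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def first_year_label(first_year_citations: int) -> int:
    """Return 1 if the paper got at least one citation in its first year, else 0."""
    if first_year_citations < 0:
        raise ValueError("citation count cannot be negative")
    return 1 if first_year_citations >= 1 else 0


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of papers whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Toy data (hypothetical): first-year citation counts of five papers.
counts = [0, 3, 1, 0, 7]
labels = [first_year_label(n) for n in counts]

# Hypothetical model outputs for the same five papers.
predictions = [0, 1, 0, 0, 1]
score = accuracy(labels, predictions)
```

Framing the task this way replaces a regression over raw counts with a two-class decision, which is why accuracy is a natural metric for comparing the abstract-only and full-text settings.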


Citation Count Prediction of Academic Papers


Abstract

Measuring the impact of scientific papers is not a straightforward process, but citation counts play a significant role in it. The citation count of a paper, however, is not available until the paper has been published and a substantial amount of time has passed for it to spread through the community. To overcome this issue, we relax the problem by building a deep learning model that predicts whether a paper will receive at least one citation within one year of its publication. Our model employs Long Short-Term Memory (LSTM) to capture the relationship between word sequences. We also analyze the effect on performance of using the abstract versus the full text of papers. We use publicly available datasets in our experiments: Kaggle for the full text of papers, and Microsoft Academic Graph for the abstract, metadata features, and first-year citation counts. Our results show that using the full text leads to higher accuracy, but at the cost of a much longer training time. Additionally, paper abstracts are easier to access than the full text. Finally, our model predicts that this paper will receive at least one citation during its first year of publication.
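To make the modeling idea concrete, here is a minimal sketch of how an LSTM can read a sequence of word ids and emit a probability for the "at least one citation" class. It is not the author's architecture: the vocabulary size, embedding and hidden dimensions, the random untrained weights, and all function names are assumptions chosen for illustration, and the code uses only the Python standard library instead of a deep learning framework.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(vocab_size, embed_dim, hidden_dim, seed=0):
    """Random, untrained parameters; every size here is hypothetical."""
    rng = random.Random(seed)
    gates = "ifog"  # input gate, forget gate, output gate, candidate cell
    return {
        "embed": rand_matrix(rng, vocab_size, embed_dim),
        "W": {g: rand_matrix(rng, hidden_dim, embed_dim) for g in gates},
        "U": {g: rand_matrix(rng, hidden_dim, hidden_dim) for g in gates},
        "b": {g: [0.0] * hidden_dim for g in gates},
        "w_out": [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)],
        "b_out": 0.0,
    }


def lstm_step(x, h, c, p):
    """One LSTM time step: returns the new hidden state and cell state."""
    def affine(g):
        return [
            sum(w * xi for w, xi in zip(p["W"][g][j], x))
            + sum(u * hi for u, hi in zip(p["U"][g][j], h))
            + p["b"][g][j]
            for j in range(len(h))
        ]

    i = [sigmoid(v) for v in affine("i")]
    f = [sigmoid(v) for v in affine("f")]
    o = [sigmoid(v) for v in affine("o")]
    g = [math.tanh(v) for v in affine("g")]
    # Keep part of the old cell state and add the gated candidate.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def predict_cited(token_ids, p):
    """Probability that the text belongs to the 'cited in first year' class."""
    hidden_dim = len(p["w_out"])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for t in token_ids:
        h, c = lstm_step(p["embed"][t], h, c, p)
    logit = sum(w * hj for w, hj in zip(p["w_out"], h)) + p["b_out"]
    return sigmoid(logit)


params = init_params(vocab_size=50, embed_dim=8, hidden_dim=4, seed=0)
abstract_ids = [3, 17, 42, 8, 17]  # hypothetical word ids of a short abstract
prob = predict_cited(abstract_ids, params)
```

Because the loop runs one step per token, a full-text input with thousands of words costs far more per paper than an abstract with a few hundred, which is consistent with the training-time trade-off reported in the abstract.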


Details

Primary Language Turkish
Subjects Engineering
Journal Section Articles
Authors

Hakan Ezgi KIZILÖZ
TÜRK HAVA KURUMU ÜNİVERSİTESİ
0000-0002-4815-9024
Türkiye

Publication Date April 1, 2020
Published in Issue Year 2020, Volume , Issue

Cite

BibTeX @article{ejosat711034, journal = {Avrupa Bilim ve Teknoloji Dergisi}, eissn = {2148-2683}, address = {}, publisher = {Osman SAĞDIÇ}, year = {2020}, pages = {370--375}, doi = {10.31590/ejosat.araconf48}, title = {Bilimsel Makalelerin Atıf Sayısı Tahmini}, key = {cite}, author = {Kızılöz, Hakan Ezgi}}
APA Kızılöz, H. E. (2020). Bilimsel Makalelerin Atıf Sayısı Tahmini. Avrupa Bilim ve Teknoloji Dergisi, Ejosat Special Issue 2020 (ARACONF), 370-375. DOI: 10.31590/ejosat.araconf48
MLA Kızılöz, H. E. "Bilimsel Makalelerin Atıf Sayısı Tahmini". Avrupa Bilim ve Teknoloji Dergisi (2020): 370-375 <https://dergipark.org.tr/en/pub/ejosat/issue/53473/711034>
Chicago Kızılöz, H. E. "Bilimsel Makalelerin Atıf Sayısı Tahmini". Avrupa Bilim ve Teknoloji Dergisi (2020): 370-375
ISNAD Kızılöz, Hakan Ezgi. "Bilimsel Makalelerin Atıf Sayısı Tahmini". Avrupa Bilim ve Teknoloji Dergisi / (April 2020): 370-375. https://doi.org/10.31590/ejosat.araconf48
AMA Kızılöz H. E. Bilimsel Makalelerin Atıf Sayısı Tahmini. EJOSAT. 2020; 370-375.
Vancouver Kızılöz H. E. Bilimsel Makalelerin Atıf Sayısı Tahmini. Avrupa Bilim ve Teknoloji Dergisi. 2020; 370-375.
IEEE H. E. Kızılöz, "Bilimsel Makalelerin Atıf Sayısı Tahmini", Avrupa Bilim ve Teknoloji Dergisi, pp. 370-375, Apr. 2020, doi:10.31590/ejosat.araconf48
