Summarising big data: public GitHub dataset for software engineering challenges

Abdulkadir Şeker; Banu Diri; Halil Arslan; Fatih Amasyalı

doi:10.17776/csj.728932

Research Article

Year 2020, Volume: 41 Issue: 3, 720 - 724, 30.09.2020

Cited By: 1

Abstract

In open-source software development environments, the textual, numerical, and relationship-based data that are generated are of interest to researchers. Various datasets make this data available, and it is frequently used in fields such as software engineering and natural language processing. However, because these datasets contain all the data in the environment, processing them means handling terabytes of data. For this reason, almost all studies that use GitHub data work with a subset filtered according to certain criteria. Because each study then uses a different dataset, comparing the accuracy of the studies is quite difficult. To solve this problem, a common dataset that supports work on many software engineering problems was created and shared with researchers.

Keywords

GitHub, GHTorrent, big data

References

V. Cosentino, J. L. Canovas Izquierdo, and J. Cabot. Findings from GitHub: methods, datasets and limitations. Proceedings of the 13th International Workshop on Mining Software Repositories, (2016) 137–141.
V. Cosentino, J. L. Canovas Izquierdo, and J. Cabot. A Systematic Mapping Study of Software Development With GitHub. IEEE Access, 5 (2017) 7173–7192.
Z. Kotti and D. Spinellis. Standing on shoulders or feet?: the usage of the MSR data papers. Proceedings of the 16th International Conference on Mining Software Repositories, (2019) 565–576.
G. Gousios. The GHTorrent dataset and tool suite. Proceedings of the 10th Working Conference on Mining Software Repositories, (2013) 233–236.
Y. Zhang, G. Yin, Y. Yu, and H. Wang. Investigating social media in GitHub’s pull-requests: a case study on Ruby on Rails. Proceedings of the 1st International Workshop on Crowd-based Software Development Methods and Technologies - CrowdSoft 2014, (2014) 37–41.
E. van der Veen, G. Gousios, and A. Zaidman. Automatically Prioritizing Pull Requests. 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, (2015) 357–361.
Y. Yu, H. Wang, V. Filkov, P. Devanbu, and B. Vasilescu. Wait for It: Determinants of Pull Request Evaluation Latency on GitHub. 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, (2015) 367–371.
M. L. de L. Júnior, D. M. Soares, A. Plastino, and L. Murta. Automatic assignment of integrators to pull requests: The importance of selecting appropriate attributes. J. Syst. Softw., 144 (2018) 181–196.
G. Zhao, D. A. da Costa, and Y. Zou. Improving the Pull Requests Review Process Using Learning-to-rank Algorithms. Empir. Softw. Eng., (2019) 1–31.

There are 9 references in total.

Details

Primary Language	English
Subjects	Engineering
Section	Engineering Sciences
Authors	Abdulkadir Şeker 0000-0002-4552-2676 Banu Diri 0000-0002-4052-0049 Halil Arslan 0000-0003-3286-5159 Fatih Amasyalı 0000-0002-0404-5973
Publication Date	30 September 2020
Submission Date	29 April 2020
Acceptance Date	12 June 2020
Published in Issue	Year 2020, Volume: 41 Issue: 3

Cite

APA	Şeker, A., Diri, B., Arslan, H., Amasyalı, F. (2020). Summarising big data: public GitHub dataset for software engineering challenges. Cumhuriyet Science Journal, 41(3), 720-724. https://doi.org/10.17776/csj.728932

Cumhuriyet Science Journal

Cited By

New Developer Metrics for Open Source Software Development Challenges: An Empirical Study of Project Recommendation Systems

Applied Sciences

Abdulkadir Şeker

https://doi.org/10.3390/app11030920