Bilişim Teknolojileri Dergisi

ISSN: 1307-9697, 2147-0715

Gazi University

DOI: 10.17671/gazibtd.714447

Computer Software


Preparing Interdisciplinary Graduate Course Contents Using Natural Language Processing Techniques

Doğal Dil İşleme Teknikleri Kullanılarak Disiplinler Arası Lisansüstü Ders İçeriği Hazırlanması

Ahmet Albayrak (ORCID: https://orcid.org/0000-0002-2166-1102)

Düzce University, Faculty of Technology

Published: October 30, 2020

Volume 13, Issue 4, pp. 373–383. Received: April 4, 2020; accepted: August 11, 2020.

2008


In this study, natural language processing (NLP) methods, one family of data mining techniques, were used to prepare the content of an interdisciplinary course planned to be offered at the graduate level. The course is called Data Science and Applications. Data science is an interdisciplinary field that fundamentally encompasses statistics and computer science. No course with a similar name appears in the literature. Data science is a data-first approach applied in many fields; because its application area is so broad, the course was named Data Science and Applications. Papers published at a long-running IEEE conference were used as the data set for determining the course content. The conference, Data Science and Advanced Analytics, will be held for the seventh time this year. Papers accepted to the conference in 2015, 2016, 2017 and 2018 make up the data set. The titles and keywords of these papers were analyzed with NLP techniques to determine the course content. After the initial data set was compiled, the data were cleaned and the paper titles were split into words. Word frequencies were then computed over the tokenized data set, and the twenty most frequent words were selected. The Apache Spark NTK package was used for the NLP process. Because the 20 selected words are atomic (single terms), the main topic headings were derived from them by induction.
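The cleaning, tokenization and top-20 frequency steps described above can be sketched as follows. This is a minimal illustration in plain Python (the standard library's `re` and `collections.Counter`), not the Apache Spark package the study used; the stop-word list, the cleaning rule (lowercasing and stripping punctuation) and the sample titles are hypothetical assumptions, since the exact preprocessing is not specified.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the study's actual cleaning rules are not given.
STOP_WORDS = {"a", "an", "and", "by", "for", "from", "in", "of",
              "on", "the", "to", "using", "via", "with"}


def clean_and_tokenize(title: str) -> list[str]:
    """Lowercase a paper title, strip punctuation, split into words, drop stop words."""
    text = title.lower()
    # Replace anything that is not a lowercase letter, digit or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [w for w in text.split() if w not in STOP_WORDS and len(w) > 1]


def top_words(titles: list[str], k: int = 20) -> list[tuple[str, int]]:
    """Count word frequencies over all titles and return the k most frequent words."""
    counts = Counter()
    for title in titles:
        counts.update(clean_and_tokenize(title))
    # most_common keeps first-seen order among words with equal counts.
    return counts.most_common(k)


if __name__ == "__main__":
    # Made-up titles for illustration only; not taken from the conference data set.
    sample_titles = [
        "Deep Learning for Time Series Forecasting",
        "Scalable Clustering of Time Series Data",
        "A Deep Learning Approach to Anomaly Detection",
    ]
    for word, freq in top_words(sample_titles, k=5):
        print(word, freq)
```

With the real data set, `top_words(titles, k=20)` would yield the twenty candidate terms; the subsequent inductive grouping of those terms into topic headings is a manual step and is not modeled here.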



Keywords: data science, natural language processing, course content preparation, data scientist, topic modeling
