Doğal Dil İşleme Teknikleri Kullanılarak Disiplinler Arası Lisansüstü Ders İçeriği Hazırlanması

Ahmet Albayrak

doi:10.17671/gazibtd.714447

Research Article

Preparing Interdisciplinary Graduate Course Contents Using Natural Language Processing Techniques

Year 2020, Volume: 13 Issue: 4, 373 - 383, 30.10.2020


Cited By: 4

Abstract

In this study, natural language processing (NLP) methods, drawn from the family of data mining techniques, were used to prepare the content of an interdisciplinary course planned to be offered at the graduate level. The graduate course is called Data Science and Applications. Data science is an interdisciplinary field that draws fundamentally on statistics and computer science. No course with a similar name appears in the literature. Data science is a data-first approach applied in many fields; because its application area is so broad, the course was named Data Science and Applications. Papers published at a long-running IEEE conference, Data Science and Advanced Analytics, which will be held for the 7th time in 2020, served as the data set for determining the course content. The data set consists of papers accepted to the conference in 2015, 2016, 2017, and 2018, and the course content was determined by analyzing the paper titles and keywords with NLP techniques. After the initial data set was prepared, the data were cleaned and the paper titles were split into words. Word frequencies were then computed over this tokenized data set, and the twenty most frequent words were selected. The Apache Spark NTK package was used for the NLP steps. Because the 20 selected words are atomic (single words rather than phrases), the main topic headings were derived from them by induction.
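The clean, tokenize, count, and select-top-20 steps described in the abstract can be sketched in plain Python. This is not the author's Spark-based code: it is a minimal stand-in that assumes the titles arrive as a list of strings, treats lowercasing, punctuation stripping, and removal of a small hypothetical stop-word list as the "cleaning" step, and uses `collections.Counter` in place of a distributed word count. The names `clean_and_tokenize`, `top_words`, and `STOP_WORDS`, and the sample titles, are illustrative only.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical stop-word list for illustration; the paper does not
# publish the exact cleaning rules it applied.
STOP_WORDS = frozenset({
    "a", "an", "and", "by", "for", "from", "in", "of",
    "on", "the", "to", "using", "via", "with",
})

# A token is a run of lowercase letters/digits, optionally joined by hyphens.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def clean_and_tokenize(title: str) -> list[str]:
    """Lowercase a title, split it into word tokens, and drop stop words."""
    tokens = TOKEN_RE.findall(title.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 1]


def top_words(titles: Iterable[str], n: int = 20) -> list[tuple[str, int]]:
    """Count word frequencies across all titles and return the n most frequent."""
    counts: Counter[str] = Counter()
    for title in titles:
        counts.update(clean_and_tokenize(title))
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up example titles, not taken from the DSAA data set.
    sample_titles = [
        "Deep Learning for Time Series Classification",
        "A Scalable Approach to Anomaly Detection in Time Series",
        "Learning Graph Embeddings with Deep Neural Networks",
    ]
    for word, freq in top_words(sample_titles, n=5):
        print(word, freq)
```

On a Spark RDD of titles, the same shape would be roughly `titles_rdd.flatMap(clean_and_tokenize).map(lambda w: (w, 1)).reduceByKey(operator.add).takeOrdered(20, key=lambda kv: -kv[1])`; the single-machine version is shown because a few thousand paper titles fit comfortably in memory. Grouping the resulting atomic words into topic headings was done inductively in the study and is not automated here.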

Keywords

data science, natural language processing, course content preparation, data scientist, topic modeling





The article's reference list contains 27 entries.

Details

Primary Language	Turkish
Subjects	Computer Software
Journal Section	Articles
Authors	Ahmet Albayrak (ORCID: 0000-0002-2166-1102)
Publication Date	October 30, 2020
Submission Date	April 4, 2020
Published in Issue	Year 2020 Volume: 13 Issue: 4

Cite

APA	Albayrak, A. (2020). Doğal Dil İşleme Teknikleri Kullanılarak Disiplinler Arası Lisansüstü Ders İçeriği Hazırlanması. Bilişim Teknolojileri Dergisi, 13(4), 373-383. https://doi.org/10.17671/gazibtd.714447

Cited By

Duygu Analizi ve Topluluk Öğrenmesi Yaklaşımları ile Kullanıcı Yorumlarının Analizi

Düzce Üniversitesi Bilim ve Teknoloji Dergisi

https://doi.org/10.29130/dubited.1102181

DİJİTAL BİLGİ VE İLETİŞİM TEKNOLOJİLERİNE (BİT) DAİR KEŞFEDİCİ BİR ÇÖZÜMLEME: PAMUKKALE ÜNİVERSİTESİ (PAU) SİYASET BİLİMİ VE KAMU YÖNETİMİ (SBKY) ÖRNEĞİ

Elektronik Sosyal Bilimler Dergisi

https://doi.org/10.17755/esosder.1201879

BAŞLIĞINDA “DATA SCİENCE” İFADESİ GEÇEN ULUSLARARASI KONGRELERDE SUNULAN BİLDİRİLERİN METİN MADENCİLİĞİ YÖNTEMLERİ İLE İNCELENMESİ

Nicel Bilimler Dergisi

https://doi.org/10.51541/nicel.1075225

Netflix verileri üzerinde TF-IDF algoritması ve Kosinüs benzerliği ile bir İçerik Öneri Sistemi Uygulaması

AJIT-e Online Academic Journal of Information Technology

https://doi.org/10.5824/ajite.2022.01.002.x
