Research Article

Preparing Interdisciplinary Graduate Course Contents Using Natural Language Processing Techniques

Year 2020, Volume: 13 Issue: 4, 373 - 383, 30.10.2020
https://doi.org/10.17671/gazibtd.714447

Abstract

In this study, natural language processing methods, a family of data mining techniques, were used to prepare the content of an interdisciplinary course planned to be offered at the graduate level. The course is called Data Science and Applications. Data science is an interdisciplinary concept that fundamentally combines statistics and computer science. No course with a similar name appears in the literature. Data science is an approach that puts data first and is applied in many fields; because its application area is so wide, the course was named Data Science and Applications. Papers published at a conference that IEEE has organized for years were used as the data set for determining the course content. The conference, Data Science and Advanced Analytics, will be held for the 7th time this year. Papers accepted to the conference in 2015, 2016, 2017 and 2018 were included in the data set. The titles and keywords of the papers were analyzed with natural language processing techniques to determine the course content. After the data set was prepared, it was cleaned, and the paper titles were then split into words. Word frequencies were computed over the resulting tokenized data set, and the twenty most frequent words were selected. The Apache Spark NTK package was used for the natural language processing. Since the 20 selected words are atomic, the main topic titles were determined by induction.
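The pipeline described in the abstract (clean the titles, split them into words, count frequencies, keep the most frequent words) can be illustrated with a minimal sketch. It uses only the Python standard library rather than the Spark-based tooling named in the abstract, and the stop-word list, the function names `tokenize` and `top_words`, and the sample titles are all assumptions made for illustration, not the study's actual data or code.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which words were removed.
STOP_WORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "with"}


def tokenize(title):
    """Lowercase a title and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


def top_words(titles, n=20):
    """Return the n most frequent non-stop-word tokens across all titles."""
    counts = Counter()
    for title in titles:
        counts.update(tok for tok in tokenize(title) if tok not in STOP_WORDS)
    return counts.most_common(n)


# Made-up titles for illustration only; not taken from the conference data set.
sample_titles = [
    "Deep Learning for Time Series Data",
    "Learning Graph Embeddings from Network Data",
    "A Survey of Data Privacy in Network Analysis",
]

result = top_words(sample_titles, n=3)
print(result)
```

Calling `top_words(titles)` with the default `n=20` would mirror the selection of the twenty most frequent words; grouping those words into broader course topics is the manual, inductive step and is not automated here.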

Details

Primary Language Turkish
Subjects Computer Software
Section Articles
Authors

Ahmet Albayrak 0000-0002-2166-1102

Publication Date October 30, 2020
Submission Date April 4, 2020
Published in Issue Year 2020 Volume: 13 Issue: 4

Cite

APA Albayrak, A. (2020). Doğal Dil İşleme Teknikleri Kullanılarak Disiplinler Arası Lisansüstü Ders İçeriği Hazırlanması. Bilişim Teknolojileri Dergisi, 13(4), 373-383. https://doi.org/10.17671/gazibtd.714447