Year 2021, Issue 23, Pages 787-792, published 2021-04-30

Turkish Text Generation with Generative Adversarial Network
Original title (Turkish): Üretken Rakip Ağ ile Türkçe Metin Üretimi

Barış GÜCÜK, Rafet DURGUT, Oğuz FINDIK


In machine learning, the training data set is critical to the success of the prediction phase. One of the most common problems in natural language processing is that sufficient data are unavailable or that the available data are unlabeled. In classification problems in particular, a shortage of data in one class lowers classification performance. In this study, a generative adversarial network (GAN) was used to generate additional texts for the under-represented class in the data set, performing data augmentation on news texts. The results were evaluated with machine learning techniques such as n-grams, TF-IDF, support vector machines, and logistic regression. According to the results, using a generative adversarial network for Turkish text generation increased classification success by approximately 47%.
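The abstract describes a two-stage pipeline: texts of the under-represented class are augmented with generated samples, and the data are then evaluated with n-gram TF-IDF features and classifiers such as logistic regression and SVM. The sketch below illustrates only the evaluation side, under stated assumptions; it is not the authors' implementation. The toy corpus and the `GENERATED_MINORITY` list are hypothetical placeholders (no GAN is trained here; those strings merely stand in for generator output), tokenization is plain whitespace splitting with `str.lower()`, which is not Turkish-aware, and the IDF formula mirrors scikit-learn's default smoothed IDF.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: label 0 = majority class (economy), label 1 = minority class (sports).
# Vocabularies are kept disjoint so the example is easy to reason about.
TRAIN = [
    ("borsa güne yükselişle başladı", 0),
    ("merkez bankası faiz kararını açıkladı", 0),
    ("dolar kuru haftayı düşüşle kapattı", 0),
    ("takım derbide son dakikada kazandı", 1),
]
# Hypothetical stand-in for GAN output: hand-written strings, no generator is trained here.
GENERATED_MINORITY = ["takım deplasmanda kazandı", "derbide son dakikada gol"]

def ngrams(text, n=2):
    """Word n-grams of orders 1..n; str.lower() is not Turkish-aware ('I' -> 'i', not 'ı')."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + k]) for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)]

def fit_idf(docs):
    """Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1."""
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; terms unseen in training are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(x, w, b):
    """Linear decision value b + w.x for a sparse feature dict."""
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())

def train_logreg(X, y, epochs=300, lr=1.0):
    """Binary logistic regression (log loss) fitted with full-batch gradient descent."""
    w, b = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, label in zip(X, y):
            err = sigmoid(score(x, w, b)) - label  # derivative of log loss w.r.t. the score
            grad_b += err
            for t, v in x.items():
                grad_w[t] += err * v
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g / len(X)
        b -= lr * grad_b / len(X)
    return w, b

def build_training_set(train, generated):
    """Append generated texts to the minority class (label 1), then vectorize everything."""
    texts = [t for t, _ in train] + generated
    labels = [lbl for _, lbl in train] + [1] * len(generated)
    docs = [ngrams(t) for t in texts]
    idf = fit_idf(docs)
    return [tfidf(d, idf) for d in docs], labels, idf

X, y, idf = build_training_set(TRAIN, GENERATED_MINORITY)
w, b = train_logreg(X, y)
for text in ["takım son dakikada kazandı", "merkez bankası faiz kararını açıkladı"]:
    z = score(tfidf(ngrams(text), idf), w, b)
    print(text, "->", 1 if z > 0 else 0)
```

In practice, scikit-learn's `TfidfVectorizer(ngram_range=(1, 2))` combined with `LogisticRegression` or `LinearSVC` would replace the hand-written pieces; the from-scratch version only makes each step visible. Swapping the classifier for a linear SVM changes the loss (hinge instead of log loss) but not the feature pipeline.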
Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors

ORCID: 0000-0002-1381-3663
Author: Barış GÜCÜK (Primary Author)
Institution: Karabük University
Country: Turkey

ORCID: 0000-0002-6891-5851
Author: Rafet DURGUT
Institution: Karabük University
Country: Turkey

ORCID: 0000-0001-5069-6470
Author: Oğuz FINDIK
Institution: Karabük University
Country: Turkey

Dates

Publication Date: April 30, 2021

APA: Gücük, B., Durgut, R., & Fındık, O. (2021). Üretken Rakip Ağ ile Türkçe Metin Üretimi [Turkish Text Generation with Generative Adversarial Network]. Avrupa Bilim ve Teknoloji Dergisi, (23), 787-792. https://doi.org/10.31590/ejosat.857179