Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution

Beytullah Yıldız

doi:10.55525/tjst.1068940

Research Article

Year 2022, Volume: 17 Issue: 1, 89 - 98, 20.03.2022

Cited By: 1

Abstract

Technological developments and the widespread use of the internet are causing the volume of data produced each day to grow exponentially. A significant part of this deluge is text data from applications such as social media, communication tools, and customer service. Processing this large amount of text requires automation. Text processing has seen significant successes recently; with deep learning in particular, text classification performance has become quite satisfactory. In this study, we propose an innovative data distribution algorithm that reduces the data imbalance problem to further improve text classification. Experimental results show that the algorithm, which optimizes the data distribution, improves classification accuracy by approximately 3.5% and F1 score by over 3.

Keywords

Text classification, data imbalance, data distribution, deep learning, word embedding

Details

Primary Language	English
Subjects	Engineering
Journal Section	TJST
Authors	Beytullah Yıldız 0000-0001-7664-5145
Publication Date	March 20, 2022
Submission Date	February 6, 2022
Published in Issue	Year 2022 Volume: 17 Issue: 1

Cite

APA	Yıldız, B. (2022). Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution. Turkish Journal of Science and Technology, 17(1), 89-98. https://doi.org/10.55525/tjst.1068940
AMA	Yıldız B. Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution. TJST. March 2022;17(1):89-98. doi:10.55525/tjst.1068940
Chicago	Yıldız, Beytullah. “Efficient Text Classification With Deep Learning on Imbalanced Data Improved With Better Distribution”. Turkish Journal of Science and Technology 17, no. 1 (March 2022): 89-98. https://doi.org/10.55525/tjst.1068940.
EndNote	Yıldız B (March 1, 2022) Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution. Turkish Journal of Science and Technology 17 1 89–98.
IEEE	B. Yıldız, “Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution”, TJST, vol. 17, no. 1, pp. 89–98, 2022, doi: 10.55525/tjst.1068940.
ISNAD	Yıldız, Beytullah. “Efficient Text Classification With Deep Learning on Imbalanced Data Improved With Better Distribution”. Turkish Journal of Science and Technology 17/1 (March 2022), 89-98. https://doi.org/10.55525/tjst.1068940.
JAMA	Yıldız B. Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution. TJST. 2022;17:89–98.
MLA	Yıldız, Beytullah. “Efficient Text Classification With Deep Learning on Imbalanced Data Improved With Better Distribution”. Turkish Journal of Science and Technology, vol. 17, no. 1, 2022, pp. 89-98, doi:10.55525/tjst.1068940.
Vancouver	Yıldız B. Efficient Text Classification with Deep Learning on Imbalanced Data Improved with Better Distribution. TJST. 2022;17(1):89-98.

Cited By

Machine Learning-Based Text Classification Comparison: Turkish Language Context

Applied Sciences

https://doi.org/10.3390/app13169428
