Research Article

Medical Text Classification Using Semisupervised Learning and Bert-Based Models

Volume: 7, Issue: 1, 30 April 2025

Abstract

Medical text classification organizes complex medical texts into categories, but it faces challenges such as insufficient labeled training data. This paper proposes a novel method for categorizing medical texts, using a dataset of health-problem abstracts and their labels. We applied data representation techniques to the labeled dataset and used various machine learning algorithms for text classification. The initial results were unsatisfactory because of the limited amount of labeled data. To address this, we applied data augmentation with an unlabeled dataset, using BERT-based models (BioBERT and ClinicalBERT) to enrich the labeled data. Two voting mechanisms, hard voting and soft voting, were used to validate new labeled records before adding them to the dataset. After augmentation, the machine learning algorithms were re-applied. The results show that the approach significantly improves medical text classification performance, mitigating the problem of limited labeled data and increasing overall accuracy.
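The abstract does not state exactly how hard and soft voting decide whether an unlabeled record is accepted. The Python sketch below shows one common way to do it, assuming each BERT-based model (for example, a fine-tuned BioBERT or ClinicalBERT classifier) has already produced class probabilities for every unlabeled text. The acceptance rules (a minimum number of agreeing models for hard voting, a probability threshold for soft voting), their default values, and the names `hard_vote`, `soft_vote`, and `pseudo_label` are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# One model's class-probability vector for one unlabeled text.
# Assumption: fine-tuned BERT-based classifiers (e.g. BioBERT, ClinicalBERT)
# have already produced these vectors; model loading is out of scope here.
Probs = Sequence[float]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def hard_vote(model_probs: List[Probs], min_votes: int) -> Optional[int]:
    """Majority vote over each model's top-1 label.

    Hypothetical acceptance rule: return the most common label only if at
    least `min_votes` models predicted it; otherwise return None (reject).
    """
    votes = Counter(argmax(p) for p in model_probs)
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


def soft_vote(model_probs: List[Probs], threshold: float) -> Optional[int]:
    """Average the class probabilities across models, then take the argmax.

    Hypothetical acceptance rule: return the label only if its averaged
    probability is at least `threshold`; otherwise return None (reject).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    best = argmax(avg)
    return best if avg[best] >= threshold else None


def pseudo_label(
    texts: Sequence[str],
    probs_per_text: Sequence[List[Probs]],
    mode: str = "soft",
    min_votes: int = 2,
    threshold: float = 0.9,
) -> List[Tuple[str, int]]:
    """Return the (text, label) pairs accepted for the labeled training set."""
    accepted = []
    for text, model_probs in zip(texts, probs_per_text):
        if mode == "hard":
            label = hard_vote(model_probs, min_votes)
        else:
            label = soft_vote(model_probs, threshold)
        if label is not None:
            accepted.append((text, label))
    return accepted


if __name__ == "__main__":
    texts = ["note A", "note B", "note C"]
    # probs[i][m] = class probabilities from model m for texts[i] (toy values)
    probs = [
        [[0.05, 0.90, 0.05], [0.00, 1.00, 0.00]],  # both models: class 1, confident
        [[0.60, 0.40, 0.00], [0.20, 0.70, 0.10]],  # models disagree
        [[0.50, 0.30, 0.20], [0.45, 0.35, 0.20]],  # both models: class 0, unsure
    ]
    print(pseudo_label(texts, probs, mode="hard"))  # [('note A', 1), ('note C', 0)]
    print(pseudo_label(texts, probs, mode="soft"))  # [('note A', 1)]
```

With these toy inputs, both schemes accept note A and reject note B, on which the models disagree. They differ on note C: hard voting accepts it because both models' top label is class 0, whereas soft voting rejects it because the averaged probability for class 0 (0.475) is below the 0.9 threshold. Because soft voting uses the models' confidence and not just their top label, it is the stricter filter under these particular settings.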


Details

Primary Language: English
Subjects: Deep Learning, Semi- and Unsupervised Learning, Machine Learning (Other)
Section: Research Article
Early View Date: 28 April 2025
Publication Date: 30 April 2025
Submission Date: 6 December 2024
Acceptance Date: 13 February 2025
Published in Issue: Year 2025, Volume: 7, Issue: 1

How to Cite

APA
Soygazi, F., & Oğuz, D. (2025). Medical Text Classification Using Semisupervised Learning and Bert-Based Models. Mühendislik Bilimleri ve Araştırmaları Dergisi, 7(1), 60-69. https://doi.org/10.46387/bjesr.1597329
AMA
1. Soygazi F, Oğuz D. Medical Text Classification Using Semisupervised Learning and Bert-Based Models. Müh.Bil.ve Araş.Dergisi. 2025;7(1):60-69. doi:10.46387/bjesr.1597329
Chicago
Soygazi, Fatih, and Damla Oğuz. 2025. “Medical Text Classification Using Semisupervised Learning and Bert-Based Models.” Mühendislik Bilimleri ve Araştırmaları Dergisi 7 (1): 60-69. https://doi.org/10.46387/bjesr.1597329.
EndNote
Soygazi F, Oğuz D (01 April 2025) Medical Text Classification Using Semisupervised Learning and Bert-Based Models. Mühendislik Bilimleri ve Araştırmaları Dergisi 7 1 60–69.
IEEE
[1] F. Soygazi and D. Oğuz, “Medical Text Classification Using Semisupervised Learning and Bert-Based Models,” Müh.Bil.ve Araş.Dergisi, vol. 7, no. 1, pp. 60–69, Apr. 2025, doi: 10.46387/bjesr.1597329.
ISNAD
Soygazi, Fatih - Oğuz, Damla. “Medical Text Classification Using Semisupervised Learning and Bert-Based Models”. Mühendislik Bilimleri ve Araştırmaları Dergisi 7/1 (01 April 2025): 60-69. https://doi.org/10.46387/bjesr.1597329.
JAMA
1. Soygazi F, Oğuz D. Medical Text Classification Using Semisupervised Learning and Bert-Based Models. Müh.Bil.ve Araş.Dergisi. 2025;7:60–69.
MLA
Soygazi, Fatih, and Damla Oğuz. “Medical Text Classification Using Semisupervised Learning and Bert-Based Models.” Mühendislik Bilimleri ve Araştırmaları Dergisi, vol. 7, no. 1, Apr. 2025, pp. 60-69, doi:10.46387/bjesr.1597329.
Vancouver
1. Fatih Soygazi, Damla Oğuz. Medical Text Classification Using Semisupervised Learning and Bert-Based Models. Müh.Bil.ve Araş.Dergisi. 01 April 2025;7(1):60-9. doi:10.46387/bjesr.1597329