Medical Text Classification Using Semisupervised Learning and Bert-Based Models
Abstract
Medical text classification organizes complex medical texts into categories, but it is often hampered by insufficient labeled training data. This paper proposes a novel method for categorizing medical texts, based on a dataset of health problem abstracts and their labels. We first applied data representation techniques to the labeled dataset and trained various machine learning algorithms for text classification. The initial results were unsatisfactory because of the limited amount of labeled data. To address this, we augmented the labeled data from an unlabeled dataset, using BERT-based models (BioBERT, ClinicalBERT) to assign labels to unlabeled records. Two voting mechanisms, hard voting and soft voting, were used to validate the predicted labels before the new records were added to the labeled dataset. After augmentation, the machine learning algorithms were re-applied. The results show that this approach improves the performance of medical text classification and mitigates the problems caused by limited labeled data.
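The validation step described in the abstract can be illustrated with a minimal sketch of hard and soft voting over per-model class probabilities. The abstract does not give the acceptance rules, so everything specific here is an assumption made for illustration: the function names `hard_vote` and `soft_vote`, the rule that hard voting accepts a record only when all models agree, the 0.8 confidence threshold for soft voting, and the example probability vectors (which stand in for the outputs of models such as BioBERT and ClinicalBERT).

```python
from collections import Counter
from typing import Optional, Sequence

# One probability per class, as produced by one model for one text.
Probs = Sequence[float]


def hard_vote(model_probs: Sequence[Probs]) -> Optional[int]:
    """Each model votes for its most probable class.

    Assumed acceptance rule: return the class only if every model
    votes for it; otherwise return None (record stays unlabeled).
    """
    votes = [max(range(len(p)), key=p.__getitem__) for p in model_probs]
    label, count = Counter(votes).most_common(1)[0]
    return label if count == len(votes) else None


def soft_vote(model_probs: Sequence[Probs], threshold: float = 0.8) -> Optional[int]:
    """Average the class probabilities across models.

    Assumed acceptance rule: return the class with the highest mean
    probability only if that mean reaches `threshold`; otherwise None.
    """
    n_classes = len(model_probs[0])
    mean = [
        sum(p[c] for p in model_probs) / len(model_probs)
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=mean.__getitem__)
    return best if mean[best] >= threshold else None


if __name__ == "__main__":
    # Hypothetical outputs of two models for two unlabeled texts.
    agree = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]]
    disagree = [[0.6, 0.4, 0.0], [0.3, 0.7, 0.0]]
    for name, probs in [("agree", agree), ("disagree", disagree)]:
        print(name, hard_vote(probs), soft_vote(probs))
```

A record for which the chosen voting function returns a class index would be added to the labeled set with that label, while records returning `None` would remain unlabeled. Stricter or looser acceptance rules are equally plausible readings of the abstract.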
Details
Primary Language
English
Subjects
Deep Learning, Semi- and Unsupervised Learning, Machine Learning (Other)
Section
Research Article
Early Pub Date
April 28, 2025
Publication Date
April 30, 2025
Submission Date
December 6, 2024
Acceptance Date
February 13, 2025
Published in Issue
Year 2025 Volume: 7 Issue: 1
APA
Soygazi, F., & Oğuz, D. (2025). Medical Text Classification Using Semisupervised Learning and Bert-Based Models. Mühendislik Bilimleri ve Araştırmaları Dergisi, 7(1), 60-69. https://doi.org/10.46387/bjesr.1597329