Research Article

COMPARISON OF THE PERFORMANCE OF PRETRAINED DEEP LEARNING MODELS FOR THE AUTOMATIC KELLGREN-LAWRENCE GRADING OF KNEE OSTEOARTHRITIS USING PLAIN RADIOGRAPHS

Volume: 16 Number: 1 March 15, 2026

Abstract

Objective: Radiographic assessment of knee osteoarthritis (OA) commonly relies on the Kellgren–Lawrence (KL) grading system; however, its subjective nature leads to considerable inter- and intra-observer variability, particularly in early disease stages. This study aimed to comparatively evaluate pre-trained deep learning models for automated KL grading of knee OA from plain radiographs using an ordinal-aware learning and evaluation framework.

Materials and Methods: This retrospective experimental study utilized 8,260 knee radiographs obtained from the publicly available Osteoarthritis Initiative (OAI) dataset, with expert-assigned KL grades ranging from 0 to 4. Five pre-trained convolutional neural network architectures (VGG-16, ResNet-50, DenseNet-121, EfficientNetB0, and InceptionV3) were implemented using transfer learning. All models were trained under identical preprocessing, augmentation, class-balancing, and hyperparameter settings to ensure fair comparison. An ordinal CORAL-based loss function was employed to model the ordered nature of KL grades. Model performance was primarily evaluated using quadratic weighted kappa (QWK), along with accuracy, balanced accuracy, macro-F1 score, ROC–AUC, and precision–recall analyses. Decision curve analysis (DCA) was conducted at clinically relevant thresholds (KL ≥ 2 and KL ≥ 3) to assess potential clinical utility.

Results: Among the evaluated architectures, VGG-16 achieved the highest ordinal agreement on the independent test set (QWK = 0.830), with a macro-F1 score of 0.676 and balanced accuracy of 0.684. Overall, model performance was higher for moderate-to-severe OA stages (KL grades 3 and 4), while lower discriminative performance was observed for early-stage disease, particularly KL grade 1. Confusion matrix analysis demonstrated that most misclassifications occurred between adjacent KL grades, indicating clinically plausible ordinal behavior. Decision curve analysis revealed that the proposed ordinal deep learning model provided a consistently higher net benefit than treat-all and treat-none strategies across a wide range of threshold probabilities for both KL ≥ 2 and KL ≥ 3 scenarios.

Conclusion: Ordinal-aware deep learning models can effectively perform automated KL grading of knee osteoarthritis from plain radiographs, yielding clinically meaningful and interpretable results. The proposed framework reduces observer-dependent variability and demonstrates potential as a decision-support tool for both early and advanced stages of knee OA. Further validation using multi-center datasets is warranted to enhance clinical generalizability.
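The ordinal CORAL-based loss named in the Methods can be illustrated with a minimal sketch. It assumes the standard CORAL formulation, in which a grade on a K-level scale is encoded as K-1 binary "grade > j" tasks that share one score and differ only in their biases. The function names, the score, and the bias values below are hypothetical and are not taken from the study's implementation.

```python
import math

NUM_GRADES = 5  # KL grades 0..4, as stated in the abstract


def coral_levels(grade, num_grades=NUM_GRADES):
    """Encode a grade as K-1 binary targets; target j is 1 when grade > j."""
    return [1 if grade > j else 0 for j in range(num_grades - 1)]


def coral_loss(logits, grade):
    """Sum of binary cross-entropies over the K-1 'grade > j' tasks."""
    levels = coral_levels(grade, len(logits) + 1)
    loss = 0.0
    for z, y in zip(logits, levels):
        # Numerically stable binary cross-entropy computed from a logit.
        loss += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return loss


def coral_predict(logits):
    """Predicted grade = number of tasks whose sigmoid output exceeds 0.5."""
    return sum(1 for z in logits if z > 0.0)


# Hypothetical values: one shared score from the network, one bias per task.
shared_score = 1.2
biases = [2.0, 0.5, -1.0, -3.0]
logits = [shared_score + b for b in biases]

print(coral_levels(2))        # [1, 1, 0, 0]
print(coral_predict(logits))  # 3
```

Because every task shares the same score, the loss grows with the distance between the predicted and the true grade, which is the property that makes the formulation suited to an ordered scale such as KL.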
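The primary metric, quadratic weighted kappa, can be sketched as follows. This is a minimal pure-Python illustration of the usual definition (disagreement weighted by the squared distance between grades, compared with chance agreement); the function name and the toy labels are assumptions and do not reproduce the study's data or code.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_grades=5):
    """QWK = 1 - (weighted observed disagreement / weighted chance disagreement)."""
    n = len(y_true)
    k = num_grades

    # Confusion matrix: rows are true grades, columns are predicted grades.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    numerator = 0.0
    denominator = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            numerator += weight * observed[i][j]
            denominator += weight * expected

    # Note: the denominator is zero if both raters use one single grade.
    return 1.0 - numerator / denominator


truth = [0, 1, 2, 3, 4]
adjacent_error = [0, 1, 2, 3, 3]  # one grade-4 knee called grade 3
distant_error = [0, 1, 2, 3, 0]   # one grade-4 knee called grade 0

print(quadratic_weighted_kappa(truth, truth))           # 1.0
print(quadratic_weighted_kappa(truth, adjacent_error))
print(quadratic_weighted_kappa(truth, distant_error))
```

The toy case shows why the metric fits KL grading: a single adjacent-grade error lowers the score far less than a single error across the full scale.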
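The net-benefit comparison behind decision curve analysis can be sketched in a few lines. The example assumes the standard net-benefit formula, TP/n - (FP/n) * pt / (1 - pt), applied to the binarized outcome KL ≥ 2; the function names, grades, and predicted probabilities are invented for illustration only.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on every case whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of acting on everyone, regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Treat-none acts on nobody, so its net benefit is 0 at every threshold.
NET_BENEFIT_TREAT_NONE = 0.0

# Hypothetical KL grades and model probabilities for the event KL >= 2.
grades = [0, 1, 2, 3, 4, 0]
prob_kl_ge_2 = [0.10, 0.40, 0.70, 0.90, 0.95, 0.60]
labels = [1 if g >= 2 else 0 for g in grades]

for pt in (0.2, 0.5, 0.8):
    print(pt, net_benefit(labels, prob_kl_ge_2, pt), net_benefit_treat_all(labels, pt))
```

Plotting these values over a grid of thresholds gives the decision curve; a model is considered useful at a given threshold when its net benefit exceeds both the treat-all value and zero.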

Keywords

Supporting Institution

Yozgat Bozok University

Ethical Statement

For this study, an application was submitted to the Non-Interventional Clinical Research Ethics Committee of Yozgat Bozok University to determine whether ethical approval was required.

Thanks

We would like to thank the Editor and the reviewers for the time and effort they devoted to the evaluation of our manuscript. We sincerely appreciate their constructive comments and valuable suggestions, which we believe have significantly improved the scientific quality and clarity of our work.

Details

Primary Language

English

Subjects

Internal Diseases, Rheumatology and Arthritis

Journal Section

Research Article

Publication Date

March 15, 2026

Submission Date

January 8, 2026

Acceptance Date

February 24, 2026

Published in Issue

Year 2026 Volume: 16 Number: 1

APA
Kızılkaya, H., Ortataş, F. N., & Üreten, K. (2026). COMPARISON OF THE PERFORMANCE OF PRETRAINED DEEP LEARNING MODELS FOR THE AUTOMATIC KELLGREN-LAWRENCE GRADING OF KNEE OSTEOARTHRITIS USING PLAIN RADIOGRAPHS. Bozok Tıp Dergisi, 16(1), 115-125. https://doi.org/10.16919/bozoktip.1859321