Year: 2026 | Month: June | Volume: 13 | Issue: 6 | Pages: 611-620
DOI: https://doi.org/10.52403/ijrr.20260659
Validity and Reliability of the Pfirrmann and Schizas Criteria Degrees in Lumbar Degenerative Disease Patients Using the Deep Learning Method
Ivan Alexander Liando¹, I Wayan Suryanto Dusak¹, I Gusti Lanang Ngurah Agung Artha Wiguna¹, I Ketut Suyasa¹, Elysanti Dwi Martadiani², Made Bramantya Karna¹, I Gusti Ngurah Wien Aryana¹, Cokorda Gde Oka Dharmayuda¹, I Gede Eka Wiratnaya¹, Anak Agung Gde Yuda Asmara¹, I Wayan Subawa¹
¹Department of Orthopaedic and Traumatology, Faculty of Medicine, Udayana University/Ngoerah Hospital, Denpasar, Indonesia
²Department of Radiology, Faculty of Medicine, Udayana University/Ngoerah Hospital, Denpasar, Indonesia
Corresponding Author: Ivan Alexander Liando
ABSTRACT
Traditional diagnosis of lumbar degenerative disease (LDD) relies on clinical evaluation and magnetic resonance imaging (MRI). Machine learning (ML) and deep learning (DL) have the potential to automate the assessment of spinal conditions. This study aims to evaluate the validity and reliability of a deep learning model in determining Pfirrmann and Schizas grades for LDD on MRI. A retrospective study was conducted using lumbar spine MRI scans of patients with LDD. A deep learning model was trained to classify degenerative changes according to the Pfirrmann and Schizas grading systems. Diagnostic accuracy was assessed using receiver operating characteristic (ROC) curves, and reliability was measured by interobserver agreement. A total of 170 patients were included, with a mean age of 55.20 ± 13.34 years (range 20 to over 60 years) and a near-equal sex distribution (48.8% male, 51.2% female). The deep learning model demonstrated good-to-excellent diagnostic validity for both Pfirrmann and Schizas classification across all five lumbar levels (L1–L2 to L5–S1). For Pfirrmann, sensitivity ranged from 80.85% to 96.30% and specificity from 80.17% to 95.24%. For Schizas, sensitivity ranged from 82.76% to 94.74% and specificity from 90.15% to 96.79%. Area under the ROC curve (AUC-ROC) values indicated good accuracy for Pfirrmann (0.815–0.890) and good-to-excellent accuracy for Schizas (0.880–0.929). Reliability was acceptable for both classifications (Cronbach's alpha: Pfirrmann 0.792, Schizas 0.684). Positive predictive value (PPV) was relatively lower across levels, likely reflecting class imbalance toward mild-to-moderate grades in the study cohort. Deep learning models have the potential to improve the diagnosis of LDD, enable earlier intervention, and improve patient outcomes.
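The abstract reports sensitivity, specificity, PPV and Cronbach's alpha, and it attributes the lower PPV to class imbalance. The sketch below shows how these quantities are conventionally computed and why PPV falls when a grade is rare even though sensitivity and specificity stay fixed. It is not the authors' pipeline. It makes three assumptions. First, each grade is scored one-vs-rest from a binary confusion matrix. Second, alpha treats the model and a reference reader as the "items". Third, all numbers, and the helper names `confusion_metrics`, `ppv_from_prevalence` and `cronbach_alpha`, are hypothetical.

```python
"""Illustrative diagnostic-accuracy and reliability metrics (hypothetical data, not study results)."""
from statistics import pvariance


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Sensitivity, specificity and PPV from a one-vs-rest confusion matrix (assumed setup)."""
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value (precision)
    }


def ppv_from_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """PPV via Bayes' rule: with sens/spec fixed, PPV drops as the positive grade becomes rarer."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def cronbach_alpha(ratings: list[list[float]]) -> float:
    """Cronbach's alpha; rows are cases, columns are raters (e.g. model vs. radiologist, assumed)."""
    k = len(ratings[0])
    # Variance of each rater's grades across cases.
    item_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the summed grades per case.
    total_var = pvariance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1.0 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Hypothetical operating point: same sensitivity/specificity, different grade prevalence.
    print(ppv_from_prevalence(0.85, 0.90, 0.5))  # ≈ 0.89 when the grade is common
    print(ppv_from_prevalence(0.85, 0.90, 0.1))  # ≈ 0.49 when the grade is rare
    print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # perfect agreement gives 1.0
```

The prevalence comparison illustrates the abstract's explanation. When severe grades make up a small share of the cohort, even a model with good sensitivity and specificity produces proportionally more false positives for those grades, which lowers PPV.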
Keywords: Artificial intelligence, deep learning, lumbar degenerative disease, Pfirrmann classification, Schizas classification.