Validity and Reliability of the Pfirrmann and Schizas Criteria Degrees in Lumbar Degenerative Disease Patients Using the Deep Learning Method

Year: 2026 | Month: June | Volume: 13 | Issue: 6 | Pages: 611-620

DOI: https://doi.org/10.52403/ijrr.20260659

Ivan Alexander Liando¹, I Wayan Suryanto Dusak¹, I Gusti Lanang Ngurah Agung Artha Wiguna¹, I Ketut Suyasa¹, Elysanti Dwi Martadiani², Made Bramantya Karna¹, I Gusti Ngurah Wien Aryana¹, Cokorda Gde Oka Dharmayuda¹, I Gede Eka Wiratnaya¹, Anak Agung Gde Yuda Asmara¹, I Wayan Subawa¹

¹Department of Orthopaedic and Traumatology, Faculty of Medicine, Udayana University/Ngoerah Hospital, Denpasar, Indonesia
²Department of Radiology, Faculty of Medicine, Udayana University/Ngoerah Hospital, Denpasar, Indonesia

Corresponding Author: Ivan Alexander Liando

ABSTRACT

Traditional diagnosis of lumbar degenerative disease (LDD) relies on clinical evaluation and magnetic resonance imaging (MRI). Machine learning and deep learning have the potential to automate the assessment of spinal conditions. This study aimed to evaluate the validity and reliability of a deep learning model in determining Pfirrmann and Schizas grades for LDD on MRI. A retrospective study was conducted using lumbar spine MRI scans. A deep learning model was trained to classify degenerative changes according to the Pfirrmann and Schizas grading systems. Diagnostic accuracy was assessed using receiver operating characteristic (ROC) curve analysis, and reliability was measured by interobserver agreement. A total of 170 patients were included, with a mean age of 55.20 ± 13.34 years (range 20 to >60 years) and a near-equal sex distribution (48.8% male, 51.2% female). The deep learning model demonstrated good-to-excellent diagnostic validity for both Pfirrmann and Schizas classification across all five lumbar levels (L1–L2 to L5–S1), with sensitivity ranging from 80.85% to 96.30% and specificity from 80.17% to 95.24% for Pfirrmann, and sensitivity from 82.76% to 94.74% and specificity from 90.15% to 96.79% for Schizas. Area under the ROC curve (AUC-ROC) values indicated good accuracy for Pfirrmann (0.815–0.890) and good-to-excellent accuracy for Schizas (0.880–0.929). Reliability was acceptable for both classifications (Cronbach's alpha: Pfirrmann 0.792, Schizas 0.684). Positive predictive value (PPV) was relatively low across levels, likely reflecting class imbalance toward mild-to-moderate grades in the study cohort. Deep learning models have the potential to improve the diagnosis of LDD, support early intervention, and improve patient outcomes.

Keywords: Artificial intelligence, deep learning, lumbar degenerative disease, Pfirrmann classification, Schizas classification.
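
The abstract attributes the relatively low PPV to class imbalance. The following is a minimal Python sketch of why that can happen when sensitivity and specificity are both high. It is not the study's analysis code. The function names and all numbers in it (a sensitivity and specificity of 0.90, the prevalence values, and the confusion-matrix counts) are hypothetical and chosen only for illustration.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic metrics from binary confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule.

    prevalence is the fraction of cases in the target (positive) class.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical values, not taken from the study.
    for prevalence in (0.50, 0.25, 0.10):
        ppv = ppv_from_prevalence(0.90, 0.90, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}")

    # Hypothetical counts for an imbalanced sample of 200 cases,
    # of which 20 belong to the target class.
    print(diagnostic_metrics(tp=18, fp=18, fn=2, tn=162))
```

With sensitivity and specificity held fixed, PPV in this sketch falls as the target class becomes rarer, because false positives from the larger negative class make up a growing share of all positive predictions.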
