Purpose:
The factors that influence head and neck cancer (HNC) survival, and cancer-specific survival (CSS) in particular, are poorly understood, and no survival data are available for NSW beyond 2009. Furthermore, national and state cancer registries contain little of the clinical data needed for robust machine learning analysis. Therefore, the manually curated HNC datasets of the Prince of Wales Hospital (POWH; HREC 10/040) were used to predict CSS and identify prognostic factors.
Methodology:
Content data was sourced from MOSAIQ, an oncology-specific electronic medical records system, whereas cause/date of death data was obtained via linkage with the National Death Index. Newly diagnosed patients with squamous cell carcinoma of the oral cavity, oropharynx, nasopharynx, hypopharynx, and larynx without distant metastases, presenting at POWH between 01-01-2000 and 31-12-2017 for definitive treatment were eligible. Nineteen available demographic, tumour, and treatment variables were used for prediction. Five machine learning models (logistic regression, gradient boosted trees, random forest, support vector machine, artificial neural network) were trained with 5-fold cross-validation in Python. Factors associated with CSS were examined using forward stepwise conditional Cox regression in SPSS 25.
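The 5-fold cross-validation step can be illustrated with a minimal sketch in plain Python. It is not the study's code: the abstract does not say which library was used or whether the folds were stratified, so the stratification, the function name `stratified_k_fold`, and the seed are all assumptions made for illustration. Model fitting is left out; each model would be trained on `train_idx` and scored on `test_idx`.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs, keeping class proportions similar per fold.

    Hypothetical helper for illustration; stratification is an assumption,
    not something stated in the abstract.
    """
    rng = random.Random(seed)

    # Group sample indices by outcome class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin across folds.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the held-out test set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Outcome labels matching the reported cohort size: 238 HNC deaths among
# 886 patients. The ordering of the labels is arbitrary.
labels = [1] * 238 + [0] * 648
for train_idx, test_idx in stratified_k_fold(labels, k=5):
    events = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), events)
```

Keeping the event rate similar across folds matters here because only about a quarter of patients had the outcome, so an unstratified split could leave some folds with noticeably fewer events.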
Results:
Data on 886 patients were analysed, of whom 238 died from HNC (median follow-up of 3.67 years). The support vector machine model achieved the best overall performance, with 85% classification accuracy, 65% sensitivity/recall, 93% specificity, 78% precision, a 70% F1 score, and a 91% area under the curve. Multivariate Cox regression identified higher T and N stage, hypopharynx and oral cavity cancers (compared to larynx), and any form of radiotherapy treatment as negative prognostic factors, and operable cancers as a favourable prognostic factor.
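For readers less familiar with the metrics reported above, the sketch below shows how accuracy, sensitivity/recall, specificity, precision, and F1 score follow from a binary confusion matrix. The function name `classification_metrics` and the example counts are hypothetical and are not the study's data. Area under the curve is omitted because it requires ranked prediction scores, not counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    tp/fn: patients with the event predicted correctly/incorrectly.
    tn/fp: patients without the event predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only (not taken from the study).
metrics = classification_metrics(tp=6, fp=2, tn=18, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With an imbalanced outcome, accuracy and specificity can be high while sensitivity stays modest, which is why the metrics are reported together.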
Conclusion:
Models trained on manually curated datasets achieved high classification accuracy. The support vector machine model should be considered for use in clinical decision support systems to improve understanding of the factors that influence and predict CSS, and to drive improvements in patient care.