Advanced machine learning and natural language processing approaches were combined to identify patients with non-metastatic castration-resistant prostate cancer from electronic health records data.
By combining machine learning with rule-based natural language processing (NLP), researchers developed an algorithm that uses electronic health records (EHRs) to identify patients with non-metastatic castration-resistant prostate cancer (nmCRPC).1
Using nationwide EHR data from the Department of Veterans Affairs, researchers identified 13,199 patients for their final nmCRPC cohort out of 654,148 patients with prostate cancer from 2006 to 2020. Of the patients with prostate cancer identified by the algorithm, 26,506 were castration resistant, and 8,297 patients were excluded from the nmCRPC cohort due to evidence of metastatic disease.
The machine learning algorithm had an accuracy of 86%, while the NLP component that classified patients with metastatic disease showed an accuracy of 96%, a precision of 99%, and a sensitivity of 98%. Moreover, within 3 months of a patient’s diagnosis, the algorithm was 86% accurate in predicting whether the patient would progress to nmCRPC.
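For readers less familiar with these metrics, the short Python sketch below shows how accuracy, precision, and sensitivity are conventionally computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, and sensitivity from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all patients classified correctly
        "precision": tp / (tp + fp),     # share of flagged patients who truly have the condition
        "sensitivity": tp / (tp + fn),   # share of true cases that were flagged
    }


# Hypothetical counts for illustration only (not study data).
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
```

A high precision with a slightly lower sensitivity, as reported for the NLP component, means that almost every flagged patient truly had metastatic disease while a small share of true cases were missed.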
“It is important to be able to identify complex disease states from increasingly accessible EHR data,” the researchers from the Huntsman Cancer Institute at the University of Utah wrote in a poster of their study. “We combined advanced machine learning and NLP approaches to identify [patients with] nmCRPC from EHR data including a variety of elements from multiple sources.”
The researchers used an extreme gradient boosting machine learning approach that was previously trained on a similar cohort of patients with prostate cancer identified within the Veterans Affairs’ cancer registries. International Classification of Diseases (ICD)-9 and ICD-10 codes were grouped into 7-day intervals, and the number of ICD codes within each interval was used as a set of predictive features for patients who progressed.
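The interval-based features described above can be sketched roughly as follows in Python. This is a minimal illustration, not the researchers' implementation: the function name, the choice of an index date as the reference point for the intervals, and the example codes are all assumptions.

```python
from collections import Counter
from datetime import date


def weekly_code_counts(events, index_date, interval_days=7):
    """Count ICD codes per 7-day interval relative to index_date.

    events: iterable of (event_date, icd_code) pairs.
    Returns a Counter keyed by (interval_index, icd_code).
    The use of an index date is an assumption made for this sketch.
    """
    counts = Counter()
    for event_date, code in events:
        # Integer-divide the day offset so days 0-6 fall in interval 0, 7-13 in 1, etc.
        interval = (event_date - index_date).days // interval_days
        counts[(interval, code)] += 1
    return counts


# Illustrative records only; the dates and codes are not study data.
events = [
    (date(2015, 1, 1), "C61"),
    (date(2015, 1, 5), "C61"),
    (date(2015, 1, 9), "R97.20"),
]
features = weekly_code_counts(events, index_date=date(2015, 1, 1))
```

Counts keyed this way could then be flattened into a feature vector for a gradient boosting model.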
This also allowed researchers to exclude patients without prostate cancer who may have been in the EHR data they searched. Training patients were fed into the algorithm to teach it to categorize patients. This started with whether patients experienced urinary symptoms; if the answer was yes, the algorithm checked whether the patient had an ICD code for bladder cancer or urinary tract infection, and if the answer was again yes, those patients were designated as not having prostate cancer. Patients with ICD codes for prostate cancer were given a +2 value that allowed for proper weighting of the model before it moved on to predicting the patient’s progression.
To further classify patients, those with evidence of prior surgical castration, current androgen deprivation therapy (ADT), or a testosterone level consistent with medical castration, defined as 50 ng/dL or less (≤ 2.0 nmol/L), were considered castrate. These patients were then removed from the cohort. Moreover, patients with nmCRPC were defined as having a prostate cancer diagnosis, castration resistance, defined as 2 consecutive increases in prostate-specific antigen (PSA) while castrate, and no evidence of metastatic disease on radiology reports.
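As a rough illustration, the classification rules in this paragraph could be written as a few Python functions. This sketch rests on two stated assumptions: that the castrate threshold is read as a testosterone level at or below 50 ng/dL, and that all 3 nmCRPC criteria must hold together. All function and parameter names are hypothetical.

```python
CASTRATE_TESTOSTERONE_NG_DL = 50.0  # assumed reading: at or below 50 ng/dL


def is_castrate(surgical_castration, on_adt, testosterone_ng_dl=None):
    """True if any one of the three castration criteria is met."""
    low_testosterone = (
        testosterone_ng_dl is not None
        and testosterone_ng_dl <= CASTRATE_TESTOSTERONE_NG_DL
    )
    return surgical_castration or on_adt or low_testosterone


def has_two_consecutive_psa_rises(psa_values):
    """True if chronologically ordered PSA values contain two rises in a row."""
    rises = [later > earlier for earlier, later in zip(psa_values, psa_values[1:])]
    return any(first and second for first, second in zip(rises, rises[1:]))


def is_nmcrpc(has_prostate_cancer, castrate, psa_while_castrate, metastatic):
    """Assumes all criteria are required: diagnosis, castration resistance, no metastasis."""
    return (
        has_prostate_cancer
        and castrate
        and has_two_consecutive_psa_rises(psa_while_castrate)
        and not metastatic
    )
```

For example, a castrate patient with PSA values of 0.5, 0.9, and 1.4 ng/mL and no metastatic findings would satisfy `is_nmcrpc` under these assumptions.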
To identify patients with metastatic disease, patient data were run through the NLP to find non-negated mentions of metastatic disease in radiology reports. The algorithm then used the Unified Medical Language System to identify metastatic vocabulary and patterns of metastatic disease, but it still required human review. Once this was done, a score was assigned to these patients to trigger identification within the wider algorithm, which looked at thousands of patients with prostate cancer.
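The idea of a "non-negated mention" can be illustrated with a deliberately simple rule-based sketch in Python. It is a stand-in for the general technique, not the researchers' NLP system: the term list and negation cues below are small hypothetical vocabularies, and a real pipeline would draw on a much richer vocabulary and more careful negation scoping.

```python
import re

# Hypothetical, deliberately small vocabularies for illustration only.
METASTATIC_TERMS = re.compile(r"\bmetasta\w*", re.IGNORECASE)
NEGATION_CUES = re.compile(
    r"\b(no|without|negative for|no evidence of)\b", re.IGNORECASE
)


def has_non_negated_metastasis(report):
    """True if any sentence mentions metastasis with no negation cue before it."""
    # Split on whitespace that follows sentence-ending punctuation.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        for match in METASTATIC_TERMS.finditer(sentence):
            preceding = sentence[: match.start()]
            if not NEGATION_CUES.search(preceding):
                return True
    return False
```

Treating any earlier negation cue in the same sentence as negating the mention is a crude heuristic, which is one reason output from rules like these would still benefit from human review.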
According to the researchers, if a patient shows no signs of metastatic disease but the disease progresses despite castrate levels of testosterone, this signals a transition to nmCRPC. This typically happens after a patient initially responds to ADT but then becomes resistant to therapies that inhibit androgen binding to the androgen receptor, limiting the effectiveness of treatment. Identifying these patients is important so that clinicians can adjust treatment and manage disease progression.
“This approach classifies cancer diagnosis and date of diagnosis with reasonable accuracy,” the researchers concluded.