Machine learning models performed about as well as the most widely used existing risk calculator in identifying patients on antithrombotic agents who were at high risk for gastrointestinal (GI) bleeding, with two of the approaches marginally outperforming it.
In a cross-sectional study of more than 300,000 patients, two machine learning models—regularized Cox regression (RegCox) and extreme gradient boosting (XGBoost)—better predicted GI bleeding risk than a modified HAS-BLED score, reported Jeph Herrin, PhD, of Yale University, and co-authors in JAMA Network Open.
A third approach, random survival forests (RSF), performed similarly to HAS-BLED, they added.
HAS-BLED awards points for hypertension, abnormal kidney and liver function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use, with higher scores implying higher bleeding risk.
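For readers who want to see the arithmetic, a scorecard of this kind can be sketched in a few lines of Python. The factor names below follow the HAS-BLED acronym, but the one-point-per-factor scheme is a simplified, illustrative stand-in rather than the published clinical criteria.

```python
# Illustrative sketch of a scorecard-style bleeding risk score.
# Factor names follow the HAS-BLED acronym; the point assignments here
# are a simplified stand-in, not the validated clinical tool.
HAS_BLED_POINTS = {
    "hypertension": 1,
    "abnormal_renal_function": 1,
    "abnormal_liver_function": 1,
    "stroke": 1,
    "bleeding_history": 1,
    "labile_inr": 1,
    "elderly": 1,       # e.g., age > 65
    "drug_use": 1,      # concomitant antiplatelets/NSAIDs
    "alcohol_use": 1,
}

def has_bled_score(patient_factors):
    """Sum one point for each risk factor the patient has."""
    return sum(points for factor, points in HAS_BLED_POINTS.items()
               if patient_factors.get(factor, False))

# A hypothetical patient with hypertension, prior bleeding, and older age
# scores 3 points; higher totals imply higher bleeding risk.
print(has_bled_score({"hypertension": True, "bleeding_history": True, "elderly": True}))
```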
“Given that we modified the HAS-BLED model to use claims-based risk factors, this finding should not be interpreted as a direct comparison with the established clinical model, but the finding does suggest that machine learning approaches can improve on standard approaches if the same data are used,” Herrin and colleagues wrote.
The three machine learning models were trained on de-identified records of 305,463 patients who were prescribed antithrombotic medication. Data came from the OptumLabs Data Warehouse, a national U.S. claims database. The HAS-BLED score was modified, as in a prior study, to calculate risk from electronic health record data.
The models were compared with HAS-BLED on the ability to predict the risk of individual patients experiencing GI bleeding at 6 and 12 months after their first prescription for antithrombotic drugs. Model variables included demographic characteristics, comorbidities, and medications.
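As a rough illustration of what fitting the two better-performing model types can look like, the sketch below uses the lifelines and xgboost packages on an invented patient table. The column names, synthetic data, and hyperparameters are assumptions for illustration only, not the authors' actual pipeline.

```python
# Hedged sketch of a regularized Cox model and a gradient-boosted classifier
# on a synthetic stand-in for claims-derived features (demographics,
# comorbidities, medications). Not the authors' pipeline.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter      # regularized Cox ("RegCox")
from xgboost import XGBClassifier      # extreme gradient boosting ("XGBoost")

rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "age": rng.integers(18, 95, n),
    "female": rng.integers(0, 2, n),
    "ckd": rng.integers(0, 2, n),            # example comorbidity flag
    "prior_bleed": rng.integers(0, 2, n),
    "on_anticoagulant": rng.integers(0, 2, n),
})
df["time_to_event"] = rng.integers(30, 400, n)   # days of follow-up (synthetic)
df["gi_bleed"] = rng.integers(0, 2, n)           # event indicator (synthetic)

# Regularized Cox: the penalizer shrinks coefficients (elastic-net style).
cox = CoxPHFitter(penalizer=0.1, l1_ratio=0.5)
cox.fit(df, duration_col="time_to_event", event_col="gi_bleed")

# XGBoost on a binary 6-month outcome derived from the survival columns.
features = ["age", "female", "ckd", "prior_bleed", "on_anticoagulant"]
y_6mo = ((df["gi_bleed"] == 1) & (df["time_to_event"] <= 182)).astype(int)
xgb = XGBClassifier(n_estimators=200, max_depth=3)
xgb.fit(df[features], y_6mo)
risk_6mo = xgb.predict_proba(df[features])[:, 1]  # predicted 6-month GI-bleed risk
```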
Findings showed:
- HAS-BLED had an area under the receiver operating characteristic curve (AUC) of 0.60 at 6 months and 0.57 at 12 months.
- RegCox and XGBoost each had an AUC of 0.67 at both 6 and 12 months.
- RSF had an AUC of 0.62 at 6 months and 0.60 at 12 months.
“AUCs of these models indicate that they should be considered as supplementary to other input for clinical decision making because they all had a limited ability to discriminate,” the researchers noted. “This study’s findings should be viewed primarily as informing the development of better risk models for GI bleed.”
“This is a good start,” noted Fei Wang, PhD, of Cornell University Medical College in New York, in an accompanying editorial. “Many other factors, including more comprehensive performance evaluation metrics, model interpretability, and data quantity need to be considered for assessing the potential clinical impact of these models.”
“Prospective evaluation is essential for assessing the real clinical impact of these risk prediction models,” he added. “Many factors other than the model itself, such as patient status, clinician behavior, and clinic operations, may impact the model deployment process and associated clinical outcomes.”
Machine learning is considered a form of artificial intelligence. A 2020 review observed that “learning in a clinical setting presents unique challenges that complicate the use of common machine learning methodologies. For example, diseases in electronic health records are poorly labeled, conditions can encompass multiple underlying endotypes, and healthy individuals are underrepresented.”
The approach has been applied to multiple areas in medicine; examples include studies that predicted acute kidney injury and pediatric suicide risk.
In the present study, Herrin and colleagues included patients aged 18 or older who were prescribed vitamin K antagonists, direct oral anticoagulants, or thienopyridine antiplatelet agents between January 2016 and December 2019. Mean age overall was 69, and 45.8% were women. About a third of participants were used for model development and the remainder for model validation.
All had a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism. Overall, 56.4% were receiving anticoagulants, 42% antiplatelet agents, and 1.6% both. Follow-up was 12 months or more.
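A minimal sketch of that roughly one-third development, two-thirds validation split, assuming a simple patient table and scikit-learn's train_test_split (the cohort size and random seed are placeholders):

```python
# Hedged sketch of a ~1/3 development / ~2/3 validation split.
import pandas as pd
from sklearn.model_selection import train_test_split

patients = pd.DataFrame({"patient_id": range(300_000)})  # placeholder cohort
develop, validate = train_test_split(patients, train_size=1/3, random_state=42)
print(len(develop), len(validate))  # 100000 for development, 200000 held out
```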
“All of the approaches used in this study, including the HAS-BLED, had low positive predictive values and high negative predictive values, indicating that all of the models are better at identifying patients who will not experience a GI bleed than at identifying those who will,” the researchers noted. “This suggests that using any of these models for clinical decision making will be most appropriate for identifying patients at low risk.”
“The quantitative prediction improvements for machine learning models over HAS-BLED were marginal based on the AUC values,” the editorialist noted. “While quantitative performance is absolutely important, there is no perfect threshold of AUC value that means that the model can be used in clinical practice by exceeding such a threshold.”
“Other performance metrics, such as sensitivity, specificity, and positive predictive values, are also important for us to comprehensively understand the model behaviors,” he added.
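To make those metrics concrete, the short sketch below computes AUC, sensitivity, specificity, and positive and negative predictive values from invented predicted risks and outcomes; the numbers and the 0.5 cutoff are illustrative only.

```python
# Hedged sketch of the metrics discussed above, on invented data.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0])              # GI bleed yes/no
y_score = np.array([.1, .6, .8, .3, .4, .2, .7, .55, .1, .3])  # model-predicted risk

auc = roc_auc_score(y_true, y_score)          # discrimination

# Dichotomize at an illustrative 0.5 cutoff to get a confusion matrix.
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # bleeds correctly flagged
specificity = tn / (tn + fp)   # non-bleeds correctly cleared
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

# In this toy example, as in the study, NPV exceeds PPV: the model is
# better at identifying patients who will not bleed than those who will.
print(round(auc, 2), round(sensitivity, 2), round(specificity, 2),
      round(ppv, 2), round(npv, 2))
```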
Model interpretability or explainability also is crucial, Wang said: “The role of these models in clinical practice is decision support, rather than making decisions. Therefore, clinicians prefer to use models that they can understand and that align well with their own experience and knowledge.”
“This is an important reason why scorecard-type risk calculators, like HAS-BLED, are popular in clinical practice, despite the fact that their quantitative performances may not be high,” he added. “HAS-BLED score is simply the sum of individual risk factor scores, while the relationships among the input variables and the GIB risk in the three machine learning models were not as straightforward. This may hinder their clinical utilities.”
Limitations of the study included the absence from the database of uninsured people and those insured by Medicaid, which may limit generalizability.
- Machine learning models performed about as well as the most widely used existing risk calculator in identifying patients on antithrombotic agents at high risk for gastrointestinal bleeding, with two approaches marginally outperforming it.
- Prospective evaluation is essential for assessing the clinical impact of machine-learning risk prediction models, the editorialist pointed out.
Paul Smyth, MD, Contributing Writer, BreakingMED™
This research was funded by a grant from the Agency for Healthcare Research and Quality (AHRQ).
Herrin reported receiving research funding from the National Cancer Institute, AHRQ, the Patient Centered Outcomes Research Institute, and the Centers for Medicare & Medicaid Services.
Wang reported receiving personal fees from IBM, Boehringer Ingelheim, and American Air Liquide and grants from the National Science Foundation, National Institutes of Health, Michael J. Fox Foundation for Parkinson’s Research, Office of Naval Research, and Sanofi outside the submitted work.