Machine Learning Methods Improve Prognostication, Identify Clinically Distinct Phenotypes, and Detect Heterogeneity in Response to Therapy in a Large Cohort of Heart Failure Patients
Background Whereas heart failure (HF) is a complex clinical syndrome, conventional approaches to its management have treated it as a singular disease, leading to inadequate patient care and inefficient clinical trials. We hypothesized that applying advanced analytics to a large cohort of HF patients would improve prognostication of outcomes, identify distinct patient phenotypes, and detect heterogeneity in treatment response.
Methods and Results The Swedish Heart Failure Registry is a nationwide registry collecting detailed demographic, clinical, laboratory, and medication data and linked to databases with outcome information. We applied random forest modeling to identify predictors of 1‐year survival. Cluster analysis was performed and validated using serial bootstrapping. Association between clusters and survival was assessed with Cox proportional hazards modeling and interaction testing was performed to assess for heterogeneity in response to HF pharmacotherapy across propensity‐matched clusters. Our study included 44 886 HF patients enrolled in the Swedish Heart Failure Registry between 2000 and 2012. Random forest modeling demonstrated excellent calibration and discrimination for survival (C‐statistic=0.83) whereas left ventricular ejection fraction did not (C‐statistic=0.52): there were no meaningful differences per strata of left ventricular ejection fraction (1‐year survival: 80%, 81%, 83%, and 84%). Cluster analysis using the 8 highest predictive variables identified 4 clinically relevant subgroups of HF with marked differences in 1‐year survival. There were significant interactions between propensity‐matched clusters (matched across age, sex, and left ventricular ejection fraction) and the following medications: diuretics, angiotensin‐converting enzyme inhibitors, β‐blockers, and nitrates (P<0.001 for all).
Conclusions Machine learning algorithms accurately predicted outcomes in a large data set of HF patients. Cluster analysis identified 4 distinct phenotypes that differed significantly in outcomes and in response to therapeutics. Use of these novel analytic approaches has the potential to enhance effectiveness of current therapies and transform future HF clinical trials.
What Is New?
We applied machine learning methodologies to a large clinical data set (>40 000 heart failure patients) and demonstrated that these methods can predict outcomes in a highly accurate manner as well as identify clinically distinct subgroups that have differential responses to commonly used therapies.
What Are the Clinical Implications?
Advanced analytic methods can use readily available patient clinical data to predict outcomes with a high degree of accuracy and precision.
Agnostic algorithms can define clinically recognizable patient clusters with unique clinical trajectories.
As healthcare systems collect massive amounts of information on patients, there is a need for machine learning methods to augment clinical decision making.
Advanced analytics can play a role in improving heart failure clinical trial design and execution.
Patients have had heart failure (HF) for centuries, and it is estimated that more than 37 million people worldwide are currently affected.1 Despite being a complex clinical syndrome, contemporary clinical descriptors lag far behind its nuanced scientific understanding. In fact, current classifications used clinically and in trials rely heavily on incomplete descriptors such as left ventricular ejection fraction (LVEF) cut points, stratifying patients simply as those with “reduced” or “preserved” LVEF: HFrEF and HFpEF.2
There is increasing recognition that such classifications are discordant with the current understanding of HF and may impair our ability to personalize risk assessment and treatment. The emphasis on LVEF is particularly notable as prior studies have shown only modest differences in long‐term survival among patients with “reduced” as compared with “preserved” LVEF.3, 4 Still further, numerous promising therapies have failed to demonstrate benefit in clinical trials where inclusion was based almost exclusively on LVEF.5 Despite this, recent guidelines have recommended even further subclassification of HF according to LVEF, with the introduction of HF with “midrange ejection fraction” as a distinct clinical entity.6 However, many thought leaders in HF have pushed for the move towards a more refined and nuanced classification of HF, beyond ejection fraction, as this approach is far more likely to have positive implications for individualized patient care and clinical trial design.2, 7 Against this backdrop, the Institute of Medicine has emphasized the need for a new taxonomy of disease that may provide a more accurate classification of HF, with the goal of enhancing diagnosis and treatment.8
Advanced analytics refers to the use of novel statistical approaches that harness the substantial computing power currently available to investigate large data sets.9, 10 In medicine, these methodologies can use a data‐driven approach to re‐examine phenotyping of complex diseases such as HF.11 Recently, these methods identified distinct phenotypes of HF with reduced and preserved ejection fraction (HFrEF and HFpEF) among 1619 patients with HFrEF enrolled in the HF‐ACTION (Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training) clinical trial and 397 patients seen at the outpatient clinic of the Northwestern University HFpEF Program.12, 13
These prior analyses, while intriguing, were limited by their small size, lack of real‐world applicability, and predetermination of patients according to LVEF. Accordingly, we sought to apply machine learning methods to the Swedish Heart Failure Registry, a richly characterized national cohort of more than 40 000 HF patients, to identify clinically meaningful patient subgroups, improve risk prediction, and detect heterogeneity in response to common HF therapies.
The data, analytic methods, and study materials will not be made available to other researchers for purposes of reproducing the results or replicating the procedure. These data are curated and maintained by the SwedeHF research foundation and can be requested at the following website: http://www.ucr.uu.se/rikssvikt-en/research/general-information-research.
Study Population and Data Sources
The Swedish Heart Failure Registry (SwedeHF), the details of which have been described elsewhere, is a nationwide registry enrolling both hospitalized patients and outpatients with clinician‐assessed HF from over 65 hospitals and 113 clinics within Sweden.14, 15, 16, 17 The registry was started in 2000 and fully implemented in 2003. The registry collects or derives more than 130 patient variables including detailed demographic, clinical, and laboratory data and is linked to other registries for additional baseline comorbidity data and for data on outcomes. For the current study, data were extracted from the database on patients enrolled from May 2000 through December 2012. This yielded 80 772 records that represented 51 060 unique patients. Patients lacking follow‐up were excluded, leaving us 44 886 patients for the analysis. All‐cause mortality at 1 year was the primary outcome for the present study and was obtained by merging the SwedeHF database with the Swedish Population Registry using the unique 10‐digit personal identification number of Swedish citizens. The Swedish Board of Health and Welfare (http://www.socialstyrelsen.se) maintains the Population Registry, the Patient Registry, and the Dispensed Drug Registry. The Population Registry provided date of death. The protocol, registration form, and annual report of SwedeHF are available at http://www.SwedeHF.se. Establishment of the registry and analysis of data for this study were approved by a multisite ethics committee. Individual patient consent was not required, but patients were informed of entry into national registries and allowed to opt out. The registry and this study conform to the Declaration of Helsinki.
Definitions were based on those used in the SwedeHF Registry. HF was diagnosed by the attending physician based on guideline recommendations at the time of inclusion. New York Heart Association functional classes I to IV were used to define severity. Revascularization was defined as history of coronary artery bypass surgery or percutaneous coronary intervention. Myocardial infarction was based on information from patient records. Type 2 diabetes mellitus was defined as a confirmed history of this diagnosis. Comorbidities were defined as the presence of any or several of the following: hypertension, atrial fibrillation, chronic obstructive pulmonary disease, valvular heart disease, stroke, peripheral artery disease, or idiopathic dilated cardiomyopathy, all of which were classified as “yes” or “no” based on patient records. LVEF was the most recently estimated value and was grouped into 4 classes: ≥50%, 40% to 49%, 30% to 39%, and <30%. Creatinine clearance was calculated with the Modification of Diet in Renal Disease formula and an estimated glomerular filtration rate <60 mL/min per 1.73 m2 was considered the cutoff for renal insufficiency.
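As an illustration of the renal function definition above, the following Python sketch computes an estimated glomerular filtration rate and applies the <60 mL/min per 1.73 m2 cutoff. It assumes the 4‐variable, IDMS‐traceable form of the Modification of Diet in Renal Disease equation (coefficient 175) with creatinine supplied in µmol/L and no race term; the registry's exact variant and units are not stated here, and the function names are hypothetical.

```python
UMOL_PER_L_PER_MG_PER_DL = 88.4  # creatinine unit conversion factor


def egfr_mdrd(creatinine_umol_l: float, age_years: float, female: bool) -> float:
    """Estimated GFR in mL/min per 1.73 m2.

    Assumption: 4-variable IDMS-traceable MDRD equation, race term omitted.
    """
    creatinine_mg_dl = creatinine_umol_l / UMOL_PER_L_PER_MG_PER_DL
    egfr = 175.0 * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        egfr *= 0.742  # sex coefficient of the MDRD equation
    return egfr


def has_renal_insufficiency(egfr: float) -> bool:
    """Apply the cutoff used in the text: eGFR < 60 mL/min per 1.73 m2."""
    return egfr < 60.0


# Hypothetical patients: same creatinine and age, different sex.
male_egfr = egfr_mdrd(88.4, 60, female=False)
female_egfr = egfr_mdrd(88.4, 60, female=True)
print(round(male_egfr, 1), has_renal_insufficiency(male_egfr))
print(round(female_egfr, 1), has_renal_insufficiency(female_egfr))
```

At an identical creatinine value, the sex coefficient alone can move a patient across the 60 mL/min per 1.73 m2 threshold.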
As previously described, variables with >20% missing data were excluded from the analysis.12, 13 Missing data for the remaining 86 variables were handled using mean (numeric variables) or most common value (non‐numeric variables) imputation. We trained a random forest model to predict survival up to 1 year after hospital discharge or at an outpatient clinic visit.18, 19 The random forest algorithm was chosen because it can be applied to a data set with mixed variable types, performs well on data sets with large numbers of variables, is not prone to overfitting, and allows estimation of variable importance. The model was validated by 10‐fold bootstrapped cross‐validation. To assess model calibration, patients were grouped into deciles based on predicted risk before plotting observed versus predicted risk. We ranked predictors (variables) by decrease in Gini impurity index, estimated while training the random forest model. Using the 8 strongest derived predictors of mortality, we then clustered patients into 4 groups using K‐means clustering with Euclidean distance as a measure of dissimilarity. The variables were the following: age, creatinine, hemoglobin, weight, heart rate, systolic blood pressure, mean arterial pressure, and income. K‐means clustering is an unsupervised learning method that partitions N objects into K clusters in which each object belongs to the cluster with the nearest mean. K‐means can be computed in a computationally efficient manner for large data sets. 
We selected 4 clusters as the smallest number of clusters that optimized cluster separation (evaluated by the Silhouette score), cluster stability (evaluated by Jaccard coefficient estimation over 10 bootstrapped samples), and variation in cluster size and difference in mortality between clusters.20 The full cohort was also stratified according to cut points of LVEF that were adapted from the European Society of Cardiology Guidelines for the diagnosis and treatment of acute and chronic HF.4 Four cohorts were delineated representing preserved LVEF (LVEF≥50%) as well as mild‐to‐severe decrements in systolic function (LVEF 40–49%, LVEF 30–39%, and LVEF<30%). One‐year Kaplan–Meier survival curves were generated for each of the 4 clusters and each LVEF group to compare the predictive utility of each methodology. The log‐rank test was used to compare the difference in survival between groups. Each novel cluster was interrogated for variability with respect to demographics, comorbidities, medications, laboratory data, and socioeconomics. Finally, interaction terms in a logistic regression model were used to test for the statistical significance of interactions between clusters and commonly prescribed HF medications. We sought to identify the variation in outcomes between each cluster when exposed to traditionally used HF medications, namely: β‐blockers, digoxin, angiotensin‐converting enzyme inhibitors, angiotensin receptor blockers, mineralocorticoid receptor antagonists, diuretics, and nitrates. We propensity‐matched groups (drug versus no drug) within each cluster for age, sex, and LVEF using the coarsened exact matching method; this ensures that groups being compared are evenly matched on selected variables by removing those subjects who skew the matching. Hazard ratios were estimated through Cox regression after adjustment for LVEF, the major criterion for use of HF therapeutics. Analyses were performed using R 3.2.2 (R Development Core Team, Vienna, Austria).
A 2‐sided P≤0.05 was considered statistically significant for all analyses.
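The missing‐data handling described in this section (exclude variables with >20% missing values, then impute the mean for numeric variables and the most common value for non‐numeric variables) can be sketched as follows. This is a minimal Python illustration rather than the R code used in the analysis; the column names and the `impute_columns` helper are hypothetical.

```python
from collections import Counter
from statistics import mean

MAX_MISSING_FRACTION = 0.20  # variables above this threshold are excluded


def impute_columns(columns):
    """columns maps a variable name to a non-empty list of values (None = missing).

    Returns a new dict without the excluded variables and with gaps filled.
    """
    imputed = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_fraction = (len(values) - len(observed)) / len(values)
        if missing_fraction > MAX_MISSING_FRACTION:
            continue  # too sparse: drop the variable entirely
        if all(isinstance(v, (int, float)) for v in observed):
            fill = mean(observed)  # numeric variable: mean imputation
        else:
            fill = Counter(observed).most_common(1)[0][0]  # most common value
        imputed[name] = [fill if v is None else v for v in values]
    return imputed


# Hypothetical toy data for six patients.
toy = {
    "age": [70, 80, None, 90, 60, 75],           # 1/6 missing: kept
    "sex": ["M", "F", "M", None, "M", "F"],      # 1/6 missing: kept
    "bnp": [None, None, 100, 200, 300, 400],     # 2/6 missing: excluded
}
print(impute_columns(toy))
```

Mean and mode imputation are simple and fast on a cohort of this size, but they shrink variance in the imputed variables, which is one reason the limitations discussed later mention alternative imputation methods.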
Random Forest Machine Learning
The Random Forest (RF) machine learning algorithm is a process of fitting a series of classification and regression trees to the data.18 Each tree is constructed by first taking a single bootstrap sample of the data. The data are iteratively split into nodes as in a standard classification tree; however, the RF process uses a random subset of the predictors at each considered split. Each of the trees is fitted and results are combined across trees. The R package “randomForest” was used to implement this method. We trained an RF model to predict survival up to 1 year after hospital discharge or an outpatient clinic visit.
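The three ingredients named above (a bootstrap sample per tree, a random subset of predictors per split, and impurity‐based node splitting) can be illustrated with a short Python sketch. It is not the “randomForest” implementation; the helper names, the toy data, and the single threshold evaluated are assumptions made for illustration.

```python
import random


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting rows at values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def bootstrap_indices(n_rows, rng):
    """Row indices sampled with replacement: one bootstrap sample per tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_predictors(predictors, n_candidates, rng):
    """Random subset of predictors considered at one split."""
    return rng.sample(predictors, n_candidates)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
ages = [55, 60, 82, 88]   # hypothetical predictor values
died = [0, 0, 1, 1]       # hypothetical 1-year outcome
print(gini_decrease(ages, died, threshold=70))
print(bootstrap_indices(len(ages), rng))
print(candidate_predictors(["age", "creatinine", "hemoglobin", "weight"], 2, rng))
```

Summing such impurity decreases for each predictor over all splits and trees gives the Gini‐based variable importance used to rank predictors.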
Model Validation and Calibration
The model was validated by 10‐fold bootstrapped cross‐validation. To assess model calibration, patients were grouped into deciles based on predicted risk before plotting observed versus predicted risk. The RF algorithm was chosen because it can be applied to a data set with mixed variable types, performs well on data sets with a substantial number of variables, is not prone to overfitting, and allows estimation of variable importance. We ranked predictors (variables) by decrease in Gini impurity index, estimated while training the RF model.21
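The calibration check described above can be sketched in Python as follows: patients are sorted by predicted risk, divided into 10 equal groups, and the mean predicted risk in each group is compared with the observed event rate. The function name and the toy data are hypothetical, and the sketch assumes at least 10 patients.

```python
def calibration_by_decile(predicted_risk, died):
    """Return (mean predicted risk, observed event rate) per risk decile."""
    order = sorted(range(len(predicted_risk)), key=lambda i: predicted_risk[i])
    n = len(order)
    rows = []
    for d in range(10):
        group = order[d * n // 10:(d + 1) * n // 10]
        if not group:
            continue  # fewer than 10 patients: some deciles are empty
        mean_predicted = sum(predicted_risk[i] for i in group) / len(group)
        observed_rate = sum(died[i] for i in group) / len(group)
        rows.append((mean_predicted, observed_rate))
    return rows


# Hypothetical toy data: 20 patients with increasing predicted risk.
risk = [(i + 0.5) / 20 for i in range(20)]
outcome = [1 if i >= 10 else 0 for i in range(20)]
for mean_predicted, observed_rate in calibration_by_decile(risk, outcome):
    print(f"{mean_predicted:.3f}  {observed_rate:.2f}")
```

In a well‐calibrated model the two columns track each other closely, so the points of the observed versus predicted plot lie near the line of identity.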
Using the 8 strongest derived predictors of mortality, we then clustered patients into 4 groups using K‐means clustering with Euclidean distance as a measure of dissimilarity.12, 13 The variables were the following: age, creatinine, hemoglobin, weight, heart rate, systolic blood pressure, mean arterial pressure, and income. K‐means clustering is an unsupervised learning method that partitions N objects into K clusters in which each object belongs to the cluster with the nearest mean. K‐means can be computed in a computationally efficient manner for large data sets. We selected 4 clusters as the smallest number of clusters that optimized cluster separation (evaluated by Silhouette score), cluster stability (evaluated by Jaccard coefficient estimation over 10 bootstrapped samples), and variation in cluster size and difference in mortality between clusters.22 The Jaccard similarity coefficient is a measure of similarity between 2 data sets. A high coefficient indicates that the same members were grouped together each time. A low coefficient indicates that subjects jumped across clusters when the clustering was redone with a bootstrapped sample. Jaccard Index=(the number in both sets)/(the number in either set)×100. Silhouette analysis is used to study the separation between clusters. The silhouette score measures how similar a subject is (considering all the variables used for clustering) to subjects in the same cluster, and how distant from members of other clusters.
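The clustering step and the 2 quality measures defined above can be illustrated with a compact Python sketch. It implements Lloyd's algorithm for K‐means with Euclidean distance, the Jaccard index as defined in the text, and the mean silhouette score. The starting centroids, the toy points, and the function names are assumptions; the sketch does not reproduce the R analysis or any variable scaling it may have used.

```python
import math


def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: assign each point to the nearest mean, then update means."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def jaccard_index(set_a, set_b):
    """(number in both sets) / (number in either set) x 100."""
    a, b = set(set_a), set(set_b)
    if not a | b:
        return 100.0  # two empty sets are treated as identical
    return 100.0 * len(a & b) / len(a | b)


def silhouette_score(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or single cluster: score 0
            continue
        a = sum(own) / len(own)                              # mean distance within cluster
        b = min(sum(d) / len(d) for d in dists.values())     # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups in two dimensions.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(pts, centroids=[(0, 0), (10, 10)])
print(labels, centers)
print(silhouette_score(pts, labels))
print(jaccard_index({1, 2, 3}, {2, 3, 4}))
```

For a stability check of the kind described, the Jaccard index would be computed between the member set of a cluster in the full data and the member set of its best‐matching cluster after re‐clustering a bootstrapped sample.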
Baseline Characteristics Across Clusters and LVEF
Baseline characteristics of the patients in the 4 clusters are described below and listed in Table 1; those according to LVEF cut points are shown in Table 2. Compared with differences in patient characteristics when stratified by clusters, demarcation according to LVEF was far more homogeneous.
As shown, patients in Cluster 1 of 4 (N=11 090; 24.7%) were the oldest, weighed the least, and were more likely than any other cluster to be women. This cluster had, by far, the largest percentage of patients with a normal LVEF (LVEF>50%). They had the highest median blood pressure and the highest percentage of patients with prior strokes/transient ischemic attacks. These patients were the most likely to have a nonischemic cause for their cardiomyopathy (3.4%) and had the lowest percentage to undergo percutaneous coronary intervention (9.7%) or coronary artery bypass surgery (16.3%). A high percentage of these patients had comorbid conditions such as atrial fibrillation (55%), peripheral artery disease (11%), renal insufficiency (12%), aortic stenosis (11%), and prior malignancy (15%) that were second only to Cluster 3. Patients in Cluster 1 also had the second lowest hemoglobin levels and the highest plasma brain natriuretic peptide levels. They had the lowest use of β‐blockers and angiotensin‐converting enzyme inhibitors and second highest use of diuretics, nitrates, and digoxin. Use of implanted device therapies was lowest in this cluster. Patients in this cluster had the lowest rates of smoking or alcohol abuse. They were the least likely to be married/cohabitating or university educated and had the lowest income. They were also the least likely to be seen in the cardiology clinic for follow‐up.
Cluster 2 of 4 (N=9000; 20.1%) comprised mostly men (≈77%) and had the second lowest median age. These patients tended to have the highest weight, with a median body mass index in the obese range (30 kg/m2). They had the second highest median blood pressures and mean arterial pressures. This cluster was characterized by a high prevalence of diabetes mellitus, dyslipidemia, and hypertension; they also had the highest median hemoglobin A1C and cholesterol levels. They had the second highest (after Cluster 4) rates of ischemic cardiomyopathy and patients with reduced LVEF (<40%). They had the highest number of patients who underwent coronary artery bypass surgery (23%). Patients in this cluster generally had the second lowest prevalence of comorbid conditions, again second only to Cluster 4. Accordingly, use of guideline‐recommended HF medications was second to Cluster 4. They had the lowest natriuretic peptide levels. These patients were most likely to be married/cohabitating (62%), had the second highest rates of a university education (16%), and a relatively higher number (51%) were seen in a cardiology specialty clinic.
Patients in Cluster 3 of 4 (N=17 438; 38.8%) were also older, like those in Cluster 1, but were almost evenly split between men and women. These patients had a far lower prevalence of normal LVEF (>50%). These patients similarly had the lowest body mass index and the highest prevalence of comorbid conditions such as renal insufficiency, atrial fibrillation, aortic stenosis, chronic obstructive pulmonary disease, peripheral artery disease, and malignancy. They had the highest percentage of patients who had a prior myocardial infarction and 22% had undergone coronary artery bypass surgery. They had the lowest hemoglobin levels as well as the lowest estimated glomerular filtration rate. They had the highest levels of N‐terminal prohormone of brain natriuretic peptide. They had the highest use of diuretics, nitrates, and digoxin; their use of β‐blockers and angiotensin‐converting enzyme inhibitor (ACE‐I)/angiotensin receptor blocker was the lowest. They tended to have the lowest rates of university education and had the lowest income.
Cluster 4 of 4 (N=7358; 16.4%) was the smallest cluster, and comprised the youngest patients, with a median age of 60 years. They tended to be males (74%) and had the second highest median body mass index (27 kg/m2). The majority of patients in this cluster had HFrEF (LVEF<40%); it was the highest of any cluster. These patients had the lowest rates of hypertension and diabetes mellitus, but the highest rates, by far, of smoking. They had the highest prevalence of ischemic cardiomyopathy and the highest likelihood of having undergone coronary artery revascularization, especially via percutaneous coronary intervention (19%). Patients in this cluster had the lowest rates of comorbid conditions, had the best renal function, and the second‐lowest natriuretic peptide levels. Patients in this cluster had the highest use of neurohormonal blockade and the lowest use of diuretics. They were more likely than any other cluster to be treated with an implantable device. They were the most likely to be married/cohabitating, have a university education, and be followed in a cardiology clinic.
Clinical Outcomes Across Clusters
As shown in Figure 1, analysis of the 1‐year outcomes showed marked differences in outcomes per cluster, with 1‐year survival being as follows: 69% (Cluster 3), 77% (Cluster 1), 92% (Cluster 4), and 93% (Cluster 2). Compared with patients in Cluster 2 (lowest risk), patients had increased risk of adverse outcomes as follows: Cluster 1 (hazard ratio [HR] 3.31, 95% confidence interval [CI], 3.04–3.59), Cluster 3 (HR 4.52, 95% CI, 4.18–4.89), and Cluster 4 (HR 1.19, 95% CI, 1.07–1.33). In contrast, we noted only slight differences in outcomes per LVEF, with 1‐year survival as follows: 79.93% (LVEF≥50%), 80.92% (LVEF<30%), 83.40% (LVEF 40–49%), and 83.72% (LVEF 30–39%).
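The survival comparison relies on Kaplan–Meier curves generated for each group. As a minimal illustration of how such an estimate is formed, the Python sketch below multiplies, at each death time, the fraction of at‐risk patients who survive. The follow‐up times and the function names are hypothetical, and the sketch does not reproduce the registry analysis.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up in days; events: 1 = died, 0 = censored at that time.
    """
    survival = 1.0
    curve = []
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


def survival_at(curve, horizon):
    """Survival probability at a given horizon (for example, 365 days)."""
    probability = 1.0
    for t, s in curve:
        if t <= horizon:
            probability = s
    return probability


# Hypothetical follow-up for five patients in one cluster.
days = [30, 100, 200, 365, 365]
died = [1, 0, 1, 0, 0]
curve = kaplan_meier(days, died)
print(curve)
print(survival_at(curve, 365))
```

Censored patients contribute to the at‐risk count until they leave follow‐up, which is why the estimate differs from the crude proportion alive.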
The area under the curve (AUC) for the 8 strongest derived predictors of mortality from the RF modeling was 0.78. As shown in Figure 2, the addition of further variables improved the AUC to 0.83, which indicated a stronger ability to discriminate individual risk. Cluster membership by itself had modest predictive capability, with an AUC of 0.68. Of note, the LVEF strata that have been proposed to categorize HF had extremely poor prognostic value (AUC=0.52). An online tool based on our methodology is presented at the following website: The SwedeHF Cluster Risk Score (http://hfcalculator.qure.ai) and provides both a prediction of 1‐year mortality as well as membership in the cluster that matches the individual patient best (Figure 3).
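For a binary outcome such as death within 1 year, the AUC (C‐statistic) is the proportion of patient pairs, one who died and one who survived, in which the patient who died was assigned the higher predicted risk. The Python sketch below computes it by direct pair counting; the function name and the toy scores are hypothetical, and a concordance measure for censored survival times would require a different estimator.

```python
def c_statistic(scores, outcomes):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted risks and 1-year outcomes (1 = died).
print(c_statistic([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # 3 of 4 pairs concordant
print(c_statistic([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # uninformative score
```

A value of 0.5 corresponds to chance‐level ranking, which places the AUC of 0.52 reported for the LVEF strata in context.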
Interaction with HF Therapies
We noted significant differences in associations between therapies and mortality across patient clusters and therapies after propensity matching for age, sex, and LVEF, the key determinants of therapeutic intervention in HF (Figures 4 and 5). Patients on diuretics in all clusters did worse than those who were not on diuretics, potentially reflecting more advanced disease. However, there was evidence of interaction between use of diuretics and patient clusters, with patients in Cluster 4 doing the worst (HR for diuretics yes versus no, 2.68, 95% CI, 2.32–3.09), while those in Cluster 3 fared the best (HR 1.69, 95% CI, 1.57–1.82, Pinteraction<0.001). Patients in all clusters who were on β‐blockers and ACE‐I appeared to benefit from the therapy when compared with patients who were not on these therapies. For ACE‐I, despite heterogeneity in patient risks, outcomes were similar among all clusters, implying an interaction between therapy and cluster (Pinteraction<0.05). In the case of β‐blockers, patients in Cluster 3 and Cluster 4 appeared to derive the greatest benefit from therapy (HR 0.56, CI, 0.56–0.64), whereas those in Cluster 1 derived the least (HR 0.82, CI, 0.80–0.93), Pinteraction<0.001. We noted no significant association between digoxin therapy and outcomes in any of the clusters except for Cluster 4 (HR 1.27, CI, 1.13–1.43), and no evidence of interaction with clusters. Lastly, there was evidence of an association with harm with use of nitrates in all clusters, as well as a strong interaction (Pinteraction<0.001), such that Cluster 4 patients on this therapy did the worst and Cluster 3 patients did relatively better when treated with nitrates.
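The matching step behind these comparisons used coarsened exact matching on age, sex, and LVEF. Its core idea can be sketched in Python: coarsen each matching variable into bins, group patients into strata defined by the binned values, and keep only strata that contain both treated and untreated patients. The 10‐year age bands, the field names, and the toy patients are assumptions for illustration; the sketch also omits the stratum weights that a full implementation would apply.

```python
from collections import defaultdict


def coarsen(patient):
    """Stratum key built from coarsened matching variables.

    Assumption: 10-year age bands; sex and LVEF class are used as recorded.
    """
    age_band = min(patient["age"] // 10, 9)
    return (age_band, patient["sex"], patient["lvef_class"])


def coarsened_exact_match(patients):
    """Keep patients whose stratum holds both treated and untreated members."""
    strata = defaultdict(list)
    for p in patients:
        strata[coarsen(p)].append(p)
    matched = []
    for members in strata.values():
        has_treated = any(p["on_drug"] for p in members)
        has_untreated = any(not p["on_drug"] for p in members)
        if has_treated and has_untreated:
            matched.extend(members)
    return matched


# Hypothetical patients within one cluster.
cohort = [
    {"id": 1, "age": 72, "sex": "F", "lvef_class": "<30%", "on_drug": True},
    {"id": 2, "age": 78, "sex": "F", "lvef_class": "<30%", "on_drug": False},
    {"id": 3, "age": 55, "sex": "M", "lvef_class": ">=50%", "on_drug": True},
]
print([p["id"] for p in coarsened_exact_match(cohort)])
```

Patient 3 is dropped because no untreated patient shares the same stratum, which is how the method removes subjects who would otherwise skew the comparison.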
In this study, we used machine learning methods for a cohort of >40 000 HF patients in Sweden to examine whether we might gain unique insights into prognostication, categorization, and assessment of therapeutic heterogeneity. We generated 4 novel phenotypes of disease within this population using cluster analysis. Despite being entirely data driven, these phenotypic clusters were clinically recognizable and demonstrated strong prognostic value, more so than currently used approaches. We demonstrated that this method can identify differences in benefit from common therapeutics, a clinical actuality that needs to be quantified. As a proof of concept, we created an online tool that is readily amenable to inclusion into the electronic health record (EHR) and can allow for cluster assignment and prognostication based on clinical input. These findings highlight the potential for data‐driven classifications of HF to supplant the currently used categorization system, as well as the potential for novel and nuanced approaches to advance risk stratification, prediction, and treatment response. As shown in Figure 6, our work has important implications for patients and providers, with the ability to personalize risk and response to therapy, and for future clinical trials in which enriched phenotyping and more refined classification can facilitate testing of novel therapies.
Our goal was not to create yet another risk prediction score for HF. Rather, this was a proof‐of‐concept demonstration that machine learning methods can provide a very high degree of discrimination for patient risk (AUC=0.83), far superior to currently recommended risk models such as the Seattle Heart Failure Model (AUC=0.73) and the Meta‐Analysis Global Group in Chronic Heart Failure (MAGGIC) risk score (AUC=0.74).23, 24 Furthermore, whereas prediction models might do well on a population level, their ability to be precise in regard to the individual patient and patients outside of the derivation cohort is modest at best; this issue can be overcome by machine learning methods that can tailor predictability to the patients under study.25 Also, published risk models are static, and prognostically important variables such as therapies, habits, physical activity, and location can frequently change, profoundly limiting the clinical application of these risk models in the clinical setting. Last, tremendous amounts of patient‐specific data are available to integrated healthcare systems, ranging from biomarkers to geospatial mapping; this mandates the creation and application of innovative prediction algorithms that are malleable, constantly improving, and allow integration of massive amounts of information.9, 26, 27 As key components of a learning healthcare system, machine learning methods such as the ones demonstrated in this article are dynamic and can improve in response to feedback.
Drug development for HF has been characterized largely by failures of mechanistically promising compounds to show benefit in large clinical trials. A key reason for this has been the amalgamation of patients under the broad umbrellas of arbitrary LVEF cut points when it is increasingly recognized that HF is far too complex a syndrome to be adequately described using these subjective and simplistic variables. In fact, our findings extend the landmark work by Owan and colleagues in 2006 by showing that, even in the contemporary era, there are only modest differences in outcomes across LVEF categories.3, 28, 29 Furthermore, we identified clusters that were clinically recognizable and practical in routine care and provided key information about patients beyond the current classification of HF. For example, Cluster 1 patients tended to be older females with hypertension, nonischemic cardiomyopathy, and preserved LVEF. They had a high burden of comorbid conditions and the lowest rates of medication use. While this phenotype is intuitively recognized by clinicians, our work formalizes and quantifies the characteristics and associated risk as well as the potential therapy response in this and the other 3 clusters. Patients in this cluster on diuretics, aldosterone antagonists, and nitrates did much worse than those who were not. It is plausible that these patients might benefit from more aggressive use of ACE‐I/β‐blockers, even in cases of “preserved” LVEF, as well as a focus on management of comorbid conditions.16, 30 Cluster 2 patients were mostly men and appeared to have the constellation of pro‐atherosclerotic conditions, namely, diabetes mellitus, dyslipidemia, hypertension, and smoking. Thus, they had a high prevalence of patients with ischemic cardiomyopathy and high rates of patients with “reduced” LVEF. Patients in this cluster who were on diuretics, aldosterone antagonists, digoxin, and nitrates tended to fare worse than those who were not.
Apart from avoiding the above medications, these patients might represent a “diabetic HF” phenotype that is increasingly being described, and be managed with aggressive neurohormonal blockade along with medications such as the sodium/glucose cotransporter‐2 inhibitors.31, 32 Cluster 3 patients were similar to Cluster 1 in being older with multiple comorbid conditions and a large percentage of patients with preserved LVEF. They appeared to have the highest rates of renal insufficiency and the most neurohormonal activation as gauged by natriuretic peptide levels. Accordingly, these patients had the worst survival, with only 69% alive at 1 year (compared with 93% of Cluster 2). Beyond neurohormonal blockade with ACE‐I and β‐blockers, all other medications were associated with harm in these patients. A focus on managing comorbid conditions and discussions of goals‐of‐care might be most important for this group of patients. Studies have shown, for example, the limited benefit of implantable cardioverter defibrillators in patients with concomitant chronic kidney disease.33, 34 Lastly, Cluster 4 patients were the youngest and tended to be males with ischemic cardiomyopathy and HFrEF. Despite being the smallest cluster, these patients best fit the profile of those who had been included in most “positive” HF clinical trials. Patients in this cluster on diuretics, aldosterone antagonists, digoxin, and nitrates appeared to do far worse than those who were not on these therapies. Our online tool provides a paradigm by which patient data from the electronic health record (EHR) might be used to rapidly assign patients to specific clusters and then enroll them into studies that are more specific for the therapy being evaluated.
Our aim in this project was to illustrate the potential of using machine learning algorithms to improve care of HF patients and provide novel pathways by which to test new therapies. Several key points should be considered in light of our findings. First, it is important to note that our patient clusters were clinically recognizable despite relatively superficial patient‐specific data in our cohort (eg, no detailed information on biomarkers) and outperformed the predictive and classification capabilities of LVEF‐based categories. Second, we uncovered the significant heterogeneity in response to guideline‐recommended HF therapies after correcting for LVEF, the key determinant of whether a patient qualifies for treatment, based on the guidelines. We demonstrated the near universal benefit of ACE‐I and β‐blockade, suggesting that these therapies work on pathways that may be fundamental to the syndrome. Digoxin and nitrates, therapies whose mortality benefit in HF is largely unsupported by clinical trial, tended to be associated with a null effect or harm across the clusters. The data in regard to aldosterone antagonists were intriguing, because while these medications have been shown to provide benefit in HF in closely monitored trials, real‐world experience has countered these findings.35 Although intriguing, these findings should be interpreted with caution given the observational nature of our data and potential for selection bias and unmeasured confounding.
Moving forward, it is entirely plausible that machine learning methods will be used across healthcare systems to identify patients who may gain the most benefit or most harm from therapies (eg, implantable cardioverter defibrillator use) and create platforms for “next generation” clinical trials. An example is shown in Figure 3, where information from a fictitious patient is entered and the patient is assigned to Cluster 4; the patient can then be entered into treatment pathways that have been enriched for that particular phenotype.
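The assignment step described above can be sketched in a few lines. This is a minimal illustration, assuming each cluster is summarized by a centroid in standardized feature space and that a new patient is assigned to the nearest centroid; it is not the study's online tool, and the nearest‐centroid rule, the feature names, and every numeric value below are hypothetical placeholders rather than results from the cohort.

```python
import math

# Hypothetical placeholders for illustration only: these are NOT the study's
# variables, population means, standard deviations, or cluster centroids.
FEATURES = ["age", "creatinine", "lvef"]
MEANS = {"age": 70.0, "creatinine": 100.0, "lvef": 40.0}
STDS = {"age": 10.0, "creatinine": 40.0, "lvef": 12.0}
CENTROIDS = {
    1: [0.8, 0.2, 0.9],
    2: [-0.2, -0.4, 0.1],
    3: [0.9, 1.2, 0.5],
    4: [-1.1, -0.3, -1.0],
}


def standardize(patient):
    """Convert raw values to z-scores using the (assumed) cohort statistics."""
    return [(patient[f] - MEANS[f]) / STDS[f] for f in FEATURES]


def assign_cluster(patient):
    """Return the label of the centroid closest to the patient (Euclidean)."""
    z = standardize(patient)
    return min(CENTROIDS, key=lambda label: math.dist(z, CENTROIDS[label]))


# A fictitious patient, as in the Figure 3 example.
fictitious_patient = {"age": 59.0, "creatinine": 88.0, "lvef": 28.0}
print("Assigned cluster:", assign_cluster(fictitious_patient))
```

In practice the centroids and scaling constants would come from the fitted clustering model, and the input values would be drawn from the EHR.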
The major limitation of our study is that the findings are heavily reliant on available data; a different set of variables and more complete data would very likely have yielded different clusters. We excluded those with >20% of missing data, with the missing variables likely to be laboratory values that have prognostic value in HF. Their inclusion, natriuretic peptide levels in particular, is likely to have affected prognostication and clustering. Whereas a more comprehensive exploration of imputation methods might have improved accuracy slightly, our purpose was not to create yet another prediction algorithm. Rather, it was a proof‐of‐concept effort aimed at demonstrating that machine learning methodologies can prognosticate accurately and precisely from a large patient data set. Also, this limitation is liable to occur in many similar settings that involve large databases, and there is a need, in the future, for expert consensus regarding applying advanced analytics to clinical data of varying quality and completeness. Furthermore, our goal was not to present yet another set of patient clusters; rather, it was to show how machine learning algorithms have the potential to move us beyond the simplistic phenotyping of HF that hinders care of patients and development of new therapies. This approach has potential if embedded within electronic health records in our ultimate quest to create a learning healthcare system.
Second, we only included patients from Sweden in this article, a relatively homogeneous population of patients. However, this likely helped us do away with many unmeasured confounders that impact clinical trajectory, and strengthened our ability to find clinically important clusters.
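The >20% missing‐data exclusion mentioned above can be expressed as a simple filter. The following is a minimal sketch, assuming the rule is applied per patient record across a fixed list of variables and that missing values are stored as `None`; the function names, variable names, and example records are hypothetical and do not come from the registry.

```python
def missing_fraction(record, variables):
    """Fraction of the listed variables that are absent or None in a record."""
    missing = sum(1 for v in variables if record.get(v) is None)
    return missing / len(variables)


def exclude_incomplete(records, variables, max_missing=0.20):
    """Keep records whose missing fraction does not exceed max_missing."""
    return [r for r in records if missing_fraction(r, variables) <= max_missing]


# Hypothetical variable list and toy records, for illustration only.
variables = ["age", "sex", "lvef", "creatinine", "nt_probnp"]
records = [
    {"age": 71, "sex": "F", "lvef": 55, "creatinine": 90, "nt_probnp": None},
    {"age": 64, "sex": "M", "lvef": None, "creatinine": None, "nt_probnp": None},
]
kept = exclude_incomplete(records, variables)
print(len(kept), "of", len(records), "records retained")
```

A record with exactly 20% of variables missing is retained here, because the stated rule excludes only those with more than 20% missing.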
Lastly, the heterogeneity in treatment response may reflect confounders other than age, sex, and LVEF, but our goal was simply to demonstrate a concept; furthermore, our approach is entirely in accordance with guidelines, in which pharmacological therapies are based almost entirely on LVEF.
In conclusion, we found that machine learning algorithms predict outcomes in a large data set of HF patients from Sweden with a high degree of precision and accuracy. Cluster analysis identified 4 distinct phenotypes that differed significantly in outcomes and in response to therapeutics. The use of these novel analytic approaches has the potential to enhance effectiveness of current therapies and transform future clinical trials.
Sources of Funding
The Swedish Heart Failure Registry is funded by the Swedish Federal Government. Lund is a Swedish Research Council Clinical Researcher. SwedeHF data management was funded by the Swedish Research Council, the Swedish Heart‐Lung Foundation, and Stockholm County Council.
Disclosures
Ahmad reports a grant from Heart Failure Society of America unrelated to this work. Lund reports grants from AstraZeneca, Novartis, and Boston Scientific unrelated to the present work, and consulting honoraria from AstraZeneca, Novartis, Bayer, Relypsa, and Vifor Pharma. Dahlström reports work grants from AstraZeneca unrelated to this work and consulting honoraria from AstraZeneca, Novartis, and Vifor Pharma. Desai reports research funding from the Centers for Medicare and Medicaid Services to develop and maintain performance measures that are used for public reporting, and support from Johnson & Johnson and Medtronic. K12 HS023000‐01 was from the Agency for Healthcare Research and Quality. The remaining authors have no disclosures to report.
- Konstam MA, Abboud FM. Ejection fraction: misunderstood and overrated (changing the paradigm in categorizing heart failure). Circulation. 2017;135:717–719.
- Ponikowski P, Voors AA, Anker SD, Bueno H, Cleland JG, Coats AJ, Falk V, Gonzalez‐Juanatey JR, Harjola VP, Jankowska EA, Jessup M, Linde C, Nihoyannopoulos P, Parissis JT, Pieske B, Riley JP, Rosano GM, Ruilope LM, Ruschitzka F, Rutten FH, van der Meer P; ESC Scientific Document Group. 2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: the task force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC). Developed with the special contribution of the Heart Failure Association (HFA) of the ESC. Eur Heart J. 2016;37:2129–2200.
- Felker GM, Pang PS, Adams KF, Cleland JG, Cotter G, Dickstein K, Filippatos GS, Fonarow GC, Greenberg BH, Hernandez AF, Khan S, Komajda M, Konstam MA, Liu PP, Maggioni AP, Massie BM, McMurray JJ, Mehra M, Metra M, O'Connell J, O'Connor CM, Pina IL, Ponikowski P, Sabbah HN, Teerlink JR, Udelson JE, Yancy CW, Zannad F, Gheorghiade M; International AHFS Working Group. Clinical trials of pharmacological therapies in acute heart failure syndromes: lessons learned and directions forward. Circ Heart Fail. 2010;3:314–325.
- Hsu JJ, Ziaeian B, Fonarow GC. Heart failure with mid‐range (borderline) ejection fraction: clinical implications and future directions. JACC Heart Fail. 2017;5:763–771.
- Packer M. Heart failure with a mid‐range ejection fraction: a disorder that a psychiatrist would love. JACC Heart Fail. 2017;5:805–807.
- Yancy CW, Jessup M, Bozkurt B, Butler J, Casey DE Jr., Colvin MM, Drazner MH, Filippatos G, Fonarow GC, Givertz MM, Hollenberg SM, Lindenfeld J, Masoudi FA, McBride PE, Peterson PN, Stevenson LW, Westlake C. 2016 ACC/AHA/HFSA Focused Update on New Pharmacological Therapy for Heart Failure: An Update of the 2013 ACCF/AHA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines and the Heart Failure Society of America. J Am Coll Cardiol. 2016;68:1476–1488.
- Ahmad T, Testani JM, Desai NR. Can big data simplify the complexity of modern medicine?: prediction of right ventricular failure after left ventricular assist device support as a test case. JACC Heart Fail. 2016;4:722–725.
- Darcy AM, Louie AK, Roberts LW. Machine learning and the profession of medicine. JAMA. 2016;315:551–552.
- Ahmad T, Pencina MJ, Schulte PJ, O'Brien E, Whellan DJ, Pina IL, Kitzman DW, Lee KL, O'Connor CM, Felker GM. Clinical implications of chronic heart failure phenotypes defined by cluster analysis. J Am Coll Cardiol. 2014;64:1765–1774.
- Shah SJ, Katz DH, Selvaraj S, Burke MA, Yancy CW, Gheorghiade M, Bonow RO, Huang CC, Deo RC. Phenomapping for novel classification of heart failure with preserved ejection fraction. Circulation. 2015;131:269–279.
- Frizzell JD, Liang L, Schulte PJ, Yancy CW, Heidenreich PA, Hernandez AF, Bhatt DL, Fonarow GC, Laskey WK. Prediction of 30‐day all‐cause readmissions in patients hospitalized for heart failure: comparison of machine learning and other statistical approaches. JAMA Cardiol. 2017;2:204–209.
- Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine‐learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017;12:e0174944.
- Lovmar L, Ahlford A, Jonsson M, Syvanen AC. Silhouette scores for assessment of SNP genotype clusters. BMC Genom. 2005;6:35.
- Levy WC, Mozaffarian D, Linker DT, Sutradhar SC, Anker SD, Cropp AB, Anand I, Maggioni A, Burton P, Sullivan MD, Pitt B, Poole‐Wilson PA, Mann DL, Packer M. The Seattle heart failure model: prediction of survival in heart failure. Circulation. 2006;113:1424–1433.
- Allen LA, Matlock DD, Shetterly SM, Xu S, Levy WC, Portalupi LB, McIlvennan CK, Gurwitz JH, Johnson ES, Smith DH, Magid DJ. Use of risk models to predict death in the next year among individual ambulatory patients with heart failure. JAMA Cardiol. 2017;2:435–441.
- Krumholz HM. Big data and new knowledge in medicine: the thinking, training, and tools needed for a learning health system. Health Aff (Millwood). 2014;33:1163–1170.
- Chen R, Mias GI, Li‐Pook‐Than J, Jiang L, Lam HY, Chen R, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, Cheng Y, Clark MJ, Im H, Habegger L, Balasubramanian S, O'Huallachain M, Dudley JT, Hillenmeyer S, Haraksingh R, Sharon D, Euskirchen G, Lacroute P, Bettinger K, Boyle AP, Kasowski M, Grubert F, Seki S, Garcia M, Whirl‐Carrillo M, Gallardo M, Blasco MA, Greenberg PL, Snyder P, Klein TE, Altman RB, Butte AJ, Ashley EA, Gerstein M, Nadeau KC, Tang H, Snyder M. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012;148:1293–1307.
- Loscalzo J. Personalized cardiovascular medicine and drug development: time for a new paradigm. Circulation. 2012;125:638–645.
- Mehra MR, Butler J. Comorbid conditions in heart failure: an unhappy marriage. Heart Fail Clin. 2014;10:ix.
- Sattar N, Petrie MC, Zinman B, Januzzi JL Jr. Novel diabetes drugs and the cardiovascular specialist. J Am Coll Cardiol. 2017;69:2646–2656.
- Fitchett D, Butler J, van de Borne P, Zinman B, Lachin JM, Wanner C, Woerle HJ, Hantel S, George JT, Johansen OE, Inzucchi SE; EMPA‐REG OUTCOME® trial investigators. Effects of empagliflozin on risk for cardiovascular death and heart failure hospitalization across the spectrum of heart failure risk in the EMPA‐REG OUTCOME® trial. Eur Heart J. 2018;39:363–370.
- Allen LA, Stevenson LW, Grady KL, Goldstein NE, Matlock DD, Arnold RM, Cook NR, Felker GM, Francis GS, Hauptman PJ, Havranek EP, Krumholz HM, Mancini D, Riegel B, Spertus JA; American Heart Association; Council on Quality of Care and Outcomes Research; Council on Cardiovascular Nursing; Council on Clinical Cardiology; Council on Cardiovascular Radiology and Intervention; Council on Cardiovascular Surgery and Anesthesia. Decision making in advanced heart failure: a scientific statement from the American Heart Association. Circulation. 2012;125:1928–1952.
- Bansal N, Szpiro A, Reynolds K, Smith DH, Magid DJ, Gurwitz JH, Masoudi F, Greenlee RT, Tabada GH, Sung SH, Dighe A, Go AS. Long‐term outcomes associated with implantable cardioverter defibrillator in adults with chronic kidney disease. JAMA Intern Med. 2018;178:390–398.
- Hernandez AF, Mi X, Hammill BG, Hammill SC, Heidenreich PA, Masoudi FA, Qualls LG, Peterson ED, Fonarow GC, Curtis LH. Associations between aldosterone antagonist therapy and risks of mortality and readmission among patients with heart failure and reduced ejection fraction. JAMA. 2012;308:2097–2107.