Machine learning used to generate a new holistic model for coronary artery disease

Coronary artery disease is highly complex, with a wide range of contributing factors and many clinical manifestations. Early detection is therefore imperative, because it allows preventive measures such as lipid-lowering therapies and lifestyle modifications to be implemented.

Study: Machine Learning-Based Marker for Coronary Artery Disease: Derivation and Validation in Two Longitudinal Cohorts. Image Credit: Gorodenkoff / Shutterstock


Quantitative differences in the amount and composition of plaque and in coronary stenosis help assess the risk of myocardial infarction and death. Misclassification and misdiagnosis of coronary artery disease can lead to stroke, heart attack, and death.

Hypertension, dyslipidemia, diabetes, and smoking are common factors associated with coronary artery disease events. These factors are included in tools, such as the Framingham Risk Score, pooled cohort equations (PCE), and SCORE2, which are used to predict coronary artery disease events. However, these tools use only a small amount of data from electronic health records (EHRs) and discard most of it. Some of the critical data discarded by these tools include vital signs, medications, laboratory tests, symptoms, and many other clinical features.

Machine learning could be used to analyze and interpret large amounts of heterogeneous clinical data from patients across EHR-based healthcare systems. For example, machine learning models have been designed to accurately predict 5- or 10-year risk of coronary artery disease based on EHR data.

A recent EHR-based model has outperformed PCE in predicting coronary artery disease status at one year. These models are predominantly used as classification tools within a binary framework (disease present or absent); they do not measure disease on a continuous, quantitative scale. A quantitative evaluation of coronary artery disease could be more beneficial, as it would support more personalized care.

A new study

A recent study published in The Lancet investigated whether an in silico score for coronary artery disease (ISCAD), based on a machine learning model, can be used as a clinical marker to detect coronary artery disease. The authors also evaluated whether the marker could be used for risk stratification and to assess disease prognosis.

Conventionally, molecules or anthropometric measurements are used as in vivo disease markers. The current study evaluated the utility of ISCAD, which is based on multiple clinical data points in the EHR, as an in silico marker of coronary artery disease.

The study cohort consisted of participants from two EHR-linked biobanks in the US and the UK. The BioMe Biobank consists of more than 60,000 people of various ethnicities based in the US. In addition, the model was tested externally in the UK Biobank, which comprises more than 500,000 British people.

Clinical features associated with coronary artery disease were extracted from the EHRs. The machine learning model used in this study was adapted from a previous model that predicted short-term risk of coronary artery disease within a binary framework based on EHR data. The model's probability scores were used as a quantitative marker of coronary artery disease.
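To make the difference between a binary call and a continuous marker concrete, here is a minimal Python sketch. It is not the study's model: the logistic function, the feature names, the coefficients, and the two patients are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients, chosen only for illustration; the study's
# model and its features are not reproduced here.
WEIGHTS = {"age_decades": 0.5, "systolic_bp_per_10mmHg": 0.1, "on_statin": 0.8}
INTERCEPT = -5.0


def model_probability(features):
    """Return a probability in (0, 1) from a simple logistic function."""
    z = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def binary_call(probability, threshold=0.5):
    """Collapse the probability into a yes/no classification."""
    return probability >= threshold


# Two made-up patients that a binary framework treats identically.
patient_a = {"age_decades": 4.0, "systolic_bp_per_10mmHg": 12.0, "on_statin": 0}
patient_b = {"age_decades": 6.0, "systolic_bp_per_10mmHg": 13.0, "on_statin": 0}

for name, patient in (("A", patient_a), ("B", patient_b)):
    p = model_probability(patient)
    print(f"patient {name}: probability={p:.2f}, binary call={binary_call(p)}")
```

Both hypothetical patients fall below the 0.5 threshold and receive the same binary call, yet their probabilities differ. That retained difference is what makes a probability score usable as a quantitative marker.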

Key results

A total of 95,935 participants (35,749 from the BioMe Biobank and 60,186 from the UK Biobank) were included in this study. The median age of the participants was around 62 years. The BioMe Biobank sample consisted of 41% men and 59% women, and 14% were diagnosed with coronary artery disease. Similarly, the UK Biobank sample comprised 42% men and 58% women, and 14% of participants were diagnosed with coronary artery disease.

The clinical prediction model for coronary artery disease had an area under the receiver operating characteristic (ROC) curve of 0.95 and 0.93 in the BioMe holdout and validation sets, respectively. It also achieved a sensitivity of 0.84 and a specificity of 0.8 on the UK Biobank external test set.
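For readers unfamiliar with these metrics, the following Python sketch shows how the area under the ROC curve, sensitivity, and specificity can be computed from true labels and model scores. The labels, scores, and the 0.5 threshold are invented toy values, not data from the study, and the function names are illustrative.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= threshold
        if label == 1 and predicted_positive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison: the fraction of
    (case, non-case) pairs in which the case has the higher score,
    counting ties as half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy data (1 = diagnosed, 0 = not diagnosed); not from the study.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"AUROC={auroc(labels, scores):.2f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the ROC curve summarizes how well the scores rank cases above non-cases across all thresholds.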

ISCAD captured the coronary artery disease risk conveyed by known risk factors, PCE, and polygenic risk scores. Coronary artery stenosis increased quantitatively with ascending ISCAD quartiles, as did the risk of multivessel coronary artery disease, obstructive coronary artery disease, and stenosis of the major coronary arteries. In addition, the hazard and rate of all-cause death gradually increased across ISCAD deciles.
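Stratifying outcomes by score quartiles or deciles can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the study's analysis: the scores and events are toy values, the function names are hypothetical, and bins are assigned by simple rank.

```python
def quantile_bins(scores, n_bins):
    """Assign each score to a bin 1..n_bins by rank (4 = quartiles, 10 = deciles)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, index in enumerate(order):
        bins[index] = rank * n_bins // len(scores) + 1
    return bins


def event_rate_by_bin(bins, events, n_bins):
    """Fraction of participants with the event in each bin (None if a bin is empty)."""
    rates = {}
    for b in range(1, n_bins + 1):
        members = [e for assigned, e in zip(bins, events) if assigned == b]
        rates[b] = sum(members) / len(members) if members else None
    return rates


# Toy data: a score per participant and whether an event occurred (1) or not (0).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
events = [0, 0, 0, 1, 0, 1, 1, 1]

quartiles = quantile_bins(scores, n_bins=4)
for quartile, rate in event_rate_by_bin(quartiles, events, n_bins=4).items():
    print(f"quartile {quartile}: event rate = {rate:.2f}")
```

An event rate that rises from the lowest to the highest bin is the kind of graded relationship the study reports across ISCAD quartiles and deciles.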


The current study has some limitations, including the use of diagnostic codes to establish coronary artery disease case status, which carries a high potential for misclassification. Furthermore, a small sample size could affect the generalizability of the findings.

Importantly, the analysis of EHR data with machine learning models opens up a new avenue for assessing a broad spectrum of diseases. This study determined the association of ISCAD with clinical outcomes of coronary artery disease, including recurrent myocardial infarction, atherosclerotic plaque burden, and all-cause death. The machine learning-based marker also enabled the identification of underdiagnosed individuals with elevated ISCAD scores and EHR tests.

In the future, more research is required to determine the association of in silico markers with the occurrence of coronary artery disease events and deaths. The effectiveness of this strategy needs to be further evaluated using other populations as well.

Journal reference:

  • Forrest, I. S. et al. (2022). Machine learning-based marker for coronary artery disease: derivation and validation in two longitudinal cohorts. The Lancet.
