Multi-omic machine learning can predict therapy response in breast cancer

April 2022 Health Innovation Andrea Enguita

Tumour ecosystems are increasingly recognised as major determinants of treatment response in breast cancer patients. A recent study used an ensemble machine learning approach to integrate multi-omic features from pre-treatment cancer biopsies and derive predictors of pathological complete response (pCR). The models were externally validated and showed very good discriminative power; they may be used to determine therapy choice in future clinical trials.

Neoadjuvant treatment is increasingly used in the management of breast cancer to improve rates of breast-conserving surgery and increase survival. Unfortunately, not all patients respond well to this kind of therapy. Breast cancers are complex ecosystems of malignant cells and the tumour microenvironment. How these ecosystems are organised in breast cancer appears to be strongly associated with their genomic features. Sammut et al. hypothesised that improved prediction models of treatment response need to account for tumours as complex ecosystems. Their study characterised biological parameters extracted from a prospective neoadjuvant study that collected detailed pre-therapy tumour multi-omic data, and associated these with the eventual response. Malignant cell, immune activation and immune evasion features were found to be associated with treatment response. These features, derived from clinicopathological variables, digital pathology, and DNA and RNA sequencing, were used as input to an ensemble machine learning approach to generate predictive models.

Multi-platform profiling of tumour biopsies

This study enrolled 180 women with early and locally advanced breast cancer undergoing neoadjuvant treatment into a molecular profiling study (TransNEO). Fresh-frozen pre-treatment core tumour biopsies were collected from 168 cases using ultrasound guidance. DNA and RNA were extracted and profiled by shallow whole-genome sequencing, whole-exome sequencing and RNA sequencing. The haematoxylin and eosin-stained slides of the diagnostic core biopsies from 166 cases were digitised. Chemotherapy was administered for a median of 18 weeks in 145 cases. Patients with HER2+ tumours (n = 65) received a median of three cycles of anti-HER2 therapy in combination with a taxane. Response was assessed at surgery using the residual cancer burden (RCB) classification. On completion of neoadjuvant treatment, of the 161 cases with RCB assessment, 42 (26%) had a pathological complete response (pCR), 25 (16%) had a good response (RCB-I), 65 (40%) had a moderate response (RCB-II) and 29 (18%) had extensive residual disease (RCB-III).
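As a quick arithmetic check, the response percentages above can be recomputed from the reported case counts. The short Python sketch below does only that; the names rcb_counts and response_rates are illustrative and do not come from the study.

```python
# RCB category counts reported in the article for the cases with RCB assessment.
rcb_counts = {"pCR": 42, "RCB-I": 25, "RCB-II": 65, "RCB-III": 29}


def response_rates(counts):
    """Return each category's share of all assessed cases, rounded to whole percent."""
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


total_assessed = sum(rcb_counts.values())
rates = response_rates(rcb_counts)

print(f"Cases with RCB assessment: {total_assessed}")
for name, pct in rates.items():
    print(f"{name}: {rcb_counts[name]} cases ({pct}%)")
```

The four categories add up to the 161 assessed cases, and the rounded shares match the percentages quoted in the text.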

Clinical phenotypes and genomic landscapes

Among the clinical features assessed individually, only ER status was associated with pCR. The genomic landscape was also associated with response. Tumours that attained pCR mostly came from more-aggressive integrative cluster subtypes, were enriched for TP53 mutations, had higher tumour mutation burdens and neoantigen loads, had less-complex clonal architectures and were enriched for APOBEC and HRD signatures. Interestingly, some tumours appear to have escaped immune surveillance by losing copies of HLA class I alleles, and these tumours are less likely to respond to treatment. In therapy-naive tumours, proliferation and immune response, both innate and adaptive, have combined effects that are associated with sensitivity to treatment. In general, tumours that attain pCR tend to be highly proliferative and display evidence of an active tumour immune microenvironment (TiME). However, some tumours, despite being proliferative and having an enriched TiME, display features of T cell dysfunction and tend to be resistant to therapy.

A machine learning framework was used to integrate all analysed features into a predictive model of pCR. The fully trained models were then validated on an independent external cohort of 75 patients who received neoadjuvant therapy. The fully integrated model (clinical, DNA, RNA, digital pathology and treatment) achieved an area under the curve of 0.87 and relied on features from all data modalities, with RNA features making the greatest contribution. In a clinical workflow, the predictive models could be applied to candidates for neoadjuvant therapy: patients predicted to have chemo-resistant tumours should be considered for enrolment into clinical trials of novel therapies, since their prognosis is poor with standard-of-care therapies.
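To make the reported metric concrete: the area under the (receiver operating characteristic) curve can be read as the probability that a randomly chosen patient with pCR receives a higher predicted score than a randomly chosen patient with residual disease. The Python sketch below computes it that way on made-up toy data. The combination of models by simple averaging, the names roc_auc and ensemble_average, and all scores are assumptions for illustration only, not the study's code or data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive case (label 1) is scored above a negative
    case (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def ensemble_average(per_model_scores):
    """Average the predicted scores of several models, case by case (assumed scheme)."""
    return [sum(case) / len(case) for case in zip(*per_model_scores)]


# Hypothetical toy data: 1 = pCR, 0 = residual disease.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2, 0.1, 0.2]
model_b = [0.8, 0.7, 0.3, 0.6, 0.2, 0.3, 0.2, 0.1]

combined = ensemble_average([model_a, model_b])
auc = roc_auc(labels, combined)
print(f"Toy ensemble AUC: {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of responders from non-responders, which is the scale on which the reported 0.87 should be read.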


This study showed that response to treatment is determined to a great degree by the baseline characteristics of the tumour ecosystem as a whole. Machine learning models for prediction of therapy response that combine clinical, molecular and digital pathology data significantly outperform those based on clinical variables alone. The high accuracy obtained in external validation suggests that the models are robust and that molecular and digital pathology data could be used to determine therapy choice in future clinical trials, including in the adjuvant therapy setting. More generally, the framework highlights the importance of data integration in machine learning models for response prediction and could be used to generate similar predictors for other cancers.


Sammut S-J, Crispin-Ortuzar M, Chin S-F, et al. Multi-omic machine learning predictor of breast cancer therapy response. Nature. 2022;601:623-9.