In recent months, before attention shifted to the pressing issue of tackling the COVID-19 pandemic, artificial intelligence (AI) had been hitting the headlines in medical news. Most weeks saw high-profile publications and announcements of how AI is forging advances in research or how it will transform healthcare. Genomics is one of the fields where expectations for AI are high. Our new report Artificial intelligence for genomic medicine looks beyond the hyperbole to examine how AI is currently being used, where things might be heading, and the challenges ahead.
The promise of AI for genomic medicine
The AI approach of machine learning, and its subset deep learning, are on the rise in genomic medicine and research. Ten years ago, there were roughly 300 publications listed on PubMed relating to AI in genetics or genomics, rising to around 2,000 in 2019. Publications aside, a growing range of companies are building AI applications in the genomics space, for example for drug discovery, gene editing, and variant analysis. There is also a growing number of academic machine learning resources for genomics, some of which have been in routine use in clinical genomics analysis for some time.
This rise of AI in genomics is unsurprising. Genomics – a ‘big data’ field – requires computational approaches to interrogate the enormous volume of data generated by sequencing technologies and to marry it in meaningful ways with other biological and clinical data. Analysing these datasets for new biological insights can be especially difficult when the rules have to be explicitly predefined, step by step, within the computer code. Machine learning techniques, by contrast, can learn patterns directly from the data without the need to specify explicit rules.
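As a loose illustration of that difference (not drawn from the report), the sketch below contrasts a hand-coded filtering rule with a classifier that learns its own decision boundary from labelled examples. The variant features, thresholds and use of scikit-learn are hypothetical choices made purely for the sake of the example.

```python
# Hypothetical sketch: an explicit, hand-coded rule versus a model that
# learns from labelled data. Features, thresholds and labels are synthetic
# and purely illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 1000

# Synthetic variant features: population allele frequency and a conservation score
allele_freq = rng.random(n) * 0.05
conservation = rng.random(n)
X = np.column_stack([allele_freq, conservation])

# Toy labels: a variant is 'of interest' if it is rare and highly conserved
y = ((allele_freq < 0.01) & (conservation > 0.7)).astype(int)

def rule_based_flag(variant):
    """Explicit approach: every threshold must be written into the code by hand."""
    freq, cons = variant
    return int(freq < 0.01 and cons > 0.7)

# Machine learning approach: the model infers the decision boundary from the data
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

candidate = np.array([[0.005, 0.9]])  # a rare, well-conserved variant
print("Hand-coded rule flags variant:", rule_based_flag(candidate[0]))
print("Learned probability of interest:", model.predict_proba(candidate)[0, 1])
```

In practice, real models are trained on far richer feature sets and validated against curated clinical labels, which is exactly where the data quality and bias issues discussed below come into play.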
Hope, hype and hitches
Coupled with more powerful computing infrastructure, machine learning and deep learning are presenting opportunities to:
- Generate new insights from large-scale datasets – improving our understanding of genomic variation in relation to health and disease
- Better streamline key analytical problems in genomics analysis – helping focus the search for disease-causing variants and reducing clinical analysis times
Nearly every stage of the genomics data pipeline is affected by developments in AI, though the greatest activity is in the research phase. By facilitating the analysis of large and complex research datasets, machine learning will accelerate new discoveries in genomic medicine: current studies are seeking to understand how cancers evolve, to examine microbiomes, and to analyse multi-omics datasets.
While we shouldn’t underestimate the eventual medical impact of this research, to date AI’s outcomes for genomic medicine – in a clinical context – are not commensurate with the hype that has surrounded the technology. Broadly, this is because the thresholds for adopting new technologies in healthcare are higher than in other sectors, given the potential for patient harm arising from misuse of an algorithm. More specifically, there are a range of issues which impede the development of robust, safe, clinically validated algorithms that are demonstrably beneficial. These issues – which include reproducibility, data silos, inadequate computing infrastructure, bias in datasets, privacy and security, lack of transparency, and regulatory ambiguity – will need to be addressed effectively if we are to reap the benefits of AI for genomic medicine.
An urgent agenda
Our report sets out seven priority policy actions that could go some way towards meeting the challenges of making AI work to best effect for genomic medicine. Whilst there are currently more immediate and urgent issues in health that will rightly take precedence, this does not absolve policy-makers and other stakeholders of the need to facilitate the safe and effective deployment of AI for genomic medicine and other areas of healthcare. Failure to act promptly would risk:
- Compounding existing disparities. As AI is applied more routinely to genomic datasets, some pre-existing challenges will deepen, notably the lack of diversity in genomic datasets and databases. An imbalance of information on some populations can lead to misdiagnosis, as well as uneven success rates in personalised medicine and in clinical trial outcomes. If left unresolved, the development of AI algorithms using unrepresentative genomic datasets will perpetuate and further entrench health disparities for underserved groups.
- Opportunity costs. A significant amount of investment is being poured into growing AI for healthcare. To make the most of this investment, it is crucial for AI to be channelled effectively to address the most pressing problems together with those where AI is most likely to add value. This requires close collaboration between AI practitioners and genomics domain experts to identify the most appropriate questions to address, to determine which machine learning approaches to apply, and to recognise limitations in datasets, methods, and current knowledge, so as to avoid AI models that may lead to misleading insights or faulty predictions.
- Over-reliance on technology to solve complex problems. Despite its vast potential, AI alone will not advance genomic medicine, and it certainly cannot do so without the necessary oversight, safeguards, validation, robust ethical appraisals, and public engagement. The temptation of ‘tech solutionism’ has come into sharp focus during the current pandemic, and recent reports and commentaries have warned against the rushed deployment of AI and digital technologies without credible supporting evidence and careful oversight.
Efforts to explore the current pandemic using AI are already underway – there are already over 100 pre-print publications (preliminary reports that have not yet been peer reviewed) that deploy machine learning for the study of the virus causing COVID-19. Many of these studies combine sequence data and machine learning, for example to examine the evolutionary origins of the virus, to design antibodies, or to examine the host cellular response. So while other health-related AI news may currently be scarce, the technology certainly hasn’t gone away. Nor should the standards, safeguards, and scrutiny surrounding its development and deployment.