How do we close the diversity gap in genomics?
Tanya Brigden and Bhavya Krishnan
11 July 2024
Not all patients are benefitting equally from advances in genomic research. The research databases and biobanks underpinning genetic tests and tools are dominated by data from populations labelled as “European ancestry”, limiting the generalisability of research findings. As a result, studies based on these databases may be less able to yield results that are meaningfully transferable to other populations, leading to calls to close what is known as ‘the diversity gap’.
In this blog, my colleague Bhavya and I (Tanya) reflect on this challenge and its impacts on patients. We also set out some of the discussion we had together about how all of us working in the genetics and genomics field can take collective steps towards a world where everyone can benefit from genomic healthcare.
Although we approach this problem from different perspectives – Bhavya’s expertise is in biomedical science and mine in bioethics and law – it is clear to us that the lack of diversity and representation in genomics is a clinical, methodological, societal and ethical problem, and that multidisciplinary reflection and problem solving will be key.
The diversity gap
The diversity gap in genomic data refers to the underrepresentation of certain populations, particularly those of non-European ancestry, in genetic research and databases. This bias means the global majority make up the minority in genomic databases, with significant implications for both scientific understanding and the care that people receive.
Early genomic research, like the Human Genome Project, mostly involved people of European descent because major research institutions were in Europe and North America. More funding and resources were available in these high-income countries, so the focus stayed on European populations. Collecting genetic data from more diverse groups faced extra ethical, legal, and logistical challenges, making the diversity gap even wider.
Growing awareness in UK health policy
Awareness of this gap is growing, as are conversations around solutions and strategies to ensure that all communities benefit from genomics. Promoting diversity and equity of access is one of the key pillars of the Genome UK strategy.
Genomics England have set up the Diverse Data initiative, which aims to reduce health inequalities and improve patient outcomes in genomic medicine for minoritised communities.
Most recently, the NHS Race and Health Observatory have published a report highlighting the underrepresentation of ethnic minority groups in genomics datasets, research studies and genetic medicine services, accompanied by a series of recommendations for change. The literature review for this report cites 37 PHG publications, as we too have been grappling with these challenges, particularly in relation to the growing use of innovative technologies such as artificial intelligence and polygenic risk scores.
Despite policymakers acknowledging the disparities in healthcare and health data among different population groups, several challenges still need to be addressed. This was highlighted during the ‘Diversity in Clinical Genomics’ event held at Cambridge West Hub on 21st June, which brought together experts and enthusiasts from the genomics community. We attended this thought-provoking meeting with our colleague Chantal Babb de Villiers, who was also speaking at the event.
So what did we learn about why diversity and representative data are important, and how can we make positive changes towards equity in genomic healthcare?
Impact of the diversity gap in healthcare delivery
Healthcare inequality: Genetic research informs medical treatments, diagnostics, and disease understanding. However, guidelines based on predominantly European cohorts may not apply universally, causing healthcare disparities. For instance, BRCA1 and BRCA2 gene mutations linked to breast and ovarian cancers were first studied in European populations, making risk models and screening less accurate for women of African, Asian, or Hispanic ancestry. Similar issues occur with cystic fibrosis, leading to potential misdiagnoses and delayed treatment in other ethnic groups, undermining their health outcomes.
Limited treatment efficacy: A lack of representative data can lead to suboptimal dosing and increased risks of adverse drug reactions in non-European populations. For example, warfarin dosing guidelines are primarily based on studies conducted in European populations, neglecting variation in other populations in genes such as CYP2C9 (which encodes an enzyme that metabolises warfarin) and VKORC1 (which encodes the drug’s target), both of which significantly affect drug response. Improving treatment efficacy requires inclusive research practices that incorporate diverse genetic data, so that medication regimens can be tailored accurately across all population groups.
Higher number of variants of uncertain significance (VUS): People of non-European ancestry tend to receive more VUS results in genetic tests. This occurs because the reference databases, which are predominantly based on European genetic data, lack sufficient comparative information for other ancestries. For example, when genetic variants are identified in African, Asian, or Hispanic individuals, there may be insufficient data to determine whether these variants are benign or pathogenic. This uncertainty can cause confusion and anxiety for patients and make clinical decision-making harder for healthcare providers.
Bias in genetic studies: Studies that lack diverse representation miss genetic variants that are important for understanding diseases in underrepresented populations. For instance, sickle cell disease is more prevalent in individuals of African descent, but because of the European focus in early genomic studies, there was a delay in understanding the genetic basis and optimal treatments for this disease. Similarly, the genetic predispositions to conditions like Type 2 diabetes may vary significantly across different ethnic groups. Without diverse genetic data, the identification of relevant risk factors and development of tailored interventions remain inadequate.
Missed scientific opportunities: Genetic diversity helps us understand human biology and evolution in ways that studying a single population cannot. Many diseases and health challenges are global in nature. Understanding genetic diversity globally can lead to discoveries that benefit people worldwide, not just in specific regions or ethnic groups. By looking at diverse genetic data, we can see how different groups of people have adapted to their environments, revealing important information about disease resistance, metabolism, and other traits. For example, studying the EPAS1 gene in Tibetan people, which helps them live at high altitudes, has taught us about conditions like hypoxia. Similarly, research on people in malaria-endemic areas has improved our knowledge of resistance to malaria. Without studying diverse populations, we might miss these crucial discoveries.
Bridging the gap
During the event, a range of interrelated strategies and solutions were explored. A few of these reflections particularly stuck with us.
Bhavya: What stood out to me were the nuanced issues in data collection. The call shouldn’t just be for more data, but for more relevant and representative data. Simply amassing large datasets is not enough; we need to collect data that reflects the diversity of the populations we aim to study and serve. This involves prioritising underrepresented groups and ensuring their data is analysed and validated with the same rigour as data from traditionally studied populations. A significant amount of diverse data remains unanalysed and unvalidated, often dismissed as “noise.” This issue arises from historical biases that skewed early studies towards European populations, resulting in reference genomes and tools that don’t account for genetic variation in other groups.
The lack of appropriate analytical tools makes this data clinically non-viable, as scientists cannot translate it into actionable recommendations. Limited resources and funding, and a shortage of skilled researchers, further complicate the situation, and there is no incentive for researchers to go the extra mile to engage with underrepresented communities and encourage participation.
Additionally, there is a crucial difference between data from native populations and diaspora populations. Just because people share genetic ancestry from a particular region doesn’t mean their data is representative or that their genetic information tells the whole story. Environmental conditions and cultural factors over many generations can greatly influence how likely people are to get certain diseases. This underlines why it’s so crucial to gather and study data in a more detailed and careful way.
Tanya: I was very struck by the description of the diversity gap being as much a social issue as a clinical issue. Historical injustices and experiences of discrimination have contributed to mistrust in medical research among marginalised communities, impacting their willingness to participate in genetic studies. This mistrust perpetuates barriers to inclusive research. It is not always clear to people why they should participate and how it might help them and their community. Initiatives such as Your DNA Your Say have explored public attitudes towards genomic data sharing and found variations between different regions in terms of their perceptions of benefit, concerns around possible misuse and trust in organisations. This is why tailored communication strategies that are appropriate for different communities are essential for promoting engagement and trustworthiness. We need to demystify the science and create space to involve and engage communities in study design.
That said, building a trustworthy genomics and genetics ecosystem will require more than meaningful community engagement (although that is an important step!). Funders, peer reviewers and journals have a role in creating a research culture that values population diversity in large-scale genomic studies. There is a need to incentivise not only the collection of diverse data but, as Bhavya raised, analysis of the data that is collected. Diverse representation at all levels of research could also increase trustworthiness in the field – from the participants, to the reviewers of proposals, to the scientists conducting the research.
Ultimately, the insights that we have shared in this blog are inherently tied to one another. Without trust, communities whose involvement is key to unlocking the full benefits of genomics are unlikely to consent to research. But in order for these groups to want to (and be able to) participate, all of us working in the field of genomics and genetics need to demonstrate trustworthiness and create an environment where population diversity is valued.