Are synthetic health data ‘personal data’?

Latest: Read the related paper, ‘High-fidelity synthetic patient data applications and privacy considerations’ (PDF download), by Puja Myles, Colin Mitchell, Elizabeth Redrup Hill, Luca Foschini and Zhenchen Wang, published in the Journal of Data Protection and Privacy, August 2024.

Synthetic data (artificial data that closely mimic the properties and relationships of real data) are not a new concept, but technological advances have led to great optimism about their potential for health research and innovation. However, generating synthetic health data from real patient data has led developers and regulators to question the extent to which they may remain ‘personal data’, governed by data protection law.

Our report, Are synthetic health data ‘personal data’?, was independently commissioned by the MHRA to assess the status of synthetic health data under data protection law. We evaluated the current legal framework (the UK and EU GDPR), regulatory guidance and the latest legal commentary to assess whether, or in what circumstances, synthetic health data might be considered ‘personal data’.

We found that regulators and legal commentators currently approach synthetic data with caution. The default is to presume that synthetic data generated from real patient data are ‘personal data’ unless it can be shown that the risk of identification has been reduced to remote levels. While this safeguards privacy, it may limit research, testing and, ultimately, translation into patient care.

As a consequence, we make three main recommendations for synthetic data developers, researchers, regulators and policymakers:

- Synthetic data developers and users should continue to follow best practice in relation to data protection impact assessments and anonymisation in assessing the identifiability and other data protection risks arising from processing.
- Synthetic data developers, researchers, regulators and policymakers should seek to achieve greater clarity, and reach consensus, on:
  - appropriate standards and approaches to assessing identifiability of specific synthetic data generation methods, utilising quantitative metrics as far as possible;
  - whether the default for regulating certain forms of synthetic data and synthetic data generation should change from presumptively ‘personal data’ to a more proportionate approach that allows for some synthetic data to be classified as non-personal data based on an assessment of risk by data controllers.
- As synthetic data generation and other forms of AI-driven processing for health purposes gain pace, regulators and policymakers should prioritise determining what form of regulation is appropriate for this sector and how it fits within the overall regulatory framework.

The full report is available to download here

Please note that this report is intended to provide general information and understanding of the legal framework. It should not be considered legal advice, nor used as a substitute for seeking qualified legal advice.

Are synthetic health data ‘personal data’? (PDF 5MB)