AI Risk Guidance

Guidance for Developing Private AI Models in Sensitive Healthcare Data

This guidance explains the risks and concerns associated with developing AI models on private healthcare data and what mitigations (privacy-preserving techniques) can be implemented to deal with these.

Risks to Privacy in AI Models

Data memorisation, also known as overfitting, is a common challenge in training AI models which occurs when the model becomes too focused on the specific details of the training data, rather than learning the underlying patterns. This can typically occur when there are too many features and/or too few participants in the training data. Inversion and inference attacks can exploit this vulnerability to potentially reveal specific participant data.

In an attribute inference attack, an adversary may have partial knowledge of an individual and access to a model trained on records including that individual. From this, they can infer the unknown values of features in those records. The adversary uses the accessible model to make predictions for the instances they have partial knowledge of and by analysing the model's output, they attempt to deduce or infer the unknown values for those instances. This is usually an iterative process, refining their understanding of the model's behaviour and adjusting their queries to the model. Higher confidence in predictions will boost the attacker's confidence in the accuracy of their inferred values.

Membership inference attacks enable an attacker to determine whether specific individuals' data was included in the training of AI models. This poses a significant threat, as discerning the inclusion of individual data can lead to unintended exposure of sensitive medical histories. By exploiting the models outputs, usually by observing a confidence score, attackers can infer whether specific individuals' data was included in the training set, therefore exposing whether an individual has a certain disease for example in something like a treatment response model.

TREs typically anonymise data by stripping out personal identifiable information (PII) such as names, addresses, and dates of birth. Within the TRE, the security measure (such as virtual desktops and disclosure control methods) ensure the data is functionally anonymised, in that it cannot be combined with other data. If the same datasets were publically available outside of the TRE environment, then that data could potentially be identifiable due to privacy attacks and linkage to other data outside of the TRE

Privacy-Preserving Techniques

There are various measures available to safeguard privacy in AI models, either during the training phase or by imposing constraints during deployment. These can protect against a range of different attacks but should be carefully chosen depending on how that model is going to be used or shared. For instance, when sharing or releasing an AI model, there exists a vulnerability to white-box privacy attacks, where the attacker possesses full access to the model, enabling direct inspection and a wider range of attacks to be performed. In such cases, it's imperative to employ privacy-preserving techniques that safeguard the training data to counter these threats. Conversely, in scenarios where the model is inaccessible but can be queried, it becomes susceptible to black-box attacks. Here, it could be more beneficial to enforce access/query limitations on the model or to employ privacy-preserving techniques during inference.

Differential Privacy

Differential privacy works by adding noise either to the data, or the response of the model, to ensure that an adversary can’t determine with confidence that information about an individual is present in the data. This level of noise is determined by epsilon, also known as the privacy budget, which controls the privacy guarantee of the data. However, differential privacy involves a trade-off between privacy and utility due to the effect of adding noise. Because of this addition of noise, this can reduce the accuracy of an AI model, so researchers have to carefully consider this trade-off and the level of noise suitable.

Synthetic Data

Synthetic data aims to generate artificially created data which replicates the statistical properties and patterns of the real data. This is usually done through training a generative model on some real data to learn the characteristics and structure of that data to be able to create new samples from it. Analysis of this type of data should produce similar results compared to using the original data but this depends on the level of synthetic data generated, and like differential privacy, there is a trade-off between privacy and utility depending on the fidelity of the synthetic data. The more the synthetic data mimics real data, then the more likely it is to reveal individuals’ data.

Homomorphic Encryption

Homomorphic encryption provides high protection while retaining utility as it enables computations to be performed on encrypted data without the need of having to decrypt it. Although this is the most ideal solution, this method is currently very limited in its abilities in AI and can be challenging to implement. HE is more typically used at the inference stage of AI models to protect the query data, and not the training data.

Releasing AI Models Safely

If an AI model is ready to deploy, one option could be to host that model with restricted access and queries. This would mean that the AI model would stay within the TRE, and could only be queried through a web interface or an API. By imposing access and query controls, it means that the model can only be used by approved users, and attacks are prevented because of the query restrictions.

If an adversary did somehow manage to be able to query the model. Then they would only be able to run black-box attacks as they wouldn’t have direct access to the model. This makes attacks more difficult to perform as the adversary only has the outputs from the model to attack, therefore limiting their capabilities.

Federated learning (FL) allows multiple parties to train AI models on data from multiple sources, without having to share their local data. In centralised FL, gradients from each of the local AI models are sent to a server, where they are aggregated into a single global model, which is then sent back to the local sources to further develop. This process is repeated iteratively until the global model is refined and improved. In decentralised FL, instead of having one singular coordination server, the gradients are sent out to each local source where they all update the global model directly. Each version has its advantages and disadvantages. In centralised FL for example, you only need to trust the one server, whereas in decentralised FL you need to trust all parties involved. However, in decentralised FL there is no single point of failure. But, no matter what version is chosen, the AI model gradients and updates are still being shared between servers, and therefore is still vulnerable to privacy attacks unless additional privacy-preserving techniques are implemented as well.