As consideration of data privacy grows, and with the new General Data Protection Regulation (GDPR) in force across the European Union, it is important to ensure you are using data appropriately. Today, data permissioning and management are critical aspects of any AI-driven company. Below, we describe regulatory and ethical considerations to help you safely manage the data you use to build your models. Seek legal advice to ensure compliance with any applicable legislation; the information below is introductory in nature and will not reflect your company’s individual circumstances.
The GDPR came into force across the European Union on 25th May 2018. It applies to all companies processing the personal data of people in the EU, regardless of a company’s location. Among other considerations, it standardises data handling requirements and penalties for data misuse. Article 83 of the GDPR specifies fines of up to 4% of a company’s global revenue or €20m – whichever is greater – for non-compliance.
Individuals, organisations and companies which, according to the GDPR, are either “controllers” or “processors” of personal information are accountable for their handling of individuals’ personal information. Companies must “implement measures which meet the principles of data protection by design and by default”. Transparency and fairness are also key concepts within the GDPR. You must be clear and honest regarding how you will use individuals’ personal data – and must not use personal data in a way that is unduly detrimental, unexpected or misleading for the individuals concerned.
Demonstrate compliance with the GDPR principles of protection, fairness and transparency in multiple ways.
The GDPR has expanded the definition of personal data, which broadly refers to information relating to an identified or identifiable person, to include information “specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that person”. This includes job titles, employers and social media handles – even if the individual has made these public on selected websites. While some information may directly identify an individual, other information may do so indirectly if combined with other information. In both circumstances, the data is deemed personal information. Further, if you are inferring personal information – such as gender, age or salary – using your system, you must treat the information as if it were gathered from the user directly.
“Demonstrate compliance with GDPR principles of protection, fairness and transparency in multiple ways.”
Certain personal data – in categories including racial origin, religious beliefs, genetic data and data concerning a person’s sexual orientation – may be considered “sensitive data” and require special care. Protect all personal data through anonymisation, or through pseudonymisation techniques such as encryption and tokenisation.
“Consider the security of data not just when it is stored but when it enters, moves within and leaves your environment.”
Even with security best practices in place, holding personal data remains a risk. Minimise the personal data you require. If you can fully anonymise your data and avoid the need to store any personal information, do so. If you must store personal data, consider the security of data not just when it is stored but when it enters, moves within and leaves your environment. Examine every point where personal data could be read by an employee or malicious third party and ensure you have pursued every measure within your control to protect it. Delete data when it has been processed according to its agreed purpose.
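As a concrete illustration – a minimal sketch, not a compliance recipe – the Python snippet below pseudonymises a direct identifier with a keyed hash, a simple form of tokenisation. The field names and key-handling approach are hypothetical; note also that pseudonymised data still counts as personal data under the GDPR, since it can be re-identified with the key.

```python
import hashlib
import hmac
import os

# Hypothetical example: field names and key handling are illustrative only.
# In production, keep the key in a dedicated secrets manager and rotate it.
KEY = os.environ["PSEUDONYMISATION_KEY"].encode()

def tokenise(value: str) -> str:
    """Replace a direct identifier with a keyed, deterministic token.

    A keyed hash (HMAC), unlike a plain hash, cannot be reversed by a
    dictionary attack without access to the key.
    """
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()

record = {"email": "jane@example.com", "basket_value": 42.50}

# Keep the token for joining records; drop the raw identifier entirely.
safe_record = {
    "user_token": tokenise(record["email"]),
    "basket_value": record["basket_value"],
}
```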
In addition to standardising data handling requirements and penalties for misuse, the GDPR introduced considerations that can impact AI solutions. Chief among these is Article 22, which gives individuals the right not to be subject to a decision based solely on automated processing that produces “legal effects” concerning them, or “similarly significant” effects, unless an exception applies – such as the individual’s explicit consent, or the processing being necessary for the performance of a contract.
These provisions are yet to be comprehensively tested in court. However, they explicitly prohibit decisions with legal effects – such as sentencing and parole decisions – that result solely from automated processing undertaken without the individual’s explicit consent, when consent is required.
What constitutes “similarly significant” effects or “explicit consent” (beyond acceptance of an extensive set of online conditions containing a relevant paragraph), and whether something is “necessary” to perform a contract, are all subject to interpretation at this time.
If you are developing an AI system that could materially impact an individual’s life, therefore, it is prudent to consider making your system advisory only and including a human check. Once case law has better established the meanings of the terms above, there will be greater clarity regarding the implications of the legislation.
Article 22 (Paragraph 3) of the GDPR, which requires companies to safeguard the rights of the individuals whose data they process and allows individuals to contest an automated decision they believe treats them unfairly, demands a robust explanatory framework for the outputs of your systems. Convention 108 of the Council of Europe (https://bit.ly/2n6POrT), adopted into UK and EU law in May 2018, imposes related requirements:
Convention 108 affords individuals the right to understand how decisions that affect them, made using data processing, have been reached. Because every individual possesses this right, you must be able to explain, in lay terms, how decisions that affect individuals are made.
Beyond the stipulations of Convention 108 of the Council of Europe, there is a broader, growing demand for greater explainability of AI systems. Improved explainability was a recommendation, for example, from the UK Parliamentary Science and Technology Select Committee on AI. Regulatory or pragmatic demands may force you to consider the explainability of your systems.
For a system that uses a decision tree, it is straightforward to explain how data maps to the system’s decision. For many other machine learning-based systems, and particularly deep learning-based systems, this will not be possible. There may be thousands of abstract numbers corresponding to the connections in the network that contribute to its output. These numbers will be meaningless to individuals who seek to understand the system, which is why many AI systems are considered to be ‘black box’ and inexplicable.
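To illustrate the tree-based case, the short Python sketch below (using scikit-learn and a toy dataset – illustrative choices, not anything prescribed here) prints a fitted decision tree as human-readable rules. Deep networks offer no equivalent direct readout.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# A small tree trained on a toy dataset, purely for illustration.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(data.data, data.target)

# export_text renders the tree as nested if/then rules, so the mapping
# from input data to decision can be read and explained directly.
print(export_text(tree, feature_names=list(data.feature_names)))
```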
There are, however, means of explanation that do not involve a system’s mathematics. These approaches consider the variables inputted to a system and their influence on its output. There are several techniques you can apply, including Inferred Explanation, Feature Extrapolation and Key Variable Analysis (Fig. 41). Which you favour will depend on the level of explainability you require and the challenge of providing it.
| Approach | Difficulty | Speed | Advantages | Disadvantages |
|---|---|---|---|---|
| Inferred Explanation | Low | Fast | Easy to understand | Limited explanatory power |
| Feature Extrapolation | Moderate | Slow | Easy to understand | Limited applicability |
| Key Variable Analysis | Very high | Very slow | Thorough | Challenging to understand |
Source: MMC Ventures
1. Inferred Explanation: Inferred Explanation is the easiest way to explain AI. The algorithm itself is not described; a ‘black box’ is retained around it. Instead, correlations between system inputs and outputs are examined, without explaining the steps between.
Demonstrating example decisions allows individuals to see the correlations between input data and output decisions (Fig. 42), without detail regarding how the inputs and outputs are connected. Inferred explanation does not provide complete clarity regarding a model, but will demonstrate how decisions relate to inputs in a manner that will be satisfactory in many situations.
Source: MMC Ventures
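A minimal sketch of this approach, assuming only query access to a model (the model below is a stand-in, not any particular system), is to sample inputs, record outputs and report the correlation of each input with the decision:

```python
import numpy as np

# Stand-in for a real system: any callable we can query but not inspect.
def black_box(X: np.ndarray) -> np.ndarray:
    return 1 / (1 + np.exp(-(2.0 * X[:, 0] - 0.5 * X[:, 1])))

rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1_000, 3))   # sampled example inputs
y = black_box(X)                  # observed output decisions

# Correlate each input with the output, without opening the model.
for i in range(X.shape[1]):
    r = np.corrcoef(X[:, i], y)[0, 1]
    print(f"feature {i}: correlation with output = {r:+.2f}")
```

Here the third feature, which the stand-in model ignores, shows near-zero correlation – the kind of input-output relationship an inferred explanation surfaces without describing the algorithm itself.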
“There are means of explanation that do not involve a system’s mathematics.”
2. Feature Extrapolation: Some systems, including financial models and systems that materially impact individuals, require an explanation – beyond correlation – of how models reach their conclusions. While it requires more effort, it is possible to evaluate the features in data that activate parts of a network. This is particularly fruitful for image classification systems. Using test data, and reversing the flow of data through a network, you can create images that demonstrate the features that activate a particular layer in the network (Fig. 43). Further, Google recently released a library for TensorFlow that undertakes this visualisation automatically, within a browser, during training (bitly.com/2R6XeZu). While not suitable for all AI systems, feature extrapolation provides a degree of explainability in a manner that non-technical individuals can appreciate.
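As a rough sketch of the idea – using PyTorch rather than the TensorFlow library mentioned above, and simplifying heavily relative to Zeiler and Fergus’s method – one can optimise an input image to strongly activate a chosen layer, then inspect the resulting pattern:

```python
import torch
from torchvision import models

# Load a pretrained image classifier and freeze its weights.
model = models.vgg16(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

# Capture the activations of one mid-network convolutional layer.
captured = {}
model.features[10].register_forward_hook(
    lambda module, inputs, output: captured.update(act=output)
)

# Optimise a random image so one channel of that layer fires strongly.
image = torch.randn(1, 3, 224, 224, requires_grad=True)
optimiser = torch.optim.Adam([image], lr=0.05)
for _ in range(100):
    optimiser.zero_grad()
    model(image)
    loss = -captured["act"][0, 0].mean()  # maximise channel 0's response
    loss.backward()
    optimiser.step()

# `image` now approximates the visual feature this channel responds to.
```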
“Some systems require an explanation – beyond correlation – of how models reach their conclusions.”
Source: Zeiler and Fergus, https://bit.ly/2JjF4R0
“It is possible to evaluate features in data that are activating parts of a network.”
3. Key Variable Analysis: If you wish to provide the most precise explanation of your system, you must analyse the impact of each input on the system’s decision-making process and overall decision. This requires a full statistical analysis and is a significant undertaking. For each output decision, you will require an example of input data that strongly produces that decision. Change each variable, in turn, from the value that leads to Decision 1 to the value that leads to Decision 2. Then, change the variables in combination. The effort required increases exponentially with the number of variables, and the process will be time-consuming. However, you will be able to determine whether any system inputs, singly or in combination, have a disproportionate effect on your system’s output decision (“variable bias”). You may find, for example, that your model places high importance on gender when providing an output decision. This is possible, even if gender is not explicit in your data, if you have other closely associated variables (such as vocation in a gender-biased industry).
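A minimal sketch of the procedure, with a stand-in model and illustrative Decision 1 / Decision 2 examples (none of these names come from the source), might look like this:

```python
import itertools
import numpy as np

# Stand-in model: any callable returning a decision score.
def model(x: np.ndarray) -> float:
    return float(1 / (1 + np.exp(-x.sum())))

baseline = np.array([-1.0, -1.0, -1.0])   # strongly yields Decision 1
contrast = np.array([1.0, 1.0, 1.0])      # strongly yields Decision 2

def effect(indices) -> float:
    """Swap the chosen variables from their Decision 1 values to their
    Decision 2 values and measure the shift in the model's output."""
    x = baseline.copy()
    x[list(indices)] = contrast[list(indices)]
    return model(x) - model(baseline)

# Each variable in turn, then in combination; the number of combinations
# grows exponentially with the number of variables.
for k in (1, 2):
    for combo in itertools.combinations(range(len(baseline)), k):
        print(combo, f"{effect(combo):+.3f}")
```

A variable (or combination) with an outsized effect relative to the others is a candidate source of the “variable bias” described above.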
Key variable analysis has drawbacks as well as advantages. In addition to being complex and resource-intensive, its results can be difficult to explain accessibly. Further, if you explain your model in great detail, malicious third parties can use this information to force from your model the results they wish to see.
“If you wish to provide the most precise explanation of your system, you must analyse the impact of each input on the system’s decision-making process and overall decision.”
| Approach | Use this approach if you: | Avoid this approach if you: |
|---|---|---|
| Inferred Explanation | Seek a high-level overview of your AI system; believe correlation offers sufficient explainability | Require detail regarding how variables lead to decisions |
| Feature Extrapolation | Require detail from within the network; have a network type (e.g. images) where abstractions can be mapped onto input data | Have limited time; require the precise impact of input variables, not general features; are not using an assignment-based or generative AI network |
| Key Variable Analysis | Require detail about the importance of variables; seek to prevent unwanted bias in your variables | Have limited time; seek to publish your results; wish to offer a layperson’s guide to your model |
Source: MMC Ventures
When developing and deploying AI systems, it is important to use data ethically as well as to provide sufficient explainability. “Plan for ethical AI from the outset and underpinning all initiatives. It’s got to be foundational, not an afterthought” (Steven Roberts, Barclays). Beyond the intrinsic importance of ethical data use, a growing number of companies are incurring reputational and financial costs by failing to achieve it.
Durham Constabulary, in conjunction with computer science academics, is trialling a framework – ALGOCARE – to ensure its AI system uses data transparently within an explainable, ethical process. Many companies with AI systems also have frameworks in place, albeit private and often loosely defined ones. While every company’s framework differs, ALGOCARE highlights issues you should consider when managing data.
“Plan for ethical AI from the outset and underpinning all initiatives. It’s got to be foundational, not an afterthought.”
Steven Roberts, Barclays
“EU and UK Parliamentary committees are engaged on the issues of AI and explainability.”
EU and UK Parliamentary committees, including the UK Science and Technology Select Committee and the House of Lords Select Committee on Artificial Intelligence, are engaged on issues of AI, explainability and data privacy. The UK Science and Technology Select Committee, for example, launched an inquiry into the use of algorithms in public and business decision-making. Further, more specific legislation is probable. Ensure that a senior member of your team (Chief Science Officer, Head of AI or Head of Data) is responsible for staying up-to-date regarding proposed legislation and the impact it could have on your business.
“Ensure that a senior member of your team is responsible for staying up-to-date regarding proposed legislation.”