The use of computers and the internet has allowed unprecedented amounts of data to be collected and used for a variety of ends. Big data technology represents the most advanced and sizeable use of this new asset. The size and extent of such operations come up against a number of regulatory barriers, most notably the General Data Protection Regulation (EU) 2016/679 (GDPR).

What is Big Data?

Big data is the harnessing, processing and analysis of digital data in huge and ever-increasing volume, variety and velocity. It has quickly risen up the corporate agenda as organisations appreciate that the techniques rapidly developing in the data world can give them an advantage through valuable insights about their customers and users.

Much big data (for example, climate and weather data) is not personal data. Personal data relates to an identifiable living individual. For data that is or could be personal data, data protection legislation, in particular the GDPR, must be carefully considered.

Brexit

During the transition period (which ends on 31 December 2020 unless extended) and afterwards, organisations should, as the ICO has noted, continue with data protection compliance as usual. The key principles, rights and obligations will remain the same, and organisations already complying with the GDPR should be in a good position to comply with the post-Brexit data protection regime.

Big Data Analytics, Artificial Intelligence and Machine Learning

Being able to use big data is critical to the development of Artificial Intelligence (AI) and machine learning. AI is the ability of a computer to perform tasks commonly associated with human beings. In particular, AI can cope with, and to a large extent is predicated on, the analysis of huge amounts of data in its varying shapes, sizes and forms.

Machine learning is a set of techniques that allows computers to ‘think’ by creating mathematical algorithms based on accumulated data.

Big data, AI and machine learning are linked as described by the ICO:

“In summary, big data can be thought of as an asset that is difficult to exploit. AI can be seen as a key to unlocking the value of big data; and machine learning is one of the technical mechanisms that underpins and facilitates AI. The combination of all three concepts can be called ‘big data analytics’.” (Paragraph 11 of ICO: Big data and data protection 2017.)

Big data analytics differs from traditional data processing in the following ways:

  • It uses complex algorithms for processing data. This usually involves a “discovery” phase to find relevant correlations (which can be a form of machine learning) so that algorithms can be created.
  • There is limited transparency on how these algorithms work and how data is processed. As vast amounts of data are processed through massive networks, a “black box” effect is created that makes it very difficult to understand the reasons for decisions made by the algorithms.
  • There is a tendency to collect “all the data”, as it is now more easily available, rather than limiting the analytics to random or statistically representative samples.
  • Data is often re-used for a purpose different from that for which it was originally collected, frequently because it has been obtained from third parties.
  • It usually involves data from new sources, such as the Internet of Things (IoT), and “observed” data that has been generated automatically (for example, by tracking online behaviour) rather than data provided by individuals. In addition, new “derived” or “inferred” data produced by the algorithms is used further in the analytics.

Big Data and Data protection

Managing compliance with the GDPR will play a large part in big data management projects involving data harvested from the expanding range of available digital sources. Many organisations will already have an established data protection governance structure, together with a policy and compliance framework, and these can serve as pathfinders towards structured data governance.

Controller or processor?

Under Article 4(7) of the GDPR, a person who determines “the purposes and means” of processing personal data is a controller. Under Article 4(8), a processor is a person who processes personal data on behalf of the controller.

Correctly assessing whether an organisation is a controller or a processor in the context of the collection of massive amounts of data is therefore critical to structuring the relationship in a GDPR-compliant way and to allocating risk and responsibility.

However, the borderline between controller and processor can be blurred in practice. Where it lies in the AI context was considered for the first time in the UK in the ICO’s July 2017 decision on an agreement between the Royal Free Hospital and Google DeepMind. Under the agreement, DeepMind used the UK’s standard, publicly available acute kidney injury (AKI) algorithm to process the personal data of 1.6m patients in order to test the clinical safety of Streams, an AKI application that the hospital was developing. The ICO found that the hospital had failed to comply with data protection law and, as part of the remediation the ICO required, the hospital commissioned the law firm Linklaters to audit the system. In May 2018 the hospital published the audit report, which found (at paragraph 20.7) that the agreement had properly characterised DeepMind as a processor, not a controller.

Key factors in this characterisation were that the algorithm was simple and that its use had been mandated by the NHS. Determining whether an organisation is a processor or a controller is a complex issue, and seeking advice on the matter may be crucial to understanding the potential liabilities of those using big data.

Personal data

In the context of big data, it is worth considering whether personal data can be fully anonymised, in which case it falls outside data protection requirements. Recital 26 of the GDPR confirms this, stating that:

“the principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable”.

However, personal data that has been pseudonymised, in other words data that could still identify an individual when combined with additional information, is still classed as personal data.
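To illustrate why pseudonymised data stays within the GDPR, the following is a minimal sketch, assuming a keyed hash (HMAC-SHA256) is used to replace a direct identifier. This is one common pseudonymisation technique, not one prescribed by the GDPR or described in this article; the names `SECRET_KEY`, `pseudonymise` and `reidentify` and the sample email addresses are hypothetical. The point it shows is that whoever holds the key (the “additional information”) can link a token back to a known individual.

```python
import hashlib
import hmac
import secrets
from typing import List, Optional

# Hypothetical secret key: the "additional information" that must be kept
# separately from the pseudonymised dataset.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256) token."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def reidentify(token: str, candidates: List[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """With the key, a token can be matched against a list of known people."""
    for candidate in candidates:
        if hmac.compare_digest(pseudonymise(candidate, key), token):
            return candidate
    return None


# A hypothetical customer record with the email address swapped for a token.
record = {"customer": pseudonymise("alice@example.com"), "spend": 120}

# The token does not reveal the email address on its face...
print(record["customer"] != "alice@example.com")  # True
# ...but anyone holding the key can re-link it to the individual.
print(reidentify(record["customer"], ["bob@example.com", "alice@example.com"]))
```

Because the same key always produces the same token, records about one person can still be linked across datasets, and anyone holding the key can reverse the link. Deleting the key does not by itself make the data anonymous in the Recital 26 sense; that depends on whether the individual remains identifiable by other means.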

Profiling

The GDPR includes a definition of profiling that is relevant to the processing of big data. Profiling is defined as any form of automated processing of personal data used to evaluate certain personal aspects relating to a natural person, in particular to analyse or predict the following: performance at work; economic situation; health; personal preferences; interests; reliability; behaviour; location; movements. (Article 4(4), GDPR.)

The GDPR includes data subject rights in relation to automated decision-making, including profiling. The fact that such decision-making is taking place must be disclosed to the individual, together with meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing (Articles 13(2)(f) and 14(2)(g), GDPR).

Individuals have the right not to be subject to a decision based solely on automated processing (which includes profiling) that produces legal effects concerning them or similarly significantly affects them (Article 22(1), GDPR). However, this right will not apply in certain cases, for example where the individual has given explicit consent, although suitable measures must still be implemented to safeguard the data subject’s rights, freedoms and legitimate interests.

Fair processing

In the ICO Big Data Paper 2017, the ICO emphasises the importance of fairness, transparency and meeting the data subject’s reasonable expectations in data processing. It states that transparency about how the data is used will be an important element when assessing compliance. It also highlights the need to consider the effect of the processing on the individuals concerned as well as on wider communities and societal groups. Similarly, the EDPS 2015 opinion stresses that organisations must be more transparent about how they process data, afford users a higher degree of control over how their data is used, design user-friendly data protection into their products and services, and become more accountable for what they do.

Transparency

As well as the general requirement for transparency in Article 5(1)(a), the GDPR imposes specific obligations on controllers to provide data subjects with certain prescribed information, typically in the form of a privacy notice (Articles 13 and 14, GDPR).

The ICO Big Data Paper 2017 notes that the complexity and opacity of data analytics can lead to mistrust and can potentially be a barrier to data sharing, particularly in the public sector. In the private sector, a lack of consumer trust can reduce competitiveness. Privacy notices are therefore a key tool for providing transparency in the big data context. In relation to privacy notices, the Paper suggests using innovative approaches such as videos, cartoons, icons and just-in-time notifications, as well as a combination of approaches, to make complex information easier to understand.

An introduction

This blog is no more than an introduction to, and summary of, some of the legal issues raised by big data. In many ways the GDPR was created in response to such activity, so the extent of its applicability to the topic is unsurprising. Any organisation looking to undertake such a project should understand the relevant regulations well enough to build compliance into its systems and processes.

If you have any questions on data protection law or on any of the issues raised in this article please get in touch with one of our data protection lawyers.