De-identification of Data and Privacy
As discussed in a recent blog post on CyberWatch Australia, researchers from the University of Melbourne successfully re-identified the medical data of Australian patients that formed part of a de-identified open dataset. This raises a myriad of questions about privacy, the need for access to big data, and how organisations can protect the information with which they are entrusted.
DE-IDENTIFYING DATA DOES NOT GUARANTEE ANONYMITY
In August 2016, the Federal Department of Health published online the de-identified longitudinal medical billing records of 10% of Australians, about 2.9 million people. For each selected patient, all publicly-reimbursed medical and pharmaceutical bills for the years 1984 to 2014 were included.
In September 2016, researchers from the University of Melbourne were able to re-identify some of the records of these 2.9 million Australians. A report published by the researchers in December 2017 outlines the techniques used to re-identify the data and the ease with which this can be done: "it is straightforward for anyone with technical skills about the level of an undergraduate computing degree".
There are two ways to re-identify individuals in a de-identified dataset: decryption and linking. Decryption involves using algorithms to reverse the encoding that was applied to personal information. The more surprising result was the effectiveness of linking, that is, matching the unencrypted parts of a record against information already known about an individual. That known information could come from sources available online, but imagine the possibility of health insurers or holders of pharmacy records using their own data as a starting point to re-identify the detailed health records of patients.
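As a rough illustration of linking, the sketch below filters a handful of made-up records against a few facts an attacker might already know. The field names, tokens and values are hypothetical and are not drawn from the published dataset or from the researchers' report; the point is only that a few unencrypted fields can be enough to single out one record.

```python
# Toy, made-up data: "de-identified" records with the name removed and the
# patient ID replaced by an opaque token, but other fields left in the clear.
deidentified_records = [
    {"token": "x1", "birth_year": 1980, "sex": "F", "event_date": "2010-05-02"},
    {"token": "x2", "birth_year": 1980, "sex": "F", "event_date": "2011-09-17"},
    {"token": "x3", "birth_year": 1980, "sex": "M", "event_date": "2010-05-02"},
    {"token": "x4", "birth_year": 1975, "sex": "F", "event_date": "2010-05-02"},
]


def link(records, known_facts):
    """Return every record consistent with all of the known facts."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in known_facts.items())
    ]


# Facts an attacker might already hold about one person (hypothetical).
known_facts = {"birth_year": 1980, "sex": "F", "event_date": "2011-09-17"}

matches = link(deidentified_records, known_facts)
if len(matches) == 1:
    # A single match means the whole record now belongs to a known person.
    print("Unique match, record re-identified:", matches[0]["token"])
else:
    print(len(matches), "candidate records remain")
```

With only the birth year and sex, two candidate records remain; adding a single known event date narrows the result to one.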
TWO INCOMPATIBLE GOALS? PRIVACY VS ACCESS TO DATA
Access to high-quality, and at times sensitive, data has become part of many businesses' market research strategy, but this commercial motivation does not lessen the obligation to protect individuals' personal information. The report recognises that "taking advantage of the benefits of big data without seriously compromising privacy is one of the most difficult engineering challenges of our time."
For organisations, big data is a double-edged sword. It offers significant benefits for marketing, enhancing customer experience, optimising business processes and creating other efficiencies within a business. However, a common mistake is to assume that de-identified data cannot be re-identified, and that assumption leads to inadequate controls.
WHAT YOU CAN DO TO PROTECT YOURSELF AND YOUR DATA
Masking data sufficiently to truly de-identify it will often mean destroying the value of the data. The re-identification of sensitive health information by the University of Melbourne researchers shows that simple de-identification should arguably not be relied on as the sole means of privacy protection. Organisations should assume that de-identified data is still personal information under privacy law, which it is if it can be re-identified. The richer and thus more useful the data, the more likely it is that it can be re-identified.
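One simple way to see why richer data is easier to re-identify is to count how many records are unique on a given combination of fields. The sketch below is a minimal example on invented records; the function name `unique_fraction` and the fields used are assumptions for illustration only, and a real assessment would need far more than this check.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Fraction of records whose combination of `fields` values is unique."""
    combos = Counter(tuple(r[f] for f in fields) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[f] for f in fields)] == 1)
    return unique / len(records)


# Made-up records; no real individuals or real dataset fields are implied.
records = [
    {"birth_year": 1980, "sex": "F", "postcode": "3000"},
    {"birth_year": 1980, "sex": "F", "postcode": "3052"},
    {"birth_year": 1980, "sex": "M", "postcode": "3000"},
    {"birth_year": 1975, "sex": "F", "postcode": "3000"},
]

# Each extra field released makes more records stand out on their own.
for fields in (["sex"], ["birth_year", "sex"], ["birth_year", "sex", "postcode"]):
    print(fields, unique_fraction(records, fields))
```

In this toy set, a quarter of the records are unique on sex alone, half are unique once birth year is added, and every record is unique once postcode is included.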
Because technology advances so rapidly, there is a significant risk that security protections will be outpaced by methods of re-identifying big data. It is a "brave" assumption to treat such data as not re-identifiable. Organisations therefore need to consider whether their agreements contain adequate contractual protections addressing the use and handling of such data. These should cover the purposes for which the parties (including any third party) may use the personal information (e.g. internal use only, with no right to sell to other third parties), storage and security provisions, and adequate liability protections against payment of a penalty under the Privacy Act or claims by third parties.