Privacy Enhancing Computation 101
THE PROBLEM
Global pressure to protect data requires enterprises to shift away from legacy approaches, such as business associate agreements (BAAs) and encrypted data sharing, and move toward privacy-enhancing computation (PEC) techniques.
THE SOLUTION
THE BASICS
An important emerging set of techniques and technologies is known by the umbrella term “Privacy Enhancing Computation,” or by the close synonyms “Privacy Enhancing Technologies” and “Privacy Preserving Technologies.”
While a diverse assortment of technical approaches falls under this umbrella, they all share one key objective: to enable secure and compliant processing, by artificial intelligence and other forms of data analytics, of data sets that contain personally identifiable information. Often this data is stored in multiple locations spread across organizational and national boundaries.
THE IMPERATIVE
The benefits of privacy enhancing computation are applicable to many vertical markets, with the healthcare and financial services sectors leading the overall adoption curve. These sectors have the most immediate and obviously compelling use cases, such as reducing the time and resources necessary to develop new pharmaceuticals or drastically cutting down cases of credit card fraud. Privacy preserving technology may also be applied to other data-intensive verticals, such as education, manufacturing, and infrastructure development. As the data economy booms, so too will potential use cases of PEC.
THE LANDSCAPE OF SOLUTIONS
Several taxonomies of privacy enhancing computation have been published, but the one that we find easiest to digest has seven categories:
Differential Privacy
Differential Privacy was originally developed by cryptography experts and its vocabulary reflects that heritage. This approach attempts to shield the identity of any individual in a data set by describing only the attributes of groups and patterns in that data set. Its mathematical techniques quantify the privacy loss that a given algorithm can impose on any individual. A key step is determining how much “noise” must be artificially injected into the data set, or into the results computed from it, to maintain a given level of privacy. Read more about differential privacy.
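To make the idea of calibrated noise concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. It is an illustration only: the function names, the example ages, and the choice of epsilon are assumptions, not part of any particular product.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical example data: ages of eight individuals.
ages = [34, 51, 29, 62, 45, 38, 70, 23]
noisy_over_40 = private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=random.Random())
```

A smaller epsilon means a larger noise scale and stronger privacy, at the cost of a less accurate answer.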
Federated Learning
Federated learning is a specific technique within the broad field of machine learning that aims to train ML models on distributed and heterogeneous data sets. Originally developed by Google in 2016 to train ML models on data stored on mobile phones, the term federated learning is now used in several different ways by different researchers. This technique involves the exchange of whole models between different data providers, which can allow for training on heterogeneous data sets. However, federated learning can be computationally intensive and can potentially expose the model provider’s valuable IP. TripleBlind has developed several important innovations in this field that improve scalability and practical use. Read more about federated learning.
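As a rough illustration of models, rather than raw data, moving between data providers, the sketch below implements a simplified form of federated averaging for a one-parameter model y = w * x. The two sites, their data, and the hyperparameters are made up, and this is one common variant rather than a description of any vendor's method.

```python
def local_update(w: float, data, lr: float = 0.01, epochs: int = 20) -> float:
    """Run gradient descent on one site's private (x, y) pairs; only w leaves the site."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(site_weights, site_sizes) -> float:
    """Combine the sites' models, weighting each by its number of records."""
    total = sum(site_sizes)
    return sum(w * n for w, n in zip(site_weights, site_sizes)) / total


def train(sites, rounds: int = 10) -> float:
    """Coordinator loop: send out the current model, collect updates, average them."""
    w = 0.0
    for _ in range(rounds):
        updates = [local_update(w, data) for data in sites]
        w = federated_average(updates, [len(data) for data in sites])
    return w


# Hypothetical private data held by two organizations (true relationship: y = 2x).
site_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
site_b = [(4.0, 8.0), (5.0, 10.0)]
global_w = train([site_a, site_b])
```

The coordinator never sees an (x, y) record, but it does see every site's model, which is the exposure of model IP that the paragraph above mentions.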
Homomorphic Encryption
Homomorphic encryption is a specific class of encryption schemes that permits users to run certain operations on data while the data remains in its encrypted state. It is often considered an extension of public key cryptography, which was invented in the 1970s. From there, it took more than three decades for the first usable forms of homomorphic encryption to be published. Homomorphic is a term from advanced algebra that speaks to the structure-preserving relationship between the plaintext and the encrypted data. Since decrypting the result of a computation on ciphertext yields the same output as running that computation on the plaintext, these functions may be thought of as homomorphisms. Read more about homomorphic encryption.
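The structure-preserving property can be seen in a toy version of the Paillier cryptosystem, an additively homomorphic scheme: multiplying two ciphertexts produces a ciphertext of the sum of the plaintexts. The sketch below uses tiny hard-coded primes chosen purely for readability, so it is insecure and meant only to show the algebra; it assumes Python 3.8 or later for modular inverses via `pow`.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Build a toy Paillier key pair from two primes (generator g = n + 1)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m (0 <= m < n) with fresh randomness r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    ell = (pow(c, lam, n * n) - 1) // n  # the "L" function from the Paillier scheme
    return (ell * mu) % n


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiply ciphertexts to add the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)


# Toy parameters: real deployments use primes of 1024 bits or more.
n, private_key = keygen(1009, 1013)
encrypted_total = add_encrypted(n, encrypt(n, 20), encrypt(n, 22))
total = decrypt(n, private_key, encrypted_total)
```

Whoever performs `add_encrypted` needs only the public value `n`; it never sees 20, 22, or their sum.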
Secure Enclaves
A secure enclave (also known as Confidential Compute or Trusted Execution Environment) provides CPU hardware-level isolation and memory encryption on every server. Secure enclaves isolate application code and data from anyone without privileges and encrypt all data in memory. With additional software, secure enclaves enable the encryption of both storage and network data for simple full stack security. Secure enclaves work well for operations taking place on a single server, since data and algorithms must be stored in the same location. However, secure enclaves force data operations into a state of hardware dependence, which limits distributed data operations across multiple organizations, servers, and country borders. Learn more about secure enclaves.
Secure Multi-Party Computation
Secure Multi-Party Computation (SMPC), also known as Multi-Party Computation or Secure Computation, has been actively researched within the field of cryptography since the late 1970s. This approach is applicable when two or more parties each have private data that they do not want to reveal, but all parties want to know the outcome of a calculation without relying on a trusted third party. Classic problems covered in academic literature on SMPC include millionaires wanting to know who is wealthier without revealing their own net worth, and private voting schemes that avoid reliance on a central authority to tabulate votes. The mathematical fundamentals for SMPC have existed for decades, but limitations in processing speed and overall scalability have been critical challenges for implementers of these techniques. Read more about SMPC.
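The private voting example can be sketched with additive secret sharing, one of the simplest SMPC building blocks. The code below simulates all parties inside a single process and assumes every party follows the protocol honestly; the modulus and function names are illustrative choices.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this (assumed) public value


def share(secret: int, n_parties: int):
    """Split `secret` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs) -> int:
    """Compute the sum of all inputs without any party revealing its own input."""
    n = len(private_inputs)
    # Party i splits its input and sends share j to party j.
    distributed = [share(value, n) for value in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Anyone can combine the published partial sums into the final result.
    return sum(partial_sums) % MODULUS


# Hypothetical ballot: five voters, 1 = yes, 0 = no.
votes = [1, 0, 1, 1, 0]
yes_votes = secure_sum(votes)
```

Each individual share is uniformly random, so a party learns nothing about another voter's ballot from the single share it receives; only the tally becomes public.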
Synthetic Data
Synthetic data, in the most general sense, is any data set that has been generated using software rather than collected empirically. In the specific realm of privacy protection, software used to create synthetic data sets aims to model real data without disclosing records that can be tied back to specific individuals. A considerable range of tools and techniques exists to give data users the synthetic inputs they need for data analytics, AI, and research. The pros and cons of synthetic data approaches versus private use of real data have been the subject of considerable research in recent years, highlighting a tension between the needs for data privacy and data accuracy. Read more about synthetic data.
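As a deliberately simple illustration, the sketch below fits a single normal distribution to one numeric column and samples new values from it. The column, its values, and the Gaussian assumption are all hypothetical, and this approach carries no formal privacy guarantee on its own.

```python
import random
import statistics


def fit_gaussian(values):
    """Summarize a real column by its mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(values, n_samples: int, rng: random.Random):
    """Generate synthetic values from the fitted distribution, not from real records."""
    mean, stdev = fit_gaussian(values)
    return [rng.gauss(mean, stdev) for _ in range(n_samples)]


# Hypothetical real measurements (for example, patient ages).
real_ages = [34, 51, 29, 62, 45, 38, 70, 23, 57, 41]
synthetic_ages = sample_synthetic(real_ages, 1000, random.Random(7))
```

The trade-off described above shows up even here: a single Gaussian discards most of the structure in the real data, while a richer model would track the real records more closely and so raise the risk of leaking them.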
Tokenization, Data Masking, and Data Hashing
Tokenization replaces sensitive data, such as phone numbers, payment card numbers, or patient ID numbers, with non-sensitive substitutes, often without changes to the field type or length. This non-sensitive substitute is known as the “token”, which represents the record or individual when multiple parties process data. These tokens might be revealed in a data breach, but the sensitive data represented by the token is not. Tokenization requires a secure centralized system for the generation and management of the tokens and their relationship to the underlying data. Data masking and hashing are related techniques that alter the values of fields containing data that is still private but is not an ID number. Read more about tokenization.
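The three techniques can be contrasted in a short sketch. The `TokenVault` class, the masking rule, and the key below are hypothetical stand-ins; a production vault would be a hardened, access-controlled service rather than an in-memory dictionary.

```python
import hashlib
import hmac
import secrets


class TokenVault:
    """In-memory stand-in for the secure central system that maps tokens to values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        """Replace each digit with a random digit, keeping length and separators."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        if not any(ch.isdigit() for ch in value):
            raise ValueError("this sketch only tokenizes values that contain digits")
        while True:  # retry on the rare collision with the original or another token
            token = "".join(
                secrets.choice("0123456789") if ch.isdigit() else ch for ch in value
            )
            if token != value and token not in self._token_to_value:
                break
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]


def mask(value: str, visible: int = 4) -> str:
    """Data masking: hide all but the last `visible` characters."""
    return "*" * max(len(value) - visible, 0) + value[-visible:]


def keyed_hash(value: str, key: bytes) -> str:
    """Data hashing: a one-way keyed digest (HMAC-SHA256) of the value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


vault = TokenVault()
card = "4111-1111-1111-1111"  # a well-known test card number, not real data
token = vault.tokenize(card)
```

Only the vault can map a token back to the card number, whereas masking and hashing are one-way transformations that no lookup can reverse.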
Privacy enhancing cryptography is an alternate term for a subset of methods under the broader umbrella of privacy enhancing computation. Cryptographic primitives naturally come into play across the seven PEC categories above, so cryptographers often speak from this perspective. Bear in mind that cryptography is not the only aspect of the PEC solutions available from privacy-enhancing computation companies.
TripleBlind is one of the leading privacy-enhancing technology companies. We’ve linked excellent resources for preserving privacy throughout this piece. If you’re interested in learning more about the technology behind TripleBlind’s software-only PEC solution, you can access a complimentary copy of our whitepaper here.
Our subject matter experts would also love to walk you through our technology and discuss potential use cases. Simply schedule a demo with us by filling out the form.