Managing data effectively is among the biggest concerns for CDOs, other C-suite leaders, compliance officers and data scientists in the financial services and healthcare industries. Data access, data quality, preparation, bias and compliance challenges affect every business that collaborates with other organizations to improve its data sets and, in turn, the accuracy of its artificial intelligence, machine learning, analytics and related projects.
Privacy Enhancing Technologies (PETs) are a cohort of solutions designed to facilitate collaboration on sensitive and protected data while remaining in compliance with the growing number of data privacy and residency regulations. However, a number of myths and misconceptions surrounding PETs cause decision makers to hesitate to deploy these technologies and benefit from their potential. Here we debunk some of those myths and clarify those misconceptions.
Privacy Enhancing Computation Requires High Compute Overhead
Homomorphic encryption was an early privacy enhancing technology that continues to have promise, but also raises concerns. It has gained a reputation for being slow, and it requires more than 42X the compute power and 20X the memory of alternative solutions. However, not all PETs require excessive memory, bandwidth or compute time. TripleBlind’s privacy preserving operations protect sensitive data and enforce privacy regulations with little impact on compute performance.
Here are four examples where TripleBlind can help organizations reduce concerns about performance. TripleBlind incorporates several techniques to guarantee privacy; four of them, and the speed at which they perform, are noted below.
A Blind Query allows a remote party to request the execution of predefined SQL-like queries. This approach enforces privacy by restricting the output to only what is approved by the data owner. In one test, a Blind Query was requested by one organization to produce summary statistics for a one million record database owned by a second organization. The total execution time was a mere 1.2 seconds.
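The idea of restricting a remote party to owner-approved queries can be sketched in a few lines. This is a minimal illustration only, not TripleBlind's implementation or API: the allow-list `APPROVED_QUERIES`, the `patients` table and the function names are all assumptions made up for the example, and a local SQLite database stands in for the data owner's system.

```python
import sqlite3

# Hypothetical allow-list: the data owner approves these aggregate queries in advance.
APPROVED_QUERIES = {
    "record_count": "SELECT COUNT(*) FROM patients",
    "mean_age": "SELECT AVG(age) FROM patients",
}


def build_demo_db():
    """Create a tiny in-memory table standing in for the data owner's database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.executemany(
        "INSERT INTO patients (age) VALUES (?)", [(34,), (51,), (47,), (28,)]
    )
    conn.commit()
    return conn


def run_approved_query(conn, query_name):
    """Run a predefined query by name; anything not on the allow-list is refused."""
    if query_name not in APPROVED_QUERIES:
        raise PermissionError(f"query {query_name!r} is not approved by the data owner")
    # Only the single aggregate value leaves the owner's side, never the rows.
    return conn.execute(APPROVED_QUERIES[query_name]).fetchone()[0]


demo_conn = build_demo_db()
print(run_approved_query(demo_conn, "record_count"))
print(run_approved_query(demo_conn, "mean_age"))
```

The requester passes only a query name, so free-form SQL never reaches the database and the output is limited to summary statistics the owner has signed off on.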
A Blind Join facilitates identification of overlapping fields between two or more databases. Privacy is protected by completely obscuring the field being compared using secure multi-party computation (SMPC). A Blind Join was performed between three independent organizations, each with one million records, and a 10% overlap was found. The total execution time was 90 seconds.
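To give a feel for how two parties can find overlapping records without revealing the compared field, here is a toy sketch of a Diffie-Hellman-style private set intersection. This is a different, well-known technique swapped in for illustration; it is not TripleBlind's SMPC protocol. The prime `P`, the function names and the example identifiers are assumptions, and the parameters are far too small for real use.

```python
import hashlib
import secrets

# Toy modulus (a Mersenne prime). Assumption for illustration; NOT a secure size.
P = 2**127 - 1


def hash_to_group(value):
    """Map a string to a nonzero-looking group element via SHA-256, then square it."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return pow(int.from_bytes(digest, "big") % P, 2, P)


def new_secret():
    """Pick a random private exponent in [2, P - 2]."""
    return secrets.randbelow(P - 3) + 2


def blind(values, key):
    """Hash each value and raise it to the party's secret exponent."""
    return [pow(hash_to_group(v), key, P) for v in values]


def reblind(blinded, key):
    """Apply a second party's exponent to already-blinded values, keeping order."""
    return [pow(b, key, P) for b in blinded]


def private_overlap(a_items, b_items):
    """Simulate both parties in one process and return A's items that B also holds."""
    a_key, b_key = new_secret(), new_secret()
    a_blinded = blind(a_items, a_key)            # A sends these to B
    b_blinded = blind(b_items, b_key)            # B sends these to A
    a_double = reblind(a_blinded, b_key)         # B returns these to A, same order
    b_double = set(reblind(b_blinded, a_key))    # A computes these locally
    # (h^a)^b == (h^b)^a mod P, so matching items produce matching values.
    return {item for item, d in zip(a_items, a_double) if d in b_double}


print(private_overlap(
    ["alice@example.com", "bob@example.com", "carol@example.com"],
    ["carol@example.com", "dave@example.com", "alice@example.com"],
))
```

Each side only ever sees the other's values after they have been raised to a secret exponent, so the raw identifiers are never exchanged; only the overlap is learned.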
Machine learning has a reputation for high resource usage and long training cycles. TripleBlind supports the federated learning technique and offers our own patented Blind Learning technique. We compared these two approaches, using private data sets distributed over two to five independent client organizations with and without GPUs, against a single system with direct access to a non-private full data set. The training data included 10,000 x-ray images with a median file size of 0.4 MB. As a result of the workload distribution and parallel operation made possible by distributed learning, machine learning training was up to 5X faster than training on an equivalent single machine.
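For readers unfamiliar with federated learning, the following is a minimal sketch of federated averaging on a one-parameter model (y = w * x). It illustrates the general technique only and says nothing about how Blind Learning works; the function names, learning rate and toy data are assumptions chosen for the example.

```python
def local_update(weight, data, learning_rate=0.05):
    """One gradient-descent step on a client's private (x, y) pairs."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - learning_rate * grad


def federated_round(global_weight, client_datasets):
    """Each client trains locally; the server averages weights by sample count."""
    total = sum(len(d) for d in client_datasets)
    updates = [(local_update(global_weight, d), len(d)) for d in client_datasets]
    return sum(w * n for w, n in updates) / total


def train(client_datasets, rounds=20):
    """Run several rounds, starting from a zero weight."""
    weight = 0.0
    for _ in range(rounds):
        weight = federated_round(weight, client_datasets)
    return weight


# Two clients whose private data follows y = 2x; raw pairs never leave a client.
clients = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]]
print(train(clients))
```

Only model weights travel between the clients and the server, and because each client computes its update independently, the local steps can run in parallel.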
Machine learning models are expensive to create and are subject to multiple types of reverse-engineering attacks when distributed to others for uncontrolled usage. TripleBlind’s SMPC-based Blind Inference protects both the source data and the trained model. Performing inference using an image owned by one organization against a neural network based on the LeNet-5 architecture owned by a second organization took 0.2 seconds, 15% to 2,500% faster than other privacy preserving approaches such as SecureNN, Gazelle or MiniONN.
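SMPC protocols are commonly built on secret sharing, where a value is split into random-looking pieces that reveal nothing on their own. The sketch below shows additive secret sharing and an addition carried out directly on shares. It is a generic building block shown under assumed names and an assumed modulus, not a description of how TripleBlind performs inference.

```python
import secrets

# Assumed modulus for the toy example; all arithmetic is done modulo this prime.
MODULUS = 2**61 - 1


def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Recombine all shares; any proper subset looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(x_shares, y_shares):
    """Each party adds its own two shares locally; no secrets are exchanged."""
    return [(x + y) % MODULUS for x, y in zip(x_shares, y_shares)]


x_shares = share(20, 3)
y_shares = share(22, 3)
print(reconstruct(add_shared(x_shares, y_shares)))
```

Addition needs no communication at all, which is part of why linear operations tend to be cheap in secret-sharing schemes; multiplications and non-linear layers require additional protocol steps that are omitted here.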
Privacy-Enhancing Technologies are Flawed
Several PET solutions provide only partial data protection and/or can create inaccurate results. Secure enclaves isolate code and data from the operating system, using either hardware-based isolation or isolation of an entire virtual machine. As a result, this approach is hardware dependent and silos data. Tokenization substitutes original sensitive data with non-sensitive placeholders referred to as tokens. Masking obscures, anonymizes or suppresses data by replacing sensitive data with random characters or other non-sensitive data. Hashing is the process of transforming any given key or string of characters into another value. Tokenization, masking and hashing all reduce the accuracy of analytics.
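The difference between tokenization, masking and hashing is easiest to see side by side. The sketch below is a simplified illustration; the `TokenVault` class, the `tok_` prefix and the helper names are assumptions invented for the example rather than any product's API.

```python
import hashlib
import secrets


class TokenVault:
    """Tokenization: swap a value for a random token and keep the mapping private."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}

    def tokenize(self, value):
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def detokenize(self, token):
        # Only the vault holder can reverse a token.
        return self._token_to_value[token]


def mask(value, visible=4, fill="*"):
    """Masking: hide all but the last `visible` characters; this is irreversible."""
    keep = value[-visible:] if visible > 0 else ""
    return fill * (len(value) - len(keep)) + keep


def hash_value(value, salt=b""):
    """Hashing: a one-way SHA-256 digest; equal inputs give equal outputs."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


vault = TokenVault()
ssn = "123-45-6789"
print(vault.tokenize(ssn))
print(mask(ssn))
print(hash_value(ssn))
```

In each case the protected output no longer carries the original value's structure or magnitude, which is why analytics run on such data can lose accuracy.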
Synthetic data is artificially created rather than being generated by actual events. It is often created with the help of algorithms and is used for a wide range of activities. Since it’s not real data, it can skew analytical outcomes. Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset while withholding information about individuals in the dataset. This approach can introduce errors when analysis is run on the resulting data. Federated learning trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them. It comes with high compute and communications costs, and lower accuracy.
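The error that differential privacy can introduce comes from noise that is deliberately added to query results. A minimal sketch of the standard Laplace mechanism for a counting query follows; the function names, the sample ages and the chosen epsilon are assumptions for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count; a counting query has sensitivity 1, so scale = 1/epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only so the demo is repeatable
ages = [34, 51, 47, 28, 63, 39, 72, 45]
print(private_count(ages, lambda age: age >= 45, epsilon=0.5, rng=rng))
```

A smaller epsilon gives stronger privacy but a larger noise scale, so any single released answer can deviate from the true count even though the noise averages out to zero.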
TripleBlind offers the most complete and scalable solution for privacy enhancing computation. It is a software-only solution delivered via a simple API that allows data users to compute on data as they normally would, without having to “see”, copy or store any data. Our solution allows data owners full Digital Rights Management (DRM) over how their data is used on a granular, per-use level. TripleBlind eliminates the expensive manual data anonymization step required when using other solutions, while enforcing regulatory compliance.
Privacy enhancing computation with TripleBlind demonstrates that PETs don’t have to be complex in order to safeguard your data and reduce privacy risks. In the case of a healthcare provider, it can ensure privacy on both sides – the provider never provides raw data and its partners never provide raw algorithms. All operations are available without the risks of working with raw patient data.
Privacy-Enhancing Technologies Aren’t a Valid Approach to Data Privacy
Since PETs are a relatively new data privacy solution, there is a myth that they are not comprehensive and are insufficient to comply with today’s data privacy and data residency regulations.
While several PETs exhibit performance weaknesses, TripleBlind builds on well understood principles, such as federated learning and multi-party compute. It radically improves the practical use of privacy preserving technology by adding true scalability and faster processing with support for all data and algorithm types. TripleBlind’s novel method of data de-identification via one-way encryption allows all attributes of the data to be used, even at an individual level, while eliminating any possibility of the data user learning anything about the individual.
At TripleBlind, we recognize the barriers organizations and data scientists face when trying to collaborate with data while adhering to regulatory standards such as HIPAA and GDPR. The TripleBlind solution compares favorably in its ability to allow organizations to collaborate around their most sensitive data and algorithms without compromising their privacy.