AMIA 2023 – System Demonstration
Blind Learning and Privophy Demo
System Overview
Some Preliminary Results
Blind Learning
In addition to the results presented in the paper, we report the following results to help the reader compare the performance of Blind Learning with other methods on well-known baselines:
Dataset:
CIFAR-10
Model:
ResNet-18
Training Criteria:
Accuracy per training round over 100 global epochs. The baseline is a ResNet-18 model trained for 100 epochs at a centralized location on the complete dataset (50,000 images); its accuracy is 80%. The figures also show models trained individually on each of three clients, each holding 16,600 non-overlapping images, under both IID and non-IID data distributions (a sketch of one way to construct such client partitions follows the figures below).
Results:
Overall, both BL (Blind Learning) and FL (Federated Learning) provide acceptable utility compared to centralized training. However, FL requires more training rounds and incurs higher communication and computation costs than BL. The following figures illustrate accuracy per round for BL, FL, the centrally trained model, and the individual per-client models.
Figure 1: Test accuracy per training round for the non-IID data distribution over 3 clients. FL: Federated Learning, BL: Blind Learning, E: local epochs.
Figure 2: Test accuracy per training round for the IID data distribution over 3 clients. Note that BL reaches its highest accuracy ~10 epochs earlier than FL.
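For readers who want to set up a comparable experiment, the following is a minimal sketch of the data side of the setup described above: the centralized CIFAR-10 baseline with ResNet-18 and three-client IID and non-IID partitions. It uses plain PyTorch/torchvision and is not TripleBlind's implementation; the random seed and the label-sorted sharding used for the non-IID split are assumptions for illustration, since the exact partitioning scheme is not specified here.

import numpy as np
import torch
from torchvision import datasets, transforms
from torchvision.models import resnet18

# Minimal sketch of the experimental setup (assumptions noted above):
# CIFAR-10 (50,000 training images), a centralized ResNet-18 baseline, and
# three clients holding non-overlapping shards under IID or non-IID splits.

NUM_CLIENTS = 3
transform = transforms.Compose([transforms.ToTensor()])
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transform)
labels = np.array(train_set.targets)
rng = np.random.default_rng(seed=0)  # assumed seed, for reproducibility only

def iid_partition(labels, num_clients, rng):
    """Shuffle all indices and split them evenly across clients."""
    idx = rng.permutation(len(labels))
    return np.array_split(idx, num_clients)

def non_iid_partition(labels, num_clients, rng, shards_per_client=2):
    """Label-sorted sharding (an assumed scheme, not necessarily the paper's):
    sort indices by class, cut them into shards, and give each client a few
    shards so every client sees only a subset of the classes."""
    idx = np.argsort(labels)
    shards = np.array_split(idx, num_clients * shards_per_client)
    order = rng.permutation(len(shards))
    return [np.concatenate([shards[s] for s in
                            order[c * shards_per_client:(c + 1) * shards_per_client]])
            for c in range(num_clients)]

# Centralized baseline: ResNet-18 trained on the full 50,000-image training set.
baseline_model = resnet18(num_classes=10)
central_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

# Per-client loaders for the IID and non-IID settings.
for name, split in [("IID", iid_partition(labels, NUM_CLIENTS, rng)),
                    ("non-IID", non_iid_partition(labels, NUM_CLIENTS, rng))]:
    client_loaders = [torch.utils.data.DataLoader(
                          torch.utils.data.Subset(train_set, part.tolist()),
                          batch_size=128, shuffle=True)
                      for part in split]
    print(name, [len(part) for part in split])  # sizes of the three client shards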
Interactive Demo Plan
Introduction
We’ll discuss the motivation behind TripleBlind and what it means to “unlock private data sharing.”
Explanation
We’ll cover the underlying methodology of our innovations, including Blind Learning.
Use Case
We’ll demo the use of TripleBlind with a live example where we train an image classifier using two remote decentralized datasets.
Hands On
We’ll invite the audience to play with our solution.
Audience Interaction Plan
Members of the audience will be invited to interact with our system. Each participant will play the role of either a data scientist or a data owner.
Data Scientist
The data scientist's task will focus on training a deep learning model using remote, decentralized datasets.
Data Owner
The data owner's task will focus on running encrypted inference on local data using a remote model.
Accessible through live Jupyter Notebooks
Each notebook will cover one of the five scenarios listed below. Note that each dataset is divided into two sets, and each set is placed on a separate Google Cloud instance to simulate training and inference on decentralized data owned by different organizations.
Training a deep learning model (VGG-16) for image classification using CIFAR-10 (see the sketch after this list)
Training a deep learning model for tabular data classification
Training a deep learning model for multi-modal (text and images) classification
Running a secure inference task using a remote, pre-trained model
We have also added another notebook to illustrate our Private Set Intersection protocol
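As a rough illustration of the first scenario, the sketch below shows the kind of assets involved: the CIFAR-10 training set split into two halves to stand in for the two data owners, and a VGG-16 classifier adapted to 10 classes. It uses plain PyTorch/torchvision only; the actual notebooks train through the TripleBlind SDK across the two Google Cloud instances, and that API is not shown here. The even 50/50 split, the random seed, and the resize to 64x64 are assumptions for illustration.

import numpy as np
import torch
from torchvision import datasets, transforms, models

# Sketch of the assets behind scenario 1 (illustrative assumptions only):
# two simulated data owners, each holding half of CIFAR-10, and a VGG-16
# classifier with a 10-class output. The real demo trains through the
# TripleBlind SDK on two separate cloud instances; that API is not shown.

transform = transforms.Compose([
    transforms.Resize(64),   # assumed resize; VGG-16 is built for inputs larger than 32x32
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transform)

# Simulate the two data owners by splitting the training set into two halves.
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(train_set))
owner_a = torch.utils.data.Subset(train_set, idx[: len(idx) // 2].tolist())
owner_b = torch.utils.data.Subset(train_set, idx[len(idx) // 2:].tolist())

# VGG-16 adapted to CIFAR-10's 10 classes.
model = models.vgg16(num_classes=10)

# Local sanity check on one owner's shard (a stand-in for the distributed run).
loader = torch.utils.data.DataLoader(owner_a, batch_size=32, shuffle=True)
images, targets = next(iter(loader))
logits = model(images)
print(images.shape, logits.shape)  # torch.Size([32, 3, 64, 64]) torch.Size([32, 10])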