Balancing the needs of data privacy and predictive utility is a central
challenge for machine learning in healthcare. In particular, privacy concerns
have led to a dearth of public datasets, complicated the construction of
multi-hospital cohorts, and limited the use of external machine learning
resources. To remedy this, new methods are required to enable data owners, such
as hospitals, to share their datasets publicly, while preserving both patient
privacy and modeling utility. We propose NeuraCrypt, a private encoding scheme
based on random deep neural networks. NeuraCrypt encodes raw patient data using
a randomly constructed neural network known only to the data owner, and
publishes both the encoded data and the associated labels. From a
theoretical perspective, we demonstrate that sampling from a sufficiently rich
family of encoding functions offers a well-defined and meaningful notion of
privacy against a computationally unbounded adversary with full knowledge of
the underlying data distribution. We propose to approximate this family of
encoding functions through random deep neural networks. Empirically, we
demonstrate the robustness of our encoding to a suite of adversarial attacks
and show that NeuraCrypt achieves accuracy competitive with non-private baselines
on a variety of x-ray tasks. Moreover, we demonstrate that multiple hospitals,
using independent private encoders, can collaborate to train improved x-ray
models. Finally, we release a challenge dataset to encourage the development of
new attacks on NeuraCrypt.
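
The encode-then-publish workflow described above can be illustrated with a minimal sketch. It is not the NeuraCrypt architecture: it assumes a small fully connected network with Gaussian weights and ReLU activations, and it uses a hypothetical helper, `make_random_encoder`, whose seed stands in for the data owner's private key. Python's `random` module is not cryptographically secure, so this shows only the data flow and carries no privacy guarantee.

```python
import math
import random


def make_random_encoder(in_dim, hidden_dim, out_dim, depth, seed):
    """Build a fixed random MLP (hypothetical helper, for illustration only).

    The seed plays the role of the private key: whoever knows it can
    rebuild the same weights, so the data owner keeps it secret.
    """
    rng = random.Random(seed)
    dims = [in_dim] + [hidden_dim] * (depth - 1) + [out_dim]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        scale = 1.0 / math.sqrt(d_in)  # keep activations at a similar scale
        layers.append(
            [[rng.gauss(0.0, scale) for _ in range(d_in)] for _ in range(d_out)]
        )

    def encode(x):
        h = list(x)
        for i, weights in enumerate(layers):
            h = [sum(w * v for w, v in zip(row, h)) for row in weights]
            if i < len(layers) - 1:
                h = [max(0.0, v) for v in h]  # ReLU on hidden layers only
        return h

    return encode


# Toy stand-ins for raw patient records: (feature vector, label).
raw = [([0.2, 0.7, 0.1, 0.9], 1), ([0.5, 0.1, 0.3, 0.4], 0)]

# The data owner samples one encoder and never releases the seed or weights.
encoder = make_random_encoder(in_dim=4, hidden_dim=16, out_dim=8, depth=3, seed=1234)

# Only the encoded features and the original labels are published.
published = [(encoder(x), y) for x, y in raw]
```

In the multi-hospital setting, each hospital would call `make_random_encoder` with its own secret seed and release only its `published` list; a third party could then train a model on the pooled encoded data without seeing any raw record.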
