EzPC: Microsoft's attempt to enhance data security in AI model validation

A robotic eye made of lines and concentric circles with digits on the side (*Image via The Indian Express*)

Those who have worked in data science know that developing an artificial intelligence (AI) model typically involves three high-level stages: training, validation, and testing. Choosing the validation set used to tune the hyperparameters and gauge the model's accuracy involves a lot of considerations. For an accurate evaluation, organizations tend to use a portion of their real data for validation, but this naturally raises security and privacy concerns, especially when dealing with personally identifiable information (PII).

If your model is being developed by an external company, you essentially have two options. Either the firm shares its model with you - which puts its intellectual property (IP) at risk - or you share your real data with them, which is a privacy risk for you and can also result in the model overfitting to real data. Either choice also comes with legal hurdles. So while organizations want to adopt AI as quickly as possible, they face a challenge when dealing with data, regardless of whether the model development process is internal or external.

To tackle this problem, Microsoft is working on a new framework called EzPC, which stands for "Easy Secure Multi-party Computation". In essence, EzPC is based on secure multiparty computation (MPC). MPC enables multiple parties to jointly compute a function using cryptography without revealing their data to each other.
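To make the idea of MPC more concrete, here is a minimal Python sketch of additive secret sharing, one of the simplest building blocks used in this family of techniques. It is a toy illustration only and is not EzPC's actual protocol: the modulus `PRIME` and the helper names `share`, `reconstruct`, and `secure_sum` are assumptions made for this example. Each party splits its private number into random-looking shares, and only the combined total is ever reconstructed.

```python
import secrets

# Hypothetical modulus for this toy example; all arithmetic is done mod PRIME.
PRIME = 2**61 - 1


def share(value, n_parties=2):
    """Split `value` into n_parties additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME


def secure_sum(private_inputs):
    """Compute the sum of all inputs without any party seeing another's input.

    Party i splits its input into one share per party. Party j then adds up
    the shares it received and publishes only that partial sum.
    """
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return reconstruct(partial_sums)


if __name__ == "__main__":
    # Three parties with private values; only the total is revealed.
    print(secure_sum([12, 30, 7]))
```

Any individual share is a uniformly random number and reveals nothing on its own about the input it came from. Real frameworks build far more complex functions, such as neural network inference, out of primitives along these lines.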

While MPC has been around for years, it has proven difficult to implement because of the challenges involved in making it scalable and efficient when computing multiple functions. EzPC tackles these problems by using MPC as a building block and enabling developers - not only cryptography experts - to build upon it. According to Microsoft:

Two innovations are at the core of EzPC. First, a modular compiler called CrypTFlow takes as input TensorFlow or Open Neural Network Exchange (ONNX) code for ML inference and automatically generates C-like code, which can then be compiled into various MPC protocols. This compiler is both “MPC-aware” and optimized, ensuring that the MPC protocols are efficient and scalable. The second innovation is a suite of highly performant cryptographic protocols for securely computing complex ML functions.

Microsoft boasted that EzPC enabled the "first-ever secure validation of a production-grade AI model" in its testing with researchers at Stanford University, showing that you don't need to share data to perform validation. Microsoft's EzPC setup took 15 minutes to do secure inference on a single validation element - which is 3000x longer than a regular inference - on "two standard cloud virtual machines", but the company says that this overhead is not a concern because the work can be parallelized. Under the current methodology, the more than 500 images in the validation set went through inference over a period of five days at a total cost of less than $100. Microsoft claims that it could have completed inference for the entire set in 15 minutes if all the data had been run in parallel. You can explore the findings in the paper published here.
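These figures are consistent with each other, as a quick back-of-the-envelope check shows. The snippet below assumes exactly 500 images (the reported number is "over 500") processed one after another at 15 minutes each; the variable names are just for illustration.

```python
# Figures reported in the article.
MINUTES_PER_SECURE_INFERENCE = 15
SLOWDOWN_FACTOR = 3000

# Assumption: exactly 500 images, although the article says "over 500".
IMAGES = 500

# Implied time for one regular (non-secure) inference, in seconds.
regular_seconds = MINUTES_PER_SECURE_INFERENCE * 60 / SLOWDOWN_FACTOR

# Total time if every image is processed sequentially, in days.
sequential_days = IMAGES * MINUTES_PER_SECURE_INFERENCE / (60 * 24)

print(f"Regular inference: about {regular_seconds:.1f} s per image")
print(f"Sequential secure inference: about {sequential_days:.1f} days")
```

Run sequentially, the validation set takes a little over five days, which matches the reported timeline. If every image were processed at the same time on its own pair of machines, the wall-clock time would drop to that of a single secure inference, which is the 15 minutes Microsoft cites.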

As such, Microsoft has encouraged the use of EzPC, emphasizing its foundations in MPC. Organizations that use EzPC will also be able to work around legal hurdles while ensuring that an AI model has been accurately assessed ahead of its use in production environments. EzPC is an open-source framework that you can find on GitHub here. You can also keep track of the latest developments on the initiative here and check out the research papers on the topic here.