Privacy-Preserving Machine Learning
Privacy-Preserving Machine Learning (PPML) is a set of techniques for preventing data leakage in machine learning systems. Large-scale data collection can lead to exposure of sensitive personal information, algorithmic bias and discrimination, surveillance and social control, and more. PPML applies privacy-enhancing strategies that allow multiple input sources to train ML models cooperatively without exposing their private data in its original form. In this article we discuss four PPML techniques that have shown great potential for incorporating privacy mechanisms.
- Differential Privacy (DP)
Differential Privacy is a data aggregation method that adds randomized “noise” to the data, so that individual inputs cannot be reverse-engineered from the output. While DP is used by Microsoft and by open-source libraries to protect privacy when building and tuning ML models, there is a distinct trade-off in the data’s reliability. Because the accuracy of ML models depends on the quality of the data, the amount of noise added to the underlying dataset is inversely related to the accuracy and reliability of that data, and of the entire model. In machine learning scenarios, DP works by adding small amounts of statistical noise during training to conceal the contribution of any individual party. When DP is employed, a mathematical proof guarantees that the final ML model learns only general trends in the data without acquiring information specific to individual records. Ongoing research pushes the state of the art in DP training algorithms to address scalability, efficiency, and the privacy/utility trade-off, expanding the range of scenarios where DP can be applied.
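As a minimal sketch of this idea, the hypothetical helper below clips per-example gradients and adds Gaussian noise before averaging, in the spirit of DP-SGD; the clipping norm and noise multiplier are illustrative values, not recommendations.

```python
import numpy as np

def dp_noisy_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=1.1):
    """Clip each example's gradient and add Gaussian noise (DP-SGD style).

    per_example_grads: array of shape (batch_size, num_params).
    Returns a single noisy, averaged gradient of shape (num_params,).
    """
    # Clip each per-example gradient to bound any individual's influence.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_example_grads * scale

    # Add Gaussian noise calibrated to the clipping norm, then average.
    summed = clipped.sum(axis=0)
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
    return (summed + noise) / len(per_example_grads)

# Example: a batch of 4 per-example gradients over 3 parameters.
grads = np.random.randn(4, 3)
print(dp_noisy_gradient(grads))
```

In practice a privacy accountant tracks the cumulative privacy budget (ε, δ) across training steps; open-source libraries such as Opacus and TensorFlow Privacy automate both the noise addition and the accounting.

- Zero-Knowledge Machine Learning (ZKML)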
A zero-knowledge proof (ZKP) system is a method that allows a prover P to convince a verifier V of the truth of a statement without disclosing any information beyond the statement’s veracity. To do so, P produces a proof π that V can check; if the check passes, V is convinced the statement is true while learning nothing else.
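As a concrete toy illustration of this prover/verifier interaction, here is a minimal Schnorr-style proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. The group parameters are tiny, insecure values chosen only for readability; real ZKML systems use far more expressive proof systems (such as SNARKs) over arithmetic circuits.

```python
import hashlib
import secrets

# Toy group parameters (NOT secure; for illustration only).
# p = 2q + 1, and g generates the subgroup of prime order q.
p, q, g = 23, 11, 2

def prove(x):
    """Prove knowledge of x such that y = g^x mod p, without revealing x."""
    y = pow(g, x, p)
    k = secrets.randbelow(q)            # fresh randomness per proof
    r = pow(g, k, p)                    # commitment
    c = int(hashlib.sha256(f"{r}:{y}".encode()).hexdigest(), 16) % q  # Fiat-Shamir challenge
    s = (k + c * x) % q                 # response
    return y, (r, s)

def verify(y, proof):
    r, s = proof
    c = int(hashlib.sha256(f"{r}:{y}".encode()).hexdigest(), 16) % q
    # g^s == r * y^c (mod p) holds exactly when the prover knew x.
    return pow(g, s, p) == (r * pow(y, c, p)) % p

secret_x = 7
public_y, pi = prove(secret_x)
print(verify(public_y, pi))  # True; the verifier learns nothing about secret_x
```

The verifier checks a single equation involving only public values, so it becomes convinced that the prover knows the secret exponent without learning anything about the exponent itself.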
ZKPs can also be applied during training, to validate the correct execution of a neural network N on a labelled dataset A. Here, A serves as the public input, and an arithmetic circuit C represents the network N. The training process requires an additional arithmetic circuit implementing the optimization function that minimizes the loss. For each training epoch i, a proof π_i is generated, confirming the algorithm’s correct execution through epochs 1 to i-1, including the validity of the preceding epoch’s proof. Training culminates in a single compressed proof π attesting to correct training over the dataset A.
- Federated Learning (FL)
In Federated Learning (FL), the aim is to train a global model on a dataset that is distributed across multiple servers, each holding local data samples, without any server sharing its local data. FL optimizes a global objective function defined as

f(x_1, \ldots, x_n) = \frac{1}{n} \sum_{i=1}^{n} f_i(x_i)

where n is the number of servers, x_i is the set of model parameters as viewed by server i, and f_i is the local objective function of server i. FL tries to find the set of parameter values that optimizes f.
Process:
- Initialization. An initial global model is created by a central server and distributed to all participating servers.
- Local training. Each server trains the model on its local data, which never leaves that server; this preserves data privacy and security.
- Model update. After training, each server shares its local updates, such as gradients or parameter values, with the central server.
- Aggregation. The central server collects all local updates and aggregates them into the global model, for example by averaging them (see the federated-averaging sketch after this list).
- Model distribution. The updated model is distributed back to the local servers, and the previous steps are repeated until the global model achieves the desired level of performance.
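The sketch below runs this loop for a simple linear model trained with gradient descent, using federated averaging (FedAvg) as the aggregation step; the synthetic client data, learning rate, and weighting by local dataset size are illustrative choices.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Local training: a few gradient-descent steps on this server's data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # MSE gradient for a linear model
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """One FL round: broadcast, local training, and weighted averaging."""
    updates, sizes = [], []
    for X, y in clients:                         # raw data never leaves the client
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    weights = np.array(sizes) / sum(sizes)
    return sum(w * u for w, u in zip(weights, updates))  # FedAvg aggregation

# Two hypothetical servers with private data drawn from the same linear model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for n in (40, 60):
    X = rng.normal(size=(n, 2))
    clients.append((X, X @ true_w + 0.01 * rng.normal(size=n)))

global_w = np.zeros(2)
for _ in range(20):
    global_w = federated_round(global_w, clients)
print(global_w)   # approaches [2.0, -1.0] without sharing any raw data
```

Note that only model parameters cross the network; in deployed systems these updates are often further protected with secure aggregation or differential privacy.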
- Fully Homomorphic Encryption-based Machine Learning (FHEML)
FHEML is an approach in which machine learning algorithms are implemented on top of fully homomorphic encryption (FHE) schemes. FHE enables computations to be carried out directly on encrypted data, ensuring the confidentiality of the data being processed.
Steps in the training process (a sketch of the encrypted forward pass follows this list):
- Encrypt the dataset using the public key.
- Initialize the neural network with initial weights.
- Perform forward passes on encrypted data.
- Approximate activation functions using polynomials.
- Compute the loss on encrypted data.
- Perform backward passes to calculate gradients.
- Update the weights on encrypted data.
- Repeat the process for multiple training iterations.
- Keep weights encrypted throughout training.
- The data owner decrypts the final trained weights using the private key.
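Below is a minimal sketch of steps 1, 3, 4, and 10 (encrypting the data, an encrypted forward pass, a polynomial activation, and decryption by the key holder), assuming the open-source TenSEAL library and its CKKS scheme; the encryption parameters, weights, and square-function activation are illustrative choices, and a full encrypted training loop is considerably more involved.

```python
import tenseal as ts

# CKKS context for approximate arithmetic over encrypted real numbers.
# (Parameter choices are illustrative, not a security recommendation.)
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Step 1: the data owner encrypts a feature vector.
x_enc = ts.ckks_vector(context, [0.5, -1.2, 3.0])

# Step 3: encrypted forward pass through one linear layer (plaintext weights).
weights = [0.25, 0.5, -0.1]
z_enc = x_enc.dot(weights)

# Step 4: polynomial activation; squaring stands in for a non-linearity.
a_enc = z_enc * z_enc

# Step 10: only the holder of the private key can decrypt the result.
print(a_enc.decrypt())   # approximately [(x . w) ** 2]
```

Because FHE schemes can only evaluate additions and multiplications on ciphertexts, non-polynomial activations such as ReLU or sigmoid are replaced by low-degree polynomial approximations, which is why the squaring step stands in for a non-linearity here.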