Privacy Meter

What is Privacy Meter?

Privacy Meter is an open-source library for auditing data privacy in a wide range of statistical and machine learning algorithms (classification, regression, computer vision, and natural language processing). The tool enables data protection impact assessments based on state-of-the-art membership inference attacks.

Why Privacy Meter?

Machine learning plays a central role in automated decision-making in a wide range of organizations and service providers. The data used to train the models typically contain sensitive information about individuals. Although in most cases the data cannot be released due to privacy concerns, the models are usually made public or deployed as a service for inference on new test data. For safe and secure use of machine learning models, it is important to have a quantitative assessment of their privacy risks and to make sure that they do not reveal sensitive information about their training data. This is especially important given the surge in the use of machine learning in sensitive domains such as medical and financial applications.

Data protection regulations such as GDPR, as well as AI governance frameworks, require that personal data be protected when used in AI systems, and that users have control over their data and awareness of how it is being used. For example, Article 35 of GDPR requires organizations to systematically analyze, identify and minimize the data protection risks of a project, especially when the project involves innovative technologies such as Artificial Intelligence, Machine Learning, and Deep Learning. Thus, proper mechanisms need to be in place to quantitatively evaluate and verify the privacy of individuals at every step of the data processing pipeline in AI systems.

Overview

Privacy Meter is a versatile tool that can be used with different types of models, datasets and privacy games, all of which must be specified in a .yaml configuration file. The description of the configuration file can be found here.

Auditing Methodology

Privacy Meter encompasses multiple privacy auditing methods, each targeting a different source of information leakage. For information leakage through training points, we recommend using membership inference attacks. For information leakage in the vicinity of training points, range membership inference attacks should be used. For information leakage in the form of the percentage of a dataset used in training the given models, dataset usage cardinality inference is the go-to method. Privacy Meter also supports auditing the differential privacy (DP) lower bounds of (DP or non-DP) training algorithms with membership inference attacks. The details of each inference attack and how to use it in Privacy Meter can be found in the documentation for the respective method.
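To make the idea of a membership inference attack concrete, here is a minimal sketch of a simple loss-threshold baseline on synthetic data. It is not the attack implemented in Privacy Meter: the loss distributions, the function names auc and tpr_at_fpr, and the sample sizes are all assumptions chosen for illustration. The only premise is that training points (members) tend to have lower loss than unseen points (non-members), so the negated loss serves as a membership score.

```python
import random


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def tpr_at_fpr(member_scores, nonmember_scores, target_fpr):
    """True positive rate at a threshold whose false positive rate is at most target_fpr."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(nonmember_scores, reverse=True)
    allowed_false_positives = int(target_fpr * len(ranked))
    # Predict "member" only when the score is strictly above this non-member score,
    # so at most allowed_false_positives non-members are flagged.
    threshold = ranked[allowed_false_positives]
    return sum(s > threshold for s in member_scores) / len(member_scores)


# Synthetic losses (assumption): members have lower loss on average than non-members.
rng = random.Random(0)
member_losses = [rng.expovariate(2.0) for _ in range(500)]     # mean loss 0.5
nonmember_losses = [rng.expovariate(1.0) for _ in range(500)]  # mean loss 1.0

# Lower loss means "more likely a member", so the score is the negated loss.
member_scores = [-loss for loss in member_losses]
nonmember_scores = [-loss for loss in nonmember_losses]

attack_auc = auc(member_scores, nonmember_scores)
attack_tpr = tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01)
print(f"AUC: {attack_auc:.3f}, TPR at 1% FPR: {attack_tpr:.3f}")
```

An AUC close to 0.5 would mean the scores carry almost no membership information, while values well above 0.5, or a high true positive rate at a low false positive rate, indicate leakage about the training set.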

Installation Instructions

To install the dependencies, run the following command:

pip install -r requirements.txt

Alternatively, if you prefer using conda, you can create a new environment using the provided env.yaml file:

conda env create -f env.yaml

This should create a conda environment named privacy_meter and install all necessary libraries in it. If conda takes too much time (more than a few minutes) to solve the environment, we suggest updating the conda default solver by following this official article.

Dataset and models

Privacy Meter can be used with all datasets and model classes. Here, we provide examples of using Privacy Meter on CIFAR10 (cifar10), CIFAR100 (cifar100), Purchase (purchase100), Texas (texas100), and AG News (agnews). In terms of models, we provide examples for CNN (cnn), AlexNet (alexnet), WideResNet (wrn28-1, wrn28-2, wrn28-10), MLP (mlp), and GPT-2 (gpt2) models. To specify the dataset and model, you can use the dataset and model_name parameters in the configuration file. Sample configurations have been provided in the configs folder for Purchase-100, CIFAR-10 and AG News datasets.
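As a rough illustration of how the dataset and model_name parameters relate to the names listed above, the sketch below checks a configuration against those names before a run. The flat dictionary layout and the check_config helper are assumptions made for this example; the actual .yaml files in the configs folder may nest these keys differently, and the helper is not part of Privacy Meter.

```python
# Names taken from the examples listed in this README.
SUPPORTED_DATASETS = {"cifar10", "cifar100", "purchase100", "texas100", "agnews"}
SUPPORTED_MODELS = {
    "cnn", "alexnet", "wrn28-1", "wrn28-2", "wrn28-10", "mlp", "gpt2",
    "speedyresnet",  # described under "Use custom training scripts"
}


def check_config(config):
    """Return a list of human-readable problems; an empty list means the names look fine.

    `config` is assumed to be a flat dict, for example the result of parsing a
    .yaml file and extracting the two keys of interest (hypothetical layout).
    """
    problems = []
    dataset = config.get("dataset")
    model_name = config.get("model_name")
    if dataset not in SUPPORTED_DATASETS:
        problems.append(f"unknown dataset: {dataset!r}")
    if model_name not in SUPPORTED_MODELS:
        problems.append(f"unknown model_name: {model_name!r}")
    return problems


print(check_config({"dataset": "cifar10", "model_name": "wrn28-2"}))
print(check_config({"dataset": "imagenet", "model_name": "wrn28-2"}))
```

A name that is reported as unknown is not necessarily unusable; it only means the dataset or model is not among the provided examples and needs the extension steps described in the next section.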

Extending to Other Datasets and Models

Attacking LLMs with other datasets

To use other datasets supported by Hugging Face's datasets library, specify the dataset in the configuration file and then follow these additional steps:

Create /dataset/<hf_dataset>.py: this file handles the loading and preprocessing of the new Hugging Face dataset. You can refer to /dataset/agnews.py for an example.
Modify /dataset/utils.py to include the new dataset in the get_dataset function.

For other datasets, you can simply modify the get_dataset function in /dataset/utils.py to support loading the new dataset.
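The extension pattern described above can be sketched as follows. This is a simplified stand-in, not the code in /dataset/utils.py: the get_dataset signature, the toy_reviews dataset name, and the load_toy_reviews loader are all hypothetical, and a real loader would read files from disk or call the Hugging Face datasets library instead of returning hard-coded examples.

```python
def load_toy_reviews(data_dir):
    """Hypothetical loader: returns (text, label) pairs for a tiny in-memory dataset.

    A real loader would read and preprocess data found under data_dir.
    """
    return [("great product", 1), ("terrible service", 0)]


def get_dataset(dataset_name, data_dir):
    """Dispatch on the dataset name, as a stand-in for the function in /dataset/utils.py."""
    if dataset_name == "toy_reviews":
        # New branch added for the new dataset.
        return load_toy_reviews(data_dir)
    raise NotImplementedError(f"Dataset {dataset_name!r} is not supported")


examples = get_dataset("toy_reviews", data_dir="data")
print(len(examples), examples[0])
```

Keeping the loading logic in its own function (or its own file, as with /dataset/agnews.py) means the dispatch function only grows by one branch per dataset.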

Attacking other transformers

To attack other transformers from Hugging Face's transformers library, you need to modify /models/utils.py to include the new model in the get_model function. If you want to use a different training pipeline, you can modify /trainers/train_transformer.py accordingly. You can also add other PEFT methods in the same file if you want to go beyond LoRA.

For other PyTorch models, you can create a new model architecture in /models/ and modify the get_model function in /models/utils.py to include the new model.

Use custom training scripts

We integrate a fast training library, hlb-CIFAR10, developed by tysam-code, into Privacy Meter as an example of incorporating custom training scripts. This library reaches 94% accuracy on CIFAR-10 in approximately 6.84 seconds on a single A100 GPU, setting a new world speed record. This integration allows users to efficiently evaluate newly proposed algorithms against existing attack algorithms using the CIFAR-10 dataset. To use this fast training library, simply set model_name to speedyresnet in the configuration file.

To use other training scripts, you can refer to how speedyresnet and /trainers/fast_train.py are integrated into Privacy Meter for an example.

Auditing Trained Models

By default, the Privacy Meter checks if the experiment directory specified by the configuration file contains models_metadata.json, which contains the model path to be loaded. To audit trained models obtained outside the Privacy Meter, you should follow the file structure (see <log_dir>/<models> in the next section) and create a models_metadata.json file that shares the same structure as the one generated by Privacy Meter. You can also run the demo configuration file with a few epochs to generate a demo directory to start with.
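The check described above can be approximated with a few lines of Python. This is only a sketch under the assumption that the metadata file lives at <log_dir>/models/models_metadata.json, as in the directory layout of the Audit Results section; load_models_metadata is a hypothetical helper, and the placeholder JSON written in the demo does not reflect the real metadata structure, which should be copied from a file generated by Privacy Meter.

```python
import json
import tempfile
from pathlib import Path


def load_models_metadata(log_dir):
    """Return the parsed models_metadata.json under <log_dir>/models, or None if absent."""
    metadata_path = Path(log_dir) / "models" / "models_metadata.json"
    if not metadata_path.is_file():
        return None
    with metadata_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    print(load_models_metadata(tmp))  # nothing has been trained or copied in yet

    models_dir = Path(tmp) / "models"
    models_dir.mkdir()
    # Placeholder content only; the real structure must match a generated file.
    (models_dir / "models_metadata.json").write_text(
        json.dumps({"placeholder": True}), encoding="utf-8"
    )
    print(load_models_metadata(tmp))
```

Running such a check before an audit makes it easy to tell whether externally trained models have been placed where they will be found.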

Audit Results

The audit results will be saved in the log_dir specified in the configuration file. The results include the following:

<log_dir>/
├── models/
│   ├── models_metadata.json: the meta information of the run and each trained model
│   ├── model_<model_id>.pkl: the trained models
│   └── memberships.npy: the membership labels of the training data for each model
├── report/
│   ├── exp/: contains attack results and (log) ROC curves for each target model
│   ├── log_time_analysis.log: log with timing information for each run
│   ├── attack_result_average.csv: the aggregate attack results of the run
│   └── ROC_(log_)average.png: the aggregate (log) ROC of the run
└── signals/: contains the attack signals computed for each target and reference model,
    according to the attack type specified in the configuration file
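Once a run has finished, the layout above can be inspected programmatically. The sketch below lists which of the fixed-name files are missing and reads the aggregate CSV without assuming any particular column names, since the columns are not documented here. The helper names, the assumption that the first CSV row is a header, and the metric_a and metric_b columns in the demo are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Fixed-name files from the directory layout, relative to <log_dir>.
EXPECTED_FILES = (
    "models/models_metadata.json",
    "models/memberships.npy",
    "report/log_time_analysis.log",
    "report/attack_result_average.csv",
)


def missing_artifacts(log_dir):
    """Return the expected files that do not exist under log_dir."""
    return [name for name in EXPECTED_FILES if not (Path(log_dir) / name).exists()]


def read_average_results(log_dir):
    """Read report/attack_result_average.csv into a list of dicts keyed by header row."""
    csv_path = Path(log_dir) / "report" / "attack_result_average.csv"
    with csv_path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Demo on a throwaway directory with made-up column names and values.
with tempfile.TemporaryDirectory() as tmp:
    report_dir = Path(tmp) / "report"
    report_dir.mkdir()
    (report_dir / "attack_result_average.csv").write_text(
        "metric_a,metric_b\n0.5,0.1\n", encoding="utf-8"
    )
    for row in read_average_results(tmp):
        print(row)
    print("missing:", missing_artifacts(tmp))
```

Note that csv.DictReader returns every value as a string, so numeric columns need an explicit conversion before any further analysis.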

Video (Talks)

Low-Cost High-Power Membership Inference Attacks at ICML 2024, by Reza Shokri.
Auditing Data Privacy in Machine Learning at USENIX Enigma 2022, by Reza Shokri.
Machine Learning Privacy Meter Tool at HotPETS 2020, by Sasi Kumar Murakonda.

Discussion

Please feel free to join our Slack Channel to discuss the project with us!

References

The Privacy Meter is built upon the following research papers (bib file):

Sajjad Zarifzadeh, Philippe Liu, and Reza Shokri. Low-Cost High-Power Membership Inference Attacks. In Forty-first International Conference on Machine Learning, 2024.
Sasi Kumar Murakonda and Reza Shokri. ML Privacy Meter: Aiding Regulatory Compliance by Quantifying the Privacy Risks of Machine Learning. In Workshop on Hot Topics in Privacy Enhancing Technologies (HotPETs), 2020.
Jiayuan Ye, Aadyaa Maddi, Sasi Kumar Murakonda, and Reza Shokri. Enhanced Membership Inference Attacks against Machine Learning Models. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022.
Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive Privacy Analysis of Deep Learning: Stand-alone and Federated Learning under Passive and Active White-box Inference Attacks. In IEEE Symposium on Security and Privacy, 2019.
Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership Inference Attacks against Machine Learning Models. In IEEE Symposium on Security and Privacy, 2017.

Authors

The tool is designed and developed at the NUS Data Privacy and Trustworthy Machine Learning Lab. We also welcome contributions from the community.

