The deployment of deep learning in real-world systems calls for a set of complementary technologies that ensure deep learning is trustworthy (Nicolas Papernot). This list covers topics in emerging research areas including, but not limited to, out-of-distribution generalization, adversarial examples, backdoor attacks, model inversion attacks, and machine unlearning.
Updated daily from arXiv. This preview README only includes papers submitted to arXiv within the last year. More papers can be found here: 📂 [Full List].
- Awesome Trustworthy Deep Learning Paper List 📃
- Related Awesome Lists 😲
- Toolboxes 🧰
- Seminar ⏰
- Workshops 🔥
- Tutorials 👩🏫
- Talks 🎤
- Blogs ✍️
- Other Resources ✨
- Contributing 😉
📂 [Full List of Out-of-Distribution Generalization].
📂 [Full List of Evasion Attacks and Defenses].
📂 [Full List of Poisoning Attacks and Defenses].
- Open Problems in Machine Unlearning for AI Safety. [paper]
  - Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Yarin Gal.
  - Key Word: Machine Unlearning.
  - Digest: As AI systems grow in capability and autonomy in critical areas like cybersecurity, healthcare, and biological research, ensuring their alignment with human values is crucial. Machine unlearning, originally focused on privacy and data removal, is gaining attention for its potential in AI safety. However, this paper identifies significant limitations preventing unlearning from fully addressing safety concerns, especially in managing dual-use knowledge where information can have both beneficial and harmful applications. It highlights challenges such as unintended side effects, conflicts with existing safety mechanisms, and difficulties in evaluating robustness and preserving safety features during unlearning. By outlining these constraints and open problems, the paper aims to guide future research toward more realistic and effective AI safety strategies.
📂 [Full List of Interpretability].
- DeepDG: OOD generalization toolbox
  - A domain generalization toolbox for research purposes.
- CleverHans
  - This repository contains the source code for CleverHans, a Python library to benchmark machine learning systems' vulnerability to adversarial examples.
- Adversarial Robustness Toolbox (ART)
  - A Python library for machine learning security. ART provides tools that enable developers and researchers to evaluate, defend, certify, and verify machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference (see the ART sketch after this list).
- A PyTorch implementation of adversarial attacks.
- AdverTorch
  - AdverTorch is a Python toolbox for adversarial robustness research. The primary functionalities are implemented in PyTorch. Specifically, AdverTorch contains modules for generating adversarial perturbations and defending against adversarial examples, as well as scripts for adversarial training.
- A standardized benchmark for adversarial robustness.
- An open-source Python toolbox for backdoor attacks and defenses.
- A comprehensive benchmark of backdoor attack and defense methods.
- Diffprivlib
  - Diffprivlib is a general-purpose library for experimenting with, investigating, and developing applications in differential privacy (see the Diffprivlib sketch after this list).
- Privacy Meter
  - Privacy Meter is an open-source library to audit data privacy in statistical and machine learning algorithms.
- OpenDP
  - The OpenDP Library is a modular collection of statistical algorithms that adhere to the definition of differential privacy.
- PrivacyRaven
  - PrivacyRaven is a privacy testing library for deep learning systems.
- PersonalizedFL
  - PersonalizedFL is a toolbox for personalized federated learning.
- Evaluating the privacy of synthetic data with an adversarial toolbox.
- AI Fairness 360
  - The AI Fairness 360 toolkit is an extensible open-source library containing techniques developed by the research community to help detect and mitigate bias in machine learning models throughout the AI application lifecycle.
- Fairlearn
  - Fairlearn is a Python package that empowers developers of artificial intelligence (AI) systems to assess their system's fairness and mitigate any observed unfairness issues (see the Fairlearn sketch after this list).
- Aequitas
  - Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.
- FAT Forensics
  - FAT Forensics implements state-of-the-art fairness, accountability, and transparency (FAT) algorithms for the three main components of any data modelling pipeline: data (raw data and features), predictive models, and model predictions.
- This project is about explaining what machine learning classifiers (or models) are doing.
- InterpretML
  - InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof.
- Deep Visualization Toolbox
  - This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization.
- Captum
  - Captum is a model interpretability and understanding library for PyTorch (see the Captum sketch after this list).
- Alibi
  - Alibi is an open-source Python library aimed at machine learning model inspection and interpretation.
- AI Explainability 360
  - The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models.
- A Python package for inferring causal effects from observational data.
- Fortuna
  - Fortuna is a library for uncertainty quantification that makes it easy for users to run benchmarks and bring uncertainty to production systems.
- VerifAI
  - VerifAI is a software toolkit for the formal design and analysis of systems that include artificial intelligence (AI) and machine learning (ML) components.
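A minimal evasion sketch with the Adversarial Robustness Toolbox (ART), referenced in the list above. The toy model, random inputs, and hyperparameters are illustrative placeholders rather than anything from the ART documentation, and exact constructor arguments may vary across ART versions.

```python
import numpy as np
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Toy stand-in classifier; any trained torch.nn.Module would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model so ART attacks and defenses can query it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Craft FGSM adversarial examples and compare predictions.
x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder inputs
x_adv = FastGradientMethod(estimator=classifier, eps=0.1).generate(x=x)

clean = classifier.predict(x).argmax(axis=1)
adv = classifier.predict(x_adv).argmax(axis=1)
print("flipped predictions:", int((clean != adv).sum()))
```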
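A minimal Diffprivlib sketch, referenced in the list above: adding Laplace noise to a single value and computing a differentially private mean. The data and privacy budgets below are arbitrary placeholders chosen for illustration.

```python
import numpy as np
from diffprivlib.mechanisms import Laplace
from diffprivlib.tools import mean

# Release one numeric value under epsilon-DP with the Laplace mechanism.
mechanism = Laplace(epsilon=1.0, sensitivity=1.0)
print("noisy value:", mechanism.randomise(42.0))

# Differentially private mean of a bounded array.
data = np.random.rand(1000)
print("DP mean:", mean(data, epsilon=0.5, bounds=(0.0, 1.0)))
```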
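A minimal Fairlearn sketch, referenced in the list above: measuring the demographic parity difference of predictions across a sensitive attribute. The labels, predictions, and group assignments are made-up toy data.

```python
from fairlearn.metrics import demographic_parity_difference

# Toy labels, predictions, and a sensitive attribute (group membership).
y_true = [0, 1, 1, 0, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

# 0.0 means both groups receive positive predictions at the same rate.
gap = demographic_parity_difference(y_true, y_pred, sensitive_features=group)
print("demographic parity difference:", gap)
```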
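A minimal Captum sketch, referenced in the list above: attributing a toy model's output to its input features with Integrated Gradients. The untrained model and random input are placeholders, not a recommended setup.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy untrained classifier over 4 input features and 3 classes.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()

# Attribute the score of class 0 to each input feature.
x = torch.rand(2, 4)
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(x, target=0, return_convergence_delta=True)
print(attributions)          # per-feature contributions, same shape as x
print("convergence delta:", delta)
```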
- Backdoor Attacks and Defenses in Machine Learning (ICLR 2023)
- Adversarial Machine Learning on Computer Vision: Art of Robustness (CVPR 2023)
- Workshop on Adversarial Robustness In the Real World (ECCV 2022)
- Workshop on Spurious Correlations, Invariance, and Stability (ICML 2022)
- Robust and reliable machine learning in the real world (ICLR 2021)
- Distribution Shifts Connecting Methods and Applications (NeurIPS 2021)
- Workshop on Adversarial Robustness In the Real World (ICCV 2021)
- Uncertainty and Robustness in Deep Learning Workshop (ICML 2021)
- Uncertainty and Robustness in Deep Learning Workshop (ICML 2020)
- Pitfalls of limited data and computation for Trustworthy ML (ICLR 2023)
- Secure and Safe Autonomous Driving (SSAD) Workshop and Challenge (CVPR 2023)
- Trustworthy and Reliable Large-Scale Machine Learning Models (ICLR 2023)
- TrustNLP: Third Workshop on Trustworthy Natural Language Processing (ACL 2023)
- Workshop on Mathematical and Empirical Understanding of Foundation Models (ICLR 2023)
- Automotive and Autonomous Vehicle Security (AutoSec) (NDSS 2022)
- Trustworthy and Socially Responsible Machine Learning (NeurIPS 2022)
- International Workshop on Trustworthy Federated Learning (IJCAI 2022)
- 1st Workshop on Formal Verification of Machine Learning (ICML 2022)
- Workshop on Distribution-Free Uncertainty Quantification (ICML 2022)
- Practical Adversarial Robustness in Deep Learning: Problems and Solutions (CVPR 2021)
- Adversarial Robustness: Theory and Practice (NeurIPS 2018) [Note]
- ECE1784H: Trustworthy Machine Learning (Course, Fall 2019) - Nicolas Papernot
- A School for all Seasons on Trustworthy Machine Learning (Course) - Reza Shokri, Nicolas Papernot
You are welcome to recommend papers that you find interesting and relevant to trustworthy deep learning. You can submit an issue or contact me via [email]. Also, if there are any errors in the paper information, please feel free to let me know.
Formatting (papers are listed in reverse chronological order of their initial arXiv submission time):
- Paper Title [paper]
  - Authors. Published Conference or Journal
  - Key Word: XXX.
  - Digest: XXXXXX