The deployment of deep learning in real-world systems calls for a set of complementary technologies that ensure deep learning is trustworthy (Nicolas Papernot). This list covers topics in emerging research areas including, but not limited to, out-of-distribution generalization, adversarial examples, backdoor attacks, model inversion attacks, and machine unlearning.
Updated daily from arXiv. This preview README only includes papers submitted to arXiv within the last year. More papers can be found here: 📂 [Full List].
- Awesome Trustworthy Deep Learning Paper List 📃
- Related Awesome Lists 😲
- Toolboxes 🧰
- Seminar ⏰
- Workshops 🔥
- Tutorials 👩🏫
- Talks 🎤
- Blogs ✍️
- Other Resources ✨
- Contributing 😉
📂 [Full List of Out-of-Distribution Generalization].
📂 [Full List of Evasion Attacks and Defenses].
📂 [Full List of Poisoning Attacks and Defenses].
- Open Problems in Machine Unlearning for AI Safety. [paper]
  - Fazl Barez, Tingchen Fu, Ameya Prabhu, Stephen Casper, Amartya Sanyal, Adel Bibi, Aidan O'Gara, Robert Kirk, Ben Bucknall, Tim Fist, Luke Ong, Philip Torr, Kwok-Yan Lam, Robert Trager, David Krueger, Sören Mindermann, José Hernandez-Orallo, Mor Geva, Yarin Gal.
  - Key Word: Machine Unlearning.
  - Digest: As AI systems grow in capability and autonomy in critical areas like cybersecurity, healthcare, and biological research, ensuring their alignment with human values is crucial. Machine unlearning, originally focused on privacy and data removal, is gaining attention for its potential in AI safety. However, this paper identifies significant limitations preventing unlearning from fully addressing safety concerns, especially in managing dual-use knowledge where information can have both beneficial and harmful applications. It highlights challenges such as unintended side effects, conflicts with existing safety mechanisms, and difficulties in evaluating robustness and preserving safety features during unlearning. By outlining these constraints and open problems, the paper aims to guide future research toward more realistic and effective AI safety strategies.
📂 [Full List of Interpretability].
- DeepDG: OOD generalization toolbox
  - A domain generalization toolbox for research purposes.
- CleverHans
  - This repository contains the source code for CleverHans, a Python library to benchmark machine learning systems' vulnerability to adversarial examples.
- Adversarial Robustness Toolbox (ART)
  - A Python library for machine learning security. ART provides tools that enable developers and researchers to evaluate, defend, certify, and verify machine learning models and applications against the adversarial threats of evasion, poisoning, extraction, and inference (see the ART sketch after this list).
- A PyTorch implementation of adversarial attacks.
- AdverTorch
  - AdverTorch is a Python toolbox for adversarial robustness research. The primary functionalities are implemented in PyTorch. Specifically, AdverTorch contains modules for generating adversarial perturbations and defending against adversarial examples, as well as scripts for adversarial training.
- A standardized benchmark for adversarial robustness.
- An open-source Python toolbox for backdoor attacks and defenses.
- A comprehensive benchmark of backdoor attack and defense methods.
- Diffprivlib
  - Diffprivlib is a general-purpose library for experimenting with, investigating, and developing applications in differential privacy (see the Diffprivlib sketch after this list).
- Privacy Meter
  - Privacy Meter is an open-source library to audit data privacy in statistical and machine learning algorithms.
- OpenDP
  - The OpenDP Library is a modular collection of statistical algorithms that adhere to the definition of differential privacy.
- PrivacyRaven
  - PrivacyRaven is a privacy testing library for deep learning systems.
- PersonalizedFL
  - PersonalizedFL is a toolbox for personalized federated learning.
- Evaluating the privacy of synthetic data with an adversarial toolbox.
- AI Fairness 360
  - The AI Fairness 360 toolkit is an extensible open-source library containing techniques developed by the research community to help detect and mitigate bias in machine learning models throughout the AI application lifecycle.
- Fairlearn
  - Fairlearn is a Python package that empowers developers of artificial intelligence (AI) systems to assess their system's fairness and mitigate any observed unfairness issues (see the Fairlearn sketch after this list).
- Aequitas
  - Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.
- FAT Forensics
  - FAT Forensics implements state-of-the-art fairness, accountability, and transparency (FAT) algorithms for the three main components of any data modelling pipeline: data (raw data and features), predictive models, and model predictions.
- This project is about explaining what machine learning classifiers (or models) are doing.
- InterpretML
  - InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof.
- Deep Visualization Toolbox
  - This is the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization.
- Captum
  - Captum is a model interpretability and understanding library for PyTorch (see the Captum sketch after this list).
- Alibi
  - Alibi is an open-source Python library aimed at machine learning model inspection and interpretation.
- AI Explainability 360
  - The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models.
- A Python package for inferring causal effects from observational data.
- Fortuna
  - Fortuna is a library for uncertainty quantification that makes it easy for users to run benchmarks and bring uncertainty to production systems.
- VerifAI
  - VerifAI is a software toolkit for the formal design and analysis of systems that include artificial intelligence (AI) and machine learning (ML) components.
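A minimal evasion sketch with the Adversarial Robustness Toolbox (ART), referenced in the list above. The toy model, random inputs, and hyperparameters are illustrative placeholders rather than anything from the ART documentation, and exact constructor arguments may vary across ART versions.

```python
import numpy as np
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Toy stand-in classifier; any trained torch.nn.Module would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Wrap the model so ART attacks and defenses can query it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Craft FGSM adversarial examples and compare predictions.
x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # placeholder inputs
x_adv = FastGradientMethod(estimator=classifier, eps=0.1).generate(x=x)

clean = classifier.predict(x).argmax(axis=1)
adv = classifier.predict(x_adv).argmax(axis=1)
print("flipped predictions:", int((clean != adv).sum()))
```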
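A minimal Diffprivlib sketch, referenced in the list above: adding Laplace noise to a single value and computing a differentially private mean. The data and privacy budgets below are arbitrary placeholders chosen for illustration.

```python
import numpy as np
from diffprivlib.mechanisms import Laplace
from diffprivlib.tools import mean

# Release one numeric value under epsilon-DP with the Laplace mechanism.
mechanism = Laplace(epsilon=1.0, sensitivity=1.0)
print("noisy value:", mechanism.randomise(42.0))

# Differentially private mean of a bounded array.
data = np.random.rand(1000)
print("DP mean:", mean(data, epsilon=0.5, bounds=(0.0, 1.0)))
```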
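A minimal Fairlearn sketch, referenced in the list above: measuring the demographic parity difference of predictions across a sensitive attribute. The labels, predictions, and group assignments are made-up toy data.

```python
from fairlearn.metrics import demographic_parity_difference

# Toy labels, predictions, and a sensitive attribute (group membership).
y_true = [0, 1, 1, 0, 1, 0, 1, 0]
y_pred = [0, 1, 1, 0, 1, 1, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

# 0.0 means both groups receive positive predictions at the same rate.
gap = demographic_parity_difference(y_true, y_pred, sensitive_features=group)
print("demographic parity difference:", gap)
```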
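A minimal Captum sketch, referenced in the list above: attributing a toy model's output to its input features with Integrated Gradients. The untrained model and random input are placeholders, not a recommended setup.

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy untrained classifier over 4 input features and 3 classes.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))
model.eval()

# Attribute the score of class 0 to each input feature.
x = torch.rand(2, 4)
ig = IntegratedGradients(model)
attributions, delta = ig.attribute(x, target=0, return_convergence_delta=True)
print(attributions)          # per-feature contributions, same shape as x
print("convergence delta:", delta)
```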
- Backdoor Attacks and Defenses in Machine Learning (ICLR 2023)
- Adversarial Machine Learning on Computer Vision: Art of Robustness (CVPR 2023)
- Workshop on Adversarial Robustness In the Real World (ECCV 2022)
- Workshop on Spurious Correlations, Invariance, and Stability (ICML 2022)
- Robust and reliable machine learning in the real world (ICLR 2021)
- Distribution Shifts Connecting Methods and Applications (NeurIPS 2021)
- Workshop on Adversarial Robustness In the Real World (ICCV 2021)
- Uncertainty and Robustness in Deep Learning Workshop (ICML 2021)
- Uncertainty and Robustness in Deep Learning Workshop (ICML 2020)
- Pitfalls of limited data and computation for Trustworthy ML (ICLR 2023)
- Secure and Safe Autonomous Driving (SSAD) Workshop and Challenge (CVPR 2023)
- Trustworthy and Reliable Large-Scale Machine Learning Models (ICLR 2023)
- TrustNLP: Third Workshop on Trustworthy Natural Language Processing (ACL 2023)
- Workshop on Mathematical and Empirical Understanding of Foundation Models (ICLR 2023)
- Automotive and Autonomous Vehicle Security (AutoSec) (NDSS 2022)
- Trustworthy and Socially Responsible Machine Learning (NeurIPS 2022)
- International Workshop on Trustworthy Federated Learning (IJCAI 2022)
- 1st Workshop on Formal Verification of Machine Learning (ICML 2022)
- Workshop on Distribution-Free Uncertainty Quantification (ICML 2022)
- Practical Adversarial Robustness in Deep Learning: Problems and Solutions (CVPR 2021)
- Adversarial Robustness: Theory and Practice (NeurIPS 2018) [Note]
- ECE1784H: Trustworthy Machine Learning (Course, Fall 2019) - Nicolas Papernot
- A School for all Seasons on Trustworthy Machine Learning (Course) - Reza Shokri, Nicolas Papernot
You are welcome to recommend papers that you find interesting and relevant to trustworthy deep learning. You can submit an issue or contact me via [email]. Also, if there are any errors in the paper information, please feel free to let me know.
Formatting (papers are listed in reverse chronological order of their initial arXiv submission time):
- Paper Title [paper]
  - Authors. Published Conference or Journal
  - Key Word: XXX.
  - Digest: XXXXXX