Commit

Updated README with link to arXiv paper. Added 4 remaining maps (1.0.1).
nfelnlp committed Sep 1, 2021
1 parent e427fde commit cc3bcd9
Showing 5 changed files with 73 additions and 18 deletions.
8 changes: 7 additions & 1 deletion .gitignore
@@ -45,4 +45,10 @@ docs/source/_build/

maps/

.ipynb_checkpoints/
.ipynb_checkpoints/

align_maps.py
pypi-test-api-token.txt
run_visualization.py
runtimes.md
src/thermostat/data/stats.py
33 changes: 24 additions & 9 deletions README.md
@@ -7,10 +7,11 @@
* Increases comparability and replicability of research.
* Reduces the implementational burden.

This work is described in our demo paper:
This work is described in our paper accepted to **EMNLP 2021 System Demonstrations** :
**Nils Feldhus, Robert Schwarzenberg, and Sebastian Möller.**
__Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools.__
*2021. Under (single-blind) peer review, coming soon.*
__Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools.__ *2021.*

arXiv pre-print available here: https://arxiv.org/abs/2108.13961



@@ -25,8 +26,6 @@ pip install thermostat-datasets
```




## Usage

Downloading a dataset requires just two lines of code:
@@ -174,7 +173,7 @@ Example configuration: `multi_nli-roberta-lime`

Name | 🤗 | `lgxa` | `lig` | `lime` | `occ` | `svs`
--- | --- | --- | --- | --- | --- | ---
ALBERT (`albert`) | [`prajjwal1/albert-base-v2-mnli`](https://huggingface.co/prajjwal1/albert-base-v2-mnli) | ✅ | ✅ | ✅ | ✅ | 🔄x3/3
ALBERT (`albert`) | [`prajjwal1/albert-base-v2-mnli`](https://huggingface.co/prajjwal1/albert-base-v2-mnli) | ✅ | ✅ | ✅ | ✅ |
BERT (`bert`) | [`textattack/bert-base-uncased-MNLI`](https://huggingface.co/textattack/bert-base-uncased-MNLI) | ✅ | ✅ | ✅ | ✅ | ✅
ELECTRA (`electra`) | [`howey/electra-base-mnli`](https://huggingface.co/howey/electra-base-mnli) | ✅ | ✅ | ✅ | ✅ | ✅
RoBERTa (`roberta`) | [`textattack/roberta-base-MNLI`](https://huggingface.co/textattack/roberta-base-MNLI) | ✅ | ✅ | ✅ | ✅ | ✅
@@ -188,11 +187,11 @@ Example configuration: `xnli-roberta-lime`

Name | 🤗 | `lgxa` | `lig` | `lime` | `occ` | `svs`
--- | --- | --- | --- | --- | --- | ---
ALBERT (`albert`) | [`prajjwal1/albert-base-v2-mnli`](https://huggingface.co/prajjwal1/albert-base-v2-mnli) | ✅ | ✅ | ✅ | ✅ | 🔄
ALBERT (`albert`) | [`prajjwal1/albert-base-v2-mnli`](https://huggingface.co/prajjwal1/albert-base-v2-mnli) | ✅ | ✅ | ✅ | ✅ |
BERT (`bert`) | [`textattack/bert-base-uncased-MNLI`](https://huggingface.co/textattack/bert-base-uncased-MNLI) | ✅ | ✅ | ✅ | ✅ | ✅
ELECTRA (`electra`) | [`howey/electra-base-mnli`](https://huggingface.co/howey/electra-base-mnli) | ✅ | ✅ | ✅ | ✅ | ✅
RoBERTa (`roberta`) | [`textattack/roberta-base-MNLI`](https://huggingface.co/textattack/roberta-base-MNLI) | ✅ | ✅ | ✅ | ✅ | 🔄
XLNet (`xlnet`) | [`textattack/xlnet-base-cased-MNLI`](https://huggingface.co/textattack/xlnet-base-cased-MNLI) | ✅ | ✅ | ✅ | ✅ | 🔄
RoBERTa (`roberta`) | [`textattack/roberta-base-MNLI`](https://huggingface.co/textattack/roberta-base-MNLI) | ✅ | ✅ | ✅ | ✅ |
XLNet (`xlnet`) | [`textattack/xlnet-base-cased-MNLI`](https://huggingface.co/textattack/xlnet-base-cased-MNLI) | ✅ | ✅ | ✅ | ✅ |


### AG News
@@ -248,10 +247,26 @@ If you're successful, follow the official instructions for [sharing a community

At first, all Thermostat contributions will have to be loaded via the code example above. Please notify us of existing explanation datasets by creating an [Issue](https://github.com/DFKI-NLP/thermostat/issues) with the tag [Contribution](https://github.com/DFKI-NLP/thermostat/labels/contribution) and a maintainer of this repository will add your dataset to the Thermostat configs s.t. it can be accessed by everyone via `thermostat.load()`.

---

## Cite Thermostat

```
@inproceedings{feldhus2021thermostat,
title={Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools},
author={Nils Feldhus and Robert Schwarzenberg and Sebastian Möller},
year={2021},
editor = {Heike Adel and Shuming Shi},
booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
}
```


## Disclaimer
We give no warranties for the correctness of the heatmaps or any other part of the data. This is evolving work and will be hot-patched continuously.

The Thermostat project follows the [ACL and ACM Code of Ethics](https://www.acm.org/code-of-ethics).


## Acknowledgements

Expand Down
2 changes: 1 addition & 1 deletion setup.py
@@ -20,7 +20,7 @@

setup(
name="thermostat-datasets",
version="1.0.0", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
version="1.0.1", # expected format is one of x.y.z.dev0, or x.y.z.rc1 or x.y.z (no to dashes, yes to dots)
description="Collection of NLP model explanations and accompanying analysis tools",
long_description="Thermostat is a large collection of NLP model explanations and accompanying analysis tools. "
"Combines explainability methods from the captum library with Hugging Face's datasets and "
Expand Down
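
The inline comment on `version` describes the accepted format only informally. Below is a minimal sketch of a check for it, assuming the comment means three dot-separated integers optionally followed by `.devN` or `.rcN`. The helper name `is_valid_version` and the pattern are hypothetical and not part of this repository.

```python
import re

# Assumed pattern, derived only from the setup.py comment:
# x.y.z, x.y.z.devN or x.y.z.rcN (dots only, no dashes).
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+(?:\.(?:dev|rc)\d+)?")


def is_valid_version(version: str) -> bool:
    """Return True if `version` matches the assumed x.y.z[.devN|.rcN] format."""
    return _VERSION_RE.fullmatch(version) is not None


# Try the three accepted shapes named in the comment plus one with a dash.
for candidate in ("1.0.1", "1.0.0.dev0", "1.0.0.rc1", "1.0-1"):
    print(candidate, is_valid_version(candidate))
```

The same check could be applied to the string passed to `datasets.Version` in `thermostat_configs.py`, since this commit bumps both places to `1.0.1`.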
30 changes: 29 additions & 1 deletion src/thermostat/data/thermostat_configs.py
@@ -1,7 +1,7 @@
import datasets


_VERSION = datasets.Version('1.0.0', '')
_VERSION = datasets.Version('1.0.1', '')


# Base arguments for any dataset
@@ -464,6 +464,13 @@ def __init__(
data_url="https://cloud.dfki.de/owncloud/index.php/s/F5xWYpyDpwaAPJs/download",
**_MNLI_ALBERT_KWARGS,
),
ThermostatConfig(
name="multi_nli-albert-svs",
description="MultiNLI dataset, ALBERT model, Shapley Value Sampling explanations",
explainer="ShapleyValueSampling",
data_url="https://cloud.dfki.de/owncloud/index.php/s/fffM7w64CnTSzHA/download",
**_MNLI_ALBERT_KWARGS,
),
ThermostatConfig(
name="multi_nli-bert-lgxa",
description="MultiNLI dataset, BERT model, Layer Gradient x Activation explanations",
@@ -632,6 +639,13 @@ def __init__(
data_url="https://cloud.dfki.de/owncloud/index.php/s/bEg95CGBtzaFQij/download",
**_XNLI_ALBERT_KWARGS,
),
ThermostatConfig(
name="xnli-albert-svs",
description="XNLI dataset, ALBERT model, Shapley Value Sampling explanations",
explainer="ShapleyValueSampling",
data_url="https://cloud.dfki.de/owncloud/index.php/s/wekiPq7ijzsCQK4/download",
**_XNLI_ALBERT_KWARGS,
),
ThermostatConfig(
name="xnli-bert-lgxa",
description="XNLI dataset, BERT model, Layer Gradient x Activation explanations",
@@ -730,6 +744,13 @@ def __init__(
data_url="https://cloud.dfki.de/owncloud/index.php/s/XB2tnATQW3tbxPW/download",
**_XNLI_ROBERTA_KWARGS,
),
ThermostatConfig(
name="xnli-roberta-svs",
description="XNLI dataset, RoBERTa model, Shapley Value Sampling explanations",
explainer="ShapleyValueSampling",
data_url="https://cloud.dfki.de/owncloud/index.php/s/opYTzjSeWWL7eYg/download",
**_XNLI_ROBERTA_KWARGS,
),
ThermostatConfig(
name="xnli-xlnet-lgxa",
description="XNLI dataset, XLNet model, Layer Gradient x Activation explanations",
@@ -758,4 +779,11 @@ def __init__(
data_url="https://cloud.dfki.de/owncloud/index.php/s/yEFEyrq4pbGKP4s/download",
**_XNLI_XLNET_KWARGS,
),
ThermostatConfig(
name="xnli-xlnet-svs",
description="XNLI dataset, XLNet model, Shapley Value Sampling explanations",
explainer="ShapleyValueSampling",
data_url="https://cloud.dfki.de/owncloud/index.php/s/fT34Q7CD2GQkdxJ/download",
**_XNLI_XLNET_KWARGS,
),
]
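
The four configurations added in this file follow the `<dataset>-<model>-<explainer>` naming seen in the README's example configurations. The sketch below splits such a name into its parts. It assumes that none of the three parts contains a hyphen, which holds for every name visible in this diff. `ConfigName` and `parse_config_name` are hypothetical helpers, not part of the `thermostat` package.

```python
from typing import NamedTuple


class ConfigName(NamedTuple):
    dataset: str
    model: str
    explainer: str


def parse_config_name(name: str) -> ConfigName:
    """Split a '<dataset>-<model>-<explainer>' identifier into its parts."""
    parts = name.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected '<dataset>-<model>-<explainer>', got {name!r}")
    return ConfigName(*parts)


# The four configurations added in this commit (names copied from the diff).
NEW_CONFIGS = [
    "multi_nli-albert-svs",
    "xnli-albert-svs",
    "xnli-roberta-svs",
    "xnli-xlnet-svs",
]

for name in NEW_CONFIGS:
    print(parse_config_name(name))

# Loading one of them requires the installed package and network access.
# The README mentions `thermostat.load()`; passing the config name as its
# argument is assumed here:
# import thermostat
# data = thermostat.load("multi_nli-albert-svs")
```

All four new names end in `svs`, matching the `explainer="ShapleyValueSampling"` field of the added `ThermostatConfig` entries.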
18 changes: 12 additions & 6 deletions src/thermostat/dataset.py
@@ -12,25 +12,31 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""TODO: Add a description here."""


import datasets
import json

from thermostat.data.thermostat_configs import builder_configs


# TODO: Add BibTeX citation
# Find for instance the citation on arxiv or on the dataset repo/website
_CITATION = "Coming soon."
_CITATION = """\
@misc{feldhus2021thermostat,
title={Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools},
author={Nils Feldhus and Robert Schwarzenberg and Sebastian Möller},
year={2021},
eprint={2108.13961},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
"""

_DESCRIPTION = "Thermostat is a large collection of NLP model explanations and accompanying analysis tools."

# TODO: Add a link to an official homepage for the dataset here
# Link to an official homepage for the dataset
_HOMEPAGE = 'https://github.com/DFKI-NLP/thermostat'

# TODO: Add the licence for the dataset here if you can find it
# Licence for the dataset
_LICENSE = 'Apache 2.0'


Expand Down
