This repository provides the code for PlayNTell, described in the article "Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge", presented at EMNLP 2022.
git clone git@github.com:deezer/playntell.git
cd playntell
Build the docker image and run it in a container while launching an interactive bash session:
$ make build
$ make run-bash
When building the docker image, the released data, currently hosted on Zenodo, is also downloaded into the directory `data`; thus the build may take a while. The `data` directory also contains pre-computed embeddings for the discogs tags, obtained with music-w2v (see Doh et al., 2020 in the paper).
Running the code requires a CUDA environment with version >= 11.2. Inference is quite fast, but training can take up to 2 days.
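Before starting a long training run, it can help to confirm that the environment meets the CUDA requirement. The sketch below is not part of the repository: `cuda_version_ok` is a hypothetical helper, and it assumes the version is available as a dotted string, such as the one PyTorch exposes in `torch.version.cuda` (which is `None` on CPU-only builds).

```python
def cuda_version_ok(version, minimum=(11, 2)):
    """Return True if a dotted CUDA version string such as "11.7" is >= minimum.

    Hypothetical helper, not part of the PlayNTell code base.
    """
    if version is None:  # e.g. torch.version.cuda on a CPU-only build
        return False
    # Keep major and minor only, as integers, so that "11.10" > "11.2".
    parts = tuple(int(p) for p in version.split(".")[:2])
    # Pad a bare major version such as "12" to (12, 0).
    parts = parts + (0,) * (2 - len(parts))
    return parts >= minimum


if __name__ == "__main__":
    try:
        import torch  # assumed to be installed in the project environment
    except ImportError:
        print("PyTorch is not installed; cannot query the CUDA version.")
    else:
        print("CUDA version OK:", cuda_version_ok(torch.version.cuda))
```

Comparing integer tuples rather than raw strings avoids ranking "11.10" below "11.2".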
To train the model on the pre-processed Deezer training data:
$ poetry run python3 playntell/training_experiments/train_playntell.py
Note: `playntell` accepts multiple parameters. Two useful ones are:
- `--exp_name`: distinguishes the outputs of different runs. Default: "playntell";
- `--playlist_feature`: the music modalities to be used. Default: "audio_tags_artist". E.g. if "audio_artist", only two modalities are used, as explained in the paper.
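The two parameters above can be combined in a single invocation. As a sketch, assuming the training script is launched from the repository root with the command shown earlier, the hypothetical helper `build_train_command` below assembles the command line for a run. The experiment name "audio_artist_run" is only an illustrative value.

```python
import shlex

# Script path as given in the training command above.
TRAIN_SCRIPT = "playntell/training_experiments/train_playntell.py"


def build_train_command(exp_name="playntell", playlist_feature="audio_tags_artist"):
    """Build the argument list for a training run (hypothetical helper).

    The defaults mirror the documented defaults of the two flags.
    """
    return [
        "poetry", "run", "python3", TRAIN_SCRIPT,
        "--exp_name", exp_name,
        "--playlist_feature", playlist_feature,
    ]


# Train with two modalities only, under a distinct experiment name.
cmd = build_train_command(exp_name="audio_artist_run", playlist_feature="audio_artist")
print(shlex.join(cmd))
# To actually launch the run: subprocess.run(cmd, check=True)
```

Using a distinct `--exp_name` per configuration keeps the outputs of the runs apart.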
Note: `playntell` saves its outputs in `../data/playlist-captioning/p/curated-deezer/algorithm-data/playntell/`. In particular:
- the model is stored as `../data/playlist-captioning/p/curated-deezer/algorithm-data/playntell/saved_models/{exp_name}/best.pth`;
- the inference results on the test sets are stored in `../data/playlist-captioning/p/curated-deezer/algorithm-data/playntell/predictions/{exp_name}`;
- the log file is stored as `../data/playlist-captioning/p/curated-deezer/algorithm-data/playntell/logs/{exp_name}`.
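As an illustration of this layout, the following sketch derives the three output locations from an experiment name. `output_paths` is a hypothetical helper, not a function of the repository; the path components are taken verbatim from the list above.

```python
from pathlib import PurePosixPath

# Output root as documented above (a relative path).
OUTPUT_ROOT = PurePosixPath(
    "../data/playlist-captioning/p/curated-deezer/algorithm-data/playntell"
)


def output_paths(exp_name="playntell"):
    """Return where the outputs of a given run are expected (hypothetical helper)."""
    return {
        "model": OUTPUT_ROOT / "saved_models" / exp_name / "best.pth",
        "predictions": OUTPUT_ROOT / "predictions" / exp_name,
        "logs": OUTPUT_ROOT / "logs" / exp_name,
    }


for name, path in output_paths("playntell").items():
    print(f"{name}: {path}")
```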
Once the model is trained, you can perform inference on preprocessed data (from Deezer or Spotify playlists) with:
$ poetry run python3 playntell/infer.py --exp_name playntell --inference_dataset_name curated-deezer
You can use the playntell model to predict a caption for a playlist (given by its audio files, tags, and artist information) with:
$ poetry run python3 playntell/caption_playlist.py /data/playlist-captioning/p/test_playlist/playlist.json
The `playlist.json` file has two fields:
- `id`: the id of the playlist (can be any string);
- `tracks`: the list of tracks of the playlist. Each track must have the following fields:
  - `id`: the filename of the audio file;
  - `artist`: the name of the main artist of the song in the audio file;
  - `tags`: a list of tags (from the discogs taxonomy) describing the track.
A dummy example of a `playlist.json` and audio files is provided for testing in `/data/playlist-captioning/p/test_playlist/`, found in the docker container.
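For reference, a `playlist.json` with the structure described above could be generated as follows. This is a sketch: `make_playlist` is a hypothetical helper, and the playlist id, filenames, artist names, and tags are made-up placeholders (the tags have not been checked against the discogs taxonomy). Only the field names come from the description above.

```python
import json


def make_playlist(playlist_id, tracks):
    """Build the playlist structure (hypothetical helper).

    tracks: iterable of (audio filename, main artist name, list of tags).
    """
    return {
        "id": playlist_id,
        "tracks": [
            {"id": filename, "artist": artist, "tags": list(tags)}
            for filename, artist, tags in tracks
        ],
    }


# Placeholder values, for illustration only.
playlist = make_playlist(
    "my-test-playlist",
    [
        ("track_1.mp3", "Some Artist", ["rock", "indie rock"]),
        ("track_2.mp3", "Another Artist", ["electronic"]),
    ],
)

# Print the JSON; redirect it to a file named playlist.json to use it.
print(json.dumps(playlist, indent=2))
```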
This repo uses code from the following repositories, with some modifications:
- muscaps (muscaps folder)
- mood_flow_audio_features (audio_features folder)
- ply_title_gen (ply_title_gen folder)
- VisualGPT (audio_gpt folder). We took the code from VisualGPT, modified its model code, and renamed it AudioGPT.
Please cite our paper if you use this code in your work:
@InProceedings{Gabbolini2022,
title={Data-Efficient Playlist Captioning With Musical and Linguistic Knowledge},
author={Gabbolini, Giovanni and Hennequin, Romain and Epure, Elena},
booktitle={Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month={December},
year={2022}
}