This version of the MULAN repository extracts attribute features from raw audio files in the LRE15 dataset. These articulatory attribute features (manner and place) are high-level descriptive features of speech. More information can be found in the article below:
@inproceedings{DBLP:conf/slt/KukanovHSL16,
author = {Ivan Kukanov and Ville Hautam{\"{a}}ki and
Sabato Marco Siniscalchi and Kehuang Li},
title = {Deep learning with maximal figure-of-merit cost to advance multi-label
speech attribute detection},
booktitle = {2016 {IEEE} Spoken Language Technology Workshop, {SLT} 2016, San Diego,
CA, USA, December 13-16, 2016},
pages = {489--495},
year = {2016},
doi = {10.1109/SLT.2016.7846308}
}
- Set the KALDI_ROOT variable in path.sh so that it points to your installed Kaldi toolkit.
- Make sure all bash files are executable. To fix the permissions, run from the project folder:
$ chmod -R +x ./
- Set the paths dataset_dir, scan_sub_dir and out_dir in the scripts run_cnn.sh and run_dbn.sh:
  dataset_dir: path to the LRE-15 dataset
  scan_sub_dir: subfolders to search for wave files with the pcm extension
  out_dir: output path where the audio lists, fbank features and resulting attribute scores will be saved
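Before launching the scripts, it can help to confirm that the configured folders really contain pcm files. The sketch below is not part of the repository: count_pcm_files is a hypothetical helper, and it assumes that a single subfolder name is passed at a time (how run_cnn.sh and run_dbn.sh interpret scan_sub_dir may differ).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Counts files with the .pcm extension under one subfolder of the dataset.
# Arguments: $1 = dataset directory, $2 = one subfolder name (assumption).
count_pcm_files() {
  local dataset_dir="$1" sub_dir="$2"
  if [ ! -d "$dataset_dir/$sub_dir" ]; then
    echo "missing directory: $dataset_dir/$sub_dir" >&2
    return 1
  fi
  # Search recursively and print only the number of matching files.
  find "$dataset_dir/$sub_dir" -type f -name '*.pcm' | wc -l | tr -d ' '
}
```

If scan_sub_dir lists several subfolders, the helper can be called once per subfolder; a count of 0 or a "missing directory" message suggests that a path needs another look.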
After that, you can run
$ ./run_dbn.sh 0 # 0 is the processing stage, from 0-2
or
$ ./run_cnn.sh 0 # 0 is the processing stage, from 0-3
Manner attribute scores will be saved in $out_dir/res/manner/scores.txt and place attribute scores in $out_dir/res/place/scores.txt, in the following format:
utterance_id [ columns of attribute scores, one row per frame ]
Columns in scores.txt correspond to the following attribute types (listed in data/dict/):
manner: [ fricative glides nasal other silence stop voiced vowel ]
place: [ coronal dental glottal high labial low mid other palatal silence velar ]
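As a quick check of the output, the sketch below summarizes a scores.txt file by printing the utterance id, the number of frames and the number of score columns for each utterance. It is not part of the repository: summarize_scores is a hypothetical helper, and it assumes that every utterance starts with "utterance_id [" and ends with "]", where the closing bracket may stand on its own line or at the end of the last frame row.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Prints "<utterance_id> <frames> <columns>" for each matrix in the file $1.
summarize_scores() {
  awk '
    {
      # Remember whether this line opens or closes a matrix.
      opens  = ($0 ~ /\[/)
      closes = ($0 ~ /\]/)
      # Drop the brackets so only the id and the numeric fields remain.
      gsub(/\[|\]/, " ")
      n = split($0, f, " ")
      start = 1
      if (opens) { utt = f[1]; frames = 0; cols = 0; start = 2 }
      # Any remaining fields on the line form one frame row.
      if (n >= start) { frames++; cols = n - start + 1 }
      if (closes) print utt, frames, cols
    }
  ' "$1"
}
```

For example, summarize_scores "$out_dir/res/manner/scores.txt" prints one line per utterance. If each column corresponds to one attribute in the lists above, the column count would be 8 for manner and 11 for place.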