This repo presents a short test for evaluating how effectively different SpeechBrain models differentiate between speakers.
The dataset used is Mozilla's Common Voice, obtained through Kaggle. This dataset holds thousands of short clips from different speakers.
The speaker embeddings are computed with the selected model, and similarity scores are then obtained as the cosine similarity between the embeddings (see the sketch after the model list). The results are presented in a number of ways. Some of the models used are:
- speechbrain/spkrec-xvect-voxceleb
- speechbrain/spkrec-ecapa-voxceleb
- LanceaKing/spkrec-ecapa-cnceleb
  - This one is trained on Chinese media and is therefore inappropriate here, but it produces interesting results
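For reference, here is a minimal sketch of the scoring step. The clip paths are placeholders, and on older SpeechBrain versions the import lives in `speechbrain.pretrained` rather than `speechbrain.inference.speaker`:

```python
import torch
import torchaudio
from speechbrain.inference.speaker import EncoderClassifier

# Load one of the pretrained speaker-embedding models listed above.
classifier = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

def embed(path: str) -> torch.Tensor:
    """Return the speaker embedding for a single audio clip."""
    signal, sample_rate = torchaudio.load(path)
    # The VoxCeleb models expect 16 kHz audio; Common Voice clips may differ.
    if sample_rate != 16000:
        signal = torchaudio.functional.resample(signal, sample_rate, 16000)
    return classifier.encode_batch(signal).squeeze()

# Cosine similarity between two clips' embeddings: close to 1 means "same voice".
score = torch.nn.functional.cosine_similarity(
    embed("clip_a.mp3"), embed("clip_b.mp3"), dim=0
)
print(f"similarity: {score.item():.4f}")
```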
To run the tests on your own, you can simply run:
```sh
poetry env use python3
poetry install
python3 main.py
```
Note that in order for torchaudio to load the audio files, an audio backend must be present on the system. The recommended software is FFmpeg version 6. On Ubuntu this can be installed using

```sh
sudo apt install ffmpeg
```

or on macOS using Homebrew:

```sh
brew install ffmpeg@6 && brew link ffmpeg@6
```
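To check whether torchaudio can actually see a decoder, this one-liner can help (the output shown is illustrative):

```python
import torchaudio

# An empty list means torchaudio has no backend and loading will fail.
print(torchaudio.list_audio_backends())  # e.g. ['ffmpeg', 'soundfile']
```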
The results for the three models are presented below. The random seed used for sampling is 80085. Hehe.
These figures show, in aggregate, how similar the clips of different speakers are.
The histogram demonstrates the rough distribution of the similarity scores across all speaker pairs.
An ECDF (empirical cumulative distribution function) gives a good idea of the different quantiles of the distribution.
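As an illustration, both figures can be produced with matplotlib along these lines; the `scores` array below is a random placeholder standing in for the real pairwise similarity scores of one model:

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder stand-in for the real array of pairwise similarity scores.
scores = np.random.default_rng(80085).normal(loc=0.6, scale=0.1, size=5000)

fig, (ax_hist, ax_ecdf) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram: rough shape of the score distribution.
ax_hist.hist(scores, bins=50)
ax_hist.set_xlabel("cosine similarity")
ax_hist.set_ylabel("pair count")

# ECDF: fraction of pairs scoring at or below each value.
xs = np.sort(scores)
ax_ecdf.plot(xs, np.arange(1, len(xs) + 1) / len(xs))
ax_ecdf.set_xlabel("cosine similarity")
ax_ecdf.set_ylabel("cumulative fraction")

plt.show()
```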
Some quantiles of the score distribution are computed below. These are useful as acceptance thresholds.
Model | q(0.6) | q(0.7) | q(0.75) | q(0.8) | q(0.9) |
---|---|---|---|---|---|
spkrec-ecapa-voxceleb | 0.6304 | 0.6657 | 0.6821 | 0.6995 | 0.7451 |
spkrec-xvect-voxceleb | 0.9659 | 0.9702 | 0.9723 | 0.9744 | 0.9789 |
spkrec-ecapa-cnceleb | 0.4949 | 0.5428 | 0.5688 | 0.5974 | 0.6694 |
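For what it's worth, computing these thresholds is a one-liner with numpy; `scores` is again a placeholder for the real array of pairwise scores:

```python
import numpy as np

# Placeholder stand-in for the real array of pairwise similarity scores.
scores = np.random.default_rng(80085).normal(loc=0.6, scale=0.1, size=5000)

# Quantiles of the different-speaker score distribution; accepting only
# scores above q(0.9) would falsely accept ~10% of different-speaker pairs.
qs = [0.6, 0.7, 0.75, 0.8, 0.9]
for q, t in zip(qs, np.quantile(scores, qs)):
    print(f"q({q}) = {t:.4f}")
```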
For my use case, speechbrain/spkrec-ecapa-voxceleb seems like a better fit than speechbrain/spkrec-xvect-voxceleb, as it does not give high similarity scores when the speakers are distinct.
More testing should be done on how well the models identify that the speaker is the same across different clips of the same speaker. However, since both models were trained and evaluated on the VoxCeleb dataset, I trust that they already excel in this regard.
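If someone wants to run that follow-up, SpeechBrain's `SpeakerRecognition` interface exposes a verification helper; the clip paths below are placeholders:

```python
from speechbrain.inference.speaker import SpeakerRecognition

verifier = SpeakerRecognition.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

# Score two clips of (supposedly) the same speaker; `prediction` is True
# when the score clears the model's built-in verification threshold.
score, prediction = verifier.verify_files(
    "same_speaker_clip1.mp3", "same_speaker_clip2.mp3"
)
print(score.item(), prediction.item())
```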