# 3.3 Diarization {.unnumbered}
This is a minimal example script showing how to detect speakers
in the audio track of a video file. To start, we will load in a few
modules that will be needed for the task.
```{python}
from pyannote.audio import Pipeline
from pydub import AudioSegment
import polars as pl
import tempfile
```
Next, we need to load the model.
```{python}
pipeline = Pipeline.from_pretrained("statsmaths/diarize")
```
The algorithm only accepts audio input, so we first convert the
video into a temporary WAV file. If you already have a WAV file, you
can skip the conversion and pass its path directly to the pipeline.
Here we pass the path of the temporary file to the diarization model
while the file still exists.
```{python}
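with tempfile.NamedTemporaryFile(suffix='.wav') as temp_file:
    # Extract the audio track from the video and write it as WAV.
    audio = AudioSegment.from_file('video/sotu.mp4', format="mp4")
    audio.export(temp_file.name, format="wav")
    # Run the pipeline before the temporary file is deleted.
    diarization = pipeline(temp_file.name)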
```
The output requires a little parsing before it is ready to use.
Here we build a dictionary with one list per column: start time,
end time, and speaker label.
```{python}
# Each line of the LAB output has the form "start end speaker".
data = diarization.to_lab().split('\n')
data = [x.split(' ') for x in data]
data = [x for x in data if len(x) == 3]  # drop empty lines
output = {
    'start_time': [float(x[0]) for x in data],
    'end_time': [float(x[1]) for x in data],
    'speaker': [x[2] for x in data]
}
```
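To see what the parsing step does without running the model, the sketch below applies the same logic to a hand-written string. The string `sample_lab` and its values are made up for illustration, and it assumes the LAB text holds one space-separated `start end speaker` triple per line.

```python
# Hypothetical LAB-style text: one "start end speaker" triple per line.
# These values are invented for illustration, not real model output.
sample_lab = "0.500 4.250 SPEAKER_00\n4.250 9.000 SPEAKER_01\n"

# Split into lines, then split each line into its fields.
rows = [line.split(" ") for line in sample_lab.split("\n")]

# Keep only complete rows; this drops the empty string left after
# the trailing newline.
rows = [row for row in rows if len(row) == 3]

sample_output = {
    "start_time": [float(row[0]) for row in rows],
    "end_time": [float(row[1]) for row in rows],
    "speaker": [row[2] for row in rows],
}
```

The length check is what makes the parse robust to the trailing newline: splitting an empty line yields a single-element list, which is filtered out.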
The output is constructed such that we can call the `from_dict`
method from **polars** to construct a data frame. If needed, this
can be saved as a CSV file with the `write_csv` method of the
resulting data frame.
```{python}
dt = pl.from_dict(output)
dt
```