Category Archives: Kaldi

Once you have your alignments you might need to retrieve data from them.

Kaldi’s show-alignments generates an alignment file that is “readable for humans”. Here’s how to invoke it:

show-alignments "$basePath/data/lang/phones.txt" final.mdl ark:ali.1 > ali.1.txt
show-alignments "$basePath/data/lang/phones.txt" final.mdl ark:ali.2 > ali.2.txt
show-alignments "$basePath/data/lang/phones.txt" final.mdl ark:ali.3 > ali.3.txt
show-alignments "$basePath/data/lang/phones.txt" final.mdl ark:ali.4 > ali.4.txt

The file phones.txt is in data/lang/;
The file final.mdl is in exp/mono_ali/;
The files ali.1, ali.2, ali.3, ali.4 are in exp/mono_ali/. They have to be unzipped (gunzip) before being used.

Before running show-alignments, the alignment files look like this:

f01br16b22k1-s003 3 12 18 17 1826 1825 1825 1825 1828 1830 1829 1829 1829 1829 1256 1258 1257 1257 1260 1259 1259 1259 1259 1259 1259 362 361 361 361 364 363 363 363 366 2750 2749 2749 2749 2749 2749 2749 2749 2749 2752 2751 2751 2754 2753 2753 2753 1928 1927 1927 1930 1929 1929 1932 1931 1931 1931 1931 980 982 984 1826 1825 1825 1825 1828 1827 1827 1827 1830 1829 2678 2677 2677 2677 2680 2679 2682 2681 2486 2485 2485 2485 2485 2485 2485 2488 2487 2487 2490 2489 2489 2489 2504 2503 2503 2503 2503 2506 2508 2738 2737 2737 2737 2737 2740 2739 2739 2739 2742 974 973 973 973 973 976 978 1814 1816 1818 2504 2503 2503 2506 2505 2508 2507 2507 2507 4 14 15 15 15 15 15 15 12 10 10 10 10 10 10 10 10 10 10 10 10 10 10 18 17 17 17 17 17 17 17 17 17

The above is utterance number 3 (s003) for female speaker 1 (f01br16b22k1) in the West Point Brazilian Portuguese LDC Corpus.

After you run show-alignments, you will see:

f01br16b22k1-s003 [ 3 12 18 17 ] [ 1826 1825 1825 1825 1828 1830 1829 1829 1829 1829 ] [ 1256 1258 1257 1257 1260 1259 1259 1259 1259 1259 1259 ] [ 362 361 361 361 364 363 363 363 366 ] [ 2750 2749 2749 2749 2749 2749 2749 2749 2749 2752 2751 2751 2754 2753 2753 2753 ] [ 1928 1927 1927 1930 1929 1929 1932 1931 1931 1931 1931 ] [ 980 982 984 ] [ 1826 1825 1825 1825 1828 1827 1827 1827 1830 1829 ] [ 2678 2677 2677 2677 2680 2679 2682 2681 ] [ 2486 2485 2485 2485 2485 2485 2485 2488 2487 2487 2490 2489 2489 2489 ] [ 2504 2503 2503 2503 2503 2506 2508 ] [ 2738 2737 2737 2737 2737 2740 2739 2739 2739 2742 ] [ 974 973 973 973 973 976 978 ] [ 1814 1816 1818 ] [ 2504 2503 2503 2506 2505 2508 2507 2507 2507 ] [ 4 14 15 15 15 15 15 15 12 10 10 10 10 10 10 10 10 10 10 10 10 10 10 18 17 17 17 17 17 17 17 17 17 ]

f01br16b22k1-s003 SIL m_B ew1_E a_B v_I o1_E eh1_S m_B ujn1_I t_I u_E v_B eh1_I lj_I u_E SIL

Each bracketed group above holds the per-frame transition IDs (TIDs) of one phone, and the line of text below the groups lists the phones in the same order. In the raw output, each phone label is aligned with the left-most TID of its group. Transition IDs are integers that encode the PDF (probability density function), the phone identity, and whether the transition is a self-loop or a forward transition.
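Because every bracketed group is one phone and every transition ID is one frame, counting the IDs in a group gives the phone's length in frames. The sketch below pairs groups with phone labels for the first three phones of the utterance above. The function name `phone_frames` is made up for illustration, and the 10 ms frame shift is an assumption; check the feature configuration of your own setup.

```python
import re

FRAME_SHIFT_S = 0.01  # assumption: 10 ms frame shift, not stated in the post


def phone_frames(tid_line, phone_line):
    """Return (utt_id, [(phone, start_frame, n_frames), ...])."""
    utt_id, _, rest = tid_line.partition(" ")
    # Each "[ ... ]" group holds the per-frame transition IDs of one phone.
    groups = [g.split() for g in re.findall(r"\[([^\]]*)\]", rest)]
    phones = phone_line.split()[1:]  # drop the utterance ID
    if len(groups) != len(phones):
        raise ValueError("number of bracket groups and phones differ")
    segments, start = [], 0
    for phone, tids in zip(phones, groups):
        segments.append((phone, start, len(tids)))
        start += len(tids)
    return utt_id, segments


# First three groups of utterance s003, copied from the output above.
tid_line = ("f01br16b22k1-s003 [ 3 12 18 17 ] "
            "[ 1826 1825 1825 1825 1828 1830 1829 1829 1829 1829 ] "
            "[ 1256 1258 1257 1257 1260 1259 1259 1259 1259 1259 1259 ]")
phone_line = "f01br16b22k1-s003 SIL m_B ew1_E"

utt, segs = phone_frames(tid_line, phone_line)
for phone, start, n in segs:
    print(f"{utt} {phone} start={start * FRAME_SHIFT_S:.2f}s "
          f"dur={n * FRAME_SHIFT_S:.2f}s")
```

Keeping the counts in frames and converting to seconds only when printing avoids accumulating floating-point error over long utterances.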

The text part above corresponds to phones and the position they occupy in a word: _B is the beginning of a word, _E is the end, _I is word-internal phone, and _S is a singleton (word with one sound only).
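The position markers are enough to recover word boundaries from the phone line. A minimal sketch, assuming labels without an underscore (such as SIL here) are not part of any word; `group_into_words` is a hypothetical helper name:

```python
def group_into_words(phones):
    """Group position-marked phone labels into one list of phones per word."""
    words, current = [], []
    for label in phones:
        base, sep, pos = label.rpartition("_")
        if not sep:
            continue  # no position marker, e.g. SIL
        if pos == "S":        # singleton: a one-phone word
            words.append([base])
        elif pos == "B":      # word-initial phone starts a new word
            current = [base]
        elif pos == "I":      # word-internal phone
            current.append(base)
        elif pos == "E":      # word-final phone closes the word
            current.append(base)
            words.append(current)
            current = []
    return words


line = ("f01br16b22k1-s003 SIL m_B ew1_E a_B v_I o1_E eh1_S "
        "m_B ujn1_I t_I u_E v_B eh1_I lj_I u_E SIL")
words = group_into_words(line.split()[1:])  # skip the utterance ID
print(words)
```

For the utterance above this yields five groups, one per word of the prompt.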

The file above can be tweaked using your preference of sed, awk, or similar, to look like this:

m ew1 a v o1 eh1 m ujn1 t u v eh1 lj u
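As one possible alternative to sed or awk, the same clean-up can be sketched in Python: drop the utterance ID, drop silence, and strip the position suffixes. The helper name `strip_positions` and the choice of `SIL` as the only silence label are assumptions based on the output shown above.

```python
import re


def strip_positions(phone_line, silence=("SIL",)):
    """Return the phones of one show-alignments text line without markers."""
    labels = phone_line.split()[1:]  # drop the utterance ID
    # Remove a trailing _B, _E, _I or _S and skip silence labels.
    return " ".join(re.sub(r"_[BEIS]$", "", p)
                    for p in labels if p not in silence)


line = ("f01br16b22k1-s003 SIL m_B ew1_E a_B v_I o1_E eh1_S "
        "m_B ujn1_I t_I u_E v_B eh1_I lj_I u_E SIL")
plain = strip_positions(line)
print(plain)
```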

These are the aligned phones of utterance s003 for female speaker 1 above (literal translation: “My grandfather is very old”; don’t blame the speaker, they were reading prompts for the corpus).

Fstclosure

The fstclosure command implements Kleene closure (the Kleene star). That is, it converts a set of strings into the set of strings consisting of zero or more repetitions of strings in the input set. The command can also be used to emulate the “+” operator (one or more repetitions) with the --closure_plus flag.

For example, given a simple automaton representing the regular expression a(b|c):

========================
Initial automaton
fstprint --osymbols=words.txt --isymbols=words.txt L.fst
0 1 a a
1 2 b b
1 3 c c
2 4 <eps> <eps>
3 4 <eps> <eps>
4

The language a(b|c)

Running fstclosure produces (a(b|c))*:

========================
Run fstclosure for Kleene Star
fstclosure L.fst Lstar.fst
fstprint --osymbols=words.txt --isymbols=words.txt Lstar.fst
5 0 <eps> <eps>
5
0 1 a a
1 2 b b
1 3 c c
2 4 <eps> <eps>
3 4 <eps> <eps>
4 0 <eps> <eps>
4
========================

The language (a(b|c))*

Running fstclosure --closure_plus instead produces (a(b|c))+:

========================
Run fstclosure for Kleene plus
fstclosure --closure_plus L.fst Lplus.fst
fstprint --osymbols=words.txt --isymbols=words.txt Lplus.fst
0 1 a a
1 2 b b
1 3 c c
2 4 <eps> <eps>
3 4 <eps> <eps>
4 0 <eps> <eps>
4

The language (a(b|c))+
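To check what the two printed machines accept, the fstprint listings above can be fed to a small acceptor simulation. This is only an illustrative sketch in plain Python, not OpenFst code: it reads the input labels, treats `<eps>` arcs as free moves, and assumes the first printed line starts at the start state, as in the listings above. All function names are made up for the example.

```python
def parse_fst(text):
    """Parse fstprint text into (start, finals, arcs)."""
    arcs, finals, start = {}, set(), None
    for line in text.strip().splitlines():
        fields = line.split()
        if start is None:
            start = fields[0]  # assumption: start state is printed first
        if len(fields) <= 2:   # "state" or "state weight": a final state
            finals.add(fields[0])
        else:                  # "src dst ilabel olabel [weight]"
            arcs.setdefault(fields[0], []).append((fields[2], fields[1]))
    return start, finals, arcs


def eps_closure(states, arcs):
    """All states reachable from `states` through <eps> arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        for label, dst in arcs.get(stack.pop(), []):
            if label == "<eps>" and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(fst, symbols):
    start, finals, arcs = fst
    current = eps_closure({start}, arcs)
    for sym in symbols:
        moved = {dst for s in current
                 for label, dst in arcs.get(s, []) if label == sym}
        current = eps_closure(moved, arcs)
    return bool(current & finals)


LSTAR = parse_fst("""5 0 <eps> <eps>
5
0 1 a a
1 2 b b
1 3 c c
2 4 <eps> <eps>
3 4 <eps> <eps>
4 0 <eps> <eps>
4""")
LPLUS = parse_fst("""0 1 a a
1 2 b b
1 3 c c
2 4 <eps> <eps>
3 4 <eps> <eps>
4 0 <eps> <eps>
4""")

for name, fst in (("star", LSTAR), ("plus", LPLUS)):
    for s in ("", "a b", "a b a c", "a"):
        print(name, repr(s), accepts(fst, s.split()))
```

The only string on which the two machines differ is the empty one: the extra final start state 5 lets the star machine accept it, while the plus machine requires at least one pass through a(b|c).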

Kaldi alignments in Matlab

Posted on GitHub is a first version of Matlab code for reading, displaying, and playing Kaldi phone alignments.

Kaldi-alignments-matlab

Resource management recipe

The initial parts of the Kaldi Resource Management recipe have been run on the server. See /projects/speech/sys/kaldi-trunk/egs/rm/s5. The Makefile there has targets for building the demo, and some additional ones for examining things. For instance, this shows the representation of the lexicon as an FST:

make show_fst_lex
/projects/speech/sys/kaldi-trunk/tools/openfst/bin/fstprint --isymbols=data/lang/phones.txt --osymbols=data/lang/words.txt data/lang/L.fst | head -50
...
1 157 ax_B AROUND
1 160 ax_B ARRIVAL
1 164 ax_B ARRIVE
1 167 ax_B ARRIVED
1 171 ax_B ARRIVING
1 176 eh_B ARROW
1 178 ax_B AS
1 179 ax_B ASTORIA
1 185 ey_B ASUW
1 194 ey_B ASW
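Each printed arc has the form "source destination input-phone output-word". A small sketch of reading such lines, assuming the arcs that carry a word label are the ones that start a pronunciation (as in the excerpt above) and that arcs with an `<eps>` output should be skipped; `first_phones` is a hypothetical helper name, and the sample is a subset of the lines shown:

```python
def first_phones(fstprint_text):
    """Map each word to the phone(s) on the arcs that emit that word."""
    result = {}
    for line in fstprint_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # final-state lines carry no labels
        _src, _dst, phone, word = fields[:4]  # a fifth field would be a weight
        if word != "<eps>":
            result.setdefault(word, []).append(phone)
    return result


sample = """1 157 ax_B AROUND
1 176 eh_B ARROW
1 185 ey_B ASUW"""
lex = first_phones(sample)
print(lex)
```

A list is kept per word so that a word with several pronunciations does not overwrite an earlier entry.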

Finite State Methods

LING 4485/6485 Fall 2016
