Author Archives: Simone Harmath De Lemos

Kaldi – show-alignments

Once you have your alignments you might need to retrieve data from them.

Kaldi’s show-alignments generates an alignment file that is “readable for humans”. Here’s how to invoke it:

show-alignments $basePath”/data/lang/phones.txt” final.mdl ark:ali.1 > ali.1.txt
show-alignments $basePath”/data/lang/phones.txt” final.mdl ark:ali.2 > ali.2.txt
show-alignments $basePath”/data/lang/phones.txt” final.mdl ark:ali.3 > ali.3.txt
show-alignments $basePath”/data/lang/phones.txt” final.mdl ark:ali.4 > ali.4.txt

  1. The file phones.txt is in data/lang/;
  2. The file final.mdl is in exp/mono_ali/;
  3. The files ali.1, ali.2, ali.3, ali.4 are in exp/mono_ali/. They have to be unziped (gunzip) before being used.

Before running show-alignments, the alignment files look like this:

f01br16b22k1-s003 3 12 18 17 1826 1825 1825 1825 1828 1830 1829 1829 1829 1829 1256 1258 1257 1257 1260 1259 1259 1259 1259 1259 1259 362 361 361 361 364 363 363 363 366 2750 2749 2749 2749 2749 2749 2749 2749 2749 2752 2751 2751 2754 2753 2753 2753 1928 1927 1927 1930 1929 1929 1932 1931 1931 1931 1931 980 982 984 1826 1825 1825 1825 1828 1827 1827 1827 1830 1829 2678 2677 2677 2677 2680 2679 2682 2681 2486 2485 2485 2485 2485 2485 2485 2488 2487 2487 2490 2489 2489 2489 2504 2503 2503 2503 2503 2506 2508 2738 2737 2737 2737 2737 2740 2739 2739 2739 2742 974 973 973 973 973 976 978 1814 1816 1818 2504 2503 2503 2506 2505 2508 2507 2507 2507 4 14 15 15 15 15 15 15 12 10 10 10 10 10 10 10 10 10 10 10 10 10 10 18 17 17 17 17 17 17 17 17 17

The above is utterance number 3 (s003) for female speaker 1 (f01br16b22k1) in the West Point Brazilian Portuguese LDC Corpus.

After your run show-alignments, you will see:

f01br16b22k1-s003  [ 3 12 18 17 ] [ 1826 1825 1825 1825 1828 1830 1829 1829 1829 1829 ] [ 1256 1258 1257 1257 1260 1259 1259 1259 1259 1259 1259 ] [ 362 361 361 361 364 363 363 363 366 ] [ 2750 2749 2749 2749 2749 2749 2749 2749 2749 2752 2751 2751 2754 2753 2753 2753 ] [ 1928 1927 1927 1930 1929 1929 1932 1931 1931 1931 1931 ] [ 980 982 984 ] [ 1826 1825 1825 1825 1828 1827 1827 1827 1830 1829 ] [ 2678 2677 2677 2677 2680 2679 2682 2681 ] [ 2486 2485 2485 2485 2485 2485 2485 2488 2487 2487 2490 2489 2489 2489 ] [ 2504 2503 2503 2503 2503 2506 2508 ] [ 2738 2737 2737 2737 2737 2740 2739 2739 2739 2742 ] [ 974 973 973 973 973 976 978 ] [ 1814 1816 1818 ] [ 2504 2503 2503 2506 2505 2508 2507 2507 2507 ] [ 4 14 15 15 15 15 15 15 12 10 10 10 10 10 10 10 10 10 10 10 10 10 10 18 17 17 17 17 17 17 17 17 17 ]

f01br16b22k1-s003  SIL            m_B                                                   ew1_E                                                      a_B                                     v_I                                                                                 o1_E                                                       eh1_S           m_B                                                   ujn1_I                                      t_I                                                                       u_E                                    v_B                                                   eh1_I                           lj_I               u_E                                              SIL 

The above are the transition IDs (per frame) of each phone, and the phone is the text below – always aligned with the left-most TID – transition IDs are integers that encode the PDFs (probability density function), the phone identity, and information about self-loops or forward transitions.

The text part above corresponds to phones and the position they occupy in a word: _B is the beginning of a word, _E is the end, _I is word-internal phone, and _S is a singleton (word with one sound only).

The file above can be tweaked using your preference of sed awk, or similar, to look like this:

m ew1   a v o1  eh1     m ujn1 t u      v eh1 lj u

These are the aligned words of utterance s003 for female speaker 1 above (literal translation: “My grandfather is very old” – don’t blame the speaker, they were reading prompts for the corpus)

Portuguese Corpora

We have 4 running-speech Portuguese Corpora available:

  • The West Point Brazilian Portuguese Speech corpus LDC2008S04 (Morgan, Ackerlind & Packer, 2008);
  • The CSLU Spoltech Brazilian Portuguese 1.0 LDC2006S16 (Schramm et al., 2006);
  • C-ORAL Brasil;
  • CLUL Spoken Portuguese Corpus, Geographical and Social Varieties (Casteleiro et al., n.d.);

Specific data about each to follow shortly…