utils/prepare_lang.sh

Before I was able to get this command (Step 5 from Mats’ run.sh) to work, we needed some files that we weren’t aware of. In total, you will need the following files before this command will work. All paths are relative to run.sh.

  • data/local/dict/lexicon.txt
  • data/local/dict/nonsilence_phones.txt
  • data/local/dict/optional_silence.txt
  • data/local/dict/silence_phones.txt
  • path.sh

prepare_lang.sh will complain if any one of these is missing. The complaint for path.sh is a little less clear, since not having this file seems to result in other, seemingly unrelated errors rather than a direct message about path.sh.
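Since the errors are not always obvious, it can help to check for the files yourself before running the script. Below is a minimal sketch, run from the directory containing run.sh. The function name check_lang_inputs is my own and not part of Kaldi.

```shell
# Hypothetical helper (not part of Kaldi): report which of the
# required inputs are missing, relative to the current directory.
check_lang_inputs() {
  missing=0
  for f in data/local/dict/lexicon.txt \
           data/local/dict/nonsilence_phones.txt \
           data/local/dict/optional_silence.txt \
           data/local/dict/silence_phones.txt \
           path.sh; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Returns 0 if everything is present, 1 otherwise.
  return "$missing"
}

check_lang_inputs || echo "create the files listed above before running utils/prepare_lang.sh"
```

This only checks that the files exist, not that their contents are valid.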

lexicon.txt contains a lexical entry on each line which consists of a word, a space, and then the phones in that word, separated by spaces.

nonsilence_phones.txt contains one phone symbol per line.

optional_silence.txt contains the symbol for an optional silence. This is just sil, but the file still needs to exist. Make sure that there is a newline at the end.

silence_phones.txt can be identical to optional_silence.txt.
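To make the four formats concrete, here is a rough sketch that writes toy versions of the dictionary files. The two lexicon entries and their phone symbols are made up for illustration, and deriving nonsilence_phones.txt from the lexicon is just one convenient way to do it, not necessarily how Mats' setup does it.

```shell
dict=data/local/dict
mkdir -p "$dict"

# Toy lexicon (made-up entries): a word, a space, then its phones
# separated by spaces.
cat > "$dict/lexicon.txt" <<'EOF'
hallo h a l o
welt v E l t
EOF

# One phone symbol per line: print every field after the word,
# then keep each phone only once.
awk '{ for (i = 2; i <= NF; i++) print $i }' "$dict/lexicon.txt" \
  | sort -u > "$dict/nonsilence_phones.txt"

# printf with an explicit \n guarantees the newline at the end.
printf 'sil\n' > "$dict/optional_silence.txt"

# silence_phones.txt can simply be a copy.
cp "$dict/optional_silence.txt" "$dict/silence_phones.txt"
```

This only illustrates the layout of the files. A real setup needs a full lexicon for your data, and prepare_lang.sh may impose further requirements on the contents that a toy example like this does not meet.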

path.sh can be copied from RM, though you may need to edit the KALDI_ROOT variable, since this is a relative path.

The German versions of all of these can be seen in kaldi-master/egs/vm1.
