Monthly Archives: August 2016

Portuguese Corpora

We have four running-speech Portuguese corpora available:

  • The West Point Brazilian Portuguese Speech corpus LDC2008S04 (Morgan, Ackerlind & Packer, 2008);
  • The CSLU Spoltech Brazilian Portuguese 1.0 LDC2006S16 (Schramm et al., 2006);
  • C-ORAL Brasil;
  • CLUL Spoken Portuguese Corpus, Geographical and Social Varieties (Casteleiro et al., n.d.).

Specific data about each to follow shortly…

Installing FOMA

Foma is a finite-state toolkit that is mostly compatible with XFST, the Xerox finite-state tool. In addition to an interface for building and displaying finite-state machines, Foma includes a C API for developers. The documentation for Foma is available online here.

Windows

A Windows binary is available here. Download and unzip the file, and place the executables foma.exe, flookup.exe, and cgflookup.exe in a directory that is listed in your PATH. You can inspect and modify your PATH variable from My Computer > Properties > Advanced > Environment Variables > System Variables. The executables can then be invoked from any directory via the command line.

Note: the system command in Foma will not work unless you have Cygwin installed.

Mac

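An OS X binary is available here. Download and unpack the file using tar -xzvf (the f option must come last, because it takes the archive name as its argument). Then move the binaries foma and flookup to a directory in your PATH. An easy way to do this is to copy them to /usr/bin/ with the command sudo cp ./foma /usr/bin/.; sudo cp ./flookup /usr/bin/. On OS X 10.11 and later, System Integrity Protection blocks writes to /usr/bin/, so use /usr/local/bin/ there instead.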
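The unpack-and-copy step can be wrapped in a small shell function. This is only a sketch: install_foma_binaries and its two arguments are names made up for illustration, the archive name in the usage comment is a placeholder for whatever file you downloaded, and the function assumes nothing about the folder layout inside the archive.

```shell
# Sketch only: unpack a Foma binary tarball and copy the tools into a directory on PATH.
# install_foma_binaries and both arguments are illustrative names, not part of Foma.
install_foma_binaries() {
    archive="$1"   # path to the downloaded .tar.gz
    dest="$2"      # target directory, which should be listed in PATH
    workdir=$(mktemp -d "${TMPDIR:-/tmp}/foma.XXXXXX") || return 1

    # -f comes last in the option cluster so that it takes the archive name.
    tar -xzf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }

    for tool in foma flookup cgflookup; do
        # The archive layout is not assumed; look for each binary by name.
        found=$(find "$workdir" -type f -name "$tool" | head -n 1)
        if [ -n "$found" ]; then
            cp "$found" "$dest/$tool" && chmod 755 "$dest/$tool" \
                || { rm -rf "$workdir"; return 1; }
        else
            echo "note: $tool not found in $archive" >&2
        fi
    done
    rm -rf "$workdir"
}

# Example (placeholder archive name; run as root if the destination is not writable):
#   install_foma_binaries ./foma.tar.gz /usr/local/bin
```

Tools that are missing from the archive are reported on standard error and skipped, so the same function works for archives that ship without cgflookup.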

Linux

Recent versions of Ubuntu, starting with 16.04 Xenial Xerus, include Foma in the repositories. It can easily be installed via sudo apt install foma-bin. For the C API, you will also need libfoma-dev.
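Whichever way you install, it is worth checking that the tools are actually reachable from your PATH. A minimal sketch, where check_foma_tools is a helper name invented for this post and not something that ships with Foma:

```shell
# Sketch: report which of the named tools are reachable through PATH.
# check_foma_tools is an illustrative helper name, not part of Foma.
check_foma_tools() {
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            echo "$tool: $location"
        else
            echo "$tool: not found on PATH"
            missing=$((missing + 1))
        fi
    done
    # Succeed only if every requested tool was found.
    [ "$missing" -eq 0 ]
}

# Example:
#   check_foma_tools foma flookup cgflookup
```

The function prints one line per tool and returns a non-zero status if any of them is missing, so it can also be used in a script.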

If you don’t have Ubuntu 16.04 or later (or another Linux distro that packages Foma), binaries are available online. Choose either the 64-bit or the 32-bit release. Download and unpack it using tar -xzvf (the f option must come last, because it takes the archive name as its argument). Then move the binaries foma, flookup, and cgflookup to a directory in your PATH (e.g., /usr/bin/).
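If you are not sure which release matches your machine, uname -m prints the machine hardware name. The sketch below maps the common x86 names to a release; release_bits is a made-up helper name, the list of names is not exhaustive, and it assumes the binaries are built for x86 hardware.

```shell
# Sketch: map a machine name (as printed by `uname -m`) to the 64- or 32-bit release.
# release_bits is an illustrative helper name; the list of machine names is not exhaustive.
release_bits() {
    case "$1" in
        x86_64|amd64) echo 64 ;;
        i386|i486|i586|i686) echo 32 ;;
        *) echo unknown ;;
    esac
}

# Print the answer for the current machine.
release_bits "$(uname -m)"
```

An answer of unknown means the machine is not one of the x86 variants listed, in which case building from source is probably the safer route.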

Source

The source code is available here. Download and unpack it, then run make && sudo make install. This will install the resulting binaries under /usr/local/. If you edit the parser or the lexer, you may additionally need flex and bison, which are widely used lexer and parser generators for C. If you’re on Windows, building from source may be a much more complicated process, and you should use the binaries if possible.