Category Archives: Uncategorized

fstintersect

fstintersect computes the intersection of two FSAs. An intersection is the same as in math: the intersection of A and B is the set of all inputs,outputs of FSA A that also occur in FSA B.

for example,

fstprint a.fst

a

0   1   1   1

1   2   2    2

1   3   3   3

2   4  0   0

3   4   0   0

4

fstprint b.fs

tb

0   1   1   1

1   2   2   2

2

fstintersect a.fst b.fst | fstprint

final

0   1   1   1

1   2    2   2

2  3    0   0

3

fstreverse

The operation fstreverse produces an fst that will contain all of the reverse strings of the input fst. If fst A transduces string x to y with weight a, then the reverse of A transduces the reverse of x to the reverse of y with weight a.Reverse(). Typically, a = a.Reverse().

========================
Initial automaton, L.fst
fstprint –osymbols=words.txt –isymbols=words.txt L.fst
0 1 a a
1 2 b b
1 3 c c
2 4 <eps> <eps>
3 4 <eps> <eps>
4

L
========================
Run fstreverse on L.fst to produce L2.fst
fstreverse L.fst L2.fst
fstprint –osymbols=words.txt –isymbols=words.txt L2.fst
0 5 <eps> <eps>
1
2 1 a a
3 2 b b
4 2 c c
5 3 <eps> <eps>
5 4 <eps> <eps>

L2
========================

Tokenization Example

This example is from here.
In the tokenization example, it uses the following commands:
fstdraw: Draw FST using symbol table files and Graphviz dot
fstunion: Compute the union of two fsts.
fstclosure: Compute the concatenative closure of a fst.
fstrmepsilon: Remove epsilon-transitions (when both the input and output label are an epsilon) from a transducer.
fstdeterminize: The result will be an equivalent FST that has the property that no state has two transitions with the same input label.
fstminimize: Perform the minimization of deterministic weighted automata and transducers.

First, we create three fst files using fstcompile:

fstcompile --isymbols=ascii.syms --osymbols=wotw.syms > Mars.fst << EOF
> 0 1 M Mars
> 1 2 a <epsilon>
> 2 3 r <epsilon>
> 3 4 s <epsilon>
> 4
> EOF

Mars
fstcompile --isymbols=ascii.syms --osymbols=wotw.syms > Martian.fst << EOF
> 0 1 M Martian
> 1 2 a <epsilon>
> 2 3 r <epsilon>
> 3 4 t <epsilon>
> 4 5 i <epsilon>
> 5 6 a <epsilon>
> 6 7 n <epsilon>
> 7
> EOF

Martian
fstcompile --isymbols=ascii.syms --osymbols=wotw.syms > man.fst << EOF
> 0 1 m man
> 1 2 a <epsilon>
> 2 3 n <epsilon>
> 3
> EOF
man

Then, run fstunion to compute the union of three fsts. And use fstclosure to compute the concatenative closure.
fstunion man.fst Mars.fst | fstunion - Martian.fst | fstclosure > lexicon.fst

lexicon

Simplify fst using fstrmepsilon, fstdeterminize, and fstminimize.
fstrmepsilon lexicon.fst | fstdeterminize | fstminimize > lexicon_opt.fst

Print final output: lexicon_opt.fst
fstprint --isymbols=ascii.syms --osymbols=wotw.syms lexicon_opt.fst | head

lexicon_opt

Fstprint and fstcompile

The command line program fstprint prints a machine in text format. If the arguments --isymbols or --osymbols are not included, labels are printed as numbers. With the symbol arguments, labels are printed in string form. Conversely fstcompile compiles a file in text format into binary. If the source file uses labels in text form, the symbol arguments should be supplied.

The demo uses the lexicon L.fst from resource management. Here is the
result of running /home/mr249/s5-mr249/demo/fstprint_fstcompile/demo.sh.

=========================
LANG=../../data/lang
echo $LANG
../../data/lang
=========================
Run fstprint without symbols.
fstprint $LANG/L.fst | head
0	1	0	0	0.693147182
0	1	1	0	0.693147182
1	2	5	1	0.693147182
1	1	5	1	0.693147182
1	2	29	2	0.693147182
1	1	29	2	0.693147182
1	3	74	3
1	15	74	4
1	21	10	5
1	26	26	6
=========================
Run fstprint with symbols.
fstprint --osymbols=$LANG/words.txt --isymbols=$LANG/phones.txt $LANG/L.fst | head
0	1			0.693147182
0	1	sil		0.693147182
1	2	sil_S	!SIL	0.693147182
1	1	sil_S	!SIL	0.693147182
1	2	ax_S	A	0.693147182
1	1	ax_S	A	0.693147182
1	3	ey_B	A42128
1	15	ey_B	AAW
1	21	ae_B	ABERDEEN
1	26	ax_B	ABOARD
=========================
The same, putting the result in L.fstx
fstprint --osymbols=$LANG/words.txt --isymbols=$LANG/phones.txt $LANG/L.fst > L.fstx
head L.fstx
0	1			0.693147182
0	1	sil		0.693147182
1	2	sil_S	!SIL	0.693147182
1	1	sil_S	!SIL	0.693147182
1	2	ax_S	A	0.693147182
1	1	ax_S	A	0.693147182
1	3	ey_B	A42128
1	15	ey_B	AAW
1	21	ae_B	ABERDEEN
1	26	ax_B	ABOARD
=========================
Compile L.fstx as L2.fst
fstcompile --osymbols=$LANG/words.txt --isymbols=$LANG/phones.txt $LANG/L.fstx L2.fst
Print L2.fst with symbols.
fstprint --osymbols=$LANG/words.txt --isymbols=$LANG/phones.txt L2.fst | head
0	1			0.693147182
0	1	sil		0.693147182
1	2	sil_S	!SIL	0.693147182
1	1	sil_S	!SIL	0.693147182
1	2	ax_S	A	0.693147182
1	1	ax_S	A	0.693147182
1	3	ey_B	A42128
1	4	ey_B	AAW
1	5	ae_B	ABERDEEN
1	6	ax_B	ABOARD

fstrelabel

fstrelabel relabels input and output symbols of a given fst. To use the command you provide a file containing numeric pairs.
for relabeling input symbols: fstrelabel -relabel_ipairs=isyms.txt filename.fst
for relabeling output symbols: fstrelabel -relabel_opairs=osyms.txt filename.fst

Given the following L.fst (using fstdraw):
fst

We can create a new text file containing numeric pairs representing relabeling. For example, the isyms.txt file used below looks like this:

3        1

Indicating inputs of 3 should be relabeled to inputs of 1. After running fstrelabel we have:
fst2

As you can see, the inputs of 3 in the original fst have been changed to inputs of 1. We can then run fstrelabel again to change the output symbols using osyms.txt which contains the same information as isyms.txt. The output produced is:
L3