Commit be22248
[src] Make word alignment optional (#4802)
* Remove unused variable.
* cudadecoder: Make word alignment optional.
For CTC models using word pieces or graphemes, there is not enough
positional information to use the word alignment.
I tried marking every unit as "singleton" in word_boundary.txt, but this
explodes the state space very, very often. See:
nvidia-riva/riva-asrlib-decoder#3
With the "_" character in CTC models predicting word pieces, we at the
very least know which word pieces begin a word and which ones are
either in the middle of the word or the end of a word, but the
algorithm would still need to be rewritten, especially since "blank"
is not a silence phoneme (it can appear between).
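
To illustrate the point about the "_" marker, here is a minimal Python sketch (not the Kaldi word-alignment algorithm, and not code from this commit) that groups a per-frame CTC label sequence into words using only the word-start prefix. The blank symbol name `<blk>`, the function `group_word_pieces`, and the frame-level input format are all assumptions made for the example. Note how a blank can fall inside a word, which is why blank cannot be treated as a silence phoneme.

```python
from typing import List, Tuple

BLANK = "<blk>"    # hypothetical name for the CTC blank symbol
WORD_START = "_"   # prefix marking a word piece that begins a word


def group_word_pieces(frames: List[str]) -> List[Tuple[str, int, int]]:
    """Collapse per-frame CTC labels into (word, start_frame, end_frame).

    end_frame is exclusive. Blanks are skipped, and a label repeated on
    consecutive frames counts as a single emission.
    """
    words: List[Tuple[str, int, int]] = []
    current = None          # word being assembled, or None
    start, end = 0, 0
    prev = BLANK
    for t, label in enumerate(frames):
        if label == BLANK:
            prev = label
            continue
        if label == prev:
            # Same piece held over several frames: only extend the end time.
            end = t + 1
            continue
        prev = label
        if label.startswith(WORD_START):
            # A word-initial piece closes the previous word, if any.
            if current is not None:
                words.append((current, start, end))
            current, start = label[len(WORD_START):], t
        elif current is None:
            # Sequence started with a non-initial piece; start a word anyway.
            current, start = label, t
        else:
            current += label
        end = t + 1
    if current is not None:
        words.append((current, start, end))
    return words
```

With frames `["<blk>", "_he", "_he", "llo", "<blk>", "_wor", "<blk>", "ld", "ld"]`, the blank at frame 6 sits inside the second word. The resulting boundaries are only as good as the CTC spike positions, so they should be read as rough estimates rather than true time stamps.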
I did look into using the lexicon-based word alignment. I don't have a
specific complaint about it, but I did get a weird error where it
couldn't create a final state at all in the output lattice, which
caused Connect() to output an empty lattice. This may be because I
wasn't quite sure how to handle the blank token. I treat it as its own
phoneme, because of limitations in TransitionInformation, but this
doesn't really make any sense.
Needless to say, while the CTM outputs of the cuda decoder will be
correct from a WER point of view, their time stamps won't be correct,
but they probably never were in the first place, for CTC models.
1 parent f6f4cca commit be22248
File tree
2 files changed (+8, -9 lines)
- src
  - cudadecoder
  - fstext
Diff (line contents not preserved):
- First file: original lines 81-87 replaced by new lines 81-88 (7 deletions, 8 additions).
- Second file: original lines 692 and 696 deleted (2 deletions).