Thank you for your inquiry.
The model we are distributing is a learning of the separated sound of the TAMAGO-03 microphone array. When used with different microphone arrays, at least the difference in the volume of the separated sound input when generating the feature amount causes a decrease in recognition performance. Please try to compare the volume of the separated sound with the original evaluation set of IJCAI-PRICAI and adjust the volume until the performance improves.
Also, if the microphone arrangement is significantly different from the TAMAGO-03 microphone array, the recognition performance may deteriorate because the tendency of distortion after separation is different. For best results, you need to create your own model by learn the separated sounds in your microphone array.
Since it is a language model learned with a large vocabulary, many words can be recognized unless it is a word such as jargon or slang, so a language model with a small vocabulary should not be necessary. If you really need to create your own language model, please use Kaldi’s tools to create your language model. You need to run mkgraph.sh with arguments to the directory containing the “final.mdl” file and the directory of your language model.
HARK support team.