Reply To: Kaldi results / training a new model

April 30, 2021 at 4:01 pm #2365

Moderator

Yes, the correct way is to adjust the MultiGain connected after AudioStreamFromMic. To check the amplitude of the separated sound, check the output of SaveWave PCM. The HARK Cookbook URL below is an example of a SaveWave PCM connection. The IJCAI-PRICAI network file differs in that the separated sound after noise suppression is output, but the connection is such that the separated sound is output as in “Connection example 2”.
https://www.hark.jp/document/hark-cookbook-en/subsec-Separation-002.html

Note: The IJCAI-PRICAI network file uses the separated sound file output by SaveWave PCM only for confirmation. When converting to features, the frequency domain is calculated as it is, so be aware that changing the gain parameter of Synthesize does not affect the features used in the recognition process.

Use a toolkit called Kaldi to train the model. Kaldi contains a number of sample recipes for learning models with the corpus.
https://www.kaldi-asr.org/
https://github.com/kaldi-asr/kaldi
We use a paid corpus, but there are also free corpus. Fortunately, there are many free English corpus.

Note that we are using MSLS features. It is not a general MFCC features. Therefore, we will propose two methods, so please select the one you want.
We have confirmed that using MSLS features is better than MFCC features, so we chose method 2, but the work cost is less if we choose 1.

1. Simply replace the MSLS Extraction contained in the HARK network file with the MFCC Extraction and match the feature dimensions to those used by the recipe.
2. Understand the features output by MSLS Extraction and make some patches to the recipes used for learning Kaldi.

Work required to prepare training data:
First, the impulse data is convoluted into the corpus data to be used, and the input data that simulates the microphone array coming from various sound source directions is prepared.

For method 1:
Next, input the data that simulates the input of the prepared microphone array to the network file, and use the separated sound file output by SaveWave PCM for learning.

For method 2:
Next, prepare a network file to obtain the features to be learned. Duplicate the network file you are currently using for speech recognition and make the following modifications: Use AudioStreamFromWAV instead of AudioStreamFromMic. At that time, don’t forget to set CONDITION for the EOF terminal. HARK also has a Save HTK Feature node for saving in HTK Feature format. The feature amount input to SpeechRecognition (SMN) Client is connected to SaveHTKFeature. At that time, set the parameter of SaveHTKFeature to USER.
In the created network file, data that simulates the input of the microphone array prepared first is input, and the output feature amount file is used for training.
Replace the features learned by Kaldi with those output by HARK. Also, match the number of dimensions in the config file related to the features of the learning recipe.
Supplementary explanation: Kaldi has a method of learning while converting PCM data to features, but please do not use Kaldi’s feature conversion function and directly read the HTK Feature format output by HARK for learning. By doing so, the patch that outputs the MSLS feature to Kaldi’s feature conversion program becomes unnecessary.

Best regards,
HARK support team.

Reply To: Kaldi results / training a new model

Forum User

Forum Search

Recent Replies