HARK connection with KALDI GUI


#1404
Riya Parth Dube
Participant

Hello,
I have a Kaldi GUI created with Qt Creator, and I want to connect the sound file pre-processed by HARK to that GUI for live decoding.
I have both the language and acoustic models for it. Please help me with how I can do this.
Thank you,
Riya.

#1406
lapus.er
Participant

Hi Riya,

There is documentation for integrating Kaldi with HARK here: https://www.hark.jp/document/3.0.0/hark-document-en/subsec-KaldiDecoder.html

Please let us know if there is a specific use case you want to achieve that is not in the documentation.

Cheers,
HARK Support Team

#1442
Riya Parth Dube
Participant

According to the document, I use an nnet3 model, but when I try to assign the path, it shows that the files are still not set on the path and says "permission denied". I am attaching a screenshot for reference.

#1448

Hi Riya-san,

I don't know the KALDI GUI, so I will suggest two solutions; please choose the one that suits your purpose.

1. Using a third-party external Kaldi decoder.
If an external Kaldi decoder expects a PCM stream, it can be obtained in the following way. HARK has a node called HarkDataStreamSender, which sends PCM data (and localization data if necessary) over a socket. Please refer to the HarkDataStreamSender documentation for usage.
In this case, you need to parse the PCM data coming from HARK and reformat it to match the input of the decoder you use. Many people use small scripts in Python or Node.js for this purpose, as in the sketch below.
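For illustration, here is a minimal sketch of such a script: a TCP server that waits for HarkDataStreamSender to connect and dumps whatever it receives to a file. The port number and the write-bytes-verbatim behavior are assumptions for the example; the actual packet layout (header fields, frame counts, channel interleaving) is described in the HarkDataStreamSender documentation and must be parsed before the PCM can be fed to a decoder.

```python
# Sketch: receive the HarkDataStreamSender byte stream over TCP.
# Assumption: HARK connects as a client to this listening port, and the
# port below matches the node's port parameter.
import socket

HOST = "0.0.0.0"  # listen on all interfaces
PORT = 5530       # placeholder; must match HarkDataStreamSender's port setting

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(1)
    print(f"Waiting for HARK on port {PORT} ...")
    conn, addr = server.accept()
    print(f"HARK connected from {addr}")
    with conn, open("hark_stream.bin", "wb") as out:
        while True:
            chunk = conn.recv(4096)
            if not chunk:     # HARK closed the connection
                break
            out.write(chunk)  # TODO: parse headers and extract PCM frames here
print("Raw stream saved to hark_stream.bin")
```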

2. Using the KaldiDecoder we provide.
HARK has a function to convert PCM data to MSLS (or MFCC) features, so it sends a stream of features to KaldiDecoder. The features are sent from HARK using the SpeechRecognitionClient node.

If you are not sure how to set the file paths given to KaldiDecoder, please refer to the following sample file.
Samples
HARK_recog_3.0.0_IROS2018_practice2.zip

The config file included in this sample uses relative paths. Because they are resolved relative to the directory you launch from, we recommend using absolute paths if you will execute from various places.
For example, referring to your screenshot, it would look like the attached file. Since the location of the acoustic model (final.mdl) could not be determined from your screenshot, that part is a dummy. You can rewrite it in the same way, so please try it.
If your acoustic model does not use iVectors, you need to remove the iVector configuration line from the attached config file.
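As a sketch, the rewrite might look like this. Only the iVector option line is taken from the sample's config; the absolute path is a placeholder for wherever you extracted the sample:

```
# kaldi_conf/online.conf (excerpt) -- relative path as shipped:
--ivector-extraction-config=kaldi_conf/ivector_extractor.conf

# Rewritten with an absolute path (placeholder location):
--ivector-extraction-config=/home/user/HARK_recog_3.0.0_IROS2018_practice2/kaldi_conf/ivector_extractor.conf
```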

Note: This sample uses our own MSLS features, so if you want to adapt this sample, you will need to replace the MSLSExtraction node with the MFCCExtraction node. The MFCC features generated by the MFCCExtraction node are compatible with those used by the HTK Toolkit, Kaldi, etc. Please match the number of dimensions and the presence or absence of the Delta and Acceleration coefficients.

Sincerely,
HARK Support Team

#1455
Riya Parth Dube
Participant

Hi, I looked through HARK_recog_3.0.0_IROS2018_practice2.zip, but I am unable to link my file path to KaldiDecoder even when I follow the same procedure.

#1459

All samples we provide on the download page have been confirmed to run before release. First of all, let's find the cause.

Step 1: After extracting the downloaded sample file, execute it according to the included readme_en.txt without changing any files.
If it does not work at this stage, the installation of HARK or KaldiDecoder has failed. Check the installation method and try the installation process again.

Step 2: Overwrite final.mdl under the kaldi_conf/chain_sample/tdnn_5b/ directory with the final.mdl of your acoustic model, and replace HCLG.fst, phones.txt, words.txt, and the files under the phones/ directory in kaldi_conf/chain_sample/tdnn_5b/graph_multi_a_tri5a_fsh_sw1_tg/ with the graph files of your language model.
You may see an error message when performing step "(1) Launch Kaldi" in readme_en.txt. The error message usually describes the cause of the crash. If you use iVectors, you need to replace the files under kaldi_conf/chain_sample/extractor with the iVectorExtractor generated when training your model. If you do not use iVectors, you need to delete the "--ivector-extraction-config=kaldi_conf/ivector_extractor.conf" line from kaldi_conf/online.conf. Furthermore, the context sizes written in kaldi_conf/conf/splice.conf may differ from those used when you trained your acoustic model; in that case they need to be modified (see the sketch below). These values are determined by your acoustic model's training settings.
If you keep getting an error message here, please provide a screenshot of it. If the error message disappears, KaldiDecoder has started successfully, and you can connect it with HARK by matching the features in the next step.
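For reference, a Kaldi-style splice.conf holds the context sizes as plain option lines, as in the sketch below; the values shown are placeholders and must match the values used when your acoustic model was trained.

```
# kaldi_conf/conf/splice.conf (sketch; placeholder values)
--left-context=3
--right-context=3
```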

Step 3: If you execute "(2) Execute HARK" in readme_en.txt as it is, KaldiDecoder will crash with an error message saying that the dimensions differ. If the splice and sub-sampling-factor settings are appropriate, you can cope by matching the dimensionality of the features and changing the feature type.
In the sample we provide, the number of feature dimensions is set to 40. Please change it to match the number of dimensions of the features used to train your acoustic model.
Note: This sample set includes two network files: practice2-2-2.n for online use, executed in real time with a microphone array, and practice2-2-2_offline.n for offline use (processing a WAV file recorded with a microphone array). Both are set up for the TAMAGO microphone array. We recommend that you first test with the offline version.

Step 4: If no error message is displayed but the recognition result is incorrect, check the following. We recommend the MSLS features that HARK can generate for speech recognition, but they are not common; a model created with the usual Kaldi procedure will use MFCC features. practice2-2-2.n and practice2-2-2_offline.n use the MSLSExtraction node. By changing the network's MSLSExtraction node to the MFCCExtraction node in HARK-Designer, it can connect correctly with an acoustic model trained on standard MFCC features.

Sincerely,
HARK Support Team
