Masayuki Takigahira

Forum Replies Created

Viewing 15 posts - 1 through 15 (of 59 total)
    in reply to: Automating HARK input with Batch Process #1497

    Thank you for your inquiry.

    First, create a network file for file input using AudioStreamFromWave . This is an ordinary network file that takes a single WAV file as input; please refer to the Cookbook and Samples for how to create one.
    Next, change the parameter type of the Constant node that supplies the file name from string to subnet_param , and change the parameter value to ARG1 or string:ARG1 . Please refer here for the meaning of ARG<number> .
    With this change, the file name can be given as a command-line argument, so you can execute the network as follows.

    If you want a network file named network.n to process a WAV file named input.wav:

    harkmw ./network.n ./input.wav

    After that, simply loop over the file names in a shell script as follows.

    If you have a file list:

    for f in $(cat filelist.txt); do
      harkmw ./network.n "${f}"
    done

    Of course, you can also do the following.

    for f in your_path/*.wav; do
      harkmw ./network.n "${f}"
    done

    Generally, you should also change the output file name; if you do not, the output will be overwritten on each iteration of the loop. Set the parameter that determines the output file name (or that affects it) for SaveWavePCM or Save...(your save node name)... in the same way as for the input file. The only difference is that it uses ARG2 , which refers to the second argument.
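    The same batch loop can also be driven from Python. This is only a sketch: it assumes your network reads the input file from ARG1 and the output file from ARG2 as described above, and the `derive_output` naming (`out_<name>.wav`) is just an illustrative convention.

```python
import glob
import subprocess
from pathlib import Path

def derive_output(wav_path: str) -> str:
    """Derive an output file name next to the input (illustrative naming)."""
    p = Path(wav_path)
    return str(p.with_name("out_" + p.name))

def run_network(network: str = "./network.n", wav_dir: str = "your_path") -> None:
    """Run the HARK network once per WAV file: ARG1 = input, ARG2 = output."""
    for wav in sorted(glob.glob(f"{wav_dir}/*.wav")):
        # Equivalent to: harkmw ./network.n input.wav out_input.wav
        subprocess.run(["harkmw", network, wav, derive_output(wav)], check=True)
```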

    Best regards,
    HARK support team

    in reply to: HARk gui error #1495


    Thank you for your inquiry.

    From the screenshot, it appears that the machine with HARK installed is separate from the machine you are working on, i.e., you are using a remote connection. If so, please try the following steps.

    When no options are given to the hark_designer command, it starts the HARK-Designer server and then launches a browser that connects as a client; that is, it is set up for use on the local machine.

    For a remote connection, only the HARK-Designer server should be started, so launch it as follows.

    hark_designer allowremote

    If port number 3000 is not available (for example, because another application is using it), you need to change the port number. To change it from the default of 3000, set the PORT environment variable as follows when starting hark_designer.

    If you want to set the port number to 4000:

    PORT=4000 hark_designer allowremote

    After the server is up, launch a browser (Firefox, Chrome, etc.) on the machine you are working on and connect to the machine where HARK is installed.

    For example, if the port number is 4000, connect to the server's IP address on port 4000.


    Best regards,
    HARK support team

    in reply to: HARK connection with KALDI GUI #1459

    All samples we provide on the download page are confirmed to run before release. First, let us isolate the cause.

    Step 1: After extracting the downloaded sample file, please execute it according to the included readme_en.txt without changing any files.
    If it does not work at this stage, the installation of HARK or KaldiDecoder has failed. Review the installation instructions and repeat the installation.

    Step 2: Overwrite final.mdl under the kaldi_conf/chain_sample/tdnn_5b/ directory with the final.mdl of your acoustic model, and replace HCLG.fst , phones.txt , word.txt , and the files under the phones/ directory in kaldi_conf/chain_sample/tdnn_5b/graph_multi_a_tri5a_fsh_sw1_tg/ with the graph files of your language model.
    You may see an error message when performing step “(1) Launch Kaldi” in readme_en.txt; the message usually describes the cause of the crash. If you use iVectors, you need to replace the files under kaldi_conf/chain_sample/extractor with the iVector extractor generated when training your model. If you do not use iVectors, delete the "--ivector-extraction-config=kaldi_conf/ivector_extractor.conf" line from kaldi_conf/online.conf . Furthermore, the splice context written in kaldi_conf/conf/splice.conf may differ from the one used when training your acoustic model; in that case it must be modified. All of these are determined by your acoustic-model training settings.
    If you still get error messages at this point, please provide a screenshot of them. Once the error messages disappear, KaldiDecoder has started successfully, and you can connect it to HARK by matching the features in the next step.

    Step 3: If you execute “(2) Execute HARK” in readme_en.txt as-is, KaldiDecoder will crash with an error message saying that the dimensions differ. If the splice and sub-sampling-factor settings are appropriate, you can cope by matching the feature dimension and changing the feature type.
    In our sample, the feature dimension is set to 40. Please change it to match the dimension of the features used to train your acoustic model.
    Note: This sample set includes two network files: practice2-2-2.n for online use (real-time processing with a microphone array) and practice2-2-2_offline.n for offline use (processing a WAV file recorded with a microphone array). Both are configured for the TAMAGO microphone array. We recommend testing with the offline version first.

    Step 4: If no error message is displayed but the recognition results are wrong, check the following. For speech recognition we recommend the MSLS features that HARK can generate, but they are not common; a model created with the usual Kaldi procedure uses MFCC features. practice2-2-2.n and practice2-2-2_offline.n use the MSLSExtraction node. By replacing the MSLSExtraction node with the MFCCExtraction node in HARK-Designer, the network can connect correctly to an acoustic model trained on ordinary MFCC features.

    HARK Support Team

    in reply to: HARK connection with KALDI GUI #1448

    Hi Riya-san,

    I don’t know the KALDI GUI, so let me suggest two solutions; I hope one of them suits your purpose.

    1. How to use a third-party external Kaldi decoder.
    If the external Kaldi decoder expects a PCM stream, you can obtain it in the following way.
    HARK has a node called HarkDataStreamSender, which sends PCM data (and, if necessary, localization data) over a socket.
    Please refer to the HarkDataStreamSender documentation for usage.
    In this case, you need to parse the PCM data coming from HARK and reformat it to match the input of the decoder you use. Many people use a small Python or node.js script for this purpose.
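    As an illustration, here is a minimal receiving-side sketch in Python. It only assumes that, after whatever header the HarkDataStreamSender protocol defines, the PCM samples arrive as little-endian 32-bit floats over TCP; the actual header layout must be taken from the HarkDataStreamSender documentation.

```python
import socket
import struct

def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a socket (TCP may split a message)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("stream closed before n bytes arrived")
        buf += chunk
    return buf

def floats_le(payload: bytes) -> list:
    """Decode a payload of little-endian float32 samples ('<' = little endian)."""
    count = len(payload) // 4
    return list(struct.unpack("<%df" % count, payload))
```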

    2. How to use the KaldiDecoder we provide.
    HARK has a function to convert PCM data to MSLS (or MFCC) features, so it sends a stream of features to KaldiDecoder. The features are sent from HARK using the SpeechRecognitionClient node.

    If you are not sure how to set the file path given to KaldiDecoder, please refer to the following sample file.

    The config file included in this sample uses relative paths. Since they are resolved relative to the directory you execute from, we recommend using absolute paths when executing from various places.
    For example, referring to your screenshot, it would look like the attached file. Since the location of the acoustic model (final.mdl) was not visible in your screenshot, that part is a dummy. You can rewrite it in the same way, so please try it.
    If your acoustic model does not use iVectors, you need to remove the ivector configuration line from the attached config file.

    Note: This sample uses our own MSLS features, so if you want to rewrite this sample and use it, you will need to replace the MSLSExtraction node with the MFCCExtraction node. The MFCC features generated by the MFCCExtraction node are compatible with those used by HTK Toolkit, Kaldi, etc. Please match the number of dimensions and the presence or absence of Delta and Accelerate.

    HARK Support Team

    in reply to: Transfer function measurement with RASP-ZX #1315


    1) It does not matter which option you choose when first creating the microphone coordinate file. For a linear arrangement, “grid” is probably best for the microphone coordinates. For the source coordinates, as described in the answer to “2.” below, “circle” or “sphere” is usually a good choice. Set the number of points so that it is at least the number of microphones.
    2) Return to the Top screen and edit the generated microphone coordinate file. Pressing the “Visualize” button draws the current arrangement on the right and shows the coordinate list on the left. Pressing the “-” button in the circle to the right of the list deletes the coordinates on that row. (Note: if you delete an id in the middle of the list, press the “reset id” button when you finish editing so that the ids become sequential again.)
    3) After returning to the Top screen once more, we recommend saving the coordinate file with the “Download” button.



    4) By “downloadable transfer functions”, do you mean the ones available from the URL below? If so, please check the item “Other Information (Common for TSP recording and geometric calculation transfer function)”. The discrepancy is most likely because the configuration described there (a circular arrangement of 72 directions at 5-degree intervals, at an elevation of 16.7 degrees from the center of the microphone array) differs from the settings you used when creating yours. What you describe in your question “2.” appears to be the configuration from the transfer-function creation manual; please note that it is only the example used in that manual’s explanation.



    Since you mentioned that you have already seen this post, I will answer only with the information needed for the multichannel case.

    Set the number of channels to send in the nch field of hark_msgs/HarkWave . For multichannel PCM data, wavedata holds each channel’s samples; store the channels in src in channel order and the data can be sent.




    for (int k = 0; k < nb_channels; k++) {
      for (int i = 0; i < length; i++) {
        (*output)(k, i) = cbf->deque_wave[0].src[k].wavedata[i];
      }
    }



    I forgot to mention data_bytes : note that it is the size for all channels combined.


    in reply to: Transfer functions for the NAO robot #1249


    > If transfer function files are published somewhere, I would also appreciate it if you could tell me about them.

    *1) Aldebaran Robotics was apparently renamed SoftBank Robotics in May 2016.

    The transfer function files used by HARK can be created with HARKTOOL5. For the HARKTOOL5 documentation, please refer to the following URLs. (English Ver.) (Japanese Ver.)


    *2) The transfer function is computed in HARKTOOL5 from the microphone arrangement and the source arrangement you provide. If your microphone layout cannot be generated automatically, match only the number of microphones and then correct the individual microphone coordinates with Edit; this lets you specify an arbitrary microphone arrangement.


    > I would like to ask for your know-how on creating transfer functions for NAO.



    in reply to: Estimating the distance to a sound source #1239


    > I would appreciate it if you could tell me what modifications would make this possible.

    As described in the HARK 2.1.0 release history at , the code already supports 3D localization/separation.


    • NewFeatures
    • Released HarkTool5. HarkTool5 is a platform-independent, web-based transfer-function generation tool. It has the same features as the conventional HarkTool4, with added 3D support.
    • Released libharkio3. libharkio3 was designed to unify the file I/O formats and to consolidate the matrix-operation code, etc.

    In LocalizeMUSIC from HARK 1.2.0 through HARK 2.0.x, three-dimensional coordinates were supported, but the peak search over the MUSIC spectrum was performed one-dimensionally. From HARK 2.1.0 onward, the transfer-function format was redesigned on top of libharkio3 so that it can hold neighborhood information, and the peak search now looks for local maxima in 3D space based on that neighborhood information. The range regarded as a neighborhood can be changed in the HARKTOOL5 settings when the transfer function is created.


    in reply to: Error: Bad WAVE header #1135

    Thank you for your inquiry.

    The WAV file formats supported by the AudioStreamFromWave node are as follows:
    – signed 16-bit / 24-bit PCM
    – WAV-Ex headers are not supported.
    Please see this page for details.

    Your WAV file could not be loaded because its format is:
    – IEEE float 32-bit PCM
    Unfortunately, this format is not supported by the AudioStreamFromWave node.

    To correct a file that AudioStreamFromWave could not read, follow these steps:

    Step.1: Please open the WAV file in Audacity. This is the same state as immediately after recording.
    Step.2: Please choose “File => Export => Export Audio…” to display the Export Audio dialog.
    Step.3: Please select “Other uncompressed files” as the file type.
    Step.4: Please select “WAV (Microsoft)” for the “Header:” item. Do not choose “WAVEX (Microsoft)” here.
    Step.5: Please select “Signed 24-bit PCM” for the “Encoding:” item; the TAMAGO-03 has 24-bit resolution. For other 16-bit microphone arrays, select “Signed 16-bit PCM”.
    Step.6: Please click the “Save” button.

    Best regards,
    HARK support team

    in reply to: Notification of access failure (August 23, 2019 13:00 JST) #1132

    We have confirmed that service has returned to normal.

    HARK support team

    in reply to: Notification of access failure (August 23, 2019 13:00 JST) #1128

    You can check the latest status of AWS from the “Asia Pacific” tab at .

    in reply to: Recording audio stream from ROS – data type error #1122

    You need to use hark_msgs/HarkWave in your workspace.
    In your case, audio_common_msgs/AudioData seems to store MP3 data in a uint8[] array, so you will first need to decode it to raw PCM data.

    Second, the data structure of hark_msgs/HarkWave is as follows.

    user@ubuntu:~$ rosmsg show hark_msgs/HarkWave
    std_msgs/Header header
      uint32 seq
      time stamp
      string frame_id
    int32 count
    int32 nch
    int32 length
    int32 data_bytes
    hark_msgs/HarkWaveVal[] src
      float32[] wavedata

    wavedata is a raw PCM data array. Since HARK is not aware of the bit depth, integer sample values are simply cast to a floating-point type; in other words, 123 becomes 123.0f .

    data_bytes is the data size, i.e. the size of a float (4 bytes) multiplied by the size of wavedata .

    length is the number of samples per frame handled by HARK; the default is 512 . Since HARK processes frame by frame, the total size of wavedata must be nch times length=512 .

    nch is the number of channels. Your device seems to be 1-channel, so it should be 1 ; for microphone-array data a larger number is stored.

    count is the frame count. Since HARK processes frame by frame, it must know which frame the data belongs to; count is incremented as the frames advance. The first frame number is 0 , not 1 .
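    The field bookkeeping above can be sketched in plain Python. This is not the ROS message itself, just the arithmetic for a hypothetical single-channel stream (a dictionary stands in for hark_msgs/HarkWave , and the sample overlap discussed next is omitted for simplicity):

```python
def make_hark_frames(samples, length=512):
    """Split a mono PCM sample sequence into HarkWave-style frames.

    Each frame carries `length` float32 samples; `count` starts at 0 and
    increments per frame; `data_bytes` is the float size (4 bytes) times
    the number of samples over all channels.
    """
    frames = []
    for count in range(len(samples) // length):
        # Integer samples are simply cast to float: 123 -> 123.0
        wavedata = [float(s) for s in samples[count * length:(count + 1) * length]]
        frames.append({
            "count": count,                # first frame is 0, not 1
            "nch": 1,                      # 1 for a single-channel device
            "length": length,              # samples per frame (HARK default: 512)
            "data_bytes": 4 * 1 * length,  # float32 bytes over all channels
            "wavedata": wavedata,
        })
    return frames
```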

    One final note: to avoid problems in FFT/IFFT processing and the like, the frames processed by HARK use overlapping samples.
    The following image may help you understand.

    Best regards,

    in reply to: How to receive data from HarkDataStreamSender #1121

    Sample code is not provided on the official website, but the specification and source code are available; I hope you find them helpful.
    There is just one point to note: the data is transmitted in little endian, not in the so-called network byte order.
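    A quick way to see the difference, using Python's struct module ( "<" forces little endian, "!" forces network byte order):

```python
import struct

value = 1
little = struct.pack("<i", value)   # little endian, as HarkDataStreamSender sends
network = struct.pack("!i", value)  # network byte order (big endian)

assert little == b"\x01\x00\x00\x00"
assert network == b"\x00\x00\x00\x01"

# Therefore decode received fields with "<" format strings,
# e.g. struct.unpack("<i", little), not with ntohl()-style conversion.
```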

    Please refer to the next url for specifications.

    The source code can be obtained with the following command on Ubuntu.
    apt source hark-core

    On Windows, it is probably easiest to download the source from the following URL.
    You can download any version by clicking the corresponding file name in the browser. The <version> has the form “x.x.x”.

    Best regards,

    in reply to: Can wios work on Windows10? #1116

    The Windows version of wios supports only the network-recording function offered by the RASP-24 and similar devices. In other words, wios can be used only when recording over the network, with a USB-LAN or USB-wireless dongle inserted into the USB port of the RASP-ZX.

    Currently, when the device is connected directly with a USB cable, you can only create WAV files with a third-party recording tool such as Audacity. HARKTOOL5 can create a transfer function with a Complex Regression Model, which does not require synchronized recording.

    If you need TSP WAV files created by synchronized recording, the following workaround also exists.

    Create an Ubuntu virtual machine on VMware/VirtualBox on Windows and connect the RASP-ZX to the virtual machine. You can then record with ALSA.

    Best regards,


    Since I noticed a mistake, I deleted the post below immediately after posting it. wios supports neither WASAPI nor DirectSound (DS) at this time; only the RASP protocol, which the standard Windows APIs cannot handle, is supported.

    The RASP-24 connects via LAN; its recording data is transmitted over the network using the SiF original protocol and recorded through the SiF original interface.

    USB Audio Class (UAC) devices connected via USB, such as the TAMAGO-03, are recorded through the WASAPI or DirectSound (DS) interface.

    The RASP-ZX supports two connection methods.
    If it is connected directly to a PC with a USB cable, recording uses the WASAPI or DS interface. On the other hand, if a USB-LAN or USB-wireless dongle is inserted into its USB port and you connect via the network, it uses the SiF original protocol, the same as the RASP-24.

    You need to choose the wios command according to the connection method you use.

    HARK has supported WASAPI since version 3.0, but wios support for WASAPI is not yet complete, so DirectSound (DS) is used. The effect of this difference should be negligible.

    Best regards,

    > In my environment, the only model I can prepare in the near term is an nnet1 model,
    > so I would like to get it working with nnet1 somehow.

    > From what you have explained, it seems that for nnet3 the Kaldi model would have to be
    > rebuilt from scratch; is the same true for nnet1?

    The network files included in HARK_recog_2.3.0.1_practice2 use a different feature dimension, so they must be modified. A sample HARK network file that generates 40-dimensional features is included in . Note that the part that generates MSLS features with the MSLSExtraction node must be replaced with the MFCCExtraction node. Set the parameters of the MFCCExtraction node so that it produces 40 dimensions, just like MSLSExtraction , and swap it in.


