Masayuki Takigahira

Forum Replies Created

Viewing 12 posts - 46 through 57 (of 57 total)
    in reply to: About generating the noise information file #666

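    Thank you for your inquiry.

    NOISEi.dat not being generated is the specified behavior from HARK 2.1 onward and is not an error.
    More precisely, the file format has changed to zip, and both the real part and the imaginary part are stored in a single zip file. Please change the file name setting from NOISEr.dat to something like NOISE.zip.

    We apologize for the confusion caused by the two node_Constant_1 nodes in the cookbook.
    Your understanding is correct: 200 goes on the Iterator sheet side, and the noise input file name goes on the Main sheet side.

    We are also sorry that descriptions of older versions remain in the cookbook.
    We are revising it little by little.
    We have given priority to updating the node reference in HARK-Document and the FAQ, and both already cover this topic.
    The relevant sections are listed below.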

    From the node reference in HARK-Document, the CMSave section:
    https://www.hark.jp/document/hark-document-ja/subsec-CMSave.html

    From the FAQ, the entry “NOISEr.dat is generated, but NOISEi.dat is not”:
    https://wp.hark.jp/faq/#While_trying_to_create_noise_file_for_suppressing_noise_the_real_part_NOISErdat_is_being_created_but_the_imaginary_part_NOISEidat_is_never_created_Why_is_this_so

    The FAQ is available only in English. We apologize for the inconvenience and suggest using Google Translate or a similar tool.

    Best regards,

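    Thank you for your inquiry.

    The HarkDataStreamSender node sends binary data over a TCP/IP socket. Because the data is binary, you must not apply toString, UTF-8 conversion, or similar operations to it.
    The data structure is described from “Details of data transmission” onward at the following URL (the HarkDataStreamSender section of the node reference in HARK-Document).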
    https://www.hark.jp/document/hark-document-ja/subsec-HarkDataStreamSender.html
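
    The warning about string conversion can be checked with a few bytes. The sketch below is only an illustration with made-up sample bytes; it shows that decoding binary data as UTF-8 and encoding it again does not give back the original bytes, while joining received chunks as Buffer objects does.

```javascript
// Eight sample bytes: the values 4 and 160 as little-endian 32-bit integers.
const original = Buffer.from([0x04, 0x00, 0x00, 0x00, 0xa0, 0x00, 0x00, 0x00]);

// Wrong: 0xa0 on its own is not valid UTF-8, so decoding replaces it
// with U+FFFD and the original byte is lost.
const viaString = Buffer.from(original.toString('utf8'), 'utf8');

// Right: keep every received chunk as a Buffer and join them with Buffer.concat.
const chunks = [original.subarray(0, 3), original.subarray(3)];
const joined = Buffer.concat(chunks);

console.log(viaString.equals(original)); // false
console.log(joined.equals(original));    // true
console.log(joined.readInt32LE(4));      // 160
```

    In a node.js socket handler this means not calling setEncoding on the socket, so that the 'data' events deliver Buffer objects.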

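    The information in that reference is based on C/C++ types, so here is a brief outline of how to handle the data in node.js. A node.js Number cannot represent every 64-bit integer exactly (older versions need an additional module for that). However, given the structure below, if you need time information but not high-precision time information, you can compute it from the frame count (10 ms per frame when sampling=16kHz and advance=160).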

    
    const buf = Buffer.from([0x04, 0x00, 0x00, 0x00, 0xa0, 0x00, 0x00, 0x00 /* ... received data ... */]);
    
    // ---
    const hdh_type    = buf.readInt32LE(0);  // HD_Header (type)    : 0x00000004 means SRC_INFO
    const hdh_advance = buf.readInt32LE(4);  // HD_Header (advance) : 0x000000a0 means ADVANCE=160
    // 64-bit little-endian fields: the low 32 bits come first, then the high 32 bits.
    // A Number is exact only up to 2 ** 53; use buf.readBigInt64LE() for the full 64-bit range.
    let hdh_sec       = buf.readInt32LE(12);
    hdh_sec          *= (2 ** 32);
    hdh_sec          += buf.readUInt32LE(8);  // HD_Header (tv_sec)  : seconds since 1970/01/01 00:00:00
    let hdh_usec      = buf.readInt32LE(20);
    hdh_usec         *= (2 ** 32);
    hdh_usec         += buf.readUInt32LE(16); // HD_Header (tv_usec) : microseconds (sub-second part) of the same timestamp
    // ---
    const srcs        = buf.readInt32LE(24); // Sources             : number of sources (count of HDH_SrcInfo below)
    // ---
    const src_id = [], src_x = [], src_y = [], src_z = [], src_pow = [];
    for (let i = 0; i < srcs; i++) {
    
      src_id[i]   = buf.readInt32LE(28+i*20+0);  // HDH_SrcInfo (src_id) : id of source i
      src_x[i]    = buf.readFloatLE(28+i*20+4);  // HDH_SrcInfo (x[0])   : x coordinate of source i
      src_y[i]    = buf.readFloatLE(28+i*20+8);  // HDH_SrcInfo (x[1])   : y coordinate of source i
      src_z[i]    = buf.readFloatLE(28+i*20+12); // HDH_SrcInfo (x[2])   : z coordinate of source i
      src_pow[i]  = buf.readFloatLE(28+i*20+16); // HDH_SrcInfo (power)  : MUSIC power of source i
    
    }
    // ---
    

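    Data like the above arrives every frame. The number of sources (srcs) can be 0 when no source has been localized. In that case the per-source structures read inside the loop (HDH_SrcInfo) are not sent.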
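
    The layout above can also be wrapped in a small function. The following is a minimal sketch, not a definitive implementation: it assumes that only SRC_INFO is being sent, that the offsets are exactly those of the snippet above (28 header bytes, 20 bytes per source), and that node.js 12 or later is available for readBigInt64LE. The name parseSrcInfo and the sample values are made up for illustration.

```javascript
const HEADER_BYTES = 28; // type, advance, tv_sec, tv_usec, number of sources
const SRC_BYTES = 20;    // src_id, x[0], x[1], x[2], power

// Returns the parsed message, or null if buf does not yet hold a complete one.
function parseSrcInfo(buf) {
  if (buf.length < HEADER_BYTES) return null;
  const srcs = buf.readInt32LE(24);
  const total = HEADER_BYTES + srcs * SRC_BYTES;
  if (buf.length < total) return null;

  const sources = [];
  for (let i = 0; i < srcs; i++) {
    const base = HEADER_BYTES + i * SRC_BYTES;
    sources.push({
      id: buf.readInt32LE(base),
      x: buf.readFloatLE(base + 4),
      y: buf.readFloatLE(base + 8),
      z: buf.readFloatLE(base + 12),
      power: buf.readFloatLE(base + 16),
    });
  }
  return {
    type: buf.readInt32LE(0),
    advance: buf.readInt32LE(4),
    sec: buf.readBigInt64LE(8),   // BigInt, keeps the full 64-bit value
    usec: buf.readBigInt64LE(16),
    sources,
    bytesUsed: total,             // where the next message starts
  };
}

// Build a made-up message with one source to try the parser.
const msg = Buffer.alloc(HEADER_BYTES + SRC_BYTES);
msg.writeInt32LE(4, 0);
msg.writeInt32LE(160, 4);
msg.writeBigInt64LE(1500000000n, 8);
msg.writeBigInt64LE(250000n, 16);
msg.writeInt32LE(1, 24);
msg.writeInt32LE(7, 28);
msg.writeFloatLE(0.5, 32);
msg.writeFloatLE(-0.5, 36);
msg.writeFloatLE(0, 40);
msg.writeFloatLE(30.25, 44);

const parsed = parseSrcInfo(msg);
console.log(parsed.sources.length, parsed.sources[0].id, parsed.sec);
```

    Returning null for an incomplete buffer lets the caller keep appending received chunks until a whole message is available.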

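    If you convert from Cartesian coordinates to polar coordinates, please refer to the description of the coordinate system at the following URL.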
    https://www.hark.jp/document/hark-document-ja/sect0030.html
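
    As a rough illustration of such a conversion, the sketch below uses the usual mathematical convention: azimuth measured in the x-y plane from the +x axis toward the +y axis, and elevation measured from the x-y plane toward +z. That convention is an assumption here, so please check it against the coordinate system described on the page above before relying on it. The function name toPolar is made up.

```javascript
// Convert Cartesian (x, y, z) to distance, azimuth and elevation in degrees.
function toPolar(x, y, z) {
  const toDeg = 180 / Math.PI;
  return {
    r: Math.hypot(x, y, z),
    azimuthDeg: Math.atan2(y, x) * toDeg,
    elevationDeg: Math.atan2(z, Math.hypot(x, y)) * toDeg,
  };
}

// A source on the +y axis, and one raised 45 degrees above the +x axis.
console.log(toPolar(0, 1, 0));
console.log(toPolar(1, 0, 1));
```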

    I hope this helps.

    in reply to: Sound Source Localization with suppression of constant noise #448

    I identified the cause by checking your network file (PreMeasuredNoiseSSL.n).
    The problem is simple: the Gener_Pre_Measured_Noise_SSL sheet must be set to “iterator”, but it is currently set to “subnet”.
    HARK processes an input stream repeatedly, frame by frame.
    Therefore, the user-created sheet connected to the InputStream node in the MAIN sheet should generally be an iterator.
    The sheet can have any name, but as in the cookbook example, we often name the repeatedly processed sheet “LOOP”.

    How to change:
    Left-click the tab of the target sheet to select it, then right-click and choose “Change to iterator” from the pop-up menu.

    When creating a new sheet:
    When you click the “+” button, you can change the “Type” parameter from “subnet” to “iterator”.

    Best regards,
    m.takigahira

    in reply to: Sound Source Localization with suppression of constant noise #442

    Please try the following.
    Run the sample network file I posted last time and check whether an error occurs.

    If an error occurs, the HARK installation itself may have failed. First, try reinstalling HARK. If you have no reason to stay on the current version, please consider updating to the latest version. If an error occurs during installation, please provide detailed error information so that we can help you.

    If it runs without error, the noise correlation matrix file named cm.zip will be saved. In that case, there are two things you can do. The first is to edit the network file I posted. The second is to post your network file to this thread so that we can take a look at it.

    Notes:
    I confirmed that the sample network file “sep_rec_offline_output_cm_test.n” I posted last time also runs under HARK 2.2.0. Since it does not use the new functions, you can probably run it with HARK 2.1.0 or later.

    The reason for downloading the HARK 2.3.0 recognition sample set is that it is an easy way to obtain a matching input WAV file and transfer function file. Note that the recognition scripts for Kaldi are supported only on HARK 2.3.0 or higher, so they will not work on the version you currently have.

    Best regards,
    m.takigahira

    in reply to: Sound Source Localization with suppression of constant noise #438

    Reference information

    You can try the attached sample network file.

    
    wget http://www.hark.jp/networks/HARK_recog_2.3.0.1_practice2.zip
    unzip HARK_recog_2.3.0.1_practice2.zip
    cd HARK_recog_2.3.0.1_practice2
    cp <your download path>/sep_rec_offline_output_cm_test.n ./
    batchflow ./sep_rec_offline_output_cm_test.n ./2SPK-jp.wav
    

    Sorry, this sample audio file is in Japanese.

    Best regards,
    m.takigahira

    in reply to: Sound Source Localization with suppression of constant noise #437

    I suspect the cause is as follows.
    IterCount is a node that outputs the frame count.
    In the cookbook, the Constant node is set to 200 frames. One frame corresponds to 10 ms, so 200 frames is 2 seconds. (With ADVANCE set to 160 samples at 16 kHz sampling, one frame is 10 ms.)
    In other words, the input data must be longer than 2 seconds.
    You can either lower the frame count to match the length of the input file,
    or lengthen the input file to match the frame count.
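
    The arithmetic behind the 2 seconds can be written out as a one-line helper. This is just a worked example of the numbers quoted above; the function name framesToSeconds is made up.

```javascript
// Duration covered by a number of frames: frames * ADVANCE / sampling rate.
function framesToSeconds(frames, advance = 160, samplingRate = 16000) {
  return (frames * advance) / samplingRate;
}

console.log(framesToSeconds(1));   // 0.01 (one frame is 10 ms)
console.log(framesToSeconds(200)); // 2
```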

    The detailed behavior of CMMakerFromFFTwithFlag is described in the HARK document. The HTML version can be found at the following URL.
    http://www.hark.jp/document/hark-document-en/subsec-CMMakerFromFFTwithFlag.html

    Also, although it does not affect the calculation result, some outdated descriptions may remain in the cookbook manual.
    For example, since the format saved by CMSave has changed to zip, it is preferable to use zip as the file name extension.
    http://www.hark.jp/document/hark-document-en/subsec-.html

    Best regards,
    m.takigahira

    in reply to: Sound Source Localization with suppression of constant noise #430

    >> It is the name of my input.wav?
    Yes.

    If the type is set to string and a file name is entered directly in the text box, the specified file is processed.
    On the other hand, if you set the type to subnet_param and enter the string ARG1 in the text box, you can specify the file at startup.
    This has the same effect as $1 in a shell script.
    Likewise, when you write ARG2, the value given as the second argument at run time is used.
    This is often used when you want to apply the same HARK processing to various input data.
    This method can be used on many other nodes, but sometimes the type must be stated explicitly.
    In that case, put the type before ARG, as in string:ARG1 or float:ARG3.
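
    To make the $1 analogy concrete, here is a small sketch of the substitution idea. It is a hypothetical helper written only for illustration; it is not part of HARK and does not show how HARK implements subnet_param internally.

```javascript
// Hypothetical helper for illustration only; it is not part of HARK.
// value: the text entered in the parameter box, e.g. "ARG1" or "float:ARG3"
// args:  the command-line arguments that follow the network file name
function resolveParam(value, args) {
  const match = /^(?:(\w+):)?ARG(\d+)$/.exec(value);
  if (!match) return value;               // plain value, used as entered
  const raw = args[Number(match[2]) - 1]; // ARG1 is the first argument, like $1
  return match[1] === 'float' ? parseFloat(raw) : raw;
}

const args = ['./your_input.wav', 'unused', '0.5'];
console.log(resolveParam('ARG1', args));       // ./your_input.wav
console.log(resolveParam('float:ARG3', args)); // 0.5
console.log(resolveParam('fixed.wav', args));  // fixed.wav
```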

    Best regards,
    m.takigahira

    in reply to: KaldiDecoder (v2.4.0) returns no recognition results #428

    This is not a minimum configuration, but here is the PC environment on which I have confirmed that it works.

    OS : Ubuntu 16.04.03
    CPU : Intel Core i7-7700@3.60GHz [Turbo Boost:4.2GHz], 4 cores(8 threads)
    Mem : 64GB

    Best regards,
    Takigahira

    in reply to: KaldiDecoder (v2.4.0) returns no recognition results #424

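    Thank you for your inquiry.

    We tested under similar conditions in our environment
    and confirmed that the same issue occurs when the load is high.

    We plan to fix this in the next version,
    but since that release will take some time,
    here is a workaround.

    We have confirmed that replacing HARK's feature-sending node
    SpeechRecognitionSMNClient with SpeechRecognitionClient
    yields recognition results even on the same PC.

    To perform SMN processing, SpeechRecognitionSMNClient
    first buffers the features of the utterance section given by the localization result,
    and then sends them all at once after SMN processing.
    SpeechRecognitionClient, in contrast, sends the features
    frame by frame, which spreads out the load on KaldiDecoder.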
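
    The reason buffering is needed can be seen from the normalization itself. The sketch below assumes that SMN stands for spectral mean normalization computed over one utterance; it is a simplified illustration with made-up feature values, not HARK's implementation. Because the mean depends on every frame of the utterance, no normalized frame can be produced until the utterance has ended, so all frames become available in one burst.

```javascript
// Subtract, per dimension, the mean of that dimension over all frames
// of one buffered utterance.
function spectralMeanNormalize(frames) {
  const dims = frames[0].length;
  const mean = new Array(dims).fill(0);
  for (const frame of frames) {
    for (let d = 0; d < dims; d++) mean[d] += frame[d];
  }
  for (let d = 0; d < dims; d++) mean[d] /= frames.length;
  return frames.map((frame) => frame.map((v, d) => v - mean[d]));
}

const utterance = [[1, 10], [3, 30]]; // two made-up 2-dimensional feature frames
console.log(spectralMeanNormalize(utterance)); // [ [ -1, -10 ], [ 1, 10 ] ]
```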

    Until the next version is released,
    please use this workaround.
    We apologize for the inconvenience.

    Best regards,
    Takigahira

    in reply to: Sound Source Localization with suppression of constant noise #419

    Please set the VALUE parameter of the Constant node in MAIN as follows.
    – Set type to subnet_param.
    – Enter ARG1 in the text box.
    When you run the network file from a terminal, you can pass the input WAV file by giving its file name as the first argument.

    e.g.) ./network.n ./your_input.wav

    Please leave all parameters of the InputStream node in MAIN unset.

    Best regards,
    m.takigahira

    in reply to: Thresh parameter in SourceTracker node #418

    SourceTracker’s Thresh is a parameter used to judge whether the MUSIC spectrum power
    output by the preceding node, such as LocalizeMUSIC, exceeds the threshold value.
    It is difficult to assign a physical meaning to the value of the MUSIC spectrum itself.
    Therefore, it has no unit.

    For the MUSIC spectrum power when LocalizeMUSIC is connected, please refer to formulas (15) and (16) in the following document.
    http://www.hark.jp/document/hark-document-en/subsec-LocalizeMUSIC.html

    The appropriate Thresh depends on the user’s environment, so there is no recommended value,
    but you can estimate a suitable value with the following method.
    1. If you set the DEBUG parameter to true on the LocalizeMUSIC node, the MUSIC spectrum power for each direction included in the transfer function is written to stdout.
    2. In this output, the column whose value rises while you are speaking holds the MUSIC spectrum power in the sound source direction.
    3. Compute the average of the values during the speaking periods and the average of the values during the silent periods.
    4. Use the midpoint of the two averages from step 3 as the initial value of SourceTracker’s Thresh parameter, then fine-tune it.
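
    Steps 3 and 4 amount to a short calculation. The sketch below assumes you have already copied the MUSIC spectrum power values for the source direction out of the debug output by hand; the numbers and the function names are made up for illustration.

```javascript
// Arithmetic mean of a list of numbers.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Midpoint between the average power while speaking and while silent,
// used only as a starting point for further fine-tuning.
function initialThresh(speechPowers, silencePowers) {
  return (mean(speechPowers) + mean(silencePowers)) / 2;
}

const speaking = [30, 32, 34]; // made-up values read during speech
const silent = [20, 22, 24];   // made-up values read during silence
console.log(initialThresh(speaking, silent)); // 27
```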

    Best regards,
    m.takigahira

    in reply to: C++ code of the nodes #413

    – Case.1
    If you are using Ubuntu, you can easily get the source code as follows.

    apt-get source <package-name>

    e.g.) To get the source code of the HARK basic packages and HARK-Python:
    apt-get source harkfd hark-sss hark-python

    – Case.2
    If you need a version older than the currently released one,
    or
    if you are using another Linux distribution (e.g. Debian, Mint, Fedora, CentOS, Red Hat, Vine, etc.),
    you can download the source code in tar.gz or tar.xz format from the following location.
    However, we do not officially support distributions other than Ubuntu.
    If you use an OS other than Ubuntu, you may need to change configure.ac and/or Makefile.am.

    http://archive.hark.jp/harkrepos/dists/<code-name>/non-free/source/

    e.g.) For Xenial, Trusty, and Precise:
    http://archive.hark.jp/harkrepos/dists/xenial/non-free/source/
    http://archive.hark.jp/harkrepos/dists/trusty/non-free/source/
    http://archive.hark.jp/harkrepos/dists/precise/non-free/source/

    If you need an even older version, please download it from the bottom of this page.
    http://www.hark.jp/wiki.cgi?page=Softwares

    Best regards,
    m.takigahira
