HARK FORUM: network to separate sources

This topic contains 17 replies, has 2 voices, and was last updated by kohira 1 month, 2 weeks ago.

Viewing 15 posts - 1 through 15 (of 18 total)
    #1160
    kohira
    Participant

    Dear Sirs,

    I would like to set up a network to separate sources whose positions are fixed relative to the microphone array. I have already created a transfer function for this setup, named tr.zip, using HARKTOOL5-GUI on Ubuntu.

    Do I need to connect a node to the INPUT_SOURCES terminal of GHDSS?
    If so, which node, and why is it needed when everything is fixed?

    Or, is there any sample network file for that?

    Thank you.

    #1161
    lapus.er
    Participant

    Hi,

    There is documentation for HARK Transfer Function usage here:
    https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html

    Section 4.2, "Evaluating Separation Transfer Functions", describes how to create a network for sound separation with a GHDSS node. Perhaps the steps provided in that section of the documentation can help you achieve what you want.

    If you have further questions about the steps enumerated in the HARK Transfer Function documentation, or anything regarding sound separation in general, please don’t hesitate to post them here.

    Cheers,
    HARK Support Team

    #1163
    kohira
    Participant

    Hi,
    For source separation under the condition where everything is fixed except the speakers’ mouths, I think it could be achieved with a single node that outputs fixed source directions, instead of the chain of nodes (LocalizeMUSIC, SourceTracker, … and Delay) shown in Figure 6.56.
    But I have not been able to find such a node yet.
    Could you please tell me which node to use, if you know it?
    Thank you.

    #1164
    lapus.er
    Participant

    Hi Kohira,

    I am not sure which figure (Figure 6.56) you are referring to. Is it in this document: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html?

    If it is not in the link above, can you please post the link to the documentation that contains Figure 6.56? It will help me understand your question better if I can see the diagram.

    Cheers,
    HARK Support Team

    #1165
    kohira
    Participant

    Oh, sorry for the big mistake!
    The figure is in “HARK Document Version 3.0.0 (Revision: 9272)”:
    https://www.hark.jp/document/hark-document-en/subsec-GHDSS.html
    Thank you.

    #1166
    lapus.er
    Participant

    Hi,

    Perhaps you can try using the ConstantLocalization node with the GHDSS node. ConstantLocalization lets you output constant sound source localization results for multiple sound sources by explicitly specifying their angles and elevations.

    An example of how to use this node with GHDSS is explained here: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html. See Figure 45 in Section 4.2.1.

    You can also read a more detailed documentation of the node here: https://www.hark.jp/document/3.0.0/hark-document-en/subsec-ConstantLocalization.html

    Please let us know if this works for you.
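    As a side note, the fixed angle and elevation values for ConstantLocalization can be derived from the known 3D positions of the mouths relative to the array. A minimal sketch (the coordinate convention assumed here — azimuth in the x-y plane measured in degrees from the +x axis, elevation up from that plane — and the example coordinates are our own assumptions, so check them against your array's convention):

    ```python
    import math

    def source_direction(x, y, z):
        """Azimuth and elevation (degrees) of a fixed source at (x, y, z) metres,
        relative to the microphone-array origin.  Assumed convention: azimuth in
        the x-y plane from the +x axis, elevation up from that plane."""
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
        return azimuth, elevation

    # e.g. a mouth 1 m ahead, 1 m to the left, 0.3 m above the array centre:
    # source_direction(1.0, 1.0, 0.3)  ->  (45.0, ~12.0)
    ```

    The resulting pairs would then be entered as the fixed direction parameters of ConstantLocalization, one per speaker.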

    Cheers,
    HARK Support Team

    #1171
    kohira
    Participant

    The network in Figure 45 is what I needed!
    I am trying the network now and will report the result either way.
    Thank you.

    #1172
    kohira
    Participant

    Hello,
    I ran an experiment and got the results below.
    1. I obtained two streams for the two fixed-position sources.
    2. Each stream has rather large cross-talk, but is still better than the signal obtained by simply adding the two source waves.
    I used a TR created by calculation (8 microphones, 2 fixed sources).

    Could anybody tell me why the TR created by calculation performs worse than the one obtained by measurement?

    Or, are there any sample wave files obtained by HARK’s source separation that have little cross-talk for two or three sources?

    Thank you.

    #1174
    lapus.er
    Participant

    Hi,

    Before I can help you, I need to clarify: What is “TR”?

    Cheers,
    HARK Support Team

    #1175
    kohira
    Participant

    Hi,

    It is “Transfer Function.”
    I created it by calculation for eight microphones and two sources.

    Thank you.

    #1176
    lapus.er
    Participant

    Hi kohira,

    >> Could anybody tell me why the TR created by calculation performs worse than the one obtained by measurement?

    We already indicated in our documentation that Geometric-Calculation-based Transfer Function Generation may give lower quality than Measurement-based Transfer Function Generation, depending on the environment. The reason is that the calculation does not take into account the effects of possible obstructions in the environment during the recording.

    See the documentation: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html#_Toc518480855 (see Figure 14).

    So if the environment is not a “Free Space” environment, the performance of a TR generated using geometric calculation will generally be lower than that of a TR generated using measurement.
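    To make the difference concrete: under the free-space assumption, the geometric calculation models essentially only the direct path from the source to each microphone m, something like (a simplified sketch of the usual free-field model; the exact scaling HARKTOOL5 uses may differ):

    ```latex
    H_m(\omega) \approx \frac{1}{r_m}\, e^{-j\,\omega\, r_m / c}
    ```

    where r_m is the source-to-microphone distance and c is the speed of sound. A measured TR additionally captures room reflections and diffraction around nearby obstacles, which is why it tends to separate better in ordinary rooms.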

    >> Or, are there any sample wave files obtained by HARK’s source separation that have little cross-talk for two or three sources?
    I will ask the team whether there are any available sample wave files that you can use.

    Cheers,
    HARK Support Team

    #1177
    kohira
    Participant

    Hi,

    My interest is whether the separated voice could be used for speech recognition under the conditions below.

    1. In a small room.
    2. The 3D position of each mouth may move by several inches during speech.
    3. Someone may raise a hand during speech, which affects the transfer function.

    I would greatly appreciate a sample of the best separated voice available.
    In other words, I would like to know the maximum performance of HARK.

    Thank you.

    #1178
    lapus.er
    Participant

    Hi kohira,

    There are sample files available for download in this link:
    https://www.hark.jp/download/samples/

    You might be interested in checking out the following files:
    HARK_recog_3.0.0_practice2.zip
    HARK_recog_3.0.0_IROS2018_practice2.zip

    Please let me know if these are not the files that you are looking for.

    Best Regards,
    HARK Support Team

    #1183
    kohira
    Participant

    Hi,

    I’ve tried HARK_recog_3.0.0_practice2.zip and obtained 12 wav files and a kaldi_out.txt containing 12 sentences. The accuracy is 80% to 100% for each sentence.

    However, the wav file signals are flat, i.e., they contain no voice or sound.

    Could you please tell me how to fix this, or give me any hint?

    Thank you.
    kohira

    #1184
    lapus.er
    Participant

    Hi kohira,

    Can you please upload the exact files that you are using, except for the ones already included in HARK_recog_3.0.0_practice2.zip? I will also need to know the exact steps that you performed to generate the files.

    I know that what I am asking is a bit tedious on your part, but these items are necessary for us to identify the issue that you raised.
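    In the meantime, a quick way to confirm whether the generated wav files are truly all-zero (a minimal sketch using only Python’s standard library; the filename “sep_0.wav” is just an illustrative stand-in for one of your separated files):

    ```python
    import struct
    import wave

    def peak_amplitude(path):
        """Return the largest absolute sample value in a 16-bit PCM wav file
        (0 means the file is completely silent)."""
        with wave.open(path, "rb") as w:
            if w.getsampwidth() != 2:
                raise ValueError("expected 16-bit PCM")
            frames = w.readframes(w.getnframes())
        samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
        return max((abs(s) for s in samples), default=0)

    # Illustrative usage (the filename is hypothetical):
    # print(peak_amplitude("sep_0.wav"))
    ```

    If the peak is 0, the separation output really was written as silence rather than merely being quiet.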

    Regards,
    HARK Support Team
