HARK FORUM › network to separate sources
September 3, 2019 at 11:19 am #1160
Dear Sirs,
I would like to set up a network to separate sources, where the position of each source is fixed relative to the mic array. I have already made a transfer function for this using HARKTOOL5-GUI on Ubuntu; the file is named tr.zip.
Should I connect any node to INPUT_SOURCES of GHDSS?
If so, why is that needed even though everything is fixed, and which node should it be? Or is there a sample network file for this case?
Thank you.
September 4, 2019 at 3:30 pm #1161
Hi,
There is documentation for HARK Transfer Function usage here:
https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html
Section 4.2, Evaluating Separation Transfer Functions, describes how to create a network for sound separation with a GHDSS node. Perhaps the steps provided in that section of the documentation can help you achieve what you want.
If you have further questions about the steps enumerated in the HARK Transfer Function documentation, or anything regarding sound separation in general, please don’t hesitate to post them here.
Cheers,
HARK Support Team
September 5, 2019 at 10:49 am #1163
Hi,
For source separation under the condition where everything is fixed except the mouths, I think this could be achieved with a single node that outputs fixed source directions, instead of the set of nodes (LocalizeMUSIC, SourceTracker … and Delay) shown in Figure 6.56.
But I have not been able to find such a node yet.
Could you please tell me which node it is, if you know?
Thank you.
September 5, 2019 at 3:10 pm #1164
Hi Kohira,
I am not sure which figure (Figure 6.56) you are referring to. Is it in this document: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html?
If it is not in the link above, can you please post the link of the documentation which has Figure 6.56? It will help me understand your question better if I can see the diagram.
Cheers,
HARK Support Team
September 5, 2019 at 5:19 pm #1165
Oh, sorry for my big mistake!
That was “HARK Document Version 3.0.0 (Revision: 9272).”
https://www.hark.jp/document/hark-document-en/subsec-GHDSS.html
Thank you.
September 6, 2019 at 11:09 am #1166
Hi,
Perhaps you can try using the ConstantLocalization node with the GHDSS node. ConstantLocalization lets you output constant sound source localization results for multiple sound sources by explicitly specifying their angles and elevations.
An example of how to use this node with GHDSS is explained here: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html. See figure 45 in Section 4.2.1.
You can also read a more detailed documentation of the node here: https://www.hark.jp/document/3.0.0/hark-document-en/subsec-ConstantLocalization.html
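As a rough illustration only (this is plain Python, not HARK code, and the 30/-30 degree angles are placeholder values), the node conceptually emits the same fixed source directions on every frame; a fixed azimuth/elevation pair corresponds to a unit direction vector like this:

```python
import math

def direction_vector(azimuth_deg: float, elevation_deg: float):
    """Convert a fixed azimuth/elevation (degrees) to a unit direction vector."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),  # x: toward the front of the array
            math.cos(el) * math.sin(az),  # y: toward the left of the array
            math.sin(el))                 # z: up

# Two fixed talkers at +30 and -30 degrees azimuth, 0 degrees elevation
# (placeholder angles; use the geometry of your own setup).
fixed_sources = [direction_vector(30.0, 0.0), direction_vector(-30.0, 0.0)]
print(fixed_sources)
```

Since your sources never move, these directions only need to be entered once in the node’s parameters.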
Please let us know if this works for you.
Cheers,
HARK Support Team
September 9, 2019 at 6:45 pm #1171
The network in Figure 45 is what I needed!
I am trying the network now and will report the result either way.
Thank you.
September 17, 2019 at 11:00 am #1172
Hello,
I ran an experiment and obtained the results below.
1. I got two output streams for the two sources, whose locations are fixed.
2. Each stream has rather large cross talk, but it is still better than what I get by simply adding the two source waves (a rough way to quantify this is sketched below).
I used a TR created by calculation (MIC x 8, fixed source x 2). Could anybody tell me why a TR created by calculation performs worse than one obtained by measurement?
Or, are there any sample wave files produced by HARK’s source separation that have little cross talk for two or three sources?
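For reference, here is a rough sketch of how the cross talk could be quantified. It is not part of HARK; the file names sep1.wav, src1.wav, and src2.wav are placeholders, and it assumes mono wav files of roughly equal length.

```python
import numpy as np
from scipy.io import wavfile

def read_mono(path):
    """Load a wav file as a float array, mixing down to mono if needed."""
    rate, data = wavfile.read(path)
    data = data.astype(np.float64)
    if data.ndim > 1:
        data = data.mean(axis=1)
    return rate, data

def crosstalk_db(separated, target, interferer):
    """Energy (dB) of the interferer leaking into a separated channel,
    relative to the target energy; lower (more negative) is better."""
    n = min(len(separated), len(target), len(interferer))
    s, t, i = separated[:n], target[:n], interferer[:n]
    target_part = np.dot(s, t) / (np.dot(t, t) + 1e-12) * t
    leak_part = np.dot(s, i) / (np.dot(i, i) + 1e-12) * i
    return 10 * np.log10((np.sum(leak_part ** 2) + 1e-12) /
                         (np.sum(target_part ** 2) + 1e-12))

_, sep1 = read_mono("sep1.wav")   # separated stream for source 1
_, src1 = read_mono("src1.wav")   # clean wave of source 1
_, src2 = read_mono("src2.wav")   # clean wave of source 2
print("cross talk in channel 1: %.1f dB" % crosstalk_db(sep1, src1, src2))
```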
Thank you.
September 25, 2019 at 3:35 pm #1174
Hi,
Before I can help you, I need to clarify: What is “TR”?
Cheers,
HARK Support Team
September 25, 2019 at 5:33 pm #1175
Hi,
It is “Transfer Function.”
I created it by calculation for eight microphones and two sources.
Thank you.
September 26, 2019 at 3:48 pm #1176
Hi kohira,
>> Could anybody tell me why a TR created by calculation performs worse than one obtained by measurement?
We already indicated in our documentation that geometric-calculation-based transfer function generation may give lower quality than measurement-based transfer function generation, depending on the environment. The reason is that the calculation does not take into account the effects of possible obstructions in the environment during recording.
See the documentation: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html#_Toc518480855 (see Figure 14).
So if the environment is not a “Free Space” environment, then the performance of a TR generated using geometric calculation will surely be lower than that of a TR generated using measurement.
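To make the difference concrete, here is a minimal numpy sketch of the idea behind a geometric (free-field) transfer function: for each microphone, only the direct-path delay and 1/r attenuation from the assumed source position are modelled. This is only an illustration of the principle, not the HARKTOOL5 implementation, and the 8-mic circular geometry is a placeholder. Reflections, diffraction around the array body, or a raised hand simply do not appear in such a model, whereas a measured TR captures them automatically.

```python
import numpy as np

C = 343.0  # speed of sound [m/s]

def free_field_tf(mic_positions, source_position, freqs):
    """Free-field (geometric) transfer function: direct path only.

    mic_positions: (M, 3) microphone coordinates in metres
    source_position: (3,) source coordinates in metres
    freqs: (F,) analysis frequencies in Hz
    returns: (M, F) complex transfer function
    """
    mic_positions = np.asarray(mic_positions, dtype=float)
    dists = np.linalg.norm(mic_positions - np.asarray(source_position, dtype=float), axis=1)
    delays = dists / C                     # direct-path delay per microphone
    gains = 1.0 / np.maximum(dists, 1e-6)  # spherical 1/r attenuation
    # No reflections, diffraction, or obstructions: this is the whole model.
    return gains[:, None] * np.exp(-2j * np.pi * np.outer(delays, freqs))

# Example: 8 mics on a 10 cm radius circle, one source 1 m in front (placeholder geometry).
angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
mics = np.stack([0.1 * np.cos(angles), 0.1 * np.sin(angles), np.zeros(8)], axis=1)
H = free_field_tf(mics, [1.0, 0.0, 0.0], np.linspace(0.0, 8000.0, 257))
print(H.shape)  # (8, 257)
```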
>> Or, are there any sample wave files produced by HARK’s source separation that have little cross talk for two or three sources?
I will ask the team whether there are any available sample wave files that you can use.
Cheers,
HARK Support Team
September 26, 2019 at 5:25 pm #1177
Hi,
What I would like to know is whether the separated voice can be used for speech recognition under the conditions below.
1. In a small room
2. The 3D position of the mouth may move by several inches during speech.
3. Someone may raise their hands during speech, which affects the transfer function.
I would really appreciate it if I could get a sample of the best separated voice available.
Or, I would like to know the maximum performance of HARK.
Thank you.
September 27, 2019 at 12:02 pm #1178
Hi kohira,
There are sample files available for download in this link:
https://www.hark.jp/download/samples/
You might be interested in checking out the following files:
HARK_recog_3.0.0_practice2.zip
HARK_recog_3.0.0_IROS2018_practice2.zip
Please let me know if these are not the files that you are looking for.
Best Regards,
HARK Support Team
October 2, 2019 at 4:41 pm #1183
Hi,
I’ve tried HARK_recog_3.0.0_practice2.zip and got 12 wav files and a kaldi_out.txt containing 12 sentences. The recognition accuracy is 80% to 100% for each sentence.
But the wav file signals are flat, i.e., they contain no voice or sound.
Could you please tell me how to fix it, or any hint?
Thank you.
kohira
October 4, 2019 at 2:47 pm #1184
Hi kohira,
Can you please upload the exact files that you are using (except for the ones already found in HARK_recog_3.0.0_practice2.zip)? I will also need to know the exact steps that you performed to generate the files.
I know that what I am asking is a bit tedious on your part, but the items I am requesting are necessary in order for us to identify the problem/issue that you raised.
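In the meantime, a quick way to confirm whether the generated wav files are really silent is to check their peak amplitude. This is just a generic scipy sketch (adjust the glob pattern to wherever your network writes its output files):

```python
import glob
import numpy as np
from scipy.io import wavfile

for path in sorted(glob.glob("*.wav")):
    rate, data = wavfile.read(path)
    peak = np.max(np.abs(data.astype(np.int64)))
    # A peak of 0 (or very close to it) means the file contains only silence.
    print(f"{path}: rate={rate} Hz, samples={data.shape[0]}, peak={peak}")
```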
Regards,
HARK Support Team