network to separate sources

HARK FORUM network to separate sources

Viewing 15 posts - 1 through 15 (of 18 total)
  • #1160
    kohira
    Participant

      Dear Sirs,

      I would like to set up a network to separate sources, where the position of each source is fixed relative to the mic array. I have created a transfer function for this using HARKTOOL5-GUI on Ubuntu; the file is named tr.zip.

      Should I connect a node to INPUT_SOURCES of GHDSS?
      If so, which node, and why is it needed even though everything is fixed?

      Or is there a sample network file for this?

      Thank you.

      #1161
      lapus.er
      Participant

        Hi,

        There is documentation for HARK Transfer Function usage here:
        https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html

        In section 4.2., Evaluating Separation Transfer Functions, it describes how to create a network for sound separation with a GHDSS node. Perhaps the steps provided in that section of the documentation can help you achieve what you want.

        If you have further questions with the steps enumerated in the HARK Transfer Function documentation or anything regarding sound separation in general, please don’t hesitate to post it here.

        Cheers,
        HARK Support Team

        #1163
        kohira
        Participant

          Hi,
          For source separation where everything is fixed except the speakers' mouths, I think it could be achieved with a single node that outputs fixed source directions, instead of the set of nodes LocalizeMUSIC, SourceTracker … and Delay in Figure 6.56.
          But I have not been able to find such a node yet.
          Could you please tell me which node it is, if you know?
          Thank you.

          #1164
          lapus.er
          Participant

            Hi Kohira,

            I am not sure which figure (Figure 6.56) you are referring to. Is it in this document: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html?

            If it is not in the link above, can you please post a link to the documentation that contains Figure 6.56? It will help me understand your question better if I can see the diagram.

            Cheers,
            HARK Support Team

            #1165
            kohira
            Participant

              Oh, sorry for the big mistake!
              The figure is in “HARK Document Version 3.0.0. (Revision: 9272).”
              https://www.hark.jp/document/hark-document-en/subsec-GHDSS.html
              Thank you.

              #1166
              lapus.er
              Participant

                Hi,

                Perhaps you can try using the ConstantLocalization node with the GHDSS node. ConstantLocalization allows you to output constant sound source localization results from multiple sound sources by explicitly specifying the angles or elevations.

                An example of how to use this node with GHDSS is explained here: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html. See Figure 45 in Section 4.2.1.

                You can also read more detailed documentation for the node here: https://www.hark.jp/document/3.0.0/hark-document-en/subsec-ConstantLocalization.html

                Please let us know if this works for you.

                Cheers,
                HARK Support Team
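
Editor's sketch: the idea of feeding fixed directions to the separation stage can be illustrated outside HARK. The Python below is not HARK code and does not use any HARK API. It only shows how a list of fixed azimuth/elevation pairs could be turned into the same set of source directions for every frame. The function names, the dictionary layout, the coordinate convention (x front, y left, z up) and the example angles of 30 and -30 degrees are all assumptions made for illustration.

```python
import math


def direction_to_unit_vector(azimuth_deg, elevation_deg=0.0):
    """Convert an azimuth/elevation pair in degrees to a 3-D unit vector.

    Assumed convention (check it against your own array's coordinates):
    x points to the front, y to the left, z up, and azimuth is measured
    counter-clockwise from the x axis.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def constant_sources(azimuths_deg, elevations_deg, min_id=0):
    """Build a list of fixed sources (hypothetical helper, not a HARK call)."""
    if len(azimuths_deg) != len(elevations_deg):
        raise ValueError("need one elevation per azimuth")
    return [
        {"id": min_id + i, "direction": direction_to_unit_vector(az, el)}
        for i, (az, el) in enumerate(zip(azimuths_deg, elevations_deg))
    ]


# Two fixed talkers at made-up directions; the same list would be reused
# for every audio frame because nothing moves.
sources = constant_sources([30.0, -30.0], [0.0, 0.0])
```

Because the list never changes, no localization or tracking stage is needed to produce it, which is the role the reply above assigns to ConstantLocalization.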

                #1171
                kohira
                Participant

                  The network in Figure 45 is what I needed!
                  I am trying the network now and will report the result either way.
                  Thank you.

                  #1172
                  kohira
                  Participant

                    Hello,
                    I ran an experiment and got the results below.
                    1. I obtained two streams for the two sources, whose locations are fixed.
                    2. Each stream has rather large crosstalk, but is better than the signal obtained by simply adding the two source waveforms.
                    I used a TR created by calculation (8 microphones, 2 fixed sources).

                    Could anybody tell me why the TR created by calculation performs worse than one created by measurement?

                    Or are there any sample wave files obtained by HARK’s source separation that have little crosstalk for two or three sources?

                    Thank you.

                    #1174
                    lapus.er
                    Participant

                      Hi,

                      Before I can help you, I need to clarify: What is “TR”?

                      Cheers,
                      HARK Support Team

                      #1175
                      kohira
                      Participant

                        Hi,

                        It is “Transfer Function.”
                        I created it by calculation for eight microphones and two sources.

                        Thank you.

                        #1176
                        lapus.er
                        Participant

                          Hi kohira,

                          >> Could any body tell me how the TR created by calculation makes it worse than the one by measurement?

                          We already indicated in our documentation that Geometric-Calculation-based Transfer Function Generation may give lower quality than Measurement-based Transfer Function Generation, depending on the environment. The reason is that the calculation does not take into account the effects of possible obstructions in the environment during the recording.

                          See documentation: https://www.hark.jp/document/tf/generating_transfer_functions/Generating_a_Transfer_Function_Using_HARKTOOL5.html#_Toc518480855 (see figure 14).

                          So if the environment is not a “Free Space” environment, then the performance of the TR generated using Geometric-Calculation will surely be lower than that of a TR generated using Measurement.

                          >> Or, is there any sample wave files obtained by HARK’s source separation, which have little cross talk for two or three sources?
                          I will ask the team if there are any available sample wave files that you can use.

                          Cheers,
                          HARK Support Team
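
Editor's sketch: to make the "Free Space" limitation concrete, the Python below computes a textbook free-field transfer function for a point source, one complex gain per microphone at a single frequency. It is not how HARKTOOL5 is implemented (the thread does not say), and the array geometry, source position, speed of sound and function name are assumptions chosen for illustration. The point is that only distances enter the formula, so walls, bodies and raised hands cannot be represented.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def free_field_transfer_function(mic_positions, source_position, freq_hz):
    """Return one complex gain per microphone for a point source in free space.

    Each gain is a pure propagation delay with 1/distance attenuation.
    Reflections and obstacles are ignored entirely.
    """
    gains = []
    for mic in mic_positions:
        dist = math.dist(mic, source_position)
        delay = dist / SPEED_OF_SOUND
        gains.append(cmath.exp(-2j * math.pi * freq_hz * delay) / dist)
    return gains


# Hypothetical geometry: 8 mics on a circle of radius 0.1 m in the
# horizontal plane, with a source 1 m in front of the array centre.
mics = [(0.1 * math.cos(2 * math.pi * k / 8),
         0.1 * math.sin(2 * math.pi * k / 8),
         0.0) for k in range(8)]
tf_1khz = free_field_transfer_function(mics, (1.0, 0.0, 0.0), 1000.0)
```

A measured transfer function would instead capture whatever the room and the array housing actually do to the sound, which is why it can differ from this idealized result.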

                          #1177
                          kohira
                          Participant

                            Hi,

                            I would like to know whether the separated voice could be used for speech recognition under the conditions below.

                            1. The recording takes place in a small room.
                            2. The 3D position of the mouth would change by several inches during speech.
                            3. Someone might raise their hands during speech, which affects the transfer function.

                            I would very much appreciate a sample of the best separated voice you can provide.
                            Alternatively, I would like to know the maximum performance of HARK.

                            Thank you.

                            #1178
                            lapus.er
                            Participant

                              Hi kohira,

                              There are sample files available for download in this link:
                              https://www.hark.jp/download/samples/

                              You might be interested in checking out the following files:
                              HARK_recog_3.0.0_practice2.zip
                              HARK_recog_3.0.0_IROS2018_practice2.zip

                              Please let me know if these are not the files that you are looking for.

                              Best Regards,
                              HARK Support Team

                              #1183
                              kohira
                              Participant

                                Hi,

                                I tried HARK_recog_3.0.0_practice2.zip and got 12 wav files and a kaldi_out.txt containing 12 sentences. The accuracy is 80% to 100% for each sentence.

                                But the signals in the wav files are flat, i.e., there is no voice or sound.

                                Could you please tell me how to fix this, or give me a hint?

                                Thank you.
                                kohira
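
Editor's sketch: one way to confirm that a separated file is really flat, rather than just very quiet in a viewer, is to read its peak sample value. The Python below uses only the standard library and assumes 16-bit PCM WAV files; the thread does not state the sample format of the generated files, so that is an assumption, and the function names are made up for illustration.

```python
import array
import sys
import wave


def peak_amplitude(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return max((abs(s) for s in samples), default=0)


def is_silent(path, threshold=0):
    """True if no sample exceeds the threshold (0 means exactly flat)."""
    return peak_amplitude(path) <= threshold
```

Calling `peak_amplitude` on each of the 12 output files would show whether they contain all zeros or a low-level signal that only looks flat when plotted at full scale.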

                                #1184
                                lapus.er
                                Participant

                                  Hi kohira,

                                  Can you please upload the exact files that you are using, except for the ones already found in HARK_recog_3.0.0_practice2.zip? I will also need to know the exact steps that you performed to generate the files.

                                  I know that what I am asking is a bit tedious on your part, but the items I am requesting are necessary in order for us to identify the problem/issue that you raised.

                                  Regards,
                                  HARK Support Team
