I am developing a silence detection application, and I am trying to understand the formula for AudioStreamFromMic given in the Cookbook:
Recording time = (LENGTH + (number of frames - 1) * ADVANCE)/SAMPLING RATE
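To make sure I am reading the formula correctly, here is a minimal sketch of how I understand it. The function name `recording_time` is my own, and the 16000 Hz sampling rate is an assumption on my part, not something the formula itself fixes.

```python
def recording_time(num_frames, length=512, advance=160, sampling_rate=16000):
    """Seconds of audio covered by num_frames overlapping frames.

    The first frame contributes `length` samples; every later frame
    adds only `advance` new samples because it overlaps the previous one.
    sampling_rate=16000 is an assumed value for illustration.
    """
    if num_frames < 1:
        return 0.0
    total_samples = length + (num_frames - 1) * advance
    return total_samples / sampling_rate


print(recording_time(1))    # 512 / 16000 = 0.032
print(recording_time(100))  # (512 + 99 * 160) / 16000 = 1.022
```

If this reading is right, 100 frames cover about 1.022 seconds rather than 100 * 512 / 16000 = 3.2 seconds, because consecutive frames overlap.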
1. Why are LENGTH and ADVANCE set to 512 and 160, respectively? Can they be set arbitrarily, or is there some basis for the defaults?
2. Is there any source material for the equation? I have been looking for it in DSP books and online materials to understand the meaning of the numbers.
3. I am also exploring PyAudio and other Python audio libraries to replicate the audio recording. There, the CHUNK / FRAME size is normally set to 1024. How does that differ from the 512 / 160 values?
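For question 3, this is my current understanding of the difference, written as a small sketch that may well be wrong. The helper `split_into_frames` is hypothetical and is not a HARK or PyAudio function. With LENGTH 512 and ADVANCE 160, frames overlap; with a chunk size of 1024 and no overlap, the advance equals the length.

```python
def split_into_frames(samples, length=512, advance=160):
    """Cut a sample sequence into frames of `length` samples,
    starting a new frame every `advance` samples.
    Trailing samples that do not fill a whole frame are dropped."""
    frames = []
    start = 0
    while start + length <= len(samples):
        frames.append(samples[start:start + length])
        start += advance
    return frames


samples = list(range(16352))  # dummy data standing in for audio samples

# Overlapping frames: each frame shares 512 - 160 = 352 samples with the previous one.
overlapping = split_into_frames(samples, length=512, advance=160)

# Non-overlapping chunks, as in a typical CHUNK = 1024 read loop.
chunks = split_into_frames(samples, length=1024, advance=1024)

print(len(overlapping))  # 100
print(len(chunks))       # 15
```

Assuming a 16000 Hz sampling rate, 160 samples correspond to 10 ms and 512 samples to 32 ms, which is why I suspect the defaults are not arbitrary.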
My goal is to record audio only when there is sound (i.e., when the power / RMS value exceeds a certain threshold) and to stop recording when the power / RMS value drops below that threshold. Any advice on how to do this in HARK?
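To show the behaviour I am after, here is a rough sketch in plain Python. It does not use any HARK nodes, and the names `rms` and `gate_frames` as well as the threshold value are my own placeholders.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def gate_frames(frames, threshold):
    """Return (start, end) frame-index pairs, end exclusive, for each
    run of consecutive frames whose RMS is above `threshold`."""
    segments = []
    start = None
    for i, frame in enumerate(frames):
        loud = rms(frame) > threshold
        if loud and start is None:
            start = i                      # sound begins: start recording
        elif not loud and start is not None:
            segments.append((start, i))    # sound ends: stop recording
            start = None
    if start is not None:                  # still recording at end of input
        segments.append((start, len(frames)))
    return segments


quiet = [0.0] * 160
loud = [0.5, -0.5] * 80                    # RMS of 0.5
frames = [quiet, quiet, loud, loud, loud, quiet, loud]
print(gate_frames(frames, threshold=0.1))  # [(2, 5), (6, 7)]
```

In practice I would probably also want a short hangover period so that brief dips below the threshold do not split one utterance into several recordings, but the sketch leaves that out.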