description | Using a camera pointed at a spectrogram as an intermediate stage of processing audio to allow interaction with the sound. |
last change | Wed, 13 Nov 2024 20:33:45 +0000 (20:33 +0000) |
William Greenwood - Contact: greenwoodw50 [at] gmail [dot] com
Audio over STFT uses a camera and a display to allow interaction with a sound. Sound is converted to a spectrogram and displayed on a screen. It can then be captured by a suitable capture device and converted back into sound. The spectrogram itself does not discard any information, but the overall process is very slightly lossy because 16-bit integers are compressed into unsigned 8-bit integers.
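The 16-bit to 8-bit step can be sketched as below. This is a minimal illustration, not the project's code: the function names `to_uint8` and `to_int16` are hypothetical, and it assumes a simple linear mapping that keeps the top 8 bits of each value.

```python
def to_uint8(sample16: int) -> int:
    """Map a signed 16-bit value (-32768..32767) to an unsigned 8-bit value (0..255)."""
    # Shift into the unsigned range, then keep only the top 8 bits.
    return (sample16 + 32768) >> 8


def to_int16(value8: int) -> int:
    """Map an 8-bit value back to the centre of the 16-bit range it covers."""
    return (value8 << 8) - 32768 + 128


samples = [-32768, -1, 0, 1000, 32767]
recovered = [to_int16(to_uint8(s)) for s in samples]

# Each 8-bit value covers 256 neighbouring 16-bit values, so the
# round trip can be off by at most half of that range.
max_error = max(abs(a - b) for a, b in zip(samples, recovered))
print(recovered, max_error)
```

The error is bounded by half a quantisation step, which is why the loss is small relative to the full 16-bit range.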
Audio samples are chunked together and an FFT is performed on each chunk, which converts the samples into a complex array. The length of each complex vector is the amplitude of the sound and is currently mapped to the value of the colour. The angle of the vector relative to the real axis is the phase of the sound. Phase is usually discarded in a spectrogram, but here it is mapped to the saturation of the colour.
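A rough sketch of this mapping, using only the Python standard library. The names `dft`, `bin_to_rgb`, `rgb_to_bin` and the `max_amplitude` scaling parameter are assumptions made for illustration. The naive DFT here stands in for a real FFT, which produces the same values much faster.

```python
import cmath
import colorsys
import math


def dft(chunk):
    """Naive discrete Fourier transform of one chunk of samples (stand-in for an FFT)."""
    n = len(chunk)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(chunk))
        for k in range(n)
    ]


def bin_to_rgb(z, max_amplitude):
    """Encode one complex bin as a colour: amplitude -> value, phase -> saturation."""
    value = min(abs(z) / max_amplitude, 1.0)
    # cmath.phase returns an angle in [-pi, pi]; rescale it to [0, 1].
    saturation = (cmath.phase(z) + math.pi) / (2 * math.pi)
    hue = 0.0  # hue is currently unmapped
    return colorsys.hsv_to_rgb(hue, saturation, value)


def rgb_to_bin(rgb, max_amplitude):
    """Decode a colour back into a complex bin."""
    _hue, saturation, value = colorsys.rgb_to_hsv(*rgb)
    phase = saturation * 2 * math.pi - math.pi
    return cmath.rect(value * max_amplitude, phase)


spectrum = dft([0.0, 1.0, 0.0, -1.0])  # one cycle of a sine wave
pixels = [bin_to_rgb(z, max_amplitude=4.0) for z in spectrum]
decoded = [rgb_to_bin(p, max_amplitude=4.0) for p in pixels]
```

A bin with zero amplitude decodes with its phase lost, since a black pixel carries no saturation. That is harmless because the bin contributes nothing to the sound.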
The hue is currently unmapped but could potentially be used as error-correction data or a copy of either phase or amplitude.
The window size of the FFT defines only the lower limit of the recoverable frequency. I have found that the smaller the window size (and thus the width of the displayed spectrogram), the better the recovered audio quality. Unfortunately, after graphing the relationship between processing time and spectrogram width, it appears that processing time grows exponentially as the width shrinks. Therefore, the sample frequency (lowering which dramatically reduces the processing time) and the spectrogram height have both been tuned to allow us to increase the window width as much as possible.
Due to this bottleneck, the following things would increase the quality of the recovered data:
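As a small worked example of the lower frequency limit: for a window of N samples at sample rate fs, the lowest non-zero frequency bin sits at fs / N, and a real-input FFT yields N // 2 + 1 bins. The function names and the figures of 8000 Hz and 400 samples are illustrative assumptions, not the project's settings.

```python
def lowest_frequency_hz(sample_rate_hz: float, window_size: int) -> float:
    """Frequency of the first non-DC bin, which is also the spacing between bins."""
    return sample_rate_hz / window_size


def bin_count(window_size: int) -> int:
    """Number of frequency bins produced by a real-input FFT of this window size."""
    return window_size // 2 + 1


low = lowest_frequency_hz(8000, 400)
bins = bin_count(400)
print(low, bins)
```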
After profiling the code, steps have been taken to optimize it. Primarily, the result from the FFT (power) is linear rather than measured in dB, and is therefore scaled logarithmically. To convert it to dB, the formula 20 * log10(power) is used. This (and its respective inverse) is inefficient. As a replacement for the inverse, a lookup table has been used.
As well as this, if the window width is over 1800, I have deemed it more efficient to use a log approximation based on arctan. This is only loosely accurate, but that does not affect our application much, as the approximation is undone immediately afterwards by the inverse step.
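The lookup-table idea can be sketched as follows. Because a pixel only has 256 possible values, the inverse of the dB conversion can be computed once for every value and then read back by index. The `MAX_DB` range of 96 dB and all names here are assumptions for illustration. The project's actual scaling is not stated.

```python
import math

MAX_DB = 96.0  # hypothetical dynamic range mapped onto pixel values 0..255


def amplitude_to_pixel(amplitude: float) -> int:
    """Forward path: linear amplitude -> dB -> 8-bit pixel value."""
    if amplitude < 1.0:
        return 0  # treat anything at or below 0 dB as silence
    db = 20.0 * math.log10(amplitude)
    return min(255, round(db / MAX_DB * 255))


# Inverse path: precompute 10 ** (dB / 20) once for every possible pixel value.
INVERSE_LUT = [10.0 ** ((pixel / 255 * MAX_DB) / 20.0) for pixel in range(256)]


def pixel_to_amplitude(pixel: int) -> float:
    """Recover a linear amplitude with a single list lookup instead of a power call."""
    return INVERSE_LUT[pixel]


pixel = amplitude_to_pixel(1000.0)
restored = pixel_to_amplitude(pixel)
print(pixel, restored)
```

The restored amplitude differs slightly from the input because the dB value is rounded to the nearest of 256 steps. The table itself adds no further error.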