OzVa Git service - audio-over-stft/summary
 
description: Using a camera pointed at a spectrogram as an intermediate stage of audio processing, allowing interaction with the sound.
last change: Wed, 13 Nov 2024 20:33:45 +0000 (20:33 +0000)
readme

Audio over the Short-Time Fourier Transform

William Greenwood - Contact: greenwoodw50 [at] gmail [dot] com

Due to speed requirements, I decided to rewrite this program in Rust. This can be seen here. All further development will take place there.

Abstract

Audio over STFT uses a camera and a display to allow interaction with a sound. The sound is converted into a lossless spectrogram (one that keeps the phase as well as the amplitude), which is shown on a screen. The screen can then be captured by the relevant capture device and the image converted back into sound. The round trip is very slightly lossy because 16-bit integers are compressed into unsigned 8-bit integers.
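The 16-bit to 8-bit step is where that loss comes from. As a minimal sketch, assuming the conversion simply offsets each signed 16-bit value and keeps its high byte (int16_to_uint8 and uint8_to_int16 are hypothetical helpers, not taken from the project), the worst-case error per value is 255, i.e. the discarded low byte:

```python
def int16_to_uint8(x: int) -> int:
    # Offset -32768..32767 onto 0..65535, then keep only the high byte (0..255).
    return (x + 32768) >> 8


def uint8_to_int16(y: int) -> int:
    # Restore the high byte and re-centre on zero; the low byte is gone for good.
    return (y << 8) - 32768


samples = [-32768, -1000, 0, 1234, 32767]
restored = [uint8_to_int16(int16_to_uint8(s)) for s in samples]
errors = [abs(s - r) for s, r in zip(samples, restored)]
print(max(errors))  # → 255
```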

Technical details

Audio samples are grouped into chunks and an FFT is performed on each chunk, converting the samples into an array of complex numbers. The length (magnitude) of each of these vectors is the amplitude of that frequency component and is currently mapped to the value of the colour. The angle of each vector relative to the real axis is the phase; in an ordinary spectrogram it is usually discarded, but here it is mapped to the saturation of the colour.

The hue is currently unmapped but could potentially be used as error-correction data or a copy of either phase or amplitude.
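The mapping described above can be sketched for a single chunk with NumPy. This is a minimal illustration, not the project's code: frame_to_hsv and hsv_to_frame are hypothetical names, the magnitude is normalised by an assumed known maximum, the phase in [-π, π] is scaled linearly onto a saturation in [0, 1], the hue is left at zero, and the values stay as floats rather than being quantized to 8 bits for display.

```python
import numpy as np


def frame_to_hsv(frame: np.ndarray, max_mag: float) -> np.ndarray:
    """Encode one chunk of samples as a column of HSV pixels (channels in 0..1)."""
    spectrum = np.fft.rfft(frame)                             # len(frame)//2 + 1 complex bins
    value = np.abs(spectrum) / max_mag                        # vector length -> value
    saturation = (np.angle(spectrum) + np.pi) / (2 * np.pi)  # angle -> saturation
    hue = np.zeros_like(value)                                # hue left unmapped
    return np.stack([hue, saturation, value], axis=-1)


def hsv_to_frame(column: np.ndarray, max_mag: float, n: int) -> np.ndarray:
    """Invert frame_to_hsv: rebuild the complex spectrum, then the samples."""
    magnitude = column[:, 2] * max_mag
    phase = column[:, 1] * 2 * np.pi - np.pi
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=n)


rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
max_mag = np.abs(np.fft.rfft(frame)).max()
column = frame_to_hsv(frame, max_mag)
print(column.shape)  # → (129, 3)
print(np.allclose(hsv_to_frame(column, max_mag, len(frame)), frame))  # → True
```

Because the phase survives in the saturation channel, the inverse FFT reproduces the original samples directly; with the phase discarded, as in an ordinary spectrogram, it would not.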

Current modes

Limitations

The window size of the FFT determines only the lowest recoverable frequency. I have found that the smaller the window size (and thus the width of the displayed spectrogram), the better the recovered audio quality. Unfortunately, after graphing processing time against spectrogram width, it appears that processing becomes exponentially slower as the width decreases. Therefore, the sample frequency (lowering which dramatically reduces processing time) and the spectrogram height have both been tuned to allow us to increase the window width as much as possible.
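For a window of N samples at sample rate fs, adjacent FFT bins are fs/N Hz apart, so fs/N is the lowest frequency above DC that the window can resolve, while each chunk covers N/fs seconds. The small calculation below tabulates this for a few sample rates and window sizes; window_stats is a hypothetical helper and the specific values are illustrative assumptions, not the project's settings.

```python
def window_stats(sample_rate: int, window: int) -> tuple[float, float, int]:
    """Return (lowest non-DC frequency in Hz, frame duration in ms, number of rfft bins)."""
    lowest_hz = sample_rate / window        # spacing between adjacent FFT bins
    frame_ms = 1000 * window / sample_rate  # time covered by one chunk
    bins = window // 2 + 1                  # bins produced by a real-input FFT
    return lowest_hz, frame_ms, bins


for sample_rate in (44100, 22050):
    for window in (512, 1024, 2048):
        hz, ms, bins = window_stats(sample_rate, window)
        print(f"{sample_rate} Hz, window {window}: lowest {hz:.1f} Hz, "
              f"{ms:.1f} ms per frame, {bins} bins")
```

Halving the sample rate halves the number of samples to process per second, but for the same window it also halves the bin spacing and doubles the time each frame covers.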

Because of this bottleneck, the following would increase the quality of the recovered data:

  1. Increased computational power
  2. Increased capture quality
  3. Further optimization
  4. Rewriting in a faster language

Current optimization

After profiling, steps have been taken to optimize the code. Primarily, the result of the FFT (power) is not measured in dB, so it has to be scaled logarithmically; the conversion to dB uses the formula 20log10(power). This conversion (and its inverse) is inefficient, so the inverse has been replaced with a lookup table.
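Because every value recovered from the camera is an 8-bit code, the inverse conversion has only 256 possible inputs, which is what makes a lookup table attractive. A minimal NumPy sketch, assuming a linear mapping of a hypothetical dB range (MIN_DB to MAX_DB) onto 0..255; the constants and function names are illustrative, not taken from the project:

```python
import numpy as np

MIN_DB, MAX_DB = -20.0, 100.0  # hypothetical dB range mapped onto 0..255


def magnitude_to_code(magnitude: np.ndarray) -> np.ndarray:
    """Forward: linear FFT magnitude -> dB via 20*log10 -> 8-bit pixel value."""
    db = 20 * np.log10(np.maximum(magnitude, 1e-12))  # guard against log10(0)
    scaled = (db - MIN_DB) / (MAX_DB - MIN_DB) * 255
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)


# Inverse: precompute the magnitude for each of the 256 possible codes once,
# instead of evaluating 10 ** (db / 20) for every captured pixel.
CODE_DB = MIN_DB + np.arange(256) * (MAX_DB - MIN_DB) / 255
INVERSE_LUT = 10 ** (CODE_DB / 20)


def code_to_magnitude(code: np.ndarray) -> np.ndarray:
    """Look up the magnitudes for a whole array of 8-bit codes at once."""
    return INVERSE_LUT[code]


mags = np.array([0.5, 3.0, 250.0, 40000.0])
codes = magnitude_to_code(mags)
direct = 10 ** ((MIN_DB + codes.astype(float) * (MAX_DB - MIN_DB) / 255) / 20)
print(np.allclose(code_to_magnitude(codes), direct))  # → True
```

Building the table costs 256 exponentiations once; after that, decoding a whole frame is a single array index.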

In addition, if the window width is over 1800, I have judged it more efficient to use a log approximation based on arctan. This is only loosely accurate, but that matters little for this application because the approximation is immediately reversed.
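The exact arctan-based formula is not given here, so the following is only a hypothetical stand-in: a compressive curve atan(K·x) scaled onto [0, 1), with an assumed steepness K. It shows why the rough fit to log10 does not matter: the receiving side applies the exact inverse of the same curve (tan), so each value round-trips up to floating-point precision regardless of how closely the curve tracks a true logarithm.

```python
import math

K = 0.01  # hypothetical steepness of the curve


def compress(magnitude: float) -> float:
    """Map 0..inf onto 0..1 with a monotonic, log-like arctan curve."""
    return math.atan(K * magnitude) * 2 / math.pi


def expand(code: float) -> float:
    """Exact inverse of compress, applied straight after capture."""
    return math.tan(code * math.pi / 2) / K


for m in (1.0, 100.0, 10000.0):
    c = compress(m)
    print(f"{m:>8}: log10={math.log10(m):.2f} atan-curve={c:.3f} restored={expand(c):.4f}")
```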

shortlog
2024-11-13 will  tidying up main
2024-09-10 will  Changed loop.py to chache spectrograms
2024-09-09 will  Added .venv
2024-09-03 will  Various small changes
2024-09-01 will  Fixed buzzing
2024-08-26 will  Deleted unecaccary files
2024-08-26 will  Added .gitignore *.pyc
2024-08-26 will  Removed README
2024-08-26 will  Resolution and waitKey changes
2024-08-25 will  Optimixation and notation
2024-08-24 will  Added .gitignore
2024-08-24 will  Fixed lookup table system
2024-08-18 will  Fixed camera buffer issue and started on lookup
2024-08-05 will  Added README.html
2024-08-05 will  - Added VideoCapture thread class to continuously captu...
2024-07-31 will  uhhh
...
heads
2 weeks ago main