Variable-Q transform and arbitrary frequency scale #6

Open
TF3RDL opened this issue Apr 26, 2023 · 7 comments
Labels
enhancement New feature or request

@TF3RDL
TF3RDL commented Apr 26, 2023

The variable-Q transform (VQT) is very similar to the constant-Q transform (CQT), except that the Q value decreases as the frequency decreases. This is useful if you want better time resolution at low frequencies at the expense of frequency resolution there (for example, a logarithmic-frequency spectrogram with ERB frequency resolution).

An arbitrary frequency scale for bin spacing also enables a variable-Q transform. With that, a Mel spectrogram can be calculated directly by using a VQT with Mel-frequency bin spacing and resolution.
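
To make the Mel idea concrete, here is a minimal sketch of how Mel-spaced bin centers and matching bandwidths could be derived. The function names are hypothetical, the HTK-style Mel formula is one of several conventions, and taking the bandwidth from the distance between neighbouring bins is my own assumption rather than anything this repo does.

```python
import math

def hz_to_mel(hz):
    # HTK-style Mel formula (an assumed convention, others exist)
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(fmin, fmax, bins):
    # Bin centers equally spaced on the Mel axis (requires bins >= 2)
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (bins - 1)
    return [mel_to_hz(lo + i * step) for i in range(bins)]

def bandwidths_from_spacing(frequencies):
    # Assumption: bandwidth of a bin is half the distance between its
    # two neighbours; edge bins use the distance to their only neighbour.
    last = len(frequencies) - 1
    result = []
    for k, _ in enumerate(frequencies):
        lower = frequencies[max(k - 1, 0)]
        upper = frequencies[min(k + 1, last)]
        span = upper - lower
        result.append(span if k in (0, last) else span / 2.0)
    return result

frequencies = mel_spaced_frequencies(50.0, 8000.0, 40)
bandwidths = bandwidths_from_spacing(frequencies)
```

Because the Mel axis is close to linear at low frequencies, the resulting Q (center frequency divided by bandwidth) drops towards the bass, which is the variable-Q behaviour described above.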

@jurihock
Owner

This is not an issue. The name of this project is "Constant-Q Sliding DFT", which is also the main purpose.

@jurihock
Owner
jurihock commented Apr 29, 2023

Further reading:

@jurihock jurihock reopened this Aug 31, 2023
@jurihock
Owner

Actually, I thought about variable bin bandwidth again. In my current project, an improved time resolution at low frequencies would be beneficial. However, I do not plan to use a frequency scale other than logarithmic in this repo.

According to [1]:

Auditory filters in the human auditory system are approximately constant-Q only for frequencies above 500 Hz and smoothly approach a constant bandwidth towards lower frequencies. Accordingly, music signals generally do not contain closely spaced pitches at low frequencies, thus the Q-factors (relative frequency resolution) can safely be reduced towards lower frequencies, which in turn improves the time resolution.

...the variable bin bandwidth can be mapped via parameter gamma like in [2].

According to [3], the windowing procedure remains the same as in the constant-Q case, where gamma equals 0. Although an additional memory access to the particular bin fiddles is required, I don't expect a huge performance drawback.

[1] A Matlab Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution
[2] librosa.vqt
[3] Sliding with a constant Q
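
As a rough illustration of the gamma mapping, the sketch below assumes the bandwidth rule B_k = alpha * f_k + gamma with alpha = 2^(1/b) - 2^(-1/b) for b bins per octave, which is how I read [1]. The function name and the example values are hypothetical, so treat this as a sketch under that assumption and not as the code of this repo or of librosa.

```python
def vqt_bandwidths(fmin, bins, bins_per_octave, gamma):
    # Assumed rule: B_k = alpha * f_k + gamma (gamma = 0 gives constant Q)
    alpha = 2.0 ** (1.0 / bins_per_octave) - 2.0 ** (-1.0 / bins_per_octave)
    # Logarithmic bin centers, as in the constant-Q case
    frequencies = [fmin * 2.0 ** (k / bins_per_octave) for k in range(bins)]
    bandwidths = [alpha * f + gamma for f in frequencies]
    # Q factor per bin: center frequency divided by bandwidth
    qs = [f / b for f, b in zip(frequencies, bandwidths)]
    return frequencies, bandwidths, qs

# gamma = 0: every bin has the same Q
_, _, constant_q = vqt_bandwidths(55.0, 24, 12, 0.0)

# gamma > 0 (value chosen arbitrarily): Q shrinks towards low frequencies
_, _, variable_q = vqt_bandwidths(55.0, 24, 12, 20.0)
```

With gamma greater than 0 the low bins get a wider bandwidth than in the constant-Q case, hence a shorter window and better time resolution, while the bin spacing stays logarithmic.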

@TF3RDL
Author
TF3RDL commented Sep 5, 2023

I have yet to see good use cases for arbitrary bin spacing (which is a special case of the VQT and is closely related to the long-term variable-Q transform paper). Last time, I played around with my VQ-sDFT implementation in my spectrogram sketch to generate a Mel spectrogram with this algorithm:
[Image: Mel spectrogram using sDFT]

@jurihock
Owner
jurihock commented Sep 6, 2023

I suppose it will not be too hard to perform a sliding DFT with arbitrary frequency bin spacing and arbitrary bin bandwidth as well based on the current QDFT implementation. However, a dedicated repository would be appropriate for this purpose. The explicit logarithmic frequency scale also has its place.
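
For illustration, a minimal sketch of such a sliding DFT with a free center frequency and bandwidth per bin could look like this. It uses a plain rectangular window, derives the window length as sample_rate / bandwidth (my assumption), and the names `twiddles` and `fiddles` are only illustrative here. It is not the QDFT implementation.

```python
import cmath
import math

def sliding_dft(samples, sample_rate, frequencies, bandwidths):
    # Assumed mapping: window length in samples = sample_rate / bandwidth
    lengths = [max(1, round(sample_rate / b)) for b in bandwidths]
    omegas = [2.0 * math.pi * f / sample_rate for f in frequencies]
    # Per-bin rotation by one sample
    twiddles = [cmath.exp(1j * w) for w in omegas]
    # Per-bin phase of the newest sample, needed because the bin
    # frequency is not tied to an integer number of cycles per window
    fiddles = [cmath.exp(-1j * w * n) for w, n in zip(omegas, lengths)]
    states = [0j] * len(frequencies)
    frames = []
    for n, x in enumerate(samples):
        for k, length in enumerate(lengths):
            # Sample leaving the window (zero before the signal starts)
            oldest = samples[n - length] if n >= length else 0.0
            # S[n] = e^{jw} * (S[n-1] - x[n-N] + x[n] * e^{-jwN})
            states[k] = twiddles[k] * (states[k] - oldest + x * fiddles[k])
        frames.append([s / length for s, length in zip(states, lengths)])
    return frames
```

Each output frame holds one complex value per bin, so `abs()` of an entry gives the magnitude for that bin at that sample. Long signals would accumulate rounding error in the recursion, which this sketch does not address.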

@TF3RDL
Author
TF3RDL commented Sep 6, 2023

Exactly, it is not hard to do something like this. The relevant section is the CQT part of this CodePen audio visualization, which uses the Goertzel algorithm instead of sDFT, but that doesn't matter, since any CQT/VQT implementation can easily be adapted to use arbitrary bin spacing and bandwidth. It is just that logarithmic frequency scaling is more convenient, since it follows musical scales, whereas perceptual and other frequency scales do not.
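
A minimal sketch of the Goertzel variant, assuming one block per bin whose length is sample_rate / bandwidth and a rectangular window, could look like the following. The function names are hypothetical and this is not the code from the CodePen.

```python
import cmath
import math

def goertzel_magnitude(block, sample_rate, frequency):
    # Goertzel recursion for an arbitrary (non-integer-bin) frequency;
    # block must not be empty
    w = 2.0 * math.pi * frequency / sample_rate
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    # y differs from the DFT sum only by a phase factor, so the
    # magnitude is the same
    y = s1 - cmath.exp(-1j * w) * s2
    return abs(y) / len(block)

def goertzel_spectrum(samples, sample_rate, frequencies, bandwidths):
    # One magnitude per bin, each computed over the most recent
    # sample_rate / bandwidth samples (assumed mapping)
    spectrum = []
    for f, b in zip(frequencies, bandwidths):
        length = min(len(samples), max(1, round(sample_rate / b)))
        spectrum.append(goertzel_magnitude(samples[-length:], sample_rate, f))
    return spectrum
```

The `frequencies` and `bandwidths` lists can come from any scale (logarithmic, Mel, ERB and so on), which is the point made above: only the per-bin parameters change, not the algorithm.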

jurihock added a commit that referenced this issue Sep 9, 2023
@TF3RDL
Author
TF3RDL commented Aug 14, 2024

BTW, my VQ-sDFT implementation (alongside SWIFT) with arbitrary frequency scale support is included in my first AudioWorklet-based audio visualization project on CodePen, for those who are interested in the variable-Q transform and an arbitrary-frequency-scale audio spectrum analyzer.
