Stars
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM.
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Data manipulation and transformation for audio signal processing, powered by PyTorch
An Open Source Machine Learning Framework for Everyone
Google Research
A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Fast, modern C++ DSP framework, FFT, Sample Rate Conversion, FIR/IIR/Biquad Filters (SSE, AVX, AVX-512, ARM NEON)
Recurrent neural network for audio noise reduction
Production First and Production Ready End-to-End Speech Recognition Toolkit
Tools for handling speech data in machine learning projects.
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
🌎 machine learning tutorials (mainly in Python3)
Python library for a fast, unsupervised method for robust voice activity detection (rVAD), as described in the paper "rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method".
Unsupervised Speech Decomposition Via Triple Information Bottleneck
Noise suppression using deep filtering
kaldi-asr/kaldi is the official location of the Kaldi project.
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
Google's Engineering Practices documentation
Tengine is a lite, high-performance, modular inference engine for embedded devices
Different implementations of "Weighted Prediction Error" for speech dereverberation
Espresso: A Fast End-to-End Neural Speech Recognition Toolkit
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)