Audiomentations

A Python library for audio data augmentation, inspired by albumentations. Useful for deep learning. Runs on CPU. Supports mono and multichannel audio. Can be integrated into training pipelines in e.g. TensorFlow/Keras or PyTorch. Has helped people get world-class results in Kaggle competitions. Is used by companies making next-generation audio products.

Need a PyTorch-specific alternative with GPU support? Check out torch-audiomentations!

Setup

pip install audiomentations

Usage example

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
import numpy as np

augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Generate 2 seconds of dummy audio for the sake of example
samples = np.random.uniform(low=-0.2, high=0.2, size=(32000,)).astype(np.float32)

# Augment/transform/perturb the audio data
augmented_samples = augment(samples=samples, sample_rate=16000)
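
Each transform in the example above takes a p argument, the probability that the transform is applied on a given call. The following is a minimal sketch of that idea in plain Python. It is not the library's implementation: RandomApply, SimpleCompose, invert_polarity and halve_gain are hypothetical names introduced only for illustration, and samples are plain lists rather than numpy arrays.

```python
import random


class RandomApply:
    """Hypothetical stand-in for a transform: applies `fn` with probability `p`."""

    def __init__(self, fn, p=0.5):
        self.fn = fn
        self.p = p

    def __call__(self, samples, sample_rate):
        # random.random() is in [0.0, 1.0), so p=1.0 always applies
        # the transform and p=0.0 never does.
        if random.random() < self.p:
            return self.fn(samples, sample_rate)
        return samples


class SimpleCompose:
    """Hypothetical stand-in for Compose: runs the transforms in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, samples, sample_rate):
        for transform in self.transforms:
            samples = transform(samples, sample_rate)
        return samples


def invert_polarity(samples, sample_rate):
    # Flip the sign of every sample
    return [-s for s in samples]


def halve_gain(samples, sample_rate):
    # Scale every sample by 0.5
    return [0.5 * s for s in samples]


pipeline = SimpleCompose([
    RandomApply(invert_polarity, p=0.5),
    RandomApply(halve_gain, p=0.5),
])
augmented = pipeline([0.2, -0.4, 0.1], sample_rate=16000)
```

Because each step draws its own random number, one call may apply both steps, only one of them, or neither. Setting p=1.0 or p=0.0 on a step makes it deterministic, which can help when debugging a pipeline.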

Documentation

The API documentation, along with guides, example code, illustrations and example sounds, is available at https://iver56.github.io/audiomentations/

Transforms

AddBackgroundNoise: Mixes in another sound to add background noise
AddColorNoise: Adds noise with a specific color
AddGaussianNoise: Adds Gaussian noise to the audio samples
AddGaussianSNR: Injects Gaussian noise using a randomly chosen signal-to-noise ratio
AddShortNoises: Mixes in various short noise sounds
AdjustDuration: Trims or pads the audio to fit a target duration
AirAbsorption: Applies frequency-dependent attenuation simulating air absorption
Aliasing: Produces aliasing artifacts by downsampling without low-pass filtering and then upsampling
ApplyImpulseResponse: Convolves the audio with a randomly chosen impulse response
BandPassFilter: Applies band-pass filtering within randomized parameters
BandStopFilter: Applies band-stop (notch) filtering within randomized parameters
BitCrush: Applies bit reduction without dithering
Clip: Clips audio samples to specified minimum and maximum values
ClippingDistortion: Distorts the signal by clipping a random percentage of samples
Gain: Multiplies the audio by a random gain factor
GainTransition: Gradually changes the gain over a random time span
HighPassFilter: Applies high-pass filtering within randomized parameters
HighShelfFilter: Applies a high shelf filter with randomized parameters
Lambda: Applies a user-defined transform
Limiter: Applies dynamic range compression limiting the audio signal
LoudnessNormalization: Applies gain to match a target loudness
LowPassFilter: Applies low-pass filtering within randomized parameters
LowShelfFilter: Applies a low shelf filter with randomized parameters
Mp3Compression: Compresses the audio to lower the quality
Normalize: Applies gain so that the highest signal level becomes 0 dBFS
Padding: Replaces a random part of the beginning or end with padding
PeakingFilter: Applies a peaking filter with randomized parameters
PitchShift: Shifts the pitch up or down without changing the tempo
PolarityInversion: Flips the audio samples upside down, reversing their polarity
RepeatPart: Repeats a subsection of the audio a number of times
Resample: Resamples the signal to a randomly chosen sampling rate
Reverse: Reverses the audio along its time axis
RoomSimulator: Simulates the effect of a room on an audio source
SevenBandParametricEQ: Adjusts the volume of 7 frequency bands
Shift: Shifts the samples forwards or backwards
SpecChannelShuffle: Shuffles channels in the spectrogram
SpecFrequencyMask: Applies a frequency mask to the spectrogram
TanhDistortion: Applies tanh distortion to distort the signal
TimeMask: Makes a random part of the audio silent
TimeStretch: Changes the speed without changing the pitch
Trim: Trims leading and trailing silence from the audio
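
Several of the transforms above work with levels expressed in decibels. As one illustration, the sketch below shows how a signal-to-noise ratio in dB can be converted into a noise level, which is the idea behind a transform like AddGaussianSNR. It is a plain-Python approximation that assumes the SNR is defined on RMS amplitudes. The helpers rms, noise_rms_for_snr and add_gaussian_noise_at_snr are hypothetical and are not part of audiomentations.

```python
import math
import random


def rms(samples):
    """Root mean square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_rms_for_snr(signal_rms, snr_db):
    """Noise RMS that yields the requested SNR.

    Uses SNR (dB) = 20 * log10(signal_rms / noise_rms).
    """
    return signal_rms / (10 ** (snr_db / 20))


def add_gaussian_noise_at_snr(samples, snr_db, rng=None):
    """Add zero-mean Gaussian noise scaled to the requested SNR."""
    rng = rng or random.Random()
    # For zero-mean Gaussian noise, the standard deviation equals the RMS
    noise_std = noise_rms_for_snr(rms(samples), snr_db)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


# A 440 Hz sine, one second at 16 kHz, with noise added at 20 dB SNR
sine = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
noisy = add_gaussian_noise_at_snr(sine, snr_db=20.0, rng=random.Random(0))
```

A lower SNR means louder noise relative to the signal: every 20 dB reduction multiplies the noise amplitude by 10.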

Changelog

[0.37.0] - 2024-09-03

Changed

Leverage the SIMD-accelerated numpy-minmax package for speed improvements. These transforms are faster now: Limiter, Mp3Compression and Normalize. Unfortunately, this change removes support for macOS running on Intel. Intel Mac users have the following options: A) use audiomentations 0.36.1, B) create a fork of audiomentations, C) submit a patch to numpy-minmax, D) run Linux or Windows.
Limit numpy dependency to >=1.21,<2 for now, since numpy v2 is not officially supported yet.
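
If you are unsure whether the Intel macOS limitation applies to your machine, a check along the following lines may help. This is a minimal sketch using only the standard library. The function is_intel_mac is a hypothetical helper, not part of audiomentations, and it assumes that platform.machine() reports "x86_64" on Intel Macs and "arm64" on Apple Silicon. Note that a Python interpreter running under Rosetta on Apple Silicon may also report "x86_64".

```python
import platform


def is_intel_mac(system=None, machine=None):
    """Return True if the platform looks like macOS on an Intel CPU.

    `system` and `machine` default to the values reported by the
    standard library, and can be overridden for testing.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine.lower() in ("x86_64", "i386")


if is_intel_mac():
    # Option A from the changelog entry above
    print("Intel macOS detected: consider pinning audiomentations to 0.36.1")
else:
    print("Not Intel macOS: the 0.37.0 limitation should not apply")
```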

For the full changelog, including older versions, see https://iver56.github.io/audiomentations/changelog/

Acknowledgements

Thanks to Nomono for backing audiomentations.

Thanks to all contributors who help improve audiomentations.
