WO2025190785A1 - Apparatus and method for processing an audio file storing a music track, and apparatus and method for determining a sample underlying a music track
- Publication number
- WO2025190785A1 (PCT/EP2025/056195)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- music track
- sound source
- determined
- pieces
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/051—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/056—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/066—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for pitch analysis as part of wider processing for musical purposes, e.g. transcription, musical performance evaluation; Pitch recognition, e.g. in polyphonic sounds; Estimation or use of missing fundamental
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/125—Medley, i.e. linking parts of different musical pieces in one single piece, e.g. sound collage, DJ mix
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/131—Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
- G10H2240/141—Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2240/00—Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
- G10H2240/121—Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
- G10H2240/145—Sound library, i.e. involving the specific use of a musical database as a sound bank or wavetable; indexing, interfacing, protocols or processing therefor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
Definitions
- Music tracks are based on various instruments and/or samples. There may be a demand for techniques for the decomposition of music tracks.
- the present disclosure provides an apparatus for processing an audio file storing a music track.
- the apparatus comprises processing circuitry configured to perform source separation on the audio file to determine an audio signal of a sound source in the music track.
- the processing circuitry is further configured to determine times of occurrence of the sound source in the music track based on the determined audio signal of the sound source.
- the processing circuitry is configured to extract sound pieces from the determined audio signal of the sound source at the determined times of occurrence of the sound source.
- the processing circuitry is further configured to group the extracted sound pieces into groups of similar sound pieces and determine a respective sound that is common to each sound piece in the respective group of similar sound pieces as a sound sample of the sound source.
- the present disclosure provides a method for processing an audio file storing a music track.
- the method comprises performing source separation on the audio file to determine an audio signal of a sound source in the music track. Furthermore, the method comprises determining times of occurrence of the sound source in the music track based on the determined audio signal of the sound source.
- the method additionally comprises extracting sound pieces from the determined audio signal of the sound source at the determined times of occurrence of the sound source.
- the method comprises grouping the extracted sound pieces into groups of similar sound pieces and determining a respective sound that is common to each sound piece in the respective group of similar sound pieces as a sound sample of the sound source.
- the present disclosure provides a method for determining a sample underlying a music track.
- the method comprises determining a spectral representation of the music track based on an audio file storing the music track. Further, the method comprises determining a repeating pattern in the spectral representation of the music track and extracting sound pieces from the audio file comprising the determined repeating pattern. The method additionally comprises determining a sound that is common to each of the extracted sound pieces as the sample underlying the music track.
- the present disclosure provides a non-transitory machine-readable medium having stored thereon a program having a program code for performing the method according to the second aspect or the fourth aspect, when the program is executed on a processor or programmable hardware.
- the present disclosure provides a program having a program code for performing the method according to the second aspect or the fourth aspect, when the program is executed on a processor or programmable hardware.
- Fig. 1 illustrates a flowchart of an example of a method for processing an audio file storing a music track.
- Fig. 3 illustrates an exemplary process flow for an unpitched percussion instrument.
- Fig. 5 illustrates a flowchart of an example of a method for determining a sample underlying a music track.
- Fig. 7 illustrates an exemplary apparatus for performing the methods described herein.
- Fig. 1 illustrates a method 100 for processing an audio file storing a music track (music piece, song).
- the audio file is a digital file (i.e., machine- or computer-readable data) that contains audio data, which can be played back as sound.
- the audio data represent the music track.
- the audio file may be in any file format such as, e.g., MP3, WAV, FLAC or AAC.
- the music track is an individual piece or recording of music.
- the music track may be a specific portion of a larger musical work or a standalone composition.
- the method 100 comprises performing 102 source separation (also known as “audio source separation” or “audio separation”) on the audio file to determine an audio signal (audio data) of a sound source in the music track.
- Source separation is a technique (process) in which individual sound sources within an audio mixture such as the music track are separated or isolated from one another.
- a sound source is an individual element or component that contributes to the overall auditory experience of the music track.
- a sound source may be an instrument, a vocal (voice) or another audio element that produces distinct sounds within a musical composition (e.g., ambient or environmental sounds or electronic sounds of an electronic device such as a synthesizer or a drum machine).
- source separation allows the distinct components or sound sources that contribute to the overall music track stored in the audio file to be extracted.
- Various techniques such as Blind Source Separation (BSS), Informed Source Separation (ISS) or trained machine-learning models may be used in performing source separation on the audio file.
- the determined audio signal of the sound source in the music track is a (e.g., electronic or digital) representation of the sound of the sound source extracted from the music track.
- the method 100 comprises determining 104 times of occurrence of the sound source in the music track based on the determined audio signal of the sound source.
- the times of occurrence of the sound source in the music track are the specific moments or time instances when the sound source appears or is audible in the music track.
- the times of occurrence of the sound source in the music track may be timestamps and/or durations during which the sound source is present or active in the music track.
- Various techniques for determining the times of occurrence of a sound source in a music track such as automated analysis of a waveform or a spectrogram of the determined audio signal of the sound source may be used. Specific examples will be described below in greater detail with reference to Fig. 3 and Fig. 4.
- the method 100 additionally comprises extracting 106 sound pieces (excerpts, fragments, snippets) from the determined audio signal of the sound source at the determined times of occurrence of the sound source.
- specific sections or portions of the audio signal of the sound source are isolated or separated at the previously determined times of occurrence of the sound source.
- a plurality of sound pieces are obtained as a result of the extraction.
- the sound pieces are representations of the sound of the sound source at the times of occurrence of the sound source in the music track.
- the method 100 comprises grouping 108 the extracted sound pieces into groups of similar sound pieces. In other words, the extracted sound pieces are grouped based on similarities between the sound pieces.
- Various characteristics may be used for determining the similarity between the extracted sound pieces. For example, sound pieces in which the sound source plays a specific (particular) note may be grouped together. In other words, sound pieces in which the sound source emits sound of a specific pitch or frequency may be grouped together. Similarly, sound pieces in which the sound source plays a specific type of beat may be grouped together. One or more groups of similar sound pieces are obtained as a result of the grouping.
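As a rough illustration of this grouping step (not part of the disclosure itself), the sketch below groups toy sound pieces by their dominant frequency, quantized to semitones relative to A4 = 440 Hz; the function names and parameters are choices made for this sketch only:

```python
import numpy as np

def dominant_freq(piece, sr):
    """Frequency (Hz) of the strongest spectral peak of a sound piece."""
    spectrum = np.abs(np.fft.rfft(piece * np.hanning(len(piece))))
    freqs = np.fft.rfftfreq(len(piece), d=1.0 / sr)
    return float(freqs[np.argmax(spectrum)])

def group_by_note(pieces, sr):
    """Group pieces whose dominant frequencies round to the same semitone
    (relative to A4 = 440 Hz), i.e. pieces playing the same note."""
    groups = {}
    for piece in pieces:
        key = round(12 * np.log2(dominant_freq(piece, sr) / 440.0))
        groups.setdefault(key, []).append(piece)
    return groups

# toy pieces: two near 440 Hz (same note) and one at 660 Hz (a fifth higher)
sr = 8000
t = np.arange(2048) / sr
pieces = [np.sin(2 * np.pi * f * t) for f in (440.0, 441.0, 660.0)]
groups = group_by_note(pieces, sr)
print({k: len(v) for k, v in groups.items()})  # two groups: sizes 2 and 1
```

A production system would use a proper pitch estimator or learned embeddings rather than a single FFT peak, but the principle of clustering by a perceptually meaningful feature is the same.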
- the method 100 further comprises determining 110 a respective sound that is common to each sound piece in the respective group of similar sound pieces as a sound sample of the sound source.
- Various techniques such as statistical approaches may be used to determine the respective sound that is common to each sound piece in the respective group of similar sound pieces.
- the sound pieces in the respective group of similar sound pieces may be cross-correlated to determine the sound common to each sound piece in the respective group.
- the common sound may then be extracted as a respective sound sample of the sound source.
- a respective sound sample of the sound source is obtained for each group of similar sound pieces. For example, if the different groups relate to different played notes or pitches of the sound source, the samples may represent different played notes of the sound source omitting other interfering sounds present in the sound pieces. Similarly, if the different groups relate to different played beats, the samples may represent different played beats of the sound source omitting other interfering sounds present in the sound pieces.
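One way the "common sound" determination can be realized is sketched below, under the assumption that averaging cross-correlation-aligned pieces suppresses interference that differs from piece to piece; the function name and the synthetic signals are illustrative only:

```python
import numpy as np

def common_sound(pieces):
    """Align each piece to the first via cross-correlation, then average.
    Sound shared by every piece reinforces; uncorrelated interference
    averages out."""
    ref = pieces[0]
    aligned = [ref]
    for piece in pieces[1:]:
        corr = np.correlate(piece, ref, mode="full")
        shift = np.argmax(corr) - (len(ref) - 1)
        aligned.append(np.roll(piece, -shift))
    return np.mean(aligned, axis=0)

# demo: the same decaying tone buried under different random interference
rng = np.random.default_rng(0)
sr, n = 8000, 2048
t = np.arange(n) / sr
tone = np.sin(2 * np.pi * 220 * t) * np.exp(-5 * t)
pieces = [tone + 0.3 * rng.standard_normal(n) for _ in range(20)]
estimate = common_sound(pieces)
err = np.mean((estimate - tone) ** 2)
print(err)  # interference power shrinks roughly as 1/N pieces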
- the method 100 allows one or more sound samples of a sound source to be separated from the music track.
- the method 100 is described above for a single sound source. However, in case the sound of multiple sound sources (e.g., multiple instruments and/or vocals) is combined in the music track, the above described processing may be performed for more than one of the sound sources to obtain one or more respective samples for the respective sound source.
- the obtained sound sample(s) may be used for various applications as will be described in the following in greater detail.
- the obtained sound samples may be used for a virtual instrument.
- the method 100 may further comprise generating a virtual instrument based on the determined sound samples of the sound source.
- a virtual instrument is a software-based emulation of a traditional or electronic musical instrument. Unlike physical instruments that produce sound through vibrating strings, resonating air columns, or other tangible means, virtual instruments generate sound using digital signal processing algorithms.
- the obtained sound samples of the sound source represent the sounds output by the sound source in great detail.
- the obtained sound samples of the sound source catch the specific sound of the instrument in the music track.
- the obtained sound samples of the sound source may be played back by the virtual instrument or used as a basis for sound output by the virtual instrument when the virtual instrument is played by a user.
- the specific sound of the sound source in the music track may be used for creating a new piece of music with the virtual instrument. For example, if the sound source is a piano, a guitar or a bass, a corresponding virtual piano, guitar, bass or synthesizer may be generated according to the method 100.
- the method 100 may further comprise causing output of a Graphical User Interface (GUI) on a display.
- the GUI shows at least graphical icons allowing a user to interact with the virtual instrument.
- the GUI may show further graphical elements such as a graphical representation of the virtual instrument.
- the output of the GUI may be caused on a display of a mobile phone, a tablet computer, a laptop, a desktop computer or a TV-set used/accessible by a user.
- the method 100 may, e.g., comprise transmitting control data to a target device which is to display the GUI.
- the control data is encoded with one or more control commands for controlling (instructing) a display of the target device to output the GUI.
- An exemplary GUI 200 is illustrated in Fig. 2.
- the virtual instrument is a virtual piano.
- the virtual piano is depicted in the GUI 200 by a corresponding graphical icon 210.
- the virtual piano is based on the obtained sound samples of a piano used in the music track.
- Exemplary additional graphical icons are denoted by reference numbers 220, ..., 260.
- the graphical icon 220 allows the user to set parameters such as an ambience of the virtual piano or a reverb characteristic of the ambience.
- the graphical icon 230 allows the user to actuate the various virtual keys of the virtual piano and adjust action settings.
- the graphical icon 240 allows the user to actuate the various virtual pedals of the virtual piano and adjust action settings.
- the graphical icon 250 allows the user to adjust resonance settings of the virtual piano and the graphical icon 260 allows the user to adjust noise in the virtual piano's ambience.
- a user is able to interact with (play) the virtual piano by means of the graphical icons 210, ..., 260.
- since the virtual piano is based on the obtained sound samples of the piano used in the music track, the user may create a new piece of music with a virtual piano having the specific sound of the piano used in the music track.
- the obtained sound samples may be used to recognize whether the music track uses copyrighted material.
- the method 100 may comprise determining whether the samples of the sound source are taken from a group of music tracks (e.g., copyrighted music tracks). For example, the samples of the sound source may be compared to the music tracks in the group of music tracks or to samples of the music tracks in the group of music tracks. Based on the similarities of the samples of the sound source to the music tracks in the group of music tracks or to samples of the music tracks in the group of music tracks, it may be determined whether material from the group of music tracks is used in the music track.
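A minimal sketch of such a similarity comparison follows, assuming a plain cosine similarity between magnitude spectra as a stand-in for a real audio-fingerprinting system (which would be far more robust to time-stretching, filtering and noise); all names and the toy signals are invented for this sketch:

```python
import numpy as np

def spectral_similarity(a, b):
    """Cosine similarity of magnitude spectra; near 1.0 for the same sound
    even when one copy is phase-shifted or rescaled in amplitude."""
    sa = np.abs(np.fft.rfft(a))
    sb = np.abs(np.fft.rfft(b))
    return float(sa @ sb / (np.linalg.norm(sa) * np.linalg.norm(sb) + 1e-12))

# bin-centred toy frequencies (multiples of sr / n) keep the spectra clean
sr, n = 8000, 2048
t = np.arange(n) / sr
sample = np.sin(2 * np.pi * 328.125 * t) + 0.5 * np.sin(2 * np.pi * 656.25 * t)
reused = 0.8 * np.sin(2 * np.pi * 328.125 * t + 0.7) \
    + 0.4 * np.sin(2 * np.pi * 656.25 * t + 0.7)
unrelated = np.sin(2 * np.pi * 500.0 * t)
print(spectral_similarity(sample, reused))     # ~1.0: likely the same material
print(spectral_similarity(sample, unrelated))  # ~0.0: unrelated material
```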
- the obtained sound samples may alternatively or additionally be used to obtain a transcription of the music track.
- the method 100 may further comprise determining a transcription of the music track based on the samples of the sound source and the determined times of occurrence of the sound source in the music track.
- the transcription of the music track refers to a written or symbolic representation of the musical elements found in the music track.
- the musical elements of the music track may be melody, harmony, rhythm and/or lyrics.
- the transcription of the music track may, e.g., be a Musical Instrument Digital Interface (MIDI) file.
- MIDI Musical Instrument Digital Interface
- the determined times of occurrence of the sound source in the music track and the samples of the sound source indicate the contribution of the sound source (e.g., a specific instrument or a vocal) to the music track. Accordingly, a corresponding written or symbolic representation of the contribution of the sound source to the music track may be determined.
- the sound samples may be determined for multiple (e.g., all) sound sources of the music track.
- the method 100 may comprise determining samples and times of occurrence for at least one further sound source according to the above-described principles. Accordingly, the transcription of the music track may be determined further based on the determined samples and times of occurrence for the at least one further sound source. This may allow a full transcription of the music track to be obtained.
- the transcription of the music track may be used for various applications.
- the transcription of the music track may be used to recognize or identify the music track.
- the method 100 may in some examples comprise identifying the music track based on the transcription of the music track.
- the transcription of the music track may be compared to transcriptions of known music tracks. Based on the similarities of the transcription of the music track to the transcriptions of the known music tracks, the music track may be identified (i.e., it may be determined whether the music track is one of the known music tracks).
- a MIDI file of the music track may be compared to MIDI files of the known music tracks. Comparing the transcription of the music track allows the music track to be recognized even if its speed has been altered compared to the original recording of the music track.
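The speed-robust comparison can be sketched on symbolic note data: reducing a note sequence to its pitch intervals discards tempo (and, as a side effect, transposition), so an altered-speed rendition still matches. The helper names below are invented for this sketch:

```python
def pitch_interval_signature(notes):
    """Reduce a note sequence (MIDI pitch numbers in playing order) to its
    sequence of pitch intervals, which is unchanged by tempo or transposition."""
    return tuple(b - a for a, b in zip(notes, notes[1:]))

def same_melody(notes_a, notes_b):
    return pitch_interval_signature(notes_a) == pitch_interval_signature(notes_b)

# demo: the same melody played at another tempo and transposed up two semitones
original = [60, 62, 64, 60]    # C D E C
transposed = [62, 64, 66, 62]  # D E F# D; note timing plays no role here
other = [60, 62, 63, 60]       # different third note
print(same_melody(original, transposed))  # True
print(same_melody(original, other))       # False
```

A real matcher would additionally compare rhythm ratios and tolerate missing or extra notes, but this shows why a symbolic transcription survives speed changes that defeat raw-audio comparison.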
- the method 100 may further comprise modifying the audio file by replacing the audio signal of the sound source with an audio signal for another sound source.
- the sound of the (original) audio source in the music track may be replaced by sound of another sound source.
- one instrument may be replaced by another instrument in the music track to alter the music track.
- two examples for determining sound samples of a sound source from a music track will be described in greater detail.
- the sound source is an unpitched percussion instrument.
- an unpitched percussion instrument is a type of percussion instrument that does not produce definite pitches or specific musical notes. Instead, these instruments create sounds with indistinct pitch characteristics. Examples of unpitched percussion instruments are drums, cymbals, tambourines, triangles or wood blocks.
- the first example will be described with reference to Fig. 3.
- the unpitched percussion instrument is a drum kit.
- Source separation is performed on the audio file to determine an audio signal of the drum kit in the music track.
- Subfigure (a) exemplarily illustrates two audio signals 305 and 310 obtained by source separation of the audio file.
- the audio signal 310 is for the drum kit.
- the audio signal 305 is for another instrument used in the music track.
- the times of occurrence of the drum kit in the music track are determined by automatic beat detection.
- Automatic beat detection is a technique (process) in which one or more software algorithms analyze an audio signal to identify and mark the locations of beats or rhythmic pulses in the music. The beats are the regular, recurring patterns that form the foundation of a musical rhythm.
- the automatic beat detection is applied to the audio signal 310 for the drum kit. Exemplary algorithms for automatic beat detection are described at https://essentia.upf.edu/tutorial_rhythm_beatdetection.html and https://essentia.upf.edu/reference/std_BeatTrackerDegara.
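As a rough illustration of the idea behind beat or hit detection (the essentia algorithms referenced above are far more robust), the toy picker below marks a drum hit wherever the short-time energy jumps sharply; the frame size and threshold are arbitrary choices for this sketch:

```python
import numpy as np

def detect_hits(signal, sr, frame=256, threshold=4.0):
    """Energy-based onset picker: a frame counts as a hit when its energy
    jumps past `threshold` times the previous frame's energy."""
    n_frames = len(signal) // frame
    energy = np.array([
        np.sum(signal[i * frame:(i + 1) * frame] ** 2) for i in range(n_frames)
    ])
    hits = []
    for i in range(1, n_frames):
        if energy[i] > threshold * (energy[i - 1] + 1e-12):
            hits.append(i * frame / sr)  # convert frame index to seconds
    return hits

# demo: three synthetic drum hits (decaying noise bursts) 0.5 s apart
rng = np.random.default_rng(1)
sr = 8000
signal = np.zeros(sr * 2)
for start in (0.25, 0.75, 1.25):
    i = int(start * sr)
    burst = rng.standard_normal(1024) * np.exp(-np.arange(1024) / 100.0)
    signal[i:i + 1024] += burst
times = detect_hits(signal, sr)
print(times)  # three times, each within one frame of 0.25, 0.75, 1.25 s
```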
- the resulting sound samples 341, 342 and 343 may be used as described above.
- a virtual drum kit may be generated from the sound samples that reflects the characteristics of the drum kit used in the music track (e.g., the specific sound of the drum kit or the intensity with which the drum kit is played in the music track). This may allow the virtual drum kit to sound more natural.
- the sound source is a tonal instrument.
- the tonal instrument is a musical instrument that is capable of producing pitched or tonal sounds. Tonal instruments generate musical notes with discernible pitch, allowing for the creation of melodies and harmonies.
- the tonal instrument may be a guitar, a bass, a piano or a violin.
- the second example will be described in the following with reference to Fig. 4.
- source separation is initially performed on the audio file to determine an audio signal of the tonal instrument in the music track.
- the times of occurrence of the tonal instrument in the music track are determined by onset detection for played notes.
- Onset detection for played notes is a technique (process) in which one or more software algorithms analyze an audio signal to identify the precise moments in time when musical notes or sound events begin (onset points) within a piece of audio. The onset detection for played notes is applied to the audio signal for the tonal instrument.
- a respective sound that is common to each sound piece in the respective group of similar sound pieces is determined as a sound sample for the played notes of the tonal instrument.
- statistical approaches such as cross-correlation may be used to determine the sound common to each sound piece in the respective group.
- statistical approaches or methods may be used to keep only the sound that is common in each of the groups, thereby dismissing other interfering sounds.
- the music track comprises a bass line consisting of three different notes with different durations.
- sound samples may be obtained for the three different notes according to the method 100.
- the sound samples obtained for the three different notes may be interpolated and/or extrapolated to the full scale to create a full instrument.
- the resulting virtual bass is able to provide the full scale of notes.
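In its simplest form, the interpolation/extrapolation to the full scale can be approximated by resampling a note's sample, which shifts its pitch by a semitone ratio (at the cost of also changing duration; real samplers compensate with time-stretching). The sketch below, with names and parameters chosen for illustration, shifts a toy tone and verifies the new dominant frequency:

```python
import numpy as np

def pitch_shift(sample, semitones):
    """Shift a sample's pitch by resampling (linear interpolation).
    Note this also shortens or lengthens the sample."""
    ratio = 2.0 ** (semitones / 12.0)
    old_idx = np.arange(len(sample))
    new_idx = np.arange(0, len(sample), ratio)
    return np.interp(new_idx, old_idx, sample)

def peak_freq(x, sr):
    """Dominant frequency (Hz) of a signal, for checking the shift."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return np.fft.rfftfreq(len(x), 1.0 / sr)[np.argmax(spectrum)]

# demo: shifting a 220 Hz tone up 12 semitones should double its frequency
sr = 8000
t = np.arange(4096) / sr
tone = np.sin(2 * np.pi * 220 * t)
octave_up = pitch_shift(tone, 12)
print(peak_freq(tone, sr), peak_freq(octave_up, sr))
```

Filling the gaps of a full keyboard from three bass notes would apply such shifts from the nearest available note, so each synthesized note stays close to an actually sampled one.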
- the proposed technology may further be used for identifying a sample underlying a music track.
- a “sample” is a short piece, segment or snippet of audio taken from a pre-existing music track (recording) and used in a new music track (composition).
- the sample may comprise a portion of a song, a drumbeat, a vocal line, or any other sound extracted from an existing music track.
- a sample is usually repeated throughout the new music track.
- a method 500 for determining a sample underlying (forming the foundation of) a music track (music piece, song) is illustrated in Fig. 5.
- the method 500 comprises determining 502 a spectral representation of the music track based on the audio file storing the music track.
- the method 500 comprises determining 504 a repeating pattern in the spectral representation of the music track.
- Various techniques for detecting a repeating pattern in the spectral representation of the music track may be used. For example, pattern recognition techniques may be used to identify a repeating pattern. Additionally or alternatively, autocorrelation of the spectral representation of the music track (or a part thereof) with a time-shifted replica of itself (or a part thereof) may be used to identify a repeating pattern. Further additionally or alternatively, techniques like Fourier analysis or trained machine-learning models may be used to determine a repeating pattern in the spectral representation of the music track.
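The autocorrelation approach mentioned above can be sketched on a one-dimensional per-frame feature sequence (e.g., frame energies of the spectral representation); the feature choice and function name are illustrative, not part of the disclosure:

```python
import numpy as np

def repetition_period(features):
    """Find the lag (in frames) at which a feature sequence best matches a
    time-shifted copy of itself, i.e. the period of a repeating pattern."""
    x = features - features.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf /= acf[0]  # normalize so lag 0 == 1
    # skip lag 0 and pick the strongest remaining peak
    return int(np.argmax(acf[1:]) + 1)

# demo: a per-frame energy sequence that repeats every 8 frames
pattern = np.array([1.0, 0.1, 0.4, 0.8, 0.2, 0.9, 0.05, 0.3])
features = np.tile(pattern, 16)
print(repetition_period(features))  # 8
```

Applied to a real spectrogram, the same correlation would run over whole spectral frames (or a 2-D similarity matrix) rather than scalar energies, but the peak-at-the-repetition-lag behaviour is identical.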
- Fig. 6 exemplarily shows in the left part spectral representations 610, ..., 640 of four different parts of a music track. The spectral representation 650 of an exemplary repeating pattern identified in the spectral representations 610, ..., 640 of the four different parts of the music track is illustrated in the right part of Fig. 6.
- the method 500 additionally comprises extracting 506 sound pieces (excerpts, fragments, snippets) from the audio file comprising the determined repeating pattern.
- specific sections or portions of the audio signal represented by (encoded in) the audio file and comprising the determined repeating pattern are isolated or separated.
- a plurality of sound pieces are obtained as a result of the extraction.
- the sound pieces are representations of the determined repeating pattern in the music track.
- the method 500 comprises determining 508 a sound that is common to each of the extracted sound pieces as the sample underlying the music track. Analogously to what is described above, various techniques such as statistical approaches may be used to determine the sound that is common to each of the extracted sound pieces. The common sound may then be extracted as the sample underlying the music track. The method 500 allows one or more samples underlying a music track to be identified and extracted. The method 500 is described above for a single sample. However, in case the music track is based on multiple samples, the above-described processing may be performed to discover and extract more than one sample in the music track.
- the obtained sample underlying the music track may be used for various applications as will be described in the following in greater detail.
- the obtained sample underlying the music track may be used to further analyze the composition of the music track.
- the obtained sample underlying the music track may be used to recognize whether the music track uses copyrighted material.
- the method 500 may comprise determining whether the determined sample underlying the music track is taken from a group of music tracks (e.g., copyrighted music tracks).
- the obtained sample underlying the music track may be compared to the music tracks in the group of music tracks or to samples of the music tracks in the group of music tracks. Based on the similarities of the obtained sample underlying the music track to the music tracks in the group of music tracks or to samples of the music tracks in the group of music tracks, it may be determined whether material from the group of music tracks is used in the music track.
- Fig. 7 illustrates an exemplary apparatus 700 comprising processing circuitry 710.
- the processing circuitry 710 is configured to receive an audio file 701 and process it as described herein (e.g., according to one of the methods 100 and 500 described above).
- the processing circuitry 710 may be a single dedicated processor, a single shared processor, or a plurality of individual processors, some or all of which may be shared, digital signal processor (DSP) hardware, an application specific integrated circuit (ASIC), a neuromorphic processor, a system-on-a-chip (SoC) or a field programmable gate array (FPGA).
- DSP digital signal processor
- ASIC application specific integrated circuit
- SoC system-on-a-chip
- FPGA field programmable gate array
- the processing circuitry 710 may optionally be coupled to, e.g., memory such as read only memory (ROM) for storing software, random access memory (RAM) and/or non-volatile memory.
- the apparatus 700 may comprise memory configured to store instructions, which when executed by the processing circuitry 710, cause the processing circuitry 710 to perform the steps and methods described herein.
- the apparatus 700 may, e.g., be or be part of a server, a computing cloud, a personal computer or a mobile device (e.g., a mobile phone, a laptop-computer or a tablet computer).
- Examples of the present disclosure may provide a system that can analyze a piece of music in order to decompose it into a transcription (like a MIDI file) and a set of sample based instruments that accurately reproduce the original. For example, if a user loves the sound of the piano in a song, the user may use the proposed technology on this song to obtain a virtual piano instrument that sounds the same and use it for his/her own music.
- a piece of music can be decomposed in a set of transcriptions (e.g., MIDI tracks) each associated with a number of samples making up a separate instrument.
- the accurate transcription (e.g., a MIDI file)
- the sample sets (instruments) obtained according to the proposed technology may, e.g., be used to automatically create a reusable instrument that makes it possible to use a specific sounding instrument in new work or to recognize samples from other copyrighted material.
- An apparatus for processing an audio file storing a music track comprising processing circuitry configured to: perform source separation on the audio file to determine an audio signal of a sound source in the music track; determine times of occurrence of the sound source in the music track based on the determined audio signal of the sound source; extract sound pieces from the determined audio signal of the sound source at the determined times of occurrence of the sound source; group the extracted sound pieces into groups of similar sound pieces; and determine a respective sound that is common to each sound piece in the respective group of similar sound pieces as a sound sample of the sound source.
- the sound source is a tonal instrument
- the processing circuitry is configured to: determine the times of occurrence of the sound source in the music track by onset detection for played notes; and extract the sound pieces from the determined audio signal of the sound source by: determining a chromagram based on the determined audio signal of the sound source; determining pitches and lengths of the played notes from the chromagram; and extracting sound pieces for the determined pitches of the played notes from the determined audio signal of the sound source based on the determined times of occurrence and lengths of the played notes.
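The onset-detection and pitch-determination steps named in this claim can be sketched minimally as follows. This is a simplified stand-in, assuming an energy-ratio onset detector and a single-frame spectral-peak pitch estimate in place of a full chromagram analysis; the function names and parameter defaults are illustrative assumptions.

```python
import numpy as np

def detect_onsets(signal, frame=1024, hop=512, ratio=2.0):
    """Flag frames whose energy jumps by more than `ratio` over the
    previous frame -- a crude energy-based onset detector."""
    energies = np.array([np.sum(signal[i:i + frame] ** 2)
                         for i in range(0, len(signal) - frame, hop)])
    return [i * hop for i in range(1, len(energies))
            if energies[i] > ratio * (energies[i - 1] + 1e-12)]

def dominant_pitch(frame, sr):
    """Estimate the MIDI pitch of one analysis frame by taking the
    strongest spectral bin and converting its frequency to the nearest
    MIDI note number (a stand-in for chromagram-based pitch tracking)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    spectrum[0] = 0.0  # ignore the DC bin
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    f0 = freqs[np.argmax(spectrum)]
    return int(round(69 + 12 * np.log2(f0 / 440.0)))
```

For instance, a 440 Hz frame maps to MIDI note 69 (A4) and an 880 Hz frame to note 81, while a silence-to-tone transition in a signal is reported as an onset at the first frame containing tone energy.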
- processing circuitry is configured to: cause output of a graphical user interface on a display, wherein the graphical user interface shows graphical icons allowing a user to interact with the virtual instrument.
- processing circuitry is further configured to: generate an audio signal for another sound source based on the transcription of the music track; and modify the audio file by replacing the audio signal of the sound source with the audio signal for the other sound source.
- a method for processing an audio file storing a music track comprising: performing source separation on the audio file to determine an audio signal of a sound source in the music track; determining times of occurrence of the sound source in the music track based on the determined audio signal of the sound source; extracting sound pieces from the determined audio signal of the sound source at the determined times of occurrence of the sound source; grouping the extracted sound pieces into groups of similar sound pieces; and determining a respective sound that is common to each sound piece in the respective group of similar sound pieces as a sound sample of the sound source.
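The grouping step of the claimed method ("grouping the extracted sound pieces into groups of similar sound pieces") is left open. One minimal sketch, under the assumption that equal-length pieces are compared by the cosine similarity of their magnitude spectra, is a greedy clustering where each piece joins the first sufficiently similar group; the function name and threshold are illustrative, not part of the claim.

```python
import numpy as np

def group_similar(pieces, threshold=0.95):
    """Greedily group equal-length sound pieces by spectral similarity.

    Each piece joins the first existing group whose representative it
    matches (cosine similarity of normalized magnitude spectra at or
    above `threshold`); otherwise it starts a new group. Returns a list
    of groups, each a list of piece indices.
    """
    def feat(p):
        v = np.abs(np.fft.rfft(np.asarray(p, dtype=float)))
        return v / (np.linalg.norm(v) + 1e-12)

    groups, reps = [], []
    for idx, piece in enumerate(pieces):
        f = feat(piece)
        for g, rep in zip(groups, reps):
            if float(f @ rep) >= threshold:
                g.append(idx)
                break
        else:
            groups.append([idx])
            reps.append(f)
    return groups
```

Scaled copies of the same note land in one group because the normalized spectrum is amplitude-invariant, while notes at different pitches start new groups.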
- An apparatus for determining a sample underlying a music track comprising processing circuitry configured to: determine a spectral representation of the music track based on an audio file storing the music track; determine a repeating pattern in the spectral representation of the music track; extract sound pieces from the audio file comprising the determined repeating pattern; determine a sound that is common to each of the extracted sound pieces as the sample underlying the music track.
- a method for determining a sample underlying a music track comprising: determining a spectral representation of the music track based on an audio file storing the music track; determining a repeating pattern in the spectral representation of the music track; extracting sound pieces from the audio file comprising the determined repeating pattern; determining a sound that is common to each of the extracted sound pieces as the sample underlying the music track.
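The repeating-pattern determination in this method can be sketched with one simple assumed approach: build a magnitude spectrogram and score each candidate lag by the mean cosine similarity between frames that far apart, returning the lag with the highest score. The function name and frame size are assumptions for illustration.

```python
import numpy as np

def repetition_lag(signal, frame=512):
    """Find the lag (in frames) at which the spectral representation
    of the signal repeats.

    Builds a frame-wise magnitude spectrogram, normalizes each frame,
    and scores each candidate lag by the mean cosine similarity between
    frames that are `lag` frames apart -- one simple way to locate a
    repeating pattern in the spectral representation.
    """
    n = len(signal) // frame
    spec = np.abs(np.fft.rfft(signal[:n * frame].reshape(n, frame), axis=1))
    spec /= np.linalg.norm(spec, axis=1, keepdims=True) + 1e-12
    scores = [np.mean(np.sum(spec[:-lag] * spec[lag:], axis=1))
              for lag in range(1, n // 2)]
    return int(np.argmax(scores)) + 1
```

A signal built by cycling four distinct frames therefore scores highest at lag 4, since frames four apart are identical while closer frames differ.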
- the aspects and features described in relation to a particular one of the previous examples may also be combined with one or more of the further examples to replace an identical or similar feature of that further example or to additionally introduce the features into the further example.
- Examples may further be or relate to a (computer) program including a program code to execute one or more of the above methods when the program is executed on a computer, processor or other programmable hardware component.
- steps, operations or processes of different ones of the methods described above may also be executed by programmed computers, processors or other programmable hardware components.
- Examples may also cover program storage devices, such as digital data storage media, which are machine-, processor- or computer-readable and encode and/or contain machine-executable, processor-executable or computer-executable programs and instructions.
- Program storage devices may include or be digital storage devices, magnetic storage media such as magnetic disks and magnetic tapes, hard disk drives, or optically readable digital data storage media, for example.
- Other examples may also include computers, processors, control units, (field) programmable logic arrays ((F)PLAs), (field) programmable gate arrays ((F)PGAs), graphics processor units (GPU), ASICs, integrated circuits (ICs) or SoCs programmed to execute the steps of the methods described above.
- FPLAs field programmable logic arrays
- FPGAs field programmable gate arrays
- GPU graphics processor units
- ASICs application specific integrated circuits
- SoCs systems-on-a-chip
- aspects described in relation to a device or system should also be understood as a description of the corresponding method.
- a block, device or functional aspect of the device or system may correspond to a feature, such as a method step, of the corresponding method.
- aspects described in relation to a method shall also be understood as a description of a corresponding block, a corresponding element, a property or a functional feature of a corresponding device or a corresponding system.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Auxiliary Devices For Music (AREA)
Abstract
A method for processing an audio file storing a music track is described. The method comprises performing source separation on the audio file to determine an audio signal of a sound source in the music track. Further, the method comprises determining times of occurrence of the sound source in the music track based on the determined audio signal of the sound source. The method further comprises extracting sound pieces from the determined audio signal of the sound source at the determined times of occurrence of the sound source. In addition, the method comprises grouping the extracted sound pieces into groups of similar sound pieces and determining a respective sound that is common to each sound piece in the respective group of similar sound pieces as a sound sample of the sound source.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP24162532 | 2024-03-11 | ||
| EP24162532.6 | 2024-03-11 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025190785A1 true WO2025190785A1 (fr) | 2025-09-18 |
Family
ID=90364839
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/EP2025/056195 Pending WO2025190785A1 (fr) | 2024-03-11 | 2025-03-06 | Appareil et procédé de traitement d'un fichier audio stockant une piste musicale et appareil et procédé de détermination d'un échantillon sous-jacent à une piste musicale |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025190785A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060075884A1 (en) * | 2004-10-11 | 2006-04-13 | Frank Streitenberger | Method and device for extracting a melody underlying an audio signal |
| EP2342708B1 (fr) * | 2008-10-15 | 2012-07-18 | Museeka S.A. | Procédé d'analyse d'un signal audio musical numérique |
| US9099064B2 (en) * | 2011-12-01 | 2015-08-04 | Play My Tone Ltd. | Method for extracting representative segments from music |
| CN110162660A (zh) * | 2019-05-28 | 2019-08-23 | 维沃移动通信有限公司 | 音频处理方法、装置、移动终端及存储介质 |
-
2025
- 2025-03-06 WO PCT/EP2025/056195 patent/WO2025190785A1/fr active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060075884A1 (en) * | 2004-10-11 | 2006-04-13 | Frank Streitenberger | Method and device for extracting a melody underlying an audio signal |
| EP2342708B1 (fr) * | 2008-10-15 | 2012-07-18 | Museeka S.A. | Procédé d'analyse d'un signal audio musical numérique |
| US9099064B2 (en) * | 2011-12-01 | 2015-08-04 | Play My Tone Ltd. | Method for extracting representative segments from music |
| CN110162660A (zh) * | 2019-05-28 | 2019-08-23 | 维沃移动通信有限公司 | 音频处理方法、装置、移动终端及存储介质 |
Non-Patent Citations (3)
| Title |
|---|
| CHEN KE ET AL: "Pac-HuBERT: Self-Supervised Music Source Separation Via Primitive Auditory Clustering And Hidden-Unit BERT", 2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS (ICASSPW), IEEE, 4 June 2023 (2023-06-04), pages 1 - 5, XP034389057, [retrieved on 20230802], DOI: 10.1109/ICASSPW59220.2023.10193575 * |
| JEONGSOO PARK ET AL: "Separation of Instrument Sounds using Non-negative Matrix Factorization with Spectral Envelope Constraints", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 12 January 2018 (2018-01-12), XP080852422 * |
| MO SHENTONG ET AL: "Semantic Grouping Network for Audio Source Separation", ARXIV.ORG, 1 July 2024 (2024-07-01), XP093277298, Retrieved from the Internet <URL:https://arxiv.org/pdf/2407.03736> [retrieved on 20250519] * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Xi et al. | GuitarSet: A Dataset for Guitar Transcription. | |
| CN112382257B (zh) | 一种音频处理方法、装置、设备及介质 | |
| Brossier | Automatic annotation of musical audio for interactive applications | |
| US6798886B1 (en) | Method of signal shredding | |
| JP6735100B2 (ja) | 音楽コンテンツ及びリアルタイム音楽伴奏の自動採譜 | |
| US8198525B2 (en) | Collectively adjusting tracks using a digital audio workstation | |
| US8153882B2 (en) | Time compression/expansion of selected audio segments in an audio file | |
| US9012754B2 (en) | System and method for generating a rhythmic accompaniment for a musical performance | |
| Salamon et al. | An analysis/synthesis framework for automatic f0 annotation of multitrack datasets | |
| Dixon | On the computer recognition of solo piano music | |
| US8831762B2 (en) | Music audio signal generating system | |
| US9263018B2 (en) | System and method for modifying musical data | |
| US9251773B2 (en) | System and method for determining an accent pattern for a musical performance | |
| Su et al. | Sparse modeling of magnitude and phase-derived spectra for playing technique classification | |
| CN108369800B (zh) | 声处理装置 | |
| JP2008250008A (ja) | 楽音処理装置およびプログラム | |
| Lerch | Software-based extraction of objective parameters from music performances | |
| US20090084250A1 (en) | Method and device for humanizing musical sequences | |
| Liang et al. | Musical Offset Detection of Pitched Instruments: The Case of Violin. | |
| Cuesta et al. | A framework for multi-f0 modeling in SATB choir recordings | |
| WO2025190785A1 (fr) | Appareil et procédé de traitement d'un fichier audio stockant une piste musicale et appareil et procédé de détermination d'un échantillon sous-jacent à une piste musicale | |
| Primavera et al. | Audio morphing for percussive hybrid sound generation | |
| CN113412513A (zh) | 音信号合成方法、生成模型的训练方法、音信号合成系统及程序 | |
| US20250299655A1 (en) | Generating musical instrument accompaniments | |
| Suruceanu | Automatic Music Transcription From Monophonic Signals |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 25709715 Country of ref document: EP Kind code of ref document: A1 |