WO2009022084A1

WO2009022084A1 - Method for automatically composing a personalized ring tone from a hummed voice recording and portable telephone implementing this method

Info

Publication number: WO2009022084A1
Application number: PCT/FR2008/051477
Authority: WO
Inventors: Olivier Lescurieux; Nicolas Delorme; Aymeric Zils
Original assignee: VOXLER
Current assignee: VOXLER
Priority date: 2007-08-10
Filing date: 2008-08-07
Publication date: 2009-02-19
Anticipated expiration: 2010-02-10
Also published as: EP2186315A1; FR2919975B1; FR2919975A1

Abstract

The invention relates essentially to a method of automatically composing a personalized ring tone (13) from a recording of a voice signal (5) sung or hummed by a user. In this method, analysis parameters (25), such as the pitch and/or loudness and/or attack of the notes of the voice signal, are extracted from the voice signal (5), and the voice signal (5) is transformed into a ring tone comprising at least one musical track. According to the invention, to transform the voice signal (5) into a ring tone, the voice signal is tuned by transposing the whole voice signal (5) by one and the same pitch interval so as to minimize a distance between the voice signal (5) as a whole and a tempered chromatic scale. The voice signal (5) is then tempered by replacing the notes of the transposed voice signal with tempered notes.

Description

Method for automatically composing a personalized ring tone from a recording of a hummed voice, and mobile telephone implementing this method

[0001]. The invention relates to a method and a device for automatically composing a personalized ring tone from a recording of a voice signal. The voice signal corresponds to a hummed voice. In the invention, this term covers a sung or spoken voice and various vocal noises, such as whistling, onomatopoeia, or "human beat boxing", which consists of vocally imitating a drum machine, scratching and other mainly percussive instruments.

[0002]. One aim of the invention is to build the ring tone automatically from the user's musical intention and from a type of music. The type of music is chosen by the user or imposed by the service provider, through the interface of a telephone or of any other multimedia device, such as a website. The musical intention is defined in particular by the notes and/or the rhythm and/or the timbre and/or the expressiveness of the user's voice (vibrato, dynamics, etc.).

[0003]. The invention can be implemented locally, for example on the mobile telephone, or remotely on a server accessed via a network.

The invention has a particularly advantageous application in creating ring tones for mobile telephones.

[0004]. Personalization is now a strong trend in the field of mobile telephones. The colors and shapes of telephones, and the presentation of their menus, are increasingly varied, so that the user can choose those that best fit his personality.

[0005]. For ring tones, the user is offered several prerecorded styles and can choose the one that best matches the music he likes. However, because ring tones are prerecorded and therefore fixed, personalization is limited to the ring tones offered by the telephone. Methods exist for downloading ring tones based on the tunes of well-known songs, but these ring tones are again limited to existing pieces of music.

[0006]. The invention increases the degree of personalization of ring tones. It proposes a method that lets the user create a ring tone himself from his voice, which is the spontaneous and expressive input par excellence. The user can thus create a ring tone that he imagined himself by humming it into his telephone, and the invention opens up unlimited possibilities for creating ring tones.

[0007]. More precisely, the invention seeks to transform the voice signal into a coherent ring tone. The aim is not necessarily to transcribe the voice with absolute fidelity. Rather, the method extracts the voice's musical intentions in order to turn the voice signal into music and/or to accompany it. This music may consist of one or more coherent musical tracks. The music and/or the accompaniment is constructed to respect rhythmic and/or melodic composition rules and/or arrangement rules.

[0008]. These musical composition rules include in particular:

- rhythmic rules, for example forbidding notes that are too short, or locking the music to a given tempo or rhythmic pattern, and/or

- melodic rules, for example tempering the notes and/or snapping them to a given scale, and/or completing an unfinished melodic phrase, and/or

- arrangement rules, which may be characteristic of a style. For example, several musical tracks can be generated from a monophonic signal, and the rhythmic and/or melodic and/or musical-style rules keep these tracks coherent with one another.

[0009]. In one example of the invention, the monophonic voice signal is first corrected where necessary. A set of musical tracks is then created from it, arranged with one another and preferably played by different instruments. These tracks reflect the musicality of the voice signal and the music style chosen by the user.

[00010]. To this end, in the invention, the user hums into the microphone of one of the following: his telephone, his computer connected to an Internet site, or any dedicated peripheral or terminal.

[00011]. Sound parameters are then extracted from the voice signal. These include the pitch, the intensity (or velocity, or volume of the voice), the attack (brief intensity peaks, or consonants separating voiced parts) and the timbre (in particular roughness and brightness). They also include various other expressivity parameters extracted from the voice, such as the rhythm, tempo, scale or harmony. In the invention, these parameters form part of the "analysis parameters".
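[00011] lists pitch and intensity among the analysis parameters but does not say how they are measured. The sketch below turns one analysis frame into a fractional MIDI pitch and an RMS intensity. It assumes A4 = 440 Hz tuning and uses a plain autocorrelation pitch detector, a textbook technique swapped in for illustration that is not necessarily the applicant's method. The function names and the 80 to 1000 Hz search range are also illustrative assumptions.

```python
import math

A4_HZ = 440.0  # reference tuning (assumption: A4 = 440 Hz)


def hz_to_midi(freq_hz: float) -> float:
    """Fractional MIDI pitch: 69.0 is A4 and one unit is one tempered semitone."""
    return 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)


def rms(frame):
    """Root-mean-square amplitude of one analysis frame, used as an intensity proxy."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Crude pitch estimate in Hz: the lag with the largest autocorrelation
    inside the [fmin, fmax] search range. Returns None if no positive peak."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return None if best_lag is None else sample_rate / best_lag
```

Applied frame by frame (for example 2048 samples at 16 kHz), these functions give the pitch and intensity curves that the tuning and tempering steps described below operate on.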

[00012]. The way the analysis parameters are used for ring tone synthesis is then controlled by a composition "style". This style is chosen by the user or imposed by the service provider or the device manufacturer. For example, an "RnB" style applies certain rules for correcting and/or composing and/or transforming the voice that are specific to this style of music, whereas a "jazz" style applies other rules. Some notes considered inappropriate in the "RnB" style may, for instance, be kept in the "jazz" style.

[00013]. The synthesized ring tone is then stored in one of the memories of the object for which it is intended. This object may be a mobile or fixed telephone, a computer when the invention is used with voice-over-IP applications, or a dedicated peripheral or terminal.

[00014]. "Terminal" means any device and/or software used to compose and/or store and/or listen to the ring tone. A terminal may be, for example, a mobile telephone, a fixed telephone, a computer or dedicated electronic equipment. The same terminal may be used throughout the process, or several terminals may be involved. For example, a computer may be used to compose the ring tone, that is, to hum it into a microphone. The same computer may be used to listen to the ring tone by streaming, without being able to load it onto the mobile telephone. The mobile telephone can then be used to store the ring tone, which is sent to the user only if, for example, he has paid the price associated with it. By extension, the term telephone or mobile telephone designates any terminal used to compose and/or listen to and/or store the ring tone.

[00015]. Preferably, the ring tone is synthesized by arranging sound layers in the form of musical tracks. These layers depend on the expressiveness of the user's voice and on the musical style chosen or imposed. They may come directly from audio processing of the voice, or from a virtual instrument playing a MIDI-type track derived from the voice.

[00016]. A distinction is made between creating MIDI-type ring tones and creating audio-type ring tones. A MIDI-type ring tone is a MIDI file made up of a set of MIDI tracks intended to be played by one or more virtual instruments available on the terminal. An audio-type ring tone is an audio file made up of a set of audio tracks. Each track may correspond to one or more MIDI tracks already rendered by a virtual instrument and/or to a transformation of the voice and/or to pre-existing audio loops. The audio ring tone is played by an audio player, so no virtual instrument is needed: the audio tracks can be played as they are. The MIDI format may be replaced by any symbolic representation of music, in particular the iMelody format.
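A MIDI-type ring tone, as described in [00016], is a MIDI file made of note events. The sketch below writes a monophonic melody as a single-track (format 0) Standard MIDI File using only the standard library. The `(note, start_beat, length_beats, velocity)` input layout, the function names and the default 480 ticks per beat are illustrative assumptions, and the iMelody alternative is not shown.

```python
import struct


def vlq(value: int) -> bytes:
    """MIDI variable-length quantity: 7 bits per byte, high bit set on all but the last."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def midi_file(notes, ticks_per_beat=480, channel=0, program=0):
    """notes: list of (midi_note, start_beat, length_beats, velocity).
    Returns a format-0 Standard MIDI File as bytes."""
    events = []  # (absolute tick, order, event bytes); order 0 = note-off first
    for pitch, start, length, vel in notes:
        on = round(start * ticks_per_beat)
        off = round((start + length) * ticks_per_beat)
        events.append((on, 1, bytes([0x90 | channel, pitch, vel])))
        events.append((off, 0, bytes([0x80 | channel, pitch, 0])))
    events.sort(key=lambda e: (e[0], e[1]))  # note-offs before note-ons at equal ticks
    track = bytearray(vlq(0) + bytes([0xC0 | channel, program]))  # program change
    last = 0
    for tick, _, data in events:
        track += vlq(tick - last) + data  # delta time, then the event itself
        last = tick
    track += vlq(0) + b"\xff\x2f\x00"  # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

An audio-type ring tone would instead render these events through a virtual instrument, or mix voice transformations and audio loops, and store the result as an audio file.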

[00017]. Each track is synthesized separately, using the musical style chosen by the user or imposed (for example a "jazz flute" ring tone style) and the full set of analysis parameters. The events of each MIDI track are built from the analysis parameters according to musical composition rules. The sound synthesis of the audio tracks uses the analysis parameters together with prerecorded or synthesized sounds, or transformations of the voice.

[00018]. The voice signal may be processed remotely on a server. The voice is recorded on the telephone and sent to the server, which turns it into a ring tone and sends it back to the telephone. Remote processing is easy for telephone operators to implement. They only have to install the ring tone composition software on a single platform (the server), and they can easily control access to this server by granting it in exchange for a fee.

[00019]. In a variant, the hummed voice is processed directly and locally on the telephone. In this case, the user may download the ring tone composition program in exchange for a fee. In this local implementation, the ring tone is composed almost instantaneously, since no audio file is exchanged between the telephone and the server.

[00020]. In a variant, the hummed voice is streamed directly from the telephone to the server and transformed into a ring tone there. The ring tone is sent back to the telephone as an audio or MIDI file, depending on the type of ring tone creation chosen.

[00021]. In a variant, when the invention is used over a network, the ring tone can first be streamed back for a preview. It is then sent as a file only once the user has paid the associated fee.

[00022]. Various fees may be charged. Examples include a fee for access to the service, a fee based on the time spent using the service, and a fee based on the number of attempts at composing and listening to the result. A fee may also be charged after listening, when the user decides to download or install the result as the ring tone of his mobile telephone. Counters are inserted at various points in the processing chain to allow this billing.

[00023]. The invention therefore relates to a method for automatically composing a ring tone from a recording of a monophonic voice signal, such as singing or humming by a user, in which:

- analysis parameters are extracted from the voice signal, such as the pitch and/or the intensity and/or the attack of the notes of the voice signal, and

- the voice signal is transformed into a ring tone comprising at least one musical track, characterized in that, to transform the voice signal into a ring tone:

- the voice signal is tuned by transposing the whole of said voice signal by one and the same pitch interval so as to minimize a distance between the voice signal as a whole and a tempered chromatic scale, and

- the voice signal is tempered by replacing the notes of the transposed voice signal with tempered notes.

[00024]. According to one implementation, to tune the voice signal:

- a central note of the voice signal is determined, corresponding to the most frequent note of the voice signal, and

- the whole voice signal is transposed globally so that the pitch of the central note matches the pitch of its nearest tempered note.

[00025]. According to one implementation, to temper the voice signal, the voice signal having previously been divided into segments:

- an associated note model is defined for each tempered note,

- in a loop, for each segment, the tempered note whose model is closest to the pitch of the segment is determined,

- said tempered note is assigned to the segment, and

- the model of said tempered note is updated to take the pitch of the segment into account. This can be done, for example, by averaging the pitches of the signal notes that have been associated with this tempered note, where appropriate weighting these pitches by their duration or intensity.

[00026]. According to one implementation, in the initial note model, each tempered note is modeled by a pitch of the chromatic scale tuned on the central note.

[00027]. According to one implementation, tempering starts from the last segment of the voice signal, in temporal order, and works back to the first.
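Together, [00025] to [00027] describe an adaptive nearest-model assignment. The sketch below starts with each note modeled by its own chromatic pitch, which assumes the signal has already been tuned so that this scale is aligned with the central note. Segments are processed from last to first, and each note's model is replaced by the weighted mean of the segment pitches assigned to it. The function name and the use of integer MIDI numbers for notes are assumptions.

```python
import math


def temper(segments, weights=None):
    """Assign a tempered MIDI note to each segment using adaptive note models.

    segments: fractional MIDI pitch of each segment, already tuned.
    weights:  optional per-segment weight (e.g. duration or intensity).
    """
    if weights is None:
        weights = [1.0] * len(segments)
    # Initial model: note n is modelled by pitch n of the chromatic scale.
    lo = math.floor(min(segments)) - 1
    hi = math.ceil(max(segments)) + 1
    model = {n: float(n) for n in range(lo, hi + 1)}
    sums = {n: 0.0 for n in model}    # weighted sum of pitches assigned to n
    totals = {n: 0.0 for n in model}  # total weight assigned to n
    assigned = [0] * len(segments)
    # Walk from the last segment back to the first.
    for i in reversed(range(len(segments))):
        pitch, w = segments[i], weights[i]
        note = min(model, key=lambda n: abs(model[n] - pitch))
        assigned[i] = note
        sums[note] += w * pitch
        totals[note] += w
        if totals[note] > 0:
            model[note] = sums[note] / totals[note]  # weighted mean of its segments
    return assigned
```

With segments `[60.0, 63.45, 63.6, 63.6]`, plain rounding would turn 63.45 into 63. Here, the two later 63.6 segments have already pulled the model of note 64 down to 63.6, so 63.45 is assigned to 64 like its neighbors.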

[00028]. According to one implementation, to transpose the voice signal, a tuning cost is calculated. This cost is the integral, over the duration of the melody of the voice signal, of the product of two factors. The first factor is the instantaneous difference between the pitch of the voice signal and the nearest tempered pitch, raised to the power p (p a strictly positive real). The second factor is the intensity of the voice signal raised to the power q (q a positive real). The voice signal is transposed so as to minimize this tuning cost.

[00029]. According to one implementation, p equals 2 and q equals 1.
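In discrete form, the tuning cost of [00028] is the sum over frames of |x + s - round(x + s)|^p · I^q · dt, where x is the frame pitch in semitones, I its intensity and s the transposition. The sketch below evaluates this sum and picks the best s by grid search, with p = 2 and q = 1 as in [00029]. Because the cost is periodic with a period of one semitone, only shifts in [-0.5, 0.5) need to be tried. The grid search and the 0.01-semitone step are assumptions, not taken from the text.

```python
import math


def tuning_cost(shift, pitches, intensities, dt=1.0, p=2.0, q=1.0):
    """Sum over frames of |pitch + shift - nearest tempered pitch|**p * intensity**q * dt."""
    total = 0.0
    for x, a in zip(pitches, intensities):
        y = x + shift
        dev = abs(y - math.floor(y + 0.5))  # distance to the nearest semitone
        total += (dev ** p) * (a ** q) * dt
    return total


def best_transposition(pitches, intensities, step=0.01, p=2.0, q=1.0):
    """Grid search over shifts in [-0.5, 0.5) semitone; the cost is 1-periodic."""
    n = int(round(1.0 / step))
    candidates = [-0.5 + k * step for k in range(n)]
    return min(candidates,
               key=lambda s: tuning_cost(s, pitches, intensities, p=p, q=q))
```

Weighting by intensity means that loud, stable notes drive the transposition while quiet transitions count for little. For example, with intensities `[10.0, 1.0]` on pitches `[60.0, 61.4]`, the optimum stays close to zero instead of splitting the difference.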

[00030]. According to one implementation, a scale is determined by the following steps:

- probabilities of occurrence of a note in a given scale are chosen,

- the degree of membership of the melody in several scales is calculated, this degree of membership depending on how well the notes of the voice signal agree with the probabilities of occurrence of the notes of each scale, and

- the scale with the highest degree of membership is selected.

[00031]. According to one implementation, the degree of membership in the scale is a sum over all the notes of the voice signal. Each term is the product of the duration of the note raised to the power p, the intensity of the note raised to the power q, and the probability of occurrence of the note raised to the power r. Here p, q and r are non-negative real numbers, and p is non-zero.

[00032]. According to one implementation, to temper the melody of the voice signal:

- probabilities of occurrence of a note in a given scale are chosen,

- an optimal transposition is calculated for each candidate scale. It depends on how well the tempered notes nearest to the voice signal agree with the probabilities of occurrence of the notes of the candidate scale. It is found by calculating a degree of membership of the melody in each candidate scale for every possible value of the transposition,

- the scale with the highest degree of membership is chosen,

- the notes of the voice signal are transposed by the optimal transposition associated with this scale, and

- the signal is tempered by replacing each note of the transposed voice signal with the nearest tempered note.

[00033]. According to one implementation, the degree of membership in the scale is an integral over all the notes of the melody. The integrand is the product of three factors: the intensity of each note raised to the power q; the deviation of the voice-signal note from the pitch lying between the two nearest tempered notes, raised to the power p; and the probability of occurrence of each note raised to the power r. Here p, q and r are non-negative real numbers, and p is non-zero.
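The membership degrees of [00031] and [00033] can be written compactly as below. The notation is ours: D_i and I_i are the duration and intensity of note i, x(t) and I(t) are the continuous pitch in semitones and the intensity, τ is the transposition, T is the length of the melody, and P_s is the occurrence probability in scale s. Reading "the pitch lying between the two nearest tempered notes" as their midpoint m(t) is an interpretation; the text does not state it explicitly.

```latex
% [00031]: discrete notes i with duration D_i, intensity I_i and tempered note n_i
M(s) = \sum_{i} D_i^{\,p} \, I_i^{\,q} \, P_s(n_i)^{\,r}

% [00033]: pitch x(t) in semitones, intensity I(t), transposition \tau
n(t) = \left\lfloor x(t) + \tau + \tfrac{1}{2} \right\rfloor , \qquad
m(t) = \left\lfloor x(t) + \tau \right\rfloor + \tfrac{1}{2}

M(s, \tau) = \int_{0}^{T} I(t)^{\,q} \, \bigl| x(t) + \tau - m(t) \bigr|^{\,p} \, P_s\bigl(n(t)\bigr)^{\,r} \, dt ,
\qquad (s^{*}, \tau^{*}) = \arg\max_{s,\,\tau} M(s, \tau)
```

Under this reading, the deviation |x + τ - m| lies between 0 and 1/2. It is largest when the transposed pitch falls exactly on a tempered note, so maximizing M(s, τ) favors the scale and transposition under which the melody sits closest to probable tempered notes, as [00032] requires.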

[00034]. According to one implementation, knowledge of the scale is used for tempering. Each note of the transposed voice signal is assigned to the nearest tempered note under a distance that is weighted by the probability of occurrence of that tempered note in the scale.
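[00034] does not specify how the distance is weighted. The sketch below assumes one simple weighting: the distance to each candidate note is divided by that note's occurrence probability raised to a power r. The less likely a note is in the scale, the closer the sung pitch must be for that note to be chosen. The probability table is a hypothetical C major example (in-scale notes 1.0, others 0.3), and the function name is illustrative.

```python
import math

# Hypothetical probabilities for C major by pitch class (0 = C): diatonic notes
# are favoured. A real system would use the profile of the selected scale.
C_MAJOR_PROBS = [1.0 if pc in (0, 2, 4, 5, 7, 9, 11) else 0.3 for pc in range(12)]


def weighted_temper(pitch, probs, tonic=0, r=1.0):
    """Return the tempered MIDI note n near `pitch` minimising
    |pitch - n| / P(n) ** r (assumed weighting)."""
    base = math.floor(pitch)
    candidates = range(base - 1, base + 3)  # neighbouring tempered notes
    return min(candidates,
               key=lambda n: abs(pitch - n) / probs[(n - tonic) % 12] ** r)
```

A pitch of 61.45 is nearest to C#, but in C major it is tempered to D (62) because C# is out of the scale. A pitch of 61.1 is close enough to C# to keep it.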

[00035]. According to one implementation, the probability of occurrence of a note in a given scale is determined from the probe-tone profiles of Krumhansl & Kessler.
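The scale selection of [00030] and [00031] can be sketched with the Krumhansl & Kessler probe-tone ratings mentioned in [00035]. The sketch makes three assumptions: the ratings are normalized so that they sum to 1 and can serve as occurrence probabilities, only the 24 major and minor scales are candidates, and p = q = r = 1. The function names are illustrative.

```python
# Krumhansl & Kessler probe-tone ratings, indexed by interval above the tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def to_probs(profile):
    """Assumption: turn ratings into occurrence probabilities by normalising."""
    total = sum(profile)
    return [v / total for v in profile]


def membership(notes, tonic, probs, p=1.0, q=1.0, r=1.0):
    """Degree of membership as in [00031]: sum of D**p * I**q * P(note)**r.
    notes: iterable of (tempered MIDI note, duration, intensity)."""
    return sum(d ** p * a ** q * probs[(n - tonic) % 12] ** r for n, d, a in notes)


def best_scale(notes):
    """Try the 24 major/minor scales and return (tonic name, mode, degree)."""
    best = None
    for mode, profile in (("major", KK_MAJOR), ("minor", KK_MINOR)):
        probs = to_probs(profile)
        for tonic in range(12):
            degree = membership(notes, tonic, probs)
            if best is None or degree > best[2]:
                best = (NAMES[tonic], mode, degree)
    return best
```

For a C major scale sung with long notes on C and G, `best_scale` returns C major. The relative minor and neighboring keys score lower because their profiles give less weight to the held notes.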

[00036]. According to one implementation, to transform the voice signal into a ringtone, notes are removed or merged if their duration is below a reference value (for example 1 ms), and/or their intensity is below a reference intensity, and/or their pitch-extraction quality is below a reference value.
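A note-cleaning pass along these lines is sketched below. The 1 ms duration threshold comes from the text. The intensity and quality thresholds, the `Note` fields, and the choice to merge a too-short note into the preceding one are all assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class Note:
    """Hypothetical note record produced by the analysis stage."""
    onset: float      # seconds
    duration: float   # seconds
    pitch: float      # fractional MIDI note number
    intensity: float  # normalised, 0..1
    quality: float    # pitch-extraction confidence, 0..1


def clean_notes(notes, min_duration=0.001, min_intensity=0.1, min_quality=0.5):
    """Drop weak or unreliable notes and merge very short notes into the preceding note."""
    cleaned = []
    for note in notes:
        if note.intensity < min_intensity or note.quality < min_quality:
            continue                          # remove: too quiet, or pitch estimate unreliable
        if note.duration < min_duration:
            if cleaned:                       # merge: extend the previous note over this one
                prev = cleaned[-1]
                prev.duration = note.onset + note.duration - prev.onset
            continue
        cleaned.append(replace(note))         # copy, so the caller's notes are not mutated
    return cleaned
```

Merging rather than deleting a very short fragment keeps the timeline continuous. This matters when the cleaned notes are later aligned rhythmically.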

[00037]. According to one implementation, to transform the voice signal into a ringtone, the voice signal, or a musical track already obtained from it, is corrected by cleaning it, and/or correcting its melody, and/or realigning it rhythmically, and/or deriving a melody from the voice signal.

[00038]. According to one implementation, to realign the voice signal rhythmically, the tempo of the voice signal is tracked and its notes are realigned to this tempo.
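A very simple stand-in for tempo tracking and realignment is sketched below. It is not the method used in the patent. It assumes that note onsets have already been detected and that the melody has roughly one note per beat. The beat period is estimated as the median inter-onset interval, and each onset is snapped to the nearest point of a regular grid anchored on the first onset. Function names are hypothetical.

```python
import statistics


def estimate_beat_period(onsets):
    """Median inter-onset interval, in seconds: a crude, outlier-tolerant tempo estimate."""
    intervals = [b - a for a, b in zip(onsets, onsets[1:]) if b > a]
    return statistics.median(intervals)


def quantize_onsets(onsets, beat_period, subdivisions=1):
    """Move each onset to the nearest point of a grid anchored on the first onset."""
    grid = beat_period / subdivisions
    origin = onsets[0]
    return [origin + round((t - origin) / grid) * grid for t in onsets]


onsets = [0.0, 0.52, 0.98, 1.51, 2.03]
period = estimate_beat_period(onsets)
print(round(60 / period), quantize_onsets(onsets, period))  # tempo in BPM, realigned onsets
```

Real tempo trackers follow tempo changes over time. The median keeps the sketch short while still ignoring an occasional early or late note.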

[00039]. According to one implementation, rhythmic correction is performed by imposing a fixed tempo, for example the average tempo extracted from the voice signal, the voice signal being aligned rhythmically to the imposed tempo by a time-stretching method.
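The factors that a time-stretching stage would need can be derived from the detected beat times, as sketched below. The imposed tempo is taken to be the average tempo of the voice. Each inter-beat segment is then stretched by the ratio of the target period to its actual duration. The stretching itself (for example a phase vocoder or WSOLA) is not shown, and the function name is hypothetical.

```python
def stretch_factors(beat_times):
    """Per-segment time-stretch factors that map the sung beats onto a fixed (average) tempo.

    Returns the imposed tempo in BPM and, for each inter-beat segment, the factor
    target_period / actual_period (> 1 lengthens the segment, < 1 shortens it).
    """
    intervals = [b - a for a, b in zip(beat_times, beat_times[1:])]
    target_period = sum(intervals) / len(intervals)
    return 60.0 / target_period, [target_period / iv for iv in intervals]


bpm, factors = stretch_factors([0.0, 0.5, 1.1, 1.5])
print(round(bpm), [round(f, 3) for f in factors])
```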

[00040]. According to one implementation, rhythmic correction is performed by rhythmic marking using a time-warping technique. In this technique, the strong beats of the voice signal are located in order to build a reference rhythm to which the musical tracks are synchronized.

[00041]. According to one implementation, melodic correction is performed by a pitch-shifting technique, in which the notes of the voice are realigned onto tempered notes and/or into the average scale of the hummed voice signal.
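For pitch shifting onto tempered notes, the quantity to compute per note is a frequency ratio. The sketch below assumes fractional MIDI pitches and equal temperament tuned to A 440. It returns the ratio that would be handed to a pitch-shifting algorithm; the shifting algorithm itself is not shown. Function names are hypothetical.

```python
def midi_to_hz(note, a4=440.0):
    """Frequency of a (possibly fractional) MIDI note number in equal temperament tuned to A 440."""
    return a4 * 2 ** ((note - 69) / 12)


def pitch_shift_ratios(pitches):
    """Frequency ratio that moves each sung note onto its nearest tempered note."""
    return [2 ** ((round(p) - p) / 12) for p in pitches]


ratios = pitch_shift_ratios([69.3, 60.0])
print([round(r, 4) for r in ratios], round(midi_to_hz(69.3) * ratios[0], 6))
```

A note 30 cents sharp of A 440 receives a ratio slightly below 1, which brings it back to 440 Hz. A note that is already tempered receives a ratio of exactly 1.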

[00042]. According to one implementation, to compose a new melody derived from the voice signal, notes of the voice signal and a rhythm are selected according to the extracted analysis parameters and to musical construction rules that depend on a musical style chosen by the user or imposed.

[00043]. According to one implementation, to compose a new melody derived from the voice signal, an original melody is generated from the analysis parameters. This melody is either a response to the vocal melody and/or vocal rhythm, so as to establish a dialogue with a machine, or a continuation that brings the vocal melody to a proper ending in the chosen style.

[00044]. According to one implementation, the timbre and/or pitch and/or other characteristics of the voice are modified, either by transforming the voice signal or by synthesizing a sound ambience from the voice signal.

[00045]. According to one implementation, to produce one or more musical tracks, one or more pre-recorded rhythm loops stored as audio signals are used and aligned to the tempo extracted from the voice signal.
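Aligning a pre-recorded loop to the extracted tempo comes down to two numbers: a duration scale factor, and how many repetitions are needed to cover the voice. The sketch below assumes that the loop's own tempo and its length in beats are known metadata. Function names are hypothetical.

```python
import math


def loop_stretch_ratio(loop_bpm, target_bpm):
    """Duration scale factor for a loop recorded at loop_bpm to play at target_bpm (< 1 = faster)."""
    return loop_bpm / target_bpm


def loop_repeats(voice_duration, loop_beats, target_bpm):
    """Number of loop repetitions needed to cover the whole voice recording."""
    loop_seconds = loop_beats * 60.0 / target_bpm
    return math.ceil(voice_duration / loop_seconds)


# A 90 BPM, 8-beat drum loop fitted to a 10-second melody hummed at 120 BPM.
print(loop_stretch_ratio(90, 120), loop_repeats(10.0, 8, 120.0))
```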

[00046]. According to one implementation, to create one or more musical tracks, musical samples are selected from a music database, choosing those whose musical parameters are closest to those of the voice signal over a given time interval.
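Selecting the "closest" sample can be sketched as a nearest-neighbour search. The feature vector used here (mean pitch, tempo, mean intensity), the per-feature scaling and the sample names are all hypothetical. The patent does not specify which musical parameters are compared or how.

```python
import math


def closest_sample(target, samples, scales):
    """Name of the sample whose feature vector is nearest the target (scaled Euclidean distance)."""
    def distance(features):
        return math.sqrt(sum(((a - b) / s) ** 2 for a, b, s in zip(target, features, scales)))
    return min(samples, key=lambda name: distance(samples[name]))


# Hypothetical database: (mean MIDI pitch, tempo in BPM, mean intensity) per sample.
library = {
    "bass_riff_a": (57.0, 100.0, 0.6),
    "bass_riff_c": (60.0, 120.0, 0.7),
    "pad_g": (67.0, 90.0, 0.3),
}
best = closest_sample((59.5, 118.0, 0.65), library, scales=(12.0, 40.0, 1.0))
print(best)
```

The scaling puts features with very different units (semitones, BPM, 0..1 intensities) on comparable footing. Without it, tempo would dominate the distance.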

[00047]. According to one implementation, the volumes of the different musical tracks are adjusted relative to one another, and/or effects such as saturation or compression are applied to selected tracks. Where appropriate, all the tracks are mixed into one output track, and/or global effects such as reverberation are applied to this output track.
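A minimal mixing stage is sketched below: per-track gains, an optional soft saturation on selected tracks, summation into one output track, and peak normalisation. It works on plain lists of float samples. The tanh saturation curve, the normalisation used as a crude limiter and the function name are assumptions. Compression and reverberation are not shown.

```python
import math


def mix_tracks(tracks, gains, saturated=(), drive=2.0):
    """Apply per-track gains, optional tanh saturation, sum, and normalise the peak to 1.0."""
    length = max(len(track) for track in tracks)
    out = [0.0] * length
    for index, (track, gain) in enumerate(zip(tracks, gains)):
        for i, sample in enumerate(track):
            value = sample * gain
            if index in saturated:
                value = math.tanh(drive * value) / math.tanh(drive)  # soft clipping
            out[i] += value
    peak = max(abs(x) for x in out)
    if peak > 1.0:
        out = [x / peak for x in out]  # crude master limiter on the output track
    return out


print(mix_tracks([[0.9, -0.9], [0.3, 0.3]], [1.0, 0.5], saturated=(0,)))
```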

[00048]. According to one implementation, an audio file in a format such as mp3 is created from the mixed output track.

[00049]. According to one implementation, the ringtone comprises several musical tracks arranged together according to the extracted analysis parameters and to rules of musical composition.

[00050]. According to one implementation, the rules of musical composition are tied to a musical style, such as rock or blues, chosen by the user.

[00051]. According to one implementation, the voice signal is streamed to a server that extracts the analysis parameters and produces the ringtone.

[00052]. According to one implementation, the musical tracks are obtained by MIDI and/or audio processing of the voice signal.

[00053]. The invention further relates to a mobile phone implementing the method according to the invention.

[00054]. The invention further relates to a method for associating ringtones with contacts stored in a mobile phone, in which:

- a phrase sung by the user of the mobile phone or by one of the contacts is recorded,

- the sung phrase is transformed into a ringtone using the method according to the invention, and
- the resulting ringtone is stored in a memory associated with the contact for which it is intended, so that when this contact calls, the corresponding ringtone is played by the phone.
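The contact-association steps can be sketched with a small lookup structure. It stores one generated ringtone per contact and falls back to a default ringtone for unknown callers. The class, its methods, and the use of phone numbers as contact keys are hypothetical. A real handset would use its own contact and ringtone APIs.

```python
class RingtoneBook:
    """Hypothetical per-contact ringtone store."""

    def __init__(self, default_ringtone):
        self.default_ringtone = default_ringtone
        self._by_contact = {}

    def assign(self, contact_id, ringtone_path):
        """Store the ringtone generated from a sung phrase for one contact."""
        self._by_contact[contact_id] = ringtone_path

    def ringtone_for(self, caller_id):
        """Ringtone to play for an incoming call, falling back to the default."""
        return self._by_contact.get(caller_id, self.default_ringtone)


book = RingtoneBook("default.mp3")
book.assign("+33600000000", "alice_hummed.mp3")
print(book.ringtone_for("+33600000000"), book.ringtone_for("+33611111111"))
```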

[00055]. The invention further relates to a real-time music generation device that implements the method according to the invention to generate a ringtone from a sung musical phrase. This device comprises means for playing the ringtone in a loop, so that the user can sing over it, or mix it with pieces of music to create musical tracks.

[00056]. The invention also relates to a method for producing a ringtone in which several voice lines are recorded in succession. While a new voice line is being recorded, the user may listen to the previously recorded voice lines, processed according to the method of the invention. Analysis parameters may be extracted from each recording or from all the recordings together.

[00057]. The invention will be better understood on reading the following description and examining the accompanying figures. These figures are given for illustration only and in no way limit the invention. They show:

[00058]. Figure 1: a schematic representation of the processing chain according to the invention for producing a ringtone from a user's voice signal;

[00059]. Figure 2: a schematic representation of the voice-signal analysis module according to the invention;

[00060]. Figure 3: a schematic representation of the MIDI and audio synthesis modules according to the invention;

[00061]. Figure 4: a graph of the amplitude of the user's voice signal as a function of time;

[00062]. Figure 5: a graph of a raw MIDI transcription of the voice signal;

[00063]. Figures 6-8: graphs of MIDI musical tracks obtained from the raw MIDI track of the voice signal after applying the processing method according to the invention;

[00064]. Figure 9: a graph of the rhythmic marking according to the invention, applied to the voice signal in order to synchronize the voice (transformed or not) rhythmically with the instrumental track or tracks;

[00065]. Figure 10: a graph of the audio signal of an RnB-style drum loop that can be aligned to the voice to accompany it;

[00066]. Figure 11: a graph of the raw voice signal and of the voice signal transposed by a tuning algorithm (optimal transposition of the melody);

[00067]. Figure 12: a graph of the optimally transposed voice signal and of the voice signal tempered by an algorithm for "tempering" the melody onto a tempered chromatic scale tuned to A 440;

[00068]. Figure 13.1: a graph of the tempered voice signal relative to a non-optimal A major scale;

[00069]. Figure 13.2: a graph of the tempered voice signal relative to an optimal A harmonic minor scale;

[00070]. Figure 14: a graph of the raw voice signal and of the voice signal transposed by a tuning algorithm that takes into account the probabilities of occurrence of the notes in a scale;

[00071]. Figure 15: a schematic representation of a simplified processing chain of the invention;

[00072]. Figure 16: a graph of the pitch of a voice signal as a function of time, on which a tuning grid has been overlaid;

[00073]. Figure 17: a histogram of the pitches of the sung notes over the semitone-wide bands of the tuning grid;

[00074]. Figure 18: a representation of the initial model of the tempered notes, spaced a semitone apart and tuned on the central note;

[00075]. Figure 19: a representation of the pitches of the tempered-note models after a first analysis of the last sung notes;

[00076]. Figure 20: a representation of the final pitches of the tempered-note models after a complete analysis of the voice signal.

[00077]. Identical elements keep the same reference from one figure to another.

[00078]. Figure 1 shows a schematic representation of a processing chain 1 for automatically producing a telephone ringtone 13 from a voice signal 5 of a user. This user 2 interacts with an interface 3, such as a mobile phone, a computer or a terminal, which is connected to a server 4 via a network, for example the Internet. In this implementation, the user's voice signal 5 is processed on the server 4.

[00079]. More precisely, the user 2 defines, via the interface 3, the input parameters of the method according to the invention. To this end, the user produces a voice signal 5 by humming into the phone, and chooses the music style 6 in which the ringtone 13 will be produced. Choosing the music style is optional; the style may instead be imposed on the user.

[00080]. The interface 3 comprises a microphone 9 capable of capturing the user's voice signal 5 and, where applicable, interfaces 10 allowing the user to choose the music style. It may also include a loudspeaker to accompany the user musically while singing, if required. The interface 3 further comprises a memory capable of storing the ringtone 13.

[00081]. As a variant, the ringtone 13 can be stored on a terminal other than the one used to create it. For example, the ringtone can be created on a computer and then stored on a mobile phone.

[00082]. Furthermore, the server 4 comprises an analysis module 15 and a synthesis module 17.

[00083]. The module 15 extracts the analysis parameters, that is, the physical and musical parameters of the voice signal 5, such as:
- the pitch, extracted for example by autocorrelation or by the algorithm described in the French patent document of France Telecom with national registration number 01 07284;
- the intensity, extracted for example from the energy of the voice signal or by the algorithm described in the same France Telecom document 01 07284;
- the pitch-detection quality, which characterizes how reliable the pitch estimate is, extracted for example by the algorithm described in the France Telecom document 01 07284;
- the attacks, extracted for example by an HFC (High Frequency Content) algorithm that uses the high-frequency content of the voice signal, or from the pitch-detection quality, for example by treating as an attack any sound whose pitch is estimated with poor quality;
- the vowels, characterized for example by an algorithm using spectral centroids;
- the consonants, characterized for example by a model combining HFC and spectral centroids;
- the timbre of the voice, characterized for example by an algorithm using MFCCs (Mel Frequency Cepstral Coefficients);
- various vocal noises such as "beat box" sounds, also extracted for example by an algorithm using MFCCs.
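Three of the low-level parameters (pitch by autocorrelation, a detection-quality score, and an energy-based intensity) can be sketched for one analysis frame. This is not the France Telecom algorithm cited above. The confidence measure r(lag)/r(0) and the 80-500 Hz search range are assumptions, and the function name is hypothetical.

```python
import math


def autocorrelation_pitch(frame, sample_rate, f_min=80.0, f_max=500.0):
    """Return (f0 in Hz, pitch confidence r(lag)/r(0), RMS intensity) for one frame."""
    n = len(frame)

    def r(lag):
        # Unnormalised autocorrelation at the given lag.
        return sum(frame[i] * frame[i + lag] for i in range(n - lag))

    energy = r(0)
    if energy == 0.0:
        return 0.0, 0.0, 0.0                     # silent frame: no pitch, no confidence
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), n - 1)
    best_lag = max(range(min_lag, max_lag + 1), key=r)
    rms = math.sqrt(energy / n)                  # intensity derived from the frame energy
    return sample_rate / best_lag, r(best_lag) / energy, rms


sr = 8000
tone = [math.sin(2 * math.pi * 200.0 * t / sr) for t in range(1024)]
f0, quality, rms = autocorrelation_pitch(tone, sr)
print(round(f0, 1), round(quality, 2), round(rms, 3))
```

Because the sum is not normalised by the number of overlapping samples, it slightly favours shorter lags. This helps avoid picking a multiple of the true period (an octave error). A low confidence value can also serve as the attack cue mentioned above.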

[00084]. "Intensity" means either absolute or normalized intensity. A normalized intensity is one that has been scaled, for example, relative to the highest intensity detected in the hummed melody.

[00085]. The module 17 synthesizes the ringtone. That is, it corrects, and/or cleans, and/or transforms, and/or orchestrates, and/or rhythmically realigns the voice signal 5, according to the analysis parameters extracted by the module 15 and to the musical style chosen by the user or imposed.

[00086]. Thus, when the user hums, the phone 3 captures the user's voice 5 with the microphone 9 and sends it to the server 4 as an audio file 14, for example in mp3 or wav format.

[00087]. The server 4 receives the file 14 and extracts from it the analysis parameters 25, such as the pitch, intensity and/or attacks of the voice signal 5. The analysis parameters 25 extracted from the voice signal 5, together with the voice signal 5 itself where needed, are then transmitted to the synthesis module 17.

[00088]. In addition, parameters 16 of the musical style chosen by the user or imposed, called "style parameters", are transmitted to the synthesis module 17. From these style parameters 16, the synthesis module 17 establishes in particular the rules of musical composition to be applied to the voice signal 5, for example correction, and/or cleaning, and/or transformation, and/or orchestration rules, so as to obtain music with the characteristics of the chosen style.

[00089]. According to the extracted analysis parameters 25 and the style parameters 16, the module 17 transforms the voice signal 5 into one or more sound tracks. These may be, for example, one or more MIDI tracks derived from the voice and played by different virtual instruments, and/or one or more audio tracks derived directly from the voice. Where appropriate, the tracks are then mixed together to obtain the ringtone 13 of the mobile phone.
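Deriving a MIDI track from the transcribed voice amounts to emitting Note On / Note Off messages. The sketch below uses the standard 3-byte MIDI channel messages (status 0x9n for Note On, 0x8n for Note Off). The mapping from normalised intensity to velocity and the note tuple layout are assumptions. Writing a Standard MIDI File (delta times, track chunks) is not shown, and the function name is hypothetical.

```python
def notes_to_midi_events(notes, channel=0):
    """Turn (onset_s, duration_s, midi_note, intensity 0..1) notes into timed MIDI messages.

    At equal times, note-offs are sorted before note-ons so that repeated notes retrigger cleanly.
    """
    events = []
    for onset, duration, note, intensity in notes:
        velocity = max(1, min(127, round(intensity * 127)))  # velocity 0 would act as a note-off
        events.append((onset, bytes([0x90 | channel, note, velocity])))
        events.append((onset + duration, bytes([0x80 | channel, note, 0])))
    events.sort(key=lambda event: (event[0], event[1][0] & 0xF0 == 0x90))
    return events


for time_s, message in notes_to_midi_events([(0.0, 0.5, 60, 1.0), (0.5, 0.5, 62, 0.5)]):
    print(time_s, message.hex())
```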

[00090]. As a variant, the file containing the voice signal 5 is recorded on the remote server 4. In that case, the voice signal 5 can be recorded on the server 4 by streaming.

[00091]. In real-time or streaming operation, the analysis parameters can be extracted instantly, or more precisely at the end of each signal observation window.

[00092]. As a variant, the signal is processed locally by the mobile phone 3.

[00093]. As a variant, the signal is processed partly locally (for example, to compute the Fourier transform of the voice signal) and partly on the server, in order to reduce the CPU load on the server. In this case, the voice signal and the result of the FFT, or of any other calculation performed locally, are transmitted to the server, which uses these data to transform the voice signal into a ringtone.

[00094]. Figure 2 shows a schematic representation of the module 15 according to the invention, which analyzes the voice signal 5 captured by the microphone 9 (this signal is shown in Figure 4). The module 15 performs a low-level analysis of the voice signal 5 via the module 21 and possibly the module 22, and possibly a higher-level analysis via the module 23.

[00095]. The low-level analysis is performed locally and almost instantaneously, since it concerns the sound just uttered, or a short time window of, for example, 10 ms. The high-level analysis, by contrast, is performed using the low-level parameters. It is a global analysis of the voice signal carried out a posteriori over several seconds of the voice signal 5, or even over the whole signal.
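The short-window part of the low-level analysis can be sketched as framing the signal into consecutive 10 ms windows and computing one intensity value per window. The intensities can then be normalised to the loudest frame, as in the definition of normalised intensity in [00084]. Non-overlapping windows, RMS as the intensity measure, and dropping the trailing partial window are assumptions. Function names are hypothetical.

```python
import math


def frame_rms(signal, sample_rate, window_s=0.010):
    """Split the signal into consecutive windows (10 ms by default) and return each window's RMS."""
    size = max(1, int(round(sample_rate * window_s)))
    frames = [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]
    return [math.sqrt(sum(x * x for x in frame) / size) for frame in frames]


def normalise(values):
    """Scale intensities so that the loudest frame is 1.0."""
    peak = max(values, default=0.0)
    return [v / peak for v in values] if peak > 0 else list(values)


# At 1 kHz a 10 ms window holds 10 samples; the final 5 samples form an incomplete window.
signal = [0.5] * 10 + [1.0] * 10 + [0.0] * 5
print(frame_rms(signal, 1000), normalise([1.0, 2.0]))
```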

[00096]. More precisely, during the low-level analysis, the module 21 extracts instantaneous parameters such as the pitch and/or intensity of the voice signal. These parameters allow the module 22 in particular to segment the voice signal into sound events (that is, for example, to determine the instant at which each note was sung and its duration) and/or to classify these sound events, that is, to associate each event with a class (the classes may be, for example, the different instruments of a drum kit or the different hummed notes). Events of the vocal stream are objects that have a rhythmic and/or melodic meaning, such as notes, syllables or phonemes.
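Segmentation into note events from a frame-by-frame pitch track can be sketched as grouping consecutive frames that share the same nearest tempered note. This is not the method of the French application cited in [000100]. The 10 ms hop, the use of `None` for unvoiced frames, the three-frame minimum and the function name are assumptions. Classification of the events is not shown.

```python
def segment_notes(frame_pitches, hop_s=0.010, min_frames=3):
    """Group consecutive frames with the same nearest tempered note into (onset, duration, note) events.

    frame_pitches holds one fractional MIDI pitch per analysis frame, or None when no pitch
    was detected; runs shorter than min_frames are discarded as noise.
    """
    events = []
    start, current = 0, None
    for i, pitch in enumerate(list(frame_pitches) + [None]):  # trailing None flushes the last run
        label = None if pitch is None else round(pitch)
        if label != current:
            if current is not None and i - start >= min_frames:
                events.append((start * hop_s, (i - start) * hop_s, current))
            start, current = i, label
    return events


frames = [None, 60.1, 59.9, 60.2, 60.0, 62.1, 61.8, 62.0, None, 64.0]
print(segment_notes(frames))
```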

[00097]. For example, the module 21 extracts only the pitch, which is then the only parameter used by the modules 22 and/or 23. As a variant, the module 21 extracts both the pitch and the intensity of the voice signal. The intensity may, for example, be used to control the intensity of the sounds to be synthesized.
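As a minimal sketch of the low-level intensity parameter, assuming a mono signal given as a list of floats and non-overlapping 10 ms frames (the window length mentioned in [00095]), the intensity of each frame can be taken as its RMS level. The function name `frame_intensity` and the choice of RMS are illustrative assumptions, not the method of the source.

```python
import math

def frame_intensity(samples, sample_rate, frame_ms=10.0):
    """Split a mono signal into non-overlapping frames of `frame_ms`
    milliseconds and return the RMS level of each complete frame."""
    frame_len = max(1, int(round(sample_rate * frame_ms / 1000.0)))
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

# 0.1 s of a 440 Hz tone at 8 kHz followed by 0.1 s of silence.
sr = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(800)]
levels = frame_intensity(tone + [0.0] * 800, sr)
print(len(levels))  # 20 frames of 10 ms (80 samples each)
```

A trailing partial frame is simply dropped here; a real-time front end would more likely buffer it until the next block of samples arrives.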

[00098]. As a variant, the module 21 can extract spectral parameters of the voice signal (based on a Fourier transform or on MFCCs of this signal), which make it possible in particular to characterize vocal expressiveness (in terms of timbre, phonemes, etc.).

[00099]. In this case, for example for rhythmic applications, the module 21 does not necessarily extract the pitch; instead it extracts spectral parameters such as the high frequency content (HFC), used to identify the attacks and derive an average tempo, or the first 13 MFCCs (Mel Frequency Cepstral Coefficients), used to control a drum machine by voice (Vocal BeatBoxing).
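The HFC of a frame is commonly defined as the sum over DFT bins k of k·|X[k]|², so that noisy, percussive attacks score much higher than steady sounds. The sketch below assumes that definition, uses a naive DFT to stay dependency-free, and flags an attack when a frame's HFC exceeds a hypothetical factor `ratio` times that of the previous frame; the threshold rule and the function names are assumptions for illustration.

```python
import cmath
import math

def hfc(frame):
    """High Frequency Content of one frame: sum over the non-negative DFT
    bins k of k * |X[k]|^2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):
        x_k = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        total += k * abs(x_k) ** 2
    return total

def detect_attacks(samples, frame_len, ratio=4.0, floor=1e-6):
    """Return (attack_frame_indices, hfc_values). A frame is an attack when
    its HFC exceeds `ratio` times the previous frame's HFC plus `floor`
    (the floor keeps numerical noise in silence from triggering)."""
    values = [hfc(samples[i:i + frame_len])
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    attacks = [i for i in range(1, len(values))
               if values[i] > ratio * values[i - 1] + floor]
    return attacks, values

frame_len = 32
burst = [0.5 * (-1) ** m for m in range(frame_len)]   # all energy at Nyquist
signal = [0.0] * (3 * frame_len) + burst + [0.0] * frame_len
attacks, values = detect_attacks(signal, frame_len)
print(attacks)  # [3]
```

In practice an FFT library would replace the naive DFT, and the frames would overlap; the detection logic is unchanged.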

[000100]. Furthermore, from the instantaneous parameters extracted by the module 21, the module 22 produces a low-level musical description, which consists in segmenting the hummed voice signal into sound events and/or classifying these sound events, as described for example in French patent application no. 0653557.

[000101]. To this end, the module 22 detects the attacks in the voice signal and/or performs a segmentation so as to identify the different events of the vocal stream. The sung notes are deduced by the module 22 from the frequencies of the notes in the signal and from the positions of the measured attacks. The classification consists in determining the class to which each event belongs. For example, for Vocal BeatBoxing, each sound is associated with one of the percussive instruments of a drum kit.

[000102]. Where appropriate, the module 22 also analyzes the expressiveness of the voice, in particular by detecting any legato and/or tremolo present in it.

[000103]. From the parameters extracted by the module 21 and, where applicable, by the module 22, the module 23 performs a global analysis of the voice signal, called the high-level musical description.

[000104]. To this end, the module 23 may in particular determine a tempered transposition of the melody ("tuning") and, where appropriate, the scale ("harmony") and/or the tempo ("rhythm") in which the voice signal is hummed.

[000105]. To this end, the module 23 can perform a rhythmic analysis. The rhythm can be characterized in particular by its tempo, measured in bpm. The tempo is deduced by locating the temporal positions of the vocal events (notes, syllables or phonemes). In a conventional implementation, the tempo can be extracted from the autocorrelation of the voice signal. In another implementation, the tempo can be extracted by a beat-tracking algorithm such as the one proposed by Eric Scheirer (Eric D. Scheirer, "Tempo and beat analysis of acoustic musical signals", J. Acoust. Soc. Am., 103(1), 1998).
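The autocorrelation approach can be sketched as follows, under the assumption (not stated in the source) that the autocorrelation is taken on an onset envelope sampled at a fixed frame rate, for example one value per 10 ms frame, rather than on the raw waveform. The lag with the highest autocorrelation inside a plausible tempo range gives the beat period; `min_bpm` and `max_bpm` are hypothetical bounds.

```python
def tempo_from_autocorrelation(envelope, frame_rate, min_bpm=60.0, max_bpm=200.0):
    """Estimate a tempo in bpm as the lag (in frames) that maximizes the
    autocorrelation of a mean-removed onset envelope."""
    mean = sum(envelope) / len(envelope)
    x = [v - mean for v in envelope]
    min_lag = max(1, int(round(frame_rate * 60.0 / max_bpm)))
    max_lag = min(len(x) - 1, int(round(frame_rate * 60.0 / min_bpm)))
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frame_rate / best_lag

# One onset every 0.5 s, sampled at 100 frames per second, over 5 s.
envelope = [1.0 if i % 50 == 0 else 0.0 for i in range(500)]
print(tempo_from_autocorrelation(envelope, 100.0))  # 120.0
```

Because multiples of the beat period also correlate well, the bpm bounds matter: they are what keeps this sketch from reporting half or double the tempo.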

[000106]. To perform the tuning, the module 23 determines a global transposition that transcribes the melody optimally into the tempered chromatic scale (A440 tuning), which can be played by VST instruments tuned to A440. This optimization depends on criteria such as frequency proximity to the original melody, and can be weighted by other parameters such as intensity. An example of tuning is shown in Figure 11. For a chromatic scale tempered in semitones, it suffices to search for the optimal transposition between -1/4 tone and +1/4 tone, since any larger shift differs from a shift in this interval by a whole number of semitones.

[000107]. For example, the transposition can be chosen so as to modify the high-intensity melodic passages as little as possible, which mathematically amounts to minimizing a "tuning cost". An example of tuning cost is given by the formula:

$$\mathrm{Cost}(\mathrm{Transpo}) = \int_{t_0}^{t_{\mathrm{final}}} \left|\Delta\big(h(t) + \mathrm{Transpo}\big)\right|^{p} \cdot \mathrm{intensity}(t)^{q}\,dt, \qquad p > 0,\ q \geq 0,$$

with p and q real, where h(t) is the pitch of the melody at time t; Transpo is the applied pitch shift, whose optimal value is sought between -1/4 and +1/4 tone for a chromatic scale tempered in semitones; Δ is the instantaneous difference between a pitch and the nearest tempered pitch; intensity(t) is the intensity of the melody at time t; and t_0 and t_final denote the beginning and the end of the recording.

[000108]. The reference implementation uses this formula with p = 2 and q = 1, that is:

$$\mathrm{Cost}(\mathrm{Transpo}) = \int_{t_0}^{t_{\mathrm{final}}} \Delta\big(h(t) + \mathrm{Transpo}\big)^{2} \cdot \mathrm{intensity}(t)\,dt.$$
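A discrete version of this cost can be minimized by a simple grid search. This sketch assumes that pitches are expressed as fractional MIDI note numbers (A440 = 69), so that Δ is measured in semitones and the search interval of -1/4 to +1/4 tone becomes -0.5 to +0.5 semitone; the integral is replaced by a sum over analysis frames, and the grid resolution is an arbitrary choice.

```python
import math

def delta(pitch):
    """Signed distance, in semitones, from a pitch (fractional MIDI note
    number) to the nearest tempered semitone."""
    return pitch - math.floor(pitch + 0.5)

def tuning_cost(pitches, intensities, transpo, p=2.0, q=1.0):
    """Discrete Cost(Transpo): sum over frames of
    |delta(h + Transpo)|^p * intensity^q (constant frame period omitted)."""
    return sum(abs(delta(h + transpo)) ** p * a ** q
               for h, a in zip(pitches, intensities))

def best_transposition(pitches, intensities, steps=101, p=2.0, q=1.0):
    """Grid search of Transpo over [-0.5, +0.5] semitone (-1/4 to +1/4 tone)."""
    grid = [-0.5 + i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda t: tuning_cost(pitches, intensities, t, p, q))

# A, B, C, D sung 30 cents flat, all at the same intensity.
pitches = [68.7, 70.7, 71.7, 73.7]
t = best_transposition(pitches, [1.0] * len(pitches))
print(round(t, 2))  # 0.3
```

With p = 2 and q = 1 (the reference values), loud frames dominate the sum, so quiet, badly tuned passages barely move the chosen transposition.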

[000109]. It should be noted that the choice of tempered pitches (for example, the chromatic scale at A440) depends on the chosen music theory; a quarter-tone scale or a scale with variable intervals may, for example, be chosen for non-Western music theories.

[000110]. Once the optimal transposition has been applied, the optimally transposed melody is "tempered", that is, its instantaneous pitch is tuned to the most probable tempered pitch. In a simple example, shown in Figure 12, the most probable tempered pitch is the one closest in frequency to the pitch of the melody.
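In the simple case of Figure 12, tempering reduces to rounding each transposed pitch to the nearest semitone. The sketch below assumes pitches are fractional MIDI note numbers, so that the resulting integers map directly onto A440 equal-temperament frequencies; the helper names are hypothetical.

```python
import math

def temper(pitches, transpo):
    """Apply the global transposition, then snap each instantaneous pitch
    (fractional MIDI number) to the nearest tempered semitone."""
    return [math.floor(h + transpo + 0.5) for h in pitches]

def midi_to_hz(note):
    """Equal-tempered frequency with A4 = MIDI 69 = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

notes = temper([68.7, 70.7, 71.6], 0.3)
print(notes)  # [69, 71, 72]
```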

[000111]. As a variant, the sung notes can be tempered without optimizing the transposition. For example, each sung note can be transposed or tuned instantaneously to the nearest or most probable note in the Western chromatic scale at A440 Hz. This instantaneous tuning can be performed by the module 22.

[000112]. As a variant, more sophisticated tempering rules can be implemented, taking into account, for example, the melodic contour, the presence of attacks, or the proximity of a tempered note (how in tune the note is). For example, all the notes of the melody, or of the optimally transposed melody, that lie within 1/16 tone of an in-tune note can be tuned to the nearest in-tune note, while a more sophisticated strategy is adopted for the other notes (wrong notes). The value of 1/16 tone is given here only as an example; any other rule for assigning notes to the tempered note just below or just above may be chosen.
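The 1/16-tone rule can be sketched at note level, assuming each segmented note is summarized by its mean pitch as a fractional MIDI number. Since 1/16 tone equals 1/8 semitone, the tolerance is 0.125; notes outside it are returned as `None` to mark them as wrong notes for a later strategy. The function name is hypothetical.

```python
import math

SNAP_TOLERANCE = 0.125  # 1/16 tone = 1/8 semitone, the example threshold of [000112]

def classify_notes(note_pitches, tolerance=SNAP_TOLERANCE):
    """For each note (mean pitch as a fractional MIDI number), return the
    nearest tempered note if it lies within `tolerance` semitones of it,
    otherwise None to flag a wrong note."""
    result = []
    for h in note_pitches:
        nearest = math.floor(h + 0.5)
        result.append(nearest if abs(h - nearest) < tolerance else None)
    return result

print(classify_notes([60.05, 61.4, 62.9]))  # [60, None, 63]
```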

[000113]. Wrong notes can, for example, be concatenated with the previous note (the note that precedes them in time) or with the next note. Concatenating here means replacing them with the previous or the next note. A wrong note is concatenated with the next note, for example, if it is preceded by an attack and the next note is not, which suggests that the two form one and the same note. A wrong note is concatenated with the previous note, for example, if it is not preceded by an attack but is followed by one. The remaining wrong notes are then split, for example, with the first half concatenated to the previous note and the second half to the next note.
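The example rules above can be written as a small pass over the segmented notes. This sketch assumes each note carries its onset, duration, tempered MIDI pitch (`None` for a wrong note) and an attack flag from the segmentation; the `Note` class and `resolve_wrong_notes` are illustrative names, and concatenation is implemented literally as replacing the pitch.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Note:
    start: float            # onset time in seconds
    duration: float         # duration in seconds
    pitch: Optional[int]    # tempered MIDI note, or None for a wrong note
    attack: bool            # True if an attack was detected at the onset

def resolve_wrong_notes(notes: List[Note]) -> List[Note]:
    """Replace each wrong note by the previous or next note, following the
    attack-based example rules. A pitch stays None only when neither
    neighbour has a pitch."""
    out: List[Note] = []
    for i, n in enumerate(notes):
        if n.pitch is not None:
            out.append(n)
            continue
        prev_pitch = out[-1].pitch if out else None
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        next_pitch = nxt.pitch if nxt is not None else None
        next_attack = nxt.attack if nxt is not None else False
        if n.attack and nxt is not None and not next_attack and next_pitch is not None:
            # Attack here, none on the next note: both are one and the same note.
            out.append(Note(n.start, n.duration, next_pitch, n.attack))
        elif not n.attack and next_attack and prev_pitch is not None:
            # No attack here, attack on the next note: continuation of the previous note.
            out.append(Note(n.start, n.duration, prev_pitch, n.attack))
        else:
            # Other cases: first half to the previous note, second half to the next.
            half = n.duration / 2.0
            first = prev_pitch if prev_pitch is not None else next_pitch
            second = next_pitch if next_pitch is not None else prev_pitch
            out.append(Note(n.start, half, first, n.attack))
            out.append(Note(n.start + half, half, second, False))
    return out

melody = [Note(0.0, 0.5, 60, True), Note(0.5, 0.2, None, True),
          Note(0.7, 0.3, 62, False), Note(1.0, 0.2, None, False),
          Note(1.2, 0.5, 64, True)]
print([n.pitch for n in resolve_wrong_notes(melody)])  # [60, 62, 62, 62, 64]
```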

[000114]. Then, where appropriate, the optimal scale in which the melody fits is determined from among a set of scales (for example, the major and minor diatonic and pentatonic scales).

How well the melody fits a scale may depend, for example, on the number of melody notes that belong to the scale, on their duration, on their intensity, and/or on the probabilities of the notes in the scale. An example of determining the optimal scale is shown in Figures 13.1 (non-optimal scale, A major) and 13.2 (optimal scale, A minor). The bold lines represent the notes of the scale, while the thin lines represent the notes outside the scale. Comparing Figures 13.1 and 13.2 shows that the hummed melody matches the notes of the A minor scale better than those of the A major scale.

[000115]. The probability of a note in a scale means the probability that the note appears in a melody sung in that scale. Each note of the tempered scale is then assigned a coefficient equal to its probability in the scale. These coefficients must then be normalized to form a probability in the strict sense, that is, so that they sum to 1. The probe-tone profiles of Krumhansl & Kessler (1982) give a good example of the probability of occurrence of a note in a major or minor key, but any other distribution of these probabilities may be chosen, in particular depending on the chosen musical style.

[000116]. As a variant, the coefficient associated with the first and/or last note can be increased to give these notes more weight, since they are highly likely to be the tonic of the optimal scale.
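The normalization step can be sketched with the Krumhansl & Kessler probe-tone ratings as they are commonly tabulated (index 0 is the tonic). Rotating a profile onto a given tonic and dividing by its sum yields P(G, pitch class) for all 12 pitch classes; the function name and the pitch-class convention (0 = C) are assumptions of this sketch.

```python
# Krumhansl & Kessler (1982) probe-tone ratings, index 0 = tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def scale_probabilities(tonic, mode):
    """Return P(G, pc) for the 12 pitch classes (0 = C) of key G = (tonic, mode):
    the profile is rotated so that its index 0 falls on the tonic, then divided
    by its sum so that the 12 coefficients add up to 1."""
    profile = KK_MAJOR if mode == "major" else KK_MINOR
    total = sum(profile)
    return [profile[(pc - tonic) % 12] / total for pc in range(12)]

a_minor = scale_probabilities(9, "minor")   # A = pitch class 9
print(max(range(12), key=lambda pc: a_minor[pc]))  # 9: the tonic A is the most probable
```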

[000117]. For example, the scale can be chosen so as to keep as many high-intensity notes as possible inside the scale, which mathematically amounts to choosing the scale that maximizes a harmonic "quality". This quality can be computed by the formula:

$$\mathrm{Quality}(G) = \sum_{\mathrm{notes}} \mathrm{duration}(\mathrm{note})^{p} \cdot \mathrm{intensity}(\mathrm{note})^{q} \cdot P(G, \mathrm{note})^{r}, \qquad p > 0,\ (q, r) > 0,$$

where G is the candidate scale, duration(note) is the duration of the note, intensity(note) is the intensity of the note, and P(G, note) is the probability of the note in the scale G.

[000118]. The reference implementation uses this formula with p = q = r = 1 and, for P, the Krumhansl & Kessler probe-tone profiles, that is:

$$\mathrm{Quality}(G) = \sum_{\mathrm{notes}} \mathrm{duration}(\mathrm{note}) \cdot \mathrm{intensity}(\mathrm{note}) \cdot P(G, \mathrm{note}).$$

[000119]. As a variant, a scale can be imposed on the hummed notes without searching for an optimal scale. For example, each sung note can be tuned instantaneously to the nearest note in the A major scale.
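The scale search can be sketched by evaluating Quality(G) for the 24 major and minor keys and keeping the best. This assumes notes are already tempered and given as (MIDI pitch, duration, intensity) tuples, uses the commonly tabulated Krumhansl & Kessler ratings for P, and interprets the variant of [000116] as a hypothetical `edge_boost` factor applied to the first and last notes.

```python
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def scale_probabilities(tonic, mode):
    """Normalized probe-tone profile rotated onto `tonic` (0 = C)."""
    profile = KK_MAJOR if mode == "major" else KK_MINOR
    total = sum(profile)
    return [profile[(pc - tonic) % 12] / total for pc in range(12)]

def quality(notes, tonic, mode, p=1.0, q=1.0, r=1.0, edge_boost=1.0):
    """Quality(G) = sum over notes of duration^p * intensity^q * P(G, note)^r.
    notes: list of (midi_pitch, duration, intensity). edge_boost > 1 gives
    the first and last notes extra weight."""
    probs = scale_probabilities(tonic, mode)
    total = 0.0
    for i, (pitch, dur, inten) in enumerate(notes):
        weight = edge_boost if i in (0, len(notes) - 1) else 1.0
        total += weight * dur ** p * inten ** q * probs[pitch % 12] ** r
    return total

def best_scale(notes, **kwargs):
    """Return the (tonic, mode) among the 24 major/minor keys with maximal quality."""
    candidates = [(t, m) for m in ("major", "minor") for t in range(12)]
    return max(candidates, key=lambda g: quality(notes, g[0], g[1], **kwargs))

# A B C D E C A, one beat each at equal intensity.
melody = [(69, 1.0, 1.0), (71, 1.0, 1.0), (72, 1.0, 1.0), (74, 1.0, 1.0),
          (76, 1.0, 1.0), (72, 1.0, 1.0), (69, 1.0, 1.0)]
print(best_scale(melody))  # (9, 'minor'), i.e. A minor
```

Although A minor and C major share the same pitch classes, the profile weights separate them: the repeated A and C score higher under the A minor profile.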

[000120]. As a variant, the tuning and the determination of the scale can be carried out simultaneously, by directly determining the optimal transposition of the original melody into each candidate scale (for example, the major and minor diatonic and pentatonic scales) and choosing the scale that yields the best "tuning quality". In this case, the tuning quality is weighted by the note probabilities of the scale under consideration.

[000121]. For example, the melodic transposition and the type of scale can be chosen so as to modify the high-intensity notes as little as possible while respecting as closely as possible the probabilities of occurrence of the notes in the scale, which mathematically amounts to maximizing a "quality".

[000122]. This quality can be computed by the formula:

$$\mathrm{Quality}(G, \mathrm{Transpo}) = \int_{t_0}^{t_{\mathrm{final}}} \Big(1 - \big|\Delta\big(h(t) + \mathrm{Transpo}\big)\big|^{p}\Big) \cdot \mathrm{intensity}(t)^{q} \cdot P\big(G, h(t) + \mathrm{Transpo}\big)^{r}\,dt, \qquad p > 0,\ (q, r) > 0,$$

with p, q and r real, where G is the candidate scale; Transpo is the applied pitch shift, whose optimal value is sought between -1/4 and +1/4 tone for a chromatic scale tempered in semitones; h(t) is the pitch of the melody at time t; Δ is the instantaneous difference between a pitch and the nearest tempered pitch; intensity(t) is the intensity of the melody at time t; t_0 and t_final denote the beginning and the end of the recording; and P(G, h) is the probability, in the scale G, of the tempered note whose pitch is closest to the pitch h.

[000123]. The reference implementation uses this formula with p = 2, q = r = 1 and, for P, the Krumhansl & Kessler probe-tone profiles, that is:

$$\mathrm{Quality}(G, \mathrm{Transpo}) = \int_{t_0}^{t_{\mathrm{final}}} \Big(1 - \Delta\big(h(t) + \mathrm{Transpo}\big)^{2}\Big) \cdot \mathrm{intensity}(t) \cdot P\big(G, h(t) + \mathrm{Transpo}\big)\,dt.$$

An example of tuning that takes the optimal scale into account is shown in Figure 14.
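The joint search can be sketched as a grid search over the 24 major and minor keys and over Transpo. This sketch assumes frame-level pitches given as fractional MIDI numbers (so Δ is in semitones and Transpo ranges over -0.5 to +0.5 semitone), the commonly tabulated Krumhansl & Kessler ratings for P, a sum over frames in place of the integral, and the weighting factor (1 - |Δ|^p) as written above; all function names are hypothetical.

```python
import math

KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def scale_probabilities(tonic, mode):
    """Normalized probe-tone profile rotated onto `tonic` (0 = C)."""
    profile = KK_MAJOR if mode == "major" else KK_MINOR
    total = sum(profile)
    return [profile[(pc - tonic) % 12] / total for pc in range(12)]

def joint_quality(pitches, intensities, tonic, mode, transpo, p=2.0, q=1.0, r=1.0):
    """Discrete Quality(G, Transpo): sum over frames of
    (1 - |delta|^p) * intensity^q * P(G, nearest tempered note)^r."""
    probs = scale_probabilities(tonic, mode)
    total = 0.0
    for h, a in zip(pitches, intensities):
        shifted = h + transpo
        nearest = math.floor(shifted + 0.5)
        d = shifted - nearest                  # delta, in semitones
        total += (1.0 - abs(d) ** p) * a ** q * probs[nearest % 12] ** r
    return total

def best_key_and_transposition(pitches, intensities, steps=41):
    """Return ((tonic, mode), transpo) maximizing the joint quality."""
    grid = [-0.5 + i / (steps - 1) for i in range(steps)]
    keys = [(t, m) for m in ("major", "minor") for t in range(12)]
    return max(((g, tr) for g in keys for tr in grid),
               key=lambda c: joint_quality(pitches, intensities, c[0][0], c[0][1], c[1]))

# A B C D E C A sung 20 cents flat, one frame per note.
frames = [p - 0.2 for p in (69, 71, 72, 74, 76, 72, 69)]
key, transpo = best_key_and_transposition(frames, [1.0] * len(frames))
print(key, round(transpo, 3))  # (9, 'minor') 0.2
```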

[000124]. Once the optimal transposition has been applied, the optimally transposed melody is "tempered": its instantaneous pitch is tuned to the most probable or the nearest tempered pitch in the optimal scale G.

In a simple example, the most probable tempered pitch is the one closest in frequency to the pitch of the melody, without taking the optimal scale G into account; the scale is then used only to define the optimal transposition.

[000125]. As a variant, the optimal scale G can be taken into account in the tempering step by relying on the notion of the zone of attraction of a note, weighted by its probability of occurrence in the optimal scale G.

[000126]. Consider a note lying between C and C♯ in the key of C major. Incorporating the zone of attraction of a note, weighted by its probability of occurrence, means that C (the tonic) has a stronger pull than C♯ (a secondary, altered note): C should be chosen more often than if it were not known that the melody is in C, without ruling out C♯ when the sung note is very close to it.

[000127]. In one implementation, at each instant, the distance between the sung note and each of the two tempered notes closest to it is computed, and each distance is divided by the probability of occurrence of the corresponding note in the scale, P(G, h). This gives a weighted distance. As a variant, the distance can be divided by P(G, h)^r with r > 0. The one of the two notes at the smaller weighted distance is then chosen.
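The weighted-distance rule can be sketched directly. This assumes the sung pitch is a fractional MIDI number and that the scale probabilities are the commonly tabulated Krumhansl & Kessler major ratings normalized for C major; `temper_weighted` is an illustrative name.

```python
import math

KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]

def c_major_probabilities():
    """P(C major, pc) with pc 0 = C; the tonic is already at index 0."""
    total = sum(KK_MAJOR)
    return [v / total for v in KK_MAJOR]

def temper_weighted(pitch, probs, r=1.0):
    """Choose between the two tempered notes surrounding `pitch` (fractional
    MIDI number) using distance / P(G, note)^r instead of plain distance."""
    lower = math.floor(pitch)
    upper = lower + 1
    d_lower = (pitch - lower) / probs[lower % 12] ** r
    d_upper = (upper - pitch) / probs[upper % 12] ** r
    return lower if d_lower <= d_upper else upper

probs = c_major_probabilities()
print(temper_weighted(60.55, probs))  # 60: C wins although C# is nearer
print(temper_weighted(60.9, probs))   # 61: C# still wins when very close
```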

[000128]. As a variant, the weighted-distance tempering strategies can be combined with those that take the presence of attacks into account.

[000129]. As a variant, the weighted distance can be used to perform the combined tuning and scale-determination operation (Fig. 14). In this case, Δ denotes the weighted distance.

[000130]. As a variant, in a so-called real-time implementation, the high-level musical description 23 is performed without waiting for the end of the recording. This high-level analysis 23 can then be carried out continuously so as to provide, at every instant, an estimate of the high-level parameters (in particular rhythm, tempo and scale). At instant t, the analysis is performed, for example, over a window of increasing size running from the beginning of the recording up to t, or over a window of constant size covering the last 5 seconds of singing. This implementation is particularly advantageous for video games or any other real-time interactive application.
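For the scale, the quality formula of [000117] with p = q = 1 only needs, per pitch class, the sum of duration × intensity of the notes in the analysis window. A hypothetical accumulator for both window types might look like this; the class name and the (time, MIDI note, duration, intensity) input format are assumptions of this sketch.

```python
from collections import deque

class SlidingPitchClassHistogram:
    """Accumulates duration * intensity per pitch class, either over the whole
    recording so far (window=None, growing window) or over the last `window`
    seconds only (constant-size window)."""

    def __init__(self, window=5.0):
        self.window = window
        self.events = deque()        # (time, pitch_class, weight), oldest first
        self.totals = [0.0] * 12

    def add(self, time, midi_note, duration, intensity):
        pc = midi_note % 12
        weight = duration * intensity
        self.events.append((time, pc, weight))
        self.totals[pc] += weight
        if self.window is not None:
            # Drop the notes that started before the window.
            while self.events and self.events[0][0] < time - self.window:
                _, old_pc, old_weight = self.events.popleft()
                self.totals[old_pc] -= old_weight

    def histogram(self):
        return list(self.totals)

h = SlidingPitchClassHistogram(window=5.0)
h.add(0.0, 60, 1.0, 1.0)   # C at t = 0 s
h.add(3.0, 64, 0.5, 1.0)   # E at t = 3 s
h.add(6.0, 67, 1.0, 0.5)   # G at t = 6 s: the C has left the 5 s window
print(h.histogram()[0], h.histogram()[4], h.histogram()[7])  # 0.0 0.5 0.5
```

Each incoming note updates the totals in constant amortized time, so the scale estimate can be refreshed at every new note.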

[000131]. All the analysis parameters 25 extracted by the modules 21-23, and optionally the style parameters 16, are sent to the synthesis module 17 shown in Figure 3. This synthesis module 17 consists of a MIDI processing module 26 and/or an audio processing module 27. The analysis parameters 25 and the style parameters 16 are transmitted to these two modules 26, 27. In addition, the voice signal 5 and/or prerecorded instrument loops 53 are applied to the input of the module 27.

[000132]. The analysis parameters 25 transmitted depend on the type of synthesis envisaged. To simply transcribe the sung voice as a MIDI ring tone, only the note onsets, durations, pitches and, optionally, intensities of the sung notes are sent to the module 17. To add a rhythm loop over the voice, on the other hand, the tempo extracted from the voice is sent to the module 17 so that the rhythm loop can be synchronized with the voice. To correct the melody or add a coherent melodic accompaniment, the parameters related to the harmony of the voice signal 5 are also sent, so that a synthesized instrument can match this harmony.

[000133]. As a variant, for example when only rhythmic information is to be captured, the pitch is not analyzed; spectral information is analyzed instead, such as the HFC to identify the attack instants precisely, or the first 13 MFCCs to control a drum kit by voice from beatboxing.

[000134]. During the MIDI processing performed by the module 26, the voice signal 5 is transformed, using the analysis parameters 25, into a raw MIDI track 29, shown in Figure 5. For the melodic tracks, this track 29 contains the tempered notes representing the sung phrase, as determined by the modules 21-22.
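One possible encoding of such a raw track, assuming notes given as (onset in s, duration in s, MIDI pitch, intensity in 0..1), is a Standard MIDI File track body: delta times as variable-length quantities, Note On/Note Off messages, and an End Of Track meta event. Mapping intensity linearly to velocity, as well as the default tempo and resolution, are assumptions of this sketch; the `MTrk` chunk header and file header are omitted.

```python
def vlq(value):
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))

def notes_to_track_data(notes, bpm=120.0, ppq=480, channel=0):
    """notes: list of (start_s, duration_s, midi_pitch, intensity in 0..1).
    Returns the event bytes of one track chunk (without the 'MTrk' header),
    ending with the End Of Track meta event."""
    ticks_per_second = bpm / 60.0 * ppq
    events = []  # (tick, order, message); order 0 puts note-offs before note-ons
    for start, dur, pitch, intensity in notes:
        velocity = max(1, min(127, round(intensity * 127)))
        on = round(start * ticks_per_second)
        off = round((start + dur) * ticks_per_second)
        events.append((on, 1, bytes([0x90 | channel, pitch, velocity])))
        events.append((off, 0, bytes([0x80 | channel, pitch, 0])))
    events.sort(key=lambda e: (e[0], e[1]))
    data = bytearray()
    last_tick = 0
    for tick, _, message in events:
        data += vlq(tick - last_tick) + message
        last_tick = tick
    data += vlq(0) + bytes([0xFF, 0x2F, 0x00])   # End Of Track
    return bytes(data)

track = notes_to_track_data([(0.0, 0.5, 60, 1.0), (0.5, 0.5, 62, 0.5)])
print(track.hex())
```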

[000135]. As a variant, the raw MIDI track 29 is optimally tempered by incorporating the results of the tuning and, where applicable, of the optimal scale determined in the module 23, and optionally the style parameters.

[000136]. The raw MIDI track 29 is then used to synthesize all the instrumental MIDI tracks, for example the piano, bass, drum or synthesizer tracks. In other words, the invention builds, from a monophonic line (the vocal line), a polyphonic ringtone, that is, a superposition of several musical tracks, which is also multi-instrumental, meaning that the tracks of the ringtone can correspond to different instruments.

[000137]. As shown in FIG. 8, the tracks 31-33, 35-38 that form the different musical tracks of the ringtone can be obtained by processing and/or transforming the raw MIDI track 29 of the voice signal. These processing operations and transformations, which are part of the rules of musical composition, depend in particular on the type of instrument to be synthesized and on the style parameters.

[000138]. The processing operations can be cleaning of the track, and/or melodic correction, and/or rhythmic alignment. The transformations consist in automatically composing a new melody derived from the sung melody: the notes and rhythm are chosen according to the analysis parameters and to rules of musical construction that depend on the style parameters (the composition rules applied to RnB, for example, differ from those applied to jazz).

[000139]. Cleaning removes unwanted notes while keeping, where appropriate, expressive notes. A note can, for example, be considered expressive if it fits into the melodic continuity of the phrase (close to the preceding and following notes) and is loud enough to support the notes it accompanies.

[000140]. Cleaning consists, for example, in applying various operations to track 29, such as deleting or merging notes shorter than a reference duration (for example 1 ms), and/or smoothing unstable notes, and/or deleting notes whose intensity is below a threshold, and/or deleting notes whose pitch-detection quality is below a threshold. FIG. 6 shows, for example, the processing performed on the raw MIDI track 29 to obtain the track 30, played for example by a piano.

[000141]. To perform a melodic correction, the pitch of the notes of the raw MIDI track 29 is adjusted to the assumed harmony, keeping all or some of the expressive notes where appropriate. A sung note that does not belong to the computed scale is either replaced by the nearest or most probable note of that scale, or kept and highlighted as an expressive note (for example as a trill or to set up a portamento). The correction of the notes can thus be controlled, for example according to their degree of expressiveness, their intensity and their duration. FIG. 6 shows a melodic correction performed on the cleaned track 30 of FIG. 5 to obtain a track 31 adjusted to a major scale. To this end, the notes of track 30 are shifted into the optimal scale computed previously.
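The cleaning and melodic-correction steps can be sketched as two filters over a note list. This minimal Python sketch assumes that each note already carries a tempered MIDI pitch, a velocity standing in for intensity and a pitch-detection confidence; the `Note` class, the thresholds (other than the 1 ms example duration) and the function names are hypothetical, and the merging and smoothing of notes are omitted. Out-of-scale notes are moved to the nearest degree of a major scale, and an optional predicate leaves expressive notes untouched.

```python
from dataclasses import dataclass, replace

MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitone offsets of a major scale


@dataclass(frozen=True)
class Note:
    start: float        # seconds
    duration: float     # seconds
    pitch: int          # tempered MIDI note number
    velocity: int       # 0-127, stands in for intensity
    confidence: float   # pitch-detection quality, 0-1


def clean(notes, min_duration=0.001, min_velocity=20, min_confidence=0.5):
    """Drop notes that are too short, too quiet or poorly detected."""
    return [
        n for n in notes
        if n.duration >= min_duration
        and n.velocity >= min_velocity
        and n.confidence >= min_confidence
    ]


def snap_to_scale(pitch, tonic, steps=MAJOR_STEPS):
    """Nearest pitch belonging to the scale on `tonic` (pitch class 0-11);
    ties go to the lower note."""
    candidates = [12 * octave + tonic + step for octave in range(-1, 12) for step in steps]
    return min(candidates, key=lambda c: (abs(c - pitch), c))


def correct(notes, tonic, is_expressive=lambda n: False):
    """Snap out-of-scale notes, leaving notes flagged as expressive unchanged."""
    return [
        n if is_expressive(n) else replace(n, pitch=snap_to_scale(n.pitch, tonic))
        for n in notes
    ]
```

With tonic 0 (C major), a sung C# (61) becomes C (60) and an F# (66) becomes F (65), because ties are resolved downward in this sketch; choosing the "most probable" note instead would require a model the description does not specify.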

[000142]. To perform a rhythmic alignment, the notes detected during the analysis phase are shifted so that their attack instants follow a given rhythmic pattern. This pattern can be a simple regular rhythm (for example, the notes are placed on eighth notes determined from the tempo extracted by module 23), a more complex one (for example a bossa nova rhythm), or one corresponding to the attack instants of the sung melody.
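Rhythmic alignment to a regular grid amounts to rounding each attack time to the nearest multiple of the grid step (60 / tempo / subdivision seconds). The sketch below shows that, plus a variant that snaps to the positions of a repeating one-bar pattern; the pattern `[0, 1.5, 3]` used in the usage note is a hypothetical syncopated placeholder, not a transcription of a bossa nova rhythm, and the function names are illustrative.

```python
import math


def quantize_onsets(onsets, tempo_bpm, subdivision=2):
    """Move each onset (seconds) to the nearest point of a regular grid;
    subdivision=2 gives eighth notes when the beat is a quarter note.
    Python's round() sends exact ties to the even grid index."""
    step = 60.0 / tempo_bpm / subdivision
    return [round(t / step) * step for t in onsets]


def snap_to_pattern(onsets, tempo_bpm, pattern_beats, bar_beats=4):
    """Move each onset to the closest position of a rhythmic pattern that
    repeats every bar (positions given in beats from the start of the bar)."""
    beat = 60.0 / tempo_bpm
    bar = bar_beats * beat
    snapped = []
    for t in onsets:
        bar_index = math.floor(t / bar)
        candidates = [
            (bar_index + k) * bar + p * beat
            for k in (-1, 0, 1)
            for p in pattern_beats
        ]
        snapped.append(min((c for c in candidates if c >= 0), key=lambda c: abs(c - t)))
    return snapped
```

For example, at 120 BPM `quantize_onsets([0.02, 0.27, 0.61], 120)` moves the attacks to 0.0, 0.25 and 0.5 s, and `snap_to_pattern([0.7, 1.4], 120, [0, 1.5, 3])` moves them to 0.75 and 1.5 s.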

[000143]. Many methods and algorithms can be used to perform a transformation (automatic composition of a new melody or of a rhythmic track derived from the voice): transposition, extraction of the notes falling on the strong beats (located using the tempo extracted by module 23), application of rhythmic patterns, and derivation of notes from the harmony.

[000144]. In one example, track 33, corresponding to a second voice, is obtained by transposing track 31 three octaves higher. To do so, three octaves are added to the pitch parameter contained in the MIDI file of the corrected voice signal.
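Since a MIDI note number counts semitones, adding one octave means adding 12, so three octaves means adding 36. A minimal sketch follows; folding notes that would exceed the 0-127 MIDI range back down by octaves is an assumption added here, not something the description specifies.

```python
def transpose_octaves(pitches, octaves):
    """Shift MIDI note numbers by whole octaves (12 semitones each), folding
    any note that would leave the 0-127 MIDI range back inside by octaves."""
    result = []
    for p in pitches:
        q = p + 12 * octaves
        while q > 127:
            q -= 12
        while q < 0:
            q += 12
        result.append(q)
    return result
```

The same function with `octaves=1` gives the second track of the simplified chain of FIG. 15.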

[000145]. In one example, track 35, corresponding to a chorus, is obtained by taking the notes of track 31 that fall on the strongest beats. To do so, the MIDI track is modified by removing all the notes that are not on the beats and by changing the duration of the remaining notes so that each lasts the whole beat.
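The chorus rule (keep the notes that fall on a beat and stretch each to the whole beat) can be sketched as follows, with positions expressed in beats from the tempo analysis. The `Note` class, the 0.05-beat tolerance and the function names are assumptions for this example; the sketch treats every beat alike rather than selecting only the strongest ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    start: float     # position in beats
    duration: float  # length in beats
    pitch: int       # MIDI note number


def on_beat(note, tolerance=0.05):
    """True if the note starts within `tolerance` beats of a beat."""
    return abs(note.start - round(note.start)) <= tolerance


def chorus_track(notes, tolerance=0.05):
    """Keep only the notes that start on a beat and make each last a full beat."""
    return [Note(float(round(n.start)), 1.0, n.pitch) for n in notes if on_beat(n, tolerance)]
```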

[000146]. Track 36, corresponding to a bass, is obtained by keeping only the notes on the beats and alternating root and fifth, determined from the knowledge of the harmony. To do so, the MIDI track is modified by removing all the notes that are not on the beats and by changing the pitch of the remaining notes to the nearest note of the chord in question (root or fifth).
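The bass rule can be sketched in the same style. This minimal Python sketch assumes the harmony analysis provides the chord root (a pitch class 0-11) for each beat, alternates root and fifth (7 semitones above the root) on the kept notes, picks the pitch of that class closest to the sung note, and then drops it two octaves into a bass register. The root map, the octave drop, the one-beat duration and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    start: float     # position in beats
    duration: float  # length in beats
    pitch: int       # MIDI note number


def nearest_with_class(pitch, pitch_class):
    """Closest MIDI pitch to `pitch` with the given pitch class (ties go down)."""
    below = pitch - (pitch - pitch_class) % 12
    above = below + 12
    return below if pitch - below <= above - pitch else above


def bass_track(notes, roots, octaves_down=2, tolerance=0.05):
    """Keep on-beat notes and replace them alternately by the root and the
    fifth of the chord; `roots` maps each beat to the chord root (0-11)."""
    kept = [n for n in notes if abs(n.start - round(n.start)) <= tolerance]
    bass = []
    for i, n in enumerate(kept):
        beat = round(n.start)
        root = roots[beat]
        target = root if i % 2 == 0 else (root + 7) % 12  # fifth = 7 semitones
        pitch = nearest_with_class(n.pitch, target) - 12 * octaves_down
        bass.append(Note(float(beat), 1.0, pitch))
    return bass
```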

[000147]. Track 37, corresponding to a second bass more complex than the first, is based on the transposition of a riff (a repeated rhythmic and melodic motif). To do so, the MIDI track is modified by removing all the notes that are not on the beats (or keeping one note every few beats, depending on the length of the riff) and by adding the notes of the riff transposed to the pitch of the remaining notes.
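A riff-based bass can be sketched by anchoring a short motif on the notes where the riff may start and transposing it to each anchor's pitch. The two-beat `RIFF` (root, fifth, octave, fifth) and the function names are hypothetical; the description does not give a riff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    start: float     # position in beats
    duration: float  # length in beats
    pitch: int       # MIDI note number


# Hypothetical two-beat riff: (offset in beats, duration in beats, interval in semitones)
RIFF = [(0.0, 0.5, 0), (0.5, 0.5, 7), (1.0, 0.5, 12), (1.5, 0.5, 7)]


def riff_track(notes, riff=RIFF, riff_beats=2, tolerance=0.05):
    """Keep notes that start on a beat where the riff can begin (every
    `riff_beats` beats) and replace each by the riff transposed to its pitch."""
    out = []
    for n in notes:
        beat = round(n.start)
        if abs(n.start - beat) > tolerance or beat % riff_beats != 0:
            continue
        for offset, duration, interval in riff:
            out.append(Note(beat + offset, duration, n.pitch + interval))
    return out
```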

[000148]. Track 38, corresponding to the drums, is obtained by fitting a rhythmic pattern characteristic of the RnB style to the tempo of the voice (extracted by module 23). To do so, a drum MIDI track characteristic of the style is loaded and repeated enough times to cover the whole melody, and additional notes are added at the expressive moments of the voice.
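The drum track construction (load a style pattern, repeat it to cover the melody at the voice tempo, add extra hits at expressive moments) can be sketched as below. The note numbers follow the General MIDI percussion map (36 bass drum, 38 snare, 42 closed hi-hat, 49 crash cymbal 1), but the pattern itself is a generic placeholder rather than a characteristic RnB groove, and the accent times are assumed to come from the expressiveness analysis.

```python
import math

# Hypothetical one-bar pattern using General MIDI drum notes
# (36 = bass drum, 38 = snare, 42 = closed hi-hat); events are (beat, note).
PATTERN = [(0, 36), (0, 42), (1, 38), (1, 42), (2, 36), (2, 42), (3, 38), (3, 42)]
CRASH = 49  # General MIDI crash cymbal 1


def drum_track(melody_beats, tempo_bpm, pattern=PATTERN, pattern_beats=4, accents=()):
    """Repeat `pattern` enough times to cover the melody, convert beats to
    seconds at the voice tempo, and add a crash at each accent time (seconds)."""
    repeats = math.ceil(melody_beats / pattern_beats)
    seconds_per_beat = 60.0 / tempo_bpm
    events = [
        ((r * pattern_beats + beat) * seconds_per_beat, note)
        for r in range(repeats)
        for beat, note in pattern
    ]
    events += [(t, CRASH) for t in accents]
    return sorted(events)
```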

[000149]. Additional percussive sounds (for example cymbal crashes) can be added according to the expressiveness of the voice (analysis parameters 25 extracted by modules 22-23), for example on expressive notes.

[000150]. The coherence of the arrangement of the tracks is guaranteed by the analysis parameters 25, which in particular ensure that all the tracks share the same tempo and/or the same harmony.

[000151]. Each MIDI track of the ringtone can be created from the raw MIDI track of the voice signal, processed and/or transformed; from a MIDI track derived from the raw MIDI track (that is, a MIDI track that has already been processed and/or transformed); or from the analysis parameters 25 and pre-existing patterns. For example, a violin track can be derived directly from the piano track, and an RnB drum track can be built directly from the tempo and an RnB rhythmic pattern.

[000152]. In addition, MIDI-specific effects such as note doubling, arpeggiation or delay can be applied to the resulting tracks 30, 31, 33, 35-38. These effects are applied according to the musical style, chosen by the user or imposed, and/or to the timbre of the user's voice. For example, the notes of a track can be turned into chords played as arpeggios, these arpeggios following the previously determined harmony and the speech rate. In another example, all expressive notes can be turned into trills.
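Arpeggiation can be sketched as replacing each note by a chord built on it and playing the chord tones in turn at a fixed step. In this hypothetical sketch the chord is a major triad and the step is a constant in beats; in the method described, the chord would follow the previously determined harmony and the step could be derived from the speech rate, neither of which is modeled here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    start: float     # position in beats
    duration: float  # length in beats
    pitch: int       # MIDI note number


def arpeggiate(notes, intervals=(0, 4, 7), step=0.25):
    """Turn each note into a chord built on it (default: major triad) and play
    the chord tones one after another, `step` beats apart, over the note."""
    out = []
    for n in notes:
        chord = [n.pitch + i for i in intervals]
        count = max(1, math.ceil(n.duration / step - 1e-9))
        for k in range(count):
            length = min(step, n.duration - k * step)  # last tone may be shorter
            out.append(Note(n.start + k * step, length, chord[k % len(chord)]))
    return out
```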

[000153]. An original melody can also be derived from the voice, not to replace it but to continue it. For example, the system can compose a response to the vocal melody (for a dialogue with the machine) or a continuation (to end the melody properly in the chosen style). For example, if the bass line alternates root and fifth, the fifth introducing musical tension and the root resolving it, and the line ends on a fifth, the method continues the musical phrase by adding a root to resolve the tension. The hummed melody can also be sent, together with its analysis parameters, in particular the harmonic and/or rhythmic ones, to automatic composition software so that the software composes an ending or a response.

[000154]. Once obtained, the MIDI tracks 30, 31, 33, 35-38 are combined into a single MIDI file that can be played by any MIDI-compliant equipment. This file can be sent to the phone, or to any other device or software, as a MIDI ringtone; these devices have basic MIDI instruments for playing the MIDI tracks it contains.
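Combining the tracks into one file can be illustrated with a dependency-free writer for a format-1 Standard MIDI File: an `MThd` header followed by one `MTrk` chunk per track, with delta times encoded as variable-length quantities and each track closed by an End of Track meta event. This is a minimal sketch, not the tool used by the inventors; notes are given as `(start_tick, length_ticks, pitch, velocity)` tuples, the function names are illustrative, and no tempo event is written, so players fall back to the SMF default of 120 BPM.

```python
import struct


def vlq(value):
    """Encode a non-negative int as a MIDI variable-length quantity."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def track_chunk(notes, channel):
    """Build an MTrk chunk from (start_tick, length_ticks, pitch, velocity)."""
    events = []
    for start, length, pitch, velocity in notes:
        events.append((start, 1, bytes([0x90 | channel, pitch, velocity])))  # note on
        events.append((start + length, 0, bytes([0x80 | channel, pitch, 0])))  # note off
    events.sort(key=lambda e: (e[0], e[1]))  # at equal ticks, note-offs first
    data = bytearray()
    now = 0
    for tick, _, message in events:
        data += vlq(tick - now) + message
        now = tick
    data += vlq(0) + b"\xff\x2f\x00"  # End of Track meta event
    return b"MTrk" + struct.pack(">I", len(data)) + bytes(data)


def midi_file(tracks, channels, ticks_per_beat=480):
    """Assemble a format-1 Standard MIDI File with one chunk per track."""
    header = b"MThd" + struct.pack(">IHHH", 6, 1, len(tracks), ticks_per_beat)
    return header + b"".join(track_chunk(t, c) for t, c in zip(tracks, channels))
```

Channel 9 (zero-based) is the General MIDI percussion channel, so a drum track would be passed with `channels=[..., 9]`. The returned bytes can be written to a `.mid` file opened in binary mode.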

[000155]. As a variant, the MIDI track or tracks 30, 31, 33, 35-38 are sent to a module 43 comprising virtual instruments, for example of the VST type, that synthesize the audio track associated with each instrument. For example, the MIDI bass track is sent to the VST instrument named "Virtual Bass", which turns this symbolic MIDI track into an audio bass track. All the audio tracks synthesized from the MIDI tracks are then sent to a module 47 that mixes the generated musical tracks.

[000156]. In addition, module 27 produces audio tracks 49-51 derived from direct transformation of the voice signal and/or of pre-recorded sounds, without going through a symbolic MIDI representation. These tracks 49-51 are obtained by transforming the voice signal 5 and/or by sound synthesis from the extracted analysis parameters 25 and pre-recorded sounds 53.

[000157]. The voice transformations can be corrective transformations (melodic and/or rhythmic), timbral transformations, or both applied in succession.

[000158]. Melodic correction can be performed, for example, by a technique known as "pitch shifting", in which the notes of the voice are moved onto the tempered notes, onto the notes of the scale (optimal or not) determined by the analysis module 15, or onto the notes of a melody derived from the voice signal.
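The core arithmetic of pitch correction is the frequency ratio between the sung pitch and its target. The sketch below converts a frequency to a fractional MIDI note (69 = A4 = 440 Hz, 12 notes per octave), rounds it to the nearest tempered note, and returns the ratio 2 ** ((target - midi) / 12) that a pitch shifter would apply. The function name and the A4 reference are assumptions; snapping to a scale or to a derived melody would only change how `target` is chosen, and the shifting itself (for example with a phase vocoder) is not shown.

```python
import math


def shift_to_tempered(freq_hz, a4_hz=440.0):
    """Return (target MIDI note, frequency ratio) for moving a sung pitch onto
    the nearest equal-tempered note; a pitch shifter would apply the ratio."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)
    target = round(midi)
    return target, 2 ** ((target - midi) / 12)
```

For a note sung at 450 Hz, the target is A4 (69) and the ratio is 440/450, about 0.978.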

[000159]. Rhythmic correction of the voice consists in synchronizing the voice with a reference rhythm. It can be performed by aligning the voice with an imposed rhythm (for example a fixed tempo). The alignment is done by "time stretching": the duration of sound passages that are off the beat is changed so that they fit the imposed rhythm. This rhythm can be determined from the analysis parameters or by an imposed rhythmic pattern.
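For time stretching, each passage between two detected attacks must be scaled by the ratio of its target duration to its actual duration. A minimal sketch, assuming the attack times and their target positions on the imposed rhythm are already known (for example from a grid quantizer); the function name is hypothetical and the stretching algorithm itself is not shown.

```python
def stretch_factors(onsets, targets):
    """Per-segment time-stretch factors that move each detected onset onto
    its target time (factor > 1 lengthens the segment, < 1 shortens it)."""
    if len(onsets) != len(targets):
        raise ValueError("one target per onset")
    return [
        (targets[i + 1] - targets[i]) / (onsets[i + 1] - onsets[i])
        for i in range(len(onsets) - 1)
    ]
```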

[000160]. As a variant, the rhythmic correction can consist in synchronizing to the rhythm of the voice itself, to which the instrumental tracks are then aligned. In one example, rhythm marking is performed by the "time warping" technique, in which the strong beats of the voice signal, represented by the vertical lines 55 in FIG. 9, are located in order to build a reference rhythm to which all the musical tracks are synchronized. This reference rhythm then becomes part of the analysis parameters 25 and can be used to build the MIDI and/or audio tracks. It can also be used as the input to automatic composition software that automatically generates a melody whose rhythm coincides with this reference rhythm.
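One simple way to use such a reference rhythm is a beat map: the marked strong beats define a piecewise-linear mapping from musical position (in beats) to time (in seconds), and instrumental notes placed in beats are converted through it so that they follow the voice's own timing. This is an illustrative sketch under that assumption, not necessarily what the description means by "time warping" and not dynamic time warping; the function names are hypothetical.

```python
def make_beat_map(beat_times):
    """Return a function mapping a position in beats (0 = first marked beat)
    to seconds by linear interpolation between the marked beat times;
    positions outside the marks are extrapolated from the nearest interval."""
    if len(beat_times) < 2:
        raise ValueError("need at least two marked beats")
    last = len(beat_times) - 2  # index of the start of the last interval

    def to_seconds(beat):
        i = max(0, min(int(beat), last))
        return beat_times[i] + (beat - i) * (beat_times[i + 1] - beat_times[i])

    return to_seconds
```

For marks at 0.0, 0.5, 1.1 and 1.6 s, a note on beat 1.5 lands at 0.8 s, halfway through the slower second interval.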

[000161]. Timbral transformations modify the timbre of the voice by applying filters to the voice signal. The timbre can be modified by transforming the voice signal 5 into another voice, realistic or not, such as an ogre or monster voice, using for example a phase vocoder. The transformed voice can remain intelligible, or be turned into a musical instrument or an unintelligible sound. The voice can also be turned into a sound ambience (a sound pad obtained, for example, by concatenative synthesis from various sounds, musical or not).

[000162]. Furthermore, audio tracks can be obtained by selecting notes of the voice signal and emphasizing them according to the expressiveness of the voice or the analysis parameters 25, and/or by using pre-recorded rhythm loops aligned to the measured (and, where applicable, modified) tempo of the voice signal.

[000163]. In one implementation, the voice is used to select a sound object in a music database in order to compose one or more of the audio tracks. To this end, analysis parameters such as pitch, intensity and attack are extracted from music samples, and the samples are stored in the database together with their associated analysis parameters. Analysis parameters 25 are then extracted from the voice signal over a given time interval, and the samples whose analysis parameters are closest to those of the voice signal over that interval are selected from the database. For example, to generate the drum track, the RnB rhythm that best matches the rhythm expressed by the voice can be searched for.
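Selecting the closest samples can be sketched as a nearest-neighbour search over analysis-parameter vectors. The weighted Euclidean distance, the feature layout (for example pitch, intensity, attacks per second) and the names below are assumptions chosen for illustration; in practice the features would need to be normalized so that no single parameter dominates the distance.

```python
import math


def select_sample(query, database, weights=None):
    """Name of the sample whose analysis-parameter vector is closest to
    `query` under a weighted Euclidean distance."""
    if weights is None:
        weights = [1.0] * len(query)

    def distance(features):
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, query, features)))

    return min(database, key=lambda name: distance(database[name]))
```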

[000164]. The music samples can have several levels of granularity: they can be small musical objects, such as a single note of a synthesized instrument, or large musical objects, such as musical loops. When large musical objects are being selected, more analysis parameters are extracted, over a longer period.

[000165]. The audio track or tracks 49-51 thus derived from the voice signal are sent to the mixing module 47.

[000166]. The mixing module 47 then processes and mixes the audio tracks 30', 31', 33', 35'-38' and 49-51 resulting from the MIDI processing and/or the audio processing. To this end, module 47 adjusts the levels of the different tracks relative to one another and/or applies individual effects to each track (such as saturation or dynamic compression). Module 47 then produces a single audio output track mixed from all the individual instrumental tracks. Module 47 can also apply global sound effects to this output track (for example a global reverberation or any room-acoustics effect, such as a "church" effect).
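The level-setting part of the mix can be sketched as a weighted sum of equal-length mono tracks with per-track gains in decibels (linear gain = 10 ** (dB / 20)), followed by peak normalization to avoid clipping. The function name, the ceiling value and the normalization step are assumptions; per-track effects such as compression and global effects such as reverberation are not shown.

```python
def mix(tracks, gains_db, ceiling=0.99):
    """Sum equal-length mono tracks after per-track gains given in dB, then
    scale the sum down if its peak exceeds `ceiling` (full scale = 1.0)."""
    length = len(tracks[0])
    if any(len(t) != length for t in tracks):
        raise ValueError("all tracks must have the same length")
    gains = [10 ** (g / 20) for g in gains_db]
    out = [sum(g * t[i] for g, t in zip(gains, tracks)) for i in range(length)]
    peak = max((abs(x) for x in out), default=0.0)
    if peak > ceiling:
        out = [x * ceiling / peak for x in out]
    return out
```

Raising the bass, as in the user adjustment described next, would simply mean increasing that track's entry in `gains_db`.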

[000167]. As a variant, the user can modify the original mix, for example by choosing to raise the volume of the bass or to lower that of another instrument.

[000168]. The sound signal 59 resulting from the mix is then fed to a module 60 that converts it into an audio format playable by the user's phone (for example MP3). At the output of module 60, a ringtone 61 is obtained and stored in a memory of the mobile phone.

[000169]. As a variant, the voice signal can also be sent as is to the mixing module 47 to be included in the ringtone, in which case it is treated as an audio track.

[000170]. FIG. 15 shows a simplified processing chain of the invention. The analysis module is limited to module 21, which extracts only the pitch and intensity parameters to generate the raw MIDI track 29. Only two coherent MIDI tracks are derived: track 1, which is the cleaned raw MIDI track, and track 2, which plays the same track one octave higher.

[000171]. The invention also relates to mobile phones comprising a memory area associated with each of the phone's contacts, this memory area being intended to store a ringtone produced by the method according to the invention.

[000172]. Thus the user, or a contact, can record a sung phrase in the phone after first selecting a music style. For example, a contact named Ursula records a sung phrase such as "hello, it's Ursulaaa!" after selecting a rock style.

[000173]. The method according to the invention described above then transforms the sung phrase into a ringtone.

[000174]. The user then associates the resulting ringtone with the contact for which it is intended by storing it in the memory area associated with that contact.

[000175]. Thus, when the contact calls, the phone plays the ringtone corresponding to that contact.

[000176]. The invention is particularly advantageous for producing a mobile phone ringtone. However, it could also be used more generally in the field of music creation. By extension, "ringtone" refers to any musical object having melodic and/or rhythmic coherence, whether or not it contains voice. Thus an instrumental riff, a jingle, a telephone ringtone or an answering-machine message are all ringtones within the meaning of the invention.

[000177]. It is also possible to envisage an application called "Voxgroovebox", in which a musical phrase is recorded while a key on the phone keypad is held down; as soon as the key is released, the ringtone generated from that phrase by the method according to the invention starts looping. The user can then sing over this ringtone, or mix it with music stored in the phone. Depending on the musical intention, all or some of the tracks are enabled: rhythm only, bass only, lead only, any two of these tracks, or all three.

[000178]. An application called "Augmented answering machine" can also be envisaged, in which an answering-machine message is recorded, spoken or sung, and the system automatically corrects the message and/or adds a musical accompaniment to it. In one example, the voice signal is rendered on its own, but it is musically coherent because it has been realigned to a given rhythm.

[000179]. In another example, the voice is kept intelligible and an average tempo or reference rhythm of the telephone message is determined. An instrumental loop made up of one or more tracks, whose tempo or rhythm matches the analyzed one, is then added to the voice signal.

[000180]. In a variant, the user can modify each of the tracks. In this case, the user receives, for example, a multi-instrument score corresponding to the reorchestrated melody of what was hummed, or a simplified symbolic representation of that score. The user can then read each track, for example with a score editor, and/or modify each track manually, for instance changing the rhythm, the notes, or any analysis parameter. Any interface, such as a stylus or buttons, can be used to edit the tracks and/or remix them.

[000181]. Alternatively, the composition can be carried out in several steps or passes. The user starts, for example, by humming the melodic lead. In this case, only one track is generated: the melodic lead, corrected on the basis of the analysis parameters. The user then hums, for example, the bass line. To help the user hum in time and in harmony, the bass line can be hummed while listening to the melodic lead. The hummed bass line is then corrected by applying the analysis parameters of the melodic lead to it. The user could also choose to apply to each of the two voices a mixture of parameters taken from the two hummed recordings; for example, the rhythm could come from the bass line and the harmony from the melodic line. In another example, a global harmony is computed from both hummed recordings by applying the harmony-analysis algorithms to a voice recording file formed by the temporal juxtaposition of the two recordings (that is, by concatenating the two files in time to create a single file whose length is the sum of their two lengths).

[000182]. An implementation of the invention for extracting a melody from a voice signal 5 is described below.

[000183]. First, an algorithm is applied to obtain the instantaneous pitch, energy and quality streams of the voice signal 5, for example the algorithm described in France Telecom's French patent application with national registration number 01 07284.

[000184]. The stream of instantaneous notes of the voice signal 5 is then preferably cleaned with a median filter, eliminating notes outside the vocal range (tessitura), and/or low-energy notes, and/or low-quality notes, and/or notes that are too short (spurious), for example segments shorter than 50 ms.

[000185]. The voice signal 5 is then segmented, each segment being the portion of the voice signal 5 between two attacks. An attack corresponds to a significant variation in the pitch and/or energy of the voice signal and/or in its high-frequency energy; the latter can be detected, for example, with an HFC (High Frequency Content) technique, as described in application PCT/FR2007/051807. The signal is then preferably realigned so that time zero coincides with the onset of the first detected segment.
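The cleaning step can be sketched as follows. This is a minimal illustration that works frame by frame rather than note by note, and it assumes pitch is expressed in fractional MIDI units with a fixed analysis hop. The function names, the filter width, the tessitura bounds and the energy and quality thresholds are hypothetical placeholders; only the 50 ms minimum duration comes from the text.

```python
import numpy as np

def median_filter(x, width=5):
    """Sliding-window median with edge padding; width must be odd."""
    half = width // 2
    padded = np.pad(np.asarray(x, dtype=float), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)

def clean_pitch_stream(pitch, energy, quality, hop_s,
                       range_lo=40.0, range_hi=84.0,      # hypothetical tessitura (MIDI)
                       min_energy=0.01, min_quality=0.5,  # hypothetical thresholds
                       min_dur_s=0.050, width=5):
    """Median-filter the pitch stream, then mark rejected frames as NaN."""
    smoothed = median_filter(pitch, width)
    valid = ((smoothed >= range_lo) & (smoothed <= range_hi)
             & (np.asarray(energy, dtype=float) >= min_energy)
             & (np.asarray(quality, dtype=float) >= min_quality))
    # Remove voiced runs shorter than min_dur_s (spurious notes).
    min_frames = max(1, int(round(min_dur_s / hop_s)))
    i, n = 0, len(valid)
    while i < n:
        if not valid[i]:
            i += 1
            continue
        j = i
        while j < n and valid[j]:
            j += 1
        if j - i < min_frames:
            valid[i:j] = False
        i = j
    return np.where(valid, smoothed, np.nan)
```

The median filter removes isolated pitch spikes (octave errors, for example) before the thresholds are applied, so that a single bad frame does not split a note in two.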

[000186]. The pitch of each segment S1-S8 is taken as the median pitch over that segment. Thus, in FIGS. 16, 18, 19 and 20, the segments S1-S8 are separated by large pitch variations.
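A simplified sketch of segmentation and per-segment pitch follows. It detects attacks from pitch jumps and unvoiced gaps only, leaving out the energy and HFC criteria mentioned above; the 0.5-semitone jump threshold and the function name are assumptions. Times are re-referenced to the onset of the first segment, and each segment's pitch is its median.

```python
import numpy as np

def segment_by_pitch_jumps(pitch, hop_s, jump=0.5):
    """Split a cleaned pitch stream (NaN = rejected frame) into segments.

    A segment ends at an unvoiced frame or when consecutive voiced frames
    differ by more than `jump` semitones (hypothetical threshold). Returns
    (onset_s, duration_s, median_pitch) tuples, onsets measured from the
    first segment.
    """
    pitch = np.asarray(pitch, dtype=float)
    bounds, start = [], None
    for i, p in enumerate(pitch):
        voiced = not np.isnan(p)
        # Close the current segment on silence or on a large pitch jump.
        if start is not None and (not voiced or abs(p - pitch[i - 1]) > jump):
            bounds.append((start, i))
            start = None
        if voiced and start is None:
            start = i
    if start is not None:
        bounds.append((start, len(pitch)))
    if not bounds:
        return []
    t0 = bounds[0][0]  # reset time zero to the first detected segment
    return [((s - t0) * hop_s, (e - s) * hop_s, float(np.median(pitch[s:e])))
            for s, e in bounds]
```

The median, rather than the mean, keeps the segment pitch robust to the glides that singers typically produce at note onsets and offsets.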

[000187]. Global tuning of the voice signal 5 is then performed. This consists in determining, from the untempered notes of the segments S1-S8, the note called the "central note" NC, which is the most frequent note (the most played note, evaluated over 100 global tuning levels), and then transposing the whole voice signal 5 so that the central note NC coincides with its nearest tempered note.

[000188]. To this end, the melodic stream is considered, that is, the stream of all the pitches h of the sung notes, as shown in FIG. 16. The pitch axis is divided into bands B corresponding to the notes of the tempered chromatic scale, giving a tuning grid G, built in semitone steps, superimposed on the voice signal 5.

[000189]. A histogram of the duration t of the sung notes as a function of their pitch h is built over these bands for all the notes of signal 5, as shown in FIG. 17. The central note NC is the mean pitch of the tallest band of the histogram (called the mode).

[000190]. The band shift (between -1/2 and +1/2 semitone) that yields the optimal histogram is then sought, that is, the shift for which the mode is largest, in other words for which the central note NC is most predominant. This optimal shift corresponds to the global tuning of the melody. The whole voice signal 5 is then transposed by this optimal value so that the central note NC falls on a tempered pitch. For example, in FIG. 18, the transposition aligns the central note NC, of initial pitch 54.2, with the nearest tempered note, of pitch 54.0.
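The global tuning step can be sketched as below, under one reading of the text: the semitone grid is shifted over 100 levels between -1/2 and +1/2 semitone, the shift whose duration-weighted histogram has the tallest mode is kept, the central note NC is the mean pitch of the notes falling in that band, and the transposition moves NC onto its nearest tempered pitch. Pitches are assumed to be in fractional MIDI units; the function name and the tie-breaking rule (first shift reaching the maximum) are illustrative choices.

```python
import numpy as np

def global_tuning(pitches, durations, n_levels=100):
    """Return (central_note, transposition) for per-segment pitches."""
    pitches = np.asarray(pitches, dtype=float)
    durations = np.asarray(durations, dtype=float)
    best_score, best_members = -1.0, None
    for shift in np.linspace(-0.5, 0.5, n_levels, endpoint=False):
        # Band index of each note once the semitone grid is shifted.
        bands = np.round(pitches + shift).astype(int)
        _, inverse = np.unique(bands, return_inverse=True)
        totals = np.bincount(inverse, weights=durations)  # duration histogram
        k = int(np.argmax(totals))                         # tallest band = mode
        if totals[k] > best_score:
            best_score = totals[k]
            best_members = inverse == k
    # Central note: duration-weighted mean pitch of the notes in the mode band.
    nc = float(np.average(pitches[best_members], weights=durations[best_members]))
    transposition = round(nc) - nc  # moves NC onto the nearest tempered pitch
    return nc, transposition
```

For segments sung at 54.1, 54.2 and 54.3 that dominate the melody, NC comes out at about 54.2 and the transposition at about -0.2 semitone, which matches the FIG. 18 example; adding the transposition to every pitch gives the tuned melody.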

[000191]. The voice signal is then tempered, which consists in assigning a tempered pitch to each segment S1-S8 of the voice signal 5, based on the pitch actually sung. For this, the tuned melody, its central note NC and the 128 tempered notes corresponding to the 128 MIDI notes are considered.

[000192]. For each of these 128 MIDI notes, a model of the pitch actually sung is created (more precisely, of the pitch difference between the central note and the sung notes). In the initial model, each MIDI note is modeled by a pitch of the chromatic scale tuned to the central note. Thus, as shown in FIG. 18, the initial model of the MIDI notes corresponds to the grid G of notes at constant intervals e, superimposed on the voice signal 5 and centered on the central note of pitch 54.0: in our example, the initial model of MIDI note 54 corresponds to pitch 54.0, the initial model of MIDI note 55 to pitch 55.0, and so on.

[000193]. While the pitch of the central note NC stays fixed, the model of each tempered note is progressively adjusted to the pitches actually sung.

[000194]. Preferably, the pitches of the last sung notes are modeled first, and the model is then refined working back toward the beginning of the melody, in the analysis direction SA. This technique relies on the assumption that singing accuracy improves over the course of the melody (the voice settles) and that the pitch of the last sung notes is more stable. A robust model can then be used to determine the pitches of the notes sung at the beginning of the melody, which are generally imprecise.

[000195]. A tempered note is assigned to each segment S1-S8 by the following algorithm, which loops over the segments Sj, preferably from the last to the first:
- from the current model, determine the most probable tempered note for segment Sj; for example, the most probable tempered note is taken to be the tempered note whose model is closest to the sung note;
- assign this tempered note to segment Sj; and
- update the model of this tempered MIDI note, taking into account the pitch sung on segment Sj, so that the model pitch of this tempered note evolves. For this purpose, the model of the tempered note may, for example, be equal to the mean of the pitches of the actual (sung) notes to which the model's note has been assigned, these pitches being weighted, where appropriate, by their duration or intensity.
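The tempering loop of [000195] can be sketched as follows. It assumes the melody has already been tuned, so that the grid lies on integer MIDI pitches, and it follows the worked examples of [000196] and [000197] in counting each note's initial grid pitch as one observation in its model's mean. The prior_weight parameter and the function name are hypothetical.

```python
def temper(pitches, durations=None, prior_weight=1.0, n_notes=128):
    """Assign a tempered MIDI note to each segment pitch of a tuned melody.

    Each note n starts with model pitch n. Segments are visited from last
    to first; each gets the note whose model is nearest, and that model is
    then updated to the weighted mean of its initial pitch (weight
    prior_weight, a hypothetical parameter) and the pitches assigned to it.
    """
    if durations is None:
        durations = [1.0] * len(pitches)  # unweighted mean, as in the examples
    models = [float(n) for n in range(n_notes)]
    sums = [prior_weight * m for m in models]
    weights = [prior_weight] * n_notes
    assigned = [None] * len(pitches)
    for j in reversed(range(len(pitches))):  # last segment first
        p, w = pitches[j], durations[j]
        note = min(range(n_notes), key=lambda n: abs(models[n] - p))
        assigned[j] = note
        sums[note] += w * p
        weights[note] += w
        models[note] = sums[note] / weights[note]  # model drifts toward sung pitch
    return assigned, models
```

Running it on two segments sung at 54.2 and 56.8 assigns notes 54 and 57 and moves their models to about 54.1 and 56.9, the values computed by hand in the worked examples of [000196] and [000197].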

[000196]. For example, initially the model of tempered note 54 corresponds to pitch 54.0 (the central note) and the model of tempered note 55 corresponds to pitch 55.0. If an analyzed segment Sj has a pitch of 54.2, it is associated with tempered note 54, since the difference between the analyzed pitch 54.2 and the model of note 54 (pitch 54.0) is smaller than the difference between the analyzed pitch 54.2 and the model of note 55 (pitch 55.0): 54.2 - 54.0 = 0.2 is smaller than 55.0 - 54.2 = 0.8. The model of note 54 is then updated to the mean m of the notes assigned to tempered note 54, namely, in this example, m = (54.0 + 54.2)/2 = 54.1. The model of tempered note 54 therefore now corresponds to pitch 54.1, whereas it corresponded to pitch 54.0 before segment Sj was analyzed. The process is then repeated for a new segment.

[000197]. Continuing the same example (FIG. 19), the model of tempered note 57 corresponds to pitch 57.0 and the model of tempered note 56 corresponds to pitch 56.0. If a newly analyzed segment Sk has a pitch of 56.8, it is associated with tempered note 57, since the difference between the analyzed pitch 56.8 and the model of note 57 (pitch 57.0) is smaller than the difference between the analyzed pitch 56.8 and the model of note 56 (pitch 56.0): 57.0 - 56.8 = 0.2 is smaller than 56.8 - 56.0 = 0.8. The model of note 57 is then updated to the mean m of the notes assigned to tempered note 57, namely, in this example, m = (57.0 + 56.8)/2 = 56.9. The model of tempered note 57 therefore now corresponds to pitch 56.9, whereas it corresponded to pitch 57.0 before segment Sk was analyzed. The process is then repeated for the next segment, and so on until all the segments have been analyzed.

[000198]. Thus, as shown in FIG. 20, over the processed portion 61 at the end of signal 5, the model pitch of note 54 rises and the model pitch of note 57 falls toward the pitches actually sung.

[000199]. At the end of the algorithm, a tempered note has been assigned to each segment S1-S8. This defines a scale G' whose pitch intervals e' no longer correspond to semitones but to the pitches actually sung, as shown in FIG. 20.

[000200]. The note model has been described for MIDI notes, but it can clearly be defined for any type of tempered note that can be associated with a signal pitch.

Claims

1. A method for automatically composing a ringtone (13) from a recording of a monophonic voice signal (5), such as a song or humming by a user, in which:

analysis parameters (25) of the voice signal (5) are extracted, such as the pitch and/or intensity and/or attack of the notes of the voice signal, and

the voice signal (5) is transformed into a ringtone comprising at least one musical track, characterized in that, to transform the voice signal (5) into a ringtone:

the voice signal is tuned by transposing the whole of said voice signal (5) by one and the same pitch so as to minimize a distance between the whole of the voice signal (5) and a tempered chromatic scale, and
- the voice signal (5) is tempered by replacing the notes of the transposed voice signal with tempered notes.

2. Method according to claim 1, characterized in that, to tune the voice signal, a central note (NC) of the voice signal (5), corresponding to the most frequent note of this voice signal, is determined, and

- the whole voice signal is transposed globally so that the pitch of the central note (NC) matches the pitch of its nearest tempered note.

3. Method according to claim 1 or 2, characterized in that, to temper the voice signal (5), the voice signal (5) having previously been cut into segments (S1-S8):

an associated note model is defined for each tempered note; in a loop, for each segment (Sj), the tempered note whose model is closest to the pitch of the segment (Sj) is determined,

said tempered note is assigned to the segment (Sj), and

the model of said tempered note is updated taking into account the pitch of the segment (Sj), for example by averaging the pitches of the signal notes that have been associated with this tempered note, these note pitches being weighted, where appropriate, by their duration or intensity.

4. Method according to claims 2 and 3, characterized in that, in the initial note model, each tempered note is modeled by a pitch of the chromatic scale tuned to the central note.

5. Method according to claim 3 or 4, characterized in that tempering starts from the last segment of the voice signal (5) in time and works back to the first.

6. Method according to one of claims 1 to 5, characterized in that, to transpose the voice signal (5), a tuning cost is calculated that is equal to the integral, over the duration of the melody of the voice signal, of the product of the instantaneous difference between the pitch of the voice signal (5) and the nearest tempered pitch, raised to the power p (p a strictly positive real number), and the loudness of the voice signal, raised to the power q (q a positive real number), and the voice signal (5) is transposed so as to minimize the value of the tuning cost.

7. Method according to claim 6, characterized in that p is 2 and q is 1.

8. Method according to one of claims 1 to 7, characterized in that a scale is determined by carrying out the following steps:

- probabilities of occurrence of a note in a given scale are chosen,

the degree of membership of the melody in several scales is calculated, this degree of membership being a function of the agreement between the notes of the voice signal and the probabilities of occurrence of the notes of the scale, and

- the scale with the highest degree of membership is selected.

9. Method according to claim 8, characterized in that the degree of membership of the notes in the scale is equal to the sum, over all the notes of the voice signal, of the product of the duration of each note raised to the power p, the intensity of each note raised to the power q, and the probability of occurrence of each note raised to the power r, p, q and r being real numbers greater than 0 and p being nonzero.

10. Method according to one of claims 1 to 9, characterized in that, to temper the melody of the voice signal:
- probabilities of occurrence of a note in a given scale are chosen,

an optimal transposition is calculated for each candidate scale, as a function of the agreement between the tempered notes nearest to the voice signal and the probabilities of occurrence of the notes of the candidate scale, by calculating a degree of membership of the melody in each candidate scale for all possible values of the transposition,

the scale with the highest degree of membership is selected and the notes of the voice signal are transposed by the optimal transposition associated with this scale, and

- tempering is performed by replacing the notes of the transposed voice signal with the tempered notes nearest to the transposed voice signal.

11. Method according to claim 10, characterized in that the degree of membership of the notes in the scale is equal to the integral, over all the notes of the melody, of the product of the intensity of each note raised to the power q, the deviation of the voice-signal note from the note located between the two nearest tempered notes raised to the power p, and the probability of occurrence of each note raised to the power r, p, q and r being real numbers greater than 0 and p being nonzero.

12. Method according to claim 10 or 11, characterized in that knowledge of the scale is used for tempering, by minimizing a distance between the note of the transposed voice signal and the nearest tempered note, this distance being weighted according to the probability of occurrence of the note in the scale.

13. Method according to one of claims 8 to 12, characterized in that the probability of occurrence of a note in a given scale is determined from the probe-tone profiles of Krumhansl & Kessler.

14. Method according to one of claims 1 to 13, characterized in that, to transform the voice signal (5) into a ringtone, notes whose duration is less than a reference value, for example 1 ms, and/or whose intensity is less than a reference intensity, and/or whose pitch-extraction quality is less than a reference value, are removed or grouped.

15. Method according to one of claims 1 to 14, characterized in that, to transform the voice signal into a ringtone, the voice signal (5), or a musical track already obtained from the voice signal, is corrected by cleaning it and/or correcting its melody and/or realigning it rhythmically, and/or a melody is derived from the voice signal (5).

16. Method according to claim 15, characterized in that, to realign the voice signal (5) rhythmically, the tempo of this voice signal (5) is tracked and the notes of the voice signal are realigned to this tempo.

17. Method according to claim 15 or 16, characterized in that the rhythmic correction is performed by imposing a fixed tempo, for example the average tempo extracted from the voice signal, the voice signal (5) being rhythmically fitted to the imposed tempo by a "time stretching" method.

18. Method according to one of claims 15 to 17, characterized in that the rhythmic correction is performed by rhythmic tracking using the "time warping" technique, in which the strong points of the voice signal (5) are tracked in order to build a reference rhythm with which the musical tracks are synchronized.

19. Method according to one of claims 15 to 18, characterized in that the melodic correction is performed by a "pitch shifting" technique in which the notes of the voice (5) are realigned to tempered notes and/or to the average range of the hummed voice signal.

20. Method according to one of claims 15 to 19, characterized in that, to compose a new melody derived from the voice signal, the notes of the voice signal (5) and a rhythm are selected according to the extracted analysis parameters (25) and to musical construction rules that depend on a musical style either chosen by the user or imposed.

21. Method according to one of claims 15 to 20, characterized in that, to compose a new melody derived from the voice signal, an original melody is created that is either a response to the vocal melody and/or vocal rhythm computed from the analysis parameters, so as to establish a dialogue with a machine, or a sequence that properly ends the vocal melody according to the chosen style.

22. Method according to one of claims 1 to 21, characterized in that the timbre and/or pitch and/or other characteristics of the voice are modified by transforming the voice signal (5), or by synthesizing a sound environment from the voice signal (5).

23. Method according to one of claims 1 to 22, characterized in that, to produce one or more musical tracks, one or more prerecorded rhythm loops in the form of an audio signal are used, fitted to the tempo extracted from the voice signal (5).

24. Method according to one of claims 1 to 23, characterized in that, to create one or more musical tracks, musical samples whose musical parameters are closest to those of the voice signal over a given time interval are selected from a music database.

25. Method according to one of claims 1 to 24, characterized in that the volumes of the different musical tracks are adjusted relative to one another, and/or effects such as saturation or sound compression are applied to selected tracks, and, where appropriate, all the tracks are mixed into an output track, and/or global effects such as reverb are applied to this output track.

26. Method according to claim 25, characterized in that an audio file in an mp3-type format is created from the mixed output track.

27. Method according to one of claims 1 to 26, characterized in that the ringtone comprises several musical tracks arranged relative to one another according to the extracted analysis parameters and musical composition rules.

28. Method according to one of claims 1 to 27, characterized in that the musical composition rules are related to a music style (6), such as rock or blues, chosen by the user.

29. Method according to one of claims 1 to 28, characterized in that the voice signal (5) is streamed to a server (4) that extracts the analysis parameters (25) and produces the ringtone (13).

30. Method according to one of claims 1 to 29, characterized in that the musical tracks are obtained by MIDI and/or audio processing of the voice signal (5).

31. Mobile phone implementing the method according to one of claims 1 to 30.

32. Method for associating telephone ringtones with contacts stored in a mobile phone, in which:
- a phrase sung by the user of the mobile phone or by one of the contacts is recorded,

the sung phrase is converted into a ringtone using the method defined in one of claims 1 to 30, and the resulting ringtone is stored in a memory area associated with the contact for whom it is intended, so that when the contact calls, the phone plays the corresponding ringtone.

33. Device for generating music in real time, implementing the method defined in one of claims 1 to 30 to generate a ringtone from a sung musical phrase, this device comprising means for playing this ringtone in a loop, so that it is possible to sing over this ringtone, or to mix it with music tracks to create musical tracks.

34. Method for producing a ringtone, in which several voice lines are recorded one after the other, listening to the voice lines previously recorded and processed according to the method defined in one of claims 1 to 30 being allowed during the recording of a new voice line, and the analysis parameters (25) being extractable from each recording or from all the recordings.
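Claims 6 to 9 and 13 describe two computations that can be sketched together. The first is a discrete version of the claim-6 tuning cost, C(d) = sum over frames of |h(t) + d - round(h(t) + d)|^p * L(t)^q * dt, minimized over candidate shifts d, with p = 2 and q = 1 as in claim 7. The second scores candidate scales with the claim-9 degree of membership, using the Krumhansl & Kessler probe-tone ratings, normalized to sum to 1, as probabilities of occurrence (claim 13). The normalization, the 100-level shift grid, the restriction to major and minor keys, and all function names are assumptions of this sketch.

```python
import numpy as np

def tuning_cost(pitch, loudness, dt, shift, p=2.0, q=1.0):
    """Claim-6 cost for one shift: sum of |dev|**p * loudness**q * dt."""
    h = np.asarray(pitch, dtype=float) + shift
    dev = np.abs(h - np.round(h))  # distance to the nearest tempered pitch
    return float(np.sum(dev ** p * np.asarray(loudness, dtype=float) ** q) * dt)

def best_shift(pitch, loudness, dt, n_levels=100, p=2.0, q=1.0):
    """Shift in [-0.5, 0.5) semitone that minimizes the tuning cost."""
    shifts = np.linspace(-0.5, 0.5, n_levels, endpoint=False)
    costs = [tuning_cost(pitch, loudness, dt, s, p, q) for s in shifts]
    return float(shifts[int(np.argmin(costs))])

# Krumhansl & Kessler (1982) probe-tone ratings, index 0 = tonic.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def scale_membership(notes, durations, intensities, tonic, profile,
                     p=1.0, q=1.0, r=1.0):
    """Claim-9 score: sum of duration**p * intensity**q * probability**r."""
    probs = np.asarray(profile) / np.sum(profile)  # assumed normalization
    total = 0.0
    for n, d, i in zip(notes, durations, intensities):
        prob = probs[(int(n) - tonic) % 12]  # pitch class relative to tonic
        total += d ** p * i ** q * prob ** r
    return total

def best_scale(notes, durations, intensities):
    """Return (tonic pitch class, 'major' or 'minor') with the top score."""
    candidates = [(t, name, prof) for t in range(12)
                  for name, prof in (("major", KK_MAJOR), ("minor", KK_MINOR))]
    t, name, _ = max(candidates, key=lambda c: scale_membership(
        notes, durations, intensities, c[0], c[2]))
    return t, name
```

With p = 2, the cost penalizes large deviations much more than small ones, so a few badly sung frames weigh more than many slightly sharp ones; weighting by loudness lets the louder, usually more stable, parts of the melody dominate the choice of shift.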