WO2014084706A1

WO2014084706A1 - Method for three-dimensional audio localisation in real time using a parametric mixer and pre-decomposition into frequency bands

Info

Publication number: WO2014084706A1
Application number: PCT/MX2013/000157
Authority: WO
Inventors: Daniel LÓPEZ GARCÍA
Original assignee: Individual
Current assignee: Individual
Priority date: 2012-11-30
Filing date: 2013-11-29
Publication date: 2014-06-05
Anticipated expiration: 2015-05-30
Also published as: MX2012014008A

Abstract

The invention relates to a method for processing at least one monaural audio signal with known parameters, with the aim of generating a stereo signal allowing the perception that sound comes from any spatial position in real time by means of the use of earphones. The method is based on the division of the process into two steps, one step wherein the audio signal is prepared by being dividing into frequency sub-bands and stored in a memory, and the second step that is the spatial processing of the signal by means of the use of empirical parametric tables where the spatial localisation coordinates are inserted either by a person or by a device via an interface. Said embodiment allows this effect to be obtained in real time in devices with little computing power, without being limited thereto.

Description

METODO DE UBICACIÓN TRIDIMENSIONAL DE AUDIO EN TIEMPO REAL USANDO UN MEZCLADOR PARAMÉTRICO Y PREDESCOMPOSICION EN THREE-DIMENSIVE REAL-TIME AUDIO LOCATION METHOD USING A PARAMETRIC MIXER AND PREDESCOMPOSITION IN

BANDAS DE FRECUENCIA FREQUENCY BANDS

Campo de la Invención Field of the Invention

La presente invención se ubica en el campo del procesamiento de señales y acústica. Específicamente, la presente invención se refiere a un método para procesar una o varias señales de audio monoaural, para generar una señal estéreo que permiten la percepción en tiempo real de que el sonido proviene de cualquier posición espacial mediante el uso de auriculares. The present invention is located in the field of signal processing and acoustics. Specifically, the present invention relates to a method for processing one or more monaural audio signals, to generate a stereo signal that allows the real-time perception that sound comes from any spatial position through the use of headphones.

Antecedentes de la Invención Background of the Invention

Es conocido que el oído humano es capaz de discernir la ubicación espacial del sonido. Esto significa que un escucha con aptitudes auditivas promedio puede localizar las fuentes de los sonidos a su alrededor. Esta característica 3D del sonido está basada en que cada oído recibe las ondas acústicas con diferente amplitud y tiempo de arribo; información que el cerebro compara, y con base en ello, determina la ubicación del origen del sonido. El efecto antes descrito se conoce como escucha binaural. It is known that the human ear is able to discern the spatial location of sound. This means that a listener with average listening skills can locate the sources of the sounds around him. This 3D characteristic of the sound is based on the fact that each ear receives the acoustic waves with different amplitude and arrival time; information that the brain compares, and based on it, determines the location of the origin of the sound. The effect described above is known as binaural listening.

Diferentes configuraciones se han usado para dar la sensación de escucha binaural al usuario. Dentro de éstas soluciones, el posicionar físicamente las bocinas donde se desea que el sonido aparente su origen es el método más sencillo. Otra solución muy conocida en el medio acústico consiste en usar una cabeza de maniquí que simule una cabeza real. Se coloca un micrófono en cada una de las de las orejas del maniquí para posteriormente realizar la grabación deseada. El efecto 3D se obtiene reproduciendo, sin ningún procesamiento especial, el sonido grabado en cada micrófono en el correspondiente audífono (micrófono derecho en auricular derecho, y en forma similar, el micrófono izquierdo en el auricular izquierdo) . El principal problema con este sistema es que la ubicación espacial de la fuente del sonido es fija. Different configurations have been used to give the user a binaural listening sensation. Within these solutions, physically positioning the speakers where it is desired that the apparent sound of its origin is the simplest method. Another well known solution in the acoustic environment It consists of using a dummy head that simulates a real head. A microphone is placed in each of the dummy's ears to subsequently make the desired recording. The 3D effect is obtained by reproducing, without any special processing, the sound recorded in each microphone in the corresponding hearing aid (right microphone in the right earphone, and similarly, the left microphone in the left earphone). The main problem with this system is that the spatial location of the sound source is fixed.

Para otorgar flexibilidad y capacidad de poder ubicar una fuente de sonido a complacencia, varios métodos han sido propuestos. El procedimiento más empleado consiste en el uso de la Head-Related-Transfer-Function (HRTF) . Esta función de transferencia describe matemáticamente como las diferentes frecuencias para diferentes ubicaciones de fuentes de sonido son escuchadas por cada uno de los oídos (La configuración anatómica del oído humano es considerada en esta función) . El principal problema de este método es la considerable capacidad de cómputo que es requerida y que no permite su uso extendido en aplicaciones en tiempo real, entendiéndose éste como que la percepción auditiva sea congruente con el resto de interacciones en una implementación dada. Dicho requerimiento se debe a que es necesario transformar la señal de audio del dominio del tiempo al dominio de la frecuencia mediante la transformada de Fourier; multiplicar la función de transferencia por el espectro de la señal, y posteriormente transformar de nuevo la señal modificada al dominio del tiempo mediante la transformada inversa de Fourier. Todas estas operaciones matemáticas implican capacidad de procesamiento que en muchas aplicaciones no es recomendado que se realicen o no se tiene la capacidad de cómputo requerida, siendo un claro ejemplo las aplicaciones en dispositivos móviles. Es cierto que es posible evitar las transformadas de Fourier mediante el uso de la convolución de la señal de audio con la transformada inversa de la HRTF (Head-Related-Impulse- Function) , pero dada la naturaleza en que se obtiene la HRTF, es preferible usar el método que utiliza transformadas de Fourier. To provide flexibility and ability to locate a source of sound to complacency, several methods have been proposed. The most commonly used procedure is the use of the Head-Related-Transfer-Function (HRTF). This transfer function describes mathematically how different frequencies for different locations of sound sources are heard by each of the ears (The anatomical configuration of the human ear is considered in this function). The main problem of this method is the considerable computing capacity that is required and that does not allow its extended use in real-time applications, this being understood as that the auditory perception is congruent with the rest of interactions in a given implementation. This requirement is due to the need to transform the audio signal from the time domain to the frequency domain by means of the Fourier transform; multiply the transfer function by the spectrum of the signal, and then transform the modified signal back to the time domain by means of the inverse Fourier transform. All these operations Mathematics implies processing capacity that in many applications it is not recommended that the required computing capacity be carried out or not, being a clear example the applications in mobile devices. It is true that it is possible to avoid Fourier transforms by using the convolution of the audio signal with the inverse transformation of the HRTF (Head-Related-Impulse-Function), but given the nature in which the HRTF is obtained, it is It is preferable to use the method that uses Fourier transforms.

En relación con la tecnología antes descrita, la patente US 5105462 menciona un sistema para crear la ilusión de fuentes de sonidos localizadas en diferentes ubicaciones en un espacio tridimensional, dicha procedimiento emplea un sistema convencional de dos altoparlantes, teniendo éstos que cumplir con ciertos requerimientos específicos como la distancia entre ellos y su ubicación respecto al escucha. La secuencia de sonido original que emplea es una señal monoaural que es copiada en dos canales, cada uno representando los canales derecho/izquierdo de un sistema de audio estéreo. Estas dos señales son subdivididas en intervalos de frecuencia. A cada intervalo de frecuencia se modifica su amplitud y se agrega un retraso entre el canal derecho y el izquierdo de acuerdo con una función de transferencia obtenida de manera empírica. Esta función de transferencia es dependiente de la ubicación deseada y del intervalo de frecuencia a procesar. El utilizar esta función de transferencia empírica evita el uso de la HRTF; no obstante, las modificaciones de amplitud y retrasos para las diferentes bandas de frecuencia, en una implementación digital del sistema, son necesarias de hacer en el dominio de la frecuencia. Este último requerimiento demanda del uso de la transformada de Fourier para transformar la señal de audio monoaural al dominio de la frecuencia, aplicar la función de transferencia dos veces para obtener los dos canales de audio estéreo, y finalmente, usar la transformada inversa de Fourier para obtener las señales en dominio temporal. Todas estas operaciones matemáticas necesitan una gran capacidad de cómputo que, para dispositivos con capacidad de procesamiento limitado y aplicaciones que requieran de este efecto en tiempo real, ésta solución sea difícilmente empleada. Un punto importante a recalcar sobre esta propuesta es que la separación en sub-bandas de frecuencia y el procesamiento espacial del sonido se considera un proceso indivisible, sin mencionar que las etapas se puedan separar a fin de tener etapas independientes de procesamiento, lo que otorgaría una gran flexibilidad al método . In relation to the technology described above, US 5105462 mentions a system to create the illusion of sound sources located in different locations in a three-dimensional space, said method employs a conventional system of two loudspeakers, these having to meet certain specific requirements as the distance between them and their location with respect to listening. The original sound sequence used is a monaural signal that is copied into two channels, each representing the right / left channels of a stereo audio system. These two signals are subdivided into frequency intervals. To each frequency range its amplitude is modified and a delay between the right and the left channel is added according to an empirically obtained transfer function. This transfer function is dependent on the desired location and the frequency range to be processed. Using this empirical transfer function avoids the use of the HRTF; however, the amplitude modifications and delays for the different Frequency bands, in a digital implementation of the system, are necessary to do in the frequency domain. This last requirement demands the use of the Fourier transform to transform the monaural audio signal to the frequency domain, apply the transfer function twice to obtain the two stereo audio channels, and finally, use the Fourier reverse transform to get the signals in temporal domain. All these mathematical operations require a large computing capacity that, for devices with limited processing capacity and applications that require this effect in real time, this solution is hardly used. An important point to emphasize on this proposal is that separation in frequency subbands and spatial processing of sound is considered an indivisible process, not to mention that the stages can be separated in order to have independent processing stages, which would grant great flexibility to the method.

En el caso de la patente US 5521981, se maneja un sistema de localización 3D de sonidos. En este caso el objetivo principal es reducir la necesidad de cómputo que este efecto necesita, para este fin se hacen las siguientes consideraciones: Primero se define un conjunto de locaciones significativas (en su caso de ejemplo: arriba, abajo, derecha, izquierda, enfrente) . En segundo lugar, teniendo un banco de sonidos que se sabe se necesitan localizar espacialmente, se pre-procesan usando la función de transferencia HRFT para cada una de las localizaciones del conjunto a una distancia del escucha fija. Estas secuencias de audio pre-procesadas se guardan en memoria. Esto va a producir n número de secuencias de audio para cada sonido, siendo n el número de ubicaciones que se definió previamente. El tercer paso consiste en obtener las coordenadas en las que se quiere ubicar la fuente de sonido. Como los valores de amplitud y retraso de la señal están pre-procesados y sus ubicaciones fijas; es necesario realizar una interpolación lineal de las locaciones predeterminadas que ya se han pre-procesado . Aunque la configuración de este método reduce las necesidades de cómputo, se sigue haciendo uso de la función HRTF para la ubicación espacial del sonido. Asimismo, si bien el sistema se divide en dos etapas, en ambas se hacen modificaciones respecto a la ubicación de la fuente' de sonido In the case of US 5521981, a 3D sound localization system is handled. In this case the main objective is to reduce the need for computation that this effect needs, for this purpose the following considerations are made: First a set of significant locations is defined (in your case example: up, down, right, left, opposite ). Secondly, having a sound bank that is known to be spatially located, they are preprocessed using the HRFT transfer function for each of the locations of the set at a distance from the fixed listener. These Pre-processed audio sequences are stored in memory. This will produce n number of audio sequences for each sound, with n being the number of locations that were previously defined. The third step is to obtain the coordinates in which you want to locate the sound source. As the signal amplitude and delay values are pre-processed and their fixed locations; it is necessary to perform a linear interpolation of the predetermined locations that have already been pre-processed. Although the configuration of this method reduces computing needs, the HRTF function is still used for the spatial location of the sound. Likewise, although the system is divided into two stages, in both modifications are made regarding the location of the sound source

Sumario de la Invención Summary of the Invention

Dado lo anterior, un objetivo de la presente invención es brindar un método que permite el tratamiento de una o varias señales monoaurales de audio que al ser reproducida en dos auriculares produce el efecto de ubicar tridimensionalmente la o las fuentes de sonido en tiempo real, en donde dicho método comprende una etapa de pre- procesamiento que prepara la señal subdividiéndola en sub- bandas de frecuencia y guardando un conjunto de diferentes secuencias de audio, cada una representando cada una de las sub-bandas de frecuencia. El guardado se hace en un dispositivo de memoria de forma indefinida; y una etapa posicionamiento en tiempo real, que genera la señal de audio estéreo, tomando las secuencias de audio de las diferentes sub-bandas y modificándolas en amplitud y retraso temporal para cada canal de audio, siendo estos parámetros obtenidos de una tabla de datos generada empíricamente bajo un esquema de prueba y error, donde la localización de la o las fuentes de sonido son insertadas ya sea por una persona o dispositivo a través de una interfaz. Las secuencias para cada sub-banda de frecuencia se mezclan para obtener la señal final de cada canal de audio y estás señales se envían a los respectivos auriculares . Given the foregoing, an objective of the present invention is to provide a method that allows the treatment of one or several monaural audio signals that when reproduced in two headphones produces the effect of locating the sound source or sources in real time, in three wherein said method comprises a preprocessing stage that prepares the signal by subdividing it into frequency subbands and saving a set of different audio sequences, each representing each of the frequency subbands. Saving is done in a memory device indefinitely; and a real-time positioning stage, which generates the stereo audio signal, taking the audio sequences from the different subbands and modifying them in amplitude and time delay for each audio channel, these parameters being obtained from an empirically generated data table under a trial and error scheme, where the location of the sound source or sources are inserted either by a person or device through an interface. The sequences for each frequency subband are mixed to obtain the final signal of each audio channel and these signals are sent to the respective headphones.

Un objetivo más de la presente invención es brindar un método que permite el tratamiento de una señal monoaural de audio, guardarla indefinidamente y que al ser reproducida en dos auriculares produzca el efecto de ubicar tridimensionalmente la fuente de sonido. A further objective of the present invention is to provide a method that allows the treatment of a monaural audio signal, to store it indefinitely and that when reproduced in two headphones produces the effect of locating the sound source three-dimensionally.

Breve Descripción de las Figuras de la Invención Brief Description of the Invention Figures

Para completar la descripción que se está realizando y con objetivo de ayudar a una mejor comprensión de las características del invento, se acompaña una serie de esquemas en donde con carácter ilustrativo, y no limitativo, se ha representado lo siguiente: To complete the description that is being made and in order to help a better understanding of the characteristics of the invention, a series of schemes is attached, where, as an illustration, and not limitation, the following has been represented:

La Figura 1 se refiere a un diagrama de bloques del método de procesamiento de audio según la presente invención . Figure 1 refers to a block diagram of the audio processing method according to the present invention.

La Figura 2 se refiere a un diagrama de bloques del método de procesamiento de audio según la presente invención con entradas auxiliares que conllevan información de la división espectral de la señal en la etapa de pre- procesamiento y su posible formato. Figure 2 refers to a block diagram of the audio processing method according to the present invention with auxiliary inputs that carry information of the spectral division of the signal in the pre-stage. Processing and its possible format.

La Figura 3 se refiere a un diagrama de bloques del método de procesamiento de audio según la presente invención con un número n de entradas de audio y n número de sonidos ubicados espacialmente . Figure 3 refers to a block diagram of the audio processing method according to the present invention with a number n of audio inputs and n number of spatially located sounds.

Descripción Detallada de las Modalidades Representativas de la Invención Detailed Description of the Representative Modalities of the Invention

La presente invención se refiere a un método que procesa una señal monoaural digital, pero no limitada a esta, para poder crear la ilusión en tiempo real que el sonido proviene de una ubicación espacial determinada. The present invention relates to a method that processes a digital monaural signal, but not limited to it, in order to create the real-time illusion that the sound comes from a particular spatial location.

El método de procesamiento de la señal de audio está representado en el diagrama de bloques de la figura 1 y contiene las siguientes etapas: The method of processing the audio signal is represented in the block diagram of Figure 1 and contains the following steps:

La etapa de pre-procesamiento (100), puede ser una implementación en software y/o en hardware, del tipo una entrada-varias salidas, donde dichas señales, tanto la de entrada como las de salidas, representan una señal de audio digital, pero no limitada a ésta. Dicha etapa comprende un módulo (102) que genere n secuencias de audio Bl, B2, B3, ... , Bn (103) a partir de la secuencia de audio original que es ingresada por la entrada (101), siendo n el número de sub-bandas en que fue dividido el espectro de la señal. Cada una de estas secuencias de audio (103) representa cada una de las sub-bandas de frecuencia. Dichas secuencias de audio son guardadas indefinidamente en un dispositivo de memoria (200) que puede ser, pero sin limitarse a, dispositivos ópticos, memorias flash, discos duros, etc. La separación en sub-bandas de frecuencia se puede hacer mediante diferentes técnicas. La técnica seleccionada no es necesariamente restringida en capacidad de cómputo, esto es debido a que el método propuesto mantiene independientes el procesamiento de las dos etapas que lo compone, pudiendo ser realizadas en diferentes equipos. Esta configuración permite que esta etapa se realice con mayor precisión en equipos con gran capacidad de cómputo y permita tener secuencias de audio de mayor calidad . The preprocessing stage (100) can be an implementation in software and / or hardware, of the type one input-several outputs, where said signals, both the input and the outputs, represent a digital audio signal, but not limited to this one. Said stage comprises a module (102) that generates n audio sequences Bl, B2, B3, ..., Bn (103) from the original audio sequence that is input by the input (101), where n is the number of subbands into which the signal spectrum was divided. Each of these audio sequences (103) represents each of the frequency subbands. Said audio sequences are stored indefinitely in a memory device (200) which can be, but not limited to, optical devices, flash memories, discs. hard, etc. The separation into frequency subbands can be done by different techniques. The selected technique is not necessarily restricted in computing capacity, this is due to the fact that the proposed method keeps the processing of the two stages that composes it independent, and can be performed in different equipment. This configuration allows this stage to be carried out with greater precision in equipment with great computing capacity and allows to have higher quality audio sequences.

La segunda etapa corresponde al procesamiento espacial de la señal de audio en tiempo real (300), la cual puede ser una implementación en software y/o hardware. Esta etapa accede a las diferentes secuencias de audio correspondientes a las sub-bandas de frecuencia Bl,B2,...,Bn (103) y las envía al mezclador y desfasador (302). Este módulo recibe las tres coordenadas espaciales (x,y,z) en la entrada (303) que son insertadas ya sea por una persona o dispositivo a través de una interfaz, patentada o sin patentar y sin agravio de la presente invención. Estás variables también pueden ser dadas en algún otro tipo de coordenadas tipo polares o esféricas, o algún sistema de segmentación o representación que se considere más óptimo. A partir de estas coordenadas, el mezclador y desfasador (302) accede a una tabla paramétrica empírica (301), donde obtiene los correspondientes valores de ganancia G (dB) y tiempos de retraso (t) de los dos canales de audio para las diferentes bandas de frecuencia. La información de retraso (t) puede estar expresada en segundos, o bien, en alguna cantidad que indirectamente contenga información de un retraso temporal, como pueden ser el número de muestras o posiciones a retrasarse en una señal digital, conocida la frecuencia de muestreo. La tabla paramétrica empírica (301) es obtenida con un método basado en prueba y error. Con base a esos parámetros, se mezclan las secuencias de audio Bl,B2,...,Bn (103) con su correspondiente cambio de ganancia y tiempo de retraso. El resultado son dos señales (304) y (305) que corresponden al canal izquierdo y derecho de audio respectivamente. Estas señales se pueden ingresar a un par de auriculares, o bien se pueden guardar en un dispositivo de memoria por tiempo indefinido para reproducirlas en un tiempo posterior. The second stage corresponds to the spatial processing of the real-time audio signal (300), which can be an implementation in software and / or hardware. This stage accesses the different audio sequences corresponding to the frequency subbands Bl, B2, ..., Bn (103) and sends them to the mixer and phase shifter (302). This module receives the three spatial coordinates (x, y, z) at the entrance (303) that are inserted either by a person or device through an interface, patented or unpatent and without tort of the present invention. These variables can also be given in some other type of polar or spherical coordinates, or some segmentation or representation system that is considered more optimal. From these coordinates, the mixer and phase shifter (302) accesses an empirical parametric table (301), where it obtains the corresponding gain values G (dB) and delay times (t) of the two audio channels for the different frequency bands The delay information (t) may be expressed in seconds, or in some amount that indirectly contains information from a time delay, such as the number of samples or positions to be delayed in a digital signal, known the sampling frequency. The empirical parametric table (301) is obtained with a method based on trial and error. Based on these parameters, the audio sequences Bl, B2, ..., Bn (103) are mixed with their corresponding change in gain and delay time. The result is two signals (304) and (305) that correspond to the left and right audio channel respectively. These signals can be input to a pair of headphones, or they can be stored in a memory device for an indefinite period to be played back at a later time.

Es importante recalcar que la etapa de procesamiento espacial de la señal de audio en tiempo real (300) puede estar implementada en reiteradas ocasiones en una misma aplicación, pudiendo generar una cantidad de señales de audio en 3D sin un límite técnico conocido más que la capacidad de cómputo. El esquema correspondiente a esta configuración está representado en la figura 3. En dicho esquema es necesario tener guardados las secuencias de audio Bl,B2,...,Bn (103) de cada sonido que se necesite. Esta operación puede hacerse con una sola implementación de la etapa de pre-procesamiento, solo que guardando cada conjunto de secuencias de audio (103) en locaciones diferentes de memoria; o bien, tener diversas etapas de pre-procesamiento en paralelo que guarden las diferentes secuencias, igualmente en diferentes locaciones de memoria. Con los datos ya guardados en la memoria (200), la etapa de procesamiento espacial de la señal de audio en tiempo real (300) puede optimizarse al compartir elementos de esta etapa como la tabla de valores o el sistema de mezclado. Esta implementación múltiple implicaría la necesidad de un mezclador en la salida total del sistema que en la figura 3 esta representados como, pero sin limitarse a, dos sumadores (306) y (307). El mezclador de salida (306) genera la secuencia de audio para el canal izquierdo (304) a partir de todas las señales de salida de canal izquierdo de cada subsistema y el mezclador de salida (307) genera la secuencia de audio del canal derecho (305) a partir de las señales de salida de canal derecho de cada subsistema. It is important to emphasize that the spatial processing stage of the real-time audio signal (300) can be repeatedly implemented in the same application, being able to generate a quantity of 3D audio signals without a known technical limit more than the capacity of computation The scheme corresponding to this configuration is represented in figure 3. In this scheme it is necessary to keep the audio sequences Bl, B2, ..., Bn (103) of each sound that is needed. This operation can be done with a single implementation of the preprocessing stage, only by saving each set of audio sequences (103) in different memory locations; or, have several stages of preprocessing in parallel that save the different sequences, also in different memory locations. With the data already stored in memory (200), the spatial processing stage of the real-time audio signal (300) can be optimized by sharing elements of this stage like the table of values or the mixing system. This multiple implementation would imply the need for a mixer in the total output of the system that in figure 3 is represented as, but not limited to, two adders (306) and (307). The output mixer (306) generates the audio sequence for the left channel (304) from all the left channel output signals of each subsystem and the output mixer (307) generates the audio sequence of the right channel ( 305) from the right channel output signals of each subsystem.

Es importante añadir que la descomposición en distintos rangos de frecuencia se realiza por el módulo (102) utilizando cualquier técnica conocida y por conocer, como puede ser, pero sin estar limitado a, el filtrado analógico, digital, análisis en Fourier, ondeletas y convolución. También se contempla la posibilidad de convertir la señal internamente en cualesquiera de sus posibles representaciones a cualesquier otra posible representación, pudiendo experimentar de un proceso de descomposición en frecuencia tantas veces como sea necesario. Esto es con el fin de poder aceptar cualquier formato de señal y entregar cualquier formato de señal. It is important to add that the decomposition in different frequency ranges is performed by the module (102) using any known and known technique, such as, but not limited to, analog, digital filtering, Fourier analysis, wavelets and convolution . The possibility of converting the signal internally into any of its possible representations to any other possible representation is also contemplated, being able to experience a frequency decomposition process as many times as necessary. This is in order to be able to accept any signal format and deliver any signal format.

Respecto a la subdivisión de la secuencia de audio en sub-bandas de frecuencia, el número de éstas y su ancho espectral se determinan específicamente para cada aplicación. Puede ser el caso en que las bandas no tengan el mismo ancho espectral ya que ciertos rangos de frecuencia se pueden considerar iguales para fines de percepción auditiva, y los cuales no están divididos equitativamente en el espectro auditivo; o que se contemple que alguna sub-banda o conjunto de sub-bandas sean descartadas. Algunas razones para hacer esta consideración pueden ser, sin estar limitado, a razones de compresión, poco contenido audible en dichas sub-bandas u optimización del aparato. Regarding the subdivision of the audio sequence into frequency subbands, the number of these and their spectral width are determined specifically for each application. It may be the case where the bands do not have the same spectral width since certain frequency ranges can be considered equal for auditory perception purposes, and which are not evenly divided into the auditory spectrum; or contemplated that any sub-band or sub-band set is discarded. Some reasons for making this consideration may be, without being limited, for reasons of compression, little audible content in said subbands or optimization of the apparatus.

Debido a lo anterior, es posible que la etapa de pre- procesamiento (100) tenga entradas adicionales que dirijan su comportamiento siendo posibles variables, pero sin limitarse a ellas, el número de señales de salida, sus anchos espectrales, formato y método de descomposición. Dicha implementación queda representada en la figura 2, donde las variables de división espectral de la señal (104) están dadas externamente ya sea por el usuario o una aplicación la cual escoge sus parámetros óptimos. Due to the above, it is possible that the preprocessing stage (100) has additional inputs that direct its behavior being possible variables, but not limited to them, the number of output signals, their spectral widths, format and decomposition method . Said implementation is represented in Figure 2, where the spectral division variables of the signal (104) are given externally either by the user or an application which chooses its optimal parameters.

Es importante mencionar que el módulo mezclador y desfasador (302) hace modificaciones a la secuencia de audio dividida en sub-bandas de frecuencia. El primer paso es que mediante las coordenadas espaciales (303), busca en la tabla paramétrica los correspondientes valores de ganancia y tiempo de retraso para cada sub-banda de frecuencia y canal de audio. El mezclador y desfasador (302) toma el conjunto de secuencias de audio (103) y modifica este conjunto con los valores paramétricos obtenidos anteriormente aplicando el valor de ganancia y tiempo de retraso correspondiente al canal izquierdo al conjunto (103), y posteriormente los valores de ganancia y tiempo de retraso del canal derecho al mismo conjunto (103) . Es importante mencionar que el orden de las operaciones de ganancia y retraso puede ser invertido. Al final se mezclan las secuencias de audio modificadas de todas las sub-bandas de cada canal y se obtiene la señal de sonido estéreo compuesta por las señales de salida (304) y (305) con el efecto de ubicación tridimensional. It is important to mention that the mixer and phase shifter module (302) makes modifications to the audio sequence divided into frequency subbands. The first step is that using the spatial coordinates (303), look in the parametric table for the corresponding gain and delay time values for each frequency sub-band and audio channel. The mixer and phase shifter (302) takes the set of audio sequences (103) and modifies this set with the parametric values obtained previously by applying the gain value and delay time corresponding to the left channel to the set (103), and subsequently the values of gain and delay time of the right channel to the same set (103). It is important to mention that the order of profit and delay operations can be reversed. In the end the modified audio sequences of all sub-bands of each channel and the stereo sound signal composed of the output signals (304) and (305) is obtained with the effect of three-dimensional location.

La presente invención requiere de una interconectividad de las dos etapas en que se compone el método descrito. La conexión de la etapa de pre- procesamiento (100) con la etapa de procesamiento espacial de la señal de audio en tiempo real (300) se da por medio de una memoria (200) . El esquema descrito en la figura 1 contempla que las n señales que proceden de la etapa de pre-procesamiento (100), y las que necesita la etapa de procesamiento espacial de la señal de audio en tiempo real tienen el mismo formato; el invento aquí reivindicado no se limita a este caso en particular, sino deja la posibilidad de alterar el formato de las señales Bl...Bn (103) de cualesquier formato a cualesquier formato distinto para cada señal de las sub-bandas de frecuencia. Estas modificaciones se hacen tantas veces como se considere necesario, sin alterar la naturaleza del invento expuesto. The present invention requires an interconnectivity of the two stages in which the described method is composed. The connection of the preprocessing stage (100) with the spatial processing stage of the real-time audio signal (300) is given by means of a memory (200). The scheme described in Figure 1 contemplates that the n signals that come from the preprocessing stage (100), and those required by the spatial processing stage of the real-time audio signal have the same format; The invention claimed herein is not limited to this particular case, but leaves the possibility of altering the format of the Bl ... Bn (103) signals of any format to any different format for each signal of the frequency subbands. These modifications are made as many times as deemed necessary, without altering the nature of the exposed invention.

De acuerdo con la presente invención, las señales de salida (304) y (305) se conectan a auriculares lo que le proporciona al método capacidad de movilidad, esta propiedad es útil para, pero sin limitarse a, aplicaciones para dispositivos móviles o consolas de videojuegos móviles. Otro factor de ventaja de éste radica en la generación de la tabla paramétrica, dado que se facilita ya que es independiente a la acústica del lugar, además de que esta tecnología se puede usar en situaciones donde los altoparlantes no son posibles de usar. In accordance with the present invention, the output signals (304) and (305) are connected to headphones which provides the mobility capability method, this property is useful for, but not limited to, applications for mobile devices or consoles. mobile video games Another advantage factor of this lies in the generation of the parametric table, since it is facilitated since it is independent of the acoustics of the place, in addition to this technology can be used in situations where loudspeakers are not possible to use.

Es necesario añadir que cuando se cambia la ubicación espacial del sonido de manera muy rápida, este cambio puede generar efectos sonoros no placenteros; para evitar éstos, se puede agregar un módulo que suavice estas transiciones al escucha. Este módulo "smoother" o suavizador recibe las señales (304) y (305) de un aparato que implemente el presente método y proporciona dos salidas de audio, cada una representando los respectivos canales izquierdo y derecho de un sistema de audio estéreo. Estas señales son suavizadas mediante algún filtro especial u otro método generando dos señales para la los dos canales de audio estéreo sin los sonidos molestos antes presentados. It is necessary to add that when the location is changed spatial sound very quickly, this change can generate unpleasant sound effects; To avoid these, you can add a module that softens these transitions to the listener. This "smoother" or softener module receives signals (304) and (305) from an apparatus that implements the present method and provides two audio outputs, each representing the respective left and right channels of a stereo audio system. These signals are smoothed by some special filter or other method generating two signals for the two stereo audio channels without the annoying sounds presented above.

Claims

CLAIMS Having described the invention, the content of the following claims is claimed as property:

1. A method of processing one or more monaural audio signals that allows the simulation of 3D location of sound sources, characterized in that it comprises:

A preprocessing stage (100) that can be implemented in software and / or hardware, of the type one input-several outputs, where the input signal is the original audio sequence that enters through the input (101) and the Output signals are audio sequences (103) that correspond to the frequency subbands into which the original signal is divided. This step stores the different audio sequences (103) in a memory device (200) indefinitely.

A stage of spatial processing of the real-time audio signal (300) that can be implemented in software and / or in hardware of several inputs, which consist of n audio sequences (103) (obtained from the memory device (200 )) and spatial coordinates (303). With the use of these, the mixer and phase shifter (302) searches the parametric table (301) for the modification values of the different audio sequences (103) for each of the stereo channels. The mixer and phase shifter (302) adds the audio sequences (103) with their respective modifications to each of the stereo audio channels and sends the signals to the respective audio channel. left audio (304) and right audio channel (305) of headphones.

2. The method of processing one or more monaural audio signals according to claim 1, wherein the preprocessing stage (100) can have several inputs, being the original audio sequence entering through the various inputs (101 ) and spectral decomposition variables (104). The audio output sequences (103) corresponding to the frequency subbands into which the original signal is divided will have spectral characteristics defined by the spectral decomposition variables (104).

3. The method of processing one or more monaural audio signals according to claim 1, wherein in the preprocessing stage (100), the original audio sequence that enters through the input (101) can be broken down into different sub -frequency bands, and that these audio sequences may, by any known and known technique, internally become any of their possible representations to any other possible representation as many times as necessary, this to achieve the desired frequency decomposition.

4. The method of processing one or more monaural audio signals according to claim 1, wherein the step of spatial processing of the real-time audio signal (300) can be implemented several times in the same application, generating an amount of 3D signals at the decision of the system implementer.

5. The method of processing one or more monaural audio signals according to the claims above where the spatial location coordinates are inserted either by a person or device through a real-time interface.

6. The method of processing one or more monaural audio signals according to the preceding claims wherein the spatial location coordinates are taken from a memory device that have been previously entered by a person or device through an interface.

7. The method of processing one or more monaural audio signals according to the preceding claims, wherein the output audio signals are stored in memory for later reproduction.

8. The method of processing one or more monaural audio signals according to the preceding claims wherein the output audio signal passes through a smoothing module.