
HK1236308A1 - Determination and use of auditory-space-optimized transfer functions - Google Patents


Info

Publication number
HK1236308A1
Authority
HK
Hong Kong
Prior art keywords
room
transfer functions
listening
optimized
listening room
Prior art date
Application number
HK17109926.1A
Other languages
German (de)
French (fr)
Chinese (zh)
Other versions
HK1236308B (en)
Inventor
Karlheinz Brandenburg
Stephan Werner
Christoph SLADECZEK
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Technische Universität Ilmenau
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V., Technische Universität Ilmenau filed Critical Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of HK1236308A1 publication Critical patent/HK1236308A1/en
Publication of HK1236308B publication Critical patent/HK1236308B/en


Description

Embodiments of the present invention relate to a device for determining "room-optimized transfer functions" for a listening room, to a corresponding method, and to a device for spatial reproduction of an audio signal with corresponding methods. According to preferred embodiments, the reproduction is performed using a binaural near-field sound transducer, such as, for example, a stereo headphone or stereo in-ear headphones. Further embodiments relate to a system comprising the two devices, and to a computer-implemented method for carrying out the aforementioned methods.
The perceptual quality when presenting a spatial auditory scene, for example based on a multichannel audio signal, depends significantly on the acoustic artistic design of the content being presented, on the playback system, and on the room acoustics of the listening room. A main objective in the development of audio playback systems is to generate auditory events that are perceived as plausible by the listener. This plays a particular role, for example, in the playback of video-audio content. For content to be perceived by the user as plausible, various perceptual quality characteristics, such as localizability, perception of distance, perception of spaciousness, and the sonic aspects of the reproduction, must meet expectations. Ideally, the perception of the reproduced sound thus corresponds to the actual situation in the room.
In speaker-based audio reproduction systems, two- or multi-channel audio material is played back in the listening room. This audio material can originate from a channel-based mix, where the final speaker signals are already available. Furthermore, the speaker signals can also be generated using an object-based audio reproduction method. In this case, based on a description of an audio object (e.g., position, volume, etc.) and knowledge of the prevailing speaker arrangement, the speaker playback signals are generated. Phantom sound sources are created, which are usually located along the connection axes between the speakers. Depending on the selected speaker arrangement and the prevailing room acoustics of the listening room, these phantom sound sources can be perceived by the listener in different directions and distances. The room acoustics itself has a significant influence on the pleasantness of the reproduced auditory scene.
However, playback via speaker systems is not practical in all listening situations, and it is not possible to install speakers everywhere. Examples of such situations include listening to music on mobile devices or in changing environments; further constraints are user acceptance and the acoustic disturbance of other people. As an alternative to speakers, near-field sound transducers, such as in-ear devices or headphones, which are worn directly at or in the immediate vicinity of the ear, are often used.
Classic stereo reproduction via headphones, which are equipped with, for example, one acoustic driver per channel or ear, creates in the listener the perception that the phantom sound sources being depicted are located inside the head, on the axis connecting the two ears. This phenomenon is called "in-head localization." A plausible external perception (externalization) of the phantom sound sources does not occur. The phantom sound sources thus created usually do not possess direction information or distance information that can be decoded by a user, such as would be present when reproducing the same acoustic scene via a speaker system (e.g., 2.0 or 5.1) in a listening room.
To bypass in-head localization during headphone playback, methods of binaural synthesis are used (without losing the artistic design and mixing in the audio material). In binaural synthesis, so-called external ear transfer functions (head-related transfer functions, HRTFs) are used for the left and right ears. These external ear transfer functions comprise, for each ear, a variety of transfer functions assigned to the directions of virtual sound sources. With these, the audio signals are filtered during playback so that an auditory scene is represented spatially, or spatiality is simulated. Binaural synthesis takes advantage of the fact that interaural characteristics are mainly responsible for the perception of the direction of a sound source, and these interaural characteristics are reflected in the external ear transfer functions. Therefore, if an audio signal is to be perceived from a defined direction, this signal is filtered with the HRTFs corresponding to that direction, for the left and the right ear. Using binaural synthesis, it is thus possible to reproduce a realistic spatial sound scene, for example stored as multichannel audio, via headphones. To simulate a speaker setup virtually, direction-dependent HRTF pairs are used for each speaker to be simulated. In order to plausibly represent the direction and distance of the speaker setup, the direction-dependent acoustic transfer functions of the listening room (room-related transfer functions, RRTFs) must also be emulated. These are combined with the HRTFs and result in binaural room impulse responses (BRIRs). The BRIRs can then be applied as filters to the audio signal.
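The core filtering operation of binaural synthesis can be sketched in a few lines. The example below is illustrative only: it convolves a mono signal with a hypothetical pair of head-related impulse responses (the time-domain form of the HRTFs), whose tap values are invented toy numbers, not taken from any measured HRTF set.

```python
import numpy as np

def binaural_synthesis(mono, hrir_left, hrir_right):
    """Render a mono signal at a virtual direction by convolving it with
    the head-related impulse responses (time-domain HRTFs) for the left
    and right ear, yielding a two-channel headphone signal."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy example: a unit impulse rendered with an interaural delay of
# 3 samples and a level difference (source on the listener's left).
mono = np.zeros(8); mono[0] = 1.0
hrir_l = np.array([1.0, 0.0, 0.0, 0.0])   # near ear: early, loud
hrir_r = np.array([0.0, 0.0, 0.0, 0.5])   # far ear: delayed, attenuated
out = binaural_synthesis(mono, hrir_l, hrir_r)
```

A real implementation would use measured HRIRs (typically a few hundred taps per direction) and fast FFT-based convolution instead of direct `np.convolve`.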
However, current research and studies clearly show that the plausibility of an audio reproduction depends not only on the physically correct synthesis of the playback signals but also significantly on context-dependent quality parameters, particularly on the user's expectation horizon regarding room acoustics. Therefore, there is a need for an improved approach in binaural synthesis.
US 2013/0272527 A1 describes an audio system with a receiver for receiving an audio signal and a binaural circuit for generating a binaural signal by means of which a virtual sound source can be positioned in the room. Here, the binaural transfer function can be adapted depending on acoustic environmental parameters, making the sound appear very natural.
US 2008/0273708 A1 shows how sound signals can be processed by means of HRTFs to simulate early reflections.
The object of the present invention is to create an improved spatial reproduction using near-field transducers, particularly with regard to the alignment between acoustic synthesis and the consumer's expectation horizon.
This object is achieved by the independent patent claims.
Embodiments of the present invention provide a (portable) device for determining "listening room-optimized transfer functions" for a listening room based on an analysis of the room acoustics. The listening room-optimized transfer functions are used for listening room-optimized post-processing of audio signals during spatial reproduction. Based on the external ear transfer functions (HRTFs), a room to be synthesized can be emulated, and based on the listening room-optimized transfer functions, the listening room itself can be emulated. By using these two transfer functions, which can also be referred to in combined form as binaural room-related impulse responses, a realistic spatial sound simulation is achieved, which corresponds to the characteristics specified by the multi-channel (stereo) signal in terms of spatiality, but is improved considering the expected auditory perception, which is particularly anticipated by the room acoustics.
According to further exemplary embodiments, the present invention provides another (portable) device for spatial reproduction of an audio signal using a binaural near-field transducer, wherein the spatial reproduction is simulated by means of known external ear transfer functions and by means of transfer functions optimized for a listening room, such that during playback of audio content, the acoustic signals emitted via the near-field transducer are imparted with the characteristics of a listening room.
According to the core idea, the present invention thus creates the prerequisites for taking into account cognitive effects during the playback of multi-channel stereo. To this end, according to a first aspect, listening room-optimized transfer functions are determined for the respective listening room, in which, for example, an auditory scene is to be reproduced using headphones (generally using a binaural near-field sound transducer). The determination of the listening room-optimized transfer functions essentially corresponds to the derivation of a room acoustic filter based on the measured room acoustics, with the aim of synthetically reproducing the acoustic properties of the real room. In a second step, the auditory scene can then be reproduced according to a second aspect of the invention, both by means of HRTFs and by means of the listening room-optimized transfer functions as a spatial audio simulation. During playback, HRTFs create the sense of space, while the listening room-optimized transfer functions enable the adaptation of the spatiality to the current listening room situation. In other words, the listening room-optimized transfer functions perform an adjustment or post-processing of the HRTFs or of the signals processed by the HRTFs. As a result, the divergence between the room to be reproduced, defined by the multi-channel audio material, and the listening room in which the listener is located, can be reduced during the playback of audio content.
There are different possibilities for determining the listening-room-optimized transfer functions. According to a first variant, measurements can be performed using a test sound source and a microphone, allowing the room acoustics to be analyzed along a test path within the listening room, thus obtaining an acoustic model of the room. According to a second variant, naturally occurring sounds, such as a voice, can also be used as test signals. This second variant offers the particular advantage that practically any microphone-equipped electronic device, such as a mobile phone or smartphone, on which the above-described functionality is implemented, is sufficient to determine the room acoustics. According to a third variant, the analysis of the listening room, i.e., the determination of the acoustic room model, can be based on geometric models. In this context, it would also be possible to capture a geometric model optically, for example using a camera typically integrated into mobile devices (such as mobile phones), in order to subsequently calculate the acoustic model of the listening room. Based on such an acoustic room model, the listening-room-optimized transfer functions can then be determined.
According to further examples, not only the listening room itself but also the position of the listener within the listening room can be taken into account. The background is that the room acoustics, and thus the acoustic perception, change depending on whether the listening position is close to a wall and on the direction in which the listener is facing. Thus, according to further examples, a variety of direction-dependent and/or position-dependent transfer functions (transfer function families) can be stored within the listening-room-optimized transfer functions, which can be selected, for example, depending on the listener's position within the listening room or on the listener's viewing direction.
Also with regard to the listening-room-optimized transfer functions, it is advantageous if the device for spatial reproduction, or a database connected to the device, stores a variety of listening-room-optimized transfer function sets for different listening rooms, so that they can be retrieved depending on which room the listener is currently in. For this purpose, the spatial reproduction device may, for example, also include a positioning device, such as a GPS receiver.
According to further examples, it is also possible, in addition to or in parallel with the listening room characteristics, to imprint the corresponding characteristics of a virtual speaker setup onto the audio material to be reproduced. This virtual speaker setup may, for example, correspond to a real speaker setup in the listening room or may be freely configured.
Further exemplary embodiments relate to the corresponding methods for determining the listening-room-optimized transfer functions and for reproducing multi-channel stereo audio signals (or object-based audio signals or WFS audio signals) using the listening-room-optimized transfer functions.
The following embodiments will be explained in detail with reference to the accompanying figures. They show: Fig. 1a a schematic block diagram of a device for determining listening-room-optimized transfer functions; Fig. 1b a schematic flow diagram of a method for determining listening-room-optimized transfer functions; Fig. 2a a schematic block diagram of a device for spatial reproduction of multi-channel stereo audio material taking into account listening-room-optimized transfer functions; Fig. 2b a schematic flow diagram of a method for spatial reproduction of multi-channel stereo audio material taking into account listening-room-optimized transfer functions; and Fig. 3 a schematic block diagram of a system for determining and using listening-room-optimized transfer functions.
Before the following embodiments of the present invention are explained in detail with reference to the accompanying drawings, it should be noted that identical or functionally equivalent elements are provided with the same reference numerals, so that their descriptions can be applied or exchanged accordingly.
Prior to describing the invention, the following section addresses the motivation behind capturing and auralizing the room acoustics of a listening room for location-dependent spatial audio reproduction via headphones. In this context, a brief introduction to binaural synthesis is also provided, along with an overview of the external ear transfer functions (HRTFs) used for binaural synthesis and the adjustable variables contained within these HRTFs. Based on this overview, it is further explained how the HRTFs can be adapted by the listening-room-optimized transfer functions TF to be determined, in order to appropriately account for the room acoustics according to the invention.
Binaural synthesis is based on the principle that an audio signal is filtered with a specific filter function or HRTF (head-related transfer function) before output via a sound transducer (preferably directly at one of the ears). The filter characteristics differ depending on the direction vector of the virtual sound source, thereby simulating spatial sound, for example when using headphones. The filter functions/HRTFs are modeled after the natural sound localization mechanisms of the human auditory system. This makes it possible to process the audio signal in the analog or digital domain and to impart to it an acoustic characteristic as if it were emitted from an arbitrary position in space. The mechanisms involved in sound localization are: detection of the lateral direction of incidence; detection of the direction of incidence in the median plane; and detection of the distance.
For localization with respect to the lateral direction, acoustic features such as time differences between left and right channels and (frequency-dependent) level differences between left and right channels are significant. In particular, a distinction can be made between phase delay at low frequencies and group delay at high frequencies for the time differences. These time differences can be simulated via signal processing using any stereo driver. The determination of the direction of incidence in the median plane is based on the fact that the pinna and/or the ear canal entrance perform a direction-selective filtering of the acoustic signal. This filtering is frequency-selective, so an audio signal can be pre-filtered by such a frequency filter to simulate a specific direction of incidence or to emulate spatiality. The determination of the distance of a sound source from the listener is based on different mechanisms. The main mechanisms are loudness, frequency-selective filtering of the traveled sound path, sound reflection, and initial time gap. A large part of the above-mentioned factors is individual. Individual variables may include, for example, the distance between the ears and the shape of the pinna, which particularly affect lateral and median localization. The spatial sound simulation is achieved by manipulating an audio signal with regard to the aforementioned mechanisms, where the manipulation parameters (per room direction and distance) are stored in the HRTFs (Head-Related Transfer Functions).
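The interaural cues described above can be illustrated with two small helper functions, one estimating the interaural time difference from the cross-correlation maximum and one estimating the interaural level difference from an RMS ratio. Both are illustrative sketches, not part of the patent:

```python
import numpy as np

def itd_samples(left, right):
    """Lag (in samples) by which the right-ear signal trails the
    left-ear signal, taken from the cross-correlation maximum.
    Positive values suggest a source on the listener's left."""
    corr = np.correlate(right, left, mode="full")
    return int(np.argmax(corr)) - (len(left) - 1)

def ild_db(left, right):
    """Interaural level difference in dB (positive: left ear louder)."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms(left) / rms(right))

# Toy signals: the right ear receives the left-ear signal 2 samples
# later and attenuated by half (about -6 dB).
left = np.zeros(16); left[0] = 1.0
right = np.zeros(16); right[2] = 0.5
```

On real ear signals these estimates would be computed per frequency band, since, as noted above, phase delay dominates at low frequencies and group delay at high frequencies.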
These HRTFs (head-related transfer functions) primarily describe free-field sound propagation. The background is that the aforementioned three localization factors are distorted in enclosed spaces, because the sound emitted by a sound source arrives at the listener not only directly but also in reflected form (e.g., via walls), which changes the acoustic perception. Therefore, in rooms there is both direct sound and (later-arriving) reflected sound, which the listener can differentiate, for example, based on the time delay in specific frequency bands and/or the position of the secondary sound source in the room. These (reverberation) parameters also depend on the room size and characteristics (e.g., damping, shape), so that a listener can estimate the room size and characteristics from them. Since these room-acoustic parameters are perceived through essentially the same mechanisms as those of localization, room acoustics can also be simulated binaurally. To simulate room acoustics, the HRTF is extended by means of the RRTF to the binaural room impulse response (BRIR), which simulates specific acoustic room conditions for the listener when using headphones. Thus, depending on the virtual room size, there is a change in the reverberation behavior, a shift of secondary sound sources, and a change in the loudness of secondary sound sources, particularly in relation to the loudness of primary sound sources.
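As a minimal sketch of this extension, the following toy example combines a free-field HRIR for one ear with a room-related impulse response by convolution; all tap values are invented for illustration:

```python
import numpy as np

def build_brir(hrir, rrir):
    """Combine a free-field head-related impulse response with a
    room-related impulse response (direct path plus reflections) into a
    binaural room impulse response, computed per ear."""
    return np.convolve(hrir, rrir)

# Toy room: direct sound plus one wall reflection 10 samples later
# at half amplitude (about -6 dB).
rrir = np.zeros(16); rrir[0] = 1.0; rrir[10] = 0.5
hrir = np.array([0.9, 0.2])     # invented 2-tap ear response
brir = build_brir(hrir, rrir)
```

The direct sound then appears in the BRIR shaped by the HRIR, and each reflection appears as a delayed, attenuated copy of that same shaping.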
As mentioned at the beginning, cognitive effects also play a significant role for the listener. Studies on such cognitive effects have shown that parameters such as the degree of similarity between the listening room and the room to be synthesized are highly relevant. In the case of a large divergence between the listening room and the reproduced room, the expert speaks of low externalization of the auditory event.
Motivated by this, binaural synthesis is now to be extended so that the binaural simulation of an auditory scene can be adapted to the context of use. In detail, the simulation is adjusted according to the listening conditions, such as the current room acoustics (attenuation) and the geometry of the listening room. For this purpose, parameters such as distance perception, spatial perception, and direction perception can be varied so that they appear plausible with respect to the current listening room. Variation parameters include, for example, HRTF or RRTF characteristics, such as time delay differences, level differences, frequency-selective filtering, or the initial time gap. The adaptation can, for example, be performed by emulating a room size with a specific reverberation or reflection behavior, or by limiting distances, e.g., between the listener and the sound source, to a maximum value. Another factor influencing the spatial sound behavior is the user's position within the listening room, since it is crucial for reverberation and reflections whether the user is located centrally in the room or near a wall. This behavior can also be emulated by adjusting the HRTF or RRTF parameters. In the following, it is explained how and by which means the adjustment of the HRTF or RRTF parameters is carried out in order to improve the plausibility of the acoustic simulation on site.
The concept of auralization of room acoustics comprises, in its basic structure, two components, represented on one hand by two independent devices and on the other hand by two corresponding methods. Referring to Figures 1a and 1b, the first component, namely the determination of the listening-room-optimized transfer functions TF, is explained first. Afterwards, the use of the listening-room-optimized transfer functions TF is explained with reference to Figures 2a and 2b.
Fig. 1a shows a device 10 for determining listening-room-optimized transfer functions TF for a listening room 12. To determine the listening-room-optimized transfer functions TF, the listening room 12, or rather its room acoustics, is analyzed. For this purpose, the device 10 includes an interface for capturing listening-room-related data, illustrated here as a microphone interface (see reference numeral 14). Since the listening-room-optimized transfer functions TF, which are subsequently used in binaural synthesis to impart the characteristics of the listening room to the audio material, are typically designed to adapt existing HRTFs, the device 10 can determine the transfer functions TF taking into account the HRTFs to be used. To this end, the device 10 optionally includes a further interface for reading in or forwarding HRTFs.
The following describes different approaches by which the device 10 determines the room acoustics, on the basis of which the listening-room-optimized transfer functions TF are then determined in a subsequent step. According to a first variant, the prevailing room-acoustic conditions of the listening room can be measured technically. For example, the room acoustics of the listening room 12 can be measured using an acoustic measurement method and the device 10. To this end, a test signal is emitted via an optional speaker (not shown). The playback of the test signal, or the control of the speaker, can take place via the device 10 if the device 10 includes a speaker interface (not shown) or the speaker itself. The measurement signal emitted via the speaker into the room 12 is recorded using the microphone 14, so that the room acoustics can be determined based on the signal change along the measurement path (between speaker and microphone). Thus, at least one listening-room-optimized transfer function TF, for example for a specific room direction, or a plurality of listening-room-optimized transfer functions TF can be derived. From the transfer function measured in a specific direction, room-acoustic parameters relevant to the listening room are derived. These are used to generate the listening-room-optimized transfer functions TF for the other required directions. For this purpose, discrete first reflections can be adapted to other room directions and distances of the virtual sound source positions to be rendered, by compressing and/or stretching certain regions of the impulse response (the transfer function in the time domain). The information relevant to direction perception is contained in the HRTFs.
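One common software realization of such a measurement, offered here only as a hedged sketch, plays a known broadband test signal and recovers the impulse response of the measurement path by regularized frequency-domain deconvolution:

```python
import numpy as np

def measure_room_ir(test_signal, recording, eps=1e-8):
    """Estimate the impulse response of the speaker-microphone path by
    frequency-domain deconvolution of the recording with the known test
    signal; eps regularizes bins where the test signal has little energy."""
    n = len(recording)
    T = np.fft.rfft(test_signal, n)
    R = np.fft.rfft(recording, n)
    H = R * np.conj(T) / (np.abs(T) ** 2 + eps)
    return np.fft.irfft(H, n)

# Toy check: the "room" delays the test signal by 4 samples and halves it.
rng = np.random.default_rng(0)
test = rng.standard_normal(64)                 # broadband test signal
room = np.zeros(64); room[4] = 0.5             # true impulse response
recording = np.convolve(test, room)            # what the microphone captures
ir = measure_room_ir(test, recording)
```

In practice an exponential sine sweep is a popular test signal because it separates harmonic distortion from the linear response; the deconvolution step remains the same.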
To determine the listening-room-optimized transfer functions TF for all room directions, or with very high accuracy, it may be advantageous, according to further exemplary embodiments, to repeat the analysis using the test signal for different positions of the microphone 14 and the speaker in the listening room 12.
According to another variant, the room acoustics can be estimated using acoustic signals that are already present in the listening room 12. Examples of such signals include naturally present ambient sounds as well as a speech signal from a user. The algorithms used here are derived from algorithms for removing reverberation from a speech signal. The background is that reverberation-reduction algorithms typically estimate the room transfer function imprinted on the signal to be de-reverberated. So far, these algorithms have been used to determine a filter which, when applied to the original signal, yields the least reverberant signal possible. In the application for analyzing room acoustics, however, the de-reverberation filter itself is not of interest; only the estimation stage is used to identify the characteristics of the listening room. Again, the microphone 14, which is coupled to the device 10, is used in this process.
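However the room response is estimated, standard room-acoustic parameters can then be read off it. As an illustrative sketch (not a method prescribed by the patent), the reverberation time RT60 can be derived from an impulse response via Schroeder backward integration:

```python
import numpy as np

def rt60_from_ir(ir, fs):
    """Estimate RT60 via Schroeder backward integration: build the
    energy decay curve, fit the -5 dB to -25 dB range (T20), and
    extrapolate the fitted slope to a 60 dB decay."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]          # Schroeder curve
    edc_db = 10.0 * np.log10(energy / energy[0])
    t = np.arange(len(ir)) / fs
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # dB per second
    return -60.0 / slope

# Synthetic exponential decay with a known RT60 of 0.5 s.
fs, rt60 = 8000, 0.5
t = np.arange(int(fs * rt60 * 1.5)) / fs
ir = np.exp(-6.908 * t / rt60)    # amplitude falls 60 dB over rt60 seconds
```

The same decay-fitting idea underlies the room-size cues discussed above: a longer RT60 is one of the main parameters a listener uses to judge room size.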
According to a third variant, room acoustics can be simulated based on geometric room data. This approach is based on the fact that geometric data (e.g., edge dimensions, free path length) of a room 12 allow for an estimation of the room acoustics. The room acoustics of the room 12 can either be directly simulated or approximately determined based on a room acoustic filter database that includes acoustic comparison models. In this context, methods such as acoustic ray tracing or the image source method in combination with a diffuse sound model can be mentioned. The two aforementioned methods are based on geometric models of the listening room. Therefore, the interface described above for capturing listening room-related data of the device 10 does not necessarily have to be a microphone interface, but can generally be referred to as a data interface used for reading in geometric data. Furthermore, it is also possible to read in additional data via the interface, which may include information about a speaker setup present in the listening room.
Several possibilities exist for acquiring geometric room data: According to a first sub-variant, the data can be obtained from a geometry database, such as Google Maps Inhouse. These databases typically include geometric models, such as vector models of spatial geometries, from which primarily distances, but also reflection characteristics, can be determined. According to another sub-variant, an image database can also be used as input, where geometric parameters are then determined in an intermediate step using image recognition. According to an alternative sub-variant, it would also be possible to obtain image information not from an image database, but directly by means of a camera or, more generally, an optical sensor, thus allowing a geometric model to be determined directly by the user. Based on the room geometry determined from image data, room acoustics can then be simulated analogously to the previous point.
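As a first-order illustration of deriving acoustics from geometry, the reverberation time can be estimated with Sabine's formula from the room volume and the absorption of its surfaces; the room dimensions and absorption coefficients below are purely illustrative:

```python
def sabine_rt60(volume_m3, surfaces):
    """Estimate reverberation time with Sabine's formula
    RT60 = 0.161 * V / A, where A is the total equivalent absorption
    area: the sum of each surface area times its absorption coefficient."""
    absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption

# Hypothetical 5 m x 4 m x 2.5 m room; coefficients are illustrative
# values for plaster (0.05) and carpet (0.30), not measured data.
volume = 5 * 4 * 2.5
surfaces = [
    (2 * (5 * 2.5 + 4 * 2.5), 0.05),   # four walls, plaster
    (5 * 4, 0.05),                     # ceiling, plaster
    (5 * 4, 0.30),                     # floor, carpet
]
rt = sabine_rt60(volume, surfaces)
```

Such a single-number estimate ignores the direction-dependent reflection structure; the ray-tracing and image-source methods mentioned above are needed when discrete early reflections per room direction are required.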
By means of these simulated room-acoustic models, in a subsequent step, the listening-room-optimized transfer functions TF are derived for at least one, preferably for a plurality of rooms. The derivation of the listening-room-optimized transfer functions TF, which are comparable in terms of their parameters to the RRTFs, essentially corresponds to determining a filter function (per room direction) by means of which the acoustic behavior in the room, e.g., sound propagation in a specific room direction, can be modeled. The room-specific transfer functions TF per room typically comprise a plurality of transfer functions, by means of which the external ear transfer functions (assigned to individual room angles) can be adapted accordingly (comparable to the approach used in processing the room impulse response). Therefore, the number of listening-room-optimized transfer functions TF is typically determined by the number of external ear transfer functions. These occur as function families comprising a variety of directions, namely for the left and right ear and for the relevant room directions. The exact number of external ear transfer functions in the HRTF model depends on the desired spatial resolution and can vary significantly, because there are also HRTF models in which a large number of direction vectors are determined by interpolation. From this context, it becomes clear why it is reasonable for the device for determining the listening-room-optimized transfer functions TF to use the HRTF model. In a further step, the determined listening-room-optimized transfer functions TF are, for example, stored in a room-acoustic filter database.
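The interpolation mentioned above can be sketched, in its simplest linear form, as a crossfade between two neighboring measured HRIRs (toy two-tap responses, purely illustrative):

```python
import numpy as np

def interpolate_hrir(hrir_a, az_a, hrir_b, az_b, az):
    """Linearly interpolate between two measured HRIRs to approximate
    the impulse response for an intermediate azimuth, a simple way to
    densify a sparsely measured HRTF grid."""
    w = (az - az_a) / (az_b - az_a)
    return (1.0 - w) * np.asarray(hrir_a) + w * np.asarray(hrir_b)

# Halfway between measurements at 0 and 30 degrees azimuth.
hrir_15 = interpolate_hrir([1.0, 0.0], 0.0, [0.0, 1.0], 30.0, 15.0)
```

Time-domain crossfading smears interaural delays; more careful schemes therefore separate the pure delay from the minimum-phase part before interpolating.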
According to another embodiment, a plurality of listening-room-optimized transfer function families (TF) can also be determined and stored per listening room, thereby taking into account the fact that the acoustic behavior in the listening room differs depending on the position of the listener. In other words, this means that for each (possible) position of the user in the listening room 12, a specific listening-room-optimized transfer characteristic can be determined, wherein its determination can be based on the same acoustic model of the listening room 12. Consequently, and advantageously, the analysis of the listening room needs to be performed only once. According to another embodiment, different listening-room-optimized transfer function families (TF) can also be determined per viewing direction of the user.
The device 10 described above can be implemented in different ways. According to preferred embodiments, the device 10 is designed as a mobile device, wherein the sensor 14, such as a microphone or camera, can be integrated accordingly. In other words, further embodiments relate to a device for determining the listening-room-optimized transfer functions TF, which includes on the one hand the analysis unit 10 and on the other hand a microphone and/or a camera. The analysis unit 10 can, for example, be implemented in hardware or in software. Thus, embodiments of the device 10 include an internal or cloud-connected CPU or other logic that is configured to perform the determination of the listening-room-optimized transfer functions TF and/or the listening-room analysis. Subsequently, the method, in particular the basic steps on which the algorithm for the software-implemented determination of the listening-room-optimized transfer functions TF is based, will be explained with reference to Fig. 1b.
Fig. 1b shows a flow diagram 100 of the method for determining the listening-room-optimized transfer functions TF. The method 100 includes the central step 110 of determining the listening-room-optimized transfer functions TF. As explained above, step 110 is based on the analysis of the room acoustics (see step 120 "Analyze room acoustics") and optionally also on existing HRTFs. Starting from step 110, another optional step, namely the storage of the transfer functions TF, can follow. This step is labeled with reference numeral 130.
According to further exemplary embodiments, it would also be conceivable in the embodiments explained with reference to Figs. 1a and 1b that, at the same time as the listening-room-optimized transfer functions TF are determined, the position of the listening room is also determined, so that the resulting dataset can be assigned directly to the respective listening room via its position. This offers the advantage that, when the listening-room-optimized transfer functions TF are later retrieved from a database, each dataset can be assigned on the basis of a position determination.
In the following, the use of the previously determined room-optimized transfer functions TF is explained with reference to Figs. 2a and 2b.
Fig. 2a shows a device 20 for spatial reproduction using a binaural near-field sound transducer 22. The functionality of the device 20 is explained, among other things, with the aid of the flowchart in Fig. 2b, which illustrates the reproduction method 200. The device 20 is designed to reproduce the audio signal 24, such as a multi-channel stereo audio signal (or an object-based audio signal or an audio signal based on a wave-field synthesis algorithm (WFS)), and at the same time to emulate the room sound (see step 210). For this purpose, the playback device 20 processes the audio signal using HRTFs and using the room-optimized transfer functions TF.
The device 20 may include an HRTF/TF memory or may be connected, for example, to a database in which the HRTFs and the corresponding room-optimized transfer functions TF determined by the above methods are stored. According to preferred embodiments, the HRTFs are combined with the TFs (see step 220), or adapted based on the TFs, prior to processing the audio signal. The result of this combination is a transfer function BRIR' comparable to a BRIR (binaural room impulse response), which is then applied to the audio signal 24 in order to emulate the spatial sound (see step 210). This processing essentially corresponds to applying a BRIR'-based filter to the audio signal. It is thus possible to perform the binaural synthesis in combination with a reverberation of the audio signals that depends on the acoustic conditions prevailing in the listening room, so that during playback there is a high degree of similarity between the synthesized room and the listening room. Consequently, the synthesized room (at least approximately) matches the user's expectations, which increases the plausibility of the scene.
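In the time domain, combining an HRTF with a room-optimized transfer function (step 220) and applying the result to the audio signal (step 210) amounts to two convolutions. The sketch below is our own toy illustration under that reading, not the patent's implementation: the two-tap HRIRs, the sparse room impulse response (a direct path plus one reflection), and the function names are all invented for the example.

```python
import numpy as np

def make_brir(hrir, room_ir):
    """Combine a (dry) head-related impulse response with the room-
    optimized impulse response into a BRIR'-like filter (cf. step 220)."""
    return np.convolve(hrir, room_ir)

def render(audio, brir_left, brir_right):
    """Filter a mono signal with the left/right BRIR' (cf. step 210);
    assumes both filters have the same length."""
    return np.stack([np.convolve(audio, brir_left),
                     np.convolve(audio, brir_right)])

# Toy data: a two-tap HRIR per ear and a room response consisting of the
# direct sound plus one attenuated reflection 40 samples later.
hrir_l = np.array([1.0, 0.3])
hrir_r = np.array([0.8, 0.5])
room_ir = np.zeros(64)
room_ir[0] = 1.0
room_ir[40] = 0.25
audio = np.ones(8)

out = render(audio, make_brir(hrir_l, room_ir), make_brir(hrir_r, room_ir))
```

The reflection in `room_ir` reappears in the rendered output, which is exactly the listening-room "coloring" the BRIR'-based filter is meant to reproduce.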
According to embodiments, the device 20 may also include a position-determining unit, such as a GPS receiver, by means of which the current position of the listener can be determined. Based on the determined position, the listening room can be identified and the room-optimized transfer functions TF assigned to that listening room can be loaded (and optionally updated when the listener changes rooms). Optionally, this position-determining unit can also be used to determine the position of the listener within the listening room in order to account for differences in acoustics depending on the listener's position in the room, provided such data are stored. According to further embodiments, the position-determining unit can be extended by an orientation-determining unit, so that the listener's viewing direction can also be determined and the TFs can be loaded depending on the determined viewing direction, in order to account for direction-dependent listening-room acoustics.
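A minimal sketch of the position-based lookup might look as follows. Everything here is a hypothetical illustration: the database layout, the room names, the coordinates, and the distance threshold are our own assumptions, and a real implementation would use whatever keying the TF database actually provides.

```python
import math

# Hypothetical database: one room-optimized TF dataset per known
# listening room, keyed by a reference position (latitude, longitude).
TF_DATABASE = {
    "living_room": {"pos": (50.684, 10.919), "tf": "tf_living_room.dat"},
    "office":      {"pos": (50.686, 10.932), "tf": "tf_office.dat"},
}

def select_tf(current_pos, max_dist_deg=0.005):
    """Return the name of the room whose stored position is closest to
    the listener, or None (e.g. fall back to generic TFs) if no known
    listening room lies within the threshold."""
    best, best_d = None, float("inf")
    for room, entry in TF_DATABASE.items():
        d = math.dist(current_pos, entry["pos"])  # Euclidean, for brevity
        if d < best_d:
            best, best_d = room, d
    return best if best_d <= max_dist_deg else None
```

Re-running such a lookup during playback would also cover the "update on room change" case mentioned above.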
Starting from this basic consideration of the two units 10 and 20, an extended embodiment is now explained with reference to Fig. 3. Fig. 3 shows a schematic representation of the signal flow for listening-room-adapted room acoustic simulations for use with binaural synthesis, based on a system 10 + 20 which includes the device for determining the TFs and the device for reproducing audio signals using the TFs.
Such a system 10 + 20 can, for example, be implemented as a mobile device (e.g., a smartphone) on which the file to be played back is also stored. The system 10 + 20 is essentially a combination of the device 10 of Fig. 1a and the device 20 of Fig. 2a, with the individual components subdivided differently for the purpose of the functional explanation.
The system 10 + 20 includes a functional unit for the auralization of the listening room 20a and a functional unit for binaural synthesis 20b. Furthermore, the system 10 + 20 comprises a function block 10a for modeling the room acoustics and a function block 10b for modeling the transmission behavior. The modeling of the room acoustics is based on an acquisition of the listening room, which is performed by the function block 10c for acquiring the room acoustics. Moreover, the system 10 + 20, in the illustrated embodiment, includes two memories, namely one for storing scene position data 30a and one for storing HRTF data 30b. Subsequently, the functionality of the system 10 + 20 will be explained starting from the information flow during playback, assuming that the listening room is known to the system 10 + 20 or has already been determined using a positioning method (see above).
When channel-based or object-based audio data 24 are reproduced via the headphone 22, the audio data are first supplied to a signal processing unit 20a, which applies the previously modeled room transfer function TF to the signal 24 so that the signal is reverberated. The room transfer function TF is modeled in a signal processing block 10a, and this modeling can be overlaid with the modeled transmission behavior (see function block 10b), as explained below.
This second (optional) function block 10b models a virtual loudspeaker setup within the respective listening room. The user can thus be presented with an acoustic behavior that emulates the playback of an audio file on a specific loudspeaker setup (2.0, 5.1, 9.2). In this case, the loudspeaker positions are fixed relative to the listening room, and each loudspeaker is assigned a specific transmission behavior, defined, for example, by frequency response and directivity or by different level behaviors. It is also possible to position special sound source types, for example a mirror sound source, at a fixed position within the room. The loudspeaker setup is modeled on the basis of scene position data, which include information about the position, distance, or type of the virtual loudspeakers. These scene position data can correspond to a real existing loudspeaker setup or be based on a virtual loudspeaker setup, and are typically adjustable by the user.
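Scene position data of this kind can be represented as a simple per-speaker record. The layout below is our own guess at what such a dataset might contain (channel name, azimuth, distance, and a transmission-behavior tag); the patent does not prescribe a concrete format.

```python
# Hypothetical scene position data for a virtual 5.1 setup: per virtual
# loudspeaker an azimuth in degrees (0 = straight ahead, negative = left),
# a distance in metres, and a tag selecting its transmission behavior.
SCENE_5_1 = [
    {"name": "L",   "azimuth":  -30.0, "distance": 2.5, "type": "fullrange"},
    {"name": "R",   "azimuth":   30.0, "distance": 2.5, "type": "fullrange"},
    {"name": "C",   "azimuth":    0.0, "distance": 2.5, "type": "fullrange"},
    {"name": "Ls",  "azimuth": -110.0, "distance": 2.5, "type": "fullrange"},
    {"name": "Rs",  "azimuth":  110.0, "distance": 2.5, "type": "fullrange"},
    {"name": "LFE", "azimuth":    0.0, "distance": 2.5, "type": "subwoofer"},
]

def speaker(scene, name):
    """Look up one virtual loudspeaker's entry by channel name."""
    return next(s for s in scene if s["name"] == name)
```

Because the entries are plain data, letting the user adjust the setup (as the text describes) reduces to editing this list.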
After the signals have been convolved in the auralization processing unit 20a, the convolved signals are supplied to the binaural synthesis 20b, which applies the direction of the virtual loudspeakers to the corresponding audio material by means of a set of directional HRTF filters (see 30b). The binaural synthesis can optionally evaluate the listener's head rotation, as explained above. The result is a headphone signal that can be adapted to a specific headphone by appropriate equalization, so that the acoustic signal sounds as if it had been reproduced with a specific loudspeaker setup in the respective listening room.
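The core of such a binaural synthesis stage — one directional HRIR pair per virtual loudspeaker, filter each feed, sum into a two-channel headphone signal — can be sketched as follows. The three-direction HRIR store and all tap values are invented toy data; real HRTF sets are measured on dense angular grids but are applied in the same way.

```python
import numpy as np

# Hypothetical HRTF store: one short impulse-response pair (left, right)
# per azimuth; a mirrored pair models the left/right symmetry crudely.
HRIR_SET = {
    -30.0: (np.array([1.0, 0.2]), np.array([0.6, 0.4])),
     30.0: (np.array([0.6, 0.4]), np.array([1.0, 0.2])),
      0.0: (np.array([0.9, 0.3]), np.array([0.9, 0.3])),
}

def binaural_downmix(channels, azimuths):
    """Filter each virtual-loudspeaker feed with the HRIR pair for its
    direction and sum everything into one two-channel headphone signal."""
    n = max(len(c) for c in channels) + max(len(h[0]) for h in HRIR_SET.values()) - 1
    out = np.zeros((2, n))
    for sig, az in zip(channels, azimuths):
        hl, hr = HRIR_SET[az]
        out[0, :len(sig) + len(hl) - 1] += np.convolve(sig, hl)
        out[1, :len(sig) + len(hr) - 1] += np.convolve(sig, hr)
    return out
```

Head tracking would simply shift the azimuths before the lookup, and headphone equalization would be one further filter on `out`.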
The system 10 + 20 can, for example, be implemented as a mobile device or as a component of a home cinema system. In general, application areas include the playback of music and entertainment content, such as film sound or game sound, via the binaural near-field sound transducer.
It should be noted at this point that, according to an alternative embodiment, the device 20 shown in Fig. 2a can also be designed to emulate a specific loudspeaker setup, or the playback of an audio signal via a specific loudspeaker setup, on the basis of scene position data. Similarly, according to another embodiment, the device 10 can be configured to determine the scene position data of a loudspeaker setup in the listening room 12 (e.g., via an acoustic measurement), so that this loudspeaker setup can be emulated using the device 20.
Although certain aspects have been described in connection with a device, it is understood that these aspects also constitute a description of the corresponding method, so that a block or a component of a device is also to be understood as a corresponding method step or as a feature of a method step. Similarly, aspects described in connection with one or more method steps also constitute a description of a corresponding block, detail, or feature of a corresponding device. Some or all of the method steps can be performed by a hardware apparatus (or using a hardware apparatus), such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, some or several of the most important method steps can be performed by such an apparatus.
A signal encoded in accordance with the invention, such as an audio signal, a video signal, or a transport stream signal, can be stored on a digital storage medium or can be transmitted via a transmission medium such as a wireless transmission medium or a wired transmission medium, for example the Internet.
Depending on specific implementation requirements, embodiments of the invention can be implemented in hardware or software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a Blu-ray disc, a CD, a ROM, a PROM, an EPROM, an EEPROM, or a flash memory, a hard drive, or another magnetic or optical storage device, on which electronically readable control signals are stored that can interact with a programmable computer system in such a way that the respective method is performed. Therefore, the digital storage medium can be computer-readable.
Some exemplary embodiments according to the invention thus include a storage medium having electronically readable control signals that are capable of cooperating with a programmable computer system in such a way that one of the methods described herein is carried out.
In general, embodiments of the present invention can be implemented as a computer program product with program code, the program code being operative to perform one of the methods when the computer program product runs on a computer.
The program code can, for example, also be stored on a machine-readable medium.
Other embodiments include a computer program for performing one of the methods described herein, wherein the computer program is stored on a machine-readable medium.
In other words, an embodiment of the inventive method is thus a computer program having a program code for performing one of the methods described herein, when the computer program is executed on a computer.
Another embodiment of the inventive method is thus a data carrier (or a digital storage medium or a computer-readable medium) on which the computer program for performing one of the methods described herein is recorded.
Another embodiment of the inventive method is thus a data stream or a sequence of signals that represents the computer program for performing one of the methods described herein. The data stream or sequence of signals can be configured, for example, to be transferred via a data communication connection, such as via the Internet.
Another embodiment includes a processing device, such as a computer or a programmable logic component, which is configured or adapted to perform one of the methods described herein.
Another embodiment includes a computer on which the computer program for performing one of the methods described herein is installed.
Another embodiment according to the invention includes a device or system configured to transmit a computer program for performing at least one of the methods described herein to a receiver. The transmission can, for example, be performed electronically or optically. The receiver can, for example, be a computer, a mobile device, a storage device, or a similar device. The device or system can, for example, include a file server for transmitting the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field-programmable gate array, an FPGA) can be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field-programmable gate array can cooperate with a microprocessor to perform one of the methods described herein. In general, in some embodiments the methods are performed by any hardware apparatus; this can be universally usable hardware, such as a computer processor (CPU), or hardware specific to the method, such as an ASIC.
The embodiments described above merely illustrate the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to other persons skilled in the art. It is therefore intended that the invention be limited only by the scope of the following patent claims and not by the specific details presented herein by way of the description and explanation of the embodiments.

Claims (15)

  1. A device (10) for determining room-optimized transfer functions (TF) for a listening room (12) derived for the listening room (12) and serving for room-optimized post-processing of audio signals (24) in spatial reproduction, wherein the spatial reproduction of the audio signals (24) is emulated by means of a binaural close-range sound transducer (22) using known head-related transfer functions (HRTF) and using the room-optimized transfer functions (TF), wherein a room to be synthesized may be emulated based on the head-related transfer functions (HRTF), and wherein the listening room (12) may be emulated based on the room-optimized transfer functions (TF), wherein the device (10) is configured to analyze room acoustics of the listening room (12) and to determine, starting from analyzing the room acoustics, the room-optimized transfer functions (TF) for the listening room (12) where the spatial reproduction by means of the binaural close-range sound transducer (22) is to take place, wherein the device (10) comprises a storage in which may be deposited a plurality of room-optimized transfer function families (TF) for a plurality of listening rooms (12), characterized in that the room-optimized transfer functions (TF) include per room a plurality of transfer functions assigned to individual solid angles, wherein each solid angle represents a sound propagation direction in the room.
  2. The device (10) in accordance with claim 1, wherein the device (10) comprises a microphone (14) of a portable device for acoustic measurement and/or wherein analysis of the room acoustics of the listening room (12) takes place by means of an acoustic measurement in the listening room (12) using ambient noise and/or using a test signal.
  3. The device (10) in accordance with claim 1, wherein the analysis of the room acoustics of the listening room (12) is based on calculating a geometrical model of the listening room (12) and/or modeling the geometrical model based on a camera-based model of the listening room (12).
  4. The device (10) in accordance with claim 2 or 3, wherein the room-optimized transfer functions (TF) are selected such that room acoustics of the listening room (12) may be emulated on the basis thereof.
  5. The device (10) in accordance with any of claims 1 to 4, wherein the device (10) is configured to determine the room-optimized transfer functions (TF) considering a virtual loudspeaker setup in correspondence with which a number of virtual loudspeakers are positioned in the listening room (12).
  6. The device (10) in accordance with any of claims 1 to 5, wherein the known head-related transfer functions (HRTF) comprise a plurality of individual transfer functions (TF) for the left and right ears which are associated to directional vectors for a plurality of virtual sound sources.
  7. The device (10) in accordance with any of claims 1 to 6, wherein emulating the spatial reproduction is based on interaural features, balance features and distance features, wherein the interaural features comprise a connection between a direction of incidence in the median plane and an individual or non-individual head-related filtering, wherein the balance features comprise a connection between a lateral direction of incidence and a difference in volume and/or a connection between the lateral direction of incidence and a run-time difference, wherein the distance features comprise a connection between a virtual distance and frequency-dependent filtering and/or a connection between the virtual distance and an initial time gap and/or a connection between the virtual distance and a reflection behavior.
  8. The device (10) in accordance with any of claims 1 to 7, wherein the binaural close-range sound transducer (22) is a headset configured to output as the audio signal (24) a multi-channel stereo signal, an object-based audio signal (24) and/or an audio signal (24) on the basis of a wave-field synthesis algorithm.
  9. A method (100) for determining room-optimized transfer functions (TF) for a listening room (12) which are derived for the listening room (12) and may serve for room-optimized post-processing of audio signals (24) in spatial reproduction, wherein the spatial reproduction of the audio signals (24) by means of a binaural close-range sound transducer (22) is emulated using known head-related transfer functions (HRTF) and using the room-optimized transfer functions (TF), wherein a room to be synthesized may be emulated based on the head-related transfer functions (HRTF), and wherein the listening room (12) may be emulated based on the room-optimized transfer functions (TF), comprising:
    analyzing (120) prevailing room acoustics of the listening room (12); and
    determining (110) the room-optimized transfer functions (TF) for the listening room (12) where spatial reproduction by means of the binaural close-range sound transducer (22) is to take place, on the basis of analyzing the room acoustics;
    depositing a plurality of room-optimized transfer function families (TF) for a plurality of listening rooms (12) into a storage,
    characterized in that
    the room-optimized transfer functions (TF) include per room a plurality of transfer functions assigned to individual solid angles, wherein each solid angle represents a sound propagation direction in the room.
  10. A device (20) for spatial reproduction of an audio signal (24) by means of a binaural close-range sound transducer (22), wherein the spatial reproduction is emulated using known head-related transfer functions (HRTF) and using room-optimized transfer functions (TF) for a listening room (12), wherein a room to be synthesized may be emulated based on the head-related transfer functions (HRTF), and wherein the listening room (12) may be emulated based on the room-optimized transfer functions (TF), wherein the room-optimized transfer functions (TF) have been determined beforehand for the respective listening room (12); wherein the device (20) comprises a first storage in which are stored a first plurality of transfer function families (TF) for different listening rooms (12), and a position-determining unit, wherein the position-determining unit is configured to identify the position and determine the listening room (12) using the position identified; and wherein the device (20) is configured to select, for emulating the spatial reproduction, the corresponding transfer functions (TF) for the respective listening room (12) from the transfer function families, characterized in that the room-optimized transfer functions (TF) include per room a plurality of transfer functions assigned to individual solid angles, wherein each solid angle represents a sound propagation direction in the room.
  11. The device (20) in accordance with claim 10, wherein the device (20) comprises a second storage in which are stored a second plurality of transfer function families (TF) for different orientations, and an orientation-determining unit, wherein the orientation-determining unit is configured to determine an orientation in the listening room (12), and wherein the device (20) is configured to select, for emulating the spatial reproduction, the corresponding transfer functions (TF) for the respective orientation from the transfer function families, and/or wherein the device (20) comprises a third storage in which are stored a third plurality of transfer function families (TF) for different positions in the listening room (12), and another position-determining unit, wherein the other position-determining unit is configured to determine a position in the listening room (12), and wherein the device (20) is configured to select, for emulating the spatial reproduction, the corresponding transfer functions (TF) for the respective position in the listening room (12) from the transfer function families, and/or wherein the position-determining unit is configured to determine, while reproducing, the positions again, and wherein the device (20) is configured to update the room-optimized transfer functions (TF) based on the updated position.
  12. A method (200) for spatially reproducing an audio signal (24) by means of a binaural close-range sound transducer (22), comprising:
    post-processing (210) the audio signal (24) using known head-related transfer functions (HRTF) and using room-optimized transfer functions (TF) for a listening room (12) which have been determined beforehand for the listening room (12) where reproduction by means of the binaural close-range sound transducer (22) is to take place, wherein a room to be synthesized may be emulated based on the head-related transfer functions (HRTF), and wherein the listening room (12) may be emulated based on the room-optimized transfer functions (TF);
    storing a first plurality of transfer function families (TF) for different listening rooms (12) in a first storage;
    identifying a position; and
    determining the listening room (12) using the position,
    wherein, for emulating the spatial reproduction, the corresponding transfer functions (TF) for the respective listening room (12) are selected from the transfer function families, characterized in that the room-optimized transfer functions (TF) include per room a plurality of transfer functions assigned to individual solid angles, wherein each solid angle represents a sound propagation direction in the room.
  13. The method (200) in accordance with claim 12, wherein, before reproducing, combining (220) the head-related transfer functions (HRTF) and the room-optimized transfer functions (TF) to form a room-related room impulse response (BRIR') takes place.
  14. A system (10 + 20) comprising:
    a device (10) in accordance with any of claims 1 to 8; and
    a device (20) in accordance with any of claims 10 to 11.
  15. A computer program having program code that causes performing the method (100; 200) in accordance with claim 9 or 12 when the program runs on a computer, CPU or mobile terminal.
HK17109926.1A 2014-05-28 2015-05-15 Determination and use of auditory-space-optimized transfer functions HK1236308B (en)

Applications Claiming Priority (1)

Application Number: DE102014210215.4 — Priority Date: 2014-05-28

Publications (2)

HK1236308A1 (en) — published 2018-03-23
HK1236308B (en) — published 2020-08-21

