WO2024084949A1 - Acoustic signal processing method, computer program, and acoustic signal processing device - Google Patents
Acoustic signal processing method, computer program, and acoustic signal processing device
- Publication number
- WO2024084949A1 (PCT/JP2023/036004)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- information
- listener
- wind
- change
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
- H04S7/303—Tracking of listener position or orientation
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/50—Controlling the output signals based on the game progress
- A63F13/54—Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H7/00—Instruments in which the tones are synthesised from a data store, e.g. computer organs
- G10H7/02—Instruments in which the tones are synthesised from a data store, e.g. computer organs in which amplitudes at successive sample points of a tone waveform are stored in one or more memories
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K15/00—Acoustics not otherwise provided for
- G10K15/08—Arrangements for producing a reverberation or echo sound
- G10K15/12—Arrangements for producing a reverberation or echo sound using electronic time-delay networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/281—Reverberation or echo
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/155—Musical effects
- G10H2210/265—Acoustic effect simulation, i.e. volume, spatial, resonance or reverberation effects added to a musical sound, usually by appropriate filtering or delays
- G10H2210/295—Spatial effects, musical uses of multiple audio channels, e.g. stereo
- G10H2210/305—Source positioning in a soundscape, e.g. instrument positioning on a virtual soundstage, stereo panning or related delay or reverberation changes; Changing the stereo width of a musical source
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/401—3D sensing, i.e. three-dimensional (x, y, z) position or movement sensing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/371—Gensound equipment, i.e. synthesizing sounds produced by man-made devices, e.g. machines
- G10H2250/381—Road, i.e. sounds which are part of a road, street or urban traffic soundscape, e.g. automobiles, bikes, trucks, traffic, vehicle horns, collisions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/395—Gensound nature
- G10H2250/415—Weather
- G10H2250/431—Natural aerodynamic noises, e.g. wind gust sounds, rustling leaves or beating sails
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/461—Gensound wind instruments, i.e. generating or synthesising the sound of a wind instrument, controlling specific features of said sound
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2420/00—Techniques used stereophonic systems covered by H04S but not provided for in its groups
- H04S2420/01—Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Definitions
- This disclosure relates to an acoustic signal processing method, etc.
- Patent Document 1 discloses technology related to a stereophonic calculation method, which is an acoustic signal processing method.
- In Patent Document 1, the arrival time of sound at a listener (observer) is controlled so as to change depending on the distance between the sound source and the listener and on the speed of sound.
- However, with the technology of Patent Document 1, it may be difficult to give the listener a sense of realism.
- The present disclosure therefore aims to provide an acoustic signal processing method and the like that can give the listener a sense of realism.
- An acoustic signal processing method according to the present disclosure includes an acquisition step of acquiring object information indicating a change in an object that causes wind and a predetermined timing related to the change in the object, and an output step of outputting, based on the change in the object, aerodynamic sound data indicating aerodynamic sound caused by the wind a predetermined time after the predetermined timing indicated by the acquired object information.
- A computer program according to the present disclosure causes a computer to execute the above-mentioned acoustic signal processing method.
- An acoustic signal processing device according to the present disclosure includes an acquisition unit that acquires object information indicating a change in an object that causes wind and a predetermined timing related to the change in the object, and an output unit that outputs, based on the change in the object, aerodynamic sound data indicating aerodynamic sound caused by the wind a predetermined time after the predetermined timing indicated by the acquired object information.
- Such an acoustic signal processing method can provide the listener with a sense of realism.
- FIG. 1 is a diagram showing an immersive audio playback system, which is an example of a system to which the audio processing or decoding processing of the present disclosure can be applied.
- FIG. 2 is a functional block diagram showing a configuration of an encoding device which is an example of an encoding device according to the present disclosure.
- FIG. 3 is a functional block diagram showing a configuration of a decoding device which is an example of the decoding device of the present disclosure.
- FIG. 4 is a functional block diagram showing a configuration of an encoding device that is another example of an encoding device according to the present disclosure.
- FIG. 5 is a functional block diagram showing a configuration of a decoding device which is another example of the decoding device of the present disclosure.
- FIG. 6 is a functional block diagram showing a configuration of a decoder which is an example of the decoder in FIG. 3 or FIG. 5.
- FIG. 7 is a functional block diagram showing the configuration of a decoder which is another example of the decoder in FIG. 3 or FIG. 5.
- FIG. 8 is a diagram showing an example of a physical configuration of an audio signal processing device.
- FIG. 9 is a diagram illustrating an example of a physical configuration of an encoding device.
- FIG. 10 is a block diagram illustrating a functional configuration of an audio signal processing device according to an embodiment.
- FIG. 11 is a flowchart of a first operation example of the audio signal processing device according to the embodiment.
- FIG. 12 is a diagram showing an electric fan, which is an object, and a listener according to the first operation example.
- FIG. 13A is a diagram illustrating the process of determining the predetermined time in step S40 shown in FIG. 11.
- FIG. 13B is a diagram illustrating a detailed example of the output of aerodynamic sound data according to the embodiment.
- FIG. 13C is a diagram illustrating another detailed example of the output of aerodynamic sound data according to the embodiment.
- FIG. 14 is a flowchart of a second operation example of the audio signal processing device according to the embodiment.
- FIG. 15 is a diagram showing an ambulance and a listener, which are objects according to the second operation example.
- FIG. 16 is a schematic diagram for explaining the predetermined timing according to the second operation example.
- FIG. 17 is a flowchart illustrating the details of step S35 according to the second operation example.
- FIG. 18 is a flowchart illustrating details of step S35 according to another first example of the second operation example.
- FIG. 19 is a functional block diagram and an example of steps for explaining a case where the rendering unit in FIG. 6 and FIG. 7 performs pipeline processing.
- Patent Document 1 discloses technology relating to a stereophonic calculation method, which is an acoustic signal processing method.
- the arrival time of sound to the listener is controlled to change according to the distance between the sound source and the listener and the speed of sound. More specifically, the arrival time is controlled to become longer as the distance increases, and longer as the speed of sound decreases. This allows the listener to recognize the distance between the object emitting the sound (i.e., the sound source) and themselves.
- Sound controlled in this way is used in applications such as virtual reality (VR) or augmented reality (AR) to reproduce three-dimensional sound in the space (virtual space) in which a user (listener) exists, and is particularly used in virtual spaces where information on the listener's 6DoF (degrees of freedom) is sensed.
- the sound that reaches the listener disclosed in Patent Document 1 is the traveling sound of a vehicle (moving sound source), which is an object in VR or AR, and is the sound (engine sound, etc.) emitted by the vehicle itself.
- a vehicle creates wind when it moves. Aerodynamic sound is generated when the wind created by the vehicle reaches the listener's ears. This aerodynamic sound is a sound that is generated, for example, according to the shape of the listener L's ear when wind caused by an object (for example, a vehicle) reaches the listener.
- objects that create wind are not limited to objects that run (move) like the above-mentioned vehicle, but also include objects that generate wind, such as an electric fan.
- Patent Document 1 does not disclose how to allow aerodynamic sound to be heard by the listener. More specifically, Patent Document 1 does not disclose technology for controlling the time it takes for aerodynamic sound to reach the listener when an object creates wind. With the technology disclosed in Patent Document 1, the listener is unable to hear the aerodynamic sound at the appropriate timing, which causes the listener to feel uncomfortable and makes it difficult for the listener to obtain a sense of realism. Therefore, there is a demand for an audio signal processing method that can provide the listener with a sense of realism.
- the acoustic signal processing method includes an acquisition step of acquiring object information indicating a change in an object causing wind and a predetermined timing related to the change in the object, and an output step of outputting aerodynamic sound data indicating aerodynamic sound caused by the wind a predetermined time after the predetermined timing indicated by the acquired object information based on the change in the object.
- the acoustic signal processing method according to the second aspect of the present disclosure is the acoustic signal processing method according to the first aspect, in which the object information indicates a change in the wind due to a change in the object and the predetermined timing is the timing of the change in the wind, and the acoustic signal processing method includes a determination step of determining the predetermined time based on the wind indicated by the acquired object information.
- the acoustic signal processing method according to the third aspect of the present disclosure is the acoustic signal processing method according to the second aspect, in which the change in wind indicated by the object information indicates a change in the wind speed, and in the determination step, the predetermined time is determined based on the wind speed.
- In this way, the predetermined time is determined based on the wind speed, allowing the listener to hear the aerodynamic sound at a more appropriate timing.
- the acoustic signal processing method according to the fourth aspect of the present disclosure is the acoustic signal processing method according to the third aspect, in which the aerodynamic sound is a sound generated at the changed wind speed.
- The acoustic signal processing method according to the fifth aspect of the present disclosure is the acoustic signal processing method according to the first aspect, in which the object information indicates the position of the object, and the acoustic signal processing method includes a determination step of determining the predetermined time based on the distance between the position of the listener of the aerodynamic sound and the position of the object indicated by the acquired object information.
- In this way, the predetermined time is determined based on the distance, allowing the listener to hear the aerodynamic sound at a more appropriate timing.
- the acoustic signal processing method according to the sixth aspect of the present disclosure is the acoustic signal processing method according to the third or fourth aspect, in which the object information indicates the position of the object, and in the determination step, the predetermined time is determined based on the wind speed and the distance between the position of the listener of the aerodynamic sound and the position of the object indicated by the acquired object information.
- In this way, the predetermined time is determined based on the wind speed and the distance, allowing the listener to hear the aerodynamic sound at a more appropriate timing.
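- As an illustration of how such a determination step might look, the following Python sketch estimates the predetermined time from a wind-speed value and the listener-object distance. The function name and the assumption that the wind travels at a constant speed over the whole distance are hypothetical; the disclosure itself does not fix a specific rule at this point.

```python
def determine_predetermined_time(distance_m: float, wind_speed_mps: float) -> float:
    """Estimate the time (seconds) for wind to travel from the object to the listener.

    Hypothetical sketch: assumes the wind propagates at a constant speed
    over the whole distance, which the disclosure does not require.
    """
    if wind_speed_mps <= 0.0:
        raise ValueError("wind speed must be positive for aerodynamic sound to occur")
    return distance_m / wind_speed_mps


# Example: an electric fan 2.0 m away blowing wind at 4.0 m/s gives a
# predetermined time of 0.5 s before the aerodynamic sound data is output.
delay_s = determine_predetermined_time(2.0, 4.0)
```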
- the acoustic signal processing method according to the seventh aspect of the present disclosure is an acoustic signal processing method according to any one of the first to sixth aspects, in which the object information indicates that the predetermined timing is a first timing for outputting sound data associated with the object, and in the output step, the aerodynamic sound data is output a predetermined time after the first timing indicated by the acquired object information.
- the aerodynamic sound data can be output a predetermined time after the first timing at which the sound is output, allowing the listener to hear the aerodynamic sound at a more appropriate timing.
- The acoustic signal processing method according to the eighth aspect of the present disclosure is the acoustic signal processing method according to any one of the first to sixth aspects, in which the object information indicates the position of the object and the predetermined timing is a second timing at which the distance between the position of the listener of the aerodynamic sound and the position of the object becomes shorter than a predetermined distance, and in the output step, the aerodynamic sound data is output after the predetermined time has elapsed from the second timing indicated by the acquired object information.
- In this way, the aerodynamic sound data can be output when the predetermined time has elapsed from the second timing at which the distance becomes shorter than the predetermined distance, in other words, from the timing at which the object has approached the listener, allowing the listener to hear the aerodynamic sound at a more appropriate timing.
- the acoustic signal processing method according to the ninth aspect of the present disclosure is an acoustic signal processing method according to any one of the first to sixth aspects, in which the object information indicates that the change in wind due to a change in the object is a change in the wind direction and the predetermined timing is a third timing at which the change in wind direction occurred, and in the output step, the aerodynamic sound data is output the predetermined time after the third timing indicated by the acquired object information.
- The acoustic signal processing method according to the tenth aspect of the present disclosure is the acoustic signal processing method according to the sixth aspect, in which the object is an object that generates the wind and the sound indicated by sound data associated with the object, and the aerodynamic sound is an aerodynamic sound that is generated when the wind generated by the object reaches the listener.
- the acoustic signal processing method according to the eleventh aspect of the present disclosure is the acoustic signal processing method according to the tenth aspect, in which, when the distance is D, the distance from the position of the object at which the wind speed becomes So is U, and the predetermined time is t, t satisfies the following formula:
- In this way, the time from the predetermined timing until the wind generated by the object reaches the listener can be determined as the predetermined time. Since the aerodynamic sound data can then be output when this predetermined time has elapsed from the predetermined timing, the listener can hear the aerodynamic sound at a more appropriate timing.
- the acoustic signal processing method according to a twelfth aspect of the present disclosure is the acoustic signal processing method according to the sixth aspect, in which the object is an object that generates the wind by moving the position of the object, and the aerodynamic sound is aerodynamic sound that is generated when the wind generated by the movement reaches the listener.
- the audio signal processing method according to a thirteenth aspect of the present disclosure is the audio signal processing method according to the twelfth aspect, in which the predetermined timing indicated by the object information is the timing at which the amount of change in the distance over time turns from negative to positive.
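- The thirteenth aspect can be pictured as detecting the frame at which a moving object stops approaching the listener and starts receding. The sketch below is one hypothetical way to detect that sign change from per-frame distance samples; the function and variable names are illustrative and not taken from the disclosure.

```python
from typing import Optional


def detect_turnaround_frame(distances: list[float]) -> Optional[int]:
    """Return the index at which the change in distance per frame turns from
    negative (approaching) to positive (receding), or None if it never does."""
    for i in range(1, len(distances) - 1):
        approaching = distances[i] - distances[i - 1] < 0.0
        receding = distances[i + 1] - distances[i] > 0.0
        if approaching and receding:
            return i
    return None


# Example: a vehicle passes the listener; index 3 is the closest approach.
frames = [10.0, 6.0, 3.0, 2.5, 4.0, 8.0]
turn = detect_turnaround_frame(frames)  # -> 3
```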
- The acoustic signal processing method according to the fourteenth aspect of the present disclosure is the acoustic signal processing method according to the twelfth or thirteenth aspect, in which, when the distance is D, the distance from the position of the object at which the wind speed of the wind generated by the movement becomes So is U, and the predetermined time is t, t satisfies the following formula:
- In this way, the time from the predetermined timing until the wind generated by the object reaches the listener can be determined as the predetermined time. Since the aerodynamic sound data can then be output when this predetermined time has elapsed from the predetermined timing, the listener can hear the aerodynamic sound at a more appropriate timing.
- a computer program according to a fifteenth aspect of the present disclosure is a computer program for causing a computer to execute an acoustic signal processing method according to any one of the first to fourteenth aspects.
- an audio signal processing device includes an acquisition unit that acquires object information indicating a change in an object causing wind and a predetermined timing related to the change in the object, and an output unit that outputs aerodynamic sound data indicating aerodynamic sound caused by the wind a predetermined time after the predetermined timing indicated by the acquired object information based on the change in the object.
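- To tie the acquisition side and the output side together, the sketch below shows one hypothetical way the two steps could interact: the predetermined time is derived from the wind speed and the listener-object distance, and the aerodynamic sound data is released only once that time has elapsed after the predetermined timing. All names and the delay rule are illustrative assumptions, not the method defined by this disclosure.

```python
from dataclasses import dataclass


@dataclass
class WindEvent:
    # Illustrative record for one change in a wind-causing object.
    change_time_s: float                          # predetermined timing on a shared content clock
    wind_speed_mps: float                         # wind speed after the change
    object_position: tuple[float, float, float]   # position of the object in the virtual space


def maybe_output_aerodynamic_sound(event: WindEvent,
                                   listener_position: tuple[float, float, float],
                                   aerodynamic_sound_data: bytes,
                                   now_s: float):
    """Return the aerodynamic sound data once the predetermined time has elapsed
    after the predetermined timing, otherwise None (hypothetical sketch)."""
    dx, dy, dz = (o - l for o, l in zip(event.object_position, listener_position))
    distance_m = (dx * dx + dy * dy + dz * dz) ** 0.5
    predetermined_time_s = distance_m / event.wind_speed_mps  # assumed constant-speed wind
    if now_s - event.change_time_s >= predetermined_time_s:
        return aerodynamic_sound_data  # hand the data to the audio presentation device
    return None  # not yet time to output
```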
- ordinal numbers such as first and second may be attached to elements. These ordinal numbers are attached to elements in order to identify them, and do not necessarily correspond to a meaningful order. These ordinal numbers may be rearranged, newly added, or removed as appropriate.
- each figure is a schematic diagram and is not necessarily an exact illustration. Therefore, the scale and the like are not necessarily the same in each figure.
- the same reference numerals are used for substantially the same configuration, and duplicate explanations are omitted or simplified.
- <3D sound reproduction system> FIG. 1 is a diagram showing a stereophonic (immersive audio) reproduction system A0000 as an example of a system to which the acoustic processing or decoding processing of the present disclosure can be applied.
- the stereophonic reproduction system A0000 includes an acoustic signal processing device A0001 and an audio presentation device A0002.
- The acoustic signal processing device A0001 performs acoustic processing on the audio signal emitted by a virtual sound source to generate an acoustic-processed audio signal that is presented to the listener.
- the audio signal is not limited to a voice, but may be any audible sound.
- Acoustic processing is, for example, signal processing performed on an audio signal in order to reproduce one or more sound-related effects that a sound generated from a sound source experiences between the time the sound is emitted and the time the listener hears the sound.
- The acoustic signal processing device A0001 performs acoustic processing based on spatial information that describes the factors that cause the above-mentioned sound-related effects.
- the spatial information includes, for example, information indicating the positions of the sound source, the listener, and surrounding objects, information indicating the shape of the space, parameters related to sound propagation, and the like.
- the acoustic signal processing device A0001 is, for example, a PC (Personal Computer), a smartphone, a tablet, or a game console.
- the signal after acoustic processing is presented to the listener (user) from the audio presentation device A0002.
- the audio presentation device A0002 is connected to the audio signal processing device A0001 via wireless or wired communication.
- the audio signal after acoustic processing generated by the audio signal processing device A0001 is transmitted to the audio presentation device A0002 via wireless or wired communication.
- When the audio presentation device A0002 is composed of multiple devices, such as a device for the right ear and a device for the left ear, the multiple devices present sounds in synchronization with each other or with the acoustic signal processing device A0001.
- the audio presentation device A0002 is, for example, headphones, earphones, or a head-mounted display worn on the listener's head, or a surround speaker composed of multiple fixed speakers.
- the stereophonic sound reproduction system A0000 may be used in combination with an image presentation device or a stereoscopic image presentation device that provides an ER (Extended Reality) experience, including visual VR or AR.
- Although FIG. 1 shows an example of a system configuration in which the acoustic signal processing device A0001 and the audio presentation device A0002 are separate devices, the stereophonic sound reproduction system A0000 to which the acoustic signal processing method or decoding method of the present disclosure can be applied is not limited to the configuration of FIG. 1.
- the acoustic signal processing device A0001 may be included in the audio presentation device A0002, which may perform both acoustic processing and sound presentation.
- the acoustic signal processing device A0001 and the audio presentation device A0002 may share the acoustic processing described in this disclosure, or a server connected to the acoustic signal processing device A0001 or the audio presentation device A0002 via a network may perform part or all of the acoustic processing described in this disclosure.
- In this disclosure, the device is referred to as the acoustic signal processing device A0001; however, when the acoustic signal processing device A0001 performs acoustic processing by decoding a bitstream generated by encoding at least a portion of the audio signal or of the spatial information used in the acoustic processing, it may be referred to as a decoding device.
- FIG. 2 is a functional block diagram showing a configuration of an encoding device A0100, which is an example of an encoding device according to the present disclosure.
- the input data A0101 is data to be encoded, including spatial information and/or audio signals, that is input to the encoder A0102. Details of the spatial information will be explained later.
- the encoder A0102 encodes the input data A0101 to generate encoded data A0103.
- the encoded data A0103 is, for example, a bit stream generated by the encoding process.
- Memory A0104 stores the encoded data A0103.
- Memory A0104 may be, for example, a hard disk or a solid-state drive (SSD), or may be other memory.
- a bit stream generated by the encoding process is given as an example of the encoded data A0103 stored in the memory A0104, but data other than a bit stream may be used.
- the encoding device A0100 may convert the bit stream into a predetermined data format and store the converted data in the memory A0104.
- the converted data may be, for example, a file or multiplexed stream that stores one or more bit streams.
- the file is, for example, a file having a file format such as ISOBMFF (ISO Base Media File Format).
- the encoded data A0103 may also be in the form of multiple packets generated by dividing the bit stream or file.
- the encoding device A0100 may be provided with a conversion unit (not shown), or the conversion process may be performed by a CPU (Central Processing Unit).
- FIG. 3 is a functional block diagram showing a configuration of a decoding device A0110 which is an example of a decoding device according to the present disclosure.
- the memory A0114 stores, for example, the same data as the encoded data A0103 generated by the encoding device A0100.
- the memory A0114 reads out the stored data and inputs it as input data A0113 to the decoder A0112.
- the input data A0113 is, for example, a bit stream to be decoded.
- the memory A0114 may be, for example, a hard disk or SSD, or may be some other memory.
- the decoding device A0110 may not directly use the data stored in the memory A0114 as the input data A0113, but may convert the read data and generate converted data as the input data A0113.
- the data before conversion may be, for example, multiplexed data that stores one or more bit streams.
- the multiplexed data may be, for example, a file having a file format such as ISOBMFF.
- the data before conversion may also be in the form of multiple packets generated by dividing the bit stream or file.
- the decoding device A0110 may be provided with a conversion unit (not shown), or the conversion process may be performed by a CPU.
- the decoder A0112 decodes the input data A0113 to generate an audio signal A0111 that is presented to the listener.
- Fig. 4 is a functional block diagram showing a configuration of an encoding device A0120, which is another example of an encoding device according to the present disclosure.
- components having the same functions as those in Fig. 2 are given the same reference numerals, and descriptions of these components are omitted.
- The encoding device A0120 differs from the encoding device A0100 in that the encoding device A0120 includes a transmission unit A0121 that transmits the encoded data A0103 to the outside, whereas the encoding device A0100 stores the encoded data A0103 in the memory A0104.
- the transmitting unit A0121 transmits a transmission signal A0122 to another device or server based on the encoded data A0103 or data in a different data format generated by converting the encoded data A0103.
- the data used to generate the transmission signal A0122 is, for example, the bit stream, multiplexed data, file, or packet described in the encoding device A0100.
- Fig. 5 is a functional block diagram showing a configuration of a decoding device A0130, which is another example of a decoding device according to the present disclosure.
- components having the same functions as those in Fig. 3 are given the same reference numerals, and descriptions of these components are omitted.
- The decoding device A0130 differs from the decoding device A0110 in that, while the decoding device A0110 reads the input data A0113 from the memory A0114, the decoding device A0130 has a receiving unit A0131 that receives the input data A0113 from outside.
- The receiving unit A0131 receives the reception signal A0132, acquires the received data, and outputs the input data A0113 to be input to the decoder A0112.
- the received data may be the same as the input data A0113 to be input to the decoder A0112, or may be data in a format different from that of the input data A0113. If the received data is in a format different from that of the input data A0113, the receiving unit A0131 may convert the received data into the input data A0113, or a conversion unit or CPU (not shown) included in the decoding device A0130 may convert the received data into the input data A0113.
- the received data is, for example, a bit stream, multiplexed data, a file, or a packet, as described for the encoding device A0120.
- FIG. 6 is a functional block diagram showing a configuration of a decoder A0200 which is an example of the decoder A0112 in FIG. 3 or FIG. 5.
- the input data A0113 is an encoded bitstream and includes encoded audio data, which is an encoded audio signal, and metadata used for audio processing.
- the spatial information management unit A0201 acquires metadata contained in the input data A0113 and analyzes the metadata.
- the metadata includes information describing elements that act on sounds arranged in a sound space.
- the spatial information management unit A0201 manages spatial information necessary for sound processing obtained by analyzing the metadata, and provides the spatial information to the rendering unit A0203.
- Although the information used for sound processing is called spatial information in this disclosure, it may be called something else.
- the information used for sound processing may be called, for example, sound space information or scene information.
- the spatial information input to the rendering unit A0203 may be called a spatial state, a sound space state, a scene state, etc.
- the spatial information may be managed for each sound space or for each scene.
- the spatial information may be managed as scenes of different sound spaces for each room, or the spatial information may be managed as different scenes depending on the scene being represented even if the room is the same space.
- an identifier for identifying each piece of spatial information may be assigned.
- the spatial information data may be included in a bitstream, which is one form of input data, or the bitstream may include an identifier for the spatial information and the spatial information data may be obtained from somewhere other than the bitstream. If the bitstream includes only an identifier for the spatial information, the identifier for the spatial information may be used during rendering to obtain the spatial information data stored in the memory of the acoustic signal processing device A0001 or an external server as input data.
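- One way to picture this is a small lookup that resolves a spatial-information identifier either from the device's own memory or from an external server when the bitstream carries only the identifier. The function below is a hypothetical sketch; neither the function name nor the server interface is specified in the disclosure.

```python
def resolve_spatial_information(bitstream_payload: dict, local_store: dict, fetch_from_server):
    """Return spatial information data, whether it is embedded in the
    bitstream or referenced there only by an identifier (hypothetical sketch)."""
    if "spatial_information" in bitstream_payload:
        return bitstream_payload["spatial_information"]
    spatial_id = bitstream_payload["spatial_information_id"]
    if spatial_id in local_store:          # memory of the acoustic signal processing device
        return local_store[spatial_id]
    return fetch_from_server(spatial_id)   # e.g. an external server holding the scene data
```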
- the information managed by the spatial information management unit A0201 is not limited to the information included in the bitstream.
- the input data A0113 may include data indicating the characteristics or structure of the space obtained from a software application or server that provides VR or AR as data not included in the bitstream.
- the input data A0113 may include data indicating the characteristics or position of a listener or an object as data not included in the bitstream.
- the input data A0113 may include information obtained by a sensor provided in a terminal including a decoding device as information indicating the position of the listener, or information indicating the position of the terminal estimated based on information obtained by the sensor.
- the spatial information management unit A0201 may communicate with an external system or server to obtain spatial information and the position of the listener. Also, the spatial information management unit A0201 may obtain clock synchronization information from an external system and execute a process of synchronizing with the clock of the rendering unit A0203.
- the space in the above description may be a virtually formed space, i.e., a VR space, or may be a real space (i.e., a physical space) or a virtual space corresponding to the real space, i.e., an AR or MR (Mixed Reality).
- the virtual space may also be called a sound field or sound space.
- the information indicating a position in the above description may be information such as coordinate values indicating a position within a space, information indicating a relative position with respect to a predetermined reference position, or information indicating the movement or acceleration of a position within a space.
- the audio data decoder A0202 decodes the encoded audio data contained in the input data A0113 to obtain an audio signal.
- the encoded audio data acquired by the stereophonic reproduction system A0000 is a bitstream encoded in a specific format, such as MPEG-H 3D Audio (ISO/IEC 23008-3).
- MPEG-H 3D Audio is merely one example of an encoding method that can be used to generate the encoded audio data contained in the bitstream, and the encoded audio data may also include a bitstream encoded in another encoding method.
- the encoding method used may be a lossy codec such as MP3 (MPEG-1 Audio Layer-3), AAC (Advanced Audio Coding), WMA (Windows Media Audio), AC3 (Audio Codec-3), or Vorbis, or a lossless codec such as ALAC (Apple Lossless Audio Codec) or FLAC (Free Lossless Audio Codec), or any encoding method other than the above may be used.
- The encoded audio data may also include data that has not been compressed, such as a PCM (pulse code modulation) signal. In that case, the decoding process may be, for example, a process of converting an N-bit binary number into a number format (e.g., floating-point format) that can be processed by the rendering unit A0203, where N is the number of quantization bits of the PCM data.
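- As a concrete illustration of that kind of conversion, the sketch below turns signed N-bit PCM samples into floating-point values in a range the rendering unit could work with. This is a generic normalization written for illustration, not a conversion routine defined by the disclosure.

```python
def pcm_to_float(samples: list[int], quantization_bits: int) -> list[float]:
    """Convert signed N-bit PCM integer samples to floats in [-1.0, 1.0)."""
    full_scale = float(1 << (quantization_bits - 1))  # e.g. 32768 for 16-bit PCM
    return [s / full_scale for s in samples]


# Example with 16-bit samples:
floats = pcm_to_float([0, 16384, -32768, 32767], 16)
# -> [0.0, 0.5, -1.0, 0.99997...]
```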
- the rendering unit A0203 receives an audio signal and spatial information, performs acoustic processing on the audio signal using the spatial information, and outputs the processed audio signal A0111.
- the spatial information management unit A0201 reads the metadata of the input signal, detects rendering items such as objects or sounds defined in the spatial information, and sends them to the rendering unit A0203. After rendering begins, the spatial information management unit A0201 grasps changes over time in the spatial information and the listener's position, and updates and manages the spatial information. The spatial information management unit A0201 then sends the updated spatial information to the rendering unit A0203. The rendering unit A0203 generates and outputs an audio signal to which acoustic processing has been added based on the audio signal included in the input data A0113 and the spatial information received from the spatial information management unit A0201.
- the spatial information update process and the audio signal output process with added acoustic processing may be executed in the same thread, or the spatial information management unit A0201 and the rendering unit A0203 may each be assigned to an independent thread.
- the thread startup frequency may be set individually, or the processes may be executed in parallel.
- When the spatial information management unit A0201 and the rendering unit A0203 execute their processes in different independent threads, it is possible to allocate computational resources preferentially to the rendering unit A0203, so that sound output processing that cannot tolerate even the slightest delay, such as sound output processing in which a delay of even one sample (0.02 msec) would cause a popping noise, can be performed safely.
- the allocation of computational resources to the spatial information management unit A0201 is limited.
- updating spatial information is a low-frequency process (for example, processing such as updating the direction of the listener's face). For this reason, unlike audio signal output processing, it does not necessarily require an instantaneous response, so limiting the allocation of computational resources does not have a significant impact on the acoustic quality provided to the listener.
- Updating of the spatial information may be performed periodically at preset times or periods, or when preset conditions are met.
- updating of the spatial information may be performed manually by the listener or the manager of the sound space, or may be performed when triggered by a change in an external system. For example, if a listener operates a controller to instantly warp the position of his/her avatar, or to instantly advance or reverse the time, or if the manager of the virtual space suddenly performs a performance that changes the environment of the place, the thread in which the spatial information management unit A0201 is placed may be started as a one-off interrupt process in addition to being started periodically.
- the role of the information update thread that executes the spatial information update process is, for example, to update the position or orientation of the listener's avatar placed in the virtual space based on the position or orientation of the VR goggles worn by the listener, and to update the position of objects moving in the virtual space, and these roles are handled within a processing thread that runs relatively infrequently, on the order of a few tens of Hz. Processing to reflect the properties of direct sound may be performed in such an infrequent processing thread. This is because the properties of direct sound change less frequently than the frequency with which audio processing frames for audio output occur. By doing so, the computational load of the process can be made relatively small, and the risk of pulsive noise occurring when information is updated at an unnecessarily fast frequency can be avoided.
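- A minimal way to picture this thread split is shown below: one thread updates the spatial information at a few tens of Hz, while the rendering loop produces one audio processing frame per iteration. The rates, the shared-state class, and the sensor/output stubs are illustrative assumptions rather than values or interfaces taken from the disclosure.

```python
import threading
import time


def read_head_tracker():
    # Hypothetical sensor read; a real system would query the listener's VR goggles here.
    return (0.0, 0.0, 0.0)


def emit_audio_frame(orientation):
    # Hypothetical acoustic processing + output of one audio frame.
    pass


class SharedSceneState:
    """Spatial information shared between the update thread and the renderer."""
    def __init__(self):
        self.lock = threading.Lock()
        self.listener_orientation = (0.0, 0.0, 0.0)  # yaw, pitch, roll


def spatial_update_loop(state: SharedSceneState, rate_hz: float = 30.0):
    # Low-frequency thread (a few tens of Hz): update listener pose and moving objects.
    while True:
        with state.lock:
            state.listener_orientation = read_head_tracker()
        time.sleep(1.0 / rate_hz)


def render_loop(state: SharedSceneState, frame_duration_s: float = 0.02):
    # Audio-rate loop: produce one acoustic-processed frame per iteration.
    while True:
        with state.lock:
            orientation = state.listener_orientation
        emit_audio_frame(orientation)
        time.sleep(frame_duration_s)


state = SharedSceneState()
threading.Thread(target=spatial_update_loop, args=(state,), daemon=True).start()
# render_loop(state) would run on the main (or a dedicated high-priority) thread.
```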
- FIG. 7 is a functional block diagram showing the configuration of a decoder A0210, which is another example of the decoder A0112 in FIG. 3 or FIG. 5.
- the decoder A0210 shown in FIG. 7 differs from the decoder A0200 shown in FIG. 6 in that the input data A0113 includes an uncoded audio signal rather than encoded audio data.
- the input data A0113 includes a bitstream including metadata and an audio signal.
- the spatial information management unit A0211 is the same as the spatial information management unit A0201 in FIG. 6, so a description thereof will be omitted.
- the rendering unit A0213 is the same as the rendering unit A0203 in Figure 6, so a description of it will be omitted.
- the configuration in FIG. 7 is called the decoder A0210, but it may also be called an audio processing unit that performs audio processing.
- a device that includes an audio processing unit may be called an audio processing device rather than a decoding device.
- the audio signal processing device A0001 may be called an audio processing device.
- Fig. 8 is a diagram showing an example of the physical configuration of an audio signal processing device. Note that the audio signal processing device in Fig. 8 may be a decoding device. Also, a part of the configuration described here may be provided in the audio presentation device A0002. Also, the audio signal processing device shown in Fig. 8 is an example of the above-mentioned audio signal processing device A0001.
- the acoustic signal processing device in FIG. 8 includes a processor, a memory, a communication IF, a sensor, and a speaker.
- the processor may be, for example, a CPU (Central Processing Unit), a DSP (Digital Signal Processor) or a GPU (Graphics Processing Unit), and the CPU, DSP or GPU may execute a program stored in memory to perform the acoustic processing or decoding processing of the present disclosure.
- the processor may also be a dedicated circuit that performs signal processing on audio signals, including the acoustic processing of the present disclosure.
- Memory is composed of, for example, RAM (Random Access Memory) or ROM (Read Only Memory). Memory may also include magnetic storage media such as hard disks or semiconductor memory such as SSDs (Solid State Drives). Memory may also include internal memory built into the CPU or GPU.
- The communication IF (interface) is a communication module that supports communication methods such as Bluetooth (registered trademark) or WIGIG (registered trademark).
- the audio signal processing device shown in FIG. 8 has a function of communicating with other communication devices via the communication IF, and acquires a bitstream to be decoded.
- the acquired bitstream is stored in a memory, for example.
- the communication module is composed of, for example, a signal processing circuit and an antenna corresponding to the communication method.
- the communication IF may be a wired communication method such as Ethernet (registered trademark), USB (Universal Serial Bus), or HDMI (registered trademark) (High-Definition Multimedia Interface) instead of the wireless communication method described above.
- the sensor performs sensing to estimate the position or orientation of the listener. Specifically, the sensor estimates the position and/or orientation of the listener based on one or more detection results of the position, orientation, movement, velocity, angular velocity, acceleration, etc. of a part of the listener's body, such as the head, or the whole of the listener, and generates position information indicating the position and/or orientation of the listener.
- the position information may be information indicating the position and/or orientation of the listener in real space, or information indicating the displacement of the position and/or orientation of the listener based on the position and/or orientation of the listener at a specified time.
- the position information may also be information indicating the position and/or orientation relative to the stereophonic reproduction system A0000 or an external device equipped with the sensor.
- the sensor may be, for example, an imaging device such as a camera or a ranging device such as LiDAR (Light Detection and Ranging), and may capture the movement of the listener's head and detect the movement of the listener's head by processing the captured image.
- the sensor may be a device that performs position estimation using wireless signals of any frequency band, such as millimeter waves.
- the audio signal processing device shown in FIG. 8 may acquire position information from an external device equipped with a sensor via a communication IF.
- the audio signal processing device does not need to include a sensor.
- the external device is, for example, the audio presentation device A0002 described in FIG. 1 or a 3D image playback device worn on the listener's head.
- the sensor is configured by combining various sensors such as a gyro sensor and an acceleration sensor.
- the sensor may detect, for example, the angular velocity of rotation about at least one of three mutually orthogonal axes in the sound space as the speed of movement of the listener's head, or may detect the acceleration of displacement with at least one of the three axes as the displacement direction.
- the sensor may detect, for example, the amount of movement of the listener's head as the amount of rotation about at least one of three mutually orthogonal axes in the sound space, or the amount of displacement about at least one of the three axes. Specifically, the sensor detects 6DoF (position (x, y, z) and angle (yaw, pitch, roll)) as the listener's position.
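- For reference, the 6DoF listener state described above can be represented compactly as in the following sketch; the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ListenerPose6DoF:
    """6DoF listener state: position (x, y, z) plus angles (yaw, pitch, roll)."""
    x: float
    y: float
    z: float
    yaw: float
    pitch: float
    roll: float


# Example: listener at the origin, head turned 90 degrees to the left.
pose = ListenerPose6DoF(x=0.0, y=0.0, z=0.0, yaw=90.0, pitch=0.0, roll=0.0)
```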
- the sensor is configured by combining various sensors used for detecting movement, such as a gyro sensor and an acceleration sensor.
- the sensor only needs to be capable of detecting the position of the listener, and may be realized by a camera or a GPS (Global Positioning System) receiver, etc. Position information obtained by performing self-position estimation using LiDAR (Laser Imaging Detection and Ranging) or the like may be used. For example, when the audio signal playback system is realized by a smartphone, the sensor is built into the smartphone.
- the sensor may also include a temperature sensor such as a thermocouple that detects the temperature of the audio signal processing device shown in FIG. 8, and a sensor that detects the remaining charge of a battery provided in or connected to the audio signal processing device.
- The speaker, for example, has a diaphragm, a drive mechanism such as a magnet or a voice coil, and an amplifier, and presents the acoustic-processed audio signal to the listener as sound.
- the speaker operates the drive mechanism in response to the audio signal (more specifically, a waveform signal that indicates the waveform of the sound) amplified via the amplifier, and the drive mechanism vibrates the diaphragm.
- the diaphragm vibrates in response to the audio signal, generating sound waves that propagate through the air and are transmitted to the listener's ears, causing the listener to perceive the sound.
- Although the audio signal processing device shown in FIG. 8 has been described as being provided with a speaker through which the acoustic-processed audio signal is presented, the means for presenting the audio signal is not limited to this configuration.
- the audio signal after acoustic processing may be output to an external audio presentation device A0002 connected by a communication module. Communication through the communication module may be wired or wireless.
- the audio signal processing device shown in FIG. 8 may be provided with a terminal for outputting an analog audio signal, and an audio signal may be presented from an earphone or the like by connecting a cable such as an earphone to the terminal.
- the audio signal is reproduced by headphones, earphones, a head-mounted display, a neck speaker, a wearable speaker, a surround speaker composed of multiple fixed speakers, or the like that is worn on the head or part of the body of the listener, which is the audio presentation device A0002.
- Fig. 9 is a diagram showing an example of the physical configuration of an encoding device.
- the encoding device shown in Fig. 9 is an example of the encoding devices A0100 and A0120 described above.
- the encoding device in FIG. 9 includes a processor, a memory, and a communication interface.
- The processor may be, for example, a CPU (Central Processing Unit) or a DSP (Digital Signal Processor), and the encoding process of the present disclosure may be performed by the CPU or DSP executing a program stored in memory.
- the processor may also be a dedicated circuit that performs signal processing on audio signals, including the encoding process of the present disclosure.
- Memory is composed of, for example, RAM (Random Access Memory) or ROM (Read Only Memory). Memory may also include magnetic storage media such as hard disks or semiconductor memory such as SSDs (Solid State Drives). Memory may also include internal memory built into the CPU or GPU.
- the communication IF (Inter Face) is a communication module that supports communication methods such as Bluetooth (registered trademark) or WIGIG (registered trademark).
- the encoding device has the function of communicating with other communication devices via the communication IF, and transmits an encoded bit stream.
- the communication module is composed of, for example, a signal processing circuit and an antenna corresponding to the communication method.
- the communication IF may be a wired communication method such as Ethernet (registered trademark), USB (Universal Serial Bus), or HDMI (registered trademark) (High-Definition Multimedia Interface) instead of the wireless communication method described above.
- FIG. 10 is a block diagram showing a functional configuration of the acoustic signal processing device 100 according to the present embodiment.
- the audio signal processing device 100 is a device for outputting aerodynamic sound data indicating aerodynamic sound caused by wind generated by an object in a virtual space (sound reproduction space).
- the audio signal processing device 100 according to this embodiment is a device that is used in various applications in virtual spaces, such as virtual reality or augmented reality (VR or AR), for example.
- An object in a virtual space is included in the content (here, video is an example of the content) displayed on the display unit 300, which displays the content executed within the virtual space.
- There is no particular limitation on the object, so long as it is an object that creates wind.
- An object is, for example, a moving body that generates wind by moving its position.
- Moving bodies include, for example, objects that represent plants and animals, man-made objects, or natural objects.
- objects that represent man-made objects include vehicles, bicycles, and airplanes.
- objects that represent man-made objects include sports equipment such as baseball bats and tennis rackets, and furniture such as desks, chairs, and grandfather clocks.
- an object may be at least one of something that can move within the content and something that can be moved, but is not limited to this.
- the object may be an object that can blow air.
- objects include, for example, electric fans, circulators, paper fans, and air conditioners.
- Aerodynamic sound is sound that occurs when wind generated by an object reaches the listener's ears in a virtual space.
- the aerodynamic sound is the aerodynamic sound that is generated when the wind generated by the object reaches the listener. More specifically, the aerodynamic sound is the sound that is generated when the wind blown out from an electric fan reaches the listener, for example, depending on the shape of the listener's ear.
- the aerodynamic sound is generated when wind generated by the object's movement reaches the listener, and more specifically, is the sound that is generated when the wind reaches the listener, for example, depending on the shape of the listener's ear.
- the object may also be an object that creates wind and generates sound.
- the sound generated by the object is a sound indicated by sound data associated with the object (hereinafter sometimes referred to as object sound data).
- For example, if the object is an electric fan, the sound generated by the object is the motor sound generated by the motor of the electric fan.
- If the object is an ambulance, the sound generated by the object is the siren sound emitted by the ambulance.
- the object is an electric fan, which is an example of an object that can blow air.
- the acoustic signal processing device 100 outputs aerodynamic sound data representing aerodynamic sounds in a virtual space to the headphones 200.
- the headphones 200 are a device that reproduces aerodynamic sound, and are an audio output device that presents the aerodynamic sound to the listener. More specifically, the headphones 200 reproduce the aerodynamic sound based on the aerodynamic sound data output by the audio signal processing device 100. This allows the listener to hear the aerodynamic sound. Note that instead of the headphones 200, other output channels such as speakers may be used.
- the headphones 200 include a head sensor unit 201 and an output unit 202.
- the head sensor unit 201 senses the position of the listener, which is determined by the horizontal coordinates and vertical height in the virtual space, and outputs second position information indicating the position of the listener of the aerodynamic sound in the virtual space to the acoustic signal processing device 100.
- the head sensor unit 201 may sense 6DoF information of the listener's head.
- the head sensor unit 201 may be an inertial measurement unit (IMU), an accelerometer, a gyroscope, a magnetic sensor, or a combination of these.
- the output unit 202 is a device that reproduces the sound that reaches the listener in the sound reproduction space. More specifically, the output unit 202 reproduces the aerodynamic sound based on aerodynamic sound data indicating the aerodynamic sound output from the acoustic signal processing device 100.
- For example, if the object is an electric fan, sound data indicating a motor sound is output from the audio signal processing device 100, and the output unit 202 reproduces the motor sound based on that sound data.
- Similarly, if the object is an ambulance, sound data indicating a siren sound is output from the audio signal processing device 100, and the output unit 202 reproduces the siren sound based on that sound data.
- the display unit 300 is a display device that displays content (images) including objects in a virtual space. The process by which the display unit 300 displays content will be described later.
- the display unit 300 is realized by a display panel such as a liquid crystal panel or an organic EL (Electro Luminescence) panel, for example.
- the acoustic signal processing device 100 shown in FIG. 10 will be described.
- the acoustic signal processing device 100 outputs aerodynamic sound data to the headphones 200 a predetermined time after a predetermined timing.
- the acoustic signal processing device 100 includes an acquisition unit 110, a determination unit 120, an output unit 130, and a storage unit 140.
- the acquisition unit 110 acquires object information.
- the object information is information indicating the change in the object causing the wind, the specified timing of the change in the object, the change in the wind due to the change in the object, and the position of the object.
- object information is treated as information including first change information indicating the change in the object causing the wind, timing information indicating the specified timing of the change in the object, second change information indicating the change in the wind due to the change in the object, and first position information indicating the position of the object.
- the object information includes sound data (object sound data) that indicates the sound.
- the object information may also include shape information that indicates the shape of the object.
- the acquisition unit 110 acquires second position information. As described above, the second position information is information indicating the position of the listener in the virtual space.
- the acquisition unit 110 acquires aerodynamic sound data indicating aerodynamic sound.
- the aerodynamic sound data is stored in the storage unit 140, and the acquisition unit 110 acquires the aerodynamic sound data stored in the storage unit 140.
- the acquisition unit 110 may acquire the object information, second position information, and aerodynamic sound data, for example, from an input signal, or may acquire the object information, second position information, and aerodynamic sound data from other sources.
- the input signal will be described below.
- the object sound data and aerodynamic sound data may be collectively referred to as sound data.
- the input signal is composed of, for example, spatial information, sensor information, and sound data (audio signal). Furthermore, the above information and sound data may be included in one input signal, or the above information and sound data may be included in multiple separate signals.
- the input signal may include a bit stream composed of sound data and metadata (control information), in which case the metadata may include information identifying the spatial information and sound data.
- the first change information, timing information, second change information, first position information, shape information, object sound data, second position information, and aerodynamic sound data described above may be included in the input signal. More specifically, the first change information, timing information, second change information, first position information, and shape information may be included in spatial information, and the second position information may be generated based on information obtained from sensor information.
- the sensor information may be obtained from the head sensor unit 201, or may be obtained from another external device.
- the spatial information is information about the sound space (three-dimensional sound field) created by the stereophonic reproduction system A0000, and is composed of information about the objects contained in the sound space and information about the listener.
- Objects include sound source objects that emit sound and act as sound sources, and non-sound producing objects that do not emit sound. Non-sound producing objects function as obstacle objects that reflect sounds emitted by sound source objects, but there are also cases where sound source objects function as obstacle objects that reflect sounds emitted by other sound source objects. Obstacle objects may also be called reflecting objects.
- Information that is commonly assigned to sound source objects and non-sound-producing objects includes position information, shape information, and the rate at which the sound volume decays when the object reflects sound.
- the position information is expressed by coordinate values on three axes, for example, the X-axis, Y-axis, and Z-axis in Euclidean space, but it does not necessarily have to be three-dimensional information.
- the position information may be two-dimensional information expressed by coordinate values on two axes, for example, the X-axis and Y-axis.
- the position information of an object is determined by the representative position of a shape expressed by a mesh or voxel.
- the shape information may also include information about the surface material.
- The attenuation rate may be expressed as a real number greater than 0 and less than 1, or as a negative decibel value. In real space, sound volume is not amplified by reflection, so the attenuation rate is normally set to a negative decibel value; however, to create an eerie feeling in an unreal space, for example, an attenuation rate greater than 1, i.e., a positive decibel value, may be set. The attenuation rate may also be set independently for each of multiple frequency bands. Furthermore, if an attenuation rate is defined for each type of surface material, a corresponding attenuation rate value may be used based on the information about the surface material.
- the information commonly assigned to the sound source object and the non-sound generating object may include information indicating whether the object belongs to a living thing or not, or information indicating whether the object is a moving object or not. If the object is a moving object, the position information may move over time, and the changed position information or the amount of change is transmitted to the rendering units A0203 and A0213.
- Information about the sound source object includes, in addition to the information commonly given to the sound source object and non-sound generating object described above, object sound data and information necessary for radiating the object sound data into the sound space.
- the object sound data is data expressing the sound perceived by the listener, including information about the frequency and strength of the sound.
- the object sound data is typically a PCM signal, but may also be data compressed using an encoding method such as MP3. In that case, the signal needs to be decoded at least before it reaches the generation unit (generation unit 907 described later in FIG. 19), so the rendering units A0203 and A0213 may include a decoding unit (not shown). Alternatively, the signal may be decoded by the audio data decoder A0202.
- At least one object sound data may be set for one sound source object, and multiple object sound data may be set.
- identification information for identifying each object sound data may be assigned, and the identification information for the object sound data may be stored as metadata as information relating to the sound source object.
- Information necessary for emitting object sound data into a sound space may include, for example, information on the reference volume that serves as a reference when playing back the object sound data, information on the position of the sound source object, information on the orientation of the sound source object, and information on the directionality of the sound emitted by the sound source object.
- The reference volume information may be, for example, the effective value of the amplitude of the object sound data at the sound source position when the object sound data is emitted into the sound space, and may be expressed as a floating-point decibel (dB) value.
- For example, the reference volume information may indicate that sound is emitted into the sound space from the position indicated by the position information at the same volume, i.e., without increasing or decreasing the signal level indicated by the object sound data.
- If the reference volume information is -6 dB, it may indicate that sound is emitted into the sound space from the position indicated by the position information with the signal level indicated by the object sound data reduced to about half.
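- As a rough illustration of the relationship between a decibel-valued reference volume and a linear amplitude gain, the following is a minimal sketch (the function name is our own, not part of this disclosure); it shows that 0 dB leaves the level unchanged and -6 dB roughly halves the amplitude.

```python
import math

def db_to_amplitude(db: float) -> float:
    """Convert a decibel value to a linear amplitude gain (20*log10 convention)."""
    return 10.0 ** (db / 20.0)

print(db_to_amplitude(0.0))   # 1.0   -> signal level unchanged
print(db_to_amplitude(-6.0))  # ~0.501 -> amplitude reduced to about half
```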
- the reference volume information may be assigned to one object sound data or to multiple object sound data collectively.
- The volume information included in the information necessary to radiate object sound data into a sound space may include, for example, information indicating time-series fluctuations in the volume of the sound source. For example, if the sound space is a virtual conference room and the sound source is a speaker, the volume transitions intermittently over short periods; put more simply, sound and silence alternate. If the sound space is a concert hall and the sound source is a performer, the volume is maintained for a certain period of time. If the sound space is a battlefield and the sound source is an explosion, the volume of the explosion sound rises for a moment and silence follows. In this way, the volume information of the sound source includes not only the sound volume itself but also information on how the volume transitions, and such information may be used as information indicating the nature of the object sound data.
- the loudness transition information may be data showing frequency characteristics in a time series.
- the loudness transition information may be data showing the duration of a section where sound is present.
- the loudness transition information may be data showing a time series of the duration of a section where sound is present and the duration of a section where sound is absent.
- the loudness transition information may be data listing multiple sets of durations during which the amplitude of a sound signal can be considered to be stationary (approximately constant) and data on the amplitude value of the signal during that time in a time series.
- the loudness transition information may be data listing multiple sets of durations during which the frequency characteristics of a sound signal can be considered to be stationary.
- the loudness transition information may be data listing multiple sets of durations during which the frequency characteristics of a sound signal can be considered to be stationary and data on the frequency characteristics during that time in a time series.
- the loudness transition information may be data showing the outline of a spectrogram, for example, as a data format.
- the volume that serves as a reference for the frequency characteristics may be the reference volume.
- The reference volume information and the information indicating the properties of the object sound data may be used to calculate the volume of the direct sound or reflected sound to be perceived by the listener, as well as in a selection process for deciding whether or not that sound is to be made perceivable.
- Orientation information is typically expressed in yaw, pitch, and roll.
- the roll rotation may be omitted and it may be expressed in azimuth (yaw) and elevation (pitch).
- Orientation information may change over time, and if it does, it is transmitted to rendering units A0203 and A0213.
- the information about the listener is information about the listener's position and orientation in sound space.
- the position information is expressed as positions on the X-, Y-, and Z-axes in Euclidean space, but it does not necessarily have to be three-dimensional information and may be two-dimensional information.
- Orientation information is typically expressed in yaw, pitch, and roll. Alternatively, the orientation information may be expressed in azimuth (yaw) and elevation (pitch) without the roll rotation.
- the position information and orientation information may change over time, and if they do change, they are transmitted to the rendering units A0203 and A0213.
- the sensor information includes the amount of rotation or displacement detected by a sensor worn by the listener and the position and orientation of the listener.
- the sensor information is transmitted to the rendering units A0203 and A0213, which update the position and orientation information of the listener based on the sensor information.
- the sensor information may be position information obtained by a mobile terminal performing self-position estimation using a GPS, a camera, or LiDAR (Laser Imaging Detection and Ranging).
- Information obtained from outside through a communication module other than the sensor may be detected as sensor information.
- Information indicating the temperature of the acoustic signal processing device 100 and information indicating the remaining battery level may be obtained from the sensor as sensor information.
- Information indicating the computational resources (CPU capacity, memory resources, PC performance) of the acoustic signal processing device 100 or the audio presentation device A0002 may be obtained in real time as sensor information.
- the acquisition unit 110 acquires the object information from the storage unit 140, but is not limited to this, and may acquire the object information from a device other than the acoustic signal processing device 100 (for example, a server device 500 such as a cloud server).
- the acquisition unit 110 acquires the second position information from the headphones 200 (more specifically, the head sensor unit 201), but is not limited to this.
- the first change information is information that indicates a change in the object that creates wind.
- a change in the object means a change in the state of the object.
- In this embodiment, since the object is an electric fan, the following are examples of changes in the state of the object.
- One example of a change in the state of the object is when the electric fan is switched between ON and OFF (hereinafter sometimes referred to as an "ON/OFF switch").
- Another example of a change in the state of the object is when a switch that controls the fan's wind speed is switched from low to high (hereinafter sometimes referred to as a "wind speed switch").
- A further example is when a switch that controls the fan's oscillation is switched from no oscillation to oscillation (hereinafter sometimes referred to as a "wind direction switch").
- the second change information is information that indicates a change in the wind due to a change in the object.
- The second change information indicates a change in the wind speed or a change in the direction of the wind (wind direction) as the change in the wind due to the change in the object.
- the content of the information indicated by the second change information changes according to the change in the state of the object indicated by the first change information.
- If the change in the state of the object indicated by the first change information is an "ON/OFF switch", the second change information indicates, for example, that the wind speed has switched from 0 m/s to V1 m/s (V1 > 0). If the change in the state of the object indicated by the first change information is a "wind speed switch", the second change information indicates, for example, that the wind speed has switched from V2 m/s to V3 m/s (V3 > V2). If the change in the state of the object indicated by the first change information is a "wind direction switch", the second change information indicates, for example, that the wind direction has switched from a constant state to a changing state. In this way, it is preferable for the second change information to be information that depends on the first change information.
- V1, V2, and V3, which indicate wind speeds, are, for example, the wind speed at the position where the object (the electric fan) is placed.
- the timing information is information that indicates a predetermined timing regarding a change in an object.
- the acoustic signal processing device 100 outputs aerodynamic sound data to the headphones 200 a predetermined time after this predetermined timing.
- the predetermined timing indicates the start of the predetermined time for outputting the aerodynamic sound data.
- the specified timing indicated by the timing information is the timing of a change in wind, more specifically, the timing of a change in wind due to a change in an object.
- the specified timing is the timing of a change in wind speed or direction due to a change in an object.
- the specified timing is the timing when the wind speed changes.
- An example of a change in wind speed is when an electric fan, which is an object, is switched from OFF to ON.
- the wind speed changes from 0 m/s to V1 m/s
- the predetermined timing is the timing when the wind speed changes, that is, the timing when the wind speed changes from 0 m/s to V1 m/s.
- the predetermined timing is the timing when the wind speed changes and is also the timing (first timing) when sound data (object sound data) associated with the electric fan, which is an object, is output.
- the audio signal processing device 100 (more specifically, the output unit 130) according to this embodiment outputs sound data (object sound data) associated with the electric fan at the predetermined timing (first timing).
- the timing information included in the object information indicates that the predetermined timing is the timing of the change in wind and is the first timing.
- the specified timing may also be, for example, a timing specified by an administrator of the audio signal processing device 100.
- the object in the virtual space is included in the content (image) displayed on the display unit 300, and in this embodiment, it is an electric fan.
- the first position information is information that indicates where in the virtual space the electric fan is located at a given point in time. Note that in the virtual space, for example, the electric fan may be moved by the user picking it up and moving it. For this reason, the acquisition unit 110 continuously acquires the first position information. The acquisition unit 110 acquires the first position information, for example, each time the spatial information is updated by the spatial information management units A0201 and A0211.
- the sound data may be a sound signal such as PCM (Pulse Code Modulation) data, but is not limited to this and may be any information that indicates the properties of the sound.
- For example, the sound data may be the PCM data representing the sound signal itself, or it may be data consisting of information indicating that the component is a noise signal together with information indicating that the volume is X decibels.
- Alternatively, the sound data may be the PCM data representing the sound signal itself, or it may be data consisting of information indicating that the component is a noise signal together with information indicating the peaks and dips of its frequency components.
- In this disclosure, a sound signal based on sound data means the PCM data represented by that sound data.
- the aerodynamic sound data is stored in advance in the storage unit 140, as described above.
- The aerodynamic sound data is data obtained by recording the sound that occurs when wind reaches a human ear or a model that mimics a human ear.
- For example, the aerodynamic sound data is collected by using a dummy head microphone or the like as a model that mimics a human ear.
- As described above, the wind changes due to a change in the object.
- The aerodynamic sound may be the aerodynamic sound caused by the wind before the change or by the wind after the change.
- For example, the aerodynamic sound may be the aerodynamic sound caused by the wind after the change, such as the aerodynamic sound caused by the wind at the changed wind speed or by the wind in the changed wind direction.
- Shape information is information that indicates the shape of an object in virtual space.
- Shape information indicates the shape of an object, and more specifically, indicates the three-dimensional shape of the object as a rigid body.
- the shape of an object may be indicated, for example, by a sphere, rectangular prism, cube, polyhedron, cone, pyramid, cylinder, prism, or a combination of these.
- shape information may be expressed, for example, as mesh data, or as a collection of multiple faces made up of voxels, three-dimensional point clouds, or vertices with three-dimensional coordinates.
- the first change information includes object identification information for identifying the object.
- the timing information also includes object identification information
- the second change information also includes object identification information
- the first position information also includes object identification information
- the object sound data also includes object identification information
- the shape information also includes object identification information.
- When the acquisition unit 110 acquires the first change information, timing information, second change information, first position information, object sound data, and shape information separately, the object indicated by each piece of information is identified by referring to the object identification information included in that piece of information.
- In this embodiment, the object indicated by each of the first change information, timing information, second change information, first position information, object sound data, and shape information is the same electric fan.
- That is, the first change information, timing information, second change information, first position information, object sound data, and shape information acquired by the acquisition unit 110 are each identified as information relating to the electric fan by referring to the six pieces of object identification information, and are therefore linked together as information indicating the same electric fan.
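- To make the relationship between these pieces of information concrete, the following is a minimal sketch of how the object information could be grouped in code; the class and field names are our own illustration under the assumptions above and are not defined by this disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class ObjectInformation:
    object_id: str                         # object identification information shared by all fields
    first_change: str                      # change in the object causing the wind (e.g., "ON/OFF switch")
    timing_s: float                        # predetermined timing regarding the change, in seconds
    second_change: Tuple[float, float]     # change in the wind, e.g., wind speed before/after in m/s
    position: Tuple[float, float, float]   # first position information (x, y, z) in the virtual space
    object_sound: Optional[bytes] = None   # object sound data (e.g., PCM of a motor sound)
    shape: Optional[object] = None         # shape information (e.g., mesh data)

# Hypothetical electric-fan object whose wind speed switches from 0 m/s to V1 = 3 m/s
fan = ObjectInformation(
    object_id="fan-01",
    first_change="ON/OFF switch",
    timing_s=12.0,
    second_change=(0.0, 3.0),
    position=(1.0, 0.0, 2.0),
)
```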
- the listener may move in the virtual space.
- the second position information is information indicating where in the virtual space the listener is located at a given point in time. Since the listener can move in the virtual space, the acquisition unit 110 continuously acquires the second position information. The acquisition unit 110 acquires the second position information, for example, each time the spatial information is updated by the spatial information management units A0201 and A0211.
- the above-mentioned first change information, timing information, second change information, first position information, shape information, object sound data, second position information, and aerodynamic sound data may be included in the metadata, control information, or header information included in the input signal.
- When the sound data, including the object sound data and the aerodynamic sound data, is a sound signal (PCM data), information for identifying the sound signal may be included in the metadata, control information, or header information, while the sound signal itself is included elsewhere.
- the audio signal processing device 100 (more specifically, the acquisition unit 110) may acquire metadata, control information, or header information included in the input signal, and perform audio processing based on the metadata, control information, or header information.
- the audio signal processing device 100 (more specifically, the acquisition unit 110) only needs to acquire the above-mentioned first change information, timing information, second change information, first position information, shape information, object sound data, second position information, and aerodynamic sound data, and the acquisition source is not limited to the input signal.
- the sound data, including the object sound data and the aerodynamic sound data, and the metadata may be stored in one input signal, or may be stored separately in multiple input signals.
- sound signals other than sound data including object sound data and aerodynamic sound data may be stored as audio content information in the input signal.
- the audio content information may be subjected to encoding processing such as MPEG-H 3D Audio (ISO/IEC 23008-3) (hereinafter referred to as MPEG-H 3D Audio).
- the technology used for the encoding processing is not limited to MPEG-H 3D Audio, and other well-known technologies may be used.
- information such as the above-mentioned first change information, timing information, second change information, first position information, shape information, object sound data, second position information, and aerodynamic sound data may be the subject of encoding processing.
- In that case, the audio signal processing device 100 acquires the sound signal and metadata contained in the encoded bitstream; that is, the audio content information is acquired and decoded.
- the audio signal processing device 100 functions as a decoder (e.g., decoders A0200 and A0210) included in a decoding device (e.g., decoding devices A0110 and A0130), and more specifically, functions as rendering units A0203 and A0213 included in the decoder.
- the term audio content information in this disclosure is to be interpreted as information including the sound signal itself, first change information, timing information, second change information, first position information, shape information, object sound data, second position information, and aerodynamic sound data, in accordance with the technical content.
- the acquisition unit 110 outputs the acquired object information and second position information to the determination unit 120 and the output unit 130.
- the determination unit 120 determines the predetermined time based on the wind indicated by the object information acquired by the acquisition unit 110. In other words, the determination unit 120 determines the predetermined time based on the wind generated by the object.
- More specifically, the determination unit 120 determines the predetermined time based on the wind speed indicated by the second change information included in the acquired object information and the distance between the position of the listener and the position of the object. If the predetermined time is t seconds, then t > 0 as one example; without being limited to this, the predetermined time may be, for example, not less than 0.1 seconds and not more than 5 seconds.
- the determination unit 120 can determine, for example, a time specified by an administrator of the acoustic signal processing device 100 as the predetermined time. Furthermore, the determination unit 120 calculates the distance as follows.
- the determination unit 120 calculates the distance between the position of the listener and the position of the object based on the first position information included in the object information acquired by the acquisition unit 110 and the acquired second position information. As described above, the acquisition unit 110 acquires the first position information and the second position information in the virtual space each time the spatial information is updated by the spatial information management units A0201 and A0211. The determination unit 120 calculates the distance between the position of the listener and the position of the object in the virtual space based on the multiple pieces of first position information and multiple pieces of second position information acquired each time the spatial information is updated.
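- The distance calculation itself reduces to the Euclidean distance between the two positions; a minimal sketch under that assumption (the function name is ours) might look like the following.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def distance(listener_pos: Vec3, object_pos: Vec3) -> float:
    """Euclidean distance between the listener position (second position
    information) and the object position (first position information)."""
    return math.sqrt(sum((l - o) ** 2 for l, o in zip(listener_pos, object_pos)))

# Recomputed each time the spatial information is updated
D = distance((0.0, 1.6, 0.0), (1.0, 1.0, 2.0))
```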
- the determination unit 120 determines the specified time and outputs it to the output unit 130.
- The output unit 130 outputs the aerodynamic sound data acquired by the acquisition unit 110 when the predetermined time determined by the determination unit 120 has elapsed from the predetermined timing indicated by the object information acquired by the acquisition unit 110.
- the output unit 130 outputs the aerodynamic sound data to the headphones 200. This enables the headphones 200 to play the aerodynamic sound indicated by the output aerodynamic sound data. In other words, the listener can hear the aerodynamic sound a predetermined time after the predetermined timing.
- The storage unit 140 is a storage device that stores the computer programs executed by the acquisition unit 110, the determination unit 120, and the output unit 130, as well as the object information and the aerodynamic sound data.
- the shape information is information used to generate an image of an object in a virtual space, and is also information that indicates the shape of the object (electric fan).
- the shape information is also information that is used to generate the content (image) that is displayed on the display unit 300.
- the acquisition unit 110 also outputs the acquired shape information to the display unit 300.
- the display unit 300 acquires the shape information output by the acquisition unit 110.
- the display unit 300 further acquires attribute information indicating attributes (such as color) other than the shape of the object (electric fan) in the virtual space.
- the display unit 300 may acquire the attribute information directly from a device other than the audio signal processing device 100 (the server device 500), or may acquire it from the audio signal processing device 100.
- the display unit 300 generates and displays content (video) based on the acquired shape information and attribute information.
- FIG. 11 is a flowchart of a first operation example of the acoustic signal processing device 100 according to this embodiment.
- Fig. 12 is a diagram showing an electric fan F, which is an object, and a listener L according to the first operation example.
- the acquisition unit 110 acquires object information (S10).
- the object information includes first change information indicating a change in the object causing the wind W, timing information indicating a predetermined timing related to the change in the object, second change information indicating a change in the wind W due to the change in the object, and first position information indicating the position of the object.
- the object information also includes object sound data indicating a motor sound, and shape information. This step S10 corresponds to the acquisition step.
- the second change information indicates a change in the wind speed of the wind W as a change in the wind W due to a change in the object.
- the specified timing indicated by the timing information is the timing of the change in the wind W, more specifically, the timing of the change in the wind W due to a change in the object.
- the acquisition unit 110 acquires second position information indicating the position of the listener L in the virtual space from the headphones 200 (S20). Furthermore, the acquisition unit 110 acquires aerodynamic sound data indicating the aerodynamic sound stored in the storage unit 140 (S30).
- the determination unit 120 determines the predetermined time based on the wind speed indicated by the second change information and the distance between the position of the listener L and the position of the object (electric fan F) (S40). This step S40 corresponds to the determination step.
- the output unit 130 outputs sound data (object sound data) associated with the electric fan F at a predetermined timing (S50). Then, the output unit 130 outputs aerodynamic sound data indicating aerodynamic sound caused by the wind W a predetermined time after the predetermined timing (S60). This step S60 corresponds to an output step.
- the specified timing is the timing of a change in the wind W, that is, the timing when the wind speed changes due to a change in the object.
- the specified timing is the timing when the electric fan F is switched from OFF to ON.
- In other words, the determination unit 120 may determine, as the predetermined time, the time from the predetermined timing until the wind W generated by the electric fan F reaches the listener L.
- FIG. 13A is a diagram explaining the process for determining the predetermined time in step S40 shown in FIG. 11.
- the distance between the position of the listener L and the position of the object (electric fan F) is defined as D. More specifically, the distance between the position of the listener L's ear and the position of the object (electric fan F) is defined as D. Note that the distance D is calculated by the determination unit 120 based on the first position information included in the object information acquired by the acquisition unit 110 and the acquired second position information.
- Assume that the wind speed V at a distance x from the electric fan F follows a relationship of the form V = S0 × (U/x), where S0 and U are constants determined by the fan (for example, S0 corresponding to the wind speed at the fan and U to a reference distance). The average wind speed from the fan to the position at distance D is then obtained from this relationship.
- The time (predetermined time) t from when the electric fan F is switched from OFF to ON (i.e., the predetermined timing) until the wind W generated by the electric fan F reaches the listener L is the distance divided by the average wind speed, that is, t = D / Vavg, where Vavg is the average wind speed over the distance D.
- In step S60, the aerodynamic sound data is output when the predetermined time t has elapsed from the predetermined timing.
- the listener L can hear the aerodynamic sound output from the headphones 200 at the time when the wind W generated by the fan F reaches the listener L (predetermined time t) after the electric fan F is switched from OFF to ON (i.e., the predetermined timing). Therefore, the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
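- A minimal sketch of this determination under the simplifying assumption that the wind travels toward the listener at a constant average speed (the decay model above could equally be substituted; the function name and the example numbers are ours) is as follows.

```python
def predetermined_time(distance_m: float, avg_wind_speed_ms: float) -> float:
    """Time t until the wind reaches the listener: distance divided by the
    average wind speed over that distance."""
    if avg_wind_speed_ms <= 0.0:
        raise ValueError("the wind must be blowing toward the listener")
    return distance_m / avg_wind_speed_ms

# Example: fan 2.5 m from the listener, average wind speed 1.0 m/s
t = predetermined_time(2.5, 1.0)  # -> aerodynamic sound data is output 2.5 s after the fan is switched ON
```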
- the specified timing is the timing when the electric fan F is switched from OFF to ON, and is the first timing when the object sound data associated with the object, the electric fan F, is output.
- The above operation can also be interpreted as follows: the aerodynamic sound indicated by the aerodynamic sound data is output so that, over the span from the predetermined timing until the predetermined time t has elapsed, it becomes a sound with an amplitude that the listener L can perceive. This is realized, for example, by a filter that uses the predetermined time t as a time constant when outputting the aerodynamic sound data. Specifically, this may be done as follows.
- FIG. 13B is a diagram illustrating a detailed example of the output of aerodynamic sound data according to this embodiment.
- FIG. 13C is a diagram illustrating another detailed example of the output of aerodynamic sound data according to this embodiment.
- (a) of FIG. 13B shows a trigger signal indicating the ON/OFF change of the electric fan F; the value of the trigger signal is "0" when the electric fan F is OFF and "1" when the electric fan F is ON.
- (b) of FIG. 13B shows the above-mentioned trigger signal after applying a low-pass filter whose time constant is the predetermined time t.
- (c) of FIG. 13B is a diagram showing aerodynamic sound data whose amplitude has been amplified according to the magnitude of the output signal of the low-pass filter.
- Note that t does not necessarily have to be a value calculated precisely based on the above formula; it may instead be a value simply approximated so that t increases as the distance D increases.
- (a) of FIG. 13C is a diagram showing a trigger signal indicating the ON/OFF change of the electric fan F.
- (b) of FIG. 13C is a diagram showing the above-mentioned trigger signal processed with a time constant t that is smaller than the time constant t in (b) of FIG. 13B.
- (c) of FIG. 13C is a diagram showing aerodynamic sound data controlled according to the value of the trigger signal processed with the time constant t shown in (b) of FIG. 13C.
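- The envelope shaping described for FIG. 13B and FIG. 13C can be sketched as follows: a first-order low-pass filter with time constant t is applied to the ON/OFF trigger, and the result scales the amplitude of the aerodynamic sound samples. This is a simplified model under our own assumptions (sampling rate, noise placeholder, helper names), not the exact filter of the embodiment.

```python
import numpy as np

def smoothed_trigger(trigger: np.ndarray, t_const: float, fs: float) -> np.ndarray:
    """First-order low-pass filter (time constant t_const seconds) applied to a 0/1 trigger."""
    alpha = 1.0 - np.exp(-1.0 / (t_const * fs))
    out = np.zeros_like(trigger, dtype=float)
    y = 0.0
    for i, x in enumerate(trigger):
        y += alpha * (x - y)   # one-pole smoothing toward the trigger value
        out[i] = y
    return out

fs = 48000
trigger = np.concatenate([np.zeros(fs), np.ones(3 * fs)])   # fan switched ON after 1 s
envelope = smoothed_trigger(trigger, t_const=1.5, fs=fs)     # t = 1.5 s as an example value
aerodynamic = np.random.randn(len(trigger)) * 0.1            # placeholder aerodynamic sound data
output = aerodynamic * envelope                               # amplitude rises over roughly t seconds
```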
- As described above, the predetermined timing is the timing when the electric fan F is switched from OFF to ON, and is also the first timing at which the object sound data associated with the electric fan F is output.
- By the processing of step S50, the listener L can hear the motor sound of the electric fan F output from the headphones 200 at the timing when the electric fan F is switched from OFF to ON. Furthermore, by the processing of step S60, the listener L can hear the aerodynamic sound output from the headphones 200 at the timing when, after the listener L has heard the motor sound, the wind W caused by switching the electric fan F from OFF to ON reaches the listener L.
- the motor sound reaches the listener L at the speed of sound and is heard by the listener L, and the aerodynamic sound is heard by the listener L when the wind W reaches the listener L.
- the speed of sound is generally faster than the wind speed, and in this operation example, as in real space, the listener L hears the motor sound first and then the aerodynamic sound. Therefore, the listener L can hear the motor sound (sound represented by the sound data associated with the object) and the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can experience a sense of realism.
- the predetermined timing is the timing when the wind speed changes and the timing (first timing) when sound data (object sound data) associated with the object, electric fan F, is output, but this is not limited to the above.
- the object information may indicate a change in the direction of the wind W due to a change in the object (electric fan F). More specifically, the object information may indicate a change in the direction (wind direction) of the wind W as a change in the wind W due to a change in the object (electric fan F). This is the case, for example, when the change in the state of the object indicated by the first change information is a "wind direction change" and the second change information indicates that the wind direction has changed from a constant state to a changing state.
- the timing information included in the object information indicates that the specified timing is the third timing at which a change in the direction of the wind W (wind direction) occurs.
- the output unit 130 may output aerodynamic sound data indicating the aerodynamic sound caused by the wind W a predetermined time after the third timing (predetermined timing) indicated by the object information.
- the specified timing and the specified time are not limited to those shown in Operation Example 1.
- the specified timing may be a timing (specified timing) specified by a user (e.g., an administrator of the acoustic signal processing device 100), and the specified time may be a time (specified time) specified by the administrator.
- the determination unit 120 may determine the timing and time specified by the user as the specified timing and the specified time.
- the acoustic signal processing device 100 may include a reception unit, which receives the timing and time specified by the user, and the determination unit 120 may determine the timing and time received by the reception unit as the specified timing and the specified time.
- the administrator specifies the specified timing and time so that the listener L can hear the aerodynamic sound at the same timing as in real space.
- the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
- the aerodynamic sound data is stored in advance in the storage unit 140, but this is not limited to the above.
- the determination unit 120 may generate the aerodynamic sound data.
- the determination unit 120 may generate the aerodynamic sound data by acquiring a noise signal and processing the acquired noise signal with each of a plurality of band emphasis filters.
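- A minimal sketch of that idea is shown below: a noise signal is generated and a few frequency bands are emphasized with simple band-pass filters. The band edges and filter order are arbitrary illustrative values of ours, not values specified by this disclosure.

```python
import numpy as np
from scipy.signal import butter, lfilter
from typing import List, Tuple

def band_emphasized_noise(duration_s: float, fs: int,
                          bands_hz: List[Tuple[float, float]]) -> np.ndarray:
    """Generate aerodynamic-sound-like data by summing band-pass filtered noise."""
    noise = np.random.randn(int(duration_s * fs))
    out = np.zeros_like(noise)
    for low, high in bands_hz:
        b, a = butter(2, [low / (fs / 2), high / (fs / 2)], btype="band")
        out += lfilter(b, a, noise)   # emphasize this band of the noise signal
    return out / np.max(np.abs(out))  # normalize the result

aero = band_emphasized_noise(2.0, 48000, [(200, 600), (1000, 2000)])
```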
- the determination unit 120 determines the predetermined time based on the wind speed indicated by the second change information and the distance between the position of the listener L and the position of the object (electric fan F), but this is not limited to this.
- the object information may include first position information indicating the position of the object, and the determination unit 120 may determine the predetermined time based on the distance between the position of the listener L of the aerodynamic sound and the position of the object indicated by the first position information included in the acquired object information.
- For example, a predetermined time corresponding to a reference distance is set in advance, and the predetermined time is determined to be longer as the distance between the position of the listener L of the aerodynamic sound and the position of the object becomes longer than the reference distance, and to be shorter as that distance becomes shorter than the reference distance.
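- One way such behavior could be realized is a simple proportional rule relative to the reference distance; the following sketch uses hypothetical names and example values of our own.

```python
def predetermined_time_from_distance(distance_m: float,
                                     reference_distance_m: float = 2.0,
                                     reference_time_s: float = 2.0) -> float:
    """Scale the predetermined time in proportion to the listener-object distance:
    longer when the distance exceeds the reference distance, shorter when below it."""
    return reference_time_s * (distance_m / reference_distance_m)

print(predetermined_time_from_distance(4.0))  # 4.0 s (farther than the reference -> longer)
print(predetermined_time_from_distance(1.0))  # 1.0 s (closer than the reference -> shorter)
```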
- Modifications of the embodiment will be described below, focusing on the differences from the embodiment; explanations of common features will be omitted or simplified.
- In this modified example, the acoustic signal processing device 100 described above is used, but the object in the virtual space is different.
- the object according to this modified example is a vehicle, which is a moving body. More specifically, the object is an ambulance.
- the aerodynamic sound is a sound that is generated when the wind W, which is generated by the movement of the object's position, reaches the listener L.
- the object, the ambulance is an object that generates sound, and generates a siren sound.
- the object information in this modified example is information indicating the change in the object causing the wind W, the specified timing of the change in the object, the change in the wind W due to the change in the object, and the position of the object.
- the object information is treated as information including first change information indicating the change in the object causing the wind W, timing information indicating the specified timing of the change in the object, second change information indicating the change in the wind W due to the change in the object, and first position information indicating the position of the object.
- the first change information is information that indicates a change in the object that is causing the wind W, and in this modified example, the change in the object means a change in the position of the object.
- the first location information is information that indicates the location within the virtual space of the ambulance at a given point in time. Note that in the virtual space, the ambulance may travel and its location may change, for example, when operated by a driver. For this reason, the acquisition unit 110 continuously acquires the first location information.
- the second change information is information that indicates a change in the wind W due to a change in the object.
- the content of the information indicated by the second change information changes according to the change in the position of the object indicated by the first change information.
- the second change information indicates that the wind speed of the wind W generated by the movement of the object has changed from a first predetermined value to a second predetermined value, or that the wind direction has changed from a first predetermined direction to a second predetermined direction.
- first and second predetermined values are, for example, the wind speed at the position where the ambulance is located
- first and second predetermined directions are, for example, the wind direction at the position where the ambulance is located.
- the first change information indicates that an ambulance approaches the listener L and then moves away from the listener L.
- the wind W generated by the movement of the ambulance blows strongly toward the listener L while the ambulance is approaching the listener L, and blows weakly toward the listener L while the ambulance is moving away from the listener L. Therefore, the wind speed of the wind W is a high value toward the listener L while the ambulance is approaching the listener L, and a low value toward the listener L while the ambulance is moving away from the listener L. In this way, the wind W (more specifically, the wind speed of the wind W) is changing.
- the wind speed of the wind W generated by the object is considered to be the same as the moving speed of the ambulance.
- the moving speed of the ambulance is calculated by differentiating the position of the ambulance with respect to time in the virtual space based on the first position information.
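- A minimal sketch of estimating the ambulance's moving speed by numerically differentiating successive samples of the first position information (the helper name and numbers are ours) is as follows.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]

def moving_speed(prev_pos: Vec3, curr_pos: Vec3, dt: float) -> float:
    """Approximate speed (m/s) from two consecutive position samples dt seconds apart."""
    displacement = math.sqrt(sum((c - p) ** 2 for c, p in zip(curr_pos, prev_pos)))
    return displacement / dt

# Positions from two consecutive spatial-information updates, 0.1 s apart
speed = moving_speed((0.0, 0.0, 0.0), (1.5, 0.0, 0.0), dt=0.1)  # 15 m/s; also used as the wind speed
```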
- the timing information is information indicating a predetermined timing regarding a change in an object.
- the predetermined timing indicated by the timing information is the timing of a change in the wind W, more specifically, the timing of a change in the wind W due to a change in the position of the object.
- the predetermined timing is the timing when the wind speed changes due to a change in the position of the object, and as one example, the timing when an ambulance approaches the listener L and then moves away from the listener L.
- the predetermined timing is the timing when the amount of change in the distance between the position of the listener L and the position of the object in the virtual space turns from negative to positive over time.
- this predetermined timing is the timing when the object is closest to the listener L in the virtual space.
- the predetermined timing may also be the timing when the wind direction changes due to a change in the position of the object.
- FIG. 14 is a flowchart of a second operation example of the acoustic signal processing device 100 according to this embodiment.
- Fig. 15 is a diagram showing an ambulance A and a listener L which are objects according to the second operation example.
- the acquisition unit 110 acquires object information (S10).
- the object information includes first change information indicating a change in the object causing the wind W, timing information indicating a predetermined timing related to the change in the object, second change information indicating a change in the wind W due to the change in the object, and first position information indicating the position of the object.
- the object information also includes object sound data indicating a siren sound, and shape information.
- the second change information indicates a change in the wind speed of the wind W as a change in the wind W due to a change in the object.
- the specified timing indicated by the timing information is the timing of the change in the wind W, more specifically, the timing of the change in the wind W due to a change in the object.
- the acquisition unit 110 acquires second position information indicating the position of the listener L in the virtual space from the headphones 200 (S20). Furthermore, the acquisition unit 110 acquires aerodynamic sound data indicating the aerodynamic sound stored in the storage unit 140 (S30).
- the output unit 130 determines whether or not the predetermined timing has arrived (S35). If the predetermined timing has not arrived (No in step S35), the process of step S35 is repeated.
- the determination unit 120 determines the specified time based on the wind speed indicated by the second change information and the distance between the position of the listener L and the position of the object (ambulance A) (S40).
- the output unit 130 outputs aerodynamic sound data indicating the aerodynamic sound caused by the wind W a predetermined time after the predetermined timing (S60).
- step S35 in this operation example will now be explained in more detail.
- the specified timing is the timing of a change in the wind W. More specifically, the specified timing is the timing when the wind speed changes due to a change in the object's position, and is the timing when the amount of change in the distance between the position of the listener L and the position of the object in the virtual space turns from negative to positive over time.
- FIG. 16 is a schematic diagram for explaining the specified timing according to operation example 2.
- Ambulance A moves in the order of (a), (b), and (c) shown in FIG. 16. Also, assume that the position of listener L remains constant while ambulance A moves from (a) to (c). While ambulance A moves from (a) to (b), the amount of change in the distance between the position of listener L and the position of the object in the virtual space is negative. While ambulance A moves from (b) to (c), the amount of change in the distance between the position of listener L and the position of the object in the virtual space is positive. Therefore, the timing at which the amount of change in distance turns from negative to positive is when ambulance A is in position (b) shown in FIG. 16.
- In step S35, the process shown in FIG. 17 is performed.
- FIG. 17 is a flowchart explaining the details of step S35 in operation example 2.
- the determination unit 120 judges whether or not the timing (predetermined timing) has come when the amount of change in the distance between the position of the listener L and the position of the object (ambulance A) in the virtual space has turned from negative to positive (S35a).
- the determination unit 120 calculates the amount of change in the distance by calculating the distance between the position of the listener L and the position of the object (ambulance A) and differentiating the calculated distance. If the answer is Yes in step S35a, the processing of step S40 is performed, and if the answer is No in step S35a, the processing of step S35 is repeated.
- The listener L hears the aerodynamic sound when, after the amount of change in the distance between the listener L's position and the object's position turns from negative to positive, the time it takes for the wind W created by ambulance A to reach the listener L has passed.
- The timing when the amount of change in the distance turns from negative to positive is the timing when the object is closest to the listener L, and corresponds to the predetermined timing. Therefore, the determination unit 120 may determine, as the predetermined time, the time from the predetermined timing until the wind W created by ambulance A reaches the listener L.
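- A minimal sketch of that check is shown below: the predetermined timing is detected when the finite difference of the listener-object distance changes from negative (approaching) to positive (receding). The helper name and the sample values are our own illustration.

```python
from typing import List

def closest_approach_detected(distances: List[float]) -> bool:
    """True when the change in distance has just turned from negative (approaching)
    to positive (receding), i.e., the object has passed its closest point."""
    if len(distances) < 3:
        return False
    d_prev = distances[-2] - distances[-3]   # change up to the previous update
    d_curr = distances[-1] - distances[-2]   # change at the current update
    return d_prev < 0.0 and d_curr > 0.0

history = [12.0, 9.0, 7.0, 6.5, 6.8]        # listener-ambulance distance at each spatial update
print(closest_approach_detected(history))    # True: the change turned positive after being negative
```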
- the predetermined time is determined based on the same idea as in Figure 13A described in Operation Example 1. That is, as shown in Figure 15, the distance between the position of the listener L and the position of the object (ambulance A) is set to D, and more specifically, the distance between the position of ambulance A at the position (b) shown in Figure 16 and the position of the listener L is set to D.
- As in operation example 1, assume that the wind speed V at a distance x from the object follows a relationship of the form V = S0 × (U/x), from which the average wind speed up to the position at distance D is obtained.
- The time (predetermined time) t from when the change in the distance between the position of the listener L and the position of the object turns from negative to positive (i.e., the predetermined timing) until the wind W generated by the object, ambulance A, reaches the listener L is the distance divided by the average wind speed, that is, t = D / Vavg.
- In step S60, the aerodynamic sound data is output when the predetermined time t has elapsed from the predetermined timing.
- This allows the listener L to hear the aerodynamic sound output from the headphones 200 when the predetermined time t has elapsed from the predetermined timing (the time when the change in the distance between the listener L's position and the object's position turns from negative to positive), that is, when the wind W created by ambulance A reaches the listener L. Therefore, the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
- In real space, the listener L hears the aerodynamic sound after a vehicle such as an ambulance comes closest to the listener L. For this reason, if the listener L were to hear the aerodynamic sound before ambulance A comes closest to the listener L in the virtual space, the listener L would feel uncomfortable.
- the timing when the amount of change in the distance between the position of the listener L and the position of the object turns from negative to positive is set as the specified timing.
- ambulance A is an object that generates sound, and generates a siren sound.
- the output unit 130 may output an object sound signal indicating a siren sound so that listener L hears a siren sound accompanied by the Doppler effect.
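- As a rough illustration of how the Doppler effect on the siren could be approximated for a source moving radially with respect to a stationary listener (a simplified textbook formula with illustrative values, not necessarily the method used by the embodiment), consider the following.

```python
SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees Celsius

def doppler_frequency(source_freq_hz: float, radial_speed_ms: float) -> float:
    """Observed frequency for a stationary listener; radial_speed_ms is positive
    while the source approaches and negative while it recedes."""
    return source_freq_hz * SPEED_OF_SOUND / (SPEED_OF_SOUND - radial_speed_ms)

print(doppler_frequency(960.0, 15.0))    # higher pitch while the ambulance approaches
print(doppler_frequency(960.0, -15.0))   # lower pitch while it moves away
```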
- the predetermined timing was the timing when the change in the distance between the position of the listener L and the position of the object turned from negative to positive, but this is not limited to this.
- the predetermined timing may be the timing (second timing) when the distance between the position of the listener L and the position of the object becomes shorter than the predetermined distance.
- the predetermined distance is, for example, several meters to several tens of meters, and is a distance that indicates that the distance between the position of the listener L and the position of the object has become sufficiently close.
- the predetermined distance may be, for example, a value specified by an administrator of the acoustic signal processing device 100.
- In this case, in step S35, the process shown in FIG. 18 is performed.
- FIG. 18 is a flowchart explaining the details of step S35 according to another first example of operation example 2.
- In step S35, the determination unit 120 judges whether or not the timing (second timing) has come at which the distance between the position of the listener L and the position of the object (ambulance A) in the virtual space becomes shorter than the predetermined distance (S35b). As described above, if the answer is Yes in step S35b, the processing of step S40 is performed, and if the answer is No in step S35b, the processing of step S35 is repeated.
- This allows the listener L to hear the aerodynamic sound output from the headphones 200 when the predetermined time has passed from the second timing, at which the distance between the position of the listener L and the position of the object (ambulance A) becomes sufficiently close, that is, when the wind W generated by ambulance A reaches the listener L.
- In another second example of operation example 2, in step S35, the processes of both steps S35a and S35b shown in FIG. 17 and FIG. 18 are performed. If the answer is Yes in both steps S35a and S35b, the processing of step S40 is performed, and if the answer is No in at least one of steps S35a and S35b, the processing of step S35 is repeated. The process shown in this other second example of operation example 2 may also be adopted.
- FIG. 19 is a functional block diagram and a diagram showing an example of steps for explaining a case where the rendering units A0203 and A0213 in FIG. 6 and FIG. 7 perform pipeline processing.
- a rendering unit 900 which is an example of the rendering units A0203 and A0213 in FIG. 6 and FIG. 7, is used for explanation.
- Pipeline processing refers to dividing the process for creating sound effects into multiple processes and executing each process one by one in sequence. Each of the divided processes performs, for example, signal processing on the audio signal, or the generation of parameters to be used in the signal processing.
- the rendering unit 900 in this embodiment includes, as pipeline processing, processing for applying, for example, a reverberation effect, early reflection processing, a distance attenuation effect, binaural processing, and the like.
- the above processing is only an example, and other processing may be included, or some processing may not be included.
- the rendering unit 900 may include diffraction processing or occlusion processing as pipeline processing, or may omit reverberation processing, for example, if it is not necessary.
- each processing may be expressed as a stage, and an audio signal such as a reflected sound generated as a result of each processing may be expressed as a rendering item.
- the order of each stage in the pipeline processing and the stages included in the pipeline processing are not limited to the example shown in FIG. 19.
- the rendering unit 900 does not need to include all of the stages shown in FIG. 19; some stages may be omitted, and stages other than those shown may be included in addition.
- each process analyzes the metadata contained in the input signal and calculates the parameters required to generate reflected sounds.
- the rendering unit 900 includes a reverberation processing unit 901, an early reflection processing unit 902, a distance attenuation processing unit 903, a selection unit 904, a calculation unit 906, a generation unit 907, and a binaural processing unit 905.
- the reverberation processing unit 901 performs a reverberation processing step
- the early reflection processing unit 902 performs an early reflection processing step
- the distance attenuation processing unit 903 performs a distance attenuation processing step
- the selection unit 904 performs a selection processing step
- the binaural processing unit 905 performs a binaural processing step.
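- As a rough sketch of how such pipeline processing might be organized, the following Python skeleton runs a list of stages in sequence, each stage reading and updating a shared set of rendering items. The interface, field names, and stage ordering here are assumptions for illustration only.

```python
class RenderingPipeline:
    """Minimal pipeline sketch: each stage transforms the audio signal or adds
    parameters (rendering items) that later stages can use."""

    def __init__(self, stages):
        # e.g. [reverb, early_reflection, distance_attenuation, selection, binaural]
        self.stages = stages

    def render(self, audio_signal, metadata):
        items = {"signal": audio_signal, "metadata": metadata, "params": {}}
        for stage in self.stages:
            items = stage(items)  # stages run one by one in sequence
        return items["signal"]
```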
- in the reverberation processing step, the reverberation processing unit 901 generates an audio signal indicating reverberation, or the parameters required for generating such an audio signal.
- reverberation is sound that reaches the listener as lingering reflections after the direct sound.
- more specifically, reverberation reaches the listener at a relatively late stage (for example, about one hundred and several tens of milliseconds after the direct sound arrives), after the early reflection sound described below reaches the listener, and after being reflected more times (for example, several tens of times) than the early reflection sound.
- the reverberation processing unit 901 refers to the audio signal and the spatial information contained in the input signal, and performs calculations using a predetermined function prepared in advance to generate reverberation.
- the reverberation processing unit 901 may generate reverberation by applying a known reverberation generation method to the audio signal.
- an example of a known reverberation generation method is the Schroeder method, but the method is not limited to this.
- when doing so, the reverberation processing unit 901 uses the shape and acoustic characteristics of the sound reproduction space indicated by the spatial information. This allows the reverberation processing unit 901 to calculate parameters for generating an audio signal indicating reverberation.
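- As a concrete but purely illustrative example of the Schroeder method mentioned above, the following sketch builds a toy reverberator from parallel comb filters followed by series allpass filters. The delay lengths and gains are assumed values chosen for readability, not values from the specification, and the loops are written for clarity rather than speed.

```python
import numpy as np

def comb(x, delay, g):
    """Feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass filter: y[n] = -g*x[n] + x[n-delay] + g*y[n-delay]."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x, fs=48000):
    """Toy Schroeder reverberator: four parallel combs, then two allpasses."""
    combs = [(int(fs * t), g) for t, g in [(0.0297, 0.805), (0.0371, 0.827),
                                           (0.0411, 0.783), (0.0437, 0.764)]]
    wet = sum(comb(x, d, g) for d, g in combs) / len(combs)
    for d, g in [(int(fs * 0.005), 0.7), (int(fs * 0.0017), 0.7)]:
        wet = allpass(wet, d, g)
    return wet
```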
- the early reflection processing unit 902 calculates parameters for generating an early reflection sound based on the spatial information.
- the early reflection sound is a reflection sound that reaches the listener after one or more reflections at a relatively early stage (for example, about several tens of milliseconds after the direct sound arrives) after the direct sound from the sound source object reaches the listener.
- the early reflection processing unit 902 refers to the sound signal and the metadata, for example, and calculates the path (path length) of the reflected sound that travels from the sound source object, reflects off an object, and reaches the listener, using the shapes, sizes, and positions of objects such as structures in the three-dimensional sound field (space), and the reflectance of those objects.
- the early reflection processing unit 902 may also calculate the path (path length) of the direct sound. Information indicating the path may be used as a parameter for generating the early reflection sound, and may also be used as a parameter for the selection process of the reflection sound in the selection unit 904.
- the distance attenuation processing unit 903 calculates the volume of the sound reaching the listener based on the path length of the direct sound and the path length of the reflected sound calculated by the early reflection processing unit 902. Since the volume of the sound reaching the listener attenuates in inverse proportion to the distance from the sound source relative to the volume at the sound source, the volume of the direct sound can be obtained by dividing the volume of the sound source by the length of the path of the direct sound, and the volume of the reflected sound can be obtained by dividing the volume of the sound source by the length of the path of the reflected sound.
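- To make the path and 1/r attenuation calculations above concrete, here is a minimal sketch using the image-source idea for a single one-bounce reflection off one axis-aligned wall. The geometry, the wall representation, and the numerical example are assumptions for illustration.

```python
import math

def mirror_point(source, wall_x):
    """Mirror the sound source across a wall lying in the plane x = wall_x."""
    sx, sy, sz = source
    return (2 * wall_x - sx, sy, sz)

def reflection_path_length(source, listener, wall_x):
    """Length of the one-bounce path source -> wall -> listener (image-source method)."""
    return math.dist(mirror_point(source, wall_x), listener)

def attenuated_gain(source_gain, path_length):
    """Distance attenuation as described above: gain at the listener is the source
    gain divided by the path length (1/r law)."""
    return source_gain / max(path_length, 1e-6)

# Hypothetical geometry: listener 4 m from the source, wall at x = 6 m
source, listener, wall_x = (0.0, 0.0, 1.5), (4.0, 0.0, 1.5), 6.0
direct = attenuated_gain(1.0, math.dist(source, listener))
reflected = attenuated_gain(1.0, reflection_path_length(source, listener, wall_x))
```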
- the selection unit 904 selects the sound to be generated.
- the selection process may be performed based on parameters calculated in the previous step.
- sounds not selected in the selection process do not need to be subjected to processing subsequent to the selection process in the pipeline processing.
- by not executing any of the processing subsequent to the selection process for sounds that were not selected, it is possible to reduce the computational load on the acoustic signal processing device 100 compared to the case where only the binaural processing is skipped for sounds that were not selected.
- if the selection process is placed at an earlier position among the multiple processes in the pipeline processing, more of the processing after the selection process can be omitted, and the amount of calculation can be reduced even further.
- if the selection process is executed before the processing of the calculation unit 906 and the generation unit 907, the processing of aerodynamic sounds related to objects that were not selected can be omitted, and the amount of calculation in the acoustic signal processing device 100 can be reduced even further.
- parameters calculated as part of the pipeline process that generates the rendering items may be used by the selection unit 904 or the calculation unit 906.
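- A minimal sketch of such a selection step is shown below. Ranking by the gain produced in the distance attenuation step and the limit of eight items are assumptions for illustration; the point is only that items that are not selected skip every later stage.

```python
def select_rendering_items(items, max_items=8):
    """Keep only the most significant rendering items so that later stages
    (binaural processing, aerodynamic-sound generation, etc.) can be skipped
    for everything that was not selected."""
    ranked = sorted(items, key=lambda it: it["gain"], reverse=True)
    return ranked[:max_items]
```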
- the binaural processing unit 905 performs signal processing on the audio signal of the direct sound so that the sound is perceived as reaching the listener from the direction of the sound source object. Furthermore, the binaural processing unit 905 performs signal processing so that the reflected sound is perceived as reaching the listener from the obstacle object involved in the reflection. Based on the coordinates and orientation of the listener in the sound space (i.e., the position and orientation of the listening point), processing is performed that applies an HRIR (Head-Related Impulse Response) database so that the sound is perceived as reaching the listener from the position of the sound source object or the position of the obstacle object. Note that the position and orientation of the listening point may be changed in accordance with, for example, the movement of the listener's head. Also, information indicating the position of the listener may be obtained from a sensor.
- HRIR is a response characteristic that is converted from an expression in the frequency domain to an expression in the time domain by Fourier transforming the head-related transfer function, which represents the changes in sound caused by surrounding objects including the auricle, the human head, and shoulders as a transfer function.
- the HRIR DB is a database that contains such information.
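- As a rough illustration of the binaural processing step, the following sketch picks the HRIR pair closest to the direction of arrival and convolves it with a mono signal. The database layout (a dictionary keyed by a 5-degree azimuth/elevation grid) and the nearest-neighbour lookup are assumptions for illustration.

```python
import numpy as np

def binauralize(mono, hrir_db, azimuth_deg, elevation_deg):
    """Convolve a mono signal with the left/right HRIRs for the direction from the
    listening point to the sound source object or the reflecting obstacle object."""
    key = (round(azimuth_deg / 5) * 5 % 360, round(elevation_deg / 5) * 5)
    hrir_left, hrir_right = hrir_db[key]
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])  # 2 x N stereo output
```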
- the rendering unit 900 may include processing units not shown.
- it may include a diffraction processing unit or an occlusion processing unit.
- the diffraction processing unit executes a process to generate an audio signal that indicates a sound that includes diffracted sound caused by an obstacle between the listener and the sound source object in a three-dimensional sound field (space).
- diffracted sound is sound that travels from the sound source object to the listener by going around the obstacle.
- the diffraction processing unit refers to the sound signal and metadata, and uses the position of the sound source object in the three-dimensional sound field (space), the position of the listener, and the positions, shapes, and sizes of obstacles to calculate a path from the sound source object to the listener, bypassing obstacles, and generates diffracted sound based on that path.
- the occlusion processing unit generates an audio signal that can be heard when a sound source object is located behind an obstacle object, based on the spatial information acquired in any of the steps and information such as the material of the obstacle object.
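- As an illustration of the detour path used for diffracted sound, the following sketch assumes a pre-computed obstacle edge point and returns the detour length and the extra delay relative to the blocked direct path; the edge point and the 343 m/s speed of sound are assumptions.

```python
import math

def diffraction_detour(source, listener, edge_point):
    """Path source -> obstacle edge -> listener when the direct line is blocked."""
    direct = math.dist(source, listener)
    detour = math.dist(source, edge_point) + math.dist(edge_point, listener)
    extra_delay_s = (detour - direct) / 343.0  # assumed speed of sound in air
    return detour, extra_delay_s
```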
- the position information given to the sound source object is defined as a "point" in the virtual space, and the details of the invention have been described assuming that the sound source is a so-called "point sound source".
- a spatially extended sound source that is not a point sound source may be defined as an object having a length, size, or shape. In such a case, since the distance between the listener and the sound source, or the direction from which the sound arrives, is not uniquely determined, the resulting reflected sound may simply be "selected" by the selection unit 904 without analysis, or regardless of the analysis result.
- a representative point such as the center of gravity of the object may be determined, and the processing of the present disclosure may be applied assuming that the sound is generated from that representative point.
- the threshold value may be adjusted according to the information on the spatial extension of the sound source before applying the processing of the present disclosure.
- the bitstream includes, for example, an audio signal and metadata.
- the audio signal is sound data that represents sound, and indicates information about the frequency and intensity of the sound.
- the spatial information included in the metadata is information about the space in which a listener who hears a sound based on the audio signal is located. Specifically, the spatial information is information about a specific position (localization position) when the sound image of the sound is localized at a specific position in a sound space (for example, in a three-dimensional sound field), that is, when the listener perceives the sound as arriving from a specific direction.
- the spatial information includes, for example, sound source object information and position information indicating the position of the listener.
- Sound source object information is information about an object that generates sound based on an audio signal, that is, that reproduces an audio signal, and is information about a virtual object (sound source object) that is placed in a sound space, which is a virtual space that corresponds to the real space in which the object is placed.
- Sound source object information includes, for example, information indicating the position of the sound source object placed in the sound space, information about the orientation of the sound source object, information about the directionality of the sound emitted by the sound source object, information indicating whether the sound source object belongs to a living thing, and information indicating whether the sound source object is a moving object.
- an audio signal corresponds to one or more sound source objects indicated by the sound source object information.
- the bitstream is composed of metadata (control information) and an audio signal.
- the audio signal and metadata may be stored in a single bitstream or may be stored separately in multiple bitstreams. Similarly, the audio signal and metadata may be stored in a single file or may be stored separately in multiple files.
- a bitstream may exist for each sound source, or for each playback time. If a bitstream exists for each playback time, multiple bitstreams may be processed in parallel at the same time.
- Metadata may be added to each bitstream, or may be added together as information for controlling multiple bitstreams. Metadata may also be added for each playback time.
- information indicating other related bitstreams or files may be included in one or some of the bitstreams or files containing the audio signal and metadata, or may be included in each of all of those bitstreams or files.
- the related bitstreams or files are, for example, bitstreams or files that may be used simultaneously during audio processing.
- the related bitstreams or files may include a bitstream or file that collectively describes information indicating other related bitstreams or files.
- the information indicating other related bitstreams or files is, for example, an identifier indicating the other bitstream, a file name indicating the other file, a URL (Uniform Resource Locator) or a URI (Uniform Resource Identifier), etc.
- the acquisition unit 110 identifies or acquires the bitstream or file based on the information indicating the other related bitstreams or files.
- the bitstream may contain information indicating other related bitstreams, and may contain information indicating a bitstream or file related to another bitstream or file.
- a file containing information indicating a related bitstream or file may be, for example, a control file such as a manifest file used in content distribution.
- the metadata may be obtained from sources other than the bitstream of the audio signal.
- the metadata that controls the audio or the metadata that controls the video may be obtained from sources other than the bitstream, or both may be obtained from sources other than the bitstream.
- the audio signal reproduction system may have a function of outputting metadata that can be used to control the video to a display device that displays images, or a 3D video reproduction device that reproduces 3D video.
- Metadata may be information used to describe a scene represented in sound space.
- a scene is a term that refers to the collection of all elements that represent three-dimensional images and acoustic events in sound space, which are modeled in an audio signal reproduction system using metadata.
- metadata here may include not only information that controls audio processing, but also information that controls video processing.
- metadata may include information that controls only audio processing or video processing, or information used to control both.
- the audio signal reproduction system generates virtual sound effects by performing acoustic processing on the audio signal using metadata included in the bitstream and additionally acquired interactive listener position information.
- examples of such acoustic processing include the distance attenuation effect, localization, and the Doppler effect.
- information for switching all or part of the acoustic effects on and off, and priority information may be added as metadata.
- the encoded metadata includes information about a sound space including a sound source object and an obstacle object, and information about a position when the sound image of the sound is localized at a specific position in the sound space (i.e., perceived as a sound arriving from a specific direction).
- an obstacle object is an object that can affect the sound perceived by the listener, for example by blocking or reflecting the sound emitted by the sound source object before it reaches the listener.
- Obstacle objects can include not only stationary objects, but also animals such as people, or moving objects such as machines.
- the other sound source objects can be obstacle objects for any sound source object.
- Non-sound-emitting objects which are objects that do not emit sound, such as building materials or inanimate objects, and sound source objects that emit sound can both be obstacle objects.
- the metadata includes all or part of the information that represents the shape of the sound space, the shape and position information of obstacle objects that exist in the sound space, the shape and position information of sound source objects that exist in the sound space, and the position and orientation of the listener in the sound space.
- the sound space may be either a closed space or an open space.
- the metadata also includes information that indicates the reflectance of structures that can reflect sound in the sound space, such as floors, walls, or ceilings, and the reflectance of obstacle objects that exist in the sound space.
- the reflectance is the ratio of the energy of the reflected sound to the energy of the incident sound, and is set for each frequency band of the sound. Of course, the reflectance may be set uniformly regardless of the frequency band of the sound.
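- A minimal sketch of applying such per-band reflectance is shown below; the octave-band labels and the numerical values are assumptions for illustration.

```python
def apply_reflectance(band_energies, reflectance_by_band):
    """Scale each frequency band of the incident sound by the reflectance set for
    that band (bands without an entry are passed through unchanged)."""
    return {band: energy * reflectance_by_band.get(band, 1.0)
            for band, energy in band_energies.items()}

# Hypothetical octave bands (Hz) and reflectance values
incident = {125: 1.0, 500: 1.0, 2000: 1.0}
reflected = apply_reflectance(incident, {125: 0.9, 500: 0.7, 2000: 0.5})
```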
- instead of reflectance, parameters such as a uniformly set attenuation rate, diffracted sound, and early reflected sound may be used.
- reflectance was mentioned as a parameter related to an obstacle object or sound source object included in the metadata, but information other than reflectance may also be included.
- information other than reflectance may include information related to the material of the object as metadata related to both sound source objects and non-sound-producing objects.
- information other than reflectance may include parameters such as diffusion rate, transmittance, and sound absorption rate.
- Information about the sound source object may include volume, radiation characteristics (directivity), playback conditions, the number and type of sound sources emitted from one object, and information specifying the sound source area in the object.
- the playback conditions may determine, for example, whether the sound is a continuous sound or an event-triggered sound.
- the sound source area in the object may be determined in a relative relationship between the listener's position and the object's position, or may be determined based on the object.
- when the sound source area in the object is determined in a relative relationship between the listener's position and the object's position, the surface of the object the listener is looking at is used as the reference, and the listener can perceive that sound C is emitted from the right side of the object and sound E is emitted from the left side of the object as seen from the listener.
- when the sound source area in the object is determined based on the object, it is possible to fix which sound is emitted from which area of the object, regardless of the direction in which the listener is looking. For example, the listener can perceive that a high-pitched sound is coming from the right side and a low-pitched sound from the left side when the object is viewed from the front. In this case, if the listener moves around to the back of the object, the listener perceives that the low-pitched sound is coming from the right side and the high-pitched sound from the left side as viewed from the back.
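- The listener-relative case can be illustrated by deciding, for a point on the object, whether it appears on the listener's right or left side of the object. The 2D coordinates and the sign convention in this sketch are assumptions.

```python
def object_side_in_view(listener_pos, object_center, area_point):
    """Return whether area_point appears on the listener's left or right side of the
    object, so that, e.g., sound C can be assigned to the right side and sound E to
    the left side as seen from the listener."""
    view = (object_center[0] - listener_pos[0], object_center[1] - listener_pos[1])
    offset = (area_point[0] - object_center[0], area_point[1] - object_center[1])
    cross = view[0] * offset[1] - view[1] * offset[0]
    return "left" if cross > 0 else "right"
```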
- Spatial metadata can include time to early reflections, reverberation time, and the ratio of direct sound to diffuse sound. If the ratio of direct sound to diffuse sound is zero, the listener will only perceive direct sound.
- the acoustic signal processing method according to the embodiment includes an acquisition step of acquiring object information indicating a change in an object causing the wind W and a predetermined timing related to the change in the object, and an output step of outputting aerodynamic sound data indicating the aerodynamic sound caused by the wind W, a predetermined time after the predetermined timing indicated by the acquired object information, the predetermined time being based on the change in the object.
- the specified timing is, for example, the timing of a change in the wind W
- the specified time is, for example, the time it takes for the wind W generated by the electric fan F to reach the listener L.
- the specified timing is, for example, the timing of a change in the wind W
- the specified time is, for example, the time it takes for the wind W generated by the ambulance A to reach the listener L.
- the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
- the acoustic signal processing method according to the embodiment can provide the listener L with a sense of realism.
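- As a rough end-to-end illustration of the acquisition and output steps, the following sketch waits the predetermined time after the predetermined timing and then outputs the aerodynamic sound data. The field names, the determine_delay callback, and the blocking sleep (in place of real audio-clock scheduling) are all assumptions for illustration.

```python
import time

def output_aerodynamic_sound(object_info, determine_delay, output_unit):
    """Output the aerodynamic sound data a predetermined time after the
    predetermined timing indicated by the object information."""
    if object_info["predetermined_timing_reached"]:
        delay_s = determine_delay(object_info)   # e.g. wind travel time to the listener
        time.sleep(delay_s)                      # placeholder for proper scheduling
        output_unit.play(object_info["aerodynamic_sound_data"])
```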
- the predetermined timing may be a timing designated by the user, and a time designated by the user may be used as the predetermined time.
- that is, the user may designate the timing and the time so that the listener L hears the aerodynamic sound at the same timing as in real space, and the designated timing and time may be used as the predetermined timing and the predetermined time. Even in this case, the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of presence.
- the object information indicates a change in the wind W due to a change in the object
- the predetermined timing is the timing of the change in the wind W.
- the audio signal processing method includes a determination step of determining the predetermined time based on the wind W indicated by the acquired object information.
- aerodynamic sound data can be output when a predetermined time determined based on the wind W has elapsed since the wind W changed, allowing the listener L to hear the aerodynamic sound at a more appropriate time.
- the change in the wind W indicated by the object information indicates a change in the wind speed of the wind W
- the predetermined time is determined based on the wind speed
- the specified time is determined based on the wind speed, allowing the listener L to hear the aerodynamic sound at a more appropriate time.
- the aerodynamic sound is the sound generated by the changed wind speed.
- the object information indicates the position of the object.
- the acoustic signal processing method includes a determination step of determining the predetermined time based on the distance between the position of the listener L of the aerodynamic sound and the position of the object indicated by the acquired object information.
- the specified time is determined based on the distance, allowing the listener L to hear the aerodynamic sound at a more appropriate time.
- the object information indicates the position of the object.
- the predetermined time is determined based on the wind speed and the distance between the position of the listener L of the aerodynamic sound and the position of the object indicated by the acquired object information.
- the specified time is determined based on the wind speed and the distance, allowing the listener L to hear the aerodynamic sound at a more appropriate time.
- the object information indicates that the predetermined timing is a first timing for outputting sound data associated with the object.
- the aerodynamic sound data is output a predetermined time after the first timing indicated by the acquired object information.
- the aerodynamic sound data can be output a predetermined time after the first timing at which the sound is output, allowing the listener L to hear the aerodynamic sound at a more appropriate timing.
- the specified timing is, for example, the timing when the electric fan F is switched from OFF to ON.
- when the time (the predetermined time) it takes for the wind W generated by the electric fan F to reach the listener L has elapsed from this timing, the listener L can hear the aerodynamic sound output from the headphones 200. Therefore, since the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
- the audio signal processing method according to the embodiment can provide the listener L with a sense of realism.
- the object information indicates the position of the object
- the object information indicates that the predetermined timing is a second timing at which the distance between the position of the listener L of the aerodynamic sound and the position of the object becomes shorter than the predetermined distance.
- the aerodynamic sound data is output a predetermined time after the second timing indicated by the acquired object information.
- the aerodynamic sound data can be output a predetermined time after the second timing at which the distance becomes shorter than the predetermined distance, in other words, a predetermined time after the second timing at which the object has come close to the listener L, allowing the listener L to hear the aerodynamic sound at a more appropriate timing.
- the specified timing is, for example, the timing when the amount of change in the distance between the position of the listener L and the position of the object turns from negative to positive.
- when the time (the predetermined time) it takes for the wind W created by the ambulance A to reach the listener L has elapsed from this timing, the listener L can hear the aerodynamic sound output from the headphones 200. Therefore, since the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
- the acoustic signal processing method according to the modified example of the embodiment can provide the listener L with a sense of realism.
- the object information indicates that the change in the wind W due to the change in the object is a change in the direction of the wind W
- the predetermined timing is a third timing at which the change in the direction of the wind W occurred.
- the aerodynamic sound data is output a predetermined time after the third timing indicated by the acquired object information.
- the aerodynamic sound data can be output when a predetermined time has elapsed since the third timing when the change in the direction of the wind W occurred, allowing the listener L to hear the aerodynamic sound at a more appropriate timing.
- the object is an object that generates a sound and wind W indicated by sound data associated with the object
- the aerodynamic sound is an aerodynamic sound that is generated when the wind W generated by the object reaches the listener L.
- let the distance be D, let the distance from the position of the object at which the wind speed becomes So be U, and let the predetermined time be t. Then t satisfies the following formula:
t = {(D - U)^2} / {So × U × (log(D) - log(U))}
- the time from the specified timing until the wind W generated by the object reaches the listener L can be determined as the specified time. Therefore, since the aerodynamic sound data can be output at a timing when such a specified time has elapsed from the specified timing, the listener L can hear the aerodynamic sound at a more appropriate timing.
- the time at which the wind W generated by the electric fan F reaches the listener L can be determined as the predetermined time. Therefore, the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of realism. In this way, the audio signal processing method according to the embodiment can provide the listener L with a sense of realism.
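- The formula above can be evaluated directly; the sketch below assumes the natural logarithm and uses hypothetical values (D = 3 m, U = 0.5 m, So = 5 m/s) purely to show the order of magnitude of the resulting delay.

```python
import math

def wind_arrival_time(distance_d, ref_distance_u, wind_speed_so):
    """Predetermined time t = (D - U)^2 / (So * U * (log(D) - log(U)))."""
    return ((distance_d - ref_distance_u) ** 2) / (
        wind_speed_so * ref_distance_u
        * (math.log(distance_d) - math.log(ref_distance_u)))

t = wind_arrival_time(3.0, 0.5, 5.0)  # roughly 1.4 s with these assumed values
```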
- the object is an object that generates wind W by moving the position of the object
- the aerodynamic sound is aerodynamic sound that occurs when the wind W generated by the movement reaches the listener L.
- the predetermined timing indicated by the object information is the timing at which the amount of change in distance over time changes from negative to positive.
- let the distance be D, let the distance from the position of the object at which the wind speed of the wind W generated by the movement becomes So be U, and let the predetermined time be t. Then t satisfies the following formula:
t = {(D - U)^2} / {So × U × (log(D) - log(U))}
- the time from the specified timing until the wind W generated by the object reaches the listener L can be determined as the specified time. Therefore, since the aerodynamic sound data can be output at a timing when such a specified time has elapsed from the specified timing, the listener L can hear the aerodynamic sound at a more appropriate timing.
- the time at which the wind W generated by ambulance A reaches the listener L can be determined as the predetermined time. Therefore, the listener L can hear the aerodynamic sound at the same timing as in real space, that is, at the appropriate timing, so the listener L is less likely to feel uncomfortable and can obtain a sense of realism.
- the acoustic signal processing method according to the embodiment can provide the listener L with a sense of realism.
- the computer program according to the embodiment is a computer program for causing a computer to execute the above-described acoustic signal processing method.
- the acoustic signal processing device 100 also includes an acquisition unit 110 that acquires object information indicating the change in the object causing the wind W and a predetermined timing related to the change in the object, and an output unit 130 that outputs aerodynamic sound data indicating the aerodynamic sound caused by the wind W a predetermined time after the predetermined timing indicated by the acquired object information based on the change in the object.
- the present disclosure is not limited to this embodiment and these modified examples.
- the present disclosure may be realized by arbitrarily combining the components described in this specification, or by excluding some of the components.
- the present disclosure also includes modified examples obtained by applying various modifications that a person skilled in the art can think of to the above embodiment and modified examples without departing from the gist of the present disclosure, i.e., the meaning indicated by the words described in the claims.
- an example was given in which the object was the electric fan F, but the object is not limited to this.
- any object that creates the wind W may be used.
- the object that creates the wind W may be, for example, an object into which the wind W blows, such as a window or a door.
- the wind W blows into the building when the window or door opens, causing the listener L to hear aerodynamic sound.
- in this case, the timing at which the window or door opens corresponds to the predetermined timing, the wind W can be regarded as being generated at the position of the window or door, and the technology disclosed herein can be applied.
- the object that generates the wind W may be, for example, an object from which the wind W blows out, such as a vent or exhaust hole.
- when the wind W blows out from a vent or exhaust hole, it is not meaningful in the virtual space to precisely define the position at which the wind W is generated, and the technology disclosed herein can be applied assuming that the wind W is generated at the position of the outlet of the vent or exhaust hole.
- the specified timing can be determined by an administrator of the virtual space or an administrator of the audio signal processing device 100.
- a reception unit provided in the audio signal processing device 100 may receive the timing specified by the administrator, and the determination unit 120 may determine the timing received by the reception unit as the specified timing.
- Some of the components constituting the above-mentioned audio signal processing device may be a computer system composed of a microprocessor, ROM, RAM, hard disk unit, display unit, keyboard, mouse, etc.
- a computer program is stored in the RAM or hard disk unit.
- the microprocessor achieves its functions by operating in accordance with the computer program.
- the computer program is composed of a combination of multiple instruction codes that indicate commands for a computer to achieve a specified function.
- Some of the components constituting the above-mentioned audio signal processing device may be composed of a single system LSI (Large Scale Integration).
- a system LSI is an ultra-multifunctional LSI manufactured by integrating multiple components on a single chip, and specifically, is a computer system including a microprocessor, ROM, RAM, etc.
- a computer program is stored in the RAM. The system LSI achieves its functions when the microprocessor operates in accordance with the computer program.
- Some of the components constituting the above-mentioned audio signal processing device may be composed of an IC card or a standalone module that can be attached to and detached from each device.
- the IC card or the module is a computer system composed of a microprocessor, ROM, RAM, etc.
- the IC card or the module may include the above-mentioned ultra-multifunction LSI.
- the IC card or the module achieves its functions by the microprocessor operating according to a computer program. This IC card or this module may be tamper-resistant.
- some of the components constituting the above-mentioned audio signal processing device may be the computer program or a digital signal recorded on a computer-readable recording medium, such as a flexible disk, hard disk, CD-ROM, MO, DVD, DVD-ROM, DVD-RAM, BD (Blu-ray (registered trademark) Disc), or semiconductor memory.
- some of the components constituting the above-mentioned audio signal processing device may transmit the computer program or the digital signal via a telecommunications line, a wireless or wired communication line, a network such as the Internet, data broadcasting, etc.
- the present disclosure may be the methods described above. It may also be a computer program that implements these methods using a computer, or a digital signal that includes the computer program.
- the present disclosure may also provide a computer system having a microprocessor and a memory, the memory storing the computer program, and the microprocessor operating in accordance with the computer program.
- the program or the digital signal may also be implemented by another independent computer system by recording it on the recording medium and transferring it, or by transferring the program or the digital signal via the network, etc.
- This disclosure can be used in audio signal processing methods and audio signal processing devices, and is particularly applicable to audio systems, etc.
- 100 Audio signal processing device; 110 Acquisition unit; 120 Decision unit; 130 Output unit; 140 Storage unit; 200 Headphones; 201 Head sensor unit; 202 Output unit; 300 Display unit; 900 Rendering unit; 901 Reverberation processing unit; 902 Early reflection processing unit; 903 Distance attenuation processing unit; 904 Selection unit; 905 Binaural processing unit; 906 Calculation unit; 907 Generation unit; A Ambulance; A0000 Stereophonic sound reproduction system; A0001 Acoustic signal processing device; A0002 Audio presentation device; A0100 Encoding device; A0101 Input data; A0102 Encoder; A0103 Encoded data; A0104 Memory; A0110 Decoding device; A0111 Audio signal; A0112 Decoder; A0113 Input data; A0114 Memory; A0120 Encoding device; A0121 Transmission unit; A0122 Transmission signal; A0130 Decoding device; A0131 Receiving unit; A0132 Received signal; A0200 Decoder; A0201 Spatial information management unit; A0202 Audio data decoder; A0203 Rendering unit; A0210 Decoder; A0211 Spatial information management unit; A0213 Rendering unit; F Fan; L Listener
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- General Engineering & Computer Science (AREA)
- Stereophonic System (AREA)
Abstract
Description
Conventionally, acoustic signal processing methods are known in which the arrival time of sound to a listener in a virtual space is controlled.
[Examples of devices to which the acoustic processing technology or encoding/decoding technology of the present disclosure can be applied]
<Stereophonic sound reproduction system>
FIG. 1 is a diagram showing a stereophonic sound (immersive audio) reproduction system A0000, which is an example of a system to which the acoustic processing or decoding processing of the present disclosure can be applied. The stereophonic sound reproduction system A0000 includes an acoustic signal processing device A0001 and an audio presentation device A0002.
FIG. 2 is a functional block diagram showing the configuration of an encoding device A0100, which is an example of the encoding device of the present disclosure.
FIG. 3 is a functional block diagram showing the configuration of a decoding device A0110, which is an example of the decoding device of the present disclosure.
FIG. 4 is a functional block diagram showing the configuration of an encoding device A0120, which is another example of the encoding device of the present disclosure. In FIG. 4, components having the same functions as those in FIG. 2 are given the same reference signs as in FIG. 2, and descriptions of these components are omitted.
FIG. 5 is a functional block diagram showing the configuration of a decoding device A0130, which is another example of the decoding device of the present disclosure. In FIG. 5, components having the same functions as those in FIG. 3 are given the same reference signs as in FIG. 3, and descriptions of these components are omitted.
FIG. 6 is a functional block diagram showing the configuration of a decoder A0200, which is an example of the decoder A0112 in FIG. 3 or FIG. 5.
FIG. 8 is a diagram showing an example of the physical configuration of an acoustic signal processing device. The acoustic signal processing device in FIG. 8 may be a decoding device. Some of the components described here may be provided in the audio presentation device A0002. The acoustic signal processing device shown in FIG. 8 is an example of the acoustic signal processing device A0001 described above.
FIG. 9 is a diagram showing an example of the physical configuration of an encoding device. The encoding device shown in FIG. 9 is an example of the encoding devices A0100 and A0120 described above.
Furthermore, the configuration of the acoustic signal processing device 100 according to the embodiment will be described. FIG. 10 is a block diagram showing the functional configuration of the acoustic signal processing device 100 according to the present embodiment.
FIG. 11 is a flowchart of operation example 1 of the acoustic signal processing device 100 according to the present embodiment. FIG. 12 is a diagram showing the electric fan F, which is the object according to operation example 1, and the listener L.
A modified example of the embodiment will be described below. The following description focuses on the differences from the embodiment, and descriptions of the common points are omitted or simplified.
In the modified example, the acoustic signal processing device 100 according to the embodiment is used, but the object in the virtual space is different. The object according to this modified example is a vehicle, which is a moving body. More specifically, the object is an ambulance. In this case, the aerodynamic sound is a sound generated when the wind W generated by the movement of the position of the object reaches the listener L. The ambulance, which is the object, is an object that generates sound, and it generates a siren sound.
FIG. 14 is a flowchart of operation example 2 of the acoustic signal processing device 100 according to the present embodiment. FIG. 15 is a diagram showing the ambulance A, which is the object according to operation example 2, and the listener L.
The acoustic signal processing method according to the embodiment includes an acquisition step of acquiring object information indicating a change in an object that causes the wind W and a predetermined timing related to the change in the object, and an output step of outputting aerodynamic sound data indicating the aerodynamic sound caused by the wind W, a predetermined time after the predetermined timing indicated by the acquired object information, the predetermined time being based on the change in the object.
The acoustic signal processing method and acoustic signal processing device according to the aspects of the present disclosure have been described above based on the embodiment and the modified example, but the present disclosure is not limited to this embodiment and modified example. For example, other embodiments realized by arbitrarily combining the components described in this specification, or by excluding some of the components, may also be embodiments of the present disclosure. The present disclosure also includes modified examples obtained by applying various modifications conceivable by a person skilled in the art to the above embodiment and modified example without departing from the gist of the present disclosure, that is, the meaning indicated by the wording described in the claims.
110 Acquisition unit
120 Determination unit
130 Output unit
140 Storage unit
200 Headphones
201 Head sensor unit
202 Output unit
300 Display unit
900 Rendering unit
901 Reverberation processing unit
902 Early reflection processing unit
903 Distance attenuation processing unit
904 Selection unit
905 Binaural processing unit
906 Calculation unit
907 Generation unit
A Ambulance
A0000 Stereophonic sound reproduction system
A0001 Acoustic signal processing device
A0002 Audio presentation device
A0100 Encoding device
A0101 Input data
A0102 Encoder
A0103 Encoded data
A0104 Memory
A0110 Decoding device
A0111 Audio signal
A0112 Decoder
A0113 Input data
A0114 Memory
A0120 Encoding device
A0121 Transmission unit
A0122 Transmission signal
A0130 Decoding device
A0131 Receiving unit
A0132 Received signal
A0200 Decoder
A0201 Spatial information management unit
A0202 Audio data decoder
A0203 Rendering unit
A0210 Decoder
A0211 Spatial information management unit
A0213 Rendering unit
F Electric fan
L Listener
Claims (16)
- 1. An acoustic signal processing method comprising: an acquisition step of acquiring object information indicating a change in an object that causes wind and a predetermined timing related to the change in the object; and an output step of outputting aerodynamic sound data indicating an aerodynamic sound caused by the wind, a predetermined time after the predetermined timing indicated by the acquired object information, the predetermined time being based on the change in the object.
- 2. The acoustic signal processing method according to claim 1, wherein the object information indicates a change in the wind due to the change in the object, and indicates that the predetermined timing is a timing of the change in the wind, and the acoustic signal processing method includes a determination step of determining the predetermined time based on the wind indicated by the acquired object information.
- 3. The acoustic signal processing method according to claim 2, wherein the change in the wind indicated by the object information indicates a change in the wind speed of the wind, and in the determination step, the predetermined time is determined based on the wind speed.
- 4. The acoustic signal processing method according to claim 3, wherein the aerodynamic sound is a sound generated at the changed wind speed.
- 5. The acoustic signal processing method according to claim 1, wherein the object information indicates a position of the object, and the acoustic signal processing method includes a determination step of determining the predetermined time based on a distance between a position of a listener of the aerodynamic sound and the position of the object indicated by the acquired object information.
- 6. The acoustic signal processing method according to claim 3, wherein the object information indicates a position of the object, and in the determination step, the predetermined time is determined based on the wind speed and on a distance between a position of a listener of the aerodynamic sound and the position of the object indicated by the acquired object information.
- 7. The acoustic signal processing method according to claim 1, wherein the object information indicates that the predetermined timing is a first timing at which sound data associated with the object is output, and in the output step, the aerodynamic sound data is output the predetermined time after the first timing indicated by the acquired object information.
- 8. The acoustic signal processing method according to claim 1, wherein the object information indicates a position of the object, and indicates that the predetermined timing is a second timing at which a distance between a position of a listener of the aerodynamic sound and the position of the object becomes shorter than a predetermined distance, and in the output step, the aerodynamic sound data is output the predetermined time after the second timing indicated by the acquired object information.
- 9. The acoustic signal processing method according to claim 1, wherein the object information indicates that the change in the wind due to the change in the object is a change in the direction of the wind, and indicates that the predetermined timing is a third timing at which the change in the direction of the wind occurred, and in the output step, the aerodynamic sound data is output the predetermined time after the third timing indicated by the acquired object information.
- 10. The acoustic signal processing method according to claim 6, wherein the object is an object that generates the wind and a sound indicated by sound data associated with the object, and the aerodynamic sound is an aerodynamic sound generated when the wind generated by the object reaches the listener.
- 11. The acoustic signal processing method according to claim 10, wherein, where the distance is D, a distance from the position of the object at which the wind speed becomes So is U, and the predetermined time is t, t satisfies the following formula: t = {(D - U)^2} / {So × U × (log(D) - log(U))}.
- 12. The acoustic signal processing method according to claim 6, wherein the object is an object that generates the wind by movement of the position of the object, and the aerodynamic sound is an aerodynamic sound generated when the wind generated by the movement reaches the listener.
- 13. The acoustic signal processing method according to claim 12, wherein the predetermined timing indicated by the object information is a timing at which the amount of change in the distance over time turns from negative to positive.
- 14. The acoustic signal processing method according to claim 12, wherein, where the distance is D, a distance from the position of the object at which the wind speed of the wind generated by the movement becomes So is U, and the predetermined time is t, t satisfies the following formula: t = {(D - U)^2} / {So × U × (log(D) - log(U))}.
- 15. A computer program for causing a computer to execute the acoustic signal processing method according to any one of claims 1 to 14.
- 16. An acoustic signal processing device comprising: an acquisition unit that acquires object information indicating a change in an object that causes wind and a predetermined timing related to the change in the object; and an output unit that outputs aerodynamic sound data indicating an aerodynamic sound caused by the wind, a predetermined time after the predetermined timing indicated by the acquired object information, the predetermined time being based on the change in the object.
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23879596.7A EP4607963A1 (en) | 2022-10-19 | 2023-10-03 | Acoustic signal processing method, computer program, and acoustic signal processing device |
| JP2024551436A JPWO2024084949A1 (ja) | 2022-10-19 | 2023-10-03 | |
| KR1020257011611A KR20250091201A (ko) | 2022-10-19 | 2023-10-03 | 음향 신호 처리 방법, 컴퓨터 프로그램, 및, 음향 신호 처리 장치 |
| CN202380071659.3A CN120113259A (zh) | 2022-10-19 | 2023-10-03 | 音响信号处理方法、计算机程序以及音响信号处理装置 |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263417397P | 2022-10-19 | 2022-10-19 | |
| US63/417,397 | 2022-10-19 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024084949A1 true WO2024084949A1 (ja) | 2024-04-25 |
Family
ID=90737351
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/JP2023/036004 Ceased WO2024084949A1 (ja) | 2022-10-19 | 2023-10-03 | 音響信号処理方法、コンピュータプログラム、及び、音響信号処理装置 |
Country Status (5)
| Country | Link |
|---|---|
| EP (1) | EP4607963A1 (ja) |
| JP (1) | JPWO2024084949A1 (ja) |
| KR (1) | KR20250091201A (ja) |
| CN (1) | CN120113259A (ja) |
| WO (1) | WO2024084949A1 (ja) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2013201577A (ja) | 2012-03-23 | 2013-10-03 | Shimizu Corp | 立体音響計算方法、装置、プログラム、記録媒体および立体音響提示システムならびに仮想現実空間提示システム |
| CN110972053A (zh) * | 2019-11-25 | 2020-04-07 | 腾讯音乐娱乐科技(深圳)有限公司 | 构造听音场景的方法和相关装置 |
| WO2020255810A1 (ja) * | 2019-06-21 | 2020-12-24 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
| WO2021180938A1 (en) | 2020-03-13 | 2021-09-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for rendering a sound scene using pipeline stages |
-
2023
- 2023-10-03 EP EP23879596.7A patent/EP4607963A1/en active Pending
- 2023-10-03 WO PCT/JP2023/036004 patent/WO2024084949A1/ja not_active Ceased
- 2023-10-03 KR KR1020257011611A patent/KR20250091201A/ko active Pending
- 2023-10-03 JP JP2024551436A patent/JPWO2024084949A1/ja active Pending
- 2023-10-03 CN CN202380071659.3A patent/CN120113259A/zh active Pending
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2013201577A (ja) | 2012-03-23 | 2013-10-03 | Shimizu Corp | 立体音響計算方法、装置、プログラム、記録媒体および立体音響提示システムならびに仮想現実空間提示システム |
| WO2020255810A1 (ja) * | 2019-06-21 | 2020-12-24 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
| CN110972053A (zh) * | 2019-11-25 | 2020-04-07 | 腾讯音乐娱乐科技(深圳)有限公司 | 构造听音场景的方法和相关装置 |
| WO2021180938A1 (en) | 2020-03-13 | 2021-09-16 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for rendering a sound scene using pipeline stages |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20250091201A (ko) | 2025-06-20 |
| EP4607963A1 (en) | 2025-08-27 |
| CN120113259A (zh) | 2025-06-06 |
| JPWO2024084949A1 (ja) | 2024-04-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP4607963A1 (en) | Acoustic signal processing method, computer program, and acoustic signal processing device | |
| US20250150776A1 (en) | Acoustic signal processing method, recording medium, and acoustic signal processing device | |
| US20250150770A1 (en) | Information generation method, acoustic signal processing method, recording medium, and information generation device | |
| US20250247667A1 (en) | Acoustic processing method, acoustic processing device, and recording medium | |
| EP4607964A1 (en) | Acoustic signal processing method, computer program, and acoustic signal processing device | |
| EP4607965A1 (en) | Sound processing device and sound processing method | |
| JP2020188435A (ja) | オーディオエフェクト制御装置、オーディオエフェクト制御システム、オーディオエフェクト制御方法及びプログラム | |
| US20240406669A1 (en) | Metadata for Spatial Audio Rendering | |
| WO2025205328A1 (ja) | 情報処理装置、情報処理方法、及び、プログラム | |
| WO2025075102A1 (ja) | 音響処理装置、音響処理方法、及び、プログラム | |
| WO2025075136A1 (ja) | 音声信号処理方法、コンピュータプログラム、及び、音声信号処理装置 | |
| TW202424726A (zh) | 音響處理裝置及音響處理方法 | |
| WO2025075147A1 (ja) | 音声信号処理方法、コンピュータプログラム、及び、音声信号処理装置 | |
| WO2025075079A1 (ja) | 音響処理装置、音響処理方法、及び、プログラム | |
| TW202424727A (zh) | 音響處理裝置及音響處理方法 | |
| WO2024214799A1 (ja) | 情報処理装置、情報処理方法、及び、プログラム | |
| WO2025135070A1 (ja) | 音響情報処理方法、情報処理装置、及び、プログラム | |
| WO2025075149A1 (ja) | 音声信号処理方法、コンピュータプログラム、及び、音声信号処理装置 | |
| WO2025075135A1 (ja) | 音声信号処理方法、コンピュータプログラム、及び、音声信号処理装置 | |
| WO2023199815A1 (ja) | 音響処理方法、プログラム、及び音響処理システム |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23879596 Country of ref document: EP Kind code of ref document: A1 |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2024551436 Country of ref document: JP |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 202380071659.3 Country of ref document: CN |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 202547046896 Country of ref document: IN |
|
| WWE | Wipo information: entry into national phase |
Ref document number: 2023879596 Country of ref document: EP |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| ENP | Entry into the national phase |
Ref document number: 2023879596 Country of ref document: EP Effective date: 20250519 |
|
| WWP | Wipo information: published in national office |
Ref document number: 202547046896 Country of ref document: IN Ref document number: 202380071659.3 Country of ref document: CN |
|
| WWP | Wipo information: published in national office |
Ref document number: 1020257011611 Country of ref document: KR |
|
| WWP | Wipo information: published in national office |
Ref document number: 2023879596 Country of ref document: EP |