
WO2023212879A1 - Object audio data generation method and apparatus, electronic device, and storage medium - Google Patents


Info

Publication number
WO2023212879A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
information
data
recording terminal
current location
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/091051
Other languages
French (fr)
Chinese (zh)
Inventor
史润宇
易鑫林
张墉
刘晗宇
吕柱良
吕雪洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to PCT/CN2022/091051 priority Critical patent/WO2023212879A1/en
Priority to CN202280001279.8A priority patent/CN117355894A/en
Publication of WO2023212879A1 publication Critical patent/WO2023212879A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272: Voice signal separating
    • G10L21/028: Voice signal separating using properties of sound source
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11B: INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00: Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02: Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031: Electronic editing of digitised analogue information signals, e.g. audio or video signals

Definitions

  • the present disclosure relates to the field of communication technology, and in particular, to a method, device, electronic device and storage medium for generating object audio data.
  • MPEG-H 3D Audio, the next-generation audio codec standard of MPEG (Moving Picture Experts Group), is the ISO/IEC 23008-3 international standard.
  • Object Audio is a new audio format that can mark the direction of a sound, so that the listener hears the sound coming from a specific direction regardless of whether headphones or speakers are used, and regardless of the number of speakers.
  • In the related art, object audio data is generated by pre-recording monophonic audio and combining it in post-production with position information prepared for that monophonic audio.
  • Using this method requires production equipment in the later stage, and there is still a lack of a method for recording object audio data in real time.
  • a method for recording object audio data of a sound object in real time is provided.
  • Embodiments of the present disclosure provide a method, device, electronic device, and storage medium for generating object audio data, which can accurately obtain the location information of each sound object in real time, and record and generate object audio data in real time.
  • embodiments of the present disclosure provide a method for generating object audio data.
  • the method includes: obtaining sound data of at least one sound object; obtaining current location information of the at least one sound object; and synthesizing the sound data and the current location information of the at least one sound object to generate object audio data.
  • sound data of at least one sound object is obtained; current position information of at least one sound object is obtained; sound data of at least one sound object and current position information are synthesized to generate object audio data.
  • the position information of each sound object can be accurately obtained in real time, and the object audio data can be recorded and generated in real time.
  • obtaining the current location information of the at least one sound object includes: obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object.
  • before synthesizing the sound data and the current location information of the at least one sound object, the method further includes: synchronizing the sound data of the at least one sound object with the current location information.
  • obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object includes: obtaining the current location information of the at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • obtaining the location information of the at least one recording terminal using a hybrid transceiver method includes: obtaining first positioning reference information using the one-way transceiver method; obtaining second positioning reference information using the two-way transceiver method; and determining the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of the angle information and the distance information
  • obtaining the current location information of the at least one recording terminal in the one-way transceiver mode includes: receiving a first positioning signal sent by the at least one recording terminal in a broadcast manner, and generating the current location information of the at least one recording terminal based on the first positioning signal.
  • obtaining the location information of the at least one recording terminal in the two-way transceiver mode includes: receiving a positioning start signal sent by the at least one recording terminal in a broadcast manner; sending a response signal to the at least one recording terminal; and receiving a second positioning signal sent by the at least one recording terminal, and generating the current location information of the at least one recording terminal according to the second positioning signal.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • the method further includes: obtaining initial position information of the at least one sound object.
  • synthesizing the sound data and current location information of at least one sound object to generate object audio data includes: obtaining audio parameters and using the audio parameters as header file information of the object audio data; and, at each sampling moment, saving the sound data of each sound object as the object audio signal and the current position information as object audio auxiliary data to generate the object audio data.
  • the method further includes: saving the sound data and the current location information in frame units.
  • embodiments of the present disclosure provide a device for generating object audio data.
  • the device for generating object audio data includes: a data acquisition unit configured to acquire sound data of at least one sound object; an information acquisition unit configured to obtain the current position information of the at least one sound object; and a data generation unit configured to synthesize the sound data and the current position information of the at least one sound object to generate object audio data.
  • the information acquisition unit is specifically configured to: acquire the current location information of at least one recording terminal that records the sound data of the at least one sound object.
  • the device further includes: a synchronization processing unit configured to synchronize the sound data of the at least one sound object and the current location information.
  • the information acquisition unit is specifically configured to acquire the current location information of the at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • the information acquisition unit includes: a first information acquisition module configured to acquire first positioning reference information in the one-way transceiver mode; a second information acquisition module configured to acquire second positioning reference information in the two-way transceiver mode; and a first current information acquisition module configured to determine the current position information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of the angle information and the distance information
  • the information acquisition unit includes: a second current information acquisition module configured to receive a first positioning signal sent by the at least one recording terminal in a broadcast manner, and to generate the current location information of the at least one recording terminal based on the first positioning signal.
  • the information acquisition unit includes: a signal receiving module configured to receive a positioning start signal sent by the at least one recording terminal in a broadcast manner; a signal sending module configured to send a response signal to the at least one recording terminal; and a third current information acquisition module configured to receive the second positioning signal sent by the at least one recording terminal and to generate the current location information of the at least one recording terminal according to the second positioning signal.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • the device further includes: an initial position acquisition unit configured to acquire initial position information of the at least one sound object.
  • the data generation unit includes: a parameter acquisition module configured to acquire audio parameters and use the audio parameters as header file information of the object audio data; an audio data generation module configured to At each sampling moment, the sound data of each sound object is saved as an object audio signal, and the current position information is saved as object audio auxiliary data to generate the object audio data.
  • the data generation unit further includes: a processing module configured to save the sound data and the current location information in frame units.
  • embodiments of the present disclosure provide an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described in the first aspect.
  • embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method described in the first aspect.
  • embodiments of the present disclosure provide a computer program product, including computer instructions, characterized in that, when executed by a processor, the computer instructions implement the method described in the first aspect.
  • Figure 1 is a flow chart of a method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 2 is a flow chart of another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 3 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 4 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 5 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 6 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 7 is a structural diagram of a device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 8 is a structural diagram of another device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 9 is a structural diagram of an information acquisition unit in the device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 10 is a structural diagram of another information acquisition unit in the device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 11 is a structural diagram of yet another information acquisition unit in the device for generating object audio data provided by an embodiment of the present disclosure
  • Figure 12 is a structural diagram of yet another device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 13 is a structural diagram of a data generation unit in the device for generating object audio data provided by an embodiment of the present disclosure
  • FIG. 14 is a structural diagram of an electronic device according to an embodiment of the present disclosure.
  • "at least one" in the present disclosure can also be described as one or more, and "a plurality" can be two, three, four or more; the present disclosure is not limited in this respect.
  • when technical features are distinguished by "first", "second", "third", "A", "B", "C", "D", etc., the technical features so described are in no particular order of precedence.
  • each table in this disclosure can be configured or predefined.
  • the values of the information in each table are only examples and can be configured as other values, which is not limited by this disclosure.
  • it is not necessarily required to configure all the correspondences shown in each table.
  • the corresponding relationships shown in some rows may not be configured.
  • appropriate deformation adjustments can be made based on the above table, such as splitting, merging, etc.
  • the names of the parameters shown in the titles of the above tables may also be other names understandable by the communication device, and the values or expressions of the parameters may also be other values or expressions understandable by the communication device.
  • other data structures can also be used, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, hash tables, and so on.
  • the object audio data acquisition method in the related art cannot directly record object audio data and cannot obtain the real position information of sound objects.
  • Embodiments of the present disclosure provide a method, device, electronic device and storage medium for generating object audio data to accurately obtain the location information of each sound object in real time and to record and generate object audio data in real time, thereby solving the problems in the related technologies.
  • the method for generating object audio data in the embodiment of the present disclosure can be executed by the device for generating object audio data according to the embodiment of the present disclosure.
  • the device for generating object audio data can be implemented by software and/or hardware.
  • the object audio data generating device may be configured in an electronic device, and the electronic device may install and run a program for generating object audio data.
  • electronic devices may include but are not limited to smartphones, tablet computers and other hardware devices with various operating systems.
  • position information refers to the position information of each microphone or sound object relative to the listener when the listener (Audience) is taken as the origin.
  • the position information can be expressed in a rectangular coordinate system (x, y, z) or a spherical coordinate system (θ, φ, r); the two can be converted by formula (1).
  • xyz respectively represent the position coordinates of the microphone or sound object on the x-axis (front-back direction), y-axis (left-right direction), and z-axis (up-down direction) of the rectangular coordinate system.
  • ⁇ , ⁇ , and r respectively represent the horizontal angle of the microphone or sound object on the spherical coordinate system (the angle between the mapping of the microphone or sound object and the origin on the horizontal plane and the x-axis); the vertical angle (the angle between the microphone or sound object and the origin) The angle between the line connecting the object and the origin and the horizontal plane); and the straight-line distance of the microphone or sound object from the origin.
  • the aforementioned position information is position information in a three-dimensional coordinate system. In a two-dimensional coordinate system, the position information can be expressed in the rectangular coordinate system (x, y) or in the polar coordinate system (θ, r); the two can be converted by formula (2).
  • the position information will be expressed in a spherical coordinate system (polar coordinate system).
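  • The conversion formulas (1) and (2) referenced above are not reproduced in this text. The sketch below assumes the standard rectangular/spherical and rectangular/polar conversions implied by the definitions above; the function names are illustrative and angles are in radians:

```python
import math

def spherical_to_rect(theta, phi, r):
    # theta: horizontal angle, phi: vertical angle, r: distance from the origin
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z

def rect_to_spherical(x, y, z):
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)               # horizontal angle
    phi = math.atan2(z, math.hypot(x, y))  # vertical angle
    return theta, phi, r

def polar_to_rect(theta, r):
    # two-dimensional case, corresponding to formula (2)
    return r * math.cos(theta), r * math.sin(theta)
```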
  • object audio generally refers to various sound formats that can describe sound objects (Audio Object). Point sound objects containing position information, or surface sound objects whose center position can be roughly determined, can be used as audio objects.
  • Object Audio generally consists of two parts: the sound signal itself (Audio Data) and the accompanying position information (Object Audio Metadata). The sound signal itself can be regarded as a mono audio signal, which can be in an uncompressed format such as PCM (Pulse-code modulation) or DSD (Direct Stream Digital), or in a compressed format such as MP3 (MPEG-1 or MPEG-2 Audio Layer III), AAC (Advanced Audio Coding), or Dolby Digital.
  • the accompanying position information is the position information described above, taken at each time t.
  • the format can pair the sound signal and position information of each audio object separately; alternatively, the sound signals of all objects can be combined, the position information combined, and correspondence information added indicating which sound signal corresponds to which position information.
  • FIG. 1 is a flow chart of a method for generating object audio data provided by an embodiment of the present disclosure.
  • the method may include but is not limited to the following steps:
  • S1: Obtain sound data of at least one sound object.
  • the sound signal of the sound object can be recorded through a sound collection device to obtain the sound data of the sound object.
  • the at least one sound object can include one or more sound objects. When there is one sound object, its sound signal is recorded by one sound collection device; when there are multiple sound objects, their sound signals are recorded by multiple sound collection devices.
  • the sound collection device may be a device capable of collecting sound information, such as a microphone, which is not specifically limited in the embodiments of the present disclosure.
  • S2: Obtain current location information of the at least one sound object. The current location information may be obtained while obtaining the sound data of the sound object, so as to obtain the sound data and location information of the sound object in real time.
  • the sound data of the sound object is obtained, and the sound data of the sound object can be obtained through one or more sound collection devices.
  • when the current position information of the sound object is obtained and the relative position of the sound object with respect to the sound collection device is fixed, the current position information of the sound collection device can be obtained, and the current position information of the sound object can be determined based on the relative position relationship between the sound collection device and the sound object.
  • the sound object moves, and the sound collection device moves with the movement of the sound object, so that the position information of each sound object can be obtained in real time.
  • the current location information of the sound collection device can be obtained through the ultrasonic positioning method.
  • the sound collection device is provided with an ultrasonic transceiver device; by transmitting and receiving ultrasonic signals, the current location information of the sound collection device can be obtained.
  • other methods may be used, and the embodiments of the present disclosure do not specifically limit this.
  • S3: Synthesize the sound data of the at least one sound object and the current location information to generate object audio data.
  • after the sound data and current location information of at least one sound object are obtained in real time, the sound data and current location information of the sound object are synthesized to generate object audio data.
  • the sound data of the sound object and the current location information are synthesized.
  • the sound data and the current location information can be combined according to time and saved in a specific file storage format to generate the object audio data.
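  • As an illustration of combining sound data and position information "according to time" in a file storage format, the sketch below writes audio parameters as a header and then, for each sampling moment, the object audio signals followed by the position metadata. The byte layout, magic bytes, and field choices are assumptions for illustration only, not the specific format defined by this disclosure:

```python
import io
import struct

def write_object_audio(fh, sample_rate, num_objects, frames):
    """frames: a sequence of (samples, positions) per sampling moment, where
    samples is a list of num_objects float audio samples and positions is a
    list of num_objects (theta, phi, r) tuples."""
    # Header: audio parameters saved as header file information (assumed layout).
    fh.write(struct.pack('<4sIH', b'OBJA', sample_rate, num_objects))
    for samples, positions in frames:
        # Object audio signal for this sampling moment.
        for s in samples:
            fh.write(struct.pack('<f', s))
        # Object audio auxiliary data (position metadata) for this moment.
        for theta, phi, r in positions:
            fh.write(struct.pack('<fff', theta, phi, r))

buf = io.BytesIO()
write_object_audio(buf, 48000, 2,
                   [([0.1, 0.2], [(0.0, 0.0, 1.0), (1.0, 0.5, 2.0)])] * 3)
```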
  • sound data of at least one sound object is obtained; current position information of at least one sound object is obtained; sound data and current position information of at least one sound object are synthesized to generate object audio data.
  • the position information of each sound object can be accurately obtained in real time, and the object audio data can be recorded and generated in real time.
  • the method may include but is not limited to the following steps:
  • obtaining the current location information of the sound object may be to obtain the current location information of the sound object while obtaining the sound data of the sound object, so as to obtain the sound data and location information of the sound object in real time.
  • obtaining the current location information of at least one sound object includes: obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object.
  • the sound signal of the sound object can be recorded through the recording terminal, and the sound data of the sound object can be obtained.
  • the sound data of the sound object can be recorded through one recording terminal; when there are multiple sound objects, the sound data of the sound object can be recorded through multiple recording terminals.
  • the recording terminal includes a microphone, and the sound data of the sound object can be recorded through the microphone in the recording terminal.
  • the current location information of at least one sound object is obtained.
  • the current location information of the recording terminal that records the sound data of the sound object can be obtained; when there are multiple sound objects, the current location information of the recording terminal recording each sound object can be obtained.
  • obtaining the current location information of at least one recording terminal that records sound data of at least one sound object includes: obtaining the current location information of at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • the current location information of at least one recording terminal that records the sound data of at least one sound object is obtained. When there is one sound object, the current location information of at least one recording terminal that records its sound data is obtained; when there are multiple sound objects, the current location information of at least one recording terminal that records the sound data of each sound object is obtained.
  • the current location information of the recording terminal can be obtained through a one-way transceiver method, or the current location information of at least one recording terminal can be obtained through a two-way transceiver method, or the current location information of the recording terminal can be obtained through a hybrid transceiver method.
  • the current location information of the recording terminal is obtained through a hybrid transceiver method, and the current location information of the recording terminal can be obtained through a one-way transceiver method and a two-way transceiver method.
  • obtaining the location information of at least one recording terminal in a hybrid transceiver mode includes: obtaining the first positioning reference information in a one-way transceiver mode; acquiring the second positioning reference information in a two-way transceiver mode; and determining the current location information of at least one recording terminal according to the first positioning reference information and the second positioning reference information.
  • the location information of the recording terminal is obtained through a hybrid transceiver method.
  • the first positioning reference information can be obtained through a one-way transceiver method, the second positioning reference information can be obtained through a two-way transceiver method, and the current location information of the recording terminal is determined according to the first positioning reference information and the second positioning reference information.
  • the first positioning reference information and the second positioning reference information are different.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of angle information and distance information
  • the location information of the recording terminal is obtained through a hybrid transceiver method
  • the angle information can be obtained through a one-way transceiver method
  • the distance information can be obtained through a two-way transceiver method
  • the current location information of the recording terminal is determined based on the angle information and distance information.
  • the location information of the recording terminal is obtained through a hybrid transceiver method
  • the distance information can be obtained through a one-way transceiver method
  • the angle information can be obtained through a two-way transceiver method
  • the current location information of the recording terminal is determined based on the distance information and angle information.
  • the first positioning reference information and the second positioning reference information can be obtained through sound waves or ultrasonic waves, or through electromagnetic wave signals such as UWB (Ultra Wide Band), WiFi, or BT (Bluetooth).
  • UWB Ultra Wide Band
  • WiFi Wireless Fidelity
  • obtaining the current location information of at least one recording terminal in a one-way transceiver mode includes: receiving a first positioning signal sent by at least one recording terminal in a broadcast manner, and generating the current location information of the at least one recording terminal based on the first positioning signal.
  • the current location information of the recording terminal is obtained through a one-way transceiver method, by receiving the first positioning signal sent by the recording terminal in a broadcast manner, and generating the current location information of the recording terminal based on the first positioning signal.
  • the current location information of the recording terminal can be obtained through the TDOA (time difference of arrival) method.
  • the first positioning signal sent by the recording terminal in the broadcast mode can be a sound wave or an ultrasonic wave, or it can also be an electromagnetic wave signal such as UWB (Ultra Wide Band), WiFi or BT.
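  • The TDOA (time difference of arrival) approach mentioned above can be illustrated with a toy sketch: several receivers with known positions measure the arrival-time differences of a broadcast positioning signal, and the source position is estimated as the point whose predicted differences best match the measurements. The receiver layout, brute-force grid search, and acoustic propagation speed below are assumptions for illustration, not the disclosure's implementation:

```python
import math

SPEED = 343.0  # assumed propagation speed (sound in air, m/s)

def simulate_tdoa(source, receivers):
    """Arrival-time differences relative to the first receiver."""
    times = [math.dist(source, rx) / SPEED for rx in receivers]
    return [t - times[0] for t in times]

def locate_tdoa(tdoas, receivers, step=0.1, size=10.0):
    """Brute-force least-squares grid search over a size x size area."""
    n = int(size / step)
    best, best_err = None, float('inf')
    for i in range(n + 1):
        for j in range(n + 1):
            p = (i * step, j * step)
            pred = simulate_tdoa(p, receivers)
            err = sum((a - b) ** 2 for a, b in zip(pred, tdoas))
            if err < best_err:
                best, best_err = p, err
    return best
```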
  • obtaining the location information of at least one recording terminal in a two-way transceiver mode includes: receiving a positioning start signal sent by at least one recording terminal in a broadcast manner; sending a response signal to the at least one recording terminal; and receiving a second positioning signal sent by the at least one recording terminal, and generating the current position information of the at least one recording terminal according to the second positioning signal.
  • the current location information of the recording terminal is obtained through a two-way transceiver method by receiving the positioning start signal sent by the recording terminal in a broadcast manner, sending a response signal to the recording terminal, receiving the second positioning signal sent by the recording terminal, and generating the current location information of the recording terminal according to the second positioning signal.
  • the location information of at least one recording terminal can be obtained through the TOF (time of flight) method.
  • the positioning start signal sent by the recording terminal in a broadcast manner can be a sound wave or an ultrasonic wave, or it can also be an electromagnetic wave signal such as UWB (Ultra Wide Band) or WiFi or BT.
  • the second positioning signal sent by the recording terminal can be a sound wave or an ultrasonic wave, or it can also be an electromagnetic wave signal such as UWB (Ultra Wide Band) or WiFi or BT.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • each recording terminal corresponds to a sound object; when there are multiple sound objects, the sound data of each sound object is recorded through its corresponding one or more recording terminals.
  • obtaining the current position information of at least one sound object includes: obtaining the current position information of at least one recording terminal that records the sound data of the at least one sound object.
  • the recording terminal corresponds to a sound object, and the positions of the recording terminal and the sound source of the sound object are relatively fixed. When the sound source of the sound object moves, the recording terminal moves along with the movement of the sound source of the sound object.
  • initial position information of at least one sound object is obtained.
  • the initial position information of the sound object and the current position information of the sound object are obtained, and the sound data of the sound object is obtained, thereby obtaining the sound data and position information of the sound object in real time.
  • the sound data, initial position information and current position information of the sound object are obtained, and the sound data and position information of the sound object can be obtained in real time.
  • S23 Synchronize the sound data and current location information of at least one sound object.
  • after the sound data and the current location information of the sound object are obtained, the sound data and the current location information of the sound object are synchronized.
  • the sound data and the current location information can be synchronized according to time.
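Time-based synchronization can be pictured as sample-and-hold alignment: each audio sampling instant is tagged with the most recent position update at or before it, since position updates arrive far more slowly than audio samples. A small illustrative sketch (the data shapes and function name are assumptions, not the patent's):

```python
from bisect import bisect_right

def synchronize(position_stream, sample_rate, num_samples):
    """For each audio sampling instant, attach the latest position
    update at or before that instant. `position_stream` is a
    time-sorted list of (timestamp_seconds, (x, y, z)) updates."""
    times = [t for t, _ in position_stream]
    out = []
    for n in range(num_samples):
        t = n / sample_rate
        i = bisect_right(times, t) - 1   # last update not after t
        out.append(position_stream[max(i, 0)][1])  # clamp before first update
    return out

# One object: at rest until t = 0.5 s, then at x = 1.0 m.
positions = [(0.0, (0.0, 0.0, 0.0)), (0.5, (1.0, 0.0, 0.0))]
synced = synchronize(positions, sample_rate=8, num_samples=8)
```

Interpolating between updates instead of holding the last value is an equally valid design choice when smoother trajectories are wanted.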
  • S24 Synthesize the sound data of at least one sound object and the current location information to generate object audio data.
  • after the sound data and current location information of at least one sound object are obtained in real time, the sound data and the current location information of the sound object are synthesized to generate object audio data.
  • synthesizing the sound data of at least one sound object and the current position information to generate the object audio data includes: obtaining audio parameters and using the audio parameters as header file information of the object audio data; and at each sampling moment, saving the sound data of each sound object as the object audio signal and saving the current position information as the object audio auxiliary data, to generate the object audio data.
  • the generated object audio data can be stored in multiple storage formats, such as a first format saved as a file, a second format that can be played in real time, etc.
  • in the first format, the sound data of at least one sound object is combined into one piece of audio information, which can be stored in raw-pcm format or uncompressed wav format (in this case, a single sound object is regarded as one channel of the wav file), or encoded into various compression formats.
  • the current position information of at least one sound object will also be combined and saved as object audio auxiliary data (Object Audio metadata).
  • in the second format (low delay mode), a certain length of time is taken as one frame; inside each frame, the data is saved in the same format as the file packing mode, and the sound data and the current position information at that time are concatenated together to become the object audio data of the frame.
  • the object audio data of each frame is sent to the playback device or saved in chronological order.
  • the audio parameters include the sampling rate, the bit width (bit depth), the number of sound objects N obj (number of objects), etc.
  • the audio parameters are used as the header file information of the object audio data; at each sampling moment, the sound data of each sound object is saved as the object audio signal, and the current position information is saved as the object audio auxiliary data, to generate the object audio data.
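One way to picture this packing step: a small header carries the audio parameters, then each sampling moment contributes every object's sample (the object audio signal) followed by every object's position (the object audio auxiliary data). The byte layout below is our illustrative choice, not the patent's exact format:

```python
import struct

def pack_object_audio(sample_rate, bits_per_sample, sounds, positions):
    """Pack a header (sampling rate, bit depth, object count), then for
    each sampling moment one 16-bit sample per object followed by that
    moment's (x, y, z) float position per object.
    sounds[obj][t] -> int sample; positions[obj][t] -> (x, y, z)."""
    n_obj = len(sounds)
    n_samples = len(sounds[0])
    header = struct.pack("<IHH", sample_rate, bits_per_sample, n_obj)
    body = bytearray()
    for t in range(n_samples):
        for obj in range(n_obj):          # object audio signal, natural order
            body += struct.pack("<h", sounds[obj][t])
        for obj in range(n_obj):          # object audio auxiliary data
            body += struct.pack("<3f", *positions[obj][t])
    return header + bytes(body)

sounds = [[100, -100], [7, 8]]            # 2 objects, 2 sampling moments
positions = [[(0.0, 0.0, 0.0)] * 2, [(1.0, 2.0, 0.5)] * 2]
blob = pack_object_audio(48000, 16, sounds, positions)
```

In practice the auxiliary data is usually written far less often than every sample (see the change-detection bullets below in spirit), but the interleaved structure is the same.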
  • [s51] obtain the number of sound objects N obj, the synchronized current position information of at least one sound object, and the sound data of the sound objects.
  • [s53a] record the basic audio parameters, such as the sampling rate, bit width (bit depth), number of sound objects N obj (number of objects), etc., as header file information into the object audio file.
  • the sound data of each sound object occupies a length of wBitsPerSample bits.
  • at each sampling time t, the sound data of the sound objects sampled at time t is obtained and recorded, in the natural order of the sound sources, after the object audio signal obtained at time t-1; the sound data of each sound object occupies a length of wBitsPerSample bits.
  • each parameter is: iSampleOffset: the serial number of the sampling point;
  • Object_index: the serial number of the currently recorded sound source;
  • Object_Radius: the radius r of the currently recorded sound source.
  • at each subsequent sampling point, determine whether the position of at least one sound object has changed; if so, save the position of the sound source whose position has changed at that sampling point (see Table 2 above for the storage format).
  • a certain time interval can be specified, for example judging and saving once every N sampling points, to save storage space.
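The change-detection saving described above can be sketched as: write every source's position once at the start, then at later sampling points (optionally checked only every N points) append a record only for sources that actually moved. The record fields loosely follow the iSampleOffset/Object_index idea; the function name, tuple layout, and tolerance are assumptions:

```python
def position_updates(track, every_n=1, tol=1e-6):
    """track[t][obj] is the (x, y, z) position of object `obj` at sampling
    point t. Returns sparse (sample_offset, object_index, position)
    records: all initial positions, then only positions that changed."""
    records = [(0, j, p) for j, p in enumerate(track[0])]  # initial positions
    last = list(track[0])
    for i in range(1, len(track), every_n):  # check every `every_n` points
        for j, p in enumerate(track[i]):
            if max(abs(a - b) for a, b in zip(p, last[j])) > tol:
                records.append((i, j, p))
                last[j] = p
    return records

# Two objects over four sampling points; only object 0 moves, at point 2.
track = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.5, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
recs = position_updates(track)
```

A larger `every_n` trades positional time resolution for smaller auxiliary data, matching the space-saving interval mentioned in the text.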
  • the audio parameters as the header file information of the object audio data, the sound data of the sound object as the object audio signal, and the current position information as the object audio auxiliary data are spliced together to generate complete object audio data.
  • the number of sound objects N obj and the synchronized current position information of at least one sound object are obtained.
  • the position information of the sound objects at all sampling points contained in the current frame is recorded; at each subsequent sampling time t, the sound data of the sound objects sampled at time t is obtained and recorded, in the natural order of the sound sources, after the object audio signal obtained at time t-1; the sound data of each sound object occupies a length of wBitsPerSample bits.
  • the audio parameters as the header file information of the object audio data, the sound data of the sound object as the object audio signal, and the current position information as the object audio auxiliary data are spliced together to generate complete object audio data.
  • the method further includes: saving the sound data and current location information in frame units.
  • the header file information is first recorded or transmitted.
  • the recorded sound data of the sound object and the recorded object audio auxiliary data are spliced together to become the object audio data of the frame.
  • the object audio data of each frame is spliced in chronological order and saved, or directly transmitted after obtaining one frame of object audio data each time to achieve low delay transmission.
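The low-delay framing can be sketched as follows: cut the stream into fixed-length frames, concatenate each frame's sound data with its position metadata, and hand each frame off as soon as it is complete. The frame length, byte layout, and function name below are illustrative assumptions:

```python
import struct

FRAME_LEN = 4  # sampling moments per frame (an assumed frame length)

def make_frames(sounds, positions):
    """Low-delay mode sketch: for each frame, splice the frame's 16-bit
    object audio samples with its per-object (x, y, z) float positions.
    Each returned bytes object is one transmittable frame."""
    n_obj, n_samples = len(sounds), len(sounds[0])
    frames = []
    for start in range(0, n_samples, FRAME_LEN):
        audio, meta = bytearray(), bytearray()
        for t in range(start, min(start + FRAME_LEN, n_samples)):
            for obj in range(n_obj):
                audio += struct.pack("<h", sounds[obj][t])
                meta += struct.pack("<3f", *positions[obj][t])
        frames.append(bytes(audio) + bytes(meta))  # splice audio + metadata
    return frames

sounds = [[1, 2, 3, 4, 5, 6, 7, 8]]       # one object, eight sampling moments
positions = [[(0.0, 0.0, 0.0)] * 8]
frames = make_frames(sounds, positions)
```

Smaller frames lower the end-to-end delay at the cost of more per-frame overhead, which is the trade-off the low delay mode makes.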
  • the combined object audio data is saved in memory or on disk as needed, or transferred to the playback device, or encoded into MPEG-H 3D Audio format, Dolby Atmos format, or another supported Object Audio encoding format and then saved or transmitted.
  • positioning technology can be used to obtain the current location information of each sound object in real time and accurately.
  • the object audio data can be recorded and generated in real time.
  • the embodiment of the present disclosure provides an exemplary embodiment.
  • the sound data of the sound object is obtained through a recording terminal.
  • each sound object collects sound data through a recording terminal; multiple recording terminals obtain the sound data of multiple sound objects, and the sound data of the sound objects is sent to the recording module to obtain the sound data of at least one sound object.
  • the recording terminal can send a positioning signal, which is received by several receiving ends (antennas or microphones) in the positioning module.
  • Figure 4 shows the sound signal being transmitted to the recording module in a wired manner, but it can also be transmitted wirelessly (WiFi, BT, etc.); the receiving end in the positioning module receives the positioning signal sent by the recording terminal and obtains the current location information of the sound object.
  • FIG. 4 only shows the situation of obtaining the current location information of the sound object using the one-way transceiver method.
  • the current location information of the sound object can also be obtained using the two-way transceiver method, or the hybrid transceiver method.
  • the recording terminal can also send positioning start signals as needed, and can also receive response signals returned by the positioning module.
  • in addition to receiving the positioning signal and the positioning start signal sent by the recording terminal, the positioning module can also send a response signal.
  • the sound signal of the corresponding sound object is first recorded through each recording device, and a ranging signal is emitted.
  • the process of combining the sound information (sound data) and position information (current position information) of each sound object to generate a complete object audio signal (object audio data) may specifically include:
  • [S302] determine the coordinate origin of the position information according to the position information of the positioning module, and assign an initial position to each sound object.
  • [S303] Demodulate the positioning signals received at each receiving device (antenna or microphone) of the positioning module and extract positioning features for subsequent positioning of each recording terminal through the features.
  • S305 to S306: determine from the received positioning features whether there is a positioning signal or a positioning start signal of a certain sound object; if so, obtain the information and apply the positioning solution corresponding to the positioning method. For example, when the one-way transceiver method is used, the TDOA (time difference of arrival) method is used to obtain the position information of the sound object; when the two-way transceiver method is used, the TOF (time of flight) method, a UWB indoor positioning solution, or the like is used. If the two-way transceiver method is adopted, the positioning module must conduct two-way data communication with each recording terminal.
  • TDOA time difference of arrival
  • TOF time of flight
  • the synchronization module obtains the sound information (sound data) of the sound object from the recording module and the position information (current position information) of the sound object from the positioning module, synchronizes them according to time, and sends the synchronized sound information (sound data) and position information (current position information) of the sound object to the combination module.
  • the combination module obtains the synchronized position information (current position information) and sound information (sound data) of each sound object from the synchronization module, and combines the position information (current position information) and sound information (sound data) of the sound objects to form a complete object audio signal.
  • the sound information (sound data) of each sound object will be combined into one piece of multi-object audio information, which can be saved in raw-pcm format or uncompressed wav format (in this case, a single object is regarded as one channel of the wav file), or encoded into various compression formats.
  • the position information of each sound object will also be combined together and saved as object audio auxiliary data (Object Audio metadata).
  • a certain time length is specified as one frame; within each frame, the data is saved in the same format as the file packing mode, and the sound information and audio auxiliary data at this time are concatenated together to become the object audio data of this frame. The object audio data of each frame is then sent to the playback device or saved in chronological order.
  • FIG. 7 is a structural diagram of a device for generating object audio data provided by an embodiment of the present disclosure.
  • the object audio data generating device 1 includes: a data acquisition unit 11 , an information acquisition unit 12 and a data generation unit 13 .
  • the data acquisition unit 11 is configured to acquire sound data of at least one sound object.
  • the information acquisition unit 12 is configured to acquire current location information of at least one sound object.
  • the data generating unit 13 is configured to synthesize the sound data of at least one sound object and the current location information to generate object audio data.
  • the information acquisition unit 12 is specifically configured to: acquire the current location information of at least one recording terminal that records the sound data of at least one sound object.
  • the object audio data generating device 1 also includes: a synchronization processing unit 14 configured to synchronize the sound data of at least one sound object and the current location information.
  • the information acquisition unit 12 is specifically configured to acquire the current location information of at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • the information acquisition unit 12 includes: a first information acquisition module 121, a second information acquisition module 122, and a first current information acquisition module 123.
  • the first information acquisition module 121 is configured to acquire the first positioning reference information in a one-way sending and receiving manner.
  • the second information acquisition module 122 is configured to acquire the second positioning reference information in a bidirectional sending and receiving manner.
  • the first current information acquisition module 123 is configured to determine the current location information of at least one recording terminal based on the first positioning reference information and the second positioning reference information.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of angle information and distance information
  • the information acquisition unit 12 includes: a second current information acquisition module 124, configured to receive a first positioning signal sent by at least one recording terminal in a broadcast manner, and generate current location information of the at least one recording terminal according to the first positioning signal.
  • the information acquisition unit 12 includes: a signal receiving module 125, a signal sending module 126 and a third current information acquiring module 127.
  • the signal receiving module 125 is configured to receive a positioning start signal sent by at least one recording terminal in a broadcast manner.
  • the signal sending module 126 is configured to send a response signal to at least one recording terminal.
  • the third current information acquisition module 127 is configured to receive a second positioning signal sent by at least one recording terminal, and generate current location information of at least one recording terminal according to the second positioning signal.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • the object audio data generating device 1 further includes: an initial position acquisition unit 15 configured to acquire initial position information of at least one sound object.
  • the data generation unit 13 includes: a parameter acquisition module 131 and an audio data generation module 132.
  • the parameter acquisition module 131 is configured to acquire audio parameters and use the audio parameters as header file information of the object audio data.
  • the audio data generation module 132 is configured to, at each sampling moment, save the sound data of each sound object as the object audio signal, and save the current position information as the object audio auxiliary data, to generate the object audio data.
  • the data generation unit 13 also includes: a processing module 133.
  • the processing module 133 is configured to save the sound data and current location information in frame units.
  • the object audio data generation device provided by the embodiments of the present disclosure can perform the object audio data generation method described in some of the above embodiments, and its beneficial effects are the same as those of the object audio data generation method described above, and will not be described again here.
  • FIG. 14 is a structural diagram of an electronic device 100 used for a method of generating object audio data according to an exemplary embodiment.
  • the electronic device 100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • the electronic device 100 may include one or more of the following components: a processing component 101 , a memory 102 , a power supply component 103 , a multimedia component 104 , an audio component 105 , an input/output (I/O) interface 106 , and a sensor. component 107, and communications component 108.
  • the processing component 101 generally controls the overall operations of the electronic device 100, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 101 may include one or more processors 1011 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 101 may include one or more modules that facilitate interaction between processing component 101 and other components. For example, processing component 101 may include a multimedia module to facilitate interaction between multimedia component 104 and processing component 101 .
  • Memory 102 is configured to store various types of data to support operations at electronic device 100 . Examples of such data include instructions for any application or method operating on the electronic device 100, contact data, phonebook data, messages, pictures, videos, etc.
  • the memory 102 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as SRAM (Static Random-Access Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), ROM (Read-Only Memory), magnetic memory, flash memory, magnetic disk, or optical disk.
  • SRAM Static Random-Access Memory
  • EEPROM Electrically Erasable Programmable Read-Only Memory
  • EPROM Erasable Programmable Read-Only Memory
  • PROM Programmable Read-Only Memory
  • Power supply component 103 provides power to various components of electronic device 100 .
  • Power supply components 103 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 100 .
  • Multimedia component 104 includes a touch-sensitive display screen that provides an output interface between the electronic device 100 and the user.
  • the touch display screen may include LCD (Liquid Crystal Display) and TP (Touch Panel).
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 104 includes a front-facing camera and/or a rear-facing camera. When the electronic device 100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data.
  • each front-facing camera and rear-facing camera can be a fixed optical lens system or have focus and optical zoom capabilities.
  • Audio component 105 is configured to output and/or input audio signals.
  • the audio component 105 includes a MIC (Microphone), and when the electronic device 100 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signals may be further stored in memory 102 or sent via communications component 108 .
  • audio component 105 also includes a speaker for outputting audio signals.
  • the I/O interface 106 provides an interface between the processing component 101 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 107 includes one or more sensors for providing various aspects of status assessment for electronic device 100 .
  • the sensor component 107 can detect the open/closed state of the electronic device 100 and the relative positioning of components, such as the display and keypad of the electronic device 100; the sensor component 107 can also detect a change in the position of the electronic device 100 or a component of the electronic device 100, the presence or absence of user contact with the electronic device 100, the orientation or acceleration/deceleration of the electronic device 100, and a change in the temperature of the electronic device 100.
  • Sensor assembly 107 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • the sensor component 107 may also include a light sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) or a CCD (Charge-coupled Device) image sensor for use in imaging applications.
  • CMOS Complementary Metal Oxide Semiconductor
  • CCD Charge-coupled Device
  • the sensor component 107 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 108 is configured to facilitate wired or wireless communication between electronic device 100 and other devices.
  • the electronic device 100 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 108 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 108 also includes an NFC (Near Field Communication) module to facilitate short-range communication.
  • the NFC module can be implemented based on RFID (Radio Frequency Identification) technology, IrDA (Infrared Data Association) technology, UWB (Ultra Wide Band) technology, BT (Bluetooth) technology, or other technologies.
  • the electronic device 100 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above object audio data generation method.
  • ASIC Application Specific Integrated Circuit
  • DSP Digital Signal Processor
  • DSPD Digital Signal Processing Device
  • PLD Programmable Logic Device
  • FPGA Field Programmable Gate Array
  • the electronic device 100 provided by the embodiments of the present disclosure can perform the object audio data generation method as described in some of the above embodiments, and its beneficial effects are the same as those of the object audio data generation method described above, which will not be described again here.
  • the present disclosure also proposes a storage medium.
  • when the instructions in the storage medium are executed by a processor of the electronic device, the electronic device can execute the object audio data generation method as described above.
  • the storage medium can be a ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, etc.
  • the present disclosure also provides a computer program product.
  • when the computer program is executed by a processor of an electronic device, the electronic device can perform the object audio data generation method as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)

Abstract

Disclosed in embodiments of the present disclosure are an object audio data generation method and apparatus, an electronic device, and a storage medium. The method comprises: obtaining sound data of at least one sound object; obtaining current position information of the at least one sound object; and synthesizing the sound data and the current position information of the at least one sound object to generate object audio data. Therefore, position information of each sound object can be accurately obtained in real time, so that object audio data can be recorded and generated in real time.

Description

Object audio data generation method, device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of communication technology, and in particular, to a method, device, electronic device, and storage medium for generating object audio data.

Background Art

MPEG-H 3D Audio, the next-generation audio codec standard of MPEG (Moving Picture Experts Group), is the ISO/IEC 23008-3 international standard. This standard framework uses a new audio format, Object Audio, which can mark the direction of a sound so that the listener hears the sound coming from a specific direction, whether using headphones or loudspeakers and regardless of the number of loudspeaker channels.

In the related art, object audio data is generated by pre-recording monaural audio and later combining it with pre-prepared position information of the monaural audio. This approach relies on post-production equipment, and there is still a lack of a method for recording object audio data of sound objects in real time.

Summary of the Invention

Embodiments of the present disclosure provide a method, device, electronic device, and storage medium for generating object audio data, which can accurately obtain the position information of each sound object in real time and record and generate object audio data in real time.

In a first aspect, embodiments of the present disclosure provide a method for generating object audio data. The method includes: obtaining sound data of at least one sound object; obtaining current location information of the at least one sound object; and synthesizing the sound data and the current location information of the at least one sound object to generate object audio data.

In this technical solution, sound data of at least one sound object is obtained; current position information of the at least one sound object is obtained; and the sound data and the current position information of the at least one sound object are synthesized to generate object audio data. As a result, the position information of each sound object can be accurately obtained in real time, and object audio data can be recorded and generated in real time.

In some embodiments, obtaining the current location information of the at least one sound object includes: obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object.

In some embodiments, before synthesizing the sound data of the at least one sound object and the current location information, the method further includes: synchronizing the sound data of the at least one sound object and the current location information.

In some embodiments, obtaining the current location information of the at least one recording terminal that records the sound data of the at least one sound object includes: obtaining the current location information of the at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.

In some embodiments, obtaining the location information of the at least one recording terminal in the hybrid transceiver mode includes: obtaining first positioning reference information in the one-way transceiver mode; obtaining second positioning reference information in the two-way transceiver mode; and determining the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.

In some embodiments, the first positioning reference information is one of angle information and distance information, and the second positioning reference information is the other of the angle information and the distance information.

In some embodiments, obtaining the current location information of the at least one recording terminal in the one-way transceiver mode includes: receiving a first positioning signal sent by the at least one recording terminal in a broadcast manner, and generating the current location information of the at least one recording terminal according to the first positioning signal.

在一些实施例中,所述以所述双向收发方式获取所述至少一个录音终端的位置信息,包括:接收所述至少一个录音终端以广播方式发送的定位起始信号;向所述至少一个录音终端发送应答信号;接收所述至少一个录音终端发送的第二定位信号,并根据所述第二定位信号生成所述至少一个录音终端的当前位置信息。In some embodiments, obtaining the location information of the at least one recording terminal in the two-way transceiver mode includes: receiving a positioning start signal sent by the at least one recording terminal in a broadcast manner; The terminal sends a response signal; receives a second positioning signal sent by the at least one recording terminal, and generates current location information of the at least one recording terminal according to the second positioning signal.

In some embodiments, each recording terminal corresponds to one sound object, and the position of the recording terminal moves along with the sound source of that sound object.

In some embodiments, the method further includes: obtaining initial position information of the at least one sound object.

In some embodiments, synthesizing the sound data and the current location information of the at least one sound object to generate the object audio data includes: obtaining audio parameters, and using the audio parameters as header information of the object audio data; and, at each sampling instant, saving the sound data of each sound object as an object audio signal and saving the current location information as object audio auxiliary data, to generate the object audio data.

In some embodiments, the method further includes: saving the sound data and the current location information in units of frames.

In a second aspect, embodiments of the present disclosure provide an apparatus for generating object audio data, including: a data acquisition unit configured to acquire sound data of at least one sound object; an information acquisition unit configured to acquire current location information of the at least one sound object; and a data generation unit configured to synthesize the sound data and the current location information of the at least one sound object to generate object audio data.

In some embodiments, the information acquisition unit is specifically configured to: acquire current location information of at least one recording terminal that records the sound data of the at least one sound object.

In some embodiments, the apparatus further includes: a synchronization processing unit configured to synchronize the sound data of the at least one sound object with the current location information.

In some embodiments, the information acquisition unit is specifically configured to acquire the current location information of the at least one recording terminal in a one-way transceiving mode, a two-way transceiving mode, or a hybrid transceiving mode.

In some embodiments, the information acquisition unit includes: a first information acquisition module configured to acquire first positioning reference information in the one-way transceiving mode; a second information acquisition module configured to acquire second positioning reference information in the two-way transceiving mode; and a first current-information acquisition module configured to determine the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.

In some embodiments, the first positioning reference information is one of angle information and distance information, and the second positioning reference information is the other of the angle information and the distance information.

In some embodiments, the information acquisition unit includes: a second current-information acquisition module configured to receive a first positioning signal broadcast by the at least one recording terminal and to generate the current location information of the at least one recording terminal according to the first positioning signal.

In some embodiments, the information acquisition unit includes: a signal receiving module configured to receive a positioning start signal broadcast by the at least one recording terminal; a signal sending module configured to send a response signal to the at least one recording terminal; and a third current-information acquisition module configured to receive a second positioning signal sent by the at least one recording terminal and to generate the current location information of the at least one recording terminal according to the second positioning signal.

In some embodiments, each recording terminal corresponds to one sound object, and the position of the recording terminal moves along with the sound source of that sound object.

In some embodiments, the apparatus further includes: an initial position acquisition unit configured to acquire initial position information of the at least one sound object.

In some embodiments, the data generation unit includes: a parameter acquisition module configured to acquire audio parameters and use the audio parameters as header information of the object audio data; and an audio data generation module configured to, at each sampling instant, save the sound data of each sound object as an object audio signal and save the current location information as object audio auxiliary data, to generate the object audio data.

In some embodiments, the data generation unit further includes: a processing module configured to save the sound data and the current location information in units of frames.

In a third aspect, embodiments of the present disclosure provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method described in the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause a computer to perform the method described in the first aspect.

In a fifth aspect, embodiments of the present disclosure provide a computer program product including computer instructions, where the computer instructions, when executed by a processor, implement the method described in the first aspect.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.

Description of the Drawings

To describe the technical solutions in the embodiments of the present disclosure or in the background art more clearly, the accompanying drawings required by the embodiments or the background art are described below.

Fig. 1 is a flowchart of a method for generating object audio data according to an embodiment of the present disclosure;

Fig. 2 is a flowchart of another method for generating object audio data according to an embodiment of the present disclosure;

Fig. 3 is a flowchart of yet another method for generating object audio data according to an embodiment of the present disclosure;

Fig. 4 is a flowchart of yet another method for generating object audio data according to an embodiment of the present disclosure;

Fig. 5 is a flowchart of yet another method for generating object audio data according to an embodiment of the present disclosure;

Fig. 6 is a flowchart of yet another method for generating object audio data according to an embodiment of the present disclosure;

Fig. 7 is a structural diagram of an apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 8 is a structural diagram of another apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 9 is a structural diagram of an information acquisition unit in the apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 10 is a structural diagram of another information acquisition unit in the apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 11 is a structural diagram of yet another information acquisition unit in the apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 12 is a structural diagram of yet another apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 13 is a structural diagram of a data generation unit in the apparatus for generating object audio data according to an embodiment of the present disclosure;

Fig. 14 is a structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.

Unless the context requires otherwise, throughout the specification and claims the term "including" is to be interpreted in an open, inclusive sense, that is, as "including, but not limited to". In the description of this specification, terms such as "some embodiments" are intended to indicate that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. Such schematic representations do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. The terms "first" and "second" are used for descriptive purposes only, and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the disclosure described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

In the present disclosure, "at least one" may also be described as one or more, and "a plurality" may be two, three, four, or more, which is not limited by the present disclosure. In the embodiments of the present disclosure, a technical feature may be distinguished from other technical features of the same kind by "first", "second", "third", "A", "B", "C", "D", and the like; the technical features described by "first", "second", "third", "A", "B", "C", and "D" imply no order of precedence or magnitude.

The correspondences shown in the tables of the present disclosure may be configured or predefined. The values of the information in the tables are merely examples and may be configured as other values, which is not limited by the present disclosure. When configuring the correspondence between information and parameters, it is not necessarily required that all the correspondences shown in the tables be configured. For example, the correspondences shown in some rows of a table in the present disclosure may not be configured. As another example, appropriate adjustments, such as splitting or merging, may be made based on the tables. The names of the parameters shown in the table headers may also be other names understandable by a communication apparatus, and the values or representations of the parameters may also be other values or representations understandable by the communication apparatus. When implemented, the tables may also adopt other data structures, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present disclosure.

In the related art, methods for obtaining object audio data cannot directly record object audio data and cannot obtain the real position information of sound objects. Embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a storage medium for generating object audio data, so as to accurately obtain the position information of each sound object in real time and to record and generate object audio data in real time, thereby solving the problems in the related art.

Specifically, the method, apparatus, electronic device, and storage medium for generating object audio data provided by the embodiments of the present disclosure are described below with reference to the accompanying drawings.

It should be noted that the method for generating object audio data in the embodiments of the present disclosure may be executed by the apparatus for generating object audio data of the embodiments of the present disclosure. The apparatus may be implemented by software and/or hardware, and may be configured in an electronic device capable of installing and running a program for generating object audio data. The electronic device may include, but is not limited to, hardware devices with various operating systems, such as smartphones and tablet computers.

In the embodiments of the present disclosure, position information refers to the position of each microphone or sound object relative to a listener (audience), with the listener as the origin. The position information may be expressed in a rectangular coordinate system (x, y, z) or in a spherical coordinate system (θ, γ, r). The two can be converted by the following formula (1).

x = r·cosγ·cosθ

y = r·cosγ·sinθ

z = r·sinγ   (1)

In formula (1), x, y, and z respectively represent the position coordinates of the microphone or sound object on the x-axis (front-back direction), y-axis (left-right direction), and z-axis (up-down direction) of the rectangular coordinate system. θ, γ, and r respectively represent, in the spherical coordinate system, the horizontal angle of the microphone or sound object (the angle between the x-axis and the projection onto the horizontal plane of the line connecting the microphone or sound object to the origin), the vertical angle (the angle between the horizontal plane and the line connecting the microphone or sound object to the origin), and the straight-line distance of the microphone or sound object from the origin.

The position information described above is position information in a three-dimensional coordinate system. In a two-dimensional coordinate system, the position information may be expressed in a rectangular coordinate system (x, y) or in a polar coordinate system (θ, r). The two can be converted by the following formula (2).

x = r·cosθ

y = r·sinθ   (2)

The meaning of each variable in formula (2) is the same as in formula (1).

Whichever coordinate system is used for expression (a rectangular coordinate system, a spherical coordinate system, or another coordinate system), and whatever transformation is applied, such as changing the origin of the coordinate system, neither affects the specific implementation of the present disclosure nor the assertion of the rights of the present disclosure.

Therefore, for simplicity, position information is expressed in the spherical (polar) coordinate system in the description of the present disclosure.
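As an illustrative sketch only (the function names below are ours, not part of the disclosure), the conversions of formulas (1) and (2) between the rectangular and spherical representations can be written in Python as:

```python
import math

def cartesian_to_spherical(x, y, z):
    """Convert rectangular coordinates (x: front-back, y: left-right,
    z: up-down) to the spherical form (theta, gamma, r) used here:
    theta = horizontal angle, gamma = vertical angle, r = distance."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)                 # angle in the horizontal plane
    gamma = math.atan2(z, math.hypot(x, y))  # elevation above the horizontal plane
    return theta, gamma, r

def spherical_to_cartesian(theta, gamma, r):
    """Inverse conversion, per formula (1)."""
    x = r * math.cos(gamma) * math.cos(theta)
    y = r * math.cos(gamma) * math.sin(theta)
    z = r * math.sin(gamma)
    return x, y, z
```

Setting gamma to zero reduces the conversion to the two-dimensional case of formula (2).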

In the embodiments of the present disclosure, object audio generally refers to any sound format that can describe a sound object (audio object). A point sound object containing position information, or a surface sound object whose center position can be roughly determined, may serve as a sound object. Object audio generally consists of two parts: the sound signal itself (audio data), and the accompanying position information (object audio metadata). The sound signal itself can be regarded as a mono audio signal, and may be in an uncompressed format such as PCM (pulse-code modulation) or DSD (Direct Stream Digital), or in a compressed format such as MP3 (MPEG-1 or MPEG-2 Audio Layer III), AAC (Advanced Audio Coding), or Dolby Digital. The accompanying position information is the position information described above at any time t.

When there are multiple object audios, the format may combine the sound signal and position information of each object audio separately; alternatively, the sound signals of all objects may be grouped together and the position information grouped together, with correspondence information added to the sound signals or the position information indicating which sound signal corresponds to which item of position information.
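To make the two packing options concrete, here is a minimal in-memory sketch; the field names ("signal", "metadata", "mapping") are illustrative assumptions, not a format defined by the disclosure:

```python
def pack_per_object(objects):
    """Option 1: each object keeps its own (signal, metadata) pair.
    objects: list of dicts with 'signal' (mono samples) and
    'positions' (list of (t, theta, gamma, r) entries)."""
    return [{"signal": o["signal"], "metadata": o["positions"]} for o in objects]

def pack_combined(objects):
    """Option 2: group all signals together and all metadata together,
    plus correspondence info mapping the i-th signal to the i-th
    position track."""
    return {
        "signals": [o["signal"] for o in objects],
        "metadata": [o["positions"] for o in objects],
        "mapping": list(range(len(objects))),  # signal i <-> metadata i
    }
```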

Referring to Fig. 1, Fig. 1 is a flowchart of a method for generating object audio data according to an embodiment of the present disclosure.

As shown in Fig. 1, the method may include, but is not limited to, the following steps.

S1: acquiring sound data of at least one sound object.

In the embodiments of the present disclosure, to acquire the sound data of at least one sound object, a sound collection device may record the sound signal of a sound object to obtain its sound data. The at least one sound object may include one or more sound objects: in the case of one sound object, the sound signal of the sound object is recorded by one sound collection device; in the case of multiple sound objects, the sound signals of the multiple sound objects are recorded by multiple sound collection devices.

The sound collection device may be any device capable of collecting sound information, such as a microphone, which is not specifically limited in the embodiments of the present disclosure.

S2: acquiring current location information of the at least one sound object.

In the embodiments of the present disclosure, acquiring the current location information of a sound object may mean acquiring the current location information of the sound object while acquiring its sound data, so that the sound data and location information of the sound object are obtained in real time.

For each sound object, the sound data may be acquired through one or more sound collection devices. In the embodiments of the present disclosure, when the position of the sound collection device relative to the sound object is fixed, the current location information of the sound collection device can be acquired, and the current location information of the sound object can then be determined according to the relative positional relationship between the sound collection device and the sound object.

In the embodiments of the present disclosure, when the position of the sound collection device relative to the sound object is fixed, the sound collection device moves as the sound object moves, so that the location information of each sound object can be acquired in real time.

It should be noted that, in the embodiments of the present disclosure, the current location information of the sound collection device may be acquired by an ultrasonic positioning method. For example, the sound collection device is provided with an ultrasonic transceiver, and the current location information of the sound collection device can be acquired by collecting the ultrasonic signals of the sound collection device. Other methods may also be used, which is not specifically limited in the embodiments of the present disclosure.
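As a rough illustration of how such ultrasonic ranging could work (the speed-of-sound constant and the function names are our assumptions; the disclosure does not fix a concrete scheme), the distance to a transceiver follows from the propagation time of the ultrasonic signal:

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: speed of sound in air at ~20 °C

def distance_from_time_of_flight(t_emit, t_receive):
    """One-way ranging: if the emitter and receiver clocks are
    synchronized, the propagation delay directly gives the distance."""
    return (t_receive - t_emit) * SPEED_OF_SOUND_M_S

def distance_from_round_trip(t_round_trip, t_processing=0.0):
    """Two-way ranging: half of the round-trip time (minus any known
    processing delay at the responder) gives the one-way distance,
    without requiring synchronized clocks."""
    return (t_round_trip - t_processing) / 2.0 * SPEED_OF_SOUND_M_S
```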

S3: synthesizing the sound data and the current location information of the at least one sound object to generate object audio data.

In the embodiments of the present disclosure, when the sound data and current location information of the at least one sound object have been acquired in real time, the sound data and the current location information of the sound object are synthesized to generate the object audio data.

The synthesis of the sound data and the current location information of the sound object may be performed by combining the sound data and the current location information according to time and saving them in a specific file storage format, thereby generating the object audio data.
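The combination by time described above can be sketched as follows; the record layout and field names are illustrative assumptions, not a concrete file format from the disclosure:

```python
def synthesize_object_audio(audio_params, frames):
    """Combine sound data and position information by time.

    audio_params: dict of audio parameters (e.g. sampling rate, object
        count), stored as the header of the object audio data.
    frames: iterable of (timestamp, samples_per_object,
        positions_per_object) tuples collected in real time.
    """
    return {
        "header": dict(audio_params),
        "frames": [
            # At each sampling instant, the sound data of each object is
            # kept as the object audio signal, and the position is kept
            # as object audio auxiliary data.
            {"t": t, "audio": list(samples), "metadata": list(positions)}
            for t, samples, positions in frames
        ],
    }
```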

By implementing the embodiments of the present disclosure, sound data of at least one sound object is acquired; current location information of the at least one sound object is acquired; and the sound data and current location information of the at least one sound object are synthesized to generate object audio data. In this way, the location information of each sound object can be acquired accurately in real time, and the object audio data can be recorded and generated in real time.

As shown in Fig. 2, the method may include, but is not limited to, the following steps.

S21: acquiring sound data of at least one sound object.

For a description of S21, reference may be made to the related description in the foregoing embodiments, which is not repeated here.

S22: acquiring current location information of the at least one sound object.

In the embodiments of the present disclosure, acquiring the current location information of a sound object may mean acquiring the current location information of the sound object while acquiring its sound data, so that the sound data and location information of the sound object are obtained in real time.

In some embodiments, acquiring the current location information of the at least one sound object includes: acquiring current location information of at least one recording terminal that records the sound data of the at least one sound object.

In the embodiments of the present disclosure, a recording terminal may record the sound signal of a sound object to obtain its sound data. When there is one sound object, its sound data may be recorded by one recording terminal; when there are multiple sound objects, their sound data may be recorded by multiple recording terminals.

The recording terminal includes a microphone, and the sound data of the sound object may be recorded through the microphone in the recording terminal.

In the embodiments of the present disclosure, acquiring the current location information of the at least one sound object may be achieved by acquiring the current location information of the recording terminal or terminals that record the sound data of the one or more sound objects.

在一些实施例中,获取录制至少一个声音对象的声音数据的至少一个录音终端的当前位置信息,包括:以单向收发方式、双向收发方式或混合收发方式获取至少一个录音终端的当前位置信息。In some embodiments, obtaining the current location information of at least one recording terminal that records sound data of at least one sound object includes: obtaining the current location information of at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.

本公开实施例中,获取录制至少一个声音对象的声音数据的至少一个录音终端的当前位置信息,在存在一个声音对象的情况下,获取录制声音对象的声音数据的至少一个录音终端的的当前位置信息,在存在多个声音对象的情况下,获取录制每一个声音对象的声音数据的至少一个录音终端的当前位置信息。In the embodiment of the present disclosure, the current location information of at least one recording terminal that records the sound data of at least one sound object is obtained. If there is a sound object, the current location information of at least one recording terminal that records the sound data of the sound object is obtained. Information, when there are multiple sound objects, obtain the current location information of at least one recording terminal that records the sound data of each sound object.

其中,可以通过单向收发方式获取录音终端的当前位置信息,或者通过双向收发方式获取至少一个录音终端的当前位置信息,或者通过混合收发方式获取录音终端的当前位置信息。Among them, the current location information of the recording terminal can be obtained through a one-way transceiver method, or the current location information of at least one recording terminal can be obtained through a two-way transceiver method, or the current location information of the recording terminal can be obtained through a hybrid transceiver method.

In the hybrid transceiving mode, the current location information of the recording terminal is obtained by using the one-way transceiving mode and the two-way transceiving mode together.

In some embodiments, obtaining the location information of the at least one recording terminal in the hybrid transceiving mode includes: obtaining first positioning reference information in the one-way transceiving mode; obtaining second positioning reference information in the two-way transceiving mode; and determining the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.

In the embodiments of the present disclosure, when the location information of the recording terminal is obtained in the hybrid transceiving mode, the first positioning reference information is obtained in the one-way transceiving mode, the second positioning reference information is obtained in the two-way transceiving mode, and the current location information of the recording terminal is determined according to the first positioning reference information and the second positioning reference information.

The first positioning reference information and the second positioning reference information are different.

In some embodiments, the first positioning reference information is one of angle information and distance information, and the second positioning reference information is the other of angle information and distance information.

In the embodiments of the present disclosure, when the location information of the recording terminal is obtained in the hybrid transceiving mode, the angle information may be obtained in the one-way transceiving mode and the distance information in the two-way transceiving mode, and the current location information of the recording terminal is determined according to the angle information and the distance information.

Alternatively, the distance information may be obtained in the one-way transceiving mode and the angle information in the two-way transceiving mode, and the current location information of the recording terminal is determined according to the distance information and the angle information.
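As a concrete illustration of how an angle fix and a range fix can be combined into a position, the sketch below converts an azimuth/elevation pair and a distance into Cartesian coordinates relative to the positioning module. The function name and the degree-based interface are illustrative assumptions, not part of the disclosure.

```python
import math

def position_from_angle_distance(azimuth_deg, elevation_deg, distance):
    # Angle fix (e.g. obtained in the one-way mode) combined with a range fix
    # (e.g. obtained in the two-way mode); the positioning module sits at the
    # coordinate origin.
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = distance * math.cos(el) * math.cos(az)
    y = distance * math.cos(el) * math.sin(az)
    z = distance * math.sin(el)
    return (x, y, z)
```

For example, a terminal at azimuth 90 degrees, elevation 0, one metre away lies on the positive y-axis.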

In the embodiments of the present disclosure, the first positioning reference information and the second positioning reference information may be obtained through sound waves or ultrasonic waves, or through electromagnetic wave signals such as UWB (Ultra Wide Band), WiFi, or BT (Bluetooth).

In some embodiments, obtaining the current location information of the at least one recording terminal in the one-way transceiving mode includes: receiving a first positioning signal broadcast by the at least one recording terminal, and generating the current location information of the at least one recording terminal according to the first positioning signal.

In the embodiments of the present disclosure, the current location information of the recording terminal is obtained in the one-way transceiving mode by receiving the first positioning signal broadcast by the recording terminal and generating the current location information of the recording terminal according to the first positioning signal. The current location information of the recording terminal can be obtained by the TDOA (time difference of arrival) method.
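A minimal sketch of the quantity the TDOA method works with: since the receivers do not know when the broadcast positioning signal was emitted, only differences of arrival times are usable, and each difference maps to a range difference. The helper below and the fixed speed of sound are illustrative assumptions; solving the resulting hyperbolic equations for the actual position is omitted.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value for air

def range_differences(arrival_times, reference_index=0):
    # One-way reception: the emit time of the broadcast signal is unknown,
    # so only arrival-time differences carry information.  Each difference,
    # scaled by the propagation speed, is a range difference that constrains
    # the terminal to one hyperbola per receiver pair.
    t_ref = arrival_times[reference_index]
    return [SPEED_OF_SOUND * (t - t_ref) for t in arrival_times]
```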

The first positioning signal broadcast by the recording terminal may be a sound wave or an ultrasonic wave, or an electromagnetic wave signal such as UWB (Ultra Wide Band), WiFi, or BT.

In some embodiments, obtaining the location information of the at least one recording terminal in the two-way transceiving mode includes: receiving a positioning start signal broadcast by the at least one recording terminal; sending a response signal to the at least one recording terminal; and receiving a second positioning signal sent by the at least one recording terminal, and generating the current location information of the at least one recording terminal according to the second positioning signal.

In the embodiments of the present disclosure, the current location information of the recording terminal is obtained in the two-way transceiving mode by receiving the positioning start signal broadcast by the recording terminal, sending a response signal to the recording terminal, receiving the second positioning signal sent by the recording terminal, and generating the current location information of the recording terminal according to the second positioning signal. The location information of the at least one recording terminal can be obtained by the TOF (time of flight) method.
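The two-way exchange above can be sketched as a round-trip time measurement. The helper below assumes a known processing delay at the responder; it is an illustration, not the disclosure's exact protocol.

```python
def tof_distance(t_start, t_reply_received, processing_delay, speed=343.0):
    # Two-way exchange: the measuring side notes when the positioning start
    # signal went out and when the response came back; subtracting the known
    # processing delay at the other end leaves twice the one-way flight time.
    round_trip = t_reply_received - t_start
    one_way = (round_trip - processing_delay) / 2.0
    return speed * one_way
```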

The positioning start signal broadcast by the recording terminal may be a sound wave or an ultrasonic wave, or an electromagnetic wave signal such as UWB (Ultra Wide Band), WiFi, or BT.

The second positioning signal sent by the recording terminal may likewise be a sound wave or an ultrasonic wave, or an electromagnetic wave signal such as UWB (Ultra Wide Band), WiFi, or BT.

In some embodiments, each recording terminal corresponds to one sound object, and the position of the recording terminal moves along with the sound source of that sound object.

In the embodiments of the present disclosure, each recording terminal corresponds to one sound object; where a sound object exists, its sound data is recorded through one or more corresponding recording terminals.

Since the position of the recording terminal moves with the sound source of the sound object, it can be understood that, in the embodiments of the present disclosure, obtaining the current location information of at least one sound object includes obtaining the current location information of the at least one recording terminal that records the sound data of the at least one sound object. The recording terminal corresponds to a sound object, and the position of the recording terminal relative to the sound source of the sound object is fixed; when the sound source of the sound object moves, the recording terminal moves along with it.

In some embodiments, initial position information of the at least one sound object is obtained.

In the embodiments of the present disclosure, the initial position information of the sound object, the current location information of the sound object, and the sound data of the sound object are obtained, so that the sound data and position information of the sound object are acquired in real time.

That is, by obtaining the sound data, the initial position information, and the current location information of the sound object, the sound data and position information of the sound object can be acquired in real time.

S23: synchronize the sound data and the current location information of the at least one sound object.

In the embodiments of the present disclosure, after the sound data and the current location information of the sound object are obtained, the sound data and the current location information of the sound object are synchronized; the synchronization may be performed according to time.
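One simple way to realize time-based synchronization is a zero-order hold: each audio frame is paired with the most recent position fix at or before its timestamp. The pairing rule in the sketch below is an illustrative assumption, not the mandated scheme.

```python
def synchronize(sound_frames, position_fixes):
    # sound_frames: time-ordered (timestamp, samples) pairs.
    # position_fixes: time-ordered (timestamp, position) pairs, with a fix
    # at or before the first audio timestamp.
    # Each frame is paired with the latest fix not newer than the frame.
    out, j = [], 0
    for t, samples in sound_frames:
        while j + 1 < len(position_fixes) and position_fixes[j + 1][0] <= t:
            j += 1
        out.append((t, samples, position_fixes[j][1]))
    return out
```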

S24: synthesize the sound data and the current location information of the at least one sound object to generate object audio data.

In the embodiments of the present disclosure, after the sound data and the current location information of the at least one sound object have been obtained in real time, the sound data and the current location information of the sound object are synthesized to generate the object audio data.

In some embodiments, synthesizing the sound data and the current location information of the at least one sound object to generate the object audio data includes: obtaining audio parameters and using the audio parameters as header information of the object audio data; and, at each sampling instant, saving the sound data of each sound object as an object audio signal and saving the current location information as object audio auxiliary data, so as to generate the object audio data.

In the embodiments of the present disclosure, the generated object audio data can be stored in multiple formats, for example, a first format saved as a file and a second format that can be played in real time.

Illustratively, in the first format, file packing mode[], the sound data of the at least one sound object is combined into one piece of audio information, which may be saved in raw-pcm format, in uncompressed wav format (in which case a single sound object is regarded as one channel of the wav file), or encoded into various compressed formats. The current location information of the at least one sound object is likewise combined and saved as object audio auxiliary data (Object Audio metadata).

Illustratively, in the second format, low delay mode[], a certain length of time forms one frame; inside each frame the data is saved in the same format as in file packing mode, and the sound data and current location information of that period are concatenated to form the object audio data of the frame. The object audio data of each frame is then sent to the playback device or saved in chronological order.

In the embodiments of the present disclosure, obtaining the audio parameters may include obtaining the sampling rate, the bit depth, the number of sound objects N obj (number of objects), and the like; the audio parameters serve as the header information of the object audio data. For each sampling instant, the sound data of each sound object is saved as the object audio signal and the current location information is saved as the object audio auxiliary data, so as to generate the object audio data.
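As an illustration of writing the audio parameters as header information, the sketch below serializes a sampling rate, bit depth, and object count. The magic string, field order, and byte layout are assumptions for demonstration only, not the format mandated by the disclosure.

```python
import struct

def pack_header(sampling_rate, bit_depth, n_obj):
    # Header information of the object audio data: sampling rate, bit depth,
    # and number of sound objects, serialized little-endian behind a magic
    # string.  The layout is chosen for this example only.
    return struct.pack("<4sIHH", b"OBJA", sampling_rate, bit_depth, n_obj)
```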

In one possible implementation, as shown in Figure 3, [s51] the number of sound objects N obj is obtained, together with the synchronized current location information of the at least one sound object
Figure PCTCN2022091051-appb-000002
and the sound data of the sound objects
Figure PCTCN2022091051-appb-000003

[s52] Determine the storage format, i.e., whether to save/transmit in file packing mode or in low delay mode.

[s53a] Record the basic audio parameters, such as the sampling rate, bit depth, and number of sound objects N obj, as header information into the object audio file.

[s54a] When file packing mode is determined, save the object audio information in file packing mode, specifically as follows:

[s541a] Save the object sound information in raw-pcm format, specifically as follows:

For the first sampling instant, the sound data of the sound objects obtained from the audio data sampled at time t=1,
Figure PCTCN2022091051-appb-000004
is saved in the natural order of the sound sources, and the sound data of each sound object occupies wBitsPerSample bits.

At each subsequent sampling instant t, the sound data of the sound objects sampled at time t,
Figure PCTCN2022091051-appb-000005
is recorded, in the natural order of the sound sources, after the object audio signals of the sound objects obtained at time t-1,
Figure PCTCN2022091051-appb-000006
and the sound data of each sound object again occupies wBitsPerSample bits.

The saving format is shown in Table 1 below:

Figure PCTCN2022091051-appb-000007

Table 1
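The per-instant, natural-order layout described above, with each sample occupying wBitsPerSample bits, can be sketched as a simple interleaver; the 16-bit little-endian choice is an assumption for the example.

```python
import struct

def interleave_pcm(samples_per_instant):
    # samples_per_instant[t] holds the sample of every sound object at
    # sampling instant t, in the natural order of the sound sources.
    # Each sample occupies wBitsPerSample bits; 16-bit little-endian
    # is assumed here.
    out = bytearray()
    for samples in samples_per_instant:
        for s in samples:  # natural order of the sound sources
            out += struct.pack("<h", s)
    return bytes(out)
```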

[s542a] The obtained current location information of the at least one sound object,
Figure PCTCN2022091051-appb-000008
is saved as object audio auxiliary data. At the first sampling point it is saved in the natural order of the sound sources; the saving format is shown in Table 2 below:

Figure PCTCN2022091051-appb-000009

Table 2

The parameters are: iSampleOffset: the sequence number of the sampling point;

Object_index: the sequence number of the currently recorded sound source;

Object_Azimuth: the θ of the currently recorded sound source;

Object_Elevation: the γ of the currently recorded sound source;

Object_Radius: the r of the currently recorded sound source.

At each subsequent sampling point, it is determined whether the position of any of the at least one sound object has changed; if so, the sound sources whose positions have changed at that sampling point are saved, in the format shown in Table 2 above.

A certain interval, e.g., every N sampling points, can be specified for this determination and saving, so as to save storage space.
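The change detection and interval saving described above can be sketched as follows. The record is modeled on the Table 2 fields (iSampleOffset, Object_index, Object_Azimuth, Object_Elevation, Object_Radius); the dict-based representation and the check_interval parameter are assumptions for illustration.

```python
def metadata_records(positions_per_sample, check_interval=1):
    # positions_per_sample[i][k] = (azimuth, elevation, radius) of object k
    # at sampling point i.  The first sampling point is always recorded;
    # afterwards only objects whose position changed are recorded, and the
    # check is made every check_interval sampling points to save space.
    records, last = [], None
    for i, positions in enumerate(positions_per_sample):
        if i % check_interval:
            continue
        for k, pos in enumerate(positions):
            if last is None or pos != last[k]:
                records.append({"iSampleOffset": i,
                                "Object_index": k,
                                "Object_Azimuth": pos[0],
                                "Object_Elevation": pos[1],
                                "Object_Radius": pos[2]})
        last = list(positions)
    return records
```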

[s55a] In the embodiments of the present disclosure, the audio parameters serving as the header information of the object audio data, the sound data of the sound objects serving as the object audio signal, and the current location information serving as the object audio auxiliary data are spliced together to generate the complete object audio data.

The splicing is shown in Tables 3 to 6 below:

Figure PCTCN2022091051-appb-000010

Table 3

Figure PCTCN2022091051-appb-000011
Figure PCTCN2022091051-appb-000012

Table 4

Figure PCTCN2022091051-appb-000013

Table 5

Figure PCTCN2022091051-appb-000014

Table 6

In another possible implementation, as shown in Figure 3, the number of sound objects N obj is obtained, together with the synchronized current location information of the at least one sound object
Figure PCTCN2022091051-appb-000015
and the sound data of the sound objects
Figure PCTCN2022091051-appb-000016

[s53b] Obtain the audio parameters, which may include the sampling rate, bit depth, number of sound objects N obj, and the like, and use the audio parameters as the header information of the object audio data. [s54b] When the storage format is the second format, low delay mode[], save the object audio data in low delay mode, specifically as follows:

[s541b] Frame by frame, for all sampling points contained in the current frame: for the first sampling instant, the sound data of the sound objects obtained from the audio data sampled at time t=1,
Figure PCTCN2022091051-appb-000017
is saved in the natural order of the sound sources, and the sound data of each sound object occupies wBitsPerSample bits.

[s542b] Frame by frame, the position information of the sound objects at all sampling points contained in the current frame,
Figure PCTCN2022091051-appb-000018
is saved. At each subsequent sampling instant t, the sound data of the sound objects sampled at time t,
Figure PCTCN2022091051-appb-000019
is recorded, in the natural order of the sound sources, after the object audio signals of the sound objects obtained at time t-1,
Figure PCTCN2022091051-appb-000020
and the sound data of each sound object again occupies wBitsPerSample bits.

The saving format is shown in Table 1 above.

[s543b] In the embodiments of the present disclosure, the audio parameters serving as the header information of the object audio data, the sound data of the sound objects serving as the object audio signal, and the current location information serving as the object audio auxiliary data are spliced together to generate the complete object audio data.

The splicing is shown in Tables 7 to 9 below:

Figure PCTCN2022091051-appb-000021

Table 7

Figure PCTCN2022091051-appb-000022

Table 8

Figure PCTCN2022091051-appb-000023

Table 9

In some embodiments, the method further includes saving the sound data and the current location information in units of frames.

[s55] In the embodiments of the present disclosure, the header information is first recorded or transmitted. For each frame, the recorded sound data of the sound objects and the recorded object audio auxiliary data (Object Audio metadata) are spliced to become the object audio information of that frame. The object audio data of the frames is spliced in chronological order and saved, or each frame of object audio data is transmitted directly as soon as it is obtained, so as to achieve low-delay transmission. The combined object audio data is, as required, saved in memory or on disk, transmitted to a playback device, or encoded into the MPEG-H 3D Audio format, the Dolby Atmos format, or another encoding format that supports Object Audio, and then saved or transmitted.
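The frame-wise packing for low-delay transmission can be sketched as below: the stream is cut into frames of a fixed number of sampling instants, and each frame carries the audio of those instants together with the metadata records falling inside it. The data shapes are assumptions for illustration.

```python
def pack_frames(samples_per_instant, metadata, frame_len):
    # Cut the stream into frames of frame_len sampling instants.  Each frame
    # couples the audio of those instants with the metadata records whose
    # iSampleOffset falls inside the frame, so a frame can be transmitted
    # as soon as it is complete.
    frames = []
    for start in range(0, len(samples_per_instant), frame_len):
        end = start + frame_len
        frames.append((samples_per_instant[start:end],
                       [m for m in metadata
                        if start <= m["iSampleOffset"] < end]))
    return frames
```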

By implementing the embodiments of the present disclosure, positioning technology is used to obtain the current location information of each sound object accurately and in real time, so that the object audio data can be recorded and generated in real time rather than produced afterwards with software.

For ease of understanding, the embodiments of the present disclosure provide an exemplary embodiment.

As shown in Figure 4, in one possible implementation of the embodiments of the present disclosure, the sound data of the sound objects is obtained through recording terminals: each sound object collects sound data through one recording terminal, and the multiple recording terminals obtain the sound data of the multiple sound objects and send it to the recording module, which thereby obtains the sound data of the at least one sound object.

The recording terminal can send a positioning signal, which is received by several receiving ends (antennas or microphones) in the positioning module. Figure 4 shows the sound signal being transmitted to the recording module in a wired manner, but it can also be transmitted wirelessly (WiFi, BT, etc.). The receiving ends in the positioning module receive the positioning signals sent by the recording terminals and obtain the current location information of the sound objects.

It should be noted that Figure 4 only shows obtaining the current location information of the sound objects in the one-way transceiving mode; in the embodiments of the present disclosure, the two-way transceiving mode or the hybrid transceiving mode may also be used. When the two-way transceiving mode is adopted, in addition to sending the positioning signal, the recording terminal can also send a positioning start signal as needed and receive the response signal returned by the positioning module; the positioning module, besides receiving the positioning signal and the positioning start signal sent by the recording terminal, can also send the response signal.

As shown in Figure 5, in the embodiments of the present disclosure, when recording object audio data for the sound objects, each recording device first records the sound signal of its corresponding sound object and emits a ranging signal. The sound information (sound data) and the position information (current location information) of the sound objects are obtained respectively; the sound information and position information of the sound objects are synchronized; and the sound information and position information of each sound object are then combined to generate the complete object audio signal (object audio data), thereby completing the recording of the object audio data.

As shown in Figure 6, the process of combining the sound information (sound data) and position information (current location information) of the sound objects to generate the complete object audio signal (object audio data) may specifically include:

[S301] Obtain the number N of sound objects, the characteristic parameters of the positioning signals emitted by the recording terminals, and the position information of the positioning module. The number N of sound objects and the characteristic parameters of the positioning signals may be agreed in advance, or may be transmitted synchronously by each recording terminal to the recording module when sending its sound signal, and then passed from the recording module to the positioning module.

[S302] Determine the coordinate origin of the position information according to the position information of the positioning module, and assign an initial position to each sound object.

[S303] Demodulate the positioning signals received at the receiving devices (antennas or microphones) of the positioning module and extract positioning features, so that the recording terminals can subsequently be located through these features.

[S304~S311] Determine the position information of each sound object to be located.

In S305 and S306, it is determined from the received positioning features whether a positioning signal or a positioning start signal of a given sound object is present; if so, that information is obtained and a positioning scheme is selected according to the transceiving mode. For example, when the one-way transceiving mode is used, the position information of the sound object is obtained by the TDOA (time difference of arrival) method; when the two-way transceiving mode is used, it is obtained by the TOF (time of flight) method; a UWB indoor positioning scheme or the like may also be used. If the two-way transceiving mode is adopted, the positioning module must conduct two-way data communication with each recording terminal.

The synchronization module obtains the sound information (sound data) of the sound objects from the recording module and the position information (current location information) of the sound objects from the positioning module, synchronizes them by time, and sends the synchronized sound information and position information of the sound objects to the combining module.

The combining module obtains from the synchronization module the synchronized position information (current location information) of the sound objects,
Figure PCTCN2022091051-appb-000024
and their sound information (sound data),
Figure PCTCN2022091051-appb-000025
and combines the position information and the sound information of the sound objects into the complete object audio signal.

Depending on the purpose, the object audio signal can be saved in two ways: file packing mode[] for saving, and low delay mode[] for real-time playback.

In file packing mode, the sound information (sound data) of the sound objects is combined into one piece of multi-object audio information, which may be saved in raw-pcm format, in uncompressed wav format (in which case a single object is regarded as one channel of the wav file), or encoded into various compressed formats. The sound object position information of the objects is likewise combined and saved as object audio auxiliary data (Object Audio metadata).

In low delay mode, a certain time length τ is defined as one frame; inside each frame, the data is saved in the same format as in file packing mode, and the sound information and audio auxiliary data of that period are concatenated to form the object audio information of the frame. The audio information of each frame is then sent to the playback device or saved in chronological order.

Figure 7 is a structural diagram of an apparatus for generating object audio data provided by an embodiment of the present disclosure.

As shown in Figure 7, the apparatus 1 for generating object audio data includes a data acquisition unit 11, an information acquisition unit 12, and a data generation unit 13.

The data acquisition unit 11 is configured to obtain the sound data of at least one sound object.

The information acquisition unit 12 is configured to obtain the current location information of the at least one sound object.

The data generation unit 13 is configured to synthesize the sound data and the current location information of the at least one sound object to generate the object audio data.

In some embodiments, the information acquisition unit 12 is specifically configured to obtain the current location information of the at least one recording terminal that records the sound data of the at least one sound object.

如图8所示,在一些实施例中,对象音频数据的生成装置1,还包括:同步处理单元14,被配置为对至少一个声音对象的声音数据和当前位置信息进行同步。As shown in Figure 8, in some embodiments, the object audio data generating device 1 also includes: a synchronization processing unit 14 configured to synchronize the sound data of at least one sound object and the current location information.

在一些实施例中,信息获取单元12,具体被配置为以单向收发方式、双向收发方式或混合收发方式获取至少一个录音终端的当前位置信息。In some embodiments, the information acquisition unit 12 is specifically configured to acquire the current location information of at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.

As shown in FIG. 9, in some embodiments, the information acquisition unit 12 includes a first information acquisition module 121, a second information acquisition module 122, and a first current information acquisition module 123.

The first information acquisition module 121 is configured to acquire first positioning reference information in the one-way transceiving mode.

The second information acquisition module 122 is configured to acquire second positioning reference information in the two-way transceiving mode.

The first current information acquisition module 123 is configured to determine the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.

In some embodiments, the first positioning reference information is one of angle information and distance information, and the second positioning reference information is the other of the angle information and the distance information.
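For intuition only: combining an angle measurement from one link with a distance measurement from the other fixes a position by simple polar-to-Cartesian conversion. The sketch below assumes a 2-D case with the receiver at the origin, which is an illustrative simplification rather than the disclosed apparatus.

```python
import math

def locate(angle_deg, distance_m):
    """Hypothetical 2-D fix: angle (e.g. from the one-way link) plus
    distance (e.g. from the two-way link) -> (x, y) of the recording
    terminal, with the receiver at the origin."""
    theta = math.radians(angle_deg)
    return (distance_m * math.cos(theta), distance_m * math.sin(theta))

x, y = locate(90.0, 2.0)
# a terminal 2 m away at 90 degrees lies roughly at (0.0, 2.0)
```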

As shown in FIG. 10, in some embodiments, the information acquisition unit 12 includes a second current information acquisition module 124 configured to receive a first positioning signal broadcast by the at least one recording terminal, and to generate the current location information of the at least one recording terminal according to the first positioning signal.

As shown in FIG. 11, in some embodiments, the information acquisition unit 12 includes a signal receiving module 125, a signal sending module 126, and a third current information acquisition module 127.

The signal receiving module 125 is configured to receive a positioning start signal broadcast by the at least one recording terminal.

The signal sending module 126 is configured to send a response signal to the at least one recording terminal.

The third current information acquisition module 127 is configured to receive a second positioning signal sent by the at least one recording terminal, and to generate the current location information of the at least one recording terminal according to the second positioning signal.

In some embodiments, each recording terminal corresponds to one sound object, and the position of the recording terminal moves along with the sound source of that sound object.

As shown in FIG. 12, in some embodiments, the apparatus 1 for generating object audio data further includes an initial position acquisition unit 15 configured to acquire initial position information of the at least one sound object.

As shown in FIG. 13, in some embodiments, the data generation unit 13 includes a parameter acquisition module 131 and an audio data generation module 132.

The parameter acquisition module 131 is configured to acquire audio parameters and use the audio parameters as header information of the object audio data.

The audio data generation module 132 is configured to, at each sampling moment, save the sound data of each sound object as an object audio signal and save the current location information as object audio auxiliary data, thereby generating the object audio data.

Still referring to FIG. 13, in some embodiments, the data generation unit 13 further includes a processing module 133.

The processing module 133 is configured to save the sound data and the current location information in units of frames.
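Tying these modules together, a toy serialization might write the audio parameters once as a header and then append per-frame records of object signals followed by their auxiliary position data. The field names and the length-prefixed JSON layout below are invented for illustration; any real container format could be substituted.

```python
import json

def build_object_audio(params, frames):
    """Toy container: JSON header (audio parameters) followed by
    per-frame records of {signals, positions} (hypothetical format)."""
    header = json.dumps(params).encode("utf-8")
    body = json.dumps(frames).encode("utf-8")
    # a 4-byte big-endian header length lets a reader split header from body
    return len(header).to_bytes(4, "big") + header + body

blob = build_object_audio(
    {"sample_rate": 48000, "num_objects": 2},
    [{"signals": [[0, 1], [2, 3]], "positions": [[0, 0, 0], [1, 0, 0]]}],
)
```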

With regard to the apparatuses in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the corresponding method, and is not elaborated here.

The apparatus for generating object audio data provided by the embodiments of the present disclosure can perform the method for generating object audio data described in the above embodiments; its beneficial effects are the same as those of the method and are not repeated here.

FIG. 14 is a structural diagram of an electronic device 100 for performing the method for generating object audio data according to an exemplary embodiment.

Illustratively, the electronic device 100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

As shown in FIG. 14, the electronic device 100 may include one or more of the following components: a processing component 101, a memory 102, a power supply component 103, a multimedia component 104, an audio component 105, an input/output (I/O) interface 106, a sensor component 107, and a communication component 108.

The processing component 101 generally controls the overall operation of the electronic device 100, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 101 may include one or more processors 1011 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 101 may include one or more modules that facilitate interaction between the processing component 101 and other components. For example, the processing component 101 may include a multimedia module to facilitate interaction between the multimedia component 104 and the processing component 101.

The memory 102 is configured to store various types of data to support operation at the electronic device 100. Examples of such data include instructions for any application or method operating on the electronic device 100, contact data, phonebook data, messages, pictures, videos, and the like. The memory 102 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power supply component 103 provides power to the various components of the electronic device 100. The power supply component 103 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 100.

The multimedia component 104 includes a touch display screen that provides an output interface between the electronic device 100 and the user. In some embodiments, the touch display screen may include a liquid crystal display (LCD) and a touch panel (TP). The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action. In some embodiments, the multimedia component 104 includes a front camera and/or a rear camera. When the electronic device 100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.

The audio component 105 is configured to output and/or input audio signals. For example, the audio component 105 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 100 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 102 or sent via the communication component 108. In some embodiments, the audio component 105 also includes a speaker for outputting audio signals.

The I/O interface 106 provides an interface between the processing component 101 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 107 includes one or more sensors for providing status assessments of various aspects of the electronic device 100. For example, the sensor component 107 can detect the open/closed state of the electronic device 100 and the relative positioning of components (for example, the display and keypad of the electronic device 100); the sensor component 107 can also detect a change in position of the electronic device 100 or one of its components, the presence or absence of user contact with the electronic device 100, the orientation or acceleration/deceleration of the electronic device 100, and a change in temperature of the electronic device 100. The sensor component 107 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 107 may also include a light sensor, such as a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 107 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 108 is configured to facilitate wired or wireless communication between the electronic device 100 and other devices. The electronic device 100 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 108 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 108 also includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the electronic device 100 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method for generating object audio data. It should be noted that, for the implementation process and technical principles of the electronic device of this embodiment, reference may be made to the foregoing explanation of the method for generating object audio data according to the embodiments of the present disclosure, which is not repeated here.

The electronic device 100 provided by the embodiments of the present disclosure can perform the method for generating object audio data described in the above embodiments; its beneficial effects are the same as those of the method and are not repeated here.

To implement the above embodiments, the present disclosure further provides a storage medium.

When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the method for generating object audio data described above. For example, the storage medium may be a read-only memory (ROM), a random-access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.

To implement the above embodiments, the present disclosure further provides a computer program product. When the computer program is executed by a processor of an electronic device, the electronic device is enabled to perform the method for generating object audio data described above.

Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and embodiments are to be considered exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any changes or substitutions that would readily occur to a person skilled in the art within the technical scope of the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (27)

1. A method for generating object audio data, comprising:
acquiring sound data of at least one sound object;
acquiring current location information of the at least one sound object; and
synthesizing the sound data and the current location information of the at least one sound object to generate object audio data.

2. The method according to claim 1, wherein acquiring the current location information of the at least one sound object comprises:
acquiring current location information of at least one recording terminal that records the sound data of the at least one sound object.

3. The method according to claim 1 or 2, further comprising, before synthesizing the sound data of the at least one sound object and the current location information:
synchronizing the sound data of the at least one sound object with the current location information.

4. The method according to claim 2, wherein acquiring the current location information of the at least one recording terminal that records the sound data of the at least one sound object comprises:
acquiring the current location information of the at least one recording terminal in a one-way transceiving mode, a two-way transceiving mode, or a hybrid transceiving mode.
5. The method according to claim 4, wherein acquiring the location information of the at least one recording terminal in the hybrid transceiving mode comprises:
acquiring first positioning reference information in the one-way transceiving mode;
acquiring second positioning reference information in the two-way transceiving mode; and
determining the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.

6. The method according to claim 5, wherein the first positioning reference information is one of angle information and distance information, and the second positioning reference information is the other of the angle information and the distance information.

7. The method according to any one of claims 4 to 6, wherein acquiring the current location information of the at least one recording terminal in the one-way transceiving mode comprises:
receiving a first positioning signal broadcast by the at least one recording terminal, and generating the current location information of the at least one recording terminal according to the first positioning signal.
8. The method according to any one of claims 4 to 6, wherein acquiring the location information of the at least one recording terminal in the two-way transceiving mode comprises:
receiving a positioning start signal broadcast by the at least one recording terminal;
sending a response signal to the at least one recording terminal; and
receiving a second positioning signal sent by the at least one recording terminal, and generating the current location information of the at least one recording terminal according to the second positioning signal.

9. The method according to any one of claims 2 to 8, wherein each recording terminal corresponds to one sound object, and the position of the recording terminal moves along with the sound source of the sound object.

10. The method according to claim 9, further comprising:
acquiring initial position information of the at least one sound object.

11. The method according to any one of claims 1 to 10, wherein synthesizing the sound data and the current location information of the at least one sound object to generate the object audio data comprises:
acquiring audio parameters, and using the audio parameters as header information of the object audio data; and
at each sampling moment, saving the sound data of each sound object as an object audio signal, and saving the current location information as object audio auxiliary data, to generate the object audio data.
12. The method according to claim 11, further comprising:
saving the sound data and the current location information in units of frames.

13. An apparatus for generating object audio data, comprising:
a data acquisition unit configured to acquire sound data of at least one sound object;
an information acquisition unit configured to acquire current location information of the at least one sound object; and
a data generation unit configured to synthesize the sound data and the current location information of the at least one sound object to generate object audio data.

14. The apparatus according to claim 13, wherein the information acquisition unit is specifically configured to:
acquire current location information of at least one recording terminal that records the sound data of the at least one sound object.

15. The apparatus according to claim 13 or 14, further comprising:
a synchronization processing unit configured to synchronize the sound data of the at least one sound object with the current location information.

16. The apparatus according to claim 14, wherein the information acquisition unit is specifically configured to:
acquire the current location information of the at least one recording terminal in a one-way transceiving mode, a two-way transceiving mode, or a hybrid transceiving mode.
17. The apparatus according to claim 16, wherein the information acquisition unit comprises:
a first information acquisition module configured to acquire first positioning reference information in the one-way transceiving mode;
a second information acquisition module configured to acquire second positioning reference information in the two-way transceiving mode; and
a first current information acquisition module configured to determine the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.

18. The apparatus according to claim 17, wherein the first positioning reference information is one of angle information and distance information, and the second positioning reference information is the other of the angle information and the distance information.

19. The apparatus according to any one of claims 16 to 18, wherein the information acquisition unit comprises:
a second current information acquisition module configured to receive a first positioning signal broadcast by the at least one recording terminal, and to generate the current location information of the at least one recording terminal according to the first positioning signal.
20. The apparatus according to any one of claims 16 to 18, wherein the information acquisition unit comprises:
a signal receiving module configured to receive a positioning start signal broadcast by the at least one recording terminal;
a signal sending module configured to send a response signal to the at least one recording terminal; and
a third current information acquisition module configured to receive a second positioning signal sent by the at least one recording terminal, and to generate the current location information of the at least one recording terminal according to the second positioning signal.

21. The apparatus according to any one of claims 14 to 20, wherein each recording terminal corresponds to one sound object, and the position of the recording terminal moves along with the sound source of the sound object.

22. The apparatus according to claim 21, further comprising:
an initial position acquisition unit configured to acquire initial position information of the at least one sound object.
23. The apparatus according to any one of claims 13 to 22, wherein the data generation unit comprises:
a parameter acquisition module configured to acquire audio parameters and use the audio parameters as header information of the object audio data; and
an audio data generation module configured to, at each sampling moment, save the sound data of each sound object as an object audio signal, and save the current location information as object audio auxiliary data, to generate the object audio data.

24. The apparatus according to claim 23, wherein the data generation unit further comprises:
a processing module configured to save the sound data and the current location information in units of frames.

25. An electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 12.

26. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method according to any one of claims 1 to 12.

27. A computer program product comprising computer instructions, wherein the computer instructions, when executed by a processor, implement the method according to any one of claims 1 to 12.
PCT/CN2022/091051, filed 2022-05-05: Object audio data generation method and apparatus, electronic device, and storage medium (WO2023212879A1; legal status: ceased)

Priority Applications (2)

- PCT/CN2022/091051, filed 2022-05-05: Object audio data generation method and apparatus, electronic device, and storage medium
- CN202280001279.8A, filed 2022-05-05: Object audio data generation method and apparatus, electronic device, and storage medium

Publications (1)

- WO2023212879A1, published 2023-11-09

Also published as:

- CN117355894A (China)

Citations (7)

- CN102968991A: Method, device and system for sorting voice conference minutes
- US20130101122A1: Apparatus for generating and playing object based audio contents
- CN105070304A: Method, device and electronic equipment for realizing recording of object audio
- CN107968974A: Microphone control method, system, microphone and storage medium
- CN110320498A: Method and system for determining the position of microphone
- CN111312295A: Method, device and recording device for recording holographic sound
- CN114333853A: Audio data processing method, equipment and system

Family Cites Families (3)

- KR20100062784A: Apparatus for generating and playing object based audio contents
- WO2014147551A1: Method and apparatus for determining a position of a microphone
- CN108737927B: Method, apparatus, device, and medium for determining position of microphone array


Also Published As

Publication number Publication date
CN117355894A (en) 2024-01-05

Similar Documents

Publication Publication Date Title
KR101770295B1 (en) Method, device, program and recording medium for achieving object audio recording and electronic apparatus
CN113890932A (en) Audio control method and system and electronic equipment
JP6626440B2 (en) Method and apparatus for playing multimedia files
US8606183B2 (en) Method and apparatus for remote controlling bluetooth device
US10051368B2 (en) Mobile apparatus and control method thereof
WO2022002218A1 (en) Audio control method, system, and electronic device
CN113965715B (en) Equipment cooperative control method and device
WO2020108178A1 (en) Processing method for sound effect of recording and mobile terminal
CN114079691B (en) A device identification method and related device
CN109121047B (en) Stereo realization method of double-screen terminal, terminal and computer readable storage medium
WO2022068613A1 (en) Audio processing method and electronic device
US20240144948A1 (en) Sound signal processing method and electronic device
CN113873187B (en) Cross-terminal screen recording method, terminal device and storage medium
WO2023151526A1 (en) Audio acquisition method and apparatus, electronic device and peripheral component
CN105451056A (en) Audio and video synchronization method and device
US20230275986A1 (en) Accessory theme adaptation method, apparatus, and system
CN104599691A (en) Audio file playing method and device
CN117692845A (en) Sound field calibration methods, electronic devices and systems
US20250048027A1 (en) Directional sound pickup method and device
CN114598984B (en) Stereo synthesis method and system
CN115167802A (en) Audio switching playing method and electronic equipment
WO2023212879A1 (en) Object audio data generation method and apparatus, electronic device, and storage medium
CN117319889A (en) Audio signal processing method and device, electronic equipment and storage medium
CN109712629B (en) Method and device for synthesizing audio files
CN107870758A (en) Audio playback method and device, electronic device

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase
    Ref document number: 202280001279.8
    Country of ref document: CN
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22940575
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 22940575
    Country of ref document: EP
    Kind code of ref document: A1