CN111210809A - Voice training data adaptation method and device, voice data conversion method and electronic equipment - Google Patents

Publication number
CN111210809A
Authority
CN
China
Prior art keywords
data
original voice
voice data
voice
conversion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811400134.7A
Other languages
Chinese (zh)
Other versions
CN111210809B (en)
Inventor
张平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811400134.7A
Publication of CN111210809A
Application granted
Publication of CN111210809B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the invention provides a voice training data adaptation method and device, a voice data conversion method and electronic equipment. The voice training data adaptation method comprises the following steps: acquiring original voice data for data conversion, wherein the original voice data has audio data information in all directions; and converting the original voice data through a channel conversion algorithm to obtain training data suitable for different channels. The embodiment of the invention converts existing original voice data through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training every time a new voice recognition product appears; training data adapted to the product can be obtained merely by updating and maintaining the channel conversion algorithm, thereby improving the modeling efficiency of a new voice matching model and saving labor cost.

Description

Voice training data adaptation method and device, voice data conversion method and electronic equipment
Technical Field
The invention relates to the technical field of smart home, in particular to a voice training data adaptation method and device, a voice data conversion method and electronic equipment.
Background
A smart speaker is an upgraded form of the conventional speaker. Household users use it to retrieve songs, weather forecasts, news and the like from the cloud by voice input, and it can also control other smart home devices by voice, for example opening curtains, setting the temperature of a refrigerator, or preheating a water heater in advance.
Smart speaker products differ from one another in microphone configuration and in voice signal processing techniques. A service provider (which provides services such as songs, weather and news) therefore needs to set up a voice database matched to each smart speaker model and use the voice data in that database as training data to train a matching model for each model. After a user inputs voice through a smart speaker of a given model, the corresponding matching model performs matching on voiceprint, voice and similar aspects, thereby achieving voiceprint recognition or voice recognition.
In the process of implementing the invention, the inventor found that the prior art has at least the following problem: as technology is upgraded and developed, new voice recognition products are continually released on the market. After a new product is released, the stock voice data in the existing voice database does not match the new product, so the service provider must collect a large amount of voice data for the new product to obtain voice training data suitable for that model before modeling, and this collection is very inefficient.
Disclosure of Invention
The embodiment of the invention provides a voice training data adaptation method and device, a voice data conversion method and electronic equipment, and aims to overcome the defect that the training data acquisition efficiency is low in the prior art.
To achieve the above object, an embodiment of the present invention provides a method for adapting speech training data, including:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in all directions;
and converting the original voice data through a channel conversion algorithm to obtain training data suitable for different channels.
The embodiment of the invention also provides a voice data conversion method, which comprises the following steps:
converting original voice data through a channel conversion algorithm matched with a playing device to obtain training data suitable for the playing device, wherein the original voice data have audio data information in all directions;
performing model training according to the training data to obtain a data conversion model;
and converting the data to be output of the playing equipment according to the data conversion model so as to obtain the playing data suitable for the playing equipment.
The embodiment of the present invention further provides a device for adapting voice training data, including:
an original voice data acquisition module, configured to acquire original voice data for data conversion, wherein the original voice data has audio data information in all directions;
and the data conversion module is used for converting the original voice data through a channel conversion algorithm so as to obtain training data suitable for different channels.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
a processor for executing the program stored in the memory for:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in all directions;
and converting the original voice data through a channel conversion algorithm to obtain training data suitable for different channels.
According to the voice training data adaptation method and device, the voice data conversion method and the electronic equipment provided by the embodiments of the invention, existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training every time a new voice recognition product appears; training data adapted to the product can be obtained merely by updating and maintaining the channel conversion algorithm, which improves the modeling efficiency of a new voice matching model and saves labor cost.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a system block diagram of a service system according to an embodiment of the present invention;
FIG. 2 is a flowchart of an embodiment of a method for adapting speech training data provided by the present invention;
FIG. 3 is a flowchart of another embodiment of a method for adapting speech training data according to the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of an apparatus for adapting speech training data according to the present invention;
FIG. 5 is a schematic structural diagram of another embodiment of an apparatus for adapting speech training data according to the present invention;
FIG. 6 is a flow chart of an embodiment of a method for converting voice data provided by the present invention;
FIG. 7 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In the prior art, different voice recognition products (e.g., smart speaker products) differ in both microphone configuration and voice signal processing techniques. The service provider needs to provide a voice database matched to each smart speaker model and use the voice data in that database as training data to train a matching model suitable for each type of voice recognition product. After a user inputs voice through a certain type of voice recognition product, the corresponding matching model performs matching on voiceprint, voice and similar aspects, thereby achieving voiceprint recognition or voice recognition. When a new voice recognition product is released, the stock voice data in the existing voice database does not match the new product, so the service provider must collect a large amount of voice data for the new product to obtain training data suitable for that model before modeling, and this collection is very inefficient.
Therefore, the present application provides a scheme for adapting voice training data. Its main principle is as follows: original voice data that already exists or is acquired in advance (namely voice data with audio data information in all directions, for example voice data with complete channel information, rich high-frequency information and noise removed) is converted through a channel conversion algorithm to obtain training data suitable for different channels (such as two-microphone, four-microphone and six-microphone channels). This avoids collecting a large amount of voice data for training every time a new voice recognition product appears; only the channel conversion algorithm needs to be updated and maintained to obtain training data suitable for the product. The modeling efficiency of the matching model for a new voice recognition product is thereby improved, and labor cost is saved.
The method provided by the embodiment of the invention can be applied to any service system with voice data processing capability. Fig. 1 is a system block diagram of a service system provided in an embodiment of the present invention, and the structure shown in fig. 1 is only one example of a service system to which the technical solution of the present invention can be applied. As shown in fig. 1, the service system includes a training data adaptation device. The device includes an original voice data acquisition module and a data conversion module, which may be configured to perform the processing flows shown in fig. 2 and fig. 3 below.
In the service system, original voice data for data conversion is first acquired, the original voice data having audio data information in all directions; the acquired original voice data is then converted through a channel conversion algorithm to obtain training data suitable for different channels. Specifically, existing original voice data (that is, high-quality voice data with complete channel information, rich high-frequency information and noise removed) can be obtained directly; existing stock data can also be re-recorded with high-fidelity recording equipment to obtain original voice data; in addition, for content not covered by the existing data, the voices of recording personnel can be recorded with high-fidelity recording equipment as a supplement. After conversion through the channel conversion algorithm, training data suitable for different channels (e.g., two-microphone data, four-microphone data, six-microphone data, etc.) is obtained and used to train the corresponding matching models (e.g., a two-microphone model, a four-microphone model, a six-microphone model, etc.).
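The flow described above, one pool of original voice data converted once per target channel, can be illustrated with a minimal Python sketch. The names `make_gain_converter` and `adapt_training_data`, the channel labels, and the use of a plain amplitude gain as the converter are all illustrative assumptions; the text does not specify the form of the channel conversion algorithm.

```python
from typing import Callable, Dict, List

Utterance = List[float]  # one utterance as a list of samples (illustrative)
ChannelConverter = Callable[[Utterance], Utterance]


def make_gain_converter(gain: float) -> ChannelConverter:
    """Return a toy converter that only scales amplitude.

    This is a stand-in for a real channel conversion algorithm, not the
    algorithm of the embodiment.
    """
    def convert(utterance: Utterance) -> Utterance:
        return [gain * sample for sample in utterance]
    return convert


def adapt_training_data(
    raw_utterances: List[Utterance],
    converters: Dict[str, ChannelConverter],
) -> Dict[str, List[Utterance]]:
    """Convert every original utterance once per target channel."""
    return {
        channel: [convert(u) for u in raw_utterances]
        for channel, convert in converters.items()
    }


# Hypothetical original voice data and two hypothetical target channels.
raw = [[0.5, -0.25], [1.0, 0.0]]
converters = {
    "two_mic": make_gain_converter(0.5),
    "four_mic": make_gain_converter(0.25),
}
training_sets = adapt_training_data(raw, converters)
```

Each entry of `training_sets` would then feed the training of the matching model for that channel, while `raw` itself is never re-collected.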
The above embodiments are illustrations of technical principles and exemplary application frameworks of the embodiments of the present invention, and specific technical solutions of the embodiments of the present invention are further described in detail below through a plurality of embodiments.
Example one
Fig. 2 is a flowchart of an embodiment of a method for adapting speech training data provided by the present invention, where an execution subject of the method may be the service system, various server devices with speech data processing capability, or devices or chips integrated on the server devices. As shown in fig. 2, the method for adapting speech training data includes the following steps:
S201, original voice data for data conversion is acquired.
In the embodiment of the present invention, the original voice data has audio data information in various directions. The existing original voice data can be obtained from the first database, the original voice data obtained by recording existing stock data through high-fidelity recording equipment can also be obtained from the second database, and the original voice data obtained by recording personnel through the high-fidelity recording equipment can also be obtained from the third database.
S202, original voice data are converted through a channel conversion algorithm to obtain training data suitable for different channels.
In the embodiment of the present invention, step S201, i.e., the process of acquiring the original voice data, is independent of the data conversion process. The original voice data serves as the input of the channel conversion algorithm, and its acquisition is a data preparation step performed in advance. Step S202, i.e., the data conversion process, may be performed whenever corresponding training data is required.
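The independence of the two steps can be sketched as follows: acquisition runs once as preparation, and conversion is invoked on demand. The class `TrainingDataAdapter`, its `load_count` counter and the lambda converters are hypothetical names introduced only for this illustration.

```python
from typing import Callable, List, Optional

Utterance = List[float]


class TrainingDataAdapter:
    """Keeps acquisition (S201) separate from conversion (S202)."""

    def __init__(self, load_raw: Callable[[], List[Utterance]]) -> None:
        self._load_raw = load_raw
        self._raw: Optional[List[Utterance]] = None
        self.load_count = 0  # how many times the original data was acquired

    def prepare(self) -> None:
        """S201: acquire the original voice data once, ahead of time."""
        if self._raw is None:
            self._raw = self._load_raw()
            self.load_count += 1

    def convert(self, converter: Callable[[Utterance], Utterance]) -> List[Utterance]:
        """S202: convert on demand, reusing the already acquired data."""
        self.prepare()
        return [converter(u) for u in self._raw]


adapter = TrainingDataAdapter(lambda: [[1.0, 2.0]])
first_set = adapter.convert(lambda u: [2 * s for s in u])
second_set = adapter.convert(lambda u: [-s for s in u])
```

Two different training sets are produced here, but the original data is loaded only once.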
According to the voice training data adaptation method provided by the embodiment of the invention, existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training every time a new voice recognition product appears; training data adapted to the product can be obtained merely by updating and maintaining the channel conversion algorithm, which improves the modeling efficiency of a new voice matching model and saves labor cost.
Example two
Fig. 3 is a flowchart of another embodiment of a method for adapting speech training data according to the present invention. As shown in fig. 3, on the basis of the embodiment shown in fig. 2, the method for adapting speech training data provided in this embodiment may further include the following steps:
S301, existing original voice data are obtained from a first database.
S302, original voice data obtained by recording existing stock data through high-fidelity recording equipment are obtained in a second database.
And S303, acquiring original voice data obtained by recording the sound of a recording person through high-fidelity recording equipment in a third database.
In the embodiment of the present invention, steps S301 to S303 have no required order: they may be performed simultaneously or one after another in any order, and it is also possible to perform only one or two of the three steps.
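A minimal sketch of steps S301 to S303 is given below, assuming the three databases can be modeled as named lists of utterance identifiers. The dictionary `DATABASES`, the file names and the de-duplication step are assumptions made for illustration and are not described in the text.

```python
from typing import Dict, Iterable, List

# Hypothetical in-memory stand-ins for the first, second and third databases.
DATABASES: Dict[str, List[str]] = {
    "first": ["existing_001.wav", "existing_002.wav"],   # existing original data
    "second": ["rerecorded_001.wav"],                    # re-recorded stock data
    "third": ["speaker_001.wav", "existing_001.wav"],    # newly recorded speakers
}


def collect_raw_speech(sources: Iterable[str]) -> List[str]:
    """Merge utterance IDs from any subset of the databases, in any order.

    Duplicate IDs are dropped (an assumption of this sketch), keeping the
    first occurrence.
    """
    seen = set()
    merged: List[str] = []
    for name in sources:
        for utterance_id in DATABASES[name]:
            if utterance_id not in seen:
                seen.add(utterance_id)
                merged.append(utterance_id)
    return merged


only_first = collect_raw_speech(["first"])
third_then_first = collect_raw_speech(["third", "first"])
```

Because the merge accepts any subset in any order, it mirrors the statement that the three steps may run in any sequence or be partly omitted.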
In addition, the method for adapting speech training data provided in the embodiment of the present invention may further include a step of obtaining the channel conversion algorithm, as shown in steps S304 to S305 described below.
S304, acquiring the recording data aiming at the fixed text under different channels.
In the embodiment of the present invention, a section of fixed text may be set first. When the channel conversion algorithm is obtained, the fixed text is recorded under different channels, for example in two-microphone, four-microphone and six-microphone channel environments as well as in the channel environment of the original speech, so that different recording data are obtained.
Further, for the same channel environment, data acquisition at different distances can be performed to obtain recording data for the fixed text at different distances.
S305, acquiring a channel conversion algorithm according to different parameter distribution functions of different recording data.
In the embodiment of the invention, for the recording data under different channels, a channel conversion algorithm can be obtained according to the Gaussian distribution functions of the recording data; for the recording data at different distances, a channel conversion algorithm can be obtained according to the energy distribution functions of the recording data. The channel conversion algorithm used for data conversion is finally obtained in this way.
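The text does not state how the distribution functions are turned into a conversion algorithm. One possible reading, offered only as an assumption, is moment matching: fit a Gaussian (mean and standard deviation) to a feature of the fixed-text recording in each channel and map one onto the other with an affine transform, and use the ratio of mean energies as a gain for distance. The function names and sample values below are hypothetical.

```python
import math
from statistics import fmean, pstdev
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


def fit_gaussian(values: Features) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of a 1-D feature sequence."""
    return fmean(values), pstdev(values)


def derive_channel_converter(
    reference: Features, target: Features
) -> Callable[[Features], List[float]]:
    """Build an affine map that moves the reference Gaussian onto the target one.

    Both inputs are assumed to be recordings of the same fixed text.
    """
    ref_mean, ref_std = fit_gaussian(reference)
    tgt_mean, tgt_std = fit_gaussian(target)
    scale = tgt_std / ref_std if ref_std > 0 else 1.0

    def convert(values: Features) -> List[float]:
        return [(v - ref_mean) * scale + tgt_mean for v in values]

    return convert


def distance_gain(near: Features, far: Features) -> float:
    """Amplitude gain mapping the mean energy of `near` onto that of `far`.

    Assumes the near recording is not silent (non-zero energy).
    """
    near_energy = fmean([s * s for s in near])
    far_energy = fmean([s * s for s in far])
    return math.sqrt(far_energy / near_energy)


# Hypothetical feature values for the same fixed text in two channels.
hifi = [0.0, 2.0, 4.0]
two_mic = [1.0, 2.0, 3.0]
to_two_mic = derive_channel_converter(hifi, two_mic)
converted = to_two_mic(hifi)
gain = distance_gain([1.0, -1.0], [0.5, -0.5])
```

Recording the same fixed text in every channel is what makes the two fitted distributions comparable; with different content, the statistics would mix channel effects with content differences.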
S306, the original voice data is converted through a channel conversion algorithm to obtain training data suitable for different channels.
In the embodiment of the present invention, steps S301 to S303 (i.e., the acquisition process of the original voice data) are independent of steps S304 to S305 (i.e., the acquisition process of the channel conversion algorithm). The original voice data serves as the input of the channel conversion algorithm, and its acquisition can be regarded as a data preparation process performed in advance; the acquisition process of the channel conversion algorithm needs to be executed each time a new smart speaker is released, so as to update and maintain the old channel conversion algorithm.
According to the voice training data adaptation method provided by the embodiment of the invention, existing original voice data is converted through the channel conversion algorithm to obtain training data adapted to different channels. This avoids collecting a large amount of voice data for training every time a new voice recognition product appears; training data adapted to the product can be obtained merely by updating and maintaining the channel conversion algorithm, which improves the modeling efficiency of a new voice matching model and saves labor cost.
Example three
Fig. 4 is a schematic structural diagram of an embodiment of a speech training data adaptation apparatus provided by the present invention, which can be used to execute the method steps shown in fig. 2. As shown in fig. 4, the speech training data adapting apparatus may include: a raw voice data acquisition module 41 and a data conversion module 42.
The original voice data obtaining module 41 may be configured to obtain original voice data for data conversion; the data conversion module 42 may be configured to perform conversion processing on the raw voice data acquired by the raw voice data acquisition module 41 through a channel conversion algorithm to obtain training data suitable for different channels.
In the embodiment of the present invention, the original voice data has audio data information in various directions. After the original voice data obtaining module 41 obtains the original voice data, the data conversion module 42 may convert that data through a channel conversion algorithm to obtain training data suitable for different channels. The acquisition of the original voice data by the original voice data obtaining module 41 is independent of the data conversion performed by the data conversion module 42. The original voice data serves as the input of the channel conversion algorithm, and its acquisition is a data preparation step performed in advance. The data conversion process can be carried out whenever the corresponding training data is needed.
The voice training data adapting device provided by the embodiment of the invention carries out conversion processing operation on the existing original voice data through the channel conversion algorithm to obtain the training data adapted to different channels, avoids training by carrying out a large amount of voice data acquisition on a new voice recognition product every time, and can obtain the training data adapted to the voice recognition product only by updating and maintaining the channel conversion algorithm, thereby improving the modeling efficiency of a new voice matching model and saving the labor cost.
Example four
Fig. 5 is a schematic structural diagram of another embodiment of the speech training data adaptation apparatus provided by the present invention, which can be used to execute the method steps shown in fig. 3. As shown in fig. 5, on the basis of the embodiment shown in fig. 4, the speech training data adaptation apparatus provided in the embodiment of the present invention may further include: an algorithm acquisition module 51. The algorithm obtaining module 51 may be configured to obtain recording data for a fixed text under different channels, and obtain a channel conversion algorithm according to a difference parameter distribution function of different recording data.
In the embodiment of the present invention, a section of fixed text may be set first, and when acquiring the channel conversion algorithm, the algorithm acquisition module 51 may record the section of fixed text in different channels, for example, in the environment of two-microphone, four-microphone, six-microphone, and the like and in a high-fidelity channel environment, to acquire different recording data.
Further, the algorithm obtaining module 51 may be further configured to obtain the recording data for the fixed text at different distances for the same channel environment.
In the embodiment of the present invention, the algorithm obtaining module 51 may obtain a channel conversion algorithm according to a gaussian distribution function of the recording data under different channels; aiming at the recording data at different distances, a channel conversion algorithm can be obtained according to the energy distribution function of the recording data, and finally, the channel conversion algorithm for data conversion is obtained.
In the embodiment of the present invention, the process by which the original voice data obtaining module 41 obtains the original voice data is independent of the process by which the algorithm obtaining module 51 obtains the channel conversion algorithm. The original voice data serves as the input of the channel conversion algorithm, and its acquisition can be regarded as a data preparation process performed in advance; the acquisition process of the channel conversion algorithm needs to be executed each time a new smart speaker is released, so as to update and maintain the old channel conversion algorithm.
Still further, the raw speech data acquisition module 41 may include: a first obtaining unit 411, where the first obtaining unit 411 may be configured to obtain existing original voice data in a first database.
The raw speech data acquisition module 41 may further include: a second obtaining unit 412, where the second obtaining unit 412 may be configured to obtain, in a second database, original voice data obtained by recording existing inventory data with a high-fidelity recording apparatus.
The raw speech data acquisition module 41 may further include: a third obtaining unit 413, where the third obtaining unit 413 may be configured to obtain, in a third database, original voice data obtained by recording a human voice by a high-fidelity recording apparatus.
In this embodiment of the present invention, the first acquiring unit 411, the second acquiring unit 412, and the third acquiring unit 413 have no required order of operation: they may operate simultaneously or one after another in any order, and it is also possible to use only one or two of the three units.
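The module structure of the apparatus can be pictured with a short Python sketch. The class names mirror modules 41, 42 and 51 and units 411 to 413, but the method names and the peak-ratio gain used inside `AlgorithmAcquisitionModule` are placeholders assumed for illustration, not the behavior specified by the embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Utterance = List[float]
Converter = Callable[[Utterance], Utterance]


@dataclass
class RawSpeechAcquisitionModule:
    """Module 41: each callable in `units` plays the role of unit 411, 412 or 413."""
    units: List[Callable[[], List[Utterance]]] = field(default_factory=list)

    def acquire(self) -> List[Utterance]:
        data: List[Utterance] = []
        for unit in self.units:  # any subset of units may be configured
            data.extend(unit())
        return data


class AlgorithmAcquisitionModule:
    """Module 51: derives a converter from two recordings of the same fixed text."""

    def derive(self, reference: Utterance, target: Utterance) -> Converter:
        # Placeholder rule (assumption): match peak amplitudes.
        gain = max(abs(s) for s in target) / max(abs(s) for s in reference)
        return lambda utterance: [gain * s for s in utterance]


@dataclass
class DataConversionModule:
    """Module 42: applies one converter per target channel."""
    converters: Dict[str, Converter] = field(default_factory=dict)

    def convert(self, raw: List[Utterance]) -> Dict[str, List[Utterance]]:
        return {ch: [conv(u) for u in raw] for ch, conv in self.converters.items()}


acquisition = RawSpeechAcquisitionModule(units=[lambda: [[2.0, 4.0]], lambda: [[-2.0]]])
algorithm = AlgorithmAcquisitionModule().derive([1.0, -2.0], [0.5, -1.0])
conversion = DataConversionModule(converters={"two_mic": algorithm})
result = conversion.convert(acquisition.acquire())
```

Supporting a new product in this sketch only means deriving one more converter and adding it to `converters`; the acquisition module is untouched.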
The voice training data adapting device provided by the embodiment of the invention carries out conversion processing operation on the existing original voice data through the channel conversion algorithm to obtain the training data adapted to different channels, avoids training by carrying out a large amount of voice data acquisition on a new voice recognition product every time, and can obtain the training data adapted to the voice recognition product only by updating and maintaining the channel conversion algorithm, thereby improving the modeling efficiency of a new voice matching model and saving the labor cost.
Example five
Fig. 6 is a flowchart of a voice data conversion method according to an embodiment of the present invention. The execution subject of the method can be various server devices with voice data processing capability, and can also be devices or chips integrated on the server devices. As shown in fig. 6, the voice data conversion method includes the steps of:
S601, converting the original voice data through a channel conversion algorithm matched with the playing device to obtain training data suitable for the playing device.
In the embodiment of the present invention, the original voice data refers to voice data having audio data information in various directions.
Regarding the acquisition of the original voice data, the existing original voice data can be acquired in the first database, the original voice data obtained by recording the existing stock data through the high-fidelity recording equipment can be acquired in the second database, and the original voice data obtained by recording the recording personnel through the high-fidelity recording equipment can be acquired in the third database.
When TTS (Text To Speech) audio is played, the voice playing device needs to play voice according to the configured voice database, and playing devices of different models need to be configured with voice databases of different channels. According to the voice data conversion method provided by the embodiment of the invention, when a new playing device is released, a server providing support for the playing device can acquire the channel conversion algorithm matched with the playing device according to the model of the playing device, so as to obtain the training data suitable for the playing device.
Specifically, when a channel conversion algorithm matched with the playing device is obtained, the following steps may be taken: acquiring recording data aiming at the fixed text under different channels, wherein the recording data comprises the recording data aiming at the fixed text by a playing device; then, according to the different parameter distribution functions of different recording data, a channel conversion algorithm is obtained.
In the embodiment of the present invention, a section of fixed text may be set first. When the channel conversion algorithm is obtained, the fixed text is recorded under different channels, for example in two-microphone, four-microphone and six-microphone channel environments as well as in the channel environment of the original speech, so that different recording data are obtained.
And aiming at the recording data under different channels, a channel conversion algorithm can be obtained according to the Gaussian distribution function of the recording data.
And S602, performing model training according to the training data to obtain a data conversion model.
S603, according to the data conversion model, converting the data to be output of the playing device to obtain the playing data suitable for the playing device.
In the embodiment of the invention, the server performs model training after acquiring the training data suitable for the playing equipment, thereby obtaining the data conversion model.
When the playing device plays the voice, the data to be output can be sent to the server, the server inputs the data to be output into the data conversion model, and the model automatically outputs the playing data suitable for the playing device. When the playing device receives the playing data from the server, the playing can be performed.
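Steps S601 to S603 can be sketched end to end as follows. The text does not say what kind of model the data conversion model is; this sketch assumes a one-dimensional linear model fitted by ordinary least squares, and `channel_converter`, `train_conversion_model` and `convert_output` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Utterance = List[float]


def channel_converter(utterance: Utterance) -> Utterance:
    """Assumed stand-in for the channel conversion algorithm matched to the device."""
    return [0.5 * s + 0.1 for s in utterance]


def train_conversion_model(
    pairs: Sequence[Tuple[Utterance, Utterance]]
) -> Tuple[float, float]:
    """S602: fit y = a * x + b by least squares over all aligned sample pairs."""
    xs = [x for original, _ in pairs for x in original]
    ys = [y for _, converted in pairs for y in converted]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x  # assumes the inputs are not all identical
    b = mean_y - a * mean_x
    return a, b


def convert_output(model: Tuple[float, float], pending: Utterance) -> Utterance:
    """S603: convert the data to be output into playback data for the device."""
    a, b = model
    return [a * x + b for x in pending]


raw = [[0.0, 1.0, 2.0], [4.0, -2.0]]
pairs = [(u, channel_converter(u)) for u in raw]   # S601: build training data
model = train_conversion_model(pairs)              # S602: train the model
playback = convert_output(model, [2.0, 6.0])       # S603: convert pending output
```

In the deployment described above, the last call would run on the server each time the playing device sends data to be output, and `playback` would be returned to the device.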
According to the voice data conversion method provided by the embodiment of the invention, existing original voice data is converted through the channel conversion algorithm matched with the playing device to obtain training data adapted to the playing device. This avoids collecting a large amount of voice data every time a new voice playing product appears; training data adapted to the product can be obtained merely by updating and maintaining the channel conversion algorithm. A data conversion model is then trained on this data and used to convert the data to be played by the new product, which improves voice playing quality and saves the labor cost of data collection.
Example six
The internal functions and structure of the speech training data adaptation apparatus, which can be implemented as an electronic device, are described above. Fig. 7 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention. As shown in fig. 7, the electronic device includes a memory 71 and a processor 72.
The memory 71 stores programs. In addition to the above-described programs, the memory 71 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 71 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 72 is coupled to the memory 71 and executes the programs stored in the memory 71 to:
acquire original voice data for data conversion, the original voice data having audio data information in all directions; and
convert the acquired original voice data through a channel conversion algorithm to obtain training data suitable for different channels.
Further, as shown in fig. 7, the electronic device may also include a communication component 73, a power component 74, an audio component 75, a display 76, and the like. Fig. 7 shows only some of the components schematically; this does not mean that the electronic device includes only the components shown in fig. 7.
The communication component 73 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 73 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 73 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power component 74 provides power to the various components of the electronic device. The power component 74 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the electronic device.
The audio component 75 is configured to output and/or input audio signals. For example, the audio component 75 includes a microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may further be stored in the memory 71 or transmitted via the communication component 73. In some embodiments, the audio component 75 also includes a speaker for outputting audio signals.
The display 76 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for adapting speech training data, comprising:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in all directions;
and converting the original voice data through a channel conversion algorithm to obtain training data suitable for different channels.
2. The method of adapting speech training data according to claim 1, further comprising, before said converting said raw speech data by a channel conversion algorithm:
acquiring recording data aiming at the fixed text under different channels;
and acquiring the channel conversion algorithm according to different difference parameter distribution functions of the recording data.
3. The method of adapting speech training data according to claim 2, further comprising:
acquiring recording data for the fixed text at different distances.
4. The method of claim 2, wherein the distribution function of the difference parameters of the recorded data under different channels is a Gaussian distribution function.
5. The method of claim 3, wherein the difference parameter distribution function of the recorded data at different distances is an energy distribution function.
6. The method of any of claims 1 to 5, wherein the acquiring original voice data for data conversion comprises:
acquiring existing original voice data from a first database.
7. The method of any of claims 1 to 5, wherein the acquiring original voice data for data conversion comprises:
acquiring, from a second database, original voice data obtained by recording existing stock data with high-fidelity recording equipment.
8. The method of any of claims 1 to 5, wherein the acquiring original voice data for data conversion comprises:
acquiring, from a third database, original voice data obtained by recording the voice of a recording person with high-fidelity recording equipment.
9. A method for converting voice data, comprising:
converting original voice data through a channel conversion algorithm matched with a playing device to obtain training data suitable for the playing device, wherein the original voice data have audio data information in all directions;
performing model training according to the training data to obtain a data conversion model;
and converting the data to be output of the playing equipment according to the data conversion model so as to obtain the playing data suitable for the playing equipment.
10. The voice data conversion method according to claim 9, wherein before the conversion processing of the original voice data by the channel conversion algorithm matched with the playback device, the method comprises:
acquiring recording data for a fixed text under different channels, wherein the recording data comprises the recording data for the fixed text of the playing equipment;
and acquiring the channel conversion algorithm according to different difference parameter distribution functions of the recording data.
11. An apparatus for adapting speech training data, comprising:
an original voice data acquisition module, configured to acquire original voice data for data conversion, wherein the original voice data has audio data information in all directions;
and a data conversion module, configured to convert the original voice data through a channel conversion algorithm to obtain training data suitable for different channels.
12. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory for:
acquiring original voice data for data conversion, wherein the original voice data has audio data information in all directions;
and converting the original voice data through a channel conversion algorithm to obtain training data suitable for different channels.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811400134.7A CN111210809B (en) 2018-11-22 2018-11-22 Voice training data adaptation method and device, voice data conversion method and electronic equipment

Publications (2)

Publication Number Publication Date
CN111210809A (en) 2020-05-29
CN111210809B (en) 2024-03-19

Family

ID=70789391

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811400134.7A Active CN111210809B (en) 2018-11-22 2018-11-22 Voice training data adaptation method and device, voice data conversion method and electronic equipment

Country Status (1)

Country Link
CN (1) CN111210809B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6456697B1 (en) * 1998-09-23 2002-09-24 Industrial Technology Research Institute Device and method of channel effect compensation for telephone speech recognition
US6502070B1 (en) * 2000-04-28 2002-12-31 Nortel Networks Limited Method and apparatus for normalizing channel specific speech feature elements
US20050128938A1 (en) * 2003-12-16 2005-06-16 Yuguang Fang Channel estimation and synchronization with preamble using polyphase code
US20070239441A1 (en) * 2006-03-29 2007-10-11 Jiri Navratil System and method for addressing channel mismatch through class specific transforms
CN101674087A (en) * 2009-09-27 2010-03-17 电子科技大学 Method for obtaining channel mismatching error of time alternative ADC system
KR20100081165A (en) * 2009-01-05 2010-07-14 경희대학교 산학협력단 Method for calculating security capacity of gaussian mimo wiretap channel
US20110105031A1 (en) * 2009-10-30 2011-05-05 Action Star Enterprise Co., Ltd. Audio broadcasting system and method for broadcasting the same
CN102129859A (en) * 2010-01-18 2011-07-20 盛乐信息技术(上海)有限公司 Voiceprint authentication system and method for rapid channel compensation
CN104064204A (en) * 2013-03-22 2014-09-24 宏达国际电子股份有限公司 Audio playback system and method applied to handheld electronic devices
JP2014204316A (en) * 2013-04-05 2014-10-27 日本放送協会 Acoustic signal reproducing device and acoustic signal preparation device
US20140337026A1 (en) * 2013-05-09 2014-11-13 International Business Machines Corporation Method, apparatus, and program for generating training speech data for target domain
CN104464786A (en) * 2014-11-21 2015-03-25 西安诺瓦电子科技有限公司 Audio frequency controlling device and method
CN106941007A (en) * 2017-05-12 2017-07-11 北京理工大学 A kind of audio event model composite channel adaptive approach
CN107123432A (en) * 2017-05-12 2017-09-01 北京理工大学 A kind of Self Matching Top N audio events recognize channel self-adapted method
CN107481723A (en) * 2017-08-28 2017-12-15 清华大学 A channel matching method and device for voiceprint recognition
CN107889044A (en) * 2017-12-19 2018-04-06 维沃移动通信有限公司 The processing method and processing device of voice data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
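高荣春; 韩纪庆; 张磊: "Channel compensation method based on maximum a posteriori probability in speaker recognition" (说话人识别中基于最大后验概率的通道补偿方法), Journal on Communications (通信学报), no. 03 *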

Also Published As

Publication number Publication date
CN111210809B (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant