
WO2017045429A1 - Method and system for detecting audio data, and storage medium - Google Patents

Method and system for detecting audio data, and storage medium

Info

Publication number
WO2017045429A1
WO2017045429A1 (PCT/CN2016/083044)
Authority
WO
WIPO (PCT)
Prior art keywords
data
audio data
user
feature
valid
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/083044
Other languages
English (en)
Chinese (zh)
Inventor
傅鸿城
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Publication of WO2017045429A1
Anticipated expiration: legal status Critical
Current legal status: Ceased


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00, specially adapted for particular use, for comparison or discrimination
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/78: Detection of presence or absence of voice signals
    • G10L 25/81: Detection of presence or absence of voice signals, for discriminating voice from music

Definitions

  • The present invention relates to the field of information processing technologies, and in particular to a method, system, and storage medium for detecting audio data.
  • Users can record audio data through a user terminal and upload it to a server so that the server can share the audio data with other user terminals.
  • To prevent people from maliciously using audio to spread illegal content or sensitive content involving celebrities or other sensitive persons, the audio data uploaded to the server must first be examined; the server then releases only the audio data that passes detection.
  • Embodiments of the invention provide a method, a system, and a storage medium for detecting audio data, which enable automatic detection of audio data by an audio data detection system.
  • An embodiment of the present invention provides a method for detecting audio data, including the steps described below.
  • An embodiment of the invention further provides an audio data detection system, comprising:
  • a model acquisition unit configured to acquire a timbre-based machine learning model of the first voice data included in the audio data of a first user;
  • a data acquisition unit configured to acquire the second voice data included in the audio data of a second user;
  • a timbre extraction unit configured to extract a timbre feature from the second voice data acquired by the data acquisition unit;
  • an information determination unit configured to determine the release information of the second user's audio data according to the degree of match between the timbre feature extracted by the timbre extraction unit and the machine learning model acquired by the model acquisition unit, and to use the release information as the detection result for the audio data.
  • With the above, the audio data detection system acquires a machine learning model trained on the timbre features of the first voice data in the first user's audio data, and can then determine, from the timbre features of the second voice data in the second user's audio data and the machine learning model, whether the second user's audio data is sensitive data and whether it may be released. The second user's audio data is thus detected automatically by the audio data detection system to decide whether it can be released. Compared with manual review, the method of this embodiment is low in cost and high in efficiency, and avoids the uncertainty that manual detection introduces into the detection result.
  • FIG. 1 is a flowchart of a method for detecting audio data according to an embodiment of the present invention
  • FIG. 2 is a flowchart of a method for acquiring a timbre-based machine learning model of first voice data included in audio data of a first user in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of extracting PLP features from first valid data in an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for acquiring second voice data included in audio data of a second user in an embodiment of the present invention
  • FIG. 5 is a schematic diagram of extracting MFCC features from the second valid data in an embodiment of the present invention.
  • FIG. 6 is a flowchart of a method for determining release information of audio data of a second user according to a matching degree between a timbre feature of the second voice data and a machine learning model according to an embodiment of the present invention
  • FIG. 7 is a schematic structural diagram of a system for detecting audio data according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of another audio data detecting system according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another audio data detecting system according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of another audio data detecting system according to an embodiment of the present invention.
  • An embodiment of the present invention provides a method for detecting audio data, used mainly by an audio data detection system to examine audio data after a user has uploaded it to a server through a user terminal and before the server releases it.
  • The flowchart is shown in Figure 1 and includes the following steps:
  • Step 101: Acquire a timbre-based machine learning model of the first voice data included in the audio data of the first user; the machine learning model can be obtained by training on the timbre features of the first voice data.
  • The first user's audio data here may be the audio data of a celebrity or of a member of a sensitive group, which may be clipped, for example, from news or speech recordings; the first voice data is the first user's speech within that audio data.
  • The timbre feature may be a Perceptual Linear Predictive (PLP) feature. The PLP feature is a feature parameter based on an auditory model: a set of coefficients of the predictive polynomial of an all-pole model.
  • Step 102: Acquire the second voice data included in the audio data of the second user. The second user's audio data mainly refers to audio data recorded through a user terminal; in this step, the audio data detection system obtains the second voice data.
  • The second voice data can be obtained by removing the noise, silence, and non-speech data from the second user's audio data.
  • Step 103: Extract the timbre feature of the second voice data; the timbre feature may be, for example, a PLP feature.
  • The extraction method may be the same as the one used to obtain the timbre feature of the first voice data in step 101, and is not repeated here.
  • Step 104: Determine the release information of the second user's audio data according to the degree of match between the timbre feature of the second voice data obtained in step 103 and the machine learning model obtained in step 101, and use the release information as the detection result for the audio data.
  • Specifically, the audio data detection system first computes the degree of match between the timbre feature of the second voice data and the machine learning model, then compares the match with a preset threshold and decides from the comparison whether the timbre of the second voice data is close to that of the first voice data. If it is not close, the second user's audio data can be released; if it is close, the second user's audio data cannot be released and needs further review.
  • As can be seen, the audio data detection system acquires a machine learning model trained on the timbre features of the first voice data in the first user's audio data, and then determines, from the timbre features of the second voice data in the second user's audio data and the machine learning model, whether the second user's audio data is sensitive data and whether it may be released. The second user's audio data is thus detected automatically by the audio data detection system to decide whether it can be released. The method of this embodiment is low in cost and high in efficiency, and avoids the uncertainty that manual detection introduces into the detection result. A minimal code sketch of this flow follows.
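  • For illustration only (an assumption, not the patent's implementation): the flow of steps 101-104 can be sketched in Python using MFCC features from librosa as a stand-in timbre feature and scikit-learn's GaussianMixture as the timbre model. The file names, component count, and threshold value are hypothetical.

```python
import librosa
from sklearn.mixture import GaussianMixture

def timbre_features(path, sr=16000, n_mfcc=13):
    """Load audio and return a (frames, n_mfcc) matrix of timbre features."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

# Step 101: train a timbre model on the first user's (e.g., celebrity) speech.
gmm = GaussianMixture(n_components=64, covariance_type="diag", max_iter=200)
gmm.fit(timbre_features("first_user_speech.wav"))    # hypothetical file

# Steps 102-104: score an uploaded recording and decide on release.
feats = timbre_features("second_user_upload.wav")    # hypothetical file
match = gmm.score(feats)        # mean per-frame log-likelihood under the GMM
THRESHOLD = -45.0               # illustrative stand-in for the preset value
print("releasable" if match < THRESHOLD else "needs further review")
```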
  • When performing step 101 above, the audio data detection system may proceed through the following steps:
  • Step A1: Extract the first valid data included in the audio data of the first user, where the first valid data is the first user's audio data with noise and silence removed, and the first valid data includes the first voice data.
  • Specifically, the audio data detection system may first convert the collected audio data of the first user into a binary sequence through analog-to-digital conversion, then segment the binary sequence using Voice Activity Detection (VAD, also called endpoint detection) technology and extract the first valid data from each segment.
  • Endpoint detection techniques can be divided into several types according to the features they use, such as time-domain algorithms, frequency-domain algorithms, zero-crossing-rate-based algorithms, and model-based algorithms; one example follows.
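  • As one concrete example of a model-based VAD (an illustration; the embodiment does not prescribe a specific library), the py-webrtcvad package classifies fixed-size PCM frames as speech or non-speech:

```python
# Endpoint detection with the WebRTC VAD (pip install webrtcvad).
# Expects 16-bit mono PCM at 8/16/32/48 kHz in 10/20/30 ms frames.
import webrtcvad

def speech_frames(pcm_bytes, sample_rate=16000, frame_ms=30):
    """Yield (offset, frame) for each frame the VAD classifies as speech."""
    vad = webrtcvad.Vad(2)  # aggressiveness 0 (lenient) to 3 (strict)
    nbytes = int(sample_rate * frame_ms / 1000) * 2  # 2 bytes per sample
    for off in range(0, len(pcm_bytes) - nbytes + 1, nbytes):
        frame = pcm_bytes[off:off + nbytes]
        if vad.is_speech(frame, sample_rate):
            yield off, frame
```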
  • After that, steps B1 and C1 can be executed directly, and the resulting Gaussian Mixture Model (GMM) is a timbre model of the voice data.
  • Step B1: Extract the perceptual linear prediction (PLP) feature of the first valid data; n features can be extracted from each frame of the first valid data, forming an n-dimensional PLP feature.
  • PLP feature extraction mainly involves three techniques: critical-band analysis, equal-loudness pre-emphasis, and intensity-to-loudness conversion.
  • The critical-band division reflects the masking effect of human hearing and serves as the model of the human ear.
  • Equal-loudness pre-emphasis weights the spectrum with an equal-loudness curve that simulates the human ear's response at roughly 40 dB (decibels); the intensity-to-loudness conversion converts intensity to loudness by approximating the nonlinear relationship between the intensity of a sound and the loudness perceived by the human ear.
  • Specifically, PLP feature extraction may follow the flow shown in FIG. 3: discrete Fourier transform, power spectrum computation, critical-band analysis, equal-loudness pre-emphasis, intensity-to-loudness conversion, inverse Fourier transform, all-pole modeling, and cepstrum computation.
  • Step C1: From the PLP features extracted in step B1, determine the first user's Gaussian Mixture Model (GMM) using the Expectation-Maximization (EM) algorithm.
  • Specifically, the audio data detection system can train the GMM with the EM algorithm directly on the PLP features. Alternatively, to better reflect the dynamic changes of the PLP features, the system can first apply at least a first-order difference to the PLP features and train the GMM with the EM algorithm on the resulting PLP difference features; this GMM is the timbre-based machine learning model of the first voice data.
  • When computing the difference features, the audio data detection system computes the first-order difference from the original PLP features, the second-order difference from the first-order difference features, and so on.
  • Any first-order difference feature can be obtained by formula (1), which in a simple form is the frame-to-frame difference $\Delta p_t = p_{t+1} - p_t$, where $p_t$ denotes the PLP feature vector of frame t; a code sketch follows.
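  • A minimal sketch of steps B1 and C1, assuming the adjacent-frame difference above as the form of formula (1) and scikit-learn's EM-based GaussianMixture as the trainer; the feature dimensions and component count are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def add_deltas(feats):
    """feats: (frames, dims). Stack the features with their first- and
    second-order differences along the time axis."""
    d1 = np.diff(feats, n=1, axis=0, prepend=feats[:1])  # first-order delta
    d2 = np.diff(d1, n=1, axis=0, prepend=d1[:1])        # second-order delta
    return np.hstack([feats, d1, d2])                    # (frames, 3*dims)

def train_timbre_gmm(plp_feats, n_components=64):
    """Fit a diagonal-covariance GMM by EM on delta-augmented features."""
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", max_iter=200)
    gmm.fit(add_deltas(plp_feats))
    return gmm
```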
  • When performing step 102 above, the audio data detection system may proceed through the following steps A2 and B2:
  • Step A2: Extract the second valid data included in the audio data of the second user. The second valid data may be extracted using endpoint detection, in a way similar to the extraction of the first valid data from the first user's audio data, and is not repeated here.
  • Alternatively, the audio data detection system may extract the second valid data with a four-threshold endpoint detection technique, implemented through the following steps A21 to A23:
  • Step A21: Determine four threshold values according to clustering information of a portion of the second user's audio data, where the clustering information includes the noise energy, the effective-sound energy, and the average energies of the noise and the effective sound.
  • Specifically, the audio data detection system can divide the second user's audio data into frames, each frame spanning 25 milliseconds (ms) with a 10 ms frame shift. After noise reduction of the framed audio data, it takes a portion of the second user's audio data, for example 50 frames, and clusters that portion into two classes, noise and effective sound; the clustering computation yields the clustering information. Finally, the four threshold values can be computed from the noise energy, the effective-sound energy, and the average energies of the noise and the effective sound in the clustering information.
  • Step A22: Divide the second user's audio data into segments according to the four threshold values and the energy information of the audio data, and determine, for each segment, whether its attribute is noise or effective sound.
  • Specifically, the audio data detection system can compute the energy of each frame of the framed audio data, compare each frame's energy with the four threshold values to divide the second user's audio data into segments, and then determine from the relation between each segment and the four thresholds whether the segment's attribute is noise or effective sound.
  • The energy E of one frame of audio data can be computed as $E = \sum_{i=1}^{N} x_i^2$, where $x_i$ is the amplitude at the i-th sampling point of the frame and N is the number of sampling points in the frame.
  • Step A23: Extract the segments whose attribute is effective sound as the second valid data; a code sketch follows.
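  • A rough sketch of steps A21 to A23 under stated assumptions: the exact derivation of the four thresholds is not spelled out above, so a single energy threshold placed between the two cluster centers (noise versus effective sound) stands in for them here.

```python
import numpy as np
from sklearn.cluster import KMeans

def frame_energies(x, sr, frame_ms=25, shift_ms=10):
    """Per-frame energy E = sum(x_i^2) with 25 ms frames and a 10 ms shift."""
    flen, shift = int(sr * frame_ms / 1000), int(sr * shift_ms / 1000)
    starts = range(0, len(x) - flen + 1, shift)
    return np.array([np.sum(x[s:s + flen] ** 2) for s in starts])

def effective_frame_mask(x, sr, cluster_frames=50):
    """Cluster a leading portion into noise/effective sound, then keep
    frames whose energy exceeds the midpoint of the cluster centers."""
    e = frame_energies(x, sr)
    km = KMeans(n_clusters=2, n_init=10).fit(e[:cluster_frames, None])
    threshold = km.cluster_centers_.mean()
    return e > threshold   # True where the frame counts as effective sound
```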
  • Step B2: Remove the non-speech data included in the second valid data to obtain the second voice data.
  • Specifically, the audio data detection system may implement this through the following steps B21 to B23:
  • Step B21: Extract the Mel-Frequency Cepstral Coefficient (MFCC) features of the second valid data.
  • The MFCC feature is a cepstral parameter in an auditory-perception frequency domain; it reflects the characteristics of the short-term amplitude spectrum of speech from the perspective of the human ear's nonlinear psychological perception of sound frequency.
  • The flow shown in FIG. 5 can be used for the extraction.
  • The MFCC feature vector sequence is obtained by applying a discrete Fourier transform, a squared-magnitude operation, a triangular filter bank, a logarithm, and a discrete cosine transform to the second valid data; a step-by-step sketch follows.
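  • The FIG. 5 flow can be sketched step by step with librosa and scipy; the parameter values below are illustrative assumptions, not taken from the embodiment.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_by_steps(y, sr=16000, n_fft=512, hop=160, n_mels=26, n_mfcc=13):
    spec = librosa.stft(y, n_fft=n_fft, hop_length=hop)         # DFT
    power = np.abs(spec) ** 2                                   # |.|^2
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    mel_energy = mel_fb @ power                                 # filter bank
    log_mel = np.log(mel_energy + 1e-10)                        # logarithm
    return dct(log_mel, type=2, axis=0, norm="ortho")[:n_mfcc]  # DCT
```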
  • Step B22: Feed the MFCC features into a Support Vector Machine (SVM) classification model to classify the speech data and the non-speech data in the second valid data.
  • The SVM classification model is trained by the audio data detection system on the MFCC features of speech sample data and non-speech sample data; the speech samples mainly contain human voice, while the non-speech samples may contain non-human sound such as pure music and noise. The audio data detection system can collect the speech and non-speech sample data in advance, extract their MFCC features, train the SVM classification model on those features, and preset the model in the system.
  • Step B23: Remove the non-speech data from the second valid data according to the classification of step B22 to obtain the second voice data; a code sketch follows.
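  • A minimal sketch of steps B21 to B23, assuming frame-level MFCCs as the SVM input and an RBF kernel (both assumptions; the embodiment does not fix these choices):

```python
import numpy as np
from sklearn.svm import SVC

def train_speech_svm(speech_mfcc, nonspeech_mfcc):
    """Both inputs: (frames, n_mfcc) matrices from labelled samples."""
    X = np.vstack([speech_mfcc, nonspeech_mfcc])
    y = np.r_[np.ones(len(speech_mfcc)), np.zeros(len(nonspeech_mfcc))]
    return SVC(kernel="rbf").fit(X, y)

def keep_speech(svm, valid_mfcc, valid_frames):
    """Drop the frames of the second valid data that the SVM labels
    non-speech; what remains is the second voice data."""
    is_speech = svm.predict(valid_mfcc).astype(bool)
    return valid_frames[is_speech]
```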
  • When performing step 104 above, the audio data detection system may proceed through the following steps:
  • Step A3: Compute the likelihood of the timbre feature of the second voice data under the Gaussian mixture model GMM; the likelihood represents the degree of match between the timbre feature of the second voice data and the GMM.
  • The GMM is a weighted sum of M (for example, 256) single Gaussians; in the standard form, $p(x \mid \lambda) = \sum_{m=1}^{M} w_m \, \mathcal{N}(x; \mu_m, \Sigma_m)$, where $w_m$, $\mu_m$, and $\Sigma_m$ are the weight, mean, and covariance of the m-th Gaussian and $\sum_{m=1}^{M} w_m = 1$.
  • The likelihood of the timbre features $x_1, \dots, x_T$ of the second voice data under the GMM is then $p(X \mid \lambda) = \prod_{t=1}^{T} p(x_t \mid \lambda)$, in practice usually evaluated as the average log-likelihood $\frac{1}{T} \sum_{t=1}^{T} \log p(x_t \mid \lambda)$.
  • Step B3: Determine whether the likelihood is less than a preset value, such as 0.3. If it is less, the second voice data is unlikely to be close in timbre to the first voice data, and step C3 is performed; if it is greater than or equal to the preset value, the second voice data is likely to be close in timbre to the first voice data, and the second user's audio data needs further detection.
  • Step C3: Determine that the release information of the second user's audio data is releasable; a code sketch of steps A3 to C3 follows.
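  • A minimal sketch of steps A3 to C3 using the GMM from the earlier sketch. Note that scikit-learn's score() returns a mean log-likelihood, so the example value 0.3 from the text does not carry over directly; the preset below is purely illustrative.

```python
def release_decision(gmm, second_voice_feats, preset=-45.0):
    """Score the second voice data's timbre features under the first
    user's GMM and decide on release."""
    likelihood = gmm.score(second_voice_feats)  # mean per-frame log-likelihood
    if likelihood < preset:
        return "releasable"        # timbre does not match the first user
    return "needs further review"  # timbre is close to the first user
```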
  • An embodiment of the present invention further provides an audio data detection system, whose schematic structural diagram is shown in FIG. 7, including:
  • a model acquisition unit 10 configured to acquire a timbre-based machine learning model of the first voice data included in the audio data of the first user;
  • a data acquisition unit 11 configured to acquire the second voice data included in the audio data of the second user;
  • a timbre extraction unit 12 configured to extract the timbre feature of the second voice data acquired by the data acquisition unit 11;
  • an information determination unit 13 configured to determine the release information of the second user's audio data according to the degree of match between the timbre feature extracted by the timbre extraction unit 12 and the machine learning model acquired by the model acquisition unit 10, and to use the release information as the detection result of the audio data.
  • Specifically, the information determination unit 13 may first compute the degree of match between the timbre feature of the second voice data and the machine learning model, then compare the match with a preset threshold, and decide from the comparison whether the timbre of the second voice data is close to that of the first voice data. If it is not close, the second user's audio data can be released; if it is close, the second user's audio data cannot be released and needs further review.
  • With this system, the model acquisition unit 10 acquires a machine learning model trained on the timbre features of the first voice data in the first user's audio data, and the information determination unit 13 then determines, from the timbre feature of the second voice data in the second user's audio data and the machine learning model, whether the second user's audio data is sensitive data and whether it may be released.
  • The second user's audio data is thus detected automatically by the audio data detection system to decide whether it can be released.
  • Compared with manual detection of the audio data, this implementation is low in cost and high in efficiency, and avoids the uncertainty that manual detection introduces into the result.
  • The model acquisition unit 10 of the audio data detection system may be implemented by a valid data extraction unit 110, a PLP feature unit 210, and a model determination unit 310, specifically:
  • The valid data extraction unit 110 is configured to extract the first valid data included in the audio data of the first user, where the first valid data includes the first voice data. The valid data extraction unit 110 may first convert the collected audio data of the first user into a binary sequence through analog-to-digital conversion, then segment the binary sequence using endpoint detection technology and extract the first valid data from each segment.
  • The PLP feature unit 210 is configured to extract the perceptual linear prediction (PLP) features of the first valid data extracted by the valid data extraction unit 110.
  • The model determination unit 310 is configured to determine the first user's Gaussian mixture model GMM from the PLP features extracted by the PLP feature unit 210, using the expectation-maximization (EM) algorithm; the GMM is the timbre-based machine learning model of the first voice data.
  • Specifically, the model determination unit 310 may train the GMM with the EM algorithm directly on the PLP features, or it may train the GMM with the EM algorithm on the PLP difference features obtained by applying at least a first-order difference to the PLP features.
  • The data acquisition unit 11 of the audio data detection system may be implemented by an extraction unit 111 and a non-speech removal unit 112; and if the machine learning model acquired by the model acquisition unit 10 is a Gaussian mixture model GMM, the information determination unit 13 may be implemented by a probability calculation unit 131 and a release determination unit 132, specifically:
  • The extraction unit 111 is configured to extract the second valid data included in the audio data of the second user.
  • Specifically, the extraction unit 111 is configured to determine four threshold values according to clustering information of the second user's audio data, where the clustering information includes the noise energy, the effective-sound energy, and the average energies of the noise and the effective sound; to divide the second user's audio data into segments according to the four threshold values and the energy of the audio data, determining whether each segment's attribute is noise or effective sound; and to extract the segments whose attribute is effective sound as the second valid data.
  • The non-speech removal unit 112 is configured to remove the non-speech data included in the second valid data extracted by the extraction unit 111 to obtain the second voice data; the timbre extraction unit 12 then extracts the timbre feature of the second voice data obtained by the non-speech removal unit 112.
  • Specifically, the non-speech removal unit 112 is configured to extract the MFCC features of the second valid data, to feed the MFCC features into the support vector machine (SVM) classification model to classify the speech data and the non-speech data in the second valid data, and to remove the non-speech data from the second valid data according to the classification to obtain the second voice data;
  • the SVM classification model is trained on the MFCC features of speech sample data and non-speech sample data.
  • The probability calculation unit 131 is configured to compute the likelihood, under the Gaussian mixture model GMM acquired by the model acquisition unit 10, of the timbre feature of the second voice data extracted by the timbre extraction unit 12; the likelihood represents the degree of match between the timbre feature of the second voice data and the GMM.
  • The release determination unit 132 is configured to determine that the release information of the second user's audio data is releasable if the likelihood computed by the probability calculation unit 131 is less than a preset value, and that the release information is not releasable if the likelihood is greater than or equal to the preset value.
  • An embodiment of the present invention further provides a terminal device, shown in FIG. 10.
  • The terminal device may vary considerably in configuration and performance, and may include one or more central processing units (CPUs) 20 (for example, one or more processors), a memory 21, and one or more storage media 22 (for example, one or more mass storage devices) storing an application 221 or data 222.
  • The memory 21 and the storage medium 22 may be short-term storage or persistent storage.
  • The program stored on the storage medium 22 may include one or more modules (not shown), each of which may include a series of instruction operations for the terminal device.
  • The central processing unit 20 may be arranged to communicate with the storage medium 22 and to execute, on the terminal device, the series of instruction operations stored in the storage medium 22.
  • The terminal device may also include one or more power supplies 23, one or more wired or wireless network interfaces 24, one or more input/output interfaces 25, and/or one or more operating systems 223, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
  • The steps performed by the audio data detection system described in the method embodiments above may be based on the terminal device structure shown in FIG. 10.
  • An embodiment of the present invention further provides a storage medium, which may be a non-volatile readable storage medium in which processor-executable instructions are stored. The processor-executable instructions are used to perform the following operations:
  • acquiring a timbre-based machine learning model of the first voice data included in the audio data of the first user, where the first voice data is included in the first valid data;
  • acquiring the second voice data included in the audio data of the second user, including: determining four threshold values according to clustering information of the audio data, where the clustering information includes the noise energy, the effective-sound energy, and the average energies of the noise and the effective sound; dividing the audio data into segments and determining whether each segment's attribute is noise or effective sound; and extracting the segments whose attribute is effective sound as the second valid data;
  • removing the non-speech data from the second valid data to obtain the second voice data, using an SVM classification model trained on the MFCC features of speech sample data and non-speech sample data;
  • extracting the timbre feature of the second voice data and, when the machine learning model is a Gaussian mixture model GMM, determining the likelihood of the timbre feature of the second voice data under the GMM as the degree of match when determining the release information of the second user's audio data;
  • if the likelihood is less than a preset value, determining that the release information of the second user's audio data is releasable.
  • The program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention concerns a method and system for detecting audio data, applicable to the technical field of information processing. The audio data detection method consists of: acquiring a machine learning model trained using the timbre features of first voice data in the audio data of a first user, then determining, on the basis of the timbre features of second voice data in the audio data of a second user and of the machine learning model, whether the second user's audio data is sensitive data and whether that audio data may be released. The audio data detection system comprises a model acquisition unit (10), a data acquisition unit (11), a timbre extraction unit (12), and an information determination unit (13). Automatic detection of the second user's audio data is implemented by means of the audio data detection system.
PCT/CN2016/083044 2015-09-18 2016-05-23 Method and system for detecting audio data, and storage medium Ceased WO2017045429A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510600668.4 2015-09-18
CN201510600668.4A CN106548786B (zh) 2015-09-18 2015-09-18 Method and system for detecting audio data

Publications (1)

Publication Number Publication Date
WO2017045429A1 (fr) 2017-03-23

Family

ID=58288092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/083044 Ceased WO2017045429A1 (fr) 2016-05-23 Method and system for detecting audio data, and storage medium

Country Status (2)

Country Link
CN (1) CN106548786B (fr)
WO (1) WO2017045429A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883106A (zh) * 2020-07-27 2020-11-03 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and apparatus

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885845B (zh) * 2017-11-10 2020-11-17 广州酷狗计算机科技有限公司 Audio classification method and apparatus, computer device, and storage medium
CN108766459B (zh) * 2018-06-13 2020-07-17 北京联合大学 Method and system for estimating a target speaker in mixed multi-speaker speech
CN110033785A (zh) * 2019-03-27 2019-07-19 深圳市中电数通智慧安全科技股份有限公司 Call-for-help recognition method and apparatus, readable storage medium, and terminal device
CN110277106B (zh) * 2019-06-21 2021-10-22 北京达佳互联信息技术有限公司 Audio quality determination method, apparatus, device, and storage medium
CN110933235B (zh) * 2019-11-06 2021-07-27 杭州哲信信息技术有限公司 Noise recognition method in a machine-learning-based intelligent calling system
CN112017694B (zh) * 2020-08-25 2021-08-20 天津洪恩完美未来教育科技有限公司 Method and apparatus for evaluating voice data, storage medium, and electronic apparatus

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002029785A1 (fr) * 2000-09-30 2002-04-11 Intel Corporation Method, apparatus, and system for speaker verification based on a Gaussian mixture model (GMM)
US6411930B1 (en) * 1998-11-18 2002-06-25 Lucent Technologies Inc. Discriminative gaussian mixture models for speaker verification
CN101241699A (zh) * 2008-03-14 2008-08-13 北京交通大学 Speaker verification system for remote Chinese-language teaching
CN101308653A (zh) * 2008-07-17 2008-11-19 安徽科大讯飞信息科技股份有限公司 Endpoint detection method applied to a speech recognition system
CN101419797A (zh) * 2008-12-05 2009-04-29 无敌科技(西安)有限公司 Method for improving speech recognition efficiency and speech recognition apparatus therefor
CN104301561A (zh) * 2014-09-30 2015-01-21 成都英博联宇科技有限公司 Intelligent conference telephone
CN204231479U (zh) * 2014-09-30 2015-03-25 成都英博联宇科技有限公司 Intelligent conference telephone

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004032112A1 (fr) * 2002-10-04 2004-04-15 Koninklijke Philips Electronics N.V. Speech synthesis apparatus with personalized speech segments
CN101872614A (zh) * 2009-04-24 2010-10-27 韩松 Hybrid speech synthesis system
CN104361891A (zh) * 2014-11-17 2015-02-18 科大讯飞股份有限公司 Method and system for automatic review of personalized ring-back tones for specific groups of people
CN105244031A (zh) * 2015-10-26 2016-01-13 北京锐安科技有限公司 Speaker recognition method and apparatus

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6411930B1 (en) * 1998-11-18 2002-06-25 Lucent Technologies Inc. Discriminative gaussian mixture models for speaker verification
WO2002029785A1 (fr) * 2000-09-30 2002-04-11 Intel Corporation Method, apparatus, and system for speaker verification based on a Gaussian mixture model (GMM)
CN101241699A (zh) * 2008-03-14 2008-08-13 北京交通大学 Speaker verification system for remote Chinese-language teaching
CN101308653A (zh) * 2008-07-17 2008-11-19 安徽科大讯飞信息科技股份有限公司 Endpoint detection method applied to a speech recognition system
CN101419797A (zh) * 2008-12-05 2009-04-29 无敌科技(西安)有限公司 Method for improving speech recognition efficiency and speech recognition apparatus therefor
CN104301561A (zh) * 2014-09-30 2015-01-21 成都英博联宇科技有限公司 Intelligent conference telephone
CN204231479U (zh) * 2014-09-30 2015-03-25 成都英博联宇科技有限公司 Intelligent conference telephone

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111883106A (zh) * 2020-07-27 2020-11-03 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and apparatus
CN111883106B (zh) * 2020-07-27 2024-04-19 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and apparatus

Also Published As

Publication number Publication date
CN106548786A (zh) 2017-03-29
CN106548786B (zh) 2020-06-30

Similar Documents

Publication Publication Date Title
Boles et al. Voice biometrics: Deep learning-based voiceprint authentication system
WO2017045429A1 (fr) Procédé et système de détection de données audio, et support d'informations
CN108597496B (zh) 一种基于生成式对抗网络的语音生成方法及装置
CN106486131B (zh) 一种语音去噪的方法及装置
US8140331B2 (en) Feature extraction for identification and classification of audio signals
CN103236260B (zh) 语音识别系统
CN105405439B (zh) 语音播放方法及装置
WO2021139425A1 (fr) Procédé, appareil et dispositif de détection d'activité vocale, et support d'enregistrement
WO2021128741A1 (fr) Procédé et appareil d'analyse de fluctuation d'émotion dans la voix, et dispositif informatique et support de stockage
US20170154640A1 (en) Method and electronic device for voice recognition based on dynamic voice model selection
CN111292762A (zh) 一种基于深度学习的单通道语音分离方法
Pillos et al. A Real-Time Environmental Sound Recognition System for the Android OS.
CN116490920A (zh) 用于针对由自动语音识别系统处理的语音输入检测音频对抗性攻击的方法、对应的设备、计算机程序产品和计算机可读载体介质
US20160027438A1 (en) Concurrent Segmentation of Multiple Similar Vocalizations
Bagul et al. Text independent speaker recognition system using GMM
CN109473102A (zh) 一种机器人秘书智能会议记录方法及系统
CN114333874B (zh) 处理音频信号的方法
Murugaiya et al. Probability enhanced entropy (PEE) novel feature for improved bird sound classification
Mu et al. MFCC as features for speaker classification using machine learning
CN109997186A (zh) 一种用于分类声环境的设备和方法
CN106356076B (zh) 基于人工智能的语音活动性检测方法和装置
CN109584888A (zh) 基于机器学习的鸣笛识别方法
Xie et al. Detection of anuran calling activity in long field recordings for bio-acoustic monitoring
WO2022134781A1 (fr) Procédé, appareil et dispositif de détection de parole prolongée et support de stockage
Fahmeeda et al. Voice based gender recognition using deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16845543

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16845543

Country of ref document: EP

Kind code of ref document: A1