
WO2021189903A1 - Audio-based user state recognition method and apparatus, electronic device, and storage medium - Google Patents

Audio-based user state recognition method and apparatus, electronic device, and storage medium

Info

Publication number
WO2021189903A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
target
spectrogram
voice signal
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/131983
Other languages
English (en)
Chinese (zh)
Inventor
魏文琦
王健宗
贾雪丽
张之勇
程宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Publication of WO2021189903A1


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique

Definitions

  • This application relates to the field of artificial intelligence, and in particular to an audio-based user state recognition method, device, electronic equipment, and storage medium.
  • An audio-based user state recognition method provided in this application includes:
  • acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set;
  • training a pre-built deep learning network model with the target spectrogram set, based on an attention mechanism and small-sample learning, to obtain a user state recognition model;
  • when audio of a user to be recognized is received, performing feature conversion on the audio of the user to be recognized to obtain a spectrogram to be recognized;
  • using the user state recognition model to recognize the to-be-recognized spectrogram to obtain a user state recognition result.
  • the present application also provides an audio-based user state recognition device, which includes:
  • the model generation module is used to obtain an audio training set and perform feature conversion on each audio in the audio training set to obtain a target spectrogram set, and, based on the attention mechanism and small-sample learning, to use the target spectrogram set to train the pre-built deep learning network model to obtain the user state recognition model;
  • the state recognition module is used to perform feature conversion on the audio of the user to be recognized, when that audio is received, to obtain the spectrogram to be recognized, and to use the user state recognition model to recognize the spectrogram to be recognized and obtain the user state recognition result.
  • This application also provides an electronic device, which includes:
  • a memory storing at least one instruction; and
  • a processor that executes the instructions stored in the memory to implement the following steps: acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set; training a pre-built deep learning network model with the target spectrogram set, based on an attention mechanism and small-sample learning, to obtain a user state recognition model; when audio of a user to be recognized is received, performing feature conversion on the audio to obtain a spectrogram to be recognized; and using the user state recognition model to recognize the to-be-recognized spectrogram to obtain a user state recognition result.
  • the present application also provides a computer-readable storage medium in which at least one instruction is stored; when the at least one instruction is executed by a processor in an electronic device, the following steps are implemented: acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set; training a pre-built deep learning network model with the target spectrogram set, based on an attention mechanism and small-sample learning, to obtain a user state recognition model; when audio of a user to be recognized is received, performing feature conversion on the audio to obtain a spectrogram to be recognized; and using the user state recognition model to recognize the to-be-recognized spectrogram to obtain a user state recognition result.
  • FIG. 1 is a schematic flowchart of an audio-based user state recognition method provided by an embodiment of this application
  • FIG. 2 is a schematic diagram of a detailed process of obtaining a target spectrogram set in an audio-based user state recognition method provided by an embodiment of the application;
  • FIG. 3 is a schematic diagram of modules of an audio-based user state recognition device provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of the internal structure of an electronic device that implements an audio-based user state recognition method provided by an embodiment of the application;
  • This application provides an audio-based user status recognition method.
  • FIG. 1 is a schematic flowchart of an audio-based user state recognition method provided by an embodiment of this application.
  • the method can be executed by a device, and the device can be implemented by software and/or hardware.
  • the audio-based user state recognition method includes:
  • the audio training set is a collection of audios containing initial tags.
  • the initial tags are the user's disease conditions, such as acute bronchitis, chronic pharyngitis, pertussis, or fever. Further, since the user's cough audio has corresponding sound features under different disease conditions, the audio training set is preferably a collection of cough audio corresponding to different disease conditions, where the sound feature is a frequency-domain characteristic of the cough audio that can be represented by a spectrogram.
  • the embodiment of the present application performs feature transformation on the audio training set to obtain the target spectrogram set as follows:
  • each audio in the audio training set is resampled to obtain the corresponding digital voice signal.
  • the present application uses an analog-to-digital converter to resample each audio in the audio training set.
  • a pre-emphasis operation is performed on each audio in the audio training set
  • performing the pre-emphasis operation on each audio in the audio training set includes: resampling each audio in the audio training set to obtain the corresponding digital voice signal; pre-emphasizing the digital voice signal to obtain a standard digital voice signal; and collecting all the standard digital voice signals to obtain a voice signal set.
  • the following formula is used to perform the pre-emphasis operation: y(t) = x(t) − μ·x(t−1), where x(t) is the digital voice signal, t is the time, y(t) is the standard digital voice signal, and μ is the preset adjustment value of the pre-emphasis operation; preferably, the value of μ ranges over [0.9, 1.0].
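  • As a minimal sketch (not part of the claimed method), the resampling and pre-emphasis steps could be implemented as follows; the 16 kHz rate and μ = 0.97 are illustrative assumptions:

```python
import numpy as np
import librosa  # assumed here for loading/resampling; any equivalent routine works


def pre_emphasize(path: str, target_sr: int = 16000, mu: float = 0.97) -> np.ndarray:
    """Resample one audio file, then apply y(t) = x(t) - mu * x(t-1)."""
    # Resampling puts every audio in the training set at a common rate,
    # yielding the corresponding digital voice signal x(t).
    x, _ = librosa.load(path, sr=target_sr)
    # Pre-emphasis boosts high frequencies; mu is the preset adjustment
    # value, chosen from the stated [0.9, 1.0] range.
    return np.append(x[0], x[1:] - mu * x[:-1])
```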
  • the standard voice signal in the voice signal set can only reflect the change of the audio in the time domain and cannot reflect the audio characteristics of the standard voice signal; therefore, to make the audio features more intuitive and clear, feature conversion is performed on each standard digital voice signal in the voice signal set.
  • performing feature conversion on each standard digital voice signal in the voice signal set includes: using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set onto the frequency domain to obtain the corresponding target spectrogram, and collecting all the target spectrograms to obtain the target spectrogram set.
  • the sound processing algorithm described in this application is the Mel filter algorithm.
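  • As a rough illustration of that mapping (the frame, hop, and filter-bank sizes are assumptions for the sketch, not values fixed by this application), a Mel spectrogram could be computed as follows:

```python
import numpy as np
import librosa  # assumed audio-processing library


def to_mel_spectrogram(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Map a standard digital voice signal onto the frequency domain,
    producing the target spectrogram."""
    # Short-time Mel filter-bank analysis of the pre-emphasized signal.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=64
    )
    # Log compression makes the spectral features more intuitive and
    # closer to perceived loudness.
    return librosa.power_to_db(mel, ref=np.max)
```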
  • the above steps only perform feature conversion on each audio of the audio training set and do not affect the initial label corresponding to each audio, so each target spectrogram in the target spectrogram set has a corresponding initial label.
  • based on the attention mechanism and small-sample learning, the target spectrogram set is used to train the pre-built deep learning network model to obtain an audio-based user state recognition model.
  • the training of the pre-built deep learning network model by using the target spectrogram set includes:
  • Step A: the target spectrogram set is divided into a training set and a test set;
  • the target spectrogram set is divided in this way so that the robustness of the model can be enhanced by continuously testing the trained model with the test set. Dividing the target spectrogram set into a training set and a test set includes: classifying each target spectrogram in the target spectrogram set according to its corresponding initial label to obtain the corresponding classified target spectrogram sets; randomly taking a preset number of target spectrograms from each classified target spectrogram set as a test subset, and using the complement of the test subset within the classified spectrogram set as the training subset; and collecting all the training subsets to obtain the training set, and all the test subsets to obtain the test set. A sketch of this split follows.
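  • A minimal sketch of this per-label split (the number held out per class and the helper names are assumptions for illustration):

```python
import random
from collections import defaultdict


def split_by_label(spectrograms, labels, n_test_per_class=2, seed=0):
    """Hold out a preset number of spectrograms per initial label as the
    test subset; the complement within each class forms the training subset."""
    by_label = defaultdict(list)
    for spec, label in zip(spectrograms, labels):
        by_label[label].append(spec)
    rng = random.Random(seed)
    train_set, test_set = [], []
    for label, specs in by_label.items():
        rng.shuffle(specs)
        test_set += [(s, label) for s in specs[:n_test_per_class]]
        train_set += [(s, label) for s in specs[n_test_per_class:]]
    return train_set, test_set
```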
  • Step B: use the training set to train the deep learning network to obtain an initial recognition model, test the initial recognition model with the test set to obtain a loss value, and return to Step A when the loss value is greater than a preset threshold;
  • when the loss value is less than or equal to the preset threshold, the initial recognition model is used as the user state recognition model.
  • the deep learning network in the embodiment of the present application is a convolutional neural network.
  • the sizes of the images in the target spectrogram set may differ, which in turn causes the features that the deep learning network model extracts from the target spectrograms during training to have different dimensions, so the model cannot be trained uniformly.
  • therefore, before using the training set to train the pre-built deep learning network, the embodiment of the present application adds an attention mechanism processing layer before the fully connected layer of the deep learning network model to perform image feature alignment, where the attention mechanism processing layer performs feature alignment according to the different image feature dimensions. For example: the image feature a extracted by the deep learning network model from target spectrogram A is a D*T1-dimensional matrix, and the image feature b extracted by the deep learning network model from target spectrogram B is a D*T2-dimensional matrix; the attention mechanism processing layer multiplies image feature a by a preset T1*1 weight matrix to convert it into a D-dimensional vector, and multiplies image feature b by a preset T2*1 weight matrix to convert it into a D-dimensional vector, thereby aligning image feature a and image feature b. A sketch of this alignment follows.
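  • A sketch of that alignment under stated assumptions (PyTorch is assumed, as the application names no framework; producing the T*1 weight matrix with a small learned scoring layer is one plausible realization, not the only one):

```python
import torch
import torch.nn as nn


class AttentionAlign(nn.Module):
    """Collapse a D*T feature map to a fixed D-dimensional vector so that
    spectrograms of different widths (T1, T2, ...) can be trained uniformly."""

    def __init__(self, d: int):
        super().__init__()
        # Scores each of the T time columns; a softmax over the scores
        # yields the T*1 weight matrix described above.
        self.score = nn.Linear(d, 1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (D, T) image feature from the convolutional layers.
        w = torch.softmax(self.score(feat.T), dim=0)  # (T, 1) weights
        return (feat @ w).squeeze(-1)  # (D, T) @ (T, 1) -> (D,) aligned
```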
  • the embodiment of the present application needs to test the initial recognition model to verify the recognition ability of the model and facilitate training adjustment of the model.
  • the recognition categories of the initial recognition model in the embodiment of the present application are the same as the categories of the initial tags in the target spectrogram set; for example, if the initial tags include the two types chronic pharyngitis and fever, the recognition categories in the initial recognition model are the same two types.
  • testing the initial recognition model according to the test set to obtain a loss value includes: extracting the feature vector corresponding to each of the initial tags in the initial recognition model to obtain a target feature vector; using the initial recognition model to perform feature extraction on each target spectrogram in the test subset to obtain a test feature vector; calculating the distance between the target feature vector corresponding to each initial tag and the corresponding test feature vector to obtain loss distance values; and calculating the average of all the loss distance values to obtain the loss value.
  • preferably, the embodiment of the present application adopts a Euclidean distance calculation to measure the distance between the target feature vector corresponding to each of the initial tags and the test feature vector.
  • in detail, the different recognition categories of the initial recognition model correspond to different fully connected layer nodes, and the fully connected layer nodes have a corresponding order. The embodiment of the present application obtains the output value of the fully connected layer node corresponding to each recognition category of the initial recognition model and combines the output values in the order of the corresponding fully connected layer nodes to obtain the corresponding target feature vector. Further, each target spectrogram in the test subset is input to the initial recognition model; according to the initial label corresponding to each target spectrogram in the test subset, the output values of the fully connected layer nodes corresponding to the recognition categories in the initial recognition model are obtained and combined in the order of the corresponding fully connected layer nodes to obtain the test feature vector. A sketch of this loss computation follows.
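  • A sketch of the Euclidean loss, assuming the per-label target vectors and per-sample test vectors have already been read off the fully connected layer as described (all names are illustrative):

```python
import torch


def euclidean_loss(target_vecs: dict, test_vecs: list) -> torch.Tensor:
    """Average Euclidean distance between each test feature vector and the
    target feature vector of its initial label."""
    # target_vecs: {label: (D,) tensor}; test_vecs: [(label, (D,) tensor), ...]
    distances = [
        torch.dist(target_vecs[label], vec)  # Euclidean (p=2) distance
        for label, vec in test_vecs
    ]
    return torch.stack(distances).mean()  # loss value = mean loss distance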
  • the audio training set may be stored in a blockchain node.
  • the audio of the user to be identified is of the same category as the audio in the audio training set.
  • for example, if the audio training set is a collection of cough audio, the audio of the user to be identified is the user's cough audio.
  • the method for performing feature conversion on the audio of the user to be identified in the embodiment of the present application is the same as the above-mentioned method for performing feature conversion on each audio of the audio training set.
  • the user status recognition result is the user's health status, such as acute bronchitis, chronic pharyngitis, pertussis, and fever.
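  • End to end, recognizing a new user's state might then look like the following sketch, reusing the hypothetical helpers defined above (the model and class list are assumptions):

```python
import torch


def recognize_user_state(model, audio_path: str, classes: list) -> str:
    """Feature-convert the user's audio, then recognize the resulting
    spectrogram with the trained user state recognition model."""
    y = pre_emphasize(audio_path)        # resample + pre-emphasis
    spec = to_mel_spectrogram(y)         # spectrogram to be recognized
    # Add a batch dimension (a convolutional model may also expect a
    # channel dimension, depending on its architecture).
    x = torch.tensor(spec).unsqueeze(0)
    with torch.no_grad():
        logits = model(x)
    return classes[int(logits.argmax())]  # e.g. "chronic pharyngitis"
```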
  • In summary: feature conversion is performed on each audio in the audio training set to obtain the target spectrogram set, which makes the features in the audio clearer and more intuitive and increases the accuracy of subsequent model training; based on the attention mechanism and small-sample learning, the target spectrogram set is used to train a pre-built deep learning network model to obtain a user state recognition model, which enhances the robustness and training accuracy of the model on a small-sample training set; feature conversion is performed on the audio of the user to be identified to obtain the spectrogram to be identified, which makes the audio features of the user clearer and more intuitive and improves the recognition accuracy of the model; and the user state recognition model recognizes the to-be-recognized spectrogram to obtain the user state recognition result. A small amount of easily obtainable audio data is used to train the model, which reduces the data resource consumption of model training, and the user's state can be recognized from the user's audio alone, which enhances the practicality of the model.
  • FIG. 3 is a functional block diagram of the audio-based user state recognition device of the present application.
  • the audio-based user state recognition apparatus 100 described in this application can be installed in an electronic device.
  • the audio-based user state recognition device may include a model generation module 101 and a state recognition module 102.
  • the modules described in the present application may also be called units, referring to a series of computer program segments that can be executed by the processor of an electronic device, can complete fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the model generation module 101 is used to obtain an audio training set and perform feature conversion on each audio in the audio training set to obtain a target spectrogram set, and, based on the attention mechanism and small-sample learning, to use the target spectrogram set to train the pre-built deep learning network model to obtain the user state recognition model.
  • the audio training set is a collection of audios containing initial tags.
  • the initial tags are the user's disease conditions, such as acute bronchitis, chronic pharyngitis, pertussis, or fever. Further, since the user's cough audio has corresponding sound features under different disease conditions, the audio training set is preferably a collection of cough audio corresponding to different disease conditions, where the sound feature is a frequency-domain characteristic of the cough audio that can be represented by a spectrogram.
  • the model generation module 101 in this embodiment of the present application uses the following means to perform feature transformation on the audio training set to obtain the target spectrogram set:
  • each audio in the audio training set is resampled to obtain the corresponding digital voice signal.
  • the present application uses an analog-to-digital converter to resample each audio in the audio training set.
  • a pre-emphasis operation is performed on each audio in the audio training set
  • performing the pre-emphasis operation on each audio in the audio training set includes: resampling each audio in the audio training set to obtain the corresponding digital voice signal; pre-emphasizing the digital voice signal to obtain a standard digital voice signal; and collecting all the standard digital voice signals to obtain a voice signal set.
  • model generation module 101 uses the following formula to perform the pre-emphasis operation:
  • y(t) = x(t) − μ·x(t−1), where x(t) is the digital voice signal, t is the time, y(t) is the standard digital voice signal, and μ is the preset adjustment value of the pre-emphasis operation; preferably, the value of μ ranges over [0.9, 1.0].
  • the standard voice signal in the voice signal set can only reflect the change of the audio in the time domain and cannot reflect the audio characteristics of the standard voice signal; therefore, to make the audio features more intuitive and clear, feature conversion is performed on each standard digital voice signal in the voice signal set.
  • the model generation module 101 in the embodiment of the present application uses the following means to perform feature conversion on each standard digital voice signal in the voice signal set: using a preset sound processing algorithm to map each standard digital voice signal in the voice signal set onto the frequency domain to obtain the corresponding target spectrogram, and collecting all the target spectrograms to obtain the target spectrogram set.
  • the sound processing algorithm described in this application is the Mel filter algorithm.
  • the above steps only perform feature conversion on each audio of the audio training set and do not affect the initial label corresponding to each audio, so each target spectrogram in the target spectrogram set has a corresponding initial label.
  • based on the attention mechanism and small-sample learning, the target spectrogram set is used to train the pre-built deep learning network model to obtain an audio-based user state recognition model.
  • the model generation module 101 uses the following means to train the pre-built deep learning network model:
  • Step A: the target spectrogram set is divided into a training set and a test set;
  • the target spectrogram set is divided in this way so that the robustness of the model can be enhanced by continuously testing the trained model with the test set. Dividing the target spectrogram set into a training set and a test set includes: classifying each target spectrogram in the target spectrogram set according to its corresponding initial label to obtain the corresponding classified target spectrogram sets; randomly taking a preset number of target spectrograms from each classified target spectrogram set as a test subset, and using the complement of the test subset within the classified spectrogram set as the training subset; and collecting all the training subsets to obtain the training set, and all the test subsets to obtain the test set.
  • Step B: use the training set to train the deep learning network to obtain an initial recognition model, test the initial recognition model with the test set to obtain a loss value, and return to Step A when the loss value is greater than a preset threshold;
  • when the loss value is less than or equal to the preset threshold, the initial recognition model is used as the user state recognition model.
  • the deep learning network in the embodiment of the present application is a convolutional neural network.
  • the sizes of the images in the target spectrogram set may differ, which in turn causes the features that the deep learning network model extracts from the target spectrograms during training to have different dimensions, so the model cannot be trained uniformly.
  • therefore, before using the training set to train the pre-built deep learning network, the embodiment of the present application adds an attention mechanism processing layer before the fully connected layer of the deep learning network model to perform image feature alignment, where the attention mechanism processing layer performs feature alignment according to the different image feature dimensions. For example: the image feature a extracted by the deep learning network model from target spectrogram A is a D*T1-dimensional matrix, and the image feature b extracted by the deep learning network model from target spectrogram B is a D*T2-dimensional matrix; the attention mechanism processing layer multiplies image feature a by a preset T1*1 weight matrix to convert it into a D-dimensional vector, and multiplies image feature b by a preset T2*1 weight matrix to convert it into a D-dimensional vector, thereby aligning image feature a and image feature b.
  • the embodiment of the present application needs to test the initial recognition model to verify the recognition ability of the model and facilitate training adjustment of the model.
  • the recognition categories of the initial recognition model in the embodiment of the present application are the same as the categories of the initial tags in the target spectrogram set; for example, if the initial tags include the two types chronic pharyngitis and fever, the recognition categories in the initial recognition model are the same two types.
  • in detail, the model generation module 101 in the embodiment of the present application obtains the loss value by the following means: extracting the feature vector corresponding to each of the initial tags in the initial recognition model to obtain a target feature vector; using the initial recognition model to perform feature extraction on each target spectrogram in the test subset to obtain a test feature vector; calculating the distance between the target feature vector corresponding to each initial tag and the corresponding test feature vector to obtain loss distance values; and calculating the average of all the loss distance values to obtain the loss value.
  • preferably, the embodiment of the present application adopts a Euclidean distance calculation to measure the distance between the target feature vector corresponding to each of the initial tags and the test feature vector.
  • in detail, the model generation module 101 in this embodiment of the application obtains the output value of the fully connected layer node corresponding to each recognition category of the initial recognition model and combines the output values in the order of the corresponding fully connected layer nodes to obtain the corresponding target feature vector; further, the model generation module 101 inputs each target spectrogram in the test subset to the initial recognition model and, according to the initial label corresponding to each target spectrogram in the test subset, obtains the output values of the fully connected layer nodes corresponding to the recognition categories in the initial recognition model and combines them in the order of the corresponding fully connected layer nodes to obtain the test feature vector.
  • the audio training set may be stored in a blockchain node.
  • the state recognition module 102 is configured to, when receiving the audio of the user to be recognized, perform feature conversion on the audio of the user to be recognized to obtain the spectrogram to be recognized, and to use the user state recognition model to recognize the spectrogram to be recognized and obtain the user state recognition result.
  • the audio of the user to be identified is of the same category as the audio in the audio training set.
  • for example, if the audio training set is a collection of cough audio, the audio of the user to be identified is the user's cough audio.
  • the method for performing feature conversion on the audio of the user to be identified in the embodiment of the present application is the same as the above-mentioned method for performing feature conversion on each audio of the audio training set.
  • the user status recognition result is the user's disease condition, such as acute bronchitis, chronic pharyngitis, whooping cough, and fever.
  • FIG. 4 is a schematic structural diagram of an electronic device that implements an audio-based user state recognition method according to the present application.
  • the electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and running on the processor 10, such as an audio-based user state recognition program.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, a mobile hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a magnetic memory, a magnetic disk, an optical disc, etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, for example, a mobile hard disk of the electronic device 1.
  • the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (Smart Media Card, SMC), or a secure digital (Secure Digital, SD) card equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can be used not only to store application software and various data installed in the electronic device 1, such as the code of an audio-based user status recognition program, etc., but also to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or multiple integrated circuits with the same or different functions, including one or more combinations of a central processing unit (CPU), a microprocessor, a digital processing chip, a graphics processor, various control chips, etc.
  • the processor 10 is the control core (Control Unit) of the electronic device; it uses various interfaces and lines to connect the components of the entire electronic device and, by running or executing the programs or modules stored in the memory 11 (for example, the audio-based user status recognition program) and calling the data in the memory 11, performs the various functions of the electronic device 1 and processes data.
  • the bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 4 only shows an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 4 does not constitute a limitation on the electronic device 1, which may include fewer or more components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power source (such as a battery) for supplying power to various components.
  • the power source may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power supply may also include any components such as one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • the electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface.
  • the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may also include a user interface.
  • the user interface may be a display (Display) and an input unit (such as a keyboard (Keyboard)).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, etc.
  • the display can also be appropriately called a display screen or a display unit, which is used to display the information processed in the electronic device 1 and to display a visualized user interface.
  • the audio-based user state recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run by the processor 10, can realize: acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set; training a pre-built deep learning network model with the target spectrogram set, based on an attention mechanism and small-sample learning, to obtain a user state recognition model; when audio of a user to be recognized is received, performing feature conversion on the audio to obtain a spectrogram to be recognized; and using the user state recognition model to recognize the to-be-recognized spectrogram to obtain a user state recognition result.
  • the integrated module/unit of the electronic device 1 can be stored in a computer-readable storage medium, which may be volatile or non-volatile.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), etc.
  • the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of: acquiring an audio training set, and performing feature conversion on each audio in the audio training set to obtain a target spectrogram set; training a pre-built deep learning network model with the target spectrogram set, based on an attention mechanism and small-sample learning, to obtain a user state recognition model; when audio of a user to be recognized is received, performing feature conversion on the audio to obtain a spectrogram to be recognized; and using the user state recognition model to recognize the to-be-recognized spectrogram to obtain a user state recognition result.
  • the computer-usable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required by at least one function, etc., and the storage data area may store data created by the use of blockchain nodes, etc.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional modules in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional modules.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a series of data blocks associated with one another by cryptographic methods; each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present invention relates to an audio-based user state recognition method and apparatus, and to an electronic device and a computer-readable storage medium. The method comprises: acquiring an audio training set, and performing feature conversion on each piece of audio in the audio training set so as to obtain a target spectrogram set (S1); on the basis of an attention mechanism and small-sample learning, training a pre-built deep learning network model by using the target spectrogram set so as to obtain a user state recognition model (S2); when audio of a user to be recognized is received, performing feature conversion on the audio of said user so as to obtain a spectrogram to be recognized (S3); and recognizing said spectrogram by means of the user state recognition model so as to obtain a user state recognition result (S4). The present application also relates to blockchain technology, and the target spectrogram set can be stored in a blockchain. By using the method, data resource consumption is reduced, and the practicality of a model is improved.
PCT/CN2020/131983 2020-10-09 2020-11-27 Audio-based user state recognition method and apparatus, electronic device, and storage medium Ceased WO2021189903A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011074898.9A CN112233700A (zh) 2020-10-09 2020-10-09 Audio-based user state recognition method and apparatus, and storage medium
CN202011074898.9 2020-10-09

Publications (1)

Publication Number Publication Date
WO2021189903A1 (fr)

Family

ID=74120698

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/131983 Ceased WO2021189903A1 (fr) 2020-10-09 2020-11-27 Audio-based user state recognition method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN112233700A (fr)
WO (1) WO2021189903A1 (fr)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114049637A (zh) * 2021-11-10 2022-02-15 重庆大学 Method, system, electronic device and medium for building a target recognition model
CN116509371A (zh) * 2022-01-21 2023-08-01 华为技术有限公司 Audio detection method and electronic device
CN114373484A (zh) * 2022-03-22 2022-04-19 南京邮电大学 Speech-driven small-sample learning method for multi-symptom characteristic parameters of Parkinson's disease
CN114722884B (zh) * 2022-06-08 2022-09-30 深圳市润东来科技有限公司 Audio control method, apparatus, device and storage medium based on environmental sound


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205535A (zh) * 2016-12-16 2018-06-26 北京酷我科技有限公司 Emotion annotation method and system
CN111666960B (zh) * 2019-03-06 2024-01-19 南京地平线机器人技术有限公司 Image recognition method and apparatus, electronic device, and readable storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104321015A (zh) * 2012-03-29 2015-01-28 The University of Queensland Method and apparatus for processing patient sounds
CN106073706A (zh) * 2016-06-01 2016-11-09 中国科学院软件研究所 Personalized information and audio data analysis method and system for the Mini-Mental State Examination
CN106202952A (zh) * 2016-07-19 2016-12-07 南京邮电大学 Machine-learning-based Parkinson's disease diagnosis method
CN106847262A (zh) * 2016-12-28 2017-06-13 华中农业大学 Automatic recognition and alarm method for pig respiratory diseases
WO2019023879A1 (fr) * 2017-07-31 2019-02-07 深圳和而泰智能家居科技有限公司 Cough sound recognition method and device, and storage medium
WO2019119050A1 (fr) * 2017-12-21 2019-06-27 The University Of Queensland A method for analysing cough sounds using disease signatures to diagnose respiratory diseases

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116434288A (zh) * 2021-12-31 2023-07-14 丰图科技(深圳)有限公司 State recognition method, apparatus, terminal and storage medium
CN114913860A (zh) * 2022-04-27 2022-08-16 中国工商银行股份有限公司 Voiceprint recognition method, apparatus, computer device, storage medium and program product
CN115221350A (zh) * 2022-07-29 2022-10-21 平安科技(深圳)有限公司 Event audio detection method and system based on few-shot metric learning
CN115547339A (zh) * 2022-08-10 2022-12-30 深圳市声扬科技有限公司 Speech processing method, processing apparatus, electronic device and storage medium
CN117476036A (zh) * 2023-12-27 2024-01-30 广州声博士声学技术有限公司 Environmental noise recognition method, system, device and medium
CN117476036B (zh) * 2023-12-27 2024-04-09 广州声博士声学技术有限公司 Environmental noise recognition method, system, device and medium
CN118410332A (zh) * 2023-12-27 2024-07-30 梅州市金航科技有限公司 Real-time steering gear fault diagnosis method and apparatus based on unsupervised domain adaptation
CN120853617A (zh) * 2025-09-19 2025-10-28 福建睿芯科技有限公司 Equipment working state recognition method based on a voiceprint recognition model

Also Published As

Publication number Publication date
CN112233700A (zh) 2021-01-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20927189

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20927189

Country of ref document: EP

Kind code of ref document: A1