CN118197303A - Intelligent speech recognition and sentiment analysis system and method - Google Patents
- Publication number
- CN118197303A (application number CN202410077111.6A)
- Authority
- CN
- China
- Prior art keywords
- voice
- user
- data
- recognition
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
Description
Technical Field
The present invention relates to the technical field of speech recognition and analysis, and in particular to an intelligent speech recognition and sentiment analysis system and method.
Background
Intelligent speech recognition (ASR) is an important part of the field of speech processing. After years of research, development and technological progress, ASR systems can accurately convert spoken language into text, and the rise of deep learning, especially end-to-end model design, has significantly improved ASR performance. ASR is widely used in voice assistants, voice search, voice note-taking and other fields, and plays a key role in improving user experience and enhancing the human-computer interaction capabilities of devices. Sentiment analysis, also known as affective computing or opinion mining, is a method based on natural language processing and machine learning for identifying and understanding the emotional tendency of text; its development has benefited from progress in natural language processing, text mining and machine learning.
Existing analysis systems usually perform sentiment analysis while the speech recognition assistant is recognizing the user's speech, and then optimize the assistant based on the sentiment analysis results once the current interaction with the user is complete. This approach has the following defect:
If sentiment analysis is performed every time the user inputs speech to the speech recognition assistant, the data processing burden on the analysis system increases, because the system must both recognize the speech and perform sentiment analysis, which reduces the speech recognition efficiency of the assistant.
Summary of the Invention
The purpose of the present invention is to provide an intelligent speech recognition and sentiment analysis system and method that address the deficiencies described in the background.
To achieve the above object, the present invention provides the following technical solution: an intelligent speech recognition and sentiment analysis method comprising the following steps:
The user wakes up the speech recognition assistant through a physical button or a voice keyword and inputs speech to the assistant; after recognizing the speech, the assistant provides corresponding feedback based on the recognized content;
While the user is inputting speech, the analysis system records the user's speech and acquires environmental data and device data during the voice input; after the voice interaction is completed, the environmental data and device data are analyzed by the optimization judgment model to determine whether sentiment analysis of the user's voice data is required;
When it is determined that sentiment analysis is required, the analysis system obtains the recording of the user, extracts feature data from the recording, identifies the user's emotion category by analyzing the feature data with sentiment analysis technology, obtains the assistant's feedback data for the user, analyzes the feedback data, and optimizes the speech recognition assistant based on the emotion category recognition result and the feedback data analysis result.
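As a compact illustration of how these three steps fit together, the sketch below wires them into one post-interaction routine. Every function name, the threshold value and the stub return values are illustrative assumptions; they are not components defined by the patent.

```python
# Minimal outline of the claimed flow: record while the assistant answers,
# decide afterwards whether sentiment analysis is worth running, then optimize.
# All functions below are placeholders standing in for real components.

OPTIMIZATION_THRESHOLD = 0.6  # illustrative value only


def optimization_coefficient(env_data: dict, dev_data: dict) -> float:
    """Stand-in for the optimization judgment model (see Embodiment 2)."""
    return 0.8  # placeholder score


def classify_emotion(recording: bytes) -> str:
    """Stand-in for feature extraction plus KNN emotion classification."""
    return "neutral"


def analyze_feedback(feedback: dict) -> float:
    """Stand-in for the feedback-coefficient analysis."""
    return 0.9


def post_interaction(recording: bytes, env_data: dict, dev_data: dict,
                     feedback: dict) -> None:
    coeff = optimization_coefficient(env_data, dev_data)
    if coeff >= OPTIMIZATION_THRESHOLD:           # only then spend the effort
        emotion = classify_emotion(recording)
        quality = analyze_feedback(feedback)
        print(f"optimize assistant using emotion={emotion}, quality={quality:.2f}")
    else:
        print("environment/device impact too large: skip sentiment analysis")


if __name__ == "__main__":
    post_interaction(b"...", {"noise": 0.2}, {"sample_rate": 16000},
                     {"response_ms": 320})
```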
In a preferred embodiment, after the voice interaction is completed, the environmental data and the device data are analyzed by the optimization judgment model; the environmental data includes a background noise index, and the device data includes a sampling rate and a speech correct recognition index.
In a preferred embodiment, after the voice interaction is completed, analyzing the environmental data and the device data with the optimization judgment model and determining whether sentiment analysis of the user's voice data is required comprises the following steps:
After the voice interaction is completed, the background noise index, the sampling rate and the speech correct recognition index are obtained, substituted into the optimization judgment model for analysis, and an optimization coefficient is output;
If the optimization coefficient is greater than or equal to the optimization threshold, it is determined that sentiment analysis of the user's voice data is required;
If the optimization coefficient is less than the optimization threshold, it is determined that sentiment analysis of the user's voice data is not required.
In a preferred embodiment, the establishment of the optimization judgment model comprises the following steps:
The background noise index, the sampling rate and the speech correct recognition index are standardized and then combined to compute the optimization coefficient;
After the optimization coefficient yh_x is obtained, it is compared with a preset optimization threshold to complete the establishment of the optimization judgment model; the optimization threshold is used to gauge how strongly the environment and the device affect speech emotion analysis.
In a preferred embodiment, the logic for obtaining the background noise index is: while the user is inputting speech, if other human voices appear, the period during which the decibel level of those voices exceeds a first decibel threshold is a human-voice decibel warning period;
While the user is inputting speech, if non-human noise appears, the period during which the decibel level of the non-human noise exceeds a second decibel threshold is a non-human-voice decibel warning period;
The background noise index is obtained by integrating over the human-voice decibel warning period and the non-human-voice decibel warning period: $BZS = \int_{t_x}^{t_y} Z(t)\,dt + \int_{t_i}^{t_j} Z(t)\,dt$, where Z(t) is the speech recognition error rate, [t_x, t_y] is the human-voice decibel warning period, and [t_i, t_j] is the non-human-voice decibel warning period.
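Read literally, the index integrates the recognition error rate Z(t) over the two warning periods. A numerical sketch of that computation is shown below; the error-rate curve, the time step and the example periods are invented for illustration.

```python
import numpy as np

def background_noise_index(z, voice_period, nonvoice_period, dt=0.01):
    """Integrate the recognition error rate z(t) over the two warning periods.

    z               -- callable returning the error rate at time t (assumed)
    voice_period    -- (t_x, t_y): human-voice decibel warning period
    nonvoice_period -- (t_i, t_j): non-human-voice decibel warning period
    """
    def integrate(t0, t1):
        ts = np.arange(t0, t1, dt)
        return float(np.sum([z(t) for t in ts]) * dt)   # simple Riemann sum

    return integrate(*voice_period) + integrate(*nonvoice_period)

# Example with an arbitrary error-rate curve (purely illustrative).
bzs = background_noise_index(lambda t: 0.05 + 0.02 * np.sin(t),
                             voice_period=(2.0, 4.5),
                             nonvoice_period=(6.0, 7.0))
print(f"BZS = {bzs:.3f}")
```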
In a preferred embodiment, the calculation logic of the speech correct recognition index is: obtain the number of speech segments input by the user during the entire interaction and calculate the correct recognition rate of each segment as F = ZQS/ZSL, where F is the correct recognition rate, ZQS is the number of correctly recognized words, and ZSL is the number of words in all speech during the interaction;
The correct recognition rate of the speech segment with the highest correct recognition rate over the entire interaction is taken as the speech correct recognition index: YZL = max(F1, F2, ..., Fn), where n is the number of speech segments in the interaction (n an integer) and max() denotes taking the maximum correct recognition rate.
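Under these definitions the index can be computed directly; the per-segment word counts in the example below are invented for illustration.

```python
def correct_recognition_index(segments):
    """segments: list of (correctly_recognized_words, total_words) per speech segment."""
    rates = [zqs / zsl for zqs, zsl in segments if zsl > 0]   # F for each segment
    return max(rates)                                         # YZL = max(F1, ..., Fn)

# Example: three segments from one interaction (made-up counts).
print(correct_recognition_index([(18, 20), (9, 12), (30, 31)]))  # -> about 0.968
```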
In a preferred embodiment, obtaining the speech recognition assistant's feedback data for the user and analyzing the feedback data comprises the following steps:
Obtain the speech recognition assistant's feedback data for the user; the feedback data includes recognition response speed, stall frequency and voice command coverage;
The recognition response speed, stall frequency and voice command coverage are combined to compute a feedback coefficient;
If the feedback coefficient is greater than or equal to a preset feedback threshold, the speech recognition performance of the assistant is evaluated as good;
If the feedback coefficient is less than the preset feedback threshold, the speech recognition performance of the assistant is evaluated as poor.
The present invention also provides an intelligent speech recognition and sentiment analysis system, comprising a wake-up module, a speech recognition module, a recording module, a data acquisition module, a judgment module, a feature extraction module, a sentiment analysis module, a recognition analysis module and an optimization module;
Wake-up module: the user wakes up the speech recognition module through the wake-up module;
Speech recognition module: the user inputs speech into the speech recognition module, which recognizes the speech and provides corresponding feedback based on the recognized content;
Recording module: records the user's speech while the user is inputting it;
Data acquisition module: acquires environmental data and device data during the user's voice input;
Judgment module: after the voice interaction is completed, analyzes the environmental data and device data with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required;
Feature extraction module: when sentiment analysis is required, obtains the recording of the user and extracts feature data from it;
Sentiment analysis module: identifies the user's emotion category by analyzing the feature data with sentiment analysis technology;
Recognition analysis module: obtains the speech recognition module's feedback data for the user and analyzes it;
Optimization module: optimizes the speech recognition module based on the emotion category recognition result and the feedback data analysis result.
In the above technical solution, the present invention provides the following technical effects and advantages:
1. While the user is inputting speech, the analysis system records the user's speech and acquires environmental data and device data during the input; after the voice interaction is completed, it analyzes the environmental data and device data with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required; when sentiment analysis is required, it identifies the user's emotion category using sentiment analysis technology, obtains the assistant's feedback data for the user, and optimizes the speech recognition assistant based on the emotion category recognition result and the feedback data. Because this method only records while the assistant is recognizing the user's speech and decides whether sentiment analysis is needed after the interaction is completed, it effectively reduces the data processing burden on the analysis system and improves the speech recognition efficiency of the assistant;
2. The present invention acquires environmental data and device data during the user's voice input and, after the voice interaction is completed, analyzes them with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required. When the environment and the device affect the user's input speech too strongly, continuing with sentiment analysis would increase the workload of the analysis system and could reduce analysis accuracy and distort the results; by first analyzing how the environment and device affect the recording, the system decides whether to carry out sentiment analysis according to the actual situation, so that an appropriate decision can be made when sentiment analysis of the recording is unnecessary.
Brief Description of the Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art could obtain other drawings from them.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a module diagram of the system of the present invention.
Detailed Description
To make the purpose, technical solution and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below in conjunction with the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on these embodiments, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the scope of protection of the present invention.
Embodiment 1: Referring to FIG. 1, this embodiment describes an intelligent speech recognition and sentiment analysis method comprising the following steps:
The user wakes up the speech recognition assistant through a physical button or a voice keyword and inputs speech to the assistant, which involves the following steps:
Physical button wake-up: the user activates the speech recognition assistant by pressing a specific physical button on the device, such as the voice assistant button on a mobile phone;
Voice keyword wake-up: when the device is in standby, the system continuously monitors ambient sound and activates the speech recognition assistant as soon as it detects that the user has spoken a predefined voice keyword (a minimal wake-word listening loop is sketched after this list);
Voice input: after wake-up, the user can start speaking voice commands, questions or requests; voice input may take the form of dialogue, queries, commands and so on;
Voice signal acquisition: the device collects the user's voice signal with a built-in or external microphone; acquisition may involve preprocessing steps such as noise reduction and voice endpoint detection to improve voice quality and accuracy;
User feedback and interaction: after the system responds, the user can continue to interact with the voice assistant and ask further questions or give further instructions; the feedback mechanism can be used to improve system performance and ensure user satisfaction.
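A simple keyword wake-up loop can be sketched with the SpeechRecognition package; the wake word, the recognizer choice and the error handling here are illustrative assumptions rather than part of the patent.

```python
# Listens continuously and "wakes" when a predefined keyword is heard.
# Requires: pip install SpeechRecognition pyaudio
import speech_recognition as sr

WAKE_WORD = "hello assistant"   # assumed wake word, not defined by the patent

def wait_for_wake_word() -> None:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)    # rough noise calibration
        while True:
            audio = recognizer.listen(source, phrase_time_limit=3)
            try:
                text = recognizer.recognize_google(audio).lower()
            except (sr.UnknownValueError, sr.RequestError):
                continue                                # nothing usable heard
            if WAKE_WORD in text:
                print("assistant activated")
                return

if __name__ == "__main__":
    wait_for_wake_word()
```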
After the speech recognition assistant recognizes the speech, it provides corresponding feedback based on the recognized content; the feedback includes text output, voice output or performing a corresponding action, and involves the following steps (a toy end-to-end sketch follows this list):
Speech recognition result generation: convert the user's speech into text to form the recognition result, using intelligent speech recognition techniques such as deep learning models to convert the speech signal into the corresponding text;
Intent recognition and command parsing: analyze the recognition result to determine the user's intent and request, using natural language processing techniques such as named entity recognition and keyword extraction to understand the intent and parse the specific command or request;
System response generation: based on the user's intent, generate the system's text or voice response, using dialogue generation models, predefined response templates or other natural language processing techniques;
Text output or speech synthesis: display the generated text response to the user, for example on a screen; if the user prefers a spoken response, the system converts the text to speech with a text-to-speech (TTS) engine that generates natural speech from the text;
Voice output: when the user expects to receive the response audibly, the system plays the generated speech through a speaker or headphones using the device's audio output;
Performing corresponding actions: depending on the user's request, the system may need to perform specific operations, calling related services or functions such as controlling smart home devices, sending messages or performing searches;
User feedback and interaction: give the user the opportunity to provide feedback, ask more questions or issue new commands, and design the conversational interaction to maintain real-time communication with the user;
Recording user context: the system records and maintains the user's conversation context to better understand the user's needs and keep the dialogue coherent, using context management techniques to ensure the system remembers the previous conversation history.
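The sketch below strings the recognition result through intent parsing, response generation and speech synthesis. The keyword-based intent rules and the pyttsx3 TTS engine are illustrative stand-ins for the deep-learning components mentioned in the text.

```python
# Toy recognition-result -> intent -> response -> speech pipeline.
# Requires: pip install pyttsx3
import pyttsx3

def parse_intent(text: str) -> str:
    """Very small keyword-based intent parser (placeholder for NLP models)."""
    if "weather" in text:
        return "query_weather"
    if "light" in text:
        return "control_device"
    return "chitchat"

RESPONSES = {                       # predefined response templates
    "query_weather": "Here is today's weather forecast.",
    "control_device": "Okay, switching the light now.",
    "chitchat": "I'm listening, please go on.",
}

def respond(recognized_text: str, speak: bool = True) -> str:
    intent = parse_intent(recognized_text.lower())
    reply = RESPONSES[intent]
    print(reply)                    # text output
    if speak:                       # optional voice output via TTS
        engine = pyttsx3.init()
        engine.say(reply)
        engine.runAndWait()
    return reply

if __name__ == "__main__":
    respond("please turn on the light", speak=False)
```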
While the user is inputting speech, the analysis system records the user's speech and acquires environmental data and device data during the input, which involves the following steps (a recording-and-metadata sketch follows this list):
Voice signal acquisition: collect the user's voice signal with hardware such as the device's microphone, using audio acquisition technology to convert the sound waveform into an electrical signal;
Voice signal preprocessing: preprocess the collected voice signal to improve recognition accuracy; preprocessing may include noise reduction, voice endpoint detection and echo removal;
Recording start mark: mark the start time of the voice input by recording a timestamp or other marker so that the time range of the input can be determined in later analysis;
Environmental data acquisition: obtain data related to the voice input environment, such as ambient noise level, temperature and humidity, using sensors or devices to record information about the surroundings;
Device data acquisition: obtain information about the voice input device, such as the device model, battery status and network connection status, using the device's sensors and system interfaces;
Recording end mark: mark the end time of the voice input by recording a timestamp or other marker so that the time range of the input can be determined in later analysis;
Voice signal storage: store the collected voice signal for later analysis and processing, saving it in the system as an audio file, usually in a format such as .wav or .mp3;
Data transmission and processing: transmit the collected voice signal, environmental data and device data to the corresponding analysis system, sending the data to the back-end system for further processing over a network connection or by local transfer.
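As a sketch of the recording and metadata-collection step, the snippet below records a fixed-length clip and stores it together with start/end timestamps and basic device data. The three-second duration, the 16 kHz rate and the file name are illustrative choices.

```python
# Record a short clip and collect simple device/session metadata.
# Requires: pip install sounddevice soundfile
import datetime
import sounddevice as sd
import soundfile as sf

SAMPLE_RATE = 16_000        # assumed sampling rate
DURATION_S = 3              # assumed clip length

def record_with_metadata(path="utterance.wav"):
    start = datetime.datetime.now()                       # recording start mark
    audio = sd.rec(int(DURATION_S * SAMPLE_RATE),
                   samplerate=SAMPLE_RATE, channels=1)
    sd.wait()                                             # block until done
    end = datetime.datetime.now()                         # recording end mark
    sf.write(path, audio, SAMPLE_RATE)                    # store as .wav

    device = sd.query_devices(kind="input")               # device data
    return {
        "file": path,
        "start": start.isoformat(),
        "end": end.isoformat(),
        "sample_rate": SAMPLE_RATE,
        "device_name": device["name"],
    }

if __name__ == "__main__":
    print(record_with_metadata())
```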
After the voice interaction is completed, the environmental data and device data are analyzed with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required. When sentiment analysis is required, the analysis system obtains the recording of the user and extracts feature data from it; the feature data includes emotional vocabulary, intonation, speech rate and so on, and the extraction involves the following steps (a feature-extraction sketch follows this list):
Voice data preprocessing: preprocess the recording to reduce noise and improve analysis accuracy, including noise reduction, voice endpoint detection and echo removal;
Voice signal feature extraction: extract useful features from the voice data for subsequent sentiment analysis;
Spectral features: including Mel-frequency cepstral coefficients (MFCC), sound energy, etc.;
Time-domain features: such as short-time energy and short-time zero-crossing rate;
Frequency-domain features: such as spectral mean and spectral bandwidth;
Emotional vocabulary extraction: extract words carrying emotional information from the recording, using an emotion lexicon to pick out emotion-related keywords from text analysis or the speech recognition results;
Intonation analysis: analyze the intonation patterns in the speech to understand the speaker's emotional state, using techniques such as fundamental frequency extraction and intonation curve analysis to identify pitch fluctuations;
Speech rate analysis: analyze how fast the speaker talks, estimating the speech rate from information such as syllable durations and changes in speaking speed;
Feature normalization: normalize the extracted features so they are on the same scale, which benefits later model training and analysis; standardization or other normalization methods can be used to bring the value ranges of different features close together;
Feature storage and analysis: store the extracted features for subsequent sentiment analysis and speech recognition, using a database or file system to hold the feature data and applying the corresponding algorithms for sentiment analysis.
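A feature-extraction pass over a stored recording, in the spirit of the list above, might use librosa; the particular feature set, frame parameters and normalization below are one reasonable choice, not the patent's exact configuration.

```python
# Extract spectral, time-domain and pitch features from a stored recording.
# Requires: pip install librosa numpy
import librosa
import numpy as np

def extract_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16_000)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)          # spectral shape (MFCC)
    energy = librosa.feature.rms(y=y)                           # short-time energy
    zcr = librosa.feature.zero_crossing_rate(y)                 # short-time zero-crossing rate
    bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # frequency-domain feature
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)               # pitch contour (intonation proxy)

    feats = np.hstack([
        mfcc.mean(axis=1),
        [energy.mean(), zcr.mean(), bandwidth.mean(), np.nanmean(f0)],
    ])
    # Crude global z-score so the features share a comparable scale.
    return (feats - feats.mean()) / (feats.std() + 1e-8)

if __name__ == "__main__":
    print(extract_features("utterance.wav").shape)   # expects a 17-dim vector
```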
The user's emotion category is identified by analyzing the feature data with sentiment analysis technology; the emotion categories include happiness, sadness, anger and so on, and the identification involves the following steps:
Obtain the user's recording data and extract feature data from it, including tone, speech rate, energy and frequency; use a trained KNN model to predict the emotion class of the feature data: for each test sample, find its K nearest neighbors, vote according to their emotion categories, and select the category with the most votes as the prediction;
The establishment of the KNN model includes the following steps:
Collect a data set containing user voice samples and the corresponding emotion category labels; annotate the voice samples with emotion categories such as happiness, sadness and anger; extract features from the voice data, converting the speech into numerical features usable by a machine learning model, including emotion-related acoustic features and possibly text features such as emotional vocabulary; preprocess the feature data to ensure consistency and usability, including data cleaning, normalization and standardization, so as to reduce differences between features and keep the distributions of the training and test sets consistent; split the data set into a training set and a test set for training and evaluating the model; select the best-performing number of neighbors K through cross-validation; train the KNN model on the training set by providing the training features and the corresponding emotion labels to the KNN algorithm so that it establishes the relationship between features and emotion categories, completing the establishment of the KNN model;
Selecting the best-performing number of neighbors K through cross-validation includes the following steps (a scikit-learn style sketch follows this list):
Choose a series of K values, usually starting small and gradually increasing, for example K = 1, 3, 5, 7, 9; for each K value, run a cross-validation loop in which the training set is split into a sub-training set and a validation set, the KNN model is trained on the sub-training set, and its performance is evaluated on the validation set; for each K value, compute performance metrics on the validation set such as accuracy, precision and recall (one or more metrics can be chosen); based on these metrics, select the K value that performs best on the validation set, for instance the one with the highest accuracy; finally, retrain the KNN model on the entire training set (including the validation data) with the selected optimal K.
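The K selection and training procedure described above maps closely onto scikit-learn. The random data below stands in for real labeled voice features; everything about the data and the candidate K values is illustrative.

```python
# Choose K by cross-validation, then train and evaluate a KNN emotion classifier.
# Requires: pip install scikit-learn numpy
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                    # placeholder features: tone, rate, energy, pitch
y = rng.choice(["happy", "sad", "angry"], size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler().fit(X_train)           # standardize features
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                      cv=5)                      # cross-validated K selection
search.fit(X_train, y_train)

print("best K:", search.best_params_["n_neighbors"])
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```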
For example, the voice data collected in this application is shown in Table 1:
Table 1
Sound energy and tone are selected as features and the data set is split into a training set and a test set using the hold-out method, with 80% of the data for training and 20% for testing. Assuming K = 3, the model is trained on the training-set features and the corresponding emotion labels; for each sample, the KNN algorithm finds the 3 nearest neighbors, and the trained KNN model is then used to predict the emotion class of new samples in the test set by finding the 3 nearest neighbors of each test sample.
Obtain the speech recognition assistant's feedback data for the user; after analyzing the feedback data, optimize the speech recognition assistant based on the emotion category recognition result and the feedback data analysis result.
In this application, while the user is inputting speech, the analysis system records the user's speech and acquires environmental data and device data during the input; after the voice interaction is completed, it analyzes the environmental data and device data with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required; when it is required, the user's emotion category is identified with sentiment analysis technology, the assistant's feedback data for the user is obtained, and the assistant is optimized based on the emotion category recognition result and the feedback data. Because the method only records while the assistant is recognizing the user's speech and decides whether sentiment analysis is needed after the interaction is completed, it effectively reduces the data processing burden on the analysis system and improves the speech recognition efficiency of the assistant;
This application acquires environmental data and device data during the user's voice input and, after the interaction is completed, analyzes them with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required. When the environment and the device affect the user's input speech too strongly, continuing with sentiment analysis would increase the workload of the analysis system and could reduce accuracy and distort the results; by first analyzing how the environment and device affect the recording, the system decides whether to carry out sentiment analysis according to the actual situation, so that an appropriate decision can be made when sentiment analysis of the recording is unnecessary.
Embodiment 2: After the voice interaction is completed, the environmental data and device data are analyzed with the optimization judgment model to determine whether sentiment analysis of the user's voice data is required, which includes the following steps:
After the voice interaction is completed, the environmental data and device data are analyzed with the optimization judgment model; the environmental data includes the background noise index, and the device data includes the sampling rate and the speech correct recognition index;
After the voice interaction is completed, the background noise index BZS, the sampling rate CYL and the speech correct recognition index YZL are obtained, substituted into the optimization judgment model for analysis, and the optimization coefficient is output;
If the optimization coefficient is greater than or equal to the optimization threshold, it is determined that sentiment analysis of the user's voice data is required;
If the optimization coefficient is less than the optimization threshold, it is determined that sentiment analysis of the user's voice data is not required;
The logic for obtaining the background noise index is: while the user is inputting speech, if voices other than the user's appear, the error rate with which the analysis system recognizes the user's speech emotion increases, and the louder those other voices are, the larger the error rate becomes; therefore, the period during which other human voices appear and their decibel level exceeds the first decibel threshold is the human-voice decibel warning period;
While the user is inputting speech, non-human noise not only reduces the accuracy with which the analysis system recognizes the user's speech emotion but may also damage the recording equipment, and the louder the non-human noise, the larger the error rate becomes; therefore, the period during which non-human noise appears and its decibel level exceeds the second decibel threshold is the non-human-voice decibel warning period;
The background noise index is obtained by integrating over the human-voice decibel warning period and the non-human-voice decibel warning period: $BZS = \int_{t_x}^{t_y} Z(t)\,dt + \int_{t_i}^{t_j} Z(t)\,dt$, where Z(t) is the speech recognition error rate, [t_x, t_y] is the human-voice decibel warning period, and [t_i, t_j] is the non-human-voice decibel warning period;
While the user is inputting speech, the appearance of voices other than the user's increases the error rate with which the analysis system recognizes the user's speech emotion, specifically:
Confusion and overlap: other people's voices may mix with the user's speech, making it difficult for the sentiment analysis system to accurately separate and recognize the user's emotion; because of this confusion and overlap, the system may mistakenly include other people's emotional content in the analysis, increasing the error rate;
Differences in emotional expression: different individuals may express the same emotion in different ways, and other people's voices may introduce additional variation in emotional expression; the sentiment analysis system must be able to distinguish the expression styles of different speakers, otherwise the analysis of the user's emotion may be misled;
Emotional shifts: the user's emotion may change during voice input, and other people's voices may carry different emotional tones; the system needs to track and understand possible emotional shifts in the speech, and other people's emotions may introduce noise that makes the analysis more difficult;
Voice command confusion: if other people issue similar voice commands while the user is expressing emotion, the system may struggle to determine which voice is the user's emotional expression, which may cause it to misattribute other people's expressions or commands to the user and increase the error rate;
Background noise introduction: other people's speech may enter the signal as background noise, lowering the quality of the voice signal; background noise can make it difficult for the system to accurately extract and analyze the emotional features in the user's speech, increasing the error rate.
Non-human noise not only reduces the accuracy with which the analysis system recognizes the user's speech emotion but may also damage the recording equipment, specifically:
Reduced sentiment analysis accuracy: non-human noise such as mechanical or electronic noise may mix with the user's speech, making it difficult for the sentiment analysis system to accurately extract and analyze emotional features; the additional signal introduced by non-human noise can interfere with feature extraction and thus reduce the accuracy of the system's judgment of the user's emotion;
Damage to recording equipment: strong non-human noise, especially high-intensity noise, can adversely affect the recording equipment, damaging the microphone or other components, shortening the equipment's life or even breaking it;
Reduced signal-to-noise ratio: non-human noise introduces additional noise components that lower the signal-to-noise ratio of the speech signal, making it harder for the speech emotion analysis system to clearly recognize and analyze the user's speech;
Speech signal distortion: non-human noise may distort the speech signal so that the original speech information cannot be transmitted clearly to the analysis system; a distorted signal reduces the accuracy of the sentiment analysis of the user's emotional expression, because key emotional features may already have been affected by the distortion.
The sampling rate can be obtained as follows:
Operating system settings: on Windows, open the Sound settings, select the recording device and view its properties; the device's default sampling rate can be found on the Advanced (or Advanced Properties) tab. On macOS, open Audio MIDI Setup, select Audio Devices and view the device configuration, including the sampling rate. On Linux, command-line tools such as arecord or aplay can be used, and the device's sampling rate can be obtained through their parameters;
Device documentation: consult the device's user manual or technical specification sheet; the specification sheet usually provides detailed information about the sampling rate;
Audio device management tools: in some cases the device manufacturer provides a dedicated audio device management tool through which the detailed settings of the device, including the sampling rate, can be viewed and configured;
Application settings: in some applications, the audio device configuration, including the sampling rate, can be found directly in the settings or preferences.
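Programmatically, the default sampling rate of the current input device can also be read through an audio library; the sounddevice query below is one assumed way to do it.

```python
# Read the default sampling rate of the system's input device.
# Requires: pip install sounddevice
import sounddevice as sd

device_info = sd.query_devices(kind="input")
print(device_info["name"], device_info["default_samplerate"])
```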
Since the background noise index is obtained by analyzing how human or non-human decibel levels affect the sentiment analysis error rate, in practice there are also cases where an abnormality in the device itself lowers the speech recognition accuracy;
In practical applications, however, even if an existing speech recognition assistant fails to recognize a few characters or words in a user's utterance correctly, it automatically fills in the misrecognized characters or words by understanding the context;
Nevertheless, the misrecognized characters or words prevent the analysis system from accurately analyzing the emotion in the user's speech and increase the analysis error; therefore, this application also analyzes the recognition accuracy of the user's input speech;
The calculation logic of the speech correct recognition index is: obtain the number of speech segments input by the user during the entire interaction and calculate the correct recognition rate of each segment as F = ZQS/ZSL, where F is the correct recognition rate, ZQS is the number of correctly recognized words, and ZSL is the number of words in all speech during the interaction;
The correct recognition rate of the speech segment with the highest correct recognition rate over the entire interaction is taken as the speech correct recognition index: YZL = max(F1, F2, ..., Fn), where n is the number of speech segments in the interaction (n an integer) and max() denotes taking the maximum correct recognition rate.
The establishment of the optimization judgment model includes the following steps:
After the background noise index BZS, the sampling rate CYL and the speech correct recognition index YZL are standardized, they are combined to compute the optimization coefficient yh_x;
Here BZS is the background noise index, CYL is the sampling rate, YZL is the speech correct recognition index, and α, β and γ are the proportional coefficients of the sampling rate, the speech correct recognition index and the background noise index respectively, with α, β and γ all greater than 0;
After the optimization coefficient yh_x is obtained, it is compared with the preset optimization threshold to complete the establishment of the optimization judgment model; the optimization threshold is used to gauge how strongly the environment and the device affect speech emotion analysis.
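The exact expression for yh_x is not reproduced in this text, so the sketch below assumes a simple weighted combination in which the sampling rate and the recognition index raise the coefficient and the noise index lowers it; the functional form, the weights and the threshold are all assumptions, not the patent's formula.

```python
def optimization_coefficient(bzs: float, cyl: float, yzl: float,
                             alpha: float = 0.4, beta: float = 0.4,
                             gamma: float = 0.2) -> float:
    """Assumed form: higher sampling rate / recognition index help, noise hurts.

    bzs, cyl, yzl are expected to be standardized to [0, 1] beforehand.
    """
    return alpha * cyl + beta * yzl - gamma * bzs

OPTIMIZATION_THRESHOLD = 0.5     # illustrative threshold

def needs_sentiment_analysis(bzs, cyl, yzl):
    return optimization_coefficient(bzs, cyl, yzl) >= OPTIMIZATION_THRESHOLD

print(needs_sentiment_analysis(bzs=0.1, cyl=0.9, yzl=0.95))   # True
print(needs_sentiment_analysis(bzs=0.9, cyl=0.3, yzl=0.40))   # False
```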
Obtaining the speech recognition assistant's feedback data for the user and analyzing the feedback data includes the following steps:
Obtain the speech recognition assistant's feedback data for the user; the feedback data includes recognition response speed, stall frequency and voice command coverage;
The recognition response speed, stall frequency and voice command coverage are combined to compute the feedback coefficient fk_x, where XYD, FGL and KDL are the recognition response speed, voice command coverage and stall frequency respectively, and a1, a2 and a3 are the proportional coefficients of the recognition response speed, voice command coverage and stall frequency respectively, with a1, a2 and a3 all greater than 0;
If the feedback coefficient fk_x is greater than or equal to the preset feedback threshold, the speech recognition performance of the assistant is evaluated as good;
If the feedback coefficient fk_x is less than the preset feedback threshold, the speech recognition performance of the assistant is evaluated as poor;
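As with the optimization coefficient, the expression for fk_x is not reproduced here, so this sketch assumes a weighted combination in which a higher (better) response-speed score and higher command coverage raise the score while the stall frequency lowers it; the names, weights and threshold are assumptions.

```python
def feedback_coefficient(xyd: float, fgl: float, kdl: float,
                         a1: float = 0.4, a2: float = 0.4, a3: float = 0.2) -> float:
    """xyd: recognition response speed, fgl: command coverage, kdl: stall frequency
    (all assumed standardized to [0, 1])."""
    return a1 * xyd + a2 * fgl - a3 * kdl

FEEDBACK_THRESHOLD = 0.5   # illustrative threshold

score = feedback_coefficient(xyd=0.8, fgl=0.9, kdl=0.1)
print("good" if score >= FEEDBACK_THRESHOLD else "poor")   # -> good
```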
The recognition response speed can be obtained as follows: in the code of the speech recognition assistant, a timer or timestamps can be used to record the time at which the user's voice input starts and the time at which the system produces a response; the difference between the two gives the assistant's response time.
The stall frequency can be obtained as follows: add detailed logging to the speech recognition assistant, including the timestamp, processing time and response time of each recognition request, and analyze these logs to determine whether the system stalls when responding to voice input, as well as the frequency and duration of the stalls.
The voice command coverage can be obtained as follows: if the system is already in real-world use, user logs can be analyzed to learn which voice commands users actually issue; this can be done by anonymously collecting and analyzing users' voice input data, extracting the voice commands from the logs and then analyzing the success rate of each command to evaluate the system's coverage in actual use.
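A lightweight way to capture the response time and suspected stalls is to wrap the recognition call with timestamps and write the results to a log for later analysis; the one-second stall threshold and the fake ASR function below are illustrative.

```python
# Measure response time per request and log suspected stalls for later analysis.
import logging
import time

logging.basicConfig(filename="assistant.log", level=logging.INFO,
                    format="%(asctime)s %(message)s")

STALL_THRESHOLD_S = 1.0          # assumed: responses slower than this count as a stall

def timed_recognition(recognize, audio):
    start = time.perf_counter()
    result = recognize(audio)                       # the actual ASR call
    elapsed = time.perf_counter() - start
    logging.info("response_time=%.3fs stalled=%s", elapsed, elapsed > STALL_THRESHOLD_S)
    return result, elapsed

if __name__ == "__main__":
    # Fake recognizer that sleeps briefly and returns a fixed transcript.
    fake_asr = lambda audio: (time.sleep(0.2), "turn on the light")[1]
    print(timed_recognition(fake_asr, b"..."))
```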
Optimizing the speech recognition assistant based on the emotion category recognition result and the feedback data analysis result includes the following steps:
Associating emotions with command responses: associate emotion categories with corresponding command responses; for example, when the user expresses anger, the system can respond more calmly or provide additional support to solve the user's problem.
Personalized emotion models: consider developing personalized emotion models that can adapt to different users' styles of emotional expression, improving the accuracy with which the system understands and recognizes user emotions.
Feedback mechanism: introduce an emotional feedback mechanism into user interaction so that the system can adjust its interaction strategy according to the user's emotional state; for example, when it detects that the user is pleased, the system can respond in a friendlier and more relaxed way.
User surveys and feedback: conduct user surveys to understand users' expectations of and feedback on the assistant's emotion recognition, collect users' views on the accuracy of the system's emotion recognition, and adjust accordingly.
Optimizing emotion labels and classification: regularly review and optimize the emotion category labels and the classification scheme, ensuring that the definitions of the emotion categories match user expectations and the actual context, so as to improve the accuracy of emotion recognition.
Handling multimodal input: consider handling multimodal input, for example combining speech and image information for sentiment analysis; this can deepen the understanding of the user's emotion and is especially useful in video calls or voice interaction.
Privacy protection: when optimizing the emotion recognition function, take care to protect user privacy; clearly inform users of the purpose of emotion recognition and how it is used, and ensure compliance with privacy regulations.
Voice data quality optimization: the speech recognition assistant may be affected by the quality of the voice input, such as noise or echo; use noise suppression techniques and echo cancellation algorithms, or give clear voice input guidance before the user speaks, to improve the quality of the voice data.
Multimodal input support: relying only on voice input may limit system performance, especially in complex interaction scenarios; support multimodal input, for example combining voice and text input, to improve the system's understanding of user intent and its accuracy.
Extending voice command coverage: the assistant may not cover the wide range of voice commands users issue; extend the coverage of voice commands, including common commands and domain-specific commands, to meet more user needs.
Real-time performance monitoring: the assistant may have real-time performance problems that cause stalls or delays; introduce a real-time performance monitoring mechanism to track system performance and spot potential problems early, so the system can handle voice input more smoothly.
Improving language and accent adaptability: the assistant may perform poorly when handling multiple languages and accents; improve the language model and introduce accent-adaptation techniques to raise recognition accuracy in different contexts.
Tuning speech recognition engine parameters: the default parameters of the speech recognition engine may not suit a specific application scenario; adjust parameters such as those for audio feature extraction and model depth to optimize recognition performance.
User feedback mechanism: when recognition errors occur, the absence of user feedback may mean problems go undetected; introduce a feedback mechanism that lets users report recognition errors and collect this feedback to guide system optimization.
Model iteration and updating: the speech recognition model in use may be outdated or ill-suited to current users' language habits; update it regularly, using the latest speech data and techniques for model iteration and optimization.
Difficulty level adaptation: the speech recognition system may be unable to handle complex or hard-to-understand voice input; introduce a difficulty-adaptation mechanism so the system can dynamically adjust how it processes voice input for users at different difficulty levels.
Privacy and security safeguards: insufficient privacy and security safeguards may reduce users' trust in the assistant; take measures to ensure the secure storage and processing of voice data, and clearly inform users of the privacy policy and how their data is used.
Embodiment 3: Referring to FIG. 2, the intelligent speech recognition and sentiment analysis system of this embodiment comprises a wake-up module, a speech recognition module, a recording module, a data acquisition module, a judgment module, a feature extraction module, a sentiment analysis module, a recognition analysis module and an optimization module;
Wake-up module: the user wakes the speech recognition module through the wake-up module;
Speech recognition module: the user speaks to the speech recognition module; after the speech is recognized, the module gives feedback based on the recognized content, in the form of text output, speech output or a corresponding action, and the feedback data is sent to the recognition analysis module;
Recording module: while the user is speaking, the user's voice is recorded and the recording data is sent to the feature extraction module;
Data acquisition module: environmental data and device data are collected during the user's voice input and sent to the judgment module;
Judgment module: after the voice interaction is completed, an optimized judgment model analyzes the environmental data and the device data and decides whether sentiment analysis should be performed on the voice data the user entered; the decision is sent to the feature extraction module;
Feature extraction module: when sentiment analysis is required, the user's recording data is retrieved and feature data is extracted from it, including emotion vocabulary, intonation and speech rate, among others; the feature data is sent to the sentiment analysis module;
Sentiment analysis module: the feature data is analyzed with sentiment analysis techniques to identify the user's emotion category, such as happy, sad or angry, or to perform sentiment polarity analysis, such as positive, negative or neutral; the recognition result is sent to the optimization module (a simplified sketch of this judgment, feature extraction and classification flow is given after this module list);
Recognition analysis module: the feedback data that the speech recognition module produced for the user is retrieved and analyzed, and the analysis result is sent to the optimization module;
Optimization module: the speech recognition module is optimized on the basis of the emotion category recognition results and the feedback data analysis results.
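To make the data flow of Embodiment 3 easier to follow, the sketch below strings together toy versions of the judgment, feature extraction and sentiment classification steps; every threshold, the small emotion word list and the rule-based classifier are hypothetical stand-ins for the optimized judgment model and the trained sentiment model the embodiment actually relies on.

```python
import numpy as np

def should_analyze(env: dict, device: dict) -> bool:
    """Judgment module sketch: the noise and battery thresholds are hypothetical."""
    return env.get("noise_db", 0.0) < 70.0 and device.get("battery", 1.0) > 0.2

def extract_features(samples: np.ndarray, sample_rate: int, transcript: str) -> dict:
    """Crude stand-ins for intonation, speech rate and emotion vocabulary features."""
    duration = len(samples) / sample_rate
    zcr = float(np.mean(np.abs(np.diff(np.sign(samples)))) / 2.0)   # rough pitch proxy
    energy = float(np.sqrt(np.mean(samples ** 2)))
    words_per_sec = len(transcript.split()) / max(duration, 1e-6)
    neg_words = sum(w in {"angry", "terrible", "hate"} for w in transcript.lower().split())
    return {"zcr": zcr, "energy": energy, "rate": words_per_sec, "neg_words": neg_words}

def classify_emotion(feat: dict) -> str:
    """Toy rule-based classifier; a deployed system would use a trained model."""
    if feat["neg_words"] > 0 and feat["energy"] > 0.1:
        return "angry"
    if feat["rate"] > 3.5 and feat["energy"] > 0.1:
        return "happy"
    return "neutral"

if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(1)
    audio = 0.2 * rng.standard_normal(sr * 2)                 # 2 s of placeholder audio
    if should_analyze({"noise_db": 55.0}, {"battery": 0.8}):
        feats = extract_features(audio, sr, "I hate waiting this is terrible")
        print(feats, "->", classify_emotion(feats))
```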
The formulas above all operate on dimensionless numerical values; each formula approximates the real situation and was obtained by software simulation over a large amount of collected data, and the preset parameters in the formulas are set by those skilled in the art according to the actual conditions.
In the description of this specification, reference to the terms "one embodiment", "an example", "a specific example" and the like means that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example, and the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the present invention disclosed above are intended only to help explain the invention. They do not set out every detail, nor do they limit the invention to the specific implementations described. Obviously, many modifications and variations are possible in light of this specification. These embodiments were selected and described in detail in order to better explain the principles and practical application of the invention, so that those skilled in the art can understand and use it well. The invention is limited only by the claims together with their full scope and equivalents.