CN1924996B - System and method for selecting audio content by using speech recognition - Google Patents
System and method for selecting audio content by using speech recognition
- Publication number
- CN1924996B (application number CN2005100991147A)
- Authority
- CN
- China
- Prior art keywords
- sound
- content
- module
- statement
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention is a system and method for selecting audio content by using speech recognition, used to obtain an audio sentence from sequentially played audio content and then process it in a processing system. The system comprises: a playback module for playing the audio content; a receiving module for receiving, in real time, a voice input sentence uttered by a user; a buffer module for temporarily storing the audio content played by the playback module within a specified interval, together with the voice input sentence uttered by the user; a recognition module for retrieving the audio content of the specified interval and the voice input sentence from the buffer module and performing speech recognition on them; and a conversion module for converting the audio sentence that the recognition module finds to best match the voice input sentence into a corresponding text sentence, which is transmitted to the processing system for processing.
Description
Technical Field
The present invention relates to a system and method for selecting audio content, and in particular to a system and method that use speech recognition to select a specific audio segment from audio content so that the segment can undergo further processing.
Background Art
Information today is mostly presented as written text, and written text often contains important or key phrases. A system can mark these key phrases so that users can identify them easily, for example by highlighting, underlining, quotation marks, a different color, or a change of font; alternatively, users can select them with tools such as a keyboard, mouse, or stylus. The selected key phrases can then be used for purposes such as advanced search or keyword indexing. For example, a website can attach hyperlinks to the key phrases in its web pages so that clicking one leads to another page, and an ordinary user reading an article on a computer screen can select a key phrase with the mouse and paste it into any Internet search engine to find related articles.
Most information content is still presented as text, and content expressed purely as sound remains a minority. However, as mobile devices become more widespread, limited screen size makes it more convenient to "listen" to some information than to "look" at it. With the growing adoption of technologies such as Bluetooth headsets and wireless Internet access, more and more information is delivered as audio content to be listened to, and how to select key phrases from such audio content has become a problem that needs to be solved.
Moreover, written text that is "looked at" conveys its information in parallel, whereas audio content that is "listened to" conveys its information sequentially. Existing selection tools for written text, such as hyperlinks or selecting a key phrase with the mouse, therefore cannot be used to select audio content, and there is a growing need for users to be able to interact effectively with audio content.
In summary, current techniques for selecting key phrases from audio content remain inadequate. In view of these shortcomings of the prior art, the inventor devised the present invention, a "system and method for selecting audio content by using speech recognition".
Summary of the Invention
The main object of the present invention is to provide a system and method for selecting audio content by using speech recognition, which apply existing speech recognition methods in a suitable combination to achieve effective interaction between audio content and the user.
Another object of the present invention is to provide a system and method for selecting audio content by using speech recognition in which, after a piece of audio content has been played, speech recognition is performed on the voice input sentence uttered by the user and on the audio content played within a specified interval before the user uttered it, so as to select a specific audio sentence from that audio content for subsequent processing.
A further object of the present invention is to provide a system for selecting audio content, used to obtain an audio sentence from sequentially played audio content and then process it in a processing system, comprising: a playback module for playing the audio content; a receiving module for receiving, in real time, a voice input sentence uttered by a user; a buffer module for temporarily storing the audio content played by the playback module within a specified interval together with the voice input sentence uttered by the user, the specified interval being the audio content played by the playback module within a last specified time at the moment the receiving module receives the voice input sentence; a recognition module for retrieving the audio content of the specified interval and the voice input sentence from the buffer module and performing speech recognition, thereby identifying by comparison the audio sentence within the specified interval that best matches the voice input sentence uttered by the user; and a conversion module, connected to the recognition module, for converting the audio sentence that the recognition module finds to best match the voice input sentence into a corresponding text sentence, which is then provided to the processing system for processing.
According to the above concept, the system further comprises a source database, which may contain a plurality of text contents. The conversion module can then also be connected to the source database and the playback module, so as to retrieve a text content from the source database, convert it into the audio content, and play it through the playback module.
According to the above concept, the source database may also contain a plurality of text contents together with voice information, in which case the playback module retrieves the voice data from the source database to play the audio content.
According to the above concept, the last specified time is 20 seconds.
According to the above concept, the processing system is a voice dialogue system, an index classification system, a control system, or an advanced search system. If the processing system is the advanced search system, a retrieval module can retrieve text or voice information related to the text sentence for the user.
A further object of the present invention is to provide a system for selecting audio content, used to obtain an audio sentence from sequentially played audio content, wherein the audio content further has a plurality of audio marks that mark a plurality of key terms in the audio content, the system comprising: a playback module for playing the audio content carrying the audio marks; a receiving module for receiving, in real time, a voice input sentence uttered by a user; a recognition module for performing speech recognition on the plurality of key terms of the audio content and the voice input sentence, thereby identifying by comparison the audio sentence among the key terms that best matches the voice input sentence uttered by the user; a buffer module for temporarily storing the audio content played by the playback module within a specified interval together with the voice input sentence uttered by the user, wherein the recognition module retrieves from the buffer module the audio content within the specified interval and the voice input sentence for recognition; and a conversion module for converting the audio sentence that the recognition module finds to best match the voice input sentence into a corresponding text sentence.
According to the above concept, the recognition module performs speech recognition either by direct waveform comparison, which compares the two sound waveforms directly to find the closest match, or by a method selected from a Hidden Markov Model (HMM), neural networks, Dynamic Time Warping (DTW), and template matching.
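As a rough illustration of direct waveform comparison, the sketch below slides a short query over a longer buffer of samples and scores each position by normalized correlation. The function names and the use of plain lists of samples are assumptions made for illustration; the patent does not specify a scoring formula, and a user's voice will not normally match played audio sample for sample, so a practical system would compare derived features instead.

```python
import math

def normalized_correlation(x, y):
    """Cosine similarity of two equal-length sample windows (0.0 if either is silent)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

def locate(query, played):
    """Slide `query` over `played` and return (best_offset, best_score)."""
    best_offset, best_score = 0, -1.0
    for offset in range(len(played) - len(query) + 1):
        window = played[offset:offset + len(query)]
        score = normalized_correlation(query, window)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

The returned offset marks where in the buffered audio the closest match begins, which is the position a later conversion step would read from.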
According to the above concept, the audio mark renders the key term at a different speed, pitch, or volume, or consists of a cue tone added before and after the key term.
According to the above concept, the text sentence produced by the conversion module is then provided to a processing system for subsequent processing.
A further object of the present invention is to provide a method for selecting audio content, used to obtain an audio sentence from sequentially played audio content and then carry out a subsequent processing procedure, comprising the following steps: (a) playing the audio content; (b) receiving a voice input sentence uttered by a user; (c) performing speech recognition on the voice input sentence and the audio content played within a specified interval; and (d) identifying by comparison, from the audio content within the specified interval, the audio content that best matches the voice input sentence uttered by the user, and then carrying out the subsequent processing procedure.
According to the above concept, the audio content further has a plurality of audio marks for marking a plurality of key terms in the audio content.
According to the above concept, step (c) further comprises performing speech recognition on the voice input sentence and the audio content within the specified interval that carries one of the plurality of key terms.
According to the above concept, step (d) further comprises identifying by comparison, among the plurality of key terms, the audio sentence that best matches the voice input sentence uttered by the user.
According to the above concept, step (c) performs speech recognition either by direct waveform comparison, which compares the two sound waveforms directly to find the closest match, or by a method selected from a Hidden Markov Model, a neural network, dynamic time warping, and template matching.
According to the above concept, step (d) further comprises a step (d1) of converting the audio content into a text sentence.
According to the above concept, the subsequent processing step is an advanced search step, a keyword indexing step, a voice dialogue system, or a control program.
The effects and objects of the present invention can be understood more fully from the following description of the embodiments.
Brief Description of the Drawings
FIG. 1(A) is a schematic diagram of the configuration of a system for selecting audio content by using speech recognition according to a first preferred embodiment of the present invention.
FIG. 1(B) is a schematic diagram of the configuration of a system for selecting audio content by using speech recognition according to a second preferred embodiment of the present invention.
FIG. 2 is a flowchart of a method for selecting audio content by using speech recognition according to a preferred embodiment of the present invention.
Detailed Description
Those of ordinary skill in the art should understand that the following description of the present invention is given by way of example only and is not intended to limit the invention.
The system and method for selecting audio content by using speech recognition according to the preferred embodiments are described below. The actual architecture and method adopted need not conform exactly to those described, and those of ordinary skill in the art can make various changes and modifications without departing from the true spirit and scope of the invention.
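Please refer to FIG. 1(A) and FIG. 1(B), which are schematic system architecture diagrams of the system and method for selecting audio content by using speech recognition disclosed by the present invention. The selection system 10 of the present invention comprises a playback module 11, a receiving module 12, a buffer module 13, a recognition module 14, a conversion module 15, and a source database 16. It selects an audio sentence from the audio content played by the playback module 11 and can then provide it to a processing system 17 for subsequent processing.
The playback module 11 plays the audio content so that a user hears it in chronological order, and the receiving module 12 receives, in real time, a voice input sentence uttered by the user. The buffer module 13 temporarily stores the audio content played by the playback module 11 within a specified interval together with the voice input sentence received by the receiving module 12. The recognition module 14 retrieves the audio content of the specified interval and the voice input sentence from the buffer module 13 and performs speech recognition, thereby identifying by comparison the audio sentence within the specified interval that best matches the voice input sentence uttered by the user. The conversion module 15 converts the audio sentence that the recognition module 14 finds to best match the voice input sentence into a corresponding text sentence, and the source database 16 supplies the audio content played by the playback module 11.
In addition, the composition of the selection system 10 differs slightly depending on the kind of information stored in the source database 16.
FIG. 1(A) shows the selection system 10 of the first embodiment, in which the source database 16 contains a plurality of text contents. The conversion module 15 is therefore also connected to the source database 16 and the playback module 11; it can retrieve one of the text contents from the source database 16, convert it into the audio content, and play it through the playback module 11. The conversion module 15 can also store the audio content to be played in the buffer module 13.
If the source database 16 instead contains a plurality of text contents together with voice information, then, as shown in FIG. 1(B), the source database 16 need not be connected to the conversion module 15. The playback module 11 can retrieve the voice data directly from the source database 16 and play it as the audio content, and the source database 16 can also store the audio content to be played in the buffer module 13.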
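Because the user hears the audio content in chronological order, the voice input sentence uttered by the user usually refers to audio content that has just been heard. The present invention therefore defines the specified interval as the audio content played by the playback module 11 within a last specified time at the moment the receiving module 12 receives the voice input sentence, and temporarily stores the audio content of that interval in the buffer module 13. The last specified time can be set to 20 seconds or to any other duration. When the receiving module 12 receives the voice input sentence uttered by the user, that sentence is also stored in the buffer module 13. The recognition module 14 then only needs to retrieve the audio content and the voice input sentence stored in the buffer module 13 and use speech recognition to select, by comparison, the audio sentence within the specified interval that best matches the voice input sentence. The conversion module 15 can also convert the selected audio sentence into a text sentence, which is then provided to the processing system 17 for processing.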
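One way to picture the buffer module's specified interval is a rolling window that keeps only the most recently played frames. The sketch below is a minimal illustration under assumptions not stated in the patent: the class name `PlaybackBuffer`, the use of timestamps in seconds, and the opaque `frame` objects are all hypothetical.

```python
from collections import deque

class PlaybackBuffer:
    """Keeps only the frames played within the last `window_seconds`."""

    def __init__(self, window_seconds=20.0):
        self.window_seconds = window_seconds
        self._frames = deque()  # (end_time_seconds, frame) pairs, oldest first

    def push(self, end_time, frame):
        """Record a frame that finished playing at `end_time`; drop stale frames."""
        self._frames.append((end_time, frame))
        cutoff = end_time - self.window_seconds
        while self._frames and self._frames[0][0] <= cutoff:
            self._frames.popleft()

    def snapshot(self):
        """Return the frames currently inside the specified interval."""
        return [frame for _, frame in self._frames]
```

When a voice input sentence arrives, `snapshot()` gives the recognition step exactly the audio that was played within the last specified time, so older content never has to be searched.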
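The processing system 17 can be a voice dialogue system, an index classification system, a control system, an advanced search system, and so on, and can carry out different subsequent processing procedures according to different needs. For example, the voice dialogue system can conduct a voice dialogue based on the meaning of the text sentence; the index classification system can run a keyword indexing procedure on the audio content; the control system can control other programs by interpreting the meaning of the text sentence; and the advanced search system can pass the text sentence to a retrieval module (not shown in the figures) to retrieve text or voice information related to the text sentence for the user.
Because the processing system 17 carries out different subsequent procedures for different needs, what it requires from the selection system 10 also differs. For example, if the processing system 17 is the index classification system, it may only need the selection system 10 to provide the audio content for index classification; if it is the voice dialogue system, the control system, or the advanced search system, it may need the selection system 10 to provide the text sentence for further analysis. The selection system 10 can therefore transmit either the audio sentence or the text sentence to the processing system 17, depending on its type. In terms of the actual information flow, if the selection system 10 is to transmit the audio sentence to the processing system 17, the recognition module 14 can transmit it; if the selection system 10 is to transmit the text sentence, the conversion module 15 can transmit the converted text sentence to the processing system 17.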
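The routing described above can be sketched as a small dispatch function. This is only an illustration: the enum, the function name `payload_for`, and the `to_text` callback standing in for the conversion module are assumptions, and the rule that only the index classification system receives raw audio follows the example given in the text rather than a fixed requirement.

```python
from enum import Enum

class ProcessingSystem(Enum):
    VOICE_DIALOGUE = "voice dialogue"
    INDEX_CLASSIFICATION = "index classification"
    CONTROL = "control"
    ADVANCED_SEARCH = "advanced search"

def payload_for(system, audio_sentence, to_text):
    """Choose what the selection system sends to the processing system.

    `to_text` is a hypothetical stand-in for the conversion module.
    """
    if system is ProcessingSystem.INDEX_CLASSIFICATION:
        # Sent directly from the recognition step, without conversion.
        return ("audio", audio_sentence)
    # Dialogue, control, and search systems analyse the text sentence.
    return ("text", to_text(audio_sentence))
```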
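Furthermore, the recognition module 14 performs speech recognition by direct waveform comparison or by acoustic model comparison. Direct waveform comparison compares the two sound waveforms directly to find the closest match, while acoustic model comparison performs speech recognition with an acoustic model such as a Hidden Markov Model (HMM), neural networks, Dynamic Time Warping (DTW), or template matching.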
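Of the methods listed, dynamic time warping is the simplest to show in a few lines. The sketch below computes the classic DTW distance between two one-dimensional feature sequences and uses it to pick the closest candidate. Treating each utterance as a plain list of numbers is an assumption for illustration; real systems compare multi-dimensional feature vectors, and the patent does not prescribe any particular feature or cost function.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances
                                    cost[i][j - 1],      # b advances
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def best_match(spoken, candidates):
    """Return the index of the candidate sequence closest to `spoken`."""
    return min(range(len(candidates)),
               key=lambda k: dtw_distance(spoken, candidates[k]))
```

Because DTW allows one sequence to stretch against the other, a phrase repeated more slowly or quickly than it was played can still align with a low distance.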
Please refer to FIG. 2, which is a flowchart of the method for selecting audio content by using speech recognition according to the present invention. In the method, the system first plays a piece of audio content (21) and then receives a voice input sentence uttered by the user (22). Speech recognition is performed on the voice input sentence and the audio content within a specified interval of the played audio content (23), and the audio content within the specified interval that best matches the voice input sentence is selected by comparison (24). A subsequent processing procedure is then carried out (25), which can be an advanced search step, a keyword indexing step, a voice dialogue system, or a control program. As described above, when the subsequent processing procedure requires text information, the method can also convert the audio content into a text sentence for that procedure to process.
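The flow of FIG. 2 can be sketched end to end if playback and audio capture are replaced by plain text arguments. This is a deliberate simplification and an assumption: the patent compares audio, whereas the sketch below assumes transcripts are already available and scores them with Python's `difflib.SequenceMatcher`. The function names are hypothetical.

```python
from difflib import SequenceMatcher

def select_sentence(played_sentences, spoken_text):
    """Steps 23-24: return the played sentence most similar to the spoken text."""
    def similarity(sentence):
        return SequenceMatcher(None, sentence.lower(), spoken_text.lower()).ratio()
    return max(played_sentences, key=similarity)

def run_selection(played_sentences, spoken_text, follow_up):
    """Steps 21-25, with playback and capture replaced by plain arguments."""
    selected = select_sentence(played_sentences, spoken_text)
    # Step 25: hand the selection to whatever follow-up procedure is configured.
    return follow_up(selected)
```

Passing the follow-up procedure as a callable keeps the selection logic independent of whether the result feeds a search, an index, a dialogue, or a control program.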
In addition, to make speech recognition more efficient, the present invention can actively mark the audio content, so that the audio content carries a plurality of audio marks that mark a plurality of key terms in it. This lets the user know, while listening, that a term is a key term. The audio mark renders the key term at a different speed, pitch, or volume, or consists of a cue tone added before and after the key term.
The audio marks can be stored in the source database 16 shown in FIG. 1(A) and FIG. 1(B). Whether the source database 16 stores text content only or both text content and voice information, a simple system setting suffices to play audio content carrying audio marks. For example, the voice information can directly store key phrases spoken with a particular audio mark, while in the text content a specific text fragment can be annotated with the form of audio mark to be applied, so that the mark is played when the text is later converted to speech.
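For the text-only case, the annotation of text fragments with an audio mark could be represented as a list of segments, each optionally tagged with the mark to apply at text-to-speech time. The `Segment` type and the mark names (`"slower"`, `"louder"`, `"cue_tone"`) below are hypothetical; the patent names the kinds of marks but not a storage format.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    text: str
    mark: Optional[str] = None  # e.g. "slower", "higher_pitch", "louder", "cue_tone"

def key_terms(segments: List[Segment]) -> List[str]:
    """Return the text of every segment that carries an audio mark."""
    return [segment.text for segment in segments if segment.mark is not None]

def speech_plan(segments: List[Segment]) -> List[str]:
    """Describe, in order, how a text-to-speech step would render each segment."""
    plan = []
    for segment in segments:
        style = segment.mark if segment.mark is not None else "normal"
        plan.append(style + ": " + segment.text)
    return plan
```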
Speech recognition can then be restricted to the marked audio content within the specified interval, which not only saves recognition time but also raises the recognition rate. From a purely technical standpoint, however, the selection system of the present invention does not need to specify an interval of the audio content at all: it can compare the voice input sentence directly with all of the audio content, or with the marked key terms in all of the audio content.
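The three comparison scopes mentioned here (marked terms within the interval, everything played, or marked terms across everything played) differ only in how the candidate set is filtered before recognition. The sketch below shows that filtering under assumed inputs: each played term is a hypothetical `(end_time, text, is_marked)` tuple, and the 20-second default mirrors the example duration given earlier.

```python
def candidate_terms(terms, now, window_seconds=20.0, marked_only=True):
    """Filter (end_time, text, is_marked) tuples down to recognition candidates.

    Pass window_seconds=None to consider everything played so far.
    """
    selected = []
    for end_time, text, is_marked in terms:
        in_window = (window_seconds is None
                     or now - window_seconds < end_time <= now)
        if in_window and (is_marked or not marked_only):
            selected.append(text)
    return selected
```

A smaller candidate set means fewer comparisons per voice input, which is the source of the time saving and higher recognition rate described in the text.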
The audio content selection technique provided by the present invention thus selects appropriate audio sentences in real time. It provides a convenient interaction mechanism that lets users interact effectively with sequentially presented audio content. This greatly improves on the past situation, in which users could only listen passively to audio content to extract information and in which audio content, unlike written text presented in parallel, lacked tools to help people interact with it.
In practical applications, the present invention is suitable for all kinds of interactive devices that convey information through audio content, such as mobile devices, Bluetooth devices, or Internet access devices. Through the audio content selection mechanism provided by the present invention, users can easily select the audio sentence they want from the audio content and pass it to subsequent processing or services, without special training and without memorizing special commands.
In summary, the present invention provides a system and method for selecting audio content by using speech recognition that overcomes the inability of conventionally played audio content to interact with the user. It combines existing speech recognition technology with suitable information access techniques and a special audio marking scheme, so that speech recognition is performed on the voice input sentence uttered by the user and the played audio content, a specific audio sentence is selected from that audio content, and various subsequent processing procedures are carried out. The technique does not require much additional software or hardware, so its implementation cost is very low. The audio content selection system and method of the present invention are therefore technically simple yet highly convenient; users need no special training and can apply them in any field that conveys information by sound. The invention can effectively promote industrial progress, is simple in technique, widely applicable, and of real industrial value, and an application for an invention patent is accordingly filed according to law.
The present invention has been described in detail above by means of preferred embodiments, which do not limit its scope. Those of ordinary skill in the art will appreciate that minor changes and adjustments can be made without losing the gist of the invention or departing from its spirit and scope, and all such changes should be regarded as further embodiments of the invention.
The scope claimed by the present invention is defined by the claims.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2005100991147A CN1924996B (en) | 2005-08-31 | 2005-08-31 | System and method for selecting audio content by using speech recognition |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN1924996A CN1924996A (en) | 2007-03-07 |
| CN1924996B (en) | 2011-06-29 |
Family
ID=37817605
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN2005100991147A (Expired - Fee Related), CN1924996B (en) | 2005-08-31 | 2005-08-31 | System and method for selecting audio content by using speech recognition |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN1924996B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104601832A (en) * | 2008-04-29 | 2015-05-06 | 台达电子工业股份有限公司 | Dialogue system and voice dialogue processing method |
| CN111609883B (en) * | 2020-05-20 | 2021-03-30 | 山东联信征信管理有限公司 | Communication machine room protection monitoring management system based on big data |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20020013705A1 (en) * | 2000-07-28 | 2002-01-31 | International Business Machines Corporation | Speech recognition by automated context creation |
| US20020178002A1 (en) * | 2001-05-24 | 2002-11-28 | International Business Machines Corporation | System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition |
| CN1391209A (en) * | 2001-06-11 | 2003-01-15 | 株式会社日立制作所 | Phonetics synthesizing method and synthesizer thereof |
| CN1474379A (en) * | 2002-07-02 | 2004-02-11 | Pioneer Corporation | Voice identifying/responding system, voice identifying/responding program and its recording medium |
| US20040054541A1 (en) * | 2002-09-16 | 2004-03-18 | David Kryze | System and method of media file access and retrieval using speech recognition |
| US6775358B1 (en) * | 2001-05-17 | 2004-08-10 | Oracle Cable, Inc. | Method and system for enhanced interactive playback of audio content to telephone callers |
| WO2005041109A2 (en) * | 2003-10-17 | 2005-05-06 | Nielsen Media Research, Inc. | Methods and apparatus for identifiying audio/video content using temporal signal characteristics |
Also Published As
| Publication number | Publication date |
|---|---|
| CN1924996A (en) | 2007-03-07 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Iskra et al. | Speecon-speech databases for consumer devices: Database specification and validation | |
| CN101030368B (en) | Method and system for communicating across channels simultaneously with emotion preservation | |
| CN113326387B (en) | Intelligent conference information retrieval method | |
| US8909525B2 (en) | Interactive voice recognition electronic device and method | |
| CA3058928A1 (en) | Hands-free annotations of audio text | |
| CN104078044A (en) | Mobile terminal and sound recording search method and device of mobile terminal | |
| KR20140123369A (en) | Question answering system using speech recognition and its application method thereof | |
| EP3620939A1 (en) | Method and device for simultaneous interpretation based on machine learning | |
| CN101763756A (en) | Interactive intelligent foreign language dictation training system and method based on network | |
| KR20130086971A (en) | Question answering system using speech recognition and its application method thereof | |
| TWI270052B (en) | System for selecting audio content by using speech recognition and method therefor | |
| CN1924996B (en) | System and method for selecting audio content by using speech recognition | |
| JP5713782B2 (en) | Information processing apparatus, information processing method, and program | |
| CN116628264A (en) | A conference information processing method, device, equipment and medium | |
| CN1835077B (en) | Chinese name automatic speech recognition input method and system | |
| Frödrich | Functions of'Uptalk'in Australian English | |
| US20110165541A1 (en) | Reviewing a word in the playback of audio data | |
| Sladek et al. | Speech-to-text transcription in support of pervasive computing | |
| TW201411577A (en) | Voice processing method of point-to-read device | |
| TWI220206B (en) | System and method for searching a single word in accordance with speech | |
| Adell Mercado et al. | Buceador, a multi-language search engine for digital libraries | |
| TW594606B (en) | Language learning system and method thereof | |
| CN118430538A (en) | Error correction multi-mode model construction method, system, equipment and medium | |
| CN118506762A (en) | Low-sample multilingual synthesized voice cloning method and system | |
| Feng et al. | Language Modeling for Voice-Enabled Social TV Using Tweets. |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2011-06-29 |