JP4223841B2

JP4223841B2 - Spoken dialogue system and method

Info

Publication number: JP4223841B2
Application number: JP2003072673A
Authority: JP
Inventors: 賢司阿部
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2003-03-17
Filing date: 2003-03-17
Publication date: 2009-02-12
Anticipated expiration: 2023-03-17
Also published as: JP2004279841A

Description

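[0001]
[Technical field of the invention]
The present invention relates to a spoken dialogue system and method that has a database in which metadata is assigned to each column, and that automatically creates a speech recognition grammar by referring to that database.
[0002]
[Prior art]
With the rapid advance of computer technology in recent years, dialogue systems that use speech have spread rapidly. To recognize a user's utterance accurately, a spoken dialogue system commonly first performs acoustic recognition by consulting a speech recognition dictionary for each phoneme, and then performs grammatical recognition by consulting a speech recognition grammar.
[0003]
Conventionally, to create the speech recognition grammar for a spoken dialogue system, a system developer first designs a dialogue scenario suited to the service, and then models the utterances (word strings) a user might produce in a dialogue following that scenario as a language generation mechanism, such as a finite state automaton, so that those utterances can be accepted.
[0004]
Various techniques have also been devised so that not only system-driven recognition grammars, which restrict what the user may say, but also user-driven recognition grammars, which allow the user to speak freely, can be generated.
[0005]
For example, Patent Document 1 discloses a spoken dialogue system that controls the content of its guidance output so that it can accept free user utterances as well as system-driven dialogue, and that switches the recognition grammar according to the user's response.
[0006]
[Patent Document 1]
JP 2002-342065 A
[0007]
[Problems to be solved by the invention]
However, generating a recognition grammar takes a great many man-hours. In particular, a user-driven recognition grammar must cover every dialogue content the user might possibly utter, so in practice a complete recognition grammar cannot be generated.
[0008]
In addition, speech recognition grammars have been built from the words (tokens) that make up user utterances and from semantic tags that attach semantic information to them. In a speech-based information retrieval system, for example, the target database was searched using the keywords and semantic-tag information obtained from the recognition result. In that case, extracting the search result that forms the response to the user required two separate processes: speech recognition and information retrieval.
[0009]
To solve these problems, an object of the present invention is to provide a spoken dialogue system in which a user-driven or system-driven speech recognition grammar can be created easily and the response to the user can be extracted quickly.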
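[0010]
[Means for Solving the Problems]
To achieve the above object, a spoken dialogue system according to the present invention has a database that stores the content information to be searched in a spoken dialogue service, and includes at least a speech input unit that receives user utterances, a speech recognition unit that recognizes the user utterances, a dialogue management unit that controls response instructions according to the recognition results, and a response generation/output unit that generates a response to the user and outputs it. In the database, a token tag, which is identification information indicating a category the user is to utter, and a data tag, which is identification information indicating data to be searched for, are set for each column. The system further includes a speech recognition grammar creation unit that creates a speech recognition grammar incorporating the token tags and the data tags, and the speech recognition unit recognizes user utterances on the basis of the created speech recognition grammar.
[0011]
With this configuration, a database to which token tags (identification information indicating a category the user is to utter) and data tags (identification information indicating data to be searched for) have been assigned is prepared as the search target database, so a recognition grammar can be generated simply by referring to that database. Because the search result can be extracted at the time of speech recognition, the search process that would otherwise follow speech recognition can be omitted, reducing the processing load.
[0012]
In the spoken dialogue system according to the present invention, the speech recognition grammar creation unit preferably creates either a speech recognition grammar that accepts free user utterances or a speech recognition grammar in which utterances are restricted. This makes it easy to create both user-driven and system-driven recognition grammars.
[0013]
The spoken dialogue system according to the present invention also preferably includes means for recording speech recognition results and dialogue completion status as log data, and switches the speech recognition grammar that is used or created on the basis of the log data. This allows the system to change how it responds to the user dynamically as the dialogue progresses, and to create a speech recognition dictionary suited to the situation.
[0014]
Preferably, the spoken dialogue system according to the present invention also calculates, from the log data, the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used. If the calculated ratio is at or below a predetermined threshold, it switches to a speech recognition grammar in which utterances are restricted; if the ratio exceeds the threshold, it switches to a speech recognition grammar that accepts free utterances. The output of data, the final search result, shows that a dialogue was completed normally, so the more often this happens, the more safely the system can judge that it can handle free utterances.
[0015]
Preferably, the spoken dialogue system according to the present invention also calculates, from the log data, the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used. If the calculated ratio is at or below a predetermined threshold, it creates a speech recognition grammar in which utterances are restricted; if the ratio exceeds the threshold, it creates a speech recognition grammar that accepts free utterances. The output of data, the final search result, shows that a dialogue was completed normally, so the more often this happens, the more safely the system can judge that it can handle free utterances.
[0016]
The present invention is also characterized by software that executes the above spoken dialogue system as processing steps on a computer. Specifically, it is a spoken dialogue method that uses a database storing the content information to be searched in a spoken dialogue service and that includes at least a step of receiving a user utterance, a step of recognizing the user utterance, a step of controlling response instructions according to the recognition result, and a step of generating a response to the user and outputting it. In the database, a token tag, which is identification information indicating a category the user is to utter, and a data tag, which is identification information indicating data to be searched for, are set for each column; the method further includes a step of creating a speech recognition grammar incorporating the token tags and the data tags; and in the step of recognizing the user utterance, the user utterance is recognized on the basis of the created speech recognition grammar. The invention also covers a computer-executable program that implements these steps.
[0017]
With this configuration, loading the program onto a computer and executing it realizes a spoken dialogue system in which a database to which token tags (identification information indicating a category the user is to utter) and data tags (identification information indicating data to be searched for) have been assigned is prepared as the search target database, so that a recognition grammar can be generated simply by referring to that database, the search result can be extracted at the time of speech recognition, the search process after speech recognition can be omitted, and the processing load can be reduced.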
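[0018]
[Embodiments of the invention]
A spoken dialogue system according to an embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a configuration diagram of the spoken dialogue system according to the embodiment.
[0019]
In FIG. 1, the user's utterances or text are entered through the input receiving unit 1. For example, when the spoken dialogue system of this embodiment is an information retrieval system, the user's search conditions are entered as speech data. Here, "user" may mean either a technician, such as a system engineer who creates speech recognition grammars, or an end user who performs search tasks.
[0020]
That is, a technician such as a system engineer who creates speech recognition grammars enters speech data, or text data such as commands, through the input receiving unit 1 in order to generate the speech recognition grammar used in the spoken dialogue system of this embodiment. When speech data is used to generate the speech recognition grammar, the input analysis unit 2 recognizes the speech with a general-purpose speech recognition grammar to identify what was entered.
[0021]
When the input content is passed to the dialogue management unit 3, the speech recognition grammar creation unit 12 generates a speech recognition grammar by referring to the search database 11, which is the search target, together with the basic dictionary 13, and stores it in the user-driven speech recognition grammar storage unit 14 or the system-driven speech recognition grammar storage unit 15.
[0022]
An end user who performs a search task, on the other hand, enters speech data expressing search conditions through the input receiving unit 1. The input analysis unit 2 analyzes the speech data using either the initial speech recognition grammar or a speech recognition grammar already generated by the speech recognition grammar creation unit 12, and passes the analysis result to the dialogue management unit 3.
[0023]
In a conventional spoken dialogue system, the dialogue management unit 3 would then query the search database 11 according to the input content to obtain search results matching the input conditions.
[0024]
The present invention, by contrast, is characterized in that when an end user's utterance is entered through the input receiving unit 1, the search result is obtained at the same moment speech recognition is performed with the speech recognition grammar generated and stored in advance. Of course, if the speech recognition grammar creation unit 12 has not yet generated a speech recognition grammar, the search result is obtained by a database query in the dialogue management unit 3.
[0025]
First, the basic dictionary 13 is a dictionary from which basic information is extracted when a speech recognition grammar is created. It registers the "reading" (pronunciation) of each word in the search database 11, as well as words that do not exist in the search database 11 but are needed in the speech recognition grammar. For example, for the word "name" it registers words such as "san" and "kun" (Japanese honorific suffixes), together with particles and the like.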
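[0026]
Next, the search database 11 is a database for a particular task, such as schedule guidance or postal code lookup. Each element of the database carries, as a tag, identification information indicating whether it is content the user is to input or content to be presented to the user.
[0027]
FIG. 2 shows an example data structure for the case where the search database 11 is a schedule guidance database. In FIG. 2, the tag symbol "(T)" is attached to each word in the name column, to each word in the date column, and to the word "schedule". This identifier (tag) indicates that the word is used as a "token" in the user's query.
[0028]
The symbol "(D)" is attached to each word in the schedule column. This identifier (tag) indicates that the word is used as the response to the user's query, that is, as the search result. The search database 11 is created in advance by the user who runs the spoken dialogue system or by the user who provides it.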
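The tagging scheme can be pictured as a table whose columns each carry one tag. The following Python sketch is only an illustration under assumptions: the column names, the two sample rows, and the helpers `columns_with` and `token_values` are hypothetical stand-ins for the contents of FIG. 2, which are not reproduced in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    TOKEN = "T"  # (T): a category the user is expected to say
    DATA = "D"   # (D): a value returned to the user as the search result


@dataclass(frozen=True)
class Column:
    name: str
    tag: Tag


# Hypothetical schedule-guidance table in the spirit of FIG. 2.
COLUMNS = (
    Column("name", Tag.TOKEN),
    Column("date", Tag.TOKEN),
    Column("schedule", Tag.DATA),
)

# Hypothetical sample rows; real contents would come from the search database 11.
ROWS = (
    {"name": "Yamada", "date": "March 17", "schedule": "meeting in Kawasaki"},
    {"name": "Suzuki", "date": "March 18", "schedule": "business trip to Osaka"},
)


def columns_with(tag: Tag) -> list[str]:
    """Return the names of the columns that carry the given tag, in table order."""
    return [column.name for column in COLUMNS if column.tag is tag]


def token_values(column_name: str) -> list[str]:
    """Collect the distinct words of one column, e.g. the names a user may say."""
    return sorted({row[column_name] for row in ROWS})
```

Keeping the tag on the column rather than repeating it on every cell matches the per-column metadata the description refers to.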
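[0029]
The speech recognition grammar creation unit 12 creates a speech recognition grammar from the data registered in the search database 11 in response to a user request. For example, if the user requests, for the schedule guidance database of FIG. 2, a speech recognition grammar composed of the items "name", "date", and "schedule", the grammar is created as an automaton using the words registered in the name, date, and schedule columns.
[0030]
In doing so, words carrying the token tag "(T)" are incorporated into the speech recognition grammar as "tokens", and words carrying the data tag "(D)" are incorporated as "response content for the user". By referring to the basic dictionary 13 described above, words that do not exist in the search database 11, such as "san" and "kun", can also be incorporated into the speech recognition grammar as needed.
[0031]
The speech recognition grammar is thus built according to an automaton model. Specifically, a "state" that outputs each word registered in the search database 11 is defined, and a network of states is constructed so that outputting a word causes a transition between states. The method of creating the automaton model is not particularly limited; any conventional method may be used.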
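One conventional way to realize such a state network is a trie-shaped deterministic automaton whose final states carry the (D) values. This Python sketch assumes a fixed word order (name, date, then the (T)-tagged keyword "schedule"); the sample rows and the names `build_automaton` and `accept` are hypothetical, and the description deliberately leaves the construction method open.

```python
from collections import defaultdict

# Assumed utterance order: two (T) columns, then the (T)-tagged keyword.
TOKEN_COLUMNS = ("name", "date")
KEYWORD = "schedule"
DATA_COLUMN = "schedule"  # the (D) column holding the answer

# Hypothetical sample rows.
ROWS = [
    {"name": "Yamada", "date": "March 17", "schedule": "meeting in Kawasaki"},
    {"name": "Yamada", "date": "March 18", "schedule": "visit to a customer"},
    {"name": "Suzuki", "date": "March 18", "schedule": "business trip to Osaka"},
]


def build_automaton(rows):
    """Build a deterministic state network from the tagged rows.

    transitions[state][word] -> next state; state 0 is the initial state S0.
    outputs[final_state] -> list of (D) values reached along that path.
    """
    transitions = defaultdict(dict)
    outputs = defaultdict(list)
    next_state = 1
    for row in rows:
        state = 0
        for word in [row[c] for c in TOKEN_COLUMNS] + [KEYWORD]:
            if word not in transitions[state]:
                transitions[state][word] = next_state  # new state S1..Sn
                next_state += 1
            state = transitions[state][word]
        outputs[state].append(row[DATA_COLUMN])  # search result sits on the final state
    return dict(transitions), dict(outputs)


def accept(transitions, outputs, words):
    """Walk the network; return the (D) values at the final state, or None."""
    state = 0
    for word in words:
        state = transitions.get(state, {}).get(word)
        if state is None:
            return None
    return outputs.get(state)  # None if the walk stopped at a non-final state
```

Because the answer is attached to the final state, accepting an utterance and retrieving its search result are the same walk, which is the saving the description claims over a separate database query.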
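[0032]
FIG. 3 shows an example of a speech recognition grammar created from the schedule guidance database of FIG. 2. In FIG. 3, circles represent states: S0 is the initial state and S1 to Sn (n is a natural number) are the states reached after transitions. Arrows represent transitions between states. In other words, the items in the search database 11 that carry a token tag or a data tag represent the conditions under which state transitions occur.
[0033]
When the user wants a user-driven speech recognition grammar, the grammar must anticipate free utterances. Therefore, as shown in FIG. 4, a speech recognition grammar that combines words from the search database 11 and the basic dictionary 13 so as to cover every state and transition is created automatically and stored in the user-driven speech recognition grammar storage unit 14.
[0034]
In FIG. 4, the value of each item extracted from the search database 11 is substituted into each portion enclosed by the symbols "<" and ">". The generated speech recognition grammar therefore contains the data-tagged data itself, which is the search result. Suffixes such as "~san" are taken from the words registered in the basic dictionary 13.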
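The substitution of item values into "<...>" slots can be sketched as plain template expansion. In this Python illustration the templates, the suffix list standing in for the basic dictionary 13, and the sample rows are all hypothetical; a production grammar would more likely be written in a grammar format than enumerated sentence by sentence.

```python
import itertools
import re

# Hypothetical basic-dictionary entries: suffixes absent from the search
# database but needed in the grammar.
BASIC_DICTIONARY = {"honorific": ("san", "kun")}

# Hypothetical free-utterance templates in the style of FIG. 4; each <slot>
# is filled from the search database or the basic dictionary.
TEMPLATES = (
    "<name> <honorific> <date> schedule",
    "<date> <name> <honorific> schedule",
    "schedule of <name> <honorific> on <date>",
)

ROWS = (
    {"name": "Yamada", "date": "March 17", "schedule": "meeting in Kawasaki"},
    {"name": "Suzuki", "date": "March 18", "schedule": "business trip to Osaka"},
)

SLOT = re.compile(r"<(\w+)>")


def expand(templates, rows, honorifics):
    """Map every accepted sentence to the (D) value it should return."""
    grammar = {}
    for template, row, honorific in itertools.product(templates, rows, honorifics):
        values = {**row, "honorific": honorific}
        sentence = SLOT.sub(lambda m: values[m.group(1)], template)
        grammar[sentence] = row["schedule"]  # the search result travels with the sentence
    return grammar
```

In this sketch the sentence count grows as templates × rows × suffixes, which is one reason a shared-prefix automaton such as the one in FIG. 5 scales better than a flat sentence list.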
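[0035]
FIG. 5 describes the created speech recognition grammar in the form of an automaton. As shown in FIG. 5, the automaton is created in a form that includes each item value, and each final state holds the data-tagged data, that is, the search result corresponding to the search conditions defined by the state transitions.
[0036]
When the user wants a system-driven speech recognition grammar, the grammar only has to accept the user's replies to questions prepared by the system, such as "Whose schedule?" or "The schedule for which date?". Therefore, as shown in FIG. 6, a speech recognition grammar that accepts the minimum necessary words is created automatically and stored in the system-driven speech recognition grammar storage unit 15.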
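A system-driven grammar can be sketched as one small phrase table per system prompt, plus a table from the filled slots to the (D) value that plays the role of the final states in FIG. 7. The prompt wording, the suffix list, and the function names below are hypothetical; this is a Python illustration, not the grammar of FIG. 6.

```python
# Hypothetical system prompts, each expecting the value of one (T) column.
PROMPTS = {
    "Whose schedule?": "name",
    "The schedule for which date?": "date",
}
HONORIFICS = ("san", "kun")  # assumed basic-dictionary suffixes for names

ROWS = (
    {"name": "Yamada", "date": "March 17", "schedule": "meeting in Kawasaki"},
    {"name": "Suzuki", "date": "March 18", "schedule": "business trip to Osaka"},
)


def prompt_grammars(rows, prompts=PROMPTS, honorifics=HONORIFICS):
    """Build one minimal grammar per prompt: accepted phrase -> slot value."""
    grammars = {}
    for prompt, column in prompts.items():
        phrases = {}
        for row in rows:
            value = row[column]
            phrases[value] = value  # the bare value is always accepted
            if column == "name":
                for suffix in honorifics:
                    phrases[f"{value} {suffix}"] = value
        grammars[prompt] = phrases
    return grammars


def final_states(rows):
    """Once every prompt has been answered, the filled slots index the (D) value."""
    return {(row["name"], row["date"]): row["schedule"] for row in rows}
```

Each prompt's grammar contains only the words for one slot, which is what keeps the system-driven grammar small and robust.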
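[0037]
FIG. 7 describes this speech recognition grammar in the form of an automaton. As in FIG. 5, the automaton in FIG. 7 is created in a form that includes each item value, and each final state holds the data-tagged data, that is, the search result corresponding to the search conditions defined by the state transitions.
[0038]
These speech recognition grammars may be created in advance at the user's request, or dynamically during a voice dialogue in response to a request from the dialogue management unit 3.
[0039]
The input analysis unit 2 recognizes the user utterance received by the input receiving unit 1. For this recognition it refers to either the user-driven speech recognition grammar stored in the user-driven speech recognition grammar storage unit 14 or the system-driven speech recognition grammar stored in the system-driven speech recognition grammar storage unit 15, and matches it against the input speech.
[0040]
Matching against the input speech is performed only for the items that carry token tags in the respective speech recognition grammar. Data-tagged items extracted during matching are themselves the response to the user, so their content is sent to the dialogue management unit 3 as the response result for the user.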
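The division of labour between the two tags during matching can be shown in a few lines. This Python sketch assumes the recognizer has already produced a word-level hypothesis (a list of strings) and ignores acoustic scores entirely; `GRAMMAR` and `match` are hypothetical names.

```python
from typing import Optional

# Hypothetical compiled grammar: each path is a sequence of (tag, value) pairs.
# Only "T" elements are compared with the recognizer's word hypothesis; "D"
# elements are never matched against speech and are returned as the response.
GRAMMAR = (
    (("T", "Yamada"), ("T", "March 17"), ("T", "schedule"), ("D", "meeting in Kawasaki")),
    (("T", "Suzuki"), ("T", "March 18"), ("T", "schedule"), ("D", "business trip to Osaka")),
)


def match(grammar, hypothesis: list[str]) -> Optional[list[str]]:
    """Return the (D) values of the first path whose (T) elements equal the hypothesis."""
    for path in grammar:
        tokens = [value for tag, value in path if tag == "T"]
        if tokens == hypothesis:
            return [value for tag, value in path if tag == "D"]
    return None
```

Saying the answer itself ("meeting in Kawasaki") matches nothing, because (D) items take no part in the comparison.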
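[0041]
The dialogue management unit 3 decides how to respond to the user on the basis of the speech recognition result. If the input analysis unit 2 obtains no result, or if the dialogue does not succeed, it judges that the user-driven speech recognition grammar cannot handle the situation and instructs the speech recognition grammar switching unit 16 to switch to the system-driven speech recognition grammar.
[0042]
It is also conceivable to provide a log data recording unit (not shown) that records speech recognition results and dialogue completion status as log data, and to switch speech recognition grammars on the basis of the log data recorded there.
[0043]
For example, the log data is searched to calculate the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used, and the speech recognition grammar is switched according to the calculated ratio. If the calculated ratio is at or below a predetermined threshold, the low rate at which data, the final search result, was output indicates that dialogues are not proceeding well. The grammar is therefore switched to the system-driven speech recognition grammar, which restricts utterances, so that dialogues can be completed more easily.
[0044]
Conversely, if the ratio exceeds the threshold, the high rate at which data, the final search result, was output indicates that dialogues are proceeding normally, so the grammar is switched to the user-driven speech recognition grammar, which accepts free utterances.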
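The switching rule reduces to one division and one comparison. In this Python sketch the event names, the flat list used as the log, and the threshold of 0.5 are assumptions (the description gives no threshold value); treating an empty log as a ratio of 0, and so starting system-driven, is also an assumption.

```python
# Hypothetical log format: a flat list of event names written by the
# (unillustrated) log data recording unit.
GRAMMAR_USED = "grammar_used"  # a recognition grammar was applied to an utterance
DATA_OUTPUT = "data_output"    # a (D)-tagged column value was output to the user


def output_ratio(log: list[str]) -> float:
    """Ratio of (D) outputs to grammar uses; 0.0 when the grammar was never used."""
    uses = log.count(GRAMMAR_USED)
    return log.count(DATA_OUTPUT) / uses if uses else 0.0


def select_grammar(log: list[str], threshold: float = 0.5) -> str:
    """At or below the threshold restrict utterances; above it allow free speech."""
    return "user-driven" if output_ratio(log) > threshold else "system-driven"
```

A ratio exactly equal to the threshold selects the system-driven grammar, following the "at or below" wording of the description.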
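[0045]
It is also conceivable to instruct the speech recognition grammar creation unit 12 to re-create the speech recognition grammar with the information already extracted in the dialogue added. By dynamically switching between the user-driven and system-driven speech recognition grammars according to the state of the dialogue in this way, it is possible to achieve the same kind of dialogue as with a mixed-driven speech recognition grammar that combines the features of both.
[0046]
The speech recognition grammar switching unit 16 switches the speech recognition grammar used by the input analysis unit 2 according to instructions from the dialogue management unit 3, fetching the instructed recognition grammar from the user-driven speech recognition grammar storage unit 14 or the system-driven speech recognition grammar storage unit 15.
[0047]
Finally, the system response generation/output unit 4 generates the response content for the user and outputs it to the user. In this embodiment, when the speech recognition grammar is created from the search database 11, the data that constitutes the response content itself is incorporated into the grammar. Unlike a conventional spoken dialogue system, there is therefore no need to recognize the input speech with a grammar and then search the database again using the recognition result, which reduces the processing load on computer resources.
[0048]
Next, the processing flow of a program that implements the spoken dialogue system of this embodiment is described. First, FIG. 8 is a flowchart of a program that generates recognition grammars in the spoken dialogue system of this embodiment.
[0049]
In FIG. 8, identification information is first attached to each item of the search database 11, indicating whether it is an item the user is to input or an item containing content to be presented to the user (step S801). Next, command input, such as speech data or text data for generating a speech recognition grammar, is accepted from the grammar author (step S802).
[0050]
Next, the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used is calculated from the log data in which speech recognition results and dialogue completion status are recorded (step S803).
[0051]
If the calculated ratio exceeds the predetermined threshold (step S804: Yes), a user-driven speech recognition grammar that accepts free utterances is generated or updated (step S805); if the ratio is at or below the threshold (step S804: No), a system-driven speech recognition grammar that restricts utterances is generated or updated (step S806). Although omitted from FIG. 8, a speech recognition grammar can also be generated or updated at the grammar author's request regardless of the log data.
[0052]
The content of this processing is then recorded as log data (step S807).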
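Steps S803 to S807 can be strung together as a single handler. This is a minimal Python sketch, not the patent's program: tagging (S801) is represented by a hard-coded, hypothetical `TAGS` mapping, the author's command (S802) is simply the function call, the two grammar forms are deliberately simplified (a flat sentence table and a nested per-prompt table), and the log format and threshold are assumptions.

```python
# Hypothetical per-column tags for a FIG. 2-style schedule table (step S801).
TAGS = {"name": "T", "date": "T", "schedule": "D"}

ROWS = (
    {"name": "Yamada", "date": "March 17", "schedule": "meeting in Kawasaki"},
    {"name": "Suzuki", "date": "March 18", "schedule": "business trip to Osaka"},
)


def output_ratio(log):
    """(D) outputs per grammar use, taken from the log (0.0 if never used)."""
    uses = log.count("grammar_used")
    return log.count("data_output") / uses if uses else 0.0


def nested_grammar(rows, token_cols, data_col):
    """System-driven form: one (T) slot per prompt, answer at the innermost level."""
    tree = {}
    for row in rows:
        node = tree
        for col in token_cols[:-1]:
            node = node.setdefault(row[col], {})
        node[row[token_cols[-1]]] = row[data_col]
    return tree


def generate_grammar(rows, tags, log, threshold=0.5):
    """Handle one grammar-generation command from the author (step S802)."""
    token_cols = [c for c, t in tags.items() if t == "T"]
    data_col = next(c for c, t in tags.items() if t == "D")
    ratio = output_ratio(log)                                  # S803
    if ratio > threshold:                                      # S804: Yes
        kind = "user-driven"                                   # S805
        grammar = {" ".join(r[c] for c in token_cols): r[data_col] for r in rows}
    else:                                                      # S804: No
        kind = "system-driven"                                 # S806
        grammar = nested_grammar(rows, token_cols, data_col)
    log.append(f"generated {kind} grammar")                    # S807
    return kind, grammar
```

Both branches embed the (D) value in the grammar, so either form can return the search result at recognition time.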
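[0053]
Next, FIG. 9 is a flowchart of a program that executes search processing using the generated recognition grammar in the spoken dialogue system of this embodiment.
[0054]
In FIG. 9, speech data for a search is first accepted from the end user (step S901).
[0055]
Next, the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used is calculated from the log data in which speech recognition results and dialogue completion status are recorded (step S902).
[0056]
If the calculated ratio exceeds the predetermined threshold (step S903: Yes), speech recognition is performed with the user-driven speech recognition grammar, which accepts free utterances (step S904); if the ratio is at or below the threshold (step S903: No), speech recognition is performed with the system-driven speech recognition grammar, which restricts utterances (step S905). The search result is obtained as part of the speech recognition process.
[0057]
A response corresponding to the input conditions is then generated and output (step S906), and the speech recognition result and dialogue completion status for this processing are recorded as log data (step S907).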
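The search flow can likewise be sketched end to end. In this hypothetical Python version the recognizer is replaced by an already-recognized word list, recognition is reduced to exact phrase lookup, and the grammar contents, fallback prompt, log format, and threshold are assumptions. What it illustrates is that the lookup in S904/S905 returns the (D) value directly, so no separate database query follows.

```python
# Hypothetical grammars already produced by the FIG. 8 process, flattened to
# accepted-phrase -> (D) value tables for brevity.
GRAMMARS = {
    "user-driven": {"Yamada san March 17 schedule": "meeting in Kawasaki"},
    "system-driven": {"Yamada March 17": "meeting in Kawasaki"},
}


def output_ratio(log):
    """(D) outputs per grammar use, taken from the log (0.0 if never used)."""
    uses = log.count("grammar_used")
    return log.count("data_output") / uses if uses else 0.0


def handle_search(words, grammars, log, threshold=0.5):
    """Process one end-user utterance (S901), given as recognized words."""
    ratio = output_ratio(log)                                           # S902
    kind = "user-driven" if ratio > threshold else "system-driven"      # S903
    result = grammars[kind].get(" ".join(words))                        # S904 / S905
    response = result if result is not None else "Please say that again."  # S906
    log.append("grammar_used")                                          # S907
    if result is not None:
        log.append("data_output")
    return kind, response
```

Every call feeds the log that S902 reads on the next call, so a run of failed searches gradually pushes the system toward the restricted grammar.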
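[0058]
As described above, according to this embodiment, a database to which token tags (identification information indicating a category the user is to utter) and data tags (identification information indicating data to be searched for) have been assigned is prepared as the search target database. A speech recognition grammar can therefore be generated simply by referring to that database, and because the search result can be extracted at the time of speech recognition, the search process after speech recognition can be omitted and the processing load reduced.
[0059]
As shown in FIG. 10, the program that implements the spoken dialogue system of this embodiment may be stored not only on a portable recording medium 102 such as a CD-ROM 102-1 or a flexible disk 102-2, but also on another storage device 101 located at the end of a communication line, or on a recording medium 104 such as the hard disk or RAM of a computer 103. When the program is executed, it is loaded and run in main memory.
[0060]
Likewise, as shown in FIG. 10, the speech recognition grammars and other data generated by the spoken dialogue system of this embodiment may be stored not only on a portable recording medium 102 such as a CD-ROM 102-1 or a flexible disk 102-2, but also on another storage device 101 located at the end of a communication line, or on a recording medium 104 such as the hard disk or RAM of a computer 103, and are read by the computer 103 when, for example, the spoken dialogue system according to the present invention is used.
[0061]
The following variations are also noted.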
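[0062]
(Appendix 1) A spoken dialogue system that has a database storing content information to be searched in a spoken dialogue service and that includes at least:
a speech input unit that receives user utterances;
a speech recognition unit that recognizes the user utterances;
a dialogue management unit that controls response instructions according to recognition results; and
a response generation/output unit that generates and outputs a response to the user,
wherein, in the database, a token tag, which is identification information indicating a category the user is to utter, and a data tag, which is identification information indicating data to be searched for, are set for each column,
the system further includes a speech recognition grammar creation unit that creates a speech recognition grammar incorporating the token tags and the data tags, and
the speech recognition unit recognizes the user utterances on the basis of the created speech recognition grammar.
[0063]
(Appendix 2) The spoken dialogue system according to Appendix 1, wherein the speech recognition grammar creation unit creates a speech recognition grammar that accepts free user utterances or a speech recognition grammar in which utterances are restricted.
[0064]
(Appendix 3) The spoken dialogue system according to Appendix 1, further including means for recording speech recognition results and dialogue completion status as log data, wherein the speech recognition grammar that is used or created is switched on the basis of the log data.
[0065]
(Appendix 4) The spoken dialogue system according to Appendix 3, wherein the ratio of the number of times a column carrying the data tag was output to the number of times the speech recognition grammar was used is calculated on the basis of the log data, and the speech recognition grammar is switched to a speech recognition grammar in which utterances are restricted if the calculated ratio is at or below a predetermined threshold, and to a speech recognition grammar that accepts free utterances if the ratio exceeds the threshold.
[0066]
(Appendix 5) The spoken dialogue system according to Appendix 3, wherein the ratio of the number of times a column carrying the data tag was output to the number of times the speech recognition grammar was used is calculated on the basis of the log data, and a speech recognition grammar in which utterances are restricted is created if the calculated ratio is at or below a predetermined threshold, while a speech recognition grammar that accepts free utterances is created if the ratio exceeds the threshold.
[0067]
(Appendix 6) A spoken dialogue method that uses a database storing content information to be searched in a spoken dialogue service and that includes at least:
a step of receiving a user utterance;
a step of recognizing the user utterance;
a step of controlling response instructions according to the recognition result; and
a step of generating and outputting a response to the user,
wherein, in the database, a token tag, which is identification information indicating a category the user is to utter, and a data tag, which is identification information indicating data to be searched for, are set for each column,
the method further includes a step of creating a speech recognition grammar incorporating the token tags and the data tags, and
in the step of recognizing the user utterance, the user utterance is recognized on the basis of the created speech recognition grammar.
[0068]
(Appendix 7) A computer-executable program that implements a spoken dialogue method using a database storing content information to be searched in a spoken dialogue service, the method including at least:
a step of receiving a user utterance;
a step of recognizing the user utterance;
a step of controlling response instructions according to the recognition result; and
a step of generating and outputting a response to the user,
wherein, in the database, a token tag, which is identification information indicating a category the user is to utter, and a data tag, which is identification information indicating data to be searched for, are set for each column,
the method further includes a step of creating a speech recognition grammar incorporating the token tags and the data tags, and
in the step of recognizing the user utterance, the user utterance is recognized on the basis of the created speech recognition grammar.
[0069]
[Effects of the invention]
As described above, the spoken dialogue system according to the present invention prepares, as the search target database, a database to which token tags (identification information indicating a category the user is to utter) and data tags (identification information indicating data to be searched for) have been assigned. A recognition grammar can therefore be generated simply by referring to that database, and because the search result can be extracted at the time of speech recognition, the search process after speech recognition can be omitted and the processing load reduced.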
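[Brief description of the drawings]
[FIG. 1] Configuration diagram of the spoken dialogue system according to the embodiment of the present invention
[FIG. 2] Example data structure of the search database in the spoken dialogue system according to the embodiment of the present invention
[FIG. 3] Example of a speech recognition grammar in the spoken dialogue system according to the embodiment of the present invention
[FIG. 4] Example of a user-driven speech recognition grammar in the spoken dialogue system according to the embodiment of the present invention
[FIG. 5] Example automaton of the user-driven speech recognition grammar in the spoken dialogue system according to the embodiment of the present invention
[FIG. 6] Example of a system-driven speech recognition grammar in the spoken dialogue system according to the embodiment of the present invention
[FIG. 7] Example automaton of the system-driven speech recognition grammar in the spoken dialogue system according to the embodiment of the present invention
[FIG. 8] Flowchart of the speech recognition grammar generation process in the spoken dialogue system according to the embodiment of the present invention
[FIG. 9] Flowchart of the search process in the spoken dialogue system according to the embodiment of the present invention
[FIG. 10] Example of a computer environment
[Explanation of reference numerals]
1 Input receiving unit
2 Input analysis unit
3 Dialogue management unit
4 Response generation/output unit
11 Search database
12 Speech recognition grammar creation unit
13 Basic dictionary
14 User-driven speech recognition grammar storage unit
15 System-driven speech recognition grammar storage unit
16 Speech recognition grammar switching unit
91 Storage device at the end of a communication line
92 Portable recording medium such as a CD-ROM or flexible disk
92-1 CD-ROM
92-2 Flexible disk
93 Computer
94 Recording medium such as RAM or hard disk on the computer
[0001]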
[Technical field of the invention]
The present invention relates to a speech dialogue system and method that has a database to which metadata is assigned for each column, and that automatically creates a grammar for speech recognition by referring to the database.
[0002]
[Prior art]
Along with the rapid development of computer technology in recent years, dialogue systems that use speech have rapidly become widespread. To recognize a user's utterance accurately, a spoken dialogue system commonly first performs acoustic recognition by referring to a speech recognition dictionary for each phoneme, and then performs grammatical recognition by referring to a speech recognition grammar.
[0003]
Conventionally, to create a speech recognition grammar for a spoken dialogue system, a system developer first designs a dialogue scenario according to the service content, and then models the utterances (word strings) a user can produce in a dialogue following that scenario as a language generation mechanism, such as a finite state automaton, so that those utterances can be accepted.
[0004]
Various techniques have also been devised so that not only system-driven recognition grammars, which restrict what the user can say, but also user-driven recognition grammars, which allow the user to speak freely, can be generated.
[0005]
For example, Patent Document 1 discloses a spoken dialogue system that controls its guidance output so that it can accept free user utterances as well as system-driven dialogue, and that switches the recognition grammar according to the user's response.
[0006]
[Patent Document 1]
JP 2002-342065 A
[0007]
[Problems to be solved by the invention]
However, generating recognition grammars consumes a great deal of man-hours. In particular, a user-driven recognition grammar must cover every dialogue content the user might utter, so in practice a complete recognition grammar cannot be generated.
[0008]
In addition, speech recognition grammars have been constructed from the words (tokens) that make up user utterances and from semantic tags that add semantic information to them. In a speech-based information retrieval system, for example, the target database is searched on the basis of the keywords and semantic tag information obtained from the recognition result. In this case, two processes, speech recognition and information retrieval, are needed to extract the search result that forms the response to the user.
[0009]
To solve the above problems, an object of the present invention is to provide a spoken dialogue system in which a user-driven or system-driven speech recognition grammar can be created easily and the response to the user can be extracted quickly.
[0010]
[Means for Solving the Problems]
To achieve the above object, a spoken dialogue system according to the present invention has a database storing content information to be searched in a spoken dialogue service, and includes at least a voice input unit that receives user utterances, a speech recognition unit that recognizes the user utterances, a dialogue management unit that controls response instructions according to the recognition results, and a response generation/output unit that generates and outputs a response to the user. In the database, a token tag, which is identification information indicating a category the user is to utter, and a data tag, which is identification information indicating data to be searched for, are set for each column. The system further includes a speech recognition grammar creation unit that creates a speech recognition grammar incorporating the token tags and the data tags, and the speech recognition unit recognizes user utterances on the basis of the created speech recognition grammar.
[0011]
With this configuration, a database to which token tags (identification information indicating a category the user is to utter) and data tags (identification information indicating data to be searched for) have been assigned is prepared as the search target database, so a recognition grammar can be generated simply by referring to that database. Because the search result can be extracted at the time of speech recognition, the search processing after speech recognition can be omitted and the processing load reduced.
[0012]
In the spoken dialogue system according to the present invention, the speech recognition grammar creation unit preferably creates either a speech recognition grammar that accepts free user utterances or one in which utterances are restricted, because this makes it easy to create both user-driven and system-driven recognition grammars.
[0013]
The spoken dialogue system according to the present invention preferably includes means for recording speech recognition results and dialogue completion status as log data, and switches the speech recognition grammar to be used or created on the basis of the log data. This allows the response to the user to be switched dynamically as the dialogue progresses, and a speech recognition dictionary suited to the situation to be created.
[0014]
In addition, the spoken dialogue system according to the present invention preferably calculates, on the basis of the log data, the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used. If the calculated ratio is at or below a predetermined threshold, the system switches to a speech recognition grammar with restricted utterances; if the ratio exceeds the threshold, it switches to a speech recognition grammar that accepts free utterances. The output of data, the final search result, indicates that a dialogue was completed normally, so the more often it occurs, the more safely the system can judge that it can handle free utterances.
[0015]
In addition, the spoken dialogue system according to the present invention preferably calculates, on the basis of the log data, the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used. If the calculated ratio is at or below a predetermined threshold, the system creates a speech recognition grammar with restricted utterances; if the ratio exceeds the threshold, it creates a speech recognition grammar that accepts free utterances. The output of data, the final search result, indicates that a dialogue was completed normally, so the more often it occurs, the more safely the system can judge that it can handle free utterances.
[0016]
Further, the present invention is characterized by software that executes the above spoken dialogue system as processing steps on a computer. Specifically, it is a spoken dialogue method that uses a database storing content information to be searched in a spoken dialogue service and that includes at least the steps of inputting a user utterance, recognizing the user utterance, controlling a response instruction according to the recognition result, and generating and outputting a response to the user. In the database, a token tag, which is identification information indicating a category to be uttered by the user, and a data tag, which is identification information indicating data to be searched, are set for each column; the method further includes a step of creating a speech recognition grammar that incorporates the token tags and the data tags; and in the step of recognizing the user utterance, the user utterance is recognized on the basis of the created speech recognition grammar. The invention is also a computer-executable program that implements these steps.
[0017]
With such a configuration, loading and executing the program on a computer realizes a spoken dialogue system in which a database to which token tags (identification information indicating a category to be uttered by the user) and data tags (identification information indicating data to be searched) have been assigned is prepared as the search target database, so that a recognition grammar can be generated simply by referring to that database, search results can be extracted during speech recognition, the search processing after speech recognition can be omitted, and the processing load can be reduced.
[0018]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, a voice interaction system according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a configuration diagram of a voice interaction system according to an embodiment of the present invention.
[0019]
In FIG. 1, a user's utterances or text are input through the input receiving unit 1. For example, when the spoken dialogue system according to the present embodiment is an information search system, the user's search conditions are input as voice data. Here, the user may be either an engineer, such as a system engineer who creates speech recognition grammars, or an end user who performs search tasks.
[0020]
That is, a technician such as a system engineer who creates speech recognition grammars inputs voice data, or text data such as commands, through the input receiving unit 1 in order to generate the speech recognition grammar used in the spoken dialogue system according to the present embodiment. When the speech recognition grammar is generated from voice data, the input analysis unit 2 recognizes the speech using a general-purpose speech recognition grammar and identifies the input content.
[0021]
When the input content is passed to the dialogue management unit 3, the speech recognition grammar creation unit 12 generates a speech recognition grammar by referring to the search database 11 (the search target) together with the basic dictionary 13, and stores it in the user-driven speech recognition grammar storage unit 14 or the system-driven speech recognition grammar storage unit 15.
[0022]
An end user who performs a search task, on the other hand, inputs voice data expressing search conditions through the input receiving unit 1. The input analysis unit 2 analyzes the voice data using either the initial speech recognition grammar or a speech recognition grammar already generated by the speech recognition grammar creation unit 12, and passes the analysis result to the dialogue management unit 3.
[0023]
In a conventional spoken dialogue system, the dialogue management unit 3 would then query the search database 11 according to the input content to obtain search results that match the input conditions.
[0024]
The present invention, by contrast, is characterized in that when an end user's utterance is input through the input receiving unit 1, the search result is obtained at the same time as speech recognition is performed using a speech recognition grammar generated and stored in advance. Of course, if the speech recognition grammar creation unit 12 has not yet generated a speech recognition grammar, the search result is obtained by a database query in the dialogue management unit 3.
[0025]
First, the basic dictionary 13 is a dictionary from which basic information is extracted when a speech recognition grammar is created. It registers the “reading” of each word in the search database 11, as well as words that are not present in the search database 11 but are needed for the speech recognition grammar. For example, for the word “name”, words such as “san” or “kun”, particles, and the like are registered.
[0026]
Next, the search database 11 is a database for a task such as schedule guidance or postal code lookup. Each element of the database carries, as a tag, identification information indicating whether it is content to be input by the user or content to be presented to the user.
[0027]
For example, FIG. 2 shows an example of the data structure when the search database 11 is a schedule guidance database. In FIG. 2, the tag symbol “(T)” is attached to each word in the name column, each word in the date column, and the word “schedule”. This identifier (tag) indicates that the word is used as a “token” in the user's query.
[0028]
In addition, the symbol “(D)” is attached to each word in the schedule column. This identifier (tag) indicates that the word is used as the response to the user's query, that is, as the search result. The search database 11 is created in advance by the user who operates the spoken dialogue system or by the user who provides it.
[0029]
The speech recognition grammar creation unit 12 creates a speech recognition grammar from the data registered in the search database 11 in response to a user request. For example, when the user requests a speech recognition grammar composed of the items “name”, “date”, and “schedule” for the schedule guidance database shown in FIG. 2, the grammar is created as an automaton using the words registered in the name, date, and schedule columns.
[0030]
At this time, words carrying the token tag “(T)” are incorporated into the speech recognition grammar as “tokens”, and words carrying the data tag “(D)” are incorporated as “response content for the user”. In addition, by referring to the basic dictionary 13 described above, words that do not exist in the search database 11, such as “san” and “kun”, can be incorporated into the speech recognition grammar as needed.
[0031]
Thus, the speech recognition grammar is constructed according to an automaton model. Specifically, a “state” that outputs each word registered in the search database 11 is defined, and a state network is constructed so that outputting a word causes a state transition. The method for creating the automaton model is not particularly limited, and any conventional method may be used.
[0032]
FIG. 3 shows an example of a speech recognition grammar created from the schedule guidance database shown in FIG. 2. In FIG. 3, circles represent states: S0 is the initial state, and S1 to Sn (n is a natural number) are the states reached after transitions. Arrows represent transitions between states. In other words, the items in the search database 11 that carry a token tag or a data tag represent the conditions for state transitions.
[0033]
Here, when the user wants a user-driven speech recognition grammar, the grammar must be created to anticipate free user utterances. Therefore, as shown in FIG. 4, a speech recognition grammar that combines words from the search database 11 and the basic dictionary 13 so as to cover every state and transition is created automatically and stored in the user-driven speech recognition grammar storage unit 14.
[0034]
In FIG. 4, the value of each item extracted from the search database 11 is substituted into each portion enclosed by the symbols “<” and “>”. The generated speech recognition grammar therefore contains the data-tagged data itself, which is the search result. Suffixes such as “~san” are taken from the words registered in the basic dictionary 13.
[0035]
FIG. 5 shows the created speech recognition grammar in the form of an automaton. As shown in FIG. 5, the automaton is created in a form that includes each item value, and each final state holds the data-tagged data, that is, the search result corresponding to the search conditions defined by the state transitions.
[0036]
If the user instead wants a system-driven speech recognition grammar, the grammar only needs to accept the user's replies to questions prepared by the system, such as “Whose schedule?” or “The schedule for which date?”. Therefore, as shown in FIG. 6, a speech recognition grammar that accepts the minimum necessary words is created automatically and stored in the system-driven speech recognition grammar storage unit 15.
[0037]
FIG. 7 shows this speech recognition grammar in the form of an automaton. As in FIG. 5, the automaton in FIG. 7 is created in a form that includes each item value, and each final state holds the data-tagged data, that is, the search result corresponding to the search conditions defined by the state transitions.
[0038]
The speech recognition grammar may be created in advance according to the user's request, or may be dynamically created in response to a request from the dialogue management unit 3 during the voice dialogue.
[0039]
The input analysis unit 2 recognizes the user utterance received by the input receiving unit 1. For this recognition it refers to either the user-driven speech recognition grammar stored in the user-driven speech recognition grammar storage unit 14 or the system-driven speech recognition grammar stored in the system-driven speech recognition grammar storage unit 15, and matches it against the input speech.
[0040]
Matching against the input speech is performed only for the items that carry token tags in the respective speech recognition grammar. The data-tagged items extracted during matching are themselves the response to the user, so their content is sent to the dialogue management unit 3 as the response result for the user.
[0041]
The dialogue management unit 3 decides how to respond to the user on the basis of the speech recognition result. If the input analysis unit 2 obtains no result, or if the dialogue does not succeed, the dialogue management unit 3 judges that the user-driven speech recognition grammar cannot handle the situation and instructs the speech recognition grammar switching unit 16 to switch to the system-driven speech recognition grammar.
[0042]
It is also possible to provide a log data recording unit (not shown) that records speech recognition results and dialogue completion status as log data, and to switch the speech recognition grammar on the basis of the log data recorded in that unit.
[0043]
For example, the log data is searched to calculate the ratio of the number of times a data-tagged column was output to the number of times the speech recognition grammar was used, and the speech recognition grammar is switched according to that ratio. If the calculated ratio is at or below a predetermined threshold, the low rate at which data, the final search result, was output indicates that dialogues are not proceeding normally. Switching to the system-driven speech recognition grammar, which restricts utterances, then makes it easier to complete dialogues normally.
[0044]
On the other hand, if the ratio exceeds the threshold, final search result data is being output at a high rate, which indicates that the dialogue is proceeding normally. The speech recognition grammar is therefore switched to the user-driven speech recognition grammar, which accepts free utterances.
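The switching rule in the two paragraphs above can be sketched as follows. The function and enum names are hypothetical, and the behaviour when no grammar use has been logged yet is an assumption the text does not specify. A ratio exactly equal to the threshold selects the restricted grammar, matching the "less than or equal to" condition.

```python
from enum import Enum


class GrammarType(Enum):
    USER_DRIVEN = "user-driven"      # accepts free utterances
    SYSTEM_DRIVEN = "system-driven"  # restricts utterances


def choose_grammar(data_tag_outputs: int, grammar_uses: int,
                   threshold: float) -> GrammarType:
    """Select a grammar from the ratio of data-tagged outputs to grammar uses."""
    if grammar_uses == 0:
        # Assumption: no history yet, so start with the user-driven grammar.
        return GrammarType.USER_DRIVEN
    ratio = data_tag_outputs / grammar_uses
    if ratio <= threshold:
        # Few final search results reached the user: restrict utterances.
        return GrammarType.SYSTEM_DRIVEN
    # Final search results are output at a high rate: allow free utterances.
    return GrammarType.USER_DRIVEN
```

With a threshold of 0.5, eight data-tagged outputs in ten uses keep the user-driven grammar, whereas five in ten fall back to the system-driven one.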
[0045]
It is also conceivable to instruct the speech recognition grammar creation unit 12 to create the speech recognition grammar again, incorporating the information already extracted in the dialogue. By dynamically switching between the user-driven and system-driven speech recognition grammars according to the state of the dialogue in this way, it is possible to realize a dialogue similar to one that uses a mixed-initiative speech recognition grammar combining the features of both.
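One possible reading of re-creating the grammar with already-extracted information is sketched below, under stated assumptions: rows inconsistent with a value the user has already given are dropped, and the columns already filled are no longer requested. The function name, column names, and the tuple representation of grammar entries are all hypothetical.

```python
from __future__ import annotations


def recreate_grammar(rows: list[dict[str, str]], token_cols: list[str],
                     data_cols: list[str], extracted: dict[str, str]
                     ) -> list[tuple[dict[str, str], dict[str, str]]]:
    """Rebuild (token part, data part) entries using slots already extracted."""
    grammar = []
    for row in rows:
        # Skip rows that contradict a value the user already supplied.
        if any(row.get(col) != value for col, value in extracted.items()):
            continue
        # Ask only for token-tagged columns that are still unfilled.
        tokens = {c: row[c] for c in token_cols if c not in extracted}
        data = {c: row[c] for c in data_cols}
        grammar.append((tokens, data))
    return grammar


# Hypothetical database rows.
rows = [
    {"area": "shibuya", "cuisine": "sushi", "shop": "Sushi Ichiban"},
    {"area": "shibuya", "cuisine": "tempura", "shop": "Tempura Ten"},
    {"area": "shinjuku", "cuisine": "ramen", "shop": "Ramen Taro"},
]
```

Once `area` has been extracted as `shibuya`, the re-created grammar only has to accept a cuisine, and only the two Shibuya rows remain as candidates.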
[0046]
The speech recognition grammar switching unit 16 switches the speech recognition grammar used by the input analysis unit 2 in accordance with instructions from the dialogue management unit 3, retrieving the grammar that corresponds to each instruction from the user-driven speech recognition grammar storage unit 14 or the system-driven speech recognition grammar storage unit 15.
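A sketch of how the dialogue management unit 3 and the switching unit 16 might interact, with each storage unit reduced to a single stored grammar. Class, method, and key names are hypothetical, and starting from the user-driven grammar is an assumption.

```python
from __future__ import annotations

from typing import Optional


class GrammarSwitcher:
    """Hypothetical stand-in for the speech recognition grammar switching unit 16."""

    def __init__(self, user_driven: str, system_driven: str) -> None:
        # Stand-ins for the contents of storage units 14 and 15.
        self._store = {"user": user_driven, "system": system_driven}
        self.kind = "user"  # assumption: start with the user-driven grammar

    def switch(self, kind: str) -> None:
        """Switch on an instruction from the dialogue management unit."""
        if kind not in self._store:
            raise ValueError(f"unknown grammar kind: {kind!r}")
        self.kind = kind

    @property
    def current(self) -> str:
        """The grammar the input analysis unit 2 should collate against."""
        return self._store[self.kind]


def after_recognition(result: Optional[dict], established: bool,
                      switcher: GrammarSwitcher) -> None:
    """Dialogue management unit 3: fall back to the restricted grammar on failure."""
    if result is None or not established:
        switcher.switch("system")
```

Keeping the decision (in `after_recognition`) separate from the retrieval (in `GrammarSwitcher`) mirrors the division of labour between units 3 and 16 described above.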
[0047]
Finally, the response generation/output unit 4 generates the response content for the user and outputs it. In the present embodiment, when the speech recognition grammar is created from the search database 11, the data that constitutes the response content is itself incorporated into the speech recognition grammar. Therefore, unlike a conventional speech dialogue system, the system does not need to recognize the input speech with a separately created grammar and then search the database based on the recognition result, which reduces the processing load on computer resources.
[0048]
Next, the processing flow of a program that realizes the voice interaction system according to the embodiment of the present invention will be described. FIG. 8 is a flowchart of the processing of a program that generates the speech recognition grammar in the speech dialogue system according to the embodiment of the present invention.
[0049]
In FIG. 8, identification information indicating whether each item in the search database 11 is an item to be input by the user or an item containing content to be presented to the user is first added (step S801). Next, a command input, such as speech data or text data, for generating the speech recognition grammar is received from the creator of the speech recognition grammar (step S802).
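Step S801 and the grammar generation that follows it can be illustrated with a short sketch. Assumptions: the database is a list of row dictionaries, the grammar creator supplies the set of user-input columns, and the output uses an invented rule syntax (spoken words followed by data-tag values in braces) rather than any real grammar format such as SRGS. Step S802 (receiving the creator's command input) is not modelled, and all names are hypothetical.

```python
from __future__ import annotations

TOKEN, DATA = "token", "data"


def tag_columns(columns: list[str], user_input_cols: set[str]) -> dict[str, str]:
    """Step S801: mark each column as user input (token tag) or response content (data tag)."""
    return {c: TOKEN if c in user_input_cols else DATA for c in columns}


def build_grammar(rows: list[dict[str, str]], tags: dict[str, str]) -> list[str]:
    """Emit one rule per row: token-tagged words, then data-tagged values in braces."""
    rules = []
    for row in rows:
        said = " ".join(row[c] for c, t in tags.items() if t == TOKEN)
        data = ", ".join(f"{c}={row[c]}" for c, t in tags.items() if t == DATA)
        rules.append(f"{said} {{{data}}}")  # doubled braces emit literal { }
    return rules
```

Because each rule carries its data-tagged values alongside the words to be recognized, a match on the spoken part already identifies the response content.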
[0050]
Next, based on the log data in which the speech recognition results and the dialogue establishment status are recorded, the ratio of the number of times a column with a data tag was output to the number of times the speech recognition grammar was used is calculated (step S803).
[0051]
If the calculated ratio exceeds a predetermined threshold (step S804: Yes), a user-driven speech recognition grammar that accepts free utterances is generated or updated (step S805). If the calculated ratio is equal to or less than the threshold (step S804: No), a system-driven speech recognition grammar in which utterances are restricted is generated or updated (step S806). Although omitted from FIG. 8, the speech recognition grammar can also be generated or updated at the request of the grammar creator, regardless of the log data.
[0052]
Then, the contents of the processing are recorded as log data (step S807).
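Steps S803 to S807 of FIG. 8 can be sketched as below, under a hypothetical log format with one entry per use of the speech recognition grammar. The actual grammar construction is reduced to a placeholder, and defaulting to the user-driven grammar when the log is empty is an assumption; all names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LogEntry:
    """One use of the speech recognition grammar (hypothetical log format)."""
    data_tag_output: bool  # a data-tagged column was output to the user
    established: bool      # dialogue establishment status (kept, unused here)


def generate_or_update(log: list[LogEntry], threshold: float,
                       generation_log: list[str]) -> str:
    """Sketch of FIG. 8, steps S803 to S807; returns the grammar kind produced."""
    uses = len(log)
    outputs = sum(entry.data_tag_output for entry in log)
    # S803: ratio of data-tagged outputs to grammar uses.
    # Assumption: with an empty log, default to the user-driven grammar.
    ratio = outputs / uses if uses else None
    if ratio is None or ratio > threshold:   # S804: Yes
        kind = "user-driven"                 # S805: free utterances
    else:                                    # S804: No
        kind = "system-driven"               # S806: restricted utterances
    # Placeholder for actually (re)building a grammar of this kind.
    generation_log.append(f"generated/updated {kind} grammar")  # S807
    return kind
```

The generation record is appended to a separate list so that it does not distort the recognition log from which the ratio is computed.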
[0053]
Next, FIG. 9 shows a flowchart of the processing of a program that executes search processing using the generated recognition grammar in the voice interaction system according to the embodiment of the present invention.
[0054]
In FIG. 9, spoken search input is first received from the end user (step S901).
[0055]
Next, based on the log data in which the speech recognition results and the dialogue establishment status are recorded, the ratio of the number of times a column with a data tag was output to the number of times the speech recognition grammar was used is calculated (step S902).
[0056]
If the calculated ratio exceeds the predetermined threshold (step S903: Yes), speech recognition is performed using the user-driven speech recognition grammar, which accepts free utterances (step S904). If the calculated ratio is equal to or less than the threshold (step S903: No), speech recognition is performed using the system-driven speech recognition grammar, in which utterances are restricted (step S905). The search results are obtained as part of the speech recognition processing itself.
[0057]
Then, a response corresponding to the input conditions is generated and output (step S906), and the speech recognition result and dialogue establishment status for this processing are recorded as log data (step S907).
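The FIG. 9 flow for a single turn might look roughly like this. Each grammar is reduced to a hypothetical mapping from accepted utterances to data-tagged content, so a successful lookup is simultaneously the recognition result and the search result. Treating the dialogue as established exactly when a result is found, and selecting the user-driven grammar when the log is empty, are simplifying assumptions; the function name and response wording are illustrative.

```python
from __future__ import annotations

from typing import Optional


def handle_turn(utterance: str, log: list[tuple[bool, bool]],
                user_grammar: dict[str, str], system_grammar: dict[str, str],
                threshold: float) -> str:
    """One pass through FIG. 9 (S902 to S907) for a spoken search input (S901)."""
    # S902: ratio of data-tagged outputs to grammar uses, from the log.
    uses = len(log)
    outputs = sum(1 for data_output, _ in log if data_output)
    # S903 (assumption: an empty log selects the user-driven grammar).
    if uses == 0 or outputs / uses > threshold:
        grammar = user_grammar    # S904: free utterances
    else:
        grammar = system_grammar  # S905: restricted utterances
    # Recognition and search in one step: the match is the data-tagged content.
    result: Optional[str] = grammar.get(utterance.strip().lower())
    # S906: generate and output the response.
    response = result if result is not None else "Sorry, please say that again."
    # S907: log (data-tag output, dialogue established); simplification:
    # the dialogue counts as established exactly when a result was found.
    found = result is not None
    log.append((found, found))
    return response
```

With a threshold of 0.5, a successful first turn keeps the user-driven grammar; after one failed turn the ratio drops to 0.5, so the next turn is recognized with the system-driven grammar.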
[0058]
As described above, according to the present embodiment, preparing as the search target database a database in which each column is given a token tag, which is identification information indicating a category to be uttered by the user, and a data tag, which is identification information indicating data to be searched, makes it possible to generate the speech recognition grammar simply by referring to the search target database and to extract search results during speech recognition. Search processing after speech recognition can therefore be omitted, reducing the processing load.
[0059]
As shown in FIG. 10, the program that realizes the voice interaction system according to the embodiment of the present invention may be stored not only on a portable recording medium 102 such as a CD-ROM 102-1 or a flexible disk 102-2, but also in another storage device 101 located at the end of a communication line, or on a recording medium 104 such as a hard disk or RAM of a computer 103. At execution time, the program is loaded into main memory and executed.
[0060]
Likewise, as shown in FIG. 10, the speech recognition grammar generated by the speech dialogue system according to the embodiment of the present invention may be stored not only on a portable recording medium 102 such as a CD-ROM 102-1 or a flexible disk 102-2, but also in another storage device 101 located at the end of a communication line, or on a recording medium 104 such as a hard disk or RAM of the computer 103. It is read by the computer 103 when, for example, the voice dialogue system according to the present invention is used.
[0061]
The following supplementary notes are also disclosed.
[0062]
(Supplementary note 1) A spoken dialogue system having a database that stores content information to be searched in a voice dialogue service, and including:
a voice input unit for inputting at least a user utterance;
a voice recognition unit for recognizing the user utterance;
a dialogue management unit that controls response instructions according to the recognition result; and
a response generation/output unit that generates and outputs a response to the user,
wherein, in the database, a token tag, which is identification information indicating a category that the user should utter, and a data tag, which is identification information indicating data to be searched, are set for each column,
the system further including a speech recognition grammar creation unit that creates a speech recognition grammar including the token tag and the data tag,
and wherein the voice recognition unit recognizes the user utterance based on the created speech recognition grammar.
[0063]
(Supplementary note 2) The speech dialogue system according to supplementary note 1, wherein the speech recognition grammar creation unit creates speech recognition grammar corresponding to a user's free speech or speech recognition grammar in which speech is restricted.
[0064]
(Supplementary note 3) The voice dialogue system according to supplementary note 1, including means for recording a voice recognition result and a dialogue establishment status as log data, and switching a grammar for voice recognition to be used or created based on the log data.
[0065]
(Supplementary note 4) The spoken dialogue system according to supplementary note 3, wherein a ratio of the number of times the column with the data tag is output to the number of times the speech recognition grammar is used is calculated based on the log data; if the calculated ratio is less than or equal to a predetermined threshold, the speech recognition grammar is switched to a speech recognition grammar in which utterances are restricted, and if the ratio exceeds the threshold, the speech recognition grammar is switched to a speech recognition grammar that accepts free utterances.
[0066]
(Supplementary note 5) The spoken dialogue system according to supplementary note 3, wherein a ratio of the number of times the column with the data tag is output to the number of times the speech recognition grammar is used is calculated based on the log data; if the calculated ratio is less than or equal to a predetermined threshold, a speech recognition grammar in which utterances are restricted is created, and if the ratio exceeds the threshold, a speech recognition grammar that accepts free utterances is created.
[0067]
(Supplementary note 6) A voice interaction method using a database that stores content information to be searched in a voice interaction service, the method including:
a step of inputting at least a user utterance;
a step of recognizing the user utterance;
a step of controlling a response instruction according to the recognition result; and
a step of generating and outputting a response to the user,
wherein, in the database, a token tag, which is identification information indicating a category that the user should utter, and a data tag, which is identification information indicating data to be searched, are set for each column,
the method further comprising a step of creating a speech recognition grammar including the token tag and the data tag,
and wherein, in the step of recognizing the user utterance, the user utterance is recognized based on the created speech recognition grammar.
[0068]
(Supplementary note 7) A computer-executable program embodying a spoken dialogue method that uses a database storing content information to be searched in a voice dialogue service, the method including:
a step of inputting at least a user utterance;
a step of recognizing the user utterance;
a step of controlling response instructions according to the recognition result; and
a step of generating and outputting a response to the user,
wherein, in the database, a token tag, which is identification information indicating a category that the user should utter, and a data tag, which is identification information indicating data to be searched, are set for each column,
the method further including a step of creating a speech recognition grammar including the token tag and the data tag,
and wherein, in the step of recognizing the user utterance, the user utterance is recognized based on the created speech recognition grammar.
[0069]
[Effect of the invention]
As described above, according to the spoken dialogue system of the present invention, preparing as the search target database a database in which each column is given a token tag, which is identification information indicating a category to be uttered by the user, and a data tag, which is identification information indicating data to be searched, makes it possible to generate a recognition grammar simply by referring to the search target database. Because search results can be extracted at the time of speech recognition, search processing after speech recognition can be omitted and the processing load reduced.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a voice interaction system according to an embodiment of the present invention.
FIG. 2 is a data configuration example diagram of a search database in the voice interaction system according to the embodiment of the present invention.
FIG. 3 is an exemplary diagram of a grammar for speech recognition in the speech dialogue system according to the embodiment of the present invention.
FIG. 4 is a view showing an example of a user-driven speech recognition grammar in the speech dialogue system according to the embodiment of the present invention.
FIG. 5 is a diagram showing an example of an automaton of a user-driven grammar for speech recognition in the speech dialogue system according to the embodiment of the present invention.
FIG. 6 is an exemplary diagram of a system-driven speech recognition grammar in the speech dialogue system according to the embodiment of the present invention.
FIG. 7 is a diagram showing an example of an automaton of a system-driven grammar for speech recognition in the speech dialogue system according to the embodiment of the present invention.
FIG. 8 is a flowchart of speech recognition grammar generation processing in the speech dialogue system according to the embodiment of the present invention.
FIG. 9 is a flowchart of search processing in the voice interaction system according to the embodiment of the present invention.
FIG. 10 is an exemplary diagram of a computer environment.
[Explanation of symbols]
1 Input receiving unit
2 Input analysis unit
3 Dialogue management unit
4 Response generation/output unit
11 Search database
12 Speech recognition grammar creation unit
13 Basic dictionary
14 User-driven speech recognition grammar storage unit
15 System-driven speech recognition grammar storage unit
16 Speech recognition grammar switching unit
91 Storage device at the end of a communication line
92 Portable recording medium such as a CD-ROM or flexible disk
92-1 CD-ROM
92-2 Flexible disk
93 Computer
94 Recording medium such as RAM or hard disk on the computer

Claims

A speech dialogue system comprising:
a voice input unit for inputting at least a user utterance;
a dialogue management unit that controls response instructions according to the recognition result of the input user utterance;
a database in which a token tag, which is identification information indicating a category to be uttered by the user, and a data tag, which is identification information indicating data to be searched in a voice dialogue service, are stored as set for each column;
a speech recognition grammar creation unit that creates a speech recognition grammar including the token tag and the data tag;
a speech recognition unit that recognizes the user utterance by collating the items to which the token tag is assigned in the created speech recognition grammar with the input user utterance; and
a response generation/output unit that outputs, as the response content to the user, the item to which the data tag corresponding to the token tag extracted in the collation process is assigned.

The speech dialogue system according to claim 1, wherein the speech recognition grammar creation unit creates speech recognition grammar corresponding to a user's free speech or speech recognition grammar in which speech is restricted.

The voice dialogue system according to claim 1, further comprising means for recording a voice recognition result and dialogue establishment status as log data, and switching a voice recognition grammar to be used or created based on the log data.

The spoken dialogue system according to claim 3, wherein a ratio of the number of times the column with the data tag is output to the number of times the speech recognition grammar is used is calculated based on the log data, and when the calculated ratio is equal to or less than a predetermined threshold, the speech recognition grammar is switched to a speech recognition grammar in which utterances are restricted, whereas when the ratio exceeds the threshold, the speech recognition grammar is switched to a speech recognition grammar that accepts free utterances.

The spoken dialogue system according to claim 3, wherein a ratio of the number of times the column with the data tag is output to the number of times the speech recognition grammar is used is calculated based on the log data, and when the calculated ratio is equal to or less than a predetermined threshold, a speech recognition grammar in which utterances are restricted is created, whereas when the ratio exceeds the threshold, a speech recognition grammar that accepts free utterances is created.

A voice interaction method executed by a computer that includes a voice input unit for inputting at least a user utterance,
a dialogue management unit for controlling a response instruction according to the recognition result of the input user utterance,
and a database in which a token tag, which is identification information indicating a category to be uttered by the user, and a data tag, which is identification information indicating data to be searched in a voice interaction service, are set for each column, the method comprising:
a step in which a speech recognition grammar creation unit provided in the computer creates a speech recognition grammar including the token tag and the data tag;
a step in which a speech recognition unit provided in the computer recognizes the user utterance by collating the input user utterance with the items to which the token tag is assigned in the created speech recognition grammar; and
a step in which a response generation/output unit provided in the computer outputs, as the response content to the user, the item to which the data tag corresponding to the token tag extracted in the collation process is assigned.

A voice interaction program for a computer that includes a voice input unit for inputting at least a user utterance,
a dialogue management unit for controlling a response instruction according to the recognition result of the input user utterance,
and a database in which a token tag, which is identification information indicating a category to be uttered by the user, and a data tag, which is identification information indicating data to be searched in a voice interaction service, are set for each column, the program causing the computer to execute:
speech recognition grammar creation processing for creating a speech recognition grammar including the token tag and the data tag;
speech recognition processing for recognizing the user utterance by collating the items to which the token tag is assigned in the created speech recognition grammar with the input user utterance; and
response generation/output processing for outputting, as the response content to the user, the item to which the data tag corresponding to the token tag extracted in the collation process is assigned.