JP7225642B2

JP7225642B2 - Communication robot, control method and control program

Info

Publication number: JP7225642B2
Application number: JP2018182049A
Authority: JP
Inventors: 祐江藤; 雅芳清水; 真司神田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-09-27
Filing date: 2018-09-27
Publication date: 2023-02-21
Anticipated expiration: 2038-09-27
Also published as: JP2020049596A

Description

本発明は、コミュニケーションロボット、制御方法及び制御プログラムに関する。 The present invention relates to a communication robot, control method and control program.

プレゼンテーションや展示、フロント業務等の様々な現場で対人のコミュニケーションを実現するコミュニケーションロボットの普及が進んでいる。例えば、コミュニケーションロボットには、音声認識や機械翻訳、音声感情分析などの音声処理の他、顔認識や表情認識などの画像処理に関するＡＩ（Artificial Intelligence）技術を活用したプラットフォームが導入される。 Communication robots that realize interpersonal communication in various fields such as presentations, exhibitions, and front desk operations are becoming widespread. For example, communication robots will be equipped with a platform that utilizes AI (Artificial Intelligence) technology related to image processing such as face recognition and facial expression recognition, as well as voice processing such as voice recognition, machine translation, and voice emotion analysis.

このようにコミュニケーションロボットが音声処理や画像処理などの情報処理を実行する場合、コミュニケーションロボットに情報が入力されてからコミュニケーションロボットが処理結果を応答するまでに時間差が応答遅延時間として発生する。さらに、コミュニケーションロボットに接続された外部のコンピュータにより情報処理が実行される場合、ネットワークの伝送遅延が加わる分、コミュニケーションロボットの内部で情報処理が実行される場合よりも応答遅延時間が拡大する。 When the communication robot executes information processing such as voice processing and image processing in this way, a time difference occurs as a response delay time from when information is input to the communication robot until the communication robot responds with the processing result. Furthermore, when information processing is performed by an external computer connected to the communication robot, the response delay time is longer than when information processing is performed inside the communication robot due to network transmission delay.

ところで、音声認識機能を備えた車載ナビゲーション装置等の車載システムへの適用を想定した技術として、応答遅延時間に応じた時間長のフィラー、例えば「ええと」や「あの」などのつなぎ言葉を発話する音声認識端末装置が提案されている。 Incidentally, as a technology intended for in-vehicle systems such as car navigation devices equipped with a voice recognition function, a voice recognition terminal device has been proposed that utters a filler, for example a connecting word such as "um" or "uh", whose length corresponds to the response delay time.

特開２００６－８８２７６号公報 JP 2006-88276 A
特開２０１４－１１０５５８号公報 JP 2014-110558 A
特開２０１５－１３５４２０号公報 JP 2015-135420 A

しかしながら、上記の音声認識端末装置は、あくまで音声ＵＩ（User Interface）の機能を提供するものに過ぎず、対人のコミュニケーションを実現するコミュニケーションロボットへの適用はそもそも想定されていない。 However, the speech recognition terminal device described above merely provides a voice UI (User Interface) function, and is not originally intended to be applied to a communication robot that realizes interpersonal communication.

１つの側面では、本発明は、コミュニケーションロボットに処理の待ち時間中にフィラー動作を行わせつつ、処理結果を出力する際には、とるべき姿勢で処理結果を出力できるようにするコミュニケーションロボット、制御方法及び制御プログラムを提供することを目的とする。 In one aspect, an object of the present invention is to provide a communication robot, a control method, and a control program that allow the communication robot to perform a filler motion while waiting for processing and, when outputting the processing result, to output it in the posture the robot should take.

一態様では、コミュニケーションロボットは、コミュニケーションロボットに対して入力された情報に基づいて、前記情報が入力されたタイミングから前記コミュニケーションロボットにより応答を出力するまでの応答遅延時間長を予測する予測部と、予測された応答遅延時間長に対応する前記コミュニケーションロボットの動作を決定する決定部と、決定した前記動作を前記コミュニケーションロボットに実行させる動作制御部と、を有する。 In one aspect, a communication robot includes: a prediction unit that predicts, based on information input to the communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response; a determination unit that determines a motion of the communication robot corresponding to the predicted response delay time length; and an operation control unit that causes the communication robot to execute the determined motion.

一実施形態によれば、ロボットの応答遅延中の動作に発生する不自然さを抑制できる。 According to one embodiment, it is possible to suppress the unnaturalness that occurs in the motion of the robot during the response delay.

図１は、実施例１に係るコミュニケーションロボットのユースケースの一例を示す図である。 FIG. 1 is a diagram illustrating an example of a use case of a communication robot according to the first embodiment.
図２は、応答遅延時間の一例を示す図である。 FIG. 2 is a diagram showing an example of response delay time.
図３は、実施例１に係るコミュニケーションロボット１の機能的構成の一例を示すブロック図である。 FIG. 3 is a block diagram showing an example of the functional configuration of the communication robot 1 according to the first embodiment.
図４は、頭部３の駆動例を示す図である。 FIG. 4 is a diagram showing an example of how the head 3 is driven.
図５は、胴部５の駆動例を示す図である。 FIG. 5 is a diagram showing an example of how the body 5 is driven.
図６は、腕部７の駆動例を示す図である。 FIG. 6 is a diagram showing an example of how the arm 7 is driven.
図７は、ルックアップテーブル１３Ａの一例を示す図である。 FIG. 7 is a diagram showing an example of the lookup table 13A.
図８は、ルックアップテーブル１４Ａの一例を示す図である。 FIG. 8 is a diagram showing an example of the lookup table 14A.
図９は、実施例１に係るフィラー動作の制御処理の手順を示すフローチャートである。 FIG. 9 is a flowchart illustrating the procedure of a filler motion control process according to the first embodiment.
図１０は、実施例２に係るコミュニケーションロボット２の機能的構成の一例を示すブロック図である。 FIG. 10 is a block diagram showing an example of the functional configuration of the communication robot 2 according to the second embodiment.
図１１は、動作区間の設定方法の一例を示す図である。 FIG. 11 is a diagram illustrating an example of a method for setting motion sections.
図１２は、動作と違和感の有無の対応関係の一例を示す図である。 FIG. 12 is a diagram illustrating an example of the correspondence between motions and the presence or absence of a sense of incongruity.
図１３は、各動作区間で実行が許可される動作の一例を示す図である。 FIG. 13 is a diagram showing an example of the motions permitted in each motion section.
図１４は、実施例２に係るフィラー動作の制御処理の手順を示すフローチャートである。 FIG. 14 is a flowchart illustrating the procedure of a filler motion control process according to the second embodiment.
図１５は、実施例３に係るコミュニケーションロボット４の機能的構成の一例を示すブロック図である。 FIG. 15 is a block diagram showing an example of the functional configuration of the communication robot 4 according to the third embodiment.
図１６は、実施例１～実施例３に係る制御プログラムを実行するコンピュータのハードウェア構成例を示す図である。 FIG. 16 is a diagram illustrating a hardware configuration example of a computer that executes the control programs according to the first to third embodiments.

以下に添付図面を参照して本願に係るコミュニケーションロボット、制御方法及び制御プログラムについて説明する。なお、この実施例は開示の技術を限定するものではない。そして、各実施例は、処理内容を矛盾させない範囲で適宜組み合わせることが可能である。 A communication robot, a control method, and a control program according to the present application will be described below with reference to the accompanying drawings. Note that this embodiment does not limit the disclosed technology. Further, each embodiment can be appropriately combined within a range that does not contradict the processing contents.

［ユースケースの一例］ [Example of use case]
図１は、実施例１に係るコミュニケーションロボットのユースケースの一例を示す図である。図１には、ユースケースのあくまで一例として、多言語のコミュニケーションを実現する側面から、音声認識や機械翻訳を併用することにより、対象者Ｕ１の発話を母国語から外国語へ翻訳して読み上げる音声ＵＩの機能を提供するコミュニケーションロボット１を示す。 FIG. 1 is a diagram illustrating an example of a use case of a communication robot according to the first embodiment. Purely as one example of a use case, FIG. 1 shows a communication robot 1 that, with a view to realizing multilingual communication, combines speech recognition and machine translation to provide a voice UI function that translates the utterance of a target person U1 from the native language into a foreign language and reads it aloud.

［応答遅延時間］ [Response delay time]
ここで、コミュニケーションロボット１に対する発話が対象者Ｕ１により行われてからその発話が目的とする外国語でコミュニケーションロボット１により読み上げられるまでの間には、応答遅延時間が発生する。このような応答遅延時間が発生する一因として、音声認識や機械翻訳等の音声処理が実行されることが挙げられる。 Here, a response delay time occurs from when the target person U1 makes an utterance to the communication robot 1 to when the utterance is read out by the communication robot 1 in the target foreign language. One of the reasons for the occurrence of such a response delay time is the execution of speech processing such as speech recognition and machine translation.

図２は、応答遅延時間の一例を示す図である。図２には、コミュニケーションロボット１で発生するイベントが時系列に示されている。図２に示すように、コミュニケーションロボット１は、対象者Ｕ１の発話を待機し（ステップＳ１）、発話の開始を検出してから当該発話の終了を検出する（ステップＳ２及びステップＳ３）。続いて、コミュニケーションロボット１は、ステップＳ２及びステップＳ３で検出された発話区間の音声データの翻訳を開始する（ステップＳ４）。そして、コミュニケーションロボット１は、発話区間の音声データの翻訳が終了すると（ステップＳ５）、対象者Ｕ１の発話が目的とする外国語に翻訳された合成音声の再生を開始し（ステップＳ６）、その後、再生が終了する（ステップＳ７）。 FIG. 2 is a diagram showing an example of response delay time. FIG. 2 shows events occurring in the communication robot 1 in chronological order. As shown in FIG. 2, the communication robot 1 waits for the target person U1 to speak (step S1), detects the start of the utterance, and then detects the end of the utterance (steps S2 and S3). Subsequently, the communication robot 1 starts translating the voice data of the utterance period detected in steps S2 and S3 (step S4). When the translation of the voice data of the utterance period is finished (step S5), the communication robot 1 starts playing back synthesized speech in which the utterance of the target person U1 has been translated into the target foreign language (step S6), after which the playback ends (step S7).

これら一連のイベントにおいて、ステップＳ３で発話の終了が検出された時点からステップＳ６で翻訳後の合成音声の再生が開始される時点までの応答遅延時間Ｔは、対象者Ｕ１にとっては空白の期間、いわゆる待ち時間となる。なお、ここでは、コミュニケーションロボット１の内部で音声処理が実行される場合を例示したが、次のような場合、さらに応答遅延時間が拡大する。例えば、コミュニケーションロボット１に接続された外部のコンピュータにより音声処理がクラウドサービス等として実行される場合、ネットワークの伝送遅延が加わる分、さらに応答遅延時間が拡大する。 In this series of events, the response delay time T, from the time the end of the utterance is detected in step S3 to the time playback of the translated synthesized speech starts in step S6, is a blank period for the target person U1, that is, a waiting time. Although the case where voice processing is executed inside the communication robot 1 is illustrated here, the response delay time grows further in the following case. For example, when voice processing is executed as a cloud service or the like by an external computer connected to the communication robot 1, the response delay time increases by the amount of the network transmission delay.
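
The definition of the response delay time T above can be written down as a small sketch. The `EventLog` class and its field names are illustrative assumptions, not part of the patent; only the choice of step S3 and step S6 as the two endpoints comes from the text.

```python
from dataclasses import dataclass


@dataclass
class EventLog:
    """Timestamps in seconds for the events of FIG. 2 (field names are hypothetical)."""
    speech_start: float       # step S2: start of utterance detected
    speech_end: float         # step S3: end of utterance detected
    translation_start: float  # step S4
    translation_end: float    # step S5
    playback_start: float     # step S6: playback of synthesized speech starts


def response_delay(log: EventLog) -> float:
    """Response delay time T: from end of utterance (S3) to start of playback (S6)."""
    return log.playback_start - log.speech_end


def speech_time(log: EventLog) -> float:
    """Length of the utterance period: from S2 to S3."""
    return log.speech_end - log.speech_start
```

With an external server, the interval between S4 and S5 would also include the network round trip, which is why T grows in that case.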

［課題の一側面］ [One aspect of the problem]
このような応答遅延時間Ｔに直面して、コミュニケーションロボット１が停止していたのでは、対象者Ｕ１およびコミュニケーションロボット１の間のインタラクションの親和性が損なわれる。 If the communication robot 1 simply stood still during such a response delay time T, the affinity of the interaction between the target person U1 and the communication robot 1 would be impaired.

そうであるからと言って、背景技術の欄で挙げた音声認識端末装置のように、コミュニケーションロボット１につなぎ言葉を発話させたとしても、依然として、動作に不自然さが残る。あくまで一例として、コミュニケーションロボット１に情報が入力されたタイミングからコミュニケーションロボット１が応答を出力するまでの間につなぎ言葉の発話が終了することによりつなぎ言葉が途切れることがある。この場合、つなぎ言葉が途切れたタイミングからコミュニケーションロボット１が応答を出力するまでに生じる時間差が継ぎ目となって不自然に感じられる場合がある。 Even so, even if the communication robot 1 is made to utter a filler word like the voice recognition terminal device mentioned in the background art column, the motion still remains unnatural. By way of example only, the connecting word may be interrupted due to the end of the utterance of the connecting word between the timing when information is input to the communication robot 1 and the time when the communication robot 1 outputs a response. In this case, the time difference that occurs from the timing when the connecting word is interrupted until the communication robot 1 outputs a response may be a seam, which may feel unnatural.

また、背景技術の欄で挙げた音声認識端末装置以外の文献に記載の技術を用いて、応答遅延時のインタラクションに発生する違和感を抑制することも困難である。このような文献の一例として、相手の状態に応じて適切な模倣動作や同調動作のような協力的動作をコミュニケーションロボットに実行させる動作生成システムがある。 Moreover, it is also difficult to suppress discomfort that occurs in interactions during response delays using techniques described in documents other than the speech recognition terminal device listed in the background art column. As an example of such literature, there is a motion generation system that causes a communication robot to perform a cooperative motion such as an appropriate imitation motion or a synchronized motion according to the state of the other party.

上記の動作生成システムでは、次のような課題が設定されている。すなわち、「人間１４が何かを行うときに、ロボット１２がこの種の模倣動作や同調動作（たとえば、人間１４が指差しをするときに、ロボット１２の頭がすぐに同じ方向を向く）を即座に実行するならば、明らかに不自然である。」という課題が設定されている。このような課題設定の下、上記の動作生成システムでは、所定の反応の遅延時間経過後に協力的動作をコミュニケーションロボットに行わせる。このように、上記の文献では、「反応の遅延時間」という用語が含まれているが、その意味合いが上記の「応答遅延時間」との間で根本的に異なる。 The above motion generation system sets out the following problem: "When the human 14 does something, it is clearly unnatural if the robot 12 immediately performs this kind of imitative or synchronized motion (for example, when the human 14 points, the head of the robot 12 immediately turns in the same direction)." Given this problem, the motion generation system causes the communication robot to perform the cooperative motion after a predetermined reaction delay time has elapsed. Thus, although the above document uses the term "reaction delay time", its meaning is fundamentally different from that of the "response delay time" described above.

すなわち、上記の動作生成システムが「反応の遅延時間」は、コミュニケーションロボットが即座に動作を行うことができる状態であるにもかかわらず、人の反応に合わせてあえて待機することを目的とするものである。このため、上記の「反応の遅延時間」には、コミュニケーションロボット１が音声処理等の情報処理を完了して応答できる状態になるまでインタラクションに違和感がない雰囲気をつなぐというが動機付けが入りこむ余地がない。 That is, the "reaction delay time" of the above motion generation system is intended to make the communication robot deliberately wait in step with the person's reaction even though the robot is able to act immediately. The "reaction delay time" therefore leaves no room for the motivation of sustaining a natural-feeling interaction until the communication robot 1 has completed information processing such as voice processing and is ready to respond.

このような動機付けがない「反応の遅延時間」は、上記の「応答遅延時間Ｔ」に対応し得ない。それ故、人が不自然に感じない反応時間よりも応答遅延時間が長くなる状況が一例として発生しうる。このよう状況下で上記の「反応の遅延時間」がコミュニケーションロボットの動作の制御に用いられたとしても、音声処理等が完了する前に動作が途切れるので、ロボットの応答遅延時のインタラクションに違和感が発生する。 A "reaction delay time" that lacks this motivation cannot correspond to the "response delay time T" described above. For example, a situation can arise in which the response delay time is longer than the reaction time that people perceive as natural. Even if the "reaction delay time" were used to control the motion of the communication robot in such a situation, the motion would break off before the voice processing or the like is complete, so the interaction during the robot's response delay would feel unnatural.

［課題解決のアプローチの一側面］ [One aspect of the problem-solving approach]
そこで、本実施例に係るコミュニケーションロボット１は、コミュニケーションロボット１に対する情報入力完了から応答の再生開始までの応答遅延時間を予測し、予測された応答遅延時間に対応する動作の実行を決定する。これによって、コミュニケーションロボット１が音声処理等の情報処理を完了して応答できる状態になるまでインタラクションに違和感がない雰囲気をつなげることができる。この際、予測された応答遅延時間に対応する動作がコミュニケーションロボット１により行われるので、コミュニケーションロボット１の動作が終了するタイミングと、コミュニケーションロボット１が応答を出力するタイミングとの時間差を抑えることができる。このため、コミュニケーションロボット１の動作と、コミュニケーションロボット１の応答出力とをシームレスに近付けることができる結果、タイミングの時間差から生じる不自然さを抑制できる。したがって、本実施例に係るコミュニケーションロボット１によれば、ロボットの応答遅延時間中のインタラクション（挙動）に発生する違和感を抑制することが可能になる。 Therefore, the communication robot 1 according to this embodiment predicts the response delay time from the completion of information input to the communication robot 1 until the start of playback of the response, and decides to execute a motion corresponding to the predicted response delay time. This makes it possible to sustain a natural-feeling interaction until the communication robot 1 has completed information processing such as voice processing and is ready to respond. Because the communication robot 1 performs a motion corresponding to the predicted response delay time, the time difference between the timing at which the motion ends and the timing at which the communication robot 1 outputs the response can be kept small. The motion of the communication robot 1 and its response output can thus be brought close to seamless, which suppresses the unnaturalness caused by the timing gap. Consequently, the communication robot 1 according to this embodiment can suppress the sense of incongruity that arises in the interaction (behavior) during the robot's response delay time.

［コミュニケーションロボット１の構成］ [Configuration of Communication Robot 1]
図３は、実施例１に係るコミュニケーションロボット１の機能的構成の一例を示すブロック図である。図３に示すコミュニケーションロボット１は、所定のネットワークを介して、音声認識や機械翻訳、音声感情分析などの音声処理の他、顔認識や表情認識などの画像処理などをバックエンドで実行するサーバ装置５０と接続される。このようにフロントエンドとして機能するコミュニケーションロボット１がサーバ装置５０と接続されることにより、一例として、各種の音声処理や各種の画像処理がクラウドサービス等を通じて提供される。 FIG. 3 is a block diagram showing an example of the functional configuration of the communication robot 1 according to the first embodiment. The communication robot 1 shown in FIG. 3 is connected, via a predetermined network, to a server device 50 that executes, on the back end, voice processing such as voice recognition, machine translation, and voice emotion analysis, as well as image processing such as face recognition and facial expression recognition. By connecting the communication robot 1, which functions as a front end, to the server device 50 in this way, various kinds of voice processing and image processing are provided through a cloud service or the like, as one example.

図３に示すように、コミュニケーションロボット１は、頭部３、胴部５、右腕部７Ｒ、左腕部７Ｌ、音声入力部９Ａ、音声出力部９Ｂと、通信部９Ｃと、モータ９Ｍと、制御部１０とを有する。なお、図３に示す機能部は、あくまで例示であり、コミュニケーションロボット１の機能的構成が図３に示す例以外の機能的構成を有することを妨げない。 As shown in FIG. 3, the communication robot 1 includes a head 3, a body 5, a right arm 7R, a left arm 7L, a voice input unit 9A, an audio output unit 9B, a communication unit 9C, a motor 9M, and a control unit 10. Note that the functional units shown in FIG. 3 are merely examples, and the communication robot 1 may have a functional configuration other than the one shown in FIG. 3.

図３に示すコミュニケーションロボット１では、制御部１０が出力する制御信号に従ってモータ９Ｍが動力を発生させることにより、頭部３、胴部５、右腕部７Ｒおよび左腕部７Ｌを駆動させることができる。 In the communication robot 1 shown in FIG. 3, the head 3, the body 5, the right arm 7R and the left arm 7L can be driven by the motor 9M generating power according to the control signal output from the control unit 10.

頭部３は、モータ９Ｍの動力によって頭部３を駆動させるアクチュエータ３１と、光を点灯または点滅する発光部３２とを有する。このうち、発光部３２は、コミュニケーションロボット１の感情表現に用いることができる。例えば、発光部３２は、喜怒哀楽の感情ごとに当該感情に対応する色で点灯または点滅することにより、コミュニケーションロボット１の喜怒哀楽を表現することができる。 The head 3 has an actuator 31 that drives the head 3 by the power of the motor 9M, and a light emitting section 32 that lights or flashes light. Among them, the light emitting unit 32 can be used for the communication robot 1 to express emotions. For example, the light emitting unit 32 can express the emotions of the communication robot 1 by lighting or blinking in a color corresponding to each emotion.

図４は、頭部３の駆動例を示す図である。例えば、図４の上段に示すように、Ｘ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して頭部３のアクチュエータ３１を駆動することにより、頭部３をチルト方向に回転させることができる。このように左右のＸ軸を回転軸として頭部３を下方向および上方向に回転駆動させることにより、頷き動作等を行うことができる。また、図４の中段に示すように、Ｙ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して頭部３のアクチュエータ３１を駆動させることにより、頭部３をパン方向に回転させることができる。このように上下のＹ軸を回転軸として頭部３を左方向および右方向に回転駆動させることにより、首振り動作等を行うことができる。さらに、また、図４の下段に示すように、Ｚ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して頭部３のアクチュエータ３１を駆動させることにより、頭部３をロール方向に回転させることができる。このように頭部３を前後のＺ軸回りに回転駆動させることにより、首傾げ動作等を行うことができる。 FIG. 4 is a diagram showing an example of how the head 3 is driven. For example, as shown in the upper part of FIG. 4, the head 3 can be rotated in the tilt direction by outputting to the motor 9M a control signal that generates torque around the X axis and driving the actuator 31 of the head 3. By rotating the head 3 downward and upward about the left-right X axis in this way, a nodding motion or the like can be performed. As shown in the middle part of FIG. 4, the head 3 can be rotated in the pan direction by outputting to the motor 9M a control signal that generates torque around the Y axis and driving the actuator 31 of the head 3. By rotating the head 3 leftward and rightward about the vertical Y axis in this way, a head-shaking motion or the like can be performed. Furthermore, as shown in the lower part of FIG. 4, the head 3 can be rotated in the roll direction by outputting to the motor 9M a control signal that generates torque around the Z axis and driving the actuator 31 of the head 3. By rotating the head 3 about the front-back Z axis in this way, a head-tilting motion or the like can be performed.
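
The correspondence between rotation axis, rotation direction, and gesture described for the head 3 can be summarized as a small table in code. This is only a sketch: the `Axis` enum, the `HEAD_MOTIONS` table, and the dictionary returned by `head_command` are hypothetical names and a made-up command format; the patent specifies only that a control signal generating torque around the given axis is output to the motor 9M.

```python
from enum import Enum


class Axis(Enum):
    X = "x"  # left-right axis
    Y = "y"  # vertical axis
    Z = "z"  # front-back axis


# Axis -> (rotation direction, example gesture), as described for FIG. 4.
HEAD_MOTIONS = {
    Axis.X: ("tilt", "nodding"),
    Axis.Y: ("pan", "head shaking"),
    Axis.Z: ("roll", "head tilting"),
}


def head_command(axis: Axis, angle_deg: float) -> dict:
    """Build an illustrative command record for rotating the head 3 about one axis."""
    direction, gesture = HEAD_MOTIONS[axis]
    return {
        "part": "head",
        "axis": axis.value,
        "direction": direction,
        "gesture": gesture,
        "angle_deg": angle_deg,
    }
```

For example, `head_command(Axis.X, 15.0)` describes a tilt rotation of the kind used for nodding.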

胴部５は、モータ９Ｍの動力によって胴部５を駆動させるアクチュエータ５１を有する。図５は、胴部５の駆動例を示す図である。例えば、図５の上段に示すように、Ｘ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して胴部５のアクチュエータ５１を駆動することにより、胴部５をチルト方向に回転させることができる。このように左右のＸ軸を回転軸として胴部５を前方向および後方向に回転駆動させることにより、お辞儀動作や仰け反り動作などを行うことができる。また、図５の下段に示すように、Ｙ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して胴部５のアクチュエータ５１を駆動することにより、胴部５をパン方向に回転させることができる。このように上下のＹ軸を回転軸として胴部５を左方向および右方向に回転駆動させることにより、胴ひねり動作等を行うことができる。さらに、また、図５の下段に示すように、Ｚ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して胴部５のアクチュエータ５１を駆動させることにより、胴部５をロール方向に回転させることができる。このように胴部５を前後のＺ軸回りに回転駆動させることにより、胴部５を左方に倒れる動作等を行うことができる。 The body 5 has an actuator 51 that drives the body 5 by the power of the motor 9M. FIG. 5 is a diagram showing an example of how the body 5 is driven. For example, as shown in the upper part of FIG. 5, the body 5 can be rotated in the tilt direction by outputting to the motor 9M a control signal that generates torque around the X axis and driving the actuator 51 of the body 5. By rotating the body 5 forward and backward about the left-right X axis in this way, a bowing motion, a leaning-back motion, or the like can be performed. As shown in the lower part of FIG. 5, the body 5 can be rotated in the pan direction by outputting to the motor 9M a control signal that generates torque around the Y axis and driving the actuator 51 of the body 5. By rotating the body 5 leftward and rightward about the vertical Y axis in this way, a body-twisting motion or the like can be performed. Furthermore, as shown in the lower part of FIG. 5, the body 5 can be rotated in the roll direction by outputting to the motor 9M a control signal that generates torque around the Z axis and driving the actuator 51 of the body 5. By rotating the body 5 about the front-back Z axis in this way, a motion such as leaning the body 5 to the left can be performed.

右腕部７Ｒおよび左腕部７Ｌは、モータ９Ｍの動力によって右腕部７Ｒまたは左腕部７Ｌを駆動させるアクチュエータ７１Ｒ及びアクチュエータ７１Ｌと、光を点灯または点滅する発光部７２Ｒおよび発光部７２Ｌとを有する。このうち、発光部７２Ｒおよび発光部７２Ｌは、右腕部７Ｒおよび左腕部７Ｌの先端部に設けることにより、方向指示器として機能させることができる。例えば、発光部７２Ｒを点灯することにより、右腕部７Ｒが指す方向に視線を誘導することができる。また、発光部７２Ｌを点灯することにより、左腕部７Ｌが指す方向に視線を誘導することができる。 The right arm 7R and left arm 7L have actuators 71R and 71L that drive the right arm 7R and left arm 7L by the power of the motor 9M, and light emitters 72R and 72L that turn on or flash light. Of these, the light-emitting portion 72R and the light-emitting portion 72L can function as direction indicators by being provided at the distal end portions of the right arm portion 7R and the left arm portion 7L. For example, by lighting the light-emitting portion 72R, the line of sight can be guided in the direction pointed by the right arm portion 7R. Also, by turning on the light-emitting portion 72L, the line of sight can be guided in the direction pointed by the left arm portion 7L.

図６は、腕部７の駆動例を示す図である。例えば、図６に示すように、Ｘ軸回りのトルクを発生させる制御信号をモータ９Ｍに出力して右腕部７Ｒのアクチュエータ７１Ｒを駆動することにより、右腕部７Ｒを上下方向に回転させることができる。このように左右のＸ軸を回転軸として右腕部７Ｒを下方向および上方向に回転駆動させることにより、右腕の振り上げ動作や振り下げ動作などを行うことができる。ここで、図６には、右腕部７Ｒの駆動例を抜粋して示したが、左腕部７Ｌについてもアクチュエータ７１Ｌを駆動することにより、左腕部７Ｌを上下方向に回転させることができ、左腕の振り上げ動作や振り下げ動作などを行うことができる。これら右腕および左腕を連動させることにより、例えば、気を付けの姿勢や前にならえの姿勢をとらせることもできる。 FIG. 6 is a diagram showing an example of how the arm 7 is driven. For example, as shown in FIG. 6, the right arm 7R can be rotated vertically by outputting to the motor 9M a control signal that generates torque around the X axis and driving the actuator 71R of the right arm 7R. By rotating the right arm 7R downward and upward about the left-right X axis in this way, motions such as raising and lowering the right arm can be performed. Although FIG. 6 shows only the right arm 7R as an example, the left arm 7L can likewise be rotated vertically by driving the actuator 71L, so that the left arm can also be raised and lowered. By moving the right arm and the left arm together, the robot can also be made to take, for example, a standing-at-attention posture or a posture with both arms held straight out in front.

音声入力部９Ａは、音信号を入力する機能部である。 The voice input unit 9A is a functional unit that inputs sound signals.

一実施形態として、音声入力部９Ａは、音を電気信号に変換する１または複数のマイクロフォン等により実装することができる。例えば、音声入力部９Ａは、マイクロフォンを介して音を採取することにより得られたアナログ信号をデジタル信号へ変換した上で音声データとして音声処理部１１へ入力する。 As one embodiment, the audio input unit 9A can be implemented by one or more microphones or the like that convert sound into electrical signals. For example, the voice input unit 9A converts an analog signal obtained by picking up sound through a microphone into a digital signal and inputs the digital signal to the voice processing unit 11 as voice data.

音声出力部９Ｂは、各種の音声を出力する機能部である。 The audio output unit 9B is a functional unit that outputs various sounds.

一実施形態として、音声出力部９Ｂは、１つまたは複数のスピーカを含むスピーカユニットとして実装することができる。例えば、音声出力部９Ｂは、制御部１０からの指示にしたがって、プレゼンテーションやナビゲーションに関するメッセージを読み上げる合成音声等を出力することができる。 As one embodiment, the audio output section 9B can be implemented as a speaker unit including one or more speakers. For example, the voice output unit 9B can output synthetic voices for reading out messages regarding presentations and navigation according to instructions from the control unit 10 .

制御部１０は、コミュニケーションロボット１の全体制御を行う処理部である。 The control unit 10 is a processing unit that performs overall control of the communication robot 1 .

一実施形態として、制御部１０は、ＣＰＵ（Central Processing Unit）やＭＰＵ（Micro Processing Unit）などのハードウェアプロセッサにより実装することができる。ここでは、プロセッサの一例として、ＣＰＵやＭＰＵを例示したが、汎用型および特化型を問わず、任意のプロセッサ、例えばＤＳＰ（Digital Signal Processor）やＧＰＵ（Graphics Processing Unit）などにより実装することができる。この他、制御部１０は、ＡＳＩＣ（Application Specific Integrated Circuit）やＦＰＧＡ（Field Programmable Gate Array）などのハードワイヤードロジックによって実現されることとしてもかまわない。 As one embodiment, the control unit 10 can be implemented by a hardware processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit). Although a CPU and an MPU are given here as examples, the control unit 10 can be implemented by any processor, whether general-purpose or specialized, such as a DSP (Digital Signal Processor) or a GPU (Graphics Processing Unit). Alternatively, the control unit 10 may be realized by hardwired logic such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

制御部１０は、図示しない主記憶装置として実装されるＤＲＡＭ（Dynamic Random Access Memory）などのＲＡＭのワークエリア上に、コミュニケーションロボット１を制御する制御プログラムを展開することにより、下記の処理部を仮想的に実現する。 The control unit 10 virtually realizes the following processing units by loading a control program for controlling the communication robot 1 into a work area of a RAM, such as a DRAM (Dynamic Random Access Memory), implemented as a main storage device (not shown).

制御部１０は、図３に示すように、音声処理部１１と、伝送処理部１２と、予測部１３と、決定部１４と、動作制御部１５とを有する。 The control unit 10 includes an audio processing unit 11, a transmission processing unit 12, a prediction unit 13, a determination unit 14, and an operation control unit 15, as shown in FIG.

音声処理部１１は、音声データを取得する処理部である。 The audio processing unit 11 is a processing unit that acquires audio data.

一実施形態として、音声処理部１１は、音声入力部９Ａから音声データを取得する。ここで音声入力部９Ａから取得される音声データは、ストリーム形式で入力されることとしてもよいし、ファイル形式で入力されることとしてもかまわない。このように取得される音声データには、各種の音声処理を実行することができる。 As one embodiment, the audio processing unit 11 acquires audio data from the audio input unit 9A. The audio data acquired from the audio input unit 9A may be input in a stream format or may be input in a file format. Various audio processing can be performed on the audio data acquired in this way.

このような音声処理の一例として、音声処理部１１は、音声データから発話区間を検出することができる。例えば、音声処理部１１は、音声データの波形の振幅および零交差に基づいて発話開始および発話終了を検出することとしてもよいし、音声データのフレームごとにＧＭＭ（Gaussian mixture model）にしたがって音声の尤度および非音声の尤度を算出してこれらの尤度の比から発話開始および発話終了を検出することもできる。 As an example of such audio processing, the audio processing unit 11 can detect an utterance period from audio data. For example, the speech processing unit 11 may detect the start of speech and the end of speech based on the amplitude and zero crossing of the waveform of the speech data. It is also possible to calculate the likelihood and the non-speech likelihood and detect the speech start and speech end from the ratio of these likelihoods.

この他、音声処理部１１は、音声データから検出された発話区間にワードスポッティングを始めとする音声認識を実行することもできる。例えば、音声処理部１１は、発話区間の音声データを所定の言語モデルや所定の音素モデルと照合することにより、当該音声データをテキストへ変換する。 In addition, the speech processing unit 11 can also perform speech recognition, such as word spotting, in speech segments detected from speech data. For example, the speech processing unit 11 converts the speech data into text by collating the speech data of the utterance period with a predetermined language model or a predetermined phoneme model.

なお、ここでは、コミュニケーションロボット１が発話区間の検出や発話区間の音声認識を実行する例を挙げたが、必ずしもコミュニケーションロボット１が発話区間の検出や発話区間の音声認識を実行せずともかまわない。例えば、コミュニケーションロボット１に接続されたサーバ装置５０が発話区間の検出や発話区間の音声認識を実行することとしてもかまわない。 Although an example has been given here in which the communication robot 1 detects the utterance period and performs speech recognition on it, the communication robot 1 does not necessarily have to do so. For example, the server device 50 connected to the communication robot 1 may detect the utterance period and perform speech recognition on it.

伝送処理部１２は、外部装置にデータを伝送する処理部である。 The transmission processing unit 12 is a processing unit that transmits data to an external device.

１つの側面として、伝送処理部１２は、音声処理部１１により発話区間に対する音声認識が実行された場合、音声認識結果として得られたテキストの翻訳依頼をサーバ装置５０に伝送する。この翻訳依頼が伝送されたサーバ装置５０では、コミュニケーションロボット１から伝送されたテキストに機械翻訳を実行することにより、対象者Ｕ１の発話に対応するテキストを母国語から外国語へ翻訳する。このように母国語から外国語へ翻訳されたテキストがサーバ装置５０からコミュニケーションロボット１へ応答される。 As one aspect, the transmission processing unit 12 transmits a translation request for the text obtained as a result of the speech recognition to the server device 50 when the speech processing unit 11 performs speech recognition for the speech period. The server device 50 to which this translation request has been transmitted performs machine translation on the text transmitted from the communication robot 1, thereby translating the text corresponding to the utterance of the target person U1 from the native language to the foreign language. The text translated from the native language into the foreign language is sent from the server device 50 to the communication robot 1 as a response.

なお、ここでは、あくまで一例として、テキストの翻訳がサーバ装置５０により実行される例を挙げたが、テキストの翻訳もコミュニケーションロボット１により実行されることとしてもかまわない。 Here, as an example, the text translation is performed by the server device 50 , but the text translation may also be performed by the communication robot 1 .

予測部１３は、コミュニケーションロボット１に入力される情報量に基づいて応答遅延時間を予測する処理部である。 The prediction unit 13 is a processing unit that predicts the response delay time based on the amount of information input to the communication robot 1.

一実施形態として、予測部１３は、音声処理部１１により検出された発話区間の時間長から応答遅延時間を予測する。以下、発話区間の時間長のことを「発話時間」と記載する場合がある。例えば、予測部１３は、発話時間と応答遅延時間Ｔの対応関係が定義されたルックアップテーブル１３Ａを参照して、音声処理部１１により検出された発話時間に対応する値を応答遅延時間Ｔとして予測することができる。図７は、ルックアップテーブル１３Ａの一例を示す図である。図７に示すルックアップテーブル１３Ａによれば、発話時間が０秒以上０．５秒未満の範囲である場合、応答遅延時間が０．６秒と予測される。また、発話時間が０．５秒以上１．０秒未満の範囲である場合、応答遅延時間が１．０秒と予測される。また、発話時間が１．０秒以上１．５秒未満の範囲である場合、応答遅延時間が１．６秒と予測される。また、発話時間が１．５秒以上２．０秒未満の範囲である場合、応答遅延時間が２．５秒と予測される。 As one embodiment, the prediction unit 13 predicts the response delay time from the time length of the utterance period detected by the audio processing unit 11. Hereinafter, the time length of the utterance period may be referred to as the "speech time". For example, the prediction unit 13 can refer to the lookup table 13A, which defines the correspondence between the speech time and the response delay time T, and predict the value corresponding to the speech time detected by the audio processing unit 11 as the response delay time T. FIG. 7 is a diagram showing an example of the lookup table 13A. According to the lookup table 13A shown in FIG. 7, the response delay time is predicted to be 0.6 seconds when the speech time is at least 0 seconds and less than 0.5 seconds, 1.0 seconds when the speech time is at least 0.5 seconds and less than 1.0 seconds, 1.6 seconds when the speech time is at least 1.0 seconds and less than 1.5 seconds, and 2.5 seconds when the speech time is at least 1.5 seconds and less than 2.0 seconds.
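
The lookup described for FIG. 7 can be sketched as a range search. The table values below are the four ranges quoted in the text; the function name, the use of `bisect`, and the behavior for speech times of 2.0 seconds or more (clamping to the last entry) are assumptions, since the text does not say what happens beyond the listed ranges.

```python
import bisect

# Exclusive upper bounds of the speech-time ranges (seconds) and the
# response delay time T predicted for each range, as quoted for FIG. 7.
UPPER_BOUNDS = [0.5, 1.0, 1.5, 2.0]
PREDICTED_DELAYS = [0.6, 1.0, 1.6, 2.5]


def predict_delay(speech_time: float) -> float:
    """Predict the response delay time T (seconds) from the speech time."""
    if speech_time < 0:
        raise ValueError("speech_time must be non-negative")
    # bisect_right places a value equal to a bound into the next range,
    # which matches the "at least ... and less than ..." ranges.
    index = bisect.bisect_right(UPPER_BOUNDS, speech_time)
    # Assumption: speech times beyond the table reuse the last entry.
    return PREDICTED_DELAYS[min(index, len(PREDICTED_DELAYS) - 1)]
```

For instance, a speech time of 0.5 seconds falls in the second range and yields a predicted delay of 1.0 seconds.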

このように、ルックアップテーブル１３Ａには、発話時間が短くなるにしたがって短い応答遅延時間Ｔが予測される一方で、発話時間が長くなるにしたがって長い応答遅延時間Ｔが予測される。このような応答遅延時間Ｔを定義するのは、発話時間が長くなるにつれて翻訳処理、例えば形態素解析や機械翻訳などの所要時間が長くなることが一因にある。さらに、発話時間が長くなるにつれてテキストのサイズが大きくなることから、ネットワークの伝送遅延も大きくなることも一因にある。 In this way, the lookup table 13A predicts a shorter response delay time T as the speech time becomes shorter, while predicting a longer response delay time T as the speech time becomes longer. One of the reasons for defining such a response delay time T is that the longer the speech time is, the longer the time required for translation processing, such as morphological analysis and machine translation. In addition, as the speech duration increases, the size of the text increases, which also increases network transmission delay.

なお、ここでは、あくまで一例としてルックアップテーブル１３Ａを用いる場合を例示したが、発話時間が長くなるにしたがって長い応答遅延時間Ｔを導出する関数を用いて、発話時間に対応する応答遅延時間Ｔを算出することとしてもかまわない。例えば、応答遅延時間Ｔを導出する関数の一例として、発話時間を「ｘ」としたとき、Ｔ＝１．３＊ｘを採用することができる。また、発話時間および応答遅延時間Ｔの両者の関係は、必ずしも線形でなくともよく、非線形であってかまわない。例えば、応答遅延時間Ｔを導出する非線形の関数の一例として、発話時間を「ｘ」としたとき、シグモイド関数σ（ｘ）を採用することができる。この場合、シグモイド関数のゲインには、一例として、人が一呼吸で発話する発話時間の推定上限値などを設定することができる。 Although the lookup table 13A is used here merely as an example, the response delay time T corresponding to the utterance time may instead be calculated using a function that yields a longer response delay time T as the utterance time becomes longer. For example, with the utterance time denoted by x, T = 1.3*x can be adopted as such a function. The relationship between the utterance time and the response delay time T need not be linear and may be nonlinear. For example, with the utterance time denoted by x, a sigmoid function σ(x) can be adopted as a nonlinear function for deriving the response delay time T. In this case, the gain of the sigmoid function can be set, for example, to an estimated upper limit of the utterance time that a person can speak in one breath.
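The two function-based alternatives can be sketched as follows. The linear form T = 1.3*x is taken from the text. The text does not give a concrete parameterization of the sigmoid, so `max_delay`, `midpoint`, and `steepness` below are hypothetical parameters chosen only to show the saturating shape; they are not values from the source.

```python
import math


def linear_delay(utterance_time: float, slope: float = 1.3) -> float:
    """Linear example from the text: T = 1.3 * x."""
    return slope * utterance_time


def sigmoid_delay(
    utterance_time: float,
    max_delay: float = 5.0,  # hypothetical ceiling for T, in seconds
    midpoint: float = 2.0,   # hypothetical utterance time where T = max_delay / 2
    steepness: float = 1.5,  # hypothetical slope factor
) -> float:
    """Nonlinear example: a scaled logistic curve that saturates at max_delay."""
    return max_delay / (1.0 + math.exp(-steepness * (utterance_time - midpoint)))
```

Both functions increase monotonically with the utterance time, which is the only property the text requires of such a function.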

決定部１４は、応答遅延時間に応じてコミュニケーションロボット１のフィラー動作を決定する処理部である。以下、コミュニケーションロボット１に実行させる動作の中でも、コミュニケーションロボット１に対する情報入力から応答出力までの応答遅延時間をつなぐ動作のことを「フィラー動作」と記載する場合がある。 The determination unit 14 is a processing unit that determines the filler motion of the communication robot 1 according to the response delay time. Hereinafter, among the actions to be executed by the communication robot 1, the action that connects the response delay time from the information input to the communication robot 1 to the response output may be referred to as a "filler action".

一実施形態として、決定部１４は、予測部１３により予測された応答遅延時間からコミュニケーションロボット１のフィラー動作を決定する。ここで言う「フィラー動作」には、コミュニケーションロボット１の身体の駆動のみならず、その他の表現、例えばメッセージ等の音声出力やＬＥＤ点滅等の表示などもその範疇に含まれる。例えば、決定部１４は、応答遅延時間と動作の対応関係が定義されたルックアップテーブル１４Ａを参照して、予測部１３により予測された応答遅延時間Ｔに対応する動作をコミュニケーションロボット１のフィラー動作として決定することができる。 As one embodiment, the determination unit 14 determines the filler motion of the communication robot 1 from the response delay time predicted by the prediction unit 13. The "filler motion" referred to here covers not only driving the body of the communication robot 1 but also other forms of expression, such as voice output of a message or a display such as LED blinking. For example, the determination unit 14 can refer to the lookup table 14A, which defines the correspondence between response delay times and motions, and determine the motion corresponding to the response delay time T predicted by the prediction unit 13 as the filler motion of the communication robot 1.

図８は、ルックアップテーブル１４Ａの一例を示す図である。図８に示すルックアップテーブル１４Ａによれば、応答遅延時間Ｔが０秒以上１秒未満の範囲である場合、ＬＥＤ点滅で表現を行う動作が定義されている。この動作は、一例として、頭部３に発光部３２として組み込まれたリング状のＬＥＤを点滅させることにより実現できる。また、応答遅延時間Ｔが１秒以上２秒未満の範囲である場合、コミュニケーションロボット１に目線を上に向ける動作を実行させることが定義されている。この動作は、一例として、コミュニケーションロボット１の頭部３の中で顔の正面に対応する部分を水平方向よりも上側に向く姿勢へ駆動させることにより実現できる。また、応答遅延時間Ｔが２秒以上５秒未満の範囲である場合、コミュニケーションロボット１に首をかしげる動作を実行させることが定義されている。この動作は、一例として、コミュニケーションロボット１の頭部３をロール方向へ回転して駆動させることにより実現できる。また、応答遅延時間Ｔが５秒以上の範囲である場合、コミュニケーションロボット１に両手を上げる動作を実行すると共に、メッセージ「少々お待ち下さい」の音声出力で表現を行うことが定義されている。この動作は、一例として、コミュニケーションロボット１の右腕部７Ｒおよび左腕部７Ｌを上方向に回転して駆動させることにより実現できる。 FIG. 8 is a diagram showing an example of the lookup table 14A. According to the lookup table 14A shown in FIG. 8, when the response delay time T is 0 seconds or more and less than 1 second, a motion of expressing by LED blinking is defined. This motion can be realized, for example, by blinking the ring-shaped LED incorporated in the head 3 as the light emitting unit 32. When the response delay time T is 1 second or more and less than 2 seconds, it is defined that the communication robot 1 is caused to direct its gaze upward. This motion can be realized, for example, by driving the portion of the head 3 of the communication robot 1 that corresponds to the front of the face to a posture facing above the horizontal direction. When the response delay time T is 2 seconds or more and less than 5 seconds, it is defined that the communication robot 1 is caused to tilt its head. This motion can be realized, for example, by rotationally driving the head 3 of the communication robot 1 in the roll direction. When the response delay time T is 5 seconds or more, it is defined that the communication robot 1 is caused to raise both hands and to express by voice output of the message "Please wait a moment". This motion can be realized, for example, by rotationally driving the right arm 7R and the left arm 7L of the communication robot 1 upward.
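The mapping in table 14A can be written as a small Python sketch. The delay ranges and the four motions are those described for FIG. 8; the string identifiers and the function name `decide_filler_motion` are hypothetical labels introduced here for illustration.

```python
from bisect import bisect_right

# Inclusive lower bounds of the response-delay ranges (seconds), per FIG. 8.
DELAY_LOWER_BOUNDS = [0.0, 1.0, 2.0, 5.0]

# Hypothetical identifiers for the four filler motions described in the text.
FILLER_MOTIONS = [
    "blink_led",                            # 0 s <= T < 1 s
    "look_up",                              # 1 s <= T < 2 s
    "tilt_head",                            # 2 s <= T < 5 s
    "raise_both_hands_with_voice_message",  # T >= 5 s
]


def decide_filler_motion(response_delay: float) -> str:
    """Return the filler motion defined for the predicted response delay T."""
    if response_delay < 0:
        raise ValueError("response_delay must be non-negative")
    return FILLER_MOTIONS[bisect_right(DELAY_LOWER_BOUNDS, response_delay) - 1]
```

The motions are ordered from the smallest to the largest change in silhouette, so a longer predicted delay selects a more visible motion.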

このように、ルックアップテーブル１４Ａには、応答遅延時間Ｔが短いほどコミュニケーションロボット１の外形形状、いわゆるシルエットの変化が小さい動作がフィラー動作として定義されている。これは、フィラー動作が実行されることで変化した姿勢のままで情報入力、例えば対象者Ｕ１の発話等に対する応答を出力する事態を避け、フィラー動作前の姿勢に速やかに戻して応答を出力するためである。一方で、ルックアップテーブル１４Ａには、応答遅延時間Ｔが長いほどコミュニケーションロボット１のシルエットの変化が大きい動作がフィラー動作として定義されている。これは、コミュニケーションロボット１のフィラー動作が小さい場合、次のような不安を対象者Ｕ１に与えやすい側面があるからである。例えば、応答遅延時間が長引くにつれて情報入力がコミュニケーションロボット１により受け付けられていない、あるいは情報入力に対応する情報処理が実行されていない等の不安を対象者Ｕ１に与えやすい側面があるからである。 Thus, in the lookup table 14A, the shorter the response delay time T, the smaller the change in the outer shape, or so-called silhouette, of the communication robot 1 in the motion defined as the filler motion. This is to avoid outputting a response to the information input, for example the utterance of the target person U1, while still in the posture changed by the filler motion, and instead to return quickly to the posture before the filler motion and then output the response. On the other hand, in the lookup table 14A, the longer the response delay time T, the larger the change in the silhouette of the communication robot 1 in the motion defined as the filler motion. This is because a small filler motion of the communication robot 1 tends to cause the target person U1 anxiety of the following kind: as the response delay time drags on, the target person U1 may worry that the information input has not been accepted by the communication robot 1, or that the information processing corresponding to the information input is not being executed.

なお、ここでは、あくまで一例としてルックアップテーブル１４Ａを用いる場合を例示したが、応答遅延時間Ｔが長くなるにしたがってシルエットの変化が大きい動作を導出する関数を用いて、応答遅延時間Ｔに対応する動作を出力することとしてもかまわない。例えば、右腕部７Ｒ及び左腕部７Ｌの少なくとも１つの振り上げ動作や振り下げ動作の回転角度の大きさを「θ」としたとき、θ＝（π＊Ｔ）／４を採用することができる。また、応答遅延時間Ｔの長さおよび動作のシルエットの変化の大きさの両者の関係は、必ずしも線形でなくともよく、非線形であってかまわない。例えば、右腕部７Ｒ及び左腕部７Ｌの少なくとも１つの振り上げ動作や振り下げ動作の回転角度の大きさを「θ」としたとき、シグモイド関数σ（θ）を採用することができる。この場合、シグモイド関数のゲインには、一例として、腕部７が上限まで振り上げられた方位と腕部７が下限まで振り下げられた方位との差、すなわち腕部７の可動域などを設定することができる。 Although the lookup table 14A is used here merely as an example, the motion corresponding to the response delay time T may instead be output using a function that yields a motion with a larger change in silhouette as the response delay time T becomes longer. For example, with θ denoting the magnitude of the rotation angle of a swing-up or swing-down motion of at least one of the right arm 7R and the left arm 7L, θ = (π*T)/4 can be adopted. The relationship between the length of the response delay time T and the magnitude of the change in the silhouette of the motion need not be linear and may be nonlinear. For example, with θ denoting the magnitude of the rotation angle of a swing-up or swing-down motion of at least one of the right arm 7R and the left arm 7L, a sigmoid function σ(θ) can be adopted. In this case, the gain of the sigmoid function can be set, for example, to the difference between the orientation in which the arm 7 is swung up to its upper limit and the orientation in which it is swung down to its lower limit, that is, the range of motion of the arm 7.
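As a worked illustration of the linear example θ = (π*T)/4, the following Python sketch computes the arm rotation angle for a given response delay. Clamping the result to a maximum angle is an assumption added here so that long delays do not exceed the arm's range of motion; the default of π radians for `max_angle` is a hypothetical value, not one given in the text.

```python
import math


def arm_rotation_angle(response_delay: float, max_angle: float = math.pi) -> float:
    """Return the arm rotation angle theta (radians) for a response delay T (seconds).

    Uses theta = (pi * T) / 4 from the text, clamped to max_angle
    (the clamp and its default value are assumptions for illustration).
    """
    if response_delay < 0:
        raise ValueError("response_delay must be non-negative")
    return min(math.pi * response_delay / 4.0, max_angle)
```

With these assumptions, a delay of 2 seconds gives a quarter turn (π/2), and any delay of 4 seconds or more saturates at π.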

動作制御部１５は、コミュニケーションロボット１の動作を制御する処理部である。 The motion control section 15 is a processing section that controls the motion of the communication robot 1 .

一実施形態として、動作制御部１５は、フィラー動作が実行される前の元の姿勢がフィラー動作によって変化し、フィラー動作の完了後に元の姿勢に復帰するまでの時間と、応答遅延時間Ｔとを一致させることとする。この場合、動作制御部１５は、応答遅延時間Ｔが経過した時点で各部位の姿勢が元の姿勢に復帰できるように、コミュニケーションロボット１の各部位の駆動量および駆動速度などの駆動パラメータを決定し、駆動パラメータにしたがってフィラー動作および元の姿勢への復帰動作を実行する。 As one embodiment, the motion control unit 15 matches the response delay time T with the time from when the original posture held before the filler motion is changed by the filler motion until the original posture is restored after completion of the filler motion. In this case, the motion control unit 15 determines drive parameters, such as the drive amount and drive speed of each part of the communication robot 1, so that the posture of each part can return to the original posture at the time the response delay time T has elapsed, and executes the filler motion and the return motion to the original posture according to those drive parameters.
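One simple way to choose drive parameters so that the round trip finishes exactly at T is sketched below. It assumes a single joint moving at constant angular speed, with the outward motion and the return motion each taking half of T and no pause in between; the text does not specify the motion profile, so these are illustrative assumptions, and `DrivePlan` and `plan_filler_drive` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrivePlan:
    angle: float     # drive amount of the joint, in radians
    speed: float     # constant angular speed, in radians per second
    leg_time: float  # duration of the outward leg and of the return leg, in seconds


def plan_filler_drive(angle: float, response_delay: float) -> DrivePlan:
    """Split T evenly between the filler motion and the return motion."""
    if angle <= 0 or response_delay <= 0:
        raise ValueError("angle and response_delay must be positive")
    leg_time = response_delay / 2.0
    return DrivePlan(angle=angle, speed=angle / leg_time, leg_time=leg_time)
```

Under these assumptions the total travel time is 2 * angle / speed, which equals the response delay time T by construction.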

例えば、フィラー動作が「ＬＥＤ点滅」である場合、動作制御部１５は、コミュニケーションロボット１の頭部３に発光部３２として組み込まれたリング状のＬＥＤを点滅させる。また、フィラー動作が「目線を上に向ける」である場合、動作制御部１５は、コミュニケーションロボット１の左右方向のＸ軸回りに頭部３を上方向へ回転駆動させる。また、フィラー動作が「首をかしげる」である場合、動作制御部１５は、コミュニケーションロボット１の前後方向のＺ軸回りに頭部３をロール方向、左方向または右方向へ回転駆動させる。また、フィラー動作が「両手を上げる＋音声メッセージ」である場合、コミュニケーションロボット１の左右方向のＸ軸回りに右腕部７Ｒおよび左腕部７Ｌを上方向へ回転駆動させると共に、音声出力部９Ｂからメッセージ「少々お待ち下さい」を音声出力させる。このようなフィラー動作の実行後、動作制御部１５は、駆動系のフィラー動作が行われていた場合、フィラー動作の実行前の元の姿勢に復帰する復帰動作を実行する。 For example, when the filler action is "LED flashing", the action control unit 15 causes the ring-shaped LED incorporated as the light emitting unit 32 in the head 3 of the communication robot 1 to flash. Further, when the filler motion is to "look upward", the motion control unit 15 rotates the head 3 upward about the X-axis of the communication robot 1 in the left-right direction. Further, when the filler motion is "tilt head", the motion control unit 15 rotationally drives the head 3 in the roll direction, the left direction, or the right direction around the Z-axis of the communication robot 1 in the front-rear direction. When the filler motion is "raise both hands + voice message", the right arm 7R and the left arm 7L are rotationally driven upward about the X-axis in the horizontal direction of the communication robot 1, and the message is output from the voice output unit 9B. "Please wait a moment" is output by voice. After performing such a filler motion, the motion control unit 15 performs a return motion for returning to the original posture before the filler motion, if the filler motion of the drive system has been performed.

［処理の流れ］ [Process flow]
図９は、実施例１に係るフィラー動作の制御処理の手順を示すフローチャートである。この処理は、一例として、コミュニケーションロボット１に対する情報入力を受け付けた場合、例えば音声処理部１１により発話区間が検出された場合に起動する。 FIG. 9 is a flowchart illustrating the procedure of the filler motion control process according to the first embodiment. As an example, this process is started when information input to the communication robot 1 is received, for example, when the voice processing unit 11 detects an utterance period.

図９に示すように、音声入力部９Ａから取得された音声データから発話区間が検出されると（ステップＳ１０１Ｙｅｓ）、音声処理部１１は、当該発話区間にワードスポッティングを始めとする音声認識を実行する（ステップＳ１０２）。続いて、伝送処理部１２は、ステップＳ１０２の音声認識結果として得られたテキストの翻訳依頼をサーバ装置５０に伝送する（ステップＳ１０３）。 As shown in FIG. 9, when an utterance segment is detected from the voice data acquired from the voice input unit 9A (step S101 Yes), the voice processing unit 11 executes voice recognition including word spotting for the utterance segment. (step S102). Subsequently, the transmission processing unit 12 transmits a translation request for the text obtained as the speech recognition result of step S102 to the server device 50 (step S103).

このようにテキストの翻訳依頼が伝送されたサーバ装置５０では、コミュニケーションロボット１から伝送されたテキストに機械翻訳が実行される。そして、対象者Ｕ１の発話に対応するテキストが母国語から外国語へ翻訳された段階でテキストの翻訳結果がコミュニケーションロボット１へ返信される。 In the server device 50 to which the text translation request has thus been transmitted, the text transmitted from the communication robot 1 is machine-translated. Then, when the text corresponding to the utterance of the target person U1 is translated from the native language to the foreign language, the translation result of the text is sent back to the communication robot 1.

これらステップＳ１０２又はステップＳ１０３と並行するか、あるいはステップＳ１０２及びステップＳ１０３と前後して、予測部１３は、ステップＳ１０１で検出された発話区間の時間長から応答遅延時間を予測する（ステップＳ１０４）。そして、決定部１４は、ステップＳ１０４で予測された応答遅延時間からコミュニケーションロボット１のフィラー動作を決定する（ステップＳ１０５）。 In parallel with step S102 or step S103, or before or after step S102 and step S103, the prediction unit 13 predicts the response delay time from the time length of the speech period detected in step S101 (step S104). Then, the determination unit 14 determines the filler motion of the communication robot 1 from the response delay time predicted in step S104 (step S105).

なお、ステップＳ１０４で予測される応答遅延時間は、ステップＳ１０５におけるフィラー動作の決定に用いられる。このため、ステップＳ１０４の処理は、ステップＳ１０５の処理が実行されるまでの任意のタイミングで実行することができる。例えば、ステップＳ１０４の処理がステップＳ１０３の処理の後に実行されたとしても、ステップＳ１０２又はステップＳ１０３と並行して実行されたとしても、ステップＳ１０５以降の処理内容に変更はない。 Note that the response delay time predicted in step S104 is used for determining the filler operation in step S105. Therefore, the process of step S104 can be executed at any timing until the process of step S105 is executed. For example, even if the process of step S104 is performed after the process of step S103, even if it is performed in parallel with step S102 or step S103, there is no change in the process contents after step S105.

その上で、動作制御部１５は、ステップＳ１０４で予測された応答遅延時間Ｔの間、ステップＳ１０５で決定されたフィラー動作および元の姿勢への復帰動作を実行する（ステップＳ１０６）。 Then, the motion control unit 15 executes the filler motion determined in step S105 and the return motion to the original posture during the response delay time T predicted in step S104 (step S106).

そして、元の姿勢へ復帰した段階でテキストの翻訳結果がサーバ装置５０から受信されない場合（ステップＳ１０７Ｎｏ）、動作制御部１５は、追加のフィラー動作、例えばシルエットの変化が小さいフィラー動作を優先して実行し（ステップＳ１０８）、ステップＳ１０７へ移行する。 If the translation result of the text has not been received from the server device 50 when the original posture is restored (step S107 No), the motion control unit 15 executes an additional filler motion, giving priority to, for example, a filler motion with a small change in silhouette (step S108), and returns to step S107.

一方、元の姿勢へ復帰した段階でテキストの翻訳結果がサーバ装置５０から受信された場合（ステップＳ１０７Ｙｅｓ）、動作制御部１５は、サーバ装置５０によるテキストの翻訳結果を合成音声等で音声出力し（ステップＳ１０９）、処理を終了する。 On the other hand, if the translation result of the text has been received from the server device 50 when the original posture is restored (step S107 Yes), the motion control unit 15 outputs the translation result of the text from the server device 50 by voice, for example as synthesized speech (step S109), and ends the process.
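Steps S104 to S109 of FIG. 9 can be sketched as the following Python control loop. Speech recognition (S102) and the translation request (S103) are omitted, and every callable passed in is a hypothetical stand-in for the corresponding processing unit. The name `small_filler` and the 0.5-second duration of the additional filler motion are assumptions for illustration; the text only says that a motion with a small change in silhouette is prioritized.

```python
from typing import Callable


def control_filler(
    utterance_time: float,
    predict_delay: Callable[[float], float],   # stands in for prediction unit 13
    decide_motion: Callable[[float], str],     # stands in for determination unit 14
    perform: Callable[[str, float], None],     # runs a motion and the return motion
    translation_received: Callable[[], bool],  # True once the server has replied
    speak_translation: Callable[[], None],     # voice output of the translation
    small_filler: str = "blink_led",           # hypothetical small-silhouette motion
    small_filler_duration: float = 0.5,        # hypothetical duration in seconds
) -> None:
    delay = predict_delay(utterance_time)           # step S104
    motion = decide_motion(delay)                   # step S105
    perform(motion, delay)                          # step S106
    while not translation_received():               # step S107
        perform(small_filler, small_filler_duration)  # step S108
    speak_translation()                             # step S109
```

A caller would supply real implementations for the callables; with stubs, the loop performs the main filler motion once, repeats the small filler motion until the translation arrives, and then speaks.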

なお、図９に示すフローチャートでは、発話区間が検出された後にフィラー動作を実行する場合を例示したが、発話終了が検出される前にフィラー動作を開始することもできる。例えば、発話開始が検出されてから所定の閾値以上、例えば３秒以上経過しても発話終了が検出されない場合、シルエットの変化が大きいフィラー動作を優先して発話終了が検出される前に先行してフィラー動作を開始することもできる。 Although the flowchart shown in FIG. 9 illustrates a case in which the filler motion is executed after the utterance period is detected, the filler motion can also be started before the end of the utterance is detected. For example, if the end of the utterance is not detected even after a predetermined threshold, for example 3 seconds, or more has elapsed since the start of the utterance was detected, a filler motion can be started ahead of the detection of the end of the utterance, with priority given to a filler motion with a large change in silhouette.

この他、図９に示すフローチャートでは、元の姿勢へ復帰されてからサーバ装置５０によるテキストの翻訳結果を音声出力する場合を例示したが、これに限定されない。例えば、元の姿勢への復帰前にサーバ装置５０からテキストの翻訳結果が受信された場合、復帰動作を実行しながらサーバ装置５０によるテキストの翻訳結果を音声出力することとしてもかまわない。 In addition, in the flowchart shown in FIG. 9, the case where the translation result of the text by the server device 50 is output by voice after returning to the original posture is exemplified, but the present invention is not limited to this. For example, when the text translation result is received from the server device 50 before returning to the original posture, the text translation result by the server device 50 may be output by voice while executing the returning motion.

［効果の一側面］ [One aspect of the effect]
上述してきたように、本実施例に係るコミュニケーションロボット１は、コミュニケーションロボット１に対する情報入力から応答出力までの応答遅延時間を予測し、予測された応答遅延時間に対応する動作の実行を決定する。これによって、コミュニケーションロボット１が音声処理等の情報処理を完了して応答できる状態になるまでインタラクションに違和感がない雰囲気をつなげることができる。したがって、本実施例に係るコミュニケーションロボット１によれば、ロボットの応答遅延時のインタラクションに発生する違和感を抑制することが可能である。 As described above, the communication robot 1 according to this embodiment predicts the response delay time from information input to the communication robot 1 until response output, and determines to execute the motion corresponding to the predicted response delay time. This makes it possible to sustain an interaction that does not feel unnatural until the communication robot 1 completes information processing such as voice processing and is ready to respond. Therefore, the communication robot 1 according to this embodiment can suppress the discomfort that arises in the interaction while the response of the robot is delayed.

さて、上記の実施例１では、応答遅延時間Ｔの間に１つのフィラー動作をコミュニケーションロボット１に実行させる場合を例示したが、応答遅延時間の間に実行できるフィラー動作が必ずしも１つに限定される訳ではない。そこで、本実施例では、２つ以上のフィラー動作を組み合わせて実行する例について説明する。 In the first embodiment described above, the communication robot 1 executes one filler motion during the response delay time T, but the number of filler motions that can be executed during the response delay time is not necessarily limited to one. This embodiment therefore describes an example in which two or more filler motions are executed in combination.

図１０は、実施例２に係るコミュニケーションロボット２の機能的構成の一例を示すブロック図である。図１０に示すように、コミュニケーションロボット２は、図３に示すコミュニケーションロボット１に比べて、制御部２０の機能の一部が異なる。すなわち、コミュニケーションロボット２は、設定部２１をさらに有すると共に、図３に示す決定部１４の機能と一部の機能が異なる決定部２２を有する。なお、図３に示すコミュニケーションロボット１と同様の機能を発揮する機能部には同一の符号を付与し、その説明を省略する。 FIG. 10 is a block diagram showing an example of the functional configuration of the communication robot 2 according to the second embodiment. As shown in FIG. 10, the communication robot 2 differs from the communication robot 1 shown in FIG. 3 in part of the functions of the control unit 20. Specifically, the communication robot 2 further has a setting unit 21 and has a determination unit 22 whose functions partly differ from those of the determination unit 14 shown in FIG. 3. Functional units that perform the same functions as those of the communication robot 1 shown in FIG. 3 are given the same reference numerals, and their description is omitted.

設定部２１は、応答遅延時間に基づいて複数の動作区間を設定する処理部である。ここでは、あくまで一例として、２つの動作区間で２種類のフィラー動作が実行される例を挙げて説明することとする。以下、２つの動作区間のうち先行する動作区間のことを「第１の動作区間」と記載すると共に、第１の動作区間に後続する動作区間のことを「第２の動作区間」と記載する場合がある。 The setting unit 21 is a processing unit that sets a plurality of motion intervals based on the response delay time. Here, merely as an example, a case in which two types of filler motions are executed in two motion intervals is described. Hereinafter, the earlier of the two motion intervals may be referred to as the "first motion interval", and the motion interval that follows it as the "second motion interval".

このように第１の動作区間および第２の動作区間を設定するのは、応答遅延時間の近傍でフィラー動作から応答出力の動作へつなげる際の違和感を低減する側面がある。すなわち、予測部１３により予測される応答遅延時間の予測値が応答遅延時間の実測値と必ずしも一致するとは限らないが、そうであるからと言って、応答遅延時間の予測値が的外れであるケースは稀であり、応答遅延時間の実測値は予測値の近傍に収束しやすい。 Setting the first and second motion intervals in this way serves to reduce discomfort in the transition from the filler motion to the response output motion around the time the response delay time elapses. That is, the response delay time predicted by the prediction unit 13 does not always match the measured response delay time; even so, cases where the prediction is far off are rare, and the measured value tends to fall close to the predicted value.

この知見を利用して、設定部２１は、応答遅延時間の予測値に基づいて第１の動作区間および第２の動作区間を設定する。図１１は、動作区間の設定方法の一例を示す図である。図１１に示すように、設定部２１は、応答遅延時間の予測値Ｔ_予測の経過時点の所定時間前、例えば１秒前までの区間を第１の動作区間に設定する。さらに、設定部２１は、第１の動作区間の終了から応答遅延時間の予測値Ｔ_実測の経過時点を超えて所定時間後、例えば１秒後までを第２の動作区間に設定する。ここで、応答遅延時間の予測値Ｔ_予測が応答遅延時間の実測値Ｔ_実測との間でずれが生じたとしても、第２の動作区間の範囲内で応答遅延時間の予測値Ｔ_予測および応答遅延時間の実測値Ｔ_実測のずれが収束するように、上記の所定時間が設定される。例えば、応答遅延時間の予測値Ｔ_予測および応答遅延時間の実測値Ｔ_実測のずれの実績のうち所定の割合、例えば８割以上が含まれる区間長を第２の動作区間として設定することができる。この他、応答遅延時間の予測値Ｔ_予測および応答遅延時間の実測値Ｔ_実測のずれの統計値、例えば中央値や最頻値、平均値などに安全マージンが加算された区間長を第２の動作区間として設定することができる。なお、図１１には、応答遅延時間の予測値Ｔ_予測の経過時点の前および後で同一の区間長を持つ第２の動作区間を設定する例を説明したが、上記の実績や上記の統計値に基づいて異なる区間長を設定することもできる。 Using this knowledge, the setting unit 21 sets the first motion interval and the second motion interval based on the predicted value of the response delay time. FIG. 11 is a diagram illustrating an example of a method for setting the motion intervals. As shown in FIG. 11, the setting unit 21 sets, as the first motion interval, the interval up to a predetermined time, for example 1 second, before the point at which the predicted value T_predicted of the response delay time elapses. Further, the setting unit 21 sets, as the second motion interval, the interval from the end of the first motion interval until a predetermined time, for example 1 second, after the point at which the predicted value T_predicted of the response delay time elapses. Here, the predetermined time is set so that, even if the predicted value T_predicted deviates from the measured value T_measured of the response delay time, the deviation falls within the second motion interval. For example, an interval length that contains a predetermined proportion, for example 80% or more, of the past deviations between T_predicted and T_measured can be set as the second motion interval. Alternatively, an interval length obtained by adding a safety margin to a statistic of the deviation between T_predicted and T_measured, such as the median, mode, or mean, can be set as the second motion interval. Although FIG. 11 illustrates an example in which the second motion interval has the same length before and after the point at which T_predicted elapses, different lengths may be set before and after that point based on the above past deviations or statistics.
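The interval setting described for FIG. 11 can be sketched in Python as follows. The 1-second margins before and after the predicted delay come from the example in the text. The helper `margin_from_history`, which picks the smallest margin covering a given proportion of past absolute deviations, is one possible reading of the "80% or more" criterion and is an assumption, as are all function and parameter names.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds after the information input


def set_motion_intervals(
    predicted_delay: float,
    margin_before: float = 1.0,
    margin_after: float = 1.0,
) -> Tuple[Interval, Interval]:
    """Return (first motion interval, second motion interval)."""
    # Assumption: if the predicted delay is shorter than the margin,
    # the first interval collapses to zero length.
    boundary = max(0.0, predicted_delay - margin_before)
    return (0.0, boundary), (boundary, predicted_delay + margin_after)


def margin_from_history(deviations: Sequence[float], coverage: float = 0.8) -> float:
    """Smallest margin m such that at least `coverage` of |deviation| values are <= m."""
    if not deviations:
        raise ValueError("deviations must not be empty")
    ordered = sorted(abs(d) for d in deviations)
    count = max(1, math.ceil(coverage * len(ordered)))
    return ordered[count - 1]
```

For a predicted delay of 2.5 seconds with the default margins, the first interval runs from 0 to 1.5 seconds and the second from 1.5 to 3.5 seconds.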

これによって、応答遅延時間の予測値の経過時点を含む前後の所定時間が第２の動作区間に設定されることになる。このような第２の動作区間において、情報入力に対する応答を出力する動作、例えばテキストの翻訳結果に対応する発話の音声出力を割り込ませて実行する場合、動作が中断、あるいは継続されても違和感が少ないフィラー動作を実行させる。これによって、フィラー動作から応答出力の動作へつなげる際の違和感の軽減を図る。 As a result, a predetermined time before and after the point at which the predicted value of the response delay time elapses is set as the second motion interval. In this second motion interval, a filler motion is executed that causes little discomfort, whether suspended or continued, when the motion of outputting a response to the information input, for example the voice output of the utterance corresponding to the translation result of the text, interrupts it. This reduces the discomfort in the transition from the filler motion to the response output motion.

決定部２２は、図３に示す決定部１４と同様、コミュニケーションロボット２に実行させるフィラー動作を決定する処理部である。 The determination unit 22 is a processing unit that determines the filler motion to be executed by the communication robot 2, similar to the determination unit 14 shown in FIG.

１つの側面として、決定部２２は、図３に示す決定部１４に比較して、設定部２１により設定された複数の動作区間ごとに当該動作区間で実行させるフィラー動作を決定する点が異なる。例えば、上述の通り、設定部２１により第１の動作区間および第２の動作区間が設定される場合、決定部２２は、第１の動作区間および第２の動作区間ごとにフィラー動作を決定する。 As one aspect, the determination unit 22 is different from the determination unit 14 shown in FIG. 3 in that, for each of a plurality of motion intervals set by the setting unit 21, the filler motion to be executed in the motion interval is determined. For example, as described above, when the setting unit 21 sets the first motion interval and the second motion interval, the determining unit 22 determines the filler motion for each of the first motion interval and the second motion interval. .

ここで、情報入力に対する応答を出力する動作をフィラー動作に割り込ませる状況を想定する場合、応答出力の動作の割込み時に動作が中断、あるいは継続されても違和感が少ないフィラー動作とそうでないフィラー動作がある。 Here, assuming a situation in which the motion of outputting a response to the information input interrupts a filler motion, some filler motions cause little discomfort whether they are suspended or continued when the response output interrupts, and others do not.

図１２は、動作と違和感の有無の対応関係の一例を示す図である。図１２に示す例では、コミュニケーションロボット２のフィラー動作が駆動系の動作とその他の表現系の動作、すなわち表示および音声による動作とに分類して示されている。さらに、図１２に示す例では、駆動系の動作が対象者Ｕ１に目線を合わせた状態で実行される動作と目線を外す動作とにさらに分類されている。このような分類ごとに、各々の動作が中断された場合と継続された場合とに分けてコミュニケーションロボット２がテキストの翻訳結果に対応する発話を音声出力する応答出力の動作を割り込ませる際の違和感の有無が示されている。 FIG. 12 is a diagram illustrating an example of the correspondence between motions and the presence or absence of discomfort. In the example shown in FIG. 12, the filler motions of the communication robot 2 are classified into drive-type motions and other, expression-type motions, that is, motions by display and by voice. The drive-type motions are further classified into motions executed while keeping eye contact with the target person U1 and motions that break eye contact. For each of these classes, FIG. 12 shows whether discomfort arises when the response output motion, in which the communication robot 2 outputs by voice the utterance corresponding to the translation result of the text, interrupts the motion, separately for the case where the motion is suspended and the case where it is continued.

図１２に示す通り、対象者Ｕ１に目線を合わせた状態では、駆動系の動作が中断された場合も、あるいは駆動系の動作が継続された場合のいずれの場合においても、上記の音声出力の割込みに対象者Ｕ１が持つ違和感は少ないことがわかる。例えば、図８に例示する駆動系の動作の中でも、両手を上げる動作は、対象者Ｕ１に目線を合わせた状態で行われる。このように両者の目線が合った状態であれば、コミュニケーションロボット２がテキストの翻訳結果に対応する発話を音声出力しても、当該発話が対象者Ｕ１に向けられたものであることが明らかである。したがって、コミュニケーションロボット２の腕部７を上げる動作、あるいは上げた腕部７を戻す動作が継続されようが途中で中断されようが、さほどの違和感はない。 As shown in FIG. 12, while eye contact with the target person U1 is kept, the target person U1 feels little discomfort at the interrupting voice output, whether the drive-type motion is suspended or continued. For example, among the drive-type motions illustrated in FIG. 8, the motion of raising both hands is performed while keeping eye contact with the target person U1. When the two are looking at each other in this way, it is clear that an utterance corresponding to the translation result of the text is directed at the target person U1 when the communication robot 2 outputs it by voice. Therefore, whether the motion of raising the arms 7 of the communication robot 2, or of returning the raised arms 7, is continued or suspended partway, little discomfort arises.

一方、対象者Ｕ１から目線が外された状態では、駆動系の動作が中断される場合も、あるいは駆動系の動作が継続される場合のいずれの場合においても、上記の音声出力の割込みに対象者Ｕ１が違和感を持つことがわかる。例えば、図８に例示する駆動系の動作の中でも、目線を上げる動作は、対象者Ｕ１から目線が外される。これを対象者Ｕ１の視点から見れば、目線を外しながらの状態、あるいは目線が外された状態でコミュニケーションロボット２がテキストの翻訳結果に対応する発話を音声出力することになる。この場合、当該発話が対象者Ｕ１に向けられたものかどうかに疑問が生じるので、対象者Ｕ１に違和感が生じる。 On the other hand, when the target person U1 is out of line of sight, the above-mentioned audio output interruption is applicable in either case of interruption of the operation of the driving system or continuation of the operation of the driving system. It can be seen that the person U1 feels uncomfortable. For example, among the actions of the driving system illustrated in FIG. 8, the action of raising the line of sight removes the line of sight from the subject U1. From the viewpoint of the target person U1, the communication robot 2 outputs the utterance corresponding to the translation result of the text while looking away from the target person U1, or in a state where the eyes are removed. In this case, since it is doubtful whether the utterance is directed to the target person U1, the target person U1 feels uncomfortable.

また、表現系の動作のうち表示が中断される場合、応答遅延時間が経過して対象者Ｕ１にとっての待ち時間が終了したことをＬＥＤ点滅の終了によって表現できる。このため、コミュニケーションロボット２がテキストの翻訳結果に対応する発話を音声出力しても対象者Ｕ１が持つ違和感は少ないことがわかる。その一方で、表示が継続される場合、ＬＥＤ点滅の表現が継続することによって待ち時間が終了していないとの錯誤を対象者Ｕ１に与える可能性があるので、対象者Ｕ１に違和感が生じる。 Further, when the display is interrupted among the actions of the expression system, the fact that the response delay time has passed and the waiting time for the subject U1 has ended can be expressed by the end of blinking of the LED. Therefore, even if the communication robot 2 voice-outputs the utterance corresponding to the translation result of the text, it can be seen that the subject U1 feels little discomfort. On the other hand, when the display is continued, the subject U1 may be misled into thinking that the waiting time has not ended due to the continuation of the blinking of the LED, which makes the subject U1 feel uncomfortable.

さらに、表示系の動作のうち音声の表現が中断される場合も、あるいは音声の表現が継続される場合のいずれの場合においても、コミュニケーションロボット２がテキストの翻訳結果に対応する発話を音声出力すると、対象者Ｕ１が違和感を持つことがわかる。例えば、図８に例示する表現系の動作の中でも、メッセージ「少々お待ち下さい」の音声出力が中断されてテキストの翻訳結果に対応する発話が即座に音声出力されれば、デジタルに音声出力が切り替わる様子が人間の振る舞いから逸脱するので、対象者Ｕ１に違和感が生じる。また、メッセージ「少々お待ち下さい」の音声出力を継続すれば、テキストの翻訳結果に対応する発話を音声出力できる状態であるにもかかわらず、無意味なフィラー動作を行うことになるので、本末転倒である。 Furthermore, among the expression-type motions, when the expression is by voice, the target person U1 feels discomfort when the communication robot 2 outputs by voice the utterance corresponding to the translation result of the text, whether the voice expression is suspended or continued. For example, among the expression-type motions illustrated in FIG. 8, if the voice output of the message "Please wait a moment" is cut off and the utterance corresponding to the translation result is output immediately, the abrupt, digital switch of voice output departs from human behavior, so the target person U1 feels discomfort. Conversely, if the voice output of the message "Please wait a moment" is continued, a meaningless filler motion is performed even though the utterance corresponding to the translation result is ready to be output, which defeats the purpose.

これらのことから、決定部２２は、第１の動作区間で実行される第１のフィラー動作を決定する場合、上記の実施例１と同様、ルックアップテーブル１４Ａを参照して、第１の動作区間の区間長に対応する動作を第１のフィラー動作として決定する。その一方で、決定部２２は、第２の動作区間で実行される第２のフィラー動作を決定する場合、コミュニケーションロボット２が実行可能な動作のうち、応答出力の動作の割込み時に動作が中断、あるいは継続されても違和感が少ない動作を第２のフィラー動作として決定する。 For these reasons, when determining the first filler motion to be executed in the first motion interval, the determination unit 22 refers to the lookup table 14A, as in the first embodiment, and determines the motion corresponding to the length of the first motion interval as the first filler motion. On the other hand, when determining the second filler motion to be executed in the second motion interval, the determination unit 22 selects, from among the motions that the communication robot 2 can execute, a motion that causes little discomfort whether it is suspended or continued when the response output motion interrupts.

図１３は、各動作区間で実行が許可される動作の一例を示す図である。図１３に示すように、第１の動作区間には、第１の動作区間の区間長に対応する動作であれば、駆動系の動作のいずれであっても、あるいは表現系の動作のいずれであっても、第１のフィラー動作として決定することが許可されるので、制限は課されない。その一方で、第２の動作区間には、駆動系の動作の中でも、対象者Ｕ１から目線が外される動作を第２のフィラー動作として決定することは許可されない。すなわち、第２の動作区間には、対象者Ｕ１に目線を合わせた状態で実行される動作に絞って第２のフィラー動作として決定することが許可されるといった制限が課される。さらに、第２の動作区間には、表現系の動作の中でも、応答出力の動作の割込み時に動作が中断されても違和感が少ない表示による動作に絞って第２のフィラー動作として決定することが許可されるといった制限が課される。 FIG. 13 is a diagram showing an example of actions permitted to be executed in each action section. As shown in FIG. 13, in the first motion section, any motion of the driving system or the motion of the expression system may be included as long as it corresponds to the length of the first motion section. Even if there is, it is allowed to be determined as the first filler operation, so no restriction is imposed. On the other hand, in the second motion section, it is not permitted to determine, among the motions of the driving system, the motion in which the subject U1 looks away as the second filler motion. That is, in the second motion section, a restriction is imposed such that it is permitted to narrow down to the motions performed while looking at the target person U1 and to determine them as the second filler motions. Furthermore, in the second action section, it is permitted to select as the second filler action narrowing down to a display action that causes little discomfort even if the action is interrupted at the time of the interruption of the response output action, among the actions of the expressive system. restrictions are imposed.
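The restriction described for FIG. 13 can be expressed as a small filter in Python. The rule follows the text: in the second motion interval, drive-type motions are permitted only if they keep eye contact, display motions are permitted, and voice motions are not. The `Motion` class, its field names, and the example identifiers are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Motion:
    name: str
    kind: str                       # "drive", "display", or "voice"
    keeps_eye_contact: bool = True  # only meaningful for drive-type motions


def allowed_in_first_interval(motion: Motion) -> bool:
    """No restriction is imposed on the first motion interval."""
    return True


def allowed_in_second_interval(motion: Motion) -> bool:
    """Apply the restrictions described for the second motion interval."""
    if motion.kind == "drive":
        return motion.keeps_eye_contact
    # Among expression-type motions, only display is permitted.
    return motion.kind == "display"
```

Applied to the motions of FIG. 8 that the text classifies, LED blinking and raising both hands pass the filter, while looking up and the voice message do not.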

In this way, when the motion that outputs a response to the information input, for example the voice output of an utterance corresponding to the translation result of the text, is executed by interrupting the second filler motion, a motion that causes little discomfort whether it is interrupted or continued is determined as the second filler motion. This reduces the discomfort felt when moving from the second filler motion to the response output motion. Specifically, if a motion of the communication robot 2 is interrupted before it finishes so that the response can be output, the seam where the motion breaks off may be perceived as unnatural. As shown in FIG. 12, how strongly this discomfort appears varies with the type of drive-system or expression-system motion. Therefore, executing as the second filler motion a drive-system or expression-system motion whose interruption seam is unlikely to be perceived as unnatural reduces the discomfort of the transition from the second filler motion to the response output motion.

[Process flow]
FIG. 14 is a flowchart showing the procedure of the filler motion control process according to the second embodiment. As one example, this process is also started when information input to the communication robot 2 is received, for example when the voice processing unit 11 detects an utterance section.

As shown in FIG. 14, when an utterance section is detected in the voice data acquired from the voice input unit 9A (step S201: Yes), the voice processing unit 11 performs voice recognition, such as word spotting, on the utterance section (step S202). The transmission processing unit 12 then transmits to the server device 50 a request to translate the text obtained as the voice recognition result of step S202 (step S203).

The server device 50 that receives the translation request performs machine translation on the text transmitted from the communication robot 1. Once the text corresponding to the utterance of the target person U1 has been translated from the native language into a foreign language, the translation result is returned to the communication robot 2.

In parallel with step S202 or step S203, or before or after steps S202 and S203, the prediction unit 13 predicts the response delay time from the duration of the utterance section detected in step S201 (step S204). The setting unit 21 then sets the first motion section and the second motion section based on the response delay time predicted in step S204 (step S205).

Note that the response delay time predicted in step S204 is used in step S205 to set the first and second motion sections. The process of step S204 can therefore be executed at any time before step S205 is executed. For example, whether step S204 is executed after step S203 or in parallel with step S202 or step S203, the processing from step S205 onward is unchanged.

Next, the determination unit 22 refers to the lookup table 14A and determines the motion corresponding to the section length of the first motion section as the first filler motion. The determination unit 22 also determines, as the second filler motion, a motion that causes little discomfort even if it is interrupted or continued when the response output motion cuts in (step S206).

The motion control unit 15 then executes, during the first motion section set in step S205, the first filler motion determined in step S206 and the motion of returning to the original posture (step S207).

After that, the motion control unit 15 starts the second filler motion determined in step S206 and the motion of returning to the original posture (step S208). If the second filler motion is completed before the translation result of the text is received from the server device 50 (step S209: No and step S210: Yes), the motion control unit 15 starts an additional filler motion, for example a filler motion with a small change in silhouette (step S211), and the process returns to step S209.

When the translation result of the text is received from the server device 50 (step S209: Yes), the motion control unit 15 interrupts or continues the second filler motion or additional filler motion in progress (step S212), outputs the translation result from the server device 50 as voice, for example synthesized speech (step S213), and ends the process.
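Steps S207 to S213 can be sketched as a polling loop. This is a sketch under stated assumptions rather than the disclosed implementation: `run_filler_control`, the `poll_result` and `motion_finished` callables, and the motion names are hypothetical stand-ins for the motion control unit 15 and its inputs.

```python
from typing import Callable, List, Optional


def run_filler_control(
    first_filler: str,
    second_filler: str,
    additional_filler: str,
    poll_result: Callable[[], Optional[str]],
    motion_finished: Callable[[], bool],
) -> List[str]:
    """Return the ordered list of actions taken for steps S207 to S213."""
    actions = [
        f"S207 run {first_filler}, then return to original posture",
        f"S208 start {second_filler}",
    ]
    while True:
        result = poll_result()  # S209: has the translation result arrived?
        if result is not None:
            actions.append("S212 interrupt or continue the filler in progress")
            actions.append(f"S213 speak: {result}")
            return actions
        if motion_finished():  # S210: has the current filler completed?
            actions.append(f"S211 start {additional_filler}")
        # A real controller would wait briefly here instead of spinning.


# Simulated run: the result arrives on the third poll, after the second
# filler motion has already completed once.
_results = iter([None, None, "Hello"])
_finished = iter([False, True])
trace = run_filler_control(
    "tilt_head", "small_nod", "blink",
    poll_result=lambda: next(_results),
    motion_finished=lambda: next(_finished),
)
```

In the simulated run, the trace contains one additional filler motion (S211) before the response output, which corresponds to the case where the actual delay exceeds the predicted one.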

Although the flowchart in FIG. 14 shows an example in which the first and second motion sections are set unconditionally, a condition can also be imposed, because the shorter the response delay time, the harder it is to perform multiple motions. For example, it is determined whether the response delay time predicted in step S204 is equal to or greater than a predetermined threshold, for example 5 seconds. The processing from step S205 onward is executed only when the response delay time is equal to or greater than the threshold; when it is less than the threshold, the processing from step S105 onward, described with reference to FIG. 9 in the first embodiment, can be executed instead.
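The threshold condition can be expressed as a small dispatch function. The function name and return labels below are hypothetical; only the 5-second example threshold and the step numbers come from the text.

```python
THRESHOLD_SEC = 5.0  # example threshold given in the text


def select_procedure(predicted_delay_sec: float,
                     threshold_sec: float = THRESHOLD_SEC) -> str:
    """Choose which flow to follow after the delay prediction of step S204."""
    if predicted_delay_sec >= threshold_sec:
        return "S205_onward"  # set two motion sections (FIG. 14)
    return "S105_onward"      # single filler motion (FIG. 9, first embodiment)
```

For instance, a predicted delay of exactly 5 seconds takes the two-section flow, because the condition is "equal to or greater than".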

[One aspect of the effect]
As described above, the communication robot 2 according to the present embodiment, like the communication robot 1 according to the first embodiment, can suppress the discomfort that arises in the interaction when the robot's response is delayed.

Furthermore, the communication robot 2 according to the present embodiment sets the first motion section and the second motion section based on the response delay time. It then determines the motion corresponding to the section length of the first motion section as the first filler motion, and determines, as the second filler motion, a motion that causes little discomfort even if it is interrupted or continued when the response output motion cuts in. Therefore, even when the predicted value and the actual value of the response delay time differ, the discomfort of the transition from the filler motion to the response output motion can be reduced.

Embodiments of the disclosed device have been described so far, but the present invention may be implemented in various forms other than the embodiments described above. Other embodiments included in the present invention are therefore described below.

[Response delay time 1]
In the first and second embodiments, the utterance time is used to predict the response delay time, but the prediction is not limited to the utterance time and other information can be used. For example, the communication robots 1 and 2 can use for the prediction the number of morae or phonemes in the text obtained by voice recognition from the utterance section, the number of phonetic characters in the text, or the number of words obtained by natural language processing of the text, for example morphological analysis. Whichever of these values is used (mora count, phoneme count, phonetic character count, or word count), the larger the value, the longer the translation process takes. The response delay time can therefore be predicted with a lookup table or function that, following the lookup table 13A shown in FIG. 7, predicts a shorter response delay time for a smaller value and a longer response delay time for a larger value.
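A count-based lookup of this kind might look like the following sketch. The bucket boundaries and delay values are invented for illustration; the actual contents of the lookup table 13A are not reproduced here. The only property taken from the text is that a larger count maps to a longer predicted delay.

```python
import bisect

# Hypothetical table: inclusive upper bounds on the mora count, and the
# predicted response delay (seconds) for each bucket. The last delay value
# covers every count above the final bound.
MORA_UPPER_BOUNDS = [10, 20, 40]
PREDICTED_DELAY_SEC = [2.0, 3.5, 5.0, 7.0]


def predict_delay(count: int) -> float:
    """Map a mora (or phoneme, character, word) count to a predicted delay."""
    idx = bisect.bisect_left(MORA_UPPER_BOUNDS, count)
    return PREDICTED_DELAY_SEC[idx]
```

The same structure applies to phoneme, phonetic character, or word counts by swapping in bounds suited to that unit.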

[Response delay time 2]
For example, the communication robots 1 and 2 can also update the predicted value of the response delay time based on measured values of the response delay time. Specifically, in the background of the processes shown in FIG. 9 and FIG. 14, the communication robots 1 and 2 measure the response delay time from information input to response output. In the examples of the first and second embodiments, this is the period from the detection of the utterance section to the output of the translation result of the text. The communication robots 1 and 2 then accumulate a log that associates each measured value with the utterance time at which it was measured. Referring to this log, the communication robots 1 and 2 execute the following processing for each record in the lookup table 13A. For each measured value in the log that corresponds to the utterance time of the record, they calculate the deviation between the measured value and the predicted value of the response delay time in the record. They then obtain a statistic of the deviations calculated in this way, for example the mode, median, or mean, and update the predicted value of the record based on that statistic. For example, when the deviation is calculated by subtracting the measured value from the predicted value, a positive statistic means the prediction is too long, so the statistic is subtracted from the predicted value; a negative statistic means the prediction is too short, so the magnitude of the statistic is added to the predicted value.
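The per-record update can be sketched as follows. This assumes the deviation is defined as predicted minus measured and uses the median as the statistic; the table layout (utterance time in seconds mapped to predicted delay) and the function name are hypothetical. Subtracting the signed statistic covers both cases in the text, since subtracting a negative value is the same as adding its magnitude.

```python
from statistics import median
from typing import Dict, List, Tuple


def update_table(
    table: Dict[int, float],
    log: List[Tuple[int, float]],
) -> Dict[int, float]:
    """Return a new table whose predictions are corrected by the median deviation.

    table: utterance time (s) -> predicted response delay (s)  [assumed layout]
    log:   (utterance time (s), measured response delay (s)) pairs
    """
    updated = {}
    for utterance_sec, predicted in table.items():
        # Deviation = predicted - measured, for log entries matching this record.
        deviations = [predicted - measured
                      for logged_sec, measured in log
                      if logged_sec == utterance_sec]
        if deviations:
            updated[utterance_sec] = predicted - median(deviations)
        else:
            updated[utterance_sec] = predicted  # no measurements: keep as is
    return updated


new_table = update_table(
    {3: 2.0, 6: 4.0, 9: 6.0},
    [(3, 2.5), (3, 2.7), (3, 2.6), (6, 3.0)],
)
```

In this example the 3-second record was predicting too short a delay and moves up to about 2.6 s, the 6-second record was too long and moves down to 3.0 s, and the 9-second record has no measurements and is unchanged.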

[Response delay time 3]
In the first and second embodiments, the response delay time due to the translation process, executed as an example of information processing, is predicted dynamically from the utterance time, while other causes of response delay, for example the network and the drive, are accounted for statically as fixed values. However, the invention is not limited to these examples, and the response delay time can also be predicted dynamically for each cause of delay. For example, the communication robot 1 or 2 can measure the response time of the server device 50 with a command such as PING and predict the network-related response delay time separately from that response time. The communication robot 1 or 2 can also predict the drive-related response delay time from the transmission time of the control signals sent to the actuators of each part.
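A per-factor prediction could be sketched as below. This is only an illustration: the probe callable stands in for whatever round-trip measurement is used (the text mentions a PING-like command), and simply summing the factors is an assumption, not something the text specifies.

```python
import time
from typing import Callable


def measure_round_trip(probe: Callable[[], None],
                       clock: Callable[[], float] = time.perf_counter) -> float:
    """Time one call to `probe`, e.g. a request/response exchange with the server."""
    start = clock()
    probe()
    return clock() - start


def predict_total_delay(translation_sec: float,
                        network_sec: float,
                        drive_sec: float) -> float:
    """Combine the separately predicted factors (assumed here to be additive)."""
    return translation_sec + network_sec + drive_sec


# Deterministic demonstration with a fake clock and a no-op probe.
_ticks = iter([10.0, 10.25])
network = measure_round_trip(lambda: None, clock=lambda: next(_ticks))
total = predict_total_delay(translation_sec=3.0, network_sec=network, drive_sec=0.5)
```

Injecting the clock keeps the example testable; in practice the default `time.perf_counter` would be used and several probes could be averaged.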

[Standalone]
In the first and second embodiments, the communication robots 1 and 2 use the platform provided by the server device 50, but the communication robot 1 or 2 may instead execute the information processing stand-alone. FIG. 15 is a block diagram showing an example of the functional configuration of a communication robot 4 according to the third embodiment. As shown in FIG. 15, compared with the communication robot 1 shown in FIG. 3 and the communication robot 2 shown in FIG. 10, the communication robot 4 does not need the communication unit 9C, and some functions of its control unit 40 differ. Specifically, the communication robot 4 has a voice section detection unit 41, a voice recognition unit 42, and a translation unit 43 in place of the voice processing unit 11 and the transmission processing unit 12. When utterance section detection, voice recognition, natural language processing, and machine translation are all executed by the communication robot 4 in this way, no network transmission delay occurs, but the time required for voice processing changes. For example, the time required for voice processing increases because the processing by the voice section detection unit 41, the voice recognition unit 42, and the translation unit 43 is executed on the communication robot 1 or 2 side. The size of this increase depends on the machine power of the communication robot 4, such as its processor and memory. For this reason, the predicted values of the response delay time in the lookup table 13A used by the prediction unit 13 of the communication robot 4 are set based on the time required for the translation process by the voice section detection unit 41, the voice recognition unit 42, and the translation unit 43. The predicted values can be varied according to the performance of the processor, memory, and other resources of the communication robot 4.

[Information processing of the communication robot]
In the first and second embodiments, voice processing such as utterance section detection, voice recognition, natural language processing, and machine translation is executed as the information processing, but the information processing executed by the communication robot 1 or 2 is not limited to voice processing. For example, the communication robot 1, 2, or 4 may take an image as input and execute other information processing, for example image processing such as face recognition or facial expression recognition. In this case, the response delay time may be predicted from the time required for the image processing.

[Distribution and integration]
The components of each illustrated device need not be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one, and all or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. For example, the voice processing unit 11, the transmission processing unit 12, the prediction unit 13, the determination unit 14, or the motion control unit 15 may be connected via a network as an external device of the communication robot 1. Likewise, the voice processing unit 11, the transmission processing unit 12, the prediction unit 13, the setting unit 21, the determination unit 22, or the motion control unit 15 may be connected via a network as an external device of the communication robot 2. The voice section detection unit 41, the voice recognition unit 42, the translation unit 43, the prediction unit 13, the setting unit 21, the determination unit 22, or the motion control unit 15 may be connected via a network as an external device of the communication robot 4. Alternatively, the voice processing unit 11, the transmission processing unit 12, the prediction unit 13, the determination unit 14, and the motion control unit 15 may each be held by separate devices that are connected via a network and cooperate to realize the functions of the communication robot 1. Similarly, the voice processing unit 11, the transmission processing unit 12, the prediction unit 13, the setting unit 21, the determination unit 22, and the motion control unit 15 may each be held by separate devices that are connected via a network and cooperate to realize the functions of the communication robot 2. The voice section detection unit 41, the voice recognition unit 42, the translation unit 43, the prediction unit 13, the setting unit 21, the determination unit 22, and the motion control unit 15 may also each be held by separate devices that are connected via a network and cooperate to realize the functions of the communication robot 4.

[Control program]
The various processes described in the above embodiments can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. An example of a computer that executes a control program having the same functions as the above embodiments is therefore described below with reference to FIG. 16.

FIG. 16 is a diagram showing an example of the hardware configuration of a computer that executes the control program according to the first to third embodiments. As shown in FIG. 16, the computer 100 has an operation unit 110a, a speaker 110b, a camera 110c, a display 120, and a communication unit 130. The computer 100 further has a CPU 150, a ROM 160, an HDD 170, and a RAM 180. These units 110 to 180 are connected via a bus 140.

As shown in FIG. 16, the HDD 170 stores a control program 170a that provides the same functions as the voice processing unit 11, the transmission processing unit 12, the prediction unit 13, the determination unit 14, and the motion control unit 15 described in the first embodiment. The HDD 170 may instead store a control program 170a that provides the same functions as the voice processing unit 11, the transmission processing unit 12, the prediction unit 13, the setting unit 21, the determination unit 22, and the motion control unit 15 described in the second embodiment, or a control program 170a that provides the same functions as the voice section detection unit 41, the voice recognition unit 42, the translation unit 43, the prediction unit 13, the setting unit 21, the determination unit 22, and the motion control unit 15 described in the present embodiment. Like the components of the control unit 10 shown in FIG. 3, the control unit 20 shown in FIG. 10, or the control unit 40 shown in FIG. 15, the control program 170a may be integrated or separated. That is, not all of the data described in the first embodiment needs to be stored in the HDD 170; it is sufficient that the data used for processing is stored in the HDD 170.

In this environment, the CPU 150 reads the control program 170a from the HDD 170 and loads it into the RAM 180. As a result, the control program 170a functions as a control process 180a, as shown in FIG. 16. The control process 180a loads various data read from the HDD 170 into the area of the RAM 180 allocated to it and executes various processes using the loaded data. Examples of the processes executed by the control process 180a include the processes shown in FIG. 9 and FIG. 14. Not all of the processing units described in the first embodiment need to operate on the CPU 150; it is sufficient that the processing units corresponding to the processes to be executed are virtually realized.

The control program 170a need not be stored in the HDD 170 or the ROM 160 from the beginning. For example, the control program 170a may be stored on a "portable physical medium" inserted into the computer 100, such as a flexible disk (FD), CD-ROM, DVD, magneto-optical disk, or IC card, and the computer 100 may acquire the control program 170a from the portable physical medium and execute it. Alternatively, the control program 170a may be stored in another computer or server device connected to the computer 100 via a public line, the Internet, a LAN, a WAN, or the like, and the computer 100 may acquire the control program 170a from there and execute it.

The following supplementary notes are further disclosed regarding the embodiments described above.

(Supplementary note 1) A communication robot comprising:
a prediction unit that predicts, based on information input to the communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response;
a determination unit that determines a motion of the communication robot corresponding to the predicted response delay time length; and
a motion control unit that causes the communication robot to execute the determined motion.

(Supplementary note 2) The communication robot according to supplementary note 1, wherein the determination unit determines, as the motion to be executed, a motion with a larger change in the silhouette of the communication robot as the predicted value of the response delay time length is longer.

(Supplementary note 3) The communication robot according to supplementary note 1, wherein, when the communication robot cannot output the response after the predicted response delay time length has elapsed, the determination unit further determines, as a motion to be executed, a motion shorter than the determined motion.

(Supplementary note 4) The communication robot according to supplementary note 1, wherein the prediction unit predicts the response delay time length based on the amount of the information.

(Supplementary note 5) The communication robot according to supplementary note 4, wherein the prediction unit refers to correspondence data defining a correspondence between the amount of information and predicted values of the response delay time length, and uses for the prediction the predicted value of the response delay time length corresponding to the amount of information input to the communication robot.

(Supplementary note 6) The communication robot according to supplementary note 5, further comprising an update unit that updates the predicted value of the response delay time length included in the correspondence data based on a measured value of the response delay time length.

(Supplementary note 7) The communication robot according to supplementary note 1, further comprising a setting unit that sets a first motion section and a second motion section based on the response delay time length predicted by the prediction unit,
wherein the determination unit determines that a motion corresponding to the section length of the first motion section is to be executed in the first motion section, and determines that a motion performed with the line of sight of the communication robot meeting that of the target person who inputs the information is to be executed in the second motion section.

(Supplementary note 8) The communication robot according to supplementary note 7, wherein the second motion section includes the point in time at which the predicted value of the response delay time length elapses.

(Supplementary note 9) A control method in which a computer executes processing comprising:
predicting, based on information input to a communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response;
determining a motion of the communication robot corresponding to the predicted response delay time length; and
causing the communication robot to execute the determined motion.

（付記１０）前記決定する処理は、前記応答遅延時間長の予測値が長いほど前記コミュニケーションロボットのシルエットの変化が大きい動作を実行対象として決定することを特徴とする付記９に記載の制御方法。 (Supplementary note 10) The control method according to Supplementary note 9, wherein the determining process determines an action to be executed that causes a greater change in the silhouette of the communication robot as the predicted value of the response delay time length increases.

（付記１１）前記決定する処理は、前記予測された応答遅延時間長の経過後に、前記コミュニケーションロボットにより応答が出力できない場合、決定された動作よりも短時間の動作を更に実行対象として決定することを特徴とする付記９に記載の制御方法。 (Appendix 11) In the determining process, when the communication robot cannot output a response after the predicted response delay time length has elapsed, a motion shorter than the determined motion is further determined as an execution target. The control method according to appendix 9, characterized by:

(Supplementary note 12) The control method according to Supplementary note 9, wherein the predicting process predicts the response delay time length based on the amount of the information.

(Supplementary note 13) The control method according to Supplementary note 12, wherein the predicting process refers to correspondence data that defines a correspondence between the amount of information and the predicted value of the response delay time length, and uses for the prediction the predicted value of the response delay time length corresponding to the amount of information input to the communication robot.

(Supplementary note 14) The control method according to Supplementary note 13, wherein the computer further executes a process of updating the predicted value of the response delay time length included in the correspondence data based on a measured value of the response delay time length.
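The correspondence data and its update from measured values can be sketched as a small lookup table. The class name `DelayTable`, the bucket boundaries, and the blending rule (a weighted average of the old prediction and the new measurement) are assumptions for illustration; the text only states that the predicted value is updated based on the measured value.

```python
import bisect
from typing import List


class DelayTable:
    """Maps an amount of input information to a predicted response delay."""

    def __init__(self, upper_bounds: List[float], predictions_s: List[float],
                 weight: float = 0.5) -> None:
        # upper_bounds: ascending bucket boundaries for the input amount.
        # predictions_s: one predicted delay per bucket (len(upper_bounds) + 1).
        if len(predictions_s) != len(upper_bounds) + 1:
            raise ValueError("need exactly one prediction per bucket")
        self.upper_bounds = list(upper_bounds)
        self.predictions_s = list(predictions_s)
        self.weight = weight  # assumed weight given to a new measurement

    def _bucket(self, amount: float) -> int:
        return bisect.bisect_left(self.upper_bounds, amount)

    def predict(self, amount: float) -> float:
        """Look up the predicted delay for the given input amount."""
        return self.predictions_s[self._bucket(amount)]

    def update(self, amount: float, measured_s: float) -> None:
        """Blend a measured delay into the stored prediction for its bucket."""
        i = self._bucket(amount)
        old = self.predictions_s[i]
        self.predictions_s[i] = (1 - self.weight) * old + self.weight * measured_s
```

Updating only the bucket that the measurement falls into keeps predictions for other input amounts unchanged, which is one simple way to let the table track actual delays over time.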

(Supplementary note 15) The control method according to Supplementary note 9, wherein the computer further executes a process of setting a first motion section and a second motion section based on the predicted response delay time length, and
the determining process determines that a motion corresponding to the section length of the first motion section is to be executed in the first motion section, and that a motion performed while the lines of sight of the communication robot and the target person who inputs the information meet is to be executed in the second motion section.

(Supplementary note 16) The control method according to Supplementary note 15, wherein the second motion section includes the point in time at which the predicted value of the response delay time length elapses.
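One way to set the two motion sections so that the second section contains the moment the predicted delay elapses is sketched below. The symmetric margin of 0.5 s around the predicted delay is a hypothetical parameter, not a value given in the text.

```python
from typing import Tuple

Section = Tuple[float, float]  # (start, end) in seconds from the input timing


def set_sections(predicted_delay_s: float,
                 margin_s: float = 0.5) -> Tuple[Section, Section]:
    """Split the waiting time into a first and a second motion section.

    The second section (eye-contact motion) is placed around the predicted
    delay so that it always includes the point where that delay elapses.
    """
    split_s = max(0.0, predicted_delay_s - margin_s)
    first = (0.0, split_s)
    second = (split_s, predicted_delay_s + margin_s)
    return first, second
```

The length of the first section, `first[1] - first[0]`, is then what the motion for that section would be matched against.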

(Supplementary note 17) A control program that causes a computer to execute processing comprising:
predicting, based on information input to a communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response;
determining a motion corresponding to the predicted response delay time length; and
causing the communication robot to execute the determined motion.

(Supplementary note 18) The control program according to Supplementary note 17, wherein the determining process determines, as the motion to be executed, a motion that changes the silhouette of the communication robot more greatly as the predicted value of the response delay time length becomes longer.

(Supplementary note 19) The control program according to Supplementary note 17, wherein, when the communication robot cannot output a response after the predicted response delay time length has elapsed, the determining process further determines, as a motion to be executed, a motion of shorter duration than the determined motion.

(Supplementary note 20) The control program according to Supplementary note 17, wherein the predicting process predicts the response delay time length based on the amount of the information.

1 Communication robot
3 Head
5 Torso
7R Right arm
7L Left arm
9A Sound input unit
9B Sound output unit
9C Communication unit
9M Motor
10 Control unit
11 Voice processing unit
12 Transmission processing unit
13 Prediction unit
14 Determination unit
15 Motion control unit
50 Server device

Claims

1. A communication robot comprising:
a prediction unit that predicts, based on information input to the communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response;
a determination unit that determines a filler motion, which is a body drive of the communication robot, corresponding to the predicted response delay time length; and
a motion control unit that causes the communication robot to execute the filler motion such that the original posture held before the determined filler motion is changed by the filler motion, and the response delay time length is matched with the time until the original posture is restored after the filler motion is completed.
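As a rough illustration of timing a filler motion against the predicted delay, the sketch below stretches a normalized motion so that the return to the original posture coincides with the predicted response delay. This reflects one possible reading of the claim language, not a confirmed implementation; the keyframe representation (a fraction of the motion paired with a single joint angle) and the function name `schedule_filler` are assumptions for the example.

```python
from typing import List, Tuple

# (fraction of the motion in [0, 1], joint angle in degrees) -- assumed format
Keyframe = Tuple[float, float]


def schedule_filler(keyframes: List[Keyframe],
                    predicted_delay_s: float) -> List[Tuple[float, float]]:
    """Convert normalized keyframes into (time in seconds, angle) pairs.

    The motion must start and end in the same (original) posture, so the
    last keyframe lands exactly when the predicted delay elapses.
    """
    if predicted_delay_s <= 0:
        raise ValueError("predicted delay must be positive")
    if keyframes[0][1] != keyframes[-1][1]:
        raise ValueError("filler motion must return to the original posture")
    return [(fraction * predicted_delay_s, angle) for fraction, angle in keyframes]
```

For example, a head tilt defined as `[(0.0, 0.0), (0.5, 30.0), (1.0, 0.0)]` with a predicted delay of 2.0 seconds would leave the original posture at 0 s, reach 30 degrees at 1 s, and be back in the original posture at 2 s.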

2. The communication robot according to claim 1, wherein the determination unit determines, as the motion to be executed, a motion that changes the silhouette of the communication robot more greatly as the predicted value of the response delay time length becomes longer.

3. The communication robot according to claim 1 or 2, wherein, if the communication robot cannot output a response after the predicted response delay time length has elapsed, the determination unit further determines, as a motion to be executed, a motion of shorter duration than the determined motion.

4. The communication robot according to claim 1, wherein the prediction unit refers to correspondence data defining a correspondence between the amount of information and the predicted value of the response delay time length, and uses for the prediction the predicted value of the response delay time length corresponding to the amount of information input to the communication robot.

5. The communication robot according to claim 4, further comprising an updating unit that updates the predicted value of the response delay time length included in the correspondence data based on the measured value of the response delay time length.

6. A control method in which a computer executes processing comprising:
predicting, based on information input to a communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response;
determining a filler motion, which is a body drive of the communication robot, corresponding to the predicted response delay time length; and
causing the communication robot to execute the filler motion such that the original posture held before the determined filler motion is changed by the filler motion, and the response delay time length is matched with the time until the original posture is restored after the filler motion is completed.

7. A control program that causes a computer to execute processing comprising:
predicting, based on information input to a communication robot, a response delay time length from the timing at which the information is input until the communication robot outputs a response;
determining a filler motion, which is a body drive of the communication robot, corresponding to the predicted response delay time length; and
causing the communication robot to execute the filler motion such that the original posture held before the determined filler motion is changed by the filler motion, and the response delay time length is matched with the time until the original posture is restored after the filler motion is completed.