JP2008112360A

JP2008112360A - User interface device

Info

Publication number: JP2008112360A
Application number: JP2006295800A
Authority: JP
Inventors: Masaru Saito; 勝斉藤
Original assignee: Sony Computer Entertainment Inc
Current assignee: Sony Interactive Entertainment Inc
Priority date: 2006-10-31
Filing date: 2006-10-31
Publication date: 2008-05-15

Abstract

PROBLEM TO BE SOLVED: To provide a new user interface device and to improve convenience for the user.

SOLUTION: An attention site specifying unit specifies the nose region within a recognized face region as the attention site (S118). A tracking unit tracks the position of the nose region in the acquired i-th frame by matching against a template image (S122), and attempts to detect a change in the face orientation based on the tracking result (S126). When the tracking unit detects a change in the orientation of the user's face in the i-th frame (YES in S128), an input processing unit refers to a table and executes the input processing corresponding to the meaning represented by the detected change (S130).

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a user interface technique using image input.

In game devices, for example, it is common to provide game input by operating the up/down/left/right direction keys or the analog stick of a controller (Patent Document 1). More recently, technological innovation has made it possible to play games by shaking or rotating the controller.
Patent Document 1: JP 2004-313492 A

The present inventor has recognized the following problems with current controllers. Moving a pointer with the direction keys, for example, takes time and is not convenient. An analog stick can be hard to operate because of the force that returns it to center. Shaking or rotating a controller can feel lonely when done alone, and can be hard to sustain for long periods because it is tiring. Current controllers thus leave room for improvement in terms of user convenience.

The present invention has been made in view of this situation, and its object is to provide a new user interface device and improve convenience for the user.

One aspect of the present invention relates to a user interface device that estimates the orientation of a user's face from the user's face image and executes predetermined input processing according to the estimation result. Another aspect focuses on a change in the face orientation, instead of the face orientation itself, and executes predetermined input processing accordingly. These aspects, when converted between a method, a system, a computer program, a recording medium, and the like, are also effective as the present invention.

According to the user interface device of the present invention, convenience for the user can be enhanced.

Before describing the embodiment in detail, an outline is given below.
1. A user interface device according to one aspect includes an acquisition unit that acquires a captured face image of a user, an estimation unit that estimates the orientation of the user's face based on the acquired face image, and an input processing unit that executes predetermined input processing according to the estimated face orientation. Here, “predetermined input processing” means reflecting the user's intention on the device in some form, and need not involve an operation that confirms the input.

2. In another aspect, the device includes an acquisition unit that sequentially acquires face images of the user captured multiple times, an estimation unit that sequentially estimates the orientation of the user's face based on the acquired face images, and an input processing unit that executes predetermined input processing according to a change in the sequentially estimated face orientation. The device may further include a table describing the relationship between changes in face orientation and the input processing corresponding to their meaning, and the input processing unit may refer to the table to execute the input processing corresponding to the meaning represented by a change in face orientation.

3. In yet another aspect, the device includes: an acquisition unit that sequentially acquires face images of the user captured multiple times; an attention site specifying unit that specifies the position of an attention site in the first acquired face image by using the positional relationship between parts within the face image; a tracking unit that takes a partial image containing the specified attention site as a template image and detects a change in face orientation by tracking the position of the attention site in the second and subsequent face images through matching with the template image; a table describing the relationship between changes in face orientation and the input processing corresponding to their meaning; and an input processing unit that refers to the table to execute the input processing corresponding to the meaning represented by a change in face orientation. Here, the “first” image is not necessarily the first image captured. It is simply the capture used to specify the attention site initially, and is called the first only because it precedes the subsequent tracking processing.

Each of these aspects takes advantage of properties specific to the face. A hand, for example, changes shape and position in many ways depending on the situation, which makes it hard to capture by image recognition. Its wide range of motion also makes it difficult to define an “orientation.” A face, in contrast, changes shape relatively little and has a limited range of motion, so its position is easy to identify by image recognition and its “orientation” can be defined. Having noticed these properties, the present inventor arrived at the technique of reflecting the face orientation, or a change in it, in the input processing of a user interface device.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The same or equivalent components, members, and processes shown in the drawings are given the same reference numerals, and duplicate descriptions are omitted as appropriate. The embodiments are illustrative and do not limit the invention; not all features described in the embodiments, or combinations of them, are necessarily essential to the invention.

FIG. 1 is a conceptual diagram illustrating a usage environment of an information processing system 1 that uses the user interface device according to the embodiment. The functions of the user interface device are realized by a computer 10 that forms part of the information processing system 1. In addition to the user interface device 100, the information processing system 1 includes a digital camera 2 and a display 3.

The digital camera 2 is installed at a position from which it can capture the face image of a user in front of the display 3. FIG. 1 illustrates the case where the digital camera 2 is mounted on top of the display 3. The images captured by the digital camera 2 may be a moving image at a predetermined frame rate or a series of still images; in this embodiment they are described as a moving image. The display 3 is not particularly limited, and displays, for example, a software keyboard or a menu screen under the control of the computer 10. As described later, the computer 10 executes input processing according to the image captured by the digital camera 2, that is, the user's face image.

FIG. 2 is a functional block diagram of the user interface device 100 realized in the computer 10 shown in FIG. 1. In hardware terms, each block shown here is realized by elements such as a computer's CPU and by mechanical devices, and in software terms by a computer program or the like; what is drawn here are the functional blocks realized by their cooperation. Those skilled in the art will understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The user interface device 100 includes a buffer 11, an acquisition unit 12, an estimation unit 14, an input processing unit 16, and a table 18. The buffer 11 temporarily stores the images captured by the digital camera 2 frame by frame, using a known technique. The acquisition unit 12 sequentially reads the images stored in the buffer 11 frame by frame for processing by the estimation unit 14. The estimation unit 14 includes a face recognition unit 20, an attention site specifying unit 22, and a tracking unit 24.

The face recognition unit 20 recognizes the image area corresponding to the user's face in the image acquired by the acquisition unit 12. Hereinafter, the image area corresponding to the face in an acquired image is called the “face region,” and an image containing a face region is called a “face image.” The face region may be recognized with a known pattern recognition technique such as boosting or a support vector machine. For example, a reference image synthesized from several tens to several hundreds of face images is prepared in advance; the face recognition unit 20 sets a rectangular area in the acquired image, compares the reference image with the image in that rectangular area, and quantifies their similarity as images. By setting various rectangular areas in the acquired image and calculating the similarity for each, the “face-like” area in the acquired image is identified as the face region. If the face recognition unit 20 cannot recognize a face region in an acquired image, it attempts recognition on the next acquired image. This processing is repeated until a face region is recognized.
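The rectangular-area comparison described above can be sketched as follows. This is a minimal illustration, not the recognition method of the embodiment: it replaces boosting or a support vector machine with a plain mean-absolute-difference score, treats images as lists of rows of grayscale values, and the names `similarity`, `find_face_region`, and `threshold` are assumptions introduced here.

```python
def similarity(patch, reference):
    """Return a score in (0, 1]; 1.0 means the pixel values are identical."""
    diff = sum(abs(p - r)
               for patch_row, ref_row in zip(patch, reference)
               for p, r in zip(patch_row, ref_row))
    count = len(reference) * len(reference[0])
    return 1.0 / (1.0 + diff / count)


def find_face_region(image, reference, threshold=0.5):
    """Slide a window the size of `reference` over `image`.

    Returns (top, left, score) for the most face-like window, or None when
    no window reaches `threshold` (the caller then tries the next frame).
    """
    h, w = len(reference), len(reference[0])
    best = None
    for top in range(len(image) - h + 1):
        for left in range(len(image[0]) - w + 1):
            patch = [row[left:left + w] for row in image[top:top + h]]
            score = similarity(patch, reference)
            if best is None or score > best[2]:
                best = (top, left, score)
    if best is not None and best[2] >= threshold:
        return best
    return None
```

Returning None corresponds to the case where no face region is recognized and the next acquired image is examined instead.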

The attention site specifying unit 22 specifies the attention site within the face region recognized by the face recognition unit 20. In this embodiment, it recognizes the region of the face region corresponding to the nose (hereinafter “nose region”) and specifies it as the attention site. The nose region may be recognized in the same way as the face region, or the central part of the face region may simply be taken as the nose region. Alternatively, the positions of both eyes, which are relatively easy to identify, may be identified first, and a predetermined range below the midpoint of the line connecting the eyes may be recognized as the nose region. In other words, the nose region may be recognized using the positional relationship between parts within the face image. FIG. 3 illustrates the nose region; the rectangular area enclosed by the broken line is the nose region. Unlike the eyes and mouth, the nose does not open or close, which is why this embodiment specifies the nose region as the attention site. This makes tracking by the tracking unit 24 easier.
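The eye-based approach can be illustrated with a short geometric sketch. The source only says that a predetermined range below the midpoint between the eyes is used; the proportions `width_ratio` and `height_ratio`, and the function name `nose_region_from_eyes`, are assumptions chosen for illustration.

```python
def nose_region_from_eyes(left_eye, right_eye, width_ratio=0.5, height_ratio=0.5):
    """Return (left, top, width, height) of a box just below the eye midpoint.

    Eye positions are (x, y) pixel coordinates with y increasing downward.
    The box size is scaled by the distance between the eyes so that it
    adapts to how large the face appears in the image.
    """
    (lx, ly), (rx, ry) = left_eye, right_eye
    eye_distance = ((rx - lx) ** 2 + (ry - ly) ** 2) ** 0.5
    mid_x, mid_y = (lx + rx) / 2, (ly + ry) / 2
    width = eye_distance * width_ratio
    height = eye_distance * height_ratio
    # The box is centered horizontally on the midpoint and extends downward.
    return (mid_x - width / 2, mid_y, width, height)
```

For eyes detected at (40, 50) and (80, 50), the sketch places a 20 by 20 box whose top edge starts at the eye line, centered at x = 60.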

Returning to FIG. 2, the tracking unit 24 tracks the nose region in each frame (hereinafter “subsequent frame”) following the frame in which the attention site, that is, the nose region, was specified by the attention site specifying unit 22 (hereinafter “initial frame”), thereby sequentially estimating the orientation of the user's face and detecting changes in it. In this embodiment, the “face orientation” is measured relative to the orientation in which the user's face directly faces the digital camera 2. Known template matching may be used to track the nose region. Specifically, a partial image containing the nose region in the initial frame is used as the template image, and the position of the nose region in each subsequent frame is tracked by matching against that template image. Matching need not be performed over the entire area of a subsequent frame; given the range of motion of the face, it is sufficient to perform matching only around the position of the nose region in the initial frame. Instead of consistently using the partial image from the initial frame as the template image, the template image may be updated successively.
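A restricted-area template match of this kind might look like the following sketch. It uses the sum of absolute differences as the matching cost, which is one common choice and not necessarily the one intended by the embodiment; `radius` and `max_sad` are hypothetical parameters, and images are lists of rows of grayscale values.

```python
def sad(image, template, top, left):
    """Sum of absolute differences between the template and the patch at (top, left)."""
    return sum(abs(image[top + r][left + c] - template[r][c])
               for r in range(len(template))
               for c in range(len(template[0])))


def track(image, template, center_top, center_left, radius=2, max_sad=10):
    """Search only within `radius` pixels of (center_top, center_left).

    Returns the best-matching (top, left), or None when even the best match
    costs more than `max_sad`, i.e. the nose region is considered lost.
    """
    h, w = len(template), len(template[0])
    best_pos, best_cost = None, None
    top_end = min(len(image) - h, center_top + radius)
    left_end = min(len(image[0]) - w, center_left + radius)
    for top in range(max(0, center_top - radius), top_end + 1):
        for left in range(max(0, center_left - radius), left_end + 1):
            cost = sad(image, template, top, left)
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (top, left), cost
    if best_cost is None or best_cost > max_sad:
        return None
    return best_pos
```

Passing the nose position of the initial frame as the search center limits the work per frame, and the cost limit gives a simple criterion for deciding that tracking has failed.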

In this way, the estimation unit 14 sequentially estimates the orientation of the user's face based on the user's face images sequentially acquired by the acquisition unit 12, and detects changes in it.
The table 18 describes the relationship between changes in face orientation and the input processing corresponding to their meaning. The input processing unit 16 refers to the table 18 and executes the input processing corresponding to the meaning represented by the change in face orientation detected by the tracking unit 24. The result of the input processing is output to the display 3. The input processing unit 16 may also use the recognition of a face region by the face recognition unit 20 as a switch to perform input preparation processing such as returning from a screen saver or displaying a menu.
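The role of the table 18 can be pictured as a lookup from a detected change to an action. The sketch below is one possible arrangement under assumptions: the class name `InputProcessor`, the string labels for changes, and the use of callables as table entries are all introduced here for illustration.

```python
from typing import Callable, Dict, Optional


class InputProcessor:
    """Looks up a detected change in face orientation and runs the matching action."""

    def __init__(self) -> None:
        # Hypothetical stand-in for table 18: change label -> action to run.
        self.table: Dict[str, Callable[[], str]] = {}

    def register(self, change: str, action: Callable[[], str]) -> None:
        """Add or replace one table entry."""
        self.table[change] = action

    def handle(self, change: str) -> Optional[str]:
        """Run the action registered for `change`; return None if there is none."""
        action = self.table.get(change)
        return action() if action is not None else None
```

Keeping the mapping in a table rather than in branching code means the same tracking logic can drive different applications simply by registering different entries.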

Some examples of entries in the table 18, that is, of changes in face orientation and the input processing corresponding to their meaning, are given below.
(1) Pointer movement: Input processing is executed so that the pointer shown on the display 3 moves according to the change in face orientation. For example, when the user's face turns to the right, input processing is executed so that the pointer on the display 3 moves to the right as seen by the user. Changes in other directions are handled in the same way. Such input processing is likely to match the user's intuition, so improved convenience for the user is expected. A character may be moved instead of a pointer.
(2) Selection of “Yes” or “No”: In general, people nod to express consent or agreement, and shake their head from side to side to express refusal or disagreement. Based on this, when the user's face orientation changes downward and then returns, the input processing for the case where the user selected “Yes” is executed; when the face orientation changes left and right and then returns, the input processing for the case where the user selected “No” is executed. The user can thus respond to a “Yes”/“No” question shown on the display 3 by nodding or shaking the head, just as in ordinary conversation, and obtain the input processing that the gesture means. Compared with operating a pointer with a mouse or the like to select “Yes” or “No,” input can proceed with a more intuitive action, so improved convenience for the user is expected.
(3) Screen scrolling: Input processing is executed so that the display 3 scrolls according to the change in face orientation. For example, when the user's face turns downward, input processing is executed so that the display 3 scrolls down. Changes in other directions are handled in the same way. Users unfamiliar with computers may find scrolling difficult, but associating the change in face orientation with the scroll direction in this way makes scrolling more intuitive, so improved convenience for the user is expected.
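Example (2) can be illustrated with a small classifier over tracked nose positions. This is a sketch under assumptions, not the detection rule of the embodiment: the thresholds `min_move` and `return_tol`, the function name, and the rule of comparing the largest vertical and horizontal excursions are all introduced here.

```python
def classify_gesture(positions, min_move=5, return_tol=3):
    """Classify a sequence of (x, y) nose positions as "yes", "no", or None.

    The first position is taken as the neutral pose. Image y grows downward,
    so a nod shows up as a positive y excursion followed by a return.
    """
    if len(positions) < 3:
        return None
    x0, y0 = positions[0]
    xn, yn = positions[-1]
    # The face must come back close to where it started.
    if abs(xn - x0) > return_tol or abs(yn - y0) > return_tol:
        return None
    max_sideways = max(abs(x - x0) for x, _ in positions)
    max_down = max(y - y0 for _, y in positions)
    if max_down >= min_move and max_down > max_sideways:
        return "yes"   # moved down and returned: nod
    if max_sideways >= min_move and max_sideways > max_down:
        return "no"    # moved sideways and returned: head shake
    return None
```

Requiring the return to the neutral pose is what separates these gestures from the one-directional changes used for pointer movement and scrolling.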

FIG. 4 is a flowchart showing the processing in the user interface device 100 shown in FIG. 2. When processing starts, the frame count value i is set to 0 (S101). The acquisition unit 12 acquires the i-th frame from the buffer 11 (S102). The face recognition unit 20 attempts to recognize a face region in the acquired i-th frame (S104). If no face region is recognized (S106: No), i is incremented (S108) and the same processing is executed. This processing is repeated until a face region is recognized.

When the face recognition unit 20 recognizes a face region in the i-th frame (S106: Yes), the attention site specifying unit 22 specifies the nose region within the recognized face region as the attention site (S118). The partial image containing the specified nose region is held in the tracking unit 24 as the template image.

After the nose region is specified, i is incremented (S119), and the acquisition unit 12 acquires the i-th frame from the buffer 11 (S120). The tracking unit 24 attempts to track the position of the nose region in the acquired frame by matching against the template image (S122). If the nose region is lost and cannot be tracked (S123: No), i is incremented (S124) and processing returns to step S102. Whether the nose region was tracked successfully can be determined by comparing the template matching score with a threshold.

When the tracking unit 24 succeeds in tracking the nose region (S123: Yes), it attempts to detect a change in face orientation based on the tracking result (S126). If no change in face orientation is detected (S128: No), i is incremented (S119) and the same processing is executed. This processing is repeated until a change in face orientation is detected.

When the tracking unit 24 detects a change in the orientation of the user's face in the i-th frame (S128: Yes), the input processing unit 16 refers to the table 18 and executes the input processing corresponding to the meaning represented by the detected change (S130).
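The control flow of FIG. 4 can be sketched as a loop over a finite list of frames. The five callables are hypothetical stand-ins for the face recognition unit 20, the attention site specifying unit 22, the tracking unit 24, and the input processing unit 16. The text does not say what follows S130, so returning after the first executed input process is an assumption of this sketch.

```python
def run(frames, recognize_face, locate_nose, track_nose, detect_change, execute_input):
    """Drive the loop of FIG. 4 over `frames`.

    Returns the result of the first executed input process, or None if the
    frames run out before a change in face orientation is detected.
    """
    i = 0                                              # S101
    while i < len(frames):
        face = recognize_face(frames[i])               # S102, S104
        if face is None:                               # S106: No
            i += 1                                     # S108
            continue
        origin = locate_nose(frames[i], face)          # S118
        while True:
            i += 1                                     # S119
            if i >= len(frames):
                return None
            position = track_nose(frames[i], origin)   # S120, S122
            if position is None:                       # S123: No
                i += 1                                 # S124
                break                                  # back to S102
            change = detect_change(origin, position)   # S126
            if change is not None:                     # S128: Yes
                return execute_input(change)           # S130 (stopping here is an assumption)
    return None
```

The inner loop corresponds to the tracking phase, and breaking out of it corresponds to the return to S102 when the nose region is lost.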

FIG. 4 shows the case where the tracking unit 24 attempts to detect a change in face orientation each time one frame is tracked. Alternatively, the tracking unit 24 may repeat tracking until the number of tracked frames reaches a predetermined number, and attempt to detect a change in face orientation only then. The number of tracked frames at which detection is attempted (hereinafter “required number of frames”) may be set for each input process to be executed by the input processing unit 16.

For example, when the input processing unit 16 executes input processing such as the selection of “Yes” or “No,” the required number of frames is set relatively high, because the tracking unit 24 must detect changes in the user's face orientation in at least two directions to make that selection. When the input processing unit 16 executes input processing such as pointer movement or screen scrolling, the required number of frames is set relatively low, because it is sufficient to detect a change in face orientation in one direction.
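A per-process required number of frames could be handled as in the following sketch. The frame counts in `REQUIRED_FRAMES` are made-up placeholder values, and the class `FrameCollector` is a hypothetical helper; the source only states that the number is higher for a “Yes”/“No” selection than for pointer movement or scrolling.

```python
# Hypothetical values: only the ordering (yes_no > pointer, scroll) comes from the text.
REQUIRED_FRAMES = {"yes_no": 15, "pointer": 3, "scroll": 3}


class FrameCollector:
    """Collects tracked positions until enough frames exist to attempt detection."""

    def __init__(self, input_process):
        self.required = REQUIRED_FRAMES[input_process]
        self.positions = []

    def add(self, position):
        """Store one tracked position.

        Returns the collected batch once the required number of frames is
        reached (and starts a new batch), otherwise returns None.
        """
        self.positions.append(position)
        if len(self.positions) < self.required:
            return None
        batch, self.positions = self.positions, []
        return batch
```

A batch returned by `add` would then be passed to whatever routine detects the change in face orientation for that input process.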

(Usage scenario 1: When the hands cannot be used)
A case where the information processing system 1 shown in FIG. 1 is used as a terminal in an automobile workshop is described. Consider a mechanic repairing a car in the workshop who operates a computer there to look up parts information and the like. In such a setting, the mechanic's hands are often dirty or wet, so the computer cannot be operated by hand. According to this embodiment, when the mechanic faces the digital camera 2 directly, the computer 10 takes this as a switch and displays a menu on the display 3. The mechanic can then look up parts information just by changing the face orientation. There is no need to wash hands to look up parts information, which reduces wasted time for the mechanic.

(Usage scenario 2: Software keyboard)
A case where input processing with a software keyboard is executed in the information processing system 1 shown in FIG. 1 is described. Conventionally, when direction keys or a joystick are used to pick a character on a software keyboard, the user has to stop after every one-character move or overshoots the desired character, so operability is poor. With an analog stick, the force that returns it to center may degrade operability. According to this embodiment, the user can move the pointer in an analog manner, as with a mouse, by changing the face orientation, which is easy to use. Space savings are also expected because a mouse and mouse pad are no longer needed.

(Usage scenario 3: Selection of “Yes” or “No”)
A case where the information processing system 1 shown in FIG. 1 is used as an entertainment system is described. Consider a user playing an RPG (Role-Playing Game). Conventionally, direction keys or an analog stick are used to move a character, and a “Yes” or “No” choice is made by pressing the ○ button or the × button. In this embodiment, the user can move the character and walk through the scene by changing the face orientation. When another character speaks to the user and a “Yes” or “No” choice is required, the user can make the choice by nodding or shaking the head. If the input processing unit 16 detects that the user has shown no reaction for a predetermined time, it can let the game proceed naturally as if no selection had been made.

The embodiment described above is only an example, and those skilled in the art will readily understand that various modifications of it are included in the present invention. Some such modifications are described below.

The embodiment focuses mainly on changes in face orientation, whereas this modification focuses on the face orientation itself. The tracking unit 24 operates as in the embodiment up to tracking the position of the nose region in a subsequent frame, and then estimates the current face orientation from the position of the nose region in the initial frame and its position in the current frame. Depending on the application, using the face orientation itself for input processing may feel more convenient than using changes in face orientation. In such cases this modification is advantageous.
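Estimating the orientation itself from the two nose positions can be sketched as below. The dead zone, the five coarse labels, and the function name are assumptions for illustration. The horizontal labels refer to image coordinates; how they map to the user's own left and right depends on whether the camera image is mirrored, which the source does not specify.

```python
def estimate_orientation(initial, current, dead_zone=4):
    """Return a coarse orientation label from two (x, y) nose positions.

    `initial` is the nose position in the initial frame (face toward the
    camera) and `current` is the tracked position in the current frame.
    Image y grows downward.
    """
    dx = current[0] - initial[0]
    dy = current[1] - initial[1]
    # Small displacements are treated as still facing the camera.
    if abs(dx) <= dead_zone and abs(dy) <= dead_zone:
        return "front"
    if abs(dx) >= abs(dy):
        return "image_right" if dx > 0 else "image_left"
    return "down" if dy > 0 else "up"
```

Because the result depends only on the current displacement from the initial frame, holding the head turned keeps producing the same label, which suits uses such as continuous scrolling.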

The embodiment mainly describes the case where the images captured by the digital camera 2 are a moving image, but they may be still images instead. The estimation unit 14 can estimate the face orientation from where the nose region of the first captured still image is located in a later still image. Here, the first captured still image means the still image in which the nose region was specified by the attention site specifying unit 22.

The embodiment describes the case where the attention site specifying unit 22 specifies the nose region of the face image as the attention site. In a modification, the region corresponding to the eyes or the mouth may be used as the attention site. The positional relationship among two or three of the eyes, mouth, and nose may also be used as the attention site. The eyes and mouth are not as easy for the tracking unit 24 to track as the nose, but once that difficulty is overcome the same effects as in the embodiment can be obtained.

Although not specifically mentioned in the embodiment, the user may change the position of the face without changing its orientation, for example by moving the face parallel to the display surface of the display 3. In that case too, the user interface device 100 estimates that the face orientation has changed and executes the corresponding input processing. For example, when the user moves the face to the left parallel to the display surface of the display 3, the user interface device 100 estimates that the user's face has turned to the left and executes input processing that moves the pointer to the left. This input processing is considered to match the user's intuition, so the result can be regarded as appropriate.

FIG. 1 is a conceptual diagram illustrating a usage environment of an information processing system that uses the user interface device according to the embodiment.
FIG. 2 is a functional block diagram of the user interface device realized in the computer shown in FIG. 1.
FIG. 3 is an explanatory diagram showing an image acquired by the acquisition unit shown in FIG. 2 overlaid with the nose region recognized by the attention site specifying unit.
FIG. 4 is a flowchart showing the processing in the user interface device shown in FIG. 2.

Explanation of symbols

1: information processing system; 2: digital camera; 3: display; 10: computer; 12: acquisition unit; 14: estimation unit; 16: input processing unit; 18: table; 20: face recognition unit; 22: attention site specifying unit; 24: tracking unit; 100: user interface device.

Claims

1. A user interface device comprising:
an acquisition unit that acquires a captured face image of a user;
an estimation unit that estimates the orientation of the user's face based on the acquired face image; and
an input processing unit that executes predetermined input processing according to the estimated face orientation.

2. A user interface device comprising:
an acquisition unit that sequentially acquires face images of a user captured multiple times;
an estimation unit that sequentially estimates the orientation of the user's face based on the acquired face images; and
an input processing unit that executes predetermined input processing according to a change in the sequentially estimated face orientation.

3. The user interface device according to claim 2, further comprising a table that describes the relationship between changes in face orientation and the input processing corresponding to their meaning,
wherein the input processing unit refers to the table to execute the input processing corresponding to the meaning represented by a change in face orientation.

4. A user interface processing program that causes a computer to perform:
a function of sequentially estimating the orientation of a user's face based on face images of the user captured multiple times; and
a function of executing predetermined input processing according to a change in the sequentially estimated face orientation.