JP6606849B2

JP6606849B2 - Discriminator generation device, discriminator generation method, estimation device, estimation method, and program

Info

Publication number: JP6606849B2
Application number: JP2015077799A
Authority: JP
Inventors: 総田端; 慧吾廣川; 靖寿松葉
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2015-04-06
Filing date: 2015-04-06
Publication date: 2019-11-20
Anticipated expiration: 2035-04-06
Also published as: JP2016197371A

Description

The present invention relates to a technique for constructing a discriminator from a plurality of images and estimating the position of a specific part in an image.

Conventionally, facial organ detection has been dominated by model-based methods. These define a face model, fit the input image to the model by optimization, and obtain the facial organ positions of the input image from the facial organ positions predefined in the model (see, e.g., Non-Patent Documents 1, 2, and 3). Patent Document 1 achieves high-precision facial organ detection by fitting the input image to a three-dimensional model.

However, model-based methods have the following problems. First, the optimization minimizes the model-fitting error rather than the alignment error, so even when that error is minimized, the alignment result does not necessarily improve. Second, because they assume a single unified (low-dimensional) model for all faces, they cannot cope with variations in age, gender, race, and so on.

In recent years, on the other hand, regression-based methods have been adopted, which solve for the correct facial organ positions from a suitable initial position as a regression problem (see, e.g., Non-Patent Documents 4, 5, and 6). Because these methods solve the alignment error as a regression problem in a high-dimensional space, they overcome the problems of model-based methods and achieve highly accurate alignment.

Patent Document 1: Japanese Patent No. 5213778
Patent Document 2: US Patent Application Publication No. 2014/0185924

Non-Patent Document 1: Tim Cootes, "An Introduction to Active Shape Models", http://person.hst.aau.dk/06gr1088d/artikler/Pdf/Cootes%20Introduction.pdf
Non-Patent Document 2: Timothy F. Cootes, Gareth J. Edwards, and Christopher J. Taylor, "Active Appearance Models", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 6, June 2001, http://person.hst.aau.dk/06gr1088d/artikler/Pdf/Cootes%20Introduction.pdf
Non-Patent Document 3: David Cristinacce and Tim Cootes, "Feature Detection and Tracking with Constrained Local Models", http://personalpages.manchester.ac.uk/staff/timothy.f.cootes/papers/BMVC06/cristinacce_bmvc06.pdf
Non-Patent Document 4: Piotr Dollar, Peter Welinder, and Pietro Perona, "Cascaded Pose Regression", Conference on Computer Vision and Pattern Recognition (CVPR) 2010
Non-Patent Document 5: Xavier P. Burgos-Artizzu, Pietro Perona, and Piotr Dollar, "Robust Face Landmark Estimation Under Occlusion", International Conference on Computer Vision (ICCV) 2013
Non-Patent Document 6: Shaoqing Ren, Xudong Cao, Yichen Wei, and Jian Sun, "Face Alignment at 3000 FPS via Regressing Local Binary Features", Conference on Computer Vision and Pattern Recognition (CVPR) 2014

However, because regression-based methods solve the alignment error as a regression problem in a high-dimensional space, the number of feature combinations to consider when learning the regression function becomes enormous, and the problem cannot be solved in a practical amount of time.

Existing methods (Patent Document 2 and Non-Patent Documents 4, 5, and 6) avoid this problem by creating multiple candidate classifiers from randomly selected features, comparing the candidates against one another, and keeping the best one. However, because these methods select features and create classifiers at random, their accuracy depends on how many features are selected and how many classifiers are created, which makes it difficult to obtain an optimal classifier efficiently.

The present invention has been made in view of the above problems, and its object is to provide a discriminator generation device and the like capable of efficiently generating a discriminator with good robustness for object detection.

To achieve the above object, the discriminator generation device of the present invention generates a discriminator composed of a plurality of strong discriminators, each consisting of a plurality of weak discriminators. The device receives a plurality of learning images containing the object to be detected, together with correct values giving the position of the object in each learning image; extracts a plurality of features from the learning images; selects two of the features and determines a difference feature from their difference; builds a histogram of the difference-feature values over the learning images and determines a threshold that divides the histogram into two parts of equal area; generates a tree-structured weak discriminator from the difference-feature values and the threshold; and, using the learning images and the correct values, calculates a movement amount for correcting the estimated position of the detection target in images classified by the weak discriminator.

The discriminator is generated separately for each face orientation by classifying the learning images by face orientation.

The weak discriminators and the strong discriminators are generated based on a preset target value.

The weak discriminator is generated by using dynamic programming to determine the feature used at each depth of the tree structure.

The features are Shape-Index features.

The estimation device of the present invention estimates the detection target object in an image by moving its estimated position step by step, using the discriminator generated by the discriminator generation device.

The discriminator generation method of the present invention receives a plurality of learning images containing the object to be detected, together with correct values giving the position of the object in each learning image; extracts a plurality of features from the learning images; selects two of the features and determines a difference feature from their difference; builds a histogram of the difference-feature values over the learning images and determines a threshold that divides the histogram into two parts of equal area; generates a tree-structured weak discriminator from the difference-feature values and the threshold; and, using the learning images and the correct values, calculates a movement amount for correcting the estimated position of the detection target in images classified by the weak discriminator.

The estimation method of the present invention carries out the steps of the discriminator generation method described above, from receiving the learning images and correct values through calculating the movement amount, and then uses the weak discriminator to estimate the detection target object in an image while moving its estimated position.

The program of the present invention causes a computer to execute the discriminator generation method.

A program of the present invention may also cause a computer to execute the estimation method.

The present invention makes it possible to provide a discriminator generation device and the like that efficiently generates a discriminator with good robustness for object detection.

FIG. 1: Diagram showing an example hardware configuration of the discriminator generation device 1 and the estimation device 2
FIG. 2: Conceptual diagram of facial organ estimation based on the regression-based method
FIG. 3: Conceptual diagram of the discriminator 3 according to this embodiment
FIG. 4: Conceptual diagram of the discriminator 3 (by face orientation) according to this embodiment
FIG. 5: (a) Diagram showing the tree structure of a Fern tree (weak discriminator) 5; (b) diagram showing another representation of the Fern tree
FIG. 6: Diagram showing how a discriminator is learned according to a learning target value
FIG. 7: Diagram showing features based on Shape-Index
FIG. 8: Flowchart of the learning process
FIG. 9: Diagram showing how the threshold δ is determined from a histogram H
FIG. 10: Conceptual diagram of the Fern candidate graph 7Gr
FIG. 11: Diagram showing how the combination of branch functions is determined
FIG. 12: Diagram showing how a Fern tree (weak discriminator) is generated from the determined branch functions
FIG. 13: Diagram showing how the movement amount is calculated
FIG. 14: Flowchart of the estimation process
FIG. 15: (a) Diagram showing input data; (b) diagram showing selection of the discriminator 3 according to face orientation
FIG. 16: Diagram showing estimation results

Embodiments of the present invention are described in detail below with reference to the drawings. In this embodiment, facial organs are taken as the example object: a discriminator for estimating facial organ positions is learned, and the learned discriminator is then used to estimate facial organ positions. The present invention can also be applied to objects other than facial organs.

FIG. 1 shows an example hardware configuration of the discriminator generation device 1 and the estimation device 2 according to this embodiment. As shown in FIG. 1, the discriminator generation device 1 comprises a control unit 11, a storage unit 12, a media input/output unit 13, a communication control unit 14, an input unit 15, a display unit 16, a peripheral device I/F unit 17, and the like, connected via a bus 18.

The control unit 11 consists of a CPU, ROM, RAM, and the like. The CPU loads a program stored in the storage unit 12, the ROM, a recording medium, or the like into a work memory area in the RAM and executes it, driving and controlling each device connected via the bus 18 to realize the processing described later. The ROM is a non-volatile memory that permanently holds the computer's boot program, programs such as the BIOS, data, and so on. The RAM is a volatile memory that temporarily holds programs, data, and the like loaded from the storage unit 12, the ROM, a recording medium, and so on, and provides the work area the control unit 11 uses for its various processes.

The storage unit 12 is an HDD or the like and stores the programs executed by the control unit 11, the data required to execute them, the OS, and so on. The programs include a control program corresponding to the OS and an application program that causes the computer to execute the processing described later. The control unit 11 reads each program code as needed and transfers it to the RAM, where the CPU reads and executes it as various means.

The media input/output unit 13 (drive device) inputs and outputs data and includes media input/output devices such as a CD drive (-ROM, -R, -RW, etc.) and a DVD drive (-ROM, -R, -RW, etc.). The communication control unit 14 includes a communication control device, a communication port, and the like; it is a communication interface that mediates communication between the computer and a network and controls communication with other computers over the network. The network may be wired or wireless.

The input unit 15 is used to input data and includes input devices such as a keyboard, a pointing device such as a mouse, and a numeric keypad. Operating instructions, operation commands, data, and the like can be given to the computer via the input unit 15. The display unit 16 includes a display device such as a liquid crystal panel and logic circuitry (a video adapter or the like) that provides the computer's video function in cooperation with the display device. The input unit 15 and the display unit 16 may be integrated, as in a touch panel display.

The peripheral device I/F unit 17 is a port, antenna, or the like through which the computer exchanges data with peripheral devices. The connection to a peripheral device may be wired (e.g., USB) or wireless (e.g., Bluetooth (registered trademark)). The bus 18 is the path that carries control signals, data signals, and the like between the devices.

(Facial organ estimation based on the regression-based method)
FIG. 2 is a conceptual diagram of facial organ estimation based on the regression-based method. The regression-based method treats facial organ position estimation as a regression problem using a displacement function (regression function).

As shown in FIG. 2(a), suppose an input image I containing a face and suitable initial facial organ positions S^0 are given. The regression-based method uses the displacement function (regression function) R to move the initial positions S^0 toward the correct facial organ positions (FIG. 2(b)), estimates the correct positions S^T, and obtains an output image E showing the estimation (regression) result (FIG. 2(c)). The displacement function R is usually obtained by machine learning. The discriminator generation device and the estimation device of this embodiment generate a discriminator composed of multiple displacement functions (regression functions) R and use it to estimate the position of the object in an input image.

The object to be learned and estimated is not limited to that of this embodiment; any object can be used. The number of points can also be chosen freely, and an appropriate number (roughly 1 to 200 points) can be set for each target image.

(Structure of the discriminator 3 according to this embodiment)
FIG. 3 is a conceptual diagram of the data structure of the discriminator 3 learned and generated in this embodiment. As shown in FIG. 3, the discriminator 3 is a chain of strong discriminators 4-1, ..., 4-t, ..., 4-T that function as displacement functions (regression functions) R^1, ..., R^t, ..., R^T. For brevity, a strong discriminator is sometimes written "R" in the text.

Each strong discriminator is in turn built by chaining multiple weak discriminators, which likewise function as displacement functions (regression functions). For example, the strong discriminator 4-t in FIG. 3 consists of weak discriminators 5-1, ..., 5-k, ..., 5-K, corresponding to displacement functions r^1, ..., r^k, ..., r^K. For brevity, a weak discriminator is sometimes written "r" in the text.

Thus the discriminator 3 of this embodiment has a two-level structure of strong discriminators 4 and weak discriminators 5. When a face image and initial facial organ positions S^0 are input to the discriminator 3, the positions are displaced repeatedly by each strong discriminator 4 (that is, by the weak discriminators 5 it contains), each acting as a displacement function, until the final facial organ positions S^T are estimated.
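The two-level structure can be sketched as nested loops over stages and rounds. This is a minimal Python illustration, not the patent's implementation: it assumes each weak discriminator is a callable (the hypothetical type `WeakRegressor`) that returns a displacement vector for a flattened shape, and it applies that displacement after every round. An implementation could instead index all features in a stage to the shape at the start of that stage.

```python
from typing import Callable, List, Sequence

# Assumed representation: a shape is a flat list of landmark coordinates [x1, y1, x2, y2, ...].
Shape = List[float]
# Hypothetical weak discriminator: (image, current shape) -> displacement for that shape.
WeakRegressor = Callable[[object, Shape], Shape]


def apply_cascade(image: object, s0: Shape,
                  stages: Sequence[Sequence[WeakRegressor]]) -> Shape:
    """Move the initial shape S^0 toward the final estimate S^T.

    `stages` holds the strong discriminators R^1..R^T; each stage is the list of
    weak discriminators r^1..r^K that it chains together.
    """
    shape = list(s0)
    for stage in stages:            # strong discriminator R^t
        for weak in stage:          # weak discriminator r^k inside R^t
            delta = weak(image, shape)
            shape = [s + d for s, d in zip(shape, delta)]
    return shape


# Usage: toy weak discriminators that each move the estimate halfway to a fixed point.
target = [8.0, 4.0]


def halfway(img: object, s: Shape) -> Shape:
    return [(t - x) / 2 for x, t in zip(s, target)]


print(apply_cascade(None, [0.0, 0.0], [[halfway, halfway], [halfway]]))  # [7.0, 3.5]
```

Each round only needs to predict a correction, so later rounds refine what earlier rounds left unresolved.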

In addition, the discriminator 3 of this embodiment is learned separately for each face orientation.
FIG. 4 shows discriminators 3 learned for each face orientation. As shown in the figure, discriminators for various face orientations are learned, such as a discriminator 3-1 for a frontal face, a discriminator 3-2 for a face turned diagonally left, a discriminator 3-3 for a face turned left, and so on. This makes it possible to estimate facial organ positions efficiently for images with any face orientation.

It is not strictly necessary to learn separate discriminators for each orientation, and discriminators can be built for various face orientations beyond the three patterns of this embodiment. Each face orientation need not be a single angle; a range of angles can be learned together as one group. For example, as long as every object to be learned remains within the image at that angle, face images turned 30° to the right, frontal face images, and face images turned 30° to the left can all be learned as the frontal orientation.

(Weak discriminator 5)
FIG. 5(a) shows the tree structure of a Fern tree, which serves as the weak discriminator 5 of this embodiment. Each node of the Fern tree is a "branch function" that sends the input to one of its two child nodes according to a value computed from the input image. "f" in the figure denotes a branch function. A Fern tree is a kind of decision tree, but it is distinguished by the fact that all nodes at the same depth (same layer) share the same branch function f, as shown in the figure.

FIG. 5(b) shows another representation of the Fern tree structure. Because a Fern tree uses the same branch function at each depth (layer), its tree structure can also be expressed as a one-dimensional sequence of branch functions {f_0, f_1, f_2, ..., f_S}, as shown in FIG. 5(b).
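Because a Fern shares one branch function per depth, evaluating it reduces to computing one bit per branch function and reading the bits as a leaf index. The Python sketch below assumes each branch function f_s is a (feature, threshold) pair that sends the input to child 1 when the feature value is at least the threshold; which side of the threshold maps to which child, and the names `fern_leaf_index` and `fern_predict`, are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

# Assumed form of a branch function f_s: (feature function, threshold delta).
BranchFunction = Tuple[Callable[[object], float], float]


def fern_leaf_index(x: object, branches: Sequence[BranchFunction]) -> int:
    """Leaf reached by input x in a Fern whose depth-s nodes all share branches[s].

    Since every node at one depth uses the same test, the path is just the bit
    string (f_0(x), f_1(x), ...), read here as a binary number with f_0 as the
    most significant bit.
    """
    index = 0
    for feature, threshold in branches:
        bit = 1 if feature(x) >= threshold else 0
        index = (index << 1) | bit
    return index


def fern_predict(x: object, branches: Sequence[BranchFunction],
                 leaf_outputs: Sequence[object]) -> object:
    """Return the value stored at the reached leaf (2 ** len(branches) leaves)."""
    return leaf_outputs[fern_leaf_index(x, branches)]


# Usage: a depth-2 Fern over a two-element feature vector.
branches = [(lambda v: v[0], 0.5), (lambda v: v[1], 0.5)]
print(fern_leaf_index([1.0, 0.0], branches))  # 2 (bits "10")
```

Compared with a general decision tree, this flat sequence needs only S + 1 stored tests for 2^(S+1) leaves.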

In this embodiment, the branch function at each node is determined by dynamic programming. Although the Fern tree is used here as the example tree structure for the weak discriminator, the invention is not limited to it, and weak discriminators can be built with any tree structure that can be generated within the scope of this disclosure.

(Learning according to the learning target value)
In this embodiment, a learning target value (a target for how close to the correct position the estimate must get) is set for each strong discriminator, and the number of weak discriminators chained in it is determined dynamically according to that value. The learning target value is the target accuracy of the final correct-position estimate that must be achieved when the strong discriminator is built.

By setting a learning target value for each strong discriminator in this way, each strong discriminator learns only as many weak discriminators as are needed to meet its target.

FIG. 6 shows how a strong discriminator 4-t is learned. A learning target value is set when learning of the strong discriminator 4-t begins. Weak discriminators 5-1, 5-2, ... within the strong discriminator 4-t are then generated and chained one at a time while checking whether the learning target has been reached. Once the target is reached, weak discriminator generation ends and learning moves on to the next strong discriminator. In this way, the number of weak discriminators is determined automatically by the learning target value.
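The stopping rule for one stage can be sketched as a loop that keeps adding rounds until the learning target is met. In this minimal Python sketch the error measure (mean Euclidean distance between estimated and correct shape vectors), the `fit_weak` callback that learns one round, and the `max_rounds` safety cap are all hypothetical; the text does not specify them.

```python
import math
from typing import Callable, List, Sequence, Tuple

Shape = List[float]
# Hypothetical weak discriminator: (image, current shape) -> displacement.
WeakRegressor = Callable[[object, Shape], Shape]
# Hypothetical learner for one round: (images, current shapes, correct shapes) -> weak discriminator.
FitWeak = Callable[[Sequence[object], Sequence[Shape], Sequence[Shape]], WeakRegressor]


def mean_alignment_error(current: Sequence[Shape], targets: Sequence[Shape]) -> float:
    """Assumed error measure: mean Euclidean distance to the correct shapes."""
    return sum(math.dist(c, t) for c, t in zip(current, targets)) / len(current)


def train_strong_discriminator(images: Sequence[object], shapes: Sequence[Shape],
                               targets: Sequence[Shape], fit_weak: FitWeak,
                               target_error: float, max_rounds: int = 1000
                               ) -> Tuple[List[WeakRegressor], List[Shape]]:
    """Add rounds (weak discriminators) until the stage's learning target is met."""
    current = [list(s) for s in shapes]
    weaks: List[WeakRegressor] = []
    while (mean_alignment_error(current, targets) > target_error
           and len(weaks) < max_rounds):
        weak = fit_weak(images, current, targets)
        weaks.append(weak)
        # Apply the new round so the next round learns from the remaining error.
        current = [[c + d for c, d in zip(s, weak(img, s))]
                   for img, s in zip(images, current)]
    return weaks, current


# Usage: a toy learner whose weak discriminator halves the remaining error.
def fit_half(images, current, targets):
    goal = targets[0]
    return lambda img, s: [0.5 * (g - x) for x, g in zip(s, goal)]


weaks, shapes = train_strong_discriminator([None], [[0.0]], [[8.0]], fit_half, target_error=1.0)
print(len(weaks), shapes)  # 3 [[7.0]]
```

A tighter target simply produces more rounds, which is the behaviour FIG. 6 describes.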

In this embodiment, each learning step for the strong discriminators 4-1, ..., 4-t, ..., 4-T is called a "stage", and each learning step for the weak discriminators 5-1, ..., 5-k, ..., 5-K within a strong discriminator is called a "round".

(Features)
The features used to build the discriminator in this embodiment are computed with the Shape-Index features described in Non-Patent Document 5. A Shape-Index represents the feature at a pixel position computed from two of a set of arbitrary points. Using Shape-Index has the advantage that highly invariant features are obtained even when the face shape changes greatly. See Non-Patent Document 5 for details of Shape-Index. In this embodiment, the luminance difference (pixel difference) between two computed Shape-Index points is used as the difference feature for the nodes of the weak discriminator.

FIG. 7 illustrates Shape-Index. 6A and 6B in FIG. 7 are Shape-Index pixel positions whose pixel values (luminance values) are I_A and I_B, respectively. In this embodiment, as shown in the figure, the luminance difference between two Shape-Index points, I_A - I_B, is used as the feature.
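A shape-indexed pixel difference can be sketched as follows. The exact Shape-Index parameterisation is left to Non-Patent Document 5, so this Python sketch assumes one common form: a pixel position placed on the segment between two landmarks of the current shape at a fraction alpha. The `(i, j, alpha)` triple, the clamping at the image border, and the row-major `image[y][x]` layout are all assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical Shape-Index parameterisation: (landmark i, landmark j, fraction alpha).
ShapeIndex = Tuple[int, int, float]


def shape_indexed_position(shape: Sequence[Point], index: ShapeIndex) -> Tuple[int, int]:
    """Pixel on the segment from landmark i to landmark j at fraction alpha.

    Because the position is defined relative to the current shape, it moves with
    the face as the shape estimate changes.
    """
    i, j, alpha = index
    (xi, yi), (xj, yj) = shape[i], shape[j]
    return round(xi + alpha * (xj - xi)), round(yi + alpha * (yj - yi))


def pixel_difference(image: Sequence[Sequence[float]], shape: Sequence[Point],
                     index_a: ShapeIndex, index_b: ShapeIndex) -> float:
    """Difference feature I_A - I_B; image is indexed image[y][x]."""
    height, width = len(image), len(image[0])

    def intensity(index: ShapeIndex) -> float:
        x, y = shape_indexed_position(shape, index)
        # Clamp so that positions falling outside the image stay valid.
        x = min(max(x, 0), width - 1)
        y = min(max(y, 0), height - 1)
        return image[y][x]

    return intensity(index_a) - intensity(index_b)


image = [[10 * y + x for x in range(3)] for y in range(3)]
print(pixel_difference(image, [(0.0, 0.0), (2.0, 2.0)], (0, 1, 0.5), (0, 1, 0.0)))  # 11
```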

<Learning operation>
Next, the process by which the discriminator generation device 1 of this embodiment generates and learns the discriminator 3 (the learning process) is described with reference to the flowchart in FIG. 8.

First, the control unit 11 of the discriminator generation device 1 receives as learning data N learning images I_i (i = 1 to N) containing faces and the correct (target) facial organ positions S*_i (i = 1 to N) in each learning image (step S1), and sets initial facial organ positions S^0 (the initial value for learning) to be used for learning (step S2).

Any method may be used to set the initial position S^0; in this embodiment, the mean of the correct (target) facial organ positions, S^0 = (S*_1 + ... + S*_N) / N, is used.
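Computing the initial position as the mean of the correct shapes is a coordinate-by-coordinate average over the N training shapes. A minimal Python sketch, assuming each S*_i is stored as a flat list of coordinates in the same landmark order (the name `mean_shape` is illustrative):

```python
from typing import List, Sequence


def mean_shape(correct_shapes: Sequence[Sequence[float]]) -> List[float]:
    """S^0 = (S*_1 + ... + S*_N) / N, averaged coordinate by coordinate."""
    n = len(correct_shapes)
    return [sum(coords) / n for coords in zip(*correct_shapes)]


print(mean_shape([[0.0, 2.0], [2.0, 4.0]]))  # [1.0, 3.0]
```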

Learning of the discriminator 3 then proceeds. First, the control unit 11 initializes the index t of the current learning stage (the strong discriminator being learned) of the discriminator 3 (t = 1, step S3).

Next, the control unit 11 generates candidate branch functions, each consisting of a feature and a threshold, from which the weak discriminators in learning stage t (the strong discriminator 4-t, R^t, being learned) are built. Specifically, the control unit 11 first randomly selects P pixel positions (about 40,000 in this embodiment) for computing Shape-Index features, and then computes, for all learning images, M features based on the Shape-Index given in Formula 1 from the selected P pixel positions (step S4).

If the luminance difference is computed for every ordered pair of the P Shape-Index points, M = P(P - 1).
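The count M = P(P - 1) and the per-image feature values can be sketched as follows. This Python sketch assumes the intensities at the P sampled positions have already been read for each learning image (`pixel_values`, a hypothetical layout); with P around 40,000 the full pair list would have about 1.6 billion entries, so the closed form is computed separately instead of materialising it.

```python
import itertools
from typing import List, Sequence, Tuple


def difference_feature_pairs(num_positions: int) -> List[Tuple[int, int]]:
    """All ordered pairs (A, B), A != B, of the P sampled Shape-Index positions."""
    return list(itertools.permutations(range(num_positions), 2))


def num_difference_features(num_positions: int) -> int:
    """Closed form M = P (P - 1), without materialising the pairs."""
    return num_positions * (num_positions - 1)


def difference_feature_values(pixel_values: Sequence[Sequence[float]],
                              pairs: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """pixel_values[n][p] is the intensity at sampled position p in learning image n.

    Returns values[n][m] = I_A - I_B for the m-th pair (A, B) on image n.
    """
    return [[row[a] - row[b] for a, b in pairs] for row in pixel_values]


print(num_difference_features(40_000))  # 1599960000
```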

Next, the control unit 11 sets an appropriate threshold δ for each of the M features (step S5). Specifically, for each of the M features, the control unit 11 builds the distribution (histogram) of the number of learning images over the feature values computed from all learning images, and sets the threshold δ to the median, which divides the area of the distribution in two. The threshold δ could be set based on various criteria; in this embodiment, it is set to the feature value at which the histogram splits into two parts of equal area.

FIG. 9 conceptually illustrates the processing of step S5. As shown in FIG. 9(a), for each difference feature (feature_1, feature_2, ..., feature_M), that is, the difference between the pixels at the relative positions given by two Shape-Index points, the control unit 11 computes the feature value for every learning image and builds the histograms H_1, H_2, ..., H_M. Then, as shown in FIG. 9(b), the control unit 11 sets the thresholds δ_1, δ_2, ..., δ_M that divide each histogram into two equal halves.

In this way, the features randomly selected in step S4 (feature_1, feature_2, ..., feature_M) and the thresholds set in step S5 (δ_1, δ_2, ..., δ_M) yield M candidate branch functions (F_1, F_2, ..., F_M) from which the nodes of the weak classifiers are built (Formula 2).
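Step S5 and the resulting candidates can be sketched in a few lines. The equal-area split point of a histogram is the median, as the text states, so the sketch uses `np.median`. Formula 2 is not reproduced here, so the comparison direction (`>=` sends an image to one side) and the function names are assumptions.

```python
import numpy as np

def equal_area_thresholds(feature_matrix):
    """Threshold delta_m for each of the M difference features.

    feature_matrix has shape (N, M): N learning images, M difference features.
    The median splits each feature's empirical distribution into two equal halves.
    """
    return np.median(feature_matrix, axis=0)

def branch_outcomes(feature_matrix, thresholds):
    """Evaluate every candidate branch function F_m on every image (True = one side of delta_m)."""
    return feature_matrix >= thresholds
```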

In the subsequent processing, the control unit 11 determines which of the candidate branch functions (F_1, F_2, ..., F_M) above to assign to each node of the weak classifier, thereby constructing an optimal weak classifier.
First, the control unit 11 initializes the index k of the current learning round (the weak classifier being learned) to k = 1 (step S6).

Next, the control unit 11 creates, from the candidate branch functions (F_1, F_2, ..., F_M), a "Fern candidate graph Gr" that represents every possible sequence of branch functions as a graph (step S7).
FIG. 10 is a conceptual diagram of the Fern candidate graph 7Gr. The nodes 7n of the Fern candidate graph 7Gr represent the candidate branch functions (F_1, F_2, ..., F_M), and the edges 7e represent all the connections between candidate branch functions.

The control unit 11 then selects an appropriate path, that is, a combination of branch functions, from the Fern candidate graph 7Gr composed of the nodes 7n and the edges 7e (step S8). Specifically, using dynamic programming, the control unit 11 selects the combination of branch functions that minimizes the cumulative cost function, obtained by summing the cost function over all depths (the sign of the cost function may be reversed, in which case the cumulative cost function is maximized instead). The cumulative cost function is defined as follows.

In the above equation, S is the depth of the leaf nodes of the Fern tree (the height of the Fern tree), that is, the number of branch functions making up the Fern tree. The depth S is a parameter that the user can set freely. The index s (= 0 to S) denotes the depth of each node of the Fern tree.

In the above equation, the first term on the right-hand side, U(f_s), is the local cost incurred when the branch function at depth s of the Fern tree is f_s (hereinafter, U(f_s) is also called the "data term"). The second term, P(f_s, f_{s−1}), is the state-transition cost between branch functions, and the constant λ is its weight (hereinafter, P(f_s, f_{s−1}) is also called the "smoothing term"). The data term U(f_s) and the smoothing term P(f_s, f_{s−1}) are described in detail below.
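The cumulative cost equation did not survive extraction. Reading the definitions above together (a data term at every depth s = 0 to S plus a transition term between consecutive depths weighted by λ), it presumably has the form below. This is a reconstruction from the surrounding text rather than the patent's own formula, and the name E is introduced here only for reference.

```latex
E(f_0, \dots, f_S) = \sum_{s=0}^{S} U(f_s) + \lambda \sum_{s=1}^{S} P(f_s, f_{s-1})
```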

1. Data term U(f_s)
The data term U(f_s) is an index that mainly expresses the discriminating ability of a branch function; specifically, it is calculated by the expression shown in Formula 4.

α: adjustment parameter
σ_w1, σ_w2: within-class variance of each class after the split
σ_b: between-class variance after the split

In the formula, ΔS_1 is the mean distance between the current position S_i and the correct position S*_i (i ∈ Ω_1) over the learning images I_i (i ∈ Ω_1) that the branch function assigns to one of its two groups, and ΔS_2 is the same mean distance over the learning images I_i (i ∈ Ω_2) assigned to the other group. They are calculated as follows.

b = 1, 2
Ω_b: set of learning images contained in this bin
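Formula 5 is also missing from this text. Following its verbal definition, the sketch below computes ΔS_1 and ΔS_2 as the mean distance between current and correct positions in each of the two groups produced by a branch function. Representing a position as an (L, 2) array of landmark coordinates, measuring distance as the Euclidean norm of the whole shape difference, and the function name are assumptions.

```python
import numpy as np

def mean_distance_per_bin(current, correct, goes_left):
    """Delta S_b for b = 1, 2: mean distance between current and correct positions in each group.

    current, correct: arrays of shape (N, L, 2) holding L landmark coordinates per image.
    goes_left: boolean array of shape (N,), True for images the split sends to Omega_1.
    Distance is the Euclidean norm of the flattened shape difference (an assumption).
    """
    dist = np.linalg.norm((current - correct).reshape(len(current), -1), axis=1)
    results = []
    for mask in (goes_left, ~goes_left):
        results.append(dist[mask].mean() if mask.any() else 0.0)   # empty group -> 0.0
    return tuple(results)
```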

The smaller the value of the data term U(f_s), the higher the discriminating ability of the branch function. A branch function that lowers U(f_s) (one with high discriminating performance) is, first, one that increases ΔS_1 and ΔS_2 in Formula 4 (that is, the values given by Formula 5): a branch function that increases the mean distance between the current position and the correct position within each group of the split. As a result, branch functions that tend to bring the current positions of the learning images closer to the correct positions are more likely to be selected as optimal.

Second, a branch function that lowers U(f_s) is one that decreases α(σ_w1 + σ_w2)/σ_b in Formula 4. In other words, a branch function that reduces the within-class variance of the two distributions after the split and increases the between-class variance (that is, one that increases the separation between the two distributions) is more likely to be selected as optimal.

2. Smoothing term P(f_s, f_{s−1})
The smoothing term P(f_s, f_{s−1}) is the state-transition cost between branch functions; it evaluates how strongly the branch function f_{s−1} at the preceding depth (s − 1) of the Fern tree is related to the branch function f_s at the current depth s. It is calculated as shown in Formula 6.

b = 1 or b = 2

Here, m_1 is the number of learning images that the branch function f_{s−1} at the preceding depth (s − 1) assigned to one bin (for example, Ω_1) and that the branch function f_s at the current depth s also assigns to Ω_1. Conversely, m_2 is the number of learning images assigned to Ω_1 by f_{s−1} that f_s assigns to Ω_2.

In the above formula, the smoothing term P(f_s, f_{s−1}) is smallest when m_1 = m_2. In other words, a branch function f_s that splits the learning images in each bin from the preceding depth (s − 1) evenly between the bins at the current depth s, without bias, is more likely to be selected as optimal. The purpose of the smoothing term is to make the node at the current depth s select a branch function (feature) that is as weakly related as possible to the branch function (feature) assigned to the node at the preceding depth (s − 1). This removes bias in what each node (each branch function) of the Fern tree (weak classifier) discriminates, which in turn reduces the imbalance in the number of learning images per bin and prevents overfitting and noise.
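Formula 6 is not reproduced here either. The sketch computes the counts m_1 and m_2 exactly as defined above, for the images that the previous-depth function f_{s−1} sent to a bin b, and pairs them with a hypothetical stand-in penalty |m_1 − m_2| / (m_1 + m_2) that, like the patent's smoothing term, is smallest when m_1 = m_2. The stand-in is not Formula 6, and both function names are illustrative.

```python
import numpy as np

def split_counts(prev_outcome, cur_outcome, b=True):
    """m_1, m_2 for the images that f_{s-1} sent to bin b.

    prev_outcome, cur_outcome: boolean arrays (N,) giving each image's side under f_{s-1} and f_s.
    m_1 counts images that f_s sends to the same side b; m_2 counts those sent to the other side.
    """
    in_bin = prev_outcome == b
    m1 = int(np.sum(in_bin & (cur_outcome == b)))
    m2 = int(np.sum(in_bin & (cur_outcome != b)))
    return m1, m2

def balance_penalty(m1, m2):
    """Stand-in smoothing cost (NOT Formula 6): 0 when m1 == m2, 1 when one side is empty."""
    total = m1 + m2
    return abs(m1 - m2) / total if total else 0.0
```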

The data term U(f_s) and the smoothing term P(f_s, f_{s−1}) that make up the cumulative cost function have been described above.
The control unit 11 selects (determines), by dynamic programming, the combination of branch functions shown in Formula 7 that minimizes this cumulative cost function. In the dynamic programming, the cumulative value of the cost function is computed depth by depth for s = 0 to S. At each node (candidate branch function) at depth s, the path from the nodes (candidate branch functions) at the preceding depth (s − 1) that minimizes the cost U(f_s) + λP(f_s, f_{s−1}) (the partial optimal path) is retained as a back pointer. Once the cumulative cost has been computed down to the deepest level S, the back pointers are traced (traceback) from the node (branch function) at depth S with the smallest cumulative cost, which determines the combination of branch functions. The difference features of the selected branch functions and their associated thresholds are stored as the components of the weak classifier at each depth.
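The forward pass with back pointers and the final traceback described above amount to a Viterbi-style minimization over the Fern candidate graph. A minimal sketch, assuming the data-term costs and transition costs have already been tabulated as arrays (how U and P are computed is outside this sketch, and the function name is hypothetical):

```python
import numpy as np

def select_branch_functions(unary, pairwise, lam):
    """Minimise sum_s unary[s, f_s] + lam * sum_{s>=1} pairwise[f_s, f_{s-1}] by dynamic programming.

    unary:    (S+1, M) data-term costs U for each depth s and candidate F_m.
    pairwise: (M, M) transition costs, pairwise[j, i] = P(F_j at depth s, F_i at depth s-1).
    Returns the selected candidate indices (f_0, ..., f_S) and the minimum cumulative cost.
    """
    depths, m = unary.shape
    cost = unary[0].copy()                      # cumulative cost of each node at depth 0
    back = np.zeros((depths, m), dtype=int)     # back pointers to the best predecessor
    for s in range(1, depths):
        # total[j, i]: best cost of reaching candidate i at s-1, then moving to candidate j at s
        total = cost[None, :] + lam * pairwise
        back[s] = np.argmin(total, axis=1)
        cost = total[np.arange(m), back[s]] + unary[s]
    # traceback from the cheapest node at the deepest level S
    path = [int(np.argmin(cost))]
    for s in range(depths - 1, 0, -1):
        path.append(int(back[s, path[-1]]))
    return path[::-1], float(cost.min())
```

Nothing here prevents the same candidate from being chosen at two depths; with λ > 0, setting the diagonal of `pairwise` to `np.inf` would forbid that if desired.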

FIG. 11 is a conceptual diagram showing how a combination of branch functions is selected by dynamic programming. The numbers in FIG. 11 are the cumulative cost values at each depth (s = 0 to S). In the case of FIG. 11, (F_1003, F_3843, F_834, ..., F_7498, F_5111) has been selected as the combination of branch functions.

The control unit 11 constructs the weak classifier r^k from the combination of branch functions obtained by dynamic programming (step S9).
FIG. 12 shows how the weak classifier r^k is constructed from the combination of branch functions shown in Formula 7.
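Because each of the S selected branch functions contributes one binary decision, evaluating a Fern tree reduces to forming an S-bit index that picks one of 2^S bins. A sketch, with the bit convention (value ≥ threshold gives 1, depth 0 as the most significant bit) and the function name as assumptions:

```python
import numpy as np

def fern_bin(feature_values, thresholds):
    """Leaf (bin) index of a Fern tree, in the range 0 .. 2**S - 1.

    feature_values: (S,) difference-feature values of one image for the S selected branch functions.
    thresholds:     (S,) matching thresholds delta.
    """
    bits = (np.asarray(feature_values) >= np.asarray(thresholds)).astype(int)
    index = 0
    for bit in bits:                      # depth 0 first, so it ends up as the most significant bit
        index = (index << 1) | int(bit)
    return index
```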

The control unit 11 determines a movement amount for correcting the input position (the current estimated position) of the object in the detection-target image classified by the constructed weak classifier r^k (step S10).
The movement amount is determined as follows. For a learning image in which the position of the detection-target object (the correct value) is known, an arbitrary value (the learning value) is input as the position of the object. The learning value of that learning image is then classified with the constructed weak classifier r^k, and the leaf node into which it falls is determined. Next, the distance between the learning value and the correct value of the classified learning image is calculated. This classification is repeated with multiple different learning images and/or learning values. The average of the distances between learning value and correct value over all learning data classified into the same leaf node is then computed, and this average is used as the movement amount for the estimated position of the object in any image classified into that leaf node.
That is, letting Ω_b be the set of learning images in the bin classified by each leaf node of the Fern tree (weak classifier) r^k, S*_i (i ∈ Ω_b) be the correct position of each learning image I_i, and S^{k−1}_i (i ∈ Ω_b) be its current position, the movement amount ΔS can be formulated as in Formula 8.

Ω_b: set of learning images classified into this bin
b = 1 to 2^S
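Formula 8 is not reproduced in this text. Following the description above, the sketch takes the movement amount of each bin as the mean displacement from current to correct position over the learning images that fall into that bin; it is treated as a vector displacement rather than a scalar distance, since it is later added to positions. Storing positions as flattened (N, D) arrays, assigning zero movement to empty bins, and the function name are assumptions.

```python
import numpy as np

def leaf_movements(bins, current, correct, n_bins):
    """Mean displacement S*_i - S^{k-1}_i over the learning images that fall in each bin.

    bins:    (N,) integer bin index of each learning image under the weak classifier r^k.
    current: (N, D) current estimated positions S^{k-1}_i (landmark coordinates flattened to D values).
    correct: (N, D) correct positions S*_i.
    """
    moves = np.zeros((n_bins, current.shape[1]))
    for b in range(n_bins):
        members = bins == b
        if members.any():                 # empty bins keep a zero movement (assumption)
            moves[b] = (correct[members] - current[members]).mean(axis=0)
    return moves
```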

FIG. 13 shows how the movement amounts are obtained at a leaf node 51 of the constructed weak classifier r^k. As shown in the figure, the movement amounts ΔS^k_1 and ΔS^k_2 are calculated from the learning images I_1 and I_2 classified into the respective bins (Ω_1, Ω_2) at the leaf node 51.

The control unit 11 calculates the movement amounts ΔS^k_1 and ΔS^k_2 at every leaf node 51 of the weak classifier r^k and stores them as the output values (regression results) of the weak classifier r^k. In the estimation processing described later, the facial organ positions are estimated by retrieving the output values stored here for each Fern tree (weak classifier) r^k.

When detecting object positions, the control unit 11 updates the currently estimated facial organ position S^{k−1}_i of each learning image I_i (i = 1 to N) based on the calculated movement amounts, as in Formula 9. The movement amount applied to each learning image is, as described above, the one calculated for the bin into which that learning image was finally classified at the leaf node.

The control unit 11 also updates the learning values of the learning images using the constructed classifier, evaluates the round error e_round with the formula below (step S11), and decides whether to end round learning (learning of weak classifiers).

The round error e_round expresses how close the current position S^{t−1}_i of each input learning image I_i has come to the correct position S*_i within the current stage t (within the strong classifier 4-t (R^t)), as a ratio relative to the distance between the initial position S^0_i of the learning image I_i and the correct position S*_i. At each stage, the control unit 11 compares this value with a preset learning target value (the ratio by which the current position should approach the correct position) to decide whether to continue or end learning.
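Formulas 9 and 10 are not reproduced here. A sketch under two assumptions: Formula 9 adds each image's bin movement to its current position, and e_round is the fraction of the initial total distance to the correct positions that has been eliminated so far, which is consistent with learning continuing while e_round is below the target. Both function names are hypothetical.

```python
import numpy as np

def update_positions(current, bins, moves):
    """Apply the movement of the bin each image falls into: S^k_i = S^{k-1}_i + dS_b(i) (assumed form)."""
    return current + moves[bins]

def round_error(current, initial, correct):
    """Fraction of the initial total distance to the correct positions removed so far (assumed form).

    current, initial, correct: (N, D) arrays of flattened positions.
    """
    now = np.linalg.norm(current - correct, axis=1).sum()
    start = np.linalg.norm(initial - correct, axis=1).sum()
    return 1.0 - now / start
```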

If e_round < learning target value (step S12; No), the processing moves to the next learning round (k ← k + 1, step S27) and learning of Fern trees (weak classifiers) continues. If e_round ≥ learning target value (step S12; Yes), learning of Fern trees (weak classifiers) for that stage ends, and the strong classifier R^t is constructed from the learned Fern trees (weak classifiers) r^1, ..., r^k (step S14).
In this way, the optimal number of connected Fern trees (weak classifiers), that is, the number of learning rounds, is determined dynamically according to the learning target value set for each stage.

When learning of the current stage t has ended (step S12; Yes), the control unit 11 further evaluates the stage error e_stage, an index of how far learning has converged, with Formula 11 (step S15), and decides whether to end stage learning (learning of strong classifiers). The formula below is valid when the current stage satisfies t ≥ 3.

If e_stage < 1 (step S16; No), the control unit 11 judges that learning has not converged, increments the learning stage t (t ← t + 1, step S17), and moves to the learning stage of the next strong classifier (returning to step S4). If e_stage ≥ 1 (step S16; Yes), the control unit 11 judges that learning has converged and ends learning of the strong classifiers.

When learning of the strong classifiers is complete, the control unit 11 constructs the classifier 3 from the learned strong classifiers (R^1, ..., R^t) and stores it in the storage unit 12 (step S18).

The learning process of the classifier generation device 1 according to the present embodiment has been described in detail above.

<Estimation operation>
Next, with reference to the flowchart in FIG. 14, the facial organ estimation process, in which the estimation device 2 according to the present embodiment estimates facial organ positions from a face image, will be described. It is assumed that the classifier 3 generated by the classifier generation device 1 is stored in the storage unit 12 of the estimation device 2.

The control unit 11 of the estimation device 2 receives input data containing the input image I to be estimated, the initial position S^0 of the facial organs, and face direction information D (step S31). The control unit 11 then selects and reads the classifier 3 corresponding to the input face direction information D (step S32).

FIG. 15(a) shows the input data, and FIG. 15(b) shows the selected classifier 3. As shown in FIG. 15(a), the input image I, the initial position S^0 of the facial organs, and the face direction information D ("front") are given as input data. In this case, as shown in FIG. 15(b), the control unit 11 selects the classifier 3-1 corresponding to the face direction information D ("front"). The initial position S^0 is an arbitrary predicted position of the object to be estimated in the input image I.

The control unit 11 initializes the index t of the estimation stage performed by the strong classifier R^t of the classifier 3 (t = 1, step S33), and initializes the estimation round k performed by the Fern tree (weak classifier) r^k within the estimation stage (k = 1, step S34).

The control unit 11 inputs the input image I and the previous position S^{k−1} (initially the initial position S^0) to the weak classifier r^k, classifies the input image I by referring to the learned parameters (features and thresholds) of the weak classifier, and obtains the movement amount ΔS^k (= r^k(I, S^{k−1})) as its output value (step S35).
The control unit 11 then moves the current position using the obtained movement amount ΔS^k and the previous position S^{k−1}, as follows (step S36): S^k = S^{k−1} + ΔS^k.

If the weak classifier r^k is not the last weak classifier in the strong classifier R^t (step S37; No), the control unit 11 increments the estimation round k (k ← k + 1, step S38) and continues the estimation with the next weak classifier.
If the weak classifier r^k is the last weak classifier in the strong classifier R^t (step S37; Yes), the control unit 11 further determines whether the strong classifier R^t is the last strong classifier of the classifier 3 (step S39).

If the strong classifier R^t is not the last strong classifier of the classifier 3 (step S39; No), the control unit 11 increments the estimation stage t (t ← t + 1, step S40), returns to step S4, and proceeds to estimation with the next strong classifier.

If the strong classifier R^t is the last strong classifier of the classifier 3 (step S39; Yes), the control unit 11 ends the estimation process and outputs the estimation result (step S41).
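The estimation flow of steps S33 to S41 is a nested loop over stages and rounds. A sketch, in which each weak classifier is modeled as a hypothetical callable returning the movement amount ΔS^k for the current image and position (in the patent, the movement stored for the leaf node reached); the function name and this interface are assumptions.

```python
import numpy as np

def estimate(image, initial_position, classifier):
    """Run the cascade: strong classifiers R^t in order, each applying its weak classifiers r^k in order.

    classifier: list of stages; each stage is a list of callables r(image, position) -> movement.
    """
    position = np.asarray(initial_position, dtype=float)   # S^0
    for stage in classifier:                               # t = 1, 2, ... (steps S33, S39, S40)
        for weak in stage:                                 # k = 1, 2, ... (steps S34, S37, S38)
            position = position + weak(image, position)    # steps S35 and S36
    return position                                        # final estimate S^T (step S41)
```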

FIG. 16 shows the facial organ estimation result. As shown in the figure, the final estimated positions S^T of the facial organs are calculated and plotted on the output image E.

Preferred embodiments of the classifier generation device 1, the estimation device 2, and so on according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. For example, the present invention includes the following modifications.

For example, the depth of the tree can also be determined dynamically: a target value is given in advance, and the depth of the tree is increased until that target value is met, so that the optimal depth is found automatically.

The weak classifier of the present invention also need not be a Fern tree. For example, if a NULL candidate (selecting no feature) is added to the feature candidates, dynamic programming can construct a weak classifier with a tree structure that is not a Fern tree. Alternatively, a tree structure other than a Fern tree can be constructed by manually pruning unnecessary branches.

In step S2, the control unit 11 of the classifier generation device 1 may also set multiple different initial positions for each learning image, artificially increasing the amount of learning data available for training the classifier 3. For example, setting three different initial-position patterns for N learning images triples the amount of learning data.

In general, the quality of the estimation result of regression-based methods tends to vary depending on how the initial position is given. The control unit 11 of the estimation device 2 may therefore judge the validity of the initial position once a predetermined number of estimation rounds has been completed (roughly 10% of all estimation rounds), and, if the initial position is judged invalid, redo the estimation from a different initial position. Specifically, when the index of the estimation stage at which validity is judged is t, the variance of the displacement between the input position S^{t−1} given to the strong classifier 4-t (R^t) and the output position S^t is calculated. If this variance is smaller than a predetermined threshold, the given initial position is judged valid (likely to converge to a good solution); if it is larger, the initial position is judged invalid (unlikely to converge to a good solution). Concretely, this displacement corresponds to the sum of the displacements of the weak classifiers 5 within the strong classifier 4-t (R^t) estimated in step S35.
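The validity check can be sketched as follows. The text does not say over what the variance is taken; this sketch assumes it is over the lengths of the per-landmark displacements between S^{t−1} and S^t, and the function name and threshold are hypothetical.

```python
import numpy as np

def initial_position_is_plausible(stage_input, stage_output, threshold):
    """Judge an initial position after stage t from the spread of that stage's displacements.

    stage_input, stage_output: (L, 2) landmark positions S^{t-1} and S^t.
    A variance below the threshold is read as "likely to converge to a good solution".
    """
    displacement = np.linalg.norm(stage_output - stage_input, axis=1)   # one length per landmark
    return float(np.var(displacement)) < threshold
```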

Although the present embodiment uses features based on the Shape-Index, the invention is not limited to these, and other features such as Haar-like, SIFT, or SURF features may be used.

It is obvious that those skilled in the art can conceive of various changes or modifications within the scope of the technical idea disclosed in the present application, and these are naturally understood to belong to the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
1: Classifier generation device
2: Estimation device
3: Classifier, regression function
4, R: Strong classifier, regression function
5, r: Fern tree (weak classifier), regression function
51: Leaf node
6A, 6B: Shape-Index
H: Histogram
δ: Threshold
7Gr: Fern candidate graph
7n: Node of Fern candidate graph 7Gr
7e: Edge of Fern candidate graph 7Gr
I_i: Learning image
I: Input image
E: Output image
D: Face direction information
S: Position
11: Control unit
12: Storage unit
13: Media input/output unit
14: Communication control unit
15: Input unit
16: Display unit
17: Peripheral device I/F unit
18: Bus

Claims

A classifier generation device that generates a classifier built from a plurality of strong classifiers, each composed of a plurality of weak classifiers,
Receiving a plurality of learning images including an object to be detected, and a correct value that is the position of the object of the learning image;
Extracting a plurality of features from the learning image;
Selecting two of the plurality of features and determining a difference feature from the difference between them;
Generating a histogram from the feature quantity of the difference feature using the plurality of learning images, determining a threshold value at which the areas of the histogram are equal;
Generating the weak classifier having a tree structure using the feature values of the difference feature and the threshold;
A classifier generation device characterized by calculating, using the learning images and the correct values, a movement amount for correcting the estimated position of the detection target in an image classified by the weak classifier.

The classifier generating apparatus according to claim 1, wherein the classifier classifies the plurality of learning images for each face direction to generate a classifier for each direction.

The classifier generation device according to claim 1 or 2, wherein the weak classifier and the strong classifier are generated based on a preset target value.

The classifier generation device according to any one of claims 1 to 3, wherein the weak classifier is generated by determining, using dynamic programming, the feature to be used at each depth of the tree structure.

The classifier generation device according to any one of claims 1 to 4, wherein the features are Shape-Index features.

An estimation device that estimates a detection target object in an image while moving the estimated position of the object, using the classifier generated by the classifier generation device according to any one of claims 1 to 5.

Receiving a plurality of learning images including an object to be detected, and a correct value that is the position of the object of the learning image;
Extracting a plurality of features from the learning image;
Selecting two of the plurality of features and determining a difference feature from the difference between them;
Generating a histogram from the feature quantity of the difference feature using the plurality of learning images, determining a threshold value at which the areas of the histogram are equal;
Generating a weak classifier having a tree structure using the feature values of the difference feature and the threshold;
A classifier generation method characterized by calculating, using the learning images and the correct values, a movement amount for correcting the estimated position of the detection target in an image classified by the weak classifier.

Receiving a plurality of learning images including an object to be detected, and a correct value that is the position of the object of the learning image;
Extracting a plurality of features from the learning image;
Selecting two of the plurality of features and determining a difference feature from the difference between them;
Generating a histogram from the feature quantity of the difference feature using the plurality of learning images, determining a threshold value at which the areas of the histogram are equal;
Generating a weak classifier having a tree structure using the feature values of the difference feature and the threshold;
Calculating, using the learning images and the correct values, a movement amount for correcting the estimated position of the detection target in an image classified by the weak classifier;
An estimation method characterized by estimating the detection target object while moving its estimated position in the image using the weak classifier.

A program for causing a computer to execute the classifier generation method according to claim 7.

A program for causing a computer to execute the estimation method according to claim 8.