JP3960535B2

JP3960535B2 - Subsampling method and computer-executable program for real-time blob detection and tracking in an image stream based on affine deformation

Info

Publication number: JP3960535B2
Application number: JP2002234243A
Authority: JP
Inventors: アレシュウデ; クリストファーアトキソン
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2002-08-12
Filing date: 2002-08-12
Publication date: 2007-08-15
Anticipated expiration: 2022-08-12
Also published as: JP2004163989A

Description

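[0001]
[Technical field of the invention]
The present invention relates to a subsampling method and program for detecting and tracking objects in an image stream, and more particularly to a pixel subsampling method and computer-executable program for detecting and tracking approximately elliptical objects in real time.
[0002]
[Prior art]
With the development of high-performance computers, the capacity of computers to handle large amounts of information in a very short time is being developed in a wide range of fields. One typical example is a robot that can interact with people in real time. Another is a computer program that resides on a computer or computer network, interacts with people, and acts on its own judgment. In this application, such a robot or program is called an "agent".
[0003]
To realize such real-time interaction between a person and an agent, the ability to observe people and their activities using one or more cameras, mounted for example on the head of a humanoid robot, is a necessary prerequisite [5]. A system that can detect and track multiple objects in images acquired from cameras operating at a high frame (or field) rate, for example 60 Hz, would be of great benefit.
[0004]
Probabilistic "blob trackers", most of which are based on maximizing some likelihood function, have recently become increasingly common. Many blob trackers have been proposed that use modalities such as color histograms, Gaussian mixture color models, intensity gradients, depth, and optical flow, or combinations of these modalities [1,2,3,4,5,6]. As used here, a "blob" is an approximately elliptical object in an image. Because of the nature of the applications, blobs in an image stream must in many cases be detected and tracked in real time.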
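[0005]
Advanced trackers require a considerable amount of processing at each image pixel and therefore cannot be applied to the whole image in real time. Since the per-pixel processing usually cannot be simplified without reducing the reliability of the resulting tracker, many practical tracking systems instead adopt techniques such as windowing, masking, and subsampling to reduce the amount of information to be processed.
[0006]
In windowing, only a rectangular sub-image (window) is processed instead of the whole image.
[0007]
In masking, a binary image (mask) with values 0 or 1 is defined, and only those pixels of the original image whose corresponding mask value equals 1 are processed.
[0008]
In subsampling, only every second row and every second column of the image, for example, are processed. The step need not be two; every third row or every fourth column may be used instead.
[0009]
A feature common to the prior art blob trackers [1,2,3,4,5,6] is that the shape of the tracked object is approximated by the second-order statistics of the pixels that are probabilistically classified as "blob pixels". As used here, "second-order statistics" means, in the Gaussian case, computing the mean and the covariance matrix of the pixels contained in the blob.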
[0010]
[References]
[1] C. Bregler. Learning and recognizing human dynamics in video sequences. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, pp. 569-574, San Juan, Puerto Rico, 1997.
[2] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, Vol. 2, pp. 142-149, Hilton Head, South Carolina, 2000.
[3] N. Jojic, M. Turk, and T.S. Huang. Tracking self-occluding articulated objects in dense disparity maps. In Proc. 7th Int. Conf. Computer Vision, pp. 123-130, Kerkyra, Greece, 1999.
[4] S.J. McKenna, Y. Raja, and S. Gong. Tracking colour objects using adaptive mixture models. Image and Vision Computing, 17(3-4): 225-231, March 1999.
[5] A. Ude, T. Shibata, and C.G. Atkeson. Real-time visual system for interaction with a humanoid robot. Robotics and Autonomous Systems, 37(2-3): 115-125, November 2001.
[6] Y. Wu and T.S. Huang. A co-inference approach to robust visual tracking. In Proc. Eighth Int. Conf. Computer Vision, Vol. 11, pp. 26-33, Vancouver, Canada, 2001.
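[0011]
Accordingly, an object of the present invention is to provide a subsampling method and program suitable for real-time blob detection and tracking.
[0012]
Another object of the present invention is to provide a subsampling method and program that make the time required to detect or track each blob substantially the same even if the size of the blob in the image changes.
[0013]
Still another object of the present invention is to provide a subsampling method and program that allow each blob to be detected and tracked faster without impairing the reliability of the resulting tracker.
[0014]
[Means for solving the problems]
A method according to one aspect of the present invention is a subsampling method, based on affine deformation, for real-time blob detection and tracking in an image stream. The method includes the steps of: identifying a region of the image to be subsampled; computing an affine transformation that maps the identified region onto a window of a predetermined size; applying the inverse of the computed affine transformation to the pixel coordinates of the window of the predetermined size; and, for each pixel of the window of the predetermined size, computing a pixel value and storing these values together with the associated pixel coordinates in the original image.
[0015]
Because the identified region is transformed into the window of the predetermined size by applying the inverse transformation to the pixels of that window, the computation time for subsampling is nearly constant. Real-time operation of the system is therefore guaranteed.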
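[0016]
The applying step includes the steps of: estimating, with respect to the coordinate axes defined in the image, the position of the blob and the directions and lengths of the major and minor axes of the blob contained in the region to be subsampled; and defining the affine transformation so that the blob is mapped onto a deformed image (shape) in the window. The major and minor axes of the deformed image are parallel to the respective axes of the coordinate system defined in the window, and are shorter than the sides of the window. The applying step further includes a step of applying the inverse affine transformation to compute the pixel value and coordinates of each pixel of the deformed image. Nearest-neighbor or linear interpolation is used to compute the pixel values of the deformed image.
[0017]
Because the lengths of the major and minor axes can differ greatly, subsampling the deformed image along one of the axes of the coordinate system defined in the window does not impair accuracy: the amount of information is reduced only along the direction in which the data are redundant.
[0018]
Preferably, the estimating step includes a step of estimating, with respect to the coordinate axes defined in the image, the position of the blob and the directions and lengths of the major and minor axes of the blob contained in the region to be subsampled.
[0019]
By estimating the position of the blob and the directions and lengths of its major and minor axes, an appropriate affine transformation can be defined that aligns the major and minor axes of the deformed image with the coordinate axes defined in the window onto which the deformed image is mapped.
[0020]
More preferably, the fixed-size window is a fixed-size square, and the defining step includes a step of defining the affine transformation so that the blob is mapped onto a deformed image in the window, where the major and minor axes of the deformed image are parallel to the respective axes of the coordinate system defined in the window, are shorter than the sides of the window, and have the same length.
[0021]
Because the amount of information is reduced only in the direction in which the data are redundant, substantial processing time can be saved without loss of accuracy.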
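[0022]
The affine transformation may be defined as follows.
[0023]
[Equation 3]

Here $\mathbf{u}^i = [u^i, v^i]^T$ and $\theta^i$ are the position and orientation of the shape at the $i$-th measurement, expressed in the coordinate system defined in the image; $a^i$ and $b^i$ are the half-lengths of the major and minor axes of the shape, respectively; $w$ is the predefined size of the window; and $s$ is a scale factor specifying how much smaller than the window the deformed shape should be.
[0024]
According to another aspect of the present invention, the step of computing the affine transformation includes the steps of: estimating, with respect to the coordinate axes defined in the image, the position of the blob and the directions and lengths of the major and minor axes of the blob contained in the region to be subsampled; and defining the affine transformation so that the blob is mapped onto a deformed image in the window. The major and minor axes of the deformed image are parallel to the respective axes of the coordinate system defined in the window. The major and minor axes of the deformed image are shorter than the sides of the window. The applying step may include a step of applying the inverse affine transformation together with nearest-neighbor or linear interpolation to compute the pixel value and coordinates of each pixel of the deformed image.
[0025]
The estimating step includes a step of estimating, with respect to the coordinate axes defined in the image, the position of the blob and the directions and lengths of the major and minor axes of the blob contained in the region to be subsampled.
[0026]
Another aspect of the present invention relates to a computer-executable program that, when executed on a computer, implements the method described above.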
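[0027]
[Embodiments of the invention]
[Background theory]
As noted above, a feature common to prior art blob trackers is that the shape of the tracked object is approximated by the second-order statistics of the blob pixels. By computing the eigenvalue decomposition of the associated covariance matrix, the extent of the blob along its major and minor axes can be computed. Since the lengths of the two axes can differ considerably, it is reasonable to subsample the image along the principal blob directions rather than along the image coordinate axes, and to apply a different scale factor along each direction according to the length of the corresponding axis.
[0028]
FIG. 1 schematically illustrates how an embodiment of the present invention processes blobs in an image. As described later, the combined transformations yield an affine mapping, and the process of geometrically transforming an image by such a mapping is called "affine deformation". Referring to FIG. 1, a field (or frame) of image stream 70 includes blobs 72 and 74. For each of blobs 72 and 74, the embodiment transforms the area containing the blob into a fixed-size window 90 or 100 by a different affine deformation 80 or 82.
[0029]
Affine deformation 80 transforms blob 72 into circle 92; blob 74 is transformed into blob 94. Affine deformation 82 transforms blob 74 into circle 104; blob 72 is transformed into another blob 102. Applying any of the prior art blob trackers to window 90 detects and tracks blob 92 (72). Applying a blob tracker to window 100 detects and tracks blob 104. Since windows 90 and 100 are the same size, the time required to detect or track blobs 92 and 104 is substantially the same. Since the lengths of the two blob axes can differ considerably, it is useful to subsample the image along the principal blob directions rather than along the image coordinate axes, and to apply a different scale factor along each direction according to the length of the corresponding axis. Note that computing the eigenvalue decomposition of the covariance matrix associated with the blob gives an estimate of the extent of the blob along its major and minor axes, that is, the location and shape of the ellipse enclosing the blob pixels.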
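[0030]
Deformations 80 and 82 involve subsampling along the principal directions and are realized by applying the following transformations.
(1) As shown in FIG. 2, blob 200 in the image to be processed is translated to blob 202 so that its center coincides with the origin O of the image coordinates u and v.
(2) As shown in FIG. 3, the translated blob 202 is rotated so that its principal directions coincide with the coordinate axes u and v, giving blob 204.
(3) As shown in FIG. 4, blob 204 is scaled down so that its major and minor axes are shorter than the sides of window 206 of a predetermined fixed size, giving blob 208, which in this embodiment is a circle.
(4) As shown in FIG. 5, window 206 is further translated so that its center is aligned with the center of a new window 210 of the predetermined size. The resulting blob 212 is detected or tracked by a prior art tracker.
[0031]
As shown in FIG. 6, the combination of these transformations yields affine mapping 230, and the process of geometrically transforming an image by such a mapping is, as stated above, called "affine deformation".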
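[0032]
The mapping in homogeneous coordinates is given by the following affine transformation.
[0033]
[Equation 4]

Here $\mathbf{u}_k^i = [u_k^i, v_k^i]^T$ and $\theta_k^i$ are the position and orientation of the $k$-th blob at the $i$-th measurement; $a_k^i$ and $b_k^i$ are the half-lengths of its major and minor axes; $w$ is the predefined size of the window onto which the region around the blob is mapped; and $s$ is a scale factor specifying how much smaller than the target window the mapped blob should be. This margin is needed to ensure that the mapped region really contains the tracked object, because the exact blob parameters in the next image field or frame cannot be known in advance.
[0034]
The blob parameters $(u_k^i, v_k^i, \theta_k^i, a_k^i, b_k^i)$ are estimated by a prediction process. The prediction may be based on the discrete second-order dynamical system
$x_i = a x_{i-1} + b x_{i-2} + e_i$
where $x_i$ is any one of $(u_k^i, v_k^i, \theta_k^i, a_k^i, b_k^i)$ and $e_i$ is the system noise; the coefficients $a$ and $b$ of this system are distinct from the axis half-lengths $a_k^i$ and $b_k^i$. Alternatively, the blob parameters may be predicted by a Kalman filter.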
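[0035]
If the tracked blob is small enough that it can be assumed to remain, at the next measurement time, inside a window of size $w$ centered at $\mathbf{u}_k^i$, that is, if $2 b_k^i \le w/s$, the input image need not be deformed; the rectangular region of interest is simply copied into the target window. This ensures that a blob that is already small enough to be processed in real time is not scaled down.
[0036]
Because the described transformation 230 is invertible, the deformed image can be generated by traversing (parsing) the pixels of output window 210, which is usually smaller than the region of interest 220 containing blob 200 in the original image, and applying the inverse mapping 232 to each pixel of window 210. The pixel values (color intensities) at the resulting positions are estimated by nearest-neighbor or linear interpolation. The transformed pixel positions are stored as well, because the blob tracker used in this embodiment, like many other prior art trackers, needs them to estimate the next blob position.
[0037]
A blob tracker can then be applied to the deformed image, such as the image in window 210. In this way a substantial amount of time is saved without loss of accuracy, because the amount of information is reduced only in the direction in which the data are redundant. Since the size of the deformed image (window 210) is fixed, the processing time per blob is nearly constant, and real-time operation of the system is therefore guaranteed.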
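[0038]
As explained below, the subsampling idea described here can also be used for real-time blob detection. Typically, a blob tracker has no information about the initial state of a blob. Blob detection is therefore based on a random search process in which the shape and location of a blob in the image are selected at random. The shape parameters are varied in a controlled manner so that the 2-D size of the generated blob stays within pre-specified limits.
[0039]
To achieve real-time operation, the region of interest around each blob is deformed into a fixed-size window as described above. In this embodiment, a function serving as a probabilistic indicator for blob detection is evaluated at each pixel of the deformed image. If the sum of all probabilities in the window exceeds a certain threshold, the region is considered to be of interest and a tracker is started with the associated, randomly selected blob parameters.
[0040]
Because such thresholds depend on many factors, they cannot be chosen in advance. This embodiment therefore divides initialization into two stages. In the first stage, the algorithm searches the image stream for a sufficient time (typically five seconds) and samples the probability sums. The threshold for each sought object is then set to a value between the mean and the maximum of the evaluated probability sums. In the second stage, the random search is started again. An object is considered found when the automatically selected threshold is exceeded. The same procedure can also be used when the tracker loses an object. In that case, depending on the application, detection can restart from either the first or the second stage.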
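[0041]
[Automatic blob detection by random search]
Automatic detection of the objects of interest and the subsequent initialization of the tracker are essential parts of any practical tracking system. Since we are interested in dynamic scenes captured by moving cameras, the detection algorithm must run as fast as the tracking algorithm or faster. Analyzing a single image for a long time is of no use, because the object of interest or the camera may have moved elsewhere before processing ends. In addition, a practical system should not expect the user to set various parameters for each scene and object, since doing so is cumbersome. The prior knowledge underlying the system of the present invention is provided by probability distributions of color and shape. Because searching an image for elliptical objects is time-consuming, only color is used here as the prior knowledge for initializing the tracker.
[0042]
The probability that a pixel belongs to the L-th blob is given, on the basis of color, by a certain formula, the details of which are omitted here for brevity. Since there is no information about the initial state of the blob, its shape and size in the image are selected at random. The shape parameters are varied in a controlled manner so that the 2-D size of the generated blob stays within pre-specified limits. To achieve real-time operation, the region of interest around each blob is deformed into a fixed-size window as described above. The color probability given by the formula is then estimated for each pixel of the deformed image. If the sum of all probabilities in the window exceeds a certain threshold, the region is considered to be of interest and a tracker is started with the associated, randomly selected blob parameters.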
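[0043]
[Preferred embodiment]
Hardware configuration
This embodiment is described below using a single stationary camera as an example, but the method and program described here are readily applicable to systems with multiple moving cameras, such as a humanoid robot with two cameras on its head.
[0044]
FIG. 7 illustrates a computer system that implements the method of this embodiment. FIG. 8 is a block diagram of the system. Referring to FIG. 7, computer system 20 implementing this embodiment includes a computer 40 having an FD (flexible disk) drive 52 and a CD-ROM (compact disc read-only memory) drive 50, and a keyboard 46, a mouse 48, a monitor 42, and a video camera 30, all connected to computer 40.
[0045]
Referring to FIG. 8, computer 40 includes, in addition to FD drive 52 and CD-ROM drive 50, a CPU (central processing unit) 56; a bus 66 connected to CPU 56, FD drive 52, and CD-ROM drive 50; a video capture board 68 connected to bus 66 and camera 30; a read-only memory (ROM) 58 storing a boot-up program and the like; and a random access memory (RAM) 60 connected to CPU 56 and storing program instructions, system programs, and data.
[0046]
Although not shown here, computer 40 may further include a network adapter board providing a connection to a local area network (LAN). In the case of a humanoid robot, the system further includes a number of actuators that move the cameras and the various parts of the robot under the control of computer system 20.
[0047]
The program that causes computer system 20 to perform the subsampling method of this embodiment may be stored on a CD-ROM 62 or FD 64 inserted into CD-ROM drive 50 or FD drive 52 and then transferred to hard disk 54. Alternatively, the program may be transmitted to computer 40 over a network (not shown) and stored on hard disk 54. The program is loaded into RAM 60 when it is executed. The program may also be loaded into RAM 60 directly from CD-ROM 62, FD 64, or the network.
[0048]
The program described below includes a number of instructions that cause computer 40 to perform the method of this embodiment. Because some of the basic functions needed to perform the method are provided by the operating system (OS) of computer 40, by third-party programs, or by modules installed on computer 40, the program need not itself include every function required to implement the method. The operation of computer system 20 is well known and is not repeated here.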
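[0049]
Software configuration
FIG. 9 illustrates the overall control structure of the main program that runs on computer system 20 (FIG. 7) to implement the method of this embodiment. Referring to FIG. 9, the main program, once started (300), includes step 320 of estimating thresholds from image stream 302 and step 322 of determining whether the thresholds have been estimated. If they have, control proceeds to step 324; otherwise control returns to step 320. The main program further includes step 324 of detecting objects in image stream 304 and step 326 of determining whether an object was detected in step 324. If an object has been found, control proceeds to step 328; otherwise control returns to step 324.
[0050]
The main program further includes step 328 of tracking the objects in image stream 306 and a step of determining whether the object is still found. If the object is found, control returns to step 328; if not, control returns to step 324.
[0051]
FIG. 10 shows the control structure of step 324 in more detail. The same processing also applies to the threshold estimation step (320). Referring to FIG. 10, object detection process 324 begins (400) with a step of randomly generating a blob shape and location from shape model 402. Process 324 further includes a step of applying the affine deformation to an image in image stream 404, step 424 of estimating the sum of probabilities based on color model 406, and step 426 of determining whether a threshold is available.
[0052]
If no threshold is available, that is, if the processing is within threshold estimation step 320, control proceeds to step 428, which determines whether enough data are available to generate a threshold. If enough data are available, the threshold is generated in step 430 and control returns to step 420 to begin detecting objects. If step 428 finds that not enough data are available, control returns to step 420, and steps 420 through 428 are repeated until enough data are available in step 428.
[0053]
If step 426 determines that a threshold is available, control proceeds to step 432, which determines whether the threshold has been exceeded. If it has, a blob is considered found and tracking begins (408). If it has not, control returns to step 420, and steps 420 through 432 are repeated until step 432 determines that the threshold has been exceeded.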
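[0054]
FIG. 11 shows the processing of object tracking step 328 in more detail. In this embodiment, the EM (expectation-maximization) algorithm is applied to the deformed image window. It consists of computing, at each pixel, the probability that the pixel belongs to one of the blobs, based on location, shape, and the previously computed color probability (expectation step), and of estimating new location and shape parameters (maximization step). In every computation that requires pixel locations, the previously computed coordinates of the pixels in the original image are used instead of the pixel coordinates of the new image window. The EM iteration stops once the algorithm converges or the maximum number of iterations is reached.
[0055]
Once a blob has been detected, the process starts and tracking of the blob begins (500). The process includes step 520 of applying the affine deformation to an image in image stream 502 containing the blob; a step of evaluating color probabilities based on color model 504; step 524 of combining the shape and color probabilities based on color model 504 and shape model 506 (the expectation step of the EM algorithm); step 526 of estimating the shape and location of the blob from the output of step 524 and updating shape model 506 accordingly (the maximization step of the EM algorithm); and step 528 of determining whether the algorithm has converged or the iteration limit has been exceeded. If the algorithm has converged or the iteration limit has been exceeded, control proceeds to step 530; otherwise control returns to step 524 and the EM algorithm is repeated in steps 524 and 526.
[0056]
Step 530 determines whether the blob has been found. If tracking is judged to have failed for some blobs, detection is restarted for those blobs while tracking of the other blobs continues (508). If the blob is judged to have been found, control returns to step 520 and tracking is repeated.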
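[0057]
In step 520 the affine transformation is computed from the predicted shape and position of the blob, which may simply be the position and shape of the blob estimated in the previous image. The mapping transforms the region around the blob so that the blob is positioned at the center of a new fixed-size image window and its shape becomes a circle, as shown in FIG. 3 and FIG. 4. The radius of this circle is smaller than the window in order to allow for inaccuracy in the prediction.
[0058]
Also in step 520, the inverse of the affine transformation is applied to each pixel of the new window. Since a transformed pixel generally does not map exactly onto an image pixel, the color of each new pixel is estimated from the pixels of the original image by nearest-neighbor or linear interpolation. In addition to the color, the position of the corresponding pixel (with respect to the affine transformation) in the original image is stored for each pixel of the new image.
[0059]
In step 522, the color probability is computed for each pixel of the deformed image. Because the color model is kept constant in this embodiment, these probabilities need to be computed only once.
[0060]
Steps 524 through 528 implement the EM algorithm. In step 524, the probability that each pixel belongs to one of the blobs is computed from location, shape, and the previously computed color probability (expectation step). In step 526, new estimates of the location and shape parameters are computed.
[0061]
FIG. 12 shows a result of this embodiment. The system has detected three blobs 560, 562, and 564 in image 550. The blobs are marked with white crosses. The system is applied to blobs 560, 562, and 564 and outputs the deformed images 580, 582, and 584, respectively, which are overlaid on image 550. As FIG. 12 shows, images 580, 582, and 584 are all the same size, and the blobs have been transformed to be substantially circular.
[0062]
To reduce computation time further, a parallel version of the tracker may be implemented on a multiprocessor PC (personal computer) using the multithreading facilities available in the major operating systems. The algorithm described above can be parallelized at various levels. For example, the blobs may be divided into two groups and the algorithm started on a dual-processor PC with one thread per blob group. All threads would need to be synchronized to ensure accurate detection and tracking.
[0063]
Although the present invention has been described with reference to a specific embodiment, the invention is not limited to it. The scope of the invention is limited only by the claims.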
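[Brief description of the drawings]
[FIG. 1] A schematic diagram showing the effect of an embodiment according to the present invention.
[FIG. 2] A schematic diagram showing the translation stage of the affine transformation of the embodiment.
[FIG. 3] A schematic diagram showing the rotation stage of the affine transformation of the embodiment.
[FIG. 4] A schematic diagram showing the scaling stage of the affine transformation of the embodiment.
[FIG. 5] A schematic diagram showing the second translation stage of the affine transformation of the embodiment.
[FIG. 6] A schematic diagram showing the effect of the affine transformation of the embodiment and of its inverse.
[FIG. 7] An external view of the computer system in which the embodiment of the present invention is implemented.
[FIG. 8] A structural diagram of the computer system shown in FIG. 7.
[FIG. 9] A flowchart showing the control structure of the main program of the embodiment.
[FIG. 10] A flowchart showing the object detection and threshold estimation processing of the main program of the embodiment.
[FIG. 11] A flowchart showing the object tracking processing of the main program of the embodiment.
[FIG. 12] A schematic diagram showing an example result of applying the embodiment.
[Explanation of reference numerals]
20 computer system; 30 camera; 40 computer; 72, 74, 92, 94, 102, 104, 200, 202, 204, 208, 212 blobs; 320-330, 420-432, 520-530 steps
[0001]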
BACKGROUND OF THE INVENTION
The present invention relates to a subsampling method and program for detecting and tracking objects in an image stream, and more particularly to a pixel subsampling method and computer-executable program for detecting and tracking substantially elliptical objects in real time.
[0002]
[Prior art]
With the development of high-performance computers, the ability of computers to handle large amounts of information in a very short time is being developed in a wide range of fields. One typical example is a robot that can interact with people in real time. Another example is a computer program that can reside on a computer or computer network, interact with people, and act at its own discretion. In this application, such a robot or program is referred to as an “agent”.
[0003]
To realize such real-time interaction between a person and an agent, the ability to observe people and their activities using one or more cameras, mounted for example on the head of a humanoid robot, is a necessary prerequisite [5]. A system that can detect and track multiple objects in images acquired from cameras operating at a high frame (or field) rate, for example 60 Hz, would be of great benefit.
[0004]
Probabilistic "blob trackers", most of which are based on maximizing some likelihood function, have recently become increasingly common. Many blob trackers have been proposed that use modalities such as color histograms, Gaussian mixture color models, intensity gradients, depth, and optical flow, or combinations of these modalities [1,2,3,4,5,6]. As used here, a "blob" is an approximately elliptical object in an image. Because of the nature of the applications, blobs in an image stream must in many cases be detected and tracked in real time.
[0005]
Advanced trackers require a considerable amount of processing at each image pixel and therefore cannot be applied to the whole image in real time. Since the per-pixel processing usually cannot be simplified without reducing the reliability of the resulting tracker, many practical tracking systems instead adopt techniques such as windowing, masking, and subsampling to reduce the amount of information to be processed.
[0006]
In windowing, only a rectangular sub-image (window) is processed instead of the whole image.
[0007]
In masking, a binary image (mask) with a value of 0 or 1 is defined, and only pixels in the original image whose corresponding value in the binary image is equal to 1 are used for processing.
[0008]
In subsampling, only every second row and every second column of the image, for example, are processed. The step need not be two; every third row or every fourth column may be used instead.
[0009]
A feature common to the prior art blob trackers [1,2,3,4,5,6] is that the shape of the tracked object is approximated by the second-order statistics of the pixels that are probabilistically classified as "blob pixels". As used here, "second-order statistics" means, in the Gaussian case, computing the mean and the covariance matrix of the pixels contained in the blob.
[0010]
[References]
[1] C. Bregler. Learning and recognizing human dynamics in video sequences. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, pp. 569-574, San Juan, Puerto Rico, 1997.
[2] D. Comaniciu, V. Ramesh, and P. Meer. Real-time tracking of non-rigid objects using mean shift. In Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, Vol. 2, pp. 142-149, Hilton Head, South Carolina, 2000.
[3] N. Jojic, M. Turk, and T.S. Huang. Tracking self-occluding articulated objects in dense disparity maps. In Proc. 7th Int. Conf. Computer Vision, pp. 123-130, Kerkyra, Greece, 1999.
[4] S.J. McKenna, Y. Raja, and S. Gong. Tracking colour objects using adaptive mixture models. Image and Vision Computing, 17(3-4): 225-231, March 1999.
[5] A. Ude, T. Shibata, and C.G. Atkeson. Real-time visual system for interaction with a humanoid robot. Robotics and Autonomous Systems, 37(2-3): 115-125, November 2001.
[6] Y. Wu and T.S. Huang. A co-inference approach to robust visual tracking. In Proc. Eighth Int. Conf. Computer Vision, Vol. 11, pp. 26-33, Vancouver, Canada, 2001.
[0011]
Accordingly, an object of the present invention is to provide a subsampling method and program suitable for real-time blob detection and tracking.
[0012]
Another object of the present invention is to provide a subsampling method and program that make the time required to detect or track each blob substantially the same even if the size of the blob in the image changes.
[0013]
Still another object of the present invention is to provide a sub-sampling method and program capable of detecting each blob and tracking faster without impairing the reliability of the resulting tracker.
[0014]
[Means for Solving the Problems]
A method according to one aspect of the present invention is a subsampling method, based on affine deformation, for real-time blob detection and tracking in an image stream. The method includes the steps of: identifying a region of the image to be subsampled; computing an affine transformation that maps the identified region onto a window of a predetermined size; applying the inverse of the computed affine transformation to the pixel coordinates of the window of the predetermined size; and, for each pixel of the window of the predetermined size, computing a pixel value and storing these values together with the associated pixel coordinates in the original image.
[0015]
Since the identified region is transformed into a window of a predetermined size by applying an inverse transformation to the pixels of this window, the computation time for subsampling is substantially constant. Therefore, real-time operation of the system is guaranteed.
[0016]
The applying step includes the steps of: estimating, with respect to the coordinate axes defined in the image, the position of the blob and the directions and lengths of the major and minor axes of the blob contained in the region to be subsampled; and defining the affine transformation so that the blob is mapped onto a deformed image (shape) in the window. The major and minor axes of the deformed image are parallel to the respective axes of the coordinate system defined in the window, and are shorter than the sides of the window. The applying step further includes a step of applying the inverse affine transformation to compute the pixel value and coordinates of each pixel of the deformed image. Nearest-neighbor or linear interpolation is used to compute the pixel values of the deformed image.
[0017]
Because the lengths of the major and minor axes can differ greatly, subsampling the deformed image along one of the axes of the coordinate system defined in the window does not impair accuracy: the amount of information is reduced only along the direction in which the data are redundant.
[0018]
Preferably, the estimating step includes estimating the position of the blob and the direction and length of the major and minor axes of the blob included in the region to be subsampled with respect to the coordinate axes defined in the image.
[0019]
By estimating the position of the blob and the directions and lengths of its major and minor axes, an appropriate affine transformation can be defined that aligns the major and minor axes of the deformed image with the coordinate axes defined in the window onto which the deformed image is mapped.
[0020]
More preferably, the fixed-size window is a fixed-size square, and the defining step includes a step of defining the affine transformation so that the blob is mapped onto a deformed image in the window, where the major and minor axes of the deformed image are parallel to the respective axes of the coordinate system defined in the window, are shorter than the sides of the window, and have the same length.
[0021]
Since the amount of information is reduced only in the direction of excess data, processing time can be substantially saved without loss of accuracy.
[0022]
The affine transformation may be defined as follows.
[0023]
[Equation 3]

Here $\mathbf{u}^i = [u^i, v^i]^T$ and $\theta^i$ are the position and orientation of the shape at the $i$-th measurement, expressed in the coordinate system defined in the image; $a^i$ and $b^i$ are the half-lengths of the major and minor axes of the shape, respectively; $w$ is the predefined size of the window; and $s$ is a scale factor specifying how much smaller than the window the deformed shape should be.
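The equation itself is not reproduced in this text. As a hedged reconstruction only, one matrix that is consistent with the symbol definitions above and with the translate, rotate, scale, translate sequence listed in paragraph [0030] is given below. Placing $s$ in the denominator assumes $s \ge 1$, which matches the condition $2 b_k^i \le w/s$ in paragraph [0035]; the typeset equation in the published patent may be arranged differently.

```latex
\mathbf{A}^i =
\begin{bmatrix} 1 & 0 & w/2 \\ 0 & 1 & w/2 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \dfrac{w}{2 s a^i} & 0 & 0 \\ 0 & \dfrac{w}{2 s b^i} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \cos\theta^i & \sin\theta^i & 0 \\ -\sin\theta^i & \cos\theta^i & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & -u^i \\ 0 & 1 & -v^i \\ 0 & 0 & 1 \end{bmatrix}
```

Read from right to left, the factors move the shape center to the origin, rotate the major axis onto the first coordinate axis, scale both half-axes to $w/(2s)$, and move the result to the window center $(w/2, w/2)$.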
[0024]
According to another aspect of the invention, the step of calculating the affine transformation defines in the image the position of the blob and the direction and length of the major and minor axes of the blob included in the region to be subsampled. Estimating relative to the coordinate axes and defining an affine transformation to map the blob onto the deformed image in the window. The major and minor axes of the deformed image are parallel to the respective axes of the coordinate system defined in the window. The major and minor axes of the deformed image are shorter than the window edges. The applying step may include calculating a respective pixel value and coordinates of the deformed image by applying an inverse affine transformation and nearest neighbor or linear interpolation.
[0025]
The estimating step includes estimating the position of the blob and the direction and length of the major and minor axes of the blob included in the region to be subsampled with respect to the coordinate axes defined in the image.
[0026]
Another aspect of the present invention relates to a computer-executable program that is executed on a computer to implement the above-described method.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
[Background theory]
As previously mentioned, a common feature of prior art blob trackers is that the shape of the tracked object is approximated by second order statistics of the blob pixels. By calculating the eigenvalue decomposition of the associated covariance matrix, the range of the blob can be calculated along its major and minor axes. Since the lengths of both axes can be quite different, the image is subsampled along the main blob direction instead of the image coordinate axes and along these directions taking into account the length of each axis It would be reasonable to apply different scale factors.
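A minimal sketch of this second-order-statistics step follows, using only the Python standard library. The function name `blob_ellipse`, the optional per-pixel weights, and the choice of two standard deviations as the semi-axis length are illustrative assumptions, not details taken from the patent.

```python
import math

def blob_ellipse(pixels, weights=None, k=2.0):
    """Estimate centre, orientation and semi-axes of a blob.

    pixels  : list of (u, v) coordinates classified as blob pixels
    weights : optional per-pixel blob probabilities (assumed; default 1.0)
    k       : semi-axis length in standard deviations (assumed value 2.0)
    """
    if weights is None:
        weights = [1.0] * len(pixels)
    total = sum(weights)
    # First-order statistics: weighted mean position.
    mu = sum(wt * u for wt, (u, _) in zip(weights, pixels)) / total
    mv = sum(wt * v for wt, (_, v) in zip(weights, pixels)) / total
    # Second-order statistics: entries of the 2x2 covariance matrix.
    cuu = sum(wt * (u - mu) ** 2 for wt, (u, _) in zip(weights, pixels)) / total
    cvv = sum(wt * (v - mv) ** 2 for wt, (_, v) in zip(weights, pixels)) / total
    cuv = sum(wt * (u - mu) * (v - mv) for wt, (u, v) in zip(weights, pixels)) / total
    # Closed-form eigenvalue decomposition of a symmetric 2x2 matrix.
    half_trace = 0.5 * (cuu + cvv)
    root = math.hypot(0.5 * (cuu - cvv), cuv)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    # Angle of the eigenvector belonging to the larger eigenvalue.
    theta = 0.5 * math.atan2(2.0 * cuv, cuu - cvv)
    return (mu, mv), theta, k * math.sqrt(lam_major), k * math.sqrt(lam_minor)
```

The returned angle and semi-axes are the quantities that the later paragraphs denote by the orientation and the half-lengths of the major and minor axes.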
[0028]
FIG. 1 schematically illustrates how an embodiment of the present invention processes blobs in an image. As described later, the combined transformations yield an affine mapping, and the process of geometrically transforming an image by such a mapping is called "affine deformation". Referring to FIG. 1, a field (or frame) of image stream 70 includes blobs 72 and 74. For each of blobs 72 and 74, the embodiment transforms the area containing the blob into a fixed-size window 90 or 100 by a different affine deformation 80 or 82.
[0029]
Affine deformation 80 transforms blob 72 into circle 92; blob 74 is transformed into blob 94. Affine deformation 82 transforms blob 74 into circle 104; blob 72 is transformed into another blob 102. Applying any of the prior art blob trackers to window 90 detects and tracks blob 92 (72). Applying a blob tracker to window 100 detects and tracks blob 104. Since windows 90 and 100 are the same size, the time required to detect or track blobs 92 and 104 is substantially the same. Since the lengths of the two blob axes can differ considerably, it is useful to subsample the image along the principal blob directions rather than along the image coordinate axes, and to apply a different scale factor along each direction according to the length of the corresponding axis. Note that computing the eigenvalue decomposition of the covariance matrix associated with the blob gives an estimate of the extent of the blob along its major and minor axes, that is, the location and shape of the ellipse enclosing the blob pixels.
[0030]
Deformations 80 and 82 involve subsampling along the principal directions and are realized by applying the following transformations.
(1) As shown in FIG. 2, blob 200 in the image to be processed is translated to blob 202 so that its center coincides with the origin O of the image coordinates u and v.
(2) As shown in FIG. 3, the translated blob 202 is rotated so that its principal directions coincide with the coordinate axes u and v, giving blob 204.
(3) As shown in FIG. 4, blob 204 is scaled down so that its major and minor axes are shorter than the sides of window 206 of a predetermined fixed size, giving blob 208, which in this embodiment is a circle.
(4) As shown in FIG. 5, window 206 is further translated so that its center is aligned with the center of a new window 210 of the predetermined size. The resulting blob 212 is detected or tracked by a prior art tracker.
[0031]
As shown in FIG. 6, the combination of these transformations yields affine mapping 230, and the process of geometrically transforming an image by such a mapping is, as stated above, called "affine deformation".
[0032]
Mapping in homogeneous coordinates is given by the following affine transformation.
[0033]
[Equation 4]

Here $\mathbf{u}_k^i = [u_k^i, v_k^i]^T$ and $\theta_k^i$ are the position and orientation of the $k$-th blob at the $i$-th measurement; $a_k^i$ and $b_k^i$ are the half-lengths of its major and minor axes; $w$ is the predefined size of the window onto which the region around the blob is mapped; and $s$ is a scale factor specifying how much smaller than the target window the mapped blob should be. This margin is needed to ensure that the mapped region really contains the tracked object, because the exact blob parameters in the next image field or frame cannot be known in advance.
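As an illustration, the sketch below builds a 3x3 homogeneous matrix by composing the four steps of paragraph [0030]: translate the blob center to the origin, rotate by the negative orientation, scale each half-axis to w/(2s), and translate to the window center. This composition is an assumption consistent with the symbol definitions above, since the equation itself is not reproduced in this text; the function names `warp_matrix` and `apply_affine` are hypothetical.

```python
import math

def warp_matrix(u, v, theta, a, b, w, s):
    """Homogeneous 3x3 matrix mapping an image ellipse to a centred circle.

    (u, v) : ellipse centre in image coordinates
    theta  : orientation of the major axis, in radians
    a, b   : half-lengths of the major and minor axes
    w      : side of the square target window
    s      : margin factor (assumed s >= 1; circle radius becomes w / (2 s))
    """
    c, sn = math.cos(theta), math.sin(theta)
    ka = w / (2.0 * s * a)  # scale along the major axis
    kb = w / (2.0 * s * b)  # scale along the minor axis
    # Linear part: scaling applied after rotation by -theta.
    m00, m01 = ka * c, ka * sn
    m10, m11 = -kb * sn, kb * c
    # Translation chosen so that (u, v) lands on the window centre.
    tx = w / 2.0 - (m00 * u + m01 * v)
    ty = w / 2.0 - (m10 * u + m11 * v)
    return [[m00, m01, tx], [m10, m11, ty], [0.0, 0.0, 1.0]]

def apply_affine(m, x, y):
    """Apply a homogeneous affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

With this form, the end point of the major axis and the end point of the minor axis both land at distance w/(2s) from the window center, so the ellipse becomes a circle that leaves a margin inside the window.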
[0034]
The blob parameters $(u_k^i, v_k^i, \theta_k^i, a_k^i, b_k^i)$ are estimated by a prediction process. The prediction may be based on the discrete second-order dynamical system
$x_i = a x_{i-1} + b x_{i-2} + e_i$
where $x_i$ is any one of $(u_k^i, v_k^i, \theta_k^i, a_k^i, b_k^i)$ and $e_i$ is the system noise; the coefficients $a$ and $b$ of this system are distinct from the axis half-lengths $a_k^i$ and $b_k^i$. Alternatively, the blob parameters may be predicted by a Kalman filter.
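A small sketch of the noise-free part of such a predictor is shown below. The coefficient values a = 2 and b = -1, which amount to constant-velocity extrapolation, are an assumption chosen for illustration; the text does not state the coefficients. The function name `predict_params` is likewise hypothetical.

```python
def predict_params(prev, prev2, a=2.0, b=-1.0):
    """Predict each blob parameter from its two most recent values.

    prev, prev2 : tuples (u, v, theta, a_axis, b_axis) at steps i-1 and i-2
    a, b        : system coefficients (assumed 2 and -1, i.e. constant velocity)
    The noise term e_i has zero mean and is therefore omitted from the prediction.
    """
    return tuple(a * x1 + b * x2 for x1, x2 in zip(prev, prev2))
```

For example, a blob whose center moved from u = 10 to u = 12 between the last two measurements is predicted at u = 14, while parameters that did not change are predicted unchanged.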
[0035]
If the tracked blob is small enough that it can be assumed to remain, at the next measurement time, inside a window of size $w$ centered at $\mathbf{u}_k^i$, that is, if $2 b_k^i \le w/s$, the input image need not be deformed; the rectangular region of interest is simply copied into the target window. This ensures that a blob that is already small enough to be processed in real time is not scaled down.
[0036]
Because the described transformation 230 is invertible, the deformed image can be generated by traversing (parsing) the pixels of output window 210, which is usually smaller than the region of interest 220 containing blob 200 in the original image, and applying the inverse mapping 232 to each pixel of window 210. The pixel values (color intensities) at the resulting positions are estimated by nearest-neighbor or linear interpolation. The transformed pixel positions are stored as well, because the blob tracker used in this example, like many other prior art trackers, needs them to estimate the next blob position.
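The inverse-mapping loop can be sketched as follows for a single-channel image stored as a list of rows. The helper names `invert_affine`, `bilinear`, and `warp_window`, and the decision to clamp sample positions to the image border, are assumptions made for this illustration. The cost of the loop depends only on the window size w, not on the size of the blob in the source image.

```python
import math

def invert_affine(m):
    """Invert a homogeneous affine matrix [[a, b, tx], [c, d, ty], [0, 0, 1]]."""
    a, b, tx = m[0]
    c, d, ty = m[1]
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]

def bilinear(img, x, y):
    """Linearly interpolate img (list of rows) at the real-valued position (x, y)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)  # clamp to the border (assumed policy)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1.0 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1.0 - fx) + img[y1][x1] * fx
    return top * (1.0 - fy) + bottom * fy

def warp_window(img, m, w):
    """Fill a w-by-w window by applying the inverse of m to every window pixel.

    Returns the sampled values and, for each window pixel, its source position,
    which a tracker can use in place of the window coordinates.
    """
    inv = invert_affine(m)
    values = [[0.0] * w for _ in range(w)]
    coords = [[None] * w for _ in range(w)]
    for r in range(w):          # window row (second coordinate)
        for c in range(w):      # window column (first coordinate)
            x = inv[0][0] * c + inv[0][1] * r + inv[0][2]
            y = inv[1][0] * c + inv[1][1] * r + inv[1][2]
            values[r][c] = bilinear(img, x, y)
            coords[r][c] = (x, y)
    return values, coords
```

Nearest-neighbor sampling would replace `bilinear` with a lookup at the rounded position; it is cheaper but introduces more aliasing.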
[0037]
A blob tracker can then be applied to the deformed image, such as the image in window 210. In this way, substantial time savings are possible without loss of accuracy, because the amount of information is reduced only in the directions in which there is excess data. Since the size of the deformed image (window 210) is fixed, the processing time per blob is almost constant, thus guaranteeing real-time operation of the system.
[0038]
As described below, the sub-sampling concept described here can also be used for real-time blob detection. Typically, a blob tracker has no information about the initial state of the blob. Therefore, the detection of the blob is performed based on the random search process, and the shape and location of the blob in the image is selected at random. The shape parameters are varied in a controlled manner so that the 2-D size of the generated blob is within pre-specified limits.
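The random generation of blob candidates can be sketched as follows. The uniform distributions, the parameter names, and the way the size limits are expressed (`min_axis`, `max_axis` as bounds on the half-axis lengths) are assumptions; the text only states that the shape parameters are varied so that the blob size stays within pre-specified limits.

```python
import math
import random

def random_blob(img_w, img_h, min_axis, max_axis, rng=random):
    """Draw random blob parameters (u, v, theta, a, b).

    a and b are the half-lengths of the major and minor axes; both are kept
    within [min_axis, max_axis], with b <= a. Uniform sampling is an
    assumption of this sketch.
    """
    a = rng.uniform(min_axis, max_axis)
    b = rng.uniform(min_axis, a)
    theta = rng.uniform(0.0, math.pi)
    u = rng.uniform(0.0, img_w)
    v = rng.uniform(0.0, img_h)
    return (u, v, theta, a, b)
```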
[0039]
To achieve real-time operation, the region of interest around each blob is transformed into a fixed-size window as described above. In this embodiment, a probabilistic measure used for blob detection is evaluated at each pixel of the transformed image. If the sum of all probabilities in the window exceeds a certain threshold, the region is considered noteworthy and the tracker is started with the associated, randomly selected blob parameters.
[0040]
Since such thresholds depend on many factors, it is impossible to select them in advance. Therefore, in this embodiment, the initialization process is divided into two stages. First, the algorithm searches the image stream for a sufficient amount of time (typically 5 seconds) and samples the sum of probabilities. Thereafter, the threshold value for each of the determined objects is set to a value between the average of the sum of the estimated probabilities and the maximum value. In the second stage, the random search is started again. When the automatically selected threshold is exceeded, the object is considered discovered. This same procedure can be used when the tracker loses sight of an object. In this case, the detection process can be started from the first stage or the second stage depending on the application.
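The first stage can be sketched as below. The text only says that the threshold lies between the average and the maximum of the sampled probability sums; the interpolation weight `alpha` and its default of 0.5 are assumptions, as is the function name.

```python
def estimate_threshold(sampled_sums, alpha=0.5):
    """Pick a threshold between the mean and the maximum of the sampled sums.

    alpha = 0 gives the mean, alpha = 1 gives the maximum. The value 0.5 is
    an illustrative assumption, not a value given in the text.
    """
    if not sampled_sums:
        raise ValueError("no probability sums were sampled")
    mean = sum(sampled_sums) / len(sampled_sums)
    peak = max(sampled_sums)
    return mean + alpha * (peak - mean)
```

In the second stage, a randomly generated candidate would be accepted as soon as its probability sum exceeds the value returned here.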
[0041]
[Automatic blob detection by random search]
Automatic detection of objects of interest and subsequent initialization of the tracker is an essential part of any practical tracking system. Since we are interested in dynamic scenes captured by a moving camera, the detection algorithm needs to run as fast as or faster than the tracking algorithm. Analyzing one image for a long time does not help, because the object of interest or the camera may have moved to another location before processing is complete. In addition, in a practical system the user should not be expected to set various parameters for various scenes and objects, because doing so is cumbersome. The prior knowledge of the system of the invention is provided by color and shape probability distributions. Since it takes time to search for an elliptical object in the image, only color is used here as the prior knowledge for initializing the tracker.
[0042]
The probability that a pixel belongs to the l-th blob is given by an expression based on color, which is not described in detail here for the sake of brevity. Since there is no information about the initial state of the blob, its shape and location in the image are chosen randomly. The shape parameters are varied in a controlled manner so that the 2-D size of the generated blob is within pre-specified limits. To achieve real-time operation, the region of interest around each blob is transformed into a fixed-size window as described above. The color probability is then evaluated for each pixel of the transformed image. If the sum of all probabilities in the window exceeds a certain threshold, the region is considered noteworthy and the tracker is started using the associated, randomly selected blob parameters.
[0043]
[Preferred embodiment]
Hardware configuration
This embodiment will be described below with a single static camera as an example, but the method and program described here can easily be applied to other systems comprising a plurality of moving cameras, such as a humanoid robot with two cameras on its head.
[0044]
  FIG. 7 illustrates a computer system that implements the method of this embodiment. FIG. 8 is a block diagram of the system. Referring to FIG. 7, a computer system 20 that implements this embodiment includes a computer 40 having an FD (flexible disk) drive 52 and a CD-ROM (Compact Disc Read Only Memory) drive 50, together with a keyboard 46, a mouse 48, a monitor 42, and a video camera 30, all connected to the computer 40.
[0045]
  Referring to FIG. 8, computer 40 includes, in addition to the FD drive 52 and the CD-ROM drive 50, a CPU (Central Processing Unit) 56, a bus 66 connected to the CPU 56, the FD drive 52, and the CD-ROM drive 50, a video capture board 68 connected to the bus 66 and the camera 30, a read only memory (ROM) 58 for storing a boot-up program and the like, and a random access memory (RAM) 60 connected to the CPU 56 for storing program instructions, system programs, and data.
[0046]
Although not shown here, the computer 40 may further include a network adapter board that provides a connection to a local area network (LAN). In the case of a humanoid robot, the system further includes a number of actuators that move the various parts of the camera and robot under the control of the computer system 20.
[0047]
A program that causes the computer system 20 to perform the sub-sampling method of this embodiment may be stored in the CD-ROM 62 or FD 64 inserted into the CD-ROM drive 50 or FD drive 52 and further transferred to the hard disk 54. Alternatively, the program may be transmitted to the computer 40 through a network (not shown) and stored in the hard disk 54. The program is loaded into the RAM 60 at the time of execution. The program may be directly loaded into the RAM 60 via the CD-ROM 62, the FD 64, or the network.
[0048]
The program described below includes several instructions that cause the computer 40 to perform the method of this embodiment. Since some of the basic functions necessary to carry out this method are provided by the operating system (OS) of the computer 40, by a third-party program, or by a module installed in the computer 40, this program does not necessarily have to include all functions necessary for realizing the method of this embodiment. The operation of computer system 20 is well known and will not be repeated here.
[0049]
Software configuration
FIG. 9 illustrates the overall control structure of the main program that is executed on the computer system 20 (FIG. 7) to implement the method of this embodiment. Referring to FIG. 9, the main program, once started (300), includes a step 320 of estimating a threshold value from the image stream 302 and a step 322 of determining whether the threshold value has been estimated. If the threshold has been estimated, control proceeds to step 324; otherwise control returns to step 320. The main program further includes a step 324 of detecting an object in the image stream 304 and a step 326 of determining whether an object has been detected in step 324. If the object has been found, control proceeds to step 328; otherwise control returns to step 324.
[0050]
The main program further includes a step 328 of tracking an object in the image stream 306 and a step of determining whether the object has been found. If the object has been found, control returns to step 328; if the object has not been found, control returns to step 324.
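The control structure of FIG. 9 can be read as a three-state loop. The sketch below is an illustration of that reading only: the three callables, the convention that they return None on failure, and the choice to process exactly one step per frame are assumptions of the example.

```python
def run_main_loop(frames, estimate_threshold_step, detect_step, track_step):
    """Drive threshold estimation, detection and tracking over an image stream.

    Each callable returns None when its step has not (yet) succeeded.
    Returns the state that was active for each frame.
    """
    state = "estimate"
    threshold = None
    blob = None
    history = []
    for frame in frames:
        history.append(state)
        if state == "estimate":        # steps 320 and 322
            threshold = estimate_threshold_step(frame)
            if threshold is not None:
                state = "detect"
        elif state == "detect":        # steps 324 and 326
            blob = detect_step(frame, threshold)
            if blob is not None:
                state = "track"
        else:                          # step 328 and the check that follows it
            blob = track_step(frame, blob)
            if blob is None:
                state = "detect"       # object lost: resume detection
    return history
```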
[0051]
  FIG. 10 shows the control structure of the process of step 324 in more detail. This process also applies to the step of estimating the threshold (320). Referring to FIG. 10, the object detection process 324 starts (400) with a step of randomly generating the shape and location of a blob from the shape model 402. The process 324 further includes a step of applying an affine transformation to the images in the image stream 404, a step 424 of estimating the sum of probabilities based on the color model 406, and a step 426 of determining whether a threshold is available.
[0052]
If the threshold is not available, that is, if the process is within step 320 of estimating the threshold, control proceeds to step 428, where it is determined whether sufficient data is available to generate the threshold. If sufficient data is available, a threshold is generated at step 430 and control returns to step 420 to begin object detection. If sufficient data is not available at step 428, control returns to step 420 and steps 420 through 428 are repeated until sufficient data is available at step 428.
[0053]
If it is determined at step 426 that the threshold is available, control proceeds to step 432 to further determine whether the threshold has been exceeded. If the threshold is exceeded, it is assumed that a blob has been found and tracking begins (408). If the threshold is not exceeded, control returns to step 420 and steps 420 through 432 are repeated until it is determined in step 432 that the threshold has been exceeded.
[0054]
FIG. 11 shows in more detail the process of step 328 for tracking an object. In this embodiment, an EM (Expectation Maximization) algorithm is applied to the deformed image window. For each pixel, it calculates the probability that the pixel belongs to one of the blobs based on the location, shape, and previously calculated color probabilities (expectation step), and then estimates new location and shape parameters (maximization step). In all calculations where pixel location is required, the previously calculated coordinates of the pixels in the original image are used in place of the pixel coordinates of the new image window. Once the algorithm converges or the maximum number of iterations is reached, the EM iteration stops.
[0055]
Once a blob is detected, blob tracking begins (500). Tracking includes a step 520 of applying an affine transformation to the image in the image stream 502 containing the blob, a step 522 of evaluating the color probability based on the color model 504, a step 524 of evaluating the probability based on the color probability and the shape model 506 (expectation step in the EM algorithm), a step 526 of estimating the shape and location of the blob based on the output of step 524 and updating the shape model 506 based on the result (maximization step in the EM algorithm), and a step 528 of determining whether the algorithm has converged or the iteration limit has been exceeded. If the algorithm has converged or the iteration limit has been exceeded, control proceeds to step 530; otherwise control returns to step 524 and the EM algorithm is repeated in steps 524 and 526.
[0056]
In step 530, it is determined whether a blob has been found. If it is determined that tracking has failed for some blobs, detection of these blobs is resumed and tracking of other blobs continues (508). If it is determined that a blob has been found, control returns to step 520 and tracking is repeated.
[0057]
In step 520, the affine transformation is calculated using the predicted blob shape and position, but these may simply be the blob position and shape estimated from the previous image. The mapping transforms the area around the blob so that the blob is centered in a new image window of fixed size and its shape is a circle, as shown in the drawings. The radius of this circle is made smaller than the window to allow for inaccuracies in the prediction.
[0058]
Also in step 520, the inverse affine transformation is applied to each pixel in the new window. Since the transformed pixel is generally not mapped exactly to one of the image pixels, the color of each new pixel is estimated by nearest neighbor or linear interpolation of the pixels of the original image. In addition to the color, the corresponding pixel location (with respect to the affine transformation) of the original image is also stored for each pixel of the new image.
[0059]
At step 522, a color probability is calculated for each pixel of the transformed image. In this embodiment, the color model is kept constant, so these need only be calculated once.
[0060]
Steps 524 to 528 implement the EM algorithm. At step 524, for each pixel, the probability that the pixel belongs to one of the blobs is calculated based on the location, shape, and previously calculated color probability (expectation step). At step 526, new location estimates and shape parameters are calculated.
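The update equations of step 526 are not given in this text. As one plausible reading, the sketch below assumes that the new location and shape are obtained from the second-order statistics (weighted mean and covariance) of the pixel positions, weighted by the membership probabilities from step 524 and using the stored original-image coordinates. The function name and this particular form of the update are assumptions.

```python
def m_step(coords, weights):
    """Weighted mean and 2x2 covariance of pixel positions.

    coords  : list of (x, y) positions in the original image
    weights : membership probability of each pixel (same order as coords)
    Assumed form of the maximization step, not taken from the text.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("all membership probabilities are zero")
    mu_x = sum(w * x for (x, _), w in zip(coords, weights)) / total
    mu_y = sum(w * y for (_, y), w in zip(coords, weights)) / total
    sxx = sum(w * (x - mu_x) ** 2 for (x, _), w in zip(coords, weights)) / total
    syy = sum(w * (y - mu_y) ** 2 for (_, y), w in zip(coords, weights)) / total
    sxy = sum(w * (x - mu_x) * (y - mu_y)
              for (x, y), w in zip(coords, weights)) / total
    return (mu_x, mu_y), ((sxx, sxy), (sxy, syy))
```

The mean gives the new blob position, and the eigenvectors and eigenvalues of the covariance matrix would give the orientation and axis lengths of the ellipse.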
[0061]
FIG. 12 shows the results of this example. The system has detected three blobs 560, 562 and 564 in the image 550. Each blob is marked with a white cross. The system is applied to these blobs 560, 562 and 564, outputting transformed images 580, 582 and 584, respectively, which are overlaid on the image 550. As shown in FIG. 12, images 580, 582 and 584 are all the same size, and each blob has been transformed to be substantially a circle.
[0062]
In order to further reduce the calculation time, a parallel version of the tracker may be realized on a multiprocessor PC (personal computer) using the multithreading facilities available in mainstream operating systems. The above algorithm can be parallelized at various levels. For example, the blobs may be divided into two groups and the algorithm may be started with one thread per blob group on a dual-processor PC. It may be necessary to synchronize all threads to ensure accurate detection and tracking.
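The per-group threading can be sketched as follows. The helper name, the round-robin split into groups, and the use of a thread pool are assumptions of the example; `track_group` stands for whatever routine tracks all blobs of one group in the current image.

```python
from concurrent.futures import ThreadPoolExecutor

def track_groups_in_parallel(blobs, track_group, n_groups=2):
    """Split the blobs into n_groups and track each group in its own thread.

    track_group takes a list of blobs and returns a list of updated blobs.
    Leaving the 'with' block waits for every thread, which acts as the
    synchronization point before the next image is processed.
    """
    groups = [blobs[i::n_groups] for i in range(n_groups)]
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        results = list(pool.map(track_group, groups))
    return [blob for group in results for blob in group]
```

Note that in CPython the global interpreter lock limits the speedup of pure-Python, CPU-bound threads, so this sketch shows the structure of the scheme rather than its achievable performance.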
[0063]
Although the present invention has been described with reference to specific embodiments, the present invention is not limited thereto. The scope of the invention is limited only by the claims.
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing the effect of an embodiment according to the present invention.
FIG. 2 is a schematic diagram illustrating a translation stage of an affine transformation of an embodiment according to the present invention.
FIG. 3 is a schematic diagram showing a rotation stage of affine transformation in an embodiment according to the present invention.
FIG. 4 is a schematic diagram showing scale steps of affine transformation of an embodiment according to the present invention.
FIG. 5 is a schematic diagram showing another translation stage of the affine transformation of the embodiment according to the present invention.
FIG. 6 is a schematic diagram showing the effect of affine transformation and its inverse transformation in the embodiment according to the present invention.
FIG. 7 is an external view of a computer system in which an embodiment of the present invention is implemented.
FIG. 8 is a structural diagram of the computer system shown in FIG. 7;
FIG. 9 is a flowchart showing a control structure of a main program according to an embodiment of the present invention.
FIG. 10 is a flowchart showing object detection and threshold value estimation processing of the main program according to the embodiment of the present invention.
FIG. 11 is a flowchart showing object tracking processing of the main program of the embodiment according to the present invention.
FIG. 12 is a schematic diagram showing an example of the result of application of an embodiment according to the present invention.
[Explanation of symbols]
20 computer system, 30 camera, 40 computer, 72, 74, 92, 94, 102, 104, 200, 202, 204, 208, 212 blob, 320-330, 420-432, 520-530 step

Claims

A subsampling method for real-time blob detection and tracking in an image stream based on affine deformations, comprising:
Identifying an elliptical blob candidate region to be subsampled in the image where a blob is expected to exist; and
Calculating an affine transformation that maps the blob candidate region to a window of a predetermined size set at a predetermined position, so that the blob candidate region takes a predetermined position and a predetermined shape,
The affine transformation being determined so that the blob candidate region is smaller than the window size when transformed by the affine transformation,
The method further comprising:
Mapping an image of a region including the blob candidate region in the image to the window by the affine transformation;
For each pixel of the window of the predetermined size, calculating a pixel value based on the pixel value of the corresponding pixel before conversion by the affine transformation, and storing these values in a storage device;
Determining the position and shape of the pixel region belonging to the blob in the image in the window after mapping by probability calculation using a color model for the pixel value of each pixel stored in the storage device; and
Applying the inverse of the calculated affine transformation to the pixel coordinates of the pixel region belonging to the blob determined by the determining step within the window of the predetermined size, thereby mapping the blob back to the image and determining the position and shape of the blob in the image.

The subsampling method according to claim 1, wherein the mapping step comprises:
Estimating the position of the blob candidate region within the image, and the direction and length of the major axis and minor axis of the elliptical shape of the blob candidate region, with respect to coordinate axes defined in the image; and
Defining the affine transformation so that the elliptical shape of the blob candidate region is mapped, within the window, to the predetermined position and the predetermined shape, the affine transformation being defined so that, after deformation by the affine transformation, the major axis and the minor axis of the elliptical shape of the blob candidate region are parallel to the respective axes of the coordinate system defined in the window and are shorter than the side of the window,
The mapping step further comprising applying the affine transformation and nearest neighbor or linear interpolation to the coordinates and pixel values, respectively, of the region of pixels including the blob candidate in the image, thereby calculating each coordinate and pixel value of the deformed image.

The subsampling method according to claim 2, wherein the step of estimating includes a step of estimating the position of the blob candidate region included in the region to be subsampled in the image, and the direction and length of the major and minor axes of its elliptical shape, with respect to the coordinate axes defined in the image.

The subsampling method according to claim 3, wherein the window is a square of a fixed size, and
The step of defining comprises the step of defining the affine transformation so that, after conversion by the affine transformation, the major axis and the minor axis of the elliptical shape of the blob candidate region are parallel to the respective axes of the coordinate system defined within the window, are shorter than the side of the window, and have the same length.

The subsampling method according to any of claims 2 to 4, wherein the affine transformation is defined by the following equation:

Where u^i = [u^i, v^i]^T and θ^i are the position and orientation of the elliptical shape of the blob candidate region, expressed in the coordinate system defined in the image, at the i-th measurement, a^i and b^i are half the lengths of the major axis and the minor axis of the elliptical shape, respectively, w is the predetermined size of the window, and s is a scale factor specifying how much smaller than the window the elliptical shape of the blob candidate transformed by the affine transformation should be.

A computer executable program for causing a computer to perform a subsampling method for real-time blob detection and tracking in an image stream based on affine deformation, the subsampling method comprising:
Identifying an elliptical blob candidate region to be subsampled in the image where a blob is expected to exist; and
Calculating an affine transformation that maps the blob candidate region to a window of a predetermined size set at a predetermined position, so that the blob candidate region takes a predetermined position and a predetermined shape,
The affine transformation being determined so that the blob candidate region is smaller than the window size when transformed by the affine transformation,
The method further comprising:
Mapping an image of a region including the blob candidate region in the image to the window by the affine transformation;
For each pixel of the window of the predetermined size, calculating a pixel value based on the pixel value of the corresponding pixel before conversion by the affine transformation, and storing these values in a storage device;
Determining the position and shape of the pixel region belonging to the blob in the image in the window after mapping by probability calculation using a color model for the pixel value of each pixel stored in the storage device; and
Applying the inverse of the calculated affine transformation to the pixel coordinates of the pixel region belonging to the blob determined by the determining step within the window of the predetermined size, thereby mapping the blob back to the image and determining the position and shape of the blob in the image.

The program according to claim 6, wherein the mapping step comprises:
Estimating the position of the elliptical blob candidate region, and the direction and length of the major axis and minor axis of the blob candidate region, with respect to coordinate axes defined in the image; and
Defining the affine transformation so that the elliptical shape of the blob candidate region is mapped, within the window, to the predetermined position and the predetermined shape, the affine transformation being defined so that, after deformation by the affine transformation, the major axis and the minor axis of the elliptical shape of the blob candidate region are parallel to the respective axes of the coordinate system defined in the window and are shorter than the side of the window,
The mapping step further comprising applying the affine transformation and nearest neighbor or linear interpolation to the coordinates and pixel values, respectively, of the region of pixels including the blob candidate in the image, thereby calculating each coordinate and pixel value of the deformed image.

The program according to claim 7, wherein the step of estimating includes a step of estimating the position of the blob candidate region included in the region to be subsampled in the image, and the direction and length of the major and minor axes of its elliptical shape, with respect to the coordinate axes defined in the image.

The program, wherein the window is a square of a fixed size, and
The step of defining comprises the step of defining the affine transformation so that, after conversion by the affine transformation, the major axis and the minor axis of the elliptical shape of the blob candidate region are parallel to the respective axes of the coordinate system defined within the window, are shorter than the side of the window, and have the same length.

The program according to any of claims 7 to 9, wherein the affine transformation is defined by the following equation:

Where u^i = [u^i, v^i]^T and θ^i are the position and orientation of the elliptical shape of the blob candidate region, expressed in the coordinate system defined in the image, at the i-th measurement, a^i and b^i are half the lengths of the major axis and the minor axis of the elliptical shape, respectively, w is the predetermined size of the window, and s is a scale factor specifying how much smaller than the window the elliptical shape of the blob candidate transformed by the affine transformation should be.