JP3727991B2

JP3727991B2 - Image extraction device

Info

Publication number: JP3727991B2
Application number: JP34388695A
Authority: JP
Inventors: 優和真継
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1995-12-28
Filing date: 1995-12-28
Publication date: 2005-12-21
Anticipated expiration: 2015-12-28
Also published as: JPH09186936A

Description

【０００１】
【発明の属する技術分野】
本発明は画像抽出装置に関し、特に、画像切り出し機能、領域抽出機能を有する画像処理システムに用いて好適なものである。
【０００２】
【従来の技術】
従来、画像切り出し（抽出）を行う一般的な手法としては、特定の色背景を用いるクロマキー方式や、ヒストグラム処理、差分処理、微分処理、輪郭強調処理、輪郭追跡などの画像処理によりキー信号を生成するビデオマット（テレビジョン学会技術報告、ｖｏｌ．１２，ｐｐ．２９−３４，１９８８）などの手法が知られている。
【０００３】
画像から特定領域の画像を抽出する装置として、例えば特公平6-9062号公報に開示される手法においては、空間フィルタによって得られる微分値を２値化して境界線を検出し、上記検出した境界線で仕切られる連結領域にラベル付けを行い、同じラベルを有する領域を抽出するようにしている。
【０００４】
なお、背景のみの画像との差分に基づいて画像抽出を行う手法は古典的なものであり、最近は、特開平4-216181号公報において背景のみの画像と処理対象画像との差分データにマスク画像( 特定処理領域のこと) を設定して、画像中の複数の特定領域における対象物体を抽出または検出を行う手法が開示されている。
【０００５】
また、特公平7-16250 号公報に係わる方式では、抽出対象の色彩モデルを用いて背景を含む現画像の色彩変換データ、背景のみの画像と現画像との明度の差分データから抽出対象の存在確率分布を求めるようにしている。
【０００６】
さらに、カメラ操作、動作モードの適正化を行う手法の一例として、特開平6-253197号公報に記載されている方法では、背景のみの画像の撮像時に平均輝度が適正となるように絞りを設定する。そして、同じ設定値を用いて現画像を撮像してそれらの差分画像データに基づいて物体画像の抽出を行うようにしている。
【０００７】
一方、撮像手段においては、信号処理のデジタル化に伴い映像情報の処理および加工の自由度が上がるに従って、内部での処理は輝度レベル、色調の変換、ホワイトバランス処理、量子化サイズ変換などの比較的軽易な処理から、エッジ抽出機能を有するものや、色成分の逐次成長法を用いた画像抽出機能（テレビジョン学会技術報告、ｖｏｌ．１８，ｐｐ．１３−１８，１９９４）を有するものなどへと大きな進展を見せている。
【０００８】
【発明が解決しようとする課題】
しかしながら、上記従来例のうち、背景のみの画像との差分データを利用する方式は、特開平6-253197号公報に記載されている手法を除いて、いずれも撮影条件（カメラパラメータおよび照明などの外的条件）を考慮していないために、同じ撮影条件および同一固定位置で、背景のみの画像と抽出すべき被写体込み画像を得なければ、差分データからの抽出対象領域の判定誤差が非常に大きくなってしまうという問題があった。
【０００９】
また、特公平7-16250 号公報に記載されている方法は、抽出対象の色彩モデルを要するという点で任意の未知物体の画像抽出には不向きであった。
【００１０】
また、特開平6-253197号公報に係わる方式においても、撮像手段が同一固定位置にあることや、背景のみの画像撮像時と同一の撮像条件であることが前提であり、背景のみの画像を撮像した時の絞りの設定値を、被写体込み画像を撮像する際に用いるという技術を開示しているに過ぎなかった。また、背景のみの画像の撮像条件を優先するこの方式では、被写体込み画像の抽出対象の画質は一般的には保証されないという問題があった。
【００１１】
さらに、上記クロマキー方式は、背景の制約が大きいために屋外で使えない問題や、色ぬけなどの問題点があった。
また、ビデオマット方式は、輪郭の指定作業は人間が画素単位で正確に行う必要があり、そのためには労力と熟練を要するという問題点があった。
【００１２】
また、微分演算で境界線を検出して境界線で仕切られる領域を検出する方式は、複雑なテクスチャパターン（模様) を有する物体への適用が困難であることや、安定的かつ汎用的な境界線検出処理方式がないこと等の問題点があった。
【００１３】
本発明は上述の問題点にかんがみ、複数画像間の比較から特定被写体を抽出する際に、各画像における撮像条件の相違に対する許容度を大きくすることと、抽出対象領域を簡易に同定することと、抽出すべき対象の良好な画像が得られるようにすることとができる画像抽出装置を提供することを目的とする。
【００２０】
【課題を解決するための手段】
本発明の画像抽出装置は、撮像手段により撮像を行う際の撮像条件を制御するための撮像条件制御手段と、上記撮像手段により撮像を行う際の撮像条件を記録するとともに、上記記録した撮像条件を再生して出力するための撮像条件記録再生手段と、上記撮像手段で撮像した複数画像を記録するための画像記録手段と、上記撮像条件記録再生手段から供給される撮像条件に基づいて上記複数画像のうち、少なくとも一つの画像の画像データを変換するための画像データ変換手段と、上記画像データ変換手段によって変換された複数画像の画像データを比較するための画像データ比較手段と、上記画像データ比較手段から出力される比較結果に基づいて特定画像領域の画像を抽出するための画像切り出し手段とを具備し、上記画像記録手段は、背景のみの画像および切り出し対象画像が背景中に存在する被写体込み画像を一次的に記憶保持し、上記画像データ比較手段は、上記画像記録手段により再生されて出力される複数画像の差分画像データを抽出し、上記撮像条件記録再生手段は、所定背景中の切り出し対象画像を撮像した時の撮像条件を記録し、上記画像データ変換手段は、上記撮像条件記録再生手段により再生されて出力される撮像条件に基づいて上記背景のみの画像の画像データを変換することを特徴とする。
【００２３】
また、本発明のその他の特徴とするところは、上記画像記録手段は、上記画像切り出し手段で得られる特定画像領域の画像データを符号化して記録することを特徴としている。
【００２４】
また、本発明のその他の特徴とするところは、上記画像データ変換手段は、上記背景のみの画像および被写体込み画像間の類似度が最大となるように空間的にシフト演算を行うようにしている。
【００２６】
また、本発明のその他の特徴とするところは、上記画像データ変換手段は、画像サイズ、輝度レベル、色成分および解像度を変換するようにしている。
【００２７】
【作用】
本発明は上記技術手段よりなるので、第１の発明によれば、撮像条件が記録および再生され、上記再生された撮像条件に基づいて画像データが変換されるので、例えば登録済の画像と現画像、あるいは動画像中の異なるフレーム画像などのような複数の画像間の比較を行って特定被写体を抽出する際に、各画像の撮像条件が相違することに対する許容度を大きくすることができ、これにより、手ぶれなどによる撮像手段の位置の微小変動、露光条件の相違、センサのゲイン変動等が多少あっても特定被写体を抽出することが可能となる。
【００２８】
そして、背景のみの画像および切り出し対象画像が背景中に存在する被写体込み画像が、画像記録手段で一次的に記憶保持されるとともに、上記画像記録手段により再生されて出力される複数画像の差分画像データが画像データ比較手段により抽出され、上記抽出された差分画像データが上記背景のみの画像と被写体込み画像との比較データとして用いられるので、背景から特定被写体を切り出す処理を効率的に行うことが可能となる。
【００２９】
さらに、撮像条件記録再生手段に記録された所定背景中の切り出し対象画像を撮像した時の撮像条件に基づいて背景のみの画像の画像データが変換されるので、切り出し対象にピントの合った高画質の画像を出力することが可能となる。
【００３０】
第２の発明によれば、画像切り出し手段で得られる特定画像領域の画像データが符号化されて画像記録手段に記録されるので、必要とする画像を効率的に符号化することができる画像抽出が可能となる。
【００３１】
第３の発明によれば、背景のみの画像および被写体込み画像間の類似度が最大となるように、空間的にシフト演算が行われるので、撮像手段の位置、姿勢の変動を許容する画像抽出が可能となる。
【００３３】
第４の発明によれば、画像サイズ、輝度レベル、色成分、および解像度を画像データ変換手段によって変換されるので、異なる撮像条件で撮像された複数画像間の画像データの正規化を良好に行うことができ、特定被写体の抽出を高精度に行うことが可能となる。
【００３４】
【発明の実施の形態】
以下、本発明の画像抽出装置の実施形態を図面を参照して説明する。
図１は、本発明の画像抽出装置の要部構成を示す機能構成図である。図１においてＡは撮像光学系および撮像素子等からなる撮像手段、Ｂはズーム、焦点、絞りその他、種々の撮像パラメータを制御する撮像条件制御手段、Ｃは撮像条件制御手段Ｂの制御データを記録再生する撮像条件記録再生手段、Ｄは撮像手段Ａによって撮像された画像を記録する画像記録手段、Ｅは撮像条件記録再生手段Ｃより供給された各種制御パラメータに基づいて画像を変換する画像データ変換手段、Ｆは背景画像と被写体込みの画像とを比較する画像データ比較手段、Ｇは画像切り出し手段であり、画像データ比較手段Ｆの出力に基づいて画像の切り出し領域を設定するものである。Ｈは画像を表示するためのモニタディスプレイ、電子ビューファインダ等からなる表示手段である。
【００３５】
また、上記撮像手段Ａは、複数画像を撮像するために設けられているものであり、上記撮像条件制御手段Ｂは、上記撮像手段Ａにより撮像を行う際の撮像条件を制御するためのものである。なお、本実施形態においては、上記撮像条件は露光量、合焦状態およびストロボ発光の有無を含んでいる。
【００３６】
上記撮像条件記録再生手段Ｃは、上記撮像手段Ａにより撮像を行う際の撮像条件を記録するとともに、上記記録した撮像条件を再生して出力するためのものであり、本実施形態においては上記撮像条件記録再生手段Ｃは、所定背景中の切り出し対象画像を撮像した時の撮像条件を記録する。
【００３７】
上記画像記録手段Ｄは、上記撮像手段Ａで撮像した複数画像を記録するためのものであり、背景のみの画像および切り出し対象画像が背景中に存在する被写体込み画像を一次的に記憶保持する。また、上記画像切り出し手段Ｇで得られる特定画像領域の画像データを符号化して記録する。
【００３８】
また、上記画像データ変換手段Ｅは、上記撮像条件記録再生手段Ｃから供給される撮像条件に基づいて上記複数画像のうち、少なくとも一つの画像の画像データを変換するために設けられているものである。また、上記撮像条件記録再生手段Ｃにより再生されて出力される撮像条件に基づいて上記背景のみの画像の画像データを変換する。さらに、上記画像データ変換手段Ｅは、上記背景のみの画像および被写体込み画像間の類似度が最大となるようにメモリ上で空間的にシフト演算を行うとともに、画像サイズ、輝度レベル、色成分および解像度を変換する。
【００３９】
上記画像データ比較手段Ｆは、上記画像データ変換手段Ｅによって変換された複数画像の画像データを比較するためのものであり、上記画像記録手段Ｄにより再生されて出力される複数画像の差分画像データを抽出する。
【００４０】
上記画像切り出し手段Ｇは、上記画像データ比較手段Ｆから出力される複数画像の画像データの比較結果に基づいて特定画像領域の画像を抽出するために設けられているものである。
【００４１】
このように構成された本実施形態の画像抽出装置によれば、例えば登録済の画像と現画像、あるいは動画像中の異なるフレーム画像などのような、複数の画像間の比較から特定被写体を抽出する際に、各画像のそれぞれにおいて撮像条件が相違することに対する許容度を大きくすることができる。
【００４２】
これにより、背景のみの画像から被写体画像の抽出を行う際に、例えば手ぶれなどによる撮像手段Ａの位置の微小変動や露光条件の相違、あるいはセンサのゲインなどの変動が多少あっても、特定被写体を良好に抽出することができる。また、色彩モデルなど抽出対象に関するモデルを用いずに照明条件などの変動の許容度を大きくすることができる。
【００４３】
また、撮像条件やカメラパラメータの変動に対する許容度を高めることができるので、背景からの特定被写体を切り出す処理を効率的に行うようにすることができる。さらに、背景のみの画像を撮像した際の撮像条件で撮像した被写体込み画像を用いて背景のみの画像の画像データを変換するので、背景のみの画像と被写体込み画像とから撮像条件およびカメラパラメータの変動に対する許容度を高めることができる。また、背景のみの画像の撮像時の撮像条件に左右されない高画質な被写体抽出を行うことができ、切り出し対象にピントの合った高画質な画像を出力することができる。
【００４４】
また、本実施形態の他の特徴によれば、画像切り出し手段Ｇで得られる特定画像領域の画像データを符号化して画像記録手段Ｄに記録するようにしたので、必要とする画像を効率的に符号化することができ、画像抽出を良好に行うようにすることができる。
【００４５】
また、本実施形態のその他の特徴によれば、背景のみの画像および被写体込み画像間の類似度が最大となるように、画像データ変換手段Ｅが空間的にシフト演算を行うようにしたので、撮像手段Ａの位置の変動や姿勢の変動を許容する画像抽出が可能となり、撮像時の手ぶれ等による悪影響が少ない画像抽出を行うことができる。
【００４６】
また、本実施形態のその他の特徴によれば、撮像条件に露光量、合焦状態およびストロボ発光の有無を含むようにしたことにより、異なる撮像条件で撮像された複数画像から特定被写体の画像抽出を行うことが可能となり、倍率条件、ピント、コントラスト、照明条件などの変動に対する許容度を高めた画像抽出を行うことができる。
【００４７】
また、本実施形態のその他の特徴によれば、画像データ変換手段Ｅが画像サイズ、輝度レベル、色成分、および解像度を変換するようにしたので、異なる撮像条件で撮像された複数画像間の画像データの正規化を行い、画像間の比較に基づく特定被写体の抽出処理を高精度に行うことができ、高精度な被写体画像抽出を行うことができるようになる。
【００４８】
次に、図２〜図６を参照しながら本発明の画像抽出装置の構成および動作をより具体的に説明する。
まず、第１の実施形態を説明する。第１の実施形態の基本処理は、背景のみの画像と背景中の被写体画像とを撮像し、両者の撮像条件を考慮した比較データ（差分データ）に統計的な処理を施すことにより、被写体画像領域を検出して被写体の画像抽出を行うものである。
【００４９】
また、本実施形態では、操作者がビデオカメラ等による撮像手段を手に持って撮像することを想定し、抽出すべき被写体が背景中に入っている画像を最初に撮像し、その時の撮像条件を画像データとともに記憶手段に記録する。次に、同じ撮像条件（画像信号特性パラメータを含むカメラ内部パラメータ）を記憶手段から読みだして背景のみの画像を撮像するようにしている。
【００５０】
一方、２回の撮像で環境条件が同一とは見做せない場合、例えば、撮像時刻や外部からの照明条件が全く異なる場合などは、同一条件にすべき撮像パラメータは倍率、ピントなどに限定して、同一条件にできない事態が生じるのを防止するようにしている。
【００５１】
図２は、本実施形態の撮像システムの要部構成を示している。図２において、１は撮像手段、２は撮像レンズからなる結像光学系で、本実施形態では立体画像を撮像するための複眼撮像系となっている。３は結像光学系２の各レンズを駆動するレンズモータ駆動制御部、４はイメージセンサであり、一般にはＣＣＤ等が用いられる。
【００５２】
５は撮像パラメータ計測制御手段であり、倍率を制御するズームレンズの焦点距離検出手段、周知の手段によってレンズの合焦状態を検出する合焦状態検出手段、例えばＣＣＤの蓄積時間を制御するシャッタ速度検出制御手段、絞りの開口径を制御する絞り計測制御手段、画像信号特性パラメータ（ガンマ、ニー、ホワイトバランス補正、ＣＣＤの蓄積時間など）の特徴量（例えばガンマについては補正係数など）検出手段などを含んでいる。６はメモリ等からなる画像記録手段、７は表示手段としての電子ビューファインダ（ＥＶＦなど）である。
【００５３】
８は撮像モード記録手段であり、撮像パラメータ、画像特性パラメータおよびストロボ発光の有無、スキャンニングなどの意図的な運動、手ぶれの有無などを含む撮像時の情報を記録するためのものである。なお、手振れ、スキャンやパンなどのカメラ運動については撮像手段に加速度センサを内蔵させ、その出力データ等から判定してもよい。これらの付帯データは、画像データとともに画像データベース１８に記憶される。
【００５４】
９は、撮像条件などに基づいて合成時に画像データの変換を行う画像データの変換手段であり、詳細については後で説明する。１０は画像信号処理回路であり、ガンマ、ニー、ホワイトバランス補正、ＡＦ（Automatic Focusing）、ＡＥ（Automatic Exposure）、ＡＧＣ（Automatic Gain Control）処理回路などを有している。１１は画像データ比較手段であり、背景のみの画像と被写体込み画像との差分を検出して出力するためのものである。
【００５５】
１２は画像切り出し手段であり、上記画像データ比較手段１１からの出力を統計処理した結果に基づいて抽出対象領域を同定し、上記同定した抽出対象領域を被写体込み画像から切り出すためのキー信号（またはマスクデータ）を出力するものである。１３は画像転送手段であり、外部のデータベースまたは端末などに画像データなどを転送するために設けられているものである。
【００５６】
１４はストロボ発光手段、１５は外部同期手段であり、信号線は省略するが、各回路へと同期クロックを供給する。１６は端末を表し、外部からの撮像モード制御、切り出し画像の選択、登録画像の検索および選択などを行うためのものである。１７はディスプレイであり、処理画像の出力やファインダディスプレイとしても機能する。
【００５７】
１８は画像データベースを表し、過去に撮像された画像データ、およびそれぞれの付帯データとして、登録画像であるか否かの種別、撮像パラメータ、撮像条件（屋外または室内の区別、ストロボ発光の有無など）、あるいはその他の情報（日付、時刻、場所、カメラ操作者、タイトルなど）を保存している。
【００５８】
１９は画像種別設定手段であり、他の画像との比較に基づいて対象を抽出する際の基準画像として登録するためのスイッチや、基準画像と比較すべき被写体抽出用画像かの種別を設定するためのスイッチ等の手段を表し、これにより画像種別が付帯情報として自動的に記録される。
【００５９】
２０はカメラパラメータ設定手段であり、通常は背景のみの画像と被写体込み画像とは同じ撮像モードで撮像が行われるが、上記カメラパラメータ設定手段２０は、撮像手段の内部の特性を操作者が任意に設定する場合に用いるものである。なお、撮像手段の内部の回路は、その機能ごとの表示としたが、各機能は不図示のマイクロプロセッサによって動作制御される。
【００６０】
次に、上述のように構成された本実施形態の画像抽出装置の基本処理を図３のフローチャートに示し、画像抽出の実例を図６に示して説明する。なお、これらの動作も、図には機能ごとの表示を行ったが、実際はマイクロプロセッサによって処理される。
処理が開始されると、先ず、最初のステップＳ１において、撮像手段より出力された画像が背景のみの画像であるか、それとも被写体込み画像であるかの種別が判断される。上記判断は、画像種別設定手段１９を介して操作者によって設定された画像の種別について行われる。すなわち、操作者が撮像しようとする画像が主被写体か、背景のみか、両者を併合したものかの種別を指示する。
【００６１】
ステップＳ１の判断の結果、被写体込みの画像を撮像する場合には、ステップＳ２に進んで撮像モードを設定する。その後、ステップＳ３において、撮像のための各種パラメータを設定して被写体に最適な撮像条件（以下に説明）で撮像を行う。そして、上記ステップＳ３における撮像を終了した後は、ステップＳ４に進み、上述のように倍率（焦点距離）、合焦度、絞り、シャッタ速度、手振れの有無、パンニング・チルティングの有無、あるいはゲイン等、各種の撮像モードや撮像条件などの付帯情報の計測を行う。
【００６２】
次に、ステップＳ５に進み、上記ステップＳ４で計測した付帯情報を画像データとともに所定のフォーマットで記録する。なお、上記付帯情報は対応する画像データのアドレス等とともに別途ヘッダファイル等に記録するようにしてもよい。
【００６３】
一方、上記ステップＳ１の判別結果、背景のみの画像を撮像する場合には、ステップＳ１からステップＳ６に進む。そして、ステップＳ６において、被写体込み画像の付帯情報を読み込み、次に、ステップＳ７において上述と同様に撮像モードを設定する。この場合、ステップＳ２の処理における同一条件パラメータを選択するようにして、基本的には、次のステップＳ８において同一の撮像条件で背景のみの画像を撮像するようにする。
【００６４】
ただし、環境条件などの変動に対応するために、上記ステップＳ７においては、撮像時に撮像モード記録手段８で記録された撮像条件の選択モードとして、屋外または室内の区別、ストロボ投光の有無等の情報を用いて最適な撮像モードを設定するようにする。撮像パラメータ計測制御手段５においては、これら撮像条件と付帯情報を活用して環境条件の変化の有無を判定し、同一条件にするように撮像パラメータを制御する。
【００６５】
なお、室内、屋外を問わず、背景のみの画像の撮像時刻と被写体込み画像の撮像時刻とが近接している場合には、環境条件、特にストロボ等のような、撮像手段に搭載の照明手段を除いた照明条件および背景パターン等はほとんど変化していないと考えられる。
【００６６】
したがって、画像信号特性パラメータを除く撮像モードを両者間で同一（例えば、被写体込み画像の撮像モードで統一）にすることにより、２つの画像間において背景パターンの同一領域での画像データの変動を抑制することができ、後続する画像切り出し用統計的処理の信頼性を増すことができる。
【００６７】
ただし、被写体画像の分光反射率特性等によってセンサ信号処理回路の特性（ガンマ、ホワイトバランスなど）が変化することやノイズが加わること等から、一般的な状況では同じ撮像モードでも背景部の画像データは２つの画像間で完全に一致しないケースもあり得る。
【００６８】
このようなケースに対応するために、ステップＳ９において画像データの変換処理を行う。この場合、スケール変換（倍率変動時）、輝度変換（露光量／ガンマ変動時）、色成分変換（ホワイトバランス特性変動）、画像間位置合わせ（画像の位置に変化のあるときは）等を行い、背景のみの画像のうち、被写体込み画像の背景のみの画像領域に相当する部分の画像データがほぼ同じとなるように正規化する。
【００６９】
次に、ステップＳ１０で背景のみの画像との画像比較処理を行って画像データ差分を算出するとともにその画像差分データに基づいて、ステップＳ１１で画像切り出し処理を行う。なお、これらの処理内容については後で詳しく説明する。
【００７０】
上記画像データ変換手段９は、図４に示すように、画像データおよびその付帯データを入力するデータ入力手段９０、輝度値変換手段９１、色成分変換手段９２、空間シフト演算手段９３、画像サイズスケーリング手段９４、パラメータ変動評価手段９５等から構成されている。なお、一般的には、被写体込み画像を基準画像とする。そして、上記基準画像については画像データを固定とし、背景のみの画像の画像データを変換するようにしている。
【００７１】
また、撮像モードが両画像間で同一の場合には、すなわち、環境条件の変化が全然無い場合には、輝度値および色成分の変換は一般的には必要ないといえる。しかし、被写体画像の特徴およびその背景に対する面積の割合によっては、画像信号特性パラメータ、すなわち、ゲインおよびホワイトバランス特性、ガンマ特性が変わることがある。
【００７２】
したがって、これらの特性値のうち、少なくとも一つの特性値の変化幅が一定閾値を越えた場合には、所定のメモリに記録した背景のみの画像および被写体込み画像中の各点のパラメータ特性値の差異に基づいて背景のみの画像を、その輝度、色成分を被写体込み画像の特性値に対応するように変換する。以下、各変換手段において行われる処理内容を説明する。
【００７３】
なお、画像データ変換手段９における変換処理は、各パラメータの変動が比較的小さい場合は、撮像環境に大きな変化がないと考えられ、図４（ａ）に示すごとく、複数画像のサイズを合わせるスケーリング変換、各画像の位置合わせを行って基本的な画像合成を可能にした後、輝度変換、色成分変換等の画像の細部の合わせ込みを行うといった順に行われる。なお、図４（ａ）において縦方向の矢印は処理の行われる順序を表している。
【００７４】
ただし、撮像モード同一化制御を撮像パラメータ設定手段２０により解除した場合はこの限りではなく、図４（ｂ）のように、位置合わせを除いて変換順序を設定するための変換順序設定手段９６により、変動量の大きい項目順に順序設定をして変換を行うようにしてもよい。これによって、変動量の小さい項目の変換を行うにつれて、誤差の影響が小さくなり、精度が向上する。
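The ordering rule described in the paragraph above (alignment is excluded, and the remaining conversions run in descending order of variation) can be sketched as follows. This is a minimal illustration only: the function name `conversion_order`, the step names, and the idea of expressing each variation as a single comparable score are assumptions, not details given in the patent.

```python
def conversion_order(variations):
    """Return conversion step names, largest variation first.

    `variations` maps a step name to a non-negative variation score
    (hypothetical: for example, the relative change of the parameter
    between the two shots). Alignment is left out of the ordering,
    as the text describes.
    """
    candidates = {name: score for name, score in variations.items()
                  if name != "alignment"}
    return sorted(candidates, key=lambda name: candidates[name], reverse=True)


# Example scores (illustrative values, not measured data).
order = conversion_order({
    "scaling": 0.02,
    "luminance": 0.30,
    "color": 0.10,
    "alignment": 0.50,
})
```

With these example scores the luminance conversion runs first and the scaling conversion last, so the coarse mismatch is removed before the finer ones are corrected.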
【００７５】
上記スケーリング変換を行うためのスケーリング変換手段９４は、両画像間の倍率変動量に基づいて、被写体込み画像と同じ視野角となるように背景のみの画像を変換する。ただし、撮像手段と被写体との距離の変動が十分小さいと仮定する。また、倍率データの変動がない場合は、ここでの処理を省略して位置合わせ手段９３から変換を始めてもよい。
【００７６】
輝度値変換手段９１は、背景のみの画像と被写体込み画像との間でのゲイン、ガンマ特性の変動量、露光量、および屋外・室内の区別、ストロボ発光の有無などの照明条件の一致・不一致などに基づいて、輝度レベルの変換を各画素ごとに行う。
色成分変換手段９２は、ホワイトバランス特性変動量などに基づいて、各画像データの色成分の変換を行う。
【００７７】
位置合わせ手段９３は、背景のみの画像および被写体込み画像間の類似度が最大となるようにするために、メモリ上で空間的にシフト演算を行う。すなわち、２つの画像を撮像する間の撮像手段の位置の変動や、姿勢変動（手ぶれ、撮像手段の持ち直しがある場合、足の位置を多少変えたりする場合）を許容するために、両画像間の対応点抽出と対応点に基づく位置合わせ、すなわち、マッチング演算を行う。
【００７８】
通常は、短時間中の手ぶれ程度の変動に対しては、各画像の４隅の局所領域間での対応点（少なくとも３点）抽出で十分である。なお、上記のような両画像間の対応点抽出を行うためのアルゴリズムとしては、代表的には、画像中の各点を中心とする所定サイズのブロックに分割し、ブロック間の相互相関値が最大となる点同士を対応点とするブロックマッチング方式を考慮することができる。
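The block matching approach named in the paragraph above (taking as corresponding points the block positions whose cross-correlation is largest) can be sketched in plain Python. This is an illustrative sketch, not the patented implementation: images are assumed to be lists of rows of grayscale values, the normalized form of the cross-correlation is an assumption, and the helper names `block`, `ncc`, and `match_block` are hypothetical.

```python
import math


def block(img, top, left, size):
    """Cut a size x size block out of an image stored as a list of rows."""
    return [row[left:left + size] for row in img[top:top + size]]


def ncc(a, b):
    """Normalized cross-correlation of two equal-size blocks."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)
                    * sum((y - my) ** 2 for y in ys))
    return num / den if den > 0 else 0.0  # flat blocks carry no information


def match_block(ref, target, top, left, size, search):
    """Find the offset (dy, dx) within +/-search that maximizes the NCC."""
    template = block(ref, top, left, size)
    height, width = len(target), len(target[0])
    best_score, best_offset = -2.0, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > height or l + size > width:
                continue  # candidate block would leave the image
            score = ncc(template, block(target, t, l, size))
            if score > best_score:
                best_score, best_offset = score, (dy, dx)
    return best_offset, best_score
```

Running `match_block` on a small block near each of the four image corners would give the three or more corresponding points that the text considers sufficient for hand-shake-level motion.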
【００７９】
ところで、位置合わせを行うためには、背景部分が両画像間で所定割合以上で重なり合うことが重要である。一般的な目安としては、抽出対象の画像領域を除いた重なり合う面積が両画像間で、例えば５割以上あることが必要な条件となる。
【００８０】
ただし、重なり合う面積の割合の最低値は、背景のみの画像のパターンに応じて可変であることはいうまでもない。特に、背景と被写体との領域分離が比較的容易な場合、例えば背景が無地に近い場合、あるいは周期パターン等を有し、かつ抽出被写体がそれとは明白に異なるパターンを有する場合などにおいては、重なり面積は著しく小さくてもよい。
【００８１】
上記画像データ比較手段１１においては、画像変換後の背景のみの画像と被写体込み画像との差分画像データを生成する。
また、上記画像切り出し手段１２は、上記生成した差分画像データを平滑化処理（メディアンフィルタなど）、統計処理（色成分偏差、輝度レベル偏差に基づく）することにより、背景画像データの変動の大きい領域から被写体領域を抽出する。
【００８２】
具体的には、変換後の背景のみの画像Ｉ_b、被写体込み画像Ｉ_tのＲ，Ｇ，Ｂ成分および輝度信号から、色相Ｈ_b，Ｈ_t、彩度Ｓ_b，Ｓ_t、明度Ｖ_b，Ｖ_tをそれぞれ抽出し、以下の評価関数Ｆの値を所定の閾値で２値化して被写体領域と背景領域の識別を行う。
$$F(H_b - H_t,\ S_b - S_t,\ V_b - V_t) = \alpha_h (H_b - H_t)^2 + \alpha_s (S_b - S_t)^2 + \alpha_u (V_b - V_t)^2 \qquad \text{（式１）}$$
【００８３】
ここに、α_h、α_s、α_uは画像Ｉ_b，Ｉ_tの各成分のＳ／Ｎ値または各画像を所定サイズのブロックに分割した際の各成分の分散値の関数であり、例えばα_h＝Ｐ_h（Ｉ_b）・Ｐ_h（Ｉ_t）などが用いられる。
【００８４】
ここに、Ｐ_h（Ｉ）は、画像データＩの所定領域における色相成分に関するＳ／Ｎ値の単調増加関数あるいは分散値の単調減少関数（逆数など）を表す。同様にして、Ｐ_s（Ｉ）、Ｐ_u（Ｉ）として彩度、明度に関するパラメータが定義される。領域中各点の判別用閾値Ｔの設定は大津の方法（電子情報通信学会論文誌、vol.J63，pp.349−356，1980）などを評価関数の各項、すなわち、（H_b−H_t）²，（S_b−S_t）²，（V_b−V_t）²それぞれについて適用して、Ｔ_h，Ｔ_s，Ｔ_uを得たとすると、Ｔ＝Ｔ_h＋Ｔ_s＋Ｔ_uとしてもよい。
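The evaluation function of Equation (1) and its binarization can be sketched as below. This is a minimal illustration under stated assumptions: pixels are RGB tuples with components in [0, 1], hue, saturation, and value come from Python's standard `colorsys.rgb_to_hsv`, the weights are passed in directly rather than derived from S/N or variance, and the function names `evaluation_map` and `binarize` are hypothetical. Hue is treated as a plain number, as in the equation, so its wrap-around at the ends of the hue circle is not handled.

```python
import colorsys


def evaluation_map(bg_rgb, subj_rgb, alpha_h, alpha_s, alpha_u):
    """Per-pixel F = a_h*(dH)^2 + a_s*(dS)^2 + a_u*(dV)^2 (Equation 1).

    bg_rgb is the converted background-only image and subj_rgb the
    subject-included image, both as lists of rows of (r, g, b) tuples.
    """
    result = []
    for row_b, row_t in zip(bg_rgb, subj_rgb):
        out_row = []
        for pixel_b, pixel_t in zip(row_b, row_t):
            h_b, s_b, v_b = colorsys.rgb_to_hsv(*pixel_b)
            h_t, s_t, v_t = colorsys.rgb_to_hsv(*pixel_t)
            out_row.append(alpha_h * (h_b - h_t) ** 2
                           + alpha_s * (s_b - s_t) ** 2
                           + alpha_u * (v_b - v_t) ** 2)
        result.append(out_row)
    return result


def binarize(f_map, threshold):
    """Mark pixels whose F exceeds the threshold as subject (1), else 0."""
    return [[1 if f > threshold else 0 for f in row] for row in f_map]
```

A weight such as α_h = P_h(I_b)·P_h(I_t) could be supplied by computing, for example, the reciprocal of the hue variance in each image, which is one of the monotonically decreasing functions of variance that the text allows.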
【００８５】
なお、以上の評価関数および各パラメータは、上記した定義に限定されるものではない。また、閾値そのものはあらかじめ与えた値で画像全領域にわたって一定値としてもよい。
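Otsu's method, which the text cites for choosing the thresholds T_h, T_s, and T_u, picks the threshold that maximizes the between-class variance of a histogram. The sketch below is a plain-Python illustration of that idea, not the cited paper's exact formulation: the 256-bin histogram, the choice of the bin's upper edge as the returned threshold, and the names `otsu_threshold` and `combined_threshold` are assumptions.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes the between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo  # constant data: no meaningful split exists
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * count for i, count in enumerate(hist))
    w0, sum0 = 0, 0
    best_var, best_bin = -1.0, 0
    for i in range(bins - 1):
        w0 += hist[i]            # number of samples in the lower class
        sum0 += i * hist[i]
        w1 = total - w0          # number of samples in the upper class
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_bin = between, i
    return lo + (best_bin + 1) * width  # upper edge of the lower class


def combined_threshold(dh2, ds2, du2):
    """T = T_h + T_s + T_u, one Otsu threshold per squared-difference term."""
    return otsu_threshold(dh2) + otsu_threshold(ds2) + otsu_threshold(du2)
```

Each argument of `combined_threshold` is the flattened list of one squared-difference term over the region being classified. A fixed, preset threshold, which the text also permits, would simply replace the call.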
【００８６】
次に、閾値処理の結果、被写体領域と判定された連結領域をラベル付けし、ディスプレイ１７上にそれぞれ異なる色、輝度値、またはハッチングパターン等でマスク画像データとして表示する。
【００８７】
この際、連結領域内の孤立領域、すなわち背景の塊状領域を同一被写体領域と見做してそれを取り囲む被写体領域と同一のラベルに変換してもよい。ユーザは、抽出すべき被写体領域の一つを不図示のマウスなどを用いて選択、指示（マウスでクリックするなど）し、結果として背景などを除去した被写体画像のみがディスプレイ１７上に表示される。
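The labeling of connected subject regions and the user's selection of one region can be sketched as below. This is an illustrative sketch: 4-connectivity, breadth-first traversal, and the names `label_regions` and `select_region` are assumptions, and the optional merging of enclosed background islands into the surrounding label is not shown.

```python
from collections import deque


def label_regions(mask):
    """Label 4-connected regions of 1s; returns (labels, region_count)."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue
            count += 1                      # start a new region
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def select_region(labels, y, x):
    """Return a mask of the region containing the clicked point (y, x)."""
    chosen = labels[y][x]
    return [[1 if chosen and value == chosen else 0 for value in row]
            for row in labels]
```

The mask returned by `select_region` plays the role of the key signal: multiplying it into the subject-included image leaves only the chosen subject.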
【００８８】
ユーザは、その抽出画像データで問題なければ、例えば、確認用のアイコンをクリックするなどして確認の指示を与える。これにより、上記被写体のみの画像データが符号化され、次に画像データファイルの生成が実行される。
【００８９】
図６の（１）は、被写体および背景の両方を入れた被写体込み画像の例を示し、図６の（２）は背景のみの画像の例を示している。図６の（１）の被写体込み画像では、被写体（人物）を優先した撮像モードで画像が入力される。
【００９０】
また、図６の（２）の背景のみの画像では、同じ倍率で遠景に画像信号特性量が合うように撮像されている。これらの画像間では、同じ背景部においても画像データの特性（平均的輝度レベル、色成分）が若干異なることがあり、図６の（２）は説明上それを強調した図である。
【００９１】
図６の（３）は、被写体込み画像の撮像条件に基づいて背景のみの画像の正規化を行った結果を示している。切り出し処理は、被写体込み画像（１）と背景のみの画像（３）との差分データ（上述したステップＳ１０における比較処理の結果）に基づいて行い、その結果を図６の（４）に示す。
【００９２】
なお、登録画像を画像データベース１８から検索、抽出して用いてもよい。この場合、登録画像とは本実施形態では被写体込み画像のことであり、画像のヘッダ部またはヘッダファイルに記録された撮像時の付帯情報を活用して同様の処理を行う。
【００９３】
次に、本発明の画像抽出装置の第２の実施形態を説明する。
この第２の実施形態では、背景のみの画像と被写体込み画像の撮像順序によらず、被写体画像の画質および切り出し精度を損なわない処理を行うようにしている。
【００９４】
このために、本実施形態ではそれぞれの倍率条件を一定に保持しながら他の撮像条件を独立に設定可能な条件下で撮像して記録するようにしている。ただし、被写体込み画像を撮像する際は、被写体像の画質を優先した撮像モードが取られるものとする。
【００９５】
図５に、第２の実施形態の処理フローを示す。図５中の、第一画像および第二画像とは、上述の説明における背景のみの画像、被写体込み画像またはそれらの逆順の画像に対応する。
【００９６】
図５において、ステップＳ４９の画像データ変換処理においては、第１の実施形態と同様に画像データ変換手段９において、いずれか一方の撮像条件と一致するように他方の画像データの変換と両画像間の位置合わせを行う。
【００９７】
なお、この処理に先だって、両画像データに対して間引き処理または局所平均化処理などによる低解像度画像への変換を行ってもよい。これは、倍率条件が同じ場合でもピント（解像度）は被写体込み画像の背景領域と背景のみの画像の対応する領域とでは大きく異なる場合があるからである。上述のように、両画像とも低解像度化して処理すると、被写体領域を大まかに推定する際の効率および精度を向上させることができる。
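The two low-resolution conversions mentioned above, decimation (thinning) and local averaging, can be sketched as follows. This is a minimal sketch for grayscale images stored as lists of rows; the integer reduction `factor` and the function names `decimate` and `local_average` are assumptions for illustration.

```python
def decimate(img, factor):
    """Keep every factor-th pixel in both directions (thinning)."""
    return [row[::factor] for row in img[::factor]]


def local_average(img, factor):
    """Replace each factor x factor block by its mean value."""
    out_h, out_w = len(img) // factor, len(img[0]) // factor
    result = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = sum(img[by * factor + dy][bx * factor + dx]
                        for dy in range(factor) for dx in range(factor))
            row.append(total / (factor * factor))
        result.append(row)
    return result
```

Local averaging suppresses the focus differences between the two images more strongly than decimation does, at the cost of a little more computation, which fits the stated purpose of getting a rough estimate of the subject region.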
【００９８】
【発明の効果】
第１の発明によれば、撮像条件を記録し、上記記録した撮像条件に基づいて画像データを変換するようにしたので、例えば登録済の画像と現画像、あるいは動画像中の異なるフレーム画像などのような、複数の画像間の比較から特定被写体を抽出する際に、各画像のそれぞれにおいて撮像条件が相違することに対する許容度を大きくすることができる。これにより、背景のみの画像から被写体画像の抽出を行う際に、手ぶれなどによる撮像手段の位置の微小変動や露光条件の相違、あるいはセンサのゲインなどの変動が多少あっても特定被写体を良好に抽出することができる。また、色彩モデルなど抽出対象に関するモデルを用いずに照明条件などの変動の許容度を大きくすることができる。
【００９９】
そして、背景のみの画像と被写体込み画像との比較データとして差分画像データを用いるようにしたので、撮像条件やカメラパラメータの変動に対する許容度を高めることができ、背景からの特定被写体を切り出す処理を効率的に行うようにすることができる。
【０１００】
さらに、背景のみの画像を撮像した際の撮像条件で撮像した被写体込み画像を用いて背景のみの画像の画像データを変換するようにしたので、背景のみの画像と被写体込み画像とから撮像条件およびカメラパラメータの変動に対する許容度を高め、かつ背景のみの画像の撮像時の撮像条件に左右されない高画質な被写体抽出を行うことができ、切り出し対象にピントの合った高画質な画像を出力することができる。
【０１０１】
第２の発明によれば、画像切り出し手段で得られる特定画像領域の画像データを符号化して画像記録手段に記録するようにしたので、必要とする画像を効率的に符号化することができ、画像抽出を良好に行うようにすることができる。
【０１０２】
第３の発明によれば、空間的にシフト演算を行い、背景のみの画像および被写体込み画像間の類似度が最大となるようにしたので、撮像手段の位置の変動や姿勢の変動を許容する画像抽出が可能となり、撮像時の手ぶれ等による悪影響が少ない画像抽出を行うことができる。
【０１０４】
第４の発明によれば、画像データ変換手段が画像サイズ、輝度レベル、色成分、および解像度を変換するようにしたので、異なる撮像条件で撮像された複数画像間の画像データの正規化を行い、画像間の比較に基づく特定被写体の抽出処理を高精度に行うことができ、高精度な被写体画像抽出を行うことができる。
【図面の簡単な説明】
【図１】本発明の画像抽出装置の要部構成を示す機能構成図である。
【図２】本発明の一実施形態を示すシステム構成図である。
【図３】基本処理の手順を示すフローチャートである。
【図４】画像データ変換手段の構成を示すブロック図である。
【図５】第２の実施形態の処理を示すフローチャートである。
【図６】処理過程の実施例を示す図である。
【符号の説明】
１撮像手段
２結像光学系
３レンズモータ駆動制御部
４イメージセンサ
５撮像パラメータ計測制御手段
６画像記録手段
７ビューファインダ
８撮像モード記録手段
９画像データ変換手段
１０撮像モード設定手段
１１画像データ比較手段
１２画像転送手段
１３画像切り出し手段
１４ストロボ発光手段
１５外部同期手段
１６端末
１７ディスプレイ
１８画像データベース
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image extraction apparatus, and is particularly suitable for use in an image processing system having an image cutout function and a region extraction function.
[0002]
[Prior art]
Conventionally known general techniques for image cutout (extraction) include the chroma key method, which uses a background of a specific color, and the video mat method (Television Society Technical Report, vol. 12, pp. 29-34, 1988), which generates a key signal through image processing such as histogram processing, difference processing, differentiation processing, contour enhancement processing, and contour tracking.
[0003]
As an apparatus for extracting the image of a specific region from an image, the technique disclosed in Japanese Examined Patent Publication No. 6-9062, for example, binarizes the differential values obtained with a spatial filter to detect boundary lines, labels the connected regions partitioned by the detected boundary lines, and extracts the regions that have the same label.
[0004]
Extracting an image based on its difference from a background-only image is a classic approach. More recently, Japanese Patent Laid-Open No. 4-216181 has disclosed a method that sets a mask image (that is, a specific processing region) on the difference data between a background-only image and the image to be processed, and extracts or detects target objects in a plurality of specific regions of the image.
[0005]
In the method of Japanese Examined Patent Publication No. 7-16250, a color model of the extraction target is used to obtain the existence probability distribution of the target from the color-converted data of the current image, which includes the background, and from the brightness difference data between the background-only image and the current image.
[0006]
Furthermore, as an example of a technique for optimizing the camera operation and operation mode, the method described in Japanese Patent Laid-Open No. 6-253197 sets the aperture so that the average luminance is appropriate when the background-only image is captured. The current image is then captured with the same setting, and the object image is extracted based on the difference image data between the two.
[0007]
Meanwhile, as the digitization of signal processing has increased the freedom with which video information can be processed and manipulated, the internal processing of imaging means has advanced greatly: from relatively simple operations such as luminance level and color tone conversion, white balance processing, and quantization size conversion, to devices with an edge extraction function and devices with an image extraction function based on the sequential growth of color components (Television Society Technical Report, vol. 18, pp. 13-18, 1994).
[0008]
[Problems to be solved by the invention]
However, among the conventional examples above, the methods that use difference data against a background-only image, with the exception of the method described in Japanese Patent Laid-Open No. 6-253197, do not take the shooting conditions (camera parameters and external conditions such as illumination) into account. Consequently, unless the background-only image and the subject-included image to be extracted from are obtained under the same shooting conditions and from the same fixed position, the error in determining the extraction target region from the difference data becomes very large.
[0009]
Further, the method described in Japanese Examined Patent Publication No. 7-16250 requires a color model of the extraction target and is therefore unsuited to extracting images of arbitrary unknown objects.
[0010]
The method of Japanese Patent Laid-Open No. 6-253197 also presupposes that the imaging means is at the same fixed position and that the imaging conditions are the same as when the background-only image was captured; it merely discloses a technique of using the aperture setting from the capture of the background-only image when capturing the subject-included image. Moreover, because this method gives priority to the imaging conditions of the background-only image, the image quality of the extraction target in the subject-included image is generally not guaranteed.
[0011]
Furthermore, the chroma key method places heavy restrictions on the background, so it cannot be used outdoors, and it also suffers from problems such as color dropout.
In addition, the video mat method requires a human to specify the contour accurately, pixel by pixel, which demands labor and skill.
[0012]
In addition, the method of detecting boundary lines by differential operations and detecting the regions they partition is difficult to apply to objects with complex texture patterns, and no stable, general-purpose boundary line detection method exists.
[0013]
In view of the above problems, an object of the present invention is to provide an image extraction apparatus that, when extracting a specific subject by comparing a plurality of images, can increase the tolerance for differences in imaging conditions between the images, can identify the extraction target region easily, and can obtain a good image of the target to be extracted.
[0020]
[Means for Solving the Problems]
The image extraction apparatus of the present invention comprises: imaging condition control means for controlling the imaging conditions under which imaging means captures images; imaging condition recording/reproducing means for recording the imaging conditions under which the imaging means captures images and for reproducing and outputting the recorded imaging conditions; image recording means for recording a plurality of images captured by the imaging means; image data conversion means for converting the image data of at least one of the plurality of images based on the imaging conditions supplied from the imaging condition recording/reproducing means; image data comparison means for comparing the image data of the plurality of images converted by the image data conversion means; and image cutout means for extracting the image of a specific image region based on the comparison result output from the image data comparison means. The image recording means temporarily stores and holds a background-only image and a subject-included image in which the image to be cut out is present in the background; the image data comparison means extracts difference image data of the plurality of images reproduced and output by the image recording means; the imaging condition recording/reproducing means records the imaging conditions used when the image to be cut out was captured in a predetermined background; and the image data conversion means converts the image data of the background-only image based on the imaging conditions reproduced and output by the imaging condition recording/reproducing means.
[0023]
Another feature of the present invention is that the image recording means encodes and records image data of a specific image area obtained by the image cutout means.
[0024]
Another feature of the present invention is that the image data conversion means performs a spatial shift operation so that the similarity between the background-only image and the subject-included image is maximized.
[0026]
Another feature of the present invention is that the image data conversion means converts an image size, a luminance level, a color component, and a resolution.
[0027]
[Action]
Since the present invention comprises the above technical means, according to the first invention the imaging conditions are recorded and reproduced, and the image data is converted based on the reproduced imaging conditions. Therefore, when a specific subject is extracted by comparing a plurality of images, such as a registered image and the current image or different frame images in a moving image, the tolerance for differences in the imaging conditions of the images can be increased. As a result, a specific subject can be extracted even if there is a slight change in the position of the imaging means due to camera shake or the like, a difference in exposure conditions, a change in sensor gain, and so on.
[0028]
The background-only image and the subject-included image, in which the image to be cut out is present in the background, are temporarily stored and held by the image recording means; the difference image data of the plurality of images reproduced and output by the image recording means is extracted by the image data comparison means; and the extracted difference image data is used as the comparison data between the background-only image and the subject-included image. The process of cutting out a specific subject from the background can therefore be performed efficiently.
[0029]
Furthermore, since the image data of the background-only image is converted based on the imaging conditions that were recorded in the imaging condition recording/reproducing means when the image to be cut out was captured in the predetermined background, a high-quality image in which the cutout target is in focus can be output.
[0030]
According to the second invention, the image data of the specific image region obtained by the image cutout means is encoded and recorded in the image recording means, which enables image extraction in which the required image is encoded efficiently.
[0031]
According to the third invention, the shift operation is performed spatially so that the similarity between the background-only image and the subject-included image is maximized, which enables image extraction that tolerates variations in the position and orientation of the imaging means.
[0033]
According to the fourth invention, the image size, luminance level, color components, and resolution are converted by the image data conversion means, so the image data of a plurality of images captured under different imaging conditions can be normalized well and a specific subject can be extracted with high accuracy.
[0034]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the image extraction apparatus of the present invention will be described below with reference to the drawings.
FIG. 1 is a functional configuration diagram showing the main configuration of the image extraction apparatus of the present invention. In FIG. 1, A is imaging means comprising an imaging optical system, an image sensor, and the like; B is imaging condition control means that controls zoom, focus, aperture, and various other imaging parameters; C is imaging condition recording/reproducing means that records and reproduces the control data of the imaging condition control means B; D is image recording means that records the images captured by the imaging means A; E is image data conversion means that converts images based on the various control parameters supplied from the imaging condition recording/reproducing means C; F is image data comparison means that compares the background image with the subject-included image; and G is image cutout means that sets the image cutout region based on the output of the image data comparison means F. H is display means, such as a monitor display or an electronic viewfinder, for displaying images.
[0035]
The imaging means A is provided for capturing a plurality of images, and the imaging condition control means B controls the imaging conditions under which the imaging means A captures images. In the present embodiment, the imaging conditions include the exposure amount, the focus state, and the presence or absence of strobe light emission.
[0036]
The imaging condition recording/reproducing means C records the imaging conditions under which the imaging means A captures images, and reproduces and outputs the recorded imaging conditions. In the present embodiment, the imaging condition recording/reproducing means C records the imaging conditions used when the image to be cut out was captured in the predetermined background.
[0037]
The image recording means D is for recording a plurality of images picked up by the image pickup means A, and temporarily stores and holds a background-only image and a subject-included image in which a clipping target image exists in the background. Further, the image data of the specific image area obtained by the image cutout means G is encoded and recorded.
[0038]
The image data conversion means E is provided for converting the image data of at least one of the plurality of images based on the imaging conditions supplied from the imaging condition recording/reproducing means C. It also converts the image data of the background-only image based on the imaging conditions reproduced and output by the imaging condition recording/reproducing means C. Further, the image data conversion means E performs a spatial shift operation in memory so that the similarity between the background-only image and the subject-included image is maximized, and converts the image size, luminance level, color components, and resolution.
[0039]
The image data comparison means F compares the image data of the plurality of images converted by the image data conversion means E, and extracts the difference image data of the plurality of images reproduced and output by the image recording means D.
[0040]
The image cutout means G is provided for extracting an image of a specific image area based on a comparison result of a plurality of image data output from the image data comparison means F.
[0041]
According to the image extraction apparatus of the present embodiment configured as described above, a specific subject is extracted from a comparison between a plurality of images such as a registered image and a current image, or different frame images in a moving image. In doing so, it is possible to increase the tolerance for different imaging conditions in each image.
[0042]
As a result, when a subject image is extracted using a background-only image, the specific subject can be extracted satisfactorily even if there is, for example, a slight change in the position of the imaging means A due to camera shake or the like, a difference in exposure conditions, or a change in sensor gain. In addition, the tolerance for variations in illumination conditions and the like can be increased without using a model of the extraction target such as a color model.
[0043]
In addition, since the tolerance for variations in imaging conditions and camera parameters can be increased, the process of cutting out a specific subject from the background can be performed efficiently. Furthermore, since the image data of the background-only image is converted using the subject-included image captured under the imaging conditions used when the background-only image was captured, the tolerance for variations in imaging conditions and camera parameters between the background-only image and the subject-included image can be increased. It is also possible to perform high-quality subject extraction that does not depend on the imaging conditions under which the background-only image was captured, and to output a high-quality image in which the cutout target is in focus.
[0044]
According to another feature of the present embodiment, the image data of the specific image region obtained by the image cutout means G is encoded and recorded in the image recording means D, so the required image can be encoded efficiently and image extraction can be performed satisfactorily.
[0045]
Further, according to another feature of the present embodiment, the image data conversion means E performs the spatial shift operation so that the similarity between the background-only image and the subject-included image is maximized. This enables image extraction that tolerates variations in the position and orientation of the imaging means A, and that is less adversely affected by camera shake and the like during capture.
[0046]
Further, according to another feature of the present embodiment, the imaging conditions include the exposure amount, the in-focus state, and the presence or absence of strobe light emission. Therefore, when an image of a specific subject is extracted from a plurality of images captured under different imaging conditions, image extraction can be performed with increased tolerance for variations in magnification, focus, contrast, illumination conditions, and the like.
[0047]
According to another feature of the present embodiment, the image data conversion means E converts the image size, the luminance level, the color components, and the resolution, so that the image data of a plurality of images captured under different imaging conditions is normalized. As a result, the extraction of a specific subject based on comparison between the images can be performed with high accuracy.
[0048]
Next, the configuration and operation of the image extraction apparatus of the present invention will be described more specifically with reference to FIGS.
First, the first embodiment will be described. The basic processing of the first embodiment is to capture a background-only image and an image in which the subject is present in that background, apply statistical processing to comparison data (difference data) that takes the imaging conditions of the two images into account, and extract the image of the subject by detecting the subject area.
[0049]
In the present embodiment, it is assumed that the operator holds an imaging means such as a video camera by hand. An image in which the subject to be extracted is present in the background is captured first, and the imaging conditions at that time are recorded in the storage means together with the image data. Next, the same imaging conditions (camera internal parameters, including image signal characteristic parameters) are read from the storage means, and the background-only image is captured under them.
[0050]
On the other hand, if the environmental conditions cannot be regarded as the same between the two captures, for example if the capture times or the external illumination conditions differ greatly, the imaging parameters to be kept the same are limited to the magnification, focus, and the like. This prevents a situation in which identical conditions cannot be achieved.
[0051]
FIG. 2 shows a main configuration of the imaging system according to the present embodiment. In FIG. 2, reference numeral 1 denotes an image pickup means, and 2 denotes an image forming optical system including an image pickup lens, which is a compound eye image pickup system for picking up a stereoscopic image in this embodiment. Reference numeral 3 denotes a lens motor drive controller for driving each lens of the imaging optical system 2, and reference numeral 4 denotes an image sensor. In general, a CCD or the like is used.
[0052]
Reference numeral 5 denotes imaging parameter measurement control means. It includes focal length detection means for the zoom lens, which controls the magnification; focus state detection means, which detects the focus state of the lens by known means; shutter speed detection control means, which controls, for example, the accumulation time of the CCD; aperture measurement control means, which controls the aperture diameter of the stop; and detection means for the characteristic quantities (for example, the gamma correction coefficient) of the image signal characteristic parameters (gamma, knee, white balance correction, CCD accumulation time, etc.). Reference numeral 6 denotes an image recording unit comprising a memory or the like, and 7 denotes an electronic viewfinder (EVF or the like) serving as a display unit.
[0053]
Reference numeral 8 denotes imaging mode recording means for recording information at the time of imaging including imaging parameters, image characteristic parameters, presence / absence of strobe light emission, intentional motion such as scanning, presence / absence of camera shake, and the like. Note that camera motion such as camera shake, scanning, and panning may be determined from the output data or the like by incorporating an acceleration sensor in the imaging means. These incidental data are stored in the image database 18 together with the image data.
[0054]
Reference numeral 9 denotes image data conversion means for converting image data at the time of synthesis based on imaging conditions and the like, and details will be described later. An image signal processing circuit 10 includes gamma, knee, white balance correction, AF (Automatic Focusing), AE (Automatic Exposure), AGC (Automatic Gain Control) processing circuits, and the like. Reference numeral 11 denotes image data comparison means for detecting and outputting the difference between the background-only image and the subject-included image.
[0055]
Reference numeral 12 denotes an image cutout unit, which identifies the extraction target region based on the result of statistical processing of the output from the image data comparison unit 11, and outputs a key signal (or mask data) for cutting out the identified extraction target region from the subject-included image. Reference numeral 13 denotes image transfer means for transferring image data and the like to an external database or terminal.
[0056]
Reference numeral 14 denotes a strobe light emitting means, and reference numeral 15 denotes an external synchronizing means. Although a signal line is omitted, a synchronizing clock is supplied to each circuit. Reference numeral 16 denotes a terminal for performing imaging mode control from outside, selection of a cutout image, search and selection of a registered image, and the like. Reference numeral 17 denotes a display, which also functions as a processed image output and a finder display.
[0057]
Reference numeral 18 denotes an image database. It holds image data captured in the past together with associated data: the type indicating whether the image is a registered image, the imaging parameters, the imaging conditions (outdoor or indoor, presence or absence of strobe light emission, etc.), and other information (date, time, location, camera operator, title, etc.).
[0058]
Reference numeral 19 denotes image type setting means. It is a switch for setting whether an image is registered as a reference image for extracting a target by comparison with another image, or is a subject extraction image to be compared with the reference image. The image type is thereby recorded automatically as incidental information.
[0059]
Reference numeral 20 denotes camera parameter setting means. Normally, the background-only image and the subject-included image are captured in the same imaging mode; the camera parameter setting means 20 is used when the operator wishes to set the internal characteristics of the imaging means arbitrarily. Although the circuits inside the imaging means are shown as separate blocks for each function, the operation of each function is controlled by a microprocessor (not shown).
[0060]
Next, the basic processing of the image extraction apparatus of the present embodiment configured as described above is shown in the flowchart of FIG. 3, and an example of image extraction will be described with reference to FIG. Although these operations are also shown per function in the figure, they are actually executed by the microprocessor.
When the process starts, it is first determined in step S1 whether the image output from the imaging means is a background-only image or a subject-included image. This determination follows the image type set by the operator via the image type setting means 19. That is, the operator designates whether the image to be captured contains the main subject, only the background, or both.
[0061]
If it is determined in step S1 that a subject-included image is to be captured, the process proceeds to step S2, where the imaging mode is set. Then, in step S3, the various imaging parameters are set and imaging is performed under the imaging conditions (described below) that are optimal for the subject. After the imaging in step S3 is completed, the process proceeds to step S4, where incidental information is measured: the magnification (focal length), degree of focus, aperture, shutter speed, presence or absence of camera shake, presence or absence of panning or tilting, gain, and other imaging modes and imaging conditions, as described above.
[0062]
Next, proceeding to step S5, the incidental information measured at step S4 is recorded in a predetermined format together with the image data. The supplementary information may be recorded separately in a header file or the like together with the address of the corresponding image data.
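As an illustration of step S5, the following minimal Python sketch stores the incidental information in a separate header record together with the address of the corresponding image data. The field names, the JSON format, and the helper names `write_header` and `read_header` are assumptions made for this example; the text specifies only that a predetermined format is used.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ImagingConditions:
    # Field names are illustrative; the text lists these quantities but no format.
    image_type: str          # "subject_included" or "background_only"
    focal_length_mm: float   # magnification (focal length)
    aperture_f: float
    shutter_s: float
    gain_db: float
    gamma: float
    strobe: bool
    camera_shake: bool


def write_header(conditions: ImagingConditions, image_address: str) -> str:
    """Serialize the incidental information with the address of its image."""
    record = {"image_address": image_address, "conditions": asdict(conditions)}
    return json.dumps(record, sort_keys=True)


def read_header(text: str) -> tuple[str, ImagingConditions]:
    """Recover the image address and conditions (as read back in step S6)."""
    record = json.loads(text)
    return record["image_address"], ImagingConditions(**record["conditions"])
```

Reading the record back before the background-only capture corresponds to step S6, where the incidental information of the subject-included image is read.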
[0063]
On the other hand, if it is determined in step S1 that a background-only image is to be captured, the process proceeds from step S1 to step S6. In step S6, the incidental information of the subject-included image is read, and in step S7 the imaging mode is set in the same manner as described above. Here, the same condition parameters as those set in step S2 are selected, so that in the next step S8 the background alone is basically captured under the same imaging conditions.
[0064]
However, in order to cope with fluctuations in environmental conditions and the like, step S7 sets an optimal imaging mode using the information recorded by the imaging mode recording means 8 at the time of imaging as the imaging condition selection mode, such as the outdoor or indoor distinction and the presence or absence of strobe light emission. The imaging parameter measurement control means 5 uses these imaging conditions and the incidental information to determine whether the environmental conditions have changed, and controls the imaging parameters so that the same conditions are obtained.
[0065]
Note that when the background-only image and the subject-included image are captured close together in time, the illumination conditions (except for illumination means mounted on the imaging means, such as a strobe), the background pattern, and the like are considered to have hardly changed, whether indoors or outdoors.
[0066]
Therefore, by making the imaging mode excluding the image signal characteristic parameter the same between the two (for example, unified with the imaging mode for the subject-included image), the fluctuation of the image data in the same area of the background pattern between the two images is suppressed. This can increase the reliability of the subsequent statistical processing for image segmentation.
[0067]
However, the characteristics of the sensor signal processing circuit (gamma, white balance, etc.) change with the spectral reflectance characteristics of the subject image, noise, and the like. In general situations, therefore, the image data in the background area may not match perfectly between the two images even in the same imaging mode.
[0068]
To deal with such cases, image data conversion processing is performed in step S9. Here, scale conversion (when the magnification has changed), luminance conversion (when the exposure amount or gamma has changed), color component conversion (when the white balance characteristics have changed), image alignment (when the image position has changed), and the like are performed. This normalizes the background-only image so that the portion of it corresponding to the background region of the subject-included image has substantially the same image data as that region.
[0069]
Next, in step S10, an image comparison process with the background only image is performed to calculate an image data difference, and an image cutout process is performed in step S11 based on the image difference data. Details of these processes will be described later.
[0070]
As shown in FIG. 4, the image data conversion means 9 comprises data input means 90 for inputting image data and its incidental data, luminance value conversion means 91, color component conversion means 92, spatial shift calculation means 93, image size scaling means 94, parameter fluctuation evaluation means 95, and the like. In general, the subject-included image is used as the reference image: the image data of the reference image is left unchanged, and the image data of the background-only image is converted.
[0071]
In addition, when the imaging mode is the same between both images, that is, when there is no change in environmental conditions, it can be said that conversion of luminance values and color components is generally unnecessary. However, depending on the characteristics of the subject image and the ratio of the area to the background, the image signal characteristic parameters, that is, the gain, white balance characteristic, and gamma characteristic may change.
[0072]
Therefore, when the change in at least one of these characteristic values exceeds a certain threshold value, the background-only image is converted so that its luminance and color components correspond to the characteristic values of the subject-included image, based on the difference between the parameter characteristic values recorded in a predetermined memory for the background-only image and the subject-included image. The processing performed by each conversion means is described below.
[0073]
Note that when the variation of each parameter is relatively small, the imaging environment is considered not to have changed significantly. In that case, as shown in FIG. 4A, the conversion processing in the image data conversion means 9 first performs scaling conversion and alignment of the images, which makes basic image composition possible, and then adjusts the image details in order, such as luminance conversion and color component conversion. In FIG. 4A, the vertical arrows indicate the order in which the processing is performed.
[0074]
However, this does not apply when the imaging mode equalization control has been canceled by the camera parameter setting means 20. In that case, as shown in FIG. 4B, conversion order setting means 96, which sets the order of the conversions other than alignment, may be used to perform the conversions in descending order of fluctuation amount. As a result, by the time items with small fluctuation amounts are converted, the influence of errors is reduced and the accuracy is improved.
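The ordering performed by the conversion order setting means 96 can be sketched as follows. This is a minimal Python illustration, assuming that the fluctuation amount of each item is compared as a relative change of one representative parameter; the parameter keys (`focal_length`, `gain`, `wb_red_gain`) and the function name are hypothetical and do not appear in the text.

```python
def conversion_order(cond_a: dict, cond_b: dict) -> list[str]:
    """Return conversion names sorted by relative parameter change, largest first."""
    # One representative (hypothetical) parameter per conversion item.
    keys = {"scale": "focal_length", "luminance": "gain", "color": "wb_red_gain"}

    def rel_change(key: str) -> float:
        a, b = cond_a[key], cond_b[key]
        # Relative change makes differently scaled parameters comparable.
        return abs(a - b) / max(abs(a), abs(b), 1e-12)

    # Alignment is excluded from the reordering, as in the text.
    return sorted(keys, key=lambda name: rel_change(keys[name]), reverse=True)
```

For example, with an unchanged focal length, a gain change from 1.0 to 1.5, and a red white-balance gain change from 1.0 to 1.1, the sketch orders luminance conversion first, then color component conversion, then scaling.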
[0075]
The scaling conversion means 94 converts the background-only image so that it has the same angle of view as the subject-included image, based on the amount of magnification change between the two images. It is assumed, however, that the variation in the distance between the imaging means and the subject is sufficiently small. If the magnification data has not changed, this process may be omitted and the conversion may start from the alignment means 93.
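A minimal sketch of such a scaling conversion is given below. It assumes that the magnification change is expressed as a single ratio (for example, the focal length of the subject-included image divided by that of the background-only image), that scaling is performed about the image center, and that nearest-neighbor sampling is adequate; none of these choices is specified in the text.

```python
def rescale_about_center(img, ratio):
    """Nearest-neighbor rescale of a 2-D list by `ratio`, keeping the frame size.

    ratio > 1 magnifies; output pixels whose source falls outside the
    input frame are set to None.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Map each output pixel back to its source position.
            sy = round(cy + (y - cy) / ratio)
            sx = round(cx + (x - cx) / ratio)
            row.append(img[sy][sx] if 0 <= sy < h and 0 <= sx < w else None)
        out.append(row)
    return out
```

With a ratio of 1 the image is returned unchanged, which corresponds to the case in which the scaling step may be omitted.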
[0076]
The luminance value conversion means 91 converts the luminance level of each pixel based on the gain, the amount of gamma characteristic change, the exposure amount, and whether illumination conditions such as the outdoor or indoor distinction and the presence or absence of strobe light emission match between the background-only image and the subject-included image.
The color component conversion unit 92 converts the color components of the image data based on the amount of white balance characteristic change and the like.
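The following Python sketch shows one way such per-pixel conversions could look. It assumes a simple signal model, output = (gain x linear value) ^ gamma with values normalized to [0, 1], and per-channel white balance gains; this model and the function names are assumptions for illustration, since the text does not give the conversion formulas.

```python
def convert_luminance(v, gain_b, gamma_b, gain_t, gamma_t):
    """Re-express a background-image value v under the subject-image gain/gamma.

    Assumed model: v = (gain * linear) ** gamma, with v in [0, 1].
    """
    linear = (v ** (1.0 / gamma_b)) / gain_b      # undo background settings
    return min(1.0, (gain_t * linear) ** gamma_t)  # apply subject settings


def convert_color(rgb, wb_b, wb_t):
    """Rescale each channel from background to subject white-balance gains."""
    return tuple(min(1.0, c * t / b) for c, b, t in zip(rgb, wb_b, wb_t))
```

When the recorded parameters of the two images are identical, both functions leave the data unchanged, which matches the statement that conversion is generally unnecessary when the imaging mode is the same.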
[0077]
The alignment means 93 performs a spatial shift operation in memory so as to maximize the similarity between the background-only image and the subject-included image. That is, to tolerate changes in the position and posture of the imaging means between the two images (for example, due to camera shake, re-gripping of the imaging means, or a slight change in where the operator stands), corresponding points are extracted between the two images and alignment based on them, that is, a matching calculation, is performed.
[0078]
Normally, for fluctuations on the order of camera shake over a short time, it is sufficient to extract corresponding points (at least three) between local areas at the four corners of each image. A typical algorithm for extracting corresponding points between the two images is block matching: blocks of a predetermined size centered on points in the image are taken, the cross-correlation value between blocks is calculated, and the points giving the maximum value are taken as corresponding points.
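The block matching described above can be sketched in Python as follows. The sketch uses the normalized cross-correlation as the similarity measure and an exhaustive search over a small window; the normalization, the search range, and the function names are assumptions for this example, and blocks are addressed by their top-left corner for simplicity.

```python
import math


def block(img, top, left, size):
    """Flatten the size x size block whose top-left corner is (top, left)."""
    return [img[top + r][left + c] for r in range(size) for c in range(size)]


def ncc(a, b):
    """Normalized cross-correlation of two equally sized blocks."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0


def match_block(ref, other, top, left, size, search):
    """Return the shift (dy, dx) within +/-search that maximizes the NCC."""
    h, w = len(other), len(other[0])
    template = block(ref, top, left, size)
    best = (-2.0, 0, 0)  # NCC is never below -1
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if 0 <= t and t + size <= h and 0 <= l and l + size <= w:
                score = ncc(template, block(other, t, l, size))
                if score > best[0]:
                    best = (score, dy, dx)
    return best[1], best[2]
```

Running `match_block` on blocks near the four corners of the images yields the corresponding points from which the spatial shift between the two images can be estimated.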
[0079]
By the way, in order to perform alignment, it is important that the background portion overlaps between both images at a predetermined ratio or more. As a general guideline, it is a necessary condition that the overlapping area excluding the image region to be extracted is, for example, 50% or more between both images.
[0080]
However, it goes without saying that the minimum overlap ratio varies with the pattern of the background-only image. In particular, when the background and the subject are relatively easy to separate, for example when the background is nearly plain or has a periodic pattern and the subject to be extracted has a distinctly different pattern, the overlapping area may be significantly smaller.
[0081]
The image data comparison unit 11 generates difference image data between the background-only image after the image conversion and the subject-included image.
The image cutout unit 12 then applies smoothing (such as a median filter) and statistical processing (based on color component deviation and luminance level deviation) to the generated difference image data, and extracts as the subject area the area in which the data deviates greatly from the background image data.
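As a minimal illustration of the difference and smoothing steps, the following Python sketch computes an absolute difference image and applies a 3 x 3 median filter. The window size and the use of single-channel images held as nested lists are assumptions for this example.

```python
from statistics import median


def abs_difference(a, b):
    """Pixel-wise absolute difference of two equally sized 2-D lists."""
    return [[abs(x - y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def median3x3(img):
    """3 x 3 median filter; windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median(window))
        out.append(row)
    return out
```

The median filter removes isolated difference spikes, such as single-pixel noise, before the statistical processing decides which areas belong to the subject.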
[0082]
Specifically, from the R, G, B components and the luminance signals of the converted background-only image $I_b$ and the subject-included image $I_t$, the hue $H_b$, $H_t$, the saturation $S_b$, $S_t$, and the brightness $V_b$, $V_t$ are extracted, and the value of the following evaluation function $F$ is binarized with a predetermined threshold value to discriminate the subject area from the background area.
$$F(H_b - H_t,\ S_b - S_t,\ V_b - V_t) = \alpha_h (H_b - H_t)^2 + \alpha_s (S_b - S_t)^2 + \alpha_u (V_b - V_t)^2 \quad \text{(Formula 1)}$$
[0083]
Here, $\alpha_h$, $\alpha_s$, and $\alpha_u$ are functions of the S/N value of each component of the images $I_b$ and $I_t$, or of the variance of each component when each image is divided into blocks of a predetermined size; for example, $\alpha_h = P_h(I_b) \cdot P_h(I_t)$ is used.
[0084]
Here, $P_h(I)$ denotes a monotonically increasing function of the S/N value, or a monotonically decreasing function of the variance (for example, its reciprocal), of the hue component in a predetermined region of the image data $I$. $P_s(I)$ and $P_u(I)$ are defined similarly as parameters for saturation and brightness. The threshold value $T$ for discriminating each point may be set by applying Otsu's method (Journal of the Institute of Electronics, Information and Communication Engineers, vol. J63, pp. 349-356, 1980) or the like to each of $(H_b - H_t)^2$, $(S_b - S_t)^2$, and $(V_b - V_t)^2$ to obtain $T_h$, $T_s$, and $T_u$, and setting $T = T_h + T_s + T_u$.
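A minimal Python sketch of this thresholding is shown below. It assumes RGB values normalized to [0, 1], uses the standard-library `colorsys.rgb_to_hsv` conversion, fixes all weights to 1 instead of deriving them from S/N or variance, and takes a plain (non-circular) hue difference as in Formula 1. The histogram-based `otsu_threshold` helper and its 64 bins are an illustrative implementation choice, not the cited procedure verbatim.

```python
import colorsys


def squared_hsv_diffs(rgb_b, rgb_t):
    """((H_b-H_t)^2, (S_b-S_t)^2, (V_b-V_t)^2) for one pixel pair."""
    hsv_b = colorsys.rgb_to_hsv(*rgb_b)
    hsv_t = colorsys.rgb_to_hsv(*rgb_t)
    return tuple((b - t) ** 2 for b, t in zip(hsv_b, hsv_t))


def otsu_threshold(values, bins=64):
    """Threshold that maximizes the between-class variance of a 1-D sample."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(bins - 1, int((v - lo) / width))] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_k, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += k * hist[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        var = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if var > best_var:
            best_var, best_k = var, k
    return lo + (best_k + 1) * width


def subject_mask(pixels_b, pixels_t, alpha=(1.0, 1.0, 1.0)):
    """Binarize F with T = T_h + T_s + T_u; True marks subject pixels."""
    diffs = [squared_hsv_diffs(pb, pt) for pb, pt in zip(pixels_b, pixels_t)]
    t_total = sum(otsu_threshold([d[k] for d in diffs]) for k in range(3))
    return [sum(a * x for a, x in zip(alpha, d)) > t_total for d in diffs]
```

Here `pixels_b` and `pixels_t` are flat lists of RGB triples from the converted background-only image and the subject-included image, taken in the same pixel order.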
[0085]
Note that the above evaluation function and each parameter are not limited to the above definitions. Further, the threshold value itself may be a predetermined value and may be a constant value over the entire image area.
[0086]
Next, the connected areas determined as the subject areas as a result of the threshold processing are labeled and displayed on the display 17 as mask image data with different colors, luminance values, hatch patterns, or the like.
[0087]
At this time, an isolated region inside a connected region, that is, an enclosed background block region, may be regarded as part of the subject and given the same label as the surrounding subject region. The user selects one of the subject areas to be extracted with a mouse (not shown) or the like (by clicking, for example), and as a result only the subject image, with the background removed, is displayed on the display 17.
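The labeling of connected areas and the relabeling of enclosed background regions can be sketched as follows. This Python example assumes a binary mask (1 for subject, 0 for background), 4-connectivity, and the rule that a background region counts as isolated when it does not touch the image border; these details are assumptions, as the text does not fix them.

```python
from collections import deque


def label_regions(mask, target):
    """4-connected labeling of cells equal to `target`; returns (labels, count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] != target or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:  # breadth-first flood fill of one region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == target and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def fill_isolated_background(mask):
    """Turn background regions that do not touch the border into subject."""
    h, w = len(mask), len(mask[0])
    labels, _ = label_regions(mask, 0)
    border = {labels[y][x] for y in range(h) for x in range(w)
              if (y in (0, h - 1) or x in (0, w - 1)) and labels[y][x]}
    return [[1 if mask[y][x] or labels[y][x] not in border else 0
             for x in range(w)] for y in range(h)]
```

Calling `label_regions(filled, 1)` on the filled mask then gives one label per candidate subject area, which could be shown to the user with different colors for selection.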
[0088]
If there is no problem with the extracted image data, the user gives a confirmation instruction by, for example, clicking a confirmation icon. Thereby, the image data of only the subject is encoded, and then generation of an image data file is executed.
[0089]
(1) in FIG. 6 shows an example of a subject-included image including both a subject and a background, and (2) in FIG. 6 shows an example of an image with only a background. In the subject-incorporated image of (1) in FIG. 6, the image is input in an imaging mode in which the subject (person) is prioritized.
[0090]
In addition, the background-only image in (2) of FIG. 6 is captured at the same magnification, with the image signal characteristics matched to the distant view. Between these images, the characteristics of the image data (average luminance level and color components) may differ slightly even in the same background portion, and FIG.
[0091]
(3) of FIG. 6 shows the result of normalizing the image of only the background based on the imaging condition of the subject-included image. The clipping process is performed based on the difference data between the subject-included image (1) and the background-only image (3) (the result of the comparison process in step S10 described above), and the result is shown in (4) of FIG.
[0092]
The registered image may be retrieved from the image database 18 and used. In this case, the registered image is an image including a subject in the present embodiment, and similar processing is performed by using supplementary information at the time of imaging recorded in the header portion or header file of the image.
[0093]
Next, a second embodiment of the image extraction apparatus of the present invention will be described.
In the second embodiment, processing that does not impair the image quality and clipping accuracy of the subject image is performed regardless of the imaging order of the background-only image and the subject-included image.
[0094]
For this reason, in this embodiment, the magnification condition is kept the same for both images, while the other imaging conditions can be set independently for each image when it is captured and recorded. When the subject-included image is captured, however, an imaging mode that gives priority to the image quality of the subject is used.
[0095]
FIG. 5 shows the processing flow of the second embodiment. The first image and the second image in FIG. 5 correspond to the background-only image and the subject-included image of the above description, respectively, or vice versa.
[0096]
In the image data conversion process of step S49 in FIG. 5, the image data conversion unit 9 converts the image data of one image so as to match the imaging conditions of the other, and aligns the two images, as in the first embodiment.
[0097]
Prior to this processing, both image data may be converted into a low resolution image by thinning processing or local averaging processing. This is because even when the magnification condition is the same, the focus (resolution) may be greatly different between the background region of the image including the subject and the corresponding region of the background-only image. As described above, when both the images are processed with a reduced resolution, the efficiency and accuracy in roughly estimating the subject area can be improved.
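The two resolution-reduction options mentioned above can be sketched in Python as follows. The integer reduction factor and the handling of image sizes that are not multiples of it (the remainder is dropped in the averaging case) are assumptions for this example.

```python
def downsample_mean(img, factor):
    """Local averaging: each output pixel is the mean of a factor x factor block."""
    h, w = len(img) // factor, len(img[0]) // factor
    return [[sum(img[y * factor + j][x * factor + i]
                 for j in range(factor) for i in range(factor)) / factor ** 2
             for x in range(w)] for y in range(h)]


def downsample_thin(img, factor):
    """Thinning: keep every factor-th pixel in each direction."""
    return [row[::factor] for row in img[::factor]]
```

Local averaging also suppresses the differences in focus between the two images, whereas thinning is cheaper but keeps the sampled pixel values unchanged.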
[0098]
[Effects of the invention]
According to the first invention, the imaging conditions are recorded and the image data is converted based on the recorded imaging conditions. Therefore, when a specific subject is extracted by comparing a plurality of images, such as a registered image and a current image, or different frames of a moving image, the tolerance for differences in imaging conditions between the images can be increased. As a result, when a subject image is extracted by comparison with a background-only image, the subject can be extracted even if the position of the imaging means varies slightly due to camera shake, the exposure conditions differ, or the sensor gain varies slightly. In addition, the tolerance for fluctuations in illumination conditions and the like can be increased without using a model of the extraction target, such as a color model.
[0099]
Since the difference image data is used as the comparison data between the background-only image and the subject-incorporated image, the tolerance for fluctuations in imaging conditions and camera parameters can be increased, and the process of cutting out a specific subject from the background is performed. It can be done efficiently.
[0100]
Furthermore, since the image data of the background-only image is converted using the imaging conditions under which the subject-included image was captured, the tolerance for fluctuations in imaging conditions and camera parameters can be increased, high-quality subject extraction that does not depend on the imaging conditions used for the background-only image becomes possible, and a high-quality image focused on the target to be cut out can be output.
[0101]
According to the second invention, since the image data of the specific image area obtained by the image cutout unit is encoded and recorded in the image recording unit, the necessary image can be efficiently encoded, Image extraction can be performed satisfactorily.
[0102]
According to the third invention, since the shift operation is spatially performed so that the similarity between the background-only image and the subject-included image is maximized, the variation of the position of the imaging unit and the variation of the posture are allowed. Image extraction is possible, and image extraction with less adverse effects due to camera shake during imaging can be performed.
[0104]
According to the fourth invention, since the image data conversion means converts the image size, the luminance level, the color component, and the resolution, the image data between the plurality of images captured under different imaging conditions is normalized. Thus, the extraction process of the specific subject based on the comparison between the images can be performed with high accuracy, and the subject image extraction with high accuracy can be performed.
[Brief description of the drawings]
FIG. 1 is a functional configuration diagram showing a main configuration of an image extraction apparatus according to the present invention.
FIG. 2 is a system configuration diagram showing an embodiment of the present invention.
FIG. 3 is a flowchart showing a procedure of basic processing.
FIG. 4 is a block diagram showing a configuration of image data conversion means.
FIG. 5 is a flowchart showing processing of the second embodiment.
FIG. 6 is a diagram illustrating an example of a processing process.
[Explanation of symbols]
1 Imaging means
2 Imaging optics
3 Lens motor drive controller
4 Image sensor
5 Imaging parameter measurement control means
6 Image recording means
7 Viewfinder
8 Imaging mode recording means
9 Image data conversion means
10 Imaging mode setting means
11 Image data comparison means
12 Image clipping means
13 Image transfer means
14 Strobe flash means
15 External synchronization means
16 Terminal
17 Display
18 Image database

Claims

1. Imaging condition control means for controlling imaging conditions when imaging is performed by the imaging means;
An imaging condition recording / reproducing unit for recording an imaging condition when performing imaging by the imaging unit and reproducing and outputting the recorded imaging condition;
Image recording means for recording a plurality of images taken by the imaging means;
Image data conversion means for converting image data of at least one of the plurality of images based on the imaging conditions supplied from the imaging condition recording / reproducing means;
Image data comparison means for comparing image data of a plurality of images converted by the image data conversion means;
Image extracting means for extracting an image of a specific image region based on the comparison result output from the image data comparing means,
The image recording means temporarily stores and holds a background-only image and a subject-included image in which a clipping target image exists in the background,
The image data comparison unit extracts difference image data of a plurality of images reproduced and output by the image recording unit,
The imaging condition recording / reproducing means records an imaging condition when an image to be cut out in a predetermined background is captured,
The image extraction apparatus characterized in that the image data conversion means converts image data of the background only image based on the imaging conditions reproduced and output by the imaging condition recording / reproducing means.

2. The image extracting apparatus according to claim 1, wherein the image recording unit encodes and records image data of a specific image area obtained by the image cutout unit.

3. The image extraction apparatus according to claim 1, wherein the image data conversion means spatially performs a shift operation so that the similarity between the background-only image and the subject-included image is maximized.

4. The image extraction apparatus according to claim 1, wherein the image data conversion unit converts an image size, a luminance level, a color component, and a resolution.