JP6982865B2

JP6982865B2 - Moving image distance calculation device and moving image distance calculation program

Info

Publication number: JP6982865B2
Application number: JP2017235198A
Authority: JP
Inventors: 嶐一岡
Original assignee: University of Aizu
Current assignee: University of Aizu
Priority date: 2017-12-07
Filing date: 2017-12-07
Publication date: 2021-12-17
Anticipated expiration: 2037-12-07
Also published as: JP2019101967A

Description

The present invention relates to a moving image distance calculation device and a moving image distance calculation program. More specifically, it relates to a device and a program that calculate the distance from an object appearing in a moving image to the camera, based on a moving image of the view directly ahead in the traveling direction captured by a single moving camera.

In recent years, cameras for capturing the outside world in the forward direction have often been installed on moving objects such as vehicles and drones. Lately there is demand not merely to capture the forward view with such a camera, but also to use the captured moving image for purposes such as acquiring the distance information needed for automated driving of vehicles and drones.

A widely used way of measuring the distance between a moving object and another object is to install a distance sensor or the like on the moving object. For example, a laser sensor, an ultrasonic sensor, or an infrared sensor is used as the distance sensor to measure the distance to the other object directly (see, for example, Non-Patent Document 1).

More recently, methods that calculate the distance to a captured object by image analysis of a moving image taken with a stereo camera have also come into use (see, for example, Non-Patent Document 2).

Non-Patent Document 1: Naoki Suganuma, "Recognition and Decision Technology in Autonomous Self-Driving Vehicles: From On-Board Sensing to Path Planning", Proceedings of the 23rd Image Sensing Symposium (SSII2017), Yokohama (Pacifico Yokohama), June 2017
Non-Patent Document 2: Shuhei Takimoto, Takaaki Ito, "Development of Monocular Distance Measurement Verification System Using In-Vehicle Camera", SEI Technical Review, No. 169, pp. 82-87, July 2006

In general, when distance sensors are installed in a vehicle, several of them are installed in a single vehicle. For example, an ultrasonic sensor is installed to detect objects relatively close to the vehicle, and a laser sensor is installed alongside it to detect objects relatively far from the vehicle. Improving the accuracy of the information obtained from distance sensors requires installing several sensors of several types, which raises cost. Laser sensors in particular are more expensive than other sensors, so the cost burden increases.

In addition, although a laser sensor can measure the distance to an object ahead, it has difficulty identifying the measurement target, for example whether the object is a person, a vehicle, or an obstacle.

When a stereo camera is used to measure the distance to an object ahead, the measurement target can be identified by image analysis or similar processing. A stereo camera, however, requires at least two cameras, which raises cost.

Furthermore, when a stereo camera is used, it is necessary to determine which object in the outside world each point in the measured distance point cloud corresponds to. For this reason, a distance sensor has often been installed alongside the stereo camera even when a stereo camera is used.

Using multiple distance sensors to obtain the distance to an object ahead also raises the overall cost of sensor installation and requires complex processing to integrate the data acquired by the group of sensors. The same problems arise not only for vehicles but also for other moving objects, such as drones, when the distance to another object is measured.

The present invention has been made in view of the above problems. Its object is to provide a moving image distance calculation device and a moving image distance calculation program that calculate the distance from an object appearing in a moving image to the camera, based on a moving image of the view directly ahead in the traveling direction captured by a single moving camera.

To solve the above problem, the moving image distance calculation device according to the present invention calculates the distance from an object appearing in a moving image to the camera, based on a moving image of the view directly ahead in the direction of movement captured by a single moving camera from time $t = \tau - T$ (where $\tau > 0$, $T > 0$) to time $t = \tau$. The device is characterized by having: target pixel extraction means that takes one of the pixels of an object appearing in the frame image at time $t = \tau - T$ as a target pixel and extracts such a target pixel from that frame image for each of $M$ ($M \geq 2$) different objects; trajectory calculation means that, based on the plurality of frame images from time $t = \tau - T$ to time $t = \tau$, calculates the trajectory of the coordinates of each of the $M$ target pixels extracted by the target pixel extraction means, in chronological order from the frame image at time $t = \tau - T$ to the frame image at time $t = \tau$, using dynamic programming applied to two-dimensional images; moving pixel count measurement means that, based on the $M$ trajectories calculated by the trajectory calculation means, measures for each of the $M$ target pixels the number of pixels moved $q_m$ ($m = 1, 2, \ldots, M$) from its coordinates in the frame image at time $t = \tau - T$ to its coordinates in the frame image at time $t = \tau$; and distance calculation means. The distance calculation means takes $\mu$ as the smallest and $\gamma$ as the largest of the $M$ measured moving pixel counts $q_m$, takes $Z_N$ as the nearest and $Z_L$ as the farthest of the distances from the objects identified by the $M$ target pixels to the camera, and calculates the constants $a$ and $b$ by
$$a = Z_L \cdot \exp\left(\frac{\mu}{\gamma - \mu} \log\frac{Z_L}{Z_N}\right)$$
$$b = \frac{1}{\gamma - \mu} \log\frac{Z_L}{Z_N}$$
Then, taking $Z_m$ ($m = 1, 2, \ldots, M$) as the distance at time $t = \tau$ from the object corresponding to each of the $M$ target pixels to the camera, it calculates the distance $Z_m$ from the constants $a$ and $b$ and the $M$ moving pixel counts $q_m$ by
$$Z_m = a \cdot \exp(-b q_m)$$
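The distance calculation step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `calibrate` and `distances` are hypothetical, the moving pixel counts are assumed to be already measured, and the nearest and farthest real-world distances are assumed to be known in advance.

```python
import math

def calibrate(mu, gamma, z_near, z_far):
    """Return (a, b) so that a*exp(-b*mu) == z_far and a*exp(-b*gamma) == z_near.

    mu / gamma: smallest / largest moving pixel count among the target pixels.
    z_near / z_far: nearest (Z_N) / farthest (Z_L) real-world distance (assumed known).
    """
    if gamma <= mu:
        raise ValueError("gamma must be greater than mu")
    if not (0 < z_near <= z_far):
        raise ValueError("require 0 < z_near <= z_far")
    log_ratio = math.log(z_far / z_near)
    b = log_ratio / (gamma - mu)          # b = (1/(gamma-mu)) log(Z_L/Z_N)
    a = z_far * math.exp(mu * b)          # a = Z_L exp((mu/(gamma-mu)) log(Z_L/Z_N))
    return a, b

def distances(q, z_near, z_far):
    """Map each moving pixel count q_m to a distance Z_m = a*exp(-b*q_m)."""
    a, b = calibrate(min(q), max(q), z_near, z_far)
    return [a * math.exp(-b * q_m) for q_m in q]

if __name__ == "__main__":
    # Hypothetical pixel counts for three objects, nearest 5 m and farthest 80 m.
    print(distances([10, 20, 40], z_near=5.0, z_far=80.0))
```

By construction, the pixel with the fewest moved pixels is assigned the farthest distance and the pixel with the most moved pixels is assigned the nearest distance; the remaining pixels fall in between on the exponential curve.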

The moving image distance calculation program according to the present invention is a program for calculating the distance from an object appearing in a moving image to the camera, based on a moving image of the view directly ahead in the direction of movement captured by a single moving camera from time $t = \tau - T$ (where $\tau > 0$, $T > 0$) to time $t = \tau$. The program is characterized by causing control means to realize: a target pixel extraction function that takes one of the pixels of an object appearing in the frame image at time $t = \tau - T$ as a target pixel and extracts such a target pixel from that frame image for each of $M$ ($M \geq 2$) different objects; a trajectory calculation function that, based on the plurality of frame images from time $t = \tau - T$ to time $t = \tau$, calculates the trajectory of the coordinates of each of the $M$ target pixels extracted by the target pixel extraction function, in chronological order from the frame image at time $t = \tau - T$ to the frame image at time $t = \tau$, using dynamic programming applied to two-dimensional images; a moving pixel count measurement function that, based on the $M$ trajectories calculated by the trajectory calculation function, measures for each of the $M$ target pixels the number of pixels moved $q_m$ ($m = 1, 2, \ldots, M$) from its coordinates in the frame image at time $t = \tau - T$ to its coordinates in the frame image at time $t = \tau$; and a distance calculation function. The distance calculation function takes $\mu$ as the smallest and $\gamma$ as the largest of the $M$ measured moving pixel counts $q_m$, takes $Z_N$ as the nearest and $Z_L$ as the farthest of the distances from the objects identified by the $M$ target pixels to the camera, and calculates the constants $a$ and $b$ by
$$a = Z_L \cdot \exp\left(\frac{\mu}{\gamma - \mu} \log\frac{Z_L}{Z_N}\right)$$
$$b = \frac{1}{\gamma - \mu} \log\frac{Z_L}{Z_N}$$
Then, taking $Z_m$ ($m = 1, 2, \ldots, M$) as the distance at time $t = \tau$ from the object corresponding to each of the $M$ target pixels to the camera, it calculates the distance $Z_m$ from the constants $a$ and $b$ and the $M$ moving pixel counts $q_m$ by
$$Z_m = a \cdot \exp(-b q_m)$$

In the moving image distance calculation device or program described above, the mean-shift method may be applied to the frame image used for extracting the target pixels, thereby dividing that frame image into a plurality of regions. Among the divided regions, the target pixel of a region in which an object appears may be taken either as the pixel at the centroid of the region or, after numbering the pixels of the region in order from one end, as the pixel whose number is the middle one. A target pixel is extracted in this way for each of $M$ ($M \geq 2$) regions in which different objects appear.
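The two ways of choosing a target pixel within a region can be sketched as follows. This is a hedged illustration, not the patented implementation: the mean-shift segmentation itself is not shown, a region is assumed to be given as a list of (x, y) pixel coordinates, the function names are hypothetical, and "numbering in order from one end" is assumed here to mean raster order (row by row, left to right).

```python
def centroid_pixel(region):
    """Return the pixel coordinates nearest to the centroid of the region.

    region: list of (x, y) integer pixel coordinates (assumed input format).
    Note: for a non-convex region the centroid may lie outside the region.
    Python's round() rounds exact halves to the nearest even integer.
    """
    n = len(region)
    cx = sum(x for x, _ in region) / n
    cy = sum(y for _, y in region) / n
    return (round(cx), round(cy))

def middle_numbered_pixel(region):
    """Number the pixels in raster order and return the middle-numbered one.

    Unlike the centroid, this pixel always belongs to the region.
    """
    ordered = sorted(region, key=lambda p: (p[1], p[0]))  # sort by row, then column
    return ordered[len(ordered) // 2]

if __name__ == "__main__":
    # Hypothetical L-shaped region: the centroid falls outside the region.
    l_shape = [(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)]
    print(centroid_pixel(l_shape))         # (1, 1), not a member of the region
    print(middle_numbered_pixel(l_shape))  # (2, 0), a member of the region
```

The second option is useful precisely because it guarantees that the chosen target pixel lies on the object, which the centroid does not for irregular shapes.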

With the moving image distance calculation device and program according to the present invention, the distance from an object appearing in a moving image to the camera can be calculated from a moving image of the view directly ahead in the traveling direction captured by a moving camera.

FIG. 1 is a block diagram showing the schematic configuration of the moving image distance calculation device according to the embodiment.
FIG. 2 is a diagram schematically showing the forward view as seen from a moving object.
FIG. 3 is a diagram for explaining moving image data and a target pixel.
FIG. 4 is a diagram for explaining tracking of a target pixel that changes over time.
FIG. 5 is a diagram for explaining a method of obtaining the virtual distance to an object based on cumulative dynamic parallax.
FIG. 6 is a diagram showing the group of pixels in the frame image at time t-1 from which a transition to one target pixel (x, y, t) at time t is possible.
FIG. 7 is an enlarged view of a frame image showing the trajectory obtained when a target pixel in a region showing a traffic light is tracked from time t = 1 to time t = 30.
FIG. 8 is a frame image showing the trajectories obtained when the target pixels of five traffic lights are each tracked from time t = 1 to time t = 30.
FIG. 9 is a diagram showing an image obtained by applying the mean-shift method to a frame image captured by the camera.
FIG. 10 is a block diagram showing the functional units of the functions executed by the CPU in accordance with the program.
FIG. 11(a) is a flowchart showing a method in which the CPU according to the embodiment calculates the distance from the camera to an object by tracking target pixels in chronological order from time t = τ − T to time t = τ. FIG. 11(b) is a flowchart showing a method in which the CPU according to the embodiment calculates the distance from the camera to an object by tracking target pixels in reverse chronological order, going back from time t = τ to time t = τ − T.

An example of the moving image distance calculation device according to the present invention is described below in detail with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of the moving image distance calculation device. The moving image distance calculation device 100 has a recording unit 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, and a CPU (Central Processing Unit: control means, region division means, target pixel extraction means, trajectory calculation means, moving pixel count measurement means, distance calculation means) 104.

A camera 200 is connected to the moving image distance calculation device 100. The camera 200 is mounted on, for example, a vehicle or a drone, and can capture the view directly ahead of the moving vehicle or the like in its traveling direction as a moving image. A monitor 210 is also connected to the moving image distance calculation device 100. The monitor 210 can display the moving image captured by the camera 200 and similar content.

The moving image captured by the camera 200 is recorded in the recording unit 101. More specifically, the moving image is recorded in the recording unit 101 as data in which a plurality of frame images are stored in time series. For example, consider a moving image captured by the camera 200 from time t = 1 to time t = T. If the camera 200 can record one frame image every Δt, then T/Δt frame images are recorded in the recording unit 101 in time series. As an example, the recording unit 101 according to this embodiment can record T frame images, one per unit time.

A frame buffer may be provided in the moving image distance calculation device 100 or in the camera 200, so that each frame image captured by the camera 200 per unit time is temporarily stored in the frame buffer and the frame images stored there are then recorded in the recording unit 101 in time series. The moving image recorded in the recording unit 101 is not limited to one captured by the camera 200 in real time; it may be a moving image captured by the camera 200 beforehand (a past moving image).

The moving image captured by the camera 200 is also not limited to a digital moving image. Even an analog moving image can be used in the distance calculation processing of the moving image distance calculation device 100, provided that digital conversion allows its frame images to be recorded in the recording unit 101 in time series.

The recording unit 101 is configured with a general hard disk or the like. Its configuration is not limited to a hard disk; it may be a flash memory, an SSD (Solid State Drive / Solid State Disk), or the like. The specific configuration of the recording unit 101 is not particularly limited as long as it is a recording medium capable of recording a moving image as a plurality of time-series frame images.

Based on the plurality of frame images (the moving image) recorded in time series in the recording unit 101, the CPU 104 calculates the distance from an object appearing in the frame images to the camera 200, for the pixel in which that object appears. The CPU 104 performs this distance calculation for each specific pixel (target pixel) in accordance with a processing program described later (a program based on the flowcharts of FIGS. 11(a) and 11(b)); the details are given later.

The ROM 102 stores, among other things, the program for calculating the distance from an object appearing in a frame image to the camera 200. The RAM 103 is used as a work area for the processing of the CPU 104.

For the moving image distance calculation device 100 according to this embodiment, the description assumes that the programs executed by the CPU 104 (the flowcharts shown in FIGS. 11(a) and 11(b)) are stored in the ROM 102. These programs may, however, be stored in the recording unit 101 instead.

The camera 200 is an imaging means capable of capturing the scenery in front of the camera through a lens as a moving image. The type and configuration of the camera 200 are not particularly limited as long as it can capture moving images. For example, it may be a general video camera, or it may use the camera function of a smartphone or similar device.

The monitor 210 can display, in a form visible to the user, the moving image captured by the camera 200 and images showing the trajectories of pixels tracked during the distance calculation processing (for example, the images of FIGS. 7 and 8 described later). A general display device such as a liquid crystal display or a CRT display is used as the monitor 210.

Next, the idea behind calculating the distance of an object appearing in a frame image, based on the frame images recorded in chronological order in the recording unit 101, is explained.

More than 2000 years ago, Euclid discussed the visual phenomenon of motion parallax (referred to here as dynamic parallax). This is the phenomenon in which, when objects move at a constant speed, a distant object appears to move less than a nearby one. FIG. 2 is a diagram schematically showing the forward view as seen from a moving object. As shown in FIG. 2, when the moving object advances, things located at the center (see A in FIG. 2) hardly move. In the periphery away from the center (see B in FIG. 2), distant things move little and nearby things move a lot. This visual phenomenon is observed in everyday life and is a subject of psychological research. Technology that applies it is also used in computer graphics animation and in the development of simulators for pilot training.

The moving image distance calculation device 100 uses this dynamic parallax phenomenon to calculate, from a moving image captured by a camera mounted on a moving vehicle, drone, or the like, the distance from an object appearing in the moving image to the camera 200.

FIG. 3 is a diagram for explaining moving image data and a target pixel. The moving image captured by the camera 200 is denoted $f(x, y, t)$. The variable $x$ ranges over $1 \leq x \leq X$, the variable $y$ ranges over $1 \leq y \leq Y$, and the variable $t$ takes the values $t = 1, 2, \ldots, T$. $X$ is the number of horizontal pixels of the moving image captured by the camera 200, and $Y$ is the number of vertical pixels. $f(x, y, t)$ is the value of the corresponding pixel, normally an RGB value (R: red, G: green, B: blue). $(x, y)$ gives the coordinates of the pixel, and $t$ gives the time (relative time).

Denoting the moving image by $f(x, y, t)$ allows the moving image data to be represented as a cuboid of data, as in FIG. 3. The frontmost face in FIG. 3 is the frame image $f(x, y, t)$ at time $t = 1$ (where $1 \leq x \leq X$, $1 \leq y \leq Y$). The pixel $f(x_0, y_0, 1)$ at some coordinates $(x_0, y_0)$ in this frame image is taken as the target pixel. For convenience, FIG. 3 is explained for the case of a single target pixel, but the number of target pixels is not limited to one.
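The cuboid view of the moving image can be illustrated with a small container that exposes the 1-based indexing used in the text. This is a minimal sketch only: the class name `MovingImage` and the nested-list storage layout are assumptions made for illustration, not part of the described device.

```python
class MovingImage:
    """Moving image f(x, y, t) with 1 <= x <= X, 1 <= y <= Y, 1 <= t <= T.

    frames[t-1][y-1][x-1] is assumed to hold an (R, G, B) tuple.
    """

    def __init__(self, frames):
        self.frames = frames
        self.T = len(frames)          # number of frame images
        self.Y = len(frames[0])       # number of vertical pixels
        self.X = len(frames[0][0])    # number of horizontal pixels

    def f(self, x, y, t):
        """Return the pixel value at coordinates (x, y) in the frame at time t."""
        if not (1 <= x <= self.X and 1 <= y <= self.Y and 1 <= t <= self.T):
            raise IndexError("(x, y, t) outside the moving image cuboid")
        return self.frames[t - 1][y - 1][x - 1]

if __name__ == "__main__":
    # Hypothetical 3x2 video with 2 frames; each pixel stores its own (x, y, t).
    frames = [[[(x, y, t) for x in range(1, 4)] for y in range(1, 3)]
              for t in range(1, 3)]
    video = MovingImage(frames)
    x0, y0 = 3, 2
    print(video.f(x0, y0, 1))  # target pixel f(x0, y0, 1) in the first frame
```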

Consider tracking how the target pixel moves through the cuboid as time $t$ passes and which pixel of the frame image at time $t = T$ it reaches. Tracking the target pixel makes it possible to detect, from the viewpoint of the dynamic parallax phenomenon, how the appearance of an object recorded in the moving image changes when the forward view is captured by a moving camera.

FIG. 4 is a diagram for explaining tracking of a target pixel that changes over time. FIG. 4 shows the target pixel $f(x_0, y_0, 1)$ at coordinates $(x_0, y_0)$ at time $t = 1$ and the target pixel $f(x_0^*(T), y_0^*(T), T)$ at coordinates $(x_0^*(T), y_0^*(T))$ at time $t = T$. Following a target pixel in this way is called tracking. Consider the case where the target pixel moves from the position $(x_0, y_0, 1)$ ($= (x_0, y_0)$) in the frame image at time $t = 1$ to the position $(x_0^*(T), y_0^*(T))$ in the frame image at time $t = T$. Letting $q(x_0, y_0, T)$ be the cumulative dynamic parallax of the target pixel over the elapsed time $T$ from time $t = 1$ to time $t = T$, the cumulative dynamic parallax can be defined as
$$q(x_0, y_0, T) = \left| (x_0, y_0) - (x_0^*(T), y_0^*(T)) \right|$$
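As a small worked illustration of this definition, the cumulative dynamic parallax is the length of the net displacement between the first and last tracked positions, not the length of the path in between. The sketch below assumes the norm is the Euclidean norm and that a trajectory is available as a list of (x, y) coordinates; the function name is hypothetical.

```python
import math

def cumulative_parallax(trajectory):
    """Return q = |(x0, y0) - (x0*(T), y0*(T))| for a tracked target pixel.

    trajectory: list of (x, y) positions from the first frame to the last
    (assumed format). Only the first and last positions are used.
    """
    (x_start, y_start) = trajectory[0]
    (x_end, y_end) = trajectory[-1]
    return math.hypot(x_start - x_end, y_start - y_end)

if __name__ == "__main__":
    # Hypothetical trajectory: the pixel drifts from (100, 50) to (103, 54).
    track = [(100, 50), (101, 51), (102, 53), (103, 54)]
    print(cumulative_parallax(track))  # 5.0
```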

The cumulative dynamic parallax $q(x_0, y_0, T)$ corresponds to the change in the distance from the camera 200 to the real-world object position (object point) as the object point seen at coordinates $(x_0, y_0)$ in the frame image at time $t = 1$ (the coordinates of $f(x_0, y_0, 1)$) moves to the object point seen at coordinates $(x_0^*(T), y_0^*(T))$ in the frame image at time $t = T$. By finding the correspondence between the change in cumulative dynamic parallax and the change in the distance from the camera 200 to the real-world object position, the distance from the camera 200 to the real-world object position can be obtained from the cumulative dynamic parallax $q$.

FIG. 5 is a diagram for explaining a method of obtaining the distance from an object to the camera 200 based on cumulative dynamic parallax. The vertical axis of FIG. 5 shows the virtual distance $Z_v$ from the camera 200 to the object; its positive direction is downward in the figure. The horizontal axis shows the cumulative dynamic parallax $q$; its positive direction is to the right in the figure. If the camera 200 advances in the forward direction, then over an elapsed time $\Delta t$ the virtual distance becomes shorter by $\Delta Z_v$ ($> 0$) and the cumulative dynamic parallax changes by $\Delta q$ ($> 0$ or $< 0$). Because the virtual distance $Z_v$ is virtual, its value is taken to correspond to the value of a unit cumulative dynamic parallax $q_0$, which is a constant. A characteristic of dynamic parallax is that the larger its value, the shorter the distance from the camera 200 to the object, and the smaller its value, the longer that distance. A proportional relationship therefore holds between the virtual distance $Z_v$ and the cumulative dynamic parallax $q_0$. That is, from the relationship shown in FIG. 5, the proportion $Z_v : q_0 = -\Delta Z_v : \Delta q$ holds.

From this proportional relationship, the relational expression
−q_0 · ΔZv = Zv · Δq
holds, which gives
ΔZv / Zv = −Δq / q_0
Treating the increments as differentials and integrating both sides transforms this into
log Zv = −q / q_0 + c (c is a constant)
so that
Zv = a · exp(−bq)
holds. Here, a and b are positive constants (a = exp(c) and b = 1 / q_0). exp(−bq) denotes the base of the natural logarithm (Napier's constant) raised to the power −bq. Once the values of the constants a and b are determined, calculating the cumulative dynamic parallax q from the moving image taken by the camera 200 makes it possible to obtain Zv as a real-world distance rather than a virtual distance.

The values of the constants a and b are determined based on the ranges over which the variables Zv and q vary. As already described, Zv denotes the virtual distance from the camera 200 to the object position. The virtual distance is a value that can change depending on the target world (the world or environment under consideration). In the present embodiment, the target world of Zv is the virtual three-dimensional space of the moving image, which depends on the variation of the cumulative dynamic parallax of the target pixels. The virtual distance obtained in this three-dimensional space (target world) differs from the real-world distance. Therefore, by determining in advance, through visual inspection or another method, the range of real-world distances that corresponds to the virtual distance Zv in the three-dimensional space (target world) of the moving image, a distance in the target world can be mapped to a real-world distance.

If the real-world distance Z can be associated with the virtual distance Zv of the target world, the real-world distance Z can be obtained by
Z = a · exp(−bq) ... Equation 1
That is, the distance Z from the camera 200 to the object position in the real world can be obtained.

In the present embodiment, as an example, the range of real-world distances corresponding to the virtual distance Zv in the three-dimensional space (target world) of the moving image is determined by visual inspection. That is, the user determines by eye, or by similar observation, which range of real-world distances contains the virtual distances Zv of the three-dimensional space (target world) of the moving image.

For example, when calculating the distances of M objects shown in a frame image, M virtual distances Zv exist in the three-dimensional space (target world) of the moving image. Let the real-world distance range corresponding to these M virtual distances Zv be the range from Z_N to Z_L (Z_N ≤ Z_L); this is a real-world range, judged by eye, that can be taken to contain all of the distances from the camera 200 to each of the M objects. The range of the virtual distance Zv can then be expressed as Z_N ≤ Zv ≤ Z_L. The distances Z_N and Z_L that bound the range of Zv need not be determined by eye; they may be determined by other methods.

The range of the cumulative dynamic parallax q is determined from experimental values obtained individually from the moving image; no a priori information is required. The range of q depends on the real-world scene shown in the moving image and also on the time span T. The cumulative dynamic parallax q is obtained from how far the target pixel in the frame image at time t = 1 has moved by the frame image at time t = T, that is, from the number of pixels the target pixel has moved between its coordinates in the frame image at time t = 1 and its coordinates in the frame image at time t = T. For example, when there are M target pixels, one for each of M objects, M cumulative dynamic parallaxes (numbers of moved pixels) are obtained, one per target pixel. Let the range of these M cumulative dynamic parallaxes q be μ ≤ q ≤ γ. The smallest of the M numbers of moved pixels is μ and the largest is γ. In other words, μ and γ are experimental values determined by the numbers of moved pixels of the target pixels obtained from the moving image.

The correspondence between μ, γ and Z_L, Z_N can be determined from the nature of dynamic parallax: μ corresponds to Z_L, and γ corresponds to Z_N. This is because the farther the virtual distance Zv, the smaller the movement of the object point (object position) in the moving image, and the nearer the virtual distance Zv, the larger that movement. Thus the shortest distance Z_N in the range of the virtual distance Zv corresponds to γ, the largest movement in the range of the cumulative dynamic parallax (number of moved pixels) q, and the longest distance Z_L corresponds to μ, the smallest movement in that range.

Therefore, substituting the pairs (μ, Z_L) and (γ, Z_N) for q and Zv in Zv = a · exp(−bq) yields the following simultaneous equations in a and b.

Z_L = a · exp(−bμ) ... Equation 2
Z_N = a · exp(−bγ) ... Equation 3
Solving these simultaneous equations gives the constants a and b as follows.

a = Z_L · exp((μ / (γ − μ)) · log(Z_L / Z_N)) ... Equation 4
b = (1 / (γ − μ)) · log(Z_L / Z_N) ... Equation 5
By obtaining the constants a and b in this way and applying them to Equation 1 above, the value of the virtual distance Zv can be calculated as the real-world distance Z.
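The step from Equations 2 and 3 to Equations 4 and 5 is not spelled out in the text. One way to see it, assuming γ > μ and Z_N > 0 so that the division and the logarithm are defined, is the following short derivation.

```latex
\begin{align*}
\frac{Z_L}{Z_N} &= \frac{a\,e^{-b\mu}}{a\,e^{-b\gamma}} = e^{b(\gamma-\mu)}
  && \text{(Equation 2 divided by Equation 3)}\\
b &= \frac{1}{\gamma-\mu}\,\log\frac{Z_L}{Z_N}
  && \text{(Equation 5)}\\
a &= Z_L\,e^{b\mu}
   = Z_L\exp\!\left(\frac{\mu}{\gamma-\mu}\,\log\frac{Z_L}{Z_N}\right)
  && \text{(Equation 4, from Equation 2)}
\end{align*}
```

Because Z_L ≥ Z_N and γ > μ, the constant b is non-negative, which is consistent with the distance decreasing as q grows.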

Next, the method of tracking a target pixel is described. FIG. 6 shows the group of pixels (local region) in the frame image at time t−1 from which a transition to one target pixel (x, y, t) at time t is possible, that is, the pixels of the frame image that are candidates for the transition. When a pixel at time t is tracked using dynamic programming for two-dimensional images, the transition to the target pixel at time t is made from one of the pixels in the local region at time t−1, as shown in FIG. 6. In this sense, FIG. 6 shows the local transition used in dynamic programming for two-dimensional images when tracking pixels. The recurrence formula of the dynamic programming for two-dimensional images is built on this local transition.

Here, dynamic programming for two-dimensional images is a method for finding a nonlinear correspondence between two sequences, and is one application of the generally known dynamic programming method. It differs from conventional dynamic programming in how the local distance is constructed. In the dynamic programming for two-dimensional images used in the present embodiment, the local distance is calculated between the value of the target pixel and the values of all pixels of the moving image. Then, with the target pixel as the start point and every pixel of the image at time t = T as a possible end point, dynamic programming is used to find the path from the start point to an end point that gives the minimum cumulative distance. This makes it possible to track the target pixel across the frame images from time t = 1 to time t = T.

For the recurrence formula of the dynamic programming for two-dimensional images, a value called the local distance is first calculated using the target pixel f(x_0, y_0, 1): d(x, y, t) = |f(x_0, y_0, 1) − f(x, y, t)|. As the boundary condition, d(x, y, t) = ∞ for t ≤ 0, x ≤ 0, x > X, y ≤ 0, or y > Y. The value obtained by accumulating the optimal local distances is called the optimal cumulative distance S(x, y, t). As the initial condition, S(x_0, y_0, 1) = 0, and S(x, y, t) = ∞ for all other (x, y, t) with 0 ≤ x ≤ X + 1, 0 ≤ y ≤ Y + 1, 0 ≤ t ≤ T.

When these boundary and initial conditions are set, the recurrence formula for the optimal cumulative distance based on dynamic programming is given by Equation 6.
[Equation 6: recurrence formula for the optimal cumulative distance S(x, y, t); equation image not available]

Here, N(i, j) is defined as N(i, j) ≡ {(i + α, j + β) : |α| ≤ α_0, |β| ≤ β_0}. N(i, j) is the set of grid points of the frame image at time t−1 shown in FIG. 6, and indicates the range over which the recurrence formula of the dynamic programming operates.

When the recurrence formula of Equation 6 is applied from t = 1 to t = T, the optimal cumulative values form a distribution of scalar values S(x, y, T) (where 1 ≤ x ≤ X, 1 ≤ y ≤ Y). The coordinates at which S(x, y, T) is at most a preset threshold h and takes its minimum value are then found. Denoting these coordinates by (x*_0(T), y*_0(T)), they are obtained by Equation 7.
[Equation 7: arg min expression giving (x*_0(T), y*_0(T)); equation image not available]
Here, the symbol arg denotes the operation of extracting the variables x and y that give the minimum value. Letting q(x_0, y_0, T) be the cumulative dynamic parallax of the target pixel (x_0, y_0) over time T, it can be defined as
q(x_0, y_0, T) = |(x_0, y_0) − (x*_0(T), y*_0(T))|
By tracking the target pixel using this expression for the cumulative dynamic parallax q(x_0, y_0, T), the distance from the camera 200 to the object can be obtained from the trajectory (tracking) of the target pixel.
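The tracking step can be sketched in Python. Because the recurrence and the arg min expression are not reproduced in this text, the sketch assumes a common form for this kind of recurrence: S(x, y, t) is d(x, y, t) plus the minimum of S over the neighborhood N(x, y) at time t−1, and the arrival point is the minimizer of S(x, y, T) subject to S ≤ h. That assumed form, the function name `track_target`, the zero-based indexing, the default neighborhood α_0 = β_0 = 1, and the reading of |·| as the Euclidean norm are illustrative choices, not taken from the source.

```python
import math

def track_target(frames, x0, y0, alpha0=1, beta0=1, h=math.inf):
    """Track the pixel (x0, y0) of frames[0] through all frames.

    frames: list of T gray-scale images, each a list of rows (frames[t][y][x]).
    Returns (S, end, q): the cumulative-distance table S[t][y][x], the arrival
    point (x, y) in the last frame (None if nothing is within threshold h),
    and the cumulative dynamic parallax q in pixels.
    """
    T, Y, X = len(frames), len(frames[0]), len(frames[0][0])
    target = frames[0][y0][x0]
    inf = math.inf

    # Initial condition: zero at the target pixel, infinity everywhere else.
    S = [[[inf] * X for _ in range(Y)] for _ in range(T)]
    S[0][y0][x0] = 0.0

    for t in range(1, T):
        for y in range(Y):
            for x in range(X):
                d = abs(target - frames[t][y][x])  # local distance
                best = inf
                # Assumed recurrence: best predecessor in the local region
                # of the previous frame. Out-of-frame neighbors are skipped,
                # which is equivalent to giving them infinite cost.
                for j in range(-beta0, beta0 + 1):
                    for i in range(-alpha0, alpha0 + 1):
                        xx, yy = x + i, y + j
                        if 0 <= xx < X and 0 <= yy < Y:
                            best = min(best, S[t - 1][yy][xx])
                S[t][y][x] = best + d

    # Arrival point: minimum of S in the last frame, subject to S <= h.
    best_val, end = inf, None
    for y in range(Y):
        for x in range(X):
            if S[T - 1][y][x] <= h and S[T - 1][y][x] < best_val:
                best_val, end = S[T - 1][y][x], (x, y)
    if end is None:
        return S, None, None

    q = math.hypot(end[0] - x0, end[1] - y0)  # assumed Euclidean norm
    return S, end, q
```

The triple loop visits every pixel of every frame for a single target pixel, so the cost grows with X · Y · T · (2α_0 + 1)(2β_0 + 1); this is one reason the number of target pixels matters.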

The trajectory that the target pixel followed in the moving image can be determined by the recurrence described next. This trajectory is important information for visualizing how the target pixel changes.

To obtain the trajectory of the target pixel, the sequence of points from the arrival point (x*_0(T), y*_0(T), T) back to (x*_0(1), y*_0(1), 1) = (x_0, y_0, 1) is found by a recurrence.

First, the target pixel at coordinates (x*_0(T−1), y*_0(T−1)) is obtained by
(x*_0(T−1), y*_0(T−1)) = (x*_0(T) + i*, y*_0(T) + j*)

Here, (i*, j*) is determined by the following expression.
[Expression for (i*, j*) at step T−1; equation image not available]

Similarly, the target pixel at (x*_0(T−2), y*_0(T−2)) is obtained by
(x*_0(T−2), y*_0(T−2)) = (x*_0(T−1) + i*, y*_0(T−1) + j*)

Here, (i*, j*) is determined by the following expression.
[Expression for (i*, j*) at step T−2; equation image not available]

Repeating this calculation T times in succession yields the target pixels of the point sequence down to (x*_0(1), y*_0(1), 1) = (x_0, y_0, 1).
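The backward pass can be sketched as follows. The expressions that define (i*, j*) are not reproduced in this text, so the sketch assumes that at each step the offset is the one whose predecessor has the smallest optimal cumulative distance within the local region (|i| ≤ α_0, |j| ≤ β_0) of the previous frame. The function name `backtrack`, the zero-based indexing, and the table layout S[t][y][x] are hypothetical.

```python
import math

def backtrack(S, end, alpha0=1, beta0=1):
    """Recover the trajectory that ends at `end` in the last frame.

    S: cumulative-distance table indexed as S[t][y][x].
    end: (x, y) arrival point in the last frame.
    Returns the list of (x, y) points from the first frame to the last.
    """
    T, Y, X = len(S), len(S[0]), len(S[0][0])
    x, y = end
    path = [(x, y)]
    for t in range(T - 1, 0, -1):
        best, step = math.inf, None
        # Assumed rule: choose the neighbor in frame t-1 with the smallest
        # cumulative distance.
        for j in range(-beta0, beta0 + 1):
            for i in range(-alpha0, alpha0 + 1):
                xx, yy = x + i, y + j
                if 0 <= xx < X and 0 <= yy < Y and S[t - 1][yy][xx] < best:
                    best, step = S[t - 1][yy][xx], (xx, yy)
        if step is None:
            raise ValueError("no reachable predecessor at frame %d" % (t - 1))
        x, y = step
        path.append((x, y))
    path.reverse()
    return path
```

When the table was built from a single target pixel whose initial cost is zero and every other initial cost is infinite, the first point of the returned path is that target pixel.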

Next, an example is given of how the distance from the camera 200 to an object is calculated by applying the point sequence from the arrival point (x*_0(T), y*_0(T), T) to (x*_0(1), y*_0(1), 1) = (x_0, y_0, 1) to a moving image from a camera that captures the view ahead of a vehicle in its direction of travel.

Five (M = 5) target pixels are extracted from the frame image at time t = 1. With T = 30, each target pixel is tracked by the method described above to find its arrival point in the frame image at time t = 30 (= T). FIG. 7 is an enlarged view of the frame image at time t = 30 (= T) showing the tracking trajectory obtained when one pixel of a traffic light in the frame image at time t = 1 is taken as the target pixel and tracked from time t = 1 to time t = 30. FIG. 8 shows the frame image at time t = 30 with the tracking trajectories obtained when five different traffic lights are the objects, one pixel of each traffic light is set as a target pixel, and the five target pixels are tracked simultaneously.

For the five target pixels shown in FIG. 8, the range of the tracking distance on the frame image (the length of the trajectory from the coordinates of its start pixel to the coordinates of its end pixel) is measured. Among the five tracking distances, the shortest was 4.0 pixels (minimum number of pixels) and the longest was 30.52 pixels (maximum number of pixels). The values of μ and γ described above are determined from these pixel counts. In the present embodiment, the minimum of 4.0 pixels is the value of the minimum distance μ in Equation 2, and the maximum of 30.52 pixels is the value of the maximum distance γ in Equation 3.
Minimum distance (μ) = 4.0 pixels, maximum distance (γ) = 30.52 pixels

In addition, for the five objects shown in the frame image at time t = 30 (= T) (the five objects corresponding to the five target pixels), human visual inspection places the real-world distances from the camera 200 to the five objects (traffic lights) within the range from a minimum of 50 m to a maximum of 100 m. These distances give the actual scale range from the objects shown in the moving image to the camera 200. The minimum distance of 50 m corresponds to the value of Z_N in Equation 3, and the maximum distance of 100 m corresponds to the value of Z_L in Equation 2.
Minimum distance (Z_N) = 50 m, maximum distance (Z_L) = 100 m

As already explained, by the nature of dynamic parallax, the farther an object is from the camera 200 (Z_L), the shorter the tracking distance of its target pixel (μ), and the nearer an object is to the camera 200 (Z_N), the longer the tracking distance of its target pixel (γ). Hence μ corresponds to Z_L and γ corresponds to Z_N, and the relationships of Equations 2 and 3 hold.

By evaluating Equations 4, 5 and 1 with these values of μ, γ, Z_L and Z_N, the real distance to the camera 200 can be calculated for the object shown at the pixel point corresponding to each of the five target pixels. FIG. 8 shows the real-world distances (from the camera 200 to each traffic light) calculated for the trajectories of the five target pixels. As FIG. 8 shows, the constants a and b are calculated from μ, γ, Z_L and Z_N on the basis of Equations 2 to 5, and substituting the calculated a and b into Equation 1 gives the real-world distance from the object shown at each target pixel's coordinates to the camera 200.
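The arithmetic of this example can be reproduced with a few lines of Python. The sketch below evaluates Equations 5, 4 and 1 with the stated values μ = 4.0, γ = 30.52, Z_N = 50 m and Z_L = 100 m. The function names `fit_constants` and `distance` are hypothetical, and the intermediate value q = 17.26 is only an illustrative midpoint, not one of the five measured tracking distances.

```python
import math

def fit_constants(mu, gamma, z_near, z_far):
    """Return (a, b) such that Z = a * exp(-b * q) maps mu -> z_far, gamma -> z_near."""
    if not (gamma > mu and z_far >= z_near > 0):
        raise ValueError("need gamma > mu and z_far >= z_near > 0")
    b = math.log(z_far / z_near) / (gamma - mu)  # Equation 5
    a = z_far * math.exp(mu * b)                 # Equation 4
    return a, b

def distance(q, a, b):
    """Equation 1: real-world distance for a cumulative dynamic parallax q."""
    return a * math.exp(-b * q)

a, b = fit_constants(mu=4.0, gamma=30.52, z_near=50.0, z_far=100.0)
for q in (4.0, 17.26, 30.52):
    print(f"q = {q:5.2f} px -> Z = {distance(q, a, b):6.2f} m")
```

By construction the two end points map back to 100 m and 50 m. Because the mapping is exponential, the midpoint of the pixel range maps to the geometric mean of the two distances (about 70.7 m) rather than to 75 m.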

Next, the selection of target pixels is described. In principle, every pixel of the frame image f(x, y, t) (where 1 ≤ x ≤ X, 1 ≤ y ≤ Y) can be a target pixel. However, choosing too many target pixels increases the amount of computation needed to calculate the distances. It is therefore preferable to select and reduce the target pixels to an appropriate number. As one example of a method for doing so, a known region segmentation method is used.

One of the most effective existing region segmentation methods is the mean-shift method. The mean-shift method is a widely known region segmentation technique and is provided by OpenCV (Open Source Computer Vision Library), a widely available open-source library for computer vision. Applying the mean-shift method to a frame image divides the image into regions according to the objects shown in it, based on, for example, the RGB value (color information) of each pixel. The parts of the image judged to belong to the same region are interpreted as lying at approximately the same distance from the camera. In the present embodiment, the centroid of each divided region is found and each centroid is taken as a target pixel. Determining the target pixels in this way allows a distance to be calculated for each region that can be interpreted as lying at approximately the same distance from the camera 200, so the target pixels are selected appropriately. Moreover, because the number of target pixels is reduced to the number of divided regions, the computational load is reduced.

FIG. 9 shows the result of applying the mean-shift method to a landscape image taken by a camera. The mean-shift method divides the scene in the frame image into multiple regions, and each region is shown in a single color. The centroid position of each region obtained in this way is found, and the pixel at that centroid position is set as a target pixel.
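The centroid step can be sketched independently of how the segmentation is produced. The sketch below assumes the segmentation result is already available as a label map, a two-dimensional list in which each entry is the identifier of the region that pixel belongs to; that representation, the function name `region_centroids`, and the zero-based coordinates are assumptions made for illustration, and the mean-shift step itself is not shown.

```python
from collections import defaultdict

def region_centroids(labels):
    """Return {region_id: (x, y)}, the pixel nearest each region's centroid.

    labels: list of rows, labels[y][x] is the region id of pixel (x, y).
    """
    sums = defaultdict(lambda: [0, 0, 0])  # region id -> [sum of x, sum of y, count]
    for y, row in enumerate(labels):
        for x, region in enumerate(row):
            acc = sums[region]
            acc[0] += x
            acc[1] += y
            acc[2] += 1
    # Round the mean coordinates to the nearest pixel position.
    return {region: (round(sx / n), round(sy / n))
            for region, (sx, sy, n) in sums.items()}
```

For a strongly non-convex region the centroid can fall outside the region itself, so a practical implementation might need to snap it to the nearest pixel that carries the same label.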

FIG. 11(a) is a flowchart of the method by which the CPU 104 calculates the distance from the camera 200 to an object based on a moving image of the view ahead of the vehicle taken by the camera. The moving image used for the distance calculation consists of the frame images f(x, y, t) from time t = τ−T to time t = τ (1 ≤ x ≤ X, 1 ≤ y ≤ Y, τ−T ≤ t ≤ τ).

FIG. 10 is a block diagram of the functional units involved when the CPU 104 executes the distance calculation process based on a program read from the ROM 102. In accordance with the processing shown in FIGS. 11(a) and 11(b), the CPU 104 functions as a region dividing unit (region dividing means) 301, a target pixel extraction unit (target pixel extraction means) 302, a trajectory calculation unit (trajectory calculation means) 303, a moving pixel number measuring unit (moving pixel number measuring means) 304, and a distance calculation unit (distance calculation means) 305, and performs the corresponding processing.

First, the CPU 104 reads the frame image at time t = τ−T from the recording unit 101, applies the mean-shift method to it, and performs region division of the frame image (S.1 in FIG. 11(a): region division step, region division function). In this process, the CPU 104 functions as the region dividing unit (region dividing means) 301, which divides the frame image at time t = τ−T into a plurality of regions.

From the regions of the divided frame image, the CPU 104 finds the regions showing the objects whose distance from the camera 200 is to be measured. When there are M such objects, M regions are found; M is 2 or more. The CPU 104 then finds the centroid coordinates of each of the M regions and extracts, for each region, the pixel at those coordinates as a target pixel at time t = τ−T (S.2: target pixel extraction step, target pixel extraction function). In this process, the CPU 104 functions as the target pixel extraction unit (target pixel extraction means) 302, which extracts the M target pixels.

For the plurality (M) of target pixels determined in the frame image at time t = τ−T, the CPU 104 calculates the trajectory of each target pixel from the frame image at time t = τ−T to the frame image at time t = τ using the tracking algorithm based on dynamic programming for two-dimensional images (S.3: trajectory calculation step, trajectory calculation function). The CPU 104 then determines the coordinates (x*_0(T), y*_0(T)) of the arrival point of each target pixel in the frame image at time t = τ. In this process, the CPU 104 functions as the trajectory calculation unit (trajectory calculation means) 303, which calculates, for each of the M target pixels, the trajectory from the pixel position of the target pixel in the frame image at time t = τ−T to its pixel position in the frame image at time t = τ.

If the CPU 104 also applies the mean-shift method to the frame image at time t = τ and performs region division on it, the divided region to which a target pixel belongs in the frame image at time t = τ−T and the divided region containing the pixel point reached by that target pixel in the frame image at time t = τ normally coincide.

The CPU 104 measures, for each of the M target pixels, the length (number of pixels) of the tracking distance (movement distance) from its pixel position in the frame image at time t = τ−T to its pixel position in the frame image at time t = τ (S.4: moving pixel number measuring step, moving pixel number measuring function). In this process, the CPU 104 functions as the moving pixel number measuring unit (moving pixel number measuring means) 304, which measures the movement distances of the M target pixels in the frame image at time t = τ as the numbers of moved pixels q_m (m = 1, 2, ..., M).

Among the M numbers of moved pixels q_m obtained, the CPU 104 takes the smallest as μ and the largest as γ. Further, the real-world distance at time t = τ from the camera 200 to the object identified by the target pixel with the largest number of moved pixels is determined by eye and taken as Z_N. Likewise, the real-world distance at time t = τ from the camera 200 to the object identified by the target pixel with the smallest number of moved pixels is determined by eye and taken as Z_L.

The CPU 104 then substitutes the values of μ, γ, Z_N and Z_L into Equations 4 and 5 to obtain the constants a and b, and uses these constants together with the number of moved pixels q_m (m = 1, 2, ..., M) of each of the M target pixels to calculate the real-world distance at time t = τ from the object corresponding to each target pixel to the camera 200 (S.5: distance calculation step, distance calculation function).

In this process, the CPU 104 functions as the distance calculation unit (distance calculation means) 305, which calculates the real-world distance from each object to the camera 200 based on the constants a and b, obtained from the set values μ, γ, Z_N, and Z_L, and on the number of moving pixels q_m of each target pixel.
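The distance calculation step can be illustrated with a short sketch. It uses the expressions for the constants a and b given in the claims, a = Z_L · exp((μ / (γ − μ)) log(Z_L / Z_N)) and b = (1 / (γ − μ)) log(Z_L / Z_N). The function names and the sample values are hypothetical and serve only as an illustration.

```python
import math

def fit_constants(mu, gamma, z_near, z_far):
    """Return (a, b) such that Z = a * exp(-b * q) equals z_far at q = mu
    and z_near at q = gamma (z_near is Z_N, z_far is Z_L)."""
    if gamma <= mu:
        raise ValueError("gamma must be larger than mu")
    b = math.log(z_far / z_near) / (gamma - mu)
    a = z_far * math.exp(mu * b)
    return a, b

def distances(q_values, a, b):
    """Apply Z_m = a * exp(-b * q_m) to every number of moving pixels."""
    return [a * math.exp(-b * q) for q in q_values]

# Hypothetical numbers of moving pixels for M = 3 target pixels.
q = [10, 25, 40]
# Hypothetical visually determined distances: nearest 5 m, farthest 50 m.
a, b = fit_constants(mu=min(q), gamma=max(q), z_near=5.0, z_far=50.0)
z = distances(q, a, b)
```

By construction, the target pixel with the fewest moving pixels is mapped to Z_L and the one with the most moving pixels to Z_N. All other target pixels fall between the two on the exponential curve.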

In this way, the CPU 104 calculates, based on the moving image captured by the camera 200, the real-world distance at time t = τ from each object appearing in the moving image to the camera 200. By identifying the divided regions in the frame image at time t = τ, the object appearing in each divided region can be classified as, for example, a traffic light, a signboard, a road, a wall, a tree, or a building, and the distance from the camera 200 to each classified object can be obtained as a parameter. The parameters obtained in this way can be used, for example, as important control parameters in the automatic driving control of automobiles and the like.

The moving image distance calculation device and the moving image distance calculation program according to the present invention have been described above in detail with reference to the drawings. However, they are not limited to the configuration example of the moving image distance calculation device 100 shown in the embodiment. The tracking process of the target pixel described in the embodiment determines the target pixel based on the center of gravity of a region divided by the mean-shift method. The target pixel, however, is not necessarily limited to one determined from the center of gravity of a divided region.

For example, using the optimal (minimum) cumulative value S(x, y, t) and a preset threshold h, let R(x, y, t) be the region satisfying the condition
S(x, y, t) / t ≤ h
A method may be used in which the coordinates of the center of gravity of this region are denoted r(x*, y*, t) and r(x*, y*, t) is taken as the target pixel. The tracking process of the target pixel then tracks r(x*, y*, t) for t = 1, 2, ..., T. Here, dividing the minimum cumulative value S(x, y, t) by the time t in order to obtain R(x, y, t) corresponds to normalizing the minimum cumulative value by the length of time. The threshold h is a constant determined empirically.

Even when the target pixel is determined from the center of gravity of a divided region, the divided region contains other pixels similar to the target pixel. Pixels with similar cumulative values therefore also exist around the pixel position of the tracked target pixel. Consequently, the set of pixels whose normalized minimum cumulative value S(x, y, t) / t is at or below the threshold h also indicates the divided region, and taking the center of gravity of this region as the target pixel increases the reliability of the target pixel obtained by the tracking process.

In addition, by taking the region satisfying S(x, y, t) / t ≤ h as R(x, y, t) and tracking the coordinates of its center of gravity, the back trace becomes easier to visualize, which makes it easier to judge whether the tracking is being performed accurately.
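The thresholding of the normalized cumulative value can be sketched as follows. This is a minimal illustration that assumes the cumulative values at a single time t are available as a two-dimensional list S[y][x]. The function name and sample values are hypothetical, and the sketch treats all qualifying pixels as one region rather than separating several local regions.

```python
def region_centroid(S, t, h):
    """Return the center of gravity (x, y) of the pixels whose normalized
    cumulative value S[y][x] / t is at or below the threshold h.
    Returns None when no pixel satisfies the condition."""
    points = [
        (x, y)
        for y, row in enumerate(S)
        for x, value in enumerate(row)
        if value / t <= h
    ]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

# Hypothetical cumulative values at time t = 2; only the four central
# pixels have S / t <= 1.0.
S = [
    [9.0, 9.0, 9.0, 9.0],
    [9.0, 1.0, 2.0, 9.0],
    [9.0, 2.0, 1.0, 9.0],
    [9.0, 9.0, 9.0, 9.0],
]
center = region_centroid(S, t=2, h=1.0)
```

Handling several objects at once would additionally require splitting the qualifying pixels into connected local regions before taking each center of gravity.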

Furthermore, conventional tracking methods require a separate tracking process for each object appearing in the moving image. In contrast, the moving image distance calculation device and the moving image distance calculation program according to the present invention can perform the tracking process on a plurality of objects (more precisely, a plurality of target pixels) simultaneously, as shown in FIG. 8.

For example, let the target pixel of each of a plurality of objects be f(x_m, y_m, t) (where m = 1, 2, ..., M). Here, m is a number that distinguishes the objects, and M is the total number of objects.

In the dynamic programming recurrence for tracking the target pixel of each object, the local distance d_m(x, y, t) is defined as
d_m(x, y, t) = |f(x_m, y_m, 1) − f(x, y, t)|
and the minimum cumulative value S(x, y, t) is defined by the corresponding recurrence [equation not reproduced in the source text]. Further, the target pixels to be tracked are determined as the centers of gravity of the plurality of local regions in which the value of S(x, y, t) / t is at or below the threshold h.

By determining a target pixel for each of a plurality of objects in this way, the distance from each object to the camera can be obtained, object by object, using the path distance from the end to the start of the tracked target pixel trajectory (the pixel distance of the trajectory).
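The recurrence for the minimum cumulative value is not reproduced in this text, so the following is only a rough sketch of how a dynamic programming tracker of this kind could look. It uses the local distance |f(x_m, y_m, 1) − f(x, y, t)| from the description. The 3 × 3 predecessor neighbourhood, the initial condition, the grayscale frames, and the function name are all assumptions made for illustration.

```python
def track_endpoint(frames, start):
    """frames: list of 2-D grayscale frames (lists of rows); frames[0] is t = 1.
    start: (x, y) of the target pixel in frames[0].
    Returns ((x, y), S / t) for the pixel with the smallest cumulative
    value in the last frame."""
    h, w = len(frames[0]), len(frames[0][0])
    x0, y0 = start
    target = frames[0][y0][x0]
    inf = float("inf")
    # Assumed initial condition: at t = 1 only the target pixel is reachable.
    S = [[inf] * w for _ in range(h)]
    S[y0][x0] = 0.0
    for frame in frames[1:]:
        nxt = [[inf] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                # Assumed transition: the pixel moves at most one step
                # in x and y between consecutive frames.
                best = min(
                    S[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                )
                # Local distance to the target value taken at t = 1.
                nxt[y][x] = best + abs(target - frame[y][x])
        S = nxt
    t = len(frames)
    value, x, y = min((S[y][x], x, y) for y in range(h) for x in range(w))
    return (x, y), value / t

# Hypothetical 1 x 5 frames in which a bright pixel moves one step right per frame.
frames = [
    [[0, 9, 0, 0, 0]],
    [[0, 0, 9, 0, 0]],
    [[0, 0, 0, 9, 0]],
]
end, score = track_endpoint(frames, start=(1, 0))
```

Tracking M objects would repeat this accumulation once per target pixel, each with its own local distance d_m.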

In the moving image distance calculation device 100 according to the embodiment, as shown in FIG. 11(a), the CPU 104 performs the tracking process of the target pixels on frame images of a moving image that advance from the past toward the present (from time t = τ−T to time t = τ). However, the tracking process of the target pixels can also be performed on the frame images of the moving image by going back in time from time t = τ to the past time t = τ−T.

FIG. 11(b) is a flowchart showing the processing performed when the CPU 104 carries out the tracking process of the target pixels by going back in time from time t = τ to time t = τ−T. The CPU 104 divides the frame image at time t = τ into regions by applying the mean-shift method (S.11: region division step, region division function).

The CPU 104 then selects, from the divided regions thus obtained, M regions in which objects appear, and extracts the pixel at the center of gravity of each of the M regions as a target pixel (S.12: target pixel extraction step, target pixel extraction function). For example, let one of the extracted M target pixels be (x*_0(τ), y*_0(τ)).

The CPU 104 performs the tracking process of the target pixels based on the moving image from time t = τ back to time t = τ−T (S.13: trajectory calculation step, trajectory calculation function). This tracking process yields the trajectory of each target pixel as a time series running backward from time t = τ to time t = τ−T. Let the point reached by one of the M target pixels in the frame image at time t = τ−T be (x*_0(τ−T), y*_0(τ−T)). The CPU 104 obtains the cumulative dynamic parallax q(x*_0(τ), y*_0(τ)) of the target pixel along the trajectory from time t = τ to time t = τ−T as
q(x*_0(τ), y*_0(τ))
= |(x*_0(τ), y*_0(τ)) − (x*_0(τ−T), y*_0(τ−T))|
This cumulative dynamic parallax q(x*_0(τ), y*_0(τ)) corresponds to the number of moving pixels q_m described above. Through this process, the CPU 104 therefore measures the number of moving pixels (S.14: moving pixel number measuring step, moving pixel number measuring function).
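As a small illustration of the expression above, the cumulative dynamic parallax can be computed as the length of the vector between the starting coordinates at time t = τ and the point reached at time t = τ−T. The function name and the coordinates below are hypothetical.

```python
import math

def cumulative_dynamic_parallax(start_xy, end_xy):
    """Length, in pixels, of the vector between the target pixel at
    time t = tau and the point it reaches at time t = tau - T."""
    return math.hypot(start_xy[0] - end_xy[0], start_xy[1] - end_xy[1])

# Hypothetical coordinates: (x*_0(tau), y*_0(tau)) and (x*_0(tau - T), y*_0(tau - T)).
q = cumulative_dynamic_parallax((320, 240), (308, 235))
```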

The CPU 104 then calculates the distance of the object appearing in the frame image at time t = τ based on the obtained cumulative dynamic parallax q(x*_0(τ), y*_0(τ)) (S.15: distance calculation step, distance calculation function). The real-world distance from the object to the camera 200 at time t = τ can be obtained by
d(x*_0(τ), y*_0(τ))
= a · exp(−b · q(x*_0(τ), y*_0(τ)))

This equation corresponds to Z = a · exp(−bq) shown in Equation 1.

Obtaining the real-world distance by tracking the target pixels through the frame images of the moving image backward in time, from time t = τ to the past time t = τ−T, has two advantages.

First, the divided regions in the frame image at time t = τ tend to be relatively large, partly because the objects appearing in that frame image are closer to the camera. This has the advantage that the target pixel can be determined from a more stable divided region.

Second, because the target pixel is tracked backward in time from time t = τ to the past time t = τ−T, the target pixel is unlikely to leave the frame of the moving image as time goes back. In a moving image that advances from time t = τ−T to time t = τ, a target pixel present in the frame image at time t = τ−T may leave the frame image before time t = τ as the camera 200 moves forward, and so may be absent from the frame image at time t = τ. When the target pixel is tracked backward in time from time t = τ to time t = τ−T, however, it tends to move toward the center of the screen, so it is likely to remain within the frame image even as its amount of movement becomes small.

Similarly, the divided region from which the target pixel was determined is also likely to be present in the frame image at time t = τ−T. Tracking the target pixel backward in time from time t = τ to time t = τ−T therefore has the advantage of reducing the need to handle cases in which the target pixel or the divided region leaves the frame image.

When the target pixel is tracked backward in time, there is no guarantee that the pixel at the end of the trajectory is always the same pixel as the correct target pixel. The same holds, however, when the target pixel is tracked forward in time from time t = τ−T to time t = τ.

The following methods can also be used in the moving image distance calculation device 100 according to the embodiment to make the tracking process of the target pixels more stable.

First, the moving image distance calculation device 100 according to the embodiment was described as obtaining the center of gravity of each divided region and determining each center of gravity as a target pixel. However, when a divided region is, for example, L-shaped, its center of gravity may lie outside the region, in which case coordinates outside the divided region could be determined as the target pixel. To avoid this, the pixels in the region are numbered in order from one end, and the pixel whose number corresponds to half the total number of pixels in the region (that is, the middle number among the pixel numbers of the region) is determined as the target pixel. Determining the target pixel in this way ensures that a pixel inside the divided region is chosen as the target pixel.
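The difference between the center of gravity and the middle-numbered pixel can be seen in a small sketch. The numbering order is not specified in detail above, so raster order (top to bottom, left to right) is assumed here. The function names and the L-shaped region are hypothetical.

```python
def centroid(pixels):
    """Center of gravity of a list of (x, y) pixel coordinates."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

def middle_pixel(pixels):
    """Number the pixels in raster order (an assumed ordering) and return
    the pixel whose number is half the total pixel count."""
    ordered = sorted(pixels, key=lambda p: (p[1], p[0]))
    return ordered[len(ordered) // 2]

# Hypothetical L-shaped region: a vertical bar at x = 0 joined to a
# horizontal bar at y = 4.
region = [(0, y) for y in range(5)] + [(x, 4) for x in range(1, 5)]
c = centroid(region)          # falls in the empty corner of the L
m = middle_pixel(region)      # always one of the region's own pixels
```

In this example the rounded center of gravity is not a member of the region, whereas the middle-numbered pixel is guaranteed to be.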

Next, when the coordinates of the determined target pixel are (x*_0(τ), y*_0(τ)), as set in S.12 of FIG. 11(b), the value of the target pixel need not be taken from that single pixel alone. It can instead be determined as the average of the values of the pixels above, below, left, and right of the target pixel coordinates.

For example, when the value P(x*_0(τ), y*_0(τ)) of the target pixel is determined from the average of five coordinate points, namely the target pixel coordinates (x*_0(τ), y*_0(τ)) and the coordinates directly above, below, left, and right of them, the value can be determined by

... Equation 8

Determining the value of the target pixel from the average over a plurality of coordinates in this way yields a more stable value.
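Equation 8 itself is not reproduced in this text. The following sketch therefore assumes the plain arithmetic mean of the five pixel values described above, computed per RGB channel. The function name, the frame layout frame[y][x] = (R, G, B), and the sample image are hypothetical.

```python
def five_point_mean(frame, x, y):
    """Average the RGB values at (x, y) and its four direct neighbours.
    frame[y][x] is an (R, G, B) tuple; (x, y) must not lie on the border."""
    points = [(x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return tuple(
        sum(frame[py][px][channel] for px, py in points) / len(points)
        for channel in range(3)
    )

# Hypothetical 3 x 3 frame whose red channel increases with x + y.
frame = [[(10 * (x + y), 0, 255) for x in range(3)] for y in range(3)]
p = five_point_mean(frame, 1, 1)
```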

Further, for the calculation of the local distance described above, the moving image distance calculation device 100 according to the embodiment was described as denoting the value of the target pixel by f(x_0, y_0, 1) and calculating the local distance as the Euclidean distance between the value of the target pixel and the value of the pixel under consideration at time t, that is,
local distance d(x, y, t) = |f(x_0, y_0, 1) − f(x, y, t)|

However, the calculation of the local distance is not limited to this method. For example, the local distance can also be obtained by calculating a distance value using the cosine correlation function of two pixels. That is, the local distance d(x, y, t) can be obtained by

... Equation 9

The term <f(x, y, t), P(x*_0(τ), y*_0(τ))> in the numerator of Equation 9 denotes the inner product of the RGB vectors. Because RGB components are non-negative, the correlation value always lies between zero and 1, so the value obtained by Equation 9 is stable against changes in the absolute magnitude of the pixel values. The local distance calculated by Equation 9 is therefore less susceptible to noise and similar disturbances that may occur in the frame images.
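Equation 9 is not reproduced in this text. As a rough sketch, the following assumes that the local distance is one minus the cosine correlation of the two RGB vectors, which is one common way to turn a correlation into a distance. The exact form in the original equation may differ. The function name and the handling of all-zero pixels are also assumptions.

```python
import math

def cosine_local_distance(f, p):
    """Assumed form: 1 - <f, p> / (|f| * |p|) for two RGB vectors."""
    dot = sum(a * b for a, b in zip(f, p))
    norm = math.sqrt(sum(a * a for a in f)) * math.sqrt(sum(b * b for b in p))
    if norm == 0.0:
        # Assumption: a black (all-zero) pixel is treated as maximally distant.
        return 1.0
    return 1.0 - dot / norm

# Same colour at half the brightness: the distance stays close to zero.
d_same = cosine_local_distance((100, 50, 25), (200, 100, 50))
# Pure red against pure green: the vectors are orthogonal.
d_diff = cosine_local_distance((255, 0, 0), (0, 255, 0))
```

The first example shows the stability noted above: scaling the brightness of a pixel leaves the correlation, and hence the distance, essentially unchanged.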

In the case of Equation 9, the time parameter t satisfies τ−T ≤ t ≤ τ, as already explained with reference to FIG. 11(b), but the calculation proceeds with t going back into the past from time τ to time τ−T. That is, the time t advances as t = τ, τ−1, τ−2, ..., τ−T, and the time at which the target pixel is reached is t = τ−T.

Visualizing the back trace in the tracking process of the target pixel also makes it easier to judge whether the tracking is being performed accurately. As already explained, taking the region satisfying S(x, y, t) / t ≤ h as R(x, y, t) and tracking the coordinates of its center of gravity makes the back trace easier to visualize.

100 ... Moving image distance calculation device
101 ... Recording unit
102 ... ROM
103 ... RAM
104 ... CPU (control means, region dividing means, target pixel extraction means, trajectory calculation means, moving pixel number measuring means, distance calculation means)
200 ... Camera
210 ... Monitor
301 ... Region division unit (region dividing means)
302 ... Target pixel extraction unit (target pixel extraction means)
303 ... Trajectory calculation unit (trajectory calculation means)
304 ... Moving pixel number measuring unit (moving pixel number measuring means)
305 ... Distance calculation unit (distance calculation means)

Claims

A moving image distance calculation device that calculates the distance from an object appearing in a moving image to a camera, based on a moving image of the forward view in the traveling direction captured by a single moving camera from time t = τ−T (where τ > 0, T > 0) to time t = τ, the device comprising:
target pixel extraction means for extracting, from the frame image of the moving image at time t = τ−T, a target pixel for each of M (M ≥ 2) different objects, the target pixel being one of the pixels of the object appearing in that frame image;
trajectory calculation means for calculating, based on the plurality of frame images from time t = τ−T to time t = τ, the trajectory of the coordinates of each of the M target pixels extracted by the target pixel extraction means, in chronological order from the frame image at time t = τ−T to the frame image at time t = τ, using dynamic programming applied to two-dimensional images;
moving pixel number measuring means for measuring, for each of the M target pixels and based on the M trajectories calculated by the trajectory calculation means, the number of moving pixels q_m (m = 1, 2, ..., M) from the coordinates in the frame image at time t = τ−T to the coordinates in the frame image at time t = τ; and
distance calculation means for calculating, where μ is the smallest and γ is the largest of the M numbers of moving pixels q_m measured by the moving pixel number measuring means, and Z_N is the nearest and Z_L is the farthest of the distances to the camera from the objects identified by the M target pixels, the constants a and b by
a = Z_L · exp((μ / (γ − μ)) log(Z_L / Z_N))
b = (1 / (γ − μ)) log(Z_L / Z_N)
and for calculating, where Z_m (m = 1, 2, ..., M) is the distance at time t = τ from the object corresponding to each of the M target pixels to the camera, the distance Z_m from the constants a and b and the M numbers of moving pixels q_m by
Z_m = a · exp(−b q_m).

A moving image distance calculation device that calculates the distance from an object appearing in a moving image to a camera, based on a moving image of the forward view in the traveling direction captured by a single moving camera from time t = τ−T (where τ > 0, T > 0) to time t = τ, the device comprising:
target pixel extraction means for extracting, from the frame image of the moving image at time t = τ, a target pixel for each of M (M ≥ 2) different objects, the target pixel being one of the pixels of the object appearing in that frame image;
trajectory calculation means for calculating, based on the plurality of frame images from time t = τ−T to time t = τ, the trajectory of the coordinates of each of the M target pixels extracted by the target pixel extraction means, going back in time from the frame image at time t = τ to the frame image at time t = τ−T, using dynamic programming applied to two-dimensional images;
moving pixel number measuring means for measuring, for each of the M target pixels and based on the M trajectories calculated by the trajectory calculation means, the number of moving pixels q_m (m = 1, 2, ..., M) from the coordinates in the frame image at time t = τ−T to the coordinates in the frame image at time t = τ; and
distance calculation means for calculating, where μ is the smallest and γ is the largest of the M numbers of moving pixels q_m measured by the moving pixel number measuring means, and Z_N is the nearest and Z_L is the farthest of the distances to the camera from the objects identified by the M target pixels, the constants a and b by
a = Z_L · exp((μ / (γ − μ)) log(Z_L / Z_N))
b = (1 / (γ − μ)) log(Z_L / Z_N)
and for calculating, where Z_m (m = 1, 2, ..., M) is the distance at time t = τ from the object corresponding to each of the M target pixels to the camera, the distance Z_m from the constants a and b and the M numbers of moving pixels q_m by
Z_m = a · exp(−b q_m).

The moving image distance calculation device according to claim 1 or 2, wherein the target pixel extraction means has region dividing means for dividing the frame image used for extracting the target pixels into a plurality of regions by applying the mean-shift method to that frame image, and
the target pixel extraction means extracts a target pixel for each of M (M ≥ 2) regions in which different objects appear, among the regions divided by the region dividing means, taking the pixel at the center of gravity of the region in which the object appears as the target pixel.

The moving image distance calculation device according to claim 1 or 2, wherein the target pixel extraction means has region dividing means for dividing the frame image used for extracting the target pixels into a plurality of regions by applying the mean-shift method to that frame image, and
the target pixel extraction means numbers the pixels in each region divided by the region dividing means in order from one end, takes the pixel corresponding to the middle number among the pixel numbers of the region as the target pixel, and extracts a target pixel for each of M (M ≥ 2) regions in which different objects appear.

A moving image distance calculation program for calculating the distance from an object appearing in a moving image to a camera, based on a moving image of the forward view in the traveling direction captured by a single moving camera from time t = τ−T (where τ > 0, T > 0) to time t = τ, the program causing control means to realize:
a target pixel extraction function of extracting, from the frame image of the moving image at time t = τ−T, a target pixel for each of M (M ≥ 2) different objects, the target pixel being one of the pixels of the object appearing in that frame image;
a trajectory calculation function of calculating, based on the plurality of frame images from time t = τ−T to time t = τ, the trajectory of the coordinates of each of the M target pixels extracted by the target pixel extraction function, in chronological order from the frame image at time t = τ−T to the frame image at time t = τ, using dynamic programming applied to two-dimensional images;
a moving pixel number measuring function of measuring, for each of the M target pixels and based on the M trajectories calculated by the trajectory calculation function, the number of moving pixels q_m (m = 1, 2, ..., M) from the coordinates in the frame image at time t = τ−T to the coordinates in the frame image at time t = τ; and
a distance calculation function of calculating, where μ is the smallest and γ is the largest of the M numbers of moving pixels q_m measured by the moving pixel number measuring function, and Z_N is the nearest and Z_L is the farthest of the distances to the camera from the objects identified by the M target pixels, the constants a and b by
a = Z_L · exp((μ / (γ − μ)) log(Z_L / Z_N))
b = (1 / (γ − μ)) log(Z_L / Z_N)
and of calculating, where Z_m (m = 1, 2, ..., M) is the distance at time t = τ from the object corresponding to each of the M target pixels to the camera, the distance Z_m from the constants a and b and the M numbers of moving pixels q_m by
Z_m = a · exp(−b q_m).

A moving image distance calculation program for calculating the distance from an object appearing in a moving image to a camera, based on a moving image of the forward view in the traveling direction captured by a single moving camera from time t = τ−T (where τ > 0, T > 0) to time t = τ, the program causing control means to realize:
a target pixel extraction function of extracting, from the frame image of the moving image at time t = τ, a target pixel for each of M (M ≥ 2) different objects, the target pixel being one of the pixels of the object appearing in that frame image;
a trajectory calculation function of calculating, based on the plurality of frame images from time t = τ−T to time t = τ, the trajectory of the coordinates of each of the M target pixels extracted by the target pixel extraction function, going back in time from the frame image at time t = τ to the frame image at time t = τ−T, using dynamic programming applied to two-dimensional images;
a moving pixel number measuring function of measuring, for each of the M target pixels and based on the M trajectories calculated by the trajectory calculation function, the number of moving pixels q_m (m = 1, 2, ..., M) from the coordinates in the frame image at time t = τ−T to the coordinates in the frame image at time t = τ; and
a distance calculation function of calculating, where μ is the smallest and γ is the largest of the M numbers of moving pixels q_m measured by the moving pixel number measuring function, and Z_N is the nearest and Z_L is the farthest of the distances to the camera from the objects identified by the M target pixels, the constants a and b by
a = Z_L · exp((μ / (γ − μ)) log(Z_L / Z_N))
b = (1 / (γ − μ)) log(Z_L / Z_N)
and of calculating, where Z_m (m = 1, 2, ..., M) is the distance at time t = τ from the object corresponding to each of the M target pixels to the camera, the distance Z_m from the constants a and b and the M numbers of moving pixels q_m by
Z_m = a · exp(−b q_m).

The moving image distance calculation program according to claim 5 or 6, further causing the control means to realize a region division function of dividing the frame image used for extracting the target pixels in the target pixel extraction function into a plurality of regions by applying the mean-shift method to that frame image, wherein
the target pixel extraction function extracts a target pixel for each of M (M ≥ 2) regions in which different objects appear, among the regions divided by the region division function, taking the pixel at the center of gravity of the region in which the object appears as the target pixel.

The moving image distance calculation program according to claim 5 or 6, further causing the control means to realize a region division function of dividing the frame image used for extracting the target pixels in the target pixel extraction function into a plurality of regions by applying the mean-shift method to that frame image, wherein
the target pixel extraction function numbers the pixels in each region divided by the region division function in order from one end, takes the pixel corresponding to the middle number among the pixel numbers of the region as the target pixel, and extracts a target pixel for each of M (M ≥ 2) regions in which different objects appear.