JP2007266652A

JP2007266652A - Moving object detection apparatus, moving object detection method, moving object detection program, video decoding apparatus, video encoding apparatus, imaging apparatus, and video management system

Info

Publication number: JP2007266652A
Application number: JP2005035627A
Authority: JP
Inventors: Daijiro Ichimura; 大治郎市村; Yoshimasa Honda; 義雅本田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-05-31
Filing date: 2005-02-14
Publication date: 2007-10-11
Also published as: WO2005117448A1

Abstract

Problem: To provide a moving object detection method and apparatus capable of detecting a moving object at high speed, with high accuracy, and with a low processing load.
Solution: The apparatus has motion information extraction means 102 for extracting motion information from a video stream encoded using a band division method, which divides an image into a reduced image, a horizontal component, a vertical component, and a diagonal component, together with motion prediction compensation encoding; edge information extraction means 103 for extracting information on the horizontal, vertical, and diagonal components from one or more bit planes of the video stream, in order from the most significant bit plane; and moving object detection means 106 for detecting a moving object using the extracted motion information and edge information and outputting the detection result. Because the video stream does not need to be decoded, a moving object can be detected at high speed, with high accuracy, and with a low processing load.
Selected drawing: Figure 1

Description

The present invention relates to a moving object detection method and apparatus for detecting a moving object from a video stream generated by encoding video.

Conventionally, moving object detection devices of this kind have included, for example, the one described in Patent Document 1.

This moving object detection device extracts the motion vectors used in motion prediction compensation coding without decoding the video stream, regards each motion vector as the motion of an object within a region, and thereby detects a moving object at high speed. FIG. 18 shows the conventional moving object detection device described in Patent Document 1.

In FIG. 18, the coding mode, motion compensation mode, and motion vector information of an image block, decoded by the variable length decoding unit 1801, and the pattern information detected by the pattern information detection unit 1802 are sent to the moving object detection processing unit 1803. The moving object detection processing unit 1803 uses this information to determine whether or not the image block is a moving object. The determination uses motion vectors, spatial similarity, temporal similarity, and the like.

Japanese Patent Laid-Open No. 10-75457

However, the conventional configuration above relies solely on motion vectors, which do not necessarily represent the motion of an object accurately, so its accuracy is limited. Motion vectors are commonly generated by searching the preceding and following images for a reference region that gives a high compression rate for the region being encoded, and using the reference to that region as the motion vector. For this reason, detecting a moving object using motion vectors alone is not accurate.

The present invention has been made to solve the above problem. Its object is to provide a moving object detection method and apparatus capable of detecting a moving object at high speed, with high accuracy, and with a low processing load from a video stream encoded using a band division method, which divides an image into a reduced image, a horizontal component, a vertical component, and a diagonal component, together with motion prediction compensation encoding.

A moving object detection apparatus according to the present invention includes: motion information extraction means for extracting motion information from a video stream encoded using hierarchical encoding, in which video is divided into a plurality of layers and encoded, and motion prediction compensation encoding; edge information extraction means for extracting edge information from the video stream; and moving object detection means for detecting a moving object using the motion information and the edge information and outputting the detection result.

According to this configuration, object contours can be detected without decoding the video stream, and a moving object can further be detected from the motion information, so a moving object can be detected at high speed, with high accuracy, and with a low processing load.

Further, in the moving object detection apparatus of the present invention, the edge information extraction means extracts from the video stream, as edge information, the bit plane information from the most significant bit plane down to the N-th bit plane (N is a natural number), out of the bit plane information obtained by bit-plane encoding the image.

According to this configuration, extracting information down to a specific bit plane makes it possible to detect edges of at least a specific strength, so the contour of an object can be detected at high speed. In addition, the contour can be detected using only the bit planes at or above a specific bit position; bit planes below that position are not needed. Therefore, even when the video stream is received over a slow communication network, highly accurate detection is possible at a low bit rate.
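The relation between the number of bit planes read and the minimum detectable edge strength can be illustrated with a minimal sketch. It assumes unsigned coefficient magnitudes of a fixed bit depth held in a plain list; the function names and the 8-bit depth are hypothetical and do not come from the specification, which operates on the encoded bit planes rather than on decoded values.

```python
def top_bit_planes(magnitudes, n_planes, bit_depth=8):
    """Keep only the n_planes most significant bits of each magnitude."""
    if not 1 <= n_planes <= bit_depth:
        raise ValueError("n_planes must be between 1 and bit_depth")
    shift = bit_depth - n_planes
    return [m >> shift for m in magnitudes]


def strong_edge_mask(magnitudes, n_planes, bit_depth=8):
    """True where a coefficient has any bit set in the top n_planes.

    A set bit there means the magnitude is at least 2**(bit_depth - n_planes),
    so reading fewer planes corresponds to a higher edge-strength threshold.
    """
    return [v != 0 for v in top_bit_planes(magnitudes, n_planes, bit_depth)]


coefficients = [3, 17, 64, 200, 31, 32]              # illustrative values only
mask = strong_edge_mask(coefficients, n_planes=3)    # threshold is 2**5 = 32
```

With three planes of an 8-bit magnitude, only coefficients of 32 or more are flagged; the five lower planes are never consulted.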

Further, in the moving object detection apparatus of the present invention, the video stream is divided into a plurality of regions, and the moving object detection means determines a region to be a contour region of a moving object when the total code length of the bit plane information within the region is equal to or greater than a predetermined first value.

According to this configuration, the number of edges present in a region of the image can be judged simply by checking the code amount of the bit planes down to a threshold bit position, so the contour of an object can be detected at high speed.

Further, in the moving object detection apparatus of the present invention, the moving object detection means determines the region to be a contour region of a moving object when the total code length of the bit plane information within the region is equal to or less than a predetermined second value.

According to this configuration, because the contour of an object is a line, a region that contains too many horizontal, vertical, and diagonal components (for example, a region containing a striped pattern) is determined not to be the contour of a moving object, which prevents false detection.
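The two code-length tests can be combined into a single range check, sketched below. This is only an illustration: the function name, the per-plane code lengths, and the threshold values are all hypothetical, and the specification does not state how the first and second values are chosen.

```python
def is_contour_candidate(plane_code_lengths, first_value, second_value):
    """Return True when the total code length lies within [first_value, second_value].

    plane_code_lengths: code lengths of the bit planes read for one region.
    Below first_value the region has too few edges; above second_value it has
    too many (for example, a striped texture) to be a line-like contour.
    """
    total = sum(plane_code_lengths)
    return first_value <= total <= second_value


FIRST_VALUE, SECOND_VALUE = 30, 200      # assumed thresholds, for illustration

flat_region = [0, 2, 3]                  # nearly empty bit planes
contour_region = [10, 25, 40]            # a moderate amount of edge data
striped_region = [60, 120, 200]          # dense edges everywhere

results = [
    is_contour_candidate(r, FIRST_VALUE, SECOND_VALUE)
    for r in (flat_region, contour_region, striped_region)
]
```

Because only code lengths are summed, the check needs no decoding of the bit plane contents themselves.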

Further, in the moving object detection apparatus of the present invention, the motion information extraction means extracts a motion vector from a region determined to be a contour region of a moving object, and the moving object detection means determines the region to be a contour region of a moving object when the magnitude of the motion vector is equal to or greater than a predetermined third value.

According to this configuration, an object that is not moving is determined not to be a moving object, which improves the accuracy of moving object detection.

Further, in the moving object detection apparatus of the present invention, the motion information extraction means extracts a first motion vector from a region determined to be a contour region of a moving object, selects a region located in the vicinity of that region, and extracts a second motion vector from the selected region. The moving object detection means calculates, as a measured value, the magnitude of the difference vector between the first motion vector and the second motion vector, and determines the selected region to be an internal region of the moving object when the measured value is equal to or less than a predetermined fourth value.

According to this configuration, because the contour region of a moving object in the video moves at a different speed from the surrounding regions, regions other than the contour of the moving object are determined not to belong to the moving object, which improves the accuracy of moving object detection.

Further, in the moving object detection apparatus of the present invention, the motion information extraction means selects a plurality of regions and extracts a motion vector from each selected region. For each selected region, the moving object detection means obtains the magnitude of the difference vector between the first motion vector and the motion vector of that selected region, and calculates the sum of these magnitudes over all selected regions as the measured value.

According to this configuration, because the contour region of a moving object in the video moves at a different speed from the surrounding regions, a plurality of regions other than the contour of the moving object are determined not to belong to the moving object, which improves the accuracy of moving object detection.
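The measured value based on difference vectors can be sketched as follows. Motion vectors are represented as (dx, dy) tuples and the magnitude is taken as the Euclidean norm; both choices, like the function names, are assumptions made for illustration, since the specification does not fix a representation or a norm.

```python
import math


def difference_magnitude(v1, v2):
    """Euclidean length of the difference between two (dx, dy) motion vectors."""
    return math.hypot(v1[0] - v2[0], v1[1] - v2[1])


def measured_value(first_mv, selected_mvs):
    """Sum of difference magnitudes between the contour vector and each selected vector.

    With a single selected region this reduces to one difference magnitude.
    """
    return sum(difference_magnitude(first_mv, mv) for mv in selected_mvs)


def is_internal(first_mv, selected_mvs, fourth_value):
    """Treat the selected regions as internal when they move like the contour region."""
    return measured_value(first_mv, selected_mvs) <= fourth_value


contour_mv = (4, 0)                                              # illustrative vector
same_object = is_internal(contour_mv, [(4, 0), (3, 0)], 1.5)     # moves with the contour
background = is_internal(contour_mv, [(0, 0)], 1.5)              # static neighbor
```

A small measured value indicates that the neighboring regions share the motion of the contour region, which is the basis for treating them as part of the same object.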

Further, in the moving object detection apparatus of the present invention, the moving object detection means determines a region located in the vicinity of a region already determined to be an internal region of the moving object to be an internal region of the moving object as well, when the magnitude of the difference vector between the motion vectors of the two regions is equal to or less than a predetermined fifth value.

According to this configuration, regions of a moving object moving at a certain speed that have not yet been determined to belong to it can be detected, which improves the accuracy of moving object detection.

Further, in the moving object detection apparatus of the present invention, the moving object detection means determines a region surrounded by regions determined to be contour regions or internal regions of the moving object to be an internal region of the moving object.

According to this configuration, the interior of what has been determined to be the contour of a moving object can be detected as part of the moving object region, which improves the accuracy of moving object detection.
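One way to find regions surrounded by contour or internal regions is a flood fill from the image border, sketched below. The specification does not say how enclosure is tested, so the algorithm, the 4-connectivity, the grid representation (1 for a contour or internal region, 0 otherwise), and the function name are all assumptions.

```python
from collections import deque


def fill_enclosed(grid):
    """Mark as True every cell that is flagged or cannot be reached from the border.

    grid: non-empty rectangular list of rows; truthy cells are regions already
    determined to be contour or internal regions.
    """
    rows, cols = len(grid), len(grid[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every unflagged cell on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not grid[r][c]:
                outside[r][c] = True
                queue.append((r, c))
    # Spread through unflagged cells using 4-connectivity.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not grid[nr][nc] and not outside[nr][nc]):
                outside[nr][nc] = True
                queue.append((nr, nc))
    return [[bool(grid[r][c]) or not outside[r][c] for c in range(cols)]
            for r in range(rows)]


ring = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_enclosed(ring)   # the center cell becomes part of the object
```

Cells that the fill reaches from the border are background; everything else is either already flagged or enclosed.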

Further, in the moving object detection apparatus of the present invention, when the number of regions in the vicinity of a contour region or internal region determined to belong to a first moving object that have been determined to be a contour region or internal region of a second moving object is equal to or greater than a predetermined sixth value, the moving object detection means re-determines the contour region or internal region determined to belong to the first moving object as the first moving object.

According to this configuration, a region that is too small can be determined not to be a moving object, which reduces false detections.
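The effect described here, discarding regions that are too small, can be sketched as a neighbor-count filter. This is one reading of the claim under stated assumptions: the 8-neighborhood, the boolean grid, and the function name are hypothetical, and the sketch does not distinguish between a first and a second moving object.

```python
def recheck_regions(mask, sixth_value):
    """Keep a flagged region only if enough of its neighbors are also flagged.

    mask: non-empty rectangular list of rows; truthy cells are regions
    currently determined to be contour or internal regions.
    """
    rows, cols = len(mask), len(mask[0])
    result = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Count flagged cells among the (up to) eight surrounding cells.
            neighbors = sum(
                1
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc)
                and 0 <= r + dr < rows
                and 0 <= c + dc < cols
                and mask[r + dr][c + dc]
            )
            result[r][c] = neighbors >= sixth_value
    return result


detections = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],   # isolated single-region detection
]
rechecked = recheck_regions(detections, sixth_value=2)
```

The 2x2 block survives because each of its cells has three flagged neighbors, while the isolated cell has none and is dropped.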

The moving object detection method of the present invention is a method for detecting a moving object from a video stream, executed by a moving object detection apparatus, and includes the steps of: extracting motion information from a video stream encoded using hierarchical encoding, in which video is divided into a plurality of layers and encoded, and motion prediction compensation encoding; extracting edge information from the video stream; and detecting a moving object using the extracted motion information and edge information.

According to this method, the contour of an object can be detected without decoding the video stream, and a moving object can further be detected from the motion information, so a moving object can be detected at high speed, with high accuracy, and with a low processing load.

The moving object detection program of the present invention causes a computer, in order to detect a moving object from a video stream, to execute the steps of: extracting motion information from a video stream encoded using hierarchical encoding, in which video is divided into a plurality of layers and encoded, and motion prediction compensation encoding; extracting edge information from the video stream; and detecting a moving object using the extracted motion information and edge information.

According to this program, the contour of an object can be detected without decoding the video stream, and a moving object can further be detected from the motion information, so a moving object can be detected at high speed, with high accuracy, and with a low processing load.

The video decoding apparatus of the present invention includes: video decoding means for decoding a video stream encoded by hierarchical encoding, in which video is divided into a plurality of layers and encoded, and motion prediction compensation encoding; and moving object detection means for detecting a moving object from the motion information and edge information that the video decoding means extracts when decoding the video stream.

According to this configuration, the video decoding apparatus and the moving object detection apparatus can share some processing and means, so video decoding and moving object detection can be performed simultaneously at high speed, and the scale of the apparatus as a whole can be reduced.

Further, in the video decoding apparatus of the present invention, the video stream is divided into a plurality of regions, and the moving object detection means determines a region to be a contour region of a moving object when the total code length of the bit plane information within the region is equal to or greater than a predetermined first value.

According to this configuration, the number of edges present in a region can be judged simply by checking, for example, the code amount of the bit planes down to a threshold bit position in a region of the horizontal, vertical, and diagonal components, so the contour of an object can be detected at high speed.

Further, in the video decoding apparatus of the present invention, the moving object detection means determines the region to be a contour region of a moving object when the total code length of the bit plane information within the region is equal to or less than a predetermined second value.

In the video decoding apparatus of the present invention, the video decoding means further generates video in which the region of the moving object detected by the moving object detection means is emphasized.

According to this method, a person monitoring the video can easily notice a moving object.

In the video decoding apparatus of the present invention, the video decoding means further generates video consisting of edge components and displays only the region of the moving object detected by the moving object detection means with emphasis.

As a result, even when the bit rate of the base layer is very low, for example because of a limited communication speed, and only video of extremely poor quality can be generated, details may be easier to recognize in a contour-only video.

In addition, only the moving objects stand out in a video consisting of contours, which makes it easier for a person watching several surveillance videos at once to notice abnormalities or suspicious persons. Even in environments with limited processing capability, such as when displaying video from multiple cameras, areas important for monitoring can be displayed clearly with a low processing load.

The video encoding apparatus of the present invention includes: video encoding means for generating a video stream encoded using hierarchical encoding, in which video is divided into a plurality of layers and encoded, and motion prediction compensation encoding; and moving object detection means for detecting a moving object by extracting motion information and edge information of the video when the video encoding means encodes the video. According to this configuration, the video encoding means and the moving object detection means can share some processing and means, so video encoding and moving object detection can be performed simultaneously at high speed, and the scale of the apparatus as a whole can be reduced.

The imaging apparatus of the present invention includes: imaging means for inputting video; the video encoding apparatus according to the present invention, which encodes the video input by the imaging means; imaging control means for controlling the imaging functions of the imaging means based on the moving object detection result output by the moving object detection means; and an output unit that outputs the video stream and the moving object detection result.

With this configuration, a moving object can be detected in the course of generating the video stream for transmission to a remote location. In video surveillance, for example, a suspicious person can therefore be detected quickly as a moving object and kept in frame while the video is transmitted, so surveillance can be performed efficiently.

Further, in the imaging apparatus of the present invention, the imaging control means controls the imaging means so that the area of the moving object region output by the moving object detection means is a fixed proportion of the total area of the input video.

With this configuration, both the moving object and its surroundings can be captured in the video, so the moving object of interest can be monitored efficiently.

The video monitoring system of the present invention includes the imaging apparatus according to the present invention and a video monitoring apparatus that decodes the video stream received from the imaging apparatus and uses the moving object detection result to perform image recognition on the region of the detected moving object.

With this configuration, a moving object can be detected in the course of generating the video stream for transmission to a remote location, and image recognition can be performed at high speed and with a low processing load by skipping image recognition for regions other than the moving object. In video surveillance, for example, a suspicious person can therefore be detected quickly as a moving object and kept in frame.

In the present invention, image recognition is not limited to moving object detection; it refers to any automatic, machine-based discrimination using images, including recognition of people, faces, and objects and authentication of persons.

Further, in the video decoding apparatus of the present invention, the video stream is encoded hierarchically into a base layer and an enhancement layer, the motion information extraction means extracts the motion information from the base layer of the video stream, and the edge information extraction means extracts the edge information from the enhancement layer of the video stream.

According to this configuration, when the motion information indicates that there is no motion, processing such as extraction of the edge information can be skipped to reduce the processing load; likewise, when the edge information indicates that there are no edges, processing such as extraction of the motion information can be skipped. The contour of an object can therefore be detected at high speed.
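The load reduction described above amounts to an early exit: the second extraction is only performed when the first one leaves the outcome open. The sketch below checks motion first; the callables, thresholds, and function name are hypothetical stand-ins for the base layer and enhancement layer extraction, and the reverse order (edges first) would work the same way.

```python
def detect_with_early_exit(extract_motion, extract_edges,
                           motion_threshold, edge_threshold):
    """Return True when a region shows both sufficient motion and sufficient edges.

    extract_motion / extract_edges: zero-argument callables standing in for the
    base layer and enhancement layer extraction of one region (assumed interface).
    """
    motion = extract_motion()
    if motion < motion_threshold:
        return False            # no motion: edge extraction is never invoked
    return extract_edges() >= edge_threshold


calls = []


def static_motion():
    calls.append("motion")
    return 0.0                  # region is not moving


def any_edges():
    calls.append("edges")
    return 100


static_result = detect_with_early_exit(static_motion, any_edges, 1.0, 30)
```

For the static region only the motion extraction runs, which is where the saving in processing load comes from.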

Further, in the video decoding apparatus of the present invention, the video stream is encoded hierarchically into a base layer and an enhancement layer, the motion information extraction means extracts the motion information from the enhancement layer of the video stream, and the edge information extraction means extracts the edge information from the enhancement layer of the video stream.

According to this configuration, moving object detection can be performed using only the enhancement layer of the video stream, so the contour of an object can be detected at high speed and with a small amount of stream data.

As described above, according to the present invention, the contour of a moving object can be detected at high speed, with high accuracy, and with a low processing load, without decoding the video, from a video stream encoded using a band division method, which divides an image into a reduced image, a horizontal component, a vertical component, and a diagonal component, together with motion prediction compensation encoding. The video can also be decoded at the same time.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

(Embodiment 1)
In Embodiment 1, the moving object detection method and apparatus according to the present invention are applied to a video decoding apparatus. That is, a moving object in the video can be detected at high speed and with high accuracy at the same time as the video stream is decoded.

First, the video stream used in this embodiment will be described. The video stream consists of a base layer and an enhancement layer. The base layer can be decoded on its own to obtain low-resolution video. The enhancement layer is additional information that improves the image quality of the base layer to yield high-resolution video, and it contains the horizontal, vertical, and diagonal edge components (the horizontal component, vertical component, and diagonal component).

Next, a method for generating this video stream will be described.

First, the input image is band-divided to generate a reduced image, a horizontal component, a vertical component, and a diagonal component. The reduced image is encoded by motion prediction compensation encoding as a base layer from which video can be decoded on its own. The horizontal, vertical, and diagonal components are encoded by bit-plane encoding as an enhancement layer that improves the quality of the video decoded from the base layer.

Band division is described here. Band division splits an image into four components: a reduced image, a horizontal component, a vertical component, and a diagonal component. It is performed, for example, by a wavelet transform or by a combination of high-pass filters, low-pass filters, and downsamplers. The reduced image and the horizontal, vertical, and diagonal components obtained by band division can be restored to the original image by band synthesis. The horizontal, vertical, and diagonal components obtained by band division are mathematically computed differences in pixel value between neighboring pixels, and do not necessarily represent the contour of an object. For example, in a black-and-white horizontal stripe pattern, a strong vertical component appears as horizontal lines at the boundaries between the colors.
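Band division and band synthesis can be illustrated with a one-level, Haar-style split over 2x2 pixel blocks. The specification names wavelet transforms only in general terms, so the choice of this particular filter, the function names, and the list-of-rows image format are assumptions; the band computed from differences between rows is taken here to correspond to the vertical component in the stripe example above.

```python
def band_split(image):
    """Split an even-sized grayscale image (list of rows) into four half-size bands."""
    reduced, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        out = ([], [], [], [])
        for c in range(0, len(image[r]), 2):
            a, b = image[r][c], image[r][c + 1]
            e, f = image[r + 1][c], image[r + 1][c + 1]
            out[0].append((a + b + e + f) / 4)   # local average: reduced image
            out[1].append((a - b + e - f) / 4)   # difference between columns
            out[2].append((a + b - e - f) / 4)   # difference between rows
            out[3].append((a - b - e + f) / 4)   # diagonal difference
        for band, row in zip((reduced, col_diff, row_diff, diag), out):
            band.append(row)
    return reduced, col_diff, row_diff, diag


def band_merge(reduced, col_diff, row_diff, diag):
    """Band synthesis: rebuild the original image from the four bands."""
    image = []
    for r in range(len(reduced)):
        top, bottom = [], []
        for c in range(len(reduced[r])):
            s, h, v, d = reduced[r][c], col_diff[r][c], row_diff[r][c], diag[r][c]
            top += [s + h + v + d, s - h + v - d]
            bottom += [s + h - v - d, s - h - v + d]
        image += [top, bottom]
    return image


# Black-and-white horizontal stripes: only the row-difference band is nonzero.
stripes = [[0] * 4, [255] * 4, [0] * 4, [255] * 4]
bands = band_split(stripes)
restored = band_merge(*bands)
```

For the striped image, the column-difference and diagonal bands are zero everywhere while the row-difference band is uniformly strong, which shows how a texture rather than an object contour can dominate a band.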

図１は、本発明の移動物体検出方法および装置を適用した実施の形態１に係る映像復号化装置１００の構成を示すブロック図である。 FIG. 1 is a block diagram showing a configuration of a video decoding apparatus 100 according to Embodiment 1 to which the moving object detection method and apparatus of the present invention is applied.

図１において、映像復号化装置１００は、ストリーム入力部１０１、基本レイヤ復号化１０２、拡張レイヤ復号化部１０３、帯域合成部１０４、映像出力部１０５、移動物体検出部１０６、検出結果出力部１０７を有する。 In FIG. 1, a video decoding apparatus 100 includes a stream input unit 101, a base layer decoding 102, an enhancement layer decoding unit 103, a band synthesis unit 104, a video output unit 105, a moving object detection unit 106, and a detection result output unit 107. Have

なお、基本レイヤ復号化部１０２と拡張レイヤ復号化部１０３と帯域合成部１０４とが本発明の映像復号化手段に相当し、基本レイヤ復号化部１０２が動き情報抽出手段に相当し、拡張レイヤ復号化部１０３がエッジ情報抽出手段に相当し、移動物体検出部１０６が移動物体検出手段に相当する。 Note that base layer decoding section 102, enhancement layer decoding section 103, and band synthesizing section 104 correspond to the video decoding means of the present invention, base layer decoding section 102 corresponds to the motion information extraction means, and enhancement layer The decoding unit 103 corresponds to edge information extraction means, and the moving object detection unit 106 corresponds to moving object detection means.

The video decoding means decodes the input video stream to generate and output video. The motion information extraction means extracts motion information from the input video stream and outputs it to the moving object detection means. The edge information extraction means extracts edge information from the input video stream and outputs it to the moving object detection means. The moving object detection means detects moving objects from the input edge information and motion information.

Next, the operation of the video decoding apparatus 100 configured as described above is explained.

FIG. 3 is a flowchart showing the operation of the video decoding apparatus 100 of Embodiment 1 shown in FIG. 1. The flowchart of FIG. 3 can also be implemented in software by having a CPU (not shown) execute a control program stored in a storage device (not shown) such as a ROM or flash memory.

First, the stream input unit 101 receives a video stream from outside the video decoding apparatus 100 and outputs the base layer of the stream to the base layer decoding unit 102 and the enhancement layer to the enhancement layer decoding unit 103 (step S301).

Next, the base layer decoding unit 102 extracts motion information from the base layer received from the stream input unit 101 and outputs it to the moving object detection unit 106. The enhancement layer decoding unit 103 extracts edge information from the enhancement layer received from the stream input unit 101 and outputs it to the moving object detection unit 106. The moving object detection unit 106 then detects moving objects using the motion information and edge information received from the base layer decoding unit 102 and the enhancement layer decoding unit 103, generates a moving object detection result, and outputs it to the detection result output unit 107 and the band synthesis unit 104 (step S302).

A video may or may not contain a moving object, and when it does, it may contain one moving object or several.

The moving object detection process of step S302 is described in detail below.

FIG. 4 is a flowchart showing an example of the procedure of the moving object detection process of FIG. 3.

First, in step S401, edge information is extracted. Specifically, the enhancement layer decoding unit 103 extracts, from the enhancement layer received from the stream input unit 101, the codes that contain the information down to a specific bit plane, generates edge information, and outputs it to the moving object detection unit 106.

Bit-plane coding is described here.

A bit plane is a bit string formed by collecting the bits at the same bit position from several numerical values expressed in binary. Coding each bit plane separately is called bit-plane coding. As described in Weiping Li, "Overview of Fine Granularity Scalability in MPEG-4 Video Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 301-317, Mar. 2001, it is well suited to adjusting data quality.

FIG. 2 illustrates the concept of bit-plane coding. The description below assumes that it represents a region of the horizontal component.

In FIG. 2, each column represents one pixel of the horizontal component expressed in binary (pixel 1, pixel 2, and so on). Each row represents a bit plane of the region (bit plane 1, bit plane 2, and so on), that is, the collection of the bits at the same position in every pixel. The higher the bit plane, the stronger the horizontal-component edges it represents. Edge information is the coded form of the bit planes from the most significant bit plane down to a specific bit plane. It includes, for example, the code amount of each bit plane down to the specific bit plane for each region of 8×8 or 16×16 pixels. Because the horizontal, vertical, and diagonal components contain many 0s, bit-plane coding is designed to produce short codes when 0s predominate. Consequently, the more 1s a bit plane of a region contains, the longer its code.
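The row-and-column picture of FIG. 2 can be sketched as follows. This is an illustrative example only: the function name `bit_planes` is hypothetical, and the count of 1s per plane is used here merely as a rough stand-in for code length, since the text does not specify the entropy coder.

```python
def bit_planes(values, n_bits):
    """Return the bit planes of non-negative integers, most significant first.

    Each plane is a list holding one bit per input value, so plane 0
    corresponds to "bit plane 1" (the most significant) in the text.
    """
    planes = []
    for k in range(n_bits - 1, -1, -1):
        planes.append([(v >> k) & 1 for v in values])
    return planes


# Four coefficient magnitudes of a region: 5 = 0101, 0 = 0000, 1 = 0001, 12 = 1100
planes = bit_planes([5, 0, 1, 12], n_bits=4)

# Rough proxy for code amount (assumption): the number of 1s in each plane.
ones_per_plane = [sum(p) for p in planes]
```

Only the value 12, the strongest coefficient, contributes a 1 to the most significant plane, which is why the upper planes are enough to locate strong edges.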

FIG. 5 shows the data structure of the enhancement layer in this embodiment. The enhancement layer shown in FIG. 5 is the code for one image and contains information for n bit planes and m regions. The enhancement layer for one image holds image header information 501 and bit plane information 502, from bit plane 1 (the most significant) to bit plane n (the least significant).

FIG. 6 shows the data structure of bit plane k of the enhancement layer in FIG. 5. Bit plane k contains bit plane header information 601 and the codes 602 of bit plane k for regions 1 to m.

FIG. 7 shows the data structure of bit plane k of region j of the enhancement layer in FIG. 6. It contains the code 701 of the pixel components of the region and a termination signal 702 indicating the end of the region's code.

With this data structure, bit plane information can be extracted simply by searching the video stream for the region termination signals in order, from the most significant bit plane down to the specific bit plane, and counting the code length between consecutive termination signals. The enhancement layer decoding unit 103 can therefore generate edge information at high speed.
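The idea of measuring code length without decoding can be sketched with a toy byte layout. The actual bitstream syntax is not given in the text, so everything below is an assumption for illustration: the one-byte marker `TERMINATOR`, the assumption that it never occurs inside a region's code, the measurement in bytes, and the helper names.

```python
TERMINATOR = b"\xff"  # hypothetical end-of-region marker (assumed unique in the payload)


def region_code_lengths(plane_payload):
    """Code length in bytes of each region within one bit plane's payload."""
    parts = plane_payload.split(TERMINATOR)
    # Whatever follows the last terminator is not a region code.
    return [len(p) for p in parts[:-1]]


def edge_information(plane_payloads, n_planes):
    """Per-region total code length over the n_planes most significant planes."""
    totals = []
    for payload in plane_payloads[:n_planes]:
        lengths = region_code_lengths(payload)
        if not totals:
            totals = lengths
        else:
            totals = [t + l for t, l in zip(totals, lengths)]
    return totals


# Two bit planes, three regions each (payload bytes are arbitrary toy data).
payloads = [b"\x01\xff\xff\x02\x03\xff", b"\xff\x05\xff\x06\xff"]
totals = edge_information(payloads, n_planes=2)
```

Here the three regions have total code lengths of 1, 1, and 3 bytes. No pixel data is reconstructed at any point, which is the source of the speed advantage described above.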

Next, in step S402, motion information is extracted. Specifically, the base layer decoding unit 102 extracts motion vector information from the base layer received from the stream input unit 101, generates motion information, and outputs it to the moving object detection unit 106.

This motion information is used for motion-compensated prediction in the base layer. It includes, for each region, whether the region is coded with motion-compensated prediction or with intra-frame coding, the magnitude and direction of the motion vector, and the image that the motion vector references, as well as whether the image as a whole is coded with motion-compensated prediction or with intra-frame coding.

FIG. 8 shows the data structure of the base layer in this embodiment. The base layer shown in FIG. 8 is the code for one image and contains information for m regions: image header information 801 and information 802 for regions 1 to m. FIG. 9 shows the data structure of region p of the base layer in FIG. 8. Region p contains region header information 901, a motion vector 902, the code 903 of the pixel components, and a termination signal 904 indicating the end of the region's code.

To extract the motion vectors, it suffices to search the video stream for the region header information 901 or the termination signals 904 and to decode only the motion vector 902, which lies at a fixed offset from that position. The base layer decoding unit 102 can thus generate motion information at high speed.

In step S403, the contour of the moving object is detected. Specifically, the moving object detection unit 106 detects the regions forming the contour of a moving object using the motion information and edge information received from the base layer decoding unit 102 and the enhancement layer decoding unit 103, and stores the result in the moving object detection unit 106.

The method of detecting contour regions is as follows.

Condition 1 is that the code length obtained from the bit planes of the horizontal, vertical, and diagonal components of a region, for example the total code length of the bit planes from the most significant down to the third bit plane, is greater than or equal to threshold A. Threshold A is the reference value for judging that a region contains a weak edge.

Condition 2 is that the total code length of the region is less than or equal to threshold B. Threshold B is the reference value for identifying image content that is not an edge, such as a stripe pattern.

These conditions determine whether the edge information of a region represents a point, a line, or a surface. When the total code length of the region satisfies both condition 1 and condition 2, the region is judged to contain a line that appears on an object contour. A specific example is described with reference to FIG. 10.

FIGS. 10(a) to 10(c) each show an example of the horizontal component in an 8×8 pixel region. To simplify the explanation, pixel values are shown as binary: a cell is black if the pixel contains a 1 anywhere from the most significant bit plane down to the specific bit plane, and white otherwise. FIG. 10(a) shows the horizontal component when the region contains noise or small points, FIG. 10(b) when the region contains a vertical line, and FIG. 10(c) when the whole region is, for example, part of a stripe pattern. When the regions of FIGS. 10(a) to 10(c) are coded, the code amount grows with the number of nonzero values in the region, so it increases in the order FIG. 10(a), FIG. 10(b), FIG. 10(c). The same applies to the vertical and diagonal components. If threshold A is 8 and threshold B is 32, the region of FIG. 10(b), for which threshold A < total code length < threshold B holds, can be judged to contain a line that appears on an object contour. Note that threshold A < threshold B.

As a simpler form of contour extraction, only threshold A may be used, and any region for which threshold A < total code length holds may be judged to contain a line that appears on an object contour.
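Conditions 1 and 2 amount to a band test on the total code length of a region. The sketch below uses the example thresholds from the text (A = 8, B = 32) with the inclusive bounds stated in conditions 1 and 2. The function name and the label strings are hypothetical, and passing `threshold_b=None` selects the simpler variant that uses threshold A alone.

```python
def classify_region(total_code_length, threshold_a=8, threshold_b=32):
    """Classify a region from the total code length of its upper bit planes."""
    if total_code_length < threshold_a:
        return "noise_or_flat"      # condition 1 fails: too few nonzero bits
    if threshold_b is not None and total_code_length > threshold_b:
        return "texture"            # condition 2 fails: e.g. a stripe pattern
    return "contour_line"           # both conditions hold


# Code lengths in the spirit of FIGS. 10(a), 10(b), 10(c); the values are made up.
labels = [classify_region(n) for n in (3, 20, 50)]
```

With these inputs the three regions are labelled as noise, contour line, and texture respectively. In the single-threshold variant, the third region would also be treated as a contour line.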

Whether a region judged to be a contour is the contour of a moving object is then determined by whether it satisfies condition 3 or condition 4 below.

Condition 3 is that the magnitude of the region's motion vector is less than threshold C. This condition is used because a moving object of interest must move by at least a certain amount.

Condition 4 is that the magnitude of the difference vector between the region's motion vector and the surrounding motion vector is less than threshold D. This tests whether the region moves in the same way as its surroundings. More than one surrounding motion vector may be used. In that case, several surrounding motion vectors are extracted, the magnitude of the difference vector from the region's motion vector is computed for each of them, and condition 4 is that the sum of these magnitudes is less than threshold D.

Condition 4 may also be defined differently. For example, when several surrounding motion vectors are selected, the criterion can be the sum of the squared differences between the X (horizontal) components of the motion vectors of the region and of the surrounding regions, plus the sum of the squared differences between the Y (vertical) components (hereinafter, the variance). Condition 4 is then that this variance is less than threshold D. If condition 4 is satisfied, the region's motion vector is regarded as having the same direction and magnitude as its surroundings, and the region is judged not to be a moving object. The variance may also be computed in other ways, for example as the sum over the surrounding regions of the product of the absolute difference in motion vector magnitude and the absolute difference in angle. Any measure may be used as long as it can determine whether the region's motion vector differs in direction or magnitude from the surrounding motion vectors.

When condition 3 or condition 4 is satisfied, the region is judged not to be a moving object region. For a frame that contains no motion vectors, such as a frame in which the whole image is intra-frame coded, no contour judgment is made and the process waits for a frame that contains motion vectors, because motion cannot be detected from a frame without motion vectors.

Of the regions judged to be object contours by conditions 1 and 2, the moving object detection unit 106 judges those that satisfy condition 3 or condition 4 not to be contours of a moving object. This is because the contour of a moving object moves at a speed different from that of its surroundings.
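The motion test of conditions 3 and 4 can be sketched as below, assuming motion vectors are given as `(dx, dy)` tuples. The function names, the threshold values in the usage lines, and the choice of four surrounding vectors are assumptions for illustration. `mv_variance` shows the squared-difference variant of condition 4.

```python
import math


def mv_variance(mv, neighbor_mvs):
    """Sum of squared X and Y differences against the surrounding vectors."""
    return sum((mv[0] - nx) ** 2 + (mv[1] - ny) ** 2 for nx, ny in neighbor_mvs)


def is_moving_contour(mv, neighbor_mvs, threshold_c, threshold_d):
    """Return False when condition 3 or condition 4 holds, True otherwise."""
    # Condition 3: the region itself hardly moves.
    if math.hypot(mv[0], mv[1]) < threshold_c:
        return False
    # Condition 4: the region moves like its surroundings
    # (sum of difference-vector magnitudes below threshold D).
    diff_sum = sum(math.hypot(mv[0] - nx, mv[1] - ny) for nx, ny in neighbor_mvs)
    if diff_sum < threshold_d:
        return False
    return True


background = [(0, 0)] * 4
panning = [(5, 0)] * 4
static_edge = is_moving_contour((0, 0), background, threshold_c=2, threshold_d=4)
camera_pan = is_moving_contour((5, 0), panning, threshold_c=2, threshold_d=4)
walker = is_moving_contour((5, 0), background, threshold_c=2, threshold_d=4)
```

A static edge is rejected by condition 3, an edge that moves together with a panning background is rejected by condition 4, and only the edge that moves against a still background is kept.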

Next, in step S404, the interior of the moving object is detected. Specifically, the moving object detection unit 106 detects the regions inside the moving object using the motion information received from the base layer decoding unit 102 and the stored contour detection result, and stores the interior detection result in the moving object detection unit 106.

The method of detecting interior regions is as follows.

A region is judged to be inside a moving object when it satisfies condition 5 or condition 6 below.

Condition 5 is that the region is adjacent to a region already judged to be the contour or interior of a moving object, and that the variance of the magnitude and direction of its motion vector relative to that neighboring region is less than threshold E. Threshold E is the reference value for judging that the contour and the interior of a moving object move at the same speed.

Condition 6 is that the region is surrounded by regions judged to be the contour or interior of a moving object, because the interior of a moving object is enclosed by its contour.
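Conditions 5 and 6 describe a region-growing step that starts from the stored contour. The following is a rough sketch on a grid of regions; the 4-neighborhood, the squared-difference measure of similarity, the iteration until nothing changes, and the name `grow_interior` are all assumptions made for this example rather than details taken from the text.

```python
def grow_interior(contour, mvs, threshold_e):
    """Return the set of cells judged to be inside the moving object.

    contour: set of (row, col) cells already judged to be moving contours.
    mvs: dict mapping every (row, col) cell to its (dx, dy) motion vector.
    """
    marked = set(contour)
    interior = set()
    changed = True
    while changed:
        changed = False
        for cell, mv in mvs.items():
            if cell in marked:
                continue
            r, c = cell
            nbrs = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            marked_nbrs = [n for n in nbrs if n in marked and n in mvs]
            # Condition 6: enclosed by contour or interior cells on all sides.
            enclosed = len(marked_nbrs) == 4
            # Condition 5: next to a marked cell that moves the same way.
            similar = any(
                (mv[0] - mvs[n][0]) ** 2 + (mv[1] - mvs[n][1]) ** 2 < threshold_e
                for n in marked_nbrs
            )
            if enclosed or similar:
                marked.add(cell)
                interior.add(cell)
                changed = True
    return interior


# 5x5 grid: a 3x3 object moving right by 4 on a still background.
mvs = {(r, c): (4, 0) if 1 <= r <= 3 and 1 <= c <= 3 else (0, 0)
       for r in range(5) for c in range(5)}
ring = {cell for cell, mv in mvs.items() if mv == (4, 0) and cell != (2, 2)}
inside = grow_interior(ring, mvs, threshold_e=1)
```

Only the centre cell is added: background cells touch the contour but move differently and are not enclosed, so neither condition applies to them.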

Next, in step S405, false detections are removed. Specifically, the moving object detection unit 106 removes falsely detected regions from the stored contour and interior detection results, generates the moving object detection result, and outputs it to the detection result output unit 107 and the band synthesis unit 104.

A region is judged to be a false detection when few of its surrounding regions have been judged to be the contour or interior of a moving object, because a detected moving object that is too small is likely to be a false detection.
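One simple reading of this rule is a neighbor count over the detected regions. The sketch below assumes an 8-neighborhood and a minimum of two detected neighbors; both choices, and the name `remove_isolated`, are illustrative assumptions, since the text does not quantify "few".

```python
def remove_isolated(detected, min_neighbors=2):
    """Keep only detected cells with enough detected cells around them."""
    kept = set()
    for r, c in detected:
        count = sum(
            (r + dr, c + dc) in detected
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
        if count >= min_neighbors:
            kept.add((r, c))
    return kept


# A 2x2 object plus one stray cell far away.
detected = {(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)}
cleaned = remove_isolated(detected)
```

The stray cell at (5, 5) has no detected neighbors and is dropped, while each cell of the 2×2 object has three and is kept.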

The moving object detection unit 106 generates the moving object detection result from the moving object regions obtained as described above. The result takes, for example, the following forms.

The first is information that records, for each region, whether or not it belongs to a moving object. The second defines one circumscribing rectangle or ellipse for each moving object and records the coordinates and size of each rectangle or ellipse.
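The second form of result can be derived from the first by grouping connected regions and taking the extent of each group. This sketch assumes that 8-connected regions belong to the same moving object and reports rectangles in region units as `(top, left, bottom, right)`; the grouping rule, the tuple layout, and the name `bounding_boxes` are assumptions for illustration.

```python
def bounding_boxes(detected):
    """Return one circumscribing rectangle per 8-connected group of cells."""
    remaining = set(detected)
    boxes = []
    while remaining:
        stack = [remaining.pop()]
        component = []
        while stack:  # flood fill over the 8-neighborhood
            r, c = stack.pop()
            component.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    n = (r + dr, c + dc)
                    if n in remaining:
                        remaining.remove(n)
                        stack.append(n)
        rows = [r for r, _ in component]
        cols = [c for _, c in component]
        boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return sorted(boxes)


# Two separate objects on the region grid.
boxes = bounding_boxes({(0, 0), (0, 1), (1, 1), (4, 4)})
```

Multiplying the coordinates by the region size (for example 8 or 16 pixels) would convert these rectangles to pixel units.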

If information about the interior of the moving object is not needed, the interior detection process may be omitted.

The moving object detection method is not limited to one that uses motion vectors. Other methods may be used as long as they are combined with the edge information of the present invention.

With the moving object detection method of this embodiment, as long as the base layer contains motion vectors and the enhancement layer contains the codes down to a certain bit plane, moving objects can be detected at high speed, with high accuracy, and with a low processing load, even when transmission is at a low bit rate and the image quality is poor.

Next, in step S303, the moving object detection result is output. Specifically, the detection result output unit 107 outputs the coordinates of the moving object regions received from the moving object detection unit 106 to the outside.

Next, in step S304, the base layer is decoded. Specifically, the base layer decoding unit 102 applies motion-compensated predictive decoding to the base layer of the video stream received from the stream input unit 101 to generate a reduced image, and outputs it to the band synthesis unit 104.

Next, in step S305, the enhancement layer is decoded. Specifically, the enhancement layer decoding unit 103 applies bit-plane decoding to the enhancement layer of the video stream received from the stream input unit 101 to generate the horizontal, vertical, and diagonal components, and outputs them to the band synthesis unit 104.

Next, in step S306, band synthesis is performed. Specifically, the band synthesis unit 104 band-synthesizes the reduced image received from the base layer decoding unit 102 with the horizontal, vertical, and diagonal components received from the enhancement layer decoding unit 103 to generate a decoded image, and outputs it to the video output unit 105. The band synthesis unit 104 may also use the moving object detection result received from the moving object detection unit 106 to emphasize the regions of the decoded image that contain a moving object.

The emphasis of moving object regions is described here. For example, the band synthesis unit 104 colors the decoded video only in the moving object regions, or surrounds the moving object regions with a frame. Alternatively, all pixel values of the reduced image obtained by decoding the base layer may be set to 0 before band synthesis, producing an image that consists only of contours, in which the moving object regions are then emphasized.

In this way, only the moving object stands out within a video made up of contours, which makes it easier for an observer watching several surveillance videos at once to notice an abnormality or a suspicious person. In addition, when the bit rate of the base layer is very low, for example because of limited communication speed, and only video of extremely poor quality can be generated, details may actually be easier to recognize in the contour-only image. Likewise, in environments with limited processing capacity, such as when several camera videos are displayed, showing only the contours makes the regions important for surveillance easier to see with a low processing load.

Next, in step S307, the video is output. Specifically, the video output unit 105 outputs the decoded video received from the band synthesis unit 104 to the outside.

It is also possible to perform only moving object detection without decoding. No video is obtained in this case, but because the processing from base layer decoding (step S304) through video output (step S307) is skipped, moving objects can be detected even faster and with an even lower processing load.

Next, in step S308, termination is determined. The stream input unit 101 determines, for example, whether a further video stream follows. If the video decoding apparatus 100 no longer needs to detect moving objects or decode video, the process ends. Otherwise, it returns to step S301.

In the description above, base layer decoding (step S304) through video output (step S307) is performed after moving object detection (steps S302 and S303), but this is not a requirement. Moving object detection can be performed in parallel with the decoding of the base layer and the enhancement layer.

Another way to generate a video stream with a coding method that uses band division is to apply motion-compensated prediction to the input image first and then perform band division and bit-plane coding. With this method, however, band division is applied to an image formed by taking the difference from preceding or following images, so the horizontal, vertical, and diagonal components that arise at object contours cannot be obtained. In that case, only the horizontal, vertical, and diagonal components of images that are entirely intra-frame coded are used.

The enhancement layer may also contain, in addition to the horizontal, vertical, and diagonal components, information on the difference between the reduced image and the image obtained by decoding the base layer.

As described above, Embodiment 1 provides means for extracting edge information and motion information from a video stream that contains the horizontal, vertical, and diagonal components obtained by directly band-dividing the input image, together with the motion vectors generated by motion-compensated prediction. A moving object can therefore be detected at high speed, with high accuracy, and with a low processing load, without decoding the video stream, which consists of a base layer coded with motion-compensated prediction and an enhancement layer coded by bit-plane coding of the horizontal, vertical, and diagonal components.

According to Embodiment 1, motion information can be extracted from the base layer of the video stream and edge information from the enhancement layer. When the motion information indicates that there is no motion, processing such as edge information extraction can be skipped to reduce the processing load. Likewise, when the edge information indicates that there are no edges, processing such as motion information extraction can be skipped. Object contours can thus be detected at high speed. Motion information and edge information may be extracted in either order or in parallel.

According to Embodiment 1, moving objects can be detected using only the motion vectors and the edge information of some of the bit planes. Moving objects can therefore be detected at high speed and with high efficiency even in a low-bit-rate video stream, such as when the communication speed is limited.

In Embodiment 1, the enhancement layer decoding unit 103 extracts the edge information needed for moving object detection and the base layer decoding unit 102 extracts the motion information, so the video decoding process and the moving object detection process share some means and processing. Moving object detection and video decoding can therefore be performed simultaneously and at high speed, and the overall scale of the apparatus can be reduced.

According to Embodiment 1, the enhancement layer decoding unit 103 can generate edge information at high speed simply by searching the video stream for the start signal contained in the bit plane header 601 and the termination signal 702 of each region (for example, 8×8 pixels) and counting the code length between these identification signals.

According to Embodiment 1, the base layer decoding unit 102 can generate motion information at high speed simply by searching the video stream for the identification signal of each region (for example, each 8×8-pixel region) and decoding the motion vector located at a fixed position relative to that identification signal.

According to Embodiment 1, the moving object detection unit 106 detects the contour of a moving object from the edge information and the motion information, detects the interior of the moving object from the motion information and the results already detected, and removes false detections. Moving objects can therefore be detected with high accuracy.
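A minimal sketch of the three stages named above (contour, interior, false-detection removal) is given below. The concrete rules are assumptions chosen for illustration and are not the procedure of Embodiment 1: a contour region has both motion and an edge, an interior region is a moving region lying between two contour regions on the same row, and a detection with no detected neighbor is discarded.

```python
from typing import List

Grid = List[List[bool]]


def detect_moving_object(motion: Grid, edge: Grid) -> Grid:
    rows, cols = len(motion), len(motion[0])

    # 1) Contour: regions with both motion and an edge.
    found = [[motion[r][c] and edge[r][c] for c in range(cols)] for r in range(rows)]

    # 2) Interior: moving regions between the outermost contour hits of a row.
    for r in range(rows):
        hits = [c for c in range(cols) if found[r][c]]
        if len(hits) >= 2:
            for c in range(hits[0], hits[-1] + 1):
                if motion[r][c]:
                    found[r][c] = True

    # 3) False-detection removal: drop detections with no detected 8-neighbor.
    def has_neighbor(r: int, c: int) -> bool:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < rows and 0 <= cc < cols and found[rr][cc]:
                    return True
        return False

    return [[found[r][c] and has_neighbor(r, c) for c in range(cols)] for r in range(rows)]
```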

In Embodiment 1, the band synthesis unit 104 can make the moving object detection result easier for an observer to notice, either by emphasizing the moving object region in the decoded video or by using a line drawing obtained without band-synthesizing the reduced video decoded from the base layer.

(Embodiment 2)
In Embodiment 2, the moving object detection method and apparatus according to the present invention are applied to a video surveillance system. The video surveillance system has automatic tracking cameras, each equipped with a video encoding device. While encoding video and generating a video stream, each camera detects moving objects in the video at high speed, with high accuracy, and with a low processing load, and automatically tracks the moving object based on the detection result, so that video can be monitored efficiently.

This video surveillance system is described in detail below.

FIG. 11 is a diagram showing the configuration of the video surveillance system according to Embodiment 2, to which the moving object detection method and apparatus of the present invention are applied.

This video surveillance system includes a video monitoring apparatus 1100, a communication network 1110, and N automatic tracking cameras 1121 to 112N. Each automatic tracking camera corresponds to the imaging apparatus of the present invention.

FIG. 12 is a block diagram showing the configuration of the automatic tracking cameras 1121 to 112N according to Embodiment 2. The automatic tracking camera shown in FIG. 12 corresponds to the automatic tracking camera 1121 in the video surveillance system shown in FIG. 11.

In FIG. 12, the automatic tracking camera 1121 includes an imaging unit 1201, a video encoding unit 1202, and an imaging control unit 1203. The other automatic tracking cameras 1122 to 112N have the same configuration.

The imaging unit 1201 corresponds to the imaging means of the present invention, and the imaging control unit 1203 corresponds to the imaging control means of the present invention.

The imaging unit 1201 performs imaging operations such as pan, tilt, and zoom, and outputs the captured video to the video encoding unit 1202.

The video encoding unit 1202 divides the input video into bands and generates a video stream containing the horizontal, vertical, and diagonal component information and the motion vectors generated by motion prediction compensation.

The imaging control unit 1203 receives information on the target to be tracked and the moving object detection result, and generates and outputs to the imaging unit 1201 a control signal for pan, tilt, and zoom.

FIG. 13 is a block diagram showing the configuration of the video encoding unit 1202, which corresponds to a video encoding apparatus to which the moving object detection method and apparatus of the present invention are applied.

In FIG. 13, the video encoding unit 1202 includes a video input unit 1301, a band division unit 1302, a base layer encoding unit 1303, an enhancement layer encoding unit 1304, a stream output unit 1305, a moving object detection unit 1306, and a detection result output unit 1307.

The band division unit 1302, the base layer encoding unit 1303, and the enhancement layer encoding unit 1304 together correspond to the video encoding means of the present invention. The base layer encoding unit 1303 corresponds to the motion information extraction means, the enhancement layer encoding unit 1304 to the edge information extraction means, and the moving object detection unit 1306 to the moving object detection means.

The video encoding means encodes the input video to generate and output a video stream. Of its components, the band division unit 1302 divides the input image into bands to generate a reduced image, a horizontal component, a vertical component, and a diagonal component. The reduced image is encoded by motion prediction compensation encoding as a base layer from which video can be decoded on its own, and the horizontal, vertical, and diagonal components are encoded by bit-plane encoding as an enhancement layer. The base layer encoding unit 1303 extracts motion information from the generated video stream and outputs it to the moving object detection unit 1306. The enhancement layer encoding unit 1304 extracts edge information from the generated video stream and outputs it to the moving object detection unit 1306. The moving object detection unit 1306 detects a moving object from the input edge information and motion information. The stream output unit 1305 and the detection result output unit 1307 correspond to the output unit of the present invention.

Next, the operation of the automatic tracking camera 1121 according to this embodiment is described. FIG. 14 is a flowchart showing the operation of the automatic tracking camera 1121 shown in FIG. 12. The flowchart in FIG. 14 can also be implemented in software, by having a CPU (not shown) execute a control program stored in a storage device (not shown) such as a ROM or flash memory.

First, in step S1401, imaging is performed. Specifically, the imaging unit 1201 captures video of the monitored area and outputs the input image to the video input unit 1301 of the video encoding unit 1202. The imaging unit 1201 also outputs pan, tilt, zoom, and installation location information to the detection result output unit 1307 of the video encoding unit 1202.

Next, in step S1402, video encoding is performed. The video encoding unit 1202 encodes the input video received from the imaging unit 1201 to generate a video stream and, at the same time, detects moving objects to generate a moving object detection result. The generated video stream and moving object detection result are output to the reception unit 1101 of the video monitoring apparatus 1100 via the communication network 1110. The moving object detection result is also output to the imaging control unit 1203.

Next, in step S1403, imaging control is performed. Specifically, the imaging control unit 1203 generates a pan, tilt, and zoom control signal from the target tracking command received via the communication network 1110 from the camera group control unit 1103 of the video monitoring apparatus 1100 and from the moving object detection result received from the video encoding unit 1202, and outputs the control signal to the imaging unit 1201. The imaging unit 1201 performs pan, tilt, and zoom based on the control signal received from the imaging control unit 1203.

This control signal is now described. When the target tracking command generated by the video monitoring apparatus 1100 (described later) specifies, for example, the coordinates and magnification for photographing a suspicious person to be captured, the imaging control unit 1203 generates a control signal that pans, tilts, and zooms accordingly. If the coordinates for photographing the suspicious person differ from the coordinates of the moving object region indicated by the moving object detection result, the imaging control unit 1203 may correct the difference when generating the control signal. The camera may also be panned so that the tracked moving object always occupies a constant area of the screen. When there is no target tracking command but there is a moving object detection result, the moving object is photographed at the center of the video. The control signal may also be generated so that multiple moving objects all fit within the video. When there is neither a target tracking command nor a moving object detection result, a control signal that makes the imaging unit 1201 sweep back and forth may be generated in order to photograph a wide area.
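The decision logic described above can be sketched as follows. All names (`TrackCommand`, `Control`, `make_control`, the `mode` strings) and the box format `(x0, y0, x1, y1)` are hypothetical, and the correction rule (snap to the nearest detected region) is one assumed way of resolving a coordinate mismatch.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), hypothetical format
Point = Tuple[float, float]


@dataclass
class TrackCommand:
    center: Point
    zoom: float


@dataclass
class Control:
    mode: str  # "track", "frame", or "sweep"
    center: Optional[Point] = None
    zoom: Optional[float] = None


def box_center(b: Box) -> Point:
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def make_control(cmd: Optional[TrackCommand], boxes: List[Box]) -> Control:
    if cmd is not None:
        center = cmd.center
        if boxes:
            # Correct the commanded point toward the nearest detected region.
            center = min(
                (box_center(b) for b in boxes),
                key=lambda p: (p[0] - cmd.center[0]) ** 2 + (p[1] - cmd.center[1]) ** 2,
            )
        return Control("track", center, cmd.zoom)
    if boxes:
        # No command: frame all detected objects around their common center.
        hull = (
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        )
        return Control("frame", box_center(hull))
    # Neither command nor detection: sweep to cover a wide area.
    return Control("sweep")
```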

Next, in step S1404, the process ends if video monitoring is no longer needed, for example because the automatic tracking camera 1121 has been powered off; otherwise, the process returns to step S1401.

The video encoding process of step S1402 in FIG. 14 is now described in detail.

FIG. 15 is a flowchart showing the operation of the video encoding unit 1202. The flowchart in FIG. 15 can also be implemented in software, by having a CPU (not shown) execute a control program stored in a storage device (not shown) such as a ROM or flash memory.

First, in step S1501, video input is performed. Specifically, the video input unit 1301 receives the input image from the imaging unit 1201 of the automatic tracking camera 1121 and outputs it to the band division unit 1302.

Next, in step S1502, band division is performed. Specifically, the band division unit 1302 divides the input image received from the video input unit 1301 into bands to generate a reduced image and horizontal, vertical, and diagonal components. It outputs the reduced image to the base layer encoding unit 1303 and the horizontal, vertical, and diagonal components to the enhancement layer encoding unit 1304.
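As an illustration of a band division that yields a reduced image plus horizontal, vertical, and diagonal components, the sketch below applies a one-level 2-D Haar transform. The Haar filter is a swapped-in choice for the example, since the text does not name a specific filter, and labeling the x-direction difference as the "horizontal" component is also an assumption.

```python
from typing import List, Tuple

Image = List[List[float]]


def haar_band_split(img: Image) -> Tuple[Image, Image, Image, Image]:
    """One-level 2-D Haar split; img must have even height and width."""
    reduced: Image = []
    horiz: Image = []
    vert: Image = []
    diag: Image = []
    for y in range(0, len(img), 2):
        r_row, h_row, v_row, d_row = [], [], [], []
        for x in range(0, len(img[0]), 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            r_row.append((a + b + c + d) / 4)  # low-pass: reduced image
            h_row.append((a - b + c - d) / 4)  # difference along x
            v_row.append((a + b - c - d) / 4)  # difference along y
            d_row.append((a - b - c + d) / 4)  # diagonal difference
        reduced.append(r_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return reduced, horiz, vert, diag
```

In this sketch the first output would go to the base layer encoder and the other three to the enhancement layer encoder.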

Next, in step S1503, base layer encoding is performed. Specifically, the base layer encoding unit 1303 applies motion prediction compensation encoding to the reduced image received from the band division unit 1302 to generate the base layer, and outputs it to the stream output unit 1305. It also outputs the motion information obtained during motion prediction compensation to the moving object detection unit 1306.

Next, in step S1504, enhancement layer encoding is performed. Specifically, the enhancement layer encoding unit 1304 applies bit-plane encoding to the horizontal, vertical, and diagonal components received from the band division unit 1302 to generate the enhancement layer, and outputs it to the stream output unit 1305. It also outputs the edge information obtained during bit-plane encoding to the moving object detection unit 1306.
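How edge information might fall out of bit-plane processing can be sketched as below: coefficient magnitudes are split into bit planes from the most significant bit downward, and a coefficient is flagged as an edge when any of the top planes has a 1 at its position. The function names, the flat coefficient list, and the flagging rule are assumptions for illustration, not the encoder of the embodiment.

```python
from typing import List


def top_bit_planes(coeffs: List[int], total_bits: int, n_planes: int) -> List[List[int]]:
    """Return the n_planes most significant bit planes of |coeffs|, MSB first."""
    n_planes = min(n_planes, total_bits)
    planes = []
    for bit in range(total_bits - 1, total_bits - 1 - n_planes, -1):
        planes.append([(abs(c) >> bit) & 1 for c in coeffs])
    return planes


def edge_flags(coeffs: List[int], total_bits: int, n_planes: int) -> List[bool]:
    """Flag positions that are set in at least one of the top bit planes."""
    planes = top_bit_planes(coeffs, total_bits, n_planes)
    return [any(p[i] for p in planes) for i in range(len(coeffs))]
```

Large-magnitude detail coefficients, which tend to sit on edges, show up in the upper planes, so only those planes need to be examined.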

Next, in step S1505, stream output is performed. Specifically, the stream output unit 1305 outputs the base layer received from the base layer encoding unit 1303 and the enhancement layer received from the enhancement layer encoding unit 1304 to the reception unit 1101 of the video monitoring apparatus 1100 via the communication network 1110.

Next, in step S1506, moving object detection is performed. Specifically, the moving object detection unit 1306 detects moving objects using the motion information received from the base layer encoding unit 1303 and the edge information received from the enhancement layer encoding unit 1304, generates a moving object detection result, and outputs it to the detection result output unit 1307.

The moving object detection method is the same as in Embodiment 1 and is not described in detail here.

Next, in step S1507, detection result output is performed. Specifically, the detection result output unit 1307 outputs the moving object detection result received from the moving object detection unit 1306, together with the pan, tilt, zoom, and installation position information received from the imaging unit 1201 of the automatic tracking camera 1121, to the reception unit 1101 of the video monitoring apparatus 1100 via the communication network 1110.

As with the video decoding apparatus described in Embodiment 1, other band division methods can also be used in this embodiment, provided that a video stream containing the horizontal, vertical, and diagonal component information and the motion vectors generated by motion prediction compensation can be produced.

Next, the configuration of the video monitoring apparatus 1100 according to this embodiment is described.

In FIG. 11, the video monitoring apparatus 1100 includes a reception unit 1101, an image recognition unit 1102, and a camera group control unit 1103.

The image recognition unit 1102 corresponds to the image recognition means of the present invention. It receives the video stream and the moving object detection result, performs detailed image recognition, and outputs the image recognition result to the camera group control unit 1103.

The camera group control unit 1103 corresponds to the camera group control means of the present invention. It receives the image recognition result, and generates and outputs to the cameras 1121 to 112N information on the target to be tracked.

Next, the operation of the video monitoring apparatus 1100 configured as described above is described.

FIG. 16 is a flowchart showing the operation of the video monitoring apparatus 1100.

First, in step S1601, reception is performed. Specifically, the reception unit 1101 receives the video stream and the moving object detection result from the automatic tracking camera 1121 via the communication network 1110 and outputs them to the image recognition unit 1102.

Next, in step S1602, image recognition is performed. Specifically, the image recognition unit 1102 decodes the video stream using the video stream and the moving object detection result received from the reception unit 1101, performs detection and authentication of people, faces, and objects using various known image recognition methods, and outputs the result to the camera group control unit 1103. The image recognition unit 1102 can speed up processing by not performing image recognition outside the moving object regions contained in the moving object detection result.
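Restricting recognition to the reported moving object regions can be sketched as follows. The frame representation (a list of pixel rows), the box format `(x0, y0, x1, y1)` with exclusive upper bounds, and the `recognize` callback are all hypothetical; any real recognizer could stand in for the callback.

```python
from typing import Callable, List, Tuple, TypeVar

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive
Frame = List[List[int]]
R = TypeVar("R")


def recognize_in_regions(
    frame: Frame,
    boxes: List[Box],
    recognize: Callable[[Frame], R],
) -> List[Tuple[Box, R]]:
    """Run the recognizer only on the crops given by the detection result."""
    results = []
    for box in boxes:
        x0, y0, x1, y1 = box
        crop = [row[x0:x1] for row in frame[y0:y1]]
        results.append((box, recognize(crop)))
    return results
```

With an empty detection result the recognizer is never called, which is where the load reduction comes from.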

Next, in step S1603, camera control is performed. Specifically, the camera group control unit 1103 uses the image recognition result received from the image recognition unit 1102 to generate a target tracking command for the automatic tracking camera 1121, and outputs it to the imaging control unit 1203 of the automatic tracking camera 1121 via the communication network 1110. If the image recognition result for the automatic tracking camera 1121 shows that another automatic tracking camera 1122 to 112N needs to start tracking, a new target tracking command is generated and output via the communication network 1110 to the imaging control unit 1203 of the relevant camera.

The target tracking command is now described.

When the image recognition result received from the image recognition unit 1102 indicates, for example, that a suspicious person is present in the video, the camera group control unit 1103 generates a target tracking command containing coordinates, magnification, and so on, so that the suspicious person is photographed at a large size. When a suspicious person is present in the video but the automatic tracking camera 1121 cannot photograph that person's face, the camera group control unit 1103 generates a target tracking command that makes the automatic tracking camera 1122 photograph the suspicious person, and a target tracking command that makes the automatic tracking camera 1121 photograph a wide area including the suspicious person.
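The two cases described above can be sketched as a small dispatch function. The dictionary keys (`suspicious`, `face_visible`, `box`), the action strings, and the choice of the first other camera are hypothetical; the sketch also passes the same box to both cameras, whereas a real system would have to map coordinates between camera views.

```python
from typing import Dict, List


def make_tracking_commands(result: dict, source_cam: int, other_cams: List[int]) -> Dict[int, dict]:
    """Map camera id -> target tracking command for one recognition result."""
    commands: Dict[int, dict] = {}
    if not result.get("suspicious"):
        return commands
    box = result["box"]
    if result.get("face_visible", True):
        # The source camera can see the person: photograph them at a large size.
        commands[source_cam] = {"action": "zoom_in", "box": box}
    else:
        # The source camera keeps a wide view; another camera takes the close-up.
        commands[source_cam] = {"action": "wide", "box": box}
        if other_cams:
            commands[other_cams[0]] = {"action": "zoom_in", "box": box}
    return commands
```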

Next, in step S1604, an end determination is made. The process ends if video monitoring is no longer needed, for example because the video monitoring apparatus 1100 has been powered off; otherwise, the process returns to step S1601.

The operation of the video surveillance system configured as described above is described below.

FIG. 17 is a sequence diagram showing the operation of the video surveillance system of this embodiment.

First, when the automatic tracking camera 1121 photographs the monitored area, it generates a video stream containing the horizontal, vertical, and diagonal component information and the motion vectors generated by motion prediction compensation, obtains a moving object detection result, and transmits both to the video monitoring apparatus 1100 via the communication network 1110 (step S1701).

The video monitoring apparatus 1100 decodes the received video stream and recognizes the target object using the moving object detection result. It then transmits to the automatic tracking camera a target tracking command for tracking the target object (step S1702).

On receiving this command, the automatic tracking camera 1121 controls its imaging unit to track the target object, and transmits the resulting video stream and related data to the video monitoring apparatus 1100 (step S1703).

Steps S1702 and S1703 are then repeated. The video stream and related data from the automatic tracking camera 1121 are transmitted to the video monitoring apparatus 1100 continuously, regardless of whether a command has been received from the video monitoring apparatus 1100.

As described above, in the video surveillance system according to this embodiment, video must be encoded into a compressed video stream in order to be transmitted from the automatic tracking camera to the video monitoring apparatus via the communication network. According to the present invention, moving object detection can be performed in the course of generating the video stream, and the result can be reported to the video monitoring apparatus, so the video monitoring apparatus does not need to detect moving objects again from the received video stream. This reduces the processing load on the video monitoring apparatus.

According to Embodiment 2, in a video surveillance system in which the video monitoring apparatus receives images captured by remote automatic tracking cameras and performs monitoring and tracking, each automatic tracking camera can share some means and processing between two tasks: encoding the captured images into a video stream containing the horizontal, vertical, and diagonal component information and the motion vectors generated by motion prediction compensation, and detecting moving objects. Highly accurate moving object detection and video encoding can therefore be performed simultaneously and at high speed, and the scale of the overall system can be reduced.

According to Embodiment 2, the automatic tracking camera can control its pan, tilt, and zoom imaging functions in response to instructions from the video monitoring apparatus that are derived from the moving object detection result, so moving objects, and hence suspicious persons, can be monitored efficiently.

According to Embodiment 2, the video monitoring apparatus performs image recognition only on the moving object regions, based on the moving object detection result received together with the video stream. This reduces the image recognition load and improves recognition accuracy. As a result, the video surveillance system can control a larger number of automatic tracking cameras and monitor efficiently.

(Embodiment 3)
Embodiment 3 is a moving object detection method and apparatus according to the present invention.

This embodiment describes a method of detecting a moving object using only the enhancement layer video stream, out of a video stream that, as in Embodiment 1, consists of a base layer and an enhancement layer. The enhancement layer video stream handled in this embodiment is assumed to contain motion vector information at the head of each frame, as in FGST (FGS Temporal Scalability) of MPEG-4 FGS (Fine Granularity Scalable coding) defined in ISO/IEC 14496-2 Amendment 2.

FIG. 19 is a block diagram showing the configuration of a moving object detection apparatus 1900 according to Embodiment 3, to which the moving object detection method and apparatus of the present invention are applied.

In FIG. 19, the moving object detection apparatus 1900 includes a stream input unit 1901, a motion information extraction unit 1902, an edge information extraction unit 1903, a moving object detection unit 1904, and a detection result output unit 1905.

In this embodiment, unlike Embodiment 1, the stream input unit 1901 receives only the enhancement layer video stream.

The motion information extraction unit 1902 corresponds to the motion information extraction means, the edge information extraction unit 1903 to the edge information extraction means, and the moving object detection unit 1904 to the moving object detection means.

The motion information extraction means extracts motion information from the input enhancement layer video stream and outputs it to the moving object detection means. The edge information extraction means extracts edge information from the input enhancement layer video stream and outputs it to the moving object detection means. The moving object detection means detects a moving object from the input edge information and motion information.

Next, the operation of the moving object detection apparatus 1900 configured as described above is described.

FIG. 20 is a flowchart showing the operation of the moving object detection apparatus 1900 of Embodiment 3 shown in FIG. 19. The flowchart in FIG. 20 can also be implemented in software, by having a CPU (not shown) execute a control program stored in a storage device (not shown) such as a ROM or flash memory.

First, the stream input unit 1901 receives the enhancement layer video stream from outside the moving object detection apparatus 1900 and outputs it to the motion information extraction unit 1902 and the edge information extraction unit 1903 (step S2001).

Next, the motion information extraction unit 1902 extracts motion information from the enhancement layer received from the stream input unit 1901 and outputs it to the moving object detection unit 1904 (step S2002).

Next, the edge information extraction unit 1903 extracts edge information from the enhancement layer received from the stream input unit 1901 and outputs it to the moving object detection unit 1904 (step S2003).

In FGST as defined by MPEG-4 FGS, the motion vectors for the entire frame are stored at the head of each frame of the enhancement layer, followed by the bit plane information. The stream input unit 1901 may therefore input the stream only up to the motion vectors and have the motion information extraction unit 1902 generate the motion information, then input the bit plane portion of the stream and output it to the edge information extraction unit 1903 only when there is motion in the frame. When there is no motion in the frame, stream input, edge extraction, and moving object detection can then be omitted, which reduces the processing load.
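The read order described above can be sketched with a toy frame layout. The layout is an assumption made only for this example and is not the FGST syntax: each frame holds `n_vectors` motion vectors stored as pairs of little-endian 16-bit integers, followed by `plane_bytes` bytes of bit plane data.

```python
import io
import struct
from typing import BinaryIO, List, Optional, Tuple


def process_frame(
    stream: BinaryIO, n_vectors: int, plane_bytes: int
) -> Tuple[List[Tuple[int, int]], Optional[bytes]]:
    """Read the motion vectors first; read bit planes only if something moves."""
    raw = stream.read(4 * n_vectors)
    vectors = [struct.unpack_from("<hh", raw, 4 * i) for i in range(n_vectors)]
    if all(v == (0, 0) for v in vectors):
        # No motion in this frame: skip the bit plane data without parsing it.
        stream.seek(plane_bytes, io.SEEK_CUR)
        return vectors, None
    return vectors, stream.read(plane_bytes)
```

For a stream such as `io.BytesIO(...)`, a frame whose vectors are all zero returns `None` for the bit plane data, so edge extraction and detection can be skipped for that frame.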

Next, the moving object detection unit 1904 detects the moving object using the motion information received from the motion information extraction unit 1902 and the edge information received from the edge information extraction unit 1903 and, as in the first embodiment, generates a moving object detection result and outputs it to the detection result output unit 1905 (steps S2004 to S2006).
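As a rough illustration of how motion information and edge information might be combined, the sketch below marks a region as a contour region when the total code length of its extracted bit planes is at or above one threshold and the magnitude of its motion vector is at or above another, in the manner of the threshold tests recited in the claims. The `Region` type, the function name, and the threshold values used in any call are hypothetical; the actual procedure of steps S2004 to S2006 is the one described for the first embodiment.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    code_length: int          # total code length of the extracted bit planes
    mv: Tuple[float, float]   # motion vector (dx, dy) of the region


def contour_regions(
    regions: List[Region],
    min_code_length: int,
    min_mv_magnitude: float,
) -> List[int]:
    """Return indices of regions that pass both the edge and the motion test."""
    hits = []
    for i, r in enumerate(regions):
        has_edge = r.code_length >= min_code_length          # edge test
        is_moving = math.hypot(*r.mv) >= min_mv_magnitude    # motion test
        if has_edge and is_moving:
            hits.append(i)
    return hits
```

Both tests use only quantities that can be read from the stream (code lengths and motion vectors), so no pixel data has to be reconstructed for this step.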

Next, the result of detecting the moving object is output. Specifically, the detection result output unit 1905 outputs the coordinates of the moving object area received from the moving object detection unit 1904 to the outside (step S2007).

Next, end determination is performed. The stream input unit 1901 determines, for example, whether a subsequent video stream exists; if the moving object detection apparatus 1900 has no further moving object detection to perform, the process ends, and otherwise it returns to step S2001 (step S2008).
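The overall flow of steps S2001 to S2008 can be summarized in a short sketch. The three callables are placeholders for the motion information extraction unit, the edge information extraction unit, and the moving object detection unit; their names and signatures are assumptions made for illustration, and the end determination is modeled simply as the input iterable running out of frames.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) of a detected area


def run_detection(
    frames: Iterable[bytes],
    extract_motion: Callable[[bytes], object],
    extract_edges: Callable[[bytes], object],
    detect: Callable[[object, object], List[Box]],
) -> Iterator[List[Box]]:
    """Yield the detected areas for each frame of the enhancement layer."""
    for frame in frames:                # S2001: input one frame
        motion = extract_motion(frame)  # S2002: extract motion information
        edges = extract_edges(frame)    # S2003: extract edge information
        yield detect(motion, edges)     # S2004 to S2007: detect and output
    # S2008: the loop ends when no further stream data is available
```

Writing the loop as a generator lets the caller consume results frame by frame, which matches the per-frame output of coordinates in step S2007.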

As described above, according to the third embodiment, only the enhancement layer video stream is input, the motion information extraction unit 1902 extracts the motion information, and the edge information extraction unit 1903 extracts the edge information, so that the contour of an object can be detected quickly and from a small amount of video stream data.

The present invention is useful for a moving object detection apparatus that detects a moving object from a video stream generated by encoding video, and is suitable for detecting a moving object at high speed without decoding the video stream.

Diagram showing the configuration of the video decoding apparatus according to Embodiment 1 of the present invention
Conceptual diagram of bit plane encoding in Embodiment 1 of the present invention
Flowchart showing the operation of the video decoding apparatus according to Embodiment 1 of the present invention
Flowchart showing the moving object detection process of the video decoding apparatus according to Embodiment 1 of the present invention
Stream structure diagram of the enhancement layer in Embodiment 1 of the present invention
Stream structure diagram of bit plane k of the enhancement layer in Embodiment 1 of the present invention
Stream structure diagram of bit plane k of region j of the enhancement layer in Embodiment 1 of the present invention
Stream structure diagram of the base layer in Embodiment 1 of the present invention
Stream structure diagram of region j of the base layer in Embodiment 1 of the present invention
(a) to (c) Diagrams showing horizontal components in an 8×8 pixel region in Embodiment 1 of the present invention
Diagram showing the configuration of the video surveillance system according to Embodiment 2 of the present invention
Diagram showing the configuration of the automatic tracking camera according to Embodiment 2 of the present invention
Diagram showing the configuration of the video encoding apparatus according to Embodiment 2 of the present invention
Flowchart showing the operation of the automatic tracking camera according to Embodiment 2 of the present invention
Flowchart showing the operation of the video encoding apparatus according to Embodiment 2 of the present invention
Flowchart showing the operation of the video monitoring apparatus according to Embodiment 2 of the present invention
Sequence diagram showing the operation of the video surveillance system according to Embodiment 2 of the present invention
Diagram showing the configuration of a conventional moving object detection apparatus
Diagram showing the configuration of the video decoding apparatus according to Embodiment 3 of the present invention
Flowchart showing the operation of the video decoding apparatus according to Embodiment 3 of the present invention

Explanation of symbols

100 Video decoding apparatus
101 Stream input unit
102 Base layer decoding unit
103 Enhancement layer decoding unit
104 Band synthesis unit
105 Video output unit
106, 1306 Moving object detection unit
107, 1307 Detection result output unit
1100 Video monitoring apparatus
1101 Reception unit
1102 Image recognition unit
1103 Camera group control unit
1110 Communication network
1121 to 112N Automatic tracking camera
1201 Imaging unit
1202 Video encoding unit
1203 Imaging control unit
1301 Video input unit
1302 Band division unit
1303 Base layer encoding unit
1304 Enhancement layer encoding unit
1305 Stream output unit
1801 Variable length decoding unit
1802 Pattern information detection unit
1803 Moving object detection processing unit
1900 Moving object detection apparatus
1901 Stream input unit
1902 Motion information extraction unit
1903 Edge information extraction unit
1904 Moving object detection unit
1905 Detection result output unit

Claims

A moving object detection device comprising:
motion information extraction means for extracting motion information from a video stream encoded using hierarchical encoding, which divides a video into a plurality of layers and encodes them, and motion prediction compensation encoding;
edge information extraction means for extracting edge information from the video stream; and
moving object detection means for detecting a moving object using the motion information and the edge information and outputting the detection result.

The moving object detection device according to claim 1, wherein the edge information extraction means extracts, as the edge information, the bit plane information from the most significant bit plane down to the N-th bit plane (N is a natural number) out of the bit plane information obtained by bit plane encoding the image.

The moving object detection device according to claim 2, wherein the video stream is divided into a plurality of areas, and the moving object detection means determines an area to be a contour area of the moving object when the total code length of the bit plane information in the area is equal to or greater than a predetermined first value.

The moving object detection device, wherein the moving object detection means determines the area to be a contour area of the moving object when the total code length of the bit plane information inside the area is equal to or less than a predetermined second value.

The moving object detection device according to claim 3, wherein the motion information extraction means extracts a motion vector from an area determined to be a contour area of the moving object, and the moving object detection means determines the area to be a contour area of the moving object when the magnitude of the motion vector is equal to or greater than a predetermined third value.

The moving object detection device according to claim 3, wherein the motion information extraction means extracts a first motion vector from an area determined to be a contour area of the moving object, selects an area located in the vicinity of that area, and extracts a second motion vector from the selected area, and the moving object detection means calculates, as a measured value, the magnitude of the difference vector between the first motion vector and the second motion vector, and determines the selected area to be an internal area of the moving object on the basis of a comparison between the measured value and a predetermined fourth value.

The moving object detection device according to claim 6, wherein the motion information extraction means selects a plurality of areas and extracts a motion vector from each selected area, and the moving object detection means obtains, for each selected area, the magnitude of the difference vector between the first motion vector and the motion vector of that area, and calculates the total of these magnitudes over all the selected areas as the measured value.

The moving object detection device according to claim 6, wherein, when the magnitude of the difference vector between the motion vector of an area determined to be an internal area of the moving object and the motion vector of an area located in the vicinity of that area is equal to or smaller than a predetermined fifth value, the moving object detection means determines that the area is an internal area of the moving object.

The moving object detection device, wherein the moving object detection means determines that an area surrounded by a contour area of the moving object or by an area determined to be an internal area of the moving object is an internal area of the moving object.

The moving object detection device according to claim 3, wherein, when the number of areas determined to be a contour area or an internal area of a second moving object in the vicinity of a contour area or an internal area determined as a first moving object is equal to or larger than a predetermined sixth value, the contour area or the internal area determined as the first moving object is re-determined as the first moving object.

A moving object detection method executed by a moving object detection device that detects a moving object from a video stream, the method comprising:
a step of extracting motion information from a video stream encoded using hierarchical encoding, which divides a video into a plurality of layers and encodes them, and motion prediction compensation encoding;
a step of extracting edge information from the video stream; and
a step of detecting a moving object using the extracted motion information and edge information.

A moving object detection program for executing, in order to detect a moving object from a video stream:
a step of extracting motion information from a video stream encoded using hierarchical encoding, which divides a video into a plurality of layers and encodes them, and motion prediction compensation encoding;
a step of extracting edge information from the video stream; and
a step of detecting a moving object using the extracted motion information and edge information.

A video decoding apparatus comprising:
video decoding means for decoding a video stream encoded using hierarchical encoding, which divides a video into a plurality of layers and encodes them, and motion prediction compensation encoding; and
moving object detection means for detecting a moving object from motion information and edge information extracted when the video decoding means decodes the video stream.

The video decoding device, wherein the video stream is divided into a plurality of areas, and the moving object detection means determines an area to be a contour area of a moving object when the total code length of the bit plane information inside the area is equal to or greater than a predetermined first value.

The video decoding device, wherein the moving object detection means determines the area to be a contour area of a moving object when the total code length of the bit plane information in the area is equal to or less than a predetermined second value.

The video decoding device according to claim 15, wherein the video decoding unit generates a video in which a region of the moving object detected by the moving object detection unit is emphasized.

The video decoding apparatus according to claim 13, wherein the video decoding means generates a video composed of edge components, in which the moving object area detected by the moving object detection means is displayed with emphasis.

A video encoding device comprising:
video encoding means for generating a video stream encoded using hierarchical encoding, which divides a video into a plurality of layers and encodes them, and motion prediction compensation encoding; and
moving object detection means for detecting a moving object by extracting motion information and edge information of the video when the video encoding means encodes the video.

An imaging apparatus comprising:
imaging means for inputting video;
the video encoding device according to claim 18;
imaging control means for controlling an imaging function of the imaging means based on the detection result of the moving object output by the moving object detection means; and
output means for outputting the video stream and the detection result of the moving object.

The imaging apparatus according to claim 19, wherein the imaging control means controls the imaging means so that the area of the moving object output by the moving object detection means occupies a constant ratio of the total area of the input video.

A video surveillance system comprising:
the imaging apparatus according to claim 19; and
a video monitoring device that decodes the video stream received from the imaging apparatus and performs image recognition on the detected moving object area using the detection result of the moving object.

The video decoding device according to claim 1, wherein the video stream is encoded by being layered into a base layer and an enhancement layer, the motion information extraction means extracts the motion information from the video stream of the base layer, and the edge information extraction means extracts the edge information from the video stream of the enhancement layer.

The video decoding device according to claim 1, wherein the video stream is encoded by being layered into a base layer and an enhancement layer, the motion information extraction means extracts the motion information from the video stream of the enhancement layer, and the edge information extraction means extracts the edge information from the video stream of the enhancement layer.