
WO2002013535A2 - Video encoder using image from a secondary image sensor - Google Patents

Video encoder using image from a secondary image sensor

Info

Publication number
WO2002013535A2
Authority
WO
WIPO (PCT)
Prior art keywords
image
video
encoding
parameter
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2001/008538
Other languages
French (fr)
Other versions
WO2002013535A3 (en)
Inventor
Michael Bakhmutsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Priority to KR1020027004498A priority Critical patent/KR20020064794A/en
Priority to JP2002518086A priority patent/JP2004506354A/en
Priority to EP01969495A priority patent/EP1310102A2/en
Publication of WO2002013535A2 publication Critical patent/WO2002013535A2/en
Publication of WO2002013535A3 publication Critical patent/WO2002013535A3/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167Position within a video image, e.g. region of interest [ROI]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/37Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability with arrangements for assigning different transmission priorities to video input data or to video coded data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/10Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths
    • H04N23/11Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from different wavelengths for generating image signals from visible and infrared light wavelengths
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/20Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from infrared radiation only
    • H04N23/23Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from infrared radiation only from thermal infrared radiation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Toxicology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Closed-Circuit Television Systems (AREA)
  • Image Processing (AREA)
  • Studio Devices (AREA)

Abstract

A secondary sensor is provided that senses the same scene as a video camera. The image from the secondary sensor is used to identify areas of the video image corresponding to objects of interest. The identified areas of interest can then be encoded at a finer level of detail than the other areas in the video image. A preferred secondary sensor for detecting animate objects, such as humans in a videoconference scene, is a conventional infrared heat sensor matrix. By encoding the areas of the video image corresponding to ambient temperature regions of the heat sensor matrix at a generally lower level of detail, the available bandwidth can be allocated for transmitting the higher temperature regions at a finer level of detail or a higher frame rate. The secondary image may also be used as a 'front end filter' to conventional object recognition applications.

Description

Using a secondary sensor for optimized video communications
1. Field of the Invention
This invention relates to the field of video communications, and in particular to a method and system that facilitates an optimized transmission of images based on a coupling of a video camera with a secondary sensor, such as a heat sensor mosaic.
2. Description of Related Art
Video communications consume a relatively large transmission bandwidth, and a number of systems have been developed and continue to be developed to reduce the required bandwidth, or to optimize the use of existing bandwidth. An MPEG encoding of a stream of images, for example, uses a variety of techniques to reduce the amount of data that needs to be transmitted or stored. For ease of reference, the term bandwidth is used herein to include the amount of encoded data required to either store or transmit video images. A discrete cosine transform (DCT) is used to reduce the size of the encoded information spatially within each image frame, or portion of a frame. Motion estimation techniques are used to reduce the size of the encoded information temporally, based on the amount of difference, or movement, between successive images. Quantization is used to reduce the size of the encoded information based on the degree of detail required, or to reduce the size, and thus the detail, based on available bandwidth. Each of these techniques is intended to optimize the allocation of bandwidth to different characteristics of the image, without introducing noticeable visible anomalies when the received image is decoded and displayed. Even with the bandwidth-optimizing techniques of MPEG encoding, some compromises are required for low-bandwidth systems. For example, video images communicated over the Internet are typically constrained to small-size images, providing substantially less resolution than a full-resolution DVD version of the same stream of images. Video images communicated for videoconferencing are typically encoded at less than half the frame rate of conventional television broadcasts, producing delayed and discontinuous images on the display.
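As a rough illustration of the quantization technique described above, the following Python sketch transforms a single 8x8 pixel block with a DCT and divides the coefficients by a quantization step size. The function names and step values are illustrative assumptions, not part of this disclosure; a real MPEG encoder additionally applies per-coefficient weighting matrices and entropy coding.

```python
# A toy sketch of quantized DCT coding: an 8x8 block is transformed and its
# coefficients divided by a quantization step size, so a larger step discards
# more detail and needs fewer bits. Names and values are illustrative.
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, q_step):
    """DCT-transform an 8x8 pixel block and quantize the coefficients."""
    coeffs = dctn(block.astype(float), norm="ortho")
    return np.round(coeffs / q_step).astype(int)

def decode_block(qcoeffs, q_step):
    """Dequantize and inverse-transform back to approximate pixel values."""
    return idctn(qcoeffs.astype(float) * q_step, norm="ortho")

block = np.random.randint(0, 256, (8, 8))   # stand-in for camera luminance
fine = encode_block(block, q_step=4)        # many nonzero coefficients kept
coarse = encode_block(block, q_step=64)     # mostly zeros: far fewer bits
print(np.count_nonzero(fine), np.count_nonzero(coarse))
```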
US patent 5,475,433 "FUZZY-CONTROLLED CODING METHOD AND APPARATUS THEREFOR" (sic), issued 12 December 1995 for Je-chang Jeong, and incorporated by reference herein, teaches a further method of optimizing the MPEG encoding of video images by adjusting the parameters of the aforementioned encoding techniques based on a combination of characteristics. For example, a sequence of images with a large amount of motion is encoded at a lower level of detail than a relatively static image, based on the premise that the lack of detail will not be visibly detectable in a fast-moving scene. In like manner, the degree of complexity of the image and its brightness, and the amount of available bandwidth, are used to adjust the quantization level, and therefore the amount of detail, of the transmitted image.
Other techniques have been proposed for improving the bandwidth allocation process, most of which rely on the segregation of images into "objects", or "object regions". MPEG-4, for example, allows for the separation of an object from its background, and thereby allows the object to be encoded at a different, typically finer, level of detail than the background. This encoding technique is expected to be particularly well suited for videoconferencing, wherein the majority of the limited bandwidth is allocated to the human 'objects' in the scene, with minimal bandwidth being allocated to background scenes. In this manner, although movements in the background may appear staggered and potentially blurred, the human objects in the scene will appear clearly, and potentially at a higher frame rate that reduces delays and discontinuities. These object-dependent encoding techniques are also expected to facilitate graphic art effects, wherein select objects can be encoded with different emphasis than the background scene, or other objects. These advanced techniques for allocating bandwidth or providing graphic art effects to objects of interest in an encoded image, however, require the recognition of each object in the image. Object recognition is a complex processing task that currently requires processing equipment that is beyond the feasible cost range for consumer devices. The high cost and relatively low accuracy of current object recognition devices preclude their use in most applications that could benefit from an optimized encoding, such as video conferencing and Internet video communications.
It is an object of this invention to provide a method and system for object recognition that facilitates an optimization of bandwidth allocation of video images. It is a further object of this invention to provide a low cost video system having an object-based resource allocation. It is a further object of this invention to provide a low cost video system that facilitates an optimization of bandwidth allocation. It is a further object of this invention to provide a means of distinguishing an object from the background of an image. These objects and others are achieved by providing a secondary sensor that senses the same scene as a video camera. The secondary image is used to identify areas of the video image corresponding to objects of interest. The identified areas of interest can then be encoded at a finer level of detail than the other areas in the video image. A preferred secondary sensor for detecting animate objects, such as humans in a videoconference scene, is a conventional infrared heat sensor matrix. By encoding the areas of the video image corresponding to ambient temperature regions of the heat sensor matrix at a very coarse level of detail, the available bandwidth can be allocated for transmitting the higher temperature regions at a higher level of detail, or at a higher frame rate. The secondary image may also be used as a "front end filter" to conventional object recognition applications, thereby increasing the efficiency and accuracy of these applications.
The invention is explained in further detail, and by way of example, with reference to the accompanying drawings wherein:
Fig. 1 illustrates an example block diagram of an encoding system in accordance with this invention.
Fig. 2 illustrates an example camera system in accordance with this invention.
Fig. 3 illustrates an example flow diagram of an encoding system in accordance with this invention.
Throughout the drawings, the same reference numerals indicate similar or corresponding features or functions.
FIG. 1 illustrates an example block diagram of an encoding system 100 in accordance with this invention. The encoding system 100 includes a source of a video image 110, a corresponding secondary image 120, and an encoder 150. For ease of reference, the term image is used herein to define an array of values corresponding to items within a field of view of a collection device. For example, the video image 110 generally corresponds to an array of values associated with the collection of visible light within the field of view of a video camera. This array of values may be in any of a variety of formats, and although represented as an array of values in the figures, may be a serial stream of values.
As discussed further below, the secondary image 120 in accordance with this invention is not a derivative of the video image 110, but is a representation of substantially the same scene as the video image, collected via an alternative sensor to the sensor that is used to collect the video image 110. In a preferred embodiment, the secondary image 120 is a representation of the scene collected via an infrared heat sensor, although other secondary sensing devices may be used as well. Preferably, the secondary sensor captures a characteristic of the scene that facilitates the recognition of potential objects of interest 101 in the video image 110. An infrared sensor is particularly well suited for detecting animate objects, such as humans, even when the object is fully clothed. Another sensor, such as a detector of particular visible colors, may be used, for example, when the particular color is associated with potential objects of interest. As illustrated, the resolution of the secondary image 120 may be different than the video image 110. In a low-cost embodiment, for example, the secondary image 120 may be a 64x64 array of thermal values, whereas the video image 110 may be a 330x485, or larger, array of luminance and chrominance values. The resolution of the secondary image 120 is selected based on a cost/performance tradeoff. The resolution of the secondary image 120 determines the accuracy of determining the shape of the object of interest 101, and thereby the degree of encoding optimization that can be achieved, but the cost of a sensor to produce a high-resolution image 120 may be substantially higher than the cost of a sensor that produces a low-resolution image 120. Such a high cost may be warranted, for example, in a professional system that is used to identify a newscaster in a scene, and substitutes appropriate background images based on the news content.
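To make the resolution mismatch concrete, the following sketch assumes a simple linear correspondence between the two images and maps each cell of a 64x64 thermal array to the pixel rectangle it covers in a 330x485 video image. The function name, and the reading of the dimensions as rows by columns, are assumptions for illustration, not the patent's specification.

```python
# Map a thermal cell (region 121) to the video-image pixels it covers
# (regions 111), assuming a simple linear correspondence.
def thermal_to_video_rect(row, col, thermal_dim=(64, 64), video_dim=(330, 485)):
    """Return (top, left, bottom, right) pixel bounds of the video-image
    area covered by the thermal cell (row, col)."""
    th_h, th_w = thermal_dim
    v_h, v_w = video_dim
    top, left = row * v_h // th_h, col * v_w // th_w
    bottom, right = (row + 1) * v_h // th_h, (col + 1) * v_w // th_w
    return top, left, bottom, right

print(thermal_to_video_rect(0, 0))     # (0, 0, 5, 7): one cell ~ 5x7 pixels
print(thermal_to_video_rect(63, 63))   # bottom-right cell of the array
```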
For ease of reference and understanding, this invention is presented using the paradigm of an identification of potential objects or regions of interest in the video image based on thermal emissions, and an adjustment of the level of detail provided in the encoding of these objects or regions of interest. As will be evident to one of ordinary skill in the art in view of this disclosure, other characteristics of a secondary image can be used to control the encoding of the video images, such as an identification of objects based on a particular color. In like manner, other encoding parameters, such as brightness, color intensity, frame rate, and so on, may be adjusted in dependence upon the detected characteristics. In the context of this invention, any parameter or characteristic that affects the encoding of an image is termed an "encoding parameter". For example, in lieu of directly adjusting the encoding level of detail for background regions, the luminance and chrominance values within these regions may be set to a constant value, thereby minimizing the information content that needs to be encoded for these regions. As illustrated in FIG. 1, the characteristics of the secondary image 120 are used to control the encoding parameters 160 that are used by the encoder 150 for encoding the video image 110. For example, the object 101 is illustrated as being overlaid on the images 110, 120. In the aforementioned infrared sensor example, if this object 101 is a source of heat, the sensor regions corresponding to the infrared image 120 that are overlaid by the infrared emitting object 101 will have higher sensed values than the surrounding regions. Regions that are partially overlaid by the infrared emitting object 101 will have an average sensed value that is lower than the regions that are overlaid completely by the infrared emitting object 101, but higher than the regions that do not contain a source of infrared emissions. If, as illustrated, a region 121 of the secondary image 120 contains a characteristic (high thermal sensed value) that corresponds to the presence of an animate object (warm body) in that region 121, the encoder 150 encodes the regions 111 of the video image 110 corresponding to this region 121 at a finer level of detail than regions in the secondary image 120 that do not exhibit the presence of a warm body. This level of detail can be changed, for example, by modifying the quantization step size used in the quantization of DCT values in an MPEG encoding. Other encoding parameters 160 may be adjusted, in addition to, or in lieu of, the quantization parameter. For example, the perception of a higher frame rate can be achieved by transmitting frames containing the regions of interest more often than frames that contain the other regions.
Note that the characteristic of the region 121 may be one of many parameters 160 that affect the level of detail of the encoding of the corresponding regions 111 in the video image 110. For example, a "fuzzy-logic" system such as presented in the aforementioned US patent 5,475,433 may be used to determine an encoding level of detail that is dependent upon a variety of factors, including one or more characteristics of the secondary image 120. Copending US patent application "MOTION-ANALYSIS BASED BUFFER REGULATION SCHEME", serial number 09/220,292, filed 23 December 1998 for Shing-Chi Tzou, Zhiyong Wang, and Janwun Lee, Attorney Docket PHA 23,597, incorporated by reference herein, discloses the use of an image map that contains a nominal value that is used to determine the quantization step size for each MPEG-sized block in a video image. The nominal value of each block is dynamically adjusted, based on the current as well as prior characteristics of the block. As in the cited US patent 5,475,433, this nominal value is adjusted to produce a coarser level of detail for a "dynamic" block whose content changes quickly. The use of an image map allows for a continuously improved rendering of the video image. For example, a "static" block in an image is progressively encoded in finer and finer detail, subject to bandwidth availability, so that any potential "lulls" in bandwidth utilization can be used to improve the picture quality. A preferred combination of this invention with the copending invention would favor the progressively finer encoding of the regions of interest identified by the secondary image 120, rather than potentially less interesting regions. That is, for example, identified regions of interest would be given higher priority for allocation of the available bandwidth, and the regions of less interest would be allocated bandwidth after the interesting regions are rendered at a predefined acceptable level of detail.
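The priority scheme just suggested can be sketched as a simple greedy allocation. The one-unit cost model and all names below are assumptions for illustration, not the actual buffer-regulation scheme of the copending application.

```python
# A toy sketch of the combined scheme: refinement steps are spent on regions
# of interest first, and on background regions only once the interesting
# ones reach an acceptable level of detail.
def allocate_refinements(regions, budget, target_quality=5):
    """regions: list of dicts {'interesting': bool, 'quality': int}.
    Each refinement step costs one bandwidth unit and raises quality by one."""
    ordered = sorted(regions, key=lambda r: not r["interesting"])  # ROI first
    while budget > 0:
        pending = [r for r in ordered if r["quality"] < target_quality]
        if not pending:
            break                      # everything at acceptable detail
        pending[0]["quality"] += 1     # refine the highest-priority region
        budget -= 1
    return regions

frame = [{"interesting": i in (2, 3), "quality": 0} for i in range(6)]
print(allocate_refinements(frame, budget=12))  # regions 2 and 3 reach 5 first
```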
As will be evident to one of ordinary skill in the art in view of this disclosure, a variety of techniques can be used to correlate the characteristics of the regions of the secondary image 120 to the level of detail of the regions of the video image 110. A filtering, or interpolation, of the characteristics of the regions of the secondary image 120 can be used to determine a corresponding quantization factor for each region, or block, of the video image 110, to minimize discontinuities at the edges of each region of the secondary image 120, using techniques common in the art. In an explicit object-identification scheme, the secondary image 120 can be used as a "front-end" filter to a conventional object-recognition application, as sketched below. In such an embodiment, the object-recognition application is configured to prioritize the search for potential objects to the areas of interest identified by the characteristics of the regions of the secondary image 120. Similarly, if the object-recognition application is designed to find objects that are known to correspond to a minimum size area relative to the secondary image 120, the search can be restricted to the areas of the secondary image that contain contiguous blocks having the desired characteristic that occupy the minimum size area. When the object-recognition application recognizes an object of interest, the encoder 150 can encode the individual regions of the video image 110 at a finer level of detail, or, if the encoding directly supports object-dependent encoding, such as an MPEG-4 encoding, the encoder 150 encodes the identified regions as an explicit object, with an associated quantization parameter. The specific details of the encoding and its associated level of detail dependencies will be dependent upon the particular encoding scheme employed, and other techniques for optimizing the level of detail based on an identification of an object or region of interest will be evident to one of ordinary skill in the art in view of this disclosure.
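A minimal sketch of this front-end filter, assuming the thermal map is a numeric array: contiguous hot cells are grouped by connected-component labelling, and only groups meeting a minimum area are handed to the (hypothetical) object-recognition search. The threshold and area values are illustrative.

```python
# Restrict the object search to contiguous hot areas of the thermal map
# that exceed a heat threshold and a minimum size.
import numpy as np
from scipy import ndimage

def candidate_regions(thermal, threshold=30.0, min_area=16):
    """Yield bounding-box slices of contiguous hot areas worth searching."""
    hot = thermal > threshold
    labels, n_blobs = ndimage.label(hot)              # connected components
    for i, sl in enumerate(ndimage.find_objects(labels)):
        if np.sum(labels[sl] == i + 1) >= min_area:   # enforce minimum size
            yield sl                                   # search only this area

thermal = np.zeros((64, 64))
thermal[20:30, 15:25] = 37.0      # a warm-body-sized blob of hot cells
thermal[5, 5] = 37.0              # an isolated hot cell: filtered out as noise
for sl in candidate_regions(thermal):
    print(sl)                     # only the 10x10 blob is reported
```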
FIG. 2 illustrates an example camera system 200 in accordance with this invention. The camera system 200 includes a camera 210 for collecting video images (110 in FIG. 1), and a secondary sensor 220 for collecting secondary images (120 in FIG. 1). In order for the secondary image 120 to correspond to the video image 110, the field of view 215 of the camera 210 and the field of view 225 of the sensor 220 should substantially correspond. In an ideal embodiment, the same optic system that is used by the camera to produce the video image 110 would be used to produce the secondary image 120, via a sensor 220 that is integral to the camera 210, so that an exact correspondence can be achieved. However, as illustrated in FIG. 2, an exact correspondence is not required. FIG. 2 illustrates a secondary sensor 220 that is adjacent the camera 210, illustrative of a configuration for a sensor 220 that is provided as an "option" to a conventional video camera 210, or as a removable item on a camera 210 that includes an integral encoder (150 of FIG. 1) in accordance with this invention.
Depending upon the particular configuration of the sensor 220 relative to the camera 210, there will be a region 275 wherein the fields of view 215, 225 substantially correspond. Within this region 275, the correspondence between the images 110, 120 is substantially linear, as illustrated in FIG. 1. Depending upon the accuracy desired, a mapping between the images 110, 120 in regions beyond the substantially corresponding region 275 can be defined in terms of a more complex coordinate transformation, using approximation techniques common in the art. If the camera 210 has a variable-zoom capability, the field of view 215 will contract or expand accordingly. In an ideal embodiment, the change of zoom in the camera 210 will effect a corresponding change of the field of view 225 of the secondary sensor. Alternatively, in a lower-cost embodiment, the field of view 225 may be fixed. In this embodiment, the field of view 225 is set to a "typical" field, within which objects of interest are likely to appear. The regions of the video image 110 in the field of view 215 of the camera 210 that are beyond the field of view 225 of the secondary sensor 220, because of a zoomed-out setting of the camera 210, in this embodiment are set to a default coarse level of detail setting. In like manner, regions of the secondary image 120 that are beyond the field of view 215 of the camera 210, because of a zoomed-in setting of the camera 210, are ignored, except as necessary to effect the aforementioned interpolation of characteristic values to prevent edge discontinuities. Ancillary methods for improving the correlation between the images 110, 120 may also be used. For example, the appropriate coordinate transformation may be determined by comparing characteristics of the images 110, 120 and using least-square-error curve fitting techniques, common in the art, to determine the appropriate parameters of the coordinate transformation between the images 110, 120.
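The least-square-error fitting mentioned above can be sketched under the assumption that the correspondence within the region 275 is a per-axis scale and offset (video = a * sensor + b). The matched coordinate pairs below are made up for illustration; in practice they would come from comparing characteristics of the images 110, 120.

```python
# Fit the per-axis parameters of the coordinate transformation between the
# sensor image and the video image from matched point pairs.
import numpy as np

def fit_axis(sensor_coords, video_coords):
    """Least-squares fit of video = a * sensor + b for one axis."""
    A = np.vstack([sensor_coords, np.ones_like(sensor_coords)]).T
    (a, b), *_ = np.linalg.lstsq(A, video_coords, rcond=None)
    return a, b

sensor_x = np.array([3.0, 17.0, 40.0, 60.0])
video_x = sensor_x * 5.2 + 12.0 + np.random.normal(0.0, 0.5, 4)  # noisy matches
print(fit_axis(sensor_x, video_x))  # recovers roughly (5.2, 12.0)
```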
Any of a variety of devices, common in the art, may be used to provide the secondary sensor 220 of FIG. 2 for creating the secondary image 120 of FIG. 1. In the infrared field, thermal imaging arrays are commonly available. Commonly available thermal arrays provide images (120 in FIG. 1) having 64x64 regions (121); larger and smaller arrays are also available. US patent 6,031,231 "INFRARED FOCAL PLANE ARRAY", issued 29 February 2000 to Kimata et al., and incorporated by reference herein, provides an overview of two-dimensional infrared focal plane arrays of temperature detecting units that are arranged on semiconductor substrates. US patent 4,868,391 "INFRARED LENS ARRAYS", issued 19 September 1989 to Antoine Y. Messiou, and incorporated by reference herein, provides an array of fresnel lenses that are arranged at different angles to provide a wide field of view, the array being configured as a substantially flat sheet. In the '391 patent, each of the lenses has a common focal point, energizing a single temperature detecting unit. In a preferred low-cost embodiment of this invention, an array of fresnel lenses is arranged to direct thermal energy to a plurality of temperature detecting units on a semiconductor substrate. The output from the temperature detecting units corresponds to the image 120 of FIG. 1.
Note that the fields of view of the individual detecting units within the sensor 220 need not be uniform. That is, for example, in a preferred embodiment of this invention, the fresnel lenses corresponding to the perimeter regions of the image 120 have a wider field of view than the fresnel lenses corresponding to the center region of the image 120, because it is likely that objects or regions of interest will generally be located near the center of the video image 110. Note, also, that the sensor 220 may correspond to a conventional infrared camera. In such an embodiment, the infrared camera 220 and the video camera 210 are mounted on a common carrier, and controlled by a common control system. Each of the cameras 210, 220 provides its corresponding image 110, 120 to an encoder 150 for processing as discussed above. The encoder 150 may be located in a device that reads the images 110, 120 directly from the camera 210 and sensor 220, and may be embedded within either the camera 210 or sensor 220. In like manner, the encoder 150, camera 210, and sensor 220 may be embodied as a single device. The encoder 150 may also be an independent device that acquires the images 110, 120 from recordings or transmissions from the camera 210 and sensor 220. Preferably, a time-stamp is provided for each image 110, 120, to facilitate a synchronization between the video images 110 and secondary images 120. Note that the frame rate of the camera 210 and sensor 220 need not be identical, provided only that the secondary images 120 can be substantially correlated in time with the video images 110. These and other system configuration options will be evident to one of ordinary skill in the art in view of this disclosure.
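One simple way to achieve the time correlation just described (an assumption for illustration, not specified in the text) is to pair each video frame with the secondary image whose time-stamp is nearest, so the two frame rates need not match:

```python
# Pair each video frame with the secondary image nearest in time.
import bisect

def nearest_secondary(video_ts, secondary_ts):
    """Index of the secondary frame closest in time; secondary_ts is sorted."""
    i = bisect.bisect_left(secondary_ts, video_ts)
    return min((j for j in (i - 1, i) if 0 <= j < len(secondary_ts)),
               key=lambda j: abs(secondary_ts[j] - video_ts))

sec_ts = [0.0, 0.5, 1.0, 1.5]              # thermal sensor at 2 frames/s
for t in (0.03, 0.36, 0.70, 1.03):         # video frame capture times
    print(t, "->", nearest_secondary(t, sec_ts))
```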
FIG. 3 illustrates an example flow diagram of an encoding system in accordance with this invention. For convenience and ease of understanding, this flow diagram is presented with reference to the objects of FIGs. 1 and 2, and in the context of a straightforward MPEG encoding, without the details of alternative embodiments presented above. As would be evident to one of ordinary skill in the art, the invention is not limited to this example.
At 310, the correspondence between the secondary image 120 and the video image 110 is determined, as discussed above. At 320, a default quantization factor is determined. This default quantization factor corresponds to a quantization step size in a conventional MPEG encoding that produces a relatively coarse level of detail. This default factor may be determined based on available bandwidth, prior image quality, overall complexity or dynamics of prior images, and so on. For convenience, this default quantization factor is allocated to each region of the video image 110, at 330, and then selectively modified, via the loop 340-360, based on the characteristics of the secondary image 120, such as a thermal-object-outline derived from the secondary image 120.
Each region 121 of the secondary image 120 is successively processed in the loop 340-360. In this example, a simple threshold test, at 345, is used to determine whether each region corresponds to a "region of interest". Each region 121 of the secondary image 120 has an associated characteristic, such as a resistance or a voltage corresponding to the detected heat within the region 121, and a measure of this characteristic is used to determine whether or not the region is a "region of interest". If the measure exceeds the threshold, the quantization factor of the corresponding regions 111 of the video image 110 is adjusted so as to effect an encoding at a finer level of detail, at 350. As noted above, the loop 340-360 may be replaced by a continuous determination of an appropriate quantization factor for each region 111 of the video image 110 based on an interpolation of the measures of each region 121. In like manner, the loop 340-360 may be replaced or augmented by a fuzzy logic system as discussed in US patent 5,475,433, or a progressive approach as discussed in copending application 09/220,292, discussed above. In like manner, the loop 340-360 may be replaced by a conventional object-recognition system that uses the measures of the characteristics of the image 120 to facilitate an efficient object search, also discussed above.
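Steps 320 through 360 can be condensed into the following sketch, which assigns a coarse default quantization step to every video block (330) and a finer step wherever the corresponding thermal cell exceeds the threshold (345, 350). The specific step sizes, the threshold, and the linear region mapping are illustrative assumptions.

```python
# Build a per-block quantization map for one video frame from the thermal
# image, following the flow of FIG. 3.
import numpy as np

def quantization_map(thermal, video_blocks=(30, 22),
                     threshold=30.0, default_q=64, fine_q=8):
    """Per-block quantization steps for one video frame."""
    q = np.full(video_blocks, default_q)      # step 330: coarse default
    th_h, th_w = thermal.shape
    for r in range(th_h):                     # loop 340-360 over regions 121
        for c in range(th_w):
            if thermal[r, c] > threshold:     # threshold test 345
                br = r * video_blocks[0] // th_h
                bc = c * video_blocks[1] // th_w
                q[br, bc] = fine_q            # step 350: finer detail here
    return q

thermal = np.zeros((64, 64))
thermal[20:30, 15:25] = 37.0                  # warm body seen by the sensor
print(np.unique(quantization_map(thermal), return_counts=True))
```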
At 370, the video image 110 is encoded, using the quantization factors determined above based on the secondary image 120. The encoding and quantization factors may also be dependent on other parameters, such as available bandwidth, degree of complexity and movement, and so on, using techniques common in the art, or as disclosed in the copending US patent application 09/220,292.
The foregoing merely illustrates the principles of the invention. Other embodiments and applications will be evident to one of ordinary skill in the art in view of this disclosure. For example, although the invention is presented in terms of optimizing the bandwidth required to transmit images, the encoding schemes presented herein are equally applicable for optimizing the storage requirements for storing images, and can be used to optimize the capacity of recording media, such as video tape. It will thus be appreciated that those skilled in the art will be able to devise various arrangements which, although not explicitly described or shown herein, embody the principles of the invention and are included within the scope of the following claims.

Claims

CLAIMS:
1. A video encoding system (100) that is configured to receive at least one video image (110) and at least one corresponding secondary image (120), comprising: an encoder (150) that encodes each region (111) of a plurality of regions of the video image (110) using an encoding parameter (160) that is dependent upon a characteristic of a corresponding region (121) of the secondary image (120), and produces thereby an encoding of the video image (110).
2. The video encoding system (100) of claim 1, further including: an image detector (210) that is sensitive to visible light within a first field of view (215), and thereby produces the at least one video image (110) corresponding to the first field of view (215), and a heat detector that is sensitive to infrared emissions within a second field of view (225) that substantially corresponds to at least a portion of the first field of view (215) of the image detector (210), and thereby produces the corresponding secondary image (120).
3. The video encoding system (100) of claim 1, wherein the at least one corresponding secondary image (120) provides an object-related pattern (101), and the encoder (150) is configured to encode objects within the video image (110) based on the object-related pattern (101).
4. The video encoding system (100) of claim 3, further including an object-recognition system that facilitates a recognition of the object-related pattern (101) based on the at least one corresponding secondary image (120).
5. The video encoding system (100) of claim 1, wherein the encoder (150) is further configured to encode each region (111) of the plurality of regions based on at least one of: a motion parameter, a complexity parameter, a brightness parameter, and a bandwidth parameter.
6. The video encoding system (100) of claim 1, wherein the encoding parameter (160) corresponds to a level of detail of the encoding of the video image (110).
7. The video encoding system (100) of claim 6, wherein the characteristic of the corresponding region (121) of the secondary image (120) is a measure of a temperature associated with the corresponding region (121) of the secondary image (120).
8. The video encoding system (100) of claim 7, wherein the encoder (150) is further configured to encode each region (111) of the plurality of regions based on at least one of: a motion parameter, a complexity parameter, a brightness parameter, and a bandwidth parameter.
9. A camera system (200) comprising: a video camera (210) that collects video images (110) corresponding to a first field of view (215) of the video camera (210), a secondary detector (220), operably attached to the video camera (210), that collects secondary images (120) corresponding to a second field of view (225) that substantially corresponds to at least a segment of the first field of view (215), to facilitate a subsequent recognition of regions of interest (101) within the video images (110), based on the associated secondary images (120).
10. The camera system (200) of claim 9, wherein the secondary detector (220) comprises a thermal detector.
11. The camera system (200) of claim 9, further including an encoder (150) that is configured to encode the video images (110) in dependence upon characteristics of the corresponding secondary images (120) and to produce thereby an encoded output.
12. The camera system (200) of claim 11, further including at least one of: a transmitter that is configured to transmit the encoded output to a receiver, and a recorder that is configured to store the encoded output.
13. The camera system (200) of claim 11, further including an object recognition system that uses the secondary images (120) to facilitate a recognition of an object-related pattern (101), and wherein the encoder (150) is configured to encode objects within the video image (110) based on the object-related pattern (101).
14. The camera system (200) of claim 11, wherein the encoder (150) is further configured to encode the video images (110) based on at least one of: a motion parameter, a complexity parameter, a brightness parameter, and a bandwidth parameter.
15. The camera system (200) of claim 11, wherein the characteristics of the corresponding secondary images (120) correspond to a measure of thermal emissions within the second field of view (225).
16. The camera system (200) of claim 11, wherein the encoder (150) is configured to encode the video images (110) using quantization factors that are dependent upon the characteristics of the corresponding secondary images (120).
17. The camera system (200) of claim 16, wherein the quantization factors are further dependent upon at least one of: a motion parameter, a complexity parameter, a brightness parameter, and a bandwidth parameter.
18. A method of encoding a video image (110) comprising: receiving a secondary image (120) corresponding to at least a portion of the video image (110), determining (310) the correspondence between the secondary image (120) and the video image (110), associating (350) an encoding factor to each region (111) of a plurality of regions of the video image (110) in dependence upon a characteristic of a corresponding region (121) of the secondary image (120), and encoding (370) each region (111) of the plurality of regions of the video image (110) based on the associated encoding factor.
19. The method of claim 18, wherein the secondary image (120) comprises a thermal map.
20. The method of claim 18, wherein the encoding parameter (160) affects a level of detail of the encoding of each region (111) of the video image (110).
PCT/EP2001/008538 2000-08-08 2001-07-23 Video encoder using image from a secondary image sensor Ceased WO2002013535A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
KR1020027004498A KR20020064794A (en) 2000-08-08 2001-07-23 Using a secondary sensor for optimized video communications
JP2002518086A JP2004506354A (en) 2000-08-08 2001-07-23 Use of secondary sensors for optimal image communication
EP01969495A EP1310102A2 (en) 2000-08-08 2001-07-23 Video encoder using image from a secondary image sensor

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US63468200A 2000-08-08 2000-08-08
US09/634,682 2000-08-08

Publications (2)

Publication Number Publication Date
WO2002013535A2 true WO2002013535A2 (en) 2002-02-14
WO2002013535A3 WO2002013535A3 (en) 2002-06-13

Family

ID=24544796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2001/008538 Ceased WO2002013535A2 (en) 2000-08-08 2001-07-23 Video encoder using image from a secondary image sensor

Country Status (5)

Country Link
EP (1) EP1310102A2 (en)
JP (1) JP2004506354A (en)
KR (1) KR20020064794A (en)
CN (1) CN1393111A (en)
WO (1) WO2002013535A2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2658245A1 (en) * 2012-04-27 2013-10-30 BlackBerry Limited System and method of adjusting camera image data
US8994845B2 (en) 2012-04-27 2015-03-31 Blackberry Limited System and method of adjusting a camera based on image data
CN109727417A (en) * 2017-10-27 2019-05-07 安讯士有限公司 Method and controller for controlling a video processing unit to facilitate detection of newcomers

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2050088B1 (en) * 2006-07-28 2015-11-11 Koninklijke Philips N.V. Private screens self distributing along the shop window

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5764803A (en) * 1996-04-03 1998-06-09 Lucent Technologies Inc. Motion-adaptive modelling of scene content for very low bit rate model-assisted coding of video sequences
AUPP340798A0 (en) * 1998-05-07 1998-05-28 Canon Kabushiki Kaisha Automated video interpretation system
US6496607B1 (en) * 1998-06-26 2002-12-17 Sarnoff Corporation Method and apparatus for region-based allocation of processing resources and control of input image formation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2658245A1 (en) * 2012-04-27 2013-10-30 BlackBerry Limited System and method of adjusting camera image data
US8994845B2 (en) 2012-04-27 2015-03-31 Blackberry Limited System and method of adjusting a camera based on image data
CN109727417A (en) * 2017-10-27 2019-05-07 安讯士有限公司 Method and controller for controlling a video processing unit to facilitate detection of newcomers
US11164008B2 (en) 2017-10-27 2021-11-02 Axis Ab Method and controller for controlling a video processing unit to facilitate detection of newcomers in a first environment

Also Published As

Publication number Publication date
EP1310102A2 (en) 2003-05-14
KR20020064794A (en) 2002-08-09
JP2004506354A (en) 2004-02-26
WO2002013535A3 (en) 2002-06-13
CN1393111A (en) 2003-01-22

Similar Documents

Publication Publication Date Title
US8416303B2 (en) Imaging apparatus and imaging method
US8605185B2 (en) Capture of video with motion-speed determination and variable capture rate
US5751378A (en) Scene change detector for digital video
EP1431912B1 (en) Method and system for determining an area of importance in an archival image
US6961083B2 (en) Concurrent dual pipeline for acquisition, processing and transmission of digital video and high resolution digital still photographs
US20080129857A1 (en) Method And Camera With Multiple Resolution
EP0725536A2 (en) Method and apparatus for image sensing with dynamic range expansion
US6873727B2 (en) System for setting image characteristics using embedded camera tag information
US20060140445A1 (en) Method and apparatus for capturing digital facial images optimally suited for manual and automated recognition
US7733380B1 (en) Method and/or architecture for controlling encoding parameters using integrated information from camera ISP
US20070092244A1 (en) Camera exposure optimization techniques that take camera and scene motion into account
US20080240586A1 (en) Image distribution apparatus, communication terminal apparatus, and control method thereof
US20090290645A1 (en) System and Method for Using Coded Data From a Video Source to Compress a Media Signal
TW200939779A (en) Intelligent high resolution video system
EP1425707A2 (en) Image segmentation by means of temporal parallax difference induction
US20030169818A1 (en) Video transcoder based joint video and still image pipeline with still burst mode
WO2008107713A1 (en) Controlled high resolution sub-image capture with time domain multiplexed high speed full field of view reference video stream for image based biometric applications
JPH05191718A (en) Image pickup device
US8120675B2 (en) Moving image recording/playback device
US7889265B2 (en) Imaging apparatus, control method for the imaging apparatus, and storage medium storing computer program which causes a computer to execute the control method for the imaging apparatus
WO2002013535A2 (en) Video encoder using image from a secondary image sensor
JP4243034B2 (en) Encoder
CN106101530A (en) A kind of method that high-speed adaptability night vision image strengthens
JPH03230691A (en) Digital electronic still camera
KR100457302B1 (en) Auto tracking and auto zooming method of multi channel by digital image processing

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

ENP Entry into the national phase

Ref country code: JP

Ref document number: 2002 518086

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1020027004498

Country of ref document: KR

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 01802968X

Country of ref document: CN

AK Designated states

Kind code of ref document: A3

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR

WWP Wipo information: published in national office

Ref document number: 1020027004498

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 2001969495

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 2001969495

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 2001969495

Country of ref document: EP