KR20010089664A - System and method for three-dimensional modeling

Info

Publication number: KR20010089664A
Application number: KR1020017007811A
Authority: KR
Inventors: 욘앤; 찰라팔리키랜
Original assignee: 요트.게.아. 롤페즈; 코닌클리케 필립스 일렉트로닉스 엔.브이.
Priority date: 1999-10-21
Filing date: 2000-10-06
Publication date: 2001-10-08
Also published as: WO2001029767A3; EP1190385A2; WO2001029767A2; JP2003512802A

Abstract

An image processing system and method are described that provide a three-dimensional model using information from a pair of input image frames. A three-dimensional surface is obtained by first identifying features in the input frames. Feature correspondence matching is performed using disparity information based on the frames. The identified features may also be correlated using temporal information.

Description

System and method for three-dimensional modeling

Video/image communication applications over the Internet or the public switched telephone network (PSTN) are growing in popularity and use. In conventional video/image communication technology, a picture (in JPEG or GIF format) is captured and then transmitted over a transmission network. This approach, however, requires a large bandwidth because of the size of the picture (i.e., the amount of data).

When coding moving images at data rates between 64 Kbit/s and 2 Mbit/s, a block-based hybrid coder is typically used. The coder subdivides each image of a sequence into independently moving blocks. Each block is then coded by two-dimensional motion prediction and transform coding. Depending on the transmission rate, the resulting received images may not play smoothly or in real time.

Methods have been used to improve video/image communication and/or to reduce the amount of information that needs to be transmitted. One such method has been used in videophone applications. An image is encoded by three sets of parameters that define its motion, shape, and surface color. Since the subject of visual communication is typically a person, the primary focus can be directed to the subject's head or face.

In a conventional videophone communication system, a single camera is typically used to acquire an image of the person making the video call. Because only one camera is used, it is difficult to acquire a true three-dimensional (3D) face surface (i.e., the shape parameters). Typically, generating a three-dimensional surface requires multiple two-dimensional views of the object (usually six). Using these views, a distance transformation is applied. For example, an ellipse can be used as a generating function to obtain the shape of the Z coordinate of the object as a function of the distance to the object's boundary. Such contour lines can only approximate the three-dimensional shape.

Another approach is called model-based coding. Low-bit-rate communication can be achieved by encoding and transmitting only the facial parameters of the subject's head. At the remote site, the face image is synthesized using the transmitted parameters. In general, model-based coding requires at least four tasks: segmentation of the face, extraction of facial features, tracking of the features, and estimation of motion.

One known method for face segmentation is to create a dataset describing a parameterized face. This dataset defines a three-dimensional description of a face object. The parameterized face is provided as an anatomically based structure by modeling muscle and skin actuators and force-based deformations.

As shown in FIG. 1, a set of polygons defines a human face model 100. Each polygon vertex is defined by X, Y, and Z coordinates. Each vertex is identified by an index number. A particular polygon is defined by the set of indices of the vertices surrounding it. A code may also be appended to the set of indices to define a color for the particular polygon.
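The indexed-vertex representation described above can be sketched as a small data structure. This is only an illustrative sketch: the class names, the coordinate values, and the integer color code are assumptions, not part of the disclosed model 100.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vertex:
    """One model vertex, defined by its X, Y, and Z coordinates."""
    x: float
    y: float
    z: float


@dataclass(frozen=True)
class Polygon:
    """A polygon defined by the indices of its surrounding vertices."""
    vertex_indices: tuple
    color_code: Optional[int] = None  # hypothetical color code appended to the index set


@dataclass
class FaceModel:
    """A face model: vertices keyed by index number, plus a list of polygons."""
    vertices: dict
    polygons: list

    def polygon_corners(self, polygon):
        """Look up the coordinates of every vertex that bounds the polygon."""
        return [self.vertices[i] for i in polygon.vertex_indices]


# Hypothetical coordinates; only the index numbers echo the nose vertices named later in the text.
model = FaceModel(
    vertices={
        4: Vertex(0.0, 1.0, 2.0),
        5: Vertex(1.0, 1.0, 2.5),
        23: Vertex(0.5, 0.0, 3.0),
    },
    polygons=[Polygon(vertex_indices=(4, 5, 23), color_code=7)],
)
corners = model.polygon_corners(model.polygons[0])
```

Keying vertices by index number means that adjusting a single vertex updates every polygon that shares it.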

Systems and methods are also known that analyze digital images, recognize a human face, and extract facial features. Conventional facial feature detection systems use methods such as facial color tone detection, template matching, or edge detection.

One of the most difficult problems in model-based coding is providing facial feature correspondence quickly, easily, and robustly. In sequential frames, the same facial features must be matched correctly. Conventionally, a block-matching process is used to compare pixels in the current frame and the next frame to determine feature correspondence. If the entire frame is searched for feature correspondence, the process is slow and may yield incorrect results because of mismatching of regions that have the same gradient values. If only a subset of the frame is searched, the processing time can be improved. In that case, however, the process may fail to determine any feature correspondence.

There thus exists a need in the art for improved systems and methods for three-dimensional modeling of objects contained in a digital image for reduced-data-rate transmission.

The present invention relates generally to the field of three-dimensional modeling and, in particular, to a system and method for three-dimensional modeling of an object contained in a digital image using disparity-based information.

FIG. 1 is a schematic front view of a human face model used for three-dimensional model-based coding.

FIG. 2 is a block diagram of a three-dimensional modeling system in accordance with one aspect of the present invention.

FIG. 3 is a block diagram of an exemplary computer system capable of supporting the system of FIG. 2.

FIG. 4 is a block diagram showing the architecture of the computer system of FIG. 3.

FIG. 5 is a block diagram showing an exemplary arrangement in accordance with a preferred embodiment of the present invention.

It is an object of the present invention to address the limitations of the conventional video/image communication systems and model-based coding discussed above.

It is another object of the present invention to provide an object-oriented, cross-platform method of delivering real-time compressed video information.

It is yet another object of the present invention to enable coding of specific objects within an image frame.

It is a further object of the present invention to integrate synthetic and natural visual objects interactively or in real time.

In one aspect of the present invention, an image processing apparatus includes at least one feature extraction determiner configured to extract feature position information from a pair of input image signals, and a matching unit that matches corresponding features in the input image signals in accordance with the feature position information and disparity information.

One embodiment of the invention relates to a method of determining parameters associated with a three-dimensional model. The method includes the steps of extracting feature position information associated with a pair of input images, and matching corresponding features in the pair of input images in accordance with the extracted feature position information and disparity information. The method also includes the step of determining the parameters for the three-dimensional model in accordance with the feature correspondence matching.

These and other embodiments and aspects of the present invention are exemplified in the following detailed description.

The features and advantages of the present invention can be understood by reference to the detailed description of the preferred embodiments set forth below, taken together with the drawings.

Referring to FIG. 2, a three-dimensional modeling system 10 is shown. Generally, the system 10 includes at least one feature extraction determiner 11, at least one set of temporal information 12, and a feature correspondence matching unit 13. A left frame 14 and a right frame 15 are input to the system 10. The left and right frames consist of image data, which may be digital or analog. If the image data is analog, an analog-to-digital circuit can be used to convert the data to a digital format.

The feature extraction determiner 11 determines the position/location of features in a digital image, such as the positions of the nose, eyes, and mouth of a face. Although two feature extraction determiners 11 are shown in FIG. 2, a single determiner may be used to extract the position information from both the left and right frames 14 and 15. The temporal information 12 includes data, such as previous and/or future frames, that is used to provide constraints for accurate feature correspondence. As will be appreciated, the current frame to be processed requires a first frame input to the system 10. Test frames are used to establish some hysteresis.

In a preferred embodiment, the system 10 is implemented by computer-readable code executed by a data processing apparatus. The code may be stored in a memory within the data processing apparatus or read/downloaded from a memory medium such as a CD-ROM or floppy disk. In other embodiments, hardware circuitry may be used in place of, or in combination with, software instructions to implement the invention. For example, the invention may be implemented on a digital television platform using a Trimedia processor for processing and a television monitor for display. The invention can also be implemented on the computer 30 shown in FIG. 3.

As shown in FIG. 3, the computer 30 includes a network connection 31 for interfacing to a data network, such as a variable-bandwidth network or the Internet, and a fax/modem connection 32 for interfacing with other remote sources such as a video or digital camera (not shown). The computer 30 also includes a display 33 for displaying information (including video data) to a user, a keyboard 34 for inputting text and user commands, a mouse 35 for positioning a cursor on the display 33 and for inputting user commands, a disk drive 36 for reading from and writing to floppy disks installed therein, and a CD-ROM drive 37 for accessing information stored on CD-ROM. The computer 30 may also have one or more peripheral devices attached to it, such as a pair of video conference cameras for inputting images and the like, and a printer 38 for outputting images, text, and the like.

FIG. 4 shows the internal structure of the computer 30, which includes a memory 40 that may comprise random access memory (RAM), read-only memory (ROM), and a computer-readable medium such as a hard disk. The items stored in the memory 40 include an operating system 41, data 42, and applications 43. The data stored in the memory 40 may also include the temporal information 12. In a preferred embodiment of the invention, the operating system 41 is a windowing operating system, such as UNIX, although the invention may also be used with other operating systems such as Microsoft Windows 95. Among the applications stored in the memory 40 are a video coder 44, a video decoder 45, and a frame grabber 46. The video coder 44 encodes video data in a conventional manner, and the video decoder 45 decodes video data that has been coded in the conventional manner. The frame grabber 46 allows single frames from a video signal stream to be captured and processed.

Also included in the computer 30 are a central processing unit (CPU) 50, a communication interface 51, a memory interface 52, a CD-ROM drive interface 53, a video interface 54, and a bus 55. The CPU 50 comprises a microprocessor or the like for executing computer-readable code, i.e., applications such as those noted above, out of the memory 40. Such applications may be stored in the memory 40 (as noted above) or, alternatively, on a floppy disk in the disk drive 36 or on a CD-ROM in the CD-ROM drive 37. The CPU 50 accesses applications (or other data) stored on a floppy disk via the memory interface 52 and accesses applications (or other data) stored on a CD-ROM via the CD-ROM drive interface 53.

Application execution and other tasks of the computer 30 may be initiated using the keyboard 34 or the mouse 35. Output results from applications running on the computer 30 may be displayed to a user on the display 33 or, alternatively, output via the network connection 31. For example, input video data may be received through the video interface 54 or the network connection 31. The input video data may be decoded by the video decoder 45. Output video data may be coded by the video coder 44 for transmission through the video interface 54 or the network connection 31. The display 33 preferably comprises a display processor for forming video images based on decoded video data provided by the CPU 50 over the bus 55. Output results from the various applications may be provided to the printer 38.

Returning to FIG. 2, the left frame 14 and the right frame 15 preferably comprise a stereo pair of digital images. For example, the digital images may be received from two (still or video) cameras 60 and 61 (shown in FIG. 5) and stored in the memory 40 for subsequent processing. Other frames or frame pairs taken at different angles or viewpoints may also be used. The cameras 60 and 61 may be part of another system, such as a video conferencing system or an animation system.

The cameras 60 and 61 are positioned close to each other, and the object 62 is positioned a short distance away from the cameras 60 and 61. As shown in FIG. 5, the cameras 60 and 61 are separated from each other by a distance b (center to center). The object 62 is located a distance f from each of the cameras 60 and 61. Preferably, b is approximately 5 to 6 inches and f is approximately 3 feet. It should be understood, however, that the invention is not limited to these distances, which are given merely as examples.

Preferably, the camera 60 takes a front view and the camera 61 takes an offset or side view of the object 62. This allows a comparison to be made between the left frame 14 and the right frame 15 to determine a disparity map. In a preferred embodiment of the invention, the left frame 14 (image A) is compared with the right frame 15 (image B). The reverse comparison, however, may also be performed.

A digital frame or image can be conceptualized as comprising a plurality of horizontal scan lines and a plurality of vertical columns that form an array of pixels. The number of scan lines and columns determines the resolution of the digital image. To determine the disparity map, the scan lines are lined up; for example, scan line 10 of image A is matched with scan line 10 of image B. A pixel on scan line 10 of image A is then matched to its corresponding pixel in scan line 10 of image B. So, for example, if the 15th pixel of scan line 10 of image A matches the 10th pixel of scan line 10 of image B, the disparity is computed as 15 - 10 = 5. Note that when the left and right cameras 60 and 61 are positioned close together, pixels of foreground information in the image, e.g., a human face, will have a larger disparity than pixels of background information.
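The scan-line matching described above can be sketched as follows. The text does not say how a pixel is matched to its counterpart, so this sketch assumes a simple sum-of-absolute-differences (SAD) window search along the same scan line; the function name, window size, search range, and sample rows are all hypothetical.

```python
def scanline_disparity(left_row, right_row, x_left, half_window=2, max_disparity=16):
    """Estimate the disparity of pixel x_left on one scan line of image A.

    A small window around x_left is compared, by sum of absolute differences,
    against windows shifted leftward on the same scan line of image B.
    The SAD criterion is an assumption; the source does not name a matching cost.
    """
    lo, hi = x_left - half_window, x_left + half_window + 1
    if lo < 0 or hi > len(left_row):
        raise ValueError("matching window does not fit on the scan line")
    patch = left_row[lo:hi]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if lo - d < 0:
            break  # the shifted window would leave the image
        candidate = right_row[lo - d:hi - d]
        cost = sum(abs(a - b) for a, b in zip(patch, candidate))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic scan line 10: the same intensity pattern is centered on pixel 15
# in image A and on pixel 10 in image B.
pattern = [10, 40, 90, 40, 10]
row_a = [0] * 32
row_b = [0] * 32
row_a[13:18] = pattern
row_b[8:13] = pattern
disparity = scanline_disparity(row_a, row_b, x_left=15)  # 15 - 10 = 5
```

Restricting the search to the same scan line keeps the search one-dimensional, which is what makes the per-pixel comparison affordable.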

A disparity map based on the disparity computations may be stored in the memory 40. Each scan line (or column) of the image has a profile consisting of the disparity for each pixel in that scan line (or column). In this embodiment, the grayscale level of each pixel represents the magnitude of the disparity computed for that pixel. The darker the grayscale level, the lower the disparity.

A disparity threshold may be chosen, e.g., 10: any disparity at or above the disparity threshold indicates that the pixel is foreground information (i.e., the object 62), while any disparity below 10 indicates that the pixel is background information. The selection of the disparity threshold is based in part on the camera distances discussed above. For example, a lower disparity threshold may be used if the object 62 is positioned at a greater distance from the cameras 60 and 61, and a higher disparity threshold may be used if the cameras 60 and 61 are farther apart from each other.
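A minimal sketch of the thresholding and grayscale encoding described above, assuming the disparity map is held as a list of rows of per-pixel disparities. The function names and the linear 0 to 255 scaling are illustrative assumptions; the text only states that darker levels mean lower disparity.

```python
def foreground_mask(disparity_map, threshold=10):
    """Mark a pixel as foreground when its disparity is at or above the threshold."""
    return [[d >= threshold for d in row] for row in disparity_map]


def to_grayscale(disparity_map, max_disparity):
    """Encode disparity as a gray level so that darker pixels mean lower disparity.

    The linear mapping onto 0..255 is an assumed convention.
    """
    return [
        [round(255 * min(d, max_disparity) / max_disparity) for d in row]
        for row in disparity_map
    ]


# Hypothetical 2 x 3 disparity map.
disparities = [[2, 9, 10], [15, 0, 12]]
mask = foreground_mask(disparities, threshold=10)
gray = to_grayscale(disparities, max_disparity=15)
```

In practice the threshold would be tuned to the camera spacing and subject distance, as the preceding paragraph notes.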

The disparity map is used to extract the coordinates or positions of the facial features from the left and right frames 14 and 15. Preferably, the feature extraction determiner 11 comprises the systems and methods described in US patent application 08/385,280, filed August 30, 1999. Preferably, the facial feature positions include positions for the eyes, nose, and mouth as well as the outline position of the head. With reference to FIG. 1, these positions correlate to various vertices of the face model 100. For example, with regard to the nose, the facial feature extraction determiner preferably provides information directly related to vertices 4, 5, 23, and 58 as shown in FIG. 1.

The feature extraction determiner 11, however, provides only the X and Y coordinates of the facial features. The feature correspondence matching unit 13 provides the Z coordinate. Preferably, the feature extraction determiner 11 uses a triangulation procedure based on inferring the position of a three-dimensional point from its perspective projections on the left and right stereo image frames 14 and 15. For example, given the X and Y coordinates of the feature points F_L and F_R in the left and right frames 14 and 15, the three-dimensional surface (i.e., the Z or depth information) can be determined by the following equation.

[Equation]

$$Z = \frac{f \, b}{\lvert F_L - F_R \rvert}$$

where the distance f (shown in FIG. 5) is the focal length of the cameras 60 and 61,

the distance b (shown in FIG. 5) is the baseline distance between the cameras 60 and 61, and

|F_L - F_R| is the disparity computed as described above.

In this embodiment, the above equation gives the relationship between the surface Z and the disparity under certain geometric conditions. In particular, the image plane in front of each camera is at the focal length f, and the two cameras are oriented identically, with the X axis of the camera reference frame directed along the line defined by the positions of the cameras 60 and 61. The focal lengths of the cameras 60 and 61 are assumed to be the same. It is also assumed that any geometric distortion of the lenses of the cameras 60 and 61 has been compensated. Other geometric arrangements may also be used; the relationship between the disparity and the surface Z, however, then becomes more complex.
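As a worked illustration of the relationship Z = f*b/|F_L - F_R|, the sketch below assumes the parallel-camera geometry described above, with the focal length expressed in pixels so that Z comes out in the units of the baseline. The function name and the numeric values are hypothetical.

```python
def depth_from_disparity(x_left, x_right, focal_length, baseline):
    """Return Z = f * b / |F_L - F_R| for one matched feature point.

    x_left and x_right are the X coordinates (in pixels) of the same feature
    in the left and right frames; focal_length is assumed to be in pixels and
    baseline in any length unit, which is then also the unit of Z.
    """
    disparity = abs(x_left - x_right)
    if disparity == 0:
        raise ValueError("zero disparity: depth is unbounded")
    return focal_length * baseline / disparity


# Hypothetical numbers: f = 500 pixels, b = 6 inches, disparity = 15 - 10 = 5 pixels.
z_near = depth_from_disparity(15, 10, focal_length=500.0, baseline=6.0)
# A smaller disparity places the point farther from the cameras.
z_far = depth_from_disparity(12, 10, focal_length=500.0, baseline=6.0)
```

The inverse relationship is the reason foreground pixels, which lie closer to the cameras, show larger disparities than background pixels.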

The other vertices of the face model 100 shown in FIG. 1 can be interpolated and extrapolated based on the positions from the feature extraction determiner 11 (i.e., the facial feature vertex information) and the determinations of the feature correspondence matching unit 13. The interpolation may be based on a linear, non-linear, or scalable model or function. For example, a vertex lying between two other known vertices may be determined using a predetermined parabolic function that all three vertices satisfy. Other face models having additional vertices may also be used to provide enhanced or improved modeling results.
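One way to read the parabolic example above is sketched below, for a single depth profile Z(x). The text does not specify the parabolic function, so this sketch assumes a parabola that passes through the two known vertices and whose leading coefficient (here called `curvature`) is fixed in advance; all names and values are hypothetical.

```python
def parabolic_vertex_z(p0, p1, x, curvature):
    """Interpolate Z at position x between two known vertices p0 and p1.

    p0 and p1 are (x, z) pairs. The result lies on the parabola
        z(x) = line_through_p0_p1(x) + curvature * (x - x0) * (x - x1),
    which passes through both known vertices for any predetermined curvature.
    """
    (x0, z0), (x1, z1) = p0, p1
    if x0 == x1:
        raise ValueError("the two known vertices must have distinct x positions")
    t = (x - x0) / (x1 - x0)
    linear = z0 + t * (z1 - z0)
    return linear + curvature * (x - x0) * (x - x1)


# Hypothetical known vertices and a predetermined curvature of -0.5.
known_a, known_b = (0.0, 1.0), (4.0, 3.0)
z_mid = parabolic_vertex_z(known_a, known_b, x=2.0, curvature=-0.5)
```

Setting `curvature` to zero reduces the sketch to the linear interpolation that the paragraph also allows.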

The face model 100 shown in FIG. 1 is a generic face with a neutral expression. The control of the face model 100 is scalable. The face model 100 template may be stored or loaded at the remote site before any communication is initiated. Using the extracted facial features, the polygon vertices can be adjusted to match a particular person's face more closely. In particular, based on the information and processing performed by the feature correspondence matching unit 13 and the feature extraction determiner 11, the face model 100 template is adapted to enable movement, expressions, and animation, and to synchronize audio (i.e., speech) with the picture. In essence, the generic face model 100 is dynamically transformed in real time into a particular face. Real-time or non-real-time transmission of the model face parameters/data provides low-bit-rate animation of a synthetic facial model. Preferably, the data rate is 64 Kbit/s or less; for moving images, however, data rates between 64 Kbit/s and 4 Mbit/s are also acceptable.

In another embodiment, the temporal information 12 is used to verify the feature matching results from the feature correspondence matching unit 13 and/or to perform an alternative feature matching process. In this embodiment, for example, matching is performed by the feature correspondence matching unit 13 only on selected frames, preferably "key" frames (in the MPEG format). Once the features in a key frame have been matched, the correspondence (i.e., depth) of the features in other non-key frames (or other key frames) can be determined by tracking the corresponding feature points over time. Given an initial feature correspondence, the three-dimensional motion can be determined from two views up to a scale factor in the translation direction (i.e., the temporal information 12 may consist of two left or two right sequential or consecutive frames). Preferably, the feature correspondence matching unit 13 is used to periodically feature-match other key frames in order to eliminate any error that builds up from the temporal feature matching. The feature correspondence matching unit 13 may be configured to perform both feature correspondence matching and temporal feature matching as needed. Temporal feature matching can be performed faster than feature correspondence matching, which is advantageous for real-time processing.
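The key-frame strategy described above can be sketched as a simple scheduling loop: disparity-based matching runs on key frames, and the cheaper temporal tracking carries the result across the frames in between. The function names, the every-third-frame key schedule, and the stub matchers are assumptions made only for illustration.

```python
def process_sequence(frame_pairs, is_key_frame, stereo_match, temporal_track):
    """Run stereo matching on key frames and temporal tracking on the others.

    stereo_match(left, right) and temporal_track(previous_features, left) are
    caller-supplied callables; both are hypothetical stand-ins for the
    feature correspondence matching and temporal feature matching steps.
    """
    features = None
    results = []
    for index, (left, right) in enumerate(frame_pairs):
        if features is None or is_key_frame(index):
            features = stereo_match(left, right)  # fresh depth, clears accumulated error
            source = "stereo"
        else:
            features = temporal_track(features, left)  # faster, but error can build up
            source = "temporal"
        results.append((source, features))
    return results


# Stub matchers: tracking adds 1.0 of drift per frame, stereo matching resets it.
frames = [(i, i) for i in range(5)]
results = process_sequence(
    frames,
    is_key_frame=lambda i: i % 3 == 0,
    stereo_match=lambda left, right: {"nose_depth": 100.0},
    temporal_track=lambda prev, left: {k: v + 1.0 for k, v in prev.items()},
)
sources = [source for source, _ in results]
depths = [features["nose_depth"] for _, features in results]
```

The stub drift makes the trade-off visible: the tracked estimate wanders between key frames and is pulled back whenever stereo matching runs again.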

The invention has numerous applications in fields such as video conferencing and animation/simulation of real objects, or in any application in which object modeling is required. For example, typical applications include video games, multimedia creations, and improved navigation over the Internet.

In addition, the invention is not limited to three-dimensional face models. The invention may be used with models of other physical objects and scenes, such as three-dimensional models of automobiles and rooms. In this embodiment, the feature extraction determiner 11 gathers position information related to the particular object or scene in question, e.g., the position of the wheels or the location of the furniture. Further processing is then based on this information.

While the present invention has been described above in terms of specific embodiments, it is to be understood that the invention is not intended to be confined or limited to the embodiments disclosed herein. For example, the invention is not limited to any specific type of filtering or mathematical transformation, or to any particular input image scale or object. On the contrary, the present invention is intended to cover the various structures and modifications thereof included within the spirit and scope of the appended claims.

Claims

An image processing apparatus (10), comprising:

at least one feature extraction determiner (11) configured to extract feature position information from a pair of input image signals (14, 15); and

a matching unit (13), coupled to said feature extraction determiner (11), arranged to match corresponding features in the input image signals (14, 15) in accordance with the feature position information and disparity information.

An image processing apparatus (10) according to claim 1, wherein the matching unit outputs three-dimensional (3D) information related to the input images (14, 15).

The image processing apparatus (10) of claim 2, wherein the three-dimensional surface information is based on a predetermined model (100).

An image processing apparatus (10) according to claim 4, wherein said predetermined model is a human face model (100).

An image processing apparatus (10) according to claim 1, wherein said matching unit performs a matching on at least one frame (14) of input image signals and a temporal feature matching (12) on at least one other frame.

An image processing apparatus (10) according to claim 6, wherein said temporal feature matching is performed using sequential input image frames.

An image processing apparatus (10) according to claim 6, wherein one frame is a key frame.

A method of determining parameters associated with a three-dimensional model (100), comprising the steps of:

extracting feature position information associated with a pair of input images (14, 15);

matching corresponding features in the pair of input images in accordance with the extracted feature position information and disparity information; and

determining the parameters for the three-dimensional model in accordance with the result of the feature correspondence matching.

The method of claim 9, wherein the three-dimensional model is a human face model (100).

A method of coding an object in a digital image for transmission, said method comprising:

extracting feature position information associated with at least one pair of digital images (14, 15);

matching corresponding features in the pair of digital images in accordance with the extracted feature position information and disparity information; and

coding the information for transmission in accordance with the result of the feature correspondence matching.

The method of claim 11, wherein the information for transmission includes parameters related to the three-dimensional model (100).