KR20090093904A - Apparatus and method for scene variation robust multimedia image analysis, and system for multimedia editing based on objects

Info

Publication number: KR20090093904A
Application number: KR1020090017721A
Authority: KR
Inventors: 안기옥; 이정헌
Original assignee: 미디어코러스 주식회사
Priority date: 2008-02-28
Filing date: 2009-03-02
Publication date: 2009-09-02

Abstract

The present invention relates to a multimedia image analysis apparatus and method robust to scene changes, and to an object-based editing system using the same. When analyzing and editing a multimedia image such as a still image or a moving image, the image is divided into fixed image units (scenes or video shots), a representative frame is selected for each divided unit, and the multimedia image is then analyzed and edited (for example, object classification, object search/tracking, and addition of object information) with the representative frames as the priority targets, thereby minimizing the time, effort, and cost required for analysis and editing.

To this end, the object-based multimedia editing system of the present invention comprises: multimedia image analysis means for dividing a multimedia image into predetermined image units, selecting a representative frame for each divided image, and analyzing the multimedia image with the selected representative frames as the priority targets; and management means for inputting object information for an object searched/tracked through the analysis by the multimedia image analysis means, or for performing scene/object editing through a user interface.

Description

Apparatus and method for multimedia image analysis robust to scene change, and object-based multimedia editing system using the same

The present invention relates to a multimedia image analysis apparatus and method robust to scene changes, and to an object-based multimedia editing system using the same. More particularly, when analyzing and editing a multimedia image such as a still image or a moving image, the invention divides the image into fixed image units (scenes or video shots), selects a representative frame for each divided unit, and then analyzes and edits the multimedia image (for example, object classification, object search/tracking, and addition of object information) with the representative frames as the priority targets, so that the time, effort, and cost required for analysis and editing can be minimized.

Recently, as demand for multimedia information has surged, a wide range of technologies for editing, searching, and objectifying multimedia content (for example, video) has been developed.

Such conventional video-related technologies are disclosed in detail in, for example, Korean Patent Laid-Open Publication No. 2000-0014421, 'Method for generating and updating linked-playback information for video data on a rewritable recording medium'; Korean Patent Laid-Open Publication No. 2000-0017815, 'Network-based video generation system and generation method thereof'; Korean Patent Laid-Open Publication No. 2000-49833, 'Method for producing a digital video album via the Internet'; Korean Patent Laid-Open Publication No. 2000-58241, 'Video editing system and object information service method using the same'; and Korean Patent Laid-Open Publication No. 2000-58970, 'Method for providing and searching video information'.

However, most of these conventional techniques merely make piecemeal use of arbitrarily given video information, so using them alone imposes many limits on giving interactive functionality to that video information.

To overcome these limitations, the Moving Picture Experts Group (MPEG) has proposed, through standards such as MPEG-4 BIFS (Binary Format for Scenes) and MPEG-4 LASeR (Lightweight Application Scene Representation), methods for attaching event information at the time a video is served.

However, the MPEG standards do not specify how objects are to be separated from video or other multimedia data, or how events are to be attached to the separated objects. Video therefore ends up being edited frame by frame, which makes adding interactive functionality very costly.

Meanwhile, Korean Patent Laid-Open Publication No. 2002-0063754, 'Multimedia editing tool and multimedia editing method using the same', proposed a simpler method. However, because most multimedia content contains numerous scene changes, many errors occur in object tracking, and video editing consequently requires considerable cost, effort, and time.

Editing that gives interactive functionality to video and multimedia data generally consists of three processes. The first is selecting and separating the position of an object, the second is inputting object information, and the third is inspection. Each process is described below.

The first process (selecting and separating the position of an object) selects the position and region of each semantic object required in the input video and multimedia data. The position and region must carry information that changes continuously over the time during which the object appears.

The second process (inputting object information) extracts meaningful objects according to the editor's intent and inputs information that allows other data (for example, other video data, other audio data, or metadata such as URL link information) to be linked to the extracted object in real time.

The third process (inspection) verifies the results of the two preceding processes.

Of these three editing processes, the first, selecting and separating the position of an object, consumes the most effort, time, and cost. There is therefore a pressing need to automate it with recognition-based image processing algorithms such as scene recognition, object recognition (for example, face recognition), and automatic object tracking. The efficiency of the object information input and inspection processes also needs to be improved.

In particular, conventional multimedia image analysis techniques treat all frames of a multimedia image equally. Searching for or detecting a specific object therefore takes considerable time, effort, cost, and computation, and editing the multimedia image is correspondingly expensive.

Accordingly, an object of the present invention is to provide a multimedia image analysis apparatus and method robust to scene changes, and an object-based multimedia editing system using the same, which minimize the time, effort, and cost required for analysis and editing by simplifying and automating the analysis and editing of multimedia images such as video.

The objects of the present invention are not limited to those mentioned above. Other objects and advantages of the present invention that are not mentioned can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

To achieve the above object, the present invention divides a multimedia content image into predetermined image units (such as scenes or shots), selects a representative frame for each divided unit, and then performs object analysis/editing (for example, object classification, object search/tracking, and addition of object information) with the selected representative frames as the priority targets.
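The representative-frame-first idea described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how unit boundaries are found or how a representative frame is chosen, so the histogram-difference cut detector, the `threshold` value, the middle-frame selection rule, and every name in the block are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy representation: a frame is a flat sequence of 0-255 grey levels.
Frame = Sequence[int]


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Normalised grey-level histogram of one frame."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def split_into_shots(frames: List[Frame], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive.

    Assumed rule: a new unit starts where the L1 distance between the
    histograms of consecutive frames exceeds `threshold` (range 0 to 2).
    """
    if not frames:
        return []
    shots, start = [], 0
    prev = histogram(frames[0])
    for i in range(1, len(frames)):
        cur = histogram(frames[i])
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            shots.append((start, i))
            start = i
        prev = cur
    shots.append((start, len(frames)))
    return shots


def representative_frames(shots: List[Tuple[int, int]]) -> List[int]:
    """Pick the middle frame index of each unit (an assumed selection rule)."""
    return [(start + end - 1) // 2 for start, end in shots]


# Three dark frames followed by four bright frames form two units.
frames = [[10] * 16] * 3 + [[200] * 16] * 4
shots = split_into_shots(frames)
reps = representative_frames(shots)
```

Only the frames listed in `reps` would be passed to the expensive object search first; the remaining frames of each unit are handled afterwards.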

The present invention is also characterized in that a different object detection/tracking scheme is applied according to the type of object (for example, a human face or a thing).

The present invention is further characterized in that a feature-based object detection scheme is used for object detection (extraction) in a representative frame, and a block-based object tracking scheme is used for object tracking in the individual frames belonging to that representative frame.
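Block-based tracking of the kind mentioned above can be illustrated with a plain block-matching search. This is a minimal sketch, assuming the object region found in the representative frame serves as a template that is located in a neighbouring frame by minimising the sum of absolute differences (SAD) inside a small search window. The source does not name a matching criterion, so SAD, the `radius` parameter, and all names are assumptions.

```python
from typing import List, Tuple

# Hypothetical toy representation: a grey image as rows of 0-255 values.
Image = List[List[int]]


def sad(image: Image, template: Image, top: int, left: int) -> int:
    """Sum of absolute differences between the template and an image block."""
    return sum(
        abs(image[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def track_block(image: Image, template: Image,
                prev_top: int, prev_left: int, radius: int = 2) -> Tuple[int, int]:
    """Find the block position near (prev_top, prev_left) with the lowest SAD.

    Assumes the image is at least as large as the template.
    """
    height, width = len(template), len(template[0])
    max_top = len(image) - height
    max_left = len(image[0]) - width
    best = None
    for top in range(max(0, prev_top - radius), min(max_top, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius), min(max_left, prev_left + radius) + 1):
            cost = sad(image, template, top, left)
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best[1], best[2]


# The object was at (2, 1) in the representative frame and has moved to (3, 2).
template = [[9, 8], [7, 6]]
frame = [[0] * 6 for _ in range(6)]
frame[3][2], frame[3][3] = 9, 8
frame[4][2], frame[4][3] = 7, 6
position = track_block(frame, template, prev_top=2, prev_left=1)
```

Restricting the search to a window around the previous position keeps the per-frame cost low, which is one reason a cheaper block-based scheme is plausible for the non-representative frames.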

More specifically, the present invention provides a multimedia image analysis apparatus robust to scene changes, comprising: scene segmentation means for dividing a multimedia image into predetermined image units and selecting a representative frame for each divided image; and object search means for classifying the selected representative frames according to whether their objects are identical, based on the central object type.

The present invention also provides a multimedia image analysis apparatus robust to scene changes, comprising: scene segmentation means for dividing a multimedia image into predetermined image units and selecting a representative frame for each divided image; and object search means for searching the selected representative frames for a search target object.

The present invention also provides a multimedia image analysis method robust to scene changes, comprising: a dividing step of dividing a multimedia image into predetermined image units; a representative selection step of selecting a representative frame for each divided image; and a classification step of classifying the selected representative frames according to whether their objects are identical, based on the central object type.

The present invention also provides a multimedia image analysis method robust to scene changes, comprising: a dividing step of dividing a multimedia image into predetermined image units; a representative selection step of selecting a representative frame for each divided image; and an object search step of searching the selected representative frames for a search target object.

The present invention also provides an object-based multimedia editing system, comprising: multimedia image analysis means for dividing a multimedia image into predetermined image units, selecting a representative frame for each divided image, and analyzing the multimedia image with the selected representative frames as the priority targets; and management means for inputting object information for an object searched/tracked through the analysis by the multimedia image analysis means, or for performing scene/object editing through a user interface.

The present invention as described above simplifies and automates the image analysis/editing process through segmentation into scenes/shots, and thereby minimizes the cost, time, and effort required for image analysis and editing.

That is, when analyzing/editing a multimedia content image, the present invention takes the characteristics of scenes and video shots into account, selects a representative frame for each fixed image unit (scene, video shot, or the like), and analyzes/edits the input image with the representative frames as the priority targets, thereby minimizing the cost, time, and effort required.

In addition, by analyzing/editing a content image file grouped into scenes/shots, the present invention minimizes errors in the analysis/editing process and allows any errors that do occur to be corrected quickly and easily.

In addition, the present invention provides a variety of user interface functions, such as automating the input of object information, arranging objects by time or by object characteristic, and distinguishing whether objects are identical using different identification marks (such as colors), so that an editor can analyze/edit an input image file easily and quickly.

In addition, by minimizing the cost of producing content for interactive services through simple and fast analysis/editing of video content, the present invention makes it possible to create a variety of services (user-participation interactive services) and revenue models for IPTV, interactive TV, interactive UCC, Internet TV, and various VoD systems. It also lays a foundation for the next-generation semantic web to extend beyond text to a wide range of multimedia content.

FIG. 1 is an overall conceptual diagram of an object-based multimedia editing service according to the present invention;

FIG. 2 is an explanatory diagram of the basic reference units referred to by the editing unit according to the present invention;

FIG. 3 is a block diagram of an embodiment of an object-based multimedia editing system according to the present invention;

FIG. 4 is a detailed block diagram of an embodiment of the multimedia image analysis apparatus of FIG. 3 according to the present invention;

FIG. 5 is a flowchart of an embodiment of an object-based multimedia editing method according to the present invention;

FIG. 6 is a flowchart of an embodiment of an object detection method using DB information according to the present invention;

FIG. 7 is a flowchart of an embodiment of a method for object extraction and object region designation in a representative frame according to the present invention;

FIG. 8 is a flowchart of an embodiment of an object information storage process according to the present invention;

FIG. 9 is a screen layout for scene segmentation according to the present invention;

FIG. 10 is a screen layout for work and partial verification according to the present invention;

FIG. 11 is a screen layout for feature-based object detection/tracking according to the present invention;

FIG. 12 is a screen layout for block-based object tracking/verification according to the present invention;

FIG. 13 is a layout of an object information input interface screen according to the present invention;

FIG. 14 is a layout of a shot view screen according to the present invention;

FIG. 15 is a layout of an object management screen according to the present invention;

FIG. 16 is a layout of an object tracking information screen according to the present invention;

FIG. 17 is an example screen layout for the scene segmenter, shot view, and face grouping according to the present invention;

FIG. 18 is another example of a screen layout for work and partial verification according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

100: object-based multimedia editing system
30: control management unit
31: data management unit
32: multimedia image analysis apparatus
33: user interface management unit
35: input/output management unit
36: object/object information DB
303: frame extraction unit
311: scene and video shot management unit
312: object information management unit
313: per-object feature extraction function management unit
321: scene segmenter
322: face detector
323: face recognizer
324: non-face object detector
325: non-face object recognizer
326: object tracker
331: object information input unit
332: editing unit
40: object searcher

The present invention relates to a multimedia authoring tool, that is, to multimedia image analysis/editing (for example, object classification, object search/tracking, and addition of object information). By tightly linking a series of analysis/editing modules, it allows an arbitrarily given original multimedia object (for example, some cells of an original video) to be linked in real time, at the editor's choice, to other data (for example, other still images, video data, audio data, or URL data), or allows other events to be added.

That is, the present invention concerns the development of a multimedia analyzer/editor for analyzing/editing content that carries events, so that selecting an object such as a specific person, item, or material in a video or still image can trigger an action similar to hypertext.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that those of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, a detailed description of related known technology is omitted where it is judged that it would unnecessarily obscure the gist of the invention. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram of an object-based multimedia editing service according to the present invention.

Video media (multimedia images, multimedia content images) consist of many different scenes. Objects appear and disappear across scenes without any regular pattern, and their shape and size vary with the angle and distance of the camera and with the surroundings of the object. A person (object), for example, appears with a different shape and size depending on the clothes or accessories worn.

In particular, objects in three-dimensional space are mapped to two-dimensional information by the camera and may occlude one another, so the shape and size in which an object actually appears vary greatly. Moreover, the number of objects of interest within a single video depends on how the service is planned and is very difficult to estimate in advance.

In this environment, the present invention extracts and tracks objects of interest with minimal editor interaction and, by means of automated object tracking, measures and stores the exact changes of an object of interest even when it appears in non-contiguous scenes. That is, by using scene segmentation, object detection, object recognition, face recognition, face region detection, object tracking, object region extraction, and the like, the present invention enables an editor to produce the desired content with minimal interaction.

Hereinafter, the present invention is described with reference to FIG. 1.

The object-based multimedia editing system 100 according to the present invention divides an input file (multimedia content image) into fixed image units (for example, scenes or video shots) in consideration of scene characteristics and groups them, selects a representative frame (representative screen) for each group, and then performs object search (object detection and object recognition) on the selected representative frames. The detailed configuration of the object-based multimedia editing system 100 is described with reference to FIGS. 3 and 4.

The image unit (scene segmentation unit) may be set to the scene, but for more accurate object search it is preferable to set it to the video shot. (A scene generally consists of several video shots, although in some cases the scene itself is a single video shot.) When the image unit is the video shot, scene segmentation is performed first for the editor's convenience, and the multimedia image within each segmented scene is then divided into shots.

Scenes and (video) shots are examined in more detail below. A planned, produced video is usually shot with many cameras and then edited into a single video. Suppose three cameras cover an event taking place at one time and place, with two characters, A and B. Camera 1 films character A, camera 2 films character B, and camera 3 films characters A and B together. The resulting video shows an event at a single place and time, but camera switches divide it into sets of similar images. The set of similar images in which character A appears through camera 1 is defined as a "shot" (video shot), and the set of such shots covering the event at the same place and time is defined as a "scene".

That is, a scene normally consists of several shots, and a shot consists of several image frames. Depending on the characteristics of the content, however, a scene may itself be a single shot, or an individual frame may itself be a shot.
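The scene, shot, and frame hierarchy defined above maps naturally onto a small nested data structure. The following is a minimal sketch; the class and field names are hypothetical, and storing one representative frame index per shot is an assumption that matches the case where the image unit is the video shot.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shot:
    """A run of similar frames, e.g. the output of one camera between cuts."""
    frame_indices: List[int]
    representative: int  # index of the representative frame of this shot


@dataclass
class Scene:
    """Shots that cover the same event at the same place and time."""
    shots: List[Shot] = field(default_factory=list)

    def representative_frames(self) -> List[int]:
        """Frame indices to analyse first, one per shot."""
        return [shot.representative for shot in self.shots]


# The three-camera example: character A, character B, then both together.
# The last shot is a degenerate one-frame shot, which the text also allows.
scene = Scene(shots=[
    Shot(frame_indices=[0, 1, 2], representative=1),
    Shot(frame_indices=[3, 4], representative=3),
    Shot(frame_indices=[5], representative=5),
])
priority = scene.representative_frames()
```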

Next, information about a specific object (object information) is considered. Normally, which objects in the original image (input file) require which information must already be planned before multimedia authoring/editing begins. The planned service information (for example, when the object is a product such as "glasses": product information, price, where to buy, purchase site, and so on) is managed systematically through the service information manager 111 and is input as metadata of a specific object at the request of the user (editor). That is, the pre-planned service information is linked to the corresponding object by the object information input unit ("331" in FIG. 3).

In addition, entering detailed information through the object information input unit 331 increases the cost of multimedia editing.

If the object-based multimedia editing system 100 stores and manages detailed object information in its own database (for example, an object/object information DB), then when an object is judged during object search to be an identical object (that is, an object whose information is already stored), the detailed information held in the existing object/object information DB ("36" in FIG. 3) is input as is. In other words, object information is input automatically for that object.

When an object is recognized as new during object search, editing cost can be reduced by finding the object information through a search of the web site 112 and inputting it.
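The two cases above (a known object filled in from the object/object information DB, a new object looked up on the web) amount to a lookup with a fallback. The sketch below assumes the DB can be modelled as a dictionary keyed by an object identifier and that the web search is supplied as a callable; `resolve_object_info`, `web_lookup`, and the sample fields are hypothetical names, not part of the source.

```python
from typing import Callable, Dict, Optional

ObjectInfo = Dict[str, str]


def resolve_object_info(
    object_id: str,
    db: Dict[str, ObjectInfo],
    web_lookup: Optional[Callable[[str], Optional[ObjectInfo]]] = None,
) -> Optional[ObjectInfo]:
    """Return object information for a recognised object.

    Known objects are answered from the DB; new objects fall back to the
    optional web lookup. None means the editor must enter the data by hand.
    """
    if object_id in db:
        return db[object_id]
    if web_lookup is not None:
        return web_lookup(object_id)
    return None


# Hypothetical sample data for a product object.
db = {"glasses": {"price": "100", "purchase_site": "example.com"}}
known = resolve_object_info("glasses", db)
unknown = resolve_object_info("hat", db)
found_online = resolve_object_info("hat", db, web_lookup=lambda key: {"name": key})
```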

The service information managed by the service information manager 111 may be delivered through the information server 160 to the e-commerce server 170 and used to provide services. That is, the information server 160 may provide an Internet shopping service in cooperation with the e-commerce server 170.

Meanwhile, the object metadata generated as a result of object tracking and editing by the object-based multimedia editing system 100 is converted into BIFS, LASeR, or the like according to the service specification.

The multiplexing device (multiplexer) 130 multiplexes the data encoded by the audio/video encoder 120 with the converted metadata.

The data output from the multiplexing device (multiplexer) 130 is ultimately delivered through the streaming server 140 to the output device 150 (for example, a set-top box or a user terminal). Here, the demultiplexing device (demultiplexer) of the set-top box plays the content with the metadata in a state in which user interaction can be applied, and when user interaction occurs, it requests and receives specific information from the information server 160 over the backbone channel.

FIG. 2 is an explanatory diagram of the basic reference unit referred to by the editing unit according to the present invention. It shows which basic reference unit 20 is referenced by the object-based multimedia editing system 100, which corresponds to the editing unit 22.

The basic reference unit 20 includes a video parser 201, a decoder 202, a high-speed low-level image processor 203, a high-speed mid-level image processor 204, a high-level image processor 205, an image file processor 206, a video player 207, an encoder 208, and so on. Since the function of each of these processors is generally known, a detailed description is omitted.

The object-based multimedia editing system 100 according to the present invention corresponds to the editing unit 22 and uses the basic reference unit 20 when performing object tracking and editing.

Meanwhile, the output unit 24 includes service transport stream (Service TS) data, the object/object information DB, and metadata. Here, the object/object information DB may also be regarded as part of the object-based multimedia editing system 100.

FIG. 3 is a block diagram of an embodiment of the object-based multimedia editing system according to the present invention.

As shown in FIG. 3, the object-based multimedia editing system 100 according to the present invention includes a control manager 30, a data manager 31, a multimedia image analysis apparatus 32, a user interface manager 33, an input/output manager 35, an object/object information DB 36, and so on. Here, the control manager 30, the data manager 31, the user interface manager 33, and the input/output manager 35 may be collectively referred to as a single "management unit". In particular, the management unit can be said to enter the corresponding object information for objects searched for and tracked through the analysis of the multimedia image analysis apparatus 32, or to perform scene/object editing and the like through the user interface.

Each component is described in detail below.

First, the control manager 30 is described. The control manager 30 includes an object processing controller 301, a codec manager 302, a frame extractor 303, and an object/object information DB manager 304.

The object processing controller 301 performs the overall control functions related to object processing, such as object detection, recognition, and tracking. That is, the object processing controller 301 controls the frame extractor 303 and the object/object information DB manager 304; controls the scene and video shot manager 311, the per-object feature extraction function manager 313, the multimedia image analysis apparatus 32, and so on, so that object detection/recognition and object region extraction are performed smoothly; and interworks with the user interface manager 33 so that user interactivity can be achieved.

To process the input file (video), the codec manager 302 manages combinations of the detailed functional units of the basic reference unit 20, for example the video parser 201, the decoder 202, and the encoder 208.

The frame extractor 303 extracts frame-unit images (hereinafter simply "frames") from the multimedia content video corresponding to the input file and passes them to the scene and video shot manager 311 and the per-frame individual object information display unit 341. For frame-by-frame tracking or verification, the frame extractor 303 also passes the corresponding frame list to the scene and video shot manager 311 and/or the per-frame individual object information display unit 341.

The object/object information DB manager 304 manages the object/object information DB 36 and interworks with the object processing controller 301 or the object information structuring unit 313. In particular, the object/object information DB manager 304 converts the information stored in the object information structuring unit 313 into database form for output and management.

Next, the data manager 31 is described. The data manager 31 includes a scene and video shot manager 311, an object information structure manager 312, an object information structuring unit 313, and a per-object feature extraction function manager 314.

The scene and video shot manager 311 separately stores and manages the scenes and video shots divided by the scene divider 321, which increases the speed of object detection while minimizing the number of objects that go undetected. The scene and video shot manager 311 also stores and manages the representative frame selected by the scene divider 321 for each scene or video shot.

The object information manager 312 structures, stores, and manages object information as a whole. It stores and manages the object information entered through the object information input unit 331, and this information is output through the per-frame individual object information display unit 341, the editing information display unit 353, and the metadata output unit 354. Here, the object information includes quantitative information, relational information, the region within a frame, the frame number, feature vectors, event information, depth information of the object, and so on.
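To make the listed items of object information concrete, the following is a minimal sketch of one possible record layout. The class name `ObjectRecord`, its field names, and the helper `frames_of` are hypothetical and chosen only for illustration; the specification does not prescribe a data layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectRecord:
    """One appearance of an object in one frame (illustrative field names)."""
    object_id: str
    frame_number: int
    region: Tuple[int, int, int, int]  # x, y, width, height within the frame
    feature_vector: List[float] = field(default_factory=list)
    quantitative: Dict[str, str] = field(default_factory=dict)  # e.g. name, price
    relations: List[str] = field(default_factory=list)  # ids of related objects
    events: List[str] = field(default_factory=list)  # e.g. "click"
    depth: Optional[int] = None  # assumed meaning: front-to-back ordering


def frames_of(records: List[ObjectRecord], object_id: str) -> List[int]:
    """Frame numbers in which the given object appears, in ascending order."""
    return sorted(r.frame_number for r in records if r.object_id == object_id)


records = [
    ObjectRecord("glasses_01", 42, (10, 20, 30, 15)),
    ObjectRecord("glasses_01", 40, (12, 22, 30, 15)),
    ObjectRecord("face_A", 40, (0, 0, 64, 64)),
]
print(frames_of(records, "glasses_01"))
```

Keeping one record per object per frame makes it straightforward to answer both "where is this object in frame N" and "in which frames does this object appear".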

The per-object feature extraction function manager 313 manages a different feature extraction function for each object type. So that different feature data and algorithms (algorithms for object detection, recognition, and so on) can be applied to recognize objects of each type, it records the kind of feature data and feature extraction algorithm for each object, thereby enabling accurate object recognition. For example, if the same feature data and algorithm were applied to both general object recognition and face recognition, recognition itself could become impossible. It is therefore preferable to apply different algorithms and different feature data according to the object type.
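The idea of keeping a separate feature extraction function per object type can be sketched as a small registry. The names `register` and `extract`, the type keys, and both placeholder extractors are assumptions for illustration only; they are deliberately trivial and do not represent real face or object features.

```python
from typing import Callable, Dict, List

Patch = List[List[int]]  # grayscale image patch as rows of pixel values
Extractor = Callable[[Patch], List[float]]

_EXTRACTORS: Dict[str, Extractor] = {}


def register(object_type: str) -> Callable[[Extractor], Extractor]:
    """Decorator that records the extractor to use for one object type."""
    def wrap(fn: Extractor) -> Extractor:
        _EXTRACTORS[object_type] = fn
        return fn
    return wrap


@register("face")
def face_features(patch: Patch) -> List[float]:
    # Placeholder feature: mean intensity of each row.
    return [sum(row) / len(row) for row in patch]


@register("non_face")
def non_face_features(patch: Patch) -> List[float]:
    # Placeholder feature: minimum and maximum intensity of the patch.
    flat = [p for row in patch for p in row]
    return [float(min(flat)), float(max(flat))]


def extract(object_type: str, patch: Patch) -> List[float]:
    """Apply the extractor registered for the given object type."""
    if object_type not in _EXTRACTORS:
        raise ValueError(f"no feature extractor registered for {object_type!r}")
    return _EXTRACTORS[object_type](patch)


patch = [[0, 2], [4, 6]]
print(extract("face", patch), extract("non_face", patch))
```

Adding a new object type (for example an animal face) then only requires registering another function, without touching the callers.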

Next, the multimedia image analysis apparatus 32 is described. The multimedia image analysis apparatus 32 includes a scene divider 321, a face detector 322, a face recognizer 323, a non-face object detector 324, a non-face object recognizer 325, an object tracker 326, and a region extractor 327. These are described in detail with reference to FIG. 4.

Next, the user interface manager 33 is described. The user interface manager 33 includes an object information input unit 331 and an editing unit 332, and the editing unit 332 includes an editing task controller 3321 and an editing task manager 3322.

The object information input unit 331 receives basic information about an object (for example, the object name) and link information from the user. It provides a user interface through which the structural information between objects, feature information, region information, and so on can be checked on screen and modified.

The editing task controller 3321 provides an interface through which the user can access the tools and solutions needed during editing. The editing task manager 3322 guards against editing mistakes by recording the progress of the editing work. That is, the editing task manager 3322 manages the editing history so that recorded editing steps can be undone and redone.
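An editing history that supports undo and redo is commonly kept as two stacks. The following is a minimal sketch under that assumption; the class name `EditHistory` and its methods are hypothetical, and the specification does not state how the editing task manager 3322 stores its history.

```python
from typing import List, Optional


class EditHistory:
    """Two-stack edit history: applied actions and undone actions."""

    def __init__(self) -> None:
        self._done: List[str] = []
        self._undone: List[str] = []

    def record(self, action: str) -> None:
        # A new action invalidates anything that was waiting to be redone.
        self._done.append(action)
        self._undone.clear()

    def undo(self) -> Optional[str]:
        # Move the latest applied action to the redo stack and return it.
        if not self._done:
            return None
        action = self._done.pop()
        self._undone.append(action)
        return action

    def redo(self) -> Optional[str]:
        # Re-apply the most recently undone action, if any.
        if not self._undone:
            return None
        action = self._undone.pop()
        self._done.append(action)
        return action

    def applied(self) -> List[str]:
        """Actions currently in effect, oldest first."""
        return list(self._done)


history = EditHistory()
history.record("add object glasses_01")
history.record("resize region of glasses_01")
history.undo()
print(history.applied())
```

Clearing the redo stack on every new action keeps the history linear, which is the simplest behavior for an editor to reason about.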

Next, the input/output manager 35 is described. The input/output manager 35 consists of a metadata output unit 354, a video rendering unit 351, an audio rendering unit 352, and an editing information display system 184.

The video rendering unit 351 renders the video data output from the codec manager 302, and the rendered screen is shared with the editing task controller 3322.

The audio rendering unit 352 renders the audio data output from the codec manager 302 and outputs it to a speaker.

The editing information display unit 353 displays special information about the editing process, thereby minimizing editing mistakes and guiding the user to the necessary editing steps.

The metadata output unit 354 generates metadata from the time, region, and event information of the objects stored in the object information structuring system, in conformance with whatever standard is required. Examples of such metadata standards include BIFS and LASeR.

Next, the verification manager 34 is described. The verification manager 34 includes a per-frame individual object information display unit 341 and a sequential object information display unit 342.

The per-frame individual object information display unit 341 displays the detection, extraction, and tracking results for each object so that the user can grasp them frame by frame at a glance, and so that any error that is present can be identified easily.

The sequential object information display unit 342 displays how the edited object information appears during playback of the input data, so that the user can grasp it at a glance.

FIG. 4 is a detailed block diagram of an embodiment of the multimedia image analysis apparatus of FIG. 3 according to the present invention.

The multimedia image analysis apparatus 32 according to the present invention, which is robust to scene changes, divides a multimedia image into predetermined image units, selects a representative frame for each divided unit, and analyzes the multimedia image with the selected representative frames as the first targets.

As shown in the figure, the multimedia image analysis apparatus 32 includes a scene divider 321, a face detector 322, a face recognizer 323, a non-face object detector 324, a non-face object recognizer 325, and an object tracker 326. Here, "322" to "325" are collectively referred to as the object searcher 40, "401" corresponds to the object detector, and "402" is referred to as the object recognizer. Although not shown in the figure, various object detectors and object recognizers may be included for each object type (human face, animal face, clothing, and so on) depending on the characteristics of the scene.

In the multimedia image analysis apparatus 32 shown in FIG. 3, the ancillary control functions related to object extraction (detection), recognition, tracking, region extraction, and so on (management of per-object feature extraction functions, output of multimedia analysis results, user interface functions, and the like) are implemented through separate managers according to their type. Depending on the embodiment, however, each functional means may itself carry out its related control functions, as shown in FIG. 4 (in view of the correspondence between FIGS. 3 and 4, the same reference numerals as in FIG. 3 are used). For example, the object tracker 326 may itself be configured to output its tracking results to the user.

The scene divider 321 divides the multimedia image into image units (scenes, shots) and selects a representative frame for each divided unit. More specifically, the scene divider 321 distinguishes between a video shot, which is a continuous series of frames recorded by one camera, that is, the physical basic unit of video delimited by shot boundaries, and a scene, which is a group of temporally adjacent shots that are semantically related. It then selects the representative frame (representative picture) that best reflects the content of each video scene and/or shot. Here, for each divided shot or scene, the frame that contains the most information among all frames belonging to that shot or scene may be selected as its representative frame. With this selection method, errors can be reduced even though the search is performed only on representative frames. The information in a frame may range from low-level information such as focus information to the number of objects (face, non-face) found in it.
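The selection rule "pick the frame with the most information in each shot" can be sketched as follows. The scoring formula (a weighted sum of a focus measure and the number of objects found) and all names are assumptions made for illustration; the specification only names focus information and object count as examples of frame information and does not define a score.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FrameInfo:
    frame_number: int
    shot_id: int
    focus: float       # assumed scale: 0.0 (blurred) to 1.0 (sharp)
    object_count: int  # faces plus non-face objects found in the frame


def information_score(f: FrameInfo, focus_weight: float = 1.0,
                      object_weight: float = 1.0) -> float:
    """Hypothetical score: weighted sum of focus and object count."""
    return focus_weight * f.focus + object_weight * f.object_count


def select_representatives(frames: List[FrameInfo]) -> Dict[int, int]:
    """Map each shot id to the frame number with the highest score."""
    best: Dict[int, FrameInfo] = {}
    for f in frames:
        current = best.get(f.shot_id)
        # Strict comparison keeps the earliest frame when scores tie.
        if current is None or information_score(f) > information_score(current):
            best[f.shot_id] = f
    return {shot: f.frame_number for shot, f in best.items()}


frames = [
    FrameInfo(0, 0, 0.2, 1),
    FrameInfo(1, 0, 0.9, 2),
    FrameInfo(2, 1, 0.5, 0),
    FrameInfo(3, 1, 0.4, 0),
]
print(select_representatives(frames))
```

In this example frame 1 represents shot 0 (sharper and with more objects) and frame 2 represents shot 1.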

The object searcher 40 includes an "object detector" 401, which detects (extracts) a specific object in an image frame (representative frame or individual frame), and an "object recognizer" 402, which determines whether the detected object is identical to the object to be compared against (the search target object, or object of interest). Here, the object detector 401 includes the face detector 322 and the non-face object detector 324 and, depending on the embodiment, may include an object detector suited to each object type. The object recognizer 402 is configured in the same manner as the object detector 401.

First, the object searcher 40 is described in general terms.

The object searcher 40 classifies the representative frames selected by the scene divider 321 according to object identity, using the central object type as the criterion. That is, when the central object type is "human face", the representative frames are classified on the basis of human faces. They may be classified into a group of representative frames containing the face of actor A, a group of representative frames containing the face of actor B, and so on. When there are multiple sets of feature data for the same actor (for example, the feature data may differ with shooting angle or facial expression), multiple representative frame groups may exist for the same actor.
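Grouping representative frames by face identity can be illustrated with a simple greedy scheme: each face feature vector joins the first group whose reference vector is similar enough, otherwise it starts a new group. The use of cosine similarity, the threshold value, and the function names are assumptions for this sketch; the specification does not name a similarity measure or a clustering method.

```python
import math
from typing import List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_frames(faces: List[Tuple[int, List[float]]],
                 threshold: float = 0.9) -> List[List[int]]:
    """Group (representative_frame_number, face_feature_vector) pairs.

    Returns a list of groups, each a list of frame numbers whose faces
    were judged to be the same under the assumed similarity threshold.
    """
    groups: List[Tuple[List[float], List[int]]] = []
    for frame_no, vec in faces:
        for ref, members in groups:
            if cosine(ref, vec) >= threshold:
                members.append(frame_no)
                break
        else:
            # No existing group matched: this face starts a new group.
            groups.append((vec, [frame_no]))
    return [members for _, members in groups]


faces = [
    (10, [1.0, 0.0]),
    (20, [0.0, 1.0]),
    (30, [0.99, 0.05]),
    (40, [0.1, 1.0]),
]
print(group_frames(faces))
```

With a stricter threshold, the same actor seen from a different angle would fall into a separate group, which matches the remark that one actor may have several representative frame groups.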

Here, for an object of the central object type that was subject to the identity determination (for example, a person's face), the object searcher 40 marks the object within the representative frame with an identification marker, using a different marker for each classification group. That is, it uses markers of a different color for each classification group, or markers of different shapes such as quadrangles and pentagons. The identification marker may be an image ("1402" in FIG. 14) in the form of a polygon (though not limited to this form), distinguished by color in such a way that the underlying object remains visible, and it is superimposed (overlaid) on the object to be identified (the face object).

In addition, for each classification group (classified representative frame group), the object searcher 40 searches the representative frames belonging to that group (one or more representative frames belonging to a specific representative frame group) for the search target object (for example, the face of a specific actor, or a specific item such as glasses of a particular shape), and structures the search results so that the user can view them. That is, it groups and manages the representative frames that contain the search target object. Depending on the embodiment, the search target object may be searched for directly in the representative frames selected by the scene divider 321, without performing the classification by central object type. Here, an object that has been recognized as identical to the search target object and thus found (the face of a specific person, a specific item, and so on) is marked within the representative frame with an identification marker, and the same marker is used for identical objects, in the same manner as described above. The search target object may be an object stored and managed in the database 36, or an object supplied by the user (editor) for the search.

The object searcher 40 may also apply different object search algorithms according to the type of the search target object. That is, a different algorithm may be applied depending on whether the search target object is a face object or a non-face object. For object search in representative frames, the object searcher 40 uses a feature-based object detection method.

Each component of the object searcher 40 is described separately below.

The face detector 322 extracts a person's face region from a given representative frame, and the face recognizer 323 compares facial feature vectors to determine whether two faces belong to the same person. Because the feature vectors (feature data) differ from one object type to another, it is preferable to apply different object detection and recognition algorithms for each object type. Faces in particular have characteristics that differ from those of other types of objects (for example, non-face objects such as a "car", "vase", "hand", or "clothing"), so it is preferable to distinguish at least between face and non-face objects and apply different algorithms to each.

The non-face object detector 324 finds the same object in the representative frame of a video shot or scene by using either a feature vector stored in advance in the object/object information DB 36 or the feature vector of an object extracted through user interaction. The non-face object recognizer 325, in turn, determines whether two objects being compared are the same object by comparing the extracted feature vectors.

Next, the object tracker 326 is described.

The object tracker 326 tracks the search target object through the individual frames associated with the representative frame in which the search target object was found. In particular, it uses a block-based object tracking method.

For an object detected in the representative frame of a scene or video shot, the object tracker 326 also locates the same object in every frame of the video shot or scene that contains that representative frame. That is, for the individual frames associated with the representative frame, the object tracker 326 obtains the temporal position of the search target object (the frame number) and its spatial position within each frame.
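One common form of block-based tracking is block matching: the pixel block of the object taken from the representative frame is compared against candidate positions in a neighboring frame, and the position with the smallest sum of absolute differences (SAD) is taken as the new location. The sketch below assumes this technique, small grayscale frames stored as nested lists, and a fixed search radius; the specification says only that a block-based method is used and does not give the matching criterion.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel values


def sad(frame: Frame, x: int, y: int, block: Frame) -> int:
    """Sum of absolute differences between block and frame at (x, y)."""
    return sum(
        abs(frame[y + j][x + i] - block[j][i])
        for j in range(len(block))
        for i in range(len(block[0]))
    )


def track_block(frame: Frame, block: Frame, prev_xy: Tuple[int, int],
                search: int = 2) -> Tuple[int, int]:
    """Find the best match for block near prev_xy (top-left x, y)."""
    bh, bw = len(block), len(block[0])
    h, w = len(frame), len(frame[0])
    px, py = prev_xy
    best_xy, best_cost = prev_xy, None
    # Scan the search window, clipped so the block stays inside the frame.
    for y in range(max(0, py - search), min(h - bh, py + search) + 1):
        for x in range(max(0, px - search), min(w - bw, px + search) + 1):
            cost = sad(frame, x, y, block)
            if best_cost is None or cost < best_cost:
                best_xy, best_cost = (x, y), cost
    return best_xy


# A 5x5 frame in which a bright 2x2 object now sits at x=2, y=1.
frame = [[0] * 5 for _ in range(5)]
for j in (1, 2):
    for i in (2, 3):
        frame[j][i] = 9
block = [[9, 9], [9, 9]]
print(track_block(frame, block, prev_xy=(1, 1)))
```

Repeating this frame by frame, with each result used as `prev_xy` for the next frame, yields the frame number and in-frame position for every frame of the shot.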

In addition, the object tracker 326 marks an object that has been recognized as identical to the search target object and tracked with an identification marker (as described for the object searcher) within each frame, using the same marker for identical objects.

Meanwhile, when an object is extracted through user interaction, the region extractor 327 (not shown in FIG. 4) extracts the object of interest accurately with a minimum of user actions. That is, it ensures that the extracted object of interest does not include other objects that are not of interest.

Object information can be entered for a detected or tracked object, either by the user through the object information input unit 331 or automatically, for example by using the object/object information DB 36. That is, if the object/object information DB 36 holds detailed information about an object, then when the object is judged to be the same object in the operating stage of the non-face object detector 324 or the face recognizer 323, the detailed information held in the existing object/object information DB 36 is entered as it is (automatic entry). For a new object, entry by web search 112 may be used. This can significantly reduce the cost, effort, and time of editing.

FIG. 5 is a flowchart of an embodiment of the object-based multimedia editing method according to the present invention, and shows the method as performed in the object-based multimedia editing system 100 having the configuration shown in FIG. 3.

The multimedia editing method shown in FIG. 5 can be broadly divided into (1) a process 50 of grouping objects through scene division of the input video file (including an object detection process that uses DB information), (2) a process 52 of tracking a specific object designated by the selection of the user (editor), and (3) a process 54 of error verification and editing by the user (editor). Each process is described below.

When the original file to be edited (a multimedia content image file, that is, a video file) is received from the user (editor) (500), scenes and/or shots are detected in the received video file, and in this process a representative frame (representative picture) is selected for each scene and shot (502, see FIG. 9). Information about the divided scenes/shots and their representative frames is stored and managed by the scene and video shot manager 311.

Because each object type has different characteristics, it is preferable to vary the approach by object type. FIG. 5 describes the case in which the type of the semantic object (central object) is set to "human face". The semantic object (central object) is determined by the scene characteristics of the multimedia content and is not necessarily limited to a "human face". For example, in a video in which only animals appear, "animal" or "animal face" may be the central object (semantic object). The central object may also be chosen freely by the user (editor).

Once the video has been divided into scenes/shots and the representative frame (912, 962, 1401) for each scene/shot has been selected, face detection and grouping proceed by means of the face detector 322 and the face recognizer 323 (504). That is, face objects are detected in each representative frame, with only the representative frames as targets, and the detected face objects are then compared so that representative frames containing objects recognized as the same face are classified into the same group. For example, in the object tree of FIG. 17, under "Face" (1700) there are a "Gong Yoo_face" group (1701), a "Yoon Eun-hye_face" group (1702), and so on, which are the result of grouping by face. The "Gong Yoo_face" group (1701) contains the representative frames that include the "Gong Yoo" face object recognized as the same face object, and it does not matter whether those representative frames belong to different scenes or different shots. Even the same person may be classified into different face objects depending on camera angle, facial expression, and so on.

The face objects for which detection and grouping have been completed are registered in the object information management unit (312), and the object list of the object tree window (1500) and the object list of the scene/object view window (950) are updated accordingly.

When grouping with the face as the central object is complete, the system checks whether there is an object/object-information DB (36) that stores additional objects to be searched for (search target objects) (506), and if such a DB (36) exists, it performs the object detection process using DB information (508).

Step 508 is as follows. After grouping with the face object as the central object, if it is confirmed that the object/object-information DB (36) contains specific object data to be searched for (for example, a face image of a specific actor or a car object of a specific shape), the search target object is detected in the representative frame of each group, and object information is entered for the detected object. That is, if object information for the search target object is stored in the object/object-information DB (36), that object information is copied and automatically entered (linked) as the object information of the detected object (in other words, the object information is entered automatically through matching against the objects stored in the object/object-information DB (36)). Step 508 is described in more detail with reference to FIG. 6.

For a general object, the central object (semantic object) is registered by editor (user) input (step 504 can be regarded as the case in which the face object is set as the central object by default; depending on the embodiment, the editor may set the central object arbitrarily), and position information for the registered object is registered through an automatic recognition and tracking process. However, when the object/object-information DB (36) is supplied together with the original file (input video file), the "object detection process using DB information" (508) is performed for objects that have information in the DB (36), through which automatic extraction (detection) and tracking take place.

The representative frames of the scenes/shots detected through step 504 or step 508 are displayed to the user (510).

Thereafter, the user (editor) selects the scene or shot in which object detection/tracking is to proceed and selects an object of interest (search target object) in the representative frame of the selected scene or shot (512). Here, the object of interest means the object to be detected/tracked (search target object), and the user selects it on the representative frame by designating a region, for example with a rectangular box. For example, the "glasses" on a specific person's face may be selected as the object of interest.

Once the selection in step 512 has been made, the object-of-interest region is extracted from the representative frame of the selected scene or shot (if a specific scene was selected, the representative frame of that scene), and the extracted object-of-interest region can then be designated (514). This is described in detail with reference to FIG. 7.

Thereafter, the user (editor) may enter object information (including events, structural information, and the like) for the extracted object (516), and the entered object information is stored and managed through the object information management unit (312).

Then, the same object as the one extracted in step 514 is detected/tracked in all other frames of the selected scene/shot (518). On the user's request, the tracking result in each frame is presented for the user to view (520).

In other words, the object tracking result of step 518 can be presented through the per-frame object information display unit (341), as in the user interface example (930) (520), and the editor uses it to verify the tracking result within each frame.

Next, the error verification and editing process (54) performed by the user (editor) is described.

When steps 50 and 52 have both finished, the edited material may be inspected (522). If inspection is not performed, the process moves to the storage process (530). If it is performed, the editing information for the edited portion is output through the sequential object information display unit (342) or the per-frame object information display unit (341) (524), and if there is an error in the information or the editor wishes to make a correction (526), the manual frame editing step (528) follows. Step 54 is described in detail below.

When the user (editor) requests verification (inspection) (522), the editing information for the edited portion is output (524). That is, the results of the editing obtained through steps 50 and 52 are displayed, where editing is meant in the broad sense that includes scene division, object detection/tracking, and so on.

The user (editor) then checks whether the editing result or editing information contains errors (526) and, if so, performs an edit to correct them (528). For example, if a frame showing actor B's face has been placed in actor A's face group, the user can drag and drop the misclassified frame with the mouse into actor B's face group.

The object information obtained through steps 50, 52, and 54 (for example, scene division/grouping information, object classification, and object detection/tracking information) is stored in various forms through the storage process (530). This is described in detail with reference to FIG. 8.

FIG. 6 is a flowchart of one embodiment of the object detection method using DB information according to the present invention, and shows the "object detection process using DB information" (508) of FIG. 5.

The system checks whether the object/object-information DB (36) contains object information (object data) to be searched for (600); if it does, the search target object data is fetched from the object/object-information DB (36) (602). If there is no search target object data, the process ends.

The representative frames of the groups generated by the grouping in step 504 are fetched from the scene and video shot management unit (311) (606), the object type is checked (face, or non-face such as a thing) (608), and different object detection (extraction) and recognition algorithms are applied depending on the object type (610 to 616). That is, an appropriate function managed by the per-object feature extraction function management unit (313) is selected and applied according to the type of the search target object.

If the object type check (608) shows that the search target object is a face, the face region is detected in the representative frame (610) and face recognition (612) is then performed. Otherwise (the non-face case), the object (non-face object) is detected in the representative frame (614) and object recognition (616) is then performed. Here, face recognition (612) and object recognition (616) are the steps that check whether the object (face or non-face) detected in the representative frame is the same as the search target object.

If face recognition (612) or object recognition (616) finds no match, the process returns to the representative frame check (604); if matching succeeds, the object information copy step (620) is performed. That is, the object information stored in the object/object-information DB (36) is entered automatically for the object recognized as identical to the search target object; more precisely, the contents of the object/object-information DB (36) are copied ("automatically entered") into the object information management unit (312) (620).

Next, the object tracker (326) tracks the position of the search target object in all frames belonging to the scene or shot represented by the representative frame in question (the one fetched in step 606) (622).

Object tracking methods fall broadly into block-based tracking and feature-based tracking. Block-based tracking is very fast, but it may attempt to track even in frames from which the object has disappeared because of a scene change, with the result that the wrong object is tracked. Feature-based tracking can track accurately because it goes through a recognition step based on feature vectors, but applying it to every frame requires so much computation that the editor's waiting time becomes long.

Therefore, the present invention resolves these problems by dividing the video into scenes, performing feature-based detection on the representative frames of scenes and/or shots, and applying block-based tracking to the frames that belong to each scene/shot. To reduce tracking errors for objects that appear partway through a scene, the finer-grained concept of a shot is introduced: representative frames are extracted and managed per shot and are used in the feature-based detection process but are not shown in the user interface, which resolves the problem while reducing the number of editor interactions. This is possible because an object of interest that appears in a shot is very likely to appear in the scene as well.
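The block-based half of this hybrid scheme can be sketched as follows. The example assumes that feature-based detection on the representative frame has already supplied the object's starting position and template; the sum-of-absolute-differences cost, the search radius of 2 pixels, and the tiny synthetic frames are illustrative assumptions rather than details taken from this document.

```python
def sad(frame, template, top, left):
    """Sum of absolute differences between the template and the frame patch at (top, left)."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def track_block(frame, template, prev_pos, search=2):
    """Search a small window around the previous position for the best-matching block."""
    h, w = len(template), len(template[0])
    best_pos, best_cost = prev_pos, float("inf")
    top_lo = max(0, prev_pos[0] - search)
    top_hi = min(len(frame) - h, prev_pos[0] + search)
    left_lo = max(0, prev_pos[1] - search)
    left_hi = min(len(frame[0]) - w, prev_pos[1] + search)
    for top in range(top_lo, top_hi + 1):
        for left in range(left_lo, left_hi + 1):
            cost = sad(frame, template, top, left)
            if cost < best_cost:
                best_pos, best_cost = (top, left), cost
    return best_pos, best_cost


def make_frame(top, left, size=8):
    """Synthetic grayscale frame: black background with a bright 2x2 object."""
    frame = [[0] * size for _ in range(size)]
    for r in range(2):
        for c in range(2):
            frame[top + r][left + c] = 255
    return frame


template = [[255, 255], [255, 255]]
shot = [make_frame(1, 1), make_frame(2, 2), make_frame(3, 4)]

pos = (1, 1)  # assumed output of feature-based detection on the representative frame
track = []
for frame in shot:
    pos, cost = track_block(frame, template, pos)
    track.append(pos)
```

Because the search is restricted to a small window and involves no recognition step, it is cheap per frame. The returned cost could be compared against a limit to flag frames in which the object has probably disappeared, which is the failure case that shot-level representative frames are meant to contain.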

The above process is repeated until it has been performed for all representative frames (that is, the system checks whether any unprocessed representative frame remains and, if so, performs the steps from 606 onward) (604), and is likewise repeated until it has been performed for all search target objects in the DB (36) (600).
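The two nested loops of FIG. 6 can be summarized in a short sketch. Detection and recognition are replaced here by a simple lookup of precomputed signature strings, so `DbObject`, `RepresentativeFrame`, and the `signature` field are hypothetical stand-ins for the detectors and recognizers described above; the step numbers in the comments refer to FIG. 6.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DbObject:
    name: str
    kind: str                      # "face" or "non_face"
    signature: str                 # hypothetical stand-in for stored feature data
    info: Dict[str, str] = field(default_factory=dict)


@dataclass
class RepresentativeFrame:
    frame_id: str
    faces: List[str]               # signatures a face detector is assumed to report
    objects: List[str]             # signatures a generic detector is assumed to report


def detect_with_db(db_objects, rep_frames):
    registered = {}                # stands in for the object information management unit
    to_track = []                  # (object name, frame id) pairs handed to the tracker
    for target in db_objects:                        # 600/602: next search target object
        for frame in rep_frames:                     # 604/606: next representative frame
            # 608: choose the detection/recognition path by object type
            candidates = frame.faces if target.kind == "face" else frame.objects
            if target.signature not in candidates:   # 612/616: no match, try next frame
                continue
            registered.setdefault(target.name, dict(target.info))   # 620: copy DB info
            to_track.append((target.name, frame.frame_id))          # 622: track in scene/shot
    return registered, to_track


db_objects = [
    DbObject("actor_A", "face", "sig_face_A", {"title": "Actor A"}),
    DbObject("car_X", "non_face", "sig_car_X", {"title": "Car X"}),
]
rep_frames = [
    RepresentativeFrame("scene1", faces=["sig_face_A"], objects=[]),
    RepresentativeFrame("scene2", faces=[], objects=["sig_car_X"]),
    RepresentativeFrame("scene3", faces=["sig_face_A"], objects=["sig_car_X"]),
]
registered, to_track = detect_with_db(db_objects, rep_frames)
```

Each matched (object, frame) pair would then be passed to the tracker, which follows the object through the frames of the scene or shot that the representative frame stands for.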

Next, the representative screens (representative frames) of the scenes (excluding shots) detected through the above process are displayed through the user interface (see 910 in FIG. 9) (see 510 in FIG. 5; at the editor's option, representative screens at the shot level within a scene may also be displayed), and the editor's actual editing work begins. When the editor selects a scene or shot containing the object of interest and the interface (512, 920) for selecting that object is activated, the object extraction and object region designation process (514) is executed on the representative frame, as described with reference to FIG. 5.

FIG. 7 is a flowchart of one embodiment of the method for extracting an object and designating an object region in a representative frame according to the present invention, and shows the method as performed in the object-based multimedia editing system (100).

When the editor (user) selects an object of interest (search target object) with the work tool (903) (512), the corresponding object region is set (segmented) using an intelligent image segmentation algorithm so that the object of interest can be tracked (700).

Feature values for the object are then extracted (702), its relationships to other objects are established through object information grouping and hierarchization (704), and the result is stored in the object information management unit (312).

Next, an object region approximation step (706) is performed so that unnecessary surrounding elements other than the object of interest are excluded, and the resulting data is stored in the object information management unit (312). Object information for the object of interest identified in this process is entered through the object information input unit (331) (1010) (516), and the object is tracked in all other scenes/shots (518).
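One simple way to picture the region approximation of step 706 is to shrink the user's selection to the tight bounding box of the segmented pixels. This is only a sketch under that assumption: the document does not say which approximation is used, and the binary `mask` input and the `approximate_region` helper are hypothetical.

```python
def approximate_region(mask):
    """Return the tight bounding box (top, left, bottom, right) of the nonzero pixels.

    `mask` is assumed to be the output of the segmentation step: a 2D list
    in which 1 marks pixels belonging to the object of interest.
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # nothing was segmented
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[0], cols[0], rows[-1], cols[-1])


mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = approximate_region(mask)
```

Everything outside the returned box is surrounding content that the tracker does not need to consider.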

FIG. 8 is a flowchart of one embodiment of the object information storage process according to the present invention, and shows step 530 of FIG. 5.

Depending on its intended use, the object information stored and managed in the object information management unit (312) may be saved as a metadata file, stored as metadata in the object/object-information DB (36), or composed into a prescribed standard format and saved (800).

The editor can check the state of the metadata stored (entered) in the storage unit through the above process (802). That is, the editor confirms through a user interface such as "Preview" that the metadata has been stored (entered) without error.
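A minimal sketch of steps 800 and 802 is shown below, assuming JSON as the metadata file format. The document does not name a format, and the record layout (`name`, `scene`, `boxes`) is a hypothetical example.

```python
import json


def export_metadata(objects):
    """Serialize object records to a JSON string (step 800, file format assumed)."""
    return json.dumps({"objects": objects}, ensure_ascii=False, indent=2, sort_keys=True)


def verify_metadata(text, expected):
    """Reload the stored text and confirm nothing was lost (step 802)."""
    return json.loads(text)["objects"] == expected


records = [
    {
        "name": "sunglasses_A",
        "scene": 3,
        # frame number (as a string key) -> [x, y, width, height]
        "boxes": {"120": [40, 52, 48, 44], "121": [41, 52, 48, 44]},
    }
]
text = export_metadata(records)
ok = verify_metadata(text, records)
```

Frame numbers are kept as string keys because JSON object keys are always strings, so the reloaded data compares equal to the original.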

Meanwhile, at the editor's request, the stored (entered) metadata may be converted into media (media data) of a service type such as Flash or MPEG-4 and saved (804).

FIGS. 9 to 12 show the configuration of the user interface screens provided by the object-based multimedia editing system (100) according to the present invention.

The interface screen (900) provided by the object-based multimedia editing system (100) includes the menu (901), main toolbar (902), and status bar (970) that an application normally has, and additionally consists of a work toolbar (903), a work window (940), an information window (1040), and a scene selection and object view window (950).

The work window (940) consists of a scene view window (910), a media player and work area window (920), and a frame view window (1030). The set of tools on the work toolbar (903) changes according to which part of the work window (940) is active.

The information window (1040) consists of an object information window (1010) for entering and displaying detailed information about a specific object, an all-objects information window (1020) for viewing the structure and group information among objects, and a work information window (1030) that manages and displays the editor's work history.

The scene selection and object view window (950) consists of object lists (951, 952), a time axis (954), and a chronological scene indicator (964).

FIGS. 9 to 12 are each described below.

FIG. 9 is a screen layout for scene division according to the present invention. It displays the representative screen (representative frame) (912) and information (scene number, time information, number of frames, and so on) (913) of each scene generated by the scene divider (321), and provides a user interface for editing the divided scenes themselves and for moving to the scene to be edited. The scene view window (910) also has a screen component (904) indicating that scenes are currently being divided.

The work tool (903) is used to edit the divided scenes themselves. The information (913) for each divided scene is displayed together with its representative screen (912) when that screen is shown, and is synchronized with the chronological scene indicator (964). To show the link between the time axis (954) and the scenes, the scene information on the time axis (956) and the actual representative screen are drawn in the same color.

The time axis (954) can be adjusted with the time axis zoom control (953) so that both detail and overview are accessible on a screen of limited size. Functions (959, 957) are also provided for directly selecting scenes and objects that are hidden by scrolling. FIG. 9 uses combo boxes for this, but any control may be used to provide the function.

In addition, to make verification convenient and the user interface flexible, a function (961) is provided for viewing only the scenes that contain the object selected in the object list (951). The next step begins when the editor selects, from the representative screen (representative frame) list (911), a scene in which the object of interest is present. The scene divider (321) may of course be run in a separate thread so that it does not interfere with the user interface.

The status bar (970) at the bottom of the screen in FIG. 9 shows "Edited file Full Path" (971), "Current Scene" (972), "Dividing scenes" (973), "Progress" (974), and the like.

FIG. 10 is a screen layout for editing work and partial verification according to the present invention.

The interface screen shown in FIG. 10 relates to working on (editing) and playing back the scene selected in the previous step. It shows the state in which the media player and work area (920) is active, and the work tool window (903) contains the tools appropriate to that state. Object selection is carried out with the work and playback screen (1001) paused at the time of the scene selected in the previous step. For object selection, a tool is chosen in the work tool window (903) and the object is separated by way of the region extractor (327). Because the work screen and the player screen are used together, a control box (control function) (1009) for playback is provided.

The time navigator (1004) shows the position of the current working scene with a point control (1002), and a range indicator (1003) on the seek bar of the video control shows the extent of the current working scene within the total running time. This time information is synchronized with the time axis (954) of the scene selection and object view window (950), the scene information on the time axis (956), the chronological scene indicator (964), the range indicator (955) of the time axis (954), and so on.

When a separated object is created, it is added to the object list (951), and object information for it can be newly entered through the object information input unit (331), which is linked to the object and provides a screen such as that shown in FIG. 13. Basic video control functions (1005) related to the work are also provided, along with a "Play current scene only" button (1006), a "Show current object only and play" button (1007), and a "Show all objects and play" button (1008) so that tracked objects can be verified/confirmed. For "Object information", the information of the selected object can be displayed (1011).

FIG. 11 is a screen layout for feature-based object detection/tracking according to the present invention, and shows the user interface screen (1100) used in the feature-based object detection/tracking process (325, 323, 608, 618, 622).

The object fixed in the previous step is also detected/tracked in the representative frames of the other scenes and of all shots. The scenes in which the object is detected/tracked are shown in the scene list (1106); for each scene (1103), the representative screen (representative frame) (1101) and the corresponding information (1102) are output, together with an object indicator (1104) for verification/confirmation. The scene information item shows the information that the recognizers (323, 325) used for recognition, so that the editor can check it. Reference numeral 1105 denotes a screen component indicating that an object search is currently in progress.

When a specific scene is selected in the scene list (1106), the "Preview" screen (940) offers the same functions as the media player and work area (920), so that editing is possible at any moment.

FIG. 12 is a screen layout for block-based object tracking/verification according to the present invention, and shows the user interface when "Frame view" (930) is active.

The frame list screen (1200) individually displays all frames (1201) of the current scene received from the frame extraction unit (303). Each frame is displayed together with the tracking result for the object (a rectangular region drawn in yellow) (1202), which lets the editor verify the result quickly.

If the frame list screen (1200) contains a frame that needs correction, the editor can select it to activate the media player and work area window (920); playback is then positioned at the time of the selected frame and paused. The editor can then carry out editing and correction.

FIG. 13 is a layout of the object information input interface screen according to the present invention, and shows the user interface screen provided by the object information input unit (331).

The user (editor) enters the general details of an object through the object information input unit (331). That is, the object information input unit (331) receives the object's name (1301), information on the representative text to be displayed (1302), one action (1306) out of an open-ended set of actions to take after the object is selected, and so on. The present invention does not restrict these actions to any particular range; instead, it presents a list of them (1308) to support a variety of possibilities. Reference numeral 1303 relates to "Action after object selection", whose specific type is chosen through the action selection (1308); one example is a URL shortcut (1305). Reference numerals 1307 and 1309 relate to URL input.

FIG. 14 is a layout of the shot view screen according to the present invention.

While the representative frames of the scenes divided by the scene divider (321) are shown on screen through the scene view window (910), selecting one scene (912) in the scene view window (910) causes the representative frames (1401) of the shots contained in that scene to appear in the shot view window (1400).

Because a shot (video shot) is a run of very similar images, when a specific object is present in the representative frame (1401) of a shot, marking the object's region in a distinguishing color (1402) is useful for editing. That is, if the object management list (951) is synchronized with the object position information (1543) of the "representative screen containing the object" (1542) in the object tree window (1500), information about the object can easily be checked by time and by structure during inspection.

FIG. 15 is a layout of the object management screen according to the present invention, and shows an example of the object tree window (1500) for managing the separated objects.

Object management is based on representing the relationships among objects as a tree, and the objects in the object tree are divided into virtual objects (1511, 1512) and real objects (1514).

A virtual object is managed so as to contain real objects, and a real object is managed, in screen area 1540, together with its feature data (1541), the frames containing the object (1542), and the object's location information (1543). A virtual object may hold feature data at a higher conceptual level that is common to the real objects it contains. The virtual object 1512 may have a subordinate object (1513). An object for a specific item (sunglasses A) (1515) can also be managed separately.
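The virtual/real split described above maps naturally onto a tree whose inner nodes are virtual objects and whose leaves are real objects. The sketch below is one possible representation; the class names, the bounding-box layout, and the sample labels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

Box = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)


@dataclass
class RealObject:
    name: str
    feature_data: List[float] = field(default_factory=list)  # cf. 1541
    # frame number (cf. 1542) -> position within that frame (cf. 1543)
    locations: Dict[int, Box] = field(default_factory=dict)


@dataclass
class VirtualObject:
    name: str
    children: List[Union["VirtualObject", RealObject]] = field(default_factory=list)
    shared_features: List[float] = field(default_factory=list)  # higher-level features

    def real_objects(self) -> List[RealObject]:
        """Collect every real object beneath this node, depth first."""
        found: List[RealObject] = []
        for child in self.children:
            if isinstance(child, RealObject):
                found.append(child)
            else:
                found.extend(child.real_objects())
        return found


front = RealObject("person A_sunglasses A_front", locations={120: (40, 30, 64, 32)})
side = RealObject("person A_sunglasses A_side", locations={245: (52, 28, 40, 30)})
sunglasses = VirtualObject("sunglasses A", children=[front, side])
person = VirtualObject("person A", children=[sunglasses])
names = [obj.name for obj in person.real_objects()]
```

With this layout, correcting a misrecognized object amounts to moving one leaf from one virtual object's `children` list to another's.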

Managing objects in this way makes it simple to correct misrecognized objects and allows the relationships among objects to be constructed and reviewed.

The feature data (1541) may be displayed as shown in FIG. 15, but need not be. In addition, the object tree screen 1510 supports the editor's inspection by marking with separate icons the objects that require information input (1516) and the objects that require tracking (1517).

For a newly registered object, selecting the object on the object tree screen 1510 and pressing the "Register in object tracking queue" button (1520) causes the object tracker 326 to determine automatically, from the object's location registered in a single frame, its locations in all consecutive frames. Because this is carried out by a background processor, information about objects can be entered at the same time, which shortens editing time. To enter information about an object, the editor selects the object and clicks the information input button (1530), which activates the object information input window 1010, and then enters the necessary information.
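The "register in object tracking queue" behavior can be sketched as a job queue served by a background thread, which leaves the foreground free for information entry. This is a minimal illustration using Python's standard `queue` and `threading` modules; `start_tracking_worker` and `fake_track` are hypothetical names, and the stand-in tracker does not perform real tracking.

```python
import queue
import threading
from typing import Callable, Dict, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height (assumed layout)
Track = Dict[int, Box]           # frame number -> position


def start_tracking_worker(track: Callable[[str, int, Box], Track],
                          results: Dict[str, Track]):
    """Start a background thread that tracks each queued object in turn."""
    jobs: queue.Queue = queue.Queue()

    def worker() -> None:
        while True:
            job = jobs.get()
            if job is None:          # sentinel: stop the worker
                jobs.task_done()
                break
            name, frame_no, box = job
            results[name] = track(name, frame_no, box)
            jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return jobs, thread


def fake_track(name: str, frame_no: int, box: Box) -> Track:
    """Stand-in tracker: pretends the object stays put for three frames."""
    return {frame_no + i: box for i in range(3)}


results: Dict[str, Track] = {}
jobs, thread = start_tracking_worker(fake_track, results)
jobs.put(("sunglasses A", 120, (40, 30, 64, 32)))  # register in the queue
# The editor could enter object information here while tracking runs.
jobs.join()      # wait until every queued object has been tracked
jobs.put(None)   # ask the worker to stop
thread.join()
```

A real editor would replace `fake_track` with the object tracker and would poll `results` from the user interface rather than blocking on `jobs.join()`.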

Meanwhile, if the same object (the search target object) already exists in the object/object information DB 36, the data received from the DB 36 is filled in beforehand, and the editor modifies it if necessary. For a new object, the editor may enter the information directly, but may also enter it conveniently using search results from the web site 112.

FIG. 16 is a diagram of an object tracking information screen according to the present invention, showing a window 1600 that displays object tracking information.

By assigning the object tracking and recognition steps to a background processor, the system lets the editor carry out manual tasks alongside the automatically processed items, which reduces total editing time.

FIG. 16 shows the process of automatically tracking the objects (1601) corresponding to the entries "공유_썬글라스A_정면" (1514), "공유_썬글라스A_측면", and "공유_썬글라스A_정면2" of FIG. 15, that is, the front, side, and second front views of sunglasses A.
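The claims characterize the tracking in individual frames as block-based and as yielding a frame number together with a position within the frame. A common block-based approach is block matching with a sum of absolute differences (SAD) cost; the sketch below illustrates that general technique in plain Python. The search radius, the image format, and the function names are assumptions, and the patent does not state that SAD is the cost it uses.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale rows (assumed format)
Box = Tuple[int, int, int, int]  # x, y, width, height


def sad(a: Image, b: Image, ax: int, ay: int, bx: int, by: int,
        w: int, h: int) -> int:
    """Sum of absolute differences between a block of a and a block of b."""
    return sum(abs(a[ay + j][ax + i] - b[by + j][bx + i])
               for j in range(h) for i in range(w))


def track_block(prev: Image, curr: Image, box: Box, radius: int = 2) -> Box:
    """Find where the block at `box` in prev best matches within curr."""
    x, y, w, h = box
    rows, cols = len(curr), len(curr[0])
    best, best_cost = box, None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if nx < 0 or ny < 0 or nx + w > cols or ny + h > rows:
                continue  # candidate block would fall outside the frame
            cost = sad(prev, curr, x, y, nx, ny, w, h)
            if best_cost is None or cost < best_cost:
                best, best_cost = (nx, ny, w, h), cost
    return best


def blank(n: int) -> Image:
    return [[0] * n for _ in range(n)]


prev_frame, curr_frame = blank(6), blank(6)
for j in range(2):
    for i in range(2):
        prev_frame[1 + j][1 + i] = 200   # object at x=1, y=1
        curr_frame[3 + j][2 + i] = 200   # object moved to x=2, y=3
new_box = track_block(prev_frame, curr_frame, (1, 1, 2, 2))
```

Applying `track_block` frame by frame and storing each result under its frame number gives the temporal and spatial positions mentioned in the claims.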

FIG. 17 is an example screen layout for the scene divider, shot view, and face grouping according to the present invention, and FIG. 18 is another example screen layout for working and partial verification according to the present invention.

Meanwhile, the method of the present invention described above can be written as a computer program. The code and code segments constituting the program can be readily inferred by a computer programmer skilled in the art. The program is stored on a computer-readable recording medium (information storage medium) and is read and executed by a computer to implement the method of the present invention. The recording medium includes every form of computer-readable recording medium.

The present invention described above is not limited by the foregoing embodiments and the accompanying drawings, because those of ordinary skill in the art to which the invention pertains can make various substitutions, modifications, and changes without departing from the technical idea of the invention.

Claims

A multimedia image analysis apparatus robust to scene change, comprising:

scene dividing means for dividing a multimedia image into predetermined image units and selecting a representative frame for each divided image; and

object searching means for classifying the selected representative frames according to whether objects are identical, based on a central object type.

The apparatus of claim 1,

wherein the object searching means further searches for a search target object in the corresponding representative frames of each classification group.

The apparatus of claim 2,

further comprising object tracking means for tracking the search target object in the individual frames belonging to the representative frame in which the search target object was found.

The apparatus of claim 1,

wherein the object searching means marks each object of the central object type that has been determined to be the same object with an identification mark in the corresponding representative frame, and uses a different identification mark for each classification group.

The apparatus of claim 4,

wherein the identification mark is displayed by image superimposition, as a color-coded polygonal image through which the overlapped object remains visible.

The apparatus of any one of claims 1 to 3,

wherein the central object type is determined automatically based on scene characteristics of the multimedia image or is selected by a user.

The apparatus of claim 6,

wherein the central object type is a person's face.

A multimedia image analysis apparatus robust to scene change, comprising:

object searching means for searching for a search target object based on the selected representative frames.

The apparatus of claim 8,

further comprising object tracking means for tracking the search target object in the individual frames belonging to the representative frame in which the search target object was found.

The apparatus of claim 2 or 8,

wherein the object searching means groups and manages the representative frames that contain the search target object.

The apparatus of any one of claims 2, 3, 8, or 9,

wherein the search target object is at least one of an object stored and managed in a database and an object input by a user.

The apparatus of claim 1 or 8,

wherein the image unit is either a scene or a video shot.

The apparatus of claim 12,

wherein, when the image unit is a video shot, the multimedia image is segmented into scenes for user editing and each divided scene is further divided into shots.

The apparatus of claim 2 or 8,

wherein the object searching means applies a different object search algorithm for each type of search target object.

The apparatus of claim 14,

wherein the object searching means applies a different object search algorithm depending on whether the search target object is a face object or a non-face object.

The apparatus of claim 2 or 8,

wherein the object searching means marks each object found and recognized as the same object as the search target object with an identification mark in the corresponding representative frame, and uses the same identification mark for the same objects.

The apparatus of claim 2 or 8,

wherein the object searching means performs the object search in the representative frame using a feature-based object detection method.

The apparatus of claim 3 or 9,

wherein the object tracking means performs the object tracking in the individual frames using a block-based object tracking method.

The apparatus of claim 3 or 9,

wherein the object tracking means marks each object tracked and recognized as the same object as the search target object with an identification mark in the corresponding frame, and uses the same identification mark for the same objects.

The apparatus of claim 3 or 9,

wherein the object tracking means obtains, for the individual frames belonging to the representative frame, a temporal position (frame number) of the search target object and a spatial position within the frame.

The apparatus of claim 3 or 9,

further comprising means for collectively inputting corresponding object information for the objects tracked by the object tracking means.

The apparatus of claim 1 or 8,

wherein the scene dividing means selects, as the representative frame of a divided image, the frame containing the most information among all frames belonging to that divided image unit.

A multimedia image analysis method robust to scene change, comprising:

a division step of dividing a multimedia image into predetermined image units;

a representative selecting step of selecting a representative frame for each divided image; and

a classification step of classifying the selected representative frames according to whether objects are identical, based on a central object type.

The method of claim 23,

further comprising an object searching step of searching for a search target object in the representative frames corresponding to each classification group generated in the classification step.

The method of claim 24,

further comprising an object tracking step of tracking the search target object in the individual frames belonging to the representative frame in which the search target object was found.

The method of claim 23,

wherein the classification step marks each object of the central object type that has been determined to be the same object with an identification mark in the corresponding representative frame, and uses a different identification mark for each classification group.

The method of any one of claims 23 to 26,

wherein the central object type is determined automatically based on scene characteristics of the multimedia image or is selected by a user.

The method of claim 27,

wherein the central object type is a person's face.

A multimedia image analysis method robust to scene change, comprising:

an object searching step of searching for a search target object based on the selected representative frames.

The method of claim 29,

Multimedia image analysis method further comprising.

The method of claim 24 or 29,

wherein the object searching step groups and manages the representative frames that contain the search target object.

The method of any one of claims 24, 25, 29, or 30,

wherein the search target object is at least one of an object stored and managed in a database and an object input by a user.

The method of claim 23 or 29,

wherein the image unit is either a scene or a video shot.

The method of claim 24 or 29,

wherein the object searching step applies a different object search algorithm for each type of search target object.

The method of claim 24 or 29,

wherein the object searching step marks each object found and recognized as the same object as the search target object with an identification mark in the corresponding representative frame, and uses the same identification mark for the same objects.

The method of claim 24 or 29,

wherein the object searching step performs the object search in the representative frame using a feature-based object detection method.

The method of claim 25 or 30,

wherein the object tracking step performs the object tracking in the individual frames using a block-based object tracking method.

The method of claim 25 or 30,

wherein the object tracking step marks each object tracked and recognized as the same object as the search target object with an identification mark in the corresponding frame, and uses the same identification mark for the same objects.

The method of claim 25 or 30,

wherein the object tracking step obtains, for the individual frames belonging to the representative frame, a temporal position (frame number) of the search target object and a spatial position within the frame.

The method of claim 25 or 30,

further comprising a step of collectively inputting corresponding object information for the objects tracked in the object tracking step.

An object-based multimedia editing system, comprising:

multimedia image analyzing means for dividing a multimedia image into predetermined image units, selecting a representative frame for each divided image, and analyzing the multimedia image based on the selected representative frames; and

management means for inputting corresponding object information for the objects searched for or tracked through the analysis by the multimedia image analyzing means, or for performing scene or object editing through a user interface.

The system of claim 41,

wherein the multimedia image analyzing means is the multimedia image analysis apparatus according to any one of claims 1, 2, 3, 8, or 9.