WO2020111505A1 - Method and system for generating object GT information for machine learning of images - Google Patents
Method and system for generating object GT information for machine learning of images
- Publication number
- WO2020111505A1 (PCT/KR2019/013511)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- information
- image
- generating
- metadata
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N99/00—Subject matter not provided for in other groups of this subclass
Definitions
- the present invention relates to a method for generating GT (Ground Truth) information used for machine learning of images or video, and more specifically, to a method and system for generating GT information for objects such as persons or cars included in an image or video frame.
- a technique for detecting objects is a basic technique of image machine learning.
- conventionally, GT (Ground Truth) for an object is extracted manually in an offline state and used for training, and a detector is generated from it to perform object detection.
- a standard GT component and a GT description method are defined so as to describe comprehensive information of human or vehicle objects existing in an image or video frame.
- an object of the present invention is to provide a method and a system capable of efficiently generating GT information by automating GT tagging.
- the method for generating GT information of objects in an image according to the present invention includes automatically analyzing and generating GT information for each object included in the image according to a predetermined GT structure, correcting the generated GT information, and generating metadata by converting the corrected GT information; the GT information for each object includes the object type, object posture, object state, object location information, and object properties.
- the GT information generation system includes an image storage unit that stores images; a GT analysis unit that receives an image file from the image storage unit and analyzes and generates GT information for each object present in each frame; a GT information correction unit that modifies the GT information generated by the GT analysis unit; and a metadata generation unit that generates metadata by converting the modified GT information.
- a standard GT component and a GT description method can be defined so as to describe comprehensive information of objects such as a person or a vehicle existing in a still image or a video frame.
- GT information generation can be automated so that GT information is generated more efficiently than with conventional methods.
- FIG. 1 is a block diagram of a GT information generation system according to an embodiment of the present invention.
- FIG. 2 is an overall flowchart of a method for generating GT information according to an embodiment of the present invention.
- FIG. 3 is a detailed flowchart of the GT analysis step shown in FIG. 2.
- FIG. 4 is a detailed flowchart of the GT information modification step illustrated in FIG. 2.
- each component, functional block, or means may be composed of one or more sub-components; the electrical, electronic, and mechanical functions performed by each component may be implemented with various known devices such as electronic circuits, integrated circuits, ASICs (Application Specific Integrated Circuits), or mechanical elements, and may be implemented separately or with two or more integrated into one.
- in the present invention, the overall information of an image object such as a person or a vehicle is tagged and organized as GT metadata.
- the image object GT information defined in the present invention is shown in Table 1 below.
- the frame number has a single value in the case of a still image input, and is the number of each frame in the case of a video input.
- the number of objects means the number of objects detected in the corresponding image or frame.
- the object ID list contains as many entries as there are objects. For each object, the object type, object posture, object state, object location information, and object property information are defined based on the object ID.
- the types of objects are classified as 'person' or 'car', and the posture of an object expressed in the image or video frame is divided into 8 directions: 'front', 'back', 'left', 'right', 'front-left', 'front-right', 'back-right', and 'back-left'.
- the state of an object is divided into 'total', 'cut', and 'overlap' depending on whether the entire object is visible in the image or video frame, the object is cut off, or the object overlaps with another object.
- the location information of an object expresses the object's bounding box as four coordinates relative to the (0, 0) origin of the image or video frame.
- the property information of the object is configured differently according to the type of the object.
- if the type of the object is, for example, a person, the properties consist of race, gender, age, height, color of the top, color of the bottom, and whether or not glasses are worn (a data-structure sketch of these GT components is given below).
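As a concrete illustration of the GT components listed above, the following is a minimal Python sketch; the class names (`ObjectGT`, `FrameGT`), the field names, and the example values are assumptions made for illustration only and are not prescribed by the invention.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ObjectGT:
    # Hypothetical structure mirroring the GT components described above.
    object_id: int
    object_type: str                 # 'person' or 'car'
    posture: str                     # one of the 8 directions, e.g. 'front-left'
    state: str                       # 'total', 'cut', or 'overlap'
    bbox: Tuple[int, int, int, int]  # bounding box relative to the (0, 0) frame origin
    properties: Dict[str, object] = field(default_factory=dict)  # type-dependent attributes

@dataclass
class FrameGT:
    frame_number: int                # 1 for a still image, per-frame index for a video
    objects: List[ObjectGT] = field(default_factory=list)

    @property
    def object_count(self) -> int:   # the "number of objects" component
        return len(self.objects)

# Example: one person object in frame 1 (values are illustrative only).
person = ObjectGT(
    object_id=0,
    object_type="person",
    posture="front-left",
    state="total",
    bbox=(120, 40, 260, 380),
    properties={"gender": "female", "age": 30, "height": 165,
                "top_color": "red", "bottom_color": "blue", "glasses": False},
)
frame = FrameGT(frame_number=1, objects=[person])
```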
- the present invention provides a method and system for defining a standard GT component so as to describe comprehensive information of objects existing in an image or video frame, and generating and managing GT information accordingly.
- FIG. 1 is a configuration diagram of a GT information generation system according to an embodiment of the present invention.
- the system includes an image storage unit 100, a GT analysis unit 110, a GT information correction unit 120, a GT metadata generation unit 130, and a metadata storage unit 140.
- the image storage unit 100 stores the images or video frames (hereinafter referred to as 'frames') that contain the objects for which GT information is to be analyzed and generated.
- it may be implemented as non-volatile memory such as a hard disk, volatile memory such as RAM, or a register serving as a buffer for temporarily storing streaming data.
- the GT analysis unit 110 automatically extracts, for each frame of the image input from the image storage unit 100, each piece of information according to the GT components described above, that is, the object type, object posture, object state, object location information, and object properties. For the automatic GT analysis of objects in an image, a self-developed algorithm, open-source software, or a cloud API can be used.
- the GT information correction unit 120 displays the GT information generated by the GT analysis unit as a list so that the user can review and modify it.
- a list of objects having the above-described GT information structure is displayed for each frame number (the image number of a still image or the frame number of a video) so that the user can review it.
- when the user enters a correction, the GT information is modified to reflect the input.
- the GT metadata generation unit 130 converts the modified GT information into metadata, and the metadata storage unit 140 stores metadata.
- the metadata storage unit 140 and the image storage unit 100 are described separately from a logical point of view; in hardware, they may be separate storage devices or may be configured in a single physical storage device.
- the GT automatic analysis step will be described in more detail as follows.
- first, the input video is divided into frames, and a frame number is assigned to each frame to be analyzed (S211). In the case of a still image, only one frame exists, so frame division is not performed.
- GT information is then automatically analyzed for each divided frame image (S214).
- for this analysis, self-developed algorithms, open-source software, or cloud APIs can be used; an example of a cloud API is the Sighthound Cloud API.
- the GT generated by the analysis is structured as shown in Table 1 above; it is preferable to construct a data structure including the frame number, the number of objects in each frame, an object ID for each object, and, per object ID, the object type, object posture, object state, object location information, and object properties.
- the GT information analyzed and generated in this way is stored for later modification (S216); a sketch of this automatic analysis step is given below.
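The following is a minimal sketch of the frame-splitting and automatic analysis steps (S211, S214, S216), assuming OpenCV is used for frame extraction and reusing the hypothetical `FrameGT`/`ObjectGT` classes sketched earlier; `analyze_frame` is a placeholder for whatever self-developed detector, open-source model, or cloud-API wrapper is actually employed.

```python
import cv2  # OpenCV, assumed here only to illustrate the frame-splitting step (S211)

def split_and_analyze(video_path, analyze_frame):
    """Split a video into frames, assign frame numbers (S211), and run the
    automatic GT analysis (S214). `analyze_frame` is a placeholder callable
    that returns a list of ObjectGT entries for a single frame."""
    capture = cv2.VideoCapture(video_path)
    frames_gt = []
    frame_number = 0
    while capture.isOpened():
        ok, frame = capture.read()
        if not ok:                                          # no more frames
            break
        frame_number += 1
        objects = analyze_frame(frame)                      # automatic GT analysis (S214)
        frames_gt.append(FrameGT(frame_number, objects))    # structured per Table 1
    capture.release()
    return frames_gt                                        # stored for later correction (S216)
```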
- the GT information correction step S220 is performed.
- the GT information modification step (S220) is a step in which the user manually modifies the automatically analyzed GT information.
- first, the stored GT information is presented as a list to the user through the GT information correction unit 120 (S222).
- the list of objects is presented according to the image number or video frame number.
- after step S222, when the user selects an object whose information is to be modified, the selection input is received (S224) and the selected object is switched to an editable state.
- the GT information of the selected object is then modified by receiving the user's correction input (S226). Specifically, the user can modify the object type, object posture, object state, object location information, and object attribute information, and the GT information correction unit 120 receives the user's correction input for each item and updates the GT information accordingly.
- in step S228, it is determined whether GT information correction has been completed for all frames of the image or video; if a frame has not yet been reviewed, the process returns to step S224, and if all frames have been reviewed for correction, the GT information modification is complete (a sketch of this correction step is given below).
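Below is a minimal sketch of how the correction unit might apply a user's edits to the selected object (S224, S226), again using the hypothetical classes from above; the `apply_correction` helper and its field names are assumptions, not an interface defined by the invention.

```python
EDITABLE_FIELDS = {"object_type", "posture", "state", "bbox", "properties"}

def apply_correction(frame_gt, object_id, **edits):
    """Apply the user's correction input to the selected object (S226).
    Only the GT components defined above can be edited."""
    for obj in frame_gt.objects:
        if obj.object_id == object_id:                 # object selected in S224
            for name, value in edits.items():
                if name not in EDITABLE_FIELDS:
                    raise ValueError(f"unknown GT field: {name}")
                setattr(obj, name, value)
            return obj
    raise KeyError(f"no object with id {object_id} in frame {frame_gt.frame_number}")

# e.g. correct a misdetected posture and state:
# apply_correction(frame, object_id=0, posture="front", state="overlap")
```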
- GT information is converted into metadata (S230).
- the generated metadata is stored in appropriate storage (the metadata storage unit 140).
- formats such as XML, Excel, JSON, or plain text can be used for the metadata (a JSON sketch is given below).
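As one possible realization of the metadata conversion step (S230), the following sketch serializes the hypothetical `FrameGT` records to JSON; JSON is only one of the formats mentioned above, and the exact layout of the output is an assumption.

```python
import json
from dataclasses import asdict

def save_metadata_json(frames_gt, path):
    """Convert the corrected GT records into JSON metadata and store them (S230)."""
    payload = [
        {"frame_number": f.frame_number,
         "object_count": f.object_count,
         "objects": [asdict(o) for o in f.objects]}
        for f in frames_gt
    ]
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(payload, fp, ensure_ascii=False, indent=2)

# save_metadata_json(corrected_frames, "gt_metadata.json")
```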
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Library & Information Science (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Artificial Intelligence (AREA)
- Image Analysis (AREA)
Abstract
The present invention relates to a method for generating GT information of objects in an image, comprising the steps of: performing automatic analysis according to a predetermined GT structure so as to generate GT information for each object included in the image; correcting the generated GT information; and generating metadata by converting the corrected GT information, the GT information for each object including the object type, object posture, object state, object location information, and object attributes. According to the present invention, standard GT components and a GT description method can be defined so as to describe comprehensive information about objects such as humans and vehicles present in a still image or a video frame, and GT information can be generated more efficiently than with conventional methods by automating GT information generation. Finally, by defining standard GT components for human or vehicle objects in images and video frames, it is possible to provide and use common, standard GT generation techniques for machine learning of images and videos.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020180146998A KR20200068043A (ko) | 2018-11-26 | 2018-11-26 | Method and system for generating object GT information for image machine learning |
| KR10-2018-0146998 | 2018-11-26 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020111505A1 (fr) | 2020-06-04 |
Family
ID=70851977
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2019/013511 Ceased WO2020111505A1 (fr) | 2019-10-15 | Method and system for generating object GT information for machine learning of images |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20200068043A (fr) |
| WO (1) | WO2020111505A1 (fr) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR102313944B1 (ko) * | 2021-05-14 | 2021-10-18 | 주식회사 인피닉 | Method for tracking an object crossing the boundary of the angle of view, and computer program recorded on a recording medium to execute the same |
| KR102310613B1 (ko) * | 2021-05-14 | 2021-10-12 | 주식회사 인피닉 | Method for tracking an object in consecutive 2D images, and computer program recorded on a recording medium to execute the same |
| KR102313940B1 (ko) * | 2021-05-14 | 2021-10-18 | 주식회사 인피닉 | Method for tracking an object in consecutive 3D data, and computer program recorded on a recording medium to execute the same |
| KR102310611B1 (ko) * | 2021-06-17 | 2021-10-13 | 주식회사 인피닉 | Method for tracking the same object through 2D path inference, and computer program recorded on a recording medium to execute the same |
| KR102313938B1 (ko) * | 2021-06-17 | 2021-10-18 | 주식회사 인피닉 | Method for tracking the same object through 3D path inference, and computer program recorded on a recording medium to execute the same |
| KR102557136B1 (ko) * | 2021-11-23 | 2023-07-20 | 이인텔리전스 주식회사 | Method and apparatus for generating a user dataset for objects and lanes in front of a vehicle |
- 2018-11-26: KR application KR1020180146998A (publication KR20200068043A, ko), not active, Withdrawn
- 2019-10-15: WO application PCT/KR2019/013511 (publication WO2020111505A1, fr), not active, Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050234960A1 (en) * | 2004-04-14 | 2005-10-20 | Microsoft Corporation | Automatic data perspective generation for a target variable |
| US8774515B2 (en) * | 2011-04-20 | 2014-07-08 | Xerox Corporation | Learning structured prediction models for interactive image labeling |
| KR20180118596A (ko) * | 2015-10-02 | 2018-10-31 | 트랙터블 리미티드 | Semi-automatic labelling of datasets |
| KR20180029625A (ko) * | 2016-09-13 | 2018-03-21 | 대구대학교 산학협력단 | System equipped with a GT (Ground Truth) generation program for performance evaluation of image processing technology |
Non-Patent Citations (1)
| Title |
|---|
| CARL VONDRICK: "Video Annotation and Tracking with Active Learning", RESEARCHGATE, January 2011 (2011-01-01), pages 1 - 9 * |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20200068043A (ko) | 2020-06-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19891015; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19891015; Country of ref document: EP; Kind code of ref document: A1 |