
WO2020111505A1 - Method and system for generating object GT information for machine learning of images - Google Patents

Info

Publication number
WO2020111505A1
WO2020111505A1 PCT/KR2019/013511 KR2019013511W WO2020111505A1 WO 2020111505 A1 WO2020111505 A1 WO 2020111505A1 KR 2019013511 W KR2019013511 W KR 2019013511W WO 2020111505 A1 WO2020111505 A1 WO 2020111505A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
image
generating
metadata
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2019/013511
Other languages
English (en)
Korean (ko)
Inventor
양창모
송재종
추유식
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Korea Electronics Technology Institute
Original Assignee
Korea Electronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Korea Electronics Technology Institute filed Critical Korea Electronics Technology Institute
Publication of WO2020111505A1 publication Critical patent/WO2020111505A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • The present invention relates to a method for generating GT (Ground Truth) information used for machine learning of images or video, and more specifically, to a method and system for generating GT information for objects such as persons or cars contained in an image or video frame.
  • In machine learning of images, a technique for detecting objects is a basic building block. Conventionally, GT for each object is extracted manually in an offline process, and a detector trained on this data is then used to perform object detection.
  • A standard GT component and a GT description method are defined so as to describe comprehensive information about human or vehicle objects present in an image or video frame.
  • An object of the present invention is to provide a method and a system capable of efficiently generating GT information by automating GT tagging.
  • The method for generating GT information of objects in an image according to the present invention includes automatically analyzing the image according to a predetermined GT structure to generate GT information for each object it contains, correcting the generated GT information, and generating metadata by converting the corrected GT information; the GT information for each object includes the object type, object posture, object state, object location information, and object properties.
  • The GT information generation system includes an image storage unit that stores images; a GT analysis unit that receives an image file from the image storage unit and analyzes it to generate GT information for each object present in each frame; a GT information correction unit for modifying the GT information generated by the GT analysis unit; and a metadata generation unit for generating metadata by converting the modified GT information.
  • According to the present invention, a standard GT component and a GT description method can be defined so as to describe comprehensive information about objects such as persons or vehicles present in a still image or a video frame.
  • In addition, GT generation can be automated, so that GT information is produced more efficiently than with conventional methods.
  • FIG. 1 is a block diagram of a GT information generation system according to an embodiment of the present invention.
  • FIG. 2 is an overall flowchart of a method for generating GT information according to an embodiment of the present invention.
  • FIG. 3 is a detailed flowchart of the GT analysis step shown in FIG. 2.
  • FIG. 4 is a detailed flowchart of the GT information modification step illustrated in FIG. 2.
  • Each component, functional block, or means may be composed of one or more sub-components, and the electrical, electronic, and mechanical functions performed by each component may be implemented with various known devices such as electronic circuits, integrated circuits, ASICs (Application Specific Integrated Circuits), or mechanical elements; these may be implemented separately, or two or more may be integrated into one.
  • In the present invention, comprehensive information about image objects such as persons or vehicles is tagged and organized as GT metadata.
  • the image object GT information defined in the present invention is shown in Table 1 below.
  • The frame number has a single value when a still image is input, and indicates the number of each frame when a video is input.
  • The number of objects means the number of objects detected in the corresponding image or frame.
  • The object ID list contains one entry per detected object. For each object, the object type, object posture, object state, object location information, and object property information are defined and keyed by the object ID.
  • The object type is classified as 'person' or 'car'. The object posture describes the orientation of the object as it appears in the image or video frame and is divided into eight directions: 'front', 'back', 'left', 'right', 'front-left', 'front-right', 'back-left', and 'back-right'.
  • The object state is classified as 'total', 'cut', or 'overlap', depending on whether the object is entirely visible in the image or video frame, partially cut off, or overlapping with another object.
  • The object location information expresses the object's bounding box as four coordinate values measured from the (0, 0) origin of the image or video frame.
  • The property information of an object is configured differently according to the object type.
  • When the object type is 'person', for example, the properties consist of race, gender, age, height, color of the top, color of the bottom, and whether glasses are worn.
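  • As an illustration only (this sketch is not part of the original disclosure, and the field names are hypothetical), the per-object and per-frame GT structure described above could be represented in Python as follows:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectGT:
    """GT entry for one detected object (field names are illustrative)."""
    object_id: int
    object_type: str                      # 'person' or 'car'
    posture: str                          # one of the eight directions, e.g. 'front-left'
    state: str                            # 'total', 'cut' or 'overlap'
    bbox: Tuple[int, int, int, int]       # bounding box measured from the frame's (0, 0) origin
    properties: Dict[str, object] = field(default_factory=dict)  # type-dependent attributes


@dataclass
class FrameGT:
    """GT for one still image or one video frame."""
    frame_number: int
    objects: List[ObjectGT] = field(default_factory=list)

    @property
    def object_count(self) -> int:
        # the 'number of objects' component is derived from the object ID list
        return len(self.objects)


# Example: a 'person' object with a subset of the attributes listed above
person = ObjectGT(
    object_id=1, object_type='person', posture='front', state='total',
    bbox=(120, 40, 260, 400),
    properties={'gender': 'female', 'age': 30, 'height': 165,
                'top_color': 'red', 'bottom_color': 'black', 'glasses': False},
)
```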
  • The present invention thus provides a method and system for defining a standard GT component so as to describe comprehensive information about objects present in an image or video frame, and for generating and managing GT information accordingly.
  • FIG. 1 is a configuration diagram of a GT information generation system according to an embodiment of the present invention.
  • The system includes an image storage unit 100, a GT analysis unit 110, a GT information correction unit 120, a GT metadata generation unit 130, and a metadata storage unit 140.
  • The image storage unit 100 stores the images or video frames (hereinafter referred to as 'frames') that contain the objects for which GT information is to be generated.
  • The image storage unit may be implemented as non-volatile memory such as a hard disk, volatile memory such as RAM, or a register that serves as a buffer for temporarily storing streaming data.
  • The GT analysis unit 110 automatically extracts, for each frame of the image input from the image storage unit 100, each piece of information defined by the GT components described above, that is, the object type, object posture, object state, object location information, and object properties. For automatic GT analysis of the objects in an image, a self-developed algorithm, open-source software, or a cloud API can be used.
  • The GT information correction unit 120 displays the extracted GT information to the user as a list so that the user can review and modify the GT information generated by the GT analysis unit.
  • That is, a list of objects having the GT information structure described above is displayed for each frame number (the number of a still image or the frame number of a video) so that the user can review it.
  • When the user provides a correction input, the GT information is modified to reflect that input.
  • The GT metadata generation unit 130 converts the modified GT information into metadata, and the metadata storage unit 140 stores the metadata.
  • The metadata storage unit 140 and the image storage unit 100 are described separately from a logical point of view; in hardware, they may be implemented as separate storage devices or combined in a single physical storage device.
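  • A minimal sketch of how these units might be composed into a pipeline is shown below; it is an assumption for illustration, not the patented implementation, and all class and method names are hypothetical:

```python
class GTPipeline:
    """Hypothetical composition of the units shown in FIG. 1."""

    def __init__(self, image_storage, gt_analyzer, gt_corrector,
                 metadata_generator, metadata_storage):
        self.image_storage = image_storage            # image storage unit (100)
        self.gt_analyzer = gt_analyzer                # GT analysis unit (110)
        self.gt_corrector = gt_corrector              # GT information correction unit (120)
        self.metadata_generator = metadata_generator  # GT metadata generation unit (130)
        self.metadata_storage = metadata_storage      # metadata storage unit (140)

    def run(self, image_id: str) -> None:
        frames = self.image_storage.load(image_id)    # one frame for a still image, many for a video
        gt = [self.gt_analyzer.analyze(n, f) for n, f in enumerate(frames)]  # automatic GT analysis
        gt = self.gt_corrector.review(gt)             # user review and correction
        metadata = self.metadata_generator.convert(gt)  # e.g. JSON or XML
        self.metadata_storage.save(image_id, metadata)
```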
  • The automatic GT analysis step will now be described in more detail.
  • First, the input video is divided into frames, and a frame number is assigned to each frame to be analyzed (S211). In the case of a still image, only one frame exists, so frame division is not performed.
  • GT information is then automatically analyzed for each divided frame image (S214).
  • For this analysis, self-developed algorithms, open-source software, or cloud APIs can be used; one example of a cloud API is the Sighthound Cloud API.
  • The GT generated by this analysis preferably follows the structure of Table 1 above, that is, a data structure that includes the frame number, the number of objects in each frame, and, for each object ID, the object type, object posture, object state, object location information, and object properties.
  • The analyzed and generated GT information is stored for later modification (S216).
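  • A rough sketch of steps S211 to S216, assuming OpenCV (cv2) for frame extraction and a caller-supplied placeholder in place of the self-developed algorithm or cloud API, might look like this:

```python
import cv2  # assumption: OpenCV is used only to split the video into frames


def split_and_analyze(video_path, analyze_frame):
    """Split a video into numbered frames and run automatic GT analysis on each one.

    `analyze_frame(frame_number, frame)` is a placeholder for the detector
    (self-developed algorithm, open-source model, or cloud API call) and should
    return a per-frame GT record such as the FrameGT sketch above.
    """
    capture = cv2.VideoCapture(video_path)
    results = []
    frame_number = 0
    while True:
        ok, frame = capture.read()
        if not ok:                                          # end of the stream
            break
        results.append(analyze_frame(frame_number, frame))  # S214: automatic GT analysis
        frame_number += 1                                   # S211: frame numbering
    capture.release()
    return results                                          # S216: keep the GT for the correction step
```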
  • Next, the GT information correction step (S220) is performed.
  • The GT information correction step (S220) is a step in which the user manually corrects the automatically analyzed GT information.
  • First, the stored GT information is presented to the user as a list through the GT information correction unit 120 (S222).
  • The objects are listed per image number or video frame number.
  • After step S222, when the user selects an object whose information is to be modified, the selection input is received (S224) and the selected object is switched to a modifiable state.
  • An input for modifying the GT information of the selected object is then received, and the GT information is updated accordingly (S226). Specifically, the user can modify the object type, object posture, object state, object location information, and object attribute information; the GT information correction unit 120 receives the user's correction input for each of these fields and updates the GT information.
  • In step S228, it is determined whether GT information correction has been completed for all frames of the image or video. If any frame has not yet been reviewed, the process returns to step S224; once all frames have been reviewed and corrected, the GT information modification is complete.
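  • For illustration only, applying a user's correction to the selected object (steps S224 to S226) could be a simple field update on the structure sketched earlier; the helper below is hypothetical:

```python
def apply_correction(frame_gt, object_id, **updates):
    """Overwrite GT fields of one selected object with user-supplied values (S226)."""
    for obj in frame_gt.objects:
        if obj.object_id == object_id:                 # S224: the object selected by the user
            for name, value in updates.items():
                if not hasattr(obj, name):
                    raise AttributeError(f"unknown GT field: {name}")
                setattr(obj, name, value)
            return obj
    raise KeyError(f"object {object_id} not found in frame {frame_gt.frame_number}")


# Example: correct the posture and state of object 1 in one frame
# apply_correction(frame_gt, object_id=1, posture='back-right', state='overlap')
```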
  • Finally, the corrected GT information is converted into metadata (S230).
  • The generated metadata is stored in an appropriate storage (the metadata storage unit 140).
  • Formats such as XML, EXCEL, JSON, and TEXT can be used for the metadata.
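  • Converting the corrected GT into one of these formats (JSON is used here purely as an example) can then be a straightforward serialization of the structure above:

```python
import json
from dataclasses import asdict


def save_gt_as_json(frames, path):
    """Convert corrected GT records (the FrameGT sketch above) into JSON metadata and store them (S230)."""
    payload = [asdict(f) for f in frames]  # nested dataclasses -> plain dicts and lists
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(payload, fp, ensure_ascii=False, indent=2)
```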

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for generating GT information of objects in an image, comprising the steps of: performing automatic analysis according to a predetermined GT structure to generate GT information for each object included in the image; correcting the generated GT information; and generating metadata by converting the corrected GT information, wherein the GT information for each object includes the object type, the object posture, the object state, object location information, and object attributes. According to the present invention, standard GT components and a GT description method can be defined so as to describe comprehensive information about objects such as persons and vehicles present in a still image or a video frame, and GT information can be generated more efficiently than with conventional methods by automating GT generation. Finally, by defining standard GT elements for person or vehicle objects in images and video frames, common, standard GT generation techniques can be provided and used for machine learning of images and videos.
PCT/KR2019/013511 2018-11-26 2019-10-15 Method and system for generating object GT information for machine learning of images Ceased WO2020111505A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020180146998A KR20200068043A (ko) 2018-11-26 2018-11-26 Method and system for generating object GT information for machine learning of images
KR10-2018-0146998 2018-11-26

Publications (1)

Publication Number Publication Date
WO2020111505A1 true WO2020111505A1 (fr) 2020-06-04

Family

ID=70851977

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/013511 Ceased WO2020111505A1 (fr) 2018-11-26 2019-10-15 Method and system for generating object GT information for machine learning of images

Country Status (2)

Country Link
KR (1) KR20200068043A (fr)
WO (1) WO2020111505A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102313944B1 (ko) * 2021-05-14 2021-10-18 주식회사 인피닉 Method for tracking an object that crosses the boundary of the angle of view, and computer program recorded on a recording medium for executing the same
KR102310613B1 (ko) * 2021-05-14 2021-10-12 주식회사 인피닉 Method for tracking an object in consecutive 2D images, and computer program recorded on a recording medium for executing the same
KR102313940B1 (ko) * 2021-05-14 2021-10-18 주식회사 인피닉 Method for tracking an object in consecutive 3D data, and computer program recorded on a recording medium for executing the same
KR102310611B1 (ko) * 2021-06-17 2021-10-13 주식회사 인피닉 Method for tracking the same object through 2D path inference, and computer program recorded on a recording medium for executing the same
KR102313938B1 (ko) * 2021-06-17 2021-10-18 주식회사 인피닉 Method for tracking the same object through 3D path inference, and computer program recorded on a recording medium for executing the same
KR102557136B1 (ko) * 2021-11-23 2023-07-20 이인텔리전스 주식회사 Method and apparatus for generating a user data set for objects and lanes in front of a vehicle

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050234960A1 (en) * 2004-04-14 2005-10-20 Microsoft Corporation Automatic data perspective generation for a target variable
US8774515B2 (en) * 2011-04-20 2014-07-08 Xerox Corporation Learning structured prediction models for interactive image labeling
KR20180118596A (ko) * 2015-10-02 2018-10-31 트랙터블 리미티드 Semi-automatic labelling of datasets
KR20180029625A (ko) * 2016-09-13 2018-03-21 대구대학교 산학협력단 System equipped with a GT (Ground Truth) generation program for performance evaluation of image processing technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CARL VONDRICK: "Video Annotation and Tracking with Active Learning", RESEARCHGATE, January 2011 (2011-01-01), pages 1 - 9 *

Also Published As

Publication number Publication date
KR20200068043A (ko) 2020-06-15

Similar Documents

Publication Publication Date Title
WO2020111505A1 (fr) Procédé et système de production d'informations de gt d'objet pour l'apprentissage machine d'images
CN111353555A (zh) 一种标注检测方法、装置及计算机可读存储介质
WO2017039086A1 (fr) Système de modularisation d'apprentissage profond sur la base d'un module d'extension internet et procédé de reconnaissance d'image l'utilisant
CN112100438A (zh) 一种标签抽取方法、设备及计算机可读存储介质
EP2630635A1 (fr) Procédé et appareil destinés à reconnaître une émotion d'un individu sur la base d'unités d'actions faciales
CN111242083A (zh) 基于人工智能的文本处理方法、装置、设备、介质
US20200250401A1 (en) Computer system and computer-readable storage medium
WO2022213540A1 (fr) Procédé et système de détection d'objet, d'identification d'attribut d'objet et de suivi d'objet
EP3172683A1 (fr) Procédé d'extraction d'image et dispositif électronique associé
WO2021118047A1 (fr) Procédé et appareil pour évaluer une responsabilité d'accident dans une image d'accident en utilisant l'apprentissage profond
CN112699758A (zh) 基于动态手势识别的手语翻译方法、装置、计算机设备及存储介质
Papakis et al. Convolutional neural network-based in-vehicle occupant detection and classification method using second strategic highway research program cabin images
WO2024005413A1 (fr) Procédé et dispositif basés sur l'intelligence artificielle pour extraire des informations d'un document électronique
CN113887481A (zh) 一种图像处理方法、装置、电子设备及介质
WO2023109631A1 (fr) Procédé et appareil de traitement de données, dispositif, support de stockage et produit-programme
US11023713B2 (en) Suspiciousness degree estimation model generation device
CN113408329A (zh) 基于人工智能的视频处理方法、装置、设备及存储介质
CN113763370A (zh) 数字病理图像的处理方法、装置、电子设备及存储介质
CN114359160A (zh) 一种屏幕的检测方法、装置、电子设备及存储介质
WO2023272991A1 (fr) Procédé et appareil de traitement de données, dispositif informatique et support de stockage
CN116310985A (zh) 基于视频流数据的异常数据智能识别方法、装置以及设备
US20250131717A1 (en) System and method to create configurable, context sensitive functions in ar experiences
WO2011093568A1 (fr) Procédé de reconnaissance de page de support d'impression basée sur une mise en page
CN117610549B (zh) 文档处理、内容生成方法、装置及电子设备
CN114821513B (zh) 一种基于多层网络的图像处理方法及装置、电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19891015

Country of ref document: EP

Kind code of ref document: A1