
WO2023058999A1 - Object of interest detection device and method, and computer-readable program for same - Google Patents

Info

Publication number
WO2023058999A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature map
interest
extension
feature
extracting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2022/014745
Other languages
French (fr)
Korean (ko)
Inventor
고성제
조성진
이소열
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Korea University Research and Business Foundation
Original Assignee
Korea University Research and Business Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from KR1020220006514A (KR 102840028 B1)
Application filed by Korea University Research and Business Foundation
Publication of WO2023058999A1


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features

Definitions

  • The present invention relates to an apparatus and method for detecting an object of interest, and a computer-readable program therefor, and more particularly, to an apparatus, method, and computer-readable program for detecting an object of interest based on anchor boxes on an input image.
  • Object detection is one of the most actively researched areas in the field of computer vision.
  • Object detection refers to automatically classifying the type of object to be found in an image and automatically expressing the location and area of the identified object as a bounding box.
  • In particular, object detection technology in autonomous driving automatically detects cars, pedestrians, and other objects visible on the road while driving and provides the driver with information such as road conditions and traffic information. Because of these advantages, object detection in driving environments is highly applicable to areas such as self-driving cars and automatic sensing systems, and the related market is expanding rapidly.
  • Patent Document 1: Korean Patent Registration No. 10-1480065
  • According to one aspect of the present invention, multi-extension feature maps based on the shapes of anchor boxes are extracted from an input image, a combined feature map is generated by combining the extracted features, and an offset is predicted from the combined feature map to detect an object of interest. An object of the present invention is to provide an apparatus, a method, and a computer-readable program for detecting an object of interest in this way.
  • An object-of-interest detection apparatus according to an embodiment of the present invention detects an object of interest based on at least one anchor box in an input image. It includes a multi-extension feature map extractor that extracts from the input image a plurality of multi-extension feature maps based on the shapes of the anchor boxes, a feature combiner that generates combined feature maps by combining the features extracted from each multi-extension feature map, and an object detector that detects the object of interest by predicting an offset from the combined feature maps.
  • The multi-extension feature map extractor may extract the multi-extension feature maps by performing dilated (extended) convolutions whose ratios correspond to the shapes of the anchor boxes.
  • The multi-extension feature map extractor may perform the dilated convolutions by applying a plurality of convolution kernels corresponding to the aspect ratios of the anchor boxes.
  • The feature combiner may extract output features by applying a 1x1 convolution to the plurality of multi-extension feature maps, split the output features, and combine the split output features with one another to generate a plurality of combined feature maps.
  • The object detector may predict the offset by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps.
  • A method of detecting an object of interest according to another embodiment of the present invention detects an object of interest based on at least one anchor box on an input image in an object-of-interest detection apparatus. It includes extracting from the input image a plurality of multi-extension feature maps based on the shapes of the anchor boxes, generating combined feature maps by combining the features extracted from each multi-extension feature map, and detecting the object of interest by predicting an offset from the combined feature maps.
  • The step of extracting the multi-extension feature maps may include performing dilated convolutions whose ratios correspond to the shapes of the anchor boxes.
  • The step of extracting the multi-extension feature maps may include performing the dilated convolutions with a plurality of convolution kernels corresponding to the aspect ratios of the anchor boxes.
  • The step of generating the combined feature maps may include extracting output features by applying a 1x1 convolution to the plurality of multi-extension feature maps, splitting the output features, and combining the split output features with one another to generate a plurality of combined feature maps.
  • The step of detecting the object of interest may include predicting the offset by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps.
  • Another embodiment of the present invention may include a computer-readable program, stored on a computer-readable recording medium, configured to execute the object-of-interest detection method.
  • According to the present invention described above, multi-extension feature maps based on the various shapes of anchor boxes are extracted from the input image and the object of interest is detected from combined feature maps obtained by mutually combining their features, so objects can be detected more robustly with respect to their shapes.
  • Therefore, since objects that existing object-of-interest detection apparatuses fail to detect can now be detected, the performance of the object-of-interest detection apparatus can be greatly improved.
  • FIGS. 1 and 2 are diagrams for explaining a method of detecting an object of interest in a conventional object-of-interest detection device.
  • FIG. 3 is a diagram showing the configuration of an object of interest detection apparatus according to an embodiment of the present invention.
  • FIG. 4 is a diagram schematically illustrating an object of interest detection mechanism in the apparatus for detecting an object of interest shown in FIG. 3 .
  • FIG. 5 is a diagram schematically illustrating a process of extracting a multi-extension-feature map in the multi-extension-feature map extractor shown in FIG. 3 .
  • FIG. 6 is a diagram schematically illustrating a process of generating a combined feature map in the feature combiner shown in FIG. 3 .
  • FIG. 7 is a diagram schematically illustrating a process of predicting an offset in the object detection unit shown in FIG. 3 .
  • FIG. 8 is a flowchart illustrating a method for detecting an object of interest according to another embodiment of the present invention.
  • FIGS. 9A and 9B are diagrams illustrating examples of results of detecting an object of interest according to the method of FIG. 8 and according to a conventional method.
  • FIGS. 1 and 2 are diagrams for explaining a method of detecting an object of interest in an object-of-interest detection apparatus.
  • The object-of-interest detection device detects an object of interest on an input image in real time and may be, for example, a single-shot multibox detector (SSD) or a refine detector.
  • As shown in FIG. 1, the object-of-interest detection apparatus includes a plurality of detection heads, and each detection head detects the object of interest based on the difference between the size of an anchor box and the size of the real object (the ground-truth box). To this end, each detection head predicts the final offset by uniformly applying a 3x3 convolution to the input feature map, as shown in FIG. 2.
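The offset a detection head predicts can be illustrated with the box parameterization commonly used by SSD-style detectors. This parameterization is an assumption here, since the text does not spell out the exact encoding: the predicted deltas shift the anchor's center in proportion to the anchor size and rescale its width and height exponentially.

```python
import math

def decode_offsets(anchor, offsets):
    """Apply predicted offsets (dx, dy, dw, dh) to an anchor box.

    Assumes the SSD-style encoding (not stated explicitly in the text).
    Boxes are (cx, cy, w, h).
    """
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w,       # shift center x by a fraction of anchor width
            cy + dy * h,       # shift center y by a fraction of anchor height
            w * math.exp(dw),  # scale width
            h * math.exp(dh))  # scale height

# A tall 1:2 anchor nudged right and widened slightly:
box = decode_offsets(anchor=(100.0, 100.0, 40.0, 80.0),
                     offsets=(0.1, 0.0, 0.2, 0.0))
```

With zero offsets the anchor is returned unchanged, which is why the head only has to learn the (usually small) residual between the anchor and the ground-truth box.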
  • Meanwhile, the anchor boxes used in such an object-of-interest detection apparatus are boxes predefined on the input image, and they have various shapes, that is, various sizes and aspect ratios. For example, for an SSD with a 300x300 input size, 8,700 anchor boxes may exist in advance. Therefore, the object of interest needs to be detected in consideration of the various shapes of the anchor boxes.
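The anchor count cited above can be reproduced from the feature-map layout of SSD300. The per-layer sizes and anchors-per-cell below are taken from the original SSD paper, not from this text, which only quotes the total; the standard configuration gives 8,732, i.e. roughly the 8,700 boxes mentioned here.

```python
# Feature-map sizes and anchors-per-cell for SSD300 (values from the
# SSD paper; an assumption, since this text only cites the total):
feature_map_sizes = [38, 19, 10, 5, 3, 1]
anchors_per_cell  = [4, 6, 6, 6, 4, 4]

# Each cell of each feature map carries its own set of anchor boxes.
total_anchors = sum(s * s * a for s, a in zip(feature_map_sizes, anchors_per_cell))
print(total_anchors)  # → 8732
```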
  • FIG. 3 is a diagram showing the configuration of an object-of-interest detection apparatus according to an embodiment of the present invention, FIG. 4 schematically illustrates the object-of-interest detection mechanism of the apparatus shown in FIG. 3, FIG. 5 schematically illustrates the process of extracting multi-extension feature maps in the multi-extension feature map extractor shown in FIG. 3, FIG. 6 schematically illustrates the process of generating combined feature maps in the feature combiner shown in FIG. 3, and FIG. 7 schematically illustrates the process of predicting an offset in the object detector shown in FIG. 3.
  • Referring to FIGS. 3 and 4, the object-of-interest detection apparatus 100 detects an object of interest based on at least one anchor box on an input image and includes a multi-extension feature map extractor 110, a feature combiner 120, and an object detector 130.
  • The multi-extension feature map extractor 110 extracts from the input image a plurality of multi-extension feature maps based on the shapes of the anchor boxes.
  • The multi-extension feature map extractor 110 may extract the multi-extension feature maps by performing dilated convolutions whose ratios correspond to the shapes of the anchor boxes.
  • The multi-extension feature map extractor 110 may perform the dilated convolutions by applying a plurality of convolution kernels corresponding to the aspect ratios of the anchor boxes.
  • As shown in FIG. 5, the multi-extension feature map extractor 110 selects convolution kernels corresponding to anchor boxes having aspect ratios of 1:1, 1:2, and 2:1, and generates a plurality of multi-extension feature maps by performing dilated convolutions with the selected kernels. In this way, the first to fourth multi-extension feature maps are extracted.
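The idea of a dilated convolution whose ratio matches an anchor shape can be sketched as follows. This is a minimal single-channel implementation; the particular dilation rates are an assumption, chosen so that the receptive field is square, wide, or tall like the 1:1, 1:2, and 2:1 anchor boxes.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=(1, 1)):
    """Minimal 'valid' dilated convolution (single channel, stride 1).

    A k x k kernel with dilation (dh, dw) samples the input on a grid
    whose footprint is (k + (k-1)*(dh-1)) x (k + (k-1)*(dw-1)), so an
    anisotropic dilation stretches the receptive field to match a wide
    or tall anchor box.
    """
    kh, kw = kernel.shape
    dh, dw = dilation
    eff_h = kh + (kh - 1) * (dh - 1)   # effective kernel height
    eff_w = kw + (kw - 1) * (dw - 1)   # effective kernel width
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + eff_h:dh, j:j + eff_w:dw]  # dilated sampling grid
            out[i, j] = (patch * kernel).sum()
    return out

x = np.arange(100.0).reshape(10, 10)
k = np.ones((3, 3))
square = dilated_conv2d(x, k, dilation=(1, 1))  # 3x3 footprint (1:1 anchor)
wide   = dilated_conv2d(x, k, dilation=(1, 2))  # 3x5 footprint (wide anchor)
tall   = dilated_conv2d(x, k, dilation=(2, 1))  # 5x3 footprint (tall anchor)
```

The three calls use the same 3x3 weights; only the sampling grid changes, which is what makes dilated convolutions a cheap way to cover several anchor shapes.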
  • The feature combiner 120 generates combined feature maps by combining the features extracted from each multi-extension feature map.
  • More specifically, the feature combiner 120 may extract channel-reduced output features by applying a 1x1 convolution to the plurality of multi-extension feature maps, split the output features, and combine them into combined channels of a preset size to generate a plurality of combined feature maps.
  • As shown in FIG. 6, the feature combiner 120 extracts output features by applying a 1x1 convolution to each multi-extension feature map. More specifically, the feature combiner 120 extracts first output features of 256 channels by applying a 1x1 convolution to the first multi-extension feature map, second output features of 128 channels from the second multi-extension feature map, third output features of 128 channels from the third multi-extension feature map, and fourth output features of 256 channels from the fourth multi-extension feature map.
  • The feature combiner 120 then splits the output features extracted in this way and combines them into combined channels of a fixed size to create a plurality of combined feature maps. That is, the feature combiner 120 generates combined feature maps in which output features derived from anchor boxes of different shapes are mutually combined.
  • These combined feature maps can be defined as in Equation 1 below.
  • Here, N is the number of anchor boxes. In this embodiment, N is 3, considering three anchor boxes, but the invention is not limited thereto.
  • That is, the first output features and the fourth output features are split and distributed across the first, second, and third combined feature maps, and the second output features and the third output features are combined with at least one of the split first and fourth output features to create the second and third combined feature maps.
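The 1x1 reduction and split-and-recombine step can be sketched with plain arrays, treating a 1x1 convolution as a per-pixel linear map over channels. Only the 256/128/128/256 output widths come from the description; the input channel width (512) and the exact grouping of the split halves are assumptions, since Equation 1 is not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8

def conv1x1(x, out_ch):
    """A 1x1 convolution is just a per-pixel linear map over channels."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

# Four multi-extension feature maps (512 input channels is a placeholder).
d1, d2, d3, d4 = (rng.standard_normal((512, H, W)) for _ in range(4))

# 1x1 convolutions reduce them to 256/128/128/256 channels, as in the text.
o1, o2, o3, o4 = conv1x1(d1, 256), conv1x1(d2, 128), conv1x1(d3, 128), conv1x1(d4, 256)

# Split the 256-channel outputs in half and recombine across branches.
# NOTE: this grouping is one plausible assignment for N = 3, not the
# exact one defined by Equation 1 of the patent.
o1a, o1b = o1[:128], o1[128:]
o4a, o4b = o4[:128], o4[128:]
c1 = np.concatenate([o1a, o4a])  # combined map 1: pieces of o1 and o4
c2 = np.concatenate([o1b, o2])   # combined map 2: o1 piece + o2
c3 = np.concatenate([o4b, o3])   # combined map 3: o4 piece + o3
```

Each combined map ends up with 256 channels that mix features computed under different anchor shapes, which is the stated purpose of the feature combiner.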
  • The object detector 130 detects the object of interest by predicting an offset from the combined feature maps.
  • As shown in FIG. 7, the object detector 130 may predict the offset by performing a 3x3 convolution, applying a convolution kernel of a fixed size to each of the plurality of combined feature maps. That is, the object detector 130 predicts the final offset by applying the same convolution kernel to each combined feature map, in which output features based on anchor boxes of different shapes are mutually combined, and thereby detects the object of interest.
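The weight sharing described here can be sketched as follows: one fixed-size 3x3 kernel is applied to every combined feature map. The 4-channel output (one box offset per spatial position) is an assumption based on the usual box parameterization; the text only says that a fixed-size kernel predicts the offset.

```python
import numpy as np

rng = np.random.default_rng(1)

def conv3x3(x, w):
    """'Valid' 3x3 multi-channel convolution (no padding, stride 1)."""
    C, H, W = x.shape
    out = np.zeros((w.shape[0], H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[:, i, j] = np.einsum('ochw,chw->o', w, x[:, i:i + 3, j:j + 3])
    return out

# Three combined feature maps (256 channels each, as in the text).
combined = [rng.standard_normal((256, 8, 8)) for _ in range(3)]

# ONE fixed-size kernel, shared across all combined maps, predicts a
# 4-dimensional offset (dx, dy, dw, dh) per spatial position.
shared_w = rng.standard_normal((4, 256, 3, 3))
offsets = [conv3x3(c, shared_w) for c in combined]
```

Because the anchor-shape-specific processing already happened in the dilated branches, a single shared kernel suffices at the prediction stage, which keeps the detection head small.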
  • FIG. 8 is a flowchart illustrating a method for detecting an object of interest according to another embodiment of the present invention.
  • Referring to FIG. 8, a method of detecting an object of interest detects an object of interest based on at least one anchor box on an input image in an object-of-interest detection apparatus. It includes extracting, in the multi-extension feature map extractor, a plurality of multi-extension feature maps from the input image based on the shapes of the anchor boxes (S10); generating, in the feature combiner, combined feature maps by combining the features extracted from each multi-extension feature map (S20); and detecting, in the object detector, the object of interest by predicting an offset from the combined feature maps (S30).
  • In the step of extracting the multi-extension feature maps (S10), the multi-extension feature maps may be extracted by performing dilated convolutions whose ratios correspond to the shapes of the anchor boxes.
  • The step of extracting the multi-extension feature maps (S10) may also include performing the dilated convolutions with a plurality of convolution kernels corresponding to the aspect ratios of the anchor boxes.
  • In the step of generating the combined feature maps (S20), output features are extracted by applying a 1x1 convolution to the plurality of multi-extension feature maps, the output features are split, and the split output features are combined with one another to generate a plurality of combined feature maps.
  • In the step of detecting the object of interest (S30), the offset may be predicted by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps.
  • FIGS. 9A and 9B are diagrams illustrating examples of results of detecting an object of interest according to the method of FIG. 8 and according to a conventional method.
  • FIG. 9A shows the result of detecting an object of interest by the conventional method, and FIG. 9B shows the result of detecting an object of interest by the method according to the present invention.
  • As described above, multi-extension feature maps based on the various shapes of anchor boxes are extracted from the input image, and the object of interest is detected from combined feature maps obtained by combining their features, so object detection can be performed more robustly with respect to the shape of the object. Therefore, since objects that the existing object-of-interest detection apparatus fails to detect can be detected, the performance of the object-of-interest detection apparatus can be greatly improved.
  • An operation by the method of detecting an object of interest according to the above-described embodiments may be at least partially implemented as a computer program and recorded on a computer-readable recording medium.
  • a computer-readable recording medium on which a program for implementing an object-of-interest detection operation according to embodiments is recorded includes all types of recording devices storing data readable by a computer. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices.
  • computer-readable recording media may be distributed in computer systems connected through a network, and computer-readable codes may be stored and executed in a distributed manner.
  • Functional programs, codes, and code segments for implementing the embodiments can be easily construed by programmers skilled in the art to which the embodiments belong.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an object of interest detection device and method, and a computer-readable program for same. The object of interest detection device according to the present invention detects an object of interest in an input image on the basis of one or more anchor boxes. The object of interest detection device comprises: a multi-extension feature map extraction unit for extracting, from the input image, a plurality of multi-extension feature maps based on the shapes of the anchor boxes; a feature combination unit for generating a combined feature map by combining features extracted from each of the multi-extension feature maps; and an object detection unit for detecting the object of interest by predicting an offset from the combined feature map. According to the invention described above, the multi-extension feature maps based on the various shapes of the anchor boxes are extracted from the input image, and the object of interest is detected from the combined feature map obtained by mutually combining the features, and thus object detection can be performed more robustly with respect to the shape of the object.

Description

Object of interest detection device, method, and computer-readable program therefor

Recently, as autonomous driving and security surveillance technologies have grown, the importance of such object detection technology has been increasing.

Accordingly, there is a need for an object detection technique capable of detecting objects more accurately.

[Prior Art Literature]

[Patent Literature]

(Patent Document 1) Korean Patent Registration No. 10-1480065

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS. The detailed description of the present invention that follows refers to the accompanying drawings, which illustrate, by way of example, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the present invention are different from one another but are not necessarily mutually exclusive. For example, specific shapes, structures, and characteristics described herein in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the present invention is limited only by the appended claims, together with the full range of equivalents to which those claims are entitled. Like reference numerals in the drawings denote the same or similar functions throughout the several views.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the drawings.

FIGS. 1 and 2 are diagrams for explaining a method of detecting an object of interest in an object-of-interest detection apparatus.

The object-of-interest detection apparatus detects an object of interest in an input image in real time and may be, for example, an SSD (Single Shot multi-box Detector) or a Refine detector.

As shown in FIG. 1, such an apparatus includes a plurality of detection heads, and each detection head detects the object of interest based on the difference between the size of an anchor box and the size of the actual object (the ground-truth box). To this end, as shown in FIG. 2, each detection head applies a single 3x3 convolution operation to the input feature map to predict the final offsets (offset prediction).
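As a rough illustration, a conventional SSD-style detection head of the kind described above can be sketched as a single 3x3 convolution that regresses four box offsets per anchor at every spatial position. The channel count, anchor count, and feature-map size below are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

# Sketch of a conventional detection head (cf. FIG. 2): one 3x3 convolution
# over the input feature map predicts 4 box offsets (dx, dy, dw, dh) for each
# of `num_anchors` anchor boxes at every spatial location.
class ConventionalOffsetHead(nn.Module):
    def __init__(self, in_channels: int = 512, num_anchors: int = 3):
        super().__init__()
        self.offset_conv = nn.Conv2d(in_channels, num_anchors * 4,
                                     kernel_size=3, padding=1)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, num_anchors * 4, H, W)
        return self.offset_conv(feature_map)

head = ConventionalOffsetHead(in_channels=512, num_anchors=3)
offsets = head(torch.randn(1, 512, 38, 38))
print(offsets.shape)  # torch.Size([1, 12, 38, 38])
```

Note that this single kernel is applied identically regardless of which anchor shape each output channel corresponds to, which is the limitation the apparatus described below addresses.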

Meanwhile, the anchor boxes used by such an apparatus are boxes that pre-exist in the input image and have various shapes, that is, various sizes and aspect ratios. For example, in an SSD with a 300x300 input size, 8,700 anchor boxes may exist in advance. It is therefore necessary to take the various shapes of the anchor boxes into account when detecting the object of interest.
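The set of pre-defined anchor boxes for one feature-map level can be enumerated as in the following sketch. The scale value and the exact ratio set are assumptions; the patent only states that many anchor boxes of various sizes and aspect ratios pre-exist in the input image.

```python
import itertools
import math

# Hedged sketch: enumerate anchor boxes (cx, cy, w, h) in normalized
# coordinates for a single feature-map level, one anchor per aspect ratio
# at every cell. Ratios 1:1, 2:1, 1:2 mirror the shapes discussed below.
def make_anchors(fmap_size: int, scale: float, ratios=(1.0, 2.0, 0.5)):
    anchors = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
        for r in ratios:
            # width/height chosen so the box area stays scale**2
            w, h = scale * math.sqrt(r), scale / math.sqrt(r)
            anchors.append((cx, cy, w, h))
    return anchors

anchors = make_anchors(fmap_size=38, scale=0.1)
print(len(anchors))  # 38 * 38 * 3 = 4332
```

Summing such counts over all feature-map levels of an SSD-style detector yields the thousands of pre-existing anchor boxes mentioned above.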

An apparatus that detects an object of interest based on these various anchor-box shapes is described below.

FIG. 3 shows the configuration of an object-of-interest detection apparatus according to an embodiment of the present invention; FIG. 4 schematically illustrates the detection mechanism of the apparatus shown in FIG. 3; FIG. 5 schematically illustrates the process of extracting multi-dilated feature maps in the multi-dilated feature map extractor shown in FIG. 3; FIG. 6 schematically illustrates the process of generating combined feature maps in the feature combiner shown in FIG. 3; and FIG. 7 schematically illustrates the process of predicting offsets in the object detector shown in FIG. 3.

The object-of-interest detection apparatus 100 according to this embodiment detects an object of interest based on at least one anchor box in an input image and includes a multi-dilated feature map extractor 110, a feature combiner 120, and an object detector 130.

The multi-dilated feature map extractor 110 extracts a plurality of multi-dilated feature maps based on the shapes of the anchor boxes from the input image.

The multi-dilated feature map extractor 110 may extract the multi-dilated feature maps by performing dilated convolution operations whose dilation ratios correspond to the shapes of the anchor boxes.

More specifically, the multi-dilated feature map extractor 110 may perform the dilated convolution operations by applying a plurality of convolution kernels corresponding to the aspect ratios of the anchor boxes.

Referring to FIG. 5, the multi-dilated feature map extractor 110 selects convolution kernels corresponding to the anchor boxes with aspect ratios of 1:1, 1:2, and 2:1, and performs dilated convolution operations with the selected kernels to generate a plurality of multi-dilated feature maps (
Figure PCTKR2022014745-appb-img-000001
). First, the extractor 110 applies a 3x3 convolution between the input feature map and a convolution kernel (d=(1,1)) reflecting the shape of the anchor box with a 1:1 aspect ratio (the red anchor box) to extract a first multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000002
). Likewise, it applies a 3x3 convolution with a kernel (d=(1,2)) reflecting the shape of the anchor box with a 1:2 aspect ratio (the yellow anchor box) to extract a second multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000003
), and a 3x3 convolution with a kernel (d=(2,1)) reflecting the shape of the anchor box with a 2:1 aspect ratio (the purple anchor box) to extract a third multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000004
). In addition, to supplement the features of wide-scale objects, the extractor applies a 3x3 convolution between the input feature map and a kernel with d=(2,2) to extract a fourth multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000005
).
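The four dilated convolution branches described above can be sketched as follows. The anisotropic dilation rates d=(1,1), (1,2), (2,1), (2,2) come from the description of FIG. 5; the channel widths are assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the multi-dilated extraction step: four 3x3 convolutions whose
# dilation rates mirror the 1:1, 1:2, and 2:1 anchor aspect ratios, plus a
# (2,2) branch for wide-scale objects. Setting padding equal to the dilation
# keeps the spatial size of a 3x3 convolution unchanged.
class MultiDilatedExtractor(nn.Module):
    def __init__(self, in_channels: int = 512, out_channels: int = 512):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3,
                      padding=d, dilation=d)
            for d in [(1, 1), (1, 2), (2, 1), (2, 2)]
        ])

    def forward(self, x: torch.Tensor):
        # One feature map per anchor shape, all with the input's spatial size
        return [branch(x) for branch in self.branches]

extractor = MultiDilatedExtractor()
maps = extractor(torch.randn(1, 512, 38, 38))
print([tuple(m.shape) for m in maps])  # four maps, each (1, 512, 38, 38)
```

A d=(1,2) kernel samples a 3x5 receptive field, matching a wide (1:2) anchor box, while d=(2,1) samples a 5x3 field for tall anchors; this is the sense in which the dilation ratio "corresponds to" the anchor shape.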

Although FIG. 5 illustrates extracting multi-dilated feature maps (
Figure PCTKR2022014745-appb-img-000006
) for anchor boxes with aspect ratios of 1:1, 1:2, and 2:1, this is only an example; multi-dilated feature maps may of course be extracted for anchor boxes with various other aspect ratios.

The feature combiner 120 combines the features extracted from each multi-dilated feature map to generate combined feature maps.

More specifically, the feature combiner 120 may apply a 1x1 convolution operation to the plurality of multi-dilated feature maps to extract channel-reduced output features, then split the output features and concatenate them into combination channels of a preset size to generate a plurality of combined feature maps.

Referring to FIG. 6, the feature combiner 120 first applies a 1x1 convolution operation to each multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000007
) to extract the output features (
Figure PCTKR2022014745-appb-img-000008
,
Figure PCTKR2022014745-appb-img-000009
,
Figure PCTKR2022014745-appb-img-000010
,
Figure PCTKR2022014745-appb-img-000011
). More specifically, the feature combiner 120 applies a 1x1 convolution to the first multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000012
) to extract 256-channel first output features (
Figure PCTKR2022014745-appb-img-000013
), and to the second multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000014
) to extract 128-channel second output features (
Figure PCTKR2022014745-appb-img-000015
). Likewise, it applies a 1x1 convolution to the third multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000016
) to extract 128-channel third output features (
Figure PCTKR2022014745-appb-img-000017
), and to the fourth multi-dilated feature map (
Figure PCTKR2022014745-appb-img-000018
) to extract 256-channel fourth output features (
Figure PCTKR2022014745-appb-img-000019
).
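The 1x1 channel-reduction step above can be sketched directly. The output widths 256, 128, 128, and 256 are taken from the description of FIG. 6; the 512-channel input width is an assumption.

```python
import torch
import torch.nn as nn

# Sketch of the 1x1 reduction: each multi-dilated feature map is projected
# to a smaller channel count (256, 128, 128, 256) without changing its
# spatial size, since a 1x1 convolution mixes only channels.
reducers = nn.ModuleList([
    nn.Conv2d(512, c, kernel_size=1) for c in (256, 128, 128, 256)
])

dilated_maps = [torch.randn(1, 512, 38, 38) for _ in range(4)]
outputs = [reducer(m) for reducer, m in zip(reducers, dilated_maps)]
print([o.shape[1] for o in outputs])  # [256, 128, 128, 256]
```

The reduced outputs are what the combiner then splits and regroups, as described next.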

The feature combiner 120 then splits the extracted output features (
Figure PCTKR2022014745-appb-img-000020
,
Figure PCTKR2022014745-appb-img-000021
,
Figure PCTKR2022014745-appb-img-000022
,
Figure PCTKR2022014745-appb-img-000023
) and concatenates them into combination channels of a preset size, generating a plurality of combined feature maps (
Figure PCTKR2022014745-appb-img-000024
). In other words, the feature combiner 120 generates combined feature maps (
Figure PCTKR2022014745-appb-img-000025
) in which output features derived from anchor boxes of different shapes are merged with one another. Such a combined feature map (
Figure PCTKR2022014745-appb-img-000026
) can be defined as in Equation 1 below.

[Equation 1]

Figure PCTKR2022014745-appb-img-000027

Here, N is the number of anchor boxes. In this embodiment N is 3, since three anchor-box shapes are considered, but the invention is not limited thereto.

That is, the first output features (
Figure PCTKR2022014745-appb-img-000028
) and the fourth output features (
Figure PCTKR2022014745-appb-img-000029
) are split and become components of the first combined feature map (
Figure PCTKR2022014745-appb-img-000030
), the second combined feature map (
Figure PCTKR2022014745-appb-img-000031
), and the third combined feature map (
Figure PCTKR2022014745-appb-img-000032
), while the second output features (
Figure PCTKR2022014745-appb-img-000033
) and the third output features (
Figure PCTKR2022014745-appb-img-000034
) are each concatenated with at least one of the split first output features (
Figure PCTKR2022014745-appb-img-000035
) and fourth output features (
Figure PCTKR2022014745-appb-img-000036
) to form the second combined feature map (
Figure PCTKR2022014745-appb-img-000037
) and the third combined feature map (
Figure PCTKR2022014745-appb-img-000038
), respectively.
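One possible realization of this split-and-concatenate scheme is sketched below. The exact channel partition is a guess: the patent states only that the 256-channel first and fourth output features are split across all three combined maps, that the second and third output features join the second and third combined maps respectively, and that the combined maps have a preset channel size. The partition chosen here makes each combined map exactly 256 channels (256+128+128+256 = 3x256).

```python
import torch

# Hypothetical split/concat for the feature combiner. o1/o4 are the
# 256-channel outputs, o2/o3 the 128-channel outputs from the 1x1 step.
o1, o4 = torch.randn(1, 256, 38, 38), torch.randn(1, 256, 38, 38)
o2, o3 = torch.randn(1, 128, 38, 38), torch.randn(1, 128, 38, 38)

# Split o1 and o4 along the channel dimension into parts for each map
o1a, o1b, o1c = o1.split([128, 64, 64], dim=1)
o4a, o4b, o4c = o4.split([128, 64, 64], dim=1)

c1 = torch.cat([o1a, o4a], dim=1)      # 128 + 128       -> combined map 1
c2 = torch.cat([o1b, o2, o4b], dim=1)  # 64 + 128 + 64   -> combined map 2
c3 = torch.cat([o1c, o3, o4c], dim=1)  # 64 + 128 + 64   -> combined map 3
print([c.shape[1] for c in (c1, c2, c3)])  # [256, 256, 256]
```

Under this scheme every combined map mixes features computed with at least two different dilation shapes, which is the stated purpose of the combiner.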

The object detector 130 detects the object of interest by predicting offsets from the combined feature maps.

To this end, the object detector 130 may predict the offsets by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps and performing a 3x3 convolution operation. That is, the object detector 130 applies the same convolution kernel to each combined feature map, in which output features based on anchor boxes of different shapes are merged, to predict the final offsets and thereby detect the object of interest.
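The shared-kernel offset prediction can be sketched as follows; the 256-channel width of the combined maps and the 4-offset output are assumptions consistent with the examples above.

```python
import torch
import torch.nn as nn

# Sketch of the final step: a single fixed-size 3x3 kernel is reused across
# all combined feature maps, producing one offset map per anchor shape.
shared_head = nn.Conv2d(256, 4, kernel_size=3, padding=1)

combined_maps = [torch.randn(1, 256, 38, 38) for _ in range(3)]
preds = [shared_head(c) for c in combined_maps]  # identical weights each time
print([tuple(p.shape) for p in preds])  # three tensors of shape (1, 4, 38, 38)
```

Because each combined map already encodes a particular anchor shape, a single shared kernel suffices here, unlike the conventional head where one kernel must serve all shapes at once.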

FIG. 8 is a flowchart illustrating a method of detecting an object of interest according to another embodiment of the present invention.

The method according to this embodiment detects an object of interest based on at least one anchor box in an input image in an object-of-interest detection apparatus, and includes: extracting, in the multi-dilated feature map extractor, a plurality of multi-dilated feature maps based on the shapes of the anchor boxes from the input image (S10); combining, in the feature combiner, the features extracted from each multi-dilated feature map to generate combined feature maps (S20); and detecting, in the object detector, the object of interest by predicting offsets from the combined feature maps (S30).

In step S10, the multi-dilated feature maps may be extracted by performing dilated convolution operations whose dilation ratios correspond to the shapes of the anchor boxes. Step S10 may also perform the dilated convolution operations by applying a plurality of convolution kernels corresponding to the aspect ratios of the anchor boxes.

In step S20, output features may be extracted by applying a 1x1 convolution operation to the plurality of multi-dilated feature maps, and the output features may then be split and the split output features concatenated with one another to generate a plurality of combined feature maps.

In step S30, the offset values may be calculated by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps and performing a convolution operation.
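Steps S10 through S30 can be composed into a single module, sketched below. All channel widths and the split scheme are illustrative assumptions; only the overall structure (multi-dilated extraction, 1x1 reduction with split/concat, shared 3x3 offset prediction) is taken from the description.

```python
import torch
import torch.nn as nn

# End-to-end sketch of the method of FIG. 8 for one feature-map level.
class InterestObjectHead(nn.Module):
    def __init__(self, in_ch: int = 512):
        super().__init__()
        dils = [(1, 1), (1, 2), (2, 1), (2, 2)]
        # S10: one dilated 3x3 branch per anchor shape
        self.dilated = nn.ModuleList(
            [nn.Conv2d(in_ch, in_ch, 3, padding=d, dilation=d) for d in dils])
        # S20: 1x1 channel reduction (256, 128, 128, 256)
        self.reduce = nn.ModuleList(
            [nn.Conv2d(in_ch, c, 1) for c in (256, 128, 128, 256)])
        # S30: one fixed-size kernel shared by all combined maps
        self.offset = nn.Conv2d(256, 4, 3, padding=1)

    def forward(self, x: torch.Tensor):
        o1, o2, o3, o4 = [r(d(x)) for r, d in zip(self.reduce, self.dilated)]
        # Hypothetical split of the 256-channel outputs across three maps
        o1a, o1b, o1c = o1.split([128, 64, 64], dim=1)
        o4a, o4b, o4c = o4.split([128, 64, 64], dim=1)
        combined = [torch.cat([o1a, o4a], dim=1),
                    torch.cat([o1b, o2, o4b], dim=1),
                    torch.cat([o1c, o3, o4c], dim=1)]
        return [self.offset(c) for c in combined]  # one offset map per shape

head = InterestObjectHead()
offs = head(torch.randn(1, 512, 38, 38))
print([tuple(o.shape) for o in offs])  # three tensors of shape (1, 4, 38, 38)
```

In a full detector, one such head would replace the conventional 3x3 offset head at each detection-head position of FIG. 1.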

The remaining details are the same as those of the object-of-interest detection apparatus, so the descriptions of FIGS. 3 to 7 apply here as well.

FIGS. 9A and 9B are diagrams showing an example of the results of detecting objects of interest according to the detection method of FIG. 8 and according to a conventional method.

Specifically, FIG. 9A shows the result of detecting objects of interest with a conventional method, and FIG. 9B shows the result of detecting objects of interest with the detection method of the present invention. Referring to FIGS. 9A and 9B, detection according to the present invention finds a variety of objects that the existing technique fails to detect.

In other words, according to the present invention described above, multi-dilated feature maps based on the various shapes of the anchor boxes are extracted from the input image, and the object of interest is detected from combined feature maps in which their features are merged, so object detection is more robust to the shape of the object. Objects that existing detection apparatuses fail to detect can therefore be detected, and the performance of the object-of-interest detection apparatus can be greatly improved.

The operations of the object-of-interest detection methods according to the embodiments described above may be implemented, at least in part, as a computer program and recorded on a computer-readable recording medium. A computer-readable recording medium on which a program implementing the detection operations according to the embodiments is recorded includes any type of recording device that stores data readable by a computer. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices. The computer-readable recording medium may also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the embodiments will be readily understood by those of ordinary skill in the art to which the embodiments belong.

Although the foregoing has been described with reference to embodiments, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the invention set forth in the claims below.

[Description of Reference Numerals]

110: multi-dilated feature map extractor

120: feature combiner

130: object detector

Claims (11)

1. An apparatus for detecting an object of interest based on at least one anchor box in an input image, the apparatus comprising: a multi-dilated feature map extractor configured to extract a plurality of multi-dilated feature maps based on shapes of the anchor box from the input image; a feature combiner configured to generate combined feature maps by combining features extracted from each multi-dilated feature map; and an object detector configured to detect the object of interest by predicting an offset from the combined feature maps.

2. The apparatus of claim 1, wherein the multi-dilated feature map extractor extracts the multi-dilated feature maps by performing dilated convolution operations with ratios corresponding to the shapes of the anchor box.

3. The apparatus of claim 2, wherein the multi-dilated feature map extractor performs the dilated convolution operations by applying a plurality of convolution kernels corresponding to aspect ratios of the anchor box.

4. The apparatus of claim 3, wherein the feature combiner extracts output features by applying a 1x1 convolution operation to the plurality of multi-dilated feature maps, and splits the output features and combines the split output features with one another to generate a plurality of combined feature maps.
5. The apparatus of claim 4, wherein the object detector predicts the offset by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps and performing a convolution operation.

6. A method of detecting an object of interest based on at least one anchor box in an input image in an object-of-interest detection apparatus, the method comprising: extracting a plurality of multi-dilated feature maps based on shapes of the anchor box from the input image; generating combined feature maps by combining features extracted from each multi-dilated feature map; and detecting the object of interest by predicting an offset from the combined feature maps.

7. The method of claim 6, wherein the extracting comprises performing dilated convolution operations with ratios corresponding to the shapes of the anchor box to extract the multi-dilated feature maps.

8. The method of claim 7, wherein the extracting comprises performing the dilated convolution operations by applying a plurality of convolution kernels corresponding to aspect ratios of the anchor box.
9. The method of claim 8, wherein the generating comprises extracting output features by applying a 1x1 convolution operation to the plurality of multi-dilated feature maps, and splitting the output features and combining the split output features with one another to generate a plurality of combined feature maps.

10. The method of claim 9, wherein the detecting comprises predicting the offset by applying a convolution kernel of a fixed size to each of the plurality of combined feature maps and performing a convolution operation.

11. A computer-readable program stored on a computer-readable recording medium and configured to execute the object-of-interest detection method of claim 6.
PCT/KR2022/014745 2021-10-08 2022-09-30 Object of interest detection device and method, and computer-readable program for same Ceased WO2023058999A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2021-0134298 2021-10-08
KR20210134298 2021-10-08
KR10-2022-0006514 2022-01-17
KR1020220006514A KR102840028B1 (en) 2021-10-08 2022-01-17 Device and method for detecting object of interest and computer readable program for the same

Publications (1)

Publication Number Publication Date
WO2023058999A1 true WO2023058999A1 (en) 2023-04-13

Family

ID=85804502

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/014745 Ceased WO2023058999A1 (en) 2021-10-08 2022-09-30 Object of interest detection device and method, and computer-readable program for same

Country Status (1)

Country Link
WO (1) WO2023058999A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120434363A (en) * 2025-07-07 2025-08-05 飒铂智能科技(山东)有限公司 Ammunition search method and system based on drone

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190019822A (en) * 2017-08-18 2019-02-27 삼성전자주식회사 System and method for semantic segmentation of images
KR20200092844A (en) * 2019-01-25 2020-08-04 주식회사 스트라드비젼 Learning method and testing method of object detector to be used for surveillance based on r-cnn capable of converting modes according to aspect ratios or scales of objects, and learning device and testing device using the same
KR20210024124A (en) * 2018-12-29 2021-03-04 베이징 센스타임 테크놀로지 디벨롭먼트 컴퍼니 리미티드 Target object detection method and apparatus, electronic device and recording medium
KR20210111417A (en) * 2020-03-03 2021-09-13 한국과학기술연구원 Robust Multi-Object Detection Apparatus and Method Using Siamese Network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190019822A (en) * 2017-08-18 2019-02-27 삼성전자주식회사 System and method for semantic segmentation of images
KR20210024124A (en) * 2018-12-29 2021-03-04 베이징 센스타임 테크놀로지 디벨롭먼트 컴퍼니 리미티드 Target object detection method and apparatus, electronic device and recording medium
KR20200092844A (en) * 2019-01-25 2020-08-04 주식회사 스트라드비젼 Learning method and testing method of object detector to be used for surveillance based on r-cnn capable of converting modes according to aspect ratios or scales of objects, and learning device and testing device using the same
KR20210111417A (en) * 2020-03-03 2021-09-13 한국과학기술연구원 Robust Multi-Object Detection Apparatus and Method Using Siamese Network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAKAHASHI NAOYA; MITSUFUJI YUKI: "Densely connected multidilated convolutional networks for dense prediction tasks", 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), IEEE, 20 June 2021 (2021-06-20), pages 993 - 1002, XP034007538, DOI: 10.1109/CVPR46437.2021.00105 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120434363A (en) * 2025-07-07 2025-08-05 飒铂智能科技(山东)有限公司 Ammunition search method and system based on drone
CN120434363B (en) * 2025-07-07 2025-09-05 飒铂智能科技(山东)有限公司 Ammunition search method and system based on drone

Similar Documents

Publication Publication Date Title
KR20200047307A (en) Cnn-based learning method, learning device for selecting useful training data and test method, test device using the same
US9405986B2 (en) Apparatus and method for recognizing objects using filter information
EP3620945B1 (en) Obstacle distribution simulation method, device and terminal based on multiple models
WO2021230457A1 (en) Learning method and learning device for training an object detection network by using attention maps and testing method and testing device using the same
JP2019149150A (en) Method and apparatus for processing point cloud data
AU2019419781A1 (en) Vehicle using spatial information acquired using sensor, sensing device using spatial information acquired using sensor, and server
WO2019240452A1 (en) Method and system for automatically collecting and updating information related to point of interest in real space
WO2015105239A1 (en) Vehicle and lane position detection system and method
WO2021235682A1 (en) Method and device for performing behavior prediction by using explainable self-focused attention
WO2020067751A1 (en) Device and method for data fusion between heterogeneous sensors
WO2015178540A1 (en) Apparatus and method for tracking target using handover between cameras
WO2020141694A1 (en) Vehicle using spatial information acquired using sensor, sensing device using spatial information acquired using sensor, and server
KR102333520B1 (en) Method, device and system for detecting object on road
WO2011034308A2 (en) Method and system for matching panoramic images using a graph structure, and computer-readable recording medium
JP2007272456A5 (en)
KR102836141B1 (en) Methods for analyzing sensor data streams and guiding devices and vehicles
KR101963404B1 (en) Two-step optimized deep learning method, computer-readable medium having a program recorded therein for executing the same and deep learning system
US11847837B2 (en) Image-based lane detection and ego-lane recognition method and apparatus
WO2023058999A1 (en) Object of interest detection device and method, and computer-readable program for same
WO2016143976A1 (en) Method for recognizing operator in work site image data
KR101986015B1 (en) Device and method for multiple sensor simulation
WO2021091053A1 (en) Location measurement system using image similarity analysis, and method thereof
WO2022270751A1 (en) Method and device for detecting road surface by using lidar sensor
WO2018131729A1 (en) Method and system for detection of moving object in image using single camera
WO2025089565A1 (en) System and method for sensor fusion using plurality of radars and cameras

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22878820

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22878820

Country of ref document: EP

Kind code of ref document: A1