
WO2023106870A1 - Re-structuralized convolutional neural network system using cmp and operation method thereof - Google Patents


Info

Publication number
WO2023106870A1
Authority
WO
WIPO (PCT)
Prior art keywords
pooling
layer
feature
cmp
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2022/019972
Other languages
French (fr)
Korean (ko)
Inventor
심춘보
박준
김준영
정세훈
박성욱
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sunchon National University SCNU
Original Assignee
Sunchon National University SCNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sunchon National University SCNU filed Critical Sunchon National University SCNU
Publication of WO2023106870A1 publication Critical patent/WO2023106870A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • G06F17/153Multidimensional correlation or convolution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present invention relates to a convolutional neural network system, and more particularly, to a restructured convolutional neural network system using Conditional Min Pooling (CMP) and an operation method thereof.
  • CMP Conditional Min Pooling
  • CNNs convolutional neural networks
  • The initial CNN models took a long time to process their high computational load with the hardware available at the time of development and did not attract attention because they did not show remarkable performance. However, recent advances in hardware have allowed the previously problematic computational load to be processed quickly, and CNNs now also show a high level of performance; they have therefore attracted the attention of many researchers, and active research is being conducted for various purposes.
  • Pooling performs key functions such as reducing the amount of computation and preventing overfitting by reducing the size of feature maps, but changing the pooling structure does not improve CNN performance as much as changing the convolution structure, so little research is conducted on it. However, pooling in CNNs has various problems depending on the technique used.
  • Common pooling techniques include Min Pooling, Average Pooling, and Max Pooling.
  • Min Pooling extracts the minimum value as the feature value, so it is rarely used: in certain images the feature itself disappears, or noise can be detected as a feature.
  • Average Pooling calculates the average of the feature values; when a positive feature and a negative feature exist in the same space, the two features cancel each other out, which can lead to a situation where no feature remains.
  • In addition, a down-scale weighting effect, in which strong features are attenuated, may occur depending on the type of activation function. Max Pooling takes only the strong features among those arranged in space, so fine-grained features disappear.
  • Features extracted through Max Pooling therefore suffer from reduced accuracy and a tendency to overfit when discriminating between similar objects.
  • Stochastic pooling is being studied as a way to solve the problems of these general pooling techniques.
  • Stochastic Pooling is a technique that utilizes probability, and like existing pooling techniques, it uses a window method.
  • Stochastic Pooling calculates a probability for each pixel by normalizing the pixel values, dividing each pixel value by the sum of all pixel values in the window. A pixel to be extracted as the feature value is then randomly selected according to these probabilities. As a result, the more often a value is repeated, the higher its probability of being extracted as the feature value, and thus the higher the probability that a meaningful value is extracted.
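The window-wise procedure described above can be sketched in a few lines of plain Python (an illustrative sketch, not code from the patent; the function name is ours, and non-negative, e.g. post-ReLU, activations are assumed):

```python
import random

def stochastic_pool(window):
    """Stochastic pooling for one window: normalize the pixel values into
    probabilities, then randomly pick one pixel according to them."""
    total = sum(window)
    if total == 0:
        return 0.0  # degenerate all-zero window: no probabilities to form
    probs = [v / total for v in window]
    # larger (and more frequently repeated) values are more likely to be picked
    return random.choices(window, weights=probs, k=1)[0]

window = [1.0, 2.0, 3.0, 4.0]
print(stochastic_pool(window))  # one of the window's values, chosen by weight
```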
  • The present invention was made in view of the above points, and an object thereof is to provide Conditional Min Pooling (hereinafter 'CMP'), which is based on statistics rather than on probability like Stochastic Pooling, to solve the problems of existing pooling techniques.
  • Another purpose is to provide a restructured convolutional neural network system to efficiently utilize CMP.
  • The restructured convolutional neural network system using CMP for achieving the above object includes: a convolution layer that processes an input image through a filter and outputs a feature map as a result of the operation; a pooling layer that reduces the size of the feature map through sub-sampling of the output feature map; and a Fully Connected Layer that flattens the size-reduced feature map into a one-dimensional array, performs learning while adjusting parameters so as to predict the label, and outputs the result.
  • The pooling layer includes a two-stage structure of a Conditional Min Pooling (CMP) unit, which extracts the minimum value as the feature value, as in Min Pooling, when 0 does not exist in the window and, when 0 exists in the window, does not extract 0 as the feature value but counts the pixels containing 0 and sets a constraint on the confirmed ratio of 0 according to an allowable value (0.25 to 1), and a Max Pooling unit, which extracts the maximum value within the window as the feature value.
  • the allowable value of the CMP unit may be 0.25, 0.5, 0.75, or 1.
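The CMP rule above can be sketched for a single window in plain Python (an illustrative sketch based on the description; the `>=` comparison against the allowable value is our reading of the constraint):

```python
def cmp_pool(window, allowable=0.5):
    """Conditional Min Pooling for one window.
    No zeros present -> plain min pooling.
    Zeros present    -> extract 0 only when the ratio of zero pixels
                        reaches the allowable value; otherwise take the
                        minimum of the non-zero pixels."""
    zeros = window.count(0)
    if zeros == 0:
        return min(window)
    if zeros / len(window) >= allowable:
        return 0
    return min(v for v in window if v != 0)

print(cmp_pool([5, 3, 8, 2]))                 # 2 (ordinary min pooling)
print(cmp_pool([0, 3, 8, 2], allowable=0.5))  # 2 (1/4 of pixels are 0, below 0.5)
print(cmp_pool([0, 0, 8, 2], allowable=0.5))  # 0 (half the pixels are 0)
```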
  • After the convolution operation is performed by the convolution layer, a first pooling layer may reduce the channels of the two feature maps processed through two-stage pooling by the CMP unit and the Max Pooling unit via a 1×1 convolution and combine them into one feature map.
  • A second pooling layer may be included to extract the two feature maps processed through two-stage pooling by the CMP unit and the Max Pooling unit, combine them into one feature map, and send it to the fully connected layer.
  • A method of operating a restructured convolutional neural network system using CMP includes: a) extracting a feature map by performing a convolution operation on an input image through a convolution layer; b) reducing, through a first pooling layer, the channels of the two feature maps processed through two-stage pooling by the CMP unit and the Max Pooling unit via a 1×1 convolution, and combining them into one feature map; c) extracting a feature map by repeatedly performing convolution operations on the feature map combined in step b) through a plurality of convolution layers; d) extracting, through a second pooling layer, the two feature maps processed through two-stage pooling by the CMP unit and the Max Pooling unit, and combining them into one feature map; and e) predicting and outputting the final neural network result from the feature map combined in step d) through the fully connected layer.
  • FIG. 1 is a diagram showing the structure of a conventional general convolutional neural network system
  • FIG. 2 is a diagram for explaining the operation process of the convolution layer of FIG. 1;
  • FIG. 3 is a diagram for explaining the calculation process of the pooling layer of FIG. 1;
  • FIG. 4 is a diagram for explaining the calculation process of the fully connected layer of FIG. 1;
  • FIG. 5 is a diagram showing the structure of a restructured convolutional neural network system according to an embodiment of the present invention.
  • FIG. 6 is a diagram for explaining the calculation process of the CMP unit of FIG. 5;
  • FIG. 7 is a diagram for explaining the operation of a restructured convolutional neural network system according to an embodiment of the present invention.
  • FIG. 10 is a diagram showing final performance results of a neural network model using Caltech 101 data
  • FIG. 11 is a diagram showing final performance results of a neural network model using crawling data.
  • FIG. 1 shows the structure of a conventional convolutional neural network system.
  • the learning method of a general convolutional neural network is based on supervised learning that provides labels for images and data.
  • the input image is subjected to convolution operation through a filter in the convolution layer, and a feature map is output as a result of the operation.
  • The size of the feature map obtained in this way is reduced through a pooling layer.
  • The size-reduced feature map is flattened into one dimension, and training is performed while parameters are adjusted so that the label can be predicted through a fully connected layer.
  • a convolution layer performs a convolution operation using a filter to extract features of an image. That is, the convolution layer extracts the features of the image and creates a feature map.
  • As shown in FIG. 2, the convolution layer operates with a matrix-type filter having a height and a width.
  • The window is moved at regular intervals over the pixels of the input image, and the filter elements are multiplied by the corresponding image pixels.
  • A single multiplication-accumulation (fused multiply-add) is performed to obtain the total sum.
  • In this way, a feature map of the input image is created.
  • The size of the feature map created in this way is determined by the size of the filter. Since different features are extracted depending on the type of filter, multiple filters are used so that the feature map retains multiple channels.
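The window-sliding fused multiply-add described above can be illustrated with a minimal pure-Python convolution (a sketch with illustrative names; single channel, no padding):

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image; each output pixel is the
    fused multiply-add (sum of elementwise products) of one window."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, len(image) - kh + 1, stride):
        row = []
        for j in range(0, len(image[0]) - kw + 1, stride):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

print(conv2d([[1, 2], [3, 4]], [[1, 0], [0, 1]]))  # [[5.0]]
```

Output height and width each shrink by the filter size minus one, which is why the feature map size depends on the filter size.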
  • the pooling layer is a layer that receives a feature map after calculating the convolution layer and performs a calculation (sub sampling) to reduce the spatial size of the feature map in the horizontal and vertical directions. Unlike the convolution layer, the pooling layer does not use a filter, so the number of channels does not change, and the operation method of the pooling layer is as shown in FIG. 3 .
  • the window of the feature map moves at regular intervals, and pixel values within the window are extracted as one pixel value according to specific conditions. With this operation, the size of the feature map is reduced by reducing the number of pixels.
  • First, Max Pooling, which takes the largest value within the window.
  • Second, Average Pooling, which takes the average of all values within the window. Since the size of a feature map that has gone through pooling is reduced, the amount of computation is reduced, and because information is reduced as much as the size is reduced, there is also an effect of preventing overfitting.
  • the Fully Connected Layer is the last layer of the CNN structure that classifies labels by analyzing the feature map obtained by passing the image through the convolution layer and the pooling layer.
  • The operation method of the Fully Connected Layer is shown in FIG. 4.
  • The fully connected layer has the form of the hidden layer and output layer of a multi-layer perceptron; its operation flattens the finally selected feature map into a one-dimensional array and transmits it to the nodes.
  • the transmitted data are used as computational values in the node to train the weights.
  • Softmax is used as an activation function so that the sum of predicted values for each label becomes 1.
  • the result with the highest probability is determined as the final prediction label, and the final neural network result is output.
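The softmax normalization used at the output can be sketched as follows (the standard formulation, not code from the patent):

```python
import math

def softmax(logits):
    """Exponentiate and normalize so the predicted label scores sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(round(sum(probs), 6))        # 1.0 -- the scores form a distribution
print(probs.index(max(probs)))     # 2  -- index of the predicted label
```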
  • The restructured convolutional neural network system 10 includes convolution layers 100a to 100d, pooling layers 200a and 200b, and a fully connected layer 300; the convolution layers and the fully connected layer have the same structure and operation as those of a conventional convolutional neural network.
  • the pooling layers 200a and 200b are composed of two stages of a Conditional Min Pooling (CMP) unit 210 and a Max Pooling unit 220.
  • the Max Pooling unit 220 extracts the maximum value within the window as a feature value, and since there is no difference from the conventional pooling method, the CMP unit 210 will be described in detail below.
  • CMP is designed based on Min Pooling. If 0 does not exist in the window, CMP extracts the minimum value as the feature value, like Min Pooling; if 0 exists in the window, it does not extract 0 as the feature value but counts the pixels containing 0 and sets a constraint on the confirmed ratio of 0 according to the allowable value (0 to 1).
  • Min Pooling extracts the minimum value among pixel values within a window, so if there is even one 0 among pixel values within a window, the feature value is extracted as 0. This operation has the problem that all remaining features are deleted. Accordingly, the CMP proposed in the present invention can solve the problem of Min Pooling and the overfitting problem by statistically restricting the operation process of Min Pooling.
  • The basic operation of CMP extracts the minimum value as the feature value, in the same way as Min Pooling, if 0 does not exist within the window; if 0 exists within the window, then unlike existing Min Pooling, 0 is not extracted as the feature value and the number of pixels containing 0 is counted instead.
  • When the allowable value is 0.25, 0 is extracted as the feature value when the statistical proportion of 0 within the window is 1/4 or more, and the minimum value excluding 0 is taken as the feature value when it is less than 1/4.
  • When the allowable value is 0.5, 0 is extracted as the feature value when the statistical proportion of 0 within the window is half or more, and the minimum value excluding 0 is taken as the feature value when it is less than half.
  • When the allowable value is 0.75, 0 is extracted as the feature value when the statistical proportion of 0 within the window is 3/4 or more, and the minimum value excluding 0 is taken as the feature value when it is less than 3/4.
  • FIG. 6 shows the operating method of the CMP unit 210 according to an embodiment of the present invention when the window size is 2×2 and the stride is 2, for allowable values of 0 of 0.25, 0.5, 0.75, and 1; it can be confirmed that the configuration of the feature map changes depending on the allowable value of 0.
  • The present invention applies the existing convolution layer as-is and restructures the configuration of the pooling layer.
  • The restructured pooling layer consists of the above-described CMP unit 210 and Max Pooling unit 220, so that two pooling techniques can be used, with a different pooling technique applied by each unit.
  • The two feature maps processed through two-stage pooling contain more than twice as many feature values as the existing feature map.
  • the present invention improves the structure of the pooling layer.
  • the pooling layer according to the present invention includes a first pooling layer 200a and a second pooling layer 200b.
  • After the input image is processed by the convolution layer 100a, the first pooling layer 200a reduces the channels of the two feature maps processed through two-stage pooling by the CMP unit 210 and the Max Pooling unit 220 via a 1×1 convolution, increasing non-linearity, and then combines them into one feature map (filter concatenation).
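The channel-reduction and filter-concatenation steps can be sketched in pure Python, with a feature map represented as a list of 2-D channels (illustrative names, shapes, and weights; the patent does not give an implementation):

```python
def conv1x1(channels, weights):
    """1x1 convolution: mix the input channels into fewer output channels
    at each spatial position, leaving the spatial size untouched."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[[sum(wc * channels[c][i][j] for c, wc in enumerate(wrow))
              for j in range(w)] for i in range(h)]
            for wrow in weights]  # one weight row per output channel

def filter_concat(maps_a, maps_b):
    """Filter concatenation: stack two channel lists along the channel axis."""
    return maps_a + maps_b

cmp_maps = [[[1, 2], [3, 4]]]   # hypothetical output channel of the CMP branch
max_maps = [[[5, 6], [7, 8]]]   # hypothetical output channel of the Max Pooling branch
reduced = conv1x1(cmp_maps + max_maps, weights=[[1, 1]])  # 2 channels -> 1
merged = filter_concat(reduced, max_maps)
print(merged)  # [[[6, 8], [10, 12]], [[5, 6], [7, 8]]]
```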
  • the feature maps combined through the first pooling layer 200a are repeatedly subjected to convolution operations by the plurality of convolution layers 100b, 100c, and 100d.
  • The second pooling layer 200b extracts the two feature maps processed through two-stage pooling by the CMP unit 210 and the Max Pooling unit 220 before the fully connected layer 300 and combines them into one feature map (filter concatenation).
  • the preprocessed image is input (Input Image) (S410).
  • the channels of the image are separated in RGB form.
  • a feature map is extracted by performing a convolution operation and an activation function on the input image through a convolution layer (S420), where ReLU is used as an activation function.
  • The extracted feature maps are processed through two-stage pooling (CMP, Max Pooling) by the CMP unit 210 and the Max Pooling unit 220 in the first pooling layer 200a; channels are then reduced through a 1×1 convolution, and the results are combined into one feature map (Filter Concatenation) (S430).
  • a feature map is extracted by repeatedly performing a convolution operation and an activation function through a plurality of convolution layers set for the combined feature map (S440).
  • The extracted feature map is pooled again through the second pooling layer 200b; that is, the two feature maps processed through two-stage pooling (CMP, Max Pooling) by the CMP unit 210 and the Max Pooling unit 220 are extracted and merged into one feature map (Filter Concatenation) (S450).
  • Unlike step S430, a 1×1 convolution is not used in step S450.
  • The restructured convolutional neural network proposed in the present invention was developed, and its performance evaluated, in the development environment shown in Table 1 below.
  • The accuracy rate is used as a performance evaluation factor to compare the performance of existing models and the model proposed in the present invention.
  • the accuracy measurement method for evaluation in the present invention is the ratio of correctly predicted data to the total data and is as shown in Equation 1 below.
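Equation 1 is not reproduced in this text; from the description ("the ratio of correctly predicted data to the total data"), it takes the standard form:

```latex
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\,(\%)
```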
  • a large amount of data is required to train the model proposed in the present invention.
  • Methods to collect large amounts of data include the use of public data and the collection of data using crawling.
  • In the case of public data, although the data is qualitatively reliable, it is difficult to collect a data set that both meets the conditions to be utilized and has a large amount of data.
  • Crawling, on the other hand, can collect large amounts of data.
  • performance evaluation is performed by selectively utilizing public data and data collected through crawling.
  • Caltech 101 is public data provided by the California Institute of Technology, comprising a total of 9,146 images. It is a data set consisting of 101 different object categories, such as airplanes, ants, cameras, and chairs. The average size of each image is about 300×200 pixels, and the amount of data per category varies widely, from a minimum of 31 to a maximum of 800. Accordingly, from the 12 categories with more than 100 images, model training uses 7 categories, excluding similar or black-and-white images and categories that overlap with the data collected by crawling. Table 2 shows the amount of data by category for the selected Caltech 101 data.
  • Python's BeautifulSoup library and ChromeDriver are used to crawl image data, and Google Image Search is used to collect the data.
  • image data is collected through Google Image Search in Korea and Google Image Search in the United States.
  • the data collected using crawling is inspected for image size, redundancy, and error data through the preprocessing module. Through this, data reliability is increased and a data set for model learning and testing is built through image size conversion and shape change.
  • Table 3 classifies the data collected through crawling by category. It consists of a total of six categories: bird, boat, car, cat, dog, and rabbit, and the quantity is the amount of data before data augmentation is performed.
  • Table 4 is the structural form of the model for performance comparison evaluation of the CNN structure proposed in the present invention.
  • a pooling performance test is conducted using a part of Caltech 101 data and data collected through crawling individually.
  • the model for performance evaluation receives an input in the size of 64x64 and consists of a structure with 2 convolution layers and 2 pooling layers.
  • A total of 100 epochs of learning is performed, changing only the pooling method under the same conditions. Since it is advantageous for feature extraction to use Max Pooling together with CMP rather than using CMP consecutively, CMP is used for the first pooling layer and Max Pooling for the last pooling layer. The comparison models use only Max Pooling or only Average Pooling.
  • 74% of the total data was used as training data, 16% as test data during the epochs, and 10% for the final performance test.
  • Figure 8 shows the final pooling performance results using Caltech 101 data: the final accuracy and loss values, tested on 10% of the total data, of the existing pooling techniques and the pooling (CMP, Conditional) proposed in the present invention.
  • In terms of accuracy, Average Pooling shows the lowest value.
  • Max Pooling and the proposed pooling (CMP) show similar accuracy results, but looking at the loss values, Max Pooling shows 0.1021% while the proposed pooling method (CMP) shows 0.0817%.
  • Figure 9 shows the final performance results of pooling using crawling data.
  • the proposed pooling (CMP, conditional) has the highest accuracy rate of 0.81 and the lowest loss rate of 0.23902.
  • The proposed pooling thus shows the highest performance among the three pooling techniques. Among the remaining techniques, unlike with the Caltech 101 data, Average Pooling shows higher performance than Max Pooling.
  • The test results using Caltech 101 data confirmed that the accuracy rate of the pooling technique (CMP) proposed in the present invention was 0.9928%, which was 0.16 to 0.52% higher than the existing pooling techniques.
  • The loss rate was 0.0817%, which was confirmed to be 19.98 to 28.71% lower than the existing techniques.
  • In the test using crawling data, the accuracy rate was 0.81%, improving performance by 1.36 to 2.56% compared to the existing methods, and the loss rate was 2.3902%, a decrease of 9.22 to 13.28%.
  • The restructured convolutional neural network uses Caltech 101 and crawling data in the same way as the pooling performance evaluation described above, but for reliable performance evaluation the amount of existing data was expanded 10 times with data augmentation. In addition, the number of training epochs was expanded from 100 to 500.
  • The structure proposed in the present invention has the highest accuracy rate of 0.9844%, while the error rate of AlexNet is the lowest at 0.0491%.
  • There is no significant difference between ResNet and the proposed structure in accuracy or error rate, and ResNet shows better performance in some parts of the learning process, making it difficult to judge superiority in model performance.
  • the accuracy rate of DenseNet is the highest at 0.8686%, and the model proposed in the present invention shows the second highest performance at 0.8473%.
  • AlexNet has the lowest error rate of 0.8454%, and the model proposed in the present invention has the second lowest error rate of 2.306%.
  • the structure proposed in the present invention confirmed a performance improvement of 0.38 to 2.69% with the highest accuracy rate in the final test.
  • AlexNet showed the lowest error rate at 0.0491%, and the proposed structure ranked second with 0.2393%.
  • DenseNet ranked first with 0.8686%, and the proposed structure ranked second with 0.8473%.
  • AlexNet took first place with 1.0769%, and the proposed structure took second place with 2.306%.
  • the proposed structure did not show the highest performance, but considering the fact that there was no change in the convolution structure, it is judged that it showed sufficiently remarkable performance.
  • The present invention relates to a restructured convolutional neural network system using Conditional Min Pooling (CMP), and an operating method thereof, for use in the field of deep learning, in which patterns and rules are discovered and learned autonomously from large-scale image data, and thus has industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Image Analysis (AREA)

Abstract

A re-structuralized convolutional neural network system using CMP according to the disclosed present invention comprises: a convolution layer that performs a convolution operation on an input image through a filter and outputs a feature map as a result of the operation; a pooling layer that performs sub-sampling of the output feature map to reduce its size; and a fully connected layer that flattens the size-reduced feature map into a one-dimensional array, adjusts parameters to predict a label, performs learning, and outputs a result. The pooling layer (200a, 200b) comprises a two-stage structure having: a conditional min pooling (CMP) unit (210) that, when 0 does not exist in a window, extracts the minimum value as the feature value using min pooling and, when 0 exists in a window, does not extract 0 as the feature value but identifies the number of pixels containing 0 and then sets a restriction on the ratio of identified 0s according to an allowable value (0.25-1); and a max pooling unit (220) that extracts the maximum value within a window as the feature value.

Description

Restructured convolutional neural network system using CMP and its operation method

The present invention relates to a convolutional neural network system, and more particularly, to a restructured convolutional neural network system using Conditional Min Pooling (CMP) and an operation method thereof.

Among the various studies using artificial intelligence, the field of computer vision in particular has gained easy access to high-quality data thanks to the improved performance of smartphone cameras and the spread of social networking services (SNS). Since computer vision technology plays the role of human eyes for computers, research applying it is being actively conducted in various fields such as the unmanned operation of cars and drones, real-time behavior analysis, and image restoration.

However, computer vision technology can recognize an object as present even when there is no object to detect, or fail to detect an object that is present. Since such false detections can lead to accidents or to conclusions in a completely different direction, accurate analysis of the images is essential.

In the research direction of computer vision technology, a method of extracting and utilizing features intended by humans was used in the past for accurate image analysis. Currently, this is changing to deep learning, which learns by finding important patterns and rules on its own in large-scale data. Most of the deep learning used in computer vision technology is based on convolutional neural networks (CNNs).

The initial CNN models took a long time to process their high computational load with the hardware available at the time of development and did not attract attention because they did not show remarkable performance. However, recent advances in hardware have allowed the previously problematic computational load to be processed quickly, and CNNs now also show a high level of performance; they have therefore attracted the attention of many researchers, and active research is being conducted for various purposes.

In particular, since changing the convolutional structure has the most direct effect on feature map extraction, most of the main research on CNNs concerns changes to the convolutional structure.

Pooling performs key functions such as reducing the amount of computation and preventing overfitting by reducing the size of feature maps, but changing the pooling structure does not improve CNN performance as much as changing the convolution structure, so little research is conducted on it. However, pooling in CNNs has various problems depending on the technique used.

일반적인 Pooling 기법은 Min Pooling, Average Pooling, Max Pooling을 들 수 있다. Common pooling techniques include Min Pooling, Average Pooling, and Max Pooling.

Min Pooling은 특징값으로 최솟값의 추출을 진행하기에 특정 이미지에서는 특징 자체가 사라지거나 노이즈(Noise)가 특징으로 같이 검출될 수 있다는 문제점이 존재하여 거의 활용되지 않는다. Average Pooling은 양의 특징과 음의 특징이 같은 공간에 존재할 경우 두 특징값에 대한 평균값을 계산하여 두 특징이 서로 상쇄되어 아무런 특징이 남지 않는 상황이 발생할 수 있다. 또한, 활성화 함수의 종류에 따라 강한 특징들이 줄어드는 Down-Scale Weighting 효과가 발생할 수도 있다. Max Pooling은 공간에 배치된 특징들 사이에 강한 특징만을 취하기에 정교한 특징들은 사라지게 된다. Max Pooling을 통해 추출된 특징은 유사한 객체 판별에 있어서 정확도가 저하되거나 과적합이 되기 쉽다는 문제가 존재한다.Min Pooling extracts the minimum value as the feature value, so it is rarely used: in certain images the feature itself may disappear, or noise may be detected as a feature. With Average Pooling, when a positive feature and a negative feature exist in the same region, averaging the two values can cancel them out so that no feature remains; in addition, depending on the activation function, a down-scale weighting effect that weakens strong features may occur. Max Pooling takes only the strongest feature within each region, so fine-grained features disappear; features extracted through Max Pooling tend to lose accuracy in discriminating similar objects or to cause overfitting.
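As an illustrative aid (not part of the claimed invention), the behavior of the three pooling techniques described above on a 2×2 window with stride 2 can be sketched with NumPy; the function name `pool2x2` and the sample feature map values are assumptions for illustration only.

```python
import numpy as np

def pool2x2(fmap, mode):
    # Sub-sample a feature map with a 2x2 window and stride 2.
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            win = fmap[i:i + 2, j:j + 2]
            if mode == "min":
                out[i // 2, j // 2] = win.min()
            elif mode == "max":
                out[i // 2, j // 2] = win.max()
            else:  # "avg"
                out[i // 2, j // 2] = win.mean()
    return out

fmap = np.array([[1., 0., 3., 4.],
                 [5., 6., 7., 8.],
                 [2., 2., -4., 4.],
                 [2., 2., 4., -4.]])

# Min Pooling: the 0 in the top-left window wipes out the other features there.
# Average Pooling: +4 and -4 in the bottom-right window cancel out to 0.
# Max Pooling: only the strongest value survives; finer features disappear.
min_out = pool2x2(fmap, "min")
avg_out = pool2x2(fmap, "avg")
max_out = pool2x2(fmap, "max")
```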

이러한 일반적인 Pooling 기법들이 지닌 문제점들을 해결하기 위한 방안으로 Stochastic Pooling이 연구되고 있다. Stochastic Pooling은 확률을 활용하는 기법으로 기존의 Pooling 기법들과 마찬가지로 윈도우 방식을 사용한다. Stochastic Pooling은 윈도우 내의 각 픽셀값을 윈도우 전체 픽셀값의 합으로 나누는 방식으로 픽셀값들을 Normalization 하여 픽셀별 확률을 계산한다. 이후 픽셀이 갖는 확률에 따라 랜덤하게 특징값으로 추출될 픽셀을 선정하게 된다. 결과적으로 중복되는 값이 많을수록 특징값으로 추출될 확률이 높기에 의미가 있는 값이 추출될 확률이 높다.Stochastic pooling is being studied as a way to solve the problems of these common pooling techniques. Stochastic Pooling is a technique that utilizes probability and, like existing pooling techniques, uses a window. It calculates a probability for each pixel by normalizing the pixel values, that is, by dividing each pixel value by the sum of all pixel values in the window. A pixel to be extracted as the feature value is then selected randomly according to these probabilities. As a result, the more often a value is repeated, the higher its probability of being extracted, so meaningful values are more likely to be selected.
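The Stochastic Pooling procedure described above can be sketched for a single window as follows; this is a minimal illustration assuming non-negative activations (e.g. after ReLU), and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_pool_window(win, rng):
    # Normalize pixel values in the window into probabilities,
    # then randomly pick one pixel value according to those probabilities.
    flat = win.ravel()
    total = flat.sum()
    if total == 0:        # all-zero window: nothing to sample
        return 0.0
    probs = flat / total  # per-pixel probability (normalization)
    return rng.choice(flat, p=probs)

win = np.array([[0., 1.], [1., 2.]])
# Value 2 is picked with probability 2/4, each 1 with probability 1/4;
# 0 has probability 0, so frequently repeated (meaningful) values dominate.
value = stochastic_pool_window(win, rng)
```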

그러나, Stochastic Pooling은 확률을 통해 Pooling의 문제를 해결하고자 하였으나, 실제 Stochastic Pooling을 사용할 경우 성능이 떨어지는 문제가 있어 이를 응용하는 연구들이 추가로 진행되고 있다.However, although Stochastic Pooling attempts to solve the pooling problems through probability, its performance degrades when it is actually used, so further studies applying and extending it are being conducted.

본 발명은 상기와 같은 점을 감안하여 안출된 것으로써, 기존의 Pooling 기법이 지닌 문제들을 해결하기 위해 Stochastic Pooling과 같은 확률 기반이 아닌 통계 기반의 Conditional Min Pooling(이하 'CMP')을 제공하는데 그 목적이 있다.The present invention was devised in view of the above, and its object is to provide Conditional Min Pooling (hereinafter 'CMP'), a statistics-based technique rather than a probability-based one such as Stochastic Pooling, to solve the problems of existing pooling techniques.

또한 CMP를 효율적으로 활용하기 위해 재구조화된 합성곱 신경망 시스템을 제공하는데 또 다른 목적이 있다.In addition, another purpose is to provide a restructured convolutional neural network system to efficiently utilize CMP.

상기 목적을 달성하기 위한 본 발명에 따른 CMP를 이용한 재구조화된 합성곱 신경망 시스템은, 입력된 이미지를 필터를 통해 컨볼루션 연산을 처리하며 연산 결과로 특징맵을 출력하는 컨볼루션 레이어(Convolution Layer)와, 상기 출력된 특징맵을 서브 샘플링(sub sampling)을 통해 특징맵의 크기(size)를 줄이는 풀링 레이어(Pooling Layer) 및 상기 크기가 줄어든 특징맵을 1차원 배열로 풀어서 Label을 예측할 수 있도록 매개변수를 조정하며 학습을 진행하여 결과를 출력하는 전결합 레이어(Fully Connected Layer)를 포함한다. 여기서 상기 풀링 레이어는, 윈도우 내에 0이 존재하지 않으면 Min Pooling과 같이 최소값을 특징값으로 추출하고, 윈도우 내에 0이 존재하면 0을 특징값으로 바로 추출하지 않고 0이 포함된 픽셀의 수를 확인한 후, 확인된 0의 비율에 허용치(0.25~1)에 따라 제약을 설정하는 CMP(Conditional Min Pooling) 유닛과, 윈도우 내에 최대값을 특징값으로 추출하는 Max Pooling 유닛의 2단 구조를 포함한다.To achieve the above object, the restructured convolutional neural network system using CMP according to the present invention includes a convolution layer that processes an input image through filter-based convolution operations and outputs a feature map as the result; a pooling layer that reduces the size of the output feature map through sub-sampling; and a fully connected layer that unfolds the size-reduced feature map into a one-dimensional array and performs learning while adjusting parameters so that the label can be predicted, outputting the result. Here, the pooling layer has a two-stage structure comprising a CMP (Conditional Min Pooling) unit, which extracts the minimum value as the feature value as in Min Pooling when no 0 exists in the window and, when a 0 exists in the window, does not immediately extract 0 as the feature value but counts the pixels containing 0 and constrains the identified ratio of 0s according to an allowance (0.25 to 1), and a Max Pooling unit, which extracts the maximum value within the window as the feature value.

상기 CMP 유닛의 허용치는 0.25, 0.5, 0.75, 1일 수 있다.The allowable value of the CMP unit may be 0.25, 0.5, 0.75, or 1.

상기 풀링 레이어는, 입력된 이미지를 컨볼루션 레이어에 의한 컨볼루션 연산처리 후 상기 CMP 유닛과 Max Pooling 유닛에 의해 2단 Pooling을 통해 진행된 두 개의 특징맵을 1×1 크기의 컨볼루션을 통해 채널을 감소시키고 하나의 특징맵으로 결합하는 제 1 풀링 레이어와, 상기 제 1 풀링 레이어를 통해 결합된 특징맵이 상기 복수의 컨볼루션 레이어에 컨볼루션 연산을 반복적으로 수행한 후, 상기 전결합 레이어 이전에 상기 CMP 유닛과 Max Pooling에 의해 2단 Pooling을 통해 진행된 두 개의 특징맵을 추출한 후 하나의 특징맵으로 결합하여 상기 전결합 레이어로 보내도록 하는 제 2 풀링 레이어를 포함할 수 있다.The pooling layer may include a first pooling layer that, after the convolution operation on the input image by the convolution layer, reduces the channels of the two feature maps produced through the two-stage pooling of the CMP unit and the Max Pooling unit via a 1×1 convolution and combines them into one feature map; and a second pooling layer that, after the feature map combined through the first pooling layer has repeatedly undergone convolution operations in the plurality of convolution layers, extracts two feature maps through the two-stage pooling of the CMP unit and Max Pooling before the fully connected layer, combines them into one feature map, and sends it to the fully connected layer.

한편, 본 발명의 실시예에 따른 CMP를 이용한 재구조화된 합성곱 신경망 시스템의 동작 방법은, a) 입력된 이미지를 컨볼루션 레이어를 통해 컨볼루션 연산을 수행하여 특징맵을 추출하는 단계; b) 상기 a)단계에서 추출된 특징맵을 제 1 풀링 레이어를 통해 상기 CMP 유닛과 Max Pooling 유닛에 의해 2단 Pooling을 통해 진행된 두 개의 특징맵을 1×1 크기의 컨볼루션을 통해 채널을 감소시키고 하나의 특징맵으로 결합하는 단계; c) 상기 b)단계에서 결합된 특징맵이 설정된 복수의 컨볼루션 레이어를 통해 컨볼루션 연산을 반복 수행하여 특징맵을 추출하는 단계; d) 상기 c)단계에서 추출된 특징맵을 제 2 풀링 레이어를 통해 상기 CMP 유닛과 Max Pooling에 의해 2단 Pooling을 통해 진행된 두 개의 특징맵을 추출한 후 하나의 특징맵으로 결합하는 단계; 및, e) 상기 d)단계에서 결합된 특징맵을 전결합 레이어를 통해 최종 신경망의 결과를 예측하여 출력하는 단계;를 포함한다. Meanwhile, a method of operating a restructured convolutional neural network system using CMP according to an embodiment of the present invention includes: a) extracting a feature map by performing convolution operations on an input image through a convolution layer; b) in a first pooling layer, reducing the channels of the two feature maps produced from the feature map of step a) through the two-stage pooling of the CMP unit and the Max Pooling unit via a 1×1 convolution and combining them into one feature map; c) extracting a feature map by repeatedly performing convolution operations on the feature map combined in step b) through a plurality of convolution layers; d) in a second pooling layer, extracting two feature maps from the feature map of step c) through the two-stage pooling of the CMP unit and Max Pooling and then combining them into one feature map; and e) predicting and outputting the final neural network result from the feature map combined in step d) through the fully connected layer.

본 발명에 의하면 풀링 레이어에서 CMP 구조를 제안함으로써 종래 Pooling에서 발생하는 특징 상쇄 및 과적합 문제를 해결하여 세밀한 특징 정보를 찾을 수 있는 효과가 있으며, 아울러 이러한 CMP를 효율적으로 활용하기 위해 재구조화된 CNN 시스템을 제안함으로써 더욱 정밀하고 다양한 특징 정보들을 찾아내어, 결국 신경망 모델의 성능 정확성을 높이고 에러율을 낮출 수 있는 우수한 효과가 있다. According to the present invention, proposing the CMP structure in the pooling layer solves the feature-cancellation and overfitting problems of conventional pooling so that detailed feature information can be found; in addition, by proposing a CNN system restructured to utilize CMP efficiently, more precise and diverse feature information can be extracted, which ultimately increases the accuracy of the neural network model and lowers its error rate.

도 1은 종래 일반적인 합성곱 신경망 시스템의 구조를 나타낸 도면,1 is a diagram showing the structure of a conventional general convolutional neural network system;

도 2는 도 1의 컨볼루션 레이어의 연산 과정을 설명하기 위한 도면,2 is a diagram for explaining the operation process of the convolution layer of FIG. 1;

도 3은 도 1의 풀링 레이어의 연산 과정을 설명하기 위한 도면,3 is a diagram for explaining the calculation process of the pooling layer of FIG. 1;

도 4는 도 1의 전결합 레이어의 연산 과정을 설명하기 위한 도면,4 is a diagram for explaining the calculation process of the precombined layer of FIG. 1;

도 5는 본 발명의 실시예에 따른 재구조화된 합성곱 신경망 시스템의 구조를 나타낸 도면,5 is a diagram showing the structure of a restructured convolutional neural network system according to an embodiment of the present invention;

도 6은 도 5의 CMP 유닛의 연산 과정을 설명하기 위한 도면, 6 is a diagram for explaining the calculation process of the CMP unit of FIG. 5;

도 7은 본 발명의 실시예에 따른 재구조화된 합성곱 신경망 시스템의 동작을 설명하기 위한 도면,7 is a diagram for explaining the operation of a restructured convolutional neural network system according to an embodiment of the present invention;

도 8은 Caltech 101 데이터를 활용한 Pooling 최종 성능 결과를 나타낸 도면,8 is a diagram showing the final performance results of Pooling using Caltech 101 data;

도 9는 크롤링 데이터를 활용한 Pooling 최종 성능 결과를 나타낸 도면, 9 is a diagram showing the final performance results of Pooling using crawling data;

도 10은 Caltech 101 데이터를 활용한 신경망 모델의 최종 성능 결과를 나타낸 도면,10 is a diagram showing final performance results of a neural network model using Caltech 101 data;

도 11은 크롤링 데이터를 활용한 신경망 모델의 최종 성능 결과를 나타낸 도면이다.11 is a diagram showing final performance results of a neural network model using crawling data.

본 발명의 상기와 같은 목적, 특징 및 다른 장점들은 첨부도면을 참조하여 본 발명의 바람직한 실시예를 상세히 설명함으로써 더욱 명백해질 것이다. The above objects, features and other advantages of the present invention will become more apparent by describing preferred embodiments of the present invention in detail with reference to the accompanying drawings.

먼저 이해의 편의를 위해, 본 발명의 실시예에 따른 CMP를 이용한 재구조화된 합성곱 신경망 시스템을 설명하기에 앞서 종래의 일반적인 합성곱 신경망 시스템에 대해 설명하기로 한다.First, for convenience of understanding, a conventional general convolutional neural network system will be described prior to describing a restructured convolutional neural network system using CMP according to an embodiment of the present invention.

도 1은 종래의 일반적인 합성곱 신경망 시스템의 구조를 나타낸 것이다.1 shows the structure of a conventional convolutional neural network system.

도시된 바와 같이, 일반적인 합성곱 신경망 시스템(CNN:Convolutional Neural Network)의 학습 방식은 이미지와 데이터의 Label을 제공하는 지도학습기반으로 진행된다. 입력된 이미지(Input Image)는 컨볼루션 레이어(Convolution Layer)에서 필터를 통해 컨볼루션 연산을 처리하며 연산 결과로 특징맵을 출력한다. 이렇게 획득한 특징맵은 풀링 레이어(Pooling Layer)를 거쳐 특징맵의 크기(Size)를 줄인다. 크기가 줄어든 특징맵은 1차원으로 펼쳐져 전결합 레이어(Fully Connected Layer)를 통해 Label을 예측할 수 있도록 매개변수를 조정하며 학습을 진행한다.As shown, the learning method of a general convolutional neural network (CNN) is based on supervised learning that provides labels for images and data. The input image is subjected to convolution operation through a filter in the convolution layer, and a feature map is output as a result of the operation. The feature map obtained in this way reduces the size of the feature map through a pooling layer. The size-reduced feature map is unfolded in one dimension, and training is performed while adjusting parameters so that the label can be predicted through a fully connected layer.

컨볼루션 레이어(Convolution Layer)는 이미지의 특징을 추출하기 위해 필터를 활용하여 컨볼루션 연산을 진행한다. 즉 컨볼루션 레이어는 이미지의 특징을 추출하여 특징맵을 생성한다. 도 2는 컨볼루션 레이어의 동작 방식을 나타낸 것으로, 필터는 높이와 너비를 가지는 행렬 형태이다. 입력된 이미지의 픽셀에서 윈도우를 일정 간격으로 이동하여 필터와 이미지의 픽셀이 대응하는 원소끼리 곱셈 연산을 진행한다. 연산 후 총합을 구하는 단일 곱셈 누산(Fused Multiply-Add)을 진행한다. 연산 동작을 반복한 결과로 입력 이미지의 특징맵을 생성하게 된다. 이렇게 생성된 특징맵은 필터의 크기에 따라 특징맵의 크기가 결정된다. 특징맵은 필터의 형태에 따라 다른 특징이 추출되기 때문에 특징맵에 여러 필터를 사용함으로써 여러 채널(Channel)을 보유하게 된다.A convolution layer performs convolution operations using filters to extract image features; that is, the convolution layer extracts features from the image and creates a feature map. FIG. 2 illustrates the operation of the convolution layer, where the filter takes the form of a matrix with a height and a width. A window is moved over the pixels of the input image at regular intervals, and the corresponding elements of the filter and the image pixels are multiplied; the products are then summed in a fused multiply-add operation. Repeating this operation produces the feature map of the input image, whose size is determined by the filter size. Since different filter shapes extract different features, using several filters gives the feature map several channels.
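The fused multiply-add operation described above can be sketched as follows; this is a minimal single-channel example, and the 2×2 kernel values are illustrative assumptions rather than trained filter weights.

```python
import numpy as np

def conv2d_single(image, kernel):
    # Slide the filter over the image at stride 1; at each position,
    # multiply corresponding elements and sum them (fused multiply-add).
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1
    fmap = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            fmap[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return fmap

image = np.arange(16.0).reshape(4, 4)
kernel = np.array([[1., 0.],
                   [0., 1.]])          # illustrative 2x2 filter
feature_map = conv2d_single(image, kernel)
# A 4x4 image with a 2x2 filter yields a 3x3 feature map,
# showing that the feature map size is determined by the filter size.
```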

풀링 레이어(Pooling Layer)는 컨볼루션 레이어의 연산 후 특징맵을 입력받아 특징맵의 가로 및 세로 방향 공간 크기를 줄여주는 연산(Sub Sampling)을 진행하는 계층이다. 풀링 레이어는 컨볼루션 레이어와 다르게 필터를 사용하지 않아 채널 수는 변하지 않으며, 풀링 레이어의 동작 방식은 도 3과 같다. 특징맵의 윈도우는 일정 간격으로 이동하며 윈도우 내에 픽셀값들을 특정 조건에 따라 하나의 픽셀값으로 추출한다. 이와 같은 동작으로 픽셀의 개수를 줄여 특징맵의 크기를 감소시킨다. Pooling 과정 중 윈도우 내에 픽셀값들을 하나의 픽셀값으로 추출하는 방법은 크게 2가지 방식이다. 첫째로, 윈도우 내에 가장 큰 값을 취하는 Max Pooling 둘째로, 윈도우 내에 모든 값의 평균을 취하는 Average Pooling이 주로 적용된다. Pooling을 거친 특징맵은 크기가 작아지기에 연산량이 감소하고, 작아진 크기만큼 정보가 감소하기 때문에 과적합(Overfitting)을 방지하는 효과도 존재한다.The pooling layer receives the feature map produced by the convolution layer and performs sub-sampling to reduce its spatial size in the horizontal and vertical directions. Unlike the convolution layer, the pooling layer uses no filter, so the number of channels does not change; its operation is shown in FIG. 3. A window moves over the feature map at regular intervals, and the pixel values within the window are extracted as one pixel value according to a specific rule, reducing the number of pixels and thus the size of the feature map. Two methods are mainly used to extract the pixel values within a window as a single value: Max Pooling, which takes the largest value within the window, and Average Pooling, which takes the average of all values within the window. Since the feature map that has gone through pooling is smaller, the amount of computation decreases, and because information is reduced along with the size, pooling also has the effect of preventing overfitting.

전결합 레이어(Fully Connected Layer)는 이미지를 컨볼루션 레이어와 풀링 레이어를 통과시켜 나온 특징맵을 분석하여 Label을 분류하는 CNN 구조의 마지막 계층으로써, 전결합 레이어의 동작 방식은 도 4와 같다. 전결합 레이어의 구조는 다층 퍼셉트론의 은닉층과 출력층의 형태를 가지며, 동작 방식은 최종적으로 선출된 특징맵을 1차원 배열로 풀어서 노드들에 전송하게 된다. 전송된 데이터들은 노드에 연산 값으로 사용되어 가중치를 훈련한다. 이후 예측을 진행하기 위해 활성화 함수로 소프트맥스(Softmax)를 활용함으로써 각 Label에 대한 예측값의 총합이 1이 되도록 한다. 가장 큰 확률을 갖는 결과를 최종 예측 Label로 결정하여 최종적인 신경망의 결과를 출력한다.The fully connected layer is the last layer of the CNN structure; it classifies labels by analyzing the feature map obtained by passing the image through the convolution and pooling layers, and its operation is shown in FIG. 4. The fully connected layer takes the form of the hidden and output layers of a multi-layer perceptron: the finally selected feature map is unfolded into a one-dimensional array and sent to the nodes, where the transmitted data are used as operands to train the weights. To make predictions, Softmax is then used as the activation function so that the predicted values for all labels sum to 1; the result with the highest probability is chosen as the final predicted label, and the final neural network output is produced.
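The prediction step of the fully connected layer can be sketched as follows; the feature vector and weight values are illustrative assumptions, not trained parameters.

```python
import numpy as np

def softmax(logits):
    # Scale the per-label scores so that they sum to 1.
    e = np.exp(logits - logits.max())  # subtract max for numerical stability
    return e / e.sum()

features = np.array([0.2, 0.9, 0.4])   # flattened feature map (1-D array)
weights = np.array([[0.1, 0.5],
                    [0.8, 0.2],
                    [0.3, 0.7]])       # 3 features -> 2 labels (example values)
probs = softmax(features @ weights)    # per-label prediction, sums to 1
label = int(np.argmax(probs))          # label with the highest probability
```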

이하 첨부된 도면을 참조하여 본 발명의 실시예에 따른 CMP를 이용한 재구조화된 합성곱 신경망 시스템 대해 상세히 설명하기로 한다.Hereinafter, a restructured convolutional neural network system using CMP according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

도 5는 본 발명의 실시예에 따른 CMP(Conditional Min Pooling)을 이용한 재구조화된 합성곱 신경망 시스템의 구조를 나타낸 것이다.FIG. 5 shows the structure of a restructured convolutional neural network system using Conditional Min Pooling (CMP) according to an embodiment of the present invention.

본 발명의 실시예에 따른 재구조화된 합성곱 신경망 시스템(10)은 컨볼루션 레이어(100a~100d)와 풀링 레이어(200a, 200b) 및 전결합층 레이어(300)를 포함하며, 컨볼루션 레이어와 전결합층 레이어는 종래의 합성곱 신경망의 그것과 동일한 구조 및 동작을 수행하게 된다. The restructured convolutional neural network system 10 according to an embodiment of the present invention includes convolution layers 100a to 100d, pooling layers 200a and 200b, and a fully connected layer 300; the convolution layers and the fully connected layer have the same structure and operation as those of a conventional convolutional neural network.

따라서, 컨볼루션 레이어 및 전결합층 레이어에 대한 상세한 설명은 생략하기로 하고, 이하 풀링 레이어에 대해 상세히 설명하기로 한다.Therefore, detailed descriptions of the convolutional layer and the fully coupled layer will be omitted, and the pooling layer will be described in detail below.

도 5을 참조하면, 본 발명의 실시예에 따른 풀링 레이어(200a, 200b)는, CMP(Conditional Min Pooling) 유닛(210)과 Max Pooling 유닛(220)의 2단으로 구성된다. 전술한 바와 같이 Max Pooling 유닛(220)은 윈도우 내 최대값을 특징값으로 추출하는 것으로써 종래 Pooling 방식과 차이가 없으므로, 이하 CMP 유닛(210)에 대해 상세히 설명하기로 한다.Referring to FIG. 5 , the pooling layers 200a and 200b according to an embodiment of the present invention are composed of two stages of a Conditional Min Pooling (CMP) unit 210 and a Max Pooling unit 220. As described above, the Max Pooling unit 220 extracts the maximum value within the window as a feature value, and since there is no difference from the conventional pooling method, the CMP unit 210 will be described in detail below.

CMP은 Min Pooling을 기반으로 설계되는데, CMP는 윈도우 내에 0이 존재하지 않으면 Min Pooling과 같이 최소값을 특징값으로 추출하고, 윈도우 내에 0이 존재하면 0을 특징값으로 추출하지 않고 0이 포함된 픽셀의 수를 확인한 후, 확인된 0의 비율에 허용치(0~1)에 따라 제약을 설정하도록 한다. CMP is designed based on Min Pooling: if no 0 exists within the window, CMP extracts the minimum value as the feature value just like Min Pooling; if a 0 exists within the window, it does not extract 0 as the feature value but counts the pixels containing 0 and then applies a constraint to the identified ratio of 0s according to an allowance (0 to 1).

기존 Min Pooling은 윈도우 내에 픽셀값 중 최솟값을 추출하기 때문에 윈도우 내에 픽셀값 중 0이 하나라도 존재할 경우 특징값을 0으로 추출한다. 이러한 동작은 나머지 특징들 모두 삭제되는 문제점을 갖고 있다. 이에 본 발명에서 제안하는 CMP는 Min Pooling의 동작 과정에 통계적으로 제약함으로써 Min Pooling의 문제점을 해결하고, 과적합 문제를 해결할 수 있다. Existing Min Pooling extracts the minimum value among pixel values within a window, so if there is even one 0 among pixel values within a window, the feature value is extracted as 0. This operation has the problem that all remaining features are deleted. Accordingly, the CMP proposed in the present invention can solve the problem of Min Pooling and the overfitting problem by statistically restricting the operation process of Min Pooling.

CMP의 기본 동작 방식은 윈도우 내에 0이 존재하지 않으면 Min Pooling과 동일하게 최솟값을 특징값으로 추출하지만, 윈도우 내에 0이 존재한다면 기존 Min Pooling과 다르게 0을 특징값으로 추출하지 않고 0이 포함된 픽셀의 수를 확인한다. In its basic operation, CMP extracts the minimum value as the feature value, just like Min Pooling, when no 0 exists within the window; however, when a 0 exists within the window, unlike the existing Min Pooling, it does not extract 0 as the feature value but instead counts the pixels containing 0.

확인된 0의 비율은 허용치(0.25~1)에 따라 제약을 설정한다. 허용치가 0일 경우 CMP는 Min Pooling과 동일하게 동작하므로 이 경우는 제외한다. 이하 허용치가 0.25, 0.5, 0.75, 1 일때의 경우를 설명하기로 한다.The percentage of zeros identified sets the constraints according to the allowable value (0.25 to 1). If the allowable value is 0, CMP operates the same as Min Pooling, so this case is excluded. Hereinafter, cases when the allowable values are 0.25, 0.5, 0.75, and 1 will be described.

0의 허용치가 0.25일 경우에는 윈도우 내에 0의 통계치가 1/4 이상일 때, 특징값으로 0을 추출하고, 통계치가 1/4 미만일 때 0을 제외한 최소값을 특징값으로 갖는다.If the allowable value of 0 is 0.25, 0 is extracted as a feature value when the statistical value of 0 within the window is greater than 1/4, and the minimum value excluding 0 is taken as the feature value when the statistical value is less than 1/4.

그리고, 0의 허용치가 0.5일 경우에는 윈도우 내에 0의 통계치가 절반 이상일 때, 특징값으로 0을 추출하고, 0의 통계치가 절반 미만일 때 0을 제외한 최소값을 특징값으로 갖는다. And, if the allowable value of 0 is 0.5, 0 is extracted as a feature value when the statistical value of 0 is more than half within the window, and the minimum value excluding 0 is taken as the feature value when the statistical value of 0 is less than half.

그리고, 0의 허용치가 0.75일 경우에는 윈도우 내에 0의 통계치가 3/4 이상일 때, 특징값으로 0을 추출하고, 0의 통계치가 3/4 미만일 때 0을 제외한 최소값을 특징값으로 갖는다. And, if the allowable value of 0 is 0.75, 0 is extracted as the feature value when the statistical ratio of 0s within the window is 3/4 or more, and the minimum value excluding 0 is taken as the feature value when the ratio is less than 3/4.

마지막으로, 0의 허용치가 1일 경우 0을 제외한 최솟값을 특징값으로 가지며, 윈도우 전체값이 0일 경우만 0을 특징값으로 갖는다.Finally, if the allowable value of 0 is 1, the minimum value excluding 0 is taken as a feature value, and only when the entire window value is 0, 0 is taken as a feature value.

도 6은 본 발명의 실시예에 따른 CMP 유닛(210)의 동작 방식을 나타낸 것으로써, 윈도우 크기는 2×2, Stride는 2의 형태를 가질 때 0의 허용치가 0.25, 0.5, 0.75, 1일 경우의 동작 방식이며, 0의 허용치에 따라 특징맵의 구성 형태가 달라지는 것을 확인할 수 있다.FIG. 6 shows the operation of the CMP unit 210 according to an embodiment of the present invention with a 2×2 window and a stride of 2, for allowances of 0 equal to 0.25, 0.5, 0.75, and 1; it can be seen that the composition of the resulting feature map changes depending on the allowance for 0.
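The CMP rules above (minimum value when no 0 is present; otherwise 0 only when the ratio of 0s reaches the allowance, else the minimum nonzero value) can be sketched for a single window as follows; the function name and sample values are illustrative assumptions.

```python
import numpy as np

def cmp_window(win, allowance):
    # Conditional Min Pooling for one window.
    # allowance is assumed to be one of 0.25, 0.5, 0.75, 1.
    flat = win.ravel()
    zero_ratio = np.count_nonzero(flat == 0) / flat.size
    if zero_ratio == 0:
        return flat.min()          # no 0: behave like Min Pooling
    if zero_ratio >= allowance:
        return 0.0                 # ratio of 0s at or above the allowance
    return flat[flat != 0].min()   # otherwise: minimum value excluding 0

win = np.array([[0., 3.],
                [5., 7.]])         # ratio of 0s = 1/4
low = cmp_window(win, 0.25)        # 1/4 >= 0.25 -> extracts 0
high = cmp_window(win, 0.5)        # 1/4 < 0.5  -> extracts 3 (min excluding 0)
all_zero = cmp_window(np.zeros((2, 2)), 1.0)  # only an all-zero window yields 0
```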

이와 같이 본 발명에 따른 재구조화된 합성곱 신경망 구조에 의하면, 컨볼루션 레이어는 기존의 컨볼루션 레이어를 적용하고, 풀링 레이어의 구성을 재구조화한다. 재구조화된 풀링 레이어의 구조는 전술한 CMP 유닛(210)과 Max Pooling 유닛(220)으로 이루어져 두 개의 Pooling 기법을 활용할 수 있도록 2단으로 구성하고 각각 다른 Pooling 기법을 적용한다. 이처럼 2단 Pooling을 통해 진행된 두 개의 특징맵은 기존의 특징맵보다 2배 이상의 특징값을 갖는다. In this way, the restructured convolutional neural network according to the present invention keeps the existing convolution layers and restructures the configuration of the pooling layer. The restructured pooling layer consists of the above-described CMP unit 210 and Max Pooling unit 220, arranged in two stages so that two different pooling techniques can be applied. The two feature maps produced through this two-stage pooling carry more than twice the feature values of a conventional feature map.

한편, 특징맵은 컨볼루션 과정뿐만 아니라 Pooling 과정에서도 데이터 손실이 일어날 수밖에 없다. 기존 연구에서는 데이터 손실을 줄이기 위해 많은 연구를 진행해왔지만, 대부분의 연구는 성능 변화가 큰 컨볼루션 계층을 기준으로 진행되고 있다. Pooling은 컨볼루션 계층만큼은 아니더라도 Pooling 기법에 따라 다른 형태의 특징맵이 추출될 수 있다. 또한, 컨볼루션 과정을 거친 특징맵은 연산량 감소 및 과적합을 방지하기 위해 필수적으로 Pooling 과정을 진행하기 때문에 Pooling 과정에 따른 CNN의 구조 개선은 필수적으로 이루어져야 하는 사항이다.On the other hand, feature maps inevitably suffer data loss not only in the convolution process but also in the pooling process. In previous studies, many studies have been conducted to reduce data loss, but most studies are conducted based on convolutional layers with large performance changes. Pooling can extract different types of feature maps depending on the pooling technique, although not as much as the convolution layer. In addition, since the feature map that has undergone the convolution process necessarily undergoes the pooling process to reduce the amount of computation and prevent overfitting, it is essential to improve the structure of the CNN according to the pooling process.

따라서, 이를 감안하여 본 발명은 풀링 레이어의 구조를 개선하게 되는데, 다시 도 5를 참조하면, 본 발명에 의한 풀링 레이어는 제 1 풀링 레이어(200a)와 제 2 풀링 레이어(200b)를 포함한다. Therefore, in consideration of this, the present invention improves the structure of the pooling layer. Referring back to FIG. 5, the pooling layer according to the present invention includes a first pooling layer 200a and a second pooling layer 200b.

제 1 풀링 레이어(200a)는 입력된 이미지를 컨볼루션 레이어(100a)에 의한 연산처리 후 상기 CMP 유닛(210)과 Max Pooling 유닛(220)에 의해 2단 Pooling을 통해 진행된 두 개의 특징맵을 1×1 크기의 컨볼루션(convolution)을 통해 채널을 감소시키고, 비선형성(Non-Linearity)을 증가시킨 뒤 하나의 특징맵으로 결합하게 된다(Filter concatenation). After the convolution layer 100a processes the input image, the first pooling layer 200a reduces the channels of the two feature maps produced through the two-stage pooling of the CMP unit 210 and the Max Pooling unit 220 via a 1×1 convolution, increases non-linearity, and then combines them into one feature map (filter concatenation).

상기 제 1 풀링 레이어(200a)를 통해 결합된 특징맵은 복수의 컨볼루션 레이어(100b, 100c, 100d)에 의한 컨볼루션 연산을 반복적으로 수행하게 된다.The feature map combined through the first pooling layer 200a repeatedly undergoes convolution operations by the plurality of convolution layers 100b, 100c, and 100d.

제 2 풀링 레이어(200b)는 상기 전결합 레이어(300) 이전에 상기 CMP 유닛(210)과 Max Pooling 유닛(220)에 의해 2단 Pooling을 통해 진행된 두 개의 특징맵을 추출한 후 하나의 특징맵으로 결합하게 된다(Filter concatenation). Before the fully connected layer 300, the second pooling layer 200b extracts the two feature maps produced through the two-stage pooling of the CMP unit 210 and the Max Pooling unit 220 and then combines them into one feature map (filter concatenation).
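The two-stage pooling, filter concatenation, and 1×1 channel-reducing convolution of the first pooling layer can be sketched in NumPy as follows; this is a single-channel illustration, and the CMP allowance of 0.5 and the 1×1 convolution weights are assumptions for the example only.

```python
import numpy as np

def pool_map(fmap, pool_fn):
    # Apply a per-window pooling function with a 2x2 window and stride 2.
    h, w = fmap.shape
    out = np.zeros((h // 2, w // 2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            out[i // 2, j // 2] = pool_fn(fmap[i:i + 2, j:j + 2])
    return out

def cmp_fn(win, allowance=0.5):
    # Conditional Min Pooling rule for one window (allowance assumed 0.5).
    flat = win.ravel()
    zero_ratio = np.count_nonzero(flat == 0) / flat.size
    if zero_ratio == 0:
        return flat.min()
    return 0.0 if zero_ratio >= allowance else flat[flat != 0].min()

fmap = np.array([[0., 3., 2., 2.],
                 [5., 7., 2., 2.],
                 [1., 1., 0., 0.],
                 [1., 1., 0., 9.]])

# Two-stage pooling: CMP and Max Pooling applied in parallel to the same input.
cmp_out = pool_map(fmap, cmp_fn)
max_out = pool_map(fmap, lambda w: w.max())

# Filter concatenation: stack the two feature maps along the channel axis.
stacked = np.stack([cmp_out, max_out], axis=-1)   # shape (2, 2, 2)

# 1x1 convolution: a per-pixel weighted sum over channels reduces 2 channels to 1.
w_1x1 = np.array([0.5, 0.5])                      # illustrative weights
reduced = stacked @ w_1x1                         # shape (2, 2)
```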

이와 같은 구조를 갖는 본 발명에 따른 CMP를 이용한 재구조화된 합성곱 신경망의 동작을 도 7을 참조하여 설명하기로 한다. The operation of the restructured convolutional neural network using CMP according to the present invention having such a structure will be described with reference to FIG. 7.

먼저 전처리된 이미지를 입력(Input Image)시키는데(S410), 이미지가 입력된 후 RGB 형태로 이미지의 채널을 분리한다.First, the preprocessed image is input (Input Image) (S410). After the image is input, the channels of the image are separated in RGB form.

그 후, 입력된 이미지를 컨볼루션 레이어를 통해 컨볼루션(Convolution) 연산과 활성화 함수를 수행하여 특징맵을 추출하는데(S420), 여기서 활성화 함수로 ReLU를 사용한다.Thereafter, a feature map is extracted by performing a convolution operation and an activation function on the input image through a convolution layer (S420), where ReLU is used as an activation function.

그 후, 추출된 특징맵을 제 1 풀링 레이어(200a)를 통해 상기 CMP 유닛(210)과 Max Pooling 유닛(220)에 의해 2단 Pooling(CMP, Max Pooling)을 통해 진행된 두 개의 특징맵을 1×1 크기의 컨볼루션(Convolution)을 통해 채널을 감소시키고 하나의 특징맵으로 결합(Filter Concatenation)하게 된다(S430).Thereafter, through the first pooling layer 200a, the two feature maps produced from the extracted feature map by the two-stage pooling (CMP, Max Pooling) of the CMP unit 210 and the Max Pooling unit 220 have their channels reduced through a 1×1 convolution and are combined into one feature map (Filter Concatenation) (S430).

그 후, 결합된 특징맵을 설정된 복수의 컨볼루션 레이어를 통해 컨볼루션(Convolution) 연산과 활성화 함수를 반복적으로 진행하여 특징맵을 추출하게 된다(S440).Thereafter, a feature map is extracted by repeatedly performing a convolution operation and an activation function through a plurality of convolution layers set for the combined feature map (S440).

그 후, 전결합 레이어 이전에 상기 추출된 특징맵을 제 2 풀링 레이어(200b)를 통해 다시 한번 Pooling 과정을 수행하는데, 즉 상기 CMP 유닛(210)과 Max Pooling 유닛(220)에 의해 2단 Pooling(CMP, Max Pooling)을 통해 진행된 두 개의 특징맵을 추출한 후 하나의 특징맵으로 결합(Filter Concatenation)하게 된다(S450). 여기서는 상기 S430 단계와 달리 1×1 크기의 컨볼루션을 사용하지 않게 된다.Thereafter, before the fully connected layer, the extracted feature map undergoes the pooling process once more through the second pooling layer 200b; that is, two feature maps are produced through the two-stage pooling (CMP, Max Pooling) of the CMP unit 210 and the Max Pooling unit 220 and then combined into one feature map (Filter Concatenation) (S450). Unlike step S430, the 1×1 convolution is not used here.

마지막으로, S450 단계에서 결합된 특징맵을 전결합(Fully Connected) 레이어를 통해 최종 신경망의 결과를 예측하여 출력하게 된다.Finally, the final neural network result is predicted and output from the feature map combined in step S450 through the fully connected layer.

구현 및 성능 평가Implementation and performance evaluation

이하 본 발명에 따른 CMP를 이용한 재구조화된 합성곱 신경망에 대한 성능평가를 위해 수집한 데이터를 활용하여 기존 연구와 비교하도록 한다.Hereinafter, data collected for performance evaluation of the restructured convolutional neural network using CMP according to the present invention will be used and compared with existing studies.

1. 시스템 구현 환경1. System implementation environment

본 발명에서 제안하는 재구조화된 합성곱 신경망의 설계 및 구현 환경은 하기 표 1의 개발 환경에서 개발과 성능평가가 진행된다.The design and implementation environment of the restructured convolutional neural network proposed in the present invention is developed and performance evaluated in the development environment shown in Table 1 below.

분류 (Classification) | 세부 사항 (Detail)
OS | Windows 10
CPU | Intel Core i7-9700
RAM | 32GB
GPU | GeForce RTX 2080 Super
Language | Python 3.6
IDE | PyCharm Community 2020.1.2
Library | Tensorflow 1.14.0, Keras 2.3.1

2. 성능평가 방법2. Performance evaluation method

기존 모델들과 본 발명에서 제안하는 모델의 성능 비교를 진행하기 위한 성능 평가 요소로는 정확률을 활용한다. 본 발명에서 평가하기 위한 정확률의 측정방식은 전체 데이터 중에 올바르게 예측한 데이터의 비율로 하기의 식 1과 같다.The accuracy rate is used as the performance evaluation metric to compare the performance of existing models and the model proposed in the present invention. The accuracy rate evaluated in the present invention is measured as the ratio of correctly predicted data to the total data, as shown in Equation 1 below.

[식 1 / Equation 1] 정확률(Accuracy) = 올바르게 예측한 데이터 수 / 전체 데이터 수 (Accuracy = number of correctly predicted samples / total number of samples)
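A minimal illustration of the accuracy measure in Equation 1; the prediction and label values below are made-up example data, not results from the evaluation.

```python
# Accuracy = correctly predicted samples / total samples (Equation 1).
predictions = [0, 1, 1, 2, 0]   # labels predicted by a model (illustrative)
labels      = [0, 1, 2, 2, 0]   # ground-truth labels (illustrative)
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)   # 4 of 5 predictions match
```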

3. 학습 및 테스트 데이터3. Training and testing data

본 발명에서 제안하는 모델을 학습시키기 위해서는 대용량 데이터가 필요하다. 대용량 데이터를 수집할 수 있는 방안은 크게 공공데이터의 활용과 크롤링을 활용한 데이터 수집이 있다. 공공데이터의 경우 데이터가 질적으로 신뢰성이 높지만, 활용하고자 하는 조건에 부합 하면서 많은 양의 데이터를 가지고 있는 데이터 세트를 수집하기 어렵다는 문제점이 있다. 반면 크롤링을 활용하면 대용량의 데이터를 수집할 수 있다. 그러나 데이터의 중복 혹은 한 이미지에 두 개 이상의 카테고리 대상 포함 등의 이유로 정확한 데이터를 얻기 어려워 신뢰성이 떨어진다는 문제점이 있다. 이에 본 명세서에서는 공공데이터와 크롤링을 활용하여 수집한 데이터를 선별적으로 활용하여 성능평가를 진행한다.A large amount of data is required to train the model proposed in the present invention. Methods to collect large amounts of data include the use of public data and the collection of data using crawling. In the case of public data, although the data is qualitatively reliable, there is a problem in that it is difficult to collect a data set that meets the conditions to be utilized and has a large amount of data. On the other hand, crawling can collect large amounts of data. However, there is a problem in that it is difficult to obtain accurate data due to duplication of data or the inclusion of two or more category objects in one image, thereby reducing reliability. Therefore, in this specification, performance evaluation is performed by selectively utilizing public data and data collected through crawling.

먼저 Caltech 101은 캘리포니아 공과대학(Caltech)에서 제공하는 공공데이터로 총 9,146개의 이미지 데이터이다. 비행기, 개미, 카메라, 의자 등 101개의 서로 다른 개체(카테고리)로 이루어진 데이터 세트이다. 각 이미지의 평균 크기는 약 300×200 픽셀로 구성되었으며, 카테고리별 데이터의 양은 최소 31개부터 최대 800개까지 큰 폭의 차이로 구성되어 있다. 이에 모델의 학습에 사용될 데이터는 100개 이상의 이미지를 가지고 있는 12개의 카테고리 중 서로 유사하거나 흑백 이미지, 크롤링으로 수집한 데이터의 카테고리 등을 제외한 7개의 카테고리로 모델 학습을 진행한다. 표 2는 선출된 Caltech 101 데이터의 카테고리별 데이터 양을 나타낸다.First, Caltech 101 is a public dataset provided by the California Institute of Technology, consisting of a total of 9,146 images across 101 different objects (categories) such as airplanes, ants, cameras, and chairs. The average image size is about 300×200 pixels, and the amount of data per category varies widely, from a minimum of 31 to a maximum of 800 images. Accordingly, model training uses 7 categories selected from the 12 categories containing 100 or more images, excluding categories that are similar to each other, consist of black-and-white images, or overlap with the categories collected by crawling. Table 2 shows the amount of data for each selected category of the Caltech 101 dataset.

카테고리 (Category) | 데이터양 (Amount of data) | 이미지 평균 크기 (Average image size) | 이미지 평균 용량 (Average file size)
Airplanes | 800 | 402×158 | 10KB
Motorbikes | 798 | 263×165 | 9KB
Faces | 435 | 504×333 | 28KB
Watch | 239 | 292×230 | 14KB
Leopards | 200 | 182×138 | 7KB
Bonsai | 128 | 263×281 | 17KB
Chandelier | 107 | 269×274 | 15KB

Next, Python's BeautifulSoup and ChromeDriver libraries are used to crawl image data, with Google Image Search as the collection source. To increase the amount of image data collected, images are gathered through both the Korean and the US versions of Google Image Search. The crawled data passes through a preprocessing module that checks image sizes, screens for duplicates, and filters out corrupted data. This raises the reliability of the data, and a dataset for model training and testing is then built through image resizing and format conversion. Table 3 classifies the crawled data by category: six categories in total (bird, boat, car, cat, dog, rabbit), with the quantities shown being those before data augmentation was applied.

Table 3. Crawled data by category (before augmentation)

Category   Images   Avg. image size   Avg. file size
bird          676   349×254            26 KB
boat          557   407×284            39 KB
car           702   566×357            82 KB
cat           786   730×553           108 KB
dog           663   696×536           122 KB
rabbit        500   403×302            49 KB
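The preprocessing module described above (image size check, duplicate check, error check) is not detailed in the specification. The following is a minimal sketch of one part of it, duplicate screening by byte-level hashing, under the assumption that crawled images are stored as local files; the names `file_digest` and `deduplicate` are illustrative, not from the patent:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """Hash the raw bytes of an image file; byte-identical files collide."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def deduplicate(paths):
    """Keep only the first occurrence of each distinct file content."""
    seen, kept = set(), []
    for p in paths:
        digest = file_digest(p)
        if digest not in seen:
            seen.add(digest)
            kept.append(p)
    return kept
```

Note that byte-level hashing only catches exact duplicates; near-duplicate images collected through the two Google Image Search locales would need a perceptual-hash comparison instead.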

4. Performance evaluation of the proposed model

The CMP proposed in the present invention is first evaluated against existing pooling techniques; the restructured CNN is then compared with existing algorithms. Table 4 shows the model configurations used in the comparative evaluation of the proposed CNN structure.

Table 4. Model configurations for the comparative evaluation

Model            Dataset        Image size   Network depth   Labels
AlexNet          Caltech 101    64×64          8             7
ResNet           Caltech 101    64×64         56             7
DenseNet         Caltech 101    64×64        121             7
Proposed model   Caltech 101    64×64          5             7
AlexNet          Crawled data   64×64          8             6
ResNet           Crawled data   64×64         56             6
DenseNet         Crawled data   64×64        121             6
Proposed model   Crawled data   64×64          5             6

1) CMP performance evaluation

In the first evaluation, pooling performance is tested separately on a subset of the Caltech 101 data and on the crawled data. The evaluation model takes 64×64 inputs and consists of two convolution layers and two pooling layers. Under identical conditions, only the pooling technique is varied, and training runs for a total of 100 epochs. Because combining CMP with Max Pooling is more favorable for feature extraction than applying CMP consecutively, CMP is used in the first pooling layer and Max Pooling in the last; the baseline models use Max Pooling or Average Pooling throughout. For the pooling test, 74% of the data was used for training, 16% for testing during the epochs, and 10% for the final performance test.
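The 74/16/10 split can be sketched as follows; the shuffling, the seed, and the helper name `split_dataset` are assumptions for illustration, not taken from the patent:

```python
import random

def split_dataset(samples, train=0.74, val=0.16, seed=0):
    """Shuffle and split into train / in-epoch test / final-test partitions."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])   # remaining ~10% for the final test
```

With 100 samples this yields partitions of 74, 16, and 10 items.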

Figure 8 shows the final pooling performance on the Caltech 101 data: the final accuracy and loss of the existing pooling techniques and of the proposed conditional pooling (CMP), tested on the held-out 10% of the data. Average Pooling records the lowest accuracy. Max Pooling and the proposed CMP achieve similar accuracy, but their losses are 0.1021 and 0.0817, respectively. By this objective measure, the combination that includes CMP outperforms the Max Pooling-only configuration.

Figure 9 shows the final pooling performance on the crawled data. Overall, the proposed CMP records the highest accuracy, 0.81, and the lowest loss, 0.23902, the best result among the three pooling techniques. Unlike on Caltech 101, Average Pooling performs somewhat better than Max Pooling on this data.

In summary, on the Caltech 101 data the proposed pooling technique (CMP) reached an accuracy of 0.9928, an improvement of 0.16–0.52% over the existing pooling techniques, and its loss of 0.0817 represents a reduction of 19.98–28.71%. On the crawled data, the accuracy of 0.81 is an improvement of 1.36–2.56% over the existing techniques, and the loss of 0.23902 a reduction of 9.22–13.28%.

2) Performance evaluation of the restructured convolutional neural network

For efficient use of CMP, the restructured convolutional neural network according to the present invention uses the same Caltech 101 and crawled data as the pooling evaluation above; however, for a more reliable evaluation, the amount of data was increased tenfold through data augmentation, and the number of training epochs was extended from 100 to 500.
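The specification does not state which augmentation operations produced the tenfold increase. One plausible sketch using simple geometric transforms (the specific transforms chosen here are assumptions, not from the patent) returns ten variants per input image:

```python
import numpy as np

def augment_10x(img: np.ndarray) -> list:
    """Produce 10 variants of one image: the original plus simple
    flips, 90-degree rotations, and small translations."""
    return [
        img,
        np.fliplr(img),           # horizontal flip
        np.flipud(img),           # vertical flip
        np.rot90(img, 1),
        np.rot90(img, 2),
        np.rot90(img, 3),
        np.roll(img, 4, axis=1),  # shift right by 4 px (wraps around)
        np.roll(img, -4, axis=1), # shift left
        np.roll(img, 4, axis=0),  # shift down
        np.roll(img, -4, axis=0), # shift up
    ]
```

For the square 64×64 inputs used here, every variant keeps the input shape, so the training pipeline is unchanged apart from the tenfold sample count.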

Figure 10 shows the final performance of the neural network models on the Caltech 101 data. The proposed structure achieves the highest accuracy, 0.9844, while AlexNet shows the lowest error, 0.0491. ResNet performs almost on par with the proposed structure in both accuracy and error, and even surpasses it during parts of training, so it is difficult to rank the two models.

Figure 11 shows the final performance of the neural network models on the crawled data. DenseNet records the highest accuracy, 0.8686, with the proposed model second at 0.8473. AlexNet shows the lowest error at 0.8454, and the proposed model the second lowest at 2.306.

In summary, in the test on the Caltech 101 data the proposed structure achieved the highest final accuracy, a performance improvement of 0.38–2.69%. AlexNet showed the lowest error, 0.0491, with the proposed structure second at 0.2393. In the performance test on the crawled data, DenseNet ranked first in accuracy with 0.8686 and the proposed structure second with 0.8473; AlexNet ranked first in error with 1.0769 and the proposed structure second with 2.306. Although the proposed structure did not achieve the highest performance in every test, considering that the convolution structure itself was left unchanged, the results are judged to be remarkable.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to these specific embodiments. Those of ordinary skill in the art to which the present invention pertains can make numerous changes and modifications without departing from the spirit and scope of the appended claims, and all such appropriate changes, modifications, and their equivalents should be regarded as falling within the scope of the present invention.

The present invention relates to a restructured convolutional neural network system using Conditional Min Pooling (CMP), for use in the field of deep learning, which discovers and learns patterns and rules on its own from large-scale image data, and to an operating method thereof; it therefore has industrial applicability.

Claims (4)

1. A convolutional neural network system comprising a convolution layer that performs a convolution operation on an input image through filters and outputs a feature map as the result, a pooling layer that reduces the size of the output feature map through sub-sampling, and a fully connected layer that flattens the size-reduced feature map into a one-dimensional array and adjusts parameters through training so as to predict a label and output the result, wherein the pooling layer includes a two-stage structure of: a CMP (Conditional Min Pooling) unit which, when no zero exists in the window, extracts the minimum value as the feature value as in Min Pooling and, when a zero exists in the window, does not extract zero as the feature value but counts the pixels containing zero and sets a constraint on the confirmed ratio of zeros according to a tolerance (0.25 to 1); and a Max Pooling unit which extracts the maximum value within the window as the feature value.

2. The restructured convolutional neural network system using CMP according to claim 1, wherein the tolerance of the CMP unit is 0.25, 0.5, 0.75, or 1.
3. The restructured convolutional neural network system using CMP according to claim 1, wherein the pooling layer comprises: a first pooling layer which, after the convolution operation on the input image by the convolution layer, reduces the channels of the two feature maps produced through the two-stage pooling by the CMP unit and the Max Pooling unit via a 1×1 convolution and combines them into one feature map; and a second pooling layer which, after the feature map combined through the first pooling layer has repeatedly undergone convolution operations in the plurality of convolution layers, extracts two feature maps through the two-stage pooling by the CMP unit and Max Pooling before the fully connected layer, combines them into one feature map, and sends it to the fully connected layer.
4. A method of operating the restructured convolutional neural network system using CMP according to any one of claims 1 to 3, comprising: a) extracting a feature map by performing a convolution operation on an input image through a convolution layer; b) in a first pooling layer, reducing the channels of the two feature maps produced from the feature map extracted in step a) through the two-stage pooling by the CMP unit and the Max Pooling unit via a 1×1 convolution, and combining them into one feature map; c) extracting a feature map by repeatedly performing convolution operations on the feature map combined in step b) through a plurality of configured convolution layers; d) in a second pooling layer, extracting two feature maps from the feature map extracted in step c) through the two-stage pooling by the CMP unit and Max Pooling, and combining them into one feature map; and e) predicting and outputting the final result of the neural network from the feature map combined in step d) through the fully connected layer.
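The CMP rule of claim 1 can be sketched numerically. The following NumPy illustration is a reading of the claim, not the patented implementation: emitting 0 for a window whose zero ratio exceeds the tolerance is an interpretation of the constraint, and the function names are illustrative. In the claimed two-stage structure, `cmp_pool` and `max_pool` would run as parallel branches over the same feature map, and a 1×1 convolution would then merge the two resulting maps into one.

```python
import numpy as np

def cmp_pool(fm: np.ndarray, k: int = 2, tol: float = 0.5) -> np.ndarray:
    """Conditional Min Pooling over non-overlapping k x k windows.

    A window with no zeros yields its minimum (plain min pooling).
    A window containing zeros never yields zero itself: the minimum of
    the nonzero entries is taken while the zero ratio stays within the
    tolerance; a window whose zero ratio exceeds the tolerance is
    emitted as 0 (this last rule is an interpretation of the claim).
    """
    h, w = fm.shape[0] // k, fm.shape[1] // k
    out = np.empty((h, w), dtype=fm.dtype)
    for i in range(h):
        for j in range(w):
            win = fm[i*k:(i+1)*k, j*k:(j+1)*k]
            zeros = np.count_nonzero(win == 0)
            if zeros == 0:
                out[i, j] = win.min()
            elif zeros / win.size <= tol:
                out[i, j] = win[win != 0].min()
            else:
                out[i, j] = 0
    return out

def max_pool(fm: np.ndarray, k: int = 2) -> np.ndarray:
    """Plain max pooling over non-overlapping k x k windows."""
    h, w = fm.shape[0] // k, fm.shape[1] // k
    return fm[:h*k, :w*k].reshape(h, k, w, k).max(axis=(1, 3))
```

For example, with tolerance 0.5 a 2×2 window holding three zeros is emitted as 0, while raising the tolerance to 0.75 lets the same window yield its single nonzero value.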
PCT/KR2022/019972 2021-12-08 2022-12-08 Re-structuralized convolutional neural network system using cmp and operation method thereof Ceased WO2023106870A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020210174673A KR102875528B1 (en) 2021-12-08 2021-12-08 A Reconstructed Convolutional Neural Network Using Conditional Min Pooling and Method thereof
KR10-2021-0174673 2021-12-08

Publications (1)

Publication Number Publication Date
WO2023106870A1 true WO2023106870A1 (en) 2023-06-15

Family

ID=86731000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/019972 Ceased WO2023106870A1 (en) 2021-12-08 2022-12-08 Re-structuralized convolutional neural network system using cmp and operation method thereof

Country Status (2)

Country Link
KR (1) KR102875528B1 (en)
WO (1) WO2023106870A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118277956A (en) * 2024-04-09 2024-07-02 山东泽悦信息技术有限公司 Dust removal, mildew removal and acid removal control method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019153293A (en) * 2018-02-28 2019-09-12 コニカ ミノルタ ラボラトリー ユー.エス.エー.,インコーポレイテッド Text image processing using stroke-aware max-min pooling for ocr system employing artificial neural network
US20200143230A1 (en) * 2018-07-27 2020-05-07 Shenzhen Sensetime Technology Co., Ltd. Image lighting methods and apparatuses, electronic devices, and storage media
CN111950575A (en) * 2019-05-16 2020-11-17 北京三星通信技术研究有限公司 Device and method for fall detection
KR20210132908A (en) * 2020-04-28 2021-11-05 영남대학교 산학협력단 Method for learnig images using convolution neural network and apparatus for executing the method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12307350B2 (en) 2018-01-04 2025-05-20 Tesla, Inc. Systems and methods for hardware-based pooling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PARK JUN, KIM JUN-YEONG, HUH JUN-HO, LEE HAN-SUNG, JUNG SE-HOON, SIM CHUN-BO: "A Novel on Conditional Min Pooling and Restructured Convolutional Neural Network", ELECTRONICS, vol. 10, no. 19, 2 October 2021 (2021-10-02), pages 2407, XP093072607, DOI: 10.3390/electronics10192407 *

Also Published As

Publication number Publication date
KR102875528B1 (en) 2025-10-24
KR20230086233A (en) 2023-06-15

Similar Documents

Publication Publication Date Title
CN110069958B (en) Electroencephalogram signal rapid identification method of dense deep convolutional neural network
CN111814661B (en) Human body behavior recognition method based on residual error-circulating neural network
CN108388927A (en) Small sample polarization SAR terrain classification method based on the twin network of depth convolution
CN108852350B (en) A modeling method for identifying and locating epileptogenic areas in scalp EEG based on a deep learning algorithm
CN110232361B (en) Human behavior intention identification method and system based on three-dimensional residual dense network
CN114190952B (en) 12-lead electrocardiosignal multi-label classification method based on lead grouping
CN112766355A (en) Electroencephalogram signal emotion recognition method under label noise
CN116798070A (en) A cross-modal person re-identification method based on spectral perception and attention mechanism
CN112329536A (en) A Single-Sample Face Recognition Method Based on Alternate Adversarial Transfer Learning
CN115919330A (en) EEG Emotional State Classification Method Based on Multi-level SE Attention and Graph Convolution
CN114360030A (en) Face recognition method based on convolutional neural network
WO2023108873A1 (en) Brain network and brain addiction connection calculation method and apparatus
CN112766119A (en) Method for accurately identifying strangers and constructing community security based on multi-dimensional face analysis
CN114648123B (en) Hierarchical reasoning time prediction method and device for convolutional neural network
WO2023106870A1 (en) Re-structuralized convolutional neural network system using cmp and operation method thereof
CN118072079B (en) Small target object recognition method and device based on pulse neural network
CN115019389A (en) A Behavior Recognition Method Based on Motion Saliency and SlowFast
Huang et al. Efficient attention network: Accelerate attention by searching where to plug
CN110070027A (en) Pedestrian based on intelligent internet of things system recognition methods again
CN114861799A (en) Data screening method, data screening device, electronic device and storage medium
CN116246326A (en) Pain expression assessment method based on multi-task transformer
WO2020091259A1 (en) Improvement of prediction performance using asymmetric tanh activation function
CN113515999B (en) Training method, device and equipment for image processing model and readable storage medium
CN108021873A (en) A kind of EEG signals epilepsy sorting technique and system for clustering asymmetric mutual information
CN119782292A (en) A data analysis method based on artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22904699

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22904699

Country of ref document: EP

Kind code of ref document: A1