WO2024117555A1

WO2024117555A1 - Method of training classification model for image data and device therefor

Info

Publication number: WO2024117555A1
Application number: PCT/KR2023/016841
Authority: WO
Inventors: 김진영; 박창현; 이동수; 이준형; 조현우; 정종훈; 정우석; 김동희; 조대현; 박찬민; 조성덕
Original assignee: Vuno Inc
Current assignee: Vuno Inc
Priority date: 2022-12-01
Filing date: 2023-10-27
Publication date: 2024-06-06
Anticipated expiration: 2025-06-01
Also published as: KR20240082452A; KR102727038B1

Abstract

Disclosed, according to various embodiments, are a method by which a computing device trains a classification model that outputs classification information based on image data, and a device therefor. The method comprises: pre-training an encoder based on first training data including first image data and metadata; and training a classification model including the pre-trained encoder based on second training data including label information, wherein the encoder is pre-trained based on first feature vectors extracted from the first image data and third feature vectors obtained by applying the characteristics of meta-vectors related to the metadata to the first feature vectors.

Description

Method of training a classification model for image data and device therefor

This disclosure relates to a method of training a classification model based on an artificial neural network, and an encoder included in the classification model, and to a device therefor.

Medical images such as CT (computed tomography) images are currently in wide use for diagnosis through lesion analysis. For example, chest CT images are frequently used for reading because they allow abnormalities inside the body, such as in the lungs, bronchi, and heart, to be observed.

Some findings that can be read from chest CT images are so difficult to interpret that even radiologists can distinguish their characteristics and shapes only after years of training, so human physicians can easily overlook them. In particular, findings that are hard to read, such as lung nodules, may be missed even when the physician pays close attention. To assist in reading images that humans can easily overlook, the need for computer-aided diagnosis (CAD) has emerged, but conventional CAD technology assists the physician's judgment only in very limited areas.

To address these problems, methods of performing reading and/or diagnosis using artificial neural network models are being actively studied. For example, reading and/or diagnosis may be performed through neural-network-based classification of medical images, object detection, extraction of object boundaries, and registration of different images. Here, a neural network model is a supervised learning algorithm inspired by the structure of neurons in biology. Its basic operating principle is to interconnect multiple neurons so as to predict the optimal output for given inputs. From a statistical perspective, a neural network model can be viewed as projection pursuit regression, which applies nonlinear functions to linear combinations of the input variables. In particular, convolutional neural networks (CNNs) are the most widely used models for extracting features, such as the objects mentioned above, from medical images.
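To make the statistical view concrete, the following is a minimal sketch of a one-hidden-layer network written in projection-pursuit form, f(x) = sum_k beta_k * g(w_k . x + b_k). The ReLU nonlinearity and all weight values are illustrative assumptions, not taken from this document.

```python
def relu(z: float) -> float:
    return max(0.0, z)

def one_hidden_layer(x, hidden_weights, hidden_biases, output_weights):
    """f(x) = sum_k beta_k * g(w_k . x + b_k): a nonlinear function g (here ReLU,
    an assumption) applied to linear combinations of the inputs, then summed."""
    out = 0.0
    for w_k, b_k, beta_k in zip(hidden_weights, hidden_biases, output_weights):
        z = sum(w * xi for w, xi in zip(w_k, x)) + b_k  # linear combination of inputs
        out += beta_k * relu(z)                          # weighted nonlinear "ridge" term
    return out

# Two hidden units with illustrative weights.
y = one_hidden_layer([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0], [2.0, 0.5])
```

Here the first hidden unit is inactive (its linear combination is negative), so only the second unit contributes to the output.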

The problem to be solved is to provide a method of training the classification model, and a device therefor, that minimizes the influence of metadata characteristics even when the training data consist of image data whose metadata characteristics are biased owing to limited label information.

The technical problems are not limited to those mentioned above; other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

According to one aspect, a method by which a computing device trains a classification model that outputs classification information based on image data includes: pre-training an encoder based on first training data including first image data and metadata; and training a classification model including the pre-trained encoder based on second training data including label information, wherein the encoder may be pre-trained based on first feature vectors extracted from the first image data and third feature vectors obtained by applying characteristics of meta vectors related to the metadata to the first feature vectors.

Alternatively, the encoder may be pre-trained based on a first loss function for the first feature vectors and the third feature vectors.
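The document does not state the form of the first loss function. Because FIG. 5 describes contrastive learning, one plausible choice is an InfoNCE-style loss; the sketch below assumes that, treating each first feature vector as a query whose positive key is the matching third feature vector, with the other keys in the batch as negatives. The function names and the temperature of 0.1 are hypothetical.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(queries, keys, temperature=0.1):
    """Mean InfoNCE loss (an assumed form of the 'first loss function').
    queries[i] (e.g. a first feature vector) should match keys[i]
    (e.g. the corresponding third feature vector); other keys are negatives."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # cosine similarities scaled by the temperature
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(q)
```

When each query is aligned with its own key, the loss approaches zero; when queries are paired with the wrong keys, it grows large, which is the signal that pulls matching feature vectors together.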

Alternatively, the encoder may be pre-trained by further considering second feature vectors extracted by a momentum encoder whose parameters are a moving average of the encoder's parameters.
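A momentum encoder of this kind is typically updated by an exponential moving average rather than by gradient descent. The sketch below assumes the MoCo-style rule theta_k <- m * theta_k + (1 - m) * theta_q over flat parameter lists; the coefficient m = 0.999 is an assumed default, not a value given in this document.

```python
def momentum_update(momentum_params, encoder_params, m=0.999):
    """Exponential moving average: theta_k <- m * theta_k + (1 - m) * theta_q.
    The momentum encoder receives no gradients; it only tracks the encoder."""
    return [m * pk + (1.0 - m) * pq for pk, pq in zip(momentum_params, encoder_params)]

# Illustrative call with a small m so the effect is visible.
updated = momentum_update([0.0, 1.0], [1.0, 1.0], m=0.9)
```

A large m makes the momentum encoder change slowly, which keeps the second feature vectors consistent across training steps.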

Alternatively, the momentum encoder may extract the second feature vectors from augmented image data obtained by augmenting the first image data.

Alternatively, the encoder may be pre-trained based on a second loss function for the first and second feature vectors and a first loss function for the first and third feature vectors.

Alternatively, the computing device may obtain the third feature vectors by inputting the metadata and the first feature vectors into a metadata fusion module, and the encoder may be pre-trained by further considering fourth feature vectors output by a momentum metadata fusion module whose parameters are a moving average of the metadata fusion module's parameters.

Alternatively, the encoder may be pre-trained based on a first loss function for the first and third feature vectors and a third loss function for the third and fourth feature vectors.

Alternatively, the meta vectors may be generated from the metadata for M channels so as to correspond to the N channels associated with the first feature vectors, and the characteristics of the meta vectors may be determined based on at least one of the mean and the variance of the elements of each meta vector.
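How the meta vectors are generated from the metadata is not specified here. Assuming they are already available as one list of numbers per channel, their characteristics (per-vector mean and variance) can be computed as in this minimal sketch; the function name is hypothetical.

```python
def meta_vector_statistics(meta_vectors):
    """Return (mean, variance) of the elements of each per-channel meta vector.
    Uses the population variance (divide by the number of elements)."""
    stats = []
    for vec in meta_vectors:
        mean = sum(vec) / len(vec)
        var = sum((x - mean) ** 2 for x in vec) / len(vec)
        stats.append((mean, var))
    return stats

# Two channels, each with an illustrative meta vector.
stats = meta_vector_statistics([[1.0, 3.0], [2.0, 2.0]])
```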

Alternatively, the third feature vectors may be generated through adaptive instance normalization (AdaIN) of the first feature vectors, where the scale factor and bias of the AdaIN are determined based on the characteristics of the meta vectors.
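AdaIN normalizes each channel of a feature to zero mean and unit standard deviation and then rescales and shifts it with statistics taken from another source. The sketch below assumes that the scale factor is the standard deviation and the bias is the mean of the matching meta vector, and that feature channels and meta vectors are paired one-to-one; the document says only that both are "determined based on" the meta-vector characteristics.

```python
import math

def adain(features, meta_vectors, eps=1e-5):
    """Per-channel AdaIN: (x - mu_x) / sigma_x * scale + bias.
    Assumption: scale = std of the paired meta vector, bias = its mean."""
    out = []
    for x, meta in zip(features, meta_vectors):  # assumes one meta vector per channel
        mu_x = sum(x) / len(x)
        sd_x = math.sqrt(sum((v - mu_x) ** 2 for v in x) / len(x) + eps)
        mu_m = sum(meta) / len(meta)
        sd_m = math.sqrt(sum((v - mu_m) ** 2 for v in meta) / len(meta) + eps)
        out.append([sd_m * (v - mu_x) / sd_x + mu_m for v in x])
    return out

# One feature channel restyled with the statistics of one meta vector.
restyled = adain([[1.0, 2.0, 3.0, 4.0]], [[10.0, 20.0]])
```

After this transform the feature channel carries the mean (15) and approximately the standard deviation (5) of the meta vector, while keeping the relative pattern of its own values.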

Alternatively, the label information may include label values for the location of a lung nodule, the type of the lung nodule, or the segmentation of the lung nodule in second image data included in the second training data.

Alternatively, the first feature vectors may be feature vectors obtained by applying a multi-layer perceptron (MLP) and an additional weight to extracted feature vectors that the encoder extracts from the first image data.

According to another aspect, a method by which a computing device uses a classification model to output classification information related to lung nodules based on image data includes: receiving image data; and inputting the image data into the classification model to output classification information, wherein the classification model includes an encoder pre-trained based on first training data including first image data and metadata, and the encoder may be pre-trained based on first feature vectors extracted from the first image data and third feature vectors obtained by applying characteristics of meta vectors generated from the metadata to the first feature vectors.

According to another aspect, a computing device that trains a classification model outputting classification information based on image data includes a communication unit connected to external devices and a processor connected to the communication unit. The processor pre-trains an encoder based on first training data including first image data and metadata acquired through the communication unit, and trains a classification model including the pre-trained encoder based on second training data including label information acquired through the communication unit. The processor may pre-train the encoder based on first feature vectors extracted from the first image data and third feature vectors obtained by applying characteristics of meta vectors related to the metadata to the first feature vectors.

Various embodiments pre-train the encoder so that metadata characteristics are taken into account. This allows the classification model to be trained so as to minimize the influence of those characteristics even when the training data consist of image data whose metadata characteristics are biased owing to limited label information.

The effects obtainable in various embodiments are not limited to those mentioned above; other effects not mentioned will be clearly understood by those skilled in the art from the description below.

The drawings accompanying this specification are provided to aid understanding of the present invention; they illustrate various embodiments of the invention and, together with the description, explain its principles.

FIG. 1 is a diagram illustrating the structure of a CNN, which is an artificial neural network.

FIG. 2 illustrates a computing device that trains a classification model based on an artificial neural network.

FIGS. 3 and 4 are diagrams illustrating a method by which a computing device pre-trains the encoder.

FIG. 5 is a diagram illustrating a method by which a computing device performs self-supervised pre-training of an encoder based on contrastive learning.

FIGS. 6 and 7 are diagrams illustrating a method by which a computing device applies metadata characteristics to extracted feature vectors using a metadata fusion module.

FIGS. 8 and 9 are diagrams illustrating a method by which a computing device trains a classification model including a pre-trained encoder.

FIG. 10 is a diagram illustrating a method by which a computing device trains an encoder and a classification model including the encoder.

The detailed description of the present invention below refers to the accompanying drawings, which show, by way of example, specific embodiments in which the invention may be practiced, in order to clarify its objectives, technical solutions, and advantages. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention.

As used throughout the description and claims of this specification, the term "image" or "image data" refers to multidimensional data composed of discrete image elements (e.g., pixels in a two-dimensional image and voxels in a three-dimensional image).

For example, an "image" may be a two-dimensional image corresponding to a slide of tissue observed under a microscope, but it is not limited thereto; it may be a medical image of a subject acquired by (cone-beam) computed tomography, magnetic resonance imaging (MRI), ultrasound, or any other medical imaging system known in the art. Images may also be provided in non-medical contexts, such as remote sensing systems and electron microscopy.

Throughout the description and claims of this specification, 'image' refers to a visible image (e.g., one displayed on a screen) or to a digital representation of an image.

For convenience of explanation, slide image data is shown in the drawings as an exemplary image modality. However, those skilled in the art will understand that the image modalities used in various embodiments of the present invention include X-ray, MRI, CT, PET (positron emission tomography), PET-CT, SPECT, SPECT-CT, MR-PET, 3D ultrasound, and the like, and are not limited to the modalities listed as examples.

Medical images described throughout the description and claims of this specification may conform to the DICOM (Digital Imaging and Communications in Medicine) standard. DICOM is a collective term for the standards used for digital image representation and communication in medical devices, and it is published by a joint committee formed by the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA).

In addition, medical images described throughout the description and claims of this specification may be stored or transmitted through a PACS (Picture Archiving and Communication System), which may be a system that stores, processes, and transmits medical images in accordance with the DICOM standard. Medical images acquired with digital medical imaging equipment such as X-ray, CT, and MRI scanners are stored in DICOM format and can be transmitted over a network to terminals inside and outside the hospital, and observation results and medical records may be added to them.

Throughout the description and claims of this specification, the terms 'learning' and 'training' refer to performing machine learning through procedural computation; they are not intended to refer to mental activities such as human education, and 'training' is used in the sense generally accepted in machine learning. For example, 'deep learning' refers to machine learning using deep artificial neural networks. A deep neural network is a machine learning model with a multi-layer artificial neural network structure that, by being trained on large amounts of data, automatically learns the features of the data and is optimized to minimize an objective (loss) function, such as the classification error. It can extract and classify features at various levels, from low-level features such as points, lines, and surfaces to complex, meaningful high-level features.

Throughout the description and claims of this specification, the word 'comprise' and its variations are not intended to exclude other technical features, additions, components, or steps. In addition, 'a' or 'an' is used to mean one or more, and 'another' means at least a second one.

Other objects, advantages, and features of the present invention will become apparent to those skilled in the art, partly from this specification and partly from practice of the invention. The examples and drawings below are provided by way of illustration and are not intended to limit the invention. Accordingly, details disclosed herein regarding specific structures or functions should not be interpreted in a limiting sense, but merely as representative basic material that guides those skilled in the art in variously practicing the invention with any suitable detailed structures.

Moreover, the present invention encompasses all possible combinations of the embodiments presented herein. It should be understood that the various embodiments of the present invention, although different from one another, are not necessarily mutually exclusive. For example, a specific shape, structure, or characteristic described herein in connection with one embodiment may be implemented in other embodiments without departing from the spirit and scope of the invention. It should also be understood that the location or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description below is not to be taken in a limiting sense, and the scope of the invention, if appropriately described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. Like reference numerals in the drawings refer to the same or similar functions throughout the several aspects.

In this specification, unless otherwise indicated or clearly contradicted by context, items referred to in the singular encompass the plural unless the context requires otherwise. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they might obscure the gist of the invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings so that those skilled in the art can easily practice the invention.

A CNN (convolutional neural network) is a type of ANN (artificial neural network) that uses convolution operations. To extract image features, a CNN slides filters over the input image (or input data), computes convolutions, and uses the results to generate a feature map or activation map.

A CNN preserves the shape of each layer's input and output data and effectively recognizes features of neighboring regions while preserving the spatial information of the image. It can extract and learn image features using multiple filters, and it may optionally include pooling layers that aggregate and reinforce the extracted features. Because its filters are used as shared parameters, a CNN has far fewer trainable parameters than a general (fully connected) artificial neural network.
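The parameter-sharing advantage can be quantified with a quick count. The sketch below compares a 3x3 convolution layer mapping 3 channels to 16 channels with a fully connected layer producing an output of the same size from a 32x32x3 input; the layer sizes are illustrative assumptions.

```python
def conv_params(kernel, in_ch, out_ch):
    # k*k*C_in weights per filter plus one bias per filter, shared across all positions
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(h, w, in_ch, out_ch):
    # every output unit is connected to every input unit, plus one bias per output unit
    n_in, n_out = h * w * in_ch, h * w * out_ch
    return n_in * n_out + n_out

conv = conv_params(3, 3, 16)         # 448 parameters
dense = dense_params(32, 32, 3, 16)  # 50,348,032 parameters
```

The convolution layer needs about five orders of magnitude fewer parameters because the same 16 small filters are reused at every spatial position.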

Specifically, referring to FIG. 1, the CNN may include a feature extraction region that extracts features from an input image (or input data) and an image classification region that classifies the extracted features. The feature extraction region may include at least one convolution layer that finds image features using filters while minimizing the number of parameters through parameter sharing. The feature extraction region may further include at least one pooling layer that aggregates and reinforces the features; the pooling layer is optional and may be omitted.

A convolution layer applies filters to the input image (or input data) and then applies an activation function. One or more filters may be applied to the input image entering the convolution layer, and each filter produces one channel of the feature map. For example, when n filters are applied in a convolution layer, the feature map or activation map (the output data) has n channels.
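The one-filter-per-channel relationship can be seen in a minimal sketch of a convolution layer: a single-channel image, square filters, stride 1, no padding, and a ReLU activation are all simplifying assumptions. As in most deep learning libraries, the "convolution" is computed as cross-correlation (the filter is not flipped).

```python
def conv2d_multi(image, filters):
    """Valid (no padding), stride-1 cross-correlation of a single-channel image
    with each k x k filter, followed by ReLU. Returns one feature map per filter."""
    h, w = len(image), len(image[0])
    channels = []
    for f in filters:
        k = len(f)  # assumes square k x k filters
        fmap = []
        for i in range(h - k + 1):
            row = []
            for j in range(w - k + 1):
                s = sum(f[a][b] * image[i + a][j + b]
                        for a in range(k) for b in range(k))
                row.append(max(0.0, s))  # ReLU activation
            fmap.append(row)
        channels.append(fmap)
    return channels

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filters = [[[1, 0], [0, 1]], [[0, 0], [0, -1]]]  # n = 2 filters
out = conv2d_multi(image, filters)               # -> 2 output channels
```

With two filters the output has two channels; the second filter yields only negative sums, which the ReLU activation sets to zero.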

The pooling layer is an optional layer placed after a convolution layer. It takes the output data of the convolution layer as input and can be used to reduce the size of that data or to emphasize specific data. Pooling methods include max pooling, average pooling, and min pooling. A pooling layer has no trainable parameters, reduces the size of the matrix, and does not change the number of channels.
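The three pooling methods can be sketched on a single-channel feature map; a 2x2 window with stride 2 is assumed. Applying the function separately to each channel leaves the number of channels unchanged, and there are no trainable parameters.

```python
def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling (window = stride = size) on one feature-map channel."""
    reducers = {"max": max, "min": min, "avg": lambda xs: sum(xs) / len(xs)}
    reducer = reducers[mode]
    h, w = len(fmap), len(fmap[0])
    return [
        [reducer([fmap[i + a][j + b] for a in range(size) for b in range(size)])
         for j in range(0, w - size + 1, size)]
        for i in range(0, h - size + 1, size)
    ]

fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled_max = pool2d(fmap, mode="max")  # 4x4 -> 2x2
```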

The shape or size of a CNN's output data can be adjusted through the filter size, the stride, whether padding is applied, and the max-pooling size, and the number of output channels is determined by the number of filters.
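The spatial output size follows the standard relation out = floor((in + 2 * padding - kernel) / stride) + 1, which applies to both convolution and pooling windows (dilation is ignored here). A minimal sketch with illustrative sizes:

```python
def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Spatial output length: floor((in + 2*padding - kernel) / stride) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

# A 3x3 convolution with padding 1 and stride 1 keeps a 32-pixel side at 32 ...
after_conv = conv_output_size(32, kernel=3, stride=1, padding=1)
# ... and 2x2 max pooling with stride 2 halves it to 16.
after_pool = conv_output_size(after_conv, kernel=2, stride=2)
```

The channel count is not part of this formula; it equals the number of filters in the convolution layer.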

The fully connected (FC) layer performs recognition and classification; it is the type of layer used to connect successive layers in conventional neural networks. The FC layer flattens the two-dimensional array output from the feature extraction region into one dimension, and the flattened output data can be classified through a softmax function.
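The flatten, fully connected, and softmax steps can be sketched as follows. The weights and biases are illustrative assumptions; in practice they are learned during training.

```python
import math

def flatten(feature_maps):
    """Flatten [channel][row][col] feature maps into one vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fc_classify(feature_maps, weights, biases):
    """One FC layer (one weight row per class) followed by softmax."""
    x = flatten(feature_maps)
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return softmax(logits)

probs = fc_classify([[[1.0, 0.0], [0.0, 1.0]]],
                    [[1.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]],
                    [0.0, 0.0])
```

The output is a probability distribution over the classes; here the first class receives about 0.88 of the probability mass.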

A CNN-based model of this kind (hereinafter, a classification model) can be trained to make a predetermined diagnosis or reading from medical image data such as MRI or CT images. Hereinafter, a method of training an artificial-neural-network-based classification model to perform specific classification and/or diagnosis tasks (e.g., nodule detection, false-positive reduction (FPR), classification, and segmentation) based on medical image data such as MRI or CT images is described in detail.

FIG. 2 illustrates a computing device that trains a classification model based on an artificial neural network.

Referring to FIG. 2, the computing device 20 includes a communication unit 21 and a processor 22, and can communicate directly or indirectly with an external computing device (not shown) through the communication unit 21. The communication unit 21 may correspond to, or include, a transceiver capable of transmitting and receiving requests and responses to and from other computing devices.

Specifically, the computing device 20 may achieve the desired system performance using a combination of typical computer hardware (e.g., a device that may include a computer processor, memory, storage, input and output devices, and other components of conventional image processing devices; electronic communication devices such as routers and switches; and electronic information storage systems such as network-attached storage (NAS) and storage area networks (SAN)) and computer software (i.e., instructions that cause the computing device to function in a particular manner).

The communication unit 21 can transmit and receive requests and responses to and from other linked image processing devices. For example, such requests and responses may be exchanged over the same TCP (transmission control protocol) session, but this is not limiting; they may also be transmitted and received as, for example, UDP (user datagram protocol) datagrams. In a broad sense, the communication unit 21 may also include a keyboard, a pointing device such as a mouse, and other external input devices for receiving commands or instructions, as well as a printer, a display, and other external output devices.

In addition, the processor 22 of the computing device 20 may include hardware components such as an MPU (micro processing unit), a CPU (central processing unit), a GPU (graphics processing unit) or a TPU (tensor processing unit), cache memory, and a data bus. It may further include software components such as an operating system and applications that serve specific purposes. The processor 22 may execute instructions that perform the neural network functions described below.

According to one example, the processor 22 may train the artificial neural network-based classification model to perform a predetermined task (lesion detection, classification, and segmentation). For example, using training data that include medical images and the label information corresponding to each medical image, the processor 22 may train the classification model by adjusting the parameters of the classification model, or of the encoder included in it, so that the difference between the label information and the model's output is minimized. Here, the classification model may be based on an artificial neural network capable of extracting features or feature vectors from the medical images; for example, the artificial neural network may be at least one of various types of neural networks, such as a deep neural network (DNN), a recurrent neural network (RNN), a long short-term memory model (LSTM), a bidirectional recurrent deep neural network (BRDNN), or a convolutional neural network (CNN).

However, because of limitations in labeling medical images, there may not be enough labeled training data to train the classification model. For example, the training data may be skewed toward medical images (or image data) associated with specific metadata. Here, the metadata may be characteristic information such as the subject's (e.g., patient's) own attributes, including age and sex, and protocol information of the image acquisition device (e.g., CT vendor, X-ray strength, X-ray beam shape, and scanning method). When trained on such data, the classification model may fail to properly distinguish the image characteristics caused by that specific metadata, so its classification performance may drop significantly on medical images whose metadata differ from it.

Therefore, a method is needed to minimize the influence of the specific metadata even when the classification model is trained on medical images skewed toward that metadata. To this end, the computing device 20 may first pre-train the encoder included in the classification model so that it extracts features from medical images while accounting for the influence of the metadata. This pre-training can be performed on a large number of images without separate label information.

The following describes in detail how the computing device 20 performs self-supervised pre-training of the encoder included in the classification model so that the influence of the metadata can be taken into account.

FIGS. 3 and 4 are diagrams for explaining how a computing device pre-trains the encoder.

The computing device may pre-train the encoder 111 through self-supervised learning by comparing the first feature vectors output by the encoder 111 with third feature vectors obtained by reflecting the characteristics of the metadata in those first feature vectors.

Specifically, referring to FIG. 3, the encoder 111 may undergo self-supervised pre-training based on a first stream 110 and a second stream 120. In the first stream 110, a medical image is input, and first feature vectors (f₁(x)) are extracted through the encoder 111. For example, the encoder 111 may extract N-dimensional feature vectors, referred to as extracted feature vectors or first feature vectors (f₁(x)), from the input medical image or image data (hereinafter, image data) using the CNN described with reference to FIG. 1.

In the second stream 120, the metadata for the image data and the extracted feature vectors from the first stream 110 may be input. The metadata fusion module 121 receives the metadata and the extracted feature vectors and may output third feature vectors (f₃(x)) in which the metadata has been meaningfully applied to, or fused with, the extracted feature vectors. For example, based on the metadata, the metadata fusion module 121 may generate extended meta vectors having M channels that correspond to the N channels of the extracted feature vectors; based on the extended meta vectors, it may calculate or determine metadata-related characteristic values to be applied to each of the N channels of the extracted feature vectors; and based on those per-channel characteristic values, it may apply the metadata-related characteristics to the extracted feature vectors. In other words, the metadata fusion module 121 applies the metadata to the extracted feature vectors in their feature space (the CNN feature space). This is described in detail later with reference to FIGS. 6 and 7.

Next, the computing device may train the encoder 111 and/or the metadata fusion module 121 based on a first loss function (L_ma) defined over the first feature vectors (f₁(x)) and the third feature vectors (f₃(x)). The first loss function (L_ma) may be defined as in Equation 1 below. In this case, the computing device may train the encoder 111 and/or the metadata fusion module 121 so that, according to the first loss function (L_ma), the difference between the first feature vectors (f₁(x)) and the third feature vectors (f₃(x)) is minimized.
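Equation 1 itself is not reproduced in this text. Purely as an illustration, the sketch below assumes one common choice for such an alignment loss: both vectors are L2-normalized and their squared Euclidean distance is taken, which equals 2 − 2·cos(f₁, f₃) and is 0 when the two vectors point in the same direction. The function names are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (the small eps avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]

def alignment_loss(f1, f3):
    """Assumed form of L_ma: squared distance between unit-normalized f1(x) and f3(x).

    Equivalent to 2 - 2 * cosine_similarity(f1, f3).
    """
    a, b = l2_normalize(f1), l2_normalize(f3)
    return sum((x - y) ** 2 for x, y in zip(a, b))

def batch_alignment_loss(batch_f1, batch_f3):
    """Average the per-sample loss over a batch of (f1, f3) pairs."""
    losses = [alignment_loss(a, b) for a, b in zip(batch_f1, batch_f3)]
    return sum(losses) / len(losses)

print(alignment_loss([1.0, 0.0], [2.0, 0.0]))  # same direction -> ~0.0
print(alignment_loss([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> ~2.0
```

Normalizing first makes the loss depend only on direction, so the encoder cannot lower it simply by shrinking the feature magnitudes.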

[Equation 1]

Because the encoder 111 is pre-trained with respect to the third feature vectors (f₃(x)), in which the metadata fusion module 121 has meaningfully reflected the metadata, it can effectively extract the extracted feature vectors from image data while taking into account the image characteristics that stem from the metadata.

Alternatively, referring to FIG. 4, in the first stream 110 the extracted feature vectors output by the encoder 111 may additionally pass through a first projector (projector₁, 112) and a first predictor (predictor₁, 113), which reduce their dimension and apply additional weights. Specifically, the first projector 112 may apply a multi-layer perceptron (MLP) network to the extracted feature vectors (CNN features) from the encoder 111 to map them into a lower-dimensional feature space, i.e., into lower-dimensional feature vectors. The first predictor 113 may apply additional weights to the converted feature vectors and output the first feature vectors (f₁(x)). The extracted feature vectors retain their generalization properties even though the first projector 112 reduces their dimension, and this dimension reduction allows the first loss function to be computed more effectively. Likewise, in the second stream 120, the fusion feature vectors, that is, the extracted feature vectors to which the metadata has been applied, may additionally pass through a second projector (projector₂, 122) and a second predictor (predictor₂, 123), which reduce their dimension and add weights, and be converted into the third feature vectors (f₃(x)).
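FIG. 4 does not specify layer sizes or the exact form of the projector and predictor. The sketch below assumes a two-layer MLP projector (N → hidden → D, with D < N) using ReLU, and a single linear layer as the predictor that applies the "additional weights". The class names, dimensions, and random initialization are all illustrative assumptions; in practice these layers would be trained.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Create a weight matrix (out_dim x in_dim) and a zero bias vector."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b

def linear(x, layer):
    """Apply y = W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

class Projector:
    """Hypothetical MLP projector: N-dim encoder feature -> smaller D-dim feature."""
    def __init__(self, n_in, hidden, d_out, rng):
        self.fc1 = make_linear(n_in, hidden, rng)
        self.fc2 = make_linear(hidden, d_out, rng)

    def __call__(self, x):
        return linear(relu(linear(x, self.fc1)), self.fc2)

class Predictor:
    """Hypothetical predictor: one extra linear layer on the projected feature."""
    def __init__(self, d, rng):
        self.fc = make_linear(d, d, rng)

    def __call__(self, z):
        return linear(z, self.fc)

rng = random.Random(0)
projector = Projector(n_in=64, hidden=32, d_out=16, rng=rng)
predictor = Predictor(d=16, rng=rng)

encoder_feature = [rng.random() for _ in range(64)]  # stand-in for an N=64 CNN feature
f1 = predictor(projector(encoder_feature))
print(len(f1))  # 16
```

The same structure would serve for the second projector 122 and second predictor 123, with output dimension D matched to the first branch so that the loss compares vectors of equal size.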

In this case, as described with reference to FIG. 3, the computing device may adjust the parameters of the encoder 111, the first projector 112, the first predictor 113, the metadata fusion module 121, the second projector 122, and/or the second predictor 123 so that the first loss function, computed from the first feature vectors (f₁(x)) and the third feature vectors (f₃(x)), is minimized. The second projector 122 and the first projector 112 may reduce their inputs to feature vectors of matching dimension.

Because the encoder 111 has thus been trained during pre-training to extract feature vectors that account for metadata characteristics, it can extract feature vectors that reflect the metadata characteristics present in the input image data even when no metadata for that image data is supplied at the inference stage of a specific task.

The following describes a method that additionally uses a momentum encoder to make the self-supervised pre-training of the encoder 111 more effective and stable.

FIG. 5 is a diagram for explaining how a computing device performs self-supervised pre-training of an encoder based on contrastive learning.

Referring to FIG. 5, the first stream 110 may further include a path in which the input image data are converted into second feature vectors (f₂(x)) by a momentum encoder 151 and a first momentum projector 152. Here, the parameters (weights, parameter values) of the momentum encoder 151 may be updated by moving averaging or exponential moving averaging of the parameters of the encoder 111, and the first momentum projector 152 may be a projector whose parameters are updated by moving averaging of the parameters of the first projector 112.

Specifically, the computing device may (randomly) augment the input image data, for example by using image transformations such as rotation, left-right flipping, and cropping. The computing device may then input the original image data (the original image) to the encoder 111 and the augmented image (augmented image view) to the momentum encoder 151. The encoder 111 may extract first extracted feature vectors, which are CNN feature vectors, from the original image. As described with reference to FIG. 4, the first extracted feature vectors may be converted into the first feature vectors (f₁(x)) by the first projector 112 and the first predictor 113. The momentum encoder 151 may extract second extracted feature vectors from the augmented image, and these may be input to the first momentum projector 152 and converted into the second feature vectors (f₂(x)). In this case, the computing device may pre-train the encoder 111 (and/or the first projector 112 and/or the first predictor 113) using a second loss function (L_c) defined over the first feature vectors (f₁(x)) and the second feature vectors (f₂(x)).
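The augmentations named above (rotation, left-right flipping, cropping) can be sketched on a 2-D image stored as a list of rows. The function names, the 50% flip probability, the random number of quarter turns, and the crop size are illustrative assumptions; the text does not fix an augmentation policy.

```python
import random

def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def crop(img, top, left, h, w):
    """Cut out an h x w window starting at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]

def random_augment(img, rng, crop_size):
    """Randomly flip and rotate, then take a random crop (one augmented view)."""
    if rng.random() < 0.5:
        img = hflip(img)
    for _ in range(rng.randrange(4)):
        img = rot90(img)
    top = rng.randrange(len(img) - crop_size + 1)
    left = rng.randrange(len(img[0]) - crop_size + 1)
    return crop(img, top, left, crop_size, crop_size)

original = [[r * 4 + c for c in range(4)] for r in range(4)]
augmented = random_augment(original, random.Random(1), crop_size=3)
# In the scheme above, `original` goes to encoder 111 and `augmented` to momentum encoder 151.
```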

Specifically, the second loss function (L_c) may be defined as a mean squared error (MSE) function, as in Equation 2.
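Equation 2 is not reproduced here; the text only states that L_c is a mean-squared-error function of f₁(x) and f₂(x). The sketch below assumes the plain element-wise MSE averaged over a batch. Whether the vectors are normalized first is not stated, so no normalization is applied. The function names are hypothetical.

```python
def mse(f1, f2):
    """Mean squared error between two equal-length feature vectors."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(f1, f2)) / len(f1)

def contrastive_loss(batch_f1, batch_f2):
    """Assumed form of L_c: MSE between online f1(x) and momentum f2(x), batch-averaged.

    The momentum branch serves as the target; in a real framework f2 would be
    treated as a constant, since the momentum encoder is updated by averaging.
    """
    return sum(mse(a, b) for a, b in zip(batch_f1, batch_f2)) / len(batch_f1)

print(contrastive_loss([[1.0, 2.0]], [[1.0, 4.0]]))  # (0 + 4) / 2 = 2.0
```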

[Equation 2]

Meanwhile, as described above, the parameters of the momentum encoder 151 are updated by moving averaging or exponential moving averaging of the parameters of the encoder 111 as adjusted by training, and the parameters of the first momentum projector 152 may be updated by moving averaging of the parameters of the first projector 112 as adjusted by training. For example, suppose the encoder 111 is set to a first parameter value in a first step and to a second parameter value in a second step. In the first step, the momentum encoder 151 may update its own parameter value P to a weighted sum of its current value (initial value) and the first parameter value, giving more weight to its own value (for example, a weight of 0.99 on its own value and 0.01 on the first parameter value). In the second step, it may update its parameter value again to a weighted sum of the updated P and the second parameter value.
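The two-step example above, with weights 0.99 and 0.01, can be worked through directly. The parameter values 0.0, 1.0, and 2.0 are illustrative; `ema_update` is a hypothetical helper that applies the same rule to every parameter element.

```python
def ema_update(momentum_params, online_params, m=0.99):
    """Exponential moving average: P <- m * P + (1 - m) * theta, element by element."""
    return [m * p + (1.0 - m) * t for p, t in zip(momentum_params, online_params)]

P = [0.0]            # momentum encoder parameter, initial value (illustrative)
theta_1 = [1.0]      # encoder 111 parameter after the first step (illustrative)
theta_2 = [2.0]      # encoder 111 parameter after the second step (illustrative)

P = ema_update(P, theta_1)   # 0.99 * 0.0 + 0.01 * 1.0 = 0.01
P = ema_update(P, theta_2)   # 0.99 * 0.01 + 0.01 * 2.0 = 0.0299
```

Because m is close to 1, the momentum encoder changes slowly and provides a stable target for the online encoder.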

In addition, the second stream 120 may further include a data flow in which the input metadata are converted into fourth feature vectors (f₄(x)) by a momentum metadata fusion module 161 and a second momentum projector 162. In this case, the computing device may input the metadata to both the metadata fusion module 121 and the momentum metadata fusion module 161. The metadata fusion module 121 may generate first fusion feature vectors by applying or fusing the metadata characteristics to the first extracted feature vectors, based on first extended meta vectors generated from the metadata. The first fusion feature vectors may be converted into the third feature vectors (f₃(x)) by the second projector 122 and the second predictor 123. Likewise, the momentum metadata fusion module 161 may generate second extended meta vectors from the metadata and, based on them, apply or fuse the metadata characteristics to the second extracted feature vectors to generate second fusion feature vectors. The second fusion feature vectors may be converted by the second momentum projector 162 into dimension-reduced fourth feature vectors (f₄(x)). The computing device may then train the metadata fusion module 121 (and/or the second projector 122 and the second predictor 123) based on a third loss function (L_m) defined over the third feature vectors (f₃(x)) and the fourth feature vectors (f₄(x)). For example, the third loss function (L_m) may be defined as in Equation 3 below.

[Equation 3]

Furthermore, the parameters of the momentum metadata fusion module 161 and/or the second momentum projector 162 may be updated by moving averaging or exponential moving averaging of the parameters of the metadata fusion module 121 and/or the second projector 122.

Finally, the computing device may pre-train the encoder 111 based on a fourth loss function (L_ma) given by Equation 4 below.

[Equation 4]

In this way, during training based on the first stream 110 and the second stream 120, the computing device may pre-train the encoder 111 so that the fourth loss function (L_ma) is minimized as well. The encoder 111 can thereby be pre-trained to extract feature vectors in an image feature space (the latent space of the CNN encoder) that can be mapped to a multi-dimensional metadata space.

The following describes in detail how the metadata fusion module 121 meaningfully applies metadata characteristics to the extracted feature vectors.

FIGS. 6 and 7 are diagrams for explaining how a computing device applies metadata characteristics to extracted feature vectors using a metadata fusion module.

Referring to FIG. 6, the computing device may input metadata to the metadata fusion module, which may generate or construct a meta vector 11 from the metadata. The meta vector 11 may be a vector of size k (i.e., with k elements), where k depends on the number of patient characteristics and device characteristics included in the metadata. Alternatively, the meta vector 11 may be formed as a single channel holding such a k-element vector.
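The text only says that k depends on the number of patient and device characteristics. As a hypothetical example, the sketch below encodes age, sex, tube voltage (kVp, used here as a stand-in for "X-ray strength"), and a one-hot CT vendor into one k-element vector. The field names, scaling constants, and vendor list are all assumptions.

```python
VENDORS = ["vendor_a", "vendor_b", "vendor_c"]  # hypothetical vendor vocabulary

def build_meta_vector(meta):
    """Encode patient and device characteristics as a single k-element vector."""
    vec = [
        meta["age"] / 100.0,                  # patient age, scaled to roughly [0, 1]
        1.0 if meta["sex"] == "F" else 0.0,   # patient sex as a binary flag
        meta["kvp"] / 140.0,                  # tube voltage (assumed), scaled
    ]
    vec += [1.0 if meta["vendor"] == v else 0.0 for v in VENDORS]  # one-hot vendor
    return vec

meta_vector = build_meta_vector({"age": 63, "sex": "F", "kvp": 120, "vendor": "vendor_b"})
print(len(meta_vector))  # k = 6
```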

Next, based on the meta vector 11, the metadata fusion module may generate extended meta vectors 13 for M channels, corresponding to the N channels of the extracted feature vectors produced by the encoder. Specifically, the metadata fusion module may expand the meta vector 11 into the extended meta vectors 13 for the M channels using at least one fully connected (FC) layer and a linear layer. For example, when the extracted feature vectors from the encoder have N channels, the metadata fusion module may expand the meta vector 11 into extended meta vectors 13 for M channels, where M may correspond to N or 2N.
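A minimal sketch of the expansion step, assuming a single fully connected layer that maps the k-element meta vector to M·d values, which are then reshaped into M extended meta vectors of length d. The length d and the random weights are assumptions; the module described above may stack several FC/linear layers, and its weights would be learned. The pairing of M = 2N with the first method and M = N with the second method below is an inference from how those methods consume the vectors.

```python
import random

def expand_meta_vector(meta_vec, m_channels, d, rng):
    """Map a k-element meta vector to M extended meta vectors of length d.

    A randomly initialized fully connected layer (k -> M*d) stands in for the
    trained FC/linear layers; its output is reshaped into M rows.
    """
    k = len(meta_vec)
    weights = [[rng.gauss(0.0, 0.5) for _ in range(k)] for _ in range(m_channels * d)]
    flat = [sum(w * x for w, x in zip(row, meta_vec)) for row in weights]
    return [flat[i * d:(i + 1) * d] for i in range(m_channels)]

N = 4                                   # channels of the extracted feature vectors
meta_vec = [0.63, 1.0, 0.86, 0.0, 1.0, 0.0]
ext_first = expand_meta_vector(meta_vec, 2 * N, d=8, rng=random.Random(0))  # M = 2N
ext_second = expand_meta_vector(meta_vec, N, d=8, rng=random.Random(0))     # M = N
print(len(ext_first), len(ext_second))  # 8 4
```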

Referring to FIG. 7, the metadata fusion module 121 may calculate or determine, based on the extended meta vectors, a characteristic value (Y) corresponding to, or to be applied to, each of the N channels.

Specifically, the metadata fusion module 121 may calculate the characteristic value for one channel from two or more of the extended meta vectors. For example, as the characteristic value for one channel, it may calculate the variance of the elements of one extended meta vector and the mean of the elements of another extended meta vector (first method). Using this first method, the computing device can calculate and determine the characteristic value (Y) for each of the N channels.
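The first method can be sketched as follows, assuming M = 2N extended meta vectors and an assumed pairing in which channel c uses vectors 2c and 2c+1. Population variance is used because the text does not say which variance estimator is meant.

```python
import statistics

def channel_stats_first_method(ext_vectors):
    """First method: channel c takes its scale from the variance of one extended meta
    vector and its bias from the mean of another (vectors 2c and 2c+1, an assumed pairing)."""
    if len(ext_vectors) % 2:
        raise ValueError("expected M = 2N extended meta vectors")
    stats = []
    for c in range(len(ext_vectors) // 2):
        y1, y2 = ext_vectors[2 * c], ext_vectors[2 * c + 1]
        stats.append((statistics.pvariance(y1), statistics.mean(y2)))  # (scale, bias)
    return stats

ext = [[1.0, 3.0], [10.0, 20.0], [2.0, 2.0], [0.0, 4.0]]
print(channel_stats_first_method(ext))  # [(1.0, 15.0), (0.0, 2.0)]
```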

Alternatively, the metadata fusion module 121 may calculate the characteristic value for one channel from a single extended meta vector (second method). Specifically, the metadata fusion module may calculate the mean and variance of the elements of one extended meta vector (one mean-variance set) and use that set as the characteristic value for one channel. For example, the metadata fusion module 121 may generate, from the metadata, N extended meta vectors corresponding to the extracted feature vectors of the N channels (i.e., N extracted feature vectors), and compute the mean and variance of the elements of each, yielding N mean-variance sets. Each mean-variance set can be used as the characteristic value for one channel.
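For the second method, each of the N extended meta vectors yields its own (mean, variance) set. As in the first-method sketch, population variance is an assumption.

```python
import statistics

def channel_stats_second_method(ext_vectors):
    """Second method: each of the N extended meta vectors yields one (mean, variance)
    set, which becomes the characteristic value Y of the matching channel."""
    return [(statistics.mean(v), statistics.pvariance(v)) for v in ext_vectors]

ext = [[1.0, 3.0], [2.0, 2.0, 2.0]]
print(channel_stats_second_method(ext))  # [(2.0, 1.0), (2.0, 0.0)]
```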

Next, the metadata fusion module 121 may apply the characteristic value (Y) to the extracted feature vectors (X) channel by channel, by performing adaptive instance normalization (AdaIN) on each extracted feature vector based on the characteristic value for its channel. To this end, the characteristic value (Y) for each channel may serve as the scale factor and bias for AdaIN. For example, referring to Equation 5 and/or Equation 6 below, the variance in the characteristic value (Y) of a channel is used as the AdaIN scale factor, and the mean is used as the AdaIN bias. Before applying the per-channel characteristic value (Y), the metadata fusion module 121 may normalize the extracted feature vectors (X). For example, the metadata fusion module 121 may normalize the extracted feature vectors (X) based on the mean (μ(x)) and variance (σ(x)) of each extracted feature vector (X). Hereinafter, for convenience of explanation, the normalized extracted feature vectors are called first normal feature vectors (x_norm), and the first normal feature vectors (x_norm) after per-channel AdaIN based on the characteristic value (Y) are called second normal feature vectors.

When the characteristic value (Y) for a channel has been calculated from two or more meta vectors according to the first method described above, the metadata fusion module 121 may apply AdaIN to the first normal feature vectors (x_norm) channel by channel using Equation 5 below to generate the second normal feature vectors.

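[Equation 5]

$$\mathrm{AdaIN}(x_{norm}) = \sigma(y_1)\,x_{norm} + \mu(y_2), \qquad x_{norm} = \frac{x - \mu(x)}{\sigma(x)}$$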

Here, σ(y₁) may be the variance of one meta vector, and μ(y₂) may be the mean of another meta vector.
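The per-channel AdaIN of the first method can be sketched as a scale-and-shift of each normalized channel, taking the (σ(y₁), μ(y₂)) pairs as inputs. Widely used AdaIN implementations take σ to be a standard deviation; this sketch simply uses whatever scale value it is given, so either convention can be supplied.

```python
def adain_first_method(x_norm, stats):
    """Per channel: scale the normalized activations by sigma(y1) and shift by mu(y2).

    `x_norm` holds N normalized channels; `stats` holds N (sigma(y1), mu(y2)) pairs.
    """
    return [[scale * v + bias for v in ch] for ch, (scale, bias) in zip(x_norm, stats)]

x_norm = [[-1.0, 1.0], [0.5, -0.5]]
stats = [(2.0, 3.0), (0.0, -1.0)]
print(adain_first_method(x_norm, stats))  # [[1.0, 5.0], [-1.0, -1.0]]
```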

When the characteristic value (Y) for a channel has been calculated from a single meta vector according to the second method described above, the metadata fusion module 121 may apply AdaIN to the first normal feature vectors (x_norm) channel by channel using Equation 6 below to generate the second normal feature vectors.

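[Equation 6]

$$\mathrm{AdaIN}(x_{norm}) = \sigma(y_1)\,x_{norm} + \mu(y_1), \qquad x_{norm} = \frac{x - \mu(x)}{\sigma(x)}$$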

Here, σ(y₁) and μ(y₁) may be the variance and mean of a single meta vector.

The metadata fusion module 121 may pass the second normal feature vectors through a ReLU layer (ReLU activation) to generate or output third feature vectors (Z). Here, the third feature vectors (Z) are the extracted feature vectors to which the metadata characteristics have been applied, and may correspond to f₃(x) described with reference to FIGS. 2 to 5.
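Putting the steps for the second method together, a minimal end-to-end sketch of the fusion step might normalize each channel of X, scale and shift it with the variance and mean of that channel's extended meta vector (Equation 6), and then apply ReLU to obtain Z. The function name, eps, and the use of population variance are assumptions.

```python
import math
import statistics

def fuse_metadata(feature_channels, ext_vectors, eps=1e-5):
    """Second-method fusion (Equation 6) followed by ReLU.

    feature_channels: N channels of extracted features X (flat lists).
    ext_vectors: N extended meta vectors, one per channel.
    Returns the fused features Z.
    """
    z = []
    for x, y in zip(feature_channels, ext_vectors):
        mu_x = sum(x) / len(x)
        var_x = sum((v - mu_x) ** 2 for v in x) / len(x)
        x_norm = [(v - mu_x) / math.sqrt(var_x + eps) for v in x]
        scale, bias = statistics.pvariance(y), statistics.mean(y)  # sigma(y1), mu(y1)
        z.append([max(0.0, scale * v + bias) for v in x_norm])     # AdaIN, then ReLU
    return z

Z = fuse_metadata([[1.0, 3.0], [5.0, 5.0]], [[0.0, 2.0], [-1.0, -3.0]])
```

In the second channel the shift μ(y₁) = -2 pushes every activation below zero, so ReLU zeroes that channel. This illustrates how metadata can suppress or emphasize individual feature channels.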

In this way, the metadata fusion module generates extended meta vectors by expanding the metadata-based meta vector to the same number of channels as the extracted feature vectors, and uses AdaIN to effectively reflect the characteristics of the extended meta vectors in the extracted feature vectors.

Referring to FIG. 8, as described above, the computing device may pre-train the encoder with first training data including first image data and the associated metadata, so that the encoder takes metadata characteristics into account.

Next, the computing device may train a classification model including the pre-trained encoder using second training data including second image data and the corresponding label information. Training the classification model here may be a fine-tuning process that adapts it to a downstream task or a specific task (for lung CT images: lung nodule detection, false-positive reduction, lung nodule classification, or lung nodule segmentation). The first image data and the second image data described above may include CT images of the lungs.

Referring to FIG. 9, the computing device may receive second training data including the second image data and label information related to the specific task. As noted above, the second training data may be skewed toward image data with specific metadata characteristics. The computing device may input the second image data to the pre-trained encoder 211, which extracts feature vectors from them. As described above, the extracted feature vectors are input to a projector 212 that reduces their dimension in the feature space, and the reduced feature vectors are input to a predictor 213, which outputs classification information or classification values related to the specific task. The computing device may then adjust the parameters of the classification model (e.g., those of the encoder, the projector, and/or the predictor) so that a loss function (L_ft) based on the classification information and the label information is minimized.
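The form of L_ft is not specified. For a classification task, one common choice is softmax cross-entropy between the predictor's output scores and the label. The sketch below assumes that choice, and the two-class nodule labeling in the example is hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    """Assumed form of L_ft: negative log-probability of the labeled class."""
    return -math.log(softmax(logits)[label])

def batch_loss(batch_logits, labels):
    """Average cross-entropy over a batch of (logits, label) pairs."""
    return sum(cross_entropy(l, y) for l, y in zip(batch_logits, labels)) / len(labels)

# Hypothetical two-class setup: 0 = false-positive candidate, 1 = true nodule.
loss = batch_loss([[0.0, 0.0], [2.0, 0.0]], [1, 0])
```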

The pre-trained encoder 211 was trained, with metadata characteristics taken into account, to extract feature vectors that can be mapped to the multi-dimensional metadata space. As a result, the predictor 213 can output classification information that is not affected by the metadata characteristics of the input image data. In other words, even if the training data is biased toward image data with particular metadata characteristics, the computing device can use the pre-trained encoder 211 to train the classification model 200 without the performance degradation that such bias would otherwise cause.

Once trained, the classification model can output classification information, for example on lung nodules, from image data (online inference step). Specifically, when image data is input, the computing device may use the trained classification model to obtain or output classification information for it. The second training data may have been biased toward image data with particular metadata characteristics. Even so, the classification model can output classification information that is robust to the metadata characteristics of the input image data, i.e., minimally influenced by them.

Referring to FIG. 10, the computing device may receive first training data for pre-training the encoder (S101). As described above, the first training data may include first image data and metadata corresponding to the first image data. No label information is required for the first image data, so the first training data can contain a very large amount of first image data.

The computing device may perform self-supervised pre-training of the encoder based on the first training data (S103). Specifically, the computing device may pre-train the encoder based on the first feature vectors (or extracted feature vectors) extracted by the encoder and on the third feature vectors. For example, it may adjust the encoder's parameters so that the first loss function based on Equation 1 is minimized. As shown in FIGS. 4 and 5, the first feature vectors may be obtained by applying a multi-layer perceptron (MLP) and an additional weight to the feature vectors that the encoder extracts from the first image data. Likewise, as shown in FIGS. 4 and 5, the third feature vectors may be obtained by applying a multi-layer perceptron and an additional weight to the meta feature vectors.
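The relationship between the first and third feature vectors can be illustrated with the following NumPy sketch. Following the claims, the third feature vectors are produced by adaptive instance normalization (AdaIN), whose scale and bias come from the standard deviation and mean of per-channel meta vectors. Several details are assumptions not given in this section:

- the number of meta-vector channels M equals the number of feature channels N;
- the meta vectors come from a hypothetical linear embedding `W_META` of a numeric metadata vector;
- the MLP and additional weight are omitted;
- the first loss of Equation 1 is taken to be an InfoNCE-style contrastive loss.

```python
import numpy as np


def adain(features, meta_vectors, eps=1e-5):
    """Adaptive instance normalization driven by meta-vector statistics.

    features:     (B, N, L) first feature maps, N channels with L elements each
    meta_vectors: (B, N, K) one K-dimensional meta vector per channel (assumes M == N)
    """
    mu_f = features.mean(axis=2, keepdims=True)
    sigma_f = features.std(axis=2, keepdims=True) + eps
    bias = meta_vectors.mean(axis=2, keepdims=True)   # mean of each meta vector
    scale = meta_vectors.std(axis=2, keepdims=True)   # std (square root of the variance)
    return scale * (features - mu_f) / sigma_f + bias


def info_nce(a, b, temperature=0.1):
    """Contrastive loss: row i of `a` is pulled toward row i of `b` and away from other rows."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature                       # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
B, N, L, K, D_META = 4, 16, 49, 8, 5              # hypothetical sizes
W_META = rng.normal(size=(D_META, N * K))          # hypothetical metadata embedding

features = rng.normal(size=(B, N, L))              # encoder output, e.g. a 7x7 map per channel
metadata = rng.normal(size=(B, D_META))            # stand-in for encoded metadata (fields hypothetical)
meta_vectors = (metadata @ W_META).reshape(B, N, K)

first = features.reshape(B, -1)                         # first feature vectors
third = adain(features, meta_vectors).reshape(B, -1)    # third feature vectors
loss_1 = info_nce(first, third)
print(f"first loss (assumed InfoNCE): {loss_1:.3f}")
```

After AdaIN, each channel of the output carries the mean and standard deviation of its meta vector. The contrastive loss then asks the encoder to keep each image's features identifiable after this metadata-driven restyling.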

Alternatively, the computing device may also take into account second feature vectors extracted by a momentum encoder when pre-training the encoder. The momentum encoder's parameters are a moving average of the encoder's parameters. In this case, the computing device may pre-train the encoder based on two losses: a second loss function over the first and second feature vectors, and the first loss function over the first and third feature vectors.
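A momentum encoder of this kind is commonly maintained as an exponential moving average (EMA) of the online encoder's parameters. The sketch below assumes that form, with a hypothetical momentum coefficient m = 0.99. It also assumes a hypothetical noise augmentation (claim 4 states only that the momentum encoder sees augmented first image data). The second loss is assumed to be a cosine-based loss. None of these specifics is given in this section, and both encoders are reduced to single linear maps.

```python
import numpy as np


def ema_update(momentum_params, online_params, m=0.99):
    """theta_momentum <- m * theta_momentum + (1 - m) * theta_online, per parameter array.

    The momentum encoder receives no gradients; it only tracks the online encoder.
    """
    return {k: m * momentum_params[k] + (1.0 - m) * online_params[k] for k in momentum_params}


def augment(x, rng, noise_std=0.05):
    """Hypothetical augmentation: additive Gaussian noise on the image data."""
    return x + rng.normal(scale=noise_std, size=x.shape)


def cosine_loss(p, z):
    """Assumed second loss: mean of 2 - 2 * cos(p_i, z_i); zero when the vectors align."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))


rng = np.random.default_rng(0)
online = {"W": rng.normal(scale=0.1, size=(64, 32))}    # online encoder weights (linear stand-in)
momentum = {k: v.copy() for k, v in online.items()}      # momentum encoder starts as a copy

x = rng.normal(size=(8, 64))                             # flattened first image data (synthetic)
first = x @ online["W"]                                  # first feature vectors
second = augment(x, rng) @ momentum["W"]                 # second feature vectors
loss_2 = cosine_loss(first, second)

online["W"] = online["W"] - 0.01 * rng.normal(size=(64, 32))  # stand-in for a gradient step
momentum = ema_update(momentum, online)                        # momentum encoder lags behind
print(f"second loss (assumed cosine form): {loss_2:.4f}")
```

Because m is close to 1, the momentum encoder changes slowly. This gives the online encoder a stable target.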

Alternatively, the computing device may also take into account fourth feature vectors output by a momentum metadata fusion module when pre-training the encoder. The momentum module's parameters are a moving average of the metadata fusion module's parameters. In this case, the computing device may pre-train the encoder based on two losses: the first loss function over the first and third feature vectors, and a third loss function over the third and fourth feature vectors.
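The momentum metadata fusion module can be sketched the same way. The fusion module here is a hypothetical one: a learned metadata embedding `W_meta` followed by the AdaIN step recited in the claims. Its momentum copy is assumed to be an EMA of `W_meta`. Both modules are assumed to receive the same first feature maps, and the third loss is assumed to be a cosine-based loss. None of these details is specified in this section.

```python
import numpy as np


def adain(features, meta_vectors, eps=1e-5):
    """AdaIN with per-channel scale/bias taken from meta-vector std/mean (as in the claims)."""
    mu_f = features.mean(axis=2, keepdims=True)
    sigma_f = features.std(axis=2, keepdims=True) + eps
    return (meta_vectors.std(axis=2, keepdims=True) * (features - mu_f) / sigma_f
            + meta_vectors.mean(axis=2, keepdims=True))


def fuse(params, features, metadata):
    """Hypothetical fusion module: embed metadata into per-channel meta vectors, then apply AdaIN."""
    b, n, _ = features.shape
    meta_vectors = (metadata @ params["W_meta"]).reshape(b, n, -1)
    return adain(features, meta_vectors).reshape(b, -1)


def ema_update(momentum_params, online_params, m=0.99):
    """Momentum fusion module parameters track the online module as a moving average."""
    return {k: m * momentum_params[k] + (1.0 - m) * online_params[k] for k in momentum_params}


def cosine_loss(p, z):
    """Assumed third loss: mean of 2 - 2 * cos(p_i, z_i)."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))


rng = np.random.default_rng(0)
B, N, L, K, D_META = 4, 16, 49, 8, 5                         # hypothetical sizes
fusion = {"W_meta": rng.normal(size=(D_META, N * K))}
momentum_fusion = {k: v.copy() for k, v in fusion.items()}    # starts identical

features = rng.normal(size=(B, N, L))
metadata = rng.normal(size=(B, D_META))

loss_3_before = cosine_loss(fuse(fusion, features, metadata),
                            fuse(momentum_fusion, features, metadata))

fusion["W_meta"] = fusion["W_meta"] + 0.5 * rng.normal(size=(D_META, N * K))  # stand-in update
momentum_fusion = ema_update(momentum_fusion, fusion)

third = fuse(fusion, features, metadata)             # third feature vectors
fourth = fuse(momentum_fusion, features, metadata)   # fourth feature vectors
loss_3_after = cosine_loss(third, fourth)
print(f"third loss before/after update: {loss_3_before:.4f} / {loss_3_after:.4f}")
```

The third loss is zero while the two modules coincide. It becomes positive once the online fusion module moves ahead of its slowly updated momentum copy.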

Alternatively, the computing device may pre-train the encoder by taking all of the first, second, third, and fourth feature vectors into account. In this case, the computing device may pre-train the encoder based on a fourth loss function defined by Equation 4.
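Equation 4 itself is not reproduced in this section. One plausible reading is that the fourth loss function is a weighted sum of the three losses described above, which would take the form below. Here z_1 through z_4 denote the first through fourth feature vectors, and the weights λ_1, λ_2, λ_3 are hypothetical hyperparameters, not values from this specification.

```latex
\mathcal{L}_4 = \lambda_1 \, \mathcal{L}_1(z_1, z_3) + \lambda_2 \, \mathcal{L}_2(z_1, z_2) + \lambda_3 \, \mathcal{L}_3(z_3, z_4)
```

Each term pairs the same feature vectors as in the text: first with third, first with second, and third with fourth.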

Once the encoder has been pre-trained with at least one of the loss functions described above, the computing device may receive second training data (S105). This data includes second image data and label information for the second image data. As described above, the label information may include a label value for the location of a lung nodule, the type of the lung nodule, or the segmentation of the lung nodule in the second image data. The computing device may then input the second image data into a classification model that includes the pre-trained encoder.

Next, the computing device may fine-tune the classification model so that it outputs classification information for a specific task (S107). Specifically, the classification model may be trained to minimize a loss function computed from two inputs: the classification information (or classification value) it outputs, and the label information (or label value).

When image data (including, for example, a query CT) is input, the trained classification model can output classification information based on it (online inference step). This information covers the location of a lung nodule, the type of the lung nodule, or the segmentation of the lung nodule. Because the classification model includes the pre-trained encoder, the computing device can output classification information for the input image data that is not affected by the characteristics of its metadata.
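The online inference step can be sketched as below, again as a NumPy illustration rather than the applicant's code. The trained components are replaced by hypothetical random linear maps. The three nodule-type labels in `NODULE_TYPES` are a hypothetical label set, since this section does not enumerate one. Only image data is passed to the model: as in the inference claim, metadata is used during pre-training, not at inference.

```python
import numpy as np

# Hypothetical label set for the nodule-type task; the specification does not list one.
NODULE_TYPES = ("solid", "part-solid", "ground-glass")


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def infer(patches, encoder, projector, predictor):
    """Online inference: image data -> encoder -> projector -> predictor -> nodule type.

    Only image data is passed in; no metadata is required at this stage.
    """
    probs = softmax(predictor(projector(encoder(patches))))
    best = probs.argmax(axis=-1)
    return [(NODULE_TYPES[i], float(probs[row, i])) for row, i in enumerate(best)]


# Usage with random stand-ins for the trained components (hypothetical weights).
rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.1, size=(64, 32))
W_proj = rng.normal(scale=0.1, size=(32, 8))
W_pred = rng.normal(size=(8, len(NODULE_TYPES)))

results = infer(
    rng.normal(size=(5, 64)),  # five flattened query-CT patches (synthetic)
    encoder=lambda x: np.maximum(x @ W_enc, 0.0),
    projector=lambda h: h @ W_proj,
    predictor=lambda z: z @ W_pred,
)
for label, prob in results:
    print(label, round(prob, 3))
```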

Based on the description of the above embodiments, a person of ordinary skill in the art will clearly understand how the methods and/or processes of the present invention, and their steps, can be realized. They can be implemented in hardware, software, or any combination of hardware and software suited to a particular application. The hardware may include a general-purpose computer and/or a dedicated computing device, a specific computing device, or particular aspects or components of a specific computing device. The processes may be realized by one or more microprocessors, microcontrollers, embedded microcontrollers, programmable digital signal processors, or other programmable devices with internal and/or external memory. In addition, or alternatively, the processes may be implemented with an application-specific integrated circuit (ASIC), a programmable gate array, programmable array logic (PAL), or any other device or combination of devices that can be configured to process electronic signals.

Furthermore, the subject matter of the technical solution of the present invention, or the parts that contribute over the prior art, may be implemented as program instructions executable by various computer components and recorded on a machine-readable recording medium. The medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on it may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of machine-readable recording media include:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical recording media such as CD-ROMs, DVDs, and Blu-ray discs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Program instructions may be written in a structured programming language such as C, an object-oriented programming language such as C++, or another high- or low-level language, including assembly languages, hardware description languages, and database programming languages and techniques. They may be stored and compiled or interpreted for execution on any of the devices described above, on a heterogeneous combination of processors, processor architectures, or different hardware and software, or on any other machine capable of executing program instructions. They include machine code and bytecode, as well as high-level language code that a computer can execute using an interpreter or the like.

Accordingly, in one aspect of this specification, the methods described above and their combinations may be embodied as executable code that performs each step when run by one or more computing devices. In another aspect, the method may be embodied as systems that perform these steps. The method may be distributed across devices in various ways, or all of its functions may be integrated into a single dedicated, standalone device or other hardware. In yet another aspect, the means for performing the steps of the processes described above may include any of the hardware and/or software described above. All such permutations and combinations are intended to fall within the scope of this specification.

For example, the hardware device may be configured to operate as one or more software modules to perform the processing according to this specification, and vice versa. The hardware device may include a processor, such as an MPU, CPU, GPU, or TPU, that is coupled to a memory such as ROM/RAM holding program instructions and that executes the instructions stored there. It may also include a communication unit capable of exchanging signals with external devices. In addition, the hardware device may include a keyboard, a mouse, or other external input devices for receiving instructions written by developers.

The present invention has been described above with reference to specific details such as particular components, limited embodiments, and drawings. These are provided only to aid a general understanding of the invention, which is not limited to these embodiments. A person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be limited to the embodiments described above. Both the claims set forth below and everything modified equally or equivalently to those claims fall within the scope of the spirit of the present invention.

Such equal or equivalent modifications include, for example, logically equivalent methods that produce the same results as the method according to this specification. The spirit and scope of the present invention should not be limited by the foregoing examples; they should be understood in the broadest sense permitted by law.

The embodiments of the present invention described above can be applied to various medical devices.

Claims

A method, performed by a computing device, of training a classification model that outputs classification information based on image data, the method comprising:

pre-training an encoder based on first training data including first image data and metadata; and

training a classification model including the pre-trained encoder based on second training data including label information,

wherein the encoder is pre-trained based on first feature vectors extracted from the first image data and third feature vectors obtained by applying characteristics of meta vectors related to the metadata to the first feature vectors.

The method of claim 1,

wherein the encoder is pre-trained based on a first loss function for the first feature vectors and the third feature vectors.

The method of claim 1,

wherein the encoder is pre-trained by further considering second feature vectors extracted by a momentum encoder whose parameters are a moving average of the parameters of the encoder.

The method of claim 3,

wherein the momentum encoder extracts the second feature vectors from augmented image data obtained by augmenting the first image data.

The method of claim 3,

wherein the encoder is pre-trained based on a second loss function for the first feature vectors and the second feature vectors and a first loss function for the first feature vectors and the third feature vectors.

The method of claim 1,

wherein the computing device inputs the metadata and the first feature vectors to a metadata fusion module to obtain the third feature vectors, and

wherein the encoder is pre-trained by further considering fourth feature vectors output by a momentum metadata fusion module whose parameters are a moving average of the parameters of the metadata fusion module.

The method of claim 6,

wherein the encoder is pre-trained based on a first loss function for the first feature vectors and the third feature vectors and a third loss function for the third feature vectors and the fourth feature vectors.

The method of claim 1,

wherein the meta vectors are generated for M channels based on the metadata so as to correspond to N channels associated with the first feature vectors, and

wherein the characteristics of the meta vectors are determined based on at least one of the mean and the variance of the elements of each meta vector.

The method of claim 1,

wherein the third feature vectors are generated through adaptive instance normalization (AdaIN) of the first feature vectors, and

wherein the scale factor and the bias of the adaptive instance normalization are determined based on the characteristics of the meta vectors.

The method of claim 1,

wherein the label information includes a label value for the location of a lung nodule, the type of the lung nodule, or the segmentation of the lung nodule in the second image data included in the second training data.

The method of claim 1,

wherein the first feature vectors are obtained by applying a multi-layer perceptron (MLP) and an additional weight to extracted feature vectors that the encoder extracts from the first image data.

A method in which a computing device outputs classification information related to lung nodules based on image data using a classification model, the method comprising:

receiving image data; and

inputting the image data into the classification model and outputting classification information,

wherein the classification model includes an encoder pre-trained based on first training data including first image data and metadata, and

wherein the encoder is pre-trained based on first feature vectors extracted from the first image data and third feature vectors obtained by applying characteristics of meta vectors generated from the metadata to the first feature vectors.

A computing device for training a classification model that outputs classification information based on image data, the computing device comprising:

a communication unit connected to external devices; and

a processor connected to the communication unit,

wherein the processor pre-trains an encoder based on first training data including first image data and metadata acquired through the communication unit, and trains a classification model including the pre-trained encoder based on second training data including label information acquired through the communication unit, and

wherein the processor pre-trains the encoder based on first feature vectors extracted from the first image data and third feature vectors obtained by applying characteristics of meta vectors related to the metadata to the first feature vectors.