WO2024058585A1 - Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information - Google Patents
- Publication number: WO2024058585A1 (PCT/KR2023/013863)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- exercise
- voice data
- subject
- clinical information
- severity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/08—Measuring devices for evaluating the respiratory organs
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
- A61B5/4842—Monitoring progression or stage of a disease
- A61B5/7275—Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
- G—PHYSICS
- G06N20/00—Machine learning
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G10L25/66—Speech or voice analysis techniques specially adapted for extracting parameters related to health condition
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
- G16H50/20—ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
- G16H50/30—ICT specially adapted for calculating health indices; for individual health risk assessment
- G16H50/50—ICT specially adapted for simulation or modelling of medical disorders
- G16H50/70—ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
Definitions
- the technology described below relates to a technique for predicting the degree of lung disease using the subject's voice.
- COPD (chronic obstructive pulmonary disease)
- the technology described below seeks to provide a technique for predicting the degree of lung disease such as COPD based on the subject's voice and clinical information.
- a method of classifying the severity of a subject's lung disease using voice data and clinical information includes the steps of: an analysis device receiving the subject's voice data and clinical information; the analysis device preprocessing the voice data and the clinical information; the analysis device inputting the preprocessed voice data and clinical information into a pre-trained learning model; and the analysis device classifying the severity of the subject's lung disease based on the output value of the learning model.
- an analysis device that classifies the severity of a subject's lung disease includes: an interface device that receives the subject's voice data and clinical information; a storage device that stores a learning model that receives voice data and clinical information and classifies lung disease severity; and a computing device that preprocesses the input voice data and clinical information, inputs the preprocessed voice data and clinical information into the learning model, and classifies the severity of the subject's lung disease based on the output value of the learning model.
- the technology described below can predict the degree of lung disease by analyzing the user's voice and clinical information that can be obtained relatively easily.
- the technology described below can diagnose the severity of lung disease through voice recording and self-diagnosis without the patient having to visit a medical institution.
- Figure 1 is an example of a lung disease severity classification system using voice and clinical information.
- Figure 2 is an example of the learning process of a learning model for lung disease severity classification.
- Figure 3 shows the results of verifying the performance of a learning model that classifies lung disease severity.
- Figure 4 is an example of an analysis device that classifies lung disease severity.
- terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of the technology described below, a first component may be named a second component, and similarly, a second component may be named a first component.
- the term "and/or" includes any one of, or any combination of, a plurality of related listed items.
- the division of components in this description is based only on the main function each component is responsible for. That is, two or more of the components described below may be combined into one component, or one component may be divided into two or more components with more detailed functions.
- each of the components described below may additionally perform, in addition to its own main function, some or all of the functions handled by other components, and some of the main functions handled by a component may instead be performed exclusively by another component.
- each process forming the method may occur in an order different from the specified order unless a specific order is clearly stated in the context. That is, the processes may be performed in the specified order, substantially simultaneously, or in the reverse order.
- the technology described below is a technique for predicting or classifying the severity of lung diseases such as COPD based on the subject's voice and clinical information. For convenience, the following description focuses on COPD. However, the technology described below can be used to predict or classify the severity of various lung diseases other than COPD.
- User data used for analysis includes the user's voice and clinical information.
- user data is collected from a specific subject and can be collected before and after the subject performs a certain exercise.
- the user's voice is collected before and after exercise, and input variables include features extracted from the voice.
- Some of the clinical information may be collected separately before and after exercise.
- clinical information may include questionnaire information collected from the subject. A detailed description of user data will be provided later.
- the analysis device classifies or predicts the degree of lung disease based on the user's voice and clinical information.
- the analysis device can be implemented as a variety of devices capable of processing data.
- an analysis device can be implemented as a PC, a server on a network, a smart device, a wearable device, or a chipset with a dedicated program embedded therein.
- analysis devices may be built into various devices such as exercise equipment, vehicles, smart speakers, etc.
- the analysis device can classify lung disease using a machine learning model.
- Machine learning models include decision trees, random forests, K-nearest neighbors (KNN), naive Bayes, support vector machines (SVM), and artificial neural networks (ANN). The learning model below is described with a focus on a deep neural network (DNN); however, the learning model for lung disease classification can be implemented as various types of models.
- Figure 1 is an example of a lung disease severity classification system 100 using voice and clinical information.
- the analysis device may be a user terminal 130, a computer terminal 140, or a server 150.
- Subject A performs a certain exercise for a certain amount of time.
- Patients with lung disease may have different vocal characteristics before and after exercise. Accordingly, user data can be collected from subject A before and after exercise, respectively.
- the user data may include the subject's voice data and clinical information.
- Voice data consists of voice data before exercise and voice data after exercise.
- the voice data before exercise and the voice data after exercise are composed of data in which the same subject A uttered the same words or sentences (text) before and after exercise, respectively.
- Clinical information may consist of various items. Some of the items included in clinical information correspond to data collected before and after exercise.
- the database may store the subject's voice data and clinical information.
- the database 110 may be a system such as an Electronic Medical Record (EMR).
- the user terminal 120 may receive user data from subject A.
- the user terminal 120 illustrates a device such as a smart device.
- the user terminal 120 corresponds to a device that can collect user voice through a microphone and receive clinical information through a certain interface device.
- the user terminal 120 may be any one of various types of devices, such as a smart device, PC, wearable device, smart speaker, etc.
- the user terminal 130 may receive user data from the database 110. Furthermore, the user terminal 120 and the user terminal 130 may be the same device. In this case, the user terminal 130 may be a device that collects and analyzes user data at the same time.
- the user terminal 130 may consistently preprocess the subject's user data. For example, the user terminal 130 may remove noise from the subject's voice data. Additionally, the user terminal 130 may convert the voice data into one of the following representations: a chromagram, Mel-frequency cepstral coefficients (MFCC), or a Mel spectrogram. Additionally, the user terminal 130 may perform preprocessing to normalize clinical information items of different categories to a certain range.
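The representation conversion mentioned above can be illustrated with a minimal log-spectrogram computation in NumPy. This is a sketch, not the patent's implementation: the frame size and hop length are illustrative, the waveform is assumed to be already denoised and scaled to [-1, 1], and in practice a library such as librosa would typically be used to compute MFCCs or Mel spectrograms.

```python
import numpy as np

def log_spectrogram(signal, n_fft=512, hop=256):
    """Frame the waveform, apply a Hann window, and take the log FFT magnitude."""
    window = np.hanning(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = signal[start:start + n_fft] * window
        magnitude = np.abs(np.fft.rfft(frame))
        frames.append(np.log(magnitude + 1e-10))  # log compression, avoid log(0)
    return np.stack(frames, axis=1)  # shape: (n_fft // 2 + 1, n_frames)
```

A Mel spectrogram would additionally map the frequency bins onto the Mel scale, and MFCCs would apply a discrete cosine transform on top of that.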
- the user terminal 130 may classify the severity of the subject's lung disease by inputting user data into a pre-built learning model. User A can check the degree of the subject's lung disease through the user terminal 130.
- the computer terminal 140 receives user data from the database 110 or the user terminal 120.
- the computer terminal 140 may consistently preprocess user data.
- the computer terminal 140 may classify the severity of the subject's lung disease by inputting user data into a pre-built learning model.
- User B can check the degree of the subject's lung disease through the computer terminal 140.
- the server 150 receives user data from the database 110 or the user terminal 120.
- the server 150 may consistently preprocess the subject's user data.
- the server 150 may classify the severity of the subject's lung disease by inputting user data into a pre-built learning model.
- User A can access the server 150 through the user terminal to check the degree of the subject's lung disease.
- Figure 2 is an example of a learning process 200 of a learning model for lung disease severity classification.
- a learning model may be one of various types.
- Figure 2 illustrates a deep learning model as an example.
- a learning model that classifies lung disease severity can be named a classification model.
- Classification models are built using training data.
- the learning process of the classification model can be performed by a learning device.
- a learning device refers to a computing device that controls digital data processing and the learning process of deep learning models.
- the learning device constructs learning data (210).
- Training data can be collected from various groups depending on the severity of lung disease. For example, learning data may be collected from the normal group, severity 1 group, ..., and severity n group, respectively.
- Lung disease severity can be determined based on FEV1 (forced expiratory volume in one second).
- FEV1 refers to the amount of air expelled from the lungs during the first second of exhalation. If a patient's FEV1 is lower than a threshold (e.g., the average of the entire population), the patient can be classified as a COPD patient. If a patient's FEV1 is above the threshold, the patient can be classified as a patient with low severity.
- subjects can be classified into normal, low-severity lung disease patients, and high-severity lung disease patients.
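The three-way grouping described above can be sketched as a simple threshold rule on FEV1 expressed as a percent of the predicted value. The cutoff values here are illustrative assumptions; the patent only specifies that a threshold such as the population average is used.

```python
def classify_severity(fev1_percent, normal_cut=80.0, severe_cut=50.0):
    """Three-way grouping on FEV1 (% of predicted); cutoffs are illustrative."""
    if fev1_percent >= normal_cut:
        return "normal"
    if fev1_percent >= severe_cut:
        return "low severity"
    return "high severity"
```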
- Learning data includes clinical information and voice data for each group.
- the training data also includes the label value of each training data.
- Voice data is collected separately before and after performing certain exercises. Voice data can be collected as subjects utter the same sentence.
- Voice data may consist of items as shown in Table 1 below.
- the learning device can extract 32 features as shown in Table 1 below from voice signals.
- voice data may consist of any number of items among the items in Table 1 below.
- the learning device can extract silence sections and conversation sections from the entire file using a voice recognition tool.
- the silent section is defined as a section in which a signal with an amplitude level of -36 dBFS (decibels relative to full scale) or less lasts for more than 200 ms.
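The silence rule above (amplitude at or below -36 dBFS sustained for at least 200 ms) can be sketched directly; the sample-wise amplitude check is a simplification of what a speech-recognition tool would do.

```python
import numpy as np

def silence_sections(signal, sample_rate, threshold_dbfs=-36.0, min_ms=200):
    """Return (start, end) sample indices of sections quieter than the dBFS
    threshold for at least min_ms; signal is assumed scaled to [-1, 1]."""
    amp_threshold = 10.0 ** (threshold_dbfs / 20.0)  # -36 dBFS is about 0.0158
    min_samples = int(sample_rate * min_ms / 1000)
    quiet = np.abs(signal) <= amp_threshold
    sections, start = [], None
    for i, is_quiet in enumerate(quiet):
        if is_quiet and start is None:
            start = i                                 # silence begins
        elif not is_quiet and start is not None:
            if i - start >= min_samples:              # long enough to count
                sections.append((start, i))
            start = None
    if start is not None and len(signal) - start >= min_samples:
        sections.append((start, len(signal)))         # trailing silence
    return sections
```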
- Jitter is a value that indicates how constant the period of vocal-fold vibration is; the more irregular the period, the larger the value.
- Shimmer is a value that indicates how constant the amplitude of vibration is; the more irregular the amplitude, the larger the value.
- Formant is a resonance that occurs in the vocal tract (the space that extends from the pharynx and oral cavity to the nasal cavity and lips).
- HNR Harmonic to noise ratio
- Speech rate refers to the number of words per minute in speech.
- f0 (fundamental frequency) is the frequency of vocal cord vibration and perceptually corresponds to pitch.
- Articulation rate is the number of syllables per second in speech.
- Syllable duration refers to the duration of a syllable.
- the learning device can extract jitter, shimmer, formants, HNR, speech rate, f0, articulation rate, and syllable length using publicly available software for speech analysis.
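The patent extracts these measures with publicly available speech-analysis software (e.g., Praat). As a sketch of the underlying definitions only, local jitter and shimmer can be computed from already-extracted period and peak-amplitude sequences; the extraction of those sequences from the waveform is assumed here.

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference of consecutive glottal periods,
    relative to the mean period (larger means more irregular)."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))

def local_shimmer(amplitudes):
    """Mean absolute difference of consecutive peak amplitudes,
    relative to the mean amplitude (larger means more irregular)."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))
```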
- Clinical information can consist of 31 items as shown in Table 2 below.
- the clinical information below includes self-reported questionnaire variables. Some of the clinical information may be collected through wearable devices, sensor devices, etc. Furthermore, clinical information may consist of any number of items among the items in Table 2 below.
- the clinical items include, for example, BMI (body mass index), resting SpO2 (blood oxygen saturation), SpO2 after exercise, resting heart rate, and heart rate after exercise.
- the learning device can consistently preprocess the initial learning data.
- Preprocessing for voice data may include noise removal, data type conversion, etc.
- Preprocessing of clinical information may include the process of adjusting values into certain categories.
- the learning device can normalize clinical information using preprocessing techniques such as Min-Max Normalization and z-score normalization.
- the learning device can encode categorical clinical information as one-hot vectors.
- the learning device can input encoded clinical information into a learning model.
- the learning device treats 32 voice variables and 31 types of clinical information as individual input variables and can construct a total of 63 input variables as learning data.
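A sketch of assembling a single input vector from the 32 acoustic features and the clinical items, combining Min-Max normalization for numeric items with one-hot encoding for categorical items. The item names, bounds, and categories below are illustrative and are not taken from the patent's tables.

```python
import numpy as np

def minmax(value, lo, hi):
    """Min-Max normalization to [0, 1]."""
    return (value - lo) / (hi - lo)

def build_input(voice_feats, numeric, bounds, categorical, categories):
    """Concatenate acoustic features, normalized numeric clinical items,
    and one-hot encoded categorical clinical items into one input vector."""
    parts = list(np.asarray(voice_feats, dtype=float))
    for name, value in numeric.items():
        lo, hi = bounds[name]
        parts.append(minmax(value, lo, hi))
    for name, value in categorical.items():
        onehot = [0.0] * len(categories[name])
        onehot[categories[name].index(value)] = 1.0
        parts.extend(onehot)
    return np.array(parts)
```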
- the learning device builds a classification model using the learning data (220).
- the learning device extracts one input data from the collected learning data and inputs it into the classification model.
- the classification model outputs a probability value for lung disease severity for the corresponding input data.
- the learning device compares the value output by the classification model with the known correct answer (label value) and updates the weight of the classification model so that the classification model outputs a label corresponding to the correct answer.
- the learning device repeats the learning process using multiple learning data.
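The compare-and-update loop above can be sketched with a plain softmax classifier trained by gradient descent on synthetic stand-in data. The patent's model is a deep neural network whose architecture is not detailed here; this only illustrates the supervised loop of outputting class probabilities, comparing them with the correct labels, and updating the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Synthetic stand-in for 63-variable training data with 3 severity classes.
X = rng.normal(size=(300, 63))
y = rng.integers(0, 3, size=300)
X[:, 0] += 3.0 * y  # make the classes separable on one feature

W = np.zeros((63, 3))
b = np.zeros(3)
for _ in range(200):                       # repeat over the training data
    probs = softmax(X @ W + b)             # model outputs severity probabilities
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1.0      # compare with the correct labels
    W -= 0.1 * (X.T @ grad) / len(y)       # update weights toward the labels
    b -= 0.1 * grad.mean(axis=0)

train_accuracy = float((softmax(X @ W + b).argmax(axis=1) == y).mean())
```

A real evaluation would measure performance such as AUROC on held-out data, as in Figure 3, rather than training accuracy.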
- Figure 3 shows the results of verifying the performance of the learning model that classifies lung disease severity. As shown in Figure 3, the built model achieved an average micro AUROC (area under the ROC curve) of 0.87 and an average macro AUROC of 0.87. The classification model therefore showed high performance in classifying lung disease severity.
- FIG. 4 is an example of an analysis device 300 that classifies the severity of lung disease.
- the analysis device 300 corresponds to the above-described analysis device (130, 140, or 150 in FIG. 1).
- the analysis device 300 may be physically implemented in various forms.
- the analysis device 300 may take the form of a smart device, a computer device such as a PC, a network server, a wearable device, an exercise device, or a chipset dedicated to data processing.
- the analysis device 300 may include a storage device 310, a memory 320, an arithmetic device 330, an interface device 340, a communication device 350, and an output device 360.
- the storage device 310 may store the above-described classification model.
- the classification model is a pre-trained model.
- the classification model is a model that outputs lung disease severity based on input user data (voice data and clinical information).
- the storage device 310 can store user data.
- User data is the user's voice data and clinical information that are subject to analysis.
- Voice data consists of data collected before exercise and data collected after exercise.
- Voice data may consist of the items in Table 1.
- Clinical information may consist of the items in Table 2.
- the memory 320 may store data and information generated when the analysis device classifies the severity of lung disease using the subject's user data.
- the interface device 340 is a device that receives certain commands and data from the outside.
- the interface device 340 may receive the subject's voice data from a physically connected input device or an external storage device.
- the input device may include a device such as a microphone.
- Voice data consists of data measured before and after exercise.
- the interface device 340 may receive the subject's clinical information from a physically connected input device or an external storage device.
- the interface device 340 may analyze the subject's user data and transmit the results of classifying the severity of lung disease to an external object.
- the interface device 340 may receive data or information transmitted through the communication device 350 below.
- the communication device 350 refers to a configuration that receives and transmits certain information through a wired or wireless network.
- the communication device 350 may receive the subject's voice data from an external object (database, user terminal, microphone, etc.).
- the communication device 350 may receive clinical information about a subject from an external object.
- the communication device 350 may analyze the subject's user data and transmit the results of classifying the severity of lung disease to an external object such as a user terminal.
- the output device 360 is a device that outputs certain information.
- the output device 360 can output interfaces, classification results, etc. required for the data processing process.
- the computing device 330 may preprocess user data consistently. For example, the computing device 330 may convert voice data into a certain type of data. Additionally, the computing device 330 may normalize each value of clinical information into a certain category.
- the computing device 330 inputs the preprocessed user data into a pre-trained learning model.
- the computing device 330 may classify the severity of the subject's lung disease based on the probability value output by the learning model.
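The final step, mapping the learning model's probability output to a severity class, can be sketched as follows; the class names are illustrative.

```python
import numpy as np

def predict_severity(class_probs, class_names):
    """Pick the severity class with the highest model probability."""
    probs = np.asarray(class_probs, dtype=float)
    best = int(probs.argmax())
    return class_names[best], float(probs[best])
```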
- the computing device 330 may be a device such as a processor that processes data and performs certain operations, an application processor (AP), or a chip with an embedded program.
- the method for classifying the severity of a subject's lung disease as described above may be implemented as a program (or application) including an executable algorithm that can be executed on a computer.
- the program may be stored and provided in a transitory or non-transitory computer-readable medium.
- a non-transitory readable medium refers to a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short period of time, such as registers, caches, and memories.
- the various applications or programs described above may be stored and provided in a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disk, USB memory, memory card, ROM (read-only memory), PROM (programmable ROM), EPROM (erasable PROM), or EEPROM (electrically erasable PROM).
- Transitory readable media refer to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
Description
Early diagnosis of lung diseases such as COPD (chronic obstructive pulmonary disease) is important to prevent worsening. COPD can be clinically diagnosed by performing pulmonary function tests on patients with cough, sputum production, and shortness of breath. However, because the early symptoms of COPD are difficult to identify, early detection is difficult with basic diagnosis alone.
Recently, research has been conducted on diagnosing COPD using a deep learning model that analyzes chest CT (computed tomography) images. However, this diagnostic technique also requires chest imaging of the patient, so it is difficult for it to contribute to the early detection of lung disease.
The technology described below can be modified in various ways and can have various embodiments; specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technology described below to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the technology described below.
제1, 제2, A, B 등의 용어는 다양한 구성요소들을 설명하는데 사용될 수 있지만, 해당 구성요소들은 상기 용어들에 의해 한정되지는 않으며, 단지 하나의 구성요소를 다른 구성요소로부터 구별하는 목적으로만 사용된다. 예를 들어, 이하 설명하는 기술의 권리 범위를 벗어나지 않으면서 제1 구성요소는 제2 구성요소로 명명될 수 있고, 유사하게 제2 구성요소도 제1 구성요소로 명명될 수 있다. 및/또는 이라는 용어는 복수의 관련된 기재된 항목들의 조합 또는 복수의 관련된 기재된 항목들 중의 어느 항목을 포함한다.Terms such as first, second, A, B, etc. may be used to describe various components, but the components are not limited by the terms, and are only used for the purpose of distinguishing one component from other components. It is used only as For example, a first component may be named a second component without departing from the scope of the technology described below, and similarly, the second component may also be named a first component. The term and/or includes any of a plurality of related stated items or a combination of a plurality of related stated items.
본 명세서에서 사용되는 용어에서 단수의 표현은 문맥상 명백하게 다르게 해석되지 않는 한 복수의 표현을 포함하는 것으로 이해되어야 하고, "포함한다" 등의 용어는 설명된 특징, 개수, 단계, 동작, 구성요소, 부분품 또는 이들을 조합한 것이 존재함을 의미하는 것이지, 하나 또는 그 이상의 다른 특징들이나 개수, 단계 동작 구성요소, 부분품 또는 이들을 조합한 것들의 존재 또는 부가 가능성을 배제하지 않는 것으로 이해되어야 한다.In terms used in this specification, singular expressions should be understood to include plural expressions, unless clearly interpreted differently from the context, and terms such as “including” refer to the described features, numbers, steps, operations, and components. , it means the existence of parts or a combination thereof, but should be understood as not excluding the possibility of the presence or addition of one or more other features, numbers, step operation components, parts, or combinations thereof.
도면에 대한 상세한 설명을 하기에 앞서, 본 명세서에서의 구성부들에 대한 구분은 각 구성부가 담당하는 주기능 별로 구분한 것에 불과함을 명확히 하고자 한다. 즉, 이하에서 설명할 2개 이상의 구성부가 하나의 구성부로 합쳐지거나 또는 하나의 구성부가 보다 세분화된 기능별로 2개 이상으로 분화되어 구비될 수도 있다. 그리고 이하에서 설명할 구성부 각각은 자신이 담당하는 주기능 이외에도 다른 구성부가 담당하는 기능 중 일부 또는 전부의 기능을 추가적으로 수행할 수도 있으며, 구성부 각각이 담당하는 주기능 중 일부 기능이 다른 구성부에 의해 전담되어 수행될 수도 있음은 물론이다.Before providing a detailed description of the drawings, it would be clarified that the division of components in this specification is merely a division according to the main function each component is responsible for. That is, two or more components, which will be described below, may be combined into one component, or one component may be divided into two or more components for more detailed functions. In addition to the main functions it is responsible for, each of the components described below may additionally perform some or all of the functions handled by other components, and some of the main functions handled by each component may be performed by other components. Of course, it can also be carried out exclusively by .
또, 방법 또는 동작 방법을 수행함에 있어서, 상기 방법을 이루는 각 과정들은 문맥상 명백하게 특정 순서를 기재하지 않은 이상 명기된 순서와 다르게 일어날 수 있다. 즉, 각 과정들은 명기된 순서와 동일하게 일어날 수도 있고 실질적으로 동시에 수행될 수도 있으며 반대의 순서대로 수행될 수도 있다.In addition, when performing a method or operation method, each process forming the method may occur in a different order from the specified order unless a specific order is clearly stated in the context. That is, each process may occur in the same order as specified, may be performed substantially simultaneously, or may be performed in the opposite order.
The technology described below predicts or classifies the severity of lung diseases such as COPD (chronic obstructive pulmonary disease) based on a subject's voice and clinical information. For convenience, the following description focuses on COPD; however, the technology can be used to predict or classify the severity of various lung diseases other than COPD.
The user data used for analysis includes the user's voice and clinical information. The user data is collected from a specific subject, and may be collected before and after the subject performs a prescribed exercise. The user's voice is collected both before and after exercise, and the input variables include features extracted from the voice. Some of the clinical information may likewise be collected before and after exercise. Clinical information may also include questionnaire responses collected from the subject. User data is described in detail later.
In the following, an analysis device is described as classifying or predicting the degree of lung disease based on the user's voice and clinical information. The analysis device may be implemented as any of various devices capable of processing data, for example a PC, a network server, a smart device, a wearable device, or a chipset with an embedded dedicated program. Furthermore, the analysis device may be embedded in various apparatuses such as exercise equipment, vehicles, or smart speakers.
The analysis device can classify lung disease using a machine learning model. Machine learning models include decision trees, random forests, K-nearest neighbors (KNN), naive Bayes, support vector machines (SVM), and artificial neural networks (ANN). The following description focuses on a deep neural network (DNN) as the learning model; however, the learning model for lung disease classification may be implemented as any of various types of model.
Figure 1 is an example of a lung disease severity classification system 100 using voice and clinical information. Figure 1 illustrates an example in which the analysis device is a user terminal 130, a computer terminal 140, or a server 150.
Subject A performs a prescribed exercise for a prescribed time. The vocal characteristics of a patient with lung disease may differ before and after exercise. Accordingly, user data may be collected from subject A both before and after exercise.
As described above, the user data may include the subject's voice data and clinical information. The voice data consists of pre-exercise voice data and post-exercise voice data, each comprising recordings of the same subject A uttering the same words or sentences (text) before and after exercise, respectively. The clinical information may consist of various items, some of which correspond to data collected separately before and after exercise.
The database (DB) 110 may store the subject's voice data and clinical information. The database 110 may be a system such as an electronic medical record (EMR).
The user terminal 120 may receive user data from subject A. In Figure 1, the user terminal 120 is shown as a smart device. The user terminal 120 is a device that can collect the user's voice with a microphone and receive clinical information through an interface; it may be any of various devices such as a smart device, PC, wearable device, or smart speaker.
The user terminal 130 may receive user data from the database 110. Furthermore, the user terminal 120 and the user terminal 130 may be the same device, in which case the user terminal 130 both collects and analyzes the user data. The user terminal 130 may preprocess the subject's user data in a prescribed manner; for example, it may remove noise from the subject's voice data. The user terminal 130 may also convert the voice data into a representation such as a chromagram, Mel-frequency cepstral coefficients (MFCC), or a Mel spectrogram. In addition, the user terminal 130 may normalize clinical information items with differing value ranges to a common range. The user terminal 130 may input the user data into a pre-built learning model to classify the subject's lung disease severity. User A can check the subject's degree of lung disease through the user terminal 130.
The computer terminal 140 receives user data from the database 110 or the user terminal 120. The computer terminal 140 may preprocess the user data in a prescribed manner and input it into a pre-built learning model to classify the subject's lung disease severity. User B can check the subject's degree of lung disease through the computer terminal 140.
The server 150 receives user data from the database 110 or the user terminal 120. The server 150 may preprocess the subject's user data in a prescribed manner and input it into a pre-built learning model to classify the subject's lung disease severity. User A can access the server 150 through a user terminal to check the subject's degree of lung disease.
Figure 2 is an example of the training process 200 of a learning model for lung disease severity classification. The learning model may be any of various types; Figure 2 shows a deep learning model as an example. The learning model that classifies lung disease severity may be called a classification model. The classification model is built using training data, and its training process may be performed by a training device, meaning a computing device that processes digital data and controls the training of the deep learning model.
The training device constructs training data (210). Training data can be collected from different groups according to lung disease severity; for example, training data may be collected from a normal group, a severity-1 group, ..., and a severity-n group. Lung disease severity can be determined based on FEV1 (forced expiratory volume in one second), the volume of air expelled from the lungs during the first second of forced exhalation. If a patient's FEV1 is lower than a threshold (for example, the average of the entire population), the patient can be classified as a COPD patient; if the patient's FEV1 is at or above the threshold, the patient can be classified as low severity. In this case, subjects can be classified as normal, low-severity lung disease patients, or high-severity lung disease patients. The training data includes clinical information and voice data for each group, as well as the label value of each training sample. Voice data is collected separately before and after performing a prescribed exercise, and may be collected while the subjects utter the same sentences.
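The FEV1-based grouping described above can be sketched as a simple labeling rule. This is an illustrative sketch only: the function name, the three-way label scheme, and the threshold of 50 (FEV1 as a percentage, matching the FEV1 < 50 / FEV1 ≥ 50 split used for the validation data described later) are assumptions drawn from this description, not the patent's actual code.

```python
def label_severity(fev1_percent, is_copd_patient):
    """Assign a lung disease severity label from FEV1.

    Labels: 0 = normal, 1 = low severity (FEV1 >= 50),
    2 = high severity (FEV1 < 50). The threshold of 50 follows
    the validation-data description; a real system could use a
    different threshold, e.g. a population mean.
    """
    if not is_copd_patient:
        return 0
    return 2 if fev1_percent < 50 else 1
```

Each training sample would then pair this label with the sample's voice features and clinical information.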
The voice data may consist of the items shown in Table 1 below. The training device can extract 32 features, as listed in Table 1, from the voice signal. The voice data may also consist of any plurality of the items in Table 1.
Among the voice-signal features, the number of silent segments, the length of silent segments, the recording length, and the ratio of silent segments to recording length are expected to increase significantly from before to after exercise in the COPD patient group compared with the normal group. If the text the user is to utter is set in advance, the training device can extract silence segments, speech segments, and the like from the whole file using a speech recognition tool.
A silent segment is defined as a segment in which a signal with an amplitude level of -36 dBFS (decibels relative to full scale) or below lasts for 200 ms or longer.
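The definition above (level at or below -36 dBFS sustained for at least 200 ms) can be sketched as a sample-level scan. This is an illustrative implementation, not the patent's actual tool; the function name and the assumption that samples are normalized to [-1, 1] are ours.

```python
import numpy as np

def find_silence_segments(samples, sample_rate, threshold_dbfs=-36.0, min_dur_ms=200):
    """Return (start, end) sample indices of silent segments.

    A sample is 'silent' if its level relative to full scale
    (samples assumed normalized to [-1, 1]) is <= threshold_dbfs.
    Candidate segments shorter than min_dur_ms are discarded.
    """
    eps = 1e-12  # avoid log(0)
    level_dbfs = 20.0 * np.log10(np.abs(samples) + eps)
    silent = level_dbfs <= threshold_dbfs
    min_len = int(sample_rate * min_dur_ms / 1000.0)

    segments, start = [], None
    for i, s in enumerate(silent):
        if s and start is None:
            start = i
        elif not s and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    if start is not None and len(silent) - start >= min_len:
        segments.append((start, len(silent)))
    return segments
```

The number of silent segments, their total length, and their ratio to the recording length then follow directly from the returned list.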
Jitter indicates how constant the period of vibration is; the more irregular the period, the larger the value.
Shimmer indicates how constant the amplitude of vibration is; the more irregular the amplitude, the larger the value.
A formant is a resonance occurring in the vocal tract (the space extending from the pharynx through the oral cavity to the nasal cavity and lips).
HNR (harmonics-to-noise ratio) is the mean ratio between harmonics present between 70 and 4,500 Hz and inharmonic components present between 1,500 and 4,500 Hz; the larger the value, the higher the proportion of noise.
Speech rate refers to the number of words per minute in the speech.
f0 (fundamental frequency) is the frequency of vocal-fold vibration and perceptually corresponds to pitch.
Articulation rate is the number of syllables per second in the speech.
Syllable duration refers to how long a syllable lasts.
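The two rate features above reduce to simple ratios once word and syllable counts and durations are known (obtaining those counts is the job of the speech recognition tool mentioned earlier). A minimal sketch; the function names are assumptions:

```python
def speech_rate_wpm(word_count, duration_seconds):
    """Speech rate: words per minute over the whole recording."""
    return 60.0 * word_count / duration_seconds

def articulation_rate_sps(syllable_count, phonation_seconds):
    """Articulation rate: syllables per second of speaking time,
    i.e. the duration with silent segments excluded."""
    return syllable_count / phonation_seconds
```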
The training device can extract jitter, shimmer, formants, HNR, speech rate, f0, articulation rate, and syllable duration using publicly available speech-analysis software.
The clinical information may consist of the 31 items shown in Table 2 below, including self-reported questionnaire variables. Some of the clinical information may be collected through wearable devices, sensor devices, and the like. The clinical information may also consist of any plurality of the items in Table 2.
The training device can preprocess the initial training data in a prescribed manner. Preprocessing of voice data may include noise removal and data-type conversion. Preprocessing of clinical information may include scaling values to a common range; for example, the training device can normalize clinical information using techniques such as min-max normalization or z-score normalization.
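The two normalization techniques named above can be sketched per feature column as follows. This is illustrative only; the handling of constant columns is our assumption.

```python
import numpy as np

def min_max_normalize(x):
    """Scale values to [0, 1]; a constant column maps to all zeros."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    return np.zeros_like(x) if rng == 0 else (x - x.min()) / rng

def z_score_normalize(x):
    """Scale values to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return np.zeros_like(x) if std == 0 else (x - x.mean()) / std
```

Either technique puts clinical items with very different native ranges (e.g. ages versus questionnaire scores) on a comparable scale before they reach the model.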
The training device can also convert clinical-information values into fixed-length representations by one-hot encoding and input the encoded clinical information into the learning model.
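One-hot encoding of a categorical clinical item can be sketched as below. The example categories (a smoking-status item) are hypothetical; the actual items of Table 2 are not reproduced here.

```python
def one_hot(value, categories):
    """Encode one categorical clinical item as a one-hot vector.

    `categories` fixes the vector length and the position of
    each category, so every sample is encoded consistently.
    """
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec
```

For example, a hypothetical smoking-status item with categories ['never', 'former', 'current'] would encode 'former' as [0, 1, 0].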
The training device treats the 32 voice variables and the 31 clinical-information items as individual input variables, constructing training data with a total of 63 input variables.
The training device builds the classification model using the training data (220). The training device extracts one input sample from the collected training data and feeds it into the classification model, which outputs a probability value for lung disease severity for that input. The training device compares the model's output with the known correct answer (label value) and updates the model's weights so that the model outputs the correct label. The training device repeats this training process over many training samples.
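The compare-and-update loop described above can be sketched as a minimal softmax classifier trained by gradient descent. This is an illustrative stand-in for the patent's model (a DNN or one of the other models listed earlier), with assumed hyperparameters:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_classifier(X, y, n_classes, lr=0.1, epochs=200, seed=0):
    """Forward pass, compare with labels, update weights; repeat."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]              # one-hot label values
    for _ in range(epochs):
        P = softmax(X @ W + b)            # severity probabilities
        grad = P - Y                      # error vs. correct label
        W -= lr * X.T @ grad / n          # weight update
        b -= lr * grad.mean(axis=0)
    return W, b

def predict(X, W, b):
    return softmax(X @ W + b).argmax(axis=1)
```

Here `X` would hold the 63 preprocessed input variables per sample and `y` the severity labels; the same compare-and-update cycle applies to a multi-layer network via backpropagation.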
The researchers built and validated the classification model described above using data from 248 people collected at their institution, split 4:1 into training and validation data. The 248 cases consisted of 54 with high COPD severity (FEV1 < 50), 144 with low severity (FEV1 ≥ 50), and 50 normal cases. The researchers built several machine learning models: an MLP (multi-layer perceptron), a random forest, an Extra Trees classifier, XGBoost, and LightGBM. Comparing their performance, the random forest performed best. Figure 3 shows the results of validating the performance of the learning model that classifies lung disease severity: the model achieved an average micro AUROC (area under the ROC curve) and an average macro AUROC of 0.87. The classification model therefore showed considerably high performance in classifying lung disease severity.
Figure 4 is an example of an analysis device 300 that classifies lung disease severity. The analysis device 300 corresponds to the analysis device described above (130, 140, or 150 in Figure 1). The analysis device 300 can be implemented in various physical forms, such as a smart device, a computer such as a PC, a network server, a wearable device, exercise equipment, or a chipset dedicated to data processing.
The analysis device 300 may include a storage device 310, a memory 320, a computing device 330, an interface device 340, a communication device 350, and an output device 360.
The storage device 310 may store the classification model described above. The classification model is trained in advance and outputs lung disease severity based on the input user data (voice data and clinical information).
The storage device 310 may store the user data, that is, the voice data and clinical information of the subject being analyzed. The voice data consists of data collected before exercise and data collected after exercise, and may consist of the items in Table 1; the clinical information may consist of the items in Table 2.
The memory 320 may store data and information generated while the analysis device classifies lung disease severity using the subject's user data.
The interface device 340 receives prescribed commands and data from the outside.
The interface device 340 may receive the subject's voice data from a physically connected input device or an external storage device; the input device may include a device such as a microphone. The voice data consists of measurements taken before and after exercise.
The interface device 340 may receive the subject's clinical information from a physically connected input device or an external storage device.
The interface device 340 may also transmit the result of analyzing the subject's user data and classifying lung disease severity to an external object.
The interface device 340 may also receive data or information transmitted via the communication device 350 described below.
The communication device 350 is a component that receives and transmits prescribed information through a wired or wireless network.
The communication device 350 may receive the subject's voice data from an external object (a database, user terminal, microphone, or the like).
The communication device 350 may receive the subject's clinical information from an external object.
The communication device 350 may transmit the result of analyzing the subject's user data and classifying lung disease severity to an external object such as a user terminal.
The output device 360 is a device that outputs prescribed information, such as the interfaces needed during data processing and the classification results.
The computing device 330 may preprocess the user data in a prescribed manner; for example, it may convert the voice data into a particular type of data, and it may normalize each clinical-information value to a common range.
The computing device 330 inputs the preprocessed user data into the pre-trained learning model and may classify the subject's lung disease severity based on the probability value the model outputs.
The computing device 330 may be a device such as a processor, an application processor (AP), or a chip with an embedded program that processes data and performs prescribed operations.
The method for classifying the severity of a subject's lung disease described above may be implemented as a program (or application) including an executable algorithm that can be run on a computer. The program may be stored and provided on a transitory or non-transitory computer-readable medium.
A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short moment, such as a register, cache, or memory. Specifically, the various applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, ROM (read-only memory), PROM (programmable ROM), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.
A transitory readable medium refers to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
This embodiment and the drawings attached to this specification merely present part of the technical idea included in the technology described above, and it is self-evident that all modifications and specific embodiments that a person skilled in the art can easily infer within the scope of the technical idea contained in the specification and drawings are included in the scope of rights of the technology described above.
Claims (8)
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR10-2022-0117052 | 2022-09-16 | | |
| KR20220117052 | 2022-09-16 | | |
| KR10-2023-0122823 | 2023-09-15 | | |
| KR1020230122823A (KR102685274B1) | 2022-09-16 | 2023-09-15 | Classification method for severity of pulmonary disease based on vocal data and clinical information and analysis apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024058585A1 true WO2024058585A1 (en) | 2024-03-21 |
Family
ID=90275303
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/KR2023/013863 (WO2024058585A1, Ceased) | Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information | 2022-09-16 | 2023-09-15 |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20240110929A (en) |
| WO (1) | WO2024058585A1 (en) |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20150127380A * | 2014-05-07 | 2015-11-17 | Korea Institute of Oriental Medicine | Apparatus and method for diagnosis of physical conditions using phonetic analysis |
| JP2018516616A * | 2015-04-16 | 2018-06-28 | Koninklijke Philips N.V. | Device, system and method for detecting heart and/or respiratory disease in a subject |
| JP2018534026A (en) * | 2015-10-08 | 2018-11-22 | コルディオ メディカル リミテッド | Evaluation of lung diseases by speech analysis |
| US20210076977A1 (en) * | 2017-12-21 | 2021-03-18 | The University Of Queensland | A method for analysis of cough sounds using disease signatures to diagnose respiratory diseases |
| JP2022100317A * | 2019-03-11 | 2022-07-05 | RevComm Inc. | Information processing equipment |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP3964134A1 (en) | 2020-09-02 | 2022-03-09 | Hill-Rom Services PTE. LTD. | Lung health sensing through voice analysis |
- 2023-09-15: WO PCT/KR2023/013863 filed as WO2024058585A1 (status: Ceased)
- 2024-07-09: KR 1020240090558A filed as KR20240110929A (status: Pending)
Also Published As
| Publication number | Publication date |
|---|---|
| KR20240110929A (en) | 2024-07-16 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11810670B2 (en) | Intelligent health monitoring | |
| Muhammad et al. | Convergence of artificial intelligence and internet of things in smart healthcare: a case study of voice pathology detection | |
| US20240374187A1 (en) | Multi-modal systems and methods for voice-based mental health assessment with emotion stimulation | |
| Shi et al. | Theory and Application of Audio‐Based Assessment of Cough | |
| US10223934B2 (en) | Systems and methods for expressive language, developmental disorder, and emotion assessment, and contextual feedback | |
| Stasak et al. | Automatic detection of COVID-19 based on short-duration acoustic smartphone speech analysis | |
| CN106073706A (en) | A kind of customized information towards Mini-mental Status Examination and audio data analysis method and system | |
| Vatanparvar et al. | CoughMatch–subject verification using cough for personal passive health monitoring | |
| Romero et al. | Deep learning features for robust detection of acoustic events in sleep-disordered breathing | |
| Dubbioso et al. | Precision medicine in als: Identification of new acoustic markers for dysarthria severity assessment | |
| JP2023531464A (en) | A method and system for screening for obstructive sleep apnea during wakefulness using anthropometric information and tracheal breath sounds | |
| Zhao et al. | Dysphagia diagnosis system with integrated speech analysis from throat vibration | |
| Mitra et al. | Pre-trained foundation model representations to uncover breathing patterns in speech | |
| KR20230050208A (en) | Respiratory disease prognosis prediction system and method through time-series cough sound, breathing sound, reading sound or vocal sound measurement | |
| Romero et al. | Snorer diarisation based on deep neural network embeddings | |
| WO2024058585A1 (en) | Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information | |
| KR102686011B1 (en) | An AI based hearing loss/cognitive disorder/Alzheimer's Disease diagnosis method using audio data | |
| KR102685274B1 (en) | Classification method for severity of pulmonary disease based on vocal data and clinical information and analysis apparatus | |
| Ng et al. | A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation | |
| Serrano et al. | Obstructive Sleep Apnea Identification Based On VGGish Networks. | |
| Härmä et al. | Survey on biomarkers in human vocalizations | |
| Sayadi et al. | Voice as an indicator for laryngeal disorders using data mining approach. | |
| Dkhan et al. | Respiratory diseases detection and classification based on respiratory voice using artificial intelligence methods | |
| Xu et al. | A Review of Disorder Voice Processing toward to Applications | |
| WO2025084507A2 (en) | Method and analysis device for classifying lung disease of subject using vibration data according to phonation |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23865881; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 23865881; Country of ref document: EP; Kind code of ref document: A1 |