
CN115116619A - Intelligent analysis method and system for stroke data distribution rule - Google Patents


Info

Publication number
CN115116619A
Authority
CN
China
Prior art keywords
data
stroke
clustering
matrix
learning
Legal status
Granted
Application number
CN202210855844.9A
Other languages
Chinese (zh)
Other versions
CN115116619B (en)
Inventor
李凤莲
张雪英
陈桂军
黄丽霞
焦江丽
李晓辉
史凯岳
杜鹏
Current Assignee
Taiyuan University of Technology
Original Assignee
Taiyuan University of Technology
Application filed by Taiyuan University of Technology
Priority to CN202210855844.9A
Publication of CN115116619A
Application granted
Publication of CN115116619B
Legal status: Active


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Epidemiology (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and system for intelligent analysis of stroke data distribution patterns. The method includes: obtaining stroke case data; performing ZCA whitening on the stroke case data to obtain whitened data; reducing the dimensionality of the whitened data with a stacked sparse autoencoder to obtain reduced-dimension data; clustering the reduced-dimension data with a learning vector quantization clustering method optimized by deep reinforcement learning to obtain a clustering result; and generating the stroke data distribution patterns from the clustering result. The method improves data-processing efficiency and thereby the efficiency and accuracy with which stroke data distribution patterns are generated.

Description

A method and system for intelligent analysis of stroke data distribution patterns

Technical field

The invention relates to the technical field of data processing, and in particular to a method and system for intelligent analysis of stroke data distribution patterns.

Background

At present, stroke case data are mainly analyzed with traditional statistical methods. As research on artificial intelligence in intelligent healthcare deepens, mining the distribution patterns of stroke data with machine-learning methods has become a research hotspot in recent years. In the prior art, however, the distribution patterns of stroke data are generally identified by manual analysis. Providing a method or system that discovers these patterns quickly and efficiently is therefore a pressing technical problem in this field.

Summary of the invention

The purpose of the invention is to provide a method and system for intelligent analysis of stroke data distribution patterns that improves data-processing efficiency and thereby the efficiency and accuracy with which the distribution patterns are generated.

To achieve this purpose, the invention provides the following solution:

An intelligent analysis method for stroke data distribution patterns, comprising:

obtaining stroke case data;

performing ZCA whitening on the stroke case data to obtain whitened data;

reducing the dimensionality of the whitened data with a stacked sparse autoencoder to obtain reduced-dimension data;

clustering the reduced-dimension data with a learning vector quantization clustering method optimized by deep reinforcement learning to obtain a clustering result; and

generating the stroke data distribution patterns based on the clustering result.

Preferably, performing ZCA whitening on the stroke case data to obtain whitened data specifically includes:

converting the stroke case data into a numerical matrix;

standardizing the numerical matrix to obtain a first matrix;

determining the sample covariance matrix of the first matrix and the eigenvalues of the sample covariance matrix;

sorting the eigenvalues in descending order and extracting the eigenvector of each sorted eigenvalue to obtain a second matrix;

determining the rotation matrix of the second matrix; and

determining the whitened data based on the rotation matrix.

Preferably, the reduced-dimension data are clustered with a learning vector quantization clustering method optimized by deep reinforcement learning to obtain the clustering result.

Preferably, the learning vector quantization clustering method optimized by deep reinforcement learning is a learning vector quantization clustering method that incorporates the state set and the action set of deep reinforcement learning.

Preferably, the state set is selected as follows:

the prototype vectors obtained in each iteration of the learning vector quantization clustering method are taken as one state, and the number of iterations is taken as the number of states, forming the state set.

Preferably, constructing the action set includes:

obtaining the action set through an exploration-and-exploitation mechanism, and designing a reward function suited to this clustering method to determine the action with the maximum reward; in each iteration, the action with the maximum reward is selected to obtain the clustering result of that iteration;

The reward function is designed as:

[Equation image: reward function]

where a_i denotes the action selected in the i-th iteration from the action set A = {a_1, a_2, ..., a_i, ..., a_L}, $\bar{d}_{a_i}$ denotes the average distance between the centroids obtained after executing action a_i and the samples of their clusters, and d_i denotes the average distance between the cluster samples and their centroids for the original LVQ in the i-th iteration.

Preferably, the action set is selected as follows:

the "exploration" and "exploitation" mechanisms are used to selectively choose the data points on which the prototype vectors perform the "pull-closer" or "push-away" operation;

the exploitation mechanism is implemented as follows: introduce a parameter m < z, randomly select m samples from the input data set to form a data subset X_m, and perform the pull-closer or push-away operation between these m samples and the prototype vectors to obtain one action, where z is the number of samples in the stroke data set;

the exploration mechanism is implemented as follows: introduce a parameter v < z and set the exploration coefficient ε = 0.1; randomly select v samples from the stroke data set to form a data subset X_v, and perform the pull-closer or push-away operation between all data in X_v and the prototype vectors to obtain one action;

if sample x_i has the same label as prototype vector p_j, the pull-closer operation is performed, which updates the prototype vector by $p'_j = x_j + \eta(x_i - x_j)$, where η is the learning rate, p'_j is the updated prototype vector, and x_j is the attribute value of prototype vector p_j;

the distance between sample x_i and the updated prototype vector p'_j is:

$$\|p'_j - x_i\|_2 = \|x_j + \eta(x_i - x_j) - x_i\|_2 = (1-\eta)\,\|p_j - x_i\|_2$$

if sample x_i and prototype vector p_j have different labels, the push-away operation is performed, which updates p_j by $p'_j = x_j - \eta(x_i - x_j)$;

in this case, the distance between x_i and the updated prototype vector p'_j is:

$$\|p'_j - x_i\|_2 = \|x_j - \eta(x_i - x_j) - x_i\|_2 = (1+\eta)\,\|p_j - x_i\|_2.$$

According to the specific embodiments provided, the invention discloses the following technical effects:

The intelligent analysis method for stroke data distribution patterns provided by the invention uses feature dimensionality reduction to remove redundant features and noise from the stroke data, then clusters the de-redundant feature data to obtain the distribution patterns. This enables intelligent analysis of stroke case data, improving data-processing efficiency and thereby the efficiency and accuracy with which the distribution patterns are generated.

Corresponding to the method above, the invention also provides an intelligent analysis system for stroke data distribution patterns, comprising:

a data acquisition module, configured to obtain stroke case data;

a whitening module, configured to perform ZCA whitening on the stroke case data to obtain whitened data;

a dimensionality reduction module, configured to reduce the dimensionality of the whitened data with a stacked sparse autoencoder to obtain reduced-dimension data;

a clustering module, configured to cluster the reduced-dimension data with a clustering method to obtain a clustering result; and

a pattern generation module, configured to generate the stroke data distribution patterns based on the clustering result.

Preferably, the whitening module includes:

a matrix conversion unit, configured to convert the stroke case data into a numerical matrix;

a standardization unit, configured to standardize the numerical matrix to obtain a first matrix;

an eigenvalue determination unit, configured to determine the sample covariance matrix of the first matrix and the eigenvalues of the sample covariance matrix;

a feature extraction unit, configured to sort the eigenvalues in descending order and extract the eigenvector of each sorted eigenvalue to obtain a second matrix;

a matrix determination unit, configured to determine the rotation matrix of the second matrix; and

a whitening unit, configured to determine the whitened data based on the rotation matrix.

Preferably, the clustering module includes:

a clustering unit, configured to cluster the reduced-dimension data with a learning vector quantization clustering method optimized by deep reinforcement learning to obtain a clustering result, where this method is a learning vector quantization clustering method that incorporates the state set and the action set of deep reinforcement learning.

Since the system achieves the same technical effects as the method above, they are not repeated here.

Description of the drawings

To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a flow chart of the intelligent analysis method for stroke data distribution patterns provided by the invention;

Figure 2 is a schematic structural diagram of the stacked sparse autoencoder provided by an embodiment of the invention;

Figure 3 is a processing flow chart of the deep-reinforcement-learning-based learning vector quantization clustering method provided by an embodiment of the invention;

Figure 4 is a data-processing flow chart of the deep Q-network provided by an embodiment of the invention;

Figure 5 is a schematic structural diagram of the intelligent analysis system for stroke data distribution patterns provided by the invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

The purpose of the invention is to provide a method and system for intelligent analysis of stroke data distribution patterns that improves data-processing efficiency and thereby the efficiency and accuracy with which the distribution patterns are generated.

To make the above objects, features, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.

As shown in Figure 1, the intelligent analysis method for stroke data distribution patterns provided by the invention includes:

Step 100: Obtain stroke case data.

Step 101: Perform ZCA whitening on the stroke case data to obtain whitened data. ZCA whitening of the original stroke case data in step 101 gives all attributes the same variance and low or no mutual correlation, thereby reducing the redundancy of the original data. For example, the original stroke case data have 40 dimensions and contain 1,000 stroke samples divided into 8 classes: cerebral infarction, cerebral hemorrhage, transient ischemic attack (TIA), unruptured intracranial aneurysm, spontaneous subarachnoid hemorrhage, arteriovenous malformation (AVM), carotid artery stenosis or occlusion, and moyamoya disease.

ZCA whitening is applied to the stroke case data so that the 40 attributes, such as age, mode of arrival at hospital, and discharge department, have the same variance and low or no mutual correlation, which reduces the redundancy of the original data. Assuming the original data set has z samples, each of dimension d, the processing is as follows:

(1) Convert the whole original stroke data set into a d×z numerical matrix X and standardize it so that each attribute has zero mean; the resulting matrix is denoted A (the first matrix).

(2) Compute the sample covariance matrix Σ of A and its eigenvalues, labeled λ_1, λ_2, ..., λ_d in descending order; the corresponding eigenvectors u_1, u_2, ..., u_d form the matrix U = [u_1, u_2, ..., u_d] (the second matrix).

(3) Compute the rotated matrix X_rot:

[Equation image: rotated matrix X_rot]

where x_z denotes the input data, d is the input dimension, and z is the number of data samples. Each element of row g (g = 1, 2, ..., d) of X_rot is then multiplied by the corresponding scaling factor [formula image] (i = 1, 2, ..., d), so that every attribute of the rotated matrix has unit variance. Step (3) thus gives all features the same variance.

(4) Left-multiply X_rot by U: the resulting matrix X_1 = U X_rot is the ZCA-whitened data set, and each of its columns is one whitened sample. After this step the features of the whitened data are only weakly correlated.
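A minimal NumPy sketch of steps (1) to (4), assuming the standard ZCA formulas that the equation images most likely show: X_rot = Uᵀ A and a per-row scaling factor of 1/sqrt(λ_g). The function name `zca_whiten` and the small stabilizing constant `eps` are illustrative assumptions, not part of the described method. As in step (1), standardization here only centers each attribute.

```python
import numpy as np


def zca_whiten(X, eps=1e-8):
    """ZCA-whiten a d x z matrix X whose columns are samples.

    eps is added to the eigenvalues for numerical stability; it is an
    assumption of this sketch, not part of the described method.
    """
    # Step (1): zero-mean each attribute (row) -> first matrix A
    A = X - X.mean(axis=1, keepdims=True)
    z = A.shape[1]
    # Step (2): sample covariance and eigen-decomposition, sorted descending
    sigma = A @ A.T / z
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    lam = eigvals[order]
    U = eigvecs[:, order]                     # second matrix U = [u1, ..., ud]
    # Step (3): rotate, then scale row g by 1/sqrt(lambda_g) for unit variance
    X_rot = U.T @ A
    X_scaled = X_rot / np.sqrt(lam + eps)[:, None]
    # Step (4): rotate back with U -> ZCA-whitened data X1
    return U @ X_scaled


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1000)) * np.arange(1, 6)[:, None]
X[1] += 0.8 * X[0]                            # introduce a correlated attribute
X1 = zca_whiten(X)
cov = X1 @ X1.T / X1.shape[1]
print(np.allclose(cov, np.eye(5), atol=1e-6))  # True: identity covariance
```

The final rotation by U is what distinguishes ZCA from plain PCA whitening: it keeps the whitened data as close as possible to the original attribute axes.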

Step 102: Reduce the dimensionality of the whitened data with a stacked sparse autoencoder to obtain reduced-dimension data. For example, the ZCA-whitened stroke data are reduced in dimension by a stacked sparse autoencoder (SAE), whose structure is shown in Figure 2: x is the input-layer data, corresponding to X_1 from the whitening step; h, o, and k are the first, second, and third SAE layers, which together form the hidden layer; and y is the output layer. When training the three-layer SAE network, the output-layer neurons are initialized first: each neuron's weights are set to random values close to zero, and the learning rate η is initialized (it decreases over time).

Based on the structure in Figure 2, the distances between the input data and the weight vectors of all output-layer neurons are computed; the neuron whose weight vector is closest is selected as the winning neuron and its weights are adjusted. The learning rate is then reduced and the stopping condition (learning rate equal to zero or maximum number of iterations reached) is checked; if it is not met, the distances are computed again to find the next winning neuron. Finally, the data set obtained from SAE dimensionality reduction is fed into the deep-reinforcement-learning-optimized learning vector quantization clustering algorithm (DQN-LVQ) for clustering.
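The training loop described above (distance to every output weight vector, a winning neuron, a decaying learning rate, and a stop when the rate reaches zero or the iteration limit is hit) is a competitive, winner-take-all update. The NumPy sketch below illustrates only that loop. The function name `competitive_train`, the initial rate `eta0`, the linear decay schedule, and single-sample updates are assumptions. It does not implement the sparsity-penalized reconstruction training that sparse autoencoders usually rely on.

```python
import numpy as np


def competitive_train(data, n_neurons, eta0=0.5, max_iter=1000, seed=0):
    """Winner-take-all training loop; data is a (z, d) array of samples."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    # Each output neuron's weight vector starts as small random values near zero
    W = rng.normal(scale=0.01, size=(n_neurons, d))
    eta = eta0
    for t in range(max_iter):             # stop: maximum number of iterations
        if eta <= 0:                      # stop: learning rate reached zero
            break
        x = data[rng.integers(len(data))]
        dist = np.linalg.norm(W - x, axis=1)   # distance to every neuron
        win = np.argmin(dist)                  # winning neuron
        W[win] += eta * (x - W[win])           # move the winner toward x
        eta = eta0 * (1 - (t + 1) / max_iter)  # learning rate decreases over time
    return W
```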

Step 103: Cluster the reduced-dimension data to obtain a clustering result. The invention uses a learning vector quantization clustering method based on deep reinforcement learning (DQN-LVQ). It incorporates the state set and action set of reinforcement learning into the clustering process and establishes a Markov decision process for the clustering problem. The action set is optimized through an exploration-and-exploitation mechanism to explore more possibilities, and a reward function suited to this clustering method determines the action with the maximum reward; in each iteration the best action is selected, yielding the best clustering result after iteration.

The reinforcement learning used in the invention is a deep Q-network (DQN). DQN is a value-function-based deep reinforcement learning algorithm: one deep neural network estimates the state-action Q-function value (hereinafter the Q value) of the current target, and a second network with a similar structure but different parameters estimates the Q value of the current target state, Q(s, a; θ_i), where s is the state and a is the action to be executed. The data-processing principle is shown in Figure 4: the environment is initialized to obtain state s_t; action a is executed, yielding the next state s_{t+1} and reward r; the tuple (s_t, a, r, s_{t+1}) is stored in the replay unit; and samples drawn from the replay unit are used to update the parameters by gradient descent.
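To make the Figure 4 loop concrete, here is a minimal sketch of the two DQN ingredients the paragraph names: a replay unit storing (s_t, a, r, s_{t+1}) and a gradient-descent update that uses a separately parameterized target network. A linear Q-function stands in for the deep network, and the names `ReplayBuffer` and `dqn_step`, the discount `gamma`, and the learning rate `lr` are illustrative assumptions; terminal states are not handled.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Replay unit holding transitions (s_t, a, r, s_{t+1})."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))

    def sample(self, batch_size):
        batch = random.sample(list(self.buf), batch_size)
        s, a, r, s_next = map(np.array, zip(*batch))
        return s, a, r, s_next

    def __len__(self):
        return len(self.buf)


def dqn_step(W, W_target, batch, gamma=0.9, lr=0.01):
    """One gradient-descent step for a linear Q(s, a) = (s @ W)[a].

    W is the online network, W_target the separately parameterized target
    network. Returns the updated W and the loss before the update.
    """
    s, a, r, s_next = batch
    n = len(a)
    y = r + gamma * (s_next @ W_target).max(axis=1)  # TD target (target net)
    q = (s @ W)[np.arange(n), a]                      # Q(s, a) (online net)
    td = q - y
    loss = 0.5 * np.mean(td ** 2)
    grad = np.zeros_like(W)
    for k in range(n):
        grad[:, a[k]] += td[k] * s[k] / n             # d loss / d W[:, a_k]
    return W - lr * grad, loss
```

Keeping W_target fixed between periodic copies from W is the usual reason for the second network: the TD target does not shift with every gradient step.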

Learning vector quantization (LVQ) is a prototype-based clustering method. It first randomly selects a set of prototype vectors from the data set as cluster centers, partitioning the clustering space into several clusters, and assigns each input sample to the nearest cluster; the data must carry class labels. The core idea of LVQ is to optimize the prototype vectors of the clusters iteratively. In each pass, for every labeled training sample, the nearest prototype vector is found and updated according to whether the two share the same class label. On this basis, the state set of the invention is selected as follows:

The prototype vectors obtained in each iteration of the LVQ algorithm are regarded as one state, and the number of iterations is the number of states. The initial prototype vectors form state s_0; executing one action, that is, one LVQ operation, leads to state s_1. For example, if the number of iterations is set to f, the state set contains f states up to s_f. The prototype vectors of the final state s_f are the cluster centers, and each sample belongs to the cluster of its nearest prototype vector.
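The state definition can be sketched as an LVQ loop that snapshots the prototype matrix after each pass, so that `states[0]` is s_0 and `states[-1]` holds the final cluster centers. This NumPy sketch assumes one prototype per class initialized from a random sample of that class and a fixed learning rate of 0.1. `lvq_states` and `assign_clusters` are hypothetical names.

```python
import numpy as np


def lvq_states(X, y, n_iter=20, eta=0.1, seed=0):
    """LVQ training that records the prototype matrix after every pass.

    states[0] is s_0 (initial prototypes); states[k] is the state after
    the k-th LVQ pass. One prototype per class, drawn at random from that
    class's samples (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    labels = np.unique(y)
    P = np.array([X[rng.choice(np.flatnonzero(y == c))] for c in labels], dtype=float)
    states = [P.copy()]
    for _ in range(n_iter):
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(P - X[i], axis=1))  # nearest prototype
            step = eta * (X[i] - P[j])
            P[j] += step if labels[j] == y[i] else -step      # pull closer / push away
        states.append(P.copy())
    return states, labels


def assign_clusters(X, P):
    """Each sample belongs to the cluster of its nearest prototype."""
    dist = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=2)
    return np.argmin(dist, axis=1)
```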

The action set is selected as follows:

Traditional LVQ applies the update between every data point and the prototype vectors, which does not necessarily give the optimal clustering: some data points may not need a pull-closer or push-away operation, so clustering performance suffers and the unnecessary operations add computational complexity. To avoid them, the invention uses an exploration-and-exploitation mechanism to selectively choose the data points on which the prototype vectors perform the pull-closer or push-away operation, reducing algorithmic complexity and improving clustering. One action is defined as performing the pull-closer or push-away operation between the prototype vectors and the selected points of the data set. More actions mean more possibilities and possibly better clustering, but too many actions increase the computation; to strike a balance, the action set is set to contain L actions.

Exploration and exploitation: operating on different data points yields different clustering results, so applying the operation to all samples of the original data set does not guarantee the best clustering performance. The invention therefore applies exploration and exploitation when selecting the data. Exploration increases the chance of choosing actions that have not yet been tried, while exploitation aims to have as much of the data as possible selected for the pull-closer or push-away operation.

Exploitation: introduce a parameter m < z (z is the number of samples in the stroke data set), randomly select m samples from the input data set to form a data subset X_m, and perform the pull-closer or push-away operation between these m samples and the prototype vectors as one action. This embodies exploitation: operating over all possible sets of m input samples makes the resulting prototype vectors represent the class-center information of the input samples as closely as possible.

Exploration: introduce a parameter v < z and set the exploration coefficient ε = 0.1; randomly select v samples from the input stroke data set to form a data subset X_v, and perform the pull-closer or push-away operation between all data in X_v and the prototype vectors as one action.

If $L$ actions are required in total, the exploitation process is performed [formula] times: for the [formula] randomly formed data subsets $X_m$, the "pull closer" or "push away" operation is applied in turn to all samples in each $X_m$, for a total of [formula] operations. The exploration process is performed [formula] times: for the [formula] randomly formed data subsets $X_v$, the operation is applied in turn to all data in each $X_v$, for a total of [formula] operations.
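The subset selection for the two mechanisms can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions: the action counts in the source are given by formulas not reproduced in this text, so the sketch assumes an ε-greedy-style split in which round(ε·L) of the L actions are exploration actions on $v$-sample subsets $X_v$ and the rest are exploitation actions on $m$-sample subsets $X_m$. The function name `build_action_subsets` is hypothetical.

```python
import numpy as np

def build_action_subsets(z, L, m, v, epsilon=0.1, rng=None):
    """Draw the sample-index subsets for L actions (hypothetical helper).

    Assumption (not from the source): round(epsilon * L) actions are
    exploration actions on v-sample subsets X_v; the remaining actions are
    exploitation actions on m-sample subsets X_m. Indices within one subset
    are drawn without replacement.
    """
    if not (0 < m < z and 0 < v < z):
        raise ValueError("require 0 < m < z and 0 < v < z")
    rng = np.random.default_rng(rng)
    n_explore = int(round(epsilon * L))
    actions = []
    for k in range(L):
        kind, size = ("explore", v) if k < n_explore else ("exploit", m)
        # Each action later applies "pull closer"/"push away" to every sample in the subset.
        actions.append((kind, rng.choice(z, size=size, replace=False)))
    return actions
```

For example, with $z = 200$, $L = 20$ and $\varepsilon = 0.1$, the sketch produces 2 exploration subsets and 18 exploitation subsets.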

The "pull closer" operation: if sample $x_i$ and prototype vector $p_j$ have the same label, the "pull closer" operation is performed, i.e., the prototype vector is updated by the following formula so that the two move closer, where $x_j$ denotes the attribute values of prototype vector $p_j$:

$$p'_j = x_j + \eta (x_i - x_j)$$

The distance between $x_i$ and the updated prototype vector is then:

$$\lVert p'_j - x_i \rVert_2 = \lVert x_j + \eta (x_i - x_j) - x_i \rVert_2 = (1 - \eta) \lVert p_j - x_i \rVert_2$$

so the updated prototype vector is closer to $x_i$. The learning rate is set to $\eta = 0.1$.

The "push away" operation: if sample $x_i$ and prototype vector $p_j$ have different labels, the "push away" operation is performed, i.e., the prototype vector is updated by:

$$p'_j = x_j - \eta (x_i - x_j)$$

The distance between $x_i$ and the updated prototype vector is then:

$$\lVert p'_j - x_i \rVert_2 = \lVert x_j - \eta (x_i - x_j) - x_i \rVert_2 = (1 + \eta) \lVert p_j - x_i \rVert_2$$

so the updated prototype vector is farther from $x_i$.
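A single "pull closer" / "push away" step can be sketched as follows, assuming the sample and prototype are NumPy vectors and the prototype's current values play the role of $x_j$. The function name `lvq_update` is hypothetical.

```python
import numpy as np

def lvq_update(prototype, proto_label, sample, sample_label, eta=0.1):
    """One "pull closer" / "push away" step on a single prototype (hypothetical helper)."""
    prototype = np.asarray(prototype, dtype=float)
    sample = np.asarray(sample, dtype=float)
    direction = sample - prototype            # x_i - x_j
    if proto_label == sample_label:
        return prototype + eta * direction    # pull closer: distance scales by (1 - eta)
    return prototype - eta * direction        # push away: distance scales by (1 + eta)
```

With $p_j = (0, 0)$, $x_i = (3, 4)$ and $\eta = 0.1$, the initial distance 5 becomes 4.5 after a pull and 5.5 after a push, matching the $(1 - \eta)$ and $(1 + \eta)$ factors derived above.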

Reward function design:

[formula]

Here, $a_i$ denotes the action selected from the action set $A = \{a_1, a_2, \ldots, a_i, \ldots, a_L\}$ in the $i$-th iteration; $d_{a_i}$ denotes the average distance between the centroids obtained after executing action $a_i$ and the samples of their respective clusters; $d_i$ denotes the average distance between the samples of each cluster and their centroid in the original LVQ at the $i$-th iteration; and $\alpha = d_i / d_{a_i}$ is the ratio of the two. The smaller the Euclidean distance between a centroid and the sample points in its cluster, the better the clustering, so $\alpha > 1$ means this iteration clusters better than LVQ, while $\alpha < 1$ means it clusters worse than the LVQ algorithm.
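Because the full reward formula is not reproduced in this text, the sketch below computes only the ratio $\alpha = d_i / d_{a_i}$ described above. Assumptions: each sample belongs to the cluster of its nearest centroid, and the "average distance" is pooled over all samples. The function names are hypothetical.

```python
import numpy as np

def mean_centroid_distance(X, centroids):
    """Average Euclidean distance from each sample to its nearest centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # Pairwise distances, shape (n_samples, n_centroids)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return float(d.min(axis=1).mean())

def reward_ratio(X, centroids_lvq, centroids_action):
    """alpha = d_i / d_{a_i}; alpha > 1 means the action clustered better than plain LVQ."""
    d_i = mean_centroid_distance(X, centroids_lvq)
    d_ai = mean_centroid_distance(X, centroids_action)
    return d_i / d_ai
```

For four points at $x = 0, 2, 10, 12$ on a line, centroids at $3$ and $9$ give an average distance of 2, centroids at $1$ and $11$ give 1, so $\alpha = 2$ and the second set is preferred.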

Here, the Euclidean distance is the shortest (straight-line) distance between two points in a coordinate system. Taking two points as an example, if $A_i$ and $B_j$ have coordinates $(x_i, y_i)$ and $(x_j, y_j)$, the Euclidean distance $d_{ij}$ between them is:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$

The data processing flow of DQN-LVQ is shown in FIG. 3 and Table 1.

Table 1 Data processing flow of DQN-LVQ

[table image]

Step 104: Generate the stroke data distribution pattern based on the clustering result.

Based on the above description, the present invention further has the following advantages:

1. Because stroke data contain redundant features, the present invention first applies ZCA whitening to the raw stroke data to reduce data redundancy.

2. With learning vector quantization clustering, the computational cost grows as the data dimensionality increases, while the quality of the analysis tends to decline. The present invention therefore first performs feature learning on the high-dimensional data, using a three-layer sparse autoencoder to reduce the dimensionality of the stroke data set, and then clusters the dimensionality-reduced data.

3. The present invention combines deep reinforcement learning with learning vector quantization (LVQ) clustering to propose a new deep-reinforcement-learning LVQ clustering method, and then uses this method to analyze the stroke data set and obtain the distribution pattern of the stroke data.
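The sparse-autoencoder dimensionality reduction in item 2 can be sketched with NumPy as greedy layer-wise training of sparse autoencoders with a KL-divergence sparsity penalty. Assumptions not taken from the source: a sigmoid encoder with a linear decoder, full-batch gradient descent, and the hyperparameters `rho`, `beta`, `lam`, `lr`, `epochs`; the function names and layer sizes are hypothetical.

```python
import numpy as np

def _sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_sparse_ae(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4, lr=0.1, epochs=300, seed=0):
    """Train one sparse autoencoder layer (sigmoid encoder, linear decoder).

    Loss = half the per-sample squared reconstruction error, averaged over samples
         + beta * sum_j KL(rho || mean activation of hidden unit j)
         + lam / 2 * (sum of squared weights).
    Returns the encoder weights (W1, b1) and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = _sigmoid(X @ W1 + b1)                        # hidden codes
        R = H @ W2 + b2                                  # linear reconstruction
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)
        kl = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
        loss = (0.5 * np.sum((R - X) ** 2) / n + beta * kl
                + 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2)))
        losses.append(float(loss))
        # Backpropagation of the three loss terms
        dR = (R - X) / n
        dW2 = H.T @ dR + lam * W2
        db2 = dR.sum(axis=0)
        dH = dR @ W2.T + beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / n
        dZ = dH * H * (1 - H)
        dW1 = X.T @ dZ + lam * W1
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, losses

def stacked_sparse_encode(X, layer_sizes, **kwargs):
    """Greedy layer-wise training: each layer is trained on the previous layer's codes."""
    codes = np.asarray(X, dtype=float)
    for size in layer_sizes:
        W, b, _ = train_sparse_ae(codes, size, **kwargs)
        codes = _sigmoid(codes @ W + b)
    return codes
```

For example, `stacked_sparse_encode(X, [6, 4, 2])` maps 8-dimensional whitened records to 2-dimensional codes that could then be clustered; the layer sizes here are illustrative only.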

Corresponding to the intelligent analysis method for stroke data distribution patterns provided above, the present invention further provides an intelligent analysis system for stroke data distribution patterns. As shown in FIG. 5, the system includes:

Data acquisition module 1, used to acquire stroke case data.

Whitening processing module 2, used to perform ZCA whitening on the stroke case data to obtain whitened data.

Dimensionality reduction module 3, used to reduce the dimensionality of the whitened data with a stacked sparse autoencoder to obtain dimensionality-reduced data.

Clustering processing module 4, used to cluster the dimensionality-reduced data with a clustering method to obtain a clustering result.

Pattern generation module 5, used to generate the stroke data distribution pattern based on the clustering result.

As a preferred embodiment of the present invention, to improve the real-time performance and accuracy of stroke data processing, the whitening processing module 2 includes:

A matrix conversion unit, used to convert the stroke case data into a numerical matrix.

A standardization unit, used to standardize the numerical matrix to obtain a first matrix.

An eigenvalue determination unit, used to determine the sample covariance matrix of the first matrix and the eigenvalues of that sample covariance matrix.

A feature extraction unit, used to sort the eigenvalues in descending order and extract the eigenvector of each sorted eigenvalue to obtain a second matrix.

A matrix determination unit, used to determine the rotation matrix of the second matrix.

A whitening unit, used to determine the whitened data based on the rotation matrix.
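The whitening units above can be sketched with NumPy as follows. Assumptions: the "rotation matrix" is taken to be the matrix $U$ of sorted eigenvectors (the "second matrix"), and a small regularizer `eps` is added to the eigenvalues; the function name `zca_whiten` is hypothetical.

```python
import numpy as np

def zca_whiten(data, eps=1e-5):
    """ZCA whitening following the unit sequence above (hypothetical helper).

    Columns with zero variance must be removed beforehand, since the
    standardization step divides by each column's standard deviation.
    """
    X = np.asarray(data, dtype=float)               # numerical matrix
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardized "first matrix"
    cov = X.T @ X / (X.shape[0] - 1)                # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # re-sort in descending order
    eigvals, U = eigvals[order], eigvecs[:, order]  # U: eigenvector "second matrix"
    X_rot = X @ U                                   # rotate into the eigenbasis
    X_pca = X_rot / np.sqrt(eigvals + eps)          # unit variance per component
    return X_pca @ U.T                              # rotate back: ZCA-whitened data
```

Rotating back with $U^\top$ is what distinguishes ZCA from plain PCA whitening: the features become decorrelated with unit variance while staying as close as possible to the original feature axes.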

As another preferred embodiment of the present invention, the clustering processing module 4 includes:

A clustering processing unit, used to cluster the dimensionality-reduced data with the learning vector quantization clustering method optimized by deep reinforcement learning (DQN-LVQ) to obtain the clustering result. The DQN-LVQ clustering method is a learning vector quantization clustering method that incorporates a reinforcement learning state set and a reinforcement learning action set.

The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be cross-referenced. Because the system disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for related details, refer to the description of the method.

Specific examples are used herein to illustrate the principles and implementations of the present invention. The description of the above embodiments is intended only to help understand the method of the present invention and its core idea. A person of ordinary skill in the art may, in accordance with the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. An intelligent analysis method for stroke data distribution patterns, characterized by comprising: acquiring stroke case data; performing ZCA whitening on the stroke case data to obtain whitened data; performing dimensionality reduction on the whitened data using a stacked sparse autoencoder to obtain dimensionality-reduced data; clustering the dimensionality-reduced data using a learning vector quantization clustering method optimized by deep reinforcement learning to obtain a clustering result; and generating a stroke data distribution pattern based on the clustering result.

2. The intelligent analysis method for stroke data distribution patterns according to claim 1, characterized in that performing ZCA whitening on the stroke case data to obtain whitened data specifically comprises: converting the stroke case data into a numerical matrix; standardizing the numerical matrix to obtain a first matrix; determining a sample covariance matrix of the first matrix and determining the eigenvalues of the sample covariance matrix; sorting the eigenvalues in descending order and extracting the eigenvector of each sorted eigenvalue to obtain a second matrix; determining a rotation matrix of the second matrix; and determining the whitened data based on the rotation matrix.

3. The intelligent analysis method for stroke data distribution patterns according to claim 1, characterized in that the dimensionality-reduced data are clustered using a learning vector quantization clustering method optimized by deep reinforcement learning to obtain the clustering result.

4. The intelligent analysis method for stroke data distribution patterns according to claim 3, characterized in that the learning vector quantization clustering method optimized by deep reinforcement learning is a learning vector quantization clustering method that incorporates a deep reinforcement learning state set and a deep reinforcement learning action set.

5. The intelligent analysis method for stroke data distribution patterns according to claim 3, characterized in that the state set is selected as follows: the prototype vectors obtained in each iteration of the learning vector quantization clustering method are taken as one state, and the number of iterations is taken as the number of states, forming the state set.

6. The intelligent analysis method for stroke data distribution patterns according to claim 3, characterized in that constructing the action set comprises: obtaining the action set through the exploration and exploitation mechanisms, and designing a reward function suited to the clustering method to determine the action corresponding to the maximum reward value, the action corresponding to the maximum reward being selected in each iteration to obtain the clustering result after that iteration; wherein the reward function is designed as:

[formula]

where $a_i$ denotes the action selected from the action set $A = \{a_1, a_2, \ldots, a_i, \ldots, a_L\}$ in the $i$-th iteration, $d_{a_i}$ denotes the average distance between the centroids obtained after executing action $a_i$ and the samples of their respective clusters, and $d_i$ denotes the average distance between the samples of each cluster and their centroid in the original LVQ at the $i$-th iteration.
7. The intelligent analysis method for stroke data distribution patterns according to claim 6, characterized in that the action set is selected as follows: the "exploration" and "exploitation" mechanisms are used to selectively pick data points on which to perform the "pull closer" or "push away" operation with the prototype vectors; wherein the "exploitation" mechanism is implemented by introducing a parameter $m < z$, randomly selecting $m$ samples from the input data set to form a data subset $X_m$, and performing the "pull closer" or "push away" operation between the $m$ samples in the subset and the prototype vectors to obtain one action, $z$ being the number of samples in the stroke data set; the "exploration" mechanism is implemented by introducing a parameter $v < z$, setting the exploration coefficient $\varepsilon$ to 0.1, randomly selecting $v$ samples from the stroke data set to form a data subset $X_v$, and performing the "pull closer" or "push away" operation between all data in $X_v$ and the prototype vectors to obtain one action; if sample $x_i$ and prototype vector $p_j$ have the same label, the "pull closer" operation is performed, which updates the prototype vector by $p'_j = x_j + \eta (x_i - x_j)$, where $\eta$ is the learning rate, $p'_j$ is the updated prototype vector, and $x_j$ is the attribute value corresponding to prototype vector $p_j$; the distance between sample $x_i$ and the updated prototype vector $p'_j$ is then $\lVert p'_j - x_i \rVert_2 = \lVert x_j + \eta (x_i - x_j) - x_i \rVert_2 = (1 - \eta) \lVert p_j - x_i \rVert_2$; if sample $x_i$ and prototype vector $p_j$ have different labels, the "push away" operation is performed, which updates prototype vector $p_j$ by $p'_j = x_j - \eta (x_i - x_j)$; the distance between $x_i$ and the updated prototype vector $p'_j$ is then $\lVert p'_j - x_i \rVert_2 = \lVert x_j - \eta (x_i - x_j) - x_i \rVert_2 = (1 + \eta) \lVert p_j - x_i \rVert_2$.

8. An intelligent analysis system for stroke data distribution patterns, characterized by comprising: a data acquisition module, used to acquire stroke case data; a whitening processing module, used to perform ZCA whitening on the stroke case data to obtain whitened data; a dimensionality reduction module, used to reduce the dimensionality of the whitened data with a stacked sparse autoencoder to obtain dimensionality-reduced data; a clustering processing module, used to cluster the dimensionality-reduced data with a learning vector quantization clustering method optimized by deep reinforcement learning to obtain a clustering result; and a pattern generation module, used to generate a stroke data distribution pattern based on the clustering result.

9. The intelligent analysis system for stroke data distribution patterns according to claim 8, characterized in that the whitening processing module comprises: a matrix conversion unit, used to convert the stroke case data into a numerical matrix; a standardization unit, used to standardize the numerical matrix to obtain a first matrix; an eigenvalue determination unit, used to determine a sample covariance matrix of the first matrix and the eigenvalues of the sample covariance matrix; a feature extraction unit, used to sort the eigenvalues in descending order and extract the eigenvector of each sorted eigenvalue to obtain a second matrix; a matrix determination unit, used to determine a rotation matrix of the second matrix; and a whitening unit, used to determine the whitened data based on the rotation matrix.

10. The intelligent analysis system for stroke data distribution patterns according to claim 8, characterized in that the clustering processing module comprises: a clustering processing unit, used to cluster the dimensionality-reduced data with the learning vector quantization DQN-LVQ clustering method optimized by deep reinforcement learning to obtain a clustering result; the learning vector quantization clustering method optimized by deep reinforcement learning is a learning vector quantization clustering method that incorporates a deep reinforcement learning state set and a deep reinforcement learning action set.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210855844.9A CN115116619B (en) 2022-07-20 2022-07-20 A method and system for intelligent analysis of stroke data distribution patterns

Publications (2)

Publication Number Publication Date
CN115116619A true CN115116619A (en) 2022-09-27
CN115116619B CN115116619B (en) 2025-09-12

Family

ID=83335192

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210855844.9A Active CN115116619B (en) 2022-07-20 2022-07-20 A method and system for intelligent analysis of stroke data distribution patterns

Country Status (1)

Country Link
CN (1) CN115116619B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117558443A (en) * 2023-11-23 2024-02-13 南通大学 Intelligent analysis method for disease development and efficacy evaluation in patients with hemorrhagic stroke

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942564A (en) * 2014-04-08 2014-07-23 武汉大学 High-resolution remote sensing image scene classifying method based on unsupervised feature learning
CN103987118A (en) * 2014-05-19 2014-08-13 浙江师范大学 Access point k-means clustering method based on received signal strength signal ZCA whitening
CN104408469A (en) * 2014-11-28 2015-03-11 武汉大学 Firework identification method and firework identification system based on deep learning of image
CN111625576A (en) * 2020-05-15 2020-09-04 西北工业大学 A t-SNE-based score clustering analysis method
US20200410384A1 (en) * 2018-03-11 2020-12-31 President And Fellows Of Harvard College Hybrid quantum-classical generative models for learning data distributions
WO2022081646A1 (en) * 2020-10-14 2022-04-21 nference, inc. Noninvasive methods for detection of pulmonary hypertension
US20220180204A1 (en) * 2020-12-08 2022-06-09 International Business Machines Corporation Adversarial semi-supervised one-shot learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MACIEJ KUSY ET AL.: "Probabilistic neural network training procedure based on Q(0)-learning algorithm in medical data classification", APPLIED INTELLIGENCE, 6 August 2014 (2014-08-06), pages 837 - 854, XP035395820, DOI: 10.1007/s10489-014-0562-9 *
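SHI KAIYUE (史凯岳): "Research on the Optimization of the Learning Vector Quantization Clustering Algorithm and Its Application", China Outstanding Master's Theses Electronic Journal, 15 November 2024 (2024-11-15), pages 1 - 70 *
JI GUOHUA (纪国华) ET AL.: "Pulmonary Nodule Malignancy Grading Method Based on Hybrid Features and Compound LVQ", Graphics Processing and Multimedia Technology, 31 July 2019 (2019-07-31), pages 143 - 144 *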

Also Published As

Publication number Publication date
CN115116619B (en) 2025-09-12

Similar Documents

Publication Publication Date Title
CN112150209B (en) A Construction Method of CNN-LSTM Time Series Prediction Model Based on Cluster Center
Li et al. Shapenet: A shapelet-neural network approach for multivariate time series classification
Sheikh et al. Genetic algorithm based clustering: a survey
CN111899882B (en) A method and system for predicting cancer
Wang et al. Imbalance data processing strategy for protein interaction sites prediction
CN113541834B (en) Abnormal signal semi-supervised classification method and system and data processing terminal
CN113611368B (en) 2D Embedding-Based Semi-Supervised Single-Cell Clustering Method, Apparatus, and Computer Equipment
CN114358169B (en) Colorectal cancer detection system based on XGBoost
CN101324926B (en) Method for selecting characteristic facing to complicated mode classification
CN115100709B (en) Feature separation image face recognition and age estimation method
CN105160352A (en) High-dimensional data subspace clustering projection effect optimization method based on dimension reconstitution
Sugendran et al. Earlier identification of heart disease using enhanced genetic algorithm and fuzzy weight based support vector machine algorithm
CN109545372B (en) A feature selection method for patient physiological data based on distance greedy strategy
Chiu et al. An evolutionary approach to compact dag neural network optimization
CN115116619A (en) Intelligent analysis method and system for stroke data distribution rule
CN113707317A (en) Disease risk factor importance analysis method based on mixed model
WO2021114262A1 (en) Facial image clustering method and apparatus, and computer-readable storage medium
Abd-el Fattah et al. A TOPSIS based method for gene selection for cancer classification
CN115618225A (en) State identification method of traditional Chinese medicine based on graph attention network
CN113971984B (en) Classification model construction method and device, electronic equipment and storage medium
Platon et al. Self-organizing maps with supervised layer
CN114842425B (en) Abnormal behavior identification method for petrochemical process and electronic equipment
CN117637035A (en) Classification model and method for multiple groups of credible integration of students based on graph neural network
CN117668747A (en) A multi-modal data fusion network method based on hypergraph
CN118298930A (en) Gene cluster analysis method based on Gaussian distribution grain spheres

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant