
CN114819136B - Parallel deep convolutional neural network optimization method based on Im2col

Info

Publication number: CN114819136B
Authority: CN (China)
Prior art keywords: data, parallel, convolution, batch, strategy
Legal status: Active (granted)
Application number: CN202210279825.6A
Other languages: Chinese (zh)
Other versions: CN114819136A
Inventors: 毛伊敏, 戴经国, 龚克, 陈志刚, 霍英
Current Assignee: Shaoguan University
Original Assignee: Shaoguan University
Priority date / Filing date: 2022-03-21
Publication of CN114819136A: 2022-07-29
Publication of CN114819136B (grant): 2025-06-13
Application filed by Shaoguan University; priority to CN202210279825.6A.


Classifications

    • G06N3/045 Combinations of networks
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • Y02T10/40 Engine management systems


Abstract


The present invention proposes a parallel deep convolutional neural network optimization method based on Im2col, comprising the following steps: S1, parallel feature extraction: target features are extracted from the data and used as the input of the convolutional neural network; S2, parallel model training: during the convolution process of the parallel DCNN model training stage, distributed convolution kernel pruning and multi-node convolution computation are completed with the IM-PMTS strategy, and the model is trained in parallel by combining the MapReduce and Im2col methods; S3, parallel parameter update: in the back-propagation stage, the parameters are updated for batch data with the IM-BGDS strategy; S4, the data to be tested are fed into the DCNN model whose parameters have been updated in parallel, and the classification result is output. The proposed MHO-PFES strategy avoids the problem of numerous redundant data features; the IM-PMTS strategy improves the operation speed of the convolution layers; the IM-BGDS strategy eliminates the influence of abnormal data on batch gradients and solves the problem of poor convergence of the loss function.

Description

Parallel deep convolutional neural network optimization method based on Im2col
Technical Field
The invention relates to the field of big data mining, in particular to a parallel deep convolutional neural network optimization method based on Im2col.
Background
As an important classification algorithm in the field of deep learning, DCNN has strong characterization, generalization and fitting capability, delivers stable results, and requires no additional feature engineering on the data. It is widely applied to image classification, speech recognition, object detection, semantic segmentation, face recognition, autonomous driving and other fields, and has attracted extensive attention and in-depth study.
With the rapid development of internet technology and the arrival of the big data era, big data differs from traditional data in its large volume, high velocity, many varieties and high value. These 4V characteristics mean that training a DCNN model on massive data is extremely time-consuming, and the model parameters must be retrained repeatedly as the data and its patterns change. How to reduce the cost of DCNN model training in a big data environment has therefore become an urgent problem.
In recent years, the MapReduce parallel computing model developed by Google has been widely favored by researchers and enterprises for its ease of programming, high fault tolerance, load balancing and strong scalability, and many DCNN algorithms based on the MapReduce computing model have been studied. Leung J et al. proposed a MapReduce-based parallel DCNN algorithm that adopts a divide-and-conquer idea: the data are partitioned by the Split method of MapReduce, several computing nodes train DCNN models simultaneously, and the network model with the highest accuracy is selected as the output, thereby parallelizing DCNN training. On this basis, Huang X et al. proposed a parallel deep convolutional neural network algorithm FCNN (Fully CNN for processing CT scan images), which converts the full view into a sparse view and smooths feature edges with a Gaussian filter to enhance important texture information. Although this algorithm accelerates reading during the conversion from full view to sparse view, the feature structure of the sparse view is changed, which makes feature screening difficult, so the model still suffers from numerous redundant data features during training. Wang H et al. proposed a single-stride optimized CNN algorithm SSOCNN (An optimization of Im2col, an important method of CNNs, based on continuous address access), which accelerates the Im2col mapping of an image into a matrix under the single-stride condition by reading continuous memory addresses and changing the data reading order, and performs matrix multiplication between the column vectors and the convolution kernels with general matrix multiplication, thereby accelerating convolution-layer computation. However, when constructing parallel convolution operations, this algorithm has difficulty screening out the redundant convolution kernels scattered over the nodes, so the problem of slow convolution-layer computation remains in a big data environment. Mao et al. proposed the MR-FPDCNN algorithm (Deep convolutional neural network algorithm based on feature graph and parallel computing entropy using MapReduce), which combines DCNN with the firefly algorithm: an information-sharing search strategy is used with the firefly algorithm to find the optimal parameters of the network model, and the DCNN network parameters are shared through the MapReduce communication mechanism, accelerating the convergence of the loss function. However, the firefly algorithm has poor robustness; when abnormal data (mislabeled samples, noisy data, etc.) are processed, the loss function oscillates during convergence, so its convergence is poor.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular creatively provides a parallel deep convolutional neural network optimization method based on Im2col.
In order to achieve the above object of the present invention, the present invention provides a parallel deep convolutional neural network optimization method based on Im2col, comprising the steps of:
S1, parallel feature extraction: target features are extracted from the data and used as the input of the convolutional neural network, which solves the problem of numerous redundant data features;
S2, parallel model training: during the convolution process of the parallel DCNN model training stage, distributed convolution kernel pruning and multi-node convolution computation are completed with the IM-PMTS strategy, and the model is trained in parallel by combining the MapReduce and Im2col methods, which improves the operation speed of the convolution layers;
S3, parallel parameter update: in the back-propagation stage, the parameters are updated for batch data with the IM-BGDS strategy, a gradient descent method that excludes abnormal data points from the batch data and thereby avoids their influence on the batch gradient;
S4, the data to be tested are input into the DCNN model whose parameters have been updated in parallel, and the classification result is output.
Further, S1 performs parallel feature extraction with the MHO-PFES strategy, which comprises the following steps:
S1-1, feature extraction: the input data are filtered with an improved non-mean filter, the Laplace equation h(x, y) of the filtered data is computed, and the zero crossings of the Laplace equation are located to extract the data features;
S1-2, feature screening: to further screen the target features, a feature correlation index FCI(u, v) is proposed to compare the similarity between any two data blocks, a correlation coefficient ε is set, and redundant features in the data are reduced by removing the data blocks with FCI(u, v) < ε.
Further, the improved non-mean filter FT(a, b) is defined in terms of:
a, the target window matrix;
b, the neighborhood window matrix;
θ(·), the feature transformation function;
G_i, the current data;
$\vec{a}$ and $\vec{b}$, the vectorized representations of the matrices a and b, respectively;
|·|, the modulus of a vector.
Further, the feature correlation index FCI(u, v) is defined in terms of:
μ_u and μ_v, the expectations of u and v, respectively;
σ_u and σ_v, the variances of u and v, respectively;
u and v, the two feature vectors.
Further, the IM-PMTS strategy in S2 comprises the following steps:
S2-1, convolution kernel pruning: a Mahalanobis distance center value MDCV is designed; by solving for the MDCV, the vector linearly related to the convolution kernels in the network model is found, the distance dist from this vector to each convolution kernel is computed, a threshold α is set, and the convolution kernels with dist < α are pruned to reduce redundant parameters in the network model;
S2-2, parallel Im2col convolution: the feature maps are mapped into matrices with the Im2col algorithm, the matrices and their corresponding convolution kernels are stored as key-value pairs and distributed to the computing nodes for matrix operations to accelerate the convolution layer, and the convolution-layer results are obtained and stored in HDFS.
Further, the Mahalanobis distance center value MDCV is defined in terms of:
μ, the mean of all convolution kernels;
S, the covariance matrix of all convolution kernels;
$R_n = \{X_1, X_2, \ldots, X_n\}$, the set of convolution kernels of the same layer of the model, with $x \in R_n$ taken from the kernel set $\{X_1, X_2, \ldots, X_n\}$, where X_1, X_2, ..., X_n denote the convolution kernels in the network model;
T, the transpose.
Further, the IM-BGDS strategy comprises the following steps:
S3-1, gradient construction: a loss average weight LAW(g_i) is proposed to eliminate the influence of abnormal data on the batch gradient, and a loss sum gradient LSG(T) is designed to construct the average gradient of the batch data, which solves the problem of poor convergence of the loss function;
S3-2, parallel parameter update: after the average gradient of the batch data is obtained, the errors are computed in parallel by combining the MapReduce computing framework with the back-propagation error conduction formula, realizing parallel updates of the parameters.
Further, the loss average weight LAW(g_i) is
$$\mathrm{LAW}(g_i)=\begin{cases}1, & \mathrm{LAD}(g_i)<\tau\\ 0, & \mathrm{LAD}(g_i)\ge\tau\end{cases}$$
wherein
$$\mathrm{LAD}(g_i)=\left|J(\omega,b)_i-\frac{1}{\mathrm{batch\_size}}\sum_{j=1}^{\mathrm{batch\_size}}J(\omega,b)_j\right|,$$
where LAD(g_i) is the absolute value of the difference between the loss function value of data g_i and the mean loss function value;
g_i denotes one piece of data in the batch;
τ is the threshold on LAD(g_i);
batch_size denotes the batch data size;
J(ω, b)_i denotes the loss function value of data g_i;
ω and b are the convolution kernel parameters and the bias of the convolution layer, respectively.
Further, the loss sum gradient LSG(T) is
$$\mathrm{LSG}(T)=\frac{1}{\mathrm{batch\_size}}\sum_{g_i\in T}\mathrm{LAW}(g_i)\,\nabla J_{x_i},$$
where batch_size denotes the batch data size;
$\nabla J_{x_i}$ denotes the gradient of the loss function of data g_i with respect to the parameter x;
T denotes all data in the batch;
LAW(g_i) is the weight index of the loss function value of data g_i.
In summary, with the above technical scheme, the MHO-PFES strategy avoids the problem of numerous redundant data features, the IM-PMTS strategy improves the operation speed of the convolution layers, and the IM-BGDS strategy eliminates the influence of abnormal data on batch gradients and solves the problem of poor convergence of the loss function.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
FIG. 1 shows the speed-up ratio of each algorithm on the CIFAR10 and ImageNet 1K datasets, where FIG. 1(a) is the speed-up ratio of each algorithm on the CIFAR10 dataset and FIG. 1(b) is the speed-up ratio of each algorithm on the ImageNet 1K dataset.
FIG. 2 shows the Top-1 accuracy of each algorithm on CIFAR10 and ImageNet 1K, where FIG. 2(a) is the Top-1 accuracy of each algorithm on the CIFAR10 dataset and FIG. 2(b) is the Top-1 accuracy of each algorithm on the ImageNet 1K dataset.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the invention.
The invention provides a parallel deep convolutional neural network optimization method based on Im2col, which comprises the following steps:
S1, parallel feature extraction: target features are extracted in parallel from the medical image data as the input of the convolutional neural network;
S2, parallel model training: during the convolution process of the parallel DCNN model training stage, distributed convolution kernel pruning and multi-node convolution computation are completed with the IM-PMTS strategy, and the model is trained in parallel by combining the MapReduce and Im2col methods;
S3, parallel parameter update: in the back-propagation stage, the parameters are updated for the batch medical image data with the IM-BGDS strategy;
S4, the medical image data to be tested are input into the DCNN model whose parameters have been updated in parallel, and the classification result of the medical images is output.
Based on the advantages of the MapReduce programming model, the invention provides a parallel deep convolutional neural network optimization algorithm IA-PDCNNOA based on the Im2col algorithm. First, a parallel feature extraction strategy MHO-PFES (Parallel feature extraction strategy based on Marr-Hildreth operator) is proposed: target features are extracted from the data as the input of the convolutional neural network, effectively avoiding the problem of numerous redundant data features. Second, a parallel model training strategy IM-PMTS (Parallel model training strategy based on Im2col method) is designed: redundant convolution kernels are removed by designing a Mahalanobis distance center value, and the model is trained in parallel by combining the MapReduce and Im2col methods, improving the operation speed of the convolution layers. Finally, an improved mini-batch gradient descent strategy IM-BGDS (Improved mini-batch gradient descent strategy) is proposed, which eliminates the influence of abnormal data on batch gradients and solves the problem of poor convergence of the loss function. The proposed algorithm significantly improves both computational efficiency and model accuracy, and the knowledge mined by this method can be of great help in biology, medicine and astrophysics.
1. Feature parallel extraction
At present, parallel DCNN algorithms in a big data environment suffer from numerous redundant data features during model training. To solve this problem, an MHO-PFES strategy based on the Marr-Hildreth operator is proposed, which mainly comprises two steps: (1) feature extraction: the input data are filtered with an improved non-mean filter FT(a, b) (Filter transformation), the Laplace equation h(x, y) of the filtered data is computed, and the zero crossings of the Laplace equation are located to extract data features; (2) feature screening: to further screen the target features, a feature correlation index FCI(u, v) (Feature correlation index) is proposed to compare the similarity between any two data blocks, a correlation coefficient ε is set, and redundant features are reduced by removing the data blocks with FCI(u, v) < ε.
1.1 Feature extraction
To obtain high-precision data features, the initial data set is first denoised. A non-mean filter FT(a, b) based on cosine similarity is proposed, which removes data noise through the self-similarity of data in different regions; the Laplacian operation of a convolution kernel f(x, y) with the data g(x, y) is then carried out, and the zero crossings of the resulting Laplace equation are constructed and located to extract the data features. The specific process is as follows. First, a target window matrix a and a neighborhood window matrix b are set; the neighborhood window slides over the current data, the weight of the neighborhood window is obtained from the cosine similarity of matrices a and b, and the data are denoised according to this weight and the gray value of each point to obtain the denoised image g(x, y). Then a convolution kernel f(x, y) of size 3*3 is set, and the Laplace equation h(x, y) is obtained by the Laplacian operation of f(x, y) with g(x, y), where x and y denote the pixel coordinates of the image. Finally, it is judged whether the second derivative at the current node is a zero crossing and whether its first derivative lies at a sufficiently large peak; if both conditions are satisfied the node is retained, otherwise the pixel is set to zero. The retained data nodes are merged to obtain the feature-extracted data. In general, for a non-mean denoising algorithm, the data refers to image data.
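A minimal sketch of this extraction step is given below, assuming a Gaussian-smoothed Laplacian (SciPy's gaussian_laplace) in place of the improved non-mean filter FT(a, b) of Theorem 1; the function name, sigma and the peak threshold are illustrative assumptions rather than part of the invention.

```python
# Sketch: Marr-Hildreth-style feature extraction as zero crossings of the Laplacian.
# The Gaussian-Laplacian stands in for FT(a, b) filtering followed by the Laplacian
# operation; thresholds are illustrative only.
import numpy as np
from scipy.ndimage import gaussian_laplace

def extract_features(image, sigma=1.0, peak_ratio=0.05):
    """Return a boolean map marking zero crossings of the Laplace equation h(x, y)."""
    h = gaussian_laplace(image.astype(np.float64), sigma=sigma)   # h(x, y)
    zero_cross = np.zeros(h.shape, dtype=bool)
    peak = peak_ratio * np.abs(h).max()
    # Keep a pixel if the Laplacian changes sign against a neighbour and the jump is
    # large enough to indicate a strong first-derivative peak (a real edge).
    for dy, dx in ((0, 1), (1, 0), (1, 1), (1, -1)):
        centre = h[1:-1, 1:-1]
        neighbour = np.roll(np.roll(h, -dy, axis=0), -dx, axis=1)[1:-1, 1:-1]
        zero_cross[1:-1, 1:-1] |= (centre * neighbour < 0) & (np.abs(centre - neighbour) > peak)
    return zero_cross
```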
Theorem 1 (cosine-similarity-based non-mean filter FT(a, b)). Let a denote the target window matrix and b the neighborhood window matrix, with a, b ∈ (x, y), where (x, y) denotes the current data. The transformation function FT(a, b) is defined in terms of θ(·), the feature transformation function (for example a linear kernel or a Gaussian kernel), G_i, the current data, $\vec{a}$ and $\vec{b}$, the vectorized representations of the matrices a and b, and |·|, the modulus of a vector.
Proof. The non-local mean filtering principle exploits the fact that noise is uncorrelated. Let ω(p, q) denote the value of a noise-free pixel block, ψ(p, q) the noise, and ρ(p, q) = ω(p, q) + ψ(p, q) the value of the pixel block fused with noise. Averaging k similar pixel blocks gives
$$\bar{\rho}(p,q)=\frac{1}{k}\sum_{i=1}^{k}\rho_i(p,q),$$
where ρ_i(p, q) denotes the pixel value of the i-th block fused with noise and k is the total number of blocks. Its expectation is $E[\bar{\rho}(p,q)]=E[\omega_i(p,q)]+E[\psi(p,q)]$. Because the pixel blocks are similar, E[ω_i(p, q)] reduces to ω(p, q), and since the noise has zero mean, E[ψ(p, q)] = 0, so $E[\bar{\rho}(p,q)]=\omega(p,q)$. Furthermore, because the noise is uncorrelated and the noise-free ω(p, q) has variance 0, the variance of $\bar{\rho}(p,q)$ comes entirely from ψ(p, q). This shows that the noise ψ(p, q) governs the variance, and FT(a, b) reduces the data noise by reducing ψ(p, q). End of proof.
1.2 Feature screening
After feature extraction is completed, the strategy cuts the data in a batch into blocks, proposes the feature correlation index FCI(u, v) to compute the feature similarity between any two data blocks, and removes the data blocks with FCI(u, v) < ε to eliminate redundant features. The specific process is as follows. First, data of the same class are grouped into a batch, the data in the batch are cut into data blocks of the same size, and the blocks are numbered sequentially; the feature correlation index FCI(u, v) between any two data blocks is computed, and the key-value pairs <(u, v), FCI(u, v)> are stored in HDFS. Then a correlation coefficient ε is set, the key-value pairs are traversed in order, and the entries with FCI(u, v) < ε are removed. Finally, the remaining key-value pairs are traversed again and their keys are read to obtain the target feature blocks; the corresponding data form the target features, which are used as the input of the convolutional neural network, completing the feature screening.
Theorem 2 (feature correlation index FCI(u, v)). Let u and v denote two feature vectors, let μ_u, μ_v denote the expectations of u and v, and let σ_u, σ_v denote the variances of u and v, respectively. The feature correlation index FCI(u, v) is defined in terms of these expectations and variances.
Proof. FCI(u, v) is an index measuring the feature similarity between u and v, where μ_u, μ_v are the expectations and σ_u, σ_v the variances of u and v. When the feature vector u has σ_u = 0, the convolution operation on u is a purely linear superposition and no feature can be extracted, so FCI(u, v) = 0; when σ_u ≠ 0, σ_v ≠ 0 and the features of u and v are similar, FCI(u, v) → 1, where → denotes approach. End of proof.
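The block-screening loop of section 1.2 can be sketched as follows; the closed form of FCI(u, v) is the one in Theorem 2, so the correlation-style fci() below is only a stand-in built from the stated expectations and variances, and eps and the block layout are assumptions.

```python
# Sketch: pairwise block scoring and removal of entries with FCI < eps; the blocks
# named by the surviving key-value pairs are kept as target feature blocks.
import numpy as np
from itertools import combinations

def fci(u, v):
    """Correlation-style similarity: 0 for a constant block, close to 1 for similar blocks."""
    su, sv = u.std(), v.std()
    if su == 0 or sv == 0:           # convolution over a constant block extracts no feature
        return 0.0
    return float(np.mean((u - u.mean()) * (v - v.mean())) / (su * sv))

def screen_blocks(blocks, eps=0.2):
    """blocks: list of equally sized ndarray data blocks from one batch."""
    pairs = {(i, j): fci(blocks[i].ravel(), blocks[j].ravel())
             for i, j in combinations(range(len(blocks)), 2)}
    surviving = [pair for pair, score in pairs.items() if score >= eps]   # drop FCI < eps
    keep = sorted({idx for pair in surviving for idx in pair})
    return [blocks[k] for k in keep]
```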
2. Model parallel training
In current DCNN algorithms under a big data environment, parallel model training requires distributing the feature maps and convolution kernels to different computing nodes. However, while constructing the parallel convolution operation, such algorithms have difficulty screening out the redundant convolution kernels scattered over the nodes, so the problem of slow convolution-layer computation cannot be solved in a big data environment. To address this, the IM-PMTS strategy is proposed, which mainly comprises: (1) convolution kernel pruning: a Mahalanobis distance center value (MDCV) is designed; by solving for the MDCV, the vector linearly related to the convolution kernels in the network model is found, the distance dist from this vector to each convolution kernel is computed, a threshold α is set, and the convolution kernels with dist < α are pruned to reduce redundant parameters in the network model; (2) parallel Im2col convolution: the feature maps are mapped into matrices with the Im2col algorithm, the matrices and their corresponding convolution kernels are stored as key-value pairs and distributed to the computing nodes to accelerate the convolution layer, and the convolution results are obtained and stored in HDFS (Hadoop distributed file system).
2.1 Convolution kernel pruning
To reduce the invalid computation caused by redundant convolution kernels in the convolutional neural network, the Mahalanobis distance center value MDCV is designed to screen out the redundant convolution kernels of the current convolution layer and further accelerate its operation. The specific process is as follows. First, the covariance matrix S and the mean μ of all convolution kernels X_1, X_2, ..., X_n of the convolution layer are computed to construct the objective function f(x) of the MDCV. Then the second-order Taylor expansion of f(x) at the current point x_k is computed, where ∇ denotes the gradient operator, (·)^T denotes the transpose and k is the iteration count; if the second derivative (Hessian) at x_k is non-singular, the next iterate x_{k+1} is obtained directly from the resulting Newton step, and if it is singular, a search direction is first solved for and the iterate is then updated. Finally, the distances dist from all convolution kernels of the layer to the MDCV value are computed, a threshold α is set, and the convolution kernels with dist < α are pruned, completing the convolution kernel pruning process.
Theorem 3 (Mahalanobis distance center value MDCV). Let X_1, X_2, ..., X_n denote the convolution kernels in the network model, S the covariance matrix of all convolution kernels, and μ the mean of all convolution kernels. The Mahalanobis distance center value MDCV is defined over the set $R_n = \{X_1, X_2, \ldots, X_n\}$ of convolution kernels of the same layer of the model, where T denotes the transpose.
Proof. MDCV is the minimum distance from a feature vector x to the feature vector set X_1, X_2, ..., X_n, where S is the covariance matrix of the vector set and μ is its mean; the covariance matrix S is introduced to exclude the interference of correlations between the variables. The closer the feature vector x is to the MDCV value, the more easily x can be replaced by the feature vector set; when x = MDCV, the MDCV value is the minimum distance from the feature vector x* to the feature vector set X_1, X_2, ..., X_n, and x is linearly related to X_1, X_2, ..., X_n. End of proof.
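A short sketch of the pruning rule dist < α follows. The patent locates the MDCV vector with the Newton-type search of section 2.1; here the mean kernel μ is used as the distance centre (which the proof above relates the MDCV to), and the small ridge added to S is only there so the inverse exists when a layer has few kernels.

```python
# Sketch: Mahalanobis-distance pruning of one layer's convolution kernels.
import numpy as np

def prune_kernels(kernels, alpha):
    """kernels: (n, d) array, one flattened convolution kernel per row.
    Returns the kernels whose Mahalanobis distance to the centre is >= alpha."""
    mu = kernels.mean(axis=0)
    S = np.cov(kernels, rowvar=False) + 1e-6 * np.eye(kernels.shape[1])
    S_inv = np.linalg.inv(S)
    diff = kernels - mu
    dist = np.sqrt(np.einsum("ij,jk,ik->i", diff, S_inv, diff))   # distance of each kernel
    return kernels[dist >= alpha]                                  # kernels with dist < alpha are cut
```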
2.2 Parallel Im2col convolutions
After convolution kernel pruning is completed, the parallel Im2col convolution is realized with the MapReduce computing framework. The specific process is as follows. First, each input feature map M_i is mapped into a convolution computation matrix I_i by the Im2col method, and each mapping matrix I_i is stored with its corresponding convolution kernels as key-value pairs <I_i, K_z>, where K_z denotes a convolution kernel corresponding to the convolution computation matrix I_i (a many-to-many relation). Then the Map() function is called, and the matrix I_i of each key-value pair is multiplied with the one-dimensional vector of its corresponding convolution kernel to obtain an intermediate convolution result. Finally, the Reduce() function is called to merge the feature maps belonging to the same piece of data, yielding the final output feature map NM_i.
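A serial sketch of this Im2col-plus-GEMM convolution is given below. In the patent each <I_i, K_z> key-value pair is shipped to a compute node by Map() and the partial results are merged by Reduce(); the plain function calls here stand in for that distribution, and the channel-first, no-padding layout is an assumption.

```python
# Sketch: map a feature map to a column matrix (Im2col) and replace the sliding-window
# convolution with one general matrix multiplication (GEMM).
import numpy as np

def im2col(x, kh, kw, stride=1):
    """x: (C, H, W) feature map -> (C*kh*kw, out_h*out_w) convolution matrix I_i."""
    C, H, W = x.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    cols = np.empty((C * kh * kw, out_h * out_w), dtype=x.dtype)
    col = 0
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i * stride:i * stride + kh, j * stride:j * stride + kw]
            cols[:, col] = patch.ravel()
            col += 1
    return cols

def conv_layer(x, kernels, stride=1):
    """kernels: (N, C, kh, kw) pruned convolution kernels of one layer."""
    N, C, kh, kw = kernels.shape
    I = im2col(x, kh, kw, stride)            # mapping matrix I_i
    K = kernels.reshape(N, -1)               # one row vector per convolution kernel
    out = K @ I                              # general matrix multiplication
    out_h = (x.shape[1] - kh) // stride + 1
    out_w = (x.shape[2] - kw) // stride + 1
    return out.reshape(N, out_h, out_w)      # output feature map NM_i
```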
3. Parameter parallel update
Current parallel DCNN algorithms under big data update parameters in back propagation with stochastic gradient descent or batch gradient descent. However, during gradient descent, training the DCNN model on abnormal data (mislabeled samples, noisy data, etc.) causes the loss function to oscillate during convergence, resulting in poor convergence. To solve this problem, the IM-BGDS strategy is proposed, which mainly comprises two steps: (1) gradient construction: a loss average weight LAW(g_i) (Loss average weight) is proposed to eliminate the influence of abnormal data on the batch gradient, and a loss sum gradient LSG(T) (Loss sum gradient) is designed to construct the average gradient of the batch data, solving the problem of poor convergence of the loss function; (2) parallel parameter update: after the average gradient of the batch data is obtained, the errors are computed in parallel by combining the MapReduce computing framework with the back-propagation error conduction formula, realizing parallel updates of the parameters.
(1) Gradient construction
To eliminate the influence of abnormal data on the batch gradient, the loss average weight LAW(g_i) and the loss sum gradient LSG(T) are designed to solve the problem of poor convergence of the loss function. The specific process is as follows. First, when the parameters are updated, the mean of the loss function over the whole batch is computed and compared with the loss function value of each piece of data g_i to construct the loss average weight LAW(g_i), and the key-value pairs <g_i, LAW(g_i)> are stored in HDFS. Then the partial derivative of the loss function of each piece of data g_i with respect to the current parameter δ_z is computed, and the corresponding key-value pairs are also stored in HDFS. Finally, with LAW(g_i) taking the value 0 or 1 for each sample, the key-value pairs <g_i, LAW(g_i)> and the stored gradients are traversed with g_i as the index, and the average gradient LSG(T) of the batch data is constructed to obtain the batch gradient for the current parameter.
Theorem 4 (loss average weight LAW(g_i)). Let g_i denote one piece of data in the batch, J(ω, b)_i the loss function value of data g_i, ω and b the convolution kernel parameters and the bias of the convolution layer, batch_size the batch data size, and LAD(g_i) the absolute value of the difference between the loss function value of data g_i and the mean loss function value. The loss average weight LAW(g_i) is computed as
$$\mathrm{LAW}(g_i)=\begin{cases}1, & \mathrm{LAD}(g_i)<\tau\\ 0, & \mathrm{LAD}(g_i)\ge\tau\end{cases}$$
wherein
$$\mathrm{LAD}(g_i)=\left|J(\omega,b)_i-\frac{1}{\mathrm{batch\_size}}\sum_{j=1}^{\mathrm{batch\_size}}J(\omega,b)_j\right|.$$
Proof. LAW(g_i) is a weight index of the loss function value of data g_i, batch_size is the batch data size, and τ is the threshold for measuring LAD(g_i). When LAD(g_i) < τ, the loss function value of the current data g_i is a normal value and the data are kept, so LAW(g_i) = 1; when LAD(g_i) > τ, the loss function value of the current data g_i is an abnormal value, so LAW(g_i) = 0. End of proof.
Theorem 5 (loss sum gradient LSG(T)). Let T denote all data in the batch, $\nabla J_{x_i}$ the gradient of the loss function of data g_i with respect to the parameter x, and batch_size the batch data size. The loss sum gradient LSG(T) is computed as
$$\mathrm{LSG}(T)=\frac{1}{\mathrm{batch\_size}}\sum_{g_i\in T}\mathrm{LAW}(g_i)\,\nabla J_{x_i}.$$
Proof. LSG(T) is the average gradient of the data batch, where $\nabla J_{x_i}$ is the gradient of the loss function of data g_i with respect to the parameter x and batch_size is the batch size. When LAW(g_i) = 1, the gradient of data g_i descends toward the optimal direction; when LAW(g_i) = 0, the gradient of data g_i deviates strongly from the optimal direction and is not counted in the LSG(T) gradient. End of proof.
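The two theorems translate into a few lines of code; the sketch below assumes the per-sample losses and per-sample gradients are already available as arrays, and the divisor batch_size follows the LSG(T) definition given above.

```python
# Sketch: LAW weights drop outlier samples, LSG averages the weighted per-sample gradients.
import numpy as np

def law_weights(losses, tau):
    """losses: J(w, b)_i for each sample g_i in the batch -> LAW(g_i) in {0, 1}."""
    lad = np.abs(losses - losses.mean())      # LAD(g_i)
    return (lad < tau).astype(losses.dtype)

def lsg(per_sample_grads, weights):
    """per_sample_grads: (batch_size, ...) gradients of each sample w.r.t. one parameter x.
    Returns LSG(T), the outlier-free average gradient of the batch."""
    batch_size = per_sample_grads.shape[0]
    w = weights.reshape((batch_size,) + (1,) * (per_sample_grads.ndim - 1))
    return (w * per_sample_grads).sum(axis=0) / batch_size
```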
(2) Parameter parallel update
After the average gradient of the batch data is obtained, the error-term parameters are updated in parallel with the error back-propagation algorithm, combined with the MapReduce computing framework, to obtain the network model whose parameters have been updated in parallel. The parameter update process is as follows. First, the gradients of all parameters of the layer l-1 convolution kernels are computed from the errors propagated back from layer l, and the results are stored in HDFS as key-value pairs. Then the change of each convolution kernel parameter in the network model is computed from the corresponding gradient, and the network parameters of the layer l-1 convolution kernels are updated, where r is the convolution kernel index. Finally, the updated parameters are synchronized to all computing nodes through HDFS, and the next update is carried out until all parameters in the network model have been updated. The range of values of l depends on the number of convolution layers of the network model employed.
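The update loop of this step can be sketched as below, with the HDFS/MapReduce synchronisation replaced by a plain average of the per-node LSG gradients; the learning rate and the dict-of-arrays parameter layout are illustrative assumptions.

```python
# Sketch: merge the batch gradients produced by each compute node and apply one
# gradient-descent step to every parameter of the network model.
def parallel_update(params, node_grads, lr=0.01):
    """params: {name: ndarray}; node_grads: one {name: ndarray} dict per compute node."""
    n_nodes = len(node_grads)
    updated = {}
    for name, value in params.items():
        mean_grad = sum(g[name] for g in node_grads) / n_nodes   # driver-side merge
        updated[name] = value - lr * mean_grad                   # descent step
    return updated
```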
4. Effectiveness of parallel deep convolutional neural network optimization algorithm (IA-PDCNNOA) based on Im2col
To verify the performance of the algorithm IA-PDCNNOA, the IA-PDCNNOA method is applied to the ImageNet 1K and CIFAR10 datasets, whose details are shown in Table 1, and it is compared with the MR-FPDCNN, SSOCNN and FCNN algorithms in terms of parallelism, classification accuracy and other metrics.
Table 1 Data set details

Item                        CIFAR10    ImageNet 1K
Number of pictures/sheets   60 000     1 281 167
Picture size/pixels         32*32      224*224
Number of categories        10         1 000
4.1 IA-PDCNNOA algorithm speed-up ratio experimental analysis
To verify the parallel performance of the IA-PDCNNOA algorithm in a big data environment, the speed-up ratio is used as the metric on the CIFAR10 and ImageNet 1K datasets, and the algorithm is compared with the MR-FPDCNN, SSOCNN and FCNN algorithms, respectively. To ensure the accuracy of the experimental results, the speed-up ratio of each algorithm is computed from its average running time over 10 runs. The experimental results are shown in FIG. 1.
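For reference, the speed-up ratio reported here is assumed to be the usual ratio of the averaged single-node running time to the averaged n-node running time over the 10 runs mentioned above.

```python
# Sketch: speed-up ratio from repeated timing runs (assumed definition T_1 / T_n).
def speedup(single_node_times, n_node_times):
    t1 = sum(single_node_times) / len(single_node_times)
    tn = sum(n_node_times) / len(n_node_times)
    return t1 / tn
```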
As can be seen from FIG. 1(a), when processing a relatively small dataset such as CIFAR10, the speed-up ratio of each algorithm increases slowly with the number of nodes; with 4 cluster nodes the speed-up ratio of IA-PDCNNOA is 0.3 and 0.5 lower, respectively, than the weakly parallelized FCNN and SSOCNN algorithms. In FIG. 1(b), however, when processing the larger ImageNet 1K dataset, the speed-up ratio of the IA-PDCNNOA algorithm grows much faster, reaching 9.8 with 8 cluster nodes, which is 1.1, 4.1 and 4.6 higher than the MR-FPDCNN, FCNN and SSOCNN algorithms, respectively. The reason for these results is that when IA-PDCNNOA processes a relatively small dataset, distributing the data to the computing nodes sharply increases the inter-node communication overhead, so the speed gained from parallelization is very limited. When IA-PDCNNOA processes a large dataset, the designed IM-PMTS strategy reduces the network-communication overhead of convolution-layer parameters by pruning same-layer convolution kernels with the proposed Mahalanobis distance center value MDCV, and the convolution operation is then accelerated by parallel training that combines the MapReduce and Im2col methods, which improves the operation speed of the convolution layers and hence the speed-up ratio. The experiments show that the parallel capability of the IA-PDCNNOA algorithm improves markedly as the number of cluster nodes grows, so the method is well suited to parallel processing of large datasets.
4.2 IA-PDCNNOA algorithm accuracy experimental analysis
To further verify the training effect of the IA-PDCNNOA algorithm, the Top-1 accuracy is used as the metric. IA-PDCNNOA, MR-FPDCNN, SSOCNN and FCNN are run on the CIFAR10 and ImageNet 1K datasets, respectively, and the Top-1 accuracy is calculated as the experimental result, as shown in FIG. 2.
As can be seen from FIG. 2(a), when processing a relatively small dataset such as CIFAR10, the Top-1 accuracy of every algorithm stabilizes at a high value; among them the IA-PDCNNOA algorithm has the highest Top-1 accuracy and converges earliest, reaching 89.72%, which is 2.87%, 4.62% and 6.48% higher than the MR-FPDCNN, SSOCNN and FCNN algorithms, respectively. When processing the larger ImageNet 1K dataset, FIG. 2(b) shows large differences in both Top-1 accuracy and convergence: IA-PDCNNOA again has the highest Top-1 accuracy of the four parallel algorithms, reaching 72.41%, which is 2.31%, 7.98% and 2.85% higher than MR-FPDCNN, SSOCNN and FCNN, respectively, while the other three algorithms struggle to converge to different degrees. These results arise because the IA-PDCNNOA algorithm proposes the IM-BGDS strategy, which designs the loss sum gradient LSG(T) to construct the mini-batch gradient and updates the parameters in parallel with the error back-propagation algorithm, eliminating the influence of abnormal data on batch gradients and strengthening the convergence of IA-PDCNNOA. The experimental data show that, compared with the other three parallel algorithms, IA-PDCNNOA converges faster and achieves higher accuracy, and is suitable for parallel training of deep convolutional neural network models on large datasets.
4.3 IA-PDCNNOA algorithm running time and FLOPs experimental analysis
To verify the execution speed and model-optimization effect of the IA-PDCNNOA algorithm in a big data environment, the running times and FLOPs of Baseline, IA-PDCNNOA, MR-FPDCNN, SSOCNN and FCNN are measured on the CIFAR10 and ImageNet 1K datasets, respectively, where Baseline is the baseline of the ResNet model at 1/8 of the data load. The experimental results are shown in Table 2.
Table 2 Running time and FLOPs of each algorithm on the two datasets
As can be seen from Table 2, when processing a relatively small dataset such as CIFAR10, the running times of the algorithms differ little, but their floating-point operations are reduced to different degrees; the floating-point operations of IA-PDCNNOA are 5%, 21% and 16% lower than those of the MR-FPDCNN, SSOCNN and FCNN algorithms, respectively. When processing a large dataset such as ImageNet 1K, both the running time and the floating-point operations of IA-PDCNNOA are better than those of the other three algorithms: IA-PDCNNOA runs 1.32×10^4 s, 3.85×10^4 s and 5.29×10^4 s faster than MR-FPDCNN, SSOCNN and FCNN, respectively, and its floating-point operations are 3%, 13% and 8% lower. These results arise because the MHO-PFES strategy proposed by the IA-PDCNNOA algorithm removes redundant features from the data with the feature correlation index FCI(u, v) and screens the target features of the data as the input of the convolutional neural network, reducing the floating-point operations of the model and speeding up the algorithm. In general, comparing the running time and floating-point operation trends of the four algorithms on CIFAR10 and ImageNet 1K, the reduction in running time and floating-point operations of IA-PDCNNOA widens its lead over the other algorithms as the training dataset grows. It can therefore be concluded that IA-PDCNNOA is superior to MR-FPDCNN, SSOCNN and FCNN, and the method is suitable for parallel training of DCNN models on large datasets.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (3)

1. A parallel deep convolutional neural network optimization method based on Im2col, characterized by comprising the following steps:
S1, parallel feature extraction: extracting target features from medical image data as the input of the convolutional neural network;
S2, parallel model training: during the convolution process of the parallel DCNN model training stage, completing distributed convolution kernel pruning and multi-node convolution computation with the IM-PMTS strategy, and training the model in parallel by combining the MapReduce and Im2col methods;
S3, parallel parameter update: in the back-propagation stage, updating the parameters for the batch medical image data with the IM-BGDS strategy;
S4, inputting the medical image data to be tested into the DCNN model whose parameters have been updated in parallel, and outputting the classification result of the medical images;
wherein S1 performs parallel feature extraction with the MHO-PFES strategy, which comprises the following steps:
S1-1, feature extraction: filtering the input data with an improved non-mean filter, computing the Laplace equation h(x, y) of the filtered data, and locating the zero crossings of the Laplace equation to extract data features;
S1-2, feature screening: to further screen the target features, proposing a feature correlation index FCI(u, v) to compare the similarity between any two data blocks, setting a correlation coefficient ε, and reducing redundant features in the data by removing the data blocks with FCI(u, v) < ε;
wherein the feature correlation index FCI(u, v) is defined in terms of:
μ_u and μ_v, the expectations of u and v, respectively;
σ_u and σ_v, the variances of u and v, respectively;
u and v, the two feature vectors;
wherein the IM-PMTS strategy in S2 comprises the following steps:
S2-1, convolution kernel pruning: designing a Mahalanobis distance center value MDCV, finding the vector linearly related to the convolution kernels in the network model by solving for the MDCV, computing the distance dist from this vector to each convolution kernel, setting a threshold α, and pruning the convolution kernels with dist < α to reduce redundant parameters in the network model;
S2-2, parallel Im2col convolution: mapping the feature maps into matrices with the Im2col algorithm, storing the matrices and their corresponding convolution kernels as key-value pairs, distributing them to the computing nodes for matrix operations to accelerate the convolution layer, obtaining the convolution-layer results, and storing the results in HDFS;
wherein the Mahalanobis distance center value MDCV is defined in terms of:
μ, the mean of all convolution kernels;
S, the covariance matrix of all convolution kernels;
R_n, the set of convolution kernels of the same layer of the model;
T, the transpose;
wherein the IM-BGDS strategy comprises the following steps:
S3-1, gradient construction: proposing a loss average weight LAW(g_i) to eliminate the influence of abnormal data on the batch gradient, and designing a loss sum gradient LSG(T) to construct the average gradient of the batch data, which solves the problem of poor convergence of the loss function;
S3-2, parallel parameter update: after obtaining the average gradient of the batch data, computing the errors in parallel by combining the MapReduce computing framework with the back-propagation error conduction formula, realizing parallel updates of the parameters;
wherein the loss average weight LAW(g_i) is defined in terms of:
LAD(g_i), the absolute value of the difference between the loss function value of data g_i and the mean loss function value;
g_i, one piece of data in the batch;
τ, the threshold for measuring LAD(g_i);
batch_size, the batch data size;
J(ω, b)_i, the loss function value of data g_i;
ω and b, the convolution kernel parameters and the bias of the convolution layer, respectively.
2. The parallel deep convolutional neural network optimization method based on Im2col according to claim 1, characterized in that the improved non-mean filter FT(a, b) is defined in terms of:
a, the target window matrix;
b, the neighborhood window matrix;
θ(·), the feature transformation function;
G_i, the current data;
$\vec{a}$ and $\vec{b}$, the vectorized representations of the matrices a and b, respectively;
|·|, the modulus of a vector.
3. The parallel deep convolutional neural network optimization method based on Im2col according to claim 1, characterized in that the loss sum gradient LSG(T) is defined in terms of:
batch_size, the batch data size;
$\nabla J_{x_i}$, the gradient of the loss function of data g_i with respect to the parameter x;
T, all data in the batch;
LAW(g_i), the weight index of the loss function value of data g_i.




Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant