
CN112884123B - Neural network optimization method, device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN112884123B
Authority
CN
China
Prior art keywords: fusion, subnet, layer, network, optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110204808.1A
Other languages
Chinese (zh)
Other versions
CN112884123A (en)
Inventor
张凯
谭文明
李哲暘
张如意
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110204808.1A priority Critical patent/CN112884123B/en
Publication of CN112884123A publication Critical patent/CN112884123A/en
Application granted granted Critical
Publication of CN112884123B publication Critical patent/CN112884123B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods


Abstract

This application provides a neural network optimization method, a device, an electronic device, and a readable storage medium. The neural network optimization method includes: dividing a neural network to be optimized into subnets, and performing network-layer (layer) fusion on each subnet according to a preset fusion rule and a fusion target, to obtain an optimal fusion result for each subnet; and performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the neural network to be optimized. The method improves the efficiency of determining the optimal fusion result of the neural network to be optimized while guaranteeing that the result obtained is optimal under the preset fusion rule and the fusion target.

Description

Neural network optimization method and device, electronic equipment and readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a neural network optimization method, a device, an electronic device, and a readable storage medium.
Background
Neural networks (NNs) are a central focus of research in the field of artificial intelligence, and their huge computational load and bandwidth requirements have become the main bottlenecks for deploying them.
In order to reduce the bandwidth requirement of a neural network deployed on a computing platform, the neural network can be optimized by fusing multiple layers (network layers) into one level (a basic hardware computing unit, hereinafter called a level) according to the constraints of the computing platform. The input and output of the layers fused into one level then occupy bandwidth only once, which reduces the frequency of data interaction between the computing platform and external storage.
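For a hedged numerical illustration (the symbols B_in, B_mid, and B_out are assumed here for exposition and do not come from the patent): if layer1 reads an input feature map of size B_in and writes an intermediate map of size B_mid, which layer2 then reads to produce an output of size B_out, running the two layers as separate levels costs B_in + 2*B_mid + B_out of external bandwidth, whereas fusing them into one level costs only B_in + B_out, since the intermediate map never leaves the on-chip cache.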
In practice, different fusion schemes reduce the bandwidth requirement to different degrees, and how to minimize the bandwidth requirement of a neural network through fusion has become an urgent technical problem.
Disclosure of Invention
In view of the foregoing, the present application provides a neural network optimization method, a neural network optimization device, an electronic device, and a readable storage medium.
Specifically, the application is realized by the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a neural network optimization method, including:
carrying out sub-network division on the neural network to be optimized, and carrying out network layer fusion on each sub-network according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each sub-network;
performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level comprising a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being the subnet whose layers are the same as those included in the level.
According to a second aspect of embodiments of the present application, there is provided a neural network optimization device, including:
the division unit is used for carrying out subnet division on the neural network to be optimized;
the optimizing unit is used for respectively carrying out network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet;
the optimizing unit is further configured to perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level comprising a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being the subnet whose layers are the same as those included in the level.
According to a third aspect of the embodiments of the present application, there is provided an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the neural network optimization method described above.
According to a fourth aspect of the embodiments of the present application, there is provided a machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the neural network optimization method described above.
The technical scheme that this application provided can bring following beneficial effect at least:
subnet division is performed on the neural network to be optimized, and network-layer fusion is performed on each subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each subnet; layer fusion is then performed on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the neural network to be optimized. This improves the efficiency of determining the optimal fusion result of the neural network to be optimized while guaranteeing that the result obtained is optimal under the preset fusion rule and the fusion target.
Drawings
FIG. 1 is a flow chart of a neural network optimization method according to an exemplary embodiment of the present application;
fig. 2 is a schematic flow chart, according to an exemplary embodiment of the present application, of performing subnet division on a neural network to be optimized and performing network-layer fusion on each subnet according to a preset fusion rule and a fusion target;
FIG. 3A is a schematic diagram of a neural network according to an exemplary embodiment of the present application;
FIG. 3B is a schematic diagram of an optimal fusion result under a greedy fusion scheme as illustrated in an exemplary embodiment of the present application;
FIG. 3C is a schematic diagram of an optimal fusion result obtained by using the technical solution provided in the embodiments of the present application according to an exemplary embodiment of the present application;
FIG. 4 is a schematic structural diagram of a neural network optimization device according to an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the accompanying claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, some technical terms involved in the embodiments are first briefly explained below.
layer: on the network side, a basic constituent unit of a neural network, such as a convolutional layer or a pooling layer;
level: a basic computing unit when the neural network is deployed on the computing platform; one or more layers form a level;
in practical applications there may be cases where one layer is split across multiple levels, but such cases are rare.
BW (bandwidth): the throughput of data; the BW of the input and output of a neural network running on a computing platform can be understood as the bandwidth required by the neural network;
ker: broadly refers to the convolutional-layer weights (coefficients);
coefficient cache: the cache in the computing platform that stores kers;
map cache: the cache in the computing platform that stores feature maps.
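As a concrete illustration of how these notions might be represented, the following minimal Python sketch is given; it is an assumption for exposition (the names Layer, FusionRule, and the byte-counting fields are invented here, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class Layer:
    """One network layer with its coefficient size and feature-map sizes."""
    name: str
    ker_bytes: int       # size of the layer's weights (the "ker" / coefficients)
    in_map_bytes: int    # size of the layer's input feature map
    out_map_bytes: int   # size of the layer's output feature map

@dataclass
class FusionRule:
    """Fusion-rule limits: a fused level must fit both on-chip caches."""
    max_ker_cache: int   # coefficient-cache limit
    max_map_cache: int   # map-cache limit

    def allows(self, layers: list) -> bool:
        # All coefficients of the fused level must fit the coefficient cache,
        # and every intermediate feature map must fit the map cache.
        ker_total = sum(l.ker_bytes for l in layers)
        map_peak = max((l.out_map_bytes for l in layers[:-1]), default=0)
        return ker_total <= self.max_ker_cache and map_peak <= self.max_map_cache
```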
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, the following describes the technical solutions of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, which is a flow chart of a neural network optimization method provided in an embodiment of the present application, the neural network optimization method may include the following steps:
Step S100: perform subnet division on the neural network to be optimized, and perform network-layer fusion on each subnet according to a preset fusion rule and a fusion target, to obtain the optimal fusion result of each subnet.
In the embodiment of the application, in order to reduce the bandwidth requirement of a neural network deployed on a computing platform, the layers of the neural network can be fused to reduce the data-interaction bandwidth between the neural network and external storage, thereby optimizing the neural network.
For example, for a neural network to be optimized (herein called the neural network to be optimized), subnet division may first be performed to obtain a plurality of subnets, and layer fusion may then be performed on each subnet according to a preset fusion rule and a fusion target (herein called the preset fusion rule and fusion target), so as to obtain the optimal fusion result of each subnet.
For example, an N-layer neural network (N ≥ 3) may be divided into a plurality of 2-layer subnets.
For example, assuming that N = 3 (the neural network includes layer1, layer2, and layer3), the 2-layer subnets may include a subnet of layer1 and layer2 and a subnet of layer2 and layer3; for a nonlinear network, a subnet of layer1 and layer3 may also be included.
For ease of understanding and explanation, a linear network will be described hereinafter, i.e., layer1 is directly connected to layer2, and layer2 is directly connected to layer 3.
The neural network to be optimized can be divided according to a plurality of different subnet division modes, yielding a plurality of different types of subnets, and layer fusion is performed on each type of subnet separately.
For example, taking an N-layer neural network with N = 4, the neural network may be divided into a plurality of 2-layer subnets under one division mode, with layer fusion performed on each 2-layer subnet, and divided into a plurality of 3-layer subnets under another division mode, with layer fusion performed on each 3-layer subnet.
Illustratively, the fusion rule is used to restrict the layers participating in fusion, and may include, but is not limited to, a ker-cache limit (i.e., a coefficient-cache limit), a Map-cache limit, a layer-to-layer fusion limit, and the like.
For example, a maximum ker cache (i.e., coefficient cache) may be preset; the total coefficients of a fused level cannot exceed this maximum, which limits the number of layers that can participate in layer fusion.
The fusion target characterizes the purpose of performing layer fusion on the neural network, for example, minimizing the bandwidth requirement of the deployed neural network (i.e., minimizing the throughput of data interaction between the neural network and external storage).
Step S110: perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being the subnet whose layers are the same as those included in the level.
In the embodiment of the present application, considering the layer-fusion process of the neural network to be optimized, at least one candidate fusion scheme uses the optimal fusion result of a subnet obtained in step S100.
For example, for an N-layer neural network with N = 3, the candidate fusion schemes (each of which must satisfy the preset fusion rule; the same applies below) may include the following:
scheme 1, each layer is not fused;
scheme 2, layer1 and layer2 are fused, layer3 does not participate in the fusion;
scheme 3, layer2 and layer3 are fused, layer1 does not participate in the fusion;
scheme 4, layer1, layer2, and layer3 fusion (assuming 3 layer fusion can meet preset fusion rule requirements).
Both scheme 2 and scheme 3 require the fusion result of a 2-layer subnet.
For example, when layer1 and layer2 are fused in scheme 2, the fusion result used is the optimal fusion result of the 2-layer subnet consisting of layer1 and layer2.
That is, when determining the optimal fusion result of the neural network to be optimized, the optimal fusion results of its subnets are needed. Therefore, subnet division is performed on the neural network to be optimized, layer fusion is performed on each subnet to obtain the optimal fusion result of each subnet, and layer fusion is then performed on the neural network to be optimized according to these subnet results. This simplifies the computation for determining the optimal fusion result of the neural network to be optimized and improves its efficiency.
For example, for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, if there is a subnet (herein called the target subnet) whose layers are the same as those included in the level, the structure of the level in the optimal fusion result of the neural network to be optimized is consistent with the structure of the optimal fusion result of the target subnet.
In the above example, assuming that the optimal fusion result of the neural network to be optimized is scheme 3, for the level including layer2 and layer3, the structure of the level is identical to that of the optimal fusion result of the subnet consisting of layer2 and layer3.
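To make the enumeration of candidate schemes concrete, the following hedged Python sketch lists every way to split a linear chain of layers into contiguous levels; the helper name partitions is an illustrative assumption, and fusion-rule checking is omitted for brevity:

```python
def partitions(layers):
    """Yield every way to split a linear chain of layers into contiguous levels."""
    for cut in range(1, len(layers) + 1):
        head, tail = layers[:cut], layers[cut:]
        if not tail:
            yield [head]
        else:
            for rest in partitions(tail):
                yield [head] + rest

# For layer1..layer3 this yields exactly the four candidate schemes above:
# [[1], [2], [3]], [[1], [2, 3]], [[1, 2], [3]], [[1, 2, 3]]
for scheme in partitions([1, 2, 3]):
    print(scheme)
```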
It can be seen that, in the method shown in fig. 1, the neural network to be optimized is divided into subnets, layer fusion is performed on each subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each subnet, and layer fusion is then performed on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target to obtain its optimal fusion result. This improves the efficiency of determining the optimal fusion result of the neural network to be optimized while guaranteeing that the result obtained is optimal under the preset fusion rule and the fusion target.
In some embodiments, as shown in fig. 2, the subnet division and per-subnet network-layer fusion of step S100 may be implemented by the following steps:
Step S101: perform subnet division on the neural network to be optimized according to a plurality of different subnet division modes, to obtain a plurality of different types of subnets, the subnets obtained under different division modes including different numbers of layers;
Step S102: perform layer fusion on each bottom-most subnet according to the preset fusion rule and the fusion target, to obtain the optimal fusion result of each bottom-most subnet; the bottom-most subnet is the subnet including the smallest number of layers among the plurality of different types of subnets;
Step S103: perform layer fusion on higher-layer subnets according to the optimal fusion results of lower-layer subnets, the preset fusion rule, and the fusion target, to obtain the optimal fusion results of the higher-layer subnets. The number of layers included in a higher-layer subnet is larger than that of a lower-layer subnet. For any level including a plurality of layers in the optimal fusion result of a higher-layer subnet, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being the lower-layer subnet whose layers are the same as those included in the level.
For example, when the neural network to be optimized is divided into the sub-networks, the neural network to be optimized may be divided into different types of sub-networks according to different sub-network division modes.
Illustratively, different types of subnets include different numbers of layers.
This is because, when layer fusion is performed on the subnets, the layer fusion of a higher-layer subnet requires the optimal fusion results of lower-layer subnets.
It should be noted that, in the embodiment of the present application, "bottom-layer subnet" and "higher-layer subnet" are relative, not absolute, notions: of two subnets of different types, the one with more layers is the higher-layer subnet and the one with fewer layers is the bottom-layer subnet.
For example, for a 2-layer subnet (a subnet including 2 layers) and a 3-layer subnet (a subnet including 3 layers), the 2-layer subnet is the bottom-layer subnet and the 3-layer subnet is the higher-layer subnet.
For a 3-layer subnet and a 4-layer subnet (a subnet including 4 layers), the 3-layer subnet is the bottom-layer subnet and the 4-layer subnet is the higher-layer subnet.
It should be noted that, relative to a 4-layer subnet, both 2-layer subnets and 3-layer subnets are bottom-layer subnets.
When performing layer fusion on the subnets, layer fusion may first be performed on the bottom-most subnets (i.e., the subnets with the fewest layers) to obtain their optimal fusion results.
Once the optimal fusion results of the bottom-most subnets have been determined, layer fusion may be performed on the higher-layer subnets in order of increasing number of layers, each time according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target, until the optimal fusion results of all higher-layer subnets are obtained.
Illustratively, for any level including a plurality of layers in the optimal fusion result of any higher-layer subnet, if there is a bottom-layer subnet (herein called the target bottom-layer subnet) whose layers are the same as those included in the level, the structure of the level in the optimal fusion result of the higher-layer subnet is consistent with the structure of the optimal fusion result of the target bottom-layer subnet.
For example, for a 4-layer subnet (say, including layer1 to layer4), if its optimal fusion result fuses layer1 to layer3 while layer4 does not participate in the fusion (i.e., layer4 alone is one level; the same applies to other single layers), then the structure of the level obtained by fusing layer1 to layer3 is identical to the structure of the optimal fusion result of the 3-layer subnet consisting of layer1 to layer3.
In one example, the bottom-most subnet includes 1 layer; the subnets obtained under the different division modes include 2-layer subnets, a 2-layer subnet including 2 layers; and the optimal fusion result is the fusion result with the smallest ingress and egress bandwidth.
In this case, performing layer fusion on the higher-layer subnets in step S103 according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target may include:
for any 2-layer subnet, determining a first ingress/egress bandwidth in the case that the two layers of the 2-layer subnet are fused into one level, and a second ingress/egress bandwidth of the bottom-most subnets in the case that the two layers are not fused;
if the first ingress/egress bandwidth is smaller than the second, determining that fusing the two layers into one level is the optimal fusion result of the 2-layer subnet;
if the first ingress/egress bandwidth is greater than the second, determining that leaving the layers unfused is the optimal fusion result of the 2-layer subnet.
It should be noted that, for a subnet including 1 layer (which may be called a 1-layer subnet), the optimal fusion result is that the single layer forms one level.
The following takes, as an example, the optimal fusion result being the fusion result with the smallest ingress and egress bandwidth, i.e., the fusion target is to minimize the throughput of data interaction between the neural network and external storage.
For any 2-layer subnet (take one including layer1 and layer2 as an example), the candidate fusion schemes may include the following:
scheme 1: layer1 and layer2 are fused into 1 level;
scheme 2: layer1 and layer2 do not merge (i.e., layer1 is one level and layer2 is one level).
The ingress and egress bandwidths corresponding to scheme 1 (referred to herein as the first ingress and egress bandwidths) and scheme 2 (referred to herein as the second ingress and egress bandwidths) may be determined separately.
For example, for any fusion scheme, the ingress/egress bandwidth of the scheme is the sum, over all levels under the scheme, of the bandwidth of each level's input features and the bandwidth of its output features; a concrete calculation is given in the example below and is not repeated here.
The first ingress/egress bandwidth is compared with the second: if the first is smaller, scheme 1 is the optimal fusion result; if the first is greater, scheme 2 is the optimal fusion result.
It should be noted that, when the first ingress/egress bandwidth equals the second, either scheme 1 or scheme 2 may be taken as the optimal fusion result.
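This decision can be sketched in Python as follows; it reuses the hypothetical Layer and FusionRule types from the earlier sketch, and io_bandwidth and best_2layer_fusion are names assumed here, not the patent's:

```python
def io_bandwidth(levels):
    """Ingress/egress bandwidth of a scheme: for each level, its input is read
    from external storage once and its output is written back once."""
    return sum(level[0].in_map_bytes + level[-1].out_map_bytes for level in levels)

def best_2layer_fusion(l1, l2, rule):
    """Pick the 2-layer scheme with the smaller ingress/egress bandwidth.
    On a tie, or if the fusion rule forbids fusing, the layers stay unfused."""
    fused = [[l1, l2]]          # scheme 1: one level
    unfused = [[l1], [l2]]      # scheme 2: two levels
    if rule.allows([l1, l2]) and io_bandwidth(fused) < io_bandwidth(unfused):
        return fused
    return unfused
```

The strict less-than comparison matches the tie-breaking choice described next: when the two bandwidths are equal, scheme 2 is returned.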
In addition, when performing subnet division, 1-layer subnets need not be produced; that is, the bottom-most subnet need not include only 1 layer. For example, subnet division may be performed into 2-layer subnets, 3-layer subnets, …, and (N-1)-layer subnets, in which case the bottom-most subnet is the 2-layer subnet.
In one example, if the first ingress and egress bandwidth is equal to the second ingress and egress bandwidth, then scheme 2 is determined to be the optimal fusion result.
In an example, in step S103, performing layer fusion on a higher-layer subnet according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the higher-layer subnet may further include:
for any higher-layer subnet with k layers, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets with fewer than k layers, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the network to be optimized.
For example, when the neural network to be optimized (assume N layers) is divided into subnets, subnet division may be performed in the 2-layer mode (each subnet is a 2-layer subnet), in the 3-layer mode (each subnet is a 3-layer subnet), …, and in the (N-1)-layer mode (each subnet is an (N-1)-layer subnet).
When performing layer fusion on the subnets, the optimal fusion result of each 2-layer subnet may be determined first; the optimal fusion result of each 3-layer subnet is then determined according to the optimal fusion results of the single-layer subnets and the 2-layer subnets; the optimal fusion result of each 4-layer subnet is then determined according to the optimal fusion results of the single-layer, 2-layer, and 3-layer subnets; and so on, until the optimal fusion results of the highest-layer subnets (e.g., the (N-1)-layer subnets) are determined.
When performing layer fusion on a subnet, the fusion rule and the fusion target are the same as those used when performing layer fusion on the neural network to be optimized.
In one example, for any subnet including k layers, the candidate fusion schemes for performing layer fusion on the subnet include fusing at least 2 and at most m layers, where m ≤ k and m satisfies the preset fusion rule restriction.
Illustratively, the layers participating in fusion may include, but are not limited to, conv (convolutional) layers, nonlinear layers, pool (pooling) layers, fully connected layers, deconvolution layers, upsampling layers, or other basic network layers.
Illustratively, a nonlinear layer may apply a rectified linear unit (ReLU) or another activation function.
For example, when performing layer fusion on any subnet or neural network to be optimized, the maximum number of layers (herein denoted as m) that can be fused may be determined according to the layers participating in fusion and a preset fusion rule.
It should be noted that, for different layers, the maximum number of layers that can be fused may be different under the same fusion rule.
In one example, for a network to be fused (the neural network to be optimized or one of its subnets), the candidate fusion schemes include fusing all layers, or splitting the network into two optimal sub-networks (one with x layers and the other with y-x layers, where y is the total number of layers of the network to be fused). For the optimal sub-network with x layers, its structure is consistent with the structure under the optimal fusion scheme of the x-layer subnet of the network to be fused (i.e., a subnet including x layers); for the optimal sub-network with y-x layers, its structure is consistent with the structure under the optimal fusion scheme of the (y-x)-layer subnet.
For example, for a k-layer subnet, if m = k under the fusion rule restriction, the optimal fusion scheme is to fuse all k layers; if m < k, the optimal fusion scheme may split the k-layer subnet into two optimal sub-networks, one with m layers and one with k-m layers. The structure of the m-layer sub-network is consistent with the structure under the optimal fusion scheme of the m-layer subnet, and the structure of the (k-m)-layer sub-network is consistent with that of the (k-m)-layer subnet.
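Writing f(i, j) for the minimum ingress/egress bandwidth of the subnet formed by layers i to j (notation assumed here; the patent introduces f(i, j) further below), this split structure corresponds to the recurrence

$$
f(i,j) = \min\Big( \mathrm{bw}(i,j),\; \min_{i \le x < j} \big[ f(i,x) + f(x+1,j) \big] \Big), \qquad f(i,i) = \mathrm{bw}(i,i),
$$

where bw(i, j) is the ingress/egress bandwidth of layers i..j fused into a single level, taken as infinite when the fusion rule forbids that fusion.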
In some embodiments, before the sub-network division of the neural network to be optimized in step S100, the method may further include:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the acquired network splitting configuration instruction;
in step S100, performing subnet division on the neural network to be optimized may include:
respectively carrying out sub-network division on each part to be optimized;
in step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result, the preset fusion rule and the fusion target of each subnet may include:
for any part to be optimized, performing layer fusion on the part according to the optimal fusion result of each subnet of the part, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the part to be optimized;
And determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
For example, in order to further improve optimization efficiency, before the neural network is optimized according to the method flow shown in fig. 1, a network splitting configuration instruction may be obtained, and the neural network to be optimized may be split into at least two parts to be optimized according to that instruction; the optimal fusion scheme of each part is then determined separately, yielding the optimal fusion scheme of the whole neural network to be optimized.
For any part to be optimized, the optimal fusion scheme can be determined according to the method flow shown in fig. 1.
Illustratively, the network split configuration instructions may be determined from a priori knowledge.
Because the flow shown in fig. 1 first determines the optimal fusion schemes of the subnets and then determines the optimal fusion scheme of the neural network to be optimized from them, if it can be determined from existing prior knowledge that certain layers of the neural network cannot be fused into one level, the network to be optimized can be split before subnet division.
For example, if it is known from prior knowledge that, on the premise of meeting the fusion target, the front n layers and the back (N-n) layers need to be separated, where 1 ≤ n < N, the neural network may first be split into two parts consisting of the front n layers and the back (N-n) layers; the optimal fusion result of each part is determined according to the flow shown in fig. 1, and the optimal fusion result of the whole neural network is then determined, further improving the efficiency of determining it. A sketch of this split-then-optimize step is given below.
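A hedged Python sketch of the splitting step (optimize_with_split and optimal_fusion are names assumed here; optimal_fusion is the dynamic-programming routine sketched later in this description):

```python
def optimize_with_split(layers, split_points, rule):
    """Split the network at the configured points (from prior knowledge),
    optimize each part independently, and collect the per-part schemes.
    Level indices in each returned scheme are relative to its part."""
    parts, start = [], 0
    for p in sorted(split_points):
        parts.append(layers[start:p])
        start = p
    parts.append(layers[start:])
    scheme = []
    for part in parts:
        scheme.append(optimal_fusion(part, rule))  # optimal (bandwidth, levels) per part
    return scheme
```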
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, an N-layer neural network is divided according to the subnet division modes in which a subnet includes 2 layers, 3 layers, …, and (N-1) layers. The optimal fusion result of each subnet is determined bottom-up, in order of increasing number of layers per subnet, and the optimal fusion result of the N-layer neural network is then determined (this may be called a pyramid layer-fusion scheme). During fusion, the optimal fusion result of each subnet is retained and non-optimal fusion results are discarded.
For an N-layer neural network, a bottom-up computation strategy is adopted: the optimal fusion results of the 2-layer subnets are computed first; the optimal fusion results of the 3-layer subnets are computed from those of the 2-layer subnets; the optimal fusion results of the 4-layer subnets are computed from those of the 2-layer and 3-layer subnets; and so on, until the optimal fusion result of the N-layer neural network is determined.
When fusing any subnet or the neural network, at least 2 adjacent layers are selected for fusion according to the fusion rule (which may also be called an optimization rule), where adjacency may be horizontal (layers sharing the same feature-map input) or vertical (the feature-map result of the former layer is at least part of the input of the latter layer), so as to determine the optimal fusion result satisfying the fusion target.
It should be noted that the number of layers involved in fusion cannot exceed the fusion rule limit.
Illustratively, the adjacent layers may include a conv layer, a nonlinear layer, a pool layer, a fully connected layer, a deconvolution layer, an upsampling layer, or the like.
For example, defining the problem f(i, j) as determining the optimal fusion result of the subnet formed by layers i to j, the problem f(1, N) contains subproblems f(1, h); that is, in determining the optimal fusion result of the N-layer neural network, the optimal fusion results of the subnets formed by layers 1 to h (h < N) need to be determined.
Accordingly, in solving f(1, N), each f(i, j) in the following table needs to be solved:
f(1,1)  f(1,2)  f(1,3)  …  f(1,h)    …  f(1,N)
        f(2,2)  f(2,3)  …  f(2,h)    …  f(2,N)
                f(3,3)  …  f(3,h)    …  f(3,N)
                        ⋱  f(h-2,h)     ⋮
                           f(h-1,h)  …  f(N-2,N)
                           f(h,h)    …  f(N-1,N)
                                        f(N,N)
For example, each f(i, j) can be solved in turn, proceeding from the bottom of the table upward (shorter spans before longer ones), until f(1, N) is solved, giving the optimal fusion result of the N-layer neural network. A dynamic-programming sketch of this procedure follows.
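The sketch below is a hedged Python rendering of this pyramid scheme, using memoization in place of the explicit f(i, j) table; optimal_fusion and io_cost are names assumed here, and the Layer/FusionRule types are carried over from the earlier sketches:

```python
from functools import lru_cache

def optimal_fusion(layers, rule):
    """Solve f(i, j) for the chain layers[i..j]: the minimum-bandwidth division
    into levels. Memoization plays the role of the f(i, j) table, so every
    subnet's optimal result is computed once and reused."""
    def io_cost(i, j):
        # Bandwidth of one level spanning layers i..j: its input is read from
        # external storage once and its output is written back once.
        return layers[i].in_map_bytes + layers[j].out_map_bytes

    @lru_cache(maxsize=None)
    def f(i, j):
        # Returns (bandwidth, scheme); scheme is a tuple of (start, end) levels.
        if i == j:
            return io_cost(i, i), ((i, i),)
        best = None
        # Option 1: fuse all of layers i..j into one level, if the rule allows.
        if rule.allows(layers[i:j + 1]):
            best = (io_cost(i, j), ((i, j),))
        # Option 2: split into two spans, each solved optimally.
        for x in range(i, j):
            bw_left, sch_left = f(i, x)
            bw_right, sch_right = f(x + 1, j)
            cand = (bw_left + bw_right, sch_left + sch_right)
            if best is None or cand[0] < best[0]:
                best = cand
        return best

    return f(0, len(layers) - 1)
```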
Effects of the technical solutions provided by the embodiments of the present application are described below with reference to examples.
Taking a comparison with the following greedy fusion scheme as an example, assume the greedy scheme is implemented as follows (see the sketch after this list):
1. set the on-chip cache size and input the network structure of the neural network;
2. if the on-chip cache can hold the current layer, put it in;
3. if the on-chip cache cannot hold the current layer, end the current level, empty the cache, and put the current layer into a new level;
4. end if all layers have been assigned; otherwise, go to step 2.
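A hedged Python sketch of that greedy baseline (greedy_fusion and the byte fields are assumptions carried over from the earlier sketches):

```python
def greedy_fusion(layers, cache_bytes):
    """Greedy baseline: keep appending layers to the current level while the
    on-chip cache can hold their coefficients; otherwise start a new level."""
    levels, current, used = [], [], 0
    for layer in layers:
        if current and used + layer.ker_bytes > cache_bytes:
            levels.append(current)        # step 3: cache full, close the level
            current, used = [], 0
        current.append(layer)             # step 2: put the layer in
        used += layer.ker_bytes
    if current:
        levels.append(current)            # step 4: flush the last level
    return levels
```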
For the neural network shown in fig. 3A, the optimal fusion results obtained according to the greedy fusion scheme and the technical scheme provided in the embodiments of the present application may be shown in fig. 3B and fig. 3C, respectively.
Here, 3×128×64 denotes a convolution with a 3×3 kernel, 128 input channels, and 64 output channels; the stride (step size) is 1; and the size of the input feature map is assumed to be w×h.
For the fusion result shown in fig. 3B, the bandwidth is 128wh + 256wh + 256wh + 128wh = 768wh; for the fusion result shown in fig. 3C, the bandwidth is 128wh + 64wh + 64wh + 128wh = 384wh.
As can be seen, for the neural network shown in fig. 3A, the bandwidth of the optimal fusion result obtained by using the technical solution provided in the embodiment of the present application is half of the bandwidth of the fusion result obtained by using the greedy fusion solution.
The methods provided herein are described above. The apparatus provided in this application is described below:
referring to fig. 4, a schematic structural diagram of a neural network optimization device provided in an embodiment of the present application, as shown in fig. 4, the neural network optimization device may include:
a dividing unit 410, configured to divide the sub-network of the neural network to be optimized;
the optimizing unit 420 is configured to perform network layer fusion on each subnet according to a preset fusion rule and a fusion target, so as to obtain an optimal fusion result of each subnet;
the optimizing unit 420 is further configured to perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, so as to obtain the optimal fusion result of the neural network to be optimized; wherein, for any level comprising a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being the subnet whose layers are the same as those included in the level.
In some embodiments, the dividing unit 410 performs subnet division on the neural network to be optimized, including:
carrying out subnet division on the neural network to be optimized according to a plurality of different subnet division modes, to obtain a plurality of different types of subnets, wherein the subnets obtained under different division modes include different numbers of layers;
the optimizing unit 420 performs network layer fusion on each subnet according to a preset fusion rule and a fusion target, including:
performing layer fusion on each bottommost subnet according to the preset fusion rule and the fusion target to obtain an optimal fusion result of each bottommost subnet; the bottom-most subnet is a subnet with the least number of layers included in the plurality of different types of subnets;
and performing layer fusion on higher-layer subnets according to the optimal fusion results of lower-layer subnets, the preset fusion rule, and the fusion target, to obtain the optimal fusion results of the higher-layer subnets; the number of layers included in a higher-layer subnet is larger than that of a lower-layer subnet; for any level comprising a plurality of layers in the optimal fusion result of a higher-layer subnet, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being the lower-layer subnet whose layers are the same as those included in the level.
In some embodiments, the bottom-most subnet includes 1 layer; the subnets obtained under the different subnet division modes include 2-layer subnets, a 2-layer subnet including 2 layers; and the optimal fusion result is the fusion result with the smallest ingress and egress bandwidth;
the optimizing unit 420 performs layer fusion on the high-level subnet according to the optimal fusion result of the low-level subnet, the preset fusion rule and the fusion target, and includes:
for any 2-layer subnet, determining a first ingress/egress bandwidth in the case that the two layers of the 2-layer subnet are fused into one level, and a second ingress/egress bandwidth of the bottom-most subnets in the case that the two layers are not fused;
if the first ingress/egress bandwidth is smaller than the second, determining that fusing the two layers into one level is the optimal fusion result of the 2-layer subnet;
and if the first ingress/egress bandwidth is greater than the second, determining that leaving the layers unfused is the optimal fusion result of the 2-layer subnet.
In some embodiments, the optimizing unit 420 performs layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, and further includes:
for any higher-layer subnet with k layers, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets with fewer than k layers, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the network to be optimized.
In some embodiments, for any subnet including k layers, the candidate fusion schemes for performing layer fusion on the subnet include fusing at least 2 and at most m layers, where m ≤ k and m satisfies the fusion rule restriction;
the layers include a convolutional (Conv) layer, a nonlinear layer, a pooling (pool) layer, a fully connected layer, a deconvolution layer, or an upsampling layer.
In some embodiments, before the dividing unit 410 performs subnet division on the neural network to be optimized, the method further includes:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;
the dividing unit 410 performs subnet division on the neural network to be optimized, including:
respectively carrying out sub-network division on each part to be optimized;
The optimizing unit 420 performs layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, and includes:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, the preset fusion rule and the fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
Fig. 5 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application. The electronic device may include a processor 501 and a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. By reading and executing the machine-executable instructions in the memory 502 corresponding to the neural network optimization logic, the processor 501 can perform the neural network optimization method described above.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk or DVD), a similar storage medium, or a combination thereof.
In some embodiments, a machine-readable storage medium, such as memory 502 in fig. 5, is also provided, having stored thereon machine-executable instructions that when executed by a processor implement the neural network optimization method described above. For example, the machine-readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element preceded by "comprising a(n) …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1.一种神经网络优化方法,其特征在于,所述方法用于降低计算平台的带宽需求,所述方法包括:1. A neural network optimization method, characterized in that the method is used to reduce the bandwidth requirements of a computing platform, and the method includes: 对待优化神经网络进行子网划分,并依据预设融合规则以及融合目标分别对各子网进行网络层融合,得到各子网的最优融合结果;其中,融合规则用于对参与融合的网络层进行限制,该限制包括系数缓存的限制、Map缓存的限制,以及,层与层融合的限制;所述系数为卷积层权重,所述系数缓存为计算平台中存放系数的缓存,所述Map缓存为计算平台中存放特征图的缓存;Divide the neural network to be optimized into subnets, and perform network layer fusion on each subnet based on the preset fusion rules and fusion goals to obtain the optimal fusion results for each subnet; among which, the fusion rules are used to classify the network layers participating in the fusion. Restrictions include restrictions on coefficient caching, restrictions on Map caching, and restrictions on layer-to-layer fusion; the coefficients are convolution layer weights, the coefficient cache is a cache that stores coefficients in the computing platform, and the Map The cache is a cache that stores feature maps in the computing platform; 依据各子网的最优融合结果、所述预设融合规则以及融合目标,对所述待优化神经网络进行网络层融合,得到所述待优化神经网络的最优融合结果,以最小化所述神经网络与外部存储的数据交互的带宽;其中,对于所述待优化神经网络的最优融合结果中任一包括多个网络层的层级,该层级的结构与目标子网的最优融合结果的结构一致,所述目标子网为与该层级包括的网络层相同的子网;According to the optimal fusion result of each subnetwork, the preset fusion rules and the fusion target, the neural network to be optimized is subjected to network layer fusion to obtain the optimal fusion result of the neural network to be optimized to minimize the The bandwidth of the interaction between the neural network and externally stored data; wherein, for any of the optimal fusion results of the neural network to be optimized including a hierarchy of multiple network layers, the structure of this hierarchy is consistent with the optimal fusion result of the target subnet The structure is consistent, and the target subnet is the same subnet as the network layer included in this level; 依据所述最优融合结果,以所述待优化神经网络的层级作为硬件基本计算单元,将所述待优化神经网络部署到计算平台;其中,所述待优化神经网络在所述计算平台中运行时,对于一个层级,输入占用一次带宽,输出占用一次带宽,所述带宽为所述计算平平台运行所述待优化神经网络时与外部存储的数据交互带宽;According to the optimal fusion result, the level of the neural network to be optimized is used as the basic computing unit of the hardware, and the neural network to be optimized is deployed to the computing platform; wherein the neural network to be optimized runs in the computing platform When , for a level, the input occupies one bandwidth and the output occupies one bandwidth, and the bandwidth is the data interaction bandwidth between the computing platform and external storage when running the neural network to be optimized; 其中,所述对待优化神经网络进行子网划分,并依据预设融合规则以及融合目标分别对各子网进行网络层融合,包括:Among them, the neural network to be optimized is divided into subnets, and the network layer fusion is performed on each subnet according to the preset fusion rules and fusion targets, including: 分别按照多种不同子网划分方式,对所述待优化神经网络进行子网划分,得到多种不同类型的子网,不同子网划分方式下得到的子网包括的网络层数量不同;Divide the neural network to be optimized into subnets according to a variety of different subnet division methods to obtain multiple different types of subnets, and the subnets obtained under different subnet division methods include different numbers of network layers; 依据所述预设融合规则以及融合目标,分别对各最底层子网进行网络层融合,得到各最底层子网的最优融合结果;所述最底层子网为所述多种不同类型的子网中包括的网络层数量最少的子网;According to the preset fusion rules and fusion targets, network layer fusion is performed on each bottom-layer subnet respectively to obtain the optimal integration result of each bottom-layer subnet; the bottom-layer subnet is the multiple different types of subnets. 
The subnet that contains the smallest number of network layers; 在确定了最底层子网的最优融合结果时,依据子网包括网络层数量从少到多的顺序,依次依据底层子网的最优融合结果、预设融合规则以及融合目标,对高层子网进行网络层融合,得到高层子网的最优融合结果,对于两个不同类型的子网,包括的网络层数量多的子网为高层子网,包括的网络层数量少的子网为底层子网;对于所述高层子网的最优融合结果中任一包括多个网络层的层级,该层级的结构与目标底层子网的最优融合结果的结构一致,该目标底层子网为与该层级包括的网络层相同的底层子网。When the optimal integration result of the lowest subnet is determined, the order of the number of network layers included in the subnet from small to large is determined, and the optimal integration result of the lower subnet, the preset integration rules and the integration target are sequentially determined for the high-level subnet. The network performs network layer integration to obtain the optimal integration result of the high-level subnet. For two different types of subnets, the subnet that includes a large number of network layers is the high-level subnet, and the subnet that includes a small number of network layers is the bottom layer. subnet; for any of the optimal integration results of the high-level subnet including a hierarchy of multiple network layers, the structure of this level is consistent with the structure of the optimal integration result of the target underlying subnet, and the target underlying subnet is the same as This layer includes the same underlying subnets as the network layer. 2.根据权利要求1所述的方法,其特征在于,所述最底层子网包括的网络层数量为1;所述不同子网划分方式下得到的子网包括2层子网,2层子网包括的网络层数量为2;最优融合结果为进出带宽最小的融合结果;2. The method according to claim 1, characterized in that the number of network layers included in the bottom subnet is 1; the subnets obtained under the different subnet division modes include 2-layer subnets, and 2-layer subnets. The number of network layers included in the network is 2; the optimal fusion result is the fusion result with the smallest incoming and outgoing bandwidth; 所述依据底层子网的最优融合结果、所述预设融合规则以及融合目标,对高层子网进行网络层融合,包括:The network layer integration of the high-level subnet based on the optimal integration result of the underlying subnet, the preset integration rules and the integration target includes: 对于任一2层子网,分别确定该2层子网中两个网络层融合为一个层级的情况下该最底层子网的第一进出带宽,以及该两个网络层未融合情况下该最底层子网的第二进出带宽;For any layer 2 subnet, determine the first inbound and outbound bandwidth of the lowest layer subnet when the two network layers in the layer 2 subnet are integrated into one layer, and the lowest bandwidth when the two network layers are not integrated. The second incoming and outgoing bandwidth of the underlying subnet; 若所述第一进出带宽小于所述第二进出带宽,则确定该两个网络层融合为一个层级为该2层子网的最优融合结果;If the first inbound and outbound bandwidth is less than the second inbound and outbound bandwidth, it is determined that the two network layers are merged into one level to be the optimal fusion result of the 2-layer subnet; 若所述第一进出带宽大于第二进出带宽,则确定该网络层未融合为该2层子网的最优融合结果。If the first inbound and outbound bandwidth is greater than the second inbound and outbound bandwidth, it is determined that the network layer is not integrated and is the optimal integration result of the layer 2 subnet. 3.根据权利要求2所述的方法,其特征在于,所述依据底层子网的最优融合结果、所述预设融合规则以及融合目标,对高层子网进行网络层融合,得到高层子网的最优融合结果,还包括:3. The method according to claim 2, characterized in that, based on the optimal fusion result of the underlying subnet, the preset fusion rules and the fusion target, network layer fusion is performed on the high-level subnet to obtain the high-level subnet. 
The optimal fusion results also include: 对于任一包括的网络层数量为k的高层子网,依据包括的网络层数量小于k的底层子网的最优融合结果、所述预设融合规则以及融合目标,对该高层子网进行融合,得到该高层子网的最优融合结果,2<k<N,N为所述待优化网络的网络层数量。For any high-level subnet that includes k network layers, the high-level subnet is merged based on the optimal fusion result of the underlying subnet that includes a number of network layers less than k, the preset fusion rules, and the fusion target. , the optimal integration result of the high-level subnet is obtained, 2<k<N, N is the number of network layers of the network to be optimized. 4.根据权利要求3所述的方法,其特征在于,对于任一包括的网络层数量为k的子网,对该子网进行网络层融合的候选融合方案中,包括对至少2个网络层,且至多m个网络层进行融合,m≤k,且m满足所述融合规则限制;4. The method according to claim 3, characterized in that, for any subnet that includes k network layers, the candidate fusion solution for network layer fusion for the subnet includes at least 2 network layers. , and at most m network layers are fused, m≤k, and m satisfies the restrictions of the fusion rules; 所述网络层包括卷积层、非线性层、池化层、全连接层、反卷积层或上采样层。The network layer includes a convolution layer, a nonlinear layer, a pooling layer, a fully connected layer, a deconvolution layer or an upsampling layer. 5.根据权利要求1-4任一项所述的方法,其特征在于,所述对待优化神经网络进行子网划分之前,还包括:5. The method according to any one of claims 1-4, characterized in that, before subnetting the neural network to be optimized, it further includes: 获取网络拆分配置指令;Get network split configuration instructions; 依据所述网络拆分配置指令,将所述待优化神经网络拆分为至少两个待优化部分;According to the network splitting configuration instruction, the neural network to be optimized is split into at least two parts to be optimized; 所述对待优化神经网络进行子网划分,包括:The subnetwork division of the neural network to be optimized includes: 分别对各待优化部分进行子网划分;Divide each part to be optimized into subnets respectively; 所述依据各子网的最优融合结果、所述预设融合规则以及融合目标,对所述待优化神经网络进行网络层融合,包括:Performing network layer fusion on the neural network to be optimized based on the optimal fusion results of each subnet, the preset fusion rules and the fusion target includes: 对于任一待优化部分,依据该待优化部分的各子网的最优融合结果,以及所述预设融合规则以及融合目标,对该待优化部分进行网络层融合,得到该待优化部分的最优融合结果;For any part to be optimized, perform network layer fusion on the part to be optimized based on the optimal fusion results of each subnet of the part to be optimized, the preset fusion rules and the fusion target, and obtain the optimal part of the part to be optimized. Excellent fusion results; 依据各待优化部分的最优融合结果,确定所述待优化神经网络的最优融合结果。According to the optimal fusion result of each part to be optimized, the optimal fusion result of the neural network to be optimized is determined. 6.一种神经网络优化装置,其特征在于,所述装置用于降低计算平台的带宽需求,所述装置包括:6. A neural network optimization device, characterized in that the device is used to reduce the bandwidth requirements of a computing platform, and the device includes: 划分单元,用于对待优化神经网络进行子网划分;Division unit, used to divide the neural network to be optimized into subnets; 优化单元,用于依据预设融合规则以及融合目标分别对各子网进行网络层融合,得到各子网的最优融合结果;其中,融合规则用于对参与融合的网络层进行限制,该限制包括系数缓存的限制、Map缓存的限制,以及,层与层融合的限制;所述系数为卷积层权重,所述系数缓存为计算平台中存放系数的缓存,所述Map缓存为计算平台中存放特征图的缓存;The optimization unit is used to perform network layer fusion on each subnet based on the preset fusion rules and fusion targets to obtain the optimal fusion results for each subnet; among them, the fusion rules are used to limit the network layers participating in the fusion. 
6. A neural network optimization apparatus, the apparatus being configured to reduce the bandwidth requirement of a computing platform, the apparatus comprising:

a division unit configured to divide a neural network to be optimized into subnets; and

an optimization unit configured to perform network layer fusion on each subnet according to preset fusion rules and a fusion target, to obtain an optimal fusion result for each subnet; wherein the fusion rules restrict the network layers participating in fusion, the restrictions including coefficient cache restrictions, Map cache restrictions and layer-to-layer fusion restrictions; the coefficients are convolution layer weights, the coefficient cache is the cache in the computing platform that stores the coefficients, and the Map cache is the cache in the computing platform that stores feature maps;

the optimization unit being further configured to perform network layer fusion on the neural network to be optimized according to the optimal fusion results of the subnets, the preset fusion rules and the fusion target, to obtain the optimal fusion result of the neural network to be optimized, so as to minimize the bandwidth of data interaction between the neural network and external storage; wherein, for any level in the optimal fusion result of the neural network to be optimized that includes multiple network layers, the structure of that level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being the subnet that includes the same network layers as that level; and to deploy, according to the optimal fusion result, the neural network to be optimized to the computing platform with the levels of the neural network to be optimized as basic hardware computation units; wherein, when the neural network to be optimized runs on the computing platform, for one level the input occupies bandwidth once and the output occupies bandwidth once, the bandwidth being the data interaction bandwidth between the computing platform and external storage while running the neural network to be optimized;

wherein the division unit divides the neural network to be optimized into subnets by:

dividing the neural network to be optimized into subnets according to multiple different subnet division modes, to obtain multiple different types of subnets, the subnets obtained under different subnet division modes including different numbers of network layers;

and wherein the optimization unit performs network layer fusion on each subnet according to the preset fusion rules and the fusion target by:

performing network layer fusion on each bottom-layer subnet according to the preset fusion rules and the fusion target, to obtain the optimal fusion result of each bottom-layer subnet, the bottom-layer subnets being the subnets, among the multiple different types of subnets, that include the fewest network layers; and
once the optimal fusion results of the bottom-layer subnets have been determined, performing network layer fusion on the higher-layer subnets in ascending order of the number of network layers each subnet includes, each time according to the optimal fusion results of the lower-layer subnets, the preset fusion rules and the fusion target, to obtain the optimal fusion result of each higher-layer subnet; for any two subnets of different types, the subnet including more network layers is the higher-layer subnet and the subnet including fewer network layers is the lower-layer subnet; for any level in the optimal fusion result of a higher-layer subnet that includes multiple network layers, the structure of that level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being the lower-layer subnet that includes the same network layers as that level.

7. The apparatus according to claim 6, wherein each bottom-layer subnet includes 1 network layer; the subnets obtained under the different subnet division modes include 2-layer subnets, a 2-layer subnet including 2 network layers; and the optimal fusion result is the fusion result with the smallest in/out bandwidth;

wherein the optimization unit performs network layer fusion on a higher-layer subnet according to the optimal fusion results of the lower-layer subnets, the preset fusion rules and the fusion target by:

for any 2-layer subnet, determining a first in/out bandwidth of the 2-layer subnet when its two network layers are fused into one level, and a second in/out bandwidth of the 2-layer subnet when the two network layers are not fused;

if the first in/out bandwidth is smaller than the second in/out bandwidth, determining that fusing the two network layers into one level is the optimal fusion result of the 2-layer subnet; and

if the first in/out bandwidth is greater than the second in/out bandwidth, determining that leaving the two network layers unfused is the optimal fusion result of the 2-layer subnet;

wherein the optimization unit further performs network layer fusion on the higher-layer subnets according to the optimal fusion results of the lower-layer subnets, the preset fusion rules and the fusion target by: for any higher-layer subnet including k network layers, fusing that higher-layer subnet according to the optimal fusion results of the lower-layer subnets including fewer than k network layers, the preset fusion rules and the fusion target
, to obtain the optimal fusion result of that higher-layer subnet, where 2 < k < N and N is the number of network layers of the neural network to be optimized;

wherein, for any subnet including k network layers, the candidate fusion schemes for performing network layer fusion on that subnet include fusing at least 2 and at most m network layers, where m ≤ k and m satisfies the restrictions of the fusion rules; and the network layers include convolution layers, nonlinear layers, pooling layers, fully connected layers, deconvolution layers or upsampling layers;

and/or,

before the division unit divides the neural network to be optimized into subnets, the apparatus is further configured to: obtain a network splitting configuration instruction; and split the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;

wherein the division unit divides the neural network to be optimized into subnets by: dividing each part to be optimized into subnets separately;

and wherein the optimization unit performs network layer fusion on the neural network to be optimized according to the optimal fusion results of the subnets, the preset fusion rules and the fusion target by: for any part to be optimized, performing network layer fusion on that part according to the optimal fusion results of the subnets of that part, the preset fusion rules and the fusion target, to obtain the optimal fusion result of that part; and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion results of the parts to be optimized.

8. An electronic device, comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, the processor being configured to execute the machine-executable instructions to implement the method according to any one of claims 1-5.

9. A machine-readable storage medium having stored therein machine-executable instructions which, when executed by a processor, implement the method according to any one of claims 1-5.
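For orientation, the bottom-up procedure recited in claims 1-4 can be read as a dynamic program: optimal fusion results for the smallest subnets are computed first and reused verbatim inside the results for larger subnets, with the in/out bandwidth of each fused level as the cost being minimized. The sketch below restates that reading in Python over a simplified linear layer chain; it is illustrative only, not the patented implementation. `Layer`, `level_bandwidth`, `optimal_fusion` and the `MAX_FUSE` bound are hypothetical names, and the coefficient-cache/Map-cache fusion rules of claim 6 are abstracted into the single span limit `MAX_FUSE`.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Layer:
    name: str       # e.g. "conv1", "relu1", "pool1"
    out_bytes: int  # size of this layer's output feature map, in bytes

MAX_FUSE = 4  # hypothetical stand-in for the fusion-rule bound m of claim 4

def level_bandwidth(layers: List[Layer], start: int, end: int) -> int:
    """In/out bandwidth of one level spanning layers[start:end].

    A level reads its input once and writes its output once (claim 6);
    feature maps between fused layers never leave the chip. The input
    size of the whole chain is approximated by layer 0's output size.
    """
    in_bytes = layers[start - 1].out_bytes if start > 0 else layers[0].out_bytes
    return in_bytes + layers[end - 1].out_bytes

def optimal_fusion(layers: List[Layer]) -> Tuple[int, List[Tuple[int, int]]]:
    """Bottom-up search over a linear chain, mirroring claims 1-3.

    best[k] holds the minimum-bandwidth partition of the first k layers;
    results for shorter (lower-layer) prefixes are computed first and
    reused verbatim inside the results for longer (higher-layer) ones.
    """
    n = len(layers)
    best = [(0, [])]  # best[k] = (bandwidth, partition of layers[0:k])
    for k in range(1, n + 1):
        candidates = []
        # The last level spans 1 unfused layer, or 2..MAX_FUSE fused layers.
        for span in range(1, min(MAX_FUSE, k) + 1):
            prev_bw, prev_part = best[k - span]
            bw = prev_bw + level_bandwidth(layers, k - span, k)
            candidates.append((bw, prev_part + [(k - span, k)]))
        best.append(min(candidates, key=lambda c: c[0]))
    return best[n]

# Usage: a conv -> relu -> pool chain with 1 MiB intermediate feature maps.
chain = [Layer("conv1", 1 << 20), Layer("relu1", 1 << 20), Layer("pool1", 1 << 18)]
bw, partition = optimal_fusion(chain)
print(bw, partition)  # 1310720 [(0, 3)]: fusing all three layers wins
```

Under this cost model a level reads its input once and writes its output once, so fusing adjacent layers removes the external traffic for every intermediate feature map; the claim-2 comparison between the fused and unfused bandwidths of a 2-layer subnet is exactly the `min()` over candidates at k = 2. In practice the cache restrictions of claim 6 make large spans infeasible, which is what the `MAX_FUSE` cap crudely approximates.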
CN202110204808.1A 2021-02-23 2021-02-23 Neural network optimization method, device, electronic equipment and readable storage medium Active CN112884123B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110204808.1A CN112884123B (en) 2021-02-23 2021-02-23 Neural network optimization method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110204808.1A CN112884123B (en) 2021-02-23 2021-02-23 Neural network optimization method, device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112884123A CN112884123A (en) 2021-06-01
CN112884123B true CN112884123B (en) 2024-03-01

Family

ID=76054229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110204808.1A Active CN112884123B (en) 2021-02-23 2021-02-23 Neural network optimization method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112884123B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965060A (en) * 2021-10-11 2023-04-14 Oppo广东移动通信有限公司 Data processing method, device, chip and computer equipment for AI chip
CN115796228B (en) * 2022-11-15 2024-04-05 北京百度网讯科技有限公司 Operator fusion method, device, equipment and storage medium
CN116306805A (en) 2023-02-10 2023-06-23 上海安亭地平线智能交通技术有限公司 Instruction generation method, device and electronic equipment for neural network accelerator


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102865734B1 (en) * 2019-01-02 2025-09-26 삼성전자주식회사 Neural network optimizing device and neural network optimizing method

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001256212A (en) * 2000-03-09 2001-09-21 Fuji Electric Co Ltd Optimization Learning Method of Neural Network
CN110663048A (en) * 2017-09-05 2020-01-07 松下电器(美国)知识产权公司 Execution method, execution device, learning method, learning device, and program for deep neural network
CN110321999A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Neural computing figure optimization method
CN110321064A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Computing platform realization method and system for neural network
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
WO2020187041A1 (en) * 2019-03-18 2020-09-24 北京灵汐科技有限公司 Neural network mapping method employing many-core processor and computing device
WO2020207082A1 (en) * 2019-04-08 2020-10-15 创新先进技术有限公司 Neural network model optimization processing method and device
CN110490302A (en) * 2019-08-12 2019-11-22 北京中科寒武纪科技有限公司 A kind of neural network compiling optimization method, device and Related product
CN110490309A (en) * 2019-08-14 2019-11-22 北京中科寒武纪科技有限公司 A kind of Operator Fusion method and its Related product for neural network
CN111260019A (en) * 2020-02-18 2020-06-09 深圳鲲云信息科技有限公司 Data processing method, device and equipment of neural network model and storage medium
CN111488983A (en) * 2020-03-24 2020-08-04 哈尔滨工业大学 Lightweight CNN model calculation accelerator based on FPGA
CN113554164A (en) * 2020-04-24 2021-10-26 上海商汤智能科技有限公司 Neural network model optimization method, neural network model data processing method, neural network model optimization device, neural network model data processing device and storage medium
CN112116084A (en) * 2020-09-15 2020-12-22 中国科学技术大学 Convolution neural network hardware accelerator capable of solidifying full network layer on reconfigurable platform
CN112116081A (en) * 2020-09-29 2020-12-22 杭州海康威视数字技术股份有限公司 Deep learning network optimization method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"基于FPGA的卷积神经网络并行加速器设计";王婷 等;《电子技术应用》;第第47卷卷(第第02期期);第81-84页 *
稀疏卷积神经网络加速器设计;李永博;王琴;蒋剑飞;;微电子学与计算机(第06期);第34-38+43页 *

Also Published As

Publication number Publication date
CN112884123A (en) 2021-06-01

Similar Documents

Publication Publication Date Title
CN112884123B (en) Neural network optimization method, device, electronic equipment and readable storage medium
CN112738820B (en) Dynamic deployment method and device of service function chain and computer equipment
CN113193984B (en) Air-space-ground integrated network resource mapping method and system
CN116389266A (en) Method and device for slicing digital twin network based on reinforcement learning
WO2020187041A1 (en) Neural network mapping method employing many-core processor and computing device
CN113282409B (en) Edge calculation task processing method and device and computer equipment
CN111953547B (en) Heterogeneous base station overlapping grouping and resource allocation method and device based on service
CN112835719B (en) Method and device for task processing, many-core system and computer readable medium
JP6097449B2 (en) Provisioning time-varying traffic in software-defined flexible grid transport networks
CN112436992B (en) Virtual network mapping method and device based on graph convolution network
CN119603144B (en) A highly reliable resource scheduling optimization method for heterogeneous computing networks based on coded distributed computing and hypergraph neural networks
CN106576065A (en) Automatic division of internet protocol radio access networks to interior gateway protocol areas
CN119521309B (en) Task scheduling method based on improved bat algorithm in mobile edge computing environment
CN113965937A (en) A content popularity prediction method based on cluster federated learning in fog radio access network
CN116132292B (en) Network slice deployment method based on resource scheduling and adaptation joint optimization
CN113886090B (en) Memory allocation method, device, equipment, and storage medium
CN107579866B (en) A kind of business and Virtual Service intelligent Matching method of wireless dummyization access autonomous management network
CN112055394A (en) Wireless sensor network clustering routing method and system for rapid inclusion elimination
Cui et al. A controller deployment scheme in 5G-IoT network based on SDN
Joshitha et al. Performance analysis for lifetime improvement of a regression based clustered network through cluster member allocation and secondary cluster head selection in a distributed WSN
CN118467160A (en) Computing network cooperative scheduling method for edge computing task
CN110248386B (en) Hierarchical load balancing method for large-scale dense wireless network
CN118227331A (en) A multi-objective task offloading method based on UAV
CN114036814A (en) Method, system, terminal, medium, device and application for mining relatively important nodes
CN112511453B (en) SDN controller deployment method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant