Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, some technical terms related to the embodiments of the present application are first briefly explained below.
layer: on the network side, the basic constituent units in the neural network, such as a convolution layer, a pooling layer and the like;
level: a basic computing unit when the neural network is deployed in the computing platform, wherein one or more layers form a level;
in practical applications, there may be a case where one layer is split into a plurality of levels, but the probability of occurrence of this case is low.
BW (bandwidth): the throughput of data; the input and output BW of a neural network running in a computing platform can be understood as the bandwidth required by the neural network;
ker: broadly refers to the convolutional layer weights (coefficients);
coefficient cache: a cache in the computing platform that stores the ker;
map cache: a cache in the computing platform that stores feature maps.
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, the following describes the technical solutions of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, which is a flow chart of a neural network optimization method provided in an embodiment of the present application, as shown in fig. 1, the neural network optimization method may include the following steps:
Step S100, performing subnet division on the neural network to be optimized, and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet.
In the embodiment of the present application, in order to reduce the bandwidth requirement of the neural network when deployed in a computing platform, layers in the neural network may be fused to reduce the data interaction bandwidth between the neural network and external storage, thereby realizing optimization of the neural network.
For example, for a neural network to be optimized (referred to herein as the neural network to be optimized), subnet division may first be performed to obtain a plurality of subnets, and layer fusion is performed on each subnet according to a preset fusion rule and a fusion target (referred to herein as the preset fusion rule and the fusion target), so as to obtain an optimal fusion result of each subnet.
For example, for an N-layer neural network (N ≥ 3), subnet division may be performed on the neural network to obtain a plurality of 2-layer subnets.
For example, assuming that N = 3 (the neural network includes layer1, layer2, and layer3), the 2-layer subnets may include a subnet formed by layer1 and layer2 and a subnet formed by layer2 and layer3; for a nonlinear network, a subnet formed by layer1 and layer3 may also be included.
For ease of understanding and explanation, a linear network will be described hereinafter, i.e., layer1 is directly connected to layer2, and layer2 is directly connected to layer 3.
For the neural network to be optimized, the sub-network division can be performed according to a plurality of different sub-network division modes, so that a plurality of different types of sub-networks are obtained, and layer fusion is performed on the different types of sub-networks respectively.
For example, taking an N-layer neural network as an example and assuming that N = 4, the neural network may be divided into a plurality of 2-layer subnets according to one subnet division mode, with layer fusion performed on each 2-layer subnet, and divided into a plurality of 3-layer subnets according to another subnet division mode, with layer fusion performed on each 3-layer subnet.
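As a rough, non-limiting sketch of such division on a linear network, each k-layer division mode can be viewed as taking every run of k consecutive layers as one subnet; the function and variable names below are hypothetical.

```python
from typing import List

def divide_subnets(layers: List[str], subnet_size: int) -> List[List[str]]:
    """Divide a linear network into all contiguous subnets of `subnet_size` layers."""
    return [layers[i:i + subnet_size] for i in range(len(layers) - subnet_size + 1)]

# A 4-layer network divided in the 2-layer and the 3-layer division modes.
layers = ["layer1", "layer2", "layer3", "layer4"]
print(divide_subnets(layers, 2))  # [['layer1', 'layer2'], ['layer2', 'layer3'], ['layer3', 'layer4']]
print(divide_subnets(layers, 3))  # [['layer1', 'layer2', 'layer3'], ['layer2', 'layer3', 'layer4']]
```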
Illustratively, the fusion rule is used to limit the layers participating in fusion, and may include, but is not limited to, a limit on the ker cache (i.e., the coefficient cache), a limit on the map cache, a limit on layer-to-layer fusion, and the like.
For example, a maximum ker cache (i.e., coefficient cache) may be preset, and the coefficients of a fused level cannot exceed this preset maximum ker cache, which limits the number of layers participating in layer fusion.
The fusion target characterizes the purpose of performing layer fusion on the neural network, for example, minimizing the bandwidth requirement of the neural network deployment (i.e., minimizing the throughput of the data interaction between the neural network and external storage).
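As a minimal sketch of one such fusion-rule check, the coefficient-cache limit can be expressed as a simple size test; the function name, the per-layer coefficient sizes, and the cache limit below are hypothetical and only illustrate the idea.

```python
MAX_KER_CACHE_BYTES = 2 * 1024 * 1024  # hypothetical preset maximum ker (coefficient) cache

def satisfies_ker_cache_limit(level_layers, coeff_bytes, max_cache=MAX_KER_CACHE_BYTES):
    """One fusion-rule limit: the total coefficients of the layers fused into a
    level must fit in the preset maximum ker cache."""
    return sum(coeff_bytes[layer] for layer in level_layers) <= max_cache

# Hypothetical per-layer coefficient sizes in bytes.
coeff_bytes = {"layer1": 0.8e6, "layer2": 0.9e6, "layer3": 1.1e6}
print(satisfies_ker_cache_limit(["layer1", "layer2"], coeff_bytes))            # True
print(satisfies_ker_cache_limit(["layer1", "layer2", "layer3"], coeff_bytes))  # False
```

Under such a limit, the maximum number of layers that can be fused into one level follows directly from the accumulated coefficient size.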
Step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the neural network to be optimized; for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being a subnet including the same layers as the level.
In the embodiment of the present application, it is considered that, in the layer fusion process of the neural network to be optimized, at least one candidate fusion scheme needs to use the optimal fusion results of the subnets obtained in step S100.
For example, for an N-layer neural network, assuming N = 3, the candidate fusion schemes (each assumed to satisfy the preset fusion rule; the same applies below) may include the following:
scheme 1, each layer is not fused;
scheme 2, layer1 and layer2 are fused, layer3 does not participate in the fusion;
scheme 3, layer2 and layer3 are fused, layer1 does not participate in the fusion;
scheme 4, layer1, layer2, and layer3 are fused (assuming that fusing the 3 layers meets the preset fusion rule).
Both scheme 2 and scheme 3 require the fusion result of a 2-layer subnet.
For example, in scheme 2, when layer1 and layer2 are fused, the optimal fusion result is the optimal fusion result of the 2-layer subnet formed by layer1 and layer2.
That is, when determining the optimal fusion result of the neural network to be optimized, the optimal fusion results of its subnets are required. Therefore, by first performing subnet division on the neural network to be optimized and performing layer fusion on each subnet to obtain the optimal fusion result of each subnet, and then performing layer fusion on the neural network to be optimized according to the optimal fusion results of the subnets to obtain the optimal fusion result of the neural network to be optimized, the computation for determining the optimal fusion result of the neural network to be optimized can be simplified, and the efficiency of determining that optimal fusion result is improved.
For example, for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, if there is a subnet (referred to herein as a target subnet) that includes the same layers as the level, the structure of the level in the optimal fusion result of the neural network to be optimized is consistent with the structure of the optimal fusion result of the target subnet.
In the above example, assuming that the optimal fusion result of the neural network to be optimized is scheme 3, then for the level including layer2 and layer3, the structure of this level is consistent with the structure of the optimal fusion result of the 2-layer subnet formed by layer2 and layer3.
It can be seen that, in the method shown in fig. 1, subnet division is performed on the neural network to be optimized and layer fusion is performed on each subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each subnet, and then layer fusion is performed on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the neural network to be optimized; on the premise of ensuring an optimal fusion result under the preset fusion rule and the fusion target, the efficiency of determining the optimal fusion result of the neural network to be optimized is improved.
In some embodiments, as shown in fig. 2, in step S100, the subnet division of the neural network to be optimized and the network layer fusion of each subnet according to the preset fusion rule and the fusion target may be implemented by the following steps:
step S101, performing subnet division on the neural network to be optimized according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, where the subnets obtained in different subnet division modes include different numbers of layers;
step S102, performing layer fusion on each bottommost subnet according to the preset fusion rule and the fusion target to obtain an optimal fusion result of each bottommost subnet; the bottommost subnet is the subnet with the least number of layers among the plurality of different types of subnets;
step S103, performing layer fusion on a higher-layer subnet according to the optimal fusion results of lower-layer subnets, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the higher-layer subnet, where the number of layers included in the higher-layer subnet is larger than the number of layers included in the lower-layer subnet; for any level including a plurality of layers in the optimal fusion result of the higher-layer subnet, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being a lower-layer subnet including the same layers as the level.
For example, when the neural network to be optimized is divided into the sub-networks, the neural network to be optimized may be divided into different types of sub-networks according to different sub-network division modes.
Illustratively, different types of subnets include different numbers of layers.
When layer fusion is performed on the subnets, it is considered that the layer fusion of a higher-layer subnet requires the optimal fusion results of lower-layer subnets.
It should be noted that, in the embodiment of the present application, lower-layer and higher-layer subnets are relative rather than absolute: for two different types of subnets, the subnet with more layers is the higher-layer subnet and the subnet with fewer layers is the lower-layer subnet.
For example, between a 2-layer subnet (a subnet including 2 layers) and a 3-layer subnet (a subnet including 3 layers), the 2-layer subnet is the lower-layer subnet and the 3-layer subnet is the higher-layer subnet.
Between a 3-layer subnet and a 4-layer subnet (a subnet including 4 layers), the 3-layer subnet is the lower-layer subnet and the 4-layer subnet is the higher-layer subnet.
It should be noted that, relative to a 4-layer subnet, both the 2-layer subnets and the 3-layer subnets belong to the lower-layer subnets.
When the subnets are in layer fusion, layer fusion can be performed on the subnets at the bottommost layer (namely the subnets with the least layer number), so that the optimal fusion result of the subnets at the bottommost layer can be obtained.
After the optimal fusion results of the bottommost subnets are determined, layer fusion may be performed on the higher-layer subnets in order of increasing number of layers, according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target, so as to sequentially obtain the optimal fusion results of the higher-layer subnets.
Illustratively, for any level including a plurality of layers in the optimal fusion result of any higher-layer subnet, if there is a lower-layer subnet (referred to herein as a target lower-layer subnet) that includes the same layers as the level, the structure of the level in the optimal fusion result of the higher-layer subnet is consistent with the structure of the optimal fusion result of the target lower-layer subnet.
For example, for a 4-layer subnet (assumed to include layer1 to layer4), if the optimal fusion result is that layer1 to layer3 are fused and layer4 does not participate in the fusion (which can also be understood as layer4 alone forming a level; the same applies to other single layers), then the structure of the level obtained by fusing layer1 to layer3 is consistent with the structure of the optimal fusion result of the 3-layer subnet including layer1 to layer3.
In one example, the bottommost subnet includes 1 layer; the subnets obtained in the other subnet division modes include 2-layer subnets, where a 2-layer subnet includes 2 layers; the optimal fusion result is the fusion result with the minimum in-out bandwidth;
in step S103, performing layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule, and the fusion target may include:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the 2-layer subnet under the condition that the two layers in the 2-layer subnet are fused into one level, and a second in-out bandwidth of the 2-layer subnet under the condition that the two layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that the two layers being fused into one level is the optimal fusion result of the 2-layer subnet;
if the first in-out bandwidth is greater than the second in-out bandwidth, determining that the two layers not being fused is the optimal fusion result of the 2-layer subnet.
It should be noted that, for a subnet whose number of layers is 1 (which may be referred to as a 1-layer subnet), the optimal fusion result is that the single layer forms a level.
By way of example, take the optimal fusion result being the fusion result with the minimum in-out bandwidth as an example, that is, the fusion target is to minimize the throughput of the data interaction between the neural network and external storage.
For any 2-layer subnet (taking a subnet including layer1 and layer2 as an example), the candidate fusion schemes of the 2-layer subnet may include the following:
scheme 1: layer1 and layer2 are fused into 1 level;
scheme 2: layer1 and layer2 do not merge (i.e., layer1 is one level and layer2 is one level).
The in-out bandwidths corresponding to scheme 1 (referred to herein as the first in-out bandwidth) and scheme 2 (referred to herein as the second in-out bandwidth) may be determined separately.
For example, for any fusion scheme, the in-out bandwidth corresponding to the fusion scheme is the sum of the bandwidths corresponding to the input features and the bandwidths corresponding to the output features of each level under the fusion scheme; a specific implementation of the in-out bandwidth is described below with reference to specific examples and is not detailed here.
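As a minimal sketch of this definition (not the specific examples referred to above), the in-out bandwidth of a fusion scheme can be accumulated level by level; the level representation and the feature-map volumes below are hypothetical, with the common W×H factor normalized out.

```python
def scheme_in_out_bandwidth(levels, map_volume):
    """In-out bandwidth of a fusion scheme: for every level, add the data volume
    of its input feature map(s) and of its output feature map(s)."""
    total = 0
    for level in levels:
        total += sum(map_volume[name] for name in level["inputs"])
        total += sum(map_volume[name] for name in level["outputs"])
    return total

# Hypothetical feature-map volumes (channel counts; the W*H factor is normalized out).
map_volume = {"in": 128, "mid": 256, "out": 64}
fused   = [{"inputs": ["in"], "outputs": ["out"]}]   # layer1 and layer2 fused into one level
unfused = [{"inputs": ["in"], "outputs": ["mid"]},   # layer1 as one level
           {"inputs": ["mid"], "outputs": ["out"]}]  # layer2 as another level
print(scheme_in_out_bandwidth(fused, map_volume))    # 192
print(scheme_in_out_bandwidth(unfused, map_volume))  # 704
```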
The first in-out bandwidth and the second in-out bandwidth may be compared; if the first in-out bandwidth is smaller than the second in-out bandwidth, scheme 1 is determined to be the optimal fusion result; if the first in-out bandwidth is greater than the second in-out bandwidth, scheme 2 is determined to be the optimal fusion result.
It should be noted that, in the case where the first in-out bandwidth is equal to the second in-out bandwidth, either scheme 1 or scheme 2 may be used as the optimal fusion result.
In addition, when subnet division is performed, the 1-layer subnet division may be omitted, that is, the bottommost subnet need not be a 1-layer subnet; for example, 2-layer subnet division, 3-layer subnet division, …, and (N-1)-layer subnet division may be performed, in which case the bottommost subnet is a 2-layer subnet.
In one example, if the first in-out bandwidth is equal to the second in-out bandwidth, scheme 2 is determined to be the optimal fusion result.
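The decision for a 2-layer subnet therefore reduces to a single comparison; a sketch under the same hypothetical volume convention follows, where `fusion_allowed` stands in for the preset fusion rule check (the rule, rather than the bandwidth itself, is typically what forbids fusing), and ties keep the unfused scheme as in the example just given.

```python
def best_2layer_fusion(in_vol, mid_vol, out_vol, fusion_allowed):
    """Choose between scheme 1 (fuse the two layers into one level) and scheme 2
    (keep them as two levels) by comparing the first and second in-out bandwidths."""
    second_bw = (in_vol + mid_vol) + (mid_vol + out_vol)   # two levels: the intermediate map crosses twice
    if not fusion_allowed:                                  # the preset fusion rule may forbid fusing at all
        return ("no-fuse", second_bw)
    first_bw = in_vol + out_vol                             # one level: the intermediate map stays on chip
    return ("fuse", first_bw) if first_bw < second_bw else ("no-fuse", second_bw)

print(best_2layer_fusion(128, 256, 64, fusion_allowed=True))   # ('fuse', 192)
print(best_2layer_fusion(128, 256, 64, fusion_allowed=False))  # ('no-fuse', 704)
```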
In an example, in step S103, performing layer fusion on the higher-layer subnets according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target to obtain the optimal fusion results of the higher-layer subnets may further include:
for any higher-layer subnet whose number of layers is k, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets whose numbers of layers are less than k, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the neural network to be optimized.
For example, when the neural network to be optimized (assuming N layers) is divided into subnets, the subnet division may be performed according to a division manner of 2 layers of subnets (i.e., each subnet is a 2-layer subnet), the subnet division may be performed according to a division manner of 3 layers of subnets (i.e., each subnet is a 3-layer subnet), …, and the subnet division may be performed according to a division manner of (N-1) layers of subnets (i.e., each subnet is a (N-1) layer subnet).
When the subnets are layer fused, the optimal fusion result of each 2-layer subnet can be determined first, then the optimal fusion result of each 3-layer subnet is determined according to the optimal fusion result of each single-layer subnet and the optimal fusion result of each 2-layer subnet, then the optimal fusion result of each 4-layer subnet is determined according to the optimal fusion result of each single-layer subnet, the optimal fusion result of each 2-layer subnet and the optimal fusion result of each 3-layer subnet, and then the like until the optimal fusion result of each highest-layer subnet (such as (N-1) layer subnet) is determined.
When the layer fusion is performed on the sub-network, the fusion rule and the fusion target are the same as those when the layer fusion is performed on the neural network to be optimized.
In one example, for any subnet including a layer number k, a candidate fusion scheme for performing layer fusion on the subnet includes fusing at least 2 layers and at most m layers, m is less than or equal to k, and m meets a preset fusion rule limit.
Illustratively, layers involved in fusion may include, but are not limited to, conv (convolutional) layers, nonlinear layers, pool (pooling) layers, fully-connected layers, deconvolution layers, upsampling layers, or other network base layers.
Illustratively, the nonlinear layer may include a rectified linear unit (Rectified Linear Unit, ReLU) or another activation function.
For example, when performing layer fusion on any subnet or neural network to be optimized, the maximum number of layers (herein denoted as m) that can be fused may be determined according to the layers participating in fusion and a preset fusion rule.
It should be noted that, for different layers, the maximum number of layers that can be fused may be different under the same fusion rule.
In one example, for a network to be fused (the neural network to be optimized or a subnet thereof), the candidate fusion schemes may include fusing all layers, or dividing the network into two optimal subnets (one optimal subnet including x layers and the other including y-x layers, where y is the total number of layers in the network to be fused); for the optimal subnet including x layers, its structure is consistent with the structure under the optimal fusion scheme of the x-layer subnet of the network to be fused (i.e., the subnet including those x layers); for the optimal subnet including y-x layers, its structure is consistent with the structure under the optimal fusion scheme of the (y-x)-layer subnet of the network to be fused (i.e., the subnet including those y-x layers).
For example, for a k-layer subnet, if m = k under the fusion rule limit, the optimal fusion scheme is determined to be fusing all k layers; if m < k, the optimal fusion scheme may be to divide the k-layer subnet into 2 optimal subnets, one including m layers and one including k-m layers. For the optimal subnet including m layers, its structure is consistent with the structure under the optimal fusion scheme of the m-layer subnet; for the optimal subnet including k-m layers, its structure is consistent with the structure under the optimal fusion scheme of the (k-m)-layer subnet.
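A brief sketch of this case analysis follows; `best[(i, j)]` is hypothetical bookkeeping holding the already-computed optimal level structure of the subnet formed by layers i to j, and the numbers are purely illustrative.

```python
def optimal_k_layer(first, k, m, best):
    """For the subnet of layers first..first+k-1: if all k layers can be fused under
    the fusion rule (m == k), fuse them into one level; otherwise reuse the stored
    optimal results of an m-layer subnet and a (k-m)-layer subnet."""
    last = first + k - 1
    if m == k:
        return [(first, last)]                      # one level containing all k layers
    split = first + m - 1
    return best[(first, split)] + best[(split + 1, last)]

# Hypothetical precomputed optimal results for the smaller subnets of a 4-layer network.
best = {(1, 2): [(1, 2)], (3, 4): [(3, 4)]}
print(optimal_k_layer(1, 4, 2, best))   # m=2 < k=4 -> [(1, 2), (3, 4)]
```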
In some embodiments, before the sub-network division of the neural network to be optimized in step S100, the method may further include:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the acquired network splitting configuration instruction;
in step S100, performing subnet division on the neural network to be optimized may include:
respectively carrying out sub-network division on each part to be optimized;
in step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result, the preset fusion rule and the fusion target of each subnet may include:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, and a preset fusion rule and a fusion target to obtain the optimal fusion result of the part to be optimized;
And determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
For example, in order to further improve the efficiency of optimizing the neural network, before the neural network is optimized according to the method flow shown in fig. 1, a network splitting configuration instruction may be further obtained, and according to the obtained network splitting configuration instruction, the neural network to be optimized is split into at least two parts to be optimized, and further, an optimal fusion scheme of each part to be optimized may be respectively determined, so as to obtain an optimal fusion scheme of the neural network to be optimized.
For any part to be optimized, the optimal fusion scheme can be determined according to the method flow shown in fig. 1.
Illustratively, the network split configuration instructions may be determined from a priori knowledge.
Because, in the method flow shown in fig. 1, the optimal fusion schemes of the subnets are determined first and the optimal fusion scheme of the neural network to be optimized is then determined according to the optimal fusion schemes of the subnets, if it can be determined from existing prior knowledge that certain layers of the neural network to be optimized cannot be fused into one level, the network to be optimized may be split before subnet division.
For example, if it is known from prior knowledge that, on the premise of meeting the fusion target, the front n layers and the back (N-n) layers need to be separated, where 1 ≤ n < N, the neural network may first be split into two parts including the front n layers and the back (N-n) layers, the optimal fusion result of each part is determined according to the flow shown in fig. 1, and the optimal fusion result of the neural network is then determined, thereby further improving the efficiency of determining the optimal fusion result of the neural network to be optimized.
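A minimal sketch of this splitting step, assuming the split positions come from prior knowledge and `optimize_part` stands for the fig. 1 flow applied to one part (both names are hypothetical):

```python
def optimize_with_split(layers, split_points, optimize_part):
    """Split the network to be optimized at the configured points, run the per-part
    optimization on each part, and concatenate the per-part results."""
    parts, start = [], 0
    for p in sorted(split_points):
        parts.append(layers[start:p])
        start = p
    parts.append(layers[start:])
    return [level for part in parts for level in optimize_part(part)]

# Example: split an 8-layer network after layer 5 and optimize the two parts separately.
layers = [f"layer{i}" for i in range(1, 9)]
print(optimize_with_split(layers, [5], optimize_part=lambda part: [part]))  # dummy per-part optimizer
# [['layer1', 'layer2', 'layer3', 'layer4', 'layer5'], ['layer6', 'layer7', 'layer8']]
```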
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, the N-layer neural network is divided according to the subnet division modes in which a subnet includes 2 layers, a subnet includes 3 layers, …, and a subnet includes (N-1) layers; the optimal fusion result of each subnet is determined in a bottom-up manner, in order of increasing number of layers included in the subnets, and the optimal fusion result of the N-layer neural network is then determined (this may be referred to as a pyramid layer fusion scheme); in the fusion process, the optimal fusion result of each subnet is retained, and non-optimal fusion results are not retained.
For an N-layer neural network, a bottom-up calculation strategy is adopted, and an optimal fusion result of a 2-layer subnet is calculated first; calculating the optimal fusion result of the 3-layer subnetwork according to the optimal fusion result of the 2-layer subnetwork; and calculating the optimal fusion result of the 4-layer subnetwork based on the optimal fusion result of the 2-layer subnetwork and the optimal fusion result of the 3-layer subnetwork, and the like until the optimal fusion result of the N-layer neural network is determined.
In the process of fusing any sub-network or neural network, at least 2 adjacent layers are respectively selected for fusion (such as a horizontal adjacent layer with the same feature map input or a vertical adjacent layer with the feature map calculation result of the former layer being at least part of the input of the latter layer) according to a fusion rule (which can also be an optimization rule) so as to determine an optimal fusion result meeting a fusion target.
It should be noted that the number of layers involved in fusion cannot exceed the fusion rule limit.
Illustratively, the adjacent layers may include a Conv layer, a nonlinear layer, a pool layer, a fully-connected layer, a deconvolution layer, an upsampling layer, or the like.
For example, if the problem f(i, j) is defined as determining the optimal fusion result of the subnet formed by layers i to j, then the problem f(1, N) involves f(1, h); that is, in the process of determining the optimal fusion result of the N-layer neural network, the optimal fusion result of the subnet formed by layers 1 to h (h < N) needs to be determined.
Accordingly, in solving for f (1, N), each f (i, j) in the following table needs to be solved separately:
| f(1,1) | f(1,2) | f(1,3) | …        | f(1,N)   |
|        | f(2,2) | f(2,3) | f(h-2,h) | …        |
|        |        | f(3,3) | f(h-1,h) | f(N-2,N) |
|        |        |        | f(h,h)   | f(N-1,N) |
|        |        |        |          | f(N,N)   |
For example, each f(i, j) can be solved sequentially from right to left and from bottom to top until the solution of f(1, N) is completed, so as to obtain the optimal fusion result of the N-layer neural network.
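A compact sketch of solving f(1, N) by memoized recursion over these subproblems (equivalent in result to the bottom-up fill described above) follows; the cost model of feature-map volumes and the fusion-rule check are hypothetical stand-ins for the platform-specific quantities, and the W×H factor is normalized out.

```python
from functools import lru_cache

# Toy cost model (hypothetical): vols[t] is the data volume of the feature map between
# layer t and layer t+1; vols[0] is the network input and vols[N] is the network output.
vols = [128, 256, 64, 64, 128]
N = 4
MAX_FUSED_LAYERS = 3             # stand-in for the preset fusion-rule limit

def level_bw(i, j):
    # In-out bandwidth of one level formed by layers i..j: its input map plus its output map.
    return vols[i - 1] + vols[j]

def rule_allows(i, j):
    return (j - i + 1) <= MAX_FUSED_LAYERS

@lru_cache(maxsize=None)
def f(i, j):
    """Optimal fusion result of the subnet formed by layers i..j:
    (minimum in-out bandwidth, levels as (first_layer, last_layer) pairs)."""
    best = None
    if rule_allows(i, j):
        best = (level_bw(i, j), [(i, j)])          # fuse all layers i..j into one level
    for s in range(i, j):                          # or split into two optimal subnets
        bw_left, levels_left = f(i, s)
        bw_right, levels_right = f(s + 1, j)
        cand = (bw_left + bw_right, levels_left + levels_right)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

print(f(1, N))   # e.g. (384, [(1, 2), (3, 4)]) under the toy model above
```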
Effects of the technical solutions provided by the embodiments of the present application are described below with reference to examples.
Taking a comparison with the following greedy fusion scheme as an example, assume that the greedy fusion scheme is implemented as follows (a sketch follows the list):
1. set the on-chip cache size and input the network structure of the neural network;
2. if the on-chip cache can hold the current layer, put it in;
3. if the on-chip cache cannot hold the current layer, end the current level, empty the current cache, and put the current layer into a new level;
4. if all layers have been divided, end; otherwise, go to step 2.
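A minimal sketch of this greedy baseline, with hypothetical per-layer cache occupancies:

```python
def greedy_fuse(layer_sizes, cache_size):
    """Greedy level division (steps 1-4 above): keep putting layers into the current
    level while the on-chip cache can hold them; otherwise start a new level."""
    levels, current, used = [], [], 0
    for name, size in layer_sizes:
        if used + size > cache_size and current:   # cache cannot hold the current layer
            levels.append(current)                 # end the current level, empty the cache
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        levels.append(current)
    return levels

# Hypothetical per-layer cache occupancy.
print(greedy_fuse([("layer1", 3), ("layer2", 4), ("layer3", 2), ("layer4", 5)], 8))
# [['layer1', 'layer2'], ['layer3', 'layer4']]
```

Because this greedy division only checks whether the next layer fits in the cache, the level boundaries it produces can be far from optimal in terms of in-out bandwidth, as the following example illustrates.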
For the neural network shown in fig. 3A, the optimal fusion results obtained according to the greedy fusion scheme and the technical scheme provided in the embodiments of the present application may be shown in fig. 3B and fig. 3C, respectively.
Here, 3×128×64 denotes a 3×3 convolution kernel ("×" may also be written as "x") with 128 input channels and 64 output channels, the stride (step size) is 1, and the size of the input feature map is assumed to be W×H.
For the fusion result shown in fig. 3B, the bandwidth is 128WH + 256WH + 256WH + 128WH = 768WH; for the fusion result shown in fig. 3C, the bandwidth is 128WH + 64WH + 64WH + 128WH = 384WH.
As can be seen, for the neural network shown in fig. 3A, the bandwidth of the optimal fusion result obtained by using the technical solution provided in the embodiment of the present application is half of the bandwidth of the fusion result obtained by using the greedy fusion solution.
The methods provided herein are described above. The apparatus provided in this application is described below:
Referring to fig. 4, which is a schematic structural diagram of a neural network optimization device provided in an embodiment of the present application, as shown in fig. 4, the neural network optimization device may include:
a dividing unit 410, configured to divide the sub-network of the neural network to be optimized;
the optimizing unit 420 is configured to perform network layer fusion on each subnet according to a preset fusion rule and a fusion target, so as to obtain an optimal fusion result of each subnet;
the optimizing unit 420 is further configured to perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, so as to obtain an optimal fusion result of the neural network to be optimized; for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being a subnet including the same layers as the level.
In some embodiments, the dividing unit 410 performs subnet division on the neural network to be optimized, including:
performing subnet division on the neural network to be optimized according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, where the subnets obtained in different subnet division modes include different numbers of layers;
the optimizing unit 420 performs network layer fusion on each subnet according to a preset fusion rule and a fusion target, including:
performing layer fusion on each bottommost subnet according to the preset fusion rule and the fusion target to obtain an optimal fusion result of each bottommost subnet; the bottom-most subnet is a subnet with the least number of layers included in the plurality of different types of subnets;
and performing layer fusion on a higher-layer subnet according to the optimal fusion results of lower-layer subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the higher-layer subnet, where the number of layers included in the higher-layer subnet is larger than the number of layers included in the lower-layer subnet; for any level including a plurality of layers in the optimal fusion result of the higher-layer subnet, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being a lower-layer subnet including the same layers as the level.
In some embodiments, the bottommost subnet includes 1 layer; the subnets obtained in the other subnet division modes include 2-layer subnets, where a 2-layer subnet includes 2 layers; the optimal fusion result is the fusion result with the minimum in-out bandwidth;
the optimizing unit 420 performing layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule and the fusion target includes:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the 2-layer subnet under the condition that the two layers in the 2-layer subnet are fused into one level, and a second in-out bandwidth of the 2-layer subnet under the condition that the two layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that the two layers being fused into one level is the optimal fusion result of the 2-layer subnet;
and if the first in-out bandwidth is greater than the second in-out bandwidth, determining that the two layers not being fused is the optimal fusion result of the 2-layer subnet.
In some embodiments, the optimizing unit 420 performs layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, and further includes:
for any higher-layer subnet whose number of layers is k, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets whose numbers of layers are less than k, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the neural network to be optimized.
In some embodiments, for any subnet including a layer number k, a candidate fusion scheme for performing layer fusion on the subnet includes fusing at least 2 layers and at most m layers, m is less than or equal to k, and m meets the fusion rule limit;
the layers include a convolutional (Conv) layer, a nonlinear layer, a pooling (pool) layer, a fully-connected layer, a deconvolution layer, or an upsampling layer.
In some embodiments, before the dividing unit 410 performs subnet division on the neural network to be optimized, the method further includes:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;
the dividing unit 410 performs subnet division on the neural network to be optimized, including:
respectively carrying out sub-network division on each part to be optimized;
The optimizing unit 420 performs layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, and includes:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, the preset fusion rule and the fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
Fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application. The electronic device may include a processor 501 and a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. Also, the processor 501 may perform the neural network optimization method described above by reading and executing the machine-executable instructions in the memory 502 corresponding to the neural network optimization control logic.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that may contain or store information, such as executable instructions, data, or the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
In some embodiments, a machine-readable storage medium, such as memory 502 in fig. 5, is also provided, having stored thereon machine-executable instructions that when executed by a processor implement the neural network optimization method described above. For example, the machine-readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.