Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with some aspects of the present application as detailed in the appended claims.
The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, some technical terms related to the embodiments of the present application are first briefly explained below.
layer: on the network side, the basic constituent units in the neural network, such as a convolution layer, a pooling layer and the like;
level: a basic computing unit when the neural network is deployed in the computing platform, wherein one or more layers form a level;
in practical applications, there may be a case where one layer is split into a plurality of levels, but the probability of occurrence of this case is low.
BW (bandwidth): the throughput of data; the input and output BW of a neural network running in a computing platform can be understood as the bandwidth required by the neural network;
ker: broadly refers to the convolutional layer weights (coefficients);
coefficient cache: a cache in the computing platform that stores the ker;
map cache: a cache in the computing platform that stores feature maps.
In order to make the above objects, features and advantages of the embodiments of the present application more comprehensible, the following describes the technical solutions of the embodiments of the present application in detail with reference to the accompanying drawings.
Referring to fig. 1, which is a flow chart of a neural network optimization method provided in an embodiment of the present application, as shown in fig. 1, the neural network optimization method may include the following steps:
Step S100, performing subnet division on the neural network to be optimized, and performing network layer fusion on each subnet according to a preset fusion rule and a fusion target to obtain an optimal fusion result of each subnet.
In the embodiment of the present application, in order to reduce the bandwidth requirement of the neural network when deployed in a computing platform, layers in the neural network may be fused to reduce the data interaction bandwidth between the neural network and external storage, thereby realizing optimization of the neural network.
For example, for a neural network to be optimized (referred to herein as the neural network to be optimized), subnet division may first be performed to obtain a plurality of subnets, and layer fusion is performed on each subnet according to a preset fusion rule and a fusion target (referred to herein as the preset fusion rule and the fusion target), so as to obtain an optimal fusion result of each subnet.
For example, for an N-layer neural network (N ≥ 3), subnet division may be performed on the neural network to obtain a plurality of 2-layer subnets.
For example, assuming that N = 3 (the neural network includes layer1, layer2, and layer3), the 2-layer subnets may include a subnet formed by layer1 and layer2 and a subnet formed by layer2 and layer3; for a nonlinear network, a subnet formed by layer1 and layer3 may also be included.
For ease of understanding and explanation, a linear network will be described hereinafter, i.e., layer1 is directly connected to layer2, and layer2 is directly connected to layer 3.
For the neural network to be optimized, the sub-network division can be performed according to a plurality of different sub-network division modes, so that a plurality of different types of sub-networks are obtained, and layer fusion is performed on the different types of sub-networks respectively.
For example, taking an N-layer neural network as an example and assuming that N = 4, the neural network may be divided into a plurality of 2-layer subnets according to one subnet division mode, with layer fusion performed on each 2-layer subnet, and divided into a plurality of 3-layer subnets according to another subnet division mode, with layer fusion performed on each 3-layer subnet.
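As a rough, non-limiting sketch of such division on a linear network, each k-layer division mode can be viewed as taking every run of k consecutive layers as one subnet; the function and variable names below are hypothetical.

```python
from typing import List

def divide_subnets(layers: List[str], subnet_size: int) -> List[List[str]]:
    """Divide a linear network into all contiguous subnets of `subnet_size` layers."""
    return [layers[i:i + subnet_size] for i in range(len(layers) - subnet_size + 1)]

# A 4-layer network divided in the 2-layer and the 3-layer division modes.
layers = ["layer1", "layer2", "layer3", "layer4"]
print(divide_subnets(layers, 2))  # [['layer1', 'layer2'], ['layer2', 'layer3'], ['layer3', 'layer4']]
print(divide_subnets(layers, 3))  # [['layer1', 'layer2', 'layer3'], ['layer2', 'layer3', 'layer4']]
```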
Illustratively, the fusion rule is used to limit the layers participating in fusion, and may include, but is not limited to, a limit on the ker cache (i.e., the coefficient cache), a limit on the map cache, a limit on layer-to-layer fusion, and the like.
For example, a maximum ker cache (i.e., coefficient cache) may be preset, and the coefficients of a fused level cannot exceed this preset maximum ker cache, which limits the number of layers participating in layer fusion.
The fusion target characterizes the purpose of performing layer fusion on the neural network, for example, minimizing the bandwidth requirement of the neural network deployment (i.e., minimizing the throughput of the data interaction between the neural network and external storage).
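As a minimal sketch of one such fusion-rule check, the coefficient-cache limit can be expressed as a simple size test; the function name, the per-layer coefficient sizes, and the cache limit below are hypothetical and only illustrate the idea.

```python
MAX_KER_CACHE_BYTES = 2 * 1024 * 1024  # hypothetical preset maximum ker (coefficient) cache

def satisfies_ker_cache_limit(level_layers, coeff_bytes, max_cache=MAX_KER_CACHE_BYTES):
    """One fusion-rule limit: the total coefficients of the layers fused into a
    level must fit in the preset maximum ker cache."""
    return sum(coeff_bytes[layer] for layer in level_layers) <= max_cache

# Hypothetical per-layer coefficient sizes in bytes.
coeff_bytes = {"layer1": 0.8e6, "layer2": 0.9e6, "layer3": 1.1e6}
print(satisfies_ker_cache_limit(["layer1", "layer2"], coeff_bytes))            # True
print(satisfies_ker_cache_limit(["layer1", "layer2", "layer3"], coeff_bytes))  # False
```

Under such a limit, the maximum number of layers that can be fused into one level follows directly from the accumulated coefficient size.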
Step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the neural network to be optimized; for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being a subnet including the same layers as the level.
In the embodiment of the present application, it is considered that, in the layer fusion process of the neural network to be optimized, at least one candidate fusion scheme needs to use the optimal fusion results of the subnets obtained in step S100.
For example, for an N-layer neural network, assuming N = 3, the candidate fusion schemes (each assumed to satisfy the preset fusion rule; the same applies below) may include the following:
scheme 1, each layer is not fused;
scheme 2, layer1 and layer2 are fused, layer3 does not participate in the fusion;
scheme 3, layer2 and layer3 are fused, layer1 does not participate in the fusion;
scheme 4, layer1, layer2, and layer3 are fused (assuming that fusing the 3 layers meets the preset fusion rule).
Both scheme 2 and scheme 3 require the fusion result of a 2-layer subnet.
For example, in scheme 2, when layer1 and layer2 are fused, the optimal fusion result is the optimal fusion result of the 2-layer subnet formed by layer1 and layer2.
That is, when determining the optimal fusion result of the neural network to be optimized, the optimal fusion results of its subnets are required. Therefore, by first performing subnet division on the neural network to be optimized and performing layer fusion on each subnet to obtain the optimal fusion result of each subnet, and then performing layer fusion on the neural network to be optimized according to the optimal fusion results of the subnets to obtain the optimal fusion result of the neural network to be optimized, the computation for determining the optimal fusion result of the neural network to be optimized can be simplified, and the efficiency of determining that optimal fusion result is improved.
For example, for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, if there is a subnet (referred to herein as a target subnet) that includes the same layers as the level, the structure of the level in the optimal fusion result of the neural network to be optimized is consistent with the structure of the optimal fusion result of the target subnet.
In the above example, assuming that the optimal fusion result of the neural network to be optimized is scheme 3, then for the level including layer2 and layer3, the structure of this level is consistent with the structure of the optimal fusion result of the 2-layer subnet formed by layer2 and layer3.
It can be seen that, in the method shown in fig. 1, subnet division is performed on the neural network to be optimized and layer fusion is performed on each subnet according to the preset fusion rule and the fusion target to obtain the optimal fusion result of each subnet, and then layer fusion is performed on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the neural network to be optimized; on the premise of ensuring an optimal fusion result under the preset fusion rule and the fusion target, the efficiency of determining the optimal fusion result of the neural network to be optimized is improved.
In some embodiments, as shown in fig. 2, in step S100, the subnet division of the neural network to be optimized and the network layer fusion of each subnet according to the preset fusion rule and the fusion target may be implemented by the following steps:
step S101, performing subnet division on the neural network to be optimized according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, where the subnets obtained in different subnet division modes include different numbers of layers;
step S102, performing layer fusion on each bottommost subnet according to the preset fusion rule and the fusion target to obtain an optimal fusion result of each bottommost subnet; the bottommost subnet is the subnet with the least number of layers among the plurality of different types of subnets;
step S103, performing layer fusion on a higher-layer subnet according to the optimal fusion results of lower-layer subnets, the preset fusion rule, and the fusion target to obtain the optimal fusion result of the higher-layer subnet, where the number of layers included in the higher-layer subnet is larger than the number of layers included in the lower-layer subnet; for any level including a plurality of layers in the optimal fusion result of the higher-layer subnet, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being a lower-layer subnet including the same layers as the level.
For example, when the neural network to be optimized is divided into the sub-networks, the neural network to be optimized may be divided into different types of sub-networks according to different sub-network division modes.
Illustratively, different types of subnets include different numbers of layers.
When layer fusion is performed on the subnets, it is considered that the layer fusion of a higher-layer subnet requires the optimal fusion results of lower-layer subnets.
It should be noted that, in the embodiment of the present application, lower-layer and higher-layer subnets are relative rather than absolute: for two different types of subnets, the subnet with more layers is the higher-layer subnet and the subnet with fewer layers is the lower-layer subnet.
For example, between a 2-layer subnet (a subnet including 2 layers) and a 3-layer subnet (a subnet including 3 layers), the 2-layer subnet is the lower-layer subnet and the 3-layer subnet is the higher-layer subnet.
Between a 3-layer subnet and a 4-layer subnet (a subnet including 4 layers), the 3-layer subnet is the lower-layer subnet and the 4-layer subnet is the higher-layer subnet.
It should be noted that, relative to a 4-layer subnet, both the 2-layer subnets and the 3-layer subnets belong to the lower-layer subnets.
When the subnets are in layer fusion, layer fusion can be performed on the subnets at the bottommost layer (namely the subnets with the least layer number), so that the optimal fusion result of the subnets at the bottommost layer can be obtained.
After the optimal fusion results of the bottommost subnets are determined, layer fusion may be performed on the higher-layer subnets in order of increasing number of layers, according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target, so as to sequentially obtain the optimal fusion results of the higher-layer subnets.
Illustratively, for any level including a plurality of layers in the optimal fusion result of any higher-layer subnet, if there is a lower-layer subnet (referred to herein as a target lower-layer subnet) that includes the same layers as the level, the structure of the level in the optimal fusion result of the higher-layer subnet is consistent with the structure of the optimal fusion result of the target lower-layer subnet.
For example, for a 4-layer subnet (assumed to include layer1 to layer4), if the optimal fusion result is that layer1 to layer3 are fused and layer4 does not participate in the fusion (which can also be understood as layer4 alone forming a level; the same applies to other single layers), then the structure of the level obtained by fusing layer1 to layer3 is consistent with the structure of the optimal fusion result of the 3-layer subnet including layer1 to layer3.
In one example, the bottommost subnet includes 1 layer; the subnets obtained in the other subnet division modes include 2-layer subnets, where a 2-layer subnet includes 2 layers; the optimal fusion result is the fusion result with the minimum in-out bandwidth;
in step S103, performing layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule, and the fusion target may include:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the 2-layer subnet under the condition that the two layers in the 2-layer subnet are fused into one level, and a second in-out bandwidth of the 2-layer subnet under the condition that the two layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that the two layers being fused into one level is the optimal fusion result of the 2-layer subnet;
if the first in-out bandwidth is greater than the second in-out bandwidth, determining that the two layers not being fused is the optimal fusion result of the 2-layer subnet.
It should be noted that, for a subnet whose number of layers is 1 (which may be referred to as a 1-layer subnet), the optimal fusion result is that the single layer forms a level.
By way of example, take the optimal fusion result being the fusion result with the minimum in-out bandwidth as an example, that is, the fusion target is to minimize the throughput of the data interaction between the neural network and external storage.
For any 2-layer subnet (taking a subnet including layer1 and layer2 as an example), the candidate fusion schemes of the 2-layer subnet may include the following:
scheme 1: layer1 and layer2 are fused into 1 level;
scheme 2: layer1 and layer2 do not merge (i.e., layer1 is one level and layer2 is one level).
The in-out bandwidths corresponding to scheme 1 (referred to herein as the first in-out bandwidth) and scheme 2 (referred to herein as the second in-out bandwidth) may be determined separately.
For example, for any fusion scheme, the in-out bandwidth corresponding to the fusion scheme is the sum of the bandwidths corresponding to the input features and the bandwidths corresponding to the output features of each level under the fusion scheme; a specific implementation of the in-out bandwidth is described below with reference to specific examples and is not detailed here.
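As a minimal sketch of this definition (not the specific examples referred to above), the in-out bandwidth of a fusion scheme can be accumulated level by level; the level representation and the feature-map volumes below are hypothetical, with the common W×H factor normalized out.

```python
def scheme_in_out_bandwidth(levels, map_volume):
    """In-out bandwidth of a fusion scheme: for every level, add the data volume
    of its input feature map(s) and of its output feature map(s)."""
    total = 0
    for level in levels:
        total += sum(map_volume[name] for name in level["inputs"])
        total += sum(map_volume[name] for name in level["outputs"])
    return total

# Hypothetical feature-map volumes (channel counts; the W*H factor is normalized out).
map_volume = {"in": 128, "mid": 256, "out": 64}
fused   = [{"inputs": ["in"], "outputs": ["out"]}]   # layer1 and layer2 fused into one level
unfused = [{"inputs": ["in"], "outputs": ["mid"]},   # layer1 as one level
           {"inputs": ["mid"], "outputs": ["out"]}]  # layer2 as another level
print(scheme_in_out_bandwidth(fused, map_volume))    # 192
print(scheme_in_out_bandwidth(unfused, map_volume))  # 704
```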
The first in-out bandwidth and the second in-out bandwidth may be compared; if the first in-out bandwidth is smaller than the second in-out bandwidth, scheme 1 is determined to be the optimal fusion result; if the first in-out bandwidth is greater than the second in-out bandwidth, scheme 2 is determined to be the optimal fusion result.
It should be noted that, in the case where the first in-out bandwidth is equal to the second in-out bandwidth, either scheme 1 or scheme 2 may be used as the optimal fusion result.
In addition, when subnet division is performed, the 1-layer subnet division may be omitted, that is, the bottommost subnet need not be a 1-layer subnet; for example, 2-layer subnet division, 3-layer subnet division, …, and (N-1)-layer subnet division may be performed, in which case the bottommost subnet is a 2-layer subnet.
In one example, if the first in-out bandwidth is equal to the second in-out bandwidth, scheme 2 is determined to be the optimal fusion result.
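The decision for a 2-layer subnet therefore reduces to a single comparison; a sketch under the same hypothetical volume convention follows, where `fusion_allowed` stands in for the preset fusion rule check (the rule, rather than the bandwidth itself, is typically what forbids fusing), and ties keep the unfused scheme as in the example just given.

```python
def best_2layer_fusion(in_vol, mid_vol, out_vol, fusion_allowed):
    """Choose between scheme 1 (fuse the two layers into one level) and scheme 2
    (keep them as two levels) by comparing the first and second in-out bandwidths."""
    second_bw = (in_vol + mid_vol) + (mid_vol + out_vol)   # two levels: the intermediate map crosses twice
    if not fusion_allowed:                                  # the preset fusion rule may forbid fusing at all
        return ("no-fuse", second_bw)
    first_bw = in_vol + out_vol                             # one level: the intermediate map stays on chip
    return ("fuse", first_bw) if first_bw < second_bw else ("no-fuse", second_bw)

print(best_2layer_fusion(128, 256, 64, fusion_allowed=True))   # ('fuse', 192)
print(best_2layer_fusion(128, 256, 64, fusion_allowed=False))  # ('no-fuse', 704)
```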
In an example, in step S103, performing layer fusion on the higher-layer subnets according to the optimal fusion results of the lower-layer subnets, the preset fusion rule, and the fusion target to obtain the optimal fusion results of the higher-layer subnets may further include:
for any higher-layer subnet whose number of layers is k, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets whose numbers of layers are less than k, the preset fusion rule, and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the neural network to be optimized.
For example, when the neural network to be optimized (assuming N layers) is divided into subnets, the subnet division may be performed according to a division manner of 2 layers of subnets (i.e., each subnet is a 2-layer subnet), the subnet division may be performed according to a division manner of 3 layers of subnets (i.e., each subnet is a 3-layer subnet), …, and the subnet division may be performed according to a division manner of (N-1) layers of subnets (i.e., each subnet is a (N-1) layer subnet).
When the subnets are layer fused, the optimal fusion result of each 2-layer subnet can be determined first, then the optimal fusion result of each 3-layer subnet is determined according to the optimal fusion result of each single-layer subnet and the optimal fusion result of each 2-layer subnet, then the optimal fusion result of each 4-layer subnet is determined according to the optimal fusion result of each single-layer subnet, the optimal fusion result of each 2-layer subnet and the optimal fusion result of each 3-layer subnet, and then the like until the optimal fusion result of each highest-layer subnet (such as (N-1) layer subnet) is determined.
When the layer fusion is performed on the sub-network, the fusion rule and the fusion target are the same as those when the layer fusion is performed on the neural network to be optimized.
In one example, for any subnet including a layer number k, a candidate fusion scheme for performing layer fusion on the subnet includes fusing at least 2 layers and at most m layers, m is less than or equal to k, and m meets a preset fusion rule limit.
Illustratively, layers involved in fusion may include, but are not limited to, conv (convolutional) layers, nonlinear layers, pool (pooling) layers, fully-connected layers, deconvolution layers, upsampling layers, or other network base layers.
Illustratively, the nonlinear layer may include a rectified linear unit (Rectified Linear Unit, ReLU) or another activation function.
For example, when performing layer fusion on any subnet or neural network to be optimized, the maximum number of layers (herein denoted as m) that can be fused may be determined according to the layers participating in fusion and a preset fusion rule.
It should be noted that, for different layers, the maximum number of layers that can be fused may be different under the same fusion rule.
In one example, for a network to be fused (the neural network to be optimized or a subnet thereof), the candidate fusion schemes may include fusing all layers, or dividing the network into two optimal subnets (one optimal subnet including x layers and the other including y-x layers, where y is the total number of layers in the network to be fused); for the optimal subnet including x layers, its structure is consistent with the structure under the optimal fusion scheme of the x-layer subnet of the network to be fused (i.e., the subnet including those x layers); for the optimal subnet including y-x layers, its structure is consistent with the structure under the optimal fusion scheme of the (y-x)-layer subnet of the network to be fused (i.e., the subnet including those y-x layers).
For example, for a k-layer subnet, if m = k under the fusion rule limit, the optimal fusion scheme is determined to be fusing all k layers; if m < k, the optimal fusion scheme may be to divide the k-layer subnet into 2 optimal subnets, one including m layers and one including k-m layers. For the optimal subnet including m layers, its structure is consistent with the structure under the optimal fusion scheme of the m-layer subnet; for the optimal subnet including k-m layers, its structure is consistent with the structure under the optimal fusion scheme of the (k-m)-layer subnet.
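A brief sketch of this case analysis follows; `best[(i, j)]` is hypothetical bookkeeping holding the already-computed optimal level structure of the subnet formed by layers i to j, and the numbers are purely illustrative.

```python
def optimal_k_layer(first, k, m, best):
    """For the subnet of layers first..first+k-1: if all k layers can be fused under
    the fusion rule (m == k), fuse them into one level; otherwise reuse the stored
    optimal results of an m-layer subnet and a (k-m)-layer subnet."""
    last = first + k - 1
    if m == k:
        return [(first, last)]                      # one level containing all k layers
    split = first + m - 1
    return best[(first, split)] + best[(split + 1, last)]

# Hypothetical precomputed optimal results for the smaller subnets of a 4-layer network.
best = {(1, 2): [(1, 2)], (3, 4): [(3, 4)]}
print(optimal_k_layer(1, 4, 2, best))   # m=2 < k=4 -> [(1, 2), (3, 4)]
```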
In some embodiments, before the sub-network division of the neural network to be optimized in step S100, the method may further include:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the acquired network splitting configuration instruction;
in step S100, performing subnet division on the neural network to be optimized may include:
respectively carrying out sub-network division on each part to be optimized;
in step S110, performing layer fusion on the neural network to be optimized according to the optimal fusion result, the preset fusion rule and the fusion target of each subnet may include:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, and a preset fusion rule and a fusion target to obtain the optimal fusion result of the part to be optimized;
And determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
For example, in order to further improve the efficiency of optimizing the neural network, before the neural network is optimized according to the method flow shown in fig. 1, a network splitting configuration instruction may be further obtained, and according to the obtained network splitting configuration instruction, the neural network to be optimized is split into at least two parts to be optimized, and further, an optimal fusion scheme of each part to be optimized may be respectively determined, so as to obtain an optimal fusion scheme of the neural network to be optimized.
For any part to be optimized, the optimal fusion scheme can be determined according to the method flow shown in fig. 1.
Illustratively, the network split configuration instructions may be determined from a priori knowledge.
Because, in the method flow shown in fig. 1, the optimal fusion schemes of the subnets are determined first and the optimal fusion scheme of the neural network to be optimized is then determined according to the optimal fusion schemes of the subnets, if it can be determined from existing prior knowledge that certain layers of the neural network to be optimized cannot be fused into one level, the network to be optimized may be split before subnet division.
For example, if it is known from prior knowledge that, on the premise of meeting the fusion target, the front n layers and the back (N-n) layers need to be separated, where 1 ≤ n < N, the neural network may first be split into two parts including the front n layers and the back (N-n) layers, the optimal fusion result of each part is determined according to the flow shown in fig. 1, and the optimal fusion result of the neural network is then determined, thereby further improving the efficiency of determining the optimal fusion result of the neural network to be optimized.
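A minimal sketch of this splitting step, assuming the split positions come from prior knowledge and `optimize_part` stands for the fig. 1 flow applied to one part (both names are hypothetical):

```python
def optimize_with_split(layers, split_points, optimize_part):
    """Split the network to be optimized at the configured points, run the per-part
    optimization on each part, and concatenate the per-part results."""
    parts, start = [], 0
    for p in sorted(split_points):
        parts.append(layers[start:p])
        start = p
    parts.append(layers[start:])
    return [level for part in parts for level in optimize_part(part)]

# Example: split an 8-layer network after layer 5 and optimize the two parts separately.
layers = [f"layer{i}" for i in range(1, 9)]
print(optimize_with_split(layers, [5], optimize_part=lambda part: [part]))  # dummy per-part optimizer
# [['layer1', 'layer2', 'layer3', 'layer4', 'layer5'], ['layer6', 'layer7', 'layer8']]
```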
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
In this embodiment, the N-layer neural network is divided according to the subnet division modes in which a subnet includes 2 layers, a subnet includes 3 layers, …, and a subnet includes (N-1) layers; the optimal fusion result of each subnet is determined in a bottom-up manner, in order of increasing number of layers included in the subnets, and the optimal fusion result of the N-layer neural network is then determined (this may be referred to as a pyramid layer fusion scheme); in the fusion process, the optimal fusion result of each subnet is retained, and non-optimal fusion results are not retained.
For an N-layer neural network, a bottom-up calculation strategy is adopted, and an optimal fusion result of a 2-layer subnet is calculated first; calculating the optimal fusion result of the 3-layer subnetwork according to the optimal fusion result of the 2-layer subnetwork; and calculating the optimal fusion result of the 4-layer subnetwork based on the optimal fusion result of the 2-layer subnetwork and the optimal fusion result of the 3-layer subnetwork, and the like until the optimal fusion result of the N-layer neural network is determined.
In the process of fusing any sub-network or neural network, at least 2 adjacent layers are respectively selected for fusion (such as a horizontal adjacent layer with the same feature map input or a vertical adjacent layer with the feature map calculation result of the former layer being at least part of the input of the latter layer) according to a fusion rule (which can also be an optimization rule) so as to determine an optimal fusion result meeting a fusion target.
It should be noted that the number of layers involved in fusion cannot exceed the fusion rule limit.
Illustratively, the adjacent layers may include a Conv layer, a nonlinear layer, a pool layer, a fully-connected layer, a deconvolution layer, an upsampling layer, or the like.
For example, if the problem f(i, j) is defined as determining the optimal fusion result of the subnet formed by layers i to j, then the problem f(1, N) involves f(1, h); that is, in the process of determining the optimal fusion result of the N-layer neural network, the optimal fusion result of the subnet formed by layers 1 to h (h < N) needs to be determined.
Accordingly, in solving for f (1, N), each f (i, j) in the following table needs to be solved separately:
| f(1,1) | f(1,2) | f(1,3) | …        | f(1,N)   |
|        | f(2,2) | f(2,3) | f(h-2,h) | …        |
|        |        | f(3,3) | f(h-1,h) | f(N-2,N) |
|        |        |        | f(h,h)   | f(N-1,N) |
|        |        |        |          | f(N,N)   |
For example, each f(i, j) can be solved sequentially from right to left and from bottom to top until the solution of f(1, N) is completed, so as to obtain the optimal fusion result of the N-layer neural network.
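A compact sketch of solving f(1, N) by memoized recursion over these subproblems (equivalent in result to the bottom-up fill described above) follows; the cost model of feature-map volumes and the fusion-rule check are hypothetical stand-ins for the platform-specific quantities, and the W×H factor is normalized out.

```python
from functools import lru_cache

# Toy cost model (hypothetical): vols[t] is the data volume of the feature map between
# layer t and layer t+1; vols[0] is the network input and vols[N] is the network output.
vols = [128, 256, 64, 64, 128]
N = 4
MAX_FUSED_LAYERS = 3             # stand-in for the preset fusion-rule limit

def level_bw(i, j):
    # In-out bandwidth of one level formed by layers i..j: its input map plus its output map.
    return vols[i - 1] + vols[j]

def rule_allows(i, j):
    return (j - i + 1) <= MAX_FUSED_LAYERS

@lru_cache(maxsize=None)
def f(i, j):
    """Optimal fusion result of the subnet formed by layers i..j:
    (minimum in-out bandwidth, levels as (first_layer, last_layer) pairs)."""
    best = None
    if rule_allows(i, j):
        best = (level_bw(i, j), [(i, j)])          # fuse all layers i..j into one level
    for s in range(i, j):                          # or split into two optimal subnets
        bw_left, levels_left = f(i, s)
        bw_right, levels_right = f(s + 1, j)
        cand = (bw_left + bw_right, levels_left + levels_right)
        if best is None or cand[0] < best[0]:
            best = cand
    return best

print(f(1, N))   # e.g. (384, [(1, 2), (3, 4)]) under the toy model above
```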
Effects of the technical solutions provided by the embodiments of the present application are described below with reference to examples.
Taking a comparison with the following greedy fusion scheme as an example, assume that the greedy fusion scheme is implemented as follows (a sketch follows the list):
1. set the on-chip cache size and input the network structure of the neural network;
2. if the on-chip cache can hold the current layer, put it in;
3. if the on-chip cache cannot hold the current layer, end the current level, empty the current cache, and put the current layer into a new level;
4. if all layers have been divided, end; otherwise, go to step 2.
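A minimal sketch of this greedy baseline, with hypothetical per-layer cache occupancies:

```python
def greedy_fuse(layer_sizes, cache_size):
    """Greedy level division (steps 1-4 above): keep putting layers into the current
    level while the on-chip cache can hold them; otherwise start a new level."""
    levels, current, used = [], [], 0
    for name, size in layer_sizes:
        if used + size > cache_size and current:   # cache cannot hold the current layer
            levels.append(current)                 # end the current level, empty the cache
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        levels.append(current)
    return levels

# Hypothetical per-layer cache occupancy.
print(greedy_fuse([("layer1", 3), ("layer2", 4), ("layer3", 2), ("layer4", 5)], 8))
# [['layer1', 'layer2'], ['layer3', 'layer4']]
```

Because this greedy division only checks whether the next layer fits in the cache, the level boundaries it produces can be far from optimal in terms of in-out bandwidth, as the following example illustrates.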
For the neural network shown in fig. 3A, the optimal fusion results obtained according to the greedy fusion scheme and the technical scheme provided in the embodiments of the present application may be shown in fig. 3B and fig. 3C, respectively.
Here, 3×128×64 denotes a 3×3 convolution kernel ("×" may also be written as "x") with 128 input channels and 64 output channels, the stride (step size) is 1, and the size of the input feature map is assumed to be W×H.
For the fusion result shown in fig. 3B, the bandwidth is 128WH + 256WH + 256WH + 128WH = 768WH; for the fusion result shown in fig. 3C, the bandwidth is 128WH + 64WH + 64WH + 128WH = 384WH.
As can be seen, for the neural network shown in fig. 3A, the bandwidth of the optimal fusion result obtained by using the technical solution provided in the embodiment of the present application is half of the bandwidth of the fusion result obtained by using the greedy fusion solution.
The methods provided herein are described above. The apparatus provided in this application is described below:
Referring to fig. 4, which is a schematic structural diagram of a neural network optimization device provided in an embodiment of the present application, as shown in fig. 4, the neural network optimization device may include:
a dividing unit 410, configured to divide the sub-network of the neural network to be optimized;
the optimizing unit 420 is configured to perform network layer fusion on each subnet according to a preset fusion rule and a fusion target, so as to obtain an optimal fusion result of each subnet;
the optimizing unit 420 is further configured to perform layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, so as to obtain an optimal fusion result of the neural network to be optimized; for any level including a plurality of layers in the optimal fusion result of the neural network to be optimized, the structure of the level is consistent with the structure of the optimal fusion result of a target subnet, the target subnet being a subnet including the same layers as the level.
In some embodiments, the dividing unit 410 performs subnet division on the neural network to be optimized, including:
performing subnet division on the neural network to be optimized according to a plurality of different subnet division modes to obtain a plurality of different types of subnets, where the subnets obtained in different subnet division modes include different numbers of layers;
the optimizing unit 420 performs network layer fusion on each subnet according to a preset fusion rule and a fusion target, including:
performing layer fusion on each bottommost subnet according to the preset fusion rule and the fusion target to obtain an optimal fusion result of each bottommost subnet; the bottom-most subnet is a subnet with the least number of layers included in the plurality of different types of subnets;
and performing layer fusion on a higher-layer subnet according to the optimal fusion results of lower-layer subnets, the preset fusion rule and the fusion target to obtain the optimal fusion result of the higher-layer subnet, where the number of layers included in the higher-layer subnet is larger than the number of layers included in the lower-layer subnet; for any level including a plurality of layers in the optimal fusion result of the higher-layer subnet, the structure of the level is consistent with the structure of the optimal fusion result of a target lower-layer subnet, the target lower-layer subnet being a lower-layer subnet including the same layers as the level.
In some embodiments, the bottommost subnet includes 1 layer; the subnets obtained in the other subnet division modes include 2-layer subnets, where a 2-layer subnet includes 2 layers; the optimal fusion result is the fusion result with the minimum in-out bandwidth;
the optimizing unit 420 performing layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule and the fusion target includes:
for any 2-layer subnet, respectively determining a first in-out bandwidth of the 2-layer subnet under the condition that the two layers in the 2-layer subnet are fused into one level, and a second in-out bandwidth of the 2-layer subnet under the condition that the two layers are not fused;
if the first in-out bandwidth is smaller than the second in-out bandwidth, determining that the two layers being fused into one level is the optimal fusion result of the 2-layer subnet;
and if the first in-out bandwidth is greater than the second in-out bandwidth, determining that the two layers not being fused is the optimal fusion result of the 2-layer subnet.
In some embodiments, the optimizing unit 420 performs layer fusion on the higher-layer subnet according to the optimal fusion result of the lower-layer subnet, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, and further includes:
for any higher-layer subnet whose number of layers is k, fusing the higher-layer subnet according to the optimal fusion results of the lower-layer subnets whose numbers of layers are less than k, the preset fusion rule and the fusion target, to obtain the optimal fusion result of the higher-layer subnet, where 2 < k < N and N is the number of layers of the neural network to be optimized.
In some embodiments, for any subnet including a layer number k, a candidate fusion scheme for performing layer fusion on the subnet includes fusing at least 2 layers and at most m layers, m is less than or equal to k, and m meets the fusion rule limit;
the layers include a convolutional (Conv) layer, a nonlinear layer, a pooling (pool) layer, a fully-connected layer, a deconvolution layer, or an upsampling layer.
In some embodiments, before the dividing unit 410 performs subnet division on the neural network to be optimized, the method further includes:
acquiring a network splitting configuration instruction;
splitting the neural network to be optimized into at least two parts to be optimized according to the network splitting configuration instruction;
the dividing unit 410 performs subnet division on the neural network to be optimized, including:
respectively carrying out sub-network division on each part to be optimized;
The optimizing unit 420 performs layer fusion on the neural network to be optimized according to the optimal fusion result of each subnet, the preset fusion rule and the fusion target, and includes:
for any part to be optimized, performing layer fusion on the part to be optimized according to the optimal fusion result of each subnet of the part to be optimized, the preset fusion rule and the fusion target to obtain the optimal fusion result of the part to be optimized;
and determining the optimal fusion result of the neural network to be optimized according to the optimal fusion result of each part to be optimized.
Fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application. The electronic device may include a processor 501 and a memory 502 storing machine-executable instructions. The processor 501 and the memory 502 may communicate via a system bus 503. Also, the processor 501 may perform the neural network optimization method described above by reading and executing the machine-executable instructions in the memory 502 corresponding to the neural network optimization control logic.
The memory 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that may contain or store information, such as executable instructions, data, or the like. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
In some embodiments, a machine-readable storage medium, such as memory 502 in fig. 5, is also provided, having stored thereon machine-executable instructions that when executed by a processor implement the neural network optimization method described above. For example, the machine-readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.