
CN109961138A - Neural network training method and related products


Info

Publication number
CN109961138A
Authority
CN
China
Prior art keywords
data
layer
input data
floating
reversed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711347767.1A
Other languages
Chinese (zh)
Other versions
CN109961138B (en)
Inventor
Not announced (inventor withheld)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Beijing Zhongke Cambrian Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd filed Critical Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201711347767.1A priority Critical patent/CN109961138B/en
Priority to TW107144042A priority patent/TWI793225B/en
Priority to PCT/CN2019/073453 priority patent/WO2019114842A1/en
Publication of CN109961138A publication Critical patent/CN109961138A/en
Priority to US16/721,888 priority patent/US11704545B2/en
Priority to US16/721,885 priority patent/US11308389B2/en
Priority to US16/721,883 priority patent/US20200192632A1/en
Priority to US16/721,882 priority patent/US11586891B2/en
Priority to US16/721,879 priority patent/US11507809B2/en
Priority to US16/721,892 priority patent/US11507810B2/en
Priority to US16/721,875 priority patent/US11562216B2/en
Application granted granted Critical
Publication of CN109961138B publication Critical patent/CN109961138B/en
Priority to US17/010,761 priority patent/US11562219B2/en
Priority to US17/688,844 priority patent/US11900241B2/en
Priority to US17/688,853 priority patent/US11900242B2/en
Priority to US18/085,273 priority patent/US12217162B2/en
Priority to US18/085,332 priority patent/US12136029B2/en
Priority to US18/404,878 priority patent/US12333416B2/en
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING OR CALCULATING; COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides a training method for a neural network executed on an integrated circuit chip device, and related products. The neural network includes multiple layers, and the method includes the following steps: receiving a training instruction; determining first-layer input data and first-layer weight data according to the training instruction; performing, by a computing device, an n-layer forward operation of the neural network on the first-layer input data and the first-layer weight data to obtain an n-th output result; obtaining an n-th output result gradient according to the n-th output result; obtaining the n-th backward operation of the n-th layer's backward computation according to the training instruction; and obtaining, by the computing device, an n-th backward computational complexity according to the n-th output result gradient, the n-th-layer input data, the n-th-layer weight group data, and the n-th backward operation. The technical solution provided by the present disclosure has the advantages of a small amount of computation and low power consumption.

Description

Neural network training method and related products
Technical field
The present disclosure relates to the field of neural networks, and in particular to a neural network training method and related products.
Background technique
Artificial neural networks (ANNs) have been a research hotspot in the field of artificial intelligence since the 1980s. An ANN abstracts the neuronal network of the human brain from an information-processing perspective, builds simple models, and forms different networks through different connection topologies. In engineering and academia it is often referred to simply as a neural network or neural-network-like model. A neural network is a computational model composed of a large number of interconnected nodes (or neurons). Existing neural network operations implement the forward operation of a neural network on a CPU (Central Processing Unit) or GPU (Graphics Processing Unit); such forward operations are computationally intensive and power-hungry.
Summary of the invention
Embodiments of the present disclosure provide a neural network training method and related products, which can increase the processing speed of a computing device and improve its efficiency.
In a first aspect, a training method of a neural network executed on an integrated circuit chip device is provided, the neural network including n layers, wherein the method includes the following steps:
receiving a training instruction, determining first-layer input data and first-layer weight group data according to the training instruction, and performing, by a computing device, the n-layer forward operation of the neural network on the first-layer input data and the first-layer weight group data to obtain the n-th output result of the forward operation;
obtaining an n-th output result gradient according to the n-th output result; obtaining the n-th backward operation of the n-th layer's backward computation according to the training instruction; obtaining an n-th backward computational complexity according to the n-th output result gradient, the n-th-layer input data, the n-th-layer weight group data, and the n-th backward operation; determining, according to the n-th backward computational complexity, the n-th backward data type corresponding to the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data; performing the n-th-layer backward operation of the neural network on the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data in the n-th backward data type to obtain an n-th-layer weight group gradient and an n-th-layer input data gradient; and updating the n-th-layer weight group data using the n-th-layer weight group gradient, the n-th backward data type including a fixed-point type or a floating-point type;
using the n-th-layer input data gradient as the (n-1)-th output result gradient of the (n-1)-th layer, performing the backward operations of the remaining n-1 layers to obtain their weight group gradients, and updating the weight group data of the corresponding layers with those weight group gradients, the weight group data including at least two weights.
In a second aspect, an integrated circuit chip device is provided for performing the training operation of a neural network, the neural network including n layers; the integrated circuit chip device includes a processing circuit and an external interface;
the external interface is configured to receive a training instruction;
the processing circuit is configured to determine first-layer input data and first-layer weight group data according to the training instruction, and to perform the n-layer forward operation of the neural network on the first-layer input data and the first-layer weight group data to obtain the n-th output result of the forward operation;
the processing circuit is further configured to obtain an n-th output result gradient according to the n-th output result, obtain the n-th backward operation of the n-th layer's backward computation according to the training instruction, obtain an n-th backward computational complexity according to the n-th output result gradient, the n-th-layer input data, the n-th-layer weight group data, and the n-th backward operation, determine, according to the n-th backward computational complexity, the n-th backward data type corresponding to the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data, perform the n-th-layer backward operation of the neural network on the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data in the n-th backward data type to obtain an n-th-layer weight group gradient and an n-th-layer input data gradient, and update the n-th-layer weight group data using the n-th-layer weight group gradient; the n-th backward data type includes a fixed-point type or a floating-point type;
the processing circuit is further configured to use the n-th-layer input data gradient as the (n-1)-th output result gradient of the (n-1)-th layer, perform the backward operations of the remaining n-1 layers to obtain their weight group gradients, and update the weight group data of the corresponding layers with those gradients, the weight group data including at least two weights.
In a third aspect, a neural network operation device is provided, which includes one or more of the integrated circuit chip devices provided in the second aspect.
In a fourth aspect, a combined processing device is provided, which includes the neural network operation device provided in the third aspect, a universal interconnection interface, and a general-purpose processing device;
the neural network operation device is connected to the general-purpose processing device through the universal interconnection interface.
In a fifth aspect, a chip is provided, which integrates the device of the second aspect, the device of the third aspect, or the device of the fourth aspect.
In a sixth aspect, an electronic device is provided, which includes the chip of the fifth aspect.
As can be seen that providing data conversion computing circuit by present disclosure embodiment and converting the type of data block Operation afterwards saves transfer resource and computing resource, so it is with low in energy consumption, the small advantage of calculation amount.
Detailed description of the invention
Fig. 1 is a schematic diagram of a neural network training method.
Fig. 1a is a schematic diagram of a neural network forward operation.
Fig. 1b is a schematic structural diagram of a fixed-point data type.
Fig. 2a is a schematic diagram of convolution input data.
Fig. 2b is a schematic diagram of a convolution kernel.
Fig. 2c is a schematic diagram of an operation window of a three-dimensional data block of the input data.
Fig. 2d is a schematic diagram of another operation window of a three-dimensional data block of the input data.
Fig. 2e is a schematic diagram of a further operation window of a three-dimensional data block of the input data.
Fig. 3a is a schematic structural diagram of a neural network chip.
Fig. 3b is a schematic structural diagram of another neural network chip.
Fig. 4a is a schematic diagram of a matrix multiplied by a matrix.
Fig. 4b is a flow chart of a method of multiplying a matrix by a matrix.
Fig. 4c is a schematic diagram of a matrix multiplied by a vector.
Fig. 4d is a flow chart of a method of multiplying a matrix by a vector.
Fig. 4e is a schematic diagram of neural network training.
Fig. 4f is another schematic diagram of neural network training.
Fig. 4g is a schematic diagram of neural network forward and backward operations.
Fig. 4h is a schematic diagram of a multi-layer structure in neural network training.
Fig. 5a is a schematic structural diagram of a combined processing device also disclosed by the present disclosure.
Fig. 5b is another schematic structural diagram of a combined processing device also disclosed by the present disclosure.
Fig. 5c is a schematic structural diagram of a neural network processor board card provided by an embodiment of the present disclosure.
Fig. 5d is a schematic structural diagram of a neural network chip package structure provided by an embodiment of the present disclosure.
Fig. 5e is a schematic structural diagram of a neural network chip provided by an embodiment of the present disclosure.
Fig. 6 is a schematic diagram of a neural network chip package structure provided by an embodiment of the present disclosure.
Fig. 6a is a schematic diagram of another neural network chip package structure provided by an embodiment of the present disclosure.
Specific embodiment
In order to enable those skilled in the art to better understand the solution of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
In the method provided in the first aspect, determining, according to the n-th backward computational complexity, the n-th backward data type corresponding to the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data includes:
comparing the n-th backward computational complexity with a preset threshold; if the n-th backward computational complexity is higher than the preset threshold, determining that the n-th backward data type is the fixed-point type; if the n-th backward computational complexity is lower than or equal to the preset threshold, the computing device determines that the n-th backward data type is the floating-point type.
In the method provided in the first aspect, after determining, according to the n-th backward computational complexity, the n-th backward data type corresponding to the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data, the method further includes:
determining the (n+1)-th backward data type to which the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data belong; and, if the (n+1)-th backward data type is different from the n-th backward data type, converting the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data belonging to the (n+1)-th backward data type into the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data belonging to the n-th backward data type.
In the method provided in the first aspect, if the n-th-layer backward operation is a convolution operation, with the convolution input data being the n-th-layer input data and the convolution kernel being the n-th output result gradient:
n-th backward computational complexity = α * C * kH * kW * M * N * W * C * H;
where α is a convolution coefficient with a value greater than 1; C, kH, kW, M are the values of the four dimensions of the convolution kernel; and N, W, C, H are the values of the four dimensions of the convolution input data.
If the complexity is greater than the set threshold, the n-th backward data type is determined to be the fixed-point type; it is determined whether the convolution input data and the convolution kernel are fixed-point data, and if they are not, the convolution input data is converted into fixed-point data, the convolution kernel is converted into fixed-point data, and the convolution operation is then performed on the convolution input data and the convolution kernel in the fixed-point data type.
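A minimal sketch of this decision rule follows, assuming illustrative values for the coefficient α, the threshold, and the layer shapes (none of which are prescribed by the disclosure):

```python
# Hedged sketch: the n-th backward computational complexity of a convolution
# layer and the threshold-based data type selection. alpha, the threshold,
# and the shapes are assumed example values, not values from the disclosure.

def backward_conv_complexity(kernel_shape, input_shape, alpha=2.0):
    C, kH, kW, M = kernel_shape   # four dimensions of the convolution kernel
    N, W, C2, H = input_shape     # four dimensions of the convolution input
    return alpha * C * kH * kW * M * N * W * C2 * H

def choose_backward_dtype(complexity, threshold=1e9):
    # Higher complexity -> fixed point (cheaper transfer and arithmetic);
    # otherwise keep floating point for precision.
    return "fixed-point" if complexity > threshold else "floating-point"

c = backward_conv_complexity((64, 3, 3, 128), (32, 56, 64, 56))
print(choose_backward_dtype(c))   # -> "fixed-point" for this large layer
```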
In the method provided in the first aspect, if the n-th backward operation is a matrix-multiplied-by-matrix operation, with the input data being the n-th-layer input data and the weight being the n-th output result gradient:
complexity = β * F * G * E * F; where β is a matrix coefficient with a value greater than or equal to 1, F and G are the row and column values of the n-th-layer input data, and E and F are the row and column values of the weight.
If the complexity is greater than the set threshold, the n-th backward data type is determined to be the fixed-point type; it is determined whether the n-th-layer input data and the weight are fixed-point data, and if they are not, the n-th-layer input data is converted into fixed-point data, the weight is converted into fixed-point data, and the matrix-multiplied-by-matrix operation is then performed on the n-th-layer input data and the weight in the fixed-point data type.
In the method provided in the first aspect, if the n-th backward operation is a matrix-multiplied-by-vector operation, with the input data being the n-th-layer input data and the weight being the n-th output result gradient:
complexity = β * F * G * F; where β is a matrix coefficient with a value greater than or equal to 1, F and G are the row and column values of the n-th-layer input data, and F is the column value of the n-th output result gradient.
If the complexity is greater than the set threshold, the n-th backward data type is determined to be the fixed-point type; it is determined whether the n-th-layer input data and the weight are fixed-point data, and if they are not, the n-th-layer input data is converted into fixed-point data, the weight is converted into fixed-point data, and the matrix-multiplied-by-vector operation is then performed on the n-th-layer input data and the weight in the fixed-point data type.
In the method provided in the first aspect, the n-th backward operation may further include one of a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation, or any combination thereof.
In the device provided in the second aspect, the processing circuit specifically compares the n-th backward computational complexity with a preset threshold; if the n-th backward computational complexity is higher than the preset threshold, it determines that the n-th backward data type is the fixed-point type; if the n-th backward computational complexity is lower than or equal to the preset threshold, it determines that the n-th backward data type is the floating-point type.
In the device provided in the second aspect, the integrated circuit chip device further includes a data type conversion circuit;
the processing circuit is further configured to determine the (n+1)-th backward data type to which the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data belong, and, if the (n+1)-th backward data type is different from the n-th backward data type, to send a conversion command to the data type conversion circuit;
the data type conversion circuit is configured to convert the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data belonging to the (n+1)-th backward data type into the n-th output result gradient, the n-th-layer input data, and the n-th-layer weight group data belonging to the n-th backward data type.
In the device provided in the second aspect, if the n-th-layer backward operation is a convolution operation, with the convolution input data being the n-th-layer input data and the convolution kernel being the n-th output result gradient:
the processing circuit is configured to calculate the n-th backward computational complexity as
n-th backward computational complexity = α * C * kH * kW * M * N * W * C * H;
where α is a convolution coefficient with a value greater than 1, C, kH, kW, M are the values of the four dimensions of the convolution kernel, and N, W, C, H are the values of the four dimensions of the convolution input data;
the processing circuit is further configured to, if the complexity is greater than the set threshold, determine that the n-th backward data type is the fixed-point data type and determine whether the convolution input data and the convolution kernel are fixed-point data; if the convolution input data and the convolution kernel are not fixed-point data, convert the convolution input data into fixed-point data, convert the convolution kernel into fixed-point data, and then perform the convolution operation on the convolution input data and the convolution kernel in the fixed-point data type.
In the device provided in the second aspect, if the n-th backward operation is a matrix-multiplied-by-matrix operation, with the input data being the n-th-layer input data and the weight being the n-th output result gradient:
the processing circuit is configured to calculate the n-th backward computational complexity as
n-th backward computational complexity = β * F * G * E * F; where β is a matrix coefficient with a value greater than or equal to 1, F and G are the row and column values of the n-th-layer input data, and E and F are the row and column values of the weight;
the processing circuit is further configured to, if the complexity is greater than the set threshold, determine that the n-th backward data type is the fixed-point data type and determine whether the n-th-layer input data and the weight are fixed-point data; if the n-th-layer input data and the weight are not fixed-point data, convert the n-th-layer input data into fixed-point data, convert the weight into fixed-point data, and then perform the matrix-multiplied-by-matrix operation on the n-th-layer input data and the weight in the fixed-point data type.
In the device provided in the second aspect, if the n-th backward operation is a matrix-multiplied-by-vector operation, with the input data being the n-th-layer input data and the weight being the n-th output result gradient:
the processing circuit is configured to calculate the n-th backward computational complexity as
n-th backward computational complexity = β * F * G * F; where β is a matrix coefficient with a value greater than or equal to 1, F and G are the row and column values of the n-th-layer input data, and F is the column value of the n-th output result gradient;
the processing circuit is further configured to, if the complexity is greater than the set threshold, determine that the n-th backward data type is the fixed-point data type and determine whether the n-th-layer input data and the weight are fixed-point data; if the n-th-layer input data and the weight are not fixed-point data, convert the n-th-layer input data into fixed-point data, convert the weight into fixed-point data, and then perform the matrix-multiplied-by-vector operation on the n-th-layer input data and the weight in the fixed-point data type.
As shown in Fig. 1, the steps of neural network training include:
each layer of a (multi-layer) neural network performing the forward operation in turn;
performing the backward operations layer by layer in the opposite order to obtain the weight gradients;
updating the weights used in the forward operation with the computed weight gradients.
This is one iteration of neural network training; the whole training process repeats this process (i.e., performs successive iterations of this computation) many times, as sketched below.
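A minimal runnable sketch of this loop, assuming a toy network whose layers each compute y = w * x (the hardware realization in the disclosure is of course very different):

```python
# Toy illustration of one training iteration: forward layer by layer, backward
# in reverse order producing weight gradients, then weight update. All names
# and the learning rate are assumptions made for this sketch.

class ScalarLayer:
    def __init__(self, w):
        self.w = w
    def forward(self, x):
        return self.w * x

layers, lr = [ScalarLayer(0.5), ScalarLayer(2.0)], 0.05
x, target = 1.0, 3.0
for _ in range(100):                      # repeated training iterations
    acts = [x]
    for layer in layers:                  # 1) forward operation, layer by layer
        acts.append(layer.forward(acts[-1]))
    grad = acts[-1] - target              # loss gradient at the network output
    for layer, inp in zip(reversed(layers), reversed(acts[:-1])):
        w_grad = grad * inp               # 2) backward: gradient of the weight
        grad = grad * layer.w             #    and gradient for the layer below
        layer.w -= lr * w_grad            # 3) update the forward-pass weight
print(layers[0].w * layers[1].w)          # -> approaches 3.0 (= target / x)
```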
As shown in Fig. 1a, in the forward operation of a neural network provided by an embodiment of the present disclosure, each layer uses its own input data and weights to compute the corresponding output data according to the operation rules specified by the type of the layer.
The forward operation process of a neural network (also called inference) processes the input data of each layer in turn, performs certain computations, and obtains the output data; it has the following characteristics:
The input of a layer:
the input of a layer can be the input data of the neural network;
the input of a layer can be the output of another layer;
the input of a layer can be the output of this layer at the previous moment (corresponding to the case of recurrent neural networks);
a layer can obtain input from multiple of the above input sources simultaneously.
The output of a layer:
the output of a layer can serve as the output result of the neural network;
the output of a layer can be the input of another layer;
the output of a layer can be the input of this layer at the next moment (in the case of recurrent neural networks);
the output of a layer can be output to multiple of the above output destinations.
Specifically, the types of operations of the layers in the neural network include but are not limited to the following:
convolutional layers (i.e., performing convolution operations);
fully connected layers (performing fully connected operations);
normalization (regularization) layers, including LRN (Local Response Normalization) layers, BN (Batch Normalization) layers, and other types;
pooling layers;
activation layers, including but not limited to the following types: Sigmoid layers, ReLU layers, PReLU layers, LeakyReLU layers, and Tanh layers.
In the backward operation of a layer, each layer's backward operation needs to perform two parts of computation: one part uses the output data gradient, which may be sparsely represented, and the input data, which may be sparsely represented, to compute the weight gradient (used in the "weight update" step to update the weights of this layer); the other part uses the output data gradient, which may be sparsely represented, and the weights, which may be sparsely represented, to compute the input data gradient (used as the output data gradient of the next layer in the backward operation so that that layer can perform its own backward operation).
The backward operation propagates the gradient backward from the last layer, in the order opposite to the forward operation. A sketch of the two parts for a fully connected layer follows.
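A sketch of the two parts for a fully connected layer, assuming dense (non-sparse) representations and the usual matrix layout; it is an illustration, not the device's circuit behavior:

```python
import numpy as np

# The two parts of a layer's backward operation described above, written out
# for a fully connected layer.

def fc_backward(x, W, out_grad):
    weight_grad = x.T @ out_grad    # part 1: from input data and output-data
                                    # gradient; used by the "weight update" step
    input_grad = out_grad @ W.T     # part 2: from output-data gradient and
                                    # weights; becomes the output-data gradient
                                    # of the next layer in the backward pass
    return weight_grad, input_grad

x = np.random.randn(4, 8)           # batch of 4 inputs with 8 features
W = np.random.randn(8, 3)           # this layer's weights
g = np.random.randn(4, 3)           # output-data gradient from the layer above
wg, ig = fc_backward(x, W, g)
print(wg.shape, ig.shape)           # -> (8, 3) (4, 8)
```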
In an optional scheme, the output data gradient obtained by a layer's backward computation can come from:
the gradient returned by the final loss function (also called the cost function) of the neural network;
the input data gradients of other layers;
the input data gradient of this layer at the previous moment (corresponding to the case of recurrent neural networks);
a layer can obtain output data gradients from multiple of the above sources simultaneously.
After the backward operation of the neural network has been performed, the weight gradient of each layer has been computed; in this step, the first input buffer and the second input buffer of the device are used to store the weights and the weight gradients of this layer respectively, and the operation unit then uses the weight gradients to update the weights.
The operations mentioned above are all operations of one layer in the neural network. When a multi-layer neural network is realized, in the forward operation, after the forward operation of the previous layer of the artificial neural network has completed, the operation instruction of the next layer takes the output data computed in the operation unit as the input data of the next layer and performs its operation (or performs certain operations on that output data before using it as the input data of the next layer), and the weights are likewise replaced with the weights of the next layer; in the backward operation, after the backward operation of the previous layer of the artificial neural network has completed, the operation instruction of the next layer takes the input data gradient computed in the operation unit as the output data gradient of the next layer and performs its operation (or performs certain operations on that input data gradient before using it as the output data gradient of the next layer), and the weights are likewise replaced with the weights of the next layer (as illustrated in the figures, in which dotted arrows indicate backward operations and solid arrows indicate forward operations).
Representation of fixed-point data
The fixed-point method refers to converting the representation of the data of a data block in the network into a data encoding with a specific fixed decimal-point position (the arrangement of the 0/1 bits by which data is mapped onto a circuit device).
In an optional scheme, multiple data are grouped into a data block as a whole and expressed in fixed point using the same fixed-point representation method.
Fig. 1b shows a specific representation method of a short-bit-width fixed-point data structure for storing data according to an embodiment of the present invention, in which 1 bit is used to represent the sign, M bits are used to represent the integer part, and N bits are used to represent the fractional part. Compared with the 32-bit floating-point data representation, the short-bit-width fixed-point data representation used by the present invention not only occupies fewer bits but also, for data of the same layer and same type in the neural network (for example, all the weight data of the first convolutional layer), additionally sets a flag bit, Point location, to record the position of the decimal point, so that the representation precision and the representable data range can be adjusted according to the distribution of the actual data.
A floating-point number is represented with 32 bits; in this technical solution, using fixed-point numbers reduces the number of bits of one value, thereby reducing the amount of data transmitted and the amount of data involved in operations. A minimal sketch of this encoding follows.
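The sketch below assumes a 16-bit word and an example Point location; the concrete bit widths in a real device are configuration choices, not fixed by the text:

```python
# Hedged sketch of the short-bit-width fixed-point format of Fig. 1b:
# a signed code word plus a per-data-block "Point location" that places the
# decimal point, so value ~= code * 2**point_location.

def to_fixed(x, point_location, total_bits=16):
    code = round(x / 2.0 ** point_location)          # quantize to a code word
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, code))                    # saturate to the range

def from_fixed(code, point_location):
    return code * 2.0 ** point_location

# One data block (e.g. all weights of the first convolutional layer) shares a
# single point location, chosen from the distribution of its actual values.
pl = -10                                             # 10 fractional bits
codes = [to_fixed(v, pl) for v in (0.5, -1.25, 3.1415)]
print([from_fixed(c, pl) for c in codes])            # ~[0.5, -1.25, 3.1416]
```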
Input data indicated with Fig. 2 a (N number of sample, each sample have C channel, a height of H of the characteristic pattern in each channel, Width is W), weight namely convolution kernel indicate (there is M convolution kernel, each convolution kernel has C channel, and height and width are respectively with Fig. 2 b KH and KW).For N number of sample of input data, the rule of convolution algorithm is the same, and explained later is on a sample The process of convolution algorithm is carried out, on a sample, each of M convolution kernel will carry out same operation, Mei Gejuan Product kernel operation obtains a sheet of planar characteristic pattern, and M plane characteristic figure is finally calculated in M convolution kernel, (to a sample, volume Long-pending output is M characteristic pattern), for a convolution kernel, inner product fortune is carried out in each plan-position of a sample It calculates, is slided then along the direction H and W, for example, Fig. 2 c indicates that a convolution kernel is right in a sample of input data The position of inferior horn carries out the corresponding diagram of inner product operation;Fig. 2 d indicates that the position of convolution slides a lattice and Fig. 2 e to the left and indicates convolution One lattice of position upward sliding.
When the first operation is a convolution operation, the input data is the convolution input data and the weight data is the convolution kernel:
first complexity = α * C * kH * kW * M * N * W * C * H;
where α is a convolution coefficient with a value greater than 1; C, kH, kW, M are the values of the four dimensions of the convolution kernel; and N, W, C, H are the values of the four dimensions of the convolution input data.
If the first complexity is greater than the set threshold, it is determined whether the convolution input data and the convolution kernel are fixed-point data; if the convolution input data and the convolution kernel are not fixed-point data, the convolution input data is converted into fixed-point data, the convolution kernel is converted into fixed-point data, and the convolution operation is then performed on the convolution input data and the convolution kernel in the fixed-point data type.
Specifically, the convolution process can be handled by a chip structure such as that shown in Fig. 3a. When the first complexity is greater than the set threshold, the data conversion computing circuit of the main processing circuit (which may also be called the main unit) can convert the data in some or all of the convolution kernels of the weight into fixed-point data, and the control circuit of the main processing circuit sends the data of some or all of the convolution kernels of the weight, through the lateral data input interfaces, to those basic processing circuits (which may also be called basic units) directly connected to the main processing circuit (for example, the vertical data paths filled in grey at the top of Fig. 3b).
In an optional scheme, the control circuit of the main processing circuit sends the data of one convolution kernel of the weight one number, or a part of the numbers, at a time to a given basic processing circuit (for example, the 1st number of the 3rd row is sent the 1st time, the 2nd number of the 3rd row the 2nd time, the 3rd number of the 3rd row the 3rd time, and so on; or the first two numbers of the 3rd row are sent the 1st time, the 3rd and 4th numbers of the 3rd row the 2nd time, the 5th and 6th numbers of the 3rd row the 3rd time, and so on).
In another optional scheme, the control circuit of the main processing circuit sends the data of several convolution kernels of the weight one number each, or a part of the numbers each, at a time to a given basic processing circuit (for example, the 1st number of each of rows 3, 4, and 5 is sent the 1st time, the 2nd number of each of rows 3, 4, and 5 the 2nd time, the 3rd number of each of rows 3, 4, and 5 the 3rd time, and so on; or the first two numbers of each of rows 3, 4, and 5 are sent the 1st time, the 3rd and 4th numbers of each of rows 3, 4, and 5 the 2nd time, the 5th and 6th numbers of each of rows 3, 4, and 5 the 3rd time, and so on).
The control circuit of the main processing circuit divides the input data according to the positions of the convolution, and sends the data of some or all of the convolution positions in the input data, through the vertical data input interfaces, to those basic processing circuits directly connected to the main processing circuit (for example, the lateral data paths filled in grey on the left side of the basic processing circuit array in Fig. 3b).
In an optional scheme, the control circuit of the main processing circuit sends the data of one convolution position in the input data one number, or a part of the numbers, at a time to a given basic processing circuit (for example, the 1st number of the 3rd column is sent the 1st time, the 2nd number of the 3rd column the 2nd time, the 3rd number of the 3rd column the 3rd time, and so on; or the first two numbers of the 3rd column are sent the 1st time, the 3rd and 4th numbers of the 3rd column the 2nd time, the 5th and 6th numbers of the 3rd column the 3rd time, and so on).
In another optional scheme, the control circuit of the main processing circuit sends the data of several convolution positions in the input data one number each, or a part of the numbers each, at a time to a given basic processing circuit (for example, the 1st number of each of columns 3, 4, and 5 is sent the 1st time, the 2nd number of each of columns 3, 4, and 5 the 2nd time, the 3rd number of each of columns 3, 4, and 5 the 3rd time, and so on; or the first two numbers of each of columns 3, 4, and 5 are sent the 1st time, the 3rd and 4th numbers of each of columns 3, 4, and 5 the 2nd time, the 5th and 6th numbers of each of columns 3, 4, and 5 the 3rd time, and so on).
After a basic processing circuit receives the data of the weight, it transmits that data through its lateral data output interface to the next basic processing circuit connected to it (for example, the lateral data paths filled in white in the middle of the basic processing circuit array in Fig. 3b); after a basic processing circuit receives the data of the input data, it transmits that data through its vertical data output interface to the next basic processing circuit connected to it (for example, the vertical data paths filled in white in the middle of the basic processing circuit array in Fig. 3b).
Each basic processing circuit performs operations on the data it receives:
in an optional scheme, a basic processing circuit computes the multiplication of one or more groups of two data items at a time, and then accumulates the results into its register and/or on-chip cache;
in an optional scheme, a basic processing circuit computes the inner product of one or more groups of two vectors at a time, and then accumulates the results into its register and/or on-chip cache.
After a basic processing circuit has computed a result, it can transmit the result out through its data output interface:
in an optional scheme, the result can be the final result or an intermediate result of the inner product operation;
specifically, if the basic processing circuit has an output interface directly connected to the main processing circuit, the result is transmitted through that interface; if not, the result is output in the direction of the basic processing circuit that can output directly to the main processing circuit (for example, in Fig. 3b the bottom row of basic processing circuits output their results directly to the main processing circuit, while the other basic processing circuits transmit their operation results downward through their vertical output interfaces).
After a basic processing circuit receives a computed result from another basic processing circuit, it transmits that data to the other basic processing circuit or to the main processing circuit connected to it;
the result is output in the direction in which output can reach the main processing circuit directly (for example, the bottom row of basic processing circuits output their results directly to the main processing circuit, while the other basic processing circuits transmit their operation results downward through their vertical output interfaces);
the main processing circuit receives the inner product operation results of each basic processing circuit to obtain the output result. A toy simulation of this grid dataflow follows.
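The simulation below is purely functional: the timing, relaying, and interfaces of the real array are not modeled, and the flattened shapes are assumptions.

```python
import numpy as np

# Each basic processing circuit (r, c) effectively sees one row of kernel data
# relayed laterally and one column of input data relayed vertically, and
# multiply-accumulates them; results flow back toward the main circuit.

def grid_inner_products(kernel_rows, input_cols):
    R, C = len(kernel_rows), len(input_cols)
    acc = np.zeros((R, C))                    # per-circuit accumulators
    for r in range(R):
        for c in range(C):
            acc[r, c] = np.dot(kernel_rows[r], input_cols[c])
    return acc                                # gathered by the main circuit

K = np.random.randn(3, 6)                     # flattened convolution kernels
X = np.random.randn(4, 6)                     # flattened convolution windows
print(np.allclose(grid_inner_products(K, X), K @ X.T))  # -> True
```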
Referring to Fig. 4a, Fig. 4a shows a matrix-multiplied-by-matrix operation. If the first operation is a matrix-multiplied-by-matrix operation, the input data is the first matrix of the matrix-multiplied-by-matrix operation and the weight is the second matrix of the matrix-multiplied-by-matrix operation:
first complexity = β * F * G * E * F; where β is a matrix coefficient with a value greater than or equal to 1, F and G are the row and column values of the first matrix, and E and F are the row and column values of the second matrix.
If the first complexity is greater than the set threshold, it is determined whether the first matrix and the second matrix are fixed-point data; if the first matrix and the second matrix are not fixed-point data, the first matrix is converted into fixed-point data, the second matrix is converted into fixed-point data, and the matrix-multiplied-by-matrix operation is then performed on the first matrix and the second matrix in the fixed-point data type.
Referring to Fig. 4b, the matrix-multiplied-by-matrix operation is completed using the device shown in Fig. 3b.
The following describes the operation of multiplying a matrix S of size M rows and L columns by a matrix P of size L rows and N columns (each row of matrix S has the same length as each column of matrix P, as shown in Fig. 2d), where the neural network computing device has K basic processing circuits:
Step S401b: when the first complexity is greater than the set threshold, the main processing circuit converts matrix S and matrix P into fixed-point data; the control circuit of the main processing circuit distributes each row of data in matrix S to one of the K basic processing circuits, and the basic processing circuits store the received data in their on-chip caches and/or registers; specifically, the data can be sent to the basic processing circuits among the K basic processing circuits that are connected to the main processing circuit.
In an optional scheme, if the number of rows M of S is less than or equal to K, the control circuit of the main processing circuit distributes one row of the S matrix to each of M basic processing circuits;
in an optional scheme, if the number of rows M of S is greater than K, the control circuit of the main processing circuit distributes the data of one or more rows of the S matrix to each basic processing circuit.
Let Mi rows of S be distributed to the i-th basic processing circuit, and let the set of these Mi rows be called Ai; Fig. 2e represents the computation to be performed on the i-th basic processing circuit.
In an optional scheme, in each basic processing circuit, for example in the i-th basic processing circuit, the received matrix Ai distributed by the main processing circuit is stored in the register and/or on-chip cache of the i-th basic processing circuit; the advantage is that the subsequent amount of data transmission is reduced, the computational efficiency is improved, and the power consumption is reduced.
Step S402b: the control circuit of the main processing circuit transmits each part of matrix P to each basic processing circuit in a broadcast manner.
In an optional scheme, each part of matrix P can be broadcast only once into the register or on-chip cache of each basic processing circuit, and the i-th basic processing circuit fully reuses the data of matrix P obtained this time, completing the inner product operations corresponding to each row of matrix Ai; the reuse in this embodiment is specifically repeated use by the basic processing circuits in computation, for example using the data of matrix P multiple times.
In an optional scheme, the control circuit of the main processing circuit can broadcast each part of matrix P multiple times into the registers or on-chip caches of the basic processing circuits, and the i-th basic processing circuit does not reuse the data of matrix P obtained each time, completing the inner product operations corresponding to each row of matrix Ai in several passes.
In an optional scheme, the control circuit of the main processing circuit can broadcast each part of matrix P multiple times into the registers or on-chip caches of the basic processing circuits, and the i-th basic processing circuit partially reuses the data of matrix P obtained each time, completing the inner product operations corresponding to each row of matrix Ai.
In an optional scheme, each basic processing circuit, for example the i-th basic processing circuit, computes the inner products of the data of matrix Ai and the data of matrix P.
Step S403b: the accumulator circuit of each basic processing circuit accumulates the results of the inner product operations and transmits them back to the main processing circuit.
In an optional scheme, the basic processing circuits can transmit the partial sums obtained from each inner product operation back to the main processing circuit for accumulation;
in an optional scheme, the partial sums obtained from the inner product operations performed by each basic processing circuit can also be stored in the register and/or on-chip cache of the basic processing circuit and transmitted back to the main processing circuit after the accumulation has finished;
in an optional scheme, the partial sums obtained from the inner product operations performed by each basic processing circuit can also, in some cases, be stored in the register and/or on-chip cache of the basic processing circuit for accumulation and, in some cases, be transmitted to the main processing circuit for accumulation, and then transmitted back to the main processing circuit after the accumulation has finished. A sketch of the overall scheme follows.
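A minimal sketch of steps S401b-S403b using the full-reuse broadcast variant, assuming K = 4 and small example shapes (the fixed-point conversion is omitted here for brevity):

```python
import numpy as np

# Rows of S are distributed over K basic processing circuits, matrix P is
# broadcast once and fully reused, and each circuit returns the accumulated
# inner products for its rows to the main processing circuit.

def distributed_matmul(S, P, K=4):
    row_chunks = np.array_split(S, K, axis=0)   # distribute the rows Ai of S
    partial = [Ai @ P for Ai in row_chunks]     # per-circuit inner products,
                                                # reusing the broadcast P
    return np.vstack(partial)                   # main circuit collects results

S = np.random.randn(10, 6)                      # M = 10 rows, L = 6 columns
P = np.random.randn(6, 7)                       # L = 6 rows, N = 7 columns
print(np.allclose(distributed_matmul(S, P), S @ P))  # -> True
```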
Referring to Fig. 4c, which is a schematic diagram of a matrix-multiplied-by-vector operation: if the first operation is a matrix-multiplied-by-vector operation, the input data is the first matrix of the matrix-multiplied-by-vector operation and the weight is the vector of the matrix-multiplied-by-vector operation;
first complexity = β * F * G * F; where β is a matrix coefficient with a value greater than or equal to 1, F and G are the row and column values of the first matrix, and F is the column value of the vector.
If the first complexity is greater than the set threshold, it is determined whether the first matrix and the vector are fixed-point data; if the first matrix and the vector are not fixed-point data, the first matrix is converted into fixed-point data, the vector is converted into fixed-point data, and the matrix-multiplied-by-vector operation is then performed on the first matrix and the vector in the fixed-point data type.
Referring to Fig. 4d, Fig. 4d provides an implementation method of matrix-multiplied-by-vector, which can specifically include:
Step S401: the data type conversion circuit of the main processing circuit converts each row of data in matrix S into fixed-point data; the control circuit of the main processing circuit distributes the rows to some of the K basic processing circuits, and the basic processing circuits store the received distributed data in their on-chip caches and/or registers.
In an optional scheme, if the number of rows M of matrix S is less than or equal to K, the control circuit of the main processing circuit distributes one row of the S matrix to each of the K basic processing circuits;
in an optional scheme, if the number of rows M of matrix S is greater than K, the control circuit of the main processing circuit distributes the data of one or more rows of the S matrix to each basic processing circuit.
Let the set of rows of S distributed to the i-th basic processing circuit be Ai, with Mi rows in total; Fig. 2c represents the computation to be performed on the i-th basic processing circuit.
In an optional scheme, in each basic processing circuit, for example in the i-th basic processing circuit, the received distributed data, such as matrix Ai, can be stored in the register and/or on-chip cache of the i-th basic processing circuit; the advantage is that the subsequent amount of transmission of distributed data is reduced, the computational efficiency is improved, and the power consumption is reduced.
Step S402: the data type conversion circuit of the main processing circuit converts vector P into fixed-point data, and the control circuit of the main processing circuit broadcasts each part of the fixed-point vector P to the K basic processing circuits.
In an optional scheme, the control circuit of the main processing circuit can broadcast each part of vector P only once into the register or on-chip cache of each basic processing circuit, and the i-th basic processing circuit fully reuses the data of vector P obtained this time, completing the inner product operations corresponding to each row of matrix Ai; the advantage is that the amount of data of vector P repeatedly transmitted from the main processing circuit to the basic processing circuits is reduced, the execution efficiency is improved, and the transmission power consumption is reduced.
In an optional scheme, the control circuit of the main processing circuit can broadcast each part of vector P multiple times into the registers or on-chip caches of the basic processing circuits, and the i-th basic processing circuit does not reuse the data of vector P obtained each time, completing the inner product operations corresponding to each row of matrix Ai in several passes; the advantage is that the amount of data of vector P transmitted in a single transfer inside the basic processing circuit is reduced, the capacity of the basic processing circuit's cache and/or register can be reduced, the execution efficiency is improved, the transmission power consumption is reduced, and the cost is reduced.
In an optional scheme, the control circuit of the main processing circuit can broadcast each part of vector P multiple times into the registers or on-chip caches of the basic processing circuits, and the i-th basic processing circuit partially reuses the data of vector P obtained each time, completing the inner product operations corresponding to each row of matrix Ai; the advantage is that the amount of data transmitted from the main processing circuit to the basic processing circuits is reduced, the amount of data transmitted inside the basic processing circuits is also reduced, the execution efficiency is improved, and the transmission power consumption is reduced.
Step S403: the inner product operator circuits of the K basic processing circuits compute the inner products of the data of matrix S and vector P; for example, the i-th basic processing circuit computes the inner products of the data of matrix Ai and the data of vector P.
Step S404: the accumulator circuits of the K basic processing circuits accumulate the results of the inner product operations to obtain the accumulated results, and transmit the accumulated results back to the main processing circuit in fixed-point form.
In an optional scheme, each basic processing circuit can transmit the partial sums obtained from the inner product operations (a partial sum being a part of the accumulated result; for example, if the accumulated result is F1*G1 + F2*G2 + F3*G3 + F4*G4 + F5*G5, a partial sum can be the value of F1*G1 + F2*G2 + F3*G3) back to the main processing circuit for accumulation; the advantage is that the amount of computation inside the basic processing circuit is reduced and the operational efficiency of the basic processing circuit is improved.
In an optional scheme, the partial sums obtained from the inner product operations performed by each basic processing circuit can also be stored in the register and/or on-chip cache of the basic processing circuit and transmitted back to the main processing circuit after the accumulation has finished; the advantage is that the amount of data transmitted between the basic processing circuits and the main processing circuit is reduced, the operational efficiency is improved, and the data transmission power consumption is reduced.
In an optional scheme, the partial sums obtained from the inner product operations performed by each basic processing circuit can also, in some cases, be stored in the register and/or on-chip cache of the basic processing circuit for accumulation and, in some cases, be transmitted to the main processing circuit for accumulation, and then transmitted back to the main processing circuit after the accumulation has finished; the advantage is that the amount of data transmitted between the basic processing circuits and the main processing circuit is reduced, the operational efficiency is improved, the data transmission power consumption is reduced, the amount of computation inside the basic processing circuit is reduced, and the operational efficiency of the basic processing circuit is improved. A compact sketch of this method follows.
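The sketch below covers steps S401-S404 including the fixed-point conversion before distribution; the 16-bit format, the point location, and K are assumptions, as in the earlier fixed-point sketch:

```python
import numpy as np

# Matrix S and vector P are converted to fixed-point code words, rows of S are
# distributed over K circuits, P is broadcast once (full reuse), and each
# circuit returns its accumulated results, which are rescaled at the end.

def quantize(a, point_location):
    return np.clip(np.round(a / 2.0 ** point_location), -32768, 32767)

def distributed_matvec(S, p, K=4, pl=-8):
    Sq, pq = quantize(S, pl), quantize(p, pl)   # fixed-point code words
    chunks = np.array_split(Sq, K, axis=0)      # row block Ai per circuit
    parts = [Ai @ pq for Ai in chunks]          # per-circuit inner products
    return np.concatenate(parts) * (2.0 ** pl) ** 2   # back to real scale

S, p = np.random.randn(9, 5), np.random.randn(5)
err = np.max(np.abs(distributed_matvec(S, p) - S @ p))
print(err)                                      # small quantization error
```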
Neural network training method
All data involved in the neural network training process can use different data representation methods.
Specifically, the data representation methods include but are not limited to the following cases:
floating-point numbers of different bit widths;
fixed-point numbers of different bit widths, and fixed-point numbers with different fixed-point positions.
At different moments of the training process (specifically, at different iteration counts or at initialization), the different data blocks in different stages (i.e., forward or backward operations) and different layers of the training process, the data blocks within the same layer (i.e., multiple input data blocks and output data blocks), or the sub-blocks into which the same data block is divided, can:
each use fixed point or floating point;
and, for fixed point:
use different fixed-point bit widths;
use different fixed-point offset values (namely fixed-point positions).
The concrete implementation method of neural network training is illustrated below with an actual example. Fig. 1a is a specific computation diagram of the neural network training of a single-layer operation; as shown in Fig. 1a, the input data and the weights (or parameters) perform the operation of this layer. The technical solution provided by the embodiments of the present application determines, according to the input data, the weights, and the amount of forward computation of this layer, whether to convert the types of the input data and the weights. The specific way can be as follows: if the register or storage space occupied by the storage of the input data and the weights is greater than a set threshold and the amount of forward computation of this layer is greater than a set computation amount, then, when the input data and the weight data are determined to be floating-point data, the input data and the weight data are converted into fixed-point data; if the register or storage space occupied by the storage of the input data and the weights is less than the set threshold, then, if the input data and the weight data are fixed-point data, the input data and the weight data are converted into floating-point data before the operation of this layer is performed.
The principle of the above data type conversion in the present application is elaborated as follows. Fig. 1b shows a representation of fixed-point-type data. For a computing system, the storage bit count of one floating-point datum is 32 bits, while for fixed-point data, especially data represented using the fixed-point type shown in Fig. 1b, the storage bit count of one fixed-point datum can be 16 bits or fewer. This conversion can therefore greatly reduce the transmission overhead between calculators; in addition, for a calculator, the storage space of data with fewer bits is smaller, i.e., the storage overhead is smaller, and the amount of computation is also reduced, i.e., the computational overhead is reduced. The conversion thus reduces both the computational and the storage overhead, but the conversion of the data type itself requires some overhead, hereinafter referred to as the conversion overhead. For data with a large amount of computation and a large amount of data storage, the conversion overhead is almost negligible compared with the subsequent computational, storage, and transmission overhead; therefore, for such data, the present application adopts the technical solution of converting the data type into fixed-point data. Conversely, for data with a small amount of computation and a small amount of data storage, the computational, storage, and transmission overhead are themselves already small; in this case, since the precision of fixed-point data is slightly lower than that of floating-point data, and the precision of the computation must be guaranteed under the premise of a small amount of computation, the fixed-point data is converted into floating-point data, i.e., the precision of the computation is improved at the cost of a small increase in overhead.
This is illustrated below with an actual example. As shown in Fig. 4e, the operation mode of this layer is matrix multiplication, and the input data and the weight are both matrices. For convenience of explanation, the input data here is taken to be matrix I and the weight to be matrix W; as shown in Fig. 4e, output data = matrix I * matrix W. If the sum of the column count and row count of matrix I and matrix W is large, it can be considered that matrix I and matrix W occupy a large amount of space in the memory and/or registers and that the amount of computation is also large; in this case, if matrix I and matrix W are floating-point data, matrix I and matrix W are converted into fixed-point data before the matrix multiplication operation is performed.
For example, if matrix I is a 1000*1000 matrix and matrix W is also a 1000*1000 matrix, the sum of the column count and the row count is 2000, which is very large, and the corresponding amount of computation is even larger: the multiplication operations of the inner products of the matrix-by-matrix multiplication number 10^9. For this technical solution, since matrix I and matrix W contain very many values, it is impossible to transmit all the data at once, so the same data may be transmitted several times; if fixed-point data is transmitted instead, the amount of data transmitted can be greatly reduced, thereby reducing the transmission overhead, and computing and storing with fewer bits also reduces the computational overhead and the storage overhead. The arithmetic is summarized below.
For the technical solution of converting fixed-point data into floating-point data, take the reverse operation as an example: in the computation structure shown in Fig. 4g, the upward arrow direction denotes a reverse operation. For the reverse operation, the input includes an output data gradient, which may be obtained as follows: if this layer is the last layer of the current iteration, the output data gradient is obtained from the output data of that last layer by a preset operation (the preset operation can be set by the manufacturer according to its own needs; the concrete steps of the preset operation are not limited here); if this layer is not the last layer of the current iteration, e.g., it is the n-th layer of the current iteration, then the output data gradient is the input data gradient computed by the reverse operation of the (n+1)-th layer.
This is illustrated below with an actual example. As shown in Fig. 4g, the operation of this layer is a matrix multiplication in which the input data is a matrix and the weight is a scalar; for convenience of explanation, the input data here is taken as matrix I and the weight as scalar C. As shown in Fig. 4g, output data = matrix I * C. Since the weight is a scalar, the amount of computation is small; in this case, if matrix I is fixed-point data, matrix I is converted into floating-point data before the matrix-times-scalar operation is executed.
For example, let matrix I be a 10*10 matrix; the sum of the column count and row count is then 20, which is small (assume here that a sum greater than 100 is considered large and one less than 100 small; the value 100 can be set arbitrarily by those skilled in the art), and the corresponding amount of computation is very small: the matrix-times-scalar operation requires only 10^2 multiplications for its inner-product operations. Since the amount of computation is small, still calculating with fixed-point data would affect the precision; to achieve higher computational precision under the premise of a small amount of computation, the calculation is performed with floating-point data.
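The arithmetic behind the two examples can be checked directly. In the sketch below, the row-plus-column threshold of 100 is the illustrative value from the text, and the byte saving assumes 32-bit floating point versus 16-bit fixed point:

```python
# Example 1: I (1000x1000) @ W (1000x1000) -- inner-product multiplications
mults_large = 1000 * 1000 * 1000            # 10**9 multiplications
rowcol_sum_large = 1000 + 1000              # 2000 > 100 -> "large" -> fixed point
# Moving both matrices at 16-bit fixed point instead of 32-bit float saves:
saved_bytes = 2 * 1000 * 1000 * (32 - 16) // 8   # 4,000,000 bytes per full transfer

# Example 2: I (10x10) * scalar C
mults_small = 10 * 10                       # 10**2 multiplications
rowcol_sum_small = 10 + 10                  # 20 < 100 -> "small" -> floating point

assert mults_large == 10**9 and mults_small == 10**2
```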
In an optional scheme, each data block of each layer in the network may use a fixed fixed-point bit width, while its point position varies with the training iteration cycle.
Specifically, during training, the data representation of a data block may be set as follows.
Specifically, at the start of training, an arbitrary data representation may be selected for a data block:
In an optional scheme, a floating-point representation of a specific bit width may be selected;
In an optional scheme, a fixed-point representation of a specific form may be selected;
a specific fixed-point bit width may be selected;
a specific point position may be selected;
In an optional scheme, the point position may be set according to the maximum absolute value of all data in the data block;
In an optional scheme, the point position may be set according to the minimum absolute value of all data in the data block;
In an optional scheme, at initialization the point position of this data block may be determined from the point positions of other data blocks;
In an optional scheme, the point position of this data block may be set based on empirical values.
Specifically, during training, the data representation of a data block may be changed in any iteration cycle:
In an optional scheme, a given data block may be left without adjustment;
In an optional scheme, an adjustment may be made every certain number of iterations;
In an optional scheme, an adjustment may be made every certain number of training epochs;
In an optional scheme, adjustments may be made at non-fixed iteration intervals;
In an optional scheme, adjustments may be made at non-fixed training-epoch intervals.
Specifically, during training, when the representation of a data block is adjusted, it may be adjusted to an arbitrary data representation.
In an optional scheme, if a data block is represented as fixed-point numbers with a fixed fixed-point bit width, the point position of the data representation may be adjusted as follows:
In an optional scheme, the point position is set each time according to the setting method used for initializing the point position;
In an optional scheme, if the point position computed for a data block according to the initial setting method increased in some iteration cycle relative to the previous iteration cycle, the point position for this period is changed in the increasing direction; conversely, it is changed in the decreasing direction. A sketch of such a scheme is given below.
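As an illustration of the schemes above, the following sketch initializes the point position from the maximum absolute value of a data block and nudges it by one per iteration cycle in the direction of change; the 16-bit width, the helper names, and the one-step adjustment are all assumptions, since the disclosure fixes none of them:

```python
import numpy as np

WIDTH = 16  # fixed fixed-point bit width for the block (assumed)

def init_point_position(block: np.ndarray) -> int:
    """Point position from the max absolute value: the largest magnitude should fit
    in WIDTH-1 magnitude bits (one bit is the sign); clipping guards the boundary."""
    max_abs = float(np.max(np.abs(block))) or 1.0
    return int(np.ceil(np.log2(max_abs))) - (WIDTH - 1)

def adjust_point_position(block: np.ndarray, prev_pos: int) -> int:
    """Re-derive the position; if it grew since the last cycle, move it in the
    increasing direction, otherwise in the decreasing direction (rule above)."""
    new_pos = init_point_position(block)
    if new_pos > prev_pos:
        return prev_pos + 1
    if new_pos < prev_pos:
        return prev_pos - 1
    return prev_pos

def to_fixed(block: np.ndarray, pos: int) -> np.ndarray:
    """Quantize to WIDTH-bit fixed point with scale 2**pos."""
    lo, hi = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1
    return np.clip(np.round(block / 2.0**pos), lo, hi).astype(np.int16)
```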
The present disclosure also provides an integrated circuit chip device for executing training of a neural network, the neural network including multiple layers; the integrated circuit chip device includes a processing circuit and an external interface;
the external interface is configured to receive a training instruction;
the processing circuit is configured to determine first-layer input data and first-layer weight data according to the training instruction, and to execute the n-layer forward operation of the neural network with the first-layer input data and the first-layer weight data to obtain an n-th output result;
the processing circuit is further configured to obtain an n-th output result gradient according to the n-th output result, obtain the n-th reverse operation of the n-th layer reverse operation according to the training instruction, obtain an n-th reverse operation complexity according to the n-th output result gradient, the n-th layer input data, the n-th layer weight group data, and the n-th reverse operation, determine, according to the n-th reverse operation complexity, the n-th reverse data type corresponding to the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data, and execute the n-layer reverse operation of the neural network with the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data in the n-th reverse data type to obtain the n weight gradients of the n-layer operation; the n-th reverse data type includes a fixed-point type or a floating-point type;
the processing circuit is further configured to update the n weights of the n-layer operation using the n weight gradients.
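For orientation only, the behavior attributed to the processing circuit above can be mimicked in software roughly as follows; the squared-error loss, the learning-rate update, the use of float16 as a stand-in for fixed point, and the simplified β=1 matrix-multiply complexity measure are assumptions not taken from the disclosure:

```python
import numpy as np

THRESHOLD = 1 << 24   # assumed complexity threshold; the disclosure calls it preset
LR = 0.01             # assumed learning rate; the update rule is not prescribed

def convert(t: np.ndarray, dtype: str) -> np.ndarray:
    """Stand-in for the data type conversion circuit: float16 models fixed point."""
    return t.astype(np.float16 if dtype == "fixed" else np.float32)

def train_step(weights: list, x: np.ndarray, target: np.ndarray) -> None:
    # n-layer forward operation: layer i computes acts[i] @ weights[i].
    acts = [x]
    for w in weights:
        acts.append(acts[-1] @ w)

    # n-th output result gradient from the n-th output result (squared-error loss).
    grad = acts[-1] - target

    # Reverse operations, layer n down to layer 1, re-deciding the data type
    # per layer from that layer's reverse operation complexity.
    for i in reversed(range(len(weights))):
        a, w = acts[i], weights[i]
        complexity = a.shape[0] * a.shape[1] * grad.shape[1]   # beta=1 matrix form
        dtype = "fixed" if complexity > THRESHOLD else "float"
        g, a_c, w_c = convert(grad, dtype), convert(a, dtype), convert(w, dtype)
        w_grad = a_c.T @ g                                     # weight group gradient
        grad = (g @ w_c.T).astype(np.float32)  # input data gradient -> next gradient
        weights[i] -= LR * w_grad.astype(np.float32)           # update weight group
```

Each reverse layer thus makes its own fixed-versus-floating decision, mirroring the per-layer determination of the n-th reverse data type described above.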
The present disclosure also discloses a neural network computing device, which includes one or more chips as shown in Fig. 3a or Fig. 3b, configured to obtain data to be operated on and control information from other processing devices, execute specified neural network operations, and pass the execution results to peripheral equipment through an I/O interface. Peripheral equipment includes, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one chip as shown in Fig. 3a or Fig. 3b is included, the chips can be linked through a specific structure and transmit data, for example interconnected via a PCIE bus, to support larger-scale neural network operations. In this case, the chips may share the same control system or have their own independent control systems; they may share memory, or each accelerator may have its own memory. In addition, their interconnection may be any interconnection topology.
The neural network computing device has high compatibility and can be connected to various types of servers through the PCIE interface.
The present disclosure also discloses a combined processing device, which includes the above neural network computing device, a universal interconnection interface, and other processing devices (i.e., general-purpose processing devices). The neural network computing device interacts with the other processing devices to jointly complete the operations specified by the user. Fig. 5a is a schematic diagram of the combined processing device.
The other processing devices include one or more processor types among general-purpose/special-purpose processors such as central processing units (CPU), graphics processing units (GPU), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the neural network computing device and external data and control, carrying out data transport and completing basic control such as starting and stopping the neural network computing device; the other processing devices may also cooperate with the neural network computing device to jointly complete computing tasks.
The universal interconnection interface is configured to transmit data and control instructions between the neural network computing device and the other processing devices. The neural network computing device obtains the required input data from the other processing devices and writes it into the on-chip storage device of the neural network computing device; it may obtain control instructions from the other processing devices and write them into an on-chip control cache of the neural network computing device; it may also read the data in the memory module of the neural network computing device and transmit it to the other processing devices.
As shown in Fig. 5b, optionally, the structure further includes a storage device for storing data required by this computing unit/computing device or by other computing units; it is particularly suitable for data required for the operation that cannot be fully saved in the internal storage of this neural network computing device or the other processing devices.
The combined processing device can serve as the SOC system-on-chip of equipment such as mobile phones, robots, drones, and video surveillance equipment, effectively reducing the die area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, display, mouse, keyboard, network card, or WiFi interface.
Referring to Fig. 5c, Fig. 5c is a schematic structural diagram of a neural network processor board provided by an embodiment of the present disclosure. As shown in Fig. 5c, the neural network processor board 10 includes a neural network chip package structure 11, a first electrical and non-electrical connection device 12, and a first substrate 13.
The present disclosure does not limit the specific structure of the neural network chip package structure 11; optionally, as shown in Fig. 5d, the neural network chip package structure 11 includes a neural network chip 111, a second electrical and non-electrical connection device 112, and a second substrate 113.
The concrete form of the neural network chip 111 involved in the present disclosure is not limited; the neural network chip 111 includes, but is not limited to, a neural network chip integrating a neural network processor, and the chip may be made of silicon material, germanium material, quantum material, molecular material, or the like. According to the actual situation (e.g., a harsher environment) and different application requirements, the neural network chip may be packaged so that most of the neural network chip is enclosed, with the pins on the neural network chip connected to the outside of the package structure through conductors such as gold wires for circuit connection with outer layers.
The present disclosure does not limit the specific structure of the neural network chip 111; optionally, refer to the device shown in Fig. 1a or Fig. 1b.
The present disclosure does not limit the types of the first substrate 13 and the second substrate 113; they may be printed circuit boards (printed circuit board, PCB) or printed wiring boards (printed wiring board, PWB), or possibly other circuit boards. The material of the PCB is likewise not limited.
The second substrate 113 involved in the present disclosure is configured to carry the neural network chip 111; the neural network chip package structure 11, obtained by connecting the neural network chip 111 and the second substrate 113 through the second electrical and non-electrical connection device 112, is used for protecting the neural network chip 111 and facilitates further packaging of the neural network chip package structure 11 with the first substrate 13.
The specific packaging method of the above second electrical and non-electrical connection device 112 and the structure corresponding to the packaging method are not limited; a suitable packaging method may be selected and simply improved according to the actual situation and different application requirements, such as Flip Chip Ball Grid Array Package (FCBGAP), Low-profile Quad Flat Package (LQFP), Quad Flat Package with Heat sink (HQFP), Quad Flat Non-lead Package (QFN), or Fine-pitch Ball Grid Package (FBGA).
Flip Chip is suitable for cases with high area requirements after packaging or with sensitivity to conductor inductance and signal transmission time. In addition, the Wire Bonding packaging method may be used, reducing cost and increasing the flexibility of the package structure.
Ball Grid Array can provide more pins, and the average conductor length of the pins is short, giving the capability of high-speed signal transmission; the package may be replaced by Pin Grid Array (PGA), Zero Insertion Force (ZIF), Single Edge Contact Connection (SECC), Land Grid Array (LGA), or the like.
Optionally, the neural network chip 111 and the second substrate 113 are packaged using the Flip Chip Ball Grid Array packaging method; a schematic diagram of the specific neural network chip package structure can be found in Fig. 6. As shown in Fig. 6, the neural network chip package structure includes: a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, and pins 26.
The pads 22 are connected to the neural network chip 21; the solder balls 23 are formed by welding between the pads 22 and the connection points 25 on the second substrate 24, connecting the neural network chip 21 and the second substrate 24 and thereby realizing the packaging of the neural network chip 21.
The pins 26 are configured to connect to an external circuit of the package structure (for example, the first substrate 13 on the neural network processor board 10), enabling the transmission of external data and internal data and facilitating the processing of data by the neural network chip 21 or the neural network processor corresponding to the neural network chip 21. The type and number of the pins are also not limited in the present disclosure; different pin forms may be selected according to different packaging technologies and arranged in accordance with certain rules.
Optionally, the neural network chip package structure further includes an insulating filler, placed in the gaps between the pads 22, the solder balls 23, and the connection points 25, for preventing interference between solder balls.
The material of the insulating filler may be silicon nitride, silicon oxide, or silicon oxynitride; the interference includes electromagnetic interference, inductive interference, and the like.
Optionally, the neural network chip package structure further includes a heat dissipation device for dissipating the heat generated when the neural network chip 21 operates. The heat dissipation device may be a metal sheet with good thermal conductivity, a heat sink, or a radiator, for example, a fan.
For example, as shown in Fig. 6a, the neural network chip package structure 11 includes: a neural network chip 21, pads 22, solder balls 23, a second substrate 24, connection points 25 on the second substrate 24, pins 26, an insulating filler 27, thermal grease 28, and a metal-housing heat sink 29. The thermal grease 28 and the metal-housing heat sink 29 are used to dissipate the heat generated when the neural network chip 21 operates.
Optionally, the neural network chip package structure 11 further includes a reinforcing structure, connected to the pads 22 and embedded in the solder balls 23, to enhance the bonding strength between the solder balls 23 and the pads 22.
The reinforcing structure may be a metal wire structure or a columnar structure, which is not limited here.
The concrete form of the first electrical and non-electrical connection device 12 is also not limited in the present disclosure; with reference to the description of the second electrical and non-electrical connection device 112, the neural network chip package structure 11 may be packaged by welding, or the second substrate 113 and the first substrate 13 may be connected by connecting wires or in a pluggable manner, facilitating subsequent replacement of the first substrate 13 or of the neural network chip package structure 11.
Optionally, the first substrate 13 includes interfaces for memory units for extending the storage capacity, for example: Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR), and the like; extending the memory improves the processing capability of the neural network processor.
The first substrate 13 may also include a Peripheral Component Interconnect-Express (PCI-E or PCIe) interface, a Small Form-factor Pluggable (SFP) interface, an Ethernet interface, a Controller Area Network (CAN) interface, and the like, for data transmission between the package structure and external circuits, improving the operation speed and the convenience of operation.
The neural network processor is packaged as the neural network chip 111, the neural network chip 111 is packaged as the neural network chip package structure 11, and the neural network chip package structure 11 is packaged as the neural network processor board 10; data interaction with an external circuit (for example, a computer motherboard) is performed through an interface (a slot or a lock pin) on the board, i.e., the function of the neural network processor is realized directly by using the neural network processor board 10, and the neural network chip 111 is protected. Other modules may also be added to the neural network processor board 10, improving the application range and operation efficiency of the neural network processor.
In one embodiment, the present disclosure discloses an electronic device, which includes the above neural network processor board 10 or neural network chip package structure 11.
Electronic devices include data processing devices, robots, computers, printers, scanners, tablet computers, intelligent terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, webcams, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; the medical equipment includes nuclear magnetic resonance instruments, B-ultrasound instruments, and/or electrocardiographs.
The specific embodiments described above further elaborate the purpose, technical solutions, and beneficial effects of the present disclosure. It should be understood that the foregoing is merely specific embodiments of the present disclosure and is not intended to limit the present disclosure; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (18)

1. A training method for a neural network executed on an integrated circuit chip device, the neural network including n layers, the value range of n being an integer greater than or equal to 2, characterized in that the method includes the following steps:
receiving a training instruction, and determining first-layer input data and first-layer weight group data according to the training instruction; a computing device executes the n-layer forward operation of the neural network with the first-layer input data and the first-layer weight group data to obtain the n-th output result of the forward operation;
obtaining an n-th output result gradient according to the n-th output result; obtaining the n-th reverse operation of the n-th layer reverse operation according to the training instruction; obtaining an n-th reverse operation complexity according to the n-th output result gradient, the n-th layer input data, the n-th layer weight group data, and the n-th reverse operation; determining, according to the n-th reverse operation complexity, the n-th reverse data type corresponding to the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data; executing the n-th layer reverse operation of the neural network with the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data in the n-th reverse data type to obtain an n-th layer weight group gradient and an n-th layer input data gradient; updating the n-th layer weight group data using the n-th layer weight group gradient; the n-th reverse data type including a fixed-point type or a floating-point type;
taking the n-th layer input data gradient as the (n−1)-th output result gradient of the (n−1)-th layer and executing the n−1 layers of reverse operations to obtain n−1 layer weight group gradients, and updating the weight group data of the respective layers with the n−1 layer weight group gradients, the weight group data including at least two weights.
2. The method according to claim 1, characterized in that determining, according to the n-th reverse operation complexity, the n-th reverse data type corresponding to the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data includes:
comparing the n-th reverse operation complexity with a preset threshold; if the n-th reverse operation complexity is higher than the preset threshold, determining that the n-th reverse data type is the fixed-point type; if the n-th reverse operation complexity is lower than or equal to the preset threshold, the computing device determines that the n-th reverse data type is the floating-point type.
3. The method according to claim 2, characterized in that, after determining the n-th reverse data type corresponding to the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data according to the n-th reverse operation complexity, the method further includes:
determining the (n+1)-th reverse data type to which the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data belong; if the (n+1)-th reverse data type is different from the n-th reverse data type, converting the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data belonging to the (n+1)-th reverse data type into the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data belonging to the n-th reverse data type.
4. The method according to claim 1, characterized in that, if the n-layer reverse operation is a convolution operation, the convolution input data is the n-th layer input data and the convolution kernel is the n-th output result gradient;
the n-th reverse operation complexity = α*C*kW*kW*M*N*W*C*H;
where α is a convolution coefficient with a value range greater than 1; C, kW, kW, M are the values of the four dimensions of the convolution kernel, and N, W, C, H are the values of the four dimensions of the convolution input data;
if the complexity is greater than a set threshold, determining that the n-th reverse data type is the floating-point type, and determining whether the convolution input data and the convolution kernel are floating-point data; if the convolution input data and the convolution kernel are not floating-point data, converting the convolution input data into floating-point data and converting the convolution kernel into floating-point data, and then executing the convolution operation on the convolution input data and the convolution kernel in the floating-point data type.
5. The method according to claim 1, characterized in that, if the n-th reverse operation is a matrix-multiply-matrix operation, the input data is the n-th layer input data and the weight is the n-th output result gradient;
the complexity = β*F*G*E*F; where β is a matrix coefficient with a value range greater than or equal to 1, F and G are the row and column values of the n-th layer input data, and E and F are the row and column values of the weight;
if the complexity is greater than a set threshold, determining that the n-th reverse data type is the floating-point type, and determining whether the n-th layer input data and the weight are floating-point data; if the n-th layer input data and the weight are not floating-point data, converting the n-th layer input data into floating-point data and converting the weight into floating-point data, and then executing the matrix-multiply-matrix operation on the n-th layer input data and the weight in the floating-point data type.
6. The method according to claim 1, characterized in that, if the n-th reverse operation is a matrix-multiply-vector operation, the input data is the n-th layer input data and the weight is the n-th output result gradient;
the complexity = β*F*G*F; where β is a matrix coefficient with a value range greater than or equal to 1, F and G are the row and column values of the n-th layer input data, and F is the column value of the n-th output result gradient;
if the complexity is greater than a set threshold, determining that the n-th reverse data type is the floating-point type, and determining whether the n-th layer input data and the weight are floating-point data; if the n-th layer input data and the weight are not floating-point data, converting the n-th layer input data into floating-point data and converting the weight into floating-point data, and then executing the matrix-multiply-vector operation on the n-th layer input data and the weight in the floating-point data type.
7. The method according to any one of claims 1-6, characterized in that
the n-layer reverse operation further includes one of, or any combination of, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
8. An integrated circuit chip device, characterized in that the integrated circuit chip device is configured to execute a training operation of a neural network, the neural network including n layers; the integrated circuit chip device includes a processing circuit and an external interface;
the external interface is configured to receive a training instruction;
the processing circuit is configured to determine first-layer input data and first-layer weight group data according to the training instruction; a computing device executes the n-layer forward operation of the neural network with the first-layer input data and the first-layer weight group data to obtain the n-th output result of the forward operation;
the processing circuit is further configured to obtain an n-th output result gradient according to the n-th output result, obtain the n-th reverse operation of the n-th layer reverse operation according to the training instruction, obtain an n-th reverse operation complexity according to the n-th output result gradient, the n-th layer input data, the n-th layer weight group data, and the n-th reverse operation, determine, according to the n-th reverse operation complexity, the n-th reverse data type corresponding to the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data, execute the n-th layer reverse operation of the neural network with the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data in the n-th reverse data type to obtain an n-th layer weight group gradient and an n-th layer input data gradient, and update the n-th layer weight group data using the n-th layer weight group gradient; the n-th reverse data type includes a fixed-point type or a floating-point type;
the processing circuit is further configured to take the n-th layer input data gradient as the (n−1)-th output result gradient of the (n−1)-th layer and execute the n−1 layers of reverse operations to obtain n−1 layer weight group gradients, and to update the weight group data of the respective layers with the n−1 layer weight group gradients, the weight group data including at least two weights.
9. The integrated circuit chip device according to claim 8, characterized in that
the processing circuit is specifically configured to compare the n-th reverse operation complexity with a preset threshold; if the n-th reverse operation complexity is higher than the preset threshold, determine that the n-th reverse data type is the fixed-point type; if the n-th reverse operation complexity is lower than or equal to the preset threshold, determine that the n-th reverse data type is the floating-point type.
10. The integrated circuit chip device according to claim 9, characterized in that the integrated circuit chip device further includes a data type conversion circuit;
the processing circuit is further configured to determine the (n+1)-th reverse data type to which the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data belong, and, if the (n+1)-th reverse data type is different from the n-th reverse data type, send a conversion command to the data type conversion circuit;
the data type conversion circuit is configured to convert the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data belonging to the (n+1)-th reverse data type into the n-th output result gradient, the n-th layer input data, and the n-th layer weight group data belonging to the n-th reverse data type.
11. The integrated circuit chip device according to claim 8, characterized in that, if the n-layer reverse operation is a convolution operation, the convolution input data is the n-th layer input data and the convolution kernel is the n-th output result gradient;
the processing circuit is configured to calculate the n-th reverse operation complexity:
the n-th reverse operation complexity = α*C*kW*kW*M*N*W*C*H;
where α is a convolution coefficient with a value range greater than 1; C, kW, kW, M are the values of the four dimensions of the convolution kernel, and N, W, C, H are the values of the four dimensions of the convolution input data;
the processing circuit is further configured to, if the complexity is greater than a set threshold, determine that the n-th reverse data type is the floating-point data type and determine whether the convolution input data and the convolution kernel are floating-point data; if the convolution input data and the convolution kernel are not floating-point data, convert the convolution input data into floating-point data, convert the convolution kernel into floating-point data, and then execute the convolution operation on the convolution input data and the convolution kernel in the floating-point data type.
12. The integrated circuit chip device according to claim 8, characterized in that, if the n-th reverse operation is a matrix-multiply-matrix operation, the input data is the n-th layer input data and the weight is the n-th output result gradient;
the processing circuit is configured to calculate the n-th reverse operation complexity:
the n-th reverse operation complexity = β*F*G*E*F; where β is a matrix coefficient with a value range greater than or equal to 1, F and G are the row and column values of the n-th layer input data, and E and F are the row and column values of the weight;
the processing circuit is further configured to, if the complexity is greater than a set threshold, determine that the n-th reverse data type is the floating-point data type and determine whether the n-th layer input data and the weight are floating-point data; if the n-th layer input data and the weight are not floating-point data, convert the n-th layer input data into floating-point data, convert the weight into floating-point data, and then execute the matrix-multiply-matrix operation on the n-th layer input data and the weight in the floating-point data type.
13. The integrated circuit chip device according to claim 8, characterized in that, if the n-th reverse operation is a matrix-multiply-vector operation, the input data is the n-th layer input data and the weight is the n-th output result gradient;
the processing circuit is configured to calculate the n-th reverse operation complexity:
the n-th reverse operation complexity = β*F*G*F; where β is a matrix coefficient with a value range greater than or equal to 1, F and G are the row and column values of the n-th layer input data, and F is the column value of the n-th output result gradient;
the processing circuit is further configured to, if the complexity is greater than a set threshold, determine that the n-th reverse data type is the floating-point data type and determine whether the n-th layer input data and the weight are floating-point data; if the n-th layer input data and the weight are not floating-point data, convert the n-th layer input data into floating-point data, convert the weight into floating-point data, and then execute the matrix-multiply-vector operation on the n-th layer input data and the weight in the floating-point data type.
14. The integrated circuit chip device according to any one of claims 8-13, characterized in that
the n-layer reverse operation further includes one of, or any combination of, a bias operation, a fully connected operation, a GEMM operation, a GEMV operation, and an activation operation.
15. A neural network computing device, characterized in that the neural network computing device includes one or more integrated circuit chip devices according to any one of claims 8-14.
16. A combined processing device, characterized in that the combined processing device includes: the neural network computing device according to claim 15, a universal interconnection interface, and a general-purpose processing device;
the neural network computing device is connected to the general-purpose processing device through the universal interconnection interface.
17. A chip, characterized in that the chip integrates the device according to any one of claims 8-14.
18. An electronic device, characterized in that the electronic device includes the chip according to claim 17.