WO2019084254A1 - Tensor manipulation within a neural network - Google Patents
Tensor manipulation within a neural network
- Publication number
- WO2019084254A1 (PCT/US2018/057488)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- tensor
- input
- neural network
- layer
- radix points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- ANN: artificial neural networks
- the task can include image recognition, speech recognition, and other computationally intensive applications.
- This "learning”, called machine learning, is based on the premise that computers can be trained to perform a task without being specifically programmed to do so.
- the training builds algorithms to learn using a known dataset (supervised learning).
- the algorithms can then be used to make predictions about the current and future datasets.
- the advantage of machine learning is that the algorithms are based on models.
- the algorithms can adapt and improve over time based on past experience with data such as prediction success rates and error rates.
- a model is constructed from a set of sample data with known characteristics.
- the model is trained using the known data to make desired predictions and decisions. Once the model has been trained, the model is applied to other datasets.
- the model can be updated over time based on the success rate of the model to make correct predictions using the data.
- machine learned models include: network and system intrusion detection; optical character recognition (OCR); email filtering for spam detection; computer vision (CV); and so on.
- the success of the model is limited by the quality of the training data. Analysis of the training data often requires human intervention, so such analysis is both expensive and at risk of human error.
- Deep neural networks are a form of artificial neural networks (ANN). Like artificial neural networks, the deep neural networks are based on layers. For the deep neural networks, there can be multiple hidden layers between the input layer and the output layer. DNNs are well suited to modeling complex, non-linear relationships. A DNN can be used to generate a compositional model. A compositional model can support automatic formulation of models using explicit representation for modeling assumptions. The compositional model can be expressed as a layered composition of primitive data types. The additional layers of the DNN can support formulation of features from lower layers of the composition. The result can be modeling the complexities of data using fewer computational resources.
- Neural networks can be used to process vast quantities of unstructured data.
- the neural networks can manipulate tensors, where the tensors can represent the data including the unstructured data.
- Neural networks are finding many data processing applications in diverse fields such as machine learning, including deep learning, artificial intelligence, business and research applications such as trend analysis, and so on. Von Neumann and other traditional control flow computational architectures are not well suited to highly data-intensive processing requirements. Although designers and architects continue to construct faster processors, improved custom integrated circuits or chips, more capable application specific integrated circuits (ASIC), and so on, the new designs and architectures still fail to meet the data processing demands because these architectures are not designed specifically for processing vast amounts of data.
- An alternative architecture to the control flow architectures is based on data flow.
- In a data flow architecture, the execution of instructions, functions, subroutines, etc., is based on the presence or absence of data. This latter approach, that of a data flow architecture, is better suited to handling the large amounts of unstructured data that are processed as part of machine learning and deep learning applications.
- Neural networks can be implemented using a reconfigurable fabric comprised of processing elements, switching elements, and/or memory elements.
- training data can be applied to the neural network.
- the results from each layer of nodes based on the training data can then be propagated forward to achieve an end result.
- Error data can then be generated by comparing the neural network result of processing the training data to a desired result included with the training data.
- the error data can then be backward propagated into the network to fine tune the weightings of each layer.
- the training process can be iterated until desired results are achieved.
- Tensor manipulation within a neural network is realized using a reconfigurable fabric.
- the reconfigurable fabric includes processing elements, switching elements, memory elements, communications capabilities, and so on.
- Embodiments include a computer-implemented method for computational manipulation comprising: obtaining a first input tensor for manipulation within a deep neural network, wherein the first input tensor includes fixed-point numerical representations, and wherein the first input tensor includes tensor metadata; applying the first input tensor to a first layer within the deep neural network, wherein the first input tensor with fixed-point values has a first set of variable radix points, wherein the first set of variable radix points is associated with the fixed-point values of the first input tensor; determining a first weighting tensor for the first input tensor applied to the first layer, wherein the first weighting tensor includes tensor metadata; and calculating a first output tensor from the first layer within the deep neural network based on the first input tensor and the first weighting tensor, wherein the first output tensor has fixed-point values with a second set of variable radix points, wherein the second set of variable radix points is associated with the fixed-point values of the first output tensor, and wherein the first output tensor includes tensor metadata.
- the tensor metadata for each tensor includes tensor dimension, tensor element count, tensor radix points, tensor element precision, tensor element range, or tensor element classification.
- each set of radix points is determined per tensor.
- Fig. 1 is a flow diagram for tensor manipulation within a neural network.
- Fig. 2 is a flow diagram for tensor metadata inclusion.
- Fig. 3 shows an example layer.
- Fig. 4 illustrates example layers with forward propagation and backward propagation.
- Fig. 5A shows example fixed radix point representations.
- Fig. 5B shows example variable radix point representations.
- Fig. 6 illustrates an example first layer and an example second layer.
- Fig. 7 shows a deep learning block diagram.
- Fig. 8 illustrates a cluster for coarse-grained reconfigurable processing.
- Fig. 9 shows a block diagram of a circular buffer.
- Fig. 10 illustrates a circular buffer and processing elements.
- Fig. 11 is a system diagram for computational manipulation for tensor manipulation within a neural network.
- a tensor is a convenient mathematical structure for use in many neural network applications.
- data can be stored using many different schemas, and the disclosed techniques are applicable to other data structures besides tensors, such as list structures and tree structures.
- Neural networks such as deep neural networks, convolutional neural networks, and so on, are being developed to handle highly complex data processing requirements such as those presented by "big data".
- the immense datasets associated with big data can overwhelm conventional, control-based computer hardware techniques including those based on Von Neumann techniques.
- the data itself can have large dynamic ranges. That is, the data can include very small values and very large values.
- Number representation schemes can include fixed-point representations and floating-point representations.
- the former is computationally simple and can handle accuracy requirements until the fixed-point values saturate or overflow. Saturation can occur when a number or a result of an operation cannot be represented by the number of digits available to the fixed-point number representation scheme.
- Floating-point techniques can handle large dynamic ranges of numbers, but they suffer from roundoff error and an inability to handle small numbers and large numbers concurrently in various operations. For example, adding a small number to a large number can leave the large number unchanged.
- manipulation of floating-point representations is more computationally intensive.
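- As a minimal illustration of these trade-offs (the bit widths and values below are chosen for the example and are not taken from the disclosure), a short Python sketch shows a small addend being absorbed by a large floating-point value and a narrow fixed-point accumulator saturating:

```python
import numpy as np

# Floating-point roundoff: the small addend is absorbed by the large value.
large = np.float32(1.0e8)
small = np.float32(0.5)
print(large + small == large)          # True: the large number is left unchanged

# Fixed-point saturation: a signed 16-bit value cannot represent 40000.
def saturate_int16(value):
    """Clamp a value into the range of a signed 16-bit integer."""
    return int(max(-32768, min(32767, value)))

print(saturate_int16(30000 + 10000))   # 32767: the result saturates rather than wrapping
```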
- a deep neural network can be realized using a reconfigurable fabric.
- the reconfigurable fabric includes processing elements, switching elements, memory elements, communications capabilities, and so on.
- the reconfigurable fabric can include elements that can be configured as processing elements, switching elements, or memory elements. Configuration and control of the elements can be handled by rotating circular buffers. By loading instructions into a given circular buffer, the instructions can configure the element associated with the circular buffer and can enable the element to operate on data, which can include very large quantities of data.
- the rotating circular buffers can be statically scheduled, so that processing time is saved by avoiding the reloading of instructions into the circular buffers.
- a number representation scheme based on variable radix points and fixed-point representations can be used. The variable radix points can be used to handle a wide, dynamic range of data values, and the variable radix point fixed-point number representation scheme can be used to both simplify computations and reduce data storage requirements.
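- One plausible way to realize such a scheme, sketched below in Python under the assumption of a 16-bit word and a per-tensor radix point chosen from the observed value range (details the disclosure leaves open), is to pick the radix point so the largest magnitude just fits and then store scaled integers:

```python
import numpy as np

WORD_BITS = 16  # assumed storage width per tensor element

def choose_radix_point(values):
    """Pick the number of fractional bits so the largest magnitude in the
    tensor fits in a signed WORD_BITS-bit integer (one bit for the sign)."""
    max_mag = float(np.max(np.abs(values))) or 1.0
    integer_bits = max(0, int(np.ceil(np.log2(max_mag + 1))))
    return WORD_BITS - 1 - integer_bits

def quantize(values, radix_point):
    """Store a tensor as scaled integers with an implied radix point."""
    scaled = np.round(np.asarray(values) * (1 << radix_point))
    limit = (1 << (WORD_BITS - 1)) - 1
    return np.clip(scaled, -limit - 1, limit).astype(np.int32)

def dequantize(ints, radix_point):
    return ints.astype(np.float64) / (1 << radix_point)

x = np.array([0.0125, -3.75, 17.5])    # wide dynamic range in one tensor
rp = choose_radix_point(x)             # per-tensor variable radix point
print(rp, dequantize(quantize(x, rp), rp))
```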
- Tensor manipulation is performed within a neural network.
- a first input tensor is obtained for manipulation within a deep neural network, where the first input tensor includes fixed-point numerical representations, and where the first input tensor includes tensor metadata.
- the tensor metadata for each tensor can include tensor dimension, tensor element count, tensor radix points, tensor element precision, tensor element range, or tensor element classification.
- the first input tensor is applied to a first layer within the deep neural network, where the first input tensor with fixed-point values has a first set of variable radix points, and where the first set of variable radix points is associated with the fixed-point values of the first input tensor.
- a first weighting tensor is determined for the first input tensor applied to the first layer, where the first weighting tensor includes tensor metadata.
- a first output tensor is calculated from the first layer within the deep neural network based on the first input tensor and the first weighting tensor, where the first output tensor has fixed-point values with a second set of variable radix points, where the second set of variable radix points is associated with the fixed-point values of the first output tensor, and where the first output tensor includes tensor metadata.
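- The per-layer calculation might then look like the following sketch; the function name layer_forward and the requantization rule are illustrative assumptions rather than the claimed method, but they show an input tensor and a weighting tensor, each carrying its own radix point, producing an output tensor with a second radix point:

```python
import numpy as np

def layer_forward(x_int, x_rp, w_int, w_rp, out_rp, word_bits=16):
    """Multiply a fixed-point input tensor by a fixed-point weighting tensor
    and requantize the accumulator to the output tensor's radix point."""
    acc = x_int.astype(np.int64) @ w_int.astype(np.int64)  # x_rp + w_rp fractional bits
    shift = x_rp + w_rp - out_rp
    out = acc >> shift if shift >= 0 else acc << -shift
    limit = (1 << (word_bits - 1)) - 1
    return np.clip(out, -limit - 1, limit).astype(np.int32)  # saturate into the word

# Toy usage: a 1x2 input tensor and a 2x2 weighting tensor, radix point 8 each.
x = np.array([[384, -128]], dtype=np.int32)          # 1.5, -0.5
w = np.array([[256, 0], [0, 512]], dtype=np.int32)   # weights 1.0 and 2.0
print(layer_forward(x, 8, w, 8, out_rp=8))           # [[384 -256]] -> 1.5, -1.0
```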
- the variable radix points associated with input tensors can be determined by heuristic and computational techniques. Computational techniques can require very costly calculations when processing multidimensional tensors through a large, deep, complex neural network. Heuristic techniques can be far less costly from a computational standpoint.
- Tensor metadata can be integral to performing variable radix point calculations within a neural network implemented on a reconfigurable fabric.
- Tensor metadata can include tensor dimension, tensor element count, tensor radix points, tensor element precision, tensor element range, or tensor element classification.
- the tensor dimension can include the order, degree, rank, etc., of one or more arrays that can be used to represent the tensor.
- the tensor metadata can be used along with the tensor as it is applied to a layer within a neural network.
- the tensor metadata can be included to determine radix points for both the tensor being applied to a neural network layer and a resulting output tensor.
- the output tensor can be used as an input tensor for a next layer of the neural network.
- the fixed-point numerical representation can include a set of variable radix points.
- each set of radix points can be determined per tensor.
- the tensor metadata can be determined for each tensor.
- the tensor metadata for each tensor can include tensor dimension, tensor element count, tensor radix points, tensor element precision, tensor element range, or tensor element classification.
- the tensor dimension can include the order, degree, rank, etc., of one or more arrays that can be used to represent the tensor.
- the flow 100 includes applying the first input tensor to a first layer 120 within the deep neural network, wherein the first input tensor with fixed-point values has a first set of variable radix points, wherein the first set of variable radix points is associated with the fixed-point values of the first input tensor.
- the first layer can be an input layer, an output layer, a hidden layer, and so on, in the deep neural network or other neural network.
- the first set of variable radix points 122 associated with the first input tensor can be used for the applying.
- the first set of variable radix points associated with the first input tensor with fixed-point values can be used to increase precision, to normalize, to reduce saturation, to reduce roundoff errors, and the like.
- the set of variable radix points can be associated with an input tensor, shared by two or more tensors, and so on.
- the first set of variable radix points can have different radix points for different blocks within the first input tensor.
- the flow 100 includes determining a first weighting tensor 130 for the first input tensor applied to the first layer, wherein the first weighting tensor includes tensor metadata.
- the weighting tensor can be obtained, loaded from a library, downloaded from the Internet and so on.
- a second set of variable radix points 132 can be used for the determining.
- the second set of variable radix points can be associated with a weighting tensor, a scaling tensor, a normalizing tensor, and so on.
- the deep neural network is implemented using a reconfigurable fabric.
- Reconfigurable fabrics can include arrays or clusters of elements.
- the reconfigurable fabric can be implemented as a custom integrated circuit or chip, a system on a chip (SoC), and so on.
- Reconfigurable fabrics can be applied to many applications where high-speed transferring and processing of data is performed.
- the reconfigurable fabric comprises processing elements, switching elements, or memory elements.
- the reconfigurable fabric can also include communications and interconnection capabilities.
- the elements can be controlled by rotating circular buffers.
- the rotating circular buffer can be loaded with instructions that can be used to control the processing elements.
- the rotating circular buffers can be statically scheduled.
- the static scheduling can include loading instructions into the circular buffers and controlling the circulation of the circular buffers. The circulation of the circular buffers allows execution of the instructions stored in the circular buffers.
- the flow 100 includes calculating a first output tensor 140 from the first layer within the deep neural network based on the first input tensor and the first weighting tensor, wherein the first output tensor has fixed-point values with a second set of variable radix points, wherein the second set of variable radix points is associated with the fixed-point values of the first output tensor, and wherein the first output tensor includes tensor metadata.
- the calculating can be based on Boolean operations, convolution, rectification, such as a rectified linear unit (ReLU), pooling, max pooling, addition, multiplication, and so on.
- the flow 100 further includes using the second set of variable radix points to determine variable radix points for a next operation 142 by the first layer.
- the using of the second set of variable radix points can include scaling, normalization, saturation, reduction, and so on.
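- For instance, operations such as rectification and max pooling can act directly on the scaled integers because they are monotonic; the sketch below is a simplification that ignores the metadata and radix point handling:

```python
import numpy as np

def relu_fixed(t_int):
    """Rectified linear unit on fixed-point data: clamp negatives to zero."""
    return np.maximum(t_int, 0)

def max_pool_2x2(t_int):
    """2x2 max pooling over an (H, W) fixed-point tensor with even H and W."""
    h, w = t_int.shape
    return t_int.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

t = np.array([[ 10, -20,  30,  40],
              [-50,  60, -70,  80],
              [  5,  15,  25,  35],
              [ 45, -55,  65, -75]], dtype=np.int32)
print(max_pool_2x2(relu_fixed(t)))   # [[60 80]
                                     #  [45 65]]
```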
- the training can include supervised training, unsupervised training, partially supervised training, and so on.
- the training can include training layers of the deep neural network by changing values of one or more weighting tensors.
- the training can include forward propagation of activations.
- An activation can define an output based on one or more inputs.
- the activation can be propagated to modify a task or operation performed by one or more nodes in a layer.
- the training can include backward propagation of error.
- the backward propagation of error can be used to update activations, to update weights, and so on, or to improve convergence, to reduce error, etc.
- the propagating, or using, of the first output tensor is in the backward direction for training.
- the first input tensor comprises deep neural network user training data.
- Various steps in the flow 100 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts.
- Various embodiments of the flow 100 may be included in a computer program product embodied in a non-transitory computer readable medium that includes code executable by one or more processors.
- Fig. 2 is a flow diagram for tensor metadata inclusion.
- Tensors are manipulated within neural networks such as deep neural networks, convolutional neural networks, and so on.
- the tensors can include metadata.
- a first input tensor is obtained for manipulation within a deep neural network, where the first input tensor includes fixed-point numerical representations, and where the first input tensor also includes tensor metadata.
- the tensor metadata for each tensor can include tensor dimension, tensor element count, tensor radix points, tensor element precision, tensor element range, or tensor element classification.
- the first input tensor is applied to a first layer within the deep neural network, where the first input tensor with fixed-point values has a first set of variable radix points, and where the first set of variable radix points is associated with the fixed-point values of the first input tensor.
- a first weighting tensor is determined for the first input tensor applied to the first layer, where the first weighting tensor includes tensor metadata.
- the flow 200 includes obtaining a tensor 210.
- a tensor can be a multidimensional array.
- the tensor can include a first tensor for manipulation within a deep neural network (DNN).
- the tensor can include input data, output data, weights, etc.
- the first tensor can include one or more fixed-point representations.
- the fixed-point representations can include fixed radix point representations, variable radix point representations, and so on.
- the flow 200 includes tensor metadata 220.
- the tensor metadata can be used to further describe the tensor, to aid computations based on the tensor, etc.
- the tensor metadata can include a tensor dimension 222.
- the tensor dimension can include the order, degree, rank, etc., of one or more arrays that can be used to represent the tensor.
- the tensor metadata can include tensor element precision 224. Tensors can be described in terms of elements, where the elements can be related to tensor products.
- the tensor element precision can include a number of bits, digits, bytes, words, and so on that can be used to describe the tensor.
- the tensor metadata can include tensor range 226. Tensor range can include values that can be assigned to the tensor, such as [1, 2, 3, 4], [3, 6, 9, 12, 15], and so on.
- the included tensor metadata 220 can include tensor element count 223.
- the tensor element count can include a count of the number of occurrences of a given element in the tensor. For example, the element count for the element "1" in the tensor [2, 1, 0, 1, 1, 2] is 3.
- the tensor metadata can include tensor radix points 225.
- the tensor radix points can include a set of radix points, where the set of radix points can include variable radix points.
- the tensor metadata can include tensor classification 227.
- Tensor classification can include vectorizing tensor data and applying regression techniques. The regression techniques can include classification techniques.
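- A minimal sketch of how such metadata might travel alongside a tensor follows; the field names are illustrative assumptions rather than the patent's data layout, and the example reproduces the element count described above:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np

@dataclass
class TensorMetadata:
    """Illustrative container for the metadata fields described above."""
    dimension: Tuple[int, ...]      # order/shape of the underlying array
    element_count: Dict[int, int]   # occurrences of selected element values
    radix_points: List[int]         # one radix point per tensor or per block
    element_precision: int          # bits per element
    element_range: Tuple[int, int]  # minimum and maximum element values
    classification: str = ""        # e.g. a label from a regression pass

values = np.array([2, 1, 0, 1, 1, 2])
meta = TensorMetadata(
    dimension=values.shape,
    element_count={1: int(np.sum(values == 1))},   # element "1" occurs 3 times
    radix_points=[0],
    element_precision=16,
    element_range=(int(values.min()), int(values.max())),
)
print(meta.element_count[1])   # 3
```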
- the flow 200 includes propagating, or using, tensor metadata in a layer 230.
- the tensor metadata can be associated with an input tensor to a layer, a weighting tensor for a layer, an output tensor from a layer, etc.
- the weighting tensor can include tensor metadata.
- Various steps in the flow 200 may be changed in order, repeated, omitted, or the like without departing from the disclosed concepts.
- Various embodiments of the flow 200 may be included in a computer program product embodied in a non-transitory computer readable medium that includes code executable by one or more processors.
- Fig. 3 shows an example layer.
- Layers such as input layers, output layers, hidden layers, and so on can be included in neural networks.
- Neural networks such as deep neural networks (DNN), convolutional neural networks (CNN), and so on, can be applied to deep learning and other techniques.
- the neural networks can manipulate data types including tensors. Layers support tensor manipulation within a neural network.
- An example 300 can include layer F(A, B) 320.
- the layer 320 can include an input A(t) 310 and an input B(t) 312.
- the layer 320 includes implementation of function F(A, B), where the function F is based on inputs A and B.
- the input A(t) 310 can include fixed-point values, variable radix point values, tensors, vectors, and so on.
- the input B(t) 312 can also include values such as weights.
- the inputs A and B are a function of time t in the sense that at a certain point in time, inputs A and B will have certain values. At a later point in time, for example, t+1, inputs A and B may have different values associated with a subsequent cycle. At an earlier point in time, for example, t-1, inputs A and B may have different values associated with a previous cycle.
- other inputs and/or outputs to layer 320 such as a variable radix point, designated by RPz(t) can have a time dependency.
- the point in time and the later point in time can represent various data being processed by the layer in the neural network.
- a first weighting tensor can have fixed-point values with a third set of variable radix points, where the third set of radix points can be associated with the fixed-point values of the first weighting tensor.
- the layer 320 can receive a set of radix points.
- a second set of variable radix points can be a function of a preceding set of variable radix points associated with fixed-point values of a previous output tensor.
- the set of radix points can include radix points from a previous computation, such as radix points RPz(t-1).
- the layer 320 can include an operation type 330.
- the operation type 330 can include a convolution, a rectification such as a rectified linear unit (ReLU), pooling such as max pooling, Boolean operations, addition, multiplication, and so on.
- the operation type can operate on values such as tensors.
- the tensors can include a set of variable radix points.
- the operation type 330 can include a set of variable radix points for input A1, RPA; a set of variable radix points for input B1, RPB; a set of variable radix points from another operation, RPz; and the like.
- the first set of variable radix points has different radix points for different blocks within the first input tensor.
- the layer 320 can produce an output Z(t) 342.
- the output Z can be a tensor with an associated set of variable radix points RPz(t).
- the associated set of variable radix points can be used by layer 320 or another layer for another operation.
- Fig. 4 illustrates example layers 400 with forward propagation and backward propagation.
- the example layers can represent layers in a deep neural network (DNN), a convolutional neural network (CNN), and so on.
- the forward propagation and the backward propagation can be used for tensor manipulation within a neural network.
- Example layers 400 are shown.
- the layers can include an input layer, an output layer, a fully connected layer, hidden layers, and so on.
- Two layers are shown, layer 410 and layer 430.
- a layer 410 includes an input A1(t) 412 and an input B1(t) 414.
- Input A1(t) can be a tensor, a vector, a fixed-point number, and so on.
- the layer 410 includes a layer operation F1(A, B) 420.
- the layer operation 420 can include a Boolean operation, a convolution, a rectified linear unit (ReLU), a pooling operation such as a max pooling operation, addition, multiplication, and so on.
- the layer operation 420 can determine an output Z1(t) 416.
- the layer operation 420 can determine a set of radix points such as RPZ1(t). The set of radix points can be fed back, becoming a set of radix points RPZ1(t-1) for the next layer operation 420.
- a layer 430 includes an input A2(t) 432, and an input B2(t) 434.
- the first output tensor can be propagated, or used, as an input to a second layer within the deep neural network with a set of radix points for the input to the second layer.
- the input A2(t) 432 can include an output from another layer, such as Z1(t) 416 from layer 410.
- the input B2(t) can include weights, etc.
- the layer 430 includes a layer operation F2(A, B) 440.
- layer operation 440 can include a Boolean operation, a convolution, a ReLU, a pooling operation, an addition, a multiplication, etc.
- the layer 410 and the layer 430 can be layers in a deep neural network, a convolutional neural network, and so on.
- weights used by a given layer can be updated as part of a learning technique.
- the learning technique can include training the neural network.
- the weights can include input B1(t) 414, input B2(t) 434, etc.
- the updating of the weights can be based on forward propagation 460, on backward propagation 462, on forward propagation and backward propagation, and so on.
- the updating of weights such as weights B2(t) 434 can be based on an output from a stage, such as Z1(t) 416.
- the training includes forward propagation of activations.
- the updating of weights such as weights B1(t) 414 can be based on an output from a stage, such as Z2(t) 436.
- the training includes backward propagation of error.
- the forward propagation 460 and the backward propagation 462 can be used to adjust tensors such as weighting tensors.
- the adjusting further includes adjusting the first weighting tensor based on the forward propagation and the backward propagation.
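- A compact sketch of the training loop implied here, written in floating point for clarity rather than the fixed-point representation of the disclosure, forward propagates activations through two layers and backward propagates error to adjust the weighting tensors:

```python
import numpy as np

rng = np.random.default_rng(0)
B1 = rng.normal(size=(4, 8))           # weighting tensor for the first layer
B2 = rng.normal(size=(8, 2))           # weighting tensor for the second layer
A1 = rng.normal(size=(16, 4))          # input tensor (a small training batch)
target = rng.normal(size=(16, 2))      # desired output for the training data
lr = 0.01

for _ in range(200):
    Z1 = np.maximum(A1 @ B1, 0)        # forward propagation of activations (ReLU)
    Z2 = Z1 @ B2
    err = Z2 - target                  # error at the output layer
    grad_B2 = Z1.T @ err / len(A1)     # backward propagation of error
    grad_Z1 = (err @ B2.T) * (Z1 > 0)
    grad_B1 = A1.T @ grad_Z1 / len(A1)
    B2 -= lr * grad_B2                 # adjust the weighting tensors
    B1 -= lr * grad_B1

print(float(np.mean((np.maximum(A1 @ B1, 0) @ B2 - target) ** 2)))  # reduced error
```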
- Fig. 5A shows example fixed radix point representations.
- Fixed radix point representations of numbers can represent tensors.
- the tensors can be manipulated within a neural network.
- the neural network such as a deep neural network (DNN), a convolutional neural network (CNN), and so on, can be used for deep learning and other techniques.
- Real data types can be represented by fixed-point representations, where the fixed-point representation can include a fixed or implied radix point, shown in example 500.
- in the fixed-point representation, there can be a specific number of digits to the left of the radix point and a specific number of digits to the right of the radix point.
- the group of bits 544 has a leftmost sign bit digit 546 followed by five integer digits to the left of the implied radix point, shown as a large dot.
- the sign bit digit 546 of the group of bits 544 can be a one, which can indicate that the number represented is a negative number.
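- Assuming, as suggested by the figure, a 16-bit word with one sign bit, five integer digits, and the remaining digits to the right of an implied radix point (an assumed split, not a normative one), encoding and decoding could look like this:

```python
SIGN_BITS, INT_BITS, FRAC_BITS = 1, 5, 10    # assumed split of a 16-bit word

def encode_fixed(value):
    """Two's-complement fixed-point encoding with an implied radix point."""
    total = SIGN_BITS + INT_BITS + FRAC_BITS
    scaled = int(round(value * (1 << FRAC_BITS)))
    return scaled & ((1 << total) - 1)

def decode_fixed(word):
    total = SIGN_BITS + INT_BITS + FRAC_BITS
    if word & (1 << (total - 1)):            # sign bit of one: a negative number
        word -= 1 << total
    return word / (1 << FRAC_BITS)

w = encode_fixed(-5.25)
print(format(w, "016b"), decode_fixed(w))    # leading 1 marks the negative value
```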
- Fig. 5B shows example variable radix point representations.
- the variable radix representations 502 can be used for real data types, integer data types, and so on.
- the values represented by the variable radix representations can be scaled for accuracy, normalization, and other operations.
- a number 560 can have a sign bit digit 562.
- a number 564 can have a sign bit digit 566.
- the number 580 and number 584 are shown with a scaling factor 570.
- the number 580 can have a sign bit 582, and the number 584 can have a sign bit 586.
- a sign bit with a value of zero can indicate that the number with which the sign bit is associated is a positive number
- a sign bit with a value of one can indicate that the number with which the sign bit is associated is a negative number.
- the number 580 and the number 584 are scaled by 2^13, where the scaling technique can include shifting number 580 and number 584 left by thirteen positions.
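- As a hedged illustration of that scaling (the mantissa value below is arbitrary), shifting an integer left by thirteen positions is equivalent to applying a scaling factor of 2^13:

```python
mantissa = 3                        # small positive value, sign bit of zero
scaled = mantissa << 13             # shift left by thirteen positions
assert scaled == mantissa * 2**13   # equivalent to a scaling factor of 2^13
print(scaled)                       # 24576
```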
- the collecting of the data groups can be performed in a first locality, a second locality, a third locality, a fourth locality, and so on, respectively.
- the input layer can then perform processing such as partitioning collected data into non-overlapping partitions.
- the deep learning block diagram 700, which can represent a network such as a convolutional neural network, can contain a plurality of hidden layers. While three hidden layers, hidden layer 720, hidden layer 730, and hidden layer 740, are shown, other numbers of hidden layers may be present.
- Each hidden layer can include layers that perform various operations, where the various layers can include a convolution layer, a pooling layer, and a rectifier layer such as a rectified linear unit (ReLU) layer.
- Data flow processors can be applied to many applications where large amounts of data such as unstructured data are processed.
- Typical processing applications for unstructured data can include speech and image recognition, natural language processing, bioinformatics, customer relationship management, digital signal processing (DSP), graphics processing (GP), network routing, telemetry such as weather data, data warehousing, and so on.
- Data flow processors can be programmed using software and can be applied to highly advanced problems in computer science such as deep learning. Deep learning techniques can include an artificial neural network, a convolutional neural network, etc. The success of these techniques is highly dependent on large quantities of data for training and learning. The data-driven nature of these techniques is well suited to implementations based on data flow processors.
- a Manhattan distance can include a number of steps to the east, west, north, and south.
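- Assuming clusters are addressed by (row, column) grid coordinates (an assumption for illustration), the Manhattan distance is simply the sum of the horizontal and vertical steps:

```python
def manhattan_distance(start, end):
    """Steps east/west plus steps north/south between two grid positions."""
    (row0, col0), (row1, col1) = start, end
    return abs(row1 - row0) + abs(col1 - col0)

print(manhattan_distance((0, 0), (2, 3)))   # 5 steps across the fabric
```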
- a control signal can be propagated from the start cluster to the end cluster. The control signal advances one cluster per cycle. When the counters for the PEs all reach 0, then the processors have been reset.
- the processors can be suspended for configuration, where configuration can include loading of one or more kernels onto the cluster.
- the processors can be enabled to execute the one or more kernels.
- Configuring mode for a cluster can include propagating a signal.
- Clusters can be preprogrammed to enter configuration mode. Various techniques, including direct memory access (DMA) can be used to load instructions from the kernel into instruction memories of the PEs.
- the clusters that were preprogrammed into configuration mode can be preprogrammed to exit configuration mode. When configuration mode has been exited, execution of the one or more kernels loaded onto the clusters can commence.
- the SDK can include an architectural simulator, where the architectural simulator can simulate a data flow processor or processors.
- the SDK can include an assembler, where the assembler can be used to generate object modules.
- the object modules can represent agents.
- the agents can be stored in a library of agents.
- Other tools can be included in the SDK.
- the various techniques of the SDK can operate on various representations of a wave flow graph (WFG).
- Fig. 8 illustrates a cluster for coarse-grained reconfigurable processing.
- the cluster 800 for coarse-grained reconfigurable processing can be used for tensor manipulation within a neural network. Data can be obtained from a first switching unit, where the first switching unit can be controlled by a first circular buffer.
- the example cluster 800 also comprises four processing elements: q0, q1, q2, and q3.
- the four processing elements can collectively be referred to as a "quad," and can be jointly indicated by a grey reference box 828. In embodiments, there is intercommunication among and between each of the four processing elements.
- the circular buffer 802 controls the passing of data to the quad of processing elements 828 through switching elements.
- the four processing elements 828 comprise a processing cluster.
- the processing elements can be placed into a sleep state. In embodiments, the processing elements wake up from a sleep state when valid data is applied to the inputs of the processing elements.
- the individual processors of a processing cluster share data and/or instruction caches. The individual processors of a processing cluster can implement message transfer via a bus or shared memory interface. Power gating can be applied to one or more processors (e.g., q1) in order to reduce power.
- a preprocessor or compiler can be configured to prevent data collisions within the circular buffer 802.
- the prevention of collisions can be accomplished by inserting no-op or sleep instructions into the circular buffer (pipeline).
- intermediate data can be stored in registers for one or more pipeline cycles before being sent out through the output port.
- the preprocessor can change one switching instruction to another switching instruction to avoid a conflict. For example, in some instances the preprocessor can change an instruction placing data on the west output 824 to an instruction placing data on the south output 820, such that the data can be output on both output ports within the same pipeline cycle.
- Accesses to RAM in different clusters can travel through the same DMA path, but the transactions must be separately defined.
- a maximum block size for a single DMA transfer can be 8 KB.
- Accesses to data RAMs can be performed either when the processors are running, or while the processors are in a low power "sleep" state.
- Accesses to the instruction RAMs and the PE and Co-Processor Registers may be performed during configuration mode.
- the quad RAMs may have a single read/write port with a single address decoder, thus allowing shared access to them by the quads and the switches.
- the static scheduler, i.e., the router, determines when a switch is granted access to the RAMs in the cluster.
- Fig. 9 shows a block diagram 900 of a circular buffer 910.
- the circular buffer 910 can include a switching element 912 corresponding to the circular buffer.
- the circular buffer and the corresponding switching element can be used in part for tensor manipulation within a neural network including a deep neural network (DNN).
- Data can be obtained from a first switching unit, where the first switching unit can be controlled by a first circular buffer.
- Data can be sent to a second switching element, where the second switching element can be controlled by a second circular buffer.
- Obtaining data from the first switching element and sending data to the second switching element can include a direct memory access (DMA).
- the block diagram 900 describes a processor-implemented method for data manipulation.
- the circular buffer 910 contains a plurality of pipeline stages.
- the clusters implement multiple storage elements in the form of registers.
- the instruction 962 is a local storage instruction.
- the instruction 962 takes data from the instruction's south input and stores it in a register (r0).
- Another instruction (not shown) is a retrieval instruction.
- the retrieval instruction takes data from a register (e.g., r0) and outputs it from the instruction's output (north, south, east, west).
- Some embodiments utilize four general purpose registers, referred to as registers r0, r1, r2, and r3.
- the registers are, in embodiments, storage elements which store data while the configurable connections are busy with other data.
- the storage elements are 32-bit registers. In other embodiments, the storage elements are 64-bit registers. Other register widths are possible.
- the sleep state is exited based on an instruction applied to a switching fabric.
- the sleep state can, in some embodiments, only be exited by a stimulus external to the logical element and not based on the programming of the logical element.
- the external stimulus can include an input signal, which in turn can cause a wake up or an interrupt service request to execute on one or more of the logical elements.
- An example of such a wake-up request can be seen in the instruction 958, assuming that the processor q1 was previously in a sleep state.
- the processor q1 wakes up and operates on the received data.
- a circular buffer 1010 feeds a processing element 1030.
- a second circular buffer 1012 feeds another processing element 1032.
- a third circular buffer 1014 feeds another processing element 1034.
- a fourth circular buffer 1016 feeds another processing element 1036.
- These circular buffers are shown with lengths of 128 entries, but various lengths are possible.
- the four processing elements 1030, 1032, 1034, and 1036 can represent a quad of processing elements.
- the processing elements 1030, 1032, 1034, and 1036 are controlled by instructions received from the circular buffers 1010, 1012, 1014, and 1016.
- the circular buffers can be implemented using feedback paths 1040, 1042, 1044, and 1046, respectively.
- circular buffer 1010 contains a MOV instruction.
- Circular buffer 1012 contains a SKIP instruction.
- Circular buffer 1014 contains a SLEEP instruction and an ANDI instruction.
- Circular buffer 1016 contains an AND instruction, a MOVE instruction, an ANDI instruction, and an ADD instruction.
- the operations performed by the processing elements 1030, 1032, 1034, and 1036 are dynamic and can change over time, based on the instructions loaded into the respective circular buffers. As the circular buffers rotate, new instructions can be executed by the respective processing element.
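- That rotation can be mimicked in software; the sketch below is a deliberately simplified model rather than the hardware's instruction set, rotating a statically scheduled buffer one slot per cycle and handing each instruction to a processing element:

```python
from collections import deque

class CircularBuffer:
    """Statically scheduled instruction buffer that rotates one slot per cycle."""
    def __init__(self, instructions):
        self.slots = deque(instructions)

    def tick(self):
        instruction = self.slots[0]
        self.slots.rotate(-1)            # the next slot comes up on the next cycle
        return instruction

def processing_element(name, instruction):
    if instruction == "SLEEP":
        return f"{name}: sleeping"       # wakes when valid data or a wake-up arrives
    return f"{name}: executing {instruction}"

buffer_q1 = CircularBuffer(["MOV", "SLEEP", "ANDI", "ADD"])
for cycle in range(6):
    print(cycle, processing_element("q1", buffer_q1.tick()))
```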
- one or more processors 1110 are attached to the memory 1112, where the one or more processors, when executing the stored instructions, are configured to: obtain a first input tensor for manipulation within a deep neural network, wherein the first input tensor includes fixed-point numerical representations, and wherein the first input tensor includes tensor metadata; apply the first input tensor to a first layer within the deep neural network, wherein the first input tensor with fixed-point values has a first set of variable radix points, wherein the first set of variable radix points is associated with the fixed-point values of the first input tensor; determine a first weighting tensor for the first input tensor applied to the first layer, wherein the first weighting tensor includes tensor metadata; calculate a first output tensor from the first layer within the deep neural network based on the first input tensor and the first weighting tensor, wherein the first output tensor has fixed-point values with a second set of variable radix points, wherein the second set of variable radix points is associated with the fixed-point values of the first output tensor, and wherein the first output tensor includes tensor metadata.
- the system 1100 can include an applying component 1140.
- the applying component 1140 can include functions and instructions for applying the first input tensor to a first layer within the deep neural network.
- the first input tensor with fixed-point values can have a first set of variable radix points.
- the first set of variable radix points can be associated with the fixed-point values of the first input tensor.
- the system 1100 can include a determining component 1150.
- the determining component 1150 can include functions and instructions for determining a first weighting tensor for the first input tensor applied to the first layer.
- the block diagrams and flowchart illustrations depict methods, apparatus, systems, and computer program products.
- the elements and combinations of elements in the block diagrams and flow diagrams show functions, steps, or groups of steps of the methods, apparatus, systems, computer program products and/or computer-implemented methods. Any and all such functions, generally referred to herein as a "circuit," "module," or "system," may be implemented by computer program instructions, by special-purpose hardware-based computer systems, by combinations of special purpose hardware and computer instructions, by combinations of general purpose hardware and computer instructions, and so on.
- a computer may enable execution of computer program instructions including multiple programs or threads.
- the multiple programs or threads may be processed approximately simultaneously to enhance utilization of the processor and to facilitate substantially simultaneous functions.
- any and all methods, program codes, program instructions, and the like described herein may be implemented in one or more threads which may in turn spawn other threads, which may themselves have priorities associated with them.
- a computer may process these threads based on priority or other order.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
Abstract
Techniques for tensor manipulation within a neural network are disclosed, the techniques including training the neural network. An input tensor is obtained for manipulation within a deep neural network. The input tensor includes fixed-point numerical representations and tensor metadata and is applied to a layer within the deep neural network. The input tensor has variable radix points associated with the fixed-point values of the input tensor. A weighting tensor including metadata is determined for the input tensor applied to the layer. An output tensor is calculated from the layer within the deep neural network based on the input tensor and the weighting tensor. The output tensor has fixed-point values with a second set of variable radix points associated with the fixed-point values of the output tensor. The output tensor includes tensor metadata. The output tensor is propagated within the deep neural network.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201762577902P | 2017-10-27 | 2017-10-27 | |
| US62/577,902 | 2017-10-27 | | |
| US201762579616P | 2017-10-31 | 2017-10-31 | |
| US62/579,616 | 2017-10-31 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2019084254A1 (fr) | 2019-05-02 |
Family
ID=66247629
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2018/057488 Ceased WO2019084254A1 (fr) | Tensor manipulation within a neural network | 2017-10-27 | 2018-10-25 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2019084254A1 (fr) |
- 2018-10-25: WO PCT/US2018/057488 patent/WO2019084254A1/fr not_active Ceased
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20150123707A1 (en) * | 2013-11-02 | 2015-05-07 | Wave Semiconductor, Inc. | Logical elements with switchable connections |
| US20170061279A1 (en) * | 2015-01-14 | 2017-03-02 | Intel Corporation | Updating an artificial neural network using flexible fixed point representation |
| US20170140263A1 (en) * | 2015-11-12 | 2017-05-18 | Google Inc. | Convolutional gated recurrent neural networks |
Non-Patent Citations (2)
| Title |
|---|
| LIANGZHEN LAI ET AL.: "Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations", Computer Science, Machine Learning, arXiv.org, 8 March 2017 (2017-03-08), pages 1-10, XP080755374, Retrieved from the Internet <URL:https://arxiv.org/pdf/1703.03073v1.pdf> * |
| MATTHIEU COURBARIAUX ET AL.: "Training deep neural networks with low precision multiplications", CoRR (arXiv), 23 September 2015 (2015-09-23), pages 1-10, XP055566721, Retrieved from the Internet <URL:https://arxiv.org/pdf/1412.7024v5.pdf> * |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111709553A (zh) * | 2020-05-18 | 2020-09-25 | Hangzhou Dianzi University | Subway passenger flow prediction method based on a tensor GRU neural network |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11106976B2 (en) | Neural network output layer for machine learning | |
| US20190130268A1 (en) | Tensor radix point calculation in a neural network | |
| US10949328B2 (en) | Data flow graph computation using exceptions | |
| US11227030B2 (en) | Matrix multiplication engine using pipelining | |
| US20190279038A1 (en) | Data flow graph node parallel update for machine learning | |
| US20190228037A1 (en) | Checkpointing data flow graph computation for machine learning | |
| WO2019191578A1 (fr) | Calcul de graphe de flux de données pour apprentissage automatique | |
| US20200174707A1 (en) | Fifo filling logic for tensor calculation | |
| US20190138373A1 (en) | Multithreaded data flow processing within a reconfigurable fabric | |
| US12223011B1 (en) | Integer matrix multiplication engine using pipelining | |
| US20190266218A1 (en) | Matrix computation within a reconfigurable processor fabric | |
| US10997102B2 (en) | Multidimensional address generation for direct memory access | |
| US20190279086A1 (en) | Data flow graph node update for machine learning | |
| US20190130269A1 (en) | Pipelined tensor manipulation within a reconfigurable fabric | |
| US20190130270A1 (en) | Tensor manipulation within a reconfigurable fabric using pointers | |
| US12306752B2 (en) | Processor cluster address generation | |
| US20190042918A1 (en) | Remote usage of machine learned layers by a second machine learning construct | |
| US20190057060A1 (en) | Reconfigurable fabric data routing | |
| US20200167309A1 (en) | Reconfigurable fabric configuration using spatial and temporal routing | |
| US20190197018A1 (en) | Dynamic reconfiguration using data transfer control | |
| US20190130276A1 (en) | Tensor manipulation within a neural network | |
| US20190130291A1 (en) | Dynamic reconfiguration with partially resident agents | |
| US20190228340A1 (en) | Data flow graph computation for machine learning | |
| WO2019089553A1 (fr) | Calcul de séparation fractionnaire de tenseur dans un réseau neuronal | |
| US20240061704A1 (en) | Processor graph execution using interrupt conservation |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18869631; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15.09.2020) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18869631; Country of ref document: EP; Kind code of ref document: A1 |