
WO2024203190A1 - Calculating device - Google Patents


Info

Publication number
WO2024203190A1
WO2024203190A1 (PCT/JP2024/009203)
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
data
calculation
submap
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2024/009203
Other languages
French (fr)
Japanese (ja)
Inventor
美香 中村
周一 高田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Architek
Architek Corp
Original Assignee
Architek
Architek Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Architek Corp
Publication of WO2024203190A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]

Definitions

  • the present invention relates to a calculation device that performs matrix calculations such as convolution calculations.
  • In image recognition using CNNs (convolutional neural networks), the input image is transformed using convolutional layers and pooling layers, gradually reducing the amount of data, and finally the probability value for each classification is output.
  • In the convolutional layer, a filter multiplication operation is performed on each coordinate region (e.g., a 3x3 cell region) of the input data.
  • the calculation result is then used as the input for the calculation in the next layer, and the convolution operation is repeated.
  • machine learning using a CNN requires many matrix calculations and memory bandwidth.
  • To reduce this load, a configuration is used that skips calculations when the calculation target is zero (e.g., Patent Documents 1 and 2).
  • The skipped operations in Patent Documents 1 and 2 do not involve actual calculation, and therefore calculation time can be reduced. As a result, the time required for the entire convolution operation can also be reduced.
  • In these configurations, however, the data to be operated on is first loaded into a calculation memory, and only then is it determined whether the loaded data is zero. In other words, when the calculation is skipped as a result, the time and memory space spent reading the data are wasted on data that is not used in the calculation.
  • the present invention was made in consideration of the above-mentioned circumstances, and aims to provide a calculation device that can reduce the time that is wasted when performing matrix calculations such as convolution operations, and can further shorten the time required for the entire calculation compared to conventional methods.
  • the present invention employs the following technical means.
  • the calculation device comprises a data memory, a matrix calculation unit, a zero check unit, a submap memory, and a map check unit.
  • the data memory stores the data to be calculated.
  • the matrix calculation unit reads data from the data memory, performs a matrix calculation, and stores the output matrix in the data memory.
  • the zero check unit judges whether each element of the output matrix falls within a pre-specified range.
  • the submap memory stores the judgment result of the zero check unit as status information.
  • the map check unit judges, based on the status information stored in the submap memory, whether to cause the matrix calculation unit to read out the output matrix corresponding to the status information as the data to be calculated.
  • the matrix calculation unit may simultaneously read out multiple pieces of data for successively calculating the same output matrix.
  • the zero check unit uses one of the multiple pieces of status information corresponding to the multiple pieces of data as status information for the multiple pieces of data successively read out to the matrix calculation unit.
  • the above-mentioned configuration can also be applied to cases where the above-mentioned matrix calculation is, for example, a convolution operation in a convolutional neural network.
  • the matrix calculation unit stores the output matrix in the data memory as the calculation target data of the next layer in the convolution operation.
  • the map check unit determines whether or not to read the output matrix corresponding to the state information stored in the submap memory as the calculation target data of the next layer in the convolution operation to the matrix calculation unit.
  • a configuration can be adopted in which the matrix calculation unit reads out the same coordinate area that constitutes a part of each input channel as the calculation target data in all input channels belonging to the same layer in the convolution operation and performs matrix calculation.
  • a configuration can also be adopted in which the matrix calculation unit reads out multiple data for which matrix calculation is to be performed continuously at the same layer in the convolution operation at one time.
  • the zero check unit uses one of multiple state information corresponding to the multiple data as the state information of the multiple data continuously read out to the matrix calculation unit.
  • the above-mentioned arithmetic device may also be configured to further include a table creation unit and a read control unit.
  • the table creation unit creates a table that specifies the output matrix to be read by the matrix calculation unit based on the determination result of the map check unit.
  • the read control unit causes the matrix calculation unit to read data based on the created table.
  • the zero check unit further determines whether or not each element of the output matrix by the matrix calculation unit falls within a pre-specified range in units of memory access.
  • the submap memory stores the determination result of the zero check unit as the second state information.
  • A configuration can be adopted in which the map check unit executes the matrix calculations for the layers after the first based on state information stored in the submap memory as a result of the initial matrix calculation for the first layer in the convolution calculation.
  • the above-mentioned arithmetic device may be further configured to include a submap memory buffer that stores, in association with each other, information for identifying an output matrix corresponding to state information, information indicating the storage location of the state information in the submap memory, and information indicating whether the state information has been used in the convolution calculation of the next layer.
  • a storage location in the submap memory that is associated with usage information indicating use in the convolution calculation of the next layer is selected as the storage location in the submap memory for the newly generated state information.
  • the above-mentioned calculation device can also be configured to have the submap memory store in advance kernel state information that has been determined for each element of the kernel used in the above-mentioned matrix calculation as being within a pre-specified range.
  • the map check unit determines, based on the state information and kernel state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to the state information as data to be calculated.
  • the zero check unit compares each element of the output matrix with multiple thresholds and determines which of multiple ranges defined by the multiple thresholds all elements of the output matrix belong to.
  • The zero check unit further determines whether or not a negative value exists among the elements of the output matrix, or counts the number of elements that belong to one of the plurality of ranges.
  • the zero check unit creates status information during matrix calculation for the input channel that is the last to be calculated among input channels belonging to the same layer, and stores the information in the submap memory.
  • FIG. 1 is a schematic diagram showing an example of a calculation device according to an embodiment of the present invention.
  • FIG. 2 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 3 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 4 is a schematic configuration diagram showing an example of a zero check unit included in the arithmetic device according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing an example of a sub-map included in the arithmetic device according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram showing an example of a calculation device according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram showing an example of a submap address table included in the arithmetic unit according to one embodiment of the present invention.
  • FIG. 8 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 9 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 10 is an explanatory diagram that illustrates the concept of a convolution calculation method by a calculation device according to an embodiment of the present invention.
  • FIG. 11 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 12 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 13 is an explanatory diagram that illustrates the concept of a convolution calculation method by a calculation device according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram showing an example of a calculation device according to an embodiment of the present invention.
  • FIG. 15 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIGS. 16(a) and 16(b) are explanatory diagrams that diagrammatically show the concept of a convolution calculation method by a calculation device according to one embodiment of the present invention.
  • a calculation device which uses the output matrix of a previous matrix calculation as the calculation target data for a subsequent matrix calculation in a series of matrix calculations, is embodied as a calculation device that realizes the processing of the convolution layer of a convolutional neural network (CNN).
  • a convolutional neural network includes a convolutional layer and a pooling layer.
  • the amount of input data such as an image to be recognized, gradually decreases as a series of processes in the convolutional layer and the pooling layer are repeatedly performed.
  • the convolutional neural network then ultimately outputs a classification probability value that indicates the type of object the input image is.
  • When the recognition target is an image, data in which each pixel value is arranged two-dimensionally for each of the R, G, and B input channels of the image is input to the convolution layer as input data.
  • In the convolution layer, a kernel (filter) is multiplied with a coordinate region (e.g., a 3 × 3 region) of each input channel, and the results calculated for each input channel are added together to calculate the output value of that coordinate region.
  • This multiplication with the kernel is performed on the entire input image by sequentially moving the coordinate region in each input channel while a portion of the coordinate region overlaps.
  • multiple sets of kernels are prepared for each input channel according to the number of output channels output by the convolution operation. For example, if the input data is three channels and the output data is three channels, three kernel sets consisting of three kernels that are multiplied to the coordinate region of each input channel are prepared.
  • output values are calculated by processing the data for each coordinate region of the two-dimensional data (output matrix) output as the result of the calculations in the convolution layer. For example, in the pooling layer, the average or maximum value in a 2x2 coordinate region is output as the output value for that coordinate region. Note that the pooling layer may be omitted. When the pooling layer is omitted, the data output by the convolution calculation in the first layer is used as input data to carry out the convolution calculation in the second layer.
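  • As a concrete illustration of the convolution and pooling described above, a minimal NumPy sketch is shown below; the function names, the shapes, and the use of valid padding (no zero padding at the borders) are illustrative assumptions rather than details taken from this publication.

```python
import numpy as np

def conv2d(inputs, kernels):
    """Naive convolution: `inputs` has shape (n_in, H, W) and `kernels` has
    shape (n_out, n_in, kH, kW); returns (n_out, H-kH+1, W-kW+1)."""
    n_in, H, W = inputs.shape
    n_out, _, kH, kW = kernels.shape
    out = np.zeros((n_out, H - kH + 1, W - kW + 1))
    for o in range(n_out):                              # one kernel set per output channel
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                region = inputs[:, y:y + kH, x:x + kW]      # same coordinate region in every input channel
                out[o, y, x] = np.sum(region * kernels[o])  # multiply with the kernels and add up
    return out

def max_pool_2x2(channel):
    """2x2 max pooling of one output channel (H and W assumed even)."""
    H, W = channel.shape
    return channel.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))        # three input channels (e.g. R, G, B)
k = rng.random((3, 3, 3, 3))     # three kernel sets -> three output channels
y = conv2d(x, k)
print(y.shape, max_pool_2x2(y[0]).shape)   # (3, 6, 6) (3, 3)
```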
  • FIG. 1 is a schematic diagram showing the configuration of a calculation device in one embodiment of the present invention.
  • the calculation device 100 of this embodiment includes a data memory 111, a matrix calculation unit 112, a zero check unit 113, a submap memory 114, a map check unit 115, and a controller 116.
  • the calculation device 100 creates a submap separate from the output matrix (main map) obtained as the calculation result in a conventional convolution calculation, and uses the submap to reduce the time required for the entire convolution calculation.
  • a submap is created for each input channel used as input data for the convolution calculation in the next layer.
  • the data memory 111 stores the data to be calculated. As described above, when an image is the object of calculation, for example, the pixel values of each pixel constituting the image are stored in the data memory 111 as input data. In this embodiment, the data memory 111 also stores the kernel and bias described below that are used for the matrix calculation in the matrix calculation unit 112.
  • the matrix calculation unit 112 stores an output matrix (output channel), which is the result of the matrix calculation for the entire input data, in the data memory 111 as the data to be calculated in the next layer in the convolution calculation.
  • the matrix calculation unit 112 outputs, as an output value, the sum of the elements of the calculation matrix Q, which is the result of performing the above matrix calculation for the coordinate area of each input channel.
  • the matrix calculation unit 112 performs the matrix calculation for the entire input data, and stores the output value for each coordinate area in the data memory 111 together with information indicating the coordinate area. Therefore, when the matrix calculation for the input data is completed, the data memory 111 stores the output matrix that is the result of performing the matrix calculation (convolution operation) on the entire input data.
  • the matrix calculation unit 112 first performs matrix calculations for each coordinate region for one of the three input channels.
  • the data memory 111 stores an output matrix resulting from the matrix calculations for one input channel.
  • the matrix calculation unit 112 performs matrix calculations for each coordinate region for one of the remaining two input channels.
  • As the bias, a matrix in which elements other than those corresponding to the coordinate region being calculated are set to zero in the output matrix for the first input channel stored in the data memory 111 is used.
  • the data memory 111 stores an output matrix in which the result of the matrix calculation for the first input channel and the result of the matrix calculation for the second input channel are added together.
  • the matrix calculation unit 112 performs matrix calculations for each coordinate region for the remaining input channel.
  • A matrix in which all elements other than those corresponding to the coordinate region being calculated are set to zero in the output matrix for the first and second input channels stored in the data memory 111 is used as the bias.
  • the data memory 111 stores an output matrix in which the results of the matrix calculation for the three input channels are added together.
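  • The role of the bias described above can be illustrated with a small NumPy sketch: feeding the value already stored for a coordinate back in as the bias makes the stored value the running sum over the input channels. All values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
region1, region2 = rng.random((3, 3)), rng.random((3, 3))   # same coordinate region in channels 1 and 2
kernel1, kernel2 = rng.random((3, 3)), rng.random((3, 3))   # kernels for channels 1 and 2

# Channel 1: the bias is a zero matrix, so the stored value is the channel-1 result alone.
value_after_ch1 = np.sum(region1 * kernel1) + 0.0

# Channel 2: the value already stored for this coordinate is supplied as the bias,
# so the stored value becomes the sum of the channel-1 and channel-2 results.
value_after_ch2 = np.sum(region2 * kernel2) + value_after_ch1

assert np.isclose(value_after_ch2, np.sum(region1 * kernel1) + np.sum(region2 * kernel2))
print(value_after_ch2)
```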
  • the zero check unit 113 judges whether each element of the output matrix output by the matrix calculation unit 112, i.e., the output value of the matrix calculation for each coordinate region, falls within a pre-specified range. Although not limited to this, in this embodiment, the zero check unit 113 judges which of a number of ranges defined by a number of pre-set threshold values the element belongs to. As described below, in this embodiment, the zero check unit 113 is configured to make the above-mentioned judgment each time an output value is output from the matrix calculation unit 112, based on the state information stored in the submap memory 114 at that time and the output value from the matrix calculation unit 112.
  • the submap memory 114 stores the judgment result of the zero check unit 113 as state information.
  • the state information is stored in correspondence with the output matrix described above.
  • the state information is one of the pieces of information that constitute the submap described above.
  • In this embodiment, the state information indicates which of four states the output matrix belongs to: "state 1" where all the elements of the output matrix are equal to or less than the first threshold, "state 2" where all the elements are equal to or less than the second threshold, "state 3" where all the elements are equal to or less than the third threshold, and "state 4" where none of states 1 to 3 applies.
  • The numerical values "0", "1", "2", and "3" are stored as information indicating state 1, state 2, state 3, and state 4, respectively.
  • the state information is updated each time the zero check unit 113 makes a judgment.
  • the submap memory 114 may be configured as part of the memory device that constitutes the data memory 111, or may be configured as a separate memory device.
  • The map check unit 115 judges whether or not to have the matrix calculation unit 112 read out the output matrix corresponding to the state information as input data (data to be calculated) for the next layer in the convolution calculation. For example, in the above example, the following judgment is made. If the state information is state 1, the map check unit 115 judges that the matrix calculation unit 112 should not read out the corresponding data (output matrix). If the state information is state 2 or state 3, the map check unit 115 judges that the corresponding data (output matrix) should not be read out when the corresponding kernel satisfies a preset condition.
  • The preset condition is, for example, that more than half of the kernel elements are zero for state 2, or that more than 3/4 of the kernel elements are zero for state 3. If the state information is state 4, the map check unit 115 judges that the matrix calculation unit 112 should read out the corresponding data (output matrix).
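  • The skip decision described here can be sketched as follows, assuming the four states are encoded as the integers 0 to 3 as described above and using the example conditions for states 2 and 3; the function name and the zero-fraction test on the kernel are illustrative.

```python
import numpy as np

STATE_1, STATE_2, STATE_3, STATE_4 = 0, 1, 2, 3   # numerical encoding of the four states

def should_skip(state, kernel):
    """Decide whether the matrix calculation for an input channel can be skipped,
    based on the channel's state information and the kernel to be applied."""
    zero_fraction = np.mean(kernel == 0)
    if state == STATE_1:
        return True                    # all elements are within the tightest range
    if state == STATE_2:
        return zero_fraction > 0.5     # example condition: more than half of the kernel is zero
    if state == STATE_3:
        return zero_fraction > 0.75    # example condition: more than 3/4 of the kernel is zero
    return False                       # state 4: always read the data and calculate

kernel = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
print([should_skip(s, kernel) for s in range(4)])   # [True, True, True, False]
```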
  • the calculation device 100 also includes a controller 116 that controls the operation timing of the data memory 111, matrix calculation unit 112, zero check unit 113, submap memory 114, and map check unit 115.
  • The matrix calculation unit 112 can be realized by a processor, such as a GPU (Graphics Processing Unit) specialized for image processing.
  • each element that performs signal processing and data processing such as the zero check unit 113, map check unit 115, and controller 116, can be realized by, for example, a dedicated arithmetic circuit, or hardware equipped with a processor and memory such as RAM (Random Access Memory) or ROM (Read Only Memory), and software that is stored in the memory and runs on the processor.
  • the operation of the arithmetic device 100 having the above configuration will be described.
  • When a convolution operation is performed on input data, a submap including state information is created for each output matrix used as an input channel for the convolution operation of the next layer. That is, the submap created during the convolution operation of the first layer is used during the convolution operation of the second layer, and the submap created during the convolution operation of the second layer is used during the convolution operation of the third layer. Therefore, in the convolution operation of the first layer, there are no submaps corresponding to the input channels.
  • Below, the operation during the convolution operation of the first layer and the operation during the convolution operations of the second and subsequent layers will be described separately.
  • FIG. 2 is a flow diagram showing the procedure performed during the first layer convolution calculation of the calculation device 100 of this embodiment. Note that FIG. 2 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the calculation device 100.
  • the matrix calculation unit 112 starts matrix calculation for the first input channel.
  • the matrix calculation unit 112 first reads out the kernel to be used for the matrix calculation for the first input channel from the data memory 111 (step S201).
  • the matrix calculation unit 112 also reads out the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the first input channel from the data memory 111, and the bias described above (step S202).
  • the bias for the first input channel is a zero matrix.
  • the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203 and S204).
  • the matrix calculation unit 112 inputs the output value to the zero check unit 113.
  • the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to.
  • the zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S205).
  • the calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S206: No). At this time, the data memory 111 stores the output matrix calculated for the first input channel.
  • the matrix calculation unit 112 starts the matrix calculation for the second input channel (step S206: Yes, S207: No).
  • the matrix calculation unit 112 reads the kernel to be used for the matrix calculation for the second input channel from the data memory 111 (step S201).
  • the matrix calculation unit 112 also reads the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the second input channel, and the bias from the data memory 111 (step S202).
  • the bias for the second input channel is a matrix in which all elements of the output matrix being calculated that are stored in the data memory 111 at that time are set to zero except for the elements corresponding to the coordinate region to be calculated.
  • the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203 and S204).
  • the matrix calculation unit 112 inputs the output value to the zero check unit 113.
  • the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to.
  • the zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S205).
  • the calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the second input channel (step S206: No).
  • the data memory 111 stores an output matrix that is the sum of the results of the matrix calculation for the first input channel and the results of the matrix calculation for the second input channel.
  • When the above process has been completed for all n input channels, the data memory 111 will store an output matrix that is the result of performing matrix calculations on the entire input data (all input channels).
  • When all calculations to obtain the m output matrices are completed, m output matrices are stored in the data memory 111, and m submaps corresponding to each output matrix are stored in the submap memory 114.
  • These m output matrices are used as input channels for the convolution calculation of the next layer.
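  • The first-layer flow (steps S201 to S207) can be summarized in a short sketch that computes each output channel and records one state value per channel as its submap. The thresholds are illustrative, and for brevity the sketch classifies the finished output matrix instead of updating the state each time an output value is produced as the device does.

```python
import numpy as np

THRESHOLDS = (0.5, 1.0, 2.0)      # illustrative first, second and third thresholds

def conv1ch(channel, kernel):
    """Convolve one input channel with one kernel (valid padding)."""
    H, W = channel.shape
    kH, kW = kernel.shape
    return np.array([[np.sum(channel[y:y + kH, x:x + kW] * kernel)
                      for x in range(W - kW + 1)]
                     for y in range(H - kH + 1)])

def state_of(matrix):
    """States 1..4 encoded as 0..3: the first threshold that bounds every
    element, or 3 if some element exceeds all thresholds."""
    for i, t in enumerate(THRESHOLDS):
        if np.all(matrix <= t):
            return i
    return 3

def first_layer(inputs, kernel_sets):
    """Compute every output channel of the first layer and build its submap."""
    outputs, submaps = [], []
    for kernels in kernel_sets:                        # one kernel set per output channel
        out = sum(conv1ch(ch, k) for ch, k in zip(inputs, kernels))
        outputs.append(out)
        submaps.append(state_of(out))                  # state information for the next layer
    return outputs, submaps

rng = np.random.default_rng(2)
outputs, submaps = first_layer(rng.random((3, 6, 6)), rng.random((2, 3, 3, 3)))
print(len(outputs), outputs[0].shape, submaps)         # 2 (4, 4) [...]
```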
  • FIG. 3 is a flow diagram showing the procedure performed during convolution calculations from the second layer onward in the calculation device 100 of this embodiment. Note that FIG. 3 shows an example in which the number of input channels is n and the number of output channels is m. The number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution calculation of the immediately preceding layer. This procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the status information contained in that submap (step S301).
  • If the result of the check is that no matrix calculation is to be performed, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the status information contained in that submap (steps S302: Yes, S310: No, S301).
  • the map check unit 115 causes the matrix calculation unit 112 to read a kernel to be used for the matrix calculation for the first input channel from the data memory 111. Then, it is confirmed whether or not the kernel satisfies the above-mentioned conditions (steps S302: No, S303). If the kernel satisfies the above-mentioned conditions, the map check unit 115 reads a submap corresponding to the second input channel from the submap memory 114, and checks the state information included in the submap (steps S304: Yes, S310: No, S301).
  • If the result of the check is that the matrix calculation is to be performed, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S302: No).
  • the matrix calculation unit 112 reads out the kernel to be used for the matrix calculation for the first input channel from the data memory 111 (steps S302: No, S303).
  • the matrix calculation unit 112 also reads out the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the first input channel, and the above-mentioned bias from the data memory 111 (steps S304: No, S305).
  • Each of the multiple output matrices calculated in the (k-1)th layer convolution calculation is used as an input channel.
  • the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S306 and S307).
  • the matrix calculation unit 112 inputs the output value to the zero check unit 113.
  • the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to.
  • the zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S308).
  • the calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S309: No).
  • the data memory 111 stores the output matrix that is the result of performing matrix calculations on the entire input data (all input channels).
  • state information is created based on each element of the output matrix of the convolution operation in the immediately preceding layer, and when the state information satisfies a pre-specified condition, the matrix calculation using the input channel corresponding to that state information is skipped. Also, at this time, data belonging to that input channel is not read from the data memory 111 to the matrix calculation unit 112. In other words, since the reading of unnecessary data does not occur, it is possible to further reduce the wasted data read time, and the time required for the entire calculation can be further shortened compared to the conventional method.
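  • A sketch of a convolution for a later layer that uses the submaps of the immediately preceding layer to skip whole input channels is shown below; it reuses the illustrative skip rule from the earlier sketch and is a simplified software model, not a description of the hardware flow.

```python
import numpy as np

def conv1ch(channel, kernel):
    """Convolve one input channel with one kernel (valid padding)."""
    H, W = channel.shape
    kH, kW = kernel.shape
    return np.array([[np.sum(channel[y:y + kH, x:x + kW] * kernel)
                      for x in range(W - kW + 1)]
                     for y in range(H - kH + 1)])

def should_skip(state, kernel):
    """Same illustrative rule as before: state 0 always skips, states 1 and 2
    skip for mostly-zero kernels, state 3 never skips."""
    zero_fraction = np.mean(kernel == 0)
    return state == 0 or (state == 1 and zero_fraction > 0.5) or (state == 2 and zero_fraction > 0.75)

def next_layer(prev_outputs, prev_submaps, kernel_sets):
    """The previous layer's output matrices are the input channels; a channel
    whose submap satisfies the skip condition is never read at all."""
    H, W = prev_outputs[0].shape
    kH, kW = kernel_sets[0][0].shape
    outputs = []
    for kernels in kernel_sets:
        out = np.zeros((H - kH + 1, W - kW + 1))       # bias starts as a zero matrix
        for state, channel, kernel in zip(prev_submaps, prev_outputs, kernels):
            if should_skip(state, kernel):
                continue                               # map check: no data read, no calculation
            out += conv1ch(channel, kernel)            # matrix calculation only for the channels kept
        outputs.append(out)
    return outputs

rng = np.random.default_rng(3)
prev = [rng.random((6, 6)) * 0.1, rng.random((6, 6))]  # two input channels from the previous layer
prev_submaps = [0, 3]                                  # channel 0 can be skipped, channel 1 cannot
outs = next_layer(prev, prev_submaps, rng.random((2, 2, 3, 3)))
print(len(outs), outs[0].shape)                        # 2 (4, 4)
```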
  • FIG. 4 is a schematic diagram showing an example of the zero check unit 113 provided in the arithmetic device 100.
  • the zero check unit 113 has an input terminal 31, a comparison terminal 32, and an output terminal 33.
  • the input terminal 31 receives an output value from the matrix calculation unit 112.
  • the comparison terminal 32 receives state information stored in the submap memory 114.
  • the output terminal 33 outputs data to be stored in the submap memory 114.
  • the output value from the matrix calculation unit 112 input through the input terminal 31 is input to the comparison unit 34 having multiple comparators.
  • the comparison unit 34 has a number of comparators equal to or greater than the number of preset thresholds. As described above, in this embodiment, three thresholds are set, so the comparison unit 34 has three comparators 34a, 34b, and 34c.
  • the output value from the matrix calculation unit 112 is input to one input terminal of each of the comparators 34a, 34b, and 34c, and the threshold is input to the other input terminal.
  • The comparators 34a, 34b, and 34c are configured to output the numerical value "1" when the input output value exceeds the respective threshold.
  • The output of each of the comparators 34a, 34b, and 34c is input to the checker 35.
  • the status information stored in submap memory 114 at that time is input to checker 35 via comparison terminal 32.
  • The status information stored in the submap memory 114 is a number between "0" and "3".
  • When the output of each of the comparators 34a, 34b, and 34c includes the value "1" and the status information stored in the submap memory 114 needs to be updated, the checker 35 outputs an output corresponding to the updated status information to the output terminal 33. For example, when the stored status information is "0", the checker 35 updates the status information to one of the values "1", "2", or "3" according to the outputs of the comparators 34b and 34c when at least the output of the comparator 34a is the value "1". When the stored status information is "1", the checker 35 updates the status information to "2" or "3" according to the output of the comparator 34c when at least the output of the comparator 34b is the value "1". When the stored status information is "2", the checker 35 updates the status information to "3" when the output of the comparator 34c is the value "1". When the stored status information is "3", the checker 35 does not update the status information.
  • With the zero check unit 113 configured in this way, when the convolution operation to obtain one output matrix is completed, the status information corresponding to that output matrix is stored in the submap memory 114.
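  • The behaviour of the comparators and the checker can be modelled in a few lines, assuming each comparator outputs "1" when the output value exceeds its threshold; the threshold values themselves are illustrative.

```python
THRESHOLDS = (0.5, 1.0, 2.0)   # illustrative first, second and third thresholds

def checker_update(stored_state, output_value):
    """Software model of the comparator/checker pair in FIG. 4: each comparator
    fires when the output value exceeds its threshold, and the stored state
    ("0".."3") is only ever raised, never lowered."""
    fired = [output_value > t for t in THRESHOLDS]   # outputs of comparators 34a, 34b, 34c
    candidate = sum(fired)                           # 0 if none fired, up to 3 if all fired
    return max(stored_state, candidate)

state = 0
for value in (0.1, 0.7, 0.3, 1.4):   # output values arriving one by one
    state = checker_update(state, value)
print(state)   # 2: some value exceeded the second threshold but none exceeded the third
```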
  • the zero check unit 113 can also be realized with other configurations. For example, a configuration can be adopted in which the zero check unit 113 holds a cumulative value of the number of output values that belong to each range. In this case, each time an output value is output from the matrix calculation unit 112, the zero check unit 113 can determine which of the above-mentioned multiple ranges it belongs to based on the held cumulative value.
  • In this embodiment, the zero check unit 113 is further configured to be able to make the above-mentioned judgments as well as a negative judgment as to whether or not the elements of the output matrix contain negative values, and to count the number of elements in the output matrix that exceed one of the above-mentioned thresholds.
  • Negative judgment information indicating the result of the negative judgment and count information indicating the counting result are stored in the submap memory 114.
  • The negative judgment information and count information, together with the above-mentioned state information, constitute the above-mentioned submap.
  • the zero check unit 113 includes a comparator 34d in the comparison unit 34, one input terminal of which receives the output value from the matrix calculation unit 112 and the other input terminal of which receives the numerical value "0".
  • the comparator 34d is configured to output the numerical value "1" when the input output value from the matrix calculation unit 112 is smaller than "0".
  • the output of the comparator 34d is input to the OR circuit 36.
  • the negative judgment information stored in the submap memory 114 at that time is also input to the OR circuit 36 via the comparison terminal 32. If either the output of the comparator 34d or the negative judgment information stored in the submap memory 114 is the numerical value "1", the OR circuit 36 outputs the numerical value "1" to the output terminal 33.
  • When the convolution operation for obtaining one output matrix is completed, if the elements of the output matrix contain a negative value, the numerical value "1" is stored in the submap memory 114 as the negative judgment information. Furthermore, if the elements of the output matrix do not contain negative values, the numerical value "0" is stored in the submap memory 114 as the negative judgment information.
  • the zero check unit 113 also includes a selector 37 to which the outputs of the comparators 34a, 34b, and 34c are input.
  • the selector 37 inputs one of the outputs of the comparators 34a, 34b, and 34c that is set in advance to a counter 38.
  • the counter 38 also receives the count information stored in the submap memory 114 at that time via a comparison terminal 32.
  • the counter 38 outputs a value obtained by adding "1" to the stored count information to the output terminal 33.
  • In this embodiment, the selector 37 is set to a state in which it outputs the output value of the comparator 34c. According to this configuration, when the convolution operation for obtaining one output matrix is completed, the count of the elements contained in the output matrix that are greater than the third threshold is stored in the submap memory 114 as count information.
  • the above-mentioned negative determination information can be used as a flag to determine which process to implement when the process differs depending on whether or not there is a negative value.
  • Based on the count information, a process can be performed such as skipping the corresponding output matrix (input channel) without reading it out if the counted number of elements exceeds a preset threshold value.
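  • Extending the same model with the negative judgment (comparator 34d and OR circuit 36) and the counter fed through the selector gives a sketch like the following; the dictionary layout and the threshold values are illustrative.

```python
def update_submap_entry(entry, output_value, thresholds=(0.5, 1.0, 2.0)):
    """Extend the checker model with the negative-value flag (comparator 34d
    plus the OR circuit) and the counter fed by the selector, here counting
    values that exceed the third threshold as in the example. `entry` is a
    dict standing in for one submap record."""
    fired = [output_value > t for t in thresholds]
    entry["state"] = max(entry["state"], sum(fired))
    entry["negative"] = entry["negative"] or (output_value < 0)   # OR circuit 36
    if fired[2]:                                                  # selector set to comparator 34c
        entry["count"] += 1
    return entry

entry = {"state": 0, "negative": False, "count": 0}
for value in (-0.2, 0.8, 2.5, 3.1):
    update_submap_entry(entry, value)
print(entry)   # {'state': 3, 'negative': True, 'count': 2}
```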
  • FIG. 5 is a diagram showing an example of a submap.
  • a submap using 1 byte (8 bits) of data is shown here as an example.
  • the submap 40 includes status information, negative judgment information, and count information.
  • the submap 40 is composed of 2 bits of status information, 1 bit of negative judgment information, and 5 bits of count information.
  • the address information of the submap 40 in the submap memory 114 and the address information of the output matrix (next layer input channel) corresponding to the submap 40 stored in the data memory 111 are mutually related. This relationship is, for example, such that the top address of the submap 40 is an address obtained by adding a pre-specified offset to the top address of the output matrix corresponding to the submap 40.
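  • Packing one submap entry into a single byte as in FIG. 5, and deriving the submap address from the output-matrix address by a fixed offset, might look as follows; the bit order and the offset value are assumptions, since only the field widths and the offset relationship are given.

```python
def pack_submap(state, negative, count):
    """Pack one submap entry into a single byte: 2 bits of state information,
    1 bit of negative judgment information and 5 bits of count information."""
    assert 0 <= state < 4 and 0 <= count < 32
    return (state & 0b11) | (int(negative) & 0b1) << 2 | (count & 0b11111) << 3

def unpack_submap(byte):
    """Recover (state, negative, count) from a packed submap byte."""
    return byte & 0b11, bool((byte >> 2) & 0b1), (byte >> 3) & 0b11111

# The submap address is derived from the output-matrix address by a fixed offset.
SUBMAP_OFFSET = 0x10000   # illustrative value only

def submap_address(output_matrix_address):
    return output_matrix_address + SUBMAP_OFFSET

b = pack_submap(state=2, negative=True, count=13)
print(hex(b), unpack_submap(b), hex(submap_address(0x2000)))
```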
  • the zero check unit 113 updates the state information in steps S205 and S308 each time the matrix calculation unit 112 outputs a calculation result.
  • FIG. 6 is a schematic diagram showing the configuration of a modified example of the arithmetic device in one embodiment of the present invention. Note that in FIG. 6, components that achieve the same effects as the arithmetic device 100 are given the same reference numerals as in FIG. 1, and detailed description thereof will be omitted below.
  • the arithmetic device 300 of this embodiment further includes a submap address buffer 120.
  • the arithmetic device 300 creates a submap address table in addition to the submap created by the arithmetic device 100 described above.
  • the submap address table is stored in the submap address buffer 120.
  • the submap address table 41 is a table in which ID numbers, address information, and usage information are recorded in a linked state.
  • the submap address buffer 120 may be configured as part of the memory device that constitutes the submap memory 114 or the memory device that constitutes the data memory 111, or may be configured as a separate memory device.
  • the ID number functions as information for identifying a submap. As described above, a submap is created for each input channel used as input data for the convolution operation of the next layer, that is, for each output channel in the layer where the convolution operation is performed. Therefore, the number of ID numbers is the same as the number of output channels.
  • A unique number is assigned as the ID number, which is a combination of a number indicating which layer of the convolution operation and a number indicating which output channel. For example, when three output channels are operated in the (k-1)th layer, the ID numbers "k1", "k2", and "k3" are assigned. More specifically, the ID numbers "31", "32", and "33" are assigned to the three output channels of the second layer. Since these output channels become input channels in the operation of the third layer, when reading out the submap, the address information associated with the ID numbers "31", "32", and "33" is referenced.
  • the address information is information that indicates the storage location of the submap in the submap memory 114. More specifically, it is, for example, the starting address of the storage location of the submap. As explained with reference to FIG. 5, the data length (number of bits) of the submap is constant. Therefore, the storage location of the submap in the submap memory 114 can be identified by a single address.
  • the usage information indicates whether the linked submap has been used in the convolution calculation of the next layer.
  • the usage information is displayed as "valid".
  • a submap is created for each input channel that is used as input data for the convolution calculation of the next layer. Therefore, a submap that is read and used in the convolution calculation of the next layer is not used in subsequent convolution calculations.
  • the numerical values "0" and "1" are used as information indicating that it has been used and information indicating that it has not been used, respectively.
  • a new ID number, address information, and usage information can be linked and stored in a record in the submap address table whose usage information is "0".
  • the controller 116 generates the ID number, address information, and usage information, and records them in the submap address table of the submap address buffer 120, but other configurations can also be used.
  • a submap address management unit having the function of performing these processes may be provided separately from the controller 116.
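  • A minimal software model of the submap address table and its record reuse is sketched below. The flag values follow the description above ("0" = already used and reusable, "1" = not yet used); the class name and data structures are illustrative.

```python
class SubmapAddressTable:
    """Model of the submap address table in FIG. 7: each record links an ID
    number, the submap's start address in the submap memory, and usage
    information ("1" = not yet used, "0" = already used and free for reuse)."""

    def __init__(self, addresses):
        # One record per selectable submap address, all reusable at the start.
        self.records = [{"id": None, "address": a, "used_flag": "0"} for a in addresses]

    def allocate(self, layer, channel):
        """Record a new submap in a record whose usage information is "0"."""
        record = next(r for r in self.records if r["used_flag"] == "0")
        record["id"] = f"{layer}{channel}"      # e.g. "21": used as input channel 1 of layer 2
        record["used_flag"] = "1"               # the linked submap has not been read yet
        return record["address"]

    def lookup(self, layer, channel):
        """Read the address for an ID and mark the record as used (reusable)."""
        record = next(r for r in self.records if r["id"] == f"{layer}{channel}")
        record["used_flag"] = "0"
        return record["address"]

table = SubmapAddressTable(addresses=[0x000, 0x100, 0x200])
a = table.allocate(2, 1)                 # submap created during the first layer's calculation
print(hex(a), hex(table.lookup(2, 1)))   # read back when the second layer uses it
table.allocate(3, 1)                     # the freed record can now hold a new submap
```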
  • FIG. 8 is a flow diagram showing the procedure performed during the first-layer convolution calculation by the arithmetic device 300.
  • FIG. 8 shows an example in which the number of input channels is n and the number of output channels is m.
  • the procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the arithmetic device 300.
  • the arithmetic device 300 differs from the arithmetic device 100 only in that the arithmetic device 300 creates a submap address table. Therefore, in the procedure shown in FIG. 8, steps that perform the same operation as the arithmetic device 100 are given the same reference numerals as in FIG. 2, and detailed description thereof will be omitted below.
  • the matrix calculation unit 112 starts the matrix calculation of the first output matrix (output channel).
  • the controller 116 generates data for the submap address table (step S220). That is, the controller 116 generates the above-mentioned ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in the record in the submap address table 41 whose usage information is "0".
  • the ID number "21" is generated at this time.
  • An address in the submap memory 114 is appropriately selected as the address information.
  • a configuration can be adopted in which all selectable addresses in the submap memory 114 are recorded in the submap address table 41, and the controller 116 selects the address information recorded in the record into which information is to be written as the address information to be linked to the generated ID number. Note that the linked submap has not yet been read, so the usage information recorded is "1".
  • the matrix calculation unit 112 starts matrix calculation for the first input channel.
  • the procedure for matrix calculation for the first input channel is generally the same as the procedure described in FIG. 2. That is, the kernel reading (step S201), data reading (step S202), matrix calculation (step S203), and calculation result storage (step S204) are as described above.
  • the zero check unit 113 stores the determination result in a storage location in the submap memory 114 specified by the address information generated by the controller 116.
  • the calculation device 300 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S206: No). At this time, the data memory 111 stores the output matrix calculated for the first input channel.
  • the matrix calculation unit 112 starts the matrix calculation for the second input channel (step S206: Yes, S207: No).
  • the procedure for the matrix calculation for the second input channel is generally the same as the procedure described in FIG. 2. That is, the kernel reading (step S201), data reading (step S202), matrix calculation (step S203), and calculation result storage (step S204) are as described above.
  • the zero check unit 113 stores the determination result in a storage location in the submap memory 114 specified by the address information generated by the controller 116.
  • the data memory 111 stores an output matrix that is the result of performing a matrix calculation on all input data (all input channels).
  • When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates data for the submap address table corresponding to that output matrix (step S220). That is, the controller 116 generates an ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120.
  • the controller 116 records the generated information in the submap address table 41 in a record whose usage information is "0". According to the above-mentioned ID number generation rules, the ID number generated at this time is "22". Furthermore, since the linked submap has not yet been read, the usage information recorded is "1".
  • When all calculations to obtain a given number m of output matrices are completed, m output matrices will be stored in the data memory 111, m submaps corresponding to each output matrix will be stored in the submap memory 114, and address information and usage information corresponding to each of the m output matrices will be recorded in the submap address buffer 120. These m output matrices will be used as input channels for the convolution operation of the next layer.
  • the data for the submap address table is generated at the start of the calculation of the output matrix.
  • the data for the table may be generated at other times in the matrix calculation for one output channel, as long as it is generated before the zero check unit 113 first stores the judgment result (state information) in the submap memory 114.
  • FIG. 9 is a flow diagram showing the procedure performed by the arithmetic device 300 of this embodiment when performing convolution calculations on the second and subsequent layers.
  • FIG. 9 shows an example in which the number of input channels is n and the number of output channels is m.
  • the number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution calculation of the immediately preceding layer. This procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • the arithmetic device 300 differs from the arithmetic device 100 only in that the arithmetic device 300 creates a submap address table. Therefore, in the procedure shown in FIG. 9, steps that perform the same operations as the arithmetic device 100 are given the same reference numerals as in FIG. 3, and detailed explanations thereof will be omitted below.
  • the matrix calculation unit 112 starts the matrix calculation of the first output matrix (output channel).
  • the controller 116 generates data for the submap address table (step S320). That is, the controller 116 generates the above-mentioned ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120.
  • the controller 116 records the generated information in a record in the submap address table 41 whose usage information is "0". According to the above-mentioned ID number generation rules, the ID number generated at this time is "31". Also, since the linked submap has not yet been read, the usage information recorded is "1".
  • the matrix calculation unit 112 starts matrix calculation for the first input channel.
  • the map check unit 115 reads address information of the submap corresponding to the first input channel from the submap address buffer 120 (step S321). Then, based on the address information, the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the status information included in the submap (step S301). For example, if the calculation is for the first input channel of the second layer, the map check unit 115 reads address information linked to the ID number "21". In addition, the map check unit 115 notifies the controller 116 of the ID number for which the address information was read from the submap address buffer 120.
  • The controller 116 that receives the notification rewrites the usage information linked to the ID number in the submap address buffer 120 from "1" to "0". Note that the usage information may be rewritten at a timing other than this. However, from the perspective of making effective use of the storage area of the submap memory 114, it is preferable to perform this process after the address information is read and before a new submap is stored in the submap memory 114.
  • the subsequent procedure for the first input channel is generally the same as that described in FIG. 3, but the zero check unit 113 stores the result of the determination in a storage location in the submap memory 114 specified by the address information generated by the controller 116 (step S308).
  • the calculation device 300 repeats the above process until it is completed for all coordinate areas belonging to the first input channel (step S309: No).
  • When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates data for the submap address table corresponding to that output matrix (step S320). That is, the controller 116 generates an ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120.
  • the controller 116 records the generated information in the submap address table 41 in a record whose usage information is "0". According to the above-mentioned ID number generation rules, the ID number generated at this time is "32". Furthermore, since the linked submap has not yet been read, the usage information recorded is "1".
  • When all calculations to obtain a predetermined number m of output matrices are completed, m output matrices will be stored in the data memory 111, m submaps corresponding to each output matrix will be stored in the submap memory 114, and address information and usage information corresponding to each of the m output matrices will be recorded in the submap address buffer 120. These m output matrices are used as input channels for the convolution calculation of the next layer. The calculation device 300 then repeatedly performs the procedure for the second and subsequent layers until all convolution calculations for the specified number of layers are completed.
  • the submap memory 114 can be realized using a memory with a limited size, such as a ring buffer.
  • a submap is generated only for the data of the input channel used in the convolution operation, and whether or not to read the data of the input channel is determined based on the state information included in the submap.
  • state information can also be applied to the kernel used in the matrix calculation of the input data. That is, a configuration can also be adopted in which state information is created for the kernel based on each element of the kernel, and when the state information satisfies a pre-specified condition, the matrix calculation using the kernel corresponding to the state information is skipped without reading the data of the input channel.
  • the state information of the kernel can be obtained in advance.
  • a configuration can be adopted in which the state information of the kernel is stored in the submap memory 114, and in the judgment of the map check unit 115 in step S302 of FIG. 3 or FIG. 9, in addition to the state information of the input channel, the state information of the kernel is also taken into consideration to determine whether or not to read the data of the input channel.
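  • How precomputed kernel state information might enter the map check decision is sketched below; the kernel-state definition and the combination rule are illustrative assumptions, since the text only states that both pieces of state information are taken into consideration.

```python
import numpy as np

def kernel_state(kernel, thresholds=(0.5, 1.0, 2.0)):
    """State information for a kernel, computed once in advance: the first
    threshold that bounds every kernel element, encoded 0..3 like the channel
    state (an illustrative definition)."""
    for i, t in enumerate(thresholds):
        if np.all(np.abs(kernel) <= t):
            return i
    return 3

def skip_with_kernel_state(channel_state, k_state):
    """Example combined decision: skip when either the input channel or the
    kernel is entirely within the tightest range."""
    return channel_state == 0 or k_state == 0

k = np.array([[0.0, 0.1], [0.2, 0.0]])
print(kernel_state(k), skip_with_kernel_state(channel_state=3, k_state=kernel_state(k)))   # 0 True
```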
  • the matrix calculation unit 112 performs matrix calculations for one entire input channel, and after completion, performs matrix calculations for the entire next input channel. However, the matrix calculations do not need to be performed continuously for the entire input channel.
  • a configuration is described in which matrix calculations are performed for each element of the output matrix. In such a configuration, the matrix calculations are performed for each coordinate region that constitutes part of the input channel.
  • FIG. 10 is a diagram for explaining the concept of this method. FIG. 10 shows an example in which three output matrices (output channels) are obtained from three input channels. In this method, only the order in which data is read by the matrix calculation unit 112 and the like is changed, and the configuration of the arithmetic device is the same as the configuration shown in FIG. 1.
  • data in the same coordinate region in the three input channels is used to calculate an element located at a specific coordinate in each of the three output matrices.
  • In the matrix calculation to calculate an element 61a located at the coordinate (2,3) of the first output matrix 61, data belonging to a 3 × 3 coordinate region 51a centered on the coordinate (2,3) of the first input channel 51, data belonging to a 3 × 3 coordinate region 52a centered on the coordinate (2,3) of the second input channel 52, and data belonging to a 3 × 3 coordinate region 53a centered on the coordinate (2,3) of the third input channel 53 are used.
  • Similarly, data belonging to the coordinate region 51a of the first input channel 51, data belonging to the coordinate region 52a of the second input channel 52, and data belonging to the coordinate region 53a of the third input channel 53 are used, although the kernels are different, in the matrix calculations to calculate an element 62a located at the coordinate (2,3) of the second output matrix 62 and an element 63a located at the coordinate (2,3) of the third output matrix 63.
  • the same output matrix as that calculated in the above embodiment can be calculated by sequentially reading out the data in the coordinate region 51a of the first input channel 51, the data in the coordinate region 52a of the second input channel 52, and the data in the coordinate region 53a of the third input channel 53, and then sequentially reading out the data in the coordinate regions whose positions have been changed in each of the input channels 51, 52, and 53, and performing matrix calculations.
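  • The read order of FIG. 10 can be sketched as follows: for every output coordinate, the same coordinate region is read once from each input channel and used to compute that element of every output matrix before moving on; names and shapes are illustrative.

```python
import numpy as np

def coordinate_first_convolution(inputs, kernel_sets):
    """Alternative read order: for each output coordinate, the same coordinate
    region of every input channel is read and used to compute that element of
    every output matrix, before moving to the next coordinate."""
    n_in, H, W = inputs.shape
    n_out, _, kH, kW = kernel_sets.shape
    outputs = np.zeros((n_out, H - kH + 1, W - kW + 1))
    for y in range(H - kH + 1):
        for x in range(W - kW + 1):
            regions = inputs[:, y:y + kH, x:x + kW]        # same region in all input channels
            for o in range(n_out):                         # same data, different kernel sets
                outputs[o, y, x] = np.sum(regions * kernel_sets[o])
    return outputs

rng = np.random.default_rng(4)
x, k = rng.random((3, 6, 6)), rng.random((3, 3, 3, 3))
print(coordinate_first_convolution(x, k).shape)   # (3, 4, 4)
```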
  • FIG. 11 is a flow diagram showing the procedure for implementing a method for acquiring data in the same coordinate region for all input channels in the first-layer convolution calculation by the calculation device 100.
  • FIG. 11 shows an example in which the number of input channels is n and the number of output channels is m.
  • this procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the calculation device 100.
  • In step S701, the coordinates of the element of the output matrix to be calculated are determined. Then, based on the determined coordinates, the coordinate region of the input channel required for the matrix calculation is identified (step S702). Note that in this embodiment, the controller 116 determines the coordinates of the element in the output matrix and identifies the coordinate region of the input channel.
  • the matrix calculation unit 112 reads from the data memory 111 the kernel to be used for the matrix calculation for the first input channel for the first output matrix (step S703).
  • the matrix calculation unit 112 also reads from the data memory 111 the data belonging to the coordinate region of the first input channel and the bias described above (step S704). As described above, the bias for the first input channel is a zero matrix.
  • the matrix calculation unit 112 executes the matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate area (steps S705 and S706). At this time, the matrix calculation unit 112 inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 performs the above-mentioned state judgment. The zero check unit 113 then stores the judgment result in the submap memory 114 (step S707).
  • Next, the matrix calculation unit 112 reads out from the data memory 111 the kernel to be used for the matrix calculation of the second input channel for the first output matrix (steps S708: No, S703).
  • At this time, the matrix calculation unit 112 also reads out from the data memory 111 the data belonging to the coordinate region of the second input channel and the bias described above (step S704).
  • In this case, the bias is a matrix in which all elements of the output matrix being calculated, as stored in the data memory 111 at that time, other than the element being calculated, are set to zero.
  • Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S705 and S706). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113, which performs the above-mentioned state judgment and stores the judgment result in the submap memory 114 (step S707).
  • When this processing has been completed for all input channels, the data memory 111 stores the elements of the output matrix that result from performing the matrix calculations for all input channels.
  • Next, the controller 116 determines the coordinates of the next element of the output matrix to be calculated, and identifies the coordinate region of the input channels that corresponds to the determined coordinates (steps S708: Yes, S709: No, S701, S702).
  • The calculation device 100 then repeats the above-mentioned processing until it has been completed for all elements of the output matrix (step S709: No).
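As a rough illustration of the per-element flow of FIG. 11, the following Python sketch accumulates one output element over all input channels and updates the running state of the corresponding submap (the threshold values, the form of the comparison, and the helper names are assumptions; the step numbers appear only in comments).

```python
import numpy as np

THRESHOLDS = (0.0, 0.1, 0.5)   # assumed values for the three thresholds of the zero check unit

def classify(value, current_state):
    """Return the updated submap state (0..3 for states 1..4): an element that only fits
    a wider range can raise the state of its output matrix, never lower it."""
    for state, threshold in enumerate(THRESHOLDS):
        if value <= threshold:                      # comparison form is an assumption
            return max(current_state, state)
    return 3                                        # state 4: none of the thresholds hold

def first_layer_element(channels, kernels, y, x, submap_state):
    """channels: padded HxW arrays of one layer; kernels[i]: 3x3 kernel of channel i
    for the output matrix being computed (steps S703-S707, simplified)."""
    value = 0.0                                     # the bias of the first channel is a zero matrix
    for kernel, channel in zip(kernels, channels):
        value += np.sum(kernel * channel[y:y + 3, x:x + 3])   # accumulate via the bias (S703-S706)
    return value, classify(value, submap_state)               # zero check and submap update (S707)
```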
  • FIG. 12 is a flow diagram showing the procedure by which the calculation device 100 implements this method of acquiring data in the same coordinate region for all input channels in the convolution calculations of the second and subsequent layers.
  • FIG. 12 shows an example in which the number of input channels is n and the number of output channels is m.
  • The procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • When this procedure is started, the controller 116 first determines the coordinates of the element of the output matrix to be calculated, and then identifies the coordinate region of the input channels required for the matrix calculation based on the determined coordinates (steps S801 and S802).
  • Next, the map check unit 115 reads out from the submap memory 114 the submap corresponding to the first input channel for the first output matrix, and checks the state information included in the submap (step S803). If the result of this check is that the matrix calculation is not to be performed, the map check unit 115 reads out from the submap memory 114 the submap corresponding to the second input channel for the first output matrix, and checks the state information included in that submap (steps S804: Yes, S811: No, S803).
  • Otherwise, the map check unit 115 causes the matrix calculation unit 112 to read the kernel to be used for the matrix calculation of the first input channel from the data memory 111, and checks whether the kernel satisfies the above-mentioned conditions (steps S804: No, S805). If the result of this check is that the matrix calculation is not to be performed, the map check unit 115 reads out from the submap memory 114 the submap corresponding to the second input channel for the first output matrix, and checks the state information included in that submap (steps S806: Yes, S811: No, S803).
  • If, on the other hand, the matrix calculation is to be performed, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S806: No).
  • In this case, the matrix calculation unit 112 reads out from the data memory 111 the data belonging to the coordinate region to be used for the matrix calculation from the input data of the first input channel, together with the bias described above (step S807).
  • Here, each of the multiple output matrices calculated in the (k-1)-th layer convolution calculation is used as an input channel.
  • Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S808 and S809). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113, which performs the above-mentioned state judgment and stores the judgment result in the submap memory 114 (step S810).
  • When this processing has been completed for all input channels, the data memory 111 stores the elements of the output matrix that result from performing the matrix calculations for all input channels.
  • Next, the controller 116 determines the coordinates of the next element of the output matrix to be calculated, and identifies the coordinate region of the input channels that corresponds to the determined coordinates (steps S811: Yes, S812: No, S801, S802).
  • The calculation device 100 then repeats the above-mentioned processing until it has been completed for all elements of the output matrix (step S812: No).
  • The above-mentioned effects can also be obtained with this method.
  • That is, state information is created based on each element of the output matrix of the convolution calculation in the immediately preceding layer, and when the state information satisfies a pre-specified condition, the matrix calculation using the input channel corresponding to the state information is skipped.
  • In this case, the matrix calculation unit 112 skips the data belonging to that input channel without reading it from the data memory 111. In other words, since unnecessary data is not read, wasted data read time is reduced, and the time required for the entire calculation can be further shortened compared to the conventional method.
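The skip logic used in the second and subsequent layers can be sketched as follows (the kernel conditions for states 2 and 3 reuse the example given elsewhere in this description; all names are illustrative). The essential point is that the data of a skipped input channel is never read at all.

```python
import numpy as np

def should_read_channel(state, kernel):
    """Decide from the submap state (0..3 for states 1..4) and the kernel whether the
    channel data must be read (steps S803-S806, simplified)."""
    zero_ratio = float(np.mean(kernel == 0))
    if state == 0:                     # state 1: skip unconditionally
        return False
    if state == 1:                     # state 2: skip if half or more of the kernel elements are zero
        return zero_ratio < 0.5
    if state == 2:                     # state 3: skip if 3/4 or more of the kernel elements are zero
        return zero_ratio < 0.75
    return True                        # state 4: always read and calculate

def layer_element(channels, kernels, states, y, x):
    """Compute one output element, reading only the channels that pass the check."""
    value = 0.0
    for channel, kernel, state in zip(channels, kernels, states):
        if not should_read_channel(state, kernel):
            continue                   # the channel data is never fetched from the data memory
        value += np.sum(kernel * channel[y:y + 3, x:x + 3])
    return value
```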
  • Note that the zero check unit 113 updates the state information in steps S707 and S810 each time the matrix calculation unit 112 outputs a calculation result.
  • However, if the reading of the input data of all input channels for the same element of the output matrix is skipped in step S804 or S806 of the flow diagram shown in FIG. 12, no zero check is performed for that element. That is, the state information corresponding to that output matrix is not updated and remains at its initial value. For this reason, it is preferable to set the initial value of the state information to, for example, "0", which indicates state 1.
  • With this configuration, the output matrix can be skipped without being read as input data in the convolution calculation of the next layer. That is, even if an element of the output matrix is not calculated, a zero-clear operation (an access to the data memory 111) to set the value of that element to zero is not required.
  • In the above description, the matrix calculation unit 112 is configured to read one coordinate region of data for each input channel. However, it is also possible to adopt a configuration in which the matrix calculation unit 112 reads multiple consecutive coordinate regions of data for each input channel and writes the results of the multiple matrix calculations to the data memory 111. This makes it possible to parallelize the matrix calculations. As can be seen from FIG. 10, reading consecutive coordinate regions of data in each input channel and performing the matrix calculations is equivalent to performing the matrix calculations for consecutive elements of the output matrix.
  • In this case, the zero check unit 113 stores, in the submap memory 114, state information based on any one of the multiple calculated output values as the state information for all of the elements corresponding to those output values.
  • FIG. 13 is a diagram for explaining the concept of this method.
  • As shown in FIG. 13, when the matrix calculations for calculating element 71a located at coordinates (2,3), element 71b located at coordinates (3,3), and element 71c located at coordinates (4,3) of the output matrix 71 are parallelized, the zero check unit 113 registers the judgment result for any one of the elements (for example, element 71c) in the submap memory 114 as the judgment result for all of the elements 71a, 71b, and 71c. This makes it easy to parallelize the matrix calculations. Note that although an example in which three consecutive data are treated as one unit is shown here, as long as the data are consecutive this method can also be applied in units of an output matrix, of a row of an output matrix, or of the input range of an input channel.
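A minimal sketch of this parallelized variant is shown below (a block of three consecutive elements is only the example of FIG. 13, and `compute_element` and `classify` are assumed helpers): a single judgment result is registered for every element of the block.

```python
def process_block(compute_element, classify, y, xs, submap_state):
    """compute_element(y, x) returns one output value; classify(value, state) returns the
    updated submap state. xs is a run of consecutive x coordinates, e.g. (2, 3, 4)."""
    values = [compute_element(y, x) for x in xs]        # e.g. elements 71a, 71b and 71c in parallel
    representative = values[-1]                         # any one of the block's output values may be used
    block_state = classify(representative, submap_state)
    return values, block_state                          # one state entry stands for the whole block
```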
  • In the above description, the map check unit 115 checks the state information contained in the submap and selects whether or not to have the matrix calculation unit 112 read the data of the input channel. From the perspective of further reducing the time required for the entire calculation, it is preferable to perform this check as few times as possible.
  • FIG. 14 is a schematic diagram showing the configuration of a calculation device that can reduce the number of such checks.
  • As shown in FIG. 14, in addition to the configuration of the calculation device 100 described above, the calculation device 200 includes a table creation unit 117 and a read control unit 118. Note that in FIG. 14, components that achieve the same effects as those of the calculation device 100 are given the same reference numerals as in FIG. 1, and detailed explanations thereof are omitted below.
  • The table creation unit 117 creates a table that specifies the output matrices to be read as input data by the matrix calculation unit 112, based on the judgment results of the map check unit 115. That is, when a convolution operation is started, the table creation unit 117 first causes the map check unit 115 to read the state information corresponding to all input channels used in that convolution operation (all output matrices calculated in the convolution operation of the previous layer) and to judge, by the above-mentioned method, whether each of them is to be read as input data by the matrix calculation unit 112. Then, based on the judgment results, the table creation unit 117 creates a table that specifies the input channels to be read as input data by the matrix calculation unit 112.
  • For example, when it is judged that the matrix calculation unit 112 is to read the first and third input channels as input data, the table creation unit 117 creates a table indicating that.
  • In this embodiment, the table creation unit 117 is configured to hold the created table itself.
  • The read control unit 118 causes the matrix calculation unit 112 to read the data to be calculated based on the table created by the table creation unit 117. As described above, when a table is created indicating that the matrix calculation unit 112 is to read the first and third input channels as input data, the read control unit 118 causes the matrix calculation unit 112 to perform the matrix calculations using only the first and third input channels in that convolution operation.
  • Note that the table creation unit 117 and the read control unit 118 can be realized, for example, by hardware including a processor and memory such as RAM or ROM, and by software stored in the memory and running on the processor.
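The cooperation between the table creation unit 117 and the read control unit 118 might look like the following sketch (one table per output matrix is assumed here, since the kernel conditions depend on the kernels of that output matrix; the decision function is passed in as a parameter and all names are illustrative): the input channels are checked once per layer, and the per-element loop then consults only the table.

```python
import numpy as np

def build_read_table(states, kernels, should_read_channel):
    """Table creation: record the indices of the input channels that must be read."""
    return [i for i, (state, kernel) in enumerate(zip(states, kernels))
            if should_read_channel(state, kernel)]

def run_output_matrix(channels, kernels, table, height, width):
    """Read control: compute one output matrix using only the channels listed in the table."""
    out = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            for i in table:                     # no per-element state check is needed any more
                out[y, x] += np.sum(kernels[i] * channels[i][y:y + 3, x:x + 3])
    return out
```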
  • Next, the operation of the calculation device 200 having the above configuration will be described. In the calculation device 200 as well, the operation differs between the first-layer convolution operation, where no submaps exist, and the second and subsequent layers, where submaps exist. However, the operation of the first-layer convolution operation is the same as the operation shown in FIG. 11, so a description thereof is omitted here.
  • FIG. 15 is a flow diagram showing the procedure by which the calculation device 200 implements this method of acquiring data in the same coordinate region for all input channels in the convolution calculations of the second and subsequent layers.
  • FIG. 15 shows an example in which the number of input channels is n and the number of output channels is m.
  • The procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • When this procedure is started, the table creation unit 117 first causes the map check unit 115 to read out the state information corresponding to all input channels and to judge, by the method described above, whether each corresponding input channel is to be read as input data by the matrix calculation unit 112. The table creation unit 117 then creates the above-mentioned table based on the results of this judgment (step S1101).
  • Next, the controller 116 determines the coordinates of the element of the output matrix to be calculated and, based on the determined coordinates, identifies the coordinate region of the input channels required for the matrix calculation (steps S1102 and S1103).
  • Next, the read control unit 118 instructs the matrix calculation unit 112 to execute the matrix calculation for the input channel listed first in the table.
  • The matrix calculation unit 112 reads out from the data memory 111 the kernel to be used for the matrix calculation of the input channel listed first in the table for the first output matrix (step S1104).
  • At this time, the matrix calculation unit 112 also reads out from the data memory 111 the data belonging to the coordinate region to be used for the matrix calculation from the input data of the input channel listed first in the table, together with the bias mentioned above (step S1105).
  • Here, each of the multiple output matrices calculated in the (k-1)-th layer convolution calculation is used as an input channel.
  • Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S1106 and S1107). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113, which performs the above-mentioned state judgment and stores the judgment result in the submap memory 114 (step S1108).
  • The calculation device 200 repeats the above processing for the first output matrix until it has been completed for all input channels listed in the above table (step S1109: No).
  • At this point, the data memory 111 stores the elements of the output matrix that result from performing the matrix calculations for all input channels.
  • Next, the controller 116 determines the coordinates of the next element of the output matrix to be calculated, and identifies the coordinate region of the input channels corresponding to the determined coordinates (steps S1109: Yes, S1110: No, S1102, S1103).
  • The calculation device 200 then repeats the above-mentioned processing until it has been completed for all elements of the output matrix (step S1110: No).
  • With the calculation device 200 as well, state information is created based on each element of the output matrix of the convolution operation in the immediately preceding layer, and when the state information satisfies a pre-specified condition, the matrix calculation using the input channel corresponding to the state information is skipped.
  • In this case, the matrix calculation unit 112 skips the data belonging to that input channel without reading it from the data memory 111. In other words, since no unnecessary data is read, the wasted data read time is reduced, and the time required for the entire calculation can be further shortened compared to the conventional method.
  • Moreover, in the calculation device 200, since the input channels used for the calculation are written in a table, it is not necessary to determine whether or not to read the data belonging to each input channel every time a matrix calculation is performed. As a result, the time required for the entire calculation can be shortened even further.
  • Note that, in the above description, the zero check unit 113 updates the state information in step S1108 each time the matrix calculation unit 112 outputs a calculation result.
  • However, the state information can also be updated at other times.
  • For example, the zero check unit 113 can create the state information and store it in the submap memory 114 only when performing the matrix calculation for the last input channel, among the input channels belonging to the same layer, for which a matrix calculation is performed, that is, only when performing the matrix calculation that finally determines the values of the elements of the output matrix.
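As a minimal sketch of this variant (the dictionary-based submap and all names are illustrative assumptions), the state is written only by the matrix calculation that fixes the element's final value:

```python
def store_state_if_final(value, channel_index, last_channel_index, classify, submap, output_id):
    """Write the submap entry only for the matrix calculation that determines the element's value."""
    if channel_index == last_channel_index:
        submap[output_id] = classify(value, submap.get(output_id, 0))   # initial value: state 1 ("0")
    return value
```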
  • In addition, submaps can also be created in units of physical memory access.
  • Here, a physical memory access unit refers to the amount of data that can be obtained by one memory access.
  • FIGS. 16(a) and 16(b) are diagrams for explaining the concept of this method.
  • FIG. 16(a) corresponds to the case where the amount of data in one input channel is smaller than the memory access unit.
  • FIG. 16(b) corresponds to the case where the amount of data in one input channel is larger than the memory access unit.
  • In the example of FIG. 16(a), the memory access unit 81 contains data from three input channels, Ch0, Ch1, and Ch2.
  • The memory access unit 82 contains data from two input channels, Ch2 and Ch3.
  • In this case, creating submaps in units of input channels results in four submaps, 83a, 83b, 83c, and 83d, whereas creating submaps in units of memory access results in two submaps, 84a and 84b.
  • If the state information of the submap 84a of the memory access unit 81 is, for example, the above-mentioned state 1, the matrix calculations for the three input channels Ch0, Ch1, and Ch2 can be skipped simply by checking that single piece of state information.
  • On the other hand, even when the state information of the submap 84a of the memory access unit 81 is, for example, the above-mentioned state 4, the matrix calculation for a specific input channel may still be skipped by further checking the state information of the submaps 83a, 83b, and 83c of the three input channels Ch0, Ch1, and Ch2.
  • Such a technique can be realized by configuring the zero check unit 113 in the configuration shown in FIG. 1 to further determine, in units of memory access, whether or not each element of the output matrix produced by the matrix calculation unit 112 falls within a pre-specified range.
  • In this case, the submap memory 114 stores the determination result of the zero check unit 113 as second state information.
  • On the other hand, when the amount of data in one input channel is larger than the memory access unit, the data of one input channel is made up of multiple memory access units.
  • In the example of FIG. 16(b), the data of one input channel 95 is made up of four memory access units 91, 92, 93, and 94.
  • In this case, creating submaps in units of input channels results in one submap 96a, whereas creating submaps in units of memory access results in four submaps 97a, 97b, 97c, and 97d.
  • Even when the state information of the input-channel-unit submap 96a is, for example, the above-mentioned state 4, it may be possible to skip the matrix calculation for a portion of the input channel by further checking the state information of the four memory-access-unit submaps 97a, 97b, 97c, and 97d.
  • Such a technique can also be realized by configuring the zero check unit 113 in the configuration shown in FIG. 1 to further determine, in units of memory access, whether or not each element of the output matrix produced by the matrix calculation unit 112 falls within a pre-specified range.
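The two-level check suggested by FIG. 16 could be sketched as follows (the grouping of channels into access units, the state encoding, and the helper names are assumptions): the coarse submap of a memory access unit is consulted first, and the finer submaps are examined only when the coarse submap does not already allow a skip.

```python
def channels_to_read(access_units, channel_kernels, should_read_channel):
    """access_units: list of (unit_state, [(channel_id, channel_state), ...]) pairs,
    e.g. FIG. 16(a): one unit holding the states of Ch0, Ch1 and Ch2."""
    selected = []
    for unit_state, members in access_units:
        if unit_state == 0:                     # e.g. state 1 for the whole unit: skip every member
            continue
        for channel_id, channel_state in members:
            if should_read_channel(channel_state, channel_kernels[channel_id]):
                selected.append(channel_id)     # fall back to the finer per-channel submap
    return selected
```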
  • The operation when this method is implemented also differs between the first-layer convolution operation, where no submaps exist, and the convolution operations of the second and subsequent layers, where submaps exist.
  • The operation of the first-layer convolution operation is the same as the operation shown in FIG. 2, except that the zero check unit 113 further stores the above-mentioned second state information in the submap memory 114, so a description thereof is omitted here.
  • In the convolution operations of the second and subsequent layers, a step is added in which the map check unit 115 reads the submap corresponding to a memory access unit and checks the second state information contained in that submap. That is, as shown in FIG. 16(a), when the amount of data of one input channel is smaller than the memory access unit, the step of reading the memory-access-unit submap and checking its second state information is added before the step of checking the submap corresponding to the input channel. Also, as shown in FIG. 16(b), when the amount of data of one input channel is larger than the memory access unit, this step is added after the step of checking the submap corresponding to the input channel.
  • Note that the operation when calculating the first output matrix of a given layer is the same as the part that calculates one output matrix in the flow diagram shown in FIG. 2 and in the operation shown in FIG. 11. Also, the operation when calculating the second and subsequent output matrices of the same layer is the same as the flow diagram shown in FIG. 3 and the operation shown in FIG. 12, for the case where the state information read by the map check unit 115 is the state information corresponding to the first output matrix of the same layer.
  • Note that a pooling layer may exist between the convolution operation of the (k-1)-th layer and the convolution operation of the k-th layer. Even if pooling is performed on the output matrix, the characteristics of the output matrix before pooling are inherited by the data after pooling, so the information in the submap can be used without problems. Furthermore, in the above-described embodiments, a case has been described in which the submap includes negative-judgment information and counting information; however, it is sufficient for the submap to include at least the state information, and it is not essential that it include other information.
  • In the above-described embodiments, the matrix calculation performed by the calculation device is a matrix calculation in a convolutional layer of a convolutional neural network; however, the present invention is not limited to the convolutional layers of convolutional neural networks.
  • The present invention is applicable to any matrix calculation in which, in a series of matrix calculations, the output matrix of a previous matrix calculation is used as the data to be calculated in a subsequent matrix calculation.
  • As described above, the present invention can reduce the time that is ultimately wasted and can therefore shorten the time required for the entire calculation compared to conventional methods, and is thus useful as a calculation device.

Abstract

In the present invention, a matrix calculation unit reads calculation target data from a data memory, performs matrix calculation, and stores an output matrix in the data memory as the calculation target data. A zero check unit determines whether or not each element of the output matrix belongs to a predetermined range. A sub-map memory stores the determination result of the zero check unit as state information. On the basis of the state information stored in the sub-map memory, a map check unit determines whether or not to cause the matrix calculation unit to read, as the calculation target data, the output matrix corresponding to the state information.

Description

Calculation Device

The present invention relates to a calculation device that performs matrix calculations such as convolution calculations.

Conventionally, in the field of machine learning, convolutional neural networks (CNNs) are used to recognize images and videos. For example, in image recognition, the input image is transformed using convolutional layers and pooling layers, gradually reducing the amount of data, and finally outputting the probability value for each classification.

In the convolutional layer of a CNN, a filter multiplication operation is performed on each coordinate region (e.g., a 3x3 cell region) in the input data. The calculation result is then used as the input for the calculation in the next layer, and the convolution operation is repeated. For this reason, machine learning using a CNN requires many matrix calculations and a large memory bandwidth. As a method for alleviating this requirement, configurations are used that skip the calculation when the calculation target is zero (e.g., Patent Documents 1 and 2).

Patent Document 1: JP 2020-184309 A
Patent Document 2: JP 2018-028908 A

The techniques disclosed in Patent Documents 1 and 2 do not perform the skipped calculations, and can therefore reduce calculation time. As a result, the time required for the entire convolution operation can also be reduced. However, in these techniques, the data used as the input of the calculation target is loaded into a calculation memory, and it is then determined whether or not the data loaded into the calculation memory is zero. In other words, when the calculation is skipped as a result, data read time and memory space have already been spent on data that is not used in the calculation.

The present invention was made in view of the above circumstances, and aims to provide a calculation device that can further reduce the time that is ultimately wasted when performing matrix calculations such as convolution operations, and can thereby shorten the time required for the entire calculation compared to conventional methods.

In order to achieve the above object, the present invention employs the following technical means. First, the present invention is premised on a calculation device in which, in a series of matrix calculations, the output matrix of a previous matrix calculation is used as the data to be calculated in a subsequent matrix calculation. The calculation device according to the present invention comprises a data memory, a matrix calculation unit, a zero check unit, a submap memory, and a map check unit. The data memory stores the data to be calculated. The matrix calculation unit reads data from the data memory, performs a matrix calculation, and stores the output matrix in the data memory. The zero check unit judges whether each element of the output matrix falls within a pre-specified range. The submap memory stores the judgment result of the zero check unit as state information. The map check unit judges, based on the state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to that state information as the data to be calculated.

According to this calculation device, when the map check unit judges that the data to be calculated is not to be read, the calculation is skipped without the data being read into the matrix calculation unit, so unnecessary data read time can be reduced. As a result, the time required for the entire calculation can be shortened.

In the above configuration, the matrix calculation unit may also be configured to read out at one time multiple pieces of data with which the same output matrix is calculated successively. In this case, the zero check unit uses one of the multiple pieces of state information corresponding to the multiple pieces of data as the state information for the multiple pieces of data read out successively by the matrix calculation unit.

The above configuration is also applicable to the case where the above matrix calculation is, for example, a convolution operation in a convolutional neural network. In this case, the matrix calculation unit stores the output matrix in the data memory as the data to be calculated in the next layer of the convolution operation, and the map check unit judges, based on the state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to that state information as the data to be calculated in the next layer of the convolution operation. In this configuration, the matrix calculation unit may read out, as the data to be calculated, the same coordinate region constituting a part of each input channel for all input channels belonging to the same layer of the convolution operation, and perform the matrix calculations. Furthermore, the matrix calculation unit may read out at one time multiple pieces of data on which matrix calculations are performed successively in the same layer of the convolution operation. In this case, the zero check unit uses one of the multiple pieces of state information corresponding to the multiple pieces of data as the state information for the multiple pieces of data read out successively by the matrix calculation unit.

The above calculation device may further include a table creation unit and a read control unit. The table creation unit creates a table that specifies the output matrices to be read by the matrix calculation unit, based on the judgment results of the map check unit. The read control unit causes the matrix calculation unit to read data based on the created table.

In the above calculation device, the zero check unit may further determine, in units of memory access, whether or not each element of the output matrix produced by the matrix calculation unit falls within a pre-specified range. In this case, the submap memory stores the determination result of the zero check unit as second state information.

In the above calculation device, the map check unit may execute the subsequent matrix calculations of the first layer based on the state information stored in the submap memory as a result of the first matrix calculation of the first layer of the convolution operation.

The above calculation device may further include a submap memory buffer that stores, in association with one another, information for identifying the output matrix corresponding to a piece of state information, information indicating the storage location of that state information in the submap memory, and information indicating whether or not that state information has been used in the convolution operation of the next layer. In this configuration, a storage location associated with usage information indicating that the state information has been used in the convolution operation of the next layer is selected as the storage location in the submap memory for newly generated state information.

In the above calculation device, kernel state information, obtained by judging whether each element of a kernel used in the above matrix calculation falls within a pre-specified range, may be stored in the submap memory in advance. In this case, the map check unit judges, based on the state information and the kernel state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to that state information as the data to be calculated.

In the above calculation device, the zero check unit may compare each element of the output matrix with multiple thresholds and judge to which of the multiple ranges defined by the multiple thresholds all of the elements of the output matrix belong.

In the above calculation device, the zero check unit may further judge whether or not a negative value exists among the elements of the output matrix, or the number of elements that belong to any one of the multiple ranges.

In the above calculation device, the zero check unit may create the state information and store it in the submap memory at the time of the matrix calculation for the input channel that is calculated last among the input channels belonging to the same layer.

According to the present invention, the time that is ultimately wasted can be reduced, and the time required for the entire calculation can be shortened further compared to conventional methods.

FIG. 1 is a schematic configuration diagram showing an example of a calculation device according to an embodiment of the present invention.
FIG. 2 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 3 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 4 is a schematic configuration diagram showing an example of a zero check unit included in a calculation device according to an embodiment of the present invention.
FIG. 5 is a schematic configuration diagram showing an example of a submap included in a calculation device according to an embodiment of the present invention.
FIG. 6 is a schematic configuration diagram showing an example of a calculation device according to an embodiment of the present invention.
FIG. 7 is a schematic configuration diagram showing an example of a submap address table included in a calculation device according to an embodiment of the present invention.
FIG. 8 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 9 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 10 is an explanatory diagram schematically showing the concept of a convolution calculation method used by a calculation device according to an embodiment of the present invention.
FIG. 11 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 12 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 13 is an explanatory diagram schematically showing the concept of a convolution calculation method used by a calculation device according to an embodiment of the present invention.
FIG. 14 is a schematic configuration diagram showing an example of a calculation device according to an embodiment of the present invention.
FIG. 15 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIGS. 16(a) and 16(b) are explanatory diagrams schematically showing the concept of a convolution calculation method used by a calculation device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in more detail with reference to the drawings. In the following, the calculation device according to the present invention, which, in a series of matrix calculations, uses the output matrix of a previous matrix calculation as the data to be calculated in a subsequent matrix calculation, is embodied as a calculation device that realizes the processing of the convolutional layers of a convolutional neural network (CNN).

As is well known, a convolutional neural network (CNN) includes convolutional layers and pooling layers. Input data such as an image to be recognized is gradually reduced in data volume as a series of processes in the convolutional layers and the pooling layers is repeatedly performed. The convolutional neural network finally outputs classification probability values indicating what kind of object the input image shows.

When the recognition target is an image, data in which the pixel values of the pixels of the image are arranged two-dimensionally for each of the R, G, and B input channels, for example, is input to the convolutional layer as input data. In the convolutional layer, for each input channel of the input image, a kernel (filter) is multiplied with a coordinate region (e.g., a 3x3 region) constituting a part of the input data. Then, for example, the results calculated for the respective input channels are added together to calculate the output value for that coordinate region. This multiplication with the kernel is performed over the entire input image by sequentially moving the coordinate region in each input channel so that successive coordinate regions partially overlap. At this time, multiple sets of kernels are prepared for each input channel according to the number of output channels produced by the convolution operation. For example, when the input data has three channels and the output data has three channels, three kernel sets, each consisting of three kernels to be multiplied with the coordinate regions of the respective input channels, are prepared.

In the pooling layer, output values are calculated by processing, for each coordinate region, the two-dimensional data (output matrix) output as the calculation result of the convolutional layer. For example, in the pooling layer, the average value or the maximum value in a 2x2 coordinate region is output as the output value for that coordinate region. Note that the pooling layer may be omitted. When the pooling layer is omitted, the data output by the convolution calculation of the first layer is used as it is as the input data for the convolution calculation of the second layer.

FIG. 1 is a schematic configuration diagram showing the configuration of a calculation device according to one embodiment of the present invention. As shown in FIG. 1, the calculation device 100 of this embodiment includes a data memory 111, a matrix calculation unit 112, a zero check unit 113, a submap memory 114, a map check unit 115, and a controller 116. The calculation device 100 creates submaps separate from the output matrices (main maps) obtained as the calculation results of a conventional convolution operation, and uses the submaps to shorten the time required for the entire convolution operation. A submap is created for each input channel used as input data for the convolution calculation of the next layer.

The data memory 111 stores the data to be calculated. As described above, when an image is the object of calculation, for example, the pixel values of the pixels constituting the image are stored in the data memory 111 as input data. In this embodiment, the data memory 111 also stores the kernels and biases, described later, that are used for the matrix calculations in the matrix calculation unit 112.

The matrix calculation unit 112 reads data from the data memory 111 and performs matrix calculations. As described above, the matrix calculation unit 112 performs, for each of the R, G, and B input channels, a matrix calculation including the multiplication of the pixel values of a coordinate region consisting of a 3x3 area with a kernel. In this embodiment, the matrix calculation unit 112 performs the matrix calculation Q = A*P + B, where P is the coordinate region data, A is the kernel, B is the bias, and Q is the calculated matrix. The matrix calculation unit 112 stores the output matrix (output channel), which is the result of the matrix calculation over the entire input data, in the data memory 111 as the data to be calculated in the next layer of the convolution operation. Although not particularly limited, in this embodiment the matrix calculation unit 112 outputs, as the output value, the sum of the elements of the calculated matrix Q obtained by performing the above matrix calculation on the coordinate region of each input channel. As described above, the matrix calculation unit 112 performs this matrix calculation over the entire input data and stores the output value for each coordinate region in the data memory 111 together with information indicating the coordinate region. Therefore, when the matrix calculation for the input data is completed, the data memory 111 stores the output matrix that is the result of performing the matrix calculation (convolution operation) on the entire input data.

Although not particularly limited, in this embodiment the matrix calculation unit 112 first performs the matrix calculation for each coordinate region of one of the three input channels. When the matrix calculations for all coordinate regions of this channel are completed, the data memory 111 stores an output matrix resulting from the matrix calculation for one input channel. Next, the matrix calculation unit 112 performs the matrix calculation for each coordinate region of one of the remaining two input channels. At this time, as the bias, a matrix is used in which, in the output matrix for the first input channel stored in the data memory 111, the elements other than the element corresponding to the coordinate region being calculated are set to zero. In this case, when the matrix calculations for all coordinate regions of the second input channel are completed, the data memory 111 stores an output matrix in which the result of the matrix calculation for the first input channel and the result of the matrix calculation for the second input channel are added together. Subsequently, the matrix calculation unit 112 performs the matrix calculation for each coordinate region of the remaining input channel. At this time, as the bias, a matrix is used in which, in the output matrix for the first and second input channels stored in the data memory 111, the elements other than the element corresponding to the coordinate region being calculated are set to zero. In this case, when the matrix calculations for all coordinate regions of the third input channel are completed, the data memory 111 stores an output matrix in which the results of the matrix calculations for the three input channels are added together.
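As a minimal sketch of the calculation Q = A*P + B and of the bias-based accumulation described above (an element-wise product is assumed, and the position at which the carried value is placed inside B is chosen only for illustration):

```python
import numpy as np

def region_output(kernel_a, region_p, previous_value=0.0):
    """One Q = A*P + B calculation for a 3x3 coordinate region; the output value is the
    sum of the elements of Q, so the non-zero element of B carries the earlier partial sum."""
    bias_b = np.zeros_like(region_p, dtype=float)
    bias_b[1, 1] = previous_value            # placement of the carried value is an assumption
    q = kernel_a * region_p + bias_b         # Q = A*P + B
    return float(np.sum(q))

def output_element(kernels, regions):
    """Accumulate one output element over the input channels; the first channel uses a zero bias."""
    value = 0.0
    for kernel_a, region_p in zip(kernels, regions):
        value = region_output(kernel_a, region_p, value)   # written back to the data memory each time
    return value
```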

The zero check unit 113 judges whether or not each element of the output matrix output by the matrix calculation unit 112, that is, the output value of the matrix calculation for each coordinate region, falls within a pre-specified range. Although not particularly limited, in this embodiment the zero check unit 113 judges to which of multiple ranges defined by multiple preset thresholds the value belongs. As described later, in this embodiment the zero check unit 113 is configured to make the above judgment each time an output value is output from the matrix calculation unit 112, based on the state information stored in the submap memory 114 at that time and on the output value from the matrix calculation unit 112.

The submap memory 114 stores the judgment results of the zero check unit 113 as state information. The state information is stored in correspondence with the above-mentioned output matrix, and is one of the pieces of information constituting the above-mentioned submap. For example, when three thresholds are preset in the zero check unit 113, the state information indicates to which of four states the output matrix belongs: "state 1", in which all elements of the output matrix are equal to or less than the first threshold; "state 2", in which all elements of the output matrix are equal to or less than the second threshold; "state 3", in which all elements of the output matrix are equal to or less than the third threshold; and "state 4", which is none of state 1, state 2, and state 3. Although not particularly limited, in this embodiment the numerical values "0", "1", "2", and "3" are stored as the information indicating state 1, state 2, state 3, and state 4, respectively. The state information is updated each time the zero check unit 113 makes a judgment. Note that the submap memory 114 may be configured as a part of the memory device constituting the data memory 111, or may be configured as a separate memory device.

Based on the state information stored in the submap memory 114, the map check unit 115 judges whether or not to cause the matrix calculation unit 112 to read out the output matrix corresponding to that state information as the input data (data to be calculated) of the next layer in the convolution operation. For example, in the case of the above example, the judgment is made as follows. When the state information indicates state 1, the map check unit 115 judges that the corresponding data (output matrix) is not to be read by the matrix calculation unit 112. When the state information indicates state 2 or state 3, the map check unit 115 judges that the corresponding data (output matrix) is not to be read by the matrix calculation unit 112 when the values of the kernel A satisfy preset conditions. The preset conditions are, for example, that for state 2 half or more of the kernel elements are zero, and that for state 3 three quarters or more of the kernel elements are zero. When the state information indicates state 4, the map check unit 115 judges that the corresponding data (output matrix) is to be read by the matrix calculation unit 112.
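A compact sketch of this state classification and of the decision rules of the map check unit is given below (the threshold values are placeholders, and the rule set mirrors the example given in this paragraph; it is not the only possible configuration).

```python
import numpy as np

THRESHOLDS = (0.0, 0.1, 0.5)       # assumed first, second and third thresholds

def matrix_state(output_matrix):
    """Return 0..3 for states 1..4: the smallest threshold that bounds every element."""
    for state, threshold in enumerate(THRESHOLDS):
        if np.all(output_matrix <= threshold):
            return state
    return 3                                            # state 4

def map_check(state, kernel_a):
    """Decide whether the matrix calculation unit should read the corresponding output matrix."""
    zero_ratio = float(np.mean(kernel_a == 0))
    if state == 0:                                      # state 1: never read
        return False
    if state == 1:                                      # state 2: skip when >= 1/2 of the kernel is zero
        return zero_ratio < 0.5
    if state == 2:                                      # state 3: skip when >= 3/4 of the kernel is zero
        return zero_ratio < 0.75
    return True                                         # state 4: always read
```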

The calculation device 100 also includes a controller 116 that controls the operation timing and the like of the data memory 111, the matrix calculation unit 112, the zero check unit 113, the submap memory 114, and the map check unit 115.

In the calculation device 100, the matrix calculation unit 112 can be realized, for example, by a processor such as a GPU (Graphic Processing Unit) specialized for image processing. Each element that performs signal processing or data processing, such as the zero check unit 113, the map check unit 115, and the controller 116, can be realized, for example, by a dedicated arithmetic circuit, or by hardware including a processor and memory such as RAM (Random Access Memory) or ROM (Read Only Memory) together with software that is stored in the memory and runs on the processor.

Next, the operation of the calculation device 100 having the above configuration will be described. As described above, when a convolution operation is performed on input data, a submap including state information is created for each output matrix used as an input channel of the convolution operation of the next layer. That is, the submaps created during the first-layer convolution operation are used during the second-layer convolution operation, and the submaps created during the second-layer convolution operation are used during the third-layer convolution operation. Therefore, in the first-layer convolution operation, no submaps corresponding to the input channels exist. In the following, the operation during the first-layer convolution operation and the operation during the convolution operations of the second and subsequent layers are described separately.

FIG. 2 is a flow diagram showing the procedure performed by the calculation device 100 of this embodiment during the first-layer convolution operation. FIG. 2 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the calculation device 100.

 当該手順が開始されると、行列計算部112が1番目の入力チャネルについて行列計算を開始する。行列計算部112は、まず、データメモリ111から、1番目の入力チャネルに対する行列計算に使用するカーネルを読み出す(ステップS201)。また、このとき、行列計算部112は、データメモリ111から、1番目の入力チャネルに属する入力データから行列計算に使用する座標領域に属するデータ、及び上述のバイアスを読み出す(ステップS202)。上述のとおり、1番目の入力チャネルに対するバイアスはゼロ行列である。 When this procedure is started, the matrix calculation unit 112 starts matrix calculation for the first input channel. The matrix calculation unit 112 first reads out the kernel to be used for the matrix calculation for the first input channel from the data memory 111 (step S201). At this time, the matrix calculation unit 112 also reads out the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the first input channel from the data memory 111, and the bias described above (step S202). As described above, the bias for the first input channel is a zero matrix.

 次いで、行列計算部112は行列計算を実行し、計算結果である出力値を、座標領域を示す情報とともにデータメモリ111に格納する(ステップS203、S204)。このとき、行列計算部112は出力値をゼロチェック部113に入力する。当該入力に応じて、ゼロチェック部113は、予め設定された複数の閾値により規定される複数の範囲のいずれに属するかを判断する。そして、ゼロチェック部113は、判断結果をサブマップメモリ114に格納し、既に格納されていた状態情報を更新する(ステップS205)。 Then, the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203 and S204). At this time, the matrix calculation unit 112 inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to. The zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S205).

 演算装置100は、以上の処理を、1番目の入力チャネルに属する全座標領域について完了するまで繰り返し実施する(ステップS206No)。このとき、データメモリ111には1番目の入力チャネルについて行列計算された出力行列が格納されていることになる。 The calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S206: No). At this time, the data memory 111 stores the output matrix calculated for the first input channel.

 When the calculation for the first input channel is complete, the matrix calculation unit 112 starts the matrix calculation for the second input channel (steps S206: Yes, S207: No). The matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the second input channel (step S201). At this time, the matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region used for the matrix calculation out of the input data belonging to the second input channel, together with the bias (step S202). As described above, the bias for the second input channel is the output matrix currently being calculated, as stored in the data memory 111 at that time, with every element other than the elements corresponding to the coordinate region being calculated set to zero.

 Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203, S204). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 determines which of the plurality of ranges defined by the plurality of preset thresholds the output value belongs to. The zero check unit 113 then stores the determination result in the submap memory 114 and updates the state information already stored there (step S205).

 The arithmetic device 100 repeats the above processing until it has been completed for all coordinate regions belonging to the second input channel (step S206: No). At this point, the data memory 111 holds an output matrix in which the result of the matrix calculation for the first input channel and the result of the matrix calculation for the second input channel have been added together.

 Thereafter, processing similar to that for the second input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S206: Yes, S207: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 100 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S207: Yes, S208: No). As a result, m output matrices are stored in the data memory 111, and m submaps corresponding to the respective output matrices are stored in the submap memory 114. These m output matrices are used as the input channels for the convolution operation of the next layer.
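 For clarity, the following is a minimal software sketch of the FIG. 2 flow, not the claimed hardware. The function names (conv_region, update_state, first_layer), the 3×3 region size, and the example threshold values are illustrative assumptions.

```python
# Minimal software sketch of the FIG. 2 flow (assumptions: 3x3 regions, three example thresholds).
import numpy as np

def conv_region(channel, kernel, cy, cx):
    # Multiply-accumulate the 3x3 region of `channel` centered at (cy, cx) with `kernel` (S202-S203).
    region = channel[cy - 1:cy + 2, cx - 1:cx + 2]
    return float(np.sum(region * kernel))

def update_state(state, value, thresholds):
    # Raise the state to the highest threshold range the output value has reached so far (S205).
    level = sum(value > t for t in thresholds)  # 0..3 for three thresholds
    return max(state, level)

def first_layer(inputs, kernels, thresholds=(0.0, 1.0, 2.0)):
    # inputs: list of n HxW channels; kernels[m][n]: 3x3 kernel for output matrix m, input channel n.
    h, w = inputs[0].shape
    outputs, submap_states = [], []
    for per_output_kernels in kernels:                        # loop until m output matrices exist (S208)
        out = np.zeros((h, w))                                # bias for the 1st input channel: zero matrix
        state = 0
        for channel, k in zip(inputs, per_output_kernels):    # loop over the n input channels (S207)
            for cy in range(1, h - 1):                        # loop over coordinate regions (S206)
                for cx in range(1, w - 1):
                    out[cy, cx] += conv_region(channel, k, cy, cx)  # bias = value stored so far (S204)
                    state = update_state(state, out[cy, cx], thresholds)
        outputs.append(out)
        submap_states.append(state)                           # one submap state per output matrix
    return outputs, submap_states
```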

 FIG. 3 is a flow diagram showing the procedure performed by the arithmetic device 100 of this embodiment during the convolution operations of the second and subsequent layers. FIG. 3 shows an example in which the number of input channels is n and the number of output channels is m. The number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution operation of the immediately preceding layer. This procedure starts, for example, when the convolution operation of the immediately preceding layer is complete.

 When this procedure starts, the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the state information contained in that submap (step S301).

 If the state information indicates the above-described state 1, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (steps S302: Yes, S310: No, S301).

 If the state information indicates the above-described state 2 or state 3, the map check unit 115 causes the matrix calculation unit 112 to read, from the data memory 111, the kernel to be used for the matrix calculation for the first input channel, and checks whether that kernel satisfies the above-described condition (steps S302: No, S303). If the kernel satisfies the condition, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (steps S304: Yes, S310: No, S301).

 If the state information indicates the above-described state 4, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S302: No). In this case, the matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the first input channel (steps S302: No, S303). At this time, the matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region used for the matrix calculation out of the input data belonging to the first input channel, together with the bias described above (steps S304: No, S305). As described above, in the convolution operation of the k-th layer, each of the plurality of output matrices calculated in the convolution operation of the (k-1)-th layer is used as an input channel.

 Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S306, S307). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 determines which of the plurality of ranges defined by the plurality of preset thresholds the output value belongs to. The zero check unit 113 then stores the determination result in the submap memory 114 and updates the state information already stored there (step S308).

 The arithmetic device 100 repeats the above processing until it has been completed for all coordinate regions belonging to the first input channel (step S309: No).

 When the calculation for the first input channel is complete, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (steps S309: Yes, S310: No, S301). Thereafter, processing similar to that for the first input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S309: Yes, S310: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 100 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S310: Yes, S311: No). As a result, m output matrices are stored in the data memory 111, and m submaps corresponding to the respective output matrices are stored in the submap memory 114. These m output matrices are used as the input channels for the convolution operation of the next layer. The arithmetic device 100 then repeats the procedure performed during the convolution operations of the second and subsequent layers until all convolution operations for the specified number of layers are complete.
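 As a rough illustration of the per-channel skip decision made in steps S301 to S305, a sketch follows. The mapping between "state 1" through "state 4" and concrete stored values, as well as the kernel condition itself, are assumptions; this excerpt does not fix them.

```python
# Hedged sketch of the skip decision for one input channel in FIG. 3 (the state encoding and
# kernel_condition() are assumptions; the concrete condition is left to the embodiment).
def kernel_condition(kernel):
    # Hypothetical stand-in: treat an all-(near-)zero kernel as one whose result can be skipped.
    return all(abs(v) < 1e-6 for row in kernel for v in row)

def should_skip_channel(state, kernel):
    if state == 1:            # state 1: skip, without reading the channel data or the kernel
        return True
    if state in (2, 3):       # state 2 or 3: read only the kernel and test the condition
        return kernel_condition(kernel)
    return False              # state 4: always perform the matrix calculation
```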

 As described above, in the arithmetic device 100 of this embodiment, state information is created based on the elements of the output matrix of the convolution operation in the immediately preceding layer, and when that state information satisfies a prespecified condition, the matrix calculation using the input channel corresponding to that state information is skipped. Moreover, in that case, no data belonging to that input channel is read from the data memory 111 into the matrix calculation unit 112. In other words, because no unnecessary data is read, the data read time that would otherwise be wasted is further reduced, and the time required for the entire operation can be shortened further than in the conventional approach.

 Here, an example of the configuration of the zero check unit 113 will be described. FIG. 4 is a schematic configuration diagram showing an example of the zero check unit 113 provided in the arithmetic device 100. As shown in FIG. 4, the zero check unit 113 has an input terminal 31, a comparison terminal 32, and an output terminal 33. The output value from the matrix calculation unit 112 is input to the input terminal 31. The state information and other data stored in the submap memory 114 are input to the comparison terminal 32. The output terminal 33 outputs the data to be stored in the submap memory 114.

 The output value from the matrix calculation unit 112 input through the input terminal 31 is fed to a comparison unit 34 having a plurality of comparators. The comparison unit 34 includes at least as many comparators as the number of preset thresholds. As described above, three thresholds are set in this embodiment, so the comparison unit 34 includes three comparators 34a, 34b, and 34c. The output value from the matrix calculation unit 112 is input to one input terminal of each of the comparators 34a, 34b, and 34c, and the corresponding threshold is input to the other input terminal. Although not particularly limited, here each of the comparators 34a, 34b, and 34c is configured to output the value "1" when the input output value from the matrix calculation unit 112 is greater than its threshold.

 The outputs of the comparators 34a, 34b, and 34c are input to a checker 35. The state information stored in the submap memory 114 at that time is also input to the checker 35 via the comparison terminal 32. As described above, the state information stored in the submap memory 114 is a value from "0" to "3".

 When the outputs of the comparators 34a, 34b, and 34c include the value "1" and the state information stored in the submap memory 114 needs to be updated, the checker 35 outputs, to the output terminal 33, an output corresponding to the updated state information. For example, when the stored state information is "0" and at least the output of the comparator 34a is "1", the checker 35 updates the state information to "1", "2", or "3" according to the outputs of the comparators 34b and 34c. When the stored state information is "1" and at least the output of the comparator 34b is "1", the checker 35 updates the state information to "2" or "3" according to the output of the comparator 34c. When the stored state information is "2", the checker 35 updates the state information to "3" when the output of the comparator 34c is "1". When the stored state information is "3", the checker 35 does not update the state information.

 With the zero check unit 113 configured as described above, when the convolution operation for obtaining one output matrix is complete, the state information corresponding to that output matrix has been stored in the submap memory 114. The zero check unit 113 can also be realized with other configurations. For example, a configuration may be adopted in which the zero check unit 113 holds a cumulative count of the output values belonging to each range. In this case, each time an output value is output from the matrix calculation unit 112, the zero check unit 113 can determine which of the above-described ranges applies based on the cumulative counts it holds.
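 A compact way to express the checker-35 update rule described above is shown below; it is a behavioural sketch only, and the threshold values in the example call are placeholders.

```python
# Behavioural sketch of comparators 34a-34c plus checker 35 (threshold values are placeholders).
def checker_update(stored_state, output_value, t1, t2, t3):
    # Each comparator outputs 1 when the value exceeds its threshold; the checker keeps the
    # state monotone: it only ever moves toward "3" and never decreases.
    level = (output_value > t1) + (output_value > t2) + (output_value > t3)   # 0..3
    return max(stored_state, level)

print(checker_update(1, 5.0, 0.0, 2.0, 4.0))   # -> 3
```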

 Although not particularly limited, in this example the zero check unit 113 is further configured to perform, in addition to the determination described above, a negative determination of whether the output matrix contains any negative elements and a count of the number of elements in the output matrix that exceed one of the above thresholds. Negative determination information indicating the result of the negative determination and count information indicating the counting result are stored in the submap memory 114. The negative determination information and the count information, together with the state information described above, constitute the submap described above.

 Specifically, the comparison unit 34 of the zero check unit 113 includes a comparator 34d whose one input terminal receives the output value from the matrix calculation unit 112 and whose other input terminal receives the value "0". The comparator 34d is configured to output the value "1" when the input output value from the matrix calculation unit 112 is smaller than "0". The output of the comparator 34d is input to an OR circuit 36. The negative determination information stored in the submap memory 114 at that time is also input to the OR circuit 36 via the comparison terminal 32. When either the output of the comparator 34d or the negative determination information stored in the submap memory 114 is "1", the OR circuit 36 outputs the value "1" to the output terminal 33. Thus, when the convolution operation for obtaining one output matrix is complete, the value "1" is stored in the submap memory 114 as the negative determination information if the elements of that output matrix include a negative value, and the value "0" is stored as the negative determination information if they do not.

 The zero check unit 113 also includes a selector 37 to which the outputs of the comparators 34a, 34b, and 34c are input. The selector 37 feeds one preset output among the outputs of the comparators 34a, 34b, and 34c to a counter 38. The count information stored in the submap memory 114 at that time is also input to the counter 38 via the comparison terminal 32. When the value "1" is input, the counter 38 outputs, to the output terminal 33, the value obtained by adding "1" to the stored count information. For example, to count the number of elements of the output matrix that are greater than the third threshold, the selector 37 is set to pass the output value of the comparator 34c. With this configuration, when the convolution operation for obtaining one output matrix is complete, the count of the elements of that output matrix that are greater than the third threshold has been stored in the submap memory 114 as the count information.
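 The negative-determination path (comparator 34d and OR circuit 36) and the counting path (selector 37 and counter 38) can be sketched as follows; the function names are illustrative only.

```python
# Sketch of the negative-determination and counting paths (function names are illustrative).
def update_neg_flag(stored_flag, output_value):
    # Comparator 34d outputs 1 for a negative value; OR circuit 36 keeps the flag sticky at 1.
    return 1 if (stored_flag == 1 or output_value < 0) else 0

def update_count(stored_count, output_value, selected_threshold):
    # Selector 37 routes one comparator output to counter 38, which increments the stored count.
    return stored_count + (1 if output_value > selected_threshold else 0)
```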

 The negative determination information described above can be used, for example, as a flag indicating which processing to apply when the processing differs depending on whether negative values are present. The count information makes it possible, for example, to skip the corresponding output matrix (input channel) without reading it when, even though elements exceeding a preset threshold are present, their total number is small.

 FIG. 5 is a diagram showing an example of a submap. Although not particularly limited, a submap consisting of one byte (8 bits) of data is illustrated here. As described above, in this embodiment the submap 40 contains the state information, the negative determination information, and the count information. In this example, the submap 40 is composed of 2 bits of state information, 1 bit of negative determination information, and 5 bits of count information. It is preferable that the address information of the submap 40 in the submap memory 114 and the address information of the output matrix (the input channel of the next layer) corresponding to that submap 40 stored in the data memory 111 are related to each other. For example, the head address of the submap 40 may be the address obtained by adding a prespecified offset to the head address of the output matrix corresponding to that submap 40.
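 Assuming bit positions that this excerpt of FIG. 5 does not fix, the 1-byte submap and the address relation can be sketched as follows.

```python
# Sketch of the 1-byte submap 40 (bit positions are an assumption: state in the top 2 bits).
def pack_submap(state, neg, count):
    assert 0 <= state < 4 and neg in (0, 1) and 0 <= count < 32
    return (state << 6) | (neg << 5) | count            # 2-bit state | 1-bit neg flag | 5-bit count

def unpack_submap(byte):
    return (byte >> 6) & 0x3, (byte >> 5) & 0x1, byte & 0x1F

def submap_head_address(output_matrix_head, offset):
    # Example of the preferred relation: submap head = output-matrix head + prespecified offset.
    return output_matrix_head + offset
```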

 In the flow diagrams shown in FIGS. 2 and 3, the zero check unit 113 updates the state information in step S205 or step S308 each time the matrix calculation unit 112 outputs a calculation result. However, the state information may be updated at other times. For example, a configuration may be adopted in which the zero check unit 113 creates the state information and stores it in the submap memory 114 only during the matrix calculation for the last input channel among the input channels belonging to the same layer, that is, the matrix calculation in which the values of the elements of the output matrix are finalized.

 In the configuration of the arithmetic device 100 described above, submaps corresponding to all of the output matrices calculated by the matrix calculations are stored in the submap memory 114. Therefore, when the number of convolution layers or the number of output matrices in each layer is large, a large storage area must be provided for the submap memory 114. A configuration that can reduce the storage area required for the submap memory 114 is described here. FIG. 6 is a schematic configuration diagram showing a modified example of the arithmetic device in an embodiment of the present invention. In FIG. 6, components that provide the same functions and effects as those of the arithmetic device 100 are given the same reference numerals as in FIG. 1, and detailed descriptions thereof are omitted below.

 As shown in FIG. 6, the arithmetic device 300 of this embodiment further includes a submap address buffer 120. In addition to the submaps created by the arithmetic device 100 described above, the arithmetic device 300 creates a submap address table. The submap address table is stored in the submap address buffer 120. As shown in FIG. 7, the submap address table 41 is a table in which ID numbers, address information, and usage information are recorded in association with one another. The submap address buffer 120 may be configured as part of the memory device constituting the submap memory 114 or the memory device constituting the data memory 111, or may be configured as a separate memory device.

 The ID number functions as information for identifying a submap. As described above, a submap is created for each input channel used as input data for the convolution operation of the next layer, that is, for each output channel of the layer in which the convolution operation is performed. The number of ID numbers is therefore equal to the number of output channels. Although not particularly limited, in this embodiment a unique number combining a number indicating the convolution layer and a number indicating the output channel is assigned as the ID number. For example, when three output channels are computed in the (k-1)-th layer, the ID numbers "k1", "k2", and "k3" are assigned. More specifically, the ID numbers "31", "32", and "33" are assigned to the three output channels of the second layer. Since these output channels become the input channels in the operation of the third layer, the address information associated with the ID numbers "31", "32", and "33" is referenced when the corresponding submaps are read.

 The address information indicates the storage location of a submap in the submap memory 114; more specifically, it is, for example, the head address of the storage location of the submap. As described with reference to FIG. 5, the data length (number of bits) of a submap is fixed. The storage location of a submap in the submap memory 114 can therefore be identified by a single address.

 The usage information indicates whether the associated submap has already been used in the convolution operation of the next layer. In FIG. 7, the usage information is shown as "valid". As described above, a submap is created for each input channel used as input data for the convolution operation of the next layer. A submap that has been read out and used in the convolution operation of the next layer is therefore not used in any subsequent convolution operation. In this embodiment, the values "0" and "1" are used to indicate that the submap has been used and that it has not yet been used, respectively. In this case, a new ID number, new address information, and new usage information can be stored, in association with one another, in a record of the submap address table whose usage information is "0".

 In this embodiment, the controller 116 generates the ID number, the address information, and the usage information and records them in the submap address table of the submap address buffer 120, but other configurations may also be adopted. For example, a submap address management unit having the function of performing these processes may be provided separately from the controller 116.
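 A plain-data sketch of the submap address table 41, and of how a record whose usage information is "0" is reused, may look as follows; the field names and the table size are assumptions.

```python
# Sketch of the submap address table 41 (field names and table size are assumptions).
def make_id(layer_k, output_channel):
    # e.g. the 1st output channel computed for use by layer k = 2 gets the ID number "21".
    return f"{layer_k}{output_channel}"

def allocate_record(table, id_number):
    # Reuse the first record whose usage information ("valid") is 0; its address is kept as-is.
    for record in table:
        if record["valid"] == 0:
            record["id"] = id_number
            record["valid"] = 1          # the associated submap has not yet been read out
            return record["addr"]
    raise RuntimeError("no free submap record")

table = [{"id": None, "addr": addr, "valid": 0} for addr in (0x00, 0x08, 0x10)]
addr = allocate_record(table, make_id(2, 1))    # first output matrix of the first layer -> "21"
```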

 Next, the operation of the arithmetic device 300 having the above configuration will be described. FIG. 8 is a flow diagram showing the procedure performed by the arithmetic device 300 during the first-layer convolution operation. FIG. 8 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure starts, for example, when the data to be processed is stored in the data memory 111 from outside the arithmetic device 300. As described above, the arithmetic device 300 differs from the arithmetic device 100 only in that it creates the submap address table. Therefore, in the procedure shown in FIG. 8, steps in which the same operations as in the arithmetic device 100 are performed are given the same reference numerals as in FIG. 2, and detailed descriptions thereof are omitted below.

 When this procedure starts, the matrix calculation unit 112 begins the matrix calculation for the first output matrix (output channel). At this time, the controller 116 generates data for the submap address table (step S220). That is, the controller 116 generates the ID number, the address information, and the usage information described above and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0".

 According to the ID-number generation rule described above, the ID number generated at this point is "21". As the address information, an address in the submap memory 114 is selected as appropriate. For example, a configuration may be adopted in which all selectable addresses in the submap memory 114 are recorded in the submap address table 41, and the controller 116 selects the address information recorded in the record into which it is about to write as the address information to be associated with the generated ID number. Since the associated submap has not yet been read out, the usage information recorded is "1".

 When the generation of the data for the submap address table is complete, the matrix calculation unit 112 starts the matrix calculation for the first input channel. The matrix calculation procedure for the first input channel is substantially the same as the procedure described with reference to FIG. 2. That is, the kernel read (step S201), the data read (step S202), the matrix calculation (step S203), and the storage of the calculation result (step S204) are as described above. In storing the determination result by the zero check unit 113 (step S205), the zero check unit 113 stores the determination result at the storage location of the submap memory 114 designated by the address information generated by the controller 116.

 The arithmetic device 300 repeats the above processing until it has been completed for all coordinate regions belonging to the first input channel (step S206: No). At this point, the data memory 111 holds the output matrix obtained by the matrix calculation for the first input channel.

 When the calculation for the first input channel is complete, the matrix calculation unit 112 starts the matrix calculation for the second input channel (steps S206: Yes, S207: No). The matrix calculation procedure for the second input channel is also substantially the same as the procedure described with reference to FIG. 2. That is, the kernel read (step S201), the data read (step S202), the matrix calculation (step S203), and the storage of the calculation result (step S204) are as described above. In storing the determination result by the zero check unit 113 (step S205), the zero check unit 113 stores the determination result at the storage location of the submap memory 114 designated by the address information generated by the controller 116.

 The arithmetic device 300 repeats the above processing until it has been completed for all coordinate regions belonging to the second input channel (step S206: No). Thereafter, processing similar to that for the second input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S206: Yes, S207: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 300 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S207: Yes, S208: No). When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates the data for the submap address table corresponding to that output matrix (step S220). That is, the controller 116 generates an ID number, address information, and usage information and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0". According to the ID-number generation rule described above, the ID number generated at this point is "22". Since the associated submap has not yet been read out, the usage information recorded is "1".

 When all of the calculations for obtaining the predetermined number m of output matrices are complete, m output matrices are stored in the data memory 111, m submaps corresponding to the respective output matrices are stored in the submap memory 114, and the address information and usage information corresponding to each of the m output matrices have been recorded in the submap address buffer 120. These m output matrices are used as the input channels for the convolution operation of the next layer.
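 The FIG. 8 flow thus differs from FIG. 2 only in step S220 and in where the state information is written; a standalone sketch of those two points, under the same naming assumptions as above, is shown below.

```python
# Standalone sketch of step S220 and of step S205 writing to the recorded address (names assumed).
def start_output_matrix(table, submap_mem, id_number):
    record = next(r for r in table if r["valid"] == 0)   # take a record already marked as used
    record["id"], record["valid"] = id_number, 1
    submap_mem[record["addr"]] = 0                       # fresh state information for this output matrix
    return record["addr"]

def store_state(submap_mem, addr, level):
    # The zero check unit writes its result at the address chosen by the controller in S220.
    submap_mem[addr] = max(submap_mem[addr], level)
```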

 The above description illustrates a configuration in which the data for the submap address table is generated at the start of the calculation of an output matrix. However, the table data may be generated at any other time, as long as it is generated before the zero check unit 113 first stores a determination result (state information) in the submap memory 114 during the matrix calculation for that output channel.

 FIG. 9 is a flow diagram showing the procedure performed by the arithmetic device 300 of this embodiment during the convolution operations of the second and subsequent layers. FIG. 9 shows an example in which the number of input channels is n and the number of output channels is m. The number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution operation of the immediately preceding layer. This procedure starts, for example, when the convolution operation of the immediately preceding layer is complete. As described above, the arithmetic device 300 differs from the arithmetic device 100 only in that it creates the submap address table. Therefore, in the procedure shown in FIG. 9, steps in which the same operations as in the arithmetic device 100 are performed are given the same reference numerals as in FIG. 3, and detailed descriptions thereof are omitted below.

 When this procedure starts, the matrix calculation unit 112 begins the matrix calculation for the first output matrix (output channel). At this time, the controller 116 generates data for the submap address table (step S320). That is, the controller 116 generates the ID number, the address information, and the usage information described above and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0". According to the ID-number generation rule described above, the ID number generated at this point is "31". Since the associated submap has not yet been read out, the usage information recorded is "1".

 When the generation of the data for the submap address table is complete, the matrix calculation unit 112 starts the matrix calculation for the first input channel. At this time, the map check unit 115 reads, from the submap address buffer 120, the address information of the submap corresponding to the first input channel (step S321). Based on that address information, the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the state information contained in that submap (step S301). For example, for the operation on the first input channel of the second layer, the map check unit 115 reads the address information associated with the ID number "21". The map check unit 115 also notifies the controller 116 of the ID number whose address information it has read from the submap address buffer 120. Upon receiving the notification, the controller 116 rewrites the usage information associated with that ID number in the submap address buffer 120 from "1" to "0". The usage information may be rewritten at any time, not necessarily at this point. From the viewpoint of making effective use of the storage area of the submap memory 114, however, it is preferably rewritten after the address information is read and before a new submap is stored in the submap memory 114.
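 Step S321 and the subsequent flip of the usage information can be sketched as follows; again, the dictionary-based table is only an illustration of the behaviour described above.

```python
# Sketch of step S321: look up the submap address by ID number, then mark the record as used
# so that its slot (and the submap storage it points to) can be reused by a later step S320.
def read_submap_state(table, submap_mem, id_number):
    record = next(r for r in table if r["id"] == id_number and r["valid"] == 1)
    state = submap_mem[record["addr"]]
    record["valid"] = 0        # freed: a new ID/address pair may overwrite this record later
    return state
```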

 The subsequent procedure for the first input channel is substantially the same as the procedure described with reference to FIG. 3, except that the zero check unit 113 stores the determination result at the storage location of the submap memory 114 designated by the address information generated by the controller 116 (step S308). The arithmetic device 300 repeats the above processing until it has been completed for all coordinate regions belonging to the first input channel (step S309: No).

 When the calculation for the first input channel is complete, the map check unit 115 reads, from the submap address buffer 120, the address information of the submap corresponding to the second input channel (steps S309: Yes, S310: No, S321). Based on that address information, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (step S301). Thereafter, processing similar to that for the first input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S309: Yes, S310: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 300 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S310: Yes, S311: No). When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates the data for the submap address table corresponding to that output matrix (step S320). That is, the controller 116 generates an ID number, address information, and usage information and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0". According to the ID-number generation rule described above, the ID number generated at this point is "32". Since the associated submap has not yet been read out, the usage information recorded is "1".

 When all of the calculations for obtaining the predetermined number m of output matrices are complete, m output matrices are stored in the data memory 111, m submaps corresponding to the respective output matrices are stored in the submap memory 114, and the address information and usage information corresponding to each of the m output matrices have been recorded in the submap address buffer 120. These m output matrices are used as the input channels for the convolution operation of the next layer. The arithmetic device 300 then repeats the procedure performed during the convolution operations of the second and subsequent layers until all convolution operations for the specified number of layers are complete.

 As described above, in the arithmetic device 300 of this embodiment, using the submap address table makes it possible to overwrite a new submap onto an area of the submap memory 114 that stores a submap that will not be used again. As a result, the submap memory 114 can be realized using a memory of limited size, such as a ring buffer.

 The arithmetic devices 100 and 300 described above generate a submap only for the data of the input channels used in the convolution operation and decide, based on the state information contained in the submap, whether to read the data of the corresponding input channel. However, such state information can also be applied to the kernels used in the matrix calculation of the input data. That is, a configuration may also be adopted in which state information is created for each kernel based on the elements of that kernel, and when that state information satisfies a prespecified condition, the matrix calculation using the kernel corresponding to that state information is skipped without reading the data of the input channel. In this case, since a kernel is a matrix whose element values are specified in advance, the state information of the kernel can be obtained in advance. For example, a configuration may be adopted in which the state information of the kernels is stored in the submap memory 114, and in the determination by the map check unit 115 in step S302 of FIG. 3 or FIG. 9, whether to read the data of an input channel is decided in consideration of the state information of the kernel in addition to the state information of the input channel.
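 The variant in which a precomputed kernel state is consulted together with the input-channel state before any channel data is read could be expressed as below; the encodings of both states are assumptions.

```python
# Sketch of the combined check described above (the state encodings are assumptions).
def skip_without_reading(channel_state, kernel_state):
    skip_states = {1}   # assumed encoding: "all results stayed below the first threshold"
    # Skip the matrix calculation, and the channel-data read, when either state allows it.
    return channel_state in skip_states or kernel_state in skip_states
```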

 In the arithmetic device 100 described above, the matrix calculation unit 112 performs the matrix calculations for the whole of one input channel and, after completing them, performs the matrix calculations for the whole of the next input channel. However, the matrix calculations do not have to be performed consecutively for the whole of each input channel. A configuration in which the matrix calculations are performed element by element of the output matrix is described here. In such a configuration, the matrix calculations are performed for each coordinate region constituting part of an input channel. FIG. 10 is a diagram for explaining the concept of this technique. FIG. 10 shows an example in which three output matrices (output channels) are obtained from three input channels. This technique merely changes the order in which the matrix calculation unit 112 and the other units read the data, and the configuration of the arithmetic device is the same as the configuration shown in FIG. 1.

 As shown in FIG. 10, the calculation of the element located at a particular coordinate in each of the three output matrices uses the data in the same coordinate region of the three input channels. For example, in FIG. 10, the matrix calculation that computes the element 61a located at coordinates (2,3) of the first output matrix 61 uses the data belonging to the 3×3 coordinate region 51a centered at coordinates (2,3) of the first input channel 51, the data belonging to the 3×3 coordinate region 52a centered at coordinates (2,3) of the second input channel 52, and the data belonging to the 3×3 coordinate region 53a centered at coordinates (2,3) of the third input channel 53. Similarly, the matrix calculations that compute the element 62a located at coordinates (2,3) of the second output matrix 62 and the element 63a located at coordinates (2,3) of the third output matrix 63 also use the data belonging to the coordinate region 51a of the first input channel 51, the data belonging to the coordinate region 52a of the second input channel 52, and the data belonging to the coordinate region 53a of the third input channel 53; only the kernels differ.

 Therefore, the same output matrices as those calculated in the above-described embodiment can also be calculated by reading, in order, the data of the coordinate region 51a of the first input channel 51, the data of the coordinate region 52a of the second input channel 52, and the data of the coordinate region 53a of the third input channel 53, then reading, in order, the data of the coordinate regions at the next position in each of the input channels 51, 52, and 53, and performing the matrix calculations.
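 The equivalence of the two traversal orders can be illustrated with a short sketch; it follows the region-by-region order of FIG. 10 (detailed in FIG. 11 below), and the 3×3 region size is again an assumption.

```python
# Sketch of the coordinate-region-major traversal of FIG. 10/FIG. 11 (3x3 regions assumed).
import numpy as np

def conv_region_major(inputs, kernels):
    # Outer loops: output matrix, then coordinates; inner loop: input channels 51, 52, 53, ...
    h, w = inputs[0].shape
    outputs = [np.zeros((h, w)) for _ in kernels]
    for m, per_output_kernels in enumerate(kernels):
        for cy in range(1, h - 1):
            for cx in range(1, w - 1):
                acc = 0.0                                  # bias starts from the zero element
                for channel, k in zip(inputs, per_output_kernels):
                    region = channel[cy - 1:cy + 2, cx - 1:cx + 2]
                    acc += float(np.sum(region * k))       # same accumulation as the FIG. 2 order
                outputs[m][cy, cx] = acc
    return outputs
```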

 FIG. 11 is a flow diagram showing the procedure by which the arithmetic device 100 performs, in the first-layer convolution operation, the technique of acquiring the data of the same coordinate region for all input channels. FIG. 11 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure starts, for example, when the data to be processed is stored in the data memory 111 from outside the arithmetic device 100.

 When this procedure starts, the coordinates of the element of the output matrix to be calculated are first determined (step S701). Based on the determined coordinates, the coordinate regions of the input channels required for the matrix calculation are then identified (step S702). In this embodiment, the controller 116 determines the coordinates of the element in the output matrix and identifies the coordinate regions in the input channels.

 The matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the first input channel with respect to the first output matrix (step S703). The matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region of the first input channel and the bias described above (step S704). As described above, the bias for the first input channel is a zero matrix.

 The matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S705, S706). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 performs the state determination described above and stores the determination result in the submap memory 114 (step S707).

 Next, the matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the second input channel with respect to the first output matrix (steps S708: No, S703). The matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region of the second input channel and the bias described above (step S704). In this case, the bias is the output matrix currently being calculated, as stored in the data memory 111 at that time, with every element other than the element being calculated set to zero.

 The matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S705, S706). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 performs the state determination described above and stores the determination result in the submap memory 114 (step S707).

 The arithmetic device 100 repeats the above processing for the first output matrix until the processing for the n-th input channel (n = 3 in this example) is complete (step S708: No). When the processing for the n-th input channel is complete, the data memory 111 holds the element of the output matrix that is the result of performing the matrix calculation on all input channels.

 When the processing for one element of the output matrix is complete, the controller 116 determines the coordinates of the next element of the output matrix to be calculated and identifies the coordinate regions of the input channels corresponding to the determined coordinates (steps S708: Yes, S709: No, S701, S702). The arithmetic device 100 then repeats the above processing until it has been completed for all elements of the output matrix (step S709: No).

 また、演算装置100は、以上の処理を、m番目(本例では、m=3)の出力行列に対する処理が完了するまで繰り返し実施する(ステップS709Yes、S710No)。 The calculation device 100 also repeats the above process until processing for the mth (in this example, m=3) output matrix is completed (steps S709: Yes, S710: No).
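As a rough illustration only, the flow of FIG. 11 can be pictured as three nested loops: over output matrices, over output-matrix elements, and over input channels. The following Python sketch follows that structure under several assumptions that are not part of the configuration described above: a fixed 3x3 coordinate region, accumulation in a local variable instead of the stored bias, and a simplified zero check that records only whether the whole output matrix is zero rather than the multi-state judgment described earlier.

```python
import numpy as np

def first_layer_convolution(inputs, kernels, out_shape):
    """Sketch of the FIG. 11 flow (first layer, no submaps consulted yet).

    inputs    : list of n input-channel matrices (H x W arrays)
    kernels   : kernels[m][c] is the 3x3 kernel for output matrix m, channel c
    out_shape : assumed to be (H - 2, W - 2) so every 3x3 window is valid
    """
    data_memory = {}     # stands in for the data memory 111
    submap_memory = {}   # stands in for the submap memory 114
    for m in range(len(kernels)):                              # loop over m output matrices
        out = np.zeros(out_shape)
        for (y, x) in np.ndindex(*out_shape):                  # S701/S702: element + region
            acc = 0.0
            for c, channel in enumerate(inputs):               # S703-S708: n input channels
                region = channel[y:y + 3, x:x + 3]             # S704: read coordinate region
                acc += float(np.sum(region * kernels[m][c]))   # S705: matrix calculation
            out[y, x] = acc                                    # S706: store with coordinates
        data_memory[m] = out
        submap_memory[m] = bool(np.all(out == 0))              # S707: simplified zero check
    return data_memory, submap_memory
```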

FIG. 12 is a flowchart showing the procedure by which the calculation device 100 carries out the method of acquiring data of the same coordinate region for all input channels in the convolution operations of the second and subsequent layers. FIG. 12 shows a case in which the number of input channels is n and the number of output channels is m. Although not particularly limited, the procedure is started, for example, when the convolution operation of the immediately preceding layer has been completed.

When the procedure starts, the controller 116 first determines the coordinates of the output-matrix element to be calculated and, based on the determined coordinates, identifies the coordinate region of the input channels required for the matrix calculation (steps S801 and S802).

The map check unit 115 reads from the submap memory 114 the submap corresponding to the first input channel for the first output matrix and checks the state information contained in that submap (step S803). If the check indicates that the matrix calculation is not to be performed, the map check unit 115 reads from the submap memory 114 the submap corresponding to the second input channel for the first output matrix and checks the state information contained in that submap (steps S804: Yes, S811: No, S803).

If the check indicates that the matrix calculation is to proceed, the map check unit 115 causes the matrix calculation unit 112 to read from the data memory 111 the kernel to be used in the matrix calculation for the first input channel, and checks whether that kernel satisfies the condition described above (steps S804: No, S805). If this check indicates that the matrix calculation is not to be performed, the map check unit 115 reads from the submap memory 114 the submap corresponding to the second input channel for the first output matrix and checks the state information contained in that submap (steps S806: Yes, S811: No, S803).

If the kernel check indicates that the matrix calculation is to proceed, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S806: No). In this case, the matrix calculation unit 112 reads, from the input data belonging to the first input channel stored in the data memory 111, the data belonging to the coordinate region used in the matrix calculation, together with the bias described above (step S807). As described above, in the convolution operation of the k-th layer, each of the output matrices calculated in the convolution operation of the (k-1)-th layer is used as an input channel.

The matrix calculation unit 112 then executes the matrix calculation and stores the output value, i.e., the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S808 and S809). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response, the zero check unit 113 performs the state judgment described above and stores the judgment result in the submap memory 114 (step S810).

The calculation device 100 repeats the above processing for the first output matrix until the processing for the n-th input channel (n = 3 in this example) is completed (step S811: No). When the processing for the n-th input channel is completed, the data memory 111 holds the elements of the output matrix obtained by performing the matrix calculations over all input channels.

When the processing for one element of the output matrix is completed, the controller 116 determines the coordinates of the next output-matrix element to be calculated and identifies the coordinate region of the input channels corresponding to the determined coordinates (steps S811: Yes, S812: No, S801, S802). The calculation device 100 then repeats the above processing until it has been completed for all elements of the output matrix (step S812: No).

The calculation device 100 further repeats the above processing until the processing for the m-th output matrix (m = 3 in this example) is completed (steps S812: Yes, S813: No).

As explained above, the effects described earlier are also obtained with this method. In the calculation device 100, state information is created from the elements of the output matrix of the convolution operation in the immediately preceding layer, and when that state information satisfies a pre-specified condition, the matrix calculation using the corresponding input channel is skipped. Moreover, the data belonging to that input channel is then never read out of the data memory 111 into the matrix calculation unit 112. Because no unnecessary data reads occur, the read time that would otherwise be wasted is reduced further, and the time required for the overall operation can be shortened even more than in conventional devices.
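Purely as an illustration of the skip behaviour just described, the sketch below mirrors the FIG. 12 flow in Python. The boolean channel_state encoding, the kernel_skippable stand-in for the kernel condition of step S805, and the all-zero test in place of the multi-state judgment are all assumptions made for readability; the point of the sketch is that the data of a skipped channel is never touched inside the loop.

```python
import numpy as np

def convolve_layer_with_skip(prev_outputs, kernels, out_shape,
                             channel_state, kernel_skippable):
    """Sketch of the FIG. 12 flow for the second and later layers.

    prev_outputs    : output matrices of layer k-1, used here as input channels
    channel_state   : channel_state[c] is True when the submap says channel c
                      can be skipped (a simplification of the actual states)
    kernel_skippable: callable standing in for the kernel condition of step S805
    """
    new_outputs, new_states = [], []
    for m in range(len(kernels)):                          # one pass per output matrix
        out = np.zeros(out_shape)
        for (y, x) in np.ndindex(*out_shape):              # S801/S802
            for c, channel in enumerate(prev_outputs):
                if channel_state[c]:                       # S803/S804: skip on state info,
                    continue                               # channel data is never read
                kern = kernels[m][c]                       # S805: kernel read and checked
                if kernel_skippable(kern):                 # S806: skip on kernel condition
                    continue
                region = channel[y:y + 3, x:x + 3]         # S807: only now is data read
                out[y, x] += float(np.sum(region * kern))  # S808/S809
        new_outputs.append(out)
        new_states.append(bool(np.all(out == 0)))          # S810: states for the next layer
    return new_outputs, new_states
```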

In the flowcharts of FIGS. 11 and 12, the zero check unit 113 updates the state information in steps S707 and S810 every time the matrix calculation unit 112 outputs a calculation result. The state information may, however, be updated at other times. For example, a configuration may be adopted in which the zero check unit 113 creates the state information and stores it in the submap memory 114 only for the matrix calculation of the last input channel among the input channels belonging to the same layer, that is, the matrix calculation in which the values of the output-matrix elements become final.

If, in step S804 or S806 of the flowchart of FIG. 12, the reading of the input data of all input channels for a given element of the output matrix is skipped, no zero check is performed for that element, and the state information corresponding to that output matrix remains at its initial value. It is therefore preferable to set the initial value of the state information to, for example, "0", which indicates state 1. With this configuration, when the reading of the input data of all input channels has been skipped, the output matrix is also skipped in the next layer of the convolution operation without being read as input data. In other words, even when an element of the output matrix is never calculated, no zero-clear operation (an access to the data memory 111) is needed to set that element to zero.
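A minimal sketch of the initialization described here, with the integer encoding assumed for illustration:

```python
def init_submap_memory(num_output_matrices, state_skip=0):
    """Sketch only: every state entry starts at the value meaning "state 1 / skip",
    so an output matrix whose calculation was skipped for all input channels is
    also skipped by the next layer without any zero-clear access to the data
    memory. The integer encoding is an assumption."""
    return {m: state_skip for m in range(num_output_matrices)}
```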

In the description so far, the matrix calculation unit 112 reads one piece of coordinate-region data for each input channel. It is, however, also possible to adopt a configuration in which the matrix calculation unit 112 reads several consecutive pieces of coordinate-region data for each input channel and writes the results of the corresponding matrix calculations to the data memory 111, which makes it possible to parallelize the matrix calculations. As can be seen from FIG. 10, reading consecutive coordinate-region data in each input channel and performing the matrix calculations is equivalent to performing the matrix calculations for consecutive elements of the output matrix.

When the matrix calculation unit 112 parallelizes several matrix calculations, the output values of those calculations are produced at the same time, which complicates the storage of the output-matrix state information in the submap memory 114 by the zero check unit 113. Therefore, when consecutive coordinate-region data are read from each input channel for the matrix calculations, the zero check unit 113 stores state information based on any one of the calculated output values in the submap memory 114 as the state information for all elements corresponding to those output values.

FIG. 13 is a diagram for explaining the concept of this method; only one output matrix is shown. For example, when the matrix calculations for element 71a at coordinates (2, 3), element 71b at coordinates (3, 3), and element 71c at coordinates (4, 3) of the output matrix 71 in FIG. 13 are parallelized, the zero check unit 113 registers the judgment result for one of those elements (for example, element 71c) in the submap memory 114 as the judgment result for all of the elements 71a, 71b, and 71c. This makes it easy to parallelize the matrix calculations. Although the example here treats three consecutive data as one unit, the method can equally be applied, as long as the data are consecutive, in units of an output matrix, of a row of the output matrix, or of the input range of an input channel.
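The grouped judgment can be sketched as follows; the horizontal grouping of three elements, the 3x3 window, and the use of the last value as the representative are assumptions chosen to match the FIG. 13 example, not a definitive implementation.

```python
import numpy as np

def grouped_matrix_calculation(channel, kernel, start_yx, group=3):
    """Sketch of the parallelized variant of FIG. 13: `group` consecutive output
    elements are computed together and a single judgment result is registered
    for all of them."""
    y, x = start_yx
    values = [float(np.sum(channel[y:y + 3, x + i:x + i + 3] * kernel))
              for i in range(group)]
    representative_state = bool(values[-1] == 0)   # one result stands for the group
    return values, representative_state
```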

In the configuration described above, in which data of the same coordinate region is acquired for all input channels, the map check unit 115 checks the state information contained in the submaps and selects whether the matrix calculation unit 112 is made to read the data of each input channel. From the viewpoint of further reducing the time required for the overall operation, it is preferable that the number of these checks be as small as possible.

FIG. 14 is a schematic configuration diagram showing a calculation device that can reduce the number of such checks. As shown in FIG. 14, the calculation device 200 includes, in addition to the configuration of the calculation device 100 described above, a table creation unit 117 and a read control unit 118. In FIG. 14, components that provide the same functions and effects as in the calculation device 100 are given the same reference numerals as in FIG. 1, and detailed description of them is omitted below.

The table creation unit 117 creates, based on the judgment results of the map check unit 115, a table that specifies the output matrices to be read by the matrix calculation unit 112 as input data. That is, when a convolution operation is started, the table creation unit 117 first causes the map check unit 115 to read the state information corresponding to all of the input channels used in that convolution operation (all of the output matrices calculated in the convolution operation of the preceding layer) and to judge, by the method described above, whether each of them should be read by the matrix calculation unit 112 as input data. Based on those judgment results, the table creation unit 117 then creates a table specifying the input channels to be read by the matrix calculation unit 112 as input data. For example, when there are three input channels and the map check unit 115 judges that the first and third input channels should be read by the matrix calculation unit 112 as input data, the table creation unit 117 creates a table indicating this. Although not particularly limited, in this embodiment the table creation unit 117 holds the created table itself.

The read control unit 118 causes the matrix calculation unit 112 to read the data to be calculated in accordance with the table created by the table creation unit 117. When, as above, a table has been created indicating that the first and third input channels are to be read by the matrix calculation unit 112 as input data, the read control unit 118 causes the matrix calculation unit 112 to execute the matrix calculations of the convolution operation using only the first and third input channels.
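As a hedged sketch of how the table creation unit 117 and the read control unit 118 might divide the work (the state encoding and the helper names are assumptions):

```python
def build_channel_table(channel_states, state_skip=0):
    """Sketch of the table creation unit 117: all states are examined once, before
    the layer starts, and only the channels that need to be read are listed."""
    return [c for c, s in enumerate(channel_states) if s != state_skip]

def read_with_table(table, prev_outputs, kernels_for_m, accumulate):
    """Sketch of the read control unit 118: the matrix calculation unit only ever
    sees the channels listed in the table, so no per-calculation check remains."""
    for c in table:
        accumulate(prev_outputs[c], kernels_for_m[c])
```

With three input channels whose assumed states are [1, 0, 2], build_channel_table returns [0, 2], matching the example above in which only the first and third channels are read.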

In the calculation device 200, the table creation unit 117 and the read control unit 118 can be realized, for example, by hardware comprising a processor and a memory such as a RAM or ROM, together with software stored in that memory and running on the processor.

Next, the operation of the calculation device 200 having the above configuration will be described. In the calculation device 200 as well, the operation differs between the convolution operation of the first layer, for which no submaps exist, and the convolution operations of the second and subsequent layers, for which submaps exist. The operation of the first-layer convolution, however, is identical to the operation shown in FIG. 11 and is therefore not described again here.

FIG. 15 is a flowchart showing the procedure by which the calculation device 200 carries out the method of acquiring data of the same coordinate region for all input channels in the convolution operations of the second and subsequent layers. FIG. 15 shows a case in which the number of input channels is n and the number of output channels is m. Although not particularly limited, the procedure is started, for example, when the convolution operation of the immediately preceding layer has been completed.

When the procedure starts, the table creation unit 117 first causes the map check unit 115 to read the state information corresponding to all of the input channels and to judge, by the method described above, whether each of them should be read by the matrix calculation unit 112 as input data. The table creation unit 117 then creates the table described above based on those judgment results (step S1101).

When the table has been created, the controller 116 determines the coordinates of the output-matrix element to be calculated and, based on the determined coordinates, identifies the coordinate region of the input channels required for the matrix calculation (steps S1102 and S1103).

Based on the table created by the table creation unit 117, the read control unit 118 instructs the matrix calculation unit 112 to execute the matrix calculation for the input channel listed first in the table. In accordance with this instruction, the matrix calculation unit 112 reads from the data memory 111 the kernel to be used, for the first output matrix, in the matrix calculation for the input channel listed first in the table (step S1104). At this time, the matrix calculation unit 112 also reads from the data memory 111, out of the input data belonging to the input channel listed first in the table, the data belonging to the coordinate region used in the matrix calculation, together with the bias described above (step S1105). As described above, in the convolution operation of the k-th layer, each of the output matrices calculated in the convolution operation of the (k-1)-th layer is used as an input channel.

The matrix calculation unit 112 then executes the matrix calculation and stores the output value, i.e., the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S1106 and S1107). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response, the zero check unit 113 performs the state judgment described above and stores the judgment result in the submap memory 114 (step S1108).

The calculation device 200 repeats the above processing for the first output matrix until the processing for all of the input channels listed in the table has been completed (step S1109: No). When the processing for all of the input channels in the table is completed, the data memory 111 holds the elements of the output matrix obtained by performing the matrix calculations over all input channels.

When the processing for one element of the output matrix is completed, the controller 116 determines the coordinates of the next output-matrix element to be calculated and identifies the coordinate region of the input channels corresponding to the determined coordinates (steps S1109: Yes, S1110: No, S1102, S1103). The calculation device 200 then repeats the above processing until it has been completed for all elements of the output matrix (step S1110: No).

The calculation device 200 further repeats the above processing until the processing for the m-th output matrix (m = 3 in this example) is completed (steps S1110: Yes, S1111: No).

As explained above, in the calculation device 200 as well, state information is created from the elements of the output matrix of the convolution operation in the immediately preceding layer, and when that state information satisfies a pre-specified condition, the matrix calculation using the corresponding input channel is skipped. Moreover, the data belonging to that input channel is then never read out of the data memory 111 into the matrix calculation unit 112. Because no unnecessary data reads occur, the read time that would otherwise be wasted is reduced further, and the time required for the overall operation can be shortened even more than in conventional devices. In addition, because the input channels to be used in the operation are listed in the table, the calculation device 200 does not need to decide, for every matrix calculation, whether to read the data belonging to an input channel. As a result, the time required for the overall operation can be shortened still further.
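A compact sketch of the FIG. 15 flow, again under assumed encodings and a fixed 3x3 window, shows how the single table lookup in step S1101 removes every per-calculation state check from the inner loops:

```python
import numpy as np

def layer_with_table(prev_outputs, kernels, out_shape, channel_states, state_skip=0):
    """Sketch of the FIG. 15 flow: the submap states are consulted once (S1101) and
    the inner loops then follow the table without re-checking them."""
    table = [c for c, s in enumerate(channel_states) if s != state_skip]    # S1101
    outputs = []
    for m in range(len(kernels)):
        out = np.zeros(out_shape)
        for (y, x) in np.ndindex(*out_shape):                 # S1102/S1103
            for c in table:                                   # only tabled channels are read
                region = prev_outputs[c][y:y + 3, x:x + 3]    # S1104/S1105
                out[y, x] += float(np.sum(region * kernels[m][c]))   # S1106/S1107
        outputs.append(out)
    return outputs
```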

In the flowchart of FIG. 15, the zero check unit 113 updates the state information in step S1108 every time the matrix calculation unit 112 outputs a calculation result. The state information may, however, be updated at other times. For example, a configuration may be adopted in which the zero check unit 113 creates the state information and stores it in the submap memory 114 only for the matrix calculation of the last input channel among the input channels belonging to the same layer, that is, the matrix calculation in which the values of the output-matrix elements become final.

The description so far has dealt with a configuration in which a submap containing state information is created for each output matrix. A submap can, however, also be created for each physical memory-access unit, where a physical memory-access unit means the amount of data that can be acquired by a single memory access.

FIGS. 16(a) and 16(b) are diagrams for explaining the concept of this method. FIG. 16(a) corresponds to the case in which the amount of data of one input channel is smaller than the memory-access unit, and FIG. 16(b) to the case in which it is larger.

As shown in FIG. 16(a), when the amount of data of one input channel is smaller than the memory-access unit, a single memory-access unit contains the data of several input channels. In the example of FIG. 16(a), the memory-access unit 81 contains the data of three input channels Ch0, Ch1, and Ch2, and the memory-access unit 82 contains the data of two input channels Ch2 and Ch3. In this example, creating submaps per input channel yields four submaps 83a, 83b, 83c, and 83d, whereas creating submaps per memory-access unit yields two submaps 84a and 84b.

In this case, if the state information of the submap 84a of the memory-access unit 81 is, for example, state 1 described above, the matrix calculations for the three input channels Ch0, Ch1, and Ch2 can be skipped merely by checking that single piece of state information. If the state information of the submap 84a of the memory-access unit 81 is, for example, state 4 described above, the matrix calculation for a particular input channel can still be skipped by additionally checking the state information of the submaps 83a, 83b, and 83c of the three input channels Ch0, Ch1, and Ch2. In other words, compared with the configuration described above, the number of submap checks performed by the map check unit 115 can potentially be reduced.

Such a method can be realized, in the configuration shown in FIG. 1, by configuring the zero check unit 113 to further judge, per memory-access unit, whether the elements of the output matrix produced by the matrix calculation unit 112 fall within the pre-specified range. In this case, the zero check unit 113 can store this judgment result in the submap memory 114 as second state information.

On the other hand, as shown in FIG. 16(b), when the amount of data of one input channel is larger than the memory-access unit, the data of one input channel is made up of several memory-access units. In the example of FIG. 16(b), the data of one input channel 95 is made up of four memory-access units 91, 92, 93, and 94. In this example, creating a submap per input channel yields one submap 96a, whereas creating submaps per memory-access unit yields four submaps 97a, 97b, 97c, and 97d.

In this case, if the state information of the per-channel submap 96a is, for example, state 4 described above, it may be possible to skip the matrix calculation for a part of the input channel by additionally checking the state information of the four per-memory-access-unit submaps 97a, 97b, 97c, and 97d.

This method can likewise be realized, in the configuration shown in FIG. 1, by configuring the zero check unit 113 to further judge, per memory-access unit, whether the elements of the output matrix produced by the matrix calculation unit fall within the pre-specified range.
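The two-level check can be sketched as a single decision function; the integer state encoding and the idea that the coarse check is consulted first are assumptions made for illustration:

```python
def channel_needs_read(access_unit_state, channel_state, state_skip=0):
    """Sketch of the hierarchical check of FIG. 16: the coarse second state
    information (per memory-access unit) is consulted first, and only when it
    does not allow skipping is the finer per-channel state examined."""
    if access_unit_state == state_skip:   # e.g. submap 84a in state 1:
        return False                      # Ch0-Ch2 skipped with a single check
    return channel_state != state_skip    # otherwise fall back to the channel submap
```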

Next, the operation when this method is carried out will be described. In this method as well, the operation differs between the convolution operation of the first layer, for which no submaps exist, and the convolution operations of the second and subsequent layers, for which submaps exist. The operation of the first-layer convolution, however, is identical to the operation shown in FIG. 2 except that the zero check unit 113 additionally stores the second state information described above in the submap memory 114, and it is therefore not described again here.

For the convolution operations of the second and subsequent layers, a step in which the submap corresponding to the memory access is read and the second state information contained in it is checked is added to the flowchart of FIG. 3, either before or after the step in which the map check unit 115 reads the submap corresponding to the input channel and checks its state information. That is, as shown in FIG. 16(a), when the amount of data of one input channel is smaller than the memory-access unit, the step of reading the submap corresponding to the memory access and checking its second state information is added before the step of checking the state information of the per-channel submap. Conversely, as shown in FIG. 16(b), when the amount of data of one input channel is larger than the memory-access unit, the step of reading the submap corresponding to the memory access and checking its second state information is added after the step of checking the state information of the per-channel submap.

As explained above, the effects described earlier are also obtained with this method. In addition, this method can increase the number of occasions on which the data belonging to an input channel is skipped without ever being read into the matrix calculation unit 112.

The description so far has assumed a configuration in which state information is not used in the convolution operation of the first layer, but it is not impossible to use state information there as well. As explained with reference to FIG. 10, in a convolution operation several output matrices are calculated from the same input channels within the matrix calculations of a single layer. The state information registered in the submap memory 114 during the matrix calculations for the first output matrix can therefore be used in the matrix calculations for the second and subsequent output matrices of the convolution operation of that same layer.

In this configuration, the operation when calculating the first output matrix of a layer is the same as the part of the flowchart of FIG. 2, or of the operation of FIG. 11, in which one output matrix is calculated. The operation when calculating the second and subsequent output matrices of the same layer is the same as the flowchart of FIG. 3, or the operation of FIG. 12, with the state information read by the map check unit 115 taken to be the state information corresponding to the first output matrix of that layer.
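One possible reading of this first-layer variant is sketched below; the per-channel all-zero test and the loop ordering are assumptions and do not reproduce the exact states or flow described above.

```python
import numpy as np

def first_layer_with_state_reuse(inputs, kernels, out_shape):
    """Sketch only: states registered while the first output matrix of layer 1 is
    computed steer the remaining output matrices of the same layer, since every
    output matrix of a layer reads the same input channels."""
    channel_state = {}
    outputs = []
    for m in range(len(kernels)):
        out = np.zeros(out_shape)
        for c, channel in enumerate(inputs):
            if m > 0 and channel_state.get(c, False):      # reuse states from matrix 0
                continue                                   # -> this channel is not read
            for (y, x) in np.ndindex(*out_shape):
                out[y, x] += float(np.sum(channel[y:y + 3, x:x + 3] * kernels[m][c]))
            if m == 0:
                channel_state[c] = bool(np.all(channel == 0))  # registered once
        outputs.append(out)
    return outputs
```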

As explained above, according to the present invention, the time that would otherwise be wasted can be reduced further, and the time required for the overall operation can be shortened even more than in conventional devices.

The embodiments described above do not limit the technical scope of the present invention, and various modifications and applications other than those already described are possible. For example, although the above embodiments do not mention a pooling layer, a pooling layer may be present between the convolution operation of the (k-1)-th layer and that of the k-th layer. Even when pooling is applied to an output matrix, the characteristics of the output matrix before pooling are carried over into the pooled data, so the submap information can be used without any problem. Furthermore, although the embodiments describe cases in which the submap contains negative-judgment information and count information, the submap need only contain at least the state information, and the inclusion of other information is not essential.

The various configurations mentioned in the embodiments can be used in any combination. Furthermore, the flowcharts shown in FIGS. 2, 3, 8, 9, 11, 12, and 15 are merely examples, and the order of the steps can be changed as appropriate as long as an equivalent effect is obtained.

The above description has dealt with the case in which the matrix calculations performed by the calculation device according to the present invention are the matrix calculations of a convolutional layer of a convolutional neural network, but the present invention is not limited to the convolutional layers of convolutional neural networks. The present invention is applicable to any series of matrix calculations in which the output matrix of an earlier matrix calculation is used as the data to be operated on in a later matrix calculation.

According to the present invention, the time that would otherwise be wasted can be reduced, and as a result the time required for the overall operation can be shortened compared with conventional devices, which makes the invention useful as a calculation device.

100, 200, 300 Calculation device
111 Data memory
112 Matrix calculation unit
113 Zero check unit
114 Submap memory
115 Map check unit
116 Controller
117 Table creation unit
118 Read control unit
120 Submap address buffer

Claims (14)

1. A calculation device that, in a series of matrix calculations, uses the output matrix of an earlier matrix calculation as data to be operated on in a later matrix calculation, the calculation device comprising:
a data memory that stores the data to be operated on;
a matrix calculation unit that reads the data from the data memory, performs a matrix calculation, and stores an output matrix in the data memory;
a zero check unit that judges whether each element of the output matrix falls within a pre-specified range;
a submap memory that stores the judgment results of the zero check unit as state information; and
a map check unit that judges, based on the state information stored in the submap memory, whether to cause the matrix calculation unit to read the output matrix corresponding to that state information as data to be operated on.

2. The calculation device according to claim 1, wherein the matrix calculations are convolution operations in a convolutional neural network, the matrix calculation unit stores the output matrix in the data memory as data to be operated on by the next layer of the convolution operation, and the map check unit judges, based on the state information stored in the submap memory, whether to cause the matrix calculation unit to read the output matrix corresponding to that state information as data to be operated on by the next layer of the convolution operation.

3. The calculation device according to claim 2, wherein the matrix calculation unit reads, as the data, the same coordinate region, constituting a part of each input channel, from all input channels belonging to the same layer of the convolution operation, and performs the matrix calculation.

4. The calculation device according to claim 1, wherein the matrix calculation unit reads at one time a plurality of the data with which the same output matrix is calculated successively, and the zero check unit uses, as the state information of the plurality of the data read successively into the matrix calculation unit, one of the plurality of pieces of state information corresponding to the plurality of the data.

5. The calculation device according to claim 3, wherein the matrix calculation unit reads at one time a plurality of the data on which matrix calculations are performed successively within the same layer of the convolution operation, and the zero check unit uses, as the state information of the plurality of the data read successively into the matrix calculation unit, one of the plurality of pieces of state information corresponding to the plurality of the data.

6. The calculation device according to claim 2, further comprising: a table creation unit that creates, based on the judgment results of the map check unit, a table specifying the output matrices to be read by the matrix calculation unit; and a read control unit that causes the matrix calculation unit to read the data based on the table.

7. The calculation device according to claim 1, wherein the zero check unit further judges, per memory-access unit, whether each element of the output matrix produced by the matrix calculation unit falls within a pre-specified range, and the submap memory stores the judgment result of the zero check unit as second state information.

8. The calculation device according to claim 2, wherein the zero check unit further judges, per memory-access unit, whether each element of the output matrix produced by the matrix calculation unit falls within a pre-specified range, and the submap memory stores the judgment result of the zero check unit as second state information.

9. The calculation device according to claim 2, wherein the map check unit executes the subsequent matrix calculations of the first layer based on the state information stored in the submap memory as a result of the first matrix calculation of the first layer of the convolution operation.

10. The calculation device according to claim 2, further comprising a submap memory buffer that stores, in association with one another, information for identifying the output matrix corresponding to the state information, information indicating the storage position of that state information in the submap memory, and usage information indicating whether that state information has been used in the convolution operation of the next layer, wherein, as the storage position in the submap memory of newly generated state information, a storage position associated with usage information indicating that the state information has been used in the convolution operation of the next layer is selected.

11. The calculation device according to claim 2, wherein kernel state information, obtained by judging whether each element of a kernel used in the matrix calculation falls within a pre-specified range, is stored in advance in the submap memory, and the map check unit judges, based on the state information and the kernel state information stored in the submap memory, whether to cause the matrix calculation unit to read the output matrix corresponding to that state information as data to be operated on.

12. The calculation device according to any one of claims 1 to 11, wherein the zero check unit compares each element of the output matrix with a plurality of thresholds and judges to which of a plurality of ranges defined by the plurality of thresholds all elements of the output matrix belong.

13. The calculation device according to any one of claims 1 to 11, wherein the zero check unit further judges whether a negative value is present among the elements of the output matrix, or the number of the elements belonging to any one of the plurality of ranges.

14. The calculation device according to any one of claims 1 to 11, wherein the zero check unit creates the state information during the matrix calculation for the input channel that is the last, among the input channels belonging to the same layer, for which a matrix calculation is performed, and stores the state information in the submap memory.
PCT/JP2024/009203 2023-03-30 2024-03-09 Calculating device Pending WO2024203190A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-056298 2023-03-30
JP2023056298 2023-03-30

Publications (1)

Publication Number Publication Date
WO2024203190A1 (en)

Family

ID=92904410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/009203 Pending WO2024203190A1 (en) 2023-03-30 2024-03-09 Calculating device

Country Status (1)

Country Link
WO (1) WO2024203190A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190317857A1 (en) * 2019-04-26 2019-10-17 Intel Corporation Technologies for providing error correction for row direction and column direction in a cross point memory
JP2022523762A (en) * 2019-03-15 2022-04-26 インテル コーポレイション Sparse optimization for matrix accelerator architecture
WO2022123687A1 (en) * 2020-12-09 2022-06-16 日本電信電話株式会社 Calculation circuit, calculation method, and program



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24779321

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025510214

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025510214

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE