
WO2024203190A1 - Calculating device - Google Patents


Info

Publication number
WO2024203190A1
WO2024203190A1 (PCT/JP2024/009203)
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
data
calculation
submap
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2024/009203
Other languages
French (fr)
Japanese (ja)
Inventor
美香 中村
周一 高田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Architek
Architek Corp
Original Assignee
Architek
Architek Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Architek Corp
Publication of WO2024203190A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]

Definitions

  • the present invention relates to a calculation device that performs matrix calculations such as convolution calculations.
  • In image recognition using CNNs (convolutional neural networks), the input image is transformed using convolutional layers and pooling layers, gradually reducing the amount of data, and finally the probability value for each classification is output.
  • In the convolutional layer, a filter multiplication operation is performed on each coordinate region (e.g., a 3x3 cell region) of the input data.
  • the calculation result is then used as the input for the calculation in the next layer, and the convolution operation is repeated.
  • machine learning using a CNN requires many matrix calculations and memory bandwidth.
  • To reduce this load, a configuration is used that skips calculations when the calculation target is zero (e.g., Patent Documents 1 and 2).
  • The skipped operations in Patent Documents 1 and 2 do not involve actual calculation, and therefore calculation time can be reduced. As a result, the time required for the entire convolution operation can also be reduced.
  • In these configurations, however, the data to be operated on is first loaded into a calculation memory, and only then is it determined whether the loaded data is zero. In other words, when the calculation is skipped as a result, the time and memory space spent reading the data are wasted on data that is not used in the calculation.
  • the present invention was made in consideration of the above-mentioned circumstances, and aims to provide a calculation device that can reduce the time that is wasted when performing matrix calculations such as convolution operations, and can further shorten the time required for the entire calculation compared to conventional methods.
  • the present invention employs the following technical means.
  • the calculation device comprises a data memory, a matrix calculation unit, a zero check unit, a submap memory, and a map check unit.
  • the data memory stores the data to be calculated.
  • the matrix calculation unit reads data from the data memory, performs a matrix calculation, and stores the output matrix in the data memory.
  • the zero check unit judges whether each element of the output matrix falls within a pre-specified range.
  • the submap memory stores the judgment result of the zero check unit as status information.
  • the map check unit judges, based on the status information stored in the submap memory, whether to cause the matrix calculation unit to read out the output matrix corresponding to the status information as the data to be calculated.
  • the matrix calculation unit may simultaneously read out multiple pieces of data for successively calculating the same output matrix.
  • the zero check unit uses one of the multiple pieces of status information corresponding to the multiple pieces of data as status information for the multiple pieces of data successively read out to the matrix calculation unit.
  • the above-mentioned configuration can also be applied to cases where the above-mentioned matrix calculation is, for example, a convolution operation in a convolutional neural network.
  • the matrix calculation unit stores the output matrix in the data memory as the calculation target data of the next layer in the convolution operation.
  • the map check unit determines whether or not to read the output matrix corresponding to the state information stored in the submap memory as the calculation target data of the next layer in the convolution operation to the matrix calculation unit.
  • a configuration can be adopted in which the matrix calculation unit reads out the same coordinate area that constitutes a part of each input channel as the calculation target data in all input channels belonging to the same layer in the convolution operation and performs matrix calculation.
  • a configuration can also be adopted in which the matrix calculation unit reads out multiple data for which matrix calculation is to be performed continuously at the same layer in the convolution operation at one time.
  • the zero check unit uses one of multiple state information corresponding to the multiple data as the state information of the multiple data continuously read out to the matrix calculation unit.
  • the above-mentioned arithmetic device may also be configured to further include a table creation unit and a read control unit.
  • the table creation unit creates a table that specifies the output matrix to be read by the matrix calculation unit based on the determination result of the map check unit.
  • the read control unit causes the matrix calculation unit to read data based on the created table.
  • the zero check unit further determines whether or not each element of the output matrix by the matrix calculation unit falls within a pre-specified range in units of memory access.
  • the submap memory stores the determination result of the zero check unit as the second state information.
  • A configuration can be adopted in which the map check unit executes the matrix calculations for the layers after the first based on state information stored in the submap memory as a result of the initial matrix calculation for the first layer in the convolution calculation.
  • the above-mentioned arithmetic device may be further configured to include a submap memory buffer that stores, in association with each other, information for identifying an output matrix corresponding to state information, information indicating the storage location of the state information in the submap memory, and information indicating whether the state information has been used in the convolution calculation of the next layer.
  • a storage location in the submap memory that is associated with usage information indicating use in the convolution calculation of the next layer is selected as the storage location in the submap memory for the newly generated state information.
  • the above-mentioned calculation device can also be configured to have the submap memory store in advance kernel state information that has been determined for each element of the kernel used in the above-mentioned matrix calculation as being within a pre-specified range.
  • the map check unit determines, based on the state information and kernel state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to the state information as data to be calculated.
  • the zero check unit compares each element of the output matrix with multiple thresholds and determines which of multiple ranges defined by the multiple thresholds all elements of the output matrix belong to.
  • The zero check unit further determines whether or not a negative value exists among the elements of the output matrix, or counts the number of elements that belong to one of the plurality of ranges.
  • the zero check unit creates status information during matrix calculation for the input channel that is the last to be calculated among input channels belonging to the same layer, and stores the information in the submap memory.
  • FIG. 1 is a schematic diagram showing an example of a calculation device according to an embodiment of the present invention.
  • FIG. 2 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 3 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 4 is a schematic configuration diagram showing an example of a zero check unit included in the arithmetic device according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing an example of a sub-map included in the arithmetic device according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram showing an example of a calculation device according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram showing an example of a submap address table included in the arithmetic unit according to one embodiment of the present invention.
  • FIG. 8 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 9 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 10 is an explanatory diagram that illustrates the concept of a convolution calculation method by a calculation device according to an embodiment of the present invention.
  • FIG. 11 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 12 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIG. 13 is an explanatory diagram that illustrates the concept of a convolution calculation method by a calculation device according to an embodiment of the present invention.
  • FIG. 14 is a schematic diagram showing an example of a calculation device according to an embodiment of the present invention.
  • FIG. 15 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
  • FIGS. 16(a) and 16(b) are explanatory diagrams that diagrammatically show the concept of a convolution calculation method by a calculation device according to one embodiment of the present invention.
  • a calculation device which uses the output matrix of a previous matrix calculation as the calculation target data for a subsequent matrix calculation in a series of matrix calculations, is embodied as a calculation device that realizes the processing of the convolution layer of a convolutional neural network (CNN).
  • a convolutional neural network includes a convolutional layer and a pooling layer.
  • the amount of input data such as an image to be recognized, gradually decreases as a series of processes in the convolutional layer and the pooling layer are repeatedly performed.
  • the convolutional neural network then ultimately outputs a classification probability value that indicates the type of object the input image is.
  • When the recognition target is an image, data in which each pixel value is arranged two-dimensionally for each of the R, G, and B input channels of the image is input to the convolution layer as input data.
  • In the convolution layer, a kernel (filter) is multiplied with a coordinate region (e.g., a 3 × 3 region) of each input channel, and the results calculated for each input channel are added together to calculate the output value of that coordinate region.
  • This multiplication with the kernel is performed on the entire input image by sequentially moving the coordinate region in each input channel while a portion of the coordinate region overlaps.
  • multiple sets of kernels are prepared for each input channel according to the number of output channels output by the convolution operation. For example, if the input data is three channels and the output data is three channels, three kernel sets consisting of three kernels that are multiplied to the coordinate region of each input channel are prepared.
  • output values are calculated by processing the data for each coordinate region of the two-dimensional data (output matrix) output as the result of the calculations in the convolution layer. For example, in the pooling layer, the average or maximum value in a 2x2 coordinate region is output as the output value for that coordinate region. Note that the pooling layer may be omitted. When the pooling layer is omitted, the data output by the convolution calculation in the first layer is used as input data to carry out the convolution calculation in the second layer.
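  • As a concrete illustration of the convolution and pooling described above, a minimal NumPy sketch is shown below; the function names, the shapes, and the use of valid padding (no zero padding at the borders) are illustrative assumptions rather than details taken from this publication.

```python
import numpy as np

def conv2d(inputs, kernels):
    """Naive convolution: `inputs` has shape (n_in, H, W) and `kernels` has
    shape (n_out, n_in, kH, kW); returns (n_out, H-kH+1, W-kW+1)."""
    n_in, H, W = inputs.shape
    n_out, _, kH, kW = kernels.shape
    out = np.zeros((n_out, H - kH + 1, W - kW + 1))
    for o in range(n_out):                              # one kernel set per output channel
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                region = inputs[:, y:y + kH, x:x + kW]      # same coordinate region in every input channel
                out[o, y, x] = np.sum(region * kernels[o])  # multiply with the kernels and add up
    return out

def max_pool_2x2(channel):
    """2x2 max pooling of one output channel (H and W assumed even)."""
    H, W = channel.shape
    return channel.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))        # three input channels (e.g. R, G, B)
k = rng.random((3, 3, 3, 3))     # three kernel sets -> three output channels
y = conv2d(x, k)
print(y.shape, max_pool_2x2(y[0]).shape)   # (3, 6, 6) (3, 3)
```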
  • FIG. 1 is a schematic diagram showing the configuration of a calculation device in one embodiment of the present invention.
  • the calculation device 100 of this embodiment includes a data memory 111, a matrix calculation unit 112, a zero check unit 113, a submap memory 114, a map check unit 115, and a controller 116.
  • the calculation device 100 creates a submap separate from the output matrix (main map) obtained as the calculation result in a conventional convolution calculation, and uses the submap to reduce the time required for the entire convolution calculation.
  • a submap is created for each input channel used as input data for the convolution calculation in the next layer.
  • the data memory 111 stores the data to be calculated. As described above, when an image is the object of calculation, for example, the pixel values of each pixel constituting the image are stored in the data memory 111 as input data. In this embodiment, the data memory 111 also stores the kernel and bias described below that are used for the matrix calculation in the matrix calculation unit 112.
  • the matrix calculation unit 112 stores an output matrix (output channel), which is the result of the matrix calculation for the entire input data, in the data memory 111 as the data to be calculated in the next layer in the convolution calculation.
  • the matrix calculation unit 112 outputs, as an output value, the sum of the elements of the calculation matrix Q, which is the result of performing the above matrix calculation for the coordinate area of each input channel.
  • the matrix calculation unit 112 performs the matrix calculation for the entire input data, and stores the output value for each coordinate area in the data memory 111 together with information indicating the coordinate area. Therefore, when the matrix calculation for the input data is completed, the data memory 111 stores the output matrix that is the result of performing the matrix calculation (convolution operation) on the entire input data.
  • the matrix calculation unit 112 first performs matrix calculations for each coordinate region for one of the three input channels.
  • the data memory 111 stores an output matrix resulting from the matrix calculations for one input channel.
  • the matrix calculation unit 112 performs matrix calculations for each coordinate region for one of the remaining two input channels.
  • As the bias, a matrix in which elements other than those corresponding to the coordinate region being calculated are set to zero in the output matrix for the first input channel stored in the data memory 111 is used.
  • the data memory 111 stores an output matrix in which the result of the matrix calculation for the first input channel and the result of the matrix calculation for the second input channel are added together.
  • the matrix calculation unit 112 performs matrix calculations for each coordinate region for the remaining input channel.
  • A matrix in which all elements other than those corresponding to the coordinate region being calculated are set to zero in the output matrix for the first and second input channels stored in the data memory 111 is used as the bias.
  • the data memory 111 stores an output matrix in which the results of the matrix calculation for the three input channels are added together.
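  • The role of the bias described above can be illustrated with a small NumPy sketch: feeding the value already stored for a coordinate back in as the bias makes the stored value the running sum over the input channels. All values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
region1, region2 = rng.random((3, 3)), rng.random((3, 3))   # same coordinate region in channels 1 and 2
kernel1, kernel2 = rng.random((3, 3)), rng.random((3, 3))   # kernels for channels 1 and 2

# Channel 1: the bias is a zero matrix, so the stored value is the channel-1 result alone.
value_after_ch1 = np.sum(region1 * kernel1) + 0.0

# Channel 2: the value already stored for this coordinate is supplied as the bias,
# so the stored value becomes the sum of the channel-1 and channel-2 results.
value_after_ch2 = np.sum(region2 * kernel2) + value_after_ch1

assert np.isclose(value_after_ch2, np.sum(region1 * kernel1) + np.sum(region2 * kernel2))
print(value_after_ch2)
```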
  • the zero check unit 113 judges whether each element of the output matrix output by the matrix calculation unit 112, i.e., the output value of the matrix calculation for each coordinate region, falls within a pre-specified range. Although not limited to this, in this embodiment, the zero check unit 113 judges which of a number of ranges defined by a number of pre-set threshold values the element belongs to. As described below, in this embodiment, the zero check unit 113 is configured to make the above-mentioned judgment each time an output value is output from the matrix calculation unit 112, based on the state information stored in the submap memory 114 at that time and the output value from the matrix calculation unit 112.
  • the submap memory 114 stores the judgment result of the zero check unit 113 as state information.
  • the state information is stored in correspondence with the output matrix described above.
  • the state information is one of the pieces of information that constitute the submap described above.
  • In this embodiment, the state information indicates which of four states the output matrix belongs to: "state 1" where all the elements of the output matrix are equal to or less than the first threshold, "state 2" where all the elements are equal to or less than the second threshold, "state 3" where all the elements are equal to or less than the third threshold, and "state 4" where none of states 1 to 3 applies.
  • The numerical values "0", "1", "2", and "3" are stored as information indicating state 1, state 2, state 3, and state 4, respectively.
  • the state information is updated each time the zero check unit 113 makes a judgment.
  • the submap memory 114 may be configured as part of the memory device that constitutes the data memory 111, or may be configured as a separate memory device.
  • The map check unit 115 judges whether or not to have the matrix calculation unit 112 read out the output matrix corresponding to the state information as input data (data to be calculated) for the next layer in the convolution calculation. For example, in the above example, the following judgment is made. If the state information is state 1, the map check unit 115 judges that the matrix calculation unit 112 should not read out the corresponding data (output matrix). If the state information is state 2 or state 3, the map check unit 115 judges that the corresponding data (output matrix) should not be read out when the corresponding kernel satisfies a preset condition.
  • The preset condition is, for example, that more than half of the kernel elements are zero for state 2, or that more than 3/4 of the kernel elements are zero for state 3. If the state information is state 4, the map check unit 115 judges that the matrix calculation unit 112 should read out the corresponding data (output matrix).
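  • The skip decision described here can be sketched as follows, assuming the four states are encoded as the integers 0 to 3 as described above and using the example conditions for states 2 and 3; the function name and the zero-fraction test on the kernel are illustrative.

```python
import numpy as np

STATE_1, STATE_2, STATE_3, STATE_4 = 0, 1, 2, 3   # numerical encoding of the four states

def should_skip(state, kernel):
    """Decide whether the matrix calculation for an input channel can be skipped,
    based on the channel's state information and the kernel to be applied."""
    zero_fraction = np.mean(kernel == 0)
    if state == STATE_1:
        return True                    # all elements are within the tightest range
    if state == STATE_2:
        return zero_fraction > 0.5     # example condition: more than half of the kernel is zero
    if state == STATE_3:
        return zero_fraction > 0.75    # example condition: more than 3/4 of the kernel is zero
    return False                       # state 4: always read the data and calculate

kernel = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
print([should_skip(s, kernel) for s in range(4)])   # [True, True, True, False]
```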
  • the calculation device 100 also includes a controller 116 that controls the operation timing of the data memory 111, matrix calculation unit 112, zero check unit 113, submap memory 114, and map check unit 115.
  • The matrix calculation unit 112 can be realized by a processor, such as a GPU (Graphics Processing Unit) specialized for image processing.
  • each element that performs signal processing and data processing such as the zero check unit 113, map check unit 115, and controller 116, can be realized by, for example, a dedicated arithmetic circuit, or hardware equipped with a processor and memory such as RAM (Random Access Memory) or ROM (Read Only Memory), and software that is stored in the memory and runs on the processor.
  • the operation of the arithmetic device 100 having the above configuration will be described.
  • When a convolution operation is performed on input data, a submap including state information is created for each output matrix used as an input channel for the convolution operation of the next layer. That is, the submap created during the convolution operation of the first layer is used during the convolution operation of the second layer, and the submap created during the convolution operation of the second layer is used during the convolution operation of the third layer. Therefore, in the convolution operation of the first layer, there are no submaps corresponding to the input channels.
  • Below, the operation during the convolution operation of the first layer and the operation during the convolution operations of the second and subsequent layers will be described separately.
  • FIG. 2 is a flow diagram showing the procedure performed during the first layer convolution calculation of the calculation device 100 of this embodiment. Note that FIG. 2 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the calculation device 100.
  • the matrix calculation unit 112 starts matrix calculation for the first input channel.
  • the matrix calculation unit 112 first reads out the kernel to be used for the matrix calculation for the first input channel from the data memory 111 (step S201).
  • the matrix calculation unit 112 also reads out the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the first input channel from the data memory 111, and the bias described above (step S202).
  • the bias for the first input channel is a zero matrix.
  • the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203 and S204).
  • the matrix calculation unit 112 inputs the output value to the zero check unit 113.
  • the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to.
  • the zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S205).
  • the calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S206: No). At this time, the data memory 111 stores the output matrix calculated for the first input channel.
  • the matrix calculation unit 112 starts the matrix calculation for the second input channel (step S206: Yes, S207: No).
  • the matrix calculation unit 112 reads the kernel to be used for the matrix calculation for the second input channel from the data memory 111 (step S201).
  • the matrix calculation unit 112 also reads the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the second input channel, and the bias from the data memory 111 (step S202).
  • the bias for the second input channel is a matrix in which all elements of the output matrix being calculated that are stored in the data memory 111 at that time are set to zero except for the elements corresponding to the coordinate region to be calculated.
  • the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203 and S204).
  • the matrix calculation unit 112 inputs the output value to the zero check unit 113.
  • the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to.
  • the zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S205).
  • the calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the second input channel (step S206: No).
  • the data memory 111 stores an output matrix that is the sum of the results of the matrix calculation for the first input channel and the results of the matrix calculation for the second input channel.
  • When the above process has been completed for all n input channels, the data memory 111 will store an output matrix that is the result of performing matrix calculations on the entire input data (all input channels).
  • When all calculations to obtain the m output matrices are completed, m output matrices are stored in the data memory 111, and m submaps corresponding to each output matrix are stored in the submap memory 114.
  • These m output matrices are used as input channels for the convolution calculation of the next layer.
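  • The first-layer flow (steps S201 to S207) can be summarized in a short sketch that computes each output channel and records one state value per channel as its submap. The thresholds are illustrative, and for brevity the sketch classifies the finished output matrix instead of updating the state each time an output value is produced as the device does.

```python
import numpy as np

THRESHOLDS = (0.5, 1.0, 2.0)      # illustrative first, second and third thresholds

def conv1ch(channel, kernel):
    """Convolve one input channel with one kernel (valid padding)."""
    H, W = channel.shape
    kH, kW = kernel.shape
    return np.array([[np.sum(channel[y:y + kH, x:x + kW] * kernel)
                      for x in range(W - kW + 1)]
                     for y in range(H - kH + 1)])

def state_of(matrix):
    """States 1..4 encoded as 0..3: the first threshold that bounds every
    element, or 3 if some element exceeds all thresholds."""
    for i, t in enumerate(THRESHOLDS):
        if np.all(matrix <= t):
            return i
    return 3

def first_layer(inputs, kernel_sets):
    """Compute every output channel of the first layer and build its submap."""
    outputs, submaps = [], []
    for kernels in kernel_sets:                        # one kernel set per output channel
        out = sum(conv1ch(ch, k) for ch, k in zip(inputs, kernels))
        outputs.append(out)
        submaps.append(state_of(out))                  # state information for the next layer
    return outputs, submaps

rng = np.random.default_rng(2)
outputs, submaps = first_layer(rng.random((3, 6, 6)), rng.random((2, 3, 3, 3)))
print(len(outputs), outputs[0].shape, submaps)         # 2 (4, 4) [...]
```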
  • FIG. 3 is a flow diagram showing the procedure performed during convolution calculations from the second layer onward in the calculation device 100 of this embodiment. Note that FIG. 3 shows an example in which the number of input channels is n and the number of output channels is m. The number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution calculation of the immediately preceding layer. This procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the status information contained in that submap (step S301).
  • If the result of the check is that no matrix calculation is to be performed, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the status information contained in that submap (steps S302: Yes, S310: No, S301).
  • the map check unit 115 causes the matrix calculation unit 112 to read a kernel to be used for the matrix calculation for the first input channel from the data memory 111. Then, it is confirmed whether or not the kernel satisfies the above-mentioned conditions (steps S302: No, S303). If the kernel satisfies the above-mentioned conditions, the map check unit 115 reads a submap corresponding to the second input channel from the submap memory 114, and checks the state information included in the submap (steps S304: Yes, S310: No, S301).
  • If the result of the check is that the matrix calculation is to be performed, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S302: No).
  • the matrix calculation unit 112 reads out the kernel to be used for the matrix calculation for the first input channel from the data memory 111 (steps S302: No, S303).
  • the matrix calculation unit 112 also reads out the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the first input channel, and the above-mentioned bias from the data memory 111 (steps S304: No, S305).
  • Each of the multiple output matrices calculated in the (k-1)th layer convolution calculation is used as an input channel.
  • the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S306 and S307).
  • the matrix calculation unit 112 inputs the output value to the zero check unit 113.
  • the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to.
  • the zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S308).
  • the calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S309: No).
  • the data memory 111 stores the output matrix that is the result of performing matrix calculations on the entire input data (all input channels).
  • state information is created based on each element of the output matrix of the convolution operation in the immediately preceding layer, and when the state information satisfies a pre-specified condition, the matrix calculation using the input channel corresponding to that state information is skipped. Also, at this time, data belonging to that input channel is not read from the data memory 111 to the matrix calculation unit 112. In other words, since the reading of unnecessary data does not occur, it is possible to further reduce the wasted data read time, and the time required for the entire calculation can be further shortened compared to the conventional method.
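  • A sketch of a convolution for a later layer that uses the submaps of the immediately preceding layer to skip whole input channels is shown below; it reuses the illustrative skip rule from the earlier sketch and is a simplified software model, not a description of the hardware flow.

```python
import numpy as np

def conv1ch(channel, kernel):
    """Convolve one input channel with one kernel (valid padding)."""
    H, W = channel.shape
    kH, kW = kernel.shape
    return np.array([[np.sum(channel[y:y + kH, x:x + kW] * kernel)
                      for x in range(W - kW + 1)]
                     for y in range(H - kH + 1)])

def should_skip(state, kernel):
    """Same illustrative rule as before: state 0 always skips, states 1 and 2
    skip for mostly-zero kernels, state 3 never skips."""
    zero_fraction = np.mean(kernel == 0)
    return state == 0 or (state == 1 and zero_fraction > 0.5) or (state == 2 and zero_fraction > 0.75)

def next_layer(prev_outputs, prev_submaps, kernel_sets):
    """The previous layer's output matrices are the input channels; a channel
    whose submap satisfies the skip condition is never read at all."""
    H, W = prev_outputs[0].shape
    kH, kW = kernel_sets[0][0].shape
    outputs = []
    for kernels in kernel_sets:
        out = np.zeros((H - kH + 1, W - kW + 1))       # bias starts as a zero matrix
        for state, channel, kernel in zip(prev_submaps, prev_outputs, kernels):
            if should_skip(state, kernel):
                continue                               # map check: no data read, no calculation
            out += conv1ch(channel, kernel)            # matrix calculation only for the channels kept
        outputs.append(out)
    return outputs

rng = np.random.default_rng(3)
prev = [rng.random((6, 6)) * 0.1, rng.random((6, 6))]  # two input channels from the previous layer
prev_submaps = [0, 3]                                  # channel 0 can be skipped, channel 1 cannot
outs = next_layer(prev, prev_submaps, rng.random((2, 2, 3, 3)))
print(len(outs), outs[0].shape)                        # 2 (4, 4)
```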
  • FIG. 4 is a schematic diagram showing an example of the zero check unit 113 provided in the arithmetic device 100.
  • the zero check unit 113 has an input terminal 31, a comparison terminal 32, and an output terminal 33.
  • the input terminal 31 receives an output value from the matrix calculation unit 112.
  • the comparison terminal 32 receives state information stored in the submap memory 114.
  • the output terminal 33 outputs data to be stored in the submap memory 114.
  • the output value from the matrix calculation unit 112 input through the input terminal 31 is input to the comparison unit 34 having multiple comparators.
  • the comparison unit 34 has a number of comparators equal to or greater than the number of preset thresholds. As described above, in this embodiment, three thresholds are set, so the comparison unit 34 has three comparators 34a, 34b, and 34c.
  • the output value from the matrix calculation unit 112 is input to one input terminal of each of the comparators 34a, 34b, and 34c, and the threshold is input to the other input terminal.
  • The comparators 34a, 34b, and 34c are configured to output the numerical value "1" when the input output value exceeds the respective threshold.
  • The output of each of the comparators 34a, 34b, and 34c is input to the checker 35.
  • the status information stored in submap memory 114 at that time is input to checker 35 via comparison terminal 32.
  • The status information stored in the submap memory 114 is a number between "0" and "3".
  • When the output of each of the comparators 34a, 34b, and 34c includes the value "1" and the status information stored in the submap memory 114 needs to be updated, the checker 35 outputs an output corresponding to the updated status information to the output terminal 33. For example, when the stored status information is "0", the checker 35 updates the status information to one of the values "1", "2", or "3" according to the outputs of the comparators 34b and 34c when at least the output of the comparator 34a is the value "1". When the stored status information is "1", the checker 35 updates the status information to "2" or "3" according to the output of the comparator 34c when at least the output of the comparator 34b is the value "1". When the stored status information is "2", the checker 35 updates the status information to "3" when the output of the comparator 34c is the value "1". When the stored status information is "3", the checker 35 does not update the status information.
  • With the zero check unit 113 configured in this way, when the convolution operation to obtain one output matrix is completed, the status information corresponding to that output matrix is stored in the submap memory 114.
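  • The behaviour of the comparators and the checker can be modelled in a few lines, assuming each comparator outputs "1" when the output value exceeds its threshold; the threshold values themselves are illustrative.

```python
THRESHOLDS = (0.5, 1.0, 2.0)   # illustrative first, second and third thresholds

def checker_update(stored_state, output_value):
    """Software model of the comparator/checker pair in FIG. 4: each comparator
    fires when the output value exceeds its threshold, and the stored state
    ("0".."3") is only ever raised, never lowered."""
    fired = [output_value > t for t in THRESHOLDS]   # outputs of comparators 34a, 34b, 34c
    candidate = sum(fired)                           # 0 if none fired, up to 3 if all fired
    return max(stored_state, candidate)

state = 0
for value in (0.1, 0.7, 0.3, 1.4):   # output values arriving one by one
    state = checker_update(state, value)
print(state)   # 2: some value exceeded the second threshold but none exceeded the third
```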
  • the zero check unit 113 can also be realized with other configurations. For example, a configuration can be adopted in which the zero check unit 113 holds a cumulative value of the number of output values that belong to each range. In this case, each time an output value is output from the matrix calculation unit 112, the zero check unit 113 can determine which of the above-mentioned multiple ranges it belongs to based on the held cumulative value.
  • In this embodiment, the zero check unit 113 is further configured to be able to make the above-mentioned judgments as well as a negative judgment as to whether or not the elements of the output matrix contain negative values, and to count the number of elements in the output matrix that exceed one of the above-mentioned thresholds.
  • Negative judgment information indicating the result of the negative judgment and count information indicating the counting result are stored in the submap memory 114.
  • The negative judgment information and count information, together with the above-mentioned state information, constitute the above-mentioned submap.
  • the zero check unit 113 includes a comparator 34d in the comparison unit 34, one input terminal of which receives the output value from the matrix calculation unit 112 and the other input terminal of which receives the numerical value "0".
  • the comparator 34d is configured to output the numerical value "1" when the input output value from the matrix calculation unit 112 is smaller than "0".
  • the output of the comparator 34d is input to the OR circuit 36.
  • the negative judgment information stored in the submap memory 114 at that time is also input to the OR circuit 36 via the comparison terminal 32. If either the output of the comparator 34d or the negative judgment information stored in the submap memory 114 is the numerical value "1", the OR circuit 36 outputs the numerical value "1" to the output terminal 33.
  • When the convolution operation for obtaining one output matrix is completed, if the elements of the output matrix contain a negative value, the numerical value "1" is stored in the submap memory 114 as the negative judgment information. Furthermore, if the elements of the output matrix do not contain negative values, the numerical value "0" is stored in the submap memory 114 as the negative judgment information.
  • the zero check unit 113 also includes a selector 37 to which the outputs of the comparators 34a, 34b, and 34c are input.
  • the selector 37 inputs one of the outputs of the comparators 34a, 34b, and 34c that is set in advance to a counter 38.
  • the counter 38 also receives the count information stored in the submap memory 114 at that time via a comparison terminal 32.
  • the counter 38 outputs a value obtained by adding "1" to the stored count information to the output terminal 33.
  • In this embodiment, the selector 37 is set to a state in which it outputs the output value of the comparator 34c. According to this configuration, when the convolution operation for obtaining one output matrix is completed, the count of the elements contained in the output matrix that are greater than the third threshold is stored in the submap memory 114 as count information.
  • the above-mentioned negative determination information can be used as a flag to determine which process to implement when the process differs depending on whether or not there is a negative value.
  • Based on the count information, a process can be performed such as skipping the corresponding output matrix (input channel) without reading it out if the counted number of elements exceeds a preset threshold value.
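  • Extending the same model with the negative judgment (comparator 34d and OR circuit 36) and the counter fed through the selector gives a sketch like the following; the dictionary layout and the threshold values are illustrative.

```python
def update_submap_entry(entry, output_value, thresholds=(0.5, 1.0, 2.0)):
    """Extend the checker model with the negative-value flag (comparator 34d
    plus the OR circuit) and the counter fed by the selector, here counting
    values that exceed the third threshold as in the example. `entry` is a
    dict standing in for one submap record."""
    fired = [output_value > t for t in thresholds]
    entry["state"] = max(entry["state"], sum(fired))
    entry["negative"] = entry["negative"] or (output_value < 0)   # OR circuit 36
    if fired[2]:                                                  # selector set to comparator 34c
        entry["count"] += 1
    return entry

entry = {"state": 0, "negative": False, "count": 0}
for value in (-0.2, 0.8, 2.5, 3.1):
    update_submap_entry(entry, value)
print(entry)   # {'state': 3, 'negative': True, 'count': 2}
```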
  • FIG. 5 is a diagram showing an example of a submap.
  • a submap using 1 byte (8 bits) of data is shown here as an example.
  • the submap 40 includes status information, negative judgment information, and count information.
  • the submap 40 is composed of 2 bits of status information, 1 bit of negative judgment information, and 5 bits of count information.
  • the address information of the submap 40 in the submap memory 114 and the address information of the output matrix (next layer input channel) corresponding to the submap 40 stored in the data memory 111 are mutually related. This relationship is, for example, such that the top address of the submap 40 is an address obtained by adding a pre-specified offset to the top address of the output matrix corresponding to the submap 40.
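  • Packing one submap entry into a single byte as in FIG. 5, and deriving the submap address from the output-matrix address by a fixed offset, might look as follows; the bit order and the offset value are assumptions, since only the field widths and the offset relationship are given.

```python
def pack_submap(state, negative, count):
    """Pack one submap entry into a single byte: 2 bits of state information,
    1 bit of negative judgment information and 5 bits of count information."""
    assert 0 <= state < 4 and 0 <= count < 32
    return (state & 0b11) | (int(negative) & 0b1) << 2 | (count & 0b11111) << 3

def unpack_submap(byte):
    """Recover (state, negative, count) from a packed submap byte."""
    return byte & 0b11, bool((byte >> 2) & 0b1), (byte >> 3) & 0b11111

# The submap address is derived from the output-matrix address by a fixed offset.
SUBMAP_OFFSET = 0x10000   # illustrative value only

def submap_address(output_matrix_address):
    return output_matrix_address + SUBMAP_OFFSET

b = pack_submap(state=2, negative=True, count=13)
print(hex(b), unpack_submap(b), hex(submap_address(0x2000)))
```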
  • the zero check unit 113 updates the state information in steps S205 and S308 each time the matrix calculation unit 112 outputs a calculation result.
  • FIG. 6 is a schematic diagram showing the configuration of a modified example of the arithmetic device in one embodiment of the present invention. Note that in FIG. 6, components that achieve the same effects as the arithmetic device 100 are given the same reference numerals as in FIG. 1, and detailed description thereof will be omitted below.
  • the arithmetic device 300 of this embodiment further includes a submap address buffer 120.
  • the arithmetic device 300 creates a submap address table in addition to the submap created by the arithmetic device 100 described above.
  • the submap address table is stored in the submap address buffer 120.
  • the submap address table 41 is a table in which ID numbers, address information, and usage information are recorded in a linked state.
  • the submap address buffer 120 may be configured as part of the memory device that constitutes the submap memory 114 or the memory device that constitutes the data memory 111, or may be configured as a separate memory device.
  • the ID number functions as information for identifying a submap. As described above, a submap is created for each input channel used as input data for the convolution operation of the next layer, that is, for each output channel in the layer where the convolution operation is performed. Therefore, the number of ID numbers is the same as the number of output channels.
  • A unique number is assigned as the ID number, which is a combination of a number indicating which layer of the convolution operation and a number indicating which output channel. For example, when three output channels are operated in the (k-1)th layer, the ID numbers "k1", "k2", and "k3" are assigned. More specifically, the ID numbers "31", "32", and "33" are assigned to the three output channels of the second layer. Since these output channels become input channels in the operation of the third layer, when reading out the submap, the address information associated with the ID numbers "31", "32", and "33" is referenced.
  • the address information is information that indicates the storage location of the submap in the submap memory 114. More specifically, it is, for example, the starting address of the storage location of the submap. As explained with reference to FIG. 5, the data length (number of bits) of the submap is constant. Therefore, the storage location of the submap in the submap memory 114 can be identified by a single address.
  • the usage information indicates whether the linked submap has been used in the convolution calculation of the next layer.
  • the usage information is displayed as "valid".
  • a submap is created for each input channel that is used as input data for the convolution calculation of the next layer. Therefore, a submap that is read and used in the convolution calculation of the next layer is not used in subsequent convolution calculations.
  • the numerical values "0" and "1" are used as information indicating that it has been used and information indicating that it has not been used, respectively.
  • a new ID number, address information, and usage information can be linked and stored in a record in the submap address table whose usage information is "0".
  • the controller 116 generates the ID number, address information, and usage information, and records them in the submap address table of the submap address buffer 120, but other configurations can also be used.
  • a submap address management unit having the function of performing these processes may be provided separately from the controller 116.
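  • A minimal software model of the submap address table and its record reuse is sketched below. The flag values follow the description above ("0" = already used and reusable, "1" = not yet used); the class name and data structures are illustrative.

```python
class SubmapAddressTable:
    """Model of the submap address table in FIG. 7: each record links an ID
    number, the submap's start address in the submap memory, and usage
    information ("1" = not yet used, "0" = already used and free for reuse)."""

    def __init__(self, addresses):
        # One record per selectable submap address, all reusable at the start.
        self.records = [{"id": None, "address": a, "used_flag": "0"} for a in addresses]

    def allocate(self, layer, channel):
        """Record a new submap in a record whose usage information is "0"."""
        record = next(r for r in self.records if r["used_flag"] == "0")
        record["id"] = f"{layer}{channel}"      # e.g. "21": used as input channel 1 of layer 2
        record["used_flag"] = "1"               # the linked submap has not been read yet
        return record["address"]

    def lookup(self, layer, channel):
        """Read the address for an ID and mark the record as used (reusable)."""
        record = next(r for r in self.records if r["id"] == f"{layer}{channel}")
        record["used_flag"] = "0"
        return record["address"]

table = SubmapAddressTable(addresses=[0x000, 0x100, 0x200])
a = table.allocate(2, 1)                 # submap created during the first layer's calculation
print(hex(a), hex(table.lookup(2, 1)))   # read back when the second layer uses it
table.allocate(3, 1)                     # the freed record can now hold a new submap
```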
  • FIG. 8 is a flow diagram showing the procedure performed during the first-layer convolution calculation by the arithmetic device 300.
  • FIG. 8 shows an example in which the number of input channels is n and the number of output channels is m.
  • the procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the arithmetic device 300.
  • the arithmetic device 300 differs from the arithmetic device 100 only in that the arithmetic device 300 creates a submap address table. Therefore, in the procedure shown in FIG. 8, steps that perform the same operation as the arithmetic device 100 are given the same reference numerals as in FIG. 2, and detailed description thereof will be omitted below.
  • the matrix calculation unit 112 starts the matrix calculation of the first output matrix (output channel).
  • the controller 116 generates data for the submap address table (step S220). That is, the controller 116 generates the above-mentioned ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in the record in the submap address table 41 whose usage information is "0".
  • the ID number "21" is generated at this time.
  • An address in the submap memory 114 is appropriately selected as the address information.
  • a configuration can be adopted in which all selectable addresses in the submap memory 114 are recorded in the submap address table 41, and the controller 116 selects the address information recorded in the record into which information is to be written as the address information to be linked to the generated ID number. Note that the linked submap has not yet been read, so the usage information recorded is "1".
  • the matrix calculation unit 112 starts matrix calculation for the first input channel.
  • the procedure for matrix calculation for the first input channel is generally the same as the procedure described in FIG. 2. That is, the kernel reading (step S201), data reading (step S202), matrix calculation (step S203), and calculation result storage (step S204) are as described above.
  • the zero check unit 113 stores the determination result in a storage location in the submap memory 114 specified by the address information generated by the controller 116.
  • the calculation device 300 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S206: No). At this time, the data memory 111 stores the output matrix calculated for the first input channel.
  • the matrix calculation unit 112 starts the matrix calculation for the second input channel (step S206: Yes, S207: No).
  • the procedure for the matrix calculation for the second input channel is generally the same as the procedure described in FIG. 2. That is, the kernel reading (step S201), data reading (step S202), matrix calculation (step S203), and calculation result storage (step S204) are as described above.
  • the zero check unit 113 stores the determination result in a storage location in the submap memory 114 specified by the address information generated by the controller 116.
  • the data memory 111 stores an output matrix that is the result of performing a matrix calculation on all input data (all input channels).
  • When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates data for the submap address table corresponding to that output matrix (step S220). That is, the controller 116 generates an ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120.
  • the controller 116 records the generated information in the submap address table 41 in a record whose usage information is "0". According to the above-mentioned ID number generation rules, the ID number generated at this time is "22". Furthermore, since the linked submap has not yet been read, the usage information recorded is "1".
  • When all calculations to obtain a given number m of output matrices are completed, m output matrices will be stored in the data memory 111, m submaps corresponding to each output matrix will be stored in the submap memory 114, and address information and usage information corresponding to each of the m output matrices will be recorded in the submap address buffer 120. These m output matrices will be used as input channels for the convolution operation of the next layer.
  • the data for the submap address table is generated at the start of the calculation of the output matrix.
  • the data for the table may be generated at other times in the matrix calculation for one output channel, as long as it is generated before the zero check unit 113 first stores the judgment result (state information) in the submap memory 114.
  • FIG. 9 is a flow diagram showing the procedure performed by the arithmetic device 300 of this embodiment when performing convolution calculations on the second and subsequent layers.
  • FIG. 9 shows an example in which the number of input channels is n and the number of output channels is m.
  • the number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution calculation of the immediately preceding layer. This procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • the arithmetic device 300 differs from the arithmetic device 100 only in that the arithmetic device 300 creates a submap address table. Therefore, in the procedure shown in FIG. 9, steps that perform the same operations as the arithmetic device 100 are given the same reference numerals as in FIG. 3, and detailed explanations thereof will be omitted below.
  • the matrix calculation unit 112 starts the matrix calculation of the first output matrix (output channel).
  • the controller 116 generates data for the submap address table (step S320). That is, the controller 116 generates the above-mentioned ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120.
  • the controller 116 records the generated information in a record in the submap address table 41 whose usage information is "0". According to the above-mentioned ID number generation rules, the ID number generated at this time is "31". Also, since the linked submap has not yet been read, the usage information recorded is "1".
  • the matrix calculation unit 112 starts matrix calculation for the first input channel.
  • the map check unit 115 reads address information of the submap corresponding to the first input channel from the submap address buffer 120 (step S321). Then, based on the address information, the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the status information included in the submap (step S301). For example, if the calculation is for the first input channel of the second layer, the map check unit 115 reads address information linked to the ID number "21". In addition, the map check unit 115 notifies the controller 116 of the ID number for which the address information was read from the submap address buffer 120.
  • The controller 116 that receives the notification rewrites the usage information linked to the ID number in the submap address buffer 120 from "1" to "0". Note that the usage information may be rewritten at a timing other than this. However, from the perspective of making effective use of the storage area of the submap memory 114, it is preferable to perform this process after the address information is read and before a new submap is stored in the submap memory 114.
  • the subsequent procedure for the first input channel is generally the same as that described in FIG. 3, but the zero check unit 113 stores the result of the determination in a storage location in the submap memory 114 specified by the address information generated by the controller 116 (step S308).
  • the calculation device 300 repeats the above process until it is completed for all coordinate areas belonging to the first input channel (step S309: No).
  • When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates data for the submap address table corresponding to that output matrix (step S320). That is, the controller 116 generates an ID number, address information, and usage information, and records them in the submap address table 41 of the submap address buffer 120.
  • the controller 116 records the generated information in the submap address table 41 in a record whose usage information is "0". According to the above-mentioned ID number generation rules, the ID number generated at this time is "32". Furthermore, since the linked submap has not yet been read, the usage information recorded is "1".
  • When all calculations to obtain a predetermined number m of output matrices are completed, m output matrices will be stored in the data memory 111, m submaps corresponding to each output matrix will be stored in the submap memory 114, and address information and usage information corresponding to each of the m output matrices will be recorded in the submap address buffer 120. These m output matrices are used as input channels for the convolution calculation of the next layer. The calculation device 300 then repeatedly performs the procedure for the second and subsequent layers until all convolution calculations for the specified number of layers are completed.
  • the submap memory 114 can be realized using a memory with a limited size, such as a ring buffer.
  • a submap is generated only for the data of the input channel used in the convolution operation, and whether or not to read the data of the input channel is determined based on the state information included in the submap.
  • state information can also be applied to the kernel used in the matrix calculation of the input data. That is, a configuration can also be adopted in which state information is created for the kernel based on each element of the kernel, and when the state information satisfies a pre-specified condition, the matrix calculation using the kernel corresponding to the state information is skipped without reading the data of the input channel.
  • the state information of the kernel can be obtained in advance.
  • a configuration can be adopted in which the state information of the kernel is stored in the submap memory 114, and in the judgment of the map check unit 115 in step S302 of FIG. 3 or FIG. 9, in addition to the state information of the input channel, the state information of the kernel is also taken into consideration to determine whether or not to read the data of the input channel.
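  • How precomputed kernel state information might enter the map check decision is sketched below; the kernel-state definition and the combination rule are illustrative assumptions, since the text only states that both pieces of state information are taken into consideration.

```python
import numpy as np

def kernel_state(kernel, thresholds=(0.5, 1.0, 2.0)):
    """State information for a kernel, computed once in advance: the first
    threshold that bounds every kernel element, encoded 0..3 like the channel
    state (an illustrative definition)."""
    for i, t in enumerate(thresholds):
        if np.all(np.abs(kernel) <= t):
            return i
    return 3

def skip_with_kernel_state(channel_state, k_state):
    """Example combined decision: skip when either the input channel or the
    kernel is entirely within the tightest range."""
    return channel_state == 0 or k_state == 0

k = np.array([[0.0, 0.1], [0.2, 0.0]])
print(kernel_state(k), skip_with_kernel_state(channel_state=3, k_state=kernel_state(k)))   # 0 True
```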
  • the matrix calculation unit 112 performs matrix calculations for one entire input channel, and after completion, performs matrix calculations for the entire next input channel. However, the matrix calculations do not need to be performed continuously for the entire input channel.
  • a configuration is described in which matrix calculations are performed for each element of the output matrix. In such a configuration, the matrix calculations are performed for each coordinate region that constitutes part of the input channel.
  • FIG. 10 is a diagram for explaining the concept of this method. FIG. 10 shows an example in which three output matrices (output channels) are obtained from three input channels. In this method, only the order in which data is read by the matrix calculation unit 112 and the like is changed, and the configuration of the arithmetic device is the same as the configuration shown in FIG. 1.
  • data in the same coordinate region in the three input channels is used to calculate an element located at a specific coordinate in each of the three output matrices.
  • In the matrix calculation to calculate an element 61a located at the coordinate (2,3) of the first output matrix 61, data belonging to a 3 × 3 coordinate region 51a centered on the coordinate (2,3) of the first input channel 51, data belonging to a 3 × 3 coordinate region 52a centered on the coordinate (2,3) of the second input channel 52, and data belonging to a 3 × 3 coordinate region 53a centered on the coordinate (2,3) of the third input channel 53 are used.
  • Similarly, data belonging to the coordinate region 51a of the first input channel 51, data belonging to the coordinate region 52a of the second input channel 52, and data belonging to the coordinate region 53a of the third input channel 53 are used, although the kernels are different, in the matrix calculations to calculate an element 62a located at the coordinate (2,3) of the second output matrix 62 and an element 63a located at the coordinate (2,3) of the third output matrix 63.
  • the same output matrix as that calculated in the above embodiment can be calculated by sequentially reading out the data in the coordinate region 51a of the first input channel 51, the data in the coordinate region 52a of the second input channel 52, and the data in the coordinate region 53a of the third input channel 53, and then sequentially reading out the data in the coordinate regions whose positions have been changed in each of the input channels 51, 52, and 53, and performing matrix calculations.
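  • The read order of FIG. 10 can be sketched as follows: for every output coordinate, the same coordinate region is read once from each input channel and used to compute that element of every output matrix before moving on; names and shapes are illustrative.

```python
import numpy as np

def coordinate_first_convolution(inputs, kernel_sets):
    """Alternative read order: for each output coordinate, the same coordinate
    region of every input channel is read and used to compute that element of
    every output matrix, before moving to the next coordinate."""
    n_in, H, W = inputs.shape
    n_out, _, kH, kW = kernel_sets.shape
    outputs = np.zeros((n_out, H - kH + 1, W - kW + 1))
    for y in range(H - kH + 1):
        for x in range(W - kW + 1):
            regions = inputs[:, y:y + kH, x:x + kW]        # same region in all input channels
            for o in range(n_out):                         # same data, different kernel sets
                outputs[o, y, x] = np.sum(regions * kernel_sets[o])
    return outputs

rng = np.random.default_rng(4)
x, k = rng.random((3, 6, 6)), rng.random((3, 3, 3, 3))
print(coordinate_first_convolution(x, k).shape)   # (3, 4, 4)
```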
  • FIG. 11 is a flow diagram showing the procedure for implementing a method for acquiring data in the same coordinate region for all input channels in the first-layer convolution calculation by the calculation device 100.
  • FIG. 11 shows an example in which the number of input channels is n and the number of output channels is m.
  • this procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the calculation device 100.
  • In step S701, the coordinates of the element of the output matrix to be calculated are determined. Then, based on the determined coordinates, the coordinate region of the input channel required for the matrix calculation is identified (step S702). Note that in this embodiment, the controller 116 determines the coordinates of the element in the output matrix and identifies the coordinate region of the input channel.
  • the matrix calculation unit 112 reads from the data memory 111 the kernel to be used for the matrix calculation for the first input channel for the first output matrix (step S703).
  • the matrix calculation unit 112 also reads from the data memory 111 the data belonging to the coordinate region of the first input channel and the bias described above (step S704). As described above, the bias for the first input channel is a zero matrix.
  • the matrix calculation unit 112 executes the matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate area (steps S705 and S706). At this time, the matrix calculation unit 112 inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 performs the above-mentioned state judgment. The zero check unit 113 then stores the judgment result in the submap memory 114 (step S707).
  • Next, the matrix calculation unit 112 reads out from the data memory 111 the kernel to be used for the matrix calculation of the second input channel for the first output matrix (steps S708: No, S703).
  • At this time, the matrix calculation unit 112 also reads out from the data memory 111 the data belonging to the coordinate region of the second input channel and the bias described above (step S704).
  • In this case, the bias is a matrix in which all elements of the output matrix being calculated, as stored in the data memory 111 at that time, other than the element being calculated, are set to zero.
  • Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S705 and S706). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113, which performs the above-mentioned state judgment and stores the judgment result in the submap memory 114 (step S707).
  • When this processing has been completed for all input channels, the data memory 111 stores the elements of the output matrix that result from performing the matrix calculations for all input channels.
  • Next, the controller 116 determines the coordinates of the next element of the output matrix to be calculated, and identifies the coordinate region of the input channels that corresponds to the determined coordinates (steps S708: Yes, S709: No, S701, S702).
  • The calculation device 100 then repeats the above-mentioned processing until it has been completed for all elements of the output matrix (step S709: No).
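As a rough illustration of the per-element flow of FIG. 11, the following Python sketch accumulates one output element over all input channels and updates the running state of the corresponding submap (the threshold values, the form of the comparison, and the helper names are assumptions; the step numbers appear only in comments).

```python
import numpy as np

THRESHOLDS = (0.0, 0.1, 0.5)   # assumed values for the three thresholds of the zero check unit

def classify(value, current_state):
    """Return the updated submap state (0..3 for states 1..4): an element that only fits
    a wider range can raise the state of its output matrix, never lower it."""
    for state, threshold in enumerate(THRESHOLDS):
        if value <= threshold:                      # comparison form is an assumption
            return max(current_state, state)
    return 3                                        # state 4: none of the thresholds hold

def first_layer_element(channels, kernels, y, x, submap_state):
    """channels: padded HxW arrays of one layer; kernels[i]: 3x3 kernel of channel i
    for the output matrix being computed (steps S703-S707, simplified)."""
    value = 0.0                                     # the bias of the first channel is a zero matrix
    for kernel, channel in zip(kernels, channels):
        value += np.sum(kernel * channel[y:y + 3, x:x + 3])   # accumulate via the bias (S703-S706)
    return value, classify(value, submap_state)               # zero check and submap update (S707)
```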
  • FIG. 12 is a flow diagram showing the procedure by which the calculation device 100 implements this method of acquiring data in the same coordinate region for all input channels in the convolution calculations of the second and subsequent layers.
  • FIG. 12 shows an example in which the number of input channels is n and the number of output channels is m.
  • The procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • When this procedure is started, the controller 116 first determines the coordinates of the element of the output matrix to be calculated, and then identifies the coordinate region of the input channels required for the matrix calculation based on the determined coordinates (steps S801 and S802).
  • Next, the map check unit 115 reads out from the submap memory 114 the submap corresponding to the first input channel for the first output matrix, and checks the state information included in the submap (step S803). If the result of this check is that the matrix calculation is not to be performed, the map check unit 115 reads out from the submap memory 114 the submap corresponding to the second input channel for the first output matrix, and checks the state information included in that submap (steps S804: Yes, S811: No, S803).
  • Otherwise, the map check unit 115 causes the matrix calculation unit 112 to read the kernel to be used for the matrix calculation of the first input channel from the data memory 111, and checks whether the kernel satisfies the above-mentioned conditions (steps S804: No, S805). If the result of this check is that the matrix calculation is not to be performed, the map check unit 115 reads out from the submap memory 114 the submap corresponding to the second input channel for the first output matrix, and checks the state information included in that submap (steps S806: Yes, S811: No, S803).
  • If, on the other hand, the matrix calculation is to be performed, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S806: No).
  • In this case, the matrix calculation unit 112 reads out from the data memory 111 the data belonging to the coordinate region to be used for the matrix calculation from the input data of the first input channel, together with the bias described above (step S807).
  • Here, each of the multiple output matrices calculated in the (k-1)-th layer convolution calculation is used as an input channel.
  • Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S808 and S809). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113, which performs the above-mentioned state judgment and stores the judgment result in the submap memory 114 (step S810).
  • When this processing has been completed for all input channels, the data memory 111 stores the elements of the output matrix that result from performing the matrix calculations for all input channels.
  • Next, the controller 116 determines the coordinates of the next element of the output matrix to be calculated, and identifies the coordinate region of the input channels that corresponds to the determined coordinates (steps S811: Yes, S812: No, S801, S802).
  • The calculation device 100 then repeats the above-mentioned processing until it has been completed for all elements of the output matrix (step S812: No).
  • The above-mentioned effects can also be obtained with this method.
  • That is, state information is created based on each element of the output matrix of the convolution calculation in the immediately preceding layer, and when the state information satisfies a pre-specified condition, the matrix calculation using the input channel corresponding to the state information is skipped.
  • In this case, the matrix calculation unit 112 skips the data belonging to that input channel without reading it from the data memory 111. In other words, since unnecessary data is not read, wasted data read time is reduced, and the time required for the entire calculation can be further shortened compared to the conventional method.
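The skip logic used in the second and subsequent layers can be sketched as follows (the kernel conditions for states 2 and 3 reuse the example given elsewhere in this description; all names are illustrative). The essential point is that the data of a skipped input channel is never read at all.

```python
import numpy as np

def should_read_channel(state, kernel):
    """Decide from the submap state (0..3 for states 1..4) and the kernel whether the
    channel data must be read (steps S803-S806, simplified)."""
    zero_ratio = float(np.mean(kernel == 0))
    if state == 0:                     # state 1: skip unconditionally
        return False
    if state == 1:                     # state 2: skip if half or more of the kernel elements are zero
        return zero_ratio < 0.5
    if state == 2:                     # state 3: skip if 3/4 or more of the kernel elements are zero
        return zero_ratio < 0.75
    return True                        # state 4: always read and calculate

def layer_element(channels, kernels, states, y, x):
    """Compute one output element, reading only the channels that pass the check."""
    value = 0.0
    for channel, kernel, state in zip(channels, kernels, states):
        if not should_read_channel(state, kernel):
            continue                   # the channel data is never fetched from the data memory
        value += np.sum(kernel * channel[y:y + 3, x:x + 3])
    return value
```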
  • Note that the zero check unit 113 updates the state information in steps S707 and S810 each time the matrix calculation unit 112 outputs a calculation result.
  • However, if the reading of the input data of all input channels for the same element of the output matrix is skipped in step S804 or S806 of the flow diagram shown in FIG. 12, no zero check is performed for that element. That is, the state information corresponding to that output matrix is not updated and remains at its initial value. For this reason, it is preferable to set the initial value of the state information to, for example, "0", which indicates state 1.
  • With this configuration, the output matrix can be skipped without being read as input data in the convolution calculation of the next layer. That is, even if an element of the output matrix is not calculated, a zero-clear operation (an access to the data memory 111) to set the value of that element to zero is not required.
  • In the above description, the matrix calculation unit 112 is configured to read one coordinate region of data for each input channel. However, it is also possible to adopt a configuration in which the matrix calculation unit 112 reads multiple consecutive coordinate regions of data for each input channel and writes the results of the multiple matrix calculations to the data memory 111. This makes it possible to parallelize the matrix calculations. As can be seen from FIG. 10, reading consecutive coordinate regions of data in each input channel and performing the matrix calculations is equivalent to performing the matrix calculations for consecutive elements of the output matrix.
  • In this case, the zero check unit 113 stores, in the submap memory 114, state information based on any one of the multiple calculated output values as the state information for all of the elements corresponding to those output values.
  • FIG. 13 is a diagram for explaining the concept of this method.
  • As shown in FIG. 13, when the matrix calculations for calculating element 71a located at coordinates (2,3), element 71b located at coordinates (3,3), and element 71c located at coordinates (4,3) of the output matrix 71 are parallelized, the zero check unit 113 registers the judgment result for any one of the elements (for example, element 71c) in the submap memory 114 as the judgment result for all of the elements 71a, 71b, and 71c. This makes it easy to parallelize the matrix calculations. Note that although an example in which three consecutive data are treated as one unit is shown here, as long as the data are consecutive this method can also be applied in units of an output matrix, of a row of an output matrix, or of the input range of an input channel.
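A minimal sketch of this parallelized variant is shown below (a block of three consecutive elements is only the example of FIG. 13, and `compute_element` and `classify` are assumed helpers): a single judgment result is registered for every element of the block.

```python
def process_block(compute_element, classify, y, xs, submap_state):
    """compute_element(y, x) returns one output value; classify(value, state) returns the
    updated submap state. xs is a run of consecutive x coordinates, e.g. (2, 3, 4)."""
    values = [compute_element(y, x) for x in xs]        # e.g. elements 71a, 71b and 71c in parallel
    representative = values[-1]                         # any one of the block's output values may be used
    block_state = classify(representative, submap_state)
    return values, block_state                          # one state entry stands for the whole block
```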
  • In the above description, the map check unit 115 checks the state information contained in the submap and selects whether or not to have the matrix calculation unit 112 read the data of the input channel. From the perspective of further reducing the time required for the entire calculation, it is preferable to perform this check as few times as possible.
  • FIG. 14 is a schematic diagram showing the configuration of a calculation device that can reduce the number of such checks.
  • As shown in FIG. 14, in addition to the configuration of the calculation device 100 described above, the calculation device 200 includes a table creation unit 117 and a read control unit 118. Note that in FIG. 14, components that achieve the same effects as those of the calculation device 100 are given the same reference numerals as in FIG. 1, and detailed explanations thereof are omitted below.
  • The table creation unit 117 creates a table that specifies the output matrices to be read as input data by the matrix calculation unit 112, based on the judgment results of the map check unit 115. That is, when a convolution operation is started, the table creation unit 117 first causes the map check unit 115 to read the state information corresponding to all input channels used in that convolution operation (all output matrices calculated in the convolution operation of the previous layer) and to judge, by the above-mentioned method, whether each of them is to be read as input data by the matrix calculation unit 112. Then, based on the judgment results, the table creation unit 117 creates a table that specifies the input channels to be read as input data by the matrix calculation unit 112.
  • For example, when it is judged that the matrix calculation unit 112 is to read the first and third input channels as input data, the table creation unit 117 creates a table indicating that.
  • In this embodiment, the table creation unit 117 is configured to hold the created table itself.
  • The read control unit 118 causes the matrix calculation unit 112 to read the data to be calculated based on the table created by the table creation unit 117. As described above, when a table is created indicating that the matrix calculation unit 112 is to read the first and third input channels as input data, the read control unit 118 causes the matrix calculation unit 112 to perform the matrix calculations using only the first and third input channels in that convolution operation.
  • Note that the table creation unit 117 and the read control unit 118 can be realized, for example, by hardware including a processor and memory such as RAM or ROM, and by software stored in the memory and running on the processor.
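The cooperation between the table creation unit 117 and the read control unit 118 might look like the following sketch (one table per output matrix is assumed here, since the kernel conditions depend on the kernels of that output matrix; the decision function is passed in as a parameter and all names are illustrative): the input channels are checked once per layer, and the per-element loop then consults only the table.

```python
import numpy as np

def build_read_table(states, kernels, should_read_channel):
    """Table creation: record the indices of the input channels that must be read."""
    return [i for i, (state, kernel) in enumerate(zip(states, kernels))
            if should_read_channel(state, kernel)]

def run_output_matrix(channels, kernels, table, height, width):
    """Read control: compute one output matrix using only the channels listed in the table."""
    out = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            for i in table:                     # no per-element state check is needed any more
                out[y, x] += np.sum(kernels[i] * channels[i][y:y + 3, x:x + 3])
    return out
```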
  • Next, the operation of the calculation device 200 having the above configuration will be described. In the calculation device 200 as well, the operation differs between the first-layer convolution operation, where no submaps exist, and the second and subsequent layers, where submaps exist. However, the operation of the first-layer convolution operation is the same as the operation shown in FIG. 11, so a description thereof is omitted here.
  • FIG. 15 is a flow diagram showing the procedure by which the calculation device 200 implements this method of acquiring data in the same coordinate region for all input channels in the convolution calculations of the second and subsequent layers.
  • FIG. 15 shows an example in which the number of input channels is n and the number of output channels is m.
  • The procedure is started, for example, when the convolution calculation of the immediately preceding layer is completed.
  • When this procedure is started, the table creation unit 117 first causes the map check unit 115 to read out the state information corresponding to all input channels and to judge, by the method described above, whether each corresponding input channel is to be read as input data by the matrix calculation unit 112. The table creation unit 117 then creates the above-mentioned table based on the results of this judgment (step S1101).
  • Next, the controller 116 determines the coordinates of the element of the output matrix to be calculated and, based on the determined coordinates, identifies the coordinate region of the input channels required for the matrix calculation (steps S1102 and S1103).
  • Next, the read control unit 118 instructs the matrix calculation unit 112 to execute the matrix calculation for the input channel listed first in the table.
  • The matrix calculation unit 112 reads out from the data memory 111 the kernel to be used for the matrix calculation of the input channel listed first in the table for the first output matrix (step S1104).
  • At this time, the matrix calculation unit 112 also reads out from the data memory 111 the data belonging to the coordinate region to be used for the matrix calculation from the input data of the input channel listed first in the table, together with the bias mentioned above (step S1105).
  • Here, each of the multiple output matrices calculated in the (k-1)-th layer convolution calculation is used as an input channel.
  • Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S1106 and S1107). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113, which performs the above-mentioned state judgment and stores the judgment result in the submap memory 114 (step S1108).
  • The calculation device 200 repeats the above processing for the first output matrix until it has been completed for all input channels listed in the above table (step S1109: No).
  • At this point, the data memory 111 stores the elements of the output matrix that result from performing the matrix calculations for all input channels.
  • Next, the controller 116 determines the coordinates of the next element of the output matrix to be calculated, and identifies the coordinate region of the input channels corresponding to the determined coordinates (steps S1109: Yes, S1110: No, S1102, S1103).
  • The calculation device 200 then repeats the above-mentioned processing until it has been completed for all elements of the output matrix (step S1110: No).
  • With the calculation device 200 as well, state information is created based on each element of the output matrix of the convolution operation in the immediately preceding layer, and when the state information satisfies a pre-specified condition, the matrix calculation using the input channel corresponding to the state information is skipped.
  • In this case, the matrix calculation unit 112 skips the data belonging to that input channel without reading it from the data memory 111. In other words, since no unnecessary data is read, the wasted data read time is reduced, and the time required for the entire calculation can be further shortened compared to the conventional method.
  • Moreover, in the calculation device 200, since the input channels used for the calculation are written in a table, it is not necessary to determine whether or not to read the data belonging to each input channel every time a matrix calculation is performed. As a result, the time required for the entire calculation can be shortened even further.
  • Note that, in the above description, the zero check unit 113 updates the state information in step S1108 each time the matrix calculation unit 112 outputs a calculation result.
  • However, the state information can also be updated at other times.
  • For example, the zero check unit 113 can create the state information and store it in the submap memory 114 only when performing the matrix calculation for the last input channel, among the input channels belonging to the same layer, for which a matrix calculation is performed, that is, only when performing the matrix calculation that finally determines the values of the elements of the output matrix.
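As a minimal sketch of this variant (the dictionary-based submap and all names are illustrative assumptions), the state is written only by the matrix calculation that fixes the element's final value:

```python
def store_state_if_final(value, channel_index, last_channel_index, classify, submap, output_id):
    """Write the submap entry only for the matrix calculation that determines the element's value."""
    if channel_index == last_channel_index:
        submap[output_id] = classify(value, submap.get(output_id, 0))   # initial value: state 1 ("0")
    return value
```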
  • In addition, submaps can also be created in units of physical memory access.
  • Here, a physical memory access unit refers to the amount of data that can be obtained by one memory access.
  • FIGS. 16(a) and 16(b) are diagrams for explaining the concept of this method.
  • FIG. 16(a) corresponds to the case where the amount of data in one input channel is smaller than the memory access unit.
  • FIG. 16(b) corresponds to the case where the amount of data in one input channel is larger than the memory access unit.
  • In the example of FIG. 16(a), the memory access unit 81 contains data from three input channels, Ch0, Ch1, and Ch2.
  • The memory access unit 82 contains data from two input channels, Ch2 and Ch3.
  • In this case, creating submaps in units of input channels results in four submaps, 83a, 83b, 83c, and 83d, whereas creating submaps in units of memory access results in two submaps, 84a and 84b.
  • If the state information of the submap 84a of the memory access unit 81 is, for example, the above-mentioned state 1, the matrix calculations for the three input channels Ch0, Ch1, and Ch2 can be skipped simply by checking that single piece of state information.
  • On the other hand, even when the state information of the submap 84a of the memory access unit 81 is, for example, the above-mentioned state 4, the matrix calculation for a specific input channel may still be skipped by further checking the state information of the submaps 83a, 83b, and 83c of the three input channels Ch0, Ch1, and Ch2.
  • Such a technique can be realized by configuring the zero check unit 113 in the configuration shown in FIG. 1 to further determine, in units of memory access, whether or not each element of the output matrix produced by the matrix calculation unit 112 falls within a pre-specified range.
  • In this case, the submap memory 114 stores the determination result of the zero check unit 113 as second state information.
  • On the other hand, when the amount of data in one input channel is larger than the memory access unit, the data of one input channel is made up of multiple memory access units.
  • In the example of FIG. 16(b), the data of one input channel 95 is made up of four memory access units 91, 92, 93, and 94.
  • In this case, creating submaps in units of input channels results in one submap 96a, whereas creating submaps in units of memory access results in four submaps 97a, 97b, 97c, and 97d.
  • Even when the state information of the input-channel-unit submap 96a is, for example, the above-mentioned state 4, it may be possible to skip the matrix calculation for a portion of the input channel by further checking the state information of the four memory-access-unit submaps 97a, 97b, 97c, and 97d.
  • Such a technique can also be realized by configuring the zero check unit 113 in the configuration shown in FIG. 1 to further determine, in units of memory access, whether or not each element of the output matrix produced by the matrix calculation unit 112 falls within a pre-specified range.
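The two-level check suggested by FIG. 16 could be sketched as follows (the grouping of channels into access units, the state encoding, and the helper names are assumptions): the coarse submap of a memory access unit is consulted first, and the finer submaps are examined only when the coarse submap does not already allow a skip.

```python
def channels_to_read(access_units, channel_kernels, should_read_channel):
    """access_units: list of (unit_state, [(channel_id, channel_state), ...]) pairs,
    e.g. FIG. 16(a): one unit holding the states of Ch0, Ch1 and Ch2."""
    selected = []
    for unit_state, members in access_units:
        if unit_state == 0:                     # e.g. state 1 for the whole unit: skip every member
            continue
        for channel_id, channel_state in members:
            if should_read_channel(channel_state, channel_kernels[channel_id]):
                selected.append(channel_id)     # fall back to the finer per-channel submap
    return selected
```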
  • The operation when this method is implemented also differs between the first-layer convolution operation, where no submaps exist, and the convolution operations of the second and subsequent layers, where submaps exist.
  • The operation of the first-layer convolution operation is the same as the operation shown in FIG. 2, except that the zero check unit 113 further stores the above-mentioned second state information in the submap memory 114, so a description thereof is omitted here.
  • In the convolution operations of the second and subsequent layers, a step is added in which the map check unit 115 reads the submap corresponding to a memory access unit and checks the second state information contained in that submap. That is, as shown in FIG. 16(a), when the amount of data of one input channel is smaller than the memory access unit, the step of reading the memory-access-unit submap and checking its second state information is added before the step of checking the submap corresponding to the input channel. Also, as shown in FIG. 16(b), when the amount of data of one input channel is larger than the memory access unit, this step is added after the step of checking the submap corresponding to the input channel.
  • Note that the operation when calculating the first output matrix of a given layer is the same as the part that calculates one output matrix in the flow diagram shown in FIG. 2 and in the operation shown in FIG. 11. Also, the operation when calculating the second and subsequent output matrices of the same layer is the same as the flow diagram shown in FIG. 3 and the operation shown in FIG. 12, for the case where the state information read by the map check unit 115 is the state information corresponding to the first output matrix of the same layer.
  • Note that a pooling layer may exist between the convolution operation of the (k-1)-th layer and the convolution operation of the k-th layer. Even if pooling is performed on the output matrix, the characteristics of the output matrix before pooling are inherited by the data after pooling, so the information in the submap can be used without problems. Furthermore, in the above-described embodiments, a case has been described in which the submap includes negative-judgment information and counting information; however, it is sufficient for the submap to include at least the state information, and it is not essential that it include other information.
  • In the above-described embodiments, the matrix calculation performed by the calculation device is a matrix calculation in a convolutional layer of a convolutional neural network; however, the present invention is not limited to the convolutional layers of convolutional neural networks.
  • The present invention is applicable to any matrix calculation in which, in a series of matrix calculations, the output matrix of a previous matrix calculation is used as the data to be calculated in a subsequent matrix calculation.
  • As described above, the present invention can reduce the time that is ultimately wasted and can therefore shorten the time required for the entire calculation compared to conventional methods, and is thus useful as a calculation device.

Abstract

In the present invention, a matrix calculation unit reads calculation target data from a data memory, performs matrix calculation, and stores an output matrix in the data memory as the calculation target data. A zero check unit determines whether or not each element of the output matrix belongs to a predetermined range. A sub-map memory stores the determination result of the zero check unit as state information. On the basis of the state information stored in the sub-map memory, a map check unit determines whether or not to cause the matrix calculation unit to read, as the calculation target data, the output matrix corresponding to the state information.

Description

Calculation Device

The present invention relates to a calculation device that performs matrix calculations such as convolution calculations.

Conventionally, in the field of machine learning, convolutional neural networks (CNNs) are used to recognize images and videos. For example, in image recognition, the input image is transformed using convolutional layers and pooling layers, gradually reducing the amount of data, and finally outputting the probability value for each classification.

In the convolutional layer of a CNN, a filter multiplication operation is performed on each coordinate region (e.g., a 3x3 cell region) in the input data. The calculation result is then used as the input for the calculation in the next layer, and the convolution operation is repeated. For this reason, machine learning using a CNN requires many matrix calculations and a large memory bandwidth. As a method for alleviating this requirement, configurations are used that skip the calculation when the calculation target is zero (e.g., Patent Documents 1 and 2).

Patent Document 1: JP 2020-184309 A
Patent Document 2: JP 2018-028908 A

The techniques disclosed in Patent Documents 1 and 2 do not perform the skipped calculations, and can therefore reduce calculation time. As a result, the time required for the entire convolution operation can also be reduced. However, in these techniques, the data used as the input of the calculation target is loaded into a calculation memory, and it is then determined whether or not the data loaded into the calculation memory is zero. In other words, when the calculation is skipped as a result, data read time and memory space have already been spent on data that is not used in the calculation.

The present invention was made in view of the above circumstances, and aims to provide a calculation device that can further reduce the time that is ultimately wasted when performing matrix calculations such as convolution operations, and can thereby shorten the time required for the entire calculation compared to conventional methods.

In order to achieve the above object, the present invention employs the following technical means. First, the present invention is premised on a calculation device in which, in a series of matrix calculations, the output matrix of a previous matrix calculation is used as the data to be calculated in a subsequent matrix calculation. The calculation device according to the present invention comprises a data memory, a matrix calculation unit, a zero check unit, a submap memory, and a map check unit. The data memory stores the data to be calculated. The matrix calculation unit reads data from the data memory, performs a matrix calculation, and stores the output matrix in the data memory. The zero check unit judges whether each element of the output matrix falls within a pre-specified range. The submap memory stores the judgment result of the zero check unit as state information. The map check unit judges, based on the state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to that state information as the data to be calculated.

According to this calculation device, when the map check unit judges that the data to be calculated is not to be read, the calculation is skipped without the data being read into the matrix calculation unit, so unnecessary data read time can be reduced. As a result, the time required for the entire calculation can be shortened.

In the above configuration, the matrix calculation unit may also be configured to read out at one time multiple pieces of data with which the same output matrix is calculated successively. In this case, the zero check unit uses one of the multiple pieces of state information corresponding to the multiple pieces of data as the state information for the multiple pieces of data read out successively by the matrix calculation unit.

The above configuration is also applicable to the case where the above matrix calculation is, for example, a convolution operation in a convolutional neural network. In this case, the matrix calculation unit stores the output matrix in the data memory as the data to be calculated in the next layer of the convolution operation, and the map check unit judges, based on the state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to that state information as the data to be calculated in the next layer of the convolution operation. In this configuration, the matrix calculation unit may read out, as the data to be calculated, the same coordinate region constituting a part of each input channel for all input channels belonging to the same layer of the convolution operation, and perform the matrix calculations. Furthermore, the matrix calculation unit may read out at one time multiple pieces of data on which matrix calculations are performed successively in the same layer of the convolution operation. In this case, the zero check unit uses one of the multiple pieces of state information corresponding to the multiple pieces of data as the state information for the multiple pieces of data read out successively by the matrix calculation unit.

The above calculation device may further include a table creation unit and a read control unit. The table creation unit creates a table that specifies the output matrices to be read by the matrix calculation unit, based on the judgment results of the map check unit. The read control unit causes the matrix calculation unit to read data based on the created table.

In the above calculation device, the zero check unit may further determine, in units of memory access, whether or not each element of the output matrix produced by the matrix calculation unit falls within a pre-specified range. In this case, the submap memory stores the determination result of the zero check unit as second state information.

In the above calculation device, the map check unit may execute the subsequent matrix calculations of the first layer based on the state information stored in the submap memory as a result of the first matrix calculation of the first layer of the convolution operation.

The above calculation device may further include a submap memory buffer that stores, in association with one another, information for identifying the output matrix corresponding to a piece of state information, information indicating the storage location of that state information in the submap memory, and information indicating whether or not that state information has been used in the convolution operation of the next layer. In this configuration, a storage location associated with usage information indicating that the state information has been used in the convolution operation of the next layer is selected as the storage location in the submap memory for newly generated state information.

In the above calculation device, kernel state information, obtained by judging whether each element of a kernel used in the above matrix calculation falls within a pre-specified range, may be stored in the submap memory in advance. In this case, the map check unit judges, based on the state information and the kernel state information stored in the submap memory, whether or not to cause the matrix calculation unit to read out the output matrix corresponding to that state information as the data to be calculated.

In the above calculation device, the zero check unit may compare each element of the output matrix with multiple thresholds and judge to which of the multiple ranges defined by the multiple thresholds all of the elements of the output matrix belong.

In the above calculation device, the zero check unit may further judge whether or not a negative value exists among the elements of the output matrix, or the number of elements that belong to any one of the multiple ranges.

In the above calculation device, the zero check unit may create the state information and store it in the submap memory at the time of the matrix calculation for the input channel that is calculated last among the input channels belonging to the same layer.

According to the present invention, the time that is ultimately wasted can be reduced, and the time required for the entire calculation can be shortened further compared to conventional methods.

FIG. 1 is a schematic configuration diagram showing an example of a calculation device according to an embodiment of the present invention.
FIG. 2 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 3 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 4 is a schematic configuration diagram showing an example of a zero check unit included in a calculation device according to an embodiment of the present invention.
FIG. 5 is a schematic configuration diagram showing an example of a submap included in a calculation device according to an embodiment of the present invention.
FIG. 6 is a schematic configuration diagram showing an example of a calculation device according to an embodiment of the present invention.
FIG. 7 is a schematic configuration diagram showing an example of a submap address table included in a calculation device according to an embodiment of the present invention.
FIG. 8 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 9 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 10 is an explanatory diagram schematically showing the concept of a convolution calculation method used by a calculation device according to an embodiment of the present invention.
FIG. 11 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 12 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIG. 13 is an explanatory diagram schematically showing the concept of a convolution calculation method used by a calculation device according to an embodiment of the present invention.
FIG. 14 is a schematic configuration diagram showing an example of a calculation device according to an embodiment of the present invention.
FIG. 15 is a flow diagram showing the procedure of a convolution calculation by a calculation device according to an embodiment of the present invention.
FIGS. 16(a) and 16(b) are explanatory diagrams schematically showing the concept of a convolution calculation method used by a calculation device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in more detail with reference to the drawings. In the following, the calculation device according to the present invention, which, in a series of matrix calculations, uses the output matrix of a previous matrix calculation as the data to be calculated in a subsequent matrix calculation, is embodied as a calculation device that realizes the processing of the convolutional layers of a convolutional neural network (CNN).

As is well known, a convolutional neural network (CNN) includes convolutional layers and pooling layers. Input data such as an image to be recognized is gradually reduced in data volume as a series of processes in the convolutional layers and the pooling layers is repeatedly performed. The convolutional neural network finally outputs classification probability values indicating what kind of object the input image shows.

When the recognition target is an image, data in which the pixel values of the pixels of the image are arranged two-dimensionally for each of the R, G, and B input channels, for example, is input to the convolutional layer as input data. In the convolutional layer, for each input channel of the input image, a kernel (filter) is multiplied with a coordinate region (e.g., a 3x3 region) constituting a part of the input data. Then, for example, the results calculated for the respective input channels are added together to calculate the output value for that coordinate region. This multiplication with the kernel is performed over the entire input image by sequentially moving the coordinate region in each input channel so that successive coordinate regions partially overlap. At this time, multiple sets of kernels are prepared for each input channel according to the number of output channels produced by the convolution operation. For example, when the input data has three channels and the output data has three channels, three kernel sets, each consisting of three kernels to be multiplied with the coordinate regions of the respective input channels, are prepared.

In the pooling layer, output values are calculated by processing, for each coordinate region, the two-dimensional data (output matrix) output as the calculation result of the convolutional layer. For example, in the pooling layer, the average value or the maximum value in a 2x2 coordinate region is output as the output value for that coordinate region. Note that the pooling layer may be omitted. When the pooling layer is omitted, the data output by the convolution calculation of the first layer is used as it is as the input data for the convolution calculation of the second layer.

FIG. 1 is a schematic configuration diagram showing the configuration of a calculation device according to one embodiment of the present invention. As shown in FIG. 1, the calculation device 100 of this embodiment includes a data memory 111, a matrix calculation unit 112, a zero check unit 113, a submap memory 114, a map check unit 115, and a controller 116. The calculation device 100 creates submaps separate from the output matrices (main maps) obtained as the calculation results of a conventional convolution operation, and uses the submaps to shorten the time required for the entire convolution operation. A submap is created for each input channel used as input data for the convolution calculation of the next layer.

The data memory 111 stores the data to be calculated. As described above, when an image is the object of calculation, for example, the pixel values of the pixels constituting the image are stored in the data memory 111 as input data. In this embodiment, the data memory 111 also stores the kernels and biases, described later, that are used for the matrix calculations in the matrix calculation unit 112.

The matrix calculation unit 112 reads data from the data memory 111 and performs matrix calculations. As described above, the matrix calculation unit 112 performs, for each of the R, G, and B input channels, a matrix calculation including the multiplication of the pixel values of a coordinate region consisting of a 3x3 area with a kernel. In this embodiment, the matrix calculation unit 112 performs the matrix calculation Q = A*P + B, where P is the coordinate region data, A is the kernel, B is the bias, and Q is the calculated matrix. The matrix calculation unit 112 stores the output matrix (output channel), which is the result of the matrix calculation over the entire input data, in the data memory 111 as the data to be calculated in the next layer of the convolution operation. Although not particularly limited, in this embodiment the matrix calculation unit 112 outputs, as the output value, the sum of the elements of the calculated matrix Q obtained by performing the above matrix calculation on the coordinate region of each input channel. As described above, the matrix calculation unit 112 performs this matrix calculation over the entire input data and stores the output value for each coordinate region in the data memory 111 together with information indicating the coordinate region. Therefore, when the matrix calculation for the input data is completed, the data memory 111 stores the output matrix that is the result of performing the matrix calculation (convolution operation) on the entire input data.

Although not particularly limited, in this embodiment the matrix calculation unit 112 first performs the matrix calculation for each coordinate region of one of the three input channels. When the matrix calculations for all coordinate regions of this channel are completed, the data memory 111 stores an output matrix resulting from the matrix calculation for one input channel. Next, the matrix calculation unit 112 performs the matrix calculation for each coordinate region of one of the remaining two input channels. At this time, as the bias, a matrix is used in which, in the output matrix for the first input channel stored in the data memory 111, the elements other than the element corresponding to the coordinate region being calculated are set to zero. In this case, when the matrix calculations for all coordinate regions of the second input channel are completed, the data memory 111 stores an output matrix in which the result of the matrix calculation for the first input channel and the result of the matrix calculation for the second input channel are added together. Subsequently, the matrix calculation unit 112 performs the matrix calculation for each coordinate region of the remaining input channel. At this time, as the bias, a matrix is used in which, in the output matrix for the first and second input channels stored in the data memory 111, the elements other than the element corresponding to the coordinate region being calculated are set to zero. In this case, when the matrix calculations for all coordinate regions of the third input channel are completed, the data memory 111 stores an output matrix in which the results of the matrix calculations for the three input channels are added together.
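As a minimal sketch of the calculation Q = A*P + B and of the bias-based accumulation described above (an element-wise product is assumed, and the position at which the carried value is placed inside B is chosen only for illustration):

```python
import numpy as np

def region_output(kernel_a, region_p, previous_value=0.0):
    """One Q = A*P + B calculation for a 3x3 coordinate region; the output value is the
    sum of the elements of Q, so the non-zero element of B carries the earlier partial sum."""
    bias_b = np.zeros_like(region_p, dtype=float)
    bias_b[1, 1] = previous_value            # placement of the carried value is an assumption
    q = kernel_a * region_p + bias_b         # Q = A*P + B
    return float(np.sum(q))

def output_element(kernels, regions):
    """Accumulate one output element over the input channels; the first channel uses a zero bias."""
    value = 0.0
    for kernel_a, region_p in zip(kernels, regions):
        value = region_output(kernel_a, region_p, value)   # written back to the data memory each time
    return value
```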

The zero check unit 113 judges whether or not each element of the output matrix output by the matrix calculation unit 112, that is, the output value of the matrix calculation for each coordinate region, falls within a pre-specified range. Although not particularly limited, in this embodiment the zero check unit 113 judges to which of multiple ranges defined by multiple preset thresholds the value belongs. As described later, in this embodiment the zero check unit 113 is configured to make the above judgment each time an output value is output from the matrix calculation unit 112, based on the state information stored in the submap memory 114 at that time and on the output value from the matrix calculation unit 112.

The submap memory 114 stores the judgment results of the zero check unit 113 as state information. The state information is stored in correspondence with the above-mentioned output matrix, and is one of the pieces of information constituting the above-mentioned submap. For example, when three thresholds are preset in the zero check unit 113, the state information indicates to which of four states the output matrix belongs: "state 1", in which all elements of the output matrix are equal to or less than the first threshold; "state 2", in which all elements of the output matrix are equal to or less than the second threshold; "state 3", in which all elements of the output matrix are equal to or less than the third threshold; and "state 4", which is none of state 1, state 2, and state 3. Although not particularly limited, in this embodiment the numerical values "0", "1", "2", and "3" are stored as the information indicating state 1, state 2, state 3, and state 4, respectively. The state information is updated each time the zero check unit 113 makes a judgment. Note that the submap memory 114 may be configured as a part of the memory device constituting the data memory 111, or may be configured as a separate memory device.

Based on the state information stored in the submap memory 114, the map check unit 115 judges whether or not to cause the matrix calculation unit 112 to read out the output matrix corresponding to that state information as the input data (data to be calculated) of the next layer in the convolution operation. For example, in the case of the above example, the judgment is made as follows. When the state information indicates state 1, the map check unit 115 judges that the corresponding data (output matrix) is not to be read by the matrix calculation unit 112. When the state information indicates state 2 or state 3, the map check unit 115 judges that the corresponding data (output matrix) is not to be read by the matrix calculation unit 112 when the values of the kernel A satisfy preset conditions. The preset conditions are, for example, that for state 2 half or more of the kernel elements are zero, and that for state 3 three quarters or more of the kernel elements are zero. When the state information indicates state 4, the map check unit 115 judges that the corresponding data (output matrix) is to be read by the matrix calculation unit 112.
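A compact sketch of this state classification and of the decision rules of the map check unit is given below (the threshold values are placeholders, and the rule set mirrors the example given in this paragraph; it is not the only possible configuration).

```python
import numpy as np

THRESHOLDS = (0.0, 0.1, 0.5)       # assumed first, second and third thresholds

def matrix_state(output_matrix):
    """Return 0..3 for states 1..4: the smallest threshold that bounds every element."""
    for state, threshold in enumerate(THRESHOLDS):
        if np.all(output_matrix <= threshold):
            return state
    return 3                                            # state 4

def map_check(state, kernel_a):
    """Decide whether the matrix calculation unit should read the corresponding output matrix."""
    zero_ratio = float(np.mean(kernel_a == 0))
    if state == 0:                                      # state 1: never read
        return False
    if state == 1:                                      # state 2: skip when >= 1/2 of the kernel is zero
        return zero_ratio < 0.5
    if state == 2:                                      # state 3: skip when >= 3/4 of the kernel is zero
        return zero_ratio < 0.75
    return True                                         # state 4: always read
```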

The calculation device 100 also includes a controller 116 that controls the operation timing and the like of the data memory 111, the matrix calculation unit 112, the zero check unit 113, the submap memory 114, and the map check unit 115.

In the calculation device 100, the matrix calculation unit 112 can be realized, for example, by a processor such as a GPU (Graphic Processing Unit) specialized for image processing. Each element that performs signal processing or data processing, such as the zero check unit 113, the map check unit 115, and the controller 116, can be realized, for example, by a dedicated arithmetic circuit, or by hardware including a processor and memory such as RAM (Random Access Memory) or ROM (Read Only Memory) together with software that is stored in the memory and runs on the processor.

Next, the operation of the calculation device 100 having the above configuration will be described. As described above, when a convolution operation is performed on input data, a submap including state information is created for each output matrix used as an input channel of the convolution operation of the next layer. That is, the submaps created during the first-layer convolution operation are used during the second-layer convolution operation, and the submaps created during the second-layer convolution operation are used during the third-layer convolution operation. Therefore, in the first-layer convolution operation, no submaps corresponding to the input channels exist. In the following, the operation during the first-layer convolution operation and the operation during the convolution operations of the second and subsequent layers are described separately.

FIG. 2 is a flow diagram showing the procedure performed by the calculation device 100 of this embodiment during the first-layer convolution operation. FIG. 2 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure is started, for example, when the data to be calculated is stored in the data memory 111 from outside the calculation device 100.

 当該手順が開始されると、行列計算部112が1番目の入力チャネルについて行列計算を開始する。行列計算部112は、まず、データメモリ111から、1番目の入力チャネルに対する行列計算に使用するカーネルを読み出す(ステップS201)。また、このとき、行列計算部112は、データメモリ111から、1番目の入力チャネルに属する入力データから行列計算に使用する座標領域に属するデータ、及び上述のバイアスを読み出す(ステップS202)。上述のとおり、1番目の入力チャネルに対するバイアスはゼロ行列である。 When this procedure is started, the matrix calculation unit 112 starts matrix calculation for the first input channel. The matrix calculation unit 112 first reads out the kernel to be used for the matrix calculation for the first input channel from the data memory 111 (step S201). At this time, the matrix calculation unit 112 also reads out the data belonging to the coordinate region to be used for the matrix calculation from the input data belonging to the first input channel from the data memory 111, and the bias described above (step S202). As described above, the bias for the first input channel is a zero matrix.

 次いで、行列計算部112は行列計算を実行し、計算結果である出力値を、座標領域を示す情報とともにデータメモリ111に格納する(ステップS203、S204)。このとき、行列計算部112は出力値をゼロチェック部113に入力する。当該入力に応じて、ゼロチェック部113は、予め設定された複数の閾値により規定される複数の範囲のいずれに属するかを判断する。そして、ゼロチェック部113は、判断結果をサブマップメモリ114に格納し、既に格納されていた状態情報を更新する(ステップS205)。 Then, the matrix calculation unit 112 executes a matrix calculation, and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203 and S204). At this time, the matrix calculation unit 112 inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 determines which of a number of ranges defined by a number of preset thresholds the value belongs to. The zero check unit 113 then stores the determination result in the submap memory 114, and updates the state information already stored (step S205).

 演算装置100は、以上の処理を、1番目の入力チャネルに属する全座標領域について完了するまで繰り返し実施する(ステップS206No)。このとき、データメモリ111には1番目の入力チャネルについて行列計算された出力行列が格納されていることになる。 The calculation device 100 repeats the above process until it is completed for all coordinate regions belonging to the first input channel (step S206: No). At this time, the data memory 111 stores the output matrix calculated for the first input channel.

 When the calculation for the first input channel is complete, the matrix calculation unit 112 starts the matrix calculation for the second input channel (steps S206: Yes, S207: No). The matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the second input channel (step S201). At this time, the matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region used for the matrix calculation out of the input data belonging to the second input channel, together with the bias (step S202). As described above, the bias for the second input channel is the output matrix currently being calculated, as stored in the data memory 111 at that time, with every element other than the elements corresponding to the coordinate region being calculated set to zero.

 Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S203, S204). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 determines which of the plurality of ranges defined by the plurality of preset thresholds the output value belongs to. The zero check unit 113 then stores the determination result in the submap memory 114 and updates the state information already stored there (step S205).

 The arithmetic device 100 repeats the above processing until it has been completed for all coordinate regions belonging to the second input channel (step S206: No). At this point, the data memory 111 holds an output matrix in which the result of the matrix calculation for the first input channel and the result of the matrix calculation for the second input channel have been added together.

 Thereafter, processing similar to that for the second input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S206: Yes, S207: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 100 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S207: Yes, S208: No). As a result, m output matrices are stored in the data memory 111, and m submaps corresponding to the respective output matrices are stored in the submap memory 114. These m output matrices are used as the input channels for the convolution operation of the next layer.
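 For clarity, the following is a minimal software sketch of the FIG. 2 flow, not the claimed hardware. The function names (conv_region, update_state, first_layer), the 3×3 region size, and the example threshold values are illustrative assumptions.

```python
# Minimal software sketch of the FIG. 2 flow (assumptions: 3x3 regions, three example thresholds).
import numpy as np

def conv_region(channel, kernel, cy, cx):
    # Multiply-accumulate the 3x3 region of `channel` centered at (cy, cx) with `kernel` (S202-S203).
    region = channel[cy - 1:cy + 2, cx - 1:cx + 2]
    return float(np.sum(region * kernel))

def update_state(state, value, thresholds):
    # Raise the state to the highest threshold range the output value has reached so far (S205).
    level = sum(value > t for t in thresholds)  # 0..3 for three thresholds
    return max(state, level)

def first_layer(inputs, kernels, thresholds=(0.0, 1.0, 2.0)):
    # inputs: list of n HxW channels; kernels[m][n]: 3x3 kernel for output matrix m, input channel n.
    h, w = inputs[0].shape
    outputs, submap_states = [], []
    for per_output_kernels in kernels:                        # loop until m output matrices exist (S208)
        out = np.zeros((h, w))                                # bias for the 1st input channel: zero matrix
        state = 0
        for channel, k in zip(inputs, per_output_kernels):    # loop over the n input channels (S207)
            for cy in range(1, h - 1):                        # loop over coordinate regions (S206)
                for cx in range(1, w - 1):
                    out[cy, cx] += conv_region(channel, k, cy, cx)  # bias = value stored so far (S204)
                    state = update_state(state, out[cy, cx], thresholds)
        outputs.append(out)
        submap_states.append(state)                           # one submap state per output matrix
    return outputs, submap_states
```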

 FIG. 3 is a flow diagram showing the procedure performed by the arithmetic device 100 of this embodiment during the convolution operations of the second and subsequent layers. FIG. 3 shows an example in which the number of input channels is n and the number of output channels is m. The number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution operation of the immediately preceding layer. This procedure starts, for example, when the convolution operation of the immediately preceding layer is complete.

 When this procedure starts, the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the state information contained in that submap (step S301).

 If the state information indicates the above-described state 1, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (steps S302: Yes, S310: No, S301).

 If the state information indicates the above-described state 2 or state 3, the map check unit 115 causes the matrix calculation unit 112 to read, from the data memory 111, the kernel to be used for the matrix calculation for the first input channel, and checks whether that kernel satisfies the above-described condition (steps S302: No, S303). If the kernel satisfies the condition, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (steps S304: Yes, S310: No, S301).

 If the state information indicates the above-described state 4, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S302: No). In this case, the matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the first input channel (steps S302: No, S303). At this time, the matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region used for the matrix calculation out of the input data belonging to the first input channel, together with the bias described above (steps S304: No, S305). As described above, in the convolution operation of the k-th layer, each of the plurality of output matrices calculated in the convolution operation of the (k-1)-th layer is used as an input channel.

 Next, the matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S306, S307). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 determines which of the plurality of ranges defined by the plurality of preset thresholds the output value belongs to. The zero check unit 113 then stores the determination result in the submap memory 114 and updates the state information already stored there (step S308).

 The arithmetic device 100 repeats the above processing until it has been completed for all coordinate regions belonging to the first input channel (step S309: No).

 When the calculation for the first input channel is complete, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (steps S309: Yes, S310: No, S301). Thereafter, processing similar to that for the first input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S309: Yes, S310: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 100 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S310: Yes, S311: No). As a result, m output matrices are stored in the data memory 111, and m submaps corresponding to the respective output matrices are stored in the submap memory 114. These m output matrices are used as the input channels for the convolution operation of the next layer. The arithmetic device 100 then repeats the procedure performed during the convolution operations of the second and subsequent layers until all convolution operations for the specified number of layers are complete.
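 As a rough illustration of the per-channel skip decision made in steps S301 to S305, a sketch follows. The mapping between "state 1" through "state 4" and concrete stored values, as well as the kernel condition itself, are assumptions; this excerpt does not fix them.

```python
# Hedged sketch of the skip decision for one input channel in FIG. 3 (the state encoding and
# kernel_condition() are assumptions; the concrete condition is left to the embodiment).
def kernel_condition(kernel):
    # Hypothetical stand-in: treat an all-(near-)zero kernel as one whose result can be skipped.
    return all(abs(v) < 1e-6 for row in kernel for v in row)

def should_skip_channel(state, kernel):
    if state == 1:            # state 1: skip, without reading the channel data or the kernel
        return True
    if state in (2, 3):       # state 2 or 3: read only the kernel and test the condition
        return kernel_condition(kernel)
    return False              # state 4: always perform the matrix calculation
```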

 As described above, in the arithmetic device 100 of this embodiment, state information is created based on the elements of the output matrix of the convolution operation in the immediately preceding layer, and when that state information satisfies a prespecified condition, the matrix calculation using the input channel corresponding to that state information is skipped. Moreover, in that case, no data belonging to that input channel is read from the data memory 111 into the matrix calculation unit 112. In other words, because no unnecessary data is read, the data read time that would otherwise be wasted is further reduced, and the time required for the entire operation can be shortened further than in the conventional approach.

 Here, an example of the configuration of the zero check unit 113 will be described. FIG. 4 is a schematic configuration diagram showing an example of the zero check unit 113 provided in the arithmetic device 100. As shown in FIG. 4, the zero check unit 113 has an input terminal 31, a comparison terminal 32, and an output terminal 33. The output value from the matrix calculation unit 112 is input to the input terminal 31. The state information and other data stored in the submap memory 114 are input to the comparison terminal 32. The output terminal 33 outputs the data to be stored in the submap memory 114.

 The output value from the matrix calculation unit 112 input through the input terminal 31 is fed to a comparison unit 34 having a plurality of comparators. The comparison unit 34 includes at least as many comparators as the number of preset thresholds. As described above, three thresholds are set in this embodiment, so the comparison unit 34 includes three comparators 34a, 34b, and 34c. The output value from the matrix calculation unit 112 is input to one input terminal of each of the comparators 34a, 34b, and 34c, and the corresponding threshold is input to the other input terminal. Although not particularly limited, here each of the comparators 34a, 34b, and 34c is configured to output the value "1" when the input output value from the matrix calculation unit 112 is greater than its threshold.

 The outputs of the comparators 34a, 34b, and 34c are input to a checker 35. The state information stored in the submap memory 114 at that time is also input to the checker 35 via the comparison terminal 32. As described above, the state information stored in the submap memory 114 is a value from "0" to "3".

 When the outputs of the comparators 34a, 34b, and 34c include the value "1" and the state information stored in the submap memory 114 needs to be updated, the checker 35 outputs, to the output terminal 33, an output corresponding to the updated state information. For example, when the stored state information is "0" and at least the output of the comparator 34a is "1", the checker 35 updates the state information to "1", "2", or "3" according to the outputs of the comparators 34b and 34c. When the stored state information is "1" and at least the output of the comparator 34b is "1", the checker 35 updates the state information to "2" or "3" according to the output of the comparator 34c. When the stored state information is "2", the checker 35 updates the state information to "3" when the output of the comparator 34c is "1". When the stored state information is "3", the checker 35 does not update the state information.

 With the zero check unit 113 configured as described above, when the convolution operation for obtaining one output matrix is complete, the state information corresponding to that output matrix has been stored in the submap memory 114. The zero check unit 113 can also be realized with other configurations. For example, a configuration may be adopted in which the zero check unit 113 holds a cumulative count of the output values belonging to each range. In this case, each time an output value is output from the matrix calculation unit 112, the zero check unit 113 can determine which of the above-described ranges applies based on the cumulative counts it holds.
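 A compact way to express the checker-35 update rule described above is shown below; it is a behavioural sketch only, and the threshold values in the example call are placeholders.

```python
# Behavioural sketch of comparators 34a-34c plus checker 35 (threshold values are placeholders).
def checker_update(stored_state, output_value, t1, t2, t3):
    # Each comparator outputs 1 when the value exceeds its threshold; the checker keeps the
    # state monotone: it only ever moves toward "3" and never decreases.
    level = (output_value > t1) + (output_value > t2) + (output_value > t3)   # 0..3
    return max(stored_state, level)

print(checker_update(1, 5.0, 0.0, 2.0, 4.0))   # -> 3
```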

 Although not particularly limited, in this example the zero check unit 113 is further configured to perform, in addition to the determination described above, a negative determination of whether the output matrix contains any negative elements and a count of the number of elements in the output matrix that exceed one of the above thresholds. Negative determination information indicating the result of the negative determination and count information indicating the counting result are stored in the submap memory 114. The negative determination information and the count information, together with the state information described above, constitute the submap described above.

 Specifically, the comparison unit 34 of the zero check unit 113 includes a comparator 34d whose one input terminal receives the output value from the matrix calculation unit 112 and whose other input terminal receives the value "0". The comparator 34d is configured to output the value "1" when the input output value from the matrix calculation unit 112 is smaller than "0". The output of the comparator 34d is input to an OR circuit 36. The negative determination information stored in the submap memory 114 at that time is also input to the OR circuit 36 via the comparison terminal 32. When either the output of the comparator 34d or the negative determination information stored in the submap memory 114 is "1", the OR circuit 36 outputs the value "1" to the output terminal 33. Thus, when the convolution operation for obtaining one output matrix is complete, the value "1" is stored in the submap memory 114 as the negative determination information if the elements of that output matrix include a negative value, and the value "0" is stored as the negative determination information if they do not.

 The zero check unit 113 also includes a selector 37 to which the outputs of the comparators 34a, 34b, and 34c are input. The selector 37 feeds one preset output among the outputs of the comparators 34a, 34b, and 34c to a counter 38. The count information stored in the submap memory 114 at that time is also input to the counter 38 via the comparison terminal 32. When the value "1" is input, the counter 38 outputs, to the output terminal 33, the value obtained by adding "1" to the stored count information. For example, to count the number of elements of the output matrix that are greater than the third threshold, the selector 37 is set to pass the output value of the comparator 34c. With this configuration, when the convolution operation for obtaining one output matrix is complete, the count of the elements of that output matrix that are greater than the third threshold has been stored in the submap memory 114 as the count information.
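 The negative-determination path (comparator 34d and OR circuit 36) and the counting path (selector 37 and counter 38) can be sketched as follows; the function names are illustrative only.

```python
# Sketch of the negative-determination and counting paths (function names are illustrative).
def update_neg_flag(stored_flag, output_value):
    # Comparator 34d outputs 1 for a negative value; OR circuit 36 keeps the flag sticky at 1.
    return 1 if (stored_flag == 1 or output_value < 0) else 0

def update_count(stored_count, output_value, selected_threshold):
    # Selector 37 routes one comparator output to counter 38, which increments the stored count.
    return stored_count + (1 if output_value > selected_threshold else 0)
```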

 The negative determination information described above can be used, for example, as a flag indicating which processing to apply when the processing differs depending on whether negative values are present. The count information makes it possible, for example, to skip the corresponding output matrix (input channel) without reading it when, even though elements exceeding a preset threshold are present, their total number is small.

 FIG. 5 is a diagram showing an example of a submap. Although not particularly limited, a submap consisting of one byte (8 bits) of data is illustrated here. As described above, in this embodiment the submap 40 contains the state information, the negative determination information, and the count information. In this example, the submap 40 is composed of 2 bits of state information, 1 bit of negative determination information, and 5 bits of count information. It is preferable that the address information of the submap 40 in the submap memory 114 and the address information of the output matrix (the input channel of the next layer) corresponding to that submap 40 stored in the data memory 111 are related to each other. For example, the head address of the submap 40 may be the address obtained by adding a prespecified offset to the head address of the output matrix corresponding to that submap 40.
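 Assuming bit positions that this excerpt of FIG. 5 does not fix, the 1-byte submap and the address relation can be sketched as follows.

```python
# Sketch of the 1-byte submap 40 (bit positions are an assumption: state in the top 2 bits).
def pack_submap(state, neg, count):
    assert 0 <= state < 4 and neg in (0, 1) and 0 <= count < 32
    return (state << 6) | (neg << 5) | count            # 2-bit state | 1-bit neg flag | 5-bit count

def unpack_submap(byte):
    return (byte >> 6) & 0x3, (byte >> 5) & 0x1, byte & 0x1F

def submap_head_address(output_matrix_head, offset):
    # Example of the preferred relation: submap head = output-matrix head + prespecified offset.
    return output_matrix_head + offset
```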

 In the flow diagrams shown in FIGS. 2 and 3, the zero check unit 113 updates the state information in step S205 or step S308 each time the matrix calculation unit 112 outputs a calculation result. However, the state information may be updated at other times. For example, a configuration may be adopted in which the zero check unit 113 creates the state information and stores it in the submap memory 114 only during the matrix calculation for the last input channel among the input channels belonging to the same layer, that is, the matrix calculation in which the values of the elements of the output matrix are finalized.

 In the configuration of the arithmetic device 100 described above, submaps corresponding to all of the output matrices calculated by the matrix calculations are stored in the submap memory 114. Therefore, when the number of convolution layers or the number of output matrices in each layer is large, a large storage area must be provided for the submap memory 114. A configuration that can reduce the storage area required for the submap memory 114 is described here. FIG. 6 is a schematic configuration diagram showing a modified example of the arithmetic device in an embodiment of the present invention. In FIG. 6, components that provide the same functions and effects as those of the arithmetic device 100 are given the same reference numerals as in FIG. 1, and detailed descriptions thereof are omitted below.

 As shown in FIG. 6, the arithmetic device 300 of this embodiment further includes a submap address buffer 120. In addition to the submaps created by the arithmetic device 100 described above, the arithmetic device 300 creates a submap address table. The submap address table is stored in the submap address buffer 120. As shown in FIG. 7, the submap address table 41 is a table in which ID numbers, address information, and usage information are recorded in association with one another. The submap address buffer 120 may be configured as part of the memory device constituting the submap memory 114 or the memory device constituting the data memory 111, or may be configured as a separate memory device.

 The ID number functions as information for identifying a submap. As described above, a submap is created for each input channel used as input data for the convolution operation of the next layer, that is, for each output channel of the layer in which the convolution operation is performed. The number of ID numbers is therefore equal to the number of output channels. Although not particularly limited, in this embodiment a unique number combining a number indicating the convolution layer and a number indicating the output channel is assigned as the ID number. For example, when three output channels are computed in the (k-1)-th layer, the ID numbers "k1", "k2", and "k3" are assigned. More specifically, the ID numbers "31", "32", and "33" are assigned to the three output channels of the second layer. Since these output channels become the input channels in the operation of the third layer, the address information associated with the ID numbers "31", "32", and "33" is referenced when the corresponding submaps are read.

 The address information indicates the storage location of a submap in the submap memory 114; more specifically, it is, for example, the head address of the storage location of the submap. As described with reference to FIG. 5, the data length (number of bits) of a submap is fixed. The storage location of a submap in the submap memory 114 can therefore be identified by a single address.

 The usage information indicates whether the associated submap has already been used in the convolution operation of the next layer. In FIG. 7, the usage information is shown as "valid". As described above, a submap is created for each input channel used as input data for the convolution operation of the next layer. A submap that has been read out and used in the convolution operation of the next layer is therefore not used in any subsequent convolution operation. In this embodiment, the values "0" and "1" are used to indicate that the submap has been used and that it has not yet been used, respectively. In this case, a new ID number, new address information, and new usage information can be stored, in association with one another, in a record of the submap address table whose usage information is "0".

 In this embodiment, the controller 116 generates the ID number, the address information, and the usage information and records them in the submap address table of the submap address buffer 120, but other configurations may also be adopted. For example, a submap address management unit having the function of performing these processes may be provided separately from the controller 116.
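 A plain-data sketch of the submap address table 41, and of how a record whose usage information is "0" is reused, may look as follows; the field names and the table size are assumptions.

```python
# Sketch of the submap address table 41 (field names and table size are assumptions).
def make_id(layer_k, output_channel):
    # e.g. the 1st output channel computed for use by layer k = 2 gets the ID number "21".
    return f"{layer_k}{output_channel}"

def allocate_record(table, id_number):
    # Reuse the first record whose usage information ("valid") is 0; its address is kept as-is.
    for record in table:
        if record["valid"] == 0:
            record["id"] = id_number
            record["valid"] = 1          # the associated submap has not yet been read out
            return record["addr"]
    raise RuntimeError("no free submap record")

table = [{"id": None, "addr": addr, "valid": 0} for addr in (0x00, 0x08, 0x10)]
addr = allocate_record(table, make_id(2, 1))    # first output matrix of the first layer -> "21"
```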

 Next, the operation of the arithmetic device 300 having the above configuration will be described. FIG. 8 is a flow diagram showing the procedure performed by the arithmetic device 300 during the first-layer convolution operation. FIG. 8 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure starts, for example, when the data to be processed is stored in the data memory 111 from outside the arithmetic device 300. As described above, the arithmetic device 300 differs from the arithmetic device 100 only in that it creates the submap address table. Therefore, in the procedure shown in FIG. 8, steps in which the same operations as in the arithmetic device 100 are performed are given the same reference numerals as in FIG. 2, and detailed descriptions thereof are omitted below.

 When this procedure starts, the matrix calculation unit 112 begins the matrix calculation for the first output matrix (output channel). At this time, the controller 116 generates data for the submap address table (step S220). That is, the controller 116 generates the ID number, the address information, and the usage information described above and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0".

 According to the ID-number generation rule described above, the ID number generated at this point is "21". As the address information, an address in the submap memory 114 is selected as appropriate. For example, a configuration may be adopted in which all selectable addresses in the submap memory 114 are recorded in the submap address table 41, and the controller 116 selects the address information recorded in the record into which it is about to write as the address information to be associated with the generated ID number. Since the associated submap has not yet been read out, the usage information recorded is "1".

 When the generation of the data for the submap address table is complete, the matrix calculation unit 112 starts the matrix calculation for the first input channel. The matrix calculation procedure for the first input channel is substantially the same as the procedure described with reference to FIG. 2. That is, the kernel read (step S201), the data read (step S202), the matrix calculation (step S203), and the storage of the calculation result (step S204) are as described above. In storing the determination result by the zero check unit 113 (step S205), the zero check unit 113 stores the determination result at the storage location of the submap memory 114 designated by the address information generated by the controller 116.

 The arithmetic device 300 repeats the above processing until it has been completed for all coordinate regions belonging to the first input channel (step S206: No). At this point, the data memory 111 holds the output matrix obtained by the matrix calculation for the first input channel.

 When the calculation for the first input channel is complete, the matrix calculation unit 112 starts the matrix calculation for the second input channel (steps S206: Yes, S207: No). The matrix calculation procedure for the second input channel is also substantially the same as the procedure described with reference to FIG. 2. That is, the kernel read (step S201), the data read (step S202), the matrix calculation (step S203), and the storage of the calculation result (step S204) are as described above. In storing the determination result by the zero check unit 113 (step S205), the zero check unit 113 stores the determination result at the storage location of the submap memory 114 designated by the address information generated by the controller 116.

 The arithmetic device 300 repeats the above processing until it has been completed for all coordinate regions belonging to the second input channel (step S206: No). Thereafter, processing similar to that for the second input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S206: Yes, S207: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 300 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S207: Yes, S208: No). When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates the data for the submap address table corresponding to that output matrix (step S220). That is, the controller 116 generates an ID number, address information, and usage information and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0". According to the ID-number generation rule described above, the ID number generated at this point is "22". Since the associated submap has not yet been read out, the usage information recorded is "1".

 When all of the calculations for obtaining the predetermined number m of output matrices are complete, m output matrices are stored in the data memory 111, m submaps corresponding to the respective output matrices are stored in the submap memory 114, and the address information and usage information corresponding to each of the m output matrices have been recorded in the submap address buffer 120. These m output matrices are used as the input channels for the convolution operation of the next layer.
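 The FIG. 8 flow thus differs from FIG. 2 only in step S220 and in where the state information is written; a standalone sketch of those two points, under the same naming assumptions as above, is shown below.

```python
# Standalone sketch of step S220 and of step S205 writing to the recorded address (names assumed).
def start_output_matrix(table, submap_mem, id_number):
    record = next(r for r in table if r["valid"] == 0)   # take a record already marked as used
    record["id"], record["valid"] = id_number, 1
    submap_mem[record["addr"]] = 0                       # fresh state information for this output matrix
    return record["addr"]

def store_state(submap_mem, addr, level):
    # The zero check unit writes its result at the address chosen by the controller in S220.
    submap_mem[addr] = max(submap_mem[addr], level)
```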

 The above description illustrates a configuration in which the data for the submap address table is generated at the start of the calculation of an output matrix. However, the table data may be generated at any other time, as long as it is generated before the zero check unit 113 first stores a determination result (state information) in the submap memory 114 during the matrix calculation for that output channel.

 FIG. 9 is a flow diagram showing the procedure performed by the arithmetic device 300 of this embodiment during the convolution operations of the second and subsequent layers. FIG. 9 shows an example in which the number of input channels is n and the number of output channels is m. The number of input channels n is equal to the number m of output matrices (output channels) calculated by the convolution operation of the immediately preceding layer. This procedure starts, for example, when the convolution operation of the immediately preceding layer is complete. As described above, the arithmetic device 300 differs from the arithmetic device 100 only in that it creates the submap address table. Therefore, in the procedure shown in FIG. 9, steps in which the same operations as in the arithmetic device 100 are performed are given the same reference numerals as in FIG. 3, and detailed descriptions thereof are omitted below.

 When this procedure starts, the matrix calculation unit 112 begins the matrix calculation for the first output matrix (output channel). At this time, the controller 116 generates data for the submap address table (step S320). That is, the controller 116 generates the ID number, the address information, and the usage information described above and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0". According to the ID-number generation rule described above, the ID number generated at this point is "31". Since the associated submap has not yet been read out, the usage information recorded is "1".

 When the generation of the data for the submap address table is complete, the matrix calculation unit 112 starts the matrix calculation for the first input channel. At this time, the map check unit 115 reads, from the submap address buffer 120, the address information of the submap corresponding to the first input channel (step S321). Based on that address information, the map check unit 115 reads the submap corresponding to the first input channel from the submap memory 114 and checks the state information contained in that submap (step S301). For example, for the operation on the first input channel of the second layer, the map check unit 115 reads the address information associated with the ID number "21". The map check unit 115 also notifies the controller 116 of the ID number whose address information it has read from the submap address buffer 120. Upon receiving the notification, the controller 116 rewrites the usage information associated with that ID number in the submap address buffer 120 from "1" to "0". The usage information may be rewritten at any time, not necessarily at this point. From the viewpoint of making effective use of the storage area of the submap memory 114, however, it is preferably rewritten after the address information is read and before a new submap is stored in the submap memory 114.
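 Step S321 and the subsequent flip of the usage information can be sketched as follows; again, the dictionary-based table is only an illustration of the behaviour described above.

```python
# Sketch of step S321: look up the submap address by ID number, then mark the record as used
# so that its slot (and the submap storage it points to) can be reused by a later step S320.
def read_submap_state(table, submap_mem, id_number):
    record = next(r for r in table if r["id"] == id_number and r["valid"] == 1)
    state = submap_mem[record["addr"]]
    record["valid"] = 0        # freed: a new ID/address pair may overwrite this record later
    return state
```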

 The subsequent procedure for the first input channel is substantially the same as the procedure described with reference to FIG. 3, except that the zero check unit 113 stores the determination result at the storage location of the submap memory 114 designated by the address information generated by the controller 116 (step S308). The arithmetic device 300 repeats the above processing until it has been completed for all coordinate regions belonging to the first input channel (step S309: No).

 When the calculation for the first input channel is complete, the map check unit 115 reads, from the submap address buffer 120, the address information of the submap corresponding to the second input channel (steps S309: Yes, S310: No, S321). Based on that address information, the map check unit 115 reads the submap corresponding to the second input channel from the submap memory 114 and checks the state information contained in that submap (step S301). Thereafter, processing similar to that for the first input channel is repeated until the processing for the n-th input channel (n = 3 in this example) is complete (steps S309: Yes, S310: No). When the processing for the n-th input channel is complete, the data memory 111 holds an output matrix that is the result of performing the matrix calculation on the entire input data (all input channels).

 The arithmetic device 300 also repeats the above processing for the n input channels until the predetermined number m of output matrices (m = 3 in this example) has been obtained (steps S310: Yes, S311: No). When the matrix calculation unit 112 starts the matrix calculation for the second output matrix (output channel), the controller 116 generates the data for the submap address table corresponding to that output matrix (step S320). That is, the controller 116 generates an ID number, address information, and usage information and records them in the submap address table 41 of the submap address buffer 120. The controller 116 records the generated information in a record of the submap address table 41 whose usage information is "0". According to the ID-number generation rule described above, the ID number generated at this point is "32". Since the associated submap has not yet been read out, the usage information recorded is "1".

 When all of the calculations for obtaining the predetermined number m of output matrices are complete, m output matrices are stored in the data memory 111, m submaps corresponding to the respective output matrices are stored in the submap memory 114, and the address information and usage information corresponding to each of the m output matrices have been recorded in the submap address buffer 120. These m output matrices are used as the input channels for the convolution operation of the next layer. The arithmetic device 300 then repeats the procedure performed during the convolution operations of the second and subsequent layers until all convolution operations for the specified number of layers are complete.

 As described above, in the arithmetic device 300 of this embodiment, using the submap address table makes it possible to overwrite a new submap onto an area of the submap memory 114 that stores a submap that will not be used again. As a result, the submap memory 114 can be realized using a memory of limited size, such as a ring buffer.

 The arithmetic devices 100 and 300 described above generate a submap only for the data of the input channels used in the convolution operation and decide, based on the state information contained in the submap, whether to read the data of the corresponding input channel. However, such state information can also be applied to the kernels used in the matrix calculation of the input data. That is, a configuration may also be adopted in which state information is created for each kernel based on the elements of that kernel, and when that state information satisfies a prespecified condition, the matrix calculation using the kernel corresponding to that state information is skipped without reading the data of the input channel. In this case, since a kernel is a matrix whose element values are specified in advance, the state information of the kernel can be obtained in advance. For example, a configuration may be adopted in which the state information of the kernels is stored in the submap memory 114, and in the determination by the map check unit 115 in step S302 of FIG. 3 or FIG. 9, whether to read the data of an input channel is decided in consideration of the state information of the kernel in addition to the state information of the input channel.
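 The variant in which a precomputed kernel state is consulted together with the input-channel state before any channel data is read could be expressed as below; the encodings of both states are assumptions.

```python
# Sketch of the combined check described above (the state encodings are assumptions).
def skip_without_reading(channel_state, kernel_state):
    skip_states = {1}   # assumed encoding: "all results stayed below the first threshold"
    # Skip the matrix calculation, and the channel-data read, when either state allows it.
    return channel_state in skip_states or kernel_state in skip_states
```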

 In the arithmetic device 100 described above, the matrix calculation unit 112 performs the matrix calculations for the whole of one input channel and, after completing them, performs the matrix calculations for the whole of the next input channel. However, the matrix calculations do not have to be performed consecutively for the whole of each input channel. A configuration in which the matrix calculations are performed element by element of the output matrix is described here. In such a configuration, the matrix calculations are performed for each coordinate region constituting part of an input channel. FIG. 10 is a diagram for explaining the concept of this technique. FIG. 10 shows an example in which three output matrices (output channels) are obtained from three input channels. This technique merely changes the order in which the matrix calculation unit 112 and the other units read the data, and the configuration of the arithmetic device is the same as the configuration shown in FIG. 1.

 As shown in FIG. 10, the calculation of the element located at a particular coordinate in each of the three output matrices uses the data in the same coordinate region of the three input channels. For example, in FIG. 10, the matrix calculation that computes the element 61a located at coordinates (2,3) of the first output matrix 61 uses the data belonging to the 3×3 coordinate region 51a centered at coordinates (2,3) of the first input channel 51, the data belonging to the 3×3 coordinate region 52a centered at coordinates (2,3) of the second input channel 52, and the data belonging to the 3×3 coordinate region 53a centered at coordinates (2,3) of the third input channel 53. Similarly, the matrix calculations that compute the element 62a located at coordinates (2,3) of the second output matrix 62 and the element 63a located at coordinates (2,3) of the third output matrix 63 also use the data belonging to the coordinate region 51a of the first input channel 51, the data belonging to the coordinate region 52a of the second input channel 52, and the data belonging to the coordinate region 53a of the third input channel 53; only the kernels differ.

 Therefore, the same output matrices as those calculated in the above-described embodiment can also be calculated by reading, in order, the data of the coordinate region 51a of the first input channel 51, the data of the coordinate region 52a of the second input channel 52, and the data of the coordinate region 53a of the third input channel 53, then reading, in order, the data of the coordinate regions at the next position in each of the input channels 51, 52, and 53, and performing the matrix calculations.
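 The equivalence of the two traversal orders can be illustrated with a short sketch; it follows the region-by-region order of FIG. 10 (detailed in FIG. 11 below), and the 3×3 region size is again an assumption.

```python
# Sketch of the coordinate-region-major traversal of FIG. 10/FIG. 11 (3x3 regions assumed).
import numpy as np

def conv_region_major(inputs, kernels):
    # Outer loops: output matrix, then coordinates; inner loop: input channels 51, 52, 53, ...
    h, w = inputs[0].shape
    outputs = [np.zeros((h, w)) for _ in kernels]
    for m, per_output_kernels in enumerate(kernels):
        for cy in range(1, h - 1):
            for cx in range(1, w - 1):
                acc = 0.0                                  # bias starts from the zero element
                for channel, k in zip(inputs, per_output_kernels):
                    region = channel[cy - 1:cy + 2, cx - 1:cx + 2]
                    acc += float(np.sum(region * k))       # same accumulation as the FIG. 2 order
                outputs[m][cy, cx] = acc
    return outputs
```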

 FIG. 11 is a flow diagram showing the procedure by which the arithmetic device 100 performs, in the first-layer convolution operation, the technique of acquiring the data of the same coordinate region for all input channels. FIG. 11 shows an example in which the number of input channels is n and the number of output channels is m. Although not particularly limited, this procedure starts, for example, when the data to be processed is stored in the data memory 111 from outside the arithmetic device 100.

 When this procedure starts, the coordinates of the element of the output matrix to be calculated are first determined (step S701). Based on the determined coordinates, the coordinate regions of the input channels required for the matrix calculation are then identified (step S702). In this embodiment, the controller 116 determines the coordinates of the element in the output matrix and identifies the coordinate regions in the input channels.

 The matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the first input channel with respect to the first output matrix (step S703). The matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region of the first input channel and the bias described above (step S704). As described above, the bias for the first input channel is a zero matrix.

 The matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S705, S706). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 performs the state determination described above and stores the determination result in the submap memory 114 (step S707).

 Next, the matrix calculation unit 112 reads, from the data memory 111, the kernel to be used for the matrix calculation for the second input channel with respect to the first output matrix (steps S708: No, S703). The matrix calculation unit 112 also reads, from the data memory 111, the data belonging to the coordinate region of the second input channel and the bias described above (step S704). In this case, the bias is the output matrix currently being calculated, as stored in the data memory 111 at that time, with every element other than the element being calculated set to zero.

 The matrix calculation unit 112 executes the matrix calculation and stores the output value, which is the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S705, S706). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response to this input, the zero check unit 113 performs the state determination described above and stores the determination result in the submap memory 114 (step S707).

 The arithmetic device 100 repeats the above processing for the first output matrix until the processing for the n-th input channel (n = 3 in this example) is complete (step S708: No). When the processing for the n-th input channel is complete, the data memory 111 holds the element of the output matrix that is the result of performing the matrix calculation on all input channels.

 When the processing for one element of the output matrix is complete, the controller 116 determines the coordinates of the next element of the output matrix to be calculated and identifies the coordinate regions of the input channels corresponding to the determined coordinates (steps S708: Yes, S709: No, S701, S702). The arithmetic device 100 then repeats the above processing until it has been completed for all elements of the output matrix (step S709: No).

 また、演算装置100は、以上の処理を、m番目(本例では、m=3)の出力行列に対する処理が完了するまで繰り返し実施する(ステップS709Yes、S710No)。 The calculation device 100 also repeats the above process until processing for the mth (in this example, m=3) output matrix is completed (steps S709: Yes, S710: No).
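As a rough illustration only, the flow of FIG. 11 can be pictured as three nested loops: over output matrices, over output-matrix elements, and over input channels. The following Python sketch follows that structure under several assumptions that are not part of the configuration described above: a fixed 3x3 coordinate region, accumulation in a local variable instead of the stored bias, and a simplified zero check that records only whether the whole output matrix is zero rather than the multi-state judgment described earlier.

```python
import numpy as np

def first_layer_convolution(inputs, kernels, out_shape):
    """Sketch of the FIG. 11 flow (first layer, no submaps consulted yet).

    inputs    : list of n input-channel matrices (H x W arrays)
    kernels   : kernels[m][c] is the 3x3 kernel for output matrix m, channel c
    out_shape : assumed to be (H - 2, W - 2) so every 3x3 window is valid
    """
    data_memory = {}     # stands in for the data memory 111
    submap_memory = {}   # stands in for the submap memory 114
    for m in range(len(kernels)):                              # loop over m output matrices
        out = np.zeros(out_shape)
        for (y, x) in np.ndindex(*out_shape):                  # S701/S702: element + region
            acc = 0.0
            for c, channel in enumerate(inputs):               # S703-S708: n input channels
                region = channel[y:y + 3, x:x + 3]             # S704: read coordinate region
                acc += float(np.sum(region * kernels[m][c]))   # S705: matrix calculation
            out[y, x] = acc                                    # S706: store with coordinates
        data_memory[m] = out
        submap_memory[m] = bool(np.all(out == 0))              # S707: simplified zero check
    return data_memory, submap_memory
```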

FIG. 12 is a flowchart showing the procedure by which the calculation device 100 carries out the method of acquiring data of the same coordinate region for all input channels in the convolution operations of the second and subsequent layers. FIG. 12 shows a case in which the number of input channels is n and the number of output channels is m. Although not particularly limited, the procedure is started, for example, when the convolution operation of the immediately preceding layer has been completed.

When the procedure starts, the controller 116 first determines the coordinates of the output-matrix element to be calculated and, based on the determined coordinates, identifies the coordinate region of the input channels required for the matrix calculation (steps S801 and S802).

The map check unit 115 reads from the submap memory 114 the submap corresponding to the first input channel for the first output matrix and checks the state information contained in that submap (step S803). If the check indicates that the matrix calculation is not to be performed, the map check unit 115 reads from the submap memory 114 the submap corresponding to the second input channel for the first output matrix and checks the state information contained in that submap (steps S804: Yes, S811: No, S803).

If the check indicates that the matrix calculation is to proceed, the map check unit 115 causes the matrix calculation unit 112 to read from the data memory 111 the kernel to be used in the matrix calculation for the first input channel, and checks whether that kernel satisfies the condition described above (steps S804: No, S805). If this check indicates that the matrix calculation is not to be performed, the map check unit 115 reads from the submap memory 114 the submap corresponding to the second input channel for the first output matrix and checks the state information contained in that submap (steps S806: Yes, S811: No, S803).

If the kernel check indicates that the matrix calculation is to proceed, the map check unit 115 causes the matrix calculation unit 112 to execute the matrix calculation (step S806: No). In this case, the matrix calculation unit 112 reads, from the input data belonging to the first input channel stored in the data memory 111, the data belonging to the coordinate region used in the matrix calculation, together with the bias described above (step S807). As described above, in the convolution operation of the k-th layer, each of the output matrices calculated in the convolution operation of the (k-1)-th layer is used as an input channel.

The matrix calculation unit 112 then executes the matrix calculation and stores the output value, i.e., the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S808 and S809). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response, the zero check unit 113 performs the state judgment described above and stores the judgment result in the submap memory 114 (step S810).

The calculation device 100 repeats the above processing for the first output matrix until the processing for the n-th input channel (n = 3 in this example) is completed (step S811: No). When the processing for the n-th input channel is completed, the data memory 111 holds the elements of the output matrix obtained by performing the matrix calculations over all input channels.

When the processing for one element of the output matrix is completed, the controller 116 determines the coordinates of the next output-matrix element to be calculated and identifies the coordinate region of the input channels corresponding to the determined coordinates (steps S811: Yes, S812: No, S801, S802). The calculation device 100 then repeats the above processing until it has been completed for all elements of the output matrix (step S812: No).

The calculation device 100 further repeats the above processing until the processing for the m-th output matrix (m = 3 in this example) is completed (steps S812: Yes, S813: No).

As explained above, the effects described earlier are also obtained with this method. In the calculation device 100, state information is created from the elements of the output matrix of the convolution operation in the immediately preceding layer, and when that state information satisfies a pre-specified condition, the matrix calculation using the corresponding input channel is skipped. Moreover, the data belonging to that input channel is then never read out of the data memory 111 into the matrix calculation unit 112. Because no unnecessary data reads occur, the read time that would otherwise be wasted is reduced further, and the time required for the overall operation can be shortened even more than in conventional devices.
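Purely as an illustration of the skip behaviour just described, the sketch below mirrors the FIG. 12 flow in Python. The boolean channel_state encoding, the kernel_skippable stand-in for the kernel condition of step S805, and the all-zero test in place of the multi-state judgment are all assumptions made for readability; the point of the sketch is that the data of a skipped channel is never touched inside the loop.

```python
import numpy as np

def convolve_layer_with_skip(prev_outputs, kernels, out_shape,
                             channel_state, kernel_skippable):
    """Sketch of the FIG. 12 flow for the second and later layers.

    prev_outputs    : output matrices of layer k-1, used here as input channels
    channel_state   : channel_state[c] is True when the submap says channel c
                      can be skipped (a simplification of the actual states)
    kernel_skippable: callable standing in for the kernel condition of step S805
    """
    new_outputs, new_states = [], []
    for m in range(len(kernels)):                          # one pass per output matrix
        out = np.zeros(out_shape)
        for (y, x) in np.ndindex(*out_shape):              # S801/S802
            for c, channel in enumerate(prev_outputs):
                if channel_state[c]:                       # S803/S804: skip on state info,
                    continue                               # channel data is never read
                kern = kernels[m][c]                       # S805: kernel read and checked
                if kernel_skippable(kern):                 # S806: skip on kernel condition
                    continue
                region = channel[y:y + 3, x:x + 3]         # S807: only now is data read
                out[y, x] += float(np.sum(region * kern))  # S808/S809
        new_outputs.append(out)
        new_states.append(bool(np.all(out == 0)))          # S810: states for the next layer
    return new_outputs, new_states
```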

In the flowcharts of FIGS. 11 and 12, the zero check unit 113 updates the state information in steps S707 and S810 every time the matrix calculation unit 112 outputs a calculation result. The state information may, however, be updated at other times. For example, a configuration may be adopted in which the zero check unit 113 creates the state information and stores it in the submap memory 114 only for the matrix calculation of the last input channel among the input channels belonging to the same layer, that is, the matrix calculation in which the values of the output-matrix elements become final.

If, in step S804 or S806 of the flowchart of FIG. 12, the reading of the input data of all input channels for a given element of the output matrix is skipped, no zero check is performed for that element, and the state information corresponding to that output matrix remains at its initial value. It is therefore preferable to set the initial value of the state information to, for example, "0", which indicates state 1. With this configuration, when the reading of the input data of all input channels has been skipped, the output matrix is also skipped in the next layer of the convolution operation without being read as input data. In other words, even when an element of the output matrix is never calculated, no zero-clear operation (an access to the data memory 111) is needed to set that element to zero.
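A minimal sketch of the initialization described here, with the integer encoding assumed for illustration:

```python
def init_submap_memory(num_output_matrices, state_skip=0):
    """Sketch only: every state entry starts at the value meaning "state 1 / skip",
    so an output matrix whose calculation was skipped for all input channels is
    also skipped by the next layer without any zero-clear access to the data
    memory. The integer encoding is an assumption."""
    return {m: state_skip for m in range(num_output_matrices)}
```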

In the description so far, the matrix calculation unit 112 reads one piece of coordinate-region data for each input channel. It is, however, also possible to adopt a configuration in which the matrix calculation unit 112 reads several consecutive pieces of coordinate-region data for each input channel and writes the results of the corresponding matrix calculations to the data memory 111, which makes it possible to parallelize the matrix calculations. As can be seen from FIG. 10, reading consecutive coordinate-region data in each input channel and performing the matrix calculations is equivalent to performing the matrix calculations for consecutive elements of the output matrix.

When the matrix calculation unit 112 parallelizes several matrix calculations, the output values of those calculations are produced at the same time, which complicates the storage of the output-matrix state information in the submap memory 114 by the zero check unit 113. Therefore, when consecutive coordinate-region data are read from each input channel for the matrix calculations, the zero check unit 113 stores state information based on any one of the calculated output values in the submap memory 114 as the state information for all elements corresponding to those output values.

FIG. 13 is a diagram for explaining the concept of this method; only one output matrix is shown. For example, when the matrix calculations for element 71a at coordinates (2, 3), element 71b at coordinates (3, 3), and element 71c at coordinates (4, 3) of the output matrix 71 in FIG. 13 are parallelized, the zero check unit 113 registers the judgment result for one of those elements (for example, element 71c) in the submap memory 114 as the judgment result for all of the elements 71a, 71b, and 71c. This makes it easy to parallelize the matrix calculations. Although the example here treats three consecutive data as one unit, the method can equally be applied, as long as the data are consecutive, in units of an output matrix, of a row of the output matrix, or of the input range of an input channel.
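The grouped judgment can be sketched as follows; the horizontal grouping of three elements, the 3x3 window, and the use of the last value as the representative are assumptions chosen to match the FIG. 13 example, not a definitive implementation.

```python
import numpy as np

def grouped_matrix_calculation(channel, kernel, start_yx, group=3):
    """Sketch of the parallelized variant of FIG. 13: `group` consecutive output
    elements are computed together and a single judgment result is registered
    for all of them."""
    y, x = start_yx
    values = [float(np.sum(channel[y:y + 3, x + i:x + i + 3] * kernel))
              for i in range(group)]
    representative_state = bool(values[-1] == 0)   # one result stands for the group
    return values, representative_state
```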

In the configuration described above, in which data of the same coordinate region is acquired for all input channels, the map check unit 115 checks the state information contained in the submaps and selects whether the matrix calculation unit 112 is made to read the data of each input channel. From the viewpoint of further reducing the time required for the overall operation, it is preferable that the number of these checks be as small as possible.

FIG. 14 is a schematic configuration diagram showing a calculation device that can reduce the number of such checks. As shown in FIG. 14, the calculation device 200 includes, in addition to the configuration of the calculation device 100 described above, a table creation unit 117 and a read control unit 118. In FIG. 14, components that provide the same functions and effects as in the calculation device 100 are given the same reference numerals as in FIG. 1, and detailed description of them is omitted below.

The table creation unit 117 creates, based on the judgment results of the map check unit 115, a table that specifies the output matrices to be read by the matrix calculation unit 112 as input data. That is, when a convolution operation is started, the table creation unit 117 first causes the map check unit 115 to read the state information corresponding to all of the input channels used in that convolution operation (all of the output matrices calculated in the convolution operation of the preceding layer) and to judge, by the method described above, whether each of them should be read by the matrix calculation unit 112 as input data. Based on those judgment results, the table creation unit 117 then creates a table specifying the input channels to be read by the matrix calculation unit 112 as input data. For example, when there are three input channels and the map check unit 115 judges that the first and third input channels should be read by the matrix calculation unit 112 as input data, the table creation unit 117 creates a table indicating this. Although not particularly limited, in this embodiment the table creation unit 117 holds the created table itself.

The read control unit 118 causes the matrix calculation unit 112 to read the data to be calculated in accordance with the table created by the table creation unit 117. When, as above, a table has been created indicating that the first and third input channels are to be read by the matrix calculation unit 112 as input data, the read control unit 118 causes the matrix calculation unit 112 to execute the matrix calculations of the convolution operation using only the first and third input channels.
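As a hedged sketch of how the table creation unit 117 and the read control unit 118 might divide the work (the state encoding and the helper names are assumptions):

```python
def build_channel_table(channel_states, state_skip=0):
    """Sketch of the table creation unit 117: all states are examined once, before
    the layer starts, and only the channels that need to be read are listed."""
    return [c for c, s in enumerate(channel_states) if s != state_skip]

def read_with_table(table, prev_outputs, kernels_for_m, accumulate):
    """Sketch of the read control unit 118: the matrix calculation unit only ever
    sees the channels listed in the table, so no per-calculation check remains."""
    for c in table:
        accumulate(prev_outputs[c], kernels_for_m[c])
```

With three input channels whose assumed states are [1, 0, 2], build_channel_table returns [0, 2], matching the example above in which only the first and third channels are read.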

In the calculation device 200, the table creation unit 117 and the read control unit 118 can be realized, for example, by hardware comprising a processor and a memory such as a RAM or ROM, together with software stored in that memory and running on the processor.

Next, the operation of the calculation device 200 having the above configuration will be described. In the calculation device 200 as well, the operation differs between the convolution operation of the first layer, for which no submaps exist, and the convolution operations of the second and subsequent layers, for which submaps exist. The operation of the first-layer convolution, however, is identical to the operation shown in FIG. 11 and is therefore not described again here.

FIG. 15 is a flowchart showing the procedure by which the calculation device 200 carries out the method of acquiring data of the same coordinate region for all input channels in the convolution operations of the second and subsequent layers. FIG. 15 shows a case in which the number of input channels is n and the number of output channels is m. Although not particularly limited, the procedure is started, for example, when the convolution operation of the immediately preceding layer has been completed.

When the procedure starts, the table creation unit 117 first causes the map check unit 115 to read the state information corresponding to all of the input channels and to judge, by the method described above, whether each of them should be read by the matrix calculation unit 112 as input data. The table creation unit 117 then creates the table described above based on those judgment results (step S1101).

When the table has been created, the controller 116 determines the coordinates of the output-matrix element to be calculated and, based on the determined coordinates, identifies the coordinate region of the input channels required for the matrix calculation (steps S1102 and S1103).

Based on the table created by the table creation unit 117, the read control unit 118 instructs the matrix calculation unit 112 to execute the matrix calculation for the input channel listed first in the table. In accordance with this instruction, the matrix calculation unit 112 reads from the data memory 111 the kernel to be used, for the first output matrix, in the matrix calculation for the input channel listed first in the table (step S1104). At this time, the matrix calculation unit 112 also reads from the data memory 111, out of the input data belonging to the input channel listed first in the table, the data belonging to the coordinate region used in the matrix calculation, together with the bias described above (step S1105). As described above, in the convolution operation of the k-th layer, each of the output matrices calculated in the convolution operation of the (k-1)-th layer is used as an input channel.

The matrix calculation unit 112 then executes the matrix calculation and stores the output value, i.e., the calculation result, in the data memory 111 together with information indicating the coordinate region (steps S1106 and S1107). At this time, the matrix calculation unit 112 also inputs the output value to the zero check unit 113. In response, the zero check unit 113 performs the state judgment described above and stores the judgment result in the submap memory 114 (step S1108).

The calculation device 200 repeats the above processing for the first output matrix until the processing for all of the input channels listed in the table has been completed (step S1109: No). When the processing for all of the input channels in the table is completed, the data memory 111 holds the elements of the output matrix obtained by performing the matrix calculations over all input channels.

When the processing for one element of the output matrix is completed, the controller 116 determines the coordinates of the next output-matrix element to be calculated and identifies the coordinate region of the input channels corresponding to the determined coordinates (steps S1109: Yes, S1110: No, S1102, S1103). The calculation device 200 then repeats the above processing until it has been completed for all elements of the output matrix (step S1110: No).

The calculation device 200 further repeats the above processing until the processing for the m-th output matrix (m = 3 in this example) is completed (steps S1110: Yes, S1111: No).

As explained above, in the calculation device 200 as well, state information is created from the elements of the output matrix of the convolution operation in the immediately preceding layer, and when that state information satisfies a pre-specified condition, the matrix calculation using the corresponding input channel is skipped. Moreover, the data belonging to that input channel is then never read out of the data memory 111 into the matrix calculation unit 112. Because no unnecessary data reads occur, the read time that would otherwise be wasted is reduced further, and the time required for the overall operation can be shortened even more than in conventional devices. In addition, because the input channels to be used in the operation are listed in the table, the calculation device 200 does not need to decide, for every matrix calculation, whether to read the data belonging to an input channel. As a result, the time required for the overall operation can be shortened still further.
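A compact sketch of the FIG. 15 flow, again under assumed encodings and a fixed 3x3 window, shows how the single table lookup in step S1101 removes every per-calculation state check from the inner loops:

```python
import numpy as np

def layer_with_table(prev_outputs, kernels, out_shape, channel_states, state_skip=0):
    """Sketch of the FIG. 15 flow: the submap states are consulted once (S1101) and
    the inner loops then follow the table without re-checking them."""
    table = [c for c, s in enumerate(channel_states) if s != state_skip]    # S1101
    outputs = []
    for m in range(len(kernels)):
        out = np.zeros(out_shape)
        for (y, x) in np.ndindex(*out_shape):                 # S1102/S1103
            for c in table:                                   # only tabled channels are read
                region = prev_outputs[c][y:y + 3, x:x + 3]    # S1104/S1105
                out[y, x] += float(np.sum(region * kernels[m][c]))   # S1106/S1107
        outputs.append(out)
    return outputs
```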

In the flowchart of FIG. 15, the zero check unit 113 updates the state information in step S1108 every time the matrix calculation unit 112 outputs a calculation result. The state information may, however, be updated at other times. For example, a configuration may be adopted in which the zero check unit 113 creates the state information and stores it in the submap memory 114 only for the matrix calculation of the last input channel among the input channels belonging to the same layer, that is, the matrix calculation in which the values of the output-matrix elements become final.

The description so far has dealt with a configuration in which a submap containing state information is created for each output matrix. A submap can, however, also be created for each physical memory-access unit, where a physical memory-access unit means the amount of data that can be acquired by a single memory access.

FIGS. 16(a) and 16(b) are diagrams for explaining the concept of this method. FIG. 16(a) corresponds to the case in which the amount of data of one input channel is smaller than the memory-access unit, and FIG. 16(b) to the case in which it is larger.

As shown in FIG. 16(a), when the amount of data of one input channel is smaller than the memory-access unit, a single memory-access unit contains the data of several input channels. In the example of FIG. 16(a), the memory-access unit 81 contains the data of three input channels Ch0, Ch1, and Ch2, and the memory-access unit 82 contains the data of two input channels Ch2 and Ch3. In this example, creating submaps per input channel yields four submaps 83a, 83b, 83c, and 83d, whereas creating submaps per memory-access unit yields two submaps 84a and 84b.

In this case, if the state information of the submap 84a of the memory-access unit 81 is, for example, state 1 described above, the matrix calculations for the three input channels Ch0, Ch1, and Ch2 can be skipped merely by checking that single piece of state information. If the state information of the submap 84a of the memory-access unit 81 is, for example, state 4 described above, the matrix calculation for a particular input channel can still be skipped by additionally checking the state information of the submaps 83a, 83b, and 83c of the three input channels Ch0, Ch1, and Ch2. In other words, compared with the configuration described above, the number of submap checks performed by the map check unit 115 can potentially be reduced.

Such a method can be realized, in the configuration shown in FIG. 1, by configuring the zero check unit 113 to further judge, per memory-access unit, whether the elements of the output matrix produced by the matrix calculation unit 112 fall within the pre-specified range. In this case, the zero check unit 113 can store this judgment result in the submap memory 114 as second state information.

On the other hand, as shown in FIG. 16(b), when the amount of data of one input channel is larger than the memory-access unit, the data of one input channel is made up of several memory-access units. In the example of FIG. 16(b), the data of one input channel 95 is made up of four memory-access units 91, 92, 93, and 94. In this example, creating a submap per input channel yields one submap 96a, whereas creating submaps per memory-access unit yields four submaps 97a, 97b, 97c, and 97d.

In this case, if the state information of the per-channel submap 96a is, for example, state 4 described above, it may be possible to skip the matrix calculation for a part of the input channel by additionally checking the state information of the four per-memory-access-unit submaps 97a, 97b, 97c, and 97d.

This method can likewise be realized, in the configuration shown in FIG. 1, by configuring the zero check unit 113 to further judge, per memory-access unit, whether the elements of the output matrix produced by the matrix calculation unit fall within the pre-specified range.
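The two-level check can be sketched as a single decision function; the integer state encoding and the idea that the coarse check is consulted first are assumptions made for illustration:

```python
def channel_needs_read(access_unit_state, channel_state, state_skip=0):
    """Sketch of the hierarchical check of FIG. 16: the coarse second state
    information (per memory-access unit) is consulted first, and only when it
    does not allow skipping is the finer per-channel state examined."""
    if access_unit_state == state_skip:   # e.g. submap 84a in state 1:
        return False                      # Ch0-Ch2 skipped with a single check
    return channel_state != state_skip    # otherwise fall back to the channel submap
```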

Next, the operation when this method is carried out will be described. In this method as well, the operation differs between the convolution operation of the first layer, for which no submaps exist, and the convolution operations of the second and subsequent layers, for which submaps exist. The operation of the first-layer convolution, however, is identical to the operation shown in FIG. 2 except that the zero check unit 113 additionally stores the second state information described above in the submap memory 114, and it is therefore not described again here.

For the convolution operations of the second and subsequent layers, a step in which the submap corresponding to the memory access is read and the second state information contained in it is checked is added to the flowchart of FIG. 3, either before or after the step in which the map check unit 115 reads the submap corresponding to the input channel and checks its state information. That is, as shown in FIG. 16(a), when the amount of data of one input channel is smaller than the memory-access unit, the step of reading the submap corresponding to the memory access and checking its second state information is added before the step of checking the state information of the per-channel submap. Conversely, as shown in FIG. 16(b), when the amount of data of one input channel is larger than the memory-access unit, the step of reading the submap corresponding to the memory access and checking its second state information is added after the step of checking the state information of the per-channel submap.

As explained above, the effects described earlier are also obtained with this method. In addition, this method can increase the number of occasions on which the data belonging to an input channel is skipped without ever being read into the matrix calculation unit 112.

The description so far has assumed a configuration in which state information is not used in the convolution operation of the first layer, but it is not impossible to use state information there as well. As explained with reference to FIG. 10, in a convolution operation several output matrices are calculated from the same input channels within the matrix calculations of a single layer. The state information registered in the submap memory 114 during the matrix calculations for the first output matrix can therefore be used in the matrix calculations for the second and subsequent output matrices of the convolution operation of that same layer.

In this configuration, the operation when calculating the first output matrix of a layer is the same as the part of the flowchart of FIG. 2, or of the operation of FIG. 11, in which one output matrix is calculated. The operation when calculating the second and subsequent output matrices of the same layer is the same as the flowchart of FIG. 3, or the operation of FIG. 12, with the state information read by the map check unit 115 taken to be the state information corresponding to the first output matrix of that layer.
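One possible reading of this first-layer variant is sketched below; the per-channel all-zero test and the loop ordering are assumptions and do not reproduce the exact states or flow described above.

```python
import numpy as np

def first_layer_with_state_reuse(inputs, kernels, out_shape):
    """Sketch only: states registered while the first output matrix of layer 1 is
    computed steer the remaining output matrices of the same layer, since every
    output matrix of a layer reads the same input channels."""
    channel_state = {}
    outputs = []
    for m in range(len(kernels)):
        out = np.zeros(out_shape)
        for c, channel in enumerate(inputs):
            if m > 0 and channel_state.get(c, False):      # reuse states from matrix 0
                continue                                   # -> this channel is not read
            for (y, x) in np.ndindex(*out_shape):
                out[y, x] += float(np.sum(channel[y:y + 3, x:x + 3] * kernels[m][c]))
            if m == 0:
                channel_state[c] = bool(np.all(channel == 0))  # registered once
        outputs.append(out)
    return outputs
```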

As explained above, according to the present invention, the time that would otherwise be wasted can be reduced further, and the time required for the overall operation can be shortened even more than in conventional devices.

The embodiments described above do not limit the technical scope of the present invention, and various modifications and applications other than those already described are possible. For example, although the above embodiments do not mention a pooling layer, a pooling layer may be present between the convolution operation of the (k-1)-th layer and that of the k-th layer. Even when pooling is applied to an output matrix, the characteristics of the output matrix before pooling are carried over into the pooled data, so the submap information can be used without any problem. Furthermore, although the embodiments describe cases in which the submap contains negative-judgment information and count information, the submap need only contain at least the state information, and the inclusion of other information is not essential.

The various configurations mentioned in the embodiments can be used in any combination. Furthermore, the flowcharts shown in FIGS. 2, 3, 8, 9, 11, 12, and 15 are merely examples, and the order of the steps can be changed as appropriate as long as an equivalent effect is obtained.

The above description has dealt with the case in which the matrix calculations performed by the calculation device according to the present invention are the matrix calculations of a convolutional layer of a convolutional neural network, but the present invention is not limited to the convolutional layers of convolutional neural networks. The present invention is applicable to any series of matrix calculations in which the output matrix of an earlier matrix calculation is used as the data to be operated on in a later matrix calculation.

According to the present invention, the time that would otherwise be wasted can be reduced, and as a result the time required for the overall operation can be shortened compared with conventional devices, which makes the invention useful as a calculation device.

100, 200, 300 Calculation device
111 Data memory
112 Matrix calculation unit
113 Zero check unit
114 Submap memory
115 Map check unit
116 Controller
117 Table creation unit
118 Read control unit
120 Submap address buffer

Claims (14)

1. A calculation device that, in a series of matrix calculations, uses the output matrix of an earlier matrix calculation as data to be operated on in a later matrix calculation, the calculation device comprising:
a data memory that stores the data to be operated on;
a matrix calculation unit that reads the data from the data memory, performs a matrix calculation, and stores an output matrix in the data memory;
a zero check unit that judges whether each element of the output matrix falls within a pre-specified range;
a submap memory that stores the judgment results of the zero check unit as state information; and
a map check unit that judges, based on the state information stored in the submap memory, whether to cause the matrix calculation unit to read the output matrix corresponding to that state information as data to be operated on.

2. The calculation device according to claim 1, wherein the matrix calculations are convolution operations in a convolutional neural network, the matrix calculation unit stores the output matrix in the data memory as data to be operated on by the next layer of the convolution operation, and the map check unit judges, based on the state information stored in the submap memory, whether to cause the matrix calculation unit to read the output matrix corresponding to that state information as data to be operated on by the next layer of the convolution operation.

3. The calculation device according to claim 2, wherein the matrix calculation unit reads, as the data, the same coordinate region, constituting a part of each input channel, from all input channels belonging to the same layer of the convolution operation, and performs the matrix calculation.

4. The calculation device according to claim 1, wherein the matrix calculation unit reads at one time a plurality of the data with which the same output matrix is calculated successively, and the zero check unit uses, as the state information of the plurality of the data read successively into the matrix calculation unit, one of the plurality of pieces of state information corresponding to the plurality of the data.

5. The calculation device according to claim 3, wherein the matrix calculation unit reads at one time a plurality of the data on which matrix calculations are performed successively within the same layer of the convolution operation, and the zero check unit uses, as the state information of the plurality of the data read successively into the matrix calculation unit, one of the plurality of pieces of state information corresponding to the plurality of the data.

6. The calculation device according to claim 2, further comprising: a table creation unit that creates, based on the judgment results of the map check unit, a table specifying the output matrices to be read by the matrix calculation unit; and a read control unit that causes the matrix calculation unit to read the data based on the table.

7. The calculation device according to claim 1, wherein the zero check unit further judges, per memory-access unit, whether each element of the output matrix produced by the matrix calculation unit falls within a pre-specified range, and the submap memory stores the judgment result of the zero check unit as second state information.

8. The calculation device according to claim 2, wherein the zero check unit further judges, per memory-access unit, whether each element of the output matrix produced by the matrix calculation unit falls within a pre-specified range, and the submap memory stores the judgment result of the zero check unit as second state information.

9. The calculation device according to claim 2, wherein the map check unit executes the subsequent matrix calculations of the first layer based on the state information stored in the submap memory as a result of the first matrix calculation of the first layer of the convolution operation.

10. The calculation device according to claim 2, further comprising a submap memory buffer that stores, in association with one another, information for identifying the output matrix corresponding to the state information, information indicating the storage position of that state information in the submap memory, and usage information indicating whether that state information has been used in the convolution operation of the next layer, wherein, as the storage position in the submap memory of newly generated state information, a storage position associated with usage information indicating that the state information has been used in the convolution operation of the next layer is selected.

11. The calculation device according to claim 2, wherein kernel state information, obtained by judging whether each element of a kernel used in the matrix calculation falls within a pre-specified range, is stored in advance in the submap memory, and the map check unit judges, based on the state information and the kernel state information stored in the submap memory, whether to cause the matrix calculation unit to read the output matrix corresponding to that state information as data to be operated on.

12. The calculation device according to any one of claims 1 to 11, wherein the zero check unit compares each element of the output matrix with a plurality of thresholds and judges to which of a plurality of ranges defined by the plurality of thresholds all elements of the output matrix belong.

13. The calculation device according to any one of claims 1 to 11, wherein the zero check unit further judges whether a negative value is present among the elements of the output matrix, or the number of the elements belonging to any one of the plurality of ranges.

14. The calculation device according to any one of claims 1 to 11, wherein the zero check unit creates the state information during the matrix calculation for the input channel that is the last, among the input channels belonging to the same layer, for which a matrix calculation is performed, and stores the state information in the submap memory.
PCT/JP2024/009203 2023-03-30 2024-03-09 Calculating device Pending WO2024203190A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023-056298 2023-03-30
JP2023056298 2023-03-30

Publications (1)

Publication Number Publication Date
WO2024203190A1 (en)

Family

ID=92904410

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/009203 Pending WO2024203190A1 (en) 2023-03-30 2024-03-09 Calculating device

Country Status (1)

Country Link
WO (1) WO2024203190A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190317857A1 (en) * 2019-04-26 2019-10-17 Intel Corporation Technologies for providing error correction for row direction and column direction in a cross point memory
JP2022523762A (en) * 2019-03-15 2022-04-26 インテル コーポレイション Sparse optimization for matrix accelerator architecture
WO2022123687A1 (en) * 2020-12-09 2022-06-16 日本電信電話株式会社 Calculation circuit, calculation method, and program



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24779321

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2025510214

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 2025510214

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE