
WO2021235312A1 - Information processing device, and information processing method - Google Patents

Information processing device, and information processing method

Info

Publication number
WO2021235312A1
WO2021235312A1 (PCT/JP2021/018193)
Authority
WO
WIPO (PCT)
Prior art keywords
resource usage
inference
information processing
dnn
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/018193
Other languages
French (fr)
Japanese (ja)
Inventor
馨 佐宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp filed Critical Sony Group Corp
Publication of WO2021235312A1
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks

Definitions

  • This disclosure relates to an information processing device and an information processing method.
  • In recent years, machine learning algorithms have evolved rapidly through the use of artificial neural networks such as DNNs (Deep Neural Networks), and processing using DNNs is widely applied in fields such as image recognition, speech recognition, and artificial intelligence.
  • The information processing device of one form according to the present disclosure is an information processing device applied to an information processing system that uses an inference result produced by an inference device using a neural network, and includes an acquisition unit, a calculation unit, and a determination unit.
  • the acquisition unit acquires the total resource usage of the information processing system.
  • the calculation unit calculates the target resource usage to be allocated to at least a part of the calculation of the inference processing by the inference device based on the resource usage.
  • The determination unit determines the calculation method corresponding to the target resource usage.
  • FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure.
  • the monitoring / adjusting module 100 (an example of an information processing apparatus) according to the embodiment of the present disclosure is applied to an information processing system 1 including a DNN model 30 and a system module 50.
  • the information processing system 1 operates the inference of the DNN model 30 in parallel with the processing in the system module 50.
  • the DNN model 30 is composed of a trained model machine-learned using DNN, which is an artificial neural network having a plurality of layers such as an input layer, an output layer, and a plurality of hidden layers (intermediate layers).
  • the DNN model 30 is composed of a plurality of layers such as an input layer, an output layer, and a plurality of hidden layers, as well as a plurality of elements such as a channel and a matrix.
  • the DNN model 30 performs calculation processing of input data by a calculation method given for each element from the monitoring / adjustment module 100, and outputs a calculation result.
  • the system module 50 executes various processes using the inference result of the DNN model 30.
  • The system module 50 corresponds to, for example, a module that controls the operation of a robot based on the result of recognizing camera images with the DNN model 30, or a module that executes the processing of a game machine having a UI (User Interface) function that uses the results of recognizing voice and hand gestures with the DNN model 30.
  • In such applications, a missed detection caused by selectively removing input data, or a delay in action decision caused by delaying processing, is a serious problem.
  • For example, an electric unit may be controlled at intervals of several tens to several hundreds of microseconds. In this case, even if the resource usage of the recognition processing is switched at the timing when the data subject to recognition processing changes, the switching cannot follow the fluctuation of the resource usage of the electric unit.
  • the monitoring / adjustment module 100 can instantly adjust the amount of resources used for inference processing by DNN, for example, while monitoring the amount of resources used in the entire system.
  • By reducing the time interval (the fineness of the elements) at which the amount of resources is determined, it becomes possible to instantly adjust part of the resources used for inference processing by the DNN even during processing. Thanks to the surplus created as a result of this adjustment, the resource amount of the entire system does not exceed its limit, and the system continues to operate without abnormal termination, abnormal operation, excess latency, and the like.
  • A DNN has the property that the recognition accuracy of, for example, a camera image is unlikely to be lowered even if the calculation accuracy is lowered and the processing is performed with fewer resources.
  • Reference 1 proposes a method (PACT: Parameterized Clipping Activation for Quantized Neural Networks) of suppressing the influence on recognition accuracy when the number of quantization bits of a DNN is increased or decreased. According to this method, even if the number of quantization bits is reduced in some object recognition models, the decrease in recognition accuracy can be kept within a certain range.
  • The function realized by the monitoring / adjusting module 100 according to the embodiment of the present disclosure includes a technique for adjusting the resource usage between elements in the middle of inference processing by the DNN.
  • By exploiting the property that the accuracy does not easily decrease even if the calculation accuracy (the amount of resources allocated to the DNN) is reduced, it is possible to realize an inference device that does not need to stop DNN inference while taking the resource usage of the entire system into account.
  • The monitoring / adjustment module 100 has a mechanism for ensuring that the accuracy of the DNN inference processing is lowered as little as possible when determining the resource usage of each element. This is realized by analyzing in advance, for each element of the DNN model that can reduce the resources required for processing, the relationship between the resource usage and the accuracy (inference accuracy).
  • With this technology, data processing can be performed with extremely high accuracy when there is a margin in the resource usage of the entire system, and it can be guaranteed that data processing continues with a certain degree of processing (recognition) accuracy even in a situation where there is no such margin.
  • the monitoring / adjustment module 100 acquires the total resource usage of the information processing system 1 (step S1).
  • the total resource usage of the information processing system 1 corresponds to, for example, power consumption, memory usage, processing time, and the like.
  • the monitoring / adjustment module 100 calculates the target resource usage amount to be allocated to at least a part of the calculation of the inference processing by the DNN model 30 based on the total resource usage amount of the information processing system 1 (step S2).
  • The calculation of the inference processing by the DNN model 30 is composed of, for example, a series of calculations over a plurality of layers arranged in multiple stages. Therefore, the monitoring / adjustment module 100 calculates the target resource usage for each layer constituting the inference processing of the DNN model, for example, based on the resource surplus of the information processing system 1.
  • The monitoring / adjustment module 100 determines a calculation method corresponding to the target resource usage (step S3), and outputs the calculation method to the DNN model 30. Specifically, the monitoring / adjustment module 100 determines the calculation method based on correspondence information between resource usage amounts and the calculation methods pre-analyzed for each resource usage amount. The calculation method is determined based on control information, such as the "quantization bit number" and the "pruning ratio", indicating how the calculation of the inference processing by the DNN model 30 is to be performed.
  • the monitoring / adjustment module 100 calculates the target resource usage amount to be allocated to at least a part of the calculation of the inference processing by the DNN model 30 based on the total resource usage amount of the information processing system 1. Then, the monitoring / adjustment module 100 determines a calculation method corresponding to the calculated target resource usage amount. Thereby, the monitoring / adjusting module 100 can adjust the resource amount used for the inference processing of the DNN model 30 so as not to cause the resource usage amount to be exceeded in the information processing system 1.
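  • As an illustration of steps S1 to S3, the following is a minimal Python sketch of one monitoring/adjustment cycle. All names and values (get_total_resource_usage, CORRESPONDENCE_TABLE, the numeric constants) are assumptions for illustration and are not defined by the present disclosure; the surplus rule anticipates equation (1) described later.

```python
# Minimal sketch of one monitoring/adjustment cycle (steps S1 to S3).
# Every name and value here is an illustrative assumption, not part of
# the disclosure.

R_MAX = 100.0    # maximum resource amount the system can supply (assumed unit: %)
EPSILON = 5.0    # constant margin kept free for safety

def get_total_resource_usage() -> float:
    """Step S1: acquire the total resource usage Im of the system (stubbed)."""
    return 70.0

# Assumed pre-analyzed correspondence between the resource cost of a
# calculation method and its control information (quantization bit number,
# pruning ratio), sorted by ascending cost.
CORRESPONDENCE_TABLE = [
    (10.0, {"q_bits": 2, "pruning_ratio": 0.8}),
    (25.0, {"q_bits": 4, "pruning_ratio": 0.5}),
    (50.0, {"q_bits": 8, "pruning_ratio": 0.0}),
]

def adjust_once() -> dict:
    im = get_total_resource_usage()            # step S1
    r_dnn = R_MAX - im - EPSILON               # step S2 (equation (1) below)
    # Step S3: pick the costliest (most accurate) method that fits in r_dnn.
    feasible = [method for cost, method in CORRESPONDENCE_TABLE if cost <= r_dnn]
    return feasible[-1] if feasible else CORRESPONDENCE_TABLE[0][1]

print(adjust_once())   # -> {'q_bits': 4, 'pruning_ratio': 0.5}
```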
  • FIG. 2 is a diagram showing a schematic configuration example of an information processing system according to an embodiment of the present disclosure.
  • The information processing system 1 includes a processor 11, a main storage device 12, an auxiliary storage device 13, a peripheral circuit 14, an input device 15, an output device 16, a peripheral device 17, and a communication device 18.
  • the processor 11, the main storage device 12, the auxiliary storage device 13, the peripheral circuit 14, and the communication device 18 are connected to each other via the internal bus 20.
  • the processor 11 is realized by, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit).
  • the processor 11 executes arithmetic processing and operation control in the information processing system 1.
  • the main storage device 12 is realized by a semiconductor memory element such as a RAM (Random Access Memory).
  • the auxiliary storage device 13 is realized by a semiconductor memory element such as a ROM (Read Only Memory) or a storage device such as a hard disk or an optical disk.
  • the peripheral circuit 14 is realized by an A / D converter, a timer, a signal processing circuit, or the like.
  • the peripheral circuit 14 processes various signals and data of the input device 15, the output device 16, and the peripheral device 17.
  • the input device 15 is realized by, for example, a mouse, a keyboard, a touch panel, buttons, switches, levers, and the like. Further, the input device 15 can also be realized by a voice input device such as a remote controller or a microphone capable of transmitting a control signal using infrared rays or other radio waves.
  • the output device 16 can be realized by a display device such as a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), or an organic EL, and an audio output device such as a speaker or a headphone. Further, the output device 16 can also be realized by a device such as a printer, a mobile phone, or a facsimile that can visually or audibly notify the user of the acquired information.
  • the peripheral device 17 is another device mounted on the information processing system 1 other than the input device 15 and the output device 16.
  • the peripheral device 17 can be realized by, for example, various sensors such as an acceleration sensor and an angular velocity sensor, an inertial measurement unit, a ToF (Time of Flight) sensor, a GPS (Global Positioning System), an actuator, a camera, a speaker, a battery, and the like.
  • The communication device 18 can be realized by a NIC (Network Interface Card), various communication modems, or a wireless module such as Bluetooth (registered trademark) or Wi-Fi (registered trademark).
  • The monitoring / adjustment module 100 can be realized by the processor 11 shown in FIG. 2 executing various programs stored in the auxiliary storage device 13 shown in FIG. 2, using the main storage device 12 and the like as a work area. Likewise, the inference by the DNN model 30 and the various processes by the system module 50 can be realized by the processor 11 executing various programs stored in the auxiliary storage device 13, using the main storage device 12 and the like as a work area.
  • The processor 11, the main storage device 12, and the auxiliary storage device 13 cooperate with software (various programs) to realize the various processing functions (for example, the acquisition unit 113 to the notification unit 116) of the monitoring / adjustment module 100 described below.
  • a program downloaded from an external device such as a server can also be used via the communication device 18.
  • FIG. 3 is a diagram showing a configuration example of the DNN model according to the embodiment of the present disclosure.
  • FIGS. 4 and 5 are diagrams showing an outline of the DNN partial inference device according to the embodiment of the present disclosure.
  • The DNN model 30 is composed of a plurality of DNN partial inference devices 31 (31m to 31m+n) divided at a predetermined granularity based on predetermined conditions.
  • the DNN model 30 is composed of a plurality of elements including layers, channels, and matrices.
  • Each DNN partial inference device 31 is configured by dividing the DNN model 30 at a granularity based on at least one of the elements (layers, channels, and matrices) constituting the DNN model 30.
  • The granularity at which the DNN model 30 is divided is determined in advance by the administrator of the monitoring / adjustment module 100, which will be described later, based on the cost required for the preliminary analysis and the required time interval for adjusting resource usage.
  • The DNN partial inference device 31 calculates and outputs the result of a certain element (block) of the divided DNN by, for example, the calculation method given by the monitoring / adjusting module 100. Its input is therefore either the input data of the DNN inference or the output (activation value: Activation) of the preceding DNN partial inference device 31, and its output is the result of the DNN element (block) it is in charge of. That is, as shown in FIG. 3, by connecting and operating a plurality of DNN partial inference devices 31 (31m to 31m+n), it is possible to realize a function that returns the result of DNN inference for the input data. Depending on the structure of the DNN model 30 to be inferred, a branched structure may be required, but it can be configured in the same manner.
  • The DNN partial inference device 31 receives, as an input, the calculation method Om (control information) that is the output of the monitoring / adjusting module 100.
  • The DNN partial inference device 31 simultaneously acquires the input data and the calculation method Om (control information), performs the calculation on the input data by the acquired calculation method (based on the control information), and returns the result of the DNN element (block) it is in charge of as its output.
  • Depending on the given calculation method, the number of quantization bits and the quantization method of the weights and activation values, the pruning ratio of the pruning method, the calculation formula to be used, the weight parameters, and the like can be changed.
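  • As a hedged sketch of how a chain of DNN partial inference devices 31 could apply a given calculation method Om, the following Python code quantizes activations to the requested bit number and prunes the smallest-magnitude weights at the requested ratio. The quantization and pruning helpers are deliberately simplified assumptions; the disclosure does not prescribe these exact procedures.

```python
import numpy as np

# Hedged sketch of a chain of DNN partial inference devices 31: each block
# receives the previous activation plus a calculation method Om (control
# information) and applies it before computing its share of the network.

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform quantization of activation values to the given bit width."""
    levels = 2 ** bits - 1
    scale = float(np.max(np.abs(x))) or 1.0
    return np.round(x / scale * levels) / levels * scale

class DNNPartialInference:
    def __init__(self, weight: np.ndarray):
        self.weight = weight

    def forward(self, x: np.ndarray, method: dict) -> np.ndarray:
        w = self.weight.copy()
        # Pruning: zero out the smallest-magnitude fraction of the weights.
        k = int(w.size * method.get("pruning_ratio", 0.0))
        if k > 0:
            threshold = np.partition(np.abs(w), k, axis=None)[k]
            w[np.abs(w) < threshold] = 0.0
        # Quantize the incoming activation to the requested bit number.
        x = quantize(x, method.get("q_bits", 8))
        return np.maximum(w @ x, 0.0)   # linear layer followed by ReLU

# Connecting blocks 31_m .. 31_{m+n}: each output feeds the next block.
rng = np.random.default_rng(0)
blocks = [DNNPartialInference(rng.standard_normal((4, 4))) for _ in range(3)]
x = rng.standard_normal(4)
for block in blocks:
    x = block.forward(x, {"q_bits": 4, "pruning_ratio": 0.5})
print(x)
```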
  • FIG. 6 is a diagram showing an outline of the DNN inference device according to the embodiment of the present disclosure.
  • FIG. 7 is a functional block diagram showing an example of the functional configuration of the monitoring / adjustment module according to the embodiment of the present disclosure.
  • The DNN inference device 70 can be realized by combining the DNN partial inference devices 31 (31m to 31m+n) and the monitoring / adjustment modules 100 (100m to 100m+n).
  • A monitoring / adjustment module 100 (100m to 100m+n) is provided for each DNN partial inference device 31.
  • The monitoring / adjustment module 100 acquires the resource usage of the entire system every time a decomposed element of the DNN is calculated, and can switch to a calculation method matching the system state each time. As a result, the resource usage can be adjusted instantly, and a DNN inference device 70 that does not require interruption of inference can be realized.
  • the "block" appearing in the following description is synonymous with the DNN partial inference device 31.
  • The monitoring / adjustment module 100 includes a resource usage information storage unit 111, a correspondence information storage unit 112, an acquisition unit 113, a calculation unit 114, a determination unit 115, and a notification unit 116.
  • The monitoring / adjusting module 100 realizes or executes the functions and operations of the monitoring / adjusting module 100 described below by means of each of these units.
  • Each block (resource usage information storage unit 111 to notification unit 116) constituting the monitoring / adjustment module 100 is a functional block indicating the function of the monitoring / adjustment module 100, respectively.
  • These functional blocks may be software blocks or hardware blocks.
  • each of the above-mentioned functional blocks may be one software module realized by software (including a microprogram), or may be one circuit block on a semiconductor chip (die).
  • each functional block may be one processor or one integrated circuit.
  • the method of configuring the functional block is arbitrary.
  • the monitoring / adjustment module 100 may be configured in a functional unit different from the above-mentioned functional block.
  • the resource usage information storage unit 111 stores information on the resource usage used in the calculation of each block of DNN (DNN partial inference device 31).
  • the resource usage information stored in the resource usage information storage unit 111 is stored by the calculation unit 114, which will be described later.
  • the correspondence information storage unit 112 stores the correspondence information between the resource usage amount and the calculation method pre-analyzed for each resource usage amount.
  • the calculation method for each resource usage is acquired, for example, by prior analysis of the operator who is the administrator of the monitoring / adjustment module 100.
  • the acquisition unit 113 acquires the total resource usage of the information processing system 1.
  • the acquisition unit 113 acquires the total power consumption and memory usage of the information processing system 1 from, for example, the system module 50.
  • The calculation unit 114 calculates the target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN inference device 70, based on the total resource usage of the information processing system 1 acquired by the acquisition unit 113. For example, the calculation unit 114 calculates the target resource usage to be allocated to each calculation of the inference processing by the DNN partial inference devices 31 (see FIG. 6) divided at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource usage based on, for example, the resource surplus of the information processing system 1.
  • the determination unit 115 determines the calculation method of the DNN partial inference device 31 corresponding to the target resource usage calculated by the calculation unit 114.
  • The calculation method is control information indicating how to perform the calculation of an element (block) of the DNN model 30, that is, of the DNN partial inference device 31 (see FIG. 6) that performs the inference processing next, and corresponds to the "quantization bit number", the "pruning ratio", and the like.
  • The determination unit 115 can also determine the calculation method based on control information composed of tuples, for example a combination of the "quantization bit number" and the "pruning ratio".
  • the determination unit 115 outputs the determined calculation method to the corresponding DNN partial inference device 31.
  • The notification unit 116 notifies that the resource usage allocated to the DNN partial inference device 31 will decrease. For example, when the target resource usage calculated by the calculation unit 114 is equal to or less than a predetermined standard, the notification unit 116 notifies that the accuracy of the inference processing result (inference result) by the DNN inference device 70 may decrease.
  • FIG. 8 is a diagram showing an outline of the monitoring / adjustment module according to the embodiment of the present disclosure.
  • The monitoring / adjusting module 100 determines the calculation method Om of a certain element (block) constituting the DNN model 30 from the total resource usage Im of the information processing system 1 given as an input, and outputs it.
  • An element may be a part of a layer, a channel (Map), a matrix (Tensor), or a combination thereof.
  • The administrator of the monitoring / adjusting module 100 divides the DNN model 30 into a plurality of blocks in advance and assigns a block number bi to each block. The fineness of this division is determined based on the cost required for the preliminary analysis and the required time interval for adjusting resource usage.
  • As the resource usage of the entire system given as an input to the monitoring / adjustment module 100, resource usage such as the power consumption, memory usage, and processing time of the entire system can be given.
  • The monitoring / adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a predicted value of future usage computed with a simple machine learning model instead of the measured value. By doing so, it is possible to cope with a sudden change in the amount of resources used in the information processing system 1. Further, the monitoring / adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a tuple that combines a plurality of different types of usage, such as power consumption and memory usage. By doing so, it is possible to calculate an appropriate output according to each kind of resource usage.
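  • As an illustration of using a predicted value instead of the measured value, the following sketch extrapolates the usage trend over a short sliding window. The linear least-squares predictor merely stands in for the "simple machine learning model" mentioned above; the disclosure does not fix a particular predictor.

```python
# Illustrative sketch of predicting the near-future resource usage instead
# of using the latest measured value. Window size and predictor choice are
# assumptions for illustration.
from collections import deque

class UsagePredictor:
    def __init__(self, window: int = 8):
        self.history = deque(maxlen=window)

    def update(self, measured: float) -> None:
        self.history.append(measured)

    def predict(self) -> float:
        if len(self.history) < 2:
            return self.history[-1] if self.history else 0.0
        # Least-squares slope over the window, extrapolated one step ahead.
        n = len(self.history)
        xs = range(n)
        mean_x, mean_y = (n - 1) / 2, sum(self.history) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.history))
        var = sum((x - mean_x) ** 2 for x in xs)
        slope = cov / var
        return self.history[-1] + slope   # predicted usage at the next step

predictor = UsagePredictor()
for im in [60.0, 62.0, 65.0, 69.0]:
    predictor.update(im)
print(predictor.predict())   # anticipates the rising trend -> 72.0
```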
  • the monitoring / adjustment module 100 can specify the DNN block number bi as an input. This is used to determine which block calculation method is to be determined and output.
  • The monitoring / adjustment module 100 can also receive the resource usage Rdnn' allocated to each block preceding the block number bi. This is used to grasp the calculation status of the entire model and to determine a calculation method Om that can improve the accuracy of the entire DNN model.
  • The calculation method Om, which is the output of the monitoring / adjustment module 100, is control information indicating how to perform the calculation of the block with block number bi: for example, the number of quantization bits for adjusting the accuracy of numerical calculation, the pruning ratio indicating what percentage of DNN elements are omitted as represented by the pruning method, the network structure and parameters to be used, and the like.
  • The calculation method Om, which is the output of the monitoring / adjustment module 100, determines the calculation method for one DNN block, but one block contains a plurality of elements such as channels (Channel) and matrices (Tensor).
  • Therefore, the number of quantization bits Qb may be configured as a plurality of values so that the number of quantization bits and the like can be specified for each of the elements in the block.
  • FIG. 9 is a diagram showing an outline of the monitoring / adjustment module according to the embodiment of the present disclosure.
  • the determination of the calculation method Om by the monitoring / adjustment module 100 is performed in two stages. First, in the first stage, the monitoring / adjusting module 100 determines the target resource usage Rdnn used in the block of the block number bi from the resource usage Im of the entire system given as an input. The operation of the first stage is realized, for example, by the function of the calculation unit 114 shown in FIG. 7.
  • In the second stage, the monitoring / adjustment module 100 selects, from among the calculation methods that achieve the target resource usage Rdnn, the calculation method Om that reduces the DNN calculation accuracy as little as possible.
  • The operation of the second stage is realized, for example, by the function of the determination unit 115 shown in FIG. 7. That is, the acquisition unit 113 of the monitoring / adjustment module 100 acquires the resource usage each time the calculation method is determined by the determination unit 115. Subsequently, the calculation unit 114 of the monitoring / adjustment module 100 calculates the target resource usage of the DNN partial inference device 31 that performs the next inference calculation each time the resource usage is acquired by the acquisition unit 113. Then, the determination unit 115 of the monitoring / adjustment module 100 determines the calculation method of the DNN partial inference device that performs the next inference calculation each time the target resource usage is calculated by the calculation unit 114.
  • As a method of determining the target resource usage Rdnn in the first stage described above, there is a method of allocating the resource surplus of the system to the calculation of the DNN. For example, Rdnn can be calculated by the following equation (1), using the resource usage Im of the entire system given as an input, the maximum resource amount Rmax that can be supplied by the system, and a constant margin ε.
  • Target resource usage Rdnn = Maximum resource amount Rmax − Resource usage Im − Constant margin ε   … (1)
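  • As an illustrative numerical example (the concrete values are assumptions, not taken from the disclosure): with a maximum resource amount Rmax = 100%, a current total usage Im = 70%, and a margin ε = 5%, equation (1) gives Rdnn = 100 − 70 − 5 = 25%, so a quarter of the supplyable resources may be allocated to the next block.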
  • FIG. 10 is a diagram showing an outline of a method for determining a calculation method based on the preliminary analysis result according to the embodiment of the present disclosure.
  • FIG. 10 shows an example of a method of listing in advance calculation methods with less decrease in accuracy of inference processing for various combinations of resource usage.
  • In this example, the input Im is one type of resource usage, such as power consumption or memory usage, and ranges from 0% to 100%.
  • For each sampled value of the input, the calculation method Om is analyzed in advance. This analysis is performed for all combinations of the target resource usage Rdnn of the block whose calculation method is to be determined and the resource usage Rdnn' used in the preceding blocks.
  • During operation, the monitoring / adjustment module 100 acquires, based on the correspondence information stored in the correspondence information storage unit 112, a plurality of calculation methods associated with resource usage close to the target resource usage, and determines the calculation method from the acquired methods. Specifically, the monitoring / adjustment module 100 can determine the calculation method Om by searching the pre-analysis results for points (interpolation points) close to the input Im and interpolating between them. If the number of interpolation points is increased to 11 or more in the pre-analysis, a more accurate calculation method Om can be determined. Even if the input Im covers a plurality of resources, this can be handled by analyzing the interpolation points in advance in the same way.
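  • The following is a minimal sketch of this interpolation, assuming 11 pre-analyzed points at 0%, 10%, ..., 100% of the input Im, each storing a quantization bit number; both the sampled grid and the stored values are illustrative assumptions.

```python
# Minimal sketch of determining the calculation method Om by interpolating
# between pre-analyzed points. The grid and bit numbers are assumptions.
import bisect

# Pre-analysis result: for each sampled usage Im, the quantization bit
# number that best preserved accuracy at that usage level.
POINTS = [(i * 10.0, bits) for i, bits in
          enumerate([16, 16, 12, 12, 10, 8, 8, 6, 4, 2, 2])]

def decide_q_bits(im: float) -> int:
    usages = [u for u, _ in POINTS]
    j = bisect.bisect_left(usages, im)
    if j == 0:
        return POINTS[0][1]
    if j == len(POINTS):
        return POINTS[-1][1]
    (u0, b0), (u1, b1) = POINTS[j - 1], POINTS[j]
    # Linear interpolation between the two nearest pre-analyzed points,
    # rounded to an implementable bit number.
    t = (im - u0) / (u1 - u0)
    return round(b0 + t * (b1 - b0))

print(decide_q_bits(73.0))   # falls between the 70% and 80% points -> 5 bits
```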
  • The operation of the monitoring / adjustment module 100 described above is only an example. In addition to the input Im, the calculation method Om output by the preceding monitoring / adjustment module 100 (the monitoring / adjustment module 100m, if the operating subject is the monitoring / adjustment module 100m+1) may also be given as an input. From this, it is possible to know the change over time in the resource usage of the entire system over a long period, and which elements in the DNN have been assigned what degree of calculation accuracy. It is therefore possible to obtain a calculation method Om that is more stable against strongly fluctuating resource usage and improves the accuracy of the entire DNN processing.
  • FIG. 11 is a diagram showing an outline of the notification operation according to the embodiment of the present disclosure.
  • the information processing system 1 includes a user notification module 90.
  • the user notification module 90 visualizes and notifies the user of the information processing system 1, for example, of the processing state of the information processing system 1.
  • the processing state corresponds to, for example, a decrease in recognition accuracy in the case of recognition processing such as image recognition or voice recognition, and a decrease in responsiveness in the case of UI response processing.
  • the monitoring / adjustment module 100 executes notification to the user notification module 90, for example, when the target resource usage amount Rdnn allocated to the DNN partial inference device 31 is equal to or less than a predetermined standard.
  • Upon receiving the notification from the monitoring / adjusting module 100, the user notification module 90 visualizes information indicating that the responsiveness of the information processing system 1 has deteriorated, for example, on the operation device 91 operated by the user.
  • As a visualization method, for example, lighting a light emitting unit provided in the operation device 91 in a predetermined color, or blinking the light emitting unit, is conceivable.
  • the monitoring / adjustment module 100 may notify the system module 50 when the target resource usage amount Rdnn allocated to the DNN partial inference device 31 is equal to or less than a predetermined standard.
  • the system module 50 can change the operation of the system so as to improve the safety of the user according to the decrease in the accuracy of the inference result by the DNN inference device 70.
  • This applies, for example, when the information processing system 1 is a transport robot system.
  • When the information processing system 1 is a pet-type robot, it is conceivable to change the operation so as to take a resting gesture, for example by closing its eyes, in accordance with the decrease in the accuracy of the response processing.
  • FIG. 12 is a flowchart showing an example of a processing procedure by the monitoring / adjusting module according to the embodiment of the present disclosure. The processing procedure shown in FIG. 12 is repeatedly executed while the information processing system 1 is in operation.
  • the acquisition unit 113 acquires the input Im, which is the total resource usage of the information processing system 1 (step S101).
  • the calculation unit 114 calculates the target resource usage amount Rdnn of the DNN partial inference device 31 based on the input Im, and stores it in the resource usage amount information storage unit 111 (step S102).
  • The determination unit 115 acquires, from the resource usage information storage unit 111, the resource usage Rdnn' used in the blocks preceding the current block, using the block number bi assigned to the current block (DNN partial inference device 31) to be processed as a key (step S103).
  • The determination unit 115 determines the calculation method Om based on the target resource usage Rdnn and the resource usage Rdnn' used in the preceding blocks (step S104).
  • the determination unit 115 outputs the determined calculation method Om to the current block to be processed (step S105), and returns to the processing procedure of step S101.
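  • The following sketch mirrors the loop of steps S101 to S105 for a sequence of blocks. The storage unit 111 is modeled as a plain dictionary, and decide_method is a stand-in for the pre-analyzed correspondence lookup; both are assumptions for illustration.

```python
# Sketch of the loop of FIG. 12 (steps S101 to S105); names and thresholds
# are illustrative assumptions.

R_MAX, EPSILON = 100.0, 5.0
resource_usage_info = {}   # stands in for the resource usage information storage unit 111

def decide_method(r_dnn: float, prior_usage: float) -> dict:
    """Stand-in for determination unit 115: choose Om from Rdnn and Rdnn'."""
    q_bits = 8 if r_dnn - 0.1 * prior_usage > 20.0 else 4
    return {"q_bits": q_bits}

def process_block(bi: int, im: float) -> dict:
    r_dnn = R_MAX - im - EPSILON                   # S102: equation (1)
    resource_usage_info[bi] = r_dnn                # record usage for block bi (simplified)
    prior = sum(v for k, v in resource_usage_info.items() if k < bi)   # S103
    om = decide_method(r_dnn, prior)               # S104
    return om                                      # S105: output Om to block bi

for bi, im in enumerate([60.0, 75.0, 68.0]):       # S101: acquire Im each iteration
    print(bi, process_block(bi, im))
```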
  • The DNN inference device 70 can also be configured to use a predicted value of future resource usage; simple machine learning models and Kalman filters can be used to predict future resource usage.
  • The granularity at which the DNN model 30 is divided and the fineness of the pre-analysis used to configure the DNN inference device 70 depend on how much pre-analysis time and computational resource cost can be tolerated, and on how fine a time interval is desired for switching the resource usage. In order to reduce the switching time interval, more combinations of calculation methods must be evaluated in the preliminary analysis, which requires a large implementation time and large calculation resources. In addition, since the number of timings at which switching is possible increases over the entire model, the processing time of the entire model may increase due to the switching overhead.
  • There are two granularities to balance: the first is the granularity at which the DNN model 30 is divided into blocks, and the second is how finely the elements within each block are analyzed.
  • The first, the granularity at which the DNN model 30 is divided into blocks, determines how the DNN model 30 is divided into the plurality of DNN partial inference devices 31.
  • The minimum unit in which the resource usage can be switched during DNN inference is this granularity. If not much time has passed after processing one block divided at this granularity, blocks can effectively be combined by executing the next block continuously with the same resource usage, without switching the resource usage or other settings. However, the resource usage cannot be switched in units smaller than a block.
  • The second granularity, how finely the elements within a block are viewed, concerns how to allocate the number of bits and computing power to the finer elements within the block, such as channels and matrices (tensors), so as to stay within the resource usage of the block while maximizing the accuracy. As shown in Reference 1 above, the decrease in accuracy can be suppressed by making this granularity finer, but this must be balanced against the cost of the pre-analysis, which is expected to increase as the granularity becomes finer.
  • The processing by the above-mentioned monitoring / adjustment module 100 can also be applied to a neural network composed of a single layer.
  • In this case, the calculation method corresponding to the target resource usage can be determined based on the result of pre-analyzing the calculation method corresponding to the resource usage for each element, such as the channels and matrices constituting the layer of the target neural network.
  • The correspondence information stored in the correspondence information storage unit 112 is not limited to being acquired by the operator's prior analysis, and may be acquired as a result of machine learning, for example.
  • For example, a DNN model that uses reinforcement learning can be constructed to infer the calculation method Om as the output, with the resource usage Rdnn' used in the preceding blocks, the target resource usage Rdnn, and the block number bi as the inputs.
  • In this case, the state of the reinforcement learning is the tuple of Rdnn' (the resource usage of the preceding blocks), Rdnn (the target resource usage), and the block number bi; the output is the calculation method Om; and the reward may be the difference between the accuracy of the inference processing achievable with the calculation method Om (the recognition accuracy, in the case of recognition processing) and the accuracy when the inference processing is performed without limiting the resource usage.
  • An efficient calculation method can be determined by using the reinforcement learning model.
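  • A hedged sketch of this reinforcement-learning formulation follows: the state is the tuple (Rdnn', Rdnn, bi), the action is a candidate calculation method Om, and the reward is the accuracy difference described above. Tabular Q-values with a one-step update and the coarse discretization are illustrative choices, not fixed by the disclosure.

```python
# Sketch of the reinforcement-learning formulation; the action set,
# discretization, and learning constants are assumptions.
import random
from collections import defaultdict

ACTIONS = [{"q_bits": b} for b in (2, 4, 8, 16)]
q_table = defaultdict(float)
ALPHA, EPS = 0.1, 0.2

def discretize(rd_prev: float, rd_target: float, bi: int) -> tuple:
    return (round(rd_prev / 10), round(rd_target / 10), bi)

def choose_action(state: tuple) -> int:
    if random.random() < EPS:                      # explore
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_table[(state, a)])

def update(state: tuple, action: int, reward: float) -> None:
    # One-step (bandit-style) update; the reward would come from evaluating
    # the model's accuracy during pre-analysis.
    key = (state, action)
    q_table[key] += ALPHA * (reward - q_table[key])

state = discretize(35.0, 25.0, bi=3)
a = choose_action(state)
update(state, a, reward=-0.02)   # e.g. 2% accuracy drop vs. unlimited resources
print(ACTIONS[a], q_table[(state, a)])
```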
  • FIG. 13 is a diagram showing an outline of a method for determining a calculation method based on a preliminary analysis result according to a modified example.
  • FIG. 14 is a diagram showing a configuration example of the DNN inference device according to the modified example.
  • In a modified example, the monitoring / adjustment module 100 may output the calculation methods of all the blocks at and after the block number bi for the input resource usage Rdnn', target resource usage Rdnn, and block number bi.
  • Further, a determination mechanism 71 (71m to 71m+n−1), which determines before calculating a certain block whether to operate the monitoring / adjustment module 100 to output a new calculation method or to use the previously determined calculation method as it is, may be newly provided in the DNN inference device 70.
  • The determination mechanism 71 shown in FIG. 14 measures the time elapsed since the previous operation of the monitoring / adjusting module 100 and, when a certain time has elapsed, operates the monitoring / adjusting module 100 to output a new calculation method.
  • Otherwise, the determination mechanism 71 uses the previously determined calculation method as it is. In this way, the overhead of resource switching can be reduced compared with determining the calculation method every time the resource usage of the entire system is acquired.
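  • A minimal sketch of the determination mechanism 71 follows, assuming a monotonic clock and a fixed minimum interval; both the threshold and the clock source are illustrative assumptions.

```python
# Minimal sketch of determination mechanism 71: re-run the monitoring /
# adjustment module only if enough time has elapsed, otherwise reuse the
# previously determined calculation method. Threshold value is assumed.
import time

class DeterminationMechanism:
    def __init__(self, min_interval_s: float = 0.001):
        self.min_interval_s = min_interval_s
        self.last_run = float("-inf")
        self.cached_method = {"q_bits": 8}

    def method_for_next_block(self, monitor_adjust) -> dict:
        now = time.monotonic()
        if now - self.last_run >= self.min_interval_s:
            # Enough time has passed: run the module and cache its output.
            self.cached_method = monitor_adjust()
            self.last_run = now
        # Otherwise reuse the previous method, avoiding switching overhead.
        return self.cached_method

mechanism = DeterminationMechanism()
print(mechanism.method_for_next_block(lambda: {"q_bits": 4}))   # runs the module
print(mechanism.method_for_next_block(lambda: {"q_bits": 2}))   # reuses {'q_bits': 4}
```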
  • the DNN inference device 70 including the DNN partial inference device 31 and the monitoring / adjustment module 100 described above can be applied to a robot system that processes a camera image by a DNN.
  • Most robots that operate outdoors are driven by a battery and therefore operate under a maximum power consumption constraint.
  • In such robots, electric parts with high power consumption, such as actuators, are mounted, and recognition processing and communication processing for determining actions run at the same time, so the power consumption fluctuates depending on the situation. If the system is operated without considering the power consumption of the entire system, the usage may exceed the maximum supply, causing abnormal operation or an inability to control the posture.
  • By executing the recognition processing that recognizes camera images with the DNN using the DNN inference device 70 composed of the DNN partial inference devices 31 and the monitoring / adjustment module 100, it is possible to keep executing the recognition processing while adjusting so that the power consumption does not exceed the maximum supply.
  • In this case, the current power consumption is given as the input of the monitoring / adjusting module 100, and the pruning ratio of the next DNN element is determined as the output. As a result, when the power consumption of the entire system becomes large, the pruning ratio becomes large and the calculation of some DNN elements is omitted, as sketched below.
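  • As an illustration of this input/output relationship, the following sketch maps the current power consumption to a pruning ratio for the next DNN element; the linear mapping and constants are assumptions, since in practice the relationship would come from the pre-analysis.

```python
# Illustrative mapping from current power consumption to the pruning ratio
# of the next DNN element. The linear rule and constants are assumptions;
# a real system would use the pre-analyzed correspondence information.
def pruning_ratio_for(power_w: float, max_supply_w: float = 100.0) -> float:
    headroom = max(0.0, (max_supply_w - power_w) / max_supply_w)
    # Less headroom -> larger pruning ratio -> more DNN elements omitted.
    return round(min(0.9, 1.0 - headroom), 2)

print(pruning_ratio_for(40.0))   # ample headroom -> 0.4
print(pruning_ratio_for(85.0))   # tight headroom -> 0.85
```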
  • Consequently, the utilization rate of the circuit per unit time decreases, the dynamic power consumption of the DNN processing is kept low, a surplus is generated, and power can be supplied to places that require a large amount of power, such as actuators.
  • Since the pruning ratio is determined by prior analysis and the recognition processing itself by the DNN continues to run, missed recognition of moving objects captured by the camera and delays in action decisions can be minimized.
  • However, the recognition accuracy of the recognition processing by the DNN decreases and the detection rate of moving objects decreases to some extent, so actions that are unsafe for the surrounding environment and people may also be taken.
  • the above-mentioned DNN inference device 70 can be applied to a game machine having a UI function for recognizing voice, hand gesture, and the like by processing by DNN.
  • the game machine has a predetermined memory capacity.
  • In the game machine, the recognition processing by the DNN operates in parallel with the drawing processing, and when the memory is used to some extent, the drawing quality is lowered to some extent.
  • Depending on the progress of the game, there are scenes, such as movie scenes, that the player wants to watch with improved drawing quality even if some operability is given up.
  • Conversely, there are also cases where the amount of memory used for the drawing processing is small. In such cases, it is desirable to allocate the surplus memory to the recognition processing by the DNN to improve the recognition accuracy.
  • To this end, the memory usage of the drawing processing is given as the input of the monitoring / adjusting module 100, and the number of quantization bits is determined as the output. By doing so, when it becomes necessary to secure a large memory area for the drawing processing, the number of quantization bits is changed instantly and the recognition processing by the DNN can be performed with a small amount of memory; the drawing quality can then be improved with the surplus memory created as a result.
  • In this case, the responsiveness of the UI function using the DNN decreases, but the game developer can carefully select the scenes, for example using this mechanism when the player should be drawn into a movie scene, and can take measures such as visualizing the decrease in responsiveness on a controller or the like (see FIG. 11 and the like), or instructing the user to interrupt the game and then use the UI function.
  • As described above, the monitoring / adjustment module 100 (an example of an information processing apparatus) according to the embodiment of the present disclosure is applied to an information processing system 1 that uses an inference result produced by an inference device using a neural network, and includes an acquisition unit 113, a calculation unit 114, and a determination unit 115.
  • the acquisition unit 113 acquires the total resource usage of the information processing system 1.
  • the calculation unit 114 calculates the target resource usage amount to be allocated to at least a part of the calculation of the inference processing by the DNN inference device 70 (for example, the calculation of the inference processing by the DNN partial inference device 31) based on the resource usage amount.
  • the determination unit 115 determines a calculation method (calculation method of inference processing by the DNN partial inference device 31) corresponding to the target resource usage amount. Therefore, the monitoring / adjustment module 100 can adjust the resource amount used for the inference processing of the DNN model 30 so as not to cause the resource usage amount to be exceeded in the information processing system 1.
  • FIG. 15 is a diagram showing an example of information processing according to a comparative example.
  • FIG. 16 is a diagram showing an example of information processing by the DNN inference device according to the embodiment of the present disclosure.
  • FIG. 17 is a diagram showing the relationship between the elapsed time and the resource usage amount of the information processing according to the comparative example.
  • FIG. 18 is a diagram showing the relationship between the elapsed time and the resource usage of the information processing by the DNN inference device according to the embodiment of the present disclosure.
  • the function of adjusting the resource usage of processing based on the resource usage of the entire system may be realized by a technique called code switching.
  • In code switching, two codes having the same purpose are prepared; that is, it is assumed that the code used in the DNN inference device can be switched according to the resource usage between a "normal code" used when there is a resource surplus and an "excess code" used when the resources are exceeded.
  • With code switching, however, the opportunities to adjust the resource usage are limited to the points before executing an inference by the DNN, such as before the DNN inference for data 1 or for data 2. Therefore, when the resource usage of the entire system suddenly increases, the resource usage of the DNN inference device cannot be adjusted instantly to absorb it. As a result, the resource usage of the entire system may be exceeded and the system may stop, or the DNN inference may be stopped for safety.
  • In contrast, the DNN inference device 70 exploits the facts that the DNN calculation is a stack of per-element calculations (the calculations of the individual DNN partial inference devices 31) and that the output accuracy of the entire DNN does not easily decrease even if the calculation accuracy is changed for each element (see Reference 1 above).
  • Therefore, in the DNN inference device 70 according to the embodiment of the present disclosure, even if it becomes necessary to adjust the resource usage at the DNN partial inference device 31m+1 as shown in FIG. 16, there is an opportunity to adjust the resource usage for each element, as shown by the arrows in FIG. 18, so the adjustment can be made in sequence.
  • FIG. 19 is a diagram showing an example of information processing by the existing technology.
  • the purpose of adjusting the resource usage of processing based on the resource usage of the entire system is to appropriately allocate the limited resources that can be used in the system in DNN processing and other processing.
  • As a function for realizing such an object, it is conceivable, for example, to apply the mechanism shown in the prior art document (Japanese Patent Laid-Open No. 2012-43409) described above: that is, when the data processing load (resource usage) is predicted to increase, time-series input data is selectively removed, data processing is delayed, or data processing is offloaded to other systems, with the data processing in that system read as the DNN inference processing.
  • In contrast, the DNN inference device 70 utilizes the property that even if the calculation accuracy is lowered during the DNN calculation, the accuracy of the result is not easily affected (see Reference 1 above); as described with reference to FIG. 6 and the like, it is therefore possible to construct a data processing system that neither skips data nor increases delay while adjusting the amount of resources used.
  • As described above, the DNN inference device 70 (an example of an inference device) is divided into a plurality of DNN partial inference devices 31 (an example of a partial inference device) at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource amount to be allocated to the inference processing of the DNN partial inference device 31 to be processed, and the determination unit 115 determines, based on the target resource amount, the calculation method in the inference processing of the DNN partial inference device 31 to be processed and outputs it to that device. As a result, the monitoring / adjustment module 100 can allocate an appropriate amount of resources to each DNN partial inference device 31 from the resources that can be allocated in the system.
  • Further, the DNN inference device 70 is divided into the plurality of DNN partial inference devices 31 based on the cost required for analyzing the calculation method for each resource usage or on the time interval required for adjusting the resource usage.
  • the monitoring / adjustment module 100 enables flexible resource management according to the purpose of the system administrator.
  • Further, the DNN inference device 70 is composed of a plurality of elements including layers, channels, and matrices.
  • Each DNN partial inference device 31 is then divided at a granularity based on at least one of the layer, channel, and matrix elements.
  • the monitoring / adjustment module 100 can adjust the resource usage amount at an arbitrary granularity according to the purpose of the system administrator.
  • the determination unit 115 determines the calculation method based on the control information composed of tuples. Thereby, the monitoring / adjustment module 100 can specify the calculation method by a plurality of elements.
  • the calculation unit 114 calculates the target resource usage amount based on the resource surplus in the information processing system 1. As a result, the monitoring / adjustment module 100 can easily calculate the target resource amount.
  • the calculation unit 114 calculates the target resource usage amount based on the resource surplus and the predetermined margin. As a result, the monitoring / adjustment module 100 can increase the availability of the system.
  • the monitoring / adjustment module 100 includes a correspondence information storage unit 112 (an example of a storage unit) that stores correspondence information between the resource usage amount and the calculation method pre-analyzed for each resource usage amount.
  • the determination unit 115 determines the calculation method based on the correspondence information.
  • the correspondence information storage unit 112 stores the correspondence information obtained as a result of machine learning. This makes it possible to improve the accuracy of resource adjustment.
  • The determination unit 115 acquires a plurality of calculation methods associated with resource usage close to the target resource usage based on the correspondence information, and determines the calculation method based on the acquired methods. This makes it possible to balance the speed and the accuracy of resource amount adjustment.
  • the monitoring / adjusting module 100 includes a notification unit 116 for notifying that the target resource usage amount is equal to or less than a predetermined standard in the information processing system 1. Thereby, the safety of the operation of the information processing system 1 can be enhanced.
  • Further, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines the calculation method; the calculation unit 114 calculates the target resource usage of the DNN partial inference device 31 to be processed next each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation method of the DNN partial inference device 31 to be processed next each time the calculation unit 114 calculates the target resource usage. Thereby, the resource amount of each DNN partial inference device 31 can be adjusted to follow the state of the system as closely as possible.
  • Alternatively, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines the calculation method; the calculation unit 114 calculates the target resource usage of all the DNN partial inference devices 31 to be processed thereafter each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation methods of all the DNN partial inference devices 31 to be processed thereafter each time the calculation unit 114 calculates the target resource usage. This can reduce the overhead associated with resource switching in the system.
  • The technology of the present disclosure can also be configured as follows, and these configurations belong to the technical scope of the present disclosure.
  • It is an information processing device applied to an information processing system that uses the inference result by an inference device using a neural network.
  • An acquisition unit that acquires the total resource usage of the information processing system,
  • a calculation unit that calculates the target resource usage to be allocated to at least a part of the calculation of the inference processing by the inference device based on the resource usage.
  • An information processing device including a determination unit that determines a calculation method corresponding to the target resource usage.
  • the inference device is divided into a plurality of partial inference devices with a predetermined particle size based on predetermined conditions.
  • the calculation unit calculates the target resource usage amount to be allocated to the inference processing of the partial inference device that next calculates the inference processing.
  • the information processing apparatus according to (1), wherein the determination unit determines a calculation method in the inference processing of the partial inference device that next calculates the inference processing based on the target resource usage amount.
  • the inferior is divided into a plurality of partial inferiors based on the cost required for analyzing the calculation method for each resource usage or the time interval required for adjusting the resource usage (2).
  • the inferencer is composed of a plurality of elements including layers, channels, and matrices.
  • the information processing apparatus according to (3) above, wherein the partial inference device is divided by a particle size based on at least one element of a layer, a channel, and a matrix.
  • the decision-making part The information processing apparatus according to (1) above, wherein the calculation method is determined based on control information composed of tuples.
  • the calculation unit The information processing apparatus according to any one of (1) to (5) above, which calculates the target resource usage amount based on the resource surplus in the information processing system.
  • the calculation unit The information processing apparatus according to (6), wherein the target resource usage amount is calculated based on the resource surplus and a predetermined margin.
  • the decision-making part The information processing apparatus according to any one of (1) to (7), wherein the calculation method is determined based on the corresponding information.
  • the storage unit is The information processing device according to (8) above, which stores the corresponding information obtained as a result of machine learning.
  • the decision-making part Based on the correspondence information, a plurality of the calculation methods associated with the resource usage amount close to the target resource usage amount are acquired, and the calculation method is determined based on the acquired plurality of the calculation methods. (8) or the information processing apparatus according to (9) above.
  • (11) The information processing apparatus according to any one of (1) to (10) above, further comprising a notification unit for notifying that the target resource usage amount is equal to or less than a predetermined standard in the information processing system.
  • the calculation unit Each time the resource usage amount is acquired by the acquisition unit, the target resource usage amount of the partial inference device to be processed next is calculated.
  • the decision-making part The information processing apparatus according to (2), wherein the calculation method of the partial inference device to be processed next is determined each time the target resource usage amount is calculated by the calculation unit.
  • (13) The information processing apparatus according to (2) above, wherein the acquisition unit acquires the resource usage amount each time the calculation method is determined by the determination unit, and the determination unit determines the calculation methods of all the partial inference devices to be processed thereafter each time the target resource usage amount is calculated by the calculation unit.
  • (14) An information processing method in which a processor of an information processing device applied to an information processing system that uses an inference result by an inference device using a neural network acquires the total resource usage of the information processing system, calculates, based on the resource usage, a target resource usage to be allocated to the inference processing of the inference device, and determines, based on the target resource usage, a calculation method in the inference processing of the inference device.
  • 1 Information processing system
  • 11 Processor
  • 12 Main storage device
  • 13 Auxiliary storage device
  • 14 Peripheral circuit
  • 15 Input device
  • 16 Output device
  • 17 Peripheral device
  • 18 Communication device
  • 20 Internal bus
  • 30 DNN model
  • 31 DNN partial inference device
  • 50 System module
  • 70 DNN inference device
  • 90 User notification module
  • 91 Operation device
  • 100 Monitoring/adjustment module
  • 111 Resource usage information storage unit
  • 112 Correspondence information storage unit
  • 113 Acquisition unit
  • 114 Calculation unit
  • 115 Decision unit
  • 116 Notification unit

Abstract

An information processing device (100) is applied to an information processing system which utilizes an inference result obtained by an inference device employing a neural network, and includes an acquisition unit (113), a calculation unit (114), and a determination unit (115). The acquisition unit (113) acquires the overall resource usage amount of the information processing system. The calculation unit (114) calculates, on the basis of the resource usage amount, a target resource usage amount to be allocated to at least a portion of the calculations in the inference processing performed by the inference device. The determination unit (115) determines a calculation method corresponding to the target resource usage amount.

Description

Information processing device and information processing method

The present disclosure relates to an information processing device and an information processing method.

Conventionally, when an increase in data processing load is predicted, techniques have been proposed that selectively discard input data, delay data processing, or offload data processing to another system.

In recent years, machine learning algorithms have evolved rapidly through the use of artificial neural networks such as DNNs (Deep Neural Networks), and processing using DNNs is widely applied in fields such as image recognition, speech recognition, and artificial intelligence.

Japanese Unexamined Patent Publication No. 2012-43409

However, while processing using a DNN is highly accurate, the processing load of its calculations is large and a large amount of resources may be required. It is therefore conceivable to apply the above-mentioned conventional techniques to a system that runs DNN processing so that the resource usage of the entire system does not exceed the maximum amount allowed in the system. That is, when the resource usage of the entire system is predicted to exceed the maximum amount, selective discarding, delaying, or offloading of data processing could be performed. However, this can cause problems such as a loss of real-time performance and output quality in the data processing itself. For this reason, the above-mentioned conventional techniques are difficult to adopt for the purpose of keeping the resource usage of the entire system from exceeding the maximum amount allowed in the system.

Therefore, the present disclosure proposes an information processing device and an information processing method that can adjust the amount of resources used for DNN processing so that the resource usage is never exceeded.

In order to solve the above problems, an information processing device according to one aspect of the present disclosure is an information processing device applied to an information processing system that uses an inference result by an inference device using a neural network, and includes an acquisition unit, a calculation unit, and a determination unit. The acquisition unit acquires the total resource usage of the information processing system. The calculation unit calculates, based on the resource usage, a target resource usage to be allocated to at least a part of the calculation of the inference processing by the inference device. The determination unit determines a calculation method corresponding to the target resource usage.

FIG. 1 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing a schematic configuration example of an information processing system according to the embodiment of the present disclosure.
FIG. 3 is a diagram showing a configuration example of a DNN model according to the embodiment of the present disclosure.
FIG. 4 is a diagram showing an outline of a DNN partial inference device according to the embodiment of the present disclosure.
FIG. 5 is a diagram showing an outline of the DNN partial inference device according to the embodiment of the present disclosure.
FIG. 6 is a diagram showing an outline of a DNN inference device according to the embodiment of the present disclosure.
FIG. 7 is a functional block diagram showing an example of the functional configuration of a monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 8 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 9 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 10 is a diagram showing an outline of a method of determining a calculation method based on pre-analysis results according to the embodiment of the present disclosure.
FIG. 11 is a diagram showing an outline of a notification operation according to the embodiment of the present disclosure.
FIG. 12 is a flowchart showing an example of a processing procedure by the monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 13 is a diagram showing an outline of a method of determining a calculation method based on pre-analysis results according to a modification.
FIG. 14 is a diagram showing a configuration example of a DNN inference device according to a modification.
FIG. 15 is a diagram showing an example of information processing according to a comparative example.
FIG. 16 is a diagram showing an example of information processing by the DNN inference device according to the embodiment of the present disclosure.
FIG. 17 is a diagram showing the relationship between elapsed time and resource usage for information processing according to the comparative example.
FIG. 18 is a diagram showing the relationship between elapsed time and resource usage for information processing by the DNN inference device according to the embodiment of the present disclosure.
FIG. 19 is a diagram showing an example of information processing by an existing technique.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, the same parts may be given the same numbers or reference signs, and duplicate descriptions may be omitted. Further, in the present specification and drawings, a plurality of components having substantially the same functional configuration may be distinguished by appending different numbers or signs after the same number or sign.

In addition, the present disclosure will be described in the order of the items shown below.
1. An example of information processing according to the embodiment of the present disclosure
2. Configuration example of the information processing system
3. Configuration example of the DNN model
4. Functional configuration example of the monitoring/adjustment module
4-1. Overview of the monitoring/adjustment module
4-2. Operation of the monitoring/adjustment module (1)
4-3. Operation of the monitoring/adjustment module (2)
5. Example of a processing procedure by the monitoring/adjustment module
6. Modifications
6-1. Generation of correspondence information by machine learning
6-2. Reduction of resource switching overhead
7. Others
7-1. Application to robot systems
7-2. Application to game consoles
8. Conclusion

<< 1. An example of information processing according to the embodiment of the present disclosure >>
FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure. As shown in FIG. 1, a monitoring/adjustment module 100 (an example of an information processing device) according to the embodiment of the present disclosure is applied to an information processing system 1 including a DNN model 30 and a system module 50.

The information processing system 1 runs the inference of the DNN model 30 in parallel with the processing in the system module 50.

The DNN model 30 is a trained model machine-learned using a DNN, which is an artificial neural network having a plurality of layers such as an input layer, an output layer, and a plurality of hidden layers (intermediate layers). The DNN model 30 is composed of a plurality of elements such as channels and matrices in addition to the plurality of layers such as the input layer, the output layer, and the hidden layers. The DNN model 30 performs arithmetic processing on input data using the calculation method given for each element by the monitoring/adjustment module 100, and outputs the calculation result.

The system module 50 executes various processes using the inference results of the DNN model 30. The system module 50 corresponds to, for example, a module that controls the operation of a robot based on the recognition result of a camera image by the DNN model 30, or a module that executes the processing of a game console provided with a UI (User Interface) function that uses the recognition results of voice and hand gestures by the DNN model 30.

In such an information processing system 1, if selective discarding, delaying, or offloading is applied to the data processing when the resource usage of the entire system is predicted to exceed the maximum amount, problems such as a loss of real-time performance and output quality of the data processing itself can occur. That is, when time-series data is selectively discarded, important data at a certain time may be missed, and when processing is delayed, the acquisition of the processing result is delayed, affecting the real-time behavior of other system functions that use the processing result.

For example, in a system such as a robot system where real-time behavior from recognition to action is important, misses due to selective discarding and delays in action decisions due to processing delays are serious. In addition, some recognition tasks require on the order of 10 milliseconds per piece of data, whereas motor-driven parts may be controlled at intervals of tens to hundreds of microseconds. In this case, even if the resource usage of the recognition processing is switched at the timing at which the data to be recognized switches, the fluctuation in the resource usage of the motor-driven parts cannot be followed.

Further, in a system having a UI agent that operates in parallel with game software, a problem can arise in which user operations cannot be recognized at all while the processing load of the game software itself is high.

In view of these problems, the present disclosure proposes a monitoring/adjustment module 100 that, in a system in which DNN inference runs in parallel with other processing, can adjust the amount of resources used for DNN processing so that the resource usage is never exceeded.

The monitoring/adjustment module 100 according to the embodiment of the present disclosure can instantly adjust, for example, the amount of resources used for inference processing by the DNN while monitoring the resource usage of the entire system. Each time a part of the elements constituting the DNN, such as a layer or a map, is calculated, the monitoring/adjustment module 100 acquires the resource usage of the entire system and determines the amount of resources that may be used for the next element so that the maximum amount of resources allowed in the system is not exceeded. By shortening the time interval at which this resource amount is decided (that is, by making the elements finer), part of the resources used for inference processing by the DNN can be adjusted instantly even in the middle of processing. Thanks to the surplus created by this adjustment, the resource amount of the entire system never exceeds its limit, and the system keeps operating without abnormal termination, abnormal operation, latency overruns, and the like.

Further, as shown in Reference 1, a DNN has the property that its recognition accuracy for, for example, camera images is unlikely to drop even when the calculation precision is lowered and processing is performed with fewer resources. Reference 1 proposes a method for suppressing the impact on recognition accuracy when the number of quantization bits of a DNN is increased or decreased. According to this method (PACT), even if the number of quantization bits is reduced in some object recognition models, the loss can be kept to a modest decrease in recognition accuracy.
Reference 1: C. Jungwook, "PACT: Parameterized Clipping Activation for Quantized Neural Networks", Computer Vision and Pattern Recognition, 2018

The functions realized by the monitoring/adjustment module 100 according to the embodiment of the present disclosure include a technique for adjusting the resource usage between elements in the middle of inference processing by the DNN. By exploiting the above-mentioned property that a DNN's accuracy is unlikely to drop even when the calculation precision (the amount of resources allocated to the DNN) is lowered, an inference device can be realized that never needs to suspend DNN inference while still respecting the resource usage of the entire system.

Further, the monitoring/adjustment module 100 according to the embodiment of the present disclosure has a mechanism that, when deciding the resource usage of each element, takes care that the accuracy of the DNN inference processing drops as little as possible. This is realized by analyzing in advance the relationship between the elements that reduce the resources required for processing in the DNN model, the amount of that reduction, and the accuracy (inference accuracy). With this technique, data processing can be performed with very high accuracy when the resource usage of the entire system has headroom, and it can be guaranteed that data processing continues with a certain degree of processing (recognition) accuracy even in situations where that headroom is gone.

An outline of the information processing by the monitoring/adjustment module 100 according to the embodiment of the present disclosure is described below.

First, the monitoring/adjustment module 100 acquires the total resource usage of the information processing system 1 (step S1). The total resource usage of the information processing system 1 corresponds to, for example, power consumption, memory usage, and processing time.

Subsequently, the monitoring/adjustment module 100 calculates, based on the total resource usage of the information processing system 1, a target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN model 30 (step S2). The calculation of the inference processing by the DNN model 30 is composed of, for example, a chain of calculations of a plurality of layers arranged in multiple stages. The monitoring/adjustment module 100 therefore calculates a target resource usage for each layer constituting the inference processing of the DNN model, for example based on the resource surplus of the information processing system 1.

After calculating the target resource usage, the monitoring/adjustment module 100 determines a calculation method corresponding to the target resource usage (step S3) and outputs it to the DNN model 30. Specifically, the monitoring/adjustment module 100 determines the calculation method based on correspondence information between resource usage amounts and calculation methods pre-analyzed for each resource usage amount. The calculation method is determined based on control information indicating how the calculation of the inference processing by the DNN model 30 is to be performed, such as the "number of quantization bits" and the "pruning ratio".

In this way, the monitoring/adjustment module 100 calculates, based on the total resource usage of the information processing system 1, the target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN model 30, and then determines a calculation method corresponding to the calculated target resource usage. Thereby, the monitoring/adjustment module 100 can adjust the amount of resources used for the inference processing of the DNN model 30 so that the resource usage in the information processing system 1 is never exceeded.
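
As a concrete illustration, steps S1 to S3 can be condensed into the following minimal sketch. This is not an implementation from the present disclosure; the percentage-based usage values, R_MAX, EPSILON, get_system_usage, and lookup_method are all hypothetical placeholders, and the subtraction in step S2 anticipates equation (1) introduced later.

```python
# Minimal sketch (hypothetical names and values) of the S1-S3 loop in FIG. 1.
R_MAX = 100.0    # assumed maximum resource amount the system can supply (%)
EPSILON = 5.0    # assumed constant safety margin (%)

def adjust_step(get_system_usage, lookup_method):
    im = get_system_usage()                # S1: acquire total resource usage Im
    rdnn = max(R_MAX - im - EPSILON, 0.0)  # S2: target resource usage Rdnn
    return lookup_method(rdnn)             # S3: calculation method Om
```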

<< 2. Configuration example of the information processing system >>
Hereinafter, a configuration example of the information processing system 1 according to the embodiment of the present disclosure will be described. FIG. 2 is a diagram showing a schematic configuration example of the information processing system according to the embodiment of the present disclosure.

As shown in FIG. 2, the information processing system 1 includes a processor 11, a main storage device 12, an auxiliary storage device 13, a peripheral circuit 14, an input device 15, an output device 16, a peripheral device 17, and a communication device 18. The processor 11, the main storage device 12, the auxiliary storage device 13, the peripheral circuit 14, and the communication device 18 are connected to one another via an internal bus 20.

The processor 11 is realized by, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit). The processor 11 executes arithmetic processing and operation control in the information processing system 1.

The main storage device 12 is realized by a semiconductor memory element such as a RAM (Random Access Memory). The auxiliary storage device 13 is realized by a semiconductor memory element such as a ROM (Read Only Memory), or by a storage device such as a hard disk or an optical disk.

The peripheral circuit 14 is realized by an A/D converter, a timer, a signal processing circuit, and the like. The peripheral circuit 14 processes the various signals and data of the input device 15, the output device 16, and the peripheral device 17.

The input device 15 is realized by, for example, a mouse, a keyboard, a touch panel, buttons, switches, levers, and the like. The input device 15 can also be realized by a voice input device such as a microphone, or by a remote controller capable of transmitting control signals using infrared rays or other radio waves.

The output device 16 can be realized by a display device such as a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), or an organic EL display, or by an audio output device such as a speaker or headphones. The output device 16 can also be realized by a device capable of visually or audibly notifying the user of acquired information, such as a printer, a mobile phone, or a facsimile.

The peripheral device 17 is any other device mounted on the information processing system 1 besides the input device 15 and the output device 16. The peripheral device 17 can be realized by, for example, various sensors such as an acceleration sensor and an angular velocity sensor, an inertial measurement unit, a ToF (Time of Flight) sensor, a GPS (Global Positioning System) receiver, an actuator, a camera, a speaker, a battery, and the like.

The communication device 18 can be realized by a NIC (Network Interface Card), various communication modems, or a wireless module such as Bluetooth (registered trademark) or Wi-Fi (registered trademark).

The various processes performed by the monitoring/adjustment module 100 shown in FIG. 1 can be realized by the processor 11 shown in FIG. 2 executing various programs stored in the auxiliary storage device 13 shown in FIG. 2, using the main storage device 12 and the like as a work area. Likewise, inference by the DNN model 30 and the various processes performed by the system module 50 can be realized by the processor 11 executing various programs stored in the auxiliary storage device 13, using the main storage device 12 and the like as a work area. That is, the processor 11, the main storage device 12, and the auxiliary storage device 13 can realize, in cooperation with software (various programs), the various functions of the monitoring/adjustment module 100 described below (for example, the processing functions of the acquisition unit 113 through the notification unit 116). The various programs executed by the processor 11 shown in FIG. 2 may also be programs downloaded from an external device such as a server via the communication device 18.

<< 3. Configuration example of the DNN model >>
Hereinafter, an outline of the DNN model 30 according to the embodiment of the present disclosure will be described. FIG. 3 is a diagram showing a configuration example of the DNN model according to the embodiment of the present disclosure. FIGS. 4 and 5 are diagrams showing an outline of the DNN partial inference device according to the embodiment of the present disclosure.

As shown in FIG. 3, the DNN model 30 is divided into a plurality of DNN partial inference devices 31 (31m to 31m+n) at a predetermined granularity based on predetermined conditions. The DNN model 30 is composed of a plurality of elements including layers, channels, and matrices. Each DNN partial inference device 31 is formed by dividing the DNN model 30 at a granularity based on at least one of the layers, channels, and matrices constituting it. The granularity at which the DNN model 30 is divided is decided in advance by the administrator of the monitoring/adjustment module 100, described later, based on the cost required for the pre-analysis and the desired time interval for resource usage adjustment.

As shown in FIG. 4, a DNN partial inference device 31 computes the result of one element (block) of the divided DNN using, for example, the calculation method given by the monitoring/adjustment module 100, and outputs it. The input is therefore either the input data of the DNN inference or the output (activation) of the preceding DNN partial inference device 31, and the output is the result of the DNN element (block) in its charge. In other words, as shown in FIG. 3, by connecting and operating a plurality of DNN partial inference devices 31 (31m to 31m+n), a function that returns the result of DNN inference on the input data can be realized. Depending on the structure of the DNN model 30 to be inferred, a branched structure may be required, but it can be configured in the same way.

In the embodiment of the present disclosure, as shown in FIG. 5, the DNN partial inference device 31 receives as an input the calculation method Om (control information), which is the output of the monitoring/adjustment module 100. The DNN partial inference device 31 acquires the input data and the calculation method Om (control information) at the same time, performs the calculation on the input data using the acquired calculation method (based on the control information), and returns the output as the result of the DNN element (block) in its charge. Specifically, the number of quantization bits and the quantization scheme for the weights and activations, the pruning ratio of a pruning method, and the calculation formulas and weight parameters to be used can all be made variable.
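
To make the role of the calculation method Om concrete, the following is a minimal sketch of a partial inference device for a fully connected block, assuming Om = (Qb, Pr). The uniform quantization and magnitude pruning used here are simplified stand-ins for the quantization scheme and pruning method, which the text leaves open, and all names are hypothetical.

```python
import numpy as np

def quantize(w, qb):
    # Uniform symmetric quantization of weights to qb bits (a simplification).
    scale = np.max(np.abs(w)) / (2 ** (qb - 1) - 1)
    return np.round(w / scale) * scale if scale > 0 else w

def prune(w, pr):
    # Zero out the fraction `pr` of weights with the smallest magnitude.
    if pr <= 0:
        return w
    k = int(w.size * pr)
    thresh = np.partition(np.abs(w).ravel(), k)[k] if k < w.size else np.inf
    return np.where(np.abs(w) < thresh, 0.0, w)

class PartialInferencer:
    """One block of the split DNN (here, a fully connected layer)."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def forward(self, x, om):
        qb, pr = om                                 # calculation method Om = (Qb, Pr)
        w = prune(quantize(self.weights, qb), pr)   # apply Om before computing
        return np.maximum(w @ x + self.bias, 0.0)   # affine transform + ReLU
```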

<< 4. Functional configuration example of the monitoring/adjustment module >>
Hereinafter, an example of the functional configuration of the monitoring/adjustment module according to the embodiment of the present disclosure will be described. FIG. 6 is a diagram showing an outline of the DNN inference device according to the embodiment of the present disclosure. FIG. 7 is a functional block diagram showing an example of the functional configuration of the monitoring/adjustment module according to the embodiment of the present disclosure.

As shown in FIG. 6, a DNN inference device 70 can be realized by combining the DNN partial inference devices 31 (31m to 31m+n) with monitoring/adjustment modules 100 (100m to 100m+n). In the DNN inference device 70 shown in FIG. 6, a monitoring/adjustment module 100 (100m to 100m+n) is provided for each DNN partial inference device 31. According to the DNN inference device 70 shown in FIG. 6, the monitoring/adjustment modules 100 (100m to 100m+n) acquire the resource usage of the entire system every time a decomposed element of the DNN is calculated, and can switch to a calculation method matched to the system state each time. This realizes a DNN inference device 70 whose resource usage can be adjusted instantly and whose inference never needs to be interrupted. Note that the term "block" appearing in the following description is synonymous with the DNN partial inference device 31.
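
A hedged sketch of this per-block arrangement follows, building on the PartialInferencer sketch above; MonitorAdjust is a hypothetical wrapper whose decide() returns the calculation method Om for its block.

```python
class DNNInferencer:
    """Sketch of FIG. 6: one monitoring/adjustment module per block."""
    def __init__(self, blocks, monitors):
        assert len(blocks) == len(monitors)
        self.blocks = blocks        # PartialInferencer instances
        self.monitors = monitors    # hypothetical MonitorAdjust instances

    def infer(self, x):
        # Om is re-decided immediately before each block runs, so resource
        # usage can be adjusted mid-inference without interrupting it.
        for block, monitor in zip(self.blocks, self.monitors):
            om = monitor.decide()
            x = block.forward(x, om)
        return x
```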

As shown in FIG. 7, the monitoring/adjustment module 100 includes a resource usage information storage unit 111, a correspondence information storage unit 112, an acquisition unit 113, a calculation unit 114, a determination unit 115, and a notification unit 116. Through these units, the monitoring/adjustment module 100 realizes or executes the functions and operations of the monitoring/adjustment module 100 described below.

Each of the blocks constituting the monitoring/adjustment module 100 (the resource usage information storage unit 111 through the notification unit 116) is a functional block representing a function of the monitoring/adjustment module 100. These functional blocks may be software blocks or hardware blocks. For example, each of the above-mentioned functional blocks may be a single software module realized by software (including microprograms), or a single circuit block on a semiconductor chip (die). Of course, each functional block may also be a single processor or a single integrated circuit. The method of configuring the functional blocks is arbitrary. Note that the monitoring/adjustment module 100 may be configured in functional units different from the above-mentioned functional blocks.

The resource usage information storage unit 111 stores information on the resource usage used in the calculation of each block of the DNN (each DNN partial inference device 31). The resource usage information stored in the resource usage information storage unit 111 is stored by the calculation unit 114, described later.

The correspondence information storage unit 112 stores correspondence information between resource usage amounts and the calculation methods pre-analyzed for each resource usage amount. The calculation method for each resource usage amount is obtained, for example, through a pre-analysis by the operator who administers the monitoring/adjustment module 100.

The acquisition unit 113 acquires the total resource usage of the information processing system 1. The acquisition unit 113 acquires, for example, the total power consumption and memory usage of the information processing system 1 from the system module 50.

The calculation unit 114 calculates, based on the total resource usage of the information processing system 1 acquired by the acquisition unit 113, the target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN inference device 70. For example, the calculation unit 114 calculates the target resource usage to be allocated to each inference calculation of a DNN partial inference device 31 (see FIG. 6) divided at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource usage based on, for example, the resource surplus of the information processing system 1.

The determination unit 115 determines the calculation method of the DNN partial inference device 31 corresponding to the target resource usage calculated by the calculation unit 114. The calculation method is control information indicating how to perform the calculation of an element (block) of the DNN model 30, that is, of the DNN partial inference device 31 (see FIG. 6) that performs the next inference calculation, and corresponds to, for example, the "number of quantization bits" or the "pruning ratio". The determination unit 115 can determine the calculation method based on control information composed of tuples, such as a combination of the "number of quantization bits" and the "pruning ratio". The determination unit 115 outputs the determined calculation method to the corresponding DNN partial inference device 31.

The notification unit 116 notifies that the resource usage allocated to the DNN partial inference devices 31 is decreasing. For example, when the target resource usage calculated by the calculation unit 114 is equal to or less than a predetermined standard, the notification unit 116 notifies that the accuracy of the result of the inference processing (inference result) by the DNN inference device 70 may decrease.

<4-1. Overview of the monitoring/adjustment module>
Hereinafter, an outline of the monitoring/adjustment module 100 will be described. FIG. 8 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.

As shown in FIG. 8, the monitoring/adjustment module 100 determines and outputs the calculation method Om of a certain element (block) constituting the DNN model 30 from the total resource usage Im of the information processing system 1 given as an input. An element (each block, i.e., a DNN partial inference device 31) may be a layer, a channel (map), a part of a matrix (tensor), or a combination of these. The administrator of the monitoring/adjustment module 100 divides the DNN model 30 into a plurality of blocks in advance and assigns a block number bi to each block. How finely to divide is decided based on the cost required for the pre-analysis and the desired time interval for resource usage adjustment.

As the system-wide resource usage Im given as an input to the monitoring/adjustment module 100, resource usage such as the power consumption, memory usage, and processing time of the entire system can be given.

Note that the monitoring/adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a predicted value of future usage obtained with a simple machine learning model instead of the measured value. In this way, abrupt changes in resource usage in the information processing system 1 can be accommodated. Further, the monitoring/adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a tuple combining a plurality of usage amounts of different kinds, such as power consumption and memory usage. In this way, an appropriate output matched to each resource usage can be calculated.

Furthermore, the monitoring/adjustment module 100 can take the DNN block number bi as an input. This is used to determine which block's calculation method is to be decided and output.

In addition, the monitoring/adjustment module 100 can receive the resource usage Rdnn' allocated to each of the blocks preceding block number bi. This is used to grasp the calculation status of the entire model and to determine a calculation method Om that can improve the accuracy of the DNN model as a whole.

The calculation method Om, which is the output of the monitoring/adjustment module 100, is control information indicating how the block with block number bi is to be calculated. Examples include the number of quantization bits that adjusts the precision of the numerical calculation, the pruning ratio indicating what fraction of the DNN elements to skip as typified by pruning methods, and the network structure and parameters to be used. As the output format of the module, when control information composed of a plurality of elements such as the number of quantization bits and the pruning ratio is used, it can be expressed as a tuple Om = (Qb, Pr) using the number of quantization bits Qb and the pruning ratio Pr.

The calculation method Om output by the monitoring/adjustment module 100 determines the calculation method for one DNN block, but one block may contain several channels or parts of matrices (tensors). In this case, the number of quantization bits Qb may be configured to hold a plurality of quantization bit numbers so that a bit width can be specified for each element in the block.
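
Purely as an illustration of these two output shapes, with invented values:

```python
# Illustrative encodings of the control information Om (values invented):
om_single = (8, 0.25)                # (Qb, Pr) for a block with one element
om_per_channel = ((8, 6, 4), 0.25)   # one Qb per channel, shared Pr
```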

<4-2. Operation of the monitoring/adjustment module (1)>
Hereinafter, the operation of the monitoring/adjustment module 100 will be described. FIG. 9 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.

As shown in FIG. 9, the determination of the calculation method Om by the monitoring/adjustment module 100 is performed in two stages. In the first stage, the monitoring/adjustment module 100 determines, from the resource usage Im of the entire system given as an input, the target resource usage Rdnn to be used by the block with block number bi. The first-stage operation is realized, for example, by the function of the calculation unit 114 shown in FIG. 7.

Next, in the second stage, the monitoring/adjustment module 100 selects and determines, from among the calculation methods that achieve the target resource usage Rdnn, a calculation method Om that degrades the DNN's calculation accuracy as little as possible. The second-stage operation is realized, for example, by the function of the determination unit 115 shown in FIG. 7. That is, the acquisition unit 113 of the monitoring/adjustment module 100 acquires the resource usage each time the calculation method is determined by the determination unit 115. Subsequently, the calculation unit 114 of the monitoring/adjustment module 100 calculates, each time the resource usage is acquired by the acquisition unit 113, the target resource usage of the DNN partial inference device 31 that performs the next inference calculation. Then, the determination unit 115 of the monitoring/adjustment module 100 determines, each time the target resource usage is calculated by the calculation unit 114, the calculation method of the DNN partial inference device that performs the next inference calculation.

As a simple way of computing the target resource usage Rdnn determined by the first-stage operation described above, there is a method of allocating the system's resource surplus to the DNN calculation. For example, using the resource usage Im of the entire system given as input, the maximum resource amount Rmax that the system can supply, and a constant margin Epsilon, it can be computed as in equation (1) below.

  Target resource usage Rdnn = Maximum resource amount Rmax - Resource usage Im - Constant margin Epsilon ... (1)

In the second stage described above, the calculation method Om must be determined from the target resource usage Rdnn that may be used by the block with block number bi and the resource usage Rdnn' used by the blocks preceding block number bi. However, there are many possible combinations of the number of quantization bits, the pruning ratio, and so on, and the calculation precision required differs from one part of the DNN to another, making the determination of the calculation method Om difficult. On the other hand, simply allocating the calculation resources evenly based on the Rdnn of each block greatly degrades the accuracy of the DNN inference processing. Therefore, as an example of a way to instantly determine the calculation method Om from multiple target resource usages Rdnn, a determination method based on pre-analysis results is shown. FIG. 10 is a diagram showing an outline of the method of determining a calculation method based on pre-analysis results according to the embodiment of the present disclosure.

FIG. 10 shows an example of a method in which, for various combinations of resource usage, calculation methods that cause little loss of inference accuracy are enumerated in advance. Suppose the input Im is one kind of resource, such as power consumption or memory usage, and its range is from 0% to 100%. In this case, for example, for the 11 interpolation points Im = 0%, 10%, 20%, ..., 100% between 0% and 100%, an optimal calculation method Om with little loss of inference accuracy is analyzed in advance for each point. This analysis is performed for every combination of the target resource usage Rdnn of the block whose calculation method is being determined and the resource usage Rdnn' used by the preceding blocks.

Then, the monitoring/adjustment module 100 acquires, based on the correspondence information stored in the correspondence information storage unit 112, a plurality of calculation methods associated with resource usages close to the target resource usage Rdnn, and determines the calculation method based on the acquired plurality of calculation methods. Specifically, at run time the monitoring/adjustment module 100 can determine the calculation method Om by searching the pre-analysis results for points (interpolation points) close to the input Im and interpolating from them. If the number of interpolation points in the pre-analysis is increased beyond 11, a more accurate calculation method Om can be determined. Even if the input Im concerns a plurality of resources, this can likewise be handled by analyzing the interpolation points in advance.
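
The lookup could, for example, take the shape of the following sketch; the table contents are invented for illustration and are not values from the pre-analysis described here.

```python
# Minimal sketch of the run-time lookup against pre-analysis results.
PREANALYZED = {   # interpolation point (% usage) -> method Om = (Qb, Pr)
    0: (2, 0.9), 10: (3, 0.8), 20: (4, 0.6), 30: (5, 0.5),
    40: (6, 0.4), 50: (8, 0.3), 60: (8, 0.2), 70: (12, 0.1),
    80: (16, 0.05), 90: (16, 0.0), 100: (32, 0.0),
}

def lookup_method(rdnn):
    # Snap to the nearest of the 11 interpolation points; a finer grid
    # (or interpolating between the two neighbors) gives a closer fit.
    nearest = min(PREANALYZED, key=lambda point: abs(point - rdnn))
    return PREANALYZED[nearest]
```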

The operation of the monitoring/adjustment module 100 described above is only one example; the calculation method Om output by the preceding monitoring/adjustment module 100 (for example, the monitoring/adjustment module 100m if the acting module is the monitoring/adjustment module 100m+1) may also be given as part of the input Im. This reveals the long-term temporal change in the resource usage of the entire system and how much calculation precision has been allocated to which elements of the DNN. It is therefore possible to obtain a calculation method Om that more stably improves the accuracy of the entire DNN processing against highly fluctuating resource usage.

Note that by reading "block" as "DNN partial inference device" in the operation of the monitoring/adjustment module 100 described above, an operation of outputting a calculation method Om to each of the DNN partial inference devices 31 constituting the DNN inference device 70 shown in FIG. 6 can be realized.

<4-3. Operation of the monitoring/adjustment module (2)>
The monitoring/adjustment module 100 can operate so as to notify, in the information processing system 1, that the resource usage allocated to the DNN partial inference devices 31 is decreasing. This operation is realized by the function of the notification unit 116 shown in FIG. 7. FIG. 11 is a diagram showing an outline of the notification operation according to the embodiment of the present disclosure.

As shown in FIG. 11, the information processing system 1 includes a user notification module 90. The user notification module 90, for example, visualizes and reports the processing state of the information processing system 1 to its user. The processing state corresponds to, for example, a decrease in recognition accuracy in the case of recognition processing such as image recognition or voice recognition, and a decrease in responsiveness in the case of UI response processing.

The monitoring/adjustment module 100 issues a notification to the user notification module 90, for example, when the target resource usage Rdnn allocated to a DNN partial inference device 31 is equal to or less than a predetermined standard.

Upon receiving the notification from the monitoring/adjustment module 100, the user notification module 90 visualizes, for example on an operation device 91 operated by the user, information indicating that the responsiveness of the information processing system 1 is degraded. Possible visualization methods include lighting a light-emitting unit provided on the operation device 91 in a predetermined color, or making the light-emitting unit blink.

 Note that the monitoring/adjustment module 100 may also notify the system module 50 when the target resource usage Rdnn allocated to the DNN partial inference device 31 falls to or below the predetermined threshold. Upon receiving the notification from the monitoring/adjustment module 100, the system module 50 can change the operation of the system so as to improve user safety in response to the reduced accuracy of the inference results of the DNN inference device 70. For example, if the information processing system 1 is a transport robot system, the operation could be changed so as to select a route that does not harm the environment or people, even at the expense of transport time. Alternatively, if the information processing system 1 is a pet-type robot, the operation could be changed so that, in line with the reduced accuracy of the response processing, the robot takes a resting gesture such as closing its eyes.

<<5. Example of a processing procedure by the monitoring/adjustment module>>
 Hereinafter, the processing procedure performed by the monitoring/adjustment module 100 according to the embodiment of the present disclosure will be described. FIG. 12 is a flowchart showing an example of the processing procedure performed by the monitoring/adjustment module according to the embodiment of the present disclosure. The processing procedure shown in FIG. 12 is executed repeatedly while the information processing system 1 is in operation.

 As shown in FIG. 12, the acquisition unit 113 acquires the input Im, which is the total resource usage of the information processing system 1 (step S101).

 The calculation unit 114 calculates the target resource usage Rdnn of the DNN partial inference device 31 based on the input Im, and stores it in the resource usage information storage unit 111 (step S102).

 The determination unit 115 uses the block number bi assigned to the current block to be processed (DNN partial inference device 31) as a key, and acquires the resource usage Rdnn' used by the blocks preceding the current block from the resource usage information storage unit 111 (step S103).

 The determination unit 115 determines the calculation method Om based on the target resource usage Rdnn and the resource usage Rdnn' used by the preceding blocks (step S104).

 The determination unit 115 outputs the determined calculation method Om to the current block to be processed (step S105), and the procedure returns to step S101.
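 Taken together, steps S101 to S105 form a simple monitoring loop. A minimal Python sketch of that loop follows; the class and method names (MonitoringModule, get_total_usage, set_calculation_method, and so on) and the surplus-based target calculation are assumptions introduced for illustration, not names or formulas used in the embodiment.

```python
# Minimal sketch of the S101-S105 loop (hypothetical names throughout).
class MonitoringModule:
    def __init__(self, usage_store, correspondence_table):
        self.usage_store = usage_store     # resource usage information storage 111
        self.table = correspondence_table  # correspondence information storage 112

    def step(self, get_total_usage, current_block):
        im = get_total_usage()                         # S101: total system usage Im
        rdnn = max(0.0, 1.0 - im)                      # S102: target usage from surplus
        self.usage_store[current_block.bi] = rdnn
        rdnn_prev = sum(self.usage_store.get(b, 0.0)   # S103: usage of earlier blocks
                        for b in range(current_block.bi))
        om = self.table.lookup(rdnn, rdnn_prev,        # S104: pick calculation method Om
                               current_block.bi)
        current_block.set_calculation_method(om)       # S105: hand Om to the block
        return om
```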

 In the embodiment described above, by giving the monitoring/adjustment module 100 a predicted value of future resource usage as the input Im, instead of the current resource usage, the DNN inference device 70 can be configured so that resource usage overruns are easier to avoid. A simple machine learning model or a Kalman filter can be used to predict future resource usage.
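 As one hedged illustration, a one-dimensional Kalman filter tracking system resource usage under a random-walk model could look like the following sketch; the noise parameters are assumptions chosen for illustration.

```python
# Minimal 1-D Kalman filter sketch for predicting the next resource usage value.
# Process/measurement noise values are illustrative assumptions.
class UsagePredictor:
    def __init__(self, q: float = 1e-4, r: float = 1e-2):
        self.x = 0.0   # estimated usage
        self.p = 1.0   # estimate variance
        self.q = q     # process noise
        self.r = r     # measurement noise

    def update(self, measured_usage: float) -> float:
        # Predict step (random-walk model), then correct with the measurement.
        self.p += self.q
        k = self.p / (self.p + self.r)          # Kalman gain
        self.x += k * (measured_usage - self.x)
        self.p *= (1.0 - k)
        return self.x                           # use as the predicted input Im
```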

 Further, the granularity at which the DNN model 30 is divided for pre-analysis to construct the DNN inference device 70 depends, for example, on how much pre-analysis time and computational cost can be tolerated, and on how fine the time interval for switching resource usage should be. Making the switching interval smaller requires evaluating more combinations of calculation methods in the pre-analysis, which demands considerable execution time and computational resources. In addition, since the number of points at which switching is possible across the model increases, the switching overhead may increase the processing time of the model as a whole.

 Two granularities must be determined when using this technique. The first is, for example, the granularity at which the DNN model 30 is divided into blocks; the second is how finely the elements within each block are analyzed. The balance between the pre-analysis cost and the switching overhead described above is adjusted through these two granularities.

 First, regarding DNN inference, the DNN model 30 can be divided into a plurality of DNN partial inference devices 31 at the first granularity, "the granularity at which the DNN model 30 is divided into blocks". For example, in the pre-analysis method illustrated in FIG. 10 and elsewhere, calculation methods are searched with respect to switching at this granularity, so this granularity is the smallest unit at which resource usage can be switched during DNN inference. Note that if little time has elapsed after processing one block divided at this granularity, blocks can effectively be merged by, for example, continuing to process the next block with the same resource usage without switching. However, resource usage cannot be switched in units smaller than a block.

 The second granularity, "how finely the elements within a block are examined", concerns how the number of bits and the computing power are allocated to the fine-grained elements within a block, such as channels and matrices (tensors), in order to respect that block's resource usage and maximize accuracy. As shown in Reference 1 above, making this finer suppresses the loss of accuracy, but it must be balanced against the pre-analysis cost, which is expected to grow as the granularity becomes finer.

 Further, in the above embodiment, a DNN that performs inference processing by stacking calculations of a plurality of layers in multiple stages has been described as the inference device, but the processing by the monitoring/adjustment module 100 described above can similarly be applied to a neural network composed of a single layer. In this case, the calculation method corresponding to the target resource usage can be determined based on the results of pre-analyzing, for each element such as the channels and matrices constituting the layer of the target neural network, the calculation method corresponding to each resource usage.

<<6. Modifications>>
<6-1. Generating correspondence information by machine learning>
 The correspondence information stored in the correspondence information storage unit 112 need not be acquired through pre-analysis by an operator; it may, for example, be acquired as a result of machine learning.

 For example, reinforcement learning can be used to train a DNN model that takes as input the resource usage Rdnn' used by the preceding blocks, the target resource usage Rdnn, and the block number bi, and infers the calculation method Om as the output for these inputs. In this case, the state of the reinforcement learning is the tuple of Rdnn' (the resource usage of the preceding blocks), Rdnn (the target resource usage), and the block number bi; the output is the calculation method Om; and the reward is the difference between the accuracy of the inference processing achievable with the calculation method Om (the recognition accuracy, in the case of recognition processing) and the accuracy when inference processing is performed without limiting resource usage. Using a reinforcement learning model enables efficient determination of the calculation method.
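 A minimal sketch of this formulation as a tabular Q-learning loop follows; the action set, the hyperparameters, and the way the accuracy-based reward is obtained are all assumptions introduced for illustration (the embodiment itself trains a DNN model rather than a table).

```python
# Minimal Q-learning sketch of the state/action/reward formulation above.
# State: (rdnn_prev, rdnn, bi); action: calculation method Om; names hypothetical.
import random
from collections import defaultdict

ACTIONS = [(8, 0.0), (8, 0.5), (4, 0.5)]  # assumed (bit width, pruning ratio) pairs
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q = defaultdict(float)

def choose(state):
    """Epsilon-greedy choice of a calculation method Om for the given state."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def learn(state, action, reward, next_state):
    # reward = accuracy(Om) - accuracy_unlimited, per the formulation in the text.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
```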

<6-2. Reducing the overhead of resource switching>
 In operation (1) of the monitoring/adjustment module described above, the second-stage operation described above can also be modified in order to reduce the overhead of resource switching. FIG. 13 is a diagram showing an outline of a method for determining the calculation method based on pre-analysis results according to a modification. FIG. 14 is a diagram showing a configuration example of the DNN inference device according to the modification.

 For example, as shown in FIG. 13, the monitoring/adjustment module 100 may output, for the inputs consisting of the resource usage Rdnn', the target resource usage Rdnn, and the block number bi, the calculation methods for all blocks from block number bi onward.

 Further, as shown in FIG. 14, judgment mechanisms 71 (71m to 71m+n-1) may be newly provided in the DNN inference device 70 to decide, before a given block is calculated, whether to operate the monitoring/adjustment module 100 and output a new calculation method, or to use the previously determined calculation method as is. The judgment mechanism 71 shown in FIG. 14 measures the time elapsed since the previous operation of the monitoring/adjustment module 100, and if a fixed time has elapsed, operates the monitoring/adjustment module 100 to output a new calculation method. On the other hand, if the fixed time has not elapsed since the previous operation of the monitoring/adjustment module 100, the judgment mechanism 71 uses the previously determined calculation method as is. In this way, the overhead of resource switching can be reduced compared with determining the calculation method every time the total resource usage of the system is acquired.

 For example, when determining the calculation method of the partial inference device 31m+n to which block number bi = n is assigned, the judgment mechanism 71m+n-1 determines whether a fixed time has elapsed since the calculation method for block number bi = n-1 was determined. If the fixed time has elapsed, the judgment mechanism 71m+n-1 causes the monitoring/adjustment module 100m+n to determine the calculation method of the partial inference device 31m+n to which block number bi = n is assigned, and outputs the new calculation method. On the other hand, if the fixed time has not elapsed, the judgment mechanism 71m+n-1 outputs, as is, the calculation method previously determined for the partial inference device 31m+n to which block number bi = n is assigned.
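 The time-based judgment could be sketched in Python as follows; the refresh interval and the interfaces of the monitoring module and the block are assumptions for illustration.

```python
# Minimal sketch of the judgment mechanism 71 (hypothetical interfaces).
import time

class JudgmentMechanism:
    def __init__(self, monitor, refresh_interval_s: float = 0.5):
        self.monitor = monitor                        # monitoring/adjustment module
        self.refresh_interval_s = refresh_interval_s  # assumed fixed time
        self.last_refresh = 0.0
        self.cached_method = None

    def method_for(self, block):
        now = time.monotonic()
        if self.cached_method is None or \
           now - self.last_refresh >= self.refresh_interval_s:
            # Enough time has passed: derive a new calculation method.
            self.cached_method = self.monitor.decide(block)
            self.last_refresh = now
        # Otherwise reuse the previous method, avoiding switching overhead.
        return self.cached_method
```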

<<7. Other applications>>
<7-1. Application to robot systems>
 The DNN inference device 70, composed of the DNN partial inference devices 31 and the monitoring/adjustment module 100 described above, can be applied to a robot system that processes camera images with a DNN. Robots that operate outdoors are battery-driven, so many of them have a maximum power consumption constraint. In addition, they carry power-hungry electrical components such as actuators, and recognition processing and communication processing for deciding actions run at the same time, so power consumption fluctuates with the situation. If the system is operated without regard to the power consumption of the system as a whole, the usage may exceed the maximum supply, potentially causing problems such as abnormal operation or loss of posture control.

 In a robot system, by using the DNN inference device 70 composed of the DNN partial inference devices 31 and the monitoring/adjustment module 100 to execute recognition processing that recognizes camera images with a DNN, the recognition processing can continue uninterrupted while the power consumption is adjusted so as not to exceed the maximum supply. The module is configured so that the current power consumption is given to the input of the monitoring/adjustment module 100 and the pruning ratio of the next DNN element is determined as the output. As a result, when the power consumption of the entire system grows, the pruning ratio increases and the calculation of some DNN elements is omitted. Omitting part of the calculation lowers the circuit utilization per unit time, keeps the dynamic power consumption of the DNN processing low, and creates a surplus, making it possible to supply power to components such as actuators that demand it. Moreover, since the pruning ratio is determined by prior analysis and the DNN recognition processing itself continues to run, missed detections of moving objects captured by the camera and delays in action decisions can be kept to a minimum. In a situation where the power consumption of the actuators and the like increases and the pruning ratio must be raised substantially, the recognition accuracy of the DNN recognition processing decreases, and with it the detection rate of moving objects drops to some extent, so the robot could conceivably take actions that are unsafe for the surrounding environment or people. In such cases, safety can be further improved by measures such as having a transport robot select a route that does not harm the environment or people even at the expense of transport time, or having a pet-type robot used in the home appear to the user as if its eyes are closed.
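 As a hedged illustration, the power-to-pruning mapping described above might look like the following sketch; the power budget, the upper pruning bound, and the linear mapping are assumptions for illustration and are not specified by the embodiment.

```python
# Minimal sketch: map current system power draw to a pruning ratio
# for the next DNN element (all numbers are illustrative assumptions).
MAX_SUPPLY_W = 60.0  # assumed maximum power supply

def pruning_ratio(current_power_w: float) -> float:
    """Higher system power draw -> more aggressive pruning of the next element."""
    headroom = max(0.0, MAX_SUPPLY_W - current_power_w)
    # Full headroom -> no pruning; no headroom -> prune up to 90% of the element.
    return min(0.9, 0.9 * (1.0 - headroom / MAX_SUPPLY_W))
```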

<7-2. Application to game consoles>
 The DNN inference device 70 described above can be applied to a game console equipped with a UI function that recognizes voice, hand gestures, and the like through DNN processing. A game console has a predetermined memory capacity. To raise the quality of scene rendering, a large amount of geometry and material information must be expanded into memory. If DNN recognition processing runs in parallel and uses a certain amount of memory, the rendering quality has to be lowered accordingly. However, for the sake of the game's progression, there are scenes, such as movie scenes, where the developer wants to raise the rendering quality and let the player take in the scenery, even at some cost to operability. Conversely, in indoor scenes or on screens that display only the UI, the amount of memory used for rendering is small. In such cases, it is desirable to allocate the surplus memory to the DNN recognition processing to improve recognition accuracy.

 By applying the DNN inference device 70 described above to a game console, these requirements can be met. The memory usage of the rendering processing is given to the input of the monitoring/adjustment module 100, and the number of quantization bits is determined as the output. In this way, when a large memory area must be secured for rendering, the number of quantization bits switches instantly, and the DNN recognition processing runs with a small memory footprint. The resulting surplus memory capacity can be used to improve the rendering quality. On the other hand, there is a concern that the responsiveness of the DNN-based UI function will decrease, but countermeasures can be taken, such as the game developer carefully choosing the scenes in which to use this (for example, when drawing the player into a movie scene), visualizing the reduced responsiveness on the controller or the like (see FIG. 11 and elsewhere), or indicating to the user that the game should be paused before using the UI function.
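 As a hedged sketch, the memory-to-quantization mapping might look like this; the memory budget and the bit-width tiers are assumptions for illustration.

```python
# Minimal sketch: choose a DNN quantization bit width from rendering memory usage.
# The budget and tiers below are illustrative assumptions.
TOTAL_MEMORY_MB = 8192.0

def quantization_bits(render_memory_mb: float) -> int:
    free_ratio = max(0.0, TOTAL_MEMORY_MB - render_memory_mb) / TOTAL_MEMORY_MB
    if free_ratio > 0.5:
        return 16  # plenty of memory: higher-precision inference
    if free_ratio > 0.2:
        return 8
    return 4       # rendering needs the memory: smallest DNN footprint
```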

<<8. Conclusion>>
 The monitoring/adjustment module 100 (an example of an information processing device) according to the embodiment of the present disclosure is applied to an information processing system 1 that uses inference results from an inference device using a neural network, and includes an acquisition unit 113, a calculation unit 114, and a determination unit 115. The acquisition unit 113 acquires the total resource usage of the information processing system 1. The calculation unit 114 calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of the inference processing by the DNN inference device 70 (as one example, the calculation of the inference processing by a DNN partial inference device 31). The determination unit 115 determines the calculation method corresponding to the target resource usage (the calculation method of the inference processing by the DNN partial inference device 31). In this way, the monitoring/adjustment module 100 can adjust the amount of resources used for the inference processing of the DNN model 30 so that resource usage in the information processing system 1 is not exceeded.

 FIG. 15 is a diagram showing an example of information processing according to a comparative example. FIG. 16 is a diagram showing an example of information processing by the DNN inference device according to the embodiment of the present disclosure. FIG. 17 is a diagram showing the relationship between elapsed time and resource usage for the information processing according to the comparative example. FIG. 18 is a diagram showing the relationship between elapsed time and resource usage for the information processing by the DNN inference device according to the embodiment of the present disclosure.

 As shown in FIG. 15, the function of adjusting the resource usage of processing based on the resource usage of the entire system could also be realized by a technique called code switching. In FIG. 15, two codes with the same purpose are prepared for the DNN inference device: a "normal code" for when there is a resource surplus, and an "overrun code" for when resources are being exceeded, with the code switched according to resource usage. Under this scheme, as indicated by the arrows in FIG. 17, the opportunities to adjust resource usage are limited to before a DNN inference is executed, for example the DNN inference on data 1 or the DNN inference on data 2. Consequently, when the resource usage of the entire system increases suddenly, the resource usage of the DNN inference device cannot be instantly reallocated and adjusted. As a result, the resource usage of the entire system may be exceeded and the system may stop, or DNN inference may be stopped as a safety measure.

 In contrast, as described above, the DNN inference device 70 according to the embodiment of the present disclosure exploits the facts that DNN computation is a stack of per-element calculations (the calculations of the individual DNN partial inference devices 31) and that the output accuracy of the DNN as a whole does not degrade easily even if the calculation accuracy is changed per element (see Reference 1 above). In the DNN inference device 70 according to the embodiment of the present disclosure, even if it becomes necessary to adjust resource usage at the DNN partial inference device 31m+1 as shown in FIG. 16, resource usage can be reallocated and adjusted sequentially, because opportunities to adjust resource usage exist for each element, as indicated by the arrows in FIG. 18.

 FIG. 19 is a diagram showing an example of information processing with an existing technique. The purpose of adjusting the resource usage of processing based on the resource usage of the entire system is to appropriately distribute the limited resources available to the system between DNN processing and other processing. As a function for achieving this purpose, one could, for example, apply the mechanism shown in the prior art document described earlier (Japanese Patent Laid-Open No. 2012-43409). That is, it could be realized by reading "data processing" as "DNN inference processing" in a data processing system that, when the data processing load (resource usage) is predicted to increase, selectively discards time-series input data, delays data processing, or offloads data processing to other systems. With such a mechanism, however, when the resource usage of other processing increases, some data may never undergo DNN inference, or processing results may be obtained late, so the real-time performance and quality of the DNN inference may deteriorate significantly.

 In contrast, the DNN inference device 70 according to the embodiment of the present disclosure exploits the property that lowering the calculation accuracy partway through a DNN computation has little effect on the accuracy of the result (see Reference 1 above), making it possible to construct a data processing system that adjusts resource usage without skipping data or increasing delay, as described with reference to FIG. 6 and elsewhere above.

 Further, the DNN inference device 70 (an example of an inference device) is divided into a plurality of DNN partial inference devices 31 (an example of partial inference devices) at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource amount to allocate to the inference processing of the DNN partial inference device 31 to be processed, and the determination unit 115 determines, based on the target resource amount, the calculation method for the inference processing of the DNN partial inference device 31 to be processed, and outputs it to that DNN partial inference device 31. In this way, the monitoring/adjustment module 100 can distribute an appropriate amount of resources to each DNN partial inference device 31 out of the resources that can be allocated in the system.

 Further, the DNN inference device 70 is divided into a plurality of DNN partial inference devices 31 based on the cost required to analyze the calculation method for each resource usage, or on the time interval required to adjust resource usage. This enables the monitoring/adjustment module 100 to manage resources flexibly according to the system administrator's goals.

 Further, the DNN inference device 70 is composed of a plurality of elements including layers, channels, and matrices. The DNN partial inference devices 31 are divided at a granularity based on at least one of layers, channels, and matrices. This allows the monitoring/adjustment module 100 to adjust resource usage at any granularity according to the system administrator's goals.

 Further, the determination unit 115 determines the calculation method based on control information composed of a tuple. This allows the monitoring/adjustment module 100 to specify the calculation method through multiple elements.

 Further, the calculation unit 114 calculates the target resource usage based on the resource surplus in the information processing system 1. This allows the monitoring/adjustment module 100 to calculate the target resource amount simply.

 Further, the calculation unit 114 calculates the target resource usage based on the resource surplus and a predetermined margin. This allows the monitoring/adjustment module 100 to increase the availability of the system.
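 A short sketch of this surplus-and-margin calculation follows; the margin value is an illustrative assumption.

```python
# Target usage from surplus and margin (margin value is an illustrative assumption).
def target_resource_usage(total_capacity: float, current_usage: float,
                          margin: float = 0.05) -> float:
    surplus = max(0.0, total_capacity - current_usage)
    return max(0.0, surplus - margin * total_capacity)
```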

 Further, the monitoring/adjustment module 100 includes the correspondence information storage unit 112 (an example of a storage unit), which stores correspondence information between resource usages and the calculation methods pre-analyzed for each resource usage. The determination unit 115 determines the calculation method based on the correspondence information. This allows the monitoring/adjustment module 100 to adjust the resource amount for each element of the DNN quickly while maintaining the accuracy of the DNN processing in the system.

 Further, the correspondence information storage unit 112 stores correspondence information obtained as a result of machine learning. This can improve the accuracy of the resource adjustment.

 Further, the determination unit 115 acquires, based on the correspondence information, a plurality of calculation methods associated with resource usages close to the target resource usage, and determines the calculation method based on the acquired plurality of calculation methods. This makes it possible to balance the speed and accuracy of the resource amount adjustment.
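 This nearest-neighbor style lookup over the correspondence information could be sketched as follows; the table layout, the number of candidates k, and the tie-breaking rule are assumptions for illustration.

```python
# Minimal sketch: pick a calculation method from entries whose resource usage
# is closest to the target (table layout is an illustrative assumption).
def decide_method(table: list[tuple[float, object]], rdnn: float, k: int = 3):
    """table: list of (resource_usage, calculation_method) from pre-analysis."""
    # Take the k entries nearest to the target usage Rdnn...
    nearest = sorted(table, key=lambda e: abs(e[0] - rdnn))[:k]
    # ...then keep the highest-usage method that still fits within the target.
    fitting = [e for e in nearest if e[0] <= rdnn]
    chosen = max(fitting, key=lambda e: e[0],
                 default=min(nearest, key=lambda e: e[0]))
    return chosen[1]
```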

 Further, the monitoring/adjustment module 100 includes the notification unit 116, which notifies the information processing system 1 that the target resource usage has fallen to or below a predetermined threshold. This can improve the safety of the operation of the information processing system 1.

 Further, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines a calculation method; the calculation unit 114 calculates the target resource usage of the DNN partial inference device 31 to be processed next each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation method of the DNN partial inference device 31 to be processed next each time the calculation unit 114 calculates the target resource usage. This realizes resource amount adjustment of the DNN partial inference devices 31 that follows the state of the system as closely as possible.

 Alternatively, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines a calculation method; the calculation unit 114 calculates the target resource usages of all DNN partial inference devices 31 to be processed thereafter each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation methods of all DNN partial inference devices 31 to be processed thereafter each time the calculation unit 114 calculates the target resource usages. This can reduce the overhead associated with resource switching in the system.

 Although the embodiments of the present disclosure have been described above, the technical scope of the present disclosure is not limited to the embodiments as described, and various modifications are possible without departing from the gist of the present disclosure. Components spanning different embodiments and modifications may also be combined as appropriate.

 Further, the effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology of the present disclosure may exhibit other effects that are apparent to those skilled in the art from the description of this specification, in addition to or instead of the above effects.

 The technology of the present disclosure can also take the following configurations, which belong to the technical scope of the present disclosure.
(1)
An information processing device applied to an information processing system that uses inference results from an inference device using a neural network, the information processing device comprising:
an acquisition unit that acquires the total resource usage of the information processing system;
a calculation unit that calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of inference processing by the inference device; and
a determination unit that determines a calculation method corresponding to the target resource usage.
(2)
The information processing device according to (1), wherein
the inference device is divided into a plurality of partial inference devices at a predetermined granularity based on predetermined conditions,
the calculation unit calculates the target resource usage to allocate to the inference processing of the partial inference device that next performs the calculation of the inference processing, and
the determination unit determines, based on the target resource usage, the calculation method in the inference processing of the partial inference device that next performs the calculation of the inference processing.
(3)
The information processing device according to (2), wherein the inference device is divided into the plurality of partial inference devices based on the cost required to analyze the calculation method for each resource usage, or on the time interval required to adjust resource usage.
(4)
The information processing device according to (3), wherein
the inference device is composed of a plurality of elements including layers, channels, and matrices, and
the partial inference devices are divided at a granularity based on at least one of layers, channels, and matrices.
(5)
The information processing device according to (1), wherein the determination unit determines the calculation method based on control information composed of a tuple.
(6)
The information processing device according to any one of (1) to (5), wherein the calculation unit calculates the target resource usage based on the resource surplus in the information processing system.
(7)
The information processing device according to (6), wherein the calculation unit calculates the target resource usage based on the resource surplus and a predetermined margin.
(8)
The information processing device according to any one of (1) to (7), further comprising a storage unit that stores correspondence information between resource usages and the calculation methods pre-analyzed for each resource usage, wherein the determination unit determines the calculation method based on the correspondence information.
(9)
The information processing device according to (8), wherein the storage unit stores the correspondence information obtained as a result of machine learning.
(10)
The information processing device according to (8) or (9), wherein the determination unit acquires, based on the correspondence information, a plurality of the calculation methods associated with resource usages close to the target resource usage, and determines the calculation method based on the acquired plurality of calculation methods.
(11)
The information processing device according to any one of (1) to (10), further comprising a notification unit that notifies the information processing system that the target resource usage has fallen to or below a predetermined threshold.
(12)
The information processing device according to (2), wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usage of the partial inference device to be processed next each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation method of the partial inference device to be processed next each time the calculation unit calculates the target resource usage.
(13)
The information processing device according to (2), wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usages of all partial inference devices to be processed thereafter each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation methods of all partial inference devices to be processed thereafter each time the calculation unit calculates the target resource usages.
(14)
An information processing method in which a processor of an information processing device applied to an information processing system that uses inference results from an inference device using a neural network:
acquires the total resource usage of the information processing system;
calculates, based on the resource usage, a target resource usage to allocate to the inference processing of the inference device; and
determines, based on the target resource usage, a calculation method in the inference processing of the inference device.

1 Information processing system
11 Processor
12 Main storage device
13 Auxiliary storage device
14 Peripheral circuit
15 Input device
16 Output device
17 Peripheral device
18 Communication device
20 Internal bus
30 DNN model
31 DNN partial inference device
50 System module
70 DNN inference device
90 User notification module
91 Operation device
100 Monitoring/adjustment module
111 Resource usage information storage unit
112 Correspondence information storage unit
113 Acquisition unit
114 Calculation unit
115 Determination unit
116 Notification unit

Claims (14)

1. An information processing device applied to an information processing system that uses inference results from an inference device using a neural network, the information processing device comprising:
an acquisition unit that acquires the total resource usage of the information processing system;
a calculation unit that calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of inference processing by the inference device; and
a determination unit that determines a calculation method corresponding to the target resource usage.

2. The information processing device according to claim 1, wherein
the inference device is divided into a plurality of partial inference devices at a predetermined granularity based on predetermined conditions,
the calculation unit calculates the target resource usage to allocate to the inference processing of the partial inference device that next performs the calculation of the inference processing, and
the determination unit determines, based on the target resource usage, the calculation method in the inference processing of the partial inference device that next performs the calculation of the inference processing.

3. The information processing device according to claim 2, wherein the inference device is divided into the plurality of partial inference devices based on the cost required to analyze the calculation method for each resource usage, or on the time interval required to adjust the resource usage.

4. The information processing device according to claim 3, wherein
the inference device is composed of a plurality of elements including layers, channels, and matrices, and
the partial inference devices are divided at a granularity based on at least one of layers, channels, and matrices.

5. The information processing device according to claim 1, wherein the determination unit determines the calculation method based on control information composed of a tuple.

6. The information processing device according to claim 1, wherein the calculation unit calculates the target resource usage based on the resource surplus in the information processing system.

7. The information processing device according to claim 6, wherein the calculation unit calculates the target resource usage based on the resource surplus and a predetermined margin.

8. The information processing device according to claim 1, further comprising a storage unit that stores correspondence information between resource usages and the calculation methods pre-analyzed for each resource usage, wherein the determination unit determines the calculation method based on the correspondence information.

9. The information processing device according to claim 8, wherein the storage unit stores the correspondence information obtained as a result of machine learning.

10. The information processing device according to claim 8, wherein the determination unit acquires, based on the correspondence information, a plurality of the calculation methods associated with resource usages close to the target resource usage, and determines the calculation method based on the acquired plurality of calculation methods.

11. The information processing device according to claim 1, further comprising a notification unit that notifies the information processing system that the target resource usage has fallen to or below a predetermined threshold.

12. The information processing device according to claim 2, wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usage of the partial inference device that next performs the calculation of the inference processing each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation method of the partial inference device that next performs the calculation of the inference processing each time the calculation unit calculates the target resource usage.

13. The information processing device according to claim 2, wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usages of all partial inference devices that thereafter perform the calculation of the inference processing each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation methods of all partial inference devices that thereafter perform the calculation of the inference processing each time the calculation unit calculates the target resource usages.

14. An information processing method in which a processor of an information processing device applied to an information processing system that uses inference results from an inference device using a neural network:
acquires the total resource usage of the information processing system;
calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of inference processing by the inference device; and
determines a calculation method corresponding to the target resource usage.