
WO2021235312A1 - Information processing device, and information processing method - Google Patents

Information processing device, and information processing method

Info

Publication number
WO2021235312A1
WO2021235312A1 (PCT/JP2021/018193)
Authority
WO
WIPO (PCT)
Prior art keywords
resource usage
inference
information processing
dnn
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/018193
Other languages
French (fr)
Japanese (ja)
Inventor
馨 佐宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp filed Critical Sony Group Corp
Publication of WO2021235312A1
Current legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks

Definitions

  • This disclosure relates to an information processing device and an information processing method.
  • In recent years, machine learning algorithms have evolved rapidly through the use of artificial neural networks such as DNNs (Deep Neural Networks), and processing using DNNs is widely applied in fields such as image recognition, speech recognition, and artificial intelligence.
  • The information processing device of one form according to the present disclosure is an information processing device applied to an information processing system that uses an inference result produced by an inference device using a neural network, and includes an acquisition unit, a calculation unit, and a determination unit.
  • the acquisition unit acquires the total resource usage of the information processing system.
  • the calculation unit calculates the target resource usage to be allocated to at least a part of the calculation of the inference processing by the inference device based on the resource usage.
  • The determination unit determines the calculation method corresponding to the target resource usage.
  • FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure.
  • the monitoring / adjusting module 100 (an example of an information processing apparatus) according to the embodiment of the present disclosure is applied to an information processing system 1 including a DNN model 30 and a system module 50.
  • the information processing system 1 operates the inference of the DNN model 30 in parallel with the processing in the system module 50.
  • the DNN model 30 is composed of a trained model machine-learned using DNN, which is an artificial neural network having a plurality of layers such as an input layer, an output layer, and a plurality of hidden layers (intermediate layers).
  • the DNN model 30 is composed of a plurality of layers such as an input layer, an output layer, and a plurality of hidden layers, as well as a plurality of elements such as a channel and a matrix.
  • the DNN model 30 performs calculation processing of input data by a calculation method given for each element from the monitoring / adjustment module 100, and outputs a calculation result.
  • the system module 50 executes various processes using the inference result of the DNN model 30.
  • The system module 50 corresponds to, for example, a module that controls the operation of a robot based on the result of recognizing camera images with the DNN model 30, or a module that executes the processing of a game machine having a UI (User Interface) function that uses the results of recognizing voice and hand gestures with the DNN model 30.
  • In such applications, a missed detection caused by selectively removing input data, or a delay in action decision caused by delaying processing, is a serious problem.
  • For example, an electric unit may be controlled at intervals of several tens to several hundreds of microseconds. In this case, even if the resource usage of the recognition processing is switched at the timing when the data subject to recognition processing changes, the switching cannot follow the fluctuation of the resource usage of the electric unit.
  • the monitoring / adjustment module 100 can instantly adjust the amount of resources used for inference processing by DNN, for example, while monitoring the amount of resources used in the entire system.
  • By reducing the time interval (the fineness of the elements) at which the amount of resources is determined, it becomes possible to instantly adjust part of the resources used for inference processing by the DNN even during processing. Thanks to the surplus created as a result of this adjustment, the resource amount of the entire system does not exceed its limit, and the system continues to operate without abnormal termination, abnormal operation, excess latency, and the like.
  • A DNN has the property that the recognition accuracy of, for example, a camera image is unlikely to be lowered even if the calculation accuracy is lowered and the processing is performed with fewer resources.
  • Reference 1 proposes a method (PACT: Parameterized Clipping Activation for Quantized Neural Networks) of suppressing the influence on recognition accuracy when the number of quantization bits of a DNN is increased or decreased. According to this method, even if the number of quantization bits is reduced in some object recognition models, the decrease in recognition accuracy can be kept within a certain range.
  • The function realized by the monitoring / adjusting module 100 according to the embodiment of the present disclosure includes a technique for adjusting the resource usage between elements in the middle of inference processing by the DNN.
  • By exploiting the property that the accuracy does not easily decrease even if the calculation accuracy (the amount of resources allocated to the DNN) is reduced, it is possible to realize an inference device that does not need to stop DNN inference while taking the resource usage of the entire system into account.
  • The monitoring / adjustment module 100 has a mechanism for ensuring that the accuracy of the DNN inference processing is lowered as little as possible when determining the resource usage of each element. This is realized by analyzing in advance, for each element of the DNN model that can reduce the resources required for processing, the relationship between the resource usage and the accuracy (inference accuracy).
  • With this technology, data processing can be performed with extremely high accuracy when there is a margin in the resource usage of the entire system, and it can be guaranteed that data processing continues with a certain degree of processing (recognition) accuracy even in a situation where there is no such margin.
  • the monitoring / adjustment module 100 acquires the total resource usage of the information processing system 1 (step S1).
  • the total resource usage of the information processing system 1 corresponds to, for example, power consumption, memory usage, processing time, and the like.
  • the monitoring / adjustment module 100 calculates the target resource usage amount to be allocated to at least a part of the calculation of the inference processing by the DNN model 30 based on the total resource usage amount of the information processing system 1 (step S2).
  • The calculation of the inference processing by the DNN model 30 is composed of, for example, a series of calculations over a plurality of layers arranged in multiple stages. Therefore, the monitoring / adjustment module 100 calculates the target resource usage for each layer constituting the inference processing of the DNN model, for example, based on the resource surplus of the information processing system 1.
  • The monitoring / adjustment module 100 determines a calculation method corresponding to the target resource usage (step S3), and outputs the calculation method to the DNN model 30. Specifically, the monitoring / adjustment module 100 determines the calculation method based on correspondence information between resource usage amounts and the calculation methods pre-analyzed for each resource usage amount. The calculation method is determined based on control information, such as the "quantization bit number" and the "pruning ratio", indicating how the calculation of the inference processing by the DNN model 30 is to be performed.
  • the monitoring / adjustment module 100 calculates the target resource usage amount to be allocated to at least a part of the calculation of the inference processing by the DNN model 30 based on the total resource usage amount of the information processing system 1. Then, the monitoring / adjustment module 100 determines a calculation method corresponding to the calculated target resource usage amount. Thereby, the monitoring / adjusting module 100 can adjust the resource amount used for the inference processing of the DNN model 30 so as not to cause the resource usage amount to be exceeded in the information processing system 1.
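  • As an illustration of steps S1 to S3, the following is a minimal Python sketch of one monitoring/adjustment cycle. All names and values (get_total_resource_usage, CORRESPONDENCE_TABLE, the numeric constants) are assumptions for illustration and are not defined by the present disclosure; the surplus rule anticipates equation (1) described later.

```python
# Minimal sketch of one monitoring/adjustment cycle (steps S1 to S3).
# Every name and value here is an illustrative assumption, not part of
# the disclosure.

R_MAX = 100.0    # maximum resource amount the system can supply (assumed unit: %)
EPSILON = 5.0    # constant margin kept free for safety

def get_total_resource_usage() -> float:
    """Step S1: acquire the total resource usage Im of the system (stubbed)."""
    return 70.0

# Assumed pre-analyzed correspondence between the resource cost of a
# calculation method and its control information (quantization bit number,
# pruning ratio), sorted by ascending cost.
CORRESPONDENCE_TABLE = [
    (10.0, {"q_bits": 2, "pruning_ratio": 0.8}),
    (25.0, {"q_bits": 4, "pruning_ratio": 0.5}),
    (50.0, {"q_bits": 8, "pruning_ratio": 0.0}),
]

def adjust_once() -> dict:
    im = get_total_resource_usage()            # step S1
    r_dnn = R_MAX - im - EPSILON               # step S2 (equation (1) below)
    # Step S3: pick the costliest (most accurate) method that fits in r_dnn.
    feasible = [method for cost, method in CORRESPONDENCE_TABLE if cost <= r_dnn]
    return feasible[-1] if feasible else CORRESPONDENCE_TABLE[0][1]

print(adjust_once())   # -> {'q_bits': 4, 'pruning_ratio': 0.5}
```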
  • FIG. 2 is a diagram showing a schematic configuration example of an information processing system according to an embodiment of the present disclosure.
  • The information processing system 1 includes a processor 11, a main storage device 12, an auxiliary storage device 13, a peripheral circuit 14, an input device 15, an output device 16, a peripheral device 17, and a communication device 18.
  • the processor 11, the main storage device 12, the auxiliary storage device 13, the peripheral circuit 14, and the communication device 18 are connected to each other via the internal bus 20.
  • the processor 11 is realized by, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit).
  • the processor 11 executes arithmetic processing and operation control in the information processing system 1.
  • the main storage device 12 is realized by a semiconductor memory element such as a RAM (Random Access Memory).
  • the auxiliary storage device 13 is realized by a semiconductor memory element such as a ROM (Read Only Memory) or a storage device such as a hard disk or an optical disk.
  • the peripheral circuit 14 is realized by an A / D converter, a timer, a signal processing circuit, or the like.
  • the peripheral circuit 14 processes various signals and data of the input device 15, the output device 16, and the peripheral device 17.
  • the input device 15 is realized by, for example, a mouse, a keyboard, a touch panel, buttons, switches, levers, and the like. Further, the input device 15 can also be realized by a voice input device such as a remote controller or a microphone capable of transmitting a control signal using infrared rays or other radio waves.
  • the output device 16 can be realized by a display device such as a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), or an organic EL, and an audio output device such as a speaker or a headphone. Further, the output device 16 can also be realized by a device such as a printer, a mobile phone, or a facsimile that can visually or audibly notify the user of the acquired information.
  • the peripheral device 17 is another device mounted on the information processing system 1 other than the input device 15 and the output device 16.
  • the peripheral device 17 can be realized by, for example, various sensors such as an acceleration sensor and an angular velocity sensor, an inertial measurement unit, a ToF (Time of Flight) sensor, a GPS (Global Positioning System), an actuator, a camera, a speaker, a battery, and the like.
  • The communication device 18 can be realized by a NIC (Network Interface Card), various communication modems, or a wireless module such as Bluetooth (registered trademark) or Wi-Fi (registered trademark).
  • The monitoring / adjustment module 100 can be realized by the processor 11 shown in FIG. 2 executing various programs stored in the auxiliary storage device 13 shown in FIG. 2, using the main storage device 12 and the like as a work area. Likewise, the inference by the DNN model 30 and the various processes by the system module 50 can be realized by the processor 11 executing various programs stored in the auxiliary storage device 13, using the main storage device 12 and the like as a work area.
  • The processor 11, the main storage device 12, and the auxiliary storage device 13 cooperate with software (various programs) to realize the various processing functions (for example, the acquisition unit 113 to the notification unit 116) of the monitoring / adjustment module 100 described below.
  • a program downloaded from an external device such as a server can also be used via the communication device 18.
  • FIG. 3 is a diagram showing a configuration example of the DNN model according to the embodiment of the present disclosure.
  • FIGS. 4 and 5 are diagrams showing an outline of the DNN partial inference device according to the embodiment of the present disclosure.
  • The DNN model 30 is composed of a plurality of DNN partial inference devices 31 (31m to 31m+n) divided at a predetermined granularity based on predetermined conditions.
  • the DNN model 30 is composed of a plurality of elements including layers, channels, and matrices.
  • Each DNN partial inference device 31 is configured by dividing the DNN model 30 at a granularity based on at least one of the elements (layers, channels, and matrices) constituting the DNN model 30.
  • The granularity at which the DNN model 30 is divided is determined in advance by the administrator of the monitoring / adjustment module 100, which will be described later, based on the cost required for the preliminary analysis and the required time interval for adjusting resource usage.
  • The DNN partial inference device 31 calculates and outputs the result of a certain element (block) of the divided DNN by, for example, the calculation method given by the monitoring / adjusting module 100. Its input is therefore either the input data of the DNN inference or the output (activation value: Activation) of the preceding DNN partial inference device 31, and its output is the result of the DNN element (block) it is in charge of. That is, as shown in FIG. 3, by connecting and operating a plurality of DNN partial inference devices 31 (31m to 31m+n), it is possible to realize a function that returns the result of DNN inference for the input data. Depending on the structure of the DNN model 30 to be inferred, a branched structure may be required, but it can be configured in the same manner.
  • The DNN partial inference device 31 receives, as an input, the calculation method Om (control information) that is the output of the monitoring / adjusting module 100.
  • The DNN partial inference device 31 simultaneously acquires the input data and the calculation method Om (control information), performs the calculation on the input data by the acquired calculation method (based on the control information), and returns the result of the DNN element (block) it is in charge of as its output.
  • Depending on the given calculation method, the number of quantization bits and the quantization method of the weights and activation values, the pruning ratio of the pruning method, the calculation formula to be used, the weight parameters, and the like can be changed.
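  • As a hedged sketch of how a chain of DNN partial inference devices 31 could apply a given calculation method Om, the following Python code quantizes activations to the requested bit number and prunes the smallest-magnitude weights at the requested ratio. The quantization and pruning helpers are deliberately simplified assumptions; the disclosure does not prescribe these exact procedures.

```python
import numpy as np

# Hedged sketch of a chain of DNN partial inference devices 31: each block
# receives the previous activation plus a calculation method Om (control
# information) and applies it before computing its share of the network.

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform quantization of activation values to the given bit width."""
    levels = 2 ** bits - 1
    scale = float(np.max(np.abs(x))) or 1.0
    return np.round(x / scale * levels) / levels * scale

class DNNPartialInference:
    def __init__(self, weight: np.ndarray):
        self.weight = weight

    def forward(self, x: np.ndarray, method: dict) -> np.ndarray:
        w = self.weight.copy()
        # Pruning: zero out the smallest-magnitude fraction of the weights.
        k = int(w.size * method.get("pruning_ratio", 0.0))
        if k > 0:
            threshold = np.partition(np.abs(w), k, axis=None)[k]
            w[np.abs(w) < threshold] = 0.0
        # Quantize the incoming activation to the requested bit number.
        x = quantize(x, method.get("q_bits", 8))
        return np.maximum(w @ x, 0.0)   # linear layer followed by ReLU

# Connecting blocks 31_m .. 31_{m+n}: each output feeds the next block.
rng = np.random.default_rng(0)
blocks = [DNNPartialInference(rng.standard_normal((4, 4))) for _ in range(3)]
x = rng.standard_normal(4)
for block in blocks:
    x = block.forward(x, {"q_bits": 4, "pruning_ratio": 0.5})
print(x)
```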
  • FIG. 6 is a diagram showing an outline of the DNN inference device according to the embodiment of the present disclosure.
  • FIG. 7 is a functional block diagram showing an example of the functional configuration of the monitoring / adjustment module according to the embodiment of the present disclosure.
  • The DNN inference device 70 can be realized by combining the DNN partial inference devices 31 (31m to 31m+n) and the monitoring / adjustment modules 100 (100m to 100m+n).
  • A monitoring / adjustment module 100 (100m to 100m+n) is provided for each DNN partial inference device 31.
  • The monitoring / adjustment module 100 acquires the resource usage of the entire system every time a decomposed element of the DNN is calculated, and can switch to a calculation method matching the system state each time. As a result, the resource usage can be adjusted instantly, and a DNN inference device 70 that does not require interruption of inference can be realized.
  • the "block" appearing in the following description is synonymous with the DNN partial inference device 31.
  • The monitoring / adjustment module 100 includes a resource usage information storage unit 111, a correspondence information storage unit 112, an acquisition unit 113, a calculation unit 114, a determination unit 115, and a notification unit 116.
  • The monitoring / adjusting module 100 realizes or executes the functions and operations of the monitoring / adjusting module 100 described below by means of each of these units.
  • Each block (resource usage information storage unit 111 to notification unit 116) constituting the monitoring / adjustment module 100 is a functional block indicating the function of the monitoring / adjustment module 100, respectively.
  • These functional blocks may be software blocks or hardware blocks.
  • each of the above-mentioned functional blocks may be one software module realized by software (including a microprogram), or may be one circuit block on a semiconductor chip (die).
  • each functional block may be one processor or one integrated circuit.
  • the method of configuring the functional block is arbitrary.
  • the monitoring / adjustment module 100 may be configured in a functional unit different from the above-mentioned functional block.
  • the resource usage information storage unit 111 stores information on the resource usage used in the calculation of each block of DNN (DNN partial inference device 31).
  • the resource usage information stored in the resource usage information storage unit 111 is stored by the calculation unit 114, which will be described later.
  • the correspondence information storage unit 112 stores the correspondence information between the resource usage amount and the calculation method pre-analyzed for each resource usage amount.
  • the calculation method for each resource usage is acquired, for example, by prior analysis of the operator who is the administrator of the monitoring / adjustment module 100.
  • the acquisition unit 113 acquires the total resource usage of the information processing system 1.
  • the acquisition unit 113 acquires the total power consumption and memory usage of the information processing system 1 from, for example, the system module 50.
  • The calculation unit 114 calculates the target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN inference device 70, based on the total resource usage of the information processing system 1 acquired by the acquisition unit 113. For example, the calculation unit 114 calculates the target resource usage to be allocated to each calculation of the inference processing by the DNN partial inference devices 31 (see FIG. 6) divided at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource usage based on, for example, the resource surplus of the information processing system 1.
  • the determination unit 115 determines the calculation method of the DNN partial inference device 31 corresponding to the target resource usage calculated by the calculation unit 114.
  • The calculation method is control information indicating how to perform the calculation of an element (block) of the DNN model 30, that is, of the DNN partial inference device 31 (see FIG. 6) that performs the inference processing next, and corresponds to the "quantization bit number", the "pruning ratio", and the like.
  • The determination unit 115 can also determine the calculation method based on control information composed of tuples, for example a combination of the "quantization bit number" and the "pruning ratio".
  • the determination unit 115 outputs the determined calculation method to the corresponding DNN partial inference device 31.
  • The notification unit 116 notifies that the resource usage allocated to the DNN partial inference device 31 will decrease. For example, when the target resource usage calculated by the calculation unit 114 is equal to or less than a predetermined standard, the notification unit 116 notifies that the accuracy of the inference processing result (inference result) by the DNN inference device 70 may decrease.
  • FIG. 8 is a diagram showing an outline of the monitoring / adjustment module according to the embodiment of the present disclosure.
  • The monitoring / adjusting module 100 determines the calculation method Om of a certain element (block) constituting the DNN model 30 from the total resource usage Im of the information processing system 1 given as an input, and outputs it.
  • An element may be a part of a layer, a channel (Map), a matrix (Tensor), or a combination thereof.
  • The administrator of the monitoring / adjusting module 100 divides the DNN model 30 into a plurality of blocks in advance and assigns a block number bi to each block. The fineness of this division is determined based on the cost required for the preliminary analysis and the required time interval for adjusting resource usage.
  • As the resource usage of the entire system given as an input to the monitoring / adjustment module 100, resource usage such as the power consumption, memory usage, and processing time of the entire system can be given.
  • The monitoring / adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a predicted value of future usage computed with a simple machine learning model instead of the measured value. By doing so, it is possible to cope with a sudden change in the amount of resources used in the information processing system 1. Further, the monitoring / adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a tuple that combines a plurality of different types of usage, such as power consumption and memory usage. By doing so, it is possible to calculate an appropriate output according to each kind of resource usage.
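  • As an illustration of using a predicted value instead of the measured value, the following sketch extrapolates the usage trend over a short sliding window. The linear least-squares predictor merely stands in for the "simple machine learning model" mentioned above; the disclosure does not fix a particular predictor.

```python
# Illustrative sketch of predicting the near-future resource usage instead
# of using the latest measured value. Window size and predictor choice are
# assumptions for illustration.
from collections import deque

class UsagePredictor:
    def __init__(self, window: int = 8):
        self.history = deque(maxlen=window)

    def update(self, measured: float) -> None:
        self.history.append(measured)

    def predict(self) -> float:
        if len(self.history) < 2:
            return self.history[-1] if self.history else 0.0
        # Least-squares slope over the window, extrapolated one step ahead.
        n = len(self.history)
        xs = range(n)
        mean_x, mean_y = (n - 1) / 2, sum(self.history) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.history))
        var = sum((x - mean_x) ** 2 for x in xs)
        slope = cov / var
        return self.history[-1] + slope   # predicted usage at the next step

predictor = UsagePredictor()
for im in [60.0, 62.0, 65.0, 69.0]:
    predictor.update(im)
print(predictor.predict())   # anticipates the rising trend -> 72.0
```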
  • the monitoring / adjustment module 100 can specify the DNN block number bi as an input. This is used to determine which block calculation method is to be determined and output.
  • The monitoring / adjustment module 100 can also receive the resource usage Rdnn' allocated to each block preceding the block number bi. This is used to grasp the calculation status of the entire model and to determine a calculation method Om that can improve the accuracy of the entire DNN model.
  • The calculation method Om, which is the output of the monitoring / adjustment module 100, is control information indicating how to perform the calculation of the block with block number bi: for example, the number of quantization bits for adjusting the accuracy of numerical calculation, the pruning ratio indicating what percentage of DNN elements are omitted as represented by the pruning method, the network structure and parameters to be used, and the like.
  • The calculation method Om, which is the output of the monitoring / adjustment module 100, determines the calculation method for one DNN block, but one block contains a plurality of elements such as channels (Channel) and matrices (Tensor).
  • Therefore, the number of quantization bits Qb may be configured as a plurality of values so that the number of quantization bits and the like can be specified for each of the elements in the block.
  • FIG. 9 is a diagram showing an outline of the monitoring / adjustment module according to the embodiment of the present disclosure.
  • the determination of the calculation method Om by the monitoring / adjustment module 100 is performed in two stages. First, in the first stage, the monitoring / adjusting module 100 determines the target resource usage Rdnn used in the block of the block number bi from the resource usage Im of the entire system given as an input. The operation of the first stage is realized, for example, by the function of the calculation unit 114 shown in FIG. 7.
  • In the second stage, the monitoring / adjustment module 100 selects, from among the calculation methods that achieve the target resource usage Rdnn, the calculation method Om that reduces the DNN calculation accuracy as little as possible.
  • The operation of the second stage is realized, for example, by the function of the determination unit 115 shown in FIG. 7. That is, the acquisition unit 113 of the monitoring / adjustment module 100 acquires the resource usage each time the calculation method is determined by the determination unit 115. Subsequently, the calculation unit 114 of the monitoring / adjustment module 100 calculates the target resource usage of the DNN partial inference device 31 that performs the next inference calculation each time the resource usage is acquired by the acquisition unit 113. Then, the determination unit 115 of the monitoring / adjustment module 100 determines the calculation method of the DNN partial inference device that performs the next inference calculation each time the target resource usage is calculated by the calculation unit 114.
  • As a method of determining the target resource usage Rdnn in the first stage described above, there is a method of allocating the resource surplus of the system to the calculation of the DNN. For example, Rdnn can be calculated by the following equation (1), using the resource usage Im of the entire system given as an input, the maximum resource amount Rmax that can be supplied by the system, and a constant margin ε.
  • Target resource usage Rdnn = Maximum resource amount Rmax − Resource usage Im − Constant margin ε   … (1)
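  • As an illustrative numerical example (the concrete values are assumptions, not taken from the disclosure): with a maximum resource amount Rmax = 100%, a current total usage Im = 70%, and a margin ε = 5%, equation (1) gives Rdnn = 100 − 70 − 5 = 25%, so a quarter of the supplyable resources may be allocated to the next block.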
  • FIG. 10 is a diagram showing an outline of a method for determining a calculation method based on the preliminary analysis result according to the embodiment of the present disclosure.
  • FIG. 10 shows an example of a method of listing in advance calculation methods with less decrease in accuracy of inference processing for various combinations of resource usage.
  • In this example, the input Im is one type of resource usage, such as power consumption or memory usage, and ranges from 0% to 100%.
  • For each sampled value of the input, the calculation method Om is analyzed in advance. This analysis is performed for all combinations of the target resource usage Rdnn of the block whose calculation method is to be determined and the resource usage Rdnn' used in the preceding blocks.
  • During operation, the monitoring / adjustment module 100 acquires, based on the correspondence information stored in the correspondence information storage unit 112, a plurality of calculation methods associated with resource usage close to the target resource usage, and determines the calculation method from the acquired methods. Specifically, the monitoring / adjustment module 100 can determine the calculation method Om by searching the pre-analysis results for points (interpolation points) close to the input Im and interpolating between them. If the number of interpolation points is increased to 11 or more in the pre-analysis, a more accurate calculation method Om can be determined. Even if the input Im covers a plurality of resources, this can be handled by analyzing the interpolation points in advance in the same way.
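  • The following is a minimal sketch of this interpolation, assuming 11 pre-analyzed points at 0%, 10%, ..., 100% of the input Im, each storing a quantization bit number; both the sampled grid and the stored values are illustrative assumptions.

```python
# Minimal sketch of determining the calculation method Om by interpolating
# between pre-analyzed points. The grid and bit numbers are assumptions.
import bisect

# Pre-analysis result: for each sampled usage Im, the quantization bit
# number that best preserved accuracy at that usage level.
POINTS = [(i * 10.0, bits) for i, bits in
          enumerate([16, 16, 12, 12, 10, 8, 8, 6, 4, 2, 2])]

def decide_q_bits(im: float) -> int:
    usages = [u for u, _ in POINTS]
    j = bisect.bisect_left(usages, im)
    if j == 0:
        return POINTS[0][1]
    if j == len(POINTS):
        return POINTS[-1][1]
    (u0, b0), (u1, b1) = POINTS[j - 1], POINTS[j]
    # Linear interpolation between the two nearest pre-analyzed points,
    # rounded to an implementable bit number.
    t = (im - u0) / (u1 - u0)
    return round(b0 + t * (b1 - b0))

print(decide_q_bits(73.0))   # falls between the 70% and 80% points -> 5 bits
```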
  • The operation of the monitoring / adjustment module 100 described above is only an example. In addition to the input Im, the calculation method Om output by the preceding monitoring / adjustment module 100 (the monitoring / adjustment module 100m, if the operating subject is the monitoring / adjustment module 100m+1) may also be given as an input. From this, it is possible to know the change over time in the resource usage of the entire system over a long period, and which elements in the DNN have been assigned what degree of calculation accuracy. It is therefore possible to obtain a calculation method Om that is more stable against strongly fluctuating resource usage and improves the accuracy of the entire DNN processing.
  • FIG. 11 is a diagram showing an outline of the notification operation according to the embodiment of the present disclosure.
  • the information processing system 1 includes a user notification module 90.
  • the user notification module 90 visualizes and notifies the user of the information processing system 1, for example, of the processing state of the information processing system 1.
  • the processing state corresponds to, for example, a decrease in recognition accuracy in the case of recognition processing such as image recognition or voice recognition, and a decrease in responsiveness in the case of UI response processing.
  • the monitoring / adjustment module 100 executes notification to the user notification module 90, for example, when the target resource usage amount Rdnn allocated to the DNN partial inference device 31 is equal to or less than a predetermined standard.
  • Upon receiving the notification from the monitoring / adjusting module 100, the user notification module 90 visualizes information indicating that the responsiveness of the information processing system 1 has deteriorated, for example, on the operation device 91 operated by the user.
  • As a visualization method, for example, lighting a light emitting unit provided in the operation device 91 in a predetermined color, or blinking the light emitting unit, is conceivable.
  • the monitoring / adjustment module 100 may notify the system module 50 when the target resource usage amount Rdnn allocated to the DNN partial inference device 31 is equal to or less than a predetermined standard.
  • the system module 50 can change the operation of the system so as to improve the safety of the user according to the decrease in the accuracy of the inference result by the DNN inference device 70.
  • This applies, for example, when the information processing system 1 is a transport robot system.
  • When the information processing system 1 is a pet-type robot, it is conceivable to change the operation so as to take a resting gesture, for example by closing its eyes, in accordance with the decrease in the accuracy of the response processing.
  • FIG. 12 is a flowchart showing an example of a processing procedure by the monitoring / adjusting module according to the embodiment of the present disclosure. The processing procedure shown in FIG. 12 is repeatedly executed while the information processing system 1 is in operation.
  • the acquisition unit 113 acquires the input Im, which is the total resource usage of the information processing system 1 (step S101).
  • the calculation unit 114 calculates the target resource usage amount Rdnn of the DNN partial inference device 31 based on the input Im, and stores it in the resource usage amount information storage unit 111 (step S102).
  • The determination unit 115 acquires, from the resource usage information storage unit 111, the resource usage Rdnn' used in the blocks preceding the current block, using the block number bi assigned to the current block (DNN partial inference device 31) to be processed as a key (step S103).
  • The determination unit 115 determines the calculation method Om based on the target resource usage Rdnn and the resource usage Rdnn' used in the preceding blocks (step S104).
  • the determination unit 115 outputs the determined calculation method Om to the current block to be processed (step S105), and returns to the processing procedure of step S101.
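  • The following sketch mirrors the loop of steps S101 to S105 for a sequence of blocks. The storage unit 111 is modeled as a plain dictionary, and decide_method is a stand-in for the pre-analyzed correspondence lookup; both are assumptions for illustration.

```python
# Sketch of the loop of FIG. 12 (steps S101 to S105); names and thresholds
# are illustrative assumptions.

R_MAX, EPSILON = 100.0, 5.0
resource_usage_info = {}   # stands in for the resource usage information storage unit 111

def decide_method(r_dnn: float, prior_usage: float) -> dict:
    """Stand-in for determination unit 115: choose Om from Rdnn and Rdnn'."""
    q_bits = 8 if r_dnn - 0.1 * prior_usage > 20.0 else 4
    return {"q_bits": q_bits}

def process_block(bi: int, im: float) -> dict:
    r_dnn = R_MAX - im - EPSILON                   # S102: equation (1)
    resource_usage_info[bi] = r_dnn                # record usage for block bi (simplified)
    prior = sum(v for k, v in resource_usage_info.items() if k < bi)   # S103
    om = decide_method(r_dnn, prior)               # S104
    return om                                      # S105: output Om to block bi

for bi, im in enumerate([60.0, 75.0, 68.0]):       # S101: acquire Im each iteration
    print(bi, process_block(bi, im))
```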
  • The DNN inference device 70 can also be configured to use a predicted value of future resource usage; simple machine learning models and Kalman filters can be used to predict future resource usage.
  • The granularity at which the DNN model 30 is divided and the fineness of the pre-analysis used to configure the DNN inference device 70 depend on how much pre-analysis time and computational resource cost can be tolerated, and on how fine a time interval is desired for switching the resource usage. In order to reduce the switching time interval, more combinations of calculation methods must be evaluated in the preliminary analysis, which requires a large implementation time and large calculation resources. In addition, since the number of timings at which switching is possible increases over the entire model, the processing time of the entire model may increase due to the switching overhead.
  • There are two granularities to balance: the first is the granularity at which the DNN model 30 is divided into blocks, and the second is how finely the elements within each block are analyzed.
  • The first, the granularity at which the DNN model 30 is divided into blocks, determines how the DNN model 30 is divided into the plurality of DNN partial inference devices 31.
  • The minimum unit in which the resource usage can be switched during DNN inference is this granularity. If not much time has passed after processing one block divided at this granularity, blocks can effectively be combined by executing the next block continuously with the same resource usage, without switching the resource usage or other settings. However, the resource usage cannot be switched in units smaller than a block.
  • The second granularity, how finely the elements within a block are viewed, concerns how to allocate the number of bits and computing power to the finer elements within the block, such as channels and matrices (tensors), so as to stay within the resource usage of the block while maximizing the accuracy. As shown in Reference 1 above, the decrease in accuracy can be suppressed by making this granularity finer, but this must be balanced against the cost of the pre-analysis, which is expected to increase as the granularity becomes finer.
  • The processing by the above-mentioned monitoring / adjustment module 100 can also be applied to a neural network composed of a single layer.
  • In this case, the calculation method corresponding to the target resource usage can be determined based on the result of pre-analyzing the calculation method corresponding to the resource usage for each element, such as the channels and matrices constituting the layer of the target neural network.
  • The correspondence information stored in the correspondence information storage unit 112 is not limited to being acquired by the operator's prior analysis, and may be acquired as a result of machine learning, for example.
  • For example, a DNN model that uses reinforcement learning can be constructed to infer the calculation method Om as the output, with the resource usage Rdnn' used in the preceding blocks, the target resource usage Rdnn, and the block number bi as the inputs.
  • In this case, the state of the reinforcement learning is the tuple of Rdnn' (the resource usage of the preceding blocks), Rdnn (the target resource usage), and the block number bi; the output is the calculation method Om; and the reward may be the difference between the accuracy of the inference processing achievable with the calculation method Om (the recognition accuracy, in the case of recognition processing) and the accuracy when the inference processing is performed without limiting the resource usage.
  • An efficient calculation method can be determined by using the reinforcement learning model.
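  • A hedged sketch of this reinforcement-learning formulation follows: the state is the tuple (Rdnn', Rdnn, bi), the action is a candidate calculation method Om, and the reward is the accuracy difference described above. Tabular Q-values with a one-step update and the coarse discretization are illustrative choices, not fixed by the disclosure.

```python
# Sketch of the reinforcement-learning formulation; the action set,
# discretization, and learning constants are assumptions.
import random
from collections import defaultdict

ACTIONS = [{"q_bits": b} for b in (2, 4, 8, 16)]
q_table = defaultdict(float)
ALPHA, EPS = 0.1, 0.2

def discretize(rd_prev: float, rd_target: float, bi: int) -> tuple:
    return (round(rd_prev / 10), round(rd_target / 10), bi)

def choose_action(state: tuple) -> int:
    if random.random() < EPS:                      # explore
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_table[(state, a)])

def update(state: tuple, action: int, reward: float) -> None:
    # One-step (bandit-style) update; the reward would come from evaluating
    # the model's accuracy during pre-analysis.
    key = (state, action)
    q_table[key] += ALPHA * (reward - q_table[key])

state = discretize(35.0, 25.0, bi=3)
a = choose_action(state)
update(state, a, reward=-0.02)   # e.g. 2% accuracy drop vs. unlimited resources
print(ACTIONS[a], q_table[(state, a)])
```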
  • FIG. 13 is a diagram showing an outline of a method for determining a calculation method based on a preliminary analysis result according to a modified example.
  • FIG. 14 is a diagram showing a configuration example of the DNN inference device according to the modified example.
  • In a modified example, the monitoring / adjustment module 100 may output the calculation methods of all the blocks at and after the block number bi for the input resource usage Rdnn', target resource usage Rdnn, and block number bi.
  • Further, a determination mechanism 71 (71m to 71m+n−1), which determines before calculating a certain block whether to operate the monitoring / adjustment module 100 to output a new calculation method or to use the previously determined calculation method as it is, may be newly provided in the DNN inference device 70.
  • The determination mechanism 71 shown in FIG. 14 measures the time elapsed since the previous operation of the monitoring / adjusting module 100 and, when a certain time has elapsed, operates the monitoring / adjusting module 100 to output a new calculation method.
  • Otherwise, the determination mechanism 71 uses the previously determined calculation method as it is. In this way, the overhead of resource switching can be reduced compared with determining the calculation method every time the resource usage of the entire system is acquired.
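  • A minimal sketch of the determination mechanism 71 follows, assuming a monotonic clock and a fixed minimum interval; both the threshold and the clock source are illustrative assumptions.

```python
# Minimal sketch of determination mechanism 71: re-run the monitoring /
# adjustment module only if enough time has elapsed, otherwise reuse the
# previously determined calculation method. Threshold value is assumed.
import time

class DeterminationMechanism:
    def __init__(self, min_interval_s: float = 0.001):
        self.min_interval_s = min_interval_s
        self.last_run = float("-inf")
        self.cached_method = {"q_bits": 8}

    def method_for_next_block(self, monitor_adjust) -> dict:
        now = time.monotonic()
        if now - self.last_run >= self.min_interval_s:
            # Enough time has passed: run the module and cache its output.
            self.cached_method = monitor_adjust()
            self.last_run = now
        # Otherwise reuse the previous method, avoiding switching overhead.
        return self.cached_method

mechanism = DeterminationMechanism()
print(mechanism.method_for_next_block(lambda: {"q_bits": 4}))   # runs the module
print(mechanism.method_for_next_block(lambda: {"q_bits": 2}))   # reuses {'q_bits': 4}
```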
  • the DNN inference device 70 including the DNN partial inference device 31 and the monitoring / adjustment module 100 described above can be applied to a robot system that processes a camera image by a DNN.
  • Most robots that operate outdoors are driven by a battery and therefore operate under a maximum power consumption constraint.
  • In such robots, electric parts with high power consumption, such as actuators, are mounted, and recognition processing and communication processing for determining actions run at the same time, so the power consumption fluctuates depending on the situation. If the system is operated without considering the power consumption of the entire system, the usage may exceed the maximum supply, causing abnormal operation or an inability to control the posture.
  • By executing the recognition processing that recognizes camera images with the DNN using the DNN inference device 70 composed of the DNN partial inference devices 31 and the monitoring / adjustment module 100, it is possible to keep executing the recognition processing while adjusting so that the power consumption does not exceed the maximum supply.
  • In this case, the current power consumption is given as the input of the monitoring / adjusting module 100, and the pruning ratio of the next DNN element is determined as the output. As a result, when the power consumption of the entire system becomes large, the pruning ratio becomes large and the calculation of some DNN elements is omitted, as sketched below.
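  • As an illustration of this input/output relationship, the following sketch maps the current power consumption to a pruning ratio for the next DNN element; the linear mapping and constants are assumptions, since in practice the relationship would come from the pre-analysis.

```python
# Illustrative mapping from current power consumption to the pruning ratio
# of the next DNN element. The linear rule and constants are assumptions;
# a real system would use the pre-analyzed correspondence information.
def pruning_ratio_for(power_w: float, max_supply_w: float = 100.0) -> float:
    headroom = max(0.0, (max_supply_w - power_w) / max_supply_w)
    # Less headroom -> larger pruning ratio -> more DNN elements omitted.
    return round(min(0.9, 1.0 - headroom), 2)

print(pruning_ratio_for(40.0))   # ample headroom -> 0.4
print(pruning_ratio_for(85.0))   # tight headroom -> 0.85
```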
  • Consequently, the utilization rate of the circuit per unit time decreases, the dynamic power consumption of the DNN processing is kept low, a surplus is generated, and power can be supplied to places that require a large amount of power, such as actuators.
  • Since the pruning ratio is determined by prior analysis and the recognition processing itself by the DNN continues to run, missed recognition of moving objects captured by the camera and delays in action decisions can be minimized.
  • However, the recognition accuracy of the recognition processing by the DNN decreases and the detection rate of moving objects decreases to some extent, so actions that are unsafe for the surrounding environment and people may also be taken.
  • the above-mentioned DNN inference device 70 can be applied to a game machine having a UI function for recognizing voice, hand gesture, and the like by processing by DNN.
  • the game machine has a predetermined memory capacity.
  • In the game machine, the recognition processing by the DNN operates in parallel with the drawing processing, and when the memory is used to some extent, the drawing quality is lowered to some extent.
  • Depending on the progress of the game, there are scenes, such as movie scenes, that the player wants to watch with improved drawing quality even if some operability is given up.
  • Conversely, there are also cases where the amount of memory used for the drawing processing is small. In such cases, it is desirable to allocate the surplus memory to the recognition processing by the DNN to improve the recognition accuracy.
  • To this end, the memory usage of the drawing processing is given as the input of the monitoring / adjusting module 100, and the number of quantization bits is determined as the output. By doing so, when it becomes necessary to secure a large memory area for the drawing processing, the number of quantization bits is changed instantly and the recognition processing by the DNN can be performed with a small amount of memory; the drawing quality can then be improved with the surplus memory created as a result.
  • In this case, the responsiveness of the UI function using the DNN decreases, but the game developer can carefully select the scenes, for example using this mechanism when the player should be drawn into a movie scene, and can take measures such as visualizing the decrease in responsiveness on a controller or the like (see FIG. 11 and the like), or instructing the user to interrupt the game and then use the UI function.
  • As described above, the monitoring / adjustment module 100 (an example of an information processing apparatus) according to the embodiment of the present disclosure is applied to an information processing system 1 that uses an inference result produced by an inference device using a neural network, and includes an acquisition unit 113, a calculation unit 114, and a determination unit 115.
  • the acquisition unit 113 acquires the total resource usage of the information processing system 1.
  • the calculation unit 114 calculates the target resource usage amount to be allocated to at least a part of the calculation of the inference processing by the DNN inference device 70 (for example, the calculation of the inference processing by the DNN partial inference device 31) based on the resource usage amount.
  • the determination unit 115 determines a calculation method (calculation method of inference processing by the DNN partial inference device 31) corresponding to the target resource usage amount. Therefore, the monitoring / adjustment module 100 can adjust the resource amount used for the inference processing of the DNN model 30 so as not to cause the resource usage amount to be exceeded in the information processing system 1.
  • FIG. 15 is a diagram showing an example of information processing according to a comparative example.
  • FIG. 16 is a diagram showing an example of information processing by the DNN inference device according to the embodiment of the present disclosure.
  • FIG. 17 is a diagram showing the relationship between the elapsed time and the resource usage amount of the information processing according to the comparative example.
  • FIG. 18 is a diagram showing the relationship between the elapsed time and the resource usage of the information processing by the DNN inference device according to the embodiment of the present disclosure.
  • the function of adjusting the resource usage of processing based on the resource usage of the entire system may be realized by a technique called code switching.
  • In code switching, two codes having the same purpose are prepared; that is, it is assumed that the code used in the DNN inference device can be switched according to the resource usage between a "normal code" used when there is a resource surplus and an "excess code" used when the resources are exceeded.
  • With code switching, however, the opportunities to adjust the resource usage are limited to the points before executing an inference by the DNN, such as before the DNN inference for data 1 or for data 2. Therefore, when the resource usage of the entire system suddenly increases, the resource usage of the DNN inference device cannot be adjusted instantly to absorb it. As a result, the resource usage of the entire system may be exceeded and the system may stop, or the DNN inference may be stopped for safety.
  • In contrast, the DNN inference device 70 exploits the facts that the DNN calculation is a stack of per-element calculations (the calculations of the individual DNN partial inference devices 31) and that the output accuracy of the entire DNN does not easily decrease even if the calculation accuracy is changed for each element (see Reference 1 above).
  • Therefore, in the DNN inference device 70 according to the embodiment of the present disclosure, even if it becomes necessary to adjust the resource usage at the DNN partial inference device 31m+1 as shown in FIG. 16, there is an opportunity to adjust the resource usage for each element, as shown by the arrows in FIG. 18, so the adjustment can be made in sequence.
  • FIG. 19 is a diagram showing an example of information processing by the existing technology.
  • the purpose of adjusting the resource usage of processing based on the resource usage of the entire system is to appropriately allocate the limited resources that can be used in the system in DNN processing and other processing.
  • As a function for realizing such an object, it is conceivable, for example, to apply the mechanism shown in the prior art document (Japanese Patent Laid-Open No. 2012-43409) described above: that is, when the data processing load (resource usage) is predicted to increase, time-series input data is selectively removed, data processing is delayed, or data processing is offloaded to other systems, with the data processing in that system read as the DNN inference processing.
  • In contrast, the DNN inference device 70 utilizes the property that even if the calculation accuracy is lowered during the DNN calculation, the accuracy of the result is not easily affected (see Reference 1 above); as described with reference to FIG. 6 and the like, it is therefore possible to construct a data processing system that neither skips data nor increases delay while adjusting the amount of resources used.
  • As described above, the DNN inference device 70 (an example of an inference device) is divided into a plurality of DNN partial inference devices 31 (an example of a partial inference device) at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource amount to be allocated to the inference processing of the DNN partial inference device 31 to be processed, and the determination unit 115 determines, based on the target resource amount, the calculation method in the inference processing of the DNN partial inference device 31 to be processed and outputs it to that device. As a result, the monitoring / adjustment module 100 can allocate an appropriate amount of resources to each DNN partial inference device 31 from the resources that can be allocated in the system.
  • Further, the DNN inference device 70 is divided into the plurality of DNN partial inference devices 31 based on the cost required for analyzing the calculation method for each resource usage or on the time interval required for adjusting the resource usage.
  • the monitoring / adjustment module 100 enables flexible resource management according to the purpose of the system administrator.
  • Further, the DNN inference device 70 is composed of a plurality of elements including layers, channels, and matrices.
  • Each DNN partial inference device 31 is then divided at a granularity based on at least one of the layer, channel, and matrix elements.
  • the monitoring / adjustment module 100 can adjust the resource usage amount at an arbitrary granularity according to the purpose of the system administrator.
  • the determination unit 115 determines the calculation method based on the control information composed of tuples. Thereby, the monitoring / adjustment module 100 can specify the calculation method by a plurality of elements.
  • the calculation unit 114 calculates the target resource usage amount based on the resource surplus in the information processing system 1. As a result, the monitoring / adjustment module 100 can easily calculate the target resource amount.
  • the calculation unit 114 calculates the target resource usage amount based on the resource surplus and the predetermined margin. As a result, the monitoring / adjustment module 100 can increase the availability of the system.
  • the monitoring / adjustment module 100 includes a correspondence information storage unit 112 (an example of a storage unit) that stores correspondence information between the resource usage amount and the calculation method pre-analyzed for each resource usage amount.
  • the determination unit 115 determines the calculation method based on the correspondence information.
  • the correspondence information storage unit 112 stores the correspondence information obtained as a result of machine learning. This makes it possible to improve the accuracy of resource adjustment.
  • The determination unit 115 acquires a plurality of calculation methods associated with resource usage close to the target resource usage based on the correspondence information, and determines the calculation method based on the acquired methods. This makes it possible to balance the speed and the accuracy of resource amount adjustment.
  • the monitoring / adjusting module 100 includes a notification unit 116 for notifying that the target resource usage amount is equal to or less than a predetermined standard in the information processing system 1. Thereby, the safety of the operation of the information processing system 1 can be enhanced.
  • Further, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines the calculation method; the calculation unit 114 calculates the target resource usage of the DNN partial inference device 31 to be processed next each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation method of the DNN partial inference device 31 to be processed next each time the calculation unit 114 calculates the target resource usage. Thereby, the resource amount of each DNN partial inference device 31 can be adjusted to follow the state of the system as closely as possible.
  • Alternatively, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines the calculation method; the calculation unit 114 calculates the target resource usage of all the DNN partial inference devices 31 to be processed thereafter each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation methods of all the DNN partial inference devices 31 to be processed thereafter each time the calculation unit 114 calculates the target resource usage. This can reduce the overhead associated with resource switching in the system.
  • The technology of the present disclosure can also be configured as follows, and these configurations belong to the technical scope of the present disclosure.
  • It is an information processing device applied to an information processing system that uses the inference result by an inference device using a neural network.
  • An acquisition unit that acquires the total resource usage of the information processing system,
  • a calculation unit that calculates the target resource usage to be allocated to at least a part of the calculation of the inference processing by the inference device based on the resource usage.
  • An information processing device including a determination unit that determines a calculation method corresponding to the target resource usage.
  • the inference device is divided into a plurality of partial inference devices with a predetermined particle size based on predetermined conditions.
  • the calculation unit calculates the target resource usage amount to be allocated to the inference processing of the partial inference device that next calculates the inference processing.
  • the information processing apparatus according to (1), wherein the determination unit determines a calculation method in the inference processing of the partial inference device that next calculates the inference processing based on the target resource usage amount.
  • the inferior is divided into a plurality of partial inferiors based on the cost required for analyzing the calculation method for each resource usage or the time interval required for adjusting the resource usage (2).
  • the inferencer is composed of a plurality of elements including layers, channels, and matrices.
  • the information processing apparatus according to (3) above, wherein the partial inference device is divided by a particle size based on at least one element of a layer, a channel, and a matrix.
  • the decision-making part The information processing apparatus according to (1) above, wherein the calculation method is determined based on control information composed of tuples.
  • the calculation unit The information processing apparatus according to any one of (1) to (5) above, which calculates the target resource usage amount based on the resource surplus in the information processing system.
  • the calculation unit The information processing apparatus according to (6), wherein the target resource usage amount is calculated based on the resource surplus and a predetermined margin.
  • the decision-making part The information processing apparatus according to any one of (1) to (7), wherein the calculation method is determined based on the corresponding information.
  • the storage unit is The information processing device according to (8) above, which stores the corresponding information obtained as a result of machine learning.
  • the decision-making part Based on the correspondence information, a plurality of the calculation methods associated with the resource usage amount close to the target resource usage amount are acquired, and the calculation method is determined based on the acquired plurality of the calculation methods. (8) or the information processing apparatus according to (9) above.
  • (11) The information processing apparatus according to any one of (1) to (10) above, further comprising a notification unit for notifying that the target resource usage amount is equal to or less than a predetermined standard in the information processing system.
  • the calculation unit Each time the resource usage amount is acquired by the acquisition unit, the target resource usage amount of the partial inference device to be processed next is calculated.
  • the decision-making part The information processing apparatus according to (2), wherein the calculation method of the partial inference device to be processed next is determined each time the target resource usage amount is calculated by the calculation unit.
  • (13) The information processing apparatus according to (2) above, wherein the acquisition unit acquires the resource usage amount each time the calculation method is determined by the determination unit, and the determination unit determines the calculation methods of all the partial inference devices to be processed thereafter each time the target resource usage amount is calculated by the calculation unit.
  • (14) An information processing method in which a processor of an information processing device applied to an information processing system that uses an inference result by an inference device using a neural network acquires the total resource usage of the information processing system, calculates, based on the resource usage, a target resource usage to be allocated to the inference processing of the inference device, and determines, based on the target resource usage, a calculation method in the inference processing of the inference device.
  • 1 Information processing system
  • 11 Processor
  • 12 Main storage device
  • 13 Auxiliary storage device
  • 14 Peripheral circuit
  • 15 Input device
  • 16 Output device
  • 17 Peripheral device
  • 18 Communication device
  • 20 Internal bus
  • 30 DNN model
  • 31 DNN partial inference device
  • 50 System module
  • 70 DNN inference device
  • 90 User notification module
  • 91 Operation device
  • 100 Monitoring/adjustment module
  • 111 Resource usage information storage unit
  • 112 Correspondence information storage unit
  • 113 Acquisition unit
  • 114 Calculation unit
  • 115 Decision unit
  • 116 Notification unit

Abstract

An information processing device (100) is applied to an information processing system which utilizes an inference result obtained by an inference device employing a neural network, and includes an acquisition unit (113), a calculation unit (114), and a determination unit (115). The acquisition unit (113) acquires the overall resource usage amount of the information processing system. The calculation unit (114) calculates, on the basis of the resource usage amount, a target resource usage amount to be allocated to at least a portion of the calculations in the inference processing performed by the inference device. The determination unit (115) determines a calculation method corresponding to the target resource usage amount.

Description

Information processing device and information processing method

The present disclosure relates to an information processing device and an information processing method.

Conventionally, when an increase in data processing load is predicted, techniques have been proposed that selectively discard input data, delay data processing, or offload data processing to another system.

In recent years, machine learning algorithms have evolved rapidly through the use of artificial neural networks such as DNNs (Deep Neural Networks), and processing using DNNs is widely applied in fields such as image recognition, speech recognition, and artificial intelligence.

Japanese Unexamined Patent Publication No. 2012-43409

However, while processing using a DNN is highly accurate, the processing load of its calculations is large and a large amount of resources may be required. It is therefore conceivable to apply the above-mentioned conventional techniques to a system that runs DNN processing so that the resource usage of the entire system does not exceed the maximum amount allowed in the system. That is, when the resource usage of the entire system is predicted to exceed the maximum amount, selective discarding, delaying, or offloading of data processing could be performed. However, this can cause problems such as a loss of real-time performance and output quality in the data processing itself. For this reason, the above-mentioned conventional techniques are difficult to adopt for the purpose of keeping the resource usage of the entire system from exceeding the maximum amount allowed in the system.

Therefore, the present disclosure proposes an information processing device and an information processing method that can adjust the amount of resources used for DNN processing so that the resource usage is never exceeded.

In order to solve the above problems, an information processing device according to one aspect of the present disclosure is an information processing device applied to an information processing system that uses an inference result by an inference device using a neural network, and includes an acquisition unit, a calculation unit, and a determination unit. The acquisition unit acquires the total resource usage of the information processing system. The calculation unit calculates, based on the resource usage, a target resource usage to be allocated to at least a part of the calculation of the inference processing by the inference device. The determination unit determines a calculation method corresponding to the target resource usage.

FIG. 1 is a diagram showing an example of information processing according to an embodiment of the present disclosure.
FIG. 2 is a diagram showing a schematic configuration example of an information processing system according to the embodiment of the present disclosure.
FIG. 3 is a diagram showing a configuration example of a DNN model according to the embodiment of the present disclosure.
FIG. 4 is a diagram showing an outline of a DNN partial inference device according to the embodiment of the present disclosure.
FIG. 5 is a diagram showing an outline of the DNN partial inference device according to the embodiment of the present disclosure.
FIG. 6 is a diagram showing an outline of a DNN inference device according to the embodiment of the present disclosure.
FIG. 7 is a functional block diagram showing an example of the functional configuration of a monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 8 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 9 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 10 is a diagram showing an outline of a method of determining a calculation method based on pre-analysis results according to the embodiment of the present disclosure.
FIG. 11 is a diagram showing an outline of a notification operation according to the embodiment of the present disclosure.
FIG. 12 is a flowchart showing an example of a processing procedure by the monitoring/adjustment module according to the embodiment of the present disclosure.
FIG. 13 is a diagram showing an outline of a method of determining a calculation method based on pre-analysis results according to a modification.
FIG. 14 is a diagram showing a configuration example of a DNN inference device according to a modification.
FIG. 15 is a diagram showing an example of information processing according to a comparative example.
FIG. 16 is a diagram showing an example of information processing by the DNN inference device according to the embodiment of the present disclosure.
FIG. 17 is a diagram showing the relationship between elapsed time and resource usage for information processing according to the comparative example.
FIG. 18 is a diagram showing the relationship between elapsed time and resource usage for information processing by the DNN inference device according to the embodiment of the present disclosure.
FIG. 19 is a diagram showing an example of information processing by an existing technique.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. In each of the following embodiments, the same parts may be given the same numbers or reference signs, and duplicate descriptions may be omitted. Further, in the present specification and drawings, a plurality of components having substantially the same functional configuration may be distinguished by appending different numbers or signs after the same number or sign.

In addition, the present disclosure will be described in the order of the items shown below.
1. An example of information processing according to the embodiment of the present disclosure
2. Configuration example of the information processing system
3. Configuration example of the DNN model
4. Functional configuration example of the monitoring/adjustment module
4-1. Overview of the monitoring/adjustment module
4-2. Operation of the monitoring/adjustment module (1)
4-3. Operation of the monitoring/adjustment module (2)
5. Example of a processing procedure by the monitoring/adjustment module
6. Modifications
6-1. Generation of correspondence information by machine learning
6-2. Reduction of resource switching overhead
7. Others
7-1. Application to robot systems
7-2. Application to game consoles
8. Conclusion

<< 1. An example of information processing according to the embodiment of the present disclosure >>
FIG. 1 is a diagram showing an example of information processing according to the embodiment of the present disclosure. As shown in FIG. 1, a monitoring/adjustment module 100 (an example of an information processing device) according to the embodiment of the present disclosure is applied to an information processing system 1 including a DNN model 30 and a system module 50.

The information processing system 1 runs the inference of the DNN model 30 in parallel with the processing in the system module 50.

The DNN model 30 is a trained model machine-learned using a DNN, which is an artificial neural network having a plurality of layers such as an input layer, an output layer, and a plurality of hidden layers (intermediate layers). The DNN model 30 is composed of a plurality of elements such as channels and matrices in addition to the plurality of layers such as the input layer, the output layer, and the hidden layers. The DNN model 30 performs arithmetic processing on input data using the calculation method given for each element by the monitoring/adjustment module 100, and outputs the calculation result.

The system module 50 executes various processes using the inference results of the DNN model 30. The system module 50 corresponds to, for example, a module that controls the operation of a robot based on the recognition result of a camera image by the DNN model 30, or a module that executes the processing of a game console provided with a UI (User Interface) function that uses the recognition results of voice and hand gestures by the DNN model 30.

In such an information processing system 1, if selective discarding, delaying, or offloading is applied to the data processing when the resource usage of the entire system is predicted to exceed the maximum amount, problems such as a loss of real-time performance and output quality of the data processing itself can occur. That is, when time-series data is selectively discarded, important data at a certain time may be missed, and when processing is delayed, the acquisition of the processing result is delayed, affecting the real-time behavior of other system functions that use the processing result.

For example, in a system such as a robot system where real-time behavior from recognition to action is important, misses due to selective discarding and delays in action decisions due to processing delays are serious. In addition, some recognition tasks require on the order of 10 milliseconds per piece of data, whereas motor-driven parts may be controlled at intervals of tens to hundreds of microseconds. In this case, even if the resource usage of the recognition processing is switched at the timing at which the data to be recognized switches, the fluctuation in the resource usage of the motor-driven parts cannot be followed.

Further, in a system having a UI agent that operates in parallel with game software, a problem can arise in which user operations cannot be recognized at all while the processing load of the game software itself is high.

In view of these problems, the present disclosure proposes a monitoring/adjustment module 100 that, in a system in which DNN inference runs in parallel with other processing, can adjust the amount of resources used for DNN processing so that the resource usage is never exceeded.

The monitoring/adjustment module 100 according to the embodiment of the present disclosure can instantly adjust, for example, the amount of resources used for inference processing by the DNN while monitoring the resource usage of the entire system. Each time a part of the elements constituting the DNN, such as a layer or a map, is calculated, the monitoring/adjustment module 100 acquires the resource usage of the entire system and determines the amount of resources that may be used for the next element so that the maximum amount of resources allowed in the system is not exceeded. By shortening the time interval at which this resource amount is decided (that is, by making the elements finer), part of the resources used for inference processing by the DNN can be adjusted instantly even in the middle of processing. Thanks to the surplus created by this adjustment, the resource amount of the entire system never exceeds its limit, and the system keeps operating without abnormal termination, abnormal operation, latency overruns, and the like.

Further, as shown in Reference 1, a DNN has the property that its recognition accuracy for, for example, camera images is unlikely to drop even when the calculation precision is lowered and processing is performed with fewer resources. Reference 1 proposes a method for suppressing the impact on recognition accuracy when the number of quantization bits of a DNN is increased or decreased. According to this method (PACT), even if the number of quantization bits is reduced in some object recognition models, the loss can be kept to a modest decrease in recognition accuracy.
Reference 1: C. Jungwook, "PACT: Parameterized Clipping Activation for Quantized Neural Networks", Computer Vision and Pattern Recognition, 2018

The functions realized by the monitoring/adjustment module 100 according to the embodiment of the present disclosure include a technique for adjusting the resource usage between elements in the middle of inference processing by the DNN. By exploiting the above-mentioned property that a DNN's accuracy is unlikely to drop even when the calculation precision (the amount of resources allocated to the DNN) is lowered, an inference device can be realized that never needs to suspend DNN inference while still respecting the resource usage of the entire system.

Further, the monitoring/adjustment module 100 according to the embodiment of the present disclosure has a mechanism that, when deciding the resource usage of each element, takes care that the accuracy of the DNN inference processing drops as little as possible. This is realized by analyzing in advance the relationship between the elements that reduce the resources required for processing in the DNN model, the amount of that reduction, and the accuracy (inference accuracy). With this technique, data processing can be performed with very high accuracy when the resource usage of the entire system has headroom, and it can be guaranteed that data processing continues with a certain degree of processing (recognition) accuracy even in situations where that headroom is gone.

An outline of the information processing by the monitoring/adjustment module 100 according to the embodiment of the present disclosure is described below.

First, the monitoring/adjustment module 100 acquires the total resource usage of the information processing system 1 (step S1). The total resource usage of the information processing system 1 corresponds to, for example, power consumption, memory usage, and processing time.

Subsequently, the monitoring/adjustment module 100 calculates, based on the total resource usage of the information processing system 1, a target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN model 30 (step S2). The calculation of the inference processing by the DNN model 30 is composed of, for example, a chain of calculations of a plurality of layers arranged in multiple stages. The monitoring/adjustment module 100 therefore calculates a target resource usage for each layer constituting the inference processing of the DNN model, for example based on the resource surplus of the information processing system 1.

After calculating the target resource usage, the monitoring/adjustment module 100 determines a calculation method corresponding to the target resource usage (step S3) and outputs it to the DNN model 30. Specifically, the monitoring/adjustment module 100 determines the calculation method based on correspondence information between resource usage amounts and calculation methods pre-analyzed for each resource usage amount. The calculation method is determined based on control information indicating how the calculation of the inference processing by the DNN model 30 is to be performed, such as the "number of quantization bits" and the "pruning ratio".

In this way, the monitoring/adjustment module 100 calculates, based on the total resource usage of the information processing system 1, the target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN model 30, and then determines a calculation method corresponding to the calculated target resource usage. Thereby, the monitoring/adjustment module 100 can adjust the amount of resources used for the inference processing of the DNN model 30 so that the resource usage in the information processing system 1 is never exceeded.
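
As a concrete illustration, steps S1 to S3 can be condensed into the following minimal sketch. This is not an implementation from the present disclosure; the percentage-based usage values, R_MAX, EPSILON, get_system_usage, and lookup_method are all hypothetical placeholders, and the subtraction in step S2 anticipates equation (1) introduced later.

```python
# Minimal sketch (hypothetical names and values) of the S1-S3 loop in FIG. 1.
R_MAX = 100.0    # assumed maximum resource amount the system can supply (%)
EPSILON = 5.0    # assumed constant safety margin (%)

def adjust_step(get_system_usage, lookup_method):
    im = get_system_usage()                # S1: acquire total resource usage Im
    rdnn = max(R_MAX - im - EPSILON, 0.0)  # S2: target resource usage Rdnn
    return lookup_method(rdnn)             # S3: calculation method Om
```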

<< 2. Configuration example of the information processing system >>
Hereinafter, a configuration example of the information processing system 1 according to the embodiment of the present disclosure will be described. FIG. 2 is a diagram showing a schematic configuration example of the information processing system according to the embodiment of the present disclosure.

As shown in FIG. 2, the information processing system 1 includes a processor 11, a main storage device 12, an auxiliary storage device 13, a peripheral circuit 14, an input device 15, an output device 16, a peripheral device 17, and a communication device 18. The processor 11, the main storage device 12, the auxiliary storage device 13, the peripheral circuit 14, and the communication device 18 are connected to one another via an internal bus 20.

The processor 11 is realized by, for example, a processor such as a CPU (Central Processing Unit), an MPU (Micro Processing Unit), or a GPU (Graphics Processing Unit). The processor 11 executes arithmetic processing and operation control in the information processing system 1.

The main storage device 12 is realized by a semiconductor memory element such as a RAM (Random Access Memory). The auxiliary storage device 13 is realized by a semiconductor memory element such as a ROM (Read Only Memory), or by a storage device such as a hard disk or an optical disk.

The peripheral circuit 14 is realized by an A/D converter, a timer, a signal processing circuit, and the like. The peripheral circuit 14 processes the various signals and data of the input device 15, the output device 16, and the peripheral device 17.

The input device 15 is realized by, for example, a mouse, a keyboard, a touch panel, buttons, switches, levers, and the like. The input device 15 can also be realized by a voice input device such as a microphone, or by a remote controller capable of transmitting control signals using infrared rays or other radio waves.

The output device 16 can be realized by a display device such as a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), or an organic EL display, or by an audio output device such as a speaker or headphones. The output device 16 can also be realized by a device capable of visually or audibly notifying the user of acquired information, such as a printer, a mobile phone, or a facsimile.

The peripheral device 17 is any other device mounted on the information processing system 1 besides the input device 15 and the output device 16. The peripheral device 17 can be realized by, for example, various sensors such as an acceleration sensor and an angular velocity sensor, an inertial measurement unit, a ToF (Time of Flight) sensor, a GPS (Global Positioning System) receiver, an actuator, a camera, a speaker, a battery, and the like.

The communication device 18 can be realized by a NIC (Network Interface Card), various communication modems, or a wireless module such as Bluetooth (registered trademark) or Wi-Fi (registered trademark).

The various processes performed by the monitoring/adjustment module 100 shown in FIG. 1 can be realized by the processor 11 shown in FIG. 2 executing various programs stored in the auxiliary storage device 13 shown in FIG. 2, using the main storage device 12 and the like as a work area. Likewise, inference by the DNN model 30 and the various processes performed by the system module 50 can be realized by the processor 11 executing various programs stored in the auxiliary storage device 13, using the main storage device 12 and the like as a work area. That is, the processor 11, the main storage device 12, and the auxiliary storage device 13 can realize, in cooperation with software (various programs), the various functions of the monitoring/adjustment module 100 described below (for example, the processing functions of the acquisition unit 113 through the notification unit 116). The various programs executed by the processor 11 shown in FIG. 2 may also be programs downloaded from an external device such as a server via the communication device 18.

<< 3. Configuration example of the DNN model >>
Hereinafter, an outline of the DNN model 30 according to the embodiment of the present disclosure will be described. FIG. 3 is a diagram showing a configuration example of the DNN model according to the embodiment of the present disclosure. FIGS. 4 and 5 are diagrams showing an outline of the DNN partial inference device according to the embodiment of the present disclosure.

As shown in FIG. 3, the DNN model 30 is divided into a plurality of DNN partial inference devices 31 (31m to 31m+n) at a predetermined granularity based on predetermined conditions. The DNN model 30 is composed of a plurality of elements including layers, channels, and matrices. Each DNN partial inference device 31 is formed by dividing the DNN model 30 at a granularity based on at least one of the layers, channels, and matrices constituting it. The granularity at which the DNN model 30 is divided is decided in advance by the administrator of the monitoring/adjustment module 100, described later, based on the cost required for the pre-analysis and the desired time interval for resource usage adjustment.

As shown in FIG. 4, a DNN partial inference device 31 computes the result of one element (block) of the divided DNN using, for example, the calculation method given by the monitoring/adjustment module 100, and outputs it. The input is therefore either the input data of the DNN inference or the output (activation) of the preceding DNN partial inference device 31, and the output is the result of the DNN element (block) in its charge. In other words, as shown in FIG. 3, by connecting and operating a plurality of DNN partial inference devices 31 (31m to 31m+n), a function that returns the result of DNN inference on the input data can be realized. Depending on the structure of the DNN model 30 to be inferred, a branched structure may be required, but it can be configured in the same way.

In the embodiment of the present disclosure, as shown in FIG. 5, the DNN partial inference device 31 receives as an input the calculation method Om (control information), which is the output of the monitoring/adjustment module 100. The DNN partial inference device 31 acquires the input data and the calculation method Om (control information) at the same time, performs the calculation on the input data using the acquired calculation method (based on the control information), and returns the output as the result of the DNN element (block) in its charge. Specifically, the number of quantization bits and the quantization scheme for the weights and activations, the pruning ratio of a pruning method, and the calculation formulas and weight parameters to be used can all be made variable.
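
To make the role of the calculation method Om concrete, the following is a minimal sketch of a partial inference device for a fully connected block, assuming Om = (Qb, Pr). The uniform quantization and magnitude pruning used here are simplified stand-ins for the quantization scheme and pruning method, which the text leaves open, and all names are hypothetical.

```python
import numpy as np

def quantize(w, qb):
    # Uniform symmetric quantization of weights to qb bits (a simplification).
    scale = np.max(np.abs(w)) / (2 ** (qb - 1) - 1)
    return np.round(w / scale) * scale if scale > 0 else w

def prune(w, pr):
    # Zero out the fraction `pr` of weights with the smallest magnitude.
    if pr <= 0:
        return w
    k = int(w.size * pr)
    thresh = np.partition(np.abs(w).ravel(), k)[k] if k < w.size else np.inf
    return np.where(np.abs(w) < thresh, 0.0, w)

class PartialInferencer:
    """One block of the split DNN (here, a fully connected layer)."""
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def forward(self, x, om):
        qb, pr = om                                 # calculation method Om = (Qb, Pr)
        w = prune(quantize(self.weights, qb), pr)   # apply Om before computing
        return np.maximum(w @ x + self.bias, 0.0)   # affine transform + ReLU
```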

<< 4. Functional configuration example of the monitoring/adjustment module >>
Hereinafter, an example of the functional configuration of the monitoring/adjustment module according to the embodiment of the present disclosure will be described. FIG. 6 is a diagram showing an outline of the DNN inference device according to the embodiment of the present disclosure. FIG. 7 is a functional block diagram showing an example of the functional configuration of the monitoring/adjustment module according to the embodiment of the present disclosure.

As shown in FIG. 6, a DNN inference device 70 can be realized by combining the DNN partial inference devices 31 (31m to 31m+n) with monitoring/adjustment modules 100 (100m to 100m+n). In the DNN inference device 70 shown in FIG. 6, a monitoring/adjustment module 100 (100m to 100m+n) is provided for each DNN partial inference device 31. According to the DNN inference device 70 shown in FIG. 6, the monitoring/adjustment modules 100 (100m to 100m+n) acquire the resource usage of the entire system every time a decomposed element of the DNN is calculated, and can switch to a calculation method matched to the system state each time. This realizes a DNN inference device 70 whose resource usage can be adjusted instantly and whose inference never needs to be interrupted. Note that the term "block" appearing in the following description is synonymous with the DNN partial inference device 31.
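
A hedged sketch of this per-block arrangement follows, building on the PartialInferencer sketch above; MonitorAdjust is a hypothetical wrapper whose decide() returns the calculation method Om for its block.

```python
class DNNInferencer:
    """Sketch of FIG. 6: one monitoring/adjustment module per block."""
    def __init__(self, blocks, monitors):
        assert len(blocks) == len(monitors)
        self.blocks = blocks        # PartialInferencer instances
        self.monitors = monitors    # hypothetical MonitorAdjust instances

    def infer(self, x):
        # Om is re-decided immediately before each block runs, so resource
        # usage can be adjusted mid-inference without interrupting it.
        for block, monitor in zip(self.blocks, self.monitors):
            om = monitor.decide()
            x = block.forward(x, om)
        return x
```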

As shown in FIG. 7, the monitoring/adjustment module 100 includes a resource usage information storage unit 111, a correspondence information storage unit 112, an acquisition unit 113, a calculation unit 114, a determination unit 115, and a notification unit 116. Through these units, the monitoring/adjustment module 100 realizes or executes the functions and operations of the monitoring/adjustment module 100 described below.

Each of the blocks constituting the monitoring/adjustment module 100 (the resource usage information storage unit 111 through the notification unit 116) is a functional block representing a function of the monitoring/adjustment module 100. These functional blocks may be software blocks or hardware blocks. For example, each of the above-mentioned functional blocks may be a single software module realized by software (including microprograms), or a single circuit block on a semiconductor chip (die). Of course, each functional block may also be a single processor or a single integrated circuit. The method of configuring the functional blocks is arbitrary. Note that the monitoring/adjustment module 100 may be configured in functional units different from the above-mentioned functional blocks.

The resource usage information storage unit 111 stores information on the resource usage used in the calculation of each block of the DNN (each DNN partial inference device 31). The resource usage information stored in the resource usage information storage unit 111 is stored by the calculation unit 114, described later.

The correspondence information storage unit 112 stores correspondence information between resource usage amounts and the calculation methods pre-analyzed for each resource usage amount. The calculation method for each resource usage amount is obtained, for example, through a pre-analysis by the operator who administers the monitoring/adjustment module 100.

The acquisition unit 113 acquires the total resource usage of the information processing system 1. The acquisition unit 113 acquires, for example, the total power consumption and memory usage of the information processing system 1 from the system module 50.

The calculation unit 114 calculates, based on the total resource usage of the information processing system 1 acquired by the acquisition unit 113, the target resource usage to be allocated to at least a part of the calculation of the inference processing by the DNN inference device 70. For example, the calculation unit 114 calculates the target resource usage to be allocated to each inference calculation of a DNN partial inference device 31 (see FIG. 6) divided at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource usage based on, for example, the resource surplus of the information processing system 1.

The determination unit 115 determines the calculation method of the DNN partial inference device 31 corresponding to the target resource usage calculated by the calculation unit 114. The calculation method is control information indicating how to perform the calculation of an element (block) of the DNN model 30, that is, of the DNN partial inference device 31 (see FIG. 6) that performs the next inference calculation, and corresponds to, for example, the "number of quantization bits" or the "pruning ratio". The determination unit 115 can determine the calculation method based on control information composed of tuples, such as a combination of the "number of quantization bits" and the "pruning ratio". The determination unit 115 outputs the determined calculation method to the corresponding DNN partial inference device 31.

The notification unit 116 notifies that the resource usage allocated to the DNN partial inference devices 31 is decreasing. For example, when the target resource usage calculated by the calculation unit 114 is equal to or less than a predetermined standard, the notification unit 116 notifies that the accuracy of the result of the inference processing (inference result) by the DNN inference device 70 may decrease.

<4-1. Overview of the monitoring/adjustment module>
Hereinafter, an outline of the monitoring/adjustment module 100 will be described. FIG. 8 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.

As shown in FIG. 8, the monitoring/adjustment module 100 determines and outputs the calculation method Om of a certain element (block) constituting the DNN model 30 from the total resource usage Im of the information processing system 1 given as an input. An element (each block, i.e., a DNN partial inference device 31) may be a layer, a channel (map), a part of a matrix (tensor), or a combination of these. The administrator of the monitoring/adjustment module 100 divides the DNN model 30 into a plurality of blocks in advance and assigns a block number bi to each block. How finely to divide is decided based on the cost required for the pre-analysis and the desired time interval for resource usage adjustment.

As the system-wide resource usage Im given as an input to the monitoring/adjustment module 100, resource usage such as the power consumption, memory usage, and processing time of the entire system can be given.

Note that the monitoring/adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a predicted value of future usage obtained with a simple machine learning model instead of the measured value. In this way, abrupt changes in resource usage in the information processing system 1 can be accommodated. Further, the monitoring/adjustment module 100 may acquire, as the total resource usage of the information processing system 1, a tuple combining a plurality of usage amounts of different kinds, such as power consumption and memory usage. In this way, an appropriate output matched to each resource usage can be calculated.

Furthermore, the monitoring/adjustment module 100 can take the DNN block number bi as an input. This is used to determine which block's calculation method is to be decided and output.

In addition, the monitoring/adjustment module 100 can receive the resource usage Rdnn' allocated to each of the blocks preceding block number bi. This is used to grasp the calculation status of the entire model and to determine a calculation method Om that can improve the accuracy of the DNN model as a whole.

The calculation method Om, which is the output of the monitoring/adjustment module 100, is control information indicating how the block with block number bi is to be calculated. Examples include the number of quantization bits that adjusts the precision of the numerical calculation, the pruning ratio indicating what fraction of the DNN elements to skip as typified by pruning methods, and the network structure and parameters to be used. As the output format of the module, when control information composed of a plurality of elements such as the number of quantization bits and the pruning ratio is used, it can be expressed as a tuple Om = (Qb, Pr) using the number of quantization bits Qb and the pruning ratio Pr.

The calculation method Om output by the monitoring/adjustment module 100 determines the calculation method for one DNN block, but one block may contain several channels or parts of matrices (tensors). In this case, the number of quantization bits Qb may be configured to hold a plurality of quantization bit numbers so that a bit width can be specified for each element in the block.
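
Purely as an illustration of these two output shapes, with invented values:

```python
# Illustrative encodings of the control information Om (values invented):
om_single = (8, 0.25)                # (Qb, Pr) for a block with one element
om_per_channel = ((8, 6, 4), 0.25)   # one Qb per channel, shared Pr
```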

<4-2. Operation of the monitoring/adjustment module (1)>
Hereinafter, the operation of the monitoring/adjustment module 100 will be described. FIG. 9 is a diagram showing an outline of the monitoring/adjustment module according to the embodiment of the present disclosure.

As shown in FIG. 9, the determination of the calculation method Om by the monitoring/adjustment module 100 is performed in two stages. In the first stage, the monitoring/adjustment module 100 determines, from the resource usage Im of the entire system given as an input, the target resource usage Rdnn to be used by the block with block number bi. The first-stage operation is realized, for example, by the function of the calculation unit 114 shown in FIG. 7.

Next, in the second stage, the monitoring/adjustment module 100 selects and determines, from among the calculation methods that achieve the target resource usage Rdnn, a calculation method Om that degrades the DNN's calculation accuracy as little as possible. The second-stage operation is realized, for example, by the function of the determination unit 115 shown in FIG. 7. That is, the acquisition unit 113 of the monitoring/adjustment module 100 acquires the resource usage each time the calculation method is determined by the determination unit 115. Subsequently, the calculation unit 114 of the monitoring/adjustment module 100 calculates, each time the resource usage is acquired by the acquisition unit 113, the target resource usage of the DNN partial inference device 31 that performs the next inference calculation. Then, the determination unit 115 of the monitoring/adjustment module 100 determines, each time the target resource usage is calculated by the calculation unit 114, the calculation method of the DNN partial inference device that performs the next inference calculation.

As a simple way of computing the target resource usage Rdnn determined by the first-stage operation described above, there is a method of allocating the system's resource surplus to the DNN calculation. For example, using the resource usage Im of the entire system given as input, the maximum resource amount Rmax that the system can supply, and a constant margin Epsilon, it can be computed as in equation (1) below.

  Target resource usage Rdnn = Maximum resource amount Rmax - Resource usage Im - Constant margin Epsilon ... (1)

In the second stage described above, the calculation method Om must be determined from the target resource usage Rdnn that may be used by the block with block number bi and the resource usage Rdnn' used by the blocks preceding block number bi. However, there are many possible combinations of the number of quantization bits, the pruning ratio, and so on, and the calculation precision required differs from one part of the DNN to another, making the determination of the calculation method Om difficult. On the other hand, simply allocating the calculation resources evenly based on the Rdnn of each block greatly degrades the accuracy of the DNN inference processing. Therefore, as an example of a way to instantly determine the calculation method Om from multiple target resource usages Rdnn, a determination method based on pre-analysis results is shown. FIG. 10 is a diagram showing an outline of the method of determining a calculation method based on pre-analysis results according to the embodiment of the present disclosure.

FIG. 10 shows an example of a method in which, for various combinations of resource usage, calculation methods that cause little loss of inference accuracy are enumerated in advance. Suppose the input Im is one kind of resource, such as power consumption or memory usage, and its range is from 0% to 100%. In this case, for example, for the 11 interpolation points Im = 0%, 10%, 20%, ..., 100% between 0% and 100%, an optimal calculation method Om with little loss of inference accuracy is analyzed in advance for each point. This analysis is performed for every combination of the target resource usage Rdnn of the block whose calculation method is being determined and the resource usage Rdnn' used by the preceding blocks.

Then, the monitoring/adjustment module 100 acquires, based on the correspondence information stored in the correspondence information storage unit 112, a plurality of calculation methods associated with resource usages close to the target resource usage Rdnn, and determines the calculation method based on the acquired plurality of calculation methods. Specifically, at run time the monitoring/adjustment module 100 can determine the calculation method Om by searching the pre-analysis results for points (interpolation points) close to the input Im and interpolating from them. If the number of interpolation points in the pre-analysis is increased beyond 11, a more accurate calculation method Om can be determined. Even if the input Im concerns a plurality of resources, this can likewise be handled by analyzing the interpolation points in advance.
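
The lookup could, for example, take the shape of the following sketch; the table contents are invented for illustration and are not values from the pre-analysis described here.

```python
# Minimal sketch of the run-time lookup against pre-analysis results.
PREANALYZED = {   # interpolation point (% usage) -> method Om = (Qb, Pr)
    0: (2, 0.9), 10: (3, 0.8), 20: (4, 0.6), 30: (5, 0.5),
    40: (6, 0.4), 50: (8, 0.3), 60: (8, 0.2), 70: (12, 0.1),
    80: (16, 0.05), 90: (16, 0.0), 100: (32, 0.0),
}

def lookup_method(rdnn):
    # Snap to the nearest of the 11 interpolation points; a finer grid
    # (or interpolating between the two neighbors) gives a closer fit.
    nearest = min(PREANALYZED, key=lambda point: abs(point - rdnn))
    return PREANALYZED[nearest]
```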

The operation of the monitoring/adjustment module 100 described above is only one example; the calculation method Om output by the preceding monitoring/adjustment module 100 (for example, the monitoring/adjustment module 100m if the acting module is the monitoring/adjustment module 100m+1) may also be given as part of the input Im. This reveals the long-term temporal change in the resource usage of the entire system and how much calculation precision has been allocated to which elements of the DNN. It is therefore possible to obtain a calculation method Om that more stably improves the accuracy of the entire DNN processing against highly fluctuating resource usage.

Note that by reading "block" as "DNN partial inference device" in the operation of the monitoring/adjustment module 100 described above, an operation of outputting a calculation method Om to each of the DNN partial inference devices 31 constituting the DNN inference device 70 shown in FIG. 6 can be realized.

<4-3. Operation of the monitoring/adjustment module (2)>
The monitoring/adjustment module 100 can operate so as to notify, in the information processing system 1, that the resource usage allocated to the DNN partial inference devices 31 is decreasing. This operation is realized by the function of the notification unit 116 shown in FIG. 7. FIG. 11 is a diagram showing an outline of the notification operation according to the embodiment of the present disclosure.

As shown in FIG. 11, the information processing system 1 includes a user notification module 90. The user notification module 90, for example, visualizes and reports the processing state of the information processing system 1 to its user. The processing state corresponds to, for example, a decrease in recognition accuracy in the case of recognition processing such as image recognition or voice recognition, and a decrease in responsiveness in the case of UI response processing.

The monitoring/adjustment module 100 issues a notification to the user notification module 90, for example, when the target resource usage Rdnn allocated to a DNN partial inference device 31 is equal to or less than a predetermined standard.

Upon receiving the notification from the monitoring/adjustment module 100, the user notification module 90 visualizes, for example on an operation device 91 operated by the user, information indicating that the responsiveness of the information processing system 1 is degraded. Possible visualization methods include lighting a light-emitting unit provided on the operation device 91 in a predetermined color, or making the light-emitting unit blink.

 Note that the monitoring/adjustment module 100 may also notify the system module 50 when the target resource usage Rdnn allocated to the DNN partial inference device 31 falls to or below the predetermined threshold. Upon receiving the notification from the monitoring/adjustment module 100, the system module 50 can change the operation of the system so as to improve user safety in response to the reduced accuracy of the inference results of the DNN inference device 70. For example, if the information processing system 1 is a transport robot system, the operation could be changed so as to select a route that does not harm the environment or people, even at the expense of transport time. Alternatively, if the information processing system 1 is a pet-type robot, the operation could be changed so that, in line with the reduced accuracy of the response processing, the robot takes a resting gesture such as closing its eyes.

<<5. Example of a processing procedure by the monitoring/adjustment module>>
 Hereinafter, the processing procedure performed by the monitoring/adjustment module 100 according to the embodiment of the present disclosure will be described. FIG. 12 is a flowchart showing an example of the processing procedure performed by the monitoring/adjustment module according to the embodiment of the present disclosure. The processing procedure shown in FIG. 12 is executed repeatedly while the information processing system 1 is in operation.

 As shown in FIG. 12, the acquisition unit 113 acquires the input Im, which is the total resource usage of the information processing system 1 (step S101).

 The calculation unit 114 calculates the target resource usage Rdnn of the DNN partial inference device 31 based on the input Im, and stores it in the resource usage information storage unit 111 (step S102).

 The determination unit 115 uses the block number bi assigned to the current block to be processed (DNN partial inference device 31) as a key, and acquires the resource usage Rdnn' used by the blocks preceding the current block from the resource usage information storage unit 111 (step S103).

 The determination unit 115 determines the calculation method Om based on the target resource usage Rdnn and the resource usage Rdnn' used by the preceding blocks (step S104).

 The determination unit 115 outputs the determined calculation method Om to the current block to be processed (step S105), and the procedure returns to step S101.
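 Taken together, steps S101 to S105 form a simple monitoring loop. A minimal Python sketch of that loop follows; the class and method names (MonitoringModule, get_total_usage, set_calculation_method, and so on) and the surplus-based target calculation are assumptions introduced for illustration, not names or formulas used in the embodiment.

```python
# Minimal sketch of the S101-S105 loop (hypothetical names throughout).
class MonitoringModule:
    def __init__(self, usage_store, correspondence_table):
        self.usage_store = usage_store     # resource usage information storage 111
        self.table = correspondence_table  # correspondence information storage 112

    def step(self, get_total_usage, current_block):
        im = get_total_usage()                         # S101: total system usage Im
        rdnn = max(0.0, 1.0 - im)                      # S102: target usage from surplus
        self.usage_store[current_block.bi] = rdnn
        rdnn_prev = sum(self.usage_store.get(b, 0.0)   # S103: usage of earlier blocks
                        for b in range(current_block.bi))
        om = self.table.lookup(rdnn, rdnn_prev,        # S104: pick calculation method Om
                               current_block.bi)
        current_block.set_calculation_method(om)       # S105: hand Om to the block
        return om
```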

 In the embodiment described above, by giving the monitoring/adjustment module 100 a predicted value of future resource usage as the input Im, instead of the current resource usage, the DNN inference device 70 can be configured so that resource usage overruns are easier to avoid. A simple machine learning model or a Kalman filter can be used to predict future resource usage.
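 As one hedged illustration, a one-dimensional Kalman filter tracking system resource usage under a random-walk model could look like the following sketch; the noise parameters are assumptions chosen for illustration.

```python
# Minimal 1-D Kalman filter sketch for predicting the next resource usage value.
# Process/measurement noise values are illustrative assumptions.
class UsagePredictor:
    def __init__(self, q: float = 1e-4, r: float = 1e-2):
        self.x = 0.0   # estimated usage
        self.p = 1.0   # estimate variance
        self.q = q     # process noise
        self.r = r     # measurement noise

    def update(self, measured_usage: float) -> float:
        # Predict step (random-walk model), then correct with the measurement.
        self.p += self.q
        k = self.p / (self.p + self.r)          # Kalman gain
        self.x += k * (measured_usage - self.x)
        self.p *= (1.0 - k)
        return self.x                           # use as the predicted input Im
```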

 Further, the granularity at which the DNN model 30 is divided for pre-analysis to construct the DNN inference device 70 depends, for example, on how much pre-analysis time and computational cost can be tolerated, and on how fine the time interval for switching resource usage should be. Making the switching interval smaller requires evaluating more combinations of calculation methods in the pre-analysis, which demands considerable execution time and computational resources. In addition, since the number of points at which switching is possible across the model increases, the switching overhead may increase the processing time of the model as a whole.

 Two granularities must be determined when using this technique. The first is, for example, the granularity at which the DNN model 30 is divided into blocks; the second is how finely the elements within each block are analyzed. The balance between the pre-analysis cost and the switching overhead described above is adjusted through these two granularities.

 First, regarding DNN inference, the DNN model 30 can be divided into a plurality of DNN partial inference devices 31 at the first granularity, "the granularity at which the DNN model 30 is divided into blocks". For example, in the pre-analysis method illustrated in FIG. 10 and elsewhere, calculation methods are searched with respect to switching at this granularity, so this granularity is the smallest unit at which resource usage can be switched during DNN inference. Note that if little time has elapsed after processing one block divided at this granularity, blocks can effectively be merged by, for example, continuing to process the next block with the same resource usage without switching. However, resource usage cannot be switched in units smaller than a block.

 The second granularity, "how finely the elements within a block are examined", concerns how the number of bits and the computing power are allocated to the fine-grained elements within a block, such as channels and matrices (tensors), in order to respect that block's resource usage and maximize accuracy. As shown in Reference 1 above, making this finer suppresses the loss of accuracy, but it must be balanced against the pre-analysis cost, which is expected to grow as the granularity becomes finer.

 Further, in the above embodiment, a DNN that performs inference processing by stacking calculations of a plurality of layers in multiple stages has been described as the inference device, but the processing by the monitoring/adjustment module 100 described above can similarly be applied to a neural network composed of a single layer. In this case, the calculation method corresponding to the target resource usage can be determined based on the results of pre-analyzing, for each element such as the channels and matrices constituting the layer of the target neural network, the calculation method corresponding to each resource usage.

<<6. Modifications>>
<6-1. Generating correspondence information by machine learning>
 The correspondence information stored in the correspondence information storage unit 112 need not be acquired through pre-analysis by an operator; it may, for example, be acquired as a result of machine learning.

 For example, reinforcement learning can be used to train a DNN model that takes as input the resource usage Rdnn' used by the preceding blocks, the target resource usage Rdnn, and the block number bi, and infers the calculation method Om as the output for these inputs. In this case, the state of the reinforcement learning is the tuple of Rdnn' (the resource usage of the preceding blocks), Rdnn (the target resource usage), and the block number bi; the output is the calculation method Om; and the reward is the difference between the accuracy of the inference processing achievable with the calculation method Om (the recognition accuracy, in the case of recognition processing) and the accuracy when inference processing is performed without limiting resource usage. Using a reinforcement learning model enables efficient determination of the calculation method.
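 A minimal sketch of this formulation as a tabular Q-learning loop follows; the action set, the hyperparameters, and the way the accuracy-based reward is obtained are all assumptions introduced for illustration (the embodiment itself trains a DNN model rather than a table).

```python
# Minimal Q-learning sketch of the state/action/reward formulation above.
# State: (rdnn_prev, rdnn, bi); action: calculation method Om; names hypothetical.
import random
from collections import defaultdict

ACTIONS = [(8, 0.0), (8, 0.5), (4, 0.5)]  # assumed (bit width, pruning ratio) pairs
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

q = defaultdict(float)

def choose(state):
    """Epsilon-greedy choice of a calculation method Om for the given state."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])

def learn(state, action, reward, next_state):
    # reward = accuracy(Om) - accuracy_unlimited, per the formulation in the text.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
```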

<6-2. Reducing the overhead of resource switching>
 In operation (1) of the monitoring/adjustment module described above, the second-stage operation described above can also be modified in order to reduce the overhead of resource switching. FIG. 13 is a diagram showing an outline of a method for determining the calculation method based on pre-analysis results according to a modification. FIG. 14 is a diagram showing a configuration example of the DNN inference device according to the modification.

 For example, as shown in FIG. 13, the monitoring/adjustment module 100 may output, for the inputs consisting of the resource usage Rdnn', the target resource usage Rdnn, and the block number bi, the calculation methods for all blocks from block number bi onward.

 Further, as shown in FIG. 14, judgment mechanisms 71 (71m to 71m+n-1) may be newly provided in the DNN inference device 70 to decide, before a given block is calculated, whether to operate the monitoring/adjustment module 100 and output a new calculation method, or to use the previously determined calculation method as is. The judgment mechanism 71 shown in FIG. 14 measures the time elapsed since the previous operation of the monitoring/adjustment module 100, and if a fixed time has elapsed, operates the monitoring/adjustment module 100 to output a new calculation method. On the other hand, if the fixed time has not elapsed since the previous operation of the monitoring/adjustment module 100, the judgment mechanism 71 uses the previously determined calculation method as is. In this way, the overhead of resource switching can be reduced compared with determining the calculation method every time the total resource usage of the system is acquired.

 For example, when determining the calculation method of the partial inference device 31m+n to which block number bi = n is assigned, the judgment mechanism 71m+n-1 determines whether a fixed time has elapsed since the calculation method for block number bi = n-1 was determined. If the fixed time has elapsed, the judgment mechanism 71m+n-1 causes the monitoring/adjustment module 100m+n to determine the calculation method of the partial inference device 31m+n to which block number bi = n is assigned, and outputs the new calculation method. On the other hand, if the fixed time has not elapsed, the judgment mechanism 71m+n-1 outputs, as is, the calculation method previously determined for the partial inference device 31m+n to which block number bi = n is assigned.
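 The time-based judgment could be sketched in Python as follows; the refresh interval and the interfaces of the monitoring module and the block are assumptions for illustration.

```python
# Minimal sketch of the judgment mechanism 71 (hypothetical interfaces).
import time

class JudgmentMechanism:
    def __init__(self, monitor, refresh_interval_s: float = 0.5):
        self.monitor = monitor                        # monitoring/adjustment module
        self.refresh_interval_s = refresh_interval_s  # assumed fixed time
        self.last_refresh = 0.0
        self.cached_method = None

    def method_for(self, block):
        now = time.monotonic()
        if self.cached_method is None or \
           now - self.last_refresh >= self.refresh_interval_s:
            # Enough time has passed: derive a new calculation method.
            self.cached_method = self.monitor.decide(block)
            self.last_refresh = now
        # Otherwise reuse the previous method, avoiding switching overhead.
        return self.cached_method
```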

<<7. Other applications>>
<7-1. Application to robot systems>
 The DNN inference device 70, composed of the DNN partial inference devices 31 and the monitoring/adjustment module 100 described above, can be applied to a robot system that processes camera images with a DNN. Robots that operate outdoors are battery-driven, so many of them have a maximum power consumption constraint. In addition, they carry power-hungry electrical components such as actuators, and recognition processing and communication processing for deciding actions run at the same time, so power consumption fluctuates with the situation. If the system is operated without regard to the power consumption of the system as a whole, the usage may exceed the maximum supply, potentially causing problems such as abnormal operation or loss of posture control.

 In a robot system, by using the DNN inference device 70 composed of the DNN partial inference devices 31 and the monitoring/adjustment module 100 to execute recognition processing that recognizes camera images with a DNN, the recognition processing can continue uninterrupted while the power consumption is adjusted so as not to exceed the maximum supply. The module is configured so that the current power consumption is given to the input of the monitoring/adjustment module 100 and the pruning ratio of the next DNN element is determined as the output. As a result, when the power consumption of the entire system grows, the pruning ratio increases and the calculation of some DNN elements is omitted. Omitting part of the calculation lowers the circuit utilization per unit time, keeps the dynamic power consumption of the DNN processing low, and creates a surplus, making it possible to supply power to components such as actuators that demand it. Moreover, since the pruning ratio is determined by prior analysis and the DNN recognition processing itself continues to run, missed detections of moving objects captured by the camera and delays in action decisions can be kept to a minimum. In a situation where the power consumption of the actuators and the like increases and the pruning ratio must be raised substantially, the recognition accuracy of the DNN recognition processing decreases, and with it the detection rate of moving objects drops to some extent, so the robot could conceivably take actions that are unsafe for the surrounding environment or people. In such cases, safety can be further improved by measures such as having a transport robot select a route that does not harm the environment or people even at the expense of transport time, or having a pet-type robot used in the home appear to the user as if its eyes are closed.
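 As a hedged illustration, the power-to-pruning mapping described above might look like the following sketch; the power budget, the upper pruning bound, and the linear mapping are assumptions for illustration and are not specified by the embodiment.

```python
# Minimal sketch: map current system power draw to a pruning ratio
# for the next DNN element (all numbers are illustrative assumptions).
MAX_SUPPLY_W = 60.0  # assumed maximum power supply

def pruning_ratio(current_power_w: float) -> float:
    """Higher system power draw -> more aggressive pruning of the next element."""
    headroom = max(0.0, MAX_SUPPLY_W - current_power_w)
    # Full headroom -> no pruning; no headroom -> prune up to 90% of the element.
    return min(0.9, 0.9 * (1.0 - headroom / MAX_SUPPLY_W))
```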

<7-2. Application to game consoles>
 The DNN inference device 70 described above can be applied to a game console equipped with a UI function that recognizes voice, hand gestures, and the like through DNN processing. A game console has a predetermined memory capacity. To raise the quality of scene rendering, a large amount of geometry and material information must be expanded into memory. If DNN recognition processing runs in parallel and uses a certain amount of memory, the rendering quality has to be lowered accordingly. However, for the sake of the game's progression, there are scenes, such as movie scenes, where the developer wants to raise the rendering quality and let the player take in the scenery, even at some cost to operability. Conversely, in indoor scenes or on screens that display only the UI, the amount of memory used for rendering is small. In such cases, it is desirable to allocate the surplus memory to the DNN recognition processing to improve recognition accuracy.

 By applying the DNN inference device 70 described above to a game console, these requirements can be met. The memory usage of the rendering processing is given to the input of the monitoring/adjustment module 100, and the number of quantization bits is determined as the output. In this way, when a large memory area must be secured for rendering, the number of quantization bits switches instantly, and the DNN recognition processing runs with a small memory footprint. The resulting surplus memory capacity can be used to improve the rendering quality. On the other hand, there is a concern that the responsiveness of the DNN-based UI function will decrease, but countermeasures can be taken, such as the game developer carefully choosing the scenes in which to use this (for example, when drawing the player into a movie scene), visualizing the reduced responsiveness on the controller or the like (see FIG. 11 and elsewhere), or indicating to the user that the game should be paused before using the UI function.
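 As a hedged sketch, the memory-to-quantization mapping might look like this; the memory budget and the bit-width tiers are assumptions for illustration.

```python
# Minimal sketch: choose a DNN quantization bit width from rendering memory usage.
# The budget and tiers below are illustrative assumptions.
TOTAL_MEMORY_MB = 8192.0

def quantization_bits(render_memory_mb: float) -> int:
    free_ratio = max(0.0, TOTAL_MEMORY_MB - render_memory_mb) / TOTAL_MEMORY_MB
    if free_ratio > 0.5:
        return 16  # plenty of memory: higher-precision inference
    if free_ratio > 0.2:
        return 8
    return 4       # rendering needs the memory: smallest DNN footprint
```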

<<8. Conclusion>>
 The monitoring/adjustment module 100 (an example of an information processing device) according to the embodiment of the present disclosure is applied to an information processing system 1 that uses inference results from an inference device using a neural network, and includes an acquisition unit 113, a calculation unit 114, and a determination unit 115. The acquisition unit 113 acquires the total resource usage of the information processing system 1. The calculation unit 114 calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of the inference processing by the DNN inference device 70 (as one example, the calculation of the inference processing by a DNN partial inference device 31). The determination unit 115 determines the calculation method corresponding to the target resource usage (the calculation method of the inference processing by the DNN partial inference device 31). In this way, the monitoring/adjustment module 100 can adjust the amount of resources used for the inference processing of the DNN model 30 so that resource usage in the information processing system 1 is not exceeded.

 FIG. 15 is a diagram showing an example of information processing according to a comparative example. FIG. 16 is a diagram showing an example of information processing by the DNN inference device according to the embodiment of the present disclosure. FIG. 17 is a diagram showing the relationship between elapsed time and resource usage for the information processing according to the comparative example. FIG. 18 is a diagram showing the relationship between elapsed time and resource usage for the information processing by the DNN inference device according to the embodiment of the present disclosure.

 As shown in FIG. 15, the function of adjusting the resource usage of processing based on the resource usage of the entire system could also be realized by a technique called code switching. In FIG. 15, two codes with the same purpose are prepared for the DNN inference device: a "normal code" for when there is a resource surplus, and an "overrun code" for when resources are being exceeded, with the code switched according to resource usage. Under this scheme, as indicated by the arrows in FIG. 17, the opportunities to adjust resource usage are limited to before a DNN inference is executed, for example the DNN inference on data 1 or the DNN inference on data 2. Consequently, when the resource usage of the entire system increases suddenly, the resource usage of the DNN inference device cannot be instantly reallocated and adjusted. As a result, the resource usage of the entire system may be exceeded and the system may stop, or DNN inference may be stopped as a safety measure.

 In contrast, as described above, the DNN inference device 70 according to the embodiment of the present disclosure exploits the facts that DNN computation is a stack of per-element calculations (the calculations of the individual DNN partial inference devices 31) and that the output accuracy of the DNN as a whole does not degrade easily even if the calculation accuracy is changed per element (see Reference 1 above). In the DNN inference device 70 according to the embodiment of the present disclosure, even if it becomes necessary to adjust resource usage at the DNN partial inference device 31m+1 as shown in FIG. 16, resource usage can be reallocated and adjusted sequentially, because opportunities to adjust resource usage exist for each element, as indicated by the arrows in FIG. 18.

 FIG. 19 is a diagram showing an example of information processing with an existing technique. The purpose of adjusting the resource usage of processing based on the resource usage of the entire system is to appropriately distribute the limited resources available to the system between DNN processing and other processing. As a function for achieving this purpose, one could, for example, apply the mechanism shown in the prior art document described earlier (Japanese Patent Laid-Open No. 2012-43409). That is, it could be realized by reading "data processing" as "DNN inference processing" in a data processing system that, when the data processing load (resource usage) is predicted to increase, selectively discards time-series input data, delays data processing, or offloads data processing to other systems. With such a mechanism, however, when the resource usage of other processing increases, some data may never undergo DNN inference, or processing results may be obtained late, so the real-time performance and quality of the DNN inference may deteriorate significantly.

 In contrast, the DNN inference device 70 according to the embodiment of the present disclosure exploits the property that lowering the calculation accuracy partway through a DNN computation has little effect on the accuracy of the result (see Reference 1 above), making it possible to construct a data processing system that adjusts resource usage without skipping data or increasing delay, as described with reference to FIG. 6 and elsewhere above.

 Further, the DNN inference device 70 (an example of an inference device) is divided into a plurality of DNN partial inference devices 31 (an example of partial inference devices) at a predetermined granularity based on predetermined conditions. The calculation unit 114 calculates the target resource amount to allocate to the inference processing of the DNN partial inference device 31 to be processed, and the determination unit 115 determines, based on the target resource amount, the calculation method for the inference processing of the DNN partial inference device 31 to be processed, and outputs it to that DNN partial inference device 31. In this way, the monitoring/adjustment module 100 can distribute an appropriate amount of resources to each DNN partial inference device 31 out of the resources that can be allocated in the system.

 Further, the DNN inference device 70 is divided into a plurality of DNN partial inference devices 31 based on the cost required to analyze the calculation method for each resource usage, or on the time interval required to adjust resource usage. This enables the monitoring/adjustment module 100 to manage resources flexibly according to the system administrator's goals.

 Further, the DNN inference device 70 is composed of a plurality of elements including layers, channels, and matrices. The DNN partial inference devices 31 are divided at a granularity based on at least one of layers, channels, and matrices. This allows the monitoring/adjustment module 100 to adjust resource usage at any granularity according to the system administrator's goals.

 Further, the determination unit 115 determines the calculation method based on control information composed of a tuple. This allows the monitoring/adjustment module 100 to specify the calculation method through multiple elements.

 Further, the calculation unit 114 calculates the target resource usage based on the resource surplus in the information processing system 1. This allows the monitoring/adjustment module 100 to calculate the target resource amount simply.

 Further, the calculation unit 114 calculates the target resource usage based on the resource surplus and a predetermined margin. This allows the monitoring/adjustment module 100 to increase the availability of the system.
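 A short sketch of this surplus-and-margin calculation follows; the margin value is an illustrative assumption.

```python
# Target usage from surplus and margin (margin value is an illustrative assumption).
def target_resource_usage(total_capacity: float, current_usage: float,
                          margin: float = 0.05) -> float:
    surplus = max(0.0, total_capacity - current_usage)
    return max(0.0, surplus - margin * total_capacity)
```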

 Further, the monitoring/adjustment module 100 includes the correspondence information storage unit 112 (an example of a storage unit), which stores correspondence information between resource usages and the calculation methods pre-analyzed for each resource usage. The determination unit 115 determines the calculation method based on the correspondence information. This allows the monitoring/adjustment module 100 to adjust the resource amount for each element of the DNN quickly while maintaining the accuracy of the DNN processing in the system.

 Further, the correspondence information storage unit 112 stores correspondence information obtained as a result of machine learning. This can improve the accuracy of the resource adjustment.

 Further, the determination unit 115 acquires, based on the correspondence information, a plurality of calculation methods associated with resource usages close to the target resource usage, and determines the calculation method based on the acquired plurality of calculation methods. This makes it possible to balance the speed and accuracy of the resource amount adjustment.
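 This nearest-neighbor style lookup over the correspondence information could be sketched as follows; the table layout, the number of candidates k, and the tie-breaking rule are assumptions for illustration.

```python
# Minimal sketch: pick a calculation method from entries whose resource usage
# is closest to the target (table layout is an illustrative assumption).
def decide_method(table: list[tuple[float, object]], rdnn: float, k: int = 3):
    """table: list of (resource_usage, calculation_method) from pre-analysis."""
    # Take the k entries nearest to the target usage Rdnn...
    nearest = sorted(table, key=lambda e: abs(e[0] - rdnn))[:k]
    # ...then keep the highest-usage method that still fits within the target.
    fitting = [e for e in nearest if e[0] <= rdnn]
    chosen = max(fitting, key=lambda e: e[0],
                 default=min(nearest, key=lambda e: e[0]))
    return chosen[1]
```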

 Further, the monitoring/adjustment module 100 includes the notification unit 116, which notifies the information processing system 1 that the target resource usage has fallen to or below a predetermined threshold. This can improve the safety of the operation of the information processing system 1.

 Further, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines a calculation method; the calculation unit 114 calculates the target resource usage of the DNN partial inference device 31 to be processed next each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation method of the DNN partial inference device 31 to be processed next each time the calculation unit 114 calculates the target resource usage. This realizes resource amount adjustment of the DNN partial inference devices 31 that follows the state of the system as closely as possible.

 Alternatively, the acquisition unit 113 acquires the resource usage each time the determination unit 115 determines a calculation method; the calculation unit 114 calculates the target resource usages of all DNN partial inference devices 31 to be processed thereafter each time the acquisition unit 113 acquires the resource usage; and the determination unit 115 determines the calculation methods of all DNN partial inference devices 31 to be processed thereafter each time the calculation unit 114 calculates the target resource usages. This can reduce the overhead associated with resource switching in the system.

 Although the embodiments of the present disclosure have been described above, the technical scope of the present disclosure is not limited to the embodiments as described, and various modifications are possible without departing from the gist of the present disclosure. Components spanning different embodiments and modifications may also be combined as appropriate.

 Further, the effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology of the present disclosure may exhibit other effects that are apparent to those skilled in the art from the description of this specification, in addition to or instead of the above effects.

 The technology of the present disclosure can also take the following configurations, which belong to the technical scope of the present disclosure.
(1)
An information processing device applied to an information processing system that uses inference results from an inference device using a neural network, the information processing device comprising:
an acquisition unit that acquires the total resource usage of the information processing system;
a calculation unit that calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of inference processing by the inference device; and
a determination unit that determines a calculation method corresponding to the target resource usage.
(2)
The information processing device according to (1), wherein
the inference device is divided into a plurality of partial inference devices at a predetermined granularity based on predetermined conditions,
the calculation unit calculates the target resource usage to allocate to the inference processing of the partial inference device that next performs the calculation of the inference processing, and
the determination unit determines, based on the target resource usage, the calculation method in the inference processing of the partial inference device that next performs the calculation of the inference processing.
(3)
The information processing device according to (2), wherein the inference device is divided into the plurality of partial inference devices based on the cost required to analyze the calculation method for each resource usage, or on the time interval required to adjust resource usage.
(4)
The information processing device according to (3), wherein
the inference device is composed of a plurality of elements including layers, channels, and matrices, and
the partial inference devices are divided at a granularity based on at least one of layers, channels, and matrices.
(5)
The information processing device according to (1), wherein the determination unit determines the calculation method based on control information composed of a tuple.
(6)
The information processing device according to any one of (1) to (5), wherein the calculation unit calculates the target resource usage based on the resource surplus in the information processing system.
(7)
The information processing device according to (6), wherein the calculation unit calculates the target resource usage based on the resource surplus and a predetermined margin.
(8)
The information processing device according to any one of (1) to (7), further comprising a storage unit that stores correspondence information between resource usages and the calculation methods pre-analyzed for each resource usage, wherein the determination unit determines the calculation method based on the correspondence information.
(9)
The information processing device according to (8), wherein the storage unit stores the correspondence information obtained as a result of machine learning.
(10)
The information processing device according to (8) or (9), wherein the determination unit acquires, based on the correspondence information, a plurality of the calculation methods associated with resource usages close to the target resource usage, and determines the calculation method based on the acquired plurality of calculation methods.
(11)
The information processing device according to any one of (1) to (10), further comprising a notification unit that notifies the information processing system that the target resource usage has fallen to or below a predetermined threshold.
(12)
The information processing device according to (2), wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usage of the partial inference device to be processed next each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation method of the partial inference device to be processed next each time the calculation unit calculates the target resource usage.
(13)
The information processing device according to (2), wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usages of all partial inference devices to be processed thereafter each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation methods of all partial inference devices to be processed thereafter each time the calculation unit calculates the target resource usages.
(14)
An information processing method in which a processor of an information processing device applied to an information processing system that uses inference results from an inference device using a neural network:
acquires the total resource usage of the information processing system;
calculates, based on the resource usage, a target resource usage to allocate to the inference processing of the inference device; and
determines, based on the target resource usage, a calculation method in the inference processing of the inference device.

1 Information processing system
11 Processor
12 Main storage device
13 Auxiliary storage device
14 Peripheral circuit
15 Input device
16 Output device
17 Peripheral device
18 Communication device
20 Internal bus
30 DNN model
31 DNN partial inference device
50 System module
70 DNN inference device
90 User notification module
91 Operation device
100 Monitoring/adjustment module
111 Resource usage information storage unit
112 Correspondence information storage unit
113 Acquisition unit
114 Calculation unit
115 Determination unit
116 Notification unit

Claims (14)

1. An information processing device applied to an information processing system that uses inference results from an inference device using a neural network, the information processing device comprising:
an acquisition unit that acquires the total resource usage of the information processing system;
a calculation unit that calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of inference processing by the inference device; and
a determination unit that determines a calculation method corresponding to the target resource usage.

2. The information processing device according to claim 1, wherein
the inference device is divided into a plurality of partial inference devices at a predetermined granularity based on predetermined conditions,
the calculation unit calculates the target resource usage to allocate to the inference processing of the partial inference device that next performs the calculation of the inference processing, and
the determination unit determines, based on the target resource usage, the calculation method in the inference processing of the partial inference device that next performs the calculation of the inference processing.

3. The information processing device according to claim 2, wherein the inference device is divided into the plurality of partial inference devices based on the cost required to analyze the calculation method for each resource usage, or on the time interval required to adjust the resource usage.

4. The information processing device according to claim 3, wherein
the inference device is composed of a plurality of elements including layers, channels, and matrices, and
the partial inference devices are divided at a granularity based on at least one of layers, channels, and matrices.

5. The information processing device according to claim 1, wherein the determination unit determines the calculation method based on control information composed of a tuple.

6. The information processing device according to claim 1, wherein the calculation unit calculates the target resource usage based on the resource surplus in the information processing system.

7. The information processing device according to claim 6, wherein the calculation unit calculates the target resource usage based on the resource surplus and a predetermined margin.

8. The information processing device according to claim 1, further comprising a storage unit that stores correspondence information between resource usages and the calculation methods pre-analyzed for each resource usage, wherein the determination unit determines the calculation method based on the correspondence information.

9. The information processing device according to claim 8, wherein the storage unit stores the correspondence information obtained as a result of machine learning.

10. The information processing device according to claim 8, wherein the determination unit acquires, based on the correspondence information, a plurality of the calculation methods associated with resource usages close to the target resource usage, and determines the calculation method based on the acquired plurality of calculation methods.

11. The information processing device according to claim 1, further comprising a notification unit that notifies the information processing system that the target resource usage has fallen to or below a predetermined threshold.

12. The information processing device according to claim 2, wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usage of the partial inference device that next performs the calculation of the inference processing each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation method of the partial inference device that next performs the calculation of the inference processing each time the calculation unit calculates the target resource usage.

13. The information processing device according to claim 2, wherein
the acquisition unit acquires the resource usage each time the determination unit determines the calculation method,
the calculation unit calculates the target resource usages of all partial inference devices that thereafter perform the calculation of the inference processing each time the acquisition unit acquires the resource usage, and
the determination unit determines the calculation methods of all partial inference devices that thereafter perform the calculation of the inference processing each time the calculation unit calculates the target resource usages.

14. An information processing method in which a processor of an information processing device applied to an information processing system that uses inference results from an inference device using a neural network:
acquires the total resource usage of the information processing system;
calculates, based on the resource usage, a target resource usage to allocate to at least part of the calculation of inference processing by the inference device; and
determines a calculation method corresponding to the target resource usage.