WO2025099876A1

WO2025099876A1 - Information processing device, information processing method, and information processing program

Info

Publication number: WO2025099876A1
Application number: PCT/JP2023/040280
Authority: WO
Inventors: 規昭井上; 淑実大久保; 旭史; 玲那星野; 啓坂本; 淑美一柳
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: NTT Inc
Priority date: 2023-11-08
Filing date: 2023-11-08
Publication date: 2025-05-15
Anticipated expiration: 2026-05-08

Abstract

This information processing device performs inference (e.g., object detection) on consecutive frames while decimating them at a prescribed interval. When inference on a frame detects an event, the device performs inference on each of the frames that follow the frame in which the event was detected. When the event is no longer detected in those frames, the device returns to inference on frames decimated at the prescribed interval. The device also performs inference going back to frames that precede the frame in which the event was detected.

Description

Information processing device, information processing method, and information processing program

The present invention relates to an information processing device, an information processing method, and an information processing program for performing inference on video frames.

With the development of deep learning and the spread of IoT (Internet of Things) devices, applications have emerged that detect people in video captured by surveillance cameras installed around a city and detect vehicles in video captured by in-vehicle cameras.

To process such video in real time, it is desirable to perform the processing on the IoT device itself. However, IoT devices are severely resource-constrained, which makes it difficult to deploy deep learning models capable of highly accurate object detection and similar tasks.

In response, a technique has been proposed in which IoT devices and a cloud computer cooperate to perform object detection. However, with this technique the inference cost of the whole system grows as the number of IoT devices increases. To address this, a technique has been proposed in which, for example, the IoT device (first layer) first performs inference such as object detection on the video using a lightweight model, and the cloud computer (second layer) performs more advanced inference only when the output of the first-layer inference indicates that it is necessary (a two-layer inference system; see Non-Patent Document 1).

Non-Patent Document 1: Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, Matei Zaharia, "NoScope: Optimizing Neural Network Queries over Video at Scale", [online], Internet <URL: https://www.vldb.org/pvldb/vol10/p1586-kang.pdf>

However, in the two-layer inference system described above, making the first-layer inference model lightweight in order to reduce resource usage on the IoT device (first layer) lowers the first-layer inference accuracy, which in turn lowers the inference accuracy of the whole system.

For example, if the input data actually contains the object to be detected but the first layer infers that "the probability that the input data contains the object to be detected is low," inference is not performed on that data in the second layer. As a result, the object contained in the input data goes undetected.
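The failure mode described above can be illustrated with a minimal Python sketch of a generic two-layer cascade. The functions `light_score` and `heavy_detect` and the 0.5 escalation threshold are hypothetical stand-ins for the first-layer and second-layer models; they are not taken from Non-Patent Document 1.

```python
from typing import Callable, List, Optional

def two_tier_cascade(
    frames: List[str],
    light_score: Callable[[str], float],
    heavy_detect: Callable[[str], bool],
    threshold: float = 0.5,
) -> List[Optional[bool]]:
    """Run the heavy (second-layer) model only on frames the light model flags.

    Returns one entry per frame: the heavy model's verdict, or None when the
    first layer filtered the frame out and the second layer never saw it.
    """
    results: List[Optional[bool]] = []
    for frame in frames:
        if light_score(frame) >= threshold:
            results.append(heavy_detect(frame))  # escalate to the second layer
        else:
            results.append(None)  # filtered out: any object here is missed
    return results

# Toy first-layer scores; "person_faint" really contains a person.
scores = {"empty": 0.1, "person_clear": 0.9, "person_faint": 0.3}
out = two_tier_cascade(list(scores), scores.__getitem__, lambda f: f.startswith("person"))
print(out)  # [None, True, None]
```

The frame `person_faint` contains a person, but its low first-layer score means the second layer never runs on it, so the detection is lost.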

Accordingly, an object of the present invention is to reduce the frequency of inference, without lowering inference accuracy, when performing inference on data such as video in which events are likely to occur consecutively.

To solve the above problem, the present invention comprises: a data input unit that accepts input of consecutive data; an inference unit that performs inference on the consecutive data while thinning it out and, starting from the data in which an event is detected by the inference, performs inference on the consecutive data; and an output processing unit that outputs the inference results for the data.

According to the present invention, when performing inference on data such as video in which events are likely to occur consecutively, the frequency of inference can be reduced without lowering inference accuracy.

FIG. 1 is a diagram for explaining an overview of the information processing device.
FIG. 2 is a diagram for explaining an example of frame inference.
FIG. 3 is a diagram illustrating an example of processing executed by the information processing device.
FIG. 4 is a diagram illustrating a configuration example of the information processing device.
FIG. 5 is a flowchart illustrating an example of processing executed by the information processing device.
FIG. 6 is a diagram illustrating an example of processing in which the information processing device performs inference going back to past frames.
FIG. 7 is a diagram illustrating an example of processing in which the information processing device performs inference on the frames of a video file.
FIG. 8 is a diagram illustrating a configuration example of a computer that executes the information processing program.

Below, a mode for carrying out the present invention (an embodiment) is described with reference to the drawings. The present invention is not limited to this embodiment.

First, an overview of the information processing device of this embodiment is described with reference to FIG. 1. The data processed by the information processing device is, for example, consecutive data output from an IoT device. The following description takes as an example the case where the data is video (consecutive frames) captured by a surveillance camera or the like.

Video has the characteristic that an event occurring in one of its frames is highly likely to occur in the consecutive frames near that frame as well. Focusing on this characteristic, the inventors configured the information processing device to perform inference on frames as follows.

The information processing device first performs inference while thinning out the input data (consecutive frames). The inference here is, for example, detection of objects captured in a frame (see FIG. 2). When the device determines from the inference on a frame that a predetermined event has occurred (determines that there is an event), it performs inference on every frame from the next frame onward.

For example, if the information processing device determines, based on the inference result (reference numeral 202) for frame 201 shown in FIG. 2, that an object of a predetermined attribute (class) is captured in a predetermined region of frame 201, it determines that an event has occurred and performs inference on every frame from the next frame onward. Conversely, if the device determines from the inference on a frame that the predetermined event has not occurred (determines that there is no event), it continues to thin out frames and perform inference as before.

In other words, the information processing device decides whether to thin out frame inference based on whether a predetermined event has occurred in a frame. This allows the device to reduce the frequency of frame inference while preventing missed detections.

A specific example of the processing executed by the information processing device is described with reference to FIG. 3. As shown in FIG. 3, the device performs inference on consecutive frames while thinning them out (for example, performing inference on every fifth frame). If inference detects no event in a frame, inference is skipped until the next inference timing. If inference detects an event in a frame, the device performs inference on consecutive frames. When inference then detects no event, the device again performs inference while thinning out frames.

While performing inference on consecutive frames, the information processing device may resume thinned inference as soon as a single frame yields no event, or only once multiple consecutive frames (e.g., two frames) yield no event.
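The FIG. 3 behavior, including the choice between resuming after one or after several event-free frames, can be sketched as a small scheduler. This is a hedged illustration, not the claimed implementation: frames are modelled as arbitrary objects, `detect_event` is a hypothetical callable standing in for inference plus event judgment, and `interval` (thinning amount) and `miss_limit` (number of undetected frames) are names chosen here. The document does not say exactly where thinned inference resumes; this sketch skips `interval` frames from the last event-free frame.

```python
from typing import Any, Callable, List, Sequence

def thinned_inference(
    frames: Sequence[Any],
    detect_event: Callable[[Any], bool],
    interval: int = 5,
    miss_limit: int = 1,
) -> List[int]:
    """Return the indices of the frames on which inference was run.

    Thinned mode: infer every `interval`-th frame.
    Continuous mode: entered when an event is detected; infer every frame
    until `miss_limit` consecutive frames show no event.
    """
    inferred: List[int] = []
    i = 0
    continuous = False
    misses = 0
    while i < len(frames):
        inferred.append(i)
        if detect_event(frames[i]):
            continuous, misses = True, 0
        elif continuous:
            misses += 1
            if misses >= miss_limit:
                continuous, misses = False, 0  # back to thinned mode
        # Next frame: the very next one in continuous mode, else skip ahead.
        i += 1 if continuous else interval
    return inferred

# An event spans frames 5-7 of a 20-frame clip.
frames = [False] * 5 + [True] * 3 + [False] * 12
print(thinned_inference(frames, bool))                # [0, 5, 6, 7, 8, 13, 18]
print(thinned_inference(frames, bool, miss_limit=2))  # [0, 5, 6, 7, 8, 9, 14, 19]
```

Raising `miss_limit` from 1 to 2 costs one extra inference here (frame 9) in exchange for tolerating a single event-free frame inside an event.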

[Configuration example]
Next, a configuration example of the information processing device 10 is described with reference to FIG. 4. The information processing device 10 includes, for example, an input/output unit 11, a communication unit 12, a storage unit 13, and a control unit 14.

The input/output unit 11 is an interface that handles input and output of various data. For example, the input/output unit 11 accepts input of various settings to the information processing device 10.

The communication unit 12 is a communication interface for inputting and outputting various data via a network. For example, the communication unit 12 receives, via the network, data (consecutive frames, etc.) output from an IoT device.

The storage unit 13 stores data, programs, and the like that the control unit 14 refers to when executing various processes. The storage unit 13 is implemented by a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or by a storage device such as a hard disk or an optical disk.

The storage unit 13 stores various information, such as the parameters of the deep learning model that the inference unit 142 (described later) uses to perform inference on data. For example, the storage unit 13 stores setting information such as how many frames the inference unit 142 skips when performing inference on thinned frames (thinning amount) and how many consecutive frames without a detected event are required before it returns to thinned inference (number of undetected frames).

The storage unit 13 also stores definition information for the events to be detected. The event definition information defines, for example, the number, detection size, and confidence of target objects detected in a frame that are required to determine that an event has occurred. The above setting information, event definition information, and the like can be changed as appropriate by a user of the information processing device 10 or others.
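An event definition of the kind stored in the storage unit 13 might look like the following sketch. The class and field names, the use of BBOX area as the "detection size", and the default thresholds are assumptions for illustration; the document states only that count, size, and confidence can form part of the definition.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Detection:
    """One detector output: class, BBOX size and confidence."""
    label: str
    width: float       # BBOX width in pixels
    height: float      # BBOX height in pixels
    confidence: float  # in [0, 1]

@dataclass
class EventDefinition:
    """Hypothetical event rule: enough large, confident objects of one class."""
    target_label: str = "person"
    min_count: int = 1
    min_area: float = 0.0        # "detection size", taken here as BBOX area
    min_confidence: float = 0.5

    def is_event(self, detections: List[Detection]) -> bool:
        hits = [
            d for d in detections
            if d.label == self.target_label
            and d.width * d.height >= self.min_area
            and d.confidence >= self.min_confidence
        ]
        return len(hits) >= self.min_count

rule = EventDefinition(min_area=400.0, min_confidence=0.6)
print(rule.is_event([Detection("person", 30, 40, 0.8), Detection("car", 100, 50, 0.9)]))  # True
print(rule.is_event([Detection("person", 10, 10, 0.9)]))  # False: BBOX too small
```

Keeping the rule as plain data makes it easy for a user to change thresholds without touching the scheduling logic, in line with the statement that the definition information is user-adjustable.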

The control unit 14 controls the information processing device 10 as a whole. The functions of the control unit 14 are implemented, for example, by a CPU (Central Processing Unit) executing a program stored in the storage unit 13.

The control unit 14 includes, for example, a data input unit 141, an inference unit 142, and an output processing unit 143. The data input unit 141 accepts input of consecutive data. For example, the data input unit 141 accepts, via the communication unit 12, input of data (consecutive frames) captured by a surveillance camera.

The inference unit 142 performs inference on the consecutive data received by the data input unit 141 while thinning it out. When inference on the data detects an event, the inference unit 142 performs inference on the consecutive data starting from the data in which the event was detected.

For example, the inference unit 142 performs inference on the consecutive frames received by the data input unit 141, thinning them out at a predetermined interval. When inference on a frame detects an event indicated in the event definition information, the inference unit 142 performs inference on every subsequent frame. When the inference unit 142 determines from the inference on a frame that the event is no longer detected, it stops inference on consecutive frames and returns to inference on thinned frames.

For example, after the inference unit 142 uses the inference result of the previous frame to make an event judgment (event / no event) for that frame, it attaches the judgment result to the next frame as metadata (an inference-necessity flag). If the inference-necessity flag of a frame is "required", the inference unit 142 performs inference on that frame; if the flag is "not required", it does not.
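The inference-necessity flag can be sketched as per-frame metadata in which the event judgment for one frame becomes the flag of the next. The dictionary keys `inference_required` and `event`, the hypothetical `detect_event` callable, and the rule that a periodic thinned check also sets the flag are assumptions made for this illustration.

```python
from typing import Any, Callable, Dict, List

def run_with_flags(
    frames: List[Dict[str, Any]],
    detect_event: Callable[[Dict[str, Any]], bool],
    interval: int = 5,
) -> List[Dict[str, Any]]:
    """Attach an 'inference_required' flag to each frame and act on it."""
    next_required = True  # the first frame is always inferred
    since_last = 0        # frames elapsed since the last inference
    for frame in frames:
        # Assumption: a periodic (thinned) check also marks a frame as required.
        required = next_required or since_last >= interval
        frame["inference_required"] = required
        if required:
            event = bool(detect_event(frame))
            frame["event"] = event
            next_required = event  # this frame's judgment is the next frame's flag
            since_last = 1
        else:
            since_last += 1
    return frames

frames = [{"id": k, "person": k in (5, 6)} for k in range(12)]
out = run_with_flags(frames, lambda f: f["person"])
print([f["id"] for f in out if f["inference_required"]])  # [0, 5, 6, 7]
```

Frames 6 and 7 are inferred only because the frame before each carried an event; frame 7 shows no event, so frames 8 to 11 fall back to the thinned schedule.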

The frame inference performed by the inference unit 142 includes, for example, identifying the BBOX (bounding box; position information) of each object captured in the frame, identifying the class of the object in the BBOX (what the object is), and calculating the confidence of the BBOX. Frame inference uses, for example, a deep learning model for detecting objects in images.

The output processing unit 143 outputs the frame inference results produced by the inference unit 142. For example, the output processing unit 143 attaches meta-information indicating the inference result to each piece of data (frame) output from the inference unit 142 and outputs it.

[Example of processing procedure]
Next, an example of the processing procedure executed by the information processing device 10 is described with reference to FIG. 5. For example, when the data input unit 141 of the information processing device 10 accepts input of consecutive data (S1), the inference unit 142 performs inference while thinning out the data (S2). When the inference unit 142 determines from the inference result that there is an event (Yes in S3), it performs inference on each item of data from the next one onward (S4) and then returns to S3. When the inference unit 142 determines from the inference result that there is no event (No in S3), it returns to S2 and performs inference while thinning out the data from the next item onward.

By executing the above processing, the information processing device 10 can reduce the frequency of inference on the target data (consecutive data) while preventing missed detections.

When the information processing device 10 determines from frame inference that there is an event, the event may also have occurred in past frames for which inference was skipped. Therefore, when the information processing device 10 detects an event through inference on the data, it may also perform inference retroactively on the skipped past data (frames).

In this case, as shown in FIG. 6, the inference unit 142 performs inference going back to past frames, starting from the frame in which the event was detected. For example, the inference unit 142 performs inference on the past frames on which inference has not yet been performed, in order starting from the frame immediately before the frame in which the event was detected.
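Retroactive inference over the skipped frames, nearest frame first, can be sketched as follows. `detected_at` and `last_inferred_before` are hypothetical parameter names for the index of the frame where the event was detected and the index of the most recent frame inferred before it. `detect_event` again stands in for inference plus event judgment.

```python
from typing import Any, Callable, Dict, Sequence

def backfill(
    frames: Sequence[Any],
    detected_at: int,
    last_inferred_before: int,
    detect_event: Callable[[Any], bool],
) -> Dict[int, bool]:
    """Infer the skipped frames between two inferred frames, newest first."""
    results: Dict[int, bool] = {}
    # Walk backwards from the frame just before the detection.
    for k in range(detected_at - 1, last_inferred_before, -1):
        results[k] = bool(detect_event(frames[k]))
    return results

frames = [False] * 7 + [True] * 5
# With a thinning interval of 5, frames 0, 5 and 10 were inferred; 10 had the event.
print(backfill(frames, 10, 5, bool))  # {9: True, 8: True, 7: True, 6: False}
```

The backfill recovers the event's start at frame 7, which the thinned schedule alone would have reported as frame 10.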

In this way, the information processing device 10 can detect events in the processed frames without omission.

When performing inference going back to past frames as described above, the inference unit 142 may batch-process multiple past frames on which inference has not yet been performed (e.g., four frames) at once. This allows the inference unit 142 to perform inference on past frames efficiently.
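Batching the backfill, as in the four-frame example above, amounts to splitting the skipped indices into fixed-size chunks and passing each chunk to one batched model call. `detect_batch` is a hypothetical function that takes a list of frames and returns one event flag per frame. The `fake_batch` stub below only records batch sizes for illustration.

```python
from typing import Any, Callable, Dict, List, Sequence

def backfill_in_batches(
    frames: Sequence[Any],
    detected_at: int,
    last_inferred_before: int,
    detect_batch: Callable[[List[Any]], List[bool]],
    batch_size: int = 4,
) -> Dict[int, bool]:
    # Skipped frames, nearest to the detection first.
    pending = list(range(detected_at - 1, last_inferred_before, -1))
    results: Dict[int, bool] = {}
    for start in range(0, len(pending), batch_size):
        chunk = pending[start:start + batch_size]
        flags = detect_batch([frames[k] for k in chunk])  # one model call per chunk
        results.update(zip(chunk, flags))
    return results

calls: List[int] = []

def fake_batch(batch: List[Any]) -> List[bool]:
    calls.append(len(batch))             # record the batch size of each call
    return [x % 2 == 0 for x in batch]   # toy "event" rule: even frame values

res = backfill_in_batches(list(range(20)), 10, 0, fake_batch)
print(calls)  # [4, 4, 1]
```

Nine skipped frames cost three model calls instead of nine; the gain on a real accelerator depends on how well the detector batches.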

The inference unit 142 may perform (1) inference on the frames after the frame in which the event was detected (the future direction) and (2) inference on the frames before it (the past direction) in either order: (1) then (2), or (2) then (1).

[Example of application to video files]
The information processing device 10 may perform the above processing on frames output sequentially from an IoT device, or on a set of frames covering a predetermined period (e.g., a video file). An example in which the information processing device 10 processes a video file is described with reference to FIG. 7.

When the information processing device 10 processes a video file, it can check frames starting from any point in time within the video. Thus, for example, as shown in FIG. 7, the inference unit 142 performs periodic checks toward the future frames in the video file (event detection by performing inference on frames thinned out at a predetermined interval from the first frame) ((1)). When an event is detected, it performs event checks continuously from the frame in which the event was detected toward the future frames ((2)).

After finishing the checks up to the leading frame on the future side, the inference unit 142 then performs event checks continuously toward the past frames ((3)).

In this way, the inference unit 142 can, for example, detect events in a long video file starting from the most recent events.

In (2) and (3) above, the inference unit 142 may determine that the event is no longer detected based on the inference results of multiple frames ((4) non-detection determination based on the results of multiple frames). For example, if the inference unit 142 detects no event in two consecutive frames, it may judge that there is no event and stop the continuous event checks.
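The FIG. 7 flow for a video file can be sketched as below, under one possible reading of steps (1) to (4): periodic checks from the first frame, a forward sweep after a hit, a backward sweep from the frame before the hit, and both sweeps ending after `miss_limit` consecutive misses. Where periodic checking resumes after an event, and stopping the backward sweep at an already-checked frame, are assumptions; `detect_event` is again a hypothetical stand-in for inference plus event judgment.

```python
from typing import Any, Callable, List, Sequence, Set, Tuple

def scan_video_file(
    frames: Sequence[Any],
    detect_event: Callable[[Any], bool],
    interval: int = 5,
    miss_limit: int = 2,
) -> List[range]:
    """Return the frame ranges judged to contain an event."""
    n = len(frames)
    checked: Set[int] = set()
    events: List[range] = []

    def sweep(start: int, step: int) -> Tuple[int, int]:
        """Check frames from `start` in direction `step` (+1 or -1).

        Returns (last frame with an event, first index left unchecked).
        """
        last_hit, misses, k = start - step, 0, start
        # (4) stop after `miss_limit` consecutive misses, at the file edges,
        # or on reaching a frame that has already been checked.
        while 0 <= k < n and k not in checked and misses < miss_limit:
            checked.add(k)
            if detect_event(frames[k]):
                last_hit, misses = k, 0
            else:
                misses += 1
            k += step
        return last_hit, k

    i = 0
    while i < n:
        checked.add(i)
        if detect_event(frames[i]):       # (1) periodic check found an event
            end, stop = sweep(i + 1, +1)   # (2) continuous check toward the future
            begin, _ = sweep(i - 1, -1)    # (3) continuous check toward the past
            events.append(range(begin, end + 1))
            i = stop - 1 + interval        # resume periodic checks
        else:
            i += interval
    return events

frames = [7 <= k <= 13 for k in range(30)]
print(scan_video_file(frames, bool))  # [range(7, 14)]
```

The periodic check first hits frame 10; the forward sweep extends the event to frame 13 and the backward sweep to frame 7, recovering the full span.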

The number of frames the inference unit 142 skips between inferences (thinning amount) and the number of consecutive event-free frames after which it returns to thinned inference (number of undetected frames) may be fixed values or may change dynamically.

For example, if the input data is video captured by an in-vehicle camera, the inference unit 142 sequentially acquires the speed of the passenger car on which the camera is mounted. When the speed of the car increases, the inference unit 142 may reduce the thinning amount according to the speed.

In this way, even when the higher speed of the car makes consecutive frames of the video differ more than before, the inference unit 142 performs inference on frames at correspondingly shorter intervals, which reduces missed detections.

Conversely, when the speed of the car decreases, the inference unit 142 increases the frame thinning amount according to the speed. In this way, when the lower speed makes consecutive frames of the video differ less than before, the inference unit 142 can perform frame inference less often. As a result, the inference unit 142 can make effective use of limited computational resources for frame inference.
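Speed-dependent thinning could be sketched as an interval inversely proportional to vehicle speed, clamped to a range. The inverse-proportional rule, the function name `interval_for_speed`, the reference speed of 30 km/h, and the bounds are all illustrative assumptions; the document says only that the thinning amount decreases as speed rises and increases as speed falls.

```python
def interval_for_speed(
    speed_kmh: float,
    base_interval: int = 10,     # assumed interval at the reference speed
    ref_speed_kmh: float = 30.0,
    min_interval: int = 1,
    max_interval: int = 30,
) -> int:
    """Shrink the thinning interval as speed rises; grow it as speed falls."""
    if speed_kmh <= 0:
        return max_interval      # stationary: inspect as rarely as allowed
    raw = base_interval * ref_speed_kmh / speed_kmh
    return max(min_interval, min(max_interval, round(raw)))

print([interval_for_speed(s) for s in (15, 30, 60)])  # [20, 10, 5]
```

The clamp keeps the interval from reaching zero at high speed and from growing without bound when the car is almost stopped.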

The thinning amount and the number of undetected frames may also be changed dynamically based on, for example, the confidence of the BBOXes included in the frame inference result.

For example, if frame inference by the inference unit 142 yields a BBOX whose confidence is just below the event determination threshold (e.g., the frame seems to show something like a person, but the confidence is not very high), the frame thinning amount may be reduced or the number of undetected frames may be increased.
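A confidence-dependent adjustment of both parameters might look like the sketch below. The 0.1 "just below the threshold" margin, halving the interval, and adding one to the miss limit are assumptions chosen for illustration; the document does not specify the amounts.

```python
from typing import Tuple

def adapt_parameters(
    max_confidence: float,
    threshold: float = 0.5,   # event determination threshold (assumed)
    margin: float = 0.1,      # width of the "near miss" band (assumed)
    interval: int = 5,
    miss_limit: int = 1,
) -> Tuple[int, int]:
    """Return (interval, miss_limit) to use for the following frames.

    If the best BBOX confidence fell just short of the threshold, check
    more often and tolerate more misses before returning to thinned mode.
    """
    near_miss = threshold - margin <= max_confidence < threshold
    if near_miss:
        return max(1, interval // 2), miss_limit + 1
    return interval, miss_limit

print(adapt_parameters(0.45))  # (2, 2): borderline, so look more closely
print(adapt_parameters(0.9))   # (5, 1): clear event, defaults unchanged
```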

With the information processing device 10 described above, for example, frame inference accuracy does not degrade because of a low-accuracy first-layer model, as it does in the two-layer inference system described above. Furthermore, because the information processing device 10 thins out frames for inference while no event is occurring, it can reduce the frequency of inference.

In addition, the information processing device 10 does not need a model equivalent to the first-layer model of a two-layer inference system, so there is no need to implement or tune such a model. Furthermore, when events occur frequently across frames, a two-layer inference system uses first-layer resources plus second-layer resources, whereas the information processing device 10 needs only resources equivalent to those of the second layer. Therefore, the information processing device 10 can reduce the resources required for frame inference compared with conventional methods.

[System configuration, etc.]
Each component of each illustrated unit is functionally conceptual and does not necessarily need to be physically configured as illustrated. That is, the specific form of distribution and integration of each device is not limited to the illustrated one; all or part of it can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or any part of each processing function performed by each device can be implemented by a CPU and a program executed by the CPU, or as hardware using wired logic.

Among the processes described in the above embodiment, all or part of the processes described as performed automatically can also be performed manually, and all or part of the processes described as performed manually can also be performed automatically using known methods. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings can be changed as desired unless otherwise specified.

[Program]
The information processing device 10 can be implemented by installing a program (the information processing program) on a desired computer as packaged software or online software. For example, by causing an information processing apparatus to execute the above program, the apparatus can be made to function as the information processing device 10. Information processing apparatuses in this sense include mobile communication terminals such as smartphones, mobile phones, and PHS (Personal Handyphone System) terminals, as well as terminals such as PDAs (Personal Digital Assistants).

FIG. 8 is a diagram showing an example of a computer that executes the information processing program. The computer 1000 has, for example, a memory 1010 and a CPU 1020. The computer 1000 also has a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These components are connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM (Random Access Memory) 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090. The disk drive interface 1040 is connected to a disk drive 1100. A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120. The video adapter 1060 is connected to, for example, a display 1130.

The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the program that defines each process executed by the information processing device 10 is implemented as a program module 1093 in which computer-executable code is written. The program module 1093 is stored, for example, in the hard disk drive 1090. For example, a program module 1093 for executing processing equivalent to the functional configuration of the information processing device 10 is stored in the hard disk drive 1090. The hard disk drive 1090 may be replaced by an SSD (Solid State Drive).

The data used in the processing of the above-described embodiment is stored as program data 1094 in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 then reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as necessary and executes them.

The program module 1093 and the program data 1094 are not limited to being stored in the hard disk drive 1090; they may instead be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program module 1093 and the program data 1094 may be stored in another computer connected via a network (e.g., a LAN (Local Area Network) or a WAN (Wide Area Network)) and read by the CPU 1020 from that computer via the network interface 1070.

REFERENCE SIGNS LIST
10 Information processing device
11 Input/output unit
12 Communication unit
13 Storage unit
14 Control unit
141 Data input unit
142 Inference unit
143 Output processing unit

Claims

1. An information processing device comprising:
a data input unit that accepts input of continuous data;
an inference unit that thins out the continuous data and performs inference on the data, and that, starting from data in which an event is detected by the inference, performs inference on the consecutive data; and
an output processing unit that outputs an inference result based on the inference on the data.

2. The information processing device according to claim 1, wherein the inference unit performs inference on the continuous data until the event is no longer detected.

3. The information processing device according to claim 1, wherein, starting from the data in which the event is detected by the inference, the inference unit performs inference on the consecutive data following that data and on data that precedes that data and has not yet been subjected to the inference.

4. The information processing device according to claim 3, wherein the inference unit performs inference by batch processing on a plurality of pieces of data that precede the data and have not yet been subjected to the inference.

5. The information processing device according to claim 1, wherein the data are frames constituting a moving image.

6. An information processing method executed by an information processing device, the method comprising:
a step of accepting input of continuous data;
a step of thinning out the continuous data and performing inference on the data, and, starting from data in which an event is detected by the inference, performing inference on the consecutive data; and
a step of outputting an inference result based on the inference on the data.

7. An information processing program for causing a computer to execute:
a step of accepting input of continuous data;
a step of thinning out the continuous data and performing inference on the data, and, starting from data in which an event is detected by the inference, performing inference on the consecutive data; and
a step of outputting an inference result based on the inference on the data.
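The control flow recited in claims 1 to 4 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `thinned_inference`, the hypothetical batch-detector callable `detect`, the fixed thinning `interval`, and the choice to back-fill only the frames skipped since the last event-free thinned frame are all assumptions made for the example.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical model interface: takes a batch of frames and returns one
# "event detected?" flag per frame (e.g. "a person is visible").
Detector = Callable[[List[Any]], List[bool]]


def thinned_inference(frames: Sequence[Any], detect: Detector, interval: int) -> Dict[int, bool]:
    """Return {frame_index: event_detected} for every frame that was inferred."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    results: Dict[int, bool] = {}
    prev = -1  # most recent frame inferred in thinned mode with no event
    i = 0
    while i < len(frames):
        results[i] = detect([frames[i]])[0]
        if not results[i]:
            # No event: stay in thinned mode and skip interval - 1 frames.
            prev = i
            i += interval
            continue
        # Event found: back-fill, as one batch, the frames skipped between
        # the last event-free thinned frame and this one (cf. claims 3 and 4).
        skipped = list(range(prev + 1, i))
        if skipped:
            results.update(zip(skipped, detect([frames[k] for k in skipped])))
        # Dense mode: infer every following frame until the event is no
        # longer detected (cf. claim 2).
        j = i + 1
        while j < len(frames):
            results[j] = detect([frames[j]])[0]
            if not results[j]:
                break
            j += 1
        # Resume thinned mode from the frame where the event was lost.
        prev = j
        i = j + interval
    return results
```

For instance, with `interval=4` and an event visible in frames 5 to 9 of a 20-frame clip, the sketch infers frames 0, 4 and 8, back-fills frames 5 to 7 in a single batch, follows the event through frame 9 until frame 10 comes back negative, and then returns to every fourth frame (14, 18). Only 10 of the 20 frames are inferred, yet the onset of the event at frame 5 is still recovered, which is the motivation for looking back at the skipped frames.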