JP7667723B2

JP7667723B2 - Information processing device and information processing method

Info

Publication number: JP7667723B2
Application number: JP2021170384A
Authority: JP
Inventors: 聡一郎野村
Original assignee: Komatsu Ltd
Current assignee: Komatsu Ltd
Priority date: 2021-10-18
Filing date: 2021-10-18
Publication date: 2025-04-23
Anticipated expiration: 2041-10-18
Also published as: JP2023060666A

Description

The present disclosure relates to an information processing device and an information processing method.

Conventionally, as disclosed in, for example, Japanese Unexamined Patent Application Publication No. H7-146897 (Patent Document 1), a system is known that classifies the work performed in a workplace while playing back video of the workplace captured by an infrared video camera.

More specifically, in this system, the contents of a videotape recorded by the infrared video camera are played back on a video deck. A human body recognition unit in the system's controller tracks the red area, displayed on an infrared video monitor, that corresponds to the worker, and inputs the movement of that red area to a condition comparison unit via a mode switching unit. The condition comparison unit reads the conditions stored in a condition storage unit, compares them with the data received from the human body recognition unit, and classifies the work the worker is performing.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H7-146897

The system of Patent Document 1 requires the subject to be captured by an infrared video camera, which limits its versatility.

The present disclosure provides an information processing device and an information processing method capable of determining the type of work being performed in a workplace based on moving image data (a plurality of frame image data) captured by a visible light camera.

According to one aspect of the present disclosure, an information processing device includes an acquisition means for acquiring a plurality of consecutive frame image data obtained by image capture with a visible light camera. The visible light camera is fixed in installation position and orientation, and captures, as its subject, a workplace where a worker is performing work that involves flickering light. The information processing device further includes: a detection means for detecting the area of the worker in first frame image data among the plurality of frame image data; a generation means for generating image data indicating a change in the state of the subject based on the first frame image data and second frame image data, among the plurality of frame image data, that precedes the first frame image data by a predetermined number of frames; an extraction means for extracting, from the generated image data, the image data of the area corresponding to the detected area of the worker; and a determination means for determining the type of work being performed in the workplace based on the extracted image data.

According to another aspect of the present disclosure, an information processing method includes a step of capturing, with a visible light camera, an image of a workplace where a worker is performing work that involves flickering light, as the subject. The visible light camera is fixed in installation position and orientation. The information processing method further includes: a step of acquiring a plurality of consecutive frame image data obtained by image capture with the visible light camera; a step of detecting the area of the worker in first frame image data among the plurality of frame image data; a step of generating image data indicating a change in the state of the subject based on the first frame image data and second frame image data, among the plurality of frame image data, that precedes the first frame image data by a predetermined number of frames; a step of extracting, from the generated image data, the image data of the area corresponding to the detected area of the worker; and a step of determining the type of work being performed in the workplace based on the extracted image data.

According to the present disclosure, the type of work being performed in a workplace can be determined based on a plurality of frame image data captured by a visible light camera.

FIG. 1 is a diagram illustrating the schematic configuration of a determination system.
FIG. 2 is a diagram showing the hardware configuration of an information processing device.
FIG. 3 is a diagram illustrating an overview of the processing executed by the information processing device.
FIG. 4 is a flowchart showing the flow of the determination process.
FIG. 5 is a flowchart showing details of step S1 in FIG. 4.
FIG. 6 is a flowchart showing details of step S2 in FIG. 4.
FIG. 7 is a flowchart illustrating details of step S202 in FIG. 6.
FIG. 8 is a diagram illustrating the processing of FIG. 7 using image data.
FIG. 9 is a flowchart illustrating details of step S206 in FIG. 6.
FIG. 10 is a diagram showing data, including the final determination results, saved in the memory in a highly compatible file format.
FIG. 11 is a flowchart showing a modification of the series of processes in step S202 shown in FIG. 7.

The embodiment is described below with reference to the drawings. In the following description, identical components are given the same reference numerals; their names and functions are also the same, so detailed descriptions of them are not repeated.

First, some of the terms used in this embodiment are explained.

A "trained model" is an "inference program" that incorporates "trained parameters." "Trained parameters" are the parameters (coefficients) obtained as a result of training with a training dataset; they are generated by inputting the training dataset into a training program, which mechanically adjusts them toward a given objective. An "inference program" is a program that, by applying the incorporated trained parameters, outputs a given result for a given input.

A "training program" is a program that executes an algorithm for finding certain rules in a training dataset and generating a model that expresses those rules. Specifically, it is a program that specifies the procedure a computer executes to carry out training with the adopted learning method.

A "training dataset" is secondary processed data generated from raw data, to facilitate analysis by the target learning method, by applying conversion and/or processing such as preprocessing (for example, removal of missing values and outliers), the addition of separate data such as label information (ground truth), or a combination of these. A training dataset is a collection of data (hereinafter also referred to as "training data"). In this embodiment, each piece of training data consists of one piece of image data and its label information.

<A. System Configuration>
FIG. 1 is a diagram illustrating the schematic configuration of the determination system according to the present embodiment.

As shown in FIG. 1, the determination system 1000 includes a camera 1 and an information processing device 2.

The camera 1 is a visible light camera whose installation position and orientation are fixed. The camera 1 captures, as its subject, a workplace where a worker 900 is performing work involving flickering light. Examples of such work include welding, grinding, and gouging. Welding here includes spot welding.

Moving image data Da captured by the camera 1 is sent to the information processing device 2. The moving image data Da consists of a plurality of consecutive frame image data #1, #2, #3, ....

The information processing device 2 is used by a user 950 and is typically a personal computer. The information processing device 2 acquires the moving image data Da captured by the camera 1 from the camera 1. Alternatively, the information processing device 2 may acquire the moving image data Da via another device such as a server, or via a storage medium such as an IC card or a USB memory.

FIG. 2 is a diagram showing the hardware configuration of the information processing device 2.

As shown in FIG. 2, the information processing device 2 includes a processor 201, a memory 202, a display 203, an input device 204, a communication interface 205, a card reader 206, and a USB port 207. The memory 202 includes a ROM (Read Only Memory) 221, a RAM (Random Access Memory) 222, an SSD (Solid State Drive) 223, and an HDD (Hard Disk Drive) 224.

The memory 202 stores an operating system and various programs, including trained models and programs for executing the various processes described below. The processor 201 executes the operating system and these programs.

The input device 204 accepts operation input from the user 950 and is typically a keyboard and a mouse.

The processor 201 executes various processes based on operations accepted by the input device 204, and displays various information, including the results of program execution, on the display 203.

The communication interface 205 is an interface for communicating with external devices. The processor 201 acquires the moving image data Da from the camera 1 via the communication interface 205.

The card reader 206 reads data stored on an IC card, and a USB memory can be connected to the USB port 207. The processor 201 can also acquire the moving image data Da via the card reader 206 or the USB port 207.

<B. Determination of Work Type>
FIG. 3 is a diagram illustrating an overview of the processing executed by the information processing device 2.

As shown in FIG. 3, the information processing device 2 includes a moving image data acquisition unit 10, a person area detection unit 20, a generation unit 30, an extraction unit 40, a work type determination unit 50, a storage unit 60, a display control unit 70, and a display unit 80.

The moving image data acquisition unit 10 corresponds to the communication interface 205, the card reader 206, or the USB port 207. The person area detection unit 20, the generation unit 30, the extraction unit 40, the work type determination unit 50, and the display control unit 70 are functional blocks implemented by the processor 201 executing programs and the like stored in the memory 202. The storage unit 60 corresponds to the memory 202, and the display unit 80 corresponds to the display 203.

The moving image data acquisition unit 10 acquires, from the camera 1, the moving image data Da captured by the camera 1; that is, it acquires the consecutive frame image data #1, #2, #3, ... obtained by the camera 1.

In the information processing device 2, the detection process by the person area detection unit 20 and the image generation process by the generation unit 30 are performed separately on the frame image data #1, #2, #3, .... Either process may be performed first, or both may be performed simultaneously.

More specifically, the moving image data Da acquired by the moving image data acquisition unit 10 is temporarily stored in the memory 202 (for example, in the storage unit 60) and is then read out by the person area detection unit 20 and the generation unit 30.

The following mainly describes the person area detection unit 20, the generation unit 30, the extraction unit 40, the work type determination unit 50, and the display control unit 70.

(b1. Person area detection unit 20)
The person area detection unit 20 detects the area of the worker (hereinafter also referred to as the "person area") in each of the frame image data #1, #2, #3, ... (each hereinafter referred to as "frame image data #N", where N is a natural number). In this example, the person area detection unit 20 is implemented by a trained model M20. Because the person area in frame image data #1 is not used in this example, the person area need not be detected in frame image data #1.

The trained model M20 takes frame image data #N as input and outputs information indicating the person area, specifically the coordinates of the person area. More specifically, the trained model M20 extracts a rectangular area as the person area. This area can typically be expressed by the coordinate values of the rectangle's four corners, so the trained model M20 outputs four coordinate values as the coordinates of the person area.

The person area detection unit 20 sends the coordinates of the person area to the extraction unit 40 together with an identifier of frame image data #N (for example, a frame number or a timestamp); that is, it sends the coordinates of the person area in frame image data #N associated with the identifier of frame image data #N. For example, if the moving image data Da contains K pieces of frame image data (1 ≤ N ≤ K), the person area detection unit 20 sends K sets of person area coordinates (4K = 4 × K coordinate values in total) to the extraction unit 40. More specifically, in this example the K sets of coordinates are temporarily stored in the memory 202 (for example, in the storage unit 60) and then read out by the extraction unit 40.
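The detection output described above can be pictured as one rectangle per frame, keyed by the frame identifier. The following Python sketch is a minimal illustration under stated assumptions: the names `PersonBox` and `collect_person_boxes` are hypothetical, the frame number stands in for the identifier, and `detect` is a placeholder for the trained model M20, whose actual interface is not given in the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates


@dataclass(frozen=True)
class PersonBox:
    """Hypothetical container for one person area: the rectangle's four corners."""

    corners: Tuple[Point, Point, Point, Point]

    def bounds(self) -> Tuple[int, int, int, int]:
        """Return (x_min, y_min, x_max, y_max) enclosing the four corners."""
        xs = [x for x, _ in self.corners]
        ys = [y for _, y in self.corners]
        return min(xs), min(ys), max(xs), max(ys)


def collect_person_boxes(
    frames: Sequence[object],
    detect: Callable[[object], PersonBox],
    skip_first: bool = False,
) -> Dict[int, PersonBox]:
    """Run the detector on each frame and key the result by frame number N (1-based).

    `detect` is a placeholder for the trained model M20. Setting `skip_first`
    omits frame #1, whose person area is not used downstream.
    """
    boxes: Dict[int, PersonBox] = {}
    for n, frame in enumerate(frames, start=1):
        if skip_first and n == 1:
            continue
        boxes[n] = detect(frame)
    return boxes
```

With K frames and `skip_first=False`, the result holds K boxes, that is 4K corner points, matching the count given above.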

(b2. Generation unit 30)
The generation unit 30 generates image data indicating a change in the state of the subject based on frame image data #N and frame image data #N-1, which is one frame before frame image data #N. More specifically, the generation unit 30 generates this image data by the frame difference method (also called the "inter-frame difference method"). Because frame image data #0 does not exist, no image data indicating a state change is generated from frame image data #0 and frame image data #1.

More specifically, the generation unit 30 generates masked image data #N-1 (2 ≤ N ≤ K) from the consecutive frame image data #N-1 and #N by the frame difference method. The frame difference method and the mask processing are described later.
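As a rough illustration of the frame difference step, the sketch below computes the per-pixel absolute difference between frame #N-1 and frame #N for 2 ≤ N ≤ K and keys each result by N. It assumes grayscale frames held as nested lists of intensities. The mask processing is described later in the document and is not reproduced here, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

Image = List[List[int]]  # grayscale frame: rows of pixel intensities


def frame_difference(prev: Image, curr: Image) -> Image:
    """Per-pixel absolute difference |curr - prev| of two same-sized frames."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def difference_images(frames: Sequence[Image]) -> Dict[int, Image]:
    """Build difference image #N-1 from frames #N-1 and #N for 2 <= N <= K.

    frames[0] is frame #1, so frame #N sits at index N-1. Each result is
    keyed by N, the identifier of the later frame, giving K-1 images.
    """
    return {
        n: frame_difference(frames[n - 2], frames[n - 1])
        for n in range(2, len(frames) + 1)
    }
```

Static background pixels cancel to zero, so only regions that changed between the two frames, such as a flickering arc or sparks, remain bright.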

The generation unit 30 thus generates a plurality of pieces of masked image data #N-1, each indicating a change in the state of the subject.

The generation unit 30 sends the masked image data #N-1 to the extraction unit 40 together with the identifier of frame image data #N; that is, it sends the masked image data #N-1 associated with the identifier of frame image data #N. For example, if the moving image data Da contains K pieces of frame image data, the generation unit 30 sends K-1 pieces of masked image data to the extraction unit 40. More specifically, in this example the K-1 pieces of masked image data are temporarily stored in the memory 202 (for example, in the storage unit 60) and then read out by the extraction unit 40.

(b3. Extraction unit 40)
The extraction unit 40 extracts, from the image data generated by the generation unit 30 (masked image data #N-1), the image data of the area corresponding to the person area detected by the person area detection unit 20. In other words, the extraction unit 40 crops the image.

More specifically, the extraction unit 40 takes the person area coordinates and the masked image data that carry the same identifier, and extracts from that masked image data the image data of the rectangular area corresponding to the person area.

For example, from masked image data #1, generated by applying the frame difference method to frame image data #1 and #2, the extraction unit 40 extracts the image data of the area corresponding to the person area detected in frame image data #2 (more precisely, the rectangular area defined by the coordinates of the four vertices of the person area). Similarly, from masked image data #2, generated by applying the frame difference method to frame image data #2 and #3, the extraction unit 40 extracts the image data of the area corresponding to the person area detected in frame image data #3. If the moving image data Da contains K pieces of frame image data, the extraction unit 40 performs this extraction K-1 times in total.
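The cropping performed by the extraction unit 40 can be sketched as follows, assuming each person area has been reduced to an axis-aligned box `(x_min, y_min, x_max, y_max)` with inclusive pixel bounds, and that difference images and boxes are both keyed by the frame number N, following the pairing described above. Clipping the box to the image borders is an added safeguard that the text does not specify, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Bounds = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def crop_person_area(image: Image, bounds: Bounds) -> Image:
    """Cut the box out of `image`, clipping it to the image borders."""
    x_min, y_min, x_max, y_max = bounds
    height = len(image)
    width = len(image[0]) if height else 0
    x0, y0 = max(0, x_min), max(0, y_min)
    x1, y1 = min(width - 1, x_max), min(height - 1, y_max)
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


def extract_all(diff_images: Dict[int, Image], person_bounds: Dict[int, Bounds]) -> Dict[int, Image]:
    """Crop the difference image keyed N with the box detected in frame #N (same identifier)."""
    return {
        n: crop_person_area(image, person_bounds[n])
        for n, image in diff_images.items()
        if n in person_bounds
    }
```

Because the difference image keyed N is paired with the box from frame #N, the K-1 difference images yield K-1 crops.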

The extraction unit 40 sends the extracted image data #N-1 to the work type determination unit 50. More specifically, in this example the K-1 pieces of extracted image data are temporarily stored in the memory 202 (for example, in the storage unit 60) and then read out by the work type determination unit 50.

(b4. Work type determination unit 50)
The work type determination unit 50 determines the type of work being performed in the workplace based on the image data extracted by the extraction unit 40; in other words, it classifies the work being performed in the workplace. The work type determination unit 50 includes a welding work determination unit 51, a grinding work determination unit 52, a gouging work determination unit 53, and a final determination unit 54.

Welding is a joining process that joins pieces of metal together; it emits light known as arc light. Grinding is a process that removes metal with a grinding wheel or similar tool; the metal particles ground off emit light. Gouging is a process that cuts, melts, and removes metal; it emits light through arc discharge. Welding, grinding, and gouging are thus all metalworking processes in which light is emitted at the part being worked. However, the state of light emission differs among the three. In this embodiment, the information processing device 2 determines the work type by focusing on these differences in light emission.

The welding work determination unit 51 determines, based on the image data #N-1 extracted by the extraction unit 40, whether the work being performed in the workplace is welding. In this example, the welding work determination unit 51 is implemented by a trained model M51.

The trained model M51 takes one piece of image data #N-1 extracted by the extraction unit 40 as input and outputs a confidence that the work being performed in the workplace is welding. In this example, the confidence is a value from 0 to 1 inclusive; that is, the trained model M51 outputs a normalized confidence (minimum 0, maximum 1). The higher the confidence, the more likely it is that the work being performed is welding.

The trained model M51 outputs the confidence calculated for each piece of extracted image data #N-1 to the final determination unit 54, associated with the identifier of that image data. For example, if the moving image data Da contains K pieces of frame image data, the trained model M51 outputs K-1 confidence values to the final determination unit 54.

The grinding work determination unit 52 determines, based on the image data #N-1 extracted by the extraction unit 40, whether the work being performed in the workplace is grinding. In this example, the grinding work determination unit 52 is implemented by a trained model M52.

The trained model M52 takes one piece of image data #N-1 extracted by the extraction unit 40 as input and outputs a confidence that the work being performed in the workplace is grinding. As with the trained model M51, the confidence in this example is normalized to a value from 0 to 1 inclusive. The higher the confidence, the more likely it is that the work being performed is grinding.

Like the trained model M51, the trained model M52 outputs the confidence calculated for each piece of extracted image data #N-1 to the final determination unit 54, associated with the identifier of that image data. For example, if the moving image data Da contains K pieces of frame image data, the trained model M52 outputs K-1 confidence values to the final determination unit 54.

The gouging work determination unit 53 determines, based on the image data #N-1 extracted by the extraction unit 40, whether the work being performed in the workplace is gouging. In this example, the gouging work determination unit 53 is implemented by a trained model M53.

The trained model M53 takes one piece of image data #N-1 extracted by the extraction unit 40 as input and outputs a confidence that the work being performed in the workplace is gouging. As with the trained models M51 and M52, the confidence in this example is normalized to a value from 0 to 1 inclusive. The higher the confidence, the more likely it is that the work being performed is gouging.

Like the trained models M51 and M52, the trained model M53 outputs the confidence calculated for each piece of extracted image data #N-1 to the final determination unit 54, associated with the identifier of that image data. For example, if the moving image data Da contains K pieces of frame image data, the trained model M53 outputs K-1 confidence values to the final determination unit 54.

The final determination unit 54 determines, based on the determination results of the trained models M51, M52, and M53, whether the work being performed in the workplace is welding, grinding, gouging, or work that cannot be classified. An example of work that cannot be classified is the worker moving around.

More specifically, for each piece of extracted image data #N-1, the final determination unit 54 checks whether any work type has a confidence equal to or greater than a threshold (for example, 0.6). If such a work type exists, the final determination unit 54 determines it to be the work being performed in the workplace.

For example, if, for extracted image data #1, the confidence output by the trained model M51 is 0.7, that output by the trained model M52 is 0.1, and that output by the trained model M53 is 0.05, the final determination unit 54 determines that the work being performed in the workplace is welding. If, for extracted image data #2, the confidences output by the trained models M51, M52, and M53 are 0.5, 0.2, and 0.1, respectively, no confidence reaches the threshold (0.6 in this example), so the final determination unit 54 determines that the work being performed cannot be classified. This determination is made for each piece of extracted image data #N-1.
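The per-image rule above can be written as a small function. This is a sketch rather than the actual implementation: the text does not say what happens when more than one work type reaches the threshold, so this version picks the highest confidence in that case (an assumption), and the label strings and function name are hypothetical.

```python
from typing import Dict

UNCLASSIFIABLE = "unclassifiable"  # hypothetical label for work that cannot be classified


def classify_image(confidences: Dict[str, float], threshold: float = 0.6) -> str:
    """Per-image decision from the three models' confidences (each in [0, 1]).

    Returns the work type whose confidence is >= threshold. If several pass,
    the highest confidence wins (an assumption; the text does not cover this).
    Returns UNCLASSIFIABLE when none reaches the threshold.
    """
    passing = {work: c for work, c in confidences.items() if c >= threshold}
    if not passing:
        return UNCLASSIFIABLE
    return max(passing, key=passing.get)
```

With the confidences from the two examples above, the function returns `"welding"` for image data #1 and `"unclassifiable"` for image data #2.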

Furthermore, the final determination unit 54 performs a final determination at a predetermined cycle (for example, every second). The cycle can be set as appropriate based on the frame rate of the video data Da. Suppose, for example, that the frame rate of the video data Da is 60 fps (frames per second). In this case, each second of the video data Da contains 60 pieces of frame image data.

The final determination unit 54 therefore obtains 60 determination results for each second of the video data Da. Among these 60 results, it takes the most frequent work type as the work being performed in the workplace during that one-second period (this determination is hereinafter also referred to as the "final determination").

For example, if the 60 determination results for a given one-second period consist of 40 welding results, 4 grinding results, 0 gouging results, and 16 unclassifiable results, the final determination unit 54 determines the work type for that period to be welding (final determination).
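The one-second majority vote can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the label strings and function names are hypothetical, the window length equals the frame rate (60 frames at 60 fps), and ties are broken by first occurrence because the text does not say how ties are handled.

```python
from collections import Counter
from typing import List


def final_determination(frame_labels: List[str]) -> str:
    """Majority vote: the most frequent per-frame label wins.

    Ties are broken by first occurrence (Counter.most_common behaviour); the
    source does not specify tie handling.
    """
    return Counter(frame_labels).most_common(1)[0][0]


def per_second_results(frame_labels: List[str], fps: int = 60) -> List[str]:
    """Split per-frame labels into windows of `fps` frames (1 s) and vote in each."""
    return [final_determination(frame_labels[i:i + fps])
            for i in range(0, len(frame_labels), fps)]


one_second = ["welding"] * 40 + ["grinding"] * 4 + ["unclassifiable"] * 16
print(final_determination(one_second))  # welding
```

A trailing partial window (fewer than `fps` frames) is still voted on here; whether the device does the same is not stated.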

The final determination unit 54 stores the result of the final determination in the storage unit 60, in association with the video data Da. More specifically, it synchronizes the result of the final determination with the video data Da by associating the result with each piece of frame image data on which that final determination was based.

The association may be made using identifiers of the frame image data, or based on the elapsed time from the start of playback of the video data Da.
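As a sketch of the elapsed-time option, assuming 0-based frame numbers, a constant frame rate, and one final result per second (the function names are hypothetical), a frame can be mapped to the final determination that applies to it:

```python
from typing import List


def elapsed_seconds(frame_number: int, fps: float = 60.0) -> float:
    """Elapsed playback time of a 0-based frame number at a constant frame rate."""
    return frame_number / fps


def result_for_frame(frame_number: int, per_second: List[str], fps: float = 60.0) -> str:
    """Look up the final determination of the 1-second period containing the frame."""
    return per_second[int(elapsed_seconds(frame_number, fps))]


per_second = ["welding", "unclassifiable"]
print(result_for_frame(59, per_second))  # welding
print(result_for_frame(60, per_second))  # unclassifiable
```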

(b5. Display control unit 70)
The display control unit 70 controls the display on the display unit 80. In response to a user operation, it displays the result of the final determination together with the video data Da. Because of the association described above, the displayed result changes as playback of the video data Da progresses; in this example, it is updated every second during playback.

<C. Trained models>
The trained models M20, M51, M52, and M53 are described below.

The trained model M20 is generated from a training dataset and a training program prepared in advance. The training dataset includes multiple pieces of training data. Each piece of training data is image data (still image data or frame image data) of a worker performing work in a workplace, to which label information (ground-truth data) indicating the person area in the image has been added. In this example, the person area in the label information is specified as a rectangular area.

Like the trained model M20, the trained model M51 is generated from a training dataset and a training program prepared in advance. The training dataset includes multiple pieces of training data, each of which is image data of a worker performing welding work in a workplace, to which label information (ground-truth data) indicating that the work type is welding has been added.

Like the trained models M20 and M51, the trained model M52 is generated from a training dataset and a training program prepared in advance. The training dataset includes multiple pieces of training data, each of which is image data of a worker performing grinding work in a workplace, to which label information indicating that the work type is grinding has been added.

Like the trained models M20, M51, and M52, the trained model M53 is generated from a training dataset and a training program prepared in advance. The training dataset includes multiple pieces of training data, each of which is image data of a worker performing gouging work in a workplace, to which label information indicating that the work type is gouging has been added.

The trained models M20, M51, M52, and M53 are networks classified as DNNs (Deep Neural Networks). Each of them includes a preprocessing network classified as a CNN (Convolutional Neural Network), an intermediate layer, an activation function corresponding to the output layer, and a Softmax function.

The preprocessing network is intended to act as a kind of filter that extracts, from relatively high-dimensional features, the features that are effective for calculating the estimation result. It consists of alternating convolutional layers (CONV) and pooling layers (Pooling). The numbers of convolutional and pooling layers need not be equal, and an activation function such as ReLU (rectified linear unit) is placed on the output side of each convolutional layer.

More specifically, the preprocessing network is constructed to receive input features and output internal features that represent predetermined attribute information. The intermediate layer is a fully connected network with a predetermined number of layers; it combines the outputs of the preprocessing network node by node, using weights and biases determined for each node.

An activation function such as ReLU is placed on the output side of the intermediate layer, and the estimation result is finally output after being normalized into a probability distribution by the Softmax function.
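The head described above (fully connected layers with ReLU, then Softmax normalization) can be illustrated with a toy NumPy forward pass. The weights are random, and the feature size of 128, the hidden size of 32, and the two-class output (e.g., "welding" vs. "not welding" for M51) are assumptions, since the text gives neither layer sizes nor the output format.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


def head_forward(features, w1, b1, w2, b2):
    hidden = relu(features @ w1 + b1)  # fully connected layer followed by ReLU
    logits = hidden @ w2 + b2          # output layer
    return softmax(logits)             # normalize into a probability distribution


rng = np.random.default_rng(0)
features = rng.normal(size=128)  # stands in for the flattened CNN feature vector
w1, b1 = rng.normal(size=(128, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.normal(size=(32, 2)) * 0.1, np.zeros(2)
probs = head_forward(features, w1, b1, w2, b2)
print(probs, float(probs.sum()))
```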

Any optimization algorithm can be used by the training program to optimize the parameter values. For example, gradient methods such as SGD (Stochastic Gradient Descent), Momentum SGD (SGD with a momentum term), AdaGrad, RMSprop, AdaDelta, and Adam (Adaptive moment estimation) can be used.
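To make two of the listed gradient methods concrete, here is a minimal sketch of Momentum SGD and Adam minimizing the one-dimensional toy loss f(x) = (x - 3)^2. The loss, learning rates, and step counts are illustrative choices, not values from the source; the Adam hyperparameters (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used defaults.

```python
import math


def grad(x: float) -> float:
    """Derivative of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def momentum_sgd(x: float = 0.0, lr: float = 0.1, beta: float = 0.9, steps: int = 200) -> float:
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)  # accumulate a velocity (momentum term)
        x -= lr * v
    return x


def adam(x: float = 0.0, lr: float = 0.1, b1: float = 0.9, b2: float = 0.999,
         eps: float = 1e-8, steps: int = 500) -> float:
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        m_hat = m / (1 - b1 ** t)        # bias corrections
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


print(momentum_sgd(), adam())  # both approach the minimum at x = 3
```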

<D. Processing flow>
The flow of the processing described above in the information processing device 2 is explained further below with reference to flow diagrams and other figures.

FIG. 4 is a flow diagram showing the flow of the determination process.

As shown in FIG. 4, in step S1, the information processing device 2 detects a person area in each piece of frame image data #N constituting the video data Da. In step S2, it uses the person area detection results to determine the work type for each piece of frame image data #N, and then performs a final determination every second based on these per-frame results.

The processes of steps S1 and S2 are executed by the processor 201. Step S1 is executed by the person area detection unit 20 (FIG. 3) and, specifically, is realized by the trained model M20. Step S2 is executed by the work type determination unit 50 (FIG. 3) and, specifically, is realized by the trained models M51, M52, and M53.

FIG. 5 is a flow diagram showing the details of step S1 in FIG. 4.

As shown in FIG. 5, in step S101, the processor 201 accepts, via the input device 204, a setting of the number of workers to be detected. Typically, the processor 201 accepts an input indicating one person (single-person work) or two people (two-person work). This setting is accepted in order to improve the accuracy of person area detection.

In step S102, the processor 201 reads the video data Da from the memory 202. In step S103, it reads the trained model M20 for person area detection from the memory 202. In step S104, it executes person area detection using the trained model M20 on each piece of frame image data #N.

In step S105, the processor 201 adjusts the detection results to the number of people set in step S101. Specifically, within each piece of frame image data, it selects as many person areas as the set number of people, in descending order of detection confidence. For example, when the set number is one, the processor 201 selects the person area with the highest confidence from the multiple candidate areas; when the set number is two, it selects the areas with the highest and the second-highest confidence. In step S106, the processor 201 performs post-processing such as interpolation from the preceding and following frames.
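The selection in step S105 amounts to keeping the top-k candidates by confidence. A minimal sketch follows; the `PersonArea` fields (top-left corner and size in pixels plus a confidence) and the sample values are hypothetical, since the text does not give the detector's output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PersonArea:
    """Hypothetical candidate box: top-left corner, size (pixels), and detector confidence."""
    x: int
    y: int
    width: int
    height: int
    confidence: float


def select_person_areas(candidates: List[PersonArea], num_people: int) -> List[PersonArea]:
    """Keep the `num_people` candidates with the highest confidence."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)[:num_people]


candidates = [
    PersonArea(10, 20, 50, 120, 0.91),
    PersonArea(300, 40, 60, 130, 0.78),
    PersonArea(500, 10, 30, 60, 0.35),  # e.g. a spurious detection
]
print([c.confidence for c in select_person_areas(candidates, 1)])  # [0.91]
print([c.confidence for c in select_person_areas(candidates, 2)])  # [0.91, 0.78]
```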

In step S107, the processor 201 saves the person area detection results (coordinates) for each piece of frame image data #N in the memory 202 in a predetermined format. Typically, it saves them in a widely compatible file format such as CSV (comma-separated values). More specifically, it records the detected person area coordinates in association with information such as a timestamp, the frame number of the frame image data, an object number, and the confidence of the person area detection.
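Recording one CSV row per detected person area can be sketched with Python's standard `csv` module. The column names and the sample row are hypothetical; the text only says that coordinates are stored together with a timestamp, frame number, object number, and confidence.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column layout built from the items listed in the text
FIELDS = ["timestamp", "frame", "object", "x", "y", "width", "height", "confidence"]


def detections_to_csv(rows: List[Dict[str, object]]) -> str:
    """Serialize one row per detected person area to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"timestamp": 0.0, "frame": 0, "object": 0, "x": 10, "y": 20,
     "width": 50, "height": 120, "confidence": 0.91},
]
text = detections_to_csv(rows)
print(text)
```

To write to disk instead, the same `DictWriter` can wrap a file opened with `open(path, "w", newline="")`.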

In step S108, the processor 201 creates video data Dc showing the person area detection results and stores it in the memory 202. The video data Dc is the video data Da with a figure (rectangle) indicating each person area superimposed on it.

FIG. 6 is a flow diagram showing the details of step S2 in FIG. 4.

As shown in FIG. 6, in step S201, the processor 201 reads the video data Da from the memory 202. In step S202, it creates frame difference video data Db using the frame difference method. This process is performed by the generation unit 30 (FIG. 3). Step S202 is described in detail later (FIG. 7).

In step S203, the processor 201 reads the person area detection results from the memory 202. The results include the coordinates of each person area and information such as the identifier (timestamp or frame number) of the corresponding frame image data, as described above.

In step S204, the processor 201 extracts, based on the person area detection results, the area corresponding to the person area from each piece of frame image data P constituting the frame difference video data Db; in other words, it crops the images. Each piece of frame image data P is cropped using the person area detection result associated with that particular frame, so a different detection result is used for each frame.
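The cropping in step S204 is an array slice per frame. A minimal sketch, assuming each frame is a NumPy array and each box is `(x, y, width, height)` in pixels with a top-left origin (the box format and function name are assumptions):

```python
from typing import Tuple

import numpy as np


def crop_person_area(frame: np.ndarray, box: Tuple[int, int, int, int]) -> np.ndarray:
    """Cut out the person area from one frame, clipping the box to the frame boundaries."""
    x, y, w, h = box
    frame_h, frame_w = frame.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    return frame[y0:y1, x0:x1]


frame = np.arange(100, dtype=np.uint8).reshape(10, 10)
print(crop_person_area(frame, (2, 3, 4, 5)).shape)  # (5, 4)
```

Looking up each frame's own box by frame number or timestamp before calling this function is what makes the detection result differ from frame to frame.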

In step S205, the processor 201 reads the trained models M51, M52, and M53 for work type determination from the memory 202. In step S206, it executes these trained models to determine the work type from each piece of image data Q (the cropped image data) extracted in step S204. Step S206 is described in detail later (FIG. 9).

In step S207, the processor 201 aggregates the work type determination results from step S206 into a final determination result for each second. Specifically, as described above, it determines (final determination) that the most frequent work type within each second is the work being performed in the workplace during that second. The final determination results for each second are temporarily stored, one by one, in the working area of the memory 202 (typically, the RAM 222).

In step S208, for each second whose work has been determined to be unclassifiable, the processor 201 executes a movement determination process to determine whether that work is movement of a worker. The movement determination process is described in detail later.

In step S209, the processor 201 performs post-processing such as interpolation from the preceding and following results. In step S210, it saves the final determination result for each second in the memory 202 in non-volatile form and in a predetermined format; typically, it saves the results to the SSD 223 or the HDD 224 in CSV format. More specifically, it records each final determination result in association with information such as a timestamp and a frame number.

In step S211, the processor 201 creates video data Dd containing the final determination results and stores it in the memory 202. The video data Dd is the video data Da with an image superimposed on it that shows the final determination result using identification information such as text.

When the user 950 plays back the video data Dd on the information processing device 2, the display 203 shows the work type superimposed on the video of the work in the workplace. The displayed work type is updated every second.

Next, the movement determination process of step S208 is described. In this process, the processor 201 first averages the coordinates of each detected person area along its width and along its height; that is, it calculates the center-of-gravity position of each person area within the frame image data #N.

Next, the processor 201 averages the center-of-gravity positions over each second. It then associates each of these per-second averages with the final determination result for the same time (timing), and calculates the amount of movement of the person area for each second from the per-second average positions.

The processor 201 extracts from the working area of the memory 202 the final determination results that were classified as other work (unclassifiable). It then determines whether the amount of movement associated with each extracted result falls within a predetermined range (between a lower threshold and an upper threshold). If it does, the processor 201 replaces the other work (unclassifiable) label with "movement".
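The movement determination can be sketched end to end: centroid per frame, average per second, movement between consecutive seconds, then relabeling. Assumptions not stated in the source: one person area per frame, boxes given as `(x, y, width, height)`, movement measured as the Euclidean distance between consecutive per-second centroids, inclusive bounds, and hypothetical threshold values and label strings.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x, y, width, height)


def centroid(box: Box) -> Tuple[float, float]:
    """Center of a rectangular person area."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def per_second_centroids(boxes: Sequence[Box], fps: int = 60) -> List[Tuple[float, float]]:
    """Average the per-frame centroids over each 1-second window."""
    result = []
    for start in range(0, len(boxes), fps):
        points = [centroid(b) for b in boxes[start:start + fps]]
        result.append((sum(p[0] for p in points) / len(points),
                       sum(p[1] for p in points) / len(points)))
    return result


def relabel_movement(labels: List[str], centroids: List[Tuple[float, float]],
                     lower: float, upper: float,
                     unclassifiable: str = "unclassifiable") -> List[str]:
    """Replace 'unclassifiable' with 'moving' when the 1-second movement is in [lower, upper]."""
    result = list(labels)
    for t in range(1, len(labels)):  # the first second has no previous position
        (x0, y0), (x1, y1) = centroids[t - 1], centroids[t]
        movement = math.hypot(x1 - x0, y1 - y0)
        if labels[t] == unclassifiable and lower <= movement <= upper:
            result[t] = "moving"
    return result


boxes = [(0, 0, 10, 20)] * 60 + [(30, 40, 10, 20)] * 60  # one worker, two seconds
centroids = per_second_centroids(boxes)                  # [(5.0, 10.0), (35.0, 50.0)]
print(relabel_movement(["welding", "unclassifiable"], centroids, lower=10, upper=100))
# ['welding', 'moving']
```

The lower bound filters out small jitter of a stationary worker, and the upper bound filters out implausible jumps such as a detection switching to a different person; both values would need tuning for a real camera setup.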

FIG. 7 is a flow diagram explaining step S202 in FIG. 6 in detail. FIG. 8 is a diagram explaining the process of FIG. 7 using image data.

As shown in FIG. 7, in step S2201, the processor 201 sets the variable N (#N) described above to 2. In step S2202, it acquires two consecutive pieces of frame image data, #N-1 and #N, from the loaded video data Da. FIG. 8 shows examples of frame image data #N-1 and frame image data #N.

In step S2203, the processor 201 generates difference image data R (see FIG. 8) representing the difference between frame image data #N-1 and frame image data #N. In step S2204, it binarizes the difference image data R to generate binarized image data T (see FIG. 8).

In step S2205, the processor 201 applies a closing operation to the binarized image data T to generate closed image data U (see FIG. 8), which is used as a mask image. In step S2206, it masks the frame image data #N with the closed image data U, generating masked image data V (see FIG. 8).

In step S2207, the processor 201 stores the masked image data V in the memory 202 as new image data #N-1. In step S2208, it determines whether the end of the video data Da has been reached, that is, whether the above processing has been performed on every pair of consecutive frames in the video data Da.

If the end has not been reached (NO in step S2208), the processor 201 increments N by 1 in step S2210. If the end has been reached (YES in step S2208), in step S2209 the processor 201 generates the frame difference video data Db described above by concatenating all the new image data #N in chronological order, and stores it in the memory 202.
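Steps S2201 to S2210 can be sketched with NumPy alone. Assumptions not taken from the source: frames are 8-bit grayscale arrays, the binarization threshold is 30, the closing uses a square (2r+1) x (2r+1) structuring element, and pixels outside the mask are set to 0. Indices are 0-based, so the output list has one element fewer than the input. A production version would more likely call an image-processing library's morphology routines.

```python
from typing import List

import numpy as np


def _neighborhoods(mask: np.ndarray, r: int, pad_value: bool) -> np.ndarray:
    """Stack the (2r+1)^2 shifted views of a boolean mask, padded with pad_value."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=pad_value)
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(2 * r + 1) for dx in range(2 * r + 1)])


def closing(mask: np.ndarray, r: int = 1) -> np.ndarray:
    """Morphological closing: dilation followed by erosion (square element)."""
    dilated = _neighborhoods(mask, r, False).any(axis=0)
    return _neighborhoods(dilated, r, True).all(axis=0)


def masked_difference(prev: np.ndarray, curr: np.ndarray,
                      threshold: int = 30, r: int = 1) -> np.ndarray:
    """One iteration: difference R, binarization T, closing U, masking V."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))  # R
    binary = diff > threshold                                       # T
    mask = closing(binary, r)                                       # U (mask image)
    return np.where(mask, curr, 0).astype(curr.dtype)               # V


def frame_difference_video(frames: List[np.ndarray], threshold: int = 30,
                           r: int = 1) -> List[np.ndarray]:
    """Apply masked_difference to every pair of consecutive frames (video Db)."""
    return [masked_difference(frames[n - 1], frames[n], threshold, r)
            for n in range(1, len(frames))]
```

The closing step fills small holes in the changed region (for example, a dark pixel inside a bright arc) so that the mask keeps the whole moving or flickering area rather than a speckled outline.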

After step S2209, the processor 201 proceeds to step S203 in FIG. 6.

FIG. 9 is a flow diagram explaining step S206 in FIG. 6 in detail.

As shown in FIG. 9, in step S2601, the processor 201 reads each piece of extracted image data Q (the cropped image data) from the memory 202.

In step S2602, the processor 201 evaluates one piece of image data Q with each of the trained models M51, M52, and M53. Specifically, by executing each trained model, it calculates the confidence for each work type (welding, grinding, and gouging), as described above. In step S2603, the processor 201 determines whether the maximum of the three calculated confidences exceeds a threshold (e.g., 0.6).

If the maximum confidence exceeds the threshold (YES in step S2603), the processor 201 adopts the determination with the maximum confidence in step S2604. For example, if the confidences output from the trained models M51, M52, and M53 are 0.7, 0.1, and 0.05, respectively, the processor 201 adopts the determination of M51 and determines that the work type is welding.

If the maximum confidence does not exceed the threshold (NO in step S2603), the processor 201 determines in step S2607 that the work cannot be classified. In step S2605, the processor 201 stores the determination result in the memory 202.

In step S2606, the processor 201 determines whether the current image data Q is the last one. If it is (YES in step S2606), the processor 201 ends the series of processes of step S206 and proceeds to step S207 in FIG. 6. If it is not (NO in step S2606), the processor 201 switches the processing target to the next piece of image data Q in step S2608 and then returns to step S2602.

<E. Example of final determination results>
FIG. 10 shows data, including the final determination results, stored in the memory 202 in a widely compatible file format (CSV in this example).

As shown in FIG. 10, the "Predict" column of the data records the final determination result for each second. "Indistinguishable" means "unclassifiable", "Moving" means "movement", and "Welding" means "welding".

"G_X" and "G_Y" represent the X and Y coordinates of the center of gravity of the person area, respectively; more specifically, they are the center-of-gravity positions averaged over one second. "Width" and "Height" represent the width and height of the person area, respectively.

In this way, the information processing device 2 identifies the work type every second. Accumulating the time for each work type therefore yields the time spent on each type of work. The information processing device 2 calculates these times, for example in response to a user operation, and shows the results (working time per work type) on the display 203.
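Because each row of the result file covers one second, the working time per type is simply a count of the "Predict" labels. A minimal sketch, assuming CSV text with a header row and a "Predict" column as in FIG. 10; the sample rows below are invented for illustration and are not taken from the figure.

```python
import csv
import io
from collections import Counter
from typing import Dict


def work_time_per_type(csv_text: str, seconds_per_row: float = 1.0) -> Dict[str, float]:
    """Total time per work type, counting one row of the 'Predict' column per second."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["Predict"] for row in rows)
    return {label: n * seconds_per_row for label, n in counts.items()}


# Made-up sample rows in the column layout described for FIG. 10
sample = (
    "Predict,G_X,G_Y,Width,Height\n"
    "Welding,320.5,240.0,80,160\n"
    "Welding,321.0,241.5,82,158\n"
    "Moving,350.2,250.1,78,162\n"
    "Indistinguishable,360.0,255.0,79,160\n"
)
print(work_time_per_type(sample))
# {'Welding': 2.0, 'Moving': 1.0, 'Indistinguishable': 1.0}
```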

<F. Summary>
Part of the processing executed by the information processing device 2 can be summarized as follows.

(1) The information processing device 2 includes a video data acquisition unit 10 that acquires multiple consecutive pieces of frame image data (video data) captured by a camera 1 (a visible light camera). The camera 1 has a fixed installation position and orientation and captures, as its subject, a workplace in which workers perform work accompanied by flickering light.

The information processing device 2 further includes: a person area detection unit 20 that detects the area of a worker (person area) in frame image data #N among the multiple pieces of frame image data; a generation unit 30 that generates image data showing a change in the state of the subject based on frame image data #N and frame image data #N-1, which immediately precedes frame image data #N; an extraction unit 40 that extracts, from the generated image data, the image data of the area corresponding to the detected worker area; and a work type determination unit 50 that determines the type of work being performed in the workplace based on the extracted image data.

With the information processing device 2 configured in this way, image data showing a change of state in the worker's area can be extracted from multiple pieces of frame image data captured by a visible light camera. The information processing device 2 then determines the type of work being performed in the workplace based on that image data.

The information processing device 2 can therefore determine the type of work being performed in a workplace based on video data captured by a visible light camera. In particular, it can accurately determine, for each piece of frame image data, the type of work accompanied by flickering light.

(2) The information processing device 2 further includes a storage unit 60 that stores information indicating the determined work type. Because the determination results are stored, the information processing device 2 can use them in various kinds of post-processing (for example, the final determination process and the display process described above).

(3) The generation unit 30 generates the image data showing a change in the state of the subject by the frame difference method, using frame image data #N and frame image data #N-1. With this configuration, such image data can be generated with the frame difference method, one of the methods for detecting moving objects.

(4) The work type determination unit 50 determines whether the work being performed in the workplace is welding work. With this configuration, it is possible to determine whether work accompanied by flickering light is welding work.

(5) The work type determination unit 50 determines which of multiple pre-specified types of work is being performed in the workplace. With this configuration, it is possible to determine which of those types the work accompanied by flickering light is.

(6) The multiple types of work include welding, grinding, and gouging. With this configuration, it is possible to determine whether work accompanied by flickering light is welding, grinding, or gouging.

（７）作業種別判定部５０は、抽出された画像データを入力として受け付け、作業場で行われている作業が溶接作業であるか否かを判定する学習済みモデルＭ５１と、抽出された画像データを入力として受け付け、作業場で行われている作業がグラインダ作業であるか否かを判定する学習済みモデルＭ５２と、抽出された画像データを入力として受け付け、作業場で行われている作業がガウジング作業であるか否かを判定する学習済みモデルＭ５３とを含む。 (7) The work type determination unit 50 includes a trained model M51 that receives the extracted image data as input and determines whether the work being performed in the workplace is welding work, a trained model M52 that receives the extracted image data as input and determines whether the work being performed in the workplace is grinding work, and a trained model M53 that receives the extracted image data as input and determines whether the work being performed in the workplace is gouging work.

作業種別判定部５０は、学習済みモデルＭ５１による判定の結果と、学習済みモデルＭ５２による判定の結果と、学習済みモデルＭ５３による判定の結果とに基づき、作業場で行われている作業が、溶接作業とグラインダ作業とガウジング作業とのうちの何れであるかを判定する。 The work type determination unit 50 determines whether the work being performed in the workplace is welding work, grinding work, or gouging work, based on the results of the determination by the trained model M51, the results of the determination by the trained model M52, and the results of the determination by the trained model M53.
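The text does not state how the three per-model results are combined into one final label. One plausible rule, shown here purely as an assumption, is to treat each of M51, M52, and M53 as outputting a score, pick the work type with the highest score, and assign no label when even the best score is below a cutoff. The function name `final_determination`, the score dictionary, and the 0.5 cutoff are all hypothetical.

```python
from typing import Dict, Optional


def final_determination(scores: Dict[str, float], min_score: float = 0.5) -> Optional[str]:
    """Combine per-work-type scores into one label (assumed rule, not from the patent).

    `scores` maps a work type ("welding", "grinding", "gouging") to the score
    output by the corresponding binary model (hypothetically M51, M52, M53).
    The highest-scoring work type wins if its score reaches `min_score`;
    otherwise no work type is assigned.
    """
    if not scores:
        return None
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] >= min_score else None


if __name__ == "__main__":
    print(final_determination({"welding": 0.91, "grinding": 0.12, "gouging": 0.30}))  # welding
    print(final_determination({"welding": 0.20, "grinding": 0.35, "gouging": 0.10}))  # None
```

Keeping one binary model per work type means a new work type could, in principle, be added by training one more model and adding one more entry to the score dictionary.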

このような構成によれば、情報処理装置２は、学習済みモデルＭ５１，Ｍ５２，Ｍ５３を用いて、作業種別の判定をおこなう。それゆえ、情報処理装置２によれば、学習済みモデルを用いないルールベースの判定処理に比べて、精度の高い判定が可能となる。 According to this configuration, the information processing device 2 uses the trained models M51, M52, and M53 to determine the task type. Therefore, the information processing device 2 can make a more accurate determination than a rule-based determination process that does not use trained models.

（８）人物領域検出部２０は、フレーム画像データ＃Ｎを入力とし、かつ、作業者の領域を示す情報を出力する、学習済みモデルＭ２０である。このような構成によれば、情報処理装置２は、学習済みモデルＭ２０を用いて、作業者の領域（人物領域）の検出をおこなう。それゆえ、情報処理装置２によれば、学習済みモデルを用いないルールベースの検出処理に比べて、精度の高い検出が可能となる。 (8) The person area detection unit 20 is a trained model M20 that receives frame image data #N as input and outputs information indicating the area of the worker. With this configuration, the information processing device 2 uses the trained model M20 to detect the area of the worker (person area). Therefore, the information processing device 2 can achieve more accurate detection than a rule-based detection process that does not use a trained model.
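Once the worker's area has been detected, the extraction unit 40 takes the part of the change image that corresponds to that area. The sketch below assumes the detected area is an axis-aligned bounding box given as (x, y, width, height) in pixels; the box format and the function name `extract_worker_region` are assumptions, since the text only says that information indicating the worker's area is output.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels; assumed format


def extract_worker_region(change_image: Image, box: Box) -> Image:
    """Crop the change image to the worker's bounding box (hypothetical helper),
    clipping the box to the image borders."""
    x, y, w, h = box
    height = len(change_image)
    width = len(change_image[0]) if height else 0
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in change_image[y0:y1]]


if __name__ == "__main__":
    img = [[0, 1, 2, 3],
           [4, 5, 6, 7],
           [8, 9, 10, 11],
           [12, 13, 14, 15]]
    print(extract_worker_region(img, (1, 1, 2, 2)))  # [[5, 6], [9, 10]]
```

Restricting the change image to the worker's area keeps flicker from unrelated parts of the workplace (for example, lighting or other machines) out of the input to the work type models.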

（９）情報処理装置２は、複数のフレーム画像データの各々について、人物領域検出部２０による検出と、生成部３０による生成と、抽出部４０による抽出と、作業種別判定部５０による判定とを行う。情報処理装置２は、作業種別判定部５０により判定された作業種別毎の判定数に基づき、作業種別毎の作業時間を算出する。 (9) For each of the plurality of frame image data, the information processing device 2 performs detection by the person area detection unit 20, generation by the generation unit 30, extraction by the extraction unit 40, and determination by the work type determination unit 50. The information processing device 2 then calculates the work time for each work type based on the number of determinations made for that work type by the work type determination unit 50.

このような構成によれば、情報処理装置２は、カメラ１による撮像によって得られた複数のフレーム画像データに基づいて、各作業の作業時間の合計を作業毎に算出する。それゆえ、情報処理装置２のユーザ９５０は、どの作業にどの位の時間を要しているかを把握することができる。 With this configuration, the information processing device 2 calculates the total task time for each task based on multiple frame image data captured by the camera 1. Therefore, the user 950 of the information processing device 2 can know how much time each task takes.
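Converting per-frame determinations into work time can be sketched as counting how many frames were assigned to each work type and dividing by the frame rate. This assumes every frame is processed and stands for 1 / fps seconds, and that frames with no determination are simply not counted; the function name `work_time_per_type` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def work_time_per_type(labels: Iterable[Optional[str]], fps: float) -> Dict[str, float]:
    """Convert per-frame work-type determinations into seconds per work type.

    Assumes every frame was processed and represents 1 / fps seconds;
    frames with no determination (None) are not counted.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    counts = Counter(label for label in labels if label is not None)
    return {work_type: n / fps for work_type, n in counts.items()}


if __name__ == "__main__":
    labels = ["welding"] * 60 + ["grinding"] * 30 + [None] * 10
    print(work_time_per_type(labels, fps=30.0))  # {'welding': 2.0, 'grinding': 1.0}
```

If only every k-th frame were analysed, each determination would instead stand for k / fps seconds.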

＜Ｇ．変形例＞ G. Modifications
（１）図１１は、図７に示したステップＳ２０２の一連の処理の変形例を示したフロー図である。以下では、３つの連続するフレーム画像データを用いたフレーム差分法を利用する構成について説明する。 (1) FIG. 11 is a flow chart showing a modified example of the series of processes in step S202 shown in FIG. 7. The following describes a configuration that uses a frame difference method based on three consecutive frame image data.

図１１を参照して、図１１に示す一連の処理は、図７に示す一連の処理に比べて、以下の点が異なっている。図１１に示す一連の処理は、ステップＳ２２０２，Ｓ２２０３，Ｓ２２０４，Ｓ２２０５（図７参照）の代わりに、ステップＳ２２０２Ａ，Ｓ２２０３Ａ，Ｓ２２０４Ａ，Ｓ２２０５Ａを備える。さらに、図１１に示す一連の処理は、ステップＳ２２１１を備える点で、このステップを備えない図７とは異なる。 Referring to FIG. 11, the series of processes shown in FIG. 11 differs from that shown in FIG. 7 in the following respects. The series of processes in FIG. 11 includes steps S2202A, S2203A, S2204A, and S2205A in place of steps S2202, S2203, S2204, and S2205 (see FIG. 7). Furthermore, the series of processes in FIG. 11 includes step S2211, which FIG. 7 does not.

なお、図１１の他のステップの処理は、図７で説明した処理と同じである。そこで、以下では、これらのステップＳ２２０２Ａ，Ｓ２２０３Ａ，Ｓ２２０４Ａ，Ｓ２２０５Ａ，Ｓ２２１１について説明する。 Note that the processing of the other steps in FIG. 11 is the same as the processing described in FIG. 7. Therefore, the following describes these steps S2202A, S2203A, S2204A, S2205A, and S2211.

ステップＳ２２０１の後のステップＳ２２０２Ａにおいて、プロセッサ２０１は、読み込んだ動画像データＤａから、連続する３つのフレーム画像データ＃Ｎ－１，＃Ｎ，＃Ｎ＋１を取得する。ステップＳ２２０３Ａにおいて、プロセッサ２０１は、フレーム画像データ＃Ｎ－１とフレーム画像データ＃Ｎとの差分を表す差分画像データＲと、フレーム画像データ＃Ｎとフレーム画像データ＃Ｎ＋１との差分を表す差分画像データＲとを生成する。 In step S2202A, which follows step S2201, the processor 201 obtains three consecutive frame image data #N-1, #N, and #N+1 from the loaded video image data Da. In step S2203A, the processor 201 generates two pieces of differential image data R: one representing the difference between frame image data #N-1 and #N, and the other representing the difference between frame image data #N and #N+1.

ステップＳ２２０４Ａにおいて、プロセッサ２０１は、各差分画像データＲを二値化し、２つの二値化画像データＴを生成する。ステップＳ２２１１において、プロセッサ２０１は、２つの二値化画像データＴの共有部分を抽出し、画像データＷを生成する。具体的には、プロセッサ２０１は、２つの二値化画像データＴにおいてともに白色（すなわち、値が１の部分）となっている部分は白色（値を１）とし、それ以外は、黒色（値を０）とする。 In step S2204A, the processor 201 binarizes each piece of differential image data R to generate two pieces of binary image data T. In step S2211, the processor 201 extracts the portion common to the two pieces of binary image data T to generate image data W. Specifically, the processor 201 sets pixels that are white (value 1) in both pieces of binary image data T to white (value 1) and sets all other pixels to black (value 0); that is, W is the pixel-wise logical AND of the two binary images.

ステップＳ２２０５Ａにおいて、プロセッサ２０１は、抽出された画像データＷに対してクロージング処理を行うことにより、クロージング画像データＵを生成する。その後、プロセッサ２０１は、処理をステップＳ２２０６に進める。 In step S2205A, the processor 201 performs a closing process on the extracted image data W to generate closing image data U. The processor 201 then advances the process to step S2206.

詳しくは、本変形例においては、生成部３０が、フレーム画像データ＃Ｎとフレーム画像データ＃Ｎ－１とフレーム画像データ＃Ｎ＋１とを用いたフレーム差分法により、被写体の状態変化を示す画像データを生成する。 In more detail, in this modified example, the generation unit 30 generates image data indicating a change in the state of the subject by a frame difference method using frame image data #N, frame image data #N-1, and frame image data #N+1.

このような処理によれば、図７の構成よりも精度の高い判定処理が可能となる。 This processing enables more accurate determination than the configuration of FIG. 7.
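Steps S2203A, S2204A, S2211, and S2205A can be sketched as follows: two absolute differences are binarized into T, combined by a pixel-wise AND into W, and then morphologically closed (dilation followed by erosion) into U. The threshold (30), the 3x3 structuring element, and the choice to ignore out-of-image neighbours are assumptions, and all function names are hypothetical. Plain lists keep the sketch self-contained; with OpenCV the same steps would typically use `cv2.absdiff`, `cv2.threshold`, `cv2.bitwise_and`, and `cv2.morphologyEx` with `cv2.MORPH_CLOSE`.

```python
from typing import List

Image = List[List[int]]  # grayscale frame
Mask = List[List[int]]   # binary image (0 or 1)


def binarize_diff(a: Image, b: Image, threshold: int) -> Mask:
    """Absolute difference of two frames, binarized (1 = changed)."""
    return [[1 if abs(p - q) > threshold else 0 for p, q in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def _morph(mask: Mask, use_max: bool) -> Mask:
    """3x3 dilation (use_max=True) or erosion (use_max=False).
    Neighbours outside the image are ignored."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [mask[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = max(vals) if use_max else min(vals)
    return out


def closing(mask: Mask) -> Mask:
    """Morphological closing: dilation then erosion; fills small holes."""
    return _morph(_morph(mask, True), False)


def three_frame_difference(f_prev: Image, f_curr: Image, f_next: Image,
                           threshold: int = 30) -> Mask:
    t1 = binarize_diff(f_prev, f_curr, threshold)  # |#N-1 - #N| -> binary T
    t2 = binarize_diff(f_curr, f_next, threshold)  # |#N - #N+1| -> binary T
    w = [[a & b for a, b in zip(r1, r2)] for r1, r2 in zip(t1, t2)]  # common part W
    return closing(w)  # closing image U
```

Because W keeps only pixels that changed in both pairs, a change that appears in just one pair (for example, something present only in frame #N+1) is discarded, and the closing then fills small holes inside the remaining changed region.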

（２）上記においては、生成部３０が、連続するフレーム画像データ＃Ｎ，＃Ｎ－１を用いて、被写体の状態変化を示す画像データを生成する構成を例に挙げて説明したが、必ずしも、これに限定されるものではない。生成部３０は、フレーム画像データ＃Ｎと、フレーム画像データ＃Ｎよりも所定個前（１個以上前）のフレーム画像データとに基づいて、被写体の状態変化を示す画像データを生成する構成であればよい。たとえば、生成部３０は、フレーム画像データ＃Ｎと、フレーム画像データ＃Ｎよりも２個前のフレーム画像データ＃Ｎ－２とに基づいて、被写体の状態変化を示す画像データを生成してもよい。 (2) The above description took as an example a configuration in which the generation unit 30 generates image data indicating a change in the state of the subject using consecutive frame image data #N and #N-1, but the configuration is not necessarily limited to this. The generation unit 30 may be configured to generate image data indicating a change in the state of the subject based on frame image data #N and frame image data that precedes frame image data #N by a predetermined number of frames (one or more). For example, the generation unit 30 may generate such image data based on frame image data #N and frame image data #N-2, which is two frames before frame image data #N.
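Modification (2) can be sketched by making the frame offset a parameter k: frame #n is compared with frame #n-k, so k = 1 reproduces the consecutive-frame case and k = 2 compares #N with #N-2. The generator name `diff_with_offset` and the threshold of 30 are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]
Mask = List[List[int]]


def diff_with_offset(frames: List[Image], k: int = 1,
                     threshold: int = 30) -> Iterator[Tuple[int, Mask]]:
    """Yield (n, mask) for every frame #n that has a frame #n-k before it.

    k = 1 compares consecutive frames; k = 2 compares #N with #N-2, and so on.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    for n in range(k, len(frames)):
        prev, curr = frames[n - k], frames[n]
        mask = [[1 if abs(c - p) > threshold else 0 for p, c in zip(rp, rc)]
                for rp, rc in zip(prev, curr)]
        yield n, mask


if __name__ == "__main__":
    frames = [[[0]], [[10]], [[100]], [[110]]]  # single-pixel frames
    print(list(diff_with_offset(frames, k=1)))  # [(1, [[0]]), (2, [[1]]), (3, [[0]])]
    print(list(diff_with_offset(frames, k=2)))  # [(2, [[1]]), (3, [[1]])]
```

In this toy sequence, a larger k accumulates more change between the compared frames, so the gradual rise from 10 to 110 is detected at both n = 2 and n = 3 with k = 2, but only at n = 2 with k = 1.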

（３）被写体の状態変化を示す画像データを生成できれば、フレーム差分法以外の移動体検出の手法も適用可能である。 (3) Moving object detection methods other than the frame difference method may also be applied, as long as they can generate image data indicating a change in the state of the subject.

（４）作業者の領域（人物領域）の検出には、必ずしも、学習済みモデルを用いる必要はない。ルールベースの手法により、作業者の領域を検出してもよい。 (4) It is not necessary to use a trained model to detect the worker's area (person area). The worker's area may be detected using a rule-based method.

今回開示された実施の形態は例示であって、上記内容のみに制限されるものではない。本発明の範囲は特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。 The embodiments disclosed herein are examples, and the invention is not limited to the above content. The scope of the present invention is defined by the claims and is intended to include all modifications within the meaning and scope equivalent to the claims.

１カメラ、２情報処理装置、１０動画像データ取得部、２０人物領域検出部、３０生成部、４０抽出部、５０作業種別判定部、５１溶接作業判定部、５２グラインダ作業判定部、５３ガウジング作業判定部、５４最終判定部、６０記憶部、７０表示制御部、８０表示部、２０１プロセッサ、２０２メモリ、２０３ディスプレイ、２０４入力装置、２０５通信インターフェイス、２０６カードリーダ、２０７ポート、９００作業者、９５０ユーザ、１０００判定システム、Ｍ２０，Ｍ５１，Ｍ５２，Ｍ５３学習済みモデル。 1 camera, 2 information processing device, 10 video data acquisition unit, 20 person area detection unit, 30 generation unit, 40 extraction unit, 50 work type determination unit, 51 welding work determination unit, 52 grinding work determination unit, 53 gouging work determination unit, 54 final determination unit, 60 storage unit, 70 display control unit, 80 display unit, 201 processor, 202 memory, 203 display, 204 input device, 205 communication interface, 206 card reader, 207 port, 900 worker, 950 user, 1000 determination system, M20, M51, M52, M53 trained models.

Claims

1. An information processing device comprising:
an acquisition means for acquiring a plurality of consecutive frame image data obtained by imaging with a visible light camera, the visible light camera being installed at a fixed position and attitude and capturing, as a subject, an image of a workplace where a worker is performing work involving blinking light;
a detection means for detecting an area of the worker in first frame image data among the plurality of frame image data;
a generation means for generating, by a frame difference method, difference data between the first frame image data and second frame image data that precedes the first frame image data by a predetermined number of frames among the plurality of frame image data, and for generating image data indicating a change in state of the subject based on the generated difference data and the first frame image data;
an extraction means for extracting, from the generated image data, image data of an area corresponding to the detected area of the worker; and
a determination means for determining, based on the extracted image data, whether the type of work being performed at the workplace is welding work, grinding work, or gouging work.

2. The information processing device according to claim 1, wherein the second frame image data is the frame image data immediately preceding the first frame image data.

3. The information processing device according to claim 1 or 2, wherein the determination means includes:
a first trained model that receives the extracted image data as input and determines whether the work being performed at the workplace is the welding work;
a second trained model that receives the extracted image data as input and determines whether the work being performed at the workplace is the grinding work; and
a third trained model that receives the extracted image data as input and determines whether the work being performed at the workplace is the gouging work,
and determines whether the work being performed at the workplace is the welding work, the grinding work, or the gouging work based on a result of determination by the first trained model, a result of determination by the second trained model, and a result of determination by the third trained model.

4. The information processing device according to claim 1, wherein the detection means is a fourth trained model that receives the first frame image data as input and outputs information indicating the area of the worker.

5. The information processing device according to claim 1, wherein detection by the detection means, generation by the generation means, extraction by the extraction means, and determination by the determination means are performed for each of the plurality of frame image data, the device further comprising a calculation means for calculating a work time for each work type based on the number of determinations made by the determination means for each work type.

6. An information processing method comprising:
a step of capturing, as a subject, an image of a workplace where a worker is performing work involving blinking light, using a visible light camera installed at a fixed position and attitude;
a step of acquiring a plurality of consecutive frame image data captured by the visible light camera;
a step of detecting an area of the worker in first frame image data among the plurality of frame image data;
a step of generating, by a frame difference method, difference data between the first frame image data and second frame image data that precedes the first frame image data by a predetermined number of frames among the plurality of frame image data;
a step of generating image data indicating a change in state of the subject based on the generated difference data and the first frame image data;
a step of extracting, from the generated image data, image data of an area corresponding to the detected area of the worker; and
a step of determining, based on the extracted image data, whether the type of work being performed at the workplace is welding work, grinding work, or gouging work.