JP7057959B2

JP7057959B2 - Motion analysis device

Info

Publication number: JP7057959B2
Application number: JP2017153215A
Authority: JP
Inventors: 祐樹永野; 勝彦植田; 伸敬島田; 良明白井
Original assignee: Sumitomo Rubber Industries Ltd; Ritsumeikan Trust
Current assignee: Sumitomo Rubber Industries Ltd; Ritsumeikan Trust
Priority date: 2016-08-09
Filing date: 2017-08-08
Publication date: 2022-04-21
Anticipated expiration: 2037-08-08
Also published as: JP2018026131A

Description

The present invention relates to a motion analysis device, method, and program for analyzing the motion of an object, and to a model building device, method, and program for constructing a model for analyzing the motion of an object. In particular, it relates to a device, method, and program suited to analyzing a golf swing.

Devices that photograph a golf swing with a camera and analyze the swing based on the resulting images are known (Patent Documents 1 and 2, among others). The analysis results are used for various purposes, such as fitting golf clubs suited to a golfer, improving a golfer's form, and developing golf equipment. Such analysis often involves three-dimensional measurement of parts of interest appearing in the image, such as the grip and head of the golf club and the golfer's joints. Patent Documents 1 and 2 disclose photographing a golf swing from multiple directions with multiple cameras and measuring the parts of interest in three dimensions by triangulation, the DLT method, or the like.

In recent years, a device called Kinect (registered trademark), which includes a range image sensor in addition to a two-dimensional camera and is capable of three-dimensional measurement, has become widespread, and research on human motion analysis using this device is active (for example, Patent Documents 3 and 4).

Patent Document 1: Japanese Unexamined Patent Publication No. 2013-215349
Patent Document 2: Japanese Unexamined Patent Publication No. 2004-313479
Patent Document 3: Japanese Unexamined Patent Publication No. 2015-106281
Patent Document 4: Japanese Unexamined Patent Publication No. 2016-75693

However, camera configurations like those of Patent Documents 1 and 2 can make the equipment large. Meanwhile, motion analysis techniques based on depth images from a range image sensor, as in Patent Documents 3 and 4, are still at a developmental stage, and further improvement in analysis accuracy is desired.

An object of the present invention is to provide a motion analysis device, method, and program capable of analyzing the motion of an object simply and with high accuracy, as well as a model building device, method, and program for constructing a model for analyzing the motion of an object.

A motion analysis device according to a first aspect is a motion analysis device for analyzing the motion of an object, and includes an acquisition unit and a derivation unit. The acquisition unit acquires a depth image in which the motion of the object is captured by a range image sensor. The derivation unit derives a motion value that quantitatively represents the motion of the object by inputting the depth image acquired by the acquisition unit into a neural network that outputs the motion value.

A motion analysis device according to a second aspect is the motion analysis device according to the first aspect, wherein the depth image is an image of a golf swing.

A motion analysis device according to a third aspect is the motion analysis device according to the second aspect, wherein the motion value is the rotation angle of the golfer's waist.

A motion analysis device according to a fourth aspect is the motion analysis device according to the third aspect, wherein the acquisition unit further acquires skeleton data, measured by the range image sensor, representing the skeleton of the human body. Based on the skeleton data, the derivation unit extracts a waist region near the waist from the depth image and then inputs the image of the waist region into the neural network.

A motion analysis device according to a fifth aspect is the motion analysis device according to the third or fourth aspect, wherein the neural network has a dropout layer that invalidates, in the depth image, an arm region representing the golfer's arms.

A motion analysis device according to a sixth aspect is the motion analysis device according to the second aspect, wherein the motion value is a value representing the golfer's weight shift.

A motion analysis device according to a seventh aspect is the motion analysis device according to the sixth aspect, wherein the acquisition unit further acquires skeleton data, measured by the range image sensor, representing the skeleton of the human body. Based on the skeleton data, the derivation unit extracts a golfer region near the golfer from the depth image and then inputs the image of the golfer region into the neural network.

A motion analysis device according to an eighth aspect is the motion analysis device according to the second aspect, wherein the motion value is the rotation angle of the golfer's shoulders.

A motion analysis device according to a ninth aspect is the motion analysis device according to the eighth aspect, wherein the acquisition unit further acquires skeleton data, measured by the range image sensor, representing the skeleton of the human body. Based on the skeleton data, the derivation unit extracts a shoulder region near the shoulders from the depth image and then inputs the image of the shoulder region into the neural network.

A motion analysis device according to a tenth aspect is the motion analysis device according to any one of the first to ninth aspects, wherein the neural network has a convolution layer.

A motion analysis device according to an eleventh aspect is the motion analysis device according to any one of the first to tenth aspects, wherein the acquisition unit acquires a time series of the depth images. The derivation unit derives a time series of the motion values by inputting the time series of depth images into the neural network.

A motion analysis device according to a twelfth aspect is the motion analysis device according to any one of the first, second, tenth, and eleventh aspects, wherein the acquisition unit further acquires skeleton data, measured by the range image sensor, representing the skeleton of the human body. Based on the skeleton data, the derivation unit extracts from the depth image a region of interest near a part of interest of the object and then inputs the image of the region of interest into the neural network.

A motion analysis device according to a thirteenth aspect is the motion analysis device according to any one of the first, second, and tenth to twelfth aspects, wherein the neural network has a dropout layer that invalidates, in the depth image, a non-interest region representing a part of the object that is not of interest.

A model building device according to a fourteenth aspect is a model building device for constructing a model for analyzing the motion of an object, and includes a first acquisition unit, a second acquisition unit, and a learning unit. The first acquisition unit acquires a large number of depth images in which the motion of the object is captured by a range image sensor. The second acquisition unit acquires a large number of motion values, each corresponding to one of the depth images, that quantitatively represent the motion of the object. Based on the depth images acquired by the first acquisition unit, and using the motion values acquired by the second acquisition unit as teacher signals, the learning unit trains a neural network that takes the depth image as input and outputs the motion value.

A model building device according to a fifteenth aspect is the model building device according to the fourteenth aspect, wherein the second acquisition unit acquires the motion values, which represent the rotation angle of the object, from the output values of an angular velocity sensor attached to the object.
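As an illustration of how a rotation angle could be obtained from angular velocity samples, the following minimal sketch integrates the samples over time with the trapezoidal rule. The text states only that the motion values are acquired from the sensor output; the integration scheme, the fixed sampling interval, and the names `integrate_angular_velocity`, `omega_deg_per_s`, and `dt_s` are assumptions made for this example.

```python
def integrate_angular_velocity(omega_deg_per_s, dt_s, initial_angle_deg=0.0):
    """Return one rotation angle per angular velocity sample.

    Assumption: samples are equally spaced by dt_s seconds and the
    angle is accumulated with the trapezoidal rule.
    """
    angles = [initial_angle_deg]
    # Pair each sample with its successor and add the area of the trapezoid.
    for prev, curr in zip(omega_deg_per_s, omega_deg_per_s[1:]):
        angles.append(angles[-1] + 0.5 * (prev + curr) * dt_s)
    return angles


# Hypothetical data: a constant 10 deg/s sampled every 0.5 s.
angles = integrate_angular_velocity([10.0] * 5, 0.5)
```

Each resulting angle would then be paired with the depth image captured at the same time and used as a teacher signal. Drift correction, which a practical gyro pipeline usually needs, is omitted here.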

A model building device according to a sixteenth aspect is the model building device according to the fourteenth aspect, wherein the second acquisition unit acquires the motion values, which represent the position of the center of gravity of the object, from the output values of a force plate on which the object stands.

A motion analysis system according to a seventeenth aspect is a motion analysis system for analyzing the motion of an object, and includes a range image sensor and a motion analysis device. The range image sensor captures a depth image of the motion of the object. The motion analysis device derives a motion value that quantitatively represents the motion of the object by inputting the depth image captured by the range image sensor into a neural network that outputs the motion value.

A motion analysis method according to an eighteenth aspect is a motion analysis method for analyzing the motion of an object, and includes the following steps.
(1) A step of capturing, with a range image sensor, a depth image of the motion of the object.
(2) A step of deriving a motion value that quantitatively represents the motion of the object by inputting the depth image captured by the range image sensor into a neural network that outputs the motion value.

A motion analysis program according to a nineteenth aspect is a motion analysis program for analyzing the motion of an object, and causes a computer to execute the following steps.
(1) A step of acquiring a depth image in which the motion of the object is captured by a range image sensor.
(2) A step of deriving a motion value that quantitatively represents the motion of the object by inputting the acquired depth image into a neural network that outputs the motion value.

According to the first aspect, a depth image in which the motion of an object is captured by a range image sensor is acquired, the depth image is input into a neural network, and a motion value that quantitatively represents the motion of the object is derived as the output of the neural network. In other words, the neural network makes it possible to evaluate the motion of the object quantitatively and directly from the depth image. The motion of the object can therefore be analyzed simply and with high accuracy.

A diagram showing the overall configuration of a motion analysis system including a motion analysis device according to a first embodiment of the present invention.
A functional block diagram of the motion analysis system according to the first embodiment.
A diagram showing the model configuration of a neural network for deriving the rotation angle of the waist.
A flowchart showing the flow of motion analysis processing based on the neural network according to the first embodiment.
A diagram showing an example of a normalized depth image.
A diagram showing an example of skeleton data.
A diagram showing an example of time-series images of the waist region.
A diagram showing an example of time-series images of the waist region with the arm region removed.
A graph of the waist rotation angle estimated by the neural network.
A flowchart showing the flow of learning processing of the neural network for deriving the rotation angle of the waist.
A diagram showing the overall configuration of a motion analysis system including a motion analysis device according to a second embodiment of the present invention.
A functional block diagram of the motion analysis system according to the second embodiment.
A diagram showing the model configuration of a neural network for deriving the amount of movement of the body's center of gravity.
A diagram showing the model configuration of a neural network for deriving the rotation angle of the shoulders.
A flowchart showing the flow of motion analysis processing based on the neural network according to the second embodiment.
A diagram showing a person region image corresponding to the depth image of FIG. 5.
A diagram showing a cropped image created from the depth image of FIG. 5 using the person region image of FIG. 16A.
A diagram explaining an example of how the golfer region is set with reference to the solar plexus.
A graph of the amount of movement of the body's center of gravity in the X direction estimated by the neural network.
A graph of the amount of movement of the body's center of gravity in the Y direction estimated by the neural network.
A graph of the trajectory of the body's center of gravity in plan view, based on the estimates of FIGS. 18A and 18B.
A diagram explaining an example of how the shoulder region is set with reference to the center of the shoulders.
A graph of the shoulder rotation angle estimated by the neural network.
A flowchart showing the flow of learning processing of the neural network for deriving the amount of movement of the body's center of gravity.
A flowchart showing the flow of learning processing of the neural network for deriving the rotation angle of the shoulders.

Hereinafter, a motion analysis device, method, and program, and a model building device, method, and program according to several embodiments of the present invention will be described with reference to the drawings. The following embodiments are described using the analysis of a golf swing as an example.

<1. First Embodiment>
<1-1. Overview of the motion analysis system>
FIGS. 1 and 2 show the overall configuration of a motion analysis system 100 including a motion analysis device 1 according to the present embodiment. The motion analysis system 100 is a system that captures, as a moving image, the swing of a golf club 5 by a golfer 7 and analyzes the swing based on that moving image. The capture is performed by a range image sensor 2. The motion analysis device 1 constitutes the motion analysis system 100 together with the range image sensor 2, and analyzes the swing by analyzing image data, including depth images, acquired by the range image sensor 2. The results of the analysis by the motion analysis device 1 are used for various purposes, such as fitting a golf club 5 suited to the golfer 7, improving the form of the golfer 7, and developing golf equipment.

The swing is analyzed using a neural network 8 (see FIG. 3) that takes a depth image as input. The neural network 8 is a model for analyzing the swing and outputs a motion value that quantitatively represents the swing. The neural network 8 is constructed by prior learning. Below, each part of the motion analysis system 100 is described in detail, followed in order by the model configuration of the neural network 8, the motion analysis method based on the neural network 8, and the learning method of the neural network 8.

<1-2. Details of each part>
<1-2-1. Range image sensor>
The range image sensor 2 is a three-dimensional measurement camera that captures, as a two-dimensional image, the golfer 7 test-hitting with the golf club 5, and that also has a ranging function for measuring the distance to the subject. The range image sensor 2 can therefore output a depth image together with the two-dimensional image. The two-dimensional image referred to here is an image obtained by projecting the scene onto a plane orthogonal to the optical axis of the camera. The depth image is an image in which data on the depth of the subject along the optical axis of the camera (depth data) is assigned to pixels within substantially the same imaging range as the two-dimensional image.

The range image sensor 2 used in the present embodiment captures the two-dimensional image as an infrared image (hereinafter, IR image). The depth image is obtained by a method using infrared light, such as a time-of-flight method or a dot pattern projection method. Accordingly, as shown in FIG. 1, the range image sensor 2 has an IR light emitting unit 21 that emits infrared light forward, and an IR light receiving unit 22 that receives the infrared light emitted from the IR light emitting unit 21 and reflected back by the subject. The IR light receiving unit 22 is a camera having an optical system, an image sensor, and the like. In the dot pattern projection method, the infrared dot pattern emitted from the IR light emitting unit 21 is read by the IR light receiving unit 22, the dot pattern is detected by image processing inside the range image sensor 2, and the depth is calculated from it. In the present embodiment, the IR light emitting unit 21 and the IR light receiving unit 22 are housed in the same housing 20 and arranged at the front of the housing 20. In the present embodiment, the range image sensor 2 is installed in front of the golfer 7 so as to capture the golfer 7 from the front, with the IR light emitting unit 21 and the IR light receiving unit 22 directed toward the golfer 7.
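For orientation, the principle behind the time-of-flight method can be sketched as follows: the distance is half the round-trip path travelled by the light. This is a simplified pulse-based illustration, not a description of the internals of the range image sensor 2. Actual sensors may instead measure the phase shift of modulated light, and the function name `tof_distance_m` is hypothetical.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in vacuum


def tof_distance_m(round_trip_time_s):
    """Distance to the subject from the round-trip time of a light pulse.

    The light travels to the subject and back, so the one-way distance
    is half of (speed of light x elapsed time).
    """
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


# A round trip of 20 nanoseconds corresponds to roughly 3 m.
distance = tof_distance_m(20e-9)
```

Repeating this measurement for every pixel yields the depth image.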

In addition to a CPU 23 that controls the overall operation of the range image sensor 2, the range image sensor 2 has a built-in memory 24 that stores captured image data at least temporarily. The control program that controls the operation of the range image sensor 2 is stored in the memory 24. The range image sensor 2 also has a built-in communication unit 25, which can output the captured image data to an external device such as the motion analysis device 1 via a wired or wireless communication line 17. In the present embodiment, the CPU 23 and the memory 24 are housed in the housing 20 together with the IR light emitting unit 21 and the IR light receiving unit 22. Image data need not necessarily be transferred to the motion analysis device 1 via the communication unit 25. For example, if the memory 24 is removable, it can be taken out of the housing 20 and inserted into a reader of the motion analysis device 1 (corresponding to the communication unit 15 described later) so that the motion analysis device 1 can read the image data.

<1-2-2. Motion analysis device>
The configuration of the motion analysis device 1 will be described with reference to FIG. 2. In terms of hardware, the motion analysis device 1 is a general-purpose computer, realized for example as a desktop computer, a notebook computer, a tablet computer, or a smartphone. The motion analysis device 1 is produced by installing a motion analysis program 3 on a general-purpose computer from a computer-readable recording medium 30 such as a CD-ROM or USB memory, or via a network such as the Internet. The motion analysis program 3 is software for analyzing a golf swing based on the image data sent from the range image sensor 2, and causes the motion analysis device 1 to execute the operations described later.

The motion analysis device 1 includes a display unit 11, an input unit 12, a storage unit 13, a control unit 14, and a communication unit 15. These units 11 to 15 are connected to one another via a bus line 16 and can communicate with one another. The display unit 11 can be a liquid crystal display or the like, and displays the results of the golf swing analysis and other information to the user. The term "user" here collectively refers to anyone who needs the results of the golf swing analysis, such as the golfer 7, the golfer's instructor, or a developer of golf equipment. The input unit 12 can be a mouse, a keyboard, a touch panel, or the like, and accepts user operations on the motion analysis device 1.

The storage unit 13 can be a hard disk or the like. The storage unit 13 stores the motion analysis program 3 as well as the image data sent from the range image sensor 2. The storage unit 13 also stores information defining the neural network 8, which is trained in the learning processing described later and used in the motion analysis processing described later. The control unit 14 can be composed of a CPU, a ROM, a RAM, and the like. By reading and executing the motion analysis program 3 in the storage unit 13, the control unit 14 operates virtually as a first acquisition unit 14a, a second acquisition unit 14b, a derivation unit 14c, a learning unit 14d, and a display control unit 14e. The operation of each of the units 14a to 14e is described in detail later. The communication unit 15 functions as a communication interface that receives data from external devices such as the range image sensor 2 via the communication line 17.

<1-3. Model configuration of the neural network>
Next, the model configuration of the neural network 8 used in the motion analysis processing described later will be described with reference to FIG. 3. As noted above, the neural network 8 is a network that takes a depth image as input and outputs a motion value that quantitatively represents the swing. In the present embodiment, the neural network 8 quantitatively analyzes the rotation of the waist of the golfer 7 during the swing; more specifically, it quantitatively derives the rotation angle of the waist.

As shown in FIG. 3, the neural network 8 according to the present embodiment is a convolutional neural network and has an identification unit 81 and a feature extraction unit 82 connected to the input side of the identification unit 81. The identification unit 81 is a multilayer perceptron. The feature extraction unit 82 has a multilayer structure with a first intermediate layer 83 and a second intermediate layer 84, and extracts feature values from the depth image. The first intermediate layer 83 has a convolution layer 83A, a pooling layer 83B, and a normalization layer 83C; likewise, the second intermediate layer 84 has a convolution layer 84A, a pooling layer 84B, and a normalization layer 84C. In the neural network 8, therefore, convolution, pooling, and normalization are repeated multiple times as the data passes through the intermediate layers 83 and 84.

The neural network 8 according to the present embodiment also has a dropout layer 85 on the input side of the feature extraction unit 82. A general dropout layer is a layer trained with stochastically selected units of a multilayer network; the units other than the selected ones are invalidated, that is, treated as if they did not exist. This forcibly reduces the degrees of freedom of the network during learning and avoids overfitting. In contrast, the dropout layer 85 according to the present embodiment is a layer that removes, from the input image to be analyzed, regions that hinder the analysis. The input to the neural network 8 is a depth image. In the present embodiment, in the depth image input to the dropout layer 85, the pixels corresponding to the background and arm regions, which are known in advance to hinder the analysis, have been set to a predetermined pixel value ("0" in the present embodiment), as described later. The dropout layer 85 invalidates the units for the pixels having the predetermined pixel value "0" in the input depth image (64 × 32 pixels in the example of FIG. 3) and outputs an output image (64 × 32 pixels in the example of FIG. 3). In other words, the dropout layer 85 does not select the units to be invalidated stochastically, but selects and invalidates the regions known in advance to hinder the analysis. Specifically, the dropout layer 85 is a layer that determines the weights using the portions other than "0", and compensates by raising the weights of the filter response values that included "0".
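The text does not give the exact computation performed by the dropout layer 85. One possible reading, sketched here purely as an assumption, is a filter response that skips the invalidated "0" pixels and scales the partial sum up by the ratio of total to valid pixels. The function name `masked_response` and the scaling rule are hypothetical and are not taken from the patent.

```python
def masked_response(window, kernel, invalid=0):
    """Filter response over one window, skipping invalidated pixels.

    Assumed compensation rule (not from the patent): the partial sum over
    valid pixels is multiplied by (total pixels / valid pixels).
    """
    total = 0
    valid = 0
    acc = 0.0
    for row_w, row_k in zip(window, kernel):
        for x, h in zip(row_w, row_k):
            total += 1
            if x != invalid:
                valid += 1
                acc += x * h
    if valid == 0:
        return 0.0  # the whole window is invalidated: no response
    return acc * total / valid
```

Under this rule, a uniform window with one pixel masked out yields the same response as the fully valid window, which is the kind of compensation the paragraph describes.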

Thereafter, convolution processing with a number of weight filters G₁, G₂, ..., G_N is performed on the output image from the dropout layer 85 (N is an integer of 2 or more; N = 16 in the example of FIG. 3). As a result, N feature maps A₁, A₂, ..., A_N are generated. The feature maps A₁, A₂, ..., A_N are input to the N units constituting the convolution layer 83A, respectively. Specifically, the inner product of the input image and the weight filter G_n is calculated repeatedly in a raster scan, so that the weight filter G_n is convolved with the input image and the feature map A_n is obtained (n = 1, 2, ..., N). In the example of FIG. 3, the weight filter G_n is 5 × 5 pixels, and the size of the feature map A_n is 60 × 28 pixels.

The weight filters G₁, G₂, ..., G_N, also called filter kernels, are images (or arrays of values) much smaller than the depth image, and each is a filter for detecting and emphasizing a certain pattern (feature) contained in the input image. The feature map A_n is an image (or array of values) that responds to the feature of the weight filter G_n, that is, an image in which the feature of the weight filter G_n is emphasized in the input image. Here, let the size of the input image be H1 × H2 pixels, let the pixels of the input image be indexed by (i, j) (i = 0, 1, ..., H1-1; j = 0, 1, ..., H2-1), and let x_{i,j} denote the pixel value of pixel (i, j) of the input image. Further, let the size of the weight filter G_n be H × H pixels, let the pixels of the weight filter G_n be indexed by (p, q) (p = 0, 1, ..., H-1; q = 0, 1, ..., H-1), and let h_{p,q} denote the pixel value of pixel (p, q) of the weight filter G_n. The pixel value t_{i,j} of the feature map A_n, convolved at pixel (i, j) of the input image, can then be calculated as follows.
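The equation referred to here does not survive in this text. Given the definitions of x_{i,j} and h_{p,q}, and the fact that a 5 × 5 filter turns a 64 × 32 input into a 60 × 28 feature map, the calculation is presumably the standard unpadded form below. Whether the original also includes a bias term or an activation function cannot be determined from this text.

```latex
t_{i,j} = \sum_{p=0}^{H-1} \sum_{q=0}^{H-1} x_{i+p,\,j+q}\, h_{p,q},
\qquad i = 0, 1, \dots, H1 - H, \quad j = 0, 1, \dots, H2 - H
```

With H1 = 64, H2 = 32 and H = 5, the index ranges give 60 × 28 output pixels, matching the example of FIG. 3.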

Next, pooling processing is performed on each of the feature maps A₁, A₂, ..., A_N, and as a result, N feature maps B₁, B₂, ..., B_N are generated. The feature maps B₁, B₂, ..., B_N are input to the N units constituting the pooling layer 83B, respectively. Pooling is a process of converting the feature map A_n into a new feature map B_n by outputting a response value representing each small region contained in the feature map A_n. This pooling reduces the size of the feature map A_n. In addition, because pooling aggregates the many pixel values contained in a small region of the input image A_n into a single response value, positional sensitivity is slightly reduced in the output image B_n. Therefore, even if the position of the feature to be detected shifts slightly in the depth image, the shift is absorbed, and the output image B_n after pooling remains nearly constant.

More specifically, in pooling, the input image A_n is divided into small regions, and one pixel value serving as the response value is determined from the pixel values contained in each small region. In the example of FIG. 3, the feature map A_n is divided into small regions of 2 × 2 pixels and reduced to 1/2 size. Accordingly, the size of the feature map B_n is 30 × 14 pixels in the example of FIG. 3. Various methods of determining the response value are possible: for example, the average of the pixel values in the small region can be used as the response value (average pooling), or the maximum can be used (max pooling). Alternatively, as in the method called Lp pooling, the response value can be determined so that large pixel values in the small region have a large influence while the influence of small pixel values is retained to some extent.
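The three pooling variants named above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function name `pool2x2` is hypothetical, and the Lp form used here, the p-th root of the mean of p-th powers, is one common definition that the text does not spell out.

```python
def pool2x2(feature_map, mode="max", p=2.0):
    """Reduce each non-overlapping 2 x 2 block to one response value."""
    out = []
    for i in range(0, len(feature_map) - 1, 2):
        row = []
        for j in range(0, len(feature_map[0]) - 1, 2):
            block = [feature_map[i + a][j + b] for a in (0, 1) for b in (0, 1)]
            if mode == "max":
                row.append(max(block))
            elif mode == "mean":
                row.append(sum(block) / 4.0)
            else:  # "lp": large values dominate, small values still count
                row.append((sum(abs(v) ** p for v in block) / 4.0) ** (1.0 / p))
        out.append(row)
    return out
```

Applied to a 60 × 28 map, the function returns a 30 × 14 map, in line with the sizes given for FIG. 3.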

Subsequently, each of the feature maps B₁, B₂, ..., B_N is normalized, and as a result, N feature maps C₁, C₂, ..., C_N are generated. The feature maps C₁, C₂, ..., C_N are input to the N units constituting the normalization layer 83C, respectively. The normalization here is local contrast normalization; in the present embodiment, subtractive normalization is performed. This normalization detects pixel values in the input image B_n that differ greatly from the surrounding pixel values, and emphasizes those pixel values in the output image C_n.
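Subtractive normalization can be illustrated by subtracting a local mean from each pixel. The sketch below assumes a uniform 3 × 3 neighbourhood that is clipped at the image border; the patent does not state the window shape or weighting (a Gaussian-weighted mean is also common), and the name `subtractive_normalize` is hypothetical.

```python
def subtractive_normalize(feature_map, radius=1):
    """Subtract from each pixel the mean of its (2*radius+1)^2 neighbourhood."""
    rows, cols = len(feature_map), len(feature_map[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Neighbourhood clipped to the image, so the output keeps the input size.
            neigh = [feature_map[a][b]
                     for a in range(max(0, i - radius), min(rows, i + radius + 1))
                     for b in range(max(0, j - radius), min(cols, j + radius + 1))]
            row.append(feature_map[i][j] - sum(neigh) / len(neigh))
        out.append(row)
    return out
```

A flat region maps to zero, while a pixel that stands out from its surroundings keeps a large value, which is the emphasis effect described above.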

This completes the processing in the first intermediate layer 83; processing then moves to the second intermediate layer 84, where a second convolution is performed. In the second convolution, a predetermined number of feature maps (four in the present embodiment) are randomly selected from the N feature maps C₁, C₂, ..., C_N output from the first intermediate layer 83 to form one set, and M such sets are created (M = 256 in the example of FIG. 3). In addition, a number of weight filters L₁, L₂, ..., L_R are newly prepared (R is an integer of 2 or more; R = 16 in the present embodiment). From these weight filters L₁, L₂, ..., L_R, the same number of filters as the number of feature maps in one set (four in the present embodiment) are randomly selected to form one set, and M such sets are created. The sets of weight filters are associated one-to-one with the sets of feature maps selected in this way, and convolution is performed. More specifically, the four feature maps in a given set are associated one-to-one with the four weight filters in the corresponding set, and convolution is performed according to this correspondence. In the present embodiment, each weight filter L_r is 5 × 5 pixels (r = 1, 2, ..., R). The convolution method is the same as in the first intermediate layer 83. The feature maps convolved with the weight filters L_r are then averaged within each set, and as a result, M feature maps D₁, D₂, ..., D_M are generated. The feature maps D₁, D₂, ..., D_M are input to the M units constituting the convolution layer 84A, respectively. The size of each feature map D_m (m = 1, 2, ..., M) is 26 × 10 pixels in the example of FIG. 3.
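The grouping scheme of the second convolution can be sketched as follows. The sketch assumes that the four maps (and the four filters) in a set are distinct and that convolution is unpadded; the function names, the `seed` parameter, and the use of `random.sample` are illustrative choices, not details from the patent.

```python
import random

def convolve_valid(image, kernel):
    """Unpadded 2-D convolution as a raster scan of inner products."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + p][j + q] * kernel[p][q]
                 for p in range(kh) for q in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

def second_convolution(maps, filters, group_size=4, num_groups=256, seed=0):
    """Build num_groups outputs, each the mean of group_size map/filter pairs."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(num_groups):
        chosen_maps = rng.sample(maps, group_size)        # one set of feature maps
        chosen_filters = rng.sample(filters, group_size)  # the matching filter set
        convolved = [convolve_valid(m, f)
                     for m, f in zip(chosen_maps, chosen_filters)]
        rows, cols = len(convolved[0]), len(convolved[0][0])
        # Average the convolved maps of the set into one feature map D_m.
        outputs.append([[sum(c[i][j] for c in convolved) / group_size
                         for j in range(cols)] for i in range(rows)])
    return outputs
```

With sixteen 30 × 14 input maps and sixteen 5 × 5 filters, each output map is 26 × 10, as stated for FIG. 3. Running all 256 sets in pure Python takes a few seconds, so a smaller `num_groups` is convenient for experimentation.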

Thereafter, the same pooling as in the first intermediate layer 83 is performed on each of the feature maps D₁, D₂, ..., D_M, and as a result, M feature maps E₁, E₂, ..., E_M are generated. The feature maps E₁, E₂, ..., E_M are input to the M units constituting the pooling layer 84B, respectively. Subsequently, the same normalization as in the first intermediate layer 83 is performed on each of the feature maps E₁, E₂, ..., E_M, and as a result, M feature maps F₁, F₂, ..., F_M are generated. The feature maps F₁, F₂, ..., F_M are input to the M units constituting the normalization layer 84C, respectively. The size of the feature maps E_m and F_m is 13 × 5 pixels in the example of FIG. 3. The feature maps F₁, F₂, ..., F_M are the final output images of the feature extraction unit 82 and are input to the identification unit 81.

The identification unit 81 has an input layer 86, an intermediate layer 87, and an output layer 88, and these layers 86 to 88 constitute fully connected layers. All the pixel values U₁, U₂, ..., U_I1 contained in the output images F₁, F₂, ..., F_M are input to the I1 input units constituting the input layer 86, respectively. In the example of FIG. 3, I1 = 13 × 5 (number of pixels) × 256 (number of images) = 16640.

The intermediate layer 87 is composed of I2 intermediate units; in the example of FIG. 3, I2 = 1000. A value V_i calculated from the values U₁, U₂, ..., U_I1 held in the input units is input to the i-th intermediate unit (i = 1, 2, ..., I2). V_i is calculated according to the following equation, in which u_{i,1}, u_{i,2}, ..., u_{i,I1} are weight coefficients and b_i is a bias.
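The equation itself is not reproduced in this text. From the stated roles of the weight coefficients u_{i,1}, ..., u_{i,I1} and the bias b_i, it is presumably the weighted sum below. Whether the original applies an activation function to this sum cannot be determined from this text, so none is shown.

```latex
V_i = \sum_{j=1}^{I1} u_{i,j}\, U_j + b_i, \qquad i = 1, 2, \dots, I2
```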

The output layer 88 is composed of I3 output units; in the present embodiment, I3 = 1. A value W₁ calculated from the values V₁, V₂, ..., V_I2 held in the intermediate units is input to the output unit; in the present embodiment, this is a motion value W₁ that quantitatively represents the rotation angle of the waist of the golfer 7. W₁ is calculated according to the following equation, in which v₁, v₂, ..., v_I2 are weight coefficients and b is a bias.
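The equation is not reproduced in this text; from the stated roles of v₁, ..., v_I2 and b, it is presumably W₁ = v₁V₁ + v₂V₂ + ... + v_I2 V_I2 + b. The sketch below chains this with the intermediate-layer sum to show the data flow through the identification unit 81. The function names are hypothetical, and the absence of an activation function is an assumption.

```python
def affine(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, for a single unit."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def identify(U, u, b_hidden, v, b_out):
    """Map input values U to the motion value W1.

    u[i] holds the weight coefficients u_{i,1..I1} of intermediate unit i,
    b_hidden[i] is its bias b_i, v holds v_1..v_I2, and b_out is the bias b.
    No activation function is applied (an assumption, see the lead-in).
    """
    V = [affine(U, u_i, b_i) for u_i, b_i in zip(u, b_hidden)]
    return affine(V, v, b_out)
```

In the example of FIG. 3, `U` would have 16640 entries and `u` would have 1000 rows; the toy sizes used when trying the sketch can be much smaller.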

<1-4. Motion analysis processing based on the neural network>
The motion analysis processing of a golf swing is described below with reference to FIG. 4. As already stated, in the present embodiment, a motion value W₁ that quantitatively represents the rotation angle of the waist of the golfer 7 during the swing motion is derived based on the neural network 8. The image data to be analyzed is a moving image. Accordingly, the IR images and the depth images may hereinafter be referred to as IR frames and depth frames, respectively, or simply as frames.

In preparation for the motion analysis processing, the golfer 7 is first asked to make a trial swing with the golf club 5, and the swing is captured as a moving image by the distance image sensor 2. The time-series IR frames and depth frames captured by the distance image sensor 2 are sent from the distance image sensor 2 to the motion analysis device 1. In the motion analysis device 1, the first acquisition unit 14a acquires the time-series IR frames and depth frames from the distance image sensor 2 and stores them in the storage unit 13 (step S1).

In step S1, the distance image sensor 2 also measures time-series skeleton data representing the skeleton of the body of the golfer 7, and this skeleton data is sent from the distance image sensor 2 to the motion analysis device 1. Skeleton data is data representing the positions (three-dimensional coordinates) of the major joints of the human body, and can be derived from depth frames. Kinect (registered trademark), which is one type of distance image sensor, has a function of deriving skeleton data from depth frames and outputting it together with the depth frames. The first acquisition unit 14a also acquires the time-series skeleton data from the distance image sensor 2 and stores it in the storage unit 13. When a distance image sensor 2 that does not output skeleton data is used, the first acquisition unit 14a may derive the skeleton data from the depth frames. Specifically, the first acquisition unit 14a reads the time-series depth frames during the swing motion stored in the storage unit 13 and, based on these frames, acquires the skeleton data at each timing during the swing motion. A library for deriving skeleton data from depth images is provided for Kinect (registered trademark), and it can be used to acquire the skeleton data in this case.

Subsequently, the derivation unit 14c normalizes the depth frame at each timing during the swing motion (step S2). The normalization here is a process of rescaling the gradation of the depth frame to match the depth of the subject including the golfer 7. Specifically, the derivation unit 14c reads the time-series depth frames during the swing motion stored in the storage unit 13. The depth data, which are the pixel values of the depth frames, have a gradation that follows the specification of the distance image sensor 2; in the present embodiment, 16 bits are assigned to each pixel, and each pixel takes a value from 0 to 65535. The imaging range of the distance image sensor 2 in the depth direction is also determined by the specification of the distance image sensor 2. On the other hand, depth data for anything other than the golfer 7 is not particularly needed to estimate the rotation angle of the waist of the golfer 7. Therefore, the gradation of the depth frame is rescaled so that the pixel values contained in the region of the depth frame that captures the golfer 7 (hereinafter referred to as the person region) take values in the range of 0 to 65535.

The installation position of the distance image sensor 2 and the standing position of the golfer 7 are fixed. Accordingly, the range of values that the depth data can take in the person region (hereinafter referred to as the person depth range) is set in advance. In the present embodiment, the derivation unit 14c rescales the pixel value (depth data) of each pixel in the depth frame according to the following equation. Here, the person depth range is min_z to max_z, z on the right-hand side is the pixel value of each pixel in the depth frame, and z on the left-hand side is the pixel value after conversion.
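The equation is not reproduced in this text. A linear mapping consistent with the description would stretch the person depth range min_z to max_z onto 0 to 65535 and assign 0 to values outside that range. The sketch below implements that reading; the function name `rescale_depth`, the rounding, and the handling of out-of-range values are assumptions.

```python
def rescale_depth(z, min_z, max_z, levels=65535):
    """Map a raw depth value in [min_z, max_z] onto 0..levels, else 0."""
    if z < min_z or z > max_z:
        return 0  # outside the person depth range: treated as background
    return round((z - min_z) * levels / (max_z - min_z))
```

For a person depth range of 1000 to 2000 (in the sensor's raw units, chosen only for illustration), a raw value of 2000 maps to 65535 and a raw value of 500 maps to 0.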

The scale conversion described above is a process of extracting, from the depth frame, the region whose pixel values lie within the person depth range. FIG. 5 shows a depth frame at a particular timing after this scale conversion. As can be seen from the figure, in the depth frame after the scale conversion, the pixel value "0" (black) is given to the region that mainly captures things other than the golfer 7, that is, the background region. As a result, the person region is extracted by the scale conversion. FIG. 6 shows the skeleton data measured by Kinect (registered trademark) at the timing corresponding to FIG. 5.

Subsequently, based on the skeleton data acquired in step S1, the derivation unit 14c extracts, from the depth frames normalized in step S2, the region near the waist of the golfer 7 (hereinafter referred to as the waist region) at each timing during the swing motion (step S3). Specifically, the derivation unit 14c acquires the coordinates of the waist within the depth frame from the skeleton data. It then cuts out, from the depth frame, a region of a predetermined size referenced to these waist coordinates as the waist region. FIG. 7 shows time-series images of the waist region during the swing motion, cut out from depth frames such as that of FIG. 5. In this example, the image of FIG. 5 is 512 × 214 pixels, and each image of FIG. 7 is a 128 × 64 pixel image centered or substantially centered on the waist.

The movement of the waist, which is the movement of interest, is independent of the movements of other parts such as the shoulders, legs, and arms. Consequently, if the neural network is trained on images showing the entire human body, the accuracy of detecting features corresponding to the appearance of the waist may decrease. Moreover, if the image is too large, analysis by the neural network can become difficult. In step S3, the waist region is extracted from the depth frames to be analyzed so that the features of the waist motion can be detected accurately based on the neural network.

Subsequently, based on the skeleton data acquired in step S1, the derivation unit 14c removes, from the images of the waist region acquired in step S3, the region representing the arms of the golfer 7 (hereinafter referred to as the arm region) at each timing during the swing motion (step S4). Specifically, the derivation unit 14c extracts the arm region from the image of the waist region and sets the pixel values of the pixels contained in the arm region to a predetermined pixel value (0 in the present embodiment). The arm region is extracted based on the three-dimensional coordinates of three joints contained in the skeleton data (in the present embodiment, the left shoulder, the right shoulder, and the wrist of the left hand). More specifically, the derivation unit 14c derives a plane K passing through the three-dimensional coordinates of these three joints, and derives the perpendicular distance d from an arbitrary point (pixel) in the image of the waist region to the plane K. The three-dimensional coordinates of the arbitrary point (pixel) are obtained from the image of the waist region, which is a depth image. The plane K and the perpendicular distance d can be obtained by geometric calculation. Next, the derivation unit 14c determines whether the perpendicular distance d falls within a predetermined range. If it does, the point (pixel) corresponding to that perpendicular distance d is determined to belong to the arm region; if it does not, the point is determined not to belong to the arm region. FIG. 8 shows images obtained by removing the arm region from the time-series images of the waist region during the swing motion shown in FIG. 7. In FIG. 8, the pixel values of the pixels in the arm region have been converted to "0" (black).
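The geometric calculation in step S4 can be sketched as follows. The sketch assumes that the "predetermined range" means a distance of at most `max_distance` from the plane K and that the three joints are not collinear; the function names and the threshold value are hypothetical.

```python
def plane_through(p1, p2, p3):
    """Return (unit normal n, offset c) of the plane n . x = c through 3 points."""
    ux, uy, uz = (p2[k] - p1[k] for k in range(3))
    vx, vy, vz = (p3[k] - p1[k] for k in range(3))
    # The cross product of two in-plane vectors is normal to the plane.
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = sum(c * c for c in n) ** 0.5  # zero if the points are collinear
    n = tuple(c / length for c in n)
    return n, sum(a * b for a, b in zip(n, p1))

def distance_to_plane(point, plane):
    """Perpendicular distance d from a 3-D point to the plane."""
    n, c = plane
    return abs(sum(a * b for a, b in zip(n, point)) - c)

def remove_arm(points, pixels, plane, max_distance):
    """Set to 0 every pixel whose 3-D point lies within max_distance of the plane."""
    return [0 if distance_to_plane(pt, plane) <= max_distance else px
            for pt, px in zip(points, pixels)]
```

Here `points` holds the three-dimensional coordinates recovered for each pixel of the waist-region image and `pixels` holds the corresponding pixel values in the same order.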

Step S4 is a process for ensuring that the arm region is invalidated in the dropout layer 85 when the image is input to the neural network 8 in step S5, described later. That is, step S4 is a step of removing regions that hinder the analysis from the depth frames to be analyzed based on the neural network 8. In the present embodiment, a region that hinders the analysis, namely the background region, is also removed in step S2.

In the subsequent step S5, the derivation unit 14c sequentially inputs the time-series images of the waist region during the swing motion acquired in step S4 into the neural network 8. As a result, the output unit of the neural network 8 sequentially outputs the motion value W₁ that quantitatively represents the swing motion, which in the present embodiment is the rotation angle W₁ of the waist of the golfer 7. In the present embodiment, however, the 128 × 64 pixel images obtained through step S4 are reduced to a size of 64 × 32 pixels before being input to the neural network 8.

Subsequently, the derivation unit 14c smooths and interpolates the time-series data of the rotation angle W₁ derived in step S5 (step S6). FIG. 9 shows an example in which the data were smoothed with a 5-point moving average and then spline-interpolated to convert data at 33 ms intervals into data at 1 ms intervals. In this way, smoothed and interpolated time-series data of the rotation angle W₁ are obtained.
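Step S6 can be sketched with the standard library alone. The text does not say which spline is used, so the sketch substitutes a uniform Catmull-Rom spline, which is a cubic interpolating spline; the shrinking window at the ends of the moving average and the clamped end points of the spline are likewise assumptions. Both function names are hypothetical.

```python
def moving_average(values, window=5):
    """Centred moving average; the window shrinks near both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half):i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def catmull_rom_upsample(values, factor=33):
    """Insert factor-1 interpolated points between each pair of uniform samples."""
    out = []
    n = len(values)
    for i in range(n - 1):
        p0 = values[max(i - 1, 0)]        # clamp at the first sample
        p1, p2 = values[i], values[i + 1]
        p3 = values[min(i + 2, n - 1)]    # clamp at the last sample
        for k in range(factor):
            t = k / factor
            out.append(0.5 * (2 * p1 + (p2 - p0) * t
                              + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t ** 2
                              + (3 * p1 - p0 - 3 * p2 + p3) * t ** 3))
    out.append(values[-1])
    return out
```

With `factor=33`, samples spaced 33 ms apart become samples spaced 1 ms apart, and the interpolated curve still passes through every smoothed sample.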

Thereafter, the display control unit 14e causes the display unit 11 to display the time-series rotation angle W₁ during the swing motion derived in step S6 and its change over time, as well as graphs of these such as those shown in FIG. 9 (step S7). This allows the user to grasp the rotational motion of the waist of the golfer 7.

<1-5. Neural network learning method>
Next, the learning method of the neural network 8 is described with reference to FIG. 10. In the following, the learning data sets for constructing the neural network 8 are described first, followed by the flow of the learning process based on those data sets.

<1-5-1. Learning data sets>
A learning data set is a pair consisting of the depth frame and skeleton data captured by the distance image sensor 2 and the rotation angle (true value) of the waist of the golfer 7 at the timing at which that depth frame and skeleton data were captured; a large number of such learning data sets are collected. The rotation angle (true value) contained in a learning data set serves as the teacher signal when the neural network 8 is trained.

In the present embodiment, the teacher signal for the rotation angle of the waist of the golfer 7 is acquired by an angular velocity sensor 4 (see FIGS. 1 and 2) attached to the waist of the golfer 7. The angular velocity data measured by the angular velocity sensor 4 are output from the angular velocity sensor 4 to the motion analysis device 1 via a wired or wireless communication line.

The teacher signal is required when training the neural network 8, but is not particularly required when performing motion analysis based on the neural network 8. Accordingly, once the neural network 8 has been trained and saved in the storage unit 13, the angular velocity sensor 4 can be omitted from the motion analysis system 100. The motion analysis system 100 and the model construction system that constructs the model of the neural network 8 can also be implemented on different hardware. That is, the motion analysis system 100 may acquire a neural network 8 trained on another system and use it for analysis. In the present embodiment, however, the motion analysis system 100 also serves as the model construction system that constructs the model of the neural network 8.

<1-5-2. Flow of the learning process>
Next, the learning process of the neural network 8 is described with reference to FIG. 10. First, to acquire the learning data sets, a golfer 7 with the angular velocity sensor 4 attached to the waist is asked to make trial swings with the golf club 5, and the swings are captured as moving images by the distance image sensor 2. Preferably, to improve the effect of learning, a large number of swing motions are performed by a plurality of golfers 7. Then, as in step S1, the first acquisition unit 14a acquires the time-series IR frames, depth frames, and skeleton data sent from the distance image sensor 2 and stores them in the storage unit 13 (step S21). In step S21, the time-series angular velocity data measured by the angular velocity sensor 4 during the swing motion are also transmitted to the motion analysis device 1. The second acquisition unit 14b acquires these time-series angular velocity data, converts them into data representing the time-series rotation angle of the waist (waist rotation angle data), and stores the result in the storage unit 13. The storage unit 13 thus holds the IR frames, depth frames, skeleton data, and rotation angle data corresponding to the large number of swing motions. At this time, the IR frames, depth frames, and skeleton data are synchronized with the rotation angle data, and data for the same timing are stored in the storage unit 13 in association with one another.
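The text does not say how the second acquisition unit 14b converts angular velocity into rotation angle. A natural reading is numerical integration over time; the sketch below uses the trapezoidal rule with a fixed sampling interval. The function name, the zero initial angle, and the choice of integration rule are assumptions.

```python
def integrate_angular_velocity(omega, dt, initial_angle=0.0):
    """Trapezoidal integration of angular velocity samples into angles.

    omega: angular velocity samples (e.g. degrees per second)
    dt:    sampling interval in seconds
    """
    angles = [initial_angle]
    for prev, cur in zip(omega, omega[1:]):
        angles.append(angles[-1] + 0.5 * (prev + cur) * dt)
    return angles
```

A constant angular velocity of 10 degrees per second sampled every 0.5 s gives angles of 0, 5, and 10 degrees for three samples.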

Subsequently, the learning unit 14d executes steps S22 to S24 on the time-series depth frames corresponding to the large number of swing motions acquired in step S21. Steps S22 to S24 cut out, from the depth frames acquired in step S21, the images of the waist region such as those in FIG. 8 that are to be input to the neural network 8. Because steps S22 to S24 are the same as steps S2 to S4 described above, a detailed description is omitted here.

In the subsequent step S25, the learning unit 14d trains the neural network 8 using the images of the waist region acquired in step S24 as input and the waist rotation angle data acquired in step S21 as the teacher signal. More specifically, the learning unit 14d inputs an image of the waist region into the current neural network 8, obtains the rotation angle W₁ as the output value, and updates the parameters of the neural network 8 so as to minimize the error between this rotation angle W₁ and the waist rotation angle data. The parameters to be learned here include the weight filters G₁, G₂, ..., G_N and L₁, L₂, ..., L_R, the weight coefficients u_{i,1}, u_{i,2}, ..., u_{i,I1}, the weight coefficients v₁, v₂, ..., v_I2, and the biases b_i and b described above. In this way, the neural network 8 is optimized while the learning data sets are applied one after another. Various supervised learning methods for neural networks are known, so a detailed description is omitted here; for example, stochastic gradient descent using error backpropagation can be used. This completes the learning process.
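As a minimal illustration of the update rule, the sketch below applies stochastic gradient descent to the output unit alone, using the squared error between W₁ and the teacher signal. A full implementation would backpropagate the same error through the intermediate layer and the weight filters. The loss function, the learning rate, and the function name `sgd_step` are assumptions.

```python
def sgd_step(v, b, V, target, lr=0.01):
    """One SGD update of W1 = sum(v_i * V_i) + b on the loss 0.5 * (W1 - target)^2.

    Returns the updated weights, the updated bias, and the error before the update.
    """
    error = sum(vi * Vi for vi, Vi in zip(v, V)) + b - target
    # Gradient of the loss: d/dv_i = error * V_i, d/db = error.
    new_v = [vi - lr * error * Vi for vi, Vi in zip(v, V)]
    return new_v, b - lr * error, error
```

Calling `sgd_step` repeatedly with the returned weights and bias drives the error toward zero for a fixed training pair, provided the learning rate is small enough for the magnitude of the inputs.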

<2. Second Embodiment>
FIGS. 11 and 12 show the overall configuration of the motion analysis system 101 according to the present embodiment. As is clear from comparing these figures with FIGS. 1 and 2, respectively, the motion analysis system 101 according to the second embodiment has much in common with the motion analysis system 100 according to the first embodiment. In the following, for simplicity, elements common to the first embodiment are given the same reference numerals, and the description focuses on the differences from the first embodiment.

The main difference between the first and second embodiments is as follows. In the first embodiment, the rotation of the waist of the golfer 7 during the swing motion is analyzed quantitatively. In the second embodiment, the weight shift of the body of the golfer 7 and the rotation of the shoulders during the swing motion are analyzed quantitatively as well. In the present embodiment, the neural network 108 shown in FIG. 13 quantitatively derives, as a motion value that quantitatively represents the motion of the golfer 7, the amount of movement of the center of gravity, which represents the weight shift of the body of the golfer 7. The amount of movement of the center of gravity is a motion value representing the position of the center of gravity. In addition, the neural network 208 shown in FIG. 14 quantitatively derives the rotation angle of the shoulders as a motion value that quantitatively represents the motion of the golfer 7. Like the neural network 8, the neural networks 108 and 208 take a depth image as input. The model configuration of the neural networks 108 and 208, the motion analysis method based on them, and their learning method are described below in that order.

<2-1. Neural network model configuration>
As is clear from comparing FIGS. 3 and 13, the neural networks 8 and 108 have similar configurations. The main difference between the two is that, in the neural network 108, the amount of movement of the center of gravity of the body of the golfer 7 is evaluated two-dimensionally, so the output layer 88 consists of two output units. The motion value W₁₀₁, which quantitatively represents the amount of movement of the center of gravity in the X direction (the direction of the flight line of the golf ball), is input to one output unit, and the motion value W₁₀₂, which quantitatively represents the amount of movement of the center of gravity in the Y direction (the direction from the back of the golfer 7 toward the abdomen), is input to the other output unit. W₁₀₁ and W₁₀₂ are calculated according to the following formula, in which v_{1,1}, v_{1,2}, ..., v_{1,I2} and v_{2,1}, v_{2,2}, ..., v_{2,I2} are weight coefficients and b₁₀₁ and b₁₀₂ are biases.
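The formula itself is not reproduced in this text. Based only on the symbols defined above, a plausible form is a weighted sum over the I2 units of the preceding layer plus a bias, one sum per output unit. The symbol y_i for the output of the i-th unit of the preceding layer is a hypothetical name introduced here, and any activation function applied in the original formula is not shown.

```latex
W_{101} = \sum_{i=1}^{I_2} v_{1,i}\, y_i + b_{101}, \qquad
W_{102} = \sum_{i=1}^{I_2} v_{2,i}\, y_i + b_{102}
```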

The neural network 108 does not have the dropout layer 85, but it can of course be configured to have one. In addition, the sizes of the feature maps, the number of units forming each layer, and the like are changed as appropriate.

As is clear from comparing FIGS. 3 and 14, the neural networks 8 and 208 also have similar configurations. The main difference between the two is that the neural network 208 does not have the dropout layer 85, although it can of course be configured to have one. The output layer 88 of the neural network 208 consists of one output unit. The motion value W₂₀₁, which quantitatively represents the rotation angle of the shoulders of the golfer 7, is input to this output unit. Like W₁, W₂₀₁ is calculated according to Equation 3. In the descriptions of the neural networks 8, 108, and 208 for deriving the waist rotation angle, the shoulder rotation angle, and the amount of movement of the center of gravity, the same symbols, such as v_i and b_i, are often reused. Because the quantity finally output and the input image data differ between the networks, the specific values these symbols represent naturally differ as well.

The configurations of the neural networks 8, 108, and 208 disclosed in the first and second embodiments are examples. The number of units forming each layer of each neural network, the number and size of the feature maps, the number and size of the filters, and the like can therefore be changed as appropriate.

<2-2. Motion analysis processing based on the neural networks>
Next, the motion analysis processing of a golf swing according to the second embodiment is described with reference to FIG. 15. As already stated, the motion value W₁, which quantitatively represents the rotation angle of the waist of the golfer 7 during the swing motion, is derived based on the neural network 8, and the motion values W₁₀₁ and W₁₀₂, which quantitatively represent the amount of movement of the center of gravity of the body of the golfer 7 during the swing motion, are derived based on the neural network 108. Furthermore, the motion value W₂₀₁, which quantitatively represents the rotation angle of the shoulders of the golfer 7 during the swing motion, is derived based on the neural network 208. The processing of FIG. 15 includes the processing of FIG. 4, and the new steps S102, S103, S105, S106, S203, S205, and S206 are similar to some of the steps of FIG. 4. The following description therefore refers to the description of FIG. 4 and focuses on the differences between the two processes.

First, steps S1 and S2 are executed as in the processing of FIG. 4. That is, the distance image sensor 2 captures the golfer 7 swinging the golf club 5, and time-series IR frames, depth frames, and skeleton data are acquired and stored in the storage unit 13 of the motion analysis device 1. In step S1 of the present embodiment, the distance image sensor 2 also creates a person region image representing the range occupied by the body of the golfer 7 within the imaging range of the depth frame, and sends this person region image to the motion analysis device 1. The person region image is a binary image, as shown in FIG. 16A, that distinguishes the region occupied by the person from all other regions. The person region image can be derived from the depth frame, and Kinect (registered trademark), which is one such distance image sensor, has a function of deriving the person region image from the depth frame and outputting it together with the depth frame. The first acquisition unit 14a also acquires the time-series person region images from the distance image sensor 2 and stores them in the storage unit 13. When a distance image sensor 2 that does not output a person region image is used, the first acquisition unit 14a may derive it from the depth frames. Specifically, the first acquisition unit 14a reads the time-series depth frames of the swing motion stored in the storage unit 13 and, based on these frames, creates the person region image at each timing during the swing motion.

In the following step S2, the depth frames are normalized. When step S2 is finished, steps S3 to S6 are executed as in the processing of FIG. 4. As a result, the time-series waist rotation angle W₁ is calculated, and smoothing and spline interpolation are applied to it. In parallel with steps S3 to S6, steps S102, S103, S105, and S106 for deriving the time-series amounts of movement of the center of gravity W₁₀₁ and W₁₀₂, and steps S203, S205, and S206 for deriving the time-series shoulder rotation angle W₂₀₁, are executed. These three groups of steps need not be executed in parallel; for example, they may be executed one after another in any suitable order.

First, steps S102, S103, S105, and S106 for deriving the amounts of movement of the center of gravity W₁₀₁ and W₁₀₂ are described. In step S102, the derivation unit 14c creates, from the depth frame normalized in step S2 at each timing during the swing motion, an image in which only the region occupied by the golfer 7 is cut out (hereinafter, the cut-out image). As described above, the normalized depth frame is an image that mainly contains only the depth information of the golfer 7, but strictly speaking it also contains depth information of the ground near the feet of the golfer 7. To remove this ground information, step S102 uses the person region image acquired in step S1 to cut out only the region occupied by the golfer 7 from the depth frame normalized in step S2. FIG. 16B shows the cut-out image created from the normalized depth frame shown in FIG. 5 using the person region image of FIG. 16A.
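The cut-out of step S102 amounts to masking the depth frame with the binary person region image. The sketch below assumes both images are same-sized lists of rows and that pixels outside the person are set to a background value of 0; the function name `cut_out_person` and the choice of background value are illustrative and do not come from the source.

```python
def cut_out_person(depth, mask, background=0):
    """Keep depth values where the binary mask is non-zero; blank out the rest.

    depth, mask: lists of rows of equal size (assumed layout).
    """
    return [
        [d if m else background for d, m in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(depth, mask)
    ]
```

Ground pixels near the feet have a zero mask value, so their depth values are replaced by the background value.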

In the following step S103, the derivation unit 14c extracts a region in the vicinity of the golfer 7 (hereinafter, the golfer region) from the cut-out image, based on the skeleton data acquired in step S1. Specifically, the derivation unit 14c obtains the coordinates of the solar plexus within the depth frame from the skeleton data. Then, from the depth frame, it cuts out a region of a predetermined size referenced to the coordinates of the solar plexus as the golfer region. The left image of FIG. 17 shows an example of how the golfer region is set with reference to the solar plexus, and the right image of FIG. 17 shows the golfer region cut out from the cut-out image of FIG. 16B according to this example.
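Cutting out a fixed-size region around a reference joint can be sketched as follows. The source only says that the region has a predetermined size referenced to the joint coordinates; centering the window on the joint and padding with a fill value where the window leaves the frame are assumptions made here, and `crop_around` is a hypothetical name.

```python
def crop_around(image, center_row, center_col, height, width, fill=0):
    """Return a height x width window of image centered on (center_row, center_col).

    Pixels of the window that fall outside the image are set to fill.
    """
    rows, cols = len(image), len(image[0])
    top = center_row - height // 2
    left = center_col - width // 2
    window = []
    for r in range(top, top + height):
        row = []
        for c in range(left, left + width):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else fill)
        window.append(row)
    return window
```

The same helper would apply to any other region that is defined relative to a joint taken from the skeleton data.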

Information about the ground can be unnecessary when evaluating the position of the center of gravity of the body. Training a neural network on images that contain depth information of the ground may therefore reduce the accuracy with which the center of gravity is detected. In addition, if the image is too large, analysis by the neural network can become difficult. Steps S102 and S103 extract the golfer region from the depth frame to be analyzed so that the position of the center of gravity of the body can be detected accurately based on the neural network.

In the following step S105, the derivation unit 14c sequentially inputs the time-series images of the golfer region during the swing motion, acquired in step S103, to the neural network 108. As a result, the output units of the neural network 108 sequentially output the time-series amounts of movement of the center of gravity W₁₀₁ and W₁₀₂ of the body of the golfer 7 during the swing motion. In the present embodiment, however, the 348 × 192 pixel image obtained by step S103 is reduced to 116 × 64 pixels by nearest-neighbor interpolation or the like before being input to the neural network 108.
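The reduction from 348 × 192 to 116 × 64 pixels is a factor of three along each axis. A minimal nearest-neighbor sketch is given below; it assumes the image is a list of rows and picks, for each output pixel, the source pixel nearest to the output pixel's center. The source does not state which dimension is the height, so the function simply takes the target numbers of rows and columns, and the name `resize_nearest` is illustrative.

```python
def resize_nearest(image, out_rows, out_cols):
    """Resize a list-of-rows image by nearest-neighbor sampling at pixel centers."""
    in_rows, in_cols = len(image), len(image[0])
    return [
        [
            # Integer form of floor((k + 0.5) * in_size / out_size)
            image[((2 * r + 1) * in_rows) // (2 * out_rows)]
                 [((2 * c + 1) * in_cols) // (2 * out_cols)]
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]
```

With an integer factor of three, every output pixel takes the value of the middle pixel of the corresponding 3 × 3 block, with no averaging.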

Subsequently, the derivation unit 14c smooths and interpolates the time-series data of the amounts of movement of the center of gravity W₁₀₁ and W₁₀₂ derived in step S105 (step S106). FIGS. 18A and 18B show examples in which the time-series data of W₁₀₁ and W₁₀₂, respectively, were smoothed by a 5-point moving average and then spline-interpolated to convert the data from 33 ms intervals to 1 ms intervals. This yields smooth time-series data of the amounts of movement of the center of gravity W₁₀₁ and W₁₀₂. FIG. 18C is a graph of the trajectory of the center of gravity of the body in plan view, created by combining the amounts of movement W₁₀₁ and W₁₀₂ in the X and Y directions.
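The smoothing and interpolation can be sketched as below. The source specifies a 5-point moving average and spline interpolation from 33 ms to 1 ms, but not the edge handling or the spline type; shrinking the averaging window at the ends of the series and using a natural cubic spline are assumptions made here, and the function names are illustrative.

```python
def moving_average(values, window=5):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def _second_derivatives(y, h):
    """Second derivatives of a natural cubic spline on uniformly spaced knots."""
    n = len(y)
    m = [0.0] * n
    if n < 3:
        return m
    rhs = [6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1]) / h**2 for i in range(1, n - 1)]
    k = len(rhs)
    c, d = [0.0] * k, [0.0] * k
    c[0], d[0] = 0.25, rhs[0] / 4.0
    for i in range(1, k):  # forward sweep of the tridiagonal (1, 4, 1) system
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (rhs[i] - d[i - 1]) / denom
    m[k] = d[k - 1]
    for i in range(k - 2, -1, -1):  # back substitution
        m[i + 1] = d[i] - c[i] * m[i + 2]
    return m

def resample_spline(y, step_in=33, step_out=1):
    """Resample samples spaced step_in ms apart onto a step_out ms grid."""
    n = len(y)
    if n < 2:
        return list(y)
    h = float(step_in)
    m = _second_derivatives(y, h)
    out = []
    for t in range(0, (n - 1) * step_in + 1, step_out):
        i = min(t // step_in, n - 2)
        a = t - i * step_in  # distance from the left knot
        b = h - a            # distance to the right knot
        out.append(m[i] * b**3 / (6 * h) + m[i + 1] * a**3 / (6 * h)
                   + (y[i] / h - m[i] * h / 6) * b + (y[i + 1] / h - m[i + 1] * h / 6) * a)
    return out
```

A typical use would be `resample_spline(moving_average(series))`, applied separately to each motion value.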

Next, steps S203, S205, and S206 for deriving the shoulder rotation angle W₂₀₁ are described. First, based on the skeleton data acquired in step S1, the derivation unit 14c extracts, from the depth frame normalized in step S2, a region in the vicinity of the shoulders of the golfer 7 at each timing during the swing motion (hereinafter, the shoulder region) (step S203). Specifically, the derivation unit 14c obtains the coordinates of both shoulders within the depth frame from the skeleton data and determines the coordinates of the midpoint between the shoulders (hereinafter, the shoulder center). Then, from the depth frame, it cuts out a region of a predetermined size referenced to the coordinates of the shoulder center as the shoulder region. The left image of FIG. 19 shows an example of how the shoulder region is set with reference to the shoulder center, and the right image of FIG. 19 shows the shoulder region cut out, according to this example, from the cut-out image of FIG. 16B or from a depth frame such as that of FIG. 5.

The shoulder movement of interest is independent of the movements of other parts such as the waist, legs, and arms. Training a neural network on images showing the entire body may therefore reduce the accuracy with which features corresponding to the appearance of the shoulders are detected. In addition, if the image is too large, analysis by the neural network can become difficult. Step S203 extracts the shoulder region from the depth frame to be analyzed so that the features of the shoulder movement can be detected accurately based on the neural network.

In the following step S205, the derivation unit 14c sequentially inputs the time-series images of the shoulder region during the swing motion, acquired in step S203, to the neural network 208. As a result, the output unit of the neural network 208 sequentially outputs the shoulder rotation angle W₂₀₁ of the golfer 7. In the present embodiment, however, the 128 × 64 pixel image obtained by step S203 is reduced to 64 × 32 pixels by nearest-neighbor interpolation or the like before being input to the neural network 208.

Subsequently, the derivation unit 14c smooths and interpolates the time-series data of the shoulder rotation angle W₂₀₁ derived in step S205 (step S206). FIG. 20 shows an example in which the time-series data of the shoulder rotation angle W₂₀₁ was smoothed by a 5-point moving average and then spline-interpolated to convert the data from 33 ms intervals to 1 ms intervals. This yields smooth time-series data of the shoulder rotation angle W₂₀₁.

After that, the display control unit 14e displays, on the display unit 11, the time-series waist rotation angle W₁ during the swing motion derived in step S6, its change over time, and a graph such as that shown in FIG. 9 (step S7). In step S7 of the present embodiment, the display control unit 14e also displays, on the display unit 11, the time-series amounts of movement of the center of gravity W₁₀₁ and W₁₀₂ during the swing motion derived in step S106, their changes over time, and graphs such as those shown in FIGS. 18A to 18C. Furthermore, the display control unit 14e displays, on the display unit 11, the time-series shoulder rotation angle W₂₀₁ during the swing motion derived in step S206, its change over time, and a graph such as that shown in FIG. 20. This allows the user to grasp in detail the rotation of the waist and shoulders of the golfer 7 as well as the weight shift of the golfer 7.

<2-3. Neural network learning method>
Next, the learning method of the neural networks 108 and 208 is described with reference to FIGS. 21 and 22. The learning data sets used to construct each of the neural networks 108 and 208 are described first, followed by the flow of the learning process based on each data set.

<2-3-1. Learning data set (amount of movement of the center of gravity of the body)>
A learning data set for constructing the neural network 108 is a pair consisting of the depth frame, skeleton data, and person region image acquired by the distance image sensor 2 and the amount of movement of the center of gravity (true value) of the golfer 7 at the timing at which they were acquired. A large number of such learning data sets are collected. The amount of movement of the center of gravity (true value) included in a learning data set serves as the teacher signal when the neural network 108 is trained.

In the present embodiment, the teacher signal for the amount of movement of the center of gravity of the golfer 7 is obtained from the output values (floor reaction force data) of the floor reaction force meter 104 (see FIGS. 11 and 12) installed under the feet of the golfer 7. The floor reaction force meter 104 consists of a left and right pair of force plates 104L and 104R. During the golf swing, the golfer 7 stands on the force plates 104L and 104R, with the left foot positioned on the force plate 104L and the right foot positioned on the force plate 104R.

The force plates 104L and 104R each have a plurality of force sensors 121. The force sensors 121 are distributed within the plate-shaped cases of the force plates 104L and 104R (for example, at the four corners) and, bearing the weight of the golfer 7, detect the floor reaction force acting on the force plates 104L and 104R. The force plates 104L and 104R are communicatively connected to the motion analysis device 1, and the floor reaction force data detected by the force sensors 121 is output from the force plates 104L and 104R to the motion analysis device 1 via a wired or wireless communication line.

Once the neural network 108 has been trained and saved in the storage unit 13, the floor reaction force meter 104 can be omitted from the motion analysis system 101. The motion analysis system 101 and the model construction system that constructs the model of the neural network 108 can also be implemented on different hardware.

<2-3-2. Flow of the learning process (amount of movement of the center of gravity of the body)>
Next, the learning process of the neural network 108 is described with reference to FIG. 21. First, to acquire the learning data sets for constructing the neural network 108, the golfer 7 standing on the floor reaction force meter 104 makes test shots with the golf club 5, and the distance image sensor 2 records this as a moving image. Preferably, to make the learning more effective, a plurality of golfers 7 perform a large number of swing motions. Then, as in step S1, the first acquisition unit 14a acquires the time-series IR frames, depth frames, skeleton data, and person region images sent from the distance image sensor 2 and stores them in the storage unit 13 (step S121). In step S121, the time-series floor reaction force data measured by the floor reaction force meter 104 is also transmitted to the motion analysis device 1. The second acquisition unit 14b then obtains, based on this time-series floor reaction force data, data on the amount of movement of the center of gravity of the body of the golfer 7 (hereinafter, the center-of-gravity position data) and stores it in the storage unit 13. At this time, the IR frames, depth frames, skeleton data, and person region images are synchronized with the center-of-gravity position data, and data from the same timing are stored in the storage unit 13 in association with each other.
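The source does not state how the center-of-gravity position data is computed from the floor reaction force data. One common approach, shown here purely as an assumption, is to take the force-weighted average of the sensor positions (the center of pressure) over all force sensors of both plates. The function name `center_of_pressure` and the input layout of `(x, y, vertical_force)` tuples are hypothetical.

```python
def center_of_pressure(sensors):
    """Force-weighted mean position of a set of force sensors.

    sensors: list of (x, y, vertical_force) tuples, with positions in a
    common floor coordinate system (assumed layout, not from the source).
    """
    total = sum(force for _, _, force in sensors)
    if total <= 0:
        raise ValueError("no vertical load on the force plates")
    x = sum(px * force for px, _, force in sensors) / total
    y = sum(py * force for _, py, force in sensors) / total
    return x, y
```

Evaluating this at every sample of the floor reaction force data would give a time series in X and Y that could serve as the teacher signal for W₁₀₁ and W₁₀₂.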

Subsequently, the learning unit 14d executes steps S122, S132, and S133 on the time-series depth frames, acquired in step S121, that correspond to the many swing motions. Steps S122, S132, and S133 cut out, from the depth frames acquired in step S121, the images of the golfer region that are to be input to the neural network 108. Because steps S122, S132, and S133 are the same as steps S2, S102, and S103 described above, a detailed description is omitted here.

In the following step S135, the learning unit 14d trains the neural network 108 using the image of the golfer region acquired in step S133 as the input and the center-of-gravity position data acquired in step S121 as the teacher signal. More specifically, the learning unit 14d inputs the image of the golfer region to the current neural network 108, obtains the amounts of movement of the center of gravity W₁₀₁ and W₁₀₂ as the output values, and updates the parameters of the neural network 108 so as to minimize the error between W₁₀₁ and W₁₀₂ and the center-of-gravity position data. The parameters to be learned are the weight filters G₁, G₂, ..., G_N and L₁, L₂, ..., L_R, the weight coefficients u_{i,1}, u_{i,2}, ..., u_{i,I1}, the weight coefficients v_{1,1}, v_{1,2}, ..., v_{1,I2} and v_{2,1}, v_{2,2}, ..., v_{2,I2}, the biases b_i, b₁₀₁, and b₁₀₂, and the like, all described above. In this way, the neural network 108 is optimized by applying the learning data sets one after another. Various supervised learning methods for the neural network 108 are known, so a detailed description is omitted here; for example, stochastic gradient descent using error backpropagation can be used. This completes the learning process.

<2-3-3. Learning data set (shoulder rotation angle)>
A learning data set for constructing the neural network 208 is a pair consisting of the depth frame and skeleton data acquired by the distance image sensor 2 and the shoulder rotation angle (true value) of the golfer 7 at the timing at which they were acquired. A large number of such learning data sets are collected. The shoulder rotation angle (true value) included in a learning data set serves as the teacher signal when the neural network 208 is trained.

In the present embodiment, the teacher signal for the shoulder rotation angle of the golfer 7 is obtained by the angular velocity sensor 204 (see FIGS. 11 and 12) attached at the shoulder center of the golfer 7. The angular velocity data measured by the angular velocity sensor 204 is output from the angular velocity sensor 204 to the motion analysis device 1 via a wired or wireless communication line.

Once the neural network 208 has been trained and saved in the storage unit 13, the angular velocity sensor 204 can be omitted from the motion analysis system 101. The motion analysis system 101 and the model construction system that constructs the model of the neural network 208 can also be implemented on different hardware.

<2-3-4. Flow of learning process (shoulder rotation angle)>
FIG. 22 is a flowchart showing the flow of the learning process of the neural network 208. First, in order to acquire the training data sets for constructing the neural network 208, a golfer 7 with the angular velocity sensor 204 attached to the shoulder takes trial swings with the golf club 5, and the swings are captured as a moving image by the distance image sensor 2. Preferably, to improve the effect of learning, a large number of swing motions are performed by a plurality of golfers 7. Then, as in step S1, the first acquisition unit 14a acquires the time-series IR frames, depth frames, and skeleton data sent from the distance image sensor 2 and stores them in the storage unit 13 (step S221). In step S221, the time-series angular velocity data measured by the angular velocity sensor 204 during the swing motion is also transmitted to the motion analysis device 1. The second acquisition unit 14b acquires this time-series angular velocity data, converts it into data representing the time-series rotation angle of the shoulder (shoulder rotation angle data), and stores it in the storage unit 13. The storage unit 13 thus stores IR frames, depth frames, skeleton data, and shoulder rotation angle data corresponding to a large number of swing motions. At this time, the IR frames, depth frames, and skeleton data are synchronized with the rotation angle data, and data for the same timing are stored in the storage unit 13 in association with each other.
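The conversion and synchronization in step S221 can be illustrated with a minimal sketch. The text does not say how the angular velocity data is converted into rotation angle data or how the two data streams are synchronized; the sketch below assumes trapezoidal integration and linear interpolation at the frame timestamps. All function names, the sampling rates, and the units (degrees, seconds) are hypothetical.

```python
from bisect import bisect_left


def integrate_angular_velocity(times, omegas, initial_angle=0.0):
    """Trapezoidal integration of angular velocity (deg/s) into angle (deg).

    Assumption: the patent only says the data is "converted"; integration
    is one plausible conversion, not necessarily the one actually used.
    """
    angles = [initial_angle]
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        angles.append(angles[-1] + 0.5 * (omegas[k] + omegas[k - 1]) * dt)
    return angles


def angle_at(times, angles, t):
    """Linearly interpolate the angle series at time t, clamped to its range."""
    if t <= times[0]:
        return angles[0]
    if t >= times[-1]:
        return angles[-1]
    k = bisect_left(times, t)  # first index with times[k] >= t, so k >= 1
    t0, t1 = times[k - 1], times[k]
    w = (t - t0) / (t1 - t0)
    return angles[k - 1] + w * (angles[k] - angles[k - 1])


def pair_frames_with_angles(frame_times, sensor_times, omegas):
    """Associate each frame timestamp with the shoulder angle at that time."""
    angles = integrate_angular_velocity(sensor_times, omegas)
    return [(t, angle_at(sensor_times, angles, t)) for t in frame_times]


# Hypothetical data: constant 90 deg/s for 1 s, sensor at 100 Hz, frames at 30 fps.
sensor_times = [i / 100 for i in range(101)]
omegas = [90.0] * 101
frame_times = [i / 30 for i in range(31)]
pairs = pair_frames_with_angles(frame_times, sensor_times, omegas)
```

Each element of `pairs` plays the role of one (frame timing, teacher signal) association; in practice the frame itself, not only its timestamp, would be stored alongside the angle.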

Subsequently, the learning unit 14d executes steps S222 and S223 on the time-series depth frames corresponding to the large number of swing motions acquired in step S221. Steps S222 and S223 cut out, from the depth frames acquired in step S221, the shoulder region images, as shown in FIG. 19, that are to be input to the neural network 208. Since steps S222 and S223 are the same as steps S2 and S203 described above, a detailed description is omitted here.
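As a rough illustration of cutting a region of interest out of a depth frame, the following sketch crops a fixed-size window centered on a joint position taken from the skeleton data. The details of steps S2 and S203 are described elsewhere in this document; the window size, the zero padding at the frame border, and the name `crop_region` are assumptions made only for this example.

```python
def crop_region(depth, center_row, center_col, height, width, fill=0):
    """Cut a height x width window out of a depth frame.

    depth is a list of rows of depth values. Pixels of the window that fall
    outside the frame are filled with `fill` (an assumption of this sketch).
    """
    rows, cols = len(depth), len(depth[0])
    top = center_row - height // 2
    left = center_col - width // 2
    patch = []
    for r in range(top, top + height):
        row = []
        for c in range(left, left + width):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(depth[r][c] if inside else fill)
        patch.append(row)
    return patch


# Hypothetical 5 x 6 depth frame and a joint located on the top edge.
depth = [[r * 10 + c for c in range(6)] for r in range(5)]
patch = crop_region(depth, center_row=0, center_col=2, height=3, width=3)
```

Padding keeps every cropped image the same size, which a network with a fixed input size requires even when the joint lies near the edge of the frame.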

In the subsequent step S225, the learning unit 14d trains the neural network 208 using the shoulder region images acquired in step S223 as input and the shoulder rotation angle data acquired in step S221 as the teacher signal. More specifically, the learning unit 14d inputs an image of the shoulder region to the current neural network 208, obtains the shoulder rotation angle W₂₀₁ as the output value, and updates the parameters of the neural network 208 so as to minimize the error between this shoulder rotation angle W₂₀₁ and the shoulder rotation angle data. The parameters to be learned here are the above-mentioned weight filters G₁, G₂, ..., G_N and L₁, L₂, ..., L_R, the weight coefficients u_{i,1}, u_{i,2}, ..., u_{i,I1}, the weight coefficients v₁, v₂, ..., v_{I2}, the biases b_i and b, and the like. The neural network 208 is optimized in this way by applying the training data sets one after another. Various supervised learning methods for neural networks are known, so a detailed description is omitted here; for example, stochastic gradient descent using error backpropagation can be used. This completes the learning process.
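The parameter update in step S225 can be sketched with a deliberately small stand-in model. The code below is not the neural network 208: it omits the convolution layers and weight filters and trains a single hidden layer with tanh activation, assuming a squared-error loss. It only illustrates stochastic gradient descent with error backpropagation, where the parameters are updated after each training pair. The names `U`, `v`, `b_h`, the learning rate, and the toy data are assumptions for illustration.

```python
import math
import random


def forward(params, x):
    """Return (hidden activations, scalar output) for one input vector."""
    U, b_h, v, b_o = params
    h = [math.tanh(sum(u * xj for u, xj in zip(row, x)) + b)
         for row, b in zip(U, b_h)]
    return h, sum(vi * hi for vi, hi in zip(v, h)) + b_o


def mean_squared_error(params, samples):
    """Average squared error between network output and teacher signal."""
    return sum((forward(params, x)[1] - t) ** 2 for x, t in samples) / len(samples)


def train_sgd(samples, hidden=4, lr=0.05, epochs=200, seed=0):
    """Train by SGD: one backpropagation update per (input, teacher) pair."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    U = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b_h = [0.0] * hidden
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    params = [U, b_h, v, 0.0]
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the training pairs in random order
        for x, target in data:
            h, y = forward(params, x)
            err = y - target  # derivative of 0.5 * err**2 with respect to y
            for i in range(hidden):
                # Backpropagate through tanh before v[i] is changed.
                grad_pre = err * v[i] * (1.0 - h[i] ** 2)
                v[i] -= lr * err * h[i]
                b_h[i] -= lr * grad_pre
                for j in range(n_in):
                    U[i][j] -= lr * grad_pre * x[j]
            params[3] -= lr * err  # output bias
    return params


# Hypothetical toy data standing in for (image features, rotation angle) pairs.
data_rng = random.Random(1)
inputs = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(20)]
samples = [(x, 0.5 * x[0] - 0.3 * x[1]) for x in inputs]
untrained = train_sgd(samples, epochs=0)
trained = train_sgd(samples, epochs=200)
```

Comparing `mean_squared_error(untrained, samples)` with `mean_squared_error(trained, samples)` shows the error decreasing as the training pairs are applied one after another.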

<3. Modifications>
Although several embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and various modifications can be made without departing from its spirit. For example, the following modifications are possible. The gist of each of the following modifications can also be combined as appropriate.

<3-1>
The motion value derived in the motion analysis process is not limited to the rotation angle or position (coordinates) of the part of interest, and may be, for example, a rotation speed or rotation acceleration, or a velocity or acceleration. The part of interest is likewise not limited to the waist, shoulders, and center of gravity of the golfer 7, and may be the arm, head, or the like of the golfer 7. Further, the analysis is not limited to golf swings; any motion of any object can be analyzed.
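As one illustration of how a rotation speed or rotation acceleration relates to a time series of rotation angles, the sketch below applies finite differences to an angle series. This is an assumption of the example only: the text does not say whether such values would be computed from the derived angles or output directly by a neural network. The function name, frame interval, and test signal are hypothetical.

```python
def central_difference(values, dt):
    """Differentiate a uniformly sampled series.

    Interior points use the central difference; the two end points fall
    back to one-sided differences.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    out = []
    for k in range(n):
        if k == 0:
            out.append((values[1] - values[0]) / dt)
        elif k == n - 1:
            out.append((values[-1] - values[-2]) / dt)
        else:
            out.append((values[k + 1] - values[k - 1]) / (2 * dt))
    return out


# Hypothetical angle series sampled at 30 fps: angle = 45 * t**2 (degrees).
dt = 1 / 30
angles = [45.0 * (k * dt) ** 2 for k in range(10)]
speeds = central_difference(angles, dt)   # rotation speed, deg/s
accels = central_difference(speeds, dt)   # rotation acceleration, deg/s^2
```

For this quadratic test signal the interior values match the analytic speed 90·t and acceleration 90; with measured data, differentiation amplifies noise, so some smoothing would usually be applied first.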

<3-2>
The distance image sensor may capture only depth images, or may capture color images instead of IR images. In the latter case, the distance image sensor may be equipped with a visible light receiving unit (for example, an RGB camera) that receives visible light.

<3-3>
In the first embodiment, the rotation angle of the waist is calculated, and in the second embodiment, the rotation angle of the waist, the rotation angle of the shoulder, and the amount of movement of the center of gravity are calculated. However, any one or more of these three quantities may be selected and calculated. For example, only the rotation angles of the waist and the shoulder may be calculated, or only the rotation angle of the shoulder and the amount of movement of the center of gravity may be calculated.

1 Motion analysis device (model construction device)
2 Distance image sensor
3 Motion analysis program
4, 204 Angular velocity sensor
104 Floor reaction force meter
5 Golf club
7 Golfer (object)
8, 108, 208 Neural network
81 Identification layer
82 Feature extraction unit
83A, 84A Convolution layer
85 Dropout layer
14a First acquisition unit (acquisition unit)
14b Second acquisition unit
14c Derivation unit
14d Learning unit
100, 101 Motion analysis system

Claims

A motion analysis device for analyzing the motion of an object, comprising:
an acquisition unit that acquires a depth image, captured by a distance image sensor, of the motion of the object; and
a derivation unit that derives a motion value quantitatively representing the motion of the object by inputting the depth image acquired by the acquisition unit into a neural network that outputs the motion value,
wherein the depth image is an image of a golf swing.

The motion analysis device according to claim 1,
wherein the motion value is the rotation angle of the golfer's waist.

The motion analysis device according to claim 2,
wherein the acquisition unit further acquires skeleton data, measured by the distance image sensor, representing the skeleton of a human body, and
the derivation unit extracts a waist region in the vicinity of the waist from the depth image based on the skeleton data, and then inputs the image of the waist region to the neural network.

The motion analysis device according to claim 2 or 3,
wherein the neural network has a dropout layer that nullifies, in the depth image, an arm region representing the golfer's arm.

The motion analysis device according to claim 1,
wherein the motion value is a value representing the weight shift of the golfer.

The motion analysis device according to claim 5,
wherein the acquisition unit further acquires skeleton data, measured by the distance image sensor, representing the skeleton of a human body, and
the derivation unit extracts a golfer region in the vicinity of the golfer from the depth image based on the skeleton data, and then inputs the image of the golfer region to the neural network.

The motion analysis device according to claim 1,
wherein the motion value is the rotation angle of the golfer's shoulder.

The motion analysis device according to claim 7,
wherein the acquisition unit further acquires skeleton data, measured by the distance image sensor, representing the skeleton of a human body, and
the derivation unit extracts a shoulder region in the vicinity of the shoulder from the depth image based on the skeleton data, and then inputs the image of the shoulder region to the neural network.

The motion analysis device according to any one of claims 1 to 8,
wherein the neural network has a convolution layer.

The motion analysis device according to any one of claims 1 to 9,
wherein the acquisition unit acquires the depth image in time series, and
the derivation unit derives the motion value in time series by inputting the time-series depth images into the neural network.

The motion analysis device according to any one of claims 1, 9 and 10,
wherein the acquisition unit further acquires skeleton data, measured by the distance image sensor, representing the skeleton of a human body, and
the derivation unit extracts, from the depth image based on the skeleton data, a region of interest in the vicinity of a part of interest of the object, and then inputs the image of the region of interest to the neural network.

The motion analysis device according to any one of claims 1 and 9 to 11,
wherein the neural network has a dropout layer that nullifies, in the depth image, a region representing a part of the object that is not of interest.

A motion analysis device for analyzing the motion of an object, comprising:
an acquisition unit that acquires a depth image, captured by a distance image sensor, of the motion of the object; and
a derivation unit that derives a motion value quantitatively representing the motion of the object by inputting the depth image acquired by the acquisition unit into a neural network that outputs the motion value,
wherein the neural network has a dropout layer that nullifies, in the depth image, a region representing a part of the object that is not of interest.

A model building device that builds a model for analyzing the motion of an object, comprising:
a first acquisition unit that acquires a large number of depth images, captured by a distance image sensor, of the motion of the object;
a second acquisition unit that acquires a large number of motion values, each quantitatively representing the motion of the object and respectively corresponding to the large number of depth images; and
a learning unit that, based on the large number of depth images acquired by the first acquisition unit and using the large number of motion values acquired by the second acquisition unit as teacher signals, trains a neural network that takes the depth image as input and outputs the motion value,
wherein the second acquisition unit acquires the large number of motion values, which represent a rotation angle of the object, from output values of an angular velocity sensor attached to the object.

A model building device that builds a model for analyzing the motion of an object, comprising:
a first acquisition unit that acquires a large number of depth images, captured by a distance image sensor, of the motion of the object;
a second acquisition unit that acquires a large number of motion values, each quantitatively representing the motion of the object and respectively corresponding to the large number of depth images; and
a learning unit that, based on the large number of depth images acquired by the first acquisition unit and using the large number of motion values acquired by the second acquisition unit as teacher signals, trains a neural network that takes the depth image as input and outputs the motion value,
wherein the second acquisition unit acquires the large number of motion values, which represent the position of the center of gravity of the object, from output values of a floor reaction force meter on which the object stands.

A motion analysis system for analyzing the motion of an object, comprising:
a distance image sensor that captures a depth image of the motion of the object; and
a motion analysis device that derives a motion value quantitatively representing the motion of the object by inputting the depth image captured by the distance image sensor into a neural network that outputs the motion value,
wherein the depth image is an image of a golf swing.

A motion analysis system for analyzing the motion of an object, comprising:
a distance image sensor that captures a depth image of the motion of the object; and
a motion analysis device that derives a motion value quantitatively representing the motion of the object by inputting the depth image captured by the distance image sensor into a neural network that outputs the motion value,
wherein the neural network has a dropout layer that nullifies, in the depth image, a region representing a part of the object that is not of interest.

A motion analysis method for analyzing the motion of an object, comprising:
a step of capturing a depth image of the motion of the object with a distance image sensor; and
a step of deriving a motion value quantitatively representing the motion of the object by inputting the depth image captured by the distance image sensor into a neural network that outputs the motion value,
wherein the depth image is an image of a golf swing.

A motion analysis method for analyzing the motion of an object, comprising:
a step of capturing a depth image of the motion of the object with a distance image sensor; and
a step of deriving a motion value quantitatively representing the motion of the object by inputting the depth image captured by the distance image sensor into a neural network that outputs the motion value,
wherein the neural network has a dropout layer that nullifies, in the depth image, a region representing a part of the object that is not of interest.

A motion analysis program for analyzing the motion of an object, the program causing a computer to execute:
a step of acquiring a depth image, captured by a distance image sensor, of the motion of the object; and
a step of deriving a motion value quantitatively representing the motion of the object by inputting the acquired depth image into a neural network that outputs the motion value,
wherein the depth image is an image of a golf swing.

A motion analysis program for analyzing the motion of an object, the program causing a computer to execute:
a step of acquiring a depth image, captured by a distance image sensor, of the motion of the object; and
a step of deriving a motion value quantitatively representing the motion of the object by inputting the acquired depth image into a neural network that outputs the motion value,
wherein the neural network has a dropout layer that nullifies, in the depth image, a region representing a part of the object that is not of interest.