
WO2024247080A1 - Training device, evaluation device, training method, evaluation method, and program - Google Patents


Info

Publication number
WO2024247080A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
evaluation
video data
data
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2023/020016
Other languages
French (fr)
Japanese (ja)
Inventor
Takamasa Nagai (永井 隆昌)
Shoichiro Takeda (武田 翔一郎)
Kenji Esaki (江崎 健司)
Hitoshi Seshimo (瀬下 仁志)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to PCT/JP2023/020016 priority Critical patent/WO2024247080A1/en
Publication of WO2024247080A1 publication Critical patent/WO2024247080A1/en

Classifications

    • G (PHYSICS) > G06 (COMPUTING OR CALCULATING; COUNTING) > G06F (ELECTRIC DIGITAL DATA PROCESSING) > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/25 Fusion techniques
    • G (PHYSICS) > G06 (COMPUTING OR CALCULATING; COUNTING) > G06N (COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS) > G06N 20/00 Machine learning
    • G (PHYSICS) > G06 (COMPUTING OR CALCULATING; COUNTING) > G06T (IMAGE DATA PROCESSING OR GENERATION, IN GENERAL) > G06T 7/00 Image analysis > G06T 7/20 Analysis of motion
    • G (PHYSICS) > G06 (COMPUTING OR CALCULATING; COUNTING) > G06V (IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING) > G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data > G06V 40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • the present invention relates to a learning device, an evaluation device, a learning method, an evaluation method, and a program.
  • the environments in which sports and medical procedures are performed are not necessarily the same.
  • the lighting may be different for an indoor movement
  • the weather may be different for an outdoor movement.
  • the footage shows not only the person being evaluated, but also these environmental differences.
  • the computer evaluation results may differ.
  • the accuracy of the computer evaluation may be poor. This situation is not limited to cases where the person being evaluated is a human, but is the same even when the person being evaluated is an animal, such as a dog or cat.
  • the present invention aims to provide a technology that improves the accuracy of motion evaluation.
  • One aspect of the present invention is a learning device that includes a control unit that trains a learning target using: motion video data, which is video data of a video showing the motion of an evaluation subject, a human or animal whose motion is to be evaluated; motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor, a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data, which indicates the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.
  • One aspect of the present invention is an evaluation device that includes: an interface unit that acquires a set of motion video data showing the motion of an evaluation subject and motion measurement data obtained during the motion shown in the motion video data; and a learned evaluation model execution unit that evaluates the motion shown in the motion video data included in the set, using the set acquired by the interface unit and the learned mathematical model obtained by a learning device as described above, that is, a learning device with a control unit that trains, as the learning target, a mathematical model that evaluates the motion based on motion video data, which is video data of a video showing the motion of an evaluation subject (a human or animal whose motion is to be evaluated), motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor (a sensor attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject), and ground truth data indicating the result of evaluating the motion.
  • One aspect of the present invention is a learning method that includes a control step of training a learning target using: motion video data, which is video data of a video showing the motion of an evaluation subject, a human or animal whose motion is to be evaluated; motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor, a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data, which indicates the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.
  • One aspect of the present invention is an evaluation method that includes: an interface step of acquiring a set of motion video data showing the motion of an evaluation subject and motion measurement data obtained during the motion shown in the motion video data; a control step of training a learning target using motion video data, which is video data of a video showing the motion of an evaluation subject (a person or animal whose motion is to be evaluated), motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor (a sensor attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject during the motion, and that obtains results according to the motion of the evaluation subject), and ground truth data indicating the result of evaluating the motion, the learning target being a mathematical model that evaluates the motion based on the motion video data and the motion measurement data; and a learned evaluation model execution step of evaluating the motion shown in the motion video data included in the set, using the learned mathematical model obtained by the learning method and the set acquired in the interface step.
  • One aspect of the present invention is a program for causing a computer to function as either or both of the above-mentioned learning device and the above-mentioned evaluation device.
  • the present invention makes it possible to provide technology that improves the accuracy of motion evaluation.
  • FIG. 1 is a diagram showing an example of the configuration of an evaluation system according to an embodiment.
  • FIG. 2 is a diagram showing an example of the hardware configuration of a learning device according to the embodiment.
  • FIG. 3 is a diagram showing an example of the hardware configuration of an evaluation device according to the embodiment.
  • FIG. 4 is a flowchart showing an example of the flow of processing executed by the learning device in the embodiment.
  • FIG. 5 is a flowchart showing an example of the flow of processing executed by the evaluation device in the embodiment.
  • FIG. 6 is a diagram showing an example of the first transformation process in a modified example.
  • FIG. 7 is an explanatory diagram illustrating an example of the process of assigning an attention identifier in a modified example.
  • the evaluation system 100 includes a learning apparatus 1 and an evaluation apparatus 2.
  • the learning apparatus 1 includes a control unit 11 comprising a processor 91, such as a CPU (Central Processing Unit), and a memory 92 connected by a bus, and it executes a program.
  • the control unit 11 executes a learning process.
  • the learning process is a process for learning a mathematical model of the learning target.
  • the mathematical model of the learning target is a mathematical model (hereinafter referred to as an "evaluation model") that evaluates the movement of the evaluation target using a set of movement video data, movement measurement data, and ground truth data (hereinafter referred to as "learning data").
  • the motion video data is video data of a video showing the motion of a person or animal (hereafter referred to as the "evaluation subject") whose motion is to be evaluated.
  • the subject of evaluation may be any person or animal, so long as its motion is the object of evaluation.
  • the subject of evaluation may be, for example, an athlete in a sport such as figure skating, surfing, or diving.
  • the motions evaluated by the evaluation model are the athlete's motions during the competition.
  • the subject of evaluation may be, for example, a medical intern.
  • the motions evaluated by the evaluation model are the motions during a medical procedure such as surgery.
  • the motion measurement data is a time series of results obtained by a motion measurement sensor during the motion captured in the motion video data.
  • the motion measurement sensor is a sensor attached to at least one of the body of the subject to be evaluated, something worn by the subject to be evaluated, and something used by the subject to be evaluated when performing the motion, and obtains results according to the motion of the subject to be evaluated.
  • the motion measurement sensor is, for example, an acceleration sensor. In such a case, the result obtained by the motion measurement sensor is acceleration, and the time series of results obtained by the motion measurement sensor is therefore a time series of acceleration.
  • the motion measurement sensor is, for example, an angular velocity sensor. In such a case, the result obtained by the motion measurement sensor is angular velocity. Therefore, in such a case, the time series of the results obtained by the motion measurement sensor is a time series of angular velocity.
  • the motion measurement sensor may be, for example, an inertial measurement unit (IMU), or a sensor that obtains biosignals that indicate changes in response to the motion of the subject being evaluated, such as heart rate, brain waves, pulse, blood pressure, breathing, and sweating.
  • If a sensor that obtains results according to the movements of the subject is attached to the body of the subject or to something worn by the subject, the results obtained by the sensor are less affected by the environment, such as the intensity and color of lighting, than video, and are highly correlated with the movements of the subject. Therefore, an evaluation based not only on movement video data but also on movement measurement data is more accurate than an evaluation based only on movement video data.
  • If the motion evaluated by the evaluation model is a medical procedure, the motion measurement sensor may be attached to medical equipment used during the procedure, such as a scalpel. In such a case, the motion measurement sensor obtains results that are highly correlated with the motion of the subject being evaluated even though it is not attached to the subject's body, and these results are again less affected by the environment, such as the intensity and color of lighting, than video.
  • If the evaluation subject is an animal, the motion measurement sensor may be attached to its collar, for example, or, if a tag is attached to its foot or the like, to the tag.
  • Such motion measurement sensors may be sensors that obtain information on the movement of the evaluation subject in each of three linearly independent directions, such as forward/backward, left/right, and up/down, or may be sensors that obtain information in one or two dimensions.
  • the motion measurement data used by the evaluation model does not have to be the results obtained by a single motion measurement sensor.
  • the evaluation model may use motion measurement data obtained by multiple motion measurement sensors. In such a case, multiple motion measurement data are input to the evaluation model.
  • the multiple motion measurement sensors may be multiple types of motion measurement sensors that obtain different physical quantities, such as a motion measurement sensor that obtains acceleration and a motion measurement sensor that obtains bioinformation. In such a case, the evaluation model uses time series of multiple different types of physical quantities as motion measurement data for evaluation.
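As a concrete illustration of how one learning-data sample could be organized, the following Python sketch bundles one piece of motion video data, one or more motion measurement time series, and a ground truth score. The class name `LearningSample`, the array shapes, and the example values are assumptions for illustration, not prescribed by the text.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class LearningSample:
    """One learning-data sample: motion video, sensor time series, ground truth."""
    video: np.ndarray               # motion video data, shape (T_frames, H, W, 3)
    measurements: List[np.ndarray]  # one time series per motion measurement sensor,
                                    # e.g. a 3-axis IMU (T_a, 3) and heart rate (T_h, 1)
    score: float                    # ground truth data: result of evaluating the motion

# Example: a 300-frame clip, a 3-axis IMU stream, and a heart-rate stream.
sample = LearningSample(
    video=np.zeros((300, 224, 224, 3), dtype=np.uint8),
    measurements=[np.zeros((3000, 3)), np.zeros((300, 1))],
    score=7.5,
)
```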
  • Ground truth data (correct answer data) is data indicating the result of evaluating the subject's motion.
  • the control unit 11 updates the evaluation model, based on the evaluation result of the evaluation model and the ground truth data, so as to reduce the difference between the two.
  • the difference may be expressed, for example, by the L1 norm, the L2 norm, or a linear sum of the L1 and L2 norms.
  • the learning process is executed until a predetermined end condition for learning (hereinafter referred to as the "learning end condition") is satisfied.
  • the learning end condition is, for example, a condition that the change in the learning object due to learning is smaller than a predetermined change.
  • the learning end condition may be, for example, a condition that the learning object has been updated a predetermined number of times or more.
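The following sketch shows a minimal training step consistent with the above: the difference between the model's evaluation and the ground truth is expressed as a linear sum of L1 and L2 terms, and learning stops when the change between updates is small or when a maximum number of updates is reached. It assumes a PyTorch model; the loss-change test stands in for the text's "change in the learning object", and all names and hyperparameters are illustrative.

```python
import torch

def train(model, optimizer, loader, alpha=1.0, beta=1.0,
          max_updates=10_000, min_change=1e-6):
    """Minimal learning-process sketch; `model` maps (video, measurements)
    to a predicted evaluation result."""
    updates, prev_loss = 0, float("inf")
    for video, meas, truth in loader:
        pred = model(video, meas)
        # Difference between evaluation result and ground truth, expressed as a
        # linear sum of the L1 and L2 norms (either term alone is also allowed).
        loss = alpha * torch.abs(pred - truth).sum() + beta * ((pred - truth) ** 2).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        updates += 1
        # Learning end conditions from the text: small change, or enough updates.
        if abs(prev_loss - loss.item()) < min_change or updates >= max_updates:
            break
        prev_loss = loss.item()
    return model
```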
  • the evaluation device 2 evaluates the movement of the evaluation subject using the trained evaluation model, which is the evaluation model at the time when the learning end condition is satisfied.
  • the evaluation device 2 accepts input of a set of movement video data showing the movement of the evaluation subject and movement measurement data obtained during the movement shown in the movement video data (hereinafter referred to as "inference stage data").
  • the evaluation device 2 performs evaluation based on the accepted inference stage data.
  • the evaluation model may be of any type as long as it can be trained using motion video data, motion measurement data, and ground truth data to evaluate the motion of the evaluation subject.
  • the evaluation model includes, for example, a video data feature acquisition process and a measurement data feature acquisition process.
  • the video data feature acquisition process is a process for acquiring features of the motion video data.
  • the measurement data feature acquisition process is a process for acquiring features of the motion measurement data.
  • An evaluation model including a video data feature acquisition process and a measurement data feature acquisition process evaluates the motion of the evaluation subject based on the results of the video data feature acquisition process and the results of the measurement data feature acquisition process.
  • features may be obtained for each group (hereinafter referred to as "partial video data") obtained by dividing the motion video data into a plurality of groups along the time axis. In this case, a feature is obtained for each piece of partial video data.
  • if the video is motion video data consisting of P x Q frames (where P is an integer equal to or greater than 1 and Q is an integer equal to or greater than 2), the video is divided into partial video data every P frames, for example in frame-number order, and features are acquired for each piece of partial video data. In such a case, the video data feature acquisition process obtains Q features from the motion video data.
  • the process of dividing the motion video data into partial video data (hereinafter referred to as the "video data division process") may be performed at any time before the video data feature acquisition process is performed.
  • the video data division process is a process of dividing the motion video data along the time axis, for example, for each predetermined number of frames.
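A minimal sketch of such a division, assuming the data is held as a NumPy array with time on the first axis; the helper name `split_into_clips` is illustrative. The same helper can serve as the measurement data division process described later, with `clip_len` set to the number of samples corresponding to the predetermined time width.

```python
import numpy as np

def split_into_clips(data: np.ndarray, clip_len: int) -> list:
    """Divide a time series along its first (time) axis into fixed-length groups.

    For motion video data of P x Q frames, clip_len = P yields Q pieces of
    partial video data.
    """
    n = data.shape[0] // clip_len  # drop any incomplete trailing clip
    return [data[i * clip_len:(i + 1) * clip_len] for i in range(n)]

frames = np.zeros((160, 224, 224, 3))  # e.g. P = 16, Q = 10
clips = split_into_clips(frames, clip_len=16)
assert len(clips) == 10
```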
  • motion video data that has been divided into partial video data is still a type of motion video data. Therefore, the motion video data input to the evaluation model may be motion video data after the video data division process.
  • the video data division process may be executed, for example, by the control unit 11, or by a device other than the learning device 1. In the latter case, the control unit 11 acquires the results, for example via the interface unit 12 described below, and uses them for evaluation when executing the evaluation model.
  • alternatively, the motion video data input to the evaluation model may be motion video data before the video data division process; in such a case, if the video data division process is executed at all, the evaluation model executes it.
  • the same situation applies to the measurement data feature acquisition process.
  • feature values may be obtained for each group (hereinafter referred to as "partial measurement data") obtained by dividing the motion measurement data into a plurality of groups of a predetermined time width along the time axis. In this case, a feature value is obtained for each piece of partial measurement data, so the number of feature values obtained is greater than when the motion measurement data is not divided into partial measurement data.
  • the process of dividing the motion measurement data into partial measurement data may be performed at any time before the measurement data feature acquisition process is performed.
  • the measurement data division process is, for example, a process of dividing the motion measurement data, which is a type of time series, along the time axis into segments of a predetermined time width.
  • the motion measurement data input to the evaluation model may be motion measurement data after measurement data division processing.
  • the measurement data division process may be executed, for example, by the control unit 11, or by a device other than the learning device 1. In the latter case, the control unit 11 acquires the results, for example via the interface unit 12 described below, and uses them for evaluation when executing the evaluation model.
  • alternatively, the motion measurement data input to the evaluation model may be motion measurement data before the measurement data division process; in such a case, if the measurement data division process is executed at all, the evaluation model executes it.
  • the evaluation model evaluates the motion of the evaluation subject based on the result of the video data feature acquisition process and the result of the measurement data feature acquisition process.
  • the result of the video data feature acquisition process and the result of the measurement data feature acquisition process may be used in an integrated state to evaluate the motion of the evaluation subject, or they may be used in an unintegrated state.
  • Integration is the process of combining different pieces of data into one. For example, when the data are expressed as vectors, it may be the process of obtaining the direct (Cartesian) product of the vectors; when the data are expressed as matrices, it may be the process of concatenating the matrices in the row or column direction.
  • for processing after integration, one or more fully connected layers may be used, or the Transformer Encoder described in Reference 1 may be used. Any number of Transformer Blocks may be used; for example, two Transformer Blocks may be connected in series.
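A small sketch of integration and the subsequent processing, assuming PyTorch: the per-Clip features are combined into one tensor (concatenation along the sequence direction is used here as one concrete reading of combining the data into one), then passed through two Transformer blocks in series followed by a fully connected layer. All sizes are illustrative.

```python
import torch

# Feature integration: one feature vector per Clip from each source.
video_feats = torch.randn(10, 512)  # Q = 10 partial-video features
meas_feats = torch.randn(25, 512)   # partial-measurement features

# Matrices concatenated in the row (sequence) direction; concatenation in the
# column direction would also match the text.
integrated = torch.cat([video_feats, meas_feats], dim=0)  # shape (35, 512)

# Downstream processing: e.g. two Transformer blocks connected in series,
# followed by a fully connected layer producing the evaluation.
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)
out = encoder(integrated.unsqueeze(0))            # add batch dim -> (1, 35, 512)
score = torch.nn.Linear(512, 1)(out.mean(dim=1))  # pooled evaluation score
```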
  • <Example of hardware configuration> FIG. 2 is a diagram showing an example of the hardware configuration of the learning device 1 according to the embodiment.
  • the learning device 1 includes the control unit 11 and executes a program.
  • the learning device 1 functions as a device including the control unit 11, an interface unit 12, and a storage unit 13 by executing the program.
  • the processor 91 reads out a program stored in the storage unit 13 and stores the read out program in the memory 92.
  • the processor 91 executes the program stored in the memory 92, whereby the learning device 1 functions as a device including a control unit 11, an interface unit 12, and a storage unit 13.
  • the control unit 11 performs, for example, a learning process.
  • the control unit 11 performs, for example, a video data division process.
  • the control unit 11 performs, for example, a measurement data division process.
  • the control unit 11 controls, for example, the operation of each functional unit provided in the learning device 1.
  • the control unit 11 acquires learning data, for example, via the interface unit 12.
  • the control unit 11 acquires, for example, information stored in the storage unit 13.
  • the process of acquiring information stored in the storage unit 13 is specifically a read process.
  • the interface unit 12 includes a communication interface for connecting the learning device 1 to an external device.
  • the interface unit 12 communicates with the external device via wired or wireless communication.
  • the external device is, for example, a device that transmits learning data. In such a case, the interface unit 12 acquires learning data by communicating with the device that transmits the learning data.
  • when an external device executes the video data division process, the interface unit 12 obtains the motion video data after that process as the motion video data included in the learning data.
  • likewise, when an external device executes the measurement data division process, the interface unit 12 obtains the motion measurement data after that process as the motion measurement data included in the learning data.
  • the external device is, for example, the evaluation device 2.
  • the interface unit 12 transmits the trained evaluation model obtained by the learning process to the evaluation device 2 through communication with the evaluation device 2.
  • the interface unit 12 may be configured to include input devices such as a mouse, keyboard, touch panel, etc.
  • the interface unit 12 may be configured as an interface that connects these input devices to the learning device 1. In this way, the interface unit 12 accepts input of various information to the learning device 1 via the input device, either wired or wirelessly. Note that learning data does not necessarily need to be input via the communication interface; it may instead be input via an input device.
  • the interface unit 12 may output various types of information.
  • the interface unit 12 includes a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro-Luminescence) display.
  • the interface unit 12 may be configured as an interface that connects these display devices to the learning device 1.
  • the interface unit 12 outputs information that has been input to the interface unit 12, for example.
  • the storage unit 13 is configured using a computer-readable storage medium device (non-transitory computer-readable recording medium) such as a magnetic hard disk device or a semiconductor storage device.
  • the storage unit 13 stores various information related to the learning device 1.
  • the storage unit 13 stores, for example, information necessary for the learning process.
  • the storage unit 13 stores, for example, the evaluation model in advance.
  • the storage unit 13 stores, for example, various information generated by the operation of the control unit 11.
  • the storage unit 13 stores, for example, the learned evaluation model.
  • the storage unit 13 stores, for example, information acquired by the interface unit 12.
  • FIG. 3 is a diagram showing an example of the hardware configuration of the evaluation device 2 in an embodiment.
  • the evaluation device 2 includes a control unit 21 (an example of a trained evaluation model execution unit) and executes a program. By executing the program, the evaluation device 2 functions as a device including the control unit 21, an interface unit 22, and a storage unit 23.
  • the processor 93 reads the program stored in the storage unit 23 and stores the read program in the memory 94.
  • the processor 93 executes the program stored in the memory 94, whereby the evaluation device 2 functions as a device including the control unit 21, the interface unit 22, and the storage unit 23.
  • the control unit 21 executes the learned evaluation model, for example, with the inference stage data as the execution target. This allows the control unit 21 to evaluate the motion shown in the motion video data included in the inference stage data. In other words, the control unit 21 uses the learned evaluation model and the inference stage data to evaluate the motion shown in the motion video data included in the inference stage data.
  • if the control unit 11 performed the video data division process during the learning stage of the evaluation model but the video data division process is not included in the evaluation model, the control unit 21 performs the video data division process before executing the learned evaluation model. Likewise, if the control unit 11 performed the measurement data division process during the learning stage but the measurement data division process is not included in the evaluation model, the control unit 21 performs the measurement data division process before executing the learned evaluation model.
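A sketch of the inference flow this implies: the same divisions that were applied outside the model at the learning stage are applied to the inference stage data before execution. `divide_video` and `divide_meas` could be, for example, the `split_into_clips` helper sketched earlier; all names are illustrative.

```python
import torch

def evaluate(trained_model, inference_set, divide_video=None, divide_meas=None):
    """Inference-stage sketch for the evaluation device. Pass a division
    function only if the corresponding division process is not part of the
    evaluation model itself."""
    video, measurements = inference_set      # the "inference stage data" set
    if divide_video is not None:
        video = divide_video(video)          # video data division process
    if divide_meas is not None:
        measurements = [divide_meas(m) for m in measurements]
    with torch.no_grad():                    # evaluation only, no learning
        return trained_model(video, measurements)
```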
  • the control unit 21 controls the operation of each functional unit of the evaluation device 2, for example.
  • the control unit 21 acquires various information, for example, via the interface unit 22.
  • for example, the control unit 21 acquires the data on which the trained evaluation model is to be executed (i.e., the inference stage data).
  • the control unit 21 acquires, for example, information stored in the storage unit 23.
  • the process of acquiring information stored in the storage unit 23 is specifically a read process.
  • the interface unit 22 includes a communication interface for connecting the evaluation device 2 to an external device.
  • the interface unit 22 communicates with the external device via wired or wireless communication.
  • the external device is, for example, a device that transmits inference stage data.
  • the interface unit 22 acquires the inference stage data by communicating with the device that transmits the inference stage data.
  • the external device is, for example, the learning device 1.
  • the interface unit 22 acquires the trained evaluation model by communicating with the learning device 1.
  • when an external device executes the video data division process, the interface unit 22 obtains the motion video data after that process as the motion video data included in the inference stage data. In this way, the evaluation device 2 also obtains the result of the video data division process executed by the external device.
  • likewise, when an external device executes the measurement data division process, the interface unit 22 obtains the motion measurement data after that process as the motion measurement data included in the inference stage data. In this way, the evaluation device 2 also obtains the result of the measurement data division process executed by the external device.
  • the interface unit 22 includes input devices such as a mouse, keyboard, and touch panel.
  • the interface unit 22 may be configured as an interface that connects these input devices to the evaluation device 2. In this way, the interface unit 22 accepts input of various information to the evaluation device 2 via the input device, either wired or wirelessly.
  • the inference stage data does not necessarily need to be input to a communication interface, and may be input to an input device.
  • the interface unit 22 outputs various types of information.
  • the interface unit 22 includes a display device such as a CRT display, a liquid crystal display, or an organic EL display.
  • the interface unit 22 may be configured as an interface that connects these display devices to the evaluation device 2.
  • the interface unit 22 outputs, for example, information input to the interface unit 22.
  • the storage unit 23 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device.
  • the storage unit 23 stores various information related to the evaluation device 2.
  • the storage unit 23 stores, for example, a trained evaluation model.
  • the storage unit 23 stores, for example, various information generated by the operation of the control unit 21.
  • the storage unit 23 stores, for example, information acquired by the interface unit 22.
  • FIG. 4 is a flowchart showing an example of the flow of processing executed by the learning device 1 in the embodiment.
  • the control unit 11 acquires learning data (step S101).
  • the control unit 11 executes a learning process using the acquired learning data until a learning end condition is satisfied (step S102).
  • the control unit 11 outputs the trained evaluation model to a predetermined output destination (step S103).
  • the predetermined output destination is, for example, the storage unit 13. Note that output to the storage unit 13 means writing to the storage unit 13.
  • the predetermined output destination may be, for example, an external device such as the evaluation device 2 that is communicatively connected via the interface unit 12.
  • FIG. 5 is a flowchart showing an example of the flow of processing executed by the evaluation device 2 in an embodiment. For ease of explanation, it is assumed in the example of the flowchart in FIG. 5 that the trained evaluation model has been stored in advance in the storage unit 23.
  • Inference stage data is input to the interface unit 22, and the control unit 21 acquires the inference stage data input to the interface unit 22 (step S201). Inputting inference stage data to the interface unit 22 means that the interface unit 22 acquires the inference stage data.
  • the control unit 21 executes the learned evaluation model on the inference stage data obtained in step S201 (step S202). By the processing of step S202, the action shown in the action video data included in the inference stage data obtained in step S201 is evaluated. After step S202, the control unit 21 outputs the evaluation result obtained in step S202 to a predetermined output destination (step S203).
  • the predetermined output destination is, for example, the storage unit 23. Note that output to the storage unit 23 means writing to the storage unit 23.
  • the predetermined output destination may be, for example, a predetermined external device connected for communication via the interface unit 22.
  • the learning device 1 configured in this manner learns an evaluation model based not only on movement video data but also on movement measurement data. Because the movement measurement data is obtained by a movement measurement sensor, it is less affected by the environment, such as lighting intensity and color, than video. Therefore, an evaluation model learned based not only on movement video data but also on movement measurement data can be evaluated with higher accuracy than when based only on movement video data. Therefore, the learning device 1 can improve the accuracy of movement evaluation.
  • the evaluation device 2 configured in this way evaluates the movement using the learned mathematical model obtained by the learning device 1. Therefore, the evaluation device 2 can improve the accuracy of the movement evaluation.
  • the evaluation system 100 configured in this manner includes the learning device 1 and the evaluation device 2. As a result, the evaluation system 100 can improve the accuracy of motion evaluation.
  • the first transformation process is a process in which the dimensions of the features obtained in the video data feature acquisition process and the features obtained in the measurement data feature acquisition process are aligned via a Linear layer, the aligned features are input to a Transformer block via an embedding process, and the evaluation is produced by an estimator.
  • the estimator is, for example, a plurality of fully connected layers.
  • the processing of the Linear layer aligns the dimensions of the features obtained in the video data feature acquisition process and the measurement data feature acquisition process; if those dimensions are already the same, the Linear layer does not necessarily have to be executed.
  • the structure of the Linear layer may be arbitrary; for example, it may be a single fully connected layer.
  • because the Transformer block handles its inputs in parallel, it cannot by itself take into account the chronological order of the inputs, or whether an input is the result of the video data feature acquisition process or the measurement data feature acquisition process ("not being able to take into account" means that such information cannot be used). Therefore, when using the Transformer, information indicating the attributes of the features input to the Transformer must be provided by the embedding process to the Transformer block, which is the layer that executes the Transformer.
  • the information indicating the attribute of a feature includes, for example, information as to whether the feature is the result of a video data feature acquisition process or a measurement data feature acquisition process. For example, if the feature is the result of a video data feature acquisition process, the information indicating the attribute may include information indicating which partial video data the feature is a feature of. For example, if the feature is the result of a measurement data feature acquisition process, the information indicating the attribute may include information indicating which partial measurement data the feature is a feature of.
  • the embedding process attaches information indicating the attributes of the features used in the Transformer, so that the Transformer processing can be based on that information.
  • the embedding process in the first transformation process is, for example, sensor embedding.
  • Sensor embedding is a process that assigns an identifier to each feature input to the Transformer, identifying whether it is the result of a video data feature acquisition process or a measurement data feature acquisition process.
  • the identifier can be any identifier as long as it can identify whether it is the result of a video data feature acquisition process or a measurement data feature acquisition process.
  • the identifier may be, for example, "1" to indicate the result of the video data feature acquisition process and "0" to indicate the result of the measurement data feature acquisition process. Instead of integers such as 1 or 0, it may be a decimal value such as 0.1, or it need not even be a number.
  • the identifier to be assigned to each of the results of the video data feature acquisition process and the measurement data feature acquisition process is determined in advance.
  • Positional embedding is a process that assigns an identifier indicating which partial video data the feature is from when the feature is the result of a video data feature acquisition process, and assigns an identifier indicating which partial measurement data the feature is from when the feature is the result of a measurement data feature acquisition process.
  • the assigned identifier is, for example, an identifier that indicates the temporal order of the partial video data or partial measurement data.
  • FIG. 6 is a diagram showing an example of the first transformation process in a modified example.
  • in the example of FIG. 6, one piece of motion video data and two pieces of motion measurement data are used. The motion video data is divided into partial video data, and both pieces of motion measurement data are divided into partial measurement data.
  • in FIG. 6, both the partial video data and the partial measurement data are denoted "Clip".
  • in FIG. 6, Position Embedding and Sensor Embedding are performed. There is also a Linear layer, and the output of the Linear layer is input to a Transformer block together with the results of Position Embedding and Sensor Embedding. Evaluation is then performed based on the output of the Transformer block.
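A compact PyTorch sketch of this pipeline, with one video feature sequence and, for brevity, a single measurement feature sequence rather than the two shown in FIG. 6. The class name, feature dimensions, and estimator shape are assumptions, not taken from the patent.

```python
import torch
from torch import nn

class FirstTransformationModel(nn.Module):
    """Sketch of the first transformation process: a Linear layer aligns feature
    dimensions, Sensor/Position Embeddings mark each Clip feature's source and
    temporal order, a Transformer block processes the sequence, and an estimator
    (fully connected layers) outputs the evaluation."""

    def __init__(self, video_dim=1024, meas_dim=128, d_model=256,
                 n_sources=2, max_clips=64, n_blocks=2):
        super().__init__()
        # Linear layers aligning the (possibly different) feature dimensions.
        self.align_video = nn.Linear(video_dim, d_model)
        self.align_meas = nn.Linear(meas_dim, d_model)
        # Sensor embedding: which source a feature came from (video vs. sensor).
        self.sensor_emb = nn.Embedding(n_sources, d_model)
        # Position embedding: temporal order of the Clip within its source.
        self.pos_emb = nn.Embedding(max_clips, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=n_blocks)
        self.estimator = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(), nn.Linear(d_model, 1))

    def forward(self, video_feats, meas_feats):
        # video_feats: (B, Qv, video_dim); meas_feats: (B, Qm, meas_dim)
        v = self.align_video(video_feats)
        m = self.align_meas(meas_feats)
        x = torch.cat([v, m], dim=1)  # integrate the per-Clip features
        sensor_id = torch.cat(
            [torch.zeros(v.shape[1]), torch.ones(m.shape[1])]).long()
        pos_id = torch.cat([torch.arange(v.shape[1]), torch.arange(m.shape[1])])
        x = x + self.sensor_emb(sensor_id) + self.pos_emb(pos_id)
        return self.estimator(self.blocks(x).mean(dim=1))  # pooled evaluation
```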
  • the motion video data and motion measurement data may contain movements that are not necessarily suitable for evaluating the motion.
  • for example, the data may include the period before a competition begins, or the period after it ends when players are substituted.
  • the movements indicated by such data are not subject to evaluation and may act as noise when evaluating the motion.
  • the learning device 1 may therefore suppress such a situation by making the time-series samples in the motion video data and motion measurement data non-uniform.
  • the evaluation model may also perform evaluation based on an attention identifier, which is an identifier indicating samples in the motion video data and motion measurement data whose influence on the evaluation of the motion should be stronger than that of other samples. In such a case, the evaluation model is capable of more accurate evaluation.
  • the attention identifier is assigned to samples from a period that includes the time at which a predetermined index indicating the intensity of the motion is highest in the graph of the motion measurement data, and that satisfies a predetermined condition.
  • the evaluation model weights samples that have been assigned an attention identifier more heavily than samples that have not.
  • the predetermined condition may be any condition that specifies a period including the time at which the predetermined index indicating the intensity of the motion is highest in the graph of the motion measurement data.
  • for example, the predetermined condition specifies a period extending a predetermined time span before and after the time at which the index is highest.
  • because the more intense the motion, the stronger its impact on the evaluation of the motion should be, assigning the attention identifier in this way allows the evaluation model to perform evaluation with greater accuracy.
  • assigning an attention identifier to data means determining that the data satisfies the condition indicated by the attention identifier, and recording information indicating that determination in a storage device from which the evaluation model can read it, such as the storage unit 13 or the storage unit 23.
  • the process of making the time-series samples in the motion video data and motion measurement data non-uniform includes a process of dividing the motion video data and motion measurement data into a plurality of Clips. In this division, the data may be divided such that more Clips fall within the period that includes the time at which the predetermined index indicating the intensity of the motion is highest in the graph of the motion measurement data, and that satisfies the predetermined condition, than within other periods.
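A sketch of such non-uniform division over sample indices: Clips inside the attention window around the peak-intensity sample are made shorter, so more Clips fall in that period than in others. All parameter names are illustrative.

```python
def nonuniform_clip_bounds(n_samples, peak, window, fine_len, coarse_len):
    """Return (start, end) index pairs for Clips: shorter (denser) Clips inside
    the attention period [peak - window, peak + window), longer Clips outside."""
    lo, hi = max(0, peak - window), min(n_samples, peak + window)
    bounds, t = [], 0
    while t < n_samples:
        step = fine_len if lo <= t < hi else coarse_len
        bounds.append((t, min(t + step, n_samples)))
        t += step
    return bounds

# Example: 1000 samples, peak at index 500, denser Clips within +/-100 of it.
print(nonuniform_clip_bounds(1000, peak=500, window=100, fine_len=25, coarse_len=100))
```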
  • FIG. 7 is an explanatory diagram illustrating an example of the process of assigning an attention identifier in a modified example.
  • FIG. 7 shows three types of time series.
  • the three types of time series are IMU signals.
  • in FIG. 7, an attention identifier is assigned to the motion video data and motion measurement data within a period that satisfies a predetermined attention period condition, which includes at least the condition that the maximum derivative time falls within the period.
  • the maximum derivative time is the time at which the derivative of the graph of the motion measurement data takes its maximum value; while the subject is moving, the value of the motion measurement data should change more drastically than while it is not moving.
  • the maximum derivative time is therefore an example of the time at which a predetermined index indicating the intensity of the motion is highest in the graph of the motion measurement data.
  • in FIG. 7, the set of samples to which an attention identifier has been assigned are the samples in the period labeled "Technique".
  • the horizontal axis indicates time.
  • the "Preparation Period” is the period when the athlete is preparing.
  • a "Technique” is the period when the athlete is performing an action.
  • Post-Execution is the period when the athlete has finished performing an action and is not performing an action.
  • the process of assigning an attention identifier to the action video data and action measurement data within a period that satisfies the attention period condition based on the action measurement data is referred to as the attention identifier assignment process.
  • the attention period condition may be any condition including a condition that the maximum derivative time is included within the period.
  • the attention period condition may be, for example, the condition that the period has a predetermined time width before and after the maximum derivative time.
  • the attention period condition may also require, for example, that such a period be longer than any period that does not satisfy the attention period condition.
  • the attention period condition may be, for example, that the period has a predetermined time width before and after the maximum derivative time and that the length of the period is a predetermined length that is at least 1/3 of the time from the start to the end of the video shown by the motion video data.
  • the video shown by the motion video data may show a preparation period during which the athlete gets ready, a period during which the athlete performs, and a period after the athlete has finished performing.
  • if these three periods are all of approximately the same duration, it is preferable to have the evaluation model focus on a period that is centered on the maximum derivative time and is 1/3 the length of the video shown by the motion video data.
  • in such a case, the attention period condition is, for example, that the period has a predetermined time width before and after the maximum derivative time and that the length of the period is a predetermined length that is at least 1/3 of the time from the start to the end of the video shown by the motion video data.
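A sketch of how the attention period could be computed from a one-dimensional measurement signal under the 1/3-length reading above: the maximum derivative time is located, a window one third of the video's duration is centered on it, and each Clip whose center falls inside the window receives the attention identifier. Helper names are illustrative.

```python
import numpy as np

def attention_period(signal: np.ndarray, total_duration: float):
    """Find the maximum derivative time of a 1-D measurement signal and return
    an attention period of length total_duration / 3 centered on it."""
    t = np.linspace(0.0, total_duration, len(signal))
    deriv = np.abs(np.gradient(signal, t))  # |d(signal)/dt|
    t_max = t[np.argmax(deriv)]             # maximum derivative time
    half = total_duration / 6.0             # window length = duration / 3
    return max(0.0, t_max - half), min(total_duration, t_max + half)

def assign_attention(clip_times, period):
    """Attention identifier per Clip: 1 if the Clip's center time falls in the
    attention period, else 0; flagged Clips later receive larger weights."""
    lo, hi = period
    return [1 if lo <= 0.5 * (s + e) <= hi else 0 for (s, e) in clip_times]
```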
  • when the attention identifier assignment process is executed, it may be executed at any timing before the evaluation model determines the evaluation of the motion.
  • the attention identifier assignment process is executed, for example, before the video data feature acquisition process and the measurement data feature acquisition process are executed.
  • in such a case, the evaluation model acquires features in the video data feature acquisition process and the measurement data feature acquisition process by weighting data to which an attention identifier has been assigned more heavily than other data.
  • the attention identifier assignment process may also be executed, for example, after the video data division process and the measurement data division process are executed. Note that, in this case, an attention identifier may be assigned to each piece of partial video data and partial measurement data, thereby assigning the attention identifier to the samples they contain.
  • if the attention identifier assignment process is executed after the video data division process and the measurement data division process, it may, for example, be executed after the video data feature acquisition process and the measurement data feature acquisition process.
  • in such a case, the evaluation model may determine the evaluation by giving a larger weight to those of the obtained features to which an attention identifier has been assigned.
  • the attention identifier assignment process may be executed by the control unit 11 during the learning stage, or may be executed by an external device when the video data division process and the measurement data division process are executed by the external device.
  • in that case, the control unit 11 obtains the results of the attention identifier assignment process, together with the results of the video data division process and the measurement data division process, via the interface unit 12.
  • the evaluation device 2 also uses the result of the attention identifier assignment process.
  • the control unit 21 obtains the result of the attention identifier assignment process by, for example, executing the attention identifier assignment process itself.
  • alternatively, the control unit 21 may obtain the result of the attention identifier assignment process by having an external device execute the process and acquiring the result.
  • the learning device 1 may be implemented using multiple information processing devices connected to each other so that they can communicate with each other via a network. In this case, each process executed by the control unit 11 may be executed in a distributed manner by the multiple information processing devices.
  • the evaluation device 2 may be implemented using multiple information processing devices connected to each other so that they can communicate with each other via a network. In this case, each process executed by the control unit 21 may be executed in a distributed manner by the multiple information processing devices.
  • the learning device 1 and the evaluation device 2 do not necessarily need to be implemented in different housings, and may be integrated into a single housing.
  • the learning device 1 and the evaluation device 2 may be a computer that functions as both the learning device 1 and the evaluation device 2 by executing a program.
  • All or part of the functions of the learning device 1 and the evaluation device 2 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array).
  • the program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems.
  • the program may be transmitted via a telecommunications line.
  • the control unit 21 is an example of a trained evaluation model execution unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

One aspect of the present invention is a training device comprising a control unit that trains a training target by using: action video data that is video data of a video showing an action of an evaluation target, which is a person or animal whose action is to be evaluated; action measurement data indicating time-series results, which are obtained during an action shown in the action video data, of an action measurement sensor that is a sensor attached to at least one of the body of the evaluation target, an item worn by the evaluation target, and an item used by the evaluation target during the action, and obtains a result according to the action of the evaluation target; and correct data indicating a result of evaluating the action. The training target is a mathematical model that evaluates the action on the basis of the action video data and the action measurement data.

Description

Learning device, evaluation device, learning method, evaluation method, and program

 The present invention relates to a learning device, an evaluation device, a learning method, an evaluation method, and a program.

 Interest is growing in technology that allows computers to evaluate the movements of people being evaluated based on video footage, such as in judging the movements of athletes in competitions such as figure skating, surfing, and diving, and in judging the movements of medical interns during medical procedures such as surgery.

Paritosh Parmar and Brendan Tran Morris, "What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment", arXiv:1904.04346v2 (2019).

 However, the environments in which sports and medical procedures are performed are not necessarily the same. For example, the lighting may differ for an indoor movement, and the weather may differ for an outdoor movement. The footage shows not only the person being evaluated but also these environmental differences. As a result, even if the movements of the person being evaluated are the same, the computer evaluation results may differ. In other words, the accuracy of the computer evaluation may be poor. This situation is not limited to cases where the subject being evaluated is a human; it is the same when the subject is an animal, such as a dog or cat.

 In view of the above, the present invention aims to provide a technology that improves the accuracy of motion evaluation.

 One aspect of the present invention is a learning device that includes a control unit that trains a learning target using: motion video data, which is video data of a video showing the motion of an evaluation subject, a human or animal whose motion is to be evaluated; motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor, a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject; and ground truth data, which indicates the result of evaluating the motion. The learning target is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.

 One aspect of the present invention is an evaluation device that includes: an interface unit that acquires a set of motion video data showing the motion of an evaluation subject and motion measurement data obtained during the motion shown in the motion video data; and a learned evaluation model execution unit that evaluates the motion shown in the motion video data included in the set, using the set acquired by the interface unit and the learned mathematical model obtained by a learning device as described above, that is, a learning device with a control unit that trains, as the learning target, a mathematical model that evaluates the motion based on motion video data, which is video data of a video showing the motion of an evaluation subject (a human or animal whose motion is to be evaluated), motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor (a sensor attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results according to the motion of the evaluation subject), and ground truth data indicating the result of evaluating the motion.

 本発明の一態様は、動作の評価対象の人又は動物である評価対象体の動作を写す映像の映像データである動作映像データと、前記評価対象体の身体と前記評価対象体が身に着けるものと前記評価対象体が前記動作の際に用いるものとの少なくとも1つに取り付けられたセンサであって前記評価対象体の動作に応じた結果を得るセンサである動作計測センサが、前記動作映像データに写る動作中に得た結果の時系列である動作計測データと、前記動作の評価の結果を示す正解データと、を用いて、学習対象の学習を行う制御ステップ、を有し、前記学習対象は、前記動作映像データと前記動作計測データとに基づいて前記動作の評価を行う数理モデルである、学習方法である。 One aspect of the present invention is a learning method that includes a control step of learning a learning subject using motion video data, which is video data of a video showing the motion of an evaluation subject, which is a human or animal whose motion is to be evaluated, motion measurement data, which is a time series of results obtained during the motion shown in the motion video data by a motion measurement sensor, which is a sensor attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and which obtains a result according to the motion of the evaluation subject, during the motion shown in the motion video data, and ground truth data indicating the result of the evaluation of the motion, and the learning subject is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.

 本発明の一態様は、評価対象の動作を写す動作映像データと、前記動作映像データに写る動作中に得られた動作計測データとの組を取得するインタフェースステップと、動作の評価対象の人又は動物である評価対象体の動作を写す映像の映像データである動作映像データと、前記評価対象体の身体と前記評価対象体が身に着けるものと前記評価対象体が前記動作の際に用いるものとの少なくとも1つに取り付けられたセンサであって前記評価対象体の動作に応じた結果を得るセンサである動作計測センサが、前記動作映像データに写る動作中に得た結果の時系列である動作計測データと、前記動作の評価の結果を示す正解データと、を用いて、学習対象の学習を行う制御ステップ、を有し、前記学習対象は、前記動作映像データと前記動作計測データとに基づいて前記動作の評価を行う数理モデルである、学習方法の得た学習済みの前記数理モデルと、前記インタフェースステップの取得した前記組と、を用い、前記組に含まれる前記動作映像データに写る動作の評価を行う、学習済み評価モデル実行ステップと、を有する評価方法である。 One aspect of the present invention is an evaluation method including an interface step of acquiring a set of motion video data showing the motion of an evaluation target and motion measurement data obtained during the motion shown in the motion video data; a control step of performing learning of a learning target using motion video data, which is video data showing the motion of an evaluation target which is a person or animal whose motion is to be evaluated, motion measurement data, which is a time series of results obtained during the motion shown in the motion video data by a motion measurement sensor which is a sensor attached to at least one of the body of the evaluation target, something worn by the evaluation target, and something used by the evaluation target during the motion, and which obtains a result according to the motion of the evaluation target, and ground truth data indicating the result of the evaluation of the motion, and the learning target is a mathematical model which evaluates the motion based on the motion video data and the motion measurement data, and a learned evaluation model execution step of evaluating the motion shown in the motion video data included in the set using the learned mathematical model obtained by the learning method and the set obtained by the interface step.

 本発明の一態様は、上記の学習装置と、上記の評価装置と、のいずれか一方又は両方としてコンピュータを機能させるためのプログラムである。 One aspect of the present invention is a program for causing a computer to function as either or both of the above-mentioned learning device and the above-mentioned evaluation device.

 本発明により、動作の評価の精度の高める技術を提供することが可能となる。 The present invention makes it possible to provide technology that improves the accuracy of motion evaluation.

FIG. 1 is a diagram showing an example of the configuration of an evaluation system according to an embodiment.
FIG. 2 is a diagram showing an example of the hardware configuration of a learning device according to the embodiment.
FIG. 3 is a diagram showing an example of the hardware configuration of an evaluation device according to the embodiment.
FIG. 4 is a flowchart showing an example of the flow of processing executed by the learning device according to the embodiment.
FIG. 5 is a flowchart showing an example of the flow of processing executed by the evaluation device according to the embodiment.
FIG. 6 is a diagram showing an example of a first modification process according to a modified example.
FIG. 7 is an explanatory diagram illustrating an example of a process of assigning an attention identifier according to the modified example.

(Embodiment)
FIG. 1 is a diagram showing an example of the configuration of an evaluation system 100 according to the embodiment. The evaluation system 100 includes a learning device 1 and an evaluation device 2. The learning device 1 includes a control unit 11, which includes a processor 91, such as a CPU (Central Processing Unit), and a memory 92 connected by a bus, and executes a program.

The control unit 11 executes a learning process. The learning process is a process of training the mathematical model to be learned. The mathematical model to be learned is a mathematical model (hereinafter referred to as the "evaluation model") that is trained using sets of motion video data, motion measurement data, and ground truth data (hereinafter referred to as "learning data") and that evaluates the motion of an evaluation subject.

The motion video data is video data of a video capturing the motion of a human or an animal whose motion is to be evaluated (hereinafter referred to as the "evaluation subject").

The evaluation subject may be any human or animal whose motion is to be evaluated. The evaluation subject is, for example, an athlete in a sport such as figure skating, surfing, or diving. When the evaluation subject is an athlete, the motions evaluated by the evaluation model are the athlete's motions during competition. The evaluation subject may also be, for example, a medical trainee. When the evaluation subject is a medical trainee, the motions evaluated by the evaluation model are motions during a medical procedure such as surgery.

The motion measurement data is a time series of results obtained by a motion measurement sensor during the motion captured in the motion video data. The motion measurement sensor is a sensor that is attached to at least one of the body of the evaluation subject, something worn by the evaluation subject, and something used by the evaluation subject when performing the motion, and that obtains results corresponding to the motion of the evaluation subject.

The motion measurement sensor is, for example, an acceleration sensor, in which case the results it obtains are accelerations, and the time series of its results is a time series of accelerations. The motion measurement sensor may instead be, for example, an angular velocity sensor, in which case the results it obtains are angular velocities, and the time series of its results is a time series of angular velocities.

The motion measurement sensor may also be, for example, an inertial measurement unit (IMU), or a sensor that obtains biological signals that change in response to the motion of the evaluation subject, such as heart rate, brain waves, pulse, blood pressure, respiration, or perspiration.

If a sensor that obtains results corresponding to the motion of the evaluation subject is attached to the body of the evaluation subject or to something the evaluation subject wears, the results it obtains are less affected than video by environmental factors such as the intensity or color of lighting, and are highly correlated with the motion of the evaluation subject. An evaluation based on the motion measurement data in addition to the motion video data is therefore more accurate than an evaluation based on the motion video data alone.

Things used by the evaluation subject when performing the motion are described next. For example, when the sport is surfing, the evaluation subject uses a surfboard during competition. In many scenes during competition, the surfboard moves in a way that is highly correlated with the motion of the evaluation subject. Therefore, if a sensor is attached to the surfboard, results highly correlated with the motion of the evaluation subject can be obtained even though the sensor is not attached to the body of the evaluation subject. These results, too, are less affected than video by environmental factors such as the intensity or color of lighting.

Therefore, even when the sensor is attached to something used by the evaluation subject when performing the motion, using the time series of results obtained by the sensor yields a more accurate evaluation than one based on the motion video data alone. This applies not only to surfing but to any event in which motions performed with equipment, such as skiing or snowboarding, are scored.

When the motion evaluated by the evaluation model is a medical procedure, the movement of a medical instrument used during the procedure, such as a scalpel, is likewise highly correlated with the motion of the evaluation subject. Therefore, even when the motion is one performed during a medical procedure, if a motion measurement sensor is attached to the medical instrument, the sensor obtains results highly correlated with the motion of the evaluation subject even though it is not attached to the body of the evaluation subject. These results, too, are less affected than video by environmental factors such as the intensity or color of lighting.

Therefore, even when the motion to be evaluated is a medical procedure and the sensor is attached to something used by the evaluation subject when performing the motion, using the time series of results obtained by the sensor yields a more accurate evaluation than one based on the motion video data alone.

When the evaluation subject is an animal such as a dog or a cat, the motion measurement sensor may be attached to, for example, a collar, or, when a tag is attached to a leg or the like, to that tag.

Such a motion measurement sensor may be a sensor that obtains information on the movement of the evaluation subject in each of three linearly independent directions, such as forward/backward, left/right, and up/down, or a sensor that obtains such information in one or two dimensions.

The motion measurement data used by the evaluation model need not be the results obtained by a single motion measurement sensor. The evaluation model may use motion measurement data obtained by a plurality of motion measurement sensors, in which case a plurality of pieces of motion measurement data are input to the evaluation model. The plurality of motion measurement sensors may be of multiple types that obtain mutually different physical quantities, such as a motion measurement sensor that obtains accelerations and a motion measurement sensor that obtains biological information. In such a case, the evaluation model uses the time series of each of the multiple types of physical quantities as motion measurement data for the evaluation.
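
As a purely illustrative sketch of what such inputs might look like (the shapes, sampling rates, and variable names below are assumptions, not part of the embodiment), the evaluation model could receive one video array together with one time series per motion measurement sensor:

    import numpy as np

    # Motion video data: T frames of H x W RGB images (10 s at 30 fps, assumed).
    video = np.zeros((300, 224, 224, 3), dtype=np.float32)

    # Motion measurement data from two sensors of different types (assumed rates):
    accel = np.zeros((1000, 3), dtype=np.float32)  # 3-axis acceleration at 100 Hz
    heart = np.zeros((10, 1), dtype=np.float32)    # heart rate at 1 Hz

    # The evaluation model is given the video together with every sensor's series.
    model_inputs = {"video": video, "measurements": [accel, heart]}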

The ground truth data is data indicating the result of an evaluation of the motion of the evaluation subject.

In the learning process, the control unit 11 updates the evaluation model, based on the evaluation result of the evaluation model and the ground truth data, so as to reduce the difference between the evaluation result of the evaluation model and the ground truth data. The difference may be expressed, for example, as an L1 norm, as an L2 norm, or as a linear sum of an L1 norm and an L2 norm.
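
Writing the evaluation result of the evaluation model as \hat{y} and the ground truth data as y, this difference can be written, as one hedged illustration, as

    L(\hat{y}, y) = \lambda_1 \lVert \hat{y} - y \rVert_1 + \lambda_2 \lVert \hat{y} - y \rVert_2, \qquad \lambda_1, \lambda_2 \ge 0,

where (\lambda_1, \lambda_2) = (1, 0) gives the L1 norm, (0, 1) gives the L2 norm, and both positive give the linear sum; the weights \lambda_1 and \lambda_2 are assumptions introduced here for illustration only.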

The learning process is executed until a predetermined termination condition for the learning (hereinafter referred to as the "learning termination condition") is satisfied. The learning termination condition is, for example, a condition that the change in the learning target caused by the learning is smaller than a predetermined change. The learning termination condition may instead be, for example, a condition that the learning target has been updated a predetermined number of times or more.

The evaluation device 2 evaluates the motion of the evaluation subject using the trained evaluation model, which is the evaluation model at the time the learning termination condition is satisfied. The evaluation device 2 accepts, as input, a set of motion video data capturing the motion to be evaluated and motion measurement data obtained during the motion captured in the motion video data (hereinafter referred to as "inference stage data"), and performs the evaluation based on the accepted inference stage data.

<About the evaluation model>
The evaluation model may take any form as long as it can evaluate the motion of the evaluation subject using the motion video data, the motion measurement data, and the ground truth data.

The evaluation model includes, for example, a video data feature acquisition process and a measurement data feature acquisition process. The video data feature acquisition process is a process of acquiring features of the motion video data. The measurement data feature acquisition process is a process of acquiring features of the motion measurement data.

An evaluation model that includes the video data feature acquisition process and the measurement data feature acquisition process evaluates the motion of the evaluation subject based on the result of the video data feature acquisition process and the result of the measurement data feature acquisition process.

<Targets of the feature acquisition processes>
In the video data feature acquisition process, a feature may be obtained for each set (hereinafter referred to as "partial video data") of motion video data that has been divided into a plurality of sets along the time axis. In this case, a feature of each piece of partial video data is obtained for that piece of partial video data.

For example, for motion video data whose video consists of P x Q frames (P being an integer of 1 or more, Q being an integer of 2 or more), the video data feature acquisition process divides the video, for example in frame-number order, into one piece of partial video data for every P frames and acquires a feature for each piece of partial video data. In such a case, the video data feature acquisition process therefore obtains Q features from the motion video data.

The process of dividing the motion video data into partial video data (hereinafter referred to as the "video data division process"), when executed, may be executed at any time before the video data feature acquisition process is executed.

The video data division process is, for example, a process of dividing the motion video data along the time axis into segments of a predetermined number of frames.
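
A minimal sketch of this division, assuming for illustration that the frames are held in a numpy array and that a trailing remainder shorter than P frames is simply dropped:

    import numpy as np

    def split_video(frames: np.ndarray, p: int) -> list:
        """Divide a (T, H, W, C) frame array along the time axis into
        partial video data of p frames each, in frame-number order."""
        q = len(frames) // p
        return [frames[i * p:(i + 1) * p] for i in range(q)]

    # P = 30 frames per clip on a 300-frame video yields Q = 10 partial video data.
    clips = split_video(np.zeros((300, 224, 224, 3), dtype=np.float32), p=30)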

Motion video data that has been divided into partial video data is itself a kind of motion video data. Therefore, the motion video data input to the evaluation model may be motion video data that has already undergone the video data division process. In that case, the video data division process may be executed, for example, by the control unit 11, or by a device other than the learning device 1.

When another device executes the process, the control unit 11 acquires its result, for example via the interface unit 12 described below, and uses it in the evaluation performed by executing the evaluation model. The motion video data input to the evaluation model may instead be motion video data before the video data division process; in such a case, when the video data division process is also to be executed, the evaluation model executes the video data division process.

The same applies to the measurement data feature acquisition process. In the measurement data feature acquisition process, a feature may be obtained for each set (hereinafter referred to as "partial measurement data") of motion measurement data that has been divided into a plurality of sets along the time axis at a predetermined time width. In this case, a feature of each piece of partial measurement data is obtained for that piece of partial measurement data.

When features are obtained for each of a plurality of pieces of partial measurement data, more features are obtained than when the motion measurement data is not divided into partial measurement data.

The process of dividing the motion measurement data into partial measurement data (hereinafter referred to as the "measurement data division process"), when executed, may be executed at any time before the measurement data feature acquisition process is executed.

The measurement data division process is, for example, a process of dividing the motion measurement data, which is a kind of time series, along the time axis into segments of a predetermined time width.
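
The analogous sketch for the measurement data division process (the 100 Hz sampling rate and 1-second window width are assumptions for illustration):

    import numpy as np

    def split_measurements(samples: np.ndarray, rate_hz: float,
                           width_s: float) -> list:
        """Divide a (T, D) sensor time series along the time axis into
        partial measurement data of width_s seconds each."""
        w = int(rate_hz * width_s)          # samples per window
        n = len(samples) // w
        return [samples[i * w:(i + 1) * w] for i in range(n)]

    windows = split_measurements(np.zeros((1000, 3)), rate_hz=100.0, width_s=1.0)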

Motion measurement data that has been divided into partial measurement data is itself a kind of motion measurement data. Therefore, the motion measurement data input to the evaluation model may be motion measurement data that has already undergone the measurement data division process. In that case, the measurement data division process may be executed, for example, by the control unit 11, or by a device other than the learning device 1.

When another device executes the process, the control unit 11 acquires its result, for example via the interface unit 12 described below, and uses it in the evaluation performed by executing the evaluation model. The motion measurement data input to the evaluation model may instead be motion measurement data before the measurement data division process; in such a case, when the measurement data division process is also to be executed, the evaluation model executes the measurement data division process.

<How the features are used for the evaluation>
As described above, the evaluation model evaluates the motion of the evaluation subject based on the result of the video data feature acquisition process and the result of the measurement data feature acquisition process. In this evaluation, the result of the video data feature acquisition process and the result of the measurement data feature acquisition process may be used in an integrated state.

Alternatively, in the evaluation of the motion of the evaluation subject based on the two results, the result of the video data feature acquisition process and the result of the measurement data feature acquisition process may be used without being integrated.

Integration is a process of combining different pieces of data into one: for example, when the data are expressed as vectors, it is a process of obtaining the direct product of the vectors, and when the data are expressed as matrices, it may be a process of concatenating the matrices in the row or column direction.
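
A minimal sketch of such an integration, treating each feature as a fixed-length vector or as a per-Clip feature matrix (all dimensions are assumptions):

    import numpy as np

    # Vector case: combine the two feature vectors into one.
    video_feat = np.zeros(512)    # result of the video data feature acquisition
    sensor_feat = np.zeros(128)   # result of the measurement data feature acquisition
    fused_vec = np.concatenate([video_feat, sensor_feat])            # 640-dimensional

    # Matrix case: concatenate per-clip feature matrices in the row direction.
    video_feats = np.zeros((10, 256))    # one row per partial video data
    sensor_feats = np.zeros((10, 256))   # one row per partial measurement data
    fused_mat = np.concatenate([video_feats, sensor_feats], axis=0)  # (20, 256)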

For the integration, one or more fully connected layers may be used, or the encoder of the Transformer described in Reference 1 below may be used. In that case, any number of Transformer blocks may be used; for example, two blocks may be connected in series.
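
As a sketch of the Transformer option in PyTorch (the model dimension and number of heads are assumptions; only the use of two Transformer blocks connected in series follows the text):

    import torch
    import torch.nn as nn

    d_model = 256
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
    encoder = nn.TransformerEncoder(layer, num_layers=2)  # two blocks in series

    tokens = torch.zeros(1, 12, d_model)  # e.g., 12 per-Clip features as tokens
    integrated = encoder(tokens)          # shape: (1, 12, 256)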

Reference 1: Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).

<Example of hardware configuration>
FIG. 2 is a diagram showing an example of the hardware configuration of the learning device 1 according to the embodiment. The learning device 1 includes the control unit 11 as described above and executes a program. By executing the program, the learning device 1 functions as a device including the control unit 11, an interface unit 12, and a storage unit 13.

More specifically, the processor 91 reads a program stored in the storage unit 13 and stores the read program in the memory 92. By the processor 91 executing the program stored in the memory 92, the learning device 1 functions as a device including the control unit 11, the interface unit 12, and the storage unit 13.

The control unit 11 performs, for example, the learning process. The control unit 11 executes, for example, the video data division process and the measurement data division process. The control unit 11 controls, for example, the operation of each functional unit of the learning device 1. The control unit 11 acquires the learning data, for example, via the interface unit 12. The control unit 11 also acquires information stored in the storage unit 13; acquiring information stored in the storage unit 13 specifically means reading it.

The interface unit 12 includes a communication interface for connecting the learning device 1 to external devices, and communicates with the external devices by wire or wirelessly. An external device is, for example, the device from which the learning data is transmitted; in such a case, the interface unit 12 acquires the learning data by communicating with that device.

When an external device executes the video data division process, the interface unit 12 obtains the motion video data resulting from that process as the motion video data included in the learning data. Likewise, when an external device executes the measurement data division process, the interface unit 12 obtains the motion measurement data resulting from that process as the motion measurement data included in the learning data.

An external device is also, for example, the evaluation device 2. By communicating with the evaluation device 2, the interface unit 12 transmits the trained evaluation model obtained by the learning process to the evaluation device 2.

The interface unit 12 may include input devices such as a mouse, a keyboard, or a touch panel, or may be configured as an interface that connects these input devices to the learning device 1. The interface unit 12 thus accepts input of various kinds of information to the learning device 1 via an input device, by wire or wirelessly. The learning data need not necessarily be input to the communication interface and may be input to an input device.

The interface unit 12 may output various kinds of information. The interface unit 12 may include a display device such as a CRT (Cathode Ray Tube) display, a liquid crystal display, or an organic EL (Electro-Luminescence) display, or may be configured as an interface that connects these display devices to the learning device 1. The interface unit 12 outputs, for example, information that has been input to it.

The storage unit 13 is configured using a computer-readable storage medium device (a non-transitory computer-readable recording medium) such as a magnetic hard disk device or a semiconductor storage device. The storage unit 13 stores various kinds of information concerning the learning device 1: for example, information necessary for the learning process, the evaluation model stored in advance, various kinds of information generated by the operation of the control unit 11 (and thus, for example, the trained evaluation model), and information acquired by the interface unit 12.

FIG. 3 is a diagram showing an example of the hardware configuration of the evaluation device 2 according to the embodiment. The evaluation device 2 includes the control unit 21 (an example of a trained evaluation model execution unit) as described above and executes a program. By executing the program, the evaluation device 2 functions as a device including the control unit 21, an interface unit 22, and a storage unit 23.

More specifically, the processor 93 reads a program stored in the storage unit 23 and stores the read program in the memory 94. By the processor 93 executing the program stored in the memory 94, the evaluation device 2 functions as a device including the control unit 21, the interface unit 22, and the storage unit 23.

The control unit 21 executes the trained evaluation model, for example, on inference stage data. The control unit 21 thereby evaluates the motion captured in the motion video data included in the inference stage data; that is, the control unit 21 uses the trained evaluation model and the inference stage data to evaluate the motion captured in the motion video data included in the inference stage data.

When the control unit 11 executed the video data division process in the learning stage of the evaluation model and the video data division process is not included in the evaluation model, the control unit 21 executes the video data division process before executing the trained evaluation model. Likewise, when the control unit 11 executed the measurement data division process in the learning stage and the measurement data division process is not included in the evaluation model, the control unit 21 executes the measurement data division process before executing the trained evaluation model.

The control unit 21 controls, for example, the operation of each functional unit of the evaluation device 2. The control unit 21 acquires various kinds of information obtained, for example, via the interface unit 22; it thus acquires, for example, the data on which the trained evaluation model is executed (that is, the inference stage data). The control unit 21 also acquires information stored in the storage unit 23; acquiring information stored in the storage unit 23 specifically means reading it.

The interface unit 22 includes a communication interface for connecting the evaluation device 2 to external devices, and communicates with the external devices by wire or wirelessly. An external device is, for example, the device from which the inference stage data is transmitted; in such a case, the interface unit 22 acquires the inference stage data by communicating with that device. An external device is also, for example, the learning device 1; the interface unit 22 acquires the trained evaluation model by communicating with the learning device 1.

When an external device executes the video data division process, the interface unit 22 obtains the motion video data resulting from that process as the motion video data included in the inference stage data. When an external device executed the video data division process in the learning stage of the evaluation model, the evaluation device 2 likewise obtains the result of the video data division process executed by the external device.

When an external device executes the measurement data division process, the interface unit 22 obtains the motion measurement data resulting from that process as the motion measurement data included in the inference stage data. When an external device executed the measurement data division process in the learning stage of the evaluation model, the evaluation device 2 likewise obtains the result of the measurement data division process executed by the external device.

The interface unit 22 includes input devices such as a mouse, a keyboard, or a touch panel, or may be configured as an interface that connects these input devices to the evaluation device 2. The interface unit 22 thus accepts input of various kinds of information to the evaluation device 2 via an input device, by wire or wirelessly. The inference stage data need not necessarily be input to the communication interface and may be input to an input device.

The interface unit 22 outputs various kinds of information. The interface unit 22 includes a display device such as a CRT display, a liquid crystal display, or an organic EL display, or may be configured as an interface that connects these display devices to the evaluation device 2. The interface unit 22 outputs, for example, information that has been input to it.

The storage unit 23 is configured using a computer-readable storage medium device such as a magnetic hard disk device or a semiconductor storage device. The storage unit 23 stores various kinds of information concerning the evaluation device 2: for example, the trained evaluation model, various kinds of information generated by the operation of the control unit 21, and information acquired by the interface unit 22.

FIG. 4 is a flowchart showing an example of the flow of processing executed by the learning device 1 according to the embodiment. The control unit 11 acquires the learning data (step S101). The control unit 11 executes the learning process using the acquired learning data until the learning termination condition is satisfied (step S102). The control unit 11 then outputs the trained evaluation model to a predetermined output destination (step S103). The predetermined output destination is, for example, the storage unit 13; output to the storage unit 13 means writing to the storage unit 13. The predetermined output destination may instead be an external device, such as the evaluation device 2, communicatively connected via the interface unit 12.
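
A minimal sketch of steps S101 to S103 as a training loop (the model, data loader, optimizer, and file name are placeholders introduced here for illustration; the embodiment does not prescribe them):

    import torch
    import torch.nn.functional as F

    def run_learning(model, loader, max_updates=10_000):
        """Steps S101-S103, with the learning termination condition taken here
        to be a predetermined number of updates (one of the example conditions)."""
        opt = torch.optim.Adam(model.parameters())
        updates = 0
        while updates < max_updates:                           # S102: learning process
            for video, sensors, truth in loader:               # S101: learning data
                loss = F.l1_loss(model(video, sensors), truth) # e.g., L1 norm
                opt.zero_grad()
                loss.backward()
                opt.step()
                updates += 1
                if updates >= max_updates:
                    break
        torch.save(model.state_dict(), "trained_evaluation_model.pt")  # S103: output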

FIG. 5 is a flowchart showing an example of the flow of processing executed by the evaluation device 2 according to the embodiment. For simplicity of explanation, in the example of the flowchart of FIG. 5 the trained evaluation model is assumed to be stored in the storage unit 23 in advance.

Inference stage data is input to the interface unit 22, and the control unit 21 acquires the inference stage data input to the interface unit 22 (step S201). Here, inference stage data being input to the interface unit 22 means that the interface unit 22 acquires the inference stage data.

The control unit 21 executes the trained evaluation model on the inference stage data obtained in step S201 (step S202). The processing of step S202 evaluates the motion captured in the motion video data included in the inference stage data acquired in step S201. After step S202, the control unit 21 outputs the evaluation result obtained in step S202 to a predetermined output destination (step S203). The predetermined output destination is, for example, the storage unit 23; output to the storage unit 23 means writing to the storage unit 23. The predetermined output destination may instead be a predetermined external device communicatively connected via the interface unit 22.
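
The corresponding sketch of steps S201 to S203 (the model object, the file names, and the argument layout are assumptions carried over from the training sketch above):

    import torch

    def run_evaluation(model, video, sensors, out_path="evaluation_result.pt"):
        """Steps S201-S203: evaluate the motion in one set of inference stage data."""
        model.load_state_dict(torch.load("trained_evaluation_model.pt"))
        model.eval()
        with torch.no_grad():
            result = model(video, sensors)   # S202: execute the trained model
        torch.save(result, out_path)         # S203: output the evaluation result
        return result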

The learning device 1 configured as described above trains the evaluation model based not only on the motion video data but also on the motion measurement data. Because the motion measurement data is obtained by a motion measurement sensor, it is less affected than video by environmental factors such as the intensity or color of lighting. An evaluation model trained based on the motion measurement data in addition to the motion video data can therefore evaluate with higher accuracy than one based on the motion video data alone. The learning device 1 thus improves the accuracy of motion evaluation.

The evaluation device 2 configured as described above evaluates motion using the trained mathematical model obtained by the learning device 1, and therefore also improves the accuracy of motion evaluation.

The evaluation system 100 configured as described above includes the learning device 1, and therefore improves the accuracy of motion evaluation.

(Modification)
In evaluating the motion of the evaluation subject based on the features obtained by the video data feature acquisition process and the features obtained by the measurement data feature acquisition process, for example, the following first modification process may be executed. In the first modification process, the dimensions of the features obtained by the video data feature acquisition process and the features obtained by the measurement data feature acquisition process are aligned via a Linear layer, the aligned features are input to a Transformer block via an Embedding process, and the evaluation is produced through an estimator. The estimator is, for example, a plurality of fully connected layers.

In the first modification process, the processing of the Linear layer aligns the dimensions of the features obtained by the video data feature acquisition process and the features obtained by the measurement data feature acquisition process; it therefore does not necessarily have to be executed when the two sets of features already have the same dimensions. The Linear layer may have any structure and may be, for example, a single fully connected layer.

The Embedding process is explained next. Because a Transformer block handles its inputs in parallel, it cannot by itself take into account the temporal order of the inputs or whether an input is a result of the video data feature acquisition process or of the measurement data feature acquisition process; "cannot take into account" means that such information cannot be used. Therefore, when a Transformer is used, information indicating the attributes of the features input to the Transformer needs to be given, by the Embedding process, to the Transformer block, which is the layer that executes the Transformer.

The information indicating the attributes of a feature includes, for example, information as to whether the feature is a result of the video data feature acquisition process or of the measurement data feature acquisition process. When the feature is a result of the video data feature acquisition process, the attribute information may include information indicating which piece of partial video data the feature belongs to; when the feature is a result of the measurement data feature acquisition process, it may include information indicating which piece of partial measurement data the feature belongs to.

The Embedding process therefore acquires the information indicating the attributes of the features used by the Transformer, so that the Transformer processing is performed based on that attribute information.

The Embedding process in the first modification process is, for example, Sensor Embedding. Sensor Embedding is a process of assigning, to each feature input to the Transformer, an identifier that identifies whether the feature is a result of the video data feature acquisition process or of the measurement data feature acquisition process. Any identifier may be used as long as it can distinguish the two.

The identifier may therefore be such that, for example, "1" indicates a result of the video data feature acquisition process and "0" indicates a result of the measurement data feature acquisition process. The identifier need not be an integer such as 1 or 0; it may be a decimal such as 0.1, and need not be a number at all.

The identifier to be assigned is determined in advance according to whether the feature is a result of the video data feature acquisition process or of the measurement data feature acquisition process.

The Embedding process in the first modification process is also, for example, Positional Embedding. Positional Embedding is a process of assigning, when a feature is a result of the video data feature acquisition process, an identifier indicating which piece of partial video data the feature belongs to, and, when a feature is a result of the measurement data feature acquisition process, an identifier indicating which piece of partial measurement data the feature belongs to. The assigned identifier is, for example, an identifier indicating the temporal order of the partial video data or partial measurement data.
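
As a toy sketch of the two Embedding processes combined (attaching the identifiers as extra channels is an assumption; learned additive embeddings, as in the model sketch above, are an equally valid reading of the text):

    import torch

    video_tokens = torch.zeros(1, 12, 256)   # per-Clip video features
    sensor_tokens = torch.zeros(1, 12, 256)  # per-Clip measurement features

    # Sensor Embedding: "1" marks video-derived features, "0" measurement-derived.
    video_id = torch.ones(1, 12, 1)
    sensor_id = torch.zeros(1, 12, 1)
    # Positional Embedding: the temporal order of the Clips.
    order = torch.arange(12, dtype=torch.float32).view(1, 12, 1)

    tokens = torch.cat([
        torch.cat([video_tokens, video_id, order], dim=-1),
        torch.cat([sensor_tokens, sensor_id, order], dim=-1),
    ], dim=1)   # (1, 24, 258): each token carries its source and order identifiers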

FIG. 6 is a diagram showing an example of the first modification process according to the modified example. In the example of FIG. 6, one piece of motion video data and two pieces of motion measurement data are used; the motion video data is divided into partial video data, both pieces of motion measurement data are divided into partial measurement data, and both the partial video data and the partial measurement data are denoted as "Clip". In the example of FIG. 6, Positional Embedding and Sensor Embedding are performed; there is a Linear layer, and the output of the Linear layer is input to the Transformer block together with the results of the Positional Embedding and the Sensor Embedding; and the evaluation is performed based on the result of the Transformer block.

<Technique enabling the processing of non-homogeneous data>
In the explanation so far, the time-series samples of the motion video data and the motion measurement data were not explicitly given per-sample importance when the motion was evaluated; that is, all samples were treated homogeneously in the evaluation.

However, the motion video data and the motion measurement data may also contain motions that are not suitable for the evaluation of the motion: for example, in a judged sport, data from the period before the competition starts or from the period after the competition ends in which athletes are substituted. When the motion is evaluated, the motions represented by such data are not the target of the evaluation, and such data can therefore act as noise in the evaluation.

The learning device 1 may therefore suppress this effect by making the time-series samples of the motion video data and the motion measurement data non-uniform. Specifically, the evaluation model also performs the evaluation based on an attention identifier, which is an identifier indicating that certain parts of the motion video data and the motion measurement data influence the evaluation of the motion more strongly than others. In such a case, the evaluation model can evaluate with higher accuracy.

The attention identifier is assigned to samples in a period that contains the time at which a predetermined index indicating the intensity of the motion is highest in the graph represented by the motion measurement data and that satisfies a predetermined condition. The evaluation model performs the evaluation by weighting samples to which the attention identifier has been assigned more heavily than samples to which it has not.

The predetermined condition may be any condition that specifies a period so as to contain the time at which the predetermined index indicating the intensity of the motion is highest in the graph represented by the motion measurement data. The predetermined condition is, for example, a condition that specifies predetermined time widths before and after that time.

As described above, the evaluation model weights samples to which the attention identifier has been assigned more heavily than samples to which it has not. Since periods of more intense motion should influence the evaluation of the motion more strongly, the attention identifier is assigned to samples in a period that contains the time at which the predetermined index indicating the intensity of the motion is highest in the graph represented by the motion measurement data and that satisfies the predetermined condition. As a result, the evaluation model can evaluate with higher accuracy.
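
A minimal numpy sketch of this assignment and weighting (the use of the absolute first difference as the intensity index, the symmetric window, and the weight values are assumptions for illustration):

    import numpy as np

    def attention_weights(signal: np.ndarray, rate_hz: float, half_width_s: float,
                          attended: float = 2.0, other: float = 1.0) -> np.ndarray:
        """Assign the attention identifier to the samples around the time at which
        the motion-intensity index (here, the absolute first difference of a
        (T, D) sensor series, standing in for the differential coefficient) is
        highest, and return per-sample weights."""
        intensity = np.abs(np.diff(signal, axis=0)).sum(axis=1)
        t_max = int(np.argmax(intensity))            # e.g., the time t = ta
        half = int(rate_hz * half_width_s)           # predetermined time width
        weights = np.full(len(signal), other, dtype=float)
        weights[max(0, t_max - half):t_max + half + 1] = attended
        return weights

    w = attention_weights(np.random.randn(1000, 3), rate_hz=100.0, half_width_s=2.0)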

 なお、データに対して注目識別子を付与するとは判定対象のデータについて注目識別子が示す内容を満たすデータであると判定し、判定結果を示す情報を記憶部13や記憶部23等の評価モデルが情報を読み出し可能な所定の記憶装置に記録することを意味する。 Note that assigning a focus identifier to data means determining that the data being evaluated satisfies the content indicated by the focus identifier, and recording information indicating the evaluation result in a specified storage device from which an evaluation model, such as storage unit 13 or storage unit 23, can read information.

 なお、動作映像データ及び動作計測データにおける時系列のサンプルを非均一にする処理は、動作映像データ及び動作計測データを複数のCilpへ分割する処理を含み、Clipへの分割の処理では、Clipの分割の際に、動作計測データが示すグラフのうち動作の激しさの度合いを示す所定の指標が最も高い時刻を含む期間であって所定の条件を満たす期間に含まれるClipの数が他の期間よりも多くなるように動作映像データ及び動作計測データが複数のCilpへ分割されてもよい。 The process of making the time series samples in the action video data and the action measurement data non-uniform includes a process of dividing the action video data and the action measurement data into a plurality of Cilps, and in the process of dividing into Clips, the action video data and the action measurement data may be divided into a plurality of Cilps such that the number of Clips included in a period that includes the time when a predetermined index indicating the degree of intensity of the action is highest in the graph shown by the action measurement data and that satisfies a predetermined condition is greater than in other periods.

 図7は、変形例における注目識別子の付与の処理の一例を説明する説明図である。図7は、3種類の時系列を示す。3種類の時系列はIMU信号である。図7の例では、注目識別子は、期間内に最大微分係数時刻を含むという条件を少なくとも含む予め定められた注目期間条件、を満たす期間内の動作映像データ及び動作計測データに対して、注目識別子が付与される。 FIG. 7 is an explanatory diagram illustrating an example of the process of assigning an attention identifier in a modified example. FIG. 7 shows three types of time series. The three types of time series are IMU signals. In the example of FIG. 7, an attention identifier is assigned to motion video data and motion measurement data within a period that satisfies a predetermined attention period condition that includes at least the condition that the maximum differential coefficient time is included within the period.

 最大微分係数時刻は、動作計測データのグラフにおいて微分係数の最大値を与える時刻である。動作している間は、動作していないときよりも、動作計測データの値の変化は激しいはずである。 The maximum derivative time is the time that gives the maximum derivative value in the graph of the movement measurement data. When moving, the value of the movement measurement data should change more drastically than when not moving.

 したがって、動作計測データのグラフにおいて微分係数の最大値を与える時刻を含むように、注目識別子を付与すれば、注目識別子の付与されたサンプルは、動作が行われている最中のデータである可能性が高い。そのため、図7の例では、最大微分係数時刻が、動作計測データが示すグラフのうち動作の激しさの度合いを示す所定の指標が最も高い時刻の一例である。 Therefore, if an attention identifier is assigned so that it includes the time at which the maximum derivative coefficient is obtained in the graph of the movement measurement data, the sample to which the attention identifier is assigned is highly likely to be data obtained while the movement is being performed. Therefore, in the example of Figure 7, the time of maximum derivative coefficient is an example of the time at which a specified index showing the degree of intensity of the movement is highest in the graph shown by the movement measurement data.

 In the example of FIG. 7, the set of samples to which the attention identifier is assigned consists of the samples in the period labeled "technique". The horizontal axis in FIG. 7 indicates time. The "preparation period" is the period during which the athlete is preparing, the "technique" period is the period during which the athlete is performing the motion, and the "post-execution" period is the period after the athlete has finished the motion and is no longer moving. The time t = ta in FIG. 7 is an example of the maximum derivative time.

 Hereinafter, the process of assigning, based on the motion measurement data, an attention identifier to the motion video data and the motion measurement data within a period that satisfies the attention period condition is referred to as the attention identifier assignment process.

 The attention period condition may be any condition that includes the requirement that the period contain the maximum derivative time. For example, it may require that the period extend over predetermined time widths before and after the maximum derivative time. It may also require, for example, that the period extend over predetermined time widths before and after the maximum derivative time and be longer than any period that does not satisfy the attention period condition.

 The attention period condition may also require, for example, that the period extend over predetermined time widths before and after the maximum derivative time and that its length be a predetermined length equal to at least one third of the duration, from start to end, of the video represented by the motion video data.
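 Under this condition, the attended period could be constructed as in the following sketch, where the clipping at the video boundaries is an assumption; for example, attention_window(t_a=12.0, video_len=30.0) returns the 10-second window (7.0, 17.0) centered on t = 12.0.

```python
def attention_window(t_a, video_len, min_frac=1.0 / 3.0):
    """Return (start, end) of the attended period: centered on the
    maximum derivative time t_a, with a length of at least min_frac of
    the full video, clipped to [0, video_len]."""
    half = (video_len * min_frac) / 2.0
    start = max(0.0, t_a - half)
    end = min(float(video_len), t_a + half)
    return start, end
```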

 The meaning of the factor of one third is as follows. The video represented by the motion video data may capture a preparation period during which the athlete gets ready, a period during which the athlete performs, and a period after the performance has ended. In such a case, if these three periods all have the same duration, the evaluation model is preferably given the period that is centered on the maximum derivative time and whose length is one third of the length of the video represented by the motion video data.

 In practice, however, when a motion is to be evaluated, the user may be expected to delete unnecessary data and input the remainder to the learning device 1 as the motion video data. In such cases, the attention period condition is therefore preferably, for example, that the period extend over predetermined time widths before and after the maximum derivative time and that its length be a predetermined length equal to at least one third of the duration of the video represented by the motion video data.

 The attention identifier assignment process, when executed, may be executed at any time before the evaluation model determines the evaluation of the motion. For example, it is executed before the video data feature acquisition process and the measurement data feature acquisition process. In such a case, in those two processes, the evaluation model acquires features while giving the data to which the attention identifier has been assigned a greater weight than the other data.
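 The toy extractor below illustrates this kind of weighting during feature acquisition: a weighted temporal average in which attended samples count alpha times as much. The average-pool extractor and the value of alpha are stand-ins for whatever feature extractor the trained model actually uses.

```python
import numpy as np

def extract_weighted_feature(samples, attended, alpha=2.0):
    """Weighted temporal average over samples of shape (T, C), where
    attended is a boolean array of shape (T,) marking samples to which
    the attention identifier has been assigned."""
    w = np.where(attended, alpha, 1.0)
    w = w / w.sum()                             # normalize to a weighted mean
    return (samples * w[:, None]).sum(axis=0)   # feature of shape (C,)
```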

 When the attention identifier assignment process is executed before the video data feature acquisition process and the measurement data feature acquisition process, it may be executed, for example, after the video data division process and the measurement data division process. When the division processes are executed, the attention identifier may be assigned per item of partial video data and partial measurement data, whereby the attention identifier is assigned to the samples.

 When the attention identifier assignment process is executed after the video data division process and the measurement data division process, it may be executed, for example, after the video data feature acquisition process and the measurement data feature acquisition process. In such a case, since an attention identifier and a feature have been obtained for each item of partial video data and partial measurement data, the evaluation model may determine the evaluation by giving a greater weight to those of the obtained features to which the attention identifier has been assigned.
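 Weighting at the feature level, after a feature has been obtained per Clip, can be sketched in the same spirit; the linear scoring head and the weight alpha below are hypothetical stand-ins for the trained evaluation model.

```python
import numpy as np

def evaluate_from_clip_features(features, flags, head_w, head_b=0.0, alpha=2.0):
    """Pool per-Clip features of shape (N, D) into one vector, giving
    Clips whose flag is True (attention identifier assigned) alpha times
    the weight, then score with a toy linear head head_w of shape (D,)."""
    w = np.where(flags, alpha, 1.0)
    pooled = (features * (w / w.sum())[:, None]).sum(axis=0)
    return float(pooled @ head_w + head_b)
```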

 The attention identifier assignment process may be executed by the control unit 11 in the learning stage or, when an external device executes the video data division process and the measurement data division process, by that external device. When an external device executes it, the control unit 11 acquires the result of the attention identifier assignment process, together with the results of the video data division process and the measurement data division process, via the interface unit 12.

 When the attention identifier assignment process is executed in the learning stage, the evaluation device 2 also uses its result. For example, when the control unit 11 executed the attention identifier assignment process in the learning stage, the control unit 21 may acquire the result by executing the attention identifier assignment process itself, or may acquire it by having an external device execute the process and obtaining that device's result.

 Likewise, when an external device executed the attention identifier assignment process in the learning stage, the control unit 21 may acquire the result by executing the attention identifier assignment process itself, or may acquire it by having an external device execute the process and obtaining that device's result.

 The learning device 1 may be implemented using a plurality of information processing devices communicably connected via a network. In this case, the processes executed by the control unit 11 may be executed in a distributed manner by the plurality of information processing devices.

 Similarly, the evaluation device 2 may be implemented using a plurality of information processing devices communicably connected via a network. In this case, the processes executed by the control unit 21 may be executed in a distributed manner by the plurality of information processing devices.

 The learning device 1 and the evaluation device 2 do not necessarily need to be implemented in different housings, and may be integrated into a single housing. In such a case, the learning device 1 and the evaluation device 2 may be a computer that functions as both by executing a program.

 All or part of the functions of the learning device 1 and the evaluation device 2 may be realized using hardware such as an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array). The program may be recorded on a computer-readable recording medium. Examples of computer-readable recording media include portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems. The program may also be transmitted via a telecommunications line.

 The control unit 21 is an example of a trained evaluation model execution unit.

 Although an embodiment of the present invention has been described above in detail with reference to the drawings, the specific configuration is not limited to this embodiment, and designs that do not depart from the gist of the present invention are also included.

 100…evaluation system, 1…learning device, 2…evaluation device, 11…control unit, 12…interface unit, 13…storage unit, 21…control unit, 22…interface unit, 23…storage unit, 91…processor, 92…memory, 93…processor, 94…memory

Claims (8)

1.  A learning device comprising:
 a control unit that performs learning of a learning object using motion video data, which is video data of a video showing a motion of an evaluation subject that is a human or animal whose motion is to be evaluated, motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor that is a sensor attached to at least one of the body of the evaluation subject, an item worn by the evaluation subject, and an item used by the evaluation subject when performing the motion and that obtains results according to the motion of the evaluation subject, and ground truth data that indicates a result of evaluation of the motion,
 wherein the learning object is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.

2.  The learning device according to claim 1, wherein the mathematical model obtains a feature of each item of partial video data, the items of partial video data being the sets into which the motion video data has been divided along the time axis.

3.  The learning device according to claim 1, wherein the mathematical model obtains a feature of each item of partial measurement data, the items of partial measurement data being the sets into which the motion measurement data has been divided along the time axis at a predetermined time width.

4.  The learning device according to claim 1, wherein the mathematical model performs the evaluation also based on an attention identifier indicating data, within the motion video data and the motion measurement data, whose influence on the evaluation of the motion is stronger than that of other data.

5.  An evaluation device comprising:
 an interface unit that acquires a set of motion video data showing a motion to be evaluated and motion measurement data obtained during the motion shown in the motion video data; and
 a trained evaluation model execution unit that evaluates the motion shown in the motion video data included in the set, using the set acquired by the interface unit and the trained mathematical model obtained by a learning device, the learning device comprising a control unit that performs learning of a learning object using motion video data, which is video data of a video showing a motion of an evaluation subject that is a human or animal whose motion is to be evaluated, motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor that is a sensor attached to at least one of the body of the evaluation subject, an item worn by the evaluation subject, and an item used by the evaluation subject when performing the motion and that obtains results according to the motion of the evaluation subject, and ground truth data that indicates a result of evaluation of the motion, the learning object being a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.

6.  A learning method comprising:
 a control step of performing learning of a learning object using motion video data, which is video data of a video showing a motion of an evaluation subject that is a human or animal whose motion is to be evaluated, motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor that is a sensor attached to at least one of the body of the evaluation subject, an item worn by the evaluation subject, and an item used by the evaluation subject when performing the motion and that obtains results according to the motion of the evaluation subject, and ground truth data that indicates a result of evaluation of the motion,
 wherein the learning object is a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.

7.  An evaluation method comprising:
 an interface step of acquiring a set of motion video data showing a motion to be evaluated and motion measurement data obtained during the motion shown in the motion video data; and
 a trained evaluation model execution step of evaluating the motion shown in the motion video data included in the set, using the set acquired in the interface step and the trained mathematical model obtained by a learning method, the learning method comprising a control step of performing learning of a learning object using motion video data, which is video data of a video showing a motion of an evaluation subject that is a human or animal whose motion is to be evaluated, motion measurement data, which is a time series of results obtained, during the motion shown in the motion video data, by a motion measurement sensor that is a sensor attached to at least one of the body of the evaluation subject, an item worn by the evaluation subject, and an item used by the evaluation subject when performing the motion and that obtains results according to the motion of the evaluation subject, and ground truth data that indicates a result of evaluation of the motion, the learning object being a mathematical model that evaluates the motion based on the motion video data and the motion measurement data.

8.  A program for causing a computer to function as either or both of the learning device according to any one of claims 1 to 4 and the evaluation device according to claim 5.