
WO2022181253A1 - Joint point detection device, learning model generation device, joint point detection method, learning model generation method, and computer-readable recording medium - Google Patents


Info

Publication number
WO2022181253A1
WO2022181253A1 (PCT/JP2022/003767)
Authority
WO
WIPO (PCT)
Prior art keywords
graph structure
graph
feature
feature extractor
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/003767
Other languages
English (en)
Japanese (ja)
Inventor
遊哉 石井
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP2023502226A priority Critical patent/JP7635823B2/ja
Publication of WO2022181253A1 publication Critical patent/WO2022181253A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • the present invention relates to a joint point detection device and a joint point detection method for detecting joint points of an object from an image, and further relates to a computer-readable recording medium recording a program for realizing these.
  • The present invention also relates to a learning model generation device and a learning model generation method for generating a learning model for detecting joint points of an object from an image, and further relates to a computer-readable recording medium recording a program for realizing these.
  • Non-Patent Document 1 discloses a system for estimating the posture of a person, especially the posture of a person's hand, from an image.
  • The system disclosed in Non-Patent Document 1 first acquires image data including an image of a hand, and then estimates the two-dimensional coordinates of each joint point of the hand from the image data.
  • Next, the system of Non-Patent Document 1 inputs the estimated two-dimensional coordinates of each joint point to a graph convolution network (hereinafter also referred to as a "GCN") and estimates the three-dimensional coordinates of each joint point. A GCN is a network that takes as input a graph structure composed of a plurality of nodes and performs convolution processing using adjacent nodes (see, for example, Patent Document 1).
  • In its earlier stages, the GCN performs pooling processing multiple times to reduce the number of nodes in the input graph structure, finally reducing the number of nodes to one. In its later stages, the GCN performs unpooling processing on the one-node graph structure, increasing the number of nodes the same number of times as the pooling processing. In addition, in the later stages, the GCN connects the graph structure being processed with the earlier-stage graph structure that has the same number of nodes, executes convolution, and outputs a graph structure with the same number of nodes as the input.
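  • The encoder-decoder flow described above (repeated pooling down to one node, then matching unpooling with same-node-count skip connections) can be sketched as follows. This is a minimal illustration with toy node counts and hypothetical helpers `graph_pool`/`graph_unpool`; the actual GCN additionally applies learned convolutions at every stage.

```python
import numpy as np

def graph_pool(x, assign):
    """Average-pool node features: assign[i] gives the coarse node for fine node i."""
    n_coarse = assign.max() + 1
    out = np.zeros((n_coarse, x.shape[1]))
    counts = np.zeros(n_coarse)
    for i, a in enumerate(assign):
        out[a] += x[i]
        counts[a] += 1
    return out / counts[:, None]

def graph_unpool(x, assign):
    """Copy each coarse node's features back to its fine nodes."""
    return x[assign]

# 4 nodes -> 2 -> 1, then back up, concatenating same-size skip connections
x0 = np.arange(8, dtype=float).reshape(4, 2)   # 4 nodes, 2-dim features
a01 = np.array([0, 0, 1, 1])                   # pooling assignment: 4 -> 2
a12 = np.array([0, 0])                         # pooling assignment: 2 -> 1
x1 = graph_pool(x0, a01)                       # 2-node graph
x2 = graph_pool(x1, a12)                       # 1-node graph
u1 = np.concatenate([graph_unpool(x2, a12), x1], axis=1)         # unpool + skip
u0 = np.concatenate([graph_unpool(u1[:, :2], a01), x0], axis=1)  # back to 4 nodes
print(u0.shape)  # (4, 4)
```

Each unpooled graph is concatenated with the earlier-stage graph that has the same number of nodes, mirroring the skip connections described above.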
  • In Non-Patent Document 1, the graph structure of the two-dimensional coordinates of each joint point is input to the GCN, and machine learning of the GCN is performed so that the difference between the output graph structure and the graph structure of the three-dimensional coordinates of each joint point, given as correct data, becomes small.
  • "HOPE-Net: A Graph-based Model for Hand-Object Pose Estimation", [online], IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Georgia University, 31 March 2020, [retrieved February 12, 2021], Internet <URL: https://arxiv.org/abs/2004.00060>
  • An example object of the present invention is to provide a joint point detection device, a learning model generation device, a joint point detection method, a learning model generation method, and a computer-readable recording medium that improve the detection accuracy when detecting the three-dimensional coordinates of joint points of an object from an image.
  • To achieve the above object, a joint point detection device in one aspect of the present invention includes: a graph structure acquisition unit that acquires a first graph structure in which two-dimensional feature amounts of each of a plurality of target joint points are represented by nodes; and a graph structure output unit that receives the first graph structure as an input and, using a graph convolution network, outputs a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points. The graph convolution network comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between the first graph structure and the second graph structure. All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • The learning model generation device in one aspect of the present invention includes: a training data acquisition unit that acquires, as training data, a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points and, as correct data, a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points; and
  • a learning model generation unit that inputs the first graph structure to a graph convolution network, calculates a difference between the graph structure output from the graph convolution network and the correct data, and generates a machine learning model constructed by the graph convolution network by machine learning the parameters in the graph convolution network so that the calculated difference becomes small.
  • The graph convolution network comprises a plurality of intermediate layers and an output layer. All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • A joint point detection method in one aspect of the present invention includes: a graph structure acquisition step of acquiring a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points; and a graph structure output step of outputting, using the first graph structure as an input and using a graph convolution network, a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points. The graph convolution network comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between the first graph structure and the second graph structure. All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • A learning model generation method in one aspect of the present invention includes: a training data acquisition step of acquiring, as training data, a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points and, as correct data, a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points; and
  • a learning model generation step of inputting the first graph structure to a graph convolution network, calculating a difference between the graph structure output from the graph convolution network and the correct data, and generating a machine learning model constructed by the graph convolution network by machine learning the parameters in the graph convolution network so that the calculated difference becomes small.
  • The graph convolution network comprises a plurality of intermediate layers and an output layer. All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • A first computer-readable recording medium in one aspect of the present invention records a program that causes a computer to execute: a graph structure acquisition step of acquiring a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points; and a graph structure output step of outputting, using the first graph structure as an input and using a graph convolution network, a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points.
  • The graph convolution network comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between the first graph structure and the second graph structure. All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • A second computer-readable recording medium in one aspect of the present invention records a program that causes a computer to execute: a training data acquisition step of acquiring, as training data, a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points and, as correct data, a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points; and
  • a learning model generation step of inputting the first graph structure to a graph convolution network, calculating a difference between the graph structure output from the graph convolution network and the correct data, and generating a machine learning model constructed by the graph convolution network by machine learning the parameters in the graph convolution network so that the calculated difference becomes small.
  • The graph convolution network comprises a plurality of intermediate layers and an output layer. All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • The graph convolution network in one aspect of the present invention comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points and a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points.
  • All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure.
  • In each of the plurality of intermediate layers, each feature extractor uses as input a graph structure output by each feature extractor in the layer above it. The output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • FIG. 1 is a configuration diagram showing a schematic configuration of a learning model generation device according to Embodiment 1.
  • FIG. 2 is a configuration diagram specifically showing the configuration of the learning model generation device according to the first embodiment.
  • FIG. 3 is a configuration diagram showing the configuration of the graph convolutional network according to Embodiment 1.
  • FIG. 4 is an explanatory diagram for explaining processing in the graph convolution network shown in FIG.
  • FIG. 5 is a flowchart showing the operation of the learning model generation device according to Embodiment 1.
  • FIG. 6 is a configuration diagram showing a schematic configuration of a joint point detection device according to Embodiment 2.
  • FIG. 7 is a diagram more specifically showing the configuration of the joint point detection device according to the second embodiment.
  • FIG. 8 is a flowchart showing the operation of the joint point detection device according to the second embodiment.
  • FIG. 9 is a block diagram showing an example of a computer that realizes the learning model generation device according to Embodiment 1 and the joint point detection device according to Embodiment 2.
  • (Embodiment 1) A learning model generation device, a learning model generation method, a learning model generation program, and a graph convolution network in Embodiment 1 will be described below with reference to FIGS. 1 to 5.
  • FIG. 1 is a configuration diagram showing a schematic configuration of the learning model generation device according to Embodiment 1.
  • The learning model generation device 10 is a device that generates a machine learning model for detecting target joint points. As shown in FIG. 1, the learning model generation device 10 includes a training data acquisition unit 11 and a learning model generation unit 12.
  • The training data acquisition unit 11 acquires, as training data, a first graph structure indicating two-dimensional feature amounts of each of the target joint points and, as correct data, a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points.
  • the learning model generation unit 12 inputs the first graph structure to a graph convolution network (GCN) and calculates the difference between the graph structure output from the graph convolution network and the correct data. Then, the learning model generation unit 12 generates a machine learning model constructed by the graph convolution network by performing machine learning on the parameters in the graph convolution network so that the calculated difference becomes small.
  • A graph convolution network comprises a plurality of intermediate layers and an output layer. All or some of the plurality of intermediate layers include feature extractors that perform feature extraction without changing the number of nodes in the graph structure and feature extractors that perform feature extraction while reducing the number of nodes in the graph structure.
  • each feature extractor uses as input the graph structure output by each upper layer feature extractor.
  • the output layer uses the graph structure output by each feature extractor in the lowest intermediate layer as input and outputs a graph structure.
  • As described above, the graph convolution network includes, in its intermediate layers, both a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes. Therefore, according to the graph convolution network 30, it is possible to avoid a situation in which the feature amounts cannot be sufficiently extracted due to too few convolutions and a situation in which the feature amounts cannot be accurately extracted due to an insufficient number of dimensions. As a result, according to Embodiment 1, it is possible to improve the detection accuracy when detecting the three-dimensional coordinates of the joint points from an image.
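  • For reference, the basic convolution performed by a feature extractor that keeps the number of nodes unchanged can be sketched as below. This assumes the common symmetrically normalized formulation with self-loops; the patent does not fix a particular normalization, so this is only one plausible instance.

```python
import numpy as np

def gcn_layer(x, adj, w):
    """One graph convolution: aggregate each node with its neighbors
    (symmetrically normalized adjacency with self-loops), then project and ReLU."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d = a_hat.sum(axis=1)
    a_norm = a_hat / np.sqrt(np.outer(d, d))     # D^-1/2 (A+I) D^-1/2
    return np.maximum(a_norm @ x @ w, 0.0)       # ReLU activation

# toy 3-node chain graph, 2-dim feature amount per node
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.eye(2)                                    # identity weights for illustration
h = gcn_layer(x, adj, w)                         # same number of nodes, new features
print(h.shape)  # (3, 2)
```

The node count is preserved; only the per-node feature amounts change, which is exactly what the first type of feature extractor does.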
  • FIG. 2 is a configuration diagram specifically showing the configuration of the learning model generation device according to the first embodiment.
  • the learning model generation device 10 includes a storage unit 13 in addition to the training data acquisition unit 11 and the learning model generation unit 12 described above.
  • the storage unit 13 stores a graph convolutional network 30 (hereinafter referred to as "GCN 30").
  • In Embodiment 1, the target is a human hand.
  • the target is not limited to the human hand, and may be the entire human body or other parts.
  • the object may be anything that has joint points, and may be something other than a person, such as a robot.
  • Since the target is the human hand, the nodes constituting the graph structure represent two-dimensional or three-dimensional feature amounts of each joint point of the hand.
  • a specific example of the feature amount is a coordinate value.
  • In FIG. 2, reference numeral 21 indicates the graph structure obtained from the image data 20.
  • nodes 22 represent two-dimensional coordinate values of joint points of the hand as feature quantities.
  • Graph structure 21 is a first graph structure.
  • Reference numeral 23 denotes a second graph structure, which is a graph structure for correct data.
  • nodes 24 represent three-dimensional coordinate values of joint points of the hand as feature quantities.
  • the nodes of the graph structure may represent two-dimensional or three-dimensional feature amounts of parts other than the joint points, for example, characteristic parts such as fingertips.
  • The first graph structure serving as training data can be obtained by inputting target image data into a machine learning model that has machine-learned the relationship between image data of joint points and the graph structure.
  • the training data acquisition unit 11 acquires the first graph structure 21 and the second graph structure 23 as correct data as training data. Then, the training data acquisition unit 11 inputs the acquired training data to the learning model generation unit 12 .
  • The learning model generation unit 12 first acquires the GCN 30 from the storage unit 13. Next, the learning model generation unit 12 inputs the first graph structure 21 constituting the training data to the GCN 30 and calculates the difference between the second graph structure output from the GCN 30 and the second graph structure 23 given as correct data. Then, the learning model generation unit 12 updates the parameters of the GCN 30 so that the calculated difference is minimized, and stores the GCN 30 with the updated parameters in the storage unit 13. As a result, a GCN for detecting the three-dimensional coordinates of the target joint points is generated.
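  • The parameter update performed by the learning model generation unit 12 can be sketched as follows, using a single weight matrix as a hypothetical stand-in for the GCN 30 and mean squared error as the "difference"; the actual network stacks graph convolutions and the loss may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

n_nodes = 21                                   # e.g. 21 hand joint points
w = rng.normal(size=(2, 3)) * 0.1              # stand-in model: lifts 2-D to 3-D

x2d = rng.normal(size=(n_nodes, 2))            # first graph structure (input)
w_true = rng.normal(size=(2, 3))
y3d = x2d @ w_true                             # second graph structure (correct data)

lr = 0.1
for step in range(500):
    pred = x2d @ w                             # graph structure output by the model
    diff = pred - y3d                          # difference from the correct data
    loss = (diff ** 2).mean()
    # gradient step that shrinks the calculated difference
    w -= lr * (x2d.T @ diff) * 2 / diff.size
print(loss)
```

The loop mirrors the description: input the first graph structure, calculate the difference from the correct data, and update the parameters so the difference becomes small.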
  • FIG. 3 is a configuration diagram showing the configuration of the graph convolutional network according to Embodiment 1.
  • FIG. 4 is an explanatory diagram for explaining processing in the graph convolution network shown in FIG.
  • the GCN 30 includes an input layer 31, multiple intermediate layers, and an output layer 33.
  • the input layer 31 accepts input of the first graph structure 21 .
  • The plurality of intermediate layers, consisting of first intermediate layers 32a and second intermediate layers 32b, perform feature extraction on the first graph structure.
  • The output layer 33 uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs the second graph structure.
  • The first intermediate layer 32a includes only the first feature extractor ("○" in FIG. 3), which performs feature extraction without changing the number of nodes in the graph structure.
  • a first feature extractor performs convolution.
  • each feature extractor uses as input the graph structure output by the feature extractor that performs feature extraction with the same number of nodes in the upper layer.
  • the feature extractor of the first intermediate layer 32a connects the graph structures output by each feature extractor and executes convolution.
  • The second intermediate layer 32b includes a second feature extractor ("●" in FIG. 3) that performs feature extraction while reducing the number of nodes in the graph structure, a third feature extractor (hatched "○" in FIG. 3) that performs feature extraction while increasing the number of nodes in the graph structure, or both. The second feature extractor performs pooling, and the third feature extractor performs unpooling.
  • Each second intermediate layer 32b also includes a first feature extractor ("○" in FIG. 3). In each second intermediate layer 32b, each feature extractor uses as input a plurality of graph structures output by the feature extractors in the layer above.
  • The input layer 31 also includes a first feature extractor ("○" in FIG. 3).
  • The output layer 33 includes a first feature extractor ("○" in FIG. 3) and a third feature extractor (hatched "○" in FIG. 3).
  • As shown in FIG. 4, graph structures with different numbers of nodes are generated by the feature extractors, and graph structures with different numbers of nodes are exchanged between the feature extractors.
  • In FIG. 4, the numbers attached to the graph structures indicate the numbers of nodes.
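  • The way each layer holds graph structures at several node counts and each feature extractor reads from the layer above can be sketched as follows. The node counts, the pairwise pooling rule, and the helper names `conv`/`pool`/`unpool` are illustrative assumptions; the real extractors have learned parameters.

```python
import numpy as np

def conv(x):      # first feature extractor: keeps the node count (toy: identity)
    return x + 0.0

def pool(x):      # second feature extractor: halves the node count by pairwise average
    return x.reshape(-1, 2, x.shape[1]).mean(axis=1)

def unpool(x):    # third feature extractor: doubles the node count by duplication
    return np.repeat(x, 2, axis=0)

# layer 0 holds one 4-node graph; each following layer holds graphs at
# several node counts, each extractor reading outputs of the layer above
layer0 = {4: np.arange(8, dtype=float).reshape(4, 2)}
layer1 = {4: conv(layer0[4]), 2: pool(layer0[4])}
layer2 = {4: conv(layer1[4]), 2: conv(layer1[2]), 1: pool(layer1[2])}
# output layer: bring every branch back to 4 nodes and combine
out = np.concatenate(
    [layer2[4], unpool(layer2[2]), unpool(unpool(layer2[1]))], axis=1)
print(out.shape)  # (4, 6)
```

Each dictionary key is the node count of one graph structure, matching the numbers attached to the graph structures in FIG. 4.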
  • FIG. 5 is a flowchart showing the operation of the learning model generation device according to Embodiment 1.
  • FIGS. 1 to 4 will be referred to as needed in the following description.
  • In Embodiment 1, the learning model generation method is implemented by operating the learning model generation device 10. Therefore, the learning model generation method in Embodiment 1 is described through the following explanation of the operation of the learning model generation device.
  • First, the training data acquisition unit 11 acquires, as training data, a first graph structure indicating two-dimensional feature amounts of each of the target joint points and, as correct data, a second graph structure indicating three-dimensional feature amounts of each of the joint points (step A1).
  • the learning model generation unit 12 inputs the first graph structure acquired as training data in step A1 to the GCN 30, and calculates the difference between the graph structure output from the GCN and the correct data. Then, the learning model generation unit 12 updates the parameters of the GCN so that the calculated difference becomes smaller (step A2).
  • the learning model generation unit 12 stores the GCN whose parameters are updated in step A2 in the storage unit 13 (step A3). This generates a GCN that can detect the three-dimensional coordinates of joint points.
  • As described above, in Embodiment 1, since a GCN that can sufficiently and accurately extract the feature amounts is constructed, the detection accuracy when detecting the three-dimensional coordinates of the target joint points from an image is improved.
  • a program for generating a learning model in Embodiment 1 may be a program that causes a computer to execute steps A1 to A3 shown in FIG. By installing this program in a computer and executing it, the learning model generation device and learning model generation method in Embodiment 1 can be realized.
  • the processor of the computer functions as a training data acquisition unit 11 and a learning model generation unit 12 to perform processing.
  • The storage unit 13 may be realized by storing the data files constituting it in a storage device such as a hard disk provided in the computer, or may be realized by storing the data files in a storage device of another computer. The computer may be, in addition to a general-purpose PC, a smartphone or a tablet terminal.
  • the learning model generation program in Embodiment 1 may be executed by a computer system constructed by a plurality of computers.
  • each computer may function as either the training data acquisition unit 11 or the learning model generation unit 12 .
  • (Embodiment 2) Next, a joint point detection device, a joint point detection method, and a joint point detection program in Embodiment 2 will be described with reference to FIGS. 6 to 8.
  • FIG. 6 is a configuration diagram showing a schematic configuration of the joint point detection device according to Embodiment 2.
  • The joint point detection device 40 according to Embodiment 2 shown in FIG. 6 is a device for detecting joint points of an object, for example, a living body or a robot. As shown in FIG. 6, the joint point detection device 40 includes a graph structure acquisition unit 41 and a graph structure output unit 42.
  • the graph structure acquisition unit 41 acquires a first graph structure in which two-dimensional feature quantities of each of a plurality of target joint points are represented by nodes.
  • the graph structure output unit 42 receives the first graph structure and uses a graph convolution network to output a second graph structure indicating the three-dimensional feature quantity of each of the joint points.
  • The graph convolution network comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between the first graph structure and the second graph structure. All or some of the plurality of intermediate layers include feature extractors that perform feature extraction without changing the number of nodes in the graph structure and feature extractors that perform feature extraction while reducing the number of nodes in the graph structure.
  • each feature extractor uses as input the graph structure output by each feature extractor in the upper layer.
  • the output layer uses the graph structure output by each feature extractor in the lowest intermediate layer as input and outputs a graph structure.
  • As described above, in Embodiment 2, a graph convolution network that includes both a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction while reducing the number of nodes is used to output the second graph structure. Therefore, according to Embodiment 2, it is possible to avoid a situation in which the feature amounts cannot be sufficiently extracted due to too few convolutions and a situation in which the feature amounts cannot be accurately extracted due to an insufficient number of dimensions. As a result, according to Embodiment 2, it is possible to improve the detection accuracy when detecting the three-dimensional coordinates of the joint points from an image.
  • FIG. 7 is a diagram more specifically showing the configuration of the joint point detection device according to the second embodiment.
  • the joint point detection device 40 includes a storage unit 43 in addition to the graph structure acquisition unit 41 and the graph structure output unit 42 described above.
  • the storage unit 43 stores the GCN 30 shown in FIG. 2 in the first embodiment.
  • In Embodiment 2 as well, the target is a human hand.
  • the joint point detection target is not limited to the human hand, and may be the entire human body or other parts.
  • the target of joint point detection may be any object that has joint points, and may be an object other than a person, such as a robot.
  • two-dimensional or three-dimensional coordinate values can be mentioned as feature values indicated by nodes in the graph structure.
  • the nodes of the graph structure may represent two-dimensional or three-dimensional feature amounts of parts other than the joint points, for example, characteristic parts such as fingertips.
  • the graph structure acquisition unit 41 acquires a first graph structure 50 obtained from image data of a human hand, as shown in FIG.
  • the first graph structure can be obtained by inputting image data into a machine learning model that machine-learns the relationship between the image data of the joint points and the graph structure.
  • the graph structure output unit 42 acquires the GCN 30 from the storage unit 43. Then, the graph structure output unit 42 inputs the first graph structure 50 to the GCN 30, and causes the GCN to output a second graph structure 51 indicating the three-dimensional feature quantity of each of the multiple joint points.
  • The GCN 30 includes the input layer 31, the plurality of intermediate layers 32a and 32b, and the output layer 33, as described in Embodiment 1, and is constructed by machine learning the relationship between the first graph structure and the second graph structure. Therefore, in the output second graph structure 51, the three-dimensional feature amount (coordinate value) indicated by each node is a highly accurate value.
  • FIG. 8 is a flowchart showing the operation of the joint point detection device according to the second embodiment. FIGS. 6 and 7 will be referred to as necessary in the following description. Further, in the second embodiment, the joint point detection method is implemented by operating the joint point detection device 40. Therefore, the following description of the operation of the joint point detection device 40 also serves as the description of the joint point detection method in the second embodiment.
  • the graph structure acquisition unit 41 first acquires a first graph structure 50 obtained from image data of a human hand (step B1).
  • the graph structure output unit 42 then inputs the first graph structure to the GCN 30, and causes the GCN to output a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points (step B2).
  • each node represents the three-dimensional coordinates of each joint point of the target human hand, so each joint point of the human hand is detected in step B2.
  • the three-dimensional feature values (coordinate values) indicated by each node are highly accurate values in the output second graph structure.
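  • The two-step flow above (step B1: acquire the first graph structure from image data; step B2: have the GCN output the second graph structure) could be sketched as follows. `Pose2DModel` and `GCN` here are hypothetical stand-ins for the 2-D keypoint model and the trained GCN 30, not APIs defined in the embodiment.

```python
class Pose2DModel:
    """Stand-in for the model that turns image data into the first graph
    structure (2-D coordinates for each joint node)."""
    def predict(self, image):
        # Placeholder: pretend every joint is detected at the image center.
        h, w = image["height"], image["width"]
        return [(w / 2.0, h / 2.0) for _ in range(21)]  # 21 hand joint nodes

class GCN:
    """Stand-in for GCN 30: lifts 2-D node features to 3-D."""
    def predict(self, graph_2d):
        # Placeholder lifting: copy (x, y) and attach a dummy depth z = 0.
        return [(x, y, 0.0) for (x, y) in graph_2d]

def detect_joint_points(image, pose2d, gcn):
    graph_2d = pose2d.predict(image)   # step B1: first graph structure
    graph_3d = gcn.predict(graph_2d)   # step B2: second graph structure
    return graph_3d

joints = detect_joint_points({"height": 480, "width": 640}, Pose2DModel(), GCN())
print(len(joints), joints[0])  # 21 (320.0, 240.0, 0.0)
```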
  • the joint point detection program in the second embodiment may be any program that causes a computer to execute steps B1 and B2 shown in FIG. 8. By installing this program in a computer and executing it, the joint point detection device and the joint point detection method in the second embodiment can be realized.
  • the processor of the computer functions as a graph structure acquisition unit 41 and a graph structure output unit 42 to perform processing.
  • the storage unit 43 may be realized by storing the data files constituting it in a storage device such as a hard disk provided in the computer, or in a storage device of another computer. Examples of the computer include, besides a general-purpose PC, a smartphone and a tablet-type terminal device.
  • the joint point detection program in Embodiment 2 may be executed by a computer system constructed by a plurality of computers.
  • each computer may function as either the graph structure acquisition unit 41 or the graph structure output unit 42.
  • FIG. 9 is a block diagram showing an example of a computer that realizes the learning model generation device according to Embodiment 1 and the joint point detection device according to Embodiment 2.
  • a computer 110 includes a CPU (Central Processing Unit) 111, a main memory 112, a storage device 113, an input interface 114, a display controller 115, a data reader/writer 116, and a communication interface 117. These units are connected to each other via a bus 121 so as to be able to communicate with each other.
  • the computer 110 may include a GPU (Graphics Processing Unit) or an FPGA (Field-Programmable Gate Array) in addition to the CPU 111 or instead of the CPU 111.
  • a GPU or FPGA can execute the programs in the embodiments.
  • the CPU 111 loads the program in the embodiment, which is composed of a group of codes stored in the storage device 113, into the main memory 112, and carries out various operations by executing the codes in a predetermined order.
  • the main memory 112 is typically a volatile storage device such as DRAM (Dynamic Random Access Memory).
  • the program in the embodiment is provided stored in a computer-readable recording medium 120. Note that the program in this embodiment may be distributed over the Internet connected via the communication interface 117.
  • Input interface 114 mediates data transmission between CPU 111 and input devices 118 such as a keyboard and mouse.
  • the display controller 115 is connected to the display device 119 and controls display on the display device 119 .
  • the data reader/writer 116 mediates data transmission between the CPU 111 and the recording medium 120, reads programs from the recording medium 120, and writes processing results in the computer 110 to the recording medium 120.
  • Communication interface 117 mediates data transmission between CPU 111 and other computers.
  • specific examples of the recording medium 120 include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic recording media such as flexible disks, and optical recording media such as CD-ROM (Compact Disk Read Only Memory).
  • the learning model generation device 10 and the joint point detection device 40 can also be realized by using hardware corresponding to each part instead of a computer in which a program is installed. Further, the learning model generation device 10 and the joint point detection device 40 may be partly implemented by a program and the rest by hardware.
  • (Appendix 1) A joint point detection device comprising: a graph structure acquisition unit that acquires a first graph structure in which two-dimensional feature amounts of each of a plurality of target joint points are represented by nodes; and a graph structure output unit that receives the first graph structure as an input and outputs, using a graph convolutional network, a second graph structure indicating a three-dimensional feature amount of each of the plurality of joint points, wherein the graph convolutional network comprises a plurality of intermediate layers and an output layer and is constructed by machine learning the relationship between the first graph structure and the second graph structure, and all or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure.
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • the joint point detection device according to appendix 1, wherein, in the graph convolutional network, the plurality of intermediate layers comprises a first intermediate layer and a second intermediate layer,
  • the first intermediate layer comprises only feature extractors that perform feature extraction without changing the number of nodes in the graph structure, and in each of the first intermediate layers, each feature extractor has the same number of nodes as in the upper layer,
  • and the second intermediate layer includes a feature extractor that performs feature extraction without changing the number of nodes in the graph structure, a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure, and a feature extractor that performs feature extraction with an increased number of nodes in the graph structure.
  • a learning model generation unit that generates a machine learning model constructed by the graph convolutional network, by machine learning parameters in the graph convolutional network,
  • the graph convolutional network comprises a plurality of intermediate layers and an output layer, and all or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure,
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • a learning model generation device characterized by:
  • in the learning model generation device, the plurality of intermediate layers comprises a first intermediate layer and a second intermediate layer;
  • the first intermediate layer comprises only feature extractors that perform feature extraction without changing the number of nodes in the graph structure, and in each of the first intermediate layers, each feature extractor has the same number of nodes as in the upper layer,
  • and the second intermediate layer includes a feature extractor that performs feature extraction without changing the number of nodes in the graph structure, a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure, and a feature extractor that performs feature extraction with an increased number of nodes in the graph structure.
  • the graph convolutional network comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between the first graph structure and the second graph structure, and all or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure,
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • a machine learning model generation step of generating a machine learning model constructed by the graph convolutional network, by machine learning parameters in the graph convolutional network,
  • the graph convolutional network comprises a plurality of intermediate layers and an output layer, and all or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure,
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • a learning model generation method characterized by:
  • the graph convolutional network comprises a plurality of intermediate layers and an output layer, and is constructed by machine learning the relationship between the first graph structure and the second graph structure, and all or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure,
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • the graph convolutional network comprises a plurality of intermediate layers and an output layer, and all or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure,
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • a computer-readable recording medium characterized by:
  • (Appendix 11) A graph convolutional network comprising a plurality of intermediate layers and an output layer, constructed by machine learning a relationship between a first graph structure indicating two-dimensional feature amounts of each of a plurality of target joint points and a second graph structure indicating three-dimensional feature amounts of each of the plurality of joint points, wherein
  • All or some of the plurality of intermediate layers include a feature extractor that performs feature extraction without changing the number of nodes in the graph structure and a feature extractor that performs feature extraction with a reduced number of nodes in the graph structure.
  • each feature extractor in each of the plurality of intermediate layers uses as input a graph structure output by each feature extractor in the upper layer
  • the output layer uses as input the graph structure output by each feature extractor of the lowest intermediate layer, and outputs a graph structure.
  • a graph convolutional network characterized by:
  • according to the present invention, it is possible to improve detection accuracy when detecting three-dimensional coordinates of joint points from an image.
  • INDUSTRIAL APPLICABILITY The present invention is useful in fields that require posture detection of objects having joint points, such as people and robots. Specific fields include video surveillance and user interfaces.


Abstract

A learning model generation device (10) comprises: a learning data acquisition unit (11) that acquires a first graph structure indicating a two-dimensional feature value for each joint point, and a second graph structure indicating a three-dimensional feature value for each joint point as correct-answer data; and a learning model generation unit (12) that inputs the first graph structure into a graph convolutional network, calculates the difference between the output graph structure and the correct-answer data, and machine-learns parameters of the graph convolutional network so as to reduce the difference. The graph convolutional network comprises an intermediate layer and an output layer. The intermediate layer comprises: a feature extractor that performs feature extraction without changing the number of nodes in the graph structure; and a feature extractor that reduces the number of nodes in the graph structure and performs feature extraction. Each feature extractor uses as its input the graph structure output by a higher-level feature extractor. The output layer outputs a graph structure, using as its input the graph structure output by each feature extractor in the lowest intermediate layer.
PCT/JP2022/003767 2021-02-26 2022-02-01 Joint point detection device, learning model generation device, joint point detection method, learning model generation method, and computer-readable recording medium Ceased WO2022181253A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023502226A JP7635823B2 (ja) 2021-02-26 2022-02-01 Joint point detection device, learning model generation device, joint point detection method, learning model generation method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021029412 2021-02-26
JP2021-029412 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022181253A1 true WO2022181253A1 (fr) 2022-09-01

Family

ID=83048173

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/003767 2021-02-26 2022-02-01 Joint point detection device, learning model generation device, joint point detection method, learning model generation method, and computer-readable recording medium Ceased WO2022181253A1 (fr)

Country Status (2)

Country Link
JP (1) JP7635823B2 (fr)
WO (1) WO2022181253A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024143048A1 (fr) * 2022-12-28 2024-07-04 NEC Corporation Learning model generation device and method, junction point detection device and method, and computer-readable recording medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DOOSTI BARDIA; NAHA SHUJON; MIRBAGHERI MAJID; CRANDALL DAVID J: "HOPE-Net: A Graph-Based Model for Hand-Object Pose Estimation", 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 13 June 2020 (2020-06-13), pages 6607 - 6616, XP033804923, DOI: 10.1109/CVPR42600.2020.00664 *
MD ZAHANGIR ALOM; MAHMUDUL HASAN; CHRIS YAKOPCIC; TAREK M. TAHA; VIJAYAN K. ASARI: "Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation", ARXIV.ORG, 20 February 2018 (2018-02-20), pages 1 - 12, XP081216782 *
SEO HYUNSEOK; HUANG CHARLES; BASSENNE MAXIME; XIAO RUOXIU; XING LEI: "Modified U-Net (mU-Net) With Incorporation of Object-Dependent High Level Features for Improved Liver and Liver-Tumor Segmentation in CT Images", IEEE TRANSACTIONS ON MEDICAL IMAGING, vol. 39, no. 5, 18 October 2019 (2019-10-18), USA, pages 1316 - 1325, XP011785778, ISSN: 0278-0062, DOI: 10.1109/TMI.2019.2948320 *


Also Published As

Publication number Publication date
JP7635823B2 (ja) 2025-02-26
JPWO2022181253A1 (fr) 2022-09-01


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22759294

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023502226

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22759294

Country of ref document: EP

Kind code of ref document: A1