
WO2025033602A1 - Apparatus and method for estimating the pose of an object based on an RGB image and depth information - Google Patents

Apparatus and method for estimating the pose of an object based on an RGB image and depth information

Info

Publication number
WO2025033602A1
Authority
WO
WIPO (PCT)
Prior art keywords
pose
target object
pose estimation
semantic
estimation device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/KR2023/016199
Other languages
English (en)
Korean (ko)
Inventor
엄태영
박지현
이나현
노경석
이효준
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KOREA INST OF ROBOT AND CONVERGENCE
Korea Institute of Robot and Convergence
Original Assignee
KOREA INST OF ROBOT AND CONVERGENCE
Korea Institute of Robot and Convergence
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Korea Institute of Robot and Convergence
Publication of WO2025033602A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration using histogram techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00 Indexing scheme for image generation or computer graphics
    • G06T2210/12 Bounding box

Definitions

  • the present invention relates to a technology for estimating the pose of an object, and more specifically, to a device and method for estimating the pose of an object based on an RGB image and depth information that accurately distinguishes an object having a similar shape to a target object but a different size.
  • Manipulation for robot tasks mostly aims to specify the object to be transported, pick it up, and carry it to the desired destination.
  • accordingly, technology for identifying objects in camera images has been continuously developed.
  • the problem to be solved by the present invention is to provide a device and method for estimating the pose of an object based on an RGB image and depth information, which accurately estimate the pose of a target object from an RGB image and depth information and accurately distinguish it from objects having a similar shape but a different size.
  • the pose estimation device includes a region segmentation unit which segments a semantic region based on an RGB image and a depth image including a target object and calculates depth information for each of the segmented semantic regions, a pose estimation unit which recognizes the target object by applying the RGB image to a pre-learned recognition model and estimates a 3D pose, and a semantic map generation unit which applies the depth information to the estimated 3D pose to correct the 3D pose of the target object and generates a semantic map in which the semantic region and the corrected 3D pose of the target object are combined.
  • the pose estimation device is characterized in that it further includes a learning unit that learns the features of the target object on a pixel-by-pixel basis using the recognition model, and the recognition model includes a convolutional neural network that learns to recognize the target object from a training RGB image, a PointNet that learns to extract features of the target object from a training depth image, and a multilayer perceptron (MLP) that takes the respective outputs of the convolutional neural network and the PointNet as inputs, extracts global features of the target object from them, and performs learning on a pixel-by-pixel basis.
  • the region segmentation unit is characterized by calculating depth information for each semantic region using histogram analysis.
  • the pose estimation unit is characterized by specifying the target object with a 3D bounding box once the 3D pose is estimated.
  • the pose estimation unit is characterized in that, when multiple target objects are recognized, it estimates only the 3D pose of the target object with the highest preset priority.
  • the pose estimation unit is characterized in that, when multiple target objects are recognized, it sets their order by labeling the target objects in descending order of preset priority.
  • the semantic map generation unit is characterized by generating the semantic map so that a sequence number is included for each target object.
  • a pose estimation method includes a step in which a pose estimation device segments a semantic region based on an RGB image and a depth image including a target object and calculates depth information for each of the segmented semantic regions, a step in which the pose estimation device recognizes the target object by applying the RGB image to a pre-learned recognition model and estimates a three-dimensional pose, and a step in which the pose estimation device applies the depth information to the estimated three-dimensional pose to correct the three-dimensional pose of the target object and generates a semantic map in which the semantic region and the corrected three-dimensional pose of the target object are combined.
  • the calculating step is characterized by calculating depth information for each semantic region using histogram analysis.
  • the estimating step is characterized by specifying the target object with a 3D bounding box once the 3D pose is estimated.
  • the estimating step is characterized in that, when multiple target objects are recognized, only the 3D pose of the target object with the highest preset priority is estimated.
  • the estimating step is characterized in that, when multiple target objects are recognized, their order is set by labeling the target objects in descending order of preset priority.
  • the generating step is characterized by generating the semantic map so that a sequence number is included for each target object.
  • the object pose estimation system comprises a robot that performs a manipulation-related task and a pose estimation device that is provided in the robot and estimates the pose of a target object for the task, wherein the pose estimation device comprises a region segmentation unit which segments a semantic region based on an RGB image and a depth image including the target object and calculates depth information for each of the segmented semantic regions, a pose estimation unit which recognizes the target object by applying the RGB image to a pre-learned recognition model and estimates a three-dimensional pose, and a semantic map generation unit which applies the depth information to the estimated three-dimensional pose to correct the three-dimensional pose of the target object and generates a semantic map in which the semantic region and the corrected three-dimensional pose of the target object are combined.
  • the robot is characterized in that, when multiple target objects are included in the semantic map, it performs the task based on the order of the target objects included in the semantic map.
  • according to the present invention, depth information is specified by performing semantic region segmentation and histogram analysis on a depth image including a target object, the target object is recognized and its pose estimated by applying a pre-learned recognition model to an RGB image including the target object, and a semantic map for the target object can be generated by applying the specified depth information to the estimated pose.
  • FIG. 1 is a configuration diagram for explaining an object pose estimation system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating a pose estimation device according to an embodiment of the present invention.
  • FIG. 3 is a block diagram for explaining a control unit according to an embodiment of the present invention.
  • FIG. 4 is a drawing schematically explaining a process for estimating a pose of an object according to an embodiment of the present invention.
  • FIG. 5 is a diagram for explaining a learning process according to an embodiment of the present invention.
  • FIG. 6 is a diagram for explaining a process of generating a semantic map according to an embodiment of the present invention.
  • FIG. 7 is a flowchart for explaining a pose estimation method from the perspective of a pose estimation device according to an embodiment of the present invention.
  • FIG. 8 is a flowchart for explaining a pose estimation method from the perspective of an object pose estimation system according to an embodiment of the present invention.
  • FIG. 9 is a block diagram illustrating a computing device according to an embodiment of the present invention.
  • when it is mentioned that a component is 'connected' or 'coupled' to another component, it should be understood that it may be directly connected or coupled to that other component, but that there may be other components in between.
  • when it is mentioned that a component is 'directly connected' or 'directly coupled' to another component, it should be understood that there are no other components in between.
  • 'and/or' includes a combination of multiple listed items or any one of multiple listed items.
  • 'A or B' can include 'A', 'B', or 'both A and B'.
  • FIG. 1 is a configuration diagram for explaining an object pose estimation system according to an embodiment of the present invention.
  • the object pose estimation system (400) accurately estimates the pose of a target object based on an RGB image and depth information, and accurately performs distinction from objects having similar shapes but different sizes. Through this, the object pose estimation system (400) can reduce errors in operations caused by objects having similar shapes but different sizes.
  • the object pose estimation system (400) includes a pose estimation device (100) and a robot (200), and may further include a user terminal (300).
  • the pose estimation device (100) estimates the pose of a target object used for the robot (200) to perform a manipulation-related task. To this end, the pose estimation device (100) segments a semantic region based on an RGB image and a depth image including the target object, and calculates depth information for each of the segmented semantic regions. In addition, the pose estimation device (100) recognizes the target object by applying the RGB image to a pre-learned recognition model, and estimates a three-dimensional pose. The pose estimation device (100) applies the depth information to the estimated three-dimensional pose to correct the three-dimensional pose of the target object. The pose estimation device (100) generates a semantic map in which the semantic region and the corrected three-dimensional pose of the target object are combined. The pose estimation device (100) transmits the generated semantic map to the robot (200) and controls the robot (200) to perform a task based on the semantic map.
  • the robot (200) performs manipulation-related tasks.
  • the robot (200) can perform driving to pick up a target object and transport it to a desired destination, but is not limited thereto.
  • the robot (200) receives a semantic map from the pose estimation device (100) and performs a task using the received semantic map. That is, the robot (200) can perform accurate and precise picking using the size, shape, and distance of the target object included in the semantic map, and then transport the target object to the destination.
  • the robot (200) can perform a task based on the order of the target objects included in the semantic map when there are multiple target objects included in the semantic map. That is, the robot (200) can preferentially transport a target object with a higher order.
  • the robot (200) can generate monitoring information related to the task.
  • the monitoring information can include the robot's task status, task time, task intensity, task completion rate, surrounding environment information, semantic map, etc.
  • the robot (200) transmits monitoring information to the user terminal (300).
  • the user terminal (300) is a terminal used by a user (or administrator) and informs the user about the working status of the robot (200). That is, the user terminal (300) receives monitoring information from the robot (200) and outputs the received monitoring information. Through this, the user terminal (300) allows the user to intuitively recognize the working status of the robot (200), thereby supporting the user to respond immediately in the event of an emergency.
  • the drawing depicts a structure in which the pose estimation device (100) is included in the robot (200), but the present invention is not limited thereto and the pose estimation device (100) may be implemented as a separate configuration separate from the robot (200).
  • the object pose estimation system (400) can support communication between the pose estimation device (100), the robot (200), and the user terminal (300) by establishing a communication network (450).
  • the communication network (450) can be composed of a backbone network and a subscriber network.
  • the backbone network can be composed of one or more integrated networks among an X.25 network, a Frame Relay network, an ATM network, an MPLS (Multi-Protocol Label Switching) network, and a GMPLS (Generalized Multi-Protocol Label Switching) network.
  • the subscriber network may be one or more of FTTH (Fiber To The Home), ADSL (Asymmetric Digital Subscriber Line), a cable network, ZigBee, Bluetooth, wireless LAN (IEEE 802.11b, IEEE 802.11a, IEEE 802.11g, IEEE 802.11n), WirelessHART (ISO/IEC62591-1), ISA100.11a (ISO/IEC 62734), CoAP (Constrained Application Protocol), MQTT (Message Queuing Telemetry Transport), WiBro (Wireless Broadband), WiMAX, 3G, HSDPA (High Speed Downlink Packet Access), 4G, 5G, or 6G.
  • the communication network (450) may be the Internet or a mobile communication network.
  • the communication network (450) may include any other widely known or future-developed wireless or wired communication method.
  • FIG. 2 is a block diagram for explaining a pose estimation device according to an embodiment of the present invention.
  • FIG. 3 is a block diagram for explaining a control unit according to an embodiment of the present invention.
  • FIG. 4 is a diagram for schematically explaining a process for estimating a pose of an object according to an embodiment of the present invention.
  • FIG. 5 is a diagram for explaining a learning process according to an embodiment of the present invention.
  • FIG. 6 is a diagram for explaining a process for generating a semantic map according to an embodiment of the present invention.
  • the pose estimation device (100) performs learning on a learning target object (T1) (FIG. 4 (a)), and when learning is completed, recognizes an actual target object (T2) and estimates its pose (FIG. 4 (b) and (c)).
  • the pose estimation device (100) uses depth information to recognize that the actual target object (T2) is a different object from the learning target object (T1), rather than the same object, thereby enabling accurate object recognition.
  • the pose estimation device (100) includes a camera unit (10), a sensor unit (20), a control unit (30), a communication unit (40), and a storage unit (50).
  • the camera unit (10) captures an image containing a target object used in manipulation-related tasks.
  • the camera unit (10) may include at least one camera.
  • the camera may be an RGB camera that generates an RGB image, or an RGBD camera that generates an RGB image and a depth image.
  • the sensor unit (20) generates a depth image containing a target object used in manipulation-related work.
  • the sensor unit (20) may be a 3D LiDAR and may generate a depth image by measuring a 3D point cloud.
  • the sensor unit (20) may be omitted if the camera is an RGBD camera.
  • the control unit (30) performs overall control of the pose estimation device (100).
  • the control unit (30) estimates an accurate pose of a target object to help the robot (200) perform its task smoothly.
  • the control unit (30) includes a region division unit (33), a pose estimation unit (35), and a semantic map generation unit (37), and may further include a learning unit (31).
  • the learning unit (31) learns the features of the target object on a pixel-by-pixel basis using a recognition model to which artificial intelligence technology is applied.
  • the recognition model includes a convolutional neural network (CNN) (M1), a PointNet (M2), and a multilayer perceptron (MLP) (M3).
  • the convolutional neural network (M1) learns to recognize the target object using a training RGB image.
  • the convolutional neural network (M1) can perform learning on the shape of the target object.
  • the PointNet (M2) learns to extract features of the target object using a training depth image.
  • the PointNet (M2) can perform learning on the size of the target object based on a point cloud.
  • the multilayer perceptron (M3) takes the respective outputs of the convolutional neural network (M1) and the PointNet (M2) as inputs, and extracts the global features of the target object from them to perform learning on a pixel-by-pixel basis (an illustrative sketch of this kind of fusion model is given after this section).
  • through the recognition model, the learning unit (31) can perform learning not only on the shape of the target object but also on its size.
  • the learning unit (31) can repeat learning on the recognition model until the accuracy of the evaluation index desired by the user is reached. Meanwhile, the learning unit (31) can perform learning on only one specific target object, or on target objects of various shapes and sizes.
  • the region segmentation unit (33) segments the semantic region based on the RGB image and the depth image ((b) of FIG. 6) when the RGB image and the depth image including the target object (T) are collected ((a) of FIG. 6).
  • the region segmentation unit (33) calculates depth information for each segmented semantic region.
  • the region segmentation unit (33) calculates the depth information for each semantic region using histogram analysis (a sketch of this histogram step is given after this section). In other words, by calculating depth information for each semantic region, the region segmentation unit (33) can specify depth information for each object included in the image.
  • the pose estimation unit (35) recognizes the target object (T) by applying the RGB image to the recognition model learned by the learning unit (31) and estimates its three-dimensional pose ((c) of FIG. 6). At this time, the pose estimation unit (35) can recognize only the target object (T) learned by the recognition model and estimate the three-dimensional pose of the recognized target object (T). For example, if the target object (T) learned by the recognition model is a cup, the pose estimation unit (35) can recognize the cup included in the RGB image and estimate its pose by checking the direction of the recognized cup's handle. Here, once the three-dimensional pose of the target object (T) is estimated, the pose estimation unit (35) can specify the target object (T) with a three-dimensional bounding box (a sketch of the bounding-box corners is given after this section).
  • when multiple target objects (T) are recognized, the pose estimation unit (35) can estimate only the 3D pose of the target object with the highest preset priority. In other words, the pose estimation unit (35) can recognize and estimate the pose of only the target object (T) with the highest priority.
  • alternatively, when multiple target objects (T) are recognized, the pose estimation unit (35) can set their order by labeling the target objects in descending order of preset priority (a sketch of this priority handling is given after this section).
  • in this case, the pose estimation unit (35) can recognize and estimate poses for all target objects (T) that have been learned.
  • the priorities can be set based on the importance of the task performed by the robot (200), or on the size, color, shape, and the like of the object.
  • the semantic map generation unit (37) applies the depth information calculated by the region segmentation unit (33) to the 3D pose of the target object (T) estimated by the pose estimation unit (35), thereby correcting the 3D pose of the target object (T). By correcting the 3D pose, the semantic map generation unit (37) can estimate the size and posture of the target object (T) more accurately (a sketch of this correction and of a semantic-map entry is given after this section).
  • the semantic map generation unit (37) generates a semantic map in which the segmented semantic region and the corrected 3D pose are combined. In other words, the semantic map generation unit (37) generates a semantic map including information on the corrected 3D pose of the target object (T).
  • the semantic map also includes depth information on the remaining objects other than the target object (T).
  • the semantic map generation unit (37) generates the semantic map so that the sequence number of each target object (T) is included. Through this, the semantic map generation unit (37) can support the robot (200) in picking the target objects (T) in sequence and transporting them to the destination.
  • the communication unit (40) performs communication with the robot (200).
  • the communication unit (40) transmits the semantic map to the robot (200). If the pose estimation device (100) is included in the robot (200), the communication unit (40) transmits the semantic map through an internal line, and if the pose estimation device (100) is implemented in a separate configuration from the robot (200), the communication unit (40) can transmit the semantic map through a communication network (450).
  • the storage unit (50) stores a program or algorithm for driving the pose estimation device (100).
  • the storage unit (50) stores information for learning, such as a training RGB image and a training depth image, and stores an RGB image and a depth image containing a target object required for performing manipulation-related tasks.
  • the storage unit (50) stores information generated in the process of generating a semantic map.
  • the storage unit (50) may include at least one storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, an SD or XD memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, and an optical disk.
  • FIG. 7 is a flowchart for explaining a pose estimation method from the perspective of a pose estimation device according to an embodiment of the present invention.
  • the pose estimation method performs semantic region segmentation and histogram analysis based on a depth image including a target object to specify depth information, and applies a pre-learned recognition model to an RGB image including the target object to recognize the target object and estimate its pose.
  • the pose estimation method can generate a semantic map for the target object by applying the specified depth information to the estimated pose, thereby obtaining the exact shape, size, and so on of the target object.
  • in step S110, the pose estimation device (100) segments the semantic region.
  • specifically, the pose estimation device (100) segments the semantic region based on the RGB image and depth image including the target object, and calculates depth information for each of the segmented semantic regions.
  • here, the pose estimation device (100) can calculate the depth information for each semantic region using histogram analysis.
  • next, the pose estimation device (100) performs object recognition and pose estimation for the target object.
  • that is, the pose estimation device (100) recognizes the target object and estimates its 3D pose by applying the RGB image to the pre-learned recognition model.
  • the recognition model is a model to which artificial intelligence technology is applied, and is learned to recognize the features of the target object on a pixel-by-pixel basis.
  • in step S130, the pose estimation device (100) applies the calculated depth information to the estimated 3D pose to correct the 3D pose of the target object. Through this, the pose estimation device (100) can estimate the size and pose of the target object more accurately.
  • in step S140, the pose estimation device (100) generates a semantic map related to the target object.
  • that is, the pose estimation device (100) generates a semantic map in which the semantic region and the corrected 3D pose of the target object are combined.
  • FIG. 8 is a flowchart for explaining a pose estimation method from the perspective of an object pose estimation system according to an embodiment of the present invention.
  • the pose estimation method can improve the accuracy and precision of a task by having the robot (200) perform manipulation based on a semantic map related to the target object.
  • the pose estimation device (100) collects information related to the target object.
  • the pose estimation device (100) collects an RGB image and a depth image containing the target object.
  • the pose estimation device (100) may collect the corresponding information using an RGB camera and a lidar sensor or an RGBD camera, but is not limited thereto.
  • in step S220, the pose estimation device (100) generates a semantic map related to the target object.
  • the pose estimation device (100) generates a semantic map in which the semantic region and the 3D pose of the target object are combined based on the collected RGB image and depth image.
  • the pose estimation device (100) transmits the semantic map to the robot (200).
  • if the pose estimation device (100) is included in the robot (200), it can transmit the semantic map through an internal line, and if it is implemented as a separate configuration from the robot (200), it can transmit the semantic map through the communication network.
  • the robot (200) performs manipulation-related tasks.
  • the robot (200) can perform tasks based on a semantic map to reduce the error rate during tasks.
  • the robot (200) generates monitoring information.
  • the robot (200) generates monitoring information related to the work being performed.
  • the monitoring information may include the robot's work status, work time, work intensity, work completion rate, surrounding environment information, semantic maps, etc.
  • the robot (200) transmits monitoring information to the user terminal (300).
  • the robot (200) can transmit the monitoring information through a communication network.
  • in step S270, the user terminal (300) outputs the monitoring information.
  • the user terminal (300) outputs the received monitoring information in real time to help the user intuitively recognize work-related information.
  • FIG. 9 is a block diagram illustrating a computing device according to an embodiment of the present invention.
  • the computing device may be a device described in this specification (e.g., a pose estimation device, a robot, a user terminal, etc.).
  • the computing device (TN100) may include at least one processor (TN110), a transceiver (TN120), and a memory (TN130).
  • the computing device (TN100) may further include a storage device (TN140), an input interface device (TN150), an output interface device (TN160), etc.
  • the components included in the computing device (TN100) may be connected by a bus (TN170) to communicate with each other.
  • the processor (TN110) can execute a program command stored in at least one of the memory (TN130) and the storage device (TN140).
  • the processor (TN110) may mean a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which methods according to embodiments of the present invention are performed.
  • the processor (TN110) may be configured to implement procedures, functions, methods, etc. described in relation to embodiments of the present invention.
  • the processor (TN110) may control each component of the computing device (TN100).
  • Each of the memory (TN130) and the storage device (TN140) can store various information related to the operation of the processor (TN110).
  • Each of the memory (TN130) and the storage device (TN140) can be configured with at least one of a volatile storage medium and a nonvolatile storage medium.
  • the memory (TN130) can be configured with at least one of a read-only memory (ROM) and a random access memory (RAM).
  • the transceiver (TN120) can transmit or receive wired or wireless signals.
  • the transceiver (TN120) can be connected to a network and perform communications.
  • the embodiments of the present invention are not implemented only through the devices and/or methods described above, but may also be implemented through a program that realizes a function corresponding to the configuration of the embodiments, or through a recording medium on which the program is recorded, and such an implementation can easily be carried out by a person skilled in the art to which the present invention belongs from the description of the embodiments above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an apparatus and a method for estimating the pose of an object based on an RGB image and depth information. The pose estimation apparatus comprises: a region segmentation unit for segmenting a semantic region based on an RGB image including a target object and a depth image, and calculating depth information for each of the segmented semantic regions; a pose estimation unit for recognizing the target object by applying the RGB image to a pre-learned recognition model and estimating a 3D pose; and a semantic map generation unit for correcting the 3D pose of the target object by applying the depth information to the estimated 3D pose, and generating a semantic map in which the semantic region and the corrected 3D pose of the target object are combined.
PCT/KR2023/016199 2023-08-10 2023-10-18 Apparatus and method for estimating the pose of an object based on an RGB image and depth information Pending WO2025033602A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2023-0105000 2023-08-10
KR1020230105000A 2023-08-10 2023-08-10 Apparatus and method for estimating the pose of an object based on an RGB image and depth information

Publications (1)

Publication Number Publication Date
WO2025033602A1 (fr) 2025-02-13

Family

ID=94534556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/016199 2023-08-10 2023-10-18 Apparatus and method for estimating the pose of an object based on an RGB image and depth information Pending WO2025033602A1 (fr)

Country Status (2)

Country Link
KR (1) KR20250023813A (fr)
WO (1) WO2025033602A1 (fr)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114332214B (zh) 2020-09-29 2024-12-06 北京三星通信技术研究有限公司 Object pose estimation method and apparatus, electronic device, and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190038808A * 2016-06-24 2019-04-09 Imperial College of Science, Technology and Medicine Detection of objects in video data
KR20210078529A * 2018-11-29 2021-06-28 Beijing Sensetime Technology Development Co., Ltd. Target object 3D detection method, apparatus, medium, and device
KR102178013B1 * 2019-02-08 2020-11-12 Industry-University Cooperation Foundation Hanyang University Method for generating training data for object grasping and method for determining a grasping pose
KR102366396B1 * 2020-05-08 2022-02-22 Tongmyong University Industry-Academy Cooperation Foundation Method and system for 3D object region segmentation using RGB-D data and deep learning
KR20220081261A * 2020-12-08 2022-06-15 Samsung Electronics Co., Ltd. Object pose estimation method and apparatus

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120269580A (zh) * 2025-06-10 2025-07-08 视瑞(北京)机器人技术有限公司 Object grasping method and system based on image segmentation and object pose estimation

Also Published As

Publication number Publication date
KR20250023813A (ko) 2025-02-18

Similar Documents

Publication Publication Date Title
US11238604B1 (en) Densifying sparse depth maps
WO2016053067A1 (fr) Three-dimensional model generation using edges
WO2021101045A1 (fr) Electronic apparatus and control method therefor
WO2019088335A1 (fr) Intelligent collaboration server and system, and collaboration-based analysis method therefor
WO2025033602A1 (fr) Apparatus and method for estimating the pose of an object based on an RGB image and depth information
WO2020159241A1 (fr) Method for processing an image and apparatus therefor
WO2019190076A1 (fr) Eye-tracking method and terminal for performing same
JP7107596B2 (ja) Station monitoring system and station monitoring method
WO2020149601A1 (fr) Method and device for high-speed image recognition using a three-dimensional convolutional neural network (CNN)
WO2020111327A1 (fr) Non-contact device and method for recognizing an object attribute
WO2019112385A1 (fr) Method for encoding temporal information of image-segment frame-specific features for video recognition
WO2020235740A1 (fr) Image-based indoor positioning service system and method
WO2022045519A1 (fr) Optimization model selection device and method
WO2019146903A1 (fr) Apparatus and method for providing a surround-view image using distance information
WO2023096133A1 (fr) Method and device for providing a lightweight pose estimation model
CN113239873A (zh) Smoking behavior recognition method, system, computer device, and storage medium
CN114298129B (zh) Method, device, and medium for real-time analysis and assessment of tree obstacles throughout the entire UAV inspection process
WO2018131729A1 (fr) Method and system for detecting a moving object in an image using a single camera
WO2024106682A1 (fr) Device and method for analyzing average surface roughness by extracting features from a membrane image
WO2020175729A1 (fr) Apparatus and method for detecting facial feature points by using a Gaussian feature point map and a regression scheme
WO2024106925A1 (fr) Augmented-reality-based communication system and method, and computing device for implementing same
WO2023224251A1 (fr) Systems and methods for recognizing human actions without line of sight
WO2023120743A1 (fr) Context-specific training method for a point-cloud-based three-dimensional object recognition model
WO2022131390A1 (fr) Three-dimensional human pose estimation method based on self-supervised learning using multi-view images
WO2022098164A1 (fr) Electronic device and control method therefor

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23948595

Country of ref document: EP

Kind code of ref document: A1