US20240066695A1 - Systems and methods for online iterative re-planning - Google Patents
- Publication number
- US20240066695A1 (application US18/090,966)
- Authority
- US
- United States
- Prior art keywords
- candidate object
- grasp
- agent
- trajectory
- trajectories
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1612—Programme controls characterised by the hand, wrist, grip control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1679—Programme controls characterised by the tasks executed
- B25J9/1689—Teleoperation
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/39—Robotics, robotics to robotics hand
- G05B2219/39001—Robot, manipulator control
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/39—Robotics, robotics to robotics hand
- G05B2219/39511—Reorient, rotate object in hand between fingers by action of fingers
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/40—Robotics, robotics mapping to robotics vision
- G05B2219/40465—Criteria is lowest cost function, minimum work path
Definitions
- Dexterous in-hand manipulation is a challenging robotics task which involves re-orientating an object from an initial pose to a goal pose while grasping the object. Re-orientating an object in-hand may be necessary when some task requires a certain object pose or grasp that is different from the initial grasped pose.
- A system for online iterative re-planning includes a processor and a memory storing instructions.
- When executed by the processor, the instructions cause the processor to receive, at a first time step, a first grasp and an initial object pose of an agent.
- When executed by the processor, the instructions also cause the processor to generate a first set of candidate object trajectories based on the first grasp and the initial object pose.
- Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step.
- When executed by the processor, the instructions further cause the processor to calculate contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories. When executed by the processor, the instructions yet further cause the processor to select a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent. When executed by the processor, the instructions further cause the processor to cause the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- a method for online iterative re-planning includes receiving, at a first time step, a first grasp and an initial object pose of an agent.
- the method also includes generating a first set of candidate object trajectories based on the first grasp and the initial object pose.
- Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step.
- the method further includes calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories.
- the method yet further includes selecting a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent.
- the method includes causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- A non-transitory computer readable storage medium stores instructions that, when executed by a computer having a processor, cause the computer to perform a method for online iterative re-planning.
- the method includes receiving, at a first time step, a first grasp and an initial object pose of an agent.
- the method also includes generating a first set of candidate object trajectories based on the first grasp and the initial object pose.
- Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step.
- the method further includes calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories.
- the method yet further includes selecting a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent.
- the method includes causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
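The generate, score, select, and execute steps summarized above can be sketched as follows. This is a minimal illustration and not the claimed implementation: the object pose is reduced to a single scalar, contact-point calculation is omitted, and `stand_in_cost` is a hand-written placeholder for the learned neural network cost model. All function names, the horizon, and the interpolation exponents are assumptions made for the example.

```python
import numpy as np

def generate_candidates(initial_pose, goal_pose, horizon=5, exponents=(0.5, 1.0, 2.0)):
    """Hypothetical generator: each candidate lists object poses for the
    future time steps, all ending at the goal pose."""
    s = np.linspace(0.0, 1.0, horizon + 1)[1:]  # future time steps only
    return [initial_pose + (goal_pose - initial_pose) * s**p for p in exponents]

def stand_in_cost(trajectory, initial_pose):
    """Placeholder for the learned cost model: penalizes large pose changes."""
    steps = np.diff(np.concatenate([[initial_pose], trajectory]))
    return float(np.sum(steps**2))

def select_trajectory(candidates, cost_fn):
    """Return the index of the lowest-cost candidate and all costs."""
    costs = [cost_fn(c) for c in candidates]
    return int(np.argmin(costs)), costs

initial_pose, goal_pose = 0.0, 1.0
candidates = generate_candidates(initial_pose, goal_pose)
best, costs = select_trajectory(candidates, lambda c: stand_in_cost(c, initial_pose))
# The first pose of the selected trajectory is what would be executed next.
next_pose = float(candidates[best][0])
```

With this placeholder cost, the evenly spaced candidate is selected because equal steps minimize the sum of squared step sizes. A trained model would rank the candidates differently.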
- FIG. 1 is an exemplary component diagram of a system for online iterative re-planning, according to one aspect.
- FIG. 2 is an exemplary agent environment of a system for online iterative re-planning, according to one aspect.
- FIG. 3 is an exemplary process flow of a method for online iterative re-planning, according to one aspect.
- FIG. 4 shows illustrations of an exemplary implementation of a system for online iterative re-planning, according to one aspect.
- FIG. 5 is an exemplary process flow of a method for online iterative re-planning, according to one aspect.
- FIG. 6 is an illustration of an example computer-readable medium or computer-readable device including processor-executable instructions configured to embody one or more of the provisions set forth herein, according to one aspect.
- a new online iterative re-planning method may use optimization of learned cost models and a signed-distance function (SDF) of the object to select contact points on the object.
- Previous methods may use a finite number of manually specified contact points which may lie arbitrarily on the object surface.
- an online iterative re-planning method is provided which plans a sequence of object poses and grasps for dexterous in-hand manipulation.
- the existing methods typically did not use feedback from the actual achieved grasps. By incorporating feedback, the method may achieve improved robustness and failure recovery.
- the proposed planning method may be extendable to the case where there is parametric uncertainty in object parameters for improved robustness.
- Controlling a multi-fingered hand to manipulate a tool is provided.
- the goal may be to move the tool from a start pose to a goal pose. Once at the goal pose, the tool may be grasped such that the multi-fingered hand may apply a task wrench.
- the higher-level planner may select a sequence of object poses and grasps. Grasps may be specified by contact locations on the surface of the object for each finger that is in contact. This means that the grasp may be parameterized by both continuous and discrete variables (the contact locations and the fingers that are in contact).
- the sequence of object poses and grasps should be feasible, i.e., it must be possible to move from one object pose to the next and from one grasp to the next while keeping hold of the object. Once such a trajectory has been chosen, existing algorithms may track this trajectory using lower-level controllers.
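A grasp parameterized by both discrete variables (which fingers are in contact) and continuous variables (where each finger touches the object) could be represented as in the sketch below. The `Grasp` class, the finger names, and the feasibility rule in `is_feasible_transition` are assumptions for illustration. The rule used here, that at least two fingers must stay at the same contact location between consecutive grasps, is a simple stand-in for "keeping hold of the object" and is not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple

Point = Tuple[float, float, float]

@dataclass(frozen=True)
class Grasp:
    # Maps each finger in contact to its contact location on the object surface.
    contacts: Dict[str, Point]

    @property
    def fingers_in_contact(self) -> FrozenSet[str]:
        """Discrete part of the parameterization: which fingers touch the object."""
        return frozenset(self.contacts)

def is_feasible_transition(current: Grasp, nxt: Grasp, min_holding: int = 2) -> bool:
    """Assumed rule: enough fingers keep the same contact location in both grasps."""
    shared = current.fingers_in_contact & nxt.fingers_in_contact
    holding = [f for f in shared if current.contacts[f] == nxt.contacts[f]]
    return len(holding) >= min_holding

g1 = Grasp({"thumb": (0.0, 0.0, 0.0), "index": (1.0, 0.0, 0.0), "middle": (0.0, 1.0, 0.0)})
g2 = Grasp({"thumb": (0.0, 0.0, 0.0), "index": (1.0, 0.0, 0.0), "middle": (0.0, 2.0, 0.0)})
g3 = Grasp({"thumb": (0.0, 0.0, 1.0), "index": (1.0, 0.0, 1.0)})
```

Here `g1` to `g2` moves only the middle finger while the thumb and index finger hold the object, whereas `g1` to `g3` moves every finger at once.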
- a method for online re-planning of a trajectory is provided for in-hand manipulation of tools.
- the planner may receive the current grasp and object pose and generate a plan for the remainder of the trajectory. After generating this trajectory, lower-level controllers may be used to transition towards the next step in the plan. After this, the actual achieved pose and grasp are used as input to the planner for the next iteration.
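The plan, execute, and feed-back cycle described above can be sketched as a short loop. This is a toy illustration under stated assumptions: the object pose is a single scalar, `plan` is a hypothetical planner that interpolates toward the goal, and `execute_step` is a hypothetical lower-level controller that only achieves part of the commanded motion, so the achieved pose differs from the planned one.

```python
def plan(current_pose, goal_pose, max_step=0.2):
    """Hypothetical planner: poses for the remainder of the trajectory,
    each at most max_step away from the previous one."""
    poses = []
    pose = current_pose
    while abs(goal_pose - pose) > 1e-9:
        step = max(-max_step, min(max_step, goal_pose - pose))
        pose = pose + step
        poses.append(pose)
    return poses

def execute_step(current_pose, target_pose, tracking_gain=0.8):
    """Hypothetical lower-level controller with imperfect tracking."""
    return current_pose + tracking_gain * (target_pose - current_pose)

def replan_loop(start_pose, goal_pose, tol=1e-3, max_iters=100):
    """Re-plan from the actually achieved pose at every iteration."""
    pose = start_pose
    history = [pose]
    for _ in range(max_iters):
        if abs(goal_pose - pose) <= tol:
            break
        remaining = plan(pose, goal_pose)        # plan the rest of the trajectory
        pose = execute_step(pose, remaining[0])  # move toward the next planned pose
        history.append(pose)                     # achieved pose feeds the next plan
    return history

history = replan_loop(0.0, 1.0)
```

Because each iteration starts from the achieved pose rather than the planned one, tracking errors do not accumulate across the trajectory.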
- Trajectory optimization, which requires a trajectory cost, may be used to plan.
- the planner may incorporate a novel grasp selection using sampling and gradient based optimization. This grasp selection may optimize a general grasp quality cost as well as a task-specific cost. This grasp selection may be used for tasks other than in-hand manipulation by choosing an appropriate task-specific cost.
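One way to combine sampling with gradient-based optimization for grasp selection is sketched below. It is a toy example and not the patented method: the object is assumed to be a unit circle, a grasp is two contact angles, `quality_cost` is a stand-in general grasp quality cost that prefers antipodal contacts, and `task_cost` is a stand-in task-specific cost that prefers the first contact near a hypothetical `task_angle`. The sample count, learning rate, and step count are arbitrary choices.

```python
import numpy as np

def quality_cost(theta):
    """Stand-in grasp quality cost: zero when the two contacts are antipodal."""
    return 1.0 + np.cos(theta[0] - theta[1])

def task_cost(theta, task_angle):
    """Stand-in task cost: zero when the first contact sits at task_angle."""
    return 1.0 - np.cos(theta[0] - task_angle)

def total_cost(theta, task_angle, weight=1.0):
    return quality_cost(theta) + weight * task_cost(theta, task_angle)

def total_grad(theta, task_angle, weight=1.0):
    """Analytic gradient of total_cost with respect to both contact angles."""
    d = np.sin(theta[0] - theta[1])
    return np.array([-d + weight * np.sin(theta[0] - task_angle), d])

def select_grasp(task_angle, n_samples=64, steps=300, lr=0.2, seed=0):
    rng = np.random.default_rng(seed)
    # Sampling stage: draw random grasps and keep the cheapest one.
    samples = rng.uniform(-np.pi, np.pi, size=(n_samples, 2))
    theta = min(samples, key=lambda th: total_cost(th, task_angle)).copy()
    # Gradient stage: refine the best sample by gradient descent.
    for _ in range(steps):
        theta -= lr * total_grad(theta, task_angle)
    return theta, float(total_cost(theta, task_angle))

theta, cost = select_grasp(task_angle=0.5)
```

Swapping `task_cost` for a different function is the only change needed to reuse the same selection routine for another task.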
- Agent as used herein is a self-propelled machine that moves through or manipulates an environment.
- exemplary agents may include, but are not limited to, robots, vehicles, or other self-propelled machines.
- the agent may be autonomously, semi-autonomously, or manually operated.
- Agent system may include, but is not limited to, any automatic or manual systems that may be used to enhance the agent.
- Exemplary systems include, but are not limited to: an electronic stability control system, an anti-lock brake system, a brake assist system, an automatic brake prefill system, a low speed follow system, a cruise control system, a collision warning system, a collision mitigation braking system, a velocity control system, a lane departure warning system, a blind spot indicator system, a lane keep assist system, a navigation system, a steering system, a transmission system, brake pedal systems, an electronic power steering system, visual devices (e.g., camera systems, proximity sensor systems), an electronic pretensioning system, a monitoring system, a passenger detection system, a suspension system, and a sensory system among others.
- Bus refers to an interconnected architecture that is operably connected to other computer components inside a computer or between computers.
- the bus may transfer data between the computer components.
- the bus may be a memory bus, a memory processor, a peripheral bus, an external bus, a crossbar switch, and/or a local bus, among others.
- the bus may also be a bus that interconnects components inside an agent using protocols such as Media Oriented Systems Transport (MOST), Controller Area network (CAN), Local Interconnect network (LIN), among others.
- Computer components refers to a computer-related entity (e.g., hardware, firmware, instructions in execution, combinations thereof).
- Computer components may include, for example, a process running on a processor, a processor, an object, an executable, a thread of execution, and a computer.
- a computer component(s) may reside within a process and/or thread.
- a computer component may be localized on one computer and/or may be distributed between multiple computers.
- Computer communication refers to a communication between two or more communicating devices (e.g., computer, personal digital assistant, cellular telephone, network device, vehicle, computing device, infrastructure device, roadside equipment) and may be, for example, a network transfer, a data transfer, a file transfer, an applet transfer, an email, a hypertext transfer protocol (HTTP) transfer, and so on.
- a computer communication may occur across any type of wired or wireless system and/or network having any type of configuration, for example, a local area network (LAN), a personal area network (PAN), a wireless personal area network (WPAN), a wireless network (WAN), a wide area network (WAN), a metropolitan area network (MAN), a virtual private network (VPN), a cellular network, a token ring network, a point-to-point network, an ad hoc network, a mobile ad hoc network, a vehicular ad hoc network (VANET), a vehicle-to-vehicle (V2V) network, a vehicle-to-everything (V2X) network, a vehicle-to-infrastructure (V2I) network, among others.
- Computer communication may utilize any type of wired, wireless, or network communication protocol including, but not limited to, Ethernet (e.g., IEEE 802.3), WiFi (e.g., IEEE 802.11), communications access for land mobiles (CALM), WiMax, Bluetooth, Zigbee, ultra-wideband (UWAB), multiple-input and multiple-output (MIMO), telecommunications and/or cellular network communication (e.g., SMS, MMS, 3G, 4G, LTE, 5G, GSM, CDMA, WAVE), satellite, dedicated short range communication (DSRC), among others.
- Communication interface may include input and/or output devices for receiving input and/or devices for outputting data.
- the input and/or output may be for controlling different agent features, which include various agent components, systems, and subsystems.
- the term “input device” includes, but is not limited to: keyboard, microphones, pointing and selection devices, cameras, imaging devices, video cards, displays, push buttons, rotary knobs, and the like.
- Computer-readable medium refers to a non-transitory medium that stores instructions and/or data.
- a computer-readable medium may take forms, including, but not limited to, non-volatile media, and volatile media.
- Non-volatile media may include, for example, optical disks, magnetic disks, and so on.
- Volatile media may include, for example, semiconductor memories, dynamic memory, and so on.
- Common forms of a computer-readable medium may include, but are not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, an ASIC, a CD, other optical medium, a RAM, a ROM, a memory chip or card, a memory stick, and other media from which a computer, a processor or other electronic device may read.
- Database is used to refer to a table. In other examples, “database” may be used to refer to a set of tables. In still other examples, “database” may refer to a set of data stores and methods for accessing and/or manipulating those data stores. In one embodiment, a database may be stored, for example, at a disk, data store, and/or a memory. A database may be stored locally or remotely and accessed via a network.
- Data store may be, for example, a magnetic disk drive, a solid-state disk drive, a floppy disk drive, a tape drive, a Zip drive, a flash memory card, and/or a memory stick.
- the disk may be a CD-ROM (compact disk ROM), a CD recordable drive (CD-R drive), a CD rewritable drive (CD-RW drive), and/or a digital video ROM drive (DVD ROM).
- the disk may store an operating system that controls or allocates resources of a computing device.
- Logic circuitry includes, but is not limited to, hardware, firmware, a non-transitory computer readable medium that stores instructions, instructions in execution on a machine, and/or to cause (e.g., execute) an action(s) from another logic circuitry, module, method and/or system.
- Logic circuitry may include and/or be a part of a processor controlled by an algorithm, a discrete logic (e.g., ASIC), an analog circuit, a digital circuit, a programmed logic device, a memory device containing instructions, and so on.
- Logic may include one or more gates, combinations of gates, or other circuit components. Where multiple logics are described, it may be possible to incorporate the multiple logics into one physical logic. Similarly, where a single logic is described, it may be possible to distribute that single logic between multiple physical logics.
- Memory may include volatile memory and/or non-volatile memory.
- Non-volatile memory may include, for example, ROM (read only memory), PROM (programmable read only memory), EPROM (erasable PROM), and EEPROM (electrically erasable PROM).
- Volatile memory may include, for example, RAM (random access memory), synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), and direct RAM bus RAM (DRRAM).
- the memory may store an operating system that controls or allocates resources of a computing device.
- Module includes, but is not limited to, non-transitory computer readable medium that stores instructions, instructions in execution on a machine, hardware, firmware, software in execution on a machine or computer readable media, and/or combinations of each to perform a function(s) or an action(s), and/or to cause a function or action from another module, method, and/or system.
- a module may also include logic, a software-controlled microprocessor, a discrete logic circuit, an analog circuit, a digital circuit, a programmed logic device, a memory device containing executing instructions, logic gates, a combination of gates, and/or other circuit components. Multiple modules may be combined into one module and single modules may be distributed among multiple modules.
- Operable connection, or a connection by which entities are “operably connected,” is one in which signals, physical communications, and/or logical communications may be sent and/or received.
- An operable connection may include a wireless interface, firmware interface, a physical interface, a data interface, and/or an electrical interface.
- Portable device is a computing device typically having a display screen with user input (e.g., touch, keyboard) and a processor for computing. Portable devices include, but are not limited to, handheld devices, mobile devices, smart phones, laptops, tablets, e-readers, smart speakers. In some embodiments, a “portable device” could refer to a remote device that includes a processor for computing and/or a communication interface for receiving and transmitting data remotely.
- Processor processes signals and performs general computing and arithmetic functions. Signals processed by the processor may include digital signals, data signals, computer instructions, processor instructions, messages, a bit, a bit stream, that may be received, transmitted and/or detected. Generally, the processor may be a variety of various processors including multiple single and multicore processors and co-processors and other multiple single and multicore processor and co-processor architectures. The processor may include logic circuitry to execute actions and/or algorithms.
- FIG. 1 is an exemplary component diagram of an operating environment 100 for online iterative re-planning, according to one aspect.
- the operating environment 100 includes a sensor module 102 , a computing device 104 , and operational systems 106 interconnected by a bus 108 .
- the components of the operating environment 100 as well as the components of other systems, hardware architectures, and software architectures discussed herein, may be combined, omitted, or organized into different architectures for various embodiments.
- the computing device 104 may be implemented with a device or remotely stored.
- the computing device 104 may be implemented as a part of an agent.
- the agent may be a bipedal, two-wheeled, or four-wheeled robot, a vehicle, or a self-propelled machine.
- the autonomous agent may be configured as a humanoid robot.
- the humanoid robot may take the form of all or a portion of a robot.
- the humanoid robot may take the form of an arm with fingers.
- the computing device 104 may be implemented as part of a telematics unit, a head unit, a navigation unit, an infotainment unit, an electronic control unit, among others of an agent.
- the components and functions of the computing device 104 may be implemented, for example, with other devices (e.g., a portable device) or another device connected via a network (e.g., a network 134 ).
- the computing device 104 may be capable of providing wired or wireless computer communications utilizing various protocols to send/receive electronic signals internally to/from components of the operating environment 100 .
- the computing device 104 may be operably connected for internal computer communication via the bus 108 (e.g., a Controller Area Network (CAN) or a Local Interconnect Network (LIN) protocol bus) to facilitate data input and output between the computing device 104 and the components of the operating environment 100 .
- the agent may be the agent 200 shown in FIG. 2 .
- the agent 200 may include a number of sensors.
- the agent 200 may include a first optical sensor 202 , a second optical sensor 204 , and a force sensor 206 .
- the first optical sensor 202 , the second optical sensor 204 , and the force sensor 206 receive data from an environment of an object 208 .
- the sensor module 102 receives, provides, and/or senses information associated with the agent 200 , an object 208 , the operating environment 100 , an environment of the agent 200 , and/or the operational systems 106 .
- the sensor module 102 may receive image data 110 , depth data 112 , and tactile data 114 from the sensors.
- the sensor module 102 may receive image data 110 from the first optical sensor 202 , depth data 112 from the second optical sensor 204 , and the tactile data 114 from the force sensor 206 .
- the computing device 104 receives the image data 110 , the depth data 112 , and the tactile data 114 from the sensor module 102 . Therefore, the image data 110 , depth data 112 , and tactile data 114 are raw sensor data received from their respective sensors.
- the image data 110 , depth data 112 , and tactile data 114 may include information about the sensors.
- the image data 110 , depth data 112 , and tactile data 114 may include information about the force sensor 206 such as the relative position of the force sensor 206 to a reference point as measured by a sensor.
- the reference point may be the first optical sensor 202 or the second optical sensor 204 .
- the depth data 112 may include distance measurements from the second optical sensor 204 to the force sensor 206 .
- the tactile data 114 may include dimensions (e.g., width, height, length, etc.) of the force sensor 206 .
- the force sensor 206 may measure various forces exerted by (e.g., torque) or experienced by (e.g., resistance) the agent 200 .
- the sensors 202 - 206 and/or the sensor module 102 are operable to sense a measurement of data associated with the agent 200 , the operating environment 100 , the object 208 , the environment, and/or the operational systems 106 and generate a data signal indicating said measurement of data. These data signals may be converted into other data formats (e.g., numerical) and/or used by the sensor module 102 , the computing device 104 , and/or the operational systems 106 to generate other data metrics and parameters. It is understood that the sensors may be any type of sensor, for example, acoustic, electric, environmental, optical, imaging, light, pressure, force, thermal, temperature, proximity, gyroscope, and accelerometers, among others.
- the computing device 104 includes a processor 116 , a memory 118 , a data store 120 , and a communication interface 122 , which are each operably connected for computer communication via a bus 108 and/or other wired and wireless technologies.
- the communication interface 122 provides software and hardware to facilitate data input and output between the components of the computing device 104 and other components, networks, and data sources, which will be described herein.
- the computing device 104 also includes a planner module 124 , a grasp selector module 126 , and a controller 128 , for online iterative re-planning facilitated by the components of the operating environment 100 .
- the planner module 124 , the grasp selector module 126 , and the controller 128 may be an artificial neural network that acts as a framework for machine learning, including deep learning.
- the planner module 124 , the grasp selector module 126 , and the controller 128 may be a convolutional neural network (CNN).
- the planner module 124 , the grasp selector module 126 , and the controller 128 may further include or implement a concatenator, a deep neural network (DNN), a recurrent neural network (RNN), a 3D Convolutional Neural Network (3DCNN) and/or Convolutional Long-Short Term Memory (ConvLSTM).
- the computing device 104 is also operably connected for computer communication (e.g., via the bus 108 and/or the communication interface 122 ) to one or more operational systems 106 .
- the operational systems 106 may include, but are not limited to, any automatic or manual systems that may be used to enhance the agent 200 , operation, and/or propulsion.
- the operational systems 106 may be dependent on the implementation.
- the operational system 106 may include a path planning module 130 and a trajectory module 132 .
- the path planning module 130 monitors, analyzes, and operates the agent 200 to some degree.
- the path planning module may define a path as a course or direction through space. The path may be defined by a number of waypoints.
- the trajectory module 132 applies time points to the path, for example, by assigning time stamps to the waypoints.
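The distinction between a path (waypoints only) and a trajectory (waypoints with time stamps) can be sketched as below. The `TimedWaypoint` type, the `assign_time_stamps` function, and the fixed time step `dt` are assumptions for illustration, since the source does not specify how the trajectory module 132 spaces its time stamps.

```python
from dataclasses import dataclass
from typing import List, Tuple

Waypoint = Tuple[float, float, float]

@dataclass
class TimedWaypoint:
    t: float            # time stamp assigned to the waypoint
    position: Waypoint  # waypoint taken unchanged from the path

def assign_time_stamps(path: List[Waypoint], start_time: float = 0.0,
                       dt: float = 0.1) -> List[TimedWaypoint]:
    """Turn a path into a trajectory by stamping waypoints at a fixed interval."""
    return [TimedWaypoint(start_time + i * dt, p) for i, p in enumerate(path)]

path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
trajectory = assign_time_stamps(path, dt=0.5)
```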
- the operational systems 106 may include manipulating the object 208 to affect a target.
- the operational systems 106 also include and/or are operably connected for computer communication to the sensor module 102 .
- one or more sensors of the sensor module 102 may be incorporated with the path planning module 130 to monitor characteristics of the environment or the agent 200 , for example, based on the tactile data 114 .
- the sensor module 102 , the computing device 104 , and/or the operational systems 106 are also operatively connected for computer communication to the network 134 .
- the network 134 is, for example, a data network, the Internet, a wide area network (WAN) or a local area network (LAN).
- the network 134 serves as a communication medium to various remote devices (e.g., databases, web servers, remote servers, application servers, intermediary servers, client machines, other portable devices).
- Referring now to FIG. 3 , a method 300 for online iterative re-planning will now be described according to an exemplary embodiment.
- FIG. 3 will be described with reference to FIGS. 1 , 2 , 4 , and 6 .
- the method 300 will be described as a sequence of blocks, but it is understood that the blocks of the method 300 may be organized into different architectures, elements, stages, and/or processes.
- the method 300 includes the planner module 124 receiving, at a first time step, a first grasp and an initial object pose of an agent 200 .
- the first grasp, which includes contact points $c_t$, and the initial object pose $o_t$ of an agent 200 are based on state information from the environment of the agent 200 , as shown in FIG. 4 .
- the first grasp and/or the initial object pose, $o_t$, may be selected based on state information, a known state, or a random selection.
- the initial object pose, $o_t$, may be determined based on a signed distance field (SDF), which is a function $\phi(x): \mathbb{R}^3 \to \mathbb{R}$ that maps a point in space to the distance to the nearest surface of an object 208 .
- the level set $\phi(x) = 0$ implicitly defines the surface of the object 208 . The SDF is often computed offline and stored in a voxel grid, but a neural network may be used to learn the function.
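A signed distance field and its voxel-grid storage can be illustrated with an analytic shape. The sketch below assumes a unit sphere as the object, a 31 x 31 x 31 grid, and nearest-voxel lookup. These choices and the function names are illustrative and are not taken from the source.

```python
import numpy as np

def sphere_sdf(x, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface,
    positive outside. Accepts one point or an array of points."""
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x - np.asarray(center), axis=-1) - radius

def voxelize_sdf(sdf, lo=-1.5, hi=1.5, n=31):
    """Offline step: evaluate the SDF on a regular n x n x n voxel grid."""
    axis = np.linspace(lo, hi, n)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    points = np.stack([gx, gy, gz], axis=-1)
    return axis, sdf(points)

def lookup(axis, grid, x):
    """Online step: read the stored distance at the voxel nearest to x."""
    idx = tuple(int(np.argmin(np.abs(axis - xi))) for xi in x)
    return float(grid[idx])

axis, grid = voxelize_sdf(sphere_sdf)
```

Points where the stored value is zero lie on the object surface, which is how candidate contact locations can be tested against the zero level set.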
- DeepSDF models may be learned using data generated from a mesh of the object 208 . It is also possible to learn SDF models generated from a point cloud based on the image data 110 , the depth data 112 , and/or the tactile data 114 .
- the initial object pose, $o_t$, may be based on any neural-network based SDF model.
- the agent 200 is able to grasp the object 208 in the environment using one or more links.
- the grasp may be performed by links.
- links are portions of the agent 200 used in contact with the object 208 to achieve a grasp, such as the initial grasp.
- the links may include a number of end effector(s) that operate to mimic the function of human fingers.
- the agent 200 may include a first end effector 210 , a second end effector 212 , a third end effector 214 , a fourth end effector 216 , and a fifth end effector 218 that is opposable such that it may move toward and touch the other end effectors 210 - 216 .
- the end effectors 210 - 218 are links of the agent 200 that contact the object 208 .
- the links may also include other portions of an agent 200 such as a grasper, rod, or clamp, among others.
- a set of grasps may include any grasp that may be achieved by the agent 200 based on the state of the agent 200 at the given time step.
- the first grasp of the object 208 occurs at a first time step and includes a contact point $c_t$.
- a contact point includes contact pairs $\{p_O, p_R\}$ where $p_O$ is an object contact point on the object 208 in the object's reference frame and $p_R$ is an agent contact point on the agent 200 in the agent's reference frame.
- Each instance where a link of the agent 200 is or may be in contact with the object 208 is represented in a contact point $c_t$.
- where the first end effector 210 , the second end effector 212 , the third end effector 214 , and so on of the agent 200 are in contact with the object 208 , each contact may be represented as a contact point $c_t$.
- the contact points include contact pairs $\{p_O, p_R\}$ where $p_O$ is an object contact point on the object 208 in the object's reference frame and $p_R$ is an agent contact point on the agent 200 in the agent's reference frame.
- the object contact point, $p_O$, and/or the agent contact point, $p_R$, may be based on the state information, such as object state information and/or agent state information.
- the state information may be received or determined from the sensor module 102 or the operational systems 106 .
- the image data 110 may correspond to the portion of the object 208 that is not occluded by the grasp of the agent 200 or the environment.
- the agent 200 is holding an object 208 , shown here as a wrench, with a force sensor 206 represented by the end effectors 210 - 218 of the agent 200 .
- the image data 110 may be image data, such as RGB data, YCB data, and/or YUV data. The image data may be used to determine a joint configuration of the end effectors 210 - 218 of the agent.
- the state information may also be based on the depth data 112 about the object 208 .
- the depth data 112 may be received from the second optical sensor 204 .
- the depth data 112 may include a video sequence or a series of images, user inputs, and/or data from the operational systems 106, such as data from a Controller Area Network (CAN) bus including pedal pressure, steer angle, etc.
- the second optical sensor 204 may include radar units, lidar units, image capture components, ranging sensors, cameras, scanners (e.g., 2-D scanners or 3-D scanners), or other measurement components.
- the depth data 112 from the second optical sensor 204 may be augmented by other sources, such as the first optical sensor 202 , and/or remote devices (e.g., via the bus 108 and/or the communication interface 122 ).
- the state information may also be based on the tactile data 114 received from the force sensor 206 .
- the force sensor 206 may include tensile force sensors, compression force sensors, tensile and compression force sensors, or other measurement components.
- the force sensor 206 may be with a Weiss gripper with GelSight tactile sensors.
- the tactile data 114 is augmented as additional sensor data from other sources is received.
- the tactile data 114 from the force sensor 206 may be augmented by other sources, such as additional force sensors (e.g., from a second hand of the agent 200 ), and/or remote devices (e.g., via the bus 108 and/or the communication interface 122 ).
- the planner module 124 may determine the agent contact points and the object contact points based on the tactile data 114 .
- the planner module 124 may also use the operational system 106 , including but not limited to the agent systems, to determine the state information.
- the trajectory module 132 may provide the planner module 124 with the state information from previous time steps. Based on the state information from the previous time steps, the planner module 124 may determine current state information, such as a current joint configuration of the end effector 210 - 218 of the agent 200 . Therefore, using the image data 110 , the depth data 112 , the tactile data 114 , and/or the operational systems, planner module 124 may determine
- the method 300 includes the planner module 124 generating a first set of candidate object trajectories based on the first grasp and the initial object pose.
- Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step.
- the goal may be to move the object 208 from the initial object pose o t to a goal pose o G .
- the candidate object trajectory is defined to allow the agent 200 to move from one time step to a next time step until the final time step is achieved.
- the first time step is associated with the first grasp that includes contact points, c t , and the initial object pose o t of an agent 200 represented as (o t , c t ) and transition to a second grasp that includes contact points, c t+1 , and a next object pose o t+1 of the agent 200 represented as (o t+1 , c t+1 ).
- the object 208 should be grasped such that the links may apply a task wrench, w G . Since a grasp is specified by contact point, c t , on the surface of the object 208 , the grasp is parameterized by both continuous and discrete variables.
- the first set of candidate object trajectories is also generated such that the task wrench, w G , is inside the wrench cone of the grasp c T .
- the current object pose and grasp 402 may reflect a first grasp that includes contact points, c t , and an object pose o t at any time step t including the first time step, the second time step, and so on.
- the planner module 124 includes an object trajectory sampler 404 , shown in FIG. 4 .
- the object trajectory sampler 404 receives the current object pose and grasp 402 , here the first grasp that includes contact points, c t , and the initial object pose o t .
- the object trajectory sampler 404 may generate a set of candidate object trajectories, here the first set of candidate object trajectories, based on paths to the goal, such as a straight-line path to the goal.
- Gaussian perturbations may be sampled by the object trajectory sampler 404 such that the end points are constrained and variance is highest in the middle of the trajectory. By sampling in this manner, each candidate object trajectory of the first set of candidate object trajectories reaches the goal.
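- The sampling described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: object poses are treated as plain vectors (orientation handling is omitted), and the function name, the sinusoidal variance envelope, and the `sigma` parameter are assumptions chosen so that the perturbation is zero at both end points and largest mid-trajectory.

```python
import numpy as np

def sample_candidate_trajectories(start, goal, num_steps, num_samples,
                                  sigma=0.05, seed=0):
    """Sample perturbed straight-line paths whose end points stay fixed."""
    rng = np.random.default_rng(seed)
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)

    # Straight-line nominal path, shape (num_steps, dim).
    s = np.linspace(0.0, 1.0, num_steps)
    nominal = start + s[:, None] * (goal - start)

    # Envelope (assumed form): zero at both ends, maximal in the middle.
    envelope = np.sin(np.pi * s)
    envelope[0] = envelope[-1] = 0.0

    # Gaussian perturbations scaled by the envelope.
    noise = rng.normal(size=(num_samples, num_steps, start.size))
    return nominal[None] + sigma * envelope[None, :, None] * noise
```

Because the envelope vanishes at the first and last step, every sampled trajectory starts at the current pose and ends at the goal.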
- the method 300 includes the grasp selector module 126 calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories.
- the grasp selection may be calculated by randomly sampling the fingers in contact and then performing continuous optimization of the contact points due to the mix of discrete and continuous decision variables.
- a contact sequence may be constrained by assuming that between grasp c t at a first time step and c t+1 at a next time step, an end effector may be removed from contact or added to contact at a new contact location on the object 208.
- the grasps may be constrained by requiring that multiple end effectors not be added or removed at the same time. Constraints also include end effectors not changing directly from one contact location to another without being removed first. The constraints allow the contact sequence to be processed by lower-level controllers.
- the contact sequence is represented as (c t , Δc t+1 , . . . , Δc T ), which is the current grasp at the first time step and a sequence of grasp changes over subsequent time steps.
- the grasp sequence is initialized to be a sequence of null changes.
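- One way to picture the contact sequence and its constraints is sketched below. The data layout is an assumption made for illustration: a grasp maps each link name to a contact point or `None`, and each grasp change is either a null change (`None`), a removal `(link, None)`, or an addition `(link, point)`. Allowing only one change per time step enforces that multiple end effectors are not added or removed together.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Grasp = Dict[str, Optional[Point]]               # link -> contact point, None if not in contact
Change = Optional[Tuple[str, Optional[Point]]]   # None is a null change

def apply_change(grasp: Grasp, change: Change) -> Grasp:
    """Apply a single grasp change, rejecting changes that break the constraints."""
    if change is None:
        return dict(grasp)
    link, point = change
    in_contact = grasp[link] is not None
    if point is None and not in_contact:
        raise ValueError(f"{link} is not in contact, so it cannot be removed")
    if point is not None and in_contact:
        # A link may not jump between contact locations without being removed first.
        raise ValueError(f"{link} must be removed before it is placed again")
    new_grasp = dict(grasp)
    new_grasp[link] = point
    return new_grasp

def rollout(grasp: Grasp, changes: List[Change]) -> List[Grasp]:
    """Expand (c_t, change_{t+1}, ..., change_T) into the grasp at every time step."""
    sequence = [dict(grasp)]
    for change in changes:
        sequence.append(apply_change(sequence[-1], change))
    return sequence
```

Initializing `changes` to a list of `None` entries corresponds to the sequence of null changes described above.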
- the grasp selector module 126 includes a contact point sampler 406 and contact point optimizer 408 shown in FIG. 4 .
- the contact point sampler 406 may use a current nominal grasp sequence.
- the contact point sampler 406 re-samples for each link and a new contact point c.
- the method 300 includes the grasp selector module 126 selecting a first candidate object trajectory 412 from the first set of candidate object trajectories o t:T .
- the first candidate object trajectory 412 is selected based on the contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories c t+1:T .
- the contact points c t+1:T for grasps associated with each candidate object trajectory of the first set of candidate object trajectories o t:T are sampled at block 306 by the contact point sampler 406 .
- the contact points c t+1:T for grasps associated with the candidate object trajectories are optimized using a learned neural network cost model, such as a cost function J(·), by the contact point optimizer 408 such that the first set of candidate object trajectories are optimized samples 410.
- the learned neural network cost model is a generative model based on feasible outcomes.
- the feasible outcomes may be based on offline data such as wrench cost data, kinematics cost data, superposition cost data, as will be discussed in greater detail below.
- the learned neural network cost model is based on the form:
- the grasp selection may be incorporated using the contact point sampler 406 to sample and the contact point optimizer 408 to perform gradient-based optimization.
- This grasp selection optimizes a general grasp quality cost L grasp as well as a task-specific cost L task to generate the first set of candidate object trajectories as optimized samples 410 such that the first candidate object trajectory 412 is selected from the optimized samples 410.
- a gradient descent may be used to minimize this loss function for each sample using ∇J(·).
- the contact point optimizer 408 optimizes the contact points based on the constraints. For example, since moving an end effector involves first removing that end effector, only contact points for an end effector that has just been added to contact are optimized.
- J(·) includes a cost, which encourages the contact points to be on the surface of the object 208.
- the contact points may deviate from the surface of the object 208 .
- a projected gradient descent may be used, where the planner module 124 iterates between optimizing J(·) and minimizing
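- A generic projected gradient descent of this kind can be sketched as follows. The text does not spell out the projection step, so this sketch assumes it drives the signed distance of the contact point to zero by stepping along the SDF gradient. The unit-sphere SDF, the function names, and the step sizes are all hypothetical stand-ins for the learned object SDF and the cost gradient.

```python
import numpy as np

RADIUS = 1.0  # hypothetical object: a sphere centered at the origin

def sdf(x):
    """Signed distance to the sphere surface (negative inside)."""
    return np.linalg.norm(x) - RADIUS

def sdf_grad(x):
    """Gradient of the sphere SDF with respect to x."""
    return x / np.linalg.norm(x)

def project_to_surface(x, iters=5):
    """Move x along the SDF gradient until the signed distance is near zero."""
    for _ in range(iters):
        x = x - sdf(x) * sdf_grad(x)
    return x

def projected_gradient_descent(x0, cost_grad, step=0.1, iters=100):
    """Alternate a gradient step on the cost with a projection onto the surface."""
    x = project_to_surface(np.asarray(x0, dtype=float))
    for _ in range(iters):
        x = x - step * cost_grad(x)   # descend the cost
        x = project_to_surface(x)     # pull the contact point back onto the object
    return x
```

For example, with a cost that pulls the contact point toward a target located off the surface, the iterate settles at the surface point closest to that target.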
- the learned neural network cost model may be represented as an energy based model such that the energy based model is a generative model that learns an unnormalized density, p_θ(x) ∝ exp(−E_θ(x)).
- E_θ is a neural network with parameters θ.
- the energy based model is trained via contrastive divergence to minimize the objective given by the energy function:
- the first expectation is with respect to the data (e.g., positive examples), and the second expectation is with respect to samples generated from the model (e.g., negative examples). These may be generated via Langevin dynamics with the energy function. Thus, the energy of positive examples is minimized while the energy of negative examples is maximized. In this manner, the grasp selector module 126 selects a first candidate object trajectory 412 from the first set of candidate object trajectories o t:T based on the learned neural network cost model.
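- The two expectations and the Langevin sampling can be illustrated with a toy model. This is a sketch under stated assumptions: the energy is a simple quadratic with a single scalar parameter `theta` rather than a neural network, the update is the standard unadjusted Langevin step, and the loss is the usual contrastive divergence form (mean energy of positives minus mean energy of negatives). None of these names come from the disclosure.

```python
import numpy as np

def energy(x, theta):
    """Toy energy E_theta(x) = 0.5 * theta * ||x||^2 (stand-in for a neural network)."""
    return 0.5 * theta * np.sum(x ** 2, axis=-1)

def energy_grad_x(x, theta):
    """Gradient of the toy energy with respect to x."""
    return theta * x

def langevin_samples(theta, num_samples, dim, steps=200, eps=0.05, seed=0):
    """Draw negative examples with unadjusted Langevin dynamics on the energy."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(num_samples, dim))
    for _ in range(steps):
        noise = rng.normal(size=x.shape)
        x = x - 0.5 * eps * energy_grad_x(x, theta) + np.sqrt(eps) * noise
    return x

def contrastive_divergence_loss(theta, positives, negatives):
    """Data expectation of the energy minus model-sample expectation of the energy."""
    return energy(positives, theta).mean() - energy(negatives, theta).mean()
```

Minimizing this loss with respect to the model parameters lowers the energy of the positive examples and raises the energy of the sampled negatives.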
- the learned neural network cost model may be based on offline data.
- Offline data may be associated with the agent 200 , such as a specific agent or type of agent.
- the offline data is data that includes a number of feasible outcomes for an agent.
- the feasible outcomes are possible and/or allowable physical orientation, such as position, grasp, velocity, acceleration, path plan, etc.
- the feasible outcomes may be based on cost functions, such as a wrench cost function, superposition cost, and/or inverse kinematics cost, among others.
- f k and A k are the contact force and the linearized friction cone for the kth finger. Both p k and A k are functions of the contact points of the grasp, and the contact normal which is a function of the geometry of the object 208 and the grasp contact point.
- the desired wrench is set to −w ext and the wrench error is ‖w ext + w*‖². All forces are vectors and may be in the frame of reference of the object 208.
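- The wrench error can be illustrated with a simplified force optimization. This sketch is an assumption-laden stand-in: it solves an unconstrained least-squares problem for the contact forces and ignores the linearized friction cones A k that the described optimization includes. The grasp-matrix construction (force stacked over torque p k × f k) is standard; the function names are hypothetical.

```python
import numpy as np

def skew(p):
    """Matrix such that skew(p) @ f equals the cross product p x f."""
    return np.array([[0.0, -p[2], p[1]],
                     [p[2], 0.0, -p[0]],
                     [-p[1], p[0], 0.0]])

def grasp_matrix(points):
    """Map stacked contact forces (3K,) to the net object-frame wrench (6,)."""
    return np.hstack([np.vstack([np.eye(3), skew(p)]) for p in points])

def wrench_error(points, w_ext):
    """Least-squares forces for the desired wrench -w_ext and the squared error.

    Friction cone constraints are deliberately omitted in this sketch.
    """
    G = grasp_matrix(points)
    forces, *_ = np.linalg.lstsq(G, -w_ext, rcond=None)
    w_star = G @ forces
    return float(np.sum((w_ext + w_star) ** 2)), forces.reshape(-1, 3)
```

With two contacts on the x-axis, for instance, no combination of contact forces can produce a torque about that axis, so an external torque about x leaves a nonzero wrench error; adding a third, non-collinear contact removes it.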
- the learned neural network cost model estimates the cost based on the feasible outcomes.
- the feasible outcomes include a large dataset of grasps c k , contact normals n k , and desired wrenches w d in the object frame of the object 208 .
- the learned neural network cost model is parameterized as a neural network with four layers and ReLU activations.
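- A four-layer network with ReLU activations might look like the following forward pass. Only the layer count and the activation come from the text; the hidden width, the initialization, the linear final layer, and the input layout (contact points, contact normals, and desired wrench concatenated into one vector) are assumptions for illustration.

```python
import numpy as np

def init_mlp(in_dim, hidden=64, seed=0):
    """Create weights for a four-layer fully connected network with scalar output."""
    rng = np.random.default_rng(seed)
    sizes = [in_dim, hidden, hidden, hidden, 1]
    return [(rng.normal(scale=np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_energy(params, x):
    """Forward pass: ReLU after every layer except the last, one energy per row."""
    h = np.atleast_2d(np.asarray(x, dtype=float))
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU
    return h[:, 0]
```

For a five-finger grasp, an input of size 36 would hold five contact points, five contact normals, and a six-dimensional desired wrench under this assumed layout.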
- the learned neural network cost model is trained based on the energy function. In addition to this cost, the energy of the negative examples is maximized. In this way, the resulting learned neural network cost model has low energy for low-error force optimizations, and high energy for both high-error force optimizations and data that is outside the training distribution.
- the external wrench is first transformed to the object frame with the object pose, the contact normals at the grasp are computed using the object SDF, and the learned neural network cost model is then evaluated.
- the learned neural network cost model may be trained based on the superposition cost between the finger and the object 208 when moving between contact points.
- Waypoints may be generated from the current finger position to the contact point, and the signed distance is checked at each waypoint using the SDF. Since the waypoint locations are a linear function of c k and c k+1 , the derivative of the superposition cost is calculated with respect to the contact point using the gradient of the SDF with respect to x.
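- The waypoint check and its derivative can be sketched as follows. The unit-sphere SDF, the squared-penetration penalty, and the function names are assumptions; the description only states that signed distances are checked at waypoints and that the derivative follows from the SDF gradient because the waypoints are linear in c k and c k+1.

```python
import numpy as np

RADIUS = 1.0  # hypothetical object: a sphere centered at the origin

def sdf(x):
    """Signed distance of each row of x to the sphere surface."""
    return np.linalg.norm(x, axis=-1) - RADIUS

def sdf_grad(x):
    """Gradient of the sphere SDF at each row of x."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def superposition_cost(c_k, c_next, num_waypoints=20):
    """Penalize waypoints inside the object; return cost and gradient w.r.t. c_next."""
    s = np.linspace(0.0, 1.0, num_waypoints)[:, None]
    waypoints = (1.0 - s) * c_k + s * c_next       # linear in c_k and c_next
    depth = np.maximum(0.0, -sdf(waypoints))       # penetration depth per waypoint
    cost = np.sum(depth ** 2)
    # Chain rule: d(depth^2)/d(waypoint) = -2 * depth * grad(sdf), d(waypoint)/d(c_next) = s.
    dcost_dwaypoint = -2.0 * depth[:, None] * sdf_grad(waypoints)
    grad_next = np.sum(s * dcost_dwaypoint, axis=0)
    return cost, grad_next
```

A straight finger path that cuts through the object yields a positive cost, while a path that stays outside yields zero cost and zero gradient.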
- the learned neural network cost model may be trained based on the inverse kinematics cost. Like wrench cost, the inverse kinematics is prohibitively slow to be performed online. Instead, a set of feasible outcomes is received offline, including a large dataset of joint configurations for the fingers. Forward kinematics may be performed for all of these joint configurations to generate finger locations. A check is performed at each of these joint configurations to determine which joint configurations are feasible outcomes.
- the learned neural network cost model is trained in a similar manner as described above with respect to wrench cost, for example, using contrastive divergence to minimize the energy of positive examples, while maximizing the energy of negative examples and artificially generated negative examples from the model.
- the method 300 includes the path planning module 130 causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- the path planning module 130 may receive the selected candidate object trajectory.
- the path planning module 130 may then cause the agent 200 to move the object 208 toward the goal pose o G and/or grasp the object 208 in order to apply a task wrench, w G .
- the path planning module 130 may cause the links of the agent 200 to exert a mechanical force on a target using the object 208 .
- a policy may be generated based on the selected candidate object trajectory using the contact point(s) to pretrain the agent 200 or another agent on the operation of the object 208 with respect to a target.
- Referring now to FIG. 5, a method 500 for online augmentation for learned grasping will now be described according to an exemplary embodiment.
- FIG. 5 will be described with reference to FIGS. 1 - 4 , and 6 .
- the method 500 will be described as a sequence of blocks, but it is understood that the blocks of the method 500 may be organized into different architectures, elements, stages, and/or processes.
- the method 500 includes blocks of the method 300 , which operate in a similar manner as described above.
- the method 500 includes the planner module 124 receiving, at a first time step, the first grasp that includes contact points, c t , and the initial object pose o t of the agent 200 .
- the method 500 includes the planner module 124 generating the first grasp that includes contact points, c t , and the initial object pose o t .
- the candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step.
- the method 500 includes the grasp selector module 126 calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories.
- the method 500 includes the planner module 124 selecting a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent 200 , as described above with respect to the method 300 .
- the method 500 includes the controller 128 causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- the systems and methods herein describe online iterative re-planning and may include re-planning at the future time steps.
- the method 500 determines if the agent 200 has reached the goal.
- the determination may be based on the state information including the object state information and/or the agent state information.
- the determination may be based on the image data 110 , the depth data 112 , and the tactile data 114 .
- the determination may be based on the SDF being a function of x mapping ℝ³ → ℝ.
- the method 500 ends. However, if it is determined that the agent 200 has not reached the goal, then the method 500 proceeds to block 504 .
- the method 500 includes the planner module 124 receiving, at a next time step, a next grasp and a next object pose based on the selected candidate object trajectory of the agent 200 .
- the planner module 124 receives, at the second time step, a second grasp and a next object pose based on the first candidate object trajectory of the agent.
- the second grasp and a next object pose may additionally be based on SDF, the state information including the object state information and/or the agent state information, the image data 110 , the depth data 112 , and the tactile data 114 , as described above with respect to block 302 of the method 300 .
- the method 500 includes the planner module 124 generating a second set of candidate object trajectories based on the next grasp, in this example the second grasp, and the next object pose. Therefore, in this example, the candidate object trajectories of the second set of candidate object trajectories provide a number of object poses from the next object pose to the goal for a number of future time steps after the second time step.
- the second set of candidate object trajectories is also generated such that the task wrench, w G , is inside the wrench cone of the grasp c T .
- the method 500 includes the grasp selector module 126 calculating contact points for grasps associated with each candidate object trajectory of the second set of candidate object trajectories.
- the contact point sampler 406 may use a current nominal grasp sequence. In one embodiment, for each timestep with some probability, the contact point sampler 406 re-samples for each link and a new contact point c.
- the method 500 includes the planner module 124 selecting a second candidate object trajectory from the second set of candidate object trajectories based on the learned neural network cost model.
- the contact points c t+1:T for grasps associated with the candidate object trajectories are optimized using a cost function J(·) by the contact point optimizer 408 such that the second set of candidate object trajectories are optimized samples 410.
- the method 500 includes the controller 128 causing the agent to execute the second candidate object trajectory at a second time step.
- the method then returns to block 502 to determine if the agent 200 has reached the goal. If it is determined that the agent 200 has reached the goal, the method 500 ends. However, if it is determined that the agent 200 has not reached the goal, then the method 500 proceeds to block 504 for another iteration. Continuing the example from above, such an iteration would be the third iteration for a third time step. Therefore, the systems and methods described herein are iterative. Accordingly, the systems and methods described herein plan a sequence of object poses and grasps for the agent, such as dexterous in-hand manipulation. While existing methods do not use feedback from the actual achieved grasps, here feedback is incorporated to achieve improved robustness and failure recovery while requiring less processing power. The proposed planning method is extendable to the case where there is parametric uncertainty in object parameters for improved robustness.
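- The iterative structure of the method 500 can be sketched as a receding-horizon loop. Everything here is a simplified assumption: poses are plain vectors, the path-length cost stands in for the learned neural network cost model, grasp selection is omitted, and `execute_step` adds noise to imitate the gap between the planned and the actually achieved pose that is fed back into the next planning iteration.

```python
import numpy as np

def plan(pose, goal, rng, num_samples=16, horizon=5, sigma=0.01):
    """Sample candidate trajectories to the goal and pick the lowest-cost one."""
    s = np.linspace(0.0, 1.0, horizon + 1)[:, None]
    nominal = pose + s * (goal - pose)
    envelope = np.sin(np.pi * s)  # near zero at the ends, largest mid-trajectory
    noise = rng.normal(size=(num_samples, horizon + 1, pose.size))
    candidates = nominal[None] + sigma * envelope[None] * noise
    # Stand-in cost (assumption): total squared step length of each candidate.
    steps = np.diff(candidates, axis=1)
    costs = np.sum(np.linalg.norm(steps, axis=-1) ** 2, axis=1)
    return candidates[np.argmin(costs)]

def execute_step(target, rng, noise=0.005):
    """Imitate lower-level control: the achieved pose deviates from the planned one."""
    return target + noise * rng.normal(size=target.shape)

def replan_until_goal(start, goal, tol=0.05, max_iters=100, seed=0):
    """Plan, execute one step, observe the achieved pose, and re-plan until the goal."""
    rng = np.random.default_rng(seed)
    pose = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for iteration in range(max_iters):
        if np.linalg.norm(pose - goal) < tol:     # goal check
            return pose, iteration
        trajectory = plan(pose, goal, rng)        # re-plan from the achieved pose
        pose = execute_step(trajectory[1], rng)   # execute only the next step
    return pose, max_iters
```

Because each iteration starts from the achieved pose rather than the planned one, execution errors are corrected at the next planning step instead of accumulating.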
- Still another aspect involves a computer-readable medium including processor-executable instructions configured to implement one aspect of the techniques presented herein.
- An aspect of a computer-readable medium or a computer-readable device devised in these ways is illustrated in FIG. 6 , wherein an implementation 600 includes a computer-readable medium 608 , such as a CD-R, DVD-R, flash drive, a platter of a hard disk drive, etc., on which is encoded computer-readable data 606 .
- This encoded computer-readable data 606 such as binary data including a plurality of zero's and one's as shown in 606 , in turn includes a set of processor-executable computer instructions 604 configured to operate according to one or more of the principles set forth herein.
- the processor-executable computer instructions 604 may be configured to perform a method 602 , such as the method 300 of FIG. 3 or the method 500 of FIG. 5 .
- the processor-executable computer instructions 604 may be configured to implement a system, such as the operating environment 100 of FIG. 1 .
- Many such computer-readable media may be devised by those of ordinary skill in the art that are configured to operate in accordance with the techniques presented herein.
- a component may be, but is not limited to being, a process running on a processor, a processing unit, an object, an executable, a thread of execution, a program, or a computer.
- an application running on a controller and the controller may be a component.
- One or more components residing within a process or thread of execution and a component may be localized on one computer or distributed between two or more computers.
- the claimed subject matter is implemented as a method, apparatus, or article of manufacture using standard programming or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter.
- article of manufacture as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media.
- “first”, “second”, or the like are not intended to imply a temporal aspect, a spatial aspect, an ordering, etc. Rather, such terms are merely used as identifiers, names, etc. for features, elements, items, etc.
- a first channel and a second channel generally correspond to channel A and channel B or two different or two identical channels or the same channel.
- “comprising”, “comprises”, “including”, “includes”, or the like generally means comprising or including, but not limited to.
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Orthopedic Medicine & Surgery (AREA)
- Manipulator (AREA)
Abstract
Description
- This application claims the benefit of U.S. Provisional Patent Application, Ser. No. 63/401,488 (Attorney Docket No. H1223023US01) entitled “COMBINING SAMPLE-BASED AND GRADIENT-BASED TRAJECTORY OPTIMIZATION FOR DEXTEROUS MANIPULATION WITH LEARNED COST”, filed on Aug. 26, 2022; the entirety of the above-noted application(s) is incorporated by reference herein. The present application is related to co-pending U.S. patent application Ser. No. 18/080,623 (Attorney Docket No. HRA-53318//H1223023US02) filed on Dec. 13, 2022 entitled “GRASP SELECTION”. The co-pending application is incorporated herein by reference, but is not admitted to be prior art with respect to the present application.
- Dexterous in-hand manipulation is a challenging robotics task which involves re-orientating an object from an initial pose to a goal pose while grasping the object. Re-orientating an object in-hand may be necessary when some task requires a certain object pose or grasp that is different from the initial grasped pose.
- In one embodiment, a system for online iterative re-planning is provided. The system includes a processor and a memory storing instructions. When executed by the processor, the instructions cause the processor to receive, at a first time step, a first grasp and an initial object pose of an agent. When executed by the processor, the instructions also cause the processor to generate a first set of candidate object trajectories based on the first grasp and the initial object pose. Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step. When executed by the processor, the instructions further cause the processor to calculate contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories. When executed by the processor, the instructions yet further cause the processor to select a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent. When executed by the processor, the instructions further cause the processor to cause the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- According to another aspect, a method for online iterative re-planning is provided. The method includes receiving, at a first time step, a first grasp and an initial object pose of an agent. The method also includes generating a first set of candidate object trajectories based on the first grasp and the initial object pose. Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step. The method further includes calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories. The method yet further includes selecting a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent. The method includes causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- According to yet another aspect, a non-transitory computer readable storage medium storing instructions that, when executed by a computer having a processor, cause the computer to perform a method for online iterative re-planning is provided. The method includes receiving, at a first time step, a first grasp and an initial object pose of an agent. The method also includes generating a first set of candidate object trajectories based on the first grasp and the initial object pose. Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step. The method further includes calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories. The method yet further includes selecting a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent. The method includes causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps.
- FIG. 1 is an exemplary component diagram of a system for online iterative re-planning, according to one aspect.
- FIG. 2 is an exemplary agent environment of a system for online iterative re-planning, according to one aspect.
- FIG. 3 is an exemplary process flow of a method for online iterative re-planning, according to one aspect.
- FIG. 4 is an illustration of an exemplary implementation of a system for online iterative re-planning, according to one aspect.
- FIG. 5 is an exemplary process flow of a method for online iterative re-planning, according to one aspect.
- FIG. 6 is an illustration of an example computer-readable medium or computer-readable device including processor-executable instructions configured to embody one or more of the provisions set forth herein, according to one aspect.
- Generally described, a new online iterative re-planning method is provided. This method may use optimization of learned cost models and a signed-distance function (SDF) of the object to select contact points on the object. Previous methods may use a finite number of manually specified contact points which may lie arbitrarily on the object surface. In addition, and as disclosed herein, an online iterative re-planning method is provided which plans a sequence of object poses and grasps for dexterous in-hand manipulation. The existing methods typically did not use feedback from the actual achieved grasps. By incorporating feedback, the method may achieve improved robustness and failure recovery. The proposed planning method may be extendable to the case where there is parametric uncertainty in object parameters for improved robustness.
- Controlling a multi-fingered hand to manipulate a tool is provided. The goal may be to move the tool from a start pose to a goal pose. Once at the goal pose, the tool may be grasped such that the multi-fingered hand may apply a task wrench. The higher-level planner may select a sequence of object poses and grasps. Grasps may be specified by contact locations on the surface of the object for each finger that is in contact. This means that the grasp may be parameterized by both continuous and discrete variables (the contact locations and the fingers that are in contact). In addition, the sequence of object poses and grasps should be feasible, i.e. it must be able to move from one object pose to the next and from one grasp to the next while keeping hold of the object. Once such a trajectory has been chosen, existing algorithms may track this trajectory using lower-level controllers.
- A method for online re-planning of a trajectory is provided for in-hand manipulation of tools. At each timestep, the planner may receive the current grasp and object pose and generate a plan for the remainder of the trajectory. After generating this trajectory, lower-level controllers may be used to transition towards the next step in the plan. After this, the actual achieved pose and grasp are used as input to the planner for the next iteration. Trajectory optimization, which requires a trajectory cost, may be used to plan. In particular, the planner may incorporate a novel grasp selection using sampling and gradient-based optimization. This grasp selection may optimize a general grasp quality cost as well as a task-specific cost. This grasp selection may be used for tasks other than in-hand manipulation by choosing an appropriate task-specific cost. Due to the hybrid discrete and continuous trajectory, optimization is not easily performed via gradient-based methods only. Therefore, a combination of gradient-based methods and sample-based methods may be used to perform the grasp selection. Existing cost functions used for planning are typically not easily parallelizable or differentiable. Learned neural network cost models may therefore be used to approximate these existing cost functions.
- The following includes definitions of selected terms employed herein. The definitions include various examples and/or forms of components that fall within the scope of a term and that may be used for implementation. The examples are not intended to be limiting. Furthermore, the components discussed herein, may be combined, omitted, or organized with other components or into different architectures.
- “Agent,” as used herein, is a self-propelled machine that moves through or manipulates an environment. Exemplary agents may include, but are not limited to, robots, vehicles, or other self-propelled machines. The agent may be autonomously, semi-autonomously, or manually operated.
- “Agent system,” as used herein may include, but is not limited to, any automatic or manual systems that may be used to enhance the agent. Exemplary systems include, but are not limited to: an electronic stability control system, an anti-lock brake system, a brake assist system, an automatic brake prefill system, a low speed follow system, a cruise control system, a collision warning system, a collision mitigation braking system, a velocity control system, a lane departure warning system, a blind spot indicator system, a lane keep assist system, a navigation system, a steering system, a transmission system, brake pedal systems, an electronic power steering system, visual devices (e.g., camera systems, proximity sensor systems), an electronic pretensioning system, a monitoring system, a passenger detection system, a suspension system, and a sensory system among others.
- “Bus,” as used herein, refers to an interconnected architecture that is operably connected to other computer components inside a computer or between computers. The bus may transfer data between the computer components. The bus may be a memory bus, a memory processor, a peripheral bus, an external bus, a crossbar switch, and/or a local bus, among others. The bus may also be a bus that interconnects components inside an agent using protocols such as Media Oriented Systems Transport (MOST), Controller Area Network (CAN), and Local Interconnect Network (LIN), among others.
- “Component,” as used herein, refers to a computer-related entity (e.g., hardware, firmware, instructions in execution, combinations thereof). Computer components may include, for example, a process running on a processor, a processor, an object, an executable, a thread of execution, and a computer. A computer component(s) may reside within a process and/or thread. A computer component may be localized on one computer and/or may be distributed between multiple computers.
- “Computer communication,” as used herein, refers to a communication between two or more communicating devices (e.g., computer, personal digital assistant, cellular telephone, network device, vehicle, computing device, infrastructure device, roadside equipment) and may be, for example, a network transfer, a data transfer, a file transfer, an applet transfer, an email, a hypertext transfer protocol (HTTP) transfer, and so on. A computer communication may occur across any type of wired or wireless system and/or network having any type of configuration, for example, a local area network (LAN), a personal area network (PAN), a wireless personal area network (WPAN), a wireless network, a wide area network (WAN), a metropolitan area network (MAN), a virtual private network (VPN), a cellular network, a token ring network, a point-to-point network, an ad hoc network, a mobile ad hoc network, a vehicular ad hoc network (VANET), a vehicle-to-vehicle (V2V) network, a vehicle-to-everything (V2X) network, or a vehicle-to-infrastructure (V2I) network, among others. Computer communication may utilize any type of wired, wireless, or network communication protocol including, but not limited to, Ethernet (e.g., IEEE 802.3), WiFi (e.g., IEEE 802.11), communications access for land mobiles (CALM), WiMax, Bluetooth, Zigbee, ultra-wideband (UWB), multiple-input and multiple-output (MIMO), telecommunications and/or cellular network communication (e.g., SMS, MMS, 3G, 4G, LTE, 5G, GSM, CDMA, WAVE), satellite, and dedicated short range communication (DSRC), among others.
- “Communication interface” as used herein may include input and/or output devices for receiving input and/or devices for outputting data. The input and/or output may be for controlling different agent features, which include various agent components, systems, and subsystems. Specifically, the term “input device” includes, but is not limited to: keyboard, microphones, pointing and selection devices, cameras, imaging devices, video cards, displays, push buttons, rotary knobs, and the like.
- “Computer-readable medium,” as used herein, refers to a non-transitory medium that stores instructions and/or data. A computer-readable medium may take forms, including, but not limited to, non-volatile media, and volatile media. Non-volatile media may include, for example, optical disks, magnetic disks, and so on. Volatile media may include, for example, semiconductor memories, dynamic memory, and so on. Common forms of a computer-readable medium may include, but are not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, an ASIC, a CD, other optical medium, a RAM, a ROM, a memory chip or card, a memory stick, and other media from which a computer, a processor or other electronic device may read.
- “Database,” as used herein, is used to refer to a table. In other examples, “database” may be used to refer to a set of tables. In still other examples, “database” may refer to a set of data stores and methods for accessing and/or manipulating those data stores. In one embodiment, a database may be stored, for example, at a disk, data store, and/or a memory. A database may be stored locally or remotely and accessed via a network.
- “Data store,” as used herein may be, for example, a magnetic disk drive, a solid-state disk drive, a floppy disk drive, a tape drive, a Zip drive, a flash memory card, and/or a memory stick. Furthermore, the disk may be a CD-ROM (compact disk ROM), a CD recordable drive (CD-R drive), a CD rewritable drive (CD-RW drive), and/or a digital video ROM drive (DVD ROM). The disk may store an operating system that controls or allocates resources of a computing device.
- “Logic circuitry,” as used herein, includes, but is not limited to, hardware, firmware, a non-transitory computer readable medium that stores instructions, instructions in execution on a machine, and/or to cause (e.g., execute) an action(s) from another logic circuitry, module, method and/or system. Logic circuitry may include and/or be a part of a processor controlled by an algorithm, a discrete logic (e.g., ASIC), an analog circuit, a digital circuit, a programmed logic device, a memory device containing instructions, and so on. Logic may include one or more gates, combinations of gates, or other circuit components. Where multiple logics are described, it may be possible to incorporate the multiple logics into one physical logic. Similarly, where a single logic is described, it may be possible to distribute that single logic between multiple physical logics.
- “Memory,” as used herein may include volatile memory and/or nonvolatile memory. Non-volatile memory may include, for example, ROM (read only memory), PROM (programmable read only memory), EPROM (erasable PROM), and EEPROM (electrically erasable PROM). Volatile memory may include, for example, RAM (random access memory), synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), and direct RAM bus RAM (DRRAM). The memory may store an operating system that controls or allocates resources of a computing device.
- “Module,” as used herein, includes, but is not limited to, non-transitory computer readable medium that stores instructions, instructions in execution on a machine, hardware, firmware, software in execution on a machine or computer readable media, and/or combinations of each to perform a function(s) or an action(s), and/or to cause a function or action from another module, method, and/or system. A module may also include logic, a software-controlled microprocessor, a discrete logic circuit, an analog circuit, a digital circuit, a programmed logic device, a memory device containing executing instructions, logic gates, a combination of gates, and/or other circuit components. Multiple modules may be combined into one module and single modules may be distributed among multiple modules.
- “Operable connection,” or a connection by which entities are “operably connected,” is one in which signals, physical communications, and/or logical communications may be sent and/or received. An operable connection may include a wireless interface, firmware interface, a physical interface, a data interface, and/or an electrical interface.
- “Portable device,” as used herein, is a computing device typically having a display screen with user input (e.g., touch, keyboard) and a processor for computing. Portable devices include, but are not limited to, handheld devices, mobile devices, smart phones, laptops, tablets, e-readers, smart speakers. In some embodiments, a “portable device” could refer to a remote device that includes a processor for computing and/or a communication interface for receiving and transmitting data remotely.
- “Processor,” as used herein, processes signals and performs general computing and arithmetic functions. Signals processed by the processor may include digital signals, data signals, computer instructions, processor instructions, messages, a bit, or a bit stream that may be received, transmitted, and/or detected. Generally, the processor may be any of a variety of processors, including multiple single and multicore processors and co-processors and other multiple single and multicore processor and co-processor architectures. The processor may include logic circuitry to execute actions and/or algorithms.
- Referring now to the drawings, the drawings are for purposes of illustrating one or more exemplary embodiments and not for purposes of limiting the same.
- FIG. 1 is an exemplary component diagram of an operating environment 100 for online iterative re-planning, according to one aspect. The operating environment 100 includes a sensor module 102, a computing device 104, and operational systems 106 interconnected by a bus 108. The components of the operating environment 100, as well as the components of other systems, hardware architectures, and software architectures discussed herein, may be combined, omitted, or organized into different architectures for various embodiments. The computing device 104 may be implemented with a device or remotely stored.
- The
computing device 104 may be implemented as a part of an agent. The agent may be a bipedal, two-wheeled, or four-wheeled robot, a vehicle, or another self-propelled machine. The autonomous agent may be configured as a humanoid robot. The humanoid robot may take the form of all or a portion of a robot. For example, the humanoid robot may take the form of an arm with fingers. The computing device 104 may be implemented as part of a telematics unit, a head unit, a navigation unit, an infotainment unit, or an electronic control unit, among others, of an agent. In other embodiments, the components and functions of the computing device 104 may be implemented, for example, with other devices (e.g., a portable device) or another device connected via a network (e.g., a network 134). The computing device 104 may be capable of providing wired or wireless computer communications utilizing various protocols to send/receive electronic signals internally to/from components of the operating environment 100. Additionally, the computing device 104 may be operably connected for internal computer communication via the bus 108 (e.g., a Controller Area Network (CAN) or a Local Interconnect Network (LIN) protocol bus) to facilitate data input and output between the computing device 104 and the components of the operating environment 100.
- In some embodiments, the agent may be the
agent 200 shown in FIG. 2. The agent 200 may include a number of sensors. For example, the agent 200 may include a first optical sensor 202, a second optical sensor 204, and a force sensor 206. The first optical sensor 202, the second optical sensor 204, and the force sensor 206 receive data from an environment of an object 208. The sensor module 102 receives, provides, and/or senses information associated with the agent 200, an object 208, the operating environment 100, an environment of the agent 200, and/or the operational systems 106. In one embodiment, the sensor module 102 may receive image data 110, depth data 112, and tactile data 114 from the sensors. For example, the sensor module 102 may receive the image data 110 from the first optical sensor 202, the depth data 112 from the second optical sensor 204, and the tactile data 114 from the force sensor 206. The computing device 104 receives the image data 110, the depth data 112, and the tactile data 114 from the sensor module 102. Therefore, the image data 110, the depth data 112, and the tactile data 114 are raw sensor data received from their respective sensors.
- Likewise, the
image data 110, depth data 112, and tactile data 114 may include information about the sensors. For example, suppose the force sensor 206 is able to move. The image data 110, depth data 112, and tactile data 114 may include information about the force sensor 206, such as the position of the force sensor 206 relative to a reference point as measured by a sensor. The reference point may be the first optical sensor 202 or the second optical sensor 204. For example, the depth data 112 may include distance measurements from the second optical sensor 204 to the force sensor 206. Likewise, the tactile data 114 may include dimensions (e.g., width, height, length, etc.) of the force sensor 206. The force sensor 206 may measure various forces exerted by (e.g., torque) or experienced by (e.g., resistance) the agent 200.
- The sensors 202-206 and/or the
sensor module 102 are operable to sense a measurement of data associated with the agent 200, the operating environment 100, the object 208, the environment, and/or the operational systems 106 and generate a data signal indicating said measurement of data. These data signals may be converted into other data formats (e.g., numerical) and/or used by the sensor module 102, the computing device 104, and/or the operational systems 106 to generate other data metrics and parameters. It is understood that the sensors may be any type of sensor, for example, acoustic, electric, environmental, optical, imaging, light, pressure, force, thermal, temperature, proximity, gyroscope, and accelerometer sensors, among others.
- The
computing device 104 includes a processor 116, a memory 118, a data store 120, and a communication interface 122, which are each operably connected for computer communication via a bus 108 and/or other wired and wireless technologies. The communication interface 122 provides software and hardware to facilitate data input and output between the components of the computing device 104 and other components, networks, and data sources, which will be described herein. Additionally, the computing device 104 includes a planner module 124, a grasp selector module 126, and a controller 128 for online iterative re-planning facilitated by the components of the operating environment 100.
- The
planner module 124, the grasp selector module 126, and the controller 128 may be artificial neural networks that act as a framework for machine learning, including deep learning. For example, the planner module 124, the grasp selector module 126, and the controller 128 may be a convolutional neural network (CNN). In another embodiment, the planner module 124, the grasp selector module 126, and the controller 128 may further include or implement a concatenator, a deep neural network (DNN), a recurrent neural network (RNN), a 3D Convolutional Neural Network (3DCNN), and/or a Convolutional Long Short-Term Memory (ConvLSTM).
- The
computing device 104 is also operably connected for computer communication (e.g., via the bus 108 and/or the communication interface 122) to one or more operational systems 106. The operational systems 106 may include, but are not limited to, any automatic or manual systems that may be used to enhance the agent 200, operation, and/or propulsion. The operational systems 106 may be dependent on the implementation. For example, the operational systems 106 may include a path planning module 130 and a trajectory module 132. The path planning module 130 monitors, analyzes, and/or operates the agent 200 to some degree. The path planning module 130 may define a path as a course or direction through space. The path may be defined by a number of waypoints. The trajectory module 132 applies time points to the path, for example, by assigning time stamps to the waypoints. As another example, the operational systems 106 may include manipulating the object 208 to affect a target. The operational systems 106 also include and/or are operably connected for computer communication to the sensor module 102. For example, one or more sensors of the sensor module 102 may be incorporated with the path planning module 130 to monitor characteristics of the environment or the agent 200, for example, based on the tactile data 114.
- The
sensor module 102, the computing device 104, and/or the operational systems 106 are also operatively connected for computer communication to the network 134. The network 134 is, for example, a data network, the Internet, a wide area network (WAN), or a local area network (LAN). The network 134 serves as a communication medium to various remote devices (e.g., databases, web servers, remote servers, application servers, intermediary servers, client machines, other portable devices). Detailed embodiments describing exemplary methods using the system and network configuration discussed above for online iterative re-planning will now be discussed.
- Referring now to
FIG. 3, a method 300 for online augmentation for learned grasping will now be described according to an exemplary embodiment. FIG. 3 will be described with reference to FIGS. 1, 2, 4, and 6. For simplicity, the method 300 will be described as a sequence of blocks, but it is understood that the blocks of the method 300 may be organized into different architectures, elements, stages, and/or processes.
- At
block 302, the method 300 includes the planner module 124 receiving, at a first time step, a first grasp and an initial object pose of an agent 200. The first grasp, which includes contact points c_t, and the initial object pose o_t are based on state information from the environment of the agent 200, as shown in FIG. 4. The first grasp and/or the initial object pose o_t may be selected based on state information, a known state, or a random selection.
- The initial object pose o_t may be determined based on a signed distance field (SDF), which is a function ϕ(x): R^3 → R that maps a point in space to the distance to the nearest surface of an
object 208. The level set ϕ(x) = 0 implicitly defines the surface of the object 208. The SDF is often computed offline and stored in a voxel grid, but a neural network may be used to learn the function.
- A DeepSDF model may be learned using data generated from a mesh of the
object 208. It is also possible to learn SDF models from a point cloud based on the image data 110, the depth data 112, and/or the tactile data 114. The initial object pose o_t may be based on any neural-network-based SDF model. Using a learned SDF makes it possible to find the contact normal at the surface as the gradient n = ∇_x ϕ(x) and to compute it via back-propagation through the learned model. The contact normal is used as an input to the wrench cost; thus, the gradient of the contact normal with respect to the contact point c_t is computed as the second derivative ∇_x n = ∇_x^2 ϕ(x).
- The
agent 200 is able to grasp the object 208 in the environment using one or more links. In an embodiment with an at least partially humanoid robot, the grasp may be performed by links. In this manner, links are portions of the agent 200 used in contact with the object 208 to achieve a grasp, such as the initial grasp. In one embodiment, the links may include a number of end effectors that operate to mimic the function of human fingers. For example, the agent 200 may include a first end effector 210, a second end effector 212, a third end effector 214, a fourth end effector 216, and a fifth end effector 218 that is opposable such that it may move toward and touch the other end effectors 210-216. When in contact with the object 208, the end effectors 210-218 are links of the agent 200 with the object 208. The links may also include other portions of an agent 200 such as a grasper, rod, or clamp, among others.
- A set of grasps may include any grasp that may be achieved by the
agent 200 based on the state of the agent 200 at the given time step. For example, the first grasp of the object 208 occurs at a first time step and includes a contact point c_t. A contact point includes contact pairs {p_O, p_R}, where p_O is an object contact point on the object 208 in the object's reference frame and p_R is an agent contact point on the agent 200 in the agent's reference frame. Each instance where a link of the agent 200 is or may be in contact with the object 208 is represented in a contact point c_t. For example, the locations where the first end effector 210, the second end effector 212, the third end effector 214, and so on of the agent 200 are in contact with the object 208 may be represented as contact points c_t.
- The contact points include contact pairs {p_O, p_R}, where p_O is an object contact point on the
object 208 in the object's reference frame and p_R is an agent contact point on the agent 200 in the agent's reference frame. The object contact point p_O and/or the agent contact point p_R may be based on the state information, such as object state information and/or agent state information.
- The state information, including the agent state information and the object state information, may be received or determined from the
sensor module 102 or the operational systems 106. For example, the image data 110 may correspond to the portion of the object 208 that is not occluded by the grasp of the agent 200 or the environment. As shown in FIG. 2, the agent 200 is holding an object 208, shown here as a wrench, with a force sensor 206 represented by the end effectors 210-218 of the agent 200. The image data 110 may be, for example, RGB data, YCB data, and/or YUV data. The image data 110 may be used to determine a joint configuration of the end effectors 210-218 of the agent 200.
- The state information may also be based on the
depth data 112 about the object 208. The depth data 112 may be received from the second optical sensor 204. The depth data 112 may include a video sequence or a series of images, user inputs, and/or data from the operational systems 106, such as data from a Controller Area Network (CAN) bus including pedal pressure, steer angle, etc. The second optical sensor 204 may include radar units, lidar units, image capture components, ranging sensors, cameras, scanners (e.g., 2-D scanners or 3-D scanners), or other measurement components. For example, the depth data 112 from the second optical sensor 204 may be augmented by other sources, such as the first optical sensor 202, and/or remote devices (e.g., via the bus 108 and/or the communication interface 122).
- The state information may also be based on the
tactile data 114 received from the force sensor 206. The force sensor 206 may include tensile force sensors, compression force sensors, tensile and compression force sensors, or other measurement components. The force sensor 206 may be used with a Weiss gripper with GelSight tactile sensors. In some embodiments, the tactile data 114 is augmented as additional sensor data from other sources is received. For example, the tactile data 114 from the force sensor 206 may be augmented by other sources, such as additional force sensors (e.g., from a second hand of the agent 200), and/or remote devices (e.g., via the bus 108 and/or the communication interface 122). The planner module 124 may determine the agent contact points and the object contact points based on the tactile data 114.
- The
planner module 124 may also use the operational systems 106, including but not limited to the agent systems, to determine the state information. For example, the trajectory module 132 may provide the planner module 124 with the state information from previous time steps. Based on the state information from the previous time steps, the planner module 124 may determine current state information, such as a current joint configuration of the end effectors 210-218 of the agent 200. Therefore, using the image data 110, the depth data 112, the tactile data 114, and/or the operational systems 106, the planner module 124 may determine the state information.
- At
block 304, the method 300 includes the planner module 124 generating a first set of candidate object trajectories based on the first grasp and the initial object pose. Candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step. For example, the goal may be to move the object 208 from the initial object pose o_t to a goal pose o_G. The first set of candidate object trajectories is generated such that o_G = o_T.
- A candidate object trajectory is defined as τ_t = (o_{t+1}, c_{t+1}, ..., o_T, c_T), where o_T is a final object pose at a final time step and c_T represents contact points of a final grasp at the final time step. The candidate object trajectory is defined to allow the
agent 200 to move from one time step to a next time step until the final time step is achieved. For example, the agent 200 may transition from the first grasp, which includes contact points c_t, and the initial object pose o_t at the first time step, represented as (o_t, c_t), to a second grasp, which includes contact points c_{t+1}, and a next object pose o_{t+1}, represented as (o_{t+1}, c_{t+1}). Once at o_G, the object 208 should be grasped such that the links may apply a task wrench w_G. Since a grasp is specified by contact points c_t on the surface of the object 208, the grasp is parameterized by both continuous and discrete variables. Thus, in addition to being generated such that o_G = o_T, the first set of candidate object trajectories is also generated such that the task wrench w_G is inside the wrench cone of the grasp c_T.
- In this manner, a first set of candidate trajectories o_{t:T} is generated from the initial object pose to the goal, sampled as τ_t^k = o_{t:T}^k, and reflects the current object pose and grasp 402. As the systems and methods described herein are directed to online iterative re-planning, the current object pose and grasp 402 may reflect a grasp that includes contact points c_t and an object pose o_t at any time step t, including the first time step, the second time step, and so on.
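- One way to sample such a set of candidate object trajectories is sketched below, assuming a straight-line nominal path to the goal perturbed by Gaussian noise that is pinned to zero at both end points and largest in the middle. The sketch treats an object pose as a plain vector (rotations are not handled), and the sine-shaped envelope, the `sigma` parameter, and the function name `sample_object_trajectories` are illustrative assumptions, not details taken from the disclosure.

```python
import numpy as np


def sample_object_trajectories(o_start, o_goal, num_steps, num_samples,
                               sigma=0.05, seed=0):
    """Sample perturbed straight-line paths; shape (num_samples, num_steps + 1, dim)."""
    rng = np.random.default_rng(seed)
    o_start = np.asarray(o_start, dtype=float)
    o_goal = np.asarray(o_goal, dtype=float)

    s = np.linspace(0.0, 1.0, num_steps + 1)              # path parameter in [0, 1]
    nominal = o_start + s[:, None] * (o_goal - o_start)   # straight-line path

    # Assumed envelope: zero at both ends, maximal in the middle of the path.
    envelope = np.sin(np.pi * s)
    envelope[0] = 0.0
    envelope[-1] = 0.0

    noise = rng.normal(size=(num_samples, num_steps + 1, o_start.size))
    return nominal[None] + sigma * envelope[None, :, None] * noise


trajs = sample_object_trajectories([0.0, 0.0, 0.0], [0.3, 0.1, 0.2],
                                   num_steps=10, num_samples=200)
```

Because the perturbation vanishes at the first and last steps, every sample starts at the current pose and ends at the goal, so no sample needs to be rejected for missing the goal.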
- In one embodiment the
planner module 124 includes an object trajectory sampler 404, shown in FIG. 4. The object trajectory sampler 404 receives the current object pose and grasp 402, here the first grasp that includes contact points c_t and the initial object pose o_t. The object trajectory sampler 404 may generate a set of candidate object trajectories, here the first set of candidate object trajectories, based on paths to the goal, such as a straight-line path to the goal. Gaussian perturbations may be sampled by the object trajectory sampler 404 such that the end points are constrained and variance is highest in the middle of the trajectory. By sampling in this manner, each candidate object trajectory of the first set of candidate object trajectories reaches the goal.
- At
block 306, the method 300 includes the grasp selector module 126 calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories. In one embodiment, because of the mix of discrete and continuous decision variables, the grasp selection may be calculated by randomly sampling the fingers in contact and then performing continuous optimization of the contact points. For example, a contact sequence may be constrained by assuming that between grasp c_t at a first time step and c_{t+1} at a next time step, an end effector may be removed from contact or added to contact at a new contact location on the object 208. Accordingly, the grasps may be constrained by requiring that multiple end effectors not be added or removed at the same time. The constraints also include end effectors not changing directly from one contact location to another without being removed first. The constraints allow the contact sequence to be processed by lower-level controllers.
- The contact sequence is defined as (c_t, Δc_{t+1}, ..., Δc_T), which is the current grasp at the first time step and a sequence of grasp changes over subsequent time steps. Each grasp change Δc_t = (i, c)_t is parameterized by an integer representing which end effector (if any) to change, and a contact point. If the end effector is in contact at the previous grasp, then this change removes that end effector from contact and the contact point is ignored. If the end effector is not in contact at the previous grasp, then the end effector is added to contact at contact location c. The grasp sequence is initialized to be a sequence of null changes.
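- The grasp-change parameterization above can be sketched as follows. The data layout is an assumption made for illustration: a grasp is a dictionary from end effector index to contact point, a grasp change is a pair (i, c), and a null change uses `None` for the index. The names `apply_change` and `rollout` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Grasp = Dict[int, Point]                              # end effector index -> contact point
GraspChange = Tuple[Optional[int], Optional[Point]]   # (i, c); i is None for a null change


def apply_change(grasp: Grasp, change: GraspChange) -> Grasp:
    """Apply one grasp change; at most one end effector changes per time step."""
    finger, point = change
    new_grasp = dict(grasp)
    if finger is None:
        return new_grasp                  # null change: grasp is kept as-is
    if finger in new_grasp:
        del new_grasp[finger]             # in contact: remove it, contact point ignored
    else:
        if point is None:
            raise ValueError("adding an end effector requires a contact point")
        new_grasp[finger] = point         # not in contact: add it at location c
    return new_grasp


def rollout(c_t: Grasp, changes: List[GraspChange]) -> List[Grasp]:
    """Compute c_{t+1:T} from the current grasp c_t and the changes Δc_{t+1:T}."""
    grasps = [dict(c_t)]
    for change in changes:
        grasps.append(apply_change(grasps[-1], change))
    return grasps[1:]


c_t = {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)}
changes = [(None, None), (1, None), (1, (0.5, 0.5, 0.0)), (2, (0.0, 1.0, 0.0))]
grasp_sequence = rollout(c_t, changes)
```

With this representation, moving an end effector to a new location necessarily takes two steps (a removal followed by an addition), which matches the constraint that an end effector cannot change contact locations directly.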
- In one embodiment, the
grasp selector module 126 includes a contact point sampler 406 and a contact point optimizer 408, shown in FIG. 4. The contact point sampler 406 may use a current nominal grasp sequence. In one embodiment, for each timestep, with some probability, the contact point sampler 406 re-samples a link and a new contact point c. The contact points may be sampled from a zero-mean Gaussian. In this manner, Δc_{t+1:T} is sampled, and then contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories are calculated as c_{t+1:T} from Δc_{t+1:T} and c_t. Accordingly, the grasp selection is performed, represented as c_{t:T}^k = GraspSelect(o_{t:T}^k).
- At
block 308, the method 300 includes the grasp selector module 126 selecting a first candidate object trajectory 412 from the first set of candidate object trajectories o_{t:T}. The first candidate object trajectory 412 is selected based on the contact points c_{t+1:T} for grasps associated with each candidate object trajectory of the first set of candidate object trajectories. For example, once the contact points c_{t+1:T} for grasps associated with each candidate object trajectory of the first set of candidate object trajectories o_{t:T} are sampled at block 306 by the contact point sampler 406, the contact points c_{t+1:T} for grasps associated with the candidate object trajectories are optimized by the contact point optimizer 408 using a learned neural network cost model, such as a cost function J(τ), such that the first set of candidate object trajectories become optimized samples 410.
- The learned neural network cost model is a generative model based on feasible outcomes. The feasible outcomes may be based on offline data such as wrench cost data, kinematics cost data, and superposition cost data, as will be discussed in greater detail below. The learned neural network cost model is based on the form:
-
J(τ_t) = L_task + L_grasp
- In particular, the grasp selection may be incorporated using the
contact point sampler 406 to sample and the contact point optimizer 408 to perform gradient-based optimization. This grasp selection optimizes a general grasp quality cost L_grasp as well as a task-specific cost L_task to turn the first set of candidate object trajectories into the optimized samples 410, such that the first candidate object trajectory 412 is selected from the optimized samples 410. In one embodiment, gradient descent may be used to minimize this loss function for each sample using ∇_τ J(τ). The contact point optimizer 408 optimizes the contact points based on the constraints. For example, since moving an end effector involves first removing that end effector, only contact points for an end effector that has just been added to contact are optimized. While J(τ) includes a cost that encourages the contact points to be on the surface of the object 208, the contact points may deviate from the surface of the object 208. To avoid this, projected gradient descent may be used, where the planner module 124 iterates between optimizing J(τ) and minimizing |ϕ(c)| to project the contact points onto the surface of the object 208. The trajectory that has the lowest cost for execution is selected from the first set of candidate object trajectories o_{t:T} as the first candidate object trajectory 412 according to τ_t^* = argmin_k J(o_{t:T}^k, c_{t:T}^k). Accordingly, an object trajectory is selected from a set of candidate object trajectories for the corresponding time step.
- In one embodiment, the learned neural network cost model may be represented as an energy-based model, a generative model that learns an unnormalized density p_θ(x) ∝ exp(−E_θ(x)), where E_θ is a neural network with parameters θ. The energy-based model is trained via contrastive divergence to minimize the objective given by the energy function:
-
$$L = \mathbb{E}_{D}\left(-E_\theta(x)\right) - \mathbb{E}_{p_\theta}\left(-E_\theta(x)\right)$$
- The first expectation is with respect to the data (e.g., positive examples), and the second expectation is with respect to samples generated from the model (e.g., negative examples). The negative examples may be generated via Langevin dynamics with the energy function. Thus, the energy of positive examples is minimized while the energy of negative examples is maximized. In this manner, the
grasp selector module 126 selects a first candidate object trajectory 412 from the first set of candidate object trajectories $o_{t:T}$ based on the learned neural network cost model.
- The learned neural network cost model may be based on offline data. Offline data may be associated with the
agent 200, such as a specific agent or type of agent. The offline data is data that includes a number of feasible outcomes for an agent. The feasible outcomes are possible and/or allowable physical configurations, such as position, grasp, velocity, acceleration, path plan, etc. The feasible outcomes may be based on cost functions, such as a wrench cost function, a superposition cost, and/or an inverse kinematics cost, among others.
- In one embodiment, the wrench cost may be found by optimizing the contact point locations to be able to resist the external wrench. For example, given a desired wrench $w = (f_{\text{total}}, T_{\text{total}})$ and grasp $c$, the following optimization may be solved:
-
- where $p_k$ is the vector from the object center of mass to the contact point, and $f_k$ and $A_k$ are the contact force and the linearized friction cone for the $k$th finger. Both $p_k$ and $A_k$ are functions of the contact points of the grasp and of the contact normal, which is a function of the geometry of the
object 208 and the grasp contact point. To resist an external wrench $w_{\text{ext}}$, therefore, the desired wrench is set to $-w_{\text{ext}}$ and the wrench error is $\|w_{\text{ext}} + w^*\|^2$. All forces are vectors and may be in the frame of reference of the object 208.
- This cost function is quadratic and thus may be optimized fairly quickly. However, because the cost must be evaluated in parallel and differentiated through, solving the optimization directly is prohibitively slow inside a re-planning loop that runs online with current data. Instead, the learned neural network cost model estimates the cost based on the feasible outcomes. The feasible outcomes include a large dataset of grasps $c_k$, contact normals $n_k$, and desired wrenches $w_d$ in the object frame of the
object 208. The contact force optimization is solved for each element of the dataset to generate error labels $e = \|w_d - w^*\|^2$. The learned neural network cost model is parameterized as a neural network with four layers and ReLU activations, and is trained based on the energy function. In addition to this cost, the energy of the negative examples is maximized. In this way, the resulting learned neural network cost model has low energy for low-error force optimizations and high energy both for high-error force optimizations and for data that is outside the training distribution. For example, to evaluate the wrench cost for a grasp and object pose, the external wrench is first transformed to the object frame using the object pose, the contact normals at the grasp are computed using the object SDF, and the learned neural network cost model is then evaluated.
- In another embodiment, the learned neural network cost model may be trained based on the superposition cost between the finger and the
object 208 when moving between contact points. Waypoints may be generated from the current finger position to the contact point, and the signed distance is checked at each waypoint using $\phi$. Since the waypoint locations are a linear function of $c_k$ and $c_{k+1}$, the derivative of the superposition cost with respect to the contact point is calculated using $\nabla_x \phi(x)$.
- In yet another embodiment, the learned neural network cost model may be trained based on the inverse kinematics cost. Like the wrench cost, the inverse kinematics computation is prohibitively slow to be performed online. Instead, a set of feasible outcomes is received offline, including a large dataset of joint configurations for the fingers. Forward kinematics may be performed for all of these joint configurations to generate finger locations. A check is performed at each of these joint configurations to determine which joint configurations are feasible outcomes. The learned neural network cost model is trained in a similar manner as described above with respect to the wrench cost, for example, using contrastive divergence to minimize the energy of positive examples while maximizing the energy of negative examples and of artificially generated negative examples from the model.
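The superposition cost described above can be illustrated with a minimal sketch. It is not the implementation of the described system: the sphere-shaped object, the squared-hinge penalty on penetration depth, and the names `sphere_sdf`, `sphere_sdf_grad`, and `superposition_cost` are all assumptions made for illustration. The sketch interpolates waypoints linearly between the current finger position $c_k$ and the next contact point $c_{k+1}$, evaluates the signed distance $\phi$ at each waypoint, and applies the chain rule through $\nabla_x \phi(x)$ to obtain the derivative with respect to $c_{k+1}$.

```python
import math

def sphere_sdf(x, radius=1.0):
    """Signed distance from point x to a sphere at the origin (negative inside).

    The sphere is a hypothetical stand-in for the object SDF phi.
    """
    return math.sqrt(sum(v * v for v in x)) - radius

def sphere_sdf_grad(x):
    """Gradient of the sphere SDF at x: the unit vector pointing away from the centre."""
    norm = max(math.sqrt(sum(v * v for v in x)), 1e-12)  # guard the centre point
    return [v / norm for v in x]

def superposition_cost(c_k, c_k1, n_waypoints=10, radius=1.0):
    """Penalise waypoints between c_k and c_k1 that lie inside the object.

    Returns the cost and its gradient with respect to the next contact point c_k1.
    The squared-hinge penalty is an illustrative assumption.
    """
    cost = 0.0
    grad = [0.0, 0.0, 0.0]
    for i in range(1, n_waypoints + 1):
        s = i / n_waypoints                        # interpolation weight in (0, 1]
        waypoint = [(1.0 - s) * a + s * b for a, b in zip(c_k, c_k1)]
        penetration = max(0.0, -sphere_sdf(waypoint, radius))  # > 0 only inside
        cost += 0.5 * penetration ** 2
        g = sphere_sdf_grad(waypoint)
        for j in range(3):
            # Chain rule: d(waypoint)/d(c_k1) = s, d(penetration)/d(waypoint) = -grad(phi)
            grad[j] += -penetration * g[j] * s
    return cost, grad
```

Calling `superposition_cost([-2.0, 0.5, 0.0], [2.0, 0.5, 0.0])` yields a positive cost because the straight-line path cuts through the unit sphere, whereas a path that stays outside the sphere yields zero cost and a zero gradient.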
- At
block 310, the method 300 includes the path planning module 130 causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps. In one embodiment, the path planning module 130 may receive the selected candidate object trajectory. The path planning module 130 may then cause the agent 200 to move the object 208 toward the goal pose $o_G$ and/or grasp the object 208 in order to apply a task wrench, $w_G$. For example, the path planning module 130 may cause the links of the agent 200 to exert a mechanical force on a target using the object 208. Additionally or alternatively, a policy may be generated based on the selected candidate object trajectory using the contact point(s) to pretrain the agent 200 or another agent on the operation of the object 208 with respect to a target.
- Referring now to
FIG. 5, a method 500 for online augmentation for learned grasping will now be described according to an exemplary embodiment. FIG. 5 will be described with reference to FIGS. 1-4 and 6. For simplicity, the method 500 will be described as a sequence of blocks, but it is understood that the blocks of the method 500 may be organized into different architectures, elements, stages, and/or processes. Furthermore, the method 500 includes blocks of the method 300, which operate in a similar manner as described above.
- At
block 302, the method 500 includes the planner module 124 receiving, at a first time step, the first grasp that includes contact points, $c_t$, and the initial object pose $o_t$ of the agent 200.
- At
block 304, the method 500 includes the planner module 124 generating a first set of candidate object trajectories based on the first grasp that includes contact points, $c_t$, and the initial object pose $o_t$. As described above, the candidate object trajectories of the first set of candidate object trajectories provide a number of object poses from the initial object pose to a goal for a number of future time steps after the first time step.
- At
block 306, the method 500 includes the grasp selector module 126 calculating contact points for grasps associated with each candidate object trajectory of the first set of candidate object trajectories.
- At block 308, the method 500 includes the planner module 124 selecting a first candidate object trajectory from the first set of candidate object trajectories based on a learned neural network cost model trained using offline data associated with the agent 200, as described above with respect to the method 300.
- At block 310, the method 500 includes the controller 128 causing the agent to execute the first candidate object trajectory at a second time step of the number of future time steps. However, as described above, the systems and methods herein describe online iterative re-planning and may include re-planning at the future time steps.
- At
block 502, the method 500 determines if the agent 200 has reached the goal. The determination may be based on the state information including the object state information and/or the agent state information. In one embodiment, the determination may be based on the image data 110, the depth data 112, and the tactile data 114. In another embodiment, the determination may be based on the signed distance function (SDF), which is a function $\phi(x): \mathbb{R}^3 \to \mathbb{R}$.
- If it is determined that the
agent 200 has reached the goal, the method 500 ends. However, if it is determined that the agent 200 has not reached the goal, then the method 500 proceeds to block 504.
- At block 504, the method 500 includes the planner module 124 receiving, at a next time step, a next grasp and a next object pose based on the selected candidate object trajectory of the agent 200. For example, continuing the method from the first time step, the planner module 124 receives, at the second time step, a second grasp and a next object pose based on the first candidate object trajectory of the agent. The second grasp and the next object pose may additionally be based on the SDF, the state information including the object state information and/or the agent state information, the image data 110, the depth data 112, and the tactile data 114, as described above with respect to block 302 of the method 300.
- At
block 506, the method 500 includes the planner module 124 generating a second set of candidate object trajectories based on the next grasp, in this example the second grasp, and the next object pose. Therefore, in this example, the candidate object trajectories of the second set of candidate object trajectories provide a number of object poses from the next object pose to the goal for a number of future time steps after the second time step. As described above, the candidate object trajectory is defined as $\tau_t = (o_{t+1}, c_{t+1}, \ldots, o_T, c_T)$, but here time step $t$ is the second time step; accordingly, time step $t+1$ is the third time step, and so on to the goal.
- In addition to being generated such that $o_G = o_T$, the second set of candidate object trajectories is also generated such that the task wrench, $w_G$, is inside the wrench cone of the grasp $c_T$. The second set of candidate trajectories $o_{t:T}$ is generated from the next object pose to the goal, being sampled as $\tau_t^k = o_{t:T}^k$, and reflects the current object pose and grasp 402.
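The iterative structure of the method 500 (goal check, state feedback, candidate generation, selection, execution) can be sketched as a short loop. This is a toy illustration under stated assumptions rather than the described system: the object pose is reduced to a single scalar, contact points are omitted, the cost is a hand-written smoothness term standing in for the learned neural network cost model, and execution error is simulated with bounded random noise. The names `sample_trajectories`, `trajectory_cost`, and `replan_until_goal` are hypothetical.

```python
import random

def sample_trajectories(pose, goal, steps, k, rng, noise=0.05):
    """Sample k candidate pose sequences from `pose` that all end exactly at `goal`."""
    candidates = []
    for _ in range(k):
        traj = []
        for i in range(1, steps + 1):
            s = i / steps
            # Perturb intermediate poses only, so the final pose equals the goal.
            jitter = rng.gauss(0.0, noise) if i < steps else 0.0
            traj.append(pose + s * (goal - pose) + jitter)
        candidates.append(traj)
    return candidates

def trajectory_cost(pose, traj):
    """Stand-in cost (an assumption): sum of squared step lengths."""
    total, prev = 0.0, pose
    for p in traj:
        total += (p - prev) ** 2
        prev = p
    return total

def replan_until_goal(start, goal, horizon=5, k=16, tol=0.02, seed=0):
    """Re-plan at every time step from the pose actually reached."""
    rng = random.Random(seed)
    pose, history = start, [start]
    while abs(goal - pose) > tol:                     # block 502: goal reached?
        steps = max(1, horizon - (len(history) - 1))  # remaining time steps
        candidates = sample_trajectories(pose, goal, steps, k, rng)  # block 506
        best = min(candidates, key=lambda t: trajectory_cost(pose, t))  # block 510
        # Block 512: execute one step; bounded noise models imperfect execution.
        pose = best[0] + rng.uniform(-0.01, 0.01)
        history.append(pose)                          # block 504: fed back next pass
    return history
```

Because each pass starts from the pose that was actually reached rather than the pose that was planned, execution error does not accumulate across time steps. For example, `replan_until_goal(0.0, 1.0)` returns the visited poses, the last of which lies within `tol` of the goal.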
- At
block 508, the method 500 includes the grasp selector module 126 calculating contact points for grasps associated with each candidate object trajectory of the second set of candidate object trajectories. As discussed above, the contact point sampler 406 may use a current nominal grasp sequence. In one embodiment, for each time step and for each link, the contact point sampler 406 re-samples, with some probability, a new contact point $c$.
- At block 510, the method 500 includes the planner module 124 selecting a second candidate object trajectory from the second set of candidate object trajectories based on the learned neural network cost model. As described above with respect to block 308, in one embodiment, the contact points $c_{t+1:T}$ for grasps associated with the candidate object trajectories are optimized by the contact point optimizer 408 using a cost function $J(\tau)$ such that the second set of candidate object trajectories become optimized samples 410.
- At
block 512, the method 500 includes the controller 128 causing the agent to execute the second candidate object trajectory at a second time step. The method then returns to block 502 to determine if the agent 200 has reached the goal. If it is determined that the agent 200 has reached the goal, the method 500 ends. However, if it is determined that the agent 200 has not reached the goal, then the method 500 proceeds to block 504 for another iteration. Continuing the example from above, such an iteration would be the third iteration for a third time step. Therefore, the systems and methods described herein are iterative. Accordingly, the systems and methods described herein plan a sequence of object poses and grasps for the agent, such as for dexterous in-hand manipulation. While existing methods do not use feedback from the actual achieved grasps, here feedback is incorporated to achieve improved robustness and failure recovery while requiring less processing power. The proposed planning method is extendable to the case where there is parametric uncertainty in object parameters for improved robustness.
- Still another aspect involves a computer-readable medium including processor-executable instructions configured to implement one aspect of the techniques presented herein. An aspect of a computer-readable medium or a computer-readable device devised in these ways is illustrated in
FIG. 6, wherein an implementation 600 includes a computer-readable medium 608, such as a CD-R, DVD-R, flash drive, a platter of a hard disk drive, etc., on which is encoded computer-readable data 606. This encoded computer-readable data 606, such as binary data including a plurality of zeros and ones as shown in 606, in turn includes a set of processor-executable computer instructions 604 configured to operate according to one or more of the principles set forth herein.
- In this implementation 600, the processor-executable computer instructions 604 may be configured to perform a method 602, such as the method 300 of FIG. 3 or the method 500 of FIG. 5. In another aspect, the processor-executable computer instructions 604 may be configured to implement a system, such as the operating environment 100 of FIG. 1. Many such computer-readable media may be devised by those of ordinary skill in the art that are configured to operate in accordance with the techniques presented herein.
- As used in this application, the terms “component,” “module,” “system,” “interface,” and the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processing unit, an object, an executable, a thread of execution, a program, or a computer. By way of illustration, both an application running on a controller and the controller may be a component. One or more components may reside within a process or thread of execution, and a component may be localized on one computer or distributed between two or more computers.
- Further, the claimed subject matter is implemented as a method, apparatus, or article of manufacture using standard programming or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. Of course, many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.
- Although the subject matter has been described in language specific to structural features or methodological acts, it is to be understood that the subject matter of the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example aspects. Various operations of aspects are provided herein. The order in which one or more or all of the operations are described should not be construed as to imply that these operations are necessarily order dependent. Alternative ordering will be appreciated based on this description. Further, not all operations may necessarily be present in each aspect provided herein.
- As used in this application, “or” is intended to mean an inclusive “or” rather than an exclusive “or.” Further, an inclusive “or” may include any combination thereof (e.g., A, B, or any combination thereof). In addition, “a” and “an” as used in this application are generally construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Additionally, at least one of A and B and/or the like generally means A or B or both A and B. Further, to the extent that “includes,” “having,” “has,” “with,” or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising.”
- Further, unless specified otherwise, “first”, “second”, or the like are not intended to imply a temporal aspect, a spatial aspect, an ordering, etc. Rather, such terms are merely used as identifiers, names, etc. for features, elements, items, etc. For example, a first channel and a second channel generally correspond to channel A and channel B or two different or two identical channels or the same channel. Additionally, “comprising”, “comprises”, “including”, “includes”, or the like generally means comprising or including, but not limited to.
- It will be appreciated that several of the above-disclosed and other features and functions, or alternatives or varieties thereof, may be desirably combined into many other different systems or applications. Also, that various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/090,966 US12397426B2 (en) | 2022-08-26 | 2022-12-29 | Systems and methods for online iterative re-planning |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202263401488P | 2022-08-26 | 2022-08-26 | |
| US18/090,966 US12397426B2 (en) | 2022-08-26 | 2022-12-29 | Systems and methods for online iterative re-planning |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20240066695A1 true US20240066695A1 (en) | 2024-02-29 |
| US12397426B2 US12397426B2 (en) | 2025-08-26 |
Family
ID=90000838
Family Applications (2)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/080,623 Pending US20240083021A1 (en) | 2022-08-26 | 2022-12-13 | Grasp selection |
| US18/090,966 Active 2043-09-21 US12397426B2 (en) | 2022-08-26 | 2022-12-29 | Systems and methods for online iterative re-planning |
Family Applications Before (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/080,623 Pending US20240083021A1 (en) | 2022-08-26 | 2022-12-13 | Grasp selection |
Country Status (1)
| Country | Link |
|---|---|
| US (2) | US20240083021A1 (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20250291473A1 (en) * | 2024-03-13 | 2025-09-18 | Honda Motor Co., Ltd. | Human-machine interaction device touch-interaction control based on user-defined parameters |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| EP4497549A1 (en) * | 2023-07-27 | 2025-01-29 | Robert Bosch GmbH | Device and method for training a machine-learning model for determining a grasp of a multi-finger gripper for manipulating an object |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200039073A1 (en) * | 2017-04-04 | 2020-02-06 | Mujin, Inc. | Control device, picking system, distribution system, program, control method and production method |
| US20210154832A1 (en) * | 2019-11-26 | 2021-05-27 | Teradyne, Inc. | Multi-angle end effector |
| US20210299866A1 (en) * | 2020-12-23 | 2021-09-30 | Intel Corporation | Robotic manipulation planning based on probalistic elastoplastic deformation material point method |
| US20230294277A1 (en) * | 2022-03-20 | 2023-09-21 | Nvidia Corporation | Reactive interactions for robotic applications and other automated systems |
| US20240139946A1 (en) * | 2021-03-03 | 2024-05-02 | Omron Corporation | Robot Cell System Design Device, Method, and Program |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9694494B1 (en) * | 2015-12-11 | 2017-07-04 | Amazon Technologies, Inc. | Feature identification and extrapolation for robotic item grasping |
| US12017356B2 (en) * | 2021-11-30 | 2024-06-25 | Fanuc Corporation | Collision handling methods in grasp generation |
| WO2023150238A1 (en) * | 2022-02-02 | 2023-08-10 | Intrinsic Innovation Llc | Object placement |
Also Published As
| Publication number | Publication date |
|---|---|
| US20240083021A1 (en) | 2024-03-14 |
| US12397426B2 (en) | 2025-08-26 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Huang et al. | Uncertainty-aware driver trajectory prediction at urban intersections | |
| EP3880413B1 (en) | Method and system for trajectory optimization for vehicles with geometric constraints | |
| US12087012B2 (en) | Systems and methods for visuo-tactile object pose estimation | |
| Zhang et al. | Sim2real learning of obstacle avoidance for robotic manipulators in uncertain environments | |
| CN112537306A (en) | System and method for providing cooperative perceived lane change control in dense traffic | |
| US20220379478A1 (en) | Object manipulation with collision avoidance using complementarity constraints | |
| US12112622B2 (en) | Systems and methods for heterogeneous multi-agent multi-modal trajectory prediction with evolving interaction graphs | |
| US12397426B2 (en) | Systems and methods for online iterative re-planning | |
| US11958201B2 (en) | Systems and methods for visuo-tactile object pose estimation | |
| US12330303B2 (en) | Online augmentation of learned grasping | |
| US11420331B2 (en) | Motion retargeting control for human-robot interaction | |
| US20220269948A1 (en) | Training of a convolutional neural network | |
| US12097614B2 (en) | Object manipulation | |
| US20210269060A1 (en) | Systems and methods for curiousity development in agents | |
| US20210270605A1 (en) | Systems and methods for estimating tactile output based on depth data | |
| US11597088B2 (en) | Systems and methods for fully coupled models for crowd navigation | |
| Pratama et al. | Enhance Deep Reinforcement Learning with Denoising Autoencoder for Self-Driving Mobile Robot | |
| US20240091938A1 (en) | System and method for providing in hand robotics dexterous manipulation of objects | |
| US20240160812A1 (en) | Trajectory imputation and prediction | |
| US20230316126A1 (en) | System and method for providing accelerated reinforcement learning training | |
| Moridian et al. | Learning navigation tasks from demonstration for semi-autonomous remote operation of mobile robots | |
| US11787053B2 (en) | Systems and methods for utilizing interacting Gaussian mixture models for crowd navigation | |
| Khlif | A Comprehensive Review of Intelligent Navigation of Mobile Robots Using Reinforcement Learning with A Comparative Analysis of a modified Q-Learning Method and DQN in Simulated Gym Environment | |
| US11979590B2 (en) | Systems and methods for predicting future data using diverse sampling | |
| US20250345935A1 (en) | Manipulation task solver |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: HONDA MOTOR CO., LTD., JAPAN. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: POWER, THOMAS J.; SOLTANI ZARRIN, RANA; SIGNING DATES FROM 20221219 TO 20221222; REEL/FRAME: 062236/0918 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |