US20240383143A1 - Affordance-driven modular reinforcement learning - Google Patents
Affordance-driven modular reinforcement learning
- Publication number: US20240383143A1
- Application number: US18/391,129
- Authority: US (United States)
- Prior art keywords
- action
- maps
- affordance
- location
- parameters
- Prior art date: 2023-05-17 (the filing date of the provisional application referenced below)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1602—Programme controls characterised by the control system, structure, architecture
- B25J9/161—Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1669—Programme controls characterised by programming, planning systems for manipulators characterised by special application, e.g. multi-arm co-operation, assembly, grasping
Definitions
- In the example architecture 200 of FIG. 2, each encoder 210 generates a latent tensor 215 based on the input sensor data 205.
- For example, the encoder 210 may process the sensor data 205 using one or more convolution layers to extract salient features and generate the latent tensor 215.
- To condition the model on a given set of action parameters, an action parameter tensor 220 can be combined with the latent tensor 215.
- For example, the action parameter tensor 220 may be appended to or concatenated with the latent tensor 215, added to the latent tensor (e.g., via element-wise addition), and the like.
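- As an illustration of this combination step, below is a minimal sketch, assuming a PyTorch implementation (the disclosure does not mandate a particular framework); the names and shapes (`latent`, `action_params`, a 28×28 feature map) are illustrative only.

```python
import torch

batch, channels, h, w = 1, 64, 28, 28
latent = torch.randn(batch, channels, h, w)           # latent tensor 215

# One set of action parameters (e.g., grasp yaw and grip force), encoded as a
# vector and broadcast spatially so it can be concatenated channel-wise.
action_params = torch.tensor([[0.785, 0.5]])          # action parameter tensor 220
param_plane = action_params[:, :, None, None].expand(batch, -1, h, w)

aggregated = torch.cat([latent, param_plane], dim=1)  # concatenation variant
print(aggregated.shape)                               # torch.Size([1, 66, 28, 28])

# Element-wise addition is an alternative when the action parameters are first
# projected (e.g., by a learned linear layer) to the latent channel count:
# aggregated = latent + proj(action_params)[:, :, None, None]
```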
- The control system then passes each aggregated latent tensor through a decoder 225 to generate one or more affordance maps 230.
- As used herein, an “affordance map” is generally a data structure representing the probabilities that one or more locations in an environment correspond to possible action(s).
- For example, an affordance map may indicate, for each location (e.g., each pixel in an image), the probability that one or more actions can be performed at the location (e.g., a grasping action).
- In some aspects, each decoder 225 generates an affordance map 230 for each aggregated latent tensor.
- In some aspects, each decoder 225 may generate the same number of affordance maps 230.
- For example, if there are three hundred unique sets of action parameter values and five decoders 225, each decoder 225 may generate a corresponding set of three hundred affordance maps 230, for a total of fifteen hundred affordance maps 230 generated based on a single set of input sensor data 205.
- In aspects with multiple encoders, each encoder 210 may be used to generate a corresponding latent tensor 215, each of which may be used to generate a set of aggregated latent tensors.
- Continuing the above example, a set of three hundred aggregated latent tensors may be generated for each encoder-decoder pair (resulting in fifteen hundred aggregated latent tensors), and each aggregated latent tensor may then be processed using a corresponding decoder 225 (e.g., using the decoder 225 that corresponds to the encoder 210 used to generate each given aggregated latent tensor), resulting in fifteen hundred affordance maps 230.
- Because each affordance map 230 is generated based on a corresponding set of action parameters (encoded in the action parameter tensor 220), each affordance map corresponds to or indicates a predicted set of success probabilities if the corresponding set of action parameters is used to perform the action.
- That is, the affordance maps 230 indicate the probability that the given action will be successfully completed for each location or point in the scene (as depicted in the sensor data 205) if the corresponding set of action parameters is used (e.g., the probability that a grasp action will be successful if the end effector is used to grasp at each location).
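- The following sketch ties these steps together for a single encoder-decoder pair, again assuming PyTorch; single convolution layers stand in for the encoder 210 and decoder 225, and three hundred random parameter sets stand in for the unique action-parameter combinations (five such decoders would then yield fifteen hundred maps in total).

```python
import torch
import torch.nn as nn

enc = nn.Conv2d(3, 64, 3, padding=1)      # stand-in for encoder 210
dec = nn.Conv2d(66, 1, 3, padding=1)      # stand-in for decoder 225

sensor = torch.randn(1, 3, 28, 28)        # sensor data 205 (e.g., an RGB image)
latent = enc(sensor)                      # latent tensor 215

param_sets = torch.rand(300, 2)           # 300 unique action-parameter sets
maps = []
for p in param_sets:                      # one interim map per parameter set
    plane = p[None, :, None, None].expand(1, -1, 28, 28)
    agg = torch.cat([latent, plane], dim=1)        # aggregated latent tensor
    maps.append(torch.sigmoid(dec(agg)))           # per-pixel success probability
affordance_maps = torch.cat(maps)         # shape (300, 1, 28, 28)
```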
- FIG. 3 depicts an example architecture 300 for generating affordance maps and uncertainty maps.
- the architecture 300 is used by a control system, such as the control system 125 , to train the model(s) and/or to generate affordance maps that drive action selection, as discussed in more detail below.
- the architecture 300 provides additional detail for the architecture 200 of FIG. 2 .
- In the architecture 300, sensor data 305 (which may correspond to the sensor data 120 of FIG. 1 and/or the sensor data 205 of FIG. 2) is evaluated to generate affordance maps 345 (which may correspond to the affordance maps 230 of FIG. 2) and uncertainty maps 350.
- Specifically, the sensor data 305 is processed by an encoder 310 (which may correspond to the encoder 210 of FIG. 2) to generate a latent tensor, which is combined with one or more action parameter tensors and processed by a set of decoders 325A-C (collectively, decoders 325), which may correspond to the decoder 225 of FIG. 2.
- each decoder 325 may correspond to a branch or model of the ensemble.
- a shared encoder 310 is used for each decoder 325 .
- each decoder 325 may have its own corresponding encoder 310 . Additionally, though three decoders 325 are depicted, in other aspects, there may be any number of decoders 325 or branches in the model ensemble.
- each decoder 325 generates an interim affordance map 330 for each set of action parameters based on the sensor data 305 .
- For example, the decoder 325A generates the interim affordance maps 330A, the decoder 325B generates the interim affordance maps 330B, and the decoder 325C generates the interim affordance maps 330C.
- the interim affordance maps 330 may generally indicate probabilities that an action will be successful if the action is performed at one or more specific locations using one or more specific action parameters (e.g., at a specific point on an object and using a specific grip orientation).
- the generated interim affordance maps 330 are provided to an aggregation component 335 and an uncertainty component 340 .
- the aggregation component 335 aggregates the interim affordance maps 330 to generate the output affordance map(s) 345 .
- the aggregation component 335 may perform element-wise summation or averaging.
- each affordance map 345 may therefore include action success probabilities determined based on the collective predictions contained within each interim affordance map 330 (e.g., the average probability of success for each pixel).
- the aggregation component 335 may identify the corresponding set of interim affordance maps 330 (one generated by each decoder 325 ) for the set of parameter values, and aggregate this set to generate an output affordance map 345 for the set of action parameter values. In this way, the total number of affordance maps 345 may match the number of unique action parameter value combinations. For example, if there are three hundred unique options, then the aggregation component 335 may generate three hundred output affordance maps 345 , one for each option.
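- A minimal sketch of this element-wise averaging, assuming the interim maps have been stacked into one tensor (the ensemble size and map shape are illustrative):

```python
import torch

E, P, H, W = 5, 300, 28, 28        # ensemble members, parameter sets, map size
interim = torch.rand(E, P, H, W)   # stacked interim affordance maps 330

# Element-wise mean over the ensemble dimension: one output affordance map 345
# per unique set of action-parameter values.
output_maps = interim.mean(dim=0)  # shape (300, 28, 28)
```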
- the uncertainty component 340 generates a set of uncertainty maps 350 based on the interim affordance maps 330 .
- the uncertainty maps 350 indicate the uncertainty of the model with respect to the affordance maps. For example, if the predicted probability of success for a single point varies substantially between interim affordance maps 330 A, 330 B, and 330 C, then the uncertainty component 340 may determine that uncertainty is high for the single point.
- the uncertainty maps 350 are generated using a Jensen-Shannon Divergence (JSD) approach (also referred to in some aspects as the information radius).
- In some aspects, the uncertainty value for each point may be defined using Equation 1 below, where u(s, a) is the uncertainty value for a given state s (e.g., the state of the robot and/or environment, such as for a given location or pixel in the input) and set of action parameters a, JSD(·) is the JSD function, and p(g | s, a, θ_e) is the probability of successfully completing the action g predicted by the ensemble member with parameters θ_e (for an ensemble of E members):

$$u(s, a) = \mathrm{JSD}\left(\{\, p(g \mid s, a, \theta_e) \,\}_{e=1}^{E}\right) = H\left(\mathbb{E}_{\theta}\left[p(g \mid s, a, \theta)\right]\right) - \mathbb{E}_{\theta}\left[H\left(p(g \mid s, a, \theta)\right)\right] \tag{1}$$

- That is, the uncertainty may be defined as the entropy (H) of the expected (𝔼) probability of success (e.g., the mean probability across the interim affordance maps 330 for the set of action parameters), minus the expected entropy of the predicted probabilities of success.
- the uncertainty component 340 can generate a respective uncertainty map 350 for each respective set of action parameters, indicating the model uncertainty with respect to each location in the space (e.g., for each pixel in image data) and with respect to each set of action parameters.
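- Because each pixel carries a Bernoulli success probability, Equation 1 reduces per pixel to the binary entropy of the mean prediction minus the mean of the per-member binary entropies. A sketch under the same assumed shapes as above:

```python
import torch

def binary_entropy(p, eps=1e-8):
    p = p.clamp(eps, 1 - eps)
    return -(p * p.log() + (1 - p) * (1 - p).log())

E, P, H, W = 5, 300, 28, 28
interim = torch.rand(E, P, H, W)   # per-member success probabilities

mean_p = interim.mean(dim=0)       # expected probability of success
# Equation 1: entropy of the mean prediction minus mean of the entropies.
uncertainty_maps = binary_entropy(mean_p) - binary_entropy(interim).mean(dim=0)
```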
- these uncertainty maps 350 may be used during training and/or during inferencing.
- During training (e.g., in an exploration phase), the control system may use the affordance maps 345 and uncertainty maps 350 to select an action that maximizes (or at least increases) predicted success while also maximizing (or at least increasing) uncertainty, in order to learn more rapidly.
- During inferencing (e.g., in a robustness phase), the control system may select an action that maximizes, or at least increases, predicted success. In some aspects, during this phase, the control system may also seek to minimize, or at least reduce, the uncertainty.
- the control system can perform ensemble sampling. For example, for each set of input sensor data 305 (e.g., each time an action is requested or desired), the one member of the ensemble (e.g., one decoder 325 ) may be selected with at least an element of randomness (e.g., selecting the decoder randomly or pseudo-randomly).
- the interim affordance maps 330 generated by this selected decoder are the most important or dominant maps (or the only maps) used during this exploration stage for the current input data.
- the control system may use the interim affordance maps 330 generated by the (randomly selected) decoder 325 during exploration. This can make the training process faster by adding noise to the training data to accelerate generalization.
- the uncertainty values may be summed with the probability values of the corresponding interim affordance maps 330 of the selected decoder 325 . That is, for each set of action parameter values, the control system may sum the corresponding uncertainty map 350 with the corresponding interim affordance map 330 . For example, the control system may perform element-wise summation to add the uncertainty value for each location (e.g., each pixel) with the predicted probability of action success for each location. In some aspects, this summation is performed for each interim affordance map 330 generated by the selected decoder 325 (e.g., for each set of action parameters).
- the control system can use the uncertainty maps to provide a proxy of the information that can be gained by attempting the action at each location using the indicated set of parameters.
- the control system can obtain an upper confidence bound (UCB) for exploration, which can be used to efficiently learn to find new graspable configurations in the scene.
- For example, the control system can score the possible configurations (e.g., each combination of a location and a set of action parameters) and select the highest-valued configuration (e.g., the location and set of action parameters having the highest score) to test.
- In some aspects, during exploration, the actions are sampled or selected according to Equation 2 below, where r(s, a) is the generated score of a given state s (e.g., a given location) using a given set of action parameters a, p(g | s, a, θ_e) is the success probability predicted by the selected ensemble member (e.g., the randomly selected decoder 325), and u(s, a) is the uncertainty value of Equation 1:

$$r(s, a) = p(g \mid s, a, \theta_e) + u(s, a) \tag{2}$$
- For example, the control system may generate a respective score for each respective pixel (e.g., for each location depicted by a pixel) in each respective interim affordance map 330 (e.g., for each set of action parameters).
- In some aspects, the control system evaluates the generated scores to select the peak or highest score (e.g., the location and set of action parameters having the highest generated value).
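- A sketch of this exploration-time selection, combining the random choice of an ensemble member with the Equation 2 score and an arg-max over all locations and parameter sets (shapes illustrative, as above):

```python
import numpy as np
import torch

E, P, H, W = 5, 300, 28, 28
interim = torch.rand(E, P, H, W)   # interim affordance maps 330
u = torch.rand(P, H, W)            # uncertainty maps 350 (from Equation 1)

e = torch.randint(E, (1,)).item()  # randomly select one ensemble member
score = interim[e] + u             # Equation 2: r(s, a) = p + u

p_idx, y, x = np.unravel_index(score.argmax().item(), score.shape)
# p_idx indexes the chosen action-parameter set; (y, x) is the action location.
```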
- the control system selects the action based on determining that performing the selected action (e.g., the action at the selected location and using the selected parameters) will maximize (or at least increase) the predicted success while also maximizing (or at least increasing) the uncertainty.
- this action may then be performed, and the success of the action can be evaluated to update or refine one or more parameters of the model.
- the control system may update a subset of the parameters, rather than all parameters. For example, the control system may only update the parameters of a selected decoder 325 , leaving the other decoders unchanged, based on the success of the action.
- control system may use masked updating (e.g., masked backpropagation) to update only a subset of those parameters of the selected decoder 325 , such as by updating only the parameters that correspond to the selected action location (e.g., the parameters used to predict the success probability for the selected pixel(s)), such that parameters corresponding to other locations (e.g., other pixels in the interim affordance map 330 ) are unchanged.
- the control system may use the average affordance probability map(s) (e.g., the affordance maps 345 ), obtained by averaging the probability values of the components in the ensemble, to select the best configuration to perform the action (e.g., the location and set of action parameters with the highest predicted probability of success).
- the control system may optionally incorporate the uncertainty maps 350 into this selection process (e.g., to select the least ambiguous configurations that are most likely to result in success).
- In some aspects, during inferencing, the actions are sampled or selected according to Equation 3 below, where r(s, a) is the generated score of a given state s (e.g., a given location) using a given set of action parameters a, 𝔼_θ[p(g | s, a, θ)] is the expected (mean) success probability across the ensemble (e.g., the value in the output affordance map 345), and u(s, a) is the uncertainty value of Equation 1:

$$r(s, a) = \mathbb{E}_{\theta}\left[p(g \mid s, a, \theta)\right] - u(s, a) \tag{3}$$
- For example, the control system may generate a respective score for each respective pixel or location in each respective affordance map 345 (e.g., for each set of action parameters).
- The control system then evaluates the generated scores to select the peak or highest score (e.g., the location and set of action parameters having the highest generated value).
- the control system selects the action based on determining that performing the selected action (e.g., the action at the selected location and using the selected parameters) will maximize, or at least increase, the predicted success while also minimizing, or at least reducing, the uncertainty.
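- A corresponding sketch for this robustness-phase selection (Equation 3), where the ensemble mean is rewarded and the uncertainty is penalized before taking the arg-max:

```python
import numpy as np
import torch

E, P, H, W = 5, 300, 28, 28
interim = torch.rand(E, P, H, W)   # interim affordance maps 330
u = torch.rand(P, H, W)            # uncertainty maps 350

mean_p = interim.mean(dim=0)       # output affordance maps 345
score = mean_p - u                 # Equation 3: likely to succeed, low ambiguity

p_idx, y, x = np.unravel_index(score.argmax().item(), score.shape)
```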
- the selected action may then be performed, and the success of the action can be optionally evaluated to update or refine one or more parameters of the model.
- FIG. 4 is a flow diagram depicting an example method 400 for selecting and performing actions using machine learning.
- the method 400 is performed by a control system, such as the control system 125 of FIG. 1 , which may use an architecture for generating affordance maps, such as the architecture 200 of FIG. 2 and/or the architecture 300 of FIG. 3 .
- the control system accesses sensor data (e.g., the sensor data 120 of FIG. 1 , the sensor data 205 of FIG. 2 , and/or the sensor data 305 of FIG. 3 ).
- “accessing” data may generally include receiving, requesting, retrieving, collecting, generating, or otherwise gaining access to the data.
- the control system may access the sensor data continuously or periodically (e.g., every second), or each time an action is desired (e.g., each time the control system or another entity desires to perform the action, such as grasping an object and picking the object up).
- the sensor data may generally include a wide variety of data, including image data, depth data, point clouds, and the like.
- the control system generates a set of affordance maps (e.g., the affordance maps 230 of FIG. 2 and/or the affordance maps 345 of FIG. 3 ) by processing the sensor data using a machine learning model (e.g., an ensemble model), as discussed above.
- the control system generates a set of uncertainty maps (e.g., the uncertainty maps 350 of FIG. 3 ) based on the interim affordance maps, as discussed above. For example, the control system may use Equation 1 to evaluate an uncertainty value for each pixel or location.
- the control system selects an action based on the affordance maps and/or uncertainty maps.
- selecting an action may generally include selecting both a point in the space where the action will be performed (e.g., a location on an object depicted by a pixel in the affordance maps) and a set of action parameters (e.g., a grasp orientation, a grip force, and the like).
- the control system may evaluate the affordance maps and uncertainty maps during exploration (e.g., using Equation 2 above) to select the action.
- the control system may evaluate only the affordance maps or may evaluate both the affordance maps and the uncertainty maps during runtime use (when robustness is desired), such as using Equation 3 above.
- the control system can perform the selected action.
- performing may include transmitting, instructing, or otherwise facilitating performance of the action by another entity, such as a robotic arm. That is, “performing” the action may include instructing a robot (or another system that controls the robot) to perform the indicated action (e.g., to perform the action at the indicated location using the indicated action parameters).
- the method 400 can then terminate or loop back to block 405 to select the next action. In some aspects, during exploration and/or when the control system is collecting data for potential further training, the method 400 continues to block 430 .
- the control system generates a success value based on the performance of the action. For example, as discussed above, the control system may evaluate one or more sets of sensor data during and/or after performance of the action to evaluate how successful the action was.
- the success value is a categorical (e.g., binary) value, such as indicating whether the action was performed successfully.
- the success criteria used to define whether a given action was successful may be defined based on the particular action.
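- As one plausible illustration of an action-specific criterion for a grasp, assuming a parallel-jaw gripper that reports its jaw width and the lift height of the end effector; the function name, inputs, and thresholds are assumptions, not taken from the disclosure:

```python
def grasp_succeeded(gripper_width_m: float, lifted_height_m: float) -> int:
    """Binary success value for a grasp-and-lift attempt (illustrative only)."""
    closed_on_object = gripper_width_m > 0.005  # jaws did not close on air
    lifted = lifted_height_m > 0.10             # object raised roughly 10 cm
    return int(closed_on_object and lifted)

print(grasp_succeeded(0.03, 0.15))  # 1: the attempt counts as successful
```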
- the control system updates one or more model parameters of the ensemble model based on the generated success value.
- the control system may use masked updating (e.g., masked backpropagation in the case of convolutional models) based on the selected location or pixel (where the action was performed), such that other parameters of the ensemble machine learning model corresponding to locations other than the selected location are not updated based on the success value.
- the control system may generate a loss based on the success value, and mask the loss based on the specific location(s) or pixels (in the affordance map(s)) used to select the action (e.g., where the peak was). This masked loss can then be used to perform a masked backpropagation operation to update the corresponding (relevant) parameters of the model.
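- A minimal sketch of such a masked update, assuming a binary cross-entropy objective (the disclosure does not name a specific loss); in practice `pred` would be the selected decoder's interim affordance map rather than a free tensor:

```python
import torch
import torch.nn.functional as F

pred = torch.rand(1, 1, 28, 28, requires_grad=True)  # selected decoder's map
success = 1.0                       # binary outcome of the attempted action
y, x = 14, 9                        # pixel where the action was performed

target = torch.full_like(pred, success)
mask = torch.zeros_like(pred)
mask[0, 0, y, x] = 1.0              # supervise only the attempted location

loss = (F.binary_cross_entropy(pred, target, reduction="none") * mask).sum()
loss.backward()                     # gradients flow only through pixel (y, x)
```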
- the control system updates the parameters of the selected subset (e.g., the selected decoder and encoder), leaving remaining parameters (e.g., parameters of the other decoders) frozen.
- In the method 500 of FIG. 5, the control system generates a set of interim affordance maps (e.g., the interim affordance maps 330 of FIG. 3) by processing the aggregated latent tensor(s) using each branch (e.g., each decoder, such as the decoders 325 of FIG. 3) of the ensemble, as discussed above.
- the control system determines whether there is at least one additional set of action parameters that has not been used to generate an aggregated latent tensor. For example, if there are three hundred unique sets of values, then the control system may determine whether each unique set of values has been evaluated. If there is at least one additional set of values remaining, then the method 500 returns to block 510 . If not, then the method 500 continues to block 525 .
- Although the illustrated example depicts an iterative process for conceptual clarity (selecting and evaluating each set of values sequentially), in some aspects, some or all of the alternative parameter values may be evaluated in parallel, as discussed above and as sketched below.
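- As a sketch of the parallel variant, the latent tensor can be tiled across a parameter-set batch dimension so that every aggregated latent tensor is decoded in a single forward pass (PyTorch assumed; stand-in modules as in the earlier sketches):

```python
import torch
import torch.nn as nn

enc = nn.Conv2d(3, 64, 3, padding=1)    # stand-in encoder
dec = nn.Conv2d(66, 1, 3, padding=1)    # stand-in decoder (one ensemble branch)

latent = enc(torch.randn(1, 3, 28, 28))           # shape (1, 64, 28, 28)
param_sets = torch.rand(300, 2)                   # all unique parameter sets

# Tile the latent tensor across the parameter dimension and decode every
# aggregated latent tensor at once, instead of looping set by set.
lat = latent.expand(300, -1, -1, -1)
planes = param_sets[:, :, None, None].expand(-1, -1, 28, 28)
interim = torch.sigmoid(dec(torch.cat([lat, planes], dim=1)))  # (300, 1, 28, 28)
```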
- the control system aggregates the interim affordance maps to generate one or more output affordance maps in response to determining that the control system is executing in a runtime inference or robustness phase. For example, as discussed above with reference to Equation 3 , the control system may aggregate the interim affordance maps in order to smooth over noise that may be caused by using the output of any single branch of the ensemble. In some aspects, in response to determining that the control system is executing in a training or exploration phase, the control system may refrain from aggregating the interim affordance maps (or may otherwise refrain from using the aggregated affordance maps, if the aggregated affordance maps are still generated). For example, as discussed above with reference to Equation 2 , the control system may select and evaluate one of the branches (e.g., one of the interim affordance maps) rather than aggregating the branches during these phases.
- the method 500 then terminates (e.g., returning to block 415 of FIG. 4 ).
- FIG. 6 is a flow diagram depicting an example method 600 for selecting and performing actions using machine learning.
- the method 600 is performed by a control system, such as the control system 125 of FIG. 1 .
- sensor data depicting a physical environment is accessed.
- a set of output affordance maps is generated based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters.
- a first set of action parameters and the first location are selected based on the set of output affordance maps.
- the first action is performed at the first location in accordance with the first set of action parameters.
- the method 600 further includes generating a set of uncertainty maps based on the set of output affordance maps, comprising evaluating divergence between the set of output affordance maps, wherein the first set of action parameters and the first location are selected based further on the set of uncertainty maps.
- generating the set of output affordance maps comprises: generating a first latent tensor based on processing the sensor data using a first encoder of the ensemble machine learning model, generating a plurality of aggregated latent tensors based on combining each of a plurality of action parameter tensors with the first latent tensor, and generating a first plurality of interim affordance maps based on processing each of the plurality of aggregated latent tensors using a first decoder of the ensemble machine learning model.
- generating the set of output affordance maps further comprises: generating a second plurality of interim affordance maps based on a plurality of decoders of the ensemble machine learning model, and generating the set of output affordance maps based on aggregating the first and second pluralities of interim affordance maps.
- selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while minimizing uncertainty.
- the first decoder is selected, from a plurality of decoders, with at least an element of randomness, and selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while maximizing uncertainty.
- each of the plurality of action parameter tensors corresponds to at least one of: (i) an action orientation, (ii) an action force, or (iii) an action direction.
- the action orientation comprises a grasp orientation for a robotic grasper.
- the method 600 further includes generating a success value based on the performance of the first action at the first location in accordance with the first set of action parameters, and updating one or more parameters of the ensemble machine learning model based on the success value.
- updating the one or more parameters of the ensemble machine learning model comprises performing a masked backpropagation operation based on the first location such that one or more other parameters of the ensemble machine learning model corresponding to locations other than the first location are not updated based on the success value.
- FIG. 7 depicts an example processing system 700 configured to perform various aspects of the present disclosure, including, for example, the techniques and methods described with respect to FIGS. 1 - 6 .
- the processing system 700 may correspond to a control system, such as the control system 125 of FIG. 1 .
- the processing system 700 may correspond to a device that controls robotic manipulators, trains affordance prediction models, and/or uses affordance prediction models during runtime.
- the operations described below with respect to the processing system 700 may be distributed across any number of devices or systems.
- the processing system 700 includes a central processing unit (CPU) 702 , which in some examples may be a multi-core CPU. Instructions executed at the CPU 702 may be loaded, for example, from a program memory associated with the CPU 702 or may be loaded from a memory partition (e.g., a partition of memory 724 ).
- the processing system 700 also includes additional processing components tailored to specific functions, such as a graphics processing unit (GPU) 704 , a digital signal processor (DSP) 706 , a neural processing unit (NPU) 708 , a multimedia component 710 (e.g., a multimedia processing unit), and a wireless connectivity component 712 .
- An NPU, such as the NPU 708, is generally a specialized circuit configured for implementing the control and arithmetic logic for executing machine learning algorithms, such as algorithms for processing artificial neural networks (ANNs), deep neural networks (DNNs), random forests (RFs), and the like.
- An NPU may sometimes alternatively be referred to as a neural signal processor (NSP), tensor processing unit (TPU), neural network processor (NNP), intelligence processing unit (IPU), vision processing unit (VPU), or graph processing unit.
- In some examples, a plurality of NPUs, such as the NPU 708, may be instantiated on a single chip, such as a system on a chip (SoC), while in other examples the NPUs may be part of a dedicated neural-network accelerator.
- NPUs may be optimized for training or inference, or in some cases configured to balance performance between both.
- Even in such balanced configurations, the two tasks (training and inference) may still generally be performed independently.
- NPUs designed to accelerate training are generally configured to accelerate the optimization of new models, which is a highly compute-intensive operation that involves inputting an existing dataset (often labeled or tagged), iterating over the dataset, and then adjusting model parameters, such as weights and biases, in order to improve model performance.
- NPUs designed to accelerate inference are generally configured to operate on complete models. Such NPUs may thus be configured to input a new piece of data and rapidly process this piece of data through an already trained model to generate a model output (e.g., an inference).
- the processing system 700 may also include one or more sensor processing units 716 associated with any manner of sensor, one or more image signal processors (ISPs) 718 associated with any manner of image sensor, and/or a navigation processor 720 , which may include satellite-based positioning system components (e.g., GPS or GLONASS), as well as inertial positioning system components.
- the processing system 700 may also include one or more input and/or output devices 722 , such as screens, touch-sensitive surfaces (including touch-sensitive displays), physical buttons, speakers, microphones, and the like.
- one or more of the processors of the processing system 700 may be based on an ARM or RISC-V instruction set.
- the processing system 700 also includes the memory 724 , which is representative of one or more static and/or dynamic memories, such as a dynamic random access memory, a flash-based static memory, and the like.
- the memory 724 includes computer-executable components, which may be executed by one or more of the aforementioned processors of the processing system 700 .
- the memory 724 includes an affordance component 724 A, an aggregation component 724 B, an uncertainty component 724 C, an action component 724 D, and a training component 724 E.
- the memory 724 further includes model parameters 724 F for one or more models (e.g., affordance prediction models, such as the machine learning model 207 of FIG. 2 , which may include one or more encoders such as the encoder 310 of FIG. 3 and/or one or more decoders such as the decoders 325 of FIG. 3 ).
- the memory 724 may also include other data, such as a list of available or possible actions that the robotic manipulator(s) can perform, relevant action parameters for each action, possible values for each action parameter, and the like. Though depicted as discrete components for conceptual clarity in FIG. 7 , the illustrated components (and others not depicted) may be collectively or individually implemented in various aspects.
- the processing system 700 further comprises an affordance circuit 726 , an aggregation circuit 727 , an uncertainty circuit 728 , an action circuit 729 , and a training circuit 730 .
- the depicted circuits, and others not depicted, may be configured to perform various aspects of the techniques described herein.
- the affordance component 724 A and/or the affordance circuit 726 may be used to generate interim affordance maps (e.g., the interim affordance maps 330 of FIG. 3 ), as discussed above.
- the affordance component 724 A and/or the affordance circuit 726 may process input sensor data using encoder(s) to generate latent tensor(s), aggregate these latent tensor(s) with action parameter information, and generate interim affordance map(s) using decoder(s).
- the aggregation component 724 B and/or the aggregation circuit 727 may be used to aggregate interim affordance maps (generated by the affordance component 724 A and/or the affordance circuit 726 ) to generate output affordance maps (e.g., the affordance maps 345 of FIG. 3 ), as discussed above.
- the aggregation component 724 B and/or the aggregation circuit 727 may generate, for each respective set of action parameter values, a respective aggregated or output affordance map by averaging the corresponding set of interim affordance maps.
- the uncertainty component 724 C and/or the uncertainty circuit 728 may be used to generate uncertainty maps (e.g., the uncertainty maps 350 of FIG. 3 ) based on interim affordance maps (generated by the affordance component 724 A and/or the affordance circuit 726 ), as discussed above.
- the uncertainty component 724 C and/or the uncertainty circuit 728 may, for each respective set of action parameter values, generate a respective uncertainty map by computing the JSD of the corresponding set of interim affordance maps.
- the action component 724 D and/or the action circuit 729 may be used to generate action instructions (e.g., the actions 130 of FIG. 1 ) based on interim and/or output affordance maps (generated by the affordance component 724 A, the affordance circuit 726 , the aggregation component 724 B, and/or the aggregation circuit 727 ) and/or based on uncertainty maps (generated by the uncertainty component 724 C and/or the uncertainty circuit 728 ), as discussed above.
- the action component 724 D and/or the action circuit 729 may use Equation 2 and/or Equation 3 above to select an action (e.g., a location in the environment, such as on an object, where the action should be performed, as well as a set of action parameter values for performing the action) that maximizes the probability of success and/or maximizes or minimizes uncertainty.
- the training component 724 E and/or the training circuit 730 may be used to evaluate the success of the performed action(s) and/or to update the machine learning ensemble based on the determined success, as discussed above. For example, the training component 724 E and/or the training circuit 730 may generate a success value or label based on the results of the action, and update the parameters of the corresponding portion(s) of the machine learning model that were used to select the action (e.g., the specific decoder and/or a subset of parameters for the encoder, such as the subset of parameters that correspond to the location/pixels where the action was performed).
- the affordance circuit 726 , the aggregation circuit 727 , the uncertainty circuit 728 , the action circuit 729 , and the training circuit 730 may collectively or individually be implemented in other processing devices of the processing system 700 , such as within the CPU 702 , the GPU 704 , the DSP 706 , the NPU 708 , and the like.
- processing system 700 and/or components thereof may be configured to perform the methods described herein.
- elements of the processing system 700 may be omitted, such as where the processing system 700 is a server computer or the like.
- the multimedia component 710 , the wireless connectivity component 712 , the sensor processing units 716 , the ISPs 718 , and/or the navigation processor 720 may be omitted in other aspects.
- aspects of the processing system 700 may be distributed between multiple devices.
- Clause 1 A method, comprising: accessing sensor data depicting a physical environment; generating a set of output affordance maps based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters; selecting, based on the set of output affordance maps, a first set of action parameters and the first location; and performing the first action at the first location in accordance with the first set of action parameters.
- Clause 2 A method according to Clause 1, further comprising generating a set of uncertainty maps based on the set of output affordance maps, comprising evaluating divergence between the set of output affordance maps, wherein the first set of action parameters and the first location are selected based further on the set of uncertainty maps.
- Clause 3 A method according to Clause 2, wherein generating the set of output affordance maps comprises: generating a first latent tensor based on processing the sensor data using a first encoder of the ensemble machine learning model; generating a plurality of aggregated latent tensors based on combining each of a plurality of action parameter tensors with the first latent tensor; and generating a first plurality of interim affordance maps based on processing each of the plurality of aggregated latent tensors using a first decoder of the ensemble machine learning model.
- Clause 4 A method according to Clause 3, wherein generating the set of output affordance maps further comprises: generating a second plurality of interim affordance maps based on a plurality of decoders of the ensemble machine learning model; and generating the set of output affordance maps based on aggregating the first and second pluralities of interim affordance maps.
- Clause 5 A method according to Clause 4, wherein selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while minimizing uncertainty.
- Clause 6 A method according to any of Clauses 3-5, wherein: the first decoder is selected, from a plurality of decoders, with at least an element of randomness, and selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while maximizing uncertainty.
- Clause 7 A method according to any of Clauses 3-6, wherein each of the plurality of action parameter tensors corresponds to at least one of: (i) an action orientation, (ii) an action force, or (iii) an action direction.
- Clause 8 A method according to Clause 7, wherein the action orientation comprises a grasp orientation for a robotic grasper.
- Clause 9 A method according to any of Clauses 1-8, further comprising: generating a success value based on the performance of the first action at the first location in accordance with the first set of action parameters; and updating one or more parameters of the ensemble machine learning model based on the success value.
- Clause 10 A method according to Clause 9, wherein updating the one or more parameters of the ensemble machine learning model comprises performing a masked backpropagation operation based on the first location such that one or more other parameters of the ensemble machine learning model corresponding to locations other than the first location are not updated based on the success value.
- Clause 11 A processing system comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any of Clauses 1-10.
- Clause 12 A processing system comprising means for performing a method in accordance with any of Clauses 1-10.
- Clause 13 A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any of Clauses 1-10.
- Clause 14 A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any of Clauses 1-10.
- an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein.
- the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.
- exemplary means “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
- a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members.
- “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
- determining encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining, and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Also, “determining” may include resolving, selecting, choosing, establishing, and the like.
- the methods disclosed herein comprise one or more steps or actions for achieving the methods.
- the method steps and/or actions may be interchanged with one another without departing from the scope of the claims.
- the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.
- the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions.
- the means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor.
- those operations may have corresponding counterpart means-plus-function components with similar numbering.
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Automation & Control Theory (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Manipulator (AREA)
Abstract
Certain aspects of the present disclosure provide techniques and apparatus for improved machine learning. Sensor data depicting a physical environment is accessed, and a set of output affordance maps is generated based on processing the sensor data using an ensemble machine learning model, where each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters. Based on the set of output affordance maps, a first set of action parameters and the first location are selected. The first action is performed at the first location in accordance with the first set of action parameters.
Description
- The present application for patent claims the benefit of priority to U.S. Provisional Appl. No. 63/502,752, filed May 17, 2023, which is hereby incorporated by reference herein in its entirety.
- Aspects of the present disclosure relate to machine learning.
- Robotic systems are used to perform a wide variety of tasks today. Additionally, the use of robots has increased substantially, and is expected to continue to increase. For example, robotic arms can be used to manipulate and move objects or to perform other actions, such as on a vehicle assembly line. As the desired tasks have expanded, the robotic control systems have similarly grown increasingly complex. Beyond controlling the positioning of robotic manipulators with high accuracy (which may include not only positioning and/or orientation of any end effectors such as graspers, but also of the other components of the arm itself), control systems may also obtain and use information about their environment. For example, before a robotic arm can be used to pick up objects in some cases, the control system may first determine environmental context, such as where the objects are, how the objects are positioned/oriented, how the objects can be lifted, and/or the like.
- Machine learning has revolutionized many fields and systems, including some aspects of robotics. However, dynamically controlling robotic systems based on the surrounding environment remains a highly difficult problem, even with advantages provided by some conventional machine learning solutions.
- Certain aspects of the present disclosure provide a processor-implemented method, comprising: accessing sensor data depicting a physical environment; generating a set of output affordance maps based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters; selecting, based on the set of output affordance maps, a first set of action parameters and the first location; and performing the first action at the first location in accordance with the first set of action parameters.
- Other aspects provide processing systems configured to perform the aforementioned methods as well as those described herein; non-transitory, computer-readable media comprising instructions that, when executed by one or more processors of a processing system, cause the processing system to perform the aforementioned methods as well as those described herein; a computer program product embodied on a computer-readable storage medium comprising code for performing the aforementioned methods as well as those further described herein; and a processing system comprising means for performing the aforementioned methods as well as those further described herein.
- The following description and the related drawings set forth in detail certain illustrative features of one or more aspects.
- The appended figures depict certain aspects of the present disclosure and are therefore not to be considered limiting of the scope of this disclosure.
- FIG. 1 depicts an example environment for training and using machine learning models to control robotic systems.
- FIG. 2 depicts an example architecture for generating affordance maps.
- FIG. 3 depicts an example architecture for generating affordance maps and uncertainty maps.
- FIG. 4 is a flow diagram depicting an example method for selecting and performing actions using machine learning.
- FIG. 5 is a flow diagram depicting an example method for generating affordance maps.
- FIG. 6 is a flow diagram depicting an example method for selecting and performing actions using machine learning.
- FIG. 7 depicts an example processing system configured to perform various aspects of the present disclosure.
- To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the drawings. It is contemplated that elements and features of one aspect may be beneficially incorporated in other aspects without further recitation.
- Aspects of the present disclosure provide apparatuses, methods, processing systems, and non-transitory computer-readable mediums for providing affordance-based reinforcement learning to improve action selection and performance, such as using robotic manipulators.
- As used herein, “affordances” refer to action possibilities that an actor or agent can perform in an environment. For example, in an environment with one or more objects located nearby, affordances (e.g., actions that can be performed) might include picking up an object, placing an object atop another object, sliding an object to a new spot, rotating an object, and the like. In some aspects of the present disclosure, robotic manipulators (e.g., robotic arms) are used as an example technology that can be improved or controlled using the techniques described herein. Specifically, in some aspects, grasping is used as an example action that a robotic manipulator can perform. However, aspects of the present disclosure are applicable to a wide variety of actions (such as pushing, pulling, placing objects on top of each other, inserting objects into other objects, extracting objects from within other objects, turning objects, and the like) which a robot may be capable of performing. Further, aspects of the present disclosure are applicable to a wide variety of non-robot technologies and solutions, including simulation or control of virtual entities (e.g., simulated robots) or other physical or virtual agents.
- In some aspects, sensor data from the environment is collected and evaluated using machine learning to generate or select an action, and a robotic manipulator is controlled to perform the action (or to attempt to perform the action). As used herein, “performing” an action may include successfully performing the action (e.g., picking up an object) as well as unsuccessfully performing the action (e.g., dropping an object or failing to grasp the object entirely). In some aspects, during a training or exploration phase, actions that may maximize (or at least increase) learning potential are selected, and the resulting action success or outcome can be evaluated to refine the models. In some aspects, during an inferencing or robustness phase, actions that maximize (or at least increase) probability of success and/or minimize (or at least reduce) uncertainty may be selected. In some aspects, during the inferencing phase, the system may continue to monitor the action success or outcome to further refine the models.
- In some aspects, the content and format of the evaluated sensor data may vary depending on the particular implementation and may include a variety of data such as, but not limited to, image data (e.g., captured using one or more imaging sensors), information about the robot's internal state (e.g., pose, velocity, force, grasp detection, what tool or end effector the robot is using, and the like), point cloud data (e.g., from light detection and ranging (LIDAR) or other depth sensors), radar data (which may include point cloud data, 3D volume heatmaps, and the like), ultrasonic data, and the like.
- In some aspects, machine learning models can be trained based on knowledge of the robot control itself (e.g., the system knows how to move the robotic manipulators to desired positions and orientations), without previous knowledge of the environment or objects (e.g., with no pre-training or other such knowledge).
- In some aspects, machine learning models are trained to identify and score the robot's affordances, which correspond to actions the robot (also referred to in some aspects as an agent) can perform in the environment. For example, given an image of the scene (including one or more objects), the models may generate affordance map(s) indicating, for each pixel or depicted location, the probability that a “grasp” action can be successfully completed (e.g., the probability that the robot would be successful in picking up the object if the robot grasped the object at the given location using one or more specific parameters, such as a defined grasper orientation).
- In this way, by allowing the system to explore and attempt to complete actions, the models can be trained based on (automated) experimentation rather than active labeling or manual effort. This can substantially improve model robustness while also substantially reducing the costs and delays of training the model(s) and/or deploying the robot to a new position or otherwise changing the environment or objects with which the robot interacts.
- FIG. 1 depicts an example environment 100 for training and using machine learning models to control robotic systems.
- In the illustrated example, a robotic arm 105 equipped with a robotic grasper end effector 107 is in an environment with one or more objects 110. In the illustrated environment 100, one or more sensors 115 are used to collect sensor data 120 for a space in the environment 100. For example, in some aspects, the sensor 115 includes an imaging sensor, such as a red-green-blue-depth (RGB-D) imaging sensor. In some aspects, one or more of the sensors 115 are configured to capture data (e.g., images) of the scene, which may include the objects 110 present, the robotic arm 105, and/or the end effector 107 itself. In some aspects, the sensor data 120 is collected continuously or periodically (e.g., several times per second) and evaluated by a control system 125 to generate actions 130. As described above, the actions 130 may include any suitable action that a robot may be capable of performing, such as, but not limited to, grasping, pushing, pulling, placing objects on top of each other, inserting objects into other objects, extracting objects from within other objects, turning objects, and the like.
- In some aspects, the control system 125 evaluates the sensor data 120 to predict which parts of the scene or the objects 110 are graspable by the robot (e.g., which objects can be grasped and moved by the robot and/or which part(s) of a given object are graspable), as well as how the object(s) or portion(s) thereof should be grasped (e.g., what orientation of the end effector 107 and/or what point or location on the object should be grasped).
- In some aspects, as discussed below in more detail, a deep learning solution (e.g., a convolutional-neural-network-based solution) is used by the control system 125, based on interactive learning and/or uncertainty minimization (or at least reduction). As discussed below in more detail, the control system 125 chooses both a location to grasp in the environment and one or more action parameters (e.g., an angle or orientation of the end effector 107 to specify the grasping direction). As used herein, the action parameters can generally correspond to or include a wide variety of parameters, depending on the particular implementation and/or action. For example, the action parameters may include an action orientation (e.g., the orientation of the end effector 107 or other entity), an action force (e.g., how much force the robotic arm 105 should apply to the object, such as via the grasper (e.g., how tightly to grasp) and/or how much force to apply to push or move the object), an action direction (e.g., which direction the robotic arm 105 should move during the action, such as which direction to push, pull, or turn the object), and the like.
- In the illustrated example, the control system 125 selects one or more actions 130 based on evaluating the sensor data 120 using one or more machine learning models. For example, the action(s) 130 may indicate a specific set of action parameters (e.g., an orientation of the end effector 107) and a specific location in the scene (e.g., where on the object 110 the robotic arm 105 should grasp). The robotic arm 105 and/or the end effector 107 can then be driven to the indicated location for grasping at the specified orientation, and the end effector 107 attempts to grasp the object (or perform some other action). In some aspects, in order to refine or update the models, the success of the attempt can be determined by the one or more sensors 115 (e.g., load cell sensors, imaging sensors, and the like).
- In some aspects, the control system 125 can thereby be used to repeatedly generate the action(s) 130 for the robotic arm 105, observing the results and refining the models. After experimentation (e.g., attempting to perform the action some number of times), the models can be used to provide robust and accurate actions with a high success rate.
- In some aspects, the control system 125 may perform both model training (e.g., during an exploration phase) and runtime inferencing (e.g., during a robustness phase). In other aspects, the model(s) may be trained by one or more other systems before being deployed to the control system 125 (or the control system 125 may train the model(s) before deploying the model(s) to other inferencing system(s)).
- FIG. 2 depicts an example architecture 200 for generating affordance maps. In some aspects, the architecture 200 is used by a control system, such as the control system 125, to train the model(s) and/or to generate affordance maps that drive action selection, as discussed in more detail below.
- In the illustrated example, sensor data 205 (which may correspond to the sensor data 120 of FIG. 1) is evaluated by a machine learning model 207 (referred to in some aspects as an affordance model) to generate affordance maps 230. In some aspects, the machine learning model 207 is an ensemble (e.g., a combination of multiple models) of deep learning models (e.g., convolution-based models). In some aspects, the input sensor data 205 is collected and/or received from a camera (e.g., the sensor data may indicate the color and/or depth of each pixel in an image).
- In the illustrated example, the machine learning model 207 includes one or more encoders 210 and one or more decoders 225. In some aspects, if the machine learning model 207 is an ensemble, then each encoder and decoder pair may correspond to a single model within the ensemble. That is, there may be multiple models, each including a corresponding encoder 210 and decoder 225, in the machine learning model 207. In some aspects, a single shared encoder 210 may be used in combination with a set of multiple decoders 225 in the ensemble.
- In some aspects, the machine learning model 207 (or each branch thereof) is implemented as a U-Net. For example, the encoder 210 may comprise or correspond to one or more convolutions and/or downsampling operations, while the decoder 225 may comprise or correspond to one or more corresponding convolutions and/or upsampling operations. In some aspects, one or more skip connections may also be used to directly provide intermediate features (generated by the encoder 210) as input to one or more operations of the decoder 225.
- As illustrated, each encoder 210 generates a latent tensor 215 based on the input sensor data 205. For example, as discussed above, the encoder 210 may process the sensor data 205 using one or more convolution layers to extract salient features and generate the latent tensor 215. In the illustrated example, in this latent space, an action parameter tensor 220 can be combined with the latent tensor 215. For example, the action parameter tensor 220 may be appended to or concatenated with the latent tensor 215, added to the latent tensor (e.g., via element-wise addition), and the like.
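- As a concrete illustration of this combination step, the following minimal sketch concatenates a spatially tiled, one-hot action parameter tensor onto a latent tensor. All shapes, the one-hot encoding, and the tensor names are illustrative assumptions rather than details of the machine learning model 207:

```python
# Hypothetical sketch: combine a latent tensor with an action parameter
# tensor by channel-wise concatenation (one assumed option among those
# described above, alongside element-wise addition).
import torch

batch, channels, h, w = 1, 64, 28, 28
latent = torch.randn(batch, channels, h, w)       # stand-in for latent tensor 215

num_orientations = 300                            # assumed discretization
orientation_idx = 42                              # hypothetical selected orientation
param = torch.zeros(batch, num_orientations, 1, 1)
param[:, orientation_idx] = 1.0                   # one-hot action parameter encoding
param = param.expand(-1, -1, h, w)                # tile over the spatial grid

aggregated = torch.cat([latent, param], dim=1)    # aggregated latent tensor
print(aggregated.shape)                           # torch.Size([1, 364, 28, 28])
```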
- In some aspects, the action parameter tensor 220 may generally encode one or more action parameters for performing the action, as discussed above. For example, the action parameter tensor 220 may encode the grasping orientation to be used. In some aspects, the control system generates multiple combined or aggregated latent tensors using multiple action parameter tensors 220. For example, for each respective combination of action parameter values, the control system may generate a corresponding aggregated latent tensor (including both the latent tensor 215 and a respective action parameter tensor 220).
- As one example, for categorical action parameters (e.g., whether to push or pull an object, whether to rotate the object left or right, and the like), the action parameter tensor 220 may encode a specific combination or set of categories (e.g., a first set of action parameters indicating to rotate the object to the right while pushing the object, a second set indicating to rotate the object to the right while pulling the object, a third set indicating to rotate the object to the left while pushing the object, and a fourth set indicating to rotate the object to the left while pulling the object).
- As another example, in some aspects, continuous action parameters (e.g., grasp orientation, action force, and the like) may be discretized into a set of categories or values, and the action parameter tensor 220 may encode a specific combination of such categories or values. For example, the orientation and force options may be discretized into some number (e.g., five hundred) of possible orientations and/or forces.
- In this way, the control system may use a single latent tensor 215 to generate a larger number of aggregated latent tensors by combining a copy or instance of the single latent tensor 215 with each unique action parameter tensor 220 in turn (in sequence or in parallel).
- In the illustrated example, the control system then passes each aggregated latent tensor through a decoder 225 to generate one or more affordance maps 230. As used herein, an "affordance map" is generally a data structure representing the probabilities that one or more locations in an environment correspond to possible action(s). For example, an affordance map may indicate, for each location (e.g., each pixel in an image), the probability that one or more actions can be performed at the location (e.g., a grasping action). In some aspects, each decoder 225 generates an affordance map 230 for each aggregated latent tensor. For example, if grasp orientation is the only action parameter and there are three hundred discrete orientations that the control system considers, then three hundred aggregated latent tensors may be generated (based on a single latent tensor 215 if a shared encoder 210 is used, or based on multiple latent tensors if multiple encoders are used), and the decoder 225 may be used to generate three hundred affordance maps 230 (in sequence or in parallel). Additionally, as discussed above, if the machine learning model 207 is an ensemble (e.g., with multiple decoders, each either using a corresponding encoder or using a shared encoder), then each decoder 225 may generate the same number of affordance maps 230. Continuing the above example, if there are five branches or decoders 225 in the machine learning model 207, then each decoder 225 may generate a corresponding set of three hundred affordance maps 230, for a total of fifteen hundred affordance maps 230 generated based on a single set of input sensor data 205.
- Further, if a separate encoder 210 is used for each of the decoders 225, each encoder 210 may be used to generate a corresponding latent tensor 215, each of which may be used to generate a set of aggregated latent tensors. Continuing the above example, if the machine learning model 207 includes five branches (e.g., five encoder-decoder pairs) and the action parameter has three hundred discrete orientations or alternatives, a set of three hundred aggregated latent tensors may be generated for each encoder-decoder pair (resulting in fifteen hundred aggregated latent tensors), and each aggregated latent tensor may then be processed using a corresponding decoder 225 (e.g., using the decoder 225 that corresponds to the encoder 210 used to generate each given aggregated latent tensor), resulting in fifteen hundred affordance maps 230.
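- The fan-out described above can be sketched as follows: each encoder-decoder pair produces one interim map per action parameter tensor, so B branches and P parameter sets yield B x P maps. The function and argument names here are assumptions for illustration only:

```python
# Illustrative sketch of the ensemble fan-out: with B branches and P action
# parameter tensors, B x P affordance maps are produced per input.
import torch

def generate_branch_maps(sensor_data, encoders, decoders, param_tensors):
    """encoders/decoders: paired lists of modules; param_tensors: list of
    (1, P_c, 1, 1) tensors, one per unique set of action parameter values."""
    maps = {}                                      # (branch, param_set) -> map
    for b, (enc, dec) in enumerate(zip(encoders, decoders)):
        latent = enc(sensor_data)                  # (1, C, H, W)
        h, w = latent.shape[2:]
        for p, param in enumerate(param_tensors):
            tiled = param.expand(-1, -1, h, w)     # broadcast over the image
            logits = dec(torch.cat([latent, tiled], dim=1))
            maps[(b, p)] = torch.sigmoid(logits)   # per-pixel success probability
    return maps
```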
- As each affordance map 230 is generated based on a corresponding set of action parameters (encoded in the action parameter tensor 220), each affordance map thereby corresponds to or indicates a predicted set of success probabilities if the corresponding set of action parameters is used to perform the action. In some aspects, the affordance maps 230 indicate the probability that the given action will be successfully completed for each location or point in the scene (as depicted in the sensor data 205) if the corresponding set of action parameters is used (e.g., the probability that a grasp action will be successful if the end effector is used to grasp at each location). For example, if the sensor data 205 comprises image data, then each affordance map 230 may include a predicted success probability for each pixel (or other logical portion) of the image, indicating the probability that the action will be successful if the action is performed in accordance with the corresponding action parameter(s) at the physical location that corresponds to or is depicted by the pixel.
- In some aspects, the affordance maps 230 can be collectively thought of as maps of Bernoulli distributions, one for each point or pixel in the input data. That is, each decoder 225 in the ensemble generates a corresponding affordance map 230 for each set of action parameters. Accordingly, for each location (e.g., each pixel), there may be multiple predicted success probabilities for each set of action parameters (one generated by each decoder 225).
- In some aspects, during training, the control system explores uncertainty in grasping points and orientations (or other action parameters), as discussed in more detail below. This can allow the control system to rapidly learn (e.g., to update the parameters of the decoder(s) 225 and encoder(s) 210). In some aspects, during runtime (when robustness is desired), the control system may evaluate the affordance maps 230 to identify the specific action (e.g., a specific location and grasp orientation) that results in the highest probability of success.
- FIG. 3 depicts an example architecture 300 for generating affordance maps and uncertainty maps. In some aspects, the architecture 300 is used by a control system, such as the control system 125, to train the model(s) and/or to generate affordance maps that drive action selection, as discussed in more detail below. In some aspects, the architecture 300 provides additional detail for the architecture 200 of FIG. 2. In the illustrated example, sensor data 305 (which may correspond to the sensor data 120 of FIG. 1 and/or the sensor data 205 of FIG. 2) is evaluated to generate affordance maps 345 (which may correspond to the affordance maps 230 of FIG. 2) and uncertainty maps 350.
- In the illustrated example, the sensor data 305 is processed by an encoder 310 (which may correspond to the encoder 210 of FIG. 2) to generate a latent tensor, which is combined with one or more action parameter tensors and is processed by a set of decoders 325A-C (collectively, decoders 325), which may correspond to the decoder 225 of FIG. 2. For example, as discussed above, each decoder 325 may correspond to a branch or model of the ensemble. In the illustrated example, a shared encoder 310 is used for each decoder 325. In some aspects, as discussed above, each decoder 325 may have its own corresponding encoder 310. Additionally, though three decoders 325 are depicted, in other aspects, there may be any number of decoders 325 or branches in the model ensemble.
- As illustrated, each decoder 325 generates an interim affordance map 330 for each set of action parameters based on the sensor data 305. Specifically, the decoder 325A generates the interim affordance maps 330A, the decoder 325B generates the interim affordance maps 330B, and the decoder 325C generates the interim affordance maps 330C. In some aspects, as discussed above, the interim affordance maps 330 may generally indicate probabilities that an action will be successful if the action is performed at one or more specific locations using one or more specific action parameters (e.g., at a specific point on an object and using a specific grip orientation).
- In the illustrated example, the generated interim affordance maps 330 are provided to an aggregation component 335 and an uncertainty component 340. Generally, the aggregation component 335 aggregates the interim affordance maps 330 to generate the output affordance map(s) 345. For example, the aggregation component 335 may perform element-wise summation or averaging. In some aspects, each affordance map 345 may therefore include action success probabilities determined based on the collective predictions contained within each interim affordance map 330 (e.g., the average probability of success for each pixel). In some aspects, as discussed above, there may be an affordance map 345 for each unique set of possible action parameters for performing the action.
- That is, for each respective set of action parameter values (e.g., each action parameter tensor), the aggregation component 335 may identify the corresponding set of interim affordance maps 330 (one generated by each decoder 325) for the set of parameter values, and aggregate this set to generate an output affordance map 345 for the set of action parameter values. In this way, the total number of affordance maps 345 may match the number of unique action parameter value combinations. For example, if there are three hundred unique options, then the aggregation component 335 may generate three hundred output affordance maps 345, one for each option.
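- A minimal sketch of this aggregation step, assuming the element-wise averaging option and assuming the interim maps are stacked into a single array, might look as follows:

```python
# Hedged sketch: average the branch predictions for each action parameter
# set to form the output affordance maps (element-wise averaging option).
import numpy as np

def aggregate_interim_maps(interim_maps: np.ndarray) -> np.ndarray:
    """interim_maps: (branches, param_sets, H, W) success probabilities."""
    return interim_maps.mean(axis=0)              # -> (param_sets, H, W)

# Example: 5 branches and 300 parameter sets yield 300 output maps.
interim = np.random.rand(5, 300, 64, 64)
output_maps = aggregate_interim_maps(interim)     # shape (300, 64, 64)
```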
- In the illustrated example, the uncertainty component 340 generates a set of uncertainty maps 350 based on the interim affordance maps 330. In some aspects, the uncertainty maps 350 indicate the uncertainty of the model with respect to the affordance maps. For example, if the predicted probability of success for a single point varies substantially between the interim affordance maps 330A, 330B, and 330C, then the uncertainty component 340 may determine that uncertainty is high for the single point. In some aspects, the uncertainty maps 350 are generated using a Jensen-Shannon divergence (JSD) approach (also referred to in some aspects as the information radius).
- In some aspects, a respective uncertainty map 350 is generated for each set of action parameters. That is, for each respective set of action parameter values (e.g., each action parameter tensor), the uncertainty component 340 may identify the corresponding set of interim affordance maps 330 (one generated by each decoder 325) for the set of parameter values, and evaluate this set to generate the uncertainty map 350 for the set of action parameter values, indicating the success uncertainty at each location if the set of action parameter values is used. In this way, the total number of uncertainty maps 350 may match the number of unique action parameter value combinations. For example, if there are three hundred unique options, then the uncertainty component 340 may generate three hundred output uncertainty maps 350, one for each option.
- In some aspects, the uncertainty value for each point (e.g., each pixel) may be defined using Equation 1 below, where u(s, a) is the uncertainty value for a given state s (e.g., the state of the robot and/or environment, such as for a given location or pixel in the input) and set of action parameters a, JSD(⋅) is the JSD function, p(g|s, a, θ) is the probability of successfully performing the action g with the action parameters a in state s (e.g., at a given location in the environment), and θ is a set of parameters sampled from the set of ensemble parameters Θ (where θ corresponds to the parameters of a specific model or branch of the ensemble, such as a single decoder 325):

$$u(s, a) = \mathrm{JSD}\big(\{p(g \mid s, a, \theta)\}_{\theta \sim \Theta}\big) = H\Big(\mathbb{E}_{\theta \sim \Theta}\big[p(g \mid s, a, \theta)\big]\Big) - \mathbb{E}_{\theta \sim \Theta}\Big[H\big(p(g \mid s, a, \theta)\big)\Big] \tag{1}$$

- That is, the uncertainty may be defined as the entropy (H) of the expected (𝔼) probability of success (e.g., the mean probability across the interim affordance maps 330 for the set of action parameters), minus the expected entropy of the predicted probabilities of success.
- In this way, the uncertainty component 340 can generate a respective uncertainty map 350 for each respective set of action parameters, indicating the model uncertainty with respect to each location in the space (e.g., for each pixel in image data) and with respect to each set of action parameters.
- In some aspects, these uncertainty maps 350 may be used during training and/or during inferencing. For example, during training, the control system may use the affordance maps 345 and uncertainty maps 350 to select an action that maximizes (or at least increases) predicted success while also maximizing (or at least increasing) uncertainty in order to learn more rapidly. During inferencing (when maximum robustness is desired), the control system may select an action that maximizes, or at least increases, predicted success. In some aspects, in addition to maximizing predicted success, the control system may also seek to minimize, or at least reduce, the uncertainty.
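- For illustration, Equation 1 can be evaluated per pixel using the Bernoulli entropy, since each prediction is a per-pixel success probability. The following sketch assumes the same stacked array layout as the aggregation example above:

```python
# Sketch of Equation 1 per pixel: H(E[p]) - E[H(p)] over the ensemble, using
# the Bernoulli entropy because each pixel prediction is a success probability.
import numpy as np

def bernoulli_entropy(p, eps=1e-8):
    p = np.clip(p, eps, 1.0 - eps)                # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log1p(-p))

def uncertainty_maps(interim_maps: np.ndarray) -> np.ndarray:
    """interim_maps: (branches, param_sets, H, W) success probabilities."""
    mean_p = interim_maps.mean(axis=0)                           # E_theta[p]
    entropy_of_mean = bernoulli_entropy(mean_p)                  # H(E[p])
    mean_entropy = bernoulli_entropy(interim_maps).mean(axis=0)  # E[H(p)]
    return entropy_of_mean - mean_entropy                        # (param_sets, H, W)
```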
- In some aspects, during the training or exploration phase, the control system can perform ensemble sampling. For example, for each set of input sensor data 305 (e.g., each time an action is requested or desired), one member of the ensemble (e.g., one decoder 325) may be selected with at least an element of randomness (e.g., selecting the decoder randomly or pseudo-randomly). In some aspects, the interim affordance maps 330 generated by this selected decoder are the most important or dominant maps (or the only maps) used during this exploration stage for the current input data. For example, rather than using the output affordance maps 345, the control system may use the interim affordance maps 330 generated by the (randomly selected) decoder 325 during exploration. This can make the training process faster by adding noise to the training data to accelerate generalization.
- In some aspects, the uncertainty values (reflected in the uncertainty maps 350) may be summed with the probability values of the corresponding interim affordance maps 330 of the selected decoder 325. That is, for each set of action parameter values, the control system may sum the corresponding uncertainty map 350 with the corresponding interim affordance map 330. For example, the control system may perform element-wise summation to add the uncertainty value for each location (e.g., each pixel) to the predicted probability of action success for each location. In some aspects, this summation is performed for each interim affordance map 330 generated by the selected decoder 325 (e.g., for each set of action parameters).
- As the uncertainty maps 350 reflect the information radius with respect to performing the action using each configuration of action parameters, the control system can use the uncertainty maps to provide a proxy of the information that can be gained by attempting the action at each location using the indicated set of parameters. By summing affordance probabilities and the uncertainty values, the control system can obtain an upper confidence bound (UCB) for exploration, which can be used to efficiently learn to find new graspable configurations in the scene. In some aspects, at each time step (e.g., for each set of input sensor data 305 or each time an action is requested or desired), the control system can score the possible configurations (e.g., each combination of a location and a set of action parameters) and select the highest-valued configuration (e.g., the location and set of action parameters having the highest score) to test.
-
$$r(s, a) = p(g \mid s, a, \theta) + u(s, a) \tag{2}$$
- As discussed above, this action may then be performed, and the success of the action can be evaluated to update or refine one or more parameters of the model. In some aspects, as discussed below in more detail, the control system may update a subset of the parameters, rather than all parameters. For example, the control system may only update the parameters of a selected decoder 325, leaving the other decoders unchanged, based on the success of the action. Similarly, in some aspects, the control system may use masked updating (e.g., masked backpropagation) to update only a subset of those parameters of the selected decoder 325, such as by updating only the parameters that correspond to the selected action location (e.g., the parameters used to predict the success probability for the selected pixel(s)), such that parameters corresponding to other locations (e.g., other pixels in the interim affordance map 330) are unchanged.
- In some aspects, during evaluation or use (e.g., runtime inferencing), where maximum accuracy may be preferred, the control system may use the average affordance probability map(s) (e.g., the affordance maps 345), obtained by averaging the probability values of the components in the ensemble, to select the best configuration to perform the action (e.g., the location and set of action parameters with the highest predicted probability of success). In some aspects, the control system may optionally incorporate the uncertainty maps 350 into this selection process (e.g., to select the least ambiguous configurations that are most likely to result in success).
- In some aspects, during this runtime or robustness phase, the actions are sampled or selected according to Equation 3 below, where r (s, a) is the generated score of a given state s (e.g., a given location) using a given set of action parameters a, p(g|s, a, θ) is the predicted probability of success for performing the action g with the action parameters a in state s, as generated by a specific portion of the model (e.g., the interim affordance map 330 generated using a single decoder 325 that corresponds to parameters θ), and θ˜Θ reflects that the expected value (e.g., the average value across the interim affordance maps 330) is evaluated:
-
- In this way, the control system may generate a respective score for each respective pixel or location in each respective affordance map 345 (e.g., for each set of action parameters). In some aspects, the control system then evaluates the generated scores to select the peak or highest score (e.g., the location and set of action parameters having the highest generated value). In this way, during inference, the control system selects the action based on determining that performing the selected action (e.g., the action at the selected location and using the selected parameters) will maximize, or at least increase, the predicted success while also minimizing, or at least reducing, the uncertainty.
- In some aspects, in a similar manner to training, the selected action may then be performed, and the success of the action can be optionally evaluated to update or refine one or more parameters of the model.
-
FIG. 4 is a flow diagram depicting anexample method 400 for selecting and performing actions using machine learning. In some aspects, themethod 400 is performed by a control system, such as thecontrol system 125 ofFIG. 1 , which may use an architecture for generating affordance maps, such as thearchitecture 200 ofFIG. 2 and/or thearchitecture 300 ofFIG. 3 . - At
- At block 405, the control system accesses sensor data (e.g., the sensor data 120 of FIG. 1, the sensor data 205 of FIG. 2, and/or the sensor data 305 of FIG. 3). As used herein, "accessing" data may generally include receiving, requesting, retrieving, collecting, generating, or otherwise gaining access to the data. For example, as discussed above, the control system may access the sensor data continuously or periodically (e.g., every second), or each time an action is desired (e.g., each time the control system or another entity desires to perform the action, such as grasping an object and picking the object up). As discussed above, the sensor data may generally include a wide variety of data, including image data, depth data, point clouds, and the like.
- At block 410, the control system generates a set of affordance maps (e.g., the affordance maps 230 of FIG. 2 and/or the affordance maps 345 of FIG. 3) by processing the sensor data using a machine learning model (e.g., an ensemble model), as discussed above. One example of generating the set of affordance maps is described in more detail below with reference to FIG. 5.
- At block 415, the control system generates a set of uncertainty maps (e.g., the uncertainty maps 350 of FIG. 3) based on the interim affordance maps, as discussed above. For example, the control system may use Equation 1 to evaluate an uncertainty value for each pixel or location.
- At block 420, the control system selects an action based on the affordance maps and/or uncertainty maps. As discussed above, selecting an action may generally include selecting both a point in the space where the action will be performed (e.g., a location on an object depicted by a pixel in the affordance maps) and a set of action parameters (e.g., a grasp orientation, a grip force, and the like). In some aspects, as discussed above, the control system may evaluate the affordance maps and uncertainty maps during exploration (e.g., using Equation 2 above) to select the action. In some aspects, as discussed above, the control system may evaluate only the affordance maps or may evaluate both the affordance maps and the uncertainty maps during runtime use (when robustness is desired), such as using Equation 3 above.
- At block 425, the control system can perform the selected action. As used herein, "performing" the action may include transmitting, instructing, or otherwise facilitating performance of the action by another entity, such as a robotic arm. That is, "performing" the action may include instructing a robot (or another system that controls the robot) to perform the indicated action (e.g., to perform the action at the indicated location using the indicated action parameters).
- In some aspects, during runtime, the method 400 can then terminate or loop back to block 405 to select the next action. In some aspects, during exploration and/or when the control system is collecting data for potential further training, the method 400 continues to block 430.
- At block 430, the control system generates a success value based on the performance of the action. For example, as discussed above, the control system may evaluate one or more sets of sensor data during and/or after performance of the action to evaluate how successful the action was. In some aspects, the success value is a categorical (e.g., binary) value, such as indicating whether the action was performed successfully. In some such aspects, the success criteria used to define whether a given action was successful may be defined based on the particular action. For example, with respect to a grasp action, the success criteria may include considerations such as whether the robot successfully picked up the object, whether the robot was able to lift and hold the object for at least some minimum period of time, whether the robot was able to rotate the object some amount, whether the robot was able to retain grip on the object while shaking or moving the object/end effector, and the like.
- In some aspects, the success value is a continuous value indicating the degree of success. For example, the success value may be defined based on the acceleration the robot is able to undergo while maintaining the robot's grasp on the object (e.g., where higher accelerations of the end effector result in higher success scores).
- At block 435, the control system updates one or more model parameters of the ensemble model based on the generated success value. In some aspects, as discussed above, the control system may use masked updating (e.g., masked backpropagation in the case of convolutional models) based on the selected location or pixel (where the action was performed), such that other parameters of the ensemble machine learning model corresponding to locations other than the selected location are not updated based on the success value. For example, the control system may generate a loss based on the success value, and mask the loss based on the specific location(s) or pixels (in the affordance map(s)) used to select the action (e.g., where the peak was). This masked loss can then be used to perform a masked backpropagation operation to update the corresponding (relevant) parameters of the model.
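- One way to realize such a masked update, sketched below under the assumption of a binary success label and a single selected pixel, is to compute the loss only at the location where the action was attempted (indexing a single pixel is equivalent to masking the loss everywhere else):

```python
# Hedged sketch of a masked update: the loss is computed only at the selected
# pixel, so backpropagation leaves predictions at other locations untouched.
import torch
import torch.nn.functional as F

def masked_update(pred_map, selected_pixel, success, optimizer):
    """pred_map: (H, W) success probabilities from the selected branch,
    still attached to the computation graph; success: bool outcome."""
    y, x = selected_pixel
    target = torch.tensor(float(success))
    loss = F.binary_cross_entropy(pred_map[y, x], target)  # single-pixel loss
    optimizer.zero_grad()
    loss.backward()            # gradients flow only through pred_map[y, x]
    optimizer.step()
```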
- Although the illustrated example depicts updating the model parameters based on a single selected action/experiment (e.g., using stochastic gradient descent) for conceptual clarity, in some aspects the control system may update the model based on batches of data (e.g., using batch gradient descent).
- In some aspects, during runtime, the
method 400 can then terminate or loop back to block 405 to select the next action for new input data. -
FIG. 5 is a flow diagram depicting anexample method 500 for generating affordance maps. In some aspects, themethod 500 is performed by a control system, such as thecontrol system 125 ofFIG. 1 . In some aspects, themethod 500 provides additional detail forblock 410 ofFIG. 4 . - At
- At block 505, the control system generates one or more latent tensors (e.g., the latent tensors 215 of FIG. 2) by processing the input sensor data using one or more encoders (e.g., the encoders 210 of FIG. 2). That is, if a single (shared) encoder is used, then the control system may generate a single latent tensor based on the input sensor data. If multiple encoders are used (e.g., one for each branch or model of the ensemble), then the control system may generate a respective latent tensor using each respective encoder.
- At block 510, the control system selects value(s) for a set of action parameter(s). In some aspects, as discussed above, each action that the system is able to perform may have an associated set of relevant action parameters indicating how to perform the action. For example, for a grasping action, the action parameters may include a grasping orientation (of the end effector), an amount of force with which to grip the object, a direction to move or turn the object after gripping, and the like. In some aspects, as discussed above, the values for continuous action parameters may be discretized into a set of categories or discrete values. For example, for an action parameter corresponding to the end effector orientation (which may have rotation components in multiple dimensions), there may be an infinite (or extremely large) number of possible rotation values. In some aspects, therefore, a discrete set of orientations may be defined (e.g., five hundred different possible orientations, out of many more that are technically possible).
- In some aspects, at block 510, the control system selects a value or category for each relevant action parameter. In some aspects, the control system may select the set of parameters using any suitable criteria or technique, including randomly or pseudo-randomly. For example, in some aspects, the control system will select each possible combination of action parameters during the method 500 (sequentially or in parallel). In some aspects, at block 510, the control system may generate an action parameter tensor encoding the selected values.
- At block 515, the control system generates one or more aggregated latent tensors based on the selected action parameters and/or the generated action parameter tensor. For example, if a single shared encoder is used to create a single latent tensor, then the control system may generate an aggregated latent tensor by combining the action parameter tensor with the latent tensor (e.g., using concatenation). In some aspects, if multiple encoders are used (e.g., one for each branch of the ensemble), then the control system may combine the generated action parameter tensor with each respective latent tensor.
- At block 520, the control system generates a set of interim affordance maps (e.g., the interim affordance maps 330 of FIG. 3) by processing the aggregated latent tensor(s) using each branch (e.g., each decoder, such as the decoders 325 of FIG. 3) of the ensemble, as discussed above.
- At block 525, the control system determines whether there is at least one additional set of action parameters that has not been used to generate an aggregated latent tensor. For example, if there are three hundred unique sets of values, then the control system may determine whether each unique set of values has been evaluated. If there is at least one additional set of values remaining, then the method 500 returns to block 510. If not, then the method 500 continues to block 530. Although the illustrated example depicts an iterative process for conceptual clarity (selecting and evaluating each set of values sequentially), in some aspects, some or all of the alternative parameter values may be evaluated in parallel, as discussed above.
- At block 530, the control system optionally aggregates the interim affordance maps. Generally, the control system may use a variety of techniques to aggregate the interim affordance maps. For example, in some aspects, for each set of interim affordance maps that corresponds to the same set of action parameter values (e.g., one from each decoder in the ensemble), the control system may generate an output affordance map reflecting (for each pixel or location) the average of the values from the set, the sum of the values in the set, and the like.
- The
method 500 then terminates (e.g., returning to block 415 ofFIG. 4 ). -
- FIG. 6 is a flow diagram depicting an example method 600 for selecting and performing actions using machine learning. In some aspects, the method 600 is performed by a control system, such as the control system 125 of FIG. 1.
- At block 605, sensor data depicting a physical environment is accessed.
- At block 610, a set of output affordance maps is generated based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters.
- At block 615, a first set of action parameters and the first location are selected based on the set of output affordance maps.
- At block 620, the first action is performed at the first location in accordance with the first set of action parameters.
- In some aspects, the method 600 further includes generating a set of uncertainty maps based on the set of output affordance maps, comprising evaluating divergence between the set of output affordance maps, wherein the first set of action parameters and the first location are selected based further on the set of uncertainty maps.
- In some aspects, generating the set of output affordance maps further comprises: generating a second plurality of interim affordance maps based on a plurality of decoders of the ensemble machine learning model, and generating the set of output affordance maps based on aggregating the first and second pluralities of interim affordance maps.
- In some aspects, selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while minimizing uncertainty.
- In some aspects, the first decoder is selected, from a plurality of decoders, with at least an element of randomness, and selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while maximizing uncertainty.
- In some aspects, each of the plurality of action parameter tensors corresponds to at least one of: (i) an action orientation, (ii) an action force, or (iii) an action direction.
- In some aspects, the action orientation comprises a grasp orientation for a robotic grasper.
- In some aspects, the
method 600 further includes generating a success value based on the performance of the first action at the first location in accordance with the first set of action parameters, and updating one or more parameters of the ensemble machine learning model based on the success value. - In some aspects, updating the one or more parameters of the ensemble machine learning model comprises performing a masked backpropagation operation based on the first location such that one or more other parameters of the ensemble machine learning model corresponding to locations other than the first location are not updated based on the success value.
- In some aspects, the workflows, techniques, and methods described with reference to
FIGS. 1-6 may be implemented on one or more devices or systems.FIG. 7 depicts anexample processing system 700 configured to perform various aspects of the present disclosure, including, for example, the techniques and methods described with respect toFIGS. 1-6 . In some aspects, theprocessing system 700 may correspond to a control system, such as thecontrol system 125 ofFIG. 1 . For example, theprocessing system 700 may correspond to a device that controls robotic manipulators, trains affordance prediction models, and/or uses affordance prediction models during runtime. Although depicted as a single system for conceptual clarity, in some aspects, as discussed above, the operations described below with respect to theprocessing system 700 may be distributed across any number of devices or systems. - The
processing system 700 includes a central processing unit (CPU) 702, which in some examples may be a multi-core CPU. Instructions executed at theCPU 702 may be loaded, for example, from a program memory associated with theCPU 702 or may be loaded from a memory partition (e.g., a partition of memory 724). - The
processing system 700 also includes additional processing components tailored to specific functions, such as a graphics processing unit (GPU) 704, a digital signal processor (DSP) 706, a neural processing unit (NPU) 708, a multimedia component 710 (e.g., a multimedia processing unit), and awireless connectivity component 712. - An NPU, such as
- An NPU, such as the NPU 708, is generally a specialized circuit configured for implementing the control and arithmetic logic for executing machine learning algorithms, such as algorithms for processing artificial neural networks (ANNs), deep neural networks (DNNs), random forests (RFs), and the like. An NPU may sometimes alternatively be referred to as a neural signal processor (NSP), tensor processing unit (TPU), neural network processor (NNP), intelligence processing unit (IPU), vision processing unit (VPU), or graph processing unit.
- NPUs, such as the NPU 708, are configured to accelerate the performance of common machine learning tasks, such as image classification, machine translation, object detection, and various other predictive models. In some examples, a plurality of NPUs may be instantiated on a single chip, such as a system on a chip (SoC), while in other examples the NPUs may be part of a dedicated neural-network accelerator.
- NPUs designed to accelerate training are generally configured to accelerate the optimization of new models, which is a highly compute-intensive operation that involves inputting an existing dataset (often labeled or tagged), iterating over the dataset, and then adjusting model parameters, such as weights and biases, in order to improve model performance. Generally, optimizing based on a wrong prediction involves propagating back through the layers of the model and determining gradients to reduce the prediction error.
- NPUs designed to accelerate inference are generally configured to operate on complete models. Such NPUs may thus be configured to input a new piece of data and rapidly process this piece of data through an already trained model to generate a model output (e.g., an inference).
- In some implementations, the
NPU 708 is a part of one or more of theCPU 702, theGPU 704, and/or theDSP 706. - In some examples, the
wireless connectivity component 712 may include subcomponents, for example, for third generation (3G) connectivity, fourth generation (4G) connectivity (e.g., Long-Term Evolution (LTE)), fifth generation (5G) connectivity (e.g., New Radio (NR)), Wi-Fi connectivity, Bluetooth connectivity, and/or other wireless data transmission standards. Thewireless connectivity component 712 is further coupled to one ormore antennas 714. - The
processing system 700 may also include one or moresensor processing units 716 associated with any manner of sensor, one or more image signal processors (ISPs) 718 associated with any manner of image sensor, and/or anavigation processor 720, which may include satellite-based positioning system components (e.g., GPS or GLONASS), as well as inertial positioning system components. - The
processing system 700 may also include one or more input and/oroutput devices 722, such as screens, touch-sensitive surfaces (including touch-sensitive displays), physical buttons, speakers, microphones, and the like. - In some examples, one or more of the processors of the
- In some examples, one or more of the processors of the processing system 700 may be based on an ARM or RISC-V instruction set.
- The processing system 700 also includes the memory 724, which is representative of one or more static and/or dynamic memories, such as a dynamic random access memory, a flash-based static memory, and the like. In this example, the memory 724 includes computer-executable components, which may be executed by one or more of the aforementioned processors of the processing system 700.
- In particular, in this example, the memory 724 includes an affordance component 724A, an aggregation component 724B, an uncertainty component 724C, an action component 724D, and a training component 724E. The memory 724 further includes model parameters 724F for one or more models (e.g., affordance prediction models, such as the machine learning model 207 of FIG. 2, which may include one or more encoders such as the encoder 310 of FIG. 3 and/or one or more decoders such as the decoders 325 of FIG. 3). Although not included in the illustrated example, in some aspects the memory 724 may also include other data, such as a list of available or possible actions that the robotic manipulator(s) can perform, relevant action parameters for each action, possible values for each action parameter, and the like. Though depicted as discrete components for conceptual clarity in FIG. 7, the illustrated components (and others not depicted) may be collectively or individually implemented in various aspects.
- The processing system 700 further comprises an affordance circuit 726, an aggregation circuit 727, an uncertainty circuit 728, an action circuit 729, and a training circuit 730. The depicted circuits, and others not depicted, may be configured to perform various aspects of the techniques described herein.
- For example, the affordance component 724A and/or the affordance circuit 726 (which may correspond to or use all or a portion of a machine learning model such as the machine learning model 207 of FIG. 2, the encoder 310 of FIG. 3, and/or the decoders 325 of FIG. 3) may be used to generate interim affordance maps (e.g., the interim affordance maps 330 of FIG. 3), as discussed above. For example, the affordance component 724A and/or the affordance circuit 726 may process input sensor data using encoder(s) to generate latent tensor(s), aggregate these latent tensor(s) with action parameter information, and generate interim affordance map(s) using decoder(s).
- The aggregation component 724B and/or the aggregation circuit 727 (which may correspond to the aggregation component 335 of FIG. 3) may be used to aggregate interim affordance maps (generated by the affordance component 724A and/or the affordance circuit 726) to generate output affordance maps (e.g., the affordance maps 345 of FIG. 3), as discussed above. For example, the aggregation component 724B and/or the aggregation circuit 727 may generate, for each respective set of action parameter values, a respective aggregated or output affordance map by averaging the corresponding set of interim affordance maps.
- The uncertainty component 724C and/or the uncertainty circuit 728 (which may correspond to the uncertainty component 340 of FIG. 3) may be used to generate uncertainty maps (e.g., the uncertainty maps 350 of FIG. 3) based on interim affordance maps (generated by the affordance component 724A and/or the affordance circuit 726), as discussed above. For example, the uncertainty component 724C and/or the uncertainty circuit 728 may, for each respective set of action parameter values, generate a respective uncertainty map by computing the JSD of the corresponding set of interim affordance maps.
- The action component 724D and/or the action circuit 729 may be used to generate action instructions (e.g., the actions 130 of FIG. 1) based on interim and/or output affordance maps (generated by the affordance component 724A, the affordance circuit 726, the aggregation component 724B, and/or the aggregation circuit 727) and/or based on uncertainty maps (generated by the uncertainty component 724C and/or the uncertainty circuit 728), as discussed above. For example, the action component 724D and/or the action circuit 729 may use Equation 2 and/or Equation 3 above to select an action (e.g., a location in the environment, such as on an object, where the action should be performed, as well as a set of action parameter values for performing the action) that maximizes the probability of success and/or maximizes or minimizes uncertainty.
- The training component 724E and/or the training circuit 730 may be used to evaluate the success of the performed action(s) and/or to update the machine learning ensemble based on the determined success, as discussed above. For example, the training component 724E and/or the training circuit 730 may generate a success value or label based on the results of the action, and update the parameters of the corresponding portion(s) of the machine learning model that were used to select the action (e.g., the specific decoder and/or a subset of parameters for the encoder, such as the subset of parameters that correspond to the location/pixels where the action was performed).
- Though depicted as separate components and circuits for clarity in FIG. 7, the affordance circuit 726, the aggregation circuit 727, the uncertainty circuit 728, the action circuit 729, and the training circuit 730 may collectively or individually be implemented in other processing devices of the processing system 700, such as within the CPU 702, the GPU 704, the DSP 706, the NPU 708, and the like.
- Generally, the processing system 700 and/or components thereof may be configured to perform the methods described herein.
- Notably, in other aspects, elements of the processing system 700 may be omitted, such as where the processing system 700 is a server computer or the like. For example, the multimedia component 710, the wireless connectivity component 712, the sensor processing units 716, the ISPs 718, and/or the navigation processor 720 may be omitted in other aspects. Further, aspects of the processing system 700 may be distributed between multiple devices.
- Implementation examples are described in the following numbered clauses:
- Clause 1: A method, comprising: accessing sensor data depicting a physical environment; generating a set of output affordance maps based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters; selecting, based on the set of output affordance maps, a first set of action parameters and the first location; and performing the first action at the first location in accordance with the first set of action parameters.
- Clause 2: A method according to Clause 1, further comprising generating a set of uncertainty maps based on the set of output affordance maps, comprising evaluating divergence between the set of output affordance maps, wherein the first set of action parameters and the first location are selected based further on the set of uncertainty maps.
- Clause 3: A method according to Clause 2, wherein generating the set of output affordance maps comprises: generating a first latent tensor based on processing the sensor data using a first encoder of the ensemble machine learning model; generating a plurality of aggregated latent tensors based on combining each of a plurality of action parameter tensors with the first latent tensor; and generating a first plurality of interim affordance maps based on processing each of the plurality of aggregated latent tensors using a first decoder of the ensemble machine learning model.
- Clause 4: A method according to Clause 3, wherein generating the set of output affordance maps further comprises: generating a second plurality of interim affordance maps based on a plurality of decoders of the ensemble machine learning model; and generating the set of output affordance maps based on aggregating the first and second pluralities of interim affordance maps.
- Clause 5: A method according to Clause 4, wherein selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while minimizing uncertainty.
- Clause 6: A method according to any of Clauses 3-5, wherein: the first decoder is selected, from a plurality of decoders, with at least an element of randomness, and selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while maximizing uncertainty.
- Clause 7: A method according to any of Clauses 3-6, wherein each of the plurality of action parameter tensors corresponds to at least one of: (i) an action orientation, (ii) an action force, or (iii) an action direction.
- Clause 8: A method according to Clause 7, wherein the action orientation comprises a grasp orientation for a robotic grasper.
- Clause 9: A method according to any of Clauses 1-8, further comprising: generating a success value based on the performance of the first action at the first location in accordance with the first set of action parameters; and updating one or more parameters of the ensemble machine learning model based on the success value.
- Clause 10: A method according to Clause 9, wherein updating the one or more parameters of the ensemble machine learning model comprises performing a masked backpropagation operation based on the first location such that one or more other parameters of the ensemble machine learning model corresponding to locations other than the first location are not updated based on the success value.
- Clause 11: A processing system comprising: a memory comprising computer-executable instructions; and one or more processors configured to execute the computer-executable instructions and cause the processing system to perform a method in accordance with any of Clauses 1-10.
- Clause 12: A processing system comprising means for performing a method in accordance with any of Clauses 1-10.
- Clause 13: A non-transitory computer-readable medium comprising computer-executable instructions that, when executed by one or more processors of a processing system, cause the processing system to perform a method in accordance with any of Clauses 1-10.
- Clause 14: A computer program product embodied on a computer-readable storage medium comprising code for performing a method in accordance with any of Clauses 1-10.
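- To make the data flow recited in Clauses 1 through 7 concrete, the following is a minimal, non-limiting sketch in PyTorch. Every name in it (AffordanceEnsemble, select_action), the tensor shapes, the layer choices, and the use of the ensemble mean and variance to stand in for the aggregated output affordance maps and the divergence-based uncertainty maps are illustrative assumptions, not the claimed implementation.

```python
import torch
import torch.nn as nn


class AffordanceEnsemble(nn.Module):
    """Illustrative ensemble: one shared encoder and K decoder heads.

    Assumed shapes: sensor_data is (B, 3, H, W) RGB; each decoder emits a
    per-pixel success-probability map of shape (B, 1, H, W).
    """

    def __init__(self, num_decoders=4, latent_dim=32, param_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, latent_dim, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # One decoder per ensemble member; each consumes the latent tensor
        # fused with a spatially tiled action-parameter vector.
        self.decoders = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(latent_dim + param_dim, 1, kernel_size=1),
                nn.Sigmoid(),
            )
            for _ in range(num_decoders)
        ])

    def forward(self, sensor_data, action_params):
        # action_params: (P, param_dim), one row per candidate parameter set
        # (e.g., one discretized grasp orientation).
        latent = self.encoder(sensor_data)              # first latent tensor
        b, _, h, w = latent.shape
        per_param_maps = []
        for params in action_params:
            tiled = params.view(1, -1, 1, 1).expand(b, -1, h, w)
            fused = torch.cat([latent, tiled], dim=1)   # aggregated latent tensor
            # Interim affordance maps: one per decoder for this parameter set.
            per_param_maps.append(torch.stack([dec(fused) for dec in self.decoders]))
        return torch.stack(per_param_maps)              # (P, K, B, 1, H, W)


def select_action(interim_maps, explore=False, lam=1.0):
    """Return (parameter-set index, flattened location index); assumes B == 1.

    The ensemble mean serves as the output affordance map and the ensemble
    variance as the uncertainty map. Exploitation subtracts uncertainty from
    the score (Clause 5); exploration adds it (Clause 6).
    """
    output_maps = interim_maps.mean(dim=1)       # (P, B, 1, H, W)
    uncertainty_maps = interim_maps.var(dim=1)   # (P, B, 1, H, W)
    sign = 1.0 if explore else -1.0
    score = output_maps + sign * lam * uncertainty_maps
    flat = torch.argmax(score)
    per_param = score[0].numel()
    return int(flat // per_param), int(flat % per_param)
```

- As a usage illustration, a caller might discretize grasp orientations into, say, eight angles, encode each as a parameter vector, and pass the resulting (8, param_dim) tensor as action_params; the returned indices then identify both the orientation and the pixel at which to attempt the grasp. Clause 6's element of randomness could instead be realized by scoring with a single randomly drawn decoder rather than the ensemble mean, in the style of Thompson sampling, though that too is only one possible reading.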
- The preceding description is provided to enable any person skilled in the art to practice the various aspects described herein. The examples discussed herein are not limiting of the scope, applicability, or aspects set forth in the claims. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. For example, changes may be made in the function and arrangement of elements discussed without departing from the scope of the disclosure. Various examples may omit, substitute, or add various procedures or components as appropriate. For instance, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. Also, features described with respect to some examples may be combined in some other examples. For example, an apparatus may be implemented or a method may be practiced using any number of the aspects set forth herein. In addition, the scope of the disclosure is intended to cover such an apparatus or method that is practiced using other structure, functionality, or structure and functionality in addition to, or other than, the various aspects of the disclosure set forth herein. It should be understood that any aspect of the disclosure disclosed herein may be embodied by one or more elements of a claim.
- As used herein, the word “exemplary” means “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
- As used herein, a phrase referring to “at least one of” a list of items refers to any combination of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiples of the same element (e.g., a-a, a-a-a, a-a-b, a-a-c, a-b-b, a-c-c, b-b, b-b-b, b-b-c, c-c, and c-c-c or any other ordering of a, b, and c).
- As used herein, the term “determining” encompasses a wide variety of actions. For example, “determining” may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure), ascertaining, and the like. Also, “determining” may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. Also, “determining” may include resolving, selecting, choosing, establishing, and the like.
- The methods disclosed herein comprise one or more steps or actions for achieving the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. Further, the various operations of methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software component(s) and/or module(s), including, but not limited to a circuit, an application specific integrated circuit (ASIC), or processor. Generally, where there are operations illustrated in figures, those operations may have corresponding counterpart means-plus-function components with similar numbering.
- The following claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims. Within a claim, reference to an element in the singular is not intended to mean “one and only one” unless specifically so stated, but rather “one or more.” Unless specifically stated otherwise, the term “some” refers to one or more. No claim element is to be construed under the provisions of 35 U.S.C. § 112 (f) unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.” All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims.
Claims (20)
1. A processing system, comprising:
at least one memory comprising processor-executable instructions; and
one or more processors configured to execute the processor-executable instructions and cause the processing system to:
access sensor data depicting a physical environment;
generate a set of output affordance maps based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters;
select, based on the set of output affordance maps, a first set of action parameters and the first location; and
perform the first action at the first location in accordance with the first set of action parameters.
2. The processing system of claim 1, wherein:
the one or more processors are configured to further execute the processor-executable instructions to cause the processing system to generate a set of uncertainty maps based on the set of output affordance maps;
to generate the set of uncertainty maps, the one or more processors are configured to execute the processor-executable instructions to cause the processing system to evaluate divergence between the set of output affordance maps; and
the first set of action parameters and the first location are selected based further on the set of uncertainty maps.
3. The processing system of claim 2, wherein, to generate the set of output affordance maps, the one or more processors are configured to execute the processor-executable instructions to cause the processing system to:
generate a first latent tensor based on processing the sensor data using a first encoder of the ensemble machine learning model;
generate a plurality of aggregated latent tensors based on combining each of a plurality of action parameter tensors with the first latent tensor; and
generate a first plurality of interim affordance maps based on processing each of the plurality of aggregated latent tensors using a first decoder of the ensemble machine learning model.
4. The processing system of claim 3, wherein, to generate the set of output affordance maps, the one or more processors are configured to further execute the processor-executable instructions to cause the processing system to:
generate a second plurality of interim affordance maps based on a plurality of decoders of the ensemble machine learning model; and
generate the set of output affordance maps based on aggregating the first and second pluralities of interim affordance maps.
5. The processing system of claim 4, wherein, to select the first set of action parameters and the first location, the one or more processors are configured to execute the processor-executable instructions to cause the processing system to determine, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while minimizing uncertainty.
6. The processing system of claim 3, wherein:
the first decoder is selected, from a plurality of decoders, with at least an element of randomness; and
to select the first set of action parameters and the first location, the one or more processors are configured to further execute the processor-executable instructions to cause the processing system to determine, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while maximizing uncertainty.
7. The processing system of claim 3, wherein each of the plurality of action parameter tensors corresponds to at least one of: (i) an action orientation, (ii) an action force, or (iii) an action direction.
8. The processing system of claim 7, wherein the action orientation comprises a grasp orientation for a robotic grasper.
9. The processing system of claim 1, wherein the one or more processors are configured to further execute the processor-executable instructions to cause the processing system to:
generate a success value based on a performance of the first action at the first location in accordance with the first set of action parameters; and
update one or more parameters of the ensemble machine learning model based on the success value.
10. The processing system of claim 9, wherein, to update the one or more parameters of the ensemble machine learning model, the one or more processors are configured to further execute the processor-executable instructions to cause the processing system to perform a masked backpropagation operation based on the first location such that one or more other parameters of the ensemble machine learning model corresponding to locations other than the first location are not updated based on the success value.
11. A processor-implemented method, comprising:
accessing sensor data depicting a physical environment;
generating a set of output affordance maps based on processing the sensor data using an ensemble machine learning model, wherein each respective output affordance map of the set of output affordance maps indicates a respective probability that a first action can be performed at at least a first location in the physical environment using a respective set of action parameters;
selecting, based on the set of output affordance maps, a first set of action parameters and the first location; and
performing the first action at the first location in accordance with the first set of action parameters.
12. The processor-implemented method of claim 11, further comprising generating a set of uncertainty maps based on the set of output affordance maps, comprising evaluating divergence between the set of output affordance maps, wherein the first set of action parameters and the first location are selected based further on the set of uncertainty maps.
13. The processor-implemented method of claim 12, wherein generating the set of output affordance maps comprises:
generating a first latent tensor based on processing the sensor data using a first encoder of the ensemble machine learning model;
generating a plurality of aggregated latent tensors based on combining each of a plurality of action parameter tensors with the first latent tensor; and
generating a first plurality of interim affordance maps based on processing each of the plurality of aggregated latent tensors using a first decoder of the ensemble machine learning model.
14. The processor-implemented method of claim 13, wherein generating the set of output affordance maps further comprises:
generating a second plurality of interim affordance maps based on a plurality of decoders of the ensemble machine learning model; and
generating the set of output affordance maps based on aggregating the first and second pluralities of interim affordance maps.
15. The processor-implemented method of claim 14, wherein selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while minimizing uncertainty.
16. The processor-implemented method of claim 13, wherein:
the first decoder is selected, from a plurality of decoders, with at least an element of randomness; and
selecting the first set of action parameters and the first location comprises determining, based on the set of output affordance maps and the set of uncertainty maps, that performing the first action at the first location will maximize predicted success while maximizing uncertainty.
17. The processor-implemented method of claim 13, wherein each of the plurality of action parameter tensors corresponds to at least one of: (i) an action orientation, (ii) an action force, or (iii) an action direction.
18. The processor-implemented method of claim 17, wherein the action orientation comprises a grasp orientation for a robotic grasper.
19. The processor-implemented method of claim 11, further comprising:
generating a success value based on a performance of the first action at the first location in accordance with the first set of action parameters; and
updating one or more parameters of the ensemble machine learning model based on the success value.
20. The processor-implemented method of claim 19, wherein updating the one or more parameters of the ensemble machine learning model comprises performing a masked backpropagation operation based on the first location such that one or more other parameters of the ensemble machine learning model corresponding to locations other than the first location are not updated based on the success value.
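As one concrete, non-limiting reading of the masked-backpropagation update recited in claims 19 and 20, the sketch below (reusing the hypothetical AffordanceEnsemble from the sketch following the numbered clauses above) applies a binary cross-entropy loss only at the attempted pixel of the selected map, so the error signal is confined to the first location; the loss choice and the mask construction are assumptions for illustration, not the claimed operation itself.

```python
import torch
import torch.nn.functional as F


def masked_update(model, optimizer, sensor_data, action_params,
                  param_idx, loc_yx, success):
    """One illustrative update from a single executed action.

    success: observed success value in [0, 1]; loc_yx: (row, col) of the
    attempted location. Taking the loss at that pixel alone is equivalent
    to multiplying a dense per-pixel loss by a one-hot spatial mask. With
    shared convolutional weights this confines the error signal to the
    attempted location rather than literally freezing other weights.
    """
    interim_maps = model(sensor_data, action_params)   # (P, K, B, 1, H, W)
    y, x = loc_yx
    pred = interim_maps[param_idx][..., y, x]          # (K, B, 1) at the pixel
    target = torch.full_like(pred, float(success))     # same label for all members
    loss = F.binary_cross_entropy(pred, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```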
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/391,129 US20240383143A1 (en) | 2023-05-17 | 2023-12-20 | Affordance-driven modular reinforcement learning |
| PCT/US2024/021516 WO2024238018A1 (en) | 2023-05-17 | 2024-03-26 | Affordance-driven modular reinforcement learning |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363502752P | 2023-05-17 | 2023-05-17 | |
| US18/391,129 US20240383143A1 (en) | 2023-05-17 | 2023-12-20 | Affordance-driven modular reinforcement learning |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| US20240383143A1 (en) | 2024-11-21 |
Family
ID=93465542
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/391,129 Pending US20240383143A1 (en) | 2023-05-17 | 2023-12-20 | Affordance-driven modular reinforcement learning |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US20240383143A1 (en) |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20210406695A1 (en) * | 2018-11-06 | 2021-12-30 | Emory University | Systems and Methods for Training an Autoencoder Neural Network Using Sparse Data |
| US20200361083A1 (en) * | 2019-05-15 | 2020-11-19 | Nvidia Corporation | Grasp generation using a variational autoencoder |
| US20230311335A1 (en) * | 2022-03-30 | 2023-10-05 | Google Llc | Natural language control of a robot |
Similar Documents
| Publication | Title |
|---|---|
| CN110799992B (en) | Using simulation and domain adaptation for robot control |
| EP3707645B1 (en) | Neural network systems implementing conditional neural processes for efficient learning |
| EP3676765B1 (en) | Using hierarchical representations for neural network architecture searching |
| WO2022042713A1 (en) | Deep learning training method and apparatus for use in computing device |
| US20210390653A1 (en) | Learning robotic tasks using one or more neural networks |
| CN112313043B (en) | Self-supervised robotic object interaction |
| WO2021218517A1 (en) | Method for acquiring neural network model, and image processing method and apparatus |
| WO2021218470A1 (en) | Neural network optimization method and device |
| EP4172861B1 (en) | Semi-supervised keypoint based models |
| CN111797895A (en) | A classifier training method, data processing method, system and device |
| CN116992917A (en) | Systems and methods for selecting actions |
| US10860895B2 (en) | Imagination-based agent neural networks |
| US20240100694A1 (en) | Ai-based control for robotics systems and applications |
| CN111340190A (en) | Method and device for constructing network structure, and image generation method and device |
| CN116997939A (en) | Use expert blending to process images |
| CN120641914A (en) | Controlling an Agent Using a Q-Transformer Neural Network |
| CN117242453A (en) | Training graph neural networks using denoising targets |
| US20240189994A1 (en) | Real-world robot control using transformer neural networks |
| WO2023273934A1 (en) | Method for selecting hyper-parameter of model, and related apparatus |
| JP2024102049A (en) | Training an Action Selection System Using Relative Entropy Q-Learning |
| US20250042024A1 (en) | Affordance-based control system |
| US20240383143A1 (en) | Affordance-driven modular reinforcement learning |
| WO2024238018A1 (en) | Affordance-driven modular reinforcement learning |
| CN117083643A (en) | Image object detection and classification method and system |
| CN120604238A (en) | Open Vocabulary Robot Control Using Multimodal Language Models |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: QUALCOMM INCORPORATED, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: MAZZAGLIA, PIETRO; COHEN, TACO SEBASTIAAN; DIJKMAN, DANIEL HENDRICUS FRANCISCUS; SIGNING DATES FROM 20240102 TO 20240122; REEL/FRAME: 066243/0011 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION COUNTED, NOT YET MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |