WO2025140777A1 - Method and device for speech-supplemented kinesthetic robot programming
- Publication number
- WO2025140777A1 (PCT application PCT/EP2023/087949)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- robot
- speech data
- trajectory
- programming
- manipulator
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1628—Programme controls characterised by the control loop
- B25J9/163—Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1661—Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/36—Nc in input of data, input key till input tape
- G05B2219/36184—Record actions of human expert, teach by showing
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/36—Nc in input of data, input key till input tape
- G05B2219/36442—Automatically teaching, teach by showing
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/40—Robotics, robotics mapping to robotics vision
- G05B2219/40116—Learn by operator observation, symbiosis, show, watch
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/40—Robotics, robotics mapping to robotics vision
- G05B2219/40391—Human to robot skill transfer
Definitions
- the operator 190 is offered the ability to review and edit a text representation of the robot program resulting after step 218.
- the programming device 160 may initiate the activation of a text editor or another user interface to be put at the operator’s 190 disposal.
- the activated user interface may be a user interface of the robot controller 120.
- Special embodiments of the programming method 200 further take into account information derived from images by at least one camera 165 or depth camera.
- the images are acquired and preprocessed in a step 213, and the derived information is utilized in the subsequent processing steps.
- the method 200 further includes capturing images of the kinesthetic programming session (substep 213.1) and estimating the poses of the workpiece 130 in respective phases of the robot trajectory (substep 213.2). Poses in this sense may include positions and orientations.
- the images may be still images acquired at relevant points in time, or the images may be extracted from a video sequence.
- the estimated poses may be used to determine the reference frames in which the robot program C shall express the movements of a tool 111 carried by the robot manipulator 110, and particularly to determine relationships between the reference frames. This way, even if the reference frame moves between two executions, the trajectory can be replicated with respect to the demonstration so that the relative motion between the TCP and the reference frame stays the same.
- a further possible use of the images of the kinesthetic programming session is in step 216, and more precisely for the purpose of further annotating the robot trajectory with the estimated poses of the workpiece 130, e.g., by including them in the serialized data file.
- the estimated poses may appear explicitly in the annotations, e.g., to allow a verification that the robot program is executing in the way expected, so that irregularities can be discovered early.
- the estimated workpiece 130 poses may have a merely indirect influence, such as preferable ways of gripping or lifting the workpiece 130 using the tool 111 carried by the robot manipulator 110.
- in a first example, an ABB GoFa™ CRB 15000, a lightweight collaborative industrial robot manufactured by the applicant, was used to manipulate a plastic toolbox of approximately 50 x 30 x 30 cm with a movable handle on the lid.
- the appearance of the toolbox may be similar to item 130a in figure 1.
- the GoFa robot was equipped with an OnRobot 2FG7 gripper. The manipulation consisted in rotating the handle from a flat/horizontal position to an upright position where it is ready for use by a lifting hook, claw or a similar tool.
- the operator’s 190 utterances during the programming session were captured (speech data) and transcribed into timestamped text.
- GPT-3.5-turbo, an LLM, was prompted to segment the timestamped text into phases, each phase annotated with a start and end time, to identify values of specified robot parameters, including speed, precision, exact_following and reference_frame, and finally to format the output in accordance with the YAML format.
- the prompt to the LLM is shown in Table 1 and the resulting YAML file in Table 2. (The experiment has also been successfully repeated using GPT-4.)
- the transcript of the operator’s 190 utterances that is given to the LLM can be inspected at the end of the text in Table 1.
- the operator 190 is saying: “I start moving the robot close to the handle. Putting all the right angles. Now at this point I’m really close to the handle so I begin a very precise moment [movement] of grabbing the handle, moving it upwards. Slowly and now at the end the handle is perfectly towards the top. So now I move backwards and then away, taking the robot back to the home position.”
- an ABB YuMiTM robot was taught to insert sample tubes into slots of an eight-tube laboratory centrifuge while shifting the rotor 1/8 of a turn between each tube. The robot was also trained to remove the sample tubes after they had been processed by the centrifuge. Substantially the same workflow as for Example 1 was followed and the resulting generated robot program was found to be satisfactory.
Landscapes
- Engineering & Computer Science (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Manipulator (AREA)
Abstract
A method of programming an industrial robot (100), composed of a robot manipulator (110) and a robot controller (120), comprises: recording movements of the robot manipulator during a kinesthetic programming session, for thereby obtaining a robot trajectory; capturing speech data while recording the movements of the robot manipulator; decomposing the recorded speech data into a plurality of phases; for each phase of the speech data, parsing the speech data into robot parameter values using a natural-language model (170), identifying a contemporaneous phase of the robot trajectory and annotating it with the robot parameter values; and, on the basis of the annotated robot trajectory, generating a robot program executable by the robot controller.
Description
METHOD AND DEVICE FOR SPEECH-SUPPLEMENTED KINESTHETIC ROBOT PROGRAMMING
TECHNICAL FIELD
[0001] The present disclosure generally relates to the field of robotic control, and specifically to kinesthetic programming of an industrial robot. More precisely, methods and devices are proposed herein which allow an operator to annotate different phases of a robot trajectory recorded during such kinesthetic programming with robot-parameter values that are parsed from speech data.
BACKGROUND
[0002] The state of the art includes setups where speech input is used to supplement observation-based robot programming. For example, the patent application published as EP4171893A1 discloses methods and systems for teaching robots to learn tasks in real environments based on verbal or textual cues. In particular, a verbal-based focus-of-attention model receives input and parses it to recognize at least a task and a target object name. This information is used to spatio-temporally filter a demonstration of the task, for thereby allowing the robot to focus on the target object and movements associated with the target object within a real environment, which may be cluttered, noisy, chaotic etc. In this way, the robot is able to recognize where and when to pay attention to the demonstration of the task, thereby enabling the robot to learn the task by observation in a real environment.
[0003] According to said application EP4171893A1, the speech commands are analyzed using a language parser configured to recognize a list of predefined command words. The parsed speech commands are utilized as a voice-over, in the sense that they duplicate - and normally confirm - the information that an observer of the demonstrated physical movements can notice. For instance, the operator may say “Pick up the cup and place it on the shelf” while doing exactly this. It is claimed that this voice-based support helps the learning module concentrate on the decisive events during the demonstration while disregarding extraneous information, acoustic noise, irrelevant objects in the scene, and the like. The speech commands in EP4171893A1 apply to the robot task as a whole; for a robot task with multiple steps, it is not possible to provide independent commands for the different steps.
[0004] A second prior art disclosure, US20230120598A1, proposes a method for teaching a robot to perform an operation based on a human demonstration that is observed using force sensors and vision sensors. The method uses a vision sensor to detect a position and pose of the operator’s hand and optionally a position and pose of a workpiece during teaching of an operation. Example operations may be pick, move and place. Data from the vision and force sensors, along with other optional inputs, are used to teach both motions and state-change logic for the operation being taught. Several techniques are proposed in US20230120598A1 for determining the state-change logic, such as the transition from approaching to grasping. It is disclosed that a voice command can trigger a transition between different states of the robot, each time with a visible result that a bystander to the demonstrated physical movements can observe and estimate. For example, the word “release” causes the robot to leave a “pick” state; and the word “depart” causes the robot to enter a “move” state. Alternatively, a voice command may explicitly state the target state that the robot is asked to enter, such as “pick” or “move”.
[0005] Even in view of these and other solutions within the state of the art, it is clear that the area of speech-supported robot programming holds further potential to be tapped.
SUMMARY
[0006] One objective of the present disclosure is to propose methods and devices that allow more efficient or more precise ways of supplementing kinesthetic robot programming with speech. A further objective is to allow the operator to apply different speech data to different portions of the robot trajectory, that is, with a local applicability. A further objective is to automate the decomposing of a robot trajectory into phases; particularly, it would be desirable to provide automated decomposition of the robot trajectory into phases that lend themselves to generalization, e.g., that the robot may learn how to perform the actions in the phases autonomously or semi-autonomously. A further objective is to enable the operator to provide, during the kinesthetic programming session, indications concerning such parameters of the industrial robot that do not produce a visible result that an observer of the kinesthetic programming session can see. A further objective is to make speech-based interaction available also to an inexperienced operator, who may start using the speech medium with no or just a negligible amount of prior training; in particular, it would be desirable if the operator could use speech commands from an unrestricted vocabulary and/or according to a leniently defined syntax while being understood as if by a human listener.
[0007] At least some of these objectives are achieved by the invention defined by the independent claims. The dependent claims relate to advantageous embodiments of the invention.
[0008] In a first aspect of the present disclosure, there is provided a method of programming an industrial robot, which comprises a robot manipulator and a robot controller. The method comprises: recording movements of the robot manipulator during a kinesthetic programming session, for thereby obtaining a robot trajectory; capturing speech data while recording the movements of the robot manipulator; and decomposing the captured speech data into a plurality of phases. For at least one phase (in particular, each phase) of the speech data, the speech data is parsed into at least one robot parameter value of at least one robot parameter using a natural-language model, a contemporaneous phase of the robot trajectory is identified, and said phase of the robot trajectory is annotated with the robot parameter values. On the basis of the annotated robot trajectory, then, a robot program which is executable by the robot controller is generated.
[0009] Thanks to the novel combination of decomposing the captured speech data into a plurality of phases and identifying phases of the robot trajectory which are contemporaneous with these, the method according to the first aspect offers a precise way of annotating different phases of the trajectory in different ways. The different phases may correspond to different segments or different time intervals. Because furthermore the annotations are derived by parsing speech data into robot parameter values using a natural-language model, operators are not required to devote their hands to a new input task while carrying out the kinesthetic programming.
[0010] Advantageously, a large language model (LLM) can be used as the natural-language model. Because the LLM has been trained on a very large corpus of speech or text, it is able to successfully process a broad spectrum of natural-language input. When an LLM is used in the present method, the operator does not need to actively restrict his or her speech input to a closed list of command words, nor is it necessary for the operator to respect a specified syntax very attentively.
[0011] In some embodiments, the method according to the first aspect includes a step of enabling the operator to review and edit a text representation of the robot program. For instance, the output of the speech-supplemented lead-through programming can be a routine in a robot script language, such as RAPID™, which the operator can not only have executed by the robot, but which the operator can also inspect and edit if necessary. It has been observed in the prior art that purely demonstration-based programming interfaces are sometimes perceived as tiring, and the operators will want to perform some aspects of the programming through the medium of text. Indeed, because re-recording a full trajectory is time consuming, text may be an operator’s preferred medium of editing the robot program or extending it to include interactions with equipment outside the robot. In the context of RAPID™, such interactions may relate to error handling, wait times, input/output connections, etc.
[0012] In a second aspect of the disclosure, there is provided a device for facilitating programming of an industrial robot. The device comprises memory and processing circuitry, which is configured to record movements of the robot manipulator during a kinesthetic programming session, for thereby obtaining a robot trajectory; to capture speech data while recording the movements of the robot manipulator; and to decompose the captured speech data into a plurality of phases. The processing circuitry is further configured to perform the following acts for at least one phase (in particular, each phase) of the speech data: to parse the speech data within this phase into at least one robot parameter value of at least one robot parameter using a natural-language model, to identify a phase of the robot trajectory which is contemporaneous with said phase of the speech data, and to annotate the contemporaneous phase with the at least one robot parameter value. Further, the processing circuitry is configured to generate, on the basis of the annotated robot trajectory, a robot program executable by the robot controller.
[0013] The present disclosure further relates to a computer program containing instructions for causing a computer - or the device according to the second aspect in particular - to carry out the above method. The computer program may be stored or distributed on a data carrier. As used herein, a “data carrier” may be a transitory data carrier, such as modulated electromagnetic or optical waves, or a non-transitory data carrier. Non-transitory data carriers include volatile and non-volatile memories, such
as permanent and non-permanent storage media of magnetic, optical or solid-state type. Still within the scope of “data carrier”, such memories may be fixedly mounted or portable.
[0014] In the terminology of the present disclosure, an “industrial robot” is a device with a physical manipulator, which is designed or suitable for manufacturing, processing, material handling, destruction and similar tasks in an industrial context. The term industrial robot covers the full range from lightweight robots designed to replace human manual work, over collaborative robots for supporting a human worker, all the way up to heavy-duty robots.
[0015] “Kinesthetic programming” - referring to the robot’s proprioceptive ability to sense its own position, pose and movements - is a programming approach in which the user performs a demonstration by physically moving the robot manipulator through the desired motions. The terms kinesthetic programming and lead-through programming are synonymous or at least partially overlapping in meaning. The state of the robot during a kinesthetic programming session is typically recorded by means of the robot’s onboard sensors, e.g., joint angles and torques. As used herein, kinesthetic programming also includes teleoperation as a special case, where the movement of the robot manipulator is controlled by an external input to the robot through a joystick, graphical user interface or other input means; kinesthetic programming by teleoperation does not require the user to be present in the work area of the robot.
[0016] Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to “a/an/the element, apparatus, component, means, step, etc.” are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order described, unless this is explicitly stated.
BRIEF DESCRIPTION OF THE DRAWINGS
[0017] Aspects and embodiments are now described, by way of example, with reference to the accompanying drawings, on which:
figure 1 shows a work area in which a robot manipulator operates under the control of a robot controller, and in which there is further arranged a device supporting speech-supplemented kinesthetic robot programming; and figure 2 is a flowchart of a method of programming an industrial robot, according to embodiments herein.
DETAILED DESCRIPTION
[0018] The aspects of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, on which certain embodiments of the invention are shown. These aspects may, however, be embodied in many different forms and should not be construed as limiting; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and to fully convey the scope of all aspects of the invention to those skilled in the art. Like numbers refer to like elements throughout the description.
System overview
[0019] Figure 1 shows an industrial robot 100 made up of a robot manipulator 110 and a robot controller 120. The robot manipulator 110 and the robot controller 120 are joined by a wired or wireless bidirectional data connection, which conveys control signals, sensor data etc.
[0020] The robot manipulator 110 includes an arm, which extends from a base 112 and is made up of structural elements 114 and at least one linear or rotary joint 113. The arm may further carry tools 111 that allow it to interact with various workpieces 130, which are present in a work area 150 of the robot manipulator 110. The workpieces are subject to manufacture, processing or handling by the robot manipulator 110. The work area 150 may further include additional objects 140, such as containers, fixtures, separators, insulators, supports etc.; these objects 140 may be generic or may be specifically adapted to the workpieces 130 handled by the robot manipulator 110. The arm of the robot manipulator 110 is movable by action of internal motors, drives or actuators (not shown), and it includes transducers, sensors and other measuring equipment (not shown), from which the manipulator’s 110 current position, pose, technical condition, load etc. can be derived, to some degree of accuracy. The position, pose etc. of the robot manipulator 110 may in particular refer to a point on the arm, particularly to a tool-center point (TCP).
[0021] The positions, poses etc. may be expressed with respect to one of multiple possible reference frames, including a fixed reference frame (O150) with its origin in a point in the work area 150, a fixed reference frame with origin in the base 112, a moving reference frame (O130a) with origin in a workpiece or a moving reference frame (O111) with its origin at the TCP and oriented parallel to the tool 111 at all times.
[0022] The robot controller 120 comprises processing circuitry 121, a memory 122 and a communication interface 123. Example content of the memory 122 during operation may include
S: an operating system, basic settings, software implementing generic movements, sensing, self-monitoring, generically useful functionalities and services (all typically contributed by an original manufacturer), task- or role-specific configurations, configuration templates (typically contributed by a robot system integrator), and site-specific settings (typically contributed by an end user);
C: robot programs or projects causing the robot manipulator 110 to perform useful or intended tasks in its work area 150.
Processes that execute the robot programs may do so in accordance with the program-independent memory content S, e.g., by making calls to available functionalities, libraries, routines or parameter values therein. It is recalled that the memory 122 and processing circuitry 121 of the robot controller 120 may be distributed and/or contain networked resources having a different physical localization than figure 1 suggests.
[0023] Each of the programs C may contain a plurality of movement instructions relating to locations such as points, poses, paths as well as modulated paths. A program may be a compiled executable (binary) or a script. A movement instruction relating to a modulated path may be expressed as - or may include - a process-on-path instruction. The programs C may be created by an operator 190 with the aid of a robot programming device (programming station) or a general-purpose computer, or they may be created directly at the robot controller 120 if it has an operator interface (not shown). In the first two cases, versions of the programs C may be downloaded to the robot controller 120 over a wired or wireless connection or by being temporarily stored on a portable memory. As will be described below, a robot program may further be created by means of kinesthetic programming supplemented with speech data.
[0024] A dedicated programming device 160 is shown in the right-hand portion of figure 1. From the programming device 160, the created programs C can be transmitted to the communication interface 123 of the robot controller 120 and then stored in the memory 122 where they are available for execution. It is noted that the programming device 160 may be implemented as a set of collaborating components within the robot controller 120. In fact, the components can be shared with the robot controller 120, e.g., by using the processing circuitry 121 for the dual purposes of robot control and programming and/or using the memory 122 for the same dual purposes. In other words, the programming device 160 may constitute a portion of a multi-purpose device; it need not be a standalone device or a device with programming as its sole or main purpose.
[0025] The programming device 160 - which is shown as a standalone device in the non-limiting example of figure 1 - comprises processing circuitry 161, memory 162 and at least one communication interface 163. To parse speech data, the programming device 160 utilizes a natural-language model 170, which may be either stored internally or - as in the example configuration shown in figure 1 - may be available from an external server or host computer. In experiments, the inventors have observed that general-purpose natural-language models 170 perform surprisingly well; relevant training and re-training on data related to robot programming may improve this further. During a kinesthetic programming session, the programming device 160 has access to position data representing an actual position, a recorded position or recorded movements of the robot manipulator 110, as well as speech data representing utterances or narration by the operator 190 during the programming session. The position data may for example be obtained through the intermediary of the robot controller 120, which monitors position data in the normal course of its operation; alternatively, the programming device 160 is granted access to signals from the transducers, sensors or other measuring equipment in the robot manipulator 110. The speech data may be captured by one or more acoustic transducers (microphones) arranged in the vicinity of the operator’s 190 normal position; alternatively, the speech data may be captured by a portable microphone or a headset worn by the operator 190.
[0026] Optionally, the programming device 160 further has access to an imaging device (e.g., camera, video camera) 165, by which images of the kinesthetic
programming session can be captured. In particular, the imaging device may be a depth camera, which in addition to the two-dimensional appearance of an object also determines a depth coordinate of the object, e.g., by time-of-flight measurements, triangulation, lidar or other per se known techniques. An RGB-D camera is an example of a depth camera. The imaging device 165 may be a part of the programming device 160, or it may be integrated in the industrial robot 100 and optionally be used for other tasks as well.
Programming method
[0027] Turning to the flowchart in figure 2, there will now be described an embodiment of a method 200 of programming an industrial robot 100 by means of speech-supplemented kinesthetic programming. The method 200 may be considered as a description of the behavior which the programming device 160 is configured to carry out. The method 200 may as well correspond to the behavior of the robot controller 120 in a special programming mode, which the robot controller 120 can be requested to enter. Instructions for causing a computer - or the programming device 160 in particular - to carry out the method 200 may be provided in the form of a computer program 164 (see figure 1).
[0028] A first step 210 of the method 200, which is executed during a kinesthetic programming session, includes recording movements of the robot manipulator 110. On the basis of the recorded movements, a robot trajectory can be obtained, e.g., by combining the recorded movements. The robot trajectory may indicate the robot manipulator’s 110 position or pose (e.g., in Cartesian or joint-space coordinates) as a function of time. The position or pose need not be indicated for all points in time. Rather, the trajectory may also be expressed as a number of reference times at which the robot manipulator 110 is to assume corresponding setpoint positions or setpoint poses, wherein the robot manipulator 110 is free to have arbitrary positions or poses in the intervals between the reference times.
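By way of illustration only, the recorded movements could be held in a structure such as the following minimal Python sketch; the sampling rate, the choice of joint angles over Cartesian poses, and all names are assumptions made here for clarity, not part of the disclosure.

```python
import time
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TrajectorySample:
    """One recorded state of the robot manipulator (assumed representation)."""
    t: float             # seconds since the start of the programming session
    joints: List[float]  # joint angles in radians, one value per joint

def record_trajectory(read_joint_angles: Callable[[], List[float]],
                      duration_s: float, rate_hz: float = 50.0) -> List[TrajectorySample]:
    """Poll the controller's joint sensors at a fixed rate.

    `read_joint_angles` is a hypothetical callback standing in for whatever
    interface the robot controller exposes for reading the current joint state.
    """
    samples: List[TrajectorySample] = []
    t0 = time.monotonic()
    while (t := time.monotonic() - t0) < duration_s:
        samples.append(TrajectorySample(t=t, joints=read_joint_angles()))
        time.sleep(1.0 / rate_hz)
    return samples
```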
[0029] As explained initially, the kinesthetic programming session includes phases where the operator 190 moves the robot manipulator 110 into desired poses and positions by direct manual manipulation, or where the operator 190 causes the robot manipulator 110 to perform such movements by providing an external input through a joystick, graphical user interface or other input means connected to the robot controller 120 or the programming device 160. The trajectory may correspond
to a state-space representation, such as joint angles as a function of time. Alternatively, in other use cases, it may suffice to express the trajectory in terms of the position of a reference point on the robot manipulator 110 as a function of time.
[0030] In a next step 211, speech data is captured while the movements of the robot manipulator are being recorded. For this purpose, a microphone or another acoustic transducer may be utilized. The speech data may be stored in any suitable audio format, and at a bitrate considered adequate for the application at hand.
[0031] Further, in a step 212, the captured speech data is decomposed into a plurality of phases. The phases may correspond to different time intervals of a captured audio track or different segments. They should preferably correspond to different robot activities, so that parsed parameter values (see step 214) can be applied naturally. Normally, a time-uniform decomposition is not very useful as it does not account for differences in duration of the different activities, nor is a space-uniform decomposition. The robot activities may be sub-tasks (e.g., picking a cube, rotating a handle), steps explicitly mentioned by the operator 190 (e.g., “Now I will begin the process of moving backwards”) or robot operating modes (e.g., linear motion, motion parallel to the table, opening gripper, closing gripper), or various combinations of these. The phases may be annotated with respective human-intelligible labels; optionally, the content of these labels is derived from the speech data, e.g., based on utterances by the operator 190 such as “The preheating is now complete, and I move the workpiece from the oven to the mold to begin the shaping.” Such a statement may be interpreted as marking the end of a preheating phase and the start of a shaping or molding phase.
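A phase, in the sense used above, could be represented as sketched below; the field names and the example labels are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class Phase:
    """A time interval of the session corresponding to one robot activity (assumed structure)."""
    start_s: float  # start time relative to the beginning of the session
    end_s: float    # end time relative to the beginning of the session
    label: str      # human-intelligible label, possibly derived from the operator's narration

# Example decomposition of a short session into three activities
phases = [
    Phase(0.0, 6.5, "approach handle"),
    Phase(6.5, 18.0, "rotate handle upright"),
    Phase(18.0, 25.0, "return to home position"),
]
```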
[0032] The decomposing in step 212 may be carried out using the natural-language model 170. The natural-language model 170 may constitute a large language model (LLM), that is, a type of machine-learning algorithm which has been trained on very large datasets using deep learning techniques to be able to perform natural-language processing (NLP) tasks. Example NLP tasks are recognizing, summarizing, translating, predicting and generating plausible textual content. A very large dataset in this sense may include of the order of one million parameters, such as tens of millions of parameters, such as hundreds of millions of parameters. An LLM may have a transformer architecture, particularly a transformer architecture with four cascaded key links when it processes the input data, namely: word embedding,
position encoding, self-attention mechanism, feedforward neural network. At the time of filing this disclosure, noteworthy example LLMs include Bidirectional Encoder Representations from Transformers (BERT), Bard, BLOOM, Claude 2, various versions of Generative Pre-trained Transformer (GPT), Llama, PaLM 2, RoBERTa, T5, LaMDA, Turing NLG. LLMs include as a special case multimodal models, such as Gemini. One benefit of an LLM that the inventors target specifically is that the vocabulary is practically open-ended. The operator 190 can start using it without prior training. The kinesthetic programming can be carried out without requiring the operator to be extremely focused on using the right command words (or avoiding them) and/or syntax.
[0033] In a next step 214, for each of the thus obtained phases during which speech data was captured, this speech data is parsed into one or more robot-parameter values using the natural-language model 170. (It is noted that the described embodiment of the method 200 includes executing step 214 for all phases of the captured speech data; in other embodiments, step 214 may be executed for just one phase or just a subset of all phases.) The parsing may include transcribing the speech data into timestamped text, after which the natural-language model 170 is used for extracting robot parameter values from the timestamped text. Step 214 may return robot-parameter values pertaining to one or more robot parameters; further, step 214 may return one or more parsed robot-parameter values for each of said robot parameter or robot parameters.
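The intermediate timestamped text mentioned above could, for instance, take the following form; the segment structure and the prompt-friendly rendering are assumptions, and any speech-to-text engine producing segment timings could supply the input.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TranscriptSegment:
    """A piece of transcribed operator speech together with its timing (assumed structure)."""
    start_s: float
    end_s: float
    text: str

def format_transcript(segments: List[TranscriptSegment]) -> str:
    """Render the timestamped text so it can be handed to the natural-language model."""
    return "\n".join(f"[{s.start_s:.1f}-{s.end_s:.1f}s] {s.text}" for s in segments)

# Example (timings and wording are purely illustrative)
print(format_transcript([
    TranscriptSegment(0.0, 4.2, "I start moving the robot close to the handle."),
    TranscriptSegment(8.7, 14.1, "Now I begin a very precise movement of grabbing the handle."),
]))
```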
[0034] The robot parameter(s), whose values are determined in step 214, may for example include a state of a tool 111 carried by the robot manipulator, such as an open or closed position of a gripper. This state is to apply for the phase in which it was uttered by the operator 190.
[0035] Further, the robot parameter(s) may include, additionally or alternatively, a (setpoint) movement speed of the robot manipulator 110. The speed may be measured at a reference point on the robot manipulator 110, such as the TCP. The ability to indicate the setpoint speed by the medium of speech data may be a great help to the operator 190 as it is, most often, practically impossible to perform the desired manipulator movements at the true speed without sacrificing accuracy and safety. A natural-language model 170 may be used that has been trained to understand not only numerical speed indications but also everyday speed-related
vocabulary like “fast”, “slowly”, “quite slowly” or relative speed indications such as “faster”. Early experiments suggest that a general-purpose natural-language model 170 is able to map the everyday vocabulary to reasonable and practically useful numerical values expressed as a percentage of full speed. The accuracy can of course be further improved by retraining on representative robots and robot tasks. In the RAPID™ robot programming language, which can be used with a number of the applicant’s products at the time of filing, the movement speed may correspond to the type speeddata, to which values can be assigned according to the operator’s desired percentages. The assigned values could for instance be v20 (20 mm/s; “slow”), v50 (50 mm/s; “medium speed”), v100 (100 mm/s; “fast”) for a robot model with a full speed of approximately 100 mm/s.
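A minimal sketch of this mapping step is given below for a robot whose full speed is roughly 100 mm/s, as in the example above; the word list, the fallback value and the function name are assumptions, while v20, v50 and v100 are predefined RAPID speeddata identifiers.

```python
def speed_to_speeddata(parsed_speed: str) -> str:
    """Map an everyday speed indication parsed from speech to a predefined RAPID speeddata identifier.

    Illustrative table for a robot with a full speed of about 100 mm/s; other robots
    would use different thresholds.
    """
    table = {
        "slow": "v20",    # about 20 mm/s
        "medium": "v50",  # about 50 mm/s
        "fast": "v100",   # about 100 mm/s
    }
    return table.get(parsed_speed.lower(), "v50")  # fall back to a moderate speed

assert speed_to_speeddata("slow") == "v20"
```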
[0036] Further, the robot parameter(s) may include, additionally or alternatively, a required degree of compliance with the robot trajectory obtained in the recording step 210. A low degree of compliance signifies that the industrial robot 100, when executing the robot program, is authorized to apply simplifying geometric modifications to the robot trajectory, such as smoothing, straightening or removal of sharp bends; this may enable a higher movement speed and/or less mechanical wear. A high degree of required compliance signifies that the robot trajectory is to be reproduced more or less identically. Operation with a maximum degree of compliance may require the robot manipulator 110 to stop completely at one or more points on the trajectory. The following example phrases illustrate a desirable behavior of the natural-language model 170:
- “Move the robot close to this object”: low compliance, as only the end position is relevant;
- “Reorient the box handle towards the top”: high compliance, as it involves manipulation of a relatively complex manufactured article;
- “Reorient the object so it is aligned with the place position”: low compliance, as only the final position is relevant;
- “Carefully proceed with the insertion”: high compliance, given the probable tight tolerances during an insertion of an elongated object;
- “Distribute the paint evenly”: high compliance because the paint is ejected during movement of the robot arm.
In some embodiments of the method 200, the value of the required degree of compliance influences the performance of step 218. More precisely, if the recorded trajectory is sampled into discrete points, then a low degree of compliance may be ensured by deliberately excluding some of these points from consideration; this may for example allow the robot manipulator 110 to jump straight to the final position. Conversely, ensuring a high degree of compliance could mean all points are used. Current releases of the RAPID™ robot programming language contain no directly corresponding type.
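The point-exclusion idea could look as follows; treating a low degree of compliance as keeping only the end points of a phase is one possible policy among several, and the two-level scale is an assumption.

```python
from typing import List, Sequence, TypeVar

Point = TypeVar("Point")

def select_points(points: Sequence[Point], compliance: str) -> List[Point]:
    """Reduce the sampled trajectory points of a phase according to the required degree of compliance.

    "low"  -> keep only the first and last point, so the robot may move straight to the final position;
    "high" -> keep every recorded point, so the demonstrated path is reproduced closely.
    """
    if compliance == "low" and len(points) > 2:
        return [points[0], points[-1]]
    return list(points)
```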
[0037] Further, the robot parameter(s) may include, additionally or alternatively, a required degree of movement precision. This may be expressed as a movement precision margin, which embeds the trajectory and which represents an acceptable geometric deviation (e.g., in millimeters) from the trajectory. In the RAPID™ robot programming language, the movement precision margin may correspond to the type zonedata. With reference to the preceding paragraph, the required degree of compliance targets primarily large-scale simplifications (e.g., the robot manipulator 110 jumping straight to a final position), while the required degree of movement precision is connected more closely to zonedata in RAPID. In many practical cases, the two parameters can be used to achieve an equivalent or similar effect. The following example phrases illustrate a desirable behavior of the natural-language model 170:
- “Move carefully downwards to do the insertion”: high precision is required;
- “I am moving the robot back to the home position”: low precision is acceptable since the robot manipulator 110 is moving away from the equipment;
- “I will move the robot close to the object to grip”: medium precision, as there may be other objects in the environment of the object that the robot manipulator 110 should not collide with.
[0038] Further still, the robot parameter(s) may include, additionally or alternatively, a reference frame in which the robot program C shall express the movements of a tool 111 carried by the robot manipulator 110. A reference frame is defined by coordinate axes and an origin. The coordinate axes may be predefined and common to all available reference frames; alternatively, the coordinate axes may be moveable and at all times oriented parallel to the object in which the origin is located.
More precisely, the origin of the reference frame may be located in a point in the base 112 of the robot manipulator 110 or in a point in the work area 150; these options correspond in practice to using a fixed reference frame. Another option is to use an origin which moves with a workpiece 130; the coordinate axes may be fixed or rotate together with any rotations of the workpiece 130. A further option is to locate the origin of the reference frame in a non-workpiece object 140 (e.g., container, fixture, separator etc.) in the work area 150. In the RAPID™ robot programming language, the reference frame may correspond to the type wobjdata. The following example phrases illustrate a desirable behavior of the natural-language model 170:
- “I am taking the robot back to the home position”: the robot base 112 can be the reference frame;
- “I am approaching the red cube to pick up”: the red cube can be the reference frame;
- “I am inserting the tube in the hole”: a point in or at the hole can be the reference frame.
[0039] In such embodiments of the method 200 where an LLM is used as the natural-language model 170, the decomposition in step 212 and the parsing in step 214 may be carried out jointly by prompting the LLM to segment the timestamped text into phases, each phase annotated with a start and end time, and to identify values of one or more explicitly indicated robot parameters. This will be illustrated by examples below.
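One way such a combined prompt could be set up is sketched below; the prompt wording, the parameter names and the `call_llm` helper (standing in for whichever LLM client is used) are assumptions, and only the general idea - segmenting timestamped text and returning YAML - follows the description above.

```python
import yaml  # PyYAML

PROMPT_TEMPLATE = """You are given a timestamped transcript of an operator narrating a robot demonstration.
Segment it into phases, each with a start and end time in seconds, and for each phase identify values
of these robot parameters when they are mentioned: speed, precision, exact_following, reference_frame.
Answer only with YAML: a list of items with the keys start, end and any identified parameters.

Transcript:
{transcript}
"""

def decompose_and_parse(transcript: str, call_llm) -> list:
    """Decompose the transcript into phases and parse robot-parameter values in a single LLM call.

    `call_llm` is a hypothetical callable that sends a prompt string to the chosen
    large language model and returns its text completion.
    """
    answer = call_llm(PROMPT_TEMPLATE.format(transcript=transcript))
    # Expected shape, e.g.: [{"start": 0.0, "end": 6.5, "speed": "slow"}, ...]
    return yaml.safe_load(answer)
```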
[0040] The execution flow of the method 200, after completing the parsing of the speech data in a phase into robot parameter values (step 214), moves on to a step 215 of identifying a phase of the robot trajectory which is contemporaneous with the phase of the parsed speech data. The phase of the robot trajectory may be identified by matching start and end times of the phase of the parsed speech data with time indications forming part of the recorded trajectory.
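The matching could be done by maximising time overlap, as in the following sketch; it assumes that speech phases and trajectory phases are expressed on the same time axis.

```python
from typing import List, Tuple

def contemporaneous_phase(speech_start: float, speech_end: float,
                          trajectory_phases: List[Tuple[float, float]]) -> int:
    """Return the index of the trajectory phase whose time interval overlaps most with a speech phase."""
    def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
        return max(0.0, min(a_end, b_end) - max(a_start, b_start))

    return max(range(len(trajectory_phases)),
               key=lambda i: overlap(speech_start, speech_end, *trajectory_phases[i]))

# Example: a speech phase from 6.5 s to 18.0 s matched against three trajectory phases
assert contemporaneous_phase(6.5, 18.0, [(0.0, 6.0), (6.0, 19.0), (19.0, 25.0)]) == 1
```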
[0041] Then, in a step 216, the identified contemporaneous phase is annotated with the robot-parameter values, which are the expected output of the parsing. The annotation shall be local in the sense that it does not apply - at least not by default - to the full robot trajectory. The output of step 216 includes the extracted robot parameter values associated with the respective phase of the speech data. The output is to
be provided in a form that can be handed over safely to the downstream processing steps without losing or corrupting the information. The output is preferably organized phase by phase, that is, the set of robot-parameter assignments for each new phase is contained in a new item in the output. In particular, the extracted robot parameter values may be formatted in accordance with a data serialization format segmented into phases. Example data serialization formats are JSON (specified in The JSON Data Interchange Syntax, Standard ECMA-404, 2nd edition (2017-12)), YAML (specified in YAML Ain’t Markup Language (YAML™), version 1.2, revision 1.2.2 (2021-10-01)) and XML (specified in XML Signature Syntax and Processing, version 1.1, W3C Recommendation (2013-04-11)).
[0042] This output of the annotating step 216 may be used as an instruction to the effect that the default robot-executable code - which would result from a default transformation of the recorded trajectory into robot-executable code - shall be modified or extended in such a manner as to realize the parsed robot-parameter values. Indeed, in some implementations of the method 200, the robot-executable code is not generated until the final or penultimate step (step 218 in this example); this is the step in which the generated robot-executable code is modified or extended in relation to its default appearance, so as to comply with the output of the annotating step 216. For example, the output of the annotating step 216 may act as an instruction, applicable to said code-generating step, to add a modifier to a robot command which is to be executed in this particular phase of the robot trajectory. In the RAPID™ robot programming language, a modifier of the type speeddata or the type zonedata may be added to the example command MoveL, so that the speed or movement precision of the industrial robot 100 gets this value rather than the default value.
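The following Python sketch illustrates, in a simplified manner, how such an annotation could steer the code-generating step 218; the default speed and zone values and the simplified RAPID-style syntax are assumptions made only for illustration.

```python
def movel_command(target: str, annotations: dict) -> str:
    """Emit a simplified RAPID-style MoveL line, overriding the default
    speeddata/zonedata modifiers when the phase annotation says so."""
    speed = annotations.get("speed", "v200")     # assumed default speeddata
    zone = annotations.get("precision", "z10")   # assumed default zonedata
    return f"MoveL {target}, {speed}, {zone}, tool0;"

print(movel_command("p1", {}))                                      # defaults
print(movel_command("p2", {"speed": "v50", "precision": "fine"}))   # annotated phase
```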
[0043] Steps 214, 215 and 216 are then repeated for a next phase of the speech data. The execution of steps 214, 215 and 216 on the different phases may be sequential, but could as well be carried out for all phases of the speech data in parallel or in an order that does not correspond to the temporal sequence of the phases. An advantage of executing step 214 sequentially for the sequence of phases may be the ability to make contextual guesses, including stateful guesses, e.g., by simulating how the internal state of the industrial robot 100 evolves from one phase to the next one, so that some interpretations of the speech data can be ruled out and attention focused on the still possible interpretations. A phase-wise sequential execution of step 214 (and possibly steps 215 and 216 too) may thus, at least in some circumstances, lead to more accurate parameter-value assignments.
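A deliberately simple Python sketch of such a stateful, phase-wise pass is given below; the state model (a single gripper flag) and the rule it applies are assumptions made only to illustrate the idea of ruling out implausible interpretations.

```python
def parse_phases_sequentially(phases: list[dict]) -> list[dict]:
    """Walk through the phases in temporal order while simulating a crude
    robot state, so that stateful guesses can sharpen the interpretation."""
    state = {"gripper": "open"}
    annotated = []
    for phase in phases:
        params = dict(phase.get("parameters", {}))
        # Stateful guess: "grab" only makes sense while the gripper is open,
        # so interpret it as a command to close the gripper in this phase.
        if "grab" in phase.get("label", "") and state["gripper"] == "open":
            params["gripper_state"] = "closed"
            state["gripper"] = "closed"
        annotated.append({**phase, "parameters": params})
    return annotated
```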
[0044] The method 200 further includes a step 218 of generating a robot program (or robot routine) executable by the robot controller. Before execution of this step 218, it is assumed that the trajectory recorded in step 210 - typically a geometric description or a joint-space description - and the instruction concerning robot-parameter values which was prepared in the annotating step 216 are available. The present step 218 may include converting the trajectory into a sequence of robot commands that cause the robot manipulator 110 to realize the trajectory, while applying the robot-parameter values as modifiers to these commands. The robot commands may be selected from a predefined set of robot commands executable by the robot controller 120. The robot commands may for example be compliant with the RAPID™ robot programming language. The step 218 may include a first substep of sampling the trajectory into a sequence of discrete points and a second substep of selecting robot commands that cause the robot manipulator 110 to move between each pair of consecutive discrete points. The sampling used in the first substep may be time-uniform sampling (constant step duration), space-uniform sampling (constant step length), or a non-uniform sampling algorithm with controlled deviation, such as Ramer-Douglas-Peucker. In a RAPID™ environment, the second substep may be performed so as to output instances of the command MoveL (Cartesian linear motion) or MoveJ (joint-space linear motion) or a combination of these. Optionally, the step 218 may include a postprocessing substep applied to the sequence of generated robot commands, such as formatting the sequence into a predefined script format by appending a header, performing a consistency check, or the like.
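As a non-limiting illustration of the two substeps, the following Python sketch simplifies a recorded sequence of Cartesian points with the Ramer-Douglas-Peucker algorithm and emits one simplified RAPID-style MoveL per remaining point; the point format, tolerance and command syntax are illustrative assumptions.

```python
import math

def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b (3D)."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    cross = [ab[1] * ap[2] - ab[2] * ap[1],
             ab[2] * ap[0] - ab[0] * ap[2],
             ab[0] * ap[1] - ab[1] * ap[0]]
    norm_ab = math.sqrt(sum(c * c for c in ab)) or 1e-12  # guard degenerate segment
    return math.sqrt(sum(c * c for c in cross)) / norm_ab

def rdp(points, epsilon):
    """Non-uniform sampling with controlled deviation (Ramer-Douglas-Peucker)."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]

def to_move_commands(points, epsilon=1.0):
    """Second substep: one command per retained point (simplified syntax)."""
    simplified = rdp(points, epsilon)
    return [f"MoveL [[{x:.1f},{y:.1f},{z:.1f}], ...], v200, z10, tool0;"
            for x, y, z in simplified]

path = [(0, 0, 0), (10, 0.2, 0), (20, 0, 0), (20, 10, 0)]
for cmd in to_move_commands(path, epsilon=1.0):
    print(cmd)
```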
[0045] In an optional final step 219, the operator 190 is offered the ability to review and edit a text representation of the robot program resulting after step 218. For example, the programming device 160 may initiate the activation of a text editor or another user interface to be put at the operator’s 190 disposal. The activated user interface may be a user interface of the robot controller 120. As explained above, it has been observed in the prior art that purely kinesthetic programming is sometimes perceived as tiring, and the operators will want to perform some aspects of the programming through the medium of text. Further, because re-recording a full trajectory is time consuming, text may be an operator’s preferred medium of editing the robot program.
[0046] Special embodiments of the programming method 200 further take into account information derived from images acquired by at least one camera 165 or depth camera. The images are acquired and preprocessed in a step 213, and the derived information is utilized in the subsequent processing steps.
[0047] In one of the special embodiments, the method 200 further includes capturing images of the kinesthetic programming session (substep 213.1) and estimating the poses of the workpiece 130 in respective phases of the robot trajectory (substep 213.2). Poses in this sense may include positions and orientations. The images may be still images acquired at relevant points in time, or the images may be extracted from a video sequence. The estimated poses may be used to determine the reference frames in which the robot program C shall express the movements of a tool 111 carried by the robot manipulator 110, and particularly to determine relationships between the reference frames. This way, even if the reference frame moves between two executions, the trajectory can be replicated with respect to the demonstration so that the relative motion between the TCP and the reference frame stays the same.
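A minimal Python sketch of how the demonstrated TCP motion can be re-expressed in, and replayed from, an estimated workpiece frame is given below, assuming 4x4 homogeneous transforms and the availability of numpy; the function names are assumptions for illustration.

```python
import numpy as np

def express_in_frame(T_frame_in_world: np.ndarray, T_tcp_in_world: np.ndarray) -> np.ndarray:
    """Express a demonstrated TCP pose in the reference frame estimated
    from the camera images (both poses given as 4x4 homogeneous matrices)."""
    return np.linalg.inv(T_frame_in_world) @ T_tcp_in_world

def replay_in_new_frame(T_frame_new_in_world: np.ndarray, T_tcp_in_frame: np.ndarray) -> np.ndarray:
    """Map the frame-relative TCP pose back to world coordinates at execution
    time, so the relative motion between TCP and reference frame is preserved."""
    return T_frame_new_in_world @ T_tcp_in_frame

T_frame = np.eye(4); T_frame[:3, 3] = [0.5, 0.0, 0.1]               # frame at demonstration
T_tcp = np.eye(4);   T_tcp[:3, 3] = [0.6, 0.0, 0.1]                 # demonstrated TCP pose
relative = express_in_frame(T_frame, T_tcp)
T_frame_moved = np.eye(4); T_frame_moved[:3, 3] = [0.7, 0.2, 0.1]   # frame has moved
print(replay_in_new_frame(T_frame_moved, relative)[:3, 3])          # TCP follows the frame
```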
[0048] A further possible use of the images of the kinesthetic programming session is in step 216, and more precisely for the purpose of further annotating the robot trajectory with the estimated poses of the workpiece 130, e.g., by including them in the serialized data file. The estimated poses may appear explicitly in the annotations, e.g., to allow a verification that the robot program is executing in the way expected, so that irregularities can be discovered early. Alternatively, the estimated poses of the workpiece 130 may have a merely indirect influence, such as informing preferable ways of gripping or lifting the workpiece 130 using the tool 111 carried by the robot manipulator 110.
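The sketch below illustrates, again without limitation, how an estimated workpiece pose could be included in the per-phase annotation of the serialized data file; the key names and values are purely illustrative assumptions.

```python
import json

phase_annotation = {
    "start": 4.2, "end": 9.8,
    "parameters": {"speed": "slow", "precision": "high"},
    # Estimated workpiece pose for this phase (position in mm, orientation in degrees).
    "workpiece_pose": {"xyz": [412.0, -35.5, 120.0], "rpy_deg": [0.0, 0.0, 90.0]},
}
print(json.dumps(phase_annotation, indent=2))
```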
Example 1
[0049] In a laboratory experiment, the inventors taught an ABB GoFa™ CRB 15000, a lightweight collaborative industrial robot manufactured by the applicant, to manipulate a plastic toolbox of approximately 50 x 30 x 30 cm with a movable handle on the lid. The appearance of the toolbox may be similar to item 130a in figure 1. The GoFa robot was equipped with an OnRobot 2FG7 gripper. The manipulation consisted in rotating the handle from a flat/horizontal position to an upright position where it is ready for use by a lifting hook, claw or a similar tool. The operator’s 190 utterances during the programming session were captured (speech data) and transcribed into timestamped text. Then GPT-3.5-turbo, an LLM, was prompted to segment the timestamped text into phases, each phase annotated with a start and end time, to identify values of specified robot parameters, including speed, precision, exact_following and reference_frame, and finally to format the output in accordance with the YAML format. The prompt to the LLM is shown in Table 1 and the resulting YAML file in Table 2. (The experiment has also been successfully repeated using GPT-4.)
[0050] Because the transcript of the operator’s 190 utterances is given to the LLM, it can be inspected at the end of the text in Table 1. In plaintext without the timestamp markup, the operator 190 is saying: “I start moving the robot close to the handle. Putting all the right angles. Now at this point I’m really close to the handle so I begin a very precise moment [movement] of grabbing the handle, moving it upwards. Slowly and now at the end the handle is perfectly towards the top. So now I move backwards and then away, taking the robot back to the home position.”
[0051] Still during the kinesthetic programming session, Cartesian waypoints of the robot trajectory were estimated based on a video recording. When these waypoints, the data in Table 2 and the recorded trajectory were converted into robot commands - in a process corresponding to step 218 described above - a RAPID™ robot program was generated. Table 3 shows excerpts of the robot program, wherein the somewhat unwieldy initial definitions of constant waypoints (robtargets) box_handle_sebas2_p1, box_handle_sebas2_p2, etc. have been omitted for space reasons. A definition of the reference frame “handle” (wobjdata) has also been intentionally left out; the reference frame was derived at runtime by an RGB-D camera image processing module.
Example 2
[0052] In a further example, an ABB YuMi™ robot was taught to insert sample tubes into slots of an eight-tube laboratory centrifuge while shifting the rotor 1/8 of a turn between each tube. The robot was also trained to remove the sample tubes after they had been processed by the centrifuge. Substantially the same workflow as for Example 1 was followed and the resulting generated robot program was found to be satisfactory.
[0053] The aspects of the present disclosure have mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the invention, as defined by the appended patent claims.
Claims
1. A method (200) of programming an industrial robot (100), which comprises a robot manipulator (110) and a robot controller (120), the method comprising: recording (210) movements of the robot manipulator during a kinesthetic programming session, for thereby obtaining a robot trajectory; capturing (211) speech data while recording the movements of the robot manipulator; decomposing (212) the captured speech data into a plurality of phases; for at least one phase of the speech data, parsing (214) the speech data into at least one robot parameter value of at least one robot parameter using a natural-language model (170); identifying (215) a contemporaneous phase of the robot trajectory and annotating (216) it with the at least one robot parameter value; on the basis of the annotated robot trajectory, generating (218) a robot program executable by the robot controller.
2. The method of claim 1, wherein parsing (214) the speech data includes: transcribing the speech data into timestamped text; and extracting at least one robot parameter value from the timestamped text using the natural-language model.
3. The method of claim 2, wherein annotating (216) the speech data further includes: formatting the extracted robot parameter value(s) for the respective phases in accordance with a data serialization format.
4. The method of any of the preceding claims, wherein generating (218) the robot program includes converting the annotated robot trajectory into a sequence of robot commands selected from a predefined set of robot commands executable by the robot controller.
5. The method of any of the preceding claims, wherein the decomposing (212) of the captured speech data into phases is performed using the natural-language model.
6. The method of any of the preceding claims, wherein the natural-language model is a large language model, LLM.
7. The method of claim 6, wherein parsing (214) the speech data includes: transcribing the speech data into timestamped text; and extracting at least one robot parameter value by prompting the LLM to segment the timestamped text into phases, each phase annotated with a start and end time, and to identify values of one or more explicitly indicated robot parameters.
8. The method of any of the preceding claims, wherein the at least one robot parameter includes a state of a tool (111) carried by the robot manipulator, such as an open or closed position of a gripper.
9. The method of any of the preceding claims, wherein the at least one robot parameter includes a movement speed of the robot manipulator.
10. The method of any of the preceding claims, wherein the at least one robot parameter includes a degree of compliance with the robot trajectory.
11. The method of any of the preceding claims, wherein the at least one robot parameter includes a degree of movement precision.
12. The method of any of the preceding claims, wherein the at least one robot parameter includes a reference frame in which the robot program expresses the movements of a tool (111) of the robot manipulator, the reference frame having an origin selected from a group consisting of: a base (112) of the robot manipulator, a workpiece (130), an object (140) other than the workpiece in a work area (150) of the robot, a point in the work area (150) of the robot.
13. The method of any of the preceding claims, further comprising: capturing (213.1) images of the kinesthetic programming session using a camera (165) or depth camera; and estimating (213.2) poses of a workpiece (130) in respective phases of the robot trajectory.
14. The method of any of the preceding claims, further comprising: enabling (219) an operator to review and edit a text representation of the robot program.
15. The method of any of the preceding claims, wherein the decomposing (212) includes associating (212.1) the phases with respective human-intelligible labels.
16. A programming device (160) for facilitating programming of an industrial robot (100), the device comprising memory (161) and processing circuitry (162) configured: to record movements of the robot manipulator during a kinesthetic programming session, for thereby obtaining a robot trajectory; to capture speech data while recording the movements of the robot manipulator; to decompose the captured speech data into a plurality of phases; for at least one phase of the speech data, to parse the speech data into at least one robot parameter value of at least one robot parameter using a natural-language model (170), to identify a contemporaneous phase of the robot trajectory and to annotate the contemporaneous phase with the at least one robot parameter value; and, on the basis of the annotated robot trajectory, to generate a robot program executable by the robot controller.
17. A computer program (164) comprising instructions which, when the program is executed by a computer, cause the computer to carry out the method of any of claims 1 to 15.