WO2025250119A1 - Vision-based bin profiling for robotic bin picking
- Publication number: WO2025250119A1 (application PCT/US2024/031207)
- Authority: WIPO (PCT)
- Legal status: Pending
Abstract
A method for bin profiling for robotic bin picking includes acquiring first and second images of a plane surface without and with the bin placed on it, respectively. The images include depth frames. A user is instructed, via a UI, to locate points on the plane surface in the first image to estimate a height of the plane surface, locate points on a base of the bin in the second image to estimate a height of the bin base, and locate points lying on a top rim of the bin in the second image to estimate a height of the bin top rim. The user is instructed to fit a 2D geometric shape describing a contour of the bin in the second image, which is projected from a pixel frame to a 3D camera reference frame using camera intrinsic parameters and the estimated height information, to estimate bin dimensions.
Description
VISION-BASED BIN PROFILING FOR ROBOTIC BIN PICKING
TECHNICAL FIELD
[0001] The present disclosure relates generally to the field of robotics in factory automation. Specifically, the described embodiments relate to a 3D vision-based profiling tool for estimating dimensions of bins in robotic bin picking applications.
BACKGROUND
[0002] The fourth industrial revolution aims to drive mass customization at the cost of mass production. This can be enabled by autonomous machines that no longer have to be programmed with detailed instructions, such as waypoints or manually taught paths, but automatically define their tasks using design information of the product to be produced. Robotic bin picking is one such skill that enables autonomous machines. Traditional robotic picking relies on a teach-based approach, such that operators predefine the robot poses for pick and drop positions. In the last decade, advances in computer vision and deep learning have enabled flexible robotic bin picking, where pre-teaching pick and drop positions is no longer required. Camera systems, such as RGB-D cameras, collect both color pictures and depth maps or point clouds of bins with objects in random configurations. The camera input is then transferred to computer vision algorithms or deep neural networks that have been trained to compute grasping locations or “pick points” on said input.
[0003] A fully automated bin picking application typically requires a bin pose estimation algorithm, which often needs accurate bin dimensions for detecting and validating the detected bin pose. Usually, this information is provided manually by the user, who enters the width, height, and depth of the bin. Such measurements are often prone to errors. Incorrect dimensions may result in failure to detect a bin of interest in a workspace or to accurately estimate its pose.
[0004] Alternatively, the dimensions of a bin can be obtained from an existing CAD model of the bin. However, in many applications, a CAD model of the bin is not readily available, as a large number of different types of bins may be used on the factory floor. Furthermore, the creation of a CAD model also often requires a designer to manually measure the bin dimensions.
SUMMARY
[0005] Aspects of this disclosure provide methods, systems, and computer program products that address and overcome one or more of the shortcomings described herein. In particular, some aspects of the disclosure are directed to a 3D vision-based bin profiling tool that can estimate dimensions of bins in robotic bin picking applications in a model-free manner while reducing measurement error.
[0006] A first aspect of the disclosure provides a computer-implemented method for estimating dimensions of a bin in a robotic bin picking task. The method comprises acquiring a first image, captured by one or more cameras, of a workspace defined by a plane surface, the first image comprising a first depth frame, and displaying the first image via a user interface (UI). The method further comprises estimating a height of the plane surface by instructing a user to locate one or more first sample points lying on the plane surface in the first image using the UI. The method further comprises acquiring a second image, captured by the one or more cameras, of the workspace including the bin placed on the plane surface, the second image comprising a second depth frame, and displaying the second image via the UI. The method further comprises estimating a height of a base of the bin by instructing the user to locate one or more second sample points lying on the base of the bin in the second image using the UI. The method further comprises estimating a height of a top rim of the bin by instructing the user to locate a plurality of third sample points lying on the top rim in the second image using the UI. The method further comprises instructing the user to fit a 2D geometric shape describing a contour of the bin in the second image using the UI. The method further comprises estimating dimensions of the bin by projecting the fitted 2D geometric shape from a pixel frame to a 3D camera reference frame using camera intrinsic parameters of the first and second images and the estimated height information.
[0007] Further aspects of this disclosure provide vision-based systems and computer program products embodying the above-described method.
[0008] Additional technical features and benefits may be realized through the techniques of the present disclosure. Embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed subject matter. For a better understanding, refer to the detailed description and to the drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The foregoing and other aspects of the present disclosure are best understood from the following detailed description when read in connection with the accompanying drawings. To easily identify the discussion of any element or act, the most significant digit or digits in a reference number refer to the figure number in which the element or act is first introduced.
[0010] FIG. 1 illustrates an example of an autonomous system capable of executing a robotic bin picking task, wherein aspects of the present disclosure may be incorporated.
[0011] FIG. 2 illustrates an example definition of typical bin dimensions.
[0012] FIG. 3 illustrates an offline operational mode of a bin profiling tool according to an example embodiment of this disclosure.
[0013] FIG. 4 illustrates an online operational mode of a bin profiling tool according to an example embodiment of this disclosure.
[0014] FIG. 5 illustrates a computing environment within which embodiments of this disclosure may be implemented.
DETAILED DESCRIPTION
[0015] Various technologies are described herein that are directed to robotic bin picking applications. The term “bin”, as used herein, refers to a container or other structure (e.g., a tote, pallet, carton, etc.) capable of receiving physical objects. A robotic bin picking application typically involves controlling a robot having a robotic arm with an end effector to grasp objects individually from a pile of objects disposed in a bin. The objects may be of the same or assorted types, and may, for example, be disposed in random poses in the bin.
[0016] To execute a grasp, an optimal grasping location or pick point may be computed from an image of the workspace, which includes the bin with the objects placed therein. In some cases, the workspace may include multiple bins. To compute the pick point, it may be necessary to detect the
bin of interest in the image and estimate a pose of the detected bin. For this purpose, it is desirable to have a 3D representation of the bin, which requires accurate estimation of dimensions of the bin. Such a 3D representation may be further useful in determining a trajectory or direction of approach for executing a grasp that is free from collisions (e.g., with the bin wall).
[0017] The present disclosure provides a methodology to accurately estimate bin dimensions for executing a robotic bin picking task. Unlike existing approaches, the proposed methodology does not rely on the availability of CAD files describing the 3D geometry of the bins or manual measurements that are often prone to errors.
[0018] The proposed methodology uses a vision-based tool for estimating bin dimensions, which is referred to herein as a “bin profiling” tool. The inputs to the bin profiling tool include camera images of a workspace defined by a plane surface. The camera images include a first image of the empty workspace and a second image of the workspace with the bin placed on the plane surface. The images include depth frames. A depth frame, also referred to as a depth map, includes a two-dimensional representation of image pixels that contains, for each pixel, a depth or height value corresponding to a distance with respect to a camera viewpoint. The camera images may be captured offline or online. In the offline operational mode, images may be captured prior to the execution of the robotic bin picking task. This allows the images to be captured at a convenient time and/or location (e.g., remote from the site of the robotic bin picking task) without necessarily using the same camera which is utilized for online execution of the robotic bin picking task. The image frames and the corresponding camera intrinsic parameters may be saved as files, which may be retrieved from a storage medium to estimate the bin dimensions. In the online operational mode, which can be simpler and faster to execute, image frames and the camera intrinsic parameters may be directly captured from a live camera feed utilized for bin pose estimation in the robotic bin picking task.
[0019] The bin profiling tool provides a user interface (UI) to instruct a user to locate sample points on specified regions in the first and second images, such as on the plane surface, on the bottom of the bin and on the top rim of the bin. Using the depth values of the sampled points from the depth frames, the heights of the respective regions can be estimated. The user is further instructed via the UI to fit a 2D geometric shape (e.g., a rectangle or a circle) describing a contour of the bin in the second image. The 2D geometric shape is projected from a pixel frame to a 3D camera reference frame using camera intrinsic parameters and the estimated height information, to estimate
bin dimensions.
[0020] Aspects of the proposed methodology may be embodied as software executable by a processor. In some embodiments, aspects of the disclosed methodology may be suitably integrated into commercial artificial intelligence (AI)-based automation software products, such as SIMATIC Robot Pick AI™ developed by Siemens AG, among others.
[0021] Turning now to the drawings, FIG. 1 illustrates an autonomous system 100 configured for performing robotic bin picking according to one or more embodiments. In the following description, unless otherwise specified, the term “system” refers to the autonomous system 100. The system 100 may be implemented in a factory setting. In contrast to conventional automation, autonomy gives each asset on the factory floor the decision-making and self-controlling abilities to act independently in the event of local issues. The system 100 may comprise one or more robots, such as the robot 102, which may be controlled by a computing system 104 to execute one or more industrial tasks within a physical environment such as a shopfloor. Examples of industrial tasks include assembly, transport, or the like.
[0022] The computing system 104 may comprise an industrial PC, or any other computing device, such as a desktop or a laptop, or an embedded system, among others. The computing system 104 can include one or more processors configured to process information and/or control various operations associated with the robot 102. The processor(s) may include one or more CPUs, GPUs, microprocessors, or any hardware devices suitable for executing instructions stored on a memory comprising a machine-readable medium. In particular, the one or more processors may be configured to execute an application program, such as an engineering tool, for operating the robot 102.
[0023] To realize autonomy of the system 100, in one embodiment, the application program may be designed to operate the robot 102 to perform a task in a skill-based programming environment. In contrast to conventional automation, where an engineer is usually involved in programming an entire task from start to finish, typically utilizing low-level code to generate individual commands, in an autonomous system as described herein, a physical device, such as the robot 102, is programmed at a higher level of abstraction using skills instead of individual commands. The skills are derived for higher-level abstract behaviors centered on how the physical environment is to be
modified by the programmed physical device. Illustrative examples of skills include a skill to grasp or pick up an object, a skill to place an object, a skill to open a door, a skill to detect an object, and so on.
[0024] The application program may generate controller code that defines a task at a high level, for example, using skill functions as described above, which may be communicated to a robot controller 108. From the high-level controller code, the robot controller 108 may generate low-level control signals for one or more motors for controlling the movement of the robot 102, such as angular position of the robot arms, swivel angle of the robot base, and so on, to execute the specified task. In other embodiments, the controller code generated by the application program may be communicated to intermediate control equipment, such as programmable logic controllers (PLC), which may then generate low-level control commands for the robot 102 to be controlled. Additionally, the application program may be configured to directly integrate sensor data from the physical environment in which the robot 102 operates. To this end, the computing system 104 may comprise a network interface to facilitate transfer of live data between the application program and various sensors, such as camera 122.
[0025] The robot 102 can include a robotic arm or manipulator 110 and a base 112 configured to support the robotic manipulator 110. The base 112 can include wheels 114 or can otherwise be configured to move within the physical environment 106. The robot 102 can further include an end effector 116 attached to the robotic manipulator 110. The end effector 116 may include a gripper configured to grasp (hold) and pick up an object 118. Examples of end effectors include vacuum-based grippers or suction cups (as shown), antipodal grippers such as fingers or claws, and magnetic grippers, among others. The robotic manipulator 110 can be configured to move so as to change the position of the end effector 116, to enable picking and moving objects 118 within the physical environment.
[0026] A robotic bin picking task may involve picking objects 118 one-by-one from a bin 120, using the end effector 116. The objects 118 may be disposed in random poses within the bin 120. The objects 118 can be of assorted types or of the same type. To accomplish this task, the system 100 may include sensors or cameras that enable the robot 102 to perceive the physical environment. As shown, these sensors may include (among others) a camera 122 for capturing an image of the workspace that includes the bin 120 containing the objects 118. The camera 122 may include, for
instance, a 2D camera such as a depth camera or an RGB-D camera, or a 3D point cloud sensor, among others. The camera 122 may be positioned to capture an image with a top-down view of the workspace. The camera image may be provided as an input to a computing system, such as the computing system 104, for computing a pick point for the end effector 116 to execute a grasp. The pick point may be computed, for example, using a grasp neural network. Grasp neural networks are often convolutional, such that given an input image (typically including a depth frame), the network can label each pixel of the input image with some type of grasp affordance metric or grasp score. Based on the grasp scores, an optimal grasp location (“pick point”) may be computed that satisfies specified constraints, such as avoiding collision with a bin wall. The computed pick point and trajectory (direction of approach) may be outputted to a controller, such as the robot controller 108, to control the end effector 116 to pick an object 118. For example, as described above, the pick point and trajectory may be output as high-level controller code to the controller, which may therefrom generate low-level commands to control movement of the end effector 116.
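For illustration only, the following Python sketch shows how a pick point might be selected from a per-pixel grasp score map while excluding pixels near a bin wall; the function and argument names are assumptions and not part of the described embodiments:

```python
import numpy as np

def select_pick_point(grasp_scores: np.ndarray, interior_mask: np.ndarray):
    """Return the (row, col) of the highest-scoring pixel inside the bin interior.

    grasp_scores: 2D array of per-pixel grasp scores from a grasp neural network.
    interior_mask: boolean 2D array, True only for pixels safely away from the bin walls.
    """
    masked = np.where(interior_mask, grasp_scores, -np.inf)  # exclude wall regions
    row, col = np.unravel_index(int(np.argmax(masked)), masked.shape)
    return int(row), int(col)
```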
[0027] To compute the pick point, it may be necessary to detect the bin in the camera image and estimate its pose, which may be achieved by creating a 3D representation of the bin based on its dimensions.
[0028] FIG. 2 illustrates an example definition of typical bin dimensions. As shown, a bin 200 may include a base 202 representing a bottom of the bin 200 defined by length and width dimensions, a rim 204 defining a top edge of the bin 200, and a wall 206 extending between the base 202 and the top rim 204, defined by a height dimension. The dimensions of the bin 200 may include external dimensions defined by length L1, width W1 and height H1, and internal dimensions defined by length L2, width W2 and height H2. The external and internal dimensions typically differ due to the thickness of the bin base 202 and the bin wall 206. Often, the top rim 204 may also have a suitable width (extending transversely from the bin wall 206) to facilitate stacking of bins. The bin dimensions, such as the external dimensions L1, W1, H1 and internal dimensions L2, W2, H2, may be estimated using the proposed bin profiling tool as described below.
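As a purely illustrative sketch (the class and field names are assumptions, not part of the described embodiments), the external and internal dimensions of FIG. 2 could be held in a simple record such as:

```python
from dataclasses import dataclass

@dataclass
class BinDimensions:
    """External (L1, W1, H1) and internal (L2, W2, H2) bin dimensions, e.g., in millimeters."""
    external_length: float  # L1
    external_width: float   # W1
    external_height: float  # H1
    internal_length: float  # L2
    internal_width: float   # W2
    internal_height: float  # H2
```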
[0029] FIG. 3 illustrates a computer-implemented workflow describing an offline operational mode of the proposed bin profiling tool according to an example embodiment. The workflow, including components thereof, may be implemented by a computing system in various ways, for example, as hardware and programming. The programming may take the form of processor-
executable instructions stored on non-transitory machine-readable storage mediums and the hardware may include processors to execute those instructions. For example, the programs may run on the computing system 104 of the system 100 shown in FIG. 1. Furthermore, the processing capability may be distributed among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems or cloud/network elements. The steps in the workflow need not be implemented in the exact sequence as described.
[0030] The input 302 to the workflow includes a first camera image of an empty workspace with a plane surface, a second camera image of the same workspace with a bin placed on the plane surface, and camera intrinsic parameters of the captured images. The first and second images comprise respective first and second depth frames. A depth frame includes a two-dimensional representation of image pixels that contains, for each pixel, a depth or height value corresponding to a distance with respect to a camera viewpoint. In one embodiment, a camera used for capturing the first and second images may be suitably positioned to capture a top-down view of the plane surface and the bin respectively. If the camera is positioned at an angle in relation to the plane surface, the camera image may be suitably re-projected to compute the height from the depth value using known transformations. Each image is associated with camera intrinsic parameters that can be used to compute a mapping between coordinates in the 2D pixel frame and coordinates in a 3D camera reference frame. Typically, the camera intrinsic parameters include the pixel coordinates of the principal point or optical center, and the focal length along orthogonal axes.
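For illustration, a minimal sketch of how the camera intrinsic parameters and a depth frame described above might be represented and queried; the names are assumptions and do not describe the actual tool's API:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class CameraIntrinsics:
    ppx: float  # principal point x, in pixels
    ppy: float  # principal point y, in pixels
    fx: float   # focal length along the x-axis, in pixels
    fy: float   # focal length along the y-axis, in pixels

def height_at_pixel(depth_frame: np.ndarray, u: int, v: int) -> float:
    """Read the depth/height value stored at pixel (u, v) of a depth frame,
    i.e., the distance from the camera viewpoint to the scene at that pixel."""
    return float(depth_frame[v, u])  # row = v, column = u
```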
[0031] In one embodiment, the first and second images may further comprise respective first and second color frames. A color frame includes a two-dimensional representation of image pixels, where each pixel includes intensity values for a number of color components. An example of a color intensity image is an RGB color image, which is an image including pixel intensity information in red, green and blue color channels. The first and second color frames may be aligned pixel-wise with the first and second depth frames respectively. For this purpose, an RGB-D camera can be used, which may be configured to acquire an image with red-green-blue (RGB) color and depth (D) channels. For the purpose of executing the steps described hereinafter, the bin profiling tool may be configured to display the color frames via a UI, making it easier for the user to locate sample points in specified regions in the images. The height of the sample points may be computed from the depth frames aligned with the respective color frames.
[0032] In the offline operational mode, the images can be captured prior to the execution of the robotic bin picking task. The offline mode allows a user to save the image frames and camera intrinsic parameters and perform the bin profiling later. In one embodiment, the saved image frames may include color (e.g., RGB) frames and depth frames. If the camera produces distorted images (e.g., some form of radial distortion that portrays lines as curves), the images may be first undistorted using the camera distortion coefficients. If any of the camera intrinsic parameters, depth-color extrinsic parameters (required to pixel-wise align color and depth images from different sensors) or distortion coefficients are missing, then standard computer vision methods (such as checkerboard calibration) may be employed to compute them. In various embodiments, the images may be captured at a different location (e.g., remotely) with respect to the site of the robotic bin picking task and/or captured via a different camera than that utilized for bin pose estimation in the robotic bin picking task. In some embodiments, the first and second images may themselves be captured using different cameras. When the robotic bin picking task is to be executed, the bin profiling tool may acquire the images by retrieving the images along with the corresponding camera intrinsic parameters as saved files from a storage medium. In embodiments, the bin profiling tool may automatically load and select which image frame to use in each step and instruct the user how to proceed via the UI. The UI may be implemented on a display device, such as a display monitor, a touchscreen, among others. To locate the sample points, the user may use an input device, such as a mouse, a stylus, among others.
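For illustration only, a minimal sketch of the offline acquisition path, in which previously saved frames and intrinsics are retrieved and the color frame is undistorted with OpenCV; the file layout and JSON key names are assumptions:

```python
import json
import numpy as np
import cv2

def load_offline_capture(prefix: str):
    """Retrieve a previously saved color frame, depth frame, and intrinsics
    (file names and keys are illustrative) and undistort the color frame."""
    color = cv2.imread(f"{prefix}_color.png")            # BGR color frame
    depth = np.load(f"{prefix}_depth.npy")               # per-pixel depth, e.g., in mm
    with open(f"{prefix}_intrinsics.json") as f:
        intr = json.load(f)                              # ppx, ppy, fx, fy, dist

    K = np.array([[intr["fx"], 0.0, intr["ppx"]],
                  [0.0, intr["fy"], intr["ppy"]],
                  [0.0, 0.0, 1.0]])
    dist = np.array(intr.get("dist", [0.0, 0.0, 0.0, 0.0, 0.0]))
    color = cv2.undistort(color, K, dist)                # remove lens distortion
    return color, depth, intr
```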
[0033] Continuing with reference to FIG. 3, at block 304, the bin profiling tool may display the first image via the UI and instruct the user to locate one or more first sample points 322 lying on the plane surface 320 in the first image using the UI. For example, as shown in the present embodiment, the user may be instructed to locate the one or more first sample points 322 within an automatically designated region at a center of the first image using the UI. The designated region may be indicated via the UI, for example, by a circle, as shown. The depth or height value of the pixel(s) corresponding to the one or more first sample points 322 may be determined from the first depth frame, from which a height of the plane surface 320 may be estimated. In some embodiments, if multiple first points 322 are sampled, the height of the plane surface 320 may be estimated as an average height of the pixels corresponding to the first sample points 322. In other embodiments, fitting or regression methods (such as linear least squares) may be used to find the best fitting plane to the given set of first sample points 322. The height of the plane surface 320 may be estimated by computing the height of the fitting plane. This approach can make the system robust to larger depth
noise spikes present in a subset of the points, as it may enable identification, and subsequent removal, of outliers. This approach can also identify large variances in the sampled points and provide the user with an appropriate warning or error message, prompting them to recheck the setup and recollect the camera images and/or points.
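The plane-fitting option could be sketched as follows; this is a minimal example assuming the sampled points are available as (x, y, z) coordinates, with an illustrative outlier threshold:

```python
import numpy as np

def estimate_plane_height(points: np.ndarray, outlier_sigma: float = 2.0) -> float:
    """Fit a plane z = a*x + b*y + c to sampled (x, y, z) points by linear least
    squares, drop residual outliers, refit, and return the plane height evaluated
    at the centroid of the inliers."""
    def fit(pts):
        A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
        coeffs, *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
        return coeffs, pts[:, 2] - A @ coeffs  # coefficients and residuals

    coeffs, residuals = fit(points)
    keep = np.abs(residuals) <= outlier_sigma * (residuals.std() + 1e-9)
    inliers = points[keep] if keep.sum() >= 3 else points  # fall back if too few remain
    coeffs, _ = fit(inliers)

    cx, cy = inliers[:, 0].mean(), inliers[:, 1].mean()
    return float(coeffs[0] * cx + coeffs[1] * cy + coeffs[2])
```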
[0034] At block 306, the bin profiling tool may display the second image via the UI and instruct the user to locate one or more second sample points 328 lying on a base 326 of a bin 324 (placed on the plane surface 320) in the second image using the UI. For example, as shown in the present embodiment, the user may be instructed to locate the one or more second sample points 328 within an automatically designated region at a center of the second image using the UI. The designated region may be indicated via the UI, for example, by a circle, as shown. The depth or height value of the pixel(s) corresponding to the one or more second sample points 328 may be determined from the second depth frame, from which a height of the bin base 326 may be estimated. In some embodiments, if multiple second points 328 are sampled, the height of the bin base 326 may be estimated as an average height of the pixels corresponding to the second sample points 328. In other embodiments, as described above, fitting or regression methods (such as linear least squares) may be used to find the best fitting plane to the given set of second sample points 328. The height of the bin base 326 may be estimated by computing the height of the fitting plane.
[0035] At block 308, the bin profiling tool may display the second image via the UI and instruct the user to locate several third sample points 332 lying on a top rim 330 of the bin 324 in the second image using the UI. The depth or height value of the pixels corresponding to the third sample points 332 may be used to estimate a height of the top rim 330. The third sample points 332 may often include noise introduced by factors such as user error, image artifacts, lower image resolution, among others. In some embodiments, noisy points may be removed from the third sample points 332 using a statistical outlier filter based on determining a height of each sample point 332 from the second depth frame. The height of the top rim 330 may be estimated by computing an average height of the pixels corresponding to the remaining sample points 332 after filtering out the noisy points. In other embodiments, as described above, fitting or regression methods (such as linear least squares) may be used to find the best fitting plane to the given set of third sample points 332. The height of the top rim 330 may be estimated by computing the height of the fitting plane.
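A minimal sketch of the statistical-outlier-filter option, assuming the heights of the third sample points have already been read from the second depth frame (the threshold value is illustrative):

```python
import numpy as np

def estimate_rim_height(point_heights: np.ndarray, num_sigma: float = 2.0) -> float:
    """Filter out height outliers with a simple mean/standard-deviation test and
    return the average height of the remaining rim sample points."""
    mean, std = point_heights.mean(), point_heights.std()
    inliers = point_heights[np.abs(point_heights - mean) <= num_sigma * std]
    return float(inliers.mean()) if inliers.size else float(mean)
```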
[0036] At block 310, the bin profiling tool may display the second image via the UI and instruct
the user to fit a first 2D geometric shape describing an external contour of the top rim 330 in the second image using the UI. In various embodiments, the fitted 2D geometric shape may include any primitive depending on the shape of the bin 324. In the present embodiment, the first 2D geometric shape includes a first rectangle 334. The corner points of the rectangle 334 may thereby define external corner points of the top rim 330.
[0037] At block 312, the bin profiling tool may display the second image via the UI and instruct the user to fit a second 2D geometric shape describing an internal contour of the bin base 326 in the second image using the UI. In various embodiments, the fitted 2D geometric shape may include any primitive depending on the shape of the bin 324. In the present embodiment, the second 2D geometric shape includes a second rectangle 336. The corner points of the rectangle 336 may thereby define internal corner points of the bin base 326.
[0038] At block 314, the bin profiling tool may estimate the external and internal dimensions of the bin 324 by projecting the fitted first and second 2D geometric shapes 334, 336 from the pixel frame 350 to the 3D camera reference frame 130 (see FIG. 1) using the camera intrinsic parameters and the estimated height information.
[0039] In the present embodiment, block 314 may comprise transforming coordinates of the external corner points (corner points of rectangle 334) and internal corner points (corner points of rectangle 336) from the 2D pixel frame 350 to the 3D camera reference frame 130 using the camera intrinsic parameters and the estimated height information. In the 3D camera reference frame 130, the estimated height values are represented along the z-axis. The coordinates of each corner point (external and internal) in the 3D camera frame 130 may be computed, for example, using the following transformations:

x in camera frame = (x in pixel frame - ppx) * z / fx (1)

y in camera frame = (y in pixel frame - ppy) * z / fy (2)

z in camera frame = z (3)

where ppx and ppy describe the coordinates of the principal point in the pixel frame (camera intrinsic parameters), fx and fy describe the focal lengths of the image along orthogonal axes (camera intrinsic parameters), and z is the estimated height of the point, i.e., the distance from the camera to that point in 3D space.
[0040] Based on the determined coordinates of the external and internal corner points in the 3D camera reference frame 130, the external bin dimensions L1, W1, H1 and the internal bin dimensions L2, W2, H2 may be computed in length units (e.g., in millimeters) by computing distances between the respective points in 3D space and using the estimated height information.
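For illustration, a minimal Python sketch of this projection and distance computation, following equations (1) to (3); the helper names are assumptions, and the height differences suggested for H1 and H2 assume a top-down camera where height is expressed as distance from the camera:

```python
import numpy as np

def pixel_to_camera(u, v, z, ppx, ppy, fx, fy):
    """Back-project pixel (u, v) with estimated height z into the 3D camera frame,
    per equations (1) to (3)."""
    return np.array([(u - ppx) * z / fx, (v - ppy) * z / fy, z])

def rectangle_length_width(corners_px, z, ppx, ppy, fx, fy):
    """Given four corner pixels of a fitted rectangle (ordered around its contour)
    and the estimated height z of that rectangle, return (length, width) in 3D."""
    pts = np.array([pixel_to_camera(u, v, z, ppx, ppy, fx, fy) for u, v in corners_px])
    side_a = np.linalg.norm(pts[1] - pts[0])
    side_b = np.linalg.norm(pts[2] - pts[1])
    return max(side_a, side_b), min(side_a, side_b)

# Illustrative height computation for a top-down view, with heights expressed as
# distances from the camera (the rim lies closer to the camera than the plane surface):
# H1 = plane_surface_height - rim_height   # external height
# H2 = bin_base_height - rim_height        # internal height
```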
[0041] In some embodiments, for example, where a bin is made of a relatively thin material (such as a carton), the external and internal dimensions of the bin may not be significantly different, and hence only a single length L, width W and height H of the bin may be estimated to determine its pose. In this case, the blocks 310 and 312 in FIG. 3 may be combined into a single step where a user is instructed to fit a single 2D geometric shape describing a contour of the bin 324 in the second image, using the UI. Note that in this scenario, the coordinates of the corner points of the top rim 330 and the bin base 326 would coincide in the 2D pixel frame 350 but would be different in the 3D camera reference frame 130 due to the difference in height between them. Based on the determined coordinates of the corner points of the top rim 330 and bin base 326 in the 3D camera reference frame 130 (e.g., computed as described above), bin dimensions L, W, H may be computed in length units by computing distances between the respective points in 3D space and using the estimated height information.
[0042] FIG. 4 illustrates a computer-implemented workflow describing an online operational mode of the proposed bin profiling tool according to an example embodiment. The workflow, including components thereof, may be implemented by a computing system in various ways, for example, as hardware and programming. The programming may take the form of processor-executable instructions stored on non-transitory machine-readable storage mediums and the hardware may include processors to execute those instructions. For example, the programs may run on the computing system 104 of the system 100 shown in FIG. 1. Furthermore, the processing capability may be distributed among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems or cloud/network elements. The steps in the workflow need not be implemented in the exact sequence as described.
[0043] In the online mode, the images may be acquired online during execution of the robotic bin
picking task. The images may be acquired by the bin profiling tool by capturing respective depth frames and the intrinsic parameters from a live camera feed utilized for bin pose estimation in the robotic bin picking task. For example, in the system 100 shown in FIG. 1, the camera 122 may be used for this purpose.
[0044] The input 402 to the workflow includes a first camera image of an empty workspace with a plane surface, a second camera image of the same workspace with a bin placed on the plane surface, and camera intrinsic parameters of the captured images. The image frames are captured from a live camera feed. The first and second images comprise respective first and second depth frames and may additionally include first and second color frames aligned pixel-wise with the respective depth frames. For the purpose of executing the steps described hereinafter, the bin profiling tool may be configured to display the color frames via a UI, making it easier for the user to locate sample points in specified regions in the images. The height of the sample points may be computed from the depth frames aligned with the respective color frames. The UI may be implemented on a display device, such as a display monitor, a touchscreen, among others. To locate the sample points, the user may use an input device, such as a mouse, a stylus, among others.
[0045] The camera may include, for example, an RGB-D camera, a color point cloud camera, among others. The camera intrinsic properties (e.g., principal point and focal length) may be acquired by the bin profiling tool directly from the live camera, for example via a camera API, or may be provided to the tool via a file or other means by the user. If a point cloud camera is used, the camera intrinsic parameters may be used to convert a 3D point cloud into 2D depth and RGB color frames. The camera may be suitably positioned to capture a top-down view of the plane surface and the bin respectively. If the camera is positioned at an angle in relation to the plane surface, the camera image may be suitably re-projected to compute the height from the depth value using known transformations. If the camera produces distorted images (e.g., some form of radial distortion that portrays lines as curves), the images may be first undistorted using the camera distortion coefficients. If any of the camera intrinsic parameters, depth-color extrinsic parameters (required to pixel-wise align color and depth images from different sensors) or distortion coefficients are missing, then standard computer vision methods (such as checkerboard calibration) may be employed to compute them.
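If a point cloud camera is used, the conversion to a 2D depth frame mentioned above could, for example, look like the following sketch, assuming the cloud is given as (x, y, z) points in the camera frame and a simple pinhole model; handling of multiple points falling on the same pixel is simplified for brevity:

```python
import numpy as np

def point_cloud_to_depth(points: np.ndarray, width: int, height: int,
                         ppx: float, ppy: float, fx: float, fy: float) -> np.ndarray:
    """Project an (N, 3) point cloud given in the camera frame into a 2D depth
    frame using pinhole intrinsics. Pixels that receive no point stay at 0;
    if several points fall on one pixel, the last one written wins."""
    depth = np.zeros((height, width), dtype=np.float32)
    pts = points[points[:, 2] > 0]                       # keep points in front of the camera
    u = np.round(pts[:, 0] * fx / pts[:, 2] + ppx).astype(int)
    v = np.round(pts[:, 1] * fy / pts[:, 2] + ppy).astype(int)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    depth[v[ok], u[ok]] = pts[ok, 2]
    return depth
```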
[0046] Continuing with reference to FIG. 4, at block 404, the bin profiling tool may capture a first
image of an empty workspace with a plane surface 420 from the live camera feed. The bin profiling tool may display the first image via the UI and instruct the user to locate one or more first sample points 422 lying on the plane surface 420 in the first image using the UI. For example, as shown in the present embodiment, the user may be instructed to locate the one or more first sample points 422 within an automatically designated region at a center of the first image using the UI. The designated region may be indicated via the UI, for example, by a circle, as shown. The depth or height value of the pixel(s) corresponding to the one or more first sample points 422 may be determined from the first depth frame, from which a height of the plane surface 420 may be estimated. In some embodiments, if multiple first points 422 are sampled, the height of the plane surface 420 may be estimated as an average height of the pixels corresponding to the first sample points 422. In other embodiments, fitting or regression methods (such as linear least squares) may be used to find the best fitting plane to the given set of first sample points 422. The height of the plane surface 420 may be estimated by computing the height of the fitting plane. This approach can make the system robust to larger depth noise spikes present in a subset of the points, as it may enable identification, and subsequent removal, of outliers. This approach can also identify large variances in the sampled points and provide the user with an appropriate warning or error message, prompting them to recheck the setup and recollect the camera images and/or points.
[0047] At block 406, the bin profiling tool may instruct the user to place a bin 424 in the workspace on the plane surface 420 and capture a second image of the workspace from the live camera feed. The bin profiling tool may display the second image via the UI and instruct the user to locate one or more second sample points 428 lying on a base 426 of the bin 424 in the second image using the UI. For example, as shown in the present embodiment, the user may be instructed to locate the one or more second sample points 428 within an automatically designated region at a center of the second image using the UI. The designated region may be indicated via the UI, for example, by a circle, as shown. The depth or height value of the pixel(s) corresponding to the one or more second sample points 428 may be determined from the second depth frame, from which a height of the bin base 426 may be estimated. In some embodiments, if multiple second points 428 are sampled, the height of the bin base 426 may be estimated as an average height of the pixels corresponding to the second sample points 428. In other embodiments, as described above, fitting or regression methods (such as linear least squares) may be used to find the best fitting plane to the given set of second sample points 428. The height of the bin base 426 may be estimated by computing the height of the fitting plane.
[0048] At block 408, the bin profiling tool may display the second image via the UI and instruct the user to locate several third sample points 432 lying on a top rim 430 of the bin 424 in the second image using the UI. The depth or height value of the pixels corresponding to the third sample points 432 may be used to estimate a height of the top rim 430. The third sample points 432 may often include noise introduced by factors such as user error, image artifacts, lower image resolution, among others. In some embodiments, noisy points may be removed from the third sample points 432 using a statistical outlier filter based on determining a height of each sample point 432 from the second depth frame. The height of the top rim 430 may be estimated by computing an average height of the pixels corresponding to the remaining sample points 432 after filtering out the noisy points. In other embodiments, as described above, fitting or regression methods (such as linear least squares) may be used to find the best fitting plane to the given set of third sample points 432. The height of the top rim 430 may be estimated by computing the height of the fitting plane.
[0049] At block 410, the bin profiling tool may display the second image via the UI and instruct the user to fit a first 2D geometric shape describing an external contour of the top rim 430 in the second image using the UI. In various embodiments, the fitted 2D geometric shape may include any primitive depending on the shape of the bin 424. In the present embodiment, the first 2D geometric shape includes a first rectangle 434. The corner points of the rectangle 434 may thereby define external corner points of the top rim 430.
[0050] At block 412, the bin profiling tool may display the second image via the UI and instruct the user to fit a second 2D geometric shape describing an internal contour of the bin base 426 in the second image using the UI. In various embodiments, the fitted 2D geometric shape may include any primitive depending on the shape of the bin 424. In the present embodiment, the second 2D geometric shape includes a second rectangle 436. The corner points of the rectangle 436 may thereby define internal corner points of the bin base 426.
[0051] At block 414, the bin profiling tool may estimate the external and internal dimensions of the bin 424 by projecting the fitted first and second 2D geometric shapes 434, 436 from the pixel frame 450 to the 3D camera reference frame 130 (see FIG. 1) using the camera intrinsic parameters and the estimated height information.
[0052] In the present embodiment, block 414 may comprise transforming coordinates of the
external corner points (corner points of rectangle 434) and internal corner points (corner points of rectangle 436) from the 2D pixel frame 450 to the 3D camera reference frame 130 using the camera intrinsic parameters and the estimated height information. In the 3D camera reference frame 130, the estimated height values are represented along the z-axis. The coordinates of each corner point (external and internal) in the 3D camera frame 130 may be computed, for example, using the transformations described above in equations (1) to (3).
[0053] Based on the determined coordinates of the external and internal corner points in the 3D camera reference frame 130, the external bin dimensions L1, W1, H1 and the internal bin dimensions L2, W2, H2 may be computed in length units (e.g., in millimeters) by computing distances between the respective points in 3D space and using the estimated height information.
[0054] In some embodiments, for example, where a bin is made of a relatively thin material (such as a carton), the external and internal dimensions of the bin may not be significantly different, and hence only a single length L, width W and height H of the bin may be estimated to determine its pose. In this case, the blocks 410 and 412 in FIG. 4 may be combined into a single step where a user is instructed to fit a single 2D geometric shape describing a contour of the bin 424 in the second image, using the UI. Note that in this scenario, the coordinates of the corner points of the top rim 430 and the bin base 426 would coincide in the 2D pixel frame 450 but would be different in the 3D camera reference frame 130 due to the difference in height between them. Based on the determined coordinates of the corner points of the top rim 430 and bin base 426 in the 3D camera reference frame 130 (e.g., computed as described above), bin dimensions L, W, H may be computed in length units by computing distances between the respective points in 3D space and using the estimated height information.
[0055] FIG. 5 illustrates an exemplary computing environment comprising a computing system 502, within which aspects of the present disclosure may be implemented. The computing system 502 may be embodied, for example and without limitation, as an industrial PC with a Linux operating system, for executing real-time control of a physical device, such as a robot.
[0056] As shown in FIG. 5, the computing system 502 may include a communication mechanism such as a system bus 504 or other communication mechanism for communicating information within the computing system 502. The computing system 502 further includes one or more processors 506
coupled with the system bus 504 for processing the information. The processors 506 may include one or more central processing units (CPUs), graphical processing units (GPUs), AI accelerators, or any other processor known in the art.
[0057] The computing system 502 also includes a system memory 508 coupled to the system bus 504 for storing information and instructions to be executed by processors 506. The system memory 508 may include computer readable storage media in the form of volatile and/or nonvolatile memory, such as read only memory (ROM) 510 and/or random access memory (RAM) 512. The system memory RAM 512 may include other dynamic storage device(s) (e.g., dynamic RAM, static RAM, and synchronous DRAM). The system memory ROM 510 may include other static storage device(s) (e.g., programmable ROM, erasable PROM, and electrically erasable PROM). In addition, the system memory 508 may be used for storing temporary variables or other intermediate information during the execution of instructions by the processors 506. A basic input/output system 514 (BIOS) containing the basic routines that help to transfer information between elements within computing system 502, such as during start-up, may be stored in system memory ROM 510. System memory RAM 512 may contain data and/or program modules that are immediately accessible to and/or presently being operated on by the processors 506. System memory 508 may additionally include, for example, operating system 516, application programs 518, other program modules 520 and program data 522.
[0058] The computing system 502 also includes a disk controller 524 coupled to the system bus 504 to control one or more storage devices for storing information and instructions, such as a magnetic hard disk 526 and a removable media drive 528 (e.g., floppy disk drive, compact disc drive, tape drive, and/or solid state drive). The storage devices may be added to the computing system 502 using an appropriate device interface (e.g., a small computer system interface (SCSI), integrated device electronics (IDE), Universal Serial Bus (USB), or FireWire).
[0059] The computing system 502 may also include a display controller 530 coupled to the system bus 504 to control a display 532, such as a cathode ray tube (CRT) or liquid crystal display (LCD), among others, for displaying information to a computer user. The computing system 502 includes a user input interface 534 and one or more input devices, such as a keyboard 536 and a pointing device 538, for interacting with a computer user and providing information to the one or more processors 506. The pointing device 538, for example, may be a mouse, a light pen, a trackball, or a
pointing stick for communicating direction information and command selections to the one or more processors 506 and for controlling cursor movement on the display 532. The display 532 may provide a touch screen interface which allows input to supplement or replace the communication of direction information and command selections by the pointing device 538.
[0060] The computing system 502 also includes an I/O adapter 546 coupled to the system bus 504 to connect the computing system 502 to a controllable physical device, such as a robot. In the example shown in FIG. 5, the I/O adapter 546 is connected to robot controller 548. In one embodiment, the robot controller 548 includes, for example, one or more motors for controlling linear and/or angular positions of various parts (e.g., arm, base, etc.) of a robot.
[0061] The computing system 502 may perform a portion or all of the processing steps of embodiments of the disclosure in response to the one or more processors 506 executing one or more sequences of one or more instructions contained in a memory, such as the system memory 508. Such instructions may be read into the system memory 508 from another computer readable storage medium, such as a magnetic hard disk 526 or a removable media drive 528. The magnetic hard disk 526 may contain one or more datastores and data files used by embodiments of the present disclosure. Datastore contents and data files may be encrypted to improve security. The processors 506 may also be employed in a multi-processing arrangement to execute the one or more sequences of instructions contained in system memory 508. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions. Thus, embodiments are not limited to any specific combination of hardware circuitry and software.
[0062] The computing system 502 may include at least one computer readable storage medium or memory for holding instructions programmed according to embodiments of the disclosure and for containing data structures, tables, records, or other data described herein. The term “computer readable storage medium” as used herein refers to any medium that participates in providing instructions to the one or more processors 506 for execution. A computer readable storage medium may take many forms including, but not limited to, non-transitory, non-volatile media, volatile media, and transmission media. Non-limiting examples of non-volatile media include optical disks, solid state drives, magnetic disks, and magneto-optical disks, such as magnetic hard disk 526 or removable media drive 528. Non-limiting examples of volatile media include dynamic memory, such as system memory 508. Non-limiting examples of transmission media include coaxial cables,
copper wire, and fiber optics, including the wires that make up the system bus 504. Transmission media may also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
[0063] The computing environment 500 may further include the computing system 502 operating in a networked environment using logical connections to one or more remote computers, such as remote computing device 544. Remote computing device 544 may be a personal computer (laptop or desktop), a mobile device, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to computing system 502. When used in a networking environment, computing system 502 may include a modem 542 for establishing communications over a network 540, such as the Internet. Modem 542 may be connected to system bus 504 via network interface 545, or via another appropriate mechanism.
[0064] Network 540 may be any network or system generally known in the art, including the Internet, an intranet, a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), a direct connection or series of connections, a cellular telephone network, or any other network or medium capable of facilitating communication between computing system 502 and other computers (e.g., remote computing device 544). The network 540 may be wired, wireless or a combination thereof. Wired connections may be implemented using Ethernet, Universal Serial Bus (USB), RJ-6, or any other wired connection generally known in the art. Wireless connections may be implemented using Wi-Fi, WiMAX, and Bluetooth, infrared, cellular networks, satellite or any other wireless connection methodology generally known in the art. Additionally, several networks may work alone or in communication with each other to facilitate communication in the network 540.
[0065] The embodiments of the present disclosure may be implemented with any combination of hardware and software. In addition, the embodiments of the present disclosure may be included in an article of manufacture (e.g., one or more computer program products) having, for example, a non-transitory computer-readable storage medium. The computer readable storage medium has embodied therein, for instance, computer readable program instructions for providing and facilitating the mechanisms of the embodiments of the present disclosure. The article of manufacture can be included as part of a computer system or sold separately.
[0066] The computer readable storage medium can include a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
[0067] The system and processes of the figures are not exclusive. Other systems, processes and menus may be derived in accordance with the principles of the disclosure to accomplish the same objectives. Although this disclosure has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art, without departing from the scope of the disclosure.
Claims
1. A computer-implemented method for estimating dimensions of a bin in a robotic bin picking task, comprising:
acquiring a first image, captured by one or more cameras, of a workspace defined by a plane surface, the first image comprising a first depth frame, and displaying the first image via a user interface (UI);
estimating a height of the plane surface by instructing a user to locate one or more first sample points lying on the plane surface in the first image using the UI;
acquiring a second image, captured by the one or more cameras, of the workspace including the bin placed on the plane surface, the second image comprising a second depth frame, and displaying the second image via the UI;
estimating a height of a base of the bin by instructing the user to locate one or more second sample points lying on the base of the bin in the second image using the UI;
estimating a height of a top rim of the bin by instructing the user to locate a plurality of third sample points lying on the top rim in the second image using the UI;
instructing the user to fit a 2D geometric shape describing a contour of the bin in the second image using the UI; and
estimating dimensions of the bin by projecting the fitted 2D geometric shape from a pixel frame to a 3D camera reference frame using camera intrinsic parameters of the first and second images and the estimated height information.
2. The method according to claim 1, wherein the first and second images are captured offline prior to execution of the robotic bin picking task, wherein acquiring the first and second images comprises retrieving the first and second images along with the intrinsic parameters as files from a storage medium.
3. The method according to claim 2, wherein the first and second images are captured at a different location than a site of the robotic bin picking task and/or captured via a different camera than that utilized for bin pose estimation in the robotic bin picking task.
4. The method according to claim 1, wherein the first and second images are captured online during execution of the robotic bin picking task, wherein acquiring the first and second images comprises capturing respective depth frames and the camera intrinsic parameters from a live camera feed utilized for bin pose estimation in the robotic bin picking task.
5. The method according to any of claims 1 to 4, wherein the first image further comprises a first color frame and the second image further comprises a second color frame, the first and second color frames aligned pixel-wise with the first and second depth frames respectively, and wherein displaying the first and second images via the UI includes displaying the first and second color frames respectively.
6. The method according to any of claims 1 to 5, wherein the first and second images are captured with a top-down camera view of the plane surface and the bin respectively.
7. The method according to any of claims 1 to 6, comprising instructing the user to locate the one or more first sample points and the one or more second sample points within automatically designated regions at a center of the first image and second image respectively using the UI.
8. The method according to any of claims 1 to 7, wherein estimating the height of the top rim of the bin comprises:
removing noisy points from the plurality of third sample points using a statistical outlier filter based on determining a height of each sample point from the second depth frame; and
computing an average height of the remaining sample points in the plurality of third sample points.
9. The method according to any of claims 1 to 8, wherein estimating the height of the top rim of the bin comprises determining a fitting plane to the plurality of third sample points and computing a height of the fitting plane.
10. The method according to any of claims 1 to 9, wherein fitting a 2D geometric shape includes fitting a first 2D geometric shape describing an external contour of the top rim of the bin and fitting a second 2D geometric shape describing an internal contour of the base of the bin, and wherein estimating the dimensions of the bin comprises estimating external and internal dimensions of the bin by projecting the fitted first and second 2D geometric shapes from the pixel frame to the 3D camera reference frame using the camera intrinsic parameters and the estimated height information.
11. The method according to claim 10, wherein the fitted first and second 2D shapes include first and second rectangles, which respectively define external corner points of the top rim of the bin and internal corner points of the base of the bin, wherein the step of projecting comprises transforming coordinates of the external and internal corner points from the pixel frame to the 3D camera reference frame using the camera intrinsic parameters and the estimated height information.
12. A non-transitory computer-readable storage medium including instructions that, when processed by one or more processors, configure the one or more processors to perform the method according to any one of claims 1 to 11.
13. A system for estimating dimensions of a bin in a robotic bin picking task, comprising:
one or more processors; and
memory storing instructions executable by the one or more processors to perform a method according to any of claims 1 to 11.
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025250119A1 | 2025-12-04 |