WO2021197195A1

WO2021197195A1 - Picking/placing behavior recognition method and apparatus, and electronic device

Info

Publication number: WO2021197195A1
Application number: PCT/CN2021/082960
Authority: WO
Inventors: 冯昊; 冯雪涛
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-03-31
Filing date: 2021-03-25
Publication date: 2021-10-07
Anticipated expiration: 2022-09-30
Also published as: CN113468926A

Abstract

A picking/placing behavior recognition method and apparatus, and an electronic device. The method comprises: on the basis of multiple frames of acquired images to be recognized, obtaining feature point coordinates corresponding to a target object in each frame of image to be recognized (2000); on the basis of multiple frames of the feature point coordinates, obtaining a recognition result of an action category of the target object, and calculating the distance between the target object and a shelf (2200); and obtaining a recognition result of a picking/placing behavior according to the action category and the distance between the target object and the shelf (2400). The method can reduce the costs of recognizing picking/placing behaviors in a hypermarket, and can achieve the simultaneous recognition of picking/placing behaviors for multiple target objects.

Description

Recognition method, device and electronic equipment of pick-and-place behavior

本申请要求2020年03月31日递交的申请号为202010244985.8、发明名称为“一种取放行为的识别方法、装置及电子设备”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed on March 31, 2020, with the application number 202010244985.8 and the invention title "A method, device and electronic equipment for picking and placing behavior", the entire content of which is incorporated herein by reference Applying.

Technical field

本说明书实施例涉及动作识别技术领域，更具体地，涉及一种取放行为的识别方法、一种取放行为的识别装置、一种电子设备以及一种计算机可读存储介质。The embodiments of the present specification relate to the technical field of action recognition, and more specifically, to a method for recognizing pick-and-place behavior, a device for recognizing pick-and-place behavior, an electronic device, and a computer-readable storage medium.

Background technique

智能零售系统不但能够给用户提供更好的购物体验，也可以给商家提供更为便捷的管理方案。智能零售系统通过识别并记录用户在卖场中的取货或放货行为，防止商品丢失，及正向应用，如商品自动化盘点和自动化结帐等功能。Smart retail systems can not only provide users with a better shopping experience, but also provide merchants with more convenient management solutions. The intelligent retail system prevents the loss of goods by identifying and recording the user's pick-up or release behavior in the store, and positive applications, such as automated commodity inventory and automated checkout functions.

现有的智能零售系统常使用的方案有三种。第一种是双目识别方案，即在货架或天花板上安装大量的摄像头，通过双目三维重建的方法测量用户手接触货架及接触货架的位置。第二种是在货架顶部安装摄像头，进行动作识别和商品取放行为的识别。第三种是多传感器融合方案，在货架内置重量传感器，通过重量信息配合摄像头实现对商品的取放行为的识别。There are three schemes commonly used in existing smart retail systems. The first is the binocular recognition scheme, that is, a large number of cameras are installed on the shelf or the ceiling, and the position of the user's hand touching the shelf and touching the shelf is measured by the method of binocular three-dimensional reconstruction. The second is to install a camera on the top of the shelf to perform motion recognition and product pick-and-place behavior recognition. The third is the multi-sensor fusion scheme, in which a weight sensor is built into the shelf, and the weight information is combined with the camera to realize the identification of the picking and placing behavior of the goods.

上述现有的方案中，需要对货架进行定制，且需要安装大量的摄像头，成本较高。因此，有必要提出一种新的取放行为的识别方法。In the above-mentioned existing solutions, the shelves need to be customized, and a large number of cameras need to be installed, and the cost is relatively high. Therefore, it is necessary to propose a new recognition method of pick-and-place behavior.

发明内容Summary of the invention

本说明书实施例的一个目的是提供一种识别取放行为的新的技术方案。One purpose of the embodiments of this specification is to provide a new technical solution for identifying pick-and-place behavior.

根据本说明书实施例的第一方面，提供了一种取放行为的识别方法，包括：According to the first aspect of the embodiments of this specification, there is provided a method for identifying pick-and-place behavior, including:

基于所采集的多帧待识别图像，获得每帧所述待识别图像对应的特征点坐标；Obtain the feature point coordinates corresponding to each frame of the to-be-recognized image based on the collected multiple frames of the to-be-recognized image;

基于多帧所述特征点坐标，得到所述目标物的动作类别的识别结果，并计算得到所述目标物与货架之间的距离；Obtain the recognition result of the action category of the target based on the coordinates of the feature points in multiple frames, and calculate the distance between the target and the shelf;

根据所述动作类别和所述目标物与货架之间的距离，得到取放行为的识别结果。According to the action category and the distance between the target and the shelf, the recognition result of the pick and place behavior is obtained.

可选地，其中，所述基于所采集的多帧待识别图像，获得每帧所述待识别图像中目标物对应的特征点坐标的步骤，包括：Optionally, wherein the step of obtaining the feature point coordinates corresponding to the target object in each frame of the image to be identified based on the acquired multiple frames of images to be identified includes:

分别将每帧所述待识别图像作为检测算法的输入，输出对应的目标物图像；Each frame of the to-be-recognized image is used as the input of the detection algorithm, and the corresponding target image is output;

将所述目标物图像作为特征点提取算法的输入，输出所述特征点坐标。The target image is used as an input of a feature point extraction algorithm, and the feature point coordinates are output.

可选地，其中，所述基于多帧所述特征点坐标，得到所述目标物的动作类别的识别结果的步骤，包括：Optionally, wherein the step of obtaining the recognition result of the action category of the target object based on the coordinate of the feature points in multiple frames includes:

分别对多帧所述特征点坐标进行编码，并将编码后的特征点坐标作为动作识别算法的输入，计算得到对应的所述动作类别，以及每个类别的动作的起始时刻和结束时刻，作为所述识别结果。Encode the feature point coordinates of multiple frames respectively, and use the encoded feature point coordinates as the input of the action recognition algorithm to calculate the corresponding action category, as well as the start time and end time of each category of action, As the recognition result.

可选地，其中，所述基于多帧所述特征点坐标，计算得到所述目标物与货架之间的距离的步骤，包括：Optionally, wherein the step of calculating the distance between the target object and the shelf based on the coordinate of the characteristic points in multiple frames includes:

基于所述特征点坐标中的第一指定点坐标和第二指定点坐标，以及预先确定的货架立面标注和地面标注，计算得到所述目标物与货架之间的距离。Based on the coordinates of the first designated point and the coordinate of the second designated point in the coordinate of the characteristic point, and the predetermined shelf elevation label and ground label, the distance between the target and the shelf is calculated.

可选地，其中，所述根据所述动作类别和所述目标物与货架之间的距离，得到取放行为的识别结果的步骤之前，所述方法还包括：Optionally, before the step of obtaining the recognition result of the pick and place behavior according to the action category and the distance between the target and the shelf, the method further includes:

基于所述目标物图像得到所述目标物是否持有商品的确定结果；Obtaining a determination result of whether the target object holds a commodity based on the target object image;

所述根据所述动作类别和所述目标物与货架之间的距离，得到取放行为的识别结果的步骤包括：The step of obtaining the recognition result of the pick-and-place behavior according to the action category and the distance between the target object and the shelf includes:

根据所述动作类别、所述目标物与货架之间的距离以及所述目标物是否持有商品的确定结果，得到所述取放行为的识别结果。According to the action type, the distance between the target object and the shelf, and the determination result of whether the target object holds a commodity, the recognition result of the pick-and-place behavior is obtained.

可选地，其中，所述基于所述目标物图像得到所述目标物是否持有商品的确定结果的步骤，包括：Optionally, wherein the step of obtaining a determination result of whether the target object holds a commodity based on the target object image includes:

采用所述二分类模型对所述目标物图像进行计算，得到所述目标物是否持有商品的确定结果。The two-classification model is used to calculate the target image to obtain a determination result of whether the target holds a commodity.

可选地，其中，所述多帧待识别图像通过单目摄像机采集得到。Optionally, wherein the multiple frames of images to be recognized are acquired by a monocular camera.

可选地，其中，所述目标物图像至少包括：人体图像，机器人图像，或者，无人机图像。Optionally, wherein the target image includes at least: a human body image, a robot image, or an unmanned aerial vehicle image.

根据本说明书实施例的第二方面，还提供一种取放行为的识别装置，包括：According to the second aspect of the embodiments of the present specification, there is also provided a pick-and-place behavior identification device, including:

获取模块，用于基于所采集的多帧待识别图像，获得每帧所述待识别图像中目标物对应的特征点坐标；An acquisition module, configured to obtain the coordinates of the feature points corresponding to the target in each frame of the image to be identified based on the collected multiple frames of images to be identified;

计算模块，用于基于多帧所述特征点坐标，得到所述目标物的动作类别的识别结果，并计算得到所述目标物与货架之间的距离；A calculation module, configured to obtain the recognition result of the action category of the target object based on the coordinate of the feature points in multiple frames, and calculate the distance between the target object and the shelf;

识别模块，用于根据所述动作类别和所述目标物与货架之间的距离，得到取放行为的识别结果。The recognition module is used to obtain the recognition result of the pick-and-place behavior according to the action category and the distance between the target object and the shelf.

根据本说明书实施例的第三方面，提供了一种电子设备，包括如本说明书实施例的第二方面所述的取放行为的识别装置，或者，所述电子设备包括：According to the third aspect of the embodiments of the present specification, there is provided an electronic device, including the pick-and-place behavior identification device as described in the second aspect of the embodiments of the present specification, or the electronic device includes:

存储器，用于存储可执行命令；Memory, used to store executable commands;

处理器，用于在所述可执行命令的控制下，执行如本说明书实施例的第一方面中任一项所述的取放行为的识别方法。The processor is configured to, under the control of the executable command, execute the pick-and-place behavior identification method according to any one of the first aspects of the embodiments of this specification.

根据本说明书实施例的第四方面，还提供一种计算机可读存储介质，存储有可执行指令，所述可执行指令被处理器执行时，执行如本说明书实施例第一方面所述的取放行为的识别方法。According to the fourth aspect of the embodiments of the present specification, there is also provided a computer-readable storage medium storing executable instructions, and when the executable instructions are executed by a processor, the fetch as described in the first aspect of the embodiments of the present specification is executed. Recognition method of release behavior.

本说明书实施例的一个有益效果在于，本实施例的方法基于所采集的多帧待识别图像，获得每帧所述待识别图像中目标物对应的特征点坐标；基于多帧所述特征点坐标，得到所述目标物的动作类别的识别结果，并计算得到所述目标物与货架之间的距离；根据所述动作类别和所述目标物与货架之间的距离，得到取放行为的识别结果。根据本说明书实施例，可以降低识别卖场内取放行为的成本，并可以实现同时对多个目标物的取放行为进行识别的目的。One beneficial effect of the embodiment of this specification is that the method of this embodiment obtains the feature point coordinates corresponding to the target in each frame of the image to be identified based on the collected multiple frames of images to be identified; based on the multiple frames of the feature point coordinates , Obtain the recognition result of the action category of the target, and calculate the distance between the target and the shelf; according to the action category and the distance between the target and the shelf, obtain the recognition of the pick-and-place behavior result. According to the embodiments of the present specification, the cost of identifying the pick-and-place behavior in the store can be reduced, and the purpose of simultaneously identifying the pick-and-place behavior of multiple targets can be achieved.

通过以下参照附图对本说明书实施例的示例性实施例的详细描述，本说明书实施例的其它特征及其优点将会变得清楚。Through the following detailed description of exemplary embodiments of the embodiments of the present specification with reference to the accompanying drawings, other features and advantages of the embodiments of the present specification will become clear.

Description of the drawings

被结合在说明书中并构成说明书的一部分的附图示出了本说明书实施例的实施例，并且连同其说明一起用于解释本说明书实施例的原理。The drawings incorporated in the specification and constituting a part of the specification illustrate the embodiments of the embodiments of the specification, and together with the descriptions are used to explain the principles of the embodiments of the specification.

图1是示出可以实现本发明的实施例的服务器1000的硬件配置的框图；FIG. 1 is a block diagram showing the hardware configuration of a server 1000 that can implement an embodiment of the present invention;

图2是根据本说明书实施例的取放行为的识别方法的流程图；Fig. 2 is a flowchart of a method for recognizing pick-and-place behavior according to an embodiment of the present specification;

图3是根据本说明书实施例的例子的流程图；Figure 3 is a flowchart of an example according to an embodiment of the present specification;

图4是根据本说明书实施例的识别过程示意图；Figure 4 is a schematic diagram of the recognition process according to an embodiment of the present specification;

图5是根据本说明书实施例的计算手与货架之间的距离的示意图；Figure 5 is a schematic diagram of calculating the distance between a hand and a shelf according to an embodiment of the present specification;

图6是根据本说明书实施例的取放行为的识别装置的原理框图；Fig. 6 is a functional block diagram of a pick-and-place behavior identification device according to an embodiment of the present specification;

图7示出了根据本说明书实施例的电子设备的原理框图。Fig. 7 shows a functional block diagram of an electronic device according to an embodiment of the present specification.

Detailed ways

现在将参照附图来详细描述本说明书实施例的各种示例性实施例。应注意到：除非另外具体说明，否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本说明书实施例的范围。Various exemplary embodiments of the embodiments of the present specification will now be described in detail with reference to the accompanying drawings. It should be noted that unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the embodiments of this specification.

以下对至少一个示例性实施例的描述实际上仅仅是说明性的，决不作为对本说明书实施例及其应用或使用的任何限制。The following description of at least one exemplary embodiment is actually only illustrative, and in no way serves as any limitation to the embodiment of the specification and its application or use.

对于相关领域普通技术人物已知的技术、方法和设备可能不作详细讨论，但在适当情况下，该技术、方法和设备应当被视为说明书的一部分。The technologies, methods, and equipment known to persons of ordinary skill in the relevant fields may not be discussed in detail, but where appropriate, the technologies, methods, and equipment should be regarded as part of the specification.

在这里示出和讨论的所有例子中，任何具体值应被解释为仅仅是示例性的，而不是作为限制。因此，示例性实施例的其它例子可以具有不同的值。In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary, rather than as a limitation. Therefore, other examples of the exemplary embodiment may have different values.

应注意到：相似的标号和字母在下面的附图中表示类似项，因此，一旦某一项在一个附图中被定义，则在随后的附图中不需要对其进行进一步讨论。It should be noted that similar reference numerals and letters indicate similar items in the following drawings, therefore, once an item is defined in one drawing, it does not need to be further discussed in the subsequent drawings.

<硬件配置><Hardware Configuration>

图1是示出可以实现本发明的实施例的服务器1000的硬件配置的框图。FIG. 1 is a block diagram showing a hardware configuration of a server 1000 that can implement an embodiment of the present invention.

服务器1000例如可以是刀片服务器等。The server 1000 may be, for example, a blade server or the like.

在一个例子中，服务器1000可以是一台计算机。In an example, the server 1000 may be a computer.

在另一个例子中，服务器1000可以如图2所示，包括处理器1100、存储器1200、接口装置1300、通信装置1400、显示装置1500、输入装置1600。尽管服务器也可以包括扬声器、麦克风等等，但是，这些部件与本发明无关，故在此省略。In another example, as shown in FIG. 2, the server 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, and an input device 1600. Although the server may also include speakers, microphones, etc., these components are not related to the present invention and are therefore omitted here.

其中，处理器1100例如可以是中央处理器CPU、微处理器MCU等。存储器1200例如包括ROM(只读存储器)、RAM(随机存取存储器)、诸如硬盘的非易失性存储器等。接口装置1300例如包括USB接口、串行接口等。通信装置1400例如能够进行有线或无线通信。显示装置1500例如是液晶显示屏。输入装置1600例如可以包括触摸屏、键盘等。Wherein, the processor 1100 may be, for example, a central processing unit CPU, a microprocessor MCU, or the like. The memory 1200 includes, for example, ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory such as a hard disk, and the like. The interface device 1300 includes, for example, a USB interface, a serial interface, and the like. The communication device 1400 can perform wired or wireless communication, for example. The display device 1500 is, for example, a liquid crystal display. The input device 1600 may include, for example, a touch screen, a keyboard, and the like.

本实施例中，服务器1000用于基于所采集的多帧待识别图像，获得每帧所述待识别图像中目标物对应的特征点坐标；基于多帧所述特征点坐标，得到所述目标物的动作类别的识别结果，并计算得到所述目标物与货架之间的距离；根据所述动作类别和所述目标物与货架之间的距离，得到取放行为的识别结果。In this embodiment, the server 1000 is configured to obtain the feature point coordinates corresponding to the target object in each frame of the image to be recognized based on the collected multiple frames of images to be recognized; to obtain the target object based on the feature point coordinates of the multiple frames And calculate the distance between the target object and the shelf; according to the action type and the distance between the target object and the shelf, the recognition result of the pick-and-place behavior is obtained.

图1所示的服务器仅仅是说明性的并且决不意味着对本发明、其应用或使用的任何限制。应用于本发明的实施例中，服务器1000的的该存储器1200用于存储指令，该指令用于控制该处理器1100进行操作以执行本发明实施例提供的任意一项取放行为的识别方法。The server shown in FIG. 1 is only illustrative and does not imply any restriction on the present invention, its application or use. In an embodiment of the present invention, the memory 1200 of the server 1000 is used to store instructions, and the instructions are used to control the processor 1100 to operate to execute any one of the pick-and-place behavior identification methods provided in the embodiments of the present invention.

本领域技术人员应当理解，尽管在图1中对服务器1000的示出了多个装置，但是，本发明可以仅涉及其中的部分装置，例如，服务器1000的只涉及处理器1100和存储装置1200。技术人员可以根据本发明所公开方案设计指令。指令如何控制处理器进行操作，这是本领域公知，故在此不再详细描述。Those skilled in the art should understand that although multiple devices are shown for the server 1000 in FIG. 1, the present invention may only involve some of the devices. For example, the server 1000 only involves the processor 1100 and the storage device 1200. Technical personnel can design instructions according to the disclosed scheme of the present invention. How the instruction controls the processor to operate is well known in the art, so it will not be described in detail here.

<方法实施例><Method Example>

本实施例提供了一种取放行为的识别方法，该方法例如可以由如图1所示的服务器1000执行。This embodiment provides a method for recognizing pick-and-place behavior, and the method may be executed by, for example, the server 1000 as shown in FIG. 1.

如图2所示，该方法包括以下步骤2000～2400：As shown in Figure 2, the method includes the following steps 2000-2400:

步骤2000，基于所采集的多帧待识别图像，获得每帧该待识别图像中目标物对应的特征点坐标。Step 2000: Obtain the coordinates of the feature points corresponding to the target in each frame of the image to be identified based on the acquired multiple frames of images to be identified.

其中，该多帧待识别图像通过单目摄像机采集得到。在本实施例中，通过预先对货架立面的标注给出了空间信息，从而可以仅适用单目摄像机采集到的多帧待识别图像，实现目标物与货架的三维空间距离测量。Wherein, the multi-frame image to be recognized is acquired by the monocular camera. In this embodiment, the space information is given in advance for the labeling of the shelf facade, so that only the multi-frame to-be-identified images collected by the monocular camera can be used to realize the three-dimensional space distance measurement between the target and the shelf.

在本步骤中，该服务器1000具体可以利用检测算法和特征点提取算法，计算得到特征点坐标。具体的，该服务器1000可以分别将每帧该待识别图像作为检测算法的输入，输出对应的目标物图像；并将该目标物图像作为特征点提取算法的输入，输出该特征点坐标。In this step, the server 1000 may specifically use the detection algorithm and the feature point extraction algorithm to calculate the feature point coordinates. Specifically, the server 1000 may use each frame of the image to be recognized as the input of the detection algorithm and output the corresponding target image; and use the target image as the input of the feature point extraction algorithm to output the coordinate of the feature point.

在实际应用中，该服务器1000识别的对象可以是人体，也可以是机器人，或者，也可以是无人机等。相应的，该目标物图像可以是人体图像，机器人图像，或者无人机图像。In practical applications, the object recognized by the server 1000 may be a human body, a robot, or a drone. Correspondingly, the target image may be a human body image, a robot image, or an unmanned aerial vehicle image.

在一个例子中，假设单目摄像机采集到的多帧待识别图像是用户图像，相应的，则该服务器1000可以利用人体检测算法和人体骨架点提取算法计算得到人体骨架点坐标。即，该服务器1000分别将每帧用户图像作为人体检测算法的输入，输出对应的人体图像，并将该人体图像输入人体骨架点提取算法，得到该人体图像对应的人体骨架点坐标。In an example, it is assumed that the multiple frames of images to be recognized collected by the monocular camera are user images. Accordingly, the server 1000 may use the human body detection algorithm and the human skeleton point extraction algorithm to calculate the coordinates of the human skeleton point. That is, the server 1000 uses each frame of user image as the input of the human body detection algorithm, outputs the corresponding human body image, and inputs the human body image into the human skeleton point extraction algorithm to obtain the human skeleton point coordinates corresponding to the human body image.

其中，该人体骨架点提取算法例如可以是人体姿态识别算法(OpenPose)，人体姿态估计算法(AlphaPose)等等，在此不做具体限定。Wherein, the human skeleton point extraction algorithm may be, for example, a human pose recognition algorithm (OpenPose), a human pose estimation algorithm (AlphaPose), etc., which are not specifically limited herein.

步骤2200，基于多帧该特征点坐标，得到该目标物的动作类别的识别结果，并计算得到该目标物与货架之间的距离。Step 2200: Obtain the recognition result of the action category of the target based on the coordinate of the feature point in multiple frames, and calculate the distance between the target and the shelf.

本步骤中，该服务器1000在基于多帧该特征点坐标，得到该目标物的动作类别的识别结果时，可以分别对多帧该特征点坐标进行编码，并将编码后的特征点坐标作为动作识别算法的输入，计算得到对应的该动作类别，以及每个类别的动作的起始时刻和结束时刻，作为该识别结果。In this step, when the server 1000 obtains the recognition result of the action category of the target object based on the coordinate of the feature point in multiple frames, the server 1000 may respectively encode the coordinate of the feature point in the multiple frames, and use the coded feature point coordinates as the action The input of the recognition algorithm, the corresponding action category, and the start time and end time of each category of action are calculated as the recognition result.

如图3所示，以目标物图像为人体图像为例，根据人体图像对应的人体骨架点坐标可以识别出伸手、摸货架、取回商品等动作。在识别时，如图4所示，该服务器1000先将多帧该人体骨架点坐标编码为图的形式，然后作为卷积神经网络(CNN)，例如动作识别算法的输入，计算得到对应的识别结果，如伸手、缩回、摸货架等。As shown in Figure 3, taking the target image as a human body image as an example, actions such as extending a hand, touching a shelf, and retrieving goods can be recognized according to the coordinates of the human skeleton point corresponding to the human body image. During recognition, as shown in Figure 4, the server 1000 first encodes multiple frames of the human skeleton point coordinates in the form of a graph, and then uses it as the input of a convolutional neural network (CNN), such as an action recognition algorithm, to calculate the corresponding recognition As a result, such as reaching out, retracting, touching shelves, etc.

其中，对人体骨架点坐标进行编码，具体可以是对每个人体骨架点坐标进行归一化处理，对每个骨架点的x和y坐标减去脖子点的坐标，再除以人的身高，从而排除人高度和位置的因素，然后按照坐标轴排序：x轴是坐标点的类型，比如手、脚、肩膀等，y轴是时间，z轴是横坐标、纵坐标和置信度。Among them, the encoding of the coordinates of the human skeleton point can specifically be normalization of the coordinates of each human skeleton point, subtracting the coordinates of the neck point from the x and y coordinates of each skeleton point, and dividing by the height of the person, In order to exclude the factors of human height and position, and then sort them according to the coordinate axis: the x-axis is the type of coordinate points, such as hands, feet, shoulders, etc., the y-axis is time, and the z-axis is the abscissa, ordinate, and confidence.

在动作识别时，该服务器1000以固定间隔和固定长度沿时间轴取人体骨架点并识别后，将相同的动作识别结果进行合并，得出每个动作起始时刻和结束时刻。其中，固定长度指的是取帧的数量，比如，取11帧的关键点，设置固定间隔是考虑到相邻两帧的运动幅度太小而设置的，例如，每隔一帧取一次骨架点。During the action recognition, the server 1000 takes and recognizes the skeleton points of the human body along the time axis at a fixed interval and a fixed length, and then combines the same action recognition results to obtain the start time and end time of each action. Among them, the fixed length refers to the number of frames taken, for example, take the key points of 11 frames, and the fixed interval is set considering that the motion amplitude of two adjacent frames is too small. For example, take a skeleton point every other frame .

该服务器1000在基于多帧该特征点坐标，计算得到该目标物与货架之间的距离时，具体可以基于该特征点坐标中的第一指定点坐标和第二指定点坐标，以及预先确定的货架立面标注和地面标注，计算得到该目标物与货架之间的距离。When the server 1000 calculates the distance between the target object and the shelf based on the coordinate of the characteristic point in multiple frames, it may be specifically based on the coordinate of the first designated point and the coordinate of the second designated point in the coordinate of the characteristic point, and the predetermined Label the shelf elevation and the floor, and calculate the distance between the target and the shelf.

例如，在计算用户的手与货架之间的距离时，该服务器1000可以基于人体骨架点坐标中的手坐标(即上述第一指定点坐标)和脚底点坐标(即上述第二指定点坐标)，以及预先确定的货架立面标注和地面标注，计算得到该手与货架之间的距离。For example, when calculating the distance between the user's hand and the shelf, the server 1000 may be based on the hand coordinates (that is, the coordinates of the first designated point) and the coordinates of the sole point (that is, the coordinates of the second designated point) in the coordinates of the human skeleton point. , And pre-determined shelf labeling and floor labeling, calculate the distance between the hand and the shelf.

如图5所示，abcd构成的矩形为地面标注，bcef为货架立面标注。货架立面需要按照货架结构的四条边标注。ab和cd两条边在货架侧面底边延长线上。cb为货架底边。在实际计算时，假设手脚深度差可以忽略不计，即平面AFH与货架平面bcef垂直。As shown in Figure 5, the rectangle formed by abcd is the ground label, and bcef is the shelf elevation label. The shelf facade needs to be marked according to the four sides of the shelf structure. The two sides ab and cd are on the extension line of the bottom edge of the shelf side. cb is the bottom edge of the shelf. In the actual calculation, it is assumed that the depth difference between hands and feet is negligible, that is, the plane AFH is perpendicular to the shelf plane bcef.

假设cd,ab近似平行，由边cd,ab推算直线AF方程，并计算A点坐标。具体的，已知abcd点坐标，直线cd,ab的斜率为k,F坐标为(x _f，y _f)，则有AF直线方程为y＝kx+(y _F-x _F×k)。 Assuming that cd and ab are approximately parallel, the straight line AF equation is calculated from the sides cd and ab, and the coordinates of point A are calculated. Specifically, given the coordinates of the abcd point, the slope of the straight line cd, ab is k, and the F coordinate is (x _f , y _f ), then the AF straight line equation is y=kx+(y _F -x _F ×k).

计算直线bc与AF的交点A：直线bc和AF已经求得，写为一般式Calculate the intersection A of the straight line bc and AF: the straight line bc and AF have been obtained, written as a general formula

a _bcx+b _bcy+c _bc＝0 a _bc x+b _bc y+c _bc =0

a _AFx+b _AFy+c _AF＝0 a _AF x+b _AF y+c _AF =0

则直线bc与AF的交点A的计算公式为：

Then the calculation formula for the intersection A of the straight line bc and AF is:

然后根据Ac,Ab比例，推算B的坐标(x _B，y _B)，d(x)为两点间距离公式，计算比值

d(fB)＝d(ef)×r，可得，x _B＝min(x _f，x _e)+|x _e-x _f|×r，y _B＝min(y _f，y _e)+|y _e-y _f|×r。 Then according to the ratio of Ac, Ab, calculate the coordinates of B (x _B , y _B ), d(x) is the distance formula between the two points, and calculate the ratio

d(fB)=d(ef)×r, we can get, x _B =min(x _f ,x _e )+|x _e -x _f |×r,y _B =min(y _f ,y _e )+| y _e -y _f |×r.

基于A点坐标和B点坐标计算直线方程AB。Calculate the straight line equation AB based on the coordinates of point A and point B.

由于直线HT和直线AF方向相同，由H点坐标和直线AF的方向计算HT的直线方程。具体的，A点和F点坐标已知，可求直线方程，设AF的直线方程斜率为k，直线HT斜率也为k，H点坐标已知(x _H，y _H)，HT直线方程为y＝kx+(y _H-x _H×k)。 Since the direction of the straight line HT and the straight line AF are the same, the straight line equation of HT is calculated from the coordinates of the H point and the direction of the straight line AF. Specifically, the coordinates of point A and point F are known, and the linear equation can be obtained. Suppose the slope of the linear equation of AF is k, the slope of the straight line HT is also k, the coordinates of point H are known (x _H , y _H ), and the linear equation of HT is y=kx+(y _H- x _H × k).

然后计算AB和HT交点T。由上述可知，直线AB和HT已经求得，写为一般式：a _ABx+b _ABy+c _AB＝0，a _HTx+b _HTy+c _HT＝0。则两直线交点计算公式为：

Then calculate the intersection point T of AB and HT. It can be seen from the above that the straight lines AB and HT have been obtained and written as general formulas: a _AB x+b _AB y+c _AB =0, a _HT x+b _HT y+c _HT =0. The calculation formula for the intersection of two straight lines is:

基于H点坐标和T点坐标，可计算得到线段HT长度，即手到货架距离。当手到货架距离HT足够小，则认为手触碰到货架。Based on the coordinates of the H point and the T point, the HT length of the line segment can be calculated, that is, the distance from the hand to the shelf. When the distance HT from the hand to the shelf is small enough, it is considered that the hand touches the shelf.

需要说明的是，在本步骤中，服务器1000计算的目标物与货架之间的距离，可以是空间实际距离，也可以是图像上的像素距离。其中，实际距离需要标定像素与空间的对应关系，从而反映出目标物与货架之间的真实距离，例如，H点处像素距离与空间距离之间的近似比值为r(该r是根据实际测量得到的近似数值)，则空间距离(厘米/米)＝d(HT)*r。It should be noted that in this step, the distance between the target and the shelf calculated by the server 1000 may be the actual distance in space or the pixel distance on the image. Among them, the actual distance needs to calibrate the corresponding relationship between the pixel and the space to reflect the true distance between the target and the shelf. For example, the approximate ratio between the pixel distance and the space distance at point H is r (the r is based on the actual measurement The approximate value obtained), then the spatial distance (cm/m)=d(HT)*r.

在实际应用中，计算空间实际距离得到的结果是更为准确的。如果空间没有标定，则只能使用像素距离得到近似值，但由于像素的个数无法与空间中的厘米对应，因此该结果会不够准确。In practical applications, the result obtained by calculating the actual distance in space is more accurate. If the space is not calibrated, only the pixel distance can be used to obtain an approximate value, but since the number of pixels cannot correspond to the centimeter in the space, the result will not be accurate enough.

步骤2400，根据该动作类别和该目标物与货架之间的距离，得到取放行为的识别结果。Step 2400: According to the action category and the distance between the target object and the shelf, a recognition result of the pick and place behavior is obtained.

具体的，该动作类别包括伸出、缩回、碰摸货架等，在每个动作的起始时刻和终止时刻之间，目标物与货架之间的距离逐渐缩小，则该服务器1000可以识别到该行为为伸出，在每个动作的起始时刻和终止时刻之间，目标物与货架之间的距离逐渐增大，则该服务器1000可以识别到该行为为缩回等。Specifically, the action category includes extending, retracting, touching the shelf, etc. Between the start time and the end time of each action, the distance between the target and the shelf is gradually reduced, then the server 1000 can recognize This behavior is extending. Between the start time and the end time of each action, the distance between the target and the shelf gradually increases, and the server 1000 can recognize that the behavior is retracting.

在一个例子中，该服务器1000还可以基于该目标物图像得到目标物是否持有商品的确定结果。具体的，该服务器1000可以采用该二分类模型对该目标物图像进行计算，得到该目标物是否持有商品的确定结果。In an example, the server 1000 may also obtain a determination result of whether the target object holds a commodity based on the target object image. Specifically, the server 1000 may use the two-classification model to calculate the target image to obtain a determination result of whether the target holds a commodity.

相应的，该服务器1000会根据该动作类别、该目标物与货架之间的距离以及该目标物是否持有商品的确定结果，得到该取放行为的识别结果。Correspondingly, the server 1000 will obtain the recognition result of the pick-and-place behavior according to the action type, the distance between the target object and the shelf, and the determination result of whether the target object holds a commodity.

在本例中，识别结果中包括动作类别，以及每个类别的动作的起始时刻和结束时刻。该服务器1000可以根据该识别结果，判断出目标物是取货还是放货。具体的，起始时刻和终止时刻之间是动作的过程，也就是伸入货架或者伸出货架，结合这个过程的目标物是否持有商品的确定结果，得到取放行为的识别结果。例如，伸入过程中目标物没有持有商品，伸出过程中目标物持有商品，则这是取货过程。伸入过程中目标物持有商品，伸出过程中目标物没有持有商品，则这是放货过程。In this example, the recognition result includes the action category, and the start time and end time of each category of action. The server 1000 can determine whether the target object is to pick up or release the goods based on the recognition result. Specifically, between the start time and the end time is a process of action, that is, reaching into the shelf or extending the shelf, and combining the determination result of whether the target object holds the commodity in this process, the recognition result of the pick-and-place behavior is obtained. For example, if the target does not hold the commodity during the extension process, and the target holds the commodity during the extension process, this is the process of picking up the goods. The target holds the commodity during the extension process, and the target does not hold the commodity during the extension process, then this is the delivery process.

以上已结合附图和具体的例子对本实施例的方法进行了说明。该服务器基于所采集的多帧待识别图像，获得每帧该待识别图像中目标物对应的特征点坐标；基于多帧该特征点坐标，得到目标物的动作类别的识别结果，并计算得到目标物与货架之间的距离；根据该动作类别和目标物与货架之间的距离，得到取放行为的识别结果。本实施例借助对货架的标注，使用单目摄像机实现目标物与货架之间距离的测量，操作简单，实现了同时对多个目标物的取放行为的识别，且对现有卖场设备的改造规模小，有效降低了实现成本和维护成本。The method of this embodiment has been described above with reference to the drawings and specific examples. The server obtains the coordinate of the feature point corresponding to the target in each frame of the image to be recognized based on the collected multiple frames of images to be recognized; based on the coordinate of the feature point in the multiple frames, it obtains the recognition result of the action category of the target, and calculates the target The distance between the object and the shelf; according to the action category and the distance between the target object and the shelf, the recognition result of the pick-and-place behavior is obtained. This embodiment uses a monocular camera to measure the distance between the target and the shelf with the help of the labeling of the shelf. The operation is simple, and the recognition of the picking and placing behavior of multiple targets at the same time is realized, and the existing store equipment is modified. The small scale effectively reduces the implementation cost and maintenance cost.

<装置实施例><Device Example>

本实施例提供一种取放行为的识别装置，该装置例如是图6所示的取放行为的识别装置6000，该取放行为的识别装置6000可以包括获取模块6100，计算模块6200以及识别模块6300。This embodiment provides a pick-and-place behavior recognition device. The device is, for example, the pick-and-place behavior recognition device 6000 shown in FIG. 6. The pick-and-place behavior recognition device 6000 may include an acquisition module 6100, a calculation module 6200, and an identification module 6300.

其中，获取模块6100，用于基于所采集的多帧待识别图像，获得每帧该待识别图像中目标物对应的特征点坐标。Wherein, the acquisition module 6100 is configured to obtain the coordinates of the feature points corresponding to the target in each frame of the image to be identified based on the acquired multiple frames of images to be identified.

计算模块6200，用于基于多帧该特征点坐标，得到目标物的动作类别的识别结果，并计算得到目标物与货架之间的距离。The calculation module 6200 is configured to obtain the recognition result of the action category of the target object based on the coordinate of the feature point in multiple frames, and calculate the distance between the target object and the shelf.

识别模块6300，用于根据该动作类别和目标物与货架之间的距离，得到取放行为的识别结果。The recognition module 6300 is used to obtain the recognition result of the pick-and-place behavior according to the action category and the distance between the target object and the shelf.

其中，该多帧待识别图像通过单目摄像机采集得到。Wherein, the multi-frame image to be recognized is acquired by the monocular camera.

具体的，该获取模块6100具体用于：分别将每帧该待识别图像作为检测算法的输入，输出对应的目标物图像；将该目标物图像作为特征点提取算法的输入，输出该特征点坐标。Specifically, the acquisition module 6100 is specifically configured to: use each frame of the image to be recognized as the input of the detection algorithm and output the corresponding target image; use the target image as the input of the feature point extraction algorithm to output the coordinate of the feature point .

在一个例子中，该计算模块6200具体可以用于分别对多帧该特征点坐标进行编码，并将编码后的特征点坐标作为动作识别算法的输入，计算得到对应的该动作类别，以及每个类别的动作的起始时刻和结束时刻，作为该识别结果。In an example, the calculation module 6200 can be specifically used to separately encode the feature point coordinates of multiple frames, and use the encoded feature point coordinates as the input of the action recognition algorithm to calculate the corresponding action category and each The start time and end time of the action of the category are used as the recognition result.

在一个例子中，该计算模块6200具体可以用于基于该特征点坐标中的第一指定点坐标和第二指定点坐标，以及预先确定的货架立面标注和地面标注，计算得到该目标物与货架之间的距离。In an example, the calculation module 6200 can be specifically configured to calculate the target object and the target object based on the first designated point coordinates and the second designated point coordinates in the characteristic point coordinates, and the predetermined shelf elevation and ground labels. The distance between the shelves.

在一个例子中，该计算模块6200还可以用于基于该目标物图像得到目标物是否持有商品的确定结果；相应的，该识别模块6300具体用于根据该动作类别、该目标物与货架之间的距离以及该目标物是否持有商品的确定结果，得到该取放行为的识别结果。In an example, the calculation module 6200 can also be used to obtain a determination result of whether the target object holds a commodity based on the target object image; correspondingly, the identification module 6300 is specifically configured to determine whether the target object holds a commodity according to the action category, the target object and the shelf. The distance between the two and the determination result of whether the target object holds the commodity, and the recognition result of the pick-and-place behavior is obtained.

其中，该计算模块6200在基于该目标物图像得到用目标物是否持有商品的确定结果时，具体可以采用该二分类模型对该目标物图像进行计算，得到该目标物是否持有商品的确定结果。Wherein, when the calculation module 6200 obtains the determination result of whether the target object holds the commodity based on the target object image, it can specifically calculate the target object image by using the two-class model to obtain the determination of whether the target object holds the commodity. result.

本实施例的取放行为的识别装置，可用于执行上述方法实施例的技术方案，其实现原理及技术效果类似，此处不再赘述。The pick-and-place behavior identification device of this embodiment can be used to implement the technical solutions of the foregoing method embodiments, and its implementation principles and technical effects are similar, and will not be repeated here.

<设备实施例><Equipment Example>

本实施例中，还提供一种电子设备，该电子设备包括本说明书装置实施例中描述的取放行为的识别装置6000；或者，该电子设备为图7所示的电子设备7000，包括：In this embodiment, there is also provided an electronic device, which includes the pick-and-place behavior identification device 6000 described in the device embodiment of this specification; or, the electronic device is the electronic device 7000 shown in FIG. 7 and includes:

存储器7100，用于存储可执行命令。The memory 7100 is used to store executable commands.

处理器7200，用于在存储器7100存储的可执行命令的控制下，执行本说明书任意方法实施例中描述的方法。The processor 7200 is configured to execute the method described in any method embodiment in this specification under the control of the executable command stored in the memory 7100.

在电子设备根据所执行的方法实施例的实施主体可以是服务器。The implementation subject of the electronic device according to the executed method embodiment may be a server.

<计算机可读存储介质实施例><Computer-readable storage medium embodiment>

本实施例提供一种计算机可读存储介质，该存储介质中存储有可执行命令，该可执行命令被处理器执行时，执行本说明书任意方法实施例中描述的方法。This embodiment provides a computer-readable storage medium in which an executable command is stored, and when the executable command is executed by a processor, the method described in any method embodiment in this specification is executed.

本说明书实施例可以是系统、方法和/或计算机程序产品。计算机程序产品可以包括计算机可读存储介质，其上载有用于使处理器实现本说明书实施例的各个方面的计算机可读程序指令。The embodiments of this specification may be systems, methods and/or computer program products. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the embodiments of the present specification.

计算机可读存储介质可以是可以保持和存储由指令执行设备使用的指令的有形设备。计算机可读存储介质例如可以是――但不限于――电存储设备、磁存储设备、光存储设备、电磁存储设备、半导体存储设备或者上述的任意合适的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括：便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、静态随机存取存储器(SRAM)、便携式压缩盘只读存储器(CD-ROM)、数字多功能盘(DVD)、记忆棒、软盘、机械编码设备、例如其上存储有指令的打孔卡或凹槽内凸起结构、以及上述的任意合适的组合。这里所使用的计算机可读存储介质不被解释为瞬时信号本身，诸如无线电波或者其他自由传播的电磁波、通过波导或其他传输媒介传播的电磁波(例如，通过光纤电缆的光脉冲)、或者通过电线传输的电信号。The computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples of computer-readable storage media (non-exhaustive list) include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) Or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disk (DVD), memory stick, floppy disk, mechanical encoding device, such as a printer with instructions stored thereon The protruding structure in the hole card or the groove, and any suitable combination of the above. The computer-readable storage medium used here is not interpreted as the instantaneous signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber optic cables), or through wires Transmission of electrical signals.

这里所描述的计算机可读程序指令可以从计算机可读存储介质下载到各个计算/处理设备，或者通过网络、例如因特网、局域网、广域网和/或无线网下载到外部计算机或外部存储设备。网络可以包括铜传输电缆、光纤传输、无线传输、路由器、防火墙、交换机、网关计算机和/或边缘服务器。每个计算/处理设备中的网络适配卡或者网络接口从网络接收计算机可读程序指令，并转发该计算机可读程序指令，以供存储在各个计算/处理设备中的计算机可读存储介质中。The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device .

用于执行本说明书实施例操作的计算机程序指令可以是汇编指令、指令集架构(ISA)指令、机器指令、机器相关指令、微代码、固件指令、状态设置数据、或者以一种或多种编程语言的任意组合编写的源代码或目标代码，编程语言包括面向对象的编程语言—诸如Smalltalk、C++等，以及常规的过程式编程语言—诸如“C”语言或类似的编程语言。计算机可读程序指令可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中，远程计算机可以通过任意种类的网络—包括局域网(LAN)或广域网(WAN)—连接到用户计算机，或者，可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。在一些实施例中，通过利用计算机可读程序指令的状态信息来个性化定制电子电路，例如可编程逻辑电路、现场可编程门阵列(FPGA)或可编程逻辑阵列(PLA)，该电子电路可以执行计算机可读程序指令，从而实现本说明书实施例的各个方面。The computer program instructions used to perform the operations of the embodiments of this specification may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or one or more programming Source code or object code written in any combination of languages. Programming languages include object-oriented programming languages-such as Smalltalk, C++, etc., and conventional procedural programming languages-such as "C" language or similar programming languages. Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, executed as a stand-alone software package, partly on the user's computer and partly executed on a remote computer, or entirely on the remote computer or server implement. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, using an Internet service provider to access the Internet). connect). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the status information of the computer-readable program instructions. The computer-readable program instructions are executed to realize various aspects of the embodiments of this specification.

这里参照根据本说明书实施例的方法、装置(系统)和计算机程序产品的流程图和/或框图描述了本说明书实施例的各个方面。应当理解，流程图和/或框图的每个方框以及流程图和/或框图中各方框的组合，都可以由计算机可读程序指令实现。Here, various aspects of the embodiments of the present specification are described with reference to the flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of the present specification. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

这些计算机可读程序指令可以提供给通用计算机、专用计算机或其它可编程数据处理装置的处理器，从而生产出一种机器，使得这些指令在通过计算机或其它可编程数据处理装置的处理器执行时，产生了实现流程图和/或框图中的一个或多个方框中规定的功能/动作的装置。也可以把这些计算机可读程序指令存储在计算机可读存储介质中，这些指令使得计算机、可编程数据处理装置和/或其他设备以特定方式工作，从而，存储有指令的计算机可读介质则包括一个制造品，其包括实现流程图和/或框图中的一个或多个方框中规定的功能/动作的各个方面的指令。These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, thereby producing a machine that makes these instructions when executed by the processor of the computer or other programmable data processing device , A device that implements the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams is produced. It is also possible to store these computer-readable program instructions in a computer-readable storage medium. These instructions make computers, programmable data processing apparatuses, and/or other devices work in a specific manner, so that the computer-readable medium storing the instructions includes An article of manufacture, which includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

也可以把计算机可读程序指令加载到计算机、其它可编程数据处理装置、或其它设备上，使得在计算机、其它可编程数据处理装置或其它设备上执行一系列操作步骤，以产生计算机实现的过程，从而使得在计算机、其它可编程数据处理装置、或其它设备上执行的指令实现流程图和/或框图中的一个或多个方框中规定的功能/动作。It is also possible to load computer-readable program instructions onto a computer, other programmable data processing device, or other equipment, so that a series of operation steps are executed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process , So that the instructions executed on the computer, other programmable data processing apparatus, or other equipment realize the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

附图中的流程图和框图显示了根据本说明书实施例的多个实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段或指令的一部分，模块、程序段或指令的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个连续的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或动作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。对于本领域技术人物来说公知的是，通过硬件方式实现、通过软件方式实现以及通过软件和硬件结合的方式实现都是等价的。The flowcharts and block diagrams in the accompanying drawings show the possible implementation architecture, functions, and operations of the system, method, and computer program product according to multiple embodiments of the embodiments of this specification. In this regard, each block in the flowchart or block diagram can represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executables for realizing the specified logic function. instruction. In some alternative implementations, the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram and/or flowchart, and the combination of the blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions Or it can be realized by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that realization by hardware, realization by software, and realization by a combination of software and hardware are all equivalent.

以上已经描述了本说明书实施例的各实施例，上述说明是示例性的，并非穷尽性的，并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下，对于本技术领域的普通技术人物来说许多修改和变更都是显而易见的。本文中所用术语的选择，旨在最好地解释各实施例的原理、实际应用或对市场中的技术改进，或者使本技术领域的其它普通技术人物能理解本文披露的各实施例。本说明书实施例的范围由所附权利要求来限定。The various embodiments of the embodiments of the present specification have been described above, and the above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Without departing from the scope and spirit of the described embodiments, many modifications and changes are obvious to those of ordinary skill in the art. The choice of terms used herein is intended to best explain the principles, practical applications, or technical improvements in the market of each embodiment, or to enable other ordinary technical persons in the technical field to understand the various embodiments disclosed herein. The scope of the embodiments of this specification is defined by the appended claims.

Claims

A method for identifying pick-and-place behavior, including:

Based on the collected multiple frames of images to be identified, obtaining the coordinates of the feature points corresponding to the target in each frame of the image to be identified;

Obtain the recognition result of the action category of the target based on the coordinates of the feature points in multiple frames, and calculate the distance between the target and the shelf;

According to the action category and the distance between the target and the shelf, the recognition result of the pick and place behavior is obtained.

The method according to claim 1, wherein the step of obtaining the coordinate of the feature point corresponding to the target object in each frame of the image to be recognized based on the collected multiple frames of the image to be recognized comprises:

Each frame of the to-be-recognized image is used as the input of the detection algorithm, and the corresponding target image is output;

The target image is used as an input of a feature point extraction algorithm, and the feature point coordinates are output.

The method according to claim 1, wherein the step of obtaining the recognition result of the action category of the target object based on the coordinate of the feature points in multiple frames comprises:

Encode the feature point coordinates of multiple frames respectively, and use the encoded feature point coordinates as the input of the action recognition algorithm to calculate the corresponding action category, as well as the start time and end time of each category of action, As the recognition result.

The method according to claim 1, wherein the step of calculating the distance between the target object and the shelf based on the coordinate of the characteristic points in multiple frames comprises:

Based on the coordinates of the first designated point and the coordinate of the second designated point in the coordinate of the characteristic point, and the predetermined shelf elevation label and ground label, the distance between the target and the shelf is calculated.

The method according to claim 2, wherein before the step of obtaining the recognition result of the pick and place behavior according to the action category and the distance between the target object and the shelf, the method further comprises:

Obtaining a determination result of whether the target object holds a commodity based on the target object image;

The step of obtaining the recognition result of the pick-and-place behavior according to the action category and the distance between the target object and the shelf includes:

According to the action type, the distance between the target object and the shelf, and the determination result of whether the target object holds a commodity, the recognition result of the pick-and-place behavior is obtained.

The method according to claim 5, wherein the step of obtaining a determination result of whether the target object holds a commodity based on the target object image comprises:

A two-class model is used to calculate the image of the target, and the result of determining whether the target holds a commodity is obtained.

The method according to claim 1, wherein the multiple frames of to-be-recognized images are collected by a monocular camera.

The method according to claim 2, wherein the target image includes at least: a human body image, a robot image, or an unmanned aerial vehicle image.

A recognition device for pick-and-place behavior, including:

An acquisition module, configured to obtain the coordinates of the feature points corresponding to the target in each frame of the image to be identified based on the collected multiple frames of images to be identified;

A calculation module, configured to obtain the recognition result of the action category of the target object based on the coordinate of the feature points in multiple frames, and calculate the distance between the target object and the shelf;

The recognition module is used to obtain the recognition result of the pick-and-place behavior according to the action category and the distance between the target object and the shelf.

An electronic device, comprising the pick-and-place behavior identification device according to claim 9, or the electronic device comprises:

Memory, used to store executable commands;

The processor is configured to execute the pick-and-place behavior identification method according to any one of claims 1-8 under the control of the executable command.

A computer-readable storage medium storing executable instructions, and when the executable instructions are executed by a processor, the method for identifying pick-and-place behaviors according to any one of claims 1-8 is executed.