
CN110032914B - Picture labeling method and device - Google Patents

Picture labeling method and device

Info

Publication number
CN110032914B
CN110032914B (application CN201810029737.4A; earlier publication CN110032914A)
Authority
CN
China
Prior art keywords
marked
picture
position data
object frame
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810029737.4A
Other languages
Chinese (zh)
Other versions
CN110032914A (en)
Inventor
李凡
吴江旭
张洪光
张伟华
孔磊锋
彭刚林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201810029737.4A priority Critical patent/CN110032914B/en
Publication of CN110032914A publication Critical patent/CN110032914A/en
Application granted granted Critical
Publication of CN110032914B publication Critical patent/CN110032914B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a picture labeling method and apparatus, relating to the field of computer technology. One embodiment of the method comprises the following steps: acquiring at least two marked pictures whose timestamps are earlier than that of the picture to be marked and differ from it by less than a threshold; determining the directionality of the object frame to be marked on the picture to be marked according to the position changes of the object frames on the marked pictures; determining the position data of the object frame to be marked according to that directionality; and marking the object frame on the picture to be marked according to an object detection algorithm and the position data. The method and apparatus can reduce the large amount of repetitive labor in labeling work.

Description

Picture labeling method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for labeling pictures.
Background
In the field of machine learning, the size of a data set and the quality of labeling of the data set have a critical influence on the output result of a machine learning algorithm.
Taking a picture data set as an example, existing labeling tools proceed as follows: each picture is labeled manually; or frame pictures are extracted from a video and each frame picture is then labeled manually.
In the process of implementing the present invention, the inventor found that the prior art has at least the following problem: pictures with spatio-temporal similarity are highly alike, so labeling them involves a great deal of repeated labor, making the labeling work inefficient and costly.
Disclosure of Invention
In view of the above, embodiments of the invention provide a picture labeling method and apparatus to solve the technical problem that labeling work involves a great deal of repetitive labor.
To achieve the above object, according to an aspect of the embodiments of the present invention, there is provided a method for labeling a picture, including:
acquiring at least two marked pictures whose timestamps are earlier than that of the picture to be marked and differ from it by less than a threshold, and determining the directionality of the object frame to be marked on the picture to be marked according to the position changes of the object frames on the at least two marked pictures;
Determining position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked;
and labeling the object frame to be labeled on the picture to be labeled according to an object detection algorithm and the position data of the object frame to be labeled.
Optionally, determining the directionality of the object frame to be marked on the image to be marked according to the position change of the object frames on the at least two marked images includes:
Acquiring position data of object frames on the at least two marked pictures;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data so as to determine the directionality of the object frames to be marked.
Optionally, determining the directionality of the object frame to be marked on the image to be marked according to the position change of the object frames on the at least two marked images includes:
acquiring position data of object frames on the at least two marked pictures, wherein the position data comprises coordinate data, width and height;
determining object frames belonging to the same object on different pictures according to the position data of the object frames;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data of the object frames belonging to the same object on different pictures so as to determine the directionality of the object frames to be marked.
Optionally, determining the position data of the object frame to be annotated on the picture to be annotated according to the directionality of the object frame to be annotated includes:
calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions and the position data of the object frame on a marked picture whose timestamp is earlier than that of the picture to be marked and differs from it by less than a threshold.
Optionally, determining the position data of the object frame to be annotated on the picture to be annotated according to the directionality of the object frame to be annotated includes:
adding blank data to the position data of the object frame on a marked picture whose timestamp is earlier than that of the picture to be marked and differs from it by less than a threshold, and calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions.
Optionally, labeling the to-be-labeled object frame on the to-be-labeled picture according to an object detection algorithm and the position data of the to-be-labeled object frame includes:
inputting the position data of the object frame to be marked into an object detection algorithm, wherein the object detection algorithm screens out, through training and predictive regression, the position data of the object frames that contain an object;
mapping the position data of the object frames that contain an object to the coordinates of the picture to be marked, so as to mark the object frames on the picture to be marked.
In addition, according to another aspect of the embodiment of the present invention, there is provided an apparatus for labeling a picture, including:
The direction calculation module is used for obtaining at least two marked pictures with the time stamp difference value smaller than a threshold value and the time stamp earlier than the picture to be marked, and determining the directionality of the object frame to be marked on the picture to be marked according to the position change of the object frame on the at least two marked pictures;
The position calculation module is used for determining the position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked;
And the labeling module is used for labeling the object frame to be labeled on the picture to be labeled according to an object detection algorithm and the position data of the object frame to be labeled.
Optionally, determining the directionality of the object frame to be annotated according to the position change of the object frames on the at least two annotated pictures includes:
Acquiring position data of object frames on the at least two marked pictures;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data so as to determine the directionality of the object frames to be marked.
Optionally, determining the directionality of the object frame to be annotated according to the position change of the object frames on the at least two annotated pictures includes:
acquiring position data of object frames on the at least two marked pictures, wherein the position data comprises coordinate data, width and height;
determining object frames belonging to the same object on different pictures according to the position data of the object frames;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data of the object frames belonging to the same object on different pictures so as to determine the directionality of the object frames to be marked.
Optionally, the position calculation module is configured to:
calculate the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions and the position data of the object frame on a marked picture whose timestamp is earlier than that of the picture to be marked and differs from it by less than a threshold.
Optionally, the position calculation module is configured to:
add blank data to the position data of the object frame on a marked picture whose timestamp is earlier than that of the picture to be marked and differs from it by less than a threshold, and calculate the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions.
Optionally, the labeling module is configured to:
input the position data of the object frame to be marked into an object detection algorithm, wherein the object detection algorithm screens out, through training and predictive regression, the position data of the object frames that contain an object;
and map the position data of the object frames that contain an object to the coordinates of the picture to be marked, so as to mark the object frames on the picture to be marked.
According to another aspect of an embodiment of the present invention, there is also provided an electronic device including:
One or more processors;
storage means for storing one or more programs,
The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods of any of the embodiments described above.
According to another aspect of an embodiment of the present invention, there is also provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of the embodiments described above.
One embodiment of the above invention has the following advantages or benefits. By determining the directionality of the object frame to be marked from the position changes of the object frames on at least two marked pictures, and determining the position data of the object frame to be marked from that directionality, the technical problem of heavy repetitive labor in labeling work is solved. Based on the spatio-temporal similarity of at least two pictures, the directionality of the object frame to be marked is determined, its position data is derived accordingly, and the object frame is then marked on the picture to be marked according to an object detection algorithm and that position data. Repetitive labor is thereby reduced, the efficiency and quality of labeling work are improved, and labeling cost is lowered.
Further effects of the above optional implementations are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a method of labeling pictures according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a change in coordinate data of an object frame over different pictures according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the main flow of a method of labeling pictures according to one referenceable embodiment of the invention;
FIG. 4 is a schematic diagram of main modules of an apparatus for labeling pictures according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
fig. 6 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Existing labeling processes attempt to reduce labeling repetition only through aspects such as picture cropping and convenient text entry for labels; they do not reduce the workload of manual picture labeling by exploiting the correlation between at least two pictures.
The method provided by the embodiment of the invention acquires at least two marked pictures whose timestamp difference from the picture to be marked is smaller than a threshold, and determines the directionality of the object frame to be marked on the picture to be marked according to the position changes of the object frames on those marked pictures. The method thus predicts the labeling position of the object frame to be marked based on the spatio-temporal similarity between pictures, reducing the workload of manual labeling and avoiding repeated labeling of highly similar pictures.
Fig. 1 is a schematic diagram of a main flow of a method for labeling pictures according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 1, the method for labeling a picture may include:
Step 101, obtaining at least two marked pictures whose timestamps are earlier than that of the picture to be marked and differ from it by less than a threshold, and determining the directionality of the object frame to be marked on the picture to be marked according to the position changes of the object frames on the at least two marked pictures.
In this step, at least two marked pictures that are closest in time to, and earlier than, the picture to be marked are obtained, and the travelling direction of each object frame to be marked is determined from the position changes of the corresponding object frames across the marked pictures. For example, from the position change of an object frame across two temporally adjacent pictures, the travelling direction of the object frame to be marked may be determined to be toward the upper left; or, from the position change across three temporally adjacent pictures, toward the lower right. In this way, the directionality of the object frame to be marked can be determined from at least two pictures with spatio-temporal similarity. The threshold is set as required, and different thresholds yield different numbers of acquired marked pictures. Specifically, the timestamp difference between each marked picture and the picture to be marked is calculated and compared with the threshold, so as to obtain at least two marked pictures whose timestamp difference from the picture to be marked is smaller than the threshold.
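The frame-selection rule described above can be sketched as follows; the function name and the (timestamp, frame_id) data layout are illustrative, not taken from the patent:

```python
def select_reference_frames(annotated, target_ts, threshold):
    """annotated: list of (timestamp, frame_id) pairs for already-marked frames.
    Keep frames whose timestamp is earlier than target_ts and whose difference
    from target_ts is smaller than threshold, in chronological order."""
    refs = [(ts, fid) for ts, fid in annotated
            if ts < target_ts and target_ts - ts < threshold]
    refs.sort(key=lambda pair: pair[0])
    return refs

annotated = [(1.0, "A"), (2.0, "B"), (7.0, "C")]
print(select_reference_frames(annotated, 3.0, 2.5))  # [(1.0, 'A'), (2.0, 'B')]
```

Frame C is rejected because its timestamp is later than the target; A and B both precede it within the threshold.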
As another embodiment of the present invention, the determining the directionality of the object frame to be marked on the picture to be marked according to the position change of the object frame on the at least two marked pictures includes:
Acquiring position data of object frames on the at least two marked pictures;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data so as to determine the directionality of the object frames to be marked.
Specifically, each marked picture can be mapped onto the same coordinate system, the position data of the object frames on the marked pictures is obtained, and the weights of the object frame to be marked in different directions are calculated from the changes in that position data across pictures, thereby determining the directionality of the object frame to be marked. In this embodiment there is only one object per picture, so it is unnecessary to decide whether multiple objects on a picture belong to the same object, or which object each frame belongs to.
The case of video frame pictures is described in detail below.
First, video data is acquired and a representative time period (for example, one with heavy passenger flow) is selected for batch frame extraction, yielding a large number of frame pictures, each carrying timestamp information. The frame pictures are then sorted in time order, the K pictures closest to the timestamp of the picture to be marked are selected, and those K pictures are marked manually with a labeling tool. The K pictures are then mapped onto the same coordinate system, and the weights of the object frame to be marked in different directions are calculated from the position data of the object frames on the different pictures, thereby determining the directionality of the object frame.
It should be noted that the larger K is, the more pictures must be marked manually and the more accurate the prediction of the travelling direction of the object frame to be marked becomes, but the marking workload and the amount of calculation grow correspondingly.
For example, suppose 5 frame pictures are obtained (picture A, picture B, picture C, picture D, picture E, ordered front to back by timestamp). If K = 2, three pictures remain to be marked (picture C, picture D, picture E). The procedure comprises the following steps:
1) Obtain the two temporally earliest, consecutive frames (for example, picture A and picture B), mark the object frames (for example, pedestrian frames) on them with a labeling tool, map the two marked frames onto the same coordinate system, and read the position data (X, Y, W, H) of each pedestrian frame on each picture.
The fields represent, from left to right, the X-coordinate and Y-coordinate of the upper-left corner of each object frame and the object frame's width and height, where the X direction is the width direction and the Y direction is the height direction.
2) Determine which object frames on different pictures belong to the same object according to the position data of the object frames. Specifically, the area overlap of object frames on different pictures is calculated with the IoU algorithm (the intersection over union of two rectangles), and when the overlap is larger than a threshold, the object frames on the different pictures are determined to belong to the same object. For example, if the overlap of object frame A1 on picture A and object frame B1 on picture B is 78.27% (greater than the threshold of 75%), object frames A1 and B1 are determined to belong to the same object.
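A standard IoU computation over (x, y, w, h) boxes is sketched below. It is illustrative only: the 78.27% figure above may come from a different overlap definition than strict intersection-over-union, and with the strict definition the example boxes score lower.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes; (x, y) is the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap along x
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))  # overlap along y
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

a1 = (145, 237, 220, 354)  # object frame A1 on picture A (from the example below)
b1 = (142, 269, 266, 442)  # object frame B1 on picture B
print(round(iou(a1, b1), 4))  # ≈ 0.5685 under the strict IoU definition
```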
3) Calculate the weights of the object frame to be marked in different directions according to the changes in the position data of the object frames belonging to the same object on different pictures, so as to determine the directionality of the object frame to be marked.
For example, as shown in fig. 2, the position data of the object frame A1 is x=145, y=237, w=220, h=354, the position data of the object frame B1 is x=142, y=269, w=266, h=442, and the weights of the object frames to be marked in different directions are calculated:
weight of leftward travel: (142-145)/145 ≈ -0.02
weight of downward travel: (269-237)/237 ≈ 0.14
weight of width change: (266-220)/220 ≈ 0.21
weight of height change: (442-354)/354 ≈ 0.25
It can be seen that the weight of leftward travel is small in magnitude while the weight of downward travel is larger, so the object frame to be marked is predicted to travel toward the lower left. Note that, under the coordinate system of fig. 2, the negative sign in -0.02 indicates leftward travel (a positive value would indicate rightward travel), and the positive sign in 0.14 indicates downward travel (a negative value would indicate upward travel). The directions represented by the signs depend on the chosen coordinate directions; the embodiment uses the coordinate directions of fig. 2 by way of example only. Likewise, the positive sign in 0.21 indicates that the width of the object frame is increasing (negative, decreasing), and the positive sign in 0.25 indicates that its height is increasing (negative, decreasing).
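The four weights of the worked example are simply the relative change in each component between the matched boxes A1 and B1; a minimal sketch (the function name is illustrative):

```python
def direction_weights(prev_box, curr_box):
    """Each box is (x, y, w, h); returns relative deltas (dx, dy, dw, dh).
    In image coordinates, negative dx means moving left, positive dy means
    moving down; positive dw/dh means the box is growing."""
    px, py, pw, ph = prev_box
    cx, cy, cw, ch = curr_box
    return ((cx - px) / px, (cy - py) / py, (cw - pw) / pw, (ch - ph) / ph)

a1 = (145, 237, 220, 354)  # box on the earlier picture A
b1 = (142, 269, 266, 442)  # matched box on the later picture B
dx, dy, dw, dh = direction_weights(a1, b1)
print(round(dx, 2), round(dy, 2), round(dw, 2), round(dh, 2))  # -0.02 0.14 0.21 0.25
```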
In practice, pictures extracted from a video differ greatly from independent pictures: video frames are correlated both in time and in space. If independent pictures were used for marking, the same fixed points would have to be determined in advance in each picture so that marking and calculation could be based on the same coordinates, ensuring spatial correlation. Since video frame pictures already have spatio-temporal correlation, the method provided by the embodiment of the invention can further improve labeling accuracy on them.
Step 102, determining the position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked.
In this step, the position data of the object to be marked on the image to be marked may be calculated according to the weights of the object frames to be marked in different directions calculated in step 101.
As another embodiment of the present invention, step 102 may include: calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions and the position data of the object frame on a marked picture whose timestamp is earlier than that of the picture to be marked and differs from it by less than a threshold.
Specifically, the position data of the object frame on the marked picture whose timestamp is closest to that of the picture to be marked is obtained first (in the example, object frame B1); the position data of the object frame to be marked is then calculated according to the weights of the object frame to be marked in different directions.
For example, let the initial values of the weights (WX, WY, WW, WH) all be 1. Taking the weights (-0.02, 0.14, 0.21, 0.25) calculated in step 101 and the position data X = 142, Y = 269, W = 266, H = 442 of object frame B1 as an example, the position data of the object frame to be marked is:
X’=X×(WX-0.02)=142×(1-0.02)=139.16
Y’=Y×(WY+0.14)=269×(1+0.14)=306.66
W’=W×(WW+0.21)=266×(1+0.21)=321.86
H’=H×(WH+0.25)=442×(1+0.25)=552.5
Position data (139.16, 306.66, 321.86, 552.5) of the object frame to be marked are thus obtained.
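The calculation above amounts to scaling each component of the most recent box by a factor of (1 + weight); a minimal sketch with an illustrative function name:

```python
def predict_box(box, weights):
    """Scale the most recent box (x, y, w, h) component-wise by (1 + weight),
    matching the worked example where the initial weights are all 1."""
    return tuple(v * (1.0 + w) for v, w in zip(box, weights))

b1 = (142, 269, 266, 442)               # most recent marked box
weights = (-0.02, 0.14, 0.21, 0.25)     # directional weights from step 101
print(tuple(round(v, 2) for v in predict_box(b1, weights)))
# (139.16, 306.66, 321.86, 552.5)
```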
As yet another embodiment of the present invention, step 102 may include: adding blank data to the position data of the object frame on a marked picture whose timestamp is earlier than that of the picture to be marked and differs from it by less than a threshold, and calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions.
In this embodiment, blank data P is added to the position data of the object frame on the picture whose timestamp is closest to that of the picture to be marked, enlarging the blank area around the object frame to be marked in every direction to offset the prediction-space difference caused by the video time difference. The size of the blank data P may be determined from that prediction-space difference, or empirically, for example 20, 25, 30, 50 or 75.
Taking the empirical initial value of the blank data P as 20 as an example, the position data of the object frame to be marked includes:
X’=(X-P)×(WX-0.02)=(142-20)×(1-0.02)=119.56
Y’=(Y-P)×(WY+0.14)=(269-20)×(1+0.14)=283.86
W’=(W+2P)×(WW+0.21)=(266+40)×(1+0.21)=370.26
H’=(H+2P)×(WH+0.25)=(442+40)×(1+0.25)=602.5
Position data (119.56, 283.86, 370.26, 602.5) of the object frame to be marked are thus obtained.
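The padded variant can be sketched the same way: the top-left corner is shifted outward by P and the width and height are enlarged by 2P before the per-component factors are applied. The helper name is illustrative:

```python
def predict_box_padded(box, weights, pad):
    """Blank-data variant of the worked example: pad the box by P on every side,
    then scale each component by (1 + weight)."""
    x, y, w, h = box
    wx, wy, ww, wh = weights
    return ((x - pad) * (1.0 + wx), (y - pad) * (1.0 + wy),
            (w + 2 * pad) * (1.0 + ww), (h + 2 * pad) * (1.0 + wh))

b1 = (142, 269, 266, 442)
weights = (-0.02, 0.14, 0.21, 0.25)
print(tuple(round(v, 2) for v in predict_box_padded(b1, weights, pad=20)))
# (119.56, 283.86, 370.26, 602.5)
```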
Step 103, marking the object frame to be marked on the picture to be marked according to an object detection algorithm and the position data of the object frame to be marked.
In this step, the position data of the object frame to be marked calculated in step 102 is input into an object detection algorithm, which screens out the position data of the object frames that contain an object, so that the object frames can be marked on the picture to be marked.
As yet another embodiment of the present invention, the step 103 may include:
firstly, the position data of the object frame to be marked is input into an object detection algorithm, which screens out, through training and predictive regression, the position data of the object frames that contain an object;
then the position data of those object frames is mapped to the coordinates of the picture to be marked, so as to mark the object frames on the picture to be marked.
The object detection algorithm may be a pedestrian detection algorithm such as YOLOv2, which uses end-to-end training and predictive regression to obtain a result in a single pass. Specifically, YOLO divides the whole picture into an S×S grid for target detection; the grid cell containing the center of an object (for example, a pedestrian) is responsible for detecting that object, the candidate detection regions are merged by NMS (non-maximum suppression), and the prediction result is output after fine adjustment. Through the object detection algorithm, the position data of the several object frames to be marked on the picture can be output in the following format:
[(0.9028096199035645, (840.4544677734375, 258.7015380859375, 133.60772705078125, 345.61395263671875)),
 (0.8951594233512878, (531.0243530273438, 412.25616455078125, 240.8721160888672, 419.3092346191406)),
 (0.8737973570823669, (273.46173095703125, 390.53289794921875, 193.24966430664062, 404.64691162109375)),
 (0.8556963801383972, (960.7030029296875, 366.8395080566406, 139.21823120117188, 450.0631408691406))]
Taking the first entry as an example, the fields are: the confidence of the object detection, followed by X, Y, W and H, where X and Y are the upper-left corner coordinates of the object frame to be marked and W and H are its width and height.
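A hedged sketch of consuming this output: filter the (confidence, (X, Y, W, H)) entries by a confidence cutoff, then map each surviving box from crop-local coordinates back to the full picture. The helper names, the rounded values, the 0.88 cutoff and the (100, 50) crop origin are all illustrative:

```python
detections = [  # (confidence, (X, Y, W, H)), as in the detector output above (rounded)
    (0.9028, (840.45, 258.70, 133.61, 345.61)),
    (0.8952, (531.02, 412.26, 240.87, 419.31)),
    (0.8738, (273.46, 390.53, 193.25, 404.65)),
    (0.8557, (960.70, 366.84, 139.22, 450.06)),
]

def keep_confident(dets, min_conf):
    """Keep (X, Y, W, H) boxes whose detection confidence is at least min_conf."""
    return [box for conf, box in dets if conf >= min_conf]

def to_global(box, crop_origin):
    """Map a box predicted inside a crop back to the full picture's coordinates."""
    x, y, w, h = box
    ox, oy = crop_origin
    return (x + ox, y + oy, w, h)

kept = keep_confident(detections, 0.88)
print(len(kept))                      # 2 boxes pass the 0.88 cutoff
print(to_global(kept[0], (100, 50)))  # shifted by the illustrative crop origin
```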
The data output by the object detection algorithm is then mapped to the global coordinates of the picture to be marked and loaded by the labeling tool, so that the object frames are marked on the picture to be marked. Finally, the marked object frames can be further adjusted by manual fine-tuning, and the fine-tuned position data is saved.
Because the picture fed into the object detection algorithm is small (namely, the region corresponding to the object frame to be marked), contains few pixels, and is only a part of the whole picture, the computational load of the object detection algorithm is low; and because spatio-temporal blank data is added, marking accuracy can be improved. In a marking-tool scenario, the object detection algorithm can return results in real time, which gives good results in practical applications.
As can be seen from the embodiments described above, the present invention solves the problem of a great deal of repetitive labor in marking work by determining the directionality of the object frame to be marked from the position changes of the object frames on at least two marked pictures, and thereby determining the position data of the object frame to be marked. In the prior art, pictures with spatio-temporal similarity are marked repeatedly. The present invention instead exploits that spatio-temporal similarity: the directionality of the object frame to be marked is determined from the position changes of the object frames on at least two marked pictures, the position data of the object frame to be marked is derived from that directionality, and the object frame is then marked on the picture to be marked according to an object detection algorithm and that position data. This reduces repetitive labor, improves the efficiency and quality of marking work, and lowers marking cost. In addition, the invention further combines the object detection algorithm with the position data of the object frame: only a part of the whole picture is predicted, and the smaller input reduces the computational load of the object detection algorithm and speeds up its operation, improving its real-time performance.
FIG. 3 is a schematic diagram of the main flow of a method for labeling pictures according to another reference embodiment of the invention, which may include:
Step 301, obtaining video data and performing batch frame extraction on the video data;
Step 302, sorting the frame pictures in time order, selecting the K pictures closest to the timestamp of the picture to be marked, and marking the K pictures with a marking tool;
Step 303, mapping all K pictures to the same coordinates and obtaining the position data of the object frames on the K marked pictures;
Step 304, calculating the weights of the object frame to be marked in different directions according to the change of the position data of the object frames across the pictures;
Step 305, adding blank data to the position data of the object frame on the picture closest to the timestamp of the picture to be marked, and calculating the position data of the object frame to be marked according to the weights of the object frame to be marked in different directions;
Step 306, inputting the position data of the object frame to be marked into an object detection algorithm, which screens out, through training and predictive regression, the position data of the object frames that contain an object;
Step 307, mapping the position data of the object frames that contain an object to the coordinates of the picture to be marked, so as to mark the object frames on the picture to be marked.
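Steps 304 and 305 above can be sketched as follows. This is a minimal illustration that assumes the directional weights are simply the average per-frame displacements; the embodiment does not fix a particular weighting formula, and all names are illustrative:

```python
def predict_region(history, pad):
    """history: time-ordered (x, y, w, h) boxes of one object on the K
    marked pictures; pad: the blank data P added in each direction.
    Returns the region handed to the object detection algorithm."""
    # Step 304: directional weights from the change in position data.
    dx = [b[0] - a[0] for a, b in zip(history, history[1:])]
    dy = [b[1] - a[1] for a, b in zip(history, history[1:])]
    wx = sum(dx) / len(dx)
    wy = sum(dy) / len(dy)
    # Step 305: shift the box closest in timestamp and add blank data.
    x, y, w, h = history[-1]
    return (x + wx - pad, y + wy - pad, w + 2 * pad, h + 2 * pad)

# A pedestrian drifting right and slightly down across three frames:
region = predict_region([(100, 50, 40, 80),
                         (110, 52, 40, 80),
                         (120, 54, 40, 80)], pad=10)
# region == (120.0, 46.0, 60, 100)
```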
According to the method provided by this embodiment of the invention, the directionality of the object frame to be marked is determined from the position changes of the object frames on at least two marked pictures, and the position data of the object frame to be marked is derived from that directionality, which solves the problem of a great deal of repetitive labor in marking work. In the prior art, pictures with spatio-temporal similarity are marked repeatedly. The present invention instead exploits that similarity to determine the directionality and position data of the object frame to be marked, and then marks the object frame on the picture to be marked according to an object detection algorithm and that position data, thereby reducing repetitive labor, improving the efficiency and quality of marking work, and lowering marking cost.
In addition, the implementation details of this embodiment have already been described in detail in the method for labeling pictures above, and are therefore not repeated here.
Fig. 4 is a schematic diagram of the main modules of a device for labeling pictures according to an embodiment of the present invention. As shown in fig. 4, the device includes a direction calculation module 401, a position calculation module 402, and a labeling module 403. The direction calculation module 401 obtains at least two marked pictures whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamps are earlier, and determines the directionality of the object frame to be marked on the picture to be marked according to the position changes of the object frames on the at least two marked pictures; the position calculation module 402 determines the position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked; the labeling module 403 labels the object frame on the picture to be marked according to an object detection algorithm and the position data of the object frame to be marked.
Optionally, determining the directionality of the object frame to be annotated according to the position change of the object frames on the at least two annotated pictures includes: acquiring position data of object frames on the at least two marked pictures; and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data so as to determine the directionality of the object frames to be marked. Specifically, each marked picture can be mapped to the same coordinate, the position data of the object frame on the marked picture is obtained, and the weight of the object frame to be marked in different directions is calculated according to the change of the position data of the object frame on different pictures, so that the directionality of the object frame to be marked is determined.
Optionally, determining the directionality of the object frame to be annotated according to the position change of the object frames on the at least two annotated pictures includes: acquiring position data of object frames on the at least two marked pictures, wherein the position data comprises coordinate data, width and height; determining object frames belonging to the same object on different pictures according to the position data of the object frames; and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data of the object frames belonging to the same object on different pictures so as to determine the directionality of the object frames to be marked.
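As a sketch of how object frames belonging to the same object might be identified from their position data, an overlap criterion such as intersection-over-union could be used; this specific criterion is an assumption for illustration, since the embodiment only requires that frames of the same object be matched across pictures mapped to the same coordinates:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def match_same_object(earlier, later, thresh=0.3):
    """Pair each object frame on the earlier picture with its
    best-overlapping frame on the later picture."""
    pairs = []
    for p in earlier:
        best = max(later, key=lambda n: iou(p, n), default=None)
        if best is not None and iou(p, best) >= thresh:
            pairs.append((p, best))
    return pairs
```

The position changes of each matched pair then feed the directional weight calculation described above.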
The position calculation module 402 may calculate the position data of the object frame to be marked according to the weights in different directions calculated by the direction calculation module 401. Optionally, the position calculation module 402 calculates the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions and the position data of the object frame on a marked picture whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamp is earlier.
Optionally, the position calculation module 402 adds blank data to the position data of the object frame on a marked picture whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamp is earlier, and calculates the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions.
In this embodiment, blank data P is added to the position data of the object frame on the picture whose timestamp is closest to that of the picture to be marked, so as to enlarge the blank area of the object frame to be marked in each direction and offset the prediction space difference caused by the video time difference. The size of the blank data P may be determined from the prediction space difference caused by the video time difference, or may be determined empirically.
The labeling module 403 inputs the position data of the object frame to be marked, as calculated by the position calculation module 402, into an object detection algorithm, and the position data of the object frames that contain an object is screened out by the algorithm, so that the object frames are labeled on the picture to be marked.
Optionally, the labeling module 403 inputs the position data of the object frame to be marked into an object detection algorithm, which screens out, through training and predictive regression, the position data of the object frames that contain an object; that position data is then mapped to the coordinates of the picture to be marked, so that the object frames are labeled on the picture. The object detection algorithm may be a pedestrian detection algorithm such as YOLO-V2, which uses end-to-end training and predictive regression to obtain a result in a single pass.
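The final mapping step of the labeling module can be sketched as follows. Here `run_detector` is a hypothetical stand-in for the YOLO-style detector, and the row-major nested-list image layout is an assumption for illustration:

```python
def label_on_picture(image, region, run_detector):
    """Run the detector only on the predicted region of the picture and
    map its local results back to global picture coordinates.
    region is (x, y, w, h) in the picture's coordinate system."""
    rx, ry, rw, rh = region
    # Crop the small sub-image: only part of the whole picture is
    # predicted, which keeps the detector's workload low.
    crop = [row[rx:rx + rw] for row in image[ry:ry + rh]]
    detections = run_detector(crop)  # local (conf, (x, y, w, h)) tuples
    # Map each local frame back to the picture's global coordinates.
    return [(conf, (x + rx, y + ry, w, h))
            for conf, (x, y, w, h) in detections]
```

A labeling tool would then load the returned global boxes for manual fine-tuning, as described above.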
As can be seen from the embodiments described above, the present invention solves the problem of a great deal of repetitive labor in marking work by determining the directionality of the object frame to be marked from the position changes of the object frames on at least two marked pictures, and thereby determining the position data of the object frame to be marked. In the prior art, pictures with spatio-temporal similarity are marked repeatedly. The present invention instead exploits that similarity to determine the directionality and position data of the object frame to be marked, and then marks the object frame on the picture to be marked according to an object detection algorithm and that position data, thereby reducing repetitive labor, improving the efficiency and quality of marking work, and lowering marking cost.
The details of the implementation of the apparatus for labeling pictures according to the present invention are already described in the above method for labeling pictures, and thus the description thereof will not be repeated here.
Fig. 5 illustrates an exemplary system architecture 500 to which the method of labeling pictures or the apparatus for labeling pictures of an embodiment of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves as a medium for providing communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may interact with the server 505 via the network 504 using the terminal devices 501, 502, 503 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and process the received data such as the product information query request, and feedback the processing result (e.g., the target push information, the product information—only an example) to the terminal device.
It should be noted that the method for labeling pictures provided in the embodiments of the present invention is generally performed on the terminal devices 501, 502, 503 in public places, and may also be performed by the server 505; accordingly, the device for labeling pictures is generally disposed in the terminal devices 501, 502, 503 in public places, and may also be disposed in the server 505.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. Various programs and data required for the operation of the system 600 are also stored in the RAM 603. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a direction calculation module, a position calculation module, and a labeling module, where the names of these modules do not, in some cases, constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be present alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to include: acquiring at least two marked pictures, the difference value of the time stamps of which is smaller than a threshold value and the time stamps of which are earlier than the picture to be marked, and determining the directionality of an object frame to be marked on the picture to be marked according to the position change of the object frame on the at least two marked pictures; determining position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked; and labeling the object frame to be labeled on the picture to be labeled according to an object detection algorithm and the position data of the object frame to be labeled.
According to the technical solution of the embodiments of the present invention, the directionality of the object frame to be marked is determined from the position changes of the object frames on at least two marked pictures, and the position data of the object frame to be marked is determined from that directionality, which overcomes the technical problem of a great deal of repetitive labor in marking work. Based on the spatio-temporal similarity of the at least two pictures, the directionality and position data of the object frame to be marked are determined, and the object frame is then marked on the picture to be marked according to an object detection algorithm and that position data, thereby reducing repetitive labor, improving the efficiency and quality of marking work, and reducing marking cost.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method of annotating a picture, comprising:
Acquiring at least two marked pictures whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamps are earlier than that of the picture to be marked, and determining the directionality of an object frame to be marked on the picture to be marked according to the position changes of the object frames on the at least two marked pictures;
Determining position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked;
Labeling the object frame to be labeled on the picture to be labeled according to an object detection algorithm and the position data of the object frame to be labeled;
According to the position change of the object frames on the at least two marked pictures, determining the directionality of the object frames to be marked on the pictures to be marked comprises the following steps:
Acquiring position data of object frames on the at least two marked pictures;
according to the change of the position data, calculating weights of the object frames to be marked on the picture to be marked in different directions so as to determine the directionality of the object frames to be marked;
according to the directionality of the object frame to be marked, determining the position data of the object frame to be marked on the picture to be marked comprises the following steps:
Calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions and the position data of the object frame on a marked picture whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamp is earlier than that of the picture to be marked;
Or alternatively
Adding blank data to the position data of an object frame on a marked picture whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamp is earlier than that of the picture to be marked, and calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions; wherein the timestamp of the marked picture is closest to the timestamp of the picture to be marked.
2. The method of claim 1, wherein determining the directionality of the object frame to be annotated on the picture to be annotated according to the change in position of the object frame on the at least two annotated pictures comprises:
acquiring position data of object frames on the at least two marked pictures, wherein the position data comprises coordinate data, width and height;
determining object frames belonging to the same object on different pictures according to the position data of the object frames;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data of the object frames belonging to the same object on different pictures so as to determine the directionality of the object frames to be marked.
3. The method according to claim 1, wherein labeling the object frame to be labeled on the picture to be labeled according to an object detection algorithm and position data of the object frame to be labeled, comprises:
inputting the position data of the object frame to be marked into an object detection algorithm, wherein the object detection algorithm screens out the position data of the object frame to be marked with the object through training and predictive regression;
Mapping the position data of the object frame to be marked with the object to the coordinates of the picture to be marked so as to mark the object frame to be marked on the picture to be marked.
4. A device for annotating a picture, comprising:
The direction calculation module is used for obtaining at least two marked pictures with the time stamp difference value smaller than a threshold value and the time stamp earlier than the picture to be marked, and determining the directionality of the object frame to be marked on the picture to be marked according to the position change of the object frame on the at least two marked pictures;
The position calculation module is used for determining the position data of the object frame to be marked on the picture to be marked according to the directionality of the object frame to be marked;
the labeling module is used for labeling the object frame to be labeled on the picture to be labeled according to an object detection algorithm and the position data of the object frame to be labeled;
determining the directionality of the object frame to be marked according to the position change of the object frame on the at least two marked pictures, including:
Acquiring position data of object frames on the at least two marked pictures;
according to the change of the position data, calculating weights of the object frames to be marked on the picture to be marked in different directions so as to determine the directionality of the object frames to be marked;
The position calculation module is used for:
Calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions and the position data of the object frame on a marked picture whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamp is earlier than that of the picture to be marked;
Or alternatively
Adding blank data to the position data of an object frame on a marked picture whose timestamp difference from the picture to be marked is smaller than a threshold and whose timestamp is earlier than that of the picture to be marked, and calculating the position data of the object frame to be marked on the picture to be marked according to the weights of the object frame to be marked in different directions; wherein the timestamp of the marked picture is closest to the timestamp of the picture to be marked.
5. The apparatus of claim 4, wherein determining the directionality of the object frames to be annotated based on the change in position of the object frames on the at least two annotated pictures comprises:
acquiring position data of object frames on the at least two marked pictures, wherein the position data comprises coordinate data, width and height;
determining object frames belonging to the same object on different pictures according to the position data of the object frames;
and calculating weights of the object frames to be marked on the picture to be marked in different directions according to the change of the position data of the object frames belonging to the same object on different pictures so as to determine the directionality of the object frames to be marked.
6. The apparatus of claim 4, wherein the labeling module is configured to:
inputting the position data of the object frame to be marked into an object detection algorithm, wherein the object detection algorithm screens out the position data of the object frame to be marked with the object through training and predictive regression;
Mapping the position data of the object frame to be marked with the object to the coordinates of the picture to be marked so as to mark the object frame to be marked on the picture to be marked.
7. An electronic device, comprising:
One or more processors;
storage means for storing one or more programs,
When executed by the one or more processors, causes the one or more processors to implement the method of any of claims 1-3.
8. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-3.
CN201810029737.4A 2018-01-12 2018-01-12 Picture labeling method and device Active CN110032914B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810029737.4A CN110032914B (en) 2018-01-12 2018-01-12 Picture labeling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810029737.4A CN110032914B (en) 2018-01-12 2018-01-12 Picture labeling method and device

Publications (2)

Publication Number Publication Date
CN110032914A CN110032914A (en) 2019-07-19
CN110032914B true CN110032914B (en) 2024-07-19

Family

ID=67234418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810029737.4A Active CN110032914B (en) 2018-01-12 2018-01-12 Picture labeling method and device

Country Status (1)

Country Link
CN (1) CN110032914B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12437436B2 (en) 2021-12-08 2025-10-07 Beijing Baidu Netcom Science Technology Co., Ltd. Data marking method, apparatus, system, device and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI719591B (en) * 2019-08-16 2021-02-21 緯創資通股份有限公司 Method and computer system for object tracking
CN110852166B (en) * 2019-10-10 2021-01-26 上海速益网络科技有限公司 Picture identification and marking method
CN110796185B (en) * 2019-10-17 2022-08-26 北京爱数智慧科技有限公司 Method and device for detecting image annotation result
CN113963146A (en) * 2020-07-15 2022-01-21 杭州海康威视数字技术股份有限公司 Picture labeling method and device
CN113780042A (en) * 2020-11-09 2021-12-10 北京沃东天骏信息技术有限公司 Picture set operation method, picture set labeling method and device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324937A (en) * 2012-03-21 2013-09-25 日电(中国)有限公司 Method and device for labeling targets

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4082079B2 (en) * 2002-04-30 2008-04-30 ソニー株式会社 Image signal processing apparatus and method
JP2011028689A (en) * 2009-07-29 2011-02-10 Sony Corp Moving image extraction device, program and moving image extraction method
EP2776951A1 (en) * 2011-11-08 2014-09-17 Vidinoti SA Image annotation method and system
CN103824053B (en) * 2014-02-17 2018-02-02 北京旷视科技有限公司 The sex mask method and face gender detection method of a kind of facial image
CN106991357A (en) * 2016-01-20 2017-07-28 上海慧体网络科技有限公司 The shooting of automatic identification Basketball Match and the algorithm scored based on panoramic video
CN107169530A (en) * 2017-06-09 2017-09-15 成都澳海川科技有限公司 Mask method, device and the electronic equipment of picture

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324937A (en) * 2012-03-21 2013-09-25 日电(中国)有限公司 Method and device for labeling targets


Also Published As

Publication number Publication date
CN110032914A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN110032914B (en) Picture labeling method and device
US10796438B2 (en) Method and apparatus for tracking target profile in video
US20210201445A1 (en) Image cropping method
CN109255767B (en) Image processing method and device
CN109344762B (en) Image processing method and device
CN111601013B (en) Method and apparatus for processing video frames
CN109377508B (en) Image processing method and device
CN111815738A (en) Map construction method and device
CN113436226A (en) Method and device for detecting key points
KR102463890B1 (en) Method and apparatus for generating position information, device, media and program
US11120269B2 (en) Method and apparatus for determining target rotation direction, computer readable medium and electronic device
CN110633717A (en) Training method and device for target detection model
CN114581523B (en) A method and device for determining annotation data for monocular 3D target detection
CN110633597B (en) Drivable region detection method and device
CN113722624B (en) Business icon display method and device
CN109522429B (en) Method and apparatus for generating information
CN113780294B (en) Text character segmentation method and device
CN111178353A (en) Image character positioning method and device
CN113655977B (en) Material display method and device, electronic equipment and storage medium
CN111311603A (en) Method and apparatus for outputting target object number information
CN115756461A (en) Annotation template generation method, image identification method and device and electronic equipment
EP3863235B1 (en) Method and apparatus for processing data
CN113762307B (en) Image recognition method and device
CN113055707B (en) Video display method and device
CN117332177B (en) Picture display method and device based on doubly linked list and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant