Detailed Description
The foregoing and other features of the present application will become apparent from the following description, taken in conjunction with the accompanying drawings. In the description and drawings, particular embodiments of the application are disclosed in detail as being indicative of some of the embodiments in which the principles of the application may be employed, it being understood that the application is not limited to the embodiments described, but, on the contrary, is intended to cover all modifications, variations, and equivalents falling within the scope of the appended claims.
In the embodiments of the present application, the terms "first", "second", and the like are used for distinguishing different elements by reference, but do not denote a spatial arrangement, a temporal order, or the like of the elements, and the elements should not be limited by the terms. The term "and/or" includes any and all combinations of one or more of the associated listed terms. The terms "comprising," "including," "having," and the like refer to the presence of stated features, elements, or components, and do not preclude the presence or addition of one or more other features, elements, or components.
In the embodiments of the present application, the singular forms "a", "an", and the like include the plural forms and are to be construed broadly rather than as limited to the meaning of "one"; furthermore, the term "the" should be understood to include both the singular and the plural, unless the context clearly dictates otherwise. Further, the term "according to" should be understood as "at least partially according to", and the term "based on" should be understood as "based at least partially on", unless the context clearly dictates otherwise.
First aspect of the embodiments
A first aspect of an embodiment of the present application provides a method for detecting a moving object. Fig. 1 is a schematic diagram of a moving object detection method according to a first aspect of an embodiment of the present application, and as shown in fig. 1, the method includes the following operations:
in operation 101, for a first number of temporally consecutive images, pairing a target object in each image with a target object corresponding to the same object in a predetermined other image to form a target object pair;
operation 102, generating a directed acyclic graph according to the target object pairs, wherein the directed acyclic graph includes nodes and edges connecting different nodes, the nodes represent each target object in a target object pair (pair), the nodes are arranged in the order of each frame image, two nodes connected by one edge represent one target object pair, and the direction of the edge is from the target object in the previous frame image to the target object in the next frame image;
operation 103, extracting chains from the directed acyclic graph, where each chain includes nodes and edges, and in each chain there is at most one node per frame of image; and
operation 104, detecting a moving object whose moving speed is within a certain range based on the extracted chains.
According to the first aspect of the embodiments of the present application, in the moving object detection method, the same target object in different images is detected and represented in a directed acyclic graph. Since the directed acyclic graph can clearly represent the relationships of the target object pairs in the multi-frame images and the moving direction of the same target object, detecting a moving object based on the directed acyclic graph can reduce missed detection of the moving object and, in addition, can improve the speed and efficiency of detecting the moving object.
In operation 101, a first number of frames of images in a time series may be processed to form a target object pair. The first number of frame images in the time series may be consecutive frame images in the time series, or may be a plurality of frame images extracted from consecutive frame images in the time series.
On the multi-frame images, each frame image may be marked with the type of each object and the bounding box of the object in that frame image. The type and the bounding box of the object may be detected using a deep learning based classifier; with respect to the specific method by which the classifier detects the image, reference may be made to the related art. Further, the plurality of frame images, together with information on the type and the bounding box of the object in each frame image, may be stored in advance.
In at least one embodiment, the target object may be, for example, a vehicle, and the type of the object may be, for example, a sedan, a truck, a bus, or the like. In the following of the present application, the moving object detection method will be described by taking the case in which the target object is a vehicle as an example. In addition, when the target object is of another type, the detection method of the present application is also applicable to detecting the motion of the target object of that other type; for example, when the target object is a human body, the moving object detection method of the present application can be used for detecting a moving person, so that situations such as crowd congestion can be conveniently determined.
Fig. 2 is a schematic diagram of a plurality of frames of images in time series according to the first aspect of the embodiment of the present application. Fig. 2 shows 6 frame images, the indexes of the frame images are I0, I1, I2, I3, I4, I5, respectively, and the corresponding time points are sequentially delayed from I0 to I5.
As shown in fig. 2, the detected target objects are labeled 0, 1, and so on in each frame image. For example, in image I2 there are 2 target objects, labeled 0 and 1, respectively.
In operation 101, each frame image may be compared with a predetermined other frame image, and in each comparison, a pair of objects in the two frame images for comparison is detected. The predetermined other frame image may be an image that is within a predetermined number of frames from the frame image in time series, where the predetermined number may be represented by m, and m is a natural number equal to or greater than 2.
For example, suppose m is equal to 2, and take the images I0, I1, I2, I3 shown in fig. 2 as an example. In operation 101, for I0: since I0 and I1 differ by 1 frame and I0 and I2 differ by 2 frames, i.e., I1 and I2 differ from I0 in the time series by at most 2 frames, I1 and I2 are the predetermined other frame images relative to I0; I0 is compared with I1 to obtain target object pairs, and I0 is compared with I2 to obtain target object pairs. Similarly, for I1, I1 is compared with I2 and with I3 to obtain target object pairs; for I2, I2 is compared with I3 to obtain target object pairs. Further, with respect to I3 to I5 of fig. 2, the pairs of image frames for comparison may be determined in the same manner.
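The frame-pairing rule above can be sketched as follows. This is an illustrative Python sketch rather than part of the application; the function name `comparison_pairs` and its arguments are hypothetical, assuming frames are indexed 0 to num_frames - 1 in time order.

```python
def comparison_pairs(num_frames, m):
    """List the (earlier, later) frame-index pairs compared in operation 101:
    each frame is compared with the frames that follow it within m frames."""
    pairs = []
    for t in range(num_frames):
        for k in range(1, m + 1):
            if t + k < num_frames:
                pairs.append((t, t + k))
    return pairs
```

For num_frames = 4 and m = 2 this yields (I0, I1), (I0, I2), (I1, I2), (I1, I3), (I2, I3), matching the example above.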
In fig. 2, an arrow on the object indicates the moving direction of the object in the image.
Fig. 3 is a schematic diagram of a method for comparing a pair of image frames in operation 101 to obtain a pair of objects, and as shown in fig. 3, operation 101 may include the following operations:
operation 301, the distance between a target object (A) in the current frame image and each target object (B) corresponding to target object (A) in one other frame image is calculated, and if the distance lies in a first interval ([threshold_inner, threshold_outer]), the corresponding target object in the other frame image and the target object in the current frame image are paired to form a target object pair.
In at least one embodiment, target object A corresponding to target object B may mean that target object A and target object B are of the same type, for example the same type of vehicle, or that other parameters of target object A and target object B are the same or similar.
For example, the current frame may be image I0, and the other frame image with which image I0 is compared may be image I1. In image I0 there are M target objects; for the i-th target object, its type may be labeled Ti, and its bounding box Ai may be represented as [xi, yi, wi, hi], where (xi, yi) represents the position of bounding box Ai in the width direction and the height direction of the image, and (wi, hi) represents the width and the height of bounding box Ai; i is a natural number, and 1 ≤ i ≤ M.

In image I1, there are, for example, Ni target objects whose type is denoted Ti; among the Ni target objects, the bounding box Bj of the j-th target object may be represented as [xj, yj, wj, hj], where (xj, yj) represents the position of bounding box Bj in the width direction and the height direction of the image, and (wj, hj) represents the width and the height of bounding box Bj; j is a natural number, and 1 ≤ j ≤ Ni.

The Euclidean distance between the i-th target object in image I0 and each of the Ni target objects in image I1 may be represented as a vector di = [di1, di2, …, dij, …, diNi], where dij denotes the Euclidean distance between the i-th target object in image I0 and the j-th of the Ni target objects in image I1.

In operation 301, for the i-th target object in image I0, the elements of the vector di lying in the first interval [threshold_inner, threshold_outer] are found. If only 1 element of the vector di lies in the first interval, the target object in image I1 corresponding to that element and the i-th target object in image I0 are taken as a target object pair.
If 2 or more elements of the vector di lie in the first interval, the 2 or more target objects in image I1 corresponding to those elements form candidate target object pairs with the i-th target object in image I0, respectively.
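As an illustration of operation 301, the following hedged Python sketch collects the objects of another frame whose Euclidean distance to the i-th object of the current frame falls in the first interval. The function names, the use of the bottom-centre point as the position of a bounding box, and the threshold values in the example are assumptions for illustration only.

```python
import math

def bottom_center(box):
    # box = [x, y, w, h]; bottom-centre point of the box (assumed anchor)
    x, y, w, h = box
    return (x + w / 2.0, y + h)

def candidates_in_first_interval(box_a, boxes_b, threshold_inner, threshold_outer):
    """Indices j of boxes_b whose Euclidean distance d_ij to box_a lies in the
    first interval [threshold_inner, threshold_outer]."""
    ax, ay = bottom_center(box_a)
    hits = []
    for j, box_b in enumerate(boxes_b):
        bx, by = bottom_center(box_b)
        d = math.hypot(ax - bx, ay - by)  # Euclidean distance d_ij
        if threshold_inner <= d <= threshold_outer:
            hits.append(j)
    return hits
```

A single hit yields a target object pair directly; two or more hits yield candidate target object pairs, which are resolved in operation 302.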
As shown in fig. 3, operation 101 may further include the following operations:
operation 302, calculating the distance in a feature space between the appearance features of the two target objects in each candidate target object pair, and taking the candidate target object pair whose distance in the feature space is the smallest and is not greater than a feature distance threshold as the target object pair.
For example, if 3 elements of the vector di lie in the first interval, the 3 target objects in image I1 corresponding to those elements form candidate target object pairs p1, p2, and p3 with the i-th target object in image I0, respectively, and the distances of these candidate target object pairs in the feature space are dp1, dp2, and dp3, respectively. If dp2 is the smallest and dp2 is not greater than the feature distance threshold Tz, the candidate target object pair p2 is taken as a target object pair; if dp2 is the smallest but dp2 is greater than the feature distance threshold Tz, none of the candidate target object pairs p1, p2, p3 is a target object pair.
In operation 302, the distance in the feature space between the features of the two target objects in each candidate target object pair may be calculated according to a cosine metric learning algorithm. For example, a classifier may be trained according to a cosine metric learning algorithm, the trained classifier is used to extract the feature r of each candidate target object in the candidate target object pair as its appearance feature, and the distance in the feature space between the two features r of the candidate target object pair (i.e., dp1, dp2, dp3, etc.) is calculated.
In the present application, compared with a classifier obtained by a traditional training method, a classifier trained with the cosine metric learning algorithm can achieve a lower intra-class distance and a higher inter-class distance, so that the accuracy of the calculation can be improved when the distance between features is computed from the features extracted by this classifier.
In the present application, for the specific method of calculating, according to a cosine metric learning algorithm, the distance in the feature space between the features of the two target objects in each candidate target object pair, reference may be made to the related art.

In addition, the present application is not limited thereto; in operation 302, the distance in the feature space between the features of the two target objects in each candidate target object pair may also be calculated based on other methods.
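To make the feature-space distance concrete, the following is a minimal sketch, not the application's trained classifier: it treats each appearance feature r as a plain vector and uses the cosine distance 1 − cos(r1, r2), one common choice under cosine metric learning. The function names and the threshold-resolution step are illustrative assumptions.

```python
import math

def cosine_distance(r1, r2):
    # 1 - cosine similarity between two feature vectors
    dot = sum(a * b for a, b in zip(r1, r2))
    n1 = math.sqrt(sum(a * a for a in r1))
    n2 = math.sqrt(sum(b * b for b in r2))
    return 1.0 - dot / (n1 * n2)

def resolve_candidates(feature_a, candidate_features, feature_distance_threshold):
    """Index of the closest candidate in feature space, or None if even the
    closest one is farther than the feature distance threshold (Tz)."""
    distances = [cosine_distance(feature_a, f) for f in candidate_features]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best if distances[best] <= feature_distance_threshold else None
```

Returning None corresponds to the case above in which none of the candidate pairs becomes a target object pair.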
In at least one embodiment, as shown in fig. 3, operation 101 may further include the following operations:
operation 303, before operation 301, correcting the bounding box of each target object in the images of the first number of frames to a bounding box in an image having a predetermined resolution.

After operation 303, the distance calculation in operation 301 is performed using the size and position of the bounding box corrected in operation 303.
In the present application, the resolutions of the images of the first number of frames may differ; therefore, by correcting, in operation 303, the bounding box in each frame image to a bounding box in an image of the same predetermined resolution, the influence of the differing image resolutions on the distance calculation of operation 301 can be eliminated.
For example, in operation 303, the position of the bottom center point of the target object's bounding box is adjusted to the corresponding position in the image of the predetermined resolution, and the length and the width of the bounding box are adjusted according to the proportional relationship between the resolution of each frame image and the predetermined resolution.
Fig. 4 is a schematic diagram of the correction of the bounding box. As shown in fig. 4, the resolution of the image 401 is 1080 pixels high and 1920 pixels wide, and the predetermined resolution of the image 402 is 480 pixels high and 720 pixels wide. In operation 303, the bounding box 4011 of the target object in the image 401 is corrected to the bounding box 4012 of the target object in the image 402; in operation 301, the distance between target objects is calculated according to the position and size of the corrected bounding box 4012 in the image 402, thereby determining a target object pair or a candidate target object pair.
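The correction of operation 303 can be illustrated by the sketch below, which maps a bounding box [x, y, w, h] from a source resolution into the predetermined resolution by scaling each axis proportionally and anchoring the bottom-centre point. The function name and the exact anchoring rule are assumptions based on the description of fig. 4.

```python
def correct_box(box, src_res, dst_res):
    """Map box = [x, y, w, h] from an image of src_res = (width, height)
    into an image of the predetermined resolution dst_res = (width, height)."""
    x, y, w, h = box
    (src_w, src_h), (dst_w, dst_h) = src_res, dst_res
    sx, sy = dst_w / src_w, dst_h / src_h        # per-axis scale factors
    new_w, new_h = w * sx, h * sy                # scale the box size
    bcx, bcy = (x + w / 2.0) * sx, (y + h) * sy  # bottom-centre point, scaled
    return [bcx - new_w / 2.0, bcy - new_h, new_w, new_h]
```

For a 1920x1080 image corrected to 720x480, a box [960, 540, 192, 108] maps to approximately [360, 240, 72, 48].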
In the present application, the detection of the target object pair is performed for the images of the first number of frames, for example, 6 image frames I0 to I5 shown in fig. 2, according to operation 101.
In operation 102, a directed acyclic graph (directed acyclic graph) may be generated according to the detection result of the target object pair in operation 101.
FIG. 5 is a schematic diagram of a directed acyclic graph of the present application. As shown in fig. 5, the directed acyclic graph 500 includes nodes 501 and edges 502 connecting the nodes. The nodes 501 represent respective target objects in target object pairs, and the arrangement order of the nodes 501 in the first direction D1 corresponds to the arrangement order of the frame images I0 to I5 on the time axis. In fig. 5, two nodes 501 connected by an edge 502 represent one target object pair. For example, for a target object pair detected in operation 101, the node 501 corresponding to the i-th target object in image I0 and the node 501 corresponding to the j-th target object in image I1 may be connected by an edge 502.
As shown in fig. 5, in the directed acyclic graph, the edges 502 do not form a closed loop, and each edge 502 has a direction, as indicated by its arrow, pointing from a target object in the earlier frame image to a target object in the later frame image, e.g., from a target object in image I1 to a target object in image I2.
Further, as shown in fig. 5, in the directed acyclic graph 500, the plurality of nodes 501 corresponding to the plurality of target objects in each image frame may be arranged in the second direction D2 corresponding to that image frame. For example, in fig. 5, the nodes 501 corresponding to target objects 0 and 1 in image frame I2 may be arranged below the index I2 of that image frame.
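A minimal sketch of the graph of operation 102 follows, with each node identified by the hypothetical pair (frame_index, object_index); because every edge is oriented from an earlier frame to a later one, no cycle can arise. The function and variable names are illustrative only.

```python
def build_dag(object_pairs):
    """object_pairs: iterable of ((frame_a, obj_a), (frame_b, obj_b)) tuples.
    Returns an adjacency dict mapping each node to its successors, every
    edge pointing from the earlier frame to the later frame."""
    dag = {}
    for u, v in object_pairs:
        if u[0] > v[0]:  # orient the edge from earlier to later frame
            u, v = v, u
        dag.setdefault(u, []).append(v)
        dag.setdefault(v, [])
    return dag
```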
In operation 103, mutually distinct chains may be extracted from the directed acyclic graph 500.
Fig. 6 is a schematic diagram of the operation of extracting chains, which, as shown in fig. 6, includes:
operation 601, selecting a starting node of a chain, and obtaining a plurality of chains from the starting node; and
in operation 602, if the nodes and edges of a chain are all included in another chain, the chain is deleted; and if the starting nodes and the ending nodes of two or more chains are respectively the same, the chain with the largest number of nodes is retained.
In operation 601, the selected starting node may be, among the nodes that do not yet belong to any existing chain (i.e., the remaining nodes), a node in the temporally earliest image frame.
For example, in operation 601, for the directed acyclic graph shown in fig. 5, the node corresponding to object 0 of I0 is first used as the starting node, and a plurality of chains are obtained therefrom. Each chain may include at least 2 nodes 501 and at least 1 edge 502, and a chain is not interrupted. For example, the node corresponding to object 0 of I0 and the node corresponding to object 0 of I1 may form a chain; the node corresponding to object 0 of I0, the node corresponding to object 0 of I1, and the node corresponding to object 0 of I2 may form a chain; the node corresponding to object 0 of I0, the node corresponding to object 0 of I1, the node corresponding to object 0 of I2, and the node corresponding to object 1 of I4 may form a chain; the other chains are not listed one by one. After the plurality of chains are obtained with the node corresponding to object 0 of I0 as the starting node, the nodes corresponding to object 1 of I1, object 1 of I2, and object 1 of I3 are not located on any chain; therefore, at least one chain is obtained again with the node corresponding to object 1 in the temporally earliest such image, I1, as a new starting node. The above operations are repeated until all chains in the directed acyclic graph are obtained.
In operation 602, the plurality of chains obtained in operation 601 are deleted or retained. For example, if the nodes and edges of one chain are all included in another chain, that chain is a sub-chain of the other chain and can therefore be deleted, which reduces the number of chains and the complexity of the calculation. And/or, if the starting nodes and the ending nodes of two or more chains are respectively the same, the chain with the largest number of nodes is retained and the other chains among the two or more chains are deleted, so that the retained chain reflects the motion states of the target objects corresponding to more nodes, making the detection result more accurate.
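Operations 601 and 602 can be sketched as follows, reusing the hypothetical (frame_index, object_index) node representation. Enumerating only root-to-leaf paths already removes sub-chains along each branch, and `prune_same_endpoints` keeps, per (start, end) pair, the chain with the most nodes; all names are illustrative assumptions.

```python
def chains_from(dag, start):
    """All chains (paths with >= 2 nodes) from `start` down to a leaf of the DAG."""
    paths = []
    def walk(node, path):
        successors = dag.get(node, [])
        if not successors:           # leaf: the path is a maximal chain
            paths.append(path)
        for nxt in successors:
            walk(nxt, path + [nxt])
    walk(start, [start])
    return [p for p in paths if len(p) >= 2]

def prune_same_endpoints(chains):
    """Among chains sharing start and end nodes, keep the one with most nodes."""
    best = {}
    for c in chains:
        key = (c[0], c[-1])
        if key not in best or len(c) > len(best[key]):
            best[key] = c
    return list(best.values())
```

Repeating `chains_from` from each remaining earliest node, as described above, yields all chains of the graph.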
Fig. 7 is a schematic diagram of a chain obtained based on a directed acyclic graph in the first aspect of the embodiment of the present application. As shown in fig. 7, 4 chains, i.e., 701, 702, 703, and 704, different from each other are extracted from the directed acyclic graph 500 of fig. 5. Each chain is composed of nodes 501 and edges 502, and in each chain, at most one node is provided per frame of image, that is, no more than two nodes are located in the same chain on the same frame of image.
In fig. 7, the starting points of chain 702 and chain 703 are both the node corresponding to target object 0 of I0, and their end points are both the node corresponding to target object 1 of I5. Since chain 702 and chain 703 each include 5 nodes, both may be retained, or only either one of them may be retained.
In operation 104, a moving object having a moving speed within a certain range may be detected from the chain extracted in operation 103.
In at least one embodiment, in operation 104, the distance between a first position in the image corresponding to the starting node of the chain and a second position in the image corresponding to the ending node of the chain may be calculated, and when the distance lies in a second interval [threshold_inner_total, threshold_outer_total], it is determined that the moving speed of the same moving object corresponding to the nodes on the chain is within the certain range.
For example, in chain 704 of fig. 7, the starting point is the node corresponding to target object 0 of I0, and the end point is the node corresponding to target object 0 of I5. Therefore, the position of target object 0 in image I0 (for example, the pixel position of the center of the bounding box of target object 0 in image I0) is taken as the first position, the position of target object 0 in image I5 (for example, the pixel position of the center of the bounding box of target object 0 in image I5) is taken as the second position, and the pixel distance between the first position and the second position is calculated. If the pixel distance lies in the second interval, it is determined that the moving speed of the same moving object corresponding to the nodes on chain 704 is within the certain range; if the pixel distance lies outside the second interval, it is determined that the moving speed of that moving object is outside the certain range, for example, greater than the upper limit or less than the lower limit of the certain range.
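The check of operation 104 thus reduces to a pixel-distance test, sketched below; the function and interval names are assumptions mirroring the description.

```python
import math

def speed_in_range(first_pos, second_pos,
                   threshold_inner_total, threshold_outer_total):
    """True when the pixel distance between the first position (start node)
    and the second position (end node) lies in the second interval."""
    d = math.hypot(second_pos[0] - first_pos[0], second_pos[1] - first_pos[1])
    return threshold_inner_total <= d <= threshold_outer_total
```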
In this embodiment, as shown in fig. 1, the method may further include:
an operation 105 of retaining the chain when the number of image frames spanned by the chain in the directed acyclic graph exceeds a number threshold; and/or
operation 106, deleting the chain when the included angle between two adjacent edges in the chain exceeds a predetermined angle threshold.
As shown in fig. 7, chains 702, 703, and 704 each span 6 image frames, while chain 701 spans 3 image frames. If the number threshold is 4, then chains 702, 703, and 704 may be retained and chain 701 may be deleted in operation 105. According to operation 105, chains spanning a larger number of image frames are retained and chains spanning a smaller number of image frames are deleted, thereby reducing the amount of calculation.
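Operation 105 amounts to a simple filter on the number of frames a chain spans; this small sketch reuses the hypothetical (frame_index, object_index) node representation and an assumed function name.

```python
def keep_long_chains(chains, number_threshold):
    """Retain chains whose spanned frame count exceeds the number threshold."""
    return [c for c in chains
            if c[-1][0] - c[0][0] + 1 > number_threshold]
```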
FIG. 8 is a schematic diagram of a chain in which the included angle between two adjacent edges exceeds the predetermined angle threshold. As shown in fig. 8, in chain 800, the included angle a1 between edge 801 and edge 802 is greater than the predetermined angle threshold; therefore, chain 800 is deleted in operation 106.
In operation 106, the predetermined angle threshold is, for example, 90 degrees.
Since a vehicle traveling on a road generally does not have a large bending movement, there is generally no large angle bend in the chain to which the traveling vehicle corresponds. According to operation 106, chains with large bending angles can be deleted, thereby avoiding false detection.
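Operation 106 can be sketched by measuring, at each interior node of the chain, the change of direction between the two adjacent edges taken as 2-D displacement vectors between node positions; the function names and the use of atan2 are illustrative assumptions.

```python
import math

def turn_angle(p0, p1, p2):
    """Change of direction, in degrees, at p1 of the polyline p0 -> p1 -> p2."""
    a = (p1[0] - p0[0], p1[1] - p0[1])
    b = (p2[0] - p1[0], p2[1] - p1[1])
    dot = a[0] * b[0] + a[1] * b[1]
    cross = a[0] * b[1] - a[1] * b[0]
    return abs(math.degrees(math.atan2(cross, dot)))

def has_sharp_bend(points, angle_threshold=90.0):
    """True when any included angle along the chain exceeds the threshold."""
    return any(turn_angle(points[i], points[i + 1], points[i + 2]) > angle_threshold
               for i in range(len(points) - 2))
```

A chain for which `has_sharp_bend` returns True would be deleted in operation 106.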
In at least one embodiment, the moving object detection method may include at least one of operation 105 and operation 106, and in that case operation 104 may be performed after operation 105 and/or operation 106, whereby the number of chains that need to be examined in operation 104 can be reduced, thereby reducing the amount of calculation in operation 104. Alternatively, the moving object detection method may include neither operation 105 nor operation 106.
According to the first aspect of the embodiment of the application, whether the moving speed of the moving object is within a certain range or not is judged according to the directed acyclic graph, so that missed detection of the moving object can be reduced, and in addition, the speed and the efficiency of detecting the moving object can be improved.
Second aspect of the embodiments
A second aspect of the embodiments of the present application provides a moving object detection device, which corresponds to the moving object detection method of the first aspect of the embodiments.
Fig. 9 is a schematic diagram of a moving object detection apparatus 900 according to a second aspect of the embodiment of the present application, and as shown in fig. 9, the apparatus 900 includes: a first processing unit 901, a second processing unit 902, a third processing unit 903 and a fourth processing unit 904.
The first processing unit 901, for the images of the first number of frames, pairs the target object in each frame image with the target object corresponding to the same object in the predetermined other frame images to form target object pairs; the second processing unit 902 generates a directed acyclic graph from the target object pairs, wherein the directed acyclic graph includes nodes and edges connecting different nodes, the nodes represent respective target objects in the target object pairs, the nodes are arranged in the order of the frame images, two nodes connected by one edge represent one target object pair, and the direction of the edge is from the target object in the earlier frame image to the target object in the later frame image; the third processing unit 903 extracts chains (distinct chains) from the directed acyclic graph, wherein each chain includes nodes and edges, and in each chain there is at most one node per frame of image; and the fourth processing unit 904 detects a moving object whose moving speed is within a certain range based on the extracted chains.
Further, as shown in fig. 9, the moving object detection apparatus 900 may further include: a fifth processing unit 905 and/or a sixth processing unit 906.
The fifth processing unit 905 retains the chain when the number of image frames spanned by the chain in the directed acyclic graph exceeds a number threshold; the sixth processing unit 906 deletes the chain when the included angle between two adjacent edges in the chain exceeds a predetermined angle threshold.
With regard to the explanation of the units of the moving object detection device 900, reference may be made to the explanation of the operations of the detection method of the moving object in the first aspect of the embodiment.
According to the second aspect of the embodiment of the present application, it is determined whether the moving speed of the moving object is within a certain range according to the directed acyclic graph, so that missed detection of the moving object can be reduced, and in addition, the speed and efficiency of detecting the moving object can be improved.
Third aspect of the embodiments
A third aspect of an embodiment of the present application provides an electronic device, including: the moving object detection device according to the second aspect of the embodiment.
Fig. 10 is a schematic diagram of a configuration of an electronic apparatus according to the third aspect of the embodiments of the present application. As shown in fig. 10, the electronic device 1000 may include a central processing unit (CPU) 1001 and a memory 1002, the memory 1002 being coupled to the central processing unit 1001. The memory 1002 can store various data and also stores a program for performing control, which is executed under the control of the central processing unit 1001.
In one embodiment, the functionality of the moving object detection device 900 may be integrated into the central processing unit 1001.
The central processing unit 1001 may be configured to execute the method for detecting a moving object according to the first aspect of the embodiment.
In another embodiment, the moving object detection device 900 may be configured separately from the central processing unit 1001; for example, the moving object detection device 900 may be configured as a chip connected to the central processing unit 1001, and the functions of the moving object detection device 900 are realized under the control of the central processing unit 1001.
Further, as shown in fig. 10, the electronic device 1000 may further include: an input/output unit 1003, a display unit 1004, and the like; the functions of the above components are similar to those of the prior art, and are not described in detail here. It is noted that the electronic device 1000 does not necessarily include all of the components shown in FIG. 10; furthermore, the electronic device 1000 may also comprise components not shown in fig. 10, which may be referred to in the prior art.
Embodiments of the present application also provide a computer-readable program, where when the program is executed in a detection device or an electronic apparatus of a moving object, the program causes the detection device or the electronic apparatus of the moving object to execute the detection method of the moving object described in the first aspect of the embodiments.
An embodiment of the present application further provides a storage medium storing a computer-readable program, where the computer-readable program causes a moving object detection apparatus or an electronic device to execute the moving object detection method according to the first aspect of the embodiments.
The detection devices described in connection with the embodiments of the invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. For example, one or more of the functional blocks and/or one or more combinations of the functional blocks illustrated in fig. 9 may correspond to individual software modules of a computer program flow or to individual hardware modules. These software modules may respectively correspond to the operations shown in the first aspect of the embodiments. These hardware modules may be implemented, for example, by solidifying these software modules using a field programmable gate array (FPGA).
A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium; or the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The software module may be stored in the memory of the mobile terminal or in a memory card that is insertable into the mobile terminal. For example, if the electronic device employs a MEGA-SIM card with a larger capacity or a flash memory device with a larger capacity, the software module may be stored in the MEGA-SIM card or the flash memory device with a larger capacity.
One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to fig. 9 may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof designed to perform the functions described herein. One or more of the functional blocks and/or one or more combinations of the functional blocks described with respect to fig. 9 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in communication with a DSP, or any other such configuration.
The present application has been described in conjunction with specific embodiments, but it should be understood by those skilled in the art that these descriptions are intended to be illustrative, and not limiting. Various modifications and adaptations of the present application may occur to those skilled in the art based on the teachings herein and are within the scope of the present application.
Regarding the embodiments described above, the following supplementary notes are further disclosed:
1. A detection apparatus for a moving object, comprising:
a first processing unit which, for images of a first number of frames, pairs a target object in each frame image with the target object corresponding to the same object in another predetermined frame image to form a target object pair (pair);
a second processing unit that generates a directed acyclic graph from the target object pairs, wherein the directed acyclic graph includes nodes representing the respective target objects in the target object pairs (pair) and edges connecting different nodes, the nodes being arranged in the order of the respective frame images; two nodes connected by one of the edges represent one of the target object pairs, and the direction of the edge points from the target object in the earlier frame image to the target object in the later frame image;
a third processing unit that extracts chains (distinct chains) from the directed acyclic graph, wherein each chain includes the nodes and the edges, and in each chain there is at most one node per frame image; and
a fourth processing unit which detects a moving object whose moving speed is within a certain range based on the extracted chains.
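The cooperation of the processing units can be illustrated with a minimal Python sketch of the graph-building step of the second processing unit, where each target object is represented as a (frame_index, object_id) tuple; this representation is an illustrative assumption, not one fixed by the supplementary notes:

```python
from collections import defaultdict

def build_dag(object_pairs):
    """Build a directed acyclic graph from target object pairs.

    Each pair is ((frame_a, obj_a), (frame_b, obj_b)); the edge is
    directed from the detection in the earlier frame to the later one,
    so edges always point forward in time and no cycle can arise.
    """
    adjacency = defaultdict(list)
    for u, v in object_pairs:
        earlier, later = (u, v) if u[0] < v[0] else (v, u)
        adjacency[earlier].append(later)
    return dict(adjacency)

# pairs linking the same object across frames 0 -> 1 -> 2
pairs = [((0, "car1"), (1, "car1")), ((1, "car1"), (2, "car1"))]
dag = build_dag(pairs)
```

The adjacency dict then serves as the input from which the third processing unit extracts chains.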
2. The detection apparatus according to supplementary note 1, wherein
the first processing unit calculates the distance between a target object (A) in the current frame image and each corresponding target object (B) of the same type (e.g., the same vehicle type) in another frame image, and, if the distance falls within a first interval, pairs the target object in the other frame image with the target object in the current frame image to form a target object pair.
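A minimal sketch of this first-interval pairing rule, assuming detections are given as (x, y) bounding-box centres and the interval bounds d_min and d_max are tuning parameters supplied by the caller (neither is fixed by the supplementary notes):

```python
def pair_by_distance(obj_a, candidates, d_min, d_max):
    """Return the candidates in the other frame whose Euclidean centre
    distance to obj_a lies within the first interval [d_min, d_max]."""
    paired = []
    for obj_b in candidates:
        dist = ((obj_a[0] - obj_b[0]) ** 2 + (obj_a[1] - obj_b[1]) ** 2) ** 0.5
        if d_min <= dist <= d_max:
            paired.append(obj_b)
    return paired
```

When more than one candidate survives this test, the feature-space comparison of supplementary note 3 breaks the tie.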
3. The detection apparatus according to supplementary note 2, wherein,
if the distances between one target object (A) in the current frame image and two or more target objects (B) in another frame image all fall within the first interval,
the first processing unit is further configured to combine the target object (A) in the current frame image with each of the two or more target objects (B) in the other frame image into candidate target object pairs, calculate the distance in feature space between the appearance features of the two target objects in each candidate target object pair, and take, as the target object pair, the candidate target object pair whose distance in feature space is smallest and not greater than a feature distance threshold.
4. The detection apparatus according to supplementary note 3, wherein,
the first processing unit calculates the distance in feature space between the features of the two target objects in each candidate target object pair based on a cosine metric learning algorithm.
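Cosine metric learning trains an embedding in which appearance features of the same object lie close together under the cosine distance; the comparison and selection step described in supplementary notes 3 and 4 can be sketched as follows, where the feature vectors and the threshold value are illustrative assumptions:

```python
import math

def cosine_distance(f1, f2):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return 1.0 - dot / (n1 * n2)

def select_pair(candidate_pairs, threshold):
    """Keep the candidate pair whose feature-space distance is smallest,
    provided that distance does not exceed the feature distance threshold;
    otherwise return None (no pair is formed)."""
    best = min(candidate_pairs, key=lambda p: cosine_distance(p[0], p[1]))
    return best if cosine_distance(best[0], best[1]) <= threshold else None
```

The learned embedding itself is produced offline; only the resulting feature vectors are compared at detection time.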
5. The detection apparatus according to supplementary note 2, wherein,
before calculating the distance, the first processing unit further corrects the bounding box of each target object in the images of the first number of frames to a bounding box in an image having a predetermined resolution,
and, when calculating the distance, the first processing unit performs the calculation using the corrected bounding boxes.
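This correction step can be sketched as a simple rescaling of box coordinates from the source resolution to the predetermined resolution; the (x1, y1, x2, y2) box format and (width, height) size tuples are illustrative assumptions:

```python
def rescale_box(box, src_size, dst_size):
    """Map a bounding box (x1, y1, x2, y2) from the source image
    resolution (src_w, src_h) to the predetermined resolution
    (dst_w, dst_h), so distances computed on different-sized frames
    are comparable."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)
```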
6. The detection apparatus according to supplementary note 1, wherein the third processing unit:
selects a starting node of a chain and obtains a plurality of chains from the starting node; and
deletes a chain if its nodes and edges are all contained in another chain, and, if two or more chains have the same starting node and the same ending node, retains the chain having the largest number of nodes.
7. The detection apparatus according to supplementary note 6, wherein
the starting node is, among the nodes not belonging to any existing chain, a node in the temporally earliest image frame.
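One possible reading of the chain extraction of supplementary notes 6 and 7 is sketched below: starting only from nodes with no incoming edge makes every contained sub-chain redundant, and chains sharing both endpoints are reduced to the one with the most nodes. The adjacency-dict representation (nodes as (frame_index, object_id) tuples) is an illustrative assumption:

```python
def extract_chains(dag):
    """Enumerate root-to-leaf chains in the DAG, then, among chains that
    share the same starting and ending node, keep only the longest."""
    targets = {v for vs in dag.values() for v in vs}
    starts = [u for u in dag if u not in targets]  # no incoming edge
    chains = []

    def walk(node, path):
        nexts = dag.get(node, [])
        if not nexts:                 # leaf reached: record the chain
            chains.append(path)
            return
        for nxt in nexts:
            walk(nxt, path + [nxt])

    for s in starts:
        walk(s, [s])
    best = {}
    for c in chains:                  # dedupe by (start, end) endpoints
        key = (c[0], c[-1])
        if key not in best or len(c) > len(best[key]):
            best[key] = c
    return list(best.values())
```

The one-node-per-frame property of supplementary note 1 falls out of the construction, because every edge advances the frame index.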
8. The detection apparatus according to supplementary note 1, wherein the fourth processing unit:
calculates the distance between a first position in the image of the starting node at one end of the chain and a second position in the image of the ending node at the other end, and, when the distance falls within a second interval, determines that the moving speed of the moving object corresponding to the nodes on the chain is within the certain range.
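The second-interval test of supplementary note 8 can be sketched as follows, where positions maps each node to its (x, y) image position and the interval bounds are assumed tuning parameters:

```python
def endpoint_distance_in_interval(chain, positions, d_min, d_max):
    """Return True if the Euclidean distance between the image positions
    of the chain's starting and ending nodes lies within the second
    interval [d_min, d_max], i.e. the implied speed is in range."""
    (x1, y1) = positions[chain[0]]
    (x2, y2) = positions[chain[-1]]
    dist = ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return d_min <= dist <= d_max
```

Because the chain spans a fixed number of frames, bounding the endpoint displacement bounds the average speed.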
9. The detection apparatus according to supplementary note 1, wherein the apparatus further comprises:
a fifth processing unit that retains a chain when the number of image frames spanned by the chain in the directed acyclic graph exceeds a number threshold; and/or
a sixth processing unit that deletes a chain when the angle between two adjacent edges in the chain exceeds a predetermined angle threshold.
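The two filters of supplementary note 9 can be sketched as follows; the frame-span and angle thresholds, the (frame_index, object_id) node format, and the list of (x, y) node positions are illustrative assumptions:

```python
import math

def keep_by_span(chain, span_threshold):
    """Fifth-unit check: retain the chain only if the number of frames
    it spans exceeds the threshold (node = (frame_index, object_id))."""
    return chain[-1][0] - chain[0][0] > span_threshold

def keep_by_angle(points, angle_threshold_deg):
    """Sixth-unit check: delete the chain (return False) if the turn
    angle between any two adjacent edges exceeds the threshold."""
    for a, b, c in zip(points, points[1:], points[2:]):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - b[0], c[1] - b[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        n = math.hypot(*v1) * math.hypot(*v2)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, dot / n))))
        if angle > angle_threshold_deg:
            return False
    return True
```

The span filter discards short, spurious tracks; the angle filter discards chains whose trajectory turns more sharply than a real moving object plausibly would.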
10. An electronic apparatus comprising the detection apparatus for a moving object described in any one of supplementary notes 1 to 9.
11. A method of detecting a moving object, comprising:
for images of a first number of frames, pairing a target object in each frame image with the target object corresponding to the same object in another predetermined frame image to form a target object pair (pair);
generating a directed acyclic graph from the target object pairs, wherein the directed acyclic graph includes nodes representing the respective target objects in the target object pairs (pair) and edges connecting different nodes, the nodes being arranged in the order of the respective frame images; two nodes connected by one edge represent one target object pair, and the direction of the edge points from the target object in the earlier frame image to the target object in the later frame image;
extracting chains (distinct chains) from the directed acyclic graph, wherein each chain includes the nodes and the edges, and in each chain there is at most one node per frame image; and
detecting a moving object whose moving speed is within a certain range based on the extracted chains.
12. The detection method according to supplementary note 11, wherein the method of forming a target object pair (pair) comprises:
calculating the distance between a target object (A) in the current frame image and each corresponding target object (B) of the same type (e.g., the same vehicle type) in another frame image, and, if the distance falls within a first interval, pairing the target object in the other frame image with the target object in the current frame image to form a target object pair.
13. The detection method according to supplementary note 12, wherein the method of forming a target object pair (pair) further comprises:
if the distances between one target object (A) in the current frame image and two or more target objects (B) in another frame image all fall within the first interval, combining the target object (A) in the current frame image with each of the two or more target objects (B) in the other frame image into candidate target object pairs, calculating the distance in feature space between the appearance features of the two target objects in each candidate target object pair, and taking, as the target object pair, the candidate target object pair whose distance in feature space is smallest and not greater than a feature distance threshold.
14. The detection method according to supplementary note 13, wherein,
in the step of calculating the distance in feature space, the distance in feature space between the features of the two target objects in each candidate target object pair is calculated based on a cosine metric learning algorithm.
15. The detection method according to supplementary note 12, wherein the method of forming a target object pair (pair) further comprises:
correcting the bounding box of each target object in the images of the first number of frames to a bounding box in an image having a predetermined resolution before calculating the distance,
wherein, in the step of calculating the distance, the calculation is performed using the corrected bounding boxes.
16. The detection method according to supplementary note 11, wherein the step of extracting chains (distinct chains) from the directed acyclic graph comprises:
selecting a starting node of a chain and obtaining a plurality of chains from the starting node; and
deleting a chain (sub-chain) if its nodes and edges are all contained in another chain, and, if two or more chains have the same starting node and the same ending node, retaining the chain having the largest number of nodes.
17. The detection method according to supplementary note 16, wherein
the starting node is, among the nodes not belonging to any existing chain, a node in the temporally earliest image frame.
18. The detection method according to supplementary note 11, wherein the step of detecting a moving object whose moving speed is within a certain range based on the extracted chains comprises:
calculating the distance between a first position in the image of the starting node at one end of the chain and a second position in the image of the ending node at the other end, and, when the distance falls within a second interval, determining that the moving speed of the moving object corresponding to the nodes on the chain is within the certain range.
19. The detection method according to supplementary note 11, wherein the method further comprises:
retaining a chain when the number of image frames spanned by the chain in the directed acyclic graph exceeds a number threshold; and/or
deleting a chain when the angle between two adjacent edges in the chain exceeds a predetermined angle threshold.