
WO2021147325A1 - Object detection method and apparatus, and storage medium - Google Patents

Object detection method and apparatus, and storage medium

Info

Publication number
WO2021147325A1
Authority
WO
WIPO (PCT)
Prior art keywords
detected
image
different domains
relationship
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/112796
Other languages
English (en)
Chinese (zh)
Inventor
徐航
周峰暐
黎嘉伟
梁小丹
李震国
钱莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of WO2021147325A1
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene

Definitions

  • This application relates to the field of computer vision, and in particular to an object detection method, device and storage medium.
  • Artificial intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theories.
  • Object detection is a basic computer vision task that can identify the location and category of objects in an image.
  • Researchers and engineers create data sets for specific problems according to the application scenario and actual task requirements, and use them to train highly customized and unique automatic object detectors.
  • Object detection across data sets is an efficient method to achieve large-scale object detection.
  • Existing multi-task learning only handles multiple tasks at the same time by adding multiple branches to the model; it cannot realize the interaction between different data sets and different object categories, and cannot capture the internal relationships between the objects to be detected in different data sets, so the effect is not good.
  • Therefore, the present application provides an object detection method, device, and computer storage medium to improve the effect of object detection.
  • A first aspect of the present application provides an object detection method, which may include: acquiring an image to be detected; determining the initial image features of the object to be detected in the image to be detected; and determining the enhanced image features of the object to be detected according to cross-domain knowledge graph information.
  • The cross-domain knowledge graph information can include the association relationships between the object categories corresponding to the objects to be detected in different domains, and the enhanced image feature indicates semantic information of the object categories corresponding to other objects, in different domains, that are associated with the object to be detected.
  • According to the initial image feature of the object to be detected and the enhanced image feature of the object to be detected, the candidate frame and classification of the object to be detected are determined.
  • the above object detection method can be applied in different application scenarios.
  • the above object detection method can be applied in the scene of recognizing everything, and it can also be applied in the scene of street view recognition.
  • the above-mentioned image to be detected may be an image taken by the mobile terminal through a camera, or an image already stored in the mobile terminal's album.
  • the above-mentioned image to be detected may be a street view image taken by a camera on the roadside.
  • For example, the object categories in the first domain or the first data set include men, women, boys, girls, roads, and streets.
  • Object categories in the second domain include people, handbags, school bags, cars, and trucks. It can be considered that the men, women, boys, and girls in the first domain have an association relationship with the people in the second domain. The women and girls in the first domain have an association with the handbags in the second domain. There is an association between roads and streets in the first domain and cars and trucks in the second domain.
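  • As a minimal illustration of such cross-domain category associations (using the example category names above; the relation weights and the storage format are hypothetical and not the actual graph construction of this application), the relationships can be kept as a category-level adjacency structure:

```python
# Hypothetical sketch: cross-domain category associations stored as an
# adjacency dictionary and converted to a dense relation matrix.
# Category names and relation weights are illustrative only.
import numpy as np

domain1 = ["man", "woman", "boy", "girl", "road", "street"]
domain2 = ["person", "handbag", "school bag", "car", "truck"]

relations = {
    ("man", "person"): 1.0, ("woman", "person"): 1.0,
    ("boy", "person"): 1.0, ("girl", "person"): 1.0,
    ("woman", "handbag"): 1.0, ("girl", "handbag"): 1.0,
    ("boy", "school bag"): 1.0, ("girl", "school bag"): 1.0,
    ("road", "car"): 1.0, ("road", "truck"): 1.0,
    ("street", "car"): 1.0, ("street", "truck"): 1.0,
}

# dense cross-domain relation matrix (rows: domain1 categories, cols: domain2)
G = np.zeros((len(domain1), len(domain2)))
for (c1, c2), weight in relations.items():
    G[domain1.index(c1), domain2.index(c2)] = weight

print(G.shape)  # (6, 5)
```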
  • Semantic information can refer to high-level information that can assist in image detection.
  • the above-mentioned semantic information can specifically be what the object is and what is around the object (semantic information is generally different from low-level information, such as image edges, pixels, brightness, etc.).
  • For example, if the object to be detected is a woman, and other objects associated with the woman in the image to be detected include a handbag, then the enhanced image feature of the object to be detected may indicate semantic information of the handbag.
  • The solution provided by this application can effectively use a large number of different data sets and different types of information to train the same network at the same time, which greatly improves data utilization and detection performance.
  • The cross-domain knowledge graph may include nodes and node edges, where nodes correspond to the objects to be detected, and node edges correspond to the relationships between the high-level semantic features of different objects to be detected.
  • The method may also include: obtaining the classification layer parameters corresponding to different domains; weighting and fusing the classification layer parameters corresponding to different domains according to the classification weights of the initial image features on different object categories in the different domains, to obtain the high-level semantic features of the object to be detected; and projecting the relationship weights between the object categories corresponding to the objects to be detected in different domains onto the node connection edges of the objects to be detected, to obtain the weights of the node connection edges.
  • The method may further include: determining the relationship weights according to the distance relationship between the object categories corresponding to the objects to be detected in different domains.
  • The distance relationship between the object categories corresponding to the objects to be detected may include one or more of the following: attribute relationships between the object categories corresponding to objects to be detected in different domains; positional relationships or active-object relationships between the object categories corresponding to objects to be detected in different domains; the similarity of word embeddings, constructed using linguistic knowledge, between the object categories corresponding to the objects to be detected in different domains; and the distance relationship between the object categories corresponding to the objects to be detected in different domains obtained by training a neural network model on training data.
  • Determining the enhanced image features of the object to be detected according to the cross-domain knowledge graph information may include: performing convolution processing on the high-level semantic features according to the weights of the node connection edges to obtain the enhanced image features of the object to be detected.
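  • The following is a minimal sketch, under assumed tensor shapes, of what convolving the high-level semantic features according to the weights of the node edges could look like as a single graph-convolution step; the patent does not specify the exact propagation rule, so the function, names, and sizes below are illustrative assumptions only:

```python
# Hypothetical sketch of one graph-convolution step over region nodes.
# edge_w: weights of the node connection edges (N x N), obtained by projecting
#         the category-level relation weights onto region nodes.
# sem:    high-level semantic features of the N objects to be detected (N x D).
import torch
import torch.nn.functional as F

def enhance_features(sem: torch.Tensor, edge_w: torch.Tensor,
                     proj: torch.nn.Linear) -> torch.Tensor:
    """Aggregate neighbours' semantic features along weighted edges."""
    # row-normalise the edge weights so each node averages over its neighbours
    norm = edge_w / edge_w.sum(dim=1, keepdim=True).clamp(min=1e-6)
    aggregated = norm @ sem          # weighted sum of neighbour semantics
    return F.relu(proj(aggregated))  # learnable transform -> enhanced features

N, D, D_out = 8, 256, 128            # illustrative sizes
sem = torch.randn(N, D)
edge_w = torch.rand(N, N)
proj = torch.nn.Linear(D, D_out)
enhanced = enhance_features(sem, edge_w, proj)
print(enhanced.shape)                # torch.Size([8, 128])
```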
  • a second aspect of the present application provides an image detection device, which may include: an image acquisition module for acquiring an image to be detected.
  • the feature extraction module is used to determine the initial image feature of the object to be detected in the image to be detected.
  • the feature extraction module is also used to determine the enhanced image features of the object to be detected according to the cross-domain knowledge map information.
  • The cross-domain knowledge graph information can include the association relationships between the object categories corresponding to the objects to be detected in different domains, and the enhanced image feature indicates semantic information of the object categories corresponding to other objects, in different domains, that are associated with the object to be detected.
  • the detection module is used to determine the candidate frame and classification of the object to be detected according to the initial image feature of the object to be detected and the enhanced image feature of the object to be detected.
  • The cross-domain knowledge graph may include nodes and node edges, where nodes correspond to the objects to be detected, and node edges correspond to the relationships between the high-level semantic features of different objects to be detected.
  • the image detection device may also include a parameter acquisition module and a projection module.
  • the parameter acquisition module is used to acquire classification layer parameters corresponding to different domains.
  • the feature extraction module is specifically used to weight and fuse the classification layer parameters corresponding to different domains according to the classification weights of the initial image features in different domains on different object categories to obtain the high-level semantic features of the object to be detected.
  • the projection module is used to project the weights of the relationships between the object categories corresponding to the objects to be detected in different domains onto the edges of the nodes of the objects to be detected to obtain the weights of the edges of the nodes.
  • The second possible implementation may also include a relationship weight determination module, configured to determine the relationship weights according to the distance relationship between the object categories corresponding to the objects to be detected in different domains.
  • The distance relationship between the object categories corresponding to the objects to be detected may include one or more of the following: attribute relationships between the object categories corresponding to objects to be detected in different domains; positional relationships or active-object relationships between the object categories corresponding to objects to be detected in different domains; the similarity of word embeddings, constructed using linguistic knowledge, between the object categories corresponding to the objects to be detected in different domains; and the distance relationship between the object categories corresponding to the objects to be detected in different domains obtained by training a neural network model on training data.
  • The feature extraction module is specifically used to perform convolution processing on the high-level semantic features according to the weights of the node connection edges, to obtain the enhanced image features of the object to be detected.
  • the third aspect of the present application provides a neural network training method.
  • The method includes: acquiring training data, the training data including training images and the object detection labeling results of the objects to be detected in the training images; extracting the initial image features of the objects to be detected in the training images with the neural network; extracting the enhanced image features of the objects to be detected in the training images according to the neural network and the cross-domain knowledge graph information; processing the initial image features and the enhanced image features of the objects to be detected with the neural network to obtain the object detection results of the objects to be detected; and determining the model parameters of the neural network according to the object detection results of the objects to be detected in the training images and the object detection labeling results of the objects to be detected in the training images.
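  • A minimal training-loop sketch of this aspect is shown below; it assumes a generic detector interface in which `model(images, targets)` returns a scalar detection loss over the candidate boxes and classifications, so the function names and hyperparameters are placeholders rather than the patent's actual API:

```python
# Hypothetical training loop: the model consumes training images, produces
# detection results for the objects to be detected, and its parameters are
# adjusted against the labelled candidate boxes and classes.
import torch

def train(model, data_loader, epochs: int = 12, lr: float = 0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in data_loader:   # targets: labelled boxes + classes
            loss = model(images, targets)     # assumed to return a scalar loss
            optimizer.zero_grad()
            loss.backward()                   # adjust the model parameters
            optimizer.step()
    return model
```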
  • the cross-domain knowledge map information may include the association relationship between the object categories corresponding to the object to be detected in different domains, and the enhanced image feature indicates semantic information of the object category corresponding to other objects in the different domains associated with the object to be detected.
  • the object detection and labeling result of the object to be detected in the training image includes the labeling candidate frame and labeling classification result of the object to be detected in the training image.
  • Specifically, a set of initial model parameters can be set for the neural network, and then the model parameters are gradually adjusted based on the difference between the object detection result of the object to be detected in the training image and the object detection labeling result of the object to be detected in the training image, until the difference is less than a certain threshold.
  • the neural network obtained through training in the third aspect can be used to implement the method in the first aspect of the present application.
  • The cross-domain knowledge graph may include nodes and node edges, where nodes correspond to objects to be detected, and node edges correspond to the relationships between the high-level semantic features of different objects to be detected.
  • According to the classification weights of the initial image features in different domains on different object categories, the classification layer parameters corresponding to different domains are weighted and fused to obtain the high-level semantic features of the object to be detected.
  • The classification layer parameters can be understood as maintaining a class center for each category.
  • the weight of the relationship between the object categories corresponding to the object to be detected in different domains is projected onto the node connection edge of the object to be detected, and the weight of the node connection edge is obtained.
  • the second possible implementation manner may further include determining the relationship weight according to the distance relationship between the object categories corresponding to the objects to be detected in different domains.
  • The distance relationship between the object categories corresponding to the objects to be detected may include one or more of the following: attribute relationships between the object categories corresponding to objects to be detected in different domains; positional relationships or active-object relationships between the object categories corresponding to objects to be detected in different domains; the similarity of word embeddings, constructed using linguistic knowledge, between the object categories corresponding to the objects to be detected in different domains; and the distance relationship between the object categories corresponding to the objects to be detected in different domains obtained by training a neural network model on training data.
  • The high-level semantic features are convolved according to the weights of the node edges to obtain the enhanced image features of the object to be detected.
  • A fourth aspect provides an object detection device, which includes modules for executing the method in the first aspect.
  • A fifth aspect provides a neural network training device, which includes modules for executing the method in the third aspect.
  • A sixth aspect provides an object detection device, which includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is used to perform the method in the first aspect.
  • A seventh aspect provides a neural network training device, which includes: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is used to perform the method in the third aspect.
  • An eighth aspect provides an electronic device, which includes the object detection device in the fourth aspect or the sixth aspect.
  • A ninth aspect provides an electronic device, which includes the neural network training device in the fifth aspect or the seventh aspect.
  • the above-mentioned electronic device may specifically be a mobile terminal (for example, a smart phone), a tablet computer, a notebook computer, an augmented reality/virtual reality device, a vehicle-mounted terminal device, and so on.
  • A tenth aspect provides a computer storage medium that stores program code, where the program code includes instructions for executing the steps of the method in the first aspect or the third aspect.
  • An eleventh aspect provides a computer program product containing instructions; when the computer program product runs on a computer, the computer executes the method in the first aspect or the third aspect.
  • A twelfth aspect provides a chip, which includes a processor and a data interface.
  • the processor reads instructions stored in a memory through the data interface and executes the method in the first aspect or the third aspect.
  • the chip may further include a memory in which instructions are stored, and the processor is configured to execute instructions stored on the memory.
  • the processor is used to execute the method in the first aspect.
  • the above-mentioned chip may specifically be a field programmable gate array FPGA or an application-specific integrated circuit ASIC.
  • the above-mentioned method of the first aspect may specifically refer to the first aspect and a method in any one of the various implementation manners of the first aspect.
  • the foregoing method of the third aspect may specifically refer to the third aspect and a method in any one of the various implementation manners of the third aspect.
  • In this application, a cross-domain knowledge graph is constructed, which can capture the intrinsic relationships between different objects to be detected, and the enhanced image features include semantic information of the object categories corresponding to other objects in different domains that are associated with the object to be detected, so this application can improve the effect of the object detection method.
  • FIG. 1 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of object detection using a convolutional neural network model provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a chip hardware structure provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an object detection method according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the association relationship of an embodiment of the present application.
  • Fig. 6 is a flowchart of an object detection method according to an embodiment of the present application.
  • FIG. 7 is a flowchart of an object detection method according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a neural network training method according to an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of an object detection device according to an embodiment of the present application.
  • FIG. 10 is a schematic block diagram of an object detection device according to an embodiment of the present application.
  • Fig. 11 is a schematic block diagram of a neural network training device according to an embodiment of the present application.
  • the embodiments of this application are mainly applied in scenes of large-scale object detection, such as mobile phone face recognition, mobile phone recognition of everything, the perception system of unmanned vehicles, security cameras, photo object recognition on social networking sites, smart robots, and so on.
  • The object detection method of the embodiments of this application can be used to detect objects in pictures taken by a mobile phone. Since the object detection method of the embodiments of this application incorporates the cross-domain knowledge graph when detecting objects, it performs better object detection on the pictures taken by the mobile phone (for example, the position of the object and the classification of the object are more accurate).
  • Cameras deployed on the street can take pictures of passing vehicles and people. After the pictures are obtained, they can be uploaded to the control center equipment, which performs object detection on the pictures and obtains the detection results, and the control center can send out an alarm when an abnormality occurs.
  • The neural network training method provided in the embodiments of this application involves computer vision processing, and can be specifically applied to data processing methods such as data training, machine learning, and deep learning, in which training data (such as the training images and labeling results in this application) is used for symbolic and formalized intelligent information modeling, extraction, preprocessing, and training, finally obtaining a trained neural network.
  • The object detection method provided by the embodiments of this application can use the above-mentioned trained neural network: input data (such as the image to be detected in this application) is input into the trained neural network to obtain output data (such as the detection result in this application).
  • The neural network training method provided in the embodiments of this application and the object detection method in the embodiments of this application are inventions based on the same concept, and can also be understood as two parts of one system, or two stages of an overall process: a model training stage and a model application stage.
  • a neural network can be composed of neural units.
  • A neural unit can refer to an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs.
  • The output of the arithmetic unit can be: $h_{W,b}(x) = f(W^{\top}x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$
  • where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN) can be understood as a neural network with many hidden layers; there is no special metric for "many" here, and the multi-layer neural network and the deep neural network we often speak of are essentially the same thing. According to the positions of different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and all the layers in the middle are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer. Although a DNN looks very complicated, the work of each layer is not complicated.
  • For example, the linear coefficient from the fourth neuron in the second layer to the second neuron in the third layer is defined as $W_{24}^{3}$.
  • The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
  • In summary, the coefficient from the k-th neuron in the (L-1)-th layer to the j-th neuron in the L-th layer is defined as $W_{jk}^{L}$. Note that the input layer has no $W$ parameter.
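  • Expressed in code, the per-layer computation that these coefficients describe is simply a sequence of affine transforms followed by activation functions; a minimal sketch with illustrative layer sizes:

```python
# Minimal sketch of stacked fully connected layers: each layer computes
# y = f(W x + b), where W holds the coefficients W^L_jk described above.
import torch

layers = torch.nn.Sequential(
    torch.nn.Linear(4, 3), torch.nn.ReLU(),   # coefficients from layer 1 to layer 2
    torch.nn.Linear(3, 2), torch.nn.ReLU(),   # coefficients from layer 2 to layer 3
)
x = torch.randn(1, 4)       # the input layer itself has no W parameter
print(layers(x).shape)      # torch.Size([1, 2])
```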
  • more hidden layers make the network more capable of portraying complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • Weight sharing can be understood as meaning that the way image information is extracted is independent of location.
  • the convolution kernel can be initialized in the form of a matrix of random size. In the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • the classifier is generally composed of a fully connected layer and a softmax function, which can output probabilities of different categories according to the input.
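  • A minimal sketch of such a classifier head, with illustrative feature and class counts, is shown below (a fully connected layer followed by softmax):

```python
# Hypothetical classifier head: a fully connected layer followed by softmax
# outputs a probability for each category.
import torch

num_features, num_classes = 256, 10           # illustrative sizes
fc = torch.nn.Linear(num_features, num_classes)
features = torch.randn(1, num_features)
probs = torch.softmax(fc(features), dim=1)    # probabilities of different categories
print(probs.sum().item())                     # ~1.0
```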
  • A feature pyramid network (FPN) is based on the original detector and makes independent predictions at different feature levels.
  • The original intention of transfer learning is to deal with the problem of insufficient training samples, so that a model can transfer from existing source domain data to related but not identical target domain data, thereby training a model suitable for the target domain.
  • A domain is an abstract concept that refers to tasks with similar properties. Specifically, a domain can be a detection task on a specific data set, or it can refer to a detection task for a specific object (such as a human face), and so on. There are often obvious differences between different domains, which are difficult to handle in a unified manner.
  • the global domain refers to the collective name of all domains including all potential tasks. It is the complete set of domains and is generally used for definitions and conceptual expressions.
  • the core algorithm of transfer learning is to extract domain-invariant information by maximizing a specific domain similarity measure, so that data in different domains can learn from each other to obtain a model suitable for the target domain.
  • a graph is a data format that can be used to represent social networks, communication networks, protein molecular networks, etc.
  • the nodes in the graph represent individuals in the network, and the lines represent the connections between individuals.
  • Many machine learning tasks such as community discovery, link prediction, etc. require graph structure data. Therefore, the emergence of graph convolutional neural networks (GCN) provides new ideas for solving these problems.
  • GCN can be used for deep learning of graph data.
  • GCN is a natural generalization of convolutional neural networks to the graph domain. It can perform end-to-end learning of node feature information and structural information at the same time, and is currently the best choice for graph data learning tasks.
  • the applicability of GCN is extremely wide, and it is suitable for nodes and graphs of any topology.
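  • For reference, a widely used GCN propagation rule (Kipf and Welling) aggregates each node's neighbourhood through a normalised adjacency matrix; the minimal sketch below illustrates this generic rule and is not necessarily the exact graph convolution used in this application:

```python
# Minimal sketch of one GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).
import torch
import torch.nn.functional as F

def gcn_layer(H: torch.Tensor, A: torch.Tensor, W: torch.nn.Linear) -> torch.Tensor:
    A_hat = A + torch.eye(A.size(0))                  # add self-loops
    D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric normalisation
    return F.relu(W(A_norm @ H))                      # propagate and transform

H = torch.randn(5, 16)                 # node features
A = (torch.rand(5, 5) > 0.5).float()
A = ((A + A.t()) > 0).float()          # make the adjacency symmetric
out = gcn_layer(H, A, torch.nn.Linear(16, 8))
print(out.shape)                       # torch.Size([5, 8])
```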
  • Fig. 1 is a schematic diagram of the system architecture of an embodiment of the present application.
  • the system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection system 160.
  • the execution device 110 includes a calculation module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114.
  • the calculation module 111 may include the target model/rule 101, and the preprocessing module 113 and the preprocessing module 114 are optional.
  • the data collection device 160 is used to collect training data.
  • the training data may include training images of different domains or different data sets and the annotation results corresponding to the training images.
  • the labeling result of the training image may be the (manually) pre-labeled classification result of each object to be detected in the training image.
  • the data collection device 160 stores the training data in the database 130, and the training device 120 trains to obtain the target model/rule 101 based on the training data maintained in the database 130.
  • The training device 120 performs object detection on the input training images and compares the output detection results with the pre-labeled detection results, until the difference between the detection result output by the training device 120 and the pre-labeled detection result is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • the above-mentioned target model/rule 101 can be used to implement the object detection method of the embodiment of the present application, that is, input the image to be detected (after relevant preprocessing) into the target model/rule 101 to obtain the detection result of the image to be detected.
  • the target model/rule 101 in the embodiment of the present application may specifically be a neural network.
  • the training data maintained in the database 130 may not all come from the collection of the data collection device 160, and may also be received from other devices.
  • the training device 120 does not necessarily perform the training of the target model/rule 101 completely based on the training data maintained by the database 130. It may also obtain training data from the cloud or other places for model training.
  • The above description should not be construed as a limitation on the embodiments of this application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1. The execution device 110 can be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or it can be a server or the cloud.
  • the execution device 110 is configured with an input/output (input/output, I/O) interface 112 for data interaction with external devices.
  • the user can input data to the I/O interface 112 through the client device 140.
  • the input data in this embodiment of the present application may include: a to-be-processed image input by the client device.
  • the client device 140 here may specifically be a terminal device.
  • the preprocessing module 113 and the preprocessing module 114 are used to perform preprocessing according to the input data (such as the image to be processed) received by the I/O interface 112.
  • The preprocessing module 113 and the preprocessing module 114 may not be provided, or there may be only one preprocessing module, in which case the calculation module 111 is directly used to process the input data.
  • the execution device 110 may call data, codes, etc. in the data storage system 150 for corresponding processing .
  • the data, instructions, etc. obtained by corresponding processing may also be stored in the data storage system 150.
  • the I/O interface 112 presents the processing result, such as the detection result of the object obtained above, to the client device 140 to provide it to the user.
  • The training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
  • the user can manually set input data, and the manual setting can be operated through the interface provided by the I/O interface 112.
  • the client device 140 can automatically send input data to the I/O interface 112. If the client device 140 is required to automatically send the input data and the user's authorization is required, the user can set the corresponding authority in the client device 140.
  • the user can view the result output by the execution device 110 on the client device 140, and the specific presentation form may be a specific manner such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal to collect the input data of the input I/O interface 112 and the output result of the output I/O interface 112 as new sample data and store it in the database 130 as shown in the figure.
  • Alternatively, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result of the I/O interface 112, as shown in the figure, in the database 130 as new sample data.
  • FIG. 1 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship between the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • In FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • The target model/rule 101 obtained by training with the training device 120 can be the neural network in the embodiments of this application.
  • Specifically, the neural network provided in the embodiments of this application can be a CNN, a deep convolutional neural network (DCNN), and so on.
  • CNN is a very common neural network
  • the structure of CNN will be introduced in detail below in conjunction with Figure 2.
  • a convolutional neural network is a deep neural network with a convolutional structure. It is a deep learning architecture.
  • The deep learning architecture refers to performing multiple levels of learning at different levels of abstraction through machine learning algorithms.
  • CNN is a feed-forward artificial neural network. Each neuron in the feed-forward artificial neural network can respond to the input image.
  • a convolutional neural network (CNN) 200 may include an input layer 210, a convolutional layer/pooling layer 220 (the pooling layer is optional), and a neural network layer 230.
  • the convolutional layer/pooling layer 220 shown in FIG. 2 may include layers 221-226 as shown in the examples.
  • In one implementation, layer 221 is a convolutional layer, layer 222 is a pooling layer, layer 223 is a convolutional layer, layer 224 is a pooling layer, layer 225 is a convolutional layer, and layer 226 is a pooling layer.
  • In another implementation, layers 221 and 222 are convolutional layers, 223 is a pooling layer, 224 and 225 are convolutional layers, and 226 is a pooling layer.
  • That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
  • the convolution layer 221 can include many convolution operators.
  • the convolution operator is also called a kernel. Its function in image processing is equivalent to a filter that extracts specific information from the input image matrix.
  • The convolution operator can essentially be a weight matrix, and this weight matrix is usually predefined. In the process of performing convolution on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride), to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix and the depth dimension of the input image are the same.
  • During a convolution operation, the weight matrix extends over the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output with a single depth dimension; however, in most cases a single weight matrix is not used, but multiple weight matrices of the same size (rows × columns), that is, multiple homogeneous matrices, are applied.
  • the output of each weight matrix is stacked to form the depth dimension of the convolutional image, where the dimension can be understood as determined by the "multiple" mentioned above.
  • Different weight matrices can be used to extract different features in the image. For example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract specific colors of the image, and another weight matrix is used to eliminate unwanted noise in the image.
  • Since the multiple weight matrices have the same size (rows × columns), the convolution feature maps extracted by these weight matrices also have the same size, and the multiple extracted convolution feature maps of the same size are then combined to form the output of the convolution operation.
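  • In code, applying multiple weight matrices of the same size corresponds to a convolution layer with several output channels; a minimal sketch with illustrative sizes:

```python
# Minimal sketch: 16 weight matrices (kernels) of the same spatial size are
# applied to a 3-channel input; their outputs are stacked along the channel
# (depth) dimension of the resulting feature map.
import torch

conv = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                       stride=1, padding=1)   # each kernel spans the full input depth
image = torch.randn(1, 3, 224, 224)           # batch x channels x height x width
feature_map = conv(image)
print(feature_map.shape)                      # torch.Size([1, 16, 224, 224])
```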
  • In practical applications, the weight values in these weight matrices need to be obtained through extensive training. Each weight matrix formed by the trained weight values can be used to extract information from the input image, so that the convolutional neural network 200 can make correct predictions.
  • The initial convolutional layers (such as 221) often extract more general features, which can also be called low-level features; as the depth of the convolutional neural network increases, the features extracted by the later convolutional layers (for example, 226) become more and more complex, such as high-level semantic features, and features with higher-level semantics are more suitable for the problem to be solved.
  • A convolutional layer may be followed by a single pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
  • The sole purpose of the pooling layer is to reduce the spatial size of the image.
  • the pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain an image with a smaller size.
  • the average pooling operator can calculate the pixel values in the image within a specific range to generate an average value as the result of the average pooling.
  • the maximum pooling operator can take the pixel with the largest value within a specific range as the result of the maximum pooling.
  • the operators in the pooling layer should also be related to the image size.
  • the size of the image output after processing by the pooling layer can be smaller than the size of the image of the input pooling layer, and each pixel in the image output by the pooling layer represents the average value or the maximum value of the corresponding sub-region of the image input to the pooling layer.
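  • A minimal sketch of average and maximum pooling reducing the spatial size of a feature map (sizes are illustrative):

```python
# Minimal sketch: 2x2 pooling halves the width and height of the feature map;
# each output pixel is the average (or maximum) of the corresponding sub-region.
import torch

x = torch.randn(1, 16, 224, 224)
avg = torch.nn.AvgPool2d(kernel_size=2)(x)
mx = torch.nn.MaxPool2d(kernel_size=2)(x)
print(avg.shape, mx.shape)   # torch.Size([1, 16, 112, 112]) for both
```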
  • After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet able to output the required output information, because, as mentioned above, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. In order to generate the final output information (the required class information or other related information), the convolutional neural network 200 needs to use the neural network layer 230 to generate the output of one or a group of required classes. Therefore, the neural network layer 230 can include multiple hidden layers (231, 232 to 23n as shown in FIG. 2) and an output layer 240. The parameters contained in the hidden layers can be obtained through pre-training based on the relevant training data of specific task types; for example, the task types can include image recognition, image classification, image super-resolution reconstruction, and so on.
  • After the multiple hidden layers in the neural network layer 230, the final layer of the entire convolutional neural network 200 is the output layer 240.
  • the output layer 240 has a loss function similar to the classification cross entropy, which is specifically used to calculate the prediction error.
  • the convolutional neural network 200 shown in FIG. 2 is only used as an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models.
  • The convolutional neural network shown in FIG. 2 may be used to execute the object detection method of the embodiment of the present application.
  • The image to be processed passes through the input layer 210, the convolutional layer/pooling layer 220, and the neural network layer 230; after this processing, the detection result of the image can be obtained.
  • FIG. 3 is a hardware structure of a chip provided by an embodiment of the application, and the chip includes a neural network processor.
  • the chip may be set in the execution device 110 as shown in FIG. 1 to complete the calculation work of the calculation module 111.
  • the chip can also be set in the training device 120 as shown in FIG. 1 to complete the training work of the training device 120 and output the target model/rule 101.
  • the algorithms of each layer in the convolutional neural network as shown in Figure 2 can be implemented in the chip as shown in Figure 3.
  • The neural network processor (NPU) is mounted as a coprocessor on a main central processing unit (CPU) (host CPU), and the main CPU distributes tasks.
  • the core part of the NPU is the arithmetic circuit 303.
  • the controller 304 controls the arithmetic circuit 303 to extract data from the memory (weight memory or input memory) and perform calculations.
  • the arithmetic circuit 303 includes multiple processing units (process engines, PE). In some implementations, the arithmetic circuit 303 is a two-dimensional systolic array. The arithmetic circuit 303 may also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 303 is a general-purpose matrix processor.
  • the arithmetic circuit 303 fetches the data corresponding to the matrix B from the weight memory 302 and caches it on each PE in the arithmetic circuit 303.
  • the arithmetic circuit 303 fetches the matrix A data and matrix B from the input memory 301 to perform matrix operations, and the partial result or final result of the obtained matrix is stored in an accumulator 308.
  • the vector calculation unit 307 can perform further processing on the output of the arithmetic circuit 303, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, and so on.
  • the vector calculation unit 307 can be used for network calculations in the non-convolutional/non-FC layer of the neural network, such as pooling, batch normalization, local response normalization, etc. .
  • the vector calculation unit 307 can store the processed output vector to the unified buffer 306.
  • the vector calculation unit 307 may apply a nonlinear function to the output of the arithmetic circuit 303, such as a vector of accumulated values, to generate the activation value.
  • the vector calculation unit 307 generates a normalized value, a combined value, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 303, for example for use in a subsequent layer in a neural network.
  • the unified memory 306 is used to store input data and output data.
  • The storage unit access controller (direct memory access controller, DMAC) 305 is used to transfer the input data in the external memory to the input memory 301 and/or the unified memory 306, store the weight data in the external memory into the weight memory 302, and store the data in the unified memory 306 into the external memory.
  • the bus interface unit (BIU) 310 is used to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 309 through the bus.
  • An instruction fetch buffer 309 connected to the controller 304 is used to store instructions used by the controller 304.
  • the controller 304 is used to call the instructions cached in the memory 309 to control the working process of the computing accelerator.
  • the unified memory 306, the input memory 301, the weight memory 302, and the instruction fetch memory 309 are all on-chip (On-Chip) memories.
  • the external memory is a memory external to the NPU.
  • The external memory can be a double data rate synchronous dynamic random access memory (DDR SDRAM) or a high bandwidth memory (HBM).
  • The operations of each layer in the convolutional neural network shown in FIG. 2 can be performed by the arithmetic circuit 303 or the vector calculation unit 307.
  • the execution device 110 in FIG. 1 introduced above can execute each step of the object detection method in the embodiment of the present application.
  • The CNN model shown in FIG. 2 and the chip shown in FIG. 3 can also be used to execute the steps of the object detection method of the embodiment of the present application.
  • the object detection method of the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
  • the method shown in FIG. 4 can be applied in different scenarios. Specifically, the method shown in FIG. 4 can be applied in scenarios such as recognizing everything and street view recognition.
  • the image to be detected in step 401 may be an image taken by the mobile terminal through a camera, or an image already stored in the mobile terminal's album.
  • the image to be detected in step 401 may be a street view image taken by a camera on the roadside.
  • the method shown in FIG. 4 may be executed by a neural network (model). Specifically, the method shown in FIG. 4 may be executed by CNN or DNN.
  • The entire image of the image to be detected may be subjected to convolution processing or regularization processing, etc., to obtain the image features of the entire image, and then the initial image features corresponding to the object to be detected are obtained from the image features of the entire image.
  • performing convolution processing on the image to be detected to obtain the initial image feature of the object to be detected includes: performing convolution processing on the entire image of the image to be detected to obtain the complete image feature of the image to be detected; Among the complete image features of the image to be detected, the image feature corresponding to the object to be detected is determined as the initial image feature of the object to be detected.
  • performing convolution processing on the image to be detected to obtain the initial image feature of the object to be detected includes: separately acquiring the image feature corresponding to each object to be detected each time.
  • the cross-domain knowledge map information includes the association relationship between object categories corresponding to the objects to be detected in different domains, and the enhanced image features indicate semantic information of object categories corresponding to other objects in different domains that are associated with the objects to be detected.
  • For example, the object categories in the first domain or the first data set include men, women, boys, girls, roads, and streets.
  • Object categories in the second domain include people, handbags, school bags, cars, and trucks. It can be considered that the men, women, boys, and girls in the first domain have an association relationship with the people in the second domain. The women and girls in the first domain have an association with the handbags in the second domain. The boys and girls in the first domain have an association with the school bags in the second domain. There is an association between roads and streets in the first domain and cars and trucks in the second domain.
  • Semantic information can refer to high-level information that can assist in image detection.
  • the above-mentioned semantic information can specifically be what the object is and what is around the object (semantic information is generally different from low-level information, such as image edges, pixels, brightness, etc.).
  • the object to be detected is a woman, and other objects associated with the woman in the image to be detected include a handbag, then the enhanced image feature of the object to be detected may indicate semantic information of the handbag.
  • the cross-domain knowledge graph may include nodes and node edges, where nodes correspond to objects to be detected, and node edges correspond to relationships between high-level semantic features of different objects to be detected.
  • the classification layer parameters corresponding to different domains are weighted and merged to obtain the high-level semantic features of the object to be detected.
  • The classification layer parameters can be understood as maintaining a class center for each category, where the class center refers to the high-level semantic features of the category.
  • the weight of the relationship between the object categories corresponding to the object to be detected in different domains is projected onto the node connection edge of the object to be detected, and the weight of the node connection edge is obtained.
  • For example, the weight of the edge between the i-th node and the j-th node of the region graph in the S domain is determined by the features of the two nodes, where $f_i$ and $f_j$ are the feature of the i-th object to be detected in one domain and the feature of the j-th object to be detected in another domain, respectively.
  • $G^{SP}$ is the weight of the relationships between the object categories corresponding to the objects to be detected in different domains, and $G^{SP}$ can be regarded as a matrix.
  • The weight of the relationship between the object categories corresponding to the objects to be detected in different domains is projected onto the node connection edges of the objects to be detected to obtain the weights of the node connection edges; this projection can be expressed in matrix form, where T represents the transposition of a matrix.
  • the process of projection can be regarded as the process of converting the weight of the relationship between the object categories into the weight of the relationship between the objects to be detected, and the weight of the relationship between the objects to be detected is the weight of the edges of the nodes.
  • the high-level semantic features are convolved according to the weights of the edges of the nodes, and the enhanced image features of the object to be detected can be obtained.
  • the relationship weight may be determined according to the distance relationship between the object categories corresponding to the objects to be detected in different domains.
  • the distance relationship includes one or more of the following information:
  • the color of an apple is red, and the color of a strawberry is also red. Then, apples and strawberries have the same color attributes (or, it can be said that apples and strawberries are relatively close in color attributes).
  • the similarity of word embedding constructed with linguistic knowledge can be understood as the degree of similarity between word vectors of different object categories.
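  • A minimal sketch of measuring word-embedding similarity between two category names with cosine similarity is shown below; the embedding vectors here are random placeholders, whereas in practice they would come from a pretrained linguistic model (for example word2vec or GloVe):

```python
# Hypothetical sketch: cosine similarity between word vectors of two category
# names from different domains, used as one source of relationship weights.
import torch
import torch.nn.functional as F

emb = {"woman": torch.randn(300), "handbag": torch.randn(300)}  # placeholder vectors

def word_similarity(a: str, b: str) -> float:
    return F.cosine_similarity(emb[a].unsqueeze(0), emb[b].unsqueeze(0)).item()

print(word_similarity("woman", "handbag"))
```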
  • the weight of the edge between the i-th node in one domain and the j-th node in the other domain is likewise computed from f_i and f_j.
  • f_i and f_j are the feature of the i-th object to be detected in one domain and the feature of the j-th object to be detected in the other domain (shorthand for the initial image features of the objects to be detected).
  • the candidate frame and classification of the object to be detected determined in step 404 may be the final candidate frame and the final classification (result) of the object to be detected, respectively.
  • in step 404, the initial image feature of the object to be detected and the enhanced image feature of the object to be detected can be combined to obtain the final image feature of the object to be detected, and then the candidate box and classification of the object to be detected can be determined according to the final image feature.
  • for example, the initial image feature of the object to be detected is a convolution feature map with a size of M1×N1×C1 (M1, N1, and C1 represent width, height, and number of channels, respectively), and the enhanced image feature of the object to be detected is a convolution feature map with a size of M1×N1×C2 (M1, N1, and C2 represent width, height, and number of channels, respectively). Then, by combining these two convolution feature maps, the final image feature of the object to be detected can be obtained; the final image feature is a convolution feature map with a size of M1×N1×(C1+C2).
  • the description here is based on an example in which the convolution feature map of the initial image feature and the convolution feature map of the enhanced image feature have the same size (same width and height) but different channel numbers.
  • when the convolution feature map of the initial image feature and the convolution feature map of the enhanced image feature differ in size, the two can still be combined.
  • in that case, the sizes of the two convolution feature maps are first unified (the width and height are unified), and then the convolution feature map of the initial image feature and the convolution feature map of the enhanced image feature are combined to obtain the convolution feature map of the final image feature.
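  • A minimal sketch of this feature combination (tensor shapes are assumed for illustration; bilinear interpolation is one possible way to unify the sizes):

    import torch
    import torch.nn.functional as F

    def combine_feature_maps(initial_feat, enhanced_feat):
        # illustrative sketch; the interpolation choice is an assumption, not mandated by this application
        # initial_feat: [C1, H1, W1]; enhanced_feat: [C2, H2, W2]
        # returns a [C1 + C2, H1, W1] convolution feature map
        if enhanced_feat.shape[1:] != initial_feat.shape[1:]:
            # unify width and height before concatenating along the channel axis
            enhanced_feat = F.interpolate(enhanced_feat.unsqueeze(0),
                                          size=initial_feat.shape[1:],
                                          mode="bilinear",
                                          align_corners=False).squeeze(0)
        return torch.cat([initial_feat, enhanced_feat], dim=0)

    final = combine_feature_maps(torch.randn(256, 14, 14), torch.randn(64, 7, 7))
    # final.shape == torch.Size([320, 14, 14])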
  • in this application, the detection result of the object to be detected is determined comprehensively from the initial image feature of the object to be detected and the enhanced image feature; compared with obtaining the detection result by considering only the initial image feature of the object to be detected, better detection results can be obtained.
  • that is, when determining the detection result of the object to be detected, this application not only considers the initial image features reflecting the characteristics of the object to be detected, but also considers the semantic information of other objects in the image to be detected that are associated with the object to be detected.
  • a cross-domain knowledge graph is also known as a transferable knowledge graph across multiple scenarios.
  • the present invention can capture the internal relationship between different objects, and use graph convolutional networks to fuse a large number of different data sets and different types of information. The data utilization rate is greatly improved, the detection performance is higher, and the large-scale object detection is truly realized.
  • for example, a model trained only on the second domain mentioned above may determine the detection result to be a person and a handbag, whereas with the solution provided in this application, a model trained on both the first domain and the second domain may determine the detection result to be a woman carrying a handbag, thereby improving the effect of object detection.
  • the method shown in FIG. 4 further includes: determining the initial candidate frame of the object to be detected according to the initial image feature of the object to be detected.
  • specifically, the entire image of the image to be detected is first subjected to convolution processing to obtain the convolution features of the entire image of the image to be detected; then, according to fixed size requirements, the image to be detected is divided into different boxes, the features corresponding to the image in each box are scored, and the boxes with higher scores are filtered out as the initial candidate boxes.
  • the image to be detected is the first image.
  • the entire image of the first image can be convolved to obtain the convolution features of the entire image of the first image, the first image is then divided into 3×3 boxes, and the features corresponding to each box are scored; finally, box A and box B with higher scores can be screened out as the initial candidate boxes.
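  • Purely as an illustration of the "divide into boxes and keep the higher-scoring ones" idea (the grid size and the mean-activation scoring are assumptions, not the proposal mechanism actually claimed in this application):

    import torch

    def initial_candidate_boxes(conv_feat, grid=3, keep=2):
        # illustrative sketch; the scoring function is an assumption
        # conv_feat: [C, H, W] convolution features of the whole image to be detected
        # splits the feature map into grid x grid boxes, scores each box by its mean
        # activation, and keeps the `keep` highest-scoring boxes
        _, h, w = conv_feat.shape
        boxes, scores = [], []
        for i in range(grid):
            for j in range(grid):
                y0, y1 = i * h // grid, (i + 1) * h // grid
                x0, x1 = j * w // grid, (j + 1) * w // grid
                boxes.append((x0, y0, x1, y1))
                scores.append(conv_feat[:, y0:y1, x0:x1].mean())
        top = torch.topk(torch.stack(scores), k=keep).indices.tolist()
        return [boxes[i] for i in top]

    candidates = initial_candidate_boxes(torch.randn(256, 36, 36))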
  • the process of determining the candidate frame and classification of the object to be detected may be to first combine the initial image feature and the enhanced image feature to obtain the final image feature; the initial candidate frame is then adjusted according to the final image feature to obtain the candidate frame, and the initial classification result is corrected according to the final image feature to obtain the classification result.
  • the foregoing adjustment of the initial candidate frame according to the final image feature may be adjusting the coordinates of the initial candidate frame according to the final image feature until the candidate frame is obtained, and the foregoing correction of the initial classification result according to the final image feature may be building a classifier to reclassify and thereby obtain the classification result (a sketch of such a refinement head is given below).
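  • A minimal sketch of such a refinement head (the feature dimension, the number of classes, and the additive box refinement are illustrative assumptions):

    import torch
    import torch.nn as nn

    class RefineHead(nn.Module):
        # illustrative sketch; layer sizes and the additive refinement are assumptions
        # adjusts the initial candidate frame and re-classifies it from the final
        # (initial + enhanced) per-region image feature
        def __init__(self, feat_dim=1024, num_classes=80):
            super().__init__()
            self.box_delta = nn.Linear(feat_dim, 4)        # offsets for the box coordinates
            self.classifier = nn.Linear(feat_dim, num_classes)

        def forward(self, final_feat, init_boxes):
            refined_boxes = init_boxes + self.box_delta(final_feat)   # adjust coordinates
            logits = self.classifier(final_feat)                      # re-classify
            return refined_boxes, logits

    head = RefineHead()
    boxes, logits = head(torch.randn(5, 1024), torch.randn(5, 4))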
  • Fig. 6 is a schematic flowchart of an object detection method according to an embodiment of the present application.
  • the method shown in FIG. 6 may be executed by an object detection device, which may be an electronic device with an object detection function.
  • the form of the device specifically included in the electronic device can be as described above in the method shown in FIG. 4.
  • the method shown in FIG. 6 includes steps 601 to 609, and these steps are described in detail below.
  • step 602 and step 603 may be detailed implementations of step 402 (or referred to as specific implementations), and steps 604 to 608 can be detailed implementations of step 403 (or referred to as specific implementations).
  • Step 601 can be understood with reference to step 401 in the embodiment corresponding to FIG. 4, and details are not repeated here.
  • the image to be detected can be input into a traditional object detector for processing (such as Faster-RCNN) to obtain the initial candidate area. Since this application performs object detection for multiple different domains, each domain has its own corresponding initial candidate area.
  • the image to be detected can be convolved first to obtain the convolution features of the entire image of the image to be detected; the image to be detected is then divided into different boxes according to certain size requirements, and, for each domain, the features corresponding to the image in each box are scored and the boxes with higher scores are filtered out as the initial candidate boxes, thereby obtaining the initial candidate boxes corresponding to different domains.
  • a CNN can be used to extract the image features of the initial candidate region. For example, if the first image is the image to be detected, then in order to obtain the initial candidate frame of the object to be detected in the first image, the first image can be convolved to obtain the convolution features of the first image, the first image is then divided into 4×4 boxes (it can also be divided into other numbers of boxes), the features corresponding to the image of each box are scored, and the higher-scoring box A and box B are filtered out as the initial candidate frames.
  • the image features corresponding to each box are taken from the image features of the entire image of the image to be detected (the image features of the entire image of the image to be detected can be obtained by convolution processing of the entire image of the image to be detected).
  • in this way, the initial image feature corresponding to box A and the initial image feature corresponding to box B are obtained.
  • the domain-related semantic pool records the high-level semantic features of each category.
  • during training, the classification layer parameters corresponding to different categories in the classifier may continuously change; in this case, the corresponding classification layer parameters in the semantic pool can be updated.
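  • One simple way to keep each domain's semantic pool in step with its classifier during training is to re-read the classification-layer weights after each optimisation step; the attribute and variable names below are assumptions for illustration:

    import torch
    import torch.nn as nn

    @torch.no_grad()
    def update_semantic_pools(classifiers, semantic_pools):
        # illustrative sketch; container names are assumptions
        # classifiers: dict mapping each domain to its nn.Linear classification layer
        # semantic_pools: dict mapping each domain to a [C, D] tensor of class centers
        for domain, cls_layer in classifiers.items():
            # each row of the classification-layer weight is the current high-level
            # semantic feature (class center) of one category in that domain
            semantic_pools[domain] = cls_layer.weight.detach().clone()

    pools = {}
    clfs = {"coco": nn.Linear(256, 80), "vg": nn.Linear(256, 1000)}
    update_semantic_pools(clfs, pools)   # pools["coco"].shape == (80, 256)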
  • the extracted classification layer parameters may be the classification layer parameters of all classifications in the classifiers corresponding to different domains in the object detector for object detection of the object to be detected.
  • the high-level semantic features in the semantic pool corresponding to the domain are mapped to the nodes of the region graph in the domain to obtain the high-level semantic representation of the object to be detected.
  • the weights on the edges of the area graph nodes in the domain are given.
  • the weight of the edge between the i-th node and the j-th node in the region graph is computed from f_i and f_j, where f_i and f_j are the feature of the i-th object to be detected and the feature of the j-th object to be detected.
  • an intra-domain area map can be constructed separately according to the above method.
  • the high-level semantic features of the semantic pool are mapped to the nodes of the inter-domain graph to obtain the high-level semantic representation of the object to be detected.
  • the weight of the relationship between the categories is given, and then projected to the edge of the inter-domain graph node to obtain the weight of the node's edge of the inter-domain graph.
  • the distance in step 606 can be understood with reference to the explanation of the distance in the embodiment corresponding to FIG. 4, and details are not repeated here.
  • the feature construction method on the nodes of the inter-domain graph is the same as that of the intra-domain graph.
  • the weight of the edge between the i-th node in one domain and the j-th node in the other domain of the inter-domain graph is computed from f_i and f_j, where f_i and f_j are the feature of the i-th object to be detected in one domain and the feature of the j-th object to be detected in the other domain.
  • the intra-domain graph convolutional network is used to propagate the high-level semantic representations of different objects to be detected on the nodes, and, after reasoning, features fused with the high-level semantic representations of other objects to be detected are obtained.
  • a graph convolution with a spatial information mechanism can be selected.
  • the relative spatial information between the objects to be detected is used to learn K Gaussian kernels.
  • the specific formula is: f′_k(i) = Σ_{j∈N(i)} ω_k(g_ij) · x_j · e_ij, where ω_k is the k-th Gaussian kernel, μ_k and σ_k are its learnable mean vector and covariance vector, g_ij represents the relative spatial relationship between the i-th and the j-th objects to be detected, x_j is the feature on the j-th node, e_ij is the weight of the edge between nodes i and j, and N(i) denotes the set of nodes adjacent to node i.
  • the K features obtained by the intra-domain graph convolution on each node will be fused into the corresponding high-level semantic representation of the object to be detected.
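  • A compact sketch of this spatially-aware graph convolution (the kernel parameterisation, the diagonal covariance, and the fusion by a linear layer are illustrative assumptions):

    import torch
    import torch.nn as nn

    class SpatialGraphConv(nn.Module):
        # illustrative sketch; parameterisation and fusion choices are assumptions
        # K Gaussian kernels over the relative spatial relations g_ij; each kernel
        # produces one aggregated feature per node, and the K results are fused
        def __init__(self, feat_dim=256, num_kernels=8, spatial_dim=4):
            super().__init__()
            self.mu = nn.Parameter(torch.randn(num_kernels, spatial_dim))    # learnable means
            self.sigma = nn.Parameter(torch.ones(num_kernels, spatial_dim))  # learnable (diagonal) covariances
            self.fuse = nn.Linear(num_kernels * feat_dim, feat_dim)

        def forward(self, x, e, g):
            # x: [N, D] node features; e: [N, N] edge weights;
            # g: [N, N, spatial_dim] relative spatial relations g_ij
            outs = []
            for k in range(self.mu.shape[0]):
                diff = g - self.mu[k]
                w = torch.exp(-0.5 * ((diff / self.sigma[k]) ** 2).sum(-1))  # ω_k(g_ij)
                outs.append((w * e) @ x)      # Σ_j ω_k(g_ij) · e_ij · x_j for every node i
            return self.fuse(torch.cat(outs, dim=-1))   # fuse the K features per node

    gcn = SpatialGraphConv()
    out = gcn(torch.randn(5, 256), torch.rand(5, 5), torch.randn(5, 5, 4))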
  • the high-level semantic representations of the objects to be detected in different domains on the nodes are propagated using the inter-domain graph convolutional network, and, after reasoning, features that fuse the high-level semantic representations of the objects to be detected in different domains are obtained.
  • Step 609 can be understood with reference to step 404 in the embodiment corresponding to FIG. 4, and details are not repeated here.
  • the object detection method of the embodiments of the present application is described in detail above in combination with the flowchart. In order to better understand the object detection method of the embodiments of the present application, it will be described in detail below in conjunction with a more specific flowchart.
  • FIG. 7 is a schematic flowchart of an object detection method according to an embodiment of the present application.
  • the method shown in FIG. 7 may be executed by an object detection device, which may be an electronic device with an object detection function.
  • the form of the device specifically included in the electronic device can be as described in the method shown in FIG. 4 introduced above.
  • Step 1 Input the picture and pass through a traditional object detector to obtain a preliminary candidate frame and the characteristics of the object to be detected.
  • Step 2 Use classifiers corresponding to different domains in the object detector to extract classification layer parameters, and construct a domain-related semantic pool for each domain to record the high-level semantic features of each category. This semantic pool will be continuously updated as the classifier is optimized during the training process.
  • Step 3 Construct the intra-domain region graph: according to the classification weights of the features of the object to be detected over the different categories given by the detection network, map the high-level semantic features of the semantic pool to the nodes of the intra-domain region graph to obtain the high-level semantic representation of the object to be detected. According to the relationship between the features of different objects to be detected, the weights on the edges of the nodes in the region graph are given.
  • Step 4 Construct the inter-domain region graph: according to the classification weights of the features of the objects to be detected in the respective domains over the different categories given by the detection network, map the high-level semantic features of the semantic pool to the nodes of the inter-domain region graph to obtain the high-level semantic representation of the objects to be detected. According to the distance between the classification features of the objects to be detected in two different domains, the weight of the relationship between the categories is given and then projected onto the edges of the inter-domain graph nodes to obtain the weights of the node edges of the inter-domain graph.
  • Step 5 Intra-domain graph convolution: through the constructed intra-domain region graph, the intra-domain graph convolution network is used to propagate the high-level semantic representations of different objects to be detected on the nodes, and, after reasoning, features fused with the high-level semantic representations of other objects to be detected are obtained. By learning a sparse region graph to fuse the high-level semantic representations of different objects to be detected, the feature expression ability of the different objects to be detected is enhanced.
  • Step 6 Inter-domain graph convolution: through the constructed inter-domain region graph, the inter-domain graph convolution network is used to propagate the high-level semantic representations of the objects to be detected in different domains on the nodes, and, after reasoning, features that fuse the high-level semantic representations of the objects to be detected in different domains are obtained.
  • Step 7 Optimize and enhance the feature layer of the candidate region: project the features obtained after reasoning by the intra-domain graph convolution and the inter-domain graph convolution back into the corresponding high-level semantic representation of the object to be detected, and perform classification and regression, so as to improve large-scale detection performance.
  • the first method shown in Table 1 is the FPN detection method
  • the second method is the multi-branch detection method (Multi Branches).
  • the data set used for training the model includes three data sets: the MSCOCO data set, the visual genome (VG) data set, and the ADE data set; that is, the three data sets are used to train the model together, and in the test phase each data set is tested separately.
  • the MSCOCO data set has 80 general object detection categories, containing about 110,000 training images and a 5,000-image test set.
  • the VG data set is a large-scale general object detection data set with 1,000 categories, a training set of 88,000 images, and a test set of 5,000 images.
  • the ADE data set is a large-scale general object detection data set with 445 categories, a training set of 20,000 images, and a test set of 1,000 images.
  • the average precision (AP) and average recall (AR) of the objects are mainly used for evaluation, and the accuracy under different thresholds is considered in the comparison.
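  • As a generic reminder of how matches are counted at a given IoU threshold when computing such metrics (an illustration only, not the exact evaluation code used for the tables):

    def box_iou(a, b):
        # illustrative helper; boxes are (x0, y0, x1, y1) tuples
        x0, y0 = max(a[0], b[0]), max(a[1], b[1])
        x1, y1 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    # a prediction counts as a true positive when its IoU with a labelled box
    # exceeds the chosen threshold (e.g. 0.5 or 0.75)
    print(box_iou((0, 0, 10, 10), (5, 5, 15, 15)))   # ~0.143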
  • the three data sets are used for model training.
  • as shown in Table 1, the AP and AR of the method of this application are respectively greater than the AP and AR of the first method and the second method, and the larger the values of AP and AR, the better the object detection effect. It can be seen from Table 1 that the method of the present application achieves a significant improvement over several existing object detection methods.
  • the training model in Table 1 uses three data sets for training.
  • the method provided in this application is compared with the effects of several existing object detection methods.
  • several other object detection methods can also be included, such as the third method: fine-tuning, the fourth method: the overlap label detection method (overlap labels), and the fifth method: the pseudo label detection method (pseudo labels).
  • any two data sets of the three data sets are used for model training.
  • the AP and AR of the method of this application are larger than the AP and AR of the first to the sixth object detection methods, and the larger the values of AP and AR, the better the object detection effect. It can be seen from Table 2 that the method of the present application achieves a significant improvement over several existing object detection methods.
  • the method provided in this application has a significant improvement in the detection effect in situations where there are serious object occlusions, blurred categories, and small-scale objects.
  • our method effectively captures the internal relationship between different objects by constructing a multi-domain transferable knowledge graph, and uses graph convolutional networks to fuse a large number of different data sets and different categories of information, which greatly improves the data utilization rate, makes the detection performance higher, and truly realizes large-scale object detection.
  • Fig. 8 is a schematic flowchart of a neural network training method according to an embodiment of the present application.
  • the method shown in FIG. 8 can be executed by a device with strong computing capabilities such as a computer device, a server device, or a computing device.
  • the training data includes training images in different domains and object detection and labeling results of the objects to be detected in the training images.
  • the cross-domain knowledge map information includes the association relationship between object categories corresponding to the objects to be detected in different domains, and the enhanced image features indicate semantic information of object categories corresponding to other objects in different domains that are associated with the objects to be detected.
  • the object detection and annotation result of the object to be detected in the training image includes the annotation candidate frame and the annotation classification result of the object to be detected in the training image.
  • specifically, a set of initial model parameters can be set for the neural network, and then, based on the difference between the object detection result of the object to be detected in the training image and the object detection annotation result of the object to be detected in the training image, the model parameters of the neural network are gradually adjusted until the difference is within a certain preset range, or until the number of training iterations reaches a preset number; the model parameters of the neural network at that point are determined as the final parameters of the neural network model, thus completing the training of the neural network.
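  • A minimal training-loop sketch matching the stopping criteria described above (the loss function, optimiser, learning rate, and threshold are assumptions for illustration):

    import torch

    def train(model, loss_fn, data_loader, max_steps=10000, tol=1e-3, lr=1e-3):
        # illustrative sketch; the optimiser and stopping threshold are assumptions
        # gradually adjusts the model parameters until the loss (the difference between
        # the detection results and the annotation results) falls below `tol`, or until
        # the number of training steps reaches `max_steps`
        optimiser = torch.optim.SGD(model.parameters(), lr=lr)
        step = 0
        for images, annotations in data_loader:
            loss = loss_fn(model(images), annotations)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
            step += 1
            if loss.item() < tol or step >= max_steps:
                break
        return model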
  • neural network trained through the method shown in FIG. 8 can be used to implement the object detection method of the embodiment of the present application.
  • the training method of the present application extracts more features for object detection during the training process and can train a neural network with better performance, so that the neural network used for object detection can achieve better object detection results.
  • the cross-domain knowledge graph may include nodes and node edges, where nodes correspond to objects to be detected, and node edges correspond to relationships between high-level semantic features of different objects to be detected.
  • the classification layer parameters corresponding to different domains are weighted and merged to obtain the high-level semantic features of the object to be detected.
  • the classification layer parameters can be understood as maintaining a class center for each category. The weight of the relationship between the object categories corresponding to the objects to be detected in different domains is projected onto the node connection edges of the objects to be detected to obtain the weights of the node connection edges.
  • the high-level semantic features are convolved according to the weights of the edges of the nodes, and the enhanced image features of the object to be detected can be obtained.
  • the relationship weight may be determined according to the distance relationship between the object categories corresponding to the objects to be detected in different domains.
  • the distance relationship includes one or more of the following information:
  • the color of an apple is red, and the color of a strawberry is also red. Then, apples and strawberries have the same color attributes (or, it can be said that apples and strawberries are relatively close in color attributes).
  • the similarity of word embedding constructed with linguistic knowledge can be understood as the degree of similarity between word vectors of different object categories.
  • the weight of the edge between the i-th node in one domain and the j-th node in the other domain is likewise computed from f_i and f_j.
  • f_i and f_j are the feature of the i-th object to be detected in one domain and the feature of the j-th object to be detected in the other domain (shorthand for the initial image features of the objects to be detected).
  • the object detection devices shown in FIG. 9 and FIG. 10 can execute each step of the object detection method of the embodiments of the present application, and the neural network training device shown in FIG. 11 can execute each step of the neural network training method of the embodiments of the present application; repeated descriptions will be appropriately omitted when introducing the devices shown in FIG. 9 to FIG. 11 below.
  • Fig. 9 is a schematic block diagram of an object detection device according to an embodiment of the present application.
  • the object detection device 7000 shown in FIG. 9 includes:
  • the image acquisition module 901 is configured to perform step 401 in the embodiment corresponding to FIG. 4 and step 601 in the embodiment corresponding to FIG. 6.
  • the feature extraction module 902 is configured to perform step 402 in the embodiment corresponding to FIG. 4, step 602 in the embodiment corresponding to FIG. 6, step 603 in the embodiment corresponding to FIG. 6, and step in the embodiment corresponding to FIG. 6 607, step 608 in the embodiment corresponding to FIG. 6.
  • the detection module 903 is configured to execute step 404 in the embodiment corresponding to FIG. 4 and step 609 in the embodiment corresponding to FIG. 6.
  • the parameter extraction module 904 is configured to execute step 403 in the embodiment corresponding to FIG. 4 and step 604 in the embodiment corresponding to FIG. 6.
  • the projection module 905 is configured to perform step 605 in the embodiment corresponding to FIG. 6 and step 606 in the embodiment corresponding to FIG. 6.
  • the relationship weight determination module 906 is configured to perform step 605 in the embodiment corresponding to FIG. 6 and step 606 in the embodiment corresponding to FIG. 6.
  • the image acquisition module 901 in the above object detection device may be equivalent to the I/O interface 112 in the execution device 110, and the feature extraction module 902 and the detection module 903 in the object detection device are equivalent to the calculation module 111 in the execution device 110.
  • alternatively, the image acquisition module 901 in the above object detection device may be equivalent to the bus interface unit 510 in the neural network processor, and the feature extraction module 902 and the detection module 903 in the object detection device are equivalent to the arithmetic circuit 503, or the feature extraction module 902 and the detection module 903 in the object detection device can also be equivalent to the arithmetic circuit 303 + vector calculation unit 307 + accumulator 308.
  • Fig. 10 is a schematic block diagram of an object detection device according to an embodiment of the present application.
  • the object detection device module shown in FIG. 10 includes a memory 1001, a processor 1002, a communication interface 1003, and a bus 1004.
  • the memory 1001, the processor 1002, and the communication interface 1003 implement communication connections between each other through the bus 1004.
  • the communication interface 1003 is equivalent to the image acquisition module 901 in the object detection device, and the processor 1002 is equivalent to the feature extraction module 902 and the detection module 903 in the object detection device.
  • the following is a detailed introduction to each module in the object detection device.
  • the memory 1001 may be a read only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • the memory 1001 may store a program.
  • the processor 1002 and the communication interface 1003 are used to execute each step of the object detection method in the embodiment of the present application.
  • the communication interface 1003 may obtain the image to be detected from a memory or other devices, and then the processor 1002 performs object detection on the image to be detected.
  • the processor 1002 may adopt a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, which are used to execute related programs to realize the functions required by the modules in the object detection device of the embodiments of the present application (for example, the processor 1002 can implement the functions to be executed by the feature extraction module 902 and the detection module 903 in the above-mentioned object detection device), or to execute the object detection method of the embodiments of the present application.
  • the processor 1002 may also be an integrated circuit chip with signal processing capability.
  • each step of the object detection method in the embodiment of the present application can be completed by an integrated logic circuit of hardware in the processor 1002 or instructions in the form of software.
  • the above-mentioned processor 1002 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the aforementioned general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1001; the processor 1002 reads the information in the memory 1001 and, in combination with its hardware, completes the functions required by the modules included in the object detection apparatus of the embodiments of the present application, or performs the object detection method of the method embodiments of the present application.
  • the communication interface 1003 uses a transceiving device such as but not limited to a transceiver to implement communication between the device module and other devices or a communication network.
  • the image to be processed can be obtained through the communication interface 1003.
  • the bus 1004 may include a path for transferring information between various components of the device module (for example, the memory 1001, the processor 1002, and the communication interface 1003).
  • FIG. 11 is a schematic diagram of the hardware structure of a neural network training device according to an embodiment of the present application. Similar to the above device, the neural network training device shown in FIG. 11 includes a memory 1101, a processor 1102, a communication interface 1103, and a bus 1104. Among them, the memory 1101, the processor 1102, and the communication interface 1103 implement communication connections between each other through the bus 1104.
  • the memory 1101 may store a program.
  • the processor 1102 is configured to execute each step of the neural network training method of the embodiment of the present application.
  • the processor 1102 may adopt a general CPU, a microprocessor, an ASIC, a GPU, or one or more integrated circuits for executing related programs to implement the neural network training method of the embodiment of the present application.
  • the processor 1102 may also be an integrated circuit chip with signal processing capabilities.
  • each step of the neural network training method (the method shown in FIG. 8) of the embodiment of the present application can be completed by the integrated logic circuit of the hardware in the processor 1102 or the instructions in the form of software.
  • the neural network is trained by the neural network training device shown in FIG. 11 (using the method shown in FIG. 8), and the trained neural network can be used to execute the object detection method of the embodiments of the present application.
  • the device shown in FIG. 11 can obtain training data and the neural network to be trained from the outside through the communication interface 1103, and then the processor trains the neural network to be trained according to the training data.
  • although the devices described above only show the memory, the processor, and the communication interface, in the specific implementation process those skilled in the art should understand that the devices may also include other devices necessary for normal operation. At the same time, according to specific needs, those skilled in the art should understand that the devices may also include hardware devices that implement other additional functions. In addition, those skilled in the art should understand that the devices may also include only the devices necessary for implementing the embodiments of the present application, and not necessarily all the devices shown in FIG. 10 and FIG. 11.
  • the disclosed system, device, and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or modules, and may be in electrical, mechanical or other forms.
  • modules described as separate components may or may not be physically separated, and the components displayed as modules may or may not be physical modules, that is, they may be located in one place, or they may be distributed to multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
  • if the function is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.


Abstract

Embodiments of the present application relate to the field of artificial intelligence, and in particular to the field of computer vision. Disclosed are an object detection method and apparatus. The method may comprise the steps of: obtaining an image to be detected; determining, in the image to be detected, initial image features of an object to be detected; determining, according to cross-domain knowledge graph information, enhanced image features of the object to be detected, the cross-domain knowledge graph information comprising an association relationship between object categories corresponding to the object to be detected in different domains, and the enhanced image features indicating semantic information of the object categories corresponding to other objects associated with the object to be detected in different domains; and determining, according to the initial image features of the object to be detected and the enhanced image features of the object to be detected, the candidate box and the category of the object to be detected. By means of the technical solution described in the present application, a cross-domain knowledge graph is constructed, the internal relationship between different objects to be detected can be obtained, and the object detection effect is improved.
PCT/CN2020/112796 2020-01-21 2020-09-01 Procédé et appareil de détection d'objets, et support de stockage Ceased WO2021147325A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010072238.0 2020-01-21
CN202010072238.0A CN111310604A (zh) 2020-01-21 2020-01-21 一种物体检测方法、装置以及存储介质

Publications (1)

Publication Number Publication Date
WO2021147325A1 true WO2021147325A1 (fr) 2021-07-29

Family

ID=71161604

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/112796 Ceased WO2021147325A1 (fr) 2020-01-21 2020-09-01 Procédé et appareil de détection d'objets, et support de stockage

Country Status (2)

Country Link
CN (1) CN111310604A (fr)
WO (1) WO2021147325A1 (fr)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807247A (zh) * 2021-09-16 2021-12-17 清华大学 基于图卷积网络的行人重识别高效标注方法及装置
CN114266940A (zh) * 2021-12-24 2022-04-01 中山大学 基于动态标签分配的图像目标检测方法及装置
CN114283317A (zh) * 2021-11-26 2022-04-05 中国传媒大学 目标检测方法、装置、设备和存储介质
CN114281976A (zh) * 2021-08-27 2022-04-05 腾讯科技(深圳)有限公司 一种模型训练方法、装置、电子设备及存储介质
CN114579981A (zh) * 2022-03-10 2022-06-03 北京国腾创新科技有限公司 一种跨域漏洞检测方法、系统、存储介质和电子设备
CN114881329A (zh) * 2022-05-09 2022-08-09 山东大学 一种基于引导图卷积神经网络的轮胎质量预测方法及系统
CN115830721A (zh) * 2022-11-02 2023-03-21 深圳市新良田科技股份有限公司 活体检测方法、装置、终端设备和可读存储介质
CN116049443A (zh) * 2023-02-13 2023-05-02 南京云创大数据科技股份有限公司 一种知识图谱的构造方法、装置、电子设备和存储介质
CN116244284A (zh) * 2022-12-30 2023-06-09 成都中轨轨道设备有限公司 一种基于立体内容的大数据处理方法
CN116665095A (zh) * 2023-05-18 2023-08-29 中国科学院空间应用工程与技术中心 一种运动舰船检测方法、系统、存储介质和电子设备
CN116956228A (zh) * 2023-08-03 2023-10-27 重庆市科学技术研究院 一种技术交易平台的文本挖掘方法
CN118735716A (zh) * 2024-08-30 2024-10-01 天津市地质研究和海洋地质中心 一种高标准农田健康耕层构建方法及系统
CN119516329A (zh) * 2024-11-01 2025-02-25 重庆大学 基于Transformer与U型网络融合架构的红外小目标检测方法
CN119762414A (zh) * 2024-08-01 2025-04-04 北京理工大学 一种基于跨布匹纹理特征增强的断经断纬检测模型及检测方法
CN120409495A (zh) * 2025-06-30 2025-08-01 北京清大科越股份有限公司 一种电力市场数据图谱的语义增强及动态补全方法及系统

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378381B (zh) * 2019-06-17 2024-01-19 华为技术有限公司 物体检测方法、装置和计算机存储介质
CN111310604A (zh) * 2020-01-21 2020-06-19 华为技术有限公司 一种物体检测方法、装置以及存储介质
CN113935391B (zh) * 2020-06-29 2024-11-12 中国移动通信有限公司研究院 物体检测方法、知识库的构建方法、装置及电子设备
CN111783457B (zh) * 2020-07-28 2021-05-11 北京深睿博联科技有限责任公司 一种基于多模态图卷积网络的语义视觉定位方法及装置
CN112925920A (zh) * 2021-03-23 2021-06-08 西安电子科技大学昆山创新研究院 一种智慧社区大数据知识图谱网络社团检测方法
CN114627443B (zh) * 2022-03-14 2023-06-09 小米汽车科技有限公司 目标检测方法、装置、存储介质、电子设备及车辆
CN115375657A (zh) * 2022-08-23 2022-11-22 抖音视界有限公司 息肉检测模型的训练方法、检测方法、装置、介质及设备
CN116563701A (zh) * 2023-04-19 2023-08-08 中国工商银行股份有限公司 目标对象检测方法、装置、设备及存储介质


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7664339B2 (en) * 2004-05-03 2010-02-16 Jacek Turski Image processing method for object recognition and dynamic scene understanding
CN104573711A (zh) * 2014-12-22 2015-04-29 上海交通大学 基于文本-物体-场景关系的物体和场景的图像理解方法
CN110378381A (zh) * 2019-06-17 2019-10-25 华为技术有限公司 物体检测方法、装置和计算机存储介质
CN110704626A (zh) * 2019-09-30 2020-01-17 北京邮电大学 一种用于短文本的分类方法及装置
CN111310604A (zh) * 2020-01-21 2020-06-19 华为技术有限公司 一种物体检测方法、装置以及存储介质

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114281976A (zh) * 2021-08-27 2022-04-05 腾讯科技(深圳)有限公司 一种模型训练方法、装置、电子设备及存储介质
CN113807247B (zh) * 2021-09-16 2024-04-26 清华大学 基于图卷积网络的行人重识别高效标注方法及装置
CN113807247A (zh) * 2021-09-16 2021-12-17 清华大学 基于图卷积网络的行人重识别高效标注方法及装置
CN114283317A (zh) * 2021-11-26 2022-04-05 中国传媒大学 目标检测方法、装置、设备和存储介质
CN114266940A (zh) * 2021-12-24 2022-04-01 中山大学 基于动态标签分配的图像目标检测方法及装置
CN114579981A (zh) * 2022-03-10 2022-06-03 北京国腾创新科技有限公司 一种跨域漏洞检测方法、系统、存储介质和电子设备
CN114881329A (zh) * 2022-05-09 2022-08-09 山东大学 一种基于引导图卷积神经网络的轮胎质量预测方法及系统
CN115830721A (zh) * 2022-11-02 2023-03-21 深圳市新良田科技股份有限公司 活体检测方法、装置、终端设备和可读存储介质
CN115830721B (zh) * 2022-11-02 2024-05-03 深圳市新良田科技股份有限公司 活体检测方法、装置、终端设备和可读存储介质
CN116244284A (zh) * 2022-12-30 2023-06-09 成都中轨轨道设备有限公司 一种基于立体内容的大数据处理方法
CN116244284B (zh) * 2022-12-30 2023-11-14 成都中轨轨道设备有限公司 一种基于立体内容的大数据处理方法
CN116049443A (zh) * 2023-02-13 2023-05-02 南京云创大数据科技股份有限公司 一种知识图谱的构造方法、装置、电子设备和存储介质
CN116665095B (zh) * 2023-05-18 2023-12-22 中国科学院空间应用工程与技术中心 一种运动舰船检测方法、系统、存储介质和电子设备
CN116665095A (zh) * 2023-05-18 2023-08-29 中国科学院空间应用工程与技术中心 一种运动舰船检测方法、系统、存储介质和电子设备
CN116956228A (zh) * 2023-08-03 2023-10-27 重庆市科学技术研究院 一种技术交易平台的文本挖掘方法
CN119762414A (zh) * 2024-08-01 2025-04-04 北京理工大学 一种基于跨布匹纹理特征增强的断经断纬检测模型及检测方法
CN118735716A (zh) * 2024-08-30 2024-10-01 天津市地质研究和海洋地质中心 一种高标准农田健康耕层构建方法及系统
CN119516329A (zh) * 2024-11-01 2025-02-25 重庆大学 基于Transformer与U型网络融合架构的红外小目标检测方法
CN120409495A (zh) * 2025-06-30 2025-08-01 北京清大科越股份有限公司 一种电力市场数据图谱的语义增强及动态补全方法及系统

Also Published As

Publication number Publication date
CN111310604A (zh) 2020-06-19

Similar Documents

Publication Publication Date Title
WO2021147325A1 (fr) Procédé et appareil de détection d'objets, et support de stockage
CN111291809B (zh) 一种处理装置、方法及存储介质
US12314343B2 (en) Image classification method, neural network training method, and apparatus
CN110378381B (zh) 物体检测方法、装置和计算机存储介质
CN112446398B (zh) 图像分类方法以及装置
CN112990211B (zh) 一种神经网络的训练方法、图像处理方法以及装置
CN110188795B (zh) 图像分类方法、数据处理方法和装置
CN110070107B (zh) 物体识别方法及装置
CN110222717B (zh) 图像处理方法和装置
CN111368972B (zh) 一种卷积层量化方法及其装置
WO2021057056A1 (fr) Procédé de recherche d'architecture neuronale, procédé et dispositif de traitement d'image, et support de stockage
WO2021042828A1 (fr) Procédé et appareil de compression de modèle de réseau neuronal, ainsi que support de stockage et puce
WO2020244653A1 (fr) Procédé et dispositif d'identification d'objet
WO2022001805A1 (fr) Procédé et dispositif de distillation de réseau neuronal
CN112580720A (zh) 一种模型训练方法及装置
WO2022007867A1 (fr) Procédé et dispositif de construction de réseau neuronal
WO2022217434A1 (fr) Réseau cognitif, procédé de formation de réseau cognitif, et procédé et appareil de reconnaissance d'objet
WO2021136058A1 (fr) Procédé et dispositif de traitement vidéo
WO2022179606A1 (fr) Procédé de traitement d'image et appareil associé
CN111797970A (zh) 训练神经网络的方法和装置
WO2023125628A1 (fr) Procédé et appareil d'optimisation de modèle de réseau neuronal et dispositif informatique
US20250356173A1 (en) Data processing method and apparatus thereof
Grigorev et al. Depth estimation from single monocular images using deep hybrid network
CN112464930A (zh) 目标检测网络构建方法、目标检测方法、装置和存储介质
CN115731530A (zh) 一种模型训练方法及其装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20914791

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20914791

Country of ref document: EP

Kind code of ref document: A1