
CN116977751A - Image processing method, device and computer readable storage medium - Google Patents

Image processing method, device and computer readable storage medium

Info

Publication number
CN116977751A
Authority
CN
China
Prior art keywords
loss function
image
model
dimensional feature
prediction probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310091688.8A
Other languages
Chinese (zh)
Inventor
张博深
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310091688.8A priority Critical patent/CN116977751A/en
Publication of CN116977751A publication Critical patent/CN116977751A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the present application discloses an image processing method, an image processing device, and a computer readable storage medium. A first image and a second image are acquired; the first image is input into a first model, which outputs a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability; the second image is input into a second model, which outputs a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability; a reconstruction loss function is constructed according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and a contrast loss function is constructed according to the first prediction probability and the second prediction probability; the first model and the second model are iteratively trained based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, yielding a trained first model and a trained second model; and an image to be identified is identified based on the trained first model or second model. In this way, the accuracy of image processing is greatly improved.

Description

Image processing method, device and computer readable storage medium
Technical Field
The present application relates to the field of computer vision, and in particular, to an image processing method, an image processing device, and a computer readable storage medium.
Background
Industrial defect quality inspection refers to the quality inspection of industrial products during production and manufacturing. Traditional industrial quality inspection is generally performed by human quality inspectors; with the rise of artificial intelligence (AI) technology in recent years, AI-based quality inspection can greatly improve inspection efficiency and save labor cost.
In the related art, computer-vision-based AI quality inspection extracts hand-crafted (manual) features from an input image and trains a classifier on those features to perform binary classification, thereby automating quality inspection. In the course of research and practice on the prior art, the inventor of the present application found that manual feature extraction generalizes poorly, which affects the accuracy of subsequent model identification and wastes labor cost.
Disclosure of Invention
The embodiment of the present application provides an image processing method, an image processing device, and a computer readable storage medium, which can improve the accuracy of model identification and thereby the accuracy of image processing.
In order to solve the above technical problems, the embodiment of the present application provides the following technical solutions:
an image processing method, comprising:
acquiring a first image and a second image, wherein the second image is an image obtained by performing data enhancement processing according to the first image;
inputting the first image into a first model, and outputting a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability;
inputting the second image into a second model, and outputting a second high-dimensional feature, a second low-dimensional feature and a second prediction probability;
constructing a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and constructing a contrast loss function according to the first prediction probability and the second prediction probability;
performing iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, to obtain a trained first model and a trained second model;
and identifying the image to be identified based on the trained first model or second model.
An image processing apparatus comprising:
the acquisition unit is used for acquiring a first image and a second image, wherein the second image is an image obtained by performing data enhancement processing according to the first image;
a first output unit for inputting the first image to a first model, outputting a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability;
a second output unit for inputting the second image into a second model and outputting a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability;
a first construction unit for constructing a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and constructing a contrast loss function according to the first prediction probability and the second prediction probability;
a second construction unit for performing iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, to obtain a trained first model and a trained second model;
and the identification unit is used for identifying the image to be identified based on the trained first model or second model.
In some embodiments, the first construction unit includes:
a first reconstruction subunit for performing reconstruction processing on the first low-dimensional feature to obtain a first reconstructed feature;
a second reconstruction subunit for performing reconstruction processing on the second low-dimensional feature to obtain a second reconstructed feature;
an acquisition subunit configured to acquire a first difference between the first high-dimensional feature and the first reconstructed feature, and acquire a second difference between the second high-dimensional feature and the second reconstructed feature;
a first construction subunit for constructing a reconstruction loss function from the first difference and the second difference;
and a second construction subunit for constructing a contrast loss function according to the first prediction probability and the second prediction probability.
In some embodiments, the acquisition subunit is configured to:
calculating a first difference value between the first high-dimensional feature and the first reconstructed feature;
determining the absolute value of the first difference value as the first difference;
calculating a second difference value between the second high-dimensional feature and the second reconstructed feature;
and determining the absolute value of the second difference value as the second difference.
In some embodiments, the second construction subunit is configured to:
calculating a third difference value between the first prediction probability and the second prediction probability;
and constructing a contrast loss function based on the absolute value of the third difference value.
In some embodiments, the second construction unit includes:
a third construction subunit configured to construct a supervised loss function according to the first prediction probability, the second prediction probability, and the label information;
a fourth construction subunit configured to construct a target loss function based on the sum of the reconstruction loss function, the contrast loss function, and the supervised loss function;
and a training subunit for performing iterative training on the first model and the second model based on the target loss function until the target loss function converges, to obtain a trained first model and a trained second model.
In some embodiments, the third construction subunit is configured to:
acquiring a third difference between the first prediction probability and the label information;
acquiring a fourth difference between the second prediction probability and the label information;
and constructing a supervised loss function according to the third difference and the fourth difference.
In some embodiments, the fourth construction subunit is configured to:
determining a target contrast loss function according to the contrast loss function and a first weight;
determining a target supervised loss function according to the supervised loss function and a second weight;
and constructing a target loss function based on the sum of the reconstruction loss function, the target contrast loss function, and the target supervised loss function.
In some embodiments, the acquisition unit is configured to:
acquiring a first image;
performing data enhancement processing on the first image to obtain a second image;
wherein the data enhancement processing includes at least one of noise addition, image inversion, image rotation, image brightness adjustment, and contrast adjustment.
In some embodiments, the first output unit is configured to:
inputting the first image into a first model for feature extraction to obtain a first high-dimensional feature;
processing the first high-dimensional feature through a first full-connection layer and a nonlinear activation layer in the first model to obtain a first low-dimensional feature;
and processing the first low-dimensional features through a second full-connection layer in the first model to obtain a first prediction probability.
The embodiment of the present application acquires a first image and a second image; inputs the first image into a first model, which outputs a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability; inputs the second image into a second model, which outputs a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability; constructs a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and constructs a contrast loss function according to the first prediction probability and the second prediction probability; performs iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, obtaining a trained first model and a trained second model; and identifies an image to be identified based on the trained first model or second model. In this way, a first image and a second image obtained by disguising (data-enhancing) the first image are acquired; a reconstruction loss function between the high-dimensional and low-dimensional features is constructed by design; and a contrast loss function between the first prediction probability output by the first model for the first image and the second prediction probability output by the second model for the second image is combined with a supervised loss function to construct a target loss function for training. This improves the feature representation capability and robustness the models can learn, so that the trained models identify images to be identified with higher accuracy and stronger generalization. Compared with technical solutions in the related art that require manual feature extraction, the embodiment of the present application greatly improves the accuracy of image processing and saves labor cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an image processing system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application;
fig. 3a is a schematic view of a scenario of an image processing method according to an embodiment of the present application;
FIG. 3b is a schematic diagram of model training of an image processing method according to an embodiment of the present application;
FIG. 3c is a schematic diagram of a model application of an image processing method according to an embodiment of the present application;
FIG. 4 is a flow chart of an image processing method according to an embodiment of the present application;
fig. 5 is a schematic structural view of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the protection scope of the present application.
The embodiment of the application provides an image processing method, an image processing device and a computer readable storage medium.
Referring to fig. 1, fig. 1 is a schematic view of a scene of an image processing system according to an embodiment of the present application, including a terminal and a server (the image processing system may further include other terminals besides the one shown; the specific number of terminals is not limited herein). The terminal and the server may be connected through a communication network, which may include a wireless network and a wired network, where the wireless network includes one or more of a wireless wide area network, a wireless local area network, a wireless metropolitan area network, and a wireless personal area network. The network includes network entities such as routers and gateways, which are not shown. The terminal may exchange information with the server through the communication network; for example, the terminal may send an image to be identified that requires defect quality inspection to the server.
The image processing system may include an image processing device, which may be integrated in a server. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, big data, and artificial intelligence platforms. As shown in fig. 1, the server acquires a first image and a second image, wherein the second image is obtained by performing data enhancement processing on the first image; inputs the first image into a first model, which outputs a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability; inputs the second image into a second model, which outputs a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability; constructs a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and constructs a contrast loss function according to the first prediction probability and the second prediction probability; performs iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, obtaining a trained first model and a trained second model; and identifies the image to be identified sent by the terminal based on the trained first model or second model.
The terminal in the image processing system may be installed with various applications required by users, such as instant messaging applications, media applications, and browser applications, and may send images to be identified that require industrial quality inspection to the server for defect identification, thereby realizing an automatic quality inspection function.
It should be noted that, the schematic view of the image processing system shown in fig. 1 is only an example, and the image processing system and the scene described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided by the embodiment of the present application, and those skilled in the art can know that, with the evolution of the image processing system and the appearance of a new service scene, the technical solution provided by the embodiment of the present application is equally applicable to similar technical problems.
In this embodiment, description will be made from the viewpoint of an image processing apparatus, which may be integrated in a computer device that has a storage unit, is equipped with a microprocessor, and has computing capability. The computer device may be a server or a terminal; in this embodiment, the computer device is described as a server.
It will be appreciated that in the specific embodiments of the present application, related data such as images are involved, and when the above embodiments of the present application are applied to specific products or technologies, user approval or consent is required, and the collection, use and processing of related data is required to comply with relevant laws and regulations and standards of the relevant countries and regions.
It should be noted that, in some of the processes described in the specification, claims and drawings above, a plurality of steps appearing in a particular order are included, but it should be clearly understood that the steps may be performed out of order or performed in parallel, the step numbers are merely used to distinguish between the different steps, and the numbers themselves do not represent any order of execution. Furthermore, the description of "first," "second," or "object" and the like herein is for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Referring to fig. 2, fig. 2 is a flowchart illustrating an image processing method according to an embodiment of the application. The image processing method comprises the following steps:
in step 101, a first image and a second image are acquired.
It should be noted that industrial defect quality inspection refers to the quality inspection of industrial products during production and manufacturing, and traditional industrial quality inspection is generally performed by human quality inspectors. With the rise of artificial intelligence technology in recent years, computer-vision-based AI quality inspection can greatly improve inspection efficiency and save labor cost.
The first image and the second image may be captured images of the industrial product surface, the images being reflective of the severity of defects on the industrial product surface; taking the first image as an example, the first image may be an image of a surface of a non-defective industrial product, or may be an image of a surface of a defective industrial product such as a mild defect or a severe defect, and the second image is similar.
Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions, and the scheme provided by the embodiment of the application relates to the computer vision technology of artificial intelligence.
Computer Vision (CV) is a science that studies how to make machines "see". More specifically, it refers to using cameras and computers instead of human eyes to perform machine vision tasks such as recognition, tracking, and measurement on targets, and to further perform graphic processing so that the computer produces images more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision research on related theory and technology attempts to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, optical character recognition (Optical Character Recognition, OCR), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric recognition techniques such as face recognition and fingerprint recognition.
In the related art, computer-vision-based industrial defect quality inspection can be realized in two ways. 1. Manual features are first extracted from the input image, and a classifier is then trained on those features for binary classification, thereby realizing AI quality inspection. The drawback of this approach is that manual feature extraction generalizes poorly, which affects the accuracy of subsequent model identification and wastes labor cost. 2. Processing can be performed based on a convolutional neural network (Convolutional Neural Networks, CNN): the image is input into the CNN for feature extraction, and a full-connection layer then performs automatic classification, realizing automatic AI quality inspection. The drawback of this approach is that image defects can be highly varied. As shown in fig. 3a, the ok images (qualified images) have a fixed format, but the degree of defect can differ, so there are too many defect types while the training data set is limited, which limits what the model can learn. That is, a trained model may fit the training data set well, but if a defect image not collected in the training data set is encountered in the actual deployment stage, the CNN output can be problematic, because such a model only fits the training data set well and its generalization is often poor.
In order to solve the above problems, the embodiment of the present application involves a strategy of performing feature reconstruction from low-dimensional features to high-dimensional features, and a strategy of contrastive training of dual depth models, so that the models can learn features with strong characterization capability and strong separability, and the trained models have better recognition capability and higher accuracy on defect images. The specific implementation process is as follows:
the embodiment of the application acquires a first image in a training data set, wherein the first image carries corresponding label information, the label information is a binary label, namely 0 and 1,0 represent normal images, 1 represents a defect image, and the label information is artificial labeling.
The training data set includes a preset number of images, for example, 100 images. In order to enrich the training data set as much as possible, data enhancement processing may be performed on the first image to obtain a corresponding expanded second image. The data enhancement processing includes at least one of noise addition, image inversion, image rotation, image brightness adjustment, and contrast adjustment, yielding a disguised second image.
Noise addition means adding Gaussian noise to the first image; image inversion may be rotating the first image by 180 degrees; image rotation may be rotating the first image by 90 or 270 degrees; image brightness adjustment means enhancing or reducing the brightness of the first image; and contrast adjustment means enhancing or reducing the contrast of the first image. The processing manner may be selected at random from these, or applied in sequence, which is not specifically limited herein.
For example, referring to fig. 3b, a first image (input image x) is acquired and randomly enhanced to obtain a second image (i.e., random data image x'), thereby expanding the training data and enriching the robustness and generalization of subsequent model training.
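For illustration only, the following is a minimal sketch of such a random enhancement step in PyTorch (the library choice, the parameter values such as the noise standard deviation and brightness/contrast factors, and the helper names are assumptions of the sketch, not part of the embodiment):

```python
import random
import torch
import torchvision.transforms.functional as TF

def add_gaussian_noise(img, std=0.05):
    # Noise addition: add zero-mean Gaussian noise; std is an assumed value.
    return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)

def random_enhance(img):
    # Randomly pick one of the enhancement modes described above to turn
    # a first image x into a second image x'.
    ops = [
        add_gaussian_noise,                                  # noise addition
        lambda im: TF.rotate(im, 180),                       # image inversion
        lambda im: TF.rotate(im, random.choice([90, 270])),  # image rotation
        lambda im: TF.adjust_brightness(im, random.uniform(0.7, 1.3)),
        lambda im: TF.adjust_contrast(im, random.uniform(0.7, 1.3)),
    ]
    return random.choice(ops)(img)

x = torch.rand(3, 224, 224)   # toy stand-in for the first image
x_prime = random_enhance(x)   # the disguised second image
```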
In step 102, a first image is input to a first model, outputting a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability.
The first model and the second model are depth models (Deep and Cross Network, DCN). A depth model can simultaneously and efficiently learn low-dimensional feature crosses (i.e., the low-dimensional features in the embodiment of the present application) and high-dimensional nonlinear features (i.e., the high-dimensional features in the embodiment of the present application), requires very low computing resources, needs no manual feature engineering, and can accurately output prediction results. The specific network structure of the depth model is not specifically limited here and may be a convolutional neural network model.
Thus, based on the depth model principle, the first image can be input into the first model, which outputs the corresponding first high-dimensional feature. The high-dimensional feature here is the initial high-dimensional feature set output by the depth model, and it has the following problems:
(a) the number of features is large, which makes computation inconvenient;
(b) many features are irrelevant to the given task, i.e., many features have only weak relevance to the category;
(c) many features are redundant for the given task, e.g., features strongly correlated with one another, which affects subsequent prediction;
(d) the data is noisy.
In order to overcome the above problems, the first model further performs feature extraction processing on the first high-dimensional feature, effectively eliminating irrelevant and redundant features and improving task efficiency, to obtain the corresponding first low-dimensional feature, and then makes a prediction according to the first low-dimensional feature to obtain the corresponding first prediction probability. The first prediction probability is a value between 0 and 1: the closer it is to 0, the greater the probability that the first image is a defect-free image; the closer it is to 1, the greater the probability that the first image is a defect image.
In some embodiments, the step of inputting the first image into the first model, outputting the first high-dimensional feature, the first low-dimensional feature, and the first prediction probability may include:
(1) Inputting the first image into a first model for feature extraction to obtain a first high-dimensional feature;
(2) Processing the first high-dimensional feature through a first full-connection layer and a nonlinear activation layer in the first model to obtain a first low-dimensional feature;
(3) And processing the first low-dimensional feature through a second full-connection layer in the first model to obtain a first prediction probability.
Referring to fig. 3b, the first image may be input to a first model (depth model 1) for feature extraction to obtain a corresponding first high-dimensional feature, which may be understood by referring to the following formula:
F_h1 = f(x; θ_1)

where x is the first image, θ_1 represents the weight parameters of the first model, F_h1 is the first high-dimensional feature, and the f() function represents the feature extraction method.
The full-connection layer (fully connected layer, FC) serves to map the learned distributed feature representation to the sample label space, and the nonlinear activation layer serves to add nonlinear factors to the network, avoiding a multi-layer network being equivalent to a single-layer linear function and thereby providing greater learning and fitting capability. Therefore, feature dimension reduction is performed on the first high-dimensional feature through the first full-connection layer and the nonlinear activation layer in the first model, effectively eliminating irrelevant and redundant features, to obtain the first low-dimensional feature, which can be understood with reference to the following formula:

F_l1 = FC_1(F_h1; φ_1)

where φ_1 represents the parameters of the first full-connection layer and the nonlinear activation layer of the first model, and F_l1 is the first low-dimensional feature.
Further, a second full-connection layer may be added to the first model, and the first low-dimensional feature is processed through the second full-connection layer to output a first prediction probability between 0 and 1 (i.e., supervision loss 1), which can be understood with reference to the following formula:

p_1 = FC_2(F_l1; ψ_1)

where ψ_1 is the parameter of the second full-connection layer, and p_1 is the first prediction probability.
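As a concrete, non-limiting illustration of this three-stage forward pass, the following PyTorch sketch assumes a small convolutional backbone for f(), feature dimensions of 512 and 64, and a sigmoid head so the probability lies in (0, 1); all of these choices are assumptions of the sketch rather than details fixed by the embodiment:

```python
import torch
import torch.nn as nn

class DepthModel(nn.Module):
    # Sketch of a depth model that returns (F_h, F_l, p) as described above.
    def __init__(self, high_dim=512, low_dim=64):
        super().__init__()
        # f(x; θ): any CNN backbone can be used; a tiny one is assumed here.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, high_dim),
        )
        # First full-connection layer + nonlinear activation: F_h -> F_l.
        self.fc1 = nn.Sequential(nn.Linear(high_dim, low_dim), nn.ReLU())
        # Second full-connection layer: F_l -> prediction probability p.
        self.fc2 = nn.Sequential(nn.Linear(low_dim, 1), nn.Sigmoid())

    def forward(self, x):
        f_h = self.backbone(x)        # high-dimensional feature
        f_l = self.fc1(f_h)           # low-dimensional feature
        p = self.fc2(f_l).squeeze(1)  # prediction probability in (0, 1)
        return f_h, f_l, p
```

The second model in step 103 has the same structure; its third and fourth full-connection layers correspond to fc1 and fc2 here.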
In step 103, a second image is input to the second model, outputting a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability.
Based on the depth model principle, the second image can be input into the second model, which outputs the corresponding second high-dimensional feature. To optimize the second high-dimensional feature, the second model likewise performs feature extraction processing on it, effectively eliminating irrelevant and redundant features and improving task efficiency, to obtain the corresponding second low-dimensional feature, and then makes a prediction according to the second low-dimensional feature to obtain the corresponding second prediction probability. The second prediction probability is a value between 0 and 1: the closer it is to 0, the greater the probability that the second image is a defect-free image; the closer it is to 1, the greater the probability that the second image is a defect image.
In some embodiments, the step of inputting the second image to the second model, outputting the second high-dimensional feature, the second low-dimensional feature, and the second prediction probability may include:
(1) Inputting the second image into a second model for feature extraction to obtain a second high-dimensional feature;
(2) Processing the second high-dimensional feature through a third full-connection layer and a nonlinear activation layer in the second model to obtain a second low-dimensional feature;
(3) And processing the second low-dimensional feature through a fourth full-connection layer in the second model to obtain a second prediction probability.
Referring to fig. 3b, the second image may be input into the second model (depth model 2) for feature extraction to obtain the corresponding second high-dimensional feature, which can be understood with reference to the following formula:

F_h2 = f(x'; θ_2)

where x' is the second image, θ_2 represents the weight parameters of the second model, F_h2 is the second high-dimensional feature, and the f() function represents the feature extraction method.
Feature dimension reduction is performed on the second high-dimensional feature through the third full-connection layer and the nonlinear activation layer in the second model, effectively eliminating irrelevant and redundant features, to obtain the second low-dimensional feature, which can be understood with reference to the following formula:

F_l2 = FC_3(F_h2; φ_2)

where φ_2 represents the parameters of the third full-connection layer and the nonlinear activation layer of the second model, and F_l2 is the second low-dimensional feature.
Further, a fourth full-connection layer may be added to the second model, and the second low-dimensional feature is processed through the fourth full-connection layer to output a second prediction probability between 0 and 1 (i.e., supervision loss 2), which can be understood with reference to the following formula:

p_2 = FC_4(F_l2; ψ_2)

where ψ_2 is the parameter of the fourth full-connection layer, and p_2 is the second prediction probability.
In step 104, a reconstruction loss function is constructed from the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and a contrast loss function is constructed from the first predictive probability and the second predictive probability.
In the embodiment of the present application, since the first model and the second model are untrained, the first prediction probability and the second prediction probability are inaccurate. To achieve accurate probability prediction, the parameters of the first model and the second model need to be optimized by constructing a target loss function. The role of the target loss function is to assess model performance by comparing the models' predicted output with the expected output, so as to find the optimization direction and reduce the difference between the label information and the predicted values.
Thus, in order to construct a better target loss function, the embodiment of the present application may construct a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature. The purpose of the reconstruction loss function is to reduce the information loss between the first high-dimensional feature and the first low-dimensional feature, and between the second high-dimensional feature and the second low-dimensional feature, during feature dimension reduction. The specific implementation process is as follows:
(1) Performing reconstruction processing on the first low-dimensional feature to obtain a first reconstructed feature;
(2) Performing reconstruction processing on the second low-dimensional feature to obtain a second reconstructed feature;
(3) Acquiring a first difference between the first high-dimensional feature and the first reconstructed feature, and acquiring a second difference between the second high-dimensional feature and the second reconstructed feature;
(4) Constructing a reconstruction loss function from the first difference and the second difference.
Reconstruction processing means performing reverse feature reconstruction on a low-dimensional feature, supplementing the corresponding feature dimensions so that the low-dimensional feature is restored back to a high-dimensional feature. Information loss is inevitable when the first model reduces the first high-dimensional feature to the first low-dimensional feature and when the second model reduces the second high-dimensional feature to the second low-dimensional feature; to keep this loss as small as possible, a reconstruction loss function needs to be constructed. The first low-dimensional feature can be reconstructed to obtain a first reconstructed feature whose dimensions have been reversely restored; note that the first reconstructed feature is a high-dimensional feature with the same dimensions as the first high-dimensional feature. Correspondingly, the second low-dimensional feature can be reconstructed to obtain a second reconstructed feature; note that the second reconstructed feature is a high-dimensional feature with the same dimensions as the second high-dimensional feature.
The first high-dimensional feature is the initial high-dimensional feature with complete information, while the first reconstructed feature is obtained by reconstruction from the first low-dimensional feature and therefore carries information loss. Since the reconstruction loss function aims to minimize the information lost when converting the first high-dimensional feature into the first low-dimensional feature, a first difference between the first high-dimensional feature and the first reconstructed feature can be acquired.
Likewise, the second high-dimensional feature is the initial high-dimensional feature with complete information, while the second reconstructed feature is obtained by reconstruction from the second low-dimensional feature and therefore carries information loss. Since the reconstruction loss function aims to minimize the information lost when converting the second high-dimensional feature into the second low-dimensional feature, a second difference between the second high-dimensional feature and the second reconstructed feature can be acquired.
Further, a reconstruction loss function may be constructed from the first difference and the second difference; its purpose is to make the first difference and the second difference, and hence the information loss in the first and second low-dimensional features, as small as possible.
In one embodiment, the step of obtaining a first difference between the first high-dimensional feature and the first reconstructed feature and obtaining a second difference between the second high-dimensional feature and the second reconstructed feature comprises:
(1.1) calculating a first difference value between the first high-dimensional feature and the first reconstructed feature;
(1.2) determining the absolute value of the first difference value as the first difference;
(1.3) calculating a second difference value between the second high-dimensional feature and the second reconstructed feature;
(1.4) determining the absolute value of the second difference value as the second difference.
For a better description of the embodiments of the present application, please refer to the following formula:
L_recons = (|F_h1 - Rec(F_l1)| + |F_h2 - Rec(F_l2)|) / 2
where Rec() represents the reconstruction function that performs reconstruction processing on a low-dimensional feature, Rec(F_l1) represents the first reconstructed feature, Rec(F_l2) represents the second reconstructed feature, and L_recons represents the reconstruction loss function. Thus, the reconstruction loss function L_recons is constructed by calculating the first difference value between the first high-dimensional feature and the first reconstructed feature and taking its absolute value as the first difference, calculating the second difference value between the second high-dimensional feature and the second reconstructed feature and taking its absolute value as the second difference, and then averaging the first difference and the second difference.
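A possible transcription of L_recons into code, under the assumption that Rec() is a small learnable decoder (here a single linear layer per model) mapping the low-dimensional feature back to the high-dimensional space; whether Rec() is shared or separate per model is not fixed by the text above:

```python
import torch.nn as nn

# Assumed decoders implementing Rec(): restore 64-d features to 512-d,
# matching the dimensions assumed in the DepthModel sketch above.
rec1 = nn.Linear(64, 512)
rec2 = nn.Linear(64, 512)

def reconstruction_loss(f_h1, f_l1, f_h2, f_l2):
    # L_recons = (|F_h1 - Rec(F_l1)| + |F_h2 - Rec(F_l2)|) / 2,
    # with the absolute differences averaged over batch and feature dims.
    d1 = (f_h1 - rec1(f_l1)).abs().mean()
    d2 = (f_h2 - rec2(f_l2)).abs().mean()
    return (d1 + d2) / 2
```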
Because the second image is obtained by data enhancement processing of the first image, the label information of the first image and the second image is consistent; however, the second image has been disguised (i.e., data-enhanced), so the results output by the models may differ. In order to make the first prediction probability and the second prediction probability consistent, the embodiment of the present application constructs a contrast loss function according to the first prediction probability and the second prediction probability. The purpose of the contrast loss function is to reduce the difference between the first prediction probability and the second prediction probability, so that the model can identify the first image and the second image alike. The specific implementation process is as follows:
(2.1) calculating a third difference value between the first prediction probability and the second prediction probability;
(2.2) constructing a contrast loss function based on the absolute value of the third difference value.
For a better description of the embodiments of the present application, please refer to the following formula:
L_contras = |p_1 - p_2|
where p_1 represents the first prediction probability, p_2 represents the second prediction probability, and L_contras represents the contrast loss function. The contrast loss function is thus constructed by calculating the third difference value between the first prediction probability and the second prediction probability and taking its absolute value.
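In code this is a one-liner; averaging over a batch is an assumption of the sketch:

```python
def contrast_loss(p1, p2):
    # L_contras = |p_1 - p_2|, averaged over the batch.
    return (p1 - p2).abs().mean()
```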
In step 105, the first model and the second model are iteratively trained based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, to obtain a trained first model and a trained second model.
The supervised loss function is a general loss function constructed based on the difference between the first prediction probability and the label information and the difference between the second prediction probability and the label information. That is, the goal of the supervised loss function is to reduce these two differences so that the outputs of the first model and the second model are as close to the label information as possible.
Therefore, the target loss function can be constructed from the reconstruction loss function, the contrast loss function, and the supervised loss function. Optimizing the target loss function reduces the information loss between the first high-dimensional feature and the first low-dimensional feature and between the second high-dimensional feature and the second low-dimensional feature during feature dimension reduction, reduces the difference between the first prediction probability and the second prediction probability, and reduces the differences between each prediction probability and the label information. Thus, only gradient descent on the target loss function is needed to continuously seek the minimum until the target loss value output by the target loss function converges, at which point training is complete, yielding the trained first model and second model.
In some embodiments, the iterative training of the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and the supervised loss function, to obtain a trained first model and a trained second model, includes:
(1) Constructing a supervised loss function according to the first prediction probability, the second prediction probability, and the label information;
(2) Constructing a target loss function based on the sum of the reconstruction loss function, the contrast loss function, and the supervised loss function;
(3) Performing iterative training on the first model and the second model based on the target loss function until the target loss function converges, to obtain a trained first model and a trained second model.
The supervised loss function may be constructed according to the difference between the first prediction probability and the label information and the difference between the second prediction probability and the label information; further, the reconstruction loss function, the contrast loss function, and the supervised loss function may be summed to construct the target loss function.
Further, a gradient descent algorithm can be run on the target loss function, continuously adjusting the parameters of the first model and the second model according to the output target loss value until it converges, thereby obtaining the trained first model and second model. Through this training, the information loss after reducing the high-dimensional features to low-dimensional features is small, the feature representation capability is stronger, and the output results are more accurate; and through the comparison of the first image and the second image, the models gain broad recognition capability and stronger robustness.
In some embodiments, the step of constructing a supervised loss function based on the first prediction probability, the second prediction probability, and the label information may include:
(1.1) acquiring a third difference between the first prediction probability and the label information;
(1.2) acquiring a fourth difference between the second prediction probability and the label information;
(1.3) constructing a supervised loss function from the third difference and the fourth difference.
For a better description of the embodiments of the present application, please refer to the following formula:
L_supervised1 = -[y·log p_1 + (1 - y)·log(1 - p_1)]   (1)
L_supervised2 = -[y·log p_2 + (1 - y)·log(1 - p_2)]   (2)
L_supervised = (L_supervised1 + L_supervised2) / 2   (3)
where y represents the label information, L_supervised1 represents the third difference, L_supervised2 represents the fourth difference, and L_supervised represents the supervised loss function. Thus, the third difference between the first prediction probability and the label information is obtained by equation (1), the fourth difference between the second prediction probability and the label information is obtained by equation (2), and the supervised loss function is constructed from the third difference and the fourth difference by equation (3).
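Equations (1) to (3) are the standard binary cross-entropy averaged over the two models, so a sketch can simply reuse the library implementation (PyTorch is an assumption of the sketch):

```python
import torch.nn.functional as F

def supervised_loss(p1, p2, y):
    # Eq. (1)/(2): BCE between each model's probability and the label y,
    # then Eq. (3): average the two terms.
    l1 = F.binary_cross_entropy(p1, y.float())
    l2 = F.binary_cross_entropy(p2, y.float())
    return (l1 + l2) / 2
```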
In some embodiments, the step of constructing a target loss function based on the sum of the reconstruction loss function, the contrast loss function, and the supervised loss function includes:
(2.1) determining a target contrast loss function from the contrast loss function and the first weight;
(2.2) determining a target supervisory loss function based on the supervisory loss function and the second weight;
(2.3) constructing a target loss function based on the sum of the reconstructed loss function, the target contrast loss function, and the target supervisory loss function.
For a better description of the embodiments of the present application, please refer to the following formula:
L = L_recons + α·L_contras + β·L_supervised
where α is the first weight, β is the second weight, and L is the target loss function. The sum of the first weight and the second weight may be 1, so as to balance the multiple optimization objectives; the specific values of the first weight and the second weight may be set empirically or according to the optimization objectives, and are not specifically limited herein. Thus, the target contrast loss function α·L_contras is determined from the contrast loss function and the first weight, the target supervised loss function β·L_supervised is determined from the supervised loss function and the second weight, and the reconstruction loss function, the target contrast loss function, and the target supervised loss function are summed to construct the target loss function L.
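Putting the three terms together, one training step might look as follows; this sketch reuses the helpers defined above, and the optimizer, learning rate, and the weights α = β = 0.5 are assumptions (the embodiment leaves them to be set empirically):

```python
import torch

model1, model2 = DepthModel(), DepthModel()
params = (list(model1.parameters()) + list(model2.parameters())
          + list(rec1.parameters()) + list(rec2.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-4)  # assumed optimizer and lr
alpha, beta = 0.5, 0.5                         # assumed weights summing to 1

def train_step(x, y):
    x_prime = random_enhance(x)        # second image from the first image
    f_h1, f_l1, p1 = model1(x)         # first model outputs
    f_h2, f_l2, p2 = model2(x_prime)   # second model outputs
    # L = L_recons + α·L_contras + β·L_supervised
    loss = (reconstruction_loss(f_h1, f_l1, f_h2, f_l2)
            + alpha * contrast_loss(p1, p2)
            + beta * supervised_loss(p1, p2, y))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```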
In step 106, the image to be identified is identified based on the trained first model or second model.
The first model and the second model have been trained so that the information loss after reducing the high-dimensional features to low-dimensional features is small, the feature representation capability is stronger, and the output results are more accurate; and through the comparison of the first image and the second image, the models have broad recognition capability and stronger robustness. Therefore, when the trained first model or second model is used to identify an image to be identified, for which it is unknown whether industrial defects exist, a more accurate identification result can be output. That is, in the embodiment of the present application, only one of the two depth models participating in training needs to be actually deployed, and no extra time overhead is added.
For example, referring to fig. 3c, an image to be identified (test data 1) is input into the trained first model, which outputs the corresponding target prediction probability p, whose value is between 0 and 1.
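Deployment then reduces to a single forward pass through whichever trained model is kept; in this sketch (reusing DepthModel from above) the 0.5 decision threshold is an assumption:

```python
import torch

@torch.no_grad()
def predict(trained_model, image):
    # image: a (3, H, W) tensor of the product surface to be identified.
    trained_model.eval()
    _, _, p = trained_model(image.unsqueeze(0))  # target prediction probability p
    return "defect" if p.item() > 0.5 else "ok"
```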
As can be seen from the above, the embodiment of the present application acquires a first image and a second image; inputs the first image into a first model, which outputs a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability; inputs the second image into a second model, which outputs a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability; constructs a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and constructs a contrast loss function according to the first prediction probability and the second prediction probability; performs iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function, and a supervised loss function, obtaining a trained first model and a trained second model; and identifies an image to be identified based on the trained first model or second model. In this way, a first image and a second image obtained by disguising (data-enhancing) the first image are acquired, a reconstruction loss function between the high-dimensional and low-dimensional features is constructed by design, and a contrast loss function between the first prediction probability output by the first model for the first image and the second prediction probability output by the second model for the second image is combined with a supervised loss function to construct a target loss function for training. This improves the feature representation capability and robustness the models can learn, so that the trained models identify images to be identified with higher accuracy and stronger generalization. Compared with related-art solutions that require manual feature extraction, the embodiment of the present application greatly improves the accuracy of image processing and saves labor cost.
The methods described in connection with the above embodiments are described in further detail below by way of example.
In this embodiment, description will be given taking an example in which the image processing apparatus is specifically integrated in a server.
Referring to fig. 4, fig. 4 is a schematic view of a scenario of an image processing method according to an embodiment of the application. The method flow may include:
in step 201, the server acquires a first image, and performs data enhancement processing on the first image to obtain a second image.
The server acquires a first image from the training data set, where the first image carries corresponding label information. The label information is a binary label, namely 0 or 1, where 0 represents a normal image and 1 represents a defect image, and the label information is manually annotated.
In order to enrich the training data set, data enhancement processing can be performed on the first image to obtain a corresponding expanded second image; the processing modes of the data enhancement processing include at least one of noise increase, image inversion, image rotation, image brightness adjustment and contrast adjustment, so as to obtain the augmented second image.
For example, as shown in fig. 3b, random data enhancement is performed on the first image (input image x); assuming the enhancement mode is adding noise, a second image (the data-enhanced image x') is obtained, which extends the training data and enriches the robustness and generalization of the subsequent model training.
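As an illustrative sketch only (the patent does not prescribe an implementation), this data enhancement step could be written with standard torchvision transforms; the specific probabilities and magnitudes below are assumptions:

```python
import torch
from torchvision import transforms

# Each transform corresponds to one enhancement mode listed above; the
# magnitudes and probabilities are illustrative assumptions, not patent values.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                # image inversion
    transforms.RandomRotation(degrees=15),                 # image rotation
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # brightness/contrast adjustment
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.05 * torch.randn_like(t)).clamp(0, 1)),  # noise increase
])

# x is the first image (e.g., a PIL image); x_prime is the second image.
# x_prime = augment(x)
```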
In step 202, the server inputs a first image into a first model, outputs a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability, inputs a second image into a second model, and outputs a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability.
The first model and the second model are both depth models. Referring to fig. 3b, the first image may be input into the first model (depth model 1) for feature extraction to obtain the corresponding first high-dimensional feature, which may be understood with reference to the following formula:
F_h1 = f(x; θ_1)
where x is the first image, θ_1 represents the weight parameter of the first model, F_h1 is the first high-dimensional feature, and the f() function represents the feature extraction method.
Then, feature dimension reduction is performed on the first high-dimensional feature through the first fully connected layer and the nonlinear activation layer in the first model, effectively eliminating irrelevant and redundant features to obtain the first low-dimensional feature, which may be understood with reference to the following formula:
F_l1 = g(F_h1; θ_fc1)
where θ_fc1 represents the parameters of the first fully connected layer and the nonlinear activation layer of the first model, g() represents their mapping, and F_l1 is the first low-dimensional feature.
Further, a second fully connected layer may be added to the first model; the first low-dimensional feature is processed through the second fully connected layer to output a first prediction probability between 0 and 1 (from which supervision loss 1 in fig. 3b is computed), which may be understood with reference to the following formula:
p_1 = h(F_l1; θ_fc2)
where θ_fc2 is the parameter of the second fully connected layer, h() represents its mapping, and p_1 is the first prediction probability.
The second image is input into the second model (depth model 2) for feature extraction to obtain the corresponding second high-dimensional feature, which may be understood with reference to the following formula:
F_h2 = f(x'; θ_2)
where x' is the second image, θ_2 represents the weight parameter of the second model, F_h2 is the second high-dimensional feature, and the f() function represents the feature extraction method.
Feature dimension reduction is performed on the second high-dimensional feature through the third fully connected layer and the nonlinear activation layer in the second model, effectively eliminating irrelevant and redundant features to obtain the second low-dimensional feature, which may be understood with reference to the following formula:
F_l2 = g(F_h2; θ_fc3)
where θ_fc3 represents the parameters of the third fully connected layer and the nonlinear activation layer of the second model, and F_l2 is the second low-dimensional feature.
Further, a fourth fully connected layer may be added to the second model; the second low-dimensional feature is processed through the fourth fully connected layer to output a second prediction probability between 0 and 1 (from which supervision loss 2 in fig. 3b is computed), which may be understood with reference to the following formula:
p_2 = h(F_l2; θ_fc4)
where θ_fc4 is the parameter of the fourth fully connected layer, and p_2 is the second prediction probability.
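For illustration only, the two depth models can be sketched in PyTorch roughly as follows; the ResNet-18 backbone and the feature dimensions (512 and 64) are assumptions, since the patent does not fix a concrete architecture:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DepthModel(nn.Module):
    """One of the two depth models: backbone f() -> high-dimensional feature F_h,
    fully connected layer + nonlinear activation g() -> low-dimensional feature F_l,
    fully connected layer + sigmoid h() -> prediction probability p."""

    def __init__(self, high_dim=512, low_dim=64):
        super().__init__()
        backbone = resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])          # f(.; theta)
        self.fc_reduce = nn.Sequential(nn.Linear(high_dim, low_dim), nn.ReLU())  # g(.)
        self.fc_pred = nn.Linear(low_dim, 1)                                     # h(.)

    def forward(self, x):
        f_h = self.features(x).flatten(1)                # high-dimensional feature
        f_l = self.fc_reduce(f_h)                        # low-dimensional feature
        p = torch.sigmoid(self.fc_pred(f_l)).squeeze(1)  # prediction probability in (0, 1)
        return f_h, f_l, p

model1, model2 = DepthModel(), DepthModel()  # depth model 1 and depth model 2
```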
In step 203, the server performs reconstruction processing on the first low-dimensional feature to obtain a reconstructed first feature, and performs reconstruction processing on the second low-dimensional feature to obtain a reconstructed second feature. It then calculates a first difference value between the first high-dimensional feature and the first reconstructed feature and determines the absolute value of that difference value as a first difference, calculates a second difference value between the second high-dimensional feature and the second reconstructed feature and determines the absolute value of that difference value as a second difference, and constructs a reconstruction loss function according to the first difference and the second difference.
For a better description of the embodiments of the present application, please refer to the following formula:
L_recons = (|F_h1 - Rec(F_l1)| + |F_h2 - Rec(F_l2)|) / 2
where Rec() represents the reconstruction function used to implement the reconstruction processing of the low-dimensional features, Rec(F_l1) represents the first reconstructed feature, Rec(F_l2) represents the second reconstructed feature, and L_recons represents the reconstruction loss function. The first low-dimensional feature is reconstructed by the reconstruction function to obtain the reconstructed first feature, and the second low-dimensional feature is reconstructed to obtain the reconstructed second feature; the absolute value of the first difference value between the first high-dimensional feature and the first reconstructed feature is determined as the first difference, the absolute value of the second difference value between the second high-dimensional feature and the second reconstructed feature is determined as the second difference, and the reconstruction loss function L_recons is constructed as the average of the first difference and the second difference. The purpose of the reconstruction loss function is to reduce the information loss between the first high-dimensional feature and the first low-dimensional feature, and between the second high-dimensional feature and the second low-dimensional feature, during feature dimension reduction.
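A minimal sketch of this term, assuming Rec() is a single learnable linear layer that maps the low-dimensional feature back to the high-dimensional space and that the absolute differences are averaged over feature elements to yield a scalar (the patent leaves both details open):

```python
import torch.nn as nn

rec = nn.Linear(64, 512)  # Rec(): maps F_l back to the F_h space (assumed linear form)

def reconstruction_loss(f_h1, f_l1, f_h2, f_l2):
    # L_recons = (|F_h1 - Rec(F_l1)| + |F_h2 - Rec(F_l2)|) / 2
    d1 = (f_h1 - rec(f_l1)).abs().mean()  # first difference
    d2 = (f_h2 - rec(f_l2)).abs().mean()  # second difference
    return (d1 + d2) / 2
```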
In step 204, the server calculates a third difference of the first and second predictive probabilities, and constructs a contrast loss function based on an absolute value of the third difference.
For a better description of the embodiments of the present application, please refer to the following formula:
L_contras = |p_1 - p_2|
where p_1 represents the first prediction probability, p_2 represents the second prediction probability, and L_contras represents the contrast loss function. By calculating the third difference value between the first prediction probability and the second prediction probability and constructing the contrast loss function based on its absolute value, the two models are driven to give consistent recognition results for the first image and the second image; the purpose of the contrast loss function is to reduce the difference between the first prediction probability and the second prediction probability.
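In code this term reduces to a single line; averaging over the batch is an assumption:

```python
def contrast_loss(p1, p2):
    # L_contras = |p_1 - p_2|, averaged over the batch
    return (p1 - p2).abs().mean()
```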
In step 205, the server obtains a third difference between the first prediction probability and the label information, obtains a fourth difference between the second prediction probability and the label information, and constructs a supervision loss function based on the third difference and the fourth difference.
For a better description of the embodiments of the present application, please refer to the following formula:
L_supervised1 = -[y * log(p_1) + (1 - y) * log(1 - p_1)]    (1)
L_supervised2 = -[y * log(p_2) + (1 - y) * log(1 - p_2)]    (2)
L_supervised = (L_supervised1 + L_supervised2) / 2    (3)
where y represents the label information, L_supervised1 represents the third difference, L_supervised2 represents the fourth difference, and L_supervised represents the supervision loss function. The third difference between the first prediction probability and the label information is obtained by equation (1), the fourth difference between the second prediction probability and the label information is obtained by equation (2), and the supervision loss function is constructed from the third difference and the fourth difference by equation (3). The purpose of the supervision loss function is to narrow the differences between the first prediction probability and the label information and between the second prediction probability and the label information, so that the results output by the first model and the second model are as close as possible to the label information.
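A sketch of equations (1)-(3); PyTorch's built-in `binary_cross_entropy` computes exactly -[y*log(p) + (1-y)*log(1-p)]:

```python
import torch.nn.functional as F

def supervised_loss(p1, p2, y):
    # Equations (1) and (2): binary cross-entropy against the label y (0 normal, 1 defect)
    l1 = F.binary_cross_entropy(p1, y.float())  # L_supervised1
    l2 = F.binary_cross_entropy(p2, y.float())  # L_supervised2
    return (l1 + l2) / 2                        # Equation (3)
```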
In step 206, the server determines a target contrast loss function from the contrast loss function and the first weight, determines a target supervision loss function from the supervision loss function and the second weight, and constructs a target loss function based on the sum of the reconstruction loss function, the target contrast loss function and the target supervision loss function.
For a better description of the embodiments of the present application, please refer to the following formula:
L = L_recons + αL_contras + βL_supervised
where α is the first weight, β is the second weight, and L is the target loss function. The sum of the first weight and the second weight may be 1, so as to balance the multiple optimization targets; the specific values of the first weight and the second weight may be set empirically or according to the optimization targets, for example a first weight of 0.4 and a second weight of 0.6. Thus, the target contrast loss function αL_contras is determined according to the contrast loss function and the first weight, the target supervision loss function βL_supervised is determined according to the supervision loss function and the second weight, and the reconstruction loss function, the target contrast loss function and the target supervision loss function are then summed to construct the target loss function L.
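Putting the three terms together, with the example weights from the text (α = 0.4, β = 0.6):

```python
def target_loss(l_recons, l_contras, l_supervised, alpha=0.4, beta=0.6):
    # L = L_recons + alpha * L_contras + beta * L_supervised
    return l_recons + alpha * l_contras + beta * l_supervised
```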
In step 207, the server performs iterative training on the first model and the second model based on the target loss function until the target loss function converges, obtaining a trained first model and second model.
Optimizing the target loss function simultaneously reduces the information loss between the first high-dimensional feature and the first low-dimensional feature during feature dimension reduction, the information loss between the second high-dimensional feature and the second low-dimensional feature during feature dimension reduction, the difference between the first prediction probability and the second prediction probability, and the differences between the first prediction probability and the label information and between the second prediction probability and the label information. It therefore suffices to apply a gradient descent method to the target loss function, continuously seeking its minimum, until the target loss value output by the target loss function converges; convergence indicates that training is completed, and the trained first model and second model are obtained.
Therefore, a gradient descent algorithm can be performed based on the target loss function, and the parameters of the first model and the second model are continuously adjusted according to the output target loss value until that value converges, thereby realizing the adjustment of the parameters of the first model and the second model and obtaining the trained first model and second model. Through this training, little information is lost when the high-dimensional features are reduced to the low-dimensional features, so the feature representation capability is stronger and the output result is more accurate; and through the comparison of the first image and the second image, the models acquire a broad recognition capability and stronger robustness.
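A compact sketch of this iterative training, reusing the models and helper functions from the previous sketches; the optimizer choice, learning rate, epoch budget, and the existence of a `loader` yielding (first image, second image, label) batches are all assumptions:

```python
import itertools
import torch

params = itertools.chain(model1.parameters(), model2.parameters(), rec.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3)  # gradient descent over both models and Rec()
num_epochs = 10                               # assumed training budget

for epoch in range(num_epochs):
    for x, x_prime, y in loader:              # loader: (first image, second image, label) batches
        f_h1, f_l1, p1 = model1(x)            # first model on the first image
        f_h2, f_l2, p2 = model2(x_prime)      # second model on the second image
        loss = target_loss(
            reconstruction_loss(f_h1, f_l1, f_h2, f_l2),
            contrast_loss(p1, p2),
            supervised_loss(p1, p2, y),
        )
        optimizer.zero_grad()
        loss.backward()                       # adjust the parameters of both models
        optimizer.step()
    # in practice, stop once the output target loss value converges
```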
In step 208, the server identifies the image to be identified based on the trained first model or second model.
Through the training of the first model and the second model, little information is lost when the high-dimensional features are reduced to the low-dimensional features, so the feature representation capability is stronger and the output result is more accurate; and through the comparison of the first image and the second image, the models acquire a broad recognition capability and stronger robustness.
With continued reference to fig. 3c, the image to be identified (test data 1) is input into the trained first model, and a corresponding target prediction probability p between 0 and 1 is output. In the embodiment of the present application, following the label convention above (0 for a normal image, 1 for a defect image), when the target prediction probability p is greater than or equal to 0.5, the prediction result leans toward a defect, and the image to be identified may be determined to be a defect image; when the target prediction probability p is less than 0.5, the prediction result leans toward normal, and the image to be identified may be determined to be a normal image. An accurate defect quality inspection function is thereby realized, providing reliable technical support for industrial AI defect quality detection.
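A sketch of this deployment-time decision rule; only the trained first model is used, and the 0.5 threshold follows the text:

```python
import torch

@torch.no_grad()
def identify(model, image):
    # image: a 3 x H x W tensor for a single image to be identified
    model.eval()
    _, _, p = model(image.unsqueeze(0))  # target prediction probability in (0, 1)
    # Label convention: 0 = normal, 1 = defect, so p >= 0.5 indicates a defect image.
    return "defect" if p.item() >= 0.5 else "normal"
```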
From the above, the embodiment of the present application obtains a first image and a second image; inputs the first image into a first model and outputs a first high-dimensional feature, a first low-dimensional feature and a first prediction probability; inputs the second image into a second model and outputs a second high-dimensional feature, a second low-dimensional feature and a second prediction probability; constructs a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and constructs a contrast loss function according to the first prediction probability and the second prediction probability; performs iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, so as to obtain a trained first model and a trained second model; and identifies an image to be identified based on the trained first model or second model. In this way, a first image and a second image obtained by data enhancement of the first image are acquired, a reconstruction loss function between the high-dimensional features and the low-dimensional features is designed, and a contrast loss function, between the first prediction probability output by the first model for the first image and the second prediction probability output by the second model for the second image, is combined with a supervision loss function to construct the target loss function for training. This improves the feature representation capability and the robustness that the models can learn, so the accuracy and generalization of the trained model in identifying the image to be identified are also stronger; compared with the related-art technical solutions that require manual feature extraction, the embodiment of the present application greatly improves the accuracy of image processing and saves labor cost.
In order to better implement the image processing method provided by the embodiment of the present application, the embodiment of the present application also provides an apparatus based on the image processing method. The meanings of the terms are the same as those in the image processing method described above; for specific implementation details, reference may be made to the description in the method embodiments.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, where the image processing apparatus may include an obtaining unit 301, a first output unit 302, a second output unit 303, a first constructing unit 304, a second constructing unit 305, an identifying unit 306, and the like.
An acquiring unit 301, configured to acquire a first image and a second image, where the second image is an image obtained by performing data enhancement processing according to the first image.
In some embodiments, the obtaining unit 301 is configured to:
acquiring a first image;
performing data enhancement processing on the first image to obtain a second image;
the processing mode of the data enhancement processing comprises at least one of noise increase, image inversion, image rotation, image brightness adjustment and contrast adjustment.
The first output unit 302 is configured to input the first image into the first model, and output a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability.
In some embodiments, the first output unit 302 is configured to:
inputting the first image into a first model for feature extraction to obtain a first high-dimensional feature;
processing the first high-dimensional feature through a first full-connection layer and a nonlinear activation layer in the first model to obtain a first low-dimensional feature;
and processing the first low-dimensional feature through a second full-connection layer in the first model to obtain a first prediction probability.
A second output unit 303 for inputting the second image to the second model and outputting a second high-dimensional feature, a second low-dimensional feature and a second prediction probability.
A first construction unit 304 is configured to construct a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature, and construct a contrast loss function according to the first prediction probability and the second prediction probability.
In some embodiments, the first building element 304 comprises:
the first reconstruction subunit is used for carrying out reconstruction processing on the first low-dimensional feature to obtain a first reconstructed feature after reconstruction processing;
the second reconstruction subunit is used for performing reconstruction processing on the second low-dimensional feature to obtain a reconstructed second feature;
An acquisition subunit configured to acquire a first difference between the first high-dimensional feature and the first reconstructed feature, and acquire a second difference between the second high-dimensional feature and the second reconstructed feature;
a first construction subunit for constructing a reconstruction loss function from the first difference and the second difference;
and the second construction subunit is used for constructing a contrast loss function according to the first prediction probability and the second prediction probability.
In some embodiments, the acquisition subunit is configured to:
calculating a first difference of the first high-dimensional feature and the first reconstructed feature;
determining an absolute value of the first difference as a first difference;
calculating a second difference of the second high-dimensional feature and the second reconstructed feature;
the absolute value of the second difference is determined as the second difference.
In some embodiments, the second building subunit is configured to:
calculating a third difference between the first predictive probability and the second predictive probability;
a contrast loss function is constructed based on the absolute value of the third difference.
The second construction unit 305 is configured to perform iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, thereby obtaining a trained first model and second model.
In some embodiments, the second building unit 305 comprises:
the third construction subunit is used for constructing a supervision loss function according to the first prediction probability, the second prediction probability and the label information;
a fourth construction subunit for constructing a target loss function based on the sum of the reconstructed loss function, the comparison loss function, and the supervised loss function;
and the training subunit is used for carrying out iterative training on the first model and the second model based on the target loss function until the target loss function converges to obtain a trained first model and a trained second model.
In some embodiments, the third building subunit is configured to:
acquiring a third difference between the first prediction probability and the label information;
acquiring a fourth difference between the second prediction probability and the label information;
and constructing a supervision loss function according to the third difference and the fourth difference.
In some embodiments, the fourth building subunit is configured to:
determining a target contrast loss function according to the contrast loss function and the first weight;
determining a target supervision loss function according to the supervision loss function and the second weight;
and constructing a target loss function based on the sum of the reconstruction loss function, the target contrast loss function and the target supervision loss function.
The identifying unit 306 is configured to identify the image to be identified based on the trained first model or the trained second model.
The specific implementation of each unit can be referred to the previous embodiments, and will not be repeated here.
As can be seen from the above, in the embodiment of the present application, the acquisition unit 301 acquires a first image and a second image; the first output unit 302 inputs the first image into a first model and outputs a first high-dimensional feature, a first low-dimensional feature and a first prediction probability; the second output unit 303 inputs the second image into a second model and outputs a second high-dimensional feature, a second low-dimensional feature and a second prediction probability; the first construction unit 304 constructs a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and constructs a contrast loss function according to the first prediction probability and the second prediction probability; the second construction unit 305 performs iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, obtaining a trained first model and second model; and the identification unit 306 identifies an image to be identified based on the trained first model or second model. In this way, a first image and a second image obtained by data enhancement of the first image are acquired, a reconstruction loss function between the high-dimensional features and the low-dimensional features is designed, and a contrast loss function, between the first prediction probability output by the first model for the first image and the second prediction probability output by the second model for the second image, is combined with a supervision loss function to construct the target loss function for training. This improves the feature representation capability and the robustness that the models can learn, so the accuracy and generalization of the trained model in identifying the image to be identified are also stronger; compared with the related-art technical solutions that require manual feature extraction, the embodiment of the present application greatly improves the accuracy of image processing and saves labor cost.
The embodiment of the application also provides a computer device, as shown in fig. 6, which shows a schematic structural diagram of a server according to the embodiment of the application, specifically:
the computer device may include one or more processors 401 of a processing core, memory 402 of one or more computer readable storage media, a power supply 403, and an input unit 404, among other components. Those skilled in the art will appreciate that the computer device structure shown in FIG. 6 is not limiting of the computer device and may include more or fewer components than shown, or may be combined with certain components, or a different arrangement of components. Wherein:
processor 401 is the control center of the computer device and connects the various parts of the entire computer device using various interfaces and lines to perform various functions of the computer device and process data by running or executing software programs and/or modules stored in memory 402 and invoking data stored in memory 402, thereby performing overall monitoring of the computer device. Optionally, processor 401 may include one or more processing cores; alternatively, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and modules, and the processor 401 executes various functional applications and data processing by executing the software programs and modules stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the server, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The computer device further comprises a power supply 403 for supplying power to the various components. Optionally, the power supply 403 may be logically connected to the processor 401 through a power management system, so that functions such as charging, discharging and power-consumption management are implemented by the power management system. The power supply 403 may also include one or more of a direct-current or alternating-current power supply, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The computer device may also include an input unit 404, which input unit 404 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 401 in the computer device loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement the various method steps provided in the foregoing embodiment, as follows:
acquiring a first image and a second image, wherein the second image is an image obtained by performing data enhancement processing according to the first image; inputting the first image into a first model, and outputting a first high-dimensional feature, a first low-dimensional feature and a first prediction probability; inputting the second image into a second model, and outputting a second high-dimensional feature, a second low-dimensional feature and a second prediction probability; constructing a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and constructing a contrast loss function according to the first prediction probability and the second prediction probability; performing iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, so as to obtain a trained first model and a trained second model; and identifying an image to be identified based on the trained first model or second model.
Each of the foregoing embodiments has its own emphasis; for portions of an embodiment that are not described in detail, reference may be made to the detailed description of the image processing method above, which is not repeated herein.
As can be seen from the foregoing, the computer device of the embodiment of the present application may acquire a first image and a second image; input the first image into a first model and output a first high-dimensional feature, a first low-dimensional feature and a first prediction probability; input the second image into a second model and output a second high-dimensional feature, a second low-dimensional feature and a second prediction probability; construct a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and construct a contrast loss function according to the first prediction probability and the second prediction probability; perform iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, so as to obtain a trained first model and a trained second model; and identify an image to be identified based on the trained first model or second model. In this way, a first image and a second image obtained by data enhancement of the first image are acquired, a reconstruction loss function between the high-dimensional features and the low-dimensional features is designed, and a contrast loss function, between the first prediction probability output by the first model for the first image and the second prediction probability output by the second model for the second image, is combined with a supervision loss function to construct the target loss function for training. This improves the feature representation capability and the robustness that the models can learn, so the accuracy and generalization of the trained model in identifying the image to be identified are also stronger; compared with the related-art technical solutions that require manual feature extraction, the embodiment of the present application greatly improves the accuracy of image processing and saves labor cost.
Those of ordinary skill in the art will appreciate that all or part of the steps of the various methods of the above embodiments may be completed by instructions, or by instructions controlling associated hardware; the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer readable storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the image processing methods provided by the embodiments of the present application. For example, the instructions may perform the steps of:
acquiring a first image and a second image, wherein the second image is an image obtained by performing data enhancement processing according to the first image; inputting the first image into a first model, and outputting a first high-dimensional feature, a first low-dimensional feature and a first prediction probability; inputting the second image into a second model, and outputting a second high-dimensional feature, a second low-dimensional feature and a second prediction probability; constructing a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and constructing a contrast loss function according to the first prediction probability and the second prediction probability; performing iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, so as to obtain a trained first model and a trained second model; and identifying an image to be identified based on the trained first model or second model.
According to one aspect of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the various alternative implementations provided in the above embodiments.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
The computer-readable storage medium may include: a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Because the instructions stored in the computer readable storage medium may execute the steps in any one of the image processing methods provided in the embodiments of the present application, the beneficial effects that any one of the image processing methods provided in the embodiments of the present application can achieve are detailed in the previous embodiments, and are not described herein.
The foregoing has described in detail the image processing method, apparatus and computer-readable storage medium provided by the embodiments of the present application. Specific examples have been used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is only intended to aid in understanding the method and core idea of the present application. Meanwhile, since those skilled in the art may make variations to the specific implementations and the application scope in light of the idea of the present application, the content of this description should not be construed as limiting the present application.

Claims (13)

1. An image processing method, comprising:
acquiring a first image and a second image, wherein the second image is an image obtained by performing data enhancement processing according to the first image;
inputting the first image into a first model, and outputting a first high-dimensional feature, a first low-dimensional feature and a first prediction probability;
inputting the second image into a second model, and outputting a second high-dimensional feature, a second low-dimensional feature and a second prediction probability;
constructing a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and constructing a contrast loss function according to the first prediction probability and the second prediction probability;
performing iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, so as to obtain a trained first model and a trained second model;
and identifying the image to be identified based on the trained first model or second model.
2. The image processing method of claim 1, wherein the constructing a reconstruction loss function from the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature, and the second low-dimensional feature comprises:
performing reconstruction processing on the first low-dimensional feature to obtain a reconstructed first feature;
performing reconstruction processing on the second low-dimensional feature to obtain a reconstructed second feature;
acquiring a first difference between the first high-dimensional feature and the first reconstruction feature, and acquiring a second difference between the second high-dimensional feature and the second reconstruction feature;
and constructing a reconstruction loss function according to the first difference and the second difference.
3. The image processing method according to claim 2, wherein the acquiring a first difference between the first high-dimensional feature and the first reconstructed feature and acquiring a second difference between the second high-dimensional feature and the second reconstructed feature, comprises:
calculating a first difference of the first high-dimensional feature and a first reconstructed feature;
determining an absolute value of the first difference as a first difference;
calculating a second difference of the second high-dimensional feature and a second reconstructed feature;
and determining the absolute value of the second difference value as a second difference.
4. A method of image processing according to any one of claims 1 to 3, wherein said constructing a contrast loss function from said first and second prediction probabilities comprises:
Calculating a third difference between the first prediction probability and the second prediction probability;
and constructing a contrast loss function based on the absolute value of the third difference value.
5. The image processing method according to any one of claims 1 to 4, wherein the performing iterative training on the first model and the second model based on the target loss function constructed jointly from the reconstruction loss function, the contrast loss function and the supervision loss function, to obtain the trained first model and second model, includes:
constructing a supervision loss function according to the first prediction probability, the second prediction probability and the label information;
constructing a target loss function based on the sum of the reconstruction loss function, the contrast loss function and the supervision loss function;
and carrying out iterative training on the first model and the second model based on the target loss function until the target loss function converges, so as to obtain a trained first model and second model.
6. The image processing method according to claim 5, wherein the constructing a supervision loss function according to the first prediction probability, the second prediction probability and the label information includes:
acquiring a third difference between the first prediction probability and the label information;
acquiring a fourth difference between the second prediction probability and the label information;
and constructing a supervision loss function according to the third difference and the fourth difference.
7. The image processing method according to claim 5, wherein the constructing an objective loss function based on the sum of the reconstruction loss function, the contrast loss function, and the supervised loss function includes:
determining a target contrast loss function according to the contrast loss function and the first weight;
determining a target supervision loss function according to the supervision loss function and the second weight;
and constructing a target loss function based on the sum of the reconstruction loss function, the target contrast loss function and the target supervision loss function.
8. The image processing method according to any one of claims 1 to 7, wherein the acquiring the first image and the second image includes:
acquiring a first image;
performing data enhancement processing on the first image to obtain a second image;
the processing mode of the data enhancement processing comprises at least one of noise increase, image inversion, image rotation, image brightness adjustment and contrast adjustment.
9. The image processing method according to any one of claims 1 to 8, wherein the inputting the first image into the first model, outputting the first high-dimensional feature, the first low-dimensional feature, and the first prediction probability, comprises:
Inputting the first image into a first model for feature extraction to obtain a first high-dimensional feature;
processing the first high-dimensional feature through a first full-connection layer and a nonlinear activation layer in the first model to obtain a first low-dimensional feature;
and processing the first low-dimensional features through a second full-connection layer in the first model to obtain a first prediction probability.
10. An image processing apparatus, comprising:
the acquisition unit is used for acquiring a first image and a second image, wherein the second image is an image obtained by performing data enhancement processing according to the first image;
a first output unit for inputting the first image to a first model, outputting a first high-dimensional feature, a first low-dimensional feature, and a first prediction probability;
a second output unit for inputting the second image to a second model, outputting a second high-dimensional feature, a second low-dimensional feature, and a second prediction probability;
the first construction unit is used for constructing a reconstruction loss function according to the first high-dimensional feature, the first low-dimensional feature, the second high-dimensional feature and the second low-dimensional feature, and constructing a contrast loss function according to the first prediction probability and the second prediction probability;
the second construction unit is used for performing iterative training on the first model and the second model based on a target loss function constructed jointly from the reconstruction loss function, the contrast loss function and a supervision loss function, to obtain a trained first model and a trained second model;
and the identification unit is used for identifying the image to be identified based on the trained first model or second model.
11. A computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the image processing method of any of claims 1 to 9.
12. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the image processing method according to any of claims 1 to 9 when the computer program is executed.
13. A computer program product comprising a computer program or instructions which, when executed by a processor, carries out the steps of the image processing method according to any one of claims 1 to 9.
CN202310091688.8A 2023-01-29 2023-01-29 Image processing method, device and computer readable storage medium Pending CN116977751A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310091688.8A CN116977751A (en) 2023-01-29 2023-01-29 Image processing method, device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116977751A true CN116977751A (en) 2023-10-31

Family

ID=88475515

Country Status (1)

Country Link
CN (1) CN116977751A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination