
CN119150925A - Method and system for generative adversarial network architecture search based on hybrid convolution operation - Google Patents

Method and system for generative adversarial network architecture search based on hybrid convolution operation

Info

Publication number
CN119150925A
CN119150925A
Authority
CN
China
Prior art keywords
discriminator
architecture
candidate
generator
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411673668.2A
Other languages
Chinese (zh)
Other versions
CN119150925B (en)
Inventor
邹雨枫
薛羽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202411673668.2A priority Critical patent/CN119150925B/en
Publication of CN119150925A publication Critical patent/CN119150925A/en
Application granted granted Critical
Publication of CN119150925B publication Critical patent/CN119150925B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method and a system for generative adversarial network (GAN) architecture search based on a hybrid convolution operation. The method first designs a broad search space, then constructs a supernet that merges all candidate operations and, through continuous relaxation, makes the supernet fully differentiable during training so that it can be trained efficiently by gradient descent; finally, the best-performing operation at each position is selected to obtain the optimal architecture. Unlike traditional architecture search methods, the invention introduces a novel hybrid convolution operation into the search space and accelerates the search process by integrating a partial channel attention mechanism. The resulting GAN architecture search enhances the network's ability to capture long-range information and speeds up architecture search; compared with traditional search methods it is more efficient, performs better, and finds the optimal neural network architecture for a given task more quickly.

Description

Method and system for generative adversarial network architecture search based on hybrid convolution operation
Technical Field
The invention relates to the technical field of deep neural networks, in particular to a method and a system for generative adversarial network architecture search based on a hybrid convolution operation.
Background
A generative adversarial network (GAN) is a deep learning model consisting of a generator and a discriminator. Through their adversarial process it automatically learns the complex characteristics of data and thereby improves the quality of the generated data; it has provided new ideas and tools for many applications and achieved remarkable results in computer vision, image generation, data augmentation, style transfer, speech synthesis and other fields. Traditional GAN architectures are designed manually by researchers through continual trial and error, which requires rich expert experience and considerable labor cost. The advent of neural architecture search technology has transformed GAN architecture design.
Neural architecture search is a method for automatically designing neural network architectures: it uses search algorithms to discover high-performance architectures within a predefined search space. It greatly reduces labor cost and time investment, so that even non-experts can design efficient GAN architectures. Neural architecture search not only improves the efficiency of GAN architecture design but can also discover innovative architectures that are difficult to reach by manual design, further improving the quality of generated data and the performance of the model. However, while conventional neural architecture search methods can achieve good results in GAN architecture search, they suffer from problems such as low search efficiency and huge consumption of computing resources.
To further advance the application of neural architecture search to GAN architecture design, many new and improved approaches have emerged in recent years, such as introducing evolutionary computation into neural architecture search. Evolutionary computation simulates the process of biological evolution, gradually optimizing the network architecture through selection, crossover, mutation and similar operations. Combining evolutionary computation with neural architecture search makes it possible to find well-performing GAN architectures in a wider search space. However, this approach still has drawbacks: low efficiency in high-dimensional search spaces, a tendency to fall into locally optimal solutions with no guarantee of finding the globally optimal architecture, and search spaces that usually consist of pure convolution operations, leaving wider spaces unexplored and failing to fully unlock the model's ability to capture long-range information. It is therefore necessary to propose a differentiable architecture search method that introduces a hybrid convolution operation, which can improve the efficiency of the architecture search and enhance long-range information capture.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides a method and a system for GAN architecture search based on a hybrid convolution operation.
The method comprises the following steps:
Step 1, designing a search space for GAN architecture search and defining candidate operations, which include two types, ordinary candidate operations and upsampling candidate operations, thereby determining the scope of the architecture search;
Step 2, constructing a GAN supernet containing all candidate architectures according to the search space designed in step 1, and randomly assigning each candidate operation an architecture parameter, a continuous value in the range 0 to 1 representing the probability that the operation is selected;
Step 3, adding a partial channel attention mechanism to the edges carrying the ordinary candidate operations of the supernet, so as to compute attention weights for the different channels;
Step 4, according to the feature map output of each candidate operation and the computed channel attention weights, weighting the important channels and features in the feature map so that the outputs of different operations emphasize important features, thereby dynamically adjusting and optimizing the architecture parameters of the candidate operations;
Step 5, training the GAN supernet by gradient descent, optimizing the supernet's architecture parameters and weight parameters according to the gradient information;
and Step 6, alternately executing step 4 and step 5 until a preset maximum number of cycles is reached, then selecting on each edge of the supernet the candidate operation with the largest architecture parameter weight to obtain the optimal network architecture.
In step 1, the search space includes N interconnected and skip-connected units. Each unit is regarded as a directed acyclic graph composed of M nodes and contains D upsampling candidate operation nodes and E ordinary candidate operation nodes, where N, M, D and E are natural numbers.
A hybrid convolution operation is added to the ordinary candidate operations. It combines a convolution operation with a self-attention operation: the input feature map is projected through three shared 1×1 convolutions, producing three intermediate feature maps, which are fed to a convolution path and a self-attention path respectively. In the self-attention path each intermediate feature map is divided into h groups; each group contains one feature map from each 1×1 convolution, serving as the query, key and value for multi-head attention computation. In the convolution path with kernel size k, a lightweight fully connected layer generates k² feature maps, and the input feature maps are processed by shift and aggregation operations, collecting information from a local receptive field as in a conventional convolution. The final output y is obtained as the weighted sum of the outputs of the convolution path and the self-attention path:

y = α · F_att + β · F_conv,

where α and β are the hybrid convolution operation weights and F_att and F_conv denote the outputs of the self-attention path and the convolution path.
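As an illustration of this operation, the following is a minimal PyTorch sketch that blends a self-attention path and a convolution path built from shared 1×1 projections. It is a sketch under stated assumptions, not the patented implementation: the class name HybridConv is invented here, and a plain k×k convolution stands in for the patent's fully-connected shift-and-aggregate convolution path.

```python
import torch
import torch.nn as nn

class HybridConv(nn.Module):
    """Sketch of a hybrid convolution: shared 1x1 projections feed a
    convolution path and a multi-head self-attention path, whose outputs
    are blended by learnable weights alpha and beta.
    Assumes `channels` is divisible by `heads`."""
    def __init__(self, channels, k=3, heads=4):
        super().__init__()
        self.heads = heads
        # Three shared 1x1 projections (query / key / value).
        self.proj_q = nn.Conv2d(channels, channels, 1)
        self.proj_k = nn.Conv2d(channels, channels, 1)
        self.proj_v = nn.Conv2d(channels, channels, 1)
        # Convolution path: a plain k x k convolution over the three
        # projections, standing in for the lightweight fully connected
        # layer plus shift-and-aggregate described in the patent.
        self.conv_path = nn.Conv2d(3 * channels, channels, k, padding=k // 2)
        self.alpha = nn.Parameter(torch.tensor(0.5))  # attention-path weight
        self.beta = nn.Parameter(torch.tensor(0.5))   # convolution-path weight

    def forward(self, x):
        b, c, h, w = x.shape
        q, k_, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
        # Self-attention path: split channels into `heads` groups.
        d = c // self.heads
        qh = q.view(b, self.heads, d, h * w)
        kh = k_.view(b, self.heads, d, h * w)
        vh = v.view(b, self.heads, d, h * w)
        attn = torch.softmax(qh.transpose(-2, -1) @ kh / d ** 0.5, dim=-1)
        out_att = (vh @ attn.transpose(-2, -1)).view(b, c, h, w)
        # Convolution path reuses the same three projections.
        out_conv = self.conv_path(torch.cat([q, k_, v], dim=1))
        # Weighted sum of the two paths: y = alpha * F_att + beta * F_conv.
        return self.alpha * out_att + self.beta * out_conv
```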
In step 2, the edges of the GAN supernet consist of skip-connection edges, edges carrying the upsampling candidate operations, and edges carrying the ordinary candidate operations; the skip connections and upsampling candidate operations have T operation choices, and the ordinary candidate operations have Y operation choices;
the supernet comprises S subnets, where S is a natural number; a subnet is a network that retains exactly one operation on every connected edge;
when the supernet is initialized, the architecture parameters of the candidate operations on each edge are assigned by random distribution, the value representing the probability of the candidate operation being selected; for example, each connection is given a weight drawn from a continuous uniform distribution over 0 to 1. Random initialization breaks the symmetry of the network, so that in the early stage of training each node learns the characteristics of the input data in a different way, which helps prevent the network from falling into a locally optimal solution.
The supernet is the collective representation of all candidate architectures in the search space, while a subnet represents one particular architecture selected from that space; every subnet is part of the supernet. Traditionally, input information is processed by a single operation (e.g., a convolution) to generate a new feature map. In a supernet, however, the input passes through multiple operations such as dynamic separable convolution, dilated convolution and skip connection. The feature maps produced by these operations are fused by element-wise addition, finally yielding a new mixed feature map. During training, the supernet automatically selects and adjusts the weights of the candidate operations through learning and optimization, and so finally determines the optimal combination of operations and network structure.
In the invention, the upsampling and skip-connection edges of the supernet have T candidate operations, including nearest-neighbor interpolation upsampling, bilinear interpolation upsampling and the like, and the ordinary candidate operations number Y, including 1×1 convolution, 1×1 hybrid convolution, 3×3 separable convolution, 3×3 hybrid convolution, 5×5 hybrid convolution and the like.
The supernet described in the invention is a network structure comprising multiple subnets, but the network itself is discrete, which directly prevents gradient-based optimization methods from being used for the search. To overcome this difficulty, the invention adopts a continuous relaxation strategy that converts the originally discrete candidate operations into a continuous, differentiable representation, so that optimization algorithms such as gradient descent can be used for the search.
The continuous relaxation proceeds as follows. For each node pair (i, j) in the generator supernet, all defined candidate operations are represented uniformly as a set O, and a group of architecture parameters α_o^(i,j) corresponding to O is introduced, where o denotes an operation in the candidate set, o ∈ O. The larger the value of α_o^(i,j), the higher the probability that operation o is selected on the connection from node i to node j. The architecture parameters are initialized as trainable continuous variables and converted into a probability distribution by applying a softmax function, so that the probability of each candidate operation being selected is proportional to its architecture parameter value. Specifically, the output ō^(i,j)(x) of the connection from node i to node j is computed as:

ō^(i,j)(x) = Σ_{o∈O} [ exp(α_o^(i,j)) / Σ_{o'∈O} exp(α_{o'}^(i,j)) ] · o(x),

where p_o^(i,j) = exp(α_o^(i,j)) / Σ_{o'∈O} exp(α_{o'}^(i,j)) is the probability distribution obtained from the architecture parameters through the softmax function, o(x) is the result of applying operation o to the input x, o' is a temporary index ranging over the candidate operations, and exp is the exponential function with base e, a natural constant. p_o^(i,j) indicates the relative likelihood that the specific operation o is selected on the connection from node i to node j.
Through these operations, the originally discrete candidate operation selection problem is converted into a continuous optimization problem, and the optimal generative adversarial network architecture can then be found by optimizing the architecture parameters. By iteratively updating the architecture parameters and computing the gradient of the loss function, the weight of each operation is adjusted step by step so that the performance of the whole generator supernet on the training set becomes optimal. Finally, the optimal operation between each node pair is selected according to the values of the architecture parameters, yielding the searched optimal architecture.
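For concreteness, here is a minimal sketch of one relaxed supernet edge, assuming a DARTS-style mixed operation; the class name MixedEdge and its interface are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class MixedEdge(nn.Module):
    """One supernet edge: a softmax over architecture parameters blends
    the outputs of all candidate operations (continuous relaxation)."""
    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)
        # One trainable architecture parameter per candidate operation,
        # randomly initialized in [0, 1); softmax makes them probabilities.
        self.arch_params = nn.Parameter(torch.rand(len(candidate_ops)))

    def forward(self, x):
        weights = torch.softmax(self.arch_params, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

    def best_op(self):
        # After search: keep the operation with the largest parameter.
        return self.ops[int(self.arch_params.argmax())]
```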
In step 3, a partial channel attention mechanism is added after each ordinary convolution operation node, improving search efficiency while reducing computing resource consumption. Specifically, when computing the feature map, the input feature map is first passed through an average pooling layer and a max pooling layer respectively, yielding two different feature maps and achieving compression of the channel dimension and extraction of important features. The resulting feature maps are then passed through two fully connected layers FC1 and FC2 that apply linear transformations to the pooled features: the first fully connected layer FC1 reduces the channel count to a fraction of the original and introduces non-linearity through a ReLU activation, and the second fully connected layer FC2 restores the channel count to its original size. Finally, the outputs from average pooling and max pooling are added and the result is normalized so that the weights of all channels sum to 1, giving the attention weights of the different channels.
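A minimal sketch of this attention computation follows. Two details are assumptions, since the patent text does not fix them: the reduction ratio (the hypothetical parameter reduction=4) and the use of softmax as the normalization that makes the channel weights sum to 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Sketch of the partial channel attention weights: average and max
    pooling, shared fully connected layers FC1/FC2 with a ReLU between
    them, then a normalization so the per-channel weights sum to 1."""
    def __init__(self, channels, reduction=4):  # reduction ratio assumed
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)  # compress
        self.fc2 = nn.Linear(channels // reduction, channels)  # restore

    def forward(self, x):                                # x: (B, C, H, W)
        avg = F.adaptive_avg_pool2d(x, 1).flatten(1)     # (B, C)
        mx = F.adaptive_max_pool2d(x, 1).flatten(1)      # (B, C)
        score = (self.fc2(F.relu(self.fc1(avg)))
                 + self.fc2(F.relu(self.fc1(mx))))       # add both branches
        return torch.softmax(score, dim=1)               # weights sum to 1
```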
In step 4, the attention weights of the channels in the partial channel attention mechanism are first summed to obtain a total vector representing the weight of each channel, each element of which represents the importance of the corresponding channel. The channels whose weights rank in the top q·C of the total vector are then selected, where q is the scaling factor of the partial channel attention mechanism. Finally, the channels processed by the weighting operation are written back to the corresponding positions of the original feature map, forming a new output feature map. Let the input feature map be X; the attention weights of the partial channel attention mechanism be A, with A_i the weight of the i-th channel; the number of channels be C; the architecture parameter weights of the candidate operations be W, with W_j the weight of the j-th operation; the operation list be O, with O_j the j-th operation; f_att(·) the computation of the partial channel attention mechanism; topq() the channel selection operation; and X', S, X_imp and X̃ intermediate variables. Step 4 then specifically comprises the following steps:
Step 4.1, compute the attention weights of the partial channel attention mechanism:
A = f_att(X);
Step 4.2, weight the input feature map:
X' = A ⊙ X;
Step 4.3, compute the total weight vector S by summing the attention weights of each channel:
S = Σ A;
Step 4.4, select the top q·C channels according to the weight of each channel; since the number of channels is C, the parameters of the channel selection are:
indices = topq(S, q·C);
Step 4.5, extract the important channel data, where the indices parameter specifies the channel indices to extract:
X_imp = X'[:, indices];
where the first symbol ':' indicates that all samples are selected and indices indicates that all features in the specified channels are retained during extraction, ensuring complete feature information;
Step 4.6, apply the weighting operation to the important channels:
X̃ = Σ_j W_j · O_j(X_imp);
Step 4.7, splice the computed important channels with the original channels, with indices specifying the channel indices to update:
X[:, indices] = X̃,
where the first symbol ':' indicates that all samples are selected and indices indicates that, during the update, all features in the top q·C channels specified by indices are replaced, ensuring the updated channels are intact.
Through these steps, the partial channel attention mechanism effectively reduces the consumption of computing resources and strengthens the model's response to key information; a code sketch of steps 4.1 to 4.7 follows.
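The sketch below strings steps 4.1 to 4.7 together. The function name partial_channel_update and the arguments ops and arch_w are illustrative stand-ins for the operation list O and the architecture weights W, and the candidate operations are assumed to accept the reduced channel count.

```python
import torch

def partial_channel_update(x, attn, ops, arch_w, q=0.25):
    """Sketch of steps 4.1-4.7: weight the input by channel attention,
    pick the top q*C channels, apply the architecture-weighted candidate
    operations to them, and write the result back in place.
    x: (B, C, H, W); attn: (B, C) per-channel attention weights."""
    b, c = x.shape[:2]
    x_w = attn.view(b, c, 1, 1) * x                 # step 4.2: weight input
    total = attn.sum(dim=0)                         # step 4.3: total vector S
    k = max(1, int(q * c))
    indices = torch.topk(total, k).indices          # step 4.4: top-q channels
    x_imp = x_w[:, indices]                         # step 4.5: extract channels
    weights = torch.softmax(arch_w, dim=0)
    # step 4.6: architecture-weighted sum of the candidate operations,
    # each assumed to map (B, k, H, W) -> (B, k, H, W).
    x_imp = sum(w * op(x_imp) for w, op in zip(weights, ops))
    out = x.clone()
    out[:, indices] = x_imp                         # step 4.7: splice back
    return out
```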
Step 5 comprises the following steps:
Step 5.1, initializing the training parameters, including the learning rates of the generator and the discriminator, the network parameter weights, the optimizer parameters, the hybrid convolution operation weights α and β, and a fixed noise vector;
Step 5.2, carrying out adversarial training of the generator and the discriminator, and computing the losses through the discriminator from the images generated by the generator and the real images;
Step 5.3, back-propagating with the discriminator and generator losses, updating the discriminator and generator weights, and optimizing the architecture parameters of the candidate operations;
Step 5.4, adjusting the hybrid convolution operation weights α and β according to the images generated during training and the feedback of the discriminator.
Step 5.1 comprises:
Step 5.1.1, setting the learning rates of the generator and the discriminator, lr_G and lr_D respectively, and creating an optimizer with preset parameters to ensure that the network parameters can be adjusted during optimization.
Step 5.1.2, initializing the hybrid convolution operation weights α and β to 0.5, used as importance weights controlling the output of each path of the hybrid convolution operation, so that the network's feature extraction can be adjusted flexibly.
Step 5.1.3, generating a fixed noise vector fixed_z for evaluating the generator's performance during training, ensuring that the evaluation results of each training iteration are consistent.
Step 5.2 comprises:
Step 5.2.1, the generator randomly samples a noise vector z from the noise distribution, typically a uniform or normal distribution, to ensure that the generator can explore a diversified latent space, and generates a fake image G(z) for subsequent comparison and discrimination against the real image. The generative adversarial network consists of a generator and a discriminator; the generator is the component that produces images;
Step 5.2.2, the discriminator receives the real image x and the fake image G(z) and computes the outputs D(x) and D(G(z)) by forward propagation. D(x) and D(G(z)) denote the discrimination results for the real image and the fake image respectively, both taking values in [0, 1]; the higher the value, the more likely the discriminator considers the input to be real. The discriminator aims to recognize real and fake images as well as possible, judging the real image as real, i.e. pushing D(x) towards 1, and the fake image as fake, i.e. pushing D(G(z)) towards 0. The generator's goal is to make its fake images deceive the discriminator, i.e. to push D(G(z)) towards 1. In this way the generator and the discriminator carry out adversarial training with these two conflicting goals.
Step 5.2.3, computing the discriminator loss, which uses the logarithmic loss function L_D with the formula:

L_D = −E_{x∼p_data(x)}[log D(x)] − E_{z∼p_z(z)}[log(1 − D(G(z)))],

where L_D takes values in [0, +∞), E denotes expectation, x∼p_data(x) means the real image x follows the real data distribution p_data, and z∼p_z(z) means the random noise z follows the noise distribution p_z. The goal of the discriminator is to minimize L_D, i.e. to maximize the recognition rate for real data and the rejection rate for generated data;

Step 5.2.4, computing the generator loss, which uses the logarithmic loss function L_G with the formula:

L_G = −E_{z∼p_z(z)}[log D(G(z))],

where L_G takes values in [0, +∞). The goal of the generator is to minimize L_G, making D(G(z)) as close to 1 as possible, i.e. maximizing the probability that the generated image is judged real by the discriminator.
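These two losses can be computed directly from the discriminator outputs. The sketch below assumes outputs already squashed into (0, 1), e.g. by a sigmoid, and adds a small epsilon purely for numerical stability, which is not part of the formulas above.

```python
import torch

def gan_losses(d_real, d_fake, eps=1e-8):
    """Sketch of the logarithmic losses above; d_real = D(x) and
    d_fake = D(G(z)) are discriminator outputs in (0, 1)."""
    loss_d = -(torch.log(d_real + eps) + torch.log(1 - d_fake + eps)).mean()
    loss_g = -torch.log(d_fake + eps).mean()
    return loss_d, loss_g
```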
Step 5.3 comprises:
Step 5.3.1, fixing the generator's network parameter weights θ_G, keeping the generator unchanged, computing the gradient ∇_{θ_D} L_D of the discriminator loss L_D by back-propagation, and updating the discriminator weight parameters θ_D by gradient descent with the discriminator learning rate lr_D:

θ_D ← θ_D − lr_D · ∇_{θ_D} L_D,

which completes one weight update; the operation is repeated according to the discriminator update frequency, n_D times in all;

Step 5.3.2, fixing the discriminator's network parameter weights θ_D, keeping the discriminator unchanged, computing the gradient ∇_{θ_G} L_G of the generator loss L_G by back-propagation, and updating the generator weights θ_G by gradient descent with the generator learning rate lr_G:

θ_G ← θ_G − lr_G · ∇_{θ_G} L_G,

which completes one weight update;
Step 5.3.3, optimizing the candidate operation architecture parameter weights W; the output of each edge over its candidate operations is computed as:

Output = Σ_{i=1}^{tol} W_i · O_i(X),

where X is the input feature map, tol is the total number of candidate operations, W_i is the architecture parameter weight of the i-th operation, and O_i(X) is the output of the i-th operation;

the gradient ∇_{W_i} L of the loss function L with respect to each candidate operation weight W_i is computed with the chain rule, and the candidate operation architecture parameter weights are updated by gradient descent:

W_i ← W_i − lr · ∇_{W_i} L,

finally completing the optimization of the candidate operation architecture parameters;
Step 5.3.4, dynamically adjusting the learning rates of the discriminator and the generator with the adaptive moment estimation (Adam) algorithm according to the optimizer settings:
parameters are updated by computing moving averages of the first and second moments of the gradient, with the update formula:

θ_t = θ_{t−1} − lr · m̂_t / (√v̂_t + ε),

where θ_t is the weight parameter to be updated at the current time step, the subscript t denotes the current time step or iteration number, lr is the learning rate (lr_G or lr_D as appropriate), m̂_t is the exponentially weighted moving average of the current gradient, v̂_t is the exponentially weighted moving average of the squared gradient, and ε is a tiny constant preventing division by zero. In this way the step size of the weight update is dynamically optimized while each network keeps its own learning rate, indirectly adjusting the learning rate and improving the training effect and convergence speed of the generator and the discriminator;
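A compact sketch of the alternating updates in steps 5.3.1 and 5.3.2 follows; opt_d and opt_g are assumed to be Adam optimizers holding only the discriminator's and the generator's parameters respectively, so each step realizes the moment-based rule above for its own network.

```python
import torch

def train_step(G, D, x_real, z, opt_g, opt_d, n_d=1, eps=1e-8):
    """Sketch of steps 5.3.1-5.3.2: n_d discriminator updates with the
    generator frozen (via detach), then one generator update."""
    for _ in range(n_d):                      # step 5.3.1: update D
        opt_d.zero_grad()
        d_real, d_fake = D(x_real), D(G(z).detach())
        loss_d = -(torch.log(d_real + eps)
                   + torch.log(1 - d_fake + eps)).mean()
        loss_d.backward()
        opt_d.step()
    opt_g.zero_grad()                         # step 5.3.2: update G
    loss_g = -torch.log(D(G(z)) + eps).mean()
    loss_g.backward()
    opt_g.step()                              # only generator params stored
    return loss_d.item(), loss_g.item()
```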
Step 5.3.5, monitoring and recording the Inception Score (IS) and the Fréchet Inception Distance (FID) during training. Both are common performance indicators for evaluating generative adversarial networks: IS evaluates the quality and diversity of the generated images and their inter-class separation, while FID evaluates the similarity between the generated images and the real images in feature space.
In step 6, the network weight parameters and the architecture parameters are continuously optimized by gradient descent, so that the supernet gradually approaches optimal performance; finally, the candidate operation with the largest weight is selected according to the distribution of the architecture parameter weights, constructing the optimal network architecture.
At each iteration, the importance of the different operations is dynamically adjusted based on the feature map output of each candidate operation and the corresponding attention weights, making the training of the GAN supernet more efficient. At the same time, the candidate operation with the largest weight on each edge of the supernet is selected to form the optimal network architecture. This process ensures that, while the architecture is optimized, the partial attention mechanism is fully exploited to improve the expressive power and generation quality of the model, achieving better performance at lower computational complexity.
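Using the MixedEdge sketch from earlier, deriving the final architecture then reduces to an argmax per edge; the dictionary layout below is an assumption about how edges might be stored, not the patent's data structure.

```python
def derive_architecture(supernet_edges):
    """Step 6 sketch: keep, on every edge, the candidate operation whose
    architecture parameter weight is largest (see MixedEdge.best_op)."""
    return {name: edge.best_op() for name, edge in supernet_edges.items()}
```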
The invention also provides a system for GAN architecture search based on a hybrid convolution operation, comprising:
a building module for building a GAN supernet and assigning each operation in the supernet an importance weight, a random continuous value in the range 0 to 1 representing the probability that the operation is selected as the optimal operation;
a hybrid convolution module for integrating ordinary convolution operations with a self-attention mechanism to obtain a new operation that combines the advantages of both;
a training module for training the network weights of the supernet with the training data and its labels;
and an optimization module for optimizing the supernet architecture parameters by gradient descent.
By introducing the hybrid convolution operation, the invention significantly expands the search space of the traditional purely convolutional neural network, enhances the model's ability to capture long-range information, and improves the generation quality of the GAN. In addition, by incorporating the partial channel attention mechanism, the Matthew effect caused by the randomness of architecture parameter initialization is effectively mitigated. This effect manifests as operations that happen to receive high random initial values being continually reinforced in the subsequent process, while operations with low initial values may be ignored during the architecture search, so that potentially excellent architectures are missed and the search becomes unstable. The approach achieves better results than prior methods.
The beneficial effect of the invention is a hybrid-convolution-based architecture search algorithm for generative adversarial networks. The algorithm introduces into the search space a hybrid convolution combining convolution and self-attention, effectively expanding the traditional convolutional search space and improving the model's ability to capture long-range information. Before the search, continuous relaxation makes the originally discrete search space continuous, so gradient descent can be used directly, greatly improving search efficiency. During the search, a partial channel attention mechanism is introduced into the ordinary candidate operations; attention weights are assigned to the different candidate operations, and the channels with the highest attention weights are selected to participate in subsequent computation, reducing computing resource consumption and improving search efficiency. The hybrid-convolution-based GAN architecture search algorithm brings new ideas and breakthroughs to the field of architecture search and has substantial practical value and application scenarios.
Drawings
FIG. 1 is a diagram of an overall framework of a search space in the present invention.
Fig. 2 is a schematic diagram of the hybrid convolution operation in the present invention.
FIG. 3 is a schematic diagram of a portion of the channel attention mechanism in the present invention.
FIG. 4 is a graph of the IS score trend during architecture optimization in the present invention.
FIG. 5 is a graph of the FID score trend during architecture optimization in the present invention.
Detailed Description
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.
As shown in fig. 1, the present embodiment provides a method for generative adversarial network architecture search based on a hybrid convolution operation, comprising the following steps:
Step 1, first design the search space of the GAN generator architecture as shown in fig. 1. The search space consists of 3 interconnected and skip-connected units, and the interior of each unit can be regarded as a directed acyclic graph of 5 interconnected nodes. Gray arrows indicate upsampling operations and black arrows indicate ordinary operations. The upsampling candidate operations are set to {transposed convolution, nearest-neighbor interpolation upsampling, bilinear interpolation upsampling}, and the ordinary candidate operations are set to {1×1 convolution, 3×3 convolution, 5×5 convolution, 3×3 separable convolution, 5×5 separable convolution, 7×7 separable convolution, 1×1 hybrid convolution, 3×3 hybrid convolution, 5×5 hybrid convolution}.
The hybrid convolution operation is shown in fig. 2. First, an input feature map of height, width and channel count H, R, C is projected through three 1×1 convolutions to obtain three projected intermediate feature maps, which are divided into h groups, where h denotes the number of attention heads; 3h intermediate feature maps are obtained, and these projected feature maps are used for the convolution path and the self-attention path respectively. The upper part of fig. 2 shows the computation of the convolution path, where k denotes the convolution kernel size of the path, taken as 1, 3 and 5 in the invention, corresponding to the hybrid convolution operations listed above. In the convolution path, a lightweight fully connected layer generates k² feature maps; the features produced by shift and aggregation are processed in a convolution-like manner, collecting information from the local receptive field as in a conventional convolution, and the path finally outputs a feature map. The lower half of fig. 2 shows the self-attention path, which divides the intermediate features into h groups, each containing three feature maps, one from each 1×1 convolution, acting as query, key and value respectively and following the conventional multi-head self-attention computation. Finally, the output feature map y is computed from the two trainable parameters α and β:

y = α · F_att + β · F_conv.
Step 2, construct the generator supernet according to the search space shown in fig. 1, randomly initialize the supernet, assign each candidate operation a random architecture parameter between 0 and 1, and ensure that the architecture parameter values at each node sum to 1. Input noise with 128 channels is randomly generated and fed to the supernet to generate an image; the supernet feature maps all flow unidirectionally from front to back.
Step 3, add a partial channel attention mechanism after the nodes carrying ordinary candidate operations, as shown in fig. 3, so that the input feature map passes through an average pooling layer and a max pooling layer respectively, yielding two different feature maps, compressing the channel count and extracting important features. The resulting feature maps are passed through two fully connected layers with a ReLU unit between them to apply linear transformations to the pooled features, and the attention scores are finally output after normalization.
Step 4, sum the attention weights of each channel in the attention mechanism to obtain a vector of total channel weights, each element of which represents the total importance of the corresponding channel. Then select the top q·C channels from the total weight vector to participate in the weighting computation, and finally splice the q·C processed channels with the remaining (1−q)·C channels of the original feature map to form a new output feature map. Let the input feature map be X, the channel attention weights be A, the number of channels be 128, the weight of each operation be W, and the operation list be O. This can be expressed as:
Step 4.1, compute the channel weights: A = f_att(X);
Step 4.2, weight the input feature map: X' = A ⊙ X;
Step 4.3, compute the total weight vector: S = Σ A;
Step 4.4, select the most important channels: indices = topq(S, q·C);
Step 4.5, extract the important channel data: X_imp = X'[:, indices];
Step 4.6, weight the important channels: X̃ = Σ_j W_j · O_j(X_imp);
Step 4.7, splice the computed important channels with the original channels: X[:, indices] = X̃.
Step 5, carry out the search of the GAN generator architecture on the configured supernet, specifically comprising the following steps:
Step 5.1, initialize the training parameters, including the learning rates and network parameters of the generator and the discriminator, the optimizer parameters, the hybrid convolution operation weights α and β, and a fixed noise vector:
Step 5.1.1, set the learning rates of the generator and the discriminator to 0.0002 and create an adaptive moment estimation (Adam) optimizer with these parameters to ensure that the network parameters can be adjusted during optimization.
Step 5.1.2, initialize the hybrid convolution operation weights α and β to 0.5, used as importance weights controlling the output of each path of the hybrid convolution operation, so that feature extraction can be adjusted flexibly.
Step 5.1.3, generate a fixed noise vector fixed_z of dimension 128, used to evaluate the generator's performance during training and to ensure that the evaluation results of each training iteration are consistent.
Step 5.2, carry out adversarial training of the generator and the discriminator, computing the losses through the discriminator from the generated and real images:
Step 5.2.1, the generator randomly samples a 128-dimensional noise vector z from the noise distribution; this process typically uses a uniform or normal distribution to ensure that the generator can explore a diversified latent space. The generator network then maps the noise vector z through a multi-layer neural network to produce a fake image G(z) for subsequent comparison and discrimination against the real image.
Step 5.2.2, the discriminator receives the real image x and the generated fake image G(z) and computes the outputs D(x) and D(G(z)) by forward propagation, where D(x) denotes the discriminator's judgment of the real image and D(G(z)) that of the generated image. The discriminator's goal is to distinguish the real image from the generated image as accurately as possible, strengthening the model's understanding of the real data distribution.
Step 5.2.3, compute the discriminator loss L_D, a logarithmic loss designed to measure the discriminator's ability to classify real and fake images. Penalizing the discriminator's output on the real image x encourages it to recognize real data better, while taking the logarithm of its output on the generated image G(z) strengthens its recognition of generated images. Optimizing this loss improves the discriminator's accuracy, ensuring that it can effectively separate real samples from generated ones during training.
Step 5.2.4, compute the generator loss L_G with the logarithmic loss function; it reflects the ability of the generator's fake images to deceive the discriminator. The generator's loss decreases when its fake images successfully make the discriminator regard them as coming from the real data distribution. By optimizing this loss the generator continually adjusts its generation strategy and enhances the realism of the generated images, making them more visually convincing and improving the generation effect. In this process the generator's goal is to minimize its loss so that D(G(z)) is as close to 1 as possible, thereby improving the quality of the generated images.
Step 5.3, back-propagate with the discriminator and generator losses, update the discriminator and generator weights, and optimize the network architecture parameters:
Step 5.3.1, update the discriminator weight parameters: fix the generator's parameters and keep the generator unchanged, compute the gradient of the discriminator loss with respect to the discriminator weights by back-propagation using the chain rule, and update the discriminator weights by gradient descent, multiplying the computed gradient by the learning rate and subtracting it from the current weights; repeat this weight update n_D times to ensure the discriminator's discriminative power is sufficiently trained and balance is maintained between the generator and the discriminator.
Step 5.3.2, update the generator weight parameters: fix the discriminator's parameters and keep the discriminator unchanged, compute the gradient of the generator loss with respect to the generator weights by back-propagation using the chain rule, and update the generator weights by gradient descent, multiplying the computed gradient by the learning rate and subtracting it from the current weights, completing one weight update.
Step 5.3.3, optimize the network architecture parameters, selecting the best operation by adjusting the weights of the different candidate operations in the supernet. At this stage, the weights of the candidate operations in the supernet are dynamically adjusted according to the current training progress and network performance indicators. By evaluating the influence of the different operations on the performance of the generator and the discriminator, the best-performing operation is selected, further improving the model's generation and discrimination capabilities.
Step 5.3.4, dynamically adjust the learning rates of the discriminator and the generator according to the optimizer settings, gradually adapting the learning rate to the model's convergence speed based on the current training stage and performance.
Step 5.3.5, monitor and record the performance indicators and training parameters during training.
Step 5.4, adjust the hybrid convolution operation weights α and β according to the images generated during training and the feedback of the discriminator.
Step 6, adopt the differentiable search method, optimize the supernet by gradient, and stop the search after 100 iterations. At that point, according to the architecture parameters of the candidate operations at each node, the operation with the largest weight is selected as the final architecture, specifically:
{normal operations [[1×1 convolution, 3×3 hybrid convolution], [5×5 hybrid convolution, 3×3 separable convolution, 3×3 separable convolution], [7×7 separable convolution, 3×3 hybrid convolution]], upsampling operations [[transposed convolution, transposed convolution], [nearest-neighbor interpolation upsampling]], skip connections of length 1 [[transposed convolution, bilinear interpolation upsampling]], skip connection of length 2 [transposed convolution]}. Here 1×1, 3×3 and so on denote the convolution kernel size; the contents of each bracket are the operation combination of one unit, the units are arranged in numbering order, and the operations within a unit are arranged in node numbering order. The length of a skip connection refers to the span of the connection: the skip connections of length 1 are the connections from unit 1 to unit 2 and from unit 2 to unit 3, and the skip connection of length 2 is the connection from unit 1 to unit 3.
This embodiment adds a hybrid convolution operation integrating self-attention and convolution on top of the traditional convolutional neural network, effectively expanding the search space for GAN architectures and improving the network's ability to capture long-range information. In addition, the partial channel attention mechanism introduced in this embodiment effectively reduces computing resource consumption and improves the supernet's ability to select key operations. The embodiment was tested and its performance validated on two common datasets, CIFAR-10 and STL-10. CIFAR-10 contains 60,000 images in 10 categories, of which 50,000 are used for training the network and 10,000 for testing, all of size 32×32. STL-10 contains 13,000 images in 10 categories of size 96×96, providing higher resolution than CIFAR-10; in this embodiment they are resized to 48×48 to balance performance. The embodiment needs only 2 hours on an RTX 4090 GPU to find an excellent network architecture, far faster than non-gradient algorithms, greatly improving search speed. On this basis, the discovered architecture achieves a better generation effect and can produce rich and lifelike images. The performance of the searched network architecture when optimized on the standard dataset CIFAR-10 is shown in fig. 4 and fig. 5:
The Inception Score (IS) and Fréchet Inception Distance (FID) in the figures are performance metrics commonly used for generative adversarial networks. The IS score in fig. 4 evaluates the diversity and inter-class separation of the generated images; the score reflects the quality and diversity of the generated images, so higher is better: a higher IS indicates that the generated images are not only visually diverse but also more convincing. The FID score in fig. 5 measures the similarity between the generated and real images in feature space, so lower is better: a lower FID indicates that the feature distribution of the generated images is closer to that of the real images, meaning the generated samples are more realistic at the feature level. The method of this embodiment discovers well-performing architectures in a very short time, and these architectures quickly converge to optimal performance when retrained.
The present invention provides a method and a system for generative adversarial network architecture search based on a hybrid convolution operation, and there are many ways to realize the technical scheme. The above is only a preferred embodiment of the invention; it should be noted that those skilled in the art can make several improvements and modifications without departing from the principles of the invention, and these improvements and modifications should also be regarded as within the protection scope of the invention. Components not explicitly described in this embodiment can be implemented with the prior art.

Claims (10)

1.基于混合卷积操作的生成对抗网络架构搜索方法,其特征在于,包括以下步骤:1. A method for searching a generative adversarial network architecture based on a hybrid convolution operation, comprising the following steps: 步骤1,设计生成对抗网络架构搜索的搜索空间,定义候选操作,所述候选操作包括普通候选操作和上采样候选操作两种类型,确定架构搜索的范围;Step 1, designing a search space for a generative adversarial network architecture search, defining candidate operations, which include two types: ordinary candidate operations and upsampling candidate operations, and determining the scope of the architecture search; 步骤2,根据步骤1中设计的搜索空间构建一个包含所有候选架构的生成对抗网络超网,并使用0到1范围内的连续数值随机为每个候选操作赋予一个架构参数,来表示每个候选操作被选中的概率;Step 2: construct a generative adversarial network supernet containing all candidate architectures based on the search space designed in step 1, and randomly assign an architecture parameter to each candidate operation using a continuous value ranging from 0 to 1 to represent the probability of each candidate operation being selected; 步骤3,在超网的普通候选操作所在边中加入部分通道注意力机制,以计算不同通道的注意力权重;Step 3: Add a partial channel attention mechanism to the edge where the common candidate operations of the supernet are located to calculate the attention weights of different channels; 步骤4,根据每个候选操作的特征图输出和计算得到的通道注意力权重,加权调整特征图中的重要通道和特征,使得不同操作的输出更加突出重要特征,从而动态地调整和优化候选操作的架构参数;Step 4: According to the feature map output of each candidate operation and the calculated channel attention weight, the important channels and features in the feature map are weighted and adjusted so that the outputs of different operations highlight the important features, thereby dynamically adjusting and optimizing the architecture parameters of the candidate operations; 步骤5,使用梯度下降方法训练生成对抗网络超网,根据梯度信息优化超网架构参数与权重参数;Step 5: Use the gradient descent method to train the generated adversarial network supernet, and optimize the supernet architecture parameters and weight parameters according to the gradient information; 步骤6,交替执行步骤4和步骤5,直至达到预先设置的最大循环次数,选择超网每条边中架构参数权重最大的候选操作,得到最优网络架构。Step 6: Alternately execute steps 4 and 5 until the preset maximum number of cycles is reached, and select the candidate operation with the largest architecture parameter weight in each edge of the supernet to obtain the optimal network architecture. 2.根据权利要求1所述的方法,其特征在于,步骤1中,所述搜索空间包括N个相互连接 与跳连接的单元,每个单元被看作一个由M个节点组成的有向无环图,单元内包含D个上采 样候选操作节点和E个普通候选操作节点,其中,N、M、D、E均为自然数; 2. 
The method according to claim 1, characterized in that, in step 1, the search space includes N interconnected and skip-connected units, each unit is regarded as a directed acyclic graph consisting of M nodes, and the unit contains D upsampling candidate operation nodes and E ordinary candidate operation nodes, wherein , N, M, D, E are all natural numbers; 在普通候选操作中加入了混合卷积操作;所述混合卷积操作同时结合了卷积操作与自 注意力操作,使输入特征图通过三个共享的1×1卷积进行投影,从而生成三份中间特征图, 再将投影后的特征图分别用于卷积路径和自注意力路径;在自注意力路径中,每份中间特 征图被分为h组,每组包含来自每个1×1卷积的三个特征图,分别作为查询、键和值,进行多 头注意力计算;在卷积核大小为k的卷积路径中,通过轻量级的全连接层生成k²个特征图, 并通过平移和聚合操作处理输入特征图;最终,来自卷积路径和自注意力路径的输出按权 重加和,得到最终输出A hybrid convolution operation is added to the normal candidate operation; the hybrid convolution operation combines the convolution operation and the self-attention operation at the same time, so that the input feature map is projected through three shared 1×1 convolutions to generate three intermediate feature maps, and then the projected feature maps are used in the convolution path and the self-attention path respectively; in the self-attention path, each intermediate feature map is divided into h groups, each group contains three feature maps from each 1×1 convolution, which are used as query, key and value for multi-head attention calculation; in the convolution path with a convolution kernel size of k, k² feature maps are generated through a lightweight fully connected layer, and the input feature map is processed by translation and aggregation operations; finally, the outputs from the convolution path and the self-attention path are summed according to the weights to obtain the final output : 其中𝛼与𝛽表示混合卷积操作权重。Where 𝛼 and 𝛽 represent the weights of the mixed convolution operation. 3.根据权利要求2所述的方法,其特征在于,步骤2中,所述生成对抗网络超网共有条边,其中条为跳连接的边,条为上采样候选操作所在边,条为普通候选操作所在边;跳连接与上采样候选操作具有T个操作选择,普通候选操 作具有Y个操作选择; 3. The method according to claim 2, characterized in that in step 2, the generative adversarial network supernet has edge, where The edges are skip connections, The edge is where the upsampling candidate operation is located. The strips are edges where common candidate operations are located; the skip connection and upsampling candidate operations have T operation options, and the common candidate operations have Y operation options; 所述超网包括S个子网,其中S为自然数,,子网表示所有连接 边上仅有一个操作的网络; The supernet includes S subnets, where S is a natural number. , a subnet represents a network with only one operation on all connected edges; 超网初始化时,将会对每条边上的候选操作的架构参数采用随机分布赋值,数值大小表示候选操作被选中的概率。When the supernet is initialized, the architectural parameters of the candidate operations on each edge will be assigned values using a random distribution, and the value indicates the probability of the candidate operation being selected. 4.根据权利要求3所述的方法,其特征在于,步骤3中,将部分通道注意力机制加入到每 个普通卷积操作节点之后,在提升搜索效率的同时减少计算资源消耗,具体包括:在计算特 征图时,首先使输入特征图分别通过一个平均池化层与一个最大池化层,得到两份不同的 特征图并实现通道数的压缩与重要特征的提取;然后将得到的特征图通过两个全连接层 FC1与FC2来进一步对池化后的特征进行线性变化,其中第一全连接层FC1将通道数减少至原 通道数的,并通过 ReLU 激活函数引入非线性,再通过第二全连接层FC2 将通道数恢复到 原大小;最后将来自平均池化和最大池化的输出相加,并对获得的输出进行归一化,使得所 有通道的权重和为 1,得到不同通道的注意力权重。 4. 
The method according to claim 3 is characterized in that, in step 3, after adding the partial channel attention mechanism to each ordinary convolution operation node, the search efficiency is improved while reducing the consumption of computing resources, specifically comprising: when calculating the feature map, firstly, the input feature map is passed through an average pooling layer and a maximum pooling layer respectively to obtain two different feature maps and realize the compression of the number of channels and the extraction of important features; then the obtained feature map is passed through two fully connected layers FC 1 and FC 2 to further linearly change the pooled features, wherein the first fully connected layer FC 1 reduces the number of channels to 1/2 of the original number of channels. , and nonlinearity is introduced through the ReLU activation function, and the number of channels is restored to its original size through the second fully connected layer FC 2 ; finally, the outputs from average pooling and maximum pooling are added, and the obtained output is normalized so that the sum of the weights of all channels is 1, and the attention weights of different channels are obtained. 5.根据权利要求4所述的方法,其特征在于,步骤4中,首先对部分通道注意力机制中每 个通道的注意力权重求和,得到一个表示每个通道权重的总向量,所述总向量的每个元素 表示对应通道的重要性;然后从所述总向量中选择权重排在前的通道参与加权操作计算, 其中q表示部分通道注意力机制选取的比例系数,最后再将加权操作计算过的通道与原特 征图中对应的位置进行更新,形成新的输出特征图,设定输入特征图为X,部分通道注意力 机制的注意力权重为A,表示第i个通道的具体权重大小,通道数为C,每个候选操作的架 构参数权重为W,表示第j个操作的架构参数权重,操作列表为O,表示第j个操作,为部分通道注意力机制的计算,topq()为通道选择操作,均为中间变量,则步骤4具体包括如下步骤: 5. The method according to claim 4, characterized in that in step 4, firstly, the attention weights of each channel in the partial channel attention mechanism are summed to obtain a total vector representing the weight of each channel, and each element of the total vector represents the importance of the corresponding channel; then, the weighted channels are selected from the total vector. The channels of the weighted operation are calculated, where q represents the proportional coefficient selected by the partial channel attention mechanism. Finally, the weighted operation is calculated. The corresponding positions of the channel and the original feature map are updated to form a new output feature map. The input feature map is set to X, and the attention weight of the partial channel attention mechanism is set to A. represents the specific weight of the i-th channel, the number of channels is C, and the architectural parameter weight of each candidate operation is W. represents the architectural parameter weight of the jth operation, the operation list is O, represents the jth operation, is the calculation of the partial channel attention mechanism, topq() is the channel selection operation, , , , are all intermediate variables, then step 4 specifically includes the following steps: 步骤4.1,计算部分通道注意力机制的注意力权重:Step 4.1, calculate the attention weights of the partial channel attention mechanism: 步骤4.2,加权输入特征图:Step 4.2, weighted input feature map: 步骤4.3,计算总权重向量Step 4.3, calculate the total weight vector : 步骤4.4,根据每个通道的权重大小选择位于前的通道,因通道数为C,所以通道选择 内的参数为Step 4.4, select the first channel according to the weight of each channel. 
5. The method according to claim 4, characterized in that, in step 4, the attention weights of each channel in the partial channel attention mechanism are first summed to obtain a total vector representing the weight of each channel, each element of the total vector representing the importance of the corresponding channel; then the channels whose weights rank in the top q·C are selected from the total vector to take part in the weighted operation, where q denotes the proportion coefficient chosen by the partial channel attention mechanism; finally, the channels processed by the weighted operation are written back to the corresponding positions of the original feature map to form a new output feature map. Let the input feature map be X, the attention weights of the partial channel attention mechanism be A, with A_i the weight of the i-th channel, the number of channels be C, the architecture parameter weight of each candidate operation be W, with W_j the architecture parameter weight of the j-th operation, the operation list be O, with O_j the j-th operation, PCA(·) denote the computation of the partial channel attention mechanism, topq(·) denote the channel selection operation, and X′, V, X_top, and X_w be intermediate variables; step 4 then specifically includes the following steps (illustrated in the sketch after this claim):

Step 4.1, compute the attention weights of the partial channel attention mechanism:
A = PCA(X)

Step 4.2, weight the input feature map:
X′ = A ⊙ X

Step 4.3, compute the total weight vector V by summing the attention weight of each channel;

Step 4.4, select the top-ranked channels according to the weight of each channel; since the number of channels is C, the parameter inside the channel selection is q·C:
indices = topq(V, q·C)

Step 4.5, extract the data of the important channels, where the indices parameter specifies the channel indices to be extracted:
X_top = X′[:, indices, :, :]
where the first symbol ":" selects all samples, and the trailing symbols ":" retain all features in the channels specified by indices during data extraction;

Step 4.6, apply the weighted operation to the important channels:
X_w = Σ_j W_j · O_j(X_top)

Step 4.7, splice the computed important channels back into the original channels, where the indices parameter specifies the channel indices to be updated:
X′[:, indices, :, :] = X_w
where the first symbol ":" selects all samples, and the trailing symbols ":" retain all features in the top q·C channels specified by indices during the data update.
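Steps 4.2 through 4.7 amount to routing only the highest-weighted channels through the architecture-weighted mix of candidate operations. The sketch below assumes a proportion coefficient q = 0.5 and two toy channel-preserving candidate operations; the per-channel weights could come from a module such as the PartialChannelAttention sketch above:

```python
import torch
import torch.nn as nn

def partial_channel_mixed_op(x, attn_weights, ops, arch_w, q=0.5):
    """Steps 4.3-4.7: route only the top-q fraction of channels through the
    architecture-weighted candidate operations, leaving the rest untouched.

    x:            input feature map, shape (B, C, H, W)
    attn_weights: per-channel attention weights, shape (B, C)   (step 4.1)
    ops:          list of candidate operations O_j
    arch_w:       architecture parameter weights W_j, shape (len(ops),)
    """
    b, c, h, w = x.shape
    x = x * attn_weights.view(b, c, 1, 1)      # step 4.2: weight the input
    total = attn_weights.sum(dim=0)            # step 4.3: total weight vector
    k = max(1, int(q * c))                     # step 4.4: top q*C channels
    indices = torch.topk(total, k).indices
    x_top = x[:, indices, :, :]                # step 4.5: extract channels
    # Step 4.6: weighted sum of candidate operations on the selected channels.
    mixed = sum(wj * op(x_top) for wj, op in zip(arch_w, ops))
    out = x.clone()
    out[:, indices, :, :] = mixed              # step 4.7: write back
    return out

# Example with two toy candidate operations that preserve channel count.
ops = [nn.Identity(), nn.AvgPool2d(3, stride=1, padding=1)]
arch_w = torch.softmax(torch.randn(len(ops)), dim=0)
x = torch.randn(2, 8, 16, 16)
attn = torch.softmax(torch.randn(2, 8), dim=1)
y = partial_channel_mixed_op(x, attn, ops, arch_w)
```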
6. The method according to claim 5, characterized in that step 5 includes:

Step 5.1, initialize the training parameters, which include the learning rates and network parameter weights of the generator and the discriminator, the optimizer parameters, the hybrid convolution operation weights α and β, and a fixed noise vector;

Step 5.2, conduct adversarial training of the generator and the discriminator, and compute the losses through the discriminator from the images generated by the generator and the real images;

Step 5.3, back-propagate using the discriminator and generator losses, update the discriminator and generator weights, and optimize the architecture parameters of the candidate operations;

Step 5.4, adjust the weights α and β of the hybrid convolution operation according to the images generated during training and the feedback of the discriminator.

7. The method according to claim 6, characterized in that step 5.2 includes:

Step 5.2.1, the generator G randomly samples a noise vector z from the noise distribution and generates a fake image G(z) through the generator; the generative adversarial network consists of a generator and a discriminator, the generator being the component used to generate images;

Step 5.2.2, the discriminator D receives the real image x and the fake image G(z), and computes the outputs D(x) and D(G(z)) by forward propagation, where D(x) and D(G(z)) denote the discrimination results for the real image and for the fake image respectively, both taking values in [0, 1];

Step 5.2.3, compute the discriminator loss, where the discriminator loss function L_D uses the logarithmic loss:

L_D = −E_{x∼p_data(x)}[log D(x)] − E_{z∼p_z(z)}[log(1 − D(G(z)))]

The loss L_D takes values in [0, +∞), where E denotes the expectation, x∼p_data(x) indicates that the real image x follows the real data distribution p_data(x), and z∼p_z(z) indicates that the random noise z follows the noise distribution p_z(z); the discriminator's objective is to minimize L_D, i.e., to maximize the recognition rate of real data and the rejection rate of generated data;

Step 5.2.4, compute the generator loss, where the generator loss function L_G uses the logarithmic loss:

L_G = −E_{z∼p_z(z)}[log D(G(z))]

L_G takes values in [0, +∞); the generator's objective is to minimize L_G, i.e., to maximize the probability that a generated image is judged to be a real image by the discriminator.
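One adversarial training round of step 5.2 together with the alternating updates of steps 5.3.1 and 5.3.2 can be sketched as follows. Small multilayer perceptrons stand in for the searched generator and discriminator, and n_critic plays the role of the discriminator update frequency; all sizes and learning rates are illustrative, not values from the patent:

```python
import torch
import torch.nn as nn

# Stand-in networks; in the patent these are the searched supernet subgraphs.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)  # Adam, as in step 5.3.4
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
eps, n_critic = 1e-8, 2  # n_critic = assumed discriminator update frequency

def train_round(real: torch.Tensor):
    # Step 5.3.1: update the discriminator n_critic times with G fixed.
    for _ in range(n_critic):
        z = torch.randn(real.size(0), 64)      # step 5.2.1: sample noise
        fake = G(z).detach()                   # keep the generator fixed
        # Step 5.2.3: L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
        loss_d = -(torch.log(D(real) + eps).mean()
                   + torch.log(1 - D(fake) + eps).mean())
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Step 5.3.2: update the generator once with D fixed.
    z = torch.randn(real.size(0), 64)
    # Step 5.2.4: L_G = -E[log D(G(z))]
    loss_g = -torch.log(D(G(z)) + eps).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()

losses = train_round(torch.randn(8, 784))  # one round on a dummy batch
```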
Step 5.3.4, according to the optimizer settings, dynamically adjust the learning rates of the discriminator and the generator using the adaptive moment estimation (Adam) algorithm, which updates the parameters via moving averages of the first and second moments:

m_t = β1 · m_{t−1} + (1 − β1) · g_t
v_t = β2 · v_{t−1} + (1 − β2) · g_t²
m̂_t = m_t / (1 − β1^t),  v̂_t = v_t / (1 − β2^t)
θ_t = θ_{t−1} − lr · m̂_t / (√v̂_t + ε)

where θ represents the weight parameters to be updated at the current time step, lr is the preset learning rate (including those of the discriminator and the generator), m_t is the exponentially weighted moving average of the current gradient g_t, v_t is the exponentially weighted moving average of the current squared gradient, β1 and β2 are the moment decay rates, and ε is a constant;

Step 5.3.5, monitor and record the Inception Score IS and the Fréchet Inception Distance FID during training, where IS evaluates the diversity and inter-class separation of the generated images, measuring their quality and diversity, and FID measures the similarity between the generated images and the real images in feature space.

9. The method according to claim 8, characterized in that, in step 6, the network weight parameters and the architecture parameters are continuously optimized and adjusted by gradient descent so that the supernet gradually approaches optimal performance; finally, according to the distribution of the architecture parameter weights, the candidate operation with the largest weight is selected to construct the optimal network architecture.

10. A generative adversarial network architecture search system based on the hybrid convolution operation, implemented by the method according to any one of claims 1 to 9, characterized in that it comprises:

a construction module, configured to construct the generative adversarial network supernet and assign an importance weight to each operation in the supernet using random continuous values in the range 0 to 1, the weight representing the probability that the operation is selected as the optimal operation;

a hybrid convolution module, configured to integrate the ordinary convolution operation with the self-attention mechanism to obtain a new operation that combines the advantages of both;

a training module, configured to train the network weights of the supernet using training data and their labels;

an optimization module, configured to optimize the supernet architecture parameters using the gradient descent method.
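The final discretization of claim 9 reduces to an argmax over the architecture parameter weights of each edge. In the sketch below the edge names, candidate lists, and weight values are hypothetical placeholders:

```python
import torch

# Hypothetical architecture parameters: one weight vector per supernet edge,
# with one entry per candidate operation on that edge.
arch_params = {
    "edge_0": torch.tensor([0.10, 0.65, 0.25]),
    "edge_1": torch.tensor([0.40, 0.15, 0.45]),
}
op_names = ["skip", "conv3x3", "hybrid_conv"]

# Step 6: keep only the candidate with the largest weight on every edge.
final_arch = {edge: op_names[w.argmax().item()]
              for edge, w in arch_params.items()}
print(final_arch)  # {'edge_0': 'conv3x3', 'edge_1': 'hybrid_conv'}
```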
CN202411673668.2A 2024-11-21 2024-11-21 Generative adversarial network architecture search method and system based on hybrid convolution operation Active CN119150925B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411673668.2A CN119150925B (en) 2024-11-21 2024-11-21 Generative adversarial network architecture search method and system based on hybrid convolution operation

Publications (2)

Publication Number Publication Date
CN119150925A true CN119150925A (en) 2024-12-17
CN119150925B CN119150925B (en) 2025-03-14

Family

ID=93810883

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411673668.2A Active CN119150925B (en) 2024-11-21 2024-11-21 Generative adversarial network architecture search method and system based on hybrid convolution operation

Country Status (1)

Country Link
CN (1) CN119150925B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119886228A * 2025-03-27 2025-04-25 Nanjing University of Information Science and Technology A generative adversarial network architecture search method and system based on architecture distillation technology
CN120108386A * 2025-05-12 2025-06-06 Civil Aviation Flight University of China A method for predicting controller voice fatigue based on remote tower scenario

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116011521A * 2023-02-17 2023-04-25 Beijing University of Technology Efficient differentiable neural network architecture search method
CN118196600A * 2024-05-17 2024-06-14 Nanjing University of Information Science and Technology Neural architecture searching method and system based on differential evolution algorithm
CN118506096A * 2024-05-29 2024-08-16 Harbin University of Science and Technology Spatial-spectral neural architecture search HSI classification method based on noise interference heuristic
CN118821905A * 2024-09-18 2024-10-22 Nanjing University of Information Science and Technology Agent model-assisted evolutionary generative adversarial network architecture search method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
尚迪雅; 孙华; 洪振厚; 曾庆亮: "A survey of gradient-free evolutionary neural architecture search algorithms" (基于无梯度进化的神经架构搜索算法研究综述), Computer Engineering (计算机工程), no. 09, 31 December 2020 (2020-12-31) *

Also Published As

Publication number Publication date
CN119150925B (en) 2025-03-14

Similar Documents

Publication Publication Date Title
CN119150925B (en) Generative adversarial network architecture search method and system based on hybrid convolution operation
Ding et al. Where to prune: Using LSTM to guide data-dependent soft pruning
CN110782015A (en) Training method, device and storage medium for network structure optimizer of neural network
CN113011487B (en) An Open Set Image Classification Method Based on Joint Learning and Knowledge Transfer
CN115116139B (en) Multi-granularity human action classification method based on graph convolutional network
CN119047519B (en) Neural architecture searching method for long tail data set
CN113807176A (en) Small sample video behavior identification method based on multi-knowledge fusion
Xue et al. An effective surrogate-assisted rank method for evolutionary neural architecture search
CN113111308A (en) Symbolic regression method and system based on data-driven genetic programming algorithm
CN119886226B (en) Neural architecture searching method based on diffusion evolution algorithm
CN116468095A (en) Neural network architecture searching method and device, equipment, chip and storage medium
CN119416822B (en) Task processing method, system, terminal and medium based on multiple expert layers
CN120562484A (en) A multi-objective deep neural network architecture search method for hybrid CNN-Transformer architecture
CN120123589A (en) An evaluation data recommendation method based on heterogeneous graph neural network
Zhai et al. Generative neural architecture search
CN108833173B (en) A deep network representation method with rich structural information
CN114239795B (en) Convolutional Neural Network Architecture Search Method Based on Differentiable Sampler and Progressive Learning
Ali et al. Recent trends in neural architecture search systems
Fu et al. Study of DNN Network Architecture Search for Robot Vision
CN116091167A (en) Group purchase recommendation model based on multitask learning framework
CN117689865A (en) Target detection method and system based on feature and fusion mode search
CN119886228B (en) A generative adversarial network architecture search method and system based on architecture distillation technology
CN114387490A (en) Backbone Design of End-to-End OCR Recognition System Based on NAS Search
CN118314504B (en) Action recognition method based on progressive few-shot knowledge distillation
CN119646290B (en) A recommendation method based on users' implicit hierarchical interests

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant