CN119150925A

CN119150925A - Method and system for searching generated countermeasure network architecture based on mixed convolution operation

Info

Publication number: CN119150925A
Application number: CN202411673668.2A
Authority: CN
Inventors: 邹雨枫; 薛羽
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2024-11-21
Filing date: 2024-11-21
Publication date: 2024-12-17
Anticipated expiration: 2044-11-21
Also published as: CN119150925B

Abstract

The invention provides a method and a system for searching a generating countermeasure network architecture based on a mixed convolution operation, wherein the method firstly designs a wide search space, then constructs a super network which merges all candidate operations, and enables the super network to be completely micro in training process by a continuous relaxation method, so that efficient training is carried out by using a gradient descent method, and finally, the operation with the best performance at each position is selected to obtain an optimal architecture. Unlike traditional architectural search methods, the present invention introduces a novel hybrid convolution operation in the search space and optimizes the search process by integrating part of the channel attention mechanism. The generation of the countermeasure network architecture search based on the mixed convolution operation enhances the capturing capability of the network to the long-distance information and the architecture search speed, and compared with the traditional search method, the method has higher efficiency and better performance, and can find the optimal neural network architecture suitable for the task more quickly.

Description

Method and system for searching generated countermeasure network architecture based on mixed convolution operation

Technical Field

The invention relates to the technical field of deep neural networks, in particular to a method and a system for searching a generating countermeasure network architecture based on mixed convolution operation.

Background

The generation of the countermeasure network is a deep learning model, which consists of a generator and a discriminator, and automatically learns the complex characteristics of data through the countermeasure process, so that the quality of the generated data is improved, new ideas and tools are provided for various applications, and remarkable results are achieved in the fields of computer vision, image generation, data enhancement, style migration, speech synthesis and the like. The traditional generation countermeasure network architecture is designed manually by researchers, and continuous trial and error and adjustment are required, so that the researchers are required to have abundant expert experience, and a great deal of labor cost is also required. The advent of neural architecture search technology has revolutionized the generation of reactive network architecture designs.

Neural architecture search is a method of automatically designing neural network architecture that utilizes search algorithms to automatically discover high performance neural network architecture within a predefined search space. By searching the neural architecture, the manpower cost and time investment can be greatly reduced, so that even non-expert can design the efficient generation countermeasure network architecture. The neural architecture search not only improves the design efficiency of generating the countermeasure network architecture, but also discovers an innovative architecture which is difficult to explore by the traditional method, and further improves the quality of generated data and the performance of a model. However, the conventional neural architecture search method can obtain good effects in generating the antagonistic network architecture search, but has problems such as low search efficiency and huge consumption of computing resources.

In order to further advance the application of neural architecture searches in generating antagonistic network architecture designs, a number of new and improved approaches have emerged in recent years, such as introducing evolutionary computation into neural architecture searches. The evolution calculation simulates the biological evolution process, and the network architecture is gradually optimized through operations such as selection, intersection, variation and the like. By combining evolutionary computation with neural architecture searching, a superior performing generation countermeasure network architecture can be efficiently found in a wider search space. However, this method still has some drawbacks, such as low efficiency in a high-dimensional search space, easy sinking into a locally optimal solution, difficulty in ensuring that a globally optimal architecture is found, and often the search space is composed of pure convolution operation, lack of exploration of a wider search space, and failure in fully releasing the long-distance information capturing capability of the model. Therefore, it is necessary to propose a differentiable architecture search method that introduces a hybrid convolution operation. This can improve the search efficiency of the architecture and enhance the long-distance information capturing capability.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a method and a system for searching a generating countermeasure network architecture based on a mixed convolution operation.

Wherein the method comprises the steps of:

step 1, designing and generating a search space for searching against a network architecture, defining candidate operations, wherein the candidate operations comprise two types of common candidate operations and up-sampling candidate operations, and determining the scope of the architecture search;

step 2, constructing a generation countermeasure network super network containing all candidate frameworks according to the search space designed in the step 1, and randomly assigning an framework parameter for each candidate operation by using continuous numerical values in the range of 0 to 1 to represent the probability that each candidate operation is selected;

step 3, adding a part of channel attention mechanisms in the edges of the common candidate operation of the super network so as to calculate the attention weights of different channels;

step 4, according to the feature map output and the calculated channel attention weight of each candidate operation, weighting and adjusting important channels and features in the feature map, so that the output of different operations is more prominent in important features, and thus the architecture parameters of the candidate operation are dynamically adjusted and optimized;

step 5, training by using a gradient descent method to generate an antagonistic network super-network, and optimizing super-network structural parameters and weight parameters according to gradient information;

and step 6, alternately executing the step 4 and the step 5 until the preset maximum cycle times are reached, and selecting the candidate operation with the maximum architecture parameter weight in each side of the super network to obtain the optimal network architecture.

In step 1, the search space includes N interconnected and jump-connected units, each unit is regarded as a directed acyclic graph composed of M nodes, and the unit includes D up-sampling candidate operation nodes and E common candidate operation nodes, whereinN, M, D, E are natural numbers;

The method comprises the steps of adding a mixed convolution operation in a common candidate operation, combining the convolution operation and a self-attention operation, enabling an input feature map to be projected through three shared 1X 1 convolutions, generating three intermediate feature maps, enabling the projected feature maps to be used for a convolution path and a self-attention path respectively, dividing each intermediate feature map into h groups in the self-attention path, enabling each group to comprise three feature maps from each 1X 1 convolution to serve as a query, a key and a value respectively, conducting multi-head attention calculation, generating k2 feature maps through a lightweight full-connection layer in a convolution path with a convolution kernel size of k, processing the input feature maps through translation and aggregation operation, and obtaining final output according to weighted summation of output from the convolution path and the self-attention path, wherein the input feature maps are similar to the traditional convolution operation :

,

Where α and β represent the mixed convolution operational weights.

In step 2, the generation of the countermeasure network extranet sharingStrip edge, whereinThe strips are the edges of the jump connection,The bar is the edge on which the upsampling candidate operation is located,The jump connection and up-sampling candidate operation have T operation selections, and the common candidate operation has Y operation selections;

the super network comprises S sub networks, wherein S is a natural number, The sub-net represents a network with only one operation on all connected edges;

When the super-network is initialized, random distribution assignment is adopted on the architecture parameters of the candidate operation on each side, the numerical value represents the probability of the candidate operation being selected, for example, continuous data with random distribution ranging from 0 to 1 is used for giving weight to each connection change, the random initialization can break the symmetry of the network, each node can learn the characteristics of the input data in different modes at the initial stage of training, and the method is beneficial to avoiding the network from falling into a local optimal solution.

A subnet is a collective term for all candidate architectures in a search space, with a subnet representing a particular architecture selected in a given search space, each subnet being part of the subnet. Traditionally, the input information is typically processed through a single operation (e.g., convolution) to generate a new signature. However, in a super-network, the input information may undergo various operations such as dynamic separable convolution, hole convolution, and jump connection. The feature graphs generated by the operations are fused in a mode of adding elements bit by bit, and finally a new mixed feature graph is obtained. In the training process, the super-network can automatically select and adjust the weights of the candidate operations through learning and optimization, so that the optimal operation combination and network structure are finally determined.

In the invention, the up-sampling and jump connection edge of the super-network has T candidate operations including nearest neighbor interpolation up-sampling, bilinear interpolation up-sampling and the like, and the common candidate operations have Y candidate operations including 1×1 convolution, 1×1 mixed convolution, 3×3 separable convolution, 3×3 mixed convolution, 5×5 mixed convolution and the like.

The super-network described in the present invention is a network structure comprising a plurality of sub-networks, but the network itself is discrete and discontinuous, which directly hampers the use of gradient-based optimization methods for searching. In order to overcome the difficulty, the invention adopts a continuous relaxation strategy to convert the original discrete candidate operation into a continuous and conductive representation form so as to be capable of searching by utilizing optimization algorithms such as gradient descent and the like.

Specific continuous relaxation operations are as follows, for each node pair (i, j) in the generator super-network, all defined candidate operations are represented as a set O _n in a unified way, the value range of n is determined by the number of candidate operations, and a group of architecture parameters corresponding to O _n are introducedO represents an operation in the candidate operation set,. Wherein the architecture parametersThe larger the value, the higher the probability that the operation o is selected in the connection from node i to node j. Specifically, the architecture parameters are initialized to trainable continuous variables, which are converted to probability distributions by applying a softmax function, wherein the probability that each candidate operation is selected is proportional to the architecture parameter value corresponding to the candidate operation. Specifically, for the output of node i to node jThe calculation formula is as follows:

,

Wherein, Representing parameters of a frameworkBy conversion of the softmax function to a probability distributionRepresenting application operation o at inputAs a result of (a),For temporary parameters, the index representing a particular operation, exp is an exponential function,E represents a natural constant.Indicating that in the connection of node i to node j, a specific operation is selectedIs a relative likelihood of (a) is determined.

Through such operations, the original discrete candidate operation selection problem is converted into a continuous optimization problem, and then the optimal generated countermeasure network architecture can be searched by optimizing architecture parameters. By iteratively updating the architecture parameters and calculating the gradient of the loss function, the weight of each operation can be adjusted step by step so that the performance of the whole generator super-network on the training set is optimal. And finally, selecting the optimal operation between each node pair according to the values of the architecture parameters, and obtaining the searched optimal architecture.

In step 3, after adding part of channel attention mechanism to each common convolution operation node, the method reduces computing resource consumption while improving search efficiency, and specifically comprises the steps of firstly enabling an input feature map to pass through an average pooling layer and a maximum pooling layer respectively to obtain two different feature maps and realize compression of channel number and extraction of important features when computing the feature map, and then further carrying out linear change on the pooled features by the obtained feature map through two full-connection layers FC ₁ and FC ₂, wherein a first full-connection layer FC ₁ reduces the channel number to the original channel numberAnd finally, adding the output from the average pooling and the maximum pooling, and normalizing the obtained output to ensure that the sum of the weights of all channels is 1, thereby obtaining the attention weights of different channels.

In step 4, firstly summing the attention weights of each channel in the attention mechanism of part of channels to obtain a total vector representing the weight of each channel, wherein each element of the total vector represents the importance of the corresponding channel, and then selecting the weights from the total vector to reorder the previous oneWherein q represents a scaling factor selected by a part of the channel attention mechanism, and finally the weighting operation is calculatedUpdating the corresponding positions of the channels and the original feature map to form a new output feature map, setting the input feature map as X, setting the attention weight of part of the channel attention mechanisms as A,The specific weight of the ith channel is indicated as small, the number of channels is C, the architecture parameter weight of each candidate operation is W,The architecture parameter weight representing the jth operation, the list of operations is O,Indicating that the j-th operation is to be performed,For the calculation of the partial channel attention mechanism, topq () is a channel selection operation,、、、All are intermediate variables, the step 4 specifically includes the following steps:

step 4.1, calculating the attention weight of part of the channel attention mechanism:

;

Step 4.2, weighting the input feature map:

;

Step 4.3, calculating the total weight vector :

;

Step 4.4, selecting the front position according to the weight of each channelSince the number of channels is C, the parameters in the channel selection are:

;

Step 4.5, extracting important channel data, wherein the channels parameter specifies the channel index to be extracted:

;

wherein the first symbol Indicating the selection of all samples, symbolsRepresenting that all features in the channels specified by the indices are reserved during data extraction so as to ensure complete feature information;

step 4.6, weighting the important channel:

;

Step 4.7, splicing the calculated important channels with the original channels, and designating channel indexes to be updated by indexes:

,

wherein the first symbol Indicating the selection of all samples, symbolsRepresenting reservation of a front specified by an index at the time of data updateAll features in the channel to ensure that the updated channel is intact.

Through the steps, the consumption of computing resources is effectively reduced by the partial channel attention mechanism, and the response of the model to the key information is enhanced.

The step 5 comprises the following steps:

Step 5.1, initializing training parameters including the learning rate of the generator and the discriminator Weighting with network parametersOptimizer parameters, hybrid convolution operation weightsAnd a fixed noise vector;

Step 5.2, developing resistance training for the generator and the discriminator, and calculating loss through the discriminator according to the image and the real image generated by the generator;

Step 5.3, counter-propagating by using the discriminator and the generator loss, updating the weights of the discriminator and the generator, and optimizing the architecture parameters of the candidate operation;

Step 5.4, adjusting the weight of the mixed convolution operation according to the image generated in the training process and the feedback of the discriminator 。

Step 5.1 comprises:

step 5.1.1, setting learning rates of the generator and the discriminator to be respectively AndAnd creating an optimizer according to preset parameters to ensure that network parameters can be adjusted in the optimization process.

And 5.1.2, initializing weights alpha and beta of the mixed convolution operation to be 0.5, and taking the weights as importance weights for controlling the output of each path of the mixed convolution operation so as to realize flexible adjustment of network feature extraction.

And 5.1.3, generating a fixed noise vector fixed _z for evaluating the performance of the generator in the training process, and ensuring that the evaluation result in each training iteration has consistency.

Step 5.2 comprises:

step 5.2.1, the generator randomly samples a noise vector z from the noise distribution, which is typically a uniform or normal distribution, to ensure that the generator can explore the diversified potential space. And generating a false image by a generator For subsequent comparison and discrimination with the real image. The generation of the countermeasure network consists of a generator and a discriminator, wherein the generator is used for generating images and is a component for generating the countermeasure network;

Step 5.2.2 the discriminator receives the real image x and the false image And calculates the output by forward propagationAnd,AndThe discrimination results of the true image and the false image are respectively shown, the values of the discrimination results are 0 and 1, and the higher the value is, the higher the possibility that the discriminator considers the input content to be true is. The aim of the discriminator is to identify as much as possible a true image and a false image, which is intended to be true, i.e. to increaseIs close to 1, while the false image is judged as false, i.e. reducedThe value of (2) is close to 0. The goal of the generator is to make the false image generated by itself deceive the discriminator, i.e. increaseThe value of (2) is close to 1. In this way, the generator and discriminator develop the resistance training at these two conflicting goals.

Step 5.2.3, performing a discriminator loss function calculation, wherein the discriminator loss function uses a logarithmic loss functionThe formula is:

,

Loss function The range of values is 0, + -infinity), where E represents the desire,Representing a real image x subject to a real data distribution,Representing random noise z subject to noise distributionThe goal of the discriminator is to minimize the loss functionI.e. maximizing the recognition rate for real data and the rejection rate for generated data;

Step 5.2.4, performing a generator loss function calculation, wherein the generator loss function uses a logarithmic loss function The formula is:

,

the loss function The range of values is [0, + ], the goal of the generator is to minimize the loss functionValues of (2) such thatAs close to 1 as possible, i.e. maximizing the probability that the generated image is considered by the discriminator to be a true image.

Step 5.3 comprises:

step 5.3.1, fixing the network parameter weights of the generators Keeping the generator unchanged according to the loss function of the discriminatorGradient calculation using back propagation methodAnd according to the learning rate of the discriminator by a gradient descent methodUpdating weight parameters of a discriminatorThe calculation formula is as follows:

,

then, one weight update is completed and the frequency is updated according to the discriminator Repeating the above operationsSecondary times;

step 5.3.2, fixing the network parameter weights of the discriminator Keeping the discriminator unchanged, according to the generator's loss functionComputing gradients using back propagation methodsAnd uses gradient descent method according to generator learning rateUpdating weights of a generatorThe calculation formula is as follows:

,

then finishing one-time weight updating;

step 5.3.3, optimizing the candidate operation architecture parameter weight W, wherein the calculation formula of the Output of each candidate operation is as follows:

Output,

where X is the input feature map, tol is the total number of candidate operations, The architecture parameter weight for the ith operation,Is the output of the ith operation;

Calculating a loss function L using the chain law weights each candidate operation Gradient of (2)Updating candidate operating architecture parameter weights using a gradient descent method:

,

finally, optimizing candidate operation architecture parameters;

step 5.3.4, dynamically adjusting the learning rate of the discriminator and the generator by using an adaptive moment estimation algorithm according to the optimizer settings:

Updating parameters by calculating moving averages of first and second moments, the updating formula being:

,

Wherein the method comprises the steps of Weight parameters that need to be updated on behalf of the current time step, as described aboveT in the subscript indicates the current time step or iteration number, lr is the number of iterations includingIs used for the learning rate of the user,Is an exponentially weighted moving average of the current gradient,Is an exponentially weighted moving average of the square of the current gradient,Is a minute constant for preventing zero removal errors. By the method, the step length of weight updating is dynamically optimized on the basis of keeping the respective learning rate, and the learning rate is indirectly adjusted, so that the training effect and the convergence speed of the generator and the discriminator are improved;

in step 5.3.5, the perceived similarity index IS (Inception Score) and the fraiche distance FID (Frechet Inception Distance) in the training process are monitored and recorded, and IS used for evaluating the diversity and the inter-class separation degree of the generated images in order to evaluate the common performance index of the generated countermeasure network, wherein the IS used for evaluating the quality and the diversity of the generated images, and the FID IS used for evaluating the similarity of the generated images and the real images in the feature space.

In step 6, the network weight parameters and the architecture parameters are continuously optimized and adjusted in a gradient descent mode, so that the super network gradually approaches to the optimal performance, and finally, the candidate operation with the largest weight is selected according to the distribution condition of the architecture parameter weights, so that the optimal network architecture is constructed.

At each iteration, the importance of the different operations is dynamically adjusted based on the feature map output and corresponding attention weights for each candidate operation to ensure that the training process to generate the countering network extranet is more efficient. And simultaneously, selecting the candidate operation with the greatest weight in each side of the super network to form an optimal network architecture. The process ensures that the expression capability and the generation effect of the model are improved by fully utilizing part of attention mechanisms while optimizing the architecture, so that better performance is achieved while the calculation complexity is lower.

The invention also provides a system for searching the generating countermeasure network architecture based on the mixed convolution operation, which comprises the following steps:

A building module for building up an antagonism network super-network and giving importance weights to each operation in the super-network with random continuous values in the range of 0 to 1, the weights being used to represent the probability that each operation is selected as an optimal operation;

The mixed convolution module is used for integrating common convolution operation and a self-attention mechanism to obtain a new operation which combines the advantages of the common convolution operation and the self-attention mechanism;

The training module is used for training the network weight of the super-network by using the training data and the labels thereof;

and the optimizing module is used for optimizing the hypersystem framework parameters by using a gradient descent method.

According to the invention, by introducing the mixed convolution operation, the search space of the traditional pure convolution neural network is remarkably expanded, the capturing capability of the model on long-distance information is enhanced, and the generation effect of the generation countermeasure network is improved. In addition, by combining partial channel attention mechanisms, the Martai effect caused by the randomness of the initialization of the architecture parameters in the architecture searching process is effectively relieved, the effect is represented in that the operation of initially obtaining high random values is continuously strengthened in the subsequent process, and the operation of low random values can be ignored in the architecture searching, so that a potential excellent architecture is missed, and the architecture searching is unstable. This approach achieves better results than the prior art approach.

The method has the beneficial effects that the method provides a hybrid convolution-based search algorithm for generating the countermeasure network architecture aiming at the neural architecture search for generating the countermeasure network. The algorithm introduces the mixed convolution combining the convolution and the self-attention mechanism into the search space, so that the traditional convolutional neural network search space is effectively expanded, and the capturing capacity of the model for long-distance information is improved. Before searching, a continuous loose mode is adopted, so that the original discrete search space becomes continuous, and gradient descent is directly used for searching, and the searching efficiency is greatly improved. In the searching process, a part of channel attention mechanisms are introduced in the common candidate operation, attention weights are distributed for different candidate operations, and a channel with the front attention weight rank is selected to participate in subsequent calculation, so that the consumption of calculation resources is reduced, and the searching efficiency is improved. The hybrid convolution-based generation countermeasure network architecture search algorithm brings new ideas and breakthroughs for the architecture search field, and has sufficient practical value and application scenes.

Drawings

FIG. 1 is a diagram of an overall framework of a search space in the present invention.

Fig. 2 is a schematic diagram of the hybrid convolution operation in the present invention.

FIG. 3 is a schematic diagram of a portion of the channel attention mechanism in the present invention.

FIG. 4 IS a graph of the architecture optimized IS score trend in the present invention.

FIG. 5 is a graph of architecture optimized FID score trend in the present invention.

Detailed Description

The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings and detailed description.

As shown in fig. 1, the present embodiment provides a method for searching a generating countermeasure network architecture based on a hybrid convolution operation, including the steps of:

Step 1, firstly, designing a search space for generating an antagonistic network generator architecture as shown in fig. 1, wherein the search space consists of 3 units which are connected with each other and are connected with hops, and the inside of the unit can be regarded as a directed acyclic graph which is connected with each other by 5 nodes. Gray arrows indicate up-sampling operations, black arrows indicate normal operations, up-sampling candidate operations are set to { transpose convolution, nearest neighbor interpolation up-sampling, bilinear interpolation up-sampling }, normal candidate operations are set to {1×1 convolution, 3×3 convolution, 5×5 convolution, 3×3 separable convolution, 5×5 separable convolution, 7×7 separable convolution, 1×1 mixed convolution, 3×3 mixed convolution, 5×5 mixed convolution }.

The mixed convolution operation is shown in fig. 2, firstly, an input feature map with length, width and channel number of H, R, C is projected through three 1×1 convolutions to obtain an intermediate feature map after three projections, and then the three feature maps are divided into h groups, wherein h represents the number of attention heads. 3h intermediate feature maps were obtained and these projected feature maps were used for the convolution path and the self-attention path, respectively. The upper part of fig. 2 shows the calculation process of the convolution path, where k represents the convolution kernel size in the convolution path, and 1,3, and 5 are taken in the present invention, respectively, corresponding to the hybrid convolution operation described above. In the convolution path, a lightweight full connection layer is adopted and generatedFeatures generated by translation and aggregation are processed in a convolution-like manner, information is collected from the local receptive field as in conventional convolution, and finally the feature map is output. The lower half of FIG. 2 illustrates the operation of the self-attention mechanism, which divides the intermediate features into h groups, each group containing three features, from each 1X 1 convolution, respectively, the corresponding three feature maps acting as queries, keys and values, respectively, following the conventional multi-headed self-attention calculation output. Finally, a final output feature map y is calculated from two trainable parameters α and β:

Step 2, constructing a generator super-network according to the search space shown in fig. 1, randomly initializing the super-network, and endowing each candidate operation with random architecture parameters between 0 and 1 And ensures that the sum of the parameter values of each node architecture is 1. The input noise with 128-dimensional channel number is randomly generated to input the super-network to generate an image, and the input directions of the super-network feature map are all unidirectional input from front to back.

And 3, adding a part of channel attention mechanism after the node where the common candidate operation is located as shown in fig. 3, so that the input feature map respectively passes through an average pooling layer and a maximum pooling layer to obtain two different feature maps, and compressing the number of channels and extracting important features. The obtained characteristic diagram is further subjected to linear change on the pooled characteristic through two full-connection layers with RELU linear units in the middle, and finally attention scores are output after normalization.

And 4, summing the architecture parameter weights of each channel in the attention mechanism to obtain a vector representing the total weight of each channel, wherein each element of the vector represents the total importance of the corresponding channel. Then select the front from the total weight vectorsIs involved in the calculation of the weighting operation, and finally the part of the calculation is performedChannels and the rest of the original characteristic diagramThe channels are spliced to form a new output feature map, the input feature map is set to be X, the attention weight of the channels is set to be A, the number of the channels is set to be 128, the weight of each operation is set to be W, and the operation list is set to be O, which can be expressed as follows:

step 4.1, calculating the channel weight, ;

Step 4.2, weighting the input feature map,;

Step 4.3, calculating the total weight vector,;

Step 4.4, selecting the most important channel,;

Step 4.5, extracting important channel data,;

Step 4.6, weighting the important channels,;

Step 4.7, splicing the calculated important channel with the original channel, 。

And 5, according to the set super network, developing and generating search of the architecture of the countermeasure network generator, wherein the method specifically comprises the following steps of:

step 5.1, initializing training parameters including learning rate and network parameters of generator and discriminator, optimizer parameters, mixed convolution operation weights And a fixed noise vector:

step 5.1.1 setting the learning rates of the generator and the discriminator to 0.0002 according to the parameters An adaptive moment estimation algorithm optimizer is created to ensure that network parameters can be adjusted during the optimization process.

And 5.1.3, generating a fixed noise vector fixed _z with the dimension of 128, which is used for evaluating the performance of the generator in the training process and ensuring that the evaluation result in each training iteration has consistency.

Step 5.2, developing the resistance training for the generator and the discriminator, and calculating the loss through the discriminator according to the image and the real image generated by the generator:

Step 5.2.1 the generator randomly samples a 128-dimensional noise vector z from the noise distribution, a process that typically uses a uniform or normal distribution to ensure that the generator can explore the diverse potential space. Subsequently, the generator network maps the noise vector z through a multi-layer neural network to generate a false image For subsequent comparison and discrimination with the real image.

Step 5.2.2 the discriminator receives the real image x and the generated false imageAnd calculates the output by forward propagationAnd. Wherein, Representing the discrimination result of the discriminator on the real imageThe result of discrimination of the generated image is indicated. The goal of the discriminator is to distinguish the real image from the generated image as accurately as possible to enhance the model's understanding of the real data distribution.

Step 5.2.3 calculating the loss function of the discriminatorThe loss function is a logarithmic loss function, which is designed to measure the classification ability of the discriminator in discriminating between true and false images. By matching the real imageThe discriminator is penalized to encourage it to better identify the true data. To generate an imageThe output of (a) is logarithmic, the discriminator is caused to enhance the recognition ability of the generated image. The optimization of the loss function aims at improving the accuracy of the discriminator and ensuring that the discriminator can effectively distinguish a true sample from a generated sample in the training process.

Step 5.2.4 calculating the loss function of the generator using the logarithmic loss functionWhich is intended to reflect the capabilities of the false image spoof discriminator generated by the generator. The loss of the generator is reduced when the false images generated by the generator successfully make the discriminator consider them as coming from the true data distribution. By optimizing the loss, the generator continuously adjusts the generation strategy, enhances the authenticity of the generated image, and enables the generated image to be more visually recognized, thereby improving the generation effect. In this process, the goal of the generator is to minimize its loss so thatAs close to 1 as possible, thereby improving the quality of the generated image.

And 5.3, counter-propagating by using the discriminator and the generator loss, updating the weights of the discriminator and the generator, and optimizing the parameters of the network architecture:

Step 5.3.1, updating the weight parameters of the discriminator, fixing the parameters of the generator, keeping the generator unchanged, calculating the gradient according to the loss function of the discriminator, using a back propagation algorithm, calculating the gradient of the loss function to the weight of the discriminator by a chain law, updating the weight of the discriminator by a gradient descent method, multiplying the calculated gradient value by the learning rate, subtracting from the current weight, and repeating the above operations Secondary weight updates to ensure that the discriminatory power of the discriminator is sufficiently trained to maintain balance between the generator and the discriminator.

And 5.3.2, updating the weight parameters of the generator, fixing the parameters of the discriminator, keeping the discriminator unchanged, calculating the gradient according to the loss function of the generator, using a back propagation algorithm, calculating the gradient of the loss function to the weight of the generator through a chain rule, updating the weight of the generator through a gradient descent method, multiplying the calculated gradient value by a learning rate, and subtracting the learning rate from the current weight to finish 1-time weight updating.

And 5.3.3, optimizing network architecture parameters, and selecting the optimal operation through weight adjustment of different candidate operations in the super network. At this stage, the weights of the candidate operations in the super-network are dynamically adjusted according to the current training progress and network performance index. By evaluating the influence of different operations on the performance of the generator and the discriminator, the operation with the best performance is selected, so that the generation capacity and the discrimination capacity of the model are further improved.

And 5.3.4, developing dynamic adjustment on the learning rate of the discriminator and the generator according to the setting of the optimizer, and gradually adjusting the learning rate to adapt to the convergence rate of the model according to the current training stage and performance.

And 5.3.5, monitoring and recording performance indexes and training parameters in the training process.

Step 5.4, adjusting the weight of the mixed convolution operation according to the image generated in the training process and the feedback of the discriminator。

And 6, adopting a differentiable searching method, optimizing the super network through a gradient, and stopping searching after 100 iterations are carried out. At this time, according to the architecture parameters of the candidate operation in each nodeAnd selecting the operation with the largest weight as a final framework. The concrete steps are as follows:

{ normal operation, [ [1×1 convolution, 3×3 mixed convolution ], [5×5 mixed convolution, 3×3 separable convolution, [ 3×3 separable convolution ], [7×7 separable convolution, 3×3 mixed convolution ] ], upsampling operation [ [ transposed convolution, transposed convolution ], [ nearest neighbor interpolation upsampling ] ], skip connection of length 1 [ [ transposed convolution, bilinear interpolation upsampling ], skip connection of length 2 [ transposedconvolution ] }. Wherein 1×1,3×3, etc. represent the convolution kernel size, the contents in each square bracket represent an operational combination of units, the units are arranged in the numbering order, and the operations in the units are arranged in the node numbering order. The length of a hop connection refers to the span of the connection-the hop connection of length 1 includes the connection of unit 1 to unit 2 and unit 2 to unit 3, and the hop connection of length 2 is the connection of unit 1 to unit 3.

The embodiment adds the hybrid convolution operation of integrating the self-attention mechanism and the convolution on the basis of the traditional convolution neural network, effectively expands the search space for generating the countermeasure network architecture, and improves the capturing capability of the network for long-distance information. In addition, the partial channel attention mechanism introduced by the embodiment effectively reduces the consumption of computing resources and improves the capability of selecting key operations of the super network. This example conducted experiments and validated performance on two common data, CIFAR-10 and STL-10. CIFAR-10 pictures of 60000 different categories are included, of which 50000 are used for training of the network and 10000 are used for testing of the network, and the sizes of the pictures are 32×32.STL-10 contained 13000 10 different categories of pictures, 96X 96 in size, which provided higher resolution than STL-10, and in this example were sized 48X 48 for balance of performance. The present embodiment only requires 2 hours on the GPU4090 to search for an excellent network architecture, which is far faster than other non-gradient algorithms, and the search speed is greatly improved. On the basis, the found architecture achieves better generation effect and can generate rich and lifelike images. The performance of the network architecture obtained by searching in the invention when optimized in the standard dataset CIFAR-10 is shown in fig. 4 and 5:

Inception Score (perceived similarity index, IS) and Frechet Inception Distance (fraiche distance, FID) in the figure are performance metrics commonly used in generating countermeasure networks. Where the IS score in fig. 4 IS used to evaluate the diversity and inter-class separation of the generated images, the size of the score reflects the quality and diversity of the generated images, and thus the higher the score the better, the higher IS score indicates that the generated images are not only visually diversified, but also more attractive. Meanwhile, the FID score in fig. 5 measures the similarity between the generated image and the real image in the feature space, so that the lower the score, the better, the lower the FID score indicates that the feature distribution of the generated image is closer to the feature distribution of the real image, indicating that the generated sample is more true on the feature level. The method of the present embodiment enables the discovery of excellent performance architectures in a very short time, and these architectures can quickly converge to optimal performance upon retraining.

The present invention provides a method and a system for generating an antagonistic network architecture based on a hybrid convolution operation, and the method and the way for realizing the technical scheme are numerous, the above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several improvements and modifications can be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as the protection scope of the present invention. The components not explicitly described in this embodiment can be implemented by using the prior art.

Claims

1. A method for searching a generative adversarial network architecture based on a hybrid convolution operation, comprising the following steps:

Step 1, designing a search space for a generative adversarial network architecture search, defining candidate operations, which include two types: ordinary candidate operations and upsampling candidate operations, and determining the scope of the architecture search;

Step 2: construct a generative adversarial network supernet containing all candidate architectures based on the search space designed in step 1, and randomly assign an architecture parameter to each candidate operation using a continuous value ranging from 0 to 1 to represent the probability of each candidate operation being selected;

Step 3: Add a partial channel attention mechanism to the edge where the common candidate operations of the supernet are located to calculate the attention weights of different channels;

Step 4: According to the feature map output of each candidate operation and the calculated channel attention weight, the important channels and features in the feature map are weighted and adjusted so that the outputs of different operations highlight the important features, thereby dynamically adjusting and optimizing the architecture parameters of the candidate operations;

Step 5: Use the gradient descent method to train the generated adversarial network supernet, and optimize the supernet architecture parameters and weight parameters according to the gradient information;

Step 6: Alternately execute steps 4 and 5 until the preset maximum number of cycles is reached, and select the candidate operation with the largest architecture parameter weight in each edge of the supernet to obtain the optimal network architecture.

2. The method according to claim 1, characterized in that, in step 1, the search space includes N interconnected and skip-connected units, each unit is regarded as a directed acyclic graph consisting of M nodes, and the unit contains D upsampling candidate operation nodes and E ordinary candidate operation nodes, wherein , N, M, D, E are all natural numbers;

A hybrid convolution operation is added to the normal candidate operation; the hybrid convolution operation combines the convolution operation and the self-attention operation at the same time, so that the input feature map is projected through three shared 1×1 convolutions to generate three intermediate feature maps, and then the projected feature maps are used in the convolution path and the self-attention path respectively; in the self-attention path, each intermediate feature map is divided into h groups, each group contains three feature maps from each 1×1 convolution, which are used as query, key and value for multi-head attention calculation; in the convolution path with a convolution kernel size of k, k² feature maps are generated through a lightweight fully connected layer, and the input feature map is processed by translation and aggregation operations; finally, the outputs from the convolution path and the self-attention path are summed according to the weights to obtain the final output :

Where 𝛼 and 𝛽 represent the weights of the mixed convolution operation.

3. The method according to claim 2, characterized in that in step 2, the generative adversarial network supernet has edge, where The edges are skip connections, The edge is where the upsampling candidate operation is located. The strips are edges where common candidate operations are located; the skip connection and upsampling candidate operations have T operation options, and the common candidate operations have Y operation options;

The supernet includes S subnets, where S is a natural number. , a subnet represents a network with only one operation on all connected edges;

When the supernet is initialized, the architectural parameters of the candidate operations on each edge will be assigned values using a random distribution, and the value indicates the probability of the candidate operation being selected.

4. The method according to claim 3 is characterized in that, in step 3, after adding the partial channel attention mechanism to each ordinary convolution operation node, the search efficiency is improved while reducing the consumption of computing resources, specifically comprising: when calculating the feature map, firstly, the input feature map is passed through an average pooling layer and a maximum pooling layer respectively to obtain two different feature maps and realize the compression of the number of channels and the extraction of important features; then the obtained feature map is passed through two fully connected layers FC ₁ and FC ₂ to further linearly change the pooled features, wherein the first fully connected layer FC ₁ reduces the number of channels to 1/2 of the original number of channels. , and nonlinearity is introduced through the ReLU activation function, and the number of channels is restored to its original size through the second fully connected layer FC ₂ ; finally, the outputs from average pooling and maximum pooling are added, and the obtained output is normalized so that the sum of the weights of all channels is 1, and the attention weights of different channels are obtained.

5. The method according to claim 4, characterized in that in step 4, firstly, the attention weights of each channel in the partial channel attention mechanism are summed to obtain a total vector representing the weight of each channel, and each element of the total vector represents the importance of the corresponding channel; then, the weighted channels are selected from the total vector. The channels of the weighted operation are calculated, where q represents the proportional coefficient selected by the partial channel attention mechanism. Finally, the weighted operation is calculated. The corresponding positions of the channel and the original feature map are updated to form a new output feature map. The input feature map is set to X, and the attention weight of the partial channel attention mechanism is set to A. represents the specific weight of the i-th channel, the number of channels is C, and the architectural parameter weight of each candidate operation is W. represents the architectural parameter weight of the jth operation, the operation list is O, represents the jth operation, is the calculation of the partial channel attention mechanism, topq() is the channel selection operation, , , , are all intermediate variables, then step 4 specifically includes the following steps:

Step 4.1, calculate the attention weights of the partial channel attention mechanism:

Step 4.2, weighted input feature map:

Step 4.3, calculate the total weight vector :

Step 4.4, select the first channel according to the weight of each channel. The number of channels is C, so the parameters in the channel selection are :

Step 4.5, extract important channel data, the indices parameter specifies the channel index to be extracted:

The first symbol Indicates the selection of all samples, symbol Indicates that all features in the channel specified by indices are retained during data extraction;

Step 4.6, weighting operation on important channels:

Step 4.7, concatenate the calculated important channels with the original channels. The indices parameter specifies the channel index to be updated:

The first symbol Indicates the selection of all samples, symbol Indicates that the previous value specified by indices is retained when the data is updated. All features in the channel.

6. The method according to claim 5, characterized in that step 5 comprises:

Step 5.1, initialize the training parameters, which include the learning rates of the generator and the discriminator and network parameter weights , optimizer parameters, mixed convolution operation weights and a fixed noise vector;

Step 5.2: Conduct adversarial training on the generator and discriminator, and calculate the loss through the discriminator based on the images generated by the generator and the real images;

Step 5.3, back-propagate using the discriminator and generator losses, update the discriminator and generator weights, and optimize the architecture parameters of the candidate operation;

Step 5.4, adjust the weights of the mixed convolution operation based on the images generated during training and the feedback from the discriminator .

7. The method according to claim 6, characterized in that step 5.2 comprises:

Step 5.2.1, the generator randomly samples a noise vector z from the noise distribution and generates a fake image through the generator ; The Generative Adversarial Network consists of a generator and a discriminator. The generator is used to generate images and is a component of the Generative Adversarial Network;

Step 5.2.2, the discriminator receives the real image x and the fake image , and output is calculated by forward propagation and , and They represent the discrimination results of real images and fake images respectively, and their values are both [0,1];

Step 5.2.3, calculate the discriminator loss function, where the discriminator loss function uses the logarithmic loss function , the formula is:

Loss Function The value range is [0, +∞), where E represents the expectation, Indicates that the real image x follows the real data distribution , Indicates that random noise z follows the noise distribution ; The goal of the discriminator is to minimize the loss function The value of is to maximize the recognition rate of real data and the rejection rate of generated data;

Step 5.2.4, calculate the generator loss function, where the generator loss function uses the logarithmic loss function , the formula is:

The loss function The value range is [0, +∞), and the goal of the generator is to minimize the loss function The value of is to maximize the probability that the generated image is considered to be a real image by the discriminator.

8. The method according to claim 7, characterized in that step 5.3 comprises:

Step 5.3.1, fix the network parameter weights of the generator , keeping the generator unchanged, according to the loss function of the discriminator , using the back-propagation method to calculate the gradient , and the discriminator learning rate is calculated by gradient descent Update the weight parameters of the discriminator , the calculation formula is:

Then complete a weight update and update the frequency according to the discriminator Repeat the above steps for the settings Second-rate;

Step 5.3.2, fix the network parameter weights of the discriminator , keeping the discriminator unchanged, according to the loss function of the generator Calculate gradients using backpropagation , and use the gradient descent method according to the generator learning rate Update the weights of the generator , the calculation formula is:

Then complete a weight update;

Step 5.3.3, optimize the candidate operation architecture parameter weight W, and the calculation formula for the output Output of each candidate operation is:

Output ,

Where X is the input feature map, tol is the total number of candidate operations, is the architectural parameter weight of the ith operation, is the output of the i-th operation;

Use the chain rule to calculate the loss function L for each candidate operation weight Gradient , update the candidate operation architecture parameter weights using gradient descent:

Finally, the optimization of candidate operation architecture parameters is completed;

Step 5.3.4, according to the optimizer settings, use the adaptive moment estimation algorithm to dynamically adjust the learning rate of the discriminator and generator:

The parameters are updated by calculating the moving average of the first-order moment and the second-order moment. The update formula is:

in Represents the weight parameters that need to be updated at the current time step, lr includes The default learning rate is is the exponentially weighted moving average of the current gradient, is the exponentially weighted moving average of the current squared gradient, is a constant;

Step 5.3.5, monitor and record the perceptual similarity index IS and Fréchet distance FID during the training process, where IS is used to evaluate the diversity and inter-class separation of generated images, measure the quality and diversity of generated images, and FID measures the similarity between generated images and real images in the feature space.

9. The method according to claim 8 is characterized in that in step 6, the network weight parameters and architecture parameters are continuously optimized and adjusted by gradient descent so that the supernet gradually approaches the optimal performance; finally, according to the distribution of the architecture parameter weights, the candidate operation with the largest weight is selected to construct the optimal network architecture.

10. A generative adversarial network architecture search system based on hybrid convolution operations implemented according to the method of any one of claims 1 to 9, characterized in that it comprises:

A construction module, used to construct a generative adversarial network supernet, and assign an importance weight to each operation in the supernet using a random continuous numerical value ranging from 0 to 1, wherein the weight is used to represent the probability of each operation being selected as the optimal operation;

The hybrid convolution module is used to integrate the ordinary convolution operation with the self-attention mechanism to obtain a new operation that combines the advantages of both.

A training module for training the network weights of the supernet using training data and its labels;

The optimization module is used to optimize the parameters of the supernet architecture using the gradient descent method.