WO2022012668A1 - Training set processing method and apparatus - Google Patents
Training set processing method and apparatus
- Publication number
- WO2022012668A1 (PCT/CN2021/106758)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- training set
- basis
- encoded data
- discrete
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2136—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on sparsity criteria, e.g. with an overcomplete basis
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
Definitions
- the present application relates to the field of artificial intelligence, and in particular, to a training set processing method and apparatus.
- In the field of artificial intelligence, neural networks have in recent years made a series of breakthroughs in computer vision, natural language processing, network security and other tasks, showing new possibilities in the age of intelligence.
- However, neural networks generally suffer from an unexplained vulnerability to adversarial examples: they may be tricked into giving false results by very small but carefully designed noise added to their input data. Such carefully constructed inputs are called adversarial samples.
- the present application provides a training set processing method for training a neural network by extracting the principal components of the training set, so that more robust features are used to train the neural network, improving the robustness of the resulting neural network without reducing its output accuracy.
- the present application provides a training set processing method, including: first, decomposing the first training set to obtain a basis of the first training set and a discrete sequence of the basis, where the first training set includes multiple samples, the basis can be understood as a space composed of data in multiple directions and includes at least one basis vector, and the discrete sequence includes discrete values in one-to-one correspondence with the basis vectors and can be used to represent the degree of dispersion of the basis; then, obtaining the component value of each sample in the first training set on each basis vector, obtaining multiple sets of first encoded data, where each set of first encoded data corresponds to one sample and its components correspond one-to-one with the basis vectors; and then, extracting the principal components in each set of first encoded data to obtain multiple sets of second encoded data.
- in this way, second encoded data with reduced interference is obtained, a new sample can be obtained according to the second encoded data, and the new samples can be used for neural network training.
- the new samples reduce the interference of components other than the principal components, so that the neural network can be trained mainly on the principal components, improving the robustness of the neural network.
- the data set processing method provided by this application operates on the training set, and the processing depends only on the distribution of the principal components of the training set itself. Therefore, even when different models are trained in different scenarios, the training set processing method provided by this application can be applied; it has strong generalization ability and does not depend on the model being trained, so it has high versatility.
- obtaining the basis of the first training set and the discrete sequence of the basis according to the first training set may include: performing principal component analysis (PCA) processing on the first training set to obtain an orthonormal basis and a discrete sequence of the orthonormal basis.
- there are various ways to decompose the first training set, such as PCA processing or sparse coding, so as to obtain the basis of the first training set and its discrete sequence.
- performing PCA processing on the first training set to obtain the orthonormal basis and the discrete sequence of the orthonormal basis may include: performing centralization processing on the first training set to obtain a centralized first training set, where the mean value of the data included in the centralized first training set is 0; and performing PCA processing on the centralized first training set to obtain the orthonormal basis and the discrete sequence of the orthonormal basis, where the discrete sequence consists of the variances of the centralized first training set on the orthonormal basis arranged in decreasing order, the variances corresponding one-to-one with the basis vectors of the orthonormal basis.
- in this way, the first training set can be centralized and PCA processing can then be performed on the centralized training set, quickly decomposing the first training set into the orthonormal basis and the discrete sequence of the orthonormal basis.
- acquiring the component value of each sample in the first training set on the basis may include: using a preset algorithm to calculate the component value of each sample in the centralized first training set on each basis vector of the orthonormal basis, obtaining multiple sets of first encoded data.
- the component value of each sample on each basis vector can be calculated based on the orthonormal basis after PCA processing, thereby obtaining the first encoded data.
- the preset algorithm includes an inner product operation or a sparse coding operation.
- the component value of each sample on each basis vector can be calculated through the inner product operation or the sparse coding operation, so as to realize the projection of each sample on each basis vector.
- acquiring the principal components in each group of first encoded data to obtain multiple groups of second encoded data may include: retaining the principal components in each group of first encoded data and replacing the other components in each group of first encoded data with preset values, to obtain multiple sets of second encoded data, where the principal components of each set of first encoded data constitute at least one set of second encoded data.
- in this way, the components other than the principal components in each group of first encoded data can be replaced with preset values, so that each group of first encoded data yields one or more groups of second encoded data.
- the preset value includes 0 or a preset noise vector
- the preset noise vector includes Gaussian noise or uniformly distributed noise.
- values other than the principal components in the first encoded data can be replaced with 0 or a preset noise vector, thereby obtaining one or more sets of second encoded data; this reduces the influence of the values other than the principal components in each sample on the training of the neural network, improving the robustness of the resulting neural network.
- the preset noise vector is proportional to the discrete value of the component being replaced. Therefore, in the embodiment of the present application, the distribution of the replacement noise is similar to that of the original data in the first encoded data: on the basis of reducing the influence of the values other than the principal components in each sample on the training of the neural network, the interference with the training of the neural network is reduced, and the output of the resulting neural network is more accurate.
- the present application provides a training set processing device, including:
- a decomposition unit, configured to obtain a basis of the first training set and a discrete sequence of the basis according to the first training set, where the first training set includes a plurality of samples, the basis includes at least one basis vector, the discrete sequence includes discrete values in one-to-one correspondence with the basis vectors in the basis, and the discrete sequence is used to represent the degree of dispersion of the basis;
- an acquisition unit, configured to acquire the component value of each sample in the first training set on the basis, obtaining multiple groups of first encoded data, each group of first encoded data corresponding to one sample;
- the obtaining unit is further configured to obtain the principal component in each group of first encoded data in the multiple groups of first encoded data, and obtain multiple groups of second encoded data, and the discrete value of the principal component is higher than the preset discrete value;
- a mapping unit, configured to map the multiple sets of second encoded data to the basis of the first training set and obtain the samples corresponding to the multiple sets of second encoded data, where the samples corresponding to the multiple sets of second encoded data form a second training set, and the second training set is used to train a neural network.
- the basis is an orthonormal basis
- the decomposition unit is specifically configured to perform principal component analysis PCA processing on the first training set to obtain an orthonormal basis and a discrete sequence of orthonormal basis.
- the decomposition unit is specifically configured to: perform centralized processing on the first training set to obtain a centralized first training set, and the average value of the data included in the centralized first training set is 0; perform PCA processing on the first training set after centralization to obtain an orthonormal basis and a discrete sequence of the orthonormal basis, and the discrete sequence includes a sequence composed of the variance of the first training set after centralization in the orthonormal basis.
- the decomposition unit is specifically configured to calculate, through a preset algorithm, the component value of each sample in the centralized first training set on each basis vector in the orthonormal basis, to obtain multiple sets of first encoded data.
- the preset algorithm includes an inner product operation or a sparse coding operation.
- the obtaining unit is specifically configured to retain the principal components in each group of first encoded data, and replace other components except the principal components in each group of first encoded data with preset values, to obtain A plurality of sets of second coded data, wherein the principal components of each set of first coded data constitute at least one set of second coded data.
- the preset value includes 0 or a preset noise vector
- the preset noise vector includes Gaussian noise or uniformly distributed noise.
- the preset noise vector is proportional to the discrete value of the component being replaced.
- an embodiment of the present application provides a training set processing apparatus, and the training set processing apparatus has the function of implementing the training set processing method of the first aspect.
- This function can be implemented by hardware or by executing corresponding software by hardware.
- the hardware or software includes one or more modules corresponding to the above functions.
- an embodiment of the present application provides a training set processing apparatus, including a processor and a memory, where the processor and the memory are interconnected through a line, and the processor invokes program code in the memory to execute the processing-related functions in the training set processing method of any one of the above first aspects; optionally, the training set processing apparatus may be a chip.
- an embodiment of the present application provides a training set processing device.
- the training set processing device may also be referred to as a digital processing chip or a chip.
- the chip includes a processing unit and a communication interface.
- the processing unit obtains program instructions through the communication interface.
- the instructions are executed by a processing unit, and the processing unit is configured to perform processing-related functions as in the first aspect or any of the optional embodiments of the first aspect.
- an embodiment of the present application provides a computer-readable storage medium, including instructions, which, when executed on a computer, cause the computer to execute the method in the first aspect or any optional implementation manner of the first aspect.
- an embodiment of the present application provides a computer program product including instructions, which, when run on a computer, enables the computer to execute the method in the first aspect or any optional implementation manner of the first aspect.
- FIG. 1 is a schematic diagram of an artificial intelligence main framework applied in this application;
- FIG. 2 is a schematic diagram of a system architecture provided by the present application.
- FIG. 3 is a schematic structural diagram of a convolutional neural network provided by an embodiment of the present application.
- FIG. 4 is a schematic structural diagram of another convolutional neural network provided by an embodiment of the present application.
- FIG. 6 is a schematic flowchart of a training set processing method provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of an adversarial sample provided by an embodiment of the present application.
- FIG. 8 is a schematic structural diagram of a training set processing apparatus provided by the present application.
- FIG. 9 is another schematic structural diagram of a training set processing apparatus provided by the present application.
- FIG. 10 is a schematic structural diagram of a chip according to an embodiment of the present application.
- AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
- artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that responds in a similar way to human intelligence.
- Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
- Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, and basic AI theory.
- Figure 1 shows a schematic diagram of an artificial intelligence main frame, which describes the overall workflow of an artificial intelligence system and is suitable for general artificial intelligence field requirements.
- the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, data has gone through the process of "data-information-knowledge-wisdom".
- the "IT value chain” reflects the value brought by artificial intelligence to the information technology industry from the underlying infrastructure of human intelligence, information (providing and processing technology implementation) to the industrial ecological process of the system.
- the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and is supported by the basic platform. Communication with the outside is performed through sensors; computing power is provided by intelligent chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), field programmable gate arrays (FPGA) and other hardware acceleration chips; the basic platform includes distributed computing frameworks, networks and other related platform guarantees and support, and can include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside to obtain data, and the data are provided for calculation to the intelligent chips in the distributed computing system provided by the basic platform.
- the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
- the data involves graphics, images, voice, video, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
- Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
- machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
- Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system: according to a reasoning control strategy, formalized information is used to carry out machine thinking and solve problems, with search and matching as the typical functions.
- Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
- some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing (such as image recognition, object detection, etc.), speech recognition, etc.
- Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution and the productization of intelligent information decision-making and applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe cities, smart terminals, and so on.
- an embodiment of the present application provides a system architecture 200 .
- the system architecture includes a database 230 and a client device 240 .
- the data collection device 260 is used to collect data and store it in the database 230 , and the training module 211 generates the target model/rule 201 based on the data maintained in the database 230 .
- the following will describe in more detail how the training module 211 obtains the target model/rule 201 based on the data.
- the target model/rule 201 is the neural network constructed in the following embodiments of the present application. For details, please refer to the relevant description of FIG. 6 below.
- the computing module may include a training module 211, and the target model/rule obtained by the training module 211 may be applied to different systems or devices.
- the execution device 210 is configured with a transceiver 212, which can be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, for data interaction with external devices, and a "user" can input data to the transceiver 212 through the client device 240.
- the client device 240 can send target tasks to the execution device 210, request the execution device to build a neural network, and send the execution device 210 a database for training.
- the execution device 210 can call data, codes, etc. in the data storage system 250 , and can also store data, instructions, etc. in the data storage system 250 .
- the computing module 211 uses the target model/rule 201 to process the input data. Specifically, the computing module 211 is configured to: first, obtain a first training set, which includes a plurality of samples; then, obtain a basis of the first training set and a discrete sequence of the basis, where the basis includes at least one basis vector, the discrete sequence includes discrete values in one-to-one correspondence with the basis vectors in the basis, and the discrete sequence can be used to represent the degree of dispersion of the basis;
- finally, the multiple sets of second encoded data are mapped to the basis to obtain corresponding samples, and the samples corresponding to the sets of second encoded data form a second training set, which is used to train the neural network.
- transceiver 212 returns the constructed neural network to client device 240 to deploy the neural network in client device 240 or other devices.
- the training module 211 can obtain corresponding target models/rules 201 based on different data for different tasks, so as to provide users with better results.
- the data input into the execution device 210 can be determined according to the input data of the user, for example, the user can operate in the interface provided by the transceiver 212 .
- the client device 240 can automatically input data to the transceiver 212 and obtain the result. If the client device 240 automatically inputs data and needs to obtain the authorization of the user, the user can set the corresponding permission in the client device 240 .
- the user can view the result output by the execution device 210 on the client device 240, and the specific presentation form can be a specific manner such as display, sound, and action.
- the client device 240 can also act as a data collection end to store the collected data associated with the target task into the database 230 .
- FIG. 2 is only an exemplary schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
- the data storage system 250 is an external memory relative to the execution device 210 . In other scenarios, the data storage system 250 may also be placed in the execution device 210 .
- the training or updating process mentioned in this application may be performed by the training module 211 .
- the training process of the neural network is to learn the way to control the spatial transformation, and more specifically, to learn the weight matrix.
- the purpose of training a neural network is to make the output of the neural network as close to the expected value as possible, so the predicted value of the current network can be compared with the expected value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, the weight vectors are usually initialized before the first update, that is, parameters are pre-configured for each layer of the deep neural network).
- for example, if the predicted value of the network is too high, the values of the weights in the weight matrices are adjusted to lower it; after continuous adjustment, the value output by the neural network becomes close to or equal to the expected value.
- the difference between the predicted value and the expected value of the neural network can be measured by a loss function or an objective function. Taking the loss function as an example, the higher the output value of the loss function (loss), the greater the difference.
- the training of the neural network can be understood as the process of reducing the loss as much as possible. For the process of updating the weight of the starting point network and training the serial network in the following embodiments of the present application, reference may be made to this process, which will not be repeated below.
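- as a purely illustrative aside, the compare-then-update loop described above can be sketched for the simplest case, a linear model with a squared loss (nothing here is specific to this application; all names are hypothetical):

```python
import numpy as np

def train_step(w, x, y, lr=0.01):
    """One gradient-descent step for a linear model with squared loss."""
    pred = x @ w                        # predicted value of the current network
    loss = np.mean((pred - y) ** 2)     # higher loss => larger difference
    grad = 2 * x.T @ (pred - y) / len(y)
    w -= lr * grad                      # adjust the weights to reduce the loss
    return w, loss
```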
- the neural network mentioned in this application may be of various types, such as a deep neural network (DNN), a convolutional neural network (CNN), a recurrent neural network (RNN), a residual network, or other neural networks.
- CNN is taken as an example below.
- CNN is a deep neural network with a convolutional structure.
- CNN is a deep learning architecture, which refers to multiple levels of learning at different levels of abstraction through machine learning algorithms.
- a CNN is a feed-forward artificial neural network in which each neuron responds to overlapping regions in images fed into it.
- a convolutional neural network contains a feature extractor composed of convolutional layers and subsampling layers.
- the feature extractor can be viewed as a filter, and the convolution process can be viewed as convolution with an input image or a convolutional feature map using a trainable filter.
- the convolutional layer refers to the neuron layer in the convolutional neural network that convolves the input signal.
- in a convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neurons in the adjacent layers.
- a convolutional layer usually contains several feature planes, and each feature plane can be composed of some neural units arranged in a rectangle. Neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract image information is independent of location. The underlying principle is that the statistics of one part of the image are the same as the other parts. This means that image information learned in one part can also be used in another part. So for all positions on the image, we can use the same learned image information. In the same convolution layer, multiple convolution kernels can be used to extract different image information. Generally, the more convolution kernels, the richer the image information reflected by the convolution operation.
- the convolution kernel can be initialized in the form of a matrix of random size, and the convolution kernel can obtain reasonable weights by learning during the training process of the convolutional neural network.
- the immediate benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
- the convolutional neural network can use the error back propagation (BP) algorithm during training to correct the parameters in the initial super-resolution model, so that the reconstruction error loss of the super-resolution model becomes smaller and smaller. Specifically, forward propagation of the input signal up to the output produces an error loss, and the parameters in the initial super-resolution model are updated by back-propagating the error loss information, so that the error loss converges.
- the back-propagation algorithm is thus a backward pass dominated by the error loss, aiming to obtain the parameters of the optimal super-resolution model, such as the weight matrices.
- a convolutional neural network (CNN) 100 may include an input layer 110 , a convolutional/pooling layer 120 , where the pooling layer is optional, and a neural network layer 130 .
- the convolutional/pooling layer 120 may include, for example, layers 121-126. In one implementation, layer 121 is a convolutional layer, layer 122 is a pooling layer, layer 123 is a convolutional layer, layer 124 is a pooling layer, layer 125 is a convolutional layer, and layer 126 is a pooling layer; in another implementation, 121 and 122 are convolutional layers, 123 is a pooling layer, 124 and 125 are convolutional layers, and 126 is a pooling layer. That is, the output of a convolutional layer can be used as the input of a subsequent pooling layer, or as the input of another convolutional layer to continue the convolution operation.
- the convolution layer 121 may include many convolution operators, which are also called kernels, and their role in image processing is equivalent to a filter that extracts specific information from the input image matrix.
- the convolution operator can essentially be a weight matrix, which is usually predefined. During a convolution operation on an image, the weight matrix is usually moved along the horizontal direction of the input image one pixel at a time (or two pixels at a time, and so on, depending on the value of the stride) to complete the work of extracting specific features from the image.
- the size of this weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image.
- the weight matrix will extend to the entire depth of the input image. Therefore, convolution with a single weight matrix will produce a single depth dimension of the convolutional output, but in most cases a single weight matrix is not used, but multiple weight matrices of the same dimension are applied.
- the output of each weight matrix is stacked to form the depth dimension of the convolutional image.
- Different weight matrices can be used to extract different features from the image: for example, one weight matrix is used to extract image edge information, another weight matrix is used to extract specific colors of the image, and yet another weight matrix is used to blur unwanted noise in the image, and so on.
- the multiple weight matrices have the same dimension, and the feature maps extracted from the multiple weight matrices with the same dimension have the same dimension, and then the multiple extracted feature maps with the same dimension are combined to form the output of the convolution operation.
- the weight values in the weight matrix need to be obtained through a lot of training in practical applications, and each weight matrix formed by the weight values obtained by training can extract information from the input image, thereby helping the convolutional neural network 100 to make correct predictions.
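- for illustration, a minimal NumPy sketch of the sliding-window convolution described above (single channel, no padding; all shapes and names are illustrative, not taken from this application):

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide one weight matrix (kernel) over a single-channel image."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty(((h - kh) // stride + 1, (w - kw) // stride + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # weighted sum over the window
    return out
```

- stacking the outputs of several such kernels along a new axis would form the depth dimension of the convolutional output described above.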
- generally, the initial convolutional layer (e.g., 121) often extracts more general, lower-level features; as the depth of the convolutional neural network increases, the features extracted by the later convolutional layers become more and more complex, such as high-level semantic features, and features with higher semantics are more suitable for the problem to be solved.
- since the number of training parameters often needs to be reduced, a pooling layer is often introduced periodically after a convolutional layer; that is, for the layers 121-126 exemplified by 120 in FIG. 3, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers.
- the pooling layer may include an average pooling operator and/or a max pooling operator for sampling the input image to obtain a smaller size image.
- the average pooling operator can calculate the average value of the pixel values in the image within a certain range.
- the max pooling operator can take the pixel with the largest value within a specific range as the result of max pooling. Also, just as the size of the weight matrix used in the convolutional layer should be related to the size of the image, the operators in the pooling layer should also be related to the size of the image.
- the size of the output image after processing by the pooling layer can be smaller than the size of the image input to the pooling layer, and each pixel in the image output by the pooling layer represents the average or maximum value of the corresponding sub-region of the image input to the pooling layer.
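- a minimal sketch of the max pooling operator described above (non-overlapping size x size sub-regions; illustrative only):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Take the largest value in each size x size sub-region."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))
```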
- after processing by the convolutional layer/pooling layer 120, the convolutional neural network 100 is not yet sufficient to output the required output information, because, as mentioned before, the convolutional layer/pooling layer 120 only extracts features and reduces the parameters brought by the input image. To generate the final output information (the required class information or other related information), the convolutional neural network 100 needs to use the neural network layer 130 to generate one output, or a set of outputs, of the required number of classes. Therefore, the neural network layer 130 may include multiple hidden layers (131, 132 to 13n as shown in FIG. 3) and an output layer 140. In this application, the convolutional neural network is obtained by deforming the selected starting-point network at least once to obtain a serial network and then training that serial network. This convolutional neural network can be used for image recognition, image classification, image super-resolution reconstruction, and more.
- following the multiple hidden layers in the neural network layer 130, the last layer of the entire convolutional neural network 100 is the output layer 140, which has a loss function similar to the categorical cross entropy and is specifically used to calculate the prediction error. Once the forward propagation of the entire convolutional neural network 100 (propagation from 110 to 140 in FIG. 3) is completed, back propagation (propagation from 140 to 110 in FIG. 3) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 100 and the error between the result that the convolutional neural network 100 outputs through the output layer and the ideal result.
- the convolutional neural network 100 shown in FIG. 3 is only used as an example of a convolutional neural network.
- the convolutional neural network can also exist in the form of other network models, for example with multiple convolutional layers/pooling layers in parallel as shown in FIG. 4, where the separately extracted features are all input to the neural network layer 130 for processing.
- an embodiment of the present application further provides a system architecture 300 .
- the execution device 210 is implemented by one or more servers and, optionally, cooperates with other computing devices, such as data storage, routers, load balancers and other devices; the execution device 210 can be arranged on one physical site or distributed over multiple physical sites.
- the execution device 210 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the steps of the training set processing method corresponding to FIG. 6 below in this application.
- a user may operate respective user devices (eg, local device 301 and local device 302 ) to interact with execution device 210 .
- Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and the like.
- Each user's local device can interact with the execution device 210 through any communication mechanism/standard communication network, which can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
- the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network, and the like.
- the wireless network includes but is not limited to: the fifth generation mobile communication technology (5th-Generation, 5G) system, the long term evolution (long term evolution, LTE) system, the global system for mobile communication (global system for mobile communication, GSM) or code division Multiple access (code division multiple access, CDMA) network, wideband code division multiple access (wideband code division multiple access, WCDMA) network, wireless fidelity (wireless fidelity, WiFi), Bluetooth (bluetooth), Zigbee protocol (Zigbee), Any one or a combination of radio frequency identification technology (radio frequency identification, RFID), long range (Long Range, Lora) wireless communication, and near field communication (near field communication, NFC).
- the wired network may include an optical fiber communication network or a network composed of coaxial cables, and the like.
- one or more aspects of the execution device 210 may be implemented by each local device, for example, the local device 301 may provide the execution device 210 with local data or feedback calculation results.
- the local device 301 implements the functions of the execution device 210 and provides services for its own users, or provides services for the users of the local device 302 .
- more adversarial samples can be added on the basis of the original samples, and both the original samples and the adversarial samples can be added to the training set for training, so that the trained neural network can recognize the disturbance in the adversarial samples.
- the effect of adversarial training is greatly affected by the model and may be completely different for different models; moreover, adversarial training needs to continuously generate adversarial samples during the training process, which greatly reduces the training efficiency and may cause the output accuracy of the trained neural network to decrease, that is, reduce the robustness of the neural network. Robustness can be understood as the ability of a neural network to keep its output unchanged in the face of small changes in the input.
- in another approach, denoising can be performed on adversarial samples by adding a denoising network in front of the neural network and using the gradient descent algorithm to reduce the gap between the top-level representations generated for paired adversarial samples and original samples, so that the denoising network achieves the effect of adversarial robustness.
- however, the denoising network needs to be added in a targeted manner, and different denoising networks may need to be set up for different training scenarios; the generalization ability is weak, resulting in low training efficiency.
- the vulnerability of neural networks stems from the fact that, during training, a neural network is more inclined to learn functions that change drastically outside the prevailing data distribution; that is, the neural network also classifies based on those components that vary relatively little in linear space yet contain information usable for classification, namely the aforementioned components other than the principal components.
- in other words, the components other than the principal components included in the samples of the training set have a considerable impact on the training of the neural network.
- this phenomenon is hereinafter referred to as the gradient leakage phenomenon.
- therefore, the present application provides a training set processing method: the training set is processed before training, the principal components of each sample are extracted, and new samples are obtained to form a second training set.
- each new sample retains the principal components of the original sample, so that training the neural network with the second training set can improve the robustness of the neural network.
- FIG. 6 is a schematic flowchart of the training set processing method provided by the present application, as follows.
- step 601: all the data included in the first training set can be decomposed to obtain the basis of the first training set and the discrete sequence corresponding to the basis.
- the first training set includes multiple samples; the basis can include one or more basis vectors; the discrete sequence is used to represent the degree of dispersion of the basis and may include discrete values in one-to-one correspondence with the basis vectors.
- the aforementioned basis may be understood as a space, and the space may include one or more basis vectors, and the discrete sequence is used to represent the degree of dispersion of the data included in the first training set in the basis.
- the types of samples included in the first training set are related to the neural network to be trained. For example, if a classification network needs to be trained, the first training set may include a large number of classified images, and if a face recognition network needs to be trained, the first training set may include a large number of face images.
- the aforementioned discrete value may be a value used to represent the degree of dispersion, such as variance, standard deviation, or range.
- the following exemplarily takes the variance as the value representing the degree of dispersion of the basis in the embodiments of the present application. It should be understood that the variance mentioned below can also be replaced by the standard deviation or the range as the value representing the degree of dispersion, which will not be repeated below.
- there may be various ways to decompose the first training set, such as PCA or a sparse coding algorithm.
- the first training set can be decomposed by a sparse coding algorithm.
- for example, for the dataset X, solve the optimization problem min_{A,S} ‖X − A·S‖_F² + λ·Σ_i ‖S_i‖_p, where A_i is the i-th column of A (and similarly for S_i and X_i), ‖·‖_p denotes the L_p norm, and λ is a regularization term; this yields, for each X_i, the sparse coding matrix S_i and the dictionary (i.e., the basis) A.
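- as an illustration of the sparse-coding decomposition above, here is a minimal sketch using scikit-learn's DictionaryLearning; the library choice and all parameter values are assumptions, since the application does not name an implementation:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# X: first training set, one sample per row (e.g., flattened images).
X = np.random.rand(100, 64)

# Learn a dictionary A (the basis) and sparse codes S such that X ≈ S·A,
# roughly minimizing ||X - S A||_F^2 plus an L1 sparsity penalty on S.
dl = DictionaryLearning(n_components=32, alpha=1.0, max_iter=100)
S = dl.fit_transform(X)   # sparse coding matrix, one row of codes per sample
A = dl.components_        # dictionary: each row is one basis vector
```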
- PCA processing is performed on the first training set to obtain an orthogonal basis and a discrete sequence corresponding to the orthogonal basis.
- PCA is an algorithm to achieve dimensionality reduction by performing orthogonal spectral decomposition of the covariance matrix of the input data to determine the linear direction of the main changes between data points.
- that is, the PCA processing can be understood as performing orthogonal spectral decomposition on the covariance matrix of the first training set to determine the linear directions of the main changes between data points, thereby implementing dimensionality reduction.
- the following steps are exemplarily described below by taking the PCA processing on the first training set as an example.
- performing principal component analysis (PCA) processing on the first training set to obtain the orthonormal basis and the discrete sequence corresponding to the orthonormal basis may specifically include: centralizing the first training set to obtain the centralized first training set, and then performing PCA processing on the centralized first training set to obtain the orthonormal basis and the discrete sequence corresponding to the orthonormal basis.
- the discrete sequence is composed of the variance of the first training set after centralization in the orthonormal basis.
- the variance included in the discrete sequence is in decreasing order, and the variance included in the discrete sequence corresponds to the basis vectors in the orthonormal basis in order one-to-one.
- the aforementioned centralization processing can be understood as zero-averaging: the mean of all the data included in the first training set is computed, and the first training set is translated in space, for example by subtracting that mean from all the data, so that the mean value of all data included in the first training set becomes 0, yielding the centralized first training set.
- the calculated variances may be sorted and arranged in descending order to obtain a discrete sequence. Then, the basis vectors included in the orthonormal basis are sorted, and the arrangement of the basis vectors in the orthonormal basis corresponds to the sorting manner in the discrete sequence.
- for example, the orthonormal basis can be expressed as {v_1, v_2, ..., v_N} and the discrete sequence as {σ_1, σ_2, ..., σ_N}, where the variance corresponding to the basis vector v_1 is σ_1, the variance corresponding to v_2 is σ_2, and so on.
- certainly, the discrete sequence can also be sorted in other ways, for example arranged in increasing order or according to other set rules; the basis vectors in the orthonormal basis are then sorted correspondingly.
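- a minimal NumPy sketch of step 601 under the assumptions above (random data stands in for the first training set; all names and sizes are illustrative):

```python
import numpy as np

X = np.random.rand(1000, 64)            # first training set: samples x features

# Centralize: subtract the mean so the centralized training set has mean 0.
mean = X.mean(axis=0)
Xc = X - mean

# PCA: eigendecomposition of the covariance matrix yields an orthonormal basis.
cov = np.cov(Xc, rowvar=False)
variances, basis = np.linalg.eigh(cov)  # columns of `basis` are basis vectors

# Arrange the discrete sequence (variances) in decreasing order, and order
# the basis vectors so they correspond one-to-one with the variances.
order = np.argsort(variances)[::-1]
variances = variances[order]            # discrete sequence {σ_1 >= σ_2 >= ...}
basis = basis[:, order]                 # orthonormal basis {v_1, v_2, ..., v_N}
```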
- step 602: obtain the component value of each sample in the first training set on each basis vector of the basis, obtaining multiple sets of first encoded data; the component values of one sample on the basis vectors form one set of first encoded data.
- in other words, step 602 obtains the component value of each sample projected in each direction, thereby obtaining the first encoded data of each sample.
- for example, after the aforementioned PCA processing, the component of each sample of the first training set in the orthonormal basis can be obtained, yielding the first encoded data corresponding to each sample.
- the first training set in step 602 may be replaced with the centralized first training set.
- a preset algorithm may be used to calculate the component value of each sample in the first training set in the basis to obtain multiple sets of first encoded data.
- the preset algorithm may include an inner product operation or a sparse coding operation or the like. For example, taking the inner product operation as an example, the inner product operation may be performed on each sample in the first training set and each basis vector in the basis to obtain the first encoded data of each sample.
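- continuing the sketch above, step 602 then reduces to an inner product of each centralized sample with each basis vector:

```python
# Inner product of every centralized sample with every basis vector:
# row i of `codes` is the first encoded data {a_1, ..., a_N} of sample i.
codes = Xc @ basis
```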
- step 603: extract the principal components in each set of first encoded data among the multiple sets of first encoded data, thereby obtaining multiple sets of second encoded data.
- the principal component may be understood as a component whose corresponding discrete value is greater than a preset discrete value, or, the principal component may be understood as a preset number of components whose corresponding discrete value is the largest in the first encoded data.
- for example, the basis in the aforementioned step 601 can be an orthonormal basis expressed as {v_1, v_2, ..., v_N}, with the discrete sequence expressed as {σ_1, σ_2, ..., σ_N}, where the basis vectors of the orthonormal basis correspond one-to-one, in order, with the variances in the discrete sequence. The first encoded data of a sample can then be represented as a: {a_1, a_2, ..., a_N}, whose components likewise correspond one-to-one, in order, with the variances in the discrete sequence.
- a variance greater than a preset discrete value is determined, and a component corresponding to the variance greater than the preset discrete value is determined from the first encoded data as a principal component to obtain second encoded data.
- for example, a': {a_1, a_2, ..., a_K} can be extracted from a: {a_1, a_2, ..., a_N}, where a' is a set of second encoded data and K is less than or equal to N.
- specifically, the principal components in each group of first encoded data may be retained, and the other components in each group of first encoded data may be replaced with preset values, to obtain multiple groups of second encoded data.
- each group of first encoded data may correspond to one or more groups of second encoded data.
- components other than the principal components in each set of first encoded data can be replaced in various ways, thereby obtaining one or more sets of second encoded data corresponding to each set of first encoded data.
- the preset value may include 0 or a preset noise vector.
- the preset noise vector may include noise independent of the real value of the sample, or noise independent of the real value of the sample but similar to the data distribution in the first training set, or noise of a specific distribution, or the like.
- the preset noise vector may be Gaussian noise or uniformly distributed noise, or the like. Therefore, in the embodiment of the present application, replacing the components other than the principal components with 0 or a noise vector eliminates the influence of this part of the components on the training of the neural network, so that when the neural network is trained later, the principal components are dominant and a neural network with better robustness is obtained.
- the replaced noise may be proportional to the variance corresponding to the component before replacement.
- for example, the noise can be 5 times, 10 times, etc. the variance corresponding to the original component. In this way, the distribution law of the replacement noise is similar to that of the variance of the original components, reducing the impact on subsequent training; on the premise of training mainly with the principal components and obtaining a neural network with better robustness, the impact on the output accuracy of the neural network is avoided, and a neural network with a better balance of robustness and output accuracy is obtained.
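- a sketch of step 603 under the same assumptions: the d components with the largest variances are kept as principal components, and the remaining components are replaced either with the preset value 0 or with Gaussian noise whose scale is proportional to the variance of the replaced component (d and c are illustrative hyperparameters, not values from the source):

```python
d = 16          # number of principal components to keep
c = 5.0         # noise scale relative to the replaced component's variance

N = codes.shape[1]
second = codes.copy()

# Option 1: replace non-principal components with the preset value 0.
second[:, d:] = 0.0

# Option 2: replace them with Gaussian noise whose magnitude is
# proportional to the variance of each replaced component.
noise = np.random.randn(codes.shape[0], N - d) * (c * variances[d:])
second[:, d:] = noise
```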
- for example, if PCA processing is used for the first training set in the aforementioned step 601, a certain amount of noise may be added in the normal direction of the PCA subspace to suppress the gradient leakage phenomenon. Since the arrangement order of the components included in the first encoded data corresponds to the arrangement order of the variances in the discrete sequence, the components whose variances are greater than the preset discrete value can be determined as the principal components.
- step 604: map the multiple sets of second encoded data to the basis of the first training set to obtain the new samples corresponding to each set of second encoded data in the multiple sets of second encoded data.
- the samples corresponding to the second encoded data constitute the second training set.
- specifically, the basis of the first training set can be understood as a space, and the second encoded data includes component values in one or more directions of that space; mapping a group of second encoded data into the space yields a new sample.
- that is, the second encoded data may be mapped onto the basis obtained in the foregoing step 601 to obtain the new sample corresponding to each group of second encoded data; the multiple new samples corresponding to the multiple sets of second encoded data form the second training set.
- in other words, since the basis mentioned in the aforementioned step 601 can be understood as a space, and the second encoded data includes the projection values of the new sample in the direction of each basis vector, mapping the components included in the second encoded data in the basis yields a new sample.
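- a sketch of step 604, continuing the example: each set of second encoded data is mapped back through the basis, and adding back the mean undoes the earlier centralization:

```python
# Map every set of second encoded data back into the space spanned by the
# basis; adding the mean undoes the earlier centralization.
X_new = second @ basis.T + mean   # second training set, one new sample per row
```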
- the second training set can be used to perform neural network training to obtain a neural network with better robustness.
- therefore, in the embodiment of the present application, second encoded data with reduced interference is obtained, a new sample can be obtained according to the second encoded data, and the new samples can be used for neural network training.
- the new samples reduce the interference of components other than the principal components, so that the neural network is trained mainly on the principal components, and the robustness of the neural network is improved.
- the training set processing method provided by this application operates on the training set, and the processing depends only on the distribution of the principal components of the training set itself. Therefore, even when different models are trained in different scenarios, the training set processing method provided by this application can be applied; it has strong generalization ability and does not depend on the model being trained, so it has high versatility.
- An adversarial sample is a sample that includes disturbance information; the disturbance included in the adversarial sample will affect the robustness of the neural network obtained by subsequent training and reduce the output accuracy of the neural network.
- the schematic diagram of an adversarial sample can be as shown in FIG. 7: image A1 is an image that does not include disturbance, disturbance information A2 is superimposed on image A1, and 0.007 denotes the coefficient of the added disturbance information, thereby obtaining the adversarial sample with the disturbance information added.
- the disturbance information included in the adversarial sample can easily cause recognition errors by the neural network.
- for example, the image A1 is actually classified as a panda, while the image with the disturbance information added may be recognized as a dog. Therefore, using this adversarial sample for training may lead to poor robustness of the neural network.
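- for illustration only, the construction in FIG. 7 resembles the well-known gradient-sign perturbation; a minimal sketch, assuming a loss gradient `grad` with respect to the input is available (the application does not specify how the disturbance is generated):

```python
import numpy as np

def make_adversarial(image, grad, coeff=0.007):
    """Superimpose disturbance information on an image with coefficient coeff."""
    x_adv = image + coeff * np.sign(grad)   # small but carefully designed noise
    return np.clip(x_adv, 0.0, 1.0)         # keep pixel values in a valid range
```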
- therefore, in this application, before the training set is used for neural network training, the training set is processed to reduce the disturbance part in each sample, so that the subsequent neural network training uses the training set with reduced disturbance, avoiding the influence of the disturbance in the samples on the neural network and improving the robustness of the neural network.
- in addition, training is performed mainly on the principal components of the samples, so a neural network with better robustness is obtained; while the robustness is improved, the impact on the output accuracy of the neural network is reduced, and a neural network with balanced robustness and output accuracy is obtained.
- specifically, the principal component in the encoded data corresponding to each original sample is retained, and the other components are replaced with noise or 0, producing new samples in which the components other than the principal component are eliminated. When the neural network is trained on the training set of new samples, training is driven mainly by the principal components, which reduces the disturbance carried by the samples. This is equivalent to introducing a data manifold into the training process to defend against adversarial samples and improve the robustness of the neural network.
- the data manifold can be understood under the hypothesis that real data is embedded in a linear space whose intrinsic dimension is much lower than that of the ambient space, with the observed input data points distributed near the manifold. Data that neural networks classify well, such as natural images, largely conforms to this distribution assumption.
- the principal components of different samples differ markedly, so classifying on principal components alone can yield a more robust neural network. The method therefore improves robustness without reducing the output accuracy of the neural network, and also suppresses gradient leakage during training.
- a sample in the first training set can be represented as a data point x_i, and the first training set as an N×D data set. Decomposition yields the orthonormal basis {v_1, v_2, ..., v_N} and the corresponding discrete sequence {λ_1, λ_2, ..., λ_N}, in which the variances are arranged in decreasing order. The component dimension d included in the principal component, the added noise dimension m, and the noise size c are then determined.
- the neural network can then be trained using the new second training set.
- the components other than the principal component are replaced with noise proportional to the corresponding variance, which improves the robustness of the neural network without disrupting its training process.
- the formula for obtaining the new sample can include: x_new = Uᵀ(s ⊙ (U x_old) + c·ε), where U x_old represents the aforementioned first encoded data, s is a scaling vector, ε is a noise vector, ⊙ denotes element-wise multiplication, and c is a constant greater than 1.
- s can be taken as the truncation vector {1, 1, ..., 1, 0, 0, ..., 0}, where the number of 1s is the truncation hyperparameter, that is, the aforementioned preset value d. To achieve robustness, it is generally necessary to further suppress the components with smaller scales.
- the aforementioned noise vector need not be Gaussian; it can be replaced with a vector of another distribution with a similar variance level, independent of the distribution of the samples themselves.
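A hedged sketch of the per-sample formula above, assuming the second encoded data is s ⊙ (U x_old) + c·ε with noise applied only to the suppressed components and scaled by their variances; x_old is assumed already centered, and the Gaussian draw could equally be uniform noise of the same variance level, as the text notes.

```python
import numpy as np

def reencode_sample(x_old, Vt, variances, d, c=2.0, rng=None):
    """Vt: (K, D) rows are orthonormal basis vectors; variances: (K,) discrete
    sequence; d: truncation hyperparameter; c: constant greater than 1."""
    rng = np.random.default_rng() if rng is None else rng
    z = Vt @ x_old                      # U x_old: the first encoded data
    s = np.zeros_like(z)
    s[:d] = 1.0                         # truncation vector {1,...,1,0,...,0}
    eps = rng.normal(size=z.shape) * np.sqrt(variances)
    eps[:d] = 0.0                       # noise replaces only the suppressed components
    z_new = s * z + c * eps             # the second encoded data
    return Vt.T @ z_new                 # map back into the basis: the new sample
```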
- a method is proposed to detect and mitigate the adversarial vulnerability of neural networks caused by non-robust parts of the training set.
- the method provided in this application is versatile and efficient, and can be applied directly to most tasks and networks.
- it can train a neural network with higher output accuracy, obtaining a neural network with balanced robustness and output accuracy.
- take a classification network as an example. If the defense projects inputs onto a manifold only in the test phase, the projected manifold deviates from the real data manifold; since unprojected data was used during training, the model's classification behavior on data projected onto the manifold is unknown. On one hand this causes a significant drop in classification accuracy; on the other, because the classification network is not well trained on the manifold, adversarial examples on the manifold are easier to find.
- performing the projection operation during training avoids these problems.
- the classification network obtained by training then needs no projection of the input data at test time, which improves efficiency.
- the present invention therefore allows a better and more flexible trade-off between robust accuracy and accuracy on the original test set.
- by adjusting the parameter d and the specific distribution and amplitude of the added noise according to the needs of the application scenario, the present application can flexibly tune the final accuracy of model training and the balance between the output accuracy and robustness of the neural network.
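As one illustrative (assumed, not prescribed) way to pick the parameter d mentioned above: keep the smallest d whose leading components explain a target share of the training-set variance.

```python
import numpy as np

X = np.random.randn(100, 32)                     # stand-in for the first training set
S = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
explained = np.cumsum(S ** 2) / np.sum(S ** 2)   # cumulative variance share
d = int(np.searchsorted(explained, 0.95) + 1)    # smallest d covering 95%
print(f"d={d} keeps {explained[d - 1]:.1%} of the variance")
```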
- FIG. 8 is a schematic structural diagram of a training set processing apparatus provided by the present application; the apparatus is used to execute the steps of the method in the aforementioned FIG. 6 and may include:
- a decomposition unit 801, configured to decompose the first training set to obtain a basis of the first training set and a discrete sequence corresponding to the basis, where the first training set includes a plurality of samples, the basis includes at least one basis vector, the discrete sequence includes a discrete value corresponding to each basis vector, and the discrete sequence represents the discrete degree of the basis;
- an obtaining unit 802, configured to obtain the component value of each sample in the first training set on the basis, yielding multiple groups of first encoded data, where each group of first encoded data corresponds to one sample;
- the obtaining unit 802 is further configured to obtain the principal components in each group of the multiple groups of first encoded data, yielding multiple groups of second encoded data, where the discrete value corresponding to a principal component is higher than the preset discrete value, and the principal components of each group of first encoded data constitute at least one group of second encoded data;
- a mapping unit 803, configured to map the multiple groups of second encoded data through the basis of the first training set to obtain the samples corresponding to the multiple groups of second encoded data; these samples form a second training set, and the second training set is used to train the neural network.
- the basis is an orthonormal basis, and the decomposition unit 801 is specifically configured to perform principal component analysis (PCA) on the first training set to obtain the orthonormal basis and its corresponding discrete sequence.
- the decomposition unit 801 is specifically configured to: centralize the first training set to obtain a centralized first training set whose data has a mean of 0; and perform PCA on the centralized first training set to obtain the orthonormal basis and its discrete sequence, where the discrete sequence consists of the variances of the centralized first training set on the orthonormal basis.
- the decomposition unit 801 is specifically configured to calculate, through a preset algorithm, the component value of each sample in the centralized first training set on each basis vector of the orthonormal basis, to obtain the multiple groups of first encoded data.
- the preset algorithm includes an inner product operation or a sparse coding operation.
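Both preset algorithms can be sketched in a few lines; the toy basis, shapes, and the choice of orthogonal matching pursuit for the sparse-coding variant are assumptions of this example, not requirements of the application.

```python
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 16))                     # centered samples, one per row
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
basis = Q.T                                       # rows: orthonormal basis vectors

codes_inner = X @ basis.T                         # inner product operation
codes_sparse = sparse_encode(X, basis, algorithm="omp",
                             n_nonzero_coefs=4)   # sparse coding operation
```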
- the obtaining unit 802 is specifically configured to retain the principal component in each group of first encoded data and replace the other components with a preset value, obtaining the multiple groups of second encoded data, where the principal components of each group of first encoded data constitute at least one group of second encoded data.
- the preset value includes 0 or a preset noise vector
- the preset noise vector includes Gaussian noise or uniformly distributed noise.
- the preset noise vector is proportional to the variance corresponding to the replaced component.
- FIG. 9 is a schematic structural diagram of another training set processing apparatus provided by the present application, as described below.
- the training set processing apparatus may include a processor 901 and a memory 902 .
- the processor 901 and the memory 902 are interconnected by wires.
- the memory 902 stores program instructions and data.
- the memory 902 stores program instructions and data corresponding to the aforementioned steps in FIG. 6 .
- the processor 901 is configured to perform the method steps performed by the training set processing apparatus shown in any of the foregoing embodiments in FIG. 6 .
- the training set processing apparatus may further include a transceiver 903 for receiving or sending data.
- Embodiments of the present application further provide a computer-readable storage medium storing a program that, when run on a computer, causes the computer to execute the steps of the method described in the embodiment shown in the aforementioned FIG. 6.
- the aforementioned training set processing apparatus shown in FIG. 9 is a chip.
- An embodiment of the present application further provides a training set processing device, which may also be called a digital processing chip or a chip.
- the chip includes a processing unit and a communication interface.
- the processing unit obtains program instructions through the communication interface; the program instructions are executed by the processing unit, and the processing unit is configured to execute the method steps performed by the training set processing apparatus shown in any of the foregoing embodiments in FIG. 6.
- the embodiments of the present application also provide a digital processing chip.
- the digital processing chip integrates circuits and one or more interfaces for implementing the functions of the above-mentioned processor 901.
- the digital processing chip can perform the method steps of any one or more of the foregoing embodiments.
- when the digital processing chip does not integrate a memory, it can be connected to an external memory through the communication interface.
- the digital processing chip implements the actions performed by the training set processing apparatus in the above embodiment according to the program codes stored in the external memory.
- the embodiments of the present application also provide a computer program product that, when run on a computer, causes the computer to execute the steps performed by the training set processing apparatus in the method described in the embodiment shown in FIG. 6.
- the training set processing apparatus may be a chip, and the chip includes: a processing unit and a communication unit.
- the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
- the processing unit can execute the computer-executed instructions stored in the storage unit, so that the chip in the server executes the training set processing method described in the embodiment shown in FIG. 6 above.
- the storage unit may be a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
- the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
- a general purpose processor may be a microprocessor or it may be any conventional processor or the like.
- FIG. 10 is a schematic structural diagram of a chip provided by an embodiment of the present application.
- the chip may be represented as a neural-network processing unit (NPU) 100; the NPU 100 is mounted on a host CPU as a coprocessor, and tasks are allocated by the host CPU.
- the core part of the NPU is the arithmetic circuit 1003, which is controlled by the controller 1004 to extract matrix data from memory and perform multiplication operations.
- the arithmetic circuit 1003 includes multiple processing engines (PEs). In some implementations, the arithmetic circuit 1003 is a two-dimensional systolic array; it may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1003 is a general-purpose matrix processor.
- the arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1002 and buffers it on each PE in the arithmetic circuit.
- the arithmetic circuit fetches the data of the matrix A and the matrix B from the input memory 1001 to perform the matrix operation, and stores the partial result or the final result of the matrix in the accumulator 1008 .
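The dataflow described above can be mimicked in software as a tiled matrix multiplication with an explicit accumulator; this NumPy sketch is only an analogy for the weight-stationary accumulation, not a model of the actual circuit.

```python
import numpy as np

A = np.random.randn(8, 32)    # activations fetched from the input memory
B = np.random.randn(32, 8)    # weights buffered on the PEs
acc = np.zeros((8, 8))        # accumulator (cf. accumulator 1008)

TILE = 8
for k in range(0, A.shape[1], TILE):
    acc += A[:, k:k + TILE] @ B[k:k + TILE, :]   # each tile adds a partial result

assert np.allclose(acc, A @ B)                   # final result equals full matmul
```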
- Unified memory 1006 is used to store input data and output data.
- weight data is transferred to the weight memory 1002 through the direct memory access controller (DMAC) 1005.
- Input data is also transferred to unified memory 1006 via the DMAC.
- a bus interface unit (BIU) 1010 is used for the interaction between the AXI bus, the DMAC, and an instruction fetch buffer (IFB) 1009: it allows the instruction fetch memory 1009 to obtain instructions from the external memory, and allows the storage unit access controller 1005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
- the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1006 , the weight data to the weight memory 1002 , or the input data to the input memory 1001 .
- the vector calculation unit 1007 includes a plurality of operation processing units and, when necessary, further processes the output of the arithmetic circuit, e.g., vector multiplication, vector addition, exponential and logarithmic operations, and size comparison. It is mainly used for non-convolutional/fully-connected layer computations in neural networks, such as batch normalization, pixel-level summation, and upsampling of feature planes.
- the vector computation unit 1007 can store the vector of processed outputs to the unified memory 1006 .
- the vector calculation unit 1007 may apply a linear function and/or a nonlinear function to the output of the arithmetic circuit 1003, for example linear interpolation of the feature planes extracted by a convolutional layer, or application of a function to a vector of accumulated values to generate activation values.
- the vector computation unit 1007 generates normalized values, pixel-level summed values, or both.
- the vector of processed outputs can be used as activation input to the arithmetic circuit 1003, eg, for use in subsequent layers in a neural network.
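A rough software analogue of the vector unit's post-processing path (bias add, activation, normalization) before the result feeds the next layer; the ReLU and batch-norm-style step are illustrative choices, not a specification of unit 1007.

```python
import numpy as np

def vector_unit(acc: np.ndarray, bias: np.ndarray) -> np.ndarray:
    out = acc + bias                                 # vector addition
    out = np.maximum(out, 0.0)                       # activation function (ReLU here)
    mean, var = out.mean(axis=0), out.var(axis=0)
    return (out - mean) / np.sqrt(var + 1e-5)        # batch-normalization-style step
```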
- the instruction fetch buffer 1009 connected to the controller 1004 is used to store the instructions used by the controller 1004.
- the unified memory 1006, the input memory 1001, the weight memory 1002, and the instruction fetch memory 1009 are all on-chip memories; the external memory is external to the NPU hardware architecture.
- the operation of each layer in the recurrent neural network can be performed by the arithmetic circuit 1003 or the vector calculation unit 1007.
- the processor mentioned in any one of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program of the method in FIG. 6 above.
- the device embodiments described above are only illustrative. Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- the software product is stored in a storage medium, such as a USB flash drive (U disk), a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods described in the various embodiments of the present application.
- the computer program product includes one or more computer instructions.
- the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
- the computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a server or data center, that integrates one or more available media.
- the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives (SSDs)), and the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Algebra (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Image Analysis (AREA)
Abstract
A training set processing method is provided, used to train a neural network by extracting the primary component of a training set. More robust features can thus be used to train the neural network, improving the robustness of the resulting network without reducing its output accuracy. The method comprises: first, acquiring a basis and a discrete sequence of a first training set, the discrete sequence comprising discrete values corresponding one-to-one to each basis vector in the basis and being used to indicate the discrete degree of the basis; then, acquiring the component value of each sample on each basis vector to obtain multiple groups of first encoded data; subsequently, acquiring the primary component in each group of first encoded data to obtain multiple groups of second encoded data, a discrete value corresponding to the primary component being higher than a preset discrete value; and performing mapping according to the multiple groups of second encoded data to obtain new samples forming a second training set, the second training set being used to train a neural network.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010692947.9A CN114091554A (zh) | 2020-07-17 | 2020-07-17 | 一种训练集处理方法和装置 |
| CN202010692947.9 | 2020-07-17 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2022012668A1 true WO2022012668A1 (fr) | 2022-01-20 |
Family
ID=79554489
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2021/106758 Ceased WO2022012668A1 (fr) | 2020-07-17 | 2021-07-16 | Procédé et appareil de traitement d'ensemble d'apprentissage |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN114091554A (fr) |
| WO (1) | WO2022012668A1 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114547482A (zh) * | 2022-03-03 | 2022-05-27 | 智慧足迹数据科技有限公司 | 业务特征生成方法、装置、电子设备及存储介质 |
| CN114579851A (zh) * | 2022-02-25 | 2022-06-03 | 电子科技大学 | 一种基于自适应性节点特征生成的信息推荐方法 |
| CN117152486A (zh) * | 2023-07-26 | 2023-12-01 | 北京工业大学 | 一种基于可解释性的图像对抗样本检测方法 |
| CN117349764A (zh) * | 2023-12-05 | 2024-01-05 | 河北三臧生物科技有限公司 | 一种干细胞诱导数据智能分析方法 |
| CN120340506A (zh) * | 2025-05-14 | 2025-07-18 | 智明日新(南京)人工智能科技有限公司 | 一种基于多模态大模型的asr音频语料的生成方法和装置 |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114612688B (zh) * | 2022-05-16 | 2022-09-09 | 中国科学技术大学 | 对抗样本生成方法、模型训练方法、处理方法及电子设备 |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108896499A (zh) * | 2018-05-09 | 2018-11-27 | 西安建筑科技大学 | 结合主成分分析与正则化多项式的光谱反射率重建方法 |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8595164B2 (en) * | 2011-01-27 | 2013-11-26 | Ming-Chui DONG | Wavelet modeling paradigms for cardiovascular physiological signal interpretation |
| CN110321929A (zh) * | 2019-06-04 | 2019-10-11 | 平安科技(深圳)有限公司 | 一种提取文本特征的方法、装置及存储介质 |
- 2020-07-17: CN CN202010692947.9A patent/CN114091554A/zh active Pending
- 2021-07-16: WO PCT/CN2021/106758 patent/WO2022012668A1/fr not_active Ceased
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108896499A (zh) * | 2018-05-09 | 2018-11-27 | 西安建筑科技大学 | 结合主成分分析与正则化多项式的光谱反射率重建方法 |
Non-Patent Citations (3)
| Title |
|---|
| INDRANIL CHAKRABORTY; DEBOLEENA ROY; ISHA GARG; AAYUSH ANKIT; KAUSHIK ROY: "Constructing Energy-efficient Mixed-precision Neural Networks through Principal Component Analysis for Edge Intelligence", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 4 June 2019 (2019-06-04), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081543655 * |
| PENGUIN MEDIA CONTENT PLATFORM – AI CLASSROOM: "Face Recognition Based on PCA for Dimensionality Reduction and BP Neural Network", 27 April 2018 (2018-04-27), Retrieved from the Internet <URL:https://cloud.tencent.com/developer/news/193912> * |
| SOME HINTS ON MACHINE LEARNING ALGORITHMS: "Summary of Principal Components Analysis (PCA)", 22 October 2019 (2019-10-22), Retrieved from the Internet <URL:https://tianchi.aliyun.com/forum/postDetail?postId=79395> * |
Cited By (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114579851A (zh) * | 2022-02-25 | 2022-06-03 | 电子科技大学 | 一种基于自适应性节点特征生成的信息推荐方法 |
| CN114579851B (zh) * | 2022-02-25 | 2023-03-14 | 电子科技大学 | 一种基于自适应性节点特征生成的信息推荐方法 |
| CN114547482A (zh) * | 2022-03-03 | 2022-05-27 | 智慧足迹数据科技有限公司 | 业务特征生成方法、装置、电子设备及存储介质 |
| CN114547482B (zh) * | 2022-03-03 | 2023-01-20 | 智慧足迹数据科技有限公司 | 业务特征生成方法、装置、电子设备及存储介质 |
| CN117152486A (zh) * | 2023-07-26 | 2023-12-01 | 北京工业大学 | 一种基于可解释性的图像对抗样本检测方法 |
| CN117349764A (zh) * | 2023-12-05 | 2024-01-05 | 河北三臧生物科技有限公司 | 一种干细胞诱导数据智能分析方法 |
| CN117349764B (zh) * | 2023-12-05 | 2024-02-27 | 河北三臧生物科技有限公司 | 一种干细胞诱导数据智能分析方法 |
| CN120340506A (zh) * | 2025-05-14 | 2025-07-18 | 智明日新(南京)人工智能科技有限公司 | 一种基于多模态大模型的asr音频语料的生成方法和装置 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114091554A (zh) | 2022-02-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112183718B (zh) | 一种用于计算设备的深度学习训练方法和装置 | |
| CN114255361B (zh) | 神经网络模型的训练方法、图像处理方法及装置 | |
| CN112651511B (zh) | 一种训练模型的方法、数据处理的方法以及装置 | |
| CN113570029B (zh) | 获取神经网络模型的方法、图像处理方法及装置 | |
| CN111797983B (zh) | 一种神经网络构建方法以及装置 | |
| WO2022083536A1 (fr) | Procédé et appareil de construction de réseau neuronal | |
| CN113705769A (zh) | 一种神经网络训练方法以及装置 | |
| US20230082597A1 (en) | Neural Network Construction Method and System | |
| WO2022012668A1 (fr) | Procédé et appareil de traitement d'ensemble d'apprentissage | |
| CN111860588A (zh) | 一种用于图神经网络的训练方法以及相关设备 | |
| WO2022068623A1 (fr) | Procédé de formation de modèle et dispositif associé | |
| CN112529146A (zh) | 神经网络模型训练的方法和装置 | |
| WO2022111617A1 (fr) | Procédé et appareil d'entraînement de modèle | |
| CN111368972A (zh) | 一种卷积层量化方法及其装置 | |
| CN113627163B (zh) | 一种注意力模型、特征提取方法及相关装置 | |
| CN111931901A (zh) | 一种神经网络构建方法以及装置 | |
| CN111797992A (zh) | 一种机器学习优化方法以及装置 | |
| CN113536970A (zh) | 一种视频分类模型的训练方法及相关装置 | |
| WO2023231794A1 (fr) | Procédé et appareil de quantification de paramètres de réseau neuronal | |
| CN113627422A (zh) | 一种图像分类方法及其相关设备 | |
| CN113191489A (zh) | 二值神经网络模型的训练方法、图像处理方法和装置 | |
| US11868878B1 (en) | Executing sublayers of a fully-connected layer | |
| WO2022156475A1 (fr) | Procédé et appareil de formation de modèle de réseau neuronal, et procédé et appareil de traitement de données | |
| CN115601513A (zh) | 一种模型超参数的选择方法及相关装置 | |
| WO2022227024A1 (fr) | Procédé et appareil opérationnels pour un modèle de réseau neuronal et procédé et appareil d'apprentissage pour un modèle de réseau neuronal |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21842978 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |
|
| 122 | Ep: pct application non-entry in european phase |
Ref document number: 21842978 Country of ref document: EP Kind code of ref document: A1 |