WO2024164590A1 - Quantization method for encoder-decoder network model and related apparatus - Google Patents
- Publication number
- WO2024164590A1 (application PCT/CN2023/129969)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- network
- network layer
- layer
- model
- output
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
Definitions
- the present application relates to the field of image compression, and in particular to a quantization method and related device for a coding and decoding network model.
- Model quantization is a technology that converts the floating-point calculation of the model into fixed-point calculation, which can effectively reduce the model calculation amount, parameter size and memory consumption.
- the codec network model can be deployed on end-side devices with limited resources such as mobile phones and robots.
- the above codec network model suffers from poor stability: for example, the bit rate after encoding an image may be too high, or the reconstructed image may contain anomalies. Even after the codec network model is quantized, this stability problem persists.
- the present application provides a method, device, equipment, storage medium and computer program for quantizing a codec network model, which can solve the problem of poor stability of the codec network model in the related art.
- the technical solution is as follows:
- a quantization method for a codec network model, comprising: determining H network units included in an unquantized codec network model, each of the H network units comprising a first network layer, a second network layer and a third network layer, wherein the first network layer and the second network layer are both linear operation layers, the third network layer is located between the first network layer and the second network layer and is used to truncate the feature values in the feature map output by the first network layer, so that the feature values in the feature map output by the third network layer are all non-negative or all non-positive, and H is an integer greater than or equal to 1; scaling the weights of each output channel included in the first network layer in each network unit, so that the difference between the boundary feature values of the channels in the feature map output by the first network layer in each network unit is less than a feature threshold; and scaling the weights of each input channel included in the second network layer in each network unit, so that the output result of the codec network model remains the same before and after the model weight scaling.
- the first network layer and the second network layer after the model weight scaling can truncate abnormal feature values while ensuring that normal feature values are unaffected by the scaling, thereby reducing the impact on the codec network model of the weight changes of the output channels of the first network layer, so that the finally generated code stream and/or reconstructed image contains no anomalies, improving the stability of the codec network model.
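The equivalence that this scheme relies on can be sketched in NumPy (a minimal illustration with hypothetical layer shapes, not the patent's actual model): scaling each output channel of the first linear layer by a positive ratio and the matching input channels of the following linear layer by its reciprocal leaves the unit's output unchanged, because the intermediate ReLU is positively homogeneous (relu(s*v) == s*relu(v) for s > 0).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small network unit: linear -> ReLU -> linear.
W1 = rng.normal(size=(4, 8))   # first layer: 8 inputs -> 4 output channels
W2 = rng.normal(size=(3, 4))   # second layer: 4 input channels -> 3 outputs
x = rng.normal(size=(8,))

relu = lambda v: np.maximum(v, 0.0)
y_ref = W2 @ relu(W1 @ x)

# Per-output-channel scaling of W1 and matching inverse scaling of the
# corresponding input channels of W2.  ReLU is positively homogeneous,
# so the end-to-end output is unchanged.
s = np.array([0.5, 2.0, 0.25, 1.5])        # one positive ratio per channel
W1_scaled = W1 * s[:, None]
W2_scaled = W2 / s[None, :]
y_scaled = W2_scaled @ relu(W1_scaled @ x)

assert np.allclose(y_ref, y_scaled)
```

The per-channel ratios are free parameters of this identity; the method described above chooses them so that all channels of the first layer's output share a comparable value range.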
- an image can output a feature map after passing through a network layer.
- the codec network model contains multiple network layers, and each network layer includes multiple weights, namely the weights of the input channels and the output channels of that network layer.
- the weights of the input channels can process the feature maps input to the network layer, and the weights of the output channels can process the feature maps output by the network layer.
- the image to be compressed is input into the codec network model, and each network layer in the codec network model can output the corresponding feature map.
- determining a first network layer among H network units included in an unquantized codec network model includes: determining M first sample feature maps output by a first linear operation layer, the M first sample feature maps are feature maps corresponding to M normal sample images, M is an integer greater than or equal to 1, the first linear operation layer is any linear operation layer included in the codec network model, determining N second sample feature maps output by the first linear operation layer, the N second sample feature maps are feature maps corresponding to N abnormal sample images, N is an integer greater than or equal to 1, and when the M first sample feature maps and the N second sample feature maps meet a feature abnormality condition, determining that the first linear operation layer is the first network layer.
- the linear operation layers included in the codec network model can also be screened to determine the linear operation layers that will cause abnormal coding results, and the linear operation layers that will cause abnormal coding results are used as the first network layer. In this way, the weights of each output channel included in the linear operation layer that causes abnormal coding results can be scaled in a targeted manner, which can reduce the computational complexity of the codec network model while effectively ensuring the stability of the codec network model.
- determining the second network layer in the H network units included in the unquantized codec network model includes: for the first network layer in each of the network units, when there is a linear operation layer after the first network layer, determining the linear operation layer after the first network layer as the second network layer; when there is no linear operation layer after the first network layer, adding a linear operation layer after the first network layer, and determining the added linear operation layer as the second network layer.
- the second network layer is a linear operation layer after the first network layer, the second network layer can timely correct the feature values in the feature map output by the first network layer, effectively reducing the impact of the codec network model caused by the change of the weights of each output channel of the first network layer, so that the output result of the codec network model is the same before and after the model weight scaling, thereby ensuring that there is no abnormality in the final generated code stream and/or reconstructed image, and improving the stability of the codec network model.
- scaling the weights of each output channel included in the first network layer in each network unit includes: for the first network layer in each network unit, determining K third sample feature maps output by the first network layer, the K third sample feature maps are feature maps corresponding to K normal sample images, K is an integer greater than or equal to 1, determining a scaling ratio of each output channel included in the first network layer based on the K third sample feature maps, and scaling the weights of each output channel included in the first network layer according to the scaling ratio of each output channel included in the first network layer.
- the normal sample image is an image with no abnormality in the bit rate or the reconstructed image after passing through the codec network model
- the value range of the eigenvalues in each channel of the feature map corresponding to the normal sample image is smaller than the value range of the eigenvalues in the corresponding channel of the feature map corresponding to the abnormal sample image
- the effective truncation of the abnormal eigenvalues can be achieved, thereby ensuring that the values of the truncated abnormal eigenvalues are consistent with the values of the eigenvalues in the normal sample feature map, thereby improving the stability of the codec network model.
- determining the scaling ratio of each output channel included in the first network layer based on the K third sample feature maps includes: determining a reference eigenvalue corresponding to the first network layer and a boundary eigenvalue of each output channel included in the first network layer based on the K third sample feature maps, and determining the ratio between the reference eigenvalue corresponding to the first network layer and the boundary eigenvalue of each output channel included in the first network layer as the scaling ratio of each output channel included in the first network layer.
- the boundary eigenvalues of each output channel included in the first network layer determined based on the third sample feature map can indicate the range of eigenvalues of the feature map corresponding to the normal sample image in each channel
- the reference eigenvalues corresponding to the first network layer can indicate the maximum eigenvalue in the feature map corresponding to the normal sample image.
- the ratio between the reference eigenvalues corresponding to the first network layer and the boundary eigenvalues of each output channel included in the first network layer is determined as the scaling ratio of each output channel included in the first network layer, which can ensure the truncation of abnormal eigenvalues and can also ensure that normal eigenvalues are not affected by the model weight scaling, thereby effectively improving the stability of the codec network model.
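One plausible reading of this ratio computation can be sketched as follows (the patent does not fix the exact statistic; the per-channel maximum absolute value over the sample feature maps is assumed here as the boundary eigenvalue, and the maximum over all channels as the reference eigenvalue):

```python
import numpy as np

def channel_scaling_ratios(feature_maps):
    """Per-output-channel scaling ratios from normal-sample feature maps.

    feature_maps: array of shape (K, C, H, W) -- K sample feature maps
    output by the first network layer.  Assumed statistics: the boundary
    eigenvalue of a channel is its maximum absolute value over all samples,
    and the reference eigenvalue is the maximum over all channels.
    """
    fm = np.asarray(feature_maps)
    boundary = np.abs(fm).max(axis=(0, 2, 3))   # shape (C,)
    reference = boundary.max()                  # scalar
    return reference / boundary                 # ratio >= 1 per channel

fm = np.random.default_rng(1).normal(size=(5, 3, 4, 4))
ratios = channel_scaling_ratios(fm)
# After scaling, every channel's boundary value equals the reference value,
# so the per-channel ranges differ by less than any positive threshold.
```

With these ratios applied to the first layer's output-channel weights, all channels share the same boundary value, which is what makes a single per-tensor truncation value effective for every channel.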
- scaling the weights of the respective input channels included in the second network layer in each network unit includes: for the second network layer in each network unit, determining the scaling ratio of the respective input channels included in the second network layer based on the scaling ratio of the respective output channels included in the first network layer in the network unit where the second network layer is located, and scaling the weights of the respective input channels included in the second network layer according to the scaling ratio of the respective input channels included in the second network layer.
- the weights of each input channel included in the second network layer are scaled, so that the second network layer can modify the feature values in the feature map output by the corresponding first network layer in a targeted manner, so that the normal feature values are not affected by the weight scaling of the first network layer, thereby ensuring that the output results of the normal sample image in the codec network model are the same before and after the model weight scaling, thereby improving the stability of the codec network model.
- the encoding and decoding network model includes an encoding network model, a decoding network model or an entropy estimation network model.
- the quantization method of the codec network model provided in the embodiment of the present application can determine the first network layer and the second network layer from the codec network model, and then realize the correction of abnormal feature values, so that the finally generated bitstream and/or reconstructed image does not have abnormalities, thereby improving the stability of the codec network model.
- the boundary eigenvalue is a maximum eigenvalue or a minimum eigenvalue.
- the boundary eigenvalue is the maximum eigenvalue
- the minimum eigenvalue in the feature map output by the third network layer is 0, that is, all are non-negative.
- the minimum truncation value is generally a negative number
- the abnormal eigenvalue less than the minimum truncation value is truncated to 0 after passing through the third network layer, thereby achieving the correction of the abnormal eigenvalue.
- the maximum eigenvalue in the feature map output by the third network layer is 0, that is, all are non-positive.
- the quantization method of the codec network model provided in the embodiment of the present application can realize the correction of the abnormal eigenvalue, so that the final generated code stream and/or reconstructed image does not have abnormalities, thereby improving the stability of the codec network model.
- the value range of the feature value in the normal sample feature map output by any network layer in the codec network model is called the truncation value of the network layer, and the truncation value includes the minimum truncation value and the maximum truncation value.
- the minimum truncation value refers to the minimum value in the value range
- the maximum truncation value refers to the maximum value in the value range.
- a quantization device for a codec network model wherein the quantization device for the codec network model has the function of implementing the quantization method behavior of the codec network model in the first aspect.
- the quantization device for the codec network model includes at least one module, and the at least one module is used to implement the quantization method of the codec network model provided in the first aspect.
- a quantization device for a codec network model comprising a processor and a memory, the memory being used to store a computer program for executing the quantization method for the codec network model provided in the first aspect.
- the processor is configured to execute the computer program stored in the memory to implement the quantization method for the codec network model described in the first aspect.
- the quantization device of the codec network model may further include a communication bus, and the communication bus is used to establish a connection between the processor and the memory.
- a computer-readable storage medium stores instructions, and when the instructions are executed on a computer, the computer executes the quantization method of the encoding and decoding network model described in the first aspect.
- a computer program product comprising instructions is provided, and when the instructions are executed on a computer, the computer executes the quantization method of the codec network model described in the first aspect.
- a computer program is provided, and when the computer program is executed on a computer, the computer executes the quantization method of the codec network model described in the first aspect.
- FIG1 is a schematic diagram of an image compression framework provided in an embodiment of the present application.
- FIG2 is a schematic diagram of a network model provided in an embodiment of the present application.
- FIG3 is a statistical diagram of a maximum eigenvalue provided in an embodiment of the present application.
- FIG4 is a statistical diagram of a minimum eigenvalue provided in an embodiment of the present application.
- FIG5 is a statistical diagram of another maximum eigenvalue provided in an embodiment of the present application.
- FIG6 is a schematic diagram of a reconstructed image provided by an embodiment of the present application.
- FIG7 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
- FIG8 is a schematic diagram of another implementation environment provided by an embodiment of the present application.
- FIG9 is a schematic diagram of the structure of a coding and decoding framework provided in an embodiment of the present application.
- FIG10 is a schematic diagram of the structure of a coding network model provided in an embodiment of the present application.
- FIG11 is a schematic diagram of the structure of a decoding network model provided in an embodiment of the present application.
- FIG12 is a schematic diagram of the structure of an entropy estimation network model provided in an embodiment of the present application.
- FIG13 is a flowchart of a quantization method of a coding and decoding network model provided in an embodiment of the present application.
- FIG14 is a schematic diagram of a network unit provided in an embodiment of the present application.
- FIG15 is a schematic diagram of a weight provided in an embodiment of the present application.
- FIG16 is a statistical diagram of another maximum eigenvalue provided in an embodiment of the present application.
- FIG17 is a schematic diagram of another weight provided in an embodiment of the present application.
- FIG18 is a schematic diagram of another reconstructed image provided by an embodiment of the present application.
- FIG19 is a schematic diagram of the structure of an entropy estimation network model provided in an embodiment of the present application.
- FIG20 is a schematic diagram of the structure of a quantization device for a coding and decoding network model provided in an embodiment of the present application.
- AI Artificial intelligence
- ReLU: Rectified linear unit, a commonly used activation function in artificial neural networks, usually referring to the ramp function and its variants.
- CNN Convolutional neural network
- CNN is a feedforward neural network with deep structure and convolution calculation. It is one of the representative algorithms of deep learning. CNN may also contain activation layer (such as ReLU, etc.), pooling layer, batch normalization layer, fully connected layer and other modules.
- Typical convolutional neural networks include LeNet, AlexNet, VGGNet, ResNet, etc.
- Basic CNN can be composed of backbone network and head network, and complex CNN is composed of backbone, neck and head network.
- Feature map refers to the three-dimensional data output by the network layers such as convolution layer, activation layer, pooling layer, batch normalization layer, etc. in CNN.
- the three dimensions of this three-dimensional data are width, height, and channel.
- an image can output a feature map after passing through a network layer.
- the network layer contains three convolution kernels
- the three convolution kernels can respectively convolve the feature map input by the network layer to obtain the feature map output by the network layer.
- the number of channels of the feature map output by the network layer is 3; that is, the convolution kernels contained in the network layer correspond one-to-one to the channels of the feature map it outputs.
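The one-to-one correspondence between kernels and output channels can be checked with a minimal valid convolution (illustrative shapes; a deliberately naive loop implementation for clarity):

```python
import numpy as np

def conv2d_valid(image, kernels):
    """Valid (no padding) 2D convolution: one output channel per kernel."""
    kh, kw = kernels.shape[1:]
    H, W = image.shape
    out = np.empty((kernels.shape[0], H - kh + 1, W - kw + 1))
    for c, k in enumerate(kernels):          # channel c <-> kernel c
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = (image[i:i + kh, j:j + kw] * k).sum()
    return out

img = np.random.default_rng(2).normal(size=(8, 8))
kernels = np.random.default_rng(3).normal(size=(3, 3, 3))  # 3 kernels
fmap = conv2d_valid(img, kernels)
assert fmap.shape == (3, 6, 6)   # 3 channels, one per kernel
```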
- VAE: Variational autoencoder, an AI-based image codec used for data compression or noise removal.
- the embodiment of the present application is introduced by taking the VAE-based image compression framework as an example.
- the VAE-based image compression framework is shown in Figure 1.
- the image to be compressed is input into the encoding network model to obtain the features to be encoded.
- the features to be encoded are quantized to obtain the quantized features to be encoded.
- the probability distribution is estimated through the entropy estimation network model.
- the quantized features to be encoded are entropy encoded based on the probability distribution to obtain an image code stream.
- the decoding process corresponds to the encoding process.
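The encoding pipeline of Figure 1 can be sketched with toy stand-ins (the `encoder` and `entropy_model` below and the ideal code-length computation are illustrative placeholders, not the patent's networks):

```python
import numpy as np

def encoder(image):
    """Stand-in for the encoding network model (toy feature extractor)."""
    return image.mean(axis=-1)

def entropy_model(q_features):
    """Stand-in for the entropy estimation network model: a probability
    table over the quantized symbol alphabet."""
    vals, counts = np.unique(q_features, return_counts=True)
    return dict(zip(vals.tolist(), (counts / counts.sum()).tolist()))

def compress(image):
    features = encoder(image)                 # features to be encoded
    q = np.round(features).astype(np.int64)   # quantization
    probs = entropy_model(q)                  # probability distribution
    # Ideal entropy-coded length in bits (stand-in for a real entropy coder).
    bits = -sum(np.log2(probs[v]) for v in q.ravel().tolist())
    return q, bits

image = np.random.default_rng(4).normal(size=(4, 4, 3))
q, bits = compress(image)
```

Decoding mirrors this flow: the entropy model supplies the same probability distribution to the entropy decoder, and the decoding network model reconstructs the image from the dequantized features.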
- PSNR Peak signal to noise ratio
- Bit rate In image compression, it refers to the code length required to encode a unit pixel. The higher the bit rate, the better the image reconstruction quality.
- Bits per pixel (BPP): the number of bits used to store each pixel. A smaller BPP corresponds to a lower compression bit rate, and a larger BPP to a higher one.
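As a worked example of the definition (illustrative numbers):

```python
# BPP = total bits in the code stream / number of pixels.
# Example: a 1920x1080 image compressed to a 311,040-byte code stream.
bits = 311_040 * 8        # 2,488,320 bits
pixels = 1920 * 1080      # 2,073,600 pixels
bpp = bits / pixels       # 1.2 bits per pixel
assert abs(bpp - 1.2) < 1e-9
```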
- Quantization refers to converting a continuous signal into a discrete signal.
- image compression process it refers to converting a continuous feature into a discrete signal.
- probability value of the probability distribution is usually changed from a continuous value to a discrete value.
- Rate-distortion curve (RD-Curve): The horizontal axis is the bit rate, and the vertical axis is the image quality. Generally, the closer the curve is to the upper left, the better the encoding and decoding performance.
- AI compression algorithm refers to a data compression method based on deep learning technology.
- Entropy estimation The process of predicting the distribution of quantized data.
- Model quantization refers to converting a floating-point neural network model into a fixed-point model that can be represented by a finite number of bits. Model quantization establishes a mapping relationship between floating-point data and fixed-point data. Its principle can be understood as using n-bit integers (e.g., n = 8, 10 or 16) to represent the floating-point weights in the network model and the tensors produced during calculation. For ease of understanding, the process of quantizing tensors during model calculation is introduced by taking the network model shown in Figure 2 as an example.
- x represents the input of the network model
- y represents the output of the network model
- a1 is the feature map output by the convolutional layer (Conv)
- a2 is the feature map output by the activation layer (Relu)
- qx represents the fixed-point value of x after quantization
- qa1 represents the feature map after quantization of a1
- qa2 represents the feature map after quantization of a2
- qy represents the data output by the fully connected layers (FC).
- FC fully connected layers
- the quantization formulas corresponding to x, a1, a2, and y are determined. In this way, during the inference process of the model, the input of the model and the tensor in the calculation process can be quantized based on the quantization formulas corresponding to x, a1, a2, and y to obtain qy, and qy can be dequantized to obtain the inferred floating-point value y.
- the quantization formula corresponding to the input x of the network model can be expressed by the following expression (1).
- x quant represents the fixed-point value after quantization
- x represents the floating-point value that has not been quantized
- scale represents the smallest increment that the quantized fixed-point value can represent, i.e., the quantization step size
- zero_point represents the fixed-point value corresponding to the floating-point value 0, that is, the zero point position
- quant min represents the minimum value in the value range of x quant
- quant max represents the maximum value in the value range of x quant
- the round() function is used to round the numbers in the brackets
- the clamp() function is used to limit x quant to the range [quant min , quant max ]. That is, if the calculated fixed-point value falls outside the value range of x quant , it is truncated to whichever of quant min and quant max is closer to the calculated fixed-point value.
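Expression (1) itself is not reproduced in this excerpt, but a standard affine quantizer consistent with the terms described above (scale, zero_point, round(), clamp(), quant min, quant max) can be sketched as:

```python
import numpy as np

def quantize(x, scale, zero_point, quant_min, quant_max):
    """Affine quantization assumed from the described terms:
    round to the nearest step, shift by the zero point, then clamp
    to the representable fixed-point range."""
    x_quant = np.round(x / scale) + zero_point
    return np.clip(x_quant, quant_min, quant_max)

def dequantize(x_quant, scale, zero_point):
    """Inverse mapping back to a floating-point value."""
    return (x_quant - zero_point) * scale

x = np.array([-1.3, 0.0, 0.7, 5.0])
q = quantize(x, scale=0.05, zero_point=0, quant_min=-128, quant_max=127)
x_hat = dequantize(q, scale=0.05, zero_point=0)
# Here every input lies within the representable range, so dequantization
# recovers the original values up to the quantization step.
```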
- the main model quantization methods include post-training quantization (PTQ) and quantization aware training (QAT).
- PTQ refers to directly converting a floating-point network model into a fixed-point network model without retraining the network model, that is, without updating the weights of the network model.
- QAT refers to inserting a pseudo-quantization (simulated quantization) operation into the network model during network model training or fine-tuning to simulate the quantization operation, so that the parameters of the network model can better adapt to the information loss caused by quantization.
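A minimal sketch of the pseudo-quantization operation inserted during QAT (quantize, then immediately dequantize, so the network trains against the rounding and clamping error while the data stays floating point):

```python
import numpy as np

def fake_quant(x, scale, zero_point, quant_min, quant_max):
    """Pseudo-quantization (simulated quantization): the output is a
    floating-point tensor that carries the error a real fixed-point
    representation would introduce."""
    q = np.clip(np.round(x / scale) + zero_point, quant_min, quant_max)
    return (q - zero_point) * scale

x = np.array([0.123, -0.049, 2.7])
x_fq = fake_quant(x, scale=0.1, zero_point=0, quant_min=-128, quant_max=127)
# x_fq holds the values snapped to the 0.1 quantization grid.
```

In a real QAT setup this operation is placed after weights and activations, and gradients flow through it with a straight-through estimator; the sketch above only shows the forward computation.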
- Image compression refers to a technology that uses image data characteristics such as spatial redundancy, visual redundancy, and statistical redundancy to represent the original image pixel matrix losslessly or with fewer bits, achieving effective transmission and storage of image information. It plays an important role in the current media era where the types and amount of image information are increasing.
- Image compression technology includes encoding and decoding of images, and encoding and decoding performance (reflecting image quality) and encoding and decoding efficiency (reflecting time consumption) are factors that need to be considered in image compression technology.
- JPEG Joint Photographic Experts Group
- Stability mainly includes two aspects. On the one hand, it is the stability of the bit rate after encoding the image without adding noise and the stability of the effect of reconstructing the image. On the other hand, it is the stability of the bit rate after encoding the image with specific noise and the stability of the effect of reconstructing the image, that is, the stability against attacks.
- AI compression algorithms have poor stability in both aspects. For example, for some images with added noise, the bit rate after passing through the codec network model of the AI compression algorithm is too high and/or the reconstructed image contains anomalies.
- the bit rate of the image with added specific noise increases abnormally and/or the reconstructed image has abnormalities after passing through the codec network model, that is, the performance of the AI compression algorithm is severely reduced.
- images with no abnormalities in bit rate and reconstructed image after passing through the codec network model are referred to as normal sample images, and images with abnormalities in bit rate and/or reconstructed image after passing through the codec network model are referred to as abnormal sample images.
- technicians have collected statistics on the feature maps output by the normal sample images and the abnormal sample images at each layer of the codec network model.
- the feature map output by any layer of the normal sample image in the codec network model can be called a normal sample feature map
- the feature map output by any layer of the abnormal sample image in the codec network model can be called an abnormal sample feature map. That is, the maximum eigenvalue and the minimum eigenvalue in the feature map of each channel included in the normal sample feature map and the abnormal sample feature map are counted.
- FIG3. Taking the hyper encoder (HyEnc) network model included in the codec network model as an example, the statistics of the maximum eigenvalue in each channel of the normal sample feature map and the abnormal sample feature map output by the first network layer of the hyper encoder network model show that, in that layer, the maximum eigenvalue in each channel of the abnormal sample feature map is greater than the maximum eigenvalue in the corresponding channel of the normal sample feature map.
- the minimum eigenvalue in the feature graph of each channel included in the abnormal sample feature graph is less than the minimum eigenvalue in the feature graph of each channel included in the normal sample feature graph.
- the range of eigenvalues in the feature graph of each channel of the normal sample feature graph is smaller than the range of eigenvalues in the feature graph of the corresponding channel of the abnormal sample feature graph.
- the codec network model introduces a huge amount of parameters and calculations while improving the codec performance, which leads to various problems when the codec network model runs on a resource-limited end-side device.
- the codec network model runs on a low-performance mobile device or a low-power embedded device, the efficiency of model reasoning will be reduced.
- some end-side devices do not support floating-point calculations of the codec network model, which limits the deployment of the codec network model on the end-side device.
- the above-mentioned codec network model has the problem of poor stability and is difficult to deploy on end-side devices with limited resources.
- the range of eigenvalues in the normal sample feature map output by each network layer in the codec network model can be determined from normal sample images during model quantization. During model inference, the eigenvalue of each feature point in the feature map output by the corresponding network layer can then be truncated based on that range, so that the eigenvalues are consistent with those in the normal sample feature map. This reduces abnormal eigenvalues to a certain extent, but it cannot effectively solve the problem of poor stability of the codec network model.
- the range of eigenvalues in the normal sample feature map output by any network layer in the codec network model is referred to as the truncation value of the network layer in the following text, and the truncation value includes the minimum truncation value and the maximum truncation value.
- the minimum truncation value refers to the minimum value in the value range
- the maximum truncation value refers to the maximum value in the value range.
- the truncation value used by the network layer is the maximum eigenvalue and the minimum eigenvalue of all eigenvalues included in the normal sample feature map output by the network layer.
- the truncation value may be greater than or equal to the entire range of eigenvalues of the abnormal sample feature map in a certain channel. In that case, the abnormal eigenvalues of that channel cannot be truncated, the bit rate and/or reconstructed image of the abnormal sample image after passing through the codec network model will still be abnormal, and the problem of poor stability of the codec network model is not effectively solved.
- Figure 5 is a statistical graph of the maximum eigenvalues in the feature map of each channel included in the normal sample feature map and the abnormal sample feature map output by a network layer in the codec network model.
- the maximum truncation value of the network layer is greater than the maximum eigenvalue of the abnormal sample feature map in every channel except channel 8.
- the abnormal eigenvalues of other channels except channel 8 cannot be truncated.
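The failure mode described above, and how per-channel weight scaling fixes it, can be reproduced numerically (illustrative values; the compensating inverse scaling of the following layer is omitted for brevity):

```python
import numpy as np

# Per-tensor truncation can miss per-channel anomalies.  Suppose the
# normal per-channel maxima are [1.0, 100.0]; the per-tensor maximum
# truncation value is then 100.0, shared by all channels.
normal_max = np.array([1.0, 100.0])
trunc_max = normal_max.max()

# An abnormal value of 50.0 in channel 0 is 50x its normal range but
# survives the per-tensor clamp untouched.
abnormal = np.array([50.0, 80.0])
clipped = np.minimum(abnormal, trunc_max)
assert np.array_equal(clipped, abnormal)      # nothing was truncated

# After per-channel weight scaling (ratio = trunc_max / normal_max), all
# channels share the range [0, 100]; the same clamp now catches channel 0.
ratios = trunc_max / normal_max               # [100.0, 1.0]
scaled = abnormal * ratios                    # [5000.0, 80.0]
clipped_scaled = np.minimum(scaled, trunc_max)
assert clipped_scaled[0] == trunc_max         # anomaly truncated
```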
- the reconstructed image a in Figure 6 is the reconstructed image obtained by the abnormal sample image after the unquantized codec network model
- the reconstructed image b is the reconstructed image obtained by the abnormal sample image after the quantized codec network model.
- in view of this, the embodiment of the present application provides a quantization method for a codec network model. After the codec network model is quantized by this method, the abnormal feature values in each channel of the feature map output by the first network layer can be effectively truncated, thereby correcting the feature map output by the first network layer, so that the finally generated bitstream and/or reconstructed image contains no anomalies, thereby improving the stability of the codec network model.
- FIG. 7 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
- the implementation environment includes a source device 10, a destination device 20, a link 30, and a storage device 40.
- the source device 10 can generate an encoded image. Therefore, the source device 10 can also be referred to as an image encoding device.
- the destination device 20 can decode the encoded image generated by the source device 10. Therefore, the destination device 20 can also be referred to as an image decoding device.
- the link 30 can receive the encoded image generated by the source device 10, and can transmit the encoded image to the destination device 20.
- the storage device 40 can receive the encoded image generated by the source device 10, and can store the encoded image.
- the destination device 20 can directly obtain the encoded image from the storage device 40.
- the storage device 40 can correspond to a file server or another intermediate storage device that can store the encoded image generated by the source device 10. Under such conditions, the destination device 20 can stream or download the encoded image stored in the storage device 40.
- the source device 10 and the destination device 20 may each include one or more processors and a memory coupled to the one or more processors, the memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, any other medium that can be used to store desired program code in the form of instructions or data structures that can be accessed by a computer, etc.
- the source device 10 and the destination device 20 may each include a mobile phone, a smart phone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart car device, a smart TV, a smart speaker, a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a set-top box, a telephone handset such as a so-called "smart" phone, a television, a camera, a display device, a digital media player, a video game console, a car computer, or the like.
- the link 30 may include one or more media or devices capable of transmitting the encoded image from the source device 10 to the destination device 20.
- the link 30 may include one or more communication media that enable the source device 10 to send the encoded image directly to the destination device 20 in real time.
- the source device 10 may modulate the encoded image based on a communication standard, which may be a wireless communication protocol, etc., and may send the modulated image to the destination device 20.
- the one or more communication media may include wireless and/or wired communication media, for example, the one or more communication media may include a radio frequency (RF) spectrum or one or more physical transmission lines.
- the one or more communication media may form part of a packet-based network, which may be a local area network, a wide area network, or a global network (e.g., the Internet), etc.
- the one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, etc., and the embodiment of the present application does not specifically limit this.
- the storage device 40 may store the received encoded image sent by the source device 10, and the destination device 20 may directly obtain the encoded image from the storage device 40.
- the storage device 40 may include any of a variety of distributed or locally accessible data storage media, for example, any of the multiple distributed or locally accessible data storage media may be a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), a flash memory, a volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded images.
- the storage device 40 may correspond to a file server or another intermediate storage device that can store the encoded image generated by the source device 10, and the destination device 20 may stream or download the image stored in the storage device 40.
- the file server may be any type of server that can store the encoded image and send the encoded image to the destination device 20.
- the file server may include a network server, a file transfer protocol (FTP) server, a network attached storage (NAS) device, or a local disk drive, etc.
- the destination device 20 may obtain the encoded image through any standard data connection (including an Internet connection).
- the standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of the two that is suitable for obtaining the encoded image stored on the file server.
- the transmission of the encoded image from the storage device 40 may be streaming transmission, download transmission, or a combination of the two.
- the implementation environment shown in FIG. 7 is only one possible implementation. The technology of the embodiment of the present application is applicable not only to the source device 10 that can encode images and the destination device 20 that can decode the encoded images shown in FIG. 7, but also to other devices that can encode images and decode encoded images; the embodiment of the present application does not specifically limit this.
- the source device 10 includes a data source 120, an encoder 100, and an output interface 140.
- the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter, where the transmitter may also be referred to as a sender.
- the data source 120 may include an image capture device (e.g., a camera, etc.), an archive containing previously captured images, a feed interface for receiving images from an image content provider, and/or a computer graphics system for generating images, or a combination of these sources of images.
- the data source 120 may send an image to the encoder 100, and the encoder 100 may encode the image received from the data source 120 to obtain an encoded image.
- the encoder may send the encoded image to the output interface.
- the source device 10 directly sends the encoded image to the destination device 20 via the output interface 140.
- the encoded image may also be stored in the storage device 40 for the destination device 20 to obtain and use for decoding and/or display later.
- the destination device 20 includes an input interface 240, a decoder 200, and a display device 220.
- the input interface 240 includes a receiver and/or a modem.
- the input interface 240 may receive an encoded image via the link 30 and/or from the storage device 40, and then send it to the decoder 200, and the decoder 200 may decode the received encoded image to obtain a decoded image.
- the decoder may send the decoded image to the display device 220.
- the display device 220 may be integrated with the destination device 20 or may be external to the destination device 20. In general, the display device 220 displays the decoded image.
- the display device 220 may be any of a variety of types of display devices, for example, the display device 220 may be a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or other types of display devices.
- the encoder 100 and the decoder 200 may each be integrated with an audio encoder and an audio decoder, respectively, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software for encoding both audio and video in a common data stream or in separate data streams.
- the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP), if applicable.
- the encoder 100 and the decoder 200 may each be any of the following circuits: one or more microprocessors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), discrete logic, hardware, or any combination thereof.
- if the technology is implemented partially in software, the device may store instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware to implement the technology of the embodiment of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be regarded as one or more processors.
- Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, and any of the encoders or decoders may be integrated as part of a combined encoder/decoder (codec) in a corresponding device.
- the embodiments of the present application may generally refer to the encoder 100 as “signaling” or “sending” certain information to another device, such as the decoder 200.
- the term “signaling” or “sending” may generally refer to the transmission of syntax elements and/or other data used to decode the compressed image. This transmission may occur in real time or near real time. Alternatively, this communication may occur over a period of time, such as when the syntax elements are stored in the encoded bitstream to a computer-readable storage medium at the time of encoding, and the decoding device may then retrieve the syntax elements at any time after the syntax elements are stored to this medium.
- the quantization method of the codec network model provided in the embodiment of the present application can be applied to a variety of scenarios.
- the images encoded and decoded in various scenarios can be images included in image files or images included in video files.
- Fig. 8 is a schematic diagram of another implementation environment provided by an embodiment of the present application.
- the implementation environment includes an encoding end and a decoding end, the encoding end includes an AI encoding module, an entropy encoding module and a file saving module, and the decoding end includes a file loading module, an entropy decoding module and an AI decoding module.
- the encoding end obtains the image to be compressed, such as a video or picture captured by a camera.
- the AI encoding module obtains the features to be encoded and the corresponding probability distribution. Based on the probability distribution, the entropy encoding module performs entropy encoding on the features to be encoded to obtain a bitstream file, which is saved by the file saving module to obtain a compressed file of the image.
- the compressed file is input to the decoding end, which loads the compressed file through the file loading module and obtains the reconstructed image through the entropy decoding module and the AI decoding module.
- the above encoding and decoding process can be implemented by an encoding and decoding network model, and the quantization method of the encoding and decoding network model provided in the embodiment of the present application can quantize the encoding and decoding network model.
- the encoding and decoding network model includes at least one network model of an encoding network model, a decoding network model and an entropy estimation network model.
- the AI encoding module in Figure 8 is equivalent to including the encoding network model and the entropy estimation network model in Figure 1, and the AI decoding module is equivalent to including the decoding network model and the entropy estimation network model in Figure 1.
- the data processing process of the AI encoding module and the AI decoding module is implemented on an embedded neural network processing unit (NPU) to improve data processing efficiency, and the processes such as entropy coding, saving files, and loading files are implemented on a central processing unit (CPU).
- the encoding end and the decoding end are one device, or the encoding end and the decoding end are two independent devices. If the encoding end and the decoding end are one device, the codec network model quantized by the method provided in the embodiment of the present application can be deployed on that device, so that the device has both an image compression function and an image decompression function. If the encoding end and the decoding end are two independent devices, the codec network models quantized by the method provided in the embodiment of the present application can be deployed on the two devices respectively, so that each device has either an image compression function or an image decompression function.
- the quantization method of the encoding and decoding network model provided in the embodiment of the present application can be applied to a variety of scenarios, such as cloud storage, video surveillance, live broadcast, transmission and other business scenarios, and can be specifically applied to terminal recording, video albums, cloud storage, etc.
- the quantization method of the codec network model provided in the embodiment of the present application can be applied to any codec network model.
- next, the codec network model based on a variational autoencoder (VAE) is introduced.
- in the encoding process, the original image is input into the encoding network model for feature extraction to obtain the image features y to be quantized of multiple feature points, and the image features y to be quantized of the multiple feature points are quantized to obtain the first image features ŷ of the multiple feature points.
- the first image features ŷ of the multiple feature points are input into the super coding network model to obtain the super-prior features z to be quantized of the multiple feature points, and the super-prior features z to be quantized of the multiple feature points are quantized to obtain the first super-prior features ẑ of the multiple feature points.
- entropy coding is performed on the first super-prior features ẑ of the multiple feature points according to a specified probability distribution. As shown in Figure 2, the bit sequence obtained by this entropy coding is a partial bit sequence included in the code stream.
- this partial bit sequence (as shown by the black and white bars on the right side of FIG. 2) can be called the super-prior bit stream.
- the first super-prior features ẑ of the multiple feature points are input into the super decoding network model to obtain the prior features ψ of the multiple feature points.
- the first image features ŷ of the multiple feature points are input into the context model (CM) to obtain the context features φ of the multiple feature points.
- based on the prior features ψ and the context features φ of the multiple feature points, the probability distribution N(μ, σ) of the multiple feature points is estimated through the probability distribution estimation network model (shown as the gather model, GM).
- based on the probability distribution N(μ, σ), entropy coding is performed sequentially on the first image feature ŷ of each feature point in the multiple feature points.
- the bit sequence obtained by this entropy coding is a partial bit sequence included in the code stream. This partial bit sequence (as shown by the black and white bars on the left side of FIG. 9) can be called the image bit stream.
- in the decoding process, the first super-prior features ẑ of the multiple feature points are obtained by entropy decoding the super-prior bit stream included in the code stream according to the specified probability distribution.
- the first super-prior features ẑ of the multiple feature points are input into the super decoding network model to obtain the prior features ψ of the multiple feature points. For the first feature point among the multiple feature points, the probability distribution of the first feature point is estimated based on the prior features of the first feature point, and the first image feature of the first feature point is parsed from the image bit stream included in the code stream based on the probability distribution of the first feature point.
- for a non-first feature point among the multiple feature points, the first image features of the already decoded feature points are input into the context model CM to obtain the context features of the feature point.
- the prior features and the context features of the feature point are then combined to estimate the probability distribution of the feature point through the probability distribution estimation network model GM, and the first image feature of the feature point is parsed from the image bit stream included in the code stream based on the probability distribution.
- if any of the above-mentioned probability distribution estimation network models uses a Gaussian distribution model for modeling, the estimated probability distribution parameters include a mean and a variance. If any of the above-mentioned probability distribution estimation network models uses a Laplace distribution model for modeling, the estimated probability distribution parameters include a location parameter and a scale parameter. If any of the above-mentioned probability distribution estimation network models uses a logistic distribution model for modeling, the estimated probability distribution parameters include a mean and a scale parameter.
- the probability distribution estimation network model in the embodiment of the present application can also be referred to as a factor entropy model, and the probability distribution estimation network model is a part of the entropy estimation network model, and the entropy estimation network model also includes the above-mentioned super encoding network model and super decoding network model.
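As a hedged illustration of how the estimated distribution drives entropy coding (the function name and the unit-width quantization bin are assumptions for this sketch, not the patent's notation): the code length of a quantized feature value is approximately the negative base-2 logarithm of the probability mass that the estimated Gaussian N(μ, σ) assigns to its quantization bin, so values the model considers likely cost few bits.

```python
import math

def gaussian_bits(x, mu, sigma, step=1.0):
    """Approximate code length in bits of quantized symbol x under N(mu, sigma):
    -log2 of the probability mass on the bin [x - step/2, x + step/2]."""
    def cdf(v):
        return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
    p = max(cdf(x + step / 2) - cdf(x - step / 2), 1e-12)
    return -math.log2(p)

# A symbol near the estimated mean is cheap; a far-away (abnormal) one is costly.
near = gaussian_bits(0.0, mu=0.0, sigma=1.0)
far = gaussian_bits(5.0, mu=0.0, sigma=1.0)
```

This also shows why an untruncated abnormal feature value inflates the bit rate: the estimated distribution assigns it almost no probability mass.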
- FIG10 is a schematic diagram of the structure of a coding network model provided in an embodiment of the present application.
- the coding network model is a convolutional neural network model, which includes four convolutional layers (Conv) and three activation layers interspersed in cascade (such as activation layers built based on ReLU or other activation functions).
- the convolution kernel size of each convolution layer is 5 × 5, the number of channels of the output feature map is M, and each convolution layer downsamples the width and height by a factor of 2.
- the structure of the encoding network model shown in Figure 10 is not used to limit the embodiments of the present application.
- the convolution kernel size, the number of channels of the feature map, the downsampling multiple, the number of downsampling times, the number of convolution layers, etc. can all be adjusted.
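Assuming "same"-style padding of 2 for the 5 × 5 kernels (the padding, the input size, and the example value M = 192 below are assumptions for illustration, since the patent leaves them adjustable), the four stride-2 convolution layers reduce the spatial resolution by a factor of 16 overall:

```python
def conv_out(size, kernel=5, stride=2, pad=2):
    """Spatial size after one stride-2 convolution layer."""
    return (size + 2 * pad - kernel) // stride + 1

def encoder_output_shape(h, w, m=192, num_layers=4):
    """(channels, height, width) after the four downsampling conv layers;
    M = 192 output channels is an assumed example value."""
    for _ in range(num_layers):
        h, w = conv_out(h), conv_out(w)
    return (m, h, w)

# 256 -> 128 -> 64 -> 32 -> 16 in each spatial dimension.
shape = encoder_output_shape(256, 256)
```

The decoding network model of Figure 11 mirrors this with factor-2 upsampling per layer.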
- FIG 11 is a structural diagram of a decoding network model provided in an embodiment of the present application.
- the decoding network model is a convolutional neural network model, which includes four convolutional layers (Conv) and three activation layers interspersed in cascade (such as activation layers built based on Relu or other activation functions).
- the convolution kernel size of each convolution layer is 5 × 5, the number of channels of the output feature map is M or N, and each convolution layer upsamples the width and height by a factor of 2.
- the structure of the decoding network model shown in Figure 11 is not used to limit the embodiments of the present application.
- the convolution kernel size, the number of channels of the feature map, the upsampling multiple, the number of upsampling times, the number of convolution layers, etc. can all be adjusted.
- FIG 12 is a schematic diagram of the structure of an entropy estimation network model provided in an embodiment of the present application.
- the entropy estimation network model includes a super encoding network model (HyEnc), a factor entropy model and a super decoding network model (HyDec).
- the super encoding network model includes three convolutional layers (Conv) and two activation layers interspersed in cascade (such as activation layers built based on Relu or other activation functions).
- the convolution kernel size of each convolutional layer is 5 × 5, the number of channels of the output feature map is M, the first two convolutional layers downsample the width and height by a factor of 2, and the last convolutional layer does not downsample.
- the network model structure of the factor entropy model is the same as the network structure of the probability distribution estimation network model introduced above.
- the super decoding network model also includes three convolutional layers (Conv) and two activation layers interspersed in cascade (such as activation layers built based on ReLU or other activation functions).
- the convolution kernel size of each convolutional layer is 5 × 5, the number of channels of the output feature map is M, the first convolutional layer does not upsample, and the last two convolutional layers upsample the width and height by a factor of 2. It should be noted that the structure of the entropy estimation network model shown in Figure 12 is not used to limit the embodiments of the present application.
- the convolution kernel size, the number of channels of the feature map, the downsampling multiple, the number of downsampling times, the upsampling multiple, the number of upsampling times, the number of convolution layers, etc. can all be adjusted.
- Fig. 13 is a flow chart of a quantization method of a coding network model provided in an embodiment of the present application. Referring to Fig. 13, the method includes the following steps.
- Step 1301: Determine H network units included in the unquantized codec network model, where each of the H network units includes a first network layer, a second network layer and a third network layer, the first network layer and the second network layer are both linear operation layers, the third network layer is located between the first network layer and the second network layer, the third network layer is used to truncate the feature values in the feature map output by the first network layer so that the feature values in the feature map output by the third network layer are all non-negative or all non-positive, and H is an integer greater than or equal to 1.
- multiple third network layers can be determined from an unquantized codec network model, and for any third network layer among the multiple third network layers, if the network layer adjacent to the third network layer is a linear operation layer, the adjacent network layer is used as a first network layer, and if the network layer adjacent to the third network layer is not a linear operation layer, the adjacent network layer is not used as a first network layer.
- Each third network layer among the multiple third network layers is processed in the same manner, and finally the first network layer among the H network units can be determined.
- each linear operation layer adjacent to each third network layer in the codec network model can be used as the first network layer.
- each linear operation layer included in the codec network model can also be screened to determine the linear operation layer that will cause abnormal codec results, and the linear operation layer that will cause abnormal codec results will be used as the first network layer.
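A minimal sketch of this identification of network units (the layer representation and the kind labels are assumptions; a real implementation would walk the model's computation graph): each truncation layer whose preceding neighbor is a linear operation layer yields a (first, third, second) triple, where the second network layer is the next linear operation layer after the truncation layer, or None when one must be added.

```python
def find_network_units(layers):
    """layers: list of (name, kind) with kind in {"linear", "trunc", "other"}.
    Returns (first, third, second) index triples for each network unit."""
    units = []
    for i, (_, kind) in enumerate(layers):
        if kind != "trunc":
            continue
        # Only an adjacent linear operation layer can be the first network layer.
        if i > 0 and layers[i - 1][1] == "linear":
            # The next linear layer acts as the second network layer; None
            # signals that a linear layer has to be added after this unit.
            second = next((j for j in range(i + 1, len(layers))
                           if layers[j][1] == "linear"), None)
            units.append((i - 1, i, second))
    return units

model = [("conv1", "linear"), ("relu1", "trunc"),
         ("conv2", "linear"), ("relu2", "trunc")]
units = find_network_units(model)   # [(0, 1, 2), (2, 3, None)]
```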
- in some embodiments, determine M first sample feature maps output by a first linear operation layer, where the M first sample feature maps are feature maps corresponding to M normal sample images, M is an integer greater than or equal to 1, and the first linear operation layer is any linear operation layer included in the codec network model; determine N second sample feature maps output by the first linear operation layer, where the N second sample feature maps are feature maps corresponding to N abnormal sample images and N is an integer greater than or equal to 1; and when the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition, determine that the first linear operation layer is the first network layer.
- that is, the M feature maps output by the first linear operation layer for the M normal sample images are used as the M first sample feature maps
- and the N feature maps output by the first linear operation layer for the N abnormal sample images are used as the N second sample feature maps.
- based on the M first sample feature maps, the first statistical parameter is determined; based on the N second sample feature maps, the second statistical parameter is determined; and based on the first statistical parameter and the second statistical parameter, it is determined whether the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition.
- the first linear operation layer includes W channels
- the first sample feature map includes feature maps of the W channels
- the first statistical parameter includes first parameters of the W channels.
- the implementation process of determining the first statistical parameter includes: counting candidate parameters of the feature map of the target channel included in each of the M first sample feature maps to obtain M candidate parameters, the target channel is any one of the W channels, and based on the M candidate parameters, determining the first parameter of the target channel.
- the first parameter includes a first maximum value and a first minimum value
- the feature map of the target channel in each first sample feature map includes feature values of multiple feature points
- the candidate parameter includes a candidate maximum feature value and a candidate minimum feature value.
- the detailed implementation process of determining the first parameter of the target channel includes: counting the maximum feature value and the minimum feature value in the feature map of the target channel included in each first sample feature map to obtain M candidate maximum feature values and M candidate minimum feature values.
- the first maximum value is determined based on the M candidate maximum feature values
- the first minimum value is determined based on the M candidate minimum feature values.
- the average value of the M candidate maximum feature values can be determined as the first maximum value.
- the first maximum value can also be determined based on the M candidate maximum feature values using a moving average method.
- the mean and standard deviation of the M candidate maximum feature values can also be computed, and the sum of the mean and three times the standard deviation can be determined as the first maximum value.
- the maximum value among the M candidate maximum feature values can also be determined as the first maximum value.
- the first maximum value can also be determined in other ways, and the embodiments of the present application are not limited to this.
- the implementation of determining the first minimum value based on the M candidate minimum feature values is similar to the implementation of determining the first maximum value based on the M candidate maximum feature values. For details, please refer to the relevant content above, which will not be repeated here.
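The candidate-aggregation alternatives above can be sketched as follows (the exponential moving-average coefficient 0.9 is an assumed example value; the patent does not fix one):

```python
import statistics

def first_maximum(candidates, strategy="mean"):
    """Aggregate the M candidate maximum feature values into the first
    maximum; the same strategies apply symmetrically to the minima."""
    if strategy == "mean":
        return statistics.mean(candidates)
    if strategy == "moving_average":
        acc = candidates[0]
        for c in candidates[1:]:
            acc = 0.9 * acc + 0.1 * c     # assumed smoothing coefficient
        return acc
    if strategy == "mean_plus_3std":
        m = statistics.mean(candidates)
        return m + 3.0 * statistics.pstdev(candidates)
    if strategy == "max":
        return max(candidates)
    raise ValueError(f"unknown strategy: {strategy}")

candidate_maxima = [1.0, 1.2, 0.9, 1.1]
```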
- the second sample feature map includes feature maps of W channels
- the second statistical parameter includes second parameters of the W channels.
- the implementation process of determining the second statistical parameter includes: counting the candidate parameters of the feature map of the target channel included in each of the N second sample feature maps to obtain N candidate parameters, the target channel is any one of the W channels, and based on the N candidate parameters, determining the second parameter of the target channel.
- the second parameter includes a second maximum value and a second minimum value
- the feature map of the target channel in each second sample feature map includes feature values of multiple feature points
- the candidate parameter includes a candidate maximum feature value and a candidate minimum feature value.
- the implementation process of determining the second parameter of the target channel is similar to the implementation process of determining the first parameter of the target channel described above. For details, please refer to the relevant content above, which will not be repeated here.
- the implementation process of determining whether the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition based on the first statistical parameter and the second statistical parameter includes: determining the number of abnormal channels based on the first parameters of the W channels and the second parameters of the W channels; determining that the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition when the number of abnormal channels is greater than or equal to the abnormal channel number threshold; and determining that the M first sample feature maps and the N second sample feature maps do not meet the feature abnormality condition when the number of abnormal channels is less than the abnormal channel number threshold.
- in one implementation, for any one of the W channels, if the difference between the first maximum value and the second maximum value of the channel is greater than the maximum value difference threshold and the difference between the first minimum value and the second minimum value is greater than the minimum value difference threshold, the channel is determined to be an abnormal channel and the number of abnormal channels is increased by 1; otherwise, it is determined that the channel is not an abnormal channel. Each of the W channels is processed in this manner to obtain the number of abnormal channels.
- in another implementation, for any one of the W channels, if the difference between the first maximum value and the second maximum value of the channel divided by the first maximum value is greater than the maximum value ratio threshold, and the difference between the first minimum value and the second minimum value divided by the first minimum value is greater than the minimum value ratio threshold, the channel is determined to be an abnormal channel and the number of abnormal channels is increased by 1; otherwise, it is determined that the channel is not an abnormal channel.
- each of the W channels is processed in the above manner to obtain the number of abnormal channels.
- the number of abnormal channels can also be determined in other ways, and the embodiments of the present application are not limited to this.
- when the number of abnormal channels is greater than or equal to the abnormal channel number threshold, the number of abnormal channels is large, and therefore it can be determined that the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition.
- when the number of abnormal channels is less than the abnormal channel number threshold, the number of abnormal channels is small, and therefore it can be determined that the M first sample feature maps and the N second sample feature maps do not meet the feature abnormality condition.
- when the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition, the feature map output by the first linear operation layer is abnormal, and therefore it can be determined that the first linear operation layer is the first network layer.
- when the M first sample feature maps and the N second sample feature maps do not meet the feature abnormality condition, the feature map output by the first linear operation layer is not abnormal, and therefore it can be determined that the first linear operation layer is not the first network layer.
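The difference-based criterion and the final comparison against the abnormal channel number threshold can be sketched as follows (the per-channel statistics and all threshold values are hypothetical; the ratio-based criterion would divide each difference by the corresponding normal-sample value instead):

```python
def count_abnormal_channels(normal, abnormal, max_diff, min_diff):
    """normal/abnormal: per-channel (maximum, minimum) statistics, i.e. the
    first and second parameters of the W channels. A channel is abnormal
    when BOTH extreme-value differences exceed their thresholds."""
    count = 0
    for (n_max, n_min), (a_max, a_min) in zip(normal, abnormal):
        if abs(a_max - n_max) > max_diff and abs(a_min - n_min) > min_diff:
            count += 1
    return count

# Hypothetical statistics for W = 3 channels.
normal = [(1.0, -1.0), (2.0, -2.0), (1.5, -1.5)]
abnormal = [(9.0, -9.0), (2.1, -2.1), (8.0, -8.0)]

num_abnormal = count_abnormal_channels(normal, abnormal, max_diff=3.0, min_diff=3.0)
meets_condition = num_abnormal >= 2   # assumed abnormal channel number threshold
```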
- the implementation process of determining the second network layer in the H network units included in the unquantized codec network model includes: for the first network layer in each network unit, when there is a linear operation layer after the first network layer, determining that linear operation layer as the second network layer; when there is no linear operation layer after the first network layer, adding a linear operation layer after the first network layer and determining the added linear operation layer as the second network layer.
- the weights included in an added second network layer can all be set to 1 before weight scaling is performed on the second network layer, so that the input feature map and the output feature map of the second network layer are consistent before weight scaling is performed.
- the encoding and decoding network model includes an encoding network model, a decoding network model or an entropy estimation network model.
- the abnormal channel number threshold is set in advance and is related to the number of channels in the linear operation layer.
- the abnormal channel number threshold can be set to 80% of the number of channels in the linear operation layer.
- the minimum value difference threshold, the maximum value difference threshold, the maximum value ratio threshold and the minimum value ratio threshold are also set in advance and can be adjusted according to different requirements in different situations.
- Figure 14 is a schematic diagram of a network unit provided in an embodiment of the present application.
- the encoding and decoding network model includes H network units, namely network unit 1, network unit 2, ..., network unit H.
- Step 1302: Scale the weights of each output channel included in the first network layer in each network unit so that the difference between the boundary feature values of each channel in the feature map output by the first network layer in each network unit is less than the feature threshold.
- for the first network layer in each network unit, K third sample feature maps output by the first network layer are determined, where the K third sample feature maps are feature maps corresponding to K normal sample images and K is an integer greater than or equal to 1; the scaling ratio of each output channel included in the first network layer is determined based on the K third sample feature maps; and the weights of each output channel included in the first network layer are scaled according to the scaling ratio of each output channel included in the first network layer.
- the reference feature value corresponding to the first network layer and the boundary feature value of each output channel included in the first network layer are determined based on K third sample feature maps, and the ratio between the reference feature value corresponding to the first network layer and the boundary feature value of each output channel included in the first network layer is determined as the scaling ratio of each output channel included in the first network layer.
- the third statistical parameter of the first network layer is determined based on the K third sample feature maps, and then the reference feature value corresponding to the first network layer and the boundary feature value of each output channel included in the first network layer are determined based on the third statistical parameter.
- the implementation process of determining the third statistical parameter of the first network layer based on the K third sample feature maps is consistent with the process of determining the first statistical parameter based on the M first sample feature maps described above. For details, please refer to the relevant content above, which will not be repeated here.
- the first network layer includes P output channels.
- the third statistical parameter includes the third parameters of the P output channels.
- each third parameter may include a third maximum value or a third minimum value.
- the boundary feature value may be a maximum feature value or a minimum feature value.
- the boundary feature value of a channel refers to the maximum or minimum feature value in the feature map corresponding to that channel among the feature maps output by the first network layer.
- depending on whether the third parameter includes a third maximum value or a third minimum value, the methods for determining the reference feature value corresponding to the first network layer and the boundary feature values of each output channel included in the first network layer differ. They are introduced separately below.
- if the third parameter includes a third maximum value, the maximum among the third maximum values of the P output channels is used as the reference feature value, and the third maximum value of each of the P output channels is used as the boundary feature value of that output channel of the first network layer.
- if the third parameter includes a third minimum value, the minimum among the third minimum values of the P output channels is used as the reference feature value, and the third minimum value of each of the P output channels is used as the boundary feature value of that output channel of the first network layer.
- for any one of the P output channels, the reference feature value corresponding to the first network layer is divided by the boundary feature value of the output channel to obtain the ratio between the reference feature value corresponding to the first network layer and the boundary feature value of the output channel.
- each of the P output channels is processed in the same manner to obtain the ratio between the reference feature value corresponding to the first network layer and the boundary feature value of each output channel included in the first network layer.
- the implementation process of scaling the weights of each output channel included in the first network layer includes: for any one of the P output channels, multiplying the weight of the output channel by the scaling ratio of the output channel to achieve scaling of the weight of the output channel.
- the weight of each of the P output channels is processed in the same manner to achieve scaling of the weights of each output channel included in the first network layer.
- the process of scaling the weights of each output channel included in the first network layer is now illustrated by way of example.
- the first network layer includes 3 input channels and 5 output channels.
- output channel 1 outputs the feature map y1 = a·x1 + b·x2 + c·x3; the determination of y2 to y5 is similar to that of y1 and will not be repeated here.
- the weights of output channel 1 are a, b, and c. If the scaling ratio of output channel 1 is 2, then after scaling the weights of output channel 1 according to the scaling ratio of output channel 1, the weights of output channel 1 become 2a, 2b, and 2c.
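For the case where the boundary feature value is a maximum feature value, the ratio computation and output-channel weight scaling described above can be sketched as follows. This is an illustrative sketch only; the tensor shapes, the function name, and the assumption that boundary values are positive are hypothetical.

```python
import numpy as np

def scale_output_channels(weight, third_feats, eps=1e-12):
    # weight: (P, C_in) weights of the first network layer, one row per output channel
    # third_feats: (K, P, h, w) feature maps output by the layer for K normal samples
    boundary = third_feats.max(axis=(0, 2, 3))   # boundary feature value per channel
    reference = boundary.max()                   # reference feature value of the layer
    ratios = reference / (boundary + eps)        # scaling ratio per output channel
    # multiply each output channel's weights by its scaling ratio
    return weight * ratios[:, None], ratios
```

After this scaling, every output channel's maximum feature value is brought up toward the layer-wide reference value, so the boundary values of the channels differ by less than the feature threshold.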
- the output channel can output the feature map corresponding to the channel. If the network layer includes multiple output channels, feature maps corresponding to multiple output channels can be obtained, and the feature maps corresponding to the multiple output channels of the network layer can be referred to as feature maps output by the network layer.
- the statement that the difference between the boundary feature values of each channel in the feature map output by the first network layer in each of the above network units is less than the feature threshold can be understood as follows: for any first network layer, within the same feature map output by that layer, the difference between the boundary feature values of any two channels is less than the feature threshold.
- the feature threshold is set in advance, and can also be adjusted according to different needs in different situations.
- Step 1303: Scale the weights of each input channel included in the second network layer in each network unit so that the output result of the codec network model is the same before and after the model weight scaling, where the model weight scaling includes the weight scaling of each output channel of the first network layer in each network unit and the weight scaling of each input channel of the second network layer.
- the scaling ratio of each input channel included in the second network layer is determined based on the scaling ratio of each output channel included in the first network layer in the network unit where the second network layer is located, and the weights of each input channel included in the second network layer are scaled according to the scaling ratio of each input channel included in the second network layer.
- the first network layer in the network unit where the second network layer is located includes P output channels
- the second network layer includes P input channels
- the P input channels of the second network layer correspond one-to-one to the P output channels of the first network layer.
- for any one of the P input channels, the scaling ratio of the input channel is the reciprocal of the scaling ratio of the corresponding output channel among the P output channels included in the first network layer in the network unit where the second network layer is located.
- Each of the P input channels is processed in the same manner to obtain the scaling ratio of each input channel included in the second network layer.
- the implementation process of scaling the weights of the input channels included in the second network layer according to the scaling ratio of the input channels included in the second network layer is similar to the implementation process of scaling the weights of the output channels included in the first network layer according to the scaling ratio of the output channels included in the first network layer, and will not be repeated here.
- the second network layer includes 5 input channels and 5 output channels, and the weights of the input data x1 of input channel 1 on output channels 1 to 5 are a, d, j, g, and m, respectively.
- the weights of x2 to x5 on output channels 1 to 5 are similar to x1 and will not be repeated here.
- the weights of input channel 1 are a, d, j, g, and m. If the scaling ratio of input channel 1 is 0.5, then after scaling the weights of input channel 1 according to the scaling ratio of input channel 1, the weights of input channel 1 become 0.5a, 0.5d, 0.5j, 0.5g, and 0.5m.
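The pairing of the two scalings can be illustrated for a 1×1-convolution case, where each layer collapses to a matrix product and the third network layer is taken to be ReLU. This is an illustrative sketch; all sizes and values are hypothetical. Because ReLU is positively homogeneous, scaling an output channel of the first layer by r and the corresponding input channel of the second layer by 1/r leaves the composed output unchanged:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# 1x1-conv case: first layer W1 (P x C_in), second layer W2 (C_out x P),
# third layer = ReLU between them.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))
W2 = rng.normal(size=(4, 5))
x = rng.normal(size=(3,))
ratios = np.array([2.0, 0.5, 1.0, 4.0, 0.25])  # per-output-channel ratios for W1

W1_scaled = W1 * ratios[:, None]   # scale first-layer output channels
W2_scaled = W2 / ratios[None, :]   # reciprocal on second-layer input channels

before = W2 @ relu(W1 @ x)
after = W2_scaled @ relu(W1_scaled @ x)
assert np.allclose(before, after)  # relu(r*z) = r*relu(z) for r > 0
```

This is why the output result of the codec network model is the same before and after the model weight scaling.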
- Step 1304: Quantize the codec network model after the model weights are scaled.
- post-training quantization (PTQ) can be used to quantize the codec network model after the model weights are scaled.
- quantization-aware training (QAT) can also be used to quantize the codec network model after the model weights are scaled.
- other methods can also be used to quantize the codec network model after the model weights are scaled, and the embodiments of the present application do not limit this.
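As one hypothetical example of the quantization step, a symmetric per-tensor int8 scheme (not necessarily the PTQ scheme used in the embodiments) could look like:

```python
import numpy as np

def quantize_int8(w):
    # symmetric per-tensor int8 quantization: w ~= q * scale
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([-1.0, -0.05, 0.0, 0.4, 1.27])
q, s = quantize_int8(w)
assert np.abs(q * s - w).max() <= s / 2  # rounding error is at most half a step
```

The benefit of the preceding weight scaling is that the dynamic range per channel is balanced before `scale` is computed, so no single outlier channel forces a coarse quantization step on all the others.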
- reconstructed image 1 in Figure 18 is the reconstructed image obtained by passing an abnormal sample image through the codec network model that was quantized by PTQ after the model weights were scaled.
- reconstructed image 2 is the reconstructed image obtained by passing the abnormal sample image through the codec network model that was quantized by QAT after the model weights were scaled. As can be seen from Figure 18, whether the codec network model is quantized by QAT or by PTQ after the model weights are scaled, the stability of the codec network model is enhanced, so that both the compression rate and the quality of the reconstructed image after the image passes through the codec network model are greatly improved.
- the quantization method of the codec network model provided in the embodiments of the present application can truncate abnormal feature values greater than the maximum cutoff value when the boundary feature value is the maximum feature value, and can truncate abnormal feature values less than the minimum cutoff value when the boundary feature value is the minimum feature value.
- the above-mentioned third network layer helps overcome gradient vanishing and accelerates training, and can also increase the nonlinear expression ability of the encoding and decoding network model.
- when the boundary feature value is the maximum feature value, that is, when abnormal feature values greater than the maximum truncation value are to be truncated:
- the third network layer can be an activation layer constructed based on the ReLU function, so that the minimum feature value among the feature values of the feature points in the feature map output by the third network layer is 0, that is, all feature values are non-negative.
- alternatively, the third network layer can be an activation layer constructed based on the function obtained by reflecting the ReLU function about the y-axis, so that the maximum feature value among the feature values of the feature points in the feature map output by the third network layer is 0, that is, all feature values are non-positive.
- since the maximum cutoff value is generally greater than 0, an abnormal feature value greater than the maximum cutoff value is truncated to 0 after passing through the third network layer, thereby realizing the correction of the abnormal feature value.
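The two activation choices described above can be sketched as follows. This is illustrative only: `neg_relu` stands in for the non-positive-output variant, here taken as min(x, 0), which is an assumption, and the feature values are hypothetical.

```python
import numpy as np

def relu(x):
    # outputs are all non-negative
    return np.maximum(x, 0.0)

def neg_relu(x):
    # a non-positive-output variant: min(x, 0)
    return np.minimum(x, 0.0)

feat = np.array([-3.0, -0.5, 0.2, 1.0, 50.0])  # 50.0 plays an abnormal value
assert (relu(feat) >= 0).all()
assert (neg_relu(feat) <= 0).all()
assert neg_relu(feat)[-1] == 0.0  # abnormal positive value truncated to 0
```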
- the first network layers of the six network units included in the codec network model are network layer 1, network layer 2, network layer 3, network layer 4, network layer 5 and network layer 6 in the entropy estimation network model.
- the process of encoding image 1 is as follows: image 1 passes through the encoding network to obtain feature y.
- when feature y passes through network layer 1, network layer 2 and network layer 3 in the super-coding network model, the feature map is corrected according to the cutoff value of each network layer, and finally the prior feature z is obtained.
- as the subsequent network layers are passed through, the feature map is corrected according to the cutoff value of each network layer, and finally the probability distribution of each feature point of feature y is obtained; the feature to be encoded is then determined based on the probability distribution of each feature point of feature y, and the feature to be encoded is entropy encoded to obtain a code stream.
- the first network layer and the second network layer after the model weight scaling can truncate abnormal feature values while ensuring that normal feature values are not affected by the model weight scaling, thereby reducing the impact on the codec network model of the weight changes of each output channel of the first network layer, so that the finally generated code stream and/or reconstructed image is free of abnormalities, improving the stability of the codec network model.
- the linear operation layers included in the codec network model can be screened to determine which linear operation layers cause abnormal codec results, and those layers are used as the first network layers. In this way, the weights of each output channel of the linear operation layers that cause abnormal codec results can be scaled in a targeted manner, which reduces the computational load of the codec network model while still effectively guaranteeing its stability.
- the quantization method of the codec network model provided in the embodiments of the present application can realize the correction of abnormal feature values, so that the finally generated code stream and/or reconstructed image is free of abnormalities, thereby improving the stability of the codec network model.
- the quantization method for a codec network model provided in the embodiments of the present application can greatly enhance the stability of the quantized codec network model in encoding and decoding various images without increasing the computational intensity of the codec network model and without reducing its performance. It can also be used to enhance the stability of quantized models for other low-level tasks (such as super-resolution tasks, denoising tasks, etc.).
- FIG20 is a schematic diagram of the structure of a quantization apparatus for a codec network model provided in an embodiment of the present application. The quantization apparatus for the codec network model can be implemented as part or all of a device by software, hardware, or a combination of both.
- the device includes: a determination module 2001, a first scaling module 2002, a second scaling module 2003, and a quantization module 2004.
- Determination module 2001 is used to determine the H network units included in the unquantized codec network model, where each of the H network units includes a first network layer, a second network layer and a third network layer, the first network layer and the second network layer are both linear operation layers, the third network layer is located between the first network layer and the second network layer, the third network layer is used to truncate the feature values in the feature map output by the first network layer so that the feature values in the feature map output by the third network layer are all non-negative or all non-positive, and H is an integer greater than or equal to 1.
- the first scaling module 2002 is used to scale the weights of each output channel included in the first network layer in each network unit so that the difference between the boundary feature values of each channel in the feature map output by the first network layer in each network unit is less than the feature threshold.
- the second scaling module 2003 is used to scale the weights of each input channel included in the second network layer in each network unit so that the output result of the codec network model is the same before and after the model weight scaling.
- the model weight scaling includes weight scaling of each output channel of the first network layer in each network unit and weight scaling of each input channel of the second network layer.
- the quantization module 2004 is used to quantize the codec network model after the model weights are scaled.
- the determination module 2001 is specifically used for:
- M first sample feature maps output by the first linear operation layer, where the M first sample feature maps are feature maps corresponding to M normal sample images, M is an integer greater than or equal to 1, and the first linear operation layer is any linear operation layer included in the codec network model;
- N second sample feature maps output by the first linear operation layer, where the N second sample feature maps are feature maps corresponding to N abnormal sample images, and N is an integer greater than or equal to 1;
- the first linear operation layer is determined to be the first network layer.
- the determination module 2001 is specifically used for:
- the linear operation layer after the first network layer is determined as the second network layer;
- a linear operation layer is added after the first network layer, and the added linear operation layer is determined as the second network layer.
- the first scaling module 2002 is specifically configured to:
- K third sample feature maps output by the first network layer, where the K third sample feature maps are feature maps corresponding to K normal sample images, and K is an integer greater than or equal to 1;
- the weights of each output channel included in the first network layer are scaled.
- the first scaling module 2002 is specifically configured to:
- the ratio between the reference feature value corresponding to the first network layer and the boundary feature value of each output channel included in the first network layer is determined as the scaling ratio of each output channel included in the first network layer.
- the second scaling module 2003 is specifically configured to:
- the weights of each input channel included in the second network layer are scaled.
- the encoding and decoding network model includes an encoding network model, a decoding network model or an entropy estimation network model.
- the boundary feature value is a maximum feature value or a minimum feature value.
- the first network layer and the second network layer after the model weight scaling can truncate abnormal feature values while ensuring that normal feature values are not affected by the model weight scaling, thereby reducing the impact on the codec network model of the weight changes of each output channel of the first network layer, so that the finally generated code stream and/or reconstructed image is free of abnormalities, improving the stability of the codec network model.
- the linear operation layers included in the codec network model can be screened to determine which linear operation layers cause abnormal codec results, and those layers are used as the first network layers. In this way, the weights of each output channel of the linear operation layers that cause abnormal codec results can be scaled in a targeted manner, which reduces the computational load of the codec network model while still effectively guaranteeing its stability.
- the quantization method of the codec network model provided in the embodiments of the present application can realize the correction of abnormal feature values, so that the finally generated code stream and/or reconstructed image is free of abnormalities, thereby improving the stability of the codec network model.
- the quantization method for a codec network model provided in the embodiments of the present application can greatly enhance the stability of the quantized codec network model in encoding and decoding various images without increasing the computational intensity of the codec network model and without reducing its performance. It can also be used to enhance the stability of quantized models for other low-level tasks (such as super-resolution tasks, denoising tasks, etc.).
- when performing quantization of the codec network model, the quantization device provided in the above embodiment is described using the division into the above functional modules only as an example.
- in practical applications, the above functions can be assigned to different functional modules as needed; that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above.
- the quantization device of the codec network model provided in the above embodiment and the quantization method embodiment of the codec network model belong to the same concept. The specific implementation process is detailed in the method embodiment, which will not be repeated here.
- all or part of the embodiments may be implemented by software, hardware, firmware, or any combination thereof.
- when implemented using software, the embodiments may take the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) means.
- the computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media.
- the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a digital versatile disc (DVD)) or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
- the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
- the information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.) and signals involved in the embodiments of the present application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data comply with the relevant laws, regulations and standards of the relevant countries and regions.
- the feature maps output by the first network layer and the feature maps output by the third network layer involved in the embodiments of the present application are all obtained with full authorization.
Description
This application claims priority to Chinese Patent Application No. 202310153239.1, filed on February 8, 2023 and entitled "Quantization method and related apparatus for encoding and decoding network model", the entire contents of which are incorporated herein by reference.
The present application relates to the field of image compression, and in particular to a quantization method and related apparatus for an encoding and decoding network model.
With the widespread application of deep learning technology in fields such as image recognition and object detection, deep learning technology has also been applied to image compression tasks, that is, image compression using encoding and decoding (codec) network models. This approach outperforms traditional image compression methods in terms of both codec efficiency and compression quality. For example, using a variational auto-encoder (VAE) for image encoding and decoding can greatly improve codec performance and image compression quality. However, while improving codec performance, codec network models also introduce a huge number of parameters and a large amount of computation, which leads to various problems when such a model runs on resource-limited end-side devices. For example, when the codec network model runs on a low-performance mobile device or a low-power embedded device, the efficiency of model inference is reduced. As another example, some end-side devices do not support the floating-point calculations of the codec network model, which limits the deployment of the model on those devices.
To solve the above problems, model quantization emerged. Model quantization is a technique that converts the floating-point calculations of a model into fixed-point calculations, which can effectively reduce the model's computational load, parameter size and memory consumption. After quantization, a codec network model can be deployed on end-side devices with limited resources, such as mobile phones and robots. However, the above codec network models suffer from poor stability: for example, the bit rate after encoding an image may be too high, or the reconstructed image may contain anomalies. Even if the codec network model is quantized, the problem of poor stability remains.
Summary of the invention
The present application provides a quantization method, apparatus, device, storage medium and computer program for a codec network model, which can solve the problem of poor stability of codec network models in the related art. The technical solution is as follows:
In a first aspect, a quantization method for a codec network model is provided. The method includes: determining H network units included in an unquantized codec network model, where each of the H network units includes a first network layer, a second network layer and a third network layer, the first network layer and the second network layer are both linear operation layers, the third network layer is located between the first network layer and the second network layer, the third network layer is used to truncate the feature values in the feature map output by the first network layer so that the feature values in the feature map output by the third network layer are all non-negative or all non-positive, and H is an integer greater than or equal to 1; scaling the weights of each output channel included in the first network layer in each network unit so that the differences between the boundary feature values of the channels in the feature map output by the first network layer in each network unit are less than a feature threshold; scaling the weights of each input channel included in the second network layer in each network unit so that the output result of the codec network model is the same before and after the model weight scaling, where the model weight scaling includes the weight scaling of each output channel of the first network layer in each network unit and the weight scaling of each input channel of the second network layer; and quantizing the codec network model after the model weight scaling.
By scaling the weights of each output channel included in each first network layer, it can be ensured that the differences between the boundary feature values of the channels in the feature map output by each first network layer are less than the feature threshold; and by scaling the weights of the input channels of the second network layer after the first network layer, it can be ensured that the output result of the codec network model is the same before and after the model weight scaling. In this way, after the codec network model with scaled model weights is quantized, the first network layer and the second network layer can truncate abnormal feature values while ensuring that normal feature values are not affected by the model weight scaling, thereby reducing the impact on the codec network model of the weight changes of each output channel of the first network layer, so that the finally generated code stream and/or reconstructed image is free of abnormalities, improving the stability of the codec network model.
It should be noted that an image passed through a network layer yields one feature map. The codec network model contains multiple network layers, and each network layer includes multiple weights, namely the weights of the input channels and the weights of the output channels of that layer. The weights of the input channels process the feature map input to the network layer, and the weights of the output channels process the feature map output by the network layer. When the image to be compressed is input into the codec network model, each network layer in the model outputs a corresponding feature map.
Optionally, determining the first network layer in the H network units included in the unquantized codec network model includes: determining M first sample feature maps output by a first linear operation layer, where the M first sample feature maps are feature maps corresponding to M normal sample images, M is an integer greater than or equal to 1, and the first linear operation layer is any linear operation layer included in the codec network model; determining N second sample feature maps output by the first linear operation layer, where the N second sample feature maps are feature maps corresponding to N abnormal sample images and N is an integer greater than or equal to 1; and, when the M first sample feature maps and the N second sample feature maps satisfy a feature abnormality condition, determining the first linear operation layer to be the first network layer.
Since not every linear operation layer in the codec network model causes abnormal encoding or decoding results, the linear operation layers included in the model can be screened to identify those that do, and each such layer is used as a first network layer. In this way, the weights of the output channels of only the problematic linear operation layers are scaled, which reduces the computational load of the codec network model while still effectively ensuring its stability.
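The screening step above can be sketched as a per-channel comparison of the statistics collected from the two sample sets. The sketch below assumes the feature abnormality condition is that some channel's maximum feature value on abnormal samples exceeds the maximum seen on normal samples by a margin; the margin `ratio_threshold` and the flat per-channel value lists are illustrative assumptions, not values given in this application.

```python
def is_candidate_first_layer(normal_maps, abnormal_maps, ratio_threshold=2.0):
    """Flag a linear operation layer whose abnormal-sample feature values
    escape the range observed on normal samples.

    normal_maps / abnormal_maps: one list of feature values per output
    channel, collected at this layer's output. Assumes positive channel
    maxima; ratio_threshold is a hypothetical screening margin.
    """
    for normal_vals, abnormal_vals in zip(normal_maps, abnormal_maps):
        if max(abnormal_vals) > ratio_threshold * max(normal_vals):
            return True  # this layer satisfies the abnormality condition
    return False
```

A layer flagged this way would then be treated as a first network layer and have its output-channel weights scaled.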
Optionally, determining the second network layer in the H network units included in the unquantized codec network model includes: for the first network layer in each network unit, when a linear operation layer follows the first network layer, determining that linear operation layer to be the second network layer; and when no linear operation layer follows the first network layer, adding a linear operation layer after the first network layer and determining the added linear operation layer to be the second network layer.
Since the second network layer is the linear operation layer following the first network layer, it can promptly correct the feature values in the feature map output by the first network layer, effectively reducing the impact on the codec network model caused by changing the weights of the output channels of the first network layer. The output result of the codec network model is therefore the same before and after the model weight scaling, which ensures that the finally generated bitstream and/or reconstructed image contains no abnormality and improves the stability of the codec network model.
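When no linear operation layer follows the first network layer, one is added. A natural way to add such a layer without changing the model's behavior before scaling is to initialize it as an identity 1x1 convolution; this identity initialization is an illustrative assumption, not something this application specifies. A minimal sketch using nested lists:

```python
def make_identity_conv1x1(num_channels):
    """Weights of a 1x1 convolution that passes a feature map through
    unchanged: out[c] = in[c]. Weight shape: [out_ch][in_ch][1][1]."""
    return [
        [[[1.0 if i == j else 0.0]] for j in range(num_channels)]
        for i in range(num_channels)
    ]

def apply_conv1x1(weights, fmap):
    """Apply a 1x1 convolution. fmap: [channel][h][w] nested lists."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(out_w[c][0][0] * fmap[c][y][x] for c in range(len(fmap)))
          for x in range(w)]
         for y in range(h)]
        for out_w in weights
    ]

fmap = [[[1.0, 2.0]], [[3.0, 4.0]]]   # 2 channels, 1x2 spatial grid
out = apply_conv1x1(make_identity_conv1x1(2), fmap)  # identical to fmap
```

The added layer's input-channel weights can then be scaled exactly like those of any pre-existing second network layer.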
Optionally, scaling the weights of the output channels of the first network layer in each network unit includes: for the first network layer in each network unit, determining K third sample feature maps output by the first network layer, where the K third sample feature maps are feature maps corresponding to K normal sample images and K is an integer greater than or equal to 1; determining the scaling ratio of each output channel of the first network layer based on the K third sample feature maps; and scaling the weights of the output channels of the first network layer according to their respective scaling ratios.
Since a normal sample image is one whose bit rate and reconstructed image are free of abnormalities after passing through the codec network, and since the range of feature values in each channel of a normal sample image's feature map is smaller than the range in the corresponding channel of an abnormal sample image's feature map, scaling the weights of the output channels of the first network layer by the scaling ratios determined from the third sample feature maps effectively truncates abnormal feature values. The truncated values then fall within the range of the feature values in normal sample feature maps, which improves the stability of the codec network model.
Optionally, determining the scaling ratio of each output channel of the first network layer based on the K third sample feature maps includes: determining, based on the K third sample feature maps, a reference feature value for the first network layer and a boundary feature value for each output channel of the first network layer; and determining the ratio between the reference feature value of the first network layer and the boundary feature value of each output channel as the scaling ratio of that output channel.
Since the range of feature values in each channel of a normal sample image's feature map is smaller than the range in the corresponding channel of an abnormal sample image's feature map, the boundary feature values of the output channels of the first network layer determined from the third sample feature maps indicate the range of feature values in each channel for normal sample images, and the reference feature value of the first network layer indicates the maximum feature value in the feature maps of normal sample images. In this case, taking the ratio between the reference feature value of the first network layer and each output channel's boundary feature value as that channel's scaling ratio ensures that abnormal feature values are truncated while normal feature values are unaffected by the model weight scaling, effectively improving the stability of the codec network model.
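The ratio computation above can be sketched as follows. The sketch assumes the boundary feature value of each channel is its maximum over the K normal-sample feature maps and the reference feature value is the largest of those per-channel maxima (consistent with, but more specific than, the text), and assumes positive values so the division is well defined.

```python
def channel_scaling_ratios(sample_feature_maps):
    """sample_feature_maps: K feature maps, each a [channel][h][w]
    nested list. Returns one scaling ratio per output channel."""
    num_channels = len(sample_feature_maps[0])
    # Boundary feature value per channel: the maximum value observed
    # for that channel across all K normal-sample feature maps.
    boundaries = [
        max(v
            for fmap in sample_feature_maps
            for row in fmap[c]
            for v in row)
        for c in range(num_channels)
    ]
    # Reference feature value for the layer: the largest boundary value.
    reference = max(boundaries)
    # Scaling a channel's weights by reference/boundary aligns every
    # channel's boundary value with the reference.
    return [reference / b for b in boundaries]

maps = [[[[1.0, 2.0]], [[0.5, 4.0]]]]   # K=1 map, 2 channels, 1x2 grid
ratios = channel_scaling_ratios(maps)
```

After scaling by these ratios, the per-channel boundary values coincide, so their pairwise differences are below any positive feature threshold.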
Optionally, scaling the weights of the input channels of the second network layer in each network unit includes: for the second network layer in each network unit, determining the scaling ratio of each input channel of the second network layer based on the scaling ratios of the output channels of the first network layer in the same network unit, and scaling the weights of the input channels of the second network layer according to those scaling ratios.
By scaling the weights of the input channels of the second network layer according to their scaling ratios, the second network layer can correct, in a targeted manner, the feature values in the feature map output by the corresponding first network layer, so that normal feature values are unaffected by the weight scaling of the first network layer. This ensures that the output result of the codec network model for a normal sample image is the same before and after the model weight scaling, improving the stability of the codec network model.
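The reason this paired scaling leaves the output unchanged is that a ReLU-style truncation layer commutes with positive per-channel scaling: ReLU(s·v) = s·ReLU(v) for s > 0, so multiplying the first layer's output channels by s and the second layer's matching input channels by 1/s cancels exactly. A toy numeric sketch (the matrices and ratios are illustrative only):

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, x):
    # w: rows = output channels, columns = input channels
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

# Toy network unit: linear layer W1 -> ReLU -> linear layer W2.
W1 = [[1.0, -2.0], [0.5, 3.0]]   # first layer, 2 output channels
W2 = [[2.0, -1.0]]               # second layer, 2 input channels
x = [0.7, 0.2]
baseline = matvec(W2, relu(matvec(W1, x)))

# Scale W1's output channels by s and W2's input channels by 1/s.
s = [4.0, 0.5]                   # positive per-channel ratios (assumed)
W1s = [[wij * s[i] for wij in row] for i, row in enumerate(W1)]
W2s = [[wij / s[j] for j, wij in enumerate(row)] for row in W2]
scaled = matvec(W2s, relu(matvec(W1s, x)))
# baseline and scaled agree up to floating-point rounding.
```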
Optionally, the codec network model includes an encoding network model, a decoding network model, or an entropy estimation network model.
Regardless of whether the codec network model is an encoding network model, a decoding network model, or an entropy estimation network model, the quantization method provided in the embodiments of this application can determine the first network layer and the second network layer from the model and thereby correct abnormal feature values, so that the finally generated bitstream and/or reconstructed image contains no abnormality, improving the stability of the codec network model.
Optionally, the boundary feature value is a maximum feature value or a minimum feature value.
When the boundary feature value is the maximum feature value, that is, when abnormal feature values greater than the maximum truncation value are truncated, the minimum feature value in the feature map output by the third network layer is 0; that is, the outputs are all non-negative. In this case, since the minimum truncation value is generally negative, abnormal feature values smaller than the minimum truncation value are truncated to 0 after passing through the third network layer, so that such abnormal feature values are corrected. When the boundary feature value is the minimum feature value, the maximum feature value in the feature map output by the third network layer is 0; that is, the outputs are all non-positive. In this case, since the maximum truncation value is generally positive, abnormal feature values greater than the maximum truncation value are truncated to 0 after passing through the third network layer, so that such abnormal feature values are corrected. In other words, whether the boundary feature value is the maximum or the minimum feature value, the quantization method provided in the embodiments of this application can correct abnormal feature values, so that the finally generated bitstream and/or reconstructed image contains no abnormality, improving the stability of the codec network model.
Here, the range of the feature values in the normal sample feature map output by any network layer in the codec network model is called the truncation value of that layer. The truncation value includes a minimum truncation value and a maximum truncation value: the minimum truncation value is the minimum of the range, and the maximum truncation value is the maximum of the range.
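The two truncation directions described above correspond to a ReLU-style clamp at zero from one side or the other. A minimal sketch:

```python
def third_layer_truncate(values, boundary_is_max=True):
    """Third-network-layer behavior for the two boundary choices.

    boundary_is_max=True : outputs are all non-negative, so every
      negative value — in particular any abnormal value below the
      (generally negative) minimum truncation value — is clipped to 0.
    boundary_is_max=False: outputs are all non-positive, so every
      positive value — in particular any abnormal value above the
      (generally positive) maximum truncation value — is clipped to 0.
    """
    if boundary_is_max:
        return [max(0.0, v) for v in values]
    return [min(0.0, v) for v in values]
```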
In a second aspect, a quantization apparatus for a codec network model is provided. The apparatus has the function of implementing the behavior of the quantization method of the first aspect and includes at least one module, the at least one module being used to implement the quantization method provided in the first aspect.
In a third aspect, a quantization device for a codec network model is provided. The device includes a processor and a memory, where the memory stores a computer program for executing the quantization method provided in the first aspect, and the processor is configured to execute the computer program stored in the memory to implement that method.
Optionally, the quantization device may further include a communication bus used to establish a connection between the processor and the memory.
In a fourth aspect, a computer-readable storage medium is provided. The storage medium stores instructions that, when run on a computer, cause the computer to execute the quantization method of the first aspect.
In a fifth aspect, a computer program product containing instructions is provided. When the instructions are run on a computer, they cause the computer to execute the quantization method of the first aspect. In other words, a computer program is provided that, when run on a computer, causes the computer to execute the quantization method of the first aspect.
The technical effects obtained by the second, third, fourth, and fifth aspects are similar to those obtained by the corresponding technical means of the first aspect and are not repeated here.
FIG. 1 is a schematic diagram of an image compression framework provided in an embodiment of this application;
FIG. 2 is a schematic diagram of a network model provided in an embodiment of this application;
FIG. 3 is a statistical chart of maximum feature values provided in an embodiment of this application;
FIG. 4 is a statistical chart of minimum feature values provided in an embodiment of this application;
FIG. 5 is another statistical chart of maximum feature values provided in an embodiment of this application;
FIG. 6 is a schematic diagram of a reconstructed image provided in an embodiment of this application;
FIG. 7 is a schematic diagram of an implementation environment provided in an embodiment of this application;
FIG. 8 is a schematic diagram of another implementation environment provided in an embodiment of this application;
FIG. 9 is a schematic structural diagram of an encoding-decoding framework provided in an embodiment of this application;
FIG. 10 is a schematic structural diagram of an encoding network model provided in an embodiment of this application;
FIG. 11 is a schematic structural diagram of a decoding network model provided in an embodiment of this application;
FIG. 12 is a schematic structural diagram of an entropy estimation network model provided in an embodiment of this application;
FIG. 13 is a flowchart of a quantization method for a codec network model provided in an embodiment of this application;
FIG. 14 is a schematic diagram of a network unit provided in an embodiment of this application;
FIG. 15 is a schematic diagram of weights provided in an embodiment of this application;
FIG. 16 is another statistical chart of maximum feature values provided in an embodiment of this application;
FIG. 17 is another schematic diagram of weights provided in an embodiment of this application;
FIG. 18 is a schematic diagram of another reconstructed image provided in an embodiment of this application;
FIG. 19 is a schematic structural diagram of an entropy estimation network model provided in an embodiment of this application;
FIG. 20 is a schematic structural diagram of a quantization apparatus for a codec network model provided in an embodiment of this application.
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
For ease of understanding, before the quantization method for a codec network model provided in the embodiments of this application is explained in detail, the terms, application scenarios, and implementation environments involved in the embodiments are first introduced.
First, the terms involved in the embodiments of this application are explained.
Artificial intelligence (AI): a technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence.
Rectified linear unit (ReLU): an activation function commonly used in artificial neural networks, usually referring to nonlinear functions represented by the ramp function and its variants.
Convolutional neural network (CNN): a feedforward neural network with a deep structure that contains convolution computation; it is one of the representative algorithms of deep learning. A CNN may also contain modules such as activation layers (e.g., ReLU), pooling layers, batch normalization layers, and fully connected layers. Typical convolutional neural networks include LeNet, AlexNet, VGGNet, and ResNet. A basic CNN may consist of a backbone network and a head network, while a complex CNN consists of backbone, neck, and head networks.
Feature map: the three-dimensional data output by network layers such as the convolution, activation, pooling, and batch normalization layers in a CNN; its three dimensions are width, height, and channel.
In the embodiments of this application, an image passed through a network layer yields one feature map. For any network layer, if the layer contains 3 convolution kernels, the 3 kernels each convolve the feature map input to the layer to obtain the feature map output by the layer, and the number of channels of the output feature map is 3. In other words, the number of convolution kernels contained in a network layer corresponds one-to-one to the number of channels of the feature map it outputs.
Variational autoencoder (VAE): an AI-based image codec used for data compression or noise removal. The embodiments of this application take a VAE-based image compression framework as an example, as shown in FIG. 1. During encoding, the image to be compressed is input into the encoding network model to obtain the features to be encoded. The features to be encoded are quantized, and the probability distribution is estimated by the entropy estimation network model based on the quantized features. The quantized features are then entropy-encoded based on the probability distribution to obtain the image bitstream. The decoding process corresponds to the encoding process.
Peak signal-to-noise ratio (PSNR): an objective standard for evaluating image quality; a higher PSNR indicates relatively better image quality.
Bit rate: in image compression, the code length required to encode a unit pixel; the higher the bit rate, the better the image reconstruction quality.
Bits per pixel (BPP): the number of bits used to store each pixel. A smaller BPP indicates a lower compression bit rate, and a larger BPP indicates a higher one.
Quantization: converting a continuous signal into a discrete one. In image compression, it means converting continuous features into discrete features. In entropy coding, the probability values of the probability distribution are usually converted from continuous values into discrete values.
Rate-distortion curve (RD curve): the horizontal axis is the bit rate and the vertical axis is the image quality; generally, the closer the curve is to the upper left, the better the encoding and decoding performance.
AI compression algorithm (AI codec): a data compression method based on deep learning technology.
Entropy estimation: the process of predicting the distribution of quantized data.
Model quantization: a method of converting a floating-point neural network model into a fixed-point model representable with a finite number of bits; it establishes a mapping between floating-point data and fixed-point data. The principle can be understood as using integers within an n-bit range (e.g., 8, 10, or 16 bits) to represent the floating-point weights in the network model and the tensors produced during computation. For ease of understanding, the process of quantizing the tensors in model computation is introduced by taking the network model shown in FIG. 2 as an example. In FIG. 2, x denotes the input of the network model, y denotes its output, a1 is the feature map output by the convolution layer (Conv), a2 is the feature map output by the activation layer (ReLU), qx denotes the fixed-point value of x after quantization, qa1 denotes the feature map obtained by quantizing a1, qa2 denotes the feature map obtained by quantizing a2, and qy denotes the data output by the fully connected (FC) layer. In model quantization, the value ranges of x, a1, a2, and y can each be gathered statistically from sample data to obtain their maximum and minimum values, from which the quantization formulas corresponding to x, a1, a2, and y are determined. In this way, during model inference, the model input and the tensors in the computation can be quantized based on these quantization formulas to obtain qy, and qy can be dequantized to obtain the inferred floating-point value y.
As an example, the quantization formula corresponding to the input x of the network model can be expressed as expression (1):

x_quant = clamp(round(x / scale) + zero_point, quant_min, quant_max)    (1)

In expression (1), x_quant denotes the quantized fixed-point value; x denotes the unquantized floating-point value; scale denotes the smallest increment representable by the quantized fixed-point value, i.e., the scale; zero_point denotes the fixed-point value corresponding to the floating-point value 0, i.e., the zero-point position; quant_min denotes the minimum of the value range of x_quant; and quant_max denotes the maximum of that range. The round() function rounds its argument to the nearest integer, and the clamp() function limits x_quant to the interval between quant_min and quant_max. That is, if the computed fixed-point value exceeds the value range of x_quant, it is truncated so that x_quant equals whichever of quant_max and quant_min is closest to the computed value.
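Expression (1) and its inverse can be sketched as follows; the symmetric signed 8-bit range, the choice of scale and zero point, and the example tensor are illustrative assumptions, not values given in this application.

```python
def quantize(x, scale, zero_point, quant_min=-128, quant_max=127):
    """Expression (1): round to the nearest grid step, shift by the
    zero point, and clamp to the representable fixed-point range."""
    q = round(x / scale) + zero_point
    return max(quant_min, min(quant_max, q))

def dequantize(q, scale, zero_point):
    """Approximate recovery of the floating-point value."""
    return (q - zero_point) * scale

# Example: a tensor whose observed range is [-1.0, 1.0], quantized to
# signed 8 bits. scale and zero_point follow from the observed range.
scale = 2.0 / 255.0      # (max - min) / number of 8-bit steps
zero_point = 0           # symmetric range, so 0.0 maps to 0
xs = [-1.0, -0.3, 0.0, 0.42, 1.0]
qs = [quantize(x, scale, zero_point) for x in xs]
xs_hat = [dequantize(q, scale, zero_point) for q in qs]
```

Values at the edges of the observed range land on quant_min and quant_max, and each dequantized value differs from the original by at most about one scale step.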
The main model quantization methods are post-training quantization (PTQ) and quantization-aware training (QAT). PTQ converts a floating-point network model directly into a fixed-point one without retraining the model, i.e., without updating its weights. QAT inserts pseudo-quantization (simulated quantization) operations into the network model during training or fine-tuning to simulate the quantization operation, so that the model parameters can better adapt to the information loss introduced by quantization. QAT is usually applied during fine-tuning of a network model: fine-tuning from a pre-trained model yields higher accuracy and robustness and can significantly reduce the number of training iterations.
Image compression is a technology that exploits characteristics of image data such as spatial redundancy, visual redundancy, and statistical redundancy to represent the original image pixel matrix, lossily or losslessly, with fewer bits, achieving effective transmission and storage of image information. It plays an important role in the current media era, in which the variety and volume of image information keep growing. Image compression technology includes the encoding and decoding of images, and encoding-decoding performance (reflecting image quality) and encoding-decoding efficiency (reflecting time consumption) are factors that must be considered.
After long-term research and optimization, lossy image compression standards such as JPEG (Joint Photographic Experts Group) have been established. However, these more traditional image compression technologies have hit a bottleneck in improving encoding and decoding performance and can no longer meet the demands of an era of ever-growing multimedia application data. With the wide application of deep learning in fields such as image recognition and object detection, deep learning has also been applied to image compression tasks, that is, using codec network models for image compression. This approach outperforms traditional image compression methods in both encoding-decoding efficiency and compression quality. For example, using a variational autoencoder for image encoding and decoding can greatly improve both encoding-decoding performance and compression quality.
In practical applications, as the codec efficiency and compression quality of AI compression algorithms improve, high stability is also required of them. Stability covers two aspects: on the one hand, the stability of the bit rate and of the reconstructed image quality when encoding images without added noise; on the other hand, the stability of the bit rate and of the reconstructed image quality when encoding images with specific added noise, that is, robustness against adversarial attacks. AI compression algorithms suffer from poor stability in both aspects. For example, some images without added noise yield an excessively high bit rate and/or an abnormal reconstructed image after passing through the codec network model of an AI compression algorithm. As another example, an image with specific added noise may yield an abnormally increased bit rate and/or an abnormal reconstructed image after passing through the codec network model, that is, the performance of the AI compression algorithm is severely degraded. For ease of description, images whose bit rate and reconstructed image show no abnormality after passing through the codec network model are hereinafter referred to as normal sample images, and images whose bit rate and/or reconstructed image show abnormalities are referred to as abnormal sample images.
To clarify why the bit rate and/or reconstructed image of abnormal sample images become abnormal after passing through the codec network model, engineers collected statistics on the feature maps output by every layer of the codec network model for both normal sample images and abnormal sample images. A feature map output by any layer of the codec network model for a normal sample image may be called a normal sample feature map, and one for an abnormal sample image may be called an abnormal sample feature map. Specifically, the maximum and minimum feature values were collected for each channel of the normal and abnormal sample feature maps. Based on these statistics, it was found that the per-channel maximum and minimum feature values of the normal and abnormal sample feature maps output by the same network layer differ greatly. Referring to FIG. 3, taking the hyper encoder (HyEnc) network model included in the codec network model as an example, the statistics of per-channel maximum feature values for the normal and abnormal sample feature maps output by the first network layer of the hyper-encoder show that, in this layer, the maximum feature value of every channel of the abnormal sample feature map is greater than that of the corresponding channel of the normal sample feature map. Referring to FIG. 4, the statistics of per-channel minimum feature values for the same layer show that the minimum feature value of every channel of the abnormal sample feature map is smaller than that of the corresponding channel of the normal sample feature map. In other words, for every channel, the range of feature values of the normal sample feature map is narrower than the range of feature values of the abnormal sample feature map in the corresponding channel.
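The per-channel statistics described above (maximum and minimum feature value per channel) can be expressed as a short routine. The following numpy sketch is illustrative only; it assumes a C × H × W feature-map layout, which the source does not specify:

```python
import numpy as np

def channel_minmax(feature_map):
    """Per-channel (min, max) feature values of a C x H x W feature map."""
    c = feature_map.shape[0]
    flat = feature_map.reshape(c, -1)          # one row per channel
    return flat.min(axis=1), flat.max(axis=1)

# Compare a "normal" and an "abnormal" sample feature map channel by channel
rng = np.random.default_rng(0)
normal = rng.uniform(-1.0, 1.0, size=(4, 8, 8))
abnormal = normal * 10.0                       # abnormal map spans a wider range
n_min, n_max = channel_minmax(normal)
a_min, a_max = channel_minmax(abnormal)
print(np.all(a_max > n_max), np.all(a_min < n_min))
```

Running such a routine over many sample images per layer yields exactly the kind of per-channel range comparison plotted in FIG. 3 and FIG. 4.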
In addition, while improving codec performance, the codec network model also introduces a huge number of parameters and computations, which causes various problems when it runs on resource-limited end-side devices. For example, when the codec network model runs on a low-performance mobile device or a low-power embedded device, the efficiency of model inference drops. As another example, some end-side devices do not support the floating-point computation of the codec network model, which limits its deployment on such devices. In other words, the above codec network model suffers from poor stability and is difficult to deploy on resource-limited end-side devices.
To solve the difficulty of deploying the codec network model on resource-limited end-side devices, engineers quantize the model, converting its floating-point computation into fixed-point computation, thereby effectively reducing the model's computation amount, parameter size, and memory consumption. After quantization, the codec network model can be deployed on resource-limited end-side devices such as mobile phones and robots.
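As an illustration of what such float-to-fixed-point conversion does, the following sketch uses symmetric int8 quantization with a per-tensor scale. This is a common generic scheme, not necessarily the scheme used by the embodiments; the values are arbitrary:

```python
import numpy as np

def quantize_int8(x, scale):
    """Fixed-point mapping: q = round(x / scale), clipped to the int8 range."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    """Map fixed-point values back to (approximate) floats."""
    return q.astype(np.float32) * scale

weights = np.array([-1.0, 0.0, 0.3, 0.75], dtype=np.float32)
scale = np.abs(weights).max() / 127.0          # per-tensor scale from the extreme value
q = quantize_int8(weights, scale)
recovered = dequantize(q, scale)
print(q.tolist(), float(np.abs(recovered - weights).max()))
```

The round trip loses at most about half a quantization step per value, while all arithmetic on `q` can run in integer units, which is what makes deployment on devices without floating-point support possible.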
During model quantization, the range of feature values in the normal sample feature map output by each network layer of the codec network model can be determined from normal sample images. Then, during inference, the feature value of each feature point in the feature map output by a network layer can be truncated based on that layer's range, so that the values conform to those of the normal sample feature maps, which reduces abnormal feature values to some extent. However, this does not effectively solve the poor stability of the codec network model. For ease of description, the range of feature values in the normal sample feature map output by any network layer of the codec network model is hereinafter called the truncation value of that layer. The truncation value includes a minimum truncation value and a maximum truncation value, which are the minimum and the maximum of that range, respectively. For a feature value in the feature map of any channel, if it clearly exceeds the range of feature values of the normal sample feature map in the corresponding channel, it is called an abnormal feature value.
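Applied at inference time, such truncation amounts to clamping each output feature value of a layer to the layer's calibrated range. A minimal sketch with illustrative truncation values:

```python
import numpy as np

# Truncation values calibrated from the normal sample feature maps of one layer
t_min, t_max = -2.5, 3.0

layer_output = np.array([-4.0, -1.0, 0.5, 7.5])   # 7.5 is an abnormal feature value
truncated = np.clip(layer_output, t_min, t_max)
print(truncated.tolist())
```

Values inside the calibrated range pass through unchanged; values outside it are pulled back to the nearest truncation value.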
For any network layer in the codec network model, the truncation values used by that layer are the maximum and minimum of all feature values included in the normal sample feature maps output by the layer. These truncation values may be greater than or equal to the range of feature values of the abnormal sample feature map in a certain channel, in which case the abnormal feature values of that channel cannot be truncated. As a result, the bit rate and/or reconstructed image of an abnormal sample image after passing through the codec network model will still be abnormal, so the poor stability of the codec network model is not effectively solved. Referring to FIG. 5, which shows the per-channel maximum feature values of the normal and abnormal sample feature maps output by one network layer of the codec network model, the maximum truncation value of this layer is greater than the maximum feature value of each channel of the abnormal sample feature map; in this case, the maximum truncation value cannot truncate the abnormal feature values of any channel other than channel 8. Referring to FIG. 6, reconstructed image a is obtained by passing an abnormal sample image through the unquantized codec network model, and reconstructed image b is obtained by passing it through the quantized codec network model. As can be seen from FIG. 6, the compression rate and reconstructed image of the abnormal sample image are abnormal after the unquantized codec network model, and they remain significantly abnormal after the quantized codec network model. On this basis, an embodiment of the present application provides a quantization method for a codec network model. After the codec network model is quantized by this method, the abnormal feature values of each channel of the feature map output by the first network layer can be effectively truncated, thereby correcting the feature map output by the first network layer, so that the finally generated bitstream and/or reconstructed image contains no abnormality, improving the stability of the codec network model.
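The failure mode described here can be seen in a toy example: a single per-tensor truncation value, taken as the maximum over all channels, leaves a channel-level outlier untouched, while per-channel truncation values catch it. The numbers below are illustrative only:

```python
import numpy as np

# Calibrated per-channel maxima of the normal sample feature map
normal_max = np.array([1.0, 8.0])
tensor_max = normal_max.max()                  # single per-tensor truncation value: 8.0

abnormal = np.array([[5.0], [20.0]])           # channel 0 carries the outlier 5.0

per_tensor = np.minimum(abnormal, tensor_max)            # only values above 8.0 clipped
per_channel = np.minimum(abnormal, normal_max[:, None])  # each channel to its own max

print(per_tensor.ravel().tolist())    # channel 0's outlier survives
print(per_channel.ravel().tolist())   # both channels corrected
```

Channel 0's abnormal value 5.0 is well below the global truncation value 8.0, so per-tensor truncation cannot correct it, even though it is five times the channel's calibrated maximum of 1.0.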
Next, the implementation environment involved in the embodiments of the present application is introduced.
Referring to FIG. 7, FIG. 7 is a schematic diagram of an implementation environment provided by an embodiment of the present application. The implementation environment includes a source device 10, a destination device 20, a link 30, and a storage device 40. The source device 10 can generate an encoded image; therefore, the source device 10 may also be referred to as an image encoding device. The destination device 20 can decode the encoded image generated by the source device 10; therefore, the destination device 20 may also be referred to as an image decoding device. The link 30 can receive the encoded image generated by the source device 10 and transmit it to the destination device 20. The storage device 40 can receive and store the encoded image generated by the source device 10, in which case the destination device 20 can obtain the encoded image directly from the storage device 40. Alternatively, the storage device 40 may correspond to a file server or another intermediate storage device that can store the encoded image generated by the source device 10, in which case the destination device 20 can stream or download the encoded image stored in the storage device 40.
The source device 10 and the destination device 20 may each include one or more processors and a memory coupled to the one or more processors. The memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can store the desired program code in the form of instructions or data structures accessible by a computer. For example, the source device 10 and the destination device 20 may each be a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart in-vehicle device, a smart TV, a smart speaker, a desktop computer, a mobile computing device, a notebook (e.g., laptop) computer, a set-top box, a telephone handset such as a so-called "smart" phone, a television, a camera, a display device, a digital media player, a video game console, an in-vehicle computer, or the like.
The link 30 may include one or more media or devices capable of transmitting the encoded image from the source device 10 to the destination device 20. In one possible implementation, the link 30 may include one or more communication media that enable the source device 10 to send the encoded image directly to the destination device 20 in real time. In this embodiment of the present application, the source device 10 may modulate the encoded image based on a communication standard, which may be a wireless communication protocol or the like, and may send the modulated image to the destination device 20. The one or more communication media may include wireless and/or wired communication media, for example, a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, which may be a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other devices that facilitate communication from the source device 10 to the destination device 20, which is not specifically limited in this embodiment of the present application.
In one possible implementation, the storage device 40 may store the received encoded image sent by the source device 10, and the destination device 20 may obtain the encoded image directly from the storage device 40. In this case, the storage device 40 may include any of a variety of distributed or locally accessed data storage media, for example, a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded images.
In one possible implementation, the storage device 40 may correspond to a file server or another intermediate storage device that can store the encoded image generated by the source device 10, and the destination device 20 may stream or download the image stored in the storage device 40. The file server may be any type of server capable of storing the encoded image and sending it to the destination device 20. In one possible implementation, the file server may include a web server, a file transfer protocol (FTP) server, a network attached storage (NAS) device, a local disk drive, or the like. The destination device 20 may obtain the encoded image through any standard data connection (including an Internet connection). Any standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a digital subscriber line (DSL), a cable modem, etc.), or a combination of the two suitable for obtaining the encoded image stored on the file server. The transmission of the encoded image from the storage device 40 may be streaming transmission, download transmission, or a combination of the two.
The implementation environment shown in FIG. 7 is only one possible implementation. The technology of the embodiments of the present application is applicable not only to the source device 10 that can encode images and the destination device 20 that can decode encoded images shown in FIG. 7, but also to other devices that can encode images and decode encoded images, which is not specifically limited in the embodiments of the present application.
In the implementation environment shown in FIG. 7, the source device 10 includes a data source 120, an encoder 100, and an output interface 140. In some embodiments, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter (which may also be referred to as an emitter). The data source 120 may include an image capture device (e.g., a camera), an archive containing previously captured images, a feed interface for receiving images from an image content provider, and/or a computer graphics system for generating images, or a combination of these sources of images.
The data source 120 may send an image to the encoder 100, and the encoder 100 may encode the image received from the data source 120 to obtain an encoded image. The encoder may send the encoded image to the output interface. In some embodiments, the source device 10 sends the encoded image directly to the destination device 20 via the output interface 140. In other embodiments, the encoded image may also be stored in the storage device 40 for the destination device 20 to retrieve later for decoding and/or display.
In the implementation environment shown in FIG. 7, the destination device 20 includes an input interface 240, a decoder 200, and a display device 220. In some embodiments, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded image via the link 30 and/or from the storage device 40 and then send it to the decoder 200, and the decoder 200 may decode the received encoded image to obtain a decoded image. The decoder may send the decoded image to the display device 220. The display device 220 may be integrated with the destination device 20 or may be external to it. In general, the display device 220 displays the decoded image. The display device 220 may be any of a variety of types of display devices, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not shown in FIG. 7, in some aspects, the encoder 100 and the decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software for encoding both audio and video in a common data stream or in separate data streams. In some embodiments, if applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or to other protocols such as the user datagram protocol (UDP).
The encoder 100 and the decoder 200 may each be any of the following circuits: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the technology of the embodiments of the present application is implemented partially in software, the device may store the instructions for the software in a suitable non-volatile computer-readable storage medium, and may use one or more processors to execute the instructions in hardware to implement the technology of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, etc.) may be regarded as one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, and either of them may be integrated as part of a combined encoder/decoder (codec) in the corresponding device.
The embodiments of the present application may generally refer to the encoder 100 as "signaling" or "sending" certain information to another device, such as the decoder 200. The terms "signaling" or "sending" may generally refer to the transfer of syntax elements and/or other data used to decode the compressed image. This transfer may occur in real time or near real time. Alternatively, this communication may occur over a period of time, for example, it may occur when the syntax elements are stored in the encoded bitstream to a computer-readable storage medium at the time of encoding; the decoding device may then retrieve the syntax elements at any time after they are stored to this medium.
The quantization method for the codec network model provided in the embodiments of the present application can be applied to a variety of scenarios. In each scenario, the image to be encoded and decoded may be an image included in an image file or an image included in a video file.
FIG. 8 is a schematic diagram of another implementation environment provided by an embodiment of the present application. The implementation environment includes an encoding end and a decoding end. The encoding end includes an AI encoding module, an entropy encoding module, and a file saving module, and the decoding end includes a file loading module, an entropy decoding module, and an AI decoding module.
During compression, the encoding end obtains the image to be compressed, such as a video or picture captured by a camera. The AI encoding module then produces the features to be encoded and the corresponding probability distribution; based on the probability distribution, the entropy encoding module entropy-encodes the features to be encoded to obtain a bitstream file, and the file saving module saves the bitstream file to obtain the compressed file of the image. The compressed file is input to the decoding end, which loads it through the file loading module and obtains the reconstructed image through the entropy decoding module and the AI decoding module.
The above encoding and decoding process can be implemented by a codec network model, and the quantization method provided in the embodiments of the present application can quantize the codec network model. Optionally, the codec network model includes at least one of an encoding network model, a decoding network model, and an entropy estimation network model.
It should be noted that the image encoding and decoding principles of FIG. 8 and of the image compression framework shown in FIG. 1 are similar: the AI encoding unit in FIG. 8 is equivalent to including the encoding network model and the entropy estimation network model in FIG. 1, and the AI decoding unit is equivalent to including the decoding network model and the entropy estimation network model in FIG. 1.
Optionally, the data processing of the AI encoding module and the AI decoding module is implemented on an embedded neural network processing unit (NPU) to improve data processing efficiency, while processes such as entropy encoding, file saving, and file loading are implemented on a central processing unit (CPU).
Optionally, the encoding end and the decoding end are one device, or they are two independent devices. If the encoding end and the decoding end are one device, the codec network model quantized by the method provided in the embodiments of the present application can be deployed on that device. If they are two independent devices, the quantized codec network model can be deployed on each of the two devices. That is, a given device may have both the image compression function and the image decompression function, or only one of the two.
The quantization method for the codec network model provided in the embodiments of the present application can be applied to a variety of business scenarios, such as cloud storage, video surveillance, live streaming, and transmission, and can specifically be applied to terminal recording, video albums, cloud storage, and the like.
It should be noted that the quantization method for the codec network model provided in the embodiments of the present application can be applied to any codec network model. The VAE-based codec network model is introduced first below.
Referring to FIG. 9, at the encoding end, the original image is input into the encoding network model to extract features, obtaining the to-be-quantized image features y of multiple feature points. The to-be-quantized image features y of the multiple feature points are quantized to obtain the first image features ŷ of the multiple feature points. The first image features ŷ are input into the hyper-encoder network model to obtain the to-be-quantized hyper-prior features z of the multiple feature points, and the to-be-quantized hyper-prior features z are quantized to obtain the first hyper-prior features ẑ of the multiple feature points. The first hyper-prior features ẑ are entropy-encoded according to a specified probability distribution, so as to encode ẑ into the bitstream. As shown in FIG. 2, the bit sequence obtained by entropy-encoding ẑ is a partial bit sequence included in the bitstream; this part (shown by the black-and-white bar on the right side of FIG. 2) may be called the hyper-prior bitstream.
In addition, the first hyper-prior features ẑ of the multiple feature points are input into the hyper-decoder network model to obtain the prior features ψ of the multiple feature points. The first image features ŷ of the multiple feature points are input into a context model (CM) to obtain the context features φ of the multiple feature points. Combining the prior features ψ and the context features φ, the probability distribution N(μ, σ) of the multiple feature points is estimated through a probability distribution estimation network model (shown as a gather model, GM). Based on the probability distribution N(μ, σ), the first image feature ŷ of each of the multiple feature points is entropy-encoded in turn into the bitstream. As shown in FIG. 9, the bit sequence obtained by entropy-encoding ŷ is a partial bit sequence included in the bitstream; this part (shown by the black-and-white bar on the left side of FIG. 9) may be called the image bitstream.
At the decoding end, the first hyper prior features ẑ of the multiple feature points are first obtained by entropy-decoding the hyper prior bitstream included in the bitstream according to the specified probability distribution, and the first hyper prior features ẑ are input into the hyper decoding network model to obtain the prior features ψ of the multiple feature points. For the initial feature point among the multiple feature points, the probability distribution of the initial feature point is estimated based on its prior feature, and the first image feature of the initial feature point is parsed from the image bitstream included in the bitstream based on that probability distribution. For a feature point other than the initial one, such as a first feature point, the surrounding information of the first feature point is determined from the first image features of the already decoded feature points; the surrounding information is input into the context model CM to obtain the context feature of the first feature point; combining the prior feature and the context feature of the first feature point, the probability distribution of the first feature point is estimated through the probability distribution estimation network model GM; and based on that probability distribution, the first image feature of the first feature point is parsed from the image bitstream included in the bitstream. After the first image features ŷ of the multiple feature points are entropy-decoded from the bitstream, ŷ is input into the decoding network model to obtain the reconstructed image.
It should be noted that if any of the above probability distribution estimation network models performs modeling with a Gaussian model (such as a single Gaussian model or a Gaussian mixture model), the estimated probability distribution parameters include a mean and a variance. If any of the above probability distribution estimation network models performs modeling with a Laplace distribution model, the estimated probability distribution parameters include a location parameter and a scale parameter. If any of the above probability distribution estimation network models performs modeling with a logistic distribution model, the estimated probability distribution parameters include a mean and a scale parameter. In addition, the probability distribution estimation network model in the embodiments of the present application may also be called a factorized entropy model; it is part of the entropy estimation network model, which further includes the above hyper encoding network model and hyper decoding network model.
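As a minimal sketch of how the estimated Gaussian parameters (μ, σ) translate into the probabilities consumed by the entropy coder, the mass assigned to a quantized value can be taken as the Gaussian mass over its quantization bin. The unit-width integer bin below is an assumption for illustration; the embodiments above do not fix the bin width.

```python
import math

def gaussian_cdf(x, mu, sigma):
    # Cumulative distribution function of N(mu, sigma^2).
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probability(q, mu, sigma):
    # Probability mass assigned to the quantized integer q: the Gaussian
    # mass over the assumed quantization bin [q - 0.5, q + 0.5].
    return gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
```

The masses over all integer bins sum to 1, so they form a valid probability table for an arithmetic coder.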
FIG. 10 is a schematic structural diagram of an encoding network model provided in an embodiment of the present application. The encoding network model is a convolutional neural network model that includes four convolutional layers (Conv) and three activation layers interspersed in cascade (for example, activation layers built on ReLU or other activation functions). The convolution kernel size of each convolutional layer is 5×5, the number of channels of the output feature map is M, and each convolutional layer downsamples the width and height by a factor of 2. It should be noted that the structure of the encoding network model shown in FIG. 10 does not limit the embodiments of the present application; for example, the kernel size, the number of feature-map channels, the downsampling factor, the number of downsampling operations, and the number of convolutional layers can all be adjusted.
FIG. 11 is a schematic structural diagram of a decoding network model provided in an embodiment of the present application. The decoding network model is a convolutional neural network model that includes four convolutional layers (Conv) and three activation layers interspersed in cascade (for example, activation layers built on ReLU or other activation functions). The convolution kernel size of each convolutional layer is 5×5, the number of channels of the output feature map is M or N, and each convolutional layer upsamples the width and height by a factor of 2. It should be noted that the structure of the decoding network model shown in FIG. 11 does not limit the embodiments of the present application; for example, the kernel size, the number of feature-map channels, the upsampling factor, the number of upsampling operations, and the number of convolutional layers can all be adjusted.
FIG. 12 is a schematic structural diagram of an entropy estimation network model provided in an embodiment of the present application. The entropy estimation network model includes a hyper encoding network model (HyEnc), a factorized entropy model, and a hyper decoding network model (HyDec). The hyper encoding network model includes three convolutional layers (Conv) and two activation layers interspersed in cascade (for example, activation layers built on ReLU or other activation functions). The convolution kernel size of each convolutional layer is 5×5, the number of channels of the output feature map is M, the first two convolutional layers downsample the width and height by a factor of 2, and the last convolutional layer does not downsample. The network structure of the factorized entropy model is the network structure of the probability distribution estimation network model introduced above. The hyper decoding network model includes three convolutional layers (Conv) and two activation layers interspersed in cascade (for example, activation layers built on ReLU or other activation functions). The convolution kernel size of each convolutional layer is 5×5, the number of channels of the output feature map is M, the first convolutional layer does not upsample, and the last two convolutional layers upsample the width and height by a factor of 2. It should be noted that the structure of the entropy estimation network model shown in FIG. 12 does not limit the embodiments of the present application; for example, the kernel size, the number of feature-map channels, the downsampling factor, the number of downsampling operations, the upsampling factor, the number of upsampling operations, and the number of convolutional layers can all be adjusted.
FIG. 13 is a flowchart of a quantization method for an encoding and decoding network model provided in an embodiment of the present application. Referring to FIG. 13, the method includes the following steps.
Step 1301: Determine H network units included in the unquantized encoding and decoding network model. Each of the H network units includes a first network layer, a second network layer, and a third network layer; the first network layer and the second network layer are both linear operation layers; the third network layer is located between the first network layer and the second network layer and is used to truncate the feature values in the feature map output by the first network layer, so that the feature values in the feature map output by the third network layer are all non-negative or all non-positive; and H is an integer greater than or equal to 1.
In some embodiments, multiple third network layers may be determined from the unquantized encoding and decoding network model. For any one of the multiple third network layers, if the network layer immediately preceding the third network layer is a linear operation layer, the preceding layer is taken as a first network layer; if the preceding layer is not a linear operation layer, it is not taken as a first network layer. Each of the multiple third network layers is processed in the same manner, so that the first network layers of the H network units can finally be determined.
That is, the linear operation layer immediately preceding each third network layer in the encoding and decoding network model may be taken as a first network layer. Of course, since not every such preceding linear operation layer causes abnormal encoding or decoding results, each linear operation layer included in the encoding and decoding network model may instead be screened to determine the linear operation layers that cause abnormal encoding or decoding results, and those layers are taken as the first network layers. The process of determining the linear operation layers that cause abnormal encoding or decoding results is described in detail next.
Determine M first sample feature maps output by a first linear operation layer, where the M first sample feature maps are feature maps corresponding to M normal sample images, M is an integer greater than or equal to 1, and the first linear operation layer is any linear operation layer included in the encoding and decoding network model. Determine N second sample feature maps output by the first linear operation layer, where the N second sample feature maps are feature maps corresponding to N abnormal sample images and N is an integer greater than or equal to 1. When the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition, determine that the first linear operation layer is a first network layer.
The M feature maps output by the first linear operation layer for the M normal sample images are taken as the M first sample feature maps, and the N feature maps output by the first linear operation layer for the N abnormal sample images are taken as the N second sample feature maps. Then, a first statistical parameter is determined based on the M first sample feature maps, a second statistical parameter is determined based on the N second sample feature maps, and whether the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition is determined based on the first statistical parameter and the second statistical parameter.
Optionally, the first linear operation layer includes W channels, each first sample feature map includes feature maps of the W channels, and the first statistical parameter includes first parameters of the W channels. In this case, the process of determining the first statistical parameter based on the M first sample feature maps includes: for a target channel, which is any one of the W channels, computing a candidate parameter from the feature map of the target channel included in each of the M first sample feature maps to obtain M candidate parameters, and determining the first parameter of the target channel based on the M candidate parameters.
Optionally, the first parameter includes a first maximum value and a first minimum value, the feature map of the target channel in each first sample feature map includes feature values of multiple feature points, and the candidate parameter includes a candidate maximum feature value and a candidate minimum feature value. In this case, the detailed process of determining the first parameter of the target channel includes: finding the maximum feature value and the minimum feature value in the feature map of the target channel included in each first sample feature map to obtain M candidate maximum feature values and M candidate minimum feature values, determining the first maximum value based on the M candidate maximum feature values, and determining the first minimum value based on the M candidate minimum feature values.
In some embodiments, the average of the M candidate maximum feature values may be determined as the first maximum value. In other embodiments, the first maximum value may be determined from the M candidate maximum feature values using a moving average method. In still other embodiments, the mean and standard deviation of the M candidate maximum feature values may be determined, and the sum of the mean and three times the standard deviation may then be determined as the first maximum value. In still other embodiments, the largest of the M candidate maximum feature values may be determined as the first maximum value. Of course, the first maximum value may also be determined in other ways, which is not limited in the embodiments of the present application.
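The four aggregation options above can be sketched as follows. This is a minimal sketch; the decay constant 0.9 used for the moving average is an assumption, since the embodiments do not specify one.

```python
import numpy as np

def first_maximum(candidate_maxima, method="mean"):
    # candidate_maxima: the M per-image maximum feature values of one channel.
    c = np.asarray(candidate_maxima, dtype=np.float64)
    if method == "mean":       # average of the M candidate maximum values
        return float(c.mean())
    if method == "moving":     # moving (exponential) average, decay 0.9 assumed
        ema = float(c[0])
        for v in c[1:]:
            ema = 0.9 * ema + 0.1 * float(v)
        return ema
    if method == "mean3std":   # mean plus three times the standard deviation
        return float(c.mean() + 3.0 * c.std())
    if method == "max":        # largest of the M candidate maximum values
        return float(c.max())
    raise ValueError(f"unknown method: {method}")
```

The first minimum value is obtained analogously from the per-image minima.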
The manner of determining the first minimum value based on the M candidate minimum feature values is similar to the manner of determining the first maximum value based on the M candidate maximum feature values; for details, refer to the related description above, which is not repeated here.
In some embodiments, each second sample feature map includes feature maps of the W channels, and the second statistical parameter includes second parameters of the W channels. In this case, the process of determining the second statistical parameter based on the N second sample feature maps includes: for a target channel, which is any one of the W channels, computing a candidate parameter from the feature map of the target channel included in each of the N second sample feature maps to obtain N candidate parameters, and determining the second parameter of the target channel based on the N candidate parameters.
Optionally, the second parameter includes a second maximum value and a second minimum value, the feature map of the target channel in each second sample feature map includes feature values of multiple feature points, and the candidate parameter includes a candidate maximum feature value and a candidate minimum feature value. In this case, the process of determining the second parameter of the target channel is similar to the above process of determining the first parameter of the target channel; for details, refer to the related description above, which is not repeated here.
The process of determining, based on the first statistical parameter and the second statistical parameter, whether the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition includes: determining the number of abnormal channels based on the first parameters of the W channels and the second parameters of the W channels; when the number of abnormal channels is greater than or equal to an abnormal channel number threshold, determining that the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition; and when the number of abnormal channels is less than the abnormal channel number threshold, determining that they do not meet the feature abnormality condition.
In some embodiments, for any one of the W channels, if the difference between the first maximum value and the second maximum value of the channel is greater than a maximum difference threshold and the difference between the first minimum value and the second minimum value is greater than a minimum difference threshold, the channel is determined to be an abnormal channel and the abnormal channel count is incremented by 1; otherwise, the channel is determined not to be an abnormal channel. Each of the W channels is processed in the same manner to obtain the number of abnormal channels. In other embodiments, for any one of the W channels, if the difference between the first maximum value and the second maximum value divided by the first maximum value is greater than a maximum ratio threshold, and the difference between the first minimum value and the second minimum value divided by the first minimum value is greater than a minimum ratio threshold, the channel is determined to be an abnormal channel and the abnormal channel count is incremented by 1; otherwise, the channel is determined not to be an abnormal channel. Each of the W channels is processed in the same manner to obtain the number of abnormal channels. Of course, the number of abnormal channels may also be determined in other ways, which is not limited in the embodiments of the present application.
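The first (difference-threshold) variant can be sketched as below. Taking the absolute difference between the extrema is an assumption made here for symmetry; the text only speaks of "the difference". The 80% channel-count threshold mirrors the example given later for the abnormal channel number threshold.

```python
def count_abnormal_channels(first_params, second_params,
                            max_diff_thr, min_diff_thr):
    # first_params / second_params: per-channel (maximum, minimum) pairs
    # obtained from the normal and abnormal sample images respectively.
    # A channel counts as abnormal when both extrema differ by more than
    # their thresholds (absolute differences assumed).
    count = 0
    for (max1, min1), (max2, min2) in zip(first_params, second_params):
        if abs(max1 - max2) > max_diff_thr and abs(min1 - min2) > min_diff_thr:
            count += 1
    return count

def meets_abnormality_condition(count, w_channels, ratio=0.8):
    # Example threshold: eighty percent of the layer's channel count.
    return count >= ratio * w_channels
```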
When the number of abnormal channels is greater than or equal to the abnormal channel number threshold, the number of abnormal channels is relatively large, so it can be determined that the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition. When the number of abnormal channels is less than the abnormal channel number threshold, the number of abnormal channels is relatively small, so it can be determined that the M first sample feature maps and the N second sample feature maps do not meet the feature abnormality condition.
When the M first sample feature maps and the N second sample feature maps meet the feature abnormality condition, the feature maps output by the first linear operation layer are abnormal, so the first linear operation layer can be determined to be a first network layer. When they do not meet the feature abnormality condition, the feature maps output by the first linear operation layer are not abnormal, so the first linear operation layer can be determined not to be a first network layer.
In some embodiments, the process of determining the second network layer in each of the H network units included in the unquantized encoding and decoding network model includes: for the first network layer in each network unit, when a linear operation layer follows the first network layer, determining that following linear operation layer to be the second network layer; and when no linear operation layer follows the first network layer, adding a linear operation layer after the first network layer and determining the added linear operation layer to be the second network layer.
It should be noted that if the second network layer is an added linear operation layer, the weights included in the second network layer may all be 1 before weight scaling is performed, so that before weight scaling the input feature map and the output feature map of the second network layer are identical. Optionally, the encoding and decoding network model includes an encoding network model, a decoding network model, or an entropy estimation network model.
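The added all-ones layer can be sketched as below, under the assumption (not stated explicitly above) that it applies one weight per channel, i.e. acts as a depthwise scaling layer, so that all-ones weights leave the feature map unchanged.

```python
import numpy as np

def add_identity_layer(channels):
    # Weights of the added linear operation layer, all 1 as stated above,
    # assuming one weight per channel (a depthwise scaling layer).
    return np.ones(channels)

def apply_layer(weights, fmap):
    # fmap has shape (channels, height, width); each channel is multiplied
    # by its weight, so all-ones weights reproduce the input exactly.
    return weights[:, None, None] * fmap
```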
The abnormal channel number threshold is set in advance and is related to the number of channels of the linear operation layer; for example, it may be set to eighty percent of the number of channels of the linear operation layer. The minimum difference threshold, maximum difference threshold, maximum ratio threshold, and minimum ratio threshold are also set in advance and may be adjusted according to different requirements in different situations.
As an example, refer to FIG. 14, which is a schematic diagram of network units provided in an embodiment of the present application. In FIG. 14, the encoding and decoding network model includes H network units, namely network unit 1, network unit 2, ..., network unit H.
Step 1302: Scale the weights of the output channels included in the first network layer of each network unit, so that the differences between the boundary feature values of the channels in the feature map output by the first network layer of each network unit are less than a feature threshold.
For the first network layer in each network unit, determine K third sample feature maps output by the first network layer, where the K third sample feature maps are feature maps corresponding to K normal sample images and K is an integer greater than or equal to 1; determine the scaling ratio of each output channel included in the first network layer based on the K third sample feature maps; and scale the weights of each output channel included in the first network layer according to its scaling ratio.
In some embodiments, a reference feature value corresponding to the first network layer and the boundary feature value of each output channel included in the first network layer are determined based on the K third sample feature maps, and the ratio of the reference feature value corresponding to the first network layer to the boundary feature value of each output channel is determined as the scaling ratio of that output channel.
A third statistical parameter of the first network layer is determined based on the K third sample feature maps, and the reference feature value corresponding to the first network layer and the boundary feature values of the output channels included in the first network layer are determined based on the third statistical parameter. The process of determining the third statistical parameter based on the K third sample feature maps is the same as the above process of determining the first statistical parameter based on the M first sample feature maps; for details, refer to the related description above, which is not repeated here.
Optionally, the first network layer includes P output channels, the third statistical parameter includes third parameters of the P output channels, a third parameter may include a third maximum value or a third minimum value, and a boundary feature value may be a maximum feature value or a minimum feature value. For any one of the multiple channels included in the feature map output by the first network layer, the boundary feature value of the channel refers to the maximum feature value or the minimum feature value in the portion of that feature map corresponding to the channel. Depending on the meaning of the boundary feature value, the manner of determining, based on the third statistical parameter of the first network layer, the reference feature value corresponding to the first network layer and the boundary feature values of the output channels differs. The two cases are introduced separately below.
When the boundary feature value is the maximum feature value, the third parameter includes the third maximum value; the largest of the third maximum values of the P output channels is taken as the reference feature value, and the third maximum values of the P output channels are taken as the boundary feature values of the P output channels included in the first network layer.
When the boundary feature value is the minimum feature value, the third parameter includes the third minimum value; the smallest of the third minimum values of the P output channels is taken as the reference feature value, and the third minimum values of the P output channels are taken as the boundary feature values of the P output channels included in the first network layer.
For any one of the P output channels, the reference feature value corresponding to the first network layer divided by the boundary feature value of that output channel is determined as the ratio of the reference feature value corresponding to the first network layer to the boundary feature value of that output channel. Each of the P output channels is processed in the same manner to obtain the ratio of the reference feature value corresponding to the first network layer to the boundary feature value of each output channel included in the first network layer.
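The ratio computation above can be sketched as follows, assuming the boundary feature values are the per-channel maxima; a channel whose boundary feature value is 0 keeps a ratio of 1, matching the rule in this section that such channels are left unscaled.

```python
import numpy as np

def channel_scaling_ratios(boundary_values):
    # boundary_values: per-output-channel boundary feature values, e.g. the
    # third maximum of each of the P output channels. The reference feature
    # value is the largest of them; each channel's scaling ratio is
    # reference / boundary, and a zero boundary value yields ratio 1.
    b = np.asarray(boundary_values, dtype=np.float64)
    reference = b.max()
    ratios = np.ones_like(b)
    nonzero = b != 0
    ratios[nonzero] = reference / b[nonzero]
    return ratios
```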
Optionally, the process of scaling the weights of each output channel included in the first network layer according to its scaling ratio includes: for any one of the P output channels, multiplying the weights of the output channel by the scaling ratio of the output channel, thereby scaling the weights of that output channel. The weights of each of the P output channels are processed in the same manner, thereby scaling the weights of all output channels included in the first network layer.
For ease of understanding, the process of scaling the weights of the output channels included in the first network layer is illustrated with an example. Refer to FIG. 15: the first network layer includes 3 input channels and 5 output channels, and the feature map output by output channel 1 is y1 = a×x1 + b×x2 + c×x3; y2 to y5 are determined similarly to y1 and are not repeated here. In this case, the weights of output channel 1 are a, b, and c. If the scaling ratio of output channel 1 is 2, then after the weights of output channel 1 are scaled according to its scaling ratio, they become 2a, 2b, and 2c.
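The FIG. 15 example can be checked numerically. The weight values other than (a, b, c) are hypothetical placeholders; row i of the matrix holds the weights of output channel i.

```python
import numpy as np

a, b, c = 1.0, 2.0, 3.0
W = np.array([[a,   b,   c  ],   # output channel 1: weights a, b, c
              [0.5, 0.5, 0.5],   # channels 2-5: hypothetical values
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0],
              [0.0, 1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])    # x1, x2, x3
y = W @ x                        # y1 = a*x1 + b*x2 + c*x3, etc.

ratios = np.array([2.0, 1.0, 1.0, 1.0, 1.0])  # channel 1 scaled by 2
W_scaled = ratios[:, None] * W   # row 0 becomes (2a, 2b, 2c)
y_scaled = W_scaled @ x          # channel 1's output doubles, others unchanged
```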
Refer to FIG. 16. After the first network layer is scaled, the maximum feature values of the per-channel feature maps in both the normal sample feature map and the abnormal sample feature map output by the first network layer are enlarged accordingly, and the maximum feature values of the per-channel feature maps in the normal sample feature map become consistent with one another.
It should be noted that if the boundary feature value of some output channel among the P output channels of the first network layer is 0, the weights of that output channel are not scaled. Moreover, any output channel of any network layer outputs the feature map corresponding to that channel; if a network layer has multiple output channels, the feature maps of those channels are obtained, and together they may be referred to as the feature map output by that network layer. In addition, the condition that the differences between the boundary feature values of the channels in the feature map output by the first network layer of each network unit are smaller than the feature threshold is to be understood as follows: for any first network layer, the differences between the boundary feature values of the channels in the feature map it outputs are smaller than the feature threshold; that is, the condition applies within the same feature map. The feature threshold is set in advance and can be adjusted to suit different requirements in different situations.
Step 1303: Scale the weights of each input channel of the second network layer in each network unit so that the output of the encoder-decoder network model is the same before and after model weight scaling, where model weight scaling comprises scaling the weights of the output channels of the first network layer and scaling the weights of the input channels of the second network layer in each network unit.
For the second network layer in each network unit, the scaling ratio of each of its input channels is determined based on the scaling ratios of the output channels of the first network layer in the same network unit, and the weights of each input channel of the second network layer are then scaled according to those ratios.
Optionally, if the first network layer in the network unit containing the second network layer has P output channels, then the second network layer has P input channels, and the P input channels of the second network layer correspond one-to-one to the P output channels of the first network layer. For any one of the P input channels, the reciprocal of the scaling ratio of the corresponding output channel of the first network layer is taken as the scaling ratio of that input channel. Each of the P input channels is processed in the same manner to obtain the scaling ratios of all input channels of the second network layer.
Scaling the weights of the input channels of the second network layer according to their scaling ratios is implemented in the same way as scaling the weights of the output channels of the first network layer according to their scaling ratios, and is not repeated here.
For ease of understanding, the process of scaling the weights of the input channels of the second network layer is illustrated with an example. Referring to Figure 17, the second network layer has 3 input channels and 5 output channels, and the weights of the input data x1 of input channel 1 on output channels 1 to 5 are a, d, j, g and m respectively; the weights of x2 and x3 on output channels 1 to 5 are determined similarly and are not repeated here. In this case, the weights of input channel 1 are a, d, j, g and m. If the scaling ratio of input channel 1 is 0.5, then after scaling according to that ratio, the weights of input channel 1 become 0.5a, 0.5d, 0.5j, 0.5g and 0.5m.
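The reciprocal input-channel scaling of Step 1303 can likewise be sketched in numpy. The weights and ratios are assumptions for illustration; the assertion shows why the reciprocal is the right choice: in the linear part, scaling the first layer's outputs by `out_scales` and the second layer's input-channel weights by `1 / out_scales` leaves the second layer's output unchanged.

```python
import numpy as np

# Hypothetical second-layer weight: shape (Q_out, P_in) = (5, 3);
# column p holds the weights of input channel p (a, d, j, g, m for channel 1).
rng = np.random.default_rng(1)
W2 = rng.standard_normal((5, 3))

out_scales = np.array([2.0, 4.0, 0.5])    # first-layer output-channel ratios (assumed)
in_scales = 1.0 / out_scales              # reciprocals, per Step 1303

W2_scaled = W2 * in_scales[None, :]       # scale each input channel's column of weights

h = rng.standard_normal(3)                # some first-layer output
# The scalings cancel: the second layer's output is unchanged.
assert np.allclose(W2_scaled @ (h * out_scales), W2 @ h)
```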
Step 1304: Quantize the encoder-decoder network model after model weight scaling.
In some embodiments, post-training quantization (PTQ) may be used to quantize the encoder-decoder network model after model weight scaling. In other embodiments, quantization-aware training (QAT) may be used instead. Of course, other methods may also be used to quantize the weight-scaled model; the embodiments of the present application place no limitation on this.
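The application does not fix a particular PTQ scheme. As a hedged illustration only, a common symmetric per-tensor int8 post-training scheme looks like the following; the function name, the 127 clipping range and the sample weights are assumptions, not part of the claimed method.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 PTQ: map [-max|w|, max|w|] onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.array([0.02, -0.5, 1.3, -1.29])
q, s = quantize_int8(w)
w_hat = q.astype(np.float64) * s                # dequantized weights

# Rounding error is bounded by half a quantization step.
assert np.abs(w - w_hat).max() <= s / 2 + 1e-9
```

Because the step size `s` is set by the largest magnitude in the tensor, a single abnormal (outlier) value inflates `s` and coarsens every other value — which is exactly why the per-channel weight scaling of Steps 1302-1303 helps the quantizer.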
Referring to Figure 18, reconstructed image 1 is obtained by passing an abnormal sample image through the weight-scaled encoder-decoder network model quantized with PTQ, and reconstructed image 2 is obtained by passing the same abnormal sample image through the weight-scaled model quantized with QAT. As Figure 18 shows, whether QAT or PTQ is used to quantize the weight-scaled model, the stability of the encoder-decoder network model is enhanced, and both the compression rate and the reconstructed image obtained after the image passes through the model are considerably improved.
In summary, the quantization method for an encoder-decoder network model provided in the embodiments of the present application can truncate abnormal feature values greater than the maximum cutoff value when the boundary feature value is the maximum feature value, and can truncate abnormal feature values smaller than the minimum cutoff value when the boundary feature value is the minimum feature value.
It should be noted that the third network layer described above helps to overcome vanishing gradients and speeds up training, and also increases the nonlinear expressive capacity of the encoder-decoder network model. When the boundary feature value is the maximum feature value, that is, when abnormal feature values greater than the maximum cutoff value are truncated, the third network layer may be an activation layer built on the ReLU function, so that the minimum feature value among the feature points of the feature map it outputs is 0, i.e. all outputs are non-negative. In this case, since the minimum cutoff value is generally less than 0, an abnormal feature value smaller than the minimum cutoff value is truncated to 0 after passing through the third network layer, thereby correcting it. When the boundary feature value is the minimum feature value, the third network layer may be an activation layer built on the ReLU function reflected about the y-axis, so that the maximum feature value among the feature points of the feature map it outputs is 0, i.e. all outputs are non-positive. In this case, since the maximum cutoff value is generally greater than 0, an abnormal feature value greater than the maximum cutoff value is truncated to 0 after passing through the third network layer, thereby correcting it.
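The two truncation behaviours above can be sketched directly. `relu` is standard; the reflected variant is described only loosely in the text, so `neg_relu` below (min(x, 0), whose outputs are all non-positive) is an assumed realization of it, not the application's exact definition.

```python
import numpy as np

def relu(x):
    """Activation for the maximum-boundary case: outputs are all non-negative."""
    return np.maximum(x, 0.0)

def neg_relu(x):
    """Assumed non-positive counterpart for the minimum-boundary case."""
    return np.minimum(x, 0.0)

f = np.array([-3.0, -0.2, 0.0, 1.5])
assert (relu(f) >= 0).all()          # a value below any negative minimum cutoff → 0
assert (neg_relu(f) <= 0).all()      # a value above any positive maximum cutoff → 0
```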
As an example, referring to Figure 19, the first network layers of the six network units in the encoder-decoder network model are network layer 1, network layer 2, network layer 3, network layer 4, network layer 5 and network layer 6 of the entropy estimation network model. After the weight-scaled encoder-decoder network model is quantized, image 1 is encoded as follows: image 1 passes through the encoding network to produce a feature y; as y passes through network layers 1, 2 and 3 of the hyper-encoder network model, the feature map is corrected according to each layer's cutoff value, finally yielding the prior feature z; as z passes through network layers 4, 5 and 6 of the hyper-decoder network model within the entropy estimation network model, the feature map is again corrected according to each layer's cutoff value, finally yielding the probability distribution of every feature point of y; the features to be encoded are then determined from these probability distributions and entropy-encoded to obtain a bitstream.
By scaling the weights of the output channels of each first network layer, it can be guaranteed that the differences between the boundary feature values of the channels in the feature map output by each first network layer are smaller than the feature threshold; and by scaling the weights of the input channels of the second network layer that follows the first network layer, it can be guaranteed that the output of the encoder-decoder network model is the same before and after model weight scaling. In this way, after the weight-scaled model is quantized, the weight-scaled first and second network layers can truncate abnormal feature values while ensuring that normal feature values are unaffected by the scaling, thereby reducing the impact on the model of changing the weights of the output channels of the first network layer, so that the final bitstream and/or reconstructed image is free of anomalies and the stability of the encoder-decoder network model is improved.
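The invariance claimed here rests on the positive homogeneity of the truncation layer: ReLU(s·z) = s·ReLU(z) for s > 0, so the output-channel scaling of the first layer and the reciprocal input-channel scaling of the second layer cancel end to end. A minimal sketch with assumed random weights:

```python
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal((4, 3))        # first network layer (linear)
W2 = rng.standard_normal((2, 4))        # second network layer (linear)
x = rng.standard_normal(3)

s = np.array([2.0, 0.5, 3.0, 1.0])      # positive per-channel scaling ratios (assumed)

out_ref = W2 @ np.maximum(W1 @ x, 0.0)  # original: layer1 -> ReLU -> layer2

W1s = W1 * s[:, None]                   # scale output channels of layer 1
W2s = W2 / s[None, :]                   # scale input channels of layer 2 by reciprocals
out_scaled = W2s @ np.maximum(W1s @ x, 0.0)

# ReLU is positively homogeneous, so the scaling cancels exactly.
assert np.allclose(out_ref, out_scaled)
```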
Since not every linear operation layer in the encoder-decoder network model causes abnormal encoding or decoding results, in the embodiments of the present application the linear operation layers of the model can be screened to identify those that do cause abnormal results, and those layers are taken as the first network layers. In this way, the weights of the output channels of exactly the problematic linear operation layers can be scaled in a targeted manner, which reduces the computational load of the model while effectively guaranteeing its stability. Moreover, whether the boundary feature value is the maximum or the minimum feature value, the quantization method provided in the embodiments of the present application can correct abnormal feature values, so that the final bitstream and/or reconstructed image is free of anomalies and the stability of the encoder-decoder network model is improved.
The quantization method for an encoder-decoder network model provided in the embodiments of the present application can greatly enhance the stability with which the quantized model encodes and decodes various images, without increasing the computational intensity of the model or degrading its performance. It can also be used to enhance the stability of quantized models for other low-level tasks (for example super-resolution and denoising).
Figure 20 is a schematic structural diagram of a quantization apparatus for an encoder-decoder network model provided in an embodiment of the present application. The apparatus may be implemented in software, hardware or a combination of the two as part or all of a quantization device for an encoder-decoder network model. Referring to Figure 20, the apparatus comprises: a determination module 2001, a first scaling module 2002, a second scaling module 2003 and a quantization module 2004.
The determination module 2001 is configured to determine H network units of the unquantized encoder-decoder network model, each of the H network units comprising a first network layer, a second network layer and a third network layer, where the first and second network layers are both linear operation layers, the third network layer is located between the first and second network layers and is used to truncate the feature values in the feature map output by the first network layer so that the feature values in the feature map output by the third network layer are all non-negative or all non-positive, and H is an integer greater than or equal to 1.
The first scaling module 2002 is configured to scale the weights of each output channel of the first network layer in each network unit so that the differences between the boundary feature values of the channels in the feature map output by the first network layer in each network unit are smaller than the feature threshold.
The second scaling module 2003 is configured to scale the weights of each input channel of the second network layer in each network unit so that the output of the encoder-decoder network model is the same before and after model weight scaling, where model weight scaling comprises scaling the weights of the output channels of the first network layer and scaling the weights of the input channels of the second network layer in each network unit.
The quantization module 2004 is configured to quantize the encoder-decoder network model after model weight scaling.
Optionally, the determination module 2001 is specifically configured to:
determine M first sample feature maps output by a first linear operation layer, the M first sample feature maps being the feature maps corresponding to M normal sample images, M being an integer greater than or equal to 1, and the first linear operation layer being any linear operation layer of the encoder-decoder network model;
determine N second sample feature maps output by the first linear operation layer, the N second sample feature maps being the feature maps corresponding to N abnormal sample images, N being an integer greater than or equal to 1;
determine the first linear operation layer to be a first network layer when the M first sample feature maps and the N second sample feature maps satisfy the feature abnormality condition.
Optionally, the determination module 2001 is specifically configured to:
for the first network layer in each network unit, when a linear operation layer follows the first network layer, determine that linear operation layer to be the second network layer;
when no linear operation layer follows the first network layer, add a linear operation layer after it and determine the added layer to be the second network layer.
Optionally, the first scaling module 2002 is specifically configured to:
for the first network layer in each network unit, determine K third sample feature maps output by that first network layer, the K third sample feature maps being the feature maps corresponding to K normal sample images, K being an integer greater than or equal to 1;
determine the scaling ratio of each output channel of the first network layer based on the K third sample feature maps;
scale the weights of each output channel of the first network layer according to its scaling ratio.
Optionally, the first scaling module 2002 is specifically configured to:
determine, based on the K third sample feature maps, the reference feature value corresponding to the first network layer and the boundary feature value of each output channel of the first network layer;
determine the ratio between the reference feature value corresponding to the first network layer and the boundary feature value of each output channel as the scaling ratio of that output channel.
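The ratio computation above can be sketched concretely. The sample feature maps are random placeholders, and taking the largest per-channel maximum as the layer's reference value is an assumption for illustration (the application leaves the choice of reference value open); any positive reference makes the scaled boundary values identical across channels.

```python
import numpy as np

# K hypothetical sample feature maps from one first network layer:
# shape (K, P, H, W) with K = 8 samples and P = 4 output channels.
rng = np.random.default_rng(3)
fmaps = np.abs(rng.standard_normal((8, 4, 16, 16)))

# Boundary value per channel: here the maximum over all samples and positions.
boundary = fmaps.max(axis=(0, 2, 3))

reference = boundary.max()            # assumed choice of the layer's reference value
scales = reference / boundary         # per-output-channel scaling ratios

# After scaling, every channel's boundary value equals the reference,
# so the differences between channel boundary values vanish.
assert np.allclose(boundary * scales, reference)
```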
Optionally, the second scaling module 2003 is specifically configured to:
for the second network layer in each network unit, determine the scaling ratio of each input channel of the second network layer based on the scaling ratios of the output channels of the first network layer in the same network unit;
scale the weights of each input channel of the second network layer according to its scaling ratio.
Optionally, the encoder-decoder network model comprises an encoding network model, a decoding network model or an entropy estimation network model.
Optionally, the boundary feature value is the maximum feature value or the minimum feature value.
By scaling the weights of the output channels of each first network layer, it can be guaranteed that the differences between the boundary feature values of the channels in the feature map output by each first network layer are smaller than the feature threshold; and by scaling the weights of the input channels of the second network layer that follows the first network layer, it can be guaranteed that the output of the encoder-decoder network model is the same before and after model weight scaling. In this way, after the weight-scaled model is quantized, the weight-scaled first and second network layers can truncate abnormal feature values while ensuring that normal feature values are unaffected by the scaling, thereby reducing the impact on the model of changing the weights of the output channels of the first network layer, so that the final bitstream and/or reconstructed image is free of anomalies and the stability of the encoder-decoder network model is improved.
Since not every linear operation layer in the encoder-decoder network model causes abnormal encoding or decoding results, in the embodiments of the present application the linear operation layers of the model can be screened to identify those that do cause abnormal results, and those layers are taken as the first network layers. In this way, the weights of the output channels of exactly the problematic linear operation layers can be scaled in a targeted manner, which reduces the computational load of the model while effectively guaranteeing its stability. Moreover, whether the boundary feature value is the maximum or the minimum feature value, the quantization method provided in the embodiments of the present application can correct abnormal feature values, so that the final bitstream and/or reconstructed image is free of anomalies and the stability of the encoder-decoder network model is improved.
The quantization method for an encoder-decoder network model provided in the embodiments of the present application can greatly enhance the stability with which the quantized model encodes and decodes various images, without increasing the computational intensity of the model or degrading its performance. It can also be used to enhance the stability of quantized models for other low-level tasks (for example super-resolution and denoising).
It should be noted that when the quantization apparatus for an encoder-decoder network model provided in the above embodiment performs quantization, the division into the functional modules described above is given only by way of example; in practical applications, the functions may be allocated to different functional modules as required, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the quantization apparatus provided in the above embodiment and the embodiments of the quantization method for an encoder-decoder network model belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
In the above embodiments, implementation may be entirely or partially in software, hardware, firmware or any combination thereof. When software is used, implementation may be entirely or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (for example coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless (for example infrared, radio or microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example a floppy disk, hard disk or magnetic tape), an optical medium (for example a digital versatile disc (DVD)) or a semiconductor medium (for example a solid state disk (SSD)). It is worth noting that the computer-readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words a non-transitory storage medium.
It should be understood that "multiple" herein means two or more. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, to describe the technical solutions of the embodiments clearly, words such as "first" and "second" are used to distinguish identical or similar items with essentially the same functions and effects. Those skilled in the art will understand that "first" and "second" impose no limitation on quantity or execution order, and do not require the items so labelled to be different.
It should be noted that the information (including but not limited to user device information and user personal information), data (including but not limited to data used for analysis, stored data and displayed data) and signals involved in the embodiments of the present application are all authorized by the user or fully authorized by all parties, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions. For example, the feature maps output by the first network layer and the feature maps output by the third network layer in the embodiments of the present application are obtained with full authorization.
The above are embodiments provided for the present application and are not intended to limit it; any modification, equivalent substitution, improvement or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.
Claims (19)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310153239.1A CN118468945A (en) | 2023-02-08 | 2023-02-08 | Quantization method and related device for coding and decoding network model |
| CN202310153239.1 | 2023-02-08 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024164590A1 true WO2024164590A1 (en) | 2024-08-15 |
Family
ID=92165626
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/129969 Ceased WO2024164590A1 (en) | 2023-02-08 | 2023-11-06 | Quantization method for encoder-decoder network model and related apparatus |
Country Status (3)
| Country | Link |
|---|---|
| CN (1) | CN118468945A (en) |
| TW (1) | TW202433928A (en) |
| WO (1) | WO2024164590A1 (en) |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110619392A (en) * | 2019-09-19 | 2019-12-27 | 哈尔滨工业大学(威海) | Deep neural network compression method for embedded mobile equipment |
| CN111105017A (en) * | 2019-12-24 | 2020-05-05 | 北京旷视科技有限公司 | Neural network quantization method and device and electronic equipment |
| CN111105007A (en) * | 2018-10-26 | 2020-05-05 | 中国科学院半导体研究所 | A Compression Acceleration Method for Deep Convolutional Neural Networks for Object Detection |
| CN112733964A (en) * | 2021-02-01 | 2021-04-30 | 西安交通大学 | Convolutional neural network quantification method for reinforcement learning automatic perception weight distribution |
| CN114580280A (en) * | 2022-03-02 | 2022-06-03 | 北京市商汤科技开发有限公司 | Model quantization method, device, apparatus, computer program and storage medium |
| CN114707637A (en) * | 2022-03-18 | 2022-07-05 | 恒烁半导体(合肥)股份有限公司 | Neural network quantitative deployment method, system and storage medium |
| CN114970853A (en) * | 2022-03-16 | 2022-08-30 | 华南理工大学 | Cross-range quantization convolutional neural network compression method |
| US20220300784A1 (en) * | 2021-03-19 | 2022-09-22 | Fujitsu Limited | Computer-readable recording medium having stored therein machine-learning program, method for machine learning, and calculating machine |
2023
- 2023-02-08: CN — patent application CN202310153239.1A filed (published as CN118468945A), status: active, pending
- 2023-11-06: WO — international application PCT/CN2023/129969 filed (published as WO2024164590A1), status: not active, ceased

2024
- 2024-01-30: TW — patent application TW113103449A filed (published as TW202433928A), status: unknown
Also Published As
| Publication number | Publication date |
|---|---|
| TW202433928A (en) | 2024-08-16 |
| CN118468945A (en) | 2024-08-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11423310B2 (en) | Deep learning based adaptive arithmetic coding and codelength regularization | |
| US11783511B2 (en) | Channel-wise autoregressive entropy models for image compression | |
| US20200145692A1 (en) | Video processing method and apparatus | |
| WO2022194137A1 (en) | Video image encoding method, video image decoding method and related devices | |
| CN110072119B (en) | Content-aware video self-adaptive transmission method based on deep learning network | |
| US20230106778A1 (en) | Quantization for Neural Networks | |
| CN110169068A (en) | Dc coefficient sign coding scheme | |
| WO2017151877A1 (en) | Apparatus and method to improve image or video quality or encoding performance by enhancing discrete cosine transform coefficients | |
| CN107018416A (en) | Adaptive tile data size encoding for video and image compression | |
| WO2022253088A1 (en) | Encoding method and apparatus, decoding method and apparatus, device, storage medium, and computer program and product | |
| US20250024064A1 (en) | Encoding Method and Apparatus, Storage Medium, and Computer Program Product | |
| WO2021063218A1 (en) | Image signal processing method and apparatus | |
| WO2023169303A1 (en) | Encoding and decoding method and apparatus, device, storage medium, and computer program product | |
| WO2024164590A1 (en) | Quantization method for encoder-decoder network model and related apparatus | |
| CN119484883A (en) | Video narrowband transmission method, device and electronic device under bandwidth-limited conditions | |
| WO2024193426A1 (en) | Feature map coding method and apparatus | |
| WO2023082773A1 (en) | Video encoding method and apparatus, video decoding method and apparatus, and device, storage medium and computer program | |
| US20230007260A1 (en) | Probability Estimation for Video Coding | |
| WO2024164591A1 (en) | Coding method, apparatus and device, decoding method, apparatus and device, storage medium, and computer program | |
| WO2025087214A1 (en) | Decoder, encoder, image encoding method, image decoding method and storage medium | |
| US8270746B2 (en) | Image compression method and device thereof | |
| WO2025011493A1 (en) | Decoding network model, quantization method for decoding network model and related apparatus | |
| EP4591577A1 (en) | Improved entropy bypass coding | |
| CN120201201A (en) | Coding and decoding method, image classification method and related device | |
| KR20230089753A (en) | Method and apparatus for live streaming |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23920782; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |