EP3818502A1 - A method, an apparatus and a computer program product for image compression - Google Patents
A method, an apparatus and a computer program product for image compression
- Publication number
- EP3818502A1 (application number EP19831508.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- block
- neural
- network
- encoder network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0499—Feedforward networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/192—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding the adaptation method, adaptation tool or adaptation type being iterative or recursive
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/189—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
- H04N19/196—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
Definitions
- the present solution generally relates to image or video compression.
- the solution relates to neural image (or video) compression.
- Semantic information is represented by metadata which may express the type of scene, the occurrence of a specific action/activity, the presence of a specific object, etc. Such semantic information can be obtained by analyzing the media.
- neural networks have been adapted to take advantage of visual spatial attention, i.e. the manner in which humans perceive a new environment by focusing first on a limited spatial region of the scene for a short moment and then repeating this for a few more spatial regions in the scene, in order to obtain an understanding of the semantics of the scene.
- a method comprising receiving input data divided into a plurality of blocks; overfitting a first neural encoder network for a first block of the data based on a baseline encoder network; encoding the first block by the first overfitted neural encoder network; overfitting a second neural encoder network for at least one subsequent block of the data based on a combination of neural encoder networks used for previous blocks and/or the baseline encoder network; and encoding the at least one subsequent block by the second overfitted neural encoder network.
- a method for a neural decoder network comprising receiving a block residual defining a difference between an original block of data and a decoded block of the data; based on the residual, recovering the original block to be used as ground-truth data; and overfitting the neural decoder network based on the ground-truth data.
- an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to receive input data divided into a plurality of blocks; overfit a first neural encoder network for a first block of the data based on a baseline encoder network; encode the first block by the first overfitted neural encoder network; overfit a second neural encoder network for at least one subsequent block of data based on a combination of neural networks used for previous blocks and/or the baseline encoder network; and encode the at least one subsequent block by the second overfitted neural encoder network.
- the apparatus is further configured to determine which one of the overfitted neural encoder networks performs the best; and select that overfitted neural encoder network for a current block.
- the performance is determined according to one or both of the following aspects: a reconstruction quality or a bitrate.
- the data comprises image data, video data, or audio data.
- an apparatus comprising at least one processor, memory including computer program code, the memory and the computer program code configured to, with the at least one processor, cause the apparatus to receive a block residual defining a difference between an original block of data and a decoded block of the data; based on the residual, recover the original block to be used as ground-truth data; and overfit the neural decoder network based on the ground-truth data.
- the apparatus is further configured to receive a weight residual from a transmitter, the weight residual defining the difference between the weights of the decoder before and after an overfitting.
- the data comprises image data, video data, or audio data.
- a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to receive input data divided into a plurality of blocks; overfit a first neural encoder network for a first block of the data based on a baseline encoder network; encode the first block by the first overfitted neural encoder network; overfit a second neural encoder network for at least one subsequent block of data based on a combination of neural networks used for previous blocks and/or the baseline encoder network; and encode the at least one subsequent block by the second overfitted neural encoder network.
- a computer program product comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to receive a block residual defining a difference between an original block of data and a decoded block of the data; based on the residual, recover the original block to be used as ground-truth data; and overfit the neural decoder network based on the ground-truth data.
- the computer program product is embodied on a non-transitory computer readable medium.
- Fig. 1 shows an example of a computer system according to an embodiment
- Fig. 2 shows an embodiment for a training process of a neural auto-encoder
- Fig. 3 shows a sequential encoder overfitting according to a first embodiment
- Fig. 4 shows another example of a sequential encoder overfitting according to a first embodiment
- Fig. 5 shows a decoder overfitting according to a second embodiment
- Fig. 6 is a flowchart illustrating a method according to an embodiment
- Fig. 7 is a flowchart illustrating a method according to another embodiment.
- the several embodiments enable using neural networks for image compression/decompression. It is to be noted, however, that the embodiments are not limited to compression/decompression of images, but apply to compression/decompression of video as well. Therefore, any time the term “image” is used in the following description, it is appreciated that the term also covers “video” or “video frame”.
- the present embodiments are also applicable with other media content, such as audio, speech, etc.
- the data block corresponding to the image block concept in audio signals may be an audio frame.
- Spatially neighboring image blocks may correspond to temporally neighboring audio frames.
- similar concepts of image blocks can be used when considering audio spectrogram images.
- Figure 1 shows a computer system suitable to be used in data processing.
- the generalized structure of the computer system will be explained in accordance with the functional blocks of the system.
- Several functionalities can be carried out with a single physical device, e.g. all calculation procedures can be performed in a single processor if desired.
- a data processing system of an apparatus comprises a main processing unit 100, a memory 102, a storage device 104, an input device 106, an output device 108, and a graphics subsystem 110, which are all connected to each other via a data bus 112.
- the main processing unit 100 is a processing unit comprising processor circuitry and arranged to process data within the data processing system.
- the memory 102, the storage device 104, the input device 106, and the output device 108 may include conventional components as recognized by those skilled in the art.
- the memory 102 and storage device 104 store data within the data processing system 100.
- Computer program code resides in the memory 102 for implementing, for example, computer vision process.
- the input device 106 inputs data into the system while the output device 108 receives data from the data processing system and forwards the data, for example to a display, a data transmitter, or other output device.
- the data bus 112 is a conventional data bus and while shown as a single line it may be any combination of the following: a processor bus, a PCI bus, a graphical bus, an ISA bus. Accordingly, a skilled person readily recognizes that the apparatus may be any data processing device, such as a computer device, a personal computer, a server computer, a mobile phone, a smart phone or an Internet access device, for example an Internet tablet computer.
- the elements of data processing may be implemented as a software component residing on one device or distributed across several devices, as mentioned above, for example so that the devices form a so-called cloud.
- a neural network is a computation graph comprising several layers of computation. Each layer comprises one or more units, where each unit performs an elementary computation. A unit is connected to one or more other units, and the connection may have associated a weight. The weight may be used for scaling the signal passing through the associated connection. Weights are usually learnable parameters, i.e., values which can be learned from training data. There may be other learnable parameters, such as those of batch-normalization layers.
- Two examples of architecture for neural networks are feed-forward and recurrent architectures.
- Feed-forward neural networks are such that there is no feedback loop: each layer takes input from one or more of preceding layers and provides its output as the input for one or more of the subsequent layers. Also, units inside certain layers take input from units in one or more of preceding layers, and provide output to one or more of the following layers.
- Initial layers extract semantically low-level features such as edges and texture in images, and intermediate and final layers extract more high-level features.
- after the feature extraction layers there may be one or more layers performing a certain task, such as classification, semantic segmentation, object detection, denoising, style transfer, super-resolution, etc.
- Neural networks may be utilized in an ever-increasing number of applications for many different types of device, such as mobile phones. Examples include image and video analysis and processing, social media data analysis, device usage analysis, etc.
- neural networks are able to learn properties from input data, either in a supervised way or in an unsupervised way. Such learning is a result of a training algorithm, or of a meta-level neural network providing the training signal.
- the training algorithm changes some properties of the neural network so that its output is as close as possible to a desired output. For example, in the case of classification of objects in images, the output of the neural network can be used to derive a class or category index which indicates the class or category that the object in the input image belongs to. Training may happen by minimizing or decreasing the output’s error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, etc.
- training is an iterative process, where at each iteration the algorithm modifies the weights of the neural net to make a gradual improvement of the network’s output, i.e. to gradually decrease the loss.
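- As an illustration of such a training iteration, the following is a minimal, hypothetical sketch written with PyTorch; the model, the data and the hyper-parameters are not taken from the embodiments and are chosen only to show one weight update that decreases the loss.

```python
import torch
import torch.nn as nn

# Illustrative network and data; any differentiable model and loss would do.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                 # mean squared error, one example of a loss

x = torch.randn(8, 16)                 # a batch of training inputs
target = x.clone()                     # desired output (here: reconstruct the input)

for iteration in range(100):           # each iteration gradually decreases the loss
    optimizer.zero_grad()
    loss = loss_fn(model(x), target)
    loss.backward()                    # gradients of the loss w.r.t. the weights
    optimizer.step()                   # small modification of the learnable parameters
```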
- the terms “neural network”, “neural net” and “network” are used interchangeably, and also the “weights” of a neural network may be referred to as “learnable parameters” or “parameters”.
- a neural network has two main modes of operation: training phase and testing phase.
- the training phase is the development phase, where the network learns to perform the final task.
- Training may involve iteratively updating the weights between units.
- Training a neural network is an optimization process, where the goal of the optimization or training process is to make the neural network learn the properties of the data distribution from a limited training dataset. In other words, the goal is to use a limited training dataset in order to learn to generalize to previously unseen data, i.e., data which was not used for training the neural network. This may be referred to as generalization.
- data may be split into at least two sets, the training set and the validation set.
- the training set is used for training the network, i.e., to modify its learnable parameters in order to minimize the loss.
- the validation set is used for checking the performance of the network on data which was not used to minimize the loss, as an indication of the final performance of the neural network.
- the errors on the training set and on the validation set are monitored during the training process to understand the following things:
- the training set error should decrease, otherwise the neural network is in the region of underfitting.
- the neural network is learning to generalize - in this case, the validation set error also needs to decrease and not be too much higher than the training set error. If the training set error is low, but the validation set error is much higher than the training set error, or it does not decrease, or it even increases, the neural network is in the regime of overfitting (i.e. optimization). This means that the neural network has just memorized the training set’s properties and performs well only on that set, but performs poorly on a set not used for tuning its parameters.
- Recently, neural image compression and decompression systems have been based on neural auto-encoders, or simply auto-encoders.
- An auto-encoder may comprise two neural networks, one of which is the neural encoder (also referred to as “encoder” in this description for simplicity) and the other is the neural decoder (also referred to as “decoder” in this description for simplicity).
- the encoder is configured to map the input data (such as an image, for example) to a representation which is more easily or more efficiently compressed.
- the decoder gets the compressed version of the data and is configured to de-compress it, thus reconstructing the data.
- the two networks in the auto-encoder may be trained simultaneously, in an end-to- end fashion.
- the training may be performed by using at least a reconstruction loss, which trains the auto-encoder to reconstruct the image correctly.
- An example of reconstruction loss is the mean squared error (MSE).
- an additional loss on the output of the encoder may be used.
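- A hedged sketch of such a training objective is given below: an auto-encoder trained with a reconstruction loss (MSE) plus an additional loss on the encoder output. The L1 penalty and the trade-off weight used here are only one possible choice of additional loss, not something prescribed by the embodiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoder/decoder; the architectures are illustrative only.
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 16, 3, stride=2, padding=1))
decoder = nn.Sequential(nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1))
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

image = torch.rand(1, 3, 64, 64)                # a training image with values in [0, 1]
code = encoder(image)                           # representation output by the encoder
reconstruction = decoder(code)

reconstruction_loss = F.mse_loss(reconstruction, image)
code_loss = code.abs().mean()                   # additional loss on the output of the encoder
loss = reconstruction_loss + 0.01 * code_loss   # 0.01 is an arbitrary trade-off weight

optimizer.zero_grad()
loss.backward()
optimizer.step()
```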
- Fig. 2 illustrates an example of a neural auto-encoder training process for image compression.
- the output of the encoder may be binarized and entropy-coded.
- Binarization is a non-differentiable operation, so it cannot be used during training of the encoder, because it is not possible to obtain useful gradients for the encoder. However, even if there is a binarization operation, it is still possible to train the decoder.
- One common training strategy is to have two alternate training steps: in one training step no binarization is used (optionally, a differentiable approximation of the binarization may be used instead), and both encoder and decoder are trained; in the second training step, binarization is used and only the decoder is trained.
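- The following sketch illustrates the two alternating training steps under the assumption of a simple sign-based binarizer; the actual binarization and any differentiable approximation used in an implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 32), nn.Tanh())
decoder = nn.Sequential(nn.Linear(32, 64))
opt_both = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)
opt_dec = torch.optim.Adam(decoder.parameters(), lr=1e-4)

x = torch.rand(16, 64)                     # toy training batch

for step in range(200):
    if step % 2 == 0:
        # Step 1: no binarization, both encoder and decoder are trained.
        code = encoder(x)
        loss = F.mse_loss(decoder(code), x)
        opt_both.zero_grad()
        loss.backward()
        opt_both.step()
    else:
        # Step 2: hard (non-differentiable) binarization, only the decoder is trained.
        with torch.no_grad():
            code = torch.sign(encoder(x))  # gradients cannot flow through this operation
        loss = F.mse_loss(decoder(code), x)
        opt_dec.zero_grad()
        loss.backward()
        opt_dec.step()
```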
- the training may be performed on a big dataset, which is a good representative of the data that may be used at test time. This way, the network is trained to generalize on unseen data (but which is still sufficiently similar to the training data). However, the performance of a neural network can be drastically improved if the network is optimized on the target input data on which it will be used. In the present disclosure this specific optimization to one or more test input data is referred to as “overfitting” or “fine-tuning”. Overfitting may refer to optimizing or training (e.g., updating the learnable parameters of) a neural network on a certain test datum or several test data, as opposed to optimizing on a general set of training data.
- the test data is the data on which the neural network is applied when it is utilized for its purpose (for example, the test data may include an image that is to be compressed). Overfitting is for example beneficial when there is a sudden domain shift in the data, and especially if the domain shift happens continuously and gradually.
- a data domain shift means that the data domain or context or type changes, for example a camera may start to capture data from a different-looking scene, where the difference may be in the lighting, in the type or amount of objects, in the type or amount of motion, in the type or amount of texture, etc.
- a neural network which is trained on a different data domain than the one on which it is run may perform sub-optimally.
- the present embodiments relate to neural image (or video) compression.
- This may include using neural networks for compressing and/or de-compressing images (or other data), with high compression gain and a high reconstruction quality.
- the compression gain can be measured by the number of bits of the encoded or compressed representation.
- the reconstruction quality can be measured by a certain metric which compares the original image and the de-compressed or reconstructed image.
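- For illustration, the two measurements can be sketched as follows, assuming 8-bit images represented as arrays with values in [0, 255] and PSNR as the comparison metric (the metric used later in this description); other metrics are equally possible.

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio between the original and the reconstructed image."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_value ** 2) / mse)

def compressed_size_bits(bitstream: bytes) -> int:
    """Compression gain is judged by the number of bits of the encoded representation."""
    return 8 * len(bitstream)
```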
- the various embodiments provide a set of techniques for optimizing image compression auto-encoders on the specific input data (“overfitting”), and in a way which improves the encoding speed and the reconstruction quality, or alternatively improves the encoding speed and the compression gain.
- an auto-encoder is used as an example.
- the auto-encoder is optimized (i.e., overfitted) to a specific input data on which it is used. This optimization may be performed at network utilization time, not at training time.
- the encoding is sped up via sequential neural encoder network overfitting.
- the decoding quality is improved via neural decoder network overfitting.
- the neural decoder network should not be optimized for a certain input data because sending the data to the decoder for performing the optimization may require too many bits.
- a strategy allowing trade-offs between bitrates and decoding quality is introduced. Overfitting (i.e. optimization) may be more effective if it is performed on an image-block level. So, there will be as many optimized networks as there are blocks in the image.
- a baseline network is assumed to be available, which has been trained on a large dataset. This baseline network is overfitted to the first block of the image. For subsequent blocks, the overfitting is performed by starting from the neural network overfitted on the neighboring blocks. This will speed up the overfitting process as it may require many fewer training iterations than if started from the baseline network.
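- A hedged sketch of this sequential encoder overfitting is given below: the first block is overfitted starting from the baseline encoder, and each subsequent block starts from the encoder already overfitted on a neighboring block, with the decoder kept fixed. The block iteration order, the fixed decoder and the iteration counts are assumptions made only for illustration.

```python
import copy
import torch
import torch.nn.functional as F

def overfit_encoder(encoder, decoder, block, iterations, lr=1e-4):
    """Run a few training iterations of the encoder on a single block; the decoder is not updated."""
    optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(iterations):
        optimizer.zero_grad()
        loss = F.mse_loss(decoder(encoder(block)), block)
        loss.backward()
        optimizer.step()
    return encoder

def encode_image_blocks(blocks, baseline_encoder, decoder):
    overfitted = []
    for i, block in enumerate(blocks):
        if i == 0:
            start = copy.deepcopy(baseline_encoder)   # first block: start from the baseline
            iterations = 500                          # more iterations when starting from the baseline
        else:
            start = copy.deepcopy(overfitted[i - 1])  # subsequent blocks: start from a neighboring block's encoder
            iterations = 50                           # fewer iterations are typically needed
        overfitted.append(overfit_encoder(start, decoder, block, iterations))
    return overfitted
```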
- in some situations the baseline network may be a better starting neural network, thus multiple evaluation strategies are introduced for selecting the starting neural network.
- An initial network to be overfitted can be one of the networks already overfitted for neighboring blocks, without combining their weights. So, each of the previously-overfitted networks may be overfitted on the current block and then evaluated to determine the best performing network.
- the multiple weight versions obtained at different training iterations during the previous overfitting process may be stored (e.g., every 100 iterations). Each of such weight versions may be referred to as an intermediate version of a neural network.
- overfitting on the current block may be done on a plurality of network versions previously-overfitted on the at least one neighboring block. Finally, a comparison of the plurality of overfitted networks may be performed.
- This comparison may include the network(s) overfitted from the baseline, the network(s) overfitted from previously-overfitted network(s) on neighboring blocks, the network(s) overfitted from intermediate version(s) of previously-overfitted network(s) on neighboring blocks, and network(s) overfitted from a combination of networks previously-overfitted on neighboring blocks.
- the neural decoder network overfitting process may be performed at the encoder side, where the neural decoder network is overfitted to one block of every N blocks. If this overfitting results in a much improved PSNR (Peak Signal-to-Noise Ratio) for the considered block and for its neighbors compared to the bitrate increase caused by sending the necessary additional data to the decoder’s side, then the decoder’s side also uses the overfitted neural network.
- neural decoder network overfitting may allow using a lower-dimensional auto-encoder so that the PSNR is similar to the one obtained by a higher-dimensional auto-encoder but with a lower bitrate, thus resulting in higher compression efficiency.
- the transmitter may comprise the neural encoder network
- the receiver may comprise the neural decoder network.
- the encoded data may be saved on the same device from which it will be decoded, or it may be saved on another device which will be used for moving the data to another memory device from which the decoder will decode the encoded data.
- the auto-encoder has been trained on a large collection of images, such as ImageNet, Places 2, or similar datasets.
- This pre-trained auto-encoder is referred to in this disclosure as “baseline auto-encoder” or “baseline network” or simply as “baseline”.
- the encoder and decoder of the autoencoder will be referred to as the “baseline encoder” and “baseline decoder” respectively, or simply as “baselines”, when it is clear that both encoder and decoder are being referred to.
- the training of the baseline network is assumed to have been performed for the task of image (or other specific data) compression, thus by using at least one reconstruction loss and at least one compression loss.
- the baseline network will be the starting point for overfitting either the encoder or the decoder or both.
- the baseline network is considered to be a neural network which is able to compress and decompress well any piece of data which is not too different from the training data.
- it is a neural network which is able to generalize. Although having such generalization characteristics is beneficial, the present embodiments are not restricted to this requirement.
- a neural network can be optimized (i.e. “overfitted” or “fine-tuned”) on a specific input data.
- optimization operation is a training operation, thus comprising one or more training iterations, where the weights of the neural network are changed in order to improve the performance of the neural network on the input data.
- the neural network may deviate from its generalization capabilities and will become instead specific or fine-tuned on the input data on which it was optimized.
- This optimization operation may be performed at inference time and not at training time. I.e., it may be performed during encoding or decoding the data.
- the overfitting may be performed on a block-level. Therefore, an image can be divided into blocks and a neural network will be fine-tuned on at least one block. According to an embodiment the blocks are non-overlapping.
- a network may be optimized on at least one block and at least one frame, or alternatively it may be optimized on at least one whole frame.
- evaluation is performed by overfitting the candidate neural networks on the current block, and then choosing the best neural network.
- the transmitter receives an input image 300 (or other data) which needs to be encoded.
- the transmitter has at least one baseline encoder network 310 (i.e. encoder of the pretrained auto-encoder).
- the transmitter may divide the image 300 into blocks (1, 2, 3, 4, ..., 12, ...), if the image has not already been divided. After that, the transmitter overfits a neural encoder network 305 to each block of the image.
- the transmitter may decide to overfit a neural encoder network to a subset of all blocks, and to use the neural network overfitted on neighboring blocks for encoding the blocks on which no overfitting was performed. However, for the sake of simplicity, in an example of this embodiment, the transmitter has overfitted one neural encoder network for each block.
- For overfitting on the first block (which may be anywhere in the image but is considered here to be the top-left-most block, with number 1), the only available neural encoder network is the baseline encoder 310, so the overfitting will start from the baseline encoder 310. A copy of the original baseline encoder may be made and kept at the transmitter side. In general, the overfitting of the first block starts from a baseline encoder.
- the baseline encoder may be determined to be an encoder corresponding to a block of a previous image or video frame having same or nearby (e.g. adjacent) location with respect to the first block.
- the baseline encoder may be also determined based on similarity of the first block of the current image or video frame and a block of a previous image or video frame.
- a neural network is defined by its topology or architecture (e.g., number and type of layers), and by its weights. It is assumed in this example that only the weights of the neural encoder network are changed during the optimization processes, but the topology may be changed as well, for example based on type of content in the relevant block of image.
- the neural network can be characterized or represented by a point in the weight space, where each dimension of this space is a weight of the neural network.
- the baseline network may be considered to be a point in weight space which is relatively close to the optimal points for all images, but not too close to any of those optimal points. By the optimization operation, the optimized neural network gets closer in weight space to the optimal neural network for the data on which it was optimized.
- the overfitting for the current block 301 can be started from a combination of the overfitted neural networks on those neighboring blocks 302.
- the combination may be an average of the weights 306, or any suitable neural network combining method.
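- As one possible combining method, the weight averaging mentioned above may be sketched as follows; it assumes that all the overfitted encoders share the same architecture, so that their weights can be averaged element-wise.

```python
import copy
import torch

def average_encoders(encoders):
    """Return a new encoder whose weights are the element-wise mean of the given encoders' weights."""
    combined = copy.deepcopy(encoders[0])
    reference = combined.state_dict()
    averaged = {}
    for name, tensor in reference.items():
        stacked = torch.stack([enc.state_dict()[name].float() for enc in encoders])
        averaged[name] = stacked.mean(dim=0).to(tensor.dtype)
    combined.load_state_dict(averaged)
    return combined
```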
- the transmitter may apply an evaluation phase where the best neural network for the start is selected. This can be made for example by running parallel overfitting sessions from different neural networks used as a start, such as from the “neighboring overfitted neural networks” (the neural networks overfitted on neighboring blocks), from the baseline network, and optionally from any other previously-overfitted neural network.
- Fig. 4 shows an alternative strategy for the first embodiment, wherein the evaluation is performed by running candidate starting neural networks on the current block 401, and the best performing neural network is then used as the starting neural network for being overfitted to the current block 401.
- selecting the best neural network for the current block may include running (i.e. only the inference stage, instead of overfitting) the candidate neural networks on the current block and determining which neural network performs best without any optimization on the current block.
- the best neural network may be selected, and used as a reference encoder to overfit on the current block.
- the motivation is that if a neural network is already performing well on the current block 401 , it is likely to be close in weight space to the optimal point for the current block 401 .
- This strategy has the advantage of avoiding having multiple overfitting sessions for the current block 401, as only the inference phase is run on all candidate neural networks and then only one neural network is optimized.
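- A minimal sketch of this inference-only evaluation is given below: every candidate network is simply run on the current block, the best performing one is selected, and only that network is subsequently overfitted. The reconstruction-error criterion is an assumption; any quality or bitrate measure could be used instead.

```python
import torch
import torch.nn.functional as F

def select_starting_encoder(candidates, decoder, block):
    """Return the candidate encoder that already reconstructs the current block best (no training)."""
    best_encoder, best_error = None, float("inf")
    with torch.no_grad():                      # inference only; no optimization on the current block
        for encoder in candidates:
            error = F.mse_loss(decoder(encoder(block)), block).item()
            if error < best_error:
                best_encoder, best_error = encoder, error
    return best_encoder
```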
- a further additional neural network to be evaluated may be a neural network overfitted on the current block but on the previous frame, or on neighboring blocks on the previous frame, or a combination thereof. Furthermore, neural networks from multiple previous frames may be considered too.
- the original decoder may still be used at inference stage because during overfitting the original decoder was used and its weights were not modified. This enables optimizing the encoder during data transmission without a need to send updated weights of the decoder to the receiver.
- Overfitting the decoder means that the neural decoder network is further optimized on the current block so that the decoding or reconstruction of the encoded block is improved.
- Fig. 5 illustrates an example of decoder overfitting.
- the transmitter 510 overfits the decoder 512 by using the original block as ground-truth, in order to obtain the overfitted decoder 514.
- the overfitted decoder 514 will be evaluated and compared to other decoders which are available at receiver’s 520 side, in order to decide if it is worth using such overfitted decoder 514 with respect to the bitrate increase of sending the needed additional information. This evaluation is done for the current block and N subsequent blocks.
- the transmitter 510 computes the block residual and sends it to the receiver 520.
- the receiver 520 uses the block residual to recover the high quality block and uses it as ground-truth for performing the overfitting. It is realized that the overfitting is done at the receiver (decoder) device based on the residual, instead of just receiving the overfitted version of the neural network from the transmitter. This avoids sending the updated decoder weights to receiver 520 and reduces the overhead to the block residuals.
- Training a neural network may involve using ground-truth data in order to compute the loss or error value, which is then differentiated with respect to the network’s weights, and the obtained gradients are used for updating the weights’ value.
- the ground-truth for training a decoder is the desired reconstructed blocks, which usually are the original blocks which are input to the encoder. However, the original blocks are available only at the transmitter’s side. Therefore, two alternatives are possible:
- the transmitter sends the block residual to the receiver, i.e., the difference between the original block and the decoded block.
- the receiver can recover the original block and use it as ground-truth for performing the overfitting.
- the additional signaling associated with this option may include informing the receiver that the transmitted data is the block residual for a certain block or a certain image, using unique IDs for both, for example a block identifier and/or an image identifier or a frame number.
- Fig. 5 illustrates this.
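- A hedged sketch of this first option at the receiver’s side: the received block residual is added to the locally decoded block to recover the original block, which is then used as ground-truth for overfitting the decoder. The additive residual definition, the loss and the iteration count are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def overfit_decoder_from_residual(decoder, encoded_block, decoded_block, block_residual,
                                  iterations=100, lr=1e-5):
    # Recover the original block: original = decoded + (original - decoded).
    ground_truth = decoded_block + block_residual
    optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)
    for _ in range(iterations):
        optimizer.zero_grad()
        loss = F.mse_loss(decoder(encoded_block), ground_truth)   # ground-truth-driven overfitting
        loss.backward()
        optimizer.step()
    return decoder
```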
- the transmitter performs the overfitting of the decoder, and sends to the receiver the decoder’s weight residual, i.e., the difference between the weights of the decoder before and after the overfitting.
- the additional signaling associated with this option may include informing the receiver that the transmitted data is a weights’ residual, and each single weight residual value may be associated with an identifier of the weight to which it applies.
- one may send the weight residuals for all weights, where the order of the weight residuals implicitly identifies which weights they need to be applied to, and where many weight residuals may be zero.
- Other suitable ways of associating the weight residuals to the correct weights may be used.
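- A minimal sketch of this second option is shown below: the transmitter computes the weight residuals in the fixed state-dict order (so the weight identity remains implicit), and the receiver adds them to its own copy of the decoder. The dictionary-based representation is an assumption made for readability.

```python
import torch

def compute_weight_residuals(decoder_before, decoder_after):
    """Transmitter side: residual = weights after overfitting minus weights before overfitting."""
    before, after = decoder_before.state_dict(), decoder_after.state_dict()
    return {name: after[name] - before[name] for name in before}

def apply_weight_residuals(decoder, residuals):
    """Receiver side: add the received residuals to the current decoder weights."""
    state = decoder.state_dict()
    for name, residual in residuals.items():
        state[name] = state[name] + residual
    decoder.load_state_dict(state)
    return decoder
```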
- the transmitter may choose to consider both of the above two options initially. Then, it may compute the bitrate increase separately for each option and select the option with the minimal bitrate increase. However, in some cases the bitrate increase for allowing the receiver to run an overfitted decoder for the current block may not be worth the reconstruction quality increase. Thus, since subsequent nearby blocks are likely to benefit also from the decoder overfitted to the current block (due to spatial correlation/redundancy in images), the transmitter may take into account the reconstruction quality (e.g., PSNR) increase for the current block and for the subsequent N blocks. If the quality increase for those blocks is worth the bitrate increase, then the transmitter may send the additional data (either the block residual or the weights residual) to the receiver.
- the transmitter may take into account the baseline and other decoders previously overfitted and which are already available at receiver’s side. If the baseline decoder performs well enough (especially compared to the bitrate increase for using an overfitted decoder), the transmitter may not send any additional data. If one of the previously overfitted decoders which are available at receiver’s side performs well enough, the transmitter will signal to the receiver to use that overfitted decoder (e.g., by using a unique decoder’s ID).
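- Purely as an illustration of this decision logic (the embodiments do not prescribe a formula), the transmitter’s choice could be sketched as follows, where each option carries an estimated PSNR gain over the current block and the next N blocks and the number of extra bits it would cost; the threshold is a hypothetical tuning parameter.

```python
def choose_decoder_option(options, threshold_db_per_kbit=0.5):
    """options: dicts with 'name', 'psnr_gain_db' (over current + N blocks) and 'extra_bits'
    (block residual, weight residual, or 0 when reusing a decoder already at the receiver)."""
    best = {"name": "baseline decoder", "psnr_gain_db": 0.0, "extra_bits": 0}
    for option in options:
        extra_kbits = option["extra_bits"] / 1000.0
        gain_per_kbit = option["psnr_gain_db"] / extra_kbits if extra_kbits > 0 else float("inf")
        if option["psnr_gain_db"] > best["psnr_gain_db"] and gain_per_kbit >= threshold_db_per_kbit:
            best = option
    return best   # the transmitter then signals the chosen decoder or sends the additional data
```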
- the overfitting starts from the baseline, from the neural networks overfitted on previous blocks, or from neural networks overfitted on previous frames, or a combination thereof, similarly to what was done in the first embodiment.
- the overfitted decoder can be used for bitrate saving.
- the PSNR gain brought by a better decoder can be exploited for achieving a better compression gain.
- This can be implemented for example by having multiple auto-encoders, one for each encoding dimension (e.g., 64 bits and 216 bits). If overfitting the decoders makes it possible to obtain the same or better PSNR with a lower-dimension auto-encoder, and the saved bits are fewer than the overhead bits needed by the block residual or by the weights residual, then there is a compression gain.
- Fig. 6 is a flowchart illustrating a method according to an embodiment.
- a method comprises receiving 601 input data divided into a plurality of blocks; overfitting 602 a first neural encoder network for a first block of the data based on a baseline encoder network; encoding 603 the first block by the first overfitted neural encoder network; overfitting 604 a second neural encoder network for at least one subsequent block of the data based on a combination of neural networks used for previous blocks and/or the baseline encoder network; and encoding 605 the at least one subsequent block by the second overfitted neural encoder network.
- An apparatus comprises means for receiving input data divided into a plurality of blocks; means for overfitting a first neural encoder network for a first block of the data based on a baseline encoder network; means for encoding the first block by the first overfitted neural encoder network; means for overfitting a second neural encoder network for at least one subsequent block of the data based on a combination of neural encoder networks used for previous blocks and/or the baseline encoder network; and means for encoding the at least one subsequent block by the second overfitted neural encoder network.
- the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
- the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Fig. 6 according to various embodiments.
- FIG. 7 is a flowchart illustrating a method according to another embodiment.
- a method for neural decoder network comprises receiving 701 a block residual defining a difference between an original block of data and a decoded block of the data; based on the residual, recovering 702 the original block to be used as ground- truth data; and overfitting 703 the neural decoder network based on the ground-truth data.
- An apparatus comprises means for receiving a block residual defining a difference between an original block of data and a decoded block of the data; based on the residual, means for recovering the original block to be used as ground-truth data; and means for overfitting the neural decoder network based on the ground-truth data.
- the means comprises at least one processor, and a memory including a computer program code, wherein the processor may further comprise processor circuitry.
- the memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform the method of Fig. 7 according to various embodiments.
- the various embodiments may provide advantages. For example, the various embodiments improve the inference speed and decoding quality, or alternatively inference speed and compression efficiency, for neural image (or video) compression.
- a device may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the device to carry out the features of an embodiment.
- a network device like a server may comprise circuitry and electronics for handling, receiving and transmitting data, computer program code in a memory, and a processor that, when running the computer program code, causes the network device to carry out the features of an embodiment.
- An apparatus may comprise means for performing functions described in the appended claims and throughout the description.
- the computer program code comprises one or more operational characteristics.
- said operational characteristics are defined through configuration by said computer based on the type of said processor, wherein a system is connectable to said processor by a bus, wherein a programmable operational characteristic of the system comprises receiving input data divided into a plurality of blocks; overfitting a first neural encoder network for a first block of the data based on a baseline encoder network; encoding the first block by the first overfitted neural encoder network; overfitting a second neural encoder network for at least one subsequent block of the data based on a combination of neural networks used for previous blocks and/or the baseline encoder network; and encoding the at least one subsequent block by the second overfitted neural encoder network.
- the programmable operational characteristic of the system comprises receiving a block residual defining a difference between an original block of data and a decoded block of the data; based on the residual, recovering the original block to be used as ground-truth data; and overfitting the neural decoder network based on the ground-truth data.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Biomedical Technology (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Description
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| FI20185611 | 2018-07-02 | ||
| PCT/FI2019/050483 WO2020008104A1 (en) | 2018-07-02 | 2019-06-20 | A method, an apparatus and a computer program product for image compression |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| EP3818502A1 true EP3818502A1 (en) | 2021-05-12 |
| EP3818502A4 EP3818502A4 (en) | 2022-06-29 |
Family
ID=69060463
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP19831508.7A Pending EP3818502A4 (en) | 2018-07-02 | 2019-06-20 | A method, an apparatus and a computer program product for image compression |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP3818502A4 (en) |
| WO (1) | WO2020008104A1 (en) |
Families Citing this family (12)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12243281B2 (en) | 2020-03-03 | 2025-03-04 | Telefonaktiebolaget Lm Ericsson (Publ) | System, an arrangement, a computer software module arrangement, a circuitry arrangement and a method for improved image processing utilizing two entities |
| US11915487B2 (en) | 2020-05-05 | 2024-02-27 | Toyota Research Institute, Inc. | System and method for self-supervised depth and ego-motion overfitting |
| WO2021255605A1 (en) * | 2020-06-19 | 2021-12-23 | Nokia Technologies Oy | Apparatus, method and computer program product for optimizing parameters of a compressed representation of a neural network |
| CN116134822A (en) * | 2020-07-21 | 2023-05-16 | 交互数字Vc控股法国有限公司 | Method and apparatus for updating deep neural network based image or video decoder |
| US11651053B2 (en) * | 2020-10-07 | 2023-05-16 | Samsung Electronics Co., Ltd. | Method and apparatus with neural network training and inference |
| CN113035211B (en) * | 2021-03-11 | 2021-11-16 | 马上消费金融股份有限公司 | Audio compression method, audio decompression method and device |
| US12482256B2 (en) | 2021-06-22 | 2025-11-25 | Electronics And Telecommunications Research Institute | Method and apparatus for compression of a task output by machine learning |
| US20230186081A1 (en) * | 2021-12-13 | 2023-06-15 | Tencent America LLC | System, method, and computer program for iterative content adaptive online training in neural image compression |
| US20230186526A1 (en) | 2021-12-13 | 2023-06-15 | Tencent America LLC | System, method, and computer program for content adaptive online training for multiple blocks based on certain patterns |
| US20230306239A1 (en) * | 2022-03-25 | 2023-09-28 | Tencent America LLC | Online training-based encoder tuning in neural image compression |
| US20230316588A1 (en) * | 2022-03-29 | 2023-10-05 | Tencent America LLC | Online training-based encoder tuning with multi model selection in neural image compression |
| CN119316618A (en) * | 2023-07-12 | 2025-01-14 | 杭州海康威视数字技术股份有限公司 | A decoding and encoding method, device and equipment thereof |
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016199330A1 (en) * | 2015-06-12 | 2016-12-15 | パナソニックIpマネジメント株式会社 | Image coding method, image decoding method, image coding device and image decoding device |
| CN107925762B (en) * | 2015-09-03 | 2020-11-27 | 联发科技股份有限公司 | Video codec processing method and device based on neural network |
-
2019
- 2019-06-20 WO PCT/FI2019/050483 patent/WO2020008104A1/en not_active Ceased
- 2019-06-20 EP EP19831508.7A patent/EP3818502A4/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| EP3818502A4 (en) | 2022-06-29 |
| WO2020008104A1 (en) | 2020-01-09 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3818502A1 (en) | A method, an apparatus and a computer program product for image compression | |
| EP4218238B1 (en) | Instance-adaptive image and video compression using machine learning systems | |
| US11657264B2 (en) | Content-specific neural network distribution | |
| EP3276540B1 (en) | Neural network method and apparatus | |
| US10390040B2 (en) | Method, apparatus, and system for deep feature coding and decoding | |
| US12108050B2 (en) | Method, an apparatus and a computer program product for video encoding and video decoding | |
| EP3934254A1 (en) | Encoding and decoding of extracted features for use with machines | |
| WO2021205065A1 (en) | Training a data coding system comprising a feature extractor neural network | |
| EP3938965A1 (en) | An apparatus, a method and a computer program for training a neural network | |
| WO2021205066A1 (en) | Training a data coding system for use with machines | |
| US12142014B2 (en) | Method, an apparatus and a computer program product for video encoding and video decoding | |
| US12483715B2 (en) | Method, an apparatus and a computer program product for video encoding and video decoding | |
| JP2025534966A (en) | Diffusion-Based Data Compression | |
| EP3803712A1 (en) | An apparatus, a method and a computer program for selecting a neural network | |
| US12388999B2 (en) | Method, an apparatus and a computer program product for video encoding and video decoding | |
| US20240185572A1 (en) | Systems and methods for joint optimization training and encoder side downsampling | |
| US20240406424A1 (en) | Systems and methods for video coding for machines using an autoencoder | |
| US20240340391A1 (en) | Intelligent multi-stream video coding for video surveillance | |
| CN117716687A (en) | Implicit image and video compression using machine learning system | |
| US20250225388A1 (en) | A method, an apparatus and a computer program product for machine learning | |
| Paul | Deep learning solutions for video encoding and streaming | |
| WO2025006997A2 (en) | Method, apparatus, and medium for visual data processing | |
| WO2025007083A1 (en) | Systems and method for decoded frame augmentation for video coding for machines | |
| CN119967187A (en) | Video processing method and device | |
| WO2023081091A9 (en) | Systems and methods for motion information transfer from visual to feature domain |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
| 17P | Request for examination filed |
Effective date: 20210202 |
|
| AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
| DAV | Request for validation of the european patent (deleted) | ||
| DAX | Request for extension of the european patent (deleted) | ||
| RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06N 3/08 20060101ALI20220217BHEP Ipc: H04N 19/176 20140101ALI20220217BHEP Ipc: G06N 3/04 20060101ALI20220217BHEP Ipc: H04N 19/196 20140101ALI20220217BHEP Ipc: H04N 19/192 20140101ALI20220217BHEP Ipc: G06T 9/00 20060101AFI20220217BHEP |
|
| A4 | Supplementary search report drawn up and despatched |
Effective date: 20220530 |
|
| RIC1 | Information provided on ipc code assigned before grant |
Ipc: G06N 3/08 20060101ALI20220523BHEP Ipc: H04N 19/176 20140101ALI20220523BHEP Ipc: G06N 3/04 20060101ALI20220523BHEP Ipc: H04N 19/196 20140101ALI20220523BHEP Ipc: H04N 19/192 20140101ALI20220523BHEP Ipc: G06T 9/00 20060101AFI20220523BHEP |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
| 17Q | First examination report despatched |
Effective date: 20240424 |