WO2025133815A1 - Overfitting shared multipliers - Google Patents
- Publication number
- WO2025133815A1 (application PCT/IB2024/062397)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- multipliers
- values
- multiplier
- updated
- updates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
Definitions
- the examples and non-limiting embodiments relate generally to multimedia transport and neural networks, and more particularly, to a method, an apparatus, and a computer program product for overfitting shared multipliers.
- Example 1 An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers; and multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
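As a rough illustration of Example 1, the following Python sketch (PyTorch assumed; variable names such as `group_of` and `shared_values` are illustrative, not taken from the specification) sets C per-channel multipliers so that they take only K < C distinct values, and multiplies the C channels of a layer's output tensor:

```python
# Minimal sketch of Example 1: C multipliers drawn from a set of K < C values.
import torch

C, K = 64, 8                              # channels, number of distinct values
group_of = torch.arange(C) * K // C       # assumed grouping: consecutive channels share a group
shared_values = torch.randn(K)            # the K shared multiplier values (to be overfitted)
multipliers = shared_values[group_of]     # expand to C multipliers with only K distinct values

x = torch.randn(1, C, 16, 16)             # layer output: (batch, C, H, W)
y = x * multipliers.view(1, C, 1, 1)      # multiply each channel by its multiplier
assert multipliers.unique().numel() <= K
```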
- Example 2 An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
- Example 3 The apparatus of example 1 or 2, wherein K is the number of different values that the C multipliers comprise, wherein the values of the C multipliers are grouped into K groups, and wherein multipliers within each group comprise the same value and multipliers within different groups comprise different values.
- Example 4 The apparatus of any of the examples 1 to 3, wherein the numbers of multipliers in the different groups are the same or approximately the same; or the numbers of multipliers in the different groups are different.
- Example 5 The apparatus of any of the examples 1 to 4, wherein the number K of the different values of the C multipliers is the same for all layers of the neural network; or the number K of the different values of multipliers is different for different layers of the neural network.
- Example 6 The apparatus of any of the examples 1 to 5, wherein the apparatus is caused to perform: assigning of the C multipliers to the different groups.
- Example 7 The apparatus of example 6, wherein the assigning is predetermined based on a grouping operation and is the same for any content or video sequence on which overfitting is performed.
- Example 8 The apparatus of example 6, wherein the apparatus is further caused to perform the grouping operation, and wherein to perform the grouping operation the apparatus is further caused to perform: determining or assuming an order of the C channels and/or the C multipliers that are multiplied with the channels; and assigning nearby or consecutive multipliers to the same group, such that all the multipliers that belong to a certain group appear in a consecutive sequence within an array or other data structure that includes the C multipliers.
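A minimal sketch of the grouping operation of Example 8, under the assumption of equal or near-equal group sizes (compare Example 4); the function name is illustrative:

```python
# Assign nearby/consecutive multipliers to the same group, so that each
# group occupies a contiguous run within the ordered array of C multipliers.
def consecutive_grouping(C: int, K: int) -> list[int]:
    """Return, for each of the C multipliers, the index of its group (0..K-1)."""
    base, extra = divmod(C, K)            # groups of size base or base + 1
    groups = []
    for g in range(K):
        size = base + (1 if g < extra else 0)
        groups.extend([g] * size)
    return groups

print(consecutive_grouping(10, 3))        # -> [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
```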
- Example 9 The apparatus of example 7 or 8, wherein the apparatus is further caused to perform: signaling, to a decoder, an indication of the grouping operation, and/or an indication of the order of the C channels or the C multipliers.
- Example 10 The apparatus of example 6, wherein the assigning is determined based on a content on which the C multipliers are overfitted and on a grouping operation.
- Example 11 The apparatus of any of the examples 7 to 9, wherein, during an overfitting operation, multipliers in the same group are constrained to comprise the same value, and wherein the apparatus is further caused to perform: in response to the overfitting operation, signaling K values of multiplier updates or K values of updated multipliers to the decoder.
- Example 12 The apparatus of examples 6 or 10, wherein, during an overfitting operation, the C multipliers are overfitted without constraining the values of the C multipliers or without considering the C multipliers to belong to any groups, and wherein the apparatus is further caused to perform: updating the C multipliers based on the overfitting operation; and applying a clustering operation to the values of the C updated multipliers or to the C multiplier-updates by using a number K of clusters, wherein values of the C updated multipliers or values of the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a value of a multiplier-update.
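A hedged sketch of the clustering in Example 12, using k-means on scalar multiplier-updates (scikit-learn is assumed available; the data here is a random stand-in for updates produced by an actual overfitting run):

```python
# Overfit the C multipliers freely, then cluster the C multiplier-updates
# into K clusters; each centroid becomes the shared value for its group.
import numpy as np
from sklearn.cluster import KMeans

C, K = 64, 8
updates = np.random.randn(C).astype(np.float32) * 0.1  # stand-in multiplier-updates

km = KMeans(n_clusters=K, n_init=10).fit(updates.reshape(-1, 1))
centroids = km.cluster_centers_.ravel()   # the K values to signal
assignment = km.labels_                   # which centroid each multiplier uses

clustered_updates = centroids[assignment] # C values from a set of cardinality K
```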
- Example 13 The apparatus of example 12, wherein the apparatus is further caused to perform: signaling the K values of multiplier-updates or K values of updated multipliers to a decoder, together with an indication of assignment of each value to respective multiplier items sharing the same properties.
- Example 14 The apparatus of example 12, wherein the apparatus is further caused to perform: signaling values of the C multiplier-updates or the C updated multipliers, wherein the C values are in the set of cardinality K, and wherein the C values are ordered according to an assumed order of the channels of the tensor and/or the order of multipliers that multiply the channels of the tensor.
- Example 15 The apparatus of any of the examples 11, 13, or 14, wherein the apparatus is further caused to perform: quantizing the values of the C multiplier-updates, the values of the C updated multipliers, the K values of multiplier updates, and/or the K values of updated multipliers prior to signaling.
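The quantization scheme is not fixed by Example 15; a minimal sketch assuming plain uniform scalar quantization with an illustrative step size could look as follows, with the dequantization mirroring Examples 18, 21, and 24:

```python
# Uniform scalar quantization/dequantization of the values to be signaled.
import numpy as np

def quantize(values: np.ndarray, step: float = 1e-3) -> np.ndarray:
    return np.round(values / step).astype(np.int32)   # integers to entropy-code and signal

def dequantize(q: np.ndarray, step: float = 1e-3) -> np.ndarray:
    return q.astype(np.float32) * step                # decoder-side reconstruction

k_values = np.array([0.02, -0.013, 0.007], dtype=np.float32)
assert np.allclose(dequantize(quantize(k_values)), k_values, atol=5e-4)
```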
- Example 16 The apparatus of any of the previous examples, wherein the layer comprises a convolution layer of the neural network.
- Example 17 An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is the number of different values that the C multipliers comprise; receiving K values of multiplier updates or K values of updated multipliers; and updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
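A sketch of the decoder-side update of Example 17 (names and the delta-versus-absolute convention are assumptions; the specification allows either multiplier updates or updated multipliers):

```python
# Rebuild the C multiplier values from the K received values plus the
# indicated grouping, and apply them to the current multipliers.
import numpy as np

def update_multipliers(multipliers, k_values, group_of, updates_are_deltas=True):
    """multipliers: (C,) current values; k_values: (K,) received values;
    group_of: (C,) group index per multiplier (the grouping information)."""
    expanded = np.asarray(k_values)[np.asarray(group_of)]   # K values -> C values
    return multipliers + expanded if updates_are_deltas else expanded

C = 8
mult = np.ones(C, dtype=np.float32)
group_of = [0, 0, 0, 0, 1, 1, 1, 1]                         # signaled/assumed grouping
new_mult = update_multipliers(mult, [0.05, -0.02], group_of)
```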
- Example 18 The apparatus of example 17, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
- Example 19 An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with information of assignment of each value to a respective multiplier; and using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to a respective multiplier.
- Example 21 The apparatus of any of the examples 19 or 20, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
- Example 22 An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K; and using the values of the C multiplier updates or values of the C updated multipliers to update the C multipliers of the layer.
- Example 24 The apparatus of any of the examples 22 or 23, wherein the C multiplier updates or the C updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized C multiplier updates or the C updated multipliers.
- Example 28 The method of any of the examples 25 to 27, wherein the numbers of multipliers in the different groups are the same or approximately the same; or the numbers of multipliers in the different groups are different.
- Example 29 The method of any of the examples 25 to 28, wherein the number K of the different values of the C multipliers is the same for all layers of the neural network; or the number K of the different values of multipliers is different for different layers of the neural network.
- Example 30 The method of any of the examples 25 to 29 further comprising: assigning of the C multipliers to the different groups.
- Example 31 The method of example 30, wherein the assigning is predetermined based on a grouping operation and is the same for any content or video sequence on which overfitting is performed.
- Example 32 The method of example 30 further comprising performing the grouping operation, and wherein performing the grouping operation comprises: determining or assuming an order of the C channels and/or the C multipliers that are multiplied with the channels; and assigning nearby or consecutive multipliers to the same group, such that all the multipliers that belong to a certain group appear in a consecutive sequence within an array or other data structure that includes the C multipliers.
- Example 33 The method of example 31 or 32 further comprising: signaling, to a decoder, an indication of the grouping operation, and/or an indication of the order of the C channels or the C multipliers.
- Example 35 The method of any of the examples 31 to 33, wherein, during an overfitting operation, multipliers in the same group are constrained to comprise the same value, and wherein the method further comprises: in response to the overfitting operation, signaling K values of multiplier updates or K values of updated multipliers to the decoder.
- Example 37 The method of example 36 further comprising: signaling the K values of multiplier-updates or K values of updated multipliers to a decoder, together with an indication of assignment of each value to respective multiplier items sharing the same properties.
- Example 39 The method of any of the examples 35, 37, or 38 further comprising: quantizing the values of the C multiplier-updates, the values of the C updated multipliers, the K values of multiplier updates, and/or the K values of updated multipliers prior to signaling.
- Example 40 The method of any of the examples 25 to 39, wherein the layer comprises a convolution layer of the neural network.
- Example 41 A method comprising: receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is the number of different values that the C multipliers comprise; receiving K values of multiplier updates or K values of updated multipliers; and updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
- Example 42 The method of example 41, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
- Example 43 A method comprising: receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with information of assignment of each value to a respective multiplier; and using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to a respective multiplier.
- Example 44 The method of example 43, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number smaller than C.
- Example 45 The method of any of the examples 43 or 44, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
- Example 46 A method comprising: receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K; and using the values of the C multiplier updates or values of the C updated multipliers to update the C multipliers of the layer.
- Example 47 The method of example 46, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number different from C.
- Example 48 The method of any of the examples 46 or 47, wherein the C multiplier updates or the C updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized C multiplier updates or the C updated multipliers.
- Example 49 A computer readable medium comprising program instructions which, when executed by an apparatus, cause the apparatus to perform the methods as described in any of the examples 25 to 48.
- Example 50 An apparatus comprising means for performing methods as described in any of the examples 25 to 48.
- FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.
- FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.
- FIG. 3 shows a block diagram of a general structure of a video encoder.
- FIG. 4 illustrates a pipeline of video coding for machines (VCM).
- FIG. 5 illustrates a pipeline where a decoder-side neural network (DSNN) is overfitted based at least on an overfitting signal, obtaining an overfitted DSNN.
- FIG. 6 illustrates that the overfitted DSNN may be used for its purpose, such as for filtering input data.
- FIG. 7 is an example apparatus, which may be implemented in hardware, and which is caused to implement the examples described herein.
- FIG. 8 shows a representation of an example of non-volatile memory media used to store instructions that implement the examples described herein.
- FIG. 9 is an example method to implement the embodiments described herein, in accordance with another embodiment.
- FIG. 10 is another example method to implement the embodiments described herein, in accordance with another embodiment.
- FIG. 11 is an example method to implement the embodiments described herein, in accordance with yet another embodiment.
- FIG. 12 is another example method to implement the embodiments described herein, in accordance with still another embodiment.
- FIG. 13 is another example method to implement the embodiments described herein, in accordance with still another embodiment.
- FIG. 14 is a block diagram of one possible and non-limiting system in which the example embodiments may be practiced.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
- DU distributed unit
- eNB or eNodeB evolved Node B (for example, an LTE base station)
- EN-DC E-UTRA-NR dual connectivity
- en-gNB or En-gNB node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
- E-UTRA evolved universal terrestrial radio access, for example, the LTE radio access technology
- F1 or F1-C interface between CU and DU control interface
- gNB (or gNodeB) base station for 5G/NR, for example, a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
- circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even when the software or firmware is not physically present.
- This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims.
- the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
- the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
- a method, apparatus and computer program product are provided in accordance with example embodiments for overfitting shared multipliers.
- FIG. 1 shows an example block diagram of an apparatus 50.
- the apparatus may be an internet of things (IoT) apparatus configured to perform various functions, for example, gathering information by one or more sensors, receiving or transmitting information, analyzing information gathered or received by the apparatus, or the like.
- the apparatus may comprise a video coding system, which may incorporate a codec.
- FIG. 2 shows a layout of an apparatus according to an example embodiment. The elements of FIG. 1 and FIG. 2 will be explained next.
- the apparatus 50 may for example be, a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or a lower power device.
- embodiments of the examples described herein may be implemented within any electronic device or apparatus which may process data by neural networks.
- the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
- the apparatus 50 may further comprise a display 32, for example, in the form of a liquid crystal display, light emitting diode display, organic light emitting diode display, and the like.
- the display may be any suitable display technology suitable to display media or multimedia content, for example, an image or a video.
- the apparatus 50 may further comprise a keypad 34.
- any suitable data or user interface mechanism may be employed.
- the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
- the apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
- the apparatus 50 may further comprise an audio output device which in embodiments of the examples described herein may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
- the apparatus 50 may also comprise a battery (or in other embodiments of the examples described herein the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
- the apparatus may further comprise a camera 42 capable of recording or capturing images and/or video.
- the apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
- the apparatus 50 may comprise a controller 56, a processor or a processor circuitry for controlling the apparatus 50.
- the controller 56 may be connected to a memory 58 which in embodiments of the examples described herein may store both data in the form of an image, audio data and video data, and/or may also store instructions for implementation on the controller 56.
- the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio, image and/or video data or assisting in coding and/or decoding carried out by the controller.
- the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example, a universal integrated circuit card (UICC) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
- the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals, for example, for communication with a cellular communications network, a wireless communications system or a wireless local area network.
- the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
- FIG. 3 shows a block diagram of a general structure of a video encoder.
- FIG. 3 presents an encoder for two layers, but it would be appreciated that presented encoder could be similarly extended to encode more than two layers.
- FIG. 3 illustrates a video encoder comprising a first encoder section 301 for a base layer and a second encoder section 351 for an enhancement layer.
- Each of the first encoder section 301 and the second encoder section 351 may comprise similar elements for encoding incoming pictures.
- the encoder sections 301, 351 may comprise a pixel predictor 302, 352, prediction error encoder 303, 353 and prediction error decoder 304, 354.
- the pixel predictor 302 of the first encoder section 301 receives base layer picture(s)/image(s) 300 of a video stream to be encoded at both the inter-predictor 306 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of current frame or picture).
- the output of both the inter-predictor and the intra-predictor are passed to the mode selector 310.
- the intra-predictor 308 may have more than one intra-prediction mode. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 310.
- the mode selector 310 also receives a copy of the base layer image(s) 300.
- the pixel predictor 352 of the second encoder section 351 receives enhancement layer picture(s)/images(s) 350 of a video stream to be encoded at both the inter-predictor 356 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 358 (which determines a prediction for an image block based only on the already processed parts of current frame or picture).
- the output of both the inter-predictor and the intra-predictor are passed to the mode selector 360.
- the intra-predictor 358 may have more than one intra-prediction mode. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 360.
- the mode selector 360 also receives a copy of the enhancement layer pictures 350.
- the output of the inter-predictor 306, 356 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 360.
- the output of the mode selector 310, 360 is passed to a first summing device 321, 371.
- the first summing device may subtract the output of the pixel predictor 302, 352 from the base layer image(s) 300/enhancement layer image(s) 350 to produce a first prediction error signal 320, 370 which is input to the prediction error encoder 303, 353.
- the pixel predictor 302, 352 further receives from a preliminary reconstructor 339, 389 the combination of the prediction representation of the image block 312, 362 and the output 338, 388 of the prediction error decoder 304, 354.
- the preliminary reconstructed image 314, 364 may be passed to the intra-predictor 308, 358 and to the filter 316, 366.
- the filter 316, 366 receiving the preliminary representation may filter the preliminary representation and output a final reconstructed image 340, 490 which may be saved in the reference frame memory 318, 368.
- the reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer image 300 is compared in inter-prediction operations.
- the reference frame memory 318 may also be connected to the inter-predictor 356 to be used as the reference image against which a future enhancement layer image(s) 350 is compared in inter-prediction operations.
- the reference frame memory 368 may be connected to the inter-predictor 356 to be used as the reference image against which the future enhancement layer image(s) 350 is compared in inter-prediction operations.
- Filtering parameters from the filter 316 of the first encoder section 301 may be provided to the second encoder section 351 subject to the base layer being selected and indicated to be source for predicting the filtering parameters of the enhancement layer according to some embodiments.
- the prediction error decoder may be considered to comprise a dequantizer 346, 496, which dequantizes the quantized coefficient values, for example, DCT coefficients, to reconstruct the transform signal and an inverse transformation unit 348, 498, which performs the inverse transformation to the reconstructed transform signal wherein the output of the inverse transformation unit 348, 498 includes reconstructed block(s).
- the prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.
- the entropy encoder 330, 380 receives the output of the prediction error encoder 303, 353 and may perform a suitable entropy encoding/variable length encoding on the signal to provide a compressed signal.
- the outputs of the entropy encoders 330, 380 may be inserted into a bitstream, for example, by a multiplexer 365.
- the one or more apparatuses described in FIGs 1 to 3 may be caused to perform: overfitting shared multipliers.
- a neural network is a computation graph consisting of several layers of computation. Each layer consists of one or more units, where each unit performs a computation. A unit is connected to one or more other units, and a connection may be associated with a weight. The weight may be used for scaling the signal passing through an associated connection. Weights are learnable parameters, for example, values which can be learned from training data. There may be other learnable parameters, such as those of batch-normalization layers.
- Feed-forward neural networks are such that there is no feedback loop, each layer takes input from one or more of the previous layers, and provides its output as the input for one or more of the subsequent layers. Also, units inside a certain layer take input from units in one or more of preceding layers and provide output to one or more of following layers.
- Initial layers, those close to the input data, extract semantically low-level features, for example, edges and textures in images, while intermediate and final layers extract more high-level features.
- After the feature extraction layers, there may be one or more layers performing a certain task, for example, classification, semantic segmentation, object detection, denoising, style transfer, super-resolution, and the like.
- In recurrent neural networks, there is a feedback loop, so that the neural network becomes stateful, for example, it is able to memorize information or a state.
- Neural networks are being utilized in an ever-increasing number of applications for many different types of devices, for example, mobile phones, chat bots, IoT devices, smart cars, voice assistants, and the like. Some of these applications include, but are not limited to, image and video analysis and processing, social media data analysis, device usage data analysis, and the like.
- One of the properties of neural networks, and other machine learning tools, is that they are able to learn properties from input data, either in a supervised way or in an unsupervised way. Such learning is a result of a training algorithm, or of a meta-level neural network providing the training signal.
- the training algorithm consists of changing some properties of the neural network so that its output is as close as possible to a desired output.
- the output of the neural network can be used to derive a class or category index which indicates the class or category that the object in the input image belongs to.
- Training usually happens by minimizing or decreasing the output error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, and the like.
- training is an iterative process, where at each iteration the algorithm modifies the weights of the neural network to make a gradual improvement in the network’s output, for example, gradually decrease the loss.
- Training a neural network is an optimization process, but the final goal is different from the typical goal of optimization. In optimization, the only goal is to minimize a function.
- the goal of the optimization or training process is to make the model learn the properties of the data distribution from a limited training dataset. In other words, the goal is to learn to use a limited training dataset in order to learn to generalize to previously unseen data, for example, data which was not used for training the model. This is usually referred to as generalization.
- data is usually split into at least two sets, the training set and the validation set.
- the training set is used for training the network, for example, to modify its learnable parameters in order to minimize the loss.
- the validation set is used for checking the performance of the network on data, which was not used to minimize the loss, as an indication of the final performance of the model.
- the errors on the training set and on the validation set are monitored during the training process to understand the following:
- the training set error should decrease, otherwise the model is in the regime of underfitting.
- the validation set error needs to decrease and be not too much higher than the training set error.
- For example, the validation set error should be less than 20% higher than the training set error. If the training set error is low, for example, 10% of its value at the beginning of training, or low with respect to a threshold that may have been determined based on an evaluation metric, but the validation set error is much higher than the training set error, or it does not decrease, or it even increases, the model is in the regime of overfitting. This means that the model has merely memorized properties of the training set and performs well only on that set, but performs poorly on a set not used for training or tuning of its parameters.
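The monitoring rules above can be summarized in a small sketch (the 20% figure is the example threshold from the text; treating "low" training error as 10% of its initial value is likewise taken from the example):

```python
# Classify the training regime from monitored train/validation errors.
def training_regime(train_err, val_err, initial_train_err):
    if val_err <= 1.2 * train_err:
        return "ok"                     # validation error not too much higher
    if train_err <= 0.1 * initial_train_err:
        return "overfitting"            # low train error, much higher val error
    return "underfitting"               # train error has not decreased enough

print(training_regime(train_err=0.05, val_err=0.30, initial_train_err=1.0))  # -> overfitting
```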
- neural networks have been used for compressing and de-compressing data such as images.
- the most widely used architecture for such task is the auto-encoder, which is a neural network consisting of two parts: a neural encoder and a neural decoder.
- the neural encoder and neural decoder are referred to below as encoder and decoder, even though these refer to algorithms which are learned from data instead of being tuned manually.
- the encoder takes an image as an input and produces a code, to represent the input image, which requires less bits than the input image. This code may have been obtained by a binarization or quantization process after the encoder.
- the decoder takes in this code and reconstructs the image which was input to the encoder.
- Such encoder and decoder are usually trained to minimize a combination of bitrate and distortion, where the distortion may be based on one or more of the following metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), or the like.
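As a hedged sketch, a rate-distortion training objective of the kind described above might be written as follows (PyTorch assumed; `bits_estimate` and the weighting `lam` are illustrative stand-ins for a real rate model and trade-off factor):

```python
# One common form of the bitrate/distortion combination minimized in training.
import torch
import torch.nn.functional as F

def rd_loss(x, x_hat, bits_estimate, lam=0.01):
    """x: input image; x_hat: reconstruction; bits_estimate: estimated code length."""
    distortion = F.mse_loss(x_hat, x)     # MSE distortion (PSNR-oriented)
    rate = bits_estimate / x.numel()      # bits per sample
    return rate + lam * distortion
```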
- the terms model, neural network, and network may be used interchangeably, and the weights of neural networks may sometimes be referred to as learnable parameters or as parameters.
- a video codec includes an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form.
- an encoder discards some information in the original video sequence in order to represent the video in a more compact form, for example, at lower bitrate.
- Typical hybrid video codecs encode the video information in two phases. Firstly, pixel values in a certain picture area (or 'block') are predicted, for example, by motion compensation means or circuits (by finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means or circuits (by using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, e.g., the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g., a discrete cosine transform (DCT)), quantizing the coefficients, and entropy coding the quantized coefficients.
- the encoder may control the balance between the accuracy of the pixel representation (e.g., picture quality) and size of the resulting coded video representation (e.g., file size or transmission bitrate).
- Inter prediction which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, exploits temporal redundancy.
- In inter prediction, the sources of prediction are previously decoded pictures (e.g., reference pictures) in the same scalable layer.
- Intra block copy (IBC) prediction may be applied similarly to temporal inter prediction, but the reference picture is the current picture and only previously decoded samples may be referred to in the prediction process.
- Inter-layer or inter-view prediction may be applied similarly to temporal inter prediction, but the reference picture is a decoded picture from another scalable layer or from another view, respectively.
- in some cases, inter prediction may refer to temporal inter prediction only, while in other cases inter prediction may refer collectively to temporal inter prediction and any of intra block copy, inter-layer prediction, and inter-view prediction, provided that they are performed with the same or a similar process as temporal prediction.
- Inter prediction, temporal inter prediction, or temporal prediction may sometimes be referred to as motion compensation or motion-compensated prediction.
- Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated. Intra prediction can be performed in the spatial or transform domain, for example, either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra coding, where no inter prediction is applied.
- One example outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently when they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.
- the decoder reconstructs the output video by applying prediction techniques similar to the encoder to form a predicted representation of the pixel blocks. For example, using the motion or spatial information created by the encoder and stored in the compressed representation and prediction error decoding, which is inverse operation of the prediction error coding recovering the quantized prediction error signal in spatial pixel domain. After applying prediction and prediction error decoding techniques the decoder sums up the prediction and prediction error signals, for example, pixel values to form the output video frame.
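A toy sketch of the reconstruction step just described (illustrative 4x4 block, 8-bit samples; the decoder sums the prediction and the decoded prediction error, clipping to the valid sample range):

```python
# Hybrid-decoder reconstruction: prediction + decoded prediction error.
import numpy as np

prediction = np.full((4, 4), 128, dtype=np.int16)               # e.g., from motion compensation
residual = np.random.randint(-5, 6, (4, 4)).astype(np.int16)    # decoded prediction error
reconstructed = np.clip(prediction + residual, 0, 255).astype(np.uint8)
```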
- the decoder and encoder can also apply additional filtering techniques to improve the quality of the output video before passing it for display and/or storing it as prediction reference for the forthcoming frames in the video sequence.
- the motion information is indicated with motion vectors associated with each motion compensated image block.
- Each of these motion vectors represents the displacement of the image block in the picture to be coded in the encoder side or decoded in the decoder side and the prediction source block in one of the previously coded or decoded pictures.
- the motion vectors are typically coded differentially with respect to block specific predicted motion vectors.
- the predicted motion vectors are created in a predefined way, for example, calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
- Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor.
- the reference index of previously coded/decoded picture can be predicted.
- the reference index is typically predicted from adjacent blocks and/or co-located blocks in a temporal reference picture.
- typical high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes motion vector and corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction.
- predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures and the used motion field information is signaled among a list of motion field candidate list filled with motion field information of available adjacent/co-located blocks.
- the prediction residual after motion compensation is first transformed with a transform kernel, for example, DCT and then coded.
- Typical video encoders utilize Lagrangian cost functions to find optimal coding modes, for example, the desired macroblock mode and associated motion vectors.
- This kind of cost function uses a weighting factor λ to tie together the exact or estimated image distortion due to lossy coding methods and the exact or estimated amount of information that is required to represent the pixel values in an image area: C = D + λR, where
- C is the Lagrangian cost to be minimized,
- D is the image distortion, for example, mean squared error, with the mode and motion vectors considered, and
- R is the number of bits needed to represent the required data to reconstruct the image block in the decoder, including the amount of data to represent the candidate motion vectors.
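A small worked example of this cost in use (all numbers are made up): the encoder evaluates each candidate mode and keeps the one with the lowest Lagrangian cost.

```python
# Lagrangian mode selection: minimize C = D + lambda * R over candidate modes.
candidates = [
    {"mode": "intra",       "D": 120.0, "R": 300},   # distortion, bits
    {"mode": "inter_merge", "D": 150.0, "R": 90},
    {"mode": "inter_mvd",   "D": 135.0, "R": 160},
]
lam = 0.5
best = min(candidates, key=lambda c: c["D"] + lam * c["R"])
print(best["mode"])   # -> "inter_merge" (cost 195.0 vs 270.0 and 215.0)
```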
- Video coding specifications may enable the use of supplemental enhancement information (SEI) messages or alike.
- Some video coding specifications include SEI NAL units, and some video coding specifications include both prefix SEI NAL units and suffix SEI NAL units, where the former type may start a picture unit or alike and the latter type may end a picture unit or alike.
- An SEI NAL unit may include one or more SEI messages, which are not required for the decoding of output pictures but may assist in related processes, such as picture output timing, post-processing of decoded pictures, rendering, error detection, error concealment, and resource reservation.
- SEI messages are specified in H.264/AVC, H.265/HEVC, H.266/VVC, and H.274/VSEI standards, and the user data SEI messages enable organizations and companies to specify SEI messages for their own use.
- the standards may contain the syntax and semantics for the specified SEI messages but a process for handling the messages in the recipient may not be defined. Consequently, encoders may be required to follow the standard specifying a SEI message when they create SEI message(s), and decoders may not be required to process SEI messages for output order conformance.
- One of the example reasons to include the syntax and semantics of SEI messages in standards is to allow different system specifications to interpret the supplemental information identically and hence interoperate. It is intended that system specifications may require the use of particular SEI messages both in the encoding end and in the decoding end, and additionally the process for handling particular SEI messages in the recipient may be specified.
- For data consumed by machines, a quality metric for the decoded data may be defined, which may be different from a quality metric for human perceptual quality.
- dedicated algorithms for compressing and decompressing data for machine consumption may be different than those for compressing and decompressing data for human consumption.
- the set of tools and concepts for compressing and decompressing data for machine consumption is referred to here as Video Coding for Machines.
- In some embodiments, the receiver-side device includes multiple 'machines' or neural networks (NNs). These multiple machines may be used in a certain combination which is, for example, determined by an orchestrator sub-system. The multiple machines may be used, for example, in succession, based on the output of the previously used machine, and/or in parallel. For example, a video which was compressed and then decompressed may be analyzed by one machine (NN) for detecting pedestrians, by another machine (another NN) for detecting cars, and by another machine (another NN) for estimating the depth of all the pixels in the frames.
- The term 'receiver-side' or 'decoder-side' is used to refer to the physical or abstract entity or device which includes one or more machines, and runs these one or more machines on some encoded and eventually decoded video representation which is encoded by another physical or abstract entity or device, the 'encoder-side device'.
- the encoded video data may be stored into a memory device, for example as a file.
- the stored file may later be provided to another device.
- the encoded video data may be streamed from one device to another.
- FIG. 4 illustrates a pipeline of video coding for machines (VCM).
- VCM encoder 402 encodes the input video into a bitstream 404.
- a bitrate 406 may be computed 408 from the bitstream 404 in order to evaluate the size of the bitstream 404.
- a VCM decoder 410 decodes the bitstream 404 output by the VCM encoder 402.
- An output of the VCM decoder 410 may be referred to, for example, as decoded data for machines 412. This data may be considered as the decoded or reconstructed video.
- the decoded data for machines 412 may not have the same or similar characteristics as the original video which was input to the VCM encoder 402.
- this data may not be easily understandable by a human, when the human watches the decoded video from a suitable output device such as a display.
- the output of the VCM decoder 410 is then input to one or more task neural networks (task-NNs).
- FIG. 4 is shown to include four example task-NNs: a task-NN 414 for object detection, a task-NN 416 for image segmentation, a task-NN 418 for object tracking, and a non-specified one, a task-NN 420 for performing task X.
- the goal of VCM is to obtain a low bitrate while guaranteeing that the task-NNs still perform well in terms of the evaluation metric associated with each task.
- When a conventional video encoder, such as an H.266/VVC encoder, is used as a VCM encoder, one or more of the following approaches may be used to adapt the encoding to be suitable for machine analysis tasks:
- ROI detection may be performed using a task NN, such as an object detection NN.
- ROI boundaries of a group of pictures or an intra period may be spatially overlaid and rectangular areas may be formed to cover the ROI boundaries.
- the detected ROIs (or rectangular areas, likewise) may be used in one or more of the following ways:
- the quantization parameter (QP) may be adjusted spatially in a manner that ROIs are encoded using finer quantization step size(s) than other regions. For example, QP may be adjusted CTU-wise.
- the video is preprocessed to contain only the ROIs, while the other areas are replaced by one or more constant values or removed.
- a grid is formed in a manner that a single grid cell covers a ROI. Grid rows or grid columns that contain no ROIs are downsampled as preprocessing to encoding.
- Quantization parameter of the highest temporal sublayer(s) is increased (i.e. coarser quantization is used) when compared to practices for human watchable video.
- the original video is temporally downsampled as preprocessing prior to encoding.
- a frame rate upsampling method may be used as postprocessing subsequent to decoding, when machine analysis at the original frame rate is desired.
- a filter is used to preprocess the input to the conventional encoder.
- the filter may be a machine learning based filter, such as a convolutional neural network.
- a neural network may be used as filter in the decoding loop (also referred to simply as coding loop), and it may be referred to as neural network loop filter, or neural network in-loop filter.
- the NN loop filter may replace all other loop filters of an existing video codec, or may represent an additional loop filter with respect to the already present loop filters in an existing video codec.
- a neural network may be used as postprocessing filter, for example applied to the output of an image or video decoder in order to remove or reduce coding artifacts.
- neural network filter or NN filter as a filter that comprises one or more neural networks and is used either as a loop filter in the coding loop or as a post-processing filter.
- the example system comprises a codec that comprises one or more NN loop filters.
- the codec could comprise a modified VVC/H.266 compliant codec (e.g., a VVC/H.266 compliant codec that has been modified so that it would comprise one or more NN loop filters).
- the input to the one or more NN loop filters may comprise at least a reconstructed block or frame (simply referred to as reconstruction) or data derived from a reconstructed block or frame (e.g., the output of a conventional loop filter).
- the reconstruction may be obtained based on predicting a block or frame (e.g., by means of intra-frame prediction or inter-frame prediction) and performing residual compensation.
- the input to the one or more NN loop filters may also comprise other data that is associated or related to the reconstructed block or frame and may be referred to as auxiliary input data.
- the one or more NN loop filters may enhance the quality of at least one of their inputs, such as the quality of the input reconstructed block or frame, where the quality may be measured in terms of one or more quality metrics, so that a rate-distortion loss is decreased.
- the rate may indicate a bitrate (estimate or real) of the encoded video.
- the distortion may indicate a pixel fidelity distortion, such as mean-squared error (MSE), or a machine task metric, such as mean average precision (mAP).
- the enhancement may result in a coding gain, which can be expressed for example in terms of BD-rate, BD-PSNR, or BD-mAP.
- the NN filter may be a NN post-processing filter, whose input may comprise one or more outputs of a video codec.
- the filter may be used only for increasing a quality metric for at least one of its inputs, where the quality metric may be, for example, peak signal-to-noise ratio (PSNR), mAP for object detection, MOTA for object tracking, and the like.
- the decoder or the receiver comprising the decoder is assumed to comprise at least a neural network, referred to as decoder-side neural network (DSNN).
- Various embodiments consider the case where the DSNN is optimized for a certain data unit, with respect to one or more metrics, such as peak signal-to-noise ratio (PSNR).
- the NN which is optimized may have been pretrained (e.g., may have been previously trained on a training dataset).
- Overfitting of a DSNN may be performed by computing a weight-update at encoder side by means of one or more training iterations, e.g., based on backpropagation. Then, the obtained weight update is compressed by a neural network encoder and provided to a neural network decoder.
- the neural network encoder may be part of the encoder of data units, such as a video encoder; the neural network decoder may be part of the decoder of data units, such as a video decoder.
- the decoder decompresses the compressed weight-update, uses the decompressed weight-update for updating the DSNN, and uses the updated DSNN for its purpose, such as for decoding a data unit or for post-processing a decoded data unit; for example, the updated DSNN may be used for decoding a video frame or for postprocessing a decoded video frame or data derived from the decoded video frame.
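A hedged sketch of the encoder-side part of this pipeline (PyTorch assumed; `dsnn`, the MSE objective, and the iteration count are illustrative): a few training iterations overfit the DSNN, and the weight-update is the difference between the overfitted and pretrained parameters, which would then be compressed and signaled to the decoder.

```python
# Encoder-side overfitting of a DSNN, producing a weight-update to signal.
import copy
import torch
import torch.nn.functional as F

def overfit(dsnn, decoded, original, iters=50, lr=1e-3):
    pretrained = copy.deepcopy(dsnn.state_dict())
    opt = torch.optim.Adam(dsnn.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.mse_loss(dsnn(decoded), original)
        loss.backward()                   # one training iteration (backpropagation)
        opt.step()
    # weight-update = overfitted parameters minus pretrained parameters
    return {k: v - pretrained[k] for k, v in dsnn.state_dict().items()}
```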
- Sending a weight update from encoder to decoder may cause a bitrate increase, or bitrate overhead, with respect to the bitrate required for sending an encoded data unit (e.g., an encoded video).
- An example problem addressed by one or more embodiments is how to reduce such bitrate overhead or, in other words, how to generate a low-bitrate bitstream representing a weight update.
- a filter may take as input at least one or more first images (or blocks of an image) to be filtered and may output at least one or more second images (or blocks of an image), where the one or more second images may be the filtered version of the one or more first images.
- the filter takes as input one image and outputs one image.
- the filter takes as input more than one image and outputs one image.
- the filter takes as input more than one image and outputs more than one image.
- a filter may take as input also other data (also referred to as auxiliary data) than the data that is to be filtered, such as data that may aid the filter to perform a better filtering than when no auxiliary data was provided as input.
- the auxiliary data comprises information about prediction data, and/or information about the picture type, and/or information about the slice type, and/or information about a Quantization Parameter (QP) used for encoding, and/or information about boundary strength, and the like.
- the filter takes as input one image and other data associated to that image, such as information about the quantization parameter (QP) used for quantizing and/or dequantizing that image, and outputs one image.
- a filter may be a neural network based filter, or may be another type of filter.
- Some embodiments describe training features which may be applicable to machine learning based filters, such as NN filters.
- a filter may be, for example, an in-loop filter that is used in the decoding loop of a codec, or a post-processing filter applied on the data decoded by the codec.
- Although some embodiments are described with input and output data in the form of images or (video) frames or pictures, those embodiments may be applicable also to other types of data, such as audio frames, and also when considering one or more blocks or portions of an image.
- the figures do not include visual representations of other components that may be present at encoder side and/or at decoder side.
- a figure may not include information about some of the encoder-side operations performed by the encoder of the end-to-end learned image codec, such as an encoder neural network to process an input image into a latent tensor or a lossless encoder to encode a latent tensor into a bitstream, and/or may not include information about some of the decoder-side operations performed by the decoder of the end-to-end learned image codec, such as a lossless decoder that decodes a bitstream representing an encoded image.
- When the DSNN is a neural network based loop filter that is part of a video decoding loop, a figure may not depict some of the video decoding operations performed by the video codec.
- a decoder-side neural network (DSNN) 502 is overfitted 504 based at least on an overfitting signal 506, obtaining an overfitted DSNN 508.
- An example objective of the overfitting process is that the overfitted DSNN performs better than the DSNN with respect to a predefined metric, on at least some input data.
- the overfitted DSNN may be used (604) for its purpose, such as for filtering input data 602 to obtain filtered data 606.
- a DSNN is a neural network which is used in a decoder or as a post-processing operation.
- Some examples of a DSNN used in a decoder are the following:
- An NN loop filter that is comprised in a conventional decoder or in an end-to-end learned codec
- An NN post-filter that follows, in processing order, an inner decoder, such as a conventional decoder or an end-to-end learned codec; or
- An NN decoder which is comprised in an end-to-end learned codec.
- a DSNN that performs filtering of at least a portion of the input data.
- the at least a portion of the input data to be filtered by the DSNN is referred to as “data to be filtered”.
- the DSNN may take other inputs than the data to be filtered, and those other inputs may not be mentioned in the description of some embodiments or examples for the sake of simplicity.
- the same embodiments may be applicable to or valid for a DSNN that performs other purposes or tasks, such as a neural network decoder that is part of an end-to-end learned codec.
- the DSNN may be pretrained, e.g., may have been previously trained on a training dataset, or may be initialized in some other suitable way, such as by random or pseudo-random initialization.
- the DSNN or a copy of the DSNN, or a neural network which is same or substantially the same as the DSNN, is available at encoder side, and it may be referred to simply as DSNN even when meaning a DSNN present at encoder side.
- Overfitting a DSNN refers to optimizing the DSNN for a certain data unit, with respect to one or more metrics, such as the peak signal-to-noise ratio (PSNR).
- the process of optimizing a DSNN on a certain data unit with respect to one or more metrics may be referred to as overfitting, or specializing, or finetuning.
- the DSNN which is optimized may be pretrained (e.g., may have been previously trained on a training dataset).
- a DSNN that has been overfitted or optimized on a certain data unit may then be used for processing (e.g., filtering, decoding, post-processing) one or more data units.
- the overfitting signal may comprise updated weights.
- the updated weights may comprise one or more updated values associated to respective one or more weights or parameters of the DSNN.
- Overfitting a DSNN based on updated weights may comprise replacing the values of one or more weights or parameters of the DSNN with respective one or more updated values comprised in the updated weights.
- the updated weights comprise one or more updated values associated to respective one or more multiplying parameters, where the one or more multiplying parameters are comprised in a DSNN, and where each of the one or more multiplying parameters multiplies an output of a layer of the DSNN.
- the updated weights comprise one or more updated values associated to respective one or more bias parameters, where the one or more bias parameters are comprised in a DSNN, and where each of the one or more bias parameters is added to an output of a layer of the DSNN.
- the overfitting signal may comprise a weight-update.
- the weight-update may comprise one or more update values associated to respective one or more weights or parameters of the DSNN, where each update value represents an update or change to a respective weight or parameter of the DSNN.
- Overfitting a DSNN based on a weight-update may comprise adding or subtracting (or other suitable operation) the one or more update values of the weight-update to respective one or more weights or parameters of the DSNN.
- the weight-update comprises one or more updates associated to respective one or more multiplying parameters, where the one or more multiplying parameters are comprised in a DSNN, and where each of the one or more multiplying parameters multiplies an output of a layer of the DSNN.
- the weight-update comprises one or more updates associated to respective one or more bias parameters, where the one or more bias parameters are comprised in a DSNN, and where each of the one or more bias parameters is added to an output of a layer of the DSNN.
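- As a non-normative illustration (not part of the original disclosure), the following Python sketch shows one way the additive weight-update described above could be applied; the function name apply_weight_update and the parameter names are hypothetical.

```python
import numpy as np

def apply_weight_update(params, weight_update):
    """Overfit by addition: each update value is added to the respective parameter."""
    updated = dict(params)
    for name, delta in weight_update.items():
        # new_value = old_value + update (subtraction or another suitable
        # operation could be used instead, as noted above)
        updated[name] = params[name] + delta
    return updated

# Example: update only the per-channel multipliers of one layer (C = 6).
params = {"layer1.multipliers": np.ones(6), "layer1.bias": np.zeros(6)}
update = {"layer1.multipliers": np.array([0.2, 0.2, -0.1, -0.1, -0.1, 0.0])}
params = apply_weight_update(params, update)
```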
- the overfitting signal may comprise a modulating signal.
- the modulating signal may comprise one or more modulating values, or one or more sets of modulating values, associated to respective one or more outputs of one or more layers of a DSNN.
- Overfitting a DSNN based on a modulating signal may comprise modifying the one or more outputs based at least on respective one or more modulating values, or the one or more sets of modulating values, and a modulation operation.
- a multiplier multiplies a channel of a tensor, where the tensor can be, for example, the output of a convolutional layer.
- an overfitting signal is applied to a tensor that is output by a convolutional layer with C convolutional kernels.
- the shape or size of the tensor is (B, C, H, W) (an alternative notation for the shape or size of a tensor is BxCxHxW), where B indicates a batch size, C indicates the number of convolutional kernels that produced the tensor and represents the number of channels of the tensor, and H and W indicate the height and width of the tensor.
- the batch size B is assumed to be 1 (i.e., the shape of a tensor output by a NN layer will be assumed to be (1, C, H, W)) and it will not be shown, i.e., the shape of a tensor that is output by a NN layer will be shown simply as (C, H, W).
- the NN filter is assumed to comprise at least one layer that outputs a tensor, where the tensor is multiplied with one or more multipliers.
- each of the one or more multipliers multiplies a channel of the tensor, where a channel of the tensor is a sliced tensor of shape or size (1, H, W) (or, in the alternative notation, 1xHxW).
- T is the tensor of shape (C, H, W) and m is an array of shape C.
- the C multipliers are stored in an array m of length C, and the tensor is referred to as T; the value m[0] multiplies the sliced tensor T[0, :, :], where “:” indicates all the indexes on that axis, the value m[1] multiplies the sliced tensor T[1, :, :], and so on.
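- As a non-normative illustration, the per-channel multiplication described above can be sketched in Python/NumPy as follows (shapes and values are illustrative only):

```python
import numpy as np

C, H, W = 6, 4, 4
T = np.random.rand(C, H, W)  # tensor output by a layer (batch size 1 omitted)
m = np.random.rand(C)        # one multiplier per channel, stored in array m

# Broadcasting m over the H and W axes: m[0] scales T[0, :, :],
# m[1] scales T[1, :, :], and so on.
scaled = m[:, None, None] * T
assert np.allclose(scaled[1], m[1] * T[1, :, :])
```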
- a neural network may have more than one layer, where each layer may output a tensor.
- a neural network may internally produce more than one tensor.
- Different tensors produced within a neural network may have different shapes. For example, different tensors may have different numbers of channels.
- at least some of the described embodiments consider the case of a single layer, but the underlying ideas can be extended to the case of multiple layers with potentially different shapes of the tensors produced by those layers.
- the values of the C multipliers may be in a set of cardinality less than C. In other words, different multipliers may be constrained to have the same value.
- the cardinality of the set of values of the C multipliers may be referred to as K.
- K is the number of different values that the C multipliers may have. This way, the C multipliers are grouped into K groups, where the multipliers within each group have the same value and multipliers within different groups may have different values.
- the number of multipliers in different groups is the same or approximately the same.
- each group comprises C/K multipliers.
- the number of multipliers in different groups may be different.
- one of the K groups may comprise much more multipliers than another one of the K groups.
- the number K of different values of multipliers may be same for all the layers of a neural network. In an alternative embodiment, the number K of different values of multipliers may be different for different layers of a neural network.
- multipliers are assigned to groups in a predetermined way, based at least on a grouping operation, and the assignment is the same for any content or video sequence on which the overfitting is performed. In other words, the assigning of multipliers to groups is not dependent on the particular content being overfitted or encoded/decoded.
- the multipliers in the same group may be constrained to have the same value.
- an encoder may signal the values of K multiplier updates or K updated multipliers to a decoder.
- the received values of K multiplier updates or K updated multipliers are used to update the C multipliers of that layer, based on information about the grouping operation.
- the grouping operation may comprise determining or assuming an order of the channels and/or multipliers that are applied to the channels and then assigning nearby or consecutive multipliers to the same group.
- for example, with K = 3 groups, 6 multipliers may be assigned to groups as 1, 1, 2, 2, 2, 3 and not as 1, 2, 1, 2, 3, 2, so that multipliers belonging to the same group appear consecutively.
- an encoder may signal to a decoder an indication of the grouping operation, and/or an indication of an order of the channels or multipliers.
- multiplier m[0] is associated to channel T[0, :, :], multiplier m[1] is associated to channel T[1, :, :], and so on.
- each of the 6 multipliers is assigned to one of the 2 groups as follows: m[0] is assigned to group 1, m[1] is assigned to group 1, m[2] is assigned to group 1, m[3] is assigned to group 2, m[4] is assigned to group 2, m[5] is assigned to group 2.
- in other words, the multipliers m[0], m[1], m[2], m[3], m[4], m[5] are assigned to the groups 1, 1, 1, 2, 2, 2, respectively.
- the ordering of channels and multipliers is assumed to be the order of channels of the tensor that is output by the considered NN layer, or the order of the convolutional kernels in the considered NN layer.
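- As a non-normative illustration, the following sketch shows how a decoder could expand K received values into C multipliers under the consecutive grouping described above, assuming equally sized groups; the function name is hypothetical.

```python
import numpy as np

def expand_group_values(group_values, C):
    """Expand K group values into C multipliers, with consecutive multipliers
    sharing a group (assumes C is divisible by K for simplicity)."""
    K = len(group_values)
    return np.repeat(np.asarray(group_values, dtype=float), C // K)

# K = 2 received updated-multiplier values for a layer with C = 6 channels:
m = expand_group_values([1.2, 0.8], C=6)
# m -> [1.2, 1.2, 1.2, 0.8, 0.8, 0.8], matching the grouping 1, 1, 1, 2, 2, 2
```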
- the assigning of multipliers to groups may be determined based on the content on which the multipliers are overfitted and based on a grouping operation.
- the assigning of multipliers to groups may thus be different for different contents (e.g., different video sequences, or different video frames).
- C multipliers are overfitted without constraining their values to be the same and without considering them as belonging to any groups.
- a clustering operation is applied to the values of the updated multipliers or to the multiplier-updates, by using a number K of clusters.
- the number K of clusters may be predetermined or determined during the clustering operation.
- the values of C updated multipliers or multiplier-updates are clustered into K clusters or groups and represented by K centroids. Each centroid represents the value of the new updated multiplier or multiplier-update.
- an encoder may signal the values of the new K multiplier-updates or K updated multipliers (e.g., the centroids) to a decoder, together with an indication of the assignment of each value to the respective multiplier.
- the received values of K multiplier updates or K updated multipliers are used to update the C multipliers of that layer, based on the information about the assignment of each value to the respective multiplier.
- an encoder may signal C values of the multiplier-updates or updated multipliers, where the C values are in a set of cardinality K, and where the C values are ordered according to the assumed order of the channels of the tensor and/or the order of the multipliers that multiply the channels of the tensor.
- the ordering of the multipliers is performed by using the order of the channels in the tensor T, e.g., multiplier m[0] is associated to channel T[0, :, :], multiplier m[1] is associated to channel T[1, :, :], and so on.
- each of the 6 multipliers is assigned to one of the 2 clusters as follows: m[0] is assigned to group 1, m[1] is assigned to group 2, m[2] is assigned to group 2, m[3] is assigned to group 2, m[4] is assigned to group 1, m[5] is assigned to group 2.
- in other words, the multipliers m[0], m[1], m[2], m[3], m[4], m[5] are assigned to the groups 1, 2, 2, 2, 1, 2, respectively.
- the encoder signals the following: c1, c2, idx0, idx1, idx2, idx3, idx4, idx5, where c1 and c2 represent the values of the centroids and idx0 .. idx5 represent the indexes of the clusters for multipliers m[0] .. m[5].
- the signal could be: 1.2, 0.8, 0, 1, 1, 1, 0, 1, which indicates that there are two centroids with values 1.2 and 0.8, associated to clusters with indexes 0 and 1, respectively, and that the multipliers are assigned to the clusters as follows: m[0] is assigned to the cluster with index 0, m[1] is assigned to the cluster with index 1, m[2] is assigned to the cluster with index 1, m[3] is assigned to the cluster with index 1, m[4] is assigned to the cluster with index 0, m[5] is assigned to the cluster with index 1.
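- As a non-normative illustration, the clustering described above can be sketched with a simple one-dimensional k-means; the function name, the initialization, and the example values are illustrative assumptions, not the disclosed method.

```python
import numpy as np

def cluster_multipliers(values, K, iters=20):
    """Toy 1-D k-means: returns (K centroids, cluster index per multiplier)."""
    values = np.asarray(values, dtype=float)
    centroids = np.linspace(values.min(), values.max(), K)  # simple init
    for _ in range(iters):
        # Assign each multiplier value to its nearest centroid.
        idx = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for k in range(K):
            if np.any(idx == k):
                centroids[k] = values[idx == k].mean()
    return centroids, idx

# C = 6 unconstrained overfitted multipliers, clustered with K = 2:
overfitted = np.array([1.25, 0.82, 0.78, 0.80, 1.15, 0.79])
centroids, idx = cluster_multipliers(overfitted, K=2)
# Here centroids ~= [0.80, 1.20] and idx = [1, 0, 0, 0, 1, 0]; the encoder may
# signal the centroids and indexes, and the decoder rebuilds centroids[idx].
```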
- the updated values of the multipliers (e.g., the updated multipliers) or the update values (e.g., the multiplier updates) may be quantized prior to signaling.
- the quantized updated values or the quantized update values may be dequantized prior to being used for updating the neural network.
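- As a non-normative illustration, a uniform scalar quantizer and dequantizer of the kind that could be used is sketched below; the step size is a hypothetical example and could itself be signaled.

```python
import numpy as np

STEP = 0.05  # hypothetical quantization step

def quantize(values, step=STEP):
    """Map values to integer levels, which are cheaper to signal."""
    return np.round(np.asarray(values, dtype=float) / step).astype(int)

def dequantize(levels, step=STEP):
    """Recover approximate values before updating the neural network."""
    return levels * step

updates = np.array([0.213, -0.087, 0.049])
levels = quantize(updates)       # -> [ 4, -2,  1]
recovered = dequantize(levels)   # -> [ 0.20, -0.10,  0.05]
```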
- FIG. 7 is an example apparatus, which may be implemented in hardware, caused to perform: overfitting shared multipliers.
- the apparatus 700 comprises at least one processor 702 (e.g., an FPGA and/or CPU), one or more memories 704 including computer program code 705, the computer program code 705 having instructions to carry out the methods described herein, wherein the at least one memory 704 and the computer program code 705 are configured to, with the at least one processor 702, cause the apparatus 700 to implement circuitry, a process, component, module, or function (implemented with control module 706) to implement the examples described herein, including overfitting shared multipliers.
- Optionally included encoder 730 of the control module 706 performs encoding, and optionally included decoder 732 implements decoding.
- the memory 704 may be a non-transitory memory, a transitory memory, a volatile memory (e.g., RAM), or a non-volatile memory (e.g., ROM).
- the apparatus 700 includes a display and/or I/O interface 708, which includes user interface (UI) circuitry and elements, that may be used to display features or a status of the methods described herein (e.g., as one of the methods is being performed or at a subsequent time), or to receive input from a user such as with using a keypad, camera, touchscreen, touch area, microphone, biometric recognition, one or more sensors, and the like.
- the apparatus 700 includes one or more communication, e.g., network (N/W) interfaces (I/F(s)) 710.
- the communication I/F(s) 710 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique including via one or more links 724.
- the communication I/F(s) 710 may comprise one or more transmitters or one or more receivers.
- the transceiver 716 comprises one or more transmitters 718 and one or more receivers 720.
- the transceiver 716 and/or communication I/F(s) 710 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitries and one or more antennas, such as antennas 714 used for communication over wireless link 722.
- the control module 706 of the apparatus 700 comprises one of or both parts 706-1 and/or 706-2, which may be implemented in a number of ways.
- the control module 706 may be implemented in hardware as control module 706-1, such as being implemented as part of the one or more processors 702.
- the control module 706-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
- the control module 706 may be implemented as control module 706-2, which is implemented as computer program code (having corresponding instructions) 705 and is executed by the one or more processors 702.
- the one or more memories 704 store instructions that, when executed by the one or more processors 702, cause the apparatus 700 to perform one or more of the operations as described herein.
- the one or more processors 702, one or more memories 704, and example algorithms are means for causing performance of the operations described herein.
- the apparatus 700 to implement the functionality of control 706 may correspond to any of the apparatuses depicted herein.
- apparatus 700 and its elements may not correspond to any of the other apparatuses depicted herein, as apparatus 700 may be part of a self-organizing/optimizing network (SON) node or other node, such as a node in a cloud.
- the apparatus 700 may also be distributed throughout the network (e.g., internet 28) including within and between apparatus 700 and any network element (such as a base station 24 and/or apparatus 90).
- Interface 712 enables data communication and signaling between the various items of apparatus 700, as shown in FIG. 7.
- the interface 712 may be one or more buses such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
- Computer program code (e.g., instructions) 705, including control 706 may comprise object- oriented software configured to pass data or messages between objects within computer program code 705.
- the apparatus 700 need not comprise each of the features mentioned, or may comprise other features as well.
- the various components of apparatus 700 may at least partially reside in a common housing 728, or a subset of the various components of apparatus 700 may at least partially be located in different housings, which different housings may include housing 728.
- FIG. 8 shows a schematic representation of non-volatile memory media 800a (e.g. computer/compact disc (CD) or digital versatile disc (DVD)) and 800b (e.g. universal serial bus (USB) memory stick) and 800c (e.g. cloud storage for downloading instructions and/or parameters 802 or receiving emailed instructions and/or parameters 802) storing instructions and/or parameters 802 which when executed by a processor allows the processor to perform one or more of the operations of the methods described herein.
- FIG. 9 is an example method 900 to implement the embodiments described herein, in accordance with an embodiment.
- the method 900 includes setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers.
- the method 900 includes multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
- FIG. 10 is another example method 1000 to implement the embodiments described herein, in accordance with an embodiment.
- the method 1000 includes multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
- the method 1000 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
- FIG. 11 is yet another example method 1100 to implement the embodiments described herein, in accordance with an embodiment.
- the method 1100 includes receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is a number of different values that the C multipliers comprise. In an embodiment, K is less than C.
- the method 1100 includes receiving K values of multiplier updates or K values of updated multipliers.
- the method 1100 includes updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
- the method 1100 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
- FIG. 12 is still another example method 1200 to implement the embodiments described herein, in accordance with an embodiment.
- the method 1200 includes receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with an information of assignment of each value to respective multiplier.
- the method 1200 includes using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to respective multiplier.
- the method 1200 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
- FIG. 13 is still another example method 1300 to implement the embodiments described herein, in accordance with an embodiment.
- the method 1300 includes receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K.
- the method 1300 includes using the values of C multiplier updates or values of C updated multipliers to update the C multipliers of the layer.
- the method 1300 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
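- As a non-normative illustration of the variant of method 1300, the decoder receives all C values (drawn from only K distinct values) in channel order and uses them directly to update the layer's multipliers; the function name and the mode parameter are hypothetical.

```python
import numpy as np

def update_multipliers(current_m, received_values, mode="replace"):
    """Update the layer's C multipliers from C received values, which may be
    updated multipliers ("replace") or multiplier updates ("add")."""
    received = np.asarray(received_values, dtype=float)
    return received if mode == "replace" else current_m + received

# C = 6 signaled values drawn from a set of cardinality K = 2 ({1.2, 0.8}),
# ordered according to the channel order of the tensor:
m = update_multipliers(np.ones(6), [1.2, 0.8, 0.8, 0.8, 1.2, 0.8])
```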
- FIG. 14 shows a block diagram of one possible and non-limiting example in which the examples may be practiced.
- a user equipment (UE) 110, a radio access network (RAN) node 170, and network element(s) 190 are illustrated.
- the user equipment (UE) 110 is in wireless communication with a wireless network 100.
- a UE is a wireless device that can access the wireless network 100.
- the UE 110 includes one or more processors 120, one or more memories 125, and one or more transceivers 130 interconnected through one or more buses 127.
- Each of the one or more transceivers 130 includes a receiver, Rx, 132 and a transmitter, Tx, 133.
- the one or more buses 127 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
- the one or more transceivers 130 are connected to one or more antennas 128.
- the one or more memories 125 include computer program code 123.
- the UE 110 includes a module 140, comprising one of or both parts 140-1 and/or 140-2, which may be implemented in a number of ways.
- the module 140 may be implemented in hardware as module 140-1, such as being implemented as part of the one or more processors 120.
- the module 140-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
- the module 140 may be implemented as module 140-2, which is implemented as computer program code 123 and is executed by the one or more processors 120.
- the one or more memories 125 and the computer program code 123 may be configured to, with the one or more processors 120, cause the user equipment 110 to perform one or more of the operations as described herein.
- the UE 110 communicates with a radio access network (RAN) node 170 via a wireless link 111.
- the RAN node 170 in this example is a base station that provides access by wireless devices such as the UE 110 to the wireless network 100.
- the RAN node 170 may be, for example, a base station for fifth generation cellular network technology (5G), also called New Radio (NR).
- the RAN node 170 may be a NG-RAN node, which is defined as either a gNB (e.g., base station for 5G/NR, for example, a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC) or an ng (new generation)-eNB.
- a gNB is a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to a 5G core network (5GC) (such as, for example, the network element(s) 190).
- the ng-eNB is a node providing evolved universal terrestrial radio access (E-UTRA), for example, the LTE radio access technology, user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC.
- the NG-RAN node may include multiple gNBs, which may also include a central unit (CU) (gNB-CU) 196 and distributed unit(s) (DUs) (gNB-DUs), of which DU 195 is shown.
- the DU may include or be coupled to and control a radio unit (RU).
- the gNB-CU is a logical node hosting radio resource control (RRC), service data adaptation protocol (SDAP) and PDCP protocols of the gNB or RRC and packet data convergence protocol (PDCP) protocols of the en-gNB (e.g., node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in E-UTRA-NR dual connectivity (EN-DC)) that controls the operation of one or more gNB-DUs.
- the gNB-CU terminates the F1 control interface (F1 or F1-C) connected with the gNB-DU.
- the F1 interface is illustrated as reference 198, although reference 198 also illustrates a link between remote elements of the RAN node 170 and centralized elements of the RAN node 170, such as between the gNB-CU 196 and the gNB-DU 195.
- the gNB-DU is a logical node hosting radio link control (RLC), MAC and physical layer (PHY) layers of the gNB or en-gNB, and its operation is partly controlled by gNB-CU.
- One gNB-CU supports one or multiple cells.
- One cell is supported by only one gNB-DU.
- the gNB-DU terminates the F1 interface 198 connected with the gNB-CU.
- the DU 195 is considered to include the transceiver 160, for example, as part of a RU, but some examples of this may have the transceiver 160 as part of a separate RU, for example, under control of and connected to the DU 195.
- the RAN node 170 may also be an eNB (evolved NodeB) base station, for example, long term evolution (LTE), or any other suitable base station or node.
- the RAN node 170 includes one or more processors 152, one or more memories 155, one or more network interfaces (N/W I/F(s)) 161, and one or more transceivers 160 interconnected through one or more buses 157.
- Each of the one or more transceivers 160 includes a receiver, Rx, 162 and a transmitter, Tx, 163.
- the one or more transceivers 160 are connected to one or more antennas 158.
- the one or more memories 155 include computer program code 153.
- the CU 196 may include the processor(s) 152, memories 155, and network interfaces 161.
- the DU 195 may also include its own memory/memories and processor(s), and/or other hardware, but these are not shown.
- the RAN node 170 includes a module 150, comprising one of or both parts 150-1 and/or 150-2, which may be implemented in a number of ways.
- the module 150 may be implemented in hardware as module 150-1, such as being implemented as part of the one or more processors 152.
- the module 150-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
- the module 150 may be implemented as module 150-2, which is implemented as computer program code 153 and is executed by the one or more processors 152.
- the one or more memories 155 and the computer program code 153 are configured to, with the one or more processors 152, cause the RAN node 170 to perform one or more of the operations as described herein.
- the one or more network interfaces 161 communicate over a network such as via the links 176 and 131.
- Two or more gNBs 170 may communicate using, for example, link 176.
- the link 176 may be wired or wireless or both and may implement, for example, an Xn interface for 5G, an X2 interface for LTE, or other suitable interface for other standards.
- the one or more buses 157 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, wireless channels, and the like.
- the one or more transceivers 160 may be implemented as a remote radio head (RRH) 195 for LTE or a distributed unit (DU) 195 for gNB implementation for 5G, with the other elements of the RAN node 170 possibly being physically in a different location from the RRH/DU, and the one or more buses 157 could be implemented in part as, for example, fiber optic cable or other suitable network connection to connect the other elements (for example, a central unit (CU), gNB-CU) of the RAN node 170 to the RRH/DU 195.
- Reference 198 also indicates those suitable network link(s).
- the cell makes up part of a base station. That is, there can be multiple cells per base station. For example, there could be three cells for a single carrier frequency and associated bandwidth, each cell covering one-third of a 360-degree area so that the single base station’s coverage area covers an approximate oval or circle. Furthermore, each cell can correspond to a single carrier and a base station may use multiple carriers. So when there are three 120-degree cells per carrier and two carriers, then the base station has a total of 6 cells.
- the wireless network 100 may include a network element or elements 190 that may include core network functionality, and which provides connectivity via a link or links 181 with a further network, such as a telephone network and/or a data communications network (for example, the Internet).
- core network functionality for 5G may include access and mobility management function(s) (AMF(S)) and/or user plane functions (UPF(s)) and/or session management function(s) (SMF(s)).
- Such core network functionality for LTE may include MME (Mobility Management Entity)/SGW (Serving Gateway) functionality. These are merely example functions that may be supported by the network element(s) 190, and note that both 5G and LTE functions might be supported.
- the RAN node 170 is coupled via a link 131 to the network element 190.
- the link 131 may be implemented as, for example, an NG interface for 5G, or an S1 interface for LTE, or other suitable interface for other standards.
- the network element 190 includes one or more processors 175, one or more memories 171, and one or more network interfaces (N/W I/F(s)) 180, interconnected through one or more buses 185.
- the one or more memories 171 include computer program code 173.
- the one or more memories 171 and the computer program code 173 are configured to, with the one or more processors 175, cause the network element 190 to perform one or more operations.
- the wireless network 100 may implement network virtualization, which is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network.
- Network virtualization involves platform virtualization, often combined with resource virtualization.
- Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to software containers on a single system. Note that the virtualized entities that result from the network virtualization are still implemented, at some level, using hardware such as processors 152 or 175 and memories 155 and 171, and also such virtualized entities create technical effects.
- the computer readable memories 125, 155, and 171 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
- the computer readable memories 125, 155, and 171 may be means for performing storage functions.
- the processors 120, 152, and 175 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non-limiting examples.
- the processors 120, 152, and 175 may be means for performing functions, such as controlling the UE 110, RAN node 170, network element(s) 190, and other functions as described herein.
- the various embodiments of the user equipment 110 can include, but are not limited to, cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
- modules 140-1, 140-2, 150-1, and 150-2 may be caused to perform: overfitting shared multipliers.
- Computer program code 173 may also be caused to perform: overfitting shared multipliers.
- FIGs. 9 to 13 include flowcharts of an apparatus (e.g. 50, 700, or any other apparatuses described herein), method, and computer program product according to certain example embodiments. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory (e.g. the one or more memories 704) of an apparatus employing an embodiment and executed by a processor (e.g. the one or more processors 702).
- any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks.
- These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the function specified in the flowchart blocks.
- the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
- a computer program product is therefore defined in those instances in which the computer program instructions, such as computer-readable program code portions, are stored by at least one non-transitory computer-readable storage medium with the computer program instructions, such as the computer-readable program code portions, being configured, upon execution, to perform the functions described above, such as in conjunction with the flowchart(s) of FIGs. 9 to 13.
- the computer program instructions, such as the computer-readable program code portions need not be stored or otherwise embodied by a non-transitory computer-readable storage medium, but may, instead, be embodied by a transitory medium with the computer program instructions, such as the computer-readable program code portions, still being configured, upon execution, to perform the functions described above.
- blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions. [00224] In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
- references to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, and the like.
- circuitry may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even when the software or firmware is not physically present.
- This description of ‘circuitry’ applies to uses of this term in this application.
- circuitry would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware.
- circuitry would also cover, for example and when applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device.
- Circuitry or Circuit: As used in this application, the term ‘circuitry’ or ‘circuit’ may refer to one or more or all of the following:
- any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions
- hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
- circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware.
- circuitry also covers, for example, and when applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
Abstract
Various embodiments provide methods, apparatuses, and computer program products. An example apparatus includes at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
Description
OVERFITTING SHARED MULTIPLIERS
TECHNICAL FIELD
[001] The examples and non-limiting embodiments relate generally to multimedia transport and neural networks, and more particularly, to method, apparatus, and computer program product for overfitting shared multipliers.
BACKGROUND
[002] It is known to provide video encoding and decoding.
SUMMARY
[003] Example 1 : An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers; and multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
[004] Example 2: An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
[005] Example 3: The apparatus of example 1 or 2, wherein K is a number of different values that the C multipliers comprise, and wherein the values of the C multipliers are grouped into K groups, and wherein multipliers within each group comprise same value and multipliers within different groups comprise different values.
[006] Example 4: The apparatus of any of the examples 1 to 3, wherein a number of the C multipliers in the different groups is same or approximately the same; or the number of multipliers in the different groups are different.
[007] Example 5: The apparatus of any of the examples 1 to 4, wherein the number K of the different values of the C multipliers is same for all layers of the neural network; or the number K of the different values of multipliers is different for different layers of the neural network.
[008] Example 6: The apparatus of any of the examples 1 to 5, wherein the apparatus is caused to perform: assigning of the C multipliers to the different groups.
[009] Example 7: The apparatus of example 6, wherein the assigning is predetermined based on a grouping operation and is same for any content or video sequence on which overfitting is performed.
[0010] Example 8: The apparatus of example 6, wherein the apparatus is further caused to perform the grouping operation, and wherein to perform the grouping operation the apparatus is further caused to perform: determining or assuming an order of the C channels and/or the C multipliers that are multiplied with the channels; and assigning nearby or consecutive multipliers to the same group, such that all the multipliers that belong to a certain group appear in a consecutive sequence within an array or other data structure that include the C multipliers.
[0011] Example 9: The apparatus of example 7 or 8, wherein the apparatus is further caused to perform: signaling, to a decoder, an indication of the grouping operation, and/or an indication of the order of the C channels or the C multipliers.
[0012] Example 10: The apparatus of example 6, wherein the assigning is determined based on a content on which the C multipliers are overfitted and on a grouping operation.
[0013] Example 11: The apparatus of any of the examples 7 to 9, wherein, during an overfitting operation, multipliers in same group are constrained to comprise same value, and wherein the apparatus is further caused to perform: in response to the overfitting operation, signaling K values of multiplier updates or K values of updated multipliers to the decoder.
[0014] Example 12: The apparatus of examples 6 or 10, wherein during an overfitting operation, the C multipliers are overfitted without constraining the values of the C multipliers or without considering the C multipliers to belong to any groups, and wherein the apparatus is further caused to perform: updating of the C multipliers, based on the overfitting operation; and applying a clustering operation to the values of the C updated multipliers or to the C multiplier-updates by using a number K of clusters, wherein values of the C updated multipliers or values of the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a value of a multiplier-update.
[0015] Example 13: The apparatus of example 12, wherein the apparatus is further caused to perform: signaling the K values of multiplier-updates or K values of updated multipliers to a decoder, together with an indication of assignment of each value to the respective multiplier.
[0016] Example 14: The apparatus of example 12, wherein the apparatus is further caused to perform: signaling values of the C multiplier-updates or the C updated multipliers, wherein the C values are in the set of cardinality K, and wherein the C values are ordered according to an assumed order of the channels of the tensor and/or the order of multipliers that multiply the channels of the tensor.
[0017] Example 15: The apparatus of any of the examples 11, 13, or 14, wherein the apparatus is further caused to perform: quantizing the values of the C multiplier-updates, the values of the C updated multipliers, the K values of multiplier updates, and/or the K values of updated multipliers prior to signaling.
[0018] Example 16: The apparatus of any of the previous examples, wherein the layer comprises a convolution layer of the neural network.
[0019] Example 17: An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is a number of different values that the C multipliers comprise; receiving K values of multiplier updates or K values of updated multipliers; and updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
[0020] Example 18: The apparatus of example 17, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
[0021] Example 19: An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with an information of assignment of each value to respective multiplier; and using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to respective multiplier.
[0022] Example 20: The apparatus of example 19, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or
K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number smaller than C.
[0023] Example 21: The apparatus of any of the examples 19 or 20, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
[0024] Example 22: An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K; and using the values of C multiplier updates or values of C updated multipliers to update the C multipliers of the layer.
[0025] Example 23: The apparatus of example 22, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number different from C.
[0026] Example 24: The apparatus of any of the examples 22 or 23, wherein the C multiplier updates or the C updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized C multiplier updates or the C updated multipliers.
[0027] Example 25: A method comprising: setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers; and multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
[0028] Example 26: A method comprising: multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
[0029] Example 27: The method of example 25 or 26, wherein K is a number of different values that the C multipliers comprise, and wherein the values of the C multipliers are grouped into K
groups, and wherein multipliers within each group comprise same value and multipliers within different groups comprise different values.
[0030] Example 28: The method of any of the examples 25 to 27, wherein a number of the C multipliers in the different groups is same or approximately the same; or the number of multipliers in the different groups are different.
[0031] Example 29: The method of any of the examples 25 to 28, wherein the number K of the different values of the C multipliers is same for all layers of the neural network; or the number K of the different values of multipliers is different for different layers of the neural network.
[0032] Example 30: The method of any of the examples 25 to 29 further comprising: assigning of the C multipliers to the different groups.
[0033] Example 31: The method of example 30, wherein the assigning is predetermined based on a grouping operation and is same for any content or video sequence on which overfitting is performed.
[0034] Example 32: The method of example 30 further comprising performing the grouping operation, and wherein performing the grouping operation comprises: determining or assuming an order of the C channels and/or the C multipliers that are multiplied with the channels; and assigning nearby or consecutive multipliers to the same group, such that all the multipliers that belong to a certain group appear in a consecutive sequence within an array or other data structure that include the C multipliers.
[0035] Example 33: The method of example 31 or 32 further comprising: signaling, to a decoder, an indication of the grouping operation, and/or an indication of the order of the C channels or the C multipliers.
[0036] Example 34: The method of example 30, wherein the assigning is determined based on a content on which the C multipliers are overfitted and on a grouping operation.
[0037] Example 35: The method of any of the examples 31 to 33, wherein, during an overfitting operation, multipliers in same group are constrained to comprise same value, and wherein the method further comprises: in response to the overfitting operation, signaling K values of multiplier updates or K values of updated multipliers to the decoder.
[0038] Example 36: The method of example 30 or 34, wherein during an overfitting operation, the C multipliers are overfitted without constraining the values of the C multipliers or without considering the C multipliers to belong to any groups, and wherein the method further comprises: updating of the C multipliers, based on the overfitting operation; and applying a clustering operation to
the values of the C updated multipliers or to the C multiplier-updates by using a number K of clusters, wherein values of the C updated multipliers or values of the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a value of a multiplier-update.
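As a non-limiting sketch of the clustering described in example 36, the following minimal 1-D k-means (Lloyd's iteration) clusters hypothetical, freely overfitted multiplier-updates into K centroid values, so that only K values need to be signaled:

```python
import numpy as np

def kmeans_1d(values, k, iters=50):
    """Minimal 1-D Lloyd's k-means: cluster scalar values into k centroids."""
    centroids = np.linspace(values.min(), values.max(), k)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        labels = np.abs(values[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned values.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = values[labels == j].mean()
    return centroids, labels

# Hypothetical: C = 8 freely overfitted multiplier-updates, clustered into K = 2 values.
updates = np.array([0.11, 0.09, 0.10, 0.12, -0.05, -0.04, -0.06, -0.05])
centroids, labels = kmeans_1d(updates, k=2)

# Each update is replaced by its centroid: only K values (and the assignment)
# need to be signaled to the decoder.
shared_updates = centroids[labels]
```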
[0039] Example 37: The method of example 36 further comprising: signaling the K values of multiplier-updates or K values of updated multipliers to a decoder, together with an indication of assignment of each value to respective multiplier items sharing the same properties.
[0040] Example 38: The method of example 36 further comprising: signaling values of the C multiplier-updates or the C updated multipliers, wherein the C values are in the set of cardinality K, and wherein the C values are ordered according to an assumed order of the channels of the tensor and/or the order of multipliers that multiply the channels of the tensor.
[0041] Example 39: The method of any of the examples 35, 37, or 38 further comprising: quantizing the values of the C multiplier-updates, the values of the C updated multipliers, the K values of multiplier updates, and/or the K values of updated multipliers prior to signaling.
[0042] Example 40: The method of any of the examples 25 to 39, wherein the layer comprises a convolution layer of the neural network.
[0043] Example 41: A method comprising: receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is the number of different values that the C multipliers comprise; receiving K values of multiplier updates or K values of updated multipliers; and updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
[0044] Example 42: The method of example 41, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
[0045] Example 43: A method comprising: receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with information of assignment of each value to respective multiplier; and using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to respective multiplier.
[0046] Example 44: The method of example 43, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number smaller than C.
[0047] Example 45: The method of example 43 or 44, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
[0048] Example 46: A method comprising: receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K; and using the values of C multiplier updates or values of C updated multipliers to update the C multipliers of the layer.
[0049] Example 47: The method of example 46, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number different from C.
[0050] Example 48: The method of example 46 or 47, wherein the C multiplier updates or the C updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized C multiplier updates or the C updated multipliers.
[0051] Example 49: A computer readable medium comprising program instructions which, when executed by an apparatus, cause the apparatus to perform the methods as described in any of the examples 25 to 48.
[0052] Example 50: An apparatus comprising means for performing methods as described in any of the examples 25 to 48.
BRIEF DESCRIPTION OF THE DRAWINGS
[0053] The foregoing embodiments and other features are explained in the following description, taken in connection with the accompanying drawings, wherein:
[0054] FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.
[0055] FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.
[0056] FIG. 3 shows a block diagram of a general structure of a video encoder.
[0057] FIG. 4 illustrates a pipeline of video coding for machines (VCM).
[0058] FIG. 5 illustrates a pipeline where a decoder-side neural network (DSNN) is overfitted based at least on an overfitting signal, obtaining an overfitted DSNN.
[0059] FIG. 6 illustrates that the overfitted DSNN may be used for its purpose, such as for filtering input data.
[0060] FIG. 7 is an example apparatus, which may be implemented in hardware, and is caused to implement examples described herein.
[0061] FIG. 8 shows a representation of an example of non-volatile memory media used to store instructions that implement the examples described herein.
[0062] FIG. 9 is an example method to implement the embodiments described herein, in accordance with another embodiment.
[0063] FIG. 10 is another example method to implement the embodiments described herein, in accordance with another embodiment.
[0064] FIG. 11 is an example method to implement the embodiments described herein, in accordance with yet another embodiment.
[0065] FIG. 12 is another example method to implement the embodiments described herein, in accordance with still another embodiment.
[0066] FIG. 13 is another example method to implement the embodiments described herein, in accordance with still another embodiment.
[0067] FIG. 14 is a block diagram of one possible and non-limiting system in which the example embodiments may be practiced.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
[0068] The following acronyms and abbreviations that may be found in the specification and/or the drawing figures are defined as follows:
4CC four character code
5G fifth generation cellular network technology
5GC 5G core network
a.k.a. also known as
AVC advanced video coding
CU central unit
DSP digital signal processor
DU distributed unit
eNB (or eNodeB) evolved Node B (for example, an LTE base station)
EN-DC E-UTRA-NR dual connectivity
en-gNB or En-gNB node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
E-UTRA evolved universal terrestrial radio access, for example, the LTE radio access technology
F1 or F1-C interface between CU and DU control interface
gNB (or gNodeB) base station for 5G/NR, for example, a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
IEC International Electrotechnical Commission
IoT internet of things
ISO International Organization for Standardization
ISOBMFF ISO base media file format
JPEG joint photographic experts group
LTE long-term evolution
mdat MediaDataBox
MME mobility management entity
moov MovieBox
MP4 file format for MPEG-4 Part 14 files
MPEG moving picture experts group
MPEG-2 H.222/H.262 as defined by the ITU
MPEG-4 audio and video coding standard for ISO/IEC 14496
ng or NG new generation
ng-eNB or NG-eNB new generation eNB
NR new radio (5G radio)
N/W or NW network
PDCP packet data convergence protocol
PHY physical layer
PNG portable network graphics
RAN radio access network
RFC request for comments
RLC radio link control
RRC radio resource control
RRH remote radio head
RU radio unit
Rx receiver
SDAP service data adaptation protocol
SGW serving gateway
SMF session management function
SPS sequence parameter set
SVC scalable video coding
S1 interface between eNodeBs and the EPC
trak TrackBox
Tx transmitter
UE user equipment
UICC Universal Integrated Circuit Card
UPF user plane function
URL uniform resource locator
X2 interconnecting interface between two eNodeBs in LTE network
Xn interface between two NG-RAN nodes
[0069] Some embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all, embodiments are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms ‘data,’ ‘content,’ ‘information,’ and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments.
[0070] Additionally, as used herein, the term ‘circuitry’ refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even when the software or firmware is not physically present. This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
[0071] As defined herein, a ‘computer-readable storage medium,’ which refers to a non- transitory physical storage medium (e.g., volatile or non-volatile memory device), can be differentiated from a ‘computer-readable transmission medium,’ which refers to an electromagnetic signal.
[0072] A method, apparatus and computer program product are provided in accordance with example embodiments for overfitting shared multipliers.
[0073] In an example, the following describes in detail suitable apparatus and possible mechanisms for overfitting shared multipliers. In this regard reference is first made to FIG. 1 and FIG. 2, where FIG. 1 shows an example block diagram of an apparatus 50. The apparatus may be an internet of things (IoT) apparatus configured to perform various functions, for example, gathering information by one or more sensors, receiving or transmitting information, analyzing information gathered or received by the apparatus, or the like. The apparatus may comprise a video coding system, which may incorporate a codec. FIG. 2 shows a layout of an apparatus according to an example embodiment. The elements of FIG. 1 and FIG. 2 will be explained next.
[0074] The apparatus 50 may, for example, be a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or a lower power device. However, it would be appreciated that embodiments of the examples described herein may be implemented within any electronic device or apparatus which may process data by neural networks.
[0075] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 may further comprise a display 32, for example, in the form of a liquid crystal display, light emitting diode display, organic light emitting diode display, and the like. In other embodiments of the examples described herein the display may be any suitable display technology suitable to display media or multimedia content, for example, an image or a video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the examples described herein any suitable data or user interface mechanism may be employed. For example, the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
[0076] The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the examples described herein may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery (or in other embodiments of the examples described herein the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera 42 capable of recording or capturing images and/or video. The apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other
embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
[0077] The apparatus 50 may comprise a controller 56, a processor or a processor circuitry for controlling the apparatus 50. The controller 56 may be connected to a memory 58 which in embodiments of the examples described herein may store both data in the form of an image, audio data and video data, and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio, image and/or video data or assisting in coding and/or decoding carried out by the controller.
[0078] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example, a universal integrated circuit card (UICC) and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
[0079] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals, for example, for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
[0080] The apparatus 50 may comprise a camera 42 capable of recording or detecting individual frames which are then passed to the codec circuitry 54 or the controller for processing. The apparatus may receive the video image data for processing from another device prior to transmission and/or storage. The apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding. The structural elements of apparatus 50 described above represent examples of means for performing a corresponding function.
[0081] FIG. 3 shows a block diagram of a general structure of a video encoder. FIG. 3 presents an encoder for two layers, but it would be appreciated that the presented encoder could be similarly extended to encode more than two layers. FIG. 3 illustrates a video encoder comprising a first encoder section 301 for a base layer and a second encoder section 351 for an enhancement layer. Each of the first encoder section 301 and the second encoder section 351 may comprise similar elements for encoding incoming pictures. The encoder sections 301, 351 may comprise a pixel predictor 302, 352, a prediction error encoder 303, 353, and a prediction error decoder 304, 354. FIG. 3 also shows an embodiment of the pixel predictor 302, 352 as comprising an inter-predictor 306, 356, an intra-predictor 308, 358, a mode selector 310, 360, a filter 316, 366, and a reference frame memory 318, 368. The pixel
predictor 302 of the first encoder section 301 receives base layer picture(s)/image(s) 300 of a video stream to be encoded at both the inter-predictor 306 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture). The outputs of both the inter-predictor and the intra-predictor are passed to the mode selector 310. The intra-predictor 308 may have more than one intra-prediction mode. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 310. The mode selector 310 also receives a copy of the base layer image(s) 300. Correspondingly, the pixel predictor 352 of the second encoder section 351 receives enhancement layer picture(s)/image(s) 350 of a video stream to be encoded at both the inter-predictor 356 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 358 (which determines a prediction for an image block based only on the already processed parts of the current frame or picture). The outputs of both the inter-predictor and the intra-predictor are passed to the mode selector 360. The intra-predictor 358 may have more than one intra-prediction mode. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 360. The mode selector 360 also receives a copy of the enhancement layer pictures 350.
[0082] Depending on which encoding mode is selected to encode the current block, the output of the inter-predictor 306, 356 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 360. The output of the mode selector 310, 360 is passed to a first summing device 321, 371. The first summing device may subtract the output of the pixel predictor 302, 352 from the base layer image(s) 300/enhancement layer image(s) 350 to produce a first prediction error signal 320, 370 which is input to the prediction error encoder 303, 353.
[0083] The pixel predictor 302, 352 further receives from a preliminary reconstructor 339, 389 the combination of the prediction representation of the image block 312, 362 and the output 338, 388 of the prediction error decoder 304, 354. The preliminary reconstructed image 314, 364 may be passed to the intra-predictor 308, 358 and to the filter 316, 366. The filter 316, 366 receiving the preliminary representation may filter the preliminary representation and output a final reconstructed image 340, 490 which may be saved in the reference frame memory 318, 368. The reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer image 300 is compared in inter-prediction operations. Subject to the base layer being selected and indicated to be source for inter-layer sample prediction and/or inter-layer motion information prediction of the enhancement layer according to some embodiments, the reference frame memory 318 may also be connected to the inter-predictor 356 to be used as the reference image against which a future enhancement layer image(s) 350 is compared in inter-prediction operations. Moreover, the reference
frame memory 368 may be connected to the inter-predictor 356 to be used as the reference image against which the future enhancement layer image(s) 350 is compared in inter-prediction operations.
[0084] Filtering parameters from the filter 316 of the first encoder section 301 may be provided to the second encoder section 351 subject to the base layer being selected and indicated to be source for predicting the filtering parameters of the enhancement layer according to some embodiments.
[0085] The prediction error encoder 303, 353 comprises a transform unit 342, 492 and a quantizer 344, 494. The transform unit 342, 492 transforms the first prediction error signal 320, 370 to a transform domain. The transform is, for example, the DCT transform. The quantizer 344, 494 quantizes the transform domain signal, for example, the DCT coefficients, to form quantized coefficients.
[0086] The prediction error decoder 304, 354 receives the output from the prediction error encoder 303, 353 and performs the opposite processes of the prediction error encoder 303, 353 to produce a decoded prediction error signal 338, 388 which, when combined with the prediction representation of the image block 312, 362 at the second summing device 339, 389, produces the preliminary reconstructed image 314, 364. The prediction error decoder may be considered to comprise a dequantizer 346, 496, which dequantizes the quantized coefficient values, for example, DCT coefficients, to reconstruct the transform signal and an inverse transformation unit 348, 498, which performs the inverse transformation to the reconstructed transform signal wherein the output of the inverse transformation unit 348, 498 includes reconstructed block(s). The prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.
[0087] The entropy encoder 330, 380 receives the output of the prediction error encoder 303, 353 and may perform a suitable entropy encoding/variable length encoding on the signal to provide a compressed signal. The outputs of the entropy encoders 330, 380 may be inserted into a bitstream, for example, by a multiplexer 365.
[0088] The one or more apparatuses described in FIGs 1 to 3 may be caused to perform overfitting of shared multipliers.
[0089] Fundamentals of neural networks
[0090] A neural network (NN) is a computation graph consisting of several layers of computation. Each layer consists of one or more units, where each unit performs a computation. A unit is connected to one or more other units, and a connection may be associated with a weight. The weight may be used for scaling the signal passing through an associated connection. Weights are learnable parameters, for
example, values which can be learned from training data. There may be other learnable parameters, such as those of batch-normalization layers.
[0091] A couple of examples of architectures for neural networks are feed-forward and recurrent architectures. Feed-forward neural networks are such that there is no feedback loop: each layer takes input from one or more of the previous layers and provides its output as the input for one or more of the subsequent layers. Also, units inside a certain layer take input from units in one or more of the preceding layers and provide output to one or more of the following layers.
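As an illustrative sketch only (the layer sizes and the activation function are arbitrary choices, not part of the description above), a feed-forward pass with two layers may be expressed as follows:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Minimal feed-forward sketch: each layer feeds only subsequent layers.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 4)), np.zeros(16)   # learnable weights of layer 1
W2, b2 = rng.standard_normal((3, 16)), np.zeros(3)    # learnable weights of layer 2

x = rng.standard_normal(4)                            # input signal
h = relu(W1 @ x + b1)                                 # first layer output
y = W2 @ h + b2                                       # network output (no feedback loop)
```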
[0092] Initial layers, those close to the input data, extract semantically low-level features, for example, edges and textures in images, and intermediate and final layers extract more high-level features. After the feature extraction layers there may be one or more layers performing a certain task, for example, classification, semantic segmentation, object detection, denoising, style transfer, super-resolution, and the like. In recurrent neural networks, there is a feedback loop, so that the neural network becomes stateful, for example, it is able to memorize information or a state.
[0093] Neural networks are being utilized in an ever-increasing number of applications for many different types of devices, for example, mobile phones, chat bots, IoT devices, smart cars, voice assistants, and the like. Some of these applications include, but are not limited to, image and video analysis and processing, social media data analysis, device usage data analysis, and the like.
[0094] One of the properties of neural networks, and other machine learning tools, is that they are able to learn properties from input data, either in a supervised way or in an unsupervised way. Such learning is a result of a training algorithm, or of a meta-level neural network providing the training signal.
[0095] In general, the training algorithm consists of changing some properties of the neural network so that its output is as close as possible to a desired output. For example, in the case of classification of objects in images, the output of the neural network can be used to derive a class or category index which indicates the class or category that the object in the input image belongs to. Training usually happens by minimizing or decreasing the output error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, and the like. In recent deep learning techniques, training is an iterative process, where at each iteration the algorithm modifies the weights of the neural network to make a gradual improvement in the network’s output, for example, gradually decrease the loss.
[0096] Training a neural network is an optimization process, but the final goal is different from the typical goal of optimization. In optimization, the only goal is to minimize a function. In machine
learning, the goal of the optimization or training process is to make the model learn the properties of the data distribution from a limited training dataset. In other words, the goal is to use a limited training dataset in order to learn to generalize to previously unseen data, for example, data which was not used for training the model. This is usually referred to as generalization. In practice, data is usually split into at least two sets, the training set and the validation set. The training set is used for training the network, for example, to modify its learnable parameters in order to minimize the loss. The validation set is used for checking the performance of the network on data which was not used to minimize the loss, as an indication of the final performance of the model. In particular, the errors on the training set and on the validation set are monitored during the training process to understand the following:
- when the network is learning at all - in this case, the training set error should decrease, otherwise the model is in the regime of underfitting.
- when the network is learning to generalize - in this case, the validation set error also needs to decrease and should not be too much higher than the training set error. For example, the validation set error should be less than 20% higher than the training set error. If the training set error is low, for example 10% of its value at the beginning of training, or low with respect to a threshold that may have been determined based on an evaluation metric, but the validation set error is much higher than the training set error, or it does not decrease, or it even increases, the model is in the regime of overfitting. This means that the model has merely memorized properties of the training set and performs well only on that set, but performs poorly on a set not used for training or tuning of its parameters. A minimal sketch of these checks follows this list.
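The sketch below encodes the monitoring heuristics described above; the function name, thresholds, and example values are illustrative assumptions, not prescribed quantities:

```python
def training_regime(train_err, val_err, initial_train_err, tol=0.20):
    """Heuristic check of the regimes described above (thresholds are illustrative)."""
    if train_err >= initial_train_err:
        return "underfitting: training error is not decreasing"
    if val_err > (1.0 + tol) * train_err:
        return "overfitting: validation error is much higher than training error"
    return "generalizing: both errors decrease and stay close"

# Example: training error fell to 10% of its initial value, validation stayed close.
print(training_regime(train_err=0.10, val_err=0.11, initial_train_err=1.0))
```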
[0097] Lately, neural networks have been used for compressing and de-compressing data such as images. The most widely used architecture for such a task is the auto-encoder, which is a neural network consisting of two parts: a neural encoder and a neural decoder. In various embodiments, the neural encoder and neural decoder are referred to as encoder and decoder, even though these refer to algorithms which are learned from data instead of being tuned manually. The encoder takes an image as an input and produces a code, to represent the input image, which requires fewer bits than the input image. This code may have been obtained by a binarization or quantization process after the encoder. The decoder takes in this code and reconstructs the image which was input to the encoder.
[0098] Such encoder and decoder are usually trained to minimize a combination of bitrate and distortion, where the distortion may be based on one or more of the following metrics: mean squared error (MSE), peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), or the like. These distortion metrics are meant to be correlated to the human visual perception quality, so that minimizing or maximizing one or more of these distortion metrics results into improving the visual quality of the decoded image as perceived by humans.
[0099] In various embodiments, terms ‘model’, ‘neural network’, ‘neural net’ and ‘network’ may be used interchangeably, and also the weights of neural networks may be sometimes referred to as learnable parameters or as parameters.
[00100] Fundamentals of video/image coding
[00101] A video codec includes an encoder that transforms the input video into a compressed representation suited for storage/transmission and a decoder that can decompress the compressed video representation back into a viewable form. Typically, an encoder discards some information in the original video sequence in order to represent the video in a more compact form, for example, at a lower bitrate.
[00102] Typical hybrid video codecs, for example ITU-T H.263 and H.264, encode the video information in two phases. Firstly, pixel values in a certain picture area (or ‘block’) are predicted, for example, by motion compensation means or circuits (by finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means or circuits (by using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, e.g. the difference between the predicted block of pixels and the original block of pixels, is coded. This is typically done by transforming the difference in pixel values using a specified transform (e.g. discrete cosine transform (DCT) or a variant of it), quantizing the coefficients and entropy coding the quantized coefficients. By varying the fidelity of the quantization process, the encoder may control the balance between the accuracy of the pixel representation (e.g., picture quality) and the size of the resulting coded video representation (e.g., file size or transmission bitrate).
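A minimal sketch of these two phases, assuming a trivial stand-in predictor and a uniform quantizer (block size, values, and step size are hypothetical):

```python
import numpy as np
from scipy.fft import dctn, idctn

# Two-phase hybrid coding sketch: predict a block, then transform-code the residual.
block = np.random.randint(0, 256, (8, 8)).astype(float)
prediction = np.full((8, 8), block.mean())            # stand-in for intra/inter prediction

residual = block - prediction                         # phase 2: prediction error
coeffs = dctn(residual, norm='ortho')                 # DCT of the residual
qstep = 8.0                                           # quantization step controls fidelity
quantized = np.round(coeffs / qstep)

# Decoder side: dequantize, inverse-transform, add back the prediction.
reconstructed = prediction + idctn(quantized * qstep, norm='ortho')
```

A larger quantization step lowers the bitrate at the cost of a less accurate reconstruction, matching the fidelity trade-off described above.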
[00103] Inter prediction, which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, exploits temporal redundancy. In inter prediction the sources of prediction are previously decoded pictures, e.g., reference pictures.
[00104] In temporal inter prediction, the sources of prediction are previously decoded pictures in the same scalable layer. In intra block copy (IBC), e.g., intra-block-copy prediction, prediction may be applied similarly to temporal inter prediction, but the reference picture is the current picture and only previously decoded samples may be referred to in the prediction process. Inter-layer or inter-view prediction may be applied similarly to temporal inter prediction, but the reference picture is a decoded picture from another scalable layer or from another view, respectively. In some cases, inter prediction may refer to temporal inter prediction only, while in other cases inter prediction may refer collectively to temporal inter prediction and any of intra block copy, inter-layer prediction, and inter-view prediction provided that they are performed with the same or similar process as temporal prediction. Inter
prediction, temporal inter prediction, or temporal prediction may sometimes be referred to as motion compensation or motion-compensated prediction.
[00105] Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated. Intra prediction can be performed in the spatial or transform domain, for example, either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra coding, where no inter prediction is applied.
[00106] One example outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently when they are predicted first from spatially or temporally neighboring parameters. For example, a motion vector may be predicted from spatially adjacent motion vectors and only the difference relative to the motion vector predictor may be coded. Prediction of coding parameters and intra prediction may be collectively referred to as in-picture prediction.
[00107] The decoder reconstructs the output video by applying prediction techniques similar to the encoder to form a predicted representation of the pixel blocks, for example, using the motion or spatial information created by the encoder and stored in the compressed representation, and prediction error decoding, which is the inverse operation of the prediction error coding, recovering the quantized prediction error signal in the spatial pixel domain. After applying prediction and prediction error decoding techniques, the decoder sums up the prediction and prediction error signals, for example, pixel values, to form the output video frame. The decoder and encoder can also apply additional filtering techniques to improve the quality of the output video before passing it for display and/or storing it as a prediction reference for the forthcoming frames in the video sequence.
[00108] In typical video codecs, the motion information is indicated with motion vectors associated with each motion compensated image block. Each of these motion vectors represents the displacement between the image block in the picture to be coded (in the encoder side) or decoded (in the decoder side) and the prediction source block in one of the previously coded or decoded pictures.
[00109] In order to represent motion vectors efficiently, the motion vectors are typically coded differentially with respect to block specific predicted motion vectors. In typical video codecs, the predicted motion vectors are created in a predefined way, for example, calculating the median of the encoded or decoded motion vectors of the adjacent blocks.
[00110] Another way to create motion vector predictions is to generate a list of candidate predictions from adjacent blocks and/or co-located blocks in temporal reference pictures and signaling the chosen candidate as the motion vector predictor. In addition to predicting the motion vector values,
the reference index of a previously coded/decoded picture can be predicted. The reference index is typically predicted from adjacent blocks and/or co-located blocks in a temporal reference picture.
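As an illustrative sketch of median-based motion vector prediction and differential coding (the motion vectors are hypothetical):

```python
import numpy as np

# Differential motion-vector coding sketch: predict from neighbours, code the difference.
neighbour_mvs = np.array([[4, -2], [6, -1], [5, -3]])     # MVs of adjacent blocks
predictor = np.median(neighbour_mvs, axis=0)              # componentwise median -> [5, -2]

actual_mv = np.array([7, -2])
mvd = actual_mv - predictor                               # only this difference is coded

# Decoder forms the same predictor and adds the decoded difference.
decoded_mv = predictor + mvd
assert (decoded_mv == actual_mv).all()
```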
[00111] Moreover, typical high efficiency video codecs employ an additional motion information coding/decoding mechanism, often called merging/merge mode, where all the motion field information, which includes a motion vector and a corresponding reference picture index for each available reference picture list, is predicted and used without any modification/correction. Similarly, predicting the motion field information is carried out using the motion field information of adjacent blocks and/or co-located blocks in temporal reference pictures, and the used motion field information is signaled among a motion field candidate list filled with the motion field information of available adjacent/co-located blocks.
[00112] In typical video codecs, the prediction residual after motion compensation is first transformed with a transform kernel, for example, the DCT, and then coded. The reason for this is that often there still exists some correlation within the residual, and the transform can in many cases help reduce this correlation and provide more efficient coding.
[00113] Typical video encoders utilize Lagrangian cost functions to find optimal coding modes, for example, the desired macroblock mode and associated motion vectors. This kind of cost function uses a weighting factor λ to tie together the exact or estimated image distortion due to lossy coding methods and the exact or estimated amount of information that is required to represent the pixel values in an image area:

C = D + λR (equation 1)
[00114] In equation 1, C is the Lagrangian cost to be minimized, D is the image distortion, for example, the mean squared error with the mode and motion vectors considered, λ is the weighting factor, and R is the number of bits needed to represent the required data to reconstruct the image block in the decoder, including the amount of data to represent the candidate motion vectors.
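A worked toy example of equation 1 is given below; the candidate modes, their distortion and rate values, and λ are all hypothetical:

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian cost of equation 1: C = D + lambda * R."""
    return distortion + lam * rate_bits

# Illustrative mode decision: pick the candidate with the lowest cost.
candidates = {"intra": (120.0, 300), "inter": (150.0, 120)}   # mode -> (D, R)
lam = 0.5
best = min(candidates, key=lambda m: rd_cost(*candidates[m], lam))
# intra: 120 + 0.5*300 = 270; inter: 150 + 0.5*120 = 210 -> "inter" is selected
```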
[00115] Video coding specifications may enable the use of supplemental enhancement information (SEI) messages or alike. Some video coding specifications include SEI NAL units, and some video coding specifications include both prefix SEI NAL units and suffix SEI NAL units, where the former type may start a picture unit or alike and the latter type may end a picture unit or alike. An SEI NAL unit may include one or more SEI messages, which are not required for the decoding of output pictures but may assist in related processes, such as picture output timing, post-processing of decoded pictures, rendering, error detection, error concealment, and resource reservation. Several SEI messages are specified in H.264/AVC, H.265/HEVC, H.266/VVC, and H.274/VSEI standards, and the user data SEI messages enable organizations and companies to specify SEI messages for their own use. The standards
may contain the syntax and semantics for the specified SEI messages but a process for handling the messages in the recipient may not be defined. Consequently, encoders may be required to follow the standard specifying a SEI message when they create SEI message(s), and decoders may not be required to process SEI messages for output order conformance. One of the example reasons to include the syntax and semantics of SEI messages in standards is to allow different system specifications to interpret the supplemental information identically and hence interoperate. It is intended that system specifications may require the use of particular SEI messages both in the encoding end and in the decoding end, and additionally the process for handling particular SEI messages in the recipient may be specified.
[00116] Information on Video Coding for Machines (VCM)
[00117] Reducing the distortion in image and video compression is often intended to increase human perceptual quality, as humans are considered to be the end users, e.g. consuming or watching the decoded images or videos. Recently, with the advent of machine learning, especially deep learning, there is a rising number of machines (e.g., autonomous agents) that analyze or process data independently from humans and may even take decisions based on the analysis results without human intervention. Examples of such analysis are object detection, scene classification, semantic segmentation, video event detection, anomaly detection, pedestrian tracking, and the like. Example use cases and applications are self-driving cars, video surveillance cameras and public safety, smart sensor networks, smart TV and smart advertisement, person re-identification, smart traffic monitoring, drones, and the like. Accordingly, when decoded data is consumed by machines, a quality metric for the decoded data may be defined, which may be different from a quality metric for human perceptual quality. Also, dedicated algorithms for compressing and decompressing data for machine consumption may be different than those for compressing and decompressing data for human consumption. The set of tools and concepts for compressing and decompressing data for machine consumption is referred to here as Video Coding for Machines.
[00118] It is likely that the receiver-side device includes multiple ‘machines’ or neural networks (NNs). These multiple machines may be used in a certain combination which is, for example, determined by an orchestrator sub-system. The multiple machines may be used for example in succession, based on the output of the previously used machine, and/or in parallel. For example, a video which was compressed and then decompressed may be analyzed by one machine (NN) for detecting pedestrians, by another machine (another NN) for detecting cars, and by another machine (another NN) for estimating the depth of all the pixels in the frames.
[00119] Also, the term ’receiver-side’ or ‘decoder-side’ is used to refer to the physical or abstract entity or device which includes one or more machines, and runs these one or more machines on some encoded
and eventually decoded video representation which is encoded by another physical or abstract entity or device, the ‘encoder-side device’.
[00120] The encoded video data may be stored into a memory device, for example as a file. The stored file may later be provided to another device.
[00121] Alternatively, the encoded video data may be streamed from one device to another.
[00122] FIG. 4 illustrates a pipeline of video coding for machines (VCM). A VCM encoder 402 encodes the input video into a bitstream 404. A bitrate 406 may be computed 408 from the bitstream 404 in order to evaluate the size of the bitstream 404. A VCM decoder 410 decodes the bitstream 404 output by the VCM encoder 402. An output of the VCM decoder 410 may be referred to, for example, as decoded data for machines 412. This data may be considered as the decoded or reconstructed video. However, in some implementations of the pipeline of VCM, the decoded data for machines 412 may not have the same or similar characteristics as the original video which was input to the VCM encoder 402. For example, this data may not be easily understandable by a human, when the human watches the decoded video from a suitable output device such as a display. The output of the VCM decoder 410 is then input to one or more task neural networks (task-NNs). For the sake of illustration, FIG. 4 is shown to include three example task-NNs, a task-NN 414 for object detection, a task-NN 416 for image segmentation, a task-NN 418 for object tracking, and a non-specified one, a task-NN 420 for performing task X. The goal of VCM is to obtain a low bitrate while guaranteeing that the task-NNs still perform well in terms of the evaluation metric associated with each task.
[00123] When a conventional video encoder, such as a H.266/VVC encoder, is used as a VCM encoder, one or more of the following approaches may be used to adapt the encoding to be suitable to machine analysis tasks:
- One or more regions of interest (ROIs) may be detected. An ROI detection method may be used. For example, ROI detection may be performed using a task NN, such as an object detection NN. In some cases, ROI boundaries of a group of pictures or an intra period may be spatially overlaid and rectangular areas may be formed to cover the ROI boundaries. The detected ROIs (or rectangular areas, likewise) may be used in one or more of the following ways:
o The quantization parameter (QP) may be adjusted spatially in a manner that ROIs are encoded using finer quantization step size(s) than other regions. For example, QP may be adjusted CTU-wise, as illustrated by the sketch after this list.
o The video is preprocessed to contain only the ROIs, while the other areas are replaced by one or more constant values or removed.
o A grid is formed in a manner that a single grid cell covers a ROI. Grid rows or grid columns that contain no ROIs are downsampled as preprocessing to encoding.
- The quantization parameter of the highest temporal sublayer(s) is increased (i.e. coarser quantization is used) when compared to practices for human-watchable video.
- The original video is temporally downsampled as preprocessing prior to encoding. A frame rate upsampling method may be used as post-processing subsequent to decoding, when machine analysis at the original frame rate is desired.
- A filter is used to preprocess the input to the conventional encoder. The filter may be a machine learning based filter, such as a convolutional neural network.
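The QP-map sketch referenced above is as follows; the grid size, base QP, and ROI offset are illustrative assumptions:

```python
import numpy as np

# CTU-wise QP adjustment sketch: ROI CTUs get a finer quantization step.
base_qp, roi_qp_offset = 32, -6
ctu_grid = np.zeros((4, 6), dtype=int)        # 4 x 6 CTUs; 1 marks an ROI CTU
ctu_grid[1:3, 2:5] = 1                        # hypothetical detected ROI

qp_map = np.where(ctu_grid == 1, base_qp + roi_qp_offset, base_qp)
# ROI CTUs are encoded at QP 26 (finer quantization), the rest at QP 32.
```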
[00124] Neural network based filtering
[00125] In some video codecs, a neural network may be used as a filter in the decoding loop (also referred to simply as the coding loop), and it may be referred to as a neural network loop filter, or neural network in-loop filter. The NN loop filter may replace all other loop filters of an existing video codec, or may represent an additional loop filter with respect to the loop filters already present in an existing video codec.
[00126] In the context of image and video enhancement, a neural network may be used as a post-processing filter, for example applied to the output of an image or video decoder in order to remove or reduce coding artifacts.
[00127] For simplicity, the term neural network filter, or NN filter, refers to a filter that comprises one or more neural networks and is used either as a loop filter in the coding loop or as a post-processing filter.
[00128] It is also possible to have two NN filters in the set of in-loop filters.
[00129] It is also possible to have one or more NN in-loop filters and one or more NN post-processing filters.
[00130] The following example system will be used in several embodiments to illustrate or describe the idea. The example system comprises a codec that comprises one or more NN loop filters. For example, the codec could comprise a modified VVC/H.266 compliant codec (e.g., a VVC/H.266 compliant codec that has been modified so that it comprises one or more NN loop filters). The input to the one or more NN loop filters may comprise at least a reconstructed block or frame (simply referred to as reconstruction) or data derived from a reconstructed block or frame (e.g., the output of a conventional loop filter). The reconstruction may be obtained based on predicting a block or frame (e.g.,
by means of intra-frame prediction or inter-frame prediction) and performing residual compensation. The input to the one or more NN loop filters may also comprise other data that is associated or related to the reconstructed block or frame and may be referred to as auxiliary input data. The one or more NN loop filters (may be referred to simply as NN filters in some of the embodiments) may enhance the quality of at least one of their inputs, such as the quality of the input reconstructed block or frame, where the quality may be measured in terms of one or more quality metrics, so that a rate-distortion loss is decreased. The rate may indicate a bitrate (estimate or real) of the encoded video. The distortion may indicate a pixel fidelity distortion such as the following:
- mean-squared error (MSE);
- mean absolute error (MAE);
- mean average precision (mAP) computed based on the output of a task NN (such as an object detection NN) when the input is the output of the post-processing NN; and
- other machine task-related metrics, for tasks such as object tracking, video activity classification, video anomaly detection, and the like.
[00131] The enhancement may result in a coding gain, which can be expressed for example in terms of BD-rate, BD-PSNR, or BD-mAP.
[00132] However, at least some of the embodiments described herein are applicable to a NN filter which is not a loop filter of a codec. For example, the NN filter may be a NN post-processing filter, whose input may comprise one or more outputs of a video codec. In this case, the filter may be used only for increasing a quality metric for at least one of its inputs, where the quality metric may be, for example, peak signal-to-noise ratio (PSNR), mAP for object detection, MOTA for object tracking, and the like.
[00133] Information on overfitting a decoder-side neural network
[00134] Various embodiments consider the case where a data unit is encoded by an encoder and decoded by a decoder.
[00135] Some examples of data units are:
- One or more video sequences;
- One or more frames of one or more video sequences; or
- One or more blocks or regions of one or more frames.
[00136] The decoder or the receiver comprising the decoder is assumed to comprise at least a
neural network, referred to as decoder-side neural network (DSNN). Various embodiments consider the case where the DSNN is optimized for a certain data unit, with respect to one or more metrics.
[00137] Some examples of metrics are:
- Peak signal-to-noise ratio (PSNR);
- Mean-squared error (MSE); and
- Mean average precision (mAP).
[00138] Herein, the process of optimizing a NN on a certain data unit with respect to one or more metrics may be referred to as overfitting, specializing, or finetuning.
[00139] The NN which is optimized may have been pretrained (e.g., may have been previously trained on a training dataset).
[00140] Overfitting of a DSNN may be performed by computing a weight-update at encoder side by means of one or more training iterations, e.g., based on backpropagation. Then, the obtained weight-update is compressed by a neural network encoder and provided to a neural network decoder. The neural network encoder may be part of the encoder of data units, such as a video encoder; the neural network decoder may be part of the decoder of data units, such as a video decoder. The decoder decompresses the compressed weight-update, uses the decompressed weight-update for updating the DSNN, and uses the updated DSNN for its purpose, such as for decoding a data unit or for post-processing a decoded data unit; for example, the updated DSNN may be used for decoding a video frame or for post-processing a decoded video frame or data derived from the decoded video frame.
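A minimal end-to-end sketch of this flow, with toy parameter values and simple uniform quantization standing in for an actual neural network encoder/decoder (all names and numbers are illustrative):

```python
import numpy as np

# DSNN parameters known to both encoder and decoder sides.
pretrained = np.array([0.50, -0.20, 0.75])

# Encoder side: a few training iterations yield overfitted parameters.
overfitted = np.array([0.53, -0.22, 0.74])
weight_update = overfitted - pretrained               # only the update is transmitted

# Toy "compression": uniform quantization before signalling.
step = 0.01
bitstream = np.round(weight_update / step).astype(int)

# Decoder side: decompress the update and apply it to the pretrained DSNN.
decoded_update = bitstream * step
updated_dsnn = pretrained + decoded_update
```

Because the update values are typically small and sparse relative to the full parameter set, transmitting the (quantized) update rather than the updated parameters helps reduce the bitrate overhead discussed next.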
[00141] Sending a weight update from encoder to decoder may cause a bitrate increase, or bitrate overhead, with respect to the bitrate required for sending an encoded data unit (e.g., an encoded video).
[00142] An example problem addressed by one or more embodiments is how to reduce such bitrate overhead or, in other words, how to generate a low-bitrate bitstream representing a weight update.
[00143] General information
[00144] For the sake of simplicity, at least some embodiments are described herein as applied to a filter. A filter may take as input at least one or more first images (or blocks of an image) to be filtered and may output at least one or more second images (or blocks of an image), where the one or more second images may be the filtered version of the one or more first images. In an example, the filter takes as input one image and outputs one image. In another example, the filter takes as input more than one
image and outputs one image. In yet another example, the filter takes as input more than one image and outputs more than one image.
[00145] It is to be understood that a filter may take as input also other data (also referred to as auxiliary data) than the data that is to be filtered, such as data that may aid the filter to perform a better filtering than when no auxiliary data was provided as input. In an example, the auxiliary data comprises information about prediction data, and/or information about the picture type, and/or information about the slice type, and/or information about a Quantization Parameter (QP) used for encoding, and/or information about boundary strength, and the like. In an example, the filter takes as input one image and other data associated to that image, such as information about the quantization parameter (QP) used for quantizing and/or dequantizing that image, and outputs one image.
[00146] A filter may be a neural network based filter, or may be another type of filter. However, several embodiments describe training features which may be applicable to machine learning based filters such as NN filters.
[00147] A filter may be, for example, an in-loop filter that is used in the decoding loop of a codec, or a post-processing filter applied on the data decoded by the codec.
[00148] Even when at least some of the embodiments are described with reference to a filter, those embodiments may be applied also to other operations than just filters, such as an operation performing intra-frame prediction, or an operation performing inter-frame prediction, or an operation performing frame-rate upsampling, or an operation performing encoding and/or decoding (e.g., a neural network that is comprised in an end-to-end learned codec). The described embodiments may also be applied to components in an end-to-end learned image/video codec, for example, a decoder network, an optical flow estimation network, or a probability model neural network.
[00149] While at least some embodiments are described such that the input and output data are in the form of images or (video) frames or pictures, those embodiments may be applicable also to other types of data, such as audio frames. Furthermore, while at least some embodiments are described by considering a full image, those embodiments may be applicable also when considering one or more blocks or portions of an image.
[00150] It is to be noticed that, for simplicity, at least some of the figures do not include visual representations of other components that may be present at encoder side and/or at decoder side. In an example, in case the DSNN is a decoder neural network which is part of an end-to-end learned image codec, a figure may not include information about some of the encoder-side operations performed by the encoder of the end-to-end learned image codec, such as an encoder neural network to process an
input image into a latent tensor or a lossless encoder to encode a latent tensor into a bitstream, and/or may not include information about some of the decoder-side operations performed by the decoder of the end-to-end learned image codec, such as a lossless decoder that decodes a bitstream representing an encoded image. In another example, in case the DSNN is a neural network based loop filter that is part of a video decoding loop, a figure may not depict some of the video encoding operations and/or some of the video decoding operations.
[00151] Several embodiments are described herein (e.g., as illustrated in FIG. 5 and FIG. 6), where a decoder-side neural network (DSNN) is overfitted based at least on an overfitting signal, obtaining an overfitted DSNN. The aim of the overfitting process is that the overfitted DSNN performs better than the DSNN with respect to a predefined metric, on at least some input data.
[00152] Referring to FIG. 5, several embodiments are described herein, where a decoder-side neural network (DSNN) 502 is overfitted 504 based at least on an overfitting signal 506, obtaining an overfitted DSNN 508. An example objective of the overfitting process is that the overfitted DSNN performs better than the DSNN with respect to a predefined metric, on at least some input data.
[00153] Referring to FIG. 6, after a DSNN has been overfitted, the overfitted DSNN may be used (604) for its purpose, such as for filtering input data 602 to obtain filtered data 606.
[00154] A DSNN is a neural network which is used in a decoder or as a post-processing operation. Some examples of a DSNN used in a decoder are the following:
- An NN loop filter that is comprised in a conventional decoder or in an end-to-end learned codec;
- An NN post-filter that follows, in processing order, an inner decoder, such as a conventional decoder or an end-to-end learned codec; or
- An NN decoder, which is comprised in an end-to-end learned codec.
[00155] Several embodiments herein are described with reference to a DSNN that performs filtering of at least a portion of the input data. In some of the embodiments and examples, the at least portion of input data to be filtered by the DSNN is referred to as “data to be filtered”. However, the DSNN may take other inputs than the data to be filtered, and those other inputs may not be mentioned in the description of some embodiments or examples for the sake of simplicity. Even though several embodiments herein are described with reference to a DSNN that performs filtering, the same embodiments may be applicable to or valid for a DSNN that performs other purposes or tasks, such as a neural network decoder that is part of an end-to-end learned codec.
[00156] The DSNN may be pretrained, e.g., may have been previously trained on a training
dataset, or may be initialized in some other suitable way, such as by random or pseudo-random initialization.
[00157] In some of the embodiments described herein, the DSNN, or a copy of the DSNN, or a neural network which is same or substantially the same as the DSNN, is available at encoder side, and it may be referred to simply as DSNN even when meaning a DSNN present at encoder side.
[00158] Overfitting a DSNN refers to optimizing the DSNN for a certain data unit, with respect to one or more metrics. Examples of data units are:
- one or more video sequences;
- one or more frames of one or more video sequences;
- and/or one or more blocks or regions of one or more frames.
[00159] Examples of metrics are:
- Peak signal-to-noise ratio (PSNR);
- Mean-squared error (MSE); and
- Mean average precision (mAP).
[00160] Herein, the process of optimizing a DSNN on a certain data unit with respect to one or more metrics may be referred to as overfitting, or specializing, or finetuning. The DSNN which is optimized may be pretrained (e.g., may have been previously trained on a training dataset).
[00161] It is to be noted that a DSNN that has been overfitted or optimized on a certain data unit may be then used for processing (e.g., filtering, decoding, post-processing) one or more of the following data:
- the same data unit used for overfitting; and/or
- other data unit than the data unit used for overfitting.
[00162] In an embodiment, the overfitting signal may comprise updated weights. The updated weights may comprise one or more updated values associated to respective one or more weights or parameters of the DSNN. Overfitting a DSNN based on updated weights may comprise replacing the values of one or more weights or parameters of the DSNN with respective one or more updated values comprised in the updated weights.
[00163] In an example, the updated weights comprise one or more updated values associated to respective one or more multiplying parameters, where the one or more multiplying parameters are comprised in a DSNN, and where each of the one or more multiplying parameters multiplies an output of a layer of the DSNN.
[00164] In another example, the updated weights comprise one or more updated values associated to respective one or more bias parameters, where the one or more bias parameters are comprised in a DSNN, and where each of the one or more bias parameters is added to an output of a layer of the DSNN.
[00165] In another embodiment, the overfitting signal may comprise a weight-update. The weight-update may comprise one or more update values associated to respective one or more weights or parameters of the DSNN, where each update value represents an update or change to a respective weight or parameter of the DSNN. Overfitting a DSNN based on a weight-update may comprise adding or subtracting (or other suitable operation) the one or more update values of the weight-update to respective one or more weights or parameters of the DSNN.
[00166] In an example, the weight-update comprises one or more updates associated to respective one or more multiplying parameters, where the one or more multiplying parameters are comprised in a DSNN, and where each of the one or more multiplying parameters multiplies an output of a layer of the DSNN.
[00167] In another example, the weight-update comprises one or more updates associated to respective one or more bias parameters, where the one or more bias parameters are comprised in a DSNN, and where each of the one or more bias parameters is added to an output of a layer of the DSNN.
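As a minimal illustration of the two kinds of overfitting signal described above, the following sketch applies either replacement values (the "updated weights" embodiment) or additive changes (the "weight-update" embodiment) to an array of per-channel multipliers. The flat array representation and all numeric values are illustrative assumptions, not a signaling format prescribed by any embodiment.

```python
import numpy as np

m = np.ones(6)                        # current multipliers of a layer (assumed)

# "Updated weights": the overfitting signal carries replacement values.
updated_values = np.array([1.2, 0.8, 1.0, 1.0, 0.9, 1.1])
m_replaced = updated_values.copy()    # old values are replaced outright

# "Weight-update": the overfitting signal carries changes to the values.
update_values = np.array([0.2, -0.2, 0.0, 0.0, -0.1, 0.1])
m_updated = m + update_values         # updates are added to the current values
```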
[00168] In another embodiment, the overfitting signal may comprise a modulating signal. The modulating signal may comprise one or more modulating values, or one or more sets of modulating values, associated to respective one or more outputs of one or more layers of a DSNN. Overfitting a DSNN based on a modulating signal may comprise modifying the one or more outputs based at least on respective one or more modulating values, or the one or more sets of modulating values, and a modulation operation.
[00169] In an example, a modulating signal comprises one or more sets of modulating values associated to respective one or more channels of tensors output by one or more convolutional layers of a DSNN, where each set of modulating values comprises a multiplicative value and an additive value, and where the modulation operation comprises multiplying the multiplicative value with the respective channel and adding the additive value to the result of the multiplication for that channel.
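The sketch below illustrates the modulation operation of this example on a single tensor, assuming one multiplicative value and one additive value per channel. The variable names (scale, shift) and the numeric values are illustrative assumptions.

```python
import numpy as np

C, H, W = 6, 4, 4
T = np.random.randn(C, H, W)        # tensor output by a convolutional layer
scale = np.full(C, 1.1)             # multiplicative values, one per channel
shift = np.full(C, 0.05)            # additive values, one per channel

# Multiply each channel by its multiplicative value, then add the additive
# value to the result of the multiplication for that channel.
modulated = T * scale.reshape(C, 1, 1) + shift.reshape(C, 1, 1)
```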
[00170] For the sake of simplicity, at least some of the embodiments and examples herein are
described by referring to an overfitting signal that comprises multipliers or updates to multipliers. A multiplier multiplies a channel of a tensor, where the tensor can be, for example, the output of a convolutional layer.
[00171] Example Embodiments
[00172] Consider an example, in which an overfitting signal is applied to a tensor that is output by a convolutional layer with C convolutional kernels. The shape or size of the tensor is (B, C, H, W) (an alternative notation for the shape or size of a tensor is BxCxHxW), where B indicates a batch size, C indicates the number of convolutional kernels that produced the tensor and represents the number of channels of the tensor, and H and W indicate the height and width of the tensor. For simplicity, the batch size B is assumed to be 1 (i.e., the shape of a tensor output by a NN layer is assumed to be (1, C, H, W)) and it will not be shown, i.e., the shape of a tensor that is output by a NN layer will be written simply as (C, H, W). The NN filter is assumed to comprise at least one layer that outputs a tensor, where the tensor is multiplied with one or more multipliers. In particular, each of the one or more multipliers multiplies a channel of the tensor, where a channel of the tensor is a sliced tensor of shape or size (1, H, W) (or, in the alternative notation, 1xHxW). Mathematically, the multiplication of a tensor T with multipliers m can be described as follows: y = T * m
[00173] where T is the tensor of shape (C, H, W) and m is an array of shape C. As the shapes of the tensor T and of the array m are different, broadcasting is performed, so that the i-th value m[i] multiplies all the spatial elements of the i-th channel of T.
[00174] In one embodiment, there may be C multipliers that multiply the respective C channels of the tensor. For example, when the C multipliers are stored in an array m of length C, and the tensor is referred to as T, the value m[0] multiplies the sliced tensor T[0, :, :], where ':' indicates all the indexes on that axis, the value m[1] multiplies the sliced tensor T[1, :, :], and so on.
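A minimal sketch of the per-channel multiplication y = T * m under the shape conventions above. In NumPy, the array m is explicitly reshaped to (C, 1, 1) so that broadcasting matches the per-channel semantics described in the text, i.e., m[i] multiplies every spatial element of the i-th channel. The values are illustrative.

```python
import numpy as np

C, H, W = 6, 4, 4
T = np.random.randn(C, H, W)        # output tensor, batch dimension omitted
m = np.random.randn(C)              # one multiplier per channel

# Reshape m to (C, 1, 1) so that broadcasting multiplies m[i] with every
# spatial element of the i-th channel T[i, :, :].
y = T * m.reshape(C, 1, 1)

assert np.allclose(y[0], T[0] * m[0])   # m[0] scales the whole first channel
```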
[00175] It is to be noted that a neural network may have more than one layer, where each layer may output a tensor. Thus, a neural network may internally produce more than one tensor. Different tensors produced within a neural network may have different shapes. For example, different tensors may have different numbers of channels. For the sake of simplicity, at least some of the described embodiments consider the case of a single layer, but the underlying ideas can be extended to the case of multiple layers with potentially different shapes of the tensors produced by those layers.
[00176] Embodiment on grouping the multipliers
[00177] In an embodiment, the values of the C multipliers may be in a set of cardinality less than C. In other words, different multipliers may be constrained to have the same value. The cardinality of the set of values of the C multipliers may be referred to as K. That is, K is the number of different values that the C multipliers may have. This way, the C multipliers are grouped into K groups, where the multipliers within each group have the same value and multipliers within different groups may have different values.
[00178] Embodiment on properties of groups
[00179] In an embodiment, the number of multipliers in different groups is the same or approximately the same. In an example, each group comprises C/K multipliers. In an alternative embodiment, the number of multipliers in different groups may be different. In an example, one of the K groups may comprise many more multipliers than another one of the K groups.
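The sketch below illustrates, with assumed values, how C multipliers may be built from K shared values with either equal group sizes (C/K multipliers per group) or unequal group sizes.

```python
import numpy as np

C, K = 6, 2
shared = np.array([1.2, 0.8])            # one shared value per group (assumed)

equal = np.repeat(shared, C // K)        # [1.2, 1.2, 1.2, 0.8, 0.8, 0.8]
unequal = np.repeat(shared, [1, 5])      # [1.2, 0.8, 0.8, 0.8, 0.8, 0.8]
```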
[00180] In one embodiment, the number K of different values of multipliers may be the same for all the layers of a neural network. In an alternative embodiment, the number K of different values of multipliers may be different for different layers of a neural network.
[00181] Embodiments on assigning multipliers to groups
[00182] In an embodiment, multipliers are assigned to groups in a predetermined way, based at least on a grouping operation, and the assignment is the same for any content or video sequence on which the overfitting is performed. In other words, the assigning of multipliers to groups is not dependent on the particular content being overfitted or encoded/decoded. During overfitting, the multipliers in the same group may be constrained to have the same value. After overfitting, for each overfitted layer, an encoder may signal the values of K multiplier updates or K updated multipliers to a decoder. At decoder side, for each overfitted layer, the received values of K multiplier updates or K updated multipliers are used to update the C multipliers of that layer, based on information about the grouping operation.
[00183] In an embodiment, the grouping operation may comprise determining or assuming an order of the channels and/or multipliers that are applied to the channels and then assigning nearby or consecutive multipliers to the same group. In other words, all the multipliers that belong to a certain group appear in a consecutive sequence within the array (or other data structure) that contains the multipliers. For example, groups 1, 2 (wherein K = 2) may be assigned to 6 multipliers as 1, 1, 2, 2, 2, 2 and not as 1, 2, 2, 1, 1, 2. In another example, with K=3 groups, 6 multipliers may be assigned as 1, 1, 2, 2, 2, 3 and not 1, 2, 1, 2, 3, 2. So, all multipliers whose indexes are in the range [i, j] are assigned to a k-th group (where k is in the range [1, K]), and all other multipliers whose indexes are not in the range [i, j] are assigned to groups different from the k-th group. In an embodiment, an encoder may signal to a decoder an indication of the grouping operation, and/or an indication of an order of the channels or multipliers.
[00184] In an example, there are C=6 channels in a tensor T, C=6 multipliers, and K=2 different values of the multipliers (e.g., 2 groups of multipliers’ values). The ordering of the multipliers is performed by using the order of the channels in the tensor T, e.g., multiplier m[0] is associated to channel T[0, :, :], multiplier m[1] is associated to channel T[1, :, :], and so on. Then, each of the 6 multipliers is assigned to one of the 2 groups as follows: m[0] is assigned to group 1, m[1] is assigned to group 1, m[2] is assigned to group 1, m[3] is assigned to group 2, m[4] is assigned to group 2, m[5] is assigned to group 2. I.e., the multipliers m[0], m[1], m[2], m[3], m[4], m[5] are assigned to the groups 1, 1, 1, 2, 2, 2, respectively. In this example, the K=2 groups contain the same number of multipliers, e.g., 3 multipliers. During overfitting, the multipliers in the same group are constrained to have the same value. In this example, the ordering of channels and multipliers is assumed to be the order of channels of the tensor that is output by the considered NN layer, or the order of the convolutional kernels in the considered NN layer.
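The following decoder-side sketch mirrors the worked example above: K=2 received values are expanded into C=6 multipliers using the known consecutive grouping. The helper function name and the received values are illustrative assumptions.

```python
import numpy as np

def expand_consecutive_groups(k_values, group_sizes):
    """Expand K received values into C multipliers, one consecutive group
    of channels at a time."""
    return np.repeat(np.asarray(k_values), group_sizes)

received = [1.05, 0.95]                          # K=2 updated multipliers
m = expand_consecutive_groups(received, [3, 3])  # groups of 3, as above
# m == [1.05, 1.05, 1.05, 0.95, 0.95, 0.95]: m[0..2] take the value of
# group 1 and m[3..5] the value of group 2, as in the worked example.
```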
[00185] In an embodiment, the assigning of multipliers to groups may be determined based on the content on which the multipliers are overfitted and based on a grouping operation. The assigning of multipliers to groups may thus be different for different contents (e.g., different video sequences, or different video frames).
[00186] In an embodiment, during overfitting, C multipliers are overfitted without constraining their values to be the same or without considering them as belonging to any groups. After overfitting, a clustering operation is applied to the values of the updated multipliers or to the multiplier-updates, by using a number K of clusters. The number K of clusters may be predetermined or determined during the clustering operation. As a result, the values of C updated multipliers or multiplier-updates are clustered into K clusters or groups and represented by K centroids. Each centroid represents the value of a new updated multiplier or multiplier-update.
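As one possible realization of the clustering operation, the sketch below uses a plain one-dimensional k-means over the overfitted multiplier-update values. The embodiments do not mandate a particular clustering algorithm, and the update values shown are illustrative assumptions.

```python
import numpy as np

def kmeans_1d(values, K, iters=20):
    """Plain 1-D k-means: returns K centroids and a cluster index per value."""
    values = np.asarray(values, dtype=float)
    centroids = np.linspace(values.min(), values.max(), K)  # simple init
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Move each centroid to the mean of the values assigned to it.
        for k in range(K):
            if np.any(assign == k):
                centroids[k] = values[assign == k].mean()
    return centroids, assign

updates = [0.21, -0.18, 0.19, 0.22, -0.20, 0.18]   # C=6 overfitted updates
centroids, assign = kmeans_1d(updates, K=2)
# Each of the C updates is then represented by the centroid of its cluster.
```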
[00187] In an embodiment, after the clustering operation, for each overfitted layer, an encoder may signal the values of the new K multiplier-updates or K updated multipliers (e.g., the centroids) to a decoder, together with an indication of the assignment of each value to the respective multiplier. At decoder side, for each overfitted layer, the received values of K multiplier updates or K updated multipliers are used to update the C multipliers of that layer, based on the information about the assignment of each value to the respective multiplier.
[00188] In another embodiment, after the clustering operation, for each overfitted layer, an encoder may signal C values of the multiplier-updates or updated multipliers, where the C values are in
a set of cardinality K, and where the C values are ordered according to the assumed order of the channels of the tensor and/or the order of the multipliers that multiply the channels of the tensor.
[00189] In an example, there are C=6 channels in a tensor T, C=6 multipliers, and the number of clusters is predetermined to be K=2. The ordering of the multipliers is performed by using the order of the channels in the tensor T, e.g., multiplier m[0] is associated to channel T[0, :, :], multiplier m[1] is associated to channel T[1, :, :], and so on. The C=6 multipliers are overfitted on a content, e.g., a video sequence. After overfitting, the values of the C=6 multiplier-updates are clustered into K=2 clusters. The obtained K=2 centroids represent the 2 possible values that the C=6 multiplier-updates may have. For example, based on the results of the clustering, each of the 6 multipliers is assigned to one of the 2 clusters as follows: m[0] is assigned to group 1, m[1] is assigned to group 2, m[2] is assigned to group 2, m[3] is assigned to group 2, m[4] is assigned to group 1, m[5] is assigned to group 2. I.e., the multipliers m[0], m[1], m[2], m[3], m[4], m[5] are assigned to the groups 1, 2, 2, 2, 1, 2, respectively. In this example, the K=2 groups contain different numbers of multiplier-updates, i.e., group 1 contains 2 multiplier-updates and group 2 contains 4 multiplier-updates. An encoder signals the K=2 values of the centroids, representing the 2 values that the C=6 multiplier-updates can have, and information about the assignment of multipliers to the 2 values, based on an assumed or determined ordering of the multipliers. Such information could be, for example, the positions of each of the two values with respect to the 6 multipliers. For example, the encoder signals the following: c1, c2, idx0, idx1, idx2, idx3, idx4, idx5, where c1 and c2 represent the values of the centroids and idx0 .. idx5 represent the indexes of the clusters for multipliers m[0] .. m[5]. E.g., the signal could be: 1.2, 0.8, 0, 1, 1, 1, 0, 1, which indicates that there are two centroids with values 1.2 and 0.8, associated to clusters with indexes 0 and 1, respectively, and that the multipliers are assigned to the clusters as follows: m[0] is assigned to the cluster with index 0, m[1] is assigned to the cluster with index 1, m[2] is assigned to the cluster with index 1, m[3] is assigned to the cluster with index 1, m[4] is assigned to the cluster with index 0, m[5] is assigned to the cluster with index 1.
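The sketch below decodes the example signal 1.2, 0.8, 0, 1, 1, 1, 0, 1 into C=6 multiplier values. The flat list layout stands in for whatever bitstream syntax an implementation would use, so the parsing shown is an illustrative assumption.

```python
import numpy as np

signal = [1.2, 0.8, 0, 1, 1, 1, 0, 1]           # as in the example above
K, C = 2, 6

centroids = np.array(signal[:K])                 # [1.2, 0.8]
indexes = np.array(signal[K:K + C], dtype=int)   # [0, 1, 1, 1, 0, 1]

m = centroids[indexes]
# m == [1.2, 0.8, 0.8, 0.8, 1.2, 0.8]: m[0] and m[4] take the value of the
# cluster with index 0; the others take the value of the cluster with index 1.
```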
[00190] In any of the described embodiments, the updated values of the multipliers (e.g., the updated multipliers), or the update values (e.g., the multiplier updates) of the multipliers may be subject to quantization prior to signaling from encoder to decoder. At decoder side, the quantized updated values or the quantized update values may be dequantized prior to being used for updating the neural network.
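A minimal sketch of one possible quantization scheme, uniform scalar quantization with an assumed step size; the embodiments leave the quantization method open.

```python
import numpy as np

step = 0.05                                  # assumed quantization step size
updates = np.array([0.21, -0.18, 0.19])      # values to signal (illustrative)

q = np.round(updates / step).astype(int)     # quantized integers, signaled
dq = q * step                                # dequantized values at decoder
```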
[00191] FIG. 7 is an example apparatus, which may be implemented in hardware, caused to perform: overfitting shared multipliers. The apparatus 700 comprises at least one processor 702 (e.g., an FPGA and/or CPU), one or more memories 704 including computer program code 705, the computer program code 705 having instructions to carry out the methods described herein, wherein the at least
one memory 704 and the computer program code 705 are configured to, with the at least one processor 702, cause the apparatus 700 to implement circuitry, a process, component, module, or function (implemented with control module 706) to implement the examples described herein, including overfitting shared multipliers. Optionally included encoder 730 of the control module 706 performs encoding, and optionally included decoder 732 implements decoding. The memory 704 may be a non-transitory memory, a transitory memory, a volatile memory (e.g., RAM), or a non-volatile memory (e.g., ROM).
[00192] The apparatus 700 includes a display and/or I/O interface 708, which includes user interface (UI) circuitry and elements, that may be used to display features or a status of the methods described herein (e.g., as one of the methods is being performed or at a subsequent time), or to receive input from a user such as with using a keypad, camera, touchscreen, touch area, microphone, biometric recognition, one or more sensors, and the like. The apparatus 700 includes one or more communication, e.g., network (N/W) interfaces (I/F(s)) 710. The communication I/F(s) 710 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique including via one or more links 724. The communication I/F(s) 710 may comprise one or more transmitters or one or more receivers.
[00193] The transceiver 716 comprises one or more transmitters 718 and one or more receivers 720. The transceiver 716 and/or communication I/F(s) 710 may comprise standard well-known components such as an amplifier, filter, frequency-converter, (de)modulator, and encoder/decoder circuitries and one or more antennas, such as antennas 714 used for communication over wireless link 722.
[00194] The control module 706 of the apparatus 700 comprises one of or both parts 706-1 and/or 706-2, which may be implemented in a number of ways. The control module 706 may be implemented in hardware as control module 706-1, such as being implemented as part of the one or more processors 702. The control module 706-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array. In another example, the control module 706 may be implemented as control module 706-2, which is implemented as computer program code (having corresponding instructions) 705 and is executed by the one or more processors 702. For instance, the one or more memories 704 store instructions that, when executed by the one or more processors 702, cause the apparatus 700 to perform one or more of the operations as described herein. Furthermore, the one or more processors 702, one or more memories 704, and example algorithms (e.g., as flowcharts and/or signaling diagrams), encoded as instructions, programs, or code, are means for causing performance of the operations described herein.
[00195] The apparatus 700 to implement the functionality of control 706 may correspond to any of the apparatuses depicted herein. Alternatively, apparatus 700 and its elements may not correspond to any of the other apparatuses depicted herein, as apparatus 700 may be part of a self-organizing/optimizing network (SON) node or other node, such as a node in a cloud.
[00196] The apparatus 700 may also be distributed throughout the network (e.g., internet 28) including within and between apparatus 700 and any network element (such as a base station 24 and/or apparatus 90).
[00197] Interface 712 enables data communication and signaling between the various items of apparatus 700, as shown in FIG. 7. For example, the interface 712 may be one or more buses such as address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. Computer program code (e.g., instructions) 705, including control 706, may comprise object-oriented software configured to pass data or messages between objects within computer program code 705. The apparatus 700 need not comprise each of the features mentioned, or may comprise other features as well. The various components of apparatus 700 may at least partially reside in a common housing 728, or a subset of the various components of apparatus 700 may at least partially be located in different housings, which different housings may include housing 728.
[00198] FIG. 8 shows a schematic representation of non-volatile memory media 800a (e.g. computer/compact disc (CD) or digital versatile disc (DVD)) and 800b (e.g. universal serial bus (USB) memory stick) and 800c (e.g. cloud storage for downloading instructions and/or parameters 802 or receiving emailed instructions and/or parameters 802) storing instructions and/or parameters 802 which when executed by a processor allows the processor to perform one or more of the operations of the methods described herein.
[00199] FIG. 9 is an example method 900 to implement the embodiments described herein, in accordance with an embodiment. At 902 the method 900 includes setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers. At 904 the method 900 includes multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
[00200] The method 900 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
[00201] FIG. 10 is another example method 1000 to implement the embodiments described herein, in accordance with an embodiment. At 1002 the method 1000 includes multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
[00202] The method 1000 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
[00203] FIG. 11 is yet another example method 1100 to implement the embodiments described herein, in accordance with an embodiment. At 1102 the method 1100 includes receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is a number of different values that the C multipliers comprise. In an embodiment, K is less than C. At 1104 the method 1100 includes receiving K values of multiplier updates or K values of updated multipliers. At 1106 the method 1100 includes updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
[00204] The method 1100 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
[00205] FIG. 12 is still another example method 1200 to implement the embodiments described herein, in accordance with an embodiment. At 1202 the method 1200 includes receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with an information of assignment of each value to respective multiplier. At 1204 the method 1200 includes using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to respective multiplier.
[00206] The method 1200 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
[00207] FIG. 13 is still another example method 1300 to implement the embodiments described herein, in accordance with an embodiment. At 1302 the method 1300 includes receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K. At 1304 the method 1300 includes using the values of C multiplier updates or values of C updated multipliers to update the C multipliers of the layer.
[00208] The method 1300 may be performed with an apparatus described herein, for example, any apparatus of FIG. 1 to FIG. 4, FIG. 7, any apparatus of FIG. 14, or any other apparatus described herein.
[00209] Referring to FIG. 14, this figure shows a block diagram of one possible and non-limiting example in which the examples may be practiced. A user equipment (UE) 110, radio access network (RAN) node 170, and network element(s) 190 are illustrated. In the example of FIG. 14, the user equipment (UE) 110 is in wireless communication with a wireless network 100. A UE is a wireless device that can access the wireless network 100. The UE 110 includes one or more processors 120, one or more memories 125, and one or more transceivers 130 interconnected through one or more buses 127. Each of the one or more transceivers 130 includes a receiver, Rx, 132 and a transmitter, Tx, 133. The one or more buses 127 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like. The one or more transceivers 130 are connected to one or more antennas 128. The one or more memories 125 include computer program code 123. The UE 110 includes a module 140, comprising one of or both parts 140-1 and/or 140-2, which may be implemented in a number of ways. The module 140 may be implemented in hardware as module 140-1, such as being implemented as part of the one or more processors 120. The module 140-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array. In another example, the module 140 may be implemented as module 140-2, which is implemented as computer program code 123 and is executed by the one or more processors 120. For instance, the one or more memories 125 and the computer program code 123 may be configured to, with the one or more processors 120, cause the user equipment 110 to perform one or more of the operations as described herein. The UE 110 communicates with a radio access network (RAN) node 170 via a wireless link 111.
[00210] The RAN node 170 in this example is a base station that provides access by wireless devices such as the UE 110 to the wireless network 100. The RAN node 170 may be, for example, a base station for fifth generation cellular network technology (5G), also called New Radio (NR). In 5G, the RAN node 170 may be a NG-RAN node, which is defined as either a gNB (e.g., base station for 5G/NR, for example, a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC) or an ng (new generation)-eNB. A gNB is a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to a 5G core network (5GC) (such as, for example, the network element(s) 190). The ng-eNB is a node providing evolved universal terrestrial radio access (E-UTRA), for example, the LTE radio access technology, user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC. The NG-RAN node may include multiple gNBs, which may
also include a central unit (CU) (gNB-CU) 196 and distributed unit(s) (DUs) (gNB-DUs), of which DU 195 is shown. Note that the DU may include or be coupled to and control a radio unit (RU). The gNB-CU is a logical node hosting radio resource control (RRC), service data adaptation protocol (SDAP) and PDCP protocols of the gNB, or RRC and packet data convergence protocol (PDCP) protocols of the en-gNB (e.g., a node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in E-UTRA-NR dual connectivity (EN-DC)), that controls the operation of one or more gNB-DUs. The gNB-CU terminates the control interface between the CU and the DU (F1 or F1-C interface) connected with the gNB-DU. The F1 interface is illustrated as reference 198, although reference 198 also illustrates a link between remote elements of the RAN node 170 and centralized elements of the RAN node 170, such as between the gNB-CU 196 and the gNB-DU 195. The gNB-DU is a logical node hosting radio link control (RLC), MAC and physical layer (PHY) layers of the gNB or en-gNB, and its operation is partly controlled by the gNB-CU. One gNB-CU supports one or multiple cells. One cell is supported by only one gNB-DU. The gNB-DU terminates the F1 interface 198 connected with the gNB-CU. Note that the DU 195 is considered to include the transceiver 160, for example, as part of an RU, but some examples of this may have the transceiver 160 as part of a separate RU, for example, under control of and connected to the DU 195. The RAN node 170 may also be an eNB (evolved NodeB) base station, for example, for long term evolution (LTE), or any other suitable base station or node.
[00211] The RAN node 170 includes one or more processors 152, one or more memories 155, one or more network interfaces (N/W I/F(s)) 161, and one or more transceivers 160 interconnected through one or more buses 157. Each of the one or more transceivers 160 includes a receiver, Rx, 162 and a transmitter, Tx, 163. The one or more transceivers 160 are connected to one or more antennas 158. The one or more memories 155 include computer program code 153. The CU 196 may include the processor(s) 152, memories 155, and network interfaces 161. Note that the DU 195 may also include its own memory/memories and processor(s), and/or other hardware, but these are not shown.
[00212] The RAN node 170 includes a module 150, comprising one of or both parts 150-1 and/or 150-2, which may be implemented in a number of ways. The module 150 may be implemented in hardware as module 150-1, such as being implemented as part of the one or more processors 152. The module 150-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array. In another example, the module 150 may be implemented as module 150-2, which is implemented as computer program code 153 and is executed by the one or more processors 152. For instance, the one or more memories 155 and the computer program code 153 are configured to, with the one or more processors 152, cause the RAN node 170 to perform one or more of the operations as described herein. Note that the functionality of the module 150 may be distributed, such as being distributed between the DU 195 and the CU 196, or be implemented solely in the DU 195.
[00213] The one or more network interfaces 161 communicate over a network such as via the links 176 and 131. Two or more gNBs 170 may communicate using, for example, link 176. The link 176 may be wired or wireless or both and may implement, for example, an Xn interface for 5G, an X2 interface for LTE, or other suitable interface for other standards.
[00214] The one or more buses 157 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, wireless channels, and the like. For example, the one or more transceivers 160 may be implemented as a remote radio head (RRH) 195 for LTE or a distributed unit (DU) 195 for gNB implementation for 5G, with the other elements of the RAN node 170 possibly being physically in a different location from the RRH/DU, and the one or more buses 157 could be implemented in part as, for example, fiber optic cable or other suitable network connection to connect the other elements (for example, a central unit (CU), gNB-CU) of the RAN node 170 to the RRH/DU 195. Reference 198 also indicates those suitable network link(s).
[00215] It is noted that description herein indicates that ‘cells’ perform functions, but it should be clear that equipment which forms the cell may perform the functions. The cell makes up part of a base station. That is, there can be multiple cells per base station. For example, there could be three cells for a single carrier frequency and associated bandwidth, each cell covering one-third of a 360 degree area so that the single base station’s coverage area covers an approximate oval or circle. Furthermore, each cell can correspond to a single carrier and a base station may use multiple carriers. So when there are three 120 degree cells per carrier and two carriers, then the base station has a total of 6 cells.
[00216] The wireless network 100 may include a network element or elements 190 that may include core network functionality, and which provides connectivity via a link or links 181 with a further network, such as a telephone network and/or a data communications network (for example, the Internet). Such core network functionality for 5G may include access and mobility management function(s) (AMF(s)) and/or user plane functions (UPF(s)) and/or session management function(s) (SMF(s)). Such core network functionality for LTE may include MME (Mobility Management Entity)/SGW (Serving Gateway) functionality. These are merely example functions that may be supported by the network element(s) 190, and note that both 5G and LTE functions might be supported. The RAN node 170 is coupled via a link 131 to the network element 190. The link 131 may be implemented as, for example, an NG interface for 5G, or an S1 interface for LTE, or other suitable interface for other standards. The network element 190 includes one or more processors 175, one or more memories 171, and one or more network interfaces (N/W I/F(s)) 180, interconnected through one or more buses 185. The one or more memories 171 include computer program code 173. The one or more memories 171 and the computer
program code 173 are configured to, with the one or more processors 175, cause the network element 190 to perform one or more operations.
[00217] The wireless network 100 may implement network virtualization, which is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network. Network virtualization involves platform virtualization, often combined with resource virtualization. Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to software containers on a single system. Note that the virtualized entities that result from the network virtualization are still implemented, at some level, using hardware such as processors 152 or 175 and memories 155 and 171, and also such virtualized entities create technical effects.
[00218] The computer readable memories 125, 155, and 171 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The computer readable memories 125, 155, and 171 may be means for performing storage functions. The processors 120, 152, and 175 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non-limiting examples. The processors 120, 152, and 175 may be means for performing functions, such as controlling the UE 110, RAN node 170, network element(s) 190, and other functions as described herein.
[00219] In general, the various embodiments of the user equipment 110 can include, but are not limited to, cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
[00220] One or more of modules 140-1, 140-2, 150-1, and 150-2 may be caused to perform: overfitting shared multipliers. Computer program code 173 may also be caused to perform: overfitting shared multipliers.
[00221] As described above, FIGs. 9 to 13 include flowcharts of an apparatus (e.g. 50, 700, or any other apparatuses described herein), method, and computer program product according to certain
example embodiments. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory (e.g. 58, 125, or 504) of an apparatus employing an embodiment of the present invention and executed by processing circuitry (e.g. 56, 120, or 502) of the apparatus. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks. These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the function specified in the flowchart blocks. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
[00222] A computer program product is therefore defined in those instances in which the computer program instructions, such as computer-readable program code portions, are stored by at least one non-transitory computer-readable storage medium with the computer program instructions, such as the computer-readable program code portions, being configured, upon execution, to perform the functions described above, such as in conjunction with the flowchart(s) of FIGs. 9 to 13. In other embodiments, the computer program instructions, such as the computer-readable program code portions, need not be stored or otherwise embodied by a non-transitory computer-readable storage medium, but may, instead, be embodied by a transitory medium with the computer program instructions, such as the computer-readable program code portions, still being configured, upon execution, to perform the functions described above.
[00223] Accordingly, blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
[00224] In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
[00225] In the above, some example embodiments have been described with the help of syntax of the bitstream. It needs to be understood, however, that the corresponding structure and/or computer program may reside at the encoder for generating the bitstream and/or at the decoder for decoding the bitstream.
[00226] In the above, where example embodiments have been described with reference to an encoder, it needs to be understood that the resulting bitstream and the decoder have corresponding elements in them. Likewise, where example embodiments have been described with reference to a decoder, it needs to be understood that the encoder has structure and/or computer program for generating the bitstream to be decoded by the decoder.
[00227] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
[00228] It should be understood that the foregoing description is only illustrative. Various alternatives and modifications may be devised by those skilled in the art. For example, features recited in the various dependent claims could be combined with each other in any suitable combination(s). In addition, features from different embodiments described above could be selectively combined into a new embodiment. Accordingly, the description is intended to embrace all such alternatives, modifications and variances which fall within the scope of the appended claims.
[00229] References to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor or firmware such as, for example, the programmable content of a hardware device such as instructions for a processor, or configuration settings for a fixed-function device, gate array or programmable logic device, and the like.
[00230] As used herein, the term ‘circuitry’ may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even when the software or firmware is not physically present. This description of ‘circuitry’ applies to uses of this term in this application. As a further example, as used herein, the term ‘circuitry’ would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware. The term ‘circuitry’ would also cover, for example and when applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device.
[00231] Circuitry or Circuit: As used in this application, the term ‘circuitry’ or ‘circuit’ may refer to one or more or all of the following:
(a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); and
(b) combinations of hardware circuits and software, such as (as applicable):
(i) a combination of analog and/or digital hardware circuit(s) with software/firmware; and
(ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions); and
(c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
[00232] This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example, and when applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
Claims
1. An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers; and multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
2. An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
3. The apparatus of claim 1 or 2, wherein K is a number of different values that the C multipliers comprise, and wherein the values of the C multipliers are grouped into K groups, and wherein multipliers within each group comprise same value and multipliers within different groups comprise different values.
4. The apparatus of any of the claims 1 to 3, wherein a number of the C multipliers in the different groups is same or approximately the same; or the number of multipliers in the different groups are different.
5. The apparatus of any of the claims 1 to 4, wherein the number K of the different values of the C multipliers is same for all layers of the neural network; or the number K of the different values of multipliers is different for different layers of the neural network.
6. The apparatus of any of the claims 1 to 5, wherein the apparatus is caused to perform: assigning of the C multipliers to the different groups.
7. The apparatus of claim 6, wherein the assigning is predetermined based on a grouping operation and is same for any content or video sequence on which overfitting is performed.
8. The apparatus of claim 6, wherein the apparatus is further caused to perform the grouping operation, and wherein to perform the grouping operation the apparatus is further caused to perform: determining or assuming an order of the C channels and/or the C multipliers that are multiplied with the channels; and assigning nearby or consecutive multipliers to the same group, such that all the multipliers that belong to a certain group appear in a consecutive sequence within an array or other data structure that includes the C multipliers.
9. The apparatus of claim 7 or 8, wherein the apparatus is further caused to perform: signaling, to a decoder, an indication of the grouping operation, and/or an indication of the order of the C channels or the C multipliers.
10. The apparatus of claim 6, wherein the assigning is determined based on a content on which the C multipliers are overfitted and on a grouping operation.
11. The apparatus of any of the claims 7 to 9, wherein, during an overfitting operation, multipliers in same group are constrained to comprise same value, and wherein the apparatus is further caused to perform: in response to the overfitting operation, signaling K values of multiplier updates or K values of updated multipliers to the decoder.
12. The apparatus of claims 6 or 10, wherein during an overfitting operation, the C multipliers are overfitted without constraining the values of the C multipliers or without considering the C multipliers to belong to any groups, and wherein the apparatus is further caused to perform: updating of the C multipliers, based on the overfitting operation; and applying a clustering operation to the values of the C updated multipliers or to the C multiplier-updates by using a number K of clusters, wherein values of the C updated multipliers or values of the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a value of a multiplier-update.
13. The apparatus of claim 12, wherein the apparatus is further caused to perform: signaling the K values of multiplier-updates or K values of updated multipliers to a decoder, together with an indication of assignment of each value to the respective multiplier.
14. The apparatus of claim 12, wherein the apparatus is further caused to perform: signaling values of the C multiplier-updates or the C updated multipliers, wherein the C values are in the set of cardinality K, and wherein the C values are ordered according to an assumed order of the channels of the tensor and/or the order of multipliers that multiply the channels of the tensor.
15. The apparatus of any of the claims 11, 13, or 14, wherein the apparatus is further caused to perform: quantizing the values of the C multiplier-updates, the values of the C updated multipliers, the K values of multiplier updates, and/or the K values of updated multipliers prior to signaling.
16. The apparatus of any of the previous claims, wherein the layer comprises a convolution layer of the neural network.
17. An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is a number of different values that the C multipliers comprise; receiving K values of multiplier updates or K values of updated multipliers; and updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the grouping information.
18. The apparatus of claim 17, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
19. An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with an information of assignment of each value to respective multiplier; and
using the K values of multiplier-updates or K values of updated multipliers to update C multipliers of the layer, based on the information of assignment of each value to respective multiplier.
20. The apparatus of claim 19, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number smaller than C.
21. The apparatus of any of the claims 19 or 20, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
22. An apparatus comprising at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor, cause the apparatus at least to perform: receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K; and using the values of C multiplier updates or values of C updated multipliers to update the C multipliers of the layer.
23. The apparatus of claim 22, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number different from C.
24. The apparatus of any of the claims 22 or 23, wherein the C multiplier updates or the C updated multipliers are quantized, and wherein the apparatus is further caused to perform: dequantizing the quantized C multiplier updates or the C updated multipliers.
25. A method comprising: setting values of C multipliers such that the values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is multiplied by the C multipliers; and multiplying respective C channels of the tensor with the C multipliers, wherein the tensor comprises an output of a layer of a neural network.
26. A method comprising: multiplying C channels or portions of a tensor with respective C multipliers, wherein the tensor comprises an output of a layer of a neural network, and wherein values of the C multipliers are in a set of cardinality (K) less than C.
27. The method of claim 25 or 26, wherein K is a number of different values that the C multipliers comprise, and wherein the values of the C multipliers are grouped into K groups, and wherein multipliers within each group comprise the same value and multipliers within different groups comprise different values.
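For illustration, a sketch of the channel-wise multiplication of claims 25 to 27: each of the C channels of a layer output is scaled by its multiplier, while the C multipliers take only K distinct values. The tensor shape, the group sizes, and the concrete values are assumptions.

```python
import numpy as np

C, H, W = 6, 4, 4                                        # assumed tensor shape
tensor = np.random.randn(C, H, W).astype(np.float32)     # output of a layer

k_values = np.array([0.8, 1.0, 1.2], dtype=np.float32)   # the K = 3 shared values
group_ids = np.array([0, 0, 1, 1, 2, 2])                 # K groups of equal size
multipliers = k_values[group_ids]                        # C multipliers, K distinct values

scaled = tensor * multipliers[:, None, None]             # channel c scaled by multipliers[c]
assert len(np.unique(multipliers)) == 3                  # cardinality K < C
```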
28. The method of any of the claims 25 to 27, wherein a number of the C multipliers in the different groups is the same or approximately the same; or the numbers of multipliers in the different groups are different.
29. The method of any of the claims 25 to 28, wherein the number K of the different values of the C multipliers is the same for all layers of the neural network; or the number K of the different values of multipliers is different for different layers of the neural network.
30. The method of any of the claims 25 to 29 further comprising: assigning the C multipliers to the different groups.
31. The method of claim 30, wherein the assigning is predetermined based on a grouping operation and is the same for any content or video sequence on which overfitting is performed.
32. The method of claim 30 further comprising performing the grouping operation, and wherein performing the grouping operation comprises: determining or assuming an order of the C channels and/or the C multipliers that are multiplied with the channels; and assigning nearby or consecutive multipliers to the same group, such that all the multipliers that belong to a certain group appear in a consecutive sequence within an array or other data structure that includes the C multipliers.
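A minimal sketch of the consecutive grouping of claim 32: each group occupies one contiguous run of multiplier indices, so the grouping can be signalled compactly by the pair (C, K). The function name and the equal-size split are assumptions.

```python
import numpy as np

def consecutive_grouping(C, K):
    """Assign consecutive multipliers to the same group, so that each group
    is one contiguous run of indices (sketched from claim 32)."""
    return np.minimum(np.arange(C) * K // C, K - 1)

print(consecutive_grouping(10, 3))  # [0 0 0 0 1 1 1 2 2 2]
```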
33. The method of claim 31 or 32 further comprising: signaling, to a decoder, an indication of the grouping operation, and/or an indication of the order of the C channels or the C multipliers.
34. The method of claim 30, wherein the assigning is determined based on a content on which the C multipliers are overfitted and on a grouping operation.
35. The method of any of the claims 31 to 33, wherein, during an overfitting operation, multipliers in the same group are constrained to comprise the same value, and wherein the method further comprises: in response to the overfitting operation, signaling K values of multiplier updates or K values of updated multipliers to the decoder.
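One way (of several) to realize the constraint of claim 35 in training code is to keep only K free parameters and expand them to C multipliers on every forward pass, so that multipliers in the same group stay equal by construction. The PyTorch sketch below is illustrative; the shapes, the loss, and the optimizer settings are all assumptions.

```python
import torch

C, K = 8, 2
group_ids = torch.tensor([0, 0, 0, 0, 1, 1, 1, 1])
k_values = torch.nn.Parameter(torch.ones(K))     # the only trained parameters

x = torch.randn(1, C, 4, 4)                      # layer output (assumed shape)
target = torch.randn(1, C, 4, 4)                 # overfitting target (assumed)
opt = torch.optim.SGD([k_values], lr=0.1)

for _ in range(10):                              # toy overfitting loop
    multipliers = k_values[group_ids]            # expand K -> C (keeps the tie)
    loss = ((x * multipliers.view(1, C, 1, 1) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After overfitting, only the K entries of k_values need to be signalled.
```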
36. The method of claim 30 or 34, wherein during an overfitting operation, the C multipliers are overfitted without constraining the values of the C multipliers or without considering the C multipliers to belong to any groups, and wherein the method further comprises: updating the C multipliers, based on the overfitting operation; and applying a clustering operation to the values of the C updated multipliers or to the C multiplier-updates by using a number K of clusters, wherein values of the C updated multipliers or values of the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a value of a multiplier-update.
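A minimal 1-D k-means is one possible instance of the clustering operation recited in claim 36 (the claim does not fix a particular clustering algorithm). Each centroid becomes the shared value signalled for its group; the initialization and iteration count below are assumptions.

```python
import numpy as np

def cluster_updates(updates, K, iters=20):
    """Cluster C multiplier-updates into K clusters (minimal 1-D k-means)."""
    centroids = np.linspace(updates.min(), updates.max(), K)  # simple init
    for _ in range(iters):
        # Assign each update to its nearest centroid.
        assign = np.argmin(np.abs(updates[:, None] - centroids[None, :]), axis=1)
        for k in range(K):
            if np.any(assign == k):            # keep empty clusters fixed
                centroids[k] = updates[assign == k].mean()
    # Final assignment against the converged centroids.
    assign = np.argmin(np.abs(updates[:, None] - centroids[None, :]), axis=1)
    return centroids, assign

updates = np.array([0.11, 0.09, 0.10, -0.21, -0.19, 0.52], dtype=np.float32)
centroids, assign = cluster_updates(updates, K=3)
# centroids ~ [-0.20, 0.10, 0.52]; assign maps each of the C updates to a centroid
```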
37. The method of claim 36 further comprising: signaling the K values of multiplier-updates or K values of updated multipliers to a decoder, together with an indication of assignment of each value to a respective multiplier.
38. The method of claim 36 further comprising: signaling values of the C multiplier-updates or the C updated multipliers, wherein the C values are in the set of cardinality K, and wherein the C values are ordered according to an assumed order of the channels of the tensor and/or the order of multipliers that multiply the channels of the tensor.
39. The method of any of the claims 35, 37, or 38 further comprising: quantizing the values of the C multiplier-updates, the values of the C updated multipliers, the K values of multiplier updates, and/or the K values of updated multipliers prior to signaling.
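A sketch of the quantization step of claim 39, using uniform scalar quantization to integer levels (an assumed scheme; the claims do not fix a particular quantizer or step size).

```python
import numpy as np

def quantize(values, step):
    # Uniform scalar quantization to integer levels (assumed scheme).
    return np.round(np.asarray(values) / step).astype(np.int32)

k_updates = np.array([-0.203, 0.101, 0.518])
levels = quantize(k_updates, step=0.05)   # [-4, 2, 10], to be entropy-coded and signalled
recon = levels * 0.05                     # decoder-side dequantization: [-0.2, 0.1, 0.5]
```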
40. The method of any of the claims 25 to 39, wherein the layer comprises a convolution layer of the neural network.
41. A method comprising: receiving an indication of a grouping operation, wherein C multipliers are assigned to groups based on the grouping operation, and wherein values of the C multipliers are in a set of cardinality (K) less than C, wherein C comprises a number of channels of a tensor that is output by a layer of a neural network, and wherein K is a number of different values that the C multipliers comprise; receiving K values of multiplier updates or K values of updated multipliers; and updating the C multipliers by using the K values of multiplier updates or the K values of updated multipliers and the indication of the grouping operation.
42. The method of claim 41, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
43. A method comprising: receiving, for a layer of a neural network, K values of multiplier-updates or K values of updated multipliers together with information of an assignment of each value to a respective multiplier; and using the K values of multiplier-updates or the K values of updated multipliers to update C multipliers of the layer, based on the information of the assignment of each value to a respective multiplier.
44. The method of claim 43, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number smaller than C.
45. The method of claim 43 or 44, wherein the K values of multiplier updates or K values of updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized K values of multiplier updates or the K values of updated multipliers.
46. A method comprising: receiving, for a layer of a neural network, values of C multiplier updates or values of C updated multipliers, wherein the values of the C multiplier updates or the C updated multipliers are in a set of cardinality K; and using the values of the C multiplier updates or the values of the C updated multipliers to update the C multipliers of the layer.
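For illustration, a sketch of claim 46, where all C values are received directly in an order matching the channels (no assignment information is needed); they still take only K distinct values, which keeps them compressible. The additive update rule is an assumption.

```python
import numpy as np

old = np.full(6, 1.0, dtype=np.float32)                    # current C multipliers
c_updates = np.array([0.1, 0.1, 0.1, -0.2, -0.2, -0.2],
                     dtype=np.float32)                     # C values, cardinality K = 2
new = old + c_updates                                      # assumed additive update
assert len(np.unique(c_updates)) == 2                      # set of cardinality K
```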
47. The method of claim 46, wherein a clustering operation is applied to the values of the C updated multipliers or to the C multiplier-updates, by using a number K of clusters, wherein values of the C updated multipliers or the C multiplier-updates are clustered into K clusters or K groups represented by K centroids, and wherein each centroid represents a value of an updated multiplier or a multiplier-update, and wherein C comprises a number of kernels of the layer, and wherein K is a number different from C.
48. The method of claim 46 or 47, wherein the C multiplier updates or the C updated multipliers are quantized, and wherein the method further comprises: dequantizing the quantized C multiplier updates or the C updated multipliers.
49. A computer readable medium comprising program instructions which, when executed by an apparatus, cause the apparatus to perform the methods as claimed in any of the claims 25 to 48.
50. An apparatus comprising means for performing methods as claimed in any of the claims 25 to 48.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363612109P | 2023-12-19 | 2023-12-19 | |
| US63/612,109 | 2023-12-19 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2025133815A1 (en) | 2025-06-26 |
Family
ID=94128822
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2024/062397 (WO2025133815A1, pending) | Overfitting shared multipliers | 2023-12-19 | 2024-12-09 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2025133815A1 (en) |
Non-Patent Citations (3)
| Title |
|---|
| SANTAMARIA MARIA ET AL: "Overfitting multiplier parameters for content-adaptive post-filtering in video coding", 2022 10TH EUROPEAN WORKSHOP ON VISUAL INFORMATION PROCESSING (EUVIP), IEEE, 11 September 2022 (2022-09-11), pages 1 - 6, XP034212214, DOI: 10.1109/EUVIP53989.2022.9922721 * |
| YANG (NOKIA) R ET AL: "EE1-2.2: Content-adaptive LOP filter", no. JVET-AG0111, 19 January 2024 (2024-01-19), XP030314005, Retrieved from the Internet <URL:https://jvet-experts.org/doc_end_user/documents/33_Teleconference/wg11/JVET-AG0111-v4.zip JVET-AG0111-v4-clean.docx> [retrieved on 20240119] * |
| YANG RUIYING ET AL: "Low-precision post-filtering in video coding", 2022 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), IEEE, 5 December 2022 (2022-12-05), pages 137 - 140, XP034280793, DOI: 10.1109/ISM55400.2022.00027 * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11375204B2 (en) | Feature-domain residual for video coding for machines | |
| US11575938B2 (en) | Cascaded prediction-transform approach for mixed machine-human targeted video coding | |
| US12323607B2 (en) | Apparatus, method and computer program product for optimizing parameters of a compressed representation of a neural network | |
| US20240314362A1 (en) | Performance improvements of machine vision tasks via learned neural network based filter | |
| US11558628B2 (en) | Caching and clearing mechanism for deep convolutional neural networks | |
| US20240146938A1 (en) | Method, apparatus and computer program product for end-to-end learned predictive coding of media frames | |
| US20240249514A1 (en) | Method, apparatus and computer program product for providing finetuned neural network | |
| WO2022269415A1 (en) | Method, apparatus and computer program product for providng an attention block for neural network-based image and video compression | |
| EP4038875A1 (en) | Guiding decoder-side optimization of neural network filter | |
| US20230412806A1 (en) | Apparatus, method and computer program product for quantizing neural networks | |
| EP4464009A1 (en) | High-level syntax of predictive residual encoding in neural network compression | |
| US20240265240A1 (en) | Method, apparatus and computer program product for defining importance mask and importance ordering list | |
| US12321870B2 (en) | Apparatus method and computer program product for probability model overfitting | |
| US20240013046A1 (en) | Apparatus, method and computer program product for learned video coding for machine | |
| US20230325639A1 (en) | Apparatus and method for joint training of multiple neural networks | |
| WO2023199172A1 (en) | Apparatus and method for optimizing the overfitting of neural network filters | |
| EP4181511A2 (en) | Decoder-side fine-tuning of neural networks for video coding for machines | |
| US20230186054A1 (en) | Task-dependent selection of decoder-side neural network | |
| WO2025133815A1 (en) | Overfitting shared multipliers | |
| US20250373831A1 (en) | End-to-end learned codec for multiple bitrates | |
| US20250310522A1 (en) | Quantizing overfitted filters | |
| US20240357104A1 (en) | Determining regions of interest using learned image codec for machines | |
| US20240267543A1 (en) | Transformer based video coding | |
| WO2025104676A1 (en) | Single-bit overfitting | |
| WO2025202872A1 (en) | Minimizing coding delay and memory requirements for overfitted filters |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 24829242; Country of ref document: EP; Kind code of ref document: A1 |