
WO2025154665A1 - Decoding device, encoding device, decoding method, and encoding method - Google Patents

Decoding device, encoding device, decoding method, and encoding method

Info

Publication number
WO2025154665A1
Authority
WO
WIPO (PCT)
Prior art keywords
geometric
geometric information
attribute set
encoding device
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2025/000612
Other languages
English (en)
Japanese (ja)
Inventor
Jin Yuen Tong
Han Boon Teo
Jayashri Karlekar
Chong Soon Lim
Sugiri Pranata Lim
Kiyofumi Abe
Takahiro Nishi
Toshiyasu Sugio
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Panasonic Intellectual Property Corp of America
Original Assignee
Panasonic Intellectual Property Corp of America
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Intellectual Property Corp of America
Publication of WO2025154665A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/46Embedding additional information in the video signal during the compression process

Definitions

  • This disclosure relates to a decoding device, etc.
  • Non-Patent Document 1 relates to an example of a conventional standard for the above-mentioned video coding technology.
  • a decoding device includes a circuit and a memory connected to the circuit, and the circuit, in operation, decodes from a bit stream face image base data relating to a facial video and geometric information, which corresponds to each of a plurality of frames of the facial video and indicates geometric attributes within an area including a person's face, further decodes from the bit stream concealment parameters relating to error recovery control in the case where the geometric information is not properly acquired by the encoding device, and generates the facial video from the base data, the geometric information, and the concealment parameters using a generative model.
  • the configuration or method according to one aspect of the present disclosure may contribute to one or more of the following, for example: improved coding efficiency, improved image quality, reduced processing volume, reduced circuit scale, improved processing speed, and appropriate selection of elements or operations. Note that the configuration or method according to one aspect of the present disclosure may also contribute to benefits other than those mentioned above.
  • FIG. 1 is a block diagram showing a configuration of an encoding/decoding system according to a reference example.
  • FIG. 2 is a block diagram showing the configuration of an encoding device in a reference example.
  • FIG. 3 is a block diagram showing the configuration of a decoding device in a reference example.
  • FIG. 4 is a conceptual diagram showing an example of a reference image.
  • FIG. 5 is a conceptual diagram showing an example of a geometric attribute set.
  • FIG. 6 is a conceptual diagram showing an example of a face moving image.
  • FIG. 7 is a block diagram showing an example of a configuration of an encoding/decoding system according to an embodiment.
  • FIG. 8 is a diagram showing an example of a hierarchical structure of data in a stream.
  • FIG. 9 is a block diagram showing an example of a configuration of a decoding device according to an embodiment.
  • FIG. 10 is a flowchart showing an example of the operation of the decoding device according to the embodiment.
  • FIG. 11 is a conceptual diagram showing an example of an operation performed in accordance with the concealment parameters.
  • FIG. 12 is a conceptual diagram showing another example of an operation performed in accordance with the concealment parameters.
  • FIG. 13 is a conceptual diagram showing an example of an operation when a geometric attribute set is not acquired.
  • FIG. 14 is a conceptual diagram showing an example of the operation of the decoding device when the number of attributes in the geometry attribute set is different from the original number of attributes.
  • FIG. 15 is a conceptual diagram showing an example of an operation for controlling storage of a geometric attribute set.
  • FIG. 16 is a conceptual diagram showing an example of an operation performed in a decoding device according to the reliability of a geometric attribute set.
  • FIG. 17 is a conceptual diagram showing an example of an operation in which a geometric attribute is replaced in the decoding device.
  • FIG. 18 is a conceptual diagram showing another example of the operation in which a geometric attribute is replaced in the decoding device.
  • FIG. 19 is a conceptual diagram showing an example of a geometric attribute set stored in the decoding device.
  • FIG. 20 is a conceptual diagram showing an example of a geometric attribute set decoded in the decoding device.
  • FIG. 21 is a conceptual diagram showing an example of a geometric attribute set derived in the decoding device.
  • FIG. 22 is a block diagram showing another example of the configuration of a decoding device according to an embodiment.
  • FIG. 23 is a block diagram showing an example of a configuration of an encoding device according to an embodiment.
  • FIG. 24 is a flowchart showing an example of the operation of the encoding device according to the embodiment.
  • FIG. 25 is a conceptual diagram showing a specific example of a geometric attribute set.
  • FIG. 26 is a conceptual diagram showing another specific example of a geometric attribute set.
  • FIG. 27 is a conceptual diagram showing yet another specific example of a geometric attribute set.
  • FIG. 28 is a conceptual diagram showing an example of the operation of the encoding device when the number of attributes in the geometry attribute set is different from the original number of attributes.
  • FIG. 29 is a conceptual diagram showing an example of a face that is partially hidden by occlusion.
  • FIG. 30 is a conceptual diagram showing examples of faces having extreme yaw poses.
  • FIG. 31 is a conceptual diagram showing examples of faces having extreme poses relative to roll.
  • FIG. 32 is a conceptual diagram showing an example of a blurred face.
  • FIG. 33 is a conceptual diagram showing an example of a face with a low number of detected landmarks.
  • FIG. 34 is a conceptual diagram showing an example of operations performed in an encoding device according to the reliability of a geometric attribute set.
  • FIG. 35 is a conceptual diagram showing an example of an operation in which a geometric attribute is replaced in the encoding device.
  • FIG. 36 is a conceptual diagram showing another example of the operation in which a geometric attribute is replaced in the encoding device.
  • FIG. 37 is a conceptual diagram showing an example of a geometric attribute set stored in the encoding device.
  • FIG. 38 is a conceptual diagram showing an example of a geometric attribute set detected in the encoding device.
  • FIG. 39 is a conceptual diagram showing an example of a geometric attribute set derived in the encoding device.
  • FIG. 40 is a syntax diagram illustrating an example syntax structure for concealment parameters.
  • FIG. 41 is a syntax diagram illustrating another example syntax structure for concealment parameters.
  • FIG. 42 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 43 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 44 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 45 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 46 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 47 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 48 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 49 is a syntax diagram illustrating yet another example syntax structure for concealment parameters.
  • FIG. 50 shows examples of various neural networks that can be used as generative models.
  • FIG. 51 is a block diagram showing an example of a configuration for an encoding device in an embodiment to encode moving images.
  • FIG. 52 is a block diagram showing an example of a configuration for a decoding device in an embodiment to decode a moving image.
  • FIG. 53 is a block diagram showing an example of implementation of an encoding device in an embodiment.
  • FIG. 54 is a flowchart showing an example of a basic operation of the encoding device in an embodiment.
  • FIG. 55 is a flowchart showing another example of basic operation of the encoding device in an embodiment.
  • FIG. 56 is a block diagram showing an example of implementation of a decoding device according to an embodiment.
  • FIG. 57 is a flowchart showing an example of a basic operation of the decoding device in the embodiment.
  • FIG. 58 is a flowchart showing another example of the basic operation of the decoding device in an embodiment.
  • FIG. 59 is a diagram showing the overall configuration of a content supply system that realizes a content distribution service.
  • FIG. 60 is a diagram showing an example of a display screen of a web page.
  • FIG. 61 is a diagram showing an example of a display screen of a web page.
  • FIG. 62 is a diagram showing an example of a smartphone.
  • FIG. 63 is a block diagram showing an example configuration of a smartphone.
  • FIG. 1 is a block diagram showing the configuration of an encoding/decoding system in a reference example.
  • the encoding/decoding system shown in FIG. 1 includes an encoding device 700 and a decoding device 800.
  • the model architecture of the current research can be represented as a framework of the encoding device 700 and the decoding device 800, as shown in FIG. 1.
  • the model architecture takes as input a reference image (source image) of the target person and one or more frames of a driving video, which are encoded and compressed into one or more bitstreams by the encoding device 700.
  • the compressed bitstreams are then transmitted to the decoding device 800 using a transmission channel. Finally, the decoding device 800 reconstructs the output video from the received bitstreams.
  • FIG. 2 is a block diagram showing the configuration of an encoding device 700 in a reference example.
  • the encoding device 700 includes compressors 731 and 733, and a deriver 732.
  • the encoding device 700 first encodes one or more reference images into a bitstream via a compressor 731.
  • each reference image may be the first one or more frames of a driving video, or at least one image or avatar containing the face of a target person.
  • the reference image may also be expressed as a face image.
  • Subsequent frames of the driving video are sent to the deriver 732 to derive a set of geometric attributes for the frames, which are then encoded into a bitstream via the compressor 733.
  • Each driving frame may include a person's face.
  • the compressor 733 may or may not be the same as the compressor 731.
  • the bitstream is transmitted to the decoding device 800 via a transmission channel.
  • the bitstream may be a single bitstream or may be composed of multiple sub-bitstreams.
  • FIG. 3 is a block diagram showing the configuration of a decoding device 800 in a reference example.
  • the decoding device 800 includes decompressors 831 and 833, a deriver 832, and a generator 834.
  • the decoding device 800 first decodes one or more reference images from the bitstream.
  • Each reference image may include a human face.
  • the reference images are passed to a deriver 832, which extracts related information of the reference images as a reference attribute set.
  • the reference attribute set may include a geometric attribute set of the reference images.
  • the data of the reference images and the reference attribute set is data that is commonly used for multiple pictures of the video to be generated, and is also referred to as base data.
  • each geometric attribute set contains the geometric attributes of the person in the driving frame.
  • the generator 834 takes as input one or more reference images obtained by the decompressor 831 and the reference attribute set obtained by the deriver 832, along with the geometric attribute set obtained by the decompressor 833, and generates a facial motion image.
  • the generator 834 may also use only one of the two: the reference images obtained by the decompressor 831 or the reference attribute set obtained by the deriver 832.
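The reference-example decoding flow described above (decompressor 831, deriver 832, decompressor 833, generator 834) can be sketched as a toy pipeline. All function names and data shapes here are illustrative assumptions, not part of the disclosure or of any standard API:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class GeometricAttributeSet:
    # Per-frame geometric attributes, e.g. facial landmark coordinates.
    landmarks: List[Tuple[int, int]]

def generate_frame(reference_image, reference_attrs, geom):
    # Toy stand-in for generator 834: combine the static identity data
    # with the per-frame motion information.
    return {"identity": reference_image, "motion": geom.landmarks}

def decode_face_video(reference_image, reference_attrs, geom_sets):
    # One generated frame per decoded geometric attribute set.
    return [generate_frame(reference_image, reference_attrs, g)
            for g in geom_sets]

frames = decode_face_video(
    "reference.png", {"pose": 0},
    [GeometricAttributeSet([(10, 20)]), GeometricAttributeSet([(11, 21)])])
```

The key property of the pipeline is that the reference image is decoded once, while only the lightweight geometric attribute sets are decoded per frame.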
  • FIG. 4 is a conceptual diagram showing an example of a reference image.
  • the reference image is an image that includes a face.
  • FIG. 6 is a conceptual diagram showing an example of a facial motion image.
  • the facial motion image is a motion image that includes a face.
  • the geometric attributes of that frame are reflected in the reference image. This gives movement to the reference image.
  • the amount of information of the multiple geometric attributes corresponding to the reference image and multiple frames is smaller than the amount of information of the multiple frames included in the captured video. Therefore, by encoding the multiple geometric attributes corresponding to the reference image and multiple frames, the amount of code is reduced compared to encoding the multiple frames included in the captured video. Furthermore, each geometric attribute imparts movement to the reference image. This imparts movement to the face of the display subject, enabling rich expression.
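As a back-of-the-envelope illustration of this information-amount argument (every number below is an assumption for illustration, not a value taken from this disclosure), compare one uncompressed frame against a per-frame landmark set:

```python
# Illustrative comparison of information amounts. All numbers are
# assumptions chosen for the example, not values from this disclosure.
raw_frame_bytes = 1920 * 1080 * 3   # one uncompressed 1080p RGB frame
landmark_bytes = 68 * 2 * 2         # 68 landmarks, (x, y), 2 bytes each

# Per-frame size ratio between raw pixels and geometric attributes.
ratio = raw_frame_bytes / landmark_bytes
```

Under these assumptions the geometric attributes are more than four orders of magnitude smaller per frame, which is why encoding a reference image plus per-frame attributes reduces the amount of code compared with encoding every captured frame.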
  • a geometric attribute set is required when generating each frame of a facial video image.
  • the geometric attribute set is not always appropriate, and it is not easy for the decoding device 800 to evaluate, for example, whether the geometric attribute set decoded from the bitstream is appropriate. If the geometric attribute set is not appropriate, distorted or incorrect frames may be generated.
  • the decoding device of Example 1 includes a circuit and a memory connected to the circuit, and in operation, the circuit decodes from the bit stream face image base data relating to a facial video and geometric information, which corresponds to each of a plurality of frames of the facial video and indicates geometric attributes within an area including a person's face, and further decodes from the bit stream concealment parameters relating to error recovery control in the case where the geometric information is not properly acquired by the encoding device, and generates the facial video from the base data, the geometric information, and the concealment parameters using a generative model.
  • the decoding device of Example 6 may be any of the decoding devices of Examples 1 to 5, in which the concealment parameter indicates, depending on the value of the concealment parameter, that stored geometric information is to be applied to the current frame of the facial video image instead of the geometric information decoded from the bitstream.
  • the decoding device of Example 7 may be any one of the decoding devices of Examples 1 to 6, in which the concealment parameter indicates, depending on the value of the concealment parameter, that the geometric information corresponding to the current frame is to be stored for a subsequent frame.
  • the decoding device of Example 8 may be any of the decoding devices of Examples 1 to 7, and when the concealment parameters indicate that the geometric information corresponding to the current frame obtained by the encoding device has low reliability or that the number of attributes in the geometric information is different from the original number of attributes, the circuit corrects the geometric information corresponding to the current frame decoded from the bit stream using stored geometric information, and applies the corrected geometric information to the geometric information for generating the face image corresponding to the current frame in the face video.
  • the decoding device of Example 9 may be any of the decoding devices of Examples 1 to 7, and when the concealment parameters indicate that the geometric information corresponding to the current frame obtained by the encoding device has low reliability, that the geometric information was not obtained, that the number of attributes in the geometric information is different from the original number of attributes, or that stored geometric information is to be applied to the geometric information corresponding to the current frame, the circuit may be a decoding device that applies the stored geometric information to the geometric information for generating the face image corresponding to the current frame in the face video.
  • the decoding device of Example 10 may be any of the decoding devices of Examples 1 to 7, and when the concealment parameters indicate that the geometric information corresponding to the current frame obtained by the encoding device has low reliability, that the geometric information was not obtained, or that the number of attributes in the geometric information is different from the original number of attributes, the circuit does not use the generative model to generate the facial image corresponding to the current frame in the facial video, but instead applies the facial image that has already been generated to the facial image corresponding to the current frame.
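The decoder-side behaviors of Examples 8 to 10 can be summarized as a dispatch on the concealment parameter. This is a hypothetical sketch: the mode strings, function names, and the dictionary representation of geometric information are illustrative labels, not actual syntax element values from the disclosure:

```python
# Hypothetical decoder-side error recovery control driven by a
# concealment parameter (illustrative mapping of Examples 8-10).
def conceal_and_generate(mode, decoded_geom, stored_geom, prev_frame, generate):
    if mode == "reuse_frame":
        # Example 10: skip the generative model and reuse an
        # already-generated face image for the current frame.
        return prev_frame
    if mode == "use_stored":
        # Example 9: stored geometric information replaces the decoded set.
        return generate(stored_geom)
    if mode == "correct":
        # Example 8: correct the decoded set with stored information,
        # here by filling in any attribute missing from the decoded set.
        return generate({**stored_geom, **decoded_geom})
    return generate(decoded_geom)  # normal path: use the decoded set as-is

generate = lambda geom: ("frame", dict(sorted(geom.items())))
stored = {"yaw": 0.0, "pitch": 0.0}
out = conceal_and_generate("correct", {"yaw": 0.3}, stored, None, generate)
```

In the "correct" branch the decoded attribute takes precedence where present, while stored attributes fill the gaps, matching the idea of correcting rather than wholly replacing the decoded geometric information.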
  • when the encoding device is unable to properly acquire geometric information, it may be possible to use a facial image that has already been generated for the current frame, rather than generating a new facial image. This may make it possible to properly perform error recovery control.
  • the decoding device of Example 11 may be any one of the decoding devices of Examples 6, 8, and 9, in which the stored geometric information is geometric information corresponding to a past frame decoded from the bitstream.
  • the decoding device of Example 12 may be any one of the decoding devices of Examples 6, 8, and 9, in which the stored geometric information is predefined reference geometric information.
  • the decoding device of Example 13 includes a circuit and a memory connected to the circuit, and in operation, the circuit acquires base data of an image included in a video, decodes facial attribute parameters indicating a face included in the image from a bit stream, and generates an output image corresponding to the image by inputting the base data and the facial attribute parameters into a generative model, the bit stream includes a reliability parameter related to the reliability of the facial attribute parameter, and the reliability parameter indicates that the reliability is low according to the value of the reliability parameter.
  • when the reliability of the face attribute parameters is low, it may be possible for the encoding device to notify the decoding device of this fact. It may therefore be possible for the decoding device to appropriately determine that the reliability of the face attribute parameters is low.
  • the decoding device of Example 14 may be the decoding device of Example 13, in which the images correspond to each of a plurality of pictures included in the video, the base data is data common to the plurality of pictures, and the bit stream includes the reliability parameter and the face attribute parameter for each of the plurality of pictures.
  • the decoding device of Example 15 may also be the decoding device of Example 14, in which the bitstream includes the reliability parameters before the facial attribute parameters for each of the plurality of pictures.
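The ordering constraint of Example 15 (the reliability parameter precedes the face attribute parameters for each picture) can be illustrated with a toy serialization. The byte layout here is an assumption invented for the example, not the actual bitstream syntax of this disclosure; the point is only that the decoder can read the reliability parameter before parsing the attributes:

```python
import struct

# Illustrative per-picture payload: a 1-byte reliability parameter
# written BEFORE the 32-bit float face attribute parameters, so the
# decoder can react to low reliability before parsing the attributes.
def write_picture_payload(reliability: int, attrs: list) -> bytes:
    return struct.pack("B", reliability) + struct.pack(f"{len(attrs)}f", *attrs)

def read_picture_payload(payload: bytes, n_attrs: int):
    reliability = payload[0]  # available before any attribute is parsed
    attrs = list(struct.unpack(f"{n_attrs}f", payload[1:1 + 4 * n_attrs]))
    return reliability, attrs

payload = write_picture_payload(1, [1.5, -2.25])
rel, attrs = read_picture_payload(payload, 2)
```

Placing the reliability parameter first lets a decoder decide, per picture, whether to trust, correct, or discard the face attribute parameters that follow.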
  • the encoding device of Example 20 may be any of the encoding devices of Examples 17 to 19, in which the concealment parameter indicates, depending on the value of the concealment parameter, that the geometric information corresponding to the current frame has not been obtained in the encoding device.
  • the encoding device of Example 21 may be any of the encoding devices of Examples 17 to 20, in which the concealment parameter indicates, depending on the value of the concealment parameter, that the number of pieces of geometric information corresponding to the current frame acquired by the encoding device is different from the original number.
  • the encoding device of Example 22 may be any of the encoding devices of Examples 17 to 21, in which the concealment parameter indicates, depending on the value of the concealment parameter, that stored geometric information is to be applied to the current frame of the facial video image instead of the geometric information decoded from the bitstream.
  • the encoding device of Example 24 may be any of the encoding devices of Examples 17 to 23, and the circuit may set the concealment parameter to a value indicating that the reliability of the geometric information corresponding to the current frame obtained by the encoding device is low when the reliability of the geometric information is lower than a threshold value.
  • when the reliability of the geometric information is lower than a threshold, it may be possible for the encoding device to notify the decoding device that the reliability of the geometric information is low. It may therefore be possible for the decoding device to appropriately perform error recovery control.
  • the encoding device of Example 25 may be any of the encoding devices of Examples 17 to 24, and the circuit may be an encoding device that, if the reliability of the geometric information corresponding to the current frame acquired by the encoding device is lower than a threshold value, or if the current frame acquired by the encoding device does not include a face, does not encode the geometric information corresponding to the current frame into the bit stream, and sets the concealment parameter to a value indicating that the geometric information was not acquired.
  • the encoding device may notify the decoding device that the geometric information will be stored. This may make it possible for the decoding device to appropriately perform error recovery control for subsequent frames.
  • the encoding device of Example 29 may be any of the encoding devices of Examples 17 to 28, and when the reliability of the geometric information corresponding to the current frame acquired by the encoding device is lower than a threshold, the circuit corrects the acquired geometric information using stored geometric information and encodes the corrected geometric information into the bit stream.
  • when the encoding device does not properly acquire geometric information, it may be possible to correct the improperly acquired geometric information using the stored geometric information. It may therefore be possible to properly perform error recovery control in the encoding device.
  • the encoding device of Example 30 may be any of the encoding devices of Examples 17 to 28, and when the reliability of the geometric information corresponding to the current frame obtained by the encoding device is lower than a threshold, the circuit may encode the stored geometric information into the bit stream as the geometric information corresponding to the current frame.
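The encoder-side decisions of Examples 24, 25, 29, and 30 can be sketched as a single decision function. All names and the dictionary representation are hypothetical illustrations, not syntax from the disclosure:

```python
# Hypothetical encoder-side decision logic (illustrative mapping of
# Examples 24, 25, 29, and 30). Returns the geometric information to
# encode (or None) plus the concealment-parameter value to signal.
def encode_geometry(geom, reliability, threshold, stored_geom):
    if geom is None:
        # Example 25: no face / nothing acquired -> encode no geometric
        # information, signal "not acquired" via the concealment parameter.
        return None, "not_acquired"
    if reliability < threshold:
        # Example 29: correct the low-reliability set with stored
        # attributes. (Example 30 would instead encode stored_geom
        # unchanged as the current frame's geometric information.)
        return {**stored_geom, **geom}, "low_reliability"
    return geom, "ok"  # Example 24 threshold check passed

stored = {"yaw": 0.0, "pitch": 0.0}
geom, flag = encode_geometry({"yaw": 0.4}, reliability=0.2,
                             threshold=0.5, stored_geom=stored)
```

Signaling the outcome of this decision in the bitstream is what allows the decoder to mirror the encoder's error recovery control without re-deriving the reliability itself.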
  • the encoding device of Example 32 may be any one of the encoding devices of Examples 22, 27, 29, and 30, in which the stored geometric information is predefined reference geometric information.
  • the encoding device of Example 33 includes a circuit and a memory connected to the circuit, and the circuit, in operation, encodes base data of an image included in a video, encodes face attribute parameters indicating a face included in the image into a bit stream, and further encodes a reliability parameter related to the reliability of the face attribute parameter into the bit stream, the reliability parameter indicating that the reliability is low according to the value of the reliability parameter.
  • without such a notification, the decoding device may have to infer the reliability and perform processing accordingly, which may result in the generation of an inappropriate image.
  • the encoding device may notify the decoding device that the reliability of the face attribute parameters is low.
  • it may be possible for the decoding device to appropriately determine whether the reliability of the facial attribute parameters is low. Furthermore, it may be possible for the encoding device to control the decoding device's determination of whether the reliability of the facial attribute parameters is low.
  • the encoding device of Example 35 may also be the encoding device of Example 34, in which the bitstream includes the reliability parameters before the face attribute parameters for each of the plurality of pictures.
  • the overall flexibility of the decoding device is improved, allowing it to adaptively utilize previously decoded information in reproducing the current frame.
  • errors in frames generated when insufficient information is received are concealed, making the output video look more natural.
  • the operation of the GAN can be guaranteed at each step of the face reproduction process.
  • Pixel Value/Sample Value: A value inherent to a pixel, including not only a luminance value, a chrominance value, or an RGB gradation, but also a depth value or the binary values 0 and 1.
  • flags may be multi-bit, for example, parameters or indexes of two or more bits.
  • flags may be multi-valued using other base numbers as well as two values using binary numbers.
  • Signal: Something that is symbolized or coded to convey information, including discrete digital signals as well as analog signals that take continuous values.
  • Stream/Bit Stream: A string of digital data or a flow of digital data.
  • a stream/bit stream may be a single stream or may be divided into multiple layers and composed of multiple streams. It also includes cases where the data is transmitted by serial communication over a single transmission line, as well as cases where the data is transmitted by packet communication over multiple transmission lines.
  • an encoding device and a decoding device are described.
  • the embodiments are examples of encoding devices and decoding devices to which the processes and/or configurations described in each aspect of the present disclosure can be applied.
  • the processes and/or configurations can also be implemented in encoding devices and decoding devices that are different from the embodiments.
  • any of the following may be implemented.
  • Some of the components among the multiple components constituting the encoding device or decoding device of the embodiment may be combined with components described in any of the aspects of the present disclosure, may be combined with components having some of the functions described in any of the aspects of the present disclosure, or may be combined with components that perform some of the processing performed by the components described in any of the aspects of the present disclosure.
  • Fig. 7 is a block diagram showing a configuration example of a coding/decoding system according to the present embodiment.
  • the coding/decoding system includes a coding device 100 and a decoding device 200.
  • the example in Fig. 7 is similar to the example in Fig. 1, but in Fig. 7, the specific configuration and processing of the coding device 100, the specific configuration and processing of the decoding device 200, and the bit stream are different from those in the example in Fig. 1.
  • the reference image is an image including a face, and may also be expressed as a face image, a source image, or an identity image.
  • the reference image represents static visual features for reconstructing a face video.
  • the driving video is a video including a face, and is a captured video obtained by a camera. The driving video serves to impart motion to the reference image.
  • the bit stream is also simply expressed as a stream. In addition, the use of one bit stream is not limited, and multiple bit streams may be used.
  • the person in the reference image and the person in the driving video may or may not be the same person.
  • the encoding/decoding system of this embodiment can be applied to video conferencing, video generation and editing in the entertainment industry, social media, and the e-commerce industry.
  • the scope of application is not limited to these.
  • the PPS includes parameters used for a picture, i.e., encoding parameters referenced by the decoding device 200 to decode each picture in a sequence.
  • the encoding parameters may include a reference value of the quantization width used in decoding the picture and a flag indicating the application of weighted prediction.
  • the SPS and PPS may simply be referred to as parameter sets.
  • a CTU is also called a superblock or a basic division unit.
  • Such a CTU includes a CTU header and one or more CUs (Coding Units), as shown in (e) of FIG. 8.
  • the CTU header includes coding parameters that are referenced by the decoding device 200 to decode one or more CUs.
  • a picture that is currently the subject of processing performed by a device such as the encoding device 100 or the decoding device 200 is called a current picture. If the processing is encoding, the current picture is synonymous with a picture to be encoded, and if the processing is decoding, the current picture is synonymous with a picture to be decoded.
  • a block, such as a CTU or CU, that is currently the subject of processing performed by a device such as the encoding device 100 or the decoding device 200 is called a current block. If the processing is encoding, the current block is synonymous with a block to be encoded, and if the processing is decoding, the current block is synonymous with a block to be decoded.
  • the header area is an area that includes an SEI.
  • the header area can further include a VPS, SPS, PPS, SEI, a picture header, a slice header, a CTU header, and a CU header.
  • [Decoding configuration and processing] FIG. 9 is a block diagram showing an example of the configuration of the decoding device 200 according to this embodiment.
  • the decoding device 200 includes decompressors 231, 233, and 235, derivers 232 and 236, a generator 234, and a buffer 237.
  • Each component is, for example, an electric circuit that performs information processing.
  • Two or more of the decompressors 231, 233, and 235 may be integrated together.
  • the decompressor 231 decodes a reference image from the bitstream.
  • the deriver 232 derives a reference attribute set from the reference image.
  • the generator 234 may be provided with not only the reference attribute set obtained by the deriver 232 but also the reference image obtained by the decompressor 231.
  • the configuration and processing of the deriver 232 may be omitted, and the generator 234 may be provided with a reference image without being provided with a reference attribute set.
  • the decompressor 233 decodes the geometry attribute set for each picture from the bitstream.
  • the geometry attribute set decoded from the bitstream may simply be referred to as the decoded geometry attribute set.
  • the decompressor 235 decodes the concealment parameters from the bitstream.
  • the concealment parameters are parameters related to error recovery control when an appropriate geometric attribute set cannot be obtained in the encoding device 100, and are decoded, for example, for each picture.
  • the concealment parameters can also be expressed as error recovery control parameters.
  • the buffer 237 stores a geometric attribute set.
  • the geometric attribute set stored in the buffer 237 may be simply referred to as a stored geometric attribute set.
  • the stored geometric attribute set may be a previously decoded geometric attribute set, or may be a predefined geometric attribute set.
  • the buffer 237 may store multiple stored geometric attribute sets corresponding to multiple pictures.
  • the generator 234 generates a facial motion image based on the reference attribute set and the derived geometric attribute set.
  • the generator 234 has a generative model that outputs an image in response to the input of the reference attribute set and the derived geometric attribute set, and generates a facial motion image from the reference attribute set and the derived geometric attribute set using the generative model.
  • the generative model may be a neural network such as a Generative Adversarial Network (GAN).
  • decoding information from the bitstream corresponds to decompressing the compressed information in the bitstream.
  • a portion of the geometric attribute set corresponding to a frame may be stored in buffer 237 or retrieved from buffer 237.
  • FIG. 10 is a flowchart showing an example of the operation of the decoding device 200 in this embodiment.
  • the multiple components of the decoding device 200 shown in FIG. 9 operate according to the flowchart in FIG. 10.
  • An example of a geometric attribute set is facial landmarks derived from a frame of driving video at a given time. These landmarks indicate the location of points on key areas of the face, including the facial contours, eyes, eyebrows, nose, mouth, lips and chin. This allows for interpretation of the facial attributes and allows for easy modification of the facial attributes to produce desired emotions and facial expressions. These landmarks may be in a 2D or 3D spatial coordinate system. After decoding, the geometric attribute set may be stored in buffer 237.
  • Example of a concealment parameter indicating that a geometric attribute set was not obtained: the concealment parameter indicates that a geometric attribute set was not obtained in the encoding device 100. Specifically, a geometric attribute set may not be obtained because no face is present in the driving frame captured by the encoding device 100. In such a case, the concealment parameter may indicate that a geometric attribute set was not obtained. Such a concealment parameter may indicate that no geometric attribute set is present in the bitstream.
  • Such cases may occur when only part of the face is enclosed in the driving frame, or when the encoding device 100 is unable to partially detect (extract) the geometric attribute set from the current frame due to accuracy, environment, etc.
  • the concealment parameters may indicate low confidence (a low confidence score) in the geometric attribute set. This may arise from any of the following scenarios:
  • encoding device 100 may detect (extract) some or all of the geometric attribute set with a low confidence score and signal concealment parameters indicating that the geometric attribute set has a low confidence score.
  • the deriver 236 of the decoding device 200 determines whether the value of the concealment parameter decoded from the bitstream is equal to a predetermined value (S202), thereby determining the process for deriving a geometric attribute set by error concealment in the decoding device 200.
  • the deriver 236 derives a geometric attribute set for generating a facial moving image using at least the geometric attribute set stored in the buffer 237 (S203).
  • the stored geometric attribute set, which is the geometric attribute set stored in the buffer 237, is, for example, a geometric attribute set previously decoded from the bitstream.
  • the geometric attribute set for generating the facial moving image may be derived based on the attributes and values of the concealment parameters, and at least one of the stored geometric attribute set and the decoded geometric attribute set.
  • the decoded geometric attribute set is the geometric attribute set corresponding to the current frame.
  • FIG. 11 is a conceptual diagram showing an example of operations performed according to the concealment parameters.
  • the decompressors 233 and 235 obtain a decoded geometric attribute set and concealment parameters by applying arithmetic decoding to the bitstream.
  • the deriver 236 determines whether the concealment parameters are equal to a predetermined value. If the concealment parameters are equal to the predetermined value, the deriver 236 obtains a stored geometric attribute set.
  • the deriver 236 derives a geometric attribute set for generating a facial moving image by applying an inverse affine transformation to the decoded geometric attribute set and the stored geometric attribute set.
  • the decoded geometric attribute set and the stored geometric attribute set may be combined.
  • the decoded geometric attribute set may be combined with multiple stored geometric attribute sets.
  • the deriver 236 may derive a geometric attribute set for generating a facial moving image only from the decoded geometric attribute set, without relying on the stored geometric attribute set. Furthermore, the deriver 236 may apply the decoded geometric attribute set directly to the derived geometric attribute set, without using an inverse affine transformation.
  • the deriver 236 may derive a geometric attribute set for generating a facial moving image only from the stored geometric attribute set, without relying on the decoded geometric attribute set.
  • the deriver 236 may also apply the stored geometric attribute set directly to the derived geometric attribute set, without using an inverse affine transformation.
  • the inverse affine transformation corresponds to linear transformation such as enlargement, reduction, rotation, and translation.
  • the inverse affine transformation may also be an affine transformation.
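The inverse affine transformation described above can be sketched as follows. This is an illustrative example only, not part of the disclosed embodiment: the parameter names (scale, theta, tx, ty) are assumptions, since the transformation parameters actually signaled in the bitstream may differ.

```python
import math

def inverse_affine(points, scale, theta, tx, ty):
    # Undo an assumed affine normalization (scaling, then rotation,
    # then translation) applied to 2D landmark coordinates.
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    out = []
    for x, y in points:
        x, y = x - tx, y - ty                                 # undo translation
        x, y = x * cos_t - y * sin_t, x * sin_t + y * cos_t   # undo rotation
        out.append((x / scale, y / scale))                    # undo scaling
    return out
```

Applying this inverse to a point produced by the matching forward transform (scale, then rotate, then translate) recovers the original landmark coordinates.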
  • FIG. 12 is a conceptual diagram showing another example of an operation performed according to the concealment parameters.
  • the deriver 236 applies an inverse affine transformation to the decoded geometric attribute set.
  • the deriver 236 then combines the result obtained by applying the inverse affine transformation to the decoded geometric attribute set with the stored geometric attribute set to derive a geometric attribute set for generating a facial moving image.
  • the inverse affine transformation does not need to be applied to the stored geometric attribute set.
  • the stored geometry attribute set may be a geometry attribute set previously decoded from the bitstream and stored in the buffer 237.
  • the stored geometry attribute set may be a geometry attribute set decoded and stored for a frame prior to the current frame.
  • the stored geometry attribute set may be a geometry attribute set decoded and stored for a first (intra) frame of a group of pictures.
  • the stored geometric attribute set may correspond to a portion of a complete geometric attribute set representing the entire face.
  • the stored geometric attribute set may be the left eye group in the entire geometric attribute set.
  • the derived geometric attribute set examples (1) to (5) may correspond to the concealment parameter examples (1) to (5) above.
  • the concealment parameter indicates that the geometry attribute set has not been acquired.
  • the decoding device 200 can skip decoding the geometry attribute set. Then, the decoding device 200 can derive the geometry attribute set by directly using the stored geometry attribute set acquired from the buffer 237. For example, when the geometry attribute set has not been acquired in the encoding device 100, the concealment parameter is set to 1, thereby indicating that the geometry attribute set has not been acquired.
  • FIG. 13 is a conceptual diagram showing an example of operation when a geometric attribute set is not obtained.
  • the concealment parameter is equal to 1.
  • the deriving unit 236 obtains a stored geometric attribute set from the buffer 237.
  • the deriving unit 236 sets the stored geometric attribute set to a derived geometric attribute set.
  • the deriving unit 236 then outputs the derived geometric attribute set. That is, in this case, the deriving unit 236 outputs the stored geometric attribute set as the derived geometric attribute set.
  • the concealment parameter indicates that the number of attributes is different from the original number of attributes.
  • the decoded geometry attribute set has a different number of attributes than expected. Therefore, in this case, the decoding device 200 obtains the stored geometry attribute set from the buffer 237, and uses the decoded geometry attribute set and the stored geometry attribute set in combination. For example, when the concealment parameter is set to 1, the concealment parameter indicates that the number of attributes is different from the original number of attributes.
  • the missing attributes in the decoded geometry attribute set may be filled in or replaced by corresponding attributes in the stored geometry attribute set. If the number of attributes in the geometry attribute set decoded from the bitstream is more than expected, the excess attributes may be removed from the decoded geometry attribute set.
  • FIG. 14 is a conceptual diagram showing an example of the operation of the decoding device 200 when the number of attributes in the geometry attribute set is different from the original number of attributes.
  • missing attributes in the decoded geometry attribute set are filled in by corresponding attributes in the stored geometry attribute set.
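The fill/remove behavior above can be sketched as follows; this is an illustrative example under the assumption that attribute indices align between the decoded and stored sets, which the embodiment does not mandate.

```python
def fill_missing(decoded, stored, expected_count):
    # Truncate excess attributes, then pad any missing trailing
    # attributes with the corresponding entries of the stored set.
    derived = list(decoded[:expected_count])
    derived += stored[len(derived):expected_count]
    return derived
```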
  • the geometric attribute set decoded from the bitstream may be ignored.
  • the decoding device 200 directly uses the stored geometric attribute set obtained from the buffer 237 to derive the geometric attribute set for generating the facial motion image.
  • the decoding device 200 may skip the decoding process of the geometric attribute set and directly use the stored geometric attribute set acquired from the buffer 237 for generating the facial moving image. For example, when the concealment parameter is set to 1, it indicates that the stored geometric attribute set is to be used.
  • a user of the encoding device 100 may repeatedly perform natural movements such as blinking or nodding.
  • driving frame capture may be temporarily disabled and a natural movement such as blinking every 3 seconds may be simulated in a time loop.
  • For example, when the concealment parameter has a value of 2, it indicates that storage of the geometric attribute set in the buffer 237 is to be stopped and that retrieval of the geometric attribute set from the buffer 237 is to be started. Also, when the concealment parameter has a value of 0, it indicates that retrieval of the geometric attribute set from the buffer 237 is to be stopped and that the decoded geometric attribute set is to be applied to the derived geometric attribute set.
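A minimal sketch of dispatching on such a multi-valued concealment parameter is shown below. The mapping of values 0/1/2 follows the examples in this description but is illustrative, not normative, and the buffer is modeled as a simple list.

```python
def derive_by_concealment(param, decoded, buffer):
    # 0: use the decoded set and keep the buffer up to date.
    # 1: (error case) conceal with the most recently stored set.
    # 2: stop storing and replay previously stored sets from the
    #    buffer (e.g. to loop a natural movement such as blinking).
    if param == 0:
        buffer.append(decoded)
        return decoded
    if param == 1:
        return buffer[-1]
    if param == 2:
        return buffer.pop(0)
    raise ValueError("unknown concealment parameter")
```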
  • a confidence score may be calculated and compared to a predefined threshold. Then, for attributes that fall below the threshold, the predefined condition may be determined to be true. That is, for attributes that fall below the threshold, the confidence may be determined to be low.
  • FIG. 16 is a conceptual diagram showing an example of the operation performed in the decoding device 200 according to the reliability of the geometric attribute set.
  • When the concealment parameter is set to 0, the concealment parameter does not indicate that the reliability of the geometric attribute set is low. For example, in this case, the reliability of the geometric attribute set is high, so the decoded geometric attribute set is stored as the stored geometric attribute set and applied as the derived geometric attribute set.
  • the concealment parameter indicates that the reliability of the geometric attribute set is low.
  • the stored geometric attribute set is obtained from the buffer 237.
  • a derived geometric attribute set is obtained from the decoded geometric attribute set and the stored geometric attribute set.
  • the derived geometric attribute set is then used to generate the facial motion image. If the concealment parameter is not set to 0, the concealment parameter may indicate how to derive the geometric attribute set for generating the facial motion image, depending on the value of the concealment parameter.
  • the decoded geometric attribute set obtained from the bitstream may be ignored, and the decoding device 200 may directly use the stored geometric attribute set obtained from the buffer 237 as a derived geometric attribute set for generating the facial motion image.
  • For example, α is a smoothing parameter that indicates the level of emphasis given to the stored geometry attribute set.
  • the value of α may be predetermined or may be decoded from the bitstream.
  • the smoothing parameter α may take a value between 0% and 100%, indicating the percentage weight assigned to the stored geometry attribute set.
  • x_derived refers to the coordinate value of an attribute (landmark) in the derived geometry attribute set,
  • x_decoded refers to the coordinate value of an attribute (landmark) in the decoded geometry attribute set, and
  • x_stored refers to the coordinate value of an attribute (landmark) in the stored geometry attribute set.
  • the coordinate value for the derived geometry attribute set is calculated by a weighted average of the coordinate value for the decoded geometry attribute set and the coordinate value for the stored geometry attribute set, e.g., x_derived = α × x_stored + (1 − α) × x_decoded.
  • a weighted average of one decoded geometry attribute set and one stored geometry attribute set is used.
  • a weighted average of one decoded geometry attribute set and multiple stored geometry attribute sets may also be used.
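The weighted average described above can be sketched as follows for 2D landmark coordinates; alpha is the weight given to the stored set, expressed here as a fraction rather than a percentage.

```python
def smooth(decoded, stored, alpha):
    # Per-landmark weighted average of decoded and stored coordinates.
    # alpha in [0, 1] is the weight assigned to the stored set.
    return [
        (alpha * xs + (1 - alpha) * xd, alpha * ys + (1 - alpha) * yd)
        for (xd, yd), (xs, ys) in zip(decoded, stored)
    ]
```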
  • FIG. 17 is a conceptual diagram showing an example of the operation of replacing geometric attributes in the decoding device 200. Specifically, attributes with low reliability in the decoded geometric attribute set are replaced with attributes with high reliability in the stored geometric attribute set. For example, the reliability threshold for replacement is set to 0.7. Therefore, as shown in FIG. 17, attributes in the decoded geometric attribute set that have a reliability lower than the threshold are replaced with corresponding attributes in the stored geometric attribute set.
  • FIG. 18 is a conceptual diagram showing another example of operation in which geometric attributes are replaced by the decoding device 200.
  • the decoding device 200 replaces the missing attributes with corresponding attributes in the stored geometric attribute set. In other words, the decoding device 200 compensates for the missing attributes with corresponding attributes in the stored geometric attribute set.
  • a corresponding attribute may be selected from among a plurality of corresponding attributes in a plurality of stored geometric attribute sets based on a plurality of confidence levels and applied to the missing attribute.
  • the reliability may also be determined in units of a geometric attribute set, in units of an attribute, or in units of a group including multiple attributes in a geometric attribute set. A geometric attribute set or an attribute may then be selected in the units for which the reliability is determined.
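The per-attribute replacement described above (FIG. 17) can be sketched as follows; the threshold value 0.7 matches the example in this description, and the per-attribute confidence list is an illustrative assumption.

```python
def replace_low_confidence(decoded, confidences, stored, threshold=0.7):
    # Replace each decoded landmark whose confidence score falls
    # below the threshold with the corresponding stored landmark.
    return [
        s if c < threshold else d
        for d, c, s in zip(decoded, confidences, stored)
    ]
```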
  • (c) Distance correction: for example, if the concealment parameters indicate that the reliability of the geometric attribute set is low, the deriver 236 obtains a derived geometric attribute set by referring to the stored geometric attribute set and correcting the distances between the portions in the decoded geometric attribute set.
  • the stored geometric attribute set is divided into different groups, each of which represents a major facial feature, and a reference distance set is derived and stored by deriving the center point of each group and calculating the relative distance from the derived center point to a reference point (such as the tip of the nose).
  • a distance is derived for each group by similarly dividing the decoded geometric attribute set into groups, deriving the center point of each group, and calculating the relative distance from the derived center point to the reference point. Then, if the derived distance differs from the distance in the reference distance set by more than a threshold, the entire group is shifted so that the derived distance matches the distance in the reference distance set.
  • the geometric attribute set refers to the geometric attribute set corresponding to the frame. That is, the geometric attribute set refers to the complete geometric attribute set for the entire face.
  • the center point of the group may be derived by averaging the x and y coordinates of all attributes (points) in the group.
  • the center point of the group may be determined by deriving a bounding box that encloses all points in the group based on the minimum and maximum x and y values of all attributes (points) in the group, and setting the center of the bounding box as the center point of the group.
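The two alternative center-point derivations described above (coordinate average and bounding-box center) can be sketched as follows for a group of 2D landmarks.

```python
def center_by_mean(group):
    # Center of a landmark group as the mean of its coordinates.
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def center_by_bbox(group):
    # Center of the bounding box enclosing all points in the group,
    # derived from the minimum and maximum x and y values.
    xs = [p[0] for p in group]
    ys = [p[1] for p in group]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
```

The two methods can give different results for asymmetric groups; which one is used is a design choice.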
  • FIG. 19 is a conceptual diagram showing an example of a stored geometry attribute set, which is a geometry attribute set stored in the decoding device 200.
  • the center point of the "right eye" group of the stored geometry attribute set is represented by (x_stored_re, y_stored_re)
  • the nose tip point of the stored geometry attribute set is represented by (x_stored_n, y_stored_n).
  • FIG. 20 is a conceptual diagram showing an example of a decoded geometry attribute set that is a geometry attribute set decoded in the decoding device 200.
  • the center point of the "right eye" group of the decoded geometry attribute set is represented by (x_decoded_re, y_decoded_re).
  • the nose tip point of the decoded geometry attribute set is represented by (x_decoded_n, y_decoded_n).
  • FIG. 21 is a conceptual diagram showing an example of a derived geometry attribute set, which is a geometry attribute set derived in the decoding device 200.
  • the center point of the "right eye" group of the derived geometry attribute set is represented by (x_derived_re, y_derived_re)
  • the nose tip point of the derived geometry attribute set is represented by (x_derived_n, y_derived_n).
  • the method for setting the derived geometric attribute set in FIG. 21 based on the stored geometric attribute set in FIG. 19 and the decoded geometric attribute set in FIG. 20 is as follows.
  • the deriver 236 calculates the distance d_decoded_re from the nose tip point to the center point of the right eye group in the decoded geometric attribute set according to the following formula: d_decoded_re = sqrt((x_decoded_re − x_decoded_n)² + (y_decoded_re − y_decoded_n)²).
  • the deriver 236 determines whether the difference between the distance from the nose tip point to the center point of the right eye group in the stored geometric attribute set and the distance from the nose tip point to the center point of the right eye group in the decoded geometric attribute set is greater than a threshold value.
  • the deriver 236 sets the right-eye group in the decoded geometry attribute set to the right-eye group in the derived geometry attribute set, and skips the following processes (4), (5), and (6).
  • the deriver 236 derives a transformation to map the center point of the right-eye group (x decoded_re , y decoded_re ) in the decoded geometric attribute set to the center point of the right-eye group (x derived_re , y derived_re ) in the derived geometric attribute set.
  • the deriving unit 236 shifts the entire right-eye group using the same transformation derived in process (5) above.
  • the deriving unit 236 sets a derived geometric attribute set by performing the above steps (1) to (6) for each of the other groups. This adjusts the distance between the groups so that they are not too far apart.
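The per-group procedure above can be sketched as follows. This is an illustrative example: it uses the mean-coordinate center point and shifts the group along the line toward the reference point (e.g. the nose tip) so that the corrected distance matches the stored reference distance; the actual mapping used by the deriver 236 may differ.

```python
import math

def correct_group_distance(group, ref_point, stored_dist, threshold):
    # Center point of the group (mean of coordinates).
    cx = sum(p[0] for p in group) / len(group)
    cy = sum(p[1] for p in group) / len(group)
    # Relative distance from the center point to the reference point.
    d = math.hypot(cx - ref_point[0], cy - ref_point[1])
    if abs(d - stored_dist) <= threshold or d == 0:
        return group  # within tolerance: keep the group as-is
    # Map the center point so its distance matches the stored distance,
    # then shift the entire group by the same translation.
    scale = stored_dist / d
    new_cx = ref_point[0] + (cx - ref_point[0]) * scale
    new_cy = ref_point[1] + (cy - ref_point[1]) * scale
    dx, dy = new_cx - cx, new_cy - cy
    return [(x + dx, y + dy) for (x, y) in group]
```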
  • the generator 234 generates an image from the derived geometric attribute set via a neural network.
  • the neural network is a generative network.
  • the generative network may also be a generative adversarial network (GAN), a variational autoencoder (VAE), an autoregressive model, or a diffusion model.
  • the generative network may also be a machine learning framework that generates new data based on a provided dataset.
  • a generative network is also called a generative model.
  • by analyzing and learning the underlying distribution of a dataset, a generative network can ensure that a new dataset it generates is similar to the original dataset.
  • the decoding device 200 generates an image based on the concealment parameters.
  • Target applications may include, but are not limited to, video generation, editing, and playback in video conferencing, the entertainment industry, social media, and the e-commerce industry.
  • the processing of the decoding device 200 may also be performed in the encoding device 100 in a similar manner. Also, not all of the components in this disclosure are necessarily required, and only some of the multiple components may be implemented.
  • FIG. 22 is a block diagram showing another example configuration of the decoding device 200 in this embodiment.
  • the decoding device 200 includes a video decoder 401, an entropy decoder 402, an inverse affine transformer 403, a determiner 404, a geometric attribute buffer 405, and a generator 406.
  • the entropy decoder 402 may be included in the video decoder 401.
  • the video decoder 401 applies a video decoding process to the bitstream received from the encoding device 100.
  • the entropy decoder 402 then applies an entropy decoding process, such as CABAC or VLC, to the SEI data obtained from the bitstream to obtain concealment parameters and, optionally, a current decoded geometric attribute set.
  • the inverse affine transformer 403 performs a transform, such as an inverse affine transform, on the decoded geometric attribute set.
  • the inverse affine transformer 403 may be replaced with another transformer used to transform the decoded geometric attribute set. Specifically, if affine parameters or other transformation parameters are decoded, an inverse affine transform may be performed. In other cases, the inverse affine transform may be replaced with another compatible process.
  • the determiner 404 determines whether the concealment parameter is equal to a predetermined value. If it is determined that the concealment parameter is equal to the predetermined value, the inverse affine transformer 403 retrieves the stored geometric attribute set in the geometric attribute buffer 405 and combines it with the decoded geometric attribute set to set a derived geometric attribute set. The generator 406 then generates a frame of the facial video image based on the derived geometric attribute set set by error concealment in the decoding device 200.
  • the decoding device 200 can perform error concealment based on the concealment parameters and generate a facial moving image.
  • a stored geometry attribute set is used in error concealment.
  • however, the decoding device 200 does not need to use a stored geometry attribute set for error concealment.
  • the decoding device 200 may apply a previous frame in the facial video to a current frame in the facial video.
  • the decoding device 200 may apply an already generated facial image to a facial image corresponding to the current frame, without using a generative model to generate a facial image corresponding to the current frame in the facial video.
  • [Encoding configuration and processing] FIG. 23 is a block diagram showing a configuration example of the encoding device 100 according to the present embodiment.
  • the encoding device 100 includes compressors 131, 133, and 134, a deriver 132, and a buffer 135.
  • Each component is, for example, an electric circuit that performs information processing.
  • Two or more of the compressors 131, 133, and 134 may be integrated together.
  • the compressor 131 encodes a reference image into a bitstream.
  • the deriver 132 derives a geometric attribute set for each picture from the driving video.
  • the deriver 132 also generates (derives) concealment parameters in deriving the geometric attribute set.
  • the compressor 133 encodes the geometric attribute set for each picture into a bitstream.
  • the geometric attribute set derived and encoded from the driving video may also be expressed as a derived geometric attribute set, a target geometric attribute set for encoding, or an encoded geometric attribute set.
  • the compressor 134 encodes the concealment parameters into a bitstream.
  • the concealment parameters are parameters related to error recovery control when an appropriate geometric attribute set cannot be obtained in the encoding device 100.
  • the concealment parameters can also be expressed as error recovery control parameters. For example, the concealment parameters are derived and encoded for each picture.
  • the buffer 135 stores a geometric attribute set.
  • the geometric attribute set stored in the buffer 135 may be simply referred to as a stored geometric attribute set.
  • the stored geometric attribute set may be a geometric attribute set derived in the past, or may be a predefined geometric attribute set.
  • the buffer 135 may store multiple stored geometric attribute sets corresponding to multiple pictures.
  • the deriving unit 132 may use the stored geometric attribute set in the buffer 135 in deriving the geometric attribute set. For example, the deriving unit 132 detects a geometric attribute set for each picture from the driving video. The geometric attribute set detected from the driving video may also be expressed as a detected geometric attribute set or an extracted geometric attribute set. The deriving unit 132 then sets a derived geometric attribute set based on the detected geometric attribute set and the stored geometric attribute set.
  • the deriver 132 may derive the geometric attribute set based on multiple stored geometric attribute sets corresponding to multiple pictures.
  • the stored geometric attribute set may be an average of the multiple stored geometric attribute sets.
  • the concealment parameters may be generated by the detected geometric attribute set, may be generated by the derived geometric attribute set, or may be generated by the detected geometric attribute set and the derived geometric attribute set. Also, for example, the concealment parameters may be generated based on the detected geometric attribute set, the derived geometric attribute set may be set based on the concealment parameters, and the concealment parameters may be updated based on the derived geometric attribute set.
  • encoding the information into a bitstream corresponds to compressing the information and including the compressed information in the bitstream. Also, a portion of the geometric attribute set corresponding to the frame may be stored in buffer 135 or retrieved from buffer 135.
  • FIG. 24 is a flowchart showing an example of the operation of the encoding device 100 in this embodiment.
  • the multiple components of the encoding device 100 shown in FIG. 23 operate according to the flowchart in FIG. 24.
  • the deriver 132 detects a geometric attribute set for each picture from the driving video (S101). The deriver 132 then generates concealment parameters based on the detection result of the geometric attribute set (S102). The compressor 134 then encodes the concealment parameters into a bitstream (S103).
  • concealment parameters may be generated based on the face detection results.
  • the face detection results in the image may be reflected in the geometric attribute set detection results. For example, if a face is not detected in the image, the concealment parameters may indicate that the geometric attribute set is not detected.
  • the geometric attributes may be facial landmarks that indicate the location of points on key areas of the face, including the facial contours, eyes, eyebrows, nose, mouth, lips, and chin, allowing facial expressions to be interpreted and easily modified to produce desired emotions and expressions.
  • These landmarks may be in a two-dimensional spatial coordinate system or a three-dimensional spatial coordinate system.
  • FIG. 25 is a conceptual diagram showing a specific example of a geometric attribute set.
  • the geometric attribute set is a face landmark set covering the eyes, eyebrows, nose, mouth, lips, chin, and facial contours extracted from the image.
  • FIG. 26 is a conceptual diagram showing another specific example of a geometric attribute set.
  • the geometric attribute set is a face landmark set that covers the eyes, eyebrows, nose, mouth, lips, chin, and facial contours extracted from the image, covering more points than the example in FIG. 25.
  • FIG. 27 is a conceptual diagram showing yet another specific example of a geometric attribute set.
  • the geometric attribute set is a face landmark set that covers the eyes, eyebrows, nose, mouth, lips, chin, cheeks, and facial contours extracted from the image, covering more points than the examples in FIG. 25 and FIG. 26.
  • the geometric attribute set may be stored in buffer 135.
  • the set of geometric attributes encoded in the bitstream may also be a subset of the full face geometric attribute set, or one or more groups in the full face geometric attribute set, each group representing a key facial feature.
  • the deriver 132 may predict a confidence score for a geometric attribute set.
  • the prediction of the confidence score corresponds to the derivation, calculation, evaluation, determination, or acquisition of a confidence score.
  • the prediction of the confidence score may be performed in units of a geometric attribute set, in units of a geometric attribute, or in units of a group in the geometric attribute set. Then, a concealment parameter may be generated based on the confidence score.
  • Example of a concealment parameter indicating that a geometric attribute set has not been acquired: when detecting a geometric attribute set, the detection may fail and no geometric attribute set may be obtained. Therefore, the encoding device 100 may signal a concealment parameter based on the detection result of the geometric attribute set. In other words, the concealment parameter may indicate that a geometric attribute set has not been acquired.
  • Such cases may occur in situations where no face is detected in the driving frames captured by the encoding device 100. In that situation, an empty set of geometric attributes may be encoded in the bitstream, which may result in the face not being reproduced or in erroneous distortion in the output image.
  • the concealment parameters may suppress such errors.
  • Example of a concealment parameter indicating that the number of attributes differs from the original number of attributes: the concealment parameter may indicate that the number of attributes in the geometric attribute set is different from the original number of attributes.
  • the original number of attributes may be the expected number of attributes.
  • the number of attributes in the geometry attribute set corresponding to the current frame may differ from the number of attributes in the geometry attribute set corresponding to the previous frame.
  • the number of attributes in the geometry attribute set corresponding to the current frame may differ from the defined number of attributes or the number of attributes that should be included in each geometry attribute set.
  • missing attributes in the detected geometry attribute set may be supplemented or replaced by corresponding attributes in the stored geometry attribute set. If the number of attributes in the detected geometry attribute set is greater than expected, excess attributes may be removed from the detected geometry attribute set.
  • the encoding device 100 may encode and transmit the detected geometric attribute set as the derived geometric attribute set, and further signal a concealment parameter indicating that the number of attributes in the derived geometric attribute set differs from the expected number. In yet another example, the encoding device 100 may ignore the detected geometric attribute set and encode the stored geometric attribute set obtained from the buffer 135 into the bitstream as the derived geometric attribute set.
  • the encoding device 100 does not need to set a value indicating that the number of attributes is different from the original number of attributes as the concealment parameter. Instead, the encoding device 100 may signal to the bitstream a concealment parameter indicating that the reliability of the geometric attribute set is low.
  • the encoding device 100 may additionally transmit the position of the center of the face in the driving frame to the decoding device 200, to inform the decoding device 200 in which part of the frame the face is located. This may assist in signaling the direction and proportion of the occluded face to the decoding device 200.
  • the encoding device 100 skips encoding the geometric attribute set and signals the concealment parameter to the decoding device 200.
  • the decoding device 200 may directly use the stored geometric attribute set in the buffer 237 for face reconstruction based on the concealment parameter.
  • the concealment parameter may indicate to use the stored geometric attribute set when set to 1.
  • Such a case may occur in a situation where a user of the encoding device 100 selects a stored geometric attribute set and the encoding device 100 requests the decoding device 200 to perform face reproduction using the selected stored geometric attribute set. In this case, detection of the geometric attribute set may be omitted in the encoding device 100.
  • the encoding device 100 may signal to the decoding device 200 the selection information of the stored geometric attribute set to be used for face reproduction, without signaling the detected geometric attribute set.
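As a minimal illustration of this behavior, the following Python sketch (function and parameter names are hypothetical, not part of this specification) selects which geometric attribute set to use based on the concealment parameter value described above:

```python
def select_geometry_set(concealment_param, detected_set, stored_set):
    # A concealment parameter value of 1 indicates that the stored
    # geometric attribute set should be used for face reconstruction;
    # otherwise the detected (decoded) set is used.
    if concealment_param == 1:
        return stored_set
    return detected_set
```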
  • Example of a concealment parameter indicating that the confidence of the geometric attribute set is low
  • a face is included in the driving frame and the geometric attribute set is detected from the driving frame.
  • the confidence score of the geometric attribute set may be predicted.
  • a part of the face may be temporarily hidden by wearing a mask that partially or completely covers the mouth, an eye patch that covers one or both eyes, a hand, a body movement or other object that temporarily blocks a part of the face, or a facial ornament and accessory such as sunglasses.
  • FIG. 29 is a conceptual diagram showing an example of a face that is partially hidden by occlusion.
  • the encoding device 100 can predict attributes (locations of landmarks) that were not detected due to occlusion by using nearby attributes (locations of landmarks) that are not affected by occlusion.
  • the set of geometric attributes derived and stored from the previous frame in the driving video is useful.
  • a face in a driving frame may be in an extreme head pose, resulting in parts of the face being obscured.
  • FIG. 30 is a conceptual diagram showing an example of a face with an extreme yaw pose.
  • the encoding device 100 may signal concealment parameters indicating that these landmarks are occluded and have a low confidence score.
  • a limit may be preset in the encoding device 100 on the maximum allowed head pose angle, such as a 45-degree yaw limit relative to the frontal pose.
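A simple check against such a preset limit might be sketched as follows (the 45-degree value is the example from the text; the function name is hypothetical):

```python
MAX_YAW_DEGREES = 45.0  # example preset limit relative to the frontal pose

def yaw_exceeds_limit(yaw_degrees):
    # True when the head pose is too extreme, i.e. when a concealment
    # parameter indicating a low confidence score should be signaled.
    return abs(yaw_degrees) > MAX_YAW_DEGREES
```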
  • FIG. 31 is a conceptual diagram showing an example of a face having an extreme pose with respect to roll. In such a case, face reproduction may not be performed appropriately. Therefore, even in such a case, the encoding device 100 may signal concealment parameters indicating a low confidence score.
  • the encoding device 100 predicts some or all of the attributes in the geometric attribute set with a low confidence score.
  • the encoding device 100 then encodes a concealment parameter indicating that the confidence score of the geometric attribute set is low.
  • the concealment parameter may indicate that the confidence score is low per geometric attribute set, per group in the geometric attribute set, or per attribute in the geometric attribute set.
  • the encoding device 100 may perform one or more of the following processes for error concealment:
  • the concealment parameter when set to 1, 2 or 3, may indicate a low confidence level for the geometric attribute set.
  • the concealment parameter set to 1, 2 or 3 may correspond to the above processes (a), (b) or (c).
  • FIG. 34 is a conceptual diagram showing an example of an operation performed in the encoding device 100 according to the reliability of the geometric attribute set.
  • When the concealment parameter is set to 0, the concealment parameter does not indicate that the reliability of the geometric attribute set is low. For example, in this case, the reliability of the geometric attribute set is high, so the detected geometric attribute set is stored as the stored geometric attribute set and used as the derived geometric attribute set.
  • multiple detected geometry attribute sets corresponding to multiple pictures may be stored in the buffer 135 as multiple stored geometry attribute sets. Then, multiple stored geometry attribute sets corresponding to multiple pictures may be used in deriving the geometry attribute set.
  • the derived coordinate values are calculated as x_derived = (1 − α) × x_detected + α × x_stored, where α is a smoothing parameter indicating the level of emphasis given to the stored geometric attribute set.
  • the value of α may be predetermined, or may be dynamically determined and coded into the bitstream.
  • the smoothing parameter α may take a value between 0% and 100%, indicating the percentage weight assigned to the stored geometric attribute set.
  • x_derived refers to the coordinate values of the attributes (landmarks) in the derived geometric attribute set,
  • x_detected refers to the coordinate values of the attributes (landmarks) in the detected geometric attribute set, and
  • x_stored refers to the coordinate values of the attributes (landmarks) in the stored geometric attribute set.
  • the coordinate values for the derived geometry attribute set are calculated by a weighted average of the coordinate values for the detected geometry attribute set and the coordinate values for the stored geometry attribute set.
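The weighted average above can be sketched in Python as follows (a minimal illustration; landmark sets are assumed to be lists of (x, y) tuples of equal length):

```python
def smooth_attribute_set(detected, stored, alpha):
    # alpha is the smoothing parameter: the fraction of weight given to the
    # stored geometric attribute set, applied per coordinate:
    #   x_derived = (1 - alpha) * x_detected + alpha * x_stored
    return [((1 - alpha) * xd + alpha * xs,
             (1 - alpha) * yd + alpha * ys)
            for (xd, yd), (xs, ys) in zip(detected, stored)]
```

With alpha = 0 the detected set is used unchanged; with alpha = 1 the stored set replaces it entirely.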
  • the deriver 132 obtains a derived geometric attribute set by combining the detected geometric attribute set with the stored geometric attribute set.
  • FIG. 35 is a conceptual diagram showing an example of the operation of replacing geometric attributes in the encoding device 100. Specifically, attributes with low reliability in the detected geometry attribute set are replaced with attributes with high reliability in the stored geometry attribute set. For example, the reliability threshold for replacement is set to 0.7. Therefore, as shown in FIG. 35, attributes in the detected geometry attribute set that have a reliability lower than the threshold are replaced with corresponding attributes in the stored geometry attribute set.
  • the reliability may also be determined in units of a geometric attribute set, in units of an attribute, or in units of a group including multiple attributes in a geometric attribute set. A geometric attribute set or an attribute may then be selected in units for which the reliability is determined.
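The per-attribute replacement described above can be sketched as follows (the 0.7 threshold is the example value from the text; function and parameter names are hypothetical):

```python
def conceal_low_confidence(detected, confidences, stored, threshold=0.7):
    # Keep each detected landmark whose confidence score meets the threshold;
    # replace the others with the corresponding stored landmark.
    return [d if c >= threshold else s
            for d, c, s in zip(detected, confidences, stored)]
```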
  • the deriver 132 obtains a derived geometric attribute set by referring to the stored geometric attribute set and correcting the distances between the portions in the detected geometric attribute set.
  • a distance is derived for each group by similarly dividing the detected geometric attribute set into groups, deriving the center point of each group, and calculating the relative distance from the derived center point to the reference point. Then, if the derived distance differs from the corresponding distance in the reference distance set by more than a threshold, the entire group is shifted so that the derived distance matches the distance in the reference distance set.
  • here, the geometric attribute set refers to the geometric attribute set corresponding to the frame, that is, the complete geometric attribute set for the entire face.
  • the center point of the group may be derived by averaging the x and y coordinates of all attributes (points) in the group.
  • the center point of the group may be determined by deriving a bounding box that encloses all points in the group based on the minimum and maximum x and y values of all attributes (points) in the group, and setting the center of the bounding box as the center point of the group.
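Both center-point derivations can be sketched as follows (a minimal illustration; a group is assumed to be a non-empty list of (x, y) tuples):

```python
def center_by_mean(points):
    # Average the x and y coordinates of all attributes (points) in the group.
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def center_by_bounding_box(points):
    # Center of the bounding box that encloses all points in the group,
    # derived from the minimum and maximum x and y values.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)
```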
  • FIG. 37 is a conceptual diagram showing an example of a stored geometry attribute set, which is a geometry attribute set stored in the encoding device 100.
  • the center point of the "right eye" group in the stored geometric attribute set is represented by (x_stored_re, y_stored_re), and
  • the nose tip point in the stored geometric attribute set is represented by (x_stored_n, y_stored_n).
  • the deriver 132 calculates the distance d_detected_re from the nose tip point (x_detected_n, y_detected_n) to the center point of the right eye group (x_detected_re, y_detected_re) in the detected geometric attribute set according to the following formula: d_detected_re = √((x_detected_re − x_detected_n)² + (y_detected_re − y_detected_n)²)
  • the deriver 132 determines whether the difference between the distance from the nose tip point in the stored geometric attribute set to the center point of the right eye group and the distance from the nose tip point to the center point of the right eye group in the detected geometric attribute set is greater than a threshold value.
  • the deriver 132 sets the distance from the nose tip point in the derived geometric attribute set to the center point of the right eye group to the distance from the nose tip point in the stored geometric attribute set to the center point of the right eye group. In this case, the deriver 132 also sets the nose tip point in the derived geometric attribute set to the nose tip point in the detected geometric attribute set.
  • the deriver 132 sets the right eye group in the detected geometric attribute set to the right eye group in the derived geometric attribute set, and skips the following processes (4), (5), and (6).
  • the deriver 132 derives a transformation to map the center point of the right eye group (x_detected_re, y_detected_re) in the detected geometric attribute set to the center point of the right eye group (x_derived_re, y_derived_re) in the derived geometric attribute set.
  • the deriver 132 sets the derived geometric attribute set by performing the above steps (1) to (6) for each of the other groups. This adjusts the distances between the groups so that they are not too far apart.
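Steps (1) to (6) for a single group can be sketched as follows (a simplified illustration that shifts the whole group by a pure translation along the nose-to-center direction; names and the tuple representation are assumptions):

```python
import math

def correct_group_distance(group, nose, stored_distance, threshold):
    # (1)-(3): derive the group center and its distance from the nose tip.
    cx = sum(x for x, _ in group) / len(group)
    cy = sum(y for _, y in group) / len(group)
    d = math.hypot(cx - nose[0], cy - nose[1])
    # Keep the detected group when the distance is close to the stored one.
    if abs(d - stored_distance) <= threshold or d == 0.0:
        return group
    # (4)-(6): shift the whole group so its center lies at the stored
    # distance, on the same line from the nose tip through the center.
    scale = stored_distance / d
    tx = nose[0] + (cx - nose[0]) * scale
    ty = nose[1] + (cy - nose[1]) * scale
    dx, dy = tx - cx, ty - cy
    return [(x + dx, y + dy) for x, y in group]
```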
  • the stored geometric attribute set may be a geometric attribute set previously coded into the bitstream and stored in the buffer 135.
  • the stored geometric attribute set may be a geometric attribute set coded and stored in a frame prior to the current frame.
  • the stored geometric attribute set may be a geometric attribute set coded and stored in the first (intra) frame of a group of pictures.
  • the stored geometric attribute set may be a geometric attribute set that exists locally in both the encoding device 100 and the decoding device 200. Specifically, the geometric attribute set may be derived from a common image that exists locally in both the encoding device 100 and the decoding device 200. In yet another example, the stored geometric attribute set may be a predetermined geometric attribute set.
  • the stored geometric attribute set may correspond to a portion of a complete geometric attribute set representing the entire face.
  • the stored geometric attribute set may be the left eye group in the entire geometric attribute set.
  • the encoding device 100 may retain only the most recently derived geometric attribute sets. Alternatively, the encoding device 100 may retain only as many geometric attribute sets, in order of recency, as a preset threshold allows, and discard the less recent geometric attribute sets. This ensures that the one or more geometric attribute sets in the buffer 135 of the encoding device 100 are kept up to date with the latest changes to the driving frame scene.
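A bounded history of this kind can be sketched with a fixed-capacity queue (the class name and capacity value are hypothetical; `collections.deque` with `maxlen` discards the oldest entry automatically):

```python
from collections import deque

MAX_SETS = 4  # hypothetical preset threshold on the number of buffered sets

class GeometryBuffer:
    """Sketch of buffer 135: keeps only the most recent geometric
    attribute sets, discarding the less recent ones automatically."""
    def __init__(self, max_sets=MAX_SETS):
        self._sets = deque(maxlen=max_sets)

    def store(self, attr_set):
        self._sets.append(attr_set)

    def latest(self):
        return self._sets[-1]

    def __len__(self):
        return len(self._sets)
```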
  • the encoding device 100 may notify the decoding device 200 that the number of attributes in the geometric information differs from the original number. Therefore, it may be possible for the decoding device 200 to appropriately perform error recovery control.
  • the circuit 151 encodes base data of an image included in the video (S611).
  • the circuit 151 also encodes face attribute parameters indicating a face included in the image into a bit stream (S612).
  • the circuit 151 also further encodes a reliability parameter related to the reliability of the face attribute parameter into the bit stream (S613).
  • the reliability parameter indicates that the reliability is low depending on the value of the reliability parameter.
  • the base data corresponds to a reference image or reference attributes.
  • the facial attribute parameters correspond to a geometric attribute set.
  • the reliability parameters correspond to concealment parameters. If the reliability parameters do not indicate that the reliability of the facial attribute parameters is low, the reliability of the facial attribute parameters may be high or indefinite.
  • the encoding device 100 may also include an input terminal, an entropy encoder, and an output terminal.
  • the operations performed by the circuit 151 may be performed by the entropy encoder.
  • Data used in the operation of the entropy encoder may be input to the input terminal.
  • Data obtained by the operation of the entropy encoder may be output from the output terminal.
  • FIG. 56 is a block diagram showing an implementation example of the decoding device 200.
  • the decoding device 200 includes a circuit 251 and a memory 252.
  • the multiple components of the decoding device 200 described above are implemented by the circuit 251 and the memory 252.
  • Circuit 251 is an electric circuit that performs information processing and can access memory 252.
  • circuit 251 may be a dedicated circuit that executes the decoding method of the present disclosure, or may be a general-purpose circuit that executes a program corresponding to the decoding method of the present disclosure.
  • Circuit 251 may also be a processor such as a CPU.
  • circuit 251 may be a collection of multiple circuits.
  • FIG. 57 is a flowchart showing a first basic operation example of the decoding device 200.
  • the circuit 251 of the decoding device 200 uses the memory 252 to perform the following:
  • the circuit 251 decodes from the bit stream face image base data relating to the face moving image and geometric information, which corresponds to each of multiple frames of the face moving image and indicates geometric attributes within an area including a person's face (S701).
  • the circuit 251 also decodes from the bit stream concealment parameters relating to error recovery control in the case where the geometric information has not been properly acquired by the encoding device 100 (S702).
  • the circuit 251 generates the face moving image from the base data, the geometric information, and the concealment parameters using a generative model (S703).
  • the base data corresponds to a reference image or reference attributes.
  • the geometric information corresponds to a geometric attribute set.
  • whether the geometric information has been appropriately acquired corresponds, for example, to whether the geometric information has been acquired according to a predetermined criterion that corresponds to the geometric information being appropriately acquired, and more specifically, to whether the geometric information that meets a predetermined condition has been acquired.
  • the circuit 251 may decode the concealment parameters from a header region in the bitstream. This may enable the encoding device 100 to notify the decoding device 200 of information relating to error recovery control in the case where the geometric information has not been properly acquired in the encoding device 100 via the header region in the bitstream. Therefore, it may be possible for the decoding device 200 to properly perform error recovery control based on the information in the header region.
  • the concealment parameter may indicate that the reliability of the geometric information corresponding to the current frame obtained by the encoding device 100 is low according to the value of the concealment parameter. This may make it possible for the encoding device 100 to notify the decoding device 200 that the reliability of the geometric information obtained by the encoding device 100 is low. Therefore, it may become possible for the decoding device 200 to appropriately perform error recovery control.
  • a low reliability of the geometric information may correspond to, for example, a reliability of the geometric information being lower than a threshold value.
  • the threshold value may be an average reliability or an arbitrarily determined reliability.
  • a low reliability of the geometric information may correspond to the geometric information not conforming to a specified condition, or the geometric information not being acquired in accordance with a specified criterion, etc. Also, if the geometric information does not conform to a specified condition, or the geometric information is not acquired in accordance with a specified criterion, the reliability of the geometric information may be considered to be lower than a threshold value.
  • the concealment parameter may indicate that the encoding device 100 has not acquired geometric information corresponding to the current frame, depending on the value of the concealment parameter. This may make it possible for the encoding device 100 to notify the decoding device 200 that the encoding device 100 has not acquired geometric information. Therefore, it may be possible for the decoding device 200 to appropriately perform error recovery control.
  • the concealment parameter may indicate, depending on the value of the concealment parameter, that the number of attributes in the geometric information corresponding to the current frame obtained by the encoding device 100 is different from the original number of attributes. This may make it possible for the encoding device 100 to notify the decoding device 200 that the number of attributes in the geometric information obtained by the encoding device 100 is inappropriate. Therefore, it may become possible for the decoding device 200 to appropriately perform error recovery control.
  • the concealment parameter may indicate, depending on the value of the concealment parameter, that stored geometric information is to be applied to the current frame of the facial video image, instead of geometric information decoded from the bitstream. This may enable the encoding device 100 to notify the decoding device 200 that the stored geometric information is to be applied to the current frame. Therefore, it may be possible to appropriately perform error recovery control in the decoding device 200.
  • the concealment parameter may indicate that geometric information corresponding to the current frame is to be stored for a subsequent frame, depending on the value of the concealment parameter. This may enable the encoding device 100 to notify the decoding device 200 that geometric information corresponding to the current frame is to be stored. Therefore, it may be possible for the decoding device 200 to appropriately perform error recovery control for the subsequent frame.
  • the concealment parameters may indicate that the geometric information corresponding to the current frame obtained by the encoding device 100 is not reliable, or that the number of attributes in the geometric information is different from the original number of attributes.
  • the circuit 251 may correct the geometric information corresponding to the current frame decoded from the bitstream using the stored geometric information. The circuit 251 may then apply the corrected geometric information to the geometric information for generating a face image corresponding to the current frame in the face video.
  • the circuit 251 may apply the stored geometric information to the geometric information for generating a facial image corresponding to the current frame in the facial video.
  • the concealment parameters may indicate the first information, the second information, or the third information.
  • the first information is that the geometric information corresponding to the current frame, which is obtained by the encoding device 100, has low reliability.
  • the second information is that the geometric information was not obtained.
  • the third information is that the number of attributes in the geometric information is different from the original number of attributes.
  • even when the encoding device 100 has not properly acquired geometric information, it may be possible to use a facial image that has already been generated for the current frame, rather than generating a new facial image. This may therefore make it possible to properly perform error recovery control.
  • FIG. 58 is a flowchart showing a second basic operation example of the decoding device 200.
  • the circuit 251 of the decoding device 200 uses the memory 252 to perform the following:
  • the bitstream may include a reliability parameter before the facial attribute parameter for each of a plurality of pictures. This may make it possible for the encoding device 100 to notify the decoding device 200 of the reliability of the facial attribute parameter before notifying the facial attribute parameter. Therefore, in the decoding device 200, it may be possible to appropriately determine that the reliability of the facial attribute parameter is low, and then process the facial attribute parameter.
  • the encoding device 100 and the decoding device 200 in each of the above-mentioned examples may be used as an image encoding device and an image decoding device, or as a video encoding device and a video decoding device, respectively. Furthermore, a plurality of components included in the encoding device 100 and a plurality of components included in the decoding device 200 may perform corresponding operations.
  • the coding information and compression information contained in a bitstream may simply be referred to as information.
  • each of the above examples may be used as an encoding method, a decoding method, an entropy encoding method, an entropy decoding method, or any other method.
  • An example of the above-mentioned software program is a bitstream.
  • the bitstream includes an encoded image and a syntax for performing a decoding process to decode the image.
  • the bitstream causes the decoding device 200 to decode the image by causing the decoding device 200 to execute a process based on the syntax.
  • software for realizing the above-mentioned encoding device 100 or decoding device 200 is a program such as the following.
  • the program may cause a computer to execute an encoding method that encodes, into a bit stream, face image base data relating to a face moving image and geometric information, which corresponds to each of a plurality of frames of the face moving image and indicates geometric attributes within an area including a person's face, determines whether the geometric information has been properly acquired, and further encodes, into the bit stream, concealment parameters relating to error recovery control in a decoding device in the event that the geometric information has not been properly acquired.
  • the program may cause a computer to execute an encoding method that encodes base data of an image included in a video, encodes face attribute parameters indicating a face included in the image into a bitstream, and further encodes a reliability parameter related to the reliability of the face attribute parameter into the bitstream, the reliability parameter indicating that the reliability is low according to the value of the reliability parameter.
  • the program may cause a computer to execute a decoding method that decodes from a bit stream face image base data relating to a facial video and geometric information, which corresponds to each of a plurality of frames of the facial video and indicates geometric attributes within an area including a person's face, further decodes from the bit stream concealment parameters relating to error recovery control in the event that the geometric information is not properly acquired by the encoding device, and generates the facial video from the base data, the geometric information, and the concealment parameters using a generative model.
  • the program may cause a computer to execute a decoding method that acquires base data of an image included in a video, decodes facial attribute parameters indicating a face included in the image from a bit stream, and inputs the base data and the facial attribute parameters into a generative model to generate an output image corresponding to the image, the bit stream including a reliability parameter related to the reliability of the facial attribute parameter, and the reliability parameter indicating that the reliability is low according to the value of the reliability parameter.
  • each of the components described above may be a circuit. These circuits may form a single circuit as a whole, or each may be a separate circuit. In addition, each of the components may be realized by a general-purpose processor or a dedicated processor.
  • the processing performed by a specific component may be executed by another component. Furthermore, the order in which the processing is executed may be changed, or multiple processing may be executed in parallel. Furthermore, any two or more of the multiple examples of the present disclosure may be appropriately combined and implemented.
  • the encoding/decoding device may include the encoding device 100 and the decoding device 200.
  • all of the multiple components in this disclosure may not be implemented, and only some of the multiple components in this disclosure may be implemented.
  • all of the multiple processes in this disclosure may not be executed, and only some of the multiple processes in this disclosure may be executed.
  • ordinal numbers such as first and second used in the description may be changed as appropriate. New ordinal numbers may be added to components, etc., or ordinal numbers may be removed. These ordinal numbers may be added to elements in order to identify them, and may not correspond to a meaningful order.
  • an expression "at least one (or more than one) of a first element, a second element, and a third element” corresponds to a first element, a second element, a third element, or any combination thereof.
  • the aspects of the encoding device 100 and the decoding device 200 have been described above based on a number of examples, the aspects of the encoding device 100 and the decoding device 200 are not limited to these examples. As long as they do not deviate from the spirit of this disclosure, various modifications conceivable by those skilled in the art to each example, or configurations constructed by combining components in different examples, may also be included within the scope of the aspects of the encoding device 100 and the decoding device 200.
  • One or more aspects disclosed herein may be implemented in combination with at least a portion of other aspects of the present disclosure.
  • some of the processes described in the flowcharts of one or more aspects disclosed herein, some of the configurations of the device, some of the syntax, etc. may be implemented in combination with other aspects.
  • each of the functional or operational blocks can usually be realized by an MPU (micro processing unit) and a memory, etc.
  • the processing by each of the functional blocks may be realized as a program execution unit such as a processor that reads and executes software (programs) recorded on a recording medium such as a ROM.
  • the software may be distributed.
  • the software may be recorded on various recording media such as semiconductor memories. It is also possible to realize each of the functional blocks by hardware (dedicated circuits).
  • each embodiment may be realized by centralized processing using a single device (system), or may be realized by distributed processing using multiple devices.
  • the processor that executes the above program may be either single or multiple. In other words, centralized processing or distributed processing may be performed.
  • Such a system may be characterized by having an image encoding device using the image encoding method, an image decoding device using the image decoding method, or an image encoding/decoding device that includes both. Other configurations of such a system can be appropriately changed depending on the case.
  • devices such as a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, and a smartphone ex115 are connected to the Internet ex101 via an Internet service provider ex102 or a communication network ex104, and base stations ex106 to ex110.
  • the content supply system ex100 may be configured to connect a combination of any of the above devices.
  • the devices may be directly or indirectly connected to each other via a telephone network or short-range wireless communication, etc., without going through the base stations ex106 to ex110.
  • the streaming server ex103 may be connected to devices such as a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, and a smartphone ex115 via the Internet ex101, etc.
  • the streaming server ex103 may be connected to a terminal in a hotspot on an airplane ex117 via a satellite ex116.
  • Camera ex113 is a device such as a digital camera that can take still images and videos.
  • Smartphone ex115 is a smartphone, mobile phone, or PHS (Personal Handyphone System) that supports the mobile communication system formats known as 2G, 3G, 3.9G, 4G, and in the future, 5G.
  • a terminal having a photographing function, such as a computer ex111, a game machine ex112, a camera ex113, a home appliance ex114, a smartphone ex115, or a terminal in an airplane ex117, is connected to the streaming server ex103 via a base station ex106 or the like, thereby enabling live distribution and the like.
  • in such distribution, each terminal functions as an image encoding device according to one aspect of the present disclosure.
  • the streaming server ex103 streams the transmitted content data to a client that has requested it.
  • the clients are computers ex111, game consoles ex112, cameras ex113, home appliances ex114, smartphones ex115, or terminals in airplanes ex117 that are capable of decoding the encoded data.
  • Each device that receives the distributed data decodes and plays back the received data.
  • each device may function as an image decoding device according to one aspect of the present disclosure.
  • the streaming server ex103 may be a plurality of servers or computers that process, record, and distribute data in a distributed manner.
  • the streaming server ex103 may be realized by a CDN (Content Delivery Network), and content distribution may be realized by a network that connects a large number of edge servers distributed around the world.
  • an edge server that is physically close to the client is dynamically assigned according to the client.
  • the content is cached on and distributed from the edge server, thereby reducing delay.
  • when an error occurs or the communication state changes due to increased traffic, the processing can be distributed among multiple edge servers, distribution can be switched to another edge server, or distribution can be continued by bypassing the part of the network where the failure occurred, thereby realizing high-speed and stable distribution.
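The edge-server selection and failover behavior described above can be sketched roughly as follows. The server names, regions, latency figures, and health flags are all hypothetical; a real CDN relies on DNS/anycast routing and continuous health checks rather than a simple list scan.

```python
# Sketch of CDN edge-server selection with failover (all data hypothetical).

def pick_edge(edges, client_region):
    """Return the healthy edge server closest to the client's region."""
    candidates = [e for e in edges if e["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy edge servers available")
    # Prefer servers in the client's own region, then the lowest latency.
    return min(candidates,
               key=lambda e: (e["region"] != client_region, e["latency_ms"]))

edges = [
    {"name": "edge-tokyo", "region": "asia", "latency_ms": 12, "healthy": True},
    {"name": "edge-osaka", "region": "asia", "latency_ms": 18, "healthy": False},
    {"name": "edge-frankfurt", "region": "eu", "latency_ms": 110, "healthy": True},
]

assert pick_edge(edges, "asia")["name"] == "edge-tokyo"

# Failover: when the assigned edge fails, distribution switches to another edge.
edges[0]["healthy"] = False
assert pick_edge(edges, "asia")["name"] == "edge-frankfurt"
```

The tuple key makes region match the primary criterion and latency the tie-breaker, which is one simple way to model "physically close to the client".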
  • multiple pieces of video data of almost the same scene, shot by multiple terminals, may exist.
  • the multiple terminals that shot the footage and, as necessary, other terminals and servers that did not, perform distributed processing by assigning the encoding work among them, for example, on a GOP (group of pictures) basis, on a picture basis, or on a per-tile basis after dividing a picture. This reduces delay and achieves better real-time performance.
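The per-GOP work assignment described above might be sketched as below: the picture sequence is split into GOP-sized chunks and each chunk is handed to a terminal round-robin. The GOP size, worker names, and round-robin policy are assumptions; the text leaves the actual assignment strategy open.

```python
# Sketch of distributing encoding work across terminals on a GOP basis.

def assign_gops(num_pictures, gop_size, workers):
    """Map each GOP (a contiguous range of picture indices) to a worker."""
    assignments = []
    for i, start in enumerate(range(0, num_pictures, gop_size)):
        pictures = list(range(start, min(start + gop_size, num_pictures)))
        assignments.append((workers[i % len(workers)], pictures))
    return assignments

jobs = assign_gops(num_pictures=10, gop_size=4, workers=["camera-A", "server-B"])
assert jobs == [
    ("camera-A", [0, 1, 2, 3]),
    ("server-B", [4, 5, 6, 7]),
    ("camera-A", [8, 9]),
]
```

Because each GOP can be encoded independently, the chunks can be processed in parallel and the encoded results concatenated, which is what allows the delay reduction the text mentions.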
  • the server may manage and/or instruct the video data shot on each terminal to be mutually referenced.
  • the server may also receive encoded data from each terminal and change the reference relationships between the multiple data, or correct or replace the pictures themselves and re-encode them. This makes it possible to generate a stream that improves the quality and efficiency of each piece of data.
  • the server may distribute the video data after performing transcoding to change the encoding method of the video data.
  • the server may convert an MPEG-based encoding method to a VP-based encoding method (e.g., VP9), or convert H.264 to H.265.
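The transcoding decision above can be illustrated with a small sketch that picks a target codec from what the client supports. The codec names mirror the examples in the text (H.264 to H.265, MPEG-based to VP9); the preference order is an assumption, not something the text specifies.

```python
# Sketch of a server-side transcoding decision (preference order assumed).

PREFERRED = ["H.265", "VP9", "H.264"]  # server-side preference, best first

def choose_target(source_codec, client_supported):
    """Return the codec to transcode to, or None if no transcode is needed."""
    for codec in PREFERRED:
        if codec in client_supported:
            return None if codec == source_codec else codec
    raise ValueError("client supports no known codec")

# A client that only handles VP9 gets the H.264 source transcoded to VP9.
assert choose_target("H.264", {"VP9"}) == "VP9"
# No transcoding when the client already supports the best available codec.
assert choose_target("H.265", {"H.265", "H.264"}) is None
```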
  • the encoding process can be performed by a terminal or one or more servers. Therefore, in the following, descriptions such as “server” or “terminal” are used to indicate the entity performing the processing, but some or all of the processing performed by the server may be performed by the terminal, and some or all of the processing performed by the terminal may be performed by the server. The same applies to the decoding process.
  • the user can enjoy a scene by arbitrarily selecting each video corresponding to each shooting terminal, or can enjoy content in which a video from a selected viewpoint is cut out from 3D data reconstructed using multiple images or videos.
  • sound may also be collected from multiple different angles, and the server may multiplex the sound from a particular angle or space with the corresponding video and transmit the multiplexed video and sound.
  • a user may freely select and switch in real time between a decoding device or a display device, such as a user's terminal or a display device placed indoors or outdoors.
  • decoding can be performed while switching between a decoding terminal and a display terminal using the user's own location information, etc. This makes it possible to map and display information on a part of the wall or ground of a neighboring building in which a displayable device is embedded while the user is moving to a destination.
  • the display device may display a still image or I picture that each content has as a link image, or may display an image such as a GIF animation using a plurality of still images or I pictures, or may receive only the base layer and decode and display the image.
  • the server performs recognition processing on the original image data or encoded data, such as detection of shooting errors, scene search, semantic analysis, and object detection. Based on the recognition results, the server manually or automatically corrects out-of-focus or camera-shake footage, deletes scenes of lower importance, such as scenes that are darker than other pictures or out of focus, emphasizes object edges, changes hues, and performs other editing.
  • the server encodes the edited data based on the editing results. It is also known that viewership drops if the footage is too long, so the server may automatically clip, based on the image-processing results, not only the low-importance scenes described above but also scenes with little movement, so that the content fits within a specific time range that depends on the shooting time.
  • the server may generate a digest based on the results of the semantic analysis of the scene and encode it.
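One simple way to model the automatic clipping described above is to keep the highest-importance scenes until a target duration is reached and then restore playback order. The scene scores and the greedy strategy are assumptions for illustration; the text only says that low-importance or low-motion scenes may be clipped.

```python
# Sketch of clipping a recording to a target duration by scene importance.

def clip_to_duration(scenes, max_seconds):
    """Greedily keep important scenes within the budget, in original order."""
    chosen, total = [], 0.0
    for scene in sorted(scenes, key=lambda s: s["importance"], reverse=True):
        if total + scene["seconds"] <= max_seconds:
            chosen.append(scene)
            total += scene["seconds"]
    return sorted(chosen, key=lambda s: s["start"])

scenes = [
    {"start": 0, "seconds": 30, "importance": 0.9},
    {"start": 30, "seconds": 40, "importance": 0.2},  # low importance
    {"start": 70, "seconds": 25, "importance": 0.7},
]
digest = clip_to_duration(scenes, max_seconds=60)
assert [s["start"] for s in digest] == [0, 70]  # the low-importance scene is clipped
```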
  • these encoding and decoding processes are generally performed in the LSI ex500 included in each terminal.
  • the LSI (large scale integration circuitry) ex500 (see FIG. 59) may be a one-chip or a multi-chip configuration.
  • software for encoding or decoding moving images may be incorporated into some recording medium (such as a CD-ROM, a flexible disk, or a hard disk) that can be read by the computer ex111, etc., and the encoding or decoding process may be performed using the software.
  • video data acquired by the camera may be transmitted. The video data at this time is data that has been encoded by the LSI ex500 in the smartphone ex115.
  • the LSIex500 may be configured to download and activate application software.
  • the terminal first determines whether it supports the content encoding method or has the ability to execute a specific service. If the terminal does not support the content encoding method or does not have the ability to execute a specific service, the terminal downloads a codec or application software, and then acquires and plays the content.
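The playback decision above (check capability, download the codec or application software if needed, then play) can be sketched as follows. The function names and the `installed` set are hypothetical; a real terminal would query its decoder hardware and app store instead.

```python
# Sketch of a terminal's "check capability, download codec, then play" flow.

def play_content(content_codec, installed, download):
    """Ensure the codec is available (downloading it if needed), then play."""
    if content_codec not in installed:
        installed.add(download(content_codec))  # acquire codec/app software
    return f"playing ({content_codec})"

installed = {"H.264"}
downloaded = []
result = play_content(
    "H.265", installed,
    download=lambda codec: (downloaded.append(codec), codec)[1],
)
assert result == "playing (H.265)"
assert downloaded == ["H.265"]  # the missing codec was fetched before playback
assert "H.265" in installed
```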
  • the video signal processing unit ex455 compresses and codes the video signal stored in the memory unit ex467 or the video signal input from the camera unit ex465 by the moving image coding method shown in each of the above embodiments, and sends the coded video data to the multiplexing/separation unit ex453.
  • the audio signal processing unit ex454 codes the audio signal collected by the audio input unit ex456 while the camera unit ex465 is capturing the video or still image, and sends the coded audio data to the multiplexing/separation unit ex453.
  • although the main control unit ex460 including the CPU has been described as controlling the encoding or decoding process, various terminals often also include a GPU (Graphics Processing Unit). Therefore, a configuration may be used in which a wide area is processed collectively by utilizing the performance of the GPU, using a memory shared by the CPU and GPU or a memory whose addresses are managed so that it can be used by both. This shortens the encoding time, ensures real-time performance, and achieves low latency. It is particularly efficient to perform motion search, deblocking filtering, SAO (Sample Adaptive Offset), and transform/quantization collectively in units such as pictures on the GPU rather than on the CPU.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A decoding device (200) includes a circuit (251) and a memory (252) connected to the circuit (251). In operation, the circuit (251): decodes, from a bitstream, base data of a face image relating to a face video, and geometric information that corresponds to each of a plurality of frames of the face video and indicates geometric attributes in a region including a person's face (S701); further decodes, from the bitstream, concealment parameters relating to error-correction control for cases where the geometric information was not properly acquired in an encoding device (S702); and generates the face video from the base data, the geometric information, and the concealment parameters using a generation model (S703).
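For illustration only, the decoding flow in the abstract (S701 to S703) might be sketched as below. The dictionary-style bitstream and every name here are hypothetical; the actual bitstream syntax and generation model are defined in the patent description, not reproduced here.

```python
# Speculative sketch of the S701-S703 decoding flow from the abstract.

def decode_face_video(bitstream, generation_model):
    base = bitstream["base_data"]                  # S701: base face data
    geometry = bitstream["geometric_info"]         # S701: one entry per frame
    concealment = bitstream["concealment_params"]  # S702: error-control params
    # S703: generate each frame of the face video with the generation model.
    return [generation_model(base, g, concealment) for g in geometry]

stream = {
    "base_data": "face-id",
    "geometric_info": ["g0", "g1"],
    "concealment_params": {"mode": "hold-last"},
}
frames = decode_face_video(stream, lambda base, geom, conceal: (base, geom))
assert frames == [("face-id", "g0"), ("face-id", "g1")]
```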
PCT/JP2025/000612 2024-01-16 2025-01-10 Decoding device, encoding device, decoding method, and encoding method Pending WO2025154665A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202463621222P 2024-01-16 2024-01-16
US63/621,222 2024-01-16

Publications (1)

Publication Number Publication Date
WO2025154665A1 true WO2025154665A1 (fr) 2025-07-24

Family

ID=96471576

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2025/000612 Pending WO2025154665A1 (fr) 2024-01-16 2025-01-10 Decoding device, encoding device, decoding method, and encoding method

Country Status (2)

Country Link
TW (1) TW202535061A (fr)
WO (1) WO2025154665A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002008051A (ja) * 2000-06-27 2002-01-11 Sony Corp Image display device, image transmission system, transmitting device, and receiving device
JP2005526457A (ja) * 2002-05-17 2005-09-02 General Instrument Corporation Video transcoder
JP2012191450A (ja) * 2011-03-10 2012-10-04 Canon Inc Image encoding device
WO2023022697A1 (fr) * 2021-08-16 2023-02-23 Ltn Global Communcations, Inc. Scalable system and method using logical entities for the production of programs that use multimedia signals

Also Published As

Publication number Publication date
TW202535061A (zh) 2025-09-01

Similar Documents

Publication Publication Date Title
US12069303B2 (en) Encoder, decoder, encoding method, and decoding method
US12166982B2 (en) Encoder, decoder, encoding method, and decoding method
CN111295884B (zh) 图像处理装置及图像处理方法
US20220303534A1 (en) Encoder, decoder, encoding method, and decoding method
US20190273931A1 (en) Encoder, decoder, encoding method, and decoding method
WO2019093234A1 (fr) Encoding device, decoding device, encoding method, and decoding method
US11245913B2 (en) Encoder, decoder, encoding method, and decoding method with parameter sets for pictures
US20230217065A1 (en) Reproduction apparatus, transmission apparatus, reproduction method, and transmission method
US20250014256A1 (en) Decoder, encoder, decoding method, and encoding method
JPWO2018074291A1 (ja) Image encoding method, transmission method, and image encoding device
WO2025154665A1 (fr) Decoding device, encoding device, decoding method, and encoding method
US20250336097A1 (en) Encoder, decoder, encoding method, and decoding method
WO2024241962A1 (fr) Decoding device, encoding device, decoding method, and encoding method
WO2024224970A1 (fr) Decoding device, encoding device, bitstream generation device, decoding method, and encoding method
WO2025204830A1 (fr) Encoding device, decoding device, encoding method, and decoding method
CN121176015A (en) Decoding device, encoding device, decoding method, and encoding method
CN118947124A (zh) Decoding device, encoding device, decoding method, and encoding method

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 25741797

Country of ref document: EP

Kind code of ref document: A1