
WO2025009366A1 - Information processing device and method


Info

Publication number
WO2025009366A1
Authority
WO
WIPO (PCT)
Prior art keywords
guide surface
unit
feature
cnn
feature amount
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/JP2024/021822
Other languages
English (en)
Japanese (ja)
Inventor
航 河合
央二 中神
智 隈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sony Group Corp
Original Assignee
Sony Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corp filed Critical Sony Group Corp
Publication of WO2025009366A1


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding

Definitions

  • This disclosure relates to an information processing device and method, and in particular to an information processing device and method that can suppress a decrease in coding efficiency.
  • AI-PCC Artificial Intelligence - Point Cloud Compression
  • The method described in Non-Patent Document 1 is known as one such AI-PCC method.
  • This method uses a sparse 3D-CNN (3-dimensional Convolutional Neural Network) (a type of neural network) in the encoder and decoder.
  • However, the method described in Non-Patent Document 1 is not always the best method, and other methods are needed. In other words, there is a risk that coding efficiency will decrease with the conventional method.
  • An information processing device includes a guide surface feature derivation unit that uses a first coordinate set, which is a set of coordinates of points forming a point cloud, a first feature set, which is a set of features of the coordinates, and a guide surface, which is a reference surface formed in three-dimensional space, to derive guide surface features, which are features based on the positional relationship between the guide surface and the points; a calculation unit that performs a predetermined calculation using the first coordinate set and the first feature set reflecting the guide surface features, to derive a second coordinate set and a second feature set; a coordinate encoding unit that encodes the derived second coordinate set; and a feature encoding unit that encodes the derived second feature set, and the code amount of the encoded data of the second coordinate set and the second feature set is smaller than the code amount of the encoded data of the first coordinate set and the first feature set.
  • An information processing method derives guide surface features, which are features based on the positional relationship between a guide surface and points, using a first coordinate set, which is a set of coordinates of points forming a point cloud, a first feature set, which is a set of features of the coordinates, and the guide surface, which is a reference surface formed in three-dimensional space; performs a predetermined calculation using the first coordinate set and the first feature set reflecting the guide surface features to derive a second coordinate set and a second feature set; encodes the derived second coordinate set; and encodes the derived second feature set, and the amount of code for the encoded data of the second coordinate set and the second feature set is less than the amount of code for the encoded data of the first coordinate set and the first feature set.
  • An information processing device including: a coordinate decoding unit that decodes encoded data to generate a first coordinate set, which is a set of coordinates of points that form a point cloud; a feature decoding unit that decodes encoded data to generate a first feature set, which is a set of features of the coordinates; a guide surface feature derivation unit that uses the generated first coordinate set, the generated first feature set, and a guide surface that is a reference surface formed in three-dimensional space to derive guide surface features, which are features based on the positional relationship between the guide surface and the points; and a calculation unit that performs a predetermined calculation using the first coordinate set and the first feature set that reflects the guide surface features, to derive a second coordinate set and a second feature set.
  • An information processing method is an information processing method that decodes encoded data to generate a first coordinate set, which is a set of coordinates of points forming a point cloud, decodes the encoded data to generate a first feature set, which is a set of features of the coordinates, derives guide surface features, which are features based on the positional relationship between the guide surface and the points, using the generated first coordinate set, the generated first feature set, and a guide surface, which is a reference surface formed in three-dimensional space, and performs a calculation using the first coordinate set and the first feature set reflecting the guide surface features, to derive a second coordinate set and a second feature set.
  • a first coordinate set which is a set of coordinates of points forming a point cloud
  • a first feature set which is a set of features of those coordinates
  • a guide surface which is a reference surface formed in three-dimensional space
  • encoded data is decoded to generate a first coordinate set which is a set of coordinates of points forming a point cloud
  • the encoded data is decoded to generate a first feature set which is a set of features of the coordinates
  • the generated first coordinate set, the generated first feature set, and a guide surface which is a reference surface formed in three-dimensional space are used to derive guide surface features which are features based on the positional relationship between the guide surface and the points
  • a predetermined calculation is performed using the first coordinate set and the first feature set reflecting the guide surface features to derive a second coordinate set and a second feature set.
  • FIG. 1 is a diagram illustrating an example of a main configuration of a conventional calculation unit.
  • FIG. 1 is a diagram illustrating an example of a main configuration of a conventional encoding device.
  • FIG. 1 is a diagram illustrating an example of a main configuration of a conventional encoding device.
  • FIG. 1 is a diagram illustrating an example of an encoding/decoding method to which the present technology is applied.
  • FIG. 2 is a diagram illustrating an example of a main configuration of a calculation unit.
  • FIG. 13 is a diagram showing an example of a guide surface.
  • FIG. 2 is a block diagram showing an example of the main configuration of an encoding device.
  • 11 is a flowchart illustrating an example of the flow of an encoding process.
  • FIG. 9 is a flowchart continuing from FIG. 8 , illustrating an example of the flow of the encoding process.
  • 13 is a flowchart illustrating an example of the flow of a guide surface feature amount set derivation process.
  • FIG. 2 is a block diagram showing an example of the main configuration of a decoding device.
  • 13 is a flowchart illustrating an example of the flow of a decoding process.
  • 13 is a flowchart continuing from FIG. 12, illustrating an example of the flow of a decoding process.
  • 13A and 13B are diagrams illustrating examples of guide surface feature amounts.
  • FIG. 13 is a diagram showing an example of fitting.
  • FIG. 2 is a block diagram showing an example of the main configuration of a decoding device.
  • FIG. 13 is a flowchart illustrating an example of the flow of a decoding process.
  • FIG. 1 is a diagram showing an example of Trisoup.
  • FIG. 2 is a block diagram showing an example of the main configuration of an encoding device.
  • 11 is a flowchart illustrating an example of the flow of an encoding process.
  • 9 is a flowchart continuing from FIG. 8 , illustrating an example of the flow of the encoding process.
  • FIG. 2 is a block diagram showing an example of the main configuration of an encoding device.
  • 11 is a flowchart illustrating an example of the flow of an encoding process.
  • FIG. 2 is a block diagram showing an example of the main configuration of an encoding device.
  • FIG. 11 is a flowchart illustrating an example of the flow of an encoding process.
  • 9 is a flowchart continuing from FIG. 8 , illustrating an example of the flow of the encoding process.
  • 13 is a flowchart illustrating an example of the flow of a guide surface feature amount set derivation process.
  • FIG. 2 is a block diagram showing an example of the main configuration of a decoding device.
  • 13 is a flowchart illustrating an example of the flow of a decoding process.
  • 30 is a flowchart continuing from FIG. 29, illustrating an example of the flow of a decoding process.
  • FIG. 13 is a diagram illustrating an example of a simulation result.
  • FIG. 2 is a block diagram showing an example of a main configuration of a computer.
  • Non-patent document 1: (mentioned above)
  • Non-patent document 2: Michael Kazhdan, Matthew Bolitho, Hugues Hoppe, "Poisson Surface Reconstruction", Eurographics Symposium on Geometry Processing (2006), Konrad Polthier, Alla Sheffer (Editors)
  • Non-patent document 3: Stoll Carsten, Karni Zachi, Rossl Christian, Hitoshi Yamauchi, Seidel Hans-Peter, "Template Deformation for Point Cloud Fitting"
  • Non-patent document 4: Danhang Tang, Philip A. Chou, Christian Hane, Mingsong Dou, Sean Fanello, Jonathan Taylor, Philip Davidson, Onur G.
  • the contents of the above-mentioned non-patent documents and the contents of other documents referenced in the above-mentioned non-patent documents are also used as the basis for determining the support requirements.
  • As 3D data representing the three-dimensional structure of an object (an object with a three-dimensional shape), there exists a point cloud (also called a point group) that represents the object as a collection of many points.
  • The data of a point cloud (also called point cloud data) is composed of the geometry (position information) and attributes (attribute information) of each point that constitutes the point cloud.
  • the geometry indicates the position of the point in three-dimensional space.
  • the attributes indicate the attributes of the point.
  • the attributes can include any information.
  • the attributes may include color information, reflectance information, normal information, etc. of each point.
  • the point cloud has a relatively simple data structure and can represent any three-dimensional structure with sufficient accuracy by using a sufficient number of points.
  • AI-PCC Artificial Intelligence - Point Cloud Compression
  • The method described in Non-Patent Document 1 is known as one such AI-PCC method.
  • this method is also referred to as a "conventional AI-PCC method".
  • This conventional AI-PCC method uses a sparse 3D-CNN (3-dimensional Convolutional Neural Network) (a type of neural network) in the encoder and decoder.
  • 3D-CNN 3-dimensional Convolutional Neural Network
  • a CNN (Convolution Neural Network) 11 that converts an input sparse tensor into an output sparse tensor is used.
  • some kind of calculation unit 21 that converts an input coordinate set and an input feature set into an output coordinate set and an output feature set, such as a 3D-CNN, a multilayer perceptron, or another neural network, is used.
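  • As an illustrative aid only (not part of the disclosure), the coordinate set / feature set pair handled by such a calculation unit can be sketched in Python as a pair of arrays; the voxel size, the all-ones occupancy feature, and the function name below are assumptions made for the example.

        import numpy as np

        def build_sparse_tensor(points, voxel_size=1.0):
            # Quantize point positions to integer voxel coordinates and keep one
            # entry per occupied voxel; here the feature is a simple occupancy flag.
            coords = np.unique(np.floor(points / voxel_size).astype(np.int32), axis=0)
            feats = np.ones((coords.shape[0], 1), dtype=np.float32)
            return coords, feats   # the pair (coordinate set, feature set)

        points = np.random.rand(1000, 3) * 64.0      # toy point cloud geometry
        C_in, F_in = build_sparse_tensor(points)
        print(C_in.shape, F_in.shape)                # (M, 3) and (M, 1), M <= 1000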
  • FIG. 2 shows an example of the main configuration of an encoder for this conventional AI-PCC method.
  • CNN 52 (CNN_E1), CNN 53 (CNN_E2), CNN 54 (CNN_E3), CNN 55 (CNN_E4), CNN 56 (CNN_E5), and CNN 57 (CNN_E6) receive the sparse tensor and return the converted sparse tensor.
  • a score encoding unit 58 encodes the number of points of C_X, the number of points of the coordinate set obtained by downscaling C_X by 1/2, and the number of points of the coordinate set obtained by downscaling C_X by 1/4, and outputs a score bit stream.
  • the coordinate encoding unit 59 encodes (Octree encodes) the coordinate set C_Y, which is obtained by downscaling the input point group C_X by 1/8, and outputs a coordinate bit stream.
  • the feature encoding unit 60 encodes the feature set F_Y associated with each coordinate in the coordinate set C_Y, and outputs a feature bit stream.
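  • As a rough illustration of the 1/2, 1/4, and 1/8 downscaling of the coordinate set mentioned above (a sketch only; the integer-division rule and the function name are assumptions, not the method of Non-Patent Document 1):

        import numpy as np

        def downscale_coords(coords, factor=2):
            # Integer-divide the voxel coordinates and remove duplicates; applying
            # this repeatedly yields the 1/2, 1/4 and 1/8 scale coordinate sets.
            return np.unique(coords // factor, axis=0)

        C_X = np.unique(np.random.randint(0, 64, size=(1000, 3)), axis=0)
        C_half = downscale_coords(C_X)          # 1/2 scale
        C_quarter = downscale_coords(C_half)    # 1/4 scale
        C_Y = downscale_coords(C_quarter)       # 1/8 scale, as for C_Y above
        print(len(C_X), len(C_half), len(C_quarter), len(C_Y))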
  • the main configuration of a decoder for this conventional AI-PCC method is shown in Figure 3.
  • the score decoding unit 71 decodes the score bit stream.
  • the coordinate decoding unit 72 is a G-PCC decoder that performs lossless decoding of the coordinate set bit stream and generates (restores) the coordinate set C_Y.
  • the feature decoding unit 73 entropy decodes the feature bit stream and generates (restores) the feature set F_Y.
  • the feature bit stream is lossy compressed, and strictly speaking, the restored feature set F_Y may differ from F_Y before encoding.
  • CNN74 (CNN_D1), CNN75 (CNN_D2), CNN77 (CNN_D3), CNN78 (CNN_D4), CNN80 (CNN_D5), and CNN81 (CNN_D6) are sparse 3D-CNNs that receive sparse tensors and return sparse tensors after transformation. These are mainly composed of sparse 3D convolution layers and nonlinear transformation layers.
  • the occupancy state classification unit 76, the occupancy state classification unit 79, and the occupancy state classification unit 82 are each composed of a single sparse 3D convolution layer. This sparse 3D convolution layer inputs a sparse tensor and outputs a sparse tensor as a result of the operation.
  • the feature of the sparse tensor as a result of the operation of this sparse 3D convolution layer represents the occupancy probability value of the coordinate.
  • the occupancy state classification unit 76, the occupancy state classification unit 79, and the occupancy state classification unit 82 leave only the top k coordinates with the highest occupancy probability value and delete the other coordinates.
  • the occupancy state classification unit 76, the occupancy state classification unit 79, and the occupancy state classification unit 82 output a coordinate set in which the coordinate set of the input sparse tensor is deleted except for the top k coordinates, and a feature set corresponding to the coordinate set (i.e., the feature set of the input sparse tensor in which the feature corresponding to the deleted coordinates has been deleted).
  • for k, the value corresponding to the scale being processed among N_X, N_X', and N_X'' is used.
  • the feature corresponding to the deleted coordinate in F^out is also deleted.
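  • A minimal sketch of this top-k pruning step (illustrative only; the occupancy values below are random stand-ins for the output of the sparse 3D convolution layer, and the function name is an assumption):

        import numpy as np

        def keep_top_k(coords, feats, occupancy, k):
            # Keep the k coordinates with the highest predicted occupancy
            # probability and drop the features of the deleted coordinates.
            order = np.argsort(-occupancy)[:k]
            return coords[order], feats[order]

        coords = np.random.randint(0, 64, size=(200, 3))
        feats = np.random.rand(200, 8).astype(np.float32)
        occupancy = np.random.rand(200)          # stand-in for predicted values
        C_out, F_out = keep_top_k(coords, feats, occupancy, k=100)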
  • Non-Patent Document 1 is not always the best method, and other methods were needed. In other words, there was a risk that the coding efficiency would decrease with the conventional methods.
  • the guide surface feature derivation unit 101 reflects the set of guide surface features g(C^in) in the input sparse tensor {C^in, F^in}, and inputs the input sparse tensor {C^in, [F^in, g(C^in)]} to the CNN 102.
  • the guide surface feature derivation unit 121 reflects the set of guide surface features g(C^in) in the input coordinate set and input feature set {C^in, F^in}, and inputs the input coordinate set and input feature set {C^in, [F^in, g(C^in)]} to the calculation unit 122.
  • This calculation unit 122 may have any configuration, such as a 3D-CNN, a multilayer perceptron, or other neural network.
  • a guide surface 142 is provided.
  • This guide surface 142 is a virtual surface in a three-dimensional space set in the present technology, which is used to convert the information of each point 141.
  • This guide surface 142 may be any surface, and may be a flat surface or a curved surface.
  • this guide surface 142 may represent the three-dimensional shape (outer shape) of the same 3D object as the point cloud with a predetermined resolution, or may be a surface unrelated to the outer shape of the 3D object.
  • the information of each point 141 is converted into information (guide surface feature amounts) based on this guide surface 142 and encoded. Therefore, as in the case of encoding a prediction residual in 2D encoding, the amount of information to be encoded can be reduced. As a result, it is possible to suppress a decrease in encoding efficiency.
  • the first information processing device is provided with a guide surface feature derivation unit that uses a first coordinate set, which is a set of coordinates of points forming a point cloud, a first feature set, which is a set of features of the coordinates, and a guide surface, which is a reference surface formed in three-dimensional space, to derive guide surface features, which are features based on the positional relationship between the guide surface and the points; a calculation unit that performs a predetermined calculation using the first coordinate set and the first feature set reflecting the guide surface features, to derive a second coordinate set and a second feature set; a coordinate encoding unit that encodes the derived second coordinate set; and a feature encoding unit that encodes the derived second feature set.
  • a first coordinate set which is a set of coordinates of points forming a point cloud
  • a first feature set which is a set of features of the coordinates
  • a guide surface which is a reference surface formed in three-dimensional space
  • a first coordinate set which is a set of coordinates of points forming a point cloud
  • a first feature set which is a set of features of the coordinates
  • a guide surface which is a reference surface formed in three-dimensional space
  • a predetermined calculation is performed using the first coordinate set and the first feature set reflecting the guide surface features to derive a second coordinate set and a second feature set
  • the derived second coordinate set is encoded
  • the derived second feature set is encoded.
  • the scale of the second coordinate set and the second feature set may be smaller than the scale of the first coordinate set and the first feature set.
  • the calculation unit may derive the second coordinate set and the second feature set by recursively repeating a predetermined calculation multiple times.
  • the calculation unit may downscale the input coordinate set and feature set in all or a part of the multiple predetermined calculations.
  • the guide surface feature may be derived for each scale.
  • the guide surface feature derivation unit may derive guide surface features for some or all scales of the coordinate set and feature set.
  • the guide surface may also be encoded.
  • the bit stream may also be decoded to generate (restore) the guide surface.
  • the guide surface may also be generated.
  • the first information processing device may further include a guide surface generation unit that generates a guide surface using a point cloud, a guide surface encoding unit that encodes the guide surface and generates encoded data for the guide surface, and a guide surface decoding unit that decodes the encoded data for the guide surface and generates a guide surface.
  • the first information processing device may further include a score encoding unit that encodes the score.
  • the score encoding unit may also encode the score for each scale.
  • the calculation unit may be configured with a CNN (Convolution Neural Network).
  • the first information processing device may further include a sparse tensor construction unit that constructs a sparse tensor from the point cloud.
  • the CNN may input the constructed sparse tensor.
  • the second information processing device also includes a coordinate decoding unit that decodes encoded data to generate a first coordinate set, which is a set of coordinates of points forming a point cloud; a feature decoding unit that decodes encoded data to generate a first feature set, which is a set of features of the coordinates; a guide surface feature derivation unit that uses the generated first coordinate set, the generated first feature set, and a guide surface that is a reference surface formed in three-dimensional space to derive guide surface features, which are features based on the positional relationship between the guide surface and the points; and a calculation unit that performs a predetermined calculation using the first coordinate set and the first feature set reflecting the guide surface features to derive a second coordinate set and a second feature set, so that the code amount of the encoded data of the first coordinate set and the first feature set is smaller than the code amount of the encoded data of the second coordinate set and the second feature set.
  • a coordinate decoding unit that decodes encoded data to generate a first coordinate set, which is a set of coordinates of points forming a point cloud
  • encoded data is decoded to generate a first coordinate set which is a set of coordinates of points forming a point cloud
  • encoded data is decoded to generate a first feature set which is a set of coordinate features
  • the generated first coordinate set, the generated first feature set, and a guide surface which is a reference surface formed in three-dimensional space are used to derive guide surface features which are features based on the positional relationship between the guide surface and the points
  • a predetermined calculation is performed using the first coordinate set and the first feature set reflecting the guide surface features
  • a second coordinate set and a second feature set are derived
  • the amount of code of the encoded data of the first coordinate set and the first feature set is made smaller than the amount of code of the encoded data of the second coordinate set and the second feature set.
  • the scale of the second coordinate set and the second feature set may be larger than the scale of the first coordinate set and the first feature set.
  • the calculation unit may derive the second coordinate set and the second feature set by recursively repeating a predetermined calculation multiple times.
  • the calculation unit may upscale the input coordinate set and feature set in all or a part of the multiple predetermined calculations.
  • the guide surface feature may be derived for each scale.
  • the guide surface feature derivation unit may derive guide surface features for some or all scales of the coordinate set and feature set.
  • the second information processing device may also decode and generate (restore) the guide surface.
  • the second information processing device may further include a guide surface decoding unit that decodes the encoded data of the guide surface and generates the guide surface.
  • the second information processing device may also include an occupancy state classification unit that predicts an occupancy probability value for each coordinate and extracts some coordinates and features with high occupancy probability values.
  • the second information processing device may also include a score decoding unit that decodes to generate (restore) a score.
  • the score decoding unit may also decode to generate (restore) a score for each scale.
  • the calculation unit may be configured using a Convolution Neural Network (CNN).
  • CNN Convolution Neural Network
  • the first information processing device may be any device.
  • it may be an encoding device that encodes 3D data (e.g., a point cloud).
  • FIG. 7 is a block diagram showing an example of the configuration of an encoding device, which is one aspect of an information processing device to which the present technology is applied.
  • the encoding device 300 shown in FIG. 7 is a device that encodes the geometry of a point cloud (3D data).
  • the encoding device 300 encodes the geometry by applying the above-mentioned method 1.
  • Note that FIG. 7 shows the main processing units, data flows, and the like, and what is shown in FIG. 7 is not necessarily everything.
  • That is, in the encoding device 300, there may be processing units that are not shown as blocks in FIG. 7, and there may be processing and data flows that are not shown as arrows or the like in FIG. 7.
  • the encoding device 300 has a guide surface generation unit 311, a guide surface encoding unit 312, a guide surface decoding unit 313, a sparse tensor construction unit 314, a guide surface feature derivation unit 315, a CNN 316 (CNN_E1), a guide surface feature derivation unit 317, a CNN 318 (CNN_E2), a CNN 319 (CNN_E3), a guide surface feature derivation unit 320, a CNN 321 (CNN_E4), a CNN 322 (CNN_E5), a CNN 323 (CNN_E6), a coordinate encoding unit 324, a feature encoding unit 325, and a score encoding unit 326.
  • the guide surface generation unit 311 generates a guide surface from the input point cloud C_X.
  • the guide surface is a reference surface formed in three-dimensional space, and is used for encoding the point cloud.
  • the guide surface generation unit 311 may generate a mesh from the point cloud and use the mesh as the guide surface. Any method may be used to generate a mesh from this point cloud, and an existing method such as Poisson surface reconstruction described in Non-Patent Document 2 may be applied. However, this method requires normal data, so it is necessary to calculate the normals of the point cloud.
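  • One possible way to obtain such a mesh guide surface is sketched below using the open-source Open3D library; this is an assumption for illustration (the disclosure does not prescribe a specific library), and the knn, depth, and triangle-count parameters are arbitrary example values.

        import numpy as np
        import open3d as o3d

        xyz = np.random.rand(5000, 3)                 # stand-in point cloud geometry
        pcd = o3d.geometry.PointCloud()
        pcd.points = o3d.utility.Vector3dVector(xyz)

        # Poisson surface reconstruction requires normals, so estimate and orient them.
        pcd.estimate_normals(search_param=o3d.geometry.KDTreeSearchParamKNN(knn=30))
        pcd.orient_normals_consistent_tangent_plane(k=30)

        # Reconstruct a mesh usable as a guide surface, then decimate it so that the
        # guide surface data to be encoded (e.g. with Draco) stays small.
        mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=8)
        mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=5000)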
  • the guide surface generation unit 311 supplies data indicating the generated guide surface to the guide surface encoding unit 312.
  • the guide surface encoding unit 312 encodes the supplied guide surface to generate encoded data of the guide surface (also referred to as a guide surface bit stream). Any encoding method may be used. For example, if the guide surface is composed of a mesh, the guide surface encoding unit 312 may encode the guide surface by applying an encoding method for meshes. For example, an existing method such as reducing the number of triangles by mesh decimation and then performing Draco encoding may be applied. The guide surface encoding unit 312 may output the generated guide surface bit stream to the outside of the encoding device 300. The guide surface encoding unit 312 may also supply the generated guide surface bit stream to the guide surface decoding unit 313.
  • the guide surface decoding unit 313 decodes the supplied guide surface bit stream and generates (restores) the guide surface.
  • This decoding method may be any method that corresponds to the encoding method applied by the guide surface encoding unit 312.
  • the guide surface decoding unit 313 may apply a decoding method for meshes to decode the guide surface bit stream.
  • This decoding method may be an existing method.
  • Draco decoding may be applied. Note that the encoding and decoding of this guide surface may be a lossless method or a lossy method.
  • the guide surface generated (restored) by the guide surface decoding unit 313 may not completely match the guide surface before encoding (for example, the guide surface generated by the guide surface generating unit 311).
  • the guide surface decoding unit 313 may supply the generated decoded guide surface to the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, and the guide surface feature amount derivation unit 320.
  • feature f_i is arranged at coordinates (x_i, y_i, z_i).
  • [F^in, g(C^in)] is a feature set obtained by concatenating the guide surface feature g(c_i) calculated from the coordinate value corresponding to each feature f_i ∈ F^in.
  • [F^in, g(C^in)] = {[f_i, g(c_i)]}_i.
  • the dimension of the feature quantity set [F^in, g(C^in)] resulting from the calculation is three more than the dimension of the feature quantity set F^in to be processed.
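  • A minimal sketch of this concatenation (illustrative only; the array shapes are example values):

        import numpy as np

        F_in = np.random.rand(100, 8).astype(np.float32)     # features f_i
        g_C_in = np.random.rand(100, 3).astype(np.float32)   # offsets g(c_i) = p_i - c_i

        # Concatenating the 3-dimensional guide surface feature to each feature f_i
        # increases the feature dimension by three, as stated above.
        F_concat = np.concatenate([F_in, g_C_in], axis=1)
        print(F_concat.shape)                                 # (100, 11)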
  • the guide surface feature derivation unit 315 may also obtain the decoded guide surface supplied from the guide surface decoding unit 313.
  • the guide surface feature derivation unit 315 may then use the decoded guide surface to perform the above-mentioned processing on the input sparse tensor to generate a sparse tensor as the calculation result.
  • the guide surface feature derivation unit 315 may also supply the generated sparse tensor as the calculation result to the CNN 316 (CNN_E1).
  • CNN316 (CNN_E1) is a sparse 3D-CNN.
  • Sparse 3D-CNN is a 3D-CNN with a sparse 3D convolutional layer, which inputs the sparse tensor {C^in, F^in} to be processed and outputs the sparse tensor {C^out, F^out} as the result of the calculation.
  • 3D-CNN is a neural network mainly composed of a 3D convolutional layer and a nonlinear transformation layer. The 3D convolutional layer performs a kernel convolution operation on a set of features arranged in a three-dimensional space, and outputs the set of features as the result of the calculation.
  • the 3D convolutional layer performs an operation to aggregate nearby feature sets in three-dimensional space by weighting each feature with the corresponding kernel value. This realizes feature transformation that takes into account nearby features.
  • 3D-CNN realizes more nonlinear and complex feature transformation by stacking 3D convolutional layers and nonlinear transformation layers.
  • a sparse 3D convolutional layer is a 3D convolutional layer whose input and output are sparse tensors.
  • a sparse 3D convolutional layer has the advantage of being computationally more efficient (because empty coordinates with no corresponding features are ignored) than a normal 3D convolutional layer (also called a dense 3D convolutional layer for comparison).
  • the calculation of a sparse 3D convolutional layer can be expressed as the following equation (1).
  • C^in is the input coordinate set (also called the coordinate set of the processing target), and C^out is the output coordinate set (also called the coordinate set of the calculation result).
  • f_u^in is the input feature vector at coordinate u (also called the feature vector of the processing target), and f_u^out is the output feature vector at coordinate u (also called the feature vector of the calculation result).
  • N^3(u, C^in) represents the set of offsets i that define the neighborhood of coordinate u with respect to C^in.
  • W_i represents the kernel value at offset value i.
  • the coordinate set of the processing target C^in and the coordinate set of the calculation result C^out are not always the same. For example, C^out is different from C^in when downscaling or upscaling.
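  • Equation (1) itself is not reproduced in this text. Written with the symbols defined above, a commonly used form of the sparse 3D convolution is the following (an assumption based on standard sparse-convolution formulations, not a verbatim copy of the figure):

        f_u^{out} = \sum_{i \in N^3(u, C^{in})} W_i \, f_{u+i}^{in}, \qquad u \in C^{out}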
  • CNN316 may obtain the sparse tensor of the calculation result of the guide surface feature derivation unit 315 as the sparse tensor to be processed. In other words, since the dimension of the feature increases by the amount of the guide surface feature, the input/output dimension of each sparse 3D-CNN is changed accordingly. This also applies to the following CNNs (CNN318, CNN319, CNN321, CNN322, and CNN323).
  • CNN316 (CNN_E1) is composed of two sparse 3D convolution layers and two ReLU (Rectified Linear Unit) layers, which are a type of nonlinear transformation layer, and may perform 1/2 downscaling on the sparse tensor to be processed.
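  • A rough sketch of such a downscaling block using the MinkowskiEngine sparse-convolution library is shown below; the library choice, the channel widths, and the 4-dimensional input feature (1 original feature plus 3 guide surface dimensions) are assumptions for illustration, not the configuration disclosed here.

        import numpy as np
        import torch
        import MinkowskiEngine as ME

        # Two sparse 3D convolutions with ReLU; the stride-2 convolution performs
        # the 1/2 downscaling described for CNN_E1.
        cnn_e1 = torch.nn.Sequential(
            ME.MinkowskiConvolution(4, 32, kernel_size=3, stride=1, dimension=3),
            ME.MinkowskiReLU(),
            ME.MinkowskiConvolution(32, 32, kernel_size=3, stride=2, dimension=3),
            ME.MinkowskiReLU(),
        )

        coords = np.unique(np.random.randint(0, 64, size=(1000, 3)), axis=0)
        bcoords = ME.utils.batched_coordinates([coords])      # add a batch index
        feats = torch.rand(bcoords.shape[0], 4)
        x_in = ME.SparseTensor(features=feats, coordinates=bcoords)
        x_out = cnn_e1(x_in)                                   # half-scale sparse tensor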
  • CNN316 may supply the sparse tensor of the calculation result obtained to the guide surface feature derivation unit 317.
  • the guide surface feature derivation unit 317 is a processing unit similar to the guide surface feature derivation unit 315, and performs processing related to the derivation of guide surface features.
  • the guide surface feature derivation unit 317 may obtain a sparse tensor resulting from the calculation of the CNN 316 (CNN_E1) as a sparse tensor to be processed.
  • the guide surface feature derivation unit 317 may also obtain a decoded guide surface supplied from the guide surface decoding unit 313.
  • the guide surface feature derivation unit 317 may then use the decoded guide surface to perform the above-mentioned processing on the obtained sparse tensor to generate a sparse tensor resulting from the calculation.
  • the guide surface feature derivation unit 317 may supply the sparse tensor resulting from the calculation that has been generated to the CNN 318 (CNN_E2).
  • CNN318 is a sparse 3D-CNN, similar to CNN316 (CNN_E1).
  • CNN318 (CNN_E2) may obtain the sparse tensor resulting from the calculation of the guide surface feature derivation unit 317 as the sparse tensor to be processed.
  • CNN318 (CNN_E2) may be configured with a three-layer IRN (Inception-Residual Network).
  • the IRN is also a sparse 3D-CNN, and is configured with a sparse 3D convolutional layer and a nonlinear transformation layer.
  • CNN318 (CNN_E2) may perform calculations using the three-layer IRN on the sparse tensor to be processed.
  • the CNN319 is a sparse 3D-CNN, similar to the CNN316 (CNN_E1).
  • the CNN319 (CNN_E3) may obtain the sparse tensor resulting from the calculation of the CNN318 (CNN_E2) as the sparse tensor to be processed.
  • the CNN319 (CNN_E3) is composed of two sparse 3D convolutional layers and two ReLU layers, and may perform 1/2 downscaling on the sparse tensor to be processed.
  • the CNN319 (CNN_E3) may supply the sparse tensor resulting from the calculation to the guide surface feature derivation unit 320.
  • the guide surface feature derivation unit 320 is a processing unit similar to the guide surface feature derivation unit 315, and performs processing related to the derivation of guide surface features.
  • the guide surface feature derivation unit 320 may obtain the sparse tensor resulting from the calculation of CNN 319 (CNN_E3) as the sparse tensor to be processed.
  • the guide surface feature derivation unit 320 may also obtain the decoded guide surface supplied from the guide surface decoding unit 313.
  • the guide surface feature derivation unit 320 may then use the decoded guide surface to perform the above-described processing on the sparse tensor to be processed, to generate a sparse tensor resulting from the calculation.
  • the guide surface feature derivation unit 320 may supply the sparse tensor resulting from the calculation that has been generated to CNN 321 (CNN_E4).
  • the CNN321 (CNN_E4) is a sparse 3D-CNN, similar to the CNN318 (CNN_E2).
  • the CNN321 (CNN_E4) may obtain the sparse tensor resulting from the calculation of the guide surface feature derivation unit 320 as the sparse tensor to be processed.
  • the CNN321 (CNN_E4) may be configured with three layers of IRNs, and may perform calculations using the three layers of IRNs on the sparse tensor to be processed.
  • CNN322 (CNN_E5) is a sparse 3D-CNN, similar to CNN316 (CNN_E1). CNN322 (CNN_E5) may obtain the sparse tensor resulting from the computation of CNN321 (CNN_E4) as the sparse tensor to be processed.
  • CNN322 (CNN_E5) is composed of two sparse 3D convolutional layers and two ReLU layers, and may perform 1/2 downscaling on the sparse tensor to be processed.
  • CNN322 (CNN_E5) may supply the sparse tensor resulting from the computation obtained to CNN323 (CNN_E6).
  • CNN323 is a sparse 3D-CNN, similar to CNN318 (CNN_E2).
  • CNN323 (CNN_E6) may obtain the sparse tensor resulting from the computation of CNN322 (CNN_E5) as the sparse tensor to be processed.
  • CNN323 (CNN_E6) may be configured with three layers of IRN and one layer of sparse 3D convolutional layer, and may perform computations on the sparse tensor to be processed using the three layers of IRN and one layer of sparse 3D convolutional layer.
  • the coordinate encoding unit 324 is a G-PCC (Geometry-based Point Cloud Compression) encoder, and performs processing related to encoding of coordinates (coordinate set).
  • the coordinate encoding unit 324 may encode the coordinate set C_Y and generate encoded data (also called a coordinate bit stream) of the coordinate set.
  • the coordinate encoding unit 324 may encode the coordinate set C_Y using a lossless method.
  • the coordinate encoding unit 324 may encode the coordinate set C_Y using an octree.
  • the coordinate encoding unit 324 may output the generated coordinate bit stream to the outside of the encoding device 300.
  • the feature encoding unit 325 performs processing related to encoding of features (feature set).
  • the feature encoding unit 325 may quantize and entropy encode the feature set F_Y to generate encoded data of the feature set (also referred to as a feature bit stream). In other words, the feature encoding unit 325 encodes the feature set F_Y using a lossy method.
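  • A minimal sketch of the lossy quantization part of this step (illustrative only; the uniform step size and rounding rule are assumptions, and the subsequent entropy coding of the integer symbols is omitted):

        import numpy as np

        def quantize_features(F_Y, step=0.25):
            # Uniform scalar quantization; the integer symbols would then be
            # entropy coded into the feature bit stream, and dequantization
            # multiplies the symbols back by the step size.
            symbols = np.round(F_Y / step).astype(np.int32)
            F_Y_hat = symbols.astype(np.float32) * step
            return symbols, F_Y_hat

        F_Y = np.random.randn(500, 32).astype(np.float32)
        symbols, F_Y_hat = quantize_features(F_Y)
        print(float(np.abs(F_Y - F_Y_hat).max()))   # bounded by step / 2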
  • the feature encoding unit 325 may output the generated feature bit stream to the outside of the encoding device 300.
  • the score encoding unit 326 performs processing related to encoding the scores of the coordinate sets of each scale.
  • the score encoding unit 326 may encode these N_X, N_X', and N_X'' to generate encoded data of the scores (also referred to as a score bit stream).
  • the score encoding unit 326 may output the generated score bitstream outside the encoding device 300.
  • the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, and the guide surface feature derivation unit 320 derive guide surface feature amounts, which are features based on the positional relationship between the guide surface and the points, using a first coordinate set, which is a set of coordinates of points forming a point cloud, a first feature set, which is a set of features of the coordinates, and a guide surface, which is a reference surface formed in a three-dimensional space.
  • CNN316, CNN318, CNN319, CNN321, CNN322, and CNN323 perform a predetermined calculation using the first coordinate set and the first feature set reflecting the guide surface feature amounts (using features including the guide surface feature amounts) to derive a second coordinate set and a second feature set. Therefore, CNN316, CNN318, CNN319, CNN321, CNN322, and CNN323 can also be said to be calculation units.
  • the coordinate encoding unit 324 encodes the derived second coordinate set.
  • the feature encoding unit 325 encodes the derived second feature set. The amount of code for the encoded data of the second coordinate set and the second feature set is smaller than the amount of code for the encoded data of the first coordinate set and the first feature set.
  • the encoding device 300 can reduce the amount of information to be encoded, and suppress a decrease in encoding efficiency.
  • the CNN and guide surface feature derivation unit may have any configuration and are not limited to the example in FIG. 7.
  • the number of downscalings and the conversion rate for each downscaling may be any number.
  • FIG. 7 illustrates an example in which the calculation unit is 3D-CNN
  • the calculation unit may have any configuration, such as 3D-CNN, a multilayer perceptron, or other neural networks, as in the calculation unit 122 in FIG. 5.
  • the guide surface feature amount derivation unit only needs to be provided at least at the beginning of the CNN sequence. In addition, it may be provided anywhere in the CNN sequence.
  • the guide surface feature amount may be derived for each scale.
  • the guide surface feature amount derivation unit 315 may derive the guide surface feature amount at 1/1 scale
  • the guide surface feature amount derivation unit 317 may derive the guide surface feature amount at 1/2 scale
  • the guide surface feature amount derivation unit 320 may derive the guide surface feature amount at 1/4 scale.
  • the guide surface feature amount may also be derived for 1/8 scale.
  • the scale of the second coordinate set and the second feature amount set may be smaller than the scale of the first coordinate set and the first feature amount set.
  • the calculation unit may derive the second coordinate set and the second feature amount set by recursively repeating a predetermined calculation multiple times.
  • the calculation unit may also downscale the input coordinate set and feature set in all or part of the multiple predetermined calculations.
  • the guide surface feature derivation unit may then derive guide surface feature values for some or all of the scales of the coordinate set and feature set.
  • the guide surface may also be encoded.
  • the bit stream may also be decoded to generate (restore) the guide surface.
  • the guide surface may also be generated.
  • the guide surface generation unit 311 may generate a guide surface using a point cloud
  • the guide surface encoding unit 312 may encode the guide surface to generate encoded data for the guide surface
  • the guide surface decoding unit 313 may decode the encoded data for the guide surface to generate the guide surface.
  • the encoding device 300 performs an encoding process to encode the geometry as described above. An example of the flow of the encoding process will be described with reference to the flowcharts of FIGS. 8 and 9.
  • the guide surface generation unit 311 and the sparse tensor construction unit 314 acquire the input point group C_X, which is the point cloud to be input to the encoding device 300, in step S301 of FIG. 8.
  • In step S302, the guide surface generation unit 311 generates a guide surface from the input point group C_X.
  • In step S303, the guide surface encoding unit 312 encodes the generated guide surface to generate a guide surface bit stream.
  • the guide surface encoding unit 312 outputs the guide surface bit stream to the outside of the encoding device 300.
  • In step S304, the guide surface decoding unit 313 decodes the guide surface bitstream and generates (restores) the decoded guide surface.
  • In step S306, the guide surface feature derivation unit 315 sets {C^in, F^in} ← (C_X, F_X).
  • In step S307, the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, or the guide surface feature derivation unit 320 determines whether or not to input the guide surface feature to the next 3D-CNN. If it is determined that the guide surface feature is to be input to the next 3D-CNN, the process proceeds to step S308.
  • In step S308, the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, or the guide surface feature derivation unit 320 executes a guide surface feature set derivation process, and derives a guide surface feature set g(C^in) using the sparse tensor to be processed and the decoded guide surface.
  • In step S309, the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, or the guide surface feature derivation unit 320 sets F^in ← [F^in, g(C^in)].
  • In other words, the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, or the guide surface feature derivation unit 320 sets the concatenation of the feature set to be processed and the guide surface feature set g(C^in) as the feature set to be processed by the next 3D-CNN.
  • When the processing of step S309 is completed, the processing proceeds to FIG. 9. Also, if it is determined in step S307 that the guide surface feature amount is not to be input to the next 3D-CNN, the processing proceeds to FIG. 9. In other words, if there is no guide surface feature amount derivation unit immediately before the 3D-CNN, the processing of steps S308 and S309 is omitted.
  • the encoding device 300 has the configuration example shown in FIG. 7.
  • In step S321, CNN316 (CNN_E1), CNN318 (CNN_E2), CNN319 (CNN_E3), CNN321 (CNN_E4), CNN322 (CNN_E5), or CNN323 (CNN_E6) performs a 3D-CNN operation on the sparse tensor {C^in, F^in} to be processed, and derives the sparse tensor {C^out, F^out} as the operation result.
  • In step S322, CNN323 determines whether or not all 3D-CNNs have been processed. If it is determined that not all 3D-CNNs have been processed (i.e., there are unprocessed 3D-CNNs (at least CNN323 (CNN_E6) has not been processed)), the process proceeds to step S323.
  • In step S323, CNN316 (CNN_E1), CNN318 (CNN_E2), CNN319 (CNN_E3), CNN321 (CNN_E4), or CNN322 (CNN_E5) sets {C^in, F^in} ← {C^out, F^out}.
  • In other words, CNN316 (CNN_E1), CNN318 (CNN_E2), CNN319 (CNN_E3), CNN321 (CNN_E4), or CNN322 (CNN_E5) sets the sparse tensor {C^out, F^out} resulting from the calculation it derived as the sparse tensor {C^in, F^in} to be processed by the next processing unit (guide surface feature derivation unit or 3D-CNN).
  • When the processing of step S323 is completed, the process returns to step S307 in FIG. 8, and the subsequent processing is executed. That is, the processing of each of steps S307 to S309 in FIG. 8 and the processing of each of steps S321 to S323 in FIG. 9 is executed for each 3D-CNN (including the guide surface feature derivation unit, if one exists immediately before the 3D-CNN).
  • If it is determined in step S322 of FIG. 9 that processing has been performed for all 3D-CNNs (i.e., processing has been performed for CNN323 (CNN_E6)), processing proceeds to step S324.
  • In step S324, CNN323 sets {C_Y, F_Y} ← {C^out, F^out}.
  • In other words, the last CNN323 sets the sparse tensor {C^out, F^out}, which is the result of the calculation it derived, as the output sparse tensor {C_Y, F_Y}.
  • In step S325, the coordinate encoding unit 324 encodes the coordinate set C_Y of the output sparse tensor {C_Y, F_Y} to generate a coordinate bitstream. It also outputs the coordinate bitstream to the outside of the encoding device 300.
  • In step S326, the feature encoding unit 325 encodes the feature set F_Y of the output sparse tensor {C_Y, F_Y} to generate a feature bitstream.
  • the feature bitstream is then output to the outside of the encoding device 300.
  • In step S327, the score encoding unit 326 encodes the scores for each scale to generate a score bit stream.
  • the score bit stream is then output to the outside of the encoding device 300.
  • When the processing of step S327 is completed, the encoding process ends.
  • In step S331, the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, or the guide surface feature amount derivation unit 320 extracts a coordinate c_i ∈ C^in.
  • In other words, the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, or the guide surface feature amount derivation unit 320 selects one coordinate c_i from the coordinate set C^in to be processed, and makes it the processing target.
  • In step S332, the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, or the guide surface feature amount derivation unit 320 searches for the nearest point p_i on the decoded guide surface from the selected coordinate c_i.
  • In step S333, the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, or the guide surface feature amount derivation unit 320 derives the guide surface feature amount g(c_i) based on the positional relationship between the guide surface and the point (in this example, p_i - c_i).
  • In step S334, the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, or the guide surface feature derivation unit 320 determines whether or not processing has been performed for all coordinates c_i ∈ C^in. If it is determined that an unprocessed coordinate c_i exists, the process returns to step S331, and the subsequent processes are executed. In other words, the processes of steps S331 to S334 are executed for each coordinate value c_i in the coordinate set C^in to be processed.
  • If it is determined in step S334 that processing has been performed for all coordinate values c_i, the guide surface feature set derivation process ends and processing returns to FIG. 8.
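  • A minimal sketch of this derivation process (illustrative only): here the decoded guide surface is approximated by a dense set of points sampled from it and the nearest-point search uses a k-d tree, which are implementation assumptions not specified by the process above.

        import numpy as np
        from scipy.spatial import cKDTree

        def guide_surface_features(C_in, surface_points):
            # For each coordinate c_i, find the nearest sampled surface point p_i
            # and return the guide surface feature g(c_i) = p_i - c_i.
            tree = cKDTree(surface_points)
            _, idx = tree.query(C_in, k=1)
            return surface_points[idx] - C_in

        C_in = np.random.rand(200, 3) * 64.0
        surface_points = np.random.rand(5000, 3) * 64.0   # stand-in for the surface
        g = guide_surface_features(C_in, surface_points)  # shape (200, 3)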
  • the encoding device 300 can reduce the amount of information to be encoded and suppress a decrease in encoding efficiency.
  • Fig. 11 is a block diagram showing an example of the configuration of a decoding device, which is one aspect of an information processing device to which the present technology is applied.
  • the decoding device 350 shown in Fig. 11 is a device that decodes a bit stream in which the geometry of a point cloud (3D data) is encoded.
  • the decoding device 350 decodes bit streams (guide surface bit stream, coordinate bit stream, feature bit stream, and score bit stream) generated by encoding (the geometry of) the point cloud by the encoding device 300, and generates (restores) (the geometry of) the point cloud.
  • Note that FIG. 11 shows the main processing units, data flows, and the like, and what is shown in FIG. 11 is not necessarily everything.
  • That is, in the decoding device 350, there may be processing units that are not shown as blocks in FIG. 11, and there may be processing and data flows that are not shown as arrows or the like in FIG. 11.
  • the decoding device 350 has a score decoding unit 361, a coordinate decoding unit 362, a feature decoding unit 363, a guide surface decoding unit 364, a CNN 365 (CNN_D1), a guide surface feature derivation unit 366, a CNN 367 (CNN_D2), an occupancy state classification unit 368, a CNN 369 (CNN_D3), a guide surface feature derivation unit 370, a CNN 371 (CNN_D4), an occupancy state classification unit 372, a CNN 373 (CNN_D5), a guide surface feature derivation unit 374, a CNN 375 (CNN_D6), and an occupancy state classification unit 376.
  • the score decoding unit 361 decodes the score bit stream to generate (restore) the scores N_X, N_X', and N_X'' for each scale.
  • the score decoding unit 361 may supply the generated N_X to the occupancy state classification unit 376.
  • the score decoding unit 361 may supply the generated N_X' to the occupancy state classification unit 372.
  • the score decoding unit 361 may supply the generated N_X'' to the occupancy state classification unit 368.
  • the coordinate decoding unit 362 is a G-PCC decoder and executes processing related to decoding of the coordinate bit stream. For example, the coordinate decoding unit 362 may acquire the coordinate bit stream input to the decoding device 350. The coordinate decoding unit 362 may decode the acquired coordinate bit stream and generate (restore) the coordinate set C_Y. In this case, the coordinate decoding unit 362 may decode the coordinate bit stream using a lossless method. For example, the coordinate decoding unit 362 may decode the coordinate bit stream using an octree. The coordinate decoding unit 362 may supply the generated coordinate set C_Y to the CNN 365 (CNN_D1).
  • the feature decoding unit 363 executes processing related to the decoding of the feature bit stream. For example, the feature decoding unit 363 may acquire the feature bit stream input to the decoding device 350. The feature decoding unit 363 may entropy decode the acquired feature bit stream to generate (restore) a feature set F_Y. In this case, the feature decoding unit 363 decodes the feature bit stream using a lossy method. Therefore, the feature set F_Y generated by the feature decoding unit 363 does not have to completely match the feature set F_Y before being encoded by the feature encoding unit 325 ( Figure 7). The feature decoding unit 363 may supply the generated feature set F_Y to the CNN 365 (CNN_D1).
  • the guide surface decoding unit 364 is a processing unit similar to the guide surface decoding unit 313 (FIG. 7), and executes processing related to the decoding of the guide surface bit stream.
  • the guide surface decoding unit 364 may acquire the guide surface bit stream input to the decoding device 350.
  • the guide surface decoding unit 364 may decode the acquired guide surface bit stream and generate a decoded guide surface.
  • This decoding method may be any method that corresponds to the encoding method applied by the guide surface encoding unit 312 (FIG. 7).
  • the guide surface decoding unit 364 may decode the guide surface bit stream by applying a decoding method for meshes.
  • This decoding method may be an existing method.
  • Draco decoding may be applied.
  • the encoding and decoding of the guide surface may be a lossless method or a lossy method.
  • the guide surface (also referred to as the decoded guide surface) generated (restored) by the guide surface decoding unit 364 does not have to completely match the guide surface before encoding (for example, the guide surface generated by the guide surface generating unit 311).
  • the guide surface decoding unit 364 may supply the generated decoded guide surface to the guide surface feature amount derivation unit 366, the guide surface feature amount derivation unit 370, and the guide surface feature amount derivation unit 374.
  • CNN365 (CNN_D1) is a sparse 3D-CNN, and is mainly composed of sparse 3D convolution layers and nonlinear transformation layers. It acquires a sparse tensor as a processing target, performs a predetermined operation on the sparse tensor to be processed, and outputs a sparse tensor as the result of the operation.
  • CNN365 (CNN_D1) is composed of two sparse 3D convolution layers and two ReLU layers, and may perform upscaling by 2 times on the sparse tensor to be processed.
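  • For comparison with the encoder-side sketch, a 2x upscaling block could be sketched with a generative transposed sparse convolution as below; the MinkowskiEngine layer choice and channel widths are again assumptions for illustration only.

        import torch
        import MinkowskiEngine as ME

        # A stride-2 generative transposed sparse convolution doubles the coordinate
        # resolution, loosely mirroring the 2x upscaling described for CNN_D1.
        cnn_d1 = torch.nn.Sequential(
            ME.MinkowskiGenerativeConvolutionTranspose(32, 32, kernel_size=2,
                                                       stride=2, dimension=3),
            ME.MinkowskiReLU(),
            ME.MinkowskiConvolution(32, 32, kernel_size=3, stride=1, dimension=3),
            ME.MinkowskiReLU(),
        )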
  • the CNN 365 (CNN_D1) may supply the sparse tensor of the obtained calculation result to the guide surface feature derivation unit 366.
  • the guide surface feature derivation unit 366 is a processing unit similar to the guide surface feature derivation units of the encoding device 300 such as the guide surface feature derivation unit 315, and performs processing related to the derivation of guide surface features.
  • the dimension of the feature quantity set [F^in, g(C^in)] resulting from the calculation is three more than the dimension of the feature quantity set F^in to be processed.
  • the guide surface feature derivation unit 366 may obtain the sparse tensor resulting from the calculation of CNN 365 (CNN_D1) as the sparse tensor to be processed.
  • the guide surface feature derivation unit 366 may also obtain the decoded guide surface supplied from the guide surface decoding unit 364.
  • the guide surface feature derivation unit 366 may then use the decoded guide surface to perform the above-mentioned processing on the sparse tensor to be processed, to generate a sparse tensor resulting from the calculation.
  • the guide surface feature derivation unit 366 may also supply the sparse tensor resulting from the calculation that has been generated to CNN 367 (CNN_D2).
  • CNN367 (CNN_D2), like CNN365 (CNN_D1), is a sparse 3D-CNN, and is mainly composed of a sparse 3D convolution layer and a nonlinear transformation layer.
  • CNN367 (CNN_D2) acquires a sparse tensor as a processing target, performs a predetermined operation on the sparse tensor to be processed, and outputs a sparse tensor as the operation result.
  • CNN367 (CNN_D2) may acquire a sparse tensor as the operation result of the guide surface feature derivation unit 366 as the sparse tensor to be processed.
  • CNN367 is composed of a three-layer IRN, and may perform an operation using the three-layer IRN on the sparse tensor to be processed.
  • the CNN 367 (CNN_D2) may supply the sparse tensor {Up(C_Y), F_X''} obtained as the result of the calculation to the occupancy state classification unit 368.
  • Up() represents a process of upsampling by a factor of two in the x, y, and z directions. For example, the score of Up(C_Y) is eight times the score of C_Y.
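  • A minimal sketch of Up() on integer voxel coordinates (illustrative only; the child-offset enumeration below is an assumed implementation):

        import numpy as np

        def up(coords):
            # Replace each coordinate by its eight children at twice the resolution,
            # so the number of points (the score) becomes eight times larger.
            offsets = np.array([[dx, dy, dz] for dx in (0, 1)
                                             for dy in (0, 1)
                                             for dz in (0, 1)])
            return (coords[:, None, :] * 2 + offsets[None, :, :]).reshape(-1, 3)

        C_Y = np.unique(np.random.randint(0, 8, size=(50, 3)), axis=0)
        print(len(up(C_Y)) == 8 * len(C_Y))   # True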
  • the occupancy state classification unit 368 is composed of one sparse 3D convolutional layer. This sparse 3D convolutional layer inputs the sparse tensor to be processed and outputs the sparse tensor that is the result of the calculation. The feature values of the sparse tensor that is the result of the calculation of this sparse 3D convolutional layer represent the occupancy probability value of the coordinates.
  • the occupancy state classification unit 368 extracts a predetermined number of coordinates (top k coordinates) from those with the highest occupancy probability value, and deletes the other coordinates.
  • the occupancy state classification unit 368 outputs a coordinate set in which the coordinate set of the input sparse tensor has been deleted except for the top k coordinates, and a feature set corresponding to the coordinate set (i.e., the feature set of the input sparse tensor in which the feature values corresponding to the deleted coordinates have been deleted).
  • the scale value corresponding to the sparse tensor to be processed among N_X, N_X', and N_X'', is applied to this k.
  • the occupancy state classification unit 368 also deletes the feature values corresponding to the deleted coordinates in F^out.
  • the occupancy state classification unit 368 may obtain the sparse tensor {Up(C_Y), F_X''} resulting from the calculation of the CNN 367 (CNN_D2) as the sparse tensor to be processed.
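  • A minimal sketch of the top-k selection performed by the occupancy state classification units, assuming the occupancy probabilities have already been predicted by the sparse 3D convolutional layer (all names are hypothetical):

```python
import numpy as np

def prune_top_k(coords, feats, occupancy_prob, k):
    """Keep the k coordinates with the highest occupancy probability.

    coords:          (N, 3) coordinate set of the sparse tensor
    feats:           (N, D) feature set aligned with coords
    occupancy_prob:  (N,)   occupancy probability predicted per coordinate
    k:               number of coordinates to keep (the point count for the
                     current scale, e.g. N_X'')
    """
    # Indices of the k largest probabilities (ordering among the kept
    # coordinates does not matter here).
    kth = min(k, len(occupancy_prob)) - 1
    keep = np.argpartition(-occupancy_prob, kth=kth)[:k]
    return coords[keep], feats[keep]
```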
  • CNN369 (CNN_D3), like CNN365 (CNN_D1), is a sparse 3D-CNN, and is mainly composed of sparse 3D convolutional layers and nonlinear transformation layers.
  • CNN369 (CNN_D3) acquires a sparse tensor as a processing target, performs a predetermined operation on the sparse tensor to be processed, and outputs a sparse tensor as the operation result.
  • CNN369 (CNN_D3) is composed of two sparse 3D convolutional layers and two ReLU layers, and may perform upscaling by 2 times on the sparse tensor to be processed.
  • the CNN 369 (CNN_D3) may supply the sparse tensor of the obtained calculation result to the guide surface feature derivation unit 370.
  • the guide surface feature derivation unit 370 may obtain the sparse tensor resulting from the calculation of CNN 369 (CNN_D3) as the sparse tensor to be processed.
  • the guide surface feature derivation unit 370 may also obtain the decoded guide surface supplied from the guide surface decoding unit 364.
  • the guide surface feature derivation unit 370 may then use the decoded guide surface to perform the above-mentioned processing on the sparse tensor to be processed, to generate a sparse tensor resulting from the calculation.
  • the guide surface feature derivation unit 370 may also supply the sparse tensor resulting from the calculation that has been generated to CNN 371 (CNN_D4).
  • CNN371 (CNN_D4) is a sparse 3D-CNN, similar to CNN367 (CNN_D2), and is mainly composed of a sparse 3D convolution layer and a nonlinear transformation layer.
  • CNN371 (CNN_D4) acquires a sparse tensor as a processing target, performs a predetermined operation on the sparse tensor to be processed, and outputs a sparse tensor as the operation result.
  • CNN371 (CNN_D4) may acquire a sparse tensor as the operation result of the guide surface feature derivation unit 370 as the sparse tensor to be processed.
  • CNN371 is composed of a three-layer IRN, and may perform an operation using the three-layer IRN on the sparse tensor to be processed.
  • the CNN 371 may supply the sparse tensor {Up(C_X''), F_X'} obtained as the calculation result to the occupancy state classification unit 372.
  • the occupancy state classification unit 372 is a processing unit similar to the occupancy state classification unit 368, and is composed of one sparse 3D convolutional layer. It predicts the occupancy probability value of the sparse tensor to be processed, extracts a predetermined number of coordinates (the top k coordinates) with the highest occupancy probability values, and deletes the remaining coordinates. The occupancy state classification unit 372 also deletes the features in F^out that correspond to the deleted coordinates.
  • the occupancy state classification unit 372 may obtain the sparse tensor {Up(C_X''), F_X'} resulting from the calculation of the CNN 371 (CNN_D4) as the sparse tensor to be processed.
  • CNN373 (CNN_D5), like CNN365 (CNN_D1), is a sparse 3D-CNN, and is mainly composed of sparse 3D convolutional layers and nonlinear transformation layers.
  • CNN373 (CNN_D5) acquires a sparse tensor as the processing target, performs a predetermined operation on the sparse tensor to be processed, and outputs a sparse tensor as the operation result.
  • CNN373 (CNN_D5) is composed of two sparse 3D convolutional layers and two ReLU layers, and may perform upscaling by 2 times on the sparse tensor to be processed.
  • the CNN 373 (CNN_D5) may supply the sparse tensor of the obtained calculation result to the guide surface feature derivation unit 374.
  • the guide surface feature derivation unit 374 may obtain the sparse tensor resulting from the calculation of CNN 373 (CNN_D5) as the sparse tensor to be processed.
  • the guide surface feature derivation unit 374 may also obtain the decoded guide surface supplied from the guide surface decoding unit 364.
  • the guide surface feature derivation unit 374 may then use the decoded guide surface to perform the above-mentioned processing on the sparse tensor to be processed, to generate a sparse tensor resulting from the calculation.
  • the guide surface feature derivation unit 374 may also supply the sparse tensor resulting from the calculation that has been generated to CNN 375 (CNN_D6).
  • CNN375 is a sparse 3D-CNN, similar to CNN367 (CNN_D2), and is mainly composed of a sparse 3D convolution layer and a nonlinear transformation layer.
  • CNN375 (CNN_D6) acquires a sparse tensor as a processing target, performs a predetermined operation on the sparse tensor to be processed, and outputs a sparse tensor as the operation result.
  • CNN375 (CNN_D6) may acquire a sparse tensor as the operation result of the guide surface feature derivation unit 374 as the sparse tensor to be processed.
  • CNN375 is composed of a three-layer IRN, and may perform an operation using the three-layer IRN on the sparse tensor to be processed.
  • the CNN 375 (CNN_D6) may supply the sparse tensor {Up(C_X'), F_X} obtained as the calculation result to the occupancy state classification unit 376.
  • the occupancy state classification unit 376 is a processing unit similar to the occupancy state classification unit 368, and is composed of one sparse 3D convolutional layer. It predicts the occupancy probability value of the sparse tensor to be processed, extracts a predetermined number of coordinates (the top k coordinates) with the highest occupancy probability values, and deletes the remaining coordinates. The occupancy state classification unit 376 also deletes the features in F^out that correspond to the deleted coordinates.
  • the occupancy state classification unit 376 may obtain the sparse tensor {Up(C_X'), F_X} resulting from the computation of the CNN 375 (CNN_D6) as the sparse tensor to be processed.
  • the scale of this sparse tensor X {C_X, F_X} is 1x (x1) the scale before encoding.
  • the occupancy state classification unit 376 may generate (restore) a point cloud obtained by decoding the bit stream group (score bit stream, guide surface bit stream, coordinate bit stream, and feature bit stream). This point cloud is also referred to as the decoded point group C_X.
  • the occupancy state classification unit 376 may output the decoded point group C_X resulting from the calculation to outside the decoding device 350.
  • the coordinate decoding unit 362 decodes the encoded data to generate a first coordinate set, which is a set of coordinates of points forming a point cloud.
  • the feature decoding unit 363 decodes the encoded data to generate a first feature set, which is a set of features of the coordinates.
  • the guide surface feature derivation unit 366, the guide surface feature derivation unit 370, and the guide surface feature derivation unit 374 use the first coordinate set, the first feature set, and the guide surface, which is a reference surface formed in three-dimensional space, to derive guide surface features, which are features based on the positional relationship between the guide surface and the points.
  • CNN365, CNN367, CNN369, CNN371, CNN373, and CNN375 perform a predetermined calculation using the first coordinate set and the first feature set reflecting the guide surface features, to derive a second coordinate set and a second feature set. Therefore, CNN365, CNN367, CNN369, CNN371, CNN373, and CNN375 can also be considered calculation units.
  • the decoding device 350 can suppress a decrease in encoding efficiency.
  • the configurations of the CNN, guide surface feature derivation unit, occupancy state classification unit, etc. may be any configuration and are not limited to the example of FIG. 11.
  • the number of downscalings and the conversion rate for each downscaling may be any number.
  • Although FIG. 11 illustrates an example in which the calculation unit is a 3D-CNN, the calculation unit may have any configuration, such as a 3D-CNN, a multilayer perceptron, or another neural network, as in the calculation unit 122 of FIG. 5.
  • the guide surface feature amount derivation unit may be provided anywhere in the CNN sequence.
  • the guide surface feature amount may be derived for each scale.
  • the guide surface feature amount derivation unit 366 may derive the guide surface feature amount at 1/4 scale
  • the guide surface feature amount derivation unit 370 may derive the guide surface feature amount at 1/2 scale
  • the guide surface feature amount derivation unit 374 may derive the guide surface feature amount at 1/1 scale.
  • the guide surface feature amount may also be derived for 1/8 scale. That is, the scale of the second coordinate set and the second feature amount set may be larger than the scale of the first coordinate set and the first feature amount set.
  • the calculation unit may derive the second coordinate set and the second feature amount set by recursively repeating a predetermined calculation multiple times.
  • the calculation unit may upscale the input coordinate set and feature amount set in all or part of the multiple predetermined calculations.
  • the guide surface feature derivation unit may then derive guide surface feature values for some or all of the scales of the coordinate set and feature set.
  • the guide surface may also be generated (restored) by decoding.
  • the guide surface decoding unit 364 may decode the encoded data of the guide surface and generate the guide surface.
  • the decoding device 350 executes the decoding process to decode the bit stream and generate (restore) the geometry as described above.
  • An example of the flow of this decoding process will be described with reference to the flowcharts of Figs. 12 and 13.
  • the score decoding unit 361 decodes the score bit stream in step S351 of FIG. 12, and generates scores for each scale (N_X, N_X', N_X'').
  • step S352 the coordinate decoding unit 362 decodes the coordinate bit stream and generates a coordinate set C_Y.
  • step S353 the feature decoding unit 363 decodes the feature bit stream and generates a feature set F_Y.
  • step S354 the guide surface decoding unit 364 decodes the guide surface bitstream and generates a decoded guide surface.
  • CNN365 sets {C^in, F^in} ← {C_Y, F_Y}.
  • In step S356, it is determined whether or not guide surface features are to be input to the 3D-CNN to be processed. If it is determined that the guide surface features are to be input, the process proceeds to step S357. In other words, if the CNN to be processed is CNN367 (CNN_D2), CNN371 (CNN_D4), or CNN375 (CNN_D6), the process proceeds to step S357.
  • step S357 the guide surface feature amount derivation unit 366, the guide surface feature amount derivation unit 370, or the guide surface feature amount derivation unit 374 executes a guide surface feature amount set derivation process to derive the guide surface feature amount set g(C^in).
  • This guide surface feature amount set derivation process is executed in the same manner as described with reference to the flowchart in FIG. 10.
  • step S358 the guide surface feature derivation unit 366, the guide surface feature derivation unit 370, or the guide surface feature derivation unit 374 sets F^in ← [F^in, g(C^in)].
  • the guide surface feature derivation unit 366, the guide surface feature derivation unit 370, or the guide surface feature derivation unit 374 sets the concatenation of the feature set to be processed and the guide surface feature set g(C^in) as the feature set to be processed by the next 3D-CNN.
  • step S358 ends, the process proceeds to FIG. 13. Also, if it is determined in step S356 of FIG. 12 that guide surface features are not to be input to its own 3D-CNN, the processing of steps S357 and S358 is omitted, and the process proceeds to FIG. 13. In other words, if the CNN to be processed is CNN365 (CNN_D1), CNN369 (CNN_D3), or CNN373 (CNN_D5), the processing of steps S357 and S358 is omitted, and the process proceeds to FIG. 13.
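  • The concatenation performed in steps S357 and S358 can be sketched as follows, assuming the feature sets are held as NumPy arrays aligned row-by-row with the coordinate set (the names are hypothetical):

```python
import numpy as np

def concat_guide_features(f_in: np.ndarray, g_of_c_in: np.ndarray) -> np.ndarray:
    """Append the guide surface features to the per-point feature vectors.

    f_in:      (N, D)  feature set F^in aligned with the coordinate set C^in
    g_of_c_in: (N, Dg) guide surface feature set g(C^in), e.g. Dg = 3 when the
               feature is the vector from each point to its nearest point on
               the guide surface
    Returns the concatenated feature set [F^in, g(C^in)] of shape (N, D + Dg).
    """
    return np.concatenate([f_in, g_of_c_in], axis=1)
```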
  • In step S371, CNN365 (CNN_D1), CNN367 (CNN_D2), CNN369 (CNN_D3), CNN371 (CNN_D4), CNN373 (CNN_D5), or CNN375 (CNN_D6) performs 3D-CNN operations on the sparse tensor {C^in, F^in} to be processed, and derives the sparse tensor {C^out, F^out} as the operation result.
  • step S372 the occupancy state classification unit 368, the occupancy state classification unit 372, or the occupancy state classification unit 376 determines whether or not to classify the occupancy state. If it is determined that the occupancy state is to be classified, the process proceeds to step S373. In other words, if the process of step S371 is executed by CNN 367 (CNN_D2), CNN 371 (CNN_D4), or CNN 375 (CNN_D6), the process proceeds to step S373.
  • step S373 the occupancy state classification unit 368, the occupancy state classification unit 372, or the occupancy state classification unit 376 predicts the occupancy probability of each coordinate in the coordinate set C^out, extracts the top k coordinates with the highest occupancy probability, and deletes the remaining coordinates.
  • the scale value corresponding to the sparse tensor to be processed, among N_X, N_X', or N_X'', is applied to this k.
  • the occupancy state classification unit 368, the occupancy state classification unit 372, or the occupancy state classification unit 376 also deletes the features corresponding to the deleted coordinates in F^out.
  • step S373 ends, the process proceeds to step S374. Also, if it is determined in step S372 that occupancy state classification is not to be performed, the process of step S373 is omitted and the process proceeds to step S374. In other words, if the process of step S371 is performed by CNN365 (CNN_D1), CNN369 (CNN_D3), or CNN373 (CNN_D5), the process of step S373 is omitted and the process proceeds to step S374.
  • step S374 CNN375 (CNN_D6) determines whether processing has been performed for all 3D-CNNs. If it is determined that processing has not been performed for all 3D-CNNs (i.e., there are unprocessed 3D-CNNs (at least CNN375 (CNN_D6) is unprocessed)), processing proceeds to step S375.
  • In step S375, CNN365 sets {C^in, F^in} ← {C^out, F^out}.
  • CNN365 sets the sparse tensor {C^out, F^out} of the calculation result derived by itself as the sparse tensor {C^in, F^in} to be processed by the next processing unit (guide surface feature derivation unit or occupancy state classification unit).
  • When the processing of step S375 ends, the process returns to step S356 in FIG. 12, and the subsequent processing is executed. That is, the processes of steps S356 to S358 in FIG. 12 and steps S371 to S375 in FIG. 13 are executed for each 3D-CNN (which may include a guide surface feature derivation unit and an occupancy state classification unit).
  • If it is determined in step S374 of FIG. 13 that processing has been performed for all 3D-CNNs, processing proceeds to step S376.
  • step S376 the occupancy state classification unit 376 sets C_X ← C^out. In other words, the occupancy state classification unit 376 sets the coordinate set C^out of the calculation result as the decoding result (decoded point group C_X).
  • step S377 the occupancy state classification unit 376 outputs the decoded point group C_X to the outside of the decoding device 350.
  • When the processing of step S377 is completed, the decoding process ends.
  • the decoding device 350 can suppress a decrease in encoding efficiency.
  • the guide surface feature amount may be a feature amount based on the nearest point on the guide surface from the point (method 1-1).
  • the guide surface feature amount may be a feature amount based on the nearest point on the guide surface with respect to the point.
  • the guide surface feature amount may be a vector from a point to the nearest point, as shown in the third row from the top of the table in FIG. 4 (method 1-1-1).
  • the nearest point p_i on the guide surface 142 is derived.
  • the guide surface feature amount may be a feature amount relating to the distance between a point and a nearest neighboring point (method 1-1-2).
  • the guide surface feature may be a signed distance.
  • a signed distance is a scalar value obtained by multiplying the distance from a point to a boundary surface by +1 if the point is included within the boundary, and by -1 if not.
  • the distance from coordinate c_i to the nearest point p_i on the guide surface is expressed as ||p_i - c_i||.
  • A signed distance can only be defined for surfaces that have an inside and an outside. For example, when the guide surface is a mesh, the mesh must be watertight (in simple terms, a mesh without holes).
  • The unsigned distance g(c_i) = ||p_i - c_i|| may be applied instead of the signed distance.
  • the unsigned distance simply indicates the distance only, and can be calculated even for meshes that are not watertight.
  • a truncated signed distance which limits the range of absolute values in the signed distance, may be applied as a guide surface feature.
  • a certain maximum absolute value δ is determined, and if the absolute value of the signed distance is greater than δ, it is truncated to -δ or δ.
  • the guide surface feature amount may be the feature amount of the nearest point (method 1-1-3).
  • some feature value h(p_i) at the nearest point p_i on the guide surface may be applied as the guide surface feature value g(c_i).
  • That is, g(c_i) = h(p_i) is defined.
  • This feature value h(p_i) may be anything. For example, it may be the normal vector at the nearest point p_i, or it may be the curvature of the guide surface at the nearest point p_i (a numerical value that indicates the degree of curvature of the surface, and Gaussian curvature or mean curvature can be specifically calculated).
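  • The guide surface feature variants of methods 1-1-1 to 1-1-3 can be sketched as follows; nearest_point_on_surface is an assumed helper returning the nearest point p_i and an outward unit normal n_i at p_i, and the sign convention follows the definition above (inside: +1, outside: -1). The sketch is illustrative only:

```python
import numpy as np

def guide_surface_feature(c_i, nearest_point_on_surface, tau=None):
    """Sketch of the guide surface feature g(c_i) variants described above.

    c_i: (3,) coordinate of a point of the point cloud.
    nearest_point_on_surface: assumed helper that returns the nearest point
        p_i on the guide surface and the outward unit normal n_i at p_i.
    tau: optional truncation threshold for a truncated signed distance.
    """
    p_i, n_i = nearest_point_on_surface(c_i)

    # Method 1-1-1: vector from the point to its nearest point.
    vec = p_i - c_i

    # Method 1-1-2: unsigned, signed, and truncated signed distance.
    unsigned = np.linalg.norm(vec)
    # With an outward normal, a point inside the surface satisfies
    # dot(c_i - p_i, n_i) < 0, giving the +1 sign of the definition above.
    signed = unsigned * (1.0 if np.dot(c_i - p_i, n_i) < 0 else -1.0)
    if tau is not None:
        signed = np.clip(signed, -tau, tau)

    # Method 1-1-3: a feature h(p_i) of the nearest point itself,
    # e.g. the normal vector at p_i.
    h_p_i = n_i

    return vec, unsigned, signed, h_p_i
```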
  • the guide surface which is a reference surface formed in a three-dimensional space, may be configured with any 3D data.
  • the guide surface may be a mesh (method 1-2), as shown in the sixth row from the top of the table in FIG. 4.
  • the guide surface may be a mesh.
  • A mesh contains less information than a point cloud. Therefore, by deriving guide surface features that use the mesh as the guide surface (reference surface), the amount of information to be encoded can be reduced compared to encoding the point cloud as is. This makes it possible to suppress a decrease in encoding efficiency.
  • When the guide surface is a mesh, the guide surface may be output as 3D data of the decoding result. For example, the restoration of the point cloud may be omitted, and the decoded guide surface may be output as the decoding result. In this way, an increase in the load of the decoding process can be suppressed. Also, the decoded guide surface may be fitted to the restored point cloud. That is, when the above-mentioned method 1-2 is applied, the guide surface may be fitted to the point cloud as shown in the seventh row from the top of the table in FIG. 4 (method 1-2-1).
  • three-dimensional space 380A and three-dimensional space 380B in FIG. 15 are two-dimensional representations of three-dimensional space.
  • grey squares 381 indicate points of the restored point cloud.
  • all grey squares indicate points of the point cloud.
  • For example, the guide surface 382A is restored as shown in the three-dimensional space 380A. If the guide surface is a mesh, as in this example, there may be cases where the mesh surfaces and vertices are misaligned with respect to the point cloud. In other words, if such a guide surface is used as the decoding result, the resolution will be lower than in the case of a point cloud, and there is a risk that the quality of the decoding result will be reduced.
  • fitting may be performed on the guide surface using points of the point cloud (so that the surfaces and vertices of the guide surface are brought closer to the points of the point cloud).
  • the second information processing device may further include a fitting unit that fits the guide surface to the point cloud.
  • FIG. 16 is a block diagram showing an example of the main configuration of a decoding device when this method 1-2-1 is applied.
  • the decoding device 400 shown in FIG. 16 is a device that decodes a bit stream in which the geometry of a point cloud (3D data) is encoded, similar to the decoding device 350.
  • the decoding device 400 decodes the bit streams (guide surface bit stream, coordinate bit stream, feature bit stream, and point number bit stream) generated by encoding the point cloud (geometry) by the encoding device 300.
  • the decoding device 400 outputs a decoded guide surface (mesh) as the decoding result.
  • FIG. 16 shows the main processing units, data flows, etc., and is not necessarily all that is shown in FIG. 16.
  • processing units that are not shown as blocks in FIG. 16, and there may be processing and data flows that are not shown as arrows, etc. in FIG. 16.
  • the decoding device 400 has a decoding unit 411 and a fitting unit 412.
  • the decoding unit 411 has the same configuration as the decoding device 350 (FIG. 11) and performs the same processing. That is, the explanation given above with reference to FIGS. 11 to 13 can be applied to the decoding unit 411.
  • the decoding unit 411 decodes the point number bit stream, coordinate bit stream, feature bit stream, and guide surface bit stream, and supplies the decoded point group C_X to the fitting unit 412.
  • the decoding unit 411 also supplies the generated decoded guide surface (in this case, the guide surface is a mesh, so it is also called a decoded guide mesh) to the fitting unit 412.
  • the fitting unit 412 obtains the decoded point group C_X and the decoded guide mesh.
  • the fitting unit 412 fits the decoded guide mesh to the decoded point group C_X.
  • any method may be used for this fitting.
  • an existing method for fitting a mesh to a point cloud may be applied.
  • the fitting unit 412 may perform mesh subdivision, which subdivides polygons, on the guide mesh, and mesh smoothing, which is a smoothing process.
  • the fitting unit 412 may then perform fitting using Template Deformation for Point Cloud Fitting, as described in Non-Patent Document 3, on the guide mesh whose surface shape has been smoothed by increasing the number of triangles.
  • This method is a process for fitting a template mesh to a point cloud.
  • the template mesh corresponds to the guide mesh
  • the point cloud corresponds to the decoded point cloud.
  • a small number of corresponding point pairs (information indicating which vertices of the template mesh correspond to which points in the point cloud) are required.
  • the nearest point in the point cloud is searched for from each vertex of the template mesh, and the top few pairs with the shortest distance to the nearest point can be used as the initial corresponding point pairs.
  • no rotation is input, and fitting can be performed with a rotation angle of 0.
  • all rotation matrices in the algorithm can be set to unit matrices.
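  • A minimal sketch of deriving the initial corresponding point pairs described above, assuming SciPy's cKDTree is available for the nearest-neighbour search (function and parameter names are hypothetical):

```python
import numpy as np
from scipy.spatial import cKDTree

def initial_corresponding_pairs(mesh_vertices, point_cloud, num_pairs=10):
    """Pick initial (vertex index, point index) pairs for template fitting.

    For every vertex of the guide (template) mesh, the nearest point of the
    decoded point cloud is searched, and the num_pairs pairs with the
    smallest distances are returned.
    """
    tree = cKDTree(point_cloud)
    dists, nearest = tree.query(mesh_vertices)   # per-vertex nearest point
    order = np.argsort(dists)[:num_pairs]        # keep the closest pairs
    return [(int(v), int(nearest[v])) for v in order]
```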
  • the fitting unit 412 outputs the guide mesh after fitting to the outside of the decoding device 400 as the decoding result.
  • the decoding device 400 can output the guide mesh as the decoding result while suppressing any reduction in the quality of the decoding result.
  • step S401 the decoding unit 411 executes the decoding process described with reference to Figures 12 and 13, and decodes the bit stream to generate a decoded point group and a guide mesh.
  • step S402 the fitting unit 412 fits the guide mesh to the decoded point cloud.
  • step S403 the fitting unit 412 outputs the mesh resulting from the fitting to the outside of the decoding device 400 as a decoding result.
  • the decoding processing ends.
  • the decoding device 400 can output the guide mesh as the decoding result while suppressing any reduction in the quality of the decoding result.
  • the guide surface may be a triangle soup, as shown in the eighth row from the top of the table in FIG. 4 (method 1-3).
  • the guide surface may be a triangle soup.
  • the guide surface is stored as a type of mesh called a triangle soup, and is encoded using Trisoup coding.
  • Trisoup coding is one option of G-PCC for encoding dense point clouds.
  • the Triangle soup is the low-resolution Octree and the triangular data in each Octree node.
  • the low-resolution Octree is encoded using Octree coding, and the intersections (also called vertices) between the triangles in each node and each edge are encoded.
  • the decoder decodes the Triangle soup, samples the points from the surface, and outputs them.
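  • One possible way to sample points from the triangles of a Triangle soup is uniform, area-weighted barycentric sampling; the following sketch is only illustrative and is not the normative Trisoup procedure:

```python
import numpy as np

def sample_triangle_soup(triangles: np.ndarray, num_points: int) -> np.ndarray:
    """Sample points uniformly from a set of triangles.

    triangles: (M, 3, 3) array of triangle vertices.
    """
    v0, v1, v2 = triangles[:, 0], triangles[:, 1], triangles[:, 2]
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    # Pick triangles proportionally to their area, then sample a uniform
    # barycentric coordinate inside each picked triangle.
    idx = np.random.choice(len(triangles), size=num_points, p=areas / areas.sum())
    r1 = np.sqrt(np.random.rand(num_points, 1))
    r2 = np.random.rand(num_points, 1)
    return (1 - r1) * v0[idx] + r1 * (1 - r2) * v1[idx] + r1 * r2 * v2[idx]
```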
  • <Encoding device> An example of the main configuration of the encoding device 300 when applying this method 1-3 is shown in Fig. 19. As shown in Fig. 19, in this case, the encoding device 300 has a Trisoup encoding unit 431 instead of the guide surface generating unit 311 and the guide surface encoding unit 312 in the example of Fig. 7. Also, the encoding device 300 has a Trisoup decoding unit 432 instead of the guide surface decoding unit 313 in the example of Fig. 7.
  • the Trisoup encoding unit 431 performs Trisoup encoding on the input point group C_X, estimates the Triangle soup internally, and generates encoded data (guide surface bit stream).
  • the Trisoup encoding unit 431 supplies the generated guide surface bit stream to the Trisoup decoding unit 432.
  • the Trisoup encoding unit 431 also outputs the guide surface bit stream to the outside of the encoding device 300.
  • the Trisoup encoding unit 431 has the functions of both the guide surface generation unit 311 and the guide surface encoding unit 312.
  • the Trisoup decoding unit 432 obtains the guide surface bit stream supplied from the Trisoup encoding unit 431.
  • the Trisoup decoding unit 432 Trisoup decodes the guide surface bit stream, generates a Triangle soup internally, and generates a point cloud sampled from that surface. However, instead of the point cloud, the Trisoup decoding unit 432 uses the Triangle soup generated internally during the decoding as the decoded guide surface, and supplies this to the guide surface feature derivation unit 315, the guide surface feature derivation unit 317, and the guide surface feature derivation unit 320.
  • the encoding device 300 can encode the guide surface as a triangle soup. In other words, in this case too, the encoding device 300 can reduce the amount of information to be encoded, and suppress a decrease in encoding efficiency.
  • the Trisoup encoding unit 431 encodes the low-resolution Octree.
  • the coordinate encoding unit 324 performs Octree encoding and encodes the Octree corresponding to the coordinate set C_Y. These two Octrees are not necessarily identical, but if the two Octrees match, the encoding device 300 may encode and output one of the Octrees. In this case, the decoding device 350 can reuse one of the Octrees obtained by decoding as the other Octree. In this way, it is possible to suppress an increase in the amount of code and further suppress a decrease in encoding efficiency.
  • the encoding device 300 may encode and output only the high-resolution octree.
  • the decoding device 350 may downscale the high-resolution octree obtained by decoding to restore the low-resolution octree. In this way, it is possible to suppress an increase in the amount of code and further suppress a decrease in encoding efficiency.
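  • A minimal sketch of such a downscaling, assuming the occupied voxel coordinates of the high-resolution octree are held as an integer NumPy array (names are hypothetical):

```python
import numpy as np

def downscale_octree_leaves(coords: np.ndarray, levels: int = 1) -> np.ndarray:
    """Downscale occupied voxel coordinates by 'levels' octree levels.

    A voxel at the coarse level is occupied if any of its children is
    occupied, so halving the integer coordinates per level and removing
    duplicates restores the low-resolution occupancy.
    """
    coarse = coords >> levels          # integer division by 2 per level
    return np.unique(coarse, axis=0)
```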
  • the Trisoup encoding unit 431 and the sparse tensor construction unit 314 acquire the input point cloud C_X, which is the point cloud to be input to the encoding device 300, in step S431 of FIG. 20.
  • step S432 the Trisoup encoding unit 431 performs Trisoup encoding on the input point group C_X to generate a guide surface bit stream.
  • the Trisoup encoding unit 431 outputs the guide surface bit stream to the outside of the encoding device 300.
  • step S433 the Trisoup decoding unit 432 decodes the guide surface bit stream and generates a Triangle soup as the decoded guide surface.
  • steps S434 to S438 in FIG. 20 and steps S441 to S447 in FIG. 21 are executed in the same manner as the processes of steps S305 to S309 in FIG. 8 and steps S321 to S327 in FIG. 9. Then, when the process of step S447 in FIG. 21 ends, the encoding process ends.
  • the encoding device 300 can reduce the amount of information to be encoded and suppress a decrease in encoding efficiency.
  • the decoding device 350 may include a Trisoup decoding unit instead of the guide surface decoding unit 364 in Fig. 11.
  • This Trisoup decoding unit is a processing unit similar to the Trisoup decoding unit 432 of the encoding device 300, and performs similar processing.
  • this Trisoup decoding unit acquires the guide surface bit stream supplied to the decoding device 350, Trisoup-decodes the guide surface bit stream, generates a Triangle soup inside, sets it as a decoded guide surface, and supplies it to the guide surface feature amount derivation unit 366, the guide surface feature amount derivation unit 370, and the guide surface feature amount derivation unit 374.
  • the other configurations are the same as those in Fig. 11.
  • step S354 the Trisoup decoding unit performs Trisoup decoding on the guide surface bit stream, and the obtained Triangle soup is used as the decoded guide surface.
  • the guide surface may be an implicit function expressing a three-dimensional shape, as shown in the ninth row from the top of the table in FIG. 4 (method 1-4).
  • the guide surface may be an implicit function.
  • the signed distance from a point (x, y, z) to a guide surface may be represented as f(x, y, z).
  • the unsigned distance or the truncated signed distance may be represented as f(x, y, z).
  • the occupancy probability value at a point (x, y, z) may also be represented as f(x, y, z).
  • an appropriately scaled value of the occupancy probability value may be represented as f(x, y, z).
  • the guide surface feature amount may be the output of the implicit function at that coordinate or the gradient vector of that output (method 1-4-1).
  • the value of the gradient vector ∇f of the implicit function f at that coordinate value, or the Laplacian value Δf, may also be applied. Roughly speaking, when the implicit function f is a signed distance function, the gradient vector ∇f on the guide surface coincides with the normal vector.
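  • When the implicit function is stored as a regularly sampled 3D array, the gradient vector ∇f and the Laplacian Δf can be approximated by central differences; the following sketch is illustrative only, with a hypothetical voxel_size parameter:

```python
import numpy as np

def gradient_and_laplacian(f: np.ndarray, voxel_size: float = 1.0):
    """Central-difference gradient and Laplacian of a sampled implicit function.

    f: 3D array holding f(x, y, z) (e.g. a signed distance) on a regular grid.
    Returns (grad, lap) where grad has shape f.shape + (3,).
    """
    gx, gy, gz = np.gradient(f, voxel_size)
    grad = np.stack([gx, gy, gz], axis=-1)
    # Laplacian as the divergence of the gradient.
    lap = (np.gradient(gx, voxel_size, axis=0)
           + np.gradient(gy, voxel_size, axis=1)
           + np.gradient(gz, voxel_size, axis=2))
    return grad, lap
```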
  • the guide surface generating unit 311 estimates an implicit function as a guide surface from the input point group C_X. For example, the guide surface generating unit 311 generates a mesh from the input point group C_X, quantizes a three-dimensional space with an appropriate resolution (i.e., initializes a three-dimensional array), calculates the signed distance (or unsigned distance) from the center of each cell to the estimated mesh, and generates a three-dimensional array representing the signed distance. This three-dimensional array is supplied to the guide surface encoding unit 312 as a guide surface of the implicit function. This guide surface of the implicit function is also called a guide implicit function.
  • the guide surface encoding unit 312 encodes the guide implicit function to generate a guide surface bit stream. Any encoding method may be used.
  • the guide surface encoding unit 312 may apply an existing implicit function encoding method.
  • deep implicit volume compression as described in Non-Patent Document 4 may be applied. This method is capable of encoding a truncated signed distance.
  • the guide surface decoding unit 313 decodes the guide surface bit stream and generates (restores) a guide implicit function. Therefore, the obtained decoded guide surface is also called a decoded guide implicit function. Any decoding method may be applied as long as it corresponds to the encoding method of the guide surface encoding unit 312.
  • the decoded guide implicit function is supplied to the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, and the guide surface feature amount derivation unit 320.
  • the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, and the guide surface feature amount derivation unit 320 use the decoded guide implicit function to derive the guide surface feature amount.
  • the guide surface feature amount derivation unit 315, the guide surface feature amount derivation unit 317, and the guide surface feature amount derivation unit 320 may use the output of the implicit function at the coordinates of the point or the gradient vector of that output as this guide surface feature amount.
  • step S302 a guide implicit function is generated from the input point group.
  • the guide implicit function is encoded to generate a guide surface bit stream.
  • step S304 the guide surface bit stream is decoded to generate a decoded guide implicit function.
  • step S308 the output of the implicit function at the coordinates of the point and the gradient vector of that output are derived as guide surface features.
  • <Decoding device> The configuration of the decoding device 350 when applying this method 1-4-1 is the same as that described with reference to FIG. 11. However, the guide surface decoding unit 364 decodes the acquired guide surface bit stream and generates (restores) a guide implicit function. Any decoding method may be applied as long as it corresponds to the encoding method of the guide surface encoding unit 312. The guide surface decoding unit 364 supplies the generated decoded guide implicit function to the guide surface feature amount derivation unit 366, the guide surface feature amount derivation unit 370, and the guide surface feature amount derivation unit 374.
  • the guide surface feature amount derivation unit 366, the guide surface feature amount derivation unit 370, and the guide surface feature amount derivation unit 374 use the decoded guide implicit function to derive the guide surface feature amount.
  • the guide surface feature amount derivation unit 366, the guide surface feature amount derivation unit 370, and the guide surface feature amount derivation unit 374 may use the output of the implicit function at the coordinates of the point or the gradient vector of that output as this guide surface feature amount.
  • step S354 the guide surface bit stream is decoded to generate a decoded guide implicit function.
  • step S357 the output of the implicit function at the coordinates of the point and the gradient vector of the output are derived as guide surface feature amounts.
  • Method 1-5: When the above-mentioned method 1 is applied, the parameter set to be applied may be selected, as shown in the eleventh row from the top of the table in FIG. 4 (method 1-5).
  • the first information processing device may further include a parameter set selection unit that selects a parameter set to be applied to the encoding of the point cloud.
  • the first information processing device may further include a guide surface generation unit that generates a guide surface using the point cloud, and a guide surface encoding unit that applies the selected parameter set to encode the guide surface and generate encoded data for the guide surface.
  • There are coding parameter sets that affect the coding efficiency and decoding error of the guide surface.
  • examples of coding parameter sets include the target number of triangles in decimation (the smaller the number, the better the coding efficiency but the larger the error) and the quantization parameter (QP (Quantization Parameter)) in the Draco encoder (the smaller the number, the better the coding efficiency but the larger the error).
  • the encoder may select and apply the parameter set that provides the highest coding efficiency for the point cloud from a candidate set of parameter sets (also called guide surface coding parameter sets) that are applied to coding of such guide surfaces.
  • Fig. 22 is a block diagram showing a main configuration example of an encoding device when methods 1-5 are applied.
  • the encoding device 500 shown in Fig. 22 is a device that encodes the geometry of a point cloud (3D data) in the same manner as the encoding device 300.
  • the encoding device 500 encodes the geometry by applying the above-mentioned methods 1-5.
  • FIG. 22 shows the main processing units, data flows, etc., and is not necessarily all that is shown in FIG. 22.
  • processing units that are not shown as blocks in FIG. 22, and there may be processing and data flows that are not shown as arrows, etc. in FIG. 22.
  • the encoding device 500 has a parameter set supply unit 511, an encoding unit 512, a decoding unit 513, and a parameter set selection unit 514.
  • the parameter set supply unit 511 selects one parameter set from the candidate set of guide surface encoding parameter sets as the processing target, and supplies it to the encoding unit 512 and the parameter set selection unit 514.
  • the encoding unit 512 acquires the input point group C_X input to the encoding device 500.
  • the encoding unit 512 also acquires the guide surface encoding parameter set supplied from the parameter set supply unit 511.
  • the encoding unit 512 also applies the guide surface encoding parameter set to encode the input point group C_X and generate a bit stream.
  • the encoding unit 512 may have a configuration similar to that of the encoding device 300 (FIG. 7), for example, and perform similar processing. That is, the encoding unit 512 encodes the input point group C_X and generates a point number bit stream, a coordinate bit stream, a feature amount bit stream, and a guide surface bit stream.
  • the encoding unit 512 applies the guide surface encoding parameter set supplied from the parameter set supply unit 511 in the guide surface encoding unit 312. That is, the guide surface encoding unit 312 of the encoding unit 512 encodes the guide surface using the guide surface encoding parameter set supplied from the parameter set supply unit 511, and generates a guide surface bit stream.
  • the encoding unit 512 supplies the bit streams generated in this manner (score bit stream, coordinate bit stream, feature bit stream, and guide surface bit stream) to the decoding unit 513 and the parameter set selection unit 514.
  • the decoding unit 513 obtains the bit stream supplied from the encoding unit 512.
  • the decoding unit 513 decodes the bit stream and generates (restores) the decoded point group C_X.
  • the decoding unit 513 may have a configuration similar to that of the decoding device 350 (FIG. 11), for example, and may perform similar processing. That is, the decoding unit 513 decodes the point number bit stream, coordinate bit stream, feature amount bit stream, and guide surface bit stream, and generates the decoded point group C_X.
  • the decoding unit 513 supplies the generated decoded point group C_X to the parameter set selection unit 514.
  • the parameter set selection unit 514 acquires a guide surface encoding parameter set supplied from the parameter set supply unit 511.
  • the parameter set selection unit 514 acquires an input point group C_X input to the encoding device 500.
  • the parameter set selection unit 514 acquires bit streams (e.g., a point number bit stream, a coordinate bit stream, a feature amount bit stream, and a guide surface bit stream) supplied from the encoding unit 512.
  • the parameter set selection unit 514 acquires the decoded point group C_X supplied from the decoding unit 513 .
  • the parameter set selection unit 514 selects a guide surface encoding parameter set that maximizes the coding efficiency of the input point group C_X based on these.
  • the parameter set selection unit 514 selects the bit stream corresponding to the selected guide surface encoding parameter set and outputs it to the outside of the encoding device 500.
  • For example, an RD cost function such as C = R + λD may be used for this selection. Here, R indicates the total size of the bit stream, D indicates the error between the decoded geometry and the input geometry, and the coefficient λ is a parameter that determines whether the bit size or the decoding error is to be emphasized.
  • For this error D, the D1 metric, which is an index that measures the distance between point clouds, or the D2 metric, which also takes into account the normal direction, may be applied.
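  • A minimal sketch of this selection loop, assuming the common RD cost form C = R + λD and hypothetical encode/decode/error helpers (none of these names are part of the original disclosure):

```python
def select_guide_surface_parameter_set(candidates, encode, decode, error, lam):
    """Pick the candidate parameter set with the smallest RD cost C = R + lam * D.

    candidates: iterable of guide surface encoding parameter sets
    encode(ps): returns the list of bit streams produced with parameter set ps
    decode(bs): returns the decoded point cloud for those bit streams
    error(dec): returns the geometry error D (e.g. the D1 or D2 metric)
    lam:        coefficient weighting bit size against decoding error
    """
    best = None
    for ps in candidates:
        bitstreams = encode(ps)
        r = sum(len(b) for b in bitstreams)   # total bit stream size R_i
        d = error(decode(bitstreams))         # decoding error D_i
        cost = r + lam * d                    # RD cost C_i
        if best is None or cost < best[0]:
            best = (cost, ps, bitstreams)
    return best[1], best[2]
```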
  • the coding device 500 can suppress a decrease in coding efficiency.
  • step S501 the parameter set supply unit 511 sets the coefficient λ of the RD cost function and a candidate set of guide surface encoding parameter sets.
  • step S502 the parameter set supply unit 511 selects the i-th guide surface encoding parameter set from the set candidate set. In other words, the parameter set supply unit 511 selects one unprocessed guide surface encoding parameter set.
  • step S503 the encoding unit 512 executes the encoding process described with reference to the flowcharts of, for example, FIG. 8 and FIG. 9.
  • the encoding unit 512 applies the guide surface encoding parameters selected in step S502 to encode the guide surface and generate a guide surface bit stream.
  • the encoding unit 512 then decodes the guide surface bit stream to generate a decoded guide surface, and encodes the point group to generate a point number bit stream, a coordinate bit stream, and a feature bit stream.
  • the guide surface encoding parameter set to be applied changes, the size of the guide surface bit stream and the decoded guide surface change. In other words, the overall bit stream size and the decoded point group C_X also change. Therefore, the encoding unit 512 derives not only the guide surface bit stream, but all bit streams.
  • step S504 the parameter set selection unit 514 calculates the total size R_i of all bit streams (score bit stream, coordinate bit stream, feature bit stream, and guide surface bit stream) generated in step S503.
  • step S505 the decoding unit 513 executes the decoding process described with reference to the flowcharts in Figures 12 and 13, decodes the bit streams (point number bit stream, coordinate bit stream, feature bit stream, and guide surface bit stream) generated in step S503, and generates a decoded point group C_X.
  • step S506 the parameter set selection unit 514 derives the error D_i between the decoded geometry (the decoded point group C_X generated in step S505) and the input geometry (the input point group C_X). In step S507, the parameter set selection unit 514 uses R_i and D_i to calculate the value C_i of the RD cost function.
  • step S508 the parameter set selection unit 514 determines whether all parameter sets have been processed. If it is determined that an unprocessed parameter set exists, the process returns to step S502, and the subsequent processes are executed. In other words, the processes of steps S502 to S508 are executed for each parameter set, and the value C_i of the RD cost function is derived.
  • If it is determined in step S508 that processing has been performed for all parameter sets, processing proceeds to step S509.
  • step S509 the parameter set selection unit 514 compares the RD cost function values C_i corresponding to each parameter set derived as described above, and selects the parameter set for which the RD cost function value C_i is the smallest.
  • step S510 the parameter set selection unit 514 outputs a bit stream corresponding to the selected parameter set to the outside of the encoding device 500.
  • When the processing of step S510 is completed, the encoding process ends.
  • the encoding device 500 can select a guide surface encoding parameter set that optimizes the RD cost and apply it to the encoding of the input point group C_X (for encoding the guide surface). Therefore, the encoding device 500 can suppress a decrease in encoding efficiency.
  • Method 1-6: Attributes may be transmitted, as shown in the bottom row of the table in Fig. 4 (method 1-6). That is, although the transmission (encoding/decoding) of geometry has been described above, not only geometry but also attributes may be transmitted (encoded/decoded).
  • the guide surface feature derivation unit may derive guide surface features using a coordinate set, a feature set including attribute information, and a guide surface including texture. Then, the calculation unit may perform calculations using the guide surface features to derive a new coordinate set and a feature set including attributes. Then, the coordinate encoding unit may encode the coordinate set to generate a coordinate bit stream. Also, the feature encoding unit may encode the feature set including attributes to generate a feature bit stream. Also, the guide surface encoding unit may encode the guide surface including texture to generate a guide surface bit stream. Also, the score encoding unit may encode information indicating the score of each scale to generate a score bit stream.
  • the coordinate decoding unit may decode the coordinate bit stream to generate a coordinate set.
  • the feature decoding unit may decode the feature bit stream to generate a feature set including attribute information.
  • the guide surface decoding unit may decode the guide surface bit stream to generate a decoded guide surface including texture.
  • the guide surface feature derivation unit may derive guide surface features using the coordinate set, the feature set including attributes, and the decoded guide surface including texture.
  • the calculation unit may then perform calculations using the derived guide surface features to derive a new coordinate set and feature set including attributes.
  • attributes are attached to each point in the input point cloud.
  • the attributes can be any kind of information. For example, they can include color information or reflected brightness.
  • Such point cloud attributes can be transmitted (encoded and decoded) in the same framework as geometry.
  • Fig. 24 is a block diagram showing an example of the main configuration of an encoding device when methods 1-6 are applied.
  • the encoding device 600 shown in Fig. 24 is a device that encodes a point cloud (3D data) in the same manner as the encoding device 300.
  • the encoding device 600 performs encoding by applying the above-mentioned methods 1-6. That is, the encoding device 600 encodes attributes together with geometry.
  • FIG. 24 shows the main processing units, data flows, etc., and is not necessarily all that is shown in FIG. 24.
  • processing units that are not shown as blocks in FIG. 24, and there may be processing and data flows that are not shown as arrows, etc. in FIG. 24.
  • the encoding device 600 has a guide surface generation unit 611, a guide surface encoding unit 612, a guide surface decoding unit 613, a sparse tensor construction unit 614, a guide surface feature derivation unit 615, a CNN 616 (CNN_E1), a guide surface feature derivation unit 617, a CNN 618 (CNN_E2), a CNN 619 (CNN_E3), a guide surface feature derivation unit 620, a CNN 621 (CNN_E4), a CNN 622 (CNN_E5), a CNN 623 (CNN_E6), a coordinate encoding unit 624, a feature encoding unit 625, and a score encoding unit 626.
  • the input point group input to the encoding device 600 is composed of geometry and attributes.
  • each point has coordinate values (x, y, z) and attribute values (e.g., RGB values in the case of color information).
  • the guide surface generation unit 611 is a processing unit similar to the guide surface generation unit 311 ( Figure 7) and performs similar processing. However, the guide surface generation unit 611 obtains an input point group composed of geometry and attributes, and generates a guide surface including the attributes. For example, if the guide surface is composed of a mesh, the attributes are formed as a texture. Therefore, in this case, the guide surface generation unit 611 generates a guide surface including a texture.
  • In this case, each vertex of the mesh has texture coordinate values (values that indicate where that vertex is located in the texture image; also called UV coordinates).
  • In addition to generating a guide surface (mesh) from the input point cloud, the guide surface generator 611 must also generate a texture from the input attributes. For each point in the input point cloud, the guide surface generator 611 searches for the nearest point on the guide mesh and copies the attributes of that point to the texel (one pixel in texture data is called a texel) that corresponds to that nearest point. However, after the above processing, there is a possibility that empty texels will exist. In that case, the guide surface generator 611 may use neighboring values to complement the empty texel values.
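  • A minimal sketch of this texture generation, assuming a hypothetical helper nearest_texel() that maps a point to the texel of its nearest point on the guide mesh, and using a simple nearest-neighbour fill for empty texels:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def bake_texture(points, attrs, nearest_texel, tex_size):
    """Copy point attributes into a texture image for the guide mesh.

    points:        (N, 3) input point coordinates
    attrs:         (N, 3) attribute values (e.g. RGB)
    nearest_texel: assumed helper mapping a point to the (u, v) texel index
                   of its nearest point on the guide mesh
    tex_size:      width/height of the square texture image
    """
    texture = np.zeros((tex_size, tex_size, 3), dtype=attrs.dtype)
    filled = np.zeros((tex_size, tex_size), dtype=bool)
    for p, a in zip(points, attrs):
        u, v = nearest_texel(p)
        texture[v, u] = a
        filled[v, u] = True
    # Fill empty texels from the nearest filled texel (simple completion).
    idx = distance_transform_edt(~filled, return_distances=False,
                                 return_indices=True)
    return texture[tuple(idx)]
```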
  • the guide surface encoding unit 612 is a processing unit similar to the guide surface encoding unit 312 (FIG. 7) and performs similar processing. However, the guide surface encoding unit 612 encodes the guide surface including the texture supplied from the guide surface generation unit 611 to generate a guide surface bit stream. That is, the guide surface encoding unit 612 needs to encode the texture data in addition to encoding the mesh data. That is, the guide surface encoding unit 612 not only encodes the geometry but also encodes the texture coordinates and texture image of each vertex. Any encoding method may be applied to encode the texture coordinates of each vertex. For example, Draco encoding may be applied (the Draco encoder can encode the texture coordinates in addition to the mesh).
  • the guide surface bit stream in this case includes a bit stream that encodes the texture in addition to the bit stream that encodes the mesh.
  • the guide surface decoding unit 613 is a processing unit similar to the guide surface decoding unit 313 (FIG. 7) and performs similar processing. However, the guide surface decoding unit 613 decodes the guide surface bit stream and generates a decoded guide surface including texture. In other words, the guide surface decoding unit 613 decodes the bit stream to not only generate geometry, but also generate texture coordinates and texture images for each vertex. Any decoding method may be applied to the decoding when generating texture coordinates for each vertex. For example, Draco decoding may be applied (a Draco decoder can generate texture coordinates in addition to meshes). In addition, an existing image decoding method (for example, JPEG, PNG, etc.) may be applied to the decoding when generating texture images.
  • In this case, the feature set F_X is a vector that lists the attributes of each coordinate. That is, each feature f_i ∈ F_X is the attribute value a_i of each point (e.g., RGB values in the case of color information). The input dimension of the CNN 616 (CNN_E1) must also be increased accordingly.
  • CNN616 (CNN_E1) is a processing unit similar to CNN316 (CNN_E1) ( Figure 7) and performs similar processing.
  • F_X is changed to a vector that lists the attributes of each coordinate.
  • each feature f_i ∈ F_X is the attribute value of each point (for example, an RGB value).
  • the input dimension of CNN616 (CNN_E1) must also be increased accordingly.
  • the guide surface feature amount derivation unit 617 is a processing unit similar to the guide surface feature amount derivation unit 317 ( Figure 7) and performs similar processing. However, like the guide surface feature amount derivation unit 615, the guide surface feature amount derivation unit 617 also includes features extracted from the texture in the guide surface feature amount in order to use attribute information on the texture for encoding and decoding.
  • CNN618 (CNN_E2) is a processing unit similar to CNN318 (CNN_E2) ( Figure 7) and performs similar processing. However, in this case, it is necessary to increase the input dimensions of CNN618 (CNN_E2), as in the case of CNN616 (CNN_E1).
  • CNN619 is a processing unit similar to CNN319 (CNN_E3) ( Figure 7) and performs similar processing.
  • the guide surface feature amount derivation unit 620 is a processing unit similar to the guide surface feature amount derivation unit 320 ( Figure 7) and performs similar processing. However, like the guide surface feature amount derivation unit 615, the guide surface feature amount derivation unit 620 also includes features extracted from the texture in the guide surface feature amount in order to use attribute information on the texture for encoding and decoding.
  • CNN621 (CNN_E4) is a processing unit similar to CNN321 (CNN_E4) ( Figure 7) and performs similar processing. However, in this case, it is necessary to increase the input dimensions of CNN621 (CNN_E4), as in the case of CNN616 (CNN_E1).
  • CNN622 (CNN_E5) is a processing unit similar to CNN322 (CNN_E5) ( Figure 7) and performs similar processing.
  • CNN623 (CNN_E6) is a processing unit similar to CNN323 (CNN_E6) ( Figure 7) and performs similar processing.
  • the coordinate encoding unit 624 is a processing unit similar to the coordinate encoding unit 324 ( Figure 7) and performs similar processing.
  • the feature encoding unit 625 is a processing unit similar to the feature encoding unit 325 ( Figure 7) and performs similar processing. In this case, however, the feature set encoded by the feature encoding unit 625 includes not only geometry information but also attribute information.
  • the score encoding unit 626 is a processing unit similar to the score encoding unit 326 ( Figure 7) and performs similar processing.
  • the encoding device 600 can encode not only geometry but also attributes.
  • the guide surface generation unit 611 and the sparse tensor construction unit 614 acquire the input point group C_X, which is the point cloud input to the encoding device 600, and its attributes in step S601 of FIG. 25.
  • step S602 the guide surface generation unit 611 generates a guide surface and texture data from the input point group C_X and attributes.
  • step S603 the guide surface encoding unit 612 encodes the generated guide surface and texture data to generate a guide surface bit stream.
  • the guide surface encoding unit 612 outputs the guide surface bit stream to the outside of the encoding device 600.
  • step S604 the guide surface decoding unit 613 decodes the guide surface bitstream and generates (restores) a decoded guide surface and decoded texture data.
  • step S606 the guide surface feature derivation unit 615 sets {C^in, F^in} ← {C_X, F_X}.
  • step S607 the guide surface feature derivation unit 615, the guide surface feature derivation unit 617, or the guide surface feature derivation unit 620 determines whether or not to input the guide surface feature to the next 3D-CNN. If it is determined that the guide surface feature is to be input to the next 3D-CNN, the process proceeds to step S608.
  • step S608 the guide surface feature derivation unit 615, the guide surface feature derivation unit 617, or the guide surface feature derivation unit 620 executes a guide surface feature set derivation process, and derives a guide surface feature set g(C^in) using the sparse tensor to be processed, the decoded guide surface, and the decoded texture data.
  • step S609 the guide surface feature derivation unit 615, the guide surface feature derivation unit 617, or the guide surface feature derivation unit 620 sets F^in ← [F^in, g(C^in)].
  • the guide surface feature derivation unit 615, the guide surface feature derivation unit 617, or the guide surface feature derivation unit 620 sets the concatenation of the feature set to be processed and the guide surface feature set g(C^in) as the feature set to be processed by the next 3D-CNN.
  • When the processing of step S609 is completed, the processing proceeds to FIG. 26. Also, in step S607 of FIG. 25, if it is determined that the guide surface feature amount is not to be input to the next 3D-CNN, the processing proceeds to FIG. 26. In other words, if there is no guide surface feature amount derivation unit immediately before the 3D-CNN, the processing of steps S608 and S609 is omitted. For example, when the encoding device 600 has the configuration example shown in FIG. 24, the processing of steps S608 and S609 is omitted for CNN619 (CNN_E3), CNN622 (CNN_E5), and CNN623 (CNN_E6).
  • In step S621, CNN616 (CNN_E1), CNN618 (CNN_E2), CNN619 (CNN_E3), CNN621 (CNN_E4), CNN622 (CNN_E5), or CNN623 (CNN_E6) performs 3D-CNN operations on the sparse tensor {C^in, F^in} to be processed, and derives the sparse tensor {C^out, F^out} as the operation result.
  • CNN623 determines whether processing has been performed for all 3D-CNNs. If it is determined that processing has not been performed for all 3D-CNNs (i.e., there are unprocessed 3D-CNNs (at least CNN623 (CNN_E6) is unprocessed)), processing proceeds to step S623.
  • CNN616 (CNN_E1), CNN618 (CNN_E2), CNN619 (CNN_E3), CNN621 (CNN_E4), or CNN622 (CNN_E5) sets ⁇ C ⁇ in, F ⁇ in ⁇ ⁇ ⁇ C ⁇ out, F ⁇ out ⁇ .
  • CNN616 (CNN_E1), CNN618 (CNN_E2), CNN619 (CNN_E3), CNN621 (CNN_E4), or CNN622 (CNN_E5) sets the sparse tensor ⁇ C ⁇ out, F ⁇ out ⁇ resulting from the calculation it derived as the sparse tensor ⁇ C ⁇ in, F ⁇ in ⁇ to be processed by the next processing unit (guide surface feature derivation unit or 3D-CNN).
  • step S623 When the processing of step S623 ends, the process returns to step S607 in FIG. 25, and the subsequent processing is executed.
  • the processing of each of steps S607 to S609 in FIG. 25 and the processing of each of steps S621 to S623 in FIG. 26 are executed for each 3D-CNN (including the guide surface feature derivation unit if one exists immediately before the 3D-CNN).
  • step S622 of FIG. 26 If it is determined in step S622 of FIG. 26 that processing has been performed for all 3D-CNNs (i.e., processing has been performed for CNN623 (CNN_E6)), processing proceeds to step S624.
  • step S624 CNN 623 (CNN_E6) sets ⁇ C_Y, F_Y ⁇ ⁇ ⁇ C ⁇ out, F ⁇ out ⁇ .
  • CNN 623 sets the sparse tensor ⁇ C ⁇ out, F ⁇ out ⁇ , which is the result of the calculation it derived, as the output sparse tensor ⁇ C_Y, F_Y ⁇ .
  • In step S625, the coordinate encoding unit 624 encodes the coordinate set C_Y of the output sparse tensor {C_Y, F_Y} to generate a coordinate bitstream, and outputs the coordinate bitstream to the outside of the encoding device 600.
  • In step S626, the feature encoding unit 625 encodes the feature set F_Y of the output sparse tensor {C_Y, F_Y} to generate a feature bitstream.
  • The feature bitstream is then output to the outside of the encoding device 600.
  • In step S627, the score encoding unit 626 encodes the scores for each scale to generate a score bitstream.
  • The score bitstream is then output to the outside of the encoding device 600.
  • When the processing of step S627 is completed, the encoding process ends.
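  • To make the above flow easier to follow, the following is a minimal Python sketch of the encoder forward pass of steps S606 to S626; the helper names (cnn_stages standing in for CNN_E1 to CNN_E6, derive_gsf for the guide surface feature set derivation process, coord_enc and feat_enc for the coordinate and feature encoding units) are assumptions made for illustration and are not taken from the disclosure.

      import numpy as np

      def encode_forward(C_X, F_X, cnn_stages, derive_gsf, coord_enc, feat_enc):
          # C_X: (N, 3) point coordinates, F_X: (N, D) features of those points.
          # cnn_stages: list of (cnn, use_gsf) pairs; each cnn maps
          # (coords, feats) -> (coords, feats), standing in for CNN_E1..CNN_E6.
          C_in, F_in = C_X, F_X                                 # step S606
          for cnn, use_gsf in cnn_stages:
              if use_gsf:                                       # step S607
                  g = derive_gsf(C_in)                          # step S608
                  F_in = np.concatenate([F_in, g], axis=1)      # step S609: F^in <- [F^in, g(C^in)]
              C_in, F_in = cnn(C_in, F_in)                      # step S621: 3D-CNN operation
          C_Y, F_Y = C_in, F_in                                 # step S624: output sparse tensor
          return coord_enc(C_Y), feat_enc(F_Y)                  # steps S625 and S626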
  • In step S631, the guide surface feature derivation unit 615, 617, or 620 extracts a coordinate c_i ∈ C^in.
  • That is, the guide surface feature derivation unit 615, 617, or 620 selects one coordinate c_i from the coordinate set C^in to be processed and makes it the processing target.
  • In step S632, the guide surface feature derivation unit 615, 617, or 620 searches for the nearest point p_i on the decoded guide surface from the selected coordinate c_i.
  • In step S633, the guide surface feature derivation unit 615, 617, or 620 derives the guide surface feature g(c_i) based on the positional relationship between the guide surface and the point (in this example, p_i - c_i) and the attribute value (in this example, a_i).
  • In step S634, the guide surface feature derivation unit 615, 617, or 620 determines whether or not processing has been performed for all coordinates c_i ∈ C^in. If it is determined that an unprocessed coordinate c_i exists, the process returns to step S631, and the subsequent processes are executed. In other words, the processes of steps S631 to S634 are executed for each coordinate value c_i in the coordinate set C^in to be processed.
  • If it is determined in step S634 that processing has been performed for all coordinate values c_i, the guide surface feature set derivation process ends and processing returns to FIG. 25.
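  • The following Python sketch illustrates one possible realization of steps S631 to S634; approximating the exact nearest point on the decoded guide surface by a nearest-neighbor search over points sampled densely on that surface, and composing g(c_i) from the offset p_i - c_i and the attribute a_i, are assumptions made for illustration.

      import numpy as np
      from scipy.spatial import cKDTree

      def derive_guide_surface_features(C_in, surface_points, surface_colors):
          # surface_points: (M, 3) points densely sampled on the decoded guide surface.
          # surface_colors: (M, 3) attribute values (e.g., RGB from the texture) at those points.
          tree = cKDTree(surface_points)
          _, idx = tree.query(C_in)                    # step S632: nearest surface point p_i for each c_i
          p = surface_points[idx]
          a = surface_colors[idx]
          offset = p - C_in                            # positional relationship p_i - c_i
          return np.concatenate([offset, a], axis=1)   # step S633: g(c_i) = [p_i - c_i, a_i]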
  • In this way, the encoding device 600 can encode not only geometry but also attributes.
  • FIG. 28 is a block diagram showing a main configuration example of a decoding device when applying methods 1-6.
  • the decoding device 650 shown in FIG. 28 is a device that decodes a bit stream in which the geometry of a point cloud (3D data) is encoded, similar to the case of the decoding device 350.
  • the decoding device 650 decodes bit streams (guide surface bit stream, coordinate bit stream, feature bit stream, and point number bit stream) generated by encoding the point cloud (geometry and attributes) by the encoding device 600, and generates (restores) the point cloud (geometry and attributes).
  • Note that FIG. 28 shows the main components, such as processing units and data flows, and does not necessarily show everything.
  • In other words, in the decoding device 650, there may be processing units that are not shown as blocks in FIG. 28, and there may be processing and data flows that are not shown as arrows or the like in FIG. 28.
  • the decoding device 650 has a score decoding unit 661, a coordinate decoding unit 662, a feature decoding unit 663, a guide surface decoding unit 664, a CNN 665 (CNN_D1), a guide surface feature derivation unit 666, a CNN 667 (CNN_D2), an occupancy state classification unit 668, a CNN 669 (CNN_D3), a guide surface feature derivation unit 670, a CNN 671 (CNN_D4), an occupancy state classification unit 672, a CNN 673 (CNN_D5), a guide surface feature derivation unit 674, a CNN 675 (CNN_D6), and an occupancy state classification unit 676.
  • the score decoding unit 661 is a processing unit similar to the score decoding unit 361 (FIG. 11) and performs similar processing. For example, the score decoding unit 661 may obtain the score bit stream input to the decoding device 650, decode it, and generate (restore) information indicating the scores of the coordinate sets at each scale (N_X, N_X', N_X''). The score decoding unit 661 may supply N_X to the occupancy state classification unit 676, supply N_X' to the occupancy state classification unit 672, and supply N_X'' to the occupancy state classification unit 668.
  • the coordinate decoding unit 662 is a processing unit similar to the coordinate decoding unit 362 (FIG. 11) and performs similar processing. For example, the coordinate decoding unit 662 may obtain the coordinate bit stream input to the decoding device 650, decode it, and generate (restore) the coordinate set C_Y. The coordinate decoding unit 662 may supply the generated coordinate set C_Y to the CNN 665 (CNN_D1).
  • the feature decoding unit 663 is a processing unit similar to the feature decoding unit 363 (FIG. 11) and performs similar processing. For example, the feature decoding unit 663 may obtain the feature bit stream input to the decoding device 650, perform entropy decoding, and generate (restore) a feature set F_Y. The feature decoding unit 663 may supply the generated feature set F_Y to the CNN 665 (CNN_D1).
  • the guide surface decoding unit 664 is a processing unit similar to the guide surface decoding unit 613 (FIG. 24), and executes processing related to the decoding of the guide surface bit stream. For example, the guide surface decoding unit 664 acquires the guide surface bit stream input to the decoding device 650, decodes it, and generates a decoded guide surface including texture. In other words, the guide surface decoding unit 664 decodes the bit stream to generate not only geometry, but also texture coordinates and texture images of each vertex. This decoding method is arbitrary, similar to the case of the guide surface decoding unit 613 (FIG. 24). The guide surface decoding unit 664 may supply the generated decoded guide surface (including texture) to the guide surface feature amount derivation unit 666, the guide surface feature amount derivation unit 670, and the guide surface feature amount derivation unit 674.
  • CNN665 (CNN_D1) is a processing unit similar to CNN365 (CNN_D1) ( Figure 11) and performs similar processing.
  • the feature set in the sparse tensor processed by CNN665 includes not only geometry information but also attribute information.
  • CNN667 (CNN_D2) is a processing unit similar to CNN367 (CNN_D2) ( Figure 11) and performs similar processing.
  • However, compared to CNN367 (CNN_D2) (FIG. 11), the number of dimensions of the guide surface feature increases by the number of dimensions of the attribute value, so it is necessary to increase the input dimensions of CNN667 (CNN_D2).
  • CNN669 is a processing unit similar to CNN369 (CNN_D3) ( Figure 11) and performs similar processing.
  • the guide surface feature amount derivation unit 670 is a processing unit similar to the guide surface feature amount derivation unit 370 (FIG. 11) and performs similar processing. However, like the guide surface feature amount derivation unit 666, the guide surface feature amount derivation unit 670 also includes features extracted from the texture in the guide surface feature amount in order to use attribute information on the texture for encoding and decoding.
  • CNN671 (CNN_D4) is a processing unit similar to CNN371 (CNN_D4) ( Figure 11) and performs similar processing.
  • the number of dimensions of the guide surface feature increases by the number of dimensions of the attribute value, so it is necessary to increase the input dimensions of CNN671 (CNN_D4).
  • CNN673 (CNN_D5) is a processing unit similar to CNN373 (CNN_D5) ( Figure 11) and performs similar processing.
  • the guide surface feature amount derivation unit 674 is a processing unit similar to the guide surface feature amount derivation unit 374 ( Figure 11) and performs similar processing. However, like the guide surface feature amount derivation unit 666, the guide surface feature amount derivation unit 674 also includes features extracted from the texture in the guide surface feature amount in order to use attribute information on the texture for encoding and decoding.
  • CNN675 (CNN_D6) is a processing unit similar to CNN375 (CNN_D6) ( Figure 11) and performs similar processing.
  • the number of dimensions of the guide surface features increases by the number of dimensions of the attribute values, so it is necessary to increase the input dimensions of CNN675 (CNN_D6).
  • In addition, CNN675 (CNN_D6) needs to increase its output dimensions by the number of dimensions of the attribute values.
  • For example, if the attribute is an RGB value, each feature vector f_i ∈ F_X in the feature vector set F_X output by CNN675 (CNN_D6) is expanded by three dimensions, and these added dimensions hold the RGB value.
  • The occupancy state classification unit 676 sets this attribute value as the attribute of each point in the decoded point group C_X and outputs it.
  • the decoding device 650 can decode not only geometry but also attributes.
  • the score decoding unit 661 decodes the score bit stream in step S651 of FIG. 29, and generates scores for each scale (N_X, N_X', N_X'').
  • In step S652, the coordinate decoding unit 662 decodes the coordinate bitstream and generates a coordinate set C_Y.
  • In step S653, the feature decoding unit 663 decodes the feature bitstream and generates a feature set F_Y.
  • In step S654, the guide surface decoding unit 664 decodes the guide surface bitstream and generates a decoded guide surface and decoded texture data.
  • In step S655, CNN665 (CNN_D1) sets {C^in, F^in} ← {C_Y, F_Y}.
  • In step S656, it is determined whether or not guide surface features are to be input to the 3D-CNN to be processed. If it is determined that the guide surface features are to be input, the process proceeds to step S657. In other words, if the CNN to be processed is CNN667 (CNN_D2), CNN671 (CNN_D4), or CNN675 (CNN_D6), the process proceeds to step S657.
  • In step S657, the guide surface feature derivation unit 666, 670, or 674 executes a guide surface feature set derivation process to derive a guide surface feature set g(C^in).
  • This guide surface feature set derivation process is executed in the same manner as described with reference to the flowchart in FIG. 27.
  • That is, the guide surface feature derivation unit 666, 670, or 674 derives the guide surface feature set g(C^in) using the sparse tensor to be processed, the decoded guide surface, and the decoded texture data.
  • In step S658, the guide surface feature derivation unit 666, 670, or 674 sets F^in ← [F^in, g(C^in)].
  • That is, the guide surface feature derivation unit 666, 670, or 674 sets the concatenation of the feature set to be processed and the guide surface feature set g(C^in) as the feature set to be processed by the next 3D-CNN.
  • When the processing of step S658 ends, the process proceeds to FIG. 30. Also, if it is determined in step S656 of FIG. 29 that guide surface features are not to be input, the processing of steps S657 and S658 is omitted, and the process proceeds to FIG. 30. In other words, if the CNN to be processed is CNN665 (CNN_D1), CNN669 (CNN_D3), or CNN673 (CNN_D5), the processing of steps S657 and S658 is omitted, and the process proceeds to FIG. 30.
  • In step S671 of FIG. 30, CNN665 (CNN_D1), CNN667 (CNN_D2), CNN669 (CNN_D3), CNN671 (CNN_D4), CNN673 (CNN_D5), or CNN675 (CNN_D6) performs a 3D-CNN operation on the sparse tensor {C^in, F^in} to be processed, and derives the sparse tensor {C^out, F^out} as the operation result.
  • In step S672, the occupancy state classification unit 668, 672, or 676 determines whether or not to classify the occupancy state. If it is determined that the occupancy state is to be classified, the process proceeds to step S673. In other words, if the process of step S671 was executed by CNN667 (CNN_D2), CNN671 (CNN_D4), or CNN675 (CNN_D6), the process proceeds to step S673.
  • In step S673, the occupancy state classification unit 668, 672, or 676 predicts the occupancy probability of each coordinate in the coordinate set C^out, extracts the top k coordinates with the highest occupancy probability, and deletes the remaining coordinates.
  • The value corresponding to the scale of the sparse tensor to be processed, among N_X, N_X', and N_X'', is used as this k.
  • The occupancy state classification unit 668, 672, or 676 also deletes the features corresponding to the deleted coordinates from F^out.
  • When the processing of step S673 ends, the process proceeds to step S674. Also, if it is determined in step S672 that occupancy state classification is not to be performed, the process of step S673 is omitted and the process proceeds to step S674. In other words, if the process of step S671 was performed by CNN665 (CNN_D1), CNN669 (CNN_D3), or CNN673 (CNN_D5), the process of step S673 is omitted and the process proceeds to step S674.
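  • A minimal Python sketch of the pruning performed in step S673 follows; the helper name and the way the occupancy probability is supplied are assumptions made for illustration.

      import numpy as np

      def prune_by_occupancy(C_out, F_out, occupancy_prob, k):
          # occupancy_prob: (N,) predicted occupancy probability of each coordinate in C_out.
          # k: number of points to keep (N_X, N_X', or N_X'' depending on the scale).
          keep = np.argsort(-occupancy_prob)[:k]   # indices of the k highest probabilities
          return C_out[keep], F_out[keep]          # delete the remaining coordinates and features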
  • In step S674, CNN675 (CNN_D6) determines whether processing has been performed for all 3D-CNNs. If it is determined that processing has not been performed for all 3D-CNNs (i.e., there are unprocessed 3D-CNNs; at least CNN675 (CNN_D6) is unprocessed), processing proceeds to step S675.
  • In step S675, CNN665 (CNN_D1), CNN667 (CNN_D2), CNN669 (CNN_D3), CNN671 (CNN_D4), CNN673 (CNN_D5), or CNN675 (CNN_D6) sets {C^in, F^in} ← {C^out, F^out}.
  • That is, CNN665 (CNN_D1), CNN667 (CNN_D2), CNN669 (CNN_D3), CNN671 (CNN_D4), CNN673 (CNN_D5), or CNN675 (CNN_D6) sets the sparse tensor {C^out, F^out} resulting from its own calculation as the sparse tensor {C^in, F^in} to be processed by the next processing unit (guide surface feature derivation unit or occupancy state classification unit).
  • When the processing of step S675 ends, the process returns to step S656 in FIG. 29, and the subsequent processing is executed. That is, the processes of steps S656 to S658 in FIG. 29 and steps S671 to S675 in FIG. 30 are executed for each 3D-CNN (which may include a guide surface feature derivation unit and an occupancy state classification unit).
  • If it is determined in step S674 of FIG. 30 that processing has been performed for all 3D-CNNs, processing proceeds to step S676.
  • In step S676, the occupancy state classification unit 676 sets C_X ← C^out.
  • That is, the occupancy state classification unit 676 sets the coordinate set C^out of the calculation result as the decoding result (decoded point group C_X).
  • The occupancy state classification unit 676 also extracts attribute values from F^out. For example, if the attribute is an RGB value, the occupancy state classification unit 676 extracts the last three dimensions of each feature vector of F^out and sets this vector as the attribute.
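  • The split performed in step S676 can be sketched as follows; the helper name and the assumption that exactly the last three dimensions of each feature vector carry the attribute are illustrative.

      import numpy as np

      def split_geometry_and_attributes(C_out, F_out, attr_dims=3):
          # The surviving coordinates become the decoded point group C_X, and the
          # last attr_dims dimensions of each feature vector are taken as the attribute.
          C_X = C_out
          attributes = F_out[:, -attr_dims:]       # e.g., RGB values
          return C_X, attributes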
  • In step S677, the occupancy state classification unit 676 outputs the obtained decoded point group C_X to the outside of the decoding device 650. In addition, the occupancy state classification unit 676 outputs the obtained attributes to the outside of the decoding device 650.
  • When the processing of step S677 ends, the decoding process ends.
  • In this way, the decoding device 650 can decode not only geometry but also attributes.
  • FIG. 31 shows an example of a comparison result of RD costs between the case where the present technology described above is applied and the case where the conventional method is applied.
  • the graph in Figure 31 shows an example of an RD curve plot when a point cloud (one frame, approximately 900,000 points, 10-bit accuracy) obtained by scanning a specific person is intra-encoded and intra-decoded.
  • the vertical axis shows mseF PSNR (p2point), or the so-called D1 Metric PSNR.
  • the horizontal axis shows the bit size per point (bpp (bits per point)). In other words, if the position on the vertical axis is the same, the further to the left the position on the horizontal axis, the less code there is.
  • the solid line in the graph of Figure 31 is an RD curve showing the relationship between bit size and PSNR when encoding a point cloud using a conventional method without using a guide surface.
  • the dotted line in the graph of Figure 31 is an RD curve showing the relationship between bit size and PSNR (Peak Signal-to-Noise Ratio) when encoding a point cloud using a guide surface by applying this technology.
  • In the graph of FIG. 31, the dotted line is generally positioned above the solid line. Therefore, encoding with the method to which the present technology is applied (the method using a guide surface) obtains better results (higher quality for the same amount of code, or less code for the same quality) than the conventional method (the method not using a guide surface). In other words, by applying the present technology, a decrease in encoding efficiency can be suppressed.
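  • For reference, the two axes of FIG. 31 can be computed roughly as sketched below in Python; the symmetric nearest-neighbor formulation of the D1 metric and the peak value for 10-bit geometry are common conventions but are assumptions here, since the exact evaluation settings are not given.

      import numpy as np
      from scipy.spatial import cKDTree

      def bits_per_point(total_bits, num_points):
          # Horizontal axis: total size of all bitstreams divided by the number of points.
          return total_bits / num_points

      def d1_psnr(reference, decoded, peak=1023.0):
          # Vertical axis: point-to-point (D1) PSNR, assuming 10-bit geometry (peak = 2**10 - 1).
          def one_way_mse(src, dst):
              d, _ = cKDTree(dst).query(src)   # nearest-neighbor distance for each point
              return np.mean(d ** 2)
          mse = max(one_way_mse(reference, decoded), one_way_mse(decoded, reference))
          return 10.0 * np.log10(peak ** 2 / mse)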
  • FIG. 32 is a block diagram showing an example of the hardware configuration of a computer that executes the above-mentioned series of processes using a program.
  • In the computer, a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, and a RAM (Random Access Memory) 903 are connected to one another via a bus 904.
  • An input/output interface 910 is also connected to the bus 904.
  • An input unit 911, an output unit 912, a storage unit 913, a communication unit 914, and a drive 915 are connected to the input/output interface 910.
  • the input unit 911 includes, for example, a keyboard, a mouse, a microphone, a touch panel, an input terminal, etc.
  • the output unit 912 includes, for example, a display, a speaker, an output terminal, etc.
  • the storage unit 913 includes, for example, a hard disk, a RAM disk, a non-volatile memory, etc.
  • the communication unit 914 includes, for example, a network interface.
  • the drive 915 drives removable media 921 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
  • the CPU 901 loads a program stored in the storage unit 913, for example, into the RAM 903 via the input/output interface 910 and the bus 904, and executes the program, thereby carrying out the above-mentioned series of processes.
  • the RAM 903 also stores data necessary for the CPU 901 to execute various processes, as appropriate.
  • the program executed by the computer can be applied by recording it on removable media 921 such as package media, for example.
  • the program can be installed in the storage unit 913 via the input/output interface 910 by inserting the removable media 921 into the drive 915.
  • This program can also be provided via a wired or wireless transmission medium, such as a local area network, the Internet, or digital satellite broadcasting. In that case, the program can be received by the communication unit 914 and installed in the storage unit 913.
  • Alternatively, this program can be installed in advance in the ROM 902 or the storage unit 913.
  • the present technology may be applied to any configuration, for example, to various electronic devices.
  • the present technology can be implemented as part of an apparatus, such as a processor (e.g., a video processor) as a system LSI (Large Scale Integration), a module using multiple processors (e.g., a video module), a unit using multiple modules (e.g., a video unit), or a set in which other functions are added to a unit (e.g., a video set).
  • the present technology can also be applied to a network system consisting of multiple devices.
  • the present technology may be implemented as cloud computing in which multiple devices share and collaborate on processing via a network.
  • the present technology may be implemented in a cloud service that provides image (video) related services to any terminal, such as a computer, AV (Audio Visual) equipment, portable information processing terminal, or IoT (Internet of Things) device.
  • a system refers to a collection of multiple components (devices, modules (parts), etc.), regardless of whether all the components are in the same housing. Therefore, multiple devices housed in separate housings and connected via a network, and a single device in which multiple modules are housed in a single housing, are both systems.
  • Systems, devices, processing units, etc. to which the present technology is applied can be used in any field, such as transportation, medical care, crime prevention, agriculture, livestock farming, mining, beauty, factories, home appliances, weather, and nature monitoring.
  • the applications are also arbitrary.
  • This technology can be used, for example, as a technology (so-called smart construction) aimed at improving productivity and safety at construction sites and resolving labor shortages, to create digital twins of construction sites through 3D surveying and use them for construction management.
  • this technology can be applied to 3D surveying using sensors mounted on drones or construction machinery, and feedback based on 3D data (e.g. point clouds) obtained by the surveying (e.g. construction progress management and soil volume management).
  • Such surveys are generally carried out multiple times at different times (e.g., every other day).
  • point cloud data obtained by surveys at different times is accumulated. Therefore, as the number of surveys increases, the increase in storage capacity and transmission costs for point cloud data sequences can become a problem.
  • There is redundancy between such point clouds from different times, such as their having similar structures.
  • a "flag” refers to information for identifying multiple states, and includes not only information used to identify two states, true (1) or false (0), but also information capable of identifying three or more states.
  • the value that this "flag” can take may be, for example, two values, 1/0, or three or more values. That is, the number of bits constituting this "flag” is arbitrary, and may be one bit or multiple bits.
  • “flag” and “identification information” include not only the information itself, but also difference information with respect to the reference information.
  • various information (metadata, etc.) related to the encoded data may be transmitted or recorded in any form as long as it is associated with the encoded data.
  • the term "associate" means, for example, that one piece of data can be used (linked) when processing the other piece of data.
  • data that are associated with each other may be combined into one piece of data, or each piece of data may be individual data.
  • information associated with encoded data (image) may be transmitted on a transmission path different from that of the encoded data (image).
  • information associated with encoded data (image) may be recorded on a recording medium different from that of the encoded data (image) (or on a different recording area of the same recording medium).
  • this "association" may be a part of the data, not the entire data.
  • an image and information corresponding to that image may be associated with each other in any unit, such as multiple frames, one frame, or a part of a frame.
  • the configuration described above as one device (or processing unit) may be divided and configured as multiple devices (or processing units).
  • the configurations described above as multiple devices (or processing units) may be combined and configured as one device (or processing unit).
  • configurations other than those described above may be added to the configuration of each device (or processing unit).
  • part of the configuration of one device (or processing unit) may be included in the configuration of another device (or other processing unit).
  • the above-mentioned program may be executed in any device.
  • the device has the necessary functions (functional blocks, etc.) and is capable of obtaining the necessary information.
  • each step of a single flowchart may be executed by a single device, or may be shared among multiple devices.
  • a single step includes multiple processes, those multiple processes may be executed by a single device, or may be shared among multiple devices.
  • multiple processes included in a single step may be executed as multiple step processes.
  • processes described as multiple steps may be executed collectively as a single step.
  • the processing of the steps describing a program executed by a computer may be executed chronologically in the order described in this specification, or may be executed in parallel, or individually at a required timing such as when a call is made. In other words, as long as no contradiction arises, the processing of each step may be executed in an order different from the order described above. Furthermore, the processing of the steps describing this program may be executed in parallel with the processing of other programs, or may be executed in combination with the processing of other programs.
  • the multiple technologies related to the present technology can be implemented independently and individually, so long as no contradictions arise.
  • any multiple of the present technologies can also be implemented in combination.
  • part or all of the present technology described in any embodiment can be implemented in combination with part or all of the present technology described in another embodiment.
  • part or all of any of the present technologies described above can be implemented in combination with other technologies not described above.
  • the present technology can also be configured as follows.
  • (1) An information processing device comprising: a guide surface feature amount derivation unit that derives guide surface feature amounts, which are feature amounts based on a positional relationship between the guide surface and the points, using a first coordinate set, which is a set of coordinates of points forming a point cloud, a first feature amount set, which is a set of feature amounts of the coordinates, and a guide surface, which is a reference surface formed in a three-dimensional space; a calculation unit that performs a predetermined calculation using the first coordinate set and the first feature amount set that reflects the guide surface feature amount, and derives a second coordinate set and a second feature amount set; a coordinate encoding unit that encodes the derived second coordinate set; and a feature encoding unit that encodes the derived second feature set, wherein an amount of code for the encoded data of the second coordinate set and the second feature set is smaller than an amount of code for the encoded data of the first coordinate set and the first feature set.
  • The information processing device according to (1), further comprising: a guide surface generation unit that generates the guide surface by using the point cloud; and a guide surface encoding unit that encodes the guide surface and generates encoded data of the guide surface.
  • the information processing device according to any one of (1) to (5), further comprising: a guide surface decoding unit that decodes encoded data of the guide surface and generates the guide surface.
  • the guide surface feature amount is a feature amount based on a nearest point on the guide surface with respect to the point.
  • the guide surface is a mesh.
  • the information processing device according to any one of (1) to (7), wherein the guide surface is a triangle soup.
  • the information processing device according to any one of (1) to (7), wherein the guide surface is an implicit function.
  • the information processing device according to any one of (1) to (10), further comprising: a parameter set selection unit that selects a parameter set to be applied to encoding the point cloud.
  • The information processing device according to (11), further comprising: a guide surface generation unit that generates the guide surface by using the point cloud; and a guide surface encoding unit that applies the selected parameter set to encode the guide surface and generates encoded data of the guide surface.
  • the guide surface feature amount derivation unit derives the guide surface feature amount by using the coordinate set, the feature amount set including attribute information, and the guide surface including a texture.
  • An information processing method comprising: deriving a guide surface feature amount, which is a feature amount based on a positional relationship between the guide surface and the points, using a first coordinate set, which is a set of coordinates of points forming the point cloud, a first feature amount set, which is a set of feature amounts of the coordinates, and a guide surface, which is a reference surface formed in a three-dimensional space; performing a calculation using the first coordinate set and the first feature amount set reflecting the guide surface feature amount to derive a second coordinate set and a second feature amount set; encoding the derived second coordinate set; and encoding the derived second feature amount set.
  • (21) An information processing device comprising: a coordinate decoding unit that decodes encoded data to generate a first coordinate set that is a set of coordinates of points that form the point cloud; a feature decoding unit that decodes the encoded data to generate a first feature set that is a set of features of the coordinates; a guide surface feature amount derivation unit that derives a guide surface feature amount, which is a feature amount based on a positional relationship between the guide surface and the point, by using the generated first coordinate set, the generated first feature amount set, and a guide surface, which is a reference surface formed in a three-dimensional space; and a calculation unit that performs a predetermined calculation using the first coordinate set and the first feature amount set that reflects the guide surface feature amount, and derives a second coordinate set and a second feature amount set, wherein an amount of code for encoded data of the first coordinate set and the first feature set is smaller than an amount of code for encoded data of the second coordinate set and the second feature set.
  • the information processing device according to any one of (21) to (25), further comprising: a guide surface decoding unit that decodes encoded data of the guide surface and generates the guide surface.
  • the guide surface feature amount is a feature amount based on a nearest point on the guide surface with respect to the point.
  • the information processing device according to any one of (21) to (27), wherein the guide surface is a mesh.
  • the information processing device according to (28), further comprising: a fitting unit that fits the guide surface to the point cloud.
  • the information processing device according to any one of (21) to (27), wherein the guide surface is a triangle soup.
  • (33) An information processing method comprising: decoding encoded data to generate a first coordinate set that is a set of coordinates of points that form the point cloud; decoding the encoded data to generate a first feature set that is a set of features of the coordinates; deriving a guide surface feature amount, which is a feature amount based on a positional relationship between the guide surface and the point, using the generated first coordinate set, the generated first feature amount set, and a guide surface, which is a reference surface formed in a three-dimensional space; and performing a predetermined calculation using the first coordinate set and the first feature amount set reflecting the guide surface feature amount to derive a second coordinate set and a second feature amount set, wherein an amount of code for the encoded data of the first coordinate set and the first feature set is smaller than an amount of code for the encoded data of the second coordinate set and the second feature set.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an information processing device and method capable of suppressing a reduction in coding efficiency. The method comprises: using a first coordinate set, which is a set of coordinates of points forming a point cloud, a first feature amount set, which is a set of feature amounts of the coordinates, and a guide surface, which is a reference surface formed in three-dimensional space, to derive a guide surface feature amount, which is a feature amount based on a positional relationship between the guide surface and the points; performing a predetermined calculation using the first coordinate set and the first feature amount set reflecting the guide surface feature amount to derive a second coordinate set and a second feature amount set; encoding the derived second coordinate set; and encoding the derived second feature amount set. The present disclosure can be applied, for example, to information processing devices, electronic apparatuses, and information processing methods and programs.
PCT/JP2024/021822 2023-07-06 2024-06-17 Dispositif et procédé de traitement d'informations Pending WO2025009366A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2023111216 2023-07-06
JP2023-111216 2023-07-06

Publications (1)

Publication Number Publication Date
WO2025009366A1 true WO2025009366A1 (fr) 2025-01-09

Family

ID=94172032

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2024/021822 Pending WO2025009366A1 (fr) 2023-07-06 2024-06-17 Dispositif et procédé de traitement d'informations

Country Status (1)

Country Link
WO (1) WO2025009366A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020515937A (ja) * 2017-01-13 2020-05-28 インターデジタル ヴイシー ホールディングス, インコーポレイテッド 没入型ビデオフォーマットのための方法、装置、及びストリーム
WO2020071114A1 (fr) * 2018-10-02 2020-04-09 ソニー株式会社 Dispositif et procédé de traitement d'image
JP2022540569A (ja) * 2019-06-30 2022-09-16 オッポ広東移動通信有限公司 変換方法、逆変換方法、エンコーダ、デコーダ及び記憶媒体
WO2021065536A1 (fr) * 2019-10-01 2021-04-08 ソニー株式会社 Dispositif et procédé de traitement d'informations
JP2023522702A (ja) * 2020-06-23 2023-05-31 ソニーグループ株式会社 スライス毎のtrisoupノードサイズ

Similar Documents

Publication Publication Date Title
Huang et al. 3d point cloud geometry compression on deep learning
CN110996098B (zh) 处理点云数据的方法和装置
US9165403B2 (en) Planetary scale object rendering
US8805097B2 (en) Apparatus and method for coding a three dimensional mesh
Daribo et al. Efficient rate-distortion compression of dynamic point cloud for grid-pattern-based 3D scanning systems
JP5735114B2 (ja) エンコード方法、エンコード装置、デコード方法及びデコード装置
JP7430792B2 (ja) 属性情報の予測方法、エンコーダ、デコーダ及び記憶媒体
US20250148652A1 (en) Method, apparatus, and medium for point cloud coding
KR20240132487A (ko) 포인트 클라우드 코딩을 위한 방법, 장치 및 매체
KR20250016274A (ko) 부호화 방법, 복호화 방법, 장치 및 기기
TW202406344A (zh) 一種點雲幾何資料增強、編解碼方法、裝置、碼流、編解碼器、系統和儲存媒介
US12288367B2 (en) Point cloud geometry compression
WO2022131948A1 (fr) Dispositifs et procédés de codage séquentiel pour compression de nuage de points
Lee et al. Progressive 3D mesh compression using MOG-based Bayesian entropy coding and gradual prediction
US20250232483A1 (en) Method, apparatus, and medium for point cloud coding
WO2025009366A1 (fr) Dispositif et procédé de traitement d'informations
Yim et al. Mamba-pcgc: Mamba-based point cloud geometry compression
KR20240091150A (ko) 포인트 클라우드 코딩 방법, 장치 및 매체
CN119366184A (zh) 一种点云帧间补偿方法、编解码方法、装置和系统
EP4542492A1 (fr) Dispositif et procédé de traitement d'informations
JP7689093B2 (ja) 情報圧縮システム及び情報圧縮方法
WO2025077881A1 (fr) Procédé, appareil et support de codage de nuage de points
WO2025081769A1 (fr) Procédés de codage et de décodage, flux binaire, codeur, décodeur et support de stockage
EP4233006B1 (fr) Appareils et méthodes pour la quantification spatiale faisant partie d'une compression d'un nuage à points
WO2025033111A1 (fr) Dispositif et procédé de traitement d'information

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24835881

Country of ref document: EP

Kind code of ref document: A1