
GB2637850A - Applications of layered encoding in split computing - Google Patents

Applications of layered encoding in split computing

Info

Publication number
GB2637850A
Authority
GB
United Kingdom
Prior art keywords
frame
data
layer
enhancement
depth map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2501402.8A
Other versions
GB202501402D0 (en)
Inventor
Meardi Guido
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
V Nova International Ltd
Original Assignee
V Nova International Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB2209740.6A external-priority patent/GB202209740D0/en
Priority claimed from GBGB2209821.4A external-priority patent/GB202209821D0/en
Priority claimed from GBGB2209897.4A external-priority patent/GB202209897D0/en
Priority claimed from GBGB2210438.4A external-priority patent/GB202210438D0/en
Application filed by V Nova International Ltd filed Critical V Nova International Ltd
Publication of GB202501402D0 publication Critical patent/GB202501402D0/en
Publication of GB2637850A publication Critical patent/GB2637850A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/001 Model-based coding, e.g. wire frame
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Graphics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Generation (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A networked system for generating a sequence of frames for rendering a dynamic 3D scene, the system comprising a first rendering node and a second rendering node, wherein: the first rendering node is configured to: generate a sequence of first partially rendered data; perform layered encoding on each first partially rendered data to generate a sequence of encoded first partially rendered data; and transmit the sequence of encoded first partially rendered data to the second rendering node; and the second rendering node is configured to: obtain the sequence of encoded first partially rendered data from the first rendering node; and generate a sequence of second partially or fully rendered frames based on the sequence of encoded first partially rendered data.
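As an illustration of the arrangement above, the following Python sketch models the two-node pipeline: the first node layer-encodes each partially rendered frame into a base layer plus an enhancement residual, and the second node decodes the layers to continue rendering. The toy downsample/residual scheme and all names are illustrative assumptions, not an API defined by this publication.

```python
# Minimal sketch of the split-rendering pipeline in the abstract (assumed
# toy scheme; frames are 1-D lists standing in for partially rendered data).
from dataclasses import dataclass

@dataclass
class EncodedFrame:
    base: list          # coarse base layer (2x downsampled)
    enhancement: list   # residuals that refine the base layer

def layered_encode(frame):
    base = frame[::2]                                    # base: every 2nd sample
    up = [v for v in base for _ in (0, 1)][:len(frame)]  # nearest-neighbour upsample
    return EncodedFrame(base, [f - u for f, u in zip(frame, up)])

def layered_decode(enc):
    up = [v for v in enc.base for _ in (0, 1)][:len(enc.enhancement)]
    return [u + e for u, e in zip(up, enc.enhancement)]

def first_node(partially_rendered_frames):
    # "perform layered encoding on each first partially rendered data"
    return [layered_encode(f) for f in partially_rendered_frames]

def second_node(encoded_stream):
    # decode, then (in a real system) continue or finish rendering
    return [layered_decode(enc) for enc in encoded_stream]

frames = [[10, 12, 14, 16], [11, 13, 15, 17]]            # toy frame data
assert second_node(first_node(frames)) == frames          # lossless round trip
```

In a real deployment the enhancement data could be truncated in transit, leaving the second node a coarser but still decodable frame; that graceful degradation is what the layered structure is for (compare claims 19, 20 and 27 below).
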

Claims (52)

1. A networked system for generating a sequence of frames for rendering a dynamic 3D scene, the system comprising a first rendering node and a second rendering node, wherein: the first rendering node is configured to: generate a sequence of first partially rendered frames; perform layered encoding on each first partially rendered frame to generate a sequence of encoded first partially rendered frames; and transmit the sequence of encoded first partially rendered frames to the second rendering node; and the second rendering node is configured to: obtain the sequence of encoded first partially rendered frames from the first rendering node; and generate a sequence of second partially or fully rendered frames based on the sequence of encoded first partially rendered frames.
2. A networked system according to claim 1, wherein the second rendering node is configured to decode the sequence of encoded first partially rendered frames to obtain the sequence of first partially rendered frames.
3. A networked system according to claim 1 or claim 2, wherein the second rendering node is configured to perform layered encoding on each second partially or fully rendered frame to generate a sequence of encoded second partially or fully rendered frames.
4. A networked system according to claim 3, wherein the first rendering node is configured to perform layered encoding according to a first coding scheme, and the second rendering node is configured to perform layered encoding according to a second coding scheme, the first coding scheme being different from the second coding scheme.
5. A networked system according to any of claims 1 to 4, wherein a frame rate for the sequence of second partially or fully rendered frames is greater than a frame rate for the sequence of first partially rendered frames.
6. A networked system according to any of claims 1 to 5, comprising a display device, wherein a communication delay between the first rendering node and the display device is larger than a communication delay between the second rendering node and the display device.
7. A networked system according to claim 6, wherein: the first rendering node is configured to obtain a viewing position from the display device before generating the sequence of first partially rendered frames; and the second rendering node is configured to obtain an updated viewing position from the display device before generating the sequence of second partially or fully rendered frames.
8. A networked system according to any of claims 1 to 7, wherein generating the sequence of first partially rendered frames requires greater processing resources than generating the sequence of second partially or fully rendered frames.
9. A networked system according to any of claims 1 to 8, wherein each frame comprises image data and depth map data, and the system comprises a third rendering node configured to generate a sequence of third fully rendered frames by performing time warping and/or depth correction on the sequence of second partially or fully rendered frames using the depth map data.
10. A networked system according to any of claims 1 to 9, wherein each frame comprises point cloud data in which each of a plurality of points has a 3D position and one or more attributes.
11. A networked system according to claim 10, wherein the second rendering node or a third rendering node is configured to calculate depth map data based on 3D positions of points.
12. A networked system according to any of claims 1 to 11, wherein: the first rendering node is configured to generate one or more sequences of first partially rendered frames, for a first number of a plurality of users or display devices; and the second rendering node is configured to generate one or more sequences of second partially or fully rendered frames, for a second number of the plurality of users or display devices, wherein the second number is smaller than the first number.
13. The networked system of any of claims 1 to 11, wherein the networked system dynamically chooses how many nodes and which specific nodes will be used for the rendering process responsive to at least one metric comprising: a metric based on the complexity of the rendering task to be performed, a metric based on the spare capacity available in each node, a metric based on the location of the display device with respect to the network of nodes, a metric based on the round-trip latency between nodes and display device, a metric based on the bandwidth available among nodes and from nodes to the display device, and a metric based on the number of distinct display devices requesting rendering of the same 3D scene within a range of points of view.
14. The networked system of any of claims 1 to 13, wherein a partially rendered frame is encoded as volumetric data comprising one or more of: point cloud data, mesh data, texture data.
15. The networked system of any of claims 1 to 14, wherein a partially rendered frame comprises light field data.
16. The networked system of any of claims 1 to 15, wherein a partially rendered frame comprises spatial properties allowing computation of the behaviour of sound in the 3D space.
17. The networked system of any of claims 1 to 16, wherein the partially rendered frame is encoded using, at least in part, lossy encoding methods.
18. The networked system of any of claims 1 to 17, wherein a partially rendered frame is encoded using, at least in part, layered encoding methods.
19. The networked system of any of claims 1 to 18, wherein a subsequent node receives only a portion of the data encoded by the first rendering node, responsive to the specific location of the one or more points of view of the fully rendered frames to be computed.
20. The networked system of any of claims 1 to 19, wherein a subsequent node only decodes the subset of encoded data, produced by the first rendering node and received by the subsequent node, that is necessary to fully render the specific field of view it is rendering at any one point.
21. The networked system of any of claims 1 to 20, wherein a partially rendered frame is encoded by using at least in part a point cloud format representing points according to one or more coordinate systems, each point being attributed one or more data attributes specifying visual properties comprising one or more of size, normal vector, motion information, colour information, transparency.
22. A method for encoding a sequence of frames representing a dynamic 3D scene, wherein each frame is composable from base-layer image data and enhancement data, the method comprising: performing layered encoding on a frame of the sequence of frames to generate an encoded frame comprising one or more of a base image layer and an enhancement image layer.
23. The method of claim 22, wherein the enhancement image layer comprises data used by the decoder device to reconstruct a higher resolution rendition of the sequence of frames.
24. The method of claim 22 or claim 23, wherein the enhancement image layer comprises data used by the decoder device to reconstruct a higher bit-depth rendition of the sequence of frames with respect to the bit-depth of the base image layer.
25. The method of any one of claims 22 to 24, wherein the enhancement image layer comprises data used by the decoder device to reconstruct the distance of objects in the image from the viewer.
26. The method of any one of claims 22 to 25, wherein the enhancement image layer comprises data used by the decoder device to reconstruct haptic feedback for the user.
27. The method of any one of claims 22 to 26, the method further comprising: responsive to a drop in transmission channel bandwidth, discarding enhancement data during transmission of the sequence of frames and indicating to the encoder that the enhancement data has been dropped; and, if the enhancement data is being dropped, refreshing temporal buffers of enhancement data encoding and performing an Instantaneous Decoder Refresh (IDR) for the enhancement data (not necessarily for the base layer data), to account for the decoder having missed some of the previous enhancement data.
28. The method of any one of claims 22 to 27, wherein the layered encoding method used is MPEG-5 LCEVC (Low Complexity Enhancement Video Coding) encoding or SMPTE VC-6 encoding.
29. The method of claim 28, wherein at least some of the enhancement data is transmitted as embedded user data within the coefficients of LCEVC data.
30. The method of claim 29, wherein one or more residual coefficients of an image comprise embedded depth information representing the depth of the corresponding object with respect to the viewpoint, and the decoder processes the embedded data to reconstruct a depth map associated with the image frame based at least in part on the embedded data.
31. The method of claim 30, wherein the depth map is reconstructed by processing both the embedded data and the image data.
32. The method of any one of claims 22 to 31 , wherein the frame plus depth data sent at a given frame rate are used by a display device to increase the frame rate via depth-based reprojections so as to match a display frame rate.
33. The method of any one of claims 22 to 32, wherein each frame comprises image data and depth map data and the method comprises: performing layered encoding on a frame of the sequence of frames to generate an encoded frame comprising one or more of a base depth map layer and an enhancement depth map layer.
34. A method according to claim 33, wherein each frame comprises an image layer and a depth map layer, and the encoded frame comprises one or more of a base image layer, the base depth map layer, an enhancement image layer and the enhancement depth map layer.
35. A method according to claim 33, wherein each frame comprises the depth map data embedded in the image data, the base depth map layer is a base image layer with embedded depth map data, and the enhancement depth map layer is an enhancement image layer with embedded depth map data.
36. A method according to any of claims 33 to 35, further comprising: receiving a depth map drop indication indicating whether or not the depth map data is being dropped when transmitting the sequence of frames; and if the depth map data is being dropped, discarding the depth map data of a frame, and performing layered encoding on the image data of the frame to generate an encoded frame comprising a base image layer and an enhancement image layer.
37. A method comprising: receiving a frame plus depth data at a given frame rate, encoded by means of a layered encoding method; at a display device, increasing the frame rate via depth-based reprojections using the depth data so as to match a display frame rate.
38. The method of claim 37, wherein the frame comprises data representing a dynamic 3D scene, and each frame comprises base-layer image data and enhancement data, the method comprising: performing layered decoding on a frame of the sequence of frames to generate a decoded frame from an encoded frame comprising a base image layer and an enhancement image layer.
39. A bit sequence representing an encoding of a sequence of frames representing a dynamic 3D scene, the bit sequence comprising one or more of: encoded data for a base depth map layer for a frame; and encoded data for an enhancement depth map layer for the frame.
40. A bit sequence according to claim 39, further comprising one or more of: encoded data for a base image layer for the frame; and encoded data for an enhancement image layer for the frame.
41. A bit sequence according to claim 39, wherein the base depth map layer is a base image layer with embedded depth map data, and the enhancement depth map layer is an enhancement image layer with embedded depth map data.
42. A method for transmitting a sequence of frames representing a dynamic 3D scene, wherein each frame comprises image data and depth map data, the method comprising: obtaining an encoded frame comprising one or more of a base depth map layer and an enhancement depth map layer; determining if the depth map data is to be dropped when transmitting the sequence of frames; if the depth map data is to be dropped: discarding the depth map data from the encoded frame, to give a reduced encoded frame comprising one or more of a base image layer and an enhancement image layer; and transmitting the reduced encoded frame; and otherwise, transmitting the encoded frame.
43. A method according to claim 42, wherein: each frame comprises an image layer and a depth map layer, the encoded frame comprises one or more of a base image layer, the base depth map layer, an enhancement image layer and the enhancement depth map layer, and discarding the depth map data comprises discarding the enhancement depth map layer, and preferably further comprises discarding the base depth map layer.
44. A method according to claim 42, wherein: each frame comprises the depth map data embedded in the image data, the base depth map layer is a base image layer with embedded depth map data, and the enhancement depth map layer is an enhancement image layer with embedded depth map data, and discarding the depth map data comprises removing the embedded depth map data from the enhancement image layer, and preferably further comprises removing the embedded depth map data from the base image layer.
45. A method according to any of claims 42 to 44, further comprising: generating a depth map drop indication indicating whether or not the depth map data is being dropped when transmitting the sequence of frames; and sending the depth map drop indication to an upstream encoder, or sending the depth map drop indication with the encoded frame or the reduced encoded frame.
46. A method performed by an encoder for encoding a sequence of frames, the method comprising: performing layered encoding on a first frame of the sequence of frames to generate a first encoded frame comprising a base layer and an enhancement layer; storing a component of the enhancement layer of the first encoded frame in a temporal buffer, for use in temporal encoding of a subsequent frame; sending the first encoded frame to a transmitter for transmission; receiving an enhancement drop indication indicating whether or not the enhancement layer was dropped when transmitting the first encoded frame; performing layered encoding on a second frame of the sequence of frames to generate a second encoded frame comprising a base layer and an enhancement layer, wherein: if the enhancement drop indication indicates that the enhancement layer was not dropped, the enhancement layer of the second encoded frame is generated with reference to the temporal buffer, and if the enhancement drop indication indicates that the enhancement layer was dropped, the enhancement layer of the second encoded frame is generated without reference to the temporal buffer.
47. A method according to claim 46, further comprising, if the enhancement drop indication indicates that the enhancement layer was dropped, clearing the temporal buffer.
48. A method according to claim 46 or claim 47, further comprising: if no enhancement drop indication is received within a predetermined time limit, performing layered encoding on a second frame of the sequence of frames to generate a second encoded frame comprising a base layer and an enhancement layer, wherein the enhancement layer of the second encoded frame is generated with reference to the temporal buffer.
49. A method according to claim 46 or claim 47, further comprising: if no enhancement drop indication is received within a predetermined time limit, performing layered encoding on a second frame of the sequence of frames to generate a second encoded frame comprising a base layer and an enhancement layer, wherein the enhancement layer of the second encoded frame is generated without reference to the temporal buffer.
50. A method according to any of claims 46 to 49, wherein generating the enhancement layer of a frame without reference to the temporal buffer comprises: decoding the base layer of the encoded frame; and calculating a residual as a difference between the decoded base layer and the frame.
51. A method according to any of claims 46 to 50, wherein generating the enhancement layer of a frame with reference to the temporal buffer comprises: decoding the base layer of the encoded frame; calculating a residual as a difference between the decoded base layer and the frame; and calculating a difference between the residual of the current frame and a corresponding residual of a previous frame stored in the temporal buffer.
52. A method according to any of claims 46 to 51, wherein the layered encoding is LCEVC encoding.
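To illustrate the dynamic node selection of claim 13, the following Python sketch scores candidate nodes on some of the listed metrics (spare capacity, latency to the display device, bandwidth) and picks the cheapest; the cost formula and its weights are invented for illustration and are not specified by the claims.

```python
# Hypothetical node-selection sketch for claim 13; all field names and the
# scoring formula are assumptions made for this example.
def choose_nodes(nodes, task_complexity, needed=2):
    def cost(n):
        return (task_complexity / max(n["spare_capacity"], 1e-6)  # load vs. headroom
                + n["latency_ms_to_display"]                      # proximity metric
                + 1.0 / max(n["bandwidth_mbps"], 1e-6))           # bandwidth metric
    return sorted(nodes, key=cost)[:needed]

nodes = [
    {"name": "edge-1", "spare_capacity": 0.3, "latency_ms_to_display": 5,
     "bandwidth_mbps": 200},
    {"name": "cloud-1", "spare_capacity": 0.9, "latency_ms_to_display": 40,
     "bandwidth_mbps": 1000},
]
print([n["name"] for n in choose_nodes(nodes, task_complexity=10.0)])
```
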
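Claims 29 to 31 describe carrying depth as user data embedded in residual coefficients. The sketch below uses least-significant-bit embedding purely as a stand-in for that idea; it does not reproduce the actual LCEVC user-data syntax.

```python
# Assumed LSB embedding: one bit of depth information rides in the least
# significant bit of each residual coefficient (illustrative only).
def embed_depth_bits(coeffs, depth_bits):
    return [(c & ~1) | b for c, b in zip(coeffs, depth_bits)]

def extract_depth_bits(coeffs):
    # The decoder recovers the embedded bits to help rebuild a depth map.
    return [c & 1 for c in coeffs]

coeffs = [8, 13, 6, 2]        # toy residual coefficients
depth_bits = [1, 0, 1, 1]     # toy embedded depth payload
stamped = embed_depth_bits(coeffs, depth_bits)
assert extract_depth_bits(stamped) == depth_bits
```
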
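Claims 32 and 37 rely on depth-based reprojection at the display device to raise the frame rate. The following sketch synthesises one in-between frame per received frame from a 1-D image plus depth map; the parallax model and the hole filling are crude assumptions made for illustration.

```python
# Hypothetical depth-based reprojection for claims 32 and 37.
def reproject(pixels, depths, view_shift):
    """Shift each pixel by parallax inversely proportional to its depth."""
    out = [None] * len(pixels)
    for x, (p, d) in enumerate(zip(pixels, depths)):
        nx = x + round(view_shift / d)   # nearer objects (small d) move further
        if 0 <= nx < len(out):
            out[nx] = p                  # last writer wins; no z-ordering here
    last = pixels[0]
    for i, v in enumerate(out):          # crude hole fill for disocclusions
        out[i] = last = v if v is not None else last
    return out

def upsample_frame_rate(frames_with_depth, shifts_between_frames):
    """Insert one reprojected frame between each pair of received frames."""
    out = []
    for (pixels, depths), shift in zip(frames_with_depth, shifts_between_frames):
        out.append(pixels)
        out.append(reproject(pixels, depths, shift))  # synthesised in-between
    return out

frame = ([1, 2, 3, 4], [1.0, 2.0, 4.0, 8.0])  # toy pixels + depth map
print(upsample_frame_rate([frame], [2.0]))
```
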
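For the transmission method of claims 42 and 45, here is a sketch of the drop decision: when bandwidth forces the depth map to be dropped, the depth layers are stripped, the reduced frame is sent, and a depth map drop indication is reported upstream. The layer names are assumptions, not a defined container format.

```python
# Hypothetical transmit-side depth-drop logic for claims 42 and 45.
def transmit(encoded_frame, bandwidth_ok, send, notify_encoder):
    if bandwidth_ok:
        send(encoded_frame)                      # full frame: image + depth layers
        return
    reduced = {k: v for k, v in encoded_frame.items()
               if k in ("base_image", "enhancement_image")}
    send(reduced)                                # reduced frame, depth discarded
    notify_encoder({"depth_dropped": True})      # depth map drop indication

frame = {"base_image": b"...", "enhancement_image": b"...",
         "base_depth": b"...", "enhancement_depth": b"..."}
transmit(frame, bandwidth_ok=False, send=print, notify_encoder=print)
```
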
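Finally, a sketch of the encoder loop of claims 46 to 51: the residual is the difference between the frame and its decoded base layer (claim 50); with a temporal buffer, only the change in residuals is coded (claim 51); and an enhancement drop indication forces coding without the buffer and clears it (claims 46 and 47). The toy base codec here is an assumption standing in for a real one.

```python
# Hypothetical temporal-buffer encoder loop for claims 46 to 51.
def encode_sequence(frames, drop_indications):
    temporal_buffer = None
    encoded = []
    for frame, dropped in zip(frames, drop_indications):
        base = [v // 2 * 2 for v in frame]            # toy lossy base layer
        residual = [f - b for f, b in zip(frame, base)]  # claim 50
        if dropped or temporal_buffer is None:
            temporal_buffer = None                    # clear buffer (claim 47)
            enhancement = residual                    # no temporal reference
        else:                                         # claim 51: temporal delta
            enhancement = [r - p for r, p in zip(residual, temporal_buffer)]
        temporal_buffer = residual                    # store for the next frame
        encoded.append((base, enhancement))
    return encoded

print(encode_sequence([[5, 7], [5, 8]], [False, False]))
```
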
GB2501402.8A 2022-07-01 2023-06-30 Applications of layered encoding in split computing Pending GB2637850A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
GBGB2209740.6A GB202209740D0 (en) 2022-07-01 2022-07-01 Applications of layered encoding in vr and xr
GBGB2209821.4A GB202209821D0 (en) 2022-07-04 2022-07-04 Applications of layered encoding in split computing
GBGB2209897.4A GB202209897D0 (en) 2022-07-05 2022-07-05 Applications of layered encoding in split computing
GBGB2210438.4A GB202210438D0 (en) 2022-07-15 2022-07-15 Applications of layered encoding in split computing
PCT/GB2023/051730 WO2024003577A1 (en) 2022-07-01 2023-06-30 Applications of layered encoding in split computing

Publications (2)

Publication Number Publication Date
GB202501402D0 GB202501402D0 (en) 2025-03-19
GB2637850A (en) 2025-08-06

Family

ID=87245489

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2501402.8A Pending GB2637850A (en) 2022-07-01 2023-06-30 Applications of layered encoding in split computing

Country Status (5)

Country Link
EP (1) EP4548304A1 (en)
JP (1) JP2025524516A (en)
CN (1) CN120035841A (en)
GB (1) GB2637850A (en)
WO (1) WO2024003577A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2637357A (en) 2024-02-16 2025-07-23 V Nova Int Ltd Generating a representation of a scene
CN120980249A (en) * 2024-05-16 2025-11-18 华为技术有限公司 Image coding and decoding method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2553556B (en) 2016-09-08 2022-06-29 V Nova Int Ltd Data processing apparatuses, methods, computer programs and computer-readable media
EP3721634A1 (en) 2017-12-06 2020-10-14 V-Nova International Limited Methods and apparatuses for encoding and decoding a bytestream
GB2619627B (en) 2019-03-20 2024-02-28 V Nova Int Ltd Low complexity enhancement video coding
EP4156108A1 (en) 2021-09-23 2023-03-29 V-Nova International Limited Point cloud data frames compression

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190066365A1 (en) * 2017-08-22 2019-02-28 Qualcomm Incorporated Rendering an image from computer graphics using two rendering computing devices
US10861215B2 (en) * 2018-04-30 2020-12-08 Qualcomm Incorporated Asynchronous time and space warp with determination of region of interest
WO2021161028A1 (en) * 2020-02-11 2021-08-19 V-Nova International Limited Use of tiered hierarchical coding for point cloud compression
WO2021226535A1 (en) * 2020-05-08 2021-11-11 Qualcomm Incorporated Multi-layer reprojection techniques for augmented reality

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ARIANNE HINDS (CABLELABS): "[MPEG-I Architecture] Architecture for 6 DoF Mixed Reality and Light Field Displays", 120. MPEG MEETING; 20171023 - 20171027; MACAU; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), no. m41727, 21 October 2017 (2017-10-21), Retrieved from the Internet: URL:http://p *
CHEN HUANG: "934.1) Partial access of V3C data", 131. MPEG MEETING; 20200629-20200703; ONLINE; (MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11), no. m54396, 24 June 2020 (2020-06-24), Retrieved from the Internet: URL:http://phenix.int-evry.fr/mpeg/doc_end_user/documents/131_OnLine/wg11/m54396-v1 *
MICHAEL REPPLINGER ET AL: "DRONE: A Flexible Framework for Distributed Rendering and Display", 30 November 2009 (2009-11-30), ADVANCES IN VISUAL COMPUTING, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 975-986, the whole document *
QUALCOMM INCORPORATED (RAPPORTEUR): "FS_XR5G: Permanent document, v0.7.0", 3GPP DRAFT; S4-191290, 3RD GENERATION PARTNERSHIP PROJECT (3GPP), MOBILE COMPETENCE CENTRE; 650, ROUTE DES LUCIOLES; F-06921, SOPHIA-ANTIPOLIS CEDEX; FRANCE, vol. SA WG4, no. Busan, Korea; 20191021-20191025, 25 October 2019 *

Also Published As

Publication number Publication date
CN120035841A (en) 2025-05-23
WO2024003577A1 (en) 2024-01-04
JP2025524516A (en) 2025-07-30
GB202501402D0 (en) 2025-03-19
EP4548304A1 (en) 2025-05-07
