WO2024117951A1 - Holographic communication system - Google Patents

Holographic communication system

Info

Publication number
WO2024117951A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
depth map
image
sending device
criteria
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE2022/051128
Other languages
French (fr)
Inventor
Charles KINUTHIA
Volodya Grancharov
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Priority to EP22967360.3A (EP4627528A1)
Priority to PCT/SE2022/051128 (WO2024117951A1)
Publication of WO2024117951A1
Legal status: Ceased (current)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/597Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes

Abstract

A method for use with a holographic communication system. The method includes obtaining a measured depth map associated with at least a first image and obtaining an estimated depth map associated with at least the first image. The method also includes obtaining a similarity measure indicating a similarity between the measured depth map and the estimated depth map. The method also includes determining, based at least in part on the similarity measure, that a criteria is met, wherein determining that the criteria is met comprises determining whether the similarity measure satisfies a condition. The method further includes refraining from providing the measured depth map to a receiving device as a result of determining that the criteria is met.

Description

HOLOGRAPHIC COMMUNICATION SYSTEM
TECHNICAL FIELD
[001] Disclosed are embodiments related to a holographic communication system.
BACKGROUND
[002] In recent years extended reality (XR) applications (e.g., virtual reality (VR) applications, augmented reality (AR) applications, mixed reality (MR) applications) have become increasingly popular. One example of an XR application is an application that employs holographic communication, which refers to the transmission of data that enables a device receiving the data to produce a three-dimensional (3D) image.
[003] Typically, in a holographic communication system, a sending device obtains image data for an image (e.g., a frame of a video) and, for each image, corresponding depth data (a.k.a., a “depth map”) associated with the image (e.g., for each pixel of the image, there is a depth value that indicates the distance from the sensor to the respective point on the object corresponding to the pixel). The image data can be captured by, for example, a smartphone’s camera and the depth map can be captured by a smartphone’s light detection and ranging (LiDAR) sensor and/or other sensors. The image data and the corresponding depth map are encoded and the encoded data is added to a bitstream that is then transmitted over a network to a receiving device. A depth map can be captured by means of active scanning (e.g., LiDAR in iPhone 14 Pro (see, e.g., apple(dot)com/iphone-14-pro/specs/)), or passive scanning (e.g., stereo camera setup as in Intel® RealSense™ Depth Camera D435 (see, e.g., www(dot)intelrealsense(dot)com/depth-camera-d435/)). A depth map contains distance information (“depth values”) indicating distances from a sensor position to points on the surface of an object in the physical scene. For example, a depth map may be a matrix of distance values where each distance value indicates a distance from a sensor to a point on a surface.
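For illustration, the sketch below shows one way such an image/depth-map pair can be represented in memory. Python and numpy, the 640x480 resolution, and the use of metres are illustrative assumptions, not details from the disclosure.

```python
import numpy as np

# Illustrative only: a depth map as a matrix aligned pixel-for-pixel with the
# image. The 640x480 resolution and metre units are assumptions.
H, W = 480, 640
image = np.zeros((H, W, 3), dtype=np.uint8)     # image data (e.g., RGB)
depth_map = np.zeros((H, W), dtype=np.float32)  # one distance value per pixel
depth_map[240, 320] = 1.75  # e.g., the surface point at the image centre is 1.75 m away
```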
[004] At the receiving device, the encoded data is decoded to recover the image data and depth map. This data is then fed to a rendering and visualization module. XR glasses in communication with the receiving device can be used to display the images in 3D as holograms. This application allows the rendering of a 3D reconstruction of a person using the sending device (e.g., the person is rendered as a hologram using XR glasses for a more immersive experience).
[005] U.S. patent publication no. 20190025587 A1 describes a holographic projection technology for use in the presentation of a 3D imaging effect to a user, such as computer-generated holography. U.S. patent publication no. US 2022057750 A1 describes generating a computer-generated hologram (CGH) of an object using a “depth map method.”
SUMMARY
[006] Certain challenges presently exist. For instance, for a holographic communication system to work well (i.e., to produce at the receiving device 3D images of sufficient quality) the network between the sending device and the receiving device must provide sufficient bandwidth, but this is not always feasible because the load on the network can be variable. That is, with existing compression and networking technology, the network occasionally does not provide the required bandwidth to transmit the required image and depth map, and this can lead to a drop in quality in the generated 3D images. The resulting visual perceptual artifacts can affect the overall perception of the holographic communication.
[007] Accordingly, in one aspect there is provided a method that includes obtaining a measured depth map associated with at least a first image and obtaining an estimated depth map associated with at least the first image. The method also includes obtaining a similarity measure indicating a similarity between the measured depth map and the estimated depth map. The method also includes determining, based at least in part on the similarity measure, that a criteria is met, wherein determining that the criteria is met comprises determining whether the similarity measure satisfies a condition. The method further includes refraining from providing the measured depth map to the receiving device as a result of determining that the criteria is met.
[008] In another aspect there is provided a computer program comprising instructions which when executed by processing circuitry of a sending device causes the sending device to perform any of the methods disclosed herein. In one embodiment, there is provided a carrier containing the computer program wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium. In another aspect there is provided a sending device that is configured to perform the methods disclosed herein. The sending device may include memory and processing circuitry coupled to the memory.
[009] An advantage of the embodiments disclosed herein is that they reduce the bandwidth need of a holographic communication system because a measured depth map does not always need to be provided to the receiving device. Rather, a depth map associated with one or more images is transmitted only when a certain criteria is met (e.g., the estimated depth map is not accurate enough). This not only allows holographic communication systems to operate under bandwidth limitations, but also reduces the load on the network as well as reducing the load on the battery of the sending device, thereby extending the life of the battery.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] The accompanying drawings, which are incorporated herein and form part of the specification, illustrate various embodiments.
[0011] FIG. 1 illustrates a communication system according to some embodiments.
[0012] FIG. 2A is a functional block diagram of a sending device according to some embodiments.
[0013] FIG. 2B is a functional block diagram of a receiving device according to some embodiments.
[0014] FIG. 3 is a flowchart illustrating a process according to some embodiments.
[0015] FIG. 4 is a block diagram of a sending device according to some embodiments.
DETAILED DESCRIPTION
[0016] FIG. 1 illustrates a communication system 100 according to an embodiment. System 100 includes a sending device 102 (e.g., smartphone, laptop, computer, tablet, drone, etc.) communicating via a network 110 (e.g., the Internet) with a receiving device 104. In the particular use case illustrated, communication system 100 is a holographic communication system in which sending device 102 includes an image sensor (IS) 111 (e.g., camera) for producing image data (e.g., the image data may include a matrix of luma values and/or a matrix of chroma values) corresponding to an image and a depth sensor (DS) 112 (e.g., LiDAR sensor) for producing a depth map (a.k.a., “depth data”) for each image (e.g., video frame).
[0017] In one embodiment, sending device 102 has a bitstream generator (BG) 190 for generating a bitstream 199 which can be stored for later use by receiving device 104 and/or transmitted to receiving device 104 via network 110, where the bitstream contains both an image data bitstream 291 (see FIG. 2A) containing image data (e.g., the image data as captured by image sensor 111 or an encoded version thereof) and a depth data bitstream 292 (see FIG. 2A) containing depth data (e.g., the depth data as captured by depth sensor 112 or an encoded version thereof), and receiving device 104 includes a hologram generator (HG) 195 configured to use the image data and depth data to display a hologram to the user of the receiving device. In some embodiments, rather than outputting a single bitstream 199, sending device 102 outputs two separate bitstreams: the image bitstream containing the image data and the depth bitstream containing the depth data.
[0018] This disclosure describes improvements for use in the case of a network limitation that prevents the required high-quality image and depth data from being transmitted to the receiving device. The improvement is to selectively provide the depth data. For example, a depth map associated with an image may or may not be provided to the receiving device (i.e., included in the bitstream) depending on whether a criteria is met, e.g., whether the receiving device is able to produce a high-quality estimated depth map (i.e., an estimated depth map that is similar enough to the corresponding measured depth map).
[0019] In one embodiment, at the sending device, image data is encoded and locally decoded by a video codec (e.g., Versatile Video Coding (VVC) codec) to produce decoded image data. The decoded image data is used as input to a depth map estimator (e.g., a neural network or other machine learning (ML) model) to produce a depth map estimate. A value (e.g., an error term) is calculated indicating how close the depth map estimate is to the measured depth map (i.e., the depth map captured by the depth sensor or a depth map derived from the depth map captured by the depth sensor). In one embodiment, if the value is an error value and the error is below a certain threshold, then the measured depth map associated with the image is not included in the bitstream. Conversely, if the value is an error value and the error is above the threshold, the measured depth map is included in the bitstream (e.g., encoded and transmitted to the receiver). On the receiving device, if a depth map associated with an image was not received from the sender, then an identical depth map estimator to that at the sending device uses the decoded image data for the image to produce a depth map estimate which is then used with the image data to produce the 3D image, otherwise the depth map received from the sending device is used to produce the 3D image.
[0020] FIG. 2A illustrates an embodiment of sending device 102. In the embodiment shown, in addition to including image sensor 111 and depth sensor 112, sending device 102 includes: i) an image encoder 202 (a.k.a., video encoder (VE)) for producing, for each of one or more images, encoded image data (Ienc) for the image from the raw image data (Iraw) for the image provided by image sensor 111, which encoded image data is included in bitstream 291 sent to receiving device 104 and ii) a depth data transmission controller (or controller for short) 206 for controlling whether or not, for each measured depth map, sending device 102 will transmit the depth map to receiving device 104 (e.g., for determining whether or not to include the depth map in the bitstream 292).
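A minimal sketch of this sender-side flow (mode 1) follows. The callables codec, est and depth_enc are hypothetical stand-ins for the video codec (e.g., VVC), the depth map estimator (EST 224) and the depth encoder (DE 204); these names are assumptions, not APIs from the disclosure.

```python
import numpy as np

# A minimal sketch of the sender-side flow described in [0019] (mode 1).
# `codec`, `est` and `depth_enc` are hypothetical placeholders.
def sender_step(I_raw, D_m, T, codec, est, depth_enc, p=2.0):
    I_enc = codec.encode(I_raw)    # image data: always transmitted (bitstream 291)
    I_dec = codec.decode(I_enc)    # local decode mirrors the receiver's decoder
    D_star = est(I_dec)            # D*, the estimated depth map
    e = float(np.linalg.norm((D_m - D_star).ravel(), ord=p))  # error per [0023]
    D_enc = depth_enc(D_m) if e > T else None  # include Dm only when D* is poor
    return I_enc, D_enc            # D_enc is None => criteria met, depth withheld
```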
[0021] In one mode of operation (“mode 1”), controller 206 receives encoded image data (Ienc) corresponding to an image (i.e., Ienc was produced by VE 202) and employs a video decoder (VD) 222 to decode the encoded image data produced by VE 202, thereby producing decoded image data (Idec) corresponding to the image. Idec is then used as input to a depth map estimator (EST) 224 (e.g., a neural network) to produce an estimated depth map (D*) associated with the image. In another mode of operation (“mode 2”), controller 206 receives Iraw, and Iraw is used as the input to EST 224 to produce the estimated depth map (D*) associated with the image.
[0022] In either mode of operation, a decision function (DF) 226 is configured to decide, for each measured depth map, whether to include the measured depth map (or an encoded version thereof) in bitstream 292. The decision is based on a measure of the similarity between the measured depth map for the image and the estimated depth map for the image (e.g., an error value or similarity value).
[0023] For example, in one embodiment, for each measured depth map produced by DS 112, DF 226 computes an error value (e) indicating an error between the measured depth map and the corresponding estimated depth map estimated by EST 224; the error value (e) is then compared to a threshold (T). The error value can be computed as the p-norm (e.g., e = ||Dm - D*||p), where Dm is the measured depth map (e.g., as measured by the LiDAR) and p > 1. If the error e is above T, a decision is made to add Dm (or an encoded version thereof) to the bitstream 292, otherwise the depth map is not added to the bitstream 292, and the receiving device uses its own estimated depth map to produce the 3D image.
[0024] This means that a measured depth map associated with an image is provided to the receiving device only when the corresponding estimated depth map (i.e., the estimated depth map produced based on the image data for the image) is not good enough.
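For illustration, this decision rule can be sketched as follows, assuming Dm and D* are equally shaped numpy arrays; with p = 2 the error is the Euclidean norm of the per-pixel depth differences. The function names are illustrative.

```python
import numpy as np

# A sketch of DF 226's rule from [0023].
def depth_error(D_m, D_star, p=2.0):
    return float(np.linalg.norm((D_m - D_star).ravel(), ord=p))

def should_transmit_depth(D_m, D_star, T, p=2.0):
    # True => add Dm (or an encoded version) to bitstream 292
    return depth_error(D_m, D_star, p) > T
```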
[0025] In some embodiments, if a measured depth map is to be transmitted, a depth map encoder (DE) 204 is used for encoding the measured depth map to produce an encoded depth map (Denc) based on the measured depth map (Dm). The DE can for example encode the depth map as PNG (see reference [3]). As a result, the image data is continuously transmitted while the depth information is transmitted on an as-needed basis (i.e., when D* is not good enough).
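A hedged sketch of such a depth encoder follows. PNG is lossless, so a metric depth map can be quantized to 16-bit integers and stored exactly; the millimetre quantization step, the 65.535 m ceiling, and the use of OpenCV are assumptions, not details from the disclosure.

```python
import numpy as np
import cv2  # OpenCV; one of several libraries that can write 16-bit PNG

# A hedged sketch of DE 204: quantize metres to millimetres, then write a
# lossless 16-bit grayscale PNG.
def encode_depth_png(D_m):
    depth_mm = np.clip(D_m * 1000.0, 0, 65535).astype(np.uint16)  # metres -> mm
    ok, buf = cv2.imencode(".png", depth_mm)
    assert ok, "PNG encoding failed"
    return buf.tobytes()
```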
[0026] FIG. 2B further illustrates an embodiment of receiving device 104. At the receiving device, the encoded image data (Ienc) in the bitstream 291 is decoded by VD 251 to produce decoded image data (Idec). An EST 254, which is identical to EST 224 in sending device 102, uses decoded image data for an image to produce a depth map estimate D* associated with the image. If an encoded depth map for the image was not included in bitstream 292, which means that DF 226 determined that D* is good enough, D* and Idec are used by HG 195 to produce the 3D image. If, on the other hand, an encoded depth map for the image is included in bitstream 292 (i.e., D* is not good enough) it is first decoded by a depth decoder (DD) 252 to produce a decoded depth map (Ddec) which, together with Idec, is used by HG 195 to produce the 3D image.
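The receiving-side logic of FIG. 2B can be sketched as below; codec, est, depth_dec and render are hypothetical stand-ins for VD 251, EST 254, DD 252 and HG 195.

```python
# A sketch of the receiver. EST 254 must be identical to the sender's EST 224
# so both sides derive the same D* from the same decoded image.
def receiver_step(I_enc, D_enc, codec, est, depth_dec, render):
    I_dec = codec.decode(I_enc)
    if D_enc is None:         # no depth map in bitstream 292: D* was good enough
        D = est(I_dec)        # reproduce the sender's estimate D*
    else:                     # measured depth map was transmitted
        D = depth_dec(D_enc)  # decoded depth map Ddec
    return render(I_dec, D)   # produce the 3D image / hologram
```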
[0027] In one embodiment, controller 206 can switch between mode 1 and mode 2 based on a condition being satisfied (e.g., the video codec operating point satisfying a condition).
[0028] The EST (e.g., neural network (NN)) used to produce the depth estimate operates over a certain quality range of input images. If the codec bitrate is low, the difference between the image from the capturing device and the decoded image will increase, and the EST at the encoder and decoder will be presented with increasingly different inputs; the estimated encoder and decoder depth will therefore start to deviate. In this case it is advantageous to use mode 1. If the codec bitrate is high, one can assume the difference between the image from the capturing device and the decoded image is not as large and therefore switch to mode 2. The motivation for this is to have the neural networks on both the sender and receiver use similar inputs and therefore produce very similar depth map estimates. This switching scheme can be realized as: if bn < B, then use mode 1, else use mode 2, where bn is the codec bitrate and B is the bitrate threshold at which EST performance starts to get affected.
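The switching rule reduces to a one-line selection of the estimator input; in the sketch below, B is assumed to be calibrated offline for the chosen estimator.

```python
# The switching rule from [0028]: below the bitrate threshold B, feed the
# estimator the locally decoded image (mode 1) so sender and receiver see the
# same degraded input; otherwise use the raw capture (mode 2).
def estimator_input(I_raw, I_dec, bn, B):
    return I_dec if bn < B else I_raw  # mode 1 below B, mode 2 otherwise
```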
[0029] The EST runs in real-time both on the sender and receiver side. It could be based on FastDepth (see reference [2]). In addition to single-image depth estimation, a mapping [I_{N-k}, ..., I_N] -> D*_N could be used, where k is the number of frames needed by the neural network and N is a time stamp indicating the current frame.
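As an illustration of the multi-frame variant, the sketch below stacks the current frame and its k predecessors along the channel axis before invoking the model; that input layout is an assumption about the model, not something the disclosure specifies.

```python
import numpy as np

# An illustrative sketch of the multi-frame mapping [I_{N-k}, ..., I_N] -> D*_N.
def estimate_depth_multiframe(frames, model):
    # frames: [I_{N-k}, ..., I_N], each an H x W x 3 array, oldest first
    x = np.concatenate(frames, axis=-1)  # H x W x 3*(k+1) input tensor
    return model(x)                      # D*_N, an H x W depth map
```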
[0030] FIG. 3 is a flow chart illustrating a process 300 according to an embodiment. Process 300 may begin in step s302.
[0031] Step s302 comprises obtaining a measured depth map associated with at least a first image.
[0032] Step s304 comprises obtaining an estimated depth map associated with at least the first image.
[0033] Step s306 comprises obtaining a similarity measure indicating a similarity between the measured depth map and the estimated depth map.
[0034] Step s308 comprises determining, based at least in part on the similarity measure, that a criteria is met, wherein determining that the criteria is met comprises determining whether the similarity measure satisfies a condition.
[0035] Step s310 comprises refraining from providing the measured depth map to the receiving device as a result of determining that the criteria is met.
[0036] In some embodiments, the method is performed by a sending device, and the sending device is configured to provide the measured depth map to the receiving device as a result of determining that the criteria is not met.
[0037] In some embodiments, the similarity measure is an error value, and the error value satisfies the condition if the error value is less than a threshold. In some embodiments, the error value is equal to: ||Dm - D*||p, where p > 1, Dm is the measured depth map, and D* is the estimated depth map. In some embodiments, the criteria is met if the error value is less than the threshold.
[0038] In some embodiments, the criteria is met if: i) the error value is less than the threshold and ii) the bandwidth available to provide the measured depth map to the receiving device is less than B, where B is a bandwidth threshold.
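For illustration, the combined criteria of [0038] can be expressed as a small predicate; the function name is hypothetical.

```python
# The combined criteria of [0038]: the measured depth map is withheld only
# when the estimate is accurate enough AND the link is bandwidth-constrained.
def criteria_met(e, T, available_bw, B):
    return e < T and available_bw < B  # True => refrain from providing Dm
```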
[0039] In some embodiments, obtaining the estimated depth map comprises inputting image data into a neural network and obtaining the estimated depth map from the neural network. In some embodiments the process also includes: encoding raw image data for the first image to produce encoded image data for the first image and decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data. In some embodiments, the image data input into the neural network comprises raw image data for the first image.
[0040] In some embodiments the process also includes encoding raw image data for the first image to produce encoded image data for the first image and decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data if a bitrate, bn, is less than a threshold, otherwise the image data input into the neural network comprises the raw image data.
[0041] FIG. 4 is a block diagram of sending device 102, according to some embodiments. As shown in FIG. 4, sending device 102 may comprise: processing circuitry (PC) 402, which may include one or more processors (P) 455 (e.g., one or more general purpose microprocessors and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like), which processors may be co-located in a single housing or in a single data center or may be geographically distributed (i.e., sending device 102 may be a distributed computing apparatus); at least one network interface 448 (e.g., a physical interface or air interface) comprising a transmitter (Tx) 445 and a receiver (Rx) 447 for enabling sending device 102 to transmit data to and receive data from other nodes connected to a network 110 (e.g., an Internet Protocol (IP) network) to which network interface 448 is connected (physically or wirelessly) (e.g., network interface 448 may be coupled to an antenna arrangement comprising one or more antennas for enabling sending device 102 to wirelessly transmit/receive data); and a storage unit (a.k.a., “data storage system”) 408, which may include one or more non-volatile storage devices and/or one or more volatile storage devices. In embodiments where PC 402 includes a programmable processor, a computer readable storage medium (CRSM) 442 may be provided. CRSM 442 may store a computer program (CP) 443 comprising computer readable instructions (CRI) 444. CRSM 442 may be a non-transitory computer readable medium, such as magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like. In some embodiments, the CRI 444 of computer program 443 is configured such that when executed by PC 402, the CRI causes sending device 102 to perform steps described herein (e.g., steps described herein with reference to the flow charts). In other embodiments, sending device 102 may be configured to perform steps described herein without the need for code. That is, for example, PC 402 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.
[0042] Conclusion
[0043] Because the sending device has access to both the true depth (measured depth data) and the estimated depth (the depth data produced by the estimator), the sending device can determine whether or not there is an advantage to sending the measured depth data to a receiving device having a depth estimator with the capability to produce the estimated depth. For example, if the estimated depth is close to the depth measured by the LiDAR, then there is little to be gained by providing the measured depth data to the receiving device and only the image data is transmitted, thereby saving network bandwidth as well as battery power.
[0044] While various embodiments are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of this disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context.
[0045] As used herein transmitting data “to” or “toward” an intended recipient encompasses transmitting the data directly to the intended recipient or transmitting the data indirectly to the intended recipient (i.e., one or more other nodes are used to relay the message from the source node to the intended recipient). Likewise, as used herein receiving data “from” a sender encompasses receiving the data directly from the sender or indirectly from the sender (i.e., one or more nodes are used to relay the data from the sender to the receiving node). Further, as used herein “a” means “at least one” or “one or more.”
[0046] Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.
[0047] References
[0048] [1] Bross, B., et al., "Overview of the Versatile Video Coding (VVC) Standard and its Applications," in IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 10, pp. 3736-3764, Oct. 2021, doi: 10.1109/TCSVT.2021.3101953.
[0049] [2] Wofk, D., et al., "FastDepth: Fast Monocular Depth Estimation on Embedded Systems," 2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 6101-6108, doi: 10.1109/ICRA.2019.8794182.
[0050] [3] W3C, “Portable Network Graphics (PNG) Specification (Second Edition),” available at www(dot)w3(dot)org/TR/2003/REC-PNG-20031110.

Claims

1. A sending device (102), comprising: an image sensor (111); a depth sensor (112); processing circuitry (402); and memory (442) storing instructions executable by the processing circuitry for configuring the sending device to: obtain a measured depth map associated with at least a first image; obtain an estimated depth map associated with at least the first image; obtain a similarity measure indicating a similarity between the measured depth map and the estimated depth map; determine, based at least in part on the similarity measure, whether a criteria is met, wherein determining whether the criteria is met comprises determining whether the similarity measure satisfies a condition; and refrain from providing the measured depth map to a receiving device (104) as a result of determining that the criteria is met.
2. The sending device of claim 1, wherein the sending device is further configured to provide the measured depth map to the receiving device as a result of determining that the criteria is not met, and the criteria is not met when the similarity measure does not satisfy the condition.
3. The sending device of claim 1 or 2, wherein the similarity measure is an error value, and the sending device determines that the error value satisfies the condition if the error value is less than a threshold, T.
4. The sending device of claim 3, wherein the error value is equal to:
||Dm - D*||p, where p > 1, Dm is the measured depth map, and D* is the estimated depth map.
5. The sending device of claim 3 or 4, wherein the sending device determines that the criteria is met as a result of determining that the error value is less than T.
6. The sending device of claim 3 or 4, wherein the sending device determines that the criteria is met as a result of determining that: i) the error value is less than T and ii) the bandwidth available to provide the measured depth map to the receiving device is less than B, where B is a bandwidth threshold.
7. The sending device of any one of claims 1-6, wherein the sending device is configured to obtain the estimated depth map by performing a process that includes: inputting image data into a neural network; and obtaining the estimated depth map from the neural network.
8. The sending device of claim 7, wherein the sending device is further configured to: i) encode raw image data for the first image to produce encoded image data for the first image; and ii) decode the encoded image data to produce decoded image data for the first image, and the image data input into the neural network comprises the decoded image data.
9. The sending device of claim 7, wherein the image data input into the neural network comprises raw image data for the first image.
10. The sending device of claim 7, further comprising: an image encoder for encoding raw image data for the first image to produce encoded image data for the first image; an image decoder for decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data if a bitrate, bn, is less than a threshold, otherwise the image data input into the neural network comprises the raw image data.
11. A method (300), comprising: obtaining (s302) a measured depth map associated with at least a first image; obtaining (s304) an estimated depth map associated with at least the first image; obtaining (s306) a similarity measure indicating a similarity between the measured depth map and the estimated depth map; determining (s308), based at least in part on the similarity measure, that a criteria is met, wherein determining that the criteria is met comprises determining whether the similarity measure satisfies a condition; and refraining (s310) from providing the measured depth map to a receiving device (104) as a result of determining that the criteria is met.
12. The method of claim 11, wherein the method is performed by a sending device (102), and the sending device is configured to provide the measured depth map to the receiving device as a result of determining that the criteria is not met.
13. The method of claim 11 or 12, wherein the similarity measure is an error value, and the error value satisfies the condition if the error value is less than a threshold.
14. The method of claim 13, wherein the error value is equal to:
||Dm - D*||p, where p > 1, Dm is the measured depth map, and D* is the estimated depth map.
15. The method of claim 13 or 14, wherein the criteria is met if the error value is less than the threshold.
16. The method of claim 13 or 14, wherein the criteria is met if: i) the error value is less than the threshold and ii) the bandwidth available to provide the measured depth map to the receiving device is less than B, where B is a bandwidth threshold.
17. The method of any one of claims 11-16, wherein obtaining the estimated depth map comprises: inputting image data into a neural network; and obtaining the estimated depth map from the neural network.
18. The method of claim 17, further comprising: encoding raw image data for the first image to produce encoded image data for the first image; and decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data.
19. The method of claim 17, wherein the image data input into the neural network comprises raw image data for the first image.
20. The method of claim 17, further comprising: encoding raw image data for the first image to produce encoded image data for the first image; and decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data if a bitrate, bn, is less than a threshold, otherwise the image data input into the neural network comprises the raw image data.
21. A computer program (443) comprising instructions (444) which when executed by processing circuitry (402) of a sending device causes the sending device to perform the method of any one of claims 11-20.
22. A carrier containing the computer program of claim 21, wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium (442).
PCT/SE2022/051128 2022-12-01 2022-12-01 Holographic communication system Ceased WO2024117951A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22967360.3A EP4627528A1 (en) 2022-12-01 2022-12-01 Holographic communication system
PCT/SE2022/051128 WO2024117951A1 (en) 2022-12-01 2022-12-01 Holographic communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/SE2022/051128 WO2024117951A1 (en) 2022-12-01 2022-12-01 Holographic communication system

Publications (1)

Publication Number Publication Date
WO2024117951A1 (en)

Family

ID=91324521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2022/051128 Ceased WO2024117951A1 (en) 2022-12-01 2022-12-01 Holographic communication system

Country Status (2)

Country Link
EP (1) EP4627528A1 (en)
WO (1) WO2024117951A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200279120A1 (en) * 2018-07-27 2020-09-03 Beijing Sensetime Technology Development Co., Ltd. Method, apparatus and system for liveness detection, electronic device, and storage medium
EP3712841A1 (en) * 2019-03-22 2020-09-23 Ricoh Company, Ltd. Image processing method, image processing apparatus, and computer-readable recording medium
US20220156971A1 (en) * 2020-11-13 2022-05-19 Toyota Research Institute, Inc. Systems and methods for training a machine-learning-based monocular depth estimator
US20220215567A1 (en) * 2019-05-10 2022-07-07 Nippon Telegraph And Telephone Corporation Depth estimation device, depth estimation model learning device, depth estimation method, depth estimation model learning method, and depth estimation program

Also Published As

Publication number Publication date
EP4627528A1 (en) 2025-10-08

Similar Documents

Publication Publication Date Title
US12477231B2 (en) Apparatus and methods for image encoding using spatially weighted encoding quality parameters
US10389994B2 (en) Decoder-centric UV codec for free-viewpoint video streaming
CN101651841B (en) Method, system and equipment for realizing stereo video communication
EP3251345B1 (en) System and method for multi-view video in wireless devices
US11889115B2 (en) Method and device for multi-view video decoding and method and device for image processing
US20200380775A1 (en) Transmitting device, transmitting method, and receiving device
US12356105B2 (en) Session description for communication session
WO2023179277A1 (en) Encoding/decoding positions of points of a point cloud encompassed in a cuboid volume
EP4375947A1 (en) Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method
GB2637850A (en) Applications of layered encoding in split computing
WO2024117951A1 (en) Holographic communication system
WO2024186242A1 (en) Holographic communication system
KR20090081190A (en) Handheld terminal
KR102260653B1 (en) The image generation system for providing 3d image
CN113989146B (en) Image processing method and device, electronic device, and storage medium
EP4564819A1 (en) Depth single-layer encoding/decoding
US12462407B2 (en) Depth estimation method in an immersive video context
US20240282013A1 (en) Learning-based point cloud compression via unfolding of 3d point clouds
EP4492787A1 (en) Capture device information
WO2025177023A1 (en) Systems and methods for reducing uplink traffic
JP2014165836A (en) Data communication system, and data communication method
CN119835394A (en) Encoding and decoding method and end cloud cooperative system
WO2025196604A1 (en) Data compression for medical images
WO2024226026A1 (en) Bitrate based exposure factorization for image and video processing
WO2021160955A1 (en) Method and device for processing multi-view video data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22967360

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022967360

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022967360

Country of ref document: EP

Effective date: 20250701

WWP Wipo information: published in national office

Ref document number: 2022967360

Country of ref document: EP