WO2024117951A1 - Holographic communication system
- Publication number
- WO2024117951A1 (PCT/SE2022/051128)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image data
- depth map
- image
- sending device
- criteria
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/597—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding specially adapted for multi-view video sequence encoding
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
Definitions
- extended reality (XR) applications include virtual reality (VR), augmented reality (AR), and mixed reality (MR) applications
- holographic communication refers to the transmission of data that enables a device receiving the data to produce a three-dimensional (3D) image.
- a sending device obtains image data for an image (e.g., a frame of a video) and, for each image, corresponding depth data (a.k.a., a “depth map”) associated with the image (e.g., for each pixel of the image, there is a depth value that indicates the distance from the sensor to the respective point on the object corresponding to the pixel).
- the image data can be captured by, for example, a smartphone’s camera and the depth map can be captured by a smartphone’s light detection and ranging (LiDAR) sensor and/or other sensors.
- a depth map can be captured by means of active scanning (e.g., LiDAR in iPhone 14 Pro (see, e.g., apple(dot)com/iphone-14-pro/specs/)), or passive scanning (e.g., stereo camera setup as in Intel® RealSenseTM Depth Camera D435 (see, e.g., www(dot)intelrealsense(dot)com/depth-camera-d435/)).
- a depth map contains distance information (“depth values”) indicating distances from a sensor position to points on the surface of an object in the physical scene.
- a depth map may be a matrix of distance values where each distance value indicates a distance from a sensor to a point on a surface.
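- for illustration only (the values and resolution below are hypothetical, not taken from the patent), such a depth map is simply a matrix aligned with the image grid:

```python
import numpy as np

# Hypothetical 4x4 depth map: each entry is the distance (in meters) from
# the sensor to the surface point imaged by the corresponding pixel.
depth_map = np.array([
    [1.52, 1.51, 1.49, 1.48],
    [1.53, 0.92, 0.91, 1.47],
    [1.54, 0.93, 0.90, 1.46],
    [1.55, 1.54, 1.45, 1.44],
], dtype=np.float32)

# The matching image data would be an H x W x 3 array of color samples.
image = np.zeros((4, 4, 3), dtype=np.uint8)

assert depth_map.shape == image.shape[:2]  # one depth value per pixel
```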
- the encoded data is decoded to recover the image data and depth map. This data is then fed to a rendering and visualization module.
- XR glasses in communication with the receiving device can be used to display the images in 3D as holograms. This application allows the rendering of a 3D reconstruction of a person using the sending device (e.g., the person is rendered as a hologram using XR glasses for a more immersive experience).
- U.S. patent publication no. 20190025587 A1 describes a holographic projection technology for use in the presentation of a 3D imaging effect to a user, such as computer-generated holography.
- U.S. patent publication no. US 2022057750 A1 describes generating a computer-generated hologram (CGH) of an object using a “depth map method.”
- a method that includes obtaining a measured depth map associated with at least a first image and obtaining an estimated depth map associated with at least the first image.
- the method also includes obtaining a similarity measure indicating a similarity between the measured depth map and the estimated depth map.
- the method also includes determining, based at least in part on the similarity measure, that a criteria is met, wherein determining that the criteria is met comprises determining whether the similarity measure satisfies a condition.
- the method further includes refraining from providing the measured depth map to the receiving device as a result of determining that the criteria is met.
- a computer program comprising instructions which when executed by processing circuitry of a sending device causes the sending device to perform any of the methods disclosed herein.
- a carrier containing the computer program wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium.
- a sending device that is configured to perform the methods disclosed herein.
- the sending device may include memory and processing circuitry coupled to the memory.
- a depth map associated with one or more images is transmitted only when a certain criteria is met (e.g., the estimated depth map is not accurate enough). This not only allows holographic communication systems to operate under bandwidth limitations, but also reduces the load on the network and on the battery of the sending device, thereby extending the life of the battery.
- FIG. 1 illustrates a communication system according to some embodiments.
- FIG. 2A is a functional block diagram of a sending device according to some embodiments.
- FIG. 2B is a functional block diagram of a receiving device according to some embodiments.
- FIG. 3 is a flowchart illustrating a process according to some embodiments.
- FIG. 4 is a block diagram of a sending device according to some embodiments.
- FIG. 1 illustrates a communication system 100 according to an embodiment.
- System 100 includes a sending device 102 (e.g., smartphone, laptop, computer, tablet, drone, etc.) communicating via a network 110 (e.g., the Internet) with a receiving device 104.
- communication system 100 is a holographic communication system in which sending device 102 includes an image sensor (IS) 111 (e.g., camera) for producing image data (e.g., the image data may include a matrix of luma values and/or a matrix of chroma values) corresponding to an image and a depth sensor (DS) 112 (e.g., LiDAR sensor) for producing a depth map (a.k.a., “depth data”) for each image (e.g., video frame).
- sending device 102 has a bitstream generator (BG) 190 for generating a bitstream 199 which can be stored for later use by receiving device 104 and/or transmitted to receiving device 104 via network 110, where the bitstream contains both an image data bitstream 291 (see FIG. 2A) containing image data (e.g., the image data as captured by image sensor 111 or an encoded version thereof) and a depth data bitstream 292 (see FIG. 2A) containing depth data (e.g., the depth data as captured by depth sensor 112 or an encoded version thereof), and receiving device 104 includes a hologram generator (HG) 195 configured to use the image data and depth data to display a hologram to the user of the receiving device.
- sending device 102 outputs two separate bitstreams: the image bitstream containing the image data and the depth bitstream containing the depth data.
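- a minimal sketch of this split output is shown below; the class and field names are hypothetical, as the patent does not define a concrete container format:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class HolographicBitstreams:
    """Hypothetical container for the two outputs of bitstream generator BG 190."""
    image_bitstream: bytes            # image data bitstream 291
    depth_bitstream: Optional[bytes]  # depth data bitstream 292; None when the
                                      # measured depth map is withheld
```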
- This disclosure describes improvements for use in the case of a network limitation that prevents the required high-quality image and depth data from being transmitted to the receiving device.
- the improvement is to selectively provide the depth data.
- a depth map associated with an image may or may not be provided to the receiving device (i.e., included in the bitstream) depending on whether a criteria is met (e.g., whether the receiving device is able to produce a high-quality estimated depth map, i.e., an estimated depth map that is similar enough to the corresponding measured depth map).
- image data is encoded and locally decoded by a video codec (e.g., Versatile Video Coding (VVC) codec) to produce decoded image data.
- the decoded image data is used as input to a depth map estimator (e.g., a neural network or other machine learning (ML) model) to produce a depth map estimate.
- a value (e.g., an error term) indicating the difference between the measured depth map and the estimated depth map is then computed.
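- a sketch of this sender-side loop is shown below; encode, decode, and estimate_depth are assumed placeholders for the video codec (e.g., VVC) and the ML depth estimator, which are not specified beyond their roles:

```python
def sender_side_estimate(raw_image, encode, decode, estimate_depth):
    """Encode the image, locally decode it, and run the depth estimator on
    the decoded image, mirroring the input the receiver will have."""
    i_enc = encode(raw_image)      # encoded image data for bitstream 291
    i_dec = decode(i_enc)          # locally decoded image data (Idec)
    d_est = estimate_depth(i_dec)  # estimated depth map (D*)
    return i_enc, d_est
```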
- FIG. 2A illustrates an embodiment of sending device 102.
- sending device 102 includes: i) an image encoder 202 (a.k.a., video encoder (VE)) for producing, for each of one or more images, encoded image data (Ienc) for the image from the raw image data (Iraw) for the image provided by image sensor 111, which encoded image data is included in bitstream 291 sent to receiving device 104; and ii) a depth data transmission controller (or controller for short) 206 for controlling whether or not, for each measured depth map, sending device 102 will transmit the depth map to receiving device 104 (e.g., for determining whether or not to include the depth map in the bitstream 292).
- controller 206 receives encoded image data (Ienc) corresponding to an image (i.e., Ienc was produced by VE 202) and employs a video decoder (VD) 222 to decode the encoded image data produced by VE 202, thereby producing decoded image data (Idec) corresponding to the image.
- controller 206 receives Iraw, and Iraw is used as the input to depth map estimator (EST) 224 to produce the estimated depth map (D*) associated with the image.
- a decision function (DF) 226 is configured to decide, for each measured depth map, whether to include the measured depth map (or an encoded version thereof) in bitstream 292. The decision is based on a measure of the similarity between the measured depth map for the image and the estimated depth map for the image (e.g., an error value or similarity value).
- DF 226 computes an error value (e) indicating an error between the measured depth map and the corresponding estimated depth map produced by EST 224; the error value (e) is then compared to a threshold (T).
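- a sketch of this comparison is shown below; the mean-squared-error term is an assumption, since the text above only requires some error value (e) compared against a threshold (T):

```python
import numpy as np

def should_send_depth(d_measured, d_estimated, threshold_t):
    """Return True when the measured depth map should be included in
    bitstream 292, i.e., when the estimate is not similar enough."""
    diff = np.asarray(d_measured) - np.asarray(d_estimated)
    e = float(np.mean(diff ** 2))  # error value e (assumed MSE)
    return e >= threshold_t        # e < T: criteria met, refrain from sending
```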
- a depth map encoder (DE) 204 is used for encoding the measured depth map (Dm) to produce an encoded depth map (Denc).
- the DE can, for example, encode the depth map as PNG (see reference [3]).
- FIG. 2B further illustrates an embodiment of receiving device 104.
- the encoded image data (Ienc) in the bitstream 291 is decoded by VD 251 to produce decoded image data (Idec).
- An EST 254, which is identical to EST 224 in sending device 102, uses decoded image data for an image to produce a depth map estimate D* associated with the image. If an encoded depth map for the image was not included in bitstream 292, which means that DF 226 determined that D* is good enough, D* and Idec are used by HG 195 to produce the 3D image.
- If an encoded depth map for the image is included in bitstream 292 (i.e., D* is not good enough), it is first decoded by a depth decoder (DD) 252 to produce a decoded depth map (Ddec) which, together with Idec, is used by HG 195 to produce the 3D image.
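- put together, the receiver-side selection might look like the sketch below, where decode_image, decode_depth, and estimate_depth are stand-ins for VD 251, DD 252, and EST 254, respectively:

```python
def receiver_reconstruct(bitstream_291, bitstream_292,
                         decode_image, decode_depth, estimate_depth):
    """Recover the image, then use the transmitted depth map when present;
    otherwise fall back to the locally estimated depth map (D*)."""
    i_dec = decode_image(bitstream_291)  # decoded image data (Idec)
    if bitstream_292 is not None:        # sender judged D* not good enough
        d = decode_depth(bitstream_292)  # decoded depth map (Ddec)
    else:                                # sender judged D* good enough
        d = estimate_depth(i_dec)        # identical estimator to EST 224
    return i_dec, d                      # inputs to hologram generator HG 195
```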
- controller 206 can switch between mode 1 and mode 2 based on a condition being satisfied (e.g., the video codec operating point satisfying a condition).
- the EST (e.g., a neural network (NN)) used to produce the depth estimate operates over a certain quality range of input images. If the codec bitrate is low, the difference between the image from the capturing device and the decoded image will increase; the ESTs at the encoder and decoder will be presented with increasingly different inputs, and therefore the estimated encoder and decoder depths will start to deviate. In this case it is advantageous to use mode 1. If the codec bitrate is high, one can assume the difference between the image from the capturing device and the decoded image is not as large, and therefore switch to mode 2. The motivation is to have the neural networks on the sender and receiver use similar inputs and therefore produce very similar depth map estimates. This switching scheme can be realized as: if bn < B, then use mode 1, else use mode 2, where bn is the codec bitrate and B is the bitrate threshold at which EST performance starts to be affected.
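- a sketch of this switch is shown below, assuming (consistent with the embodiments above) that mode 1 feeds the locally decoded image to the estimator and mode 2 feeds the raw capture; the parameter names are hypothetical:

```python
def select_est_input(raw_image, decoded_image, bitrate_bn, threshold_b):
    """Mode 1 (bn < B): use the decoded image so the sender-side and
    receiver-side estimators see similar inputs despite coding loss.
    Mode 2 (bn >= B): coding loss is small, so the raw capture is used."""
    if bitrate_bn < threshold_b:
        return decoded_image  # mode 1
    return raw_image          # mode 2
```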
- FIG. 3 is a flow chart illustrating a process 300 according to an embodiment.
- Process 300 may begin in step s302.
- Step s302 comprises obtaining a measured depth map associated with at least a first image.
- Step s304 comprises obtaining an estimated depth map associated with at least the first image.
- Step s306 comprises obtaining a similarity measure indicating a similarity between the measured depth map and the estimated depth map.
- Step s308 comprises determining, based at least in part on the similarity measure, that a criteria is met, wherein determining that the criteria is met comprises determining whether the similarity measure satisfies a condition.
- Step s310 comprises refraining from providing the measured depth map to the receiving device as a result of determining that the criteria is met.
- the method is performed by a sending device, and the sending device is configured to provide the measured depth map to the receiving device as a result of determining that the criteria is not met.
- the similarity measure is an error value, and the error value satisfies the condition if the error value is less than a threshold.
- the error value is equal to:
- the criteria is met if the error value is less than the threshold.
- In some embodiments, the criteria is met if: i) the error value is less than the threshold and ii) the bandwidth available to provide the measured depth map to the receiving device is less than B, where B is a bandwidth threshold.
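- a sketch combining these two embodiments is shown below; since the exact error formula is not reproduced above, the check takes an already-computed error value as input:

```python
def criteria_met(error_value, threshold_t,
                 available_bandwidth=None, bandwidth_threshold_b=None):
    """True means: refrain from providing the measured depth map.
    Base embodiment: the error value is below the similarity threshold T.
    Optional embodiment: additionally require the available bandwidth to be
    below B, so the depth map is still sent when bandwidth is plentiful."""
    met = error_value < threshold_t
    if available_bandwidth is not None and bandwidth_threshold_b is not None:
        met = met and available_bandwidth < bandwidth_threshold_b
    return met
```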
- obtaining the estimated depth map comprises inputting image data into a neural network and obtaining the estimated depth map from the neural network.
- the process also includes: encoding raw image data for the first image to produce encoded image data for the first image and decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data.
- the image data input into the neural network comprises raw image data for the first image.
- the process also includes encoding raw image data for the first image to produce encoded image data for the first image and decoding the encoded image data to produce decoded image data for the first image, wherein the image data input into the neural network comprises the decoded image data if a bitrate, bn, is less than a threshold, otherwise the image data input into the neural network comprises the raw image data.
- FIG. 4 is a block diagram of sending device 102, according to some embodiments.
- sending device 102 may comprise: processing circuitry (PC) 402, which may include one or more processors (P) 455 (e.g., one or more general purpose microprocessors and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like), which processors may be co-located in a single housing or in a single data center or may be geographically distributed (i.e., sending device 102 may be a distributed computing apparatus); and at least one network interface 448 (e.g., a physical interface or air interface) comprising a transmitter (Tx) 445 and a receiver (Rx) 447 for enabling sending device 102 to transmit data to and receive data from other nodes connected to a network 110 (e.g., an Internet Protocol (IP) network) to which network interface 448 is connected (physically or wirelessly).
- a computer readable storage medium (CRSM) 442 may be provided.
- CRSM 442 may store a computer program (CP) 443 comprising computer readable instructions (CRI) 444.
- CRSM 442 may be a non-transitory computer readable medium, such as magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like.
- the CRI 444 of computer program 443 is configured such that when executed by PC 402, the CRI causes sending device 102 to perform steps described herein (e.g., steps described herein with reference to the flow charts).
- sending device 102 may be configured to perform steps described herein without the need for code. That is, for example, PC 402 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.
- the sending device can determine whether or not there is an advantage to sending the measured depth data to a receiving device having a depth estimator with the capability to produce the estimated depth. For example, if the estimated depth is close to the depth measured by the LiDAR, then there is little to be gained by providing the measured depth data to the receiving device, and only the image data is transmitted, thereby saving network bandwidth as well as battery power.
- transmitting data “to” or “toward” an intended recipient encompasses transmitting the data directly to the intended recipient or transmitting the data indirectly to the intended recipient (i.e., one or more other nodes are used to relay the message from the source node to the intended recipient).
- receiving data “from” a sender encompasses receiving the data directly from the sender or indirectly from the sender (i.e., one or more nodes are used to relay the data from the sender to the receiving node).
- the indefinite article “a” means “at least one” or “one or more.”
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22967360.3A EP4627528A1 (en) | 2022-12-01 | 2022-12-01 | Holographic communication system |
| PCT/SE2022/051128 WO2024117951A1 (en) | 2022-12-01 | 2022-12-01 | Holographic communication system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/SE2022/051128 WO2024117951A1 (en) | 2022-12-01 | 2022-12-01 | Holographic communication system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024117951A1 true WO2024117951A1 (en) | 2024-06-06 |
Family ID: 91324521
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/SE2022/051128 Ceased WO2024117951A1 (en) | 2022-12-01 | 2022-12-01 | Holographic communication system |
Country Status (2)
| Country | Link |
|---|---|
| EP (1) | EP4627528A1 (en) |
| WO (1) | WO2024117951A1 (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20200279120A1 (en) * | 2018-07-27 | 2020-09-03 | Beijing Sensetime Technology Development Co., Ltd. | Method, apparatus and system for liveness detection, electronic device, and storage medium |
| EP3712841A1 (en) * | 2019-03-22 | 2020-09-23 | Ricoh Company, Ltd. | Image processing method, image processing apparatus, and computer-readable recording medium |
| US20220156971A1 (en) * | 2020-11-13 | 2022-05-19 | Toyota Research Institute, Inc. | Systems and methods for training a machine-learning-based monocular depth estimator |
| US20220215567A1 (en) * | 2019-05-10 | 2022-07-07 | Nippon Telegraph And Telephone Corporation | Depth estimation device, depth estimation model learning device, depth estimation method, depth estimation model learning method, and depth estimation program |
- 2022-12-01: EP application EP22967360.3A, published as EP4627528A1 (active, pending)
- 2022-12-01: PCT application PCT/SE2022/051128, published as WO2024117951A1 (not active, ceased)
Also Published As
| Publication number | Publication date |
|---|---|
| EP4627528A1 (en) | 2025-10-08 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12477231B2 (en) | Apparatus and methods for image encoding using spatially weighted encoding quality parameters | |
| US10389994B2 (en) | Decoder-centric UV codec for free-viewpoint video streaming | |
| CN101651841B (en) | Method, system and equipment for realizing stereo video communication | |
| EP3251345B1 (en) | System and method for multi-view video in wireless devices | |
| US11889115B2 (en) | Method and device for multi-view video decoding and method and device for image processing | |
| US20200380775A1 (en) | Transmitting device, transmitting method, and receiving device | |
| US12356105B2 (en) | Session description for communication session | |
| WO2023179277A1 (en) | Encoding/decoding positions of points of a point cloud encompassed in a cuboid volume | |
| EP4375947A1 (en) | Point cloud data transmission device, point cloud data transmission method, point cloud data reception device, and point cloud data reception method | |
| GB2637850A (en) | Applications of layered encoding in split computing | |
| WO2024117951A1 (en) | Holographic communication system | |
| WO2024186242A1 (en) | Holographic communication system | |
| KR20090081190A (en) | Handheld terminal | |
| KR102260653B1 (en) | The image generation system for providing 3d image | |
| CN113989146B (en) | Image processing method and device, electronic device, and storage medium | |
| EP4564819A1 (en) | Depth single-layer encoding/decoding | |
| US12462407B2 (en) | Depth estimation method in an immersive video context | |
| US20240282013A1 (en) | Learning-based point cloud compression via unfolding of 3d point clouds | |
| EP4492787A1 (en) | Capture device information | |
| WO2025177023A1 (en) | Systems and methods for reducing uplink traffic | |
| JP2014165836A (en) | Data communication system, and data communication method | |
| CN119835394A (en) | Encoding and decoding method and end cloud cooperative system | |
| WO2025196604A1 (en) | Data compression for medical images | |
| WO2024226026A1 (en) | Bitrate based exposure factorization for image and video processing | |
| WO2021160955A1 (en) | Method and device for processing multi-view video data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22967360; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022967360; Country of ref document: EP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2022967360; Country of ref document: EP; Effective date: 20250701 |
| | WWP | Wipo information: published in national office | Ref document number: 2022967360; Country of ref document: EP |