
WO2016003340A1 - Encoding and decoding of light fields - Google Patents


Info

Publication number
WO2016003340A1
Authority
WO
WIPO (PCT)
Prior art keywords
plf
scene
parameters
bitstream
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/SE2014/050851
Other languages
French (fr)
Inventor
Julien Michot
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Telefonaktiebolaget LM Ericsson AB
Original Assignee
Telefonaktiebolaget LM Ericsson AB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget LM Ericsson AB filed Critical Telefonaktiebolaget LM Ericsson AB
Priority to PCT/SE2014/050851 priority Critical patent/WO2016003340A1/en
Publication of WO2016003340A1 publication Critical patent/WO2016003340A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 9/005 Statistical coding, e.g. Huffman, run length coding

Definitions

  • Embodiments presented herein relate to encoding and decoding of light fields, and particularly to methods, an electronic device, computer programs, and a computer program product for encoding and decoding of light fields.
  • 3D (three-dimensional) video and 3D TV is gaining momentum and is considered one possible logical step in consumer electronics, mobile devices, computers and cinemas.
  • the additional dimension on top of common two-dimensional (2D) video offers multiple different directions for displaying the content and improves the potential for interaction between viewers and the content.
  • 3D is usually related to stereoscopic experiences, where each one of the user's eyes is provided with a unique image of a scene. Such unique images may be provided as a stereoscopic image pair.
  • the unique images are then fused by the human brain to create a depth impression (i.e. an imagined 3D view). For example, by presenting a light field using technology that maps each sample to the appropriate ray in physical space, one obtains an auto-stereoscopic visual effect akin to viewing the original scene.
  • Digital technologies for enabling this include placing an array of lenslets over a high-resolution display screen, or projecting the imagery onto an array of lenslets using an array of video projectors. If the latter is combined with an array of video cameras, one can capture and display a time-varying light field.
  • This essentially constitutes a 3D television system. More generally, one way to add the depth dimension to video is by means of so-called stereoscopic video.
  • in stereoscopic video, the left and the right eyes of the viewer are shown slightly different views (i.e., images). This is achieved by using anaglyph, shutter or polarized glasses that allow showing different images to the left and the right eyes of the viewer, in this way creating a perception of depth. The perceived depth of a point in the image is thereby determined by its displacement between the left and the right views.
  • Fig. 1 schematically illustrates a rendering unit 12 where slightly different images 14a, 14b, 14c from locations 12a, 12b, 12c on the display of the rendering unit 12 are projected towards a viewer, as represented by the eyes 11a, 11b, in front of the rendering unit 12. Therefore, if the viewer is located in a proper position in front of the display, the viewer's left and right eyes see slightly different views of the same scene, which makes it possible to create the perception of depth.
  • in order to achieve smooth parallax and a change of viewpoint when the user moves in front of the display, a number of views (typically 7-28) are generated.
  • the number of views may increase to 20-50.
  • One issue that may arise when using such auto-stereoscopic displays is the transmission, or storage, of the views, as the views may amount to a high bit rate.
  • the left and the right views may be coded independently or jointly.
  • Another way to obtain one view from the other view is by using view synthesis.
  • for example, the issue may be overcome by transmitting a low number (e.g. 1 to 3) of key views and generating the other views by a so-called view synthesis process from the transmitted key views, optionally using additional information such as depth maps.
  • These synthesized views can be located between the key views (interpolated) or outside the range covered by key views (extrapolated).
  • a depth map may be regarded as a simple grayscale image, wherein each pixel indicates the distance between the corresponding pixel from a video object and the image plane of the capturing camera.
  • Disparity may be regarded as the apparent shift of a pixel which is a consequence of the viewer moving from one viewpoint to another.
  • Depth and disparity are mathematically related and can be interchangeably used.
  • One property of depth/disparity maps is that they contain large smooth surfaces of constant gray levels. This makes them comparatively easy to compress using currently available video coding technology.
  • a so-called 3D point cloud may be reconstructed from the depth map if the 3D camera parameters (such as the intrinsic calibration matrix K for a pinhole camera model, containing the focal lengths, principal point, etc.) are known.
  • the depth map may be measured by specialized cameras, e.g., structured-light or time-of-flight (ToF) cameras, where the depth is correlated respectively with the deformation of a projected pattern or with the round-trip time of a pulse of light.
  • one example of a view synthesis technique is depth image based rendering (DIBR), which uses depth map(s) of the key view(s) to facilitate the view synthesis.
  • the depth map may be represented by a grayscale image having the same resolution as the view (video frame).
  • each pixel of the depth map represents the distance from the camera to the object for the corresponding pixel in the image/ video frame.
  • DIBR generally consists of creating a dense 3D point cloud by back-projection of the depth map and projecting the 3D point cloud to another viewpoint.
  • some parameters need to be signalled to the device or program module that performs the view synthesis.
  • among those parameters are "z near" and "z far", which represent the closest and the farthest depth values, respectively, in the depth maps for the frame under consideration. These values are needed in order to map the quantized depth map samples to the real depth values that they represent.
  • Another set of parameters that is needed for the view synthesis are camera parameters.
  • Camera parameters for the 3D video are usually split into two parts.
  • the first part relates to internal camera parameters (or intrinsic parameters) and generally represents the optical characteristics of the camera for the image captured, such as the focal length, the coordinates of the image's principal point and the lens distortions.
  • the second part relates to external camera parameters (or extrinsic parameters) and generally represents the camera position and the direction of the optical axis of the camera, either in the chosen real world coordinates or as the position of the cameras relative to each other and the objects in the scene.
  • both the internal and the external camera parameters may be required in the view synthesis process based on usage of the depth information (such as DIBR).
  • an alternative way to send views of the key cameras is by using layered depth video (LDV), which uses multiple layers for scene representation, such as foreground texture, foreground depth, background texture, and background depth.
  • Having one color per 3D pixel or volume pixel (voxel) in stereoscopic video is often not enough, since some content has varying colors depending on from where the viewer is viewing it. This is typically due to specular lights, or reflective or transparent content. Motion parallax gives the viewer the ability to perceive the 3D structure of static content, but also allows the viewer to see how reflective/transparent the content is. Being able to replicate this on a screen may thus improve the user experience.
  • Compressing or encoding 3D video with properties as outlined above may be challenging, since the input data structure (camera array) contains a lot of redundancy due to the fact that the cameras record the same content from slightly different viewpoints, while most of the content does not have a varying color depending on the angle from which it is observed.
  • multi-view high efficiency video coding (MV-HEVC) and three-dimensional high efficiency video coding (3D-HEVC), as developed within ISO/IEC JTC 1/SC 29/WG 11, both yield very low compression efficiency for wide angular resolution (a high number of cameras). These standards also have a very large overhead for motion/disparity vector coding. These standards further offer a slow encoding process.
  • the so-called "Layer-Based Representation for Image Based Rendering and Compression" was developed by Dragotti, P. L. et al at Imperial College London. The image content is split into several zones having the same depth value using image segmentation. Several light field layers for each zone are encoded separately. This approach only works when the content is easy to segment and contains a few elements, such as in a toy example. In reality, however, real videos are much more complex and the number of zones will increase, leading to a sub-optimal compression ratio. Coding the contour is also challenging (demanding in bits).
  • JP3D (also known as JPEG 2000 3D) is only applicable for images; a straightforward extension to video would yield a quite low compression ratio. JP3D was not directly developed for light field compression.
  • An object of embodiments herein is to provide efficient encoding of a light field into a bitstream and efficient decoding of a bitstream into a panoramic light field.
  • a method for encoding a light field (LF) into a bitstream, the method comprising receiving an LF of a scene and parameters relating to the LF.
  • the parameters describe a three-dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model.
  • the method comprises encoding the at least one PLF and the parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.
  • an electronic device for encoding an LF into a bitstream.
  • the electronic device comprises a processing unit.
  • the processing unit is configured to receive an LF of a scene and parameters relating to the LF.
  • the parameters describe a 3D model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model.
  • the processing unit is configured to encode the at least one PLF and the parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.
  • a computer program for encoding an LF into a bitstream, the computer program comprising computer program code which, when run on a processing unit, causes the processing unit to perform a method according to the first aspect.
  • a computer program product comprising a computer program according to the third aspect and a computer readable means on which the computer program is stored.
  • a method for decoding an encoded bitstream into a PLF is performed by an electronic device.
  • the method comprises receiving an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF.
  • the parameters describe a panoramic 3D model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from the at least one PLF, and samplings of a PLF space.
  • the method comprises decoding the encoded bitstream into the at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
  • this enables efficient decoding of a bitstream into a panoramic light field.
  • an electronic device for decoding an encoded bitstream into a PLF.
  • the electronic device comprises a processing unit.
  • the processing unit is configured to receive an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF.
  • the parameters describe a panoramic 3D model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from the at least one PLF, and samplings of a PLF space.
  • the processing unit is configured to decode the encoded bitstream into the at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
  • according to a seventh aspect there is presented a computer program for decoding an encoded bitstream into a PLF, the computer program comprising computer program code which, when run on a processing unit, causes the processing unit to perform a method according to the fifth aspect.
  • a computer program product comprising a computer program according to the seventh aspect and a computer readable means on which the computer program is stored.
  • the disclosed encoding and decoding provides efficient encoding and decoding of light fields and panoramic light fields, respectively.
  • the disclosed encoding and decoding is scalable in the number of cameras. Increasing the number of input cameras will increase the number of bits, but at a lower pace.
  • the disclosed encoding and decoding provides angular scalability (light field), thus creating high fidelity images where motion parallax is available.
  • a 2D or 3D only screen may just drop the angular layers and still be able to display the content.
  • a network node may determine to drop transmission of the angular layers.
  • the disclosed encoding and decoding may handle any input camera setup (lines, planar grid, circular grid, etc.), even non-ordered cameras.
  • the disclosed encoding and decoding require only a few modifications of existing standards such as MV-HEVC and 3D-HEVC and could even be compatible for some setups (such as for a line and/ or planar grid).
  • the disclosed encoding and decoding have a competitive compression efficiency compared to existing light field coding schemes.
  • the disclosed encoding and decoding may utilize angular coding that supports transparency/ reflections (i.e., not only specular light), especially when a dense representation is kept and transmitted.
  • the disclosed encoding and decoding allows the movie maker to select where to compress more (by giving the movie maker free control of the projection model).
  • any feature of the first, second, third, fourth, fifth, sixth, seventh and eighth aspects may be applied to any other aspect, wherever appropriate.
  • any advantage of the first aspect may equally apply to the second, third, fourth, fifth, sixth, seventh, and/or eighth aspect, respectively, and vice versa.
  • Fig. 1 is a schematic diagram illustrating a rendering unit according to the prior art
  • Fig. 2 is a schematic diagram illustrating an image communications system according to an embodiment
  • Fig. 3a is a schematic diagram showing functional units of an electronic device according to an embodiment
  • Figs. 3b and 3c are schematic diagrams showing functional modules of an electronic device according to an embodiment
  • Fig. 4 shows one example of a computer program product comprising computer readable means according to an embodiment
  • Fig. 5 schematically illustrates parts of an image communications system according to an embodiment
  • Fig. 6 schematically illustrates parts of an image communications system according to an embodiment
  • Fig. 7 schematically illustrates angular coordinates for cameras according to an embodiment
  • Fig. 8 schematically illustrates representation in an angular space according to an embodiment
  • Fig. 9 schematically illustrates slicing according to an embodiment
  • Fig. 10 schematically illustrates angular space sampling according to an embodiment
  • Fig. 11 schematically illustrates a 1D encoding order according to an embodiment
  • Fig. 12 schematically illustrates a 2D encoding order according to an embodiment
  • Figs. 13, 14, 15 and 16 are flowcharts of methods according to embodiments.
  • Embodiments disclosed herein relate to encoding a light field (LF) into a bitstream.
  • there is provided an electronic device, a method performed by the electronic device, and a computer program comprising code, for example in the form of a computer program product, that when run on a processing unit of the electronic device, causes the processing unit to perform the method.
  • Embodiments disclosed herein further relate to decoding an encoded bitstream into a panoramic light field (PLF).
  • PLF panoramic light field
  • there is provided an electronic device, a method performed by the electronic device, and a computer program comprising code, for example in the form of a computer program product, that when run on a processing unit of the electronic device, causes the processing unit to perform the method.
  • Fig. 2 schematically illustrates an image communications system 20 according to an embodiment.
  • the image communications system 20 comprises an M-by-N camera array 21.
  • the camera array 21 comprises M-by-N cameras, one of which is identified at reference numeral 21a, configured to capture (or record) images of a scene 22.
  • the scene is schematically, and for illustrative purposes, represented by a single object (a circle).
  • the scene may comprise a variety of objects of possibly different shapes and with possibly different distances to the cameras 21a.
  • Image data captured by the cameras 21a represents a light field of the scene 22 and is transmitted to an electronic device 30, 30a acting as an encoder.
  • the electronic device 30, 30a encodes the light field into a bitstream.
  • the encoded bitstream is communicated over a symbolic communications channel 23.
  • the symbolic communications channel 23 may be implemented as a storage medium or as a transmission medium between two electronic devices. Hence the symbolic communications channel 23 may be regarded as a delayed or real-time communications channel.
  • the encoded bitstream is received by an electronic device 30, 30b acting as a decoder. Hence, when the symbolic communications channel 23 is implemented as a storage medium the electronic device 30, 30a and the electronic device 30, 30b may be one and the same electronic device 30, 30a, 30b.
  • the electronic device 30, 30b decodes the received bitstream into a panoramic light field.
  • the panoramic light field may be provided to a rendering unit 12 for displaying the panoramic light field (as in Fig. 1).
  • FIG. 3a schematically illustrates, in terms of a number of functional units, the components of an electronic device 30 according to an embodiment.
  • a processing unit 31 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate arrays (FPGA) etc., capable of executing software instructions stored in a computer program product 41a, 41b (as in Fig. 4), e.g. in the form of a storage medium 33.
  • CPU central processing unit
  • DSP digital signal processor
  • ASIC application specific integrated circuit
  • FPGA field programmable gate arrays
  • the storage medium 33 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory.
  • the electronic device 30 may further comprise a communications interface 32 for communications with, for example, another electronic device 30, an external storage medium, a camera array 21, and a rendering unit 12.
  • the communications interface 32 may comprise one or more transmitters and receivers, comprising analogue and digital components and a suitable number of ports and interfaces for communications.
  • the processing unit 31 controls the general operation of the electronic device 30 e.g. by sending data and control signals to the communications interface 32 and the storage medium 33, by receiving data and reports from the communications interface 32, and by retrieving data and instructions from the storage medium 33.
  • Fig. 3b schematically illustrates, in terms of a number of functional modules, the components of an electronic device 30 acting as an encoder according to an embodiment.
  • the electronic device 30, 30a of Fig. 3b comprises a number of functional modules; a send and/or receive module 31a, and an encode module 31b.
  • the electronic device 30, 30a of Fig. 3b may further comprise a number of optional functional modules, such as any of a determine module 31c, a reduce module 31d, a generate module 31e, a project module 31f, a detect module 31g, and a sample module 31h.
  • each functional module 31a-h may be implemented in hardware or in software.
  • one or more or all functional modules 31a-h may be implemented by the processing unit 31, possibly in cooperation with functional units 32 and/or 33.
  • the processing unit 31 may thus be arranged to fetch instructions from the storage medium 33, as provided by a functional module 31a-h, and to execute these instructions, thereby performing any steps as will be disclosed hereinafter.
  • Fig. 3c schematically illustrates, in terms of a number of functional modules, the components of an electronic device 30b acting as a decoder according to an embodiment.
  • each functional module 31j-n may be implemented in hardware or in software.
  • one or more or all functional modules 31j-n may be implemented by the processing unit 31, possibly in cooperation with functional units 32 and/or 33.
  • the processing unit 31 may thus be arranged to fetch instructions from the storage medium 33, as provided by a functional module 31j-n, and to execute these instructions, thereby performing any steps as will be disclosed hereinafter.
  • the electronic device 30, 30a and the electronic device 30, 30b may be one and the same electronic device.
  • the functional modules 31a-h of the electronic device 30, 30a and the functional modules 31j-n of the electronic device 30, 30b may be combined.
  • only one send and/or receive module may be used instead of the separate send and/or receive modules 31a, 31j
  • only one generate module may be used instead of the separate generate modules 31e, 31l
  • only one detect module may be used instead of the separate detect modules 31g, 31m
  • only one determine module may be used instead of the separate determine modules 31c, 31n.
  • the electronic device 30, 30a, 30b may be provided as a standalone device or as a part of a further device.
  • the electronic device 30, 30a, 30b may be provided as an integral part of the further device. That is, the components of the electronic device 30, 30a, 30b may be integrated with other components of the further device; some components of the further device and the electronic device 30, 30a, 30b may be shared.
  • if the further device as such comprises a processing unit, this processing unit may be arranged to perform the actions of the processing unit 31 of the electronic device 30, 30a, 30b.
  • the electronic device 30, 30a, 30b may be provided as a separate unit in the further device.
  • the further device may be a digital versatile disc (DVD) player, Blu-ray Disc player, a desktop computer, a laptop computer, a tablet computer, a portable wireless device, a mobile phone, a mobile station, a handset, wireless local loop phone, or a user equipment (UE).
  • DVD digital versatile disc
  • UE user equipment
  • Fig. 4 shows one example of a computer program product 41a, 41b
  • a computer program 42a can be stored, which computer program 42a can cause the processing unit 31 and thereto operatively coupled entities and devices, such as the communications interface 32 and the storage medium 33, to execute methods for encoding a light field (LF) into a bitstream according to embodiments described herein.
  • a computer program 42b can be stored, which computer program 42b can cause the processing unit 31 and thereto operatively coupled entities and devices, such as the communications interface 32 and the storage medium 33, to execute methods for decoding an encoded bitstream into a panoramic light field (PLF) according to embodiments described herein.
  • PLF panoramic light field
  • the computer program 42b and/or computer program product 41b may thus provide means for performing any steps as herein disclosed.
  • the computer program product 41a, 41b is illustrated as an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc.
  • the computer program product 41a, 41b could also be embodied as a memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM) and more particularly as a non-volatile storage medium of a device in an external memory such as a USB (Universal Serial Bus) memory or a Flash memory, such as a compact Flash memory.
  • RAM random access memory
  • ROM read-only memory
  • EPROM erasable programmable read-only memory
  • EEPROM electrically erasable programmable read-only memory
  • while the computer programs 42a, 42b are here schematically shown as a track on the depicted optical disc, the computer programs 42a, 42b can be stored in any way which is suitable for the computer program product 41a, 41b.
  • Figs. 13 and 14 are flow charts illustrating embodiments of methods for encoding a light field (LF) into a bitstream as performed by an electronic device 30, 30a. The methods are advantageously provided as computer programs 42a.
  • Figs. 15 and 16 are flow charts illustrating embodiments of methods for decoding an encoded bitstream into a panoramic light field (PLF) as performed by an electronic device 30, 30b. The methods are advantageously provided as computer programs 42b.
  • Reference is now made to Fig. 13 illustrating a method for encoding a light field (LF) into a bitstream as performed by an electronic device 30, 30a according to an embodiment.
  • the electronic device 30, 30a is configured to, in a step S102, receive an LF of a scene and parameters relating to the LF.
  • the parameters describe a three- dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model.
  • the processing unit 31 may be configured to perform step S102 by executing functionality of the functional module 31a.
  • the electronic device 30, 30a is configured to, in a step S118, encode the at least one PLF and the parameters into the bitstream.
  • the encoding is performed by sampling the sequence of PLF spaces into a sequence of PLFs and by applying compression to remove redundancy in the sequence of PLFs.
  • the processing unit 31 may be configured to perform step S118 by executing functionality of the functional module 31b.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
  • Reference is now made to Fig. 14 illustrating methods for encoding a light field (LF) into a bitstream as performed by an electronic device 30, 30a according to further embodiments.
  • the encoding may be used to convert initial LF input data representing multiview videos to a panoramic light field 3D/4D video.
  • the LF may represent images defining multiple views comprising pixels of the scene, where the images have been captured by an N-by-M array of cameras, and where at least one of M and N is larger than 1.
  • an LF may thus be acquired from a set of cameras arranged on a grid (such as on a plane or circular grid) or on a line.
  • the first case (case A) is related to scenarios where the array of cameras is one-dimensional (1D), i.e., where one of M and N is equal to 1.
  • the second case (case B) is related to scenarios where the array of cameras is two-dimensional (2D), i.e., where both M and N are larger than 1.
  • Preprocessing, which may be performed after the receiving in step S102 but prior to the encoding in step S118, may comprise at least some of the following steps (which may be performed for each of the video frames, i.e., images, captured by the cameras): 1. Selection of a panoramic projection model.
  • 2. Ensuring that each pixel of the panoramic view of the scene has only one 3D point, for example by projecting the depth representation to a panoramic view, and back-projecting each 3D point and the depth representation of the panoramic view.
  • At least some of the embodiments disclosed herein are based on embedding the 3D structure into one global 2D depth map (or 3D mesh) that can later be used to recreate as many views of the captured scene as possible.
  • a projection model that will merge all the views into one global view with no overlapping at all, i.e., defining a bijective function, may therefore be used.
  • one way is to define a 3D surface and project the 3D content onto it. This may be visualized as a virtual surface being a large sparse sensor of a virtual camera, covering all the input cameras in the N-by-M array of cameras, and where each pixel would have a quite different 3D location and projection angle. Physically, it would be similar to having one large sparse light capturing sensor that covers all the cameras instead of having one small light capturing sensor per camera. A parametric surface shape and location with a projection equation may therefore be defined. Since 2D videos may later be encoded, the surface is defined as being a distorted 2D rectangle.
  • Fig. 5 schematically illustrates a top view of the image communications system 20, where a surface 51 has been schematically illustrated.
  • cameras 21a are placed along an x-axis and configured to capture a scene 22.
  • the surface 51 is placed between the cameras 21a and the scene 22. Projections perpendicular from the surface 51 are schematically illustrated at reference number 52 on the side of the surface 51 facing the scene 22.
  • one objective may thus be to find a good surface distortion and localization such that all the pixels of the input images are projected onto the surface (considering a sufficiently high surface sampling resolution), and all the pixels occluded in the input images are projected onto the surface.
  • This may be regarded as a hard problem but it may be solved using spline/radial basis functions within an optimization framework, or the problem itself may be simplified by relaxing some constraints.
  • the problem may be simplified by assuming a per-pixel (or per-column for Case A) pinhole camera.
  • the generation of the panoramic image and depth map (both of size Wp-by-Hp pixels) is performed such that for each column c (from 1 to Wp) a projection matrix Pc may be defined (a sketch of such a per-column projection is given below).
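  • purely as an illustration of such a per-column projection (and not as the method mandated by the disclosure), the sketch below splats a colored point cloud into a panoramic image and depth map, using one projection matrix Pc per column and a z-buffer so that each panoramic pixel keeps a single 3D point; the names (build_panorama, P_cols) and the brute-force column loop are assumptions of the sketch.

```python
import numpy as np

def build_panorama(points, colors, P_cols, Wp, Hp):
    """Build a panoramic image and depth map with a per-column projection.

    points : (N, 3) point cloud; colors : (N, 3) RGB values per point.
    P_cols : list of Wp 3x4 projection matrices; P_cols[c] is the matrix Pc
             used for column c of the panorama (per-column pinhole model).
    A z-buffer keeps, for every panoramic pixel, the point closest to the
    surface, so each pixel ends up with a single 3D point.
    """
    pano = np.zeros((Hp, Wp, 3), dtype=np.float32)
    depth = np.full((Hp, Wp), np.inf, dtype=np.float32)
    pts_h = np.hstack([points, np.ones((len(points), 1))])   # homogeneous coords
    for c in range(Wp):
        proj = pts_h @ P_cols[c].T                            # (N, 3)
        z = proj[:, 2]
        ok = z > 1e-6                                         # points in front only
        u = np.round(proj[ok, 0] / z[ok]).astype(int)
        v = np.round(proj[ok, 1] / z[ok]).astype(int)
        keep = (u == c) & (v >= 0) & (v < Hp)                 # pixels of this column
        for row, zz, rgb in zip(v[keep], z[ok][keep], colors[ok][keep]):
            if zz < depth[row, c]:                            # z-buffer test
                depth[row, c] = zz
                pano[row, c] = rgb
    return pano, depth
```

In practice the column loop would be vectorized, and the choice of each Pc is where the surface distortion and localization discussed above would be expressed.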
  • Fig. 6 schematically illustrates one example of a camera array (for case A) comprising cameras 21a and interpolated cameras 21b.
  • a per-pixel (or per-column) projection of all the points of the point cloud onto the panoramic image and associated depth map may be performed, and only the necessary points, appearing in the associated column, may be kept.
  • other types of surfaces and surface projections (shape, location) may also be used.
  • for example, a semi-rectangular cylinder covering the foreground content may be used, even if the cameras are aligned on a plane.
  • the sampling rate may be adapted to the surface curvature so that high-curved areas are more densely sampled than low-curved areas.
  • the electronic device 30, 30a may be configured to, in an optional step S104, determine a point cloud from the images and the parameters describing the 3D model and the camera parameters.
  • the processing unit 31 may be configured to perform step S104 by executing functionality of the functional module 31c.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
  • the electronic device 30, 30a may further be configured to back-project all (or part of) the images using the associated depth maps. This will generate L point clouds, denoted PCx (where x goes from 1 to L).
  • a 3D mesh can be converted to, or expressed as, a 3D point cloud by sampling the faces of the mesh. The point cloud may then be merged and simplified.
  • the electronic device 30, 30a may be configured to, in an optional step S106, reduce the number of points in the point cloud such that each pixel in the PLF has only one point in the point cloud.
  • the processing unit 31 may be configured to perform step S106 by executing functionality of the functional module 31d.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
  • the electronic device 30, 30a may be configured to, in an optional step S108, generate a panoramic 3D model from the projection model by projecting the point cloud.
  • the 3D model may be a depth map.
  • the processing unit 31 may be configured to perform step S108 by executing functionality of the functional module 31e.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
  • the electronic device 30, 30a may be configured to, in an optional step S110, project the point cloud to each input image of the cameras, and to generate the at least one PLF using the projection model by storing the different colors in a PLF space.
  • the processing unit 31 may be configured to perform step S110 by executing functionality of the functional module 31f.
  • the computer program 42a and/ or computer program product 41a may thus provide means for this step.
  • Enabling each pixel in the panoramic depth (PPD) to only have one 3D point may be implemented in different ways.
  • One way is to project all the points to the panoramic depth map using the projection model. Then, the panoramic depth map may be back-projected, using the inverse of the projection model, in order for a single point cloud to be obtained.
  • the point cloud may be projected to the panoramic depth as described above with reference to how the panoramic projection model may be selected.
  • the panoramic light field space may be defined as being the 3D or 4D space with the following parameters:
  • a first axis, denoted X corresponds to the x axis of the panoramic image PPI (i.e. the sampled projection surface, 51), thus ranging from 1 to Wp, where Wp is the resolution of the LF in the x-direction.
  • a second axis, denoted Y corresponds to the y axis of the panoramic image PPI, thus ranging from 1 to Hp, where Hp is the resolution of the LF in the y-direction.
  • Ch denotes the horizontal camera index on the input camera array, ranging from 1 to N.
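  • for illustration only, one possible in-memory layout of such a PLF space for case A is a dense array indexed by (Y, X, horizontal camera index) holding an RGB color per entry, as sketched below; the names and the use of NaN to mark holes are assumptions and not part of the original disclosure.

```python
import numpy as np

# A minimal container for the PLF space of case A (one angular dimension).
# Axes: Y (1..Hp), X (1..Wp), horizontal camera index Ch (1..N), RGB color.
# The 3D structure itself is not stored here: it lives in the panoramic
# depth map (or 3D mesh), so the PLF space carries color only.
Hp, Wp, N = 540, 1920, 9
plf_space = np.full((Hp, Wp, N, 3), np.nan, dtype=np.float32)  # NaN marks a hole

def set_sample(y, x, ch, rgb):
    """Store the color seen at panoramic pixel (x, y) from camera index ch."""
    plf_space[y, x, ch] = rgb

def slice_layer(ch):
    """One slice of the PLF space: an Hp x Wp image for a fixed camera index."""
    return plf_space[:, :, ch, :]
```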
  • Figs. 7 and 8 provide a representation of theta and omega.
  • Ci-1, Ci, and Ci+1 are three different camera indices of cameras 21a placed along the x-axis.
  • a 3D point P from a point cloud can be defined by its 3D coordinates (X, Y, Z), but also by an angular space theta, omega and length (which can be the depth Z).
  • This PLF space contains only color information since the 3D structure is encoded in the panoramic 3D model and projection model.
  • there may be holes in the 3D/4D light field space, for example due to occlusion (i.e., where foreground content covers background content); background content may in such cases not be projected onto all the views.
  • further, the pixels around the edges of the panoramic image may only appear in a few viewpoints, since the viewpoint makes the pixels shift to the left/right. Thus some pixels may be shifted outside the input image size, as shown in the non-hatched area of Fig. 10a (see below).
  • Such occlusions may be filled by determining values for the missing pixels.
  • the electronic device 30, 30a may be configured to, in an optional step S112, detect an entry in the PLF space representing a missing pixel; and, in an optional step S114, determine a value for the entry. Determining the value for the entry causes the hole to be filled. Filling the holes in the PLF space may be accomplished by employing a simple linear interpolation/extrapolation. The process of hole filling is also denoted inpainting.
  • the processing unit 31 may be configured to perform step S112 by executing functionality of the functional module 31g.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
  • the processing unit 31 may be configured to perform step S114 by executing functionality of the functional module 31c.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
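  • purely as an illustration, the hole filling of steps S112 and S114 could be implemented per pixel along the angular axis with linear interpolation and clamped extrapolation, as in the sketch below; the function name and the use of NaN as the hole marker are assumptions of the sketch.

```python
import numpy as np

def fill_holes_1d(samples):
    """Fill NaN entries along the angular axis by linear interpolation,
    reusing the nearest valid value beyond the ends (extrapolation)."""
    samples = samples.astype(np.float64)       # work on a copy
    idx = np.arange(len(samples))
    valid = ~np.isnan(samples)
    if not valid.any():
        return samples                         # nothing to interpolate from
    samples[~valid] = np.interp(idx[~valid], idx[valid], samples[valid])
    return samples

print(fill_holes_1d(np.array([np.nan, 10.0, np.nan, 20.0, np.nan])))
# [10. 10. 15. 20. 20.]
```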
  • the electronic device 30, 30a may be configured to, in an optional step S116, sample the PLF space along two-dimensional planes so as to slice the PLF space into slices.
  • the processing unit 31 may be configured to perform step S116 by executing functionality of the functional module 31h.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step.
  • Each slice represents a two-dimensional image. All slices have a color variation but a common panoramic 3D structure: the panoramic 3D model.
  • only a part of the PLF space (and hence not the complete 3D/4D space) is encoded. Slicing the space may then be regarded as sampling the space with a (possibly planar) 2D/3D path. One slice corresponds to one PLF. The rendering (see below) may also be achieved by slicing this space.
  • in step S118 there may be different ways to perform the actual encoding (i.e., how to sample the sequence of PLF spaces into a sequence of PLFs and how to apply compression to remove redundancy in the sequence of PLFs). Further details of the encoding in step S118 will now be disclosed.
  • At least one PLF may be encoded as a sequence of 3D video frames (or 4D video frame, or a set of 2D video frames with a panoramic 3D mesh or depth map) having dependent layers.
  • the layers may be encoded one by one in a predetermined order.
  • encoding of layer k+1 may be dependent on encoding of layer k for K layers, where 0 ≤ k < K-1.
  • encoding the at least one PLF may comprise encoding a quantized pixel difference between layer k+1 and layer k.
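  • as a toy illustration of this dependent-layer idea (and not the block-based prediction and entropy coding an MV-HEVC-style codec would actually use), the sketch below encodes layer k+1 as a quantized pixel difference from the reconstruction of layer k; the quantization step and names are assumptions of the sketch.

```python
import numpy as np

def encode_layers(layers, step=4):
    """Encode layer 0 as-is and every further layer as a quantized pixel
    difference from the previously reconstructed layer (layer k+1 depends on k)."""
    residuals = [layers[0].astype(np.int32)]
    recon_prev = layers[0].astype(np.int32)
    for layer in layers[1:]:
        diff = layer.astype(np.int32) - recon_prev
        q = np.round(diff / step).astype(np.int32)    # quantized residual
        residuals.append(q)
        recon_prev = recon_prev + q * step             # what the decoder will see
    return residuals

def decode_layers(residuals, step=4):
    recon = [residuals[0]]
    for q in residuals[1:]:
        recon.append(recon[-1] + q * step)
    return recon

layers = [np.full((2, 2), 100, np.uint8), np.full((2, 2), 103, np.uint8)]
print(decode_layers(encode_layers(layers))[1])  # values within one step of 103
```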
  • the 3D model may be represented by a depth map or a 3D mesh of the scene.
  • the electronic device 30, 30a may be configured to, in an optional step S120, encode positions of the cameras relative to the scene.
  • the processing unit 31 may be configured to perform step S120 by executing functionality of the functional module 31b.
  • the computer program 42a and/or computer program product 41a may thus provide means for this step. Since a PLF space and its associated panoramic 3D mesh or depth map (PPD) have been generated, the PLF space may be sliced, as in step S116 above, thus generating several 2D images that have the same 3D structure but with a color variation.
  • PPD panoramic 3D mesh or depth map
  • the first PLF slice (2D image/video) to be encoded is denoted PPI.
  • PPI denotes the first PLF slice (2D image/video) to be encoded.
  • the PPI may be encoded using known 2D video coding techniques such as the ones used in HEVC.
  • HEVC or other 2D video codecs
  • known 2D video coding techniques may be adapted in order to handle the video resolution increase.
  • the panoramic image resolution may have to be increased, thus leading to large image sizes, typically more than 4K (i.e., larger than 3840-by-2160 pixels).
  • One possibility is to increase the block sizes.
  • the depth map PPD can be stored using only one component (typically the luminance component) and then any existing coding techniques for standard 2D videos may be applied.
  • the depth map is generally easier to encode than the PPI and more specific coding techniques, such as the ones used in 3D-HEVC, may be used.
  • the motion vectors estimated when coding PPI can be re-used for encoding the depth map.
  • This depth map may thus correspond to a new dependent layer, say layer 1, when using MV-HEVC or an equivalent encoding technique.
  • a 3D mesh may be encoded instead of the depth map.
  • the herein disclosed encoding is able to generate both 2D and 3D videos since it contains both the colors and their associated depth (or position for the 3D mesh) of most of the necessary pixels.
  • a decoder will need to know what projection model was used when creating the PLF, and hence the projection model information may be encoded in the bitstream as well. This can be achieved in various ways. Examples include, but are not limited to, equations, a surface mesh, a matrix representation, a UV-mapping, etc. Also information defining the camera locations may be encoded in the bitstream.
  • the angular space has typically one dimension (case A) or two dimensions (case B).
  • in case A the angular space will have only one dimension.
  • in case B the angular space will have two dimensions.
  • the angular space may be encoded using dependent layers.
  • the angular space is sampled with a regular slicing of the PLF space.
  • by slicing is meant sampling (with interpolation) the PLF space in order to generate rectangular images that then may be encoded.
  • Figs. 10a and 10b provide illustrations of the angular space sampling of PPI and LF1 for Case A, i.e., where the angular space is 1D.
  • PPI corresponds to the pixel colors for which theta is 0. This is referred to as layer 0 (denoted LF0).
  • LF1 represents one slice of the angular space theta. Theta may be replaced by the camera index and the space may be sliced in the camera dimension.
  • LF1 corresponds to the pixel colors of all pixels where the angle is equal to, say, 15 degrees; LF2 corresponds to the pixel colors of all pixels where the angle is equal to, say, -15 degrees; LF3 corresponds to the pixel colors of all pixels where the angle is equal to, say, 30 degrees; LF4 corresponds to the pixel colors of all pixels where the angle is equal to, say, -30 degrees; etc.
  • the slices LFx, where x denotes an index of the slice, are images of the same size as the PPI and, for most of the content, will have the same or nearly the same pixel colors.
  • the hatched area in Fig. 10a corresponds to the valid space where the pixels will have colors. This space may be discrete since the input data is a discrete number of cameras, and hence an interpolation technique (bilinear, bi-cubic, etc.) may be used to obtain the color value if the slicing lands in between two theta values. When there is no color (outside the hatched area), the color of the last valid theta at the same pixel location (x,y) may be used, or an extrapolation technique may be used to obtain the color. In Fig. 10a the y-axis has been omitted to ease the illustration. The x-axis corresponds to the x-axis of the PPI (but may also be regarded as the real X-axis of the 3D space coordinate).
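  • for illustration, the sketch below samples one light-field layer at an arbitrary angle from a discrete PLF space by linear interpolation between the two nearest stored camera angles, clamping to the last valid angle outside the covered range, much as described above; the function and parameter names are assumptions of the sketch.

```python
import numpy as np

def slice_angular_space(plf_space, theta, thetas):
    """Sample one light-field layer at angle `theta` from a discrete PLF
    space of shape (Hp, Wp, N, 3) whose N camera indices correspond to the
    ascending angles listed in `thetas` (degrees).
    """
    theta = np.clip(theta, thetas[0], thetas[-1])   # clamp outside the valid range
    hi = np.searchsorted(thetas, theta)
    lo = max(hi - 1, 0)
    hi = min(hi, len(thetas) - 1)
    if hi == lo:
        return plf_space[:, :, lo, :]
    w = (theta - thetas[lo]) / (thetas[hi] - thetas[lo])
    return (1 - w) * plf_space[:, :, lo, :] + w * plf_space[:, :, hi, :]

# Example: 5 cameras spanning -30..30 degrees, sliced at 7.5 degrees.
space = np.random.rand(4, 6, 5, 3).astype(np.float32)
layer = slice_angular_space(space, 7.5, np.array([-30.0, -15.0, 0.0, 15.0, 30.0]))
print(layer.shape)  # (4, 6, 3)
```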
  • for Case B, i.e., where the angular space is 2D, the 2D angular space may be regularly sliced using the same approach as for Case A.
  • the herein disclosed encoding is also applicable for other types of slicing.
  • Information relating to how the slicing was performed may thus also be encoded in the bitstream.
  • the slicing may be planar or parametric.
  • the different light field layers may be expressed as a small variation of the first light field layer.
  • the same mechanisms as used in the multiview or 3D extensions of HEVC (MV-HEVC, 3D-HEVC) may be used to encode the extra layers as dependent layers.
  • One way to accomplish this is to encode the most extreme layers (highest positive/negative theta/omega values, denoted LFmax) first (after encoding the PPI and the depth map) in order to get a scalability on the angular space reconstruction accuracy.
  • LFmax positive/negative theta/omega values
  • only one extra LF layer may be needed to get a first approximation of the angular-based color changes in Case A since a simple linear interpolation may create a color variation in the motion parallax.
  • More complex interpolation, such as cubic interpolation (using LF2, LF0 and LF1), may be used. The same applies for the other LF layers as well.
  • for Case B, a 2D interpolation such as bi-linear, bi-cubic, etc., may be used.
  • the layers may therefore be very cheap (i.e., require very few bits) to encode.
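  • as an illustration of why these layers are cheap, the sketch below predicts the intermediate angular layers by simple linear interpolation between LF0 and the most extreme layer LFmax, so that only small residuals would remain to be coded; the linear weighting and names are assumptions of the sketch.

```python
import numpy as np

def predict_intermediate_layers(lf0, lf_max, num_layers):
    """Predict intermediate angular layers by linear interpolation between
    the central layer LF0 and the most extreme transmitted layer LFmax."""
    predictions = []
    for k in range(1, num_layers + 1):
        w = k / float(num_layers)                    # 0 < w <= 1
        predictions.append((1.0 - w) * lf0 + w * lf_max)
    return predictions

lf0 = np.zeros((2, 2, 3), dtype=np.float32)
lf_max = np.ones((2, 2, 3), dtype=np.float32)
print([float(p.mean()) for p in predict_intermediate_layers(lf0, lf_max, 3)])
# [0.333..., 0.666..., 1.0]
```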
  • motion vectors estimated in layer 0 (the PPI) can be re-used for the other layers, thus being transmitted only once.
  • alternatively, blocks may not be used.
  • in that case, the quantized difference between the predicted layer and the true layer is encoded. While there is no motion between layers, some image content (such as dense details) may be more difficult to encode, thus requiring denser sampling in the angular space and/or requiring a further predictor.
  • DWT discrete Wavelet transform
  • one surface projection model may be used for the background image content and one surface projection model may be used for the foreground image content.
  • Reference is now made to Fig. 15 illustrating a method for decoding an encoded bitstream into a panoramic light field (PLF) as performed by an electronic device 30, 30b according to an embodiment.
  • the decoding process may comprise the opposite steps of the above disclosed encoding process.
  • the decoding may involve performing the inverse operations, in reverse order, of the operations performed during the encoding.
  • the decoder may decode the PLF, then reconstruct the PLF space, then fill any holes in the PLF space, then generate one or more 2D image(s) to be shown (or rendered) on a display (since a LF is by itself usually not shown).
  • the electronic device 30, 30b is configured to, in a step S202, receive an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF.
  • the parameters describe a panoramic three- dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from the at least one PLF, and samplings of a PLF space.
  • the processing unit 31 may be configured to perform step S202 by executing functionality of the functional module 31j.
  • the computer program 42b and/or computer program product 41b may thus provide means for this step.
  • the electronic device 30, 30b is configured to, in a step S204, decode the encoded bitstream into the at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
  • the processing unit 31 may be configured to perform step S204 by executing functionality of the functional module 31k.
  • the computer program 42b and/ or computer program product 41b may thus provide means for this step.
  • Reference is now made to Fig. 16 illustrating methods for decoding an encoded bitstream into a panoramic light field (PLF) as performed by an electronic device 30, 30b according to further embodiments.
  • the bitstream may comprise a header.
  • the header may represent the parameters received in step S202.
  • the electronic device 30, 30b may then be configured to, in an optional step S206, decode the header, thereby extracting the panoramic 3D model of the scene, the input camera parameters relating to cameras having captured images of the scene, the back-projection method being used for generating the images from the PLF, and the at least one PLF.
  • back-projection is the inverse application of the projection performed during the pre-processing.
  • the processing unit 31 may be configured to perform step S206 by executing functionality of the functional module 31k.
  • the computer program 42b and/ or computer program product 41b may thus provide means for this step.
  • the decoding may involve extracting the video headers, which notably include the projection model (in order to re-create the point cloud) and any angular space slicing parameters that define how the layers (for such embodiments) were defined and how to reconstruct the angular space defined above.
  • the electronic device 30, 30b may be configured to, in an optional step S208, generate a sequence of PLF spaces from the bitstream and the decoded at least one PLF representing the PLF spaces.
  • the processing unit 31 may be configured to perform step S208 by executing functionality of the functional module 31l.
  • the computer program 42b and/ or computer program product 41b may thus provide means for this step.
  • any holes in the PLF spaces may be filled using, for instance, bi-linear interpolation. Such holes may be generated during the slicing resulting from the sampling in step S116.
  • the electronic device 30, 30b may be configured to, in an optional step S210, detect an entry in at least one PLF space of the sequence of PLF spaces representing a missing pixel; and, in an optional step S212, determine a value for said entry.
  • the processing unit 31 may be configured to perform step S210 by executing functionality of the functional module 31m.
  • the computer program 42b and/or computer program product 41b may thus provide means for this step.
  • the processing unit 31 may be configured to perform step S212 by executing functionality of the functional module 31n.
  • the computer program 42b and/or computer program product 41b may thus provide means for this step.
  • the electronic device 30, 30b may be configured to, in an optional step S214, generate a point cloud by back-projecting the panoramic 3D model from the at least one PLF from said 3D video frames using the back-projection model.
  • the processing unit 31 may be configured to perform step S214 by executing functionality of the functional module 31l.
  • the computer program 42b and/or computer program product 41b may thus provide means for this step.
  • the electronic device 30, 30b may be configured to, in an optional step S216, generate images from said point cloud and PLF space based on the parameters describing the panoramic 3D model, the camera parameters, and the PLF space comprising pixel colors of the scene, by projecting the point cloud with colors coming from the PLF space.
  • the generation in step S216 may be implemented as a read-out from a 3D/4D matrix.
  • the processing unit 31 may be configured to perform step S216 by executing functionality of the functional module 31l.
  • the computer program 42b and/ or computer program product 41b may thus provide means for this step.
  • the bitstream may comprise information relating to positions of the cameras relative to the scene.
  • the electronic device 30, 30b may then be configured to, in an optional step S218, decode the positions of the cameras; and, in an optional step S220, generate the images from the PLF space based on the positions. Colors in the images may then depend on these positions.
  • the processing unit 31 may be configured to perform step S218 by executing functionality of the functional module 31k.
  • the computer program 42b and/or computer program product 41b may thus provide means for this step.
  • the processing unit 31 may be configured to perform step S220 by executing functionality of the functional module 31l.
  • the computer program 42b and/ or computer program product 41b may thus provide means for this step.
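  • purely as an illustration of this rendering stage, the sketch below picks, for each reconstructed 3D point, the color stored in the decoded PLF space for the angle under which a requested virtual camera sees it; the angle convention (measured in the x-z plane, in the spirit of Fig. 7) and all names are assumptions of the sketch.

```python
import numpy as np

def view_dependent_colors(points, plf_xy, plf_space, thetas, cam_center):
    """Pick, for every reconstructed 3D point, the color stored in the PLF
    space for the angle under which the requested (virtual) camera sees it.

    points     : (M, 3) point cloud back-projected from the panoramic depth
    plf_xy     : (M, 2) integer (row, column) panoramic coordinates per point
    plf_space  : (Hp, Wp, N, 3) decoded PLF space, angles given by `thetas`
    cam_center : 3-vector, centre of the virtual camera to render
    """
    rays = cam_center - points                           # point -> camera rays
    theta = np.degrees(np.arctan2(rays[:, 0], rays[:, 2]))
    idx = np.abs(theta[:, None] - thetas[None, :]).argmin(axis=1)   # nearest angle
    return plf_space[plf_xy[:, 0], plf_xy[:, 1], idx]

# The colored points would then be projected with the virtual camera's
# projection matrix to form the rendered 2D image shown on the display.
```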

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

There is provided encoding of a light field (LF) into a bitstream. An LF of a scene and parameters relating to the LF are received. The parameters describe a three-dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model. At least one PLF and the parameters are encoded into the bitstream by the sequence of PLF spaces being sampled into a sequence of PLFs and compression being applied to remove redundancy in the sequence of PLFs. There is also provided decoding of such an encoded bitstream into a sequence of PLFs.

Description

ENCODING AND DECODING OF LIGHT FIELDS
TECHNICAL FIELD
Embodiments presented herein relate to encoding and decoding of light fields, and particularly to methods, an electronic device, computer programs, and a computer program product for encoding and decoding of light fields.
BACKGROUND
The area of 3D (three-dimensional) video and 3D TV is gaining momentum and is considered one possible logical step in consumer electronics, mobile devices, computers and cinemas. The additional dimension on top of common two-dimensional (2D) video offers multiple different directions for displaying the content and improves the potential for interaction between viewers and the content.
The term 3D is usually related to stereoscopic experiences, where each one of the user's eyes is provided with a unique image of a scene. Such unique images may be provided as a stereoscopic image pair. The unique images are then fused by the human brain to create a depth impression (i.e. an imagined 3D view). For example, by presenting a light field using technology that maps each sample to the appropriate ray in physical space, one obtains an auto-stereoscopic visual effect akin to viewing the original scene. Digital technologies for enabling this include placing an array of lenslets over a high-resolution display screen, or projecting the imagery onto an array of lenslets using an array of video projectors. If the latter is combined with an array of video cameras, one can capture and display a time-varying light field. This essentially constitutes a 3D television system. More generally, one way to add the depth dimension to video is by means of so-called stereoscopic video. In stereoscopic video, the left and the right eyes of the viewer are shown slightly different views (i.e., images). This is achieved by using anaglyph, shutter or polarized glasses that allow showing different images to the left and the right eyes of the viewer, in this way creating a perception of depth. The perceived depth of the point in the image is thereby determined by its displacement between the left and the right views.
Some auto-stereoscopic displays allow the viewer to experience depth perception without glasses. These displays project slightly different images in the different directions. This is schematically illustrated in Fig. 1. Fig. 1 schematically illustrates a rendering unit 12 where slightly different images 14a, 14b, 14c from locations 12a, 12b, 12c on the display of the rendering unit 12 are projected towards a viewer, as represented by the eyes 11a, 11b, in front of the rendering unit 12. Therefore, if the viewer is located in a proper position in front of the display, the viewer's left and right eyes see slightly different views of the same scene, which makes it possible to create the perception of depth. In order to achieve smooth parallax and change of the viewpoint when the user moves in front of the display, a number of views (typically 7-28) are generated. It is expected that the number of views may increase to 20-50.
One issue that may arise when using such auto-stereoscopic displays is the transmission, or storage, of the views, as the views may amount to a high bit rate. In stereoscopic video, the left and the right views may be coded independently or jointly. Another way to obtain one view from the other view is by using view synthesis. For example, the issue may be overcome by transmitting a low number (e.g. 1 to 3) of key views and generating the other views by a so-called view synthesis process from the transmitted key views, optionally using additional information such as depth maps. These synthesized views can be located between the key views (interpolated) or outside the range covered by key views (extrapolated).
In general terms, a depth map may be regarded as a simple grayscale image, wherein each pixel indicates the distance between the corresponding pixel from a video object and the image plane of the capturing camera. Disparity, on the other hand, may be regarded as the apparent shift of a pixel which is a consequence of the viewer moving from one viewpoint to another. Depth and disparity are mathematically related and can be interchangeably used. One property of depth/disparity maps is that they contain large smooth surfaces of constant gray levels. This makes them comparatively easy to compress using currently available video coding technology. A so-called 3D point cloud may be reconstructed from the depth map if the 3D camera parameters (such as the intrinsic calibration matrix K for a pinhole camera model, containing the focal lengths, principal point, etc.) are known. The depth map may be measured by specialized cameras, e.g., structured-light or time-of-flight (ToF) cameras, where the depth is correlated respectively with the deformation of a projected pattern or with the round-trip time of a pulse of light.
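As a purely illustrative, non-limiting sketch of the relation between depth and disparity mentioned above, the following assumes a rectified stereo pair with focal length f (in pixels) and baseline B; the function names are illustrative only:

```python
def disparity_to_depth(disparity, focal_px, baseline):
    """Depth from disparity for a rectified stereo pair: Z = f * B / d."""
    return focal_px * baseline / disparity

def depth_to_disparity(depth, focal_px, baseline):
    """Inverse mapping: disparity (in pixels) from metric depth."""
    return focal_px * baseline / depth
```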
One example of a view synthesis technique is depth image based rendering (DIBR). In order to facilitate the view synthesis, DIBR uses depth map(s) of the key view(s). At least in theory also the depth maps of other views could be used. The depth map may be represented by a grayscale image having the same resolution as the view (video frame). Then, each pixel of the depth map represents the distance from the camera to the object for the corresponding pixel in the image/video frame. DIBR generally consists of creating a dense 3D point cloud by back-projection of the depth map and projecting the 3D point cloud to another viewpoint. In order to facilitate the DIBR view synthesis, some parameters need to be signalled to the device or program module that performs the view synthesis. Among those parameters are "z near" and "z far", which represent the closest and the farthest depth values, respectively, in the depth maps for the frame under consideration. These values are needed in order to map the quantized depth map samples to the real depth values that they represent. Another set of parameters that is needed for the view synthesis are camera parameters.
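As a non-limiting illustration of the DIBR principle, the following sketch (Python with NumPy) de-quantizes an 8-bit depth map using z near/z far, back-projects each pixel with the source camera intrinsics, and re-projects the resulting points into a target view. The inverse-depth quantization shown is one commonly used convention, and the pose (R, t) of the target view relative to the source as well as the omission of splatting, z-buffering and hole filling are simplifying assumptions of this sketch only:

```python
import numpy as np

def dibr_synthesize(color, depth_q, K_src, K_dst, R, t, z_near, z_far):
    """Minimal DIBR sketch: de-quantize an 8-bit depth map, back-project each
    pixel to 3D with the source intrinsics, re-project into a target view.
    Splatting, z-buffering and hole filling are intentionally omitted."""
    h, w = depth_q.shape
    # One commonly used inverse-depth mapping of 8-bit samples to metric depth.
    d = depth_q.astype(np.float64) / 255.0
    z = 1.0 / (d * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)

    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # 3 x (h*w)
    pts = (np.linalg.inv(K_src) @ pix) * z.reshape(1, -1)                 # back-projection

    proj = K_dst @ (R @ pts + t.reshape(3, 1))                            # to target view
    valid = proj[2] > 1e-9
    u = np.round(proj[0, valid] / proj[2, valid]).astype(int)
    v = np.round(proj[1, valid] / proj[2, valid]).astype(int)

    out = np.zeros_like(color)
    src = color.reshape(-1, color.shape[-1])[valid]
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[ok], u[ok]] = src[ok]
    return out
```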
Camera parameters for the 3D video are usually split into two parts. The first part relates to internal camera parameters (or intrinsic parameters) and generally represents the optical characteristics of the camera for the image captured, such as the focal length, the coordinates of the image's principal point and the lens distortions. The second part relates to external camera parameters (or extrinsic parameters) and generally represents either the camera position and the direction of the optical axis of the camera in the chosen real-world coordinates, or the positions of the cameras relative to each other and to the objects in the scene. In general terms, both the internal and the external camera parameters may be required in the view synthesis process based on usage of the depth information (such as DIBR).
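As a non-limiting illustration, intrinsic and extrinsic parameters may be composed into a single 3x4 projection matrix as sketched below (a pinhole camera without lens distortion is assumed; function names are illustrative only):

```python
import numpy as np

def projection_matrix(K, R, t):
    """Compose a 3x4 projection matrix P = K [R | t] from the intrinsic matrix K
    and the extrinsic rotation R and translation t (lens distortion not modelled)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project_points(P, X):
    """Project 3D world points X (n x 3) to pixel coordinates (n x 2)."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]
```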
An alternative way to send views of the key cameras is using layered depth video (LDV) that uses multiple layers for scene representation. These layers may represent foreground texture, foreground depth, background texture, and background depth, etc.
Having one color per 3D pixel or volume pixel (voxel) in stereoscopic video is often not enough since some content has varying colors depending on where the viewer is viewing it from. This is typically due to specular lights, or reflective or transparent content. Motion parallax gives the viewer the ability to perceive the 3D structure of static content, but also allows the viewer to see how reflective or transparent the content is. Being able to replicate this on a screen may thus improve the user experience.
Compressing or encoding 3D video with properties as outlined above may be challenging since the input data structure (camera array) contains a lot of redundancy due to the fact that the cameras record the same content but from slightly different viewpoints, while most of the content does not have a varying color depending on the angle from which it is observed.
Besides, in order to get a relatively good 3D rendering and have a proper, smooth motion parallax effect, a comparatively large number of cameras may be needed, further increasing the redundancy.
Thus, there may be a need for finding a representation of the content captured by the cameras that can provide both motion parallax and angular color variation and that can be coded easily. The multiview high efficiency video coding (MV-HEVC) standard (see, for example, MV-HEVC Draft Text 5, JCT3V-E1004-V6, ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11) and the three-dimensional high efficiency video coding (3D-HEVC) draft standard (see, for example, 3D-HEVC Test Model 5, ISO/IEC JTC1/SC29/WG11 N13769, ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11) both yield very low compression efficiency for wide angular resolution (high number of cameras). These standards also have a very large overhead for motion/disparity vector coding. These standards further offer a slow encoding process. The so-called "Layer-Based Representation for Image Based Rendering and Compression" was developed by Dragotti, P. L. et al. at Imperial College London. The image content is split into several zones having the same depth value using image segmentation. Several light field layers for each zone are encoded separately. This approach only works when the content is easy to segment and contains a few elements, such as a toy example. In reality, real videos are much more complex and the number of zones will increase, leading to a sub-optimal compression ratio. Coding the contours is also challenging (demanding in terms of bits).
JP3D (also known as JPEG 2000 3D) is only applicable to images; a straightforward extension to video would yield a rather low compression ratio. JP3D was not directly developed for light field compression.
Hence, there is still a need for an improved encoding of a light field into a bitstream and improved decoding of a bitstream into a panoramic light field.
SUMMARY
An object of embodiments herein is to provide efficient encoding of a light field into a bitstream and efficient decoding of a bitstream into a panoramic light field.
According to a first aspect there is presented a method for encoding a light field (LF) into a bitstream. The method is performed by an electronic device. The method comprises receiving an LF of a scene and parameters relating to the LF. The parameters describe a three-dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model. The method comprises encoding the at least one PLF and the parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.
Advantageously this enables efficient encoding of a light field into a bitstream.

According to a second aspect there is presented an electronic device for encoding an LF into a bitstream. The electronic device comprises a processing unit. The processing unit is configured to receive an LF of a scene and parameters relating to the LF. The parameters describe a 3D model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model. The processing unit is configured to encode the at least one PLF and the parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.

According to a third aspect there is presented a computer program for encoding an LF into a bitstream, the computer program comprising computer program code which, when run on a processing unit, causes the processing unit to perform a method according to the first aspect.
According to a fourth aspect there is presented a computer program product comprising a computer program according to the third aspect and a computer readable means on which the computer program is stored.
According to a fifth aspect there is presented a method for decoding an encoded bitstream into a PLF. The method is performed by an electronic device. The method comprises receiving an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF. The parameters describe a panoramic 3D model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from the at least one PLF, and samplings of a PLF space. The method comprises decoding the encoded bitstream into the at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
Advantageously this enables efficient decoding of a bitstream into a panoramic light field.
According to a sixth aspect there is presented an electronic device for decoding an encoded bitstream into a PLF. The electronic device comprises a processing unit. The processing unit is configured to receive an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF. The parameters describe a panoramic 3D model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from the at least one PLF, and samplings of a PLF space. The processing unit is configured to decode the encoded bitstream into the at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.

According to a seventh aspect there is presented a computer program for decoding an encoded bitstream into a PLF, the computer program
comprising computer program code which, when run on a processing unit, causes the processing unit to perform a method according to the fifth aspect.
According to an eighth aspect there is presented a computer program product comprising a computer program according to the seventh aspect and a computer readable means on which the computer program is stored.
Advantageously the disclosed encoding and decoding provides efficient encoding and decoding of light fields and panoramic light fields, respectively. Advantageously the disclosed encoding and decoding is scalable in the number of cameras. Increasing the number of input cameras will increase the number of bits, but at a lower pace.
Advantageously the disclosed encoding and decoding provides angular scalability (light field), thus creating high fidelity images where motion parallax is available. A 2D or 3D only screen may just drop the angular layers and still be able to display the content. Alternatively, a network node may determine to drop transmission of the angular layers.
Advantageously the disclosed encoding and decoding may handle any input camera setup (lines, planar grid, circular grid, etc.), even non-ordered cameras.
Advantageously the disclosed encoding and decoding require only a few modifications of existing standards such as MV-HEVC and 3D-HEVC and could even be compatible with some setups (such as for a line and/or planar grid).
Advantageously the disclosed encoding and decoding have a competitive compression efficiency compared to existing light field coding schemes.
Advantageously the disclosed encoding and decoding may utilize angular coding that supports transparency/reflections (i.e., not only specular light), especially when a dense representation is kept and transmitted.
Advantageously the disclosed encoding and decoding allows the movie maker to select where to compress more (by giving the movie maker free control of the projection model).
In this respect it is to be noted that any feature of the first, second, third, fourth, fifth, sixth, seventh and eighth aspects may be applied to any other aspect, wherever appropriate. Likewise, any advantage of the first aspect may equally apply to the second, third, fourth, fifth, sixth, seventh, and/or eighth aspect, respectively, and vice versa. Other objectives, features and advantages of the enclosed embodiments will be apparent from the following detailed disclosure, from the attached dependent claims as well as from the drawings.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the element, apparatus, component, means, step, etc." are to be interpreted openly as referring to at least one instance of the element, apparatus, component, means, step, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
BRIEF DESCRIPTION OF THE DRAWINGS
The inventive concept is now described, by way of example, with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram illustrating a rendering unit according to prior art;
Fig. 2 is a schematic diagram illustrating an image communications system according to an embodiment;
Fig. 3a is a schematic diagram showing functional units of an electronic device according to an embodiment;
Figs. 3b and 3c are schematic diagrams showing functional modules of an electronic device according to an embodiment;
Fig. 4 shows one example of a computer program product comprising computer readable means according to an embodiment;
Fig. 5 schematically illustrates parts of an image communications system according to an embodiment;
Fig. 6 schematically illustrates parts of an image communications system according to an embodiment;
Fig. 7 schematically illustrates angular coordinates for cameras according to an embodiment;
Fig. 8 schematically illustrates representation in an angular space according to an embodiment;
Fig. 9 schematically illustrates slicing according to an embodiment;
Fig. 10 schematically illustrates angular space sampling according to an embodiment;
Fig. 11 schematically illustrates a 1D encoding order according to an embodiment;
Fig. 12 schematically illustrates a 2D encoding order according to an embodiment; and
Figs. 13, 14, 15 and 16 are flowcharts of methods according to embodiments.
DETAILED DESCRIPTION
The inventive concept will now be described more fully hereinafter with reference to the accompanying drawings, in which certain embodiments of the inventive concept are shown. This inventive concept may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided by way of example so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. Like numbers refer to like elements throughout the description. Any step or feature illustrated by dashed lines should be regarded as optional.
Embodiments disclosed herein relate to encoding a light field (LF) into a bitstream. In order to obtain such encoding there is provided an electronic device, a method performed by the electronic device, a computer program comprising code, for example in the form of a computer program product, that when run on a processing unit of the electronic device, causes the processing unit to perform the method. Embodiments disclosed herein further relate to decoding an encoded bitstream into a panoramic light field (PLF). In order to obtain such decoding there is provided an electronic device, a method performed by the electronic device, a computer program comprising code, for example in the form of a computer program product, that when run on a processing unit of the electronic device, causes the processing unit to perform the method.
Fig. 2 schematically illustrates an image communications system 20 according to an embodiment. The image communications system 20 comprises an M-by-N camera array 21. The camera array 21 comprises M-by-N cameras, one of which is identified at reference numeral 21a, configured to capture (or record) images of a scene 22. In Fig. 2 the scene is schematically, and for illustrative purposes, represented by a single object (a circle). As the skilled person readily understands, the scene may comprise a variety of objects of possibly different shapes and with possibly different distances to the cameras 21a. Image data captured by the cameras 21a represents a light field of the scene 22 and is transmitted to an electronic device 30, 30a acting as an encoder. The electronic device 30, 30a encodes the light field into a bitstream. The encoded bitstream is communicated over a symbolic communications channel 23. The symbolic communications channel 23 may be implemented as a storage medium or as a transmission medium between two electronic devices. Hence the symbolic communications channel 23 may be regarded as a delayed or real-time communications channel. The encoded bitstream is received by an electronic device 30, 30b acting as a decoder. Hence, when the symbolic communications channel 23 is implemented as a storage medium, the electronic device 30, 30a and the electronic device 30, 30b may be one and the same electronic device 30, 30a, 30b. The electronic device 30, 30b decodes the received bitstream into a panoramic light field. The panoramic light field may be provided to a rendering unit 12 for displaying the panoramic light field (as in Fig. 1). Further details of the electronic device 30, 30a, 30b will now be disclosed.

Fig. 3a schematically illustrates, in terms of a number of functional units, the components of an electronic device 30 according to an embodiment. A processing unit 31 is provided using any combination of one or more of a suitable central processing unit (CPU), multiprocessor, microcontroller, digital signal processor (DSP), application specific integrated circuit (ASIC), field programmable gate array (FPGA), etc., capable of executing software instructions stored in a computer program product 41a, 41b (as in Fig. 4), e.g. in the form of a storage medium 33. Thus the processing unit 31 is thereby arranged to execute methods as herein disclosed. The storage medium 33 may also comprise persistent storage, which, for example, can be any single one or combination of magnetic memory, optical memory, solid state memory or even remotely mounted memory. The electronic device 30 may further comprise a communications interface 32 for communications with, for example, another electronic device 30, an external storage medium, a camera array 21, and a rendering unit 12. As such, the communications interface 32 may comprise one or more transmitters and receivers, comprising analogue and digital components and a suitable number of ports and interfaces for communications. The processing unit 31 controls the general operation of the electronic device 30, e.g. by sending data and control signals to the communications interface 32 and the storage medium 33, by receiving data and reports from the communications interface 32, and by retrieving data and instructions from the storage medium 33. Other components, as well as the related functionality, of the electronic device 30 are omitted in order not to obscure the concepts presented herein.
Fig. 3b schematically illustrates, in terms of a number of functional modules, the components of an electronic device 30, 30a acting as an encoder according to an embodiment. The electronic device 30, 30a of Fig. 3b comprises a number of functional modules: a send and/or receive module 31a, and an encode module 31b. The electronic device 30, 30a of Fig. 3b may further comprise a number of optional functional modules, such as any of a determine module 31c, a reduce module 31d, a generate module 31e, a project module 31f, a detect module 31g, and a sample module 31h. The functionality of each functional module 31a-h will be further disclosed below in the context of which the functional modules 31a-h may be used. In general terms, each functional module 31a-h may be implemented in hardware or in software. Preferably, one or more or all functional modules 31a-h may be implemented by the processing unit 31, possibly in cooperation with functional units 32 and/or 33. The processing unit 31 may thus be arranged to fetch, from the storage medium 33, instructions as provided by a functional module 31a-h and to execute these instructions, thereby performing any steps as will be disclosed hereinafter.

Fig. 3c schematically illustrates, in terms of a number of functional modules, the components of an electronic device 30, 30b acting as a decoder according to an embodiment. The electronic device 30, 30b of Fig. 3c comprises a number of functional modules: a send and/or receive module 31j, and a decode module 31k. The electronic device 30, 30b of Fig. 3c may further comprise a number of optional functional modules, such as a generate module 31l. The functionality of each functional module 31j-l will be further disclosed below in the context of which the functional modules 31j-l may be used. In general terms, each functional module 31j-l may be implemented in hardware or in software. Preferably, one or more or all functional modules 31j-l may be implemented by the processing unit 31, possibly in cooperation with functional units 32 and/or 33. The processing unit 31 may thus be arranged to fetch, from the storage medium 33, instructions as provided by a functional module 31j-l and to execute these instructions, thereby performing any steps as will be disclosed hereinafter.

As noted above, the electronic device 30, 30a and the electronic device 30, 30b may be one and the same electronic device. In such embodiments the functional modules 31a-h of the electronic device 30, 30a and the functional modules 31j-l may be combined. Hence, only one send and/or receive module may be used instead of the separate send and/or receive modules 31a, 31j, only one generate module may be used instead of the separate generate modules 31e, 31l, only one detect module may be used instead of the separate detect modules 31g, 31m, and only one determine module may be used instead of the separate determine modules 31c, 31n.
The electronic device 30, 30a, 30b may be provided as a standalone device or as a part of a further device. The electronic device 30, 30a, 30b may be provided as an integral part of the further device. That is, the components of the electronic device 30, 30a, 30b may be integrated with other components of the further device; some components of the further device and the electronic device 30, 30a, 30b may be shared. For example, if the further device as such comprises a processing unit, this processing unit may be arranged to perform the actions of the processing unit 31 of the electronic device 30, 30a, 30b. Alternatively the electronic device 30, 30a, 30b may be provided as a separate unit in the further device. The further device may be a digital versatile disc (DVD) player, Blu-ray Disc player, a desktop computer, a laptop computer, a tablet computer, a portable wireless device, a mobile phone, a mobile station, a handset, a wireless local loop phone, or a user equipment (UE).
Fig. 4 shows one example of a computer program product 41a, 41b
comprising computer readable means 43. On this computer readable means 43, a computer program 42a can be stored, which computer program 42a can cause the processing unit 31 and thereto operatively coupled entities and devices, such as the communications interface 32 and the storage medium 33, to execute methods for encoding a light field (LF) into a bitstream according to embodiments described herein. On this computer readable means 43, a computer program 42b can be stored, which computer program 42b can cause the processing unit 31 and thereto operatively coupled entities and devices, such as the communications interface 32 and the storage medium 33, to execute methods for decoding an encoded bitstream into a panoramic light field (PLF) according to embodiments described herein. The computer program 42b and/or computer program product 41b may thus provide means for performing any steps as herein disclosed. In the example of Fig. 4, the computer program product 41a, 41b is illustrated as an optical disc, such as a CD (compact disc) or a DVD (digital versatile disc) or a Blu-Ray disc. The computer program product 41a, 41b could also be embodied as a memory, such as a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or an electrically erasable programmable read-only memory (EEPROM) and more particularly as a non-volatile storage medium of a device in an external memory such as a USB (Universal Serial Bus) memory or a Flash memory, such as a compact Flash memory. Thus, while the computer programs 42a, 42b are here schematically shown as a track on the depicted optical disc, the computer programs 42a, 42b can be stored in any way which is suitable for the computer program product 41a, 41b.
Figs. 13 and 14 are flow charts illustrating embodiments of methods for encoding a light field (LF) into a bitstream as performed by an electronic device 30, 30a. The methods are advantageously provided as computer programs 42a. Figs. 15 and 16 are flow charts illustrating embodiments of methods for decoding an encoded bitstream into a panoramic light field (PLF) as performed by an electronic device 30, 30b. The methods are advantageously provided as computer programs 42b.
Encoding
Reference is now made to Fig. 13 illustrating a method for encoding a light field (LF) into a bitstream as performed by an electronic device 30, 30a according to an embodiment.
The electronic device 30, 30a is configured to, in a step S102, receive an LF of a scene and parameters relating to the LF. The parameters describe a three-dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field (PLF), and a projection method for generating a PLF space from the images and the 3D model. The processing unit 31 may be configured to perform step S102 by executing functionality of the functional module 31a. The computer program
42a and/or computer program product 41a may thus provide means for this step.
The electronic device 30, 30a is configured to, in a step S118, encode the at least one PLF and the parameters into the bitstream. The encoding is performed by sampling the sequence of PLF spaces into a sequence of PLFs and by applying compression to remove redundancy in the sequence of PLFs. The processing unit 31 may be configured to perform step S118 by executing functionality of the functional module 31b. The computer program 42a and/or computer program product 41a may thus provide means for this step.

Reference is now made to Fig. 14 illustrating methods for encoding a light field (LF) into a bitstream as performed by an electronic device 30, 30a according to further embodiments.
The encoding may be used to convert initial LF input data representing multiview videos to a panoramic light field 3D/4D video. Hence, the LF may represent images defining multiple views comprising pixels of the scene, where the images have been captured by an N-by-M array of cameras, where at least one of M and N is larger than 1, and using 3D model parameters for at least one captured image.
An LF may thus be acquired from a set of cameras arranged on a grid (such as on a plane or circular grid) or on a line. The initial LF input data may be, but is not limited to: several 2D videos acquired by L = M·N real or virtual cameras, with associated depth maps, a subset of this data, and/or time- and frequency-synchronized videos.
In the following, two cases (denoted Case A and Case B, respectively) will be considered. The first case is related to scenarios where the array of cameras is one-dimensional (1D), i.e., where one of M and N is equal to 1, and the second case is related to scenarios where the array of cameras is two-dimensional (2D), i.e., where both M and N are larger than 1.
Preprocessing
Preprocessing, which may be performed after the receiving in step S102 but prior to the encoding in step S118, may comprise at least some of the following steps (which may be performed for each of the video frames, i.e., images, captured by the cameras):
1. Selection of a panoramic projection model.
2. Generation of point clouds from input videos/images and a 3D model, such as a depth representation, of the scene being captured by the cameras.
3. Merging and simplification of the point cloud so that each pixel in a panoramic view of the scene has only one 3D point, for example by projecting the depth representation to a panoramic view, and back-projecting each 3D point and the depth representation of the panoramic view.
4. Generation of the 3D/4D LF space by projecting the merged and simplified point cloud to each input view.
5. Filling of holes (pixel per pixel), in each camera direction, in the 3D/4D LF space.
6. Slicing of the 3D/4D space into several 2D videos/images, for example depending on the input format of the encoder.
An outline of the whole chain is sketched below, and a more detailed description of each of these steps then follows.
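The following sketch is purely illustrative: it only shows how steps 2-6 above may be chained, and the concrete operations are supplied as callables since their possible implementations are sketched in the subsections that follow.

```python
def preprocess_light_field(views, depth_maps, cameras,
                           back_project, merge_and_simplify,
                           build_plf_space, fill_holes, slice_plf_space):
    """Illustrative outline of preprocessing steps 2-6; the concrete operations
    are supplied as callables and are sketched in the following subsections."""
    # 2. One point cloud per input view, by back-projecting its depth map.
    clouds = [back_project(depth, cam) for depth, cam in zip(depth_maps, cameras)]
    # 3. Merge and simplify so that each panoramic pixel keeps a single 3D point.
    merged = merge_and_simplify(clouds)
    # 4. Build the 3D/4D PLF space by projecting the merged cloud into every view.
    plf_space = build_plf_space(merged, views, cameras)
    # 5. Fill occlusion holes, per camera direction.
    plf_space = fill_holes(plf_space)
    # 6. Slice the PLF space into 2D images/videos that a 2D encoder can ingest.
    return slice_plf_space(plf_space)
```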
Select a panoramic projection model
At least some of the embodiments disclosed herein are based on embedding the 3D structure into one global 2D depth map (or 3D mesh) that can later be used to recreate as many views of the captured scene as possible. A projection model that will merge all the views into one global view that will have no overlapping at all (i.e., defining a bijective function) may therefore be used.
There may be several ways to achieve this. One way is to define a 3D surface and project the 3D content onto it. This may be visualized as a virtual surface being a large sparse sensor of a virtual camera, covering all the input cameras in the N-by-M array of cameras and where each pixel would have a quite different 3D location and projection angle. Physically, it would be similar to having one large sparse light capturing sensor that covers all the cameras instead of having one small light capturing sensor per camera. A parametric surface shape and location with a projection equation may therefore be defined. Since 2D videos may later be encoded, the surface is defined as being a distorted 2D rectangle. Fig. 5 schematically illustrates a top view of the image communications system 20, where a surface 51 has been schematically illustrated. In Fig. 5 cameras 21a are placed along an x-axis and configured to capture a scene 22. The surface 51 is placed between the cameras 21a and the scene 22. Projections perpendicular from the surface 51 are schematically illustrated at reference numeral 52 on the side of the surface 51 facing the scene 22.
In this respect, one objective may thus be to find a good surface distortion and localization such that all the pixels of the input images are projected onto the surface (assuming a sufficiently high surface sampling resolution), and all the pixels occluded in the input images are projected onto the surface. This may be regarded as a hard problem but it may be solved using spline/radial basis functions within an optimization framework, or the problem itself may be simplified by relaxing some constraints. For example, the problem may be simplified by assuming a per-pixel (or per-column for Case A) pinhole camera. The generation of the panoramic image and depth map (both of size Wp-by-Hp pixels) is performed such that for each column c (from 1 to Wp) the projection matrix Pc may be defined such that:
P(x, y) = Kp M(x, y), with M(x, y) = [R(x, y) T(x, y)]. Kp is a scaled calibration matrix (with, for instance, Kp[0][0] = s·fx, where s = Wp/W) of the cameras (here assumed to be equal for all cameras), and R(x, y) and T(x, y) correspond to a virtual interpolation of the cameras such that, for Case A:
T(x, y) = (1 − a)·Ti + a·Ti+1
R(x, y) = EulerToMat((1 − a)·Ei + a·Ei+1)
with i = floor(x·N/Wp) and a = x·N/Wp − i, Ti being the position of input camera i, Ei being the Euler angles of the orientation of camera i, and EulerToMat representing a function that transforms Euler angles to a rotation matrix. Angle-axis or quaternion representations can be used as well. In Case B, with a planar array,
a bi-cubic or bilinear interpolation of, for example, the four closest camera locations may be used, or any other surface/spline based approximation may be used, as mentioned above. Graphically, this corresponds to creating virtual cameras in between the cameras actually used when capturing images of the scene. Fig. 6 schematically illustrates one example of a camera array (for Case A) comprising cameras 21a and with interpolated cameras 21b
configured to capture the scene 22. This is a simple but relatively efficient way to interpolate the camera locations. As is understood by the skilled person, other interpolations, such as splines, and other projections, such as orthogonal or perpendicular projections, etc., may alternatively be used.
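As a non-limiting illustration, the per-column camera interpolation for Case A may be sketched as follows; the Euler-angle convention, the 0-based index handling and the function names are assumptions of this sketch only:

```python
import numpy as np

def euler_to_mat(e):
    """EulerToMat: rotation matrix from Euler angles (radians), here with the
    convention R = Rz(ez) @ Ry(ey) @ Rx(ex); other conventions are possible."""
    ex, ey, ez = e
    cx, sx = np.cos(ex), np.sin(ex)
    cy, sy = np.cos(ey), np.sin(ey)
    cz, sz = np.cos(ez), np.sin(ez)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def interpolated_projection(x, Wp, positions, eulers, K_p):
    """Per-column virtual camera for Case A: P(x, y) = Kp [R(x, y) T(x, y)],
    with position and Euler angles linearly interpolated between the two
    surrounding input cameras. `positions` and `eulers` are N x 3 arrays."""
    N = len(positions)
    s = x * (N - 1) / float(Wp)        # continuous camera index for column x
    i = min(int(np.floor(s)), N - 2)
    a = s - i
    T = (1.0 - a) * np.asarray(positions[i]) + a * np.asarray(positions[i + 1])
    E = (1.0 - a) * np.asarray(eulers[i]) + a * np.asarray(eulers[i + 1])
    M = np.hstack([euler_to_mat(E), T.reshape(3, 1)])
    return K_p @ M
```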
With the projection matrices being defined, a per-pixel (or per-column) projection of all the points of the point cloud onto the panoramic image and associated depth map may be performed and only the necessary points, appearing in the associated column, may be kept. As is understood by the skilled person, other types of surfaces and surface projections (shape, location) may alternatively be used. For example, a semi-rectangular cylinder covering the foreground content may be used, even if the cameras are aligned on a plane. As is also understood by the skilled person, the sampling rate may be adapted to the surface curvature so that high-curved areas are more densely sampled than low-curved areas.
Generation of point clouds
The electronic device 30, 30a may be configured to, in an optional step S104, determine a point cloud from the images and the parameters describing the 3D model and the camera parameters. The processing unit 31 may be configured to perform step S104 by executing functionality of the functional module 31c. The computer program 42a and/or computer program product 41a may thus provide means for this step.

The electronic device 30, 30a may further be configured to back-project all (or part of) the images using the associated depth maps. This will generate L point clouds, denoted PCx (where x goes from 1 to L).
As is understood by the skilled person, a 3D mesh can be converted to, or expressed as, a 3D point cloud by sampling the faces of the mesh.
Merging and simplification of the point cloud
The electronic device 30, 30a may be configured to, in an optional step S106, reduce the number of points in the point cloud such that each pixel in the PLF has only one point in the point cloud. The processing unit 31 may be configured to perform step S106 by executing functionality of the functional module 31d. The computer program 42a and/or computer program product 41a may thus provide means for this step.
The electronic device 30, 30a may be configured to, in an optional step S108, generate a panoramic 3D model from the projection model by projecting the point cloud. The 3D model may be a depth map. The processing unit 31 may be configured to perform step S108 by executing functionality of the functional module 31e. The computer program 42a and/or computer program product 41a may thus provide means for this step.

The electronic device 30, 30a may be configured to, in an optional step S110, project the point cloud to each input image of the cameras, and to generate the at least one PLF using the projection model by storing the different colors in a PLF space. The processing unit 31 may be configured to perform step S110 by executing functionality of the functional module 31f. The computer program 42a and/or computer program product 41a may thus provide means for this step.
Enabling each pixel in the panoramic depth (PPD) to only have one 3D point may be implemented in different ways.
One way is to project all the points to the panoramic depth map using the projection model. Then, the panoramic depth map may be back-projected, using the inverse of the projection model, in order for a single point cloud to be obtained.
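As a non-limiting illustration, this merging may be sketched as a z-buffer over the panoramic pixels; the callable project_to_panorama and the array layout are illustrative assumptions of the sketch:

```python
import numpy as np

def merge_point_cloud(points, colors, project_to_panorama, Wp, Hp):
    """Keep, per panoramic pixel, only the closest 3D point (a z-buffer).
    `project_to_panorama` is an illustrative callable returning (u, v, depth)
    for a 3D point; `points` is n x 3 and `colors` n x 3."""
    ppd = np.full((Hp, Wp), np.inf)           # panoramic depth map (PPD)
    keep = np.full((Hp, Wp), -1, dtype=int)   # index of the surviving point per pixel
    for idx, p in enumerate(points):
        u, v, d = project_to_panorama(p)
        u, v = int(round(u)), int(round(v))
        if 0 <= u < Wp and 0 <= v < Hp and d < ppd[v, u]:
            ppd[v, u] = d
            keep[v, u] = idx
    kept = keep[keep >= 0]
    return points[kept], colors[kept], ppd
```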
The point cloud may be projected to the panoramic depth as described above with reference to how the panoramic projection model may be selected.
Generation of a 3D/4D panoramic light field space
In general terms, the panoramic light field space (PLF space) may be defined as being the 3D or 4D space with the following parameters: A first axis, denoted X, corresponds to the x axis of the panoramic image PPI (i.e. the sampled projection surface 51), thus ranging from 1 to Wp, where Wp is the resolution of the LF in the x-direction. A second axis, denoted Y, corresponds to the y axis of the panoramic image PPI, thus ranging from 1 to Hp, where Hp is the resolution of the LF in the y-direction. Ch denotes the horizontal camera index on the input camera array, ranging from 1 to N. The cameras are considered ordered from 1 being the left-most camera to N being the right-most camera. Cv corresponds to the vertical camera index on the input camera array, ranging from 1 to M. The cameras are considered ordered from 1 being the up-most camera to M being the bottom-most camera. This applies only if the input camera array is 2D (i.e., Case B only) and the vertical parallax is desired to be kept. Ch and Cv can be converted to an angular space using the parameters theta and omega instead of the camera index spaces. Figs. 7 and 8 provide a representation of theta and omega. In Fig. 7, Ci-1, Ci, and Ci+1 are three different camera indexes of cameras 21a placed along the x-axis. Thus, a 3D point P from a point cloud can be defined by its 3D coordinates (X, Y, Z) but also by an angular space theta, omega and length (which can be the depth Z).
This PLF space contains only color information since the 3D structure is encoded in the panoramic 3D model and projection model.
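As a non-limiting illustration, the PLF space may be held as a dense array indexed by the axes defined above; the dimensions and the use of NaN to mark missing samples are illustrative assumptions of this sketch:

```python
import numpy as np

# Example dimensions only; Case A corresponds to M == 1.
Wp, Hp, N, M = 512, 256, 8, 1

# Colors indexed by panoramic pixel (Y, X) and camera indices (Cv, Ch);
# NaN marks a hole, i.e. a pixel not observed from that camera direction.
plf_space = np.full((Hp, Wp, M, N, 3), np.nan, dtype=np.float32)

def set_sample(y, x, cv, ch, color):
    """Store the color observed at panoramic pixel (x, y) from camera (ch, cv)."""
    plf_space[y, x, cv, ch] = color
```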
Filling of holes in the 3D/4D light field space
In general terms, in the PLF space, so-called "holes", or areas with missing pixel values, may appear when there is occlusion (i.e., where foreground content covers background content). A 3D point that belongs to the background content may in such cases not be projected onto all the views.
Besides, the pixels around the edges of the panoramic image may only appear in a few viewpoints since the viewpoint makes the pixels shift to the left/right. Thus some pixels may be shifted outside the input image size, as shown in the non-hatched area of Fig. 10a (see below).
Such occlusions may be filled by determining values for the missing pixels. Hence the electronic device 30, 30a may be configured to, in an optional step S112, detect an entry in the PLF space representing a missing pixel; and, in an optional step S114, determine a value for the entry. Determining the value for the entry causes the hole to be filled. Filling the holes in the PLF space may be accomplished by employing a simple linear interpolation/extrapolation. The process of hole filling is also denoted inpainting. The processing unit 31 may be configured to perform step S112 by executing functionality of the functional module 31g. The computer program 42a and/or computer program product 41a may thus provide means for this step. The processing unit 31 may be configured to perform step S114 by executing functionality of the functional module 31c. The computer program 42a and/or computer program product 41a may thus provide means for this step.
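As a non-limiting illustration, hole filling by linear interpolation along the horizontal camera axis (Case A) may be sketched as follows, assuming the array layout of the earlier sketch; outside the observed range the end values are simply extended, and the nested loops are kept for clarity only:

```python
import numpy as np

def fill_holes_1d(plf_space):
    """Fill missing colors (NaN) along the horizontal camera axis by linear
    interpolation between the nearest observed cameras; outside the observed
    range the end values are extended."""
    Hp, Wp, M, N, C = plf_space.shape
    cams = np.arange(N)
    for y in range(Hp):
        for x in range(Wp):
            for cv in range(M):
                for c in range(C):
                    line = plf_space[y, x, cv, :, c]
                    known = ~np.isnan(line)
                    if known.any() and not known.all():
                        line[~known] = np.interp(cams[~known], cams[known], line[known])
    return plf_space
```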
Slicing of the panoramic light field space
The electronic device 30, 30a may be configured to, in an optional step S116, sample the PLF space along two-dimensional planes so as to slice the PLF space into slices. The processing unit 31 may be configured to perform step S116 by executing functionality of the functional module 31h. The computer program 42a and/or computer program product 41a may thus provide means for this step. Each slice represents a two-dimensional image. All slices have a color variation but a common panoramic 3D structure: the panoramic 3D model. Fig. 9 schematically illustrates a first example where a slice 90 is taken along the camera index axis (Ch), at Ch = 2, and a second example where a slice 91 is taken along a diagonal direction of the camera index axis and the x-axis.
In some embodiments, only a part of the PLF space (and hence not the complete 3D/4D space) is encoded. Slicing the space may then be regarded as sampling the space with a (possibly planar) 2D/3D path. One slice corresponds to one PLF. The rendering (see below) may also be achieved by slicing this space.
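As a non-limiting illustration, the two slicings of Fig. 9 may be sketched as follows, again assuming the array layout of the earlier sketch; an interpolation between neighbouring cameras could replace the nearest-camera lookup:

```python
import numpy as np

def slice_fixed_camera(plf_space, ch, cv=0):
    """Slice taken at a fixed camera index, e.g. Ch = 2 (first example of Fig. 9)."""
    return plf_space[:, :, cv, ch, :]

def slice_diagonal(plf_space, cv=0):
    """Diagonal slice: the camera index varies linearly with the panoramic x
    coordinate (second example of Fig. 9); the nearest camera is used here."""
    Hp, Wp, M, N, C = plf_space.shape
    ch_of_x = np.round(np.linspace(0, N - 1, Wp)).astype(int)
    out = np.empty((Hp, Wp, C), dtype=plf_space.dtype)
    for x in range(Wp):
        out[:, x, :] = plf_space[:, x, cv, ch_of_x[x], :]
    return out
```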
As disclosed above, there may be different ways to perform the actual encoding (i.e., how to sample the sequence of PLF spaces into a sequence of PLFs and how to apply compression to remove redundancy in the sequence of PLFs). Further details of the encoding in step S118 will now be disclosed.
For example, at least one PLF may be encoded as a sequence of 3D video frames (or 4D video frames, or a set of 2D video frames with a panoramic 3D mesh or depth map) having dependent layers. The layers may be encoded one by one in a predetermined order. Encoding of layer k+1 may be dependent on encoding of layer k for K layers, where 0 < k < K−1. Encoding the at least one PLF may comprise encoding a quantized pixel difference between layer k+1 and layer k. The 3D model may be represented by a depth map or a 3D mesh of the scene. Further, the electronic device 30, 30a may be configured to, in an optional step S120, encode positions of the cameras relative to the scene. The processing unit 31 may be configured to perform step S120 by executing functionality of the functional module 31b. The computer program 42a and/or computer program product 41a may thus provide means for this step. Since a PLF space and its associated panoramic 3D mesh or depth map (PPD) have been generated, the PLF space may be sliced, as in step S116 above, thus generating several 2D images that have the same 3D structure but with a color variation.
The first PLF slice (2D image/video) to be encoded is denoted PPI. As will be further disclosed below, the encoding of the remaining slices will be based on this first slice.
The PPI may be encoded using known 2D video coding techniques such as the ones used in HEVC. Although HEVC (or other 2D video codecs) may be used as is, such known 2D video coding techniques may be adapted in order to handle the video resolution increase. In fact, in order to avoid a 3D resolution drop due to the re-projection/resampling process, the panoramic image resolution may have to be increased, thus leading to large image sizes, typically more than 4K (i.e., larger than 3840-by-2160 pixels). One possibility is to increase the block sizes. The depth map PPD can be stored using only one component (typically the luminance component) and then any existing coding techniques for standard 2D videos may be applied. The depth map is generally easier to encode than the PPI and more specific coding techniques, such as the ones used in 3D-HEVC, may be used. For instance, the motion vectors estimated when coding the PPI can be re-used for encoding the depth map. This depth map may thus correspond to a new dependent layer, say layer 1, when using MV-HEVC or an equivalent encoding technique. Alternatively a 3D mesh may be encoded instead of the depth map. At this stage, the herein disclosed encoding is able to generate both 2D and 3D videos since it contains both the colors and their associated depth (or position for the 3D mesh) of most of the necessary pixels.
Encoding the panoramic projection model
In general terms, a decoder will need to know what projection model was used when creating the PLF and hence the projection model information may be encoded in the bitstream as well. This can be achieved in various ways. Examples include, but are not limited to: equations, a surface mesh, a matrix representation, a UV-mapping, etc. Also information defining the camera locations may be encoded in the bitstream.
Encoding the angular space
Different ways of encoding the light field (as represented by the angular space) will now be disclosed. The angular space typically has one dimension (Case A) or two dimensions (Case B). In more detail, either the cameras 21a are aligned on a line and/or a horizontal or vertical motion parallax may be allowed, or the cameras are set up in a 2D array and both horizontal and vertical motion parallax may be allowed. In the first case (Case A) the angular space will have only one dimension and in the second case (Case B) the angular space will have two dimensions.
The angular space may be encoded using dependent layers. In this
embodiment the angular space is sampled with a regular slicing of the PLF space. By slicing is meant sampling (with interpolation) the PLF space in order to generate rectangular images that then may be encoded.
Figs. 10a and 10b provide illustrations of the angular space sampling of the PPI and LF1 for Case A, i.e., where the angular space is 1D. The PPI corresponds to the pixel colors for which theta is 0. This is referred to as layer 0 (denoted LF0). LF1 represents one slice of the angular space theta. Theta may be replaced by the camera index and the space may be sliced in the camera dimension.
An example of regular slicing can be such that: LF1 corresponds to the pixel colors of all pixels where the angle is equal to, say, 15 degrees; LF2 corresponds to the pixel colors of all pixels where the angle is equal to, say, -15 degrees; LF3 corresponds to the pixel colors of all pixels where the angle is equal to, say, 30 degrees; LF4 corresponds to the pixel colors of all pixels where the angle is equal to, say, -30 degrees; etc. LFx, where x denotes an index of the slice, are images of the same size as the PPI and, for most of the content, will have the same or nearly the same pixel colors.
The hatched area in Fig. 10a corresponds to the valid space where the pixels will have colors. This space may be discrete since the input data is a discrete number of cameras and hence an interpolation technique (bilinear, bi-cubic, etc.) may be used to obtain the color value if the slicing lands in between two theta values. When there is no color (outside the hatched area), the color of the last valid theta at the same pixel location (x, y) may be used, or an extrapolation technique may be used to obtain the color. In Fig. 10a the y-axis has been omitted to ease the illustration. The x-axis corresponds to the x-axis of the PPI (but may also be regarded as the real X-axis of the 3D space coordinates).
For Case B, i.e., where the angular space is 2D, the angular space may be regularly sliced using the same approach as for Case A.
The herein disclosed encoding is also applicable for other types of slicing. Information relating to how the slicing was performed may thus also be encoded in the bitstream. The slicing may be planar or parametric.
The different light field layers may be expressed as a small variation of the first light field layer. Thus the same mechanisms as used in the multiview or 3D extensions of HEVC (MV-HEVC, 3D-HEVC) may be used to encode the extra layers as dependent layers. One way to accomplish this is to encode the most extreme layers (highest positive/negative theta/omega values, denoted LFmax) first (after encoding the PPI and the depth map) in order to get a scalability on the angular space reconstruction accuracy. In fact, only one extra LF layer may be needed to get a first approximation of the angular-based color changes in Case A since a simple linear interpolation may create a color variation in the motion parallax.
Another way to accomplish this is to encode the quantized difference between the prediction given by (LF0, LFmax) or the two closest encoded layers. For all the light field layers, only the quantized pixel difference between the predictor and the real LF may be encoded. Each layer may be encoded one by one in a specific order that achieves a preferred compression ratio. The predictors may thus be estimated based on the previously encoded layers. Fig. 11 schematically illustrates an example of encoding order for Case A with a total of 7 light field layers (including LF0 = PPI). The numbers in Fig. 11 are the coding order (if not counting the depth map layer).
In terms of predictors, LF1 uses the LF0 pixel colors as a predictor. This is symbolically expressed as pred(LF1) = LF0. LF2 uses a simple interpolation and/or extrapolation between LF0 and LF1. This is symbolically expressed as pred(LF2) = LF0 + a·(LF1 − LF0), with the parameter a being the distance ratio such that a = (theta2 − theta0)/(theta1 − theta0). Similarly, LF3's predictor is pred(LF3) = LF0 + b·(LF1 − LF0) with b = (theta3 − theta0)/(theta1 − theta0), for example when using a linear interpolation. More complex interpolation, such as cubic interpolation (using LF2, LF0 and LF1), may be used. The same applies for the other LF layers as well.
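As a non-limiting illustration, the prediction and residual coding of a light field layer may be sketched as follows; the quantization step value and the omission of any entropy coding of the residual are assumptions of this sketch:

```python
import numpy as np

def encode_layer(lf_k, lf_0, lf_ref, theta_k, theta_0, theta_ref, step=4.0):
    """Predict a layer by linear interpolation between two already coded layers
    (e.g. pred(LF2) = LF0 + a*(LF1 - LF0)) and quantize the residual."""
    a = (theta_k - theta_0) / (theta_ref - theta_0)
    pred = lf_0 + a * (lf_ref - lf_0)
    return np.round((lf_k - pred) / step).astype(np.int16)

def decode_layer(residual_q, lf_0, lf_ref, theta_k, theta_0, theta_ref, step=4.0):
    """Rebuild the layer from the coded residual and the same predictor."""
    a = (theta_k - theta_0) / (theta_ref - theta_0)
    pred = lf_0 + a * (lf_ref - lf_0)
    return pred + residual_q * step
```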
Fig. 12 schematically illustrates an example of encoding order for Case B with a total of 25 light field layers (including LF0 = PPI). In the 2D case, instead of using a 1D interpolation, a 2D interpolation (such as bi-linear, bi-cubic, etc.) is used. In some cases, e.g. when there are no transparent objects in the scene, disparity vectors are not needed. The layers may therefore be very cheap (i.e., require very few bits) to encode. Additionally or alternatively, motion vectors estimated in layer 0 (PPI) can be re-used for the other layers, thus being transmitted only once.
Additionally or alternatively, blocks may not be used. In such cases the quantized difference between the predicted layer and the true layer is encoded. While there is no motion, some image content (such as dense details) may be more difficult to encode, thus requiring denser sampling in the angular space and/or requiring a further predictor.
Existing techniques that are able to encode 3D or 4D spaces, using for instance blocks, such as JP3D, may additionally or alternatively be used for encoding the PLF. Such techniques may also be used to encode the PLF sampled space (or a transformed version of it). Encoding techniques based on the discrete wavelet transform (DWT) may be used as well.
It may sometimes be difficult to find a surface projection that covers all the input pixels. Therefore, more than one surface projection may be used, thus generating several layers of PPI and panoramic 3D models. By sending the appropriate surface projection parameters, a decoder would still be able to select the proper model. For example one surface projection model may be used for the background image content and one surface projection model may be used for the foreground image content.
Decoding
Reference is now made to Fig. 15 illustrating a method for decoding an encoded bitstream into a panoramic light field (PLF) as performed by an electronic device 30, 30b according to an embodiment.
In principle, the decoding process may comprise the opposite steps of the above disclosed encoding process. Expressed alternatively, the decoding may involve performing the inverse operations, in reverse order, of the operations performed during the encoding. For example, the decoder may decode the PLF, then reconstruct the PLF space, then fill any holes in the PLF space, then generate one or more 2D image(s) to be shown (or rendered) on a display (since an LF is by itself usually not shown).

The electronic device 30, 30b is configured to, in a step S202, receive an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF. The parameters describe a panoramic three-dimensional (3D) model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from the at least one PLF, and samplings of a PLF space. The processing unit 31 may be configured to perform step S202 by executing functionality of the functional module 31j. The computer program 42b and/or computer program product 41b may thus provide means for this step.

The electronic device 30, 30b is configured to, in a step S204, decode the encoded bitstream into the at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters. The processing unit 31 may be configured to perform step S204 by executing functionality of the functional module 31k. The computer program 42b and/or computer program product 41b may thus provide means for this step.
Reference is now made to Fig. 16 illustrating methods for decoding an encoded bitstream into a panoramic light field (PLF) as performed by an electronic device 30, 30b according to further embodiments.
For example, the bitstream may comprise a header. The header may represent the parameters received in step S202. The electronic device 30, 30b may then be configured to, in an optional step S206, decode the header, thereby extracting the panoramic 3D model of the scene, the input camera parameters relating to cameras having captured images of the scene, the back-projection method being used for generating the images from the PLF, and the at least one PLF. In general terms, back-projection is the inverse application of the projection performed during the pre-processing. The processing unit 31 may be configured to perform step S206 by executing functionality of the functional module 31k. The computer program 42b and/or computer program product 41b may thus provide means for this step. Hence, the decoding may involve extracting the video headers, which notably include the projection model in order to re-create the point cloud, and the angular space slicing parameters, if any, that define how the layers (for such embodiments) were defined and how to reconstruct the angular space defined above.

Further, the electronic device 30, 30b may be configured to, in an optional step S208, generate a sequence of PLF spaces from the bitstream and the decoded at least one PLF representing the PLF spaces. The processing unit 31 may be configured to perform step S208 by executing functionality of the functional module 31l. The computer program 42b and/or computer program product 41b may thus provide means for this step.
Any holes in the PLF spaces may be filled using, for instance, bi-linear interpolation. Such holes may be generated during the slicing resulting from the sampling in step S116. Hence, the electronic device 30, 30b may be configured to, in an optional step S210, detect an entry in at least one PLF space of the sequence of PLF spaces representing a missing pixel; and, in an optional step S212, determine a value for said entry. The processing unit 31 may be configured to perform step S210 by executing functionality of the functional module 31m. The computer program 42b and/or computer program product 41b may thus provide means for this step. The processing unit 31 may be configured to perform step S212 by executing functionality of the functional module 31n. The computer program 42b and/or computer program product 41b may thus provide means for this step.
Rendering
At least some of the following steps may be performed in order to render images from the LF. For example, the electronic device 30, 30b may be configured to, in an optional step S214, generate a point cloud by back-projecting the panoramic 3D model from the at least one PLF from said 3D video frames using the back-projection model. The processing unit 31 may be configured to perform step S214 by executing functionality of the functional module 31l. The computer program 42b and/or computer program product 41b may thus provide means for this step.
For example, the electronic device 30, 30b may be configured to, in an optional step S216, generate images from said point cloud and PLF space based on the parameters describing the panoramic 3D model, camera parameters, and the PLF space comprising pixel colors of the scene, by projecting the point cloud with colors coming from the PLF space. In some cases, there may be no need to use the back-projection parameters as such since each point of the point cloud is associated with a set of colors in the PLF space; the generation in step S216 may be implemented as a reading in a 3D/4D matrix. The processing unit 31 may be configured to perform step S216 by executing functionality of the functional module 31l. The computer program 42b and/or computer program product 41b may thus provide means for this step.
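As a non-limiting illustration, the rendering of one view from the reconstructed point cloud and the PLF colors may be sketched as follows; the nearest-angle color selection and the omission of z-buffering and color interpolation are simplifying assumptions of this sketch:

```python
import numpy as np

def render_view(points, plf_colors, angles, view_angle, P_view, width, height):
    """Project the reconstructed point cloud into the requested view, coloring
    each point with the PLF sample whose angle is closest to the viewing angle.
    `plf_colors` is n x n_angles x 3 and `angles` holds the sampled theta values."""
    k = int(np.argmin(np.abs(np.asarray(angles) - view_angle)))
    colors = plf_colors[:, k, :]

    Xh = np.hstack([points, np.ones((points.shape[0], 1))])
    x = (P_view @ Xh.T).T
    front = x[:, 2] > 1e-9
    u = np.round(x[front, 0] / x[front, 2]).astype(int)
    v = np.round(x[front, 1] / x[front, 2]).astype(int)
    colors = colors[front]

    image = np.zeros((height, width, 3), dtype=colors.dtype)
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    image[v[ok], u[ok]] = colors[ok]
    return image
```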
For example, the bitstream may comprise information relating to positions of the cameras relative to the scene. The electronic device 30, 30b may then be configured to, in an optional step S218, decode the positions of the cameras; and, in an optional step S220, generate the images from the PLF space based on the positions. Colors in the images may then depend on these positions. The processing unit 31 may be configured to perform step S218 by executing functionality of the functional module 31k. The computer program 42b and/or computer program product 41b may thus provide means for this step. The processing unit 31 may be configured to perform step S220 by executing functionality of the functional module 31l. The computer program 42b and/or computer program product 41b may thus provide means for this step.

The inventive concept has mainly been described above with reference to a few embodiments. However, as is readily appreciated by a person skilled in the art, other embodiments than the ones disclosed above are equally possible within the scope of the inventive concept, as defined by the appended patent claims.

Claims

1. A method for encoding a light field, LF, into a bitstream, the method being performed by an electronic device (30, 30a), comprising the steps of: receiving (S102) an LF of a scene and parameters relating to the LF, the parameters describing a three-dimensional, 3D, model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field, PLF, and a projection method for generating a PLF space from said images and said 3D model, and
encoding (S118) said at least one PLF and said parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.
2. The method according to claim 1, wherein said LF represents images defining multiple views comprising pixels of said scene, said images having been captured by an N-by-M array of cameras, where at least one of M and N is larger than 1, and using 3D model parameters for at least one captured image.
3. The method according to claim 1 or 2, further comprising:
determining (S104) a point cloud from said images and said parameters describing said 3D model and said camera parameters.
4. The method according to claim 3, further comprising:
reducing (S106) the number of points in said point cloud such that each pixel in said PLF has only one point in said point cloud.
5. The method according to claim 3 or 4, further comprising:
generating (S108) a panoramic 3D model from said projection model by projecting said point cloud.
6. The method according to claim 3, 4 or 5, further comprising:
projecting (S110) said point cloud to each input image of said cameras, and generating the at least one PLF using said projection model, by storing the different colors in a PLF space.
7. The method according to claim 5, further comprising:
detecting (S112) an entry in said PLF space representing a missing pixel; and
determining (S114) a value for said entry.
8. The method according to claim 5 or 6, further comprising:
sampling (S116) the PLF space along two-dimensional planes so as to slice the PLF space into slices, each slice representing a two-dimensional image, wherein all slices have a common panoramic 3D structure but with a color variation.
9. The method according to claim 1, wherein said at least one PLF is encoded as a sequence of 3D video frames having dependent layers.
10. The method according to claim 9, wherein said layers are encoded one by one in a predetermined order.
11. The method according to claim 10, wherein encoding of layer k+1 is dependent on encoding of layer k for K layers, and where 0 ≤ k < K-1.
12. The method according to claim 11, wherein encoding said at least one PLF comprises encoding a quantized pixel difference between layer k+1 and layer k.
13. The method according to claim 1, wherein said 3D model is represented by at least one of a depth map and a 3D mesh of said scene.
14. The method according to claim 1, further comprising:
encoding (S120) positions of said cameras relative to said scene.
15. A method for decoding an encoded bitstream into a panoramic light field, PLF, the method being performed by an electronic device (30, 30b), comprising the steps of:
receiving (S202) an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF, the parameters describing a panoramic three-dimensional, 3D, model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from said at least one PLF, and samplings of a PLF space; and
decoding (S204) the encoded bitstream into said at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
16. The method according to claim 15, wherein said bitstream comprises a header, said header representing said parameters, and wherein said decoding further comprises:
decoding (S206) said header, thereby extracting said panoramic 3D model of the scene, said input camera parameters relating to cameras having captured images of the scene, said back-projection method being used for generating the images from the PLF, and said at least one PLF.
17. The method according to claim 16, further comprising:
generating (S208) a sequence of PLF spaces from said bitstream and decoded at least one PLF representing said PLF spaces.
18. The method according to claim 17, further comprising:
detecting (S210) an entry in at least one PLF space of said sequence of PLF spaces representing a missing pixel; and
determining (S212) a value for said entry.
19. The method according to claim 17 or 18, further comprising:
generating (S214) a point cloud by back-projecting the panoramic 3D model of one PLF from the at least one PLF from said 3D video frames using the back-projection model.
20. The method according to claim 19, further comprising:
generating (S216) images from said point cloud and PLF space based on said parameters describing said panoramic 3D model, camera parameters, said PLF space comprising pixel colors of said scene, by projecting said point cloud with colors coming from the PLF space.
21. The method according to claim 20, wherein the bitstream comprises information relating to positions of said cameras relative to said scene, the method further comprising:
decoding (S218) said positions of said cameras relative to said scene; and generating (S220) said images from said PLF space based on said positions, and wherein colors in the images depend on said positions.
22. An electronic device (30, 30a) for encoding a light field, LF, into a bitstream, the electronic device (30, 30a) comprising a processing unit (31), the processing unit being configured to:
receive an LF of a scene and parameters relating to the LF, the parameters describing a three-dimensional, 3D, model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field, PLF, and a projection method for generating a PLF space from said images and said 3D model, and
encode said at least one PLF and said parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.
23. An electronic device (30, 30b) for decoding an encoded bitstream into a panoramic light field, PLF, the electronic device (30, 30b) comprising a processing unit (31), the processing unit being configured to:
receive an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF, the parameters describing a panoramic three-dimensional model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from said at least one PLF, and samplings of a PLF space; and
decode the encoded bitstream into said at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
24. A computer program (42a) for encoding a light field, LF, into a bitstream, the computer program comprising computer program code which, when run on a processing unit (31) of an electronic device (30, 30a), causes the processing unit to:
receive (S102) an LF of a scene and parameters relating to the LF, the parameters describing a three-dimensional, 3D, model of the scene, parameters for rendering at least one view of the scene from at least one panoramic light field, PLF, and a projection method for generating a PLF space from said images and said 3D model, and
encode (S118) said at least one PLF and said parameters into the bitstream by sampling the sequence of PLF spaces into a sequence of PLFs and applying compression to remove redundancy in the sequence of PLFs.
25. A computer program (42b) for decoding an encoded bitstream into a panoramic light field, PLF, the computer program comprising computer program code which, when run on a processing unit (31) of an electronic device (30, 30b), causes the processing unit to:
receive (S202) an encoded bitstream representing at least one PLF of a scene and parameters relating to the at least one PLF, the parameters describing a panoramic three-dimensional model of the scene, parameters for rendering at least one view of the scene from the at least one PLF, a back-projection method for generating the images from said at least one PLF, and samplings of a PLF space; and
decode (S204) the encoded bitstream into said at least one PLF by reconstructing the at least one PLF by applying decompression to the bitstream based on the parameters.
26. A computer program product (41a, 41b) comprising a computer program (42a, 42b) according to at least one of claims 24 and 25, and a computer readable means (43) on which the computer program is stored.