
WO2020040521A1 - Method for synthesizing intermediate view of light field, system for synthesizing intermediate view of light field, and method for compressing light field - Google Patents

Method for synthesizing intermediate view of light field, system for synthesizing intermediate view of light field, and method for compressing light field

Info

Publication number
WO2020040521A1
WO2020040521A1 (application PCT/KR2019/010564, KR2019010564W)
Authority
WO
WIPO (PCT)
Prior art keywords
view
light field
input
scene
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2019/010564
Other languages
English (en)
Korean (ko)
Inventor
세르게비치 밀유코프글렙
빅토로비치 콜친콘스탄틴
블라디슬라보비치 시무틴알렉산드르
니콜라에비치 리차고프마이클
알렉산드로비치 투르코세르게이
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from RU2018130343A external-priority patent/RU2690757C1/ru
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Priority to US17/270,276 priority Critical patent/US11533464B2/en
Publication of WO2020040521A1 publication Critical patent/WO2020040521A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/128Adjusting depth or disparity
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/10Processing, recording or transmission of stereoscopic or multi-view image signals
    • H04N13/106Processing image signals
    • H04N13/156Mixing image signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20Image signal generators
    • H04N13/271Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00Stereoscopic video systems; Multi-view video systems; Details thereof

Definitions

  • The present disclosure generally relates to imaging, and more particularly to a system for synthesizing an intermediate light field view based on input light field views.
  • A four-dimensional light field image, generated by acquiring the amount of light traveling in various directions in space, additionally carries information on the direction of light travel compared with a conventional two-dimensional image. Light field technology can therefore use this information to perform various kinds of image processing, such as image refocusing and 3D depth estimation.
  • Light field technology, that is, technology for producing a set of images of a scene captured from different viewpoints, has been widely used in cameras and three-dimensional displays.
  • Light field synthesis techniques have been developed that increase the spatial and angular resolution of light fields and enable effective compression of light fields.
  • Modern devices that support light field technology include special-purpose light field cameras (also called "plenoptic cameras") and 3D displays that use light fields. Such devices have several disadvantages that make them difficult for the general user to use.
  • plenoptic cameras are expensive and are specialized to capture only an array of light field views.
  • Plenoptic cameras also face a trade-off between spatial resolution and angular resolution: increasing the angular resolution by creating more light field views reduces the spatial resolution of each individual view.
  • Such light field view synthesis systems can be applied to mobile devices such as smartphones, to augmented and virtual reality devices, and to high-performance devices such as 3D displays or PCs.
  • The quality of a scene image is improved by reducing defects in the synthesized light field view.
  • In an intermediate view synthesizing method of a light field, an intermediate view is synthesized using a specific configuration of input views of the light field collected by a light field obtaining apparatus.
  • The intermediate view synthesizing method includes selecting a configuration of specific input views of the collected light field, specifying the coordinates of the intermediate view to be synthesized and inputting them into a neural network, estimating scene disparity using the neural network, and synthesizing the intermediate view based on the scene disparity, the selected configuration of the input views, and the specified coordinates of the intermediate view.
  • the configuration of the particular input view may be defined by the coordinates of the input view in the light field matrix collected by the acquisition device.
  • The size of the light field matrix may be M × M (M is a positive integer),
  • and the coordinates of the input views may correspond to a point in the first and last rows and a point in the first and last columns of the M × M matrix.
  • When M is odd, the point may be the middle point of the row or column; when M is even, the point may be the point closest to the middle of the row or column.
  • Coordinates of the intermediate view may be expressed as an integer or a fraction.
  • The intermediate view synthesizing method may further include calculating a light field feature map based on the selected configuration of specific input views of the light field, and calculating the scene disparity using the neural network based on the light field feature map.
  • the intermediate view synthesis method may further include estimating the scene disparity in advance by using a depth sensor.
  • the intermediate view synthesis method may further include synthesizing the intermediate view using a pre-trained neural network.
  • According to another aspect, an intermediate view synthesis system of a light field is provided, including a light field view capture device for capturing input views of a light field scene, and a convolutional neural network module for synthesizing an intermediate view based on specific ones of the input views, the scene disparity, and the coordinates of the intermediate view in the light field view array of the scene.
  • The intermediate view synthesizing system may include a first calculation module for calculating a light field scene feature map based on the input views of the light field scene, a convolutional neural network module for calculating the scene disparity based on the feature map, a disparity level setting module for setting a set of disparity levels {d_1, ..., d_L}, a second calculation module for calculating, for each disparity level, a new view from each of the input views through an equation described below, and a third calculation module for calculating, from each generated view, a feature map representing two characteristics per disparity level of the color and brightness values of the pixels (their average value and their spread).
  • the intermediate view synthesis system may further include a depth sensor that provides a depth value used for preliminary estimation of the disparity.
  • a mobile device including an intermediate view synthesis system of a light field for performing the intermediate view synthesis method is provided.
  • a method of compressing a light field comprises the steps of: calculating a difference between at least one intermediate view and an input view; and compressing the difference.
  • the configuration of the particular input view may be defined by the coordinates of the input view in the light field matrix collected by the acquisition device.
  • Various embodiments of the present disclosure can reduce the number of input views required to reconstruct a three-dimensional scene image.
  • Various embodiments of the present disclosure can reduce defects in the synthesized light field view.
  • FIG. 1 is a simplified illustration of the process of acquiring a light field, in the form of a scene view array from various viewpoints, by a camera array.
  • FIG. 2 is a simplified illustration of the process of creating an array of any number of intermediate views based on any configuration of an input view according to a conventional scheme.
  • FIG. 3 is a diagram schematically illustrating a method of synthesizing an intermediate view using a neural network, according to an exemplary embodiment.
  • FIG. 11 is a simplified diagram of a light field compression algorithm according to an embodiment.
  • FIG. 12 is a simplified illustration of a reconstruction algorithm of a compressed light field according to one embodiment.
  • FIG. 13 is a simplified illustration of a system for synthesizing an intermediate view of a light field using a neural network, according to one embodiment.
  • FIG. 14 is a simplified illustration of a system for synthesizing an intermediate view of a light field using a neural network according to another embodiment.
  • FIG. 15 is a simplified illustration of a system for synthesizing an intermediate view of a light field using a neural network, according to another embodiment.
  • Terms such as 'first' and 'second' may be used to describe various components, but the components should not be limited by these terms; the terms are used only to distinguish one component from another.
  • the intermediate view synthesis method of the light field, the intermediate view synthesis system and the light field compression method of the light field may be implemented in various different forms, and are not limited to the embodiments described herein.
  • FIG. 1 schematically illustrates the acquisition process of a light field in the form of a scene view array from various viewpoints by the camera array 20.
  • the light field image generated by acquiring the amount of light traveling in various directions in the space additionally includes light direction information, unlike a conventional 2D image.
  • the light field can be represented as a view array 30 of slightly different scenes obtained by photographing the actual scene 10 from several different viewpoints.
  • the light field may be generated using the camera array 20.
  • the light field may be generated using a microlens array included in a plenoptic camera.
  • one view 40 may include several pixels.
  • The difference in the positions of the scene points that make up the image in each view is called disparity.
  • For example, the position coordinates of a point in the left camera image may be (110, 100),
  • while the position coordinates of the same point in the right camera image may be (90, 100), i.e., the two differ by 20 pixels.
  • The 'depth', i.e., the distance to each scene point, can then be calculated from this difference in the positions of the points.
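  • As a purely illustrative sketch of this relationship (assuming a rectified stereo pair with known focal length and baseline, parameters the passage above does not specify), depth can be recovered from disparity as follows:

```python
# Illustrative only: recovering depth from disparity for a rectified stereo pair.
# focal_px (focal length in pixels) and baseline_m (camera spacing in meters)
# are assumed parameters, not values taken from the disclosure.
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth in meters for a single pixel's disparity (in pixels)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

# Example from the text: the same scene point appears at x=110 in the left view
# and x=90 in the right view, i.e. a disparity of 20 pixels.
depth = depth_from_disparity(110 - 90, focal_px=1000.0, baseline_m=0.1)  # 5.0 m
```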
  • Errors in this disparity calculation may cause defects at the boundaries of objects.
  • Such a defect may appear as noise with inconsistent depths at the edges of an object.
  • Alternatively, the pixels at the boundary of an object may be mixed in such a way that some pixels belonging to the image of one object are included in the image of another object.
  • The problem of synthesizing a scene view from a set of two, four, or five input scene views can be solved by various embodiments of the present disclosure using a disparity map.
  • the disparity map is an indication of how many pixels each pixel of the input scene views should be moved in order to create an appropriate intermediate view of the scene.
  • the disparity map for a particular view of the scene may be generated using depth information of each pixel of the particular view of the scene. Disparity maps for specific views of this scene are needed to create the desired scene view.
  • Classical disparity estimation methods are not accurate at boundaries within the view because of the difficulty of estimating the depth of each pixel of the scene view.
  • An array 60 of an arbitrary number of intermediate views can be generated using any configuration 50 of input views obtained with a plenoptic camera or the like.
  • Various embodiments of the present disclosure provide methods for synthesizing views using neural networks. For example, given any configuration 50 of input views as an input value, the neural network may generate an array 60 of any number of consecutive intermediate views.
  • a method of synthesizing the array 60 of the intermediate view using a neural network will be described with reference to FIG. 3.
  • FIG. 3 is a diagram schematically illustrating a method of synthesizing an intermediate view using a neural network, according to an exemplary embodiment.
  • The method of synthesizing an intermediate view of a light field may include selecting a configuration of specific input views of the light field collected by the light field obtaining apparatus, specifying the coordinates of the intermediate view to be synthesized and inputting them to the neural network, and synthesizing the intermediate view using the neural network based on the scene disparity, the selected configuration of input views, and the specified coordinates of the intermediate view.
  • FIG. 3 shows an exemplary configuration of an input view of a light field.
  • a detailed configuration of the input view of the light field will be described later with reference to FIGS. 4 to 10.
  • the configuration of the input view is determined by the coordinates of the input view in the light field input view array (or matrix).
  • The configuration of the input views has a decisive impact on the quality of view synthesis throughout the scene. A properly selected configuration of input views can maximize the amount of information available about the depth and the objects in the scene.
  • A properly selected input view configuration can also help to control defects caused by overlapping object images.
  • FIG. 3 shows a method of processing the configuration of an input view of a light field.
  • n original light field views 71 are supplied to the system input (n is a positive integer).
  • the coordinates 72 of the intermediate view to be synthesized can also be supplied to the system input. Coordinates 72 of the intermediate view to be synthesized may be represented by (u, v). All this data can be supplied to the neural network based light field intermediate view synthesis unit 73.
  • the desired intermediate view can be generated as the output of the intermediate view synthesizing unit 73.
  • By varying (u, v), the coordinates 72 of the intermediate view to be synthesized, it is possible to synthesize the continuous light field view 75.
  • the neural network approach can be used to modify the calculation of disparity.
  • the user can train the neural network to generate a disparity map to minimize errors in view synthesis.
  • FIG. 3 shows a method of synthesizing any number of intermediate views based on input views of any configuration (e.g., configuration C2). Furthermore, FIG. 3 shows that the light field generated by the plenoptic camera (i.e., the original light field 71) consists of discrete views of the light field, whereas the synthesized light field (i.e., the continuous light field view 75) is a space of continuous views of the light field.
  • the description of the intermediate view is as follows. Assume that the coordinates in the light field array of each of the input views are (1, 1), (1, 7), (7, 7), (7, 1). Then, an intermediate view of any coordinates inside the area surrounded by the coordinates of the input views, for example (4, 5) coordinates, can be synthesized. However, the present invention is not limited thereto, and the coordinates specified in the intermediate view of the light field synthesized through the neural network may have a non-integer value (eg, (4.1, 5.2)).
  • Convolutional Neural Networks can take into account the spatial structure of the scene and can correctly handle overlapping of objects with different depth levels in the view.
  • the light field synthesizing method according to the intermediate view synthesizing method of FIG. 3 may generate any view while minimizing defects on overlapping objects in continuous view space. This can be done by supplying the neural network with the coordinates of the desired view to be generated in the light field view matrix.
  • the coordinates of the desired view can be arbitrarily specified. In other words, the coordinates of the desired view may be selected from a range of coordinate values rather than from a set of light field coordinates generated by the plenoptic camera.
  • the configuration of the input view may include a symmetrical structure.
  • the input view symmetric configuration may be suitable for the neural network to reconstruct the desired light field while minimizing defects.
  • the configuration of the input view of the light field may have various structures and numbers.
  • The construction of input view configuration C1 is described in Nima Khademi Kalantari, Ting-Chun Wang, and Ravi Ramamoorthi, 'Learning-Based View Synthesis for Light Field Cameras,' ACM Trans. Graph. 35(6), Article 193 (November 2016), 10 pages.
  • Configuration C1 of the input views selects views at the corner edges of the light field view matrix, and results in the most defects among configurations C1 to C7.
  • Configuration C2 of the input views consists of views located at coordinates (4, 1), (8, 4), (1, 5), and (5, 8) in a predetermined, fixed input view array of the light field having a size of 9 × 9.
  • Configuration C3 of the input views consists of views located at coordinates (1, 1), (1, 9), (9, 1), (9, 9), and (5, 5) in a predetermined, fixed input view array of the light field having a size of 9 × 9.
  • This configuration is advantageous when views arranged close to the input views, inside the rectangle formed by the input views, need to be synthesized.
  • Configuration C4 of the input views consists of views located at coordinates (5, 1), (1, 5), (5, 9), (9, 5), and (5, 5) in a predetermined, fixed input view array of the light field having a size of 9 × 9.
  • Configuration C5 of the input views consists of views located at coordinates (2, 2), (2, 7), (7, 2), and (7, 7) in a predetermined, fixed input view array of the light field having a size of 8 × 8.
  • This configuration is advantageous when views arranged close to the input views, inside or outside the rectangle formed by the input views, need to be synthesized.
  • Configuration C6 of the input views consists of views located at coordinates (2, 2), (2, 8), (8, 2), (8, 8), and (5, 5) in a predetermined, fixed input view array of the light field having a size of 9 × 9.
  • This configuration is likewise advantageous when views arranged close to the input views, inside or outside the rectangle formed by the input views, need to be synthesized.
  • Configuration C7 of the input views consists of views located at coordinates (3, 3) and (3, 6) in a predetermined, fixed input view array of the light field having a size of 5 × 5. Configuration C7 is advantageous in that it uses only two input views: it corresponds to a variant in which the intermediate view is synthesized from only two views of the initial light field. This configuration can be used, for example, when two smartphone cameras are used.
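  • As an illustration only, the configurations C2 to C7 listed above could be represented as coordinate lists and used to pick the corresponding input views out of a captured light field array; the array shapes and the 1-based indexing below are assumptions made for the sketch, not part of the disclosure:

```python
import numpy as np

# Input-view configurations as (row, col) coordinates in the light field view
# array, 1-based as in the description above. C2-C4 and C6 assume a 9x9 array,
# C5 an 8x8 array, C7 a 5x5 array.
CONFIGS = {
    "C2": [(4, 1), (8, 4), (1, 5), (5, 8)],
    "C3": [(1, 1), (1, 9), (9, 1), (9, 9), (5, 5)],
    "C4": [(5, 1), (1, 5), (5, 9), (9, 5), (5, 5)],
    "C5": [(2, 2), (2, 7), (7, 2), (7, 7)],
    "C6": [(2, 2), (2, 8), (8, 2), (8, 8), (5, 5)],
    "C7": [(3, 3), (3, 6)],
}

def select_input_views(light_field: np.ndarray, config: str) -> np.ndarray:
    """light_field: array of shape (M, M, H, W, 3) holding all captured views.
    Returns the stack of views selected by the given configuration."""
    coords = CONFIGS[config]
    return np.stack([light_field[r - 1, c - 1] for r, c in coords])

# Usage sketch: pick the four C2 views out of a 9x9 light field of 256x256 RGB views.
lf = np.zeros((9, 9, 256, 256, 3), dtype=np.float32)
views = select_input_views(lf, "C2")   # shape (4, 256, 256, 3)
```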
  • the composition of the input view has a decisive impact on the quality of view synthesis throughout the scene. Properly selected configuration of the input view can optimize the amount of information about depth and scene objects. In addition, the construction of a properly selected input view can help to control defects caused by overlapping object images.
  • The quality of the synthesized image is determined by the distance from the view under consideration to the nearest input view. Computing this distance for an intermediate view generated using configurations C1 and C2 favors configuration C2, because the distance in configuration C2 is smaller than the distance in configuration C1.
  • Accordingly, the quality of an image synthesized with configuration C2 may be higher than with configuration C1.
  • An optimized configuration of specific input views from an M × M (M is a positive integer) light field array may be determined by the coordinates corresponding to an arbitrary point in the first and last rows and an arbitrary point in the first and last columns of the light field array. In this case, if M is odd, the arbitrary point may be the middle point of the row or column, and if M is even, the arbitrary point may be the point closest to the middle of the row or column.
  • The system for synthesizing an intermediate view of the light field may be used to implement part of an algorithm for light field compression, minimizing resource costs during transmission over a data network.
  • FIG. 11 is a simplified illustration of a light field compression algorithm 1000 according to one embodiment.
  • The compression algorithm 1000 may include an input view selection step (s101), an intermediate view synthesis step (s102) of synthesizing an intermediate view using the selected input views, a difference calculation step (s103) of calculating the difference between the input view and the intermediate view, a difference compression step (s104) of compressing the calculated difference, and a compressed difference output step (s105) of outputting the compressed difference.
  • the present invention is not limited thereto, and the configuration of the input view may vary.
  • the selected input view can be provided to the current view unit.
  • the current view unit may deliver the selected input view to a view processing unit.
  • a desired intermediate view may be synthesized through a view processing unit using the method described with reference to FIGS. 1 to 3.
  • a neural network may be used to synthesize a desired intermediate view based on a specific input view.
  • a difference between the input view and the intermediate view of the light field may be calculated by using a difference calculation unit.
  • the difference calculated in the difference calculation step s103 may be compressed by a well-known transformation method such as a discrete cosine transform (DCT).
  • The difference compressed in the compression step s104 may then be output.
  • After the compressed difference is output, the process returns to the input view selection step s101.
  • the compression algorithm 1000 ends if there are no more unprocessed input views remaining.
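  • A rough sketch of the compression loop s101 to s105 is given below. The view-synthesis step is represented by a placeholder function synthesize_view, and the residual is transform-coded with a plain DCT; quantization and entropy coding, which a practical codec would add, are omitted, so this is an illustrative assumption rather than the disclosed implementation:

```python
import numpy as np
from scipy.fft import dctn

def compress_residual(original_view: np.ndarray, predicted_view: np.ndarray) -> np.ndarray:
    """s103 + s104: difference between a light field view and its prediction, DCT-coded."""
    residual = original_view.astype(np.float32) - predicted_view.astype(np.float32)
    return dctn(residual, norm="ortho")        # s104: transform-code the residual

def compress_light_field(views_to_encode, synthesize_view):
    """s101-s105, simplified: views_to_encode is an iterable of (coords, view) pairs and
    synthesize_view(coords) stands in for the neural-network intermediate view synthesis."""
    compressed = []
    for coords, view in views_to_encode:       # s101: take the next view
        predicted = synthesize_view(coords)    # s102: synthesize the intermediate view
        compressed.append((coords, compress_residual(view, predicted)))   # s103-s105
    return compressed
```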
  • FIG. 12 is a simplified illustration of a reconstruction algorithm 2000 of a compressed light field according to one embodiment.
  • The reconstruction algorithm 2000 of the light field may include a compressed difference providing step (s106) of providing the compressed difference produced by the compression algorithm 1000 of FIG. 11 to a current difference unit, a view reconstruction step (s107) of reconstructing the view using the compressed difference and the input view, a view prediction step (s108), and a reconstructed view generation step (s109).
  • the compressed difference in the compression algorithm 1000 may be transmitted to a current difference unit.
  • the compressed difference may be sent to the current difference unit until reconstruction of all input views of the light field is complete.
  • Reconstructing the view may include reconstructing coefficients for reconstructing the difference between the synthesized intermediate view and the input view.
  • the input view may be the same as the input view selected in the input view selection step s101 of the compression algorithm 1000 of FIG. 11.
  • reconstructing the view s107 may include performing an inverse transform on the transformed difference using a view reconstruction unit.
  • The view reconstruction unit may be used to perform an inverse transform on the difference compressed through the discrete cosine transform (DCT) in the differential compression step s104 of FIG. 11.
  • Predicting the view (s108) may include summing the coefficients and the inverse-transformed difference obtained in the view reconstruction step (s107) with the light field view synthesized by the neural network. Such summing may be performed by an estimated view unit.
  • Generating the reconstructed view (s109) may reconstruct the view using the view predicted in step s108. If unprocessed input views of the light field remain, they may then be processed; after the reconstruction of all input views is complete, all intermediate views may be reconstructed, based on the reconstructed input views, using the system that synthesizes intermediate views of the light field.
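  • A corresponding sketch of the reconstruction steps s107 to s109, under the same illustrative DCT assumption as above: the compressed residual is inverse-transformed and added to the view predicted by the neural network:

```python
import numpy as np
from scipy.fft import idctn

def reconstruct_view(compressed_residual: np.ndarray, predicted_view: np.ndarray) -> np.ndarray:
    """s107-s109, simplified: inverse-transform the residual and add it to the view
    predicted (synthesized) by the neural network."""
    residual = idctn(compressed_residual, norm="ortho")   # s107: inverse DCT
    return predicted_view + residual                      # s108/s109: reconstructed view
```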
  • FIG. 13 is a simplified illustration of a system 3000 for synthesizing an intermediate view of a light field using a neural network, according to one embodiment.
  • the neural network system 3000 illustrated in FIG. 13 may use the configurations C2 to C7 of the input views illustrated in FIGS. 5 to 10.
  • The intermediate view synthesizing system 3000 includes a first calculation module 1 that calculates a light field scene feature array based on the input views IV of the scene light field, a first convolutional neural network 2 that calculates the scene disparity based on the feature array, and
  • a second convolutional neural network 3 that synthesizes the intermediate view based on the disparity and on the coordinates of the intermediate view in the light field view array.
  • the first calculation module 1 may calculate the light field scene feature array based on the input view IV.
  • Feature arrays may also be referred to as feature maps. These features can be provided directly to the neural network as basic information about the raw disparity.
  • One of the input views may be shifted 21 times, one pixel at a time. Such shifts can be made for any input view configuration with two, four, or five views.
  • the mean and variance can be calculated and obtained from the shifted input view.
  • the averaged view can be obtained by adding the pixel value of the views and then dividing it by the number of views.
  • the variance can be calculated from the mean.
  • the averaged view can be calculated as follows.
  • The feature map may be calculated in the following manner. First, a vector s containing the pixel coordinates (x, y) may be defined, together with a vector q containing the position (u, v) of the intermediate view in the light field view matrix, i.e., its coordinates in two-dimensional space. In addition, a vector p_i indicating the position of the i-th input view may be defined; p_i likewise contains coordinates (u, v) in two-dimensional space, so q and p_i are vectors in the same space. Then, knowing the disparity map D_q(s) for the pixels of the new view at s (the disparity map D_q(s) is computed by the neural network), the color value can be defined according to equation (1).
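  • Equation (1) itself is not reproduced in this text. A plausible reconstruction, based only on the definitions above and on the standard warping formulation used in learning-based view synthesis (an assumption, not a quotation of the patent's equation), is:

```latex
\bar{L}_{p_i}(s) \;=\; L_{p_i}\bigl(s + (p_i - q)\, D_q(s)\bigr)   % (1), assumed form
```

  where L_{p_i} denotes the i-th input view and \bar{L}_{p_i} denotes that view warped to the position q of the desired intermediate view.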
  • a set of disparity levels can be specified.
  • A new view can be calculated through equation (2) below for each disparity level, using each light field view.
  • The average value of each pixel across the generated views can be calculated by equation (3) below.
  • the pixel value can be defined by the color and brightness of the pixel.
  • The pixel value L denotes the triplet of numbers L_c, where c takes the values 1, 2, and 3.
  • L_1 may correspond to red,
  • L_2 may correspond to green, and
  • L_3 may correspond to blue.
  • L_c may have a value ranging from 0 to 2^N, where N is typically 8.
  • the variance can be obtained from this mean.
  • the number of new views generated for a particular disparity may be equal to the number of input views.
  • From the average value of the pixels given by equation (3) and the standard deviation of the pixel color values given by equation (4),
  • a feature map having a depth of 2L may be formed. That is, for each disparity level in [d_1, d_L] (L = 21), the feature map contains the image averaged by equation (3) and the corresponding standard deviation, at a resolution matching the resolution of the input views.
  • In this way the feature map matrix can be generated.
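  • Equations (2) to (4) are likewise not reproduced in this text. A plausible reconstruction from the surrounding description (warp every input view to each candidate disparity level, then take the per-pixel mean and standard deviation over the warped views) — an assumption, not the patent's verbatim equations — is:

```latex
\bar{L}^{\,d}_{p_i}(s) = L_{p_i}\bigl(s + (p_i - q)\, d\bigr)                                        % (2) input view warped to disparity level d
\mu_d(s)    = \tfrac{1}{n}\sum_{i=1}^{n} \bar{L}^{\,d}_{p_i}(s)                                      % (3) per-pixel mean over the n warped views
\sigma_d(s) = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}\bigl(\bar{L}^{\,d}_{p_i}(s)-\mu_d(s)\bigr)^{2}}       % (4) per-pixel standard deviation
```

  Stacking \mu_d and \sigma_d for every level d in {d_1, ..., d_L} gives the feature map of depth 2L described above.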
  • the number of disparity levels can be determined experimentally.
  • the feature map can have a depth of 2L + 1, since it must also include the zero level.
  • The disparity levels correspond to 21 view shifts in the positive direction and 21 in the negative direction, one pixel at a time, plus one level corresponding to the unshifted input view, giving 43 (2 × 21 + 1) levels in total; for each level two features are computed, and the result may be supplied as the input of the neural network that estimates the scene disparity.
  • The two features are the image averaged by equation (3) and the standard deviation given by equation (4).
  • The feature map, a tensor of size W × H × 2L (in pixels), can be supplied as the input of the neural network.
  • W and H mean the width and height of the view, respectively.
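  • A compact sketch of how such a feature map could be assembled, shifting each input view according to each disparity level and stacking the per-level mean and standard deviation; the nearest-neighbour warp, grayscale views, and shift convention are simplifications assumed for illustration:

```python
import numpy as np

def light_field_feature_map(views: np.ndarray, positions: np.ndarray,
                            target: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """views: (n, H, W) grayscale input views; positions: (n, 2) view coordinates p_i;
    target: (2,) coordinates q of the desired view; levels: (L,) disparity levels d_1..d_L.
    Returns an (H, W, 2L) feature map holding the per-level mean and standard deviation."""
    n, H, W = views.shape
    ys, xs = np.mgrid[0:H, 0:W]
    planes = []
    for d in levels:
        warped = []
        for i in range(n):
            du, dv = (positions[i] - target) * d           # shift implied by this level
            x = np.clip(xs + du, 0, W - 1).astype(int)     # nearest-neighbour warp,
            y = np.clip(ys + dv, 0, H - 1).astype(int)     # a deliberate simplification
            warped.append(views[i][y, x])
        warped = np.stack(warped)
        planes += [warped.mean(axis=0), warped.std(axis=0)]   # two features per level
    return np.stack(planes, axis=-1)                          # depth 2L
```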
  • The feature map refers to a set of matrices (including three-dimensional tensors) that describe the behavior of the input views in the process of converting them into the desired view. The neural network can therefore be self-adjusting and can generate abstract features (sequences of feature maps), filtering out what can be omitted in order to identify what is essential.
  • System 3000 for synthesizing an intermediate view may synthesize an intermediate view of a desired light field.
  • the first and second convolutional neural networks 2, 3 included in the system 3000 may be trained together.
  • the first convolutional neural network 2 calculates disparity.
  • the second convolutional neural network 3 directly synthesizes the desired intermediate view.
  • Stacked three-dimensional tensors formed from the set of input views of the light field may be transformed by equation (2) above, taking into account the disparity map received from the first convolutional neural network 2.
  • The three-dimensional tensor may also include the disparity map itself and two matrices: all elements of one matrix equal the x-axis coordinate of the desired view (denoted u), and all elements of the other equal the y-axis coordinate of the desired view (denoted v).
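  • As a small illustration of this stacking, the extra planes appended to the warped views could be built as follows (the channel ordering is an assumption made for the sketch):

```python
import numpy as np

def synthesis_network_input(warped_views: np.ndarray, disparity_map: np.ndarray,
                            u: float, v: float) -> np.ndarray:
    """warped_views: (n, H, W) views already transformed using the disparity map;
    disparity_map: (H, W); (u, v): coordinates of the desired view.
    Returns the (n + 3, H, W) tensor fed to the synthesis network."""
    H, W = disparity_map.shape
    u_plane = np.full((H, W), u, dtype=np.float32)   # every element equals u
    v_plane = np.full((H, W), v, dtype=np.float32)   # every element equals v
    return np.concatenate([warped_views,
                           disparity_map[None], u_plane[None], v_plane[None]], axis=0)
```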
  • FIG. 14 is a simplified illustration of a system 4000 for synthesizing an intermediate view of a light field using a neural network according to another embodiment.
  • System 4000 can synthesize intermediate views without using feature maps.
  • the neural network system 4000 illustrated in FIG. 14 may use the configurations C1 to C7 of the input views illustrated in FIGS. 4 to 10.
  • the essence of the system 4000 is to feed the input view IV and the coordinates (u, v) of the desired view to the input of the neural network and output the required intermediate view.
  • disparity may be estimated using a depth sensor (not shown) instead of a neural network.
  • the depth sensor (not shown) may be provided as an additional device for providing a depth map.
  • the technology related to the depth sensor is well known in the art (cf., https://ru.wikipedia.org/wiki/Kinect).
  • The system 4000 may include a convolutional neural network 4 that synthesizes an intermediate view based on the coordinates (u, v) of the intermediate view in the scene light field view array and on the scene disparity map generated by the depth sensor for the input views IV of the scene light field.
  • the structure of the system 4000 may be similar to that of the system 3000 of FIG. 13.
  • the selected input view IV may be converted by Equation 2 with reference to the disparity map received from the depth sensor.
  • the transformed input view can also be fed to the convolutional neural network 4 which synthesizes the desired view.
  • the converted input view may include the disparity map itself from the depth sensor and two matrices.
  • All elements of one of the two matrices equal the x-axis coordinate of the desired view (denoted u), and all elements of the other matrix equal the y-axis coordinate of the desired view (denoted v).
  • FIG. 15 is a simplified illustration of a system 5000 for synthesizing an intermediate view of a light field using a neural network according to another embodiment.
  • the system shown in FIG. 15 may generate an intermediate view without using a scene disparity map.
  • The system 5000 comprises a first neural network 5 for synthesizing the intermediate view of the scene light field based on the input views IV of the scene light field and the coordinates (u, v) of the intermediate view in the scene light field view array, and a second neural network 6 that has been pre-trained to classify objects in digital images.
  • the first neural network 5 may be a convolutional neural network.
  • the first neural network 5 can be pre-trained to synthesize the intermediate view without the disparity map.
  • The second neural network 6 may be VGG-16 (a well-known neural network) or another classification network.
  • The neural network VGG-16 can identify one of 1000 object classes (see https://www.quora.com/What-is-the-VGG-neural-network). VGG-16 may also be used to train the first neural network 5.
  • the input view IV of the light field and the coordinates (u, v) of the desired intermediate view may be supplied as the input value of the first neural network 5.
  • the intermediate view synthesized by the first neural network 5 can then be transferred to the second neural network 6.
  • A reference view RV having the same coordinates as the desired intermediate view synthesized by the first neural network 5 can also be transferred to the second neural network 6.
  • the reference view RV is not synthesized but is generated in advance and is an original view existing from the beginning in the training data set.
  • the reference view RV may be generated by a plenoptic camera.
  • The second neural network 6 maps the desired intermediate view synthesized by the first neural network 5 and the reference view RV into the space of view features, which makes it possible to calculate the error more effectively from a human perceptual point of view.
  • Two view feature maps may be generated at the output of the second neural network 6. Each of the two view feature maps represents an output from one or more layers of the second neural network 6.
  • the view feature map can be used to calculate an error function.
  • the second neural network 6 may output view feature maps having the same dimensions as the desired intermediate view and reference view RV.
  • the view feature maps can be the basis for calculating the error. To this end, the well-known technique of 'perceptual loss' may be used.
  • Perceptual loss is described in the non-patent literature ("Perceptual Losses for Real-Time Style Transfer and Super-Resolution", Justin Johnson, Alexandre Alahi, Li Fei-Fei, 2016, https://arxiv.org/pdf/1603.08155.pdf).
  • The essence of the system 5000 shown in FIG. 15 is that the intermediate view synthesized by the first neural network 5 and the reference view RV are each passed through the series of layers of the second neural network 6 used for classifying images.
  • The two generated view feature maps can be compared using the Frobenius (L2) norm (https://en.wikipedia.org/wiki/Matrix_norm#Frobenius_norm), as sketched below.
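  • A sketch of such a feature-space comparison, using an off-the-shelf pretrained VGG-16 from torchvision as the classifying second network; the particular layer cut-off and the use of a single feature map are simplifying assumptions, since the text only states that feature maps from one or more layers are compared with the Frobenius (L2) norm:

```python
import torch
import torchvision.models as models

# Frozen, pretrained VGG-16 used purely as a feature extractor (the classifying network 6).
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(synthesized: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
    """synthesized, reference: (1, 3, H, W) views. Returns the Frobenius (L2) norm of the
    difference between their VGG feature maps, minimized when training the first network 5."""
    return torch.linalg.norm(vgg(synthesized) - vgg(reference))
```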
  • the result by the system 5000 may be a value defined as the 'distance' between the feature map of the view synthesized by the first neural network 5 and the reference view RV. As the 'distance' value increases, the operation of the system 5000 may become unstable. In other words, the value generated by the system 5000 means a calculation error in the desired intermediate view synthesized by the first neural network 5.
  • Similar to the manner of the system 3000 of FIG. 13, instead of forcing the first neural network 5 to synthesize a view as close as possible to the reference view in terms of pixel differences, in the system 5000 of FIG. 15
  • the first neural network 5 can be trained, after the error value is generated, to minimize that error value.
  • the principle in which the first neural network 5 is trained is well known, and a description thereof will be omitted.
  • This step may be repeated until the intermediate view synthesized by the first neural network 5 attains the desired parameters when compared with the reference view RV.
  • the first neural network 5 is ready to synthesize the desired intermediate view with minimized error after training and obtaining the desired synthesis parameters.
  • the intermediate view synthesis method of the light field according to various embodiments of the present disclosure may be applied to a mobile device having at least one camera.
  • The user can quickly capture a series of photos without deliberately changing the camera position. For example, a small movement of the camera caused by the motion of the user's hand may be sufficient to form the required number of input views of the light field. Taking all possible pictures would give better quality, but it is desirable to take only the number of pictures determined by the preselected view configuration.
  • The generated input views may be transmitted to a processing module, that is, the part of the mobile device responsible for intermediate view synthesis.
  • a submodule that receives the disparity map from the depth sensor may be included in this processing module.
  • The mobile device may operate by generating an intermediate view of the light field, transmitting the generated intermediate view to a memory device, and outputting the generated intermediate view to the display of the mobile device.
  • Each view of the light field generated by such a mobile device may have a high resolution. This solves the trade-off problem between angular resolution and spatial resolution, a typical problem for plenoptic cameras.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)

Abstract

According to one embodiment, the present invention relates to a method for synthesizing an intermediate view of a light field, comprising: a step of selecting a configuration of specific input views of a light field collected by a light field obtaining device; a step of specifying coordinates of an intermediate view to be synthesized and inputting the specified coordinates into a neural network; and a step of synthesizing the intermediate view using the neural network on the basis of a scene disparity, the selected configuration of the specific input views, and the specified coordinates of the intermediate view.
PCT/KR2019/010564 2018-08-21 2019-08-20 Method for synthesizing intermediate view of light field, system for synthesizing intermediate view of light field, and method for compressing light field Ceased WO2020040521A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/270,276 US11533464B2 (en) 2018-08-21 2019-08-20 Method for synthesizing intermediate view of light field, system for synthesizing intermediate view of light field, and method for compressing light field

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
RU2018130343A RU2690757C1 (ru) 2018-08-21 2018-08-21 Система синтеза промежуточных видов светового поля и способ ее функционирования
RU2018130343 2018-08-21
KR10-2019-0099834 2019-08-14
KR1020190099834A KR102658359B1 (ko) 2018-08-21 2019-08-14 Method for synthesizing intermediate view of light field, system for synthesizing intermediate view of light field, and method for compressing light field

Publications (1)

Publication Number Publication Date
WO2020040521A1 true WO2020040521A1 (fr) 2020-02-27

Family

ID=69593338

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2019/010564 Ceased WO2020040521A1 (fr) 2018-08-21 2019-08-20 Procédé de synthèse de vue intermédiaire de champ lumineux, système de synthèse de vue intermédiaire de champ lumineux, et procédé de compression de champ lumineux

Country Status (1)

Country Link
WO (1) WO2020040521A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070229653A1 (en) * 2006-04-04 2007-10-04 Wojciech Matusik Method and system for acquiring and displaying 3D light fields
US20160255333A1 (en) * 2012-09-28 2016-09-01 Pelican Imaging Corporation Generating Images from Light Fields Utilizing Virtual Viewpoints
KR20150063010A (ko) * 2013-11-29 2015-06-08 Thomson Licensing Method and device for estimating disparity associated with views of a scene acquired with a plenoptic camera
KR20160107265A (ko) * 2014-01-10 2016-09-13 Ostendo Technologies, Inc. Methods for full parallax compressed light field 3D imaging systems
KR101723738B1 (ko) * 2015-08-21 2017-04-18 Inha University Industry-Academic Cooperation Foundation Apparatus and method for resolution enhancement based on dictionary learning

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9233063B2 (en) 2009-12-17 2016-01-12 Air Products And Chemicals, Inc. Polymeric compositions for personal care products
CN111489407A (zh) * 2020-04-09 2020-08-04 Institute of Advanced Technology, University of Science and Technology of China Light field image editing method, apparatus, device, and storage medium
CN111489407B (zh) * 2020-04-09 2023-06-02 Institute of Advanced Technology, University of Science and Technology of China Light field image editing method, apparatus, device, and storage medium
CN116569218A (zh) * 2020-12-24 2023-08-08 Huawei Technologies Co., Ltd. Image processing method and image processing apparatus
CN113139898A (zh) * 2021-03-24 2021-07-20 Ningbo University Light field image super-resolution reconstruction method based on frequency-domain analysis and deep learning
US20220377301A1 (en) * 2021-04-29 2022-11-24 National Taiwan University Light field synthesis method and light field synthesis system
US12058299B2 (en) * 2021-04-29 2024-08-06 National Taiwan University Light field synthesis method and light field synthesis system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19851244

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19851244

Country of ref document: EP

Kind code of ref document: A1