
WO2022027197A1 - Systems and methods for processing image - Google Patents

Systems and methods for processing image

Info

Publication number
WO2022027197A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
enhancement
matrices
neural network
input image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/106650
Other languages
French (fr)
Inventor
Hui ZENG
Zhiqiang Li
Zisheng Cao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Priority to PCT/CN2020/106650
Publication of WO2022027197A1
Anticipated expiration
Ceased (current legal status)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/60Image enhancement or restoration using machine learning, e.g. neural networks
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Definitions

  • the present disclosure generally relates to systems and methods for processing an image, and more particularly, to enhancing an image based on deep learning.
  • image enhancement can include exposure compensation, hue/saturation adjustment, tone mapping, or gamma correction.
  • photo enhancement is highly empirical and usually hand-crafted by a seasoned expert through extensive labor.
  • a 3D look-up table can be manually designed and used to enhance an image with a certain scene.
  • a mapping relationship between pixels of sample images can be trained and applied to each pixel of an input image for improving the quality.
  • Embodiments of the present disclosure provide a system for processing an image.
  • the system includes: a memory for storing a set of instructions; and at least one processor configured to execute the set of instructions for causing the system to perform: generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image and a reference image.
  • Embodiments of the present disclosure also provide a computer-implemented method for processing an image.
  • the method includes: generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image with a reference image.
  • Embodiments of the present disclosure further provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image.
  • the method includes: generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image with a reference image.
  • Embodiments of the present disclosure further provide a system for processing an image.
  • the system includes: a memory for storing a set of instructions; and at least one processor configured to execute the set of instructions for causing the system to perform: receiving an input image; determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprise a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the enhancement parameters; and enhancing the input image using the adaptive image enhancement matrix.
  • Embodiments of the present disclosure further provide a computer-implemented method for processing an image.
  • the method includes: receiving an input image; determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprise a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the enhancement parameters; and enhancing the input image using the adaptive image enhancement matrix.
  • Embodiments of the present disclosure further provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image.
  • the method includes: receiving an input image; determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprise a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the enhancement parameters; and enhancing the input image using the adaptive image enhancement matrix.
  • Embodiments of the present disclosure further provide a system for processing an image.
  • the system includes: a memory for storing a set of instructions; and at least one processor configured to execute the set of instructions for causing the system to perform: generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models; enhancing the input image using the adaptive image enhancement matrix; and updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
  • Embodiments of the present disclosure further provide a computer-implemented method for processing an image.
  • the method includes: generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models; enhancing the input image using the adaptive image enhancement matrix; and updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
  • Embodiments of the present disclosure further provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image.
  • the method includes: generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models; enhancing the input image using the adaptive image enhancement matrix; and updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
  • FIG. 1 illustrates an exemplary neural network.
  • FIG. 2 illustrates an exemplary neural network inference pipeline workflow, according to some embodiments of the present disclosure.
  • FIG. 3A illustrates an exemplary parallel computing architecture, according to some embodiments of the disclosure.
  • FIG. 3B illustrates a schematic diagram of an exemplary cloud system incorporating a parallel computing architecture, according to some embodiments of the disclosure.
  • FIG. 4A illustrates a schematic diagram of a process for generating an image enhancement model, according to some embodiments of the disclosure.
  • FIG. 4B illustrates a schematic diagram of another process for generating an image enhancement model, according to some embodiments of the disclosure.
  • FIG. 4C illustrates a schematic diagram of updating basic image enhancement matrices and a neural network using a paired loss function, according to some embodiments of the disclosure.
  • FIG. 4D illustrates a schematic diagram of updating basic image enhancement matrices and a neural network using an unpaired loss function, according to some embodiments of the disclosure.
  • FIG. 4E illustrates a schematic diagram of yet another process for generating an image enhancement model, according to some embodiments of the disclosure.
  • FIG. 5 is a flowchart of an exemplary computer-implemented method for processing an image, according to some embodiments of the disclosure.
  • FIG. 6 is a flowchart of another exemplary computer-implemented method for processing an image, according to some embodiments of the disclosure.
  • FIG. 7 is a flowchart of yet another exemplary computer-implemented method for processing an image, according to some embodiments of the disclosure.
  • FIG. 1 illustrates an exemplary neural network (NN) 100.
  • neural network 100 can include an input layer 120 that accepts inputs, e.g., input 110-1, ..., input 110-m.
  • Inputs can include an image, text, or any other structured or unstructured data for processing by neural network 100.
  • neural network 100 can accept a plurality of inputs simultaneously. For example, in FIG. 1, neural network 100 can accept up to m inputs simultaneously.
  • input layer 120 can accept up to m inputs in rapid succession, e.g., such that input 110-1 is accepted by input layer 120 in one cycle, a second input is accepted by input layer 120 in a second cycle in which input layer 120 pushes data from input 110-1 to a first hidden layer, and so on. Any number of inputs can be used in simultaneous input, rapid succession input, or the like.
  • Input layer 120 can comprise one or more nodes, e.g., node 120-1, node 120-2, ..., node 120-a. Each node can apply an activation function to corresponding input (e.g., one or more of input 110-1, ..., input 110-m) and weight the output from the activation function by a particular weight associated with the node.
  • An activation function can comprise a Heaviside step function, a Gaussian function, a multiquadratic function, an inverse multiquadratic function, a sigmoidal function, a ReLU function, a Leaky ReLU function, a Tanh function, or the like.
  • a weight can comprise a positive value between 0.0 and 1.0 or any other numerical value configured to allow some nodes in a layer to have corresponding output scaled more or less than output corresponding to other nodes in the layer.
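  • As a minimal illustration of the node computation described above (the activation choice and the way the weight scales the activation output are illustrative assumptions, not prescribed by the disclosure):
```python
import numpy as np

def node_output(inputs, node_weight, activation=np.tanh):
    """Illustrative node: apply an activation function to the summed inputs,
    then scale the result by the node's weight (e.g., a value in [0.0, 1.0])."""
    return node_weight * activation(np.sum(inputs))

# Example: a single node receiving two input values.
print(node_output(np.array([0.5, -0.2]), node_weight=0.8))
```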
  • neural network 100 can include one or more hidden layers, e.g., hidden layer 130-1, ..., hidden layer 130-n.
  • Each hidden layer can comprise one or more nodes.
  • hidden layer 130-1 comprises node 130-1-1, node 130-1-2, node 130-1-3, ..., node 130-1-b
  • hidden layer 130-n comprises node 130-n-1, node 130-n-2, node 130-n-3, ..., node 130-n-c.
  • nodes of the hidden layers can apply activation functions to output from connected nodes of the previous layer and weight the output from the activation functions by particular weights associated with the nodes.
  • neural network 100 can include an output layer 140 that finalizes outputs, e.g., output 150-1, output 150-2, ..., output 150-d.
  • Output layer 140 can comprise one or more nodes, e.g., node 140-1, node 140-2, ..., node 140-d. Similar to nodes of input layer 120 and of the hidden layers, nodes of output layer 140 can apply activation functions to output from connected nodes of the previous layer and weight the output from the activation functions by particular weights associated with the nodes.
  • the finalized outputs can be a plurality of enhancement matrices, a plurality of enhancement weights, or the like. It is appreciated that weights of neural network 100 can be modified to change the finalized outputs of neural network 100.
  • the layers of neural network 100 can use any connection scheme.
  • for example, one or more layers (e.g., input layer 120, hidden layer 130-1, ..., hidden layer 130-n, output layer 140, or the like) can be partially rather than fully connected to a previous layer. Such embodiments can use fewer connections between one layer and a previous layer than depicted in FIG. 1.
  • neural network 100 can additionally or alternatively use backpropagation (e.g., by using long short-term memory nodes or the like) .
  • although neural network 100 is depicted similar to a convolutional neural network (CNN), neural network 100 can comprise a recurrent neural network (RNN), a generative adversarial network (GAN), or any other neural network.
  • a neural network has two stages in a deep learning workflow: training and inference.
  • during training, the neural network keeps learning parameter values by iteratively updating them to minimize prediction error.
  • the neural network with learned parameters can then be used to perform inference tasks on new cases.
  • FIG. 2 illustrates an exemplary neural network inference pipeline workflow 200, according to some embodiments of the present disclosure.
  • although inference workflow 200 relates to weight generation, it is appreciated that this is only an example rather than a limitation.
  • a trained neural network (e.g., neural network 100 of FIG. 1) can receive an input 201 (e.g., an image with a size of 256×256) and perform computation 203 (e.g., forward propagation (FP)) on input 201.
  • each layer in the neural network receives inputs from the preceding layer (or layers), performs computation on the inputs, and sends output to the subsequent layer (or layers).
  • after computation, the neural network provides an output 205, e.g., an evaluation result.
  • the output 205 can include a plurality of weights or matrices.
  • a convolutional neural network (CNN) is a category of neural network.
  • CNN is widely used in many technical fields.
  • a CNN can perform visual tasks, e.g., image features/patterns learning or recognition.
  • FIG. 3A illustrates an exemplary parallel computing architecture 300, according to some embodiments of the disclosure.
  • architecture 300 can include a chip communication system 302, a host memory 304, a memory controller 306, a direct memory access (DMA) unit 308, a Joint Test Action Group (JTAG) /Test Access End (TAP) controller 310, a peripheral interface 312, a bus 314, a global memory 316, and the like.
  • chip communication system 302 can perform algorithmic operations (e.g., machine learning operations) based on communicated data.
  • On-chip communication system 302 can include a global manager 3022 and a plurality of cores 3024.
  • Global manager 3022 can include at least one task manager to coordinate with one or more cores 3024.
  • Each task manager can be associated with an array of cores 3024 that provide synapse/neuron circuitry for parallel computation (e.g., the neural network) .
  • the top layer of processing elements of FIG. 3A may provide circuitry representing an input layer to a neural network, while the second layer of cores may provide circuitry representing a hidden layer of the neural network.
  • on-chip communication system 302 can be implemented as a neural network processing unit (NPU) , a graphic processing unit (GPU) , or another heterogeneous accelerator unit.
  • global manager 3022 can include two task managers to coordinate with two arrays of cores.
  • Cores 3024 can include one or more processing elements that each include single instruction, multiple data (SIMD) architecture including one or more processing units configured to perform one or more operations (e.g., multiplication, addition, multiply-accumulate, etc. ) based on instructions received from global manager 3022.
  • cores 3024 can include one or more processing elements for processing information in the data packets.
  • Each processing element may comprise any number of processing units.
  • core 3024 can be considered a tile or the like.
  • Host memory 304 can be off-chip memory such as a host CPU’s memory.
  • host memory 304 can be a DDR memory (e.g., DDR SDRAM) or the like.
  • Host memory 304 can be configured to store a large amount of data with slower access speed, compared to the on-chip memory integrated within one or more processors, acting as a higher-level cache.
  • Memory controller 306 can manage the reading and writing of data to and from a specific memory block within global memory 316 having on-chip memory blocks (e.g., 4 blocks of 8GB second generation of high bandwidth memory (HBM2) ) to serve as main memory.
  • memory controller 306 can manage read/write data coming from outside chip communication system 302 (e.g., from DMA unit 308 or a DMA unit corresponding with another NPU) or from inside chip communication system 302 (e.g., from a local memory in core 3024 via a 2D mesh controlled by a task manager of global manager 3022) .
  • while one memory controller is shown in FIG. 3A, it is appreciated that more than one memory controller can be provided in architecture 300.
  • Memory controller 306 can generate memory addresses and initiate memory read or write cycles.
  • Memory controller 306 can contain several hardware registers that can be written and read by the one or more processors.
  • the registers can include a memory address register, a byte-count register, one or more control registers, and other types of registers. These registers can specify some combination of the source, the destination, the direction of the transfer (reading from the input/output (I/O) device or writing to the I/O device) , the size of the transfer unit, the number of bytes to transfer in one burst, and/or other typical features of memory controllers.
  • DMA unit 308 can assist with transferring data between host memory 304 and global memory 316. In addition, DMA unit 308 can assist with transferring data between multiple on-chip communication systems (e.g., 302) . DMA unit 308 can allow off-chip devices to access both on-chip and off-chip memory without causing a CPU interrupt. Thus, DMA unit 308 can also generate memory addresses and initiate memory read or write cycles. DMA unit 308 also can contain several hardware registers that can be written and read by the one or more processors, including a memory address register, a byte-count register, one or more control registers, and other types of registers.
  • architecture 300 can include a second DMA unit, which can be used to transfer data between other neural network processing architectures to allow multiple neural network processing architectures to communicate directly without involving the host CPU.
  • JTAG/TAP controller 310 can specify a dedicated debug port implementing a serial communications interface (e.g., a JTAG interface) for low-overhead access to the NPU without requiring direct external access to the system address and data buses.
  • JTAG/TAP controller 310 can also have on-chip test access interface (e.g., a TAP interface) that implements a protocol to access a set of test registers that present chip logic levels and device capabilities of various parts.
  • Peripheral interface 312 (such as a PCIe interface) , if present, serves as an (and typically the) inter-chip bus, providing communication between architecture 300 and other devices.
  • Bus 314 includes both intra-chip bus and inter-chip buses.
  • the intra-chip bus connects all internal components to one another as called for by the system architecture. While not all components are connected to every other component, all components do have some connection to other components they need to communicate with.
  • the inter-chip bus connects the NPU with other devices, such as the off-chip memory or peripherals.
  • bus 314 is solely concerned with intra-chip buses, though in some implementations it could still be concerned with specialized inter-bus communications.
  • On-chip communication system 302 can be configured to perform operations based on neural networks.
  • Architecture 300 can also include a host unit 320.
  • Host unit 320 can be one or more processing units (e.g., an X86 central processing unit, an ARM processor, and the like).
  • a host system having host unit 320 and host memory 304 can comprise a compiler (not shown) .
  • the compiler is a program or computer software that transforms computer codes written in one programming language into NPU instructions to create an executable program.
  • a compiler can perform a variety of operations, for example, pre-processing, lexical analysis, parsing, semantic analysis, conversion of input programs to an intermediate representation, code optimization, and code generation, or combinations thereof.
  • the compiler that generates the NPU instructions can be on the host system, which pushes commands to chip communication system 302. Based on these commands, each task manager can assign any number of tasks to one or more cores (e.g., core 3024) . Some of the commands can instruct DMA unit 308 to load the instructions (generated by the compiler) and data from host memory 304 into global memory 316. The loaded instructions can then be distributed to each core assigned with the corresponding task, and the one or more cores can process these instructions.
  • when a neural network has a simple architecture (e.g., with 5 layers), the neural network can be executed on host unit 320 without using on-chip communication system 302.
  • the training of a neural network can be implemented on on-chip communication system 302, while the application of the trained neural network can be implemented on host unit 320.
  • FIG. 3B illustrates a schematic diagram of an exemplary cloud system 330 incorporating parallel computing architecture 300, according to some embodiments of the disclosure.
  • cloud system 330 can provide cloud service with artificial intelligence (AI) capabilities, and can include a plurality of computing servers (e.g., 332 and 334) .
  • a computing server 332 can, for example, incorporate parallel computing architecture 300 of FIG. 3A.
  • Parallel computing architecture 300 is shown in FIG. 3B in a simplified manner for clarity.
  • cloud system 330 can provide the extended AI capabilities of image recognition, facial recognition, translations, 3D modeling, and the like.
  • parallel computing architecture 300 can be deployed to computing devices in other forms.
  • parallel computing architecture 300 can also be integrated in a computing device, such as a smart phone, a tablet, and a wearable device.
  • while a parallel computing architecture is shown in FIGs. 3A-3B, it is appreciated that any accelerator that provides the ability to perform parallel computation can be used.
  • conventionally, a 3D look-up table (LUT) is designed manually, or a pixel-level mapping relationship is trained.
  • Embodiments of the disclosure provide methods and systems for adaptively processing an input image using an adaptive image enhancement matrix determined in association with a neural network.
  • An adaptive image enhancement matrix can be generated based on a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices. At least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices can be adaptively generated based on the input image using a neural network, more particularly, based on color information of the input image. Also, as the color information can be processed by a relatively simple neural network, the image processing can be implemented on any computing platform, and the delay caused by enhancing an image using this adaptive image enhancement matrix is small enough to allow real-time video enhancement.
  • An image enhancement matrix (e.g., the adaptive image enhancement matrix) can be associated with a mapping relationship between an input image to be processed and a processed image. By applying the image enhancement matrix to parameters of the input image (e.g., RGB values of a pixel), the processed image can present enhanced image quality.
  • the image enhancement matrix can be a one-dimension matrix, a two-dimension matrix, a three-dimension matrix, or the like.
  • the one-dimension matrix can be used to adjust a gamma curve of an image
  • the two-dimension matrix can be used to adjust saturation, sharpness, and the like of an image
  • the three-dimension (3D) matrix can be used to adjust the color space of an image (e.g., values of red, green, blue components (RGB values) , values of hue, saturation, lightness (HSL values) , LumaChroma (YUV) values of the image)
  • the 3D matrix can be referred to as a 3D look-up table (LUT).
  • the enhancement matrix can include more dimensions than the above examples.
  • a 3D LUT can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an image.
  • an image enhancement model including at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, or the neural network can be trained based on a paired or unpaired reference image using a neural network. Processes for the above training will be described below.
  • FIG. 4A illustrates a schematic diagram of a process 400 for generating an image enhancement model, according to some embodiments of the disclosure.
  • an input image 402a, which is a high resolution image, can be down-sampled into a reduced input image 402b with a low resolution.
  • for example, an image with a size of 4096×4096 can be down-sampled into an image with a size of 256×256.
  • reduced input image 402b can be sent to a neural network 404 as an input. It is appreciated that more than one input image can be used as the input to neural network 404. This embodiment can achieve higher efficiency and lower memory consumption.
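  • A sketch of the down-sampling step described above, assuming OpenCV is used (the library and interpolation mode are illustrative choices, not specified by the disclosure):
```python
import cv2  # illustrative choice; any image library with resizing would work

def reduce_input_image(input_image, size=(256, 256)):
    """Down-sample a high-resolution input image (e.g., 4096x4096, like 402a)
    to a low-resolution copy (e.g., 256x256, like 402b) for the neural network."""
    return cv2.resize(input_image, size, interpolation=cv2.INTER_AREA)
```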
  • Neural network 404 can process a color space (e.g., RGB values, HSL values, YUV values, or the like) of reduced input image 402b and output a plurality of enhancement weights (e.g., elements 406a, 406b, and 406c of FIG. 4A) based on reduced input image 402b.
  • the plurality of enhancement weights can correspond to a plurality of basic image enhancement matrices (e.g., 408a, 408b, and 408c of FIG. 4A) , respectively.
  • the plurality of basic image enhancement matrices can be preset or randomly generated. Generally, a size of a basic image enhancement matrix is 32×32×32.
  • neural network 404 can output the plurality of basic image enhancement matrices (e.g., elements 408a-408c of FIG. 4A) based on reduced input image 402b.
  • the plurality of basic image enhancement matrices can correspond to the plurality of enhancement weights, which can be preset or randomly generated.
  • Neural network 404 can be a convolutional neural network (CNN) with five convolution blocks, each having a convolutional layer, a leaky ReLU, and an instance normalization layer, followed by a dropout layer and a fully-connected layer.
  • a first convolution block can have 3 kernels and 16 channels
  • a second convolution block can have 3 kernels and 32 channels
  • a third convolution block can have 3 kernels and 64 channels
  • a fourth convolution block and a fifth convolution block can each have 3 kernels and 128 channels.
  • the dropout layer can be added to neural network 404 to avoid overfitting, and the fully-connected layer can output a plurality of enhancement weights (e.g., elements 406a, 406b, and 406c of FIG. 4A) .
  • neural network 404 can also be a Unet, a LeNet, an AlexNet, a VGG, a GoogleNet, a ResNet, a DenseNet, or the like.
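  • A minimal PyTorch sketch of a weight-predictor network of the general shape described above (five convolution blocks of 16/32/64/128/128 channels, each with a convolution, leaky ReLU, and instance normalization, followed by dropout and a fully-connected layer). Stride, padding, the global pooling layer, and the dropout rate are assumptions, not values given by the disclosure:
```python
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    """Sketch of a network like 404: predicts N enhancement weights
    from a reduced (e.g., 256x256) RGB input image."""
    def __init__(self, num_weights=3):
        super().__init__()
        chans = [3, 16, 32, 64, 128, 128]
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [
                nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                nn.LeakyReLU(0.2, inplace=True),
                nn.InstanceNorm2d(c_out, affine=True),
            ]
        self.features = nn.Sequential(*blocks)
        self.dropout = nn.Dropout(0.5)       # added to avoid overfitting
        self.pool = nn.AdaptiveAvgPool2d(1)  # assumption: global pooling before the FC layer
        self.fc = nn.Linear(chans[-1], num_weights)

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return self.fc(self.dropout(x))      # e.g., weights such as 406a-406c

weights = WeightPredictor()(torch.rand(1, 3, 256, 256))  # -> tensor of shape (1, 3)
```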
  • an adaptive image enhancement matrix 410 can be generated based on the plurality of basic image enhancement matrices and the plurality of enhancement weights. For example, each of the basic image enhancement matrices can be multiplied with an enhancement weight corresponding to the basic image enhancement matrix, and the results can be added together to generate adaptive image enhancement matrix 410.
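  • A sketch of this fusion step, assuming three basic matrices of size 32×32×32 with 3 output channels and three scalar weights (the shapes are illustrative):
```python
import numpy as np

def fuse_luts(basic_luts, weights):
    """Fuse basic image enhancement matrices into one adaptive matrix:
    multiply each basic matrix by its weight and sum the results."""
    return sum(w * lut for w, lut in zip(weights, basic_luts))

basic_luts = [np.random.rand(32, 32, 32, 3) for _ in range(3)]  # e.g., 408a-408c
weights = [0.5, 0.3, 0.2]                                       # e.g., 406a-406c
adaptive_lut = fuse_luts(basic_luts, weights)                   # e.g., 410
```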
  • Adaptive image enhancement matrix 410 can be applied on input image 402a to generate an enhanced input image 412.
  • At least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, or the neural network can be trained by a group of reference images.
  • the group of reference images can include an image set of MIT-Adobe “FiveK” or an image set of HDR+.
  • the group of reference images can be selected by a user of a terminal device (e.g., a mobile phone, a drone, and the like), so that an adaptive image enhancement matrix reflecting the user’s preference can be generated based on the plurality of basic image enhancement matrices and the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices in association with the selected group of reference images.
  • the group of reference images can be associated with a scene (e.g., sky, person, pet, sport, flower, night scene, and the like) .
  • at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, and the neural network can be trained for each scene, so that adaptive image enhancement matrices corresponding to each scene can be generated.
  • a camera system can determine a scene of an image to be processed and then determine, for the image, at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, and the neural network corresponding to the scene.
  • the group of reference images can include an enhanced reference image (e.g., an enhanced image from the image set of MIT-Adobe “FiveK” or the image set of HDR+) .
  • neural network 404 can be updated by comparing the enhanced reference image and enhanced input image 412. For example, neural network weights of neural network 404 can be updated.
  • Neural network weights are provided for nodes of each layer of a neural network (e.g., neural network 100 of FIG. 1) .
  • a set of neural network weights can be provided for each node of hidden layer 130-n, so that the products of the neural network weights on each node in output layer 140 can be summed as outputs of output layer 140.
  • these outputs can be further used as enhancement weights for fusing basic image enhancement matrices into the adaptive image enhancement matrix.
  • FIG. 4B illustrates a schematic diagram of a process 410 for generating an image enhancement model, according to some embodiments of the disclosure.
  • enhancement weights 416a-416c can be preset or randomly generated, and a neural network 414 can output a plurality of basic image enhancement matrices 418a-418c corresponding to enhancement weights 416a-416c.
  • the set of neural network weights that are initially provided are usually not optimized, and can be further optimized through backpropagation based on a loss function reflecting a difference between a prediction value (e.g., enhanced input image 412) and an actual value (e.g., the enhanced reference image) .
  • a derivative of the loss function can be used to determine how the set of neural network weights affect the outputs of a neural network, and the derivative of the loss function can also be applied on the set of neural network weights to adjust values of these neural network weights.
  • in a first example, parameters of basic image enhancement matrices 408a-408c (e.g., values of one or more elements of a basic image enhancement matrix) can be adjusted using the loss function.
  • in a second example, values of enhancement weights 416a-416c can also be adjusted using the loss function.
  • in some embodiments, the basic image enhancement matrices of the first example and the enhancement weights of the second example may not be adjusted, and only the neural network (e.g., neural network 404 of FIG. 4A or neural network 414 of FIG. 4B) is updated.
  • the group of reference images can include a plurality of sets of paired images.
  • Each set of paired images can include an original reference image and an enhanced reference image corresponding to the original reference image.
  • the enhanced reference image can be an image with high quality (e.g., an image that is manually adjusted by an expert) .
  • the original reference image can be used as input image 402a. Therefore, enhanced input image 412 is generated by applying adaptive image enhancement matrix 410 on input image 402a (e.g., the original reference image). Because the enhanced reference image corresponds to the original reference image, the enhanced reference image can be compared with enhanced input image 412 to determine whether adaptive image enhancement matrix 410 can be further improved. In other words, at least one of basic image enhancement matrices 408a-408c, enhancement weights 406a-406c, and neural network 404 can be updated by comparing the enhanced reference image and enhanced input image 412. A paired loss function can be used to reflect a difference between the enhanced reference image and enhanced input image 412.
  • FIG. 4C illustrates a schematic diagram of updating basic image enhancement matrices (e.g., 408a-408c) and a neural network using a paired loss function 420, according to some embodiments of the disclosure.
  • paired loss function 420 can be used to update at least one of a plurality of basic image enhancement matrices 408a-408c or neural network 404.
  • paired loss function 420 can be a mean-square-error (MSE) loss function as Equation (1) below.
  • T is the number of sets of paired images (e.g., an original reference image and an enhanced reference image corresponding to the original reference image) used in training,
  • q_t is the enhanced input image (e.g., enhanced input image 412), and
  • y_t is the enhanced reference image (e.g., 414 of FIG. 4C) corresponding to the enhanced input image.
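  • Equation (1) itself is not reproduced in this text; a plausible MSE form consistent with the definitions above is:
```latex
L_{\mathrm{MSE}} = \frac{1}{T} \sum_{t=1}^{T} \left\lVert q_t - y_t \right\rVert^2
```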
  • L_MSE, determined based on a difference (e.g., q_t - y_t) between the enhanced input image and the enhanced reference image using the loss function, can be referred to as a loss. It is appreciated that other loss functions can also be used, such as an L1 loss function, a perceptual loss function, and the like.
  • the paired loss function can be used to adjust parameters of at least one of the plurality of basic image enhancement matrices (e.g., 408a-408c in FIG. 4C) or the neural network (e.g., 404 in FIG. 4C) .
  • the loss between the enhanced input image and the enhanced reference image can be used to determine whether the adjusted parameters are optimized.
  • the enhanced input image and the loss associated with the enhanced input image can be updated when the parameters of basic image enhancement matrices and the neural network are updated. For example, when the loss is less than a given threshold, it can be determined that the parameters of at least one of the plurality of basic image enhancement matrices or the neural network are optimized.
  • the optimized basic image enhancement matrices and the optimized neural network can be finalized as an image enhancement model and used for enhancing any input image.
  • the paired loss function can further include a smooth regularization factor (R_s) and a monotonicity regularization factor (R_m).
  • the smooth regularization factor can convert input values (e.g., RGB values) of the input image into a desired color space without generating many artifacts, and therefore, smooth the output of the adaptive image enhancement matrix.
  • the smooth regularization factor can include a total variation (R_TV) determined as Equation (2) below.
  • the smooth regularization factor can also include a factor determined based on the enhancement weights (W_n) generated by neural network 404 of FIG. 4A.
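  • Equation (2) and the weight-based factor are likewise not reproduced in this text; a plausible form, with V denoting the output values of the adaptive image enhancement matrix indexed by (i, j, k) for each color channel c and W_n the enhancement weights, is:
```latex
R_{TV} = \sum_{c \in \{r,g,b\}} \sum_{i,j,k} \Big(
      \big\lVert V_c(i{+}1,j,k) - V_c(i,j,k) \big\rVert^2
    + \big\lVert V_c(i,j{+}1,k) - V_c(i,j,k) \big\rVert^2
    + \big\lVert V_c(i,j,k{+}1) - V_c(i,j,k) \big\rVert^2 \Big),
\qquad
R_s = R_{TV} + \sum_{n} \lVert W_n \rVert^2
```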
  • the monotonicity regularization factor can preserve relative brightness and saturation of input values (e.g., RGB values) of the input image and update parameters that are not activated by the input values, thus improving generalization capability of the adaptive image enhancement matrix.
  • the monotonicity regularization factor R_m can be expressed as Equation (4) below.
  • the monotonicity regularization factor can ensure that the output value increases with the indices i, j, k and larger indices i, j, k correspond to larger input values in the adaptive image enhancement matrix.
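  • Equation (4) is also not reproduced in this text; a plausible hinge-style form that penalizes any decrease of the output values along increasing indices i, j, k is:
```latex
R_m = \sum_{c \in \{r,g,b\}} \sum_{i,j,k} \Big[
      g\big(V_c(i,j,k) - V_c(i{+}1,j,k)\big)
    + g\big(V_c(i,j,k) - V_c(i,j{+}1,k)\big)
    + g\big(V_c(i,j,k) - V_c(i,j,k{+}1)\big) \Big],
\qquad g(x) = \max(0, x)
```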
  • an unpaired loss function can be used for updating at least one of a plurality of basic image enhancement matrices or the neural network. More particularly, the enhanced reference image corresponding to the input image can contain a different content from the input image. In other words, the enhanced input image will be compared with an enhanced reference image containing a different content, and whether the enhanced input image is optimized can be determined using unsupervised learning.
  • FIG. 4D illustrates a schematic diagram of updating basic image enhancement matrices (e.g., 408a-408c) and a neural network using an unpaired loss function 430, according to some embodiments of the disclosure.
  • Unpaired loss function 430 can be used to update the machine learning model (e.g., basic image enhancement matrices 408a-408c or neural network 404) .
  • the contents of enhanced input image 412 and enhanced reference image 414 are different.
  • unpaired loss function 430 can be a generative adversarial network (GAN) loss function.
  • a GAN can include a generator and a discriminator.
  • the generator can create and pass an enhanced image to the discriminator, and if the discriminator determines that the probability of the enhanced image being an optimized image is 50%, the discriminator is fooled and the enhanced image can be considered optimized.
  • the generator of the GAN can include the plurality of basic image enhancement matrices (e.g., 408a-408c in FIG. 4A) and the neural network (e.g., 404 in FIG. 4A), or the plurality of enhancement weights (e.g., 416a-416c in FIG. 4B) and the neural network (e.g., 414 in FIG. 4B).
  • the discriminator can include a neural network (e.g., a convolutional neural network) that has the same architecture as neural network 404 or 414.
  • unpaired loss function 430 can include a generator function 432 associated with the generator of the GAN and a discriminator function 434 associated with the discriminator of the GAN.
  • Generator function 432 can be expressed as Equation (5) below.
  • λ_1 is a constant parameter to balance the above two factors. As an example, λ_1 is set to 1,000.
  • Discriminator function 434 can be expressed as Equation (6) below.
  • E_x[D(G(x))] and E_y[D(y)] represent discriminator losses, and the remaining term is a gradient penalty for stabilizing the training.
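  • Equations (5) and (6) are not reproduced in this text; a plausible WGAN-style instantiation consistent with the terms named above (G the generator, D the discriminator, x an input image, y an enhanced reference image, L_content a content-preservation term, and λ_gp the gradient-penalty coefficient) is:
```latex
L_G = \lambda_1 \, L_{\mathrm{content}}\big(G(x), x\big) - \mathbb{E}_x\!\left[D\big(G(x)\big)\right],
\qquad
L_D = \mathbb{E}_x\!\left[D\big(G(x)\big)\right] - \mathbb{E}_y\!\left[D(y)\right]
      + \lambda_{gp}\, \mathbb{E}_{\hat{y}}\!\left[\big(\lVert \nabla_{\hat{y}} D(\hat{y}) \rVert_2 - 1\big)^2\right]
```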
  • unpaired loss function 430 can further include a smooth regularization factor (R_s) and a monotonicity regularization factor (R_m), as described with reference to FIG. 4C.
  • the unpaired loss function can also be used to adjust parameters of at least one of a plurality of basic image enhancement matrices (e.g., 408a-408c in FIG. 4D) or the neural network (e.g., 404 in FIG. 4D) .
  • the loss between the enhanced input image and the enhanced reference image can be used to determine whether the adjusted parameters are optimized.
  • the enhanced input image and the loss associated with the enhanced input image can be re-determined after the parameters of basic image enhancement matrices and the neural network are updated. For example, when the loss is less than a given threshold, it can be determined that the parameters of basic image enhancement matrices and the neural network are optimized.
  • an image enhancement model including at least one of the optimized basic image enhancement matrices or the optimized neural network can be finalized and used for enhancing any input image.
  • while FIGs. 4C-4D show applying a paired loss function or an unpaired loss function on at least one of a plurality of basic image enhancement matrices or a neural network (i.e., the first example), the paired loss function or the unpaired loss function can also be applied on at least one of a plurality of enhancement weights or a neural network (i.e., the second example).
  • an image enhancement model including at least one of a plurality of basic image enhancement matrices, a plurality of enhancement weights, or a neural network can be trained in a first apparatus and applied in a second apparatus.
  • an image enhancement model can be trained by a smart phone manufacturer and preset in smart phones.
  • an image enhancement model can be remotely trained by a cloud system based on a group of reference images selected by a user of a smart phone, but the trained image enhancement model can be applied locally in the smart phone.
  • the image enhancement model can be trained and used by a same apparatus.
  • the image enhancement model can be trained by a smart phone, and applied in the same smart phone.
  • in applying an image enhancement model on an input image, trilinear interpolation can be adopted, as an input image may have more bits than an adaptive image enhancement matrix generated by the image enhancement model.
  • a size of a basic image enhancement matrix can be 32×32×32, and if the input image is an 8-bit image, an output image can be interpolated.
  • an input color of a pixel of the input image can be determined.
  • the input color can include RGB values of the pixel.
  • based on the input color, a maximum color value, and a number of elements in the adaptive image enhancement matrix (e.g., a 3D look-up table (LUT)), the location of the corresponding element in the matrix can be determined, where r(x, y, z), g(x, y, z), and b(x, y, z) represent the RGB values of the input color, C_max represents the maximum color value, and M represents the number of elements in the 3D LUT.
  • the nearest 8 surrounding elements can be used to interpolate an output color.
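  • A sketch of this lookup-and-interpolation step; the index scaling used below (index = color / C_max × (M − 1)) is an assumption consistent with the variables defined above, not the exact equation of the publication:
```python
import numpy as np

def apply_3d_lut(lut, image, c_max=255.0):
    """Apply a 3D LUT of shape (M, M, M, 3) to an 8-bit RGB image using
    trilinear interpolation over the nearest 8 surrounding LUT elements."""
    m = lut.shape[0]
    idx = image.astype(np.float32) / c_max * (m - 1)  # scale colors into LUT index space
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, m - 1)
    frac = idx - lo
    out = np.zeros(image.shape, dtype=np.float32)
    # Accumulate the 8 surrounding corners, weighted by trilinear coefficients.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                r = hi[..., 0] if dr else lo[..., 0]
                g = hi[..., 1] if dg else lo[..., 1]
                b = hi[..., 2] if db else lo[..., 2]
                w = ((frac[..., 0] if dr else 1 - frac[..., 0])
                     * (frac[..., 1] if dg else 1 - frac[..., 1])
                     * (frac[..., 2] if db else 1 - frac[..., 2]))
                out += w[..., None] * lut[r, g, b]
    return out

lut = np.random.rand(32, 32, 32, 3)                            # e.g., the adaptive 3D LUT
image = np.random.randint(0, 256, (4, 4, 3)).astype(np.uint8)  # tiny 8-bit test image
enhanced = apply_3d_lut(lut, image)
```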
  • FIG. 4E illustrates a schematic diagram of a process 440 for generating an image enhancement model, according to some embodiments of the disclosure.
  • a plurality of enhancement weights 446a-446c and a plurality of basic image enhancement matrices 448a-448c are preset or randomly generated.
  • a loss function and a loss determined based on a difference between a reference image 442 and an enhanced input image 412 using the loss function can be used to iteratively update the plurality of enhancement weights 446a-446c and the plurality of basic image enhancement matrices 448a-448c.
  • the plurality of enhancement weights 446a-446c and the plurality of basic image enhancement matrices 448a-448c can be finalized as an image enhancement model.
  • FIG. 5 is a flowchart of an exemplary computer-implemented method 500 for processing an image, according to some embodiments of the disclosure.
  • Method 500 can be implemented by, for example, at least one of a parallel computing architecture (e.g., 300 of FIG. 3A) , an X86 central processing unit, an ARM processor, or the like.
  • Method 500 can include steps as below.
  • an image enhancement model for an input image is generated.
  • the image enhancement model can include a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices.
  • the basic image enhancement matrices (e.g., 408a-408c of FIG. 4A) or the enhancement weights (e.g., 416a-416c of FIG. 4B) can be preset or randomly generated.
  • a first neural network (e.g., neural network 404 of FIG. 4A) can be used to generate a plurality of enhancement weights (e.g., 406a-406c of FIG. 4A) corresponding to the plurality of basic image enhancement matrices based on an input image.
  • a first neural network (e.g., neural network 414 of FIG. 4B) can be used to generate a plurality of basic image enhancement matrices (e.g., 418a-418c of FIG. 4B) corresponding to the plurality of enhancement weights based on an input image.
  • the first neural network can be a convolutional neural network, a Unet, a LeNet, an AlexNet, a VGG, a GoogleNet, a ResNet, a DenseNet, or the like.
  • an input image can be down-sampled into a reduced image (e.g., 402b of FIG. 4A) . Then, at least one of the plurality of enhancement weights or the plurality of basic image enhancement matrices can be generated based on the reduced image using the first neural network.
  • the first neural network can process color space information of an image. More particularly, the color space information can include RGB values, HSL values, or YUV values of a pixel of an image.
  • the neural network can be simple enough that a general processor (e.g., an X86 central processing unit, an ARM processor, or the like) is sufficient to execute it.
  • the plurality of basic image enhancement matrices and the generated adaptive image enhancement matrix can be color space models.
  • an adaptive image enhancement matrix is generated based on the plurality of enhancement weights and the plurality of basic image enhancement matrices. For example, each of the basic image enhancement matrices can be multiplied with an enhancement weight corresponding to the basic image enhancement matrix, and the results can be added together to generate the adaptive image enhancement matrix.
  • the input image is enhanced using the adaptive image enhancement matrix.
  • the adaptive image enhancement matrix can be a three-dimension look-up table (3D LUT) .
  • the 3D LUT can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an input image. It is appreciated that a 3D LUT can also be applied on an input image to change, for example, hue, saturation, lightness of the input image, depending on the nature of the 3D LUT.
  • the image enhancement model is updated based on the enhanced input image.
  • at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, or the first neural network can be updated by comparing the enhanced input image with a reference image, and eventually finalized.
  • the reference image can be selected by a user of a terminal device.
  • the user can select a reference image according to his/her preference.
  • the reference image is associated with a scene (e.g., sky, person, pet, sport, flower, night scene, and the like) .
  • at least one of the plurality of finalized basic image enhancement matrices, the plurality of finalized enhancement weights, or the finalized first neural network can be associated with the scene.
  • the reference image can include a content that is the same as the input image, and the content of the reference image is enhanced. Therefore, the reference image and the input image are paired images. Then, in determining the loss associated with reference image and the enhanced input image, a difference between the enhanced input image and the enhanced reference image can be determined, and the loss can be determined using a paired loss function based on the difference.
  • the paired loss function can include a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function.
  • the paired loss function can further include a smooth regularization factor and a monotonicity regularization factor.
  • the reference image can include a content that is different from the input image, and the content is also enhanced.
  • the enhanced input image will be compared with the reference image containing a different but enhanced content, and whether the enhanced input image is optimized can be determined using unsupervised learning.
  • an unpaired loss function can be determined using a second neural network, and the loss can be determined using the unpaired loss function based on the enhanced input image and the enhanced reference image.
  • the second neural network can be referred to as a generative adversarial network (GAN) , and can include a generator network and a discriminator network, as described above.
  • the generator network can include the plurality of basic image enhancement matrices and the first neural network.
  • the generator network can include the plurality of enhancement weights and the first neural network, or the plurality of enhancement weights and the plurality of basic image enhancement matrices.
  • the discriminator network is a convolutional neural network (CNN) having e.g., the same architecture as the first neural network.
  • the unpaired loss function (e.g., 430 of FIG. 4D) can include a first loss function (e.g., 432 of FIG. 4D) associated with the generator network and a second loss function (e.g., 434 of FIG. 4D) associated with the discriminator network.
  • the unpaired loss function can also include a smooth regularization factor and a monotonicity regularization factor.
  • method 500 can further include a step for determining whether the reference image is paired with the input image, e.g., using a third neural network or according to an indicator provided by the user. And in response to the determination that the reference image is paired with the input image, the paired loss function can be used. In response to the determination that the reference image is not paired with the input image, the unpaired loss function can be used.
  • At least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network can be updated according to the loss determined by the loss function (e.g., the paired loss function or the unpaired loss function) .
  • a derivative of the loss function can be applied to the first neural network and used to update neural network weights of the first neural network.
  • the derivative of the loss function can also be used to update the plurality of basic image enhancement matrices and the plurality of enhancement weights.
  • the plurality of basic image enhancement matrices and the first neural network can be updated according to the loss determined by the loss function.
  • the first neural network can output the plurality of enhancement weights.
  • the values of elements of each basic image enhancement matrix can be updated, and parameters (e.g., neural network weights) of the first neural network can also be updated. It is appreciated that when the first neural network is updated, the plurality of output enhancement weights can also be updated accordingly.
  • only the first neural network can be updated according to the loss determined by the loss function.
  • the first neural network can output the plurality of enhancement weights.
  • the plurality of basic image enhancement matrices can be predetermined and kept unchanged.
  • it is appreciated that when the first neural network is updated, the plurality of output enhancement weights can also be updated accordingly.
  • the plurality of enhancement weights and the first neural network can be updated according to the loss determined by the loss function.
  • the first neural network outputs the plurality of basic image enhancement matrices.
  • the values of the enhancement weights can be updated, and parameters (e.g., neural network weights) of the first neural network can also be updated. It is appreciated that when the first neural network is updated, the plurality of output basic image enhancement matrices can also be updated accordingly.
  • only the first neural network can be updated.
  • the first neural network can output the plurality of basic image enhancement matrices.
  • the plurality of enhancement weights can be predetermined and kept unchanged.
  • the plurality of output basic image enhancement matrices can also be updated accordingly.
  • a new image enhancement model including at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network, can be generated and a new enhanced input image can be compared with the reference image and used to train the image enhancement model.
  • the updating of at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network can be performed iteratively.
  • the loss determined by the loss function can be used to determine whether the training of the image enhancement model is finished.
  • At least one of the plurality of updated basic image enhancement matrices, the plurality of updated enhancement weights corresponding to the plurality of basic image enhancement matrices, or the updated first neural network can be finalized. For example, when the loss is less than a given threshold, it can be determined that the training of the image enhancement model is finished, and at least one of the plurality of updated basic image enhancement matrices, the plurality of updated enhancement weights corresponding to the plurality of basic image enhancement matrices, or the updated first neural network can be finalized.
  • FIG. 6 is a flowchart of an exemplary computer-implemented method 600 for processing an image, according to some embodiments of the disclosure. More particularly, method 600 can be used to adaptively enhance an image.
  • Method 600 can be implemented by, for example, at least one of a parallel computing architecture (e.g., 300 of FIG. 3A) , an X86 central processing unit, an ARM processor, or the like. Method 600 can include steps as below.
  • an input image is received.
  • the input image can be an existing image that is stored locally, or an image instantly taken by a camera system.
  • the input image can be a frame of a video clip or a video streaming.
  • enhancement parameters are determined for the input image using a neural network (e.g., neural network 404 or 414 in FIGs. 4A-4B) .
  • the enhancement parameters can include a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices.
  • the plurality of basic image enhancement matrices can be determined based on the input image using the neural network. And the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices are preset. That is, the plurality of basic image enhancement matrices can be adaptively generated in accordance with the input image using the neural network.
  • the plurality of enhancement weights can be determined based on the input image using the neural network.
  • the plurality of basic image enhancement matrices corresponding to the plurality of enhancement weights are preset. That is, the plurality of enhancement weights can be adaptively generated in accordance with the input image using the neural network.
  • a scene associated with the input image can be determined, e.g., using a scene identification neural network, and the enhancement parameters can be determined according to the scene.
  • the scene identification neural network can be part of e.g., neural network 404 or 414 in FIGs. 4A-4B, or an independent neural network.
  • the scene identification neural network can determine that the enhancement parameters correspond to a “sky” scene, when a main part of the input image is directed to sky.
  • the plurality of enhancement weights, the plurality of basic image enhancement matrices, and the neural network can be trained using above method 500.
  • an adaptive image enhancement matrix is generated based on the enhancement parameters.
  • the adaptive image enhancement matrix can be generated based on the plurality of basic image enhancement matrices and the plurality of enhancement weights. For example, each of the basic image enhancement matrices can be multiplied by its corresponding enhancement weight, and the results can be added together to generate the adaptive image enhancement matrix (a code sketch of this fusion and the subsequent table lookup follows this list).
  • the input image is enhanced using the adaptive image enhancement matrix.
  • the adaptive image enhancement matrix can be associated with a mapping relationship between an input image to be processed and a processed image.
  • by applying the adaptive image enhancement matrix on the input image, parameters of the input image (e.g., RGB values of a pixel) can be adjusted to generate the processed image.
  • the processed image can present enhanced image quality.
  • the image enhancement matrix can be a one-dimension matrix, a two-dimension matrix, a three-dimension matrix, or the like.
  • the one-dimension matrix can be used to adjust a gamma curve of an image
  • the two-dimension matrix can be used to adjust saturation, sharpness, and the like of an image
  • the three-dimension (3D) matrix can be used to adjust the color space of an image (e.g., values of red, green, and blue components (RGB values), values of hue, saturation, and lightness (HSL values), or luma and chroma (YUV) values of the image)
  • the 3D matrix can be referred to as a 3D look-up table (LUT).
  • the enhancement matrix can include more dimensions than the above examples.
  • a 3D LUT can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on the input image.
  • FIG. 7 is a flowchart of an exemplary computer-implemented method 700 for processing an image, according to some embodiments of the disclosure.
  • Method 700 can be implemented by, for example, at least one of a parallel computing architecture (e.g., 300 of FIG. 3A) , an X86 central processing unit, an ARM processor, or the like.
  • Method 700 can include steps as below.
  • a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the image parameter mapping models are generated for an input image.
  • the image parameter mapping models (e.g., 448a-448c of FIG. 4E) can be preset or randomly generated.
  • the plurality of image parameter mapping models can be color space models associated with color space information.
  • the color space information of the input image can include RGB values, HSL values, or YUV values of a pixel of the input image.
  • the plurality of image parameter mapping models can be one-dimension matrices, two-dimension matrices, or three-dimension matrices.
  • a size of the image parameter mapping model can be 32×32×32.
  • the enhancement weights (446a-446c of FIG. 4E) corresponding to the image parameter mapping models can also be preset or randomly generated.
  • an adaptive image enhancement matrix is generated based on the plurality of enhancement weights and the plurality of image parameter mapping models. For example, each of the image parameter mapping models can be multiplied with an enhancement weight corresponding to the image parameter mapping model, and the results can be added together to generate the adaptive image enhancement matrix, which serves as an adaptive image parameter mapping model.
  • the plurality of image parameter mapping models and the generated image parameter mapping model can be color space models.
  • the input image is enhanced using the adaptive image parameter mapping model.
  • the adaptive image parameter mapping model can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an input image.
  • the plurality of image parameter mapping models and the plurality of enhancement weights are updated based on the enhanced input image.
  • at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be updated by comparing the enhanced input image with a reference image, and eventually finalized.
  • the reference image can be selected by a user of a terminal device.
  • the user can select a reference image in his/her preference.
  • the reference image is associated with a scene (e.g., sky, person, pet, sport, flower, night scene, and the like) . Accordingly, at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be associated with the scene.
  • the plurality of image parameter mapping models can be updated according to the loss.
  • the plurality of enhancement weights can be predetermined and kept unchanged.
  • the plurality of enhancement weights can be updated according to the loss.
  • the plurality of image parameter mapping models can be predetermined and kept unchanged.
  • the plurality of image parameter mapping models and the plurality of enhancement weights can be updated according to the loss.
  • the reference image can include a content that is the same as the input image, and the content of the reference image is enhanced. Therefore, the reference image and the input image are paired images. Then, in determining the loss associated with the reference image and the enhanced input image, a difference between the enhanced input image and the enhanced reference image can be determined, and the loss can be determined using a paired loss function based on the difference.
  • the paired loss function can include a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function.
  • the paired loss function can further include a smooth regularization factor and a monotonicity regularization factor.
  • the reference image can include a content that is different from the input image, and the content is also enhanced.
  • the enhanced input image will be compared with the reference image containing a different but enhanced content, and whether the enhanced input image is optimized can be determined using unsupervised learning.
  • an unpaired loss function can be determined using a neural network, and the loss can be determined using the unpaired loss function based on the enhanced input image and the enhanced reference image.
  • the neural network can be referred to as a generative adversarial network (GAN) , and can include a generator network and a discriminator network.
  • the generator network can include the plurality of image parameter mapping models and the plurality of enhancement weights.
  • the discriminator network is a convolutional neural network (CNN) .
  • the unpaired loss function can include a first loss function associated with the generator network and a second loss function associated with the discriminator network (an adversarial-loss sketch follows this list).
  • the unpaired loss function can also include a smooth regularization factor and a monotonicity regularization factor.
  • method 700 can further include a step for determining whether the reference image is paired with the input image, e.g., using a neural network or according to an indicator provided by the user. In response to a determination that the reference image is paired with the input image, the paired loss function can be used; in response to a determination that the reference image is not paired with the input image, the unpaired loss function can be used.
  • At least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be updated according to the loss determined by the loss function (e.g., the paired loss function or the unpaired loss function) .
  • a derivative of the loss function can be applied to the plurality of image parameter mapping models or the plurality of enhancement weights and used to update them (see the gradient-update sketch following this list).
  • a new image enhancement model including at least one of the plurality of image parameter mapping models or the plurality of enhancement weights, can be generated and a new enhanced input image can be compared with the reference image and used to train the image enhancement model.
  • the updating of at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be performed iteratively.
  • the loss determined by the loss function can be used to determine whether the training of the image enhancement model is finished.
  • At least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be finalized. For example, when the loss is less than a given threshold, it can be determined that the training of the image enhancement model is finished, and at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be finalized.
  • the plurality of image parameter mapping models and the plurality of enhancement weights can be finalized as an image enhancement model for the inference stage.
  • each of the image parameter mapping models can be multiplied with an enhancement weight corresponding to the image parameter mapping model, and the results can be added together to generate an image enhancement model.
  • the image enhancement model can be a three-dimension look-up table (3D LUT) .
  • the 3D LUT can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an input image.
  • an image enhancement model can also be applied on an input image to change, for example, hue, saturation, lightness of the input image, depending on the nature of the image enhancement model.
  • Embodiments of the disclosure also provide a computer program product.
  • the computer program product may include a non-transitory computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out the above-described methods.
  • the computer readable storage medium may be a tangible device that can store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM) , a read-only memory (ROM) , an erasable programmable read-only memory (EPROM) , a static random access memory (SRAM) , a portable compact disc read-only memory (CD-ROM) , a digital versatile disk (DVD) , a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • the computer readable program instructions for carrying out the above-described methods may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language, and conventional procedural programming languages.
  • the computer readable program instructions may execute entirely on a computer system as a stand-alone software package, or partly on a first computer and partly on a second computer remote from the first computer. In the latter scenario, the second, remote computer may be connected to the first computer through any type of network, including a local area network (LAN) or a wide area network (WAN) .
  • the computer readable program instructions may be provided to a processor of a computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the above-described methods.
  • the application scenarios are not limited to the ISP pipeline of various cameras, such as UAVs, mobile phones, SLRs, mirrorless cameras, and action cameras. The disclosed techniques can also be used on mobile phones and computers for tone mapping.
  • a block in the flow charts or diagrams may represent a software program, segment, or portion of code, which comprises one or more executable instructions for implementing specific functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the diagrams and/or flow charts, and combinations of blocks in the diagrams and flow charts may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
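To make the fusion step of methods 600 and 700 concrete, the following is a minimal numpy sketch of the inference path: preset basic 3D LUTs are combined into an adaptive LUT using enhancement weights that, in the described methods, would be produced by the neural network (fixed numbers are used here as stand-ins), and the adaptive LUT is then applied to a normalized RGB image. The 32×32×32×3 shape, the [0, 1] value range, and the nearest-grid-point lookup are illustrative assumptions rather than details taken from this disclosure.

```python
import numpy as np

def fuse_luts(basic_luts, weights):
    """Weighted sum of basic 3D LUTs into one adaptive 3D LUT."""
    adaptive = np.zeros_like(basic_luts[0])
    for lut, weight in zip(basic_luts, weights):
        adaptive += weight * lut
    return adaptive

def apply_lut_nearest(image, lut):
    """Map each normalized RGB pixel through the LUT using its nearest grid point."""
    n = lut.shape[0]                                              # grid points per channel, e.g., 32
    idx = np.clip(np.rint(image * (n - 1)), 0, n - 1).astype(int)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

rng = np.random.default_rng(0)
basic_luts = [rng.random((32, 32, 32, 3)) for _ in range(3)]      # preset basic matrices
weights = [0.5, 0.3, 0.2]                                         # stand-ins for network-predicted weights
adaptive_lut = fuse_luts(basic_luts, weights)
image = rng.random((256, 256, 3))                                 # normalized RGB input image
enhanced = apply_lut_nearest(image, adaptive_lut)                 # enhanced input image
```

An implementation could replace the nearest-neighbor lookup with an interpolated lookup, as sketched later in the detailed description.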
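Several items above note that a derivative of the loss function can be used to update the enhancement weights while the mapping models stay fixed. Because the enhanced image is a weighted sum of the per-matrix outputs, the gradient of a paired MSE loss with respect to each weight has a simple closed form, which the numpy sketch below uses for one gradient-descent step. The array sizes, learning rate, toy data, and the way the per-pixel grid indices are obtained are all assumptions made only for illustration.

```python
import numpy as np

def lookup(lut, idx):
    """Gather LUT entries for per-pixel grid indices idx of shape (H, W, 3)."""
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

def mse_gradient_step(basic_luts, weights, idx, reference, lr=0.1):
    """One gradient-descent update of the enhancement weights under an MSE loss.

    Since enhanced = sum_i w_i * B_i[idx], the derivative of the loss with
    respect to each weight is mean(2 * (enhanced - reference) * B_i[idx]).
    """
    outputs = [lookup(lut, idx) for lut in basic_luts]            # per-matrix enhanced outputs
    enhanced = sum(w * o for w, o in zip(weights, outputs))       # enhanced input image
    residual = enhanced - reference
    loss = np.mean(residual ** 2)
    grads = [np.mean(2.0 * residual * o) for o in outputs]
    return [w - lr * g for w, g in zip(weights, grads)], loss

rng = np.random.default_rng(1)
basic_luts = [rng.random((32, 32, 32, 3)) for _ in range(3)]
idx = rng.integers(0, 32, size=(64, 64, 3))                       # precomputed grid indices of the input
reference = rng.random((64, 64, 3))                               # paired (enhanced) reference image
weights = [1.0, 0.0, 0.0]
for _ in range(20):
    weights, loss = mse_gradient_step(basic_luts, weights, idx, reference)
```

The same loss value can also drive the stopping criterion mentioned above, e.g., finalizing the weights once the loss falls below a given threshold.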
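For the unpaired case, this disclosure describes a first loss function associated with the generator network and a second loss function associated with the discriminator network without reproducing their exact form here, so the sketch below shows one common adversarial formulation (binary cross-entropy GAN losses) purely as an illustration. The small discriminator and the random image tensors are placeholders; in the described methods the "fake" images would be enhanced input images produced by the generator side and the "real" images would be enhanced reference images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# placeholder discriminator: any CNN mapping an image batch to one logit per image
discriminator = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1),
)

def unpaired_losses(enhanced_input, enhanced_reference):
    """Adversarial losses: d_loss trains the discriminator, g_loss the generator side."""
    real_logits = discriminator(enhanced_reference)
    fake_logits = discriminator(enhanced_input.detach())          # detached: discriminator update only
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    gen_logits = discriminator(enhanced_input)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    return g_loss, d_loss

g_loss, d_loss = unpaired_losses(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```

Smoothness and monotonicity regularization terms, where used, would be added on top of these adversarial terms.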

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

Systems and methods for processing an image. The method can include: generating an image enhancement model for an input image (402a) using a first neural network (404), wherein the image enhancement model comprises a plurality of basic image enhancement matrices (408a-408c) and a plurality of enhancement weights (406a-406c) corresponding to the plurality of basic image enhancement matrices (408a-408c); generating an adaptive image enhancement matrix (410) based on the plurality of enhancement weights (406a-406c) and the plurality of basic image enhancement matrices (408a-408c); enhancing the input image (402a) using the adaptive image enhancement matrix (410); and updating the image enhancement model based on the enhanced input image (412), wherein at least one of the plurality of basic image enhancement matrices (408a-408c) or the plurality of enhancement weights (406a-406c) corresponding to the plurality of basic image enhancement matrices (408a-408c) is updated by comparing the enhanced input image (412) and a reference image.

Description

[Title established by the ISA under Rule 37.2] SYSTEMS AND METHODS FOR PROCESSING IMAGE
Copyright Notice
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
TECHNICAL FIELD
The present disclosure generally relates to systems and methods for processing an image, and more particularly, to enhancing an image based on deep learning.
BACKGROUND
An original image can be enhanced to improve the quality of the image. For example, image enhancement can include exposure compensation, hue/saturation adjustment, tone mapping, or gamma correction. However, photo enhancement is highly empirical and usually hand-crafted by a seasoned expert through extensive labor.
Using machine learning to enhance the quality of an image is trending. For example, a 3D look-up table can be manually designed and used to enhance an image with a certain scene. As another example, a mapping relationship between pixels of sample images can be trained and applied to each pixel of an input image for improving the quality.
SUMMARY
Embodiments of the present disclosure provide a system for processing an image. The system includes: a memory for storing a set of instructions; and at least one processor configured to execute the set of instructions for causing the system to perform: generating an image  enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image and a reference image.
Embodiments of the present disclosure also provide a computer-implemented method for processing an image. The method includes: generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image with a reference image.
Embodiments of the present disclosure further provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image.  The method includes: generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image with a reference image.
Embodiments of the present disclosure further provide a system for processing an image. The system includes: a memory for storing a set of instructions; and at least one processor configured to execute the set of instructions for causing the system to perform: receiving an input image; determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprises a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the enhancement parameters; and enhancing the input image using the adaptive image enhancement matrix.
Embodiments of the present disclosure further provide a computer-implemented method for processing an image. The method includes: receiving an input image; determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprises a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;  generating an adaptive image enhancement matrix based on the enhancement parameters; and enhancing the input image using the adaptive image enhancement matrix.
Embodiments of the present disclosure further provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image. The method includes: receiving an input image; determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprises a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices; generating an adaptive image enhancement matrix based on the enhancement parameters; and enhancing the input image using the adaptive image enhancement matrix.
Embodiments of the present disclosure further provide a system for processing an image. The system includes: a memory for storing a set of instructions; and at least one processor configured to execute the set of instructions for causing the system to perform: generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models; enhancing the input image using the adaptive image enhancement matrix; and updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
Embodiments of the present disclosure further provide a computer-implemented method for processing an image. The method includes: generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
Embodiments of the present disclosure further provide a non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image. The method includes: generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image; generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models; enhancing the input image using the adaptive image enhancement matrix; and updating the image enhancement model based on the enhanced input image, wherein at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed. Other features of the present invention will become apparent by a review of the specification, claims, and appended figures.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates an exemplary neural network.
FIG. 2 illustrates an exemplary neural network inference pipeline workflow, according to some embodiments of the present disclosure.
FIG. 3A illustrates an exemplary parallel computing architecture, according to some embodiments of the disclosure.
FIG. 3B illustrates a schematic diagram of an exemplary cloud system incorporating a parallel computing architecture, according to some embodiments of the disclosure.
FIG. 4A illustrates a schematic diagram of a process for generating an image enhancement model, according to some embodiments of the disclosure.
FIG. 4B illustrates a schematic diagram of another process for generating an image enhancement model, according to some embodiments of the disclosure.
FIG. 4C illustrates a schematic diagram of updating basic image enhancement matrices and a neural network using a paired loss function, according to some embodiments of the disclosure.
FIG. 4D illustrates a schematic diagram of updating basic image enhancement matrices and a neural network using an unpaired loss function, according to some embodiments of the disclosure.
FIG. 4E illustrates a schematic diagram of yet another process for generating an image enhancement model, according to some embodiments of the disclosure.
FIG. 5 is a flowchart of an exemplary computer-implemented method for processing an image, according to some embodiments of the disclosure.
FIG. 6 is a flowchart of another exemplary computer-implemented method for processing an image, according to some embodiments of the disclosure.
FIG. 7 is a flowchart of yet another exemplary computer-implemented method for processing an image, according to some embodiments of the disclosure.
DETAILED DESCRIPTION
The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers refer to the same or similar parts. While several illustrative embodiments are described herein, modifications, adaptations and other implementations are possible. For example, substitutions, additions or modifications may be made to the components illustrated in the drawings. Accordingly, the following detailed description is not limited to the disclosed embodiments and examples. Instead, the proper scope is defined by the appended claims.
FIG. 1 illustrates an exemplary neural network (NN) 100. As depicted in FIG. 1, neural network 100 can include an input layer 120 that accepts inputs, e.g., input 110-1, ..., input 110-m. Inputs can include an image, text, or any other structure or unstructured data for processing by neural network 100. In some embodiments, neural network 100 can accept a plurality of inputs simultaneously. For example, in FIG. 1, neural network 100 can accept up to m inputs simultaneously. Additionally or alternatively, input layer 120 can accept up to m inputs in rapid succession, e.g., such that input 110-1 is accepted by input layer 120 in one cycle, a  second input is accepted by input layer 120 in a second cycle in which input layer 120 pushes data from input 110-1 to a first hidden layer, and so on. Any number of inputs can be used in simultaneous input, rapid succession input, or the like.
Input layer 120 can comprise one or more nodes, e.g., node 120-1, node 120-2, ..., node 120-a. Each node can apply an activation function to corresponding input (e.g., one or more of input 110-1, ..., input 110-m) and weight the output from the activation function by a particular weight associated with the node. An activation function can comprise a Heaviside step function, a Gaussian function, a multiquadratic function, an inverse multiquadratic function, a sigmoidal function, a ReLU function, a Leaky ReLU function, a Tanh function, or the like. A weight can comprise a positive value between 0.0 and 1.0 or any other numerical value configured to allow some nodes in a layer to have corresponding output scaled more or less than output corresponding to other nodes in the layer.
As further depicted in FIG. 1, neural network 100 can include one or more hidden layers, e.g., hidden layer 130-1, ..., hidden layer 130-n. Each hidden layer can comprise one or more nodes. For example, in FIG. 1, hidden layer 130-1 comprises node 130-1-1, node 130-1-2, node 130-1-3, ..., node 130-1-b, and hidden layer 130-n comprises node 130-n-1, node 130-n-2, node 130-n-3, ..., node 130-n-c. Similar to nodes of input layer 120, nodes of the hidden layers can apply activation functions to output from connected nodes of the previous layer and weight the output from the activation functions by particular weights associated with the nodes.
As further depicted in FIG. 1, neural network 100 can include an output layer 140 that finalizes outputs, e.g., output 150-1, output 150-2, ..., output 150-d. Output layer 140 can comprise one or more nodes, e.g., node 140-1, node 140-2, ..., node 140-d. Similar to nodes of input layer 120 and of the hidden layers, nodes of output layer 140 can apply activation functions  to output from connected nodes of the previous layer and weight the output from the activation functions by particular weights associated with the nodes. In some embodiments, the finalized outputs can be a plurality of enhancement matrices, a plurality of enhancement weights, or the like. It is appreciated that, weights of neural network 100 can be modified to change the finalized outputs of neural network 100.
Although depicted as fully connected in FIG. 1, the layers of neural network 100 can use any connection scheme. For example, one or more layers (e.g., input layer 120, hidden layer 130-1, ..., hidden layer 130-n, output layer 140, or the like) can be connected using a convolutional scheme, a sparsely connected scheme, or the like. Such embodiments can use fewer connections between one layer and a previous layer than depicted in FIG. 1.
Moreover, although depicted as a feedforward network in FIG. 1, neural network 100 can additionally or alternatively use backpropagation (e.g., by using long short-term memory nodes or the like) . Accordingly, although neural network 100 is depicted similar to a convolutional neural network (CNN) , neural network 100 can comprise a recurrent neural network (RNN) , a generative adversarial network (GAN) , or any other neural network.
In general, a neural network has two stages in deep learning workflow: training and inference. During training, the neural network keeps learning parameter values by iteratively updating them to minimize prediction error. When converged, the neural network with learned parameters can then be used to perform inference tasks on new cases.
FIG. 2 illustrates an exemplary neural network inference pipeline workflow 200, according to some embodiments of the present disclosure. Although inference workflow 200 relates to weight generation, it is appreciated that this is only an example rather than a limitation. As shown in FIG. 2, a trained neural network (e.g., neural network 100 of FIG. 1) can receive an  input 201, e.g., an image with a size of 256×256, and perform computation 203 on input 201. Specifically, a forward propagation (FP) starts in the neural network and data flow from an input layer, through one or more hidden layers, to an output layer. As explained with reference to FIG. 1, each layer in the neural network receives inputs from precedent layer (or layers) , performs computation on the inputs, and sends output to subsequent layer (or layers) . After computation, the neural network provides an output 205, e.g., an evaluation result. As depicted in FIG. 2, the output 205 can include a plurality of weights or matrices.
A convolutional neural network (CNN) is a neural network category. CNN is widely used in many technical fields. For example, a CNN can perform visual tasks, e.g., image features/patterns learning or recognition.
FIG. 3A illustrates an exemplary parallel computing architecture 300, according to some embodiments of the disclosure. As shown in FIG. 3A, architecture 300 can include a chip communication system 302, a host memory 304, a memory controller 306, a direct memory access (DMA) unit 308, a Joint Test Action Group (JTAG) /Test Access End (TAP) controller 310, a peripheral interface 312, a bus 314, a global memory 316, and the like. It is appreciated that chip communication system 302 can perform algorithmic operations (e.g., machine learning operations) based on communicated data.
On-chip communication system 302 can include a global manager 3022 and a plurality of cores 3024. Global manager 3022 can include at least one task manager to coordinate with one or more cores 3024. Each task manager can be associated with an array of cores 3024 that provide synapse/neuron circuitry for parallel computation (e.g., the neural network) . For example, the top layer of processing elements of FIG. 3A may provide circuitry representing an input layer to a neural network, while the second layer of cores may provide circuitry  representing a hidden layer of the neural network. In some embodiments, on-chip communication system 302 can be implemented as a neural network processing unit (NPU) , a graphic processing unit (GPU) , or another heterogeneous accelerator unit. As shown in FIG. 3A, global manager 3022 can include two task managers to coordinate with two arrays of cores.
Cores 3024, for example, can include one or more processing elements that each include single instruction, multiple data (SIMD) architecture including one or more processing units configured to perform one or more operations (e.g., multiplication, addition, multiply-accumulate, etc. ) based on instructions received from global manager 3022. To perform the operation on the communicated data packets, cores 3024 can include one or more processing elements for processing information in the data packets. Each processing element may comprise any number of processing units. In some embodiments, core 3024 can be considered a tile or the like.
Host memory 304 can be off-chip memory such as a host CPU’s memory. For example, host memory 304 can be a DDR memory (e.g., DDR SDRAM) or the like. Host memory 304 can be configured to store a large amount of data with slower access speed, compared to the on-chip memory integrated within one or more processors, acting as a higher-level cache.
Memory controller 306 can manage the reading and writing of data to and from a specific memory block within global memory 316 having on-chip memory blocks (e.g., 4 blocks of 8GB second generation of high bandwidth memory (HBM2) ) to serve as main memory. For example, memory controller 306 can manage read/write data coming from outside chip communication system 302 (e.g., from DMA unit 308 or a DMA unit corresponding with another NPU) or from inside chip communication system 302 (e.g., from a local memory in core  3024 via a 2D mesh controlled by a task manager of global manager 3022) . Moreover, while one memory controller is shown in FIG. 3A, it is appreciated that more than one memory controller can be provided in architecture 300. For example, there can be one memory controller for each memory block (e.g., HBM2) within global memory 316.
Memory controller 306 can generate memory addresses and initiate memory read or write cycles. Memory controller 306 can contain several hardware registers that can be written and read by the one or more processors. The registers can include a memory address register, a byte-count register, one or more control registers, and other types of registers. These registers can specify some combination of the source, the destination, the direction of the transfer (reading from the input/output (I/O) device or writing to the I/O device) , the size of the transfer unit, the number of bytes to transfer in one burst, and/or other typical features of memory controllers.
DMA unit 308 can assist with transferring data between host memory 304 and global memory 316. In addition, DMA unit 308 can assist with transferring data between multiple on-chip communication systems (e.g., 302) . DMA unit 308 can allow off-chip devices to access both on-chip and off-chip memory without causing a CPU interrupt. Thus, DMA unit 308 can also generate memory addresses and initiate memory read or write cycles. DMA unit 308 also can contain several hardware registers that can be written and read by the one or more processors, including a memory address register, a byte-count register, one or more control registers, and other types of registers. These registers can specify some combination of the source, the destination, the direction of the transfer (reading from the input/output (I/O) device or writing to the I/O device) , the size of the transfer unit, and/or the number of bytes to transfer in one burst. It is appreciated that architecture 300 can include a second DMA unit, which can be used to  transfer data between other neural network processing architectures to allow multiple neural network processing architectures to communication directly without involving the host CPU.
JTAG/TAP controller 310 can specify a dedicated debug port implementing a serial communications interface (e.g., a JTAG interface) for low-overhead access to the NPU without requiring direct external access to the system address and data buses. JTAG/TAP controller 310 can also have on-chip test access interface (e.g., a TAP interface) that implements a protocol to access a set of test registers that present chip logic levels and device capabilities of various parts.
Peripheral interface 312 (such as a PCIe interface) , if present, serves as an (and typically the) inter-chip bus, providing communication between architecture 300 and other devices.
Bus 314 includes both intra-chip bus and inter-chip buses. The intra-chip bus connects all internal components to one another as called for by the system architecture. While not all components are connected to every other component, all components do have some connection to other components they need to communicate with. The inter-chip bus connects the NPU with other devices, such as the off-chip memory or peripherals. Typically, if there is a peripheral interface 312 (e.g., the inter-chip bus) , bus 314 is solely concerned with intra-chip buses, though in some implementations it could still be concerned with specialized inter-bus communications.
On-chip communication system 302 can be configured to perform operations based on neural networks.
Architecture 300 can also include a host unit 320. Host unit 320 can be one or more processing unit (e.g., an X86 central processing unit, an ARM processor, and the like) . In some embodiments, a host system having host unit 320 and host memory 304 can comprise a compiler  (not shown) . The compiler is a program or computer software that transforms computer codes written in one programming language into NPU instructions to create an executable program. In machine learning applications, a compiler can perform a variety of operations, for example, pre-processing, lexical analysis, parsing, semantic analysis, conversion of input programs to an intermediate representation, code optimization, and code generation, or combinations thereof.
In some embodiments, the compiler that generates the NPU instructions can be on the host system, which pushes commands to chip communication system 302. Based on these commands, each task manager can assign any number of tasks to one or more cores (e.g., core 3024) . Some of the commands can instruct DMA unit 308 to load the instructions (generated by the compiler) and data from host memory 304 into global memory 316. The loaded instructions can then be distributed to each core assigned with the corresponding task, and the one or more cores can process these instructions.
It is appreciated that when a neural network has a simple architecture (e.g., with 5 layers) , the neural network can be executed on host unit 320 without using on-chip communication system 302. In other words, in some embodiments, the training of a neural network can be implemented on on-chip communication system 302, while the application of the trained neural network can be implemented on host unit 320.
FIG. 3B illustrates a schematic diagram of an exemplary cloud system 330 incorporating parallel computing architecture 300, according to some embodiments of the disclosure.
As shown in FIG. 3B, cloud system 330 can provide cloud service with artificial intelligence (AI) capabilities, and can include a plurality of computing servers (e.g., 332 and 334) . In some embodiments, a computing server 332 can, for example, incorporate parallel  computing architecture 300 of FIG. 3A. Parallel computing architecture 300 is shown in FIG. 3B in a simplified manner for simplicity and clarity.
With the assistance of parallel computing architecture 300, cloud system 330 can provide the extended AI capabilities of image recognition, facial recognition, translations, 3D modeling, and the like.
It is appreciated that, parallel computing architecture 300 can be deployed to computing devices in other forms. For example, parallel computing architecture 300 can also be integrated in a computing device, such as a smart phone, a tablet, and a wearable device.
Moreover, while a parallel computing architecture is shown in FIGs. 3A-3B, it is appreciated that any accelerator that provides the ability to perform parallel computation can be used.
As discussed above, conventionally, for enhancing an image, a 3D look-up table (LUT) is designed manually or a pixel-level mapping relationship can be trained. However, both methods can be highly time-consuming and inadaptive.
Embodiments of the disclosure provide methods and systems for adaptively processing an input image using an adaptive image enhancement matrix determined in association with a neural network. An adaptive image enhancement matrix can be generated based on a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices. At least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices can be adaptively generated based on the input image using a neural network, more particularly, based on color information of the input image. Also, as the color information can be processed by a relatively simple neural network, the image processing can be implemented on any computing platform, and the delay caused by enhancing an image using this adaptive image enhancement matrix is small enough to allow real-time video enhancement.
An image enhancement matrix (e.g., the adaptive image enhancement matrix) can be associated with a mapping relationship between an input image to be processed and a processed image. By applying the image enhancement matrix on the input image, parameters of the input image (e.g., RGB values of a pixel) can be adjusted to generate the processed image. Generally, the processed image can present enhanced image quality. The image enhancement matrix can be a one-dimension matrix, a two-dimension matrix, a three-dimension matrix, or the like. For example, the one-dimension matrix can be used to adjust a gamma curve of an image, the two-dimension matrix can be used to adjust saturation, sharpness, and the like of an image, and the three-dimension (3D) matrix can be used to adjust the color space of an image (e.g., values of red, green, blue components (RGB values) , values of hue, saturation, lightness (HSL values) , LumaChroma (YUV) values of the image) . In some embodiments, the 3D matrix can be referred to a 3D look-up table (LUT) . It is appreciated that the enhancement matrix can include more dimensions than the above examples.
As an example, a 3D LUT can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an image.
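As an illustration of how such a 3D LUT can be applied to an image, the sketch below maps normalized RGB values through a LUT with trilinear interpolation. The interpolation scheme, the [0, 1] value range, and the array shapes are assumptions made for this example; the text above only specifies that one set of RGB values is mapped to another.

```python
import numpy as np

def apply_3d_lut_trilinear(image, lut):
    """Map RGB values through a 3D LUT with trilinear interpolation.

    image: float array (H, W, 3) with values in [0, 1]
    lut:   float array (N, N, N, 3), e.g., N = 32
    """
    n = lut.shape[0]
    pos = image * (n - 1)                       # continuous coordinates in the LUT grid
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, n - 1)
    frac = pos - lo

    out = np.zeros_like(image)
    for dr in (0, 1):                           # visit the 8 surrounding grid points
        for dg in (0, 1):
            for db in (0, 1):
                r = np.where(dr, hi[..., 0], lo[..., 0])
                g = np.where(dg, hi[..., 1], lo[..., 1])
                b = np.where(db, hi[..., 2], lo[..., 2])
                w = (np.where(dr, frac[..., 0], 1 - frac[..., 0])
                     * np.where(dg, frac[..., 1], 1 - frac[..., 1])
                     * np.where(db, frac[..., 2], 1 - frac[..., 2]))
                out += w[..., None] * lut[r, g, b]
    return out
```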
In some embodiments, an image enhancement model, including at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, or the neural network, can be trained based on a paired or unpaired reference image using a neural network. Processes for the above training are described below.
FIG. 4A illustrates a schematic diagram of a process 400 for generating an image enhancement model, according to some embodiments of the disclosure.
As shown in FIG. 4A, an input image 402a, which is a high resolution image, can be down-sampled into a reduced input image 402b with a low resolution. For example, an image with a size of 4096×4096 can be down-sampled into an image with a size of 256×256. Then, reduced input image 402b can be sent to a neural network 404 as an input. It is appreciated that more than one input image can be used as the input to neural network 404. This embodiment can achieve higher efficiency and lower memory consumption.
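One simple way to obtain such a reduced input is block averaging, sketched below; the exact resampling method is not specified here, so this is only an illustrative stand-in (any standard image resizing routine could be used instead).

```python
import numpy as np

def downsample_block_mean(image, out_size=256):
    """Down-sample an (H, W, 3) image to (out_size, out_size, 3) by block averaging.

    Assumes H and W are multiples of out_size (e.g., 4096 -> 256 works the same way).
    """
    h, w, c = image.shape
    return image.reshape(out_size, h // out_size, out_size, w // out_size, c).mean(axis=(1, 3))

reduced = downsample_block_mean(np.random.rand(1024, 1024, 3))    # -> (256, 256, 3)
```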
Neural network 404 can process a color space (e.g., RGB values, HSL values, YUV values, or the like) of reduced input image 402b and output a plurality of enhancement weights (e.g.,  elements  406a, 406b, and 406c of FIG. 4A) based on reduced input image 402b. The plurality of enhancement weights can correspond to a plurality of basic image enhancement matrices (e.g., 408a, 408b, and 408c of FIG. 4A) , respectively. The plurality of basic image enhancement matrices can be preset or randomly generated. Generally, a size of a basic image enhancement matrix is 32×32×32.
In some embodiments, neural network 404 can output the plurality of basic image enhancement matrices (e.g., elements 408a-408c of FIG. 4A) based on reduced input image 402b. The plurality of basic image enhancement matrices can correspond to the plurality of enhancement weights, which can be preset or randomly generated.
Neural network 404 can be a convolutional neural network (CNN) with five convolution blocks (each having a convolutional layer, a leaky ReLU, and an instance normalization layer), a dropout layer, and a fully-connected layer. As the size of the input (i.e., reduced input image 402b) is highly reduced, neural network 404 can have a simple structure to accelerate the training of the image enhancement model. For example, a first convolution block can have 3 kernels and 16 channels, a second convolution block can have 3 kernels and 32 channels, a third convolution block can have 3 kernels and 64 channels, and a fourth convolution block and a fifth convolution block can each have 3 kernels and 128 channels. The dropout layer can be added to neural network 404 to avoid overfitting, and the fully-connected layer can output a plurality of enhancement weights (e.g., elements 406a, 406b, and 406c of FIG. 4A). It is appreciated that neural network 404 can also be a Unet, a LeNet, an AlexNet, a VGG, a GoogleNet, a ResNet, a DenseNet, or the like.
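A minimal PyTorch sketch of such a weight-predicting CNN is shown below. The channel widths (16/32/64/128/128) follow the description above; the kernel size, stride, padding, global average pooling, dropout probability, and the number of output weights are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    """Small CNN mapping a down-sampled image to a set of enhancement weights."""

    def __init__(self, num_weights=3):
        super().__init__()
        channels = [3, 16, 32, 64, 128, 128]
        blocks = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            blocks += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.LeakyReLU(0.2),
                       nn.InstanceNorm2d(c_out)]
        self.features = nn.Sequential(*blocks)                    # five convolution blocks
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(channels[-1], num_weights)

    def forward(self, x):                                         # x: (B, 3, 256, 256)
        h = self.features(x)
        h = h.mean(dim=(2, 3))                                    # global average pooling -> (B, 128)
        return self.fc(self.dropout(h))                           # (B, num_weights) enhancement weights

weights = WeightPredictor()(torch.rand(1, 3, 256, 256))
```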
Then, an adaptive image enhancement matrix 410 can be generated based on the plurality of basic image enhancement matrices and the plurality of enhancement weights. For example, each of the basic image enhancement matrices can be multiplied with an enhancement weight corresponding to the basic image enhancement matrix, and the results can be added together to generate adaptive image enhancement matrix 410.
Adaptive image enhancement matrix 410 can be applied on input image 402a to generate an enhanced input image 412.
As discussed above, at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, or the neural network can be trained by a group of reference images. For example, the group of reference images can include an image set of MIT-Adobe “FiveK” or an image set of HDR+. As another example, the group of reference images can be selected by a user of a terminal device (e.g., a mobile phone, a drone, and the like), so that an adaptive image enhancement matrix reflecting the user’s preference can be generated based on the plurality of basic image enhancement matrices and the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices in association with the selected group of reference images.
It is appreciated that the group of reference images can be associated with a scene (e.g., sky, person, pet, sport, flower, night scene, and the like). At least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, and the neural network can be trained for each scene, so that adaptive image enhancement matrices corresponding to each scene can be generated. As an example, a camera system can determine a scene of an image to be processed and then determine, for the image, at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, and the neural network corresponding to the scene.
The group of reference images can include an enhanced reference image (e.g., an enhanced image from the image set of MIT-Adobe “FiveK” or the image set of HDR+) . In some embodiments, neural network 404 can be updated by comparing the enhanced reference image and enhanced input image 412. For example, neural network weights of neural network 404 can be updated.
Neural network weights are provided for nodes of each layer of a neural network (e.g., neural network 100 of FIG. 1) . For example, a set of neural network weights can be provided for each node of hidden layer 130-n, so that the products of the neural network weights on each node in output layer 140 can be summed as outputs of output layer 140. In a first example, as shown in  FIG. 4A, these outputs can be further used as enhancement weights for fusing basic image enhancement matrices into the adaptive image enhancement matrix.
FIG. 4B illustrates a schematic diagram of a process 410 for generating an image enhancement model, according to some embodiments of the disclosure. As shown in FIG. 4B, in a second example, enhancement weights 416a-416c can be preset or randomly generated, and a neural network 414 can output a plurality of basic image enhancement matrices 418a-418c corresponding to enhancement weights 416a-416c.
However, the set of neural network weights that are initially provided are usually not optimized, and can be further optimized through backpropagation based on a loss function reflecting a difference between a prediction value (e.g., enhanced input image 412) and an actual value (e.g., the enhanced reference image) . For example, a derivative of the loss function can be used to determine how the set of neural network weights affect the outputs of a neural network, and the derivative of the loss function can also be applied on the set of neural network weights to adjust values of these neural network weights. In the first example above, in addition to the neural network weights of neural network 404, parameters of basic image enhancement matrices 408a-408c (e.g., values of one or more elements of a basic image enhancement matrix) can also be adjusted using the loss function. And in the second example above, in addition to the neural network weights of neural network 414, values of enhancement weights 416a-416c can also be adjusted using the loss function. In some embodiments, the basic image enhancement matrices of the first example and the enhancement weights of the second example may not be adjusted. In other words, in some embodiments, only the neural network (e.g., neural network 404 of FIG. 4A or neural network 414 of FIG. 4B) can be updated in association with the enhanced input image and the enhanced reference image.
In some embodiments, the group of reference images can include a plurality of sets of paired images. Each set of paired images can include an original reference image and an enhanced reference image corresponding to the original reference image. The enhanced reference image can be an image with high quality (e.g., an image that is manually adjusted by an expert) .
In some embodiments, the original reference image can be used as input image 402a. Therefore, enhanced input image 412 is generated by applying adaptive image enhancement matrix 410 on input image 402a (e.g., the original reference image) . Because the enhanced reference image corresponds to the original reference image, the enhanced reference image can be compared with enhanced input image 412 to determine whether adaptive image enhancement matrix 410 can be further improved. In other words, at least one of basic image enhancement matrices 408a-408c, enhancement weights 406a-406c, and neural network 404 can be updated by comparing the enhanced reference image and enhanced input image 412. And a paired loss function can be used to reflect a difference between the enhanced reference image and enhanced input image 412.
FIG. 4C illustrates a schematic diagram of updating basic image enhancement matrices (e.g., 408a-408c) and a neural network using a paired loss function 420, according to some embodiments of the disclosure. In the above first example, paired loss function 420 can be used to update at least one of a plurality of basic image enhancement matrices 408a-408c or neural network 404.
In some embodiments, paired loss function 420 can be a mean-square-error (MSE) loss function as Equation (1) below.
L_MSE = (1/T) ∑_t ||q_t − y_t||²        Eq. (1)
, wherein T is a number of sets of paired images (e.g., an original reference image and an enhanced reference image corresponding to the original reference image) used in training, q_t is the enhanced input image (e.g., enhanced input image 412) , and y_t is the enhanced reference image (e.g., 414 of FIG. 4C) corresponding to the enhanced input image. L_MSE, determined based on a difference (e.g., q_t − y_t) between the enhanced input image and the enhanced reference image using the loss function, can be referred to as a loss. It is appreciated that other loss functions can also be used, such as an L1 loss function, a perceptual loss function, and the like.
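The MSE loss of Equation (1) is straightforward to express in code. The following is a minimal sketch, assuming PyTorch and assuming the T enhanced input images and enhanced reference images are stacked into tensors of identical shape; the function and argument names are illustrative only.

```python
import torch

def paired_mse_loss(enhanced_inputs: torch.Tensor, enhanced_references: torch.Tensor) -> torch.Tensor:
    """Paired MSE loss of Eq. (1): mean squared difference between q_t and y_t."""
    # Average the squared difference over the T paired images (and over all pixels and channels).
    return torch.mean((enhanced_inputs - enhanced_references) ** 2)
```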
As discussed above, the paired loss function can be used to adjust parameters of at least one of the plurality of basic image enhancement matrices (e.g., 408a-408c in FIG. 4C) or the neural network (e.g., 404 in FIG. 4C) . And the loss between the enhanced input image and the enhanced reference image can be used to determine whether the adjusted parameters are optimized. It is appreciated that the enhanced input image and the loss associated with the enhanced input image can be updated when the parameters of basic image enhancement matrices and the neural network are updated. For example, when the loss is less than a given threshold, it can be determined that the parameters of at least one of the plurality of basic image enhancement matrices or the neural network are optimized. And the optimized basic image enhancement matrices and the optimized neural network can be finalized as an image enhancement model and used for enhancing any input image.
In some embodiments, the paired loss function can further include a smooth regularization factor (R s) and a monotonicity regularization factor (R m) .
The smooth regularization factor can convert input values (e.g., RGB values) of the input image into a desired color space without generating many artifacts, and therefore, smooth the output of the adaptive image enhancement matrix. In some embodiments, the smooth regularization factor can include a total variation (R_TV) determined as Equation (2) below.
R_TV = ∑_{c∈{r,g,b}} ∑_{i,j,k} (||o^c_(i+1,j,k) − o^c_(i,j,k)||² + ||o^c_(i,j+1,k) − o^c_(i,j,k)||² + ||o^c_(i,j,k+1) − o^c_(i,j,k)||²)        Eq. (2)
, wherein o^c_(i,j,k) represents an output value of the adaptive image enhancement matrix at indices (i, j, k) for a color channel c.
In addition to the total variation (R_TV) , the smooth regularization factor can also include a factor determined based on the enhancement weights (w_n) generated by neural network 404 of FIG. 4A. The factor can be expressed as ∑_n ||w_n||². Therefore, in some embodiments, the smooth regularization factor R_s can be expressed as Equation (3) below.
R_s = R_TV + ∑_n ||w_n||²        Eq. (3)
The monotonicity regularization factor can preserve relative brightness and saturation of input values (e.g., RGB values) of the input image and update parameters that are not activated by the input values, thus improving generalization capability of the adaptive image enhancement matrix. The monotonicity regularization factor R_m can be expressed as Equation (4) below.
R_m = ∑_{c∈{r,g,b}} ∑_{i,j,k} [g(o^c_(i,j,k) − o^c_(i+1,j,k)) + g(o^c_(i,j,k) − o^c_(i,j+1,k)) + g(o^c_(i,j,k) − o^c_(i,j,k+1))]        Eq. (4)
, wherein g() is a standard ReLU operation, such as g(a) = max(0, a) . The monotonicity regularization factor can ensure that the output value o^c_(i,j,k) increases with the indices i, j, k, and larger indices i, j, k correspond to larger input values in the adaptive image enhancement matrix.
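For illustration, the total variation and monotonicity factors of Equations (2) and (4) could be computed on a LUT tensor roughly as sketched below, assuming the matrix is stored with shape (3, M, M, M) (one output color channel per leading dimension); the ∑_n ||w_n||² term of Equation (3) would be added separately from the predicted enhancement weights. This is a sketch under those assumptions, not a prescribed implementation.

```python
import torch
import torch.nn.functional as F

def lut_regularizers(lut: torch.Tensor):
    """Compute total-variation (R_TV) and monotonicity (R_m) terms for a 3D LUT.

    lut: adaptive (or basic) image enhancement matrix of shape (3, M, M, M),
         one output color channel (r, g, b) per leading dimension.
    """
    # Differences between neighbouring LUT entries along the three colour axes.
    d_i = lut[:, 1:, :, :] - lut[:, :-1, :, :]
    d_j = lut[:, :, 1:, :] - lut[:, :, :-1, :]
    d_k = lut[:, :, :, 1:] - lut[:, :, :, :-1]

    # R_TV: squared neighbour differences keep the colour mapping smooth.
    r_tv = (d_i ** 2).sum() + (d_j ** 2).sum() + (d_k ** 2).sum()

    # R_m: the ReLU penalises any decrease of the output value with growing indices.
    r_m = F.relu(-d_i).sum() + F.relu(-d_j).sum() + F.relu(-d_k).sum()
    return r_tv, r_m
```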
In some embodiments, when the plurality of reference images include an enhanced reference image that is different from the input image (e.g., 402a in FIG. 4A) , an unpaired loss function can be used for updating at least one of a plurality of basic image enhancement matrices or the neural network. More particularly, the enhanced reference image corresponding to the input image can contain a different content from the input image. In other words, the enhanced input image will be compared with an enhanced reference image containing a different content,  and whether the enhanced input image is optimized can be determined using unsupervised learning.
FIG. 4D illustrates a schematic diagram of updating basic image enhancement matrices (e.g., 408a-408c) and a neural network using an unpaired loss function 430, according to some embodiments of the disclosure. Unpaired loss function 430 can be used to update the machine learning model (e.g., basic image enhancement matrices 408a-408c or neural network 404) . As shown in FIG. 4D, the contents of enhanced input image 412 and enhanced reference image 414 are different.
In some embodiments, unpaired loss function 430 can be a generative adversarial network (GAN) loss function based on a GAN. Generally, a GAN can include a generator and a discriminator. As an example, the generator can create and pass an enhanced image to the discriminator, and if the discriminator can only decide with a probability of 50% whether the enhanced image is a real (optimized) image, then the discriminator is fooled and the enhanced image can be considered optimized. In some embodiments, the generator of the GAN can include the plurality of basic image enhancement matrices (e.g., 408a-408c in FIG. 4A) and the neural network (e.g., 404 in FIG. 4A) , or the plurality of enhancement weights (e.g., 416a-416c in FIG. 4B) and the neural network (e.g., 414 in FIG. 4B) . The discriminator can include a neural network that has a same architecture as neural network 404 or 414. For example, the discriminator can be a convolutional neural network having the same architecture as neural network 404 or 414.
With reference to FIG. 4D, unpaired loss function 430 can include a generator function 432 associated with the generator of the GAN and a discriminator function 434 associated with the discriminator of the GAN.
Generator function 432 can be expressed as Equation (5) below.
L_G = E_x[−D(G(x))] + λ_1·E_x[||G(x) − x||²]        Eq. (5)
, wherein −D(G(x)) is a factor for enforcing the generator of the GAN to fool the discriminator, ||G(x) − x||² is a factor that ensures the image generated by the GAN preserves a same content as the input image, and λ_1 is a constant parameter to balance the above two factors. As an example, λ_1 is set to 1,000.
Discriminator function 434 can be expressed as Equation (6) below.
L_D = E_x[D(G(x))] − E_y[D(y)] + λ_2·E_x̂[(||∇_x̂D(x̂)||_2 − 1)²]        Eq. (6)
, wherein E_x[D(G(x))] and E_y[D(y)] represent discriminator losses, and λ_2·E_x̂[(||∇_x̂D(x̂)||_2 − 1)²] is a gradient penalty for stabilizing the training.
And unpaired loss function 430 can be expressed using Equation (7) .
L_gan = L_G + L_D        Eq. (7)
In some embodiments, unpaired loss function 430 can further include a smooth regularization factor (R s) and a monotonicity regularization factor (R m) , as described with reference to FIG. 4C.
Similarly, in the above first example, the unpaired loss function can also be used to adjust parameters of at least one of a plurality of basic image enhancement matrices (e.g., 408a-408c in FIG. 4D) or the neural network (e.g., 404 in FIG. 4D) . And the loss between the enhanced input image and the enhanced reference image can be used to determine whether the adjusted parameters are optimized. It is appreciated that the enhanced input image and the loss associated with the enhanced input image can be re-determined after the parameters of basic image enhancement matrices and the neural network are updated. For example, when the loss is less than a given threshold, it can be determined that the parameters of basic image enhancement matrices and the neural network are optimized. And an image enhancement model including at least one of the optimized basic image enhancement matrices or the optimized neural network can be finalized and used for enhancing any input image.
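A sketch of Equations (5)–(7), assuming PyTorch and a WGAN-style gradient penalty computed on random interpolations between generated and reference images; the penalty weight λ_2 = 10 and the interpolation scheme for x̂ are assumptions not specified above. In practice the generator and discriminator are usually optimized alternately rather than through a single summed loss, but the sketch mirrors the equations as written.

```python
import torch

def generator_loss(discriminator, x, g_x, lambda1=1000.0):
    """Eq. (5): adversarial term plus a content-preserving term weighted by lambda_1."""
    adversarial = -discriminator(g_x).mean()          # E_x[-D(G(x))]
    content = ((g_x - x) ** 2).mean()                 # E_x[||G(x) - x||^2]
    return adversarial + lambda1 * content

def discriminator_loss(discriminator, g_x, y, lambda2=10.0):
    """Eq. (6): discriminator losses plus a gradient penalty on interpolated samples."""
    loss = discriminator(g_x.detach()).mean() - discriminator(y).mean()

    # Gradient penalty evaluated at random interpolations between generated and reference images.
    eps = torch.rand(g_x.size(0), 1, 1, 1, device=g_x.device)
    x_hat = (eps * y.detach() + (1.0 - eps) * g_x.detach()).requires_grad_(True)
    grads = torch.autograd.grad(discriminator(x_hat).sum(), x_hat, create_graph=True)[0]
    penalty = ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
    return loss + lambda2 * penalty

def unpaired_gan_loss(discriminator, x, g_x, y):
    """Eq. (7): L_gan = L_G + L_D."""
    return generator_loss(discriminator, x, g_x) + discriminator_loss(discriminator, g_x, y)
```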
It is appreciated that although FIGs. 4C-4D show applying a paired loss function or an unpaired loss function on at least one of a plurality of basic image enhancement matrices or a neural network (i.e., the first example) , the paired loss function or the unpaired loss function can also be applied on at least one of a plurality of enhancement weights or a neural network (i.e., the second example) .
In some embodiments, an image enhancement model including at least one of a plurality of basic image enhancement matrices, a plurality of enhancement weights, or a neural network can be trained in a first apparatus and applied in a second apparatus. For example, an image enhancement model can be trained by a smart phone manufacturer and preset in smart phones. As another example, an image enhancement model can be remotely trained by a cloud system based on a group of reference images selected by a user of a smart phone, but the trained image enhancement model can be applied locally in the smart phone. In some embodiments, the image enhancement model can be trained and used by a same apparatus. For example, the image enhancement model can be trained by a smart phone, and applied in the same smart phone.
In some embodiments, in applying an image enhancement model on an input image, trilinear interpolation can be adopted because the input image may have more possible values per color channel than the number of elements along each dimension of the adaptive image enhancement matrix generated by the image enhancement model. For example, as discussed above, a size of a basic image enhancement matrix can be 32×32×32, and if the input image is an 8-bit image (i.e., 256 possible values per channel) , an output image can be interpolated from the matrix.
In an exemplary trilinear interpolation, an input color of a pixel of the input image can be determined. The input color can include RGB values of the pixel. Based on the RGB values of the input color, a maximum color value, and a number of elements in the adaptive image enhancement matrix (e.g., a 3D look-up table (LUT) ) , a location for the input color in the 3D LUT can be determined using Equation (8) below.
x = r_(x,y,z)/s, y = g_(x,y,z)/s, z = b_(x,y,z)/s        Eq. (8)
, wherein r_(x,y,z) , g_(x,y,z) , and b_(x,y,z) represent the RGB values of the input color, respectively, s = C_max/M, C_max represents the maximum color value, and M represents the number of elements along each dimension of the 3D LUT.
Then, based on the location (x, y, z) of the input color in the 3D LUT, the 8 nearest surrounding elements can be used to interpolate an output color.
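The location computation of Equation (8) and the interpolation from the 8 surrounding elements could look roughly as follows. This is a sketch only: it assumes the LUT is stored as a (3, M, M, M) tensor whose axes correspond to the R, G, and B input values in that order, and it uses a step of C_max/(M − 1) so that the largest input value lands exactly on the last LUT entry.

```python
import torch

def apply_3d_lut(lut: torch.Tensor, image: torch.Tensor, c_max: float = 255.0) -> torch.Tensor:
    """Apply a (3, M, M, M) LUT to a (3, H, W) image with trilinear interpolation.

    Each input colour is mapped to a fractional location per Eq. (8); the output colour is
    interpolated from the 8 LUT entries surrounding that location.
    """
    m = lut.shape[1]
    s = c_max / (m - 1)                      # step size (assumption: M - 1 intervals)
    coords = image / s                       # fractional (x, y, z) locations, shape (3, H, W)
    lo = coords.floor().long().clamp(0, m - 2)
    frac = (coords - lo.float()).clamp(0.0, 1.0)

    x0, y0, z0 = lo[0], lo[1], lo[2]
    fx, fy, fz = frac[0], frac[1], frac[2]

    out = torch.zeros_like(image)
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # Trilinear weight of this corner of the surrounding cube.
                w = ((fx if dx else 1 - fx)
                     * (fy if dy else 1 - fy)
                     * (fz if dz else 1 - fz))
                corner = lut[:, x0 + dx, y0 + dy, z0 + dz]      # shape (3, H, W)
                out = out + w * corner
    return out
```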
FIG. 4E illustrates a schematic diagram of a process 440 for generating an image enhancement model, according to some embodiments of the disclosure. As shown in FIG. 4E, unlike FIGs. 4A-4B, no neural network is provided. Instead, a plurality of enhancement weights 446a-446c and a plurality of basic image enhancement matrices 448a-448c are preset or randomly generated. A loss function and a loss determined based on a difference between a reference image 442 and an enhanced input image 412 using the loss function can be used to iteratively update the plurality of enhancement weights 446a-446c and the plurality of basic image enhancement matrices 448a-448c. When the loss satisfies a given condition (e.g., the loss being less than a given threshold) , the plurality of enhancement weights 446a-446c and the plurality of basic image enhancement matrices 448a-448c can be finalized as an image enhancement model.
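A minimal sketch of this variant, assuming PyTorch: the enhancement weights and the basic matrices are registered directly as learnable parameters, with no neural network in the loop; the number of matrices, their 32×32×32 size, and the optimizer settings are assumptions.

```python
import torch

# Direct optimisation of the FIG. 4E variant: the enhancement weights and basic image
# enhancement matrices are themselves the learnable parameters.
num_luts, m = 3, 32
basic_luts = torch.nn.Parameter(torch.rand(num_luts, 3, m, m, m))
weights = torch.nn.Parameter(torch.full((num_luts,), 1.0 / num_luts))
optimizer = torch.optim.Adam([basic_luts, weights], lr=1e-3)
# Each iteration: fuse the matrices with the weights, enhance the input image, compute the
# loss against the reference image, back-propagate, and step the optimizer; training stops
# once the loss satisfies the given condition (e.g., drops below a threshold).
```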
FIG. 5 is a flowchart of an exemplary computer-implemented method 500 for processing an image, according to some embodiments of the disclosure. Method 500 can be implemented by, for example, at least one of a parallel computing architecture (e.g., 300 of FIG.  3A) , an X86 central processing unit, an ARM processor, or the like. Method 500 can include steps as below.
At step 502, an image enhancement model for an input image is generated. The image enhancement model can include a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices. In a first example, the basic image enhancement matrices (e.g., 408a-408c of FIG. 4A) can be preset or randomly generated. Generally, a size of a basic image enhancement matrix is 32×32×32. In a second example, the enhancement weights (e.g., 416a-416c of FIG. 4B) can be preset or randomly generated.
In the first example, a first neural network (e.g., neural network 404 of FIG. 4A) can be used to generate a plurality of enhancement weights (e.g., 406a-406c of FIG. 4A) corresponding to the plurality of basic image enhancement matrices based on an input image. In the second example, a first neural network (e.g., neural network 414 of FIG. 4B) can be used to generate a plurality of basic image enhancement matrices (e.g., 418a-418c of FIG. 4B) corresponding to the plurality of enhancement weights based on an input image. The first neural network can be a convolutional neural network, a Unet, a LeNet, an AlexNet, a VGG, a GoogleNet, a ResNet, a DenseNet, or the like.
To generate the plurality of enhancement weights or the plurality of basic image enhancement matrices using the first neural network, an input image can be down-sampled into a reduced image (e.g., 402b of FIG. 4A) . Then, at least one of the plurality of enhancement weights or the plurality of basic image enhancement matrices can be generated based on the reduced image using the first neural network. The first neural network can process color space information of an image. More particularly, the color space information can include RGB values, HSL values, or YUV values of a pixel of an image. As the color space information is relatively simple for a neural network to process, the neural network can be lightweight enough for a general processor (e.g., an X86 central processing unit, an ARM processor, or the like) to execute. Accordingly, the plurality of basic image enhancement matrices and the generated adaptive image enhancement matrix can be color space models.
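As an illustration of this step, a small convolutional network of the kind described could be sketched as below, assuming PyTorch; the layer sizes, the 256×256 reduced resolution, and the class name are assumptions rather than a prescribed architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightPredictor(nn.Module):
    """Tiny CNN mapping a down-sampled image to N enhancement weights (illustrative only)."""

    def __init__(self, num_weights: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_weights)

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Down-sample the full-resolution input (B, 3, H, W) into a reduced image.
        reduced = F.interpolate(image, size=(256, 256), mode="bilinear", align_corners=False)
        h = self.features(reduced).flatten(1)
        return self.fc(h)        # one enhancement weight per basic image enhancement matrix
```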
At step 504, an adaptive image enhancement matrix is generated based on the plurality of enhancement weights and the plurality of basic image enhancement matrices. For example, each of the basic image enhancement matrices can be multiplied with an enhancement weight corresponding to the basic image enhancement matrix, and the results can be added together to generate the adaptive image enhancement matrix.
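A minimal sketch of this weighted fusion, assuming the N basic matrices are stacked into a single tensor; names and shapes are illustrative.

```python
import torch

def fuse_basic_matrices(basic_luts: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Fuse N basic image enhancement matrices into one adaptive matrix.

    basic_luts: stacked basic matrices, shape (N, 3, M, M, M)
    weights:    enhancement weights, shape (N,)
    """
    # Multiply each basic matrix by its enhancement weight and add the results together.
    return (weights.view(-1, 1, 1, 1, 1) * basic_luts).sum(dim=0)
```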
At step 506, the input image is enhanced using the adaptive image enhancement matrix. For example, the adaptive image enhancement matrix can be a three-dimensional look-up table (3D LUT) that includes mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an input image. It is appreciated that a 3D LUT can also be applied on an input image to change, for example, hue, saturation, or lightness of the input image, depending on the nature of the 3D LUT.
At step 508, the image enhancement model is updated based on the enhanced input image. For example, at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices, or the first neural network can be updated by comparing the enhanced input image with a reference image, and eventually finalized. The reference image can be selected by a user of a terminal device. For example, the user can select a reference image according to his/her  preference. In some embodiments, the reference image is associated with a scene (e.g., sky, person, pet, sport, flower, night scene, and the like) . Accordingly, at least one of the plurality of finalized basic image enhancement matrices, the plurality of finalized enhancement weights, or the finalized first neural network can be associated with the scene.
In some embodiments, the reference image can include a content that is the same as the input image, and the content of the reference image is enhanced. Therefore, the reference image and the input image are paired images. Then, in determining the loss associated with the reference image and the enhanced input image, a difference between the enhanced input image and the enhanced reference image can be determined, and the loss can be determined using a paired loss function based on the difference. As discussed with reference to FIG. 4C, the paired loss function can include a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function. To improve the quality of the generated image, the paired loss function can further include a smooth regularization factor and a monotonicity regularization factor.
In some embodiments, the reference image can include a content that is different from the input image, and the content is also enhanced. In other words, the enhanced input image will be compared with the reference image containing a different but enhanced content, and whether the enhanced input image is optimized can be determined using unsupervised learning. Thus, in determining the loss associated with the reference image and the enhanced input image, an unpaired loss function can be determined using a second neural network, and the loss can be determined using the unpaired loss function based on the enhanced input image and the enhanced reference image.
The second neural network can be referred to as a generative adversarial network (GAN) , and can include a generator network and a discriminator network, as described above. The generator network can include the plurality of basic image enhancement matrices and the first neural network. In some embodiments, the generator network can include the plurality of enhancement weights and the first neural network, or the plurality of enhancement weights and the plurality of basic image enhancement matrices. The discriminator network is a convolutional neural network (CNN) having, e.g., the same architecture as the first neural network. Thus, the unpaired loss function (e.g., 430 of FIG. 4D) can include a first loss function (e.g., 432 of FIG. 4D) associated with the generator network and a second loss function (e.g., 434 of FIG. 4D) associated with the discriminator network. Similar to the paired loss function, the unpaired loss function can also include a smooth regularization factor and a monotonicity regularization factor.
It is appreciated that, in some embodiments, method 500 can further include a step for determining whether the reference image is paired with the input image, e.g., using a third neural network or according to an indicator provided by the user. And in response to the determination that the reference image is paired with the input image, the paired loss function can be used. In response to the determination that the reference image is not paired with the input image, the unpaired loss function can be used.
At least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network can be updated according to the loss determined by the loss function (e.g., the paired loss function or the unpaired loss function) . For example, a derivative of the loss function can be applied to the first neural network and used to update neural network weights of the first neural network. Similarly, the derivative of the loss function can also be used to update the plurality of basic image enhancement matrices and the plurality of enhancement weights.
In some embodiments, the plurality of basic image enhancement matrices and the first neural network can be updated according to the loss determined by the loss function. And the first neural network can output the plurality of enhancement weights. For example, the values of elements of each basic image enhancement matrix can be updated, and parameters (e.g., neural network weights) of the first neural network can also be updated. It is appreciated that when the first neural network is updated, the plurality of output enhancement weights can also be updated accordingly.
In some embodiments, only the first neural network can be updated according to the loss determined by the loss function. The first neural network can output the plurality of enhancement weights. And the plurality of basic image enhancement matrices can be predetermined and kept unchanged. Similarly, when the first neural network is updated, the plurality of output enhancement weights can also be updated accordingly.
In some embodiments, the plurality of enhancement weights and the first neural network can be updated according to the loss determined by the loss function. And the first neural network outputs the plurality of basic image enhancement matrices. For example, the values of the enhancement weights can be updated, and parameters (e.g., neural network weights) of the first neural network can also be updated. It is appreciated that when the first neural network is updated, the plurality of output basic image enhancement matrices can also be updated accordingly.
In some embodiments, only the first neural network can be updated. The first neural network can output the plurality of basic image enhancement matrices. And the plurality of enhancement weights can be predetermined and kept unchanged. Similarly, when the first neural  network is updated, the plurality of output basic image enhancement matrices can also be updated accordingly.
Each time the updating is performed, a new image enhancement model, including at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network, can be generated and a new enhanced input image can be compared with the reference image and used to train the image enhancement model. Thus, the updating of at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network can be performed iteratively. The loss determined by the loss function can be used to determine whether the training of the image enhancement model is finished.
In some embodiments, when the loss satisfies a given condition, at least one of the plurality of updated basic image enhancement matrices, the plurality of updated enhancement weights corresponding to the plurality of basic image enhancement matrices, or the updated first neural network can be finalized. For example, when the loss is less than a given threshold, it can be determined that the training of the image enhancement model is finished, and at least one of the plurality of updated basic image enhancement matrices, the plurality of updated enhancement weights corresponding to the plurality of basic image enhancement matrices, or the updated first neural network can be finalized.
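The iterative updating and the threshold-based stopping condition can be sketched as a simple training loop, assuming PyTorch; the optimizer choice, learning rate, and threshold value are assumptions, and `model`, `apply_lut`, and `loss_fn` are hypothetical placeholders for the components described above.

```python
import torch

def train_enhancement_model(model, apply_lut, data_loader, loss_fn,
                            threshold: float = 1e-3, max_steps: int = 10000):
    """Iteratively update the image enhancement model until the loss satisfies a condition.

    model:       the first neural network together with any learnable basic matrices/weights;
                 calling it on an input image is assumed to return the adaptive matrix
    apply_lut:   function that enhances an image with an adaptive matrix
    data_loader: yields (input_image, reference_image) pairs
    loss_fn:     e.g. the paired MSE loss of Eq. (1), optionally plus R_s and R_m
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step, (inputs, references) in zip(range(max_steps), data_loader):
        adaptive_lut = model(inputs)                    # predict weights and fuse matrices
        enhanced = apply_lut(adaptive_lut, inputs)      # enhance the input image
        loss = loss_fn(enhanced, references)            # compare with the reference image

        optimizer.zero_grad()
        loss.backward()                                 # derivative of the loss function
        optimizer.step()                                # update network weights / matrices

        if loss.item() < threshold:                     # given condition: loss below a threshold
            break
    return model
```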
FIG. 6 is a flowchart of an exemplary computer-implemented method 600 for processing an image, according to some embodiments of the disclosure. More particularly, method 600 can be used to adaptively enhance an image. Method 600 can be implemented by, for example, at least one of a parallel computing architecture (e.g., 300 of FIG. 3A) , an X86 central processing unit, an ARM processor, or the like. Method 600 can include steps as below.
At step 602, an input image is received. The input image can be an existing image that is stored locally, or an image instantly taken by a camera system. In some embodiments, the input image can be a frame of a video clip or a video stream.
At step 604, enhancement parameters are determined for the input image using a neural network (e.g.,  neural network  404 or 414 in FIGs. 4A-4B) . The enhancement parameters can include a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices.
In some embodiments, the plurality of basic image enhancement matrices can be determined based on the input image using the neural network. And the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices are preset. That is, the plurality of basic image enhancement matrices can be adaptively generated in accordance with the input image using the neural network.
In some embodiments, the plurality of enhancement weights can be determined based on the input image using the neural network. The plurality of basic image enhancement matrices corresponding to the plurality of enhancement weights are preset. That is, the plurality of enhancement weights can be adaptively generated in accordance with the input image using the neural network.
In some embodiments, a scene associated with the input image (e.g., sky, person, pet, sport, flower, night scene, and the like) can be determined, e.g., using a scene identification neural network, and the enhancement parameters can be determined according to the scene. The scene identification neural network can be part of, e.g., neural network 404 or 414 in FIGs. 4A-4B, or can be an independent neural network. For example, the scene identification neural network can determine that the enhancement parameters correspond to a “sky” scene, when a main part of the input image is directed to sky.
It is appreciated that the plurality of enhancement weights, the plurality of basic image enhancement matrices, and the neural network can be trained using above method 500.
At step 606, an adaptive image enhancement matrix is generated based on the enhancement parameters. The adaptive image enhancement matrix can be generated based on the plurality of basic image enhancement matrices and the plurality of enhancement weights. For example, each of the basic image enhancement matrices can be multiplied with an enhancement weight corresponding to the basic image enhancement matrix, and the results can be added together to generate the adaptive image enhancement matrix.
At step 608, the input image is enhanced using the adaptive image enhancement matrix. The adaptive image enhancement matrix can be associated with a mapping relationship between an input image to be processed and a processed image. By applying the adaptive image enhancement matrix on the input image, parameters of the input image (e.g., RGB values of a pixel) can be adjusted to generate the processed image. Generally, the processed image can present enhanced image quality. The image enhancement matrix can be a one-dimension matrix, a two-dimension matrix, a three-dimension matrix, or the like. For example, the one-dimension matrix can be used to adjust a gamma curve of an image, the two-dimension matrix can be used to adjust saturation, sharpness, and the like of an image, and the three-dimension (3D) matrix can be used to adjust the color space of an image (e.g., values of red, green, and blue components (RGB values) , values of hue, saturation, and lightness (HSL values) , or luma and chroma (YUV values) of the image) . In some embodiments, the 3D matrix can be referred to as a 3D look-up table (LUT) . It is appreciated that the enhancement matrix can include more dimensions than the above examples.
As an example, a 3D LUT can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on the input image.
FIG. 7 is a flowchart of an exemplary computer-implemented method 700 for processing an image, according to some embodiments of the disclosure. Method 700 can be implemented by, for example, at least one of a parallel computing architecture (e.g., 300 of FIG. 3A) , an X86 central processing unit, an ARM processor, or the like. Method 700 can include steps as below.
At step 702, a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the image parameter mapping models are generated for an input image. The image parameter mapping models (e.g., 448a-448c of FIG. 4E) can be preset or randomly generated. In some embodiments, the plurality of image parameter mapping models can be color space models associated with color space information. For example, the color space information of the input image can include RGB values, HSL values, or YUV values of a pixel of the input image. The plurality of image parameter mapping models can be one-dimension matrices, two-dimension matrices, or three-dimension matrices. When an image parameter mapping model is a three-dimension matrix, a size of the image parameter mapping model can be 32×32×32. The enhancement weights (446a-446c of FIG. 4E) corresponding to the image parameter mapping models can also be preset or randomly generated.
At step 704, an adaptive image parameter mapping model is generated based on the plurality of enhancement weights and the plurality of image parameter mapping models. For example, each of the image parameter mapping models can be multiplied by an enhancement weight corresponding to the image parameter mapping model, and the results can be added together to generate the adaptive image parameter mapping model. The plurality of image parameter mapping models and the generated adaptive image parameter mapping model can be color space models.
At step 706, the input image is enhanced using the adaptive image parameter mapping model. For example, the adaptive image parameter mapping model can include mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an input image.
At step 708, the plurality of image parameter mapping models and the plurality of enhancement weights are updated based on the enhanced input image. For example, at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be updated by comparing the enhanced input image with a reference image, and eventually finalized. The reference image can be selected by a user of a terminal device. For example, the user can select a reference image according to his/her preference. In some embodiments, the reference image is associated with a scene (e.g., sky, person, pet, sport, flower, night scene, and the like) . Accordingly, at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be associated with the scene.
In some embodiments, the plurality of image parameter mapping models can be updated according to the loss. And the plurality of enhancement weights can be predetermined and kept unchanged.
In some embodiments, the plurality of enhancement weights can be updated according to the loss. And the plurality of image parameter mapping models can be predetermined and kept unchanged.
In some embodiments, the plurality of image parameter mapping models and the plurality of enhancement weights can be updated according to the loss.
In some embodiments, the reference image can include a content that is the same as the input image, and the content of the reference image is enhanced. Therefore, the reference image and the input image are paired images. Then, in determining the loss associated with the reference image and the enhanced input image, a difference between the enhanced input image and the enhanced reference image can be determined, and the loss can be determined using a paired loss function based on the difference. As discussed with reference to FIG. 4C, the paired loss function can include a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function. To improve the quality of the generated image, the paired loss function can further include a smooth regularization factor and a monotonicity regularization factor.
In some embodiments, the reference image can include a content that is different from the input image, and the content is also enhanced. In other words, the enhanced input image will be compared with the reference image containing a different but enhanced content, and whether the enhanced input image is optimized can be determined using unsupervised learning. Thus, in determining the loss associated with the reference image and the enhanced input image, an unpaired loss function can be determined using a neural network, and the loss can be determined using the unpaired loss function based on the enhanced input image and the enhanced reference image.
The neural network can be referred to as a generative adversarial network (GAN) , and can include a generator network and a discriminator network. The generator network can include the plurality of image parameter mapping models and the plurality of enhancement weights. And the discriminator network is a convolutional neural network (CNN) . Thus, the unpaired loss function can include a first loss function associated with the generator network and a second loss function associated with the discriminator network. Similar to the paired loss function, the unpaired loss function can also include a smooth regularization factor and a monotonicity regularization factor.
It is appreciated that, in some embodiments, method 700 can further include a step for determining whether the reference image is paired with the input image, e.g., using a neural network or according to an indicator provided by the user. And in response to the determination that the reference image is paired with the input image, the paired loss function can be used. In response to the determination that the reference image is not paired with the input image, the unpaired loss function can be used.
At least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be updated according to the loss determined by the loss function (e.g., the paired loss function or the unpaired loss function) . For example, a derivative of the loss function can be applied to the plurality of image parameter mapping models or the plurality of enhancement weights and used to update them.
Each time the updating is performed, a new image enhancement model, including at least one of the plurality of image parameter mapping models or the plurality of enhancement weights, can be generated and a new enhanced input image can be compared with the reference image and used to train the image enhancement model. Thus, the updating of at least one of the  plurality of image parameter mapping models or the plurality of enhancement weights can be performed iteratively. The loss determined by the loss function can be used to determine whether the training of the image enhancement model is finished.
In some embodiments, when the loss satisfies a given condition, at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be finalized. For example, when the loss is less than a given threshold, it can be determined that the training of the image enhancement model is finished, and at least one of the plurality of image parameter mapping models or the plurality of enhancement weights can be finalized.
For example, the plurality of image parameter mapping models and the plurality of enhancement weights can be finalized as an image enhancement model for the inference stage.
In the inference stage, each of the image parameter mapping models can be multiplied by an enhancement weight corresponding to the image parameter mapping model, and the results can be added together to generate an image enhancement model. For example, the image enhancement model can be a three-dimensional look-up table (3D LUT) that includes mapping relationships for mapping a set of RGB values (e.g., 120, 140, 120) to another set of RGB values (e.g., 240, 140, 240) , so as to achieve certain effects on an input image. It is appreciated that an image enhancement model can also be applied on an input image to change, for example, hue, saturation, or lightness of the input image, depending on the nature of the image enhancement model.
Embodiments of the disclosure also provide a computer program product. The computer program product may include a non-transitory computer readable storage medium having computer readable program instructions thereon for causing a processor to carry out the above-described methods.
The computer readable storage medium may be a tangible device that can store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM) , a read-only memory (ROM) , an erasable programmable read-only memory (EPROM) , a static random access memory (SRAM) , a portable compact disc read-only memory (CD-ROM) , a digital versatile disk (DVD) , a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
The computer readable program instructions for carrying out the above-described methods may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language, and conventional procedural programming languages. The computer readable program instructions may execute entirely on a computer system as a stand-alone software package, or partly on a first computer and partly on a second computer remote from the first computer. In the latter scenario, the second, remote computer may be connected to the first computer through any type of network, including a local area network (LAN) or a wide area network (WAN) .
The computer readable program instructions may be provided to a processor of a computer, or other programmable data processing apparatus to produce a machine, such that the  instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the above-described methods.
It should be noted that when not conflicting, the embodiment in the application and the feature in embodiment can be mutually combined.
The application scenarios are not limited to the ISP pipelines of various cameras, such as UAVs, mobile phones, SLRs, mirrorless cameras, and action cameras. The described systems and methods can also be used in mobile phones and computers for tone mapping.
The flow charts and diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices, methods, and computer program products according to various embodiments of the specification. In this regard, a block in the flow charts or diagrams may represent a software program, segment, or portion of code, which comprises one or more executable instructions for implementing specific functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the diagrams and/or flow charts, and combinations of blocks in the diagrams and flow charts, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It is appreciated that certain features of the specification, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the specification, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any  suitable subcombination or as suitable in any other described embodiment of the specification. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements.
Although the specification has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. For example, although some embodiments are described using an image as an example, the described systems and methods can be applied on video processing. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

Claims (83)

  1. A system for processing an image, comprising:
    a memory for storing a set of instructions; and
    at least one processor configured to execute the set of instructions for causing the system to perform:
    generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;
    generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices;
    enhancing the input image using the adaptive image enhancement matrix; and
    updating the image enhancement model based on the enhanced input image, wherein
    at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image and a reference image.
  2. The system according to claim 1, wherein in updating the image enhancement model based on the enhanced input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a loss associated with the reference image and the enhanced input image; and
    updating at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network according to the loss.
  3. The system according to claim 2, wherein in updating the at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network according to the loss, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    updating the plurality of basic image enhancement matrices and the first neural network, wherein the first neural network outputs the plurality of enhancement weights;
    updating only the first neural network, wherein the first neural network outputs the plurality of enhancement weights, and the plurality of basic image enhancement matrices are predetermined;
    updating the plurality of enhancement weights and the first neural network, wherein the first neural network outputs the plurality of basic image enhancement matrices; or
    updating only the first neural network, wherein the first neural network outputs the plurality of basic image enhancement matrices, and the plurality of enhancement weights are predetermined.
  4. The system according to claim 2 or 3, wherein the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    in response to the loss satisfying a given condition, finalizing at least one of the plurality of updated basic image enhancement matrices, the plurality of updated enhancement weights  corresponding to the plurality of basic image enhancement matrices, or the updated first neural network.
  5. The system according to claim 2, wherein the reference image comprises a content that is the same as the input image, the content of the reference image is enhanced, and in determining the loss associated with the reference image and the enhanced input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a difference between the enhanced input image and the reference image; and
    determining the loss using a loss function based on the difference.
  6. The system according to claim 5, wherein the loss function comprises a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function.
  7. The system according to claim 6, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
  8. The system according to claim 2, wherein the reference image comprises a content that is different from the input image, and wherein in determining the loss associated with the reference image and the enhanced input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a loss function using a second neural network; and
    determining the loss using the loss function based on the enhanced input image and the reference image, wherein the second neural network comprises a generator network and a discriminator network.
  9. The system according to claim 8, wherein the generator network comprises the plurality of basic image enhancement matrices and the first neural network.
  10. The system according to claim 8 or 9, wherein the discriminator network is a convolutional neural network (CNN) .
  11. The system according to claim 8, wherein the loss function comprises a first loss function associated with the generator network and a second loss function associated with the discriminator network.
  12. The system according to claim 11, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
  13. The system according to claim 1, wherein in generating the image enhancement model for the input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    down-sampling the input image into a reduced image; and
    generating the at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices based on the reduced image using the first neural network.
  14. The system according to claim 13, wherein the plurality of basic image enhancement matrices are color space models associated with color space information.
  15. The system according to claim 14, wherein the color space information of the reduced image comprises RGB values, HSL values, or YUV values of a pixel of the reduced image.
  16. The system according to claim 1, wherein the reference image is selected by a user of the system.
  17. The system according to claim 4, wherein the reference image is associated with a scene, and at least one of the plurality of finalized basic image enhancement matrices, the plurality of finalized enhancement weights corresponding to the plurality of finalized basic image enhancement matrices, or the finalized first neural network is associated with the scene.
  18. The system according to claim 14, wherein the adaptive image enhancement matrix is a three-dimensional look-up table.
  19. A computer-implemented method for processing an image, comprising:
    generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;
    generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices;
    enhancing the input image using the adaptive image enhancement matrix; and
    updating the image enhancement model based on the enhanced input image, wherein
    at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image with a reference image.
  20. The method according to claim 19, wherein updating the image enhancement model based on the enhanced input image further comprises:
    determining a loss associated with the reference image and the enhanced input image; and
    updating at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network according to the loss.
  21. The method according to claim 20, wherein updating the at least one of the plurality of basic image enhancement matrices, the plurality of enhancement weights, or the first neural network according to the loss further comprises:
    updating the plurality of basic image enhancement matrices and the first neural network, wherein the first neural network outputs the plurality of enhancement weights;
    updating only the first neural network, wherein the first neural network outputs the plurality of enhancement weights, and the plurality of basic image enhancement matrices are predetermined;
    updating the plurality of enhancement weights and the first neural network, wherein the first neural network outputs the plurality of basic image enhancement matrices; or
    updating only the first neural network, wherein the first neural network outputs the plurality of basic image enhancement matrices, and the plurality of enhancement weights are predetermined.
  22. The method according to claim 20 or 21, further comprising:
    in response to the loss satisfying a given condition, finalizing at least one of the plurality of updated basic image enhancement matrices, the plurality of updated enhancement weights corresponding to the plurality of basic image enhancement matrices, or the updated first neural network.
  23. The method according to claim 20, wherein the reference image comprises a content that is the same as the input image, the content of the reference image is enhanced, and determining the loss associated with the reference image and the enhanced input image comprises:
    determining a difference between the enhanced input image and the reference image; and
    determining the loss using a loss function based on the difference.
  24. The method according to claim 23, wherein the loss function comprises a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function.
  25. The method according to claim 24, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
  26. The method according to claim 20, wherein the reference image comprises a content that is different from the input image, and determining the loss associated with the reference image and the enhanced input image comprises:
    determining a loss function using a second neural network; and
    determining the loss using the loss function based on the enhanced input image and the reference image, wherein the second neural network comprises a generator network and a discriminator network.
  27. The method according to claim 26, wherein the generator network comprises the plurality of basic image enhancement matrices and the first neural network.
  28. The method according to claim 26, wherein the discriminator network is a convolutional neural network (CNN) .
  29. The method according to claim 26, wherein the loss function comprises a first loss function associated with the generator network and a second loss function associated with the discriminator network.
  30. The method according to claim 29, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
  31. The method according to claim 19, wherein generating the image enhancement model for the input image using the first neural network comprises:
    down-sampling the input image into a reduced image; and
    generating the at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices based on the reduced image using the first neural network.
  32. The method according to claim 31, wherein the plurality of basic image enhancement matrices are color space models associated with color space information.
  33. The method according to claim 32, wherein the color space information of the reduced image comprises RGB values, HSL values, or YUV values of a pixel of the reduced image.
  34. The method according to claim 19, wherein the reference image is selected by a user of the system.
  35. The method according to claim 32, wherein the reference image is associated with a scene, and at least one of the plurality of finalized basic image enhancement matrices, the  plurality of finalized enhancement weights corresponding to the plurality of finalized basic image enhancement matrices, or the finalized first neural network is associated with the scene.
  36. The method according to claim 32, wherein the adaptive image enhancement matrix is a three-dimensional look-up table.
  37. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image, the method comprising:
    generating an image enhancement model for an input image using a first neural network, wherein the image enhancement model comprises a plurality of basic image enhancement matrices and a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;
    generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of basic image enhancement matrices;
    enhancing the input image using the adaptive image enhancement matrix; and
    updating the image enhancement model based on the enhanced input image, wherein
    at least one of the plurality of basic image enhancement matrices or the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices is updated by comparing the enhanced input image with a reference image.
  38. A system for processing an image, comprising:
    a memory for storing a set of instructions; and
    at least one processor configured to execute the set of instructions for causing the system to perform:
    receiving an input image;
    determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprises a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;
    generating an adaptive image enhancement matrix based on the enhancement parameters; and
    enhancing the input image using the adaptive image enhancement matrix.
  39. The system according to claim 38, wherein in determining the enhancement parameters for the input image using the neural network, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining the plurality of basic image enhancement matrices based on the input image using the neural network, wherein the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices are preset; or
    determining the plurality of enhancement weights based on the input image using the neural network, wherein the plurality of basic image enhancement matrices corresponding to the plurality of enhancement weights are preset.
  40. The system according to claim 38 or 39, wherein in determining the enhancement parameters for the input image using the neural network, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a scene associated with the input image; and
    determining the enhancement parameters according to the scene.
  41. The system according to claim 38, wherein in generating the adaptive image enhancement matrix based on the enhancement parameters, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    generating the adaptive image enhancement matrix based on the plurality of basic image enhancement matrices and the plurality of enhancement weights.
  42. A computer-implemented method for processing an image, comprising:
    receiving an input image;
    determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprise a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;
    generating an adaptive image enhancement matrix based on the enhancement parameters; and
    enhancing the input image using the adaptive image enhancement matrix.
  43. The method according to claim 42, wherein determining the enhancement parameters for the input image using the neural network further comprises:
    determining the plurality of basic image enhancement matrices based on the input image using the neural network, wherein the plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices are preset; or
    determining the plurality of enhancement weights based on the input image using the neural network, wherein the plurality of basic image enhancement matrices corresponding to the plurality of enhancement weights are preset.
  44. The method according to claim 42 or 43, wherein determining the enhancement parameters for the input image using the neural network further comprises:
    determining a scene associated with the input image; and
    determining the enhancement parameters according to the scene.
  45. The method according to claim 42, wherein generating the adaptive image enhancement matrix based on the enhancement parameters further comprises:
    generating the adaptive image enhancement matrix based on the plurality of basic image enhancement matrices and the plurality of enhancement weights.
  46. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image, the method comprising:
    receiving an input image;
    determining enhancement parameters for the input image using a neural network, wherein the enhancement parameters comprise a plurality of basic image enhancement matrices or a plurality of enhancement weights corresponding to the plurality of basic image enhancement matrices;
    generating an adaptive image enhancement matrix based on the enhancement parameters; and
    enhancing the input image using the adaptive image enhancement matrix.
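Claims 38 to 46 describe the inference-side flow: a neural network determines enhancement parameters, preset basic matrices are blended into an adaptive matrix, and the input image is enhanced with it. A minimal sketch of that flow follows, assuming a tiny convolutional weight predictor, softmax-normalized weights, and a 256x256 down-sampled copy of the input; the class name WeightPredictor, the layer sizes, and the tensor shapes are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightPredictor(nn.Module):
    """Tiny CNN mapping a down-sampled image to enhancement weights
    for a set of preset basic enhancement matrices."""
    def __init__(self, num_matrices=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, num_matrices)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.softmax(self.fc(h), dim=1)  # one weight per basic matrix

# Illustrative inference pass.
predictor = WeightPredictor(num_matrices=3)
basic_matrices = torch.rand(3, 33, 33, 33, 3)      # preset basic 3D LUTs
image = torch.rand(1, 3, 1024, 1024)               # full-resolution input
reduced = F.interpolate(image, size=(256, 256), mode='bilinear', align_corners=False)
weights = predictor(reduced)                       # shape [1, 3]
adaptive = torch.einsum('bn,ndhwc->bdhwc', weights, basic_matrices)
```

The adaptive matrix would then be applied to the full-resolution image, for example with the look-up routine sketched after claim 36.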
  47. A system for processing an image, comprising:
    a memory for storing a set of instructions; and
    at least one processor configured to execute the set of instructions for causing the system to perform:
    generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image;
    generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models;
    enhancing the input image using the adaptive image enhancement matrix; and
    updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein
    at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
  48. The system according to claim 47, wherein in updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a loss associated with the reference image and the enhanced input image; and
    updating at least one of the plurality of image parameter mapping models or the plurality of enhancement weights according to the loss.
  49. The system according to claim 48, wherein in updating the at least one of the plurality of image parameter mapping models or the plurality of enhancement weights according to the loss, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    updating the plurality of image parameter mapping models according to the loss, wherein the plurality of enhancement weights are predetermined;
    updating the plurality of enhancement weights according to the loss, wherein the plurality of image parameter mapping models are predetermined; or
    updating the plurality of image parameter mapping models and the plurality of enhancement weights according to the loss.
  50. The system according to claim 48 or 49, wherein the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    in response to the loss satisfying a given condition, finalizing at least one of the plurality of image parameter mapping models or the plurality of enhancement weights.
  51. The system according to claim 48, wherein the reference image comprises the same content as the input image, the content of the reference image being enhanced, and in determining the loss associated with the reference image and the enhanced input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a difference between the enhanced input image and the reference image; and
    determining the loss using a loss function based on the difference.
  52. The system according to claim 51, wherein the loss function comprises a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function.
  53. The system according to claim 52, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
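One plausible way to write the paired-image loss of claims 51 to 53, where the reference image shares the content of the input, combines a mean-square-error term with a smooth regularization term and a monotonicity regularization term over the look-up-table entries; the weighting factors and the exact regularizer forms below are assumptions, since the claims do not fix them:

$$\mathcal{L}=\frac{1}{P}\sum_{p=1}^{P}\bigl\|\hat{I}(p)-I_{\mathrm{ref}}(p)\bigr\|_2^2+\lambda_s R_s+\lambda_m R_m$$

$$R_s=\sum_{i,j,k}\Bigl(\bigl\|V_{i+1,j,k}-V_{i,j,k}\bigr\|^2+\bigl\|V_{i,j+1,k}-V_{i,j,k}\bigr\|^2+\bigl\|V_{i,j,k+1}-V_{i,j,k}\bigr\|^2\Bigr)$$

$$R_m=\sum_{i,j,k}\Bigl(\max\bigl(0,\,V_{i,j,k}-V_{i+1,j,k}\bigr)+\max\bigl(0,\,V_{i,j,k}-V_{i,j+1,k}\bigr)+\max\bigl(0,\,V_{i,j,k}-V_{i,j,k+1}\bigr)\Bigr)$$

Here $P$ is the number of pixels, $\hat{I}$ is the enhanced input image, $I_{\mathrm{ref}}$ is the reference image, and $V_{i,j,k}$ denotes a look-up-table entry (the regularizers are applied per output channel); $R_s$ discourages abrupt changes between neighbouring entries and $R_m$ penalizes outputs that decrease along any grid axis.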
  54. The system according to claim 48, wherein the reference image comprises a content that is different from the input image, and wherein in determining the loss associated with the reference image and the enhanced input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    determining a loss function using a neural network; and
    determining the loss using the loss function based on the enhanced input image and the reference image, wherein the neural network comprises a generator network and a discriminator network.
  55. The system according to claim 54, wherein the generator network comprises the plurality of image parameter mapping models and the plurality of enhancement weights.
  56. The system according to claim 54, wherein the discriminator network is a convolutional neural network (CNN) .
  57. The system according to claim 54, wherein the loss function comprises a first loss function associated with the generator network and a second loss function associated with the discriminator network.
  58. The system according to claim 57, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
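For the unpaired setting of claims 54 to 58, one standard adversarial formulation, which the claims allow but do not mandate, trains the discriminator $D$ to separate reference images $y$ from enhanced images $G(x)$ produced by the generator $G$, while the generator objective adds the same regularizers; the log-likelihood form below is only one common choice:

$$\mathcal{L}_D=-\,\mathbb{E}_{y}\bigl[\log D(y)\bigr]-\mathbb{E}_{x}\bigl[\log\bigl(1-D(G(x))\bigr)\bigr]$$

$$\mathcal{L}_G=-\,\mathbb{E}_{x}\bigl[\log D(G(x))\bigr]+\lambda_s R_s+\lambda_m R_m$$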
  59. The system according to claim 47, wherein in generating the plurality of image parameter mapping models and the plurality of enhancement weights for the input image, the at least one processor is configured to execute the set of instructions for causing the system to further perform:
    down-sampling the input image into a reduced image; and
    generating the at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models based on the reduced image.
  60. The system according to claim 47, wherein the plurality of image parameter mapping models are color space models associated with color space information.
  61. The system according to claim 60, wherein the color space information of the input image comprises RGB values, HSL values, or YUV values of a pixel of the input image.
  62. The system according to claim 60 or 61, wherein the plurality of image parameter mapping models are one-dimensional matrices, two-dimensional matrices, or three-dimensional matrices.
  63. The system according to claim 47, wherein the reference image is selected by a user of the system.
  64. The system according to claim 50, wherein the reference image is associated with a scene and at least one of the plurality of finalized image parameter mapping models or the plurality of finalized enhancement weights corresponding to the plurality of finalized image parameter mapping models is associated with the scene.
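To make the update loop of claims 47 to 50 concrete, the sketch below jointly optimizes learnable basic look-up tables and their enhancement weights by back-propagating a mean-square-error loss between the enhanced input and a reference image. The Adam optimizer, the softmax over a learnable logit vector that stands in for a weight-predicting network, the toy 64x64 images, and the nearest-neighbour lookup are illustrative assumptions rather than features recited in the claims.

```python
import torch
import torch.nn.functional as F

num_luts, grid = 3, 33
basic_luts = torch.nn.Parameter(torch.rand(num_luts, grid, grid, grid, 3))
weight_logits = torch.nn.Parameter(torch.zeros(num_luts))  # stand-in for a CNN output
optimizer = torch.optim.Adam([basic_luts, weight_logits], lr=1e-3)

def enhance(image, luts, weights):
    """Blend the basic LUTs and apply the result with nearest-neighbour lookup."""
    adaptive = torch.einsum('n,ndhwc->dhwc', weights, luts)
    idx = (image * (grid - 1)).round().long().clamp(0, grid - 1)
    return adaptive[idx[..., 0], idx[..., 1], idx[..., 2]]

input_image = torch.rand(64, 64, 3)   # toy input, RGB in [0, 1]
reference = torch.rand(64, 64, 3)     # toy reference with enhanced content

for _ in range(10):                   # a few illustrative updates
    weights = torch.softmax(weight_logits, dim=0)
    loss = F.mse_loss(enhance(input_image, basic_luts, weights), reference)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the weights would come from a network such as the predictor sketched after claim 46, and the loop would stop once the loss satisfies the condition of claim 50, at which point the tables and weights are finalized.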
  65. A computer-implemented method for processing an image, comprising:
    generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image;
    generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models;
    enhancing the input image using the adaptive image enhancement matrix; and
    updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein
    at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.
  66. The method according to claim 65, wherein updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image further comprises:
    determining a loss associated with the reference image and the enhanced input image; and
    updating at least one of the plurality of image parameter mapping models or the plurality of enhancement weights according to the loss.
  67. The method according to claim 66, wherein updating the at least one of the plurality of image parameter mapping models or the plurality of enhancement weights according to the loss further comprises:
    updating the plurality of image parameter mapping models according to the loss, wherein the plurality of enhancement weights are predetermined;
    updating the plurality of enhancement weights according to the loss, wherein the plurality of image parameter mapping models are predetermined; or
    updating the plurality of image parameter mapping models and the plurality of enhancement weights according to the loss.
  68. The method according to claim 66 or 67, further comprising:
    in response to the loss satisfying a given condition, finalizing at least one of the plurality of image parameter mapping models or the plurality of enhancement weights.
  69. The method according to claim 66, wherein the reference image comprises the same content as the input image, the content of the reference image being enhanced, and determining the loss associated with the reference image and the enhanced input image further comprises:
    determining a difference between the enhanced input image and the reference image; and
    determining the loss using a loss function based on the difference.
  70. The method according to claim 69, wherein the loss function comprises a mean-square-error (MSE) loss function, an L1 loss function, or a perceptual loss function.
  71. The method according to claim 70, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
  72. The method according to claim 66, wherein the reference image comprises a content that is different from the input image, and wherein determining the loss associated with the reference image and the enhanced input image further comprises:
    determining a loss function using a neural network; and
    determining the loss using the loss function based on the enhanced input image and the reference image, wherein the neural network comprises a generator network and a discriminator network.
  73. The method according to claim 72, wherein the generator network comprises the plurality of image parameter mapping models and the plurality of enhancement weights.
  74. The method according to claim 72, wherein the discriminator network is a convolutional neural network (CNN) .
  75. The method according to claim 72, wherein the loss function comprises a first loss function associated with the generator network and a second loss function associated with the discriminator network.
  76. The method according to claim 75, wherein the loss function further comprises a smooth regularization factor and a monotonicity regularization factor.
  77. The method according to claim 65, wherein generating the plurality of image parameter mapping models and the plurality of enhancement weights for the input image further comprises:
    down-sampling the input image into a reduced image; and
    generating the at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models based on the reduced image.
  78. The method according to claim 65, wherein the plurality of image parameter mapping models are color space models associated with color space information.
  79. The method according to claim 78, wherein the color space information of the input image comprises RGB values, HSL values, or YUV values of a pixel of the input image.
  80. The method according to claim 78 or 79, wherein the plurality of image parameter mapping models are one-dimensional matrices, two-dimensional matrices, or three-dimensional matrices.
  81. The method according to claim 65, wherein the reference image is selected by a user of the system.
  82. The method according to claim 68, wherein the reference image is associated with a scene, and at least one of the plurality of finalized image parameter mapping models or the plurality of finalized enhancement weights corresponding to the plurality of finalized image parameter mapping models is associated with the scene.
  83. A non-transitory computer readable medium that stores a set of instructions that is executable by at least one processor of a computer system to cause the computer system to perform a method for processing an image, the method comprising:
    generating a plurality of image parameter mapping models and a plurality of enhancement weights corresponding to the plurality of image parameter mapping models for an input image;
    generating an adaptive image enhancement matrix based on the plurality of enhancement weights and the plurality of image parameter mapping models;
    enhancing the input image using the adaptive image enhancement matrix; and
    updating the plurality of image parameter mapping models and the plurality of enhancement weights based on the enhanced input image, wherein
    at least one of the plurality of image parameter mapping models or the plurality of enhancement weights corresponding to the plurality of image parameter mapping models is updated by comparing the enhanced input image with a reference image.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/106650 WO2022027197A1 (en) 2020-08-03 2020-08-03 Systems and methods for processing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/106650 WO2022027197A1 (en) 2020-08-03 2020-08-03 Systems and methods for processing image

Publications (1)

Publication Number Publication Date
WO2022027197A1 true WO2022027197A1 (en) 2022-02-10

Family

ID=80119817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/106650 Ceased WO2022027197A1 (en) 2020-08-03 2020-08-03 Systems and methods for processing image

Country Status (1)

Country Link
WO (1) WO2022027197A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509290A (en) * 2011-10-25 2012-06-20 西安电子科技大学 Saliency-based synthetic aperture radar (SAR) image airfield runway edge detection method
EP3046071A1 (en) * 2015-01-15 2016-07-20 Thomson Licensing Methods and apparatus for groupwise contrast enhancement
CN110009563A (en) * 2019-03-27 2019-07-12 联想(北京)有限公司 Image processing method and device, electronic equipment and storage medium
CN111325690A (en) * 2020-02-20 2020-06-23 大连海事大学 An adaptive underwater image enhancement method based on differential evolution algorithm

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114596231A (en) * 2022-03-16 2022-06-07 杭州电子科技大学 Structured light fringe enhancement method based on neural network and Hessian matrix
CN117157665A (en) * 2022-03-25 2023-12-01 京东方科技集团股份有限公司 Video processing method and device, electronic equipment and computer readable storage medium
GB2638067A (en) * 2022-08-01 2025-08-13 Advanced Risc Mach Ltd System, devices and/or processes for image anti-aliasing
CN116703791A (en) * 2022-10-20 2023-09-05 荣耀终端有限公司 Image processing method, electronic device and readable medium
CN116703791B (en) * 2022-10-20 2024-04-19 荣耀终端有限公司 Image processing method, electronic device and readable medium
CN115761271A (en) * 2022-12-20 2023-03-07 北京小米移动软件有限公司 Image processing method, device, electronic device and storage medium
CN115908196A (en) * 2022-12-27 2023-04-04 广东省大湾区集成电路与系统应用研究院 A light adaptive image enhancement method, system and device

Similar Documents

Publication Publication Date Title
WO2022027197A1 (en) Systems and methods for processing image
US20220188999A1 (en) Image enhancement method and apparatus
US12190488B2 (en) Image processor
US20230080693A1 (en) Image processing method, electronic device and readable storage medium
CN115442515B (en) Image processing methods and equipment
US12001959B2 (en) Neural network model training method and device, and time-lapse photography video generating method and device
US11776129B2 (en) Semantic refinement of image regions
US11741579B2 (en) Methods and systems for deblurring blurry images
CN113095470B (en) Neural network training method, image processing method and device, and storage medium
Patel et al. A generative adversarial network for tone mapping hdr images
EP4044110A1 (en) Method for generating image data with reduced noise, and electronic device for performing same
US10853694B2 (en) Processing input data in a convolutional neural network
CN115375909A (en) Image processing method and device
KR20240022265A (en) Method and apparatus for image processing based on neural network
CN114902237B (en) Image processing method, device and electronic device
CN112766277A (en) Channel adjustment method, device and equipment of convolutional neural network model
CN114283101B (en) Multi-exposure image fusion unsupervised learning method and device and electronic equipment
KR20230019060A (en) Method for controlling image signal processor and control device for performing the same
US20240386704A1 (en) System and method for image processing using mixed inference precision
JP7760702B2 (en) Image processing method and device, and vehicle
CN117974420A (en) Image processing method, device, computer equipment and storage medium
JP2024077434A (en) Image processing device, image processing method, program, and storage medium
Huang et al. A two-stage HDR reconstruction pipeline for extreme dark-light RGGB images
JP2023041375A (en) Information processing device, information processing method and program
KR102803285B1 (en) Training method of an artificial neural network model for image Color correction and image COLOR correction method using the trained artificial neural network model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 20948422
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 20948422
    Country of ref document: EP
    Kind code of ref document: A1