
WO2024212750A1 - Image signal processing method and apparatus, device, and computer-readable storage medium - Google Patents


Info

Publication number
WO2024212750A1
WO2024212750A1, PCT/CN2024/081031, CN2024081031W
Authority
WO
WIPO (PCT)
Prior art keywords
image
unit
feature
task
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/CN2024/081031
Other languages
French (fr)
Chinese (zh)
Inventor
张旭
徐科
孔德辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanechips Technology Co Ltd
Original Assignee
Sanechips Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sanechips Technology Co Ltd filed Critical Sanechips Technology Co Ltd
Publication of WO2024212750A1 publication Critical patent/WO2024212750A1/en

Classifications

    • G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06N 3/0464 — Convolutional networks [CNN, ConvNet]
    • G06N 3/048 — Activation functions
    • G06N 3/084 — Backpropagation, e.g. using gradient descent
    • G06N 3/09 — Supervised learning
    • G06T 5/90 — Dynamic range modification of images or parts thereof
    • G06V 10/30 — Image preprocessing: noise filtering
    • G06V 10/36 — Applying a local operator; non-linear local filtering operations, e.g. median filtering
    • G06V 10/52 — Scale-space analysis, e.g. wavelet analysis
    • G06V 10/80 — Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/806 — Fusion of extracted features
    • G06V 10/96 — Management of image or video recognition tasks

Definitions

  • the present disclosure relates to the field of image processing technology, and in particular to an image signal processing method, apparatus, device and computer-readable storage medium.
  • Image signal processing is responsible for optimizing the output signal of the image sensor through a pipeline structure composed of a series of image processing modules in series.
  • Traditional ISP technology is limited in achievable image quality, and tuning a traditional ISP to generate high-quality results is difficult.
  • the various deep learning-based functional units (such as image denoising, tone mapping, image enhancement, etc.) in AI-ISP (i.e., artificial intelligence-based image signal processing) technology are independent of each other, resulting in a very large overall neural network structure.
  • the main purpose of the embodiments of the present disclosure is to provide an image signal processing method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problems of the huge neural network structure and high computing-resource demand of current AI-ISP.
  • the present disclosure provides an image signal processing device, comprising:
  • the front-end backbone network module is configured to obtain a first image, perform feature extraction on the first image, and obtain shared feature parameters;
  • the core network module is configured to input the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters with task correlation between the various subtask functional units;
  • the back-end backbone network module is configured to reconstruct an image from the target feature parameters, producing a second image for output, wherein the quality of the second image is higher than that of the first image.
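The three-module pipeline above (front-end backbone, shared-feature core with sequential subtask units, back-end backbone) can be sketched in a toy NumPy form; the pooling "backbone", the stand-in denoise/enhance heads, and the nearest-neighbour reconstruction are all illustrative placeholders, not the disclosure's actual networks:

```python
import numpy as np

def front_backbone(image):
    # placeholder feature extraction: 2x2 average pooling to 1/4 size
    h, w = image.shape
    return image[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def core_network(shared_feat, heads):
    # each subtask unit operates, in sequence, on the same shared feature map
    feat = shared_feat
    for head in heads:
        feat = head(feat)
    return feat  # "target feature parameters"

def back_backbone(target_feat):
    # placeholder reconstruction: nearest-neighbour upsample to original size
    return np.kron(target_feat, np.ones((2, 2)))

denoise = lambda f: f * 0.9                      # stand-in denoising unit
enhance = lambda f: np.clip(f * 1.3, 0.0, 1.0)   # stand-in enhancement unit

first_image = np.random.rand(8, 8)
shared = front_backbone(first_image)             # shared feature parameters
target = core_network(shared, [denoise, enhance])
second_image = back_backbone(target)
assert second_image.shape == first_image.shape
```

The point of the sketch is the data flow: features are extracted once, every subtask head reuses them, and only the final result is restored to image space.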
  • the embodiment of the present disclosure also provides an image signal processing method, including: acquiring a first image and performing feature extraction on the first image to obtain shared feature parameters; inputting the shared feature parameters into multiple preset sub-task function units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters with task correlation between the various sub-task function units; and reconstructing an image from the target feature parameters to produce a second image for output, wherein the quality of the second image is higher than that of the first image.
  • an embodiment of the present disclosure also provides an image signal processing device, which includes: a memory, a processor, and an image signal processing program stored in the memory and executable on the processor, and when the image signal processing program is executed by the processor, it implements the image signal processing method as described above.
  • an embodiment of the present disclosure further provides a computer-readable storage medium, on which an image signal processing program is stored, and when the image signal processing program is executed by a processor, the image signal processing method as described above is implemented.
  • FIG1 is a schematic diagram of the structure of an image signal processing device according to an embodiment of the present disclosure.
  • FIG2 is a schematic diagram of the current end-to-end AI-ISP network architecture.
  • FIG3 is a schematic diagram of a feature-sharing AI-ISP network architecture according to an embodiment of the present disclosure.
  • FIG4 is a diagram of a neural network architecture for joint processing of noise reduction and enhancement according to an embodiment of the present disclosure.
  • FIG5 is a flowchart of a specific embodiment of the image signal processing method of the present disclosure.
  • FIG6 is a schematic diagram of the hardware structure of an image signal processing device involved in an embodiment of the present disclosure.
  • unless otherwise clearly defined, "connection" or "fixation" can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical or an electrical connection; it can be a direct connection or an indirect connection through an intermediate medium; and it can be an internal connection between two elements or an interaction relationship between two elements.
  • the various deep learning-based functional units (such as image denoising, tone mapping, image enhancement, etc.) in the existing AI-ISP (i.e., artificial intelligence-based image signal processing) technology are independent of each other, resulting in a very large overall neural network structure.
  • an embodiment of the present disclosure provides an image signal processing device, referring to FIG1 , which is a schematic diagram of the structure of the image signal processing device of the embodiment of the present disclosure.
  • the image signal processing device includes:
  • the front-end backbone network module 10 is configured to obtain a first image, perform feature extraction on the first image, and obtain shared feature parameters;
  • the core network module 20 is configured to input the shared characteristic parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target characteristic parameters, wherein the shared characteristic parameters are characteristic parameters having task correlation between the subtask functional units;
  • the back-end backbone network module 30 is configured to reconstruct an image from the target feature parameters, producing a second image for output, wherein the quality of the second image is higher than that of the first image.
  • the front-end backbone network module 10 may be a backbone network module that uses a pyramid network as a feature extraction module
  • the back-end backbone network module 30 may be a backbone network module that uses a pyramid network as a feature recovery module.
  • the core network module 20 is configured to input the shared feature parameters into a plurality of preset subtask function units for multi-task joint processing to obtain the target feature parameters.
  • the types of the subtask function units include, but are not limited to, at least one of an image denoising unit, an image enhancement unit, and a tone mapping unit. It is easy to understand that when the subtask function units are an image denoising unit and a tone mapping unit, the first image is the original image to be processed by image denoising and tone mapping, and the second image is the output image after the image denoising and tone mapping.
  • the subtask function unit is an image denoising unit and an image enhancement unit
  • the first image is the original image to be processed by image denoising and image enhancement
  • the second image is the output image after the image denoising and image enhancement.
  • the subtask function unit is an image enhancement unit and a tone mapping unit
  • the first image is the original image to be processed by image enhancement and tone mapping
  • the second image is the output image after the image enhancement and tone mapping.
  • the shared characteristic parameter refers to a network characteristic parameter having similarity/correlation between different subtask functional units of the core network module 20, that is, the shared characteristic parameter is a characteristic parameter having task correlation between each subtask functional unit.
  • the target characteristic parameter refers to a characteristic parameter output after the shared characteristic parameter is input into a plurality of preset subtask functional units for multi-task joint processing.
  • the preset subtask functional units are an image denoising unit and an image enhancement unit
  • the target characteristic parameter refers to a characteristic parameter output after the shared characteristic parameter (the shared characteristic parameter is a characteristic parameter having task correlation between the image denoising unit and the image enhancement unit) is input into the image denoising unit for denoising task processing, and input into the image enhancement unit for enhancement task processing.
  • the embodiments of the present disclosure achieve the reuse of similar network features among various sub-task functional units by extracting features (i.e., shared feature parameters) that are similar and relevant to the initial image, thereby sharing some neural network parameters, thereby reducing the network model and computing power requirements when performing multi-task joint processing in the embodiments of the present disclosure.
  • the mainstream AI-ISP network architecture is to isolate sub-task functional units such as image denoising and image enhancement from each other, and use independent neural network models to implement each function. That is, the neural network model corresponding to each functional unit needs to start from the pixel level, perform feature extraction, task processing and feature recovery on the image, and then complete the end-to-end functional implementation of each module.
  • Figure 2 is a schematic diagram of the current mainstream end-to-end AI-ISP network architecture, in which "Input" represents the input of the first image, "Output" represents the output of the second image, and task1, task2, ..., taskN represent the sub-task functional units (such as an image denoising unit, an image enhancement unit and/or a tone mapping unit).
  • the AI-ISP network architecture of the embodiment of the present disclosure is an AI-ISP network architecture that can share features in various sub-task functional units.
  • Figure 3 is a schematic diagram of the feature-sharing AI-ISP network architecture of the embodiment of the present disclosure.
  • “Input” represents the input of the first image
  • “Output” represents the output of the second image
  • task1, task2...task N represent various sub-task functional units (such as image denoising units, image enhancement units and/or tone mapping units).
  • the embodiment of the present disclosure extracts data features through the backbone network, and the core networks of each task module reuse the same feature space, which reduces the movement of neural network parameters and feature data. This saves computing resources and improves the inference rate.
  • the disclosed embodiment extracts the features of the first image to obtain shared feature parameters, and inputs the shared feature parameters into a plurality of preset subtask function units for multi-task joint processing, thereby sharing some neural network parameters in the AI-ISP architecture, and each subtask function unit reuses the same feature space, which reduces the network structure and reduces the movement of feature data, thereby saving computing resources and improving the reasoning rate. That is, the disclosed embodiment extracts the data features of the initial image based on the front-end backbone network, and reduces the movement of neural network parameters and feature data by reusing the same feature space for each task core network, and then constructs the output image through the back-end backbone network module 30, thereby saving computing resources and improving the reasoning rate.
  • the disclosed embodiment integrates the neural network architecture to a certain extent by using the subtask function units to share the same backbone network for feature extraction (i.e., feature extraction of the first image) and feature recovery (i.e., image construction according to the target feature parameters), which can avoid repeated feature calculations and data migration and transmission between different subtask function units, effectively improving the overall network operation efficiency, thereby enabling the disclosed embodiment to solve the technical problems of the current AI-ISP's huge neural network structure and high demand for computing resources.
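As a rough illustration of why sharing the backbone shrinks the model, compare total parameter counts for N isolated end-to-end networks versus one shared backbone with per-task heads; the counts below are made-up illustrative numbers, not figures from the disclosure:

```python
def isolated_total(n_tasks, backbone_params, head_params):
    # each isolated network carries its own feature extraction/recovery stages
    return n_tasks * (backbone_params + head_params)

def shared_total(n_tasks, backbone_params, head_params):
    # one shared backbone; only the task heads are duplicated
    return backbone_params + n_tasks * head_params

n, backbone, head = 3, 1_000_000, 200_000
print(isolated_total(n, backbone, head))  # 3600000
print(shared_total(n, backbone, head))    # 1600000
```

Since the backbone typically dwarfs the task heads, the savings grow with the number of subtasks sharing the feature space.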
  • the ISP processes the image signal in the following aspects:
  • Correction and compensation: defective pixel correction (DPC), black level compensation (BLC), lens shading correction (LSC), geometric correction for distortion, stretching, offset, etc., gamma correction, correction related to the perspective principle, and so on.
  • Denoising and image enhancement: temporal- and spatial-domain filtering, hierarchical compensation filtering, removal of various kinds of noise, sharpening, suppression of ringing and banding artifacts, edge enhancement, brightness enhancement, and contrast enhancement.
  • Color and format conversion: color interpolation, color space conversion, tone mapping, chroma adjustment, color correction, saturation adjustment, scaling, rotation, etc.
  • Adaptive processing: automatic white balance, automatic exposure, automatic focus, strobe detection, etc.
  • There are two mainstream implementation approaches in existing AI-ISP technology. The first relies entirely on neural network computation: a network learns the mapping relationship between the front-end image sensor signal and the output image, simulating the full ISP pipeline to achieve end-to-end signal processing.
  • This method is highly dependent on computing power and datasets, is largely a black box with poor interpretability, and cannot form targeted, point-to-point solutions for the local color, contrast, texture, noise, and other issues that specific modules are concerned with.
  • The second approach applies AI only to the functional modules in the ISP pipeline that best reflect algorithmic improvement (such as noise reduction, high-dynamic-range rendering, and image enhancement), in order to focus limited computing power on the functions most critical and most visible to the human eye.
  • This method only uses deep learning to process single-module tasks that are relatively complex and difficult in ISP and have obvious effect improvements, and the network design is more targeted and interpretable.
  • the functional modules based on deep learning are independent of each other, resulting in a very large overall neural network structure, and a large amount of computing and memory resource requirements limit its deployment in mobile devices with limited hardware resources.
  • However, the neural network models designed for these tasks use similar structures or methods, and the only difference between different tasks is often the error function. Therefore, in the neural networks of the different functional modules of an AI-ISP, the features extracted by the convolutional networks are similar and correlated. By reusing similar network features and sharing some neural network parameters, it is possible to reduce network model size and computing-power requirements while achieving joint processing of multiple module tasks.
  • the disclosed embodiment proposes a new method for joint multi-task processing of image signals.
  • each task module reuses the same feature space, reduces the network structure, reduces the movement of feature data, and thus saves computing resources and improves the inference rate.
  • Most AI-ISP technologies isolate tasks such as image denoising, tone mapping, and image enhancement from each other, and use independent neural network models to implement each function.
  • Each neural network model starts from the pixel level, performs feature extraction, task processing, and feature recovery on the image, and then completes the end-to-end function implementation of each module.
  • the disclosed embodiment realizes the joint processing of multiple module tasks based on balancing computing power and performance.
  • each task module shares the same backbone network for feature extraction and feature recovery, and to a certain extent, the neural network architecture is integrated, which can avoid repeated calculation of features and data migration and transmission between different tasks, and effectively improve the overall operation efficiency of the network.
  • different subtask types can be decoupled at the feature level, and each task module has an independent neural network structure.
  • the parameters of the front-end backbone network and the forward task module neural network are fixed, and the separate training of each neural network can be completed.
  • the front-end backbone network module 10 includes a preset number of feature extraction network layers of different scales; each feature extraction network layer is configured to obtain a first image and perform multi-channel feature extraction on it to obtain multi-scale shared feature parameters;
  • the back-end backbone network module 30 includes an image restoration network layer connected to each feature processing network layer in a one-to-one correspondence.
  • the image restoration network layer is configured to perform feature fusion on the multi-scale target feature parameters to obtain fused feature parameters, and to reconstruct an image from the fused feature parameters, producing a second image for output.
  • the front-end backbone network module 10 may include a preset number of feature extraction network layers of different scales.
  • the front-end backbone network module 10 may include feature extraction network layers of 1/4 size, 1/16 size and 1/64 size, and the number of channels thereof are 16, 64 and 256, respectively.
  • the feature extraction network layer of 1/4 size can be used to extract features of the first image at 1/4 size.
  • the feature extraction network layer of 1/16 size can be used to extract features of the first image at 1/16 size.
  • the feature extraction network layer of 1/64 size can be used to extract features of the first image at 1/64 size.
  • Each feature extraction network layer can be composed of a 2D convolutional network with a 2×2 convolution kernel and a stride of 2.
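A 2×2 convolution with stride 2 halves each spatial side, so one such layer yields the 1/4-size features and stacking three yields the 1/16- and 1/64-size features. A minimal single-channel NumPy sketch (the function name and the averaging kernel are illustrative assumptions):

```python
import numpy as np

def conv2x2_stride2(x, kernel, bias=0.0):
    # single-channel 2D convolution, 2x2 kernel, stride 2: halves each side
    h, w = x.shape
    patches = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    # sum each 2x2 patch weighted by the kernel
    return np.einsum('ipjq,pq->ij', patches, kernel) + bias

x = np.arange(16, dtype=float).reshape(4, 4)
avg = np.full((2, 2), 0.25)       # an averaging kernel, for illustration
y = conv2x2_stride2(x, avg)
assert y.shape == (2, 2)          # 4x4 input -> 1/4-area output
```

A real feature extraction layer would apply many such kernels in parallel to produce the 16, 64, or 256 output channels mentioned above.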
  • each feature processing network layer contains multiple subtask functional units with the same structure.
  • each feature processing network layer includes an image denoising unit and an image enhancement unit.
  • the front-end backbone network module 10 includes feature extraction network layers of 1/4 size, 1/16 size and 1/64 size
  • the core network module 20 includes three feature processing network layers (respectively, the first feature processing network layer, the second feature processing network layer and the third feature processing network layer).
  • the feature extraction network layer of 1/4 size is connected to the first feature processing network layer
  • the feature extraction network layer of 1/16 size is connected to the second feature processing network layer
  • the feature extraction network layer of 1/64 size is connected to the third feature processing network layer.
  • the back-end backbone network module 30 also includes three image restoration network layers, and the scale of each image restoration network layer corresponds one-to-one to the scale of each feature extraction network layer, that is, the three image restoration network layers also correspond to image restoration network layers of 1/4 size, 1/16 size and 1/64 size respectively.
  • three connection links are specifically included: one is the 1/4-size feature extraction network layer, the first feature processing network layer, and the 1/4-size image restoration network layer connected in sequence.
  • Another connection link is the 1/16-size feature extraction network layer, the second feature processing network layer, and the 1/16-size image restoration network layer connected in sequence.
  • Another connection link is the 1/64-size feature extraction network layer, the third feature processing network layer, and the 1/64-size image restoration network layer connected in sequence.
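Under the assumption that "1/4, 1/16, 1/64 size" refers to area (each successive stride-2 layer halving both sides), and using the 16/64/256 channel counts given earlier, the tensor shapes along the three links can be tabulated with a small helper (a hypothetical function, not part of the disclosure):

```python
def link_shapes(h, w):
    # (channels, height, width) at each pyramid level, coarsening by one
    # stride-2 layer per level
    shapes = []
    side_div = 1
    for channels in (16, 64, 256):
        side_div *= 2                        # one more stride-2 layer
        shapes.append((channels, h // side_div, w // side_div))
    return shapes

print(link_shapes(256, 256))
# [(16, 128, 128), (64, 64, 64), (256, 32, 32)]
```

Each tuple corresponds to one connection link: the shared features at that scale flow through the matching feature processing layer and back out through the matching restoration layer.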
  • the front-end backbone network module 10 includes a preset number of feature extraction network layers of different scales; each is configured to obtain the first image and perform multi-channel feature extraction on it, yielding multi-scale shared feature parameters. That is, the first image is downsampled at multiple scales to extract image features over multiple channel counts. The core network module 20 is configured as feature processing network layers connected one-to-one to the feature extraction network layers; each feature processing network layer includes multiple sub-task functional units and inputs the multi-scale shared feature parameters into each sub-task functional unit for multi-channel, multi-task joint processing, yielding the multi-scale target feature parameters.
  • this embodiment sets the back-end backbone network module 30 as an image restoration network layer connected to each feature processing network layer in a one-to-one correspondence.
  • the image restoration network layer is set to fuse the multi-scale target feature parameters into fused feature parameters and to reconstruct an image from them, producing a second image for output; that is, the fused feature parameters are upsampled to obtain a second image that matches the scale of the first image (i.e., is restored to the size of the original image) and has undergone multi-task joint processing (such as image denoising and image enhancement).
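One common way to realize the fusion-and-upsampling step just described is to upsample each coarser feature map and add it into the next finer one before restoring the original scale. The sketch below assumes that pattern and uses nearest-neighbour upsampling as a stand-in for the learned restoration layers:

```python
import numpy as np

def upsample2x(f):
    # nearest-neighbour 2x upsampling (stand-in for a learned restoration layer)
    return np.kron(f, np.ones((2, 2)))

def fuse_and_restore(coarse_to_fine):
    # assumed fusion pattern: upsample each coarser map and add it to the
    # next finer one, then upsample the fused result back to the input scale
    fused = coarse_to_fine[0]
    for finer in coarse_to_fine[1:]:
        fused = upsample2x(fused) + finer
    return upsample2x(fused)

# 1/64-, 1/16-, and 1/4-size target feature maps for a 16x16 input
feats = [np.ones((2, 2)), np.ones((4, 4)), np.ones((8, 8))]
second_image = fuse_and_restore(feats)
assert second_image.shape == (16, 16)  # restored to the original scale
```

The disclosure does not pin down the exact fusion operator; addition after upsampling is simply one widely used pyramid-fusion choice.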
  • subtask functional units include image denoising units and image enhancement units, wherein each feature processing network layer includes an image denoising unit and an image enhancement unit connected in sequence.
  • image noise reduction refers to the process of reducing noise in digital images, sometimes also called image denoising.
  • Image enhancement refers to an image processing method that makes an originally unclear image clear or emphasizes certain interesting features, suppresses uninteresting features, improves image quality, enriches information, and enhances image interpretation and recognition effects.
  • This embodiment extracts image features over multiple channel counts by downsampling the first image at multiple scales, which improves the effect of the multi-task joint processing (such as image denoising and image enhancement) performed on the first image.
  • the image signal processing apparatus further includes a training module (not shown), and the training module is configured to:
  • obtain first training data, wherein the first training data includes a third image with first noise and a fourth image with second noise obtained from the third image, the second noise being greater than the first noise;
  • the image denoising unit in the first training link is trained using the first training data to perform supervised learning on the first error function that performs denoising task processing in the image denoising unit until the first error function converges, thereby obtaining a trained image denoising unit, wherein the first training link is a front-end backbone network module 10, an image denoising unit, and a back-end backbone network module 30 that are sequentially connected;
  • the first error function is fixed, and second training data is obtained, wherein the second training data includes a fifth image with a first resolution and a sixth image with a second resolution obtained from the fifth image, and the first resolution is greater than the second resolution;
  • the image enhancement unit in the second training link is trained using the second training data to perform supervised learning on the second error function that performs the enhancement task processing in the image enhancement unit until the second error function converges, thereby obtaining a trained image enhancement unit, wherein the second training link is a front-end backbone network module 10, an image denoising unit, an image enhancement unit, and a back-end backbone network module 30 connected in sequence.
  • the first error function refers to the error function of the image noise reduction unit in performing noise reduction task processing
  • the second error function refers to the error function of the image enhancement unit in performing enhancement task processing
  • the image denoising unit in the first training link is trained by using the first training data to supervise the first error function of the denoising task in the image denoising unit until the first error function converges, thereby obtaining a trained image denoising unit, wherein the first training link is the front-end backbone network module 10, the image denoising unit and the back-end backbone network module 30 connected in sequence. Then, after the image denoising unit is trained, the first error function is fixed and the second training data is obtained.
  • the second training data includes a fifth image of a first resolution and a sixth image of a second resolution obtained from the fifth image, wherein the first resolution is greater than the second resolution, and the second training data is used to train the image enhancement unit in the second training link to supervise the learning of the second error function for the enhancement task processing in the image enhancement unit until the second error function converges to obtain a trained image enhancement unit
  • the second training link is a front-end backbone network module 10, an image denoising unit, an image enhancement unit and a back-end backbone network module 30 connected in sequence, so that the image denoising unit and the image enhancement unit can be decoupled at the feature level. During training, the parameters of the neural networks of the front-end backbone network and the forward task modules are fixed so that each neural network is trained separately: the image denoising unit is trained with the network parameters of the front-end backbone network module 10 fixed, wherein the data flow for training the image denoising unit does not pass through the network layer of the image enhancement unit; then, after the image denoising unit is trained, the image enhancement unit is trained with the network parameters of the front-end backbone network module 10 and the image denoising unit fixed, until the network parameters of the image enhancement unit converge.
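The two kinds of training pairs described above can be sketched as follows. This is a hedged illustration only: the Gaussian noise model, the noise levels, and the 2x average-pool downsampling are assumptions introduced here, not details taken from the disclosure.

```python
# Synthesize (low-noise, high-noise) pairs for the denoising unit and
# (high-resolution, low-resolution) pairs for the enhancement unit.
import numpy as np

rng = np.random.default_rng(42)

def make_noise_pair(clean, sigma_low=0.01, sigma_high=0.1):
    """Return (third_image, fourth_image): the fourth is noisier than the third."""
    third = clean + rng.normal(0.0, sigma_low, clean.shape)
    fourth = clean + rng.normal(0.0, sigma_high, clean.shape)
    return third, fourth

def make_resolution_pair(fifth):
    """Return (fifth_image, sixth_image): the sixth is a 2x-downsampled copy."""
    h, w = fifth.shape
    sixth = fifth.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return fifth, sixth

clean = rng.random((16, 16))
third, fourth = make_noise_pair(clean)
fifth, sixth = make_resolution_pair(clean)
print(sixth.shape)  # (8, 8)
```

During training, the noisier (or lower-resolution) image serves as the network input and its cleaner (or higher-resolution) counterpart as the supervision target.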
  • the embodiment of the present disclosure decouples different functional modules at the feature level.
  • Each module has an independent neural network and can be trained on the basis of a fixed forward network, thereby optimizing the overall ISP performance from the module level.
  • the mainstream AI-ISP technology isolates tasks such as image denoising and image enhancement from each other, and uses independent neural network models to implement each function.
  • Each neural network model starts from the pixel level, performs feature extraction, task processing and feature recovery on the image, and then completes the end-to-end functional implementation of each module.
  • In contrast, in the embodiment of the present disclosure, each task module shares the same backbone network for feature extraction and feature recovery, which avoids repeated computation of features and the migration and transmission of data between different tasks, effectively improving the overall operating efficiency of the network.
  • Different subtask types can be decoupled at the feature level.
  • FIG. 4 is a neural network architecture diagram of the joint processing of noise reduction and enhancement in the embodiment of the present disclosure, including:
  • a pyramid network is used as the backbone network for feature extraction and restoration (i.e., the front-end backbone network module 10), and image denoising and image enhancement are used as two subtasks (i.e., subtask functional units) as examples to illustrate the specific implementation process of the disclosed embodiment.
  • the neural network architecture (i.e., the AI-ISP network architecture) shown in FIG. 4 includes a front-end backbone network (i.e., the front-end backbone network module 10), a core network (i.e., the core network module 20) and a back-end backbone network (i.e., the back-end backbone network module 30).
  • the front-end backbone network includes three layers of feature extraction networks, which respectively perform feature extraction on the original image (i.e., the first image).
  • each layer of the feature extraction network consists of a 2D convolutional network with a 2×2 convolution kernel and a stride of 2.
  • the front-end backbone network processes the input image to obtain multi-scale features as the input of the core network.
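The front-end extraction described above can be sketched in NumPy. This is a minimal single-channel illustration: the fixed averaging kernel is an assumption (the real layers use learned kernels and also expand the channel count, which is omitted here); only the stride-2, 2×2 structure and the resulting scale pyramid follow the text.

```python
# Three stacked stride-2 layers, each modeled as one 2x2 kernel applied
# with stride 2, halving height and width per layer.
import numpy as np

def conv2x2_stride2(x, kernel):
    """Apply a single 2x2 convolution with stride 2 to a 2D array."""
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then weight by the kernel.
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return np.einsum('hiwj,ij->hw', blocks, kernel)

def frontend_features(image, num_layers=3):
    """Return the multi-scale features (1/4, 1/16 and 1/64 of the pixel count)."""
    kernel = np.full((2, 2), 0.25)  # illustrative fixed kernel (a 2x2 average)
    feats = []
    x = image
    for _ in range(num_layers):
        x = conv2x2_stride2(x, kernel)
        feats.append(x)
    return feats

feats = frontend_features(np.ones((64, 64)))
print([f.shape for f in feats])  # [(32, 32), (16, 16), (8, 8)]
```

Each successive output has one quarter of the previous pixel count, matching the 1/4, 1/16 and 1/64 scales fed to the core network subtasks.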
  • the back-end backbone network includes three layers of feature recovery networks with 256, 64 and 16 channels respectively, and each layer of the feature recovery network consists of a pixel_shuffle network, which restores the features output by the task core network to an image of the original size.
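The pixel_shuffle (depth-to-space) operation used by each recovery layer can be written in a few lines of NumPy; it trades channels for spatial resolution. The (channels, height, width) layout below is the common convention and is assumed here.

```python
# Minimal pixel_shuffle sketch: rearrange (C*r*r, H, W) into (C, H*r, W*r).
import numpy as np

def pixel_shuffle(x, r):
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)     # split the channel dim into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)   # interleave: (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

# Four 1x1 feature maps become one 2x2 map, [[1, 2], [3, 4]].
x = np.arange(1, 5, dtype=float).reshape(4, 1, 1)
print(pixel_shuffle(x, 2))
```

Stacking three such layers with upscale factor 2 undoes the three stride-2 layers of the front-end, which is how the features are restored to the original image size.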
  • the denoising module (i.e., image denoising unit) and enhancement module (i.e., image enhancement unit) in the core network both adopt the UNET network structure, and also include 3 network blocks of the same scale as the backbone network.
  • the specific number of network layers in each network block can be adjusted according to the specific situation.
  • each network block contains three layers, and each layer consists of a depthwise separable convolution and a ReLU activation function.
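One such layer can be sketched as follows: a per-channel 3×3 depthwise pass, a 1×1 pointwise pass that mixes channels, then a ReLU. The random weights and the 3×3 kernel size are illustrative assumptions; in the real network the weights are learned.

```python
# Hedged sketch of one depthwise-separable-convolution + ReLU layer.
import numpy as np

def depthwise_separable_relu(x, dw, pw):
    """x: (C, H, W); dw: (C, 3, 3) depthwise kernels; pw: (C_out, C) pointwise."""
    c, h, w = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))      # 'same' padding
    depth = np.zeros_like(x)
    for ch in range(c):                            # one 3x3 kernel per channel
        for i in range(h):
            for j in range(w):
                depth[ch, i, j] = np.sum(pad[ch, i:i + 3, j:j + 3] * dw[ch])
    point = np.einsum('oc,chw->ohw', pw, depth)    # 1x1 cross-channel mixing
    return np.maximum(point, 0.0)                  # ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = depthwise_separable_relu(x, rng.standard_normal((4, 3, 3)),
                             rng.standard_normal((4, 4)))
print(y.shape)  # (4, 8, 8)
```

Splitting the convolution into depthwise and pointwise parts is what keeps the per-layer parameter and compute cost low compared with a full 3×3 convolution over all channel pairs.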
  • the features of the forward network at 1/4, 1/16 and 1/64 scales are used as inputs for the core network subtask.
  • the training process is divided into two steps:
  • In the first step, the input image (i.e., the first image) passes through the front-end backbone network, the core network of the denoising module, and the back-end backbone network in turn; supervised learning is performed using the error function L1loss (i.e., the first error function) of the denoising task, and the relevant network parameters are updated by gradient descent during back propagation.
  • the data flow does not pass through the network layer of the core network of the enhancement module, but only through the front-end backbone network, the core network of the denoising module, and the back-end backbone network (i.e., the first training link).
  • the network parameters of the front-end backbone network and the core network of the denoising module in the first step are fixed.
  • the input image is processed by the front-end backbone network, the core network of the denoising module, the core network of the enhancement module, and the back-end backbone network in sequence (i.e., the second training link).
  • a weighted combination of the error function L1loss of the denoising task (i.e., the first error function), a local contrast function and a color saturation function is used as the error function of the enhancement network (i.e., the second error function) for supervised learning, thereby updating the parameters of the core network of the enhancement module and fine-tuning the parameters of the back-end backbone network.
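The weighted error function can be sketched as below. The L1 term is standard; the local-contrast and color-saturation terms are simple stand-ins (deviation from the per-channel mean, and per-pixel channel spread), since the disclosure does not spell out their exact form, and the weights are assumed.

```python
# Illustrative weighted enhancement loss: L1 + contrast + saturation terms.
import numpy as np

def l1_loss(pred, target):
    return np.mean(np.abs(pred - target))

def local_contrast(img):
    # Stand-in: deviation of each pixel from the mean of its channel.
    return np.abs(img - img.mean(axis=(1, 2), keepdims=True))

def saturation(img):
    # Stand-in: spread between the max and min channel at each pixel.
    return img.max(axis=0) - img.min(axis=0)

def enhancement_loss(pred, target, w_l1=1.0, w_con=0.1, w_sat=0.1):
    return (w_l1 * l1_loss(pred, target)
            + w_con * l1_loss(local_contrast(pred), local_contrast(target))
            + w_sat * l1_loss(saturation(pred), saturation(target)))

a = np.random.default_rng(1).random((3, 16, 16))
print(enhancement_loss(a, a))  # 0.0 when prediction equals target
```

Because the extra terms compare derived quantities rather than raw pixels, they let the supervision reward perceptual properties (contrast, colorfulness) on top of plain pixel fidelity.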
  • the enhancement core network directly processes the data features output by the denoising core network, rather than re-extracting features at the pixel level.
  • a tone mapping task unit can also be set in the core network, etc.
  • the training method is the same as the second step.
  • the parameters of the front-end backbone network and the trained forward core network are fixed, and the weighted error function of each task is used for supervised learning.
  • the input image will pass through the front-end backbone network, each task core network, and the back-end backbone network in sequence to obtain the final output, realizing multi-task joint processing in ISP.
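The two-step schedule above can be illustrated with a toy, runnable example. Each module is reduced to a single multiplicative scalar so the gradients are analytic; this only demonstrates which parameters are trained versus frozen at each step, and the values and learning rate are arbitrary assumptions, not a real training loop.

```python
# Toy staged training: step 1 trains frontend + denoise + backend;
# step 2 freezes frontend and denoise, trains enhance, fine-tunes backend.
params = {"frontend": 0.5, "denoise": 0.5, "enhance": 0.5, "backend": 0.5}

def forward(x, use_enhance):
    chain = ["frontend", "denoise"] + (["enhance"] if use_enhance else []) + ["backend"]
    y = x
    for name in chain:
        y *= params[name]
    return y, chain

def train_step(x, target, trainable, use_enhance, lr=0.05):
    y, chain = forward(x, use_enhance)
    err = y - target
    for name in chain:
        if name in trainable:                     # frozen modules are skipped
            grad = 2.0 * err * y / params[name]   # d(err^2)/d(param)
            params[name] -= lr * grad

# Step 1: data flow bypasses the enhancement module entirely.
for _ in range(200):
    train_step(x=1.0, target=1.0, trainable={"frontend", "denoise", "backend"},
               use_enhance=False)
frozen = (params["frontend"], params["denoise"])

# Step 2: frontend and denoise are fixed; only enhance and backend update.
for _ in range(200):
    train_step(x=1.0, target=2.0, trainable={"enhance", "backend"},
               use_enhance=True)

assert (params["frontend"], params["denoise"]) == frozen  # untouched in step 2
print(round(forward(1.0, True)[0], 3))  # close to the step-2 target
```

The assertion at the end captures the point of the schedule: the forward modules trained in step 1 are bit-for-bit unchanged by step 2, which is what decoupling the modules at the feature level buys.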
  • the type of subtask functional unit also includes a tone mapping unit, wherein each feature processing network layer includes an image denoising unit, an image enhancement unit, and a tone mapping unit connected in sequence.
  • tone mapping is a computer graphics technique that approximates the display of high dynamic range images on a limited dynamic range medium.
  • the problem that tone mapping solves is to perform a large contrast reduction to transform the scene brightness to a displayable range while maintaining image details and colors, which are very important for representing the original scene.
  • the types of sub-task functional units further include a tone mapping unit, so that the image signal processing device of the disclosed embodiment can perform not only image noise reduction processing and image enhancement processing on the first image but also tone mapping processing, thereby effectively expanding the image processing task functions of the image signal processing device.
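As a point of reference for what such a unit computes, the classic Reinhard-style global curve x / (1 + x) is sketched below. This hand-crafted operator is only an illustration of the tone mapping task; it is not the learned operator of the disclosure.

```python
# Minimal global tone mapping sketch: compress HDR luminance into [0, 1).
import numpy as np

def tone_map(hdr):
    """Large contrast reduction that preserves the ordering of luminances."""
    hdr = np.asarray(hdr, dtype=float)
    return hdr / (1.0 + hdr)

luminance = np.array([0.01, 0.1, 1.0, 10.0, 1000.0])
print(tone_map(luminance))  # all values displayable, order preserved
```

Note how the curve does exactly what the text describes: extreme scene brightnesses are squeezed into a displayable range while the relative ordering (and hence local detail) survives.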
  • the type of subtask function module also includes a tone mapping unit, wherein each feature processing network layer includes an image denoising unit, an image enhancement unit, and a tone mapping unit connected in sequence, and the training module is set to:
  • the second error function is fixed, and third training data is obtained, wherein the third training data includes a seventh image with a first contrast and an eighth image with a second contrast obtained from the seventh image, and the first contrast is greater than the second contrast;
  • the tone mapping unit in the third training link is trained using the third training data to perform supervised learning on the third error function that performs mapping task processing in the tone mapping unit until the third error function converges to obtain a trained tone mapping unit, wherein the third training link is a front-end backbone network, an image denoising unit, an image enhancement unit, a tone mapping unit, and a back-end backbone network connected in sequence.
  • the third error function refers to the error function used by the tone mapping unit to perform mapping task processing.
  • each feature processing network layer includes an image denoising unit, an image enhancement unit and a tone mapping unit connected in sequence
  • the training module is set to fix the second error function after the image enhancement unit is trained, and obtain the third training data
  • the third training data includes a seventh image of a first contrast and an eighth image of a second contrast obtained from the seventh image, and the first contrast is greater than the second contrast
  • the tone mapping unit in the third training link is trained using the third training data to supervise the learning of the third error function of the mapping task processing in the tone mapping unit until the third error function converges to obtain the trained tone mapping unit
  • the third training link is a front-end backbone network, an image denoising unit, an image enhancement unit, a tone mapping unit and a back-end backbone network connected in sequence, so that the image denoising unit, the image enhancement unit and the tone mapping unit can be decoupled at the feature level.
  • the image denoising unit can be trained by fixing the network parameters of the front-end backbone network module 10, wherein the data flow of training the image denoising unit does not pass through the network layer of the image enhancement unit
  • the image enhancement unit can be trained by fixing the network parameters of the front-end backbone network module 10 and the image denoising unit until the network parameters of the image enhancement unit converge, wherein the data flow of training the image enhancement unit does not pass through the network layer of the tone mapping unit.
  • the tone mapping unit can be trained by fixing the network parameters of the front-end backbone network module 10, the image denoising unit and the image enhancement unit until the network parameters of the tone mapping unit converge.
  • the embodiment of the present disclosure decouples different functional modules at the feature level.
  • Each module has an independent neural network and can be trained on the basis of a fixed forward network, thereby optimizing the overall ISP performance from the module level.
  • the mainstream AI-ISP technology isolates tasks such as image denoising, image enhancement, and tone mapping from each other, and implements each function with independent neural network models.
  • Each neural network model starts from the pixel level, performs feature extraction, task processing, and feature recovery on the image, and then completes the end-to-end functional implementation of each module.
  • In contrast, in the embodiment of the present disclosure, each task module shares the same backbone network for feature extraction and feature recovery, which avoids repeated computation of features and the migration and transmission of data between different tasks, effectively improving the overall operating efficiency of the network.
  • Different subtask types can be decoupled at the feature level.
  • the device embodiments described above are merely illustrative; the components described as separate components may or may not be physically separated, that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • modules in the image processing device may be omitted, or additional modules may be included.
  • the modules/elements according to various embodiments of the present disclosure may be combined to form a single entity, and thus may equivalently perform the functions of the corresponding modules/elements before the combination.
  • the present disclosure also provides an image signal processing method, with reference to FIG. 5, which is a flowchart of a specific embodiment of the image signal processing method of the present disclosure.
  • the image signal processing method includes:
  • Step S10 acquiring a first image, performing feature extraction on the first image, and obtaining shared feature parameters
  • Step S20 inputting the shared characteristic parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target characteristic parameters, wherein the shared characteristic parameters are characteristic parameters having task correlation between the subtask functional units;
  • Step S30 constructing a second image for output according to the target feature parameters, wherein the quality of the second image is higher than that of the first image.
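The three steps above can be sketched end to end. The "units" below are placeholder functions introduced for illustration, not the learned networks of the disclosure; what the sketch shows is the data flow: features are extracted once, the subtask units operate sequentially in the shared feature space, and pixels are reconstructed only at the end.

```python
# Toy S10 -> S20 -> S30 pipeline over a shared feature space.
import numpy as np

def extract_features(image):                 # S10: shared feature extraction
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def denoise_unit(feat):                      # placeholder subtask (feature space)
    return feat * 0.9

def enhance_unit(feat):                      # placeholder subtask (feature space)
    return feat + 0.1

def reconstruct(feat):                       # S30: image construction
    return feat.repeat(2, axis=0).repeat(2, axis=1)

image = np.ones((8, 8))
feat = extract_features(image)               # computed once, reused by all units
for unit in (denoise_unit, enhance_unit):    # S20: multi-task joint processing
    feat = unit(feat)
output = reconstruct(feat)
print(output.shape)  # (8, 8): same size as the input image
```

The key property is that `extract_features` runs exactly once regardless of how many subtask units follow, which is where the savings in computation and data movement come from.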
  • the embodiments of the present disclosure extract features of the initial image that are similar and task-correlated (i.e., the shared feature parameters), so that similar network features are reused among the sub-task functional units and some neural network parameters are shared, thereby reducing the network model size and computing power requirements of the disclosed embodiments when performing multi-task joint processing.
  • the neural network model corresponding to each functional unit currently needs to start from the pixel level, perform feature extraction, task processing and feature recovery on the image, and then complete the end-to-end functional implementation of each module.
  • the embodiment of the present disclosure extracts features from the first image to obtain shared feature parameters, and inputs the shared feature parameters to multiple preset sub-task functional units for multi-task joint processing, thereby realizing the sharing of some neural network parameters in the AI-ISP architecture.
  • Each sub-task functional unit reuses the same feature space, which reduces the network structure and the movement of feature data, thereby indirectly saving computing resources and improving the inference rate. That is, the embodiment of the present disclosure extracts the data features of the initial image, has each task core network reuse the same feature space to reduce the movement of neural network parameters and feature data, and then constructs the output image through the back-end backbone network module, thereby saving computing resources and improving the inference rate.
  • the disclosed embodiment integrates the neural network architecture to a certain extent by using the subtask functional units to share the same backbone network for feature extraction (i.e., feature extraction of the first image) and feature recovery (i.e., image construction based on the target feature parameters), which can avoid repeated calculation of features and migration and transmission of data between different subtask functional units, effectively improving the overall network operation efficiency, and thus enabling the disclosed embodiment to solve the technical problems of the current AI-ISP's huge neural network structure and high demand for computing resources.
  • the method further includes:
  • Step A10 acquiring a first image, performing multi-channel feature extraction on the first image, and obtaining multi-scale shared feature parameters
  • Step A20 inputting the multi-scale shared feature parameters into each sub-task functional unit for multi-channel multi-task joint processing to obtain multi-scale target feature parameters;
  • Step A30 feature fusion is performed on the multi-scale target feature parameters to obtain fused feature parameters, and image construction is performed according to the fused feature parameters to construct a second image for output.
  • the disclosed embodiment acquires a first image, performs multi-channel feature extraction on the first image, obtains multi-scale shared feature parameters, and then performs multi-scale downsampling processing on the first image to extract multiple image features with different numbers of channels, i.e., multi-scale shared feature parameters, and then inputs the multi-scale shared feature parameters into each subtask functional unit for multi-channel multi-task joint processing, thereby obtaining multiple image features after multi-task joint processing with different numbers of channels, i.e., multi-scale target feature parameters.
  • the present embodiment performs feature fusion of the multi-scale target feature parameters to obtain fused feature parameters, and constructs an image based on the fused feature parameters to construct a second image for output, thereby performing upsampling processing on the fused feature parameters to obtain a second image that is consistent with the scale of the first image (i.e., restored to the size of the original image) and has been subjected to multi-task joint processing (e.g., image denoising and image enhancement).
  • the types of the subtask functional units include an image noise reduction unit and an image enhancement unit
  • the step of inputting the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain the target feature parameters includes:
  • Step B10 inputting the shared feature parameters into the image denoising unit and the image enhancement unit connected in sequence for multi-task joint processing to obtain the target feature parameters.
  • By downsampling the first image at multiple scales, this embodiment extracts image features with different numbers of channels, which helps improve the effect of multi-task joint processing (such as image denoising processing and image enhancement processing) on the first image.
  • the method further comprises:
  • Step C10 acquiring first training data, wherein the first training data includes a third image with first noise and a fourth image with second noise obtained from the third image, and the second noise is greater than the first noise;
  • Step C20 training the image denoising unit using the first training data to perform supervised learning on a first error function that performs denoising task processing in the image denoising unit, until the first error function converges, thereby obtaining a trained image denoising unit;
  • Step C30 after the image denoising unit training is completed, fixing the first error function and acquiring second training data, wherein the second training data includes a fifth image of a first resolution and a sixth image of a second resolution obtained from the fifth image, and the first resolution is greater than the second resolution;
  • Step C40 using the second training data to train the image enhancement unit to perform supervised learning on the second error function that performs enhancement task processing in the image enhancement unit until the second error function converges, thereby obtaining a trained image enhancement unit.
  • the image denoising unit in the first training link is trained by using the first training data to supervise the learning of the first error function for the denoising task in the image denoising unit until the first error function converges, thereby obtaining a trained image denoising unit. Then, after the image denoising unit is trained, the first error function is fixed, and the second training data is obtained, wherein the second training data includes a fifth image of a first resolution and a sixth image of a second resolution obtained from the fifth image, and the first resolution is greater than the second resolution.
  • the image enhancement unit in the second training link is then trained by using the second training data to supervise the learning of the second error function for the enhancement task in the image enhancement unit until the second error function converges, thereby obtaining a trained image enhancement unit, so that the image denoising unit and the image enhancement unit can be decoupled at the feature level.
  • the parameters of the neural networks of the front-end backbone network and the forward task modules can be fixed so that each neural network is trained separately; that is, the image denoising unit can be trained by fixing the network parameters of the front-end backbone network module, wherein the data flow for training the image denoising unit does not pass through the network layer of the image enhancement unit, and then, after the image denoising unit is trained, the network parameters of the front-end backbone network module and the image denoising unit can be fixed to train the image enhancement unit until the network parameters of the image enhancement unit converge.
  • the embodiment of the present disclosure decouples different functional modules at the feature level.
  • Each module has an independent neural network and can be trained on the basis of a fixed forward network, thereby optimizing the overall ISP performance from the module level.
  • the type of the subtask functional unit further includes a tone mapping unit
  • the step of inputting the shared characteristic parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain the target characteristic parameters includes:
  • Step D10 inputting the shared feature parameters into the image denoising unit, the image enhancement unit and the tone mapping unit connected in sequence to perform multi-task joint processing to obtain target feature parameters.
  • the types of sub-task functional units further include a tone mapping unit, so that the image signal processing method of the disclosed embodiment can perform not only image noise reduction processing and image enhancement processing on the first image but also tone mapping processing, thereby effectively expanding the image processing task functions of the image signal processing device.
  • the method further includes:
  • the second error function is fixed and the third training data is obtained, wherein the third training data includes a seventh image of a first contrast and an eighth image of a second contrast obtained from the seventh image, and the first contrast is greater than the second contrast.
  • the third training data is used to train the tone mapping unit in the third training link, so as to supervise the learning of the third error function that performs mapping task processing in the tone mapping unit until the third error function converges, and a trained tone mapping unit is obtained, so that the image denoising unit, the image enhancement unit and the tone mapping unit can be decoupled at the feature level.
  • the image denoising unit can be trained by fixing the network parameters of the front-end backbone network module, wherein the data flow of the training image denoising unit does not pass through the network layer of the image enhancement unit, and then after the image denoising unit is trained, the image enhancement unit can be trained by fixing the network parameters of the front-end backbone network module and the image denoising unit until the network parameters of the image enhancement unit converge, wherein the data flow of the training image enhancement unit does not pass through the network layer of the tone mapping unit. Then the tone mapping unit can be trained by fixing the network parameters of the front-end backbone network module, the image denoising unit and the image enhancement unit until the network parameters of the tone mapping unit converge.
  • the mainstream AI-ISP technology isolates tasks such as image denoising, image enhancement, and tone mapping from each other, and implements each function with independent neural network models.
  • Each neural network model starts from the pixel level, performs feature extraction, task processing, and feature recovery on the image, and then completes the end-to-end functional implementation of each module.
  • In contrast, in the embodiment of the present disclosure, each task module shares the same backbone network for feature extraction and feature recovery, which avoids repeated computation of features and the migration and transmission of data between different tasks, effectively improving the overall operating efficiency of the network.
  • Different subtask types can be decoupled at the feature level.
  • the image signal processing method provided in this embodiment and the image signal processing device provided in the above embodiment belong to the same inventive concept.
  • This embodiment has the same beneficial effects as the various embodiments of the image signal processing device, which will not be repeated here.
  • the embodiment of the present disclosure also provides an image signal processing device, with reference to FIG. 6, which is a schematic diagram of the hardware structure of the image signal processing device involved in the embodiment of the present disclosure.
  • the image signal processing device may include: a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to realize the connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
  • the structure shown in FIG. 6 does not limit the image signal processing device, and may include more or fewer components than shown, or combine certain components, or arrange components differently.
  • the memory 1005 as a computer-readable storage medium may include an operating system, a data storage module, a network communication module, a user interface module, and an image signal processing program.
  • the network interface 1004 is mainly used for data communication with other devices; the user interface 1003 is mainly used for data interaction with the user; the processor 1001 and the memory 1005 in this embodiment can be set in the communication device, and the communication device calls the image signal processing program stored in the memory 1005 through the processor 1001, and executes the image signal processing method provided in any of the above embodiments.
  • an embodiment of the present disclosure further proposes a computer storage medium, which may be a non-volatile computer-readable storage medium, on which an image signal processing program is stored, and when the image signal processing program is executed by a processor, the image signal processing method of the present disclosure as described above is implemented.
  • the various embodiments of the image signal processing device and the computer-readable storage medium of the present disclosure may refer to the various embodiments of the image signal processing method of the present disclosure, and will not be described in detail here.
  • the essence of the technical solution or the part that contributes to the prior art can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium (such as ROM/RAM, disk, CD) as described above, and includes a number of instructions for enabling an image signal processing device (which can be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

Abstract

Embodiments of the present disclosure relate to the technical field of image processing, and provide an image signal processing method and apparatus, a device, and a computer-readable storage medium. The method of the embodiments of the present disclosure comprises: acquiring a first image, and performing multi-channel feature extraction on the first image to obtain a multi-scale shared feature parameter; inputting the multi-scale shared feature parameter into each sub-task functional unit for multi-channel multi-task joint processing to obtain a multi-scale target feature parameter; and performing feature fusion on the multi-scale target feature parameter to obtain a fused feature parameter, and performing image construction according to the fused feature parameter, to obtain a second image for output.

Description

Image signal processing method, device, equipment and computer-readable storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is based on, and claims priority to, Chinese patent application CN202310387238.3, filed on April 12, 2023 and entitled "Image Signal Processing Method, Apparatus, Device and Computer-Readable Storage Medium", the entire disclosure of which is incorporated herein by reference.

Technical Field

The present disclosure relates to the field of image processing technology, and in particular to an image signal processing method, apparatus, device, and computer-readable storage medium.

Background Art

Image signal processing (ISP) optimizes the output signal of an image sensor through a pipeline structure composed of a series of image processing modules connected in series. Traditional ISP technology is limited both by achievable image quality and by the difficulty of tuning the ISP to produce high-quality results.

In recent years, several ISPs based on artificial intelligence (AI) have been proposed. The development of AI has enabled deep learning to replace the overall pipeline of a traditional ISP, or the functions of individual modules, breaking through the "ceiling" of traditional ISP technology. Advances in deep learning have brought state-of-the-art methods to many image processing tasks of the traditional ISP, such as image denoising and image enhancement.

However, in some AI-ISP (artificial-intelligence-based image signal processing) implementations, the deep-learning-based functional units (for example, the image denoising, tone mapping, and image enhancement units) are independent of one another. This makes the overall neural network structure very large, and the resulting compute and memory requirements restrict deployment on electronic devices with limited hardware resources.

Summary of the Invention

The main purpose of the embodiments of the present disclosure is to provide an image signal processing method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem that current AI-ISP neural network structures are very large and place heavy demands on computing resources.

To achieve the above purpose, an embodiment of the present disclosure provides an image signal processing apparatus, including: a front-end backbone network module, configured to acquire a first image and perform feature extraction on the first image to obtain shared feature parameters; a core network module, configured to input the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters having task correlation between the subtask functional units; and a back-end backbone network module, configured to construct an image according to the target feature parameters to obtain a second image for output, wherein the quality of the second image is higher than that of the first image.

In addition, to achieve the above purpose, an embodiment of the present disclosure further provides an image signal processing method, including: acquiring a first image, and performing feature extraction on the first image to obtain shared feature parameters; inputting the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters having task correlation between the subtask functional units; and constructing an image according to the target feature parameters to obtain a second image for output, wherein the quality of the second image is higher than that of the first image.

In addition, to achieve the above purpose, an embodiment of the present disclosure further provides an image signal processing device, including: a memory, a processor, and an image signal processing program stored in the memory and executable on the processor, wherein the image signal processing program, when executed by the processor, implements the image signal processing method described above.

In addition, to achieve the above purpose, an embodiment of the present disclosure further provides a computer-readable storage medium storing an image signal processing program, wherein the image signal processing program, when executed by a processor, implements the image signal processing method described above.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present disclosure, and those of ordinary skill in the art may derive other drawings from the structures shown in these drawings without creative effort.

FIG. 1 is a schematic structural diagram of an image signal processing apparatus according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram of a current end-to-end AI-ISP network architecture;

FIG. 3 is a schematic diagram of a feature-sharing AI-ISP network architecture according to an embodiment of the present disclosure;

FIG. 4 is a neural network architecture diagram of joint denoising and enhancement processing according to an embodiment of the present disclosure;

FIG. 5 is a schematic flowchart of a specific embodiment of the image signal processing method of the present disclosure;

FIG. 6 is a schematic diagram of the hardware structure of an image signal processing device according to an embodiment of the present disclosure.

The realization of the objectives, functional features, and advantages of the present disclosure will be further described with reference to the accompanying drawings in conjunction with the embodiments.

DETAILED DESCRIPTION

It should be understood that the specific embodiments described herein are merely intended to explain the present disclosure and are not intended to limit the present disclosure.

The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are merely some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.

It should be noted that all directional indications in the embodiments of the present disclosure (such as up, down, left, right, front, and back) are merely used to explain the relative positional relationships, movements, and the like between components in a specific posture (as shown in the accompanying drawings); if the specific posture changes, the directional indication changes accordingly.

In the present disclosure, unless otherwise expressly specified and limited, the terms "connect", "fix", and the like should be understood in a broad sense. For example, "fix" may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements, unless otherwise expressly limited. For those of ordinary skill in the art, the specific meanings of the above terms in the present disclosure can be understood according to the specific circumstances.

In addition, descriptions involving "first", "second", and the like in the present disclosure are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, provided that the combination can be implemented by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be implemented, such a combination should be deemed not to exist and falls outside the protection scope claimed by the present disclosure.

In recent years, several ISPs based on artificial intelligence (AI) have been proposed. The development of AI has enabled deep learning to replace the overall pipeline of a traditional ISP, or the functions of individual modules, breaking through the "ceiling" of traditional ISP technology. Advances in deep learning have brought state-of-the-art methods to many image processing tasks of the traditional ISP, such as image denoising and image enhancement.

However, in the existing AI-ISP (artificial-intelligence-based image signal processing) technology, the deep-learning-based functional units (for example, the image denoising, tone mapping, and image enhancement units) are independent of one another, so the overall neural network structure is very large, and the heavy compute and memory requirements restrict its deployment on electronic devices with limited hardware resources.

Based on this, an embodiment of the present disclosure provides an image signal processing apparatus. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the image signal processing apparatus of an embodiment of the present disclosure. In this embodiment, the image signal processing apparatus includes:

a front-end backbone network module 10, configured to acquire a first image and perform feature extraction on the first image to obtain shared feature parameters;

a core network module 20, configured to input the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters having task correlation between the subtask functional units; and

a back-end backbone network module 30, configured to construct an image according to the target feature parameters to obtain a second image for output, wherein the quality of the second image is higher than that of the first image.

In this embodiment, the front-end backbone network module 10 may be a backbone network module that uses a pyramid network for feature extraction, and the back-end backbone network module 30 may be a backbone network module that uses a pyramid network for feature restoration.

It should be noted that the core network module 20 is configured to input the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain the target feature parameters, where the types of subtask functional units include, but are not limited to, at least one of an image denoising unit, an image enhancement unit, and a tone mapping unit. It is easy to understand that, when the subtask functional units are an image denoising unit and an image mapping unit, the first image is the original image to be subjected to image denoising and image mapping, and the second image is the output image after image denoising and image mapping. As another example, when the subtask functional units are an image denoising unit and an image enhancement unit, the first image is the original image to be subjected to image denoising and image enhancement, and the second image is the output image after image denoising and image enhancement. As yet another example, when the subtask functional units are an image enhancement unit and a tone mapping unit, the first image is the original image to be subjected to image enhancement and tone mapping, and the second image is the output image after image enhancement and tone mapping. This embodiment does not specifically limit this. It can be seen that the quality of the second image is higher than that of the first image.

As is known to those skilled in the art, the shared feature parameters are network feature parameters that have similarity/correlation between the different subtask functional units of the core network module 20; that is, the shared feature parameters are feature parameters having task correlation between the subtask functional units. The target feature parameters are the feature parameters output after the shared feature parameters are input into the plurality of preset subtask functional units for multi-task joint processing. To aid understanding, consider an example: when the preset subtask functional units are an image denoising unit and an image enhancement unit, the target feature parameters are the feature parameters output after the shared feature parameters (i.e., feature parameters having task correlation between the image denoising unit and the image enhancement unit) are input into the image denoising unit for denoising processing and into the image enhancement unit for enhancement processing.

For the neural networks of the different functional units of an AI-ISP (artificial-intelligence image signal processing) architecture (for example, image denoising and image enhancement units), the features extracted from the initial image (i.e., the shared feature parameters) are similar and correlated. The embodiments of the present disclosure therefore reuse similar network features among the subtask functional units and share part of the neural network parameters, thereby reducing the network model size and the computing power requirements of multi-task joint processing.
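The saving from parameter sharing can be illustrated with a toy parameter count. The sketch below is purely illustrative (all layer sizes and the two-conv encoder/decoder shape are invented assumptions, not the disclosed network); it compares N fully independent pipelines, as in the FIG. 2 style, against one shared encoder/decoder with N per-task cores, as in the FIG. 3 style.

```python
# Toy parameter-count comparison: independent pipelines vs. a shared backbone.
# All layer sizes below are made-up assumptions for illustration only.

def conv_params(c_in, c_out, k):
    """Parameter count of a k x k convolution layer with bias."""
    return c_in * c_out * k * k + c_out

def independent_pipelines(num_tasks):
    """Each task owns its encoder, task core, and decoder (FIG. 2 style)."""
    encoder = conv_params(3, 64, 3) + conv_params(64, 64, 3)
    core = conv_params(64, 64, 3)
    decoder = conv_params(64, 64, 3) + conv_params(64, 3, 3)
    return num_tasks * (encoder + core + decoder)

def shared_backbone(num_tasks):
    """One shared encoder/decoder; only task cores are per-task (FIG. 3 style)."""
    encoder = conv_params(3, 64, 3) + conv_params(64, 64, 3)
    core = conv_params(64, 64, 3)
    decoder = conv_params(64, 64, 3) + conv_params(64, 3, 3)
    return encoder + decoder + num_tasks * core

# Fraction of parameters saved with three joint tasks under these assumptions.
savings = 1 - shared_backbone(3) / independent_pipelines(3)
```

With a single task the two layouts are identical; the saving grows with the number of jointly processed tasks, because the encoder and decoder are paid for only once.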

At present, the mainstream AI-ISP network architecture isolates subtask functional units such as image denoising and image enhancement from one another and implements each function with its own independent neural network model; that is, the neural network model of each functional unit must start from the pixel level and perform feature extraction, task processing, and feature restoration on the image to complete the end-to-end function of each module. As shown in FIG. 2, FIG. 2 is a schematic diagram of the current mainstream end-to-end AI-ISP network architecture, where "Input" represents the input of the first image, "Output" represents the output of the second image, and task 1, task 2, ..., task N represent the subtask functional units (for example, an image denoising unit, an image enhancement unit, and/or a tone mapping unit).

In contrast, the AI-ISP network architecture of the embodiments of the present disclosure is one in which features can be shared among the subtask functional units. As shown in FIG. 3, FIG. 3 is a schematic diagram of the feature-sharing AI-ISP network architecture of an embodiment of the present disclosure. Correspondingly, "Input" represents the input of the first image, "Output" represents the output of the second image, and task 1, task 2, ..., task N represent the subtask functional units (for example, an image denoising unit, an image enhancement unit, and/or a tone mapping unit). Compared with the current mainstream AI-ISP technology in which the modules are completely independent, the embodiments of the present disclosure extract data features through a backbone network, and the core networks of the task modules reuse the same feature space, which reduces the movement of neural network parameters and feature data, thereby saving computing resources and increasing the inference rate.

The embodiments of the present disclosure perform feature extraction on the first image to obtain the shared feature parameters, and input the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing, thereby sharing part of the neural network parameters in the AI-ISP architecture. The subtask functional units reuse the same feature space, which shrinks the network structure and reduces the movement of feature data, in turn saving computing resources and increasing the inference rate. That is, the embodiments of the present disclosure extract the data features of the initial image with the front-end backbone network, have the core networks of the tasks reuse the same feature space to reduce the movement of neural network parameters and feature data, and then construct the output image through the back-end backbone network module 30, thereby saving computing resources and increasing the inference rate. Moreover, by having the subtask functional units share the same backbone network for feature extraction (i.e., feature extraction on the first image) and feature restoration (i.e., image construction according to the target feature parameters), the embodiments of the present disclosure fuse the neural network architectures to a certain extent, which avoids repeated computation of features and migration of data between different subtask functional units and effectively improves the overall operating efficiency of the network. In this way, the embodiments of the present disclosure solve the technical problem that current AI-ISP neural network structures are very large and place heavy demands on computing resources.
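The extract-once, reuse-everywhere data flow described above can be sketched at the module level. The functions below are toy stand-ins (simple arithmetic on a flat list, with averaging as an assumed fusion choice), not the disclosed networks; the point is that the shared features are computed a single time and fed to every task unit.

```python
# Minimal data-flow sketch of the three-module pipeline:
# front-end extraction -> shared features -> task units -> back-end fusion.
# All operations are toy stand-ins for illustration only.

def extract_features(image):
    """Front-end backbone: compute the shared features once."""
    return [p * 0.5 for p in image]

def denoise_unit(feats):
    """Toy task core 1: stands in for a denoising subtask unit."""
    return [f + 0.1 for f in feats]

def enhance_unit(feats):
    """Toy task core 2: stands in for an enhancement subtask unit."""
    return [f * 2.0 for f in feats]

def reconstruct(task_outputs):
    """Back-end backbone: fuse task outputs (here, averaged) into an image."""
    n = len(task_outputs)
    return [sum(vals) / n for vals in zip(*task_outputs)]

image = [0.2, 0.4, 0.6]
shared = extract_features(image)  # computed once, reused by all task units
second_image = reconstruct([denoise_unit(shared), enhance_unit(shared)])
```

In the fully independent layout of FIG. 2, `extract_features` (and its parameters) would instead be duplicated inside every task pipeline.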

In this embodiment, the ISP's processing of the image signal may include the following aspects:

1. Correction and compensation: defective pixel correction (DPC), black level compensation (BLC), lens distortion correction (LSC), geometric correction for distortion, stretching, and offset, gamma correction, corrections related to the perspective principle, and the like.

2. Denoising and image enhancement: temporal and spatial filtering, hierarchical compensation filtering, removal of various kinds of noise and sharpening, suppression of ringing and banding artifacts, edge enhancement, brightness enhancement, and contrast enhancement.

3. Color and format conversion: color interpolation, color space conversion, tone mapping, chroma adjustment, color correction, saturation adjustment, scaling, rotation, and the like.

4. Adaptive processing: automatic white balance, automatic exposure, automatic focus, strobe detection, and the like.

5. Visual recognition (face and gesture recognition) and image processing in extreme environments, where extreme environments include vibration, fast movement, dim light, excessive brightness, and the like. The processing involved generally includes deblurring, point spread function estimation, brightness compensation, motion detection, motion capture, image stabilization, high dynamic range (HDR) image processing, and the like.
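As a concrete instance of one operation from the list above, gamma correction maps normalized pixel values through a power law. The snippet below is a generic, textbook illustration (the value 2.2 is a common display gamma, not a parameter taken from the disclosure):

```python
def gamma_correct(value, gamma=2.2):
    """Apply gamma correction to a pixel value normalized to [0, 1].

    Values below mid-gray are brightened when gamma > 1, which is the
    usual encoding direction for display gamma.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("expected a normalized pixel value in [0, 1]")
    return value ** (1.0 / gamma)
```

For example, a mid-gray input of 0.5 maps to roughly 0.73 with gamma 2.2, while 0.0 and 1.0 are fixed points.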

Existing AI-ISP technology follows two mainstream implementation approaches. The first relies entirely on a neural network to compute the mapping between the front-end image sensor signal and the output image, simulating the whole ISP pipeline to achieve end-to-end signal processing. This approach depends heavily on computing power and data sets, is blind and weakly interpretable, and cannot form point-to-point solutions for the local color, contrast, texture, and noise issues that specific modules address. The second applies AI to the functional modules of the ISP pipeline where algorithmic improvement is most visible (for example, denoising, high dynamic range rendering, and image enhancement), so as to concentrate limited computing power on the most critical functions, those most perceptible to the human eye. This approach uses deep learning only for the single-module ISP tasks that are relatively complex and show clear gains, making the network design more targeted and interpretable. However, in existing AI-ISP technology the deep-learning-based functional modules are independent of one another, so the overall neural network structure is very large, and the heavy compute and memory requirements restrict deployment on mobile devices with limited hardware resources. In fact, in most existing research on image denoising, image enhancement, tone mapping, and the like, the designed neural network models adopt similar structures or methods, and different tasks often differ only in the error function. Therefore, in the neural networks of the different functional modules of an AI-ISP, the features extracted by the convolutional networks are similar and correlated. By reusing similar network features and sharing part of the neural network parameters, joint processing of multiple module tasks can be achieved while reducing the network model size and computing power requirements.

That is, the embodiments of the present disclosure propose a new method for joint multi-task processing of image signals. By sharing part of the neural network parameters, the task modules reuse the same feature space, which shrinks the network structure and reduces the movement of feature data, in turn saving computing resources and increasing the inference rate. Most AI-ISP technologies isolate tasks such as image denoising, tone mapping, and image enhancement from one another and implement each function with its own independent neural network model; each model starts from the pixel level and performs feature extraction, task processing, and feature restoration on the image to complete the end-to-end function of each module. The embodiments of the present disclosure achieve joint processing of multiple module tasks by balancing computing power and performance. On the one hand, the task modules share the same backbone network for feature extraction and feature restoration, fusing the neural network architectures to a certain extent, which avoids repeated computation of features and migration of data between different tasks and effectively improves the overall operating efficiency of the network. On the other hand, the different subtask types can be decoupled at the feature level: each task module has an independent neural network structure, and during training, by fixing the parameters of the front-end backbone network and of the forward task-module neural networks, each neural network can be trained separately.
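The feature-level decoupling described above can be sketched with trainable/frozen parameter groups. This is a minimal, hypothetical illustration (toy scalar weights and a fake gradient step, not the disclosed training code) of freezing the shared backbone and the forward task module while a single task module trains:

```python
# Toy sketch of decoupled training: only the task module being trained is
# marked trainable; the shared backbone and other modules stay frozen.
# Weights and gradients are scalar stand-ins for illustration only.

params = {
    "frontend_backbone": {"w": 1.0, "trainable": False},  # frozen shared encoder
    "task_denoise":      {"w": 1.0, "trainable": True},   # module being trained
    "task_enhance":      {"w": 1.0, "trainable": False},  # frozen forward task module
    "backend_backbone":  {"w": 1.0, "trainable": False},  # frozen shared decoder
}

def sgd_step(params, grads, lr=0.1):
    """Apply a gradient step only to trainable parameter groups."""
    for name, group in params.items():
        if group["trainable"]:
            group["w"] -= lr * grads.get(name, 0.0)
    return params

grads = {name: 1.0 for name in params}  # pretend every group received a gradient
sgd_step(params, grads)
```

After the step, only the weight of the module under training has moved; every frozen group is untouched even though a gradient was available for it.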

In a possible implementation, the front-end backbone network module 10 includes a preset number of feature extraction network layers of different scales; the feature extraction network layers are configured to acquire the first image and perform multi-channel feature extraction on the first image to obtain multi-scale shared feature parameters;

the core network module 20 includes feature processing network layers connected one-to-one to the feature extraction network layers, each feature processing network layer including a plurality of subtask functional units; the feature processing network layers are configured to input the multi-scale shared feature parameters into each subtask functional unit for multi-channel multi-task joint processing to obtain multi-scale target feature parameters; and

the back-end backbone network module 30 includes image restoration network layers connected one-to-one to the feature processing network layers; the image restoration network layers are configured to perform feature fusion on the multi-scale target feature parameters to obtain fused feature parameters, and to construct an image according to the fused feature parameters to obtain a second image for output.

In this embodiment, the front-end backbone network module 10 may include a preset number of feature extraction network layers of different scales. For example, the front-end backbone network module 10 may include feature extraction network layers of 1/4 size, 1/16 size, and 1/64 size, with 16, 64, and 256 channels respectively. The 1/4-size feature extraction network layer can extract features of the first image at 1/4 size, the 1/16-size feature extraction network layer at 1/16 size, and the 1/64-size feature extraction network layer at 1/64 size. Each feature extraction network layer may consist of a 2D convolutional network with a 2×2 convolution kernel and a stride of 2.
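As a sanity check on the scales above, the following sketch computes the feature-map shapes produced by 2×2, stride-2 convolutions, assuming the three levels are produced by cascading these convolutions on an assumed 256×256 input (an illustration, not code from the disclosure). Each stage halves the height and width, so the levels hold 1/4, 1/16, and 1/64 of the input pixels, with 16, 64, and 256 channels:

```python
# Shape bookkeeping for the pyramid: each 2x2, stride-2 convolution halves
# the spatial dimensions, so successive levels cover 1/4, 1/16, and 1/64 of
# the input area. Channel counts follow the example in the text.

def stride2_out(h, w):
    """Output spatial size of a 2x2 convolution with stride 2 (no padding)."""
    return h // 2, w // 2

def pyramid_shapes(h, w, channels=(16, 64, 256)):
    """Shapes (h, w, c) of the cascaded multi-scale feature maps."""
    shapes = []
    for c in channels:
        h, w = stride2_out(h, w)
        shapes.append((h, w, c))
    return shapes

shapes = pyramid_shapes(256, 256)
# Area of each level relative to the assumed 256x256 input.
ratios = [(h * w) / (256 * 256) for h, w, _ in shapes]
```

For a 256×256 input this yields 128×128×16, 64×64×64, and 32×32×256 feature maps, matching the 1/4, 1/16, and 1/64 area ratios.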

Correspondingly, the core network module 20 includes feature processing network layers connected one-to-one to the feature extraction network layers, each feature processing network layer including a plurality of subtask functional units. The feature processing network layers are configured to input the multi-scale shared feature parameters into each subtask functional unit for multi-channel multi-task joint processing to obtain multi-scale target feature parameters. To aid understanding, consider an example in which the front-end backbone network module 10 includes feature extraction network layers of 1/4 size, 1/16 size, and 1/64 size; the core network module 20 then likewise includes three feature processing network layers, with the 1/4-size, 1/16-size, and 1/64-size feature extraction network layers each connected to one feature processing network layer. Each feature processing network layer contains a plurality of subtask functional units with the same structure; for example, each feature processing network layer includes an image denoising unit and an image enhancement unit.

Correspondingly, the back-end backbone network module 30 includes image restoration network layers connected to the feature processing network layers in a one-to-one correspondence. Each image restoration network layer is configured to perform feature fusion on the multi-scale target feature parameters to obtain fused feature parameters, and to construct a second image for output according to the fused feature parameters. As an illustrative example, when the front-end backbone network module 10 includes feature extraction network layers at 1/4 size, 1/16 size, and 1/64 size, the core network module 20 includes three feature processing network layers (a first, a second, and a third feature processing network layer), where the 1/4-size feature extraction network layer is connected to the first feature processing network layer, the 1/16-size layer to the second, and the 1/64-size layer to the third. In this case, the back-end backbone network module 30 likewise includes three image restoration network layers, whose scales correspond one-to-one to the scales of the feature extraction network layers, i.e., image restoration network layers at 1/4 size, 1/16 size, and 1/64 size. This example thus contains three connection links: one link consists of the 1/4-size feature extraction network layer, the first feature processing network layer, and the 1/4-size image restoration network layer connected in sequence; another consists of the 1/16-size feature extraction network layer, the second feature processing network layer, and the 1/16-size image restoration network layer connected in sequence; and the third consists of the 1/64-size feature extraction network layer, the third feature processing network layer, and the 1/64-size image restoration network layer connected in sequence.

In the embodiments of the present disclosure, the front-end backbone network module 10 is configured to include a preset number of feature extraction network layers of different scales, each configured to acquire the first image and perform multi-channel feature extraction on it to obtain multi-scale shared feature parameters; the first image is thereby downsampled at multiple scales, and image features with different channel numbers, i.e., the multi-scale shared feature parameters, are extracted. The core network module 20 is configured as feature processing network layers connected to the feature extraction network layers in a one-to-one correspondence, each including multiple subtask functional units and configured to input the multi-scale shared feature parameters into the subtask functional units for multi-channel multi-task joint processing, yielding jointly processed image features with different channel numbers, i.e., the multi-scale target feature parameters. The back-end backbone network module 30 is then configured as image restoration network layers connected to the feature processing network layers in a one-to-one correspondence; each image restoration network layer fuses the multi-scale target feature parameters into fused feature parameters and constructs a second image for output according to them. The fused feature parameters are thereby upsampled to obtain a second image that is consistent with the scale of the first image (i.e., restored to the original image size) and that has undergone multi-task joint processing (for example, image denoising and image enhancement).

Exemplarily, the types of subtask functional units include an image denoising unit and an image enhancement unit, where each feature processing network layer includes an image denoising unit and an image enhancement unit connected in sequence.

As is known, image noise reduction, sometimes also called image denoising, refers to the process of reducing noise in a digital image. Image enhancement refers to an image processing method that makes an originally unclear image clear, or emphasizes features of interest while suppressing features of no interest, so as to improve image quality, enrich the information content, and strengthen image interpretation and recognition.

By downsampling the first image at multiple scales and extracting image features with different channel numbers, this embodiment facilitates improving the effect of multi-task joint processing (for example, image denoising and image enhancement) on the first image.

In a possible implementation, the image signal processing apparatus further includes a training module (not shown), which is configured to:

acquire first training data, where the first training data includes a third image with first noise and a fourth image with second noise obtained from the third image, the second noise being greater than the first noise;

train the image denoising unit in a first training link using the first training data, performing supervised learning on a first error function for the denoising task in the image denoising unit until the first error function converges, to obtain a trained image denoising unit, where the first training link consists of the front-end backbone network module 10, the image denoising unit, and the back-end backbone network module 30 connected in sequence;

after the training of the image denoising unit is completed, fix the first error function and acquire second training data, where the second training data includes a fifth image with a first resolution and a sixth image with a second resolution obtained from the fifth image, the first resolution being greater than the second resolution;

train the image enhancement unit in a second training link using the second training data, performing supervised learning on a second error function for the enhancement task in the image enhancement unit until the second error function converges, to obtain a trained image enhancement unit, where the second training link consists of the front-end backbone network module 10, the image denoising unit, the image enhancement unit, and the back-end backbone network module 30 connected in sequence.

In this embodiment, the first error function refers to the error function of the image denoising unit for the denoising task, and the second error function refers to the error function of the image enhancement unit for the enhancement task.

In this embodiment, the image denoising unit in the first training link is trained with the first training data, performing supervised learning on the first error function for the denoising task until the first error function converges, to obtain a trained image denoising unit, where the first training link consists of the front-end backbone network module 10, the image denoising unit, and the back-end backbone network module 30 connected in sequence. After the image denoising unit is trained, the first error function is fixed and the second training data is acquired, where the second training data includes a fifth image with a first resolution and a sixth image with a second resolution obtained from the fifth image, the first resolution being greater than the second resolution. The second training data is then used to train the image enhancement unit in the second training link, performing supervised learning on the second error function for the enhancement task until the second error function converges, to obtain a trained image enhancement unit, where the second training link consists of the front-end backbone network module 10, the image denoising unit, the image enhancement unit, and the back-end backbone network module 30 connected in sequence. In this way, the image denoising unit and the image enhancement unit are decoupled at the feature level: by fixing the parameters of the front-end backbone network and of the forward task-module neural networks during training, each neural network can be trained separately. That is, the image denoising unit can be trained with the network parameters of the front-end backbone network module 10 fixed, where the data flow for training the image denoising unit does not pass through the network layers of the image enhancement unit; after the image denoising unit is trained, the image enhancement unit can be trained with the network parameters of the front-end backbone network module 10 and of the image denoising unit fixed, until the network parameters of the image enhancement unit converge.

Compared with a fully end-to-end AI-ISP technique, the embodiments of the present disclosure decouple the different functional modules at the feature level; each module has an independent neural network that can be trained with the forward network fixed, optimizing the overall ISP performance at the module level.

That is, mainstream AI-ISP techniques isolate tasks such as image denoising and image enhancement from one another and implement each function with its own independent neural network model. Each such model starts from the pixel level, performing feature extraction, task processing, and feature restoration on the image to complete an end-to-end implementation for every module. In the embodiments of the present disclosure, by contrast, the task modules share the same backbone network for feature extraction and feature restoration, which avoids redundant feature computation and data migration between tasks and effectively improves the overall operating efficiency of the network. The different subtask types are decoupled at the feature level, and by fixing the parameters of the front-end backbone network and of the forward task-module neural networks during training, each neural network can be trained separately.

To aid understanding of the technical concept and principle of the embodiments of the present disclosure, a specific embodiment is given with reference to FIG. 4, which is a neural network architecture diagram for joint denoising and enhancement processing according to an embodiment of the present disclosure, including:

In this specific embodiment, a pyramid network is used as the backbone network for feature extraction and restoration (i.e., the front-end backbone network module 10), and image denoising and image enhancement serve as the two subtasks (i.e., subtask functional units) to illustrate the implementation flow of the embodiments of the present disclosure. The neural network architecture shown in FIG. 4 (i.e., the AI-ISP network architecture) includes a front-end backbone network (i.e., the front-end backbone network module 10), a core network (i.e., the core network module 20), and a back-end backbone network (i.e., the back-end backbone network module 30). The front-end backbone network includes three feature extraction layers that extract features from the original image (i.e., the first image) at 1/4, 1/16, and 1/64 of its size (i.e., multi-channel feature extraction of the first image), with 16, 64, and 256 channels, respectively; each feature extraction layer is composed of a 2D convolutional network with a kernel size of 2*2 and a stride of 2. The front-end backbone network processes the input image into multi-scale features that serve as the input of the core network. Similarly, the back-end backbone network includes three feature restoration layers with 256, 64, and 16 channels, respectively; each restoration layer is composed of a pixel_shuffle network and is used to restore the features output by the task core networks to an image of the original size. The denoising module (i.e., the image denoising unit) and the enhancement module (i.e., the image enhancement unit) in the core network both adopt a UNet structure and likewise include three network blocks at the scales corresponding to the backbone network; the specific number of layers in each block can be adjusted as needed. Here, each network block contains three layers, each composed of a depthwise separable convolution and a ReLU activation function.
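The pixel_shuffle operation used by the restoration layers can be sketched in NumPy: it rearranges a (C*r*r, H, W) feature map into (C, H*r, W*r), trading channels for spatial resolution. This is a minimal single-tensor sketch of the operation itself, not of the learned restoration layers.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) array into (C, H*r, W*r), as in sub-pixel upsampling."""
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

# Four 1x1 channels become one 2x2 plane.
print(pixel_shuffle(np.arange(4).reshape(4, 1, 1), 2))
```

Applied per scale with r chosen per layer (e.g. a 256-channel 1/64-size feature map upsampled back toward the original size), this is how channel depth is converted back into spatial resolution.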

In this specific embodiment, the features of the forward network at the 1/4, 1/16, and 1/64 sizes all serve as inputs to the core network subtasks. The training process is divided into two steps:

In the first step, the input image (i.e., the first image) is processed in sequence by the front-end backbone network, the denoising-module core network, and the back-end backbone network; supervised learning is performed using the L1 loss error function of the denoising task (i.e., the first error function), and the relevant network parameters are updated by gradient descent during backpropagation. Note that in the first step the data flow does not pass through the network layers of the enhancement-module core network, but only through the front-end backbone network, the denoising-module core network, and the back-end backbone network (i.e., the first training link).

In the second step, the network parameters of the front-end backbone network and of the denoising-module core network from the first step are fixed. The input image is processed in sequence by the front-end backbone network, the denoising-module core network, the enhancement-module core network, and the back-end backbone network (i.e., the second training link); supervised learning is performed using the L1 loss error function of the denoising task (i.e., the first error function) together with the weighted local-contrast and color-saturation terms of the enhancement network's error function (i.e., the second error function), so as to update the parameters of the enhancement-module core network and fine-tune the parameters of the back-end backbone network. Here, the enhancement core network directly processes the data features output by the denoising core network rather than re-extracting features from the pixel level.
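The two-step schedule can be summarized as a table of which parameter groups receive gradient updates in each step. The module names below are illustrative shorthand for the modules described above, not identifiers from the embodiment.

```python
def trainable_modules(step):
    """Which parameter groups are updated in each training step (a sketch)."""
    modules = ["front_backbone", "denoise_core", "enhance_core", "back_backbone"]
    if step == 1:
        # Step 1: the data flow skips the enhancement layers entirely.
        frozen = {"enhance_core"}
    elif step == 2:
        # Step 2: backbone and denoising parameters from step 1 are fixed;
        # the enhancement core is trained and the back-end backbone fine-tuned.
        frozen = {"front_backbone", "denoise_core"}
    else:
        raise ValueError("step must be 1 or 2")
    return [m for m in modules if m not in frozen]

print(trainable_modules(1))
print(trainable_modules(2))
```

In a framework such as PyTorch this would typically be realized by setting `requires_grad = False` on the frozen groups before step 2.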

Further subtasks may also be present in the multi-task neural network model shown in FIG. 4 (for example, a tone mapping task unit may additionally be provided in the core network). Their training follows the second step: the parameters of the front-end backbone network and of the already-trained forward core networks are fixed, and supervised learning is performed using the weighted error functions of the respective tasks.

In the testing and inference stages, the input image passes in sequence through the front-end backbone network, the core network of each task, and the back-end backbone network to produce the final output, realizing multi-task joint processing in the ISP.

It should be noted that the above specific embodiment is only intended to aid understanding of the technical concept of the embodiments of the present disclosure and does not limit the image signal processing apparatus of the present disclosure; any simple transformation in further forms based on this technical concept falls within the protection scope of the present disclosure.

In an exemplary embodiment, the types of subtask functional units further include a tone mapping unit, where each feature processing network layer includes an image denoising unit, an image enhancement unit, and a tone mapping unit connected in sequence.

As is known to those skilled in the art, tone mapping is a computer graphics technique for approximately displaying high-dynamic-range images on a medium with a limited dynamic range. Essentially, tone mapping performs a large contrast reduction to transform the scene luminance into a displayable range while preserving image details and colors, which are important for representing the original scene.

By further configuring the types of subtask functional units to include a tone mapping unit, this embodiment enables the image signal processing apparatus of the embodiments of the present disclosure to perform tone mapping on the first image in addition to image denoising and image enhancement, effectively extending the image processing task functions of the apparatus.

In a possible implementation, the types of subtask functional units further include a tone mapping unit, where each feature processing network layer includes an image denoising unit, an image enhancement unit, and a tone mapping unit connected in sequence, and the training module is configured to:

after the training of the image enhancement unit is completed, fix the second error function and acquire third training data, where the third training data includes a seventh image with a first contrast and an eighth image with a second contrast obtained from the seventh image, the first contrast being greater than the second contrast;

train the tone mapping unit in a third training link using the third training data, performing supervised learning on a third error function for the mapping task in the tone mapping unit until the third error function converges, to obtain a trained tone mapping unit, where the third training link consists of the front-end backbone network, the image denoising unit, the image enhancement unit, the tone mapping unit, and the back-end backbone network connected in sequence.

In this embodiment, the third error function refers to the error function of the tone mapping unit for the mapping task.

In the embodiments of the present disclosure, the types of subtask functional units are further configured to include a tone mapping unit, where each feature processing network layer includes an image denoising unit, an image enhancement unit, and a tone mapping unit connected in sequence. The training module is configured to fix the second error function after the image enhancement unit has been trained and to acquire the third training data, where the third training data includes a seventh image with a first contrast and an eighth image with a second contrast obtained from the seventh image, the first contrast being greater than the second contrast. The third training data is then used to train the tone mapping unit in the third training link, performing supervised learning on the third error function for the mapping task until it converges, to obtain a trained tone mapping unit, where the third training link consists of the front-end backbone network, the image denoising unit, the image enhancement unit, the tone mapping unit, and the back-end backbone network connected in sequence. In this way, the image denoising unit, the image enhancement unit, and the tone mapping unit are decoupled at the feature level: by fixing the parameters of the front-end backbone network and of the forward task-module neural networks during training, each neural network can be trained separately. That is, the image denoising unit can be trained with the network parameters of the front-end backbone network module 10 fixed, where the data flow for training the image denoising unit does not pass through the network layers of the image enhancement unit; after the image denoising unit is trained, the image enhancement unit can be trained with the network parameters of the front-end backbone network module 10 and of the image denoising unit fixed, until the network parameters of the image enhancement unit converge, where the data flow for training the image enhancement unit does not pass through the network layers of the tone mapping unit; and finally the tone mapping unit can be trained with the network parameters of the front-end backbone network module 10, the image denoising unit, and the image enhancement unit fixed, until the network parameters of the tone mapping unit converge.

Compared with a fully end-to-end AI-ISP technique, the embodiments of the present disclosure decouple the different functional modules at the feature level; each module has an independent neural network that can be trained with the forward network fixed, optimizing the overall ISP performance at the module level.

That is, mainstream AI-ISP techniques isolate tasks such as image denoising, image enhancement, and tone mapping from one another and implement each function with its own independent neural network model. Each such model starts from the pixel level, performing feature extraction, task processing, and feature restoration on the image to complete an end-to-end implementation for every module. In the embodiments of the present disclosure, by contrast, the task modules share the same backbone network for feature extraction and feature restoration, which avoids redundant feature computation and data migration between tasks and effectively improves the overall operating efficiency of the network. The different subtask types are decoupled at the feature level, and by fixing the parameters of the front-end backbone network and of the forward task-module neural networks during training, each neural network can be trained separately.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In various embodiments, some modules in the image processing apparatus may be omitted, or additional modules may be included. In addition, modules/elements according to various embodiments of the present disclosure may be combined to form a single entity that equivalently performs the functions of the corresponding modules/elements before the combination.

In addition, an embodiment of the present disclosure further provides an image signal processing method. Referring to FIG. 5, FIG. 5 is a schematic flowchart of a specific embodiment of the image signal processing method of the present disclosure. In this embodiment, the image signal processing method includes:

Step S10: acquiring a first image and performing feature extraction on the first image to obtain shared feature parameters;

Step S20: inputting the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain target feature parameters, where the shared feature parameters are feature parameters having task correlation among the subtask functional units;

Step S30: constructing an image according to the target feature parameters to obtain a second image for output, where the quality of the second image is higher than that of the first image.

For the neural networks of the different functional units of the AI-ISP architecture (for example, image denoising and image enhancement units), the features extracted from the initial image (i.e., the shared feature parameters) are similar and correlated, so similar network features can be reused among the subtask functional units and part of the neural network parameters can be shared, thereby reducing the network model size and the computing power required for multi-task joint processing in the embodiments of the present disclosure.

相比于目前AI-ISP技术中将图像降噪和图像增强等子任务功能单元相互隔离，用各自独立的神经网络模型实现各个功能，也即目前每个功能单元对应的神经网络模型都需要从像素级开始，对图像进行特征提取、任务处理和特征恢复，进而完成每一个模块端到端的功能实现，而本公开实施例通过对该第一图像进行特征提取，得到共享特征参数，并将该共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理，从而实现在AI-ISP架构中共享部分神经网络参数，各个子任务功能单元复用同一个特征空间，减小了网络结构，减少了特征数据的搬移，进而节省了计算资源和提升了推理速率。也即，本公开实施例提取初始图像的数据特征，并通过将各个任务核心网络复用同一个特征空间，减少了神经网络参数和特征数据的搬移，再通过后端主干网络模块进行输出图像的构建，进而节省了计算资源和提升了推理速率。并且本公开实施例通过将子任务功能单元共用相同的主干网络进行特征提取（即对第一图像进行特征提取）和特征恢复（即根据目标特征参数进行图像构建），在一定程度上进行了神经网络架构的融合，可以避免不同子任务功能单元之间特征的重复计算和数据的迁移传输，有效提升网络整体运行效率，进而使得本公开实施例解决了目前AI-ISP的神经网络结构庞大、对算力资源的需求大的技术问题。Compared with current AI-ISP technology, which isolates sub-task functional units such as image denoising and image enhancement from one another and implements each function with its own independent neural network model (so that each model must start from the pixel level and perform feature extraction, task processing, and feature recovery to realize each module end to end), the embodiments of the present disclosure extract features from the first image to obtain shared feature parameters and input the shared feature parameters into multiple preset sub-task functional units for multi-task joint processing. Part of the neural network parameters are thereby shared within the AI-ISP architecture, and the sub-task functional units reuse the same feature space, which shrinks the network structure, reduces the movement of feature data, saves computing resources, and improves the inference rate. In other words, the embodiments of the present disclosure extract the data features of the initial image, let the core network of each task reuse the same feature space so that fewer neural network parameters and less feature data need to be moved, and then construct the output image through the back-end backbone network module. Moreover, by having the sub-task functional units share the same backbone network for feature extraction (i.e., feature extraction on the first image) and feature recovery (i.e., image construction from the target feature parameters), the embodiments fuse the neural network architecture to a certain extent, avoid repeated feature computation and data transfer between different sub-task functional units, and effectively improve the overall operating efficiency of the network, thereby solving the technical problem that current AI-ISP neural network structures are large and demand substantial computing resources.
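作为示意，上述各子任务功能单元复用同一特征空间的思路可用如下极简代码草图说明，其中 extract_features、denoise_head 等名称与具体运算均为本示例的假设，并非本公开网络的真实实现。As an illustrative aid only, the parameter-sharing idea above can be sketched as follows; the function names and the toy arithmetic are assumptions of this sketch, not the actual networks of this disclosure:

```python
# Toy sketch: several task heads reuse one shared feature extractor,
# instead of each head owning a full pixel-to-pixel network.
# All functions below are hypothetical stand-ins.

def extract_features(image):
    """Stand-in for the shared front-end backbone (feature extraction)."""
    return [p * 0.5 for p in image]  # toy "feature map"

def denoise_head(feats):
    return [f + 0.1 for f in feats]  # toy denoising in feature space

def enhance_head(feats):
    return [f * 1.2 for f in feats]  # toy enhancement in feature space

def reconstruct(feats):
    """Stand-in for the shared back-end backbone (feature recovery)."""
    return [f * 2.0 for f in feats]

def shared_pipeline(image):
    feats = extract_features(image)  # features computed only once
    feats = denoise_head(feats)      # heads reuse the same feature space,
    feats = enhance_head(feats)      # so no intermediate image is rebuilt
    return reconstruct(feats)        # features recovered only once

out = shared_pipeline([1.0, 2.0, 3.0])
```

在该草图中，特征提取与特征恢复各只执行一次，各任务头直接在共享特征上级联，对应上文所述"复用同一个特征空间、减少特征数据搬移"的效果。In this sketch, extraction and recovery each run once while the task heads cascade directly on the shared features, mirroring the reduced feature-data movement described above.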

在一种可能的实施方式中,所述方法还包括:In a possible implementation, the method further includes:

步骤A10,获取第一图像,对所述第一图像进行多通道的特征提取,得到多尺度的共享特征参数;Step A10, acquiring a first image, performing multi-channel feature extraction on the first image, and obtaining multi-scale shared feature parameters;

步骤A20,将多尺度的共享特征参数输入至各个子任务功能单元进行多通道的多任务联合处理,得到多尺度的目标特征参数;Step A20, inputting the multi-scale shared feature parameters into each sub-task functional unit for multi-channel multi-task joint processing to obtain multi-scale target feature parameters;

步骤A30,将多尺度的目标特征参数进行特征融合,得到融合特征参数,并根据所述融合特征参数进行图像构建,构建得到第二图像进行输出。Step A30, feature fusion is performed on the multi-scale target feature parameters to obtain fused feature parameters, and image construction is performed according to the fused feature parameters to construct a second image for output.

本公开实施例通过获取第一图像,对第一图像进行多通道的特征提取,得到多尺度的共享特征参数,从而对第一图像进行多种尺度的下采样处理,提取出多个不同通道数的图像特征,即多尺度的共享特征参数,然后通过将多尺度的共享特征参数输入至各个子任务功能单元进行多通道的多任务联合处理,从而得到多个不同通道数的多任务联合处理后的图像特征,即多尺度的目标特征参数。然后,本实施例通过将多尺度的目标特征参数进行特征融合,得到融合特征参数,并根据该融合特征参数进行图像构建,构建得到第二图像进行输出,从而对融合特征参数进行上采样处理,以得到与该第一图像的尺度一致的(即恢复至原始图像的尺寸),且经过多任务联合处理后(例如图像降噪处理和图像增强处理)的第二图像。The disclosed embodiment acquires a first image, performs multi-channel feature extraction on the first image, obtains multi-scale shared feature parameters, and then performs multi-scale downsampling processing on the first image to extract multiple image features with different numbers of channels, i.e., multi-scale shared feature parameters, and then inputs the multi-scale shared feature parameters into each subtask functional unit for multi-channel multi-task joint processing, thereby obtaining multiple image features after multi-task joint processing with different numbers of channels, i.e., multi-scale target feature parameters. Then, the present embodiment performs feature fusion of the multi-scale target feature parameters to obtain fused feature parameters, and constructs an image based on the fused feature parameters to construct a second image for output, thereby performing upsampling processing on the fused feature parameters to obtain a second image that is consistent with the scale of the first image (i.e., restored to the size of the original image) and has been subjected to multi-task joint processing (e.g., image denoising and image enhancement).
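上述步骤A10至A30的流程可用如下一维示意代码概括，其中下采样、上采样与联合处理均为示例性的简化运算，并非本公开的真实网络算子。The flow of steps A10 to A30 above can be summarized with the following 1-D sketch; the downsampling, upsampling, and joint-processing operations are simplified stand-ins, not the actual network operators of this disclosure:

```python
# Toy sketch of steps A10-A30 on a 1-D "image": multi-scale extraction by
# downsampling, per-scale joint task processing, fusion, and upsampling
# back to the input size.

def downsample(x):                 # halve the scale by averaging pairs
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def upsample(x):                   # nearest-neighbour x2
    return [v for v in x for _ in range(2)]

def joint_tasks(feats):            # stand-in for the sub-task units
    return [f * 1.1 for f in feats]

def process(image, scales=2):
    # A10: multi-scale shared feature parameters
    pyramid, cur = [], image
    for _ in range(scales):
        cur = downsample(cur)
        pyramid.append(cur)
    # A20: multi-task joint processing at every scale
    pyramid = [joint_tasks(feats) for feats in pyramid]
    # A30: fuse coarse into fine, then restore the input resolution
    fused = pyramid[-1]
    for finer in reversed(pyramid[:-1]):
        fused = [a + b for a, b in zip(upsample(fused), finer)]
    out = fused
    while len(out) < len(image):
        out = upsample(out)
    return out

result = process([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
```

输出与输入尺度一致，对应上文"恢复至原始图像的尺寸"的描述。The output has the same length as the input, matching the "restored to the size of the original image" behaviour described above.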

示例性地,所述子任务功能单元的种类包括图像降噪单元和图像增强单元,所述将所述共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理,得到目标特征参数的步骤包括:Exemplarily, the types of the subtask functional units include an image noise reduction unit and an image enhancement unit, and the step of inputting the shared feature parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain the target feature parameters includes:

步骤B10,将所述共享特征参数输入至依次连接的图像降噪单元和图像增强单元以进行多任务联合处理,得到目标特征参数。 Step B10: input the shared feature parameters to the image denoising unit and the image enhancement unit connected in sequence to perform multi-task joint processing to obtain target feature parameters.

本实施例通过对第一图像进行多种尺度的下采样处理,提取出多个不同通道数的图像特征,从而便于提升对第一图像进行多任务联合处理(例如图像降噪处理和图像增强处理)的图像处理效果。This embodiment extracts image features with multiple numbers of channels by performing downsampling processing on the first image at multiple scales, thereby facilitating improving the image processing effect of performing multi-task joint processing (such as image denoising processing and image enhancement processing) on the first image.
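步骤B10中"依次连接"的子任务功能单元可以理解为在特征层面顺序作用的一串变换，如下假设性草图所示（单元内部运算仅为示意）。The sequentially connected sub-task units of step B10 can be understood as a chain of feature-level transforms, as in the following hypothetical sketch (the unit internals are toy stand-ins):

```python
# Minimal sketch of step B10: shared features pass through sequentially
# connected sub-task units; each unit maps features to features, so no
# intermediate image is rebuilt between tasks.

def denoise_unit(feats):
    # toy "noise suppression": clamp small fluctuations toward zero
    return [0.0 if abs(f) < 0.05 else f for f in feats]

def enhance_unit(feats):
    # toy "detail boost"
    return [f * 1.5 for f in feats]

def run_units(shared_features, units):
    feats = shared_features
    for unit in units:          # units applied in sequence (依次连接)
        feats = unit(feats)
    return feats                # target feature parameters

target = run_units([0.01, 0.2, -0.03, 0.4], [denoise_unit, enhance_unit])
```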

在一种可实施的方式中,所述方法还包括:In one practicable manner, the method further comprises:

步骤C10,获取第一训练数据,其中,所述第一训练数据包括第一噪声的第三图像和从所述第三图像获得第二噪声的第四图像,所述第二噪声大于所述第一噪声;Step C10, acquiring first training data, wherein the first training data includes a third image with first noise and a fourth image with second noise obtained from the third image, and the second noise is greater than the first noise;

步骤C20,利用所述第一训练数据对图像降噪单元进行训练,以对所述图像降噪单元中进行降噪任务处理的第一误差函数进行监督学习,直至所述第一误差函数收敛,得到训练完成的图像降噪单元;Step C20, training the image denoising unit using the first training data to perform supervised learning on a first error function that performs denoising task processing in the image denoising unit, until the first error function converges, thereby obtaining a trained image denoising unit;

步骤C30,在图像降噪单元训练完成后,将所述第一误差函数进行固定,并获取第二训练数据,其中,所述第二训练数据包括第一分辨率的第五图像和从所述第五图像获得第二分辨率的第六图像,所述第一分辨率大于所述第二分辨率;Step C30, after the image denoising unit training is completed, fixing the first error function and acquiring second training data, wherein the second training data includes a fifth image of a first resolution and a sixth image of a second resolution obtained from the fifth image, and the first resolution is greater than the second resolution;

步骤C40,利用所述第二训练数据对图像增强单元进行训练,以对所述图像增强单元中进行增强任务处理的第二误差函数进行监督学习,直至所述第二误差函数收敛,得到训练完成的图像增强单元。Step C40, using the second training data to train the image enhancement unit to perform supervised learning on the second error function that performs enhancement task processing in the image enhancement unit until the second error function converges, thereby obtaining a trained image enhancement unit.

本实施例通过利用第一训练数据对第一训练链路中的图像降噪单元进行训练，以对图像降噪单元中进行降噪任务处理的第一误差函数进行监督学习，直至第一误差函数收敛，从而得到训练完成的图像降噪单元。然后在图像降噪单元训练完成后，将第一误差函数进行固定，并获取第二训练数据，其中，该第二训练数据包括第一分辨率的第五图像和从第五图像获得第二分辨率的第六图像，该第一分辨率大于第二分辨率，再利用第二训练数据对第二训练链路中的图像增强单元进行训练，以对图像增强单元中进行增强任务处理的第二误差函数进行监督学习，直至第二误差函数收敛，得到训练完成的图像增强单元，从而使得图像降噪单元和图像增强单元之间在特征级别可以实现解耦。训练过程中通过固定前端主干网络和前向任务模块神经网络的参数，可以完成各自神经网络的单独训练，也即可通过固定前端主干网络模块的网络参数来训练图像降噪单元，其中，训练图像降噪单元的数据流不经过图像增强单元的网络层；然后在图像降噪单元训练完后，可通过固定前端主干网络模块和图像降噪单元的网络参数来训练图像增强单元，直至图像增强单元的网络参数收敛。In this embodiment, the image denoising unit in the first training link is trained with the first training data, so that supervised learning is performed on the first error function used for the denoising task in the image denoising unit until the first error function converges, yielding a trained image denoising unit. After the image denoising unit is trained, the first error function is fixed and the second training data is acquired, where the second training data includes a fifth image at a first resolution and a sixth image at a second resolution obtained from the fifth image, the first resolution being greater than the second. The image enhancement unit in the second training link is then trained with the second training data, so that supervised learning is performed on the second error function used for the enhancement task in the image enhancement unit until it converges, yielding a trained image enhancement unit. The image denoising unit and the image enhancement unit can thus be decoupled at the feature level: by fixing the parameters of the front-end backbone network and of the preceding task modules during training, each neural network can be trained separately. That is, the image denoising unit can be trained with the network parameters of the front-end backbone network module fixed, the data flow for training it not passing through the network layers of the image enhancement unit; after the image denoising unit is trained, the image enhancement unit can be trained with the network parameters of the front-end backbone network module and the image denoising unit fixed, until the network parameters of the image enhancement unit converge.

相比于完全端到端的AI-ISP技术,本公开实施例在特征层面解耦了不同的功能模块,每个模块具有独立的神经网络,可以在前向网络固定的基础上进行训练,从模块级别优化整体ISP性能。Compared with the fully end-to-end AI-ISP technology, the embodiment of the present disclosure decouples different functional modules at the feature level. Each module has an independent neural network and can be trained on the basis of a fixed forward network, thereby optimizing the overall ISP performance from the module level.
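上述分阶段冻结参数的训练方案可用如下极简示意代码说明，其中"参数"以单个数值代替，"训练"仅为一次示意性更新，用于展示每个阶段哪些模块可被更新。The staged freeze-then-train scheme above can be illustrated with the following minimal sketch, where a "parameter" is a single number and "training" is one illustrative update, purely to show which modules are trainable at each stage:

```python
# Toy illustration (not the actual training code of this disclosure) of
# staged training: at each stage only the listed module receives updates,
# while everything trained earlier stays fixed.

params = {"backbone": 1.0, "denoise": 0.0, "enhance": 0.0}

# Stage 1 trains the denoising unit with the backbone fixed; stage 2 then
# freezes the denoiser as well and trains only the enhancement unit.
stages = [("denoise stage", {"denoise"}),
          ("enhance stage", {"enhance"})]

history = []
for _, trainable in stages:
    for module in params:
        if module in trainable:
            params[module] += 0.5   # stand-in for one gradient update
    history.append(dict(params))
```

可以看到第二阶段中降噪单元的参数保持不变，对应上文"固定前端主干网络模块和图像降噪单元的网络参数"的描述。Note that the denoiser's parameter stays unchanged in stage 2, matching the "fix the backbone and denoising unit parameters" description above.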

在一种可能的实施方式中,所述子任务功能单元的种类还包括色调映射单元,所述将所述共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理,得到目标特征参数的步骤包括:In a possible implementation manner, the type of the subtask functional unit further includes a tone mapping unit, and the step of inputting the shared characteristic parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain the target characteristic parameters includes:

步骤D10,将所述共享特征参数输入至依次连接的图像降噪单元、图像增强单元和色调映射单元以进行多任务联合处理,得到目标特征参数。Step D10, inputting the shared feature parameters into the image denoising unit, the image enhancement unit and the tone mapping unit connected in sequence to perform multi-task joint processing to obtain target feature parameters.

本实施例通过将子任务功能单元的种类还设置包括色调映射单元,从而使得本公开实施例的图像信号处理方法在能够对第一图像进行图像降噪处理和图像增强处理的基础上,还能对第一图像进行色调映射处理,有效拓展了图像信号处理装置的图像处理任务功能。In this embodiment, the types of sub-task function units are further arranged to include a tone mapping unit, so that the image signal processing method of the disclosed embodiment can not only perform image noise reduction processing and image enhancement processing on the first image, but also perform tone mapping processing on the first image, thereby effectively expanding the image processing task function of the image signal processing device.
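步骤D10中色调映射单元只是对特征级单元链的顺序扩展，可用如下假设性草图示意（各单元内部运算均为简化示例）。In step D10 the tone mapping unit simply extends the sequential chain of feature-level units, as in the following hypothetical sketch (unit internals are toy stand-ins):

```python
# Hypothetical sketch of step D10: denoise -> enhance -> tone map applied
# in sequence on the shared feature parameters.

def denoise(feats):
    return [round(f, 1) for f in feats]        # toy smoothing

def enhance(feats):
    return [f * 1.2 for f in feats]            # toy detail boost

def tone_map(feats):
    return [min(f, 1.0) for f in feats]        # toy range compression

shared = [0.31, 0.97, 0.12]
target = shared
for unit in (denoise, enhance, tone_map):      # 依次连接的三个子任务功能单元
    target = unit(target)
```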

在一种可实施的方式中,在所述得到训练完成的图像增强单元的步骤之后,所述方法还包括:In an practicable manner, after the step of obtaining the trained image enhancement unit, the method further includes:

将所述第二误差函数进行固定,并获取第三训练数据,其中,所述第三训练数据包括第一对比度的第七图像和从所述第七图像获得第二对比度的第八图像,所述第一对比度大于所述第二对比度;Fixing the second error function, and acquiring third training data, wherein the third training data includes a seventh image with a first contrast and an eighth image with a second contrast obtained from the seventh image, and the first contrast is greater than the second contrast;

利用所述第三训练数据对色调映射单元进行训练,以对所述色调映射单元中进行映射任务处理的第三误差函数进行监督学习,直至所述第三误差函数收敛,得到训练完成的色调映射单元。The tone mapping unit is trained using the third training data to perform supervised learning on a third error function that performs mapping task processing in the tone mapping unit until the third error function converges, thereby obtaining a trained tone mapping unit.

本公开实施例通过在图像增强单元训练完成后，将第二误差函数进行固定，并获取第三训练数据，其中，该第三训练数据包括第一对比度的第七图像和从第七图像获得第二对比度的第八图像，该第一对比度大于第二对比度，然后利用第三训练数据对第三训练链路中的色调映射单元进行训练，以对色调映射单元中进行映射任务处理的第三误差函数进行监督学习，直至第三误差函数收敛，得到训练完成的色调映射单元，从而使得图像降噪单元、图像增强单元和色调映射单元之间在特征级别可以实现解耦。训练过程中通过固定前端主干网络和前向任务模块神经网络的参数，可以完成各自神经网络的单独训练，也即可通过固定前端主干网络模块的网络参数来训练图像降噪单元，其中，训练图像降噪单元的数据流不经过图像增强单元的网络层；然后在图像降噪单元训练完后，可通过固定前端主干网络模块和图像降噪单元的网络参数来训练图像增强单元，直至图像增强单元的网络参数收敛，其中，训练图像增强单元的数据流不经过色调映射单元的网络层；然后可通过固定前端主干网络模块、图像降噪单元和图像增强单元的网络参数来训练色调映射单元，直至色调映射单元的网络参数收敛。In the embodiment of the present disclosure, after the image enhancement unit is trained, the second error function is fixed and third training data is acquired, where the third training data includes a seventh image at a first contrast and an eighth image at a second contrast obtained from the seventh image, the first contrast being greater than the second. The tone mapping unit in the third training link is then trained with the third training data, so that supervised learning is performed on the third error function used for the mapping task in the tone mapping unit until the third error function converges, yielding a trained tone mapping unit. The image denoising unit, the image enhancement unit, and the tone mapping unit can thus be decoupled at the feature level: by fixing the parameters of the front-end backbone network and of the preceding task modules during training, each neural network can be trained separately. That is, the image denoising unit can be trained with the network parameters of the front-end backbone network module fixed, the data flow for training it not passing through the network layers of the image enhancement unit; after the image denoising unit is trained, the image enhancement unit can be trained with the network parameters of the front-end backbone network module and the image denoising unit fixed, until its network parameters converge, the data flow for training it not passing through the network layers of the tone mapping unit; the tone mapping unit can then be trained with the network parameters of the front-end backbone network module, the image denoising unit, and the image enhancement unit fixed, until the network parameters of the tone mapping unit converge.

也即,主流的AI-ISP技术将图像降噪、图像增强和色调映射等任务相互隔离,用各自独立的神经网络模型实现各个功能。每个神经网络模型都从像素级开始,对图像进行特征提取、任务处理和特征恢复,进而完成每一个模块端到端的功能实现。而本公开实施例中各个任务模块共用相同的主干网络进行特征提取和特征恢复,可以避免不同任务之间特征的重复计算和数据的迁移传输,有效提升网络整体运行效率。不同子任务类型之间在特征级别可以实现解耦,训练过程中通过固定前端主干网络和前向任务模块神经网络的参数,可以完成各自神经网络的单独训练。That is, the mainstream AI-ISP technology isolates tasks such as image denoising, image enhancement, and tone mapping from each other, and implements each function with independent neural network models. Each neural network model starts from the pixel level, performs feature extraction, task processing, and feature recovery on the image, and then completes the end-to-end functional implementation of each module. In the disclosed embodiment, each task module shares the same backbone network for feature extraction and feature recovery, which can avoid repeated calculation of features and data migration and transmission between different tasks, and effectively improve the overall operation efficiency of the network. Different subtask types can be decoupled at the feature level. By fixing the parameters of the front-end backbone network and the forward task module neural network during training, separate training of each neural network can be completed.

本实施例提供的图像信号处理方法与上述实施例提供的图像信号处理装置属于同一发明构思,未在本实施例中详尽描述的技术细节可参见上述图像信号处理装置的实施例,并且本实施例具备与图像信号处理装置各实施例相同的有益效果,此处不再赘述。The image signal processing method provided in this embodiment and the image signal processing device provided in the above embodiment belong to the same inventive concept. For technical details not fully described in this embodiment, reference can be made to the above-mentioned embodiments of the image signal processing device. This embodiment has the same beneficial effects as the various embodiments of the image signal processing device, which will not be repeated here.

此外，本公开实施例还提供一种图像信号处理设备，参照图6，图6为本公开实施例方案涉及的图像信号处理设备的硬件结构示意图。如图6所示，图像信号处理设备可以包括：处理器1001，例如中央处理器(Central Processing Unit，CPU)，通信总线1002、用户接口1003，网络接口1004，存储器1005。其中，通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display)、输入单元比如键盘(Keyboard)，可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如无线保真(WIreless-FIdelity，WI-FI)接口)。存储器1005可以是高速的随机存取存储器(Random Access Memory，RAM)，也可以是稳定的非易失性存储器(Non-Volatile Memory，NVM)，例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储设备。In addition, an embodiment of the present disclosure further provides an image signal processing device. Referring to FIG. 6, FIG. 6 is a schematic diagram of the hardware structure of the image signal processing device involved in the embodiments of the present disclosure. As shown in FIG. 6, the image signal processing device may include: a processor 1001, such as a central processing unit (CPU); a communication bus 1002; a user interface 1003; a network interface 1004; and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard), and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wireless-Fidelity (WI-FI) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. The memory 1005 may optionally also be a storage device independent of the aforementioned processor 1001.

本领域技术人员可以理解,图6中示出的结构并不构成对图像信号处理设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。如图6所示,作为一种计算机可读存储介质的存储器1005中可以包括操作系统、数据存储模块、网络通信模块、用户接口模块以及图像信号处理程序。Those skilled in the art will appreciate that the structure shown in FIG6 does not limit the image signal processing device, and may include more or fewer components than shown, or combine certain components, or arrange components differently. As shown in FIG6 , the memory 1005 as a computer-readable storage medium may include an operating system, a data storage module, a network communication module, a user interface module, and an image signal processing program.

在图6所示的图像信号处理设备中，网络接口1004主要用于与其他设备进行数据通信；用户接口1003主要用于与用户进行数据交互；本实施例中的处理器1001、存储器1005可以设置在通信设备中，通信设备通过处理器1001调用存储器1005中存储的图像信号处理程序，并执行上述任一实施例提供的图像信号处理方法。In the image signal processing device shown in FIG. 6, the network interface 1004 is mainly used for data communication with other devices; the user interface 1003 is mainly used for data interaction with the user; the processor 1001 and the memory 1005 in this embodiment may be provided in a communication device, and the communication device calls the image signal processing program stored in the memory 1005 through the processor 1001 to execute the image signal processing method provided in any of the above embodiments.

本实施例提出的终端与上述实施例提出的图像信号处理方法属于同一发明构思，未在本实施例中详尽描述的技术细节可参见上述任意实施例，并且本实施例具备与执行图像信号处理方法相同的有益效果。The terminal proposed in this embodiment and the image signal processing method proposed in the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, reference may be made to any of the above embodiments, and this embodiment has the same beneficial effects as the execution of the image signal processing method.

此外,本公开实施例还提出一种计算机存储介质,该计算机可读存储介质可以为非易失性计算机可读存储介质,该计算机可读存储介质上存储有图像信号处理程序,该图像信号处理程序被处理器执行时实现如上所述的本公开图像信号处理方法。In addition, an embodiment of the present disclosure further proposes a computer storage medium, which may be a non-volatile computer-readable storage medium, on which an image signal processing program is stored, and when the image signal processing program is executed by a processor, the image signal processing method of the present disclosure as described above is implemented.

本公开图像信号处理设备和计算机可读存储介质的各实施例,均可参照本公开图像信号处理方法各个实施例,此处不再赘述。The various embodiments of the image signal processing device and the computer-readable storage medium disclosed in the present invention may refer to the various embodiments of the image signal processing method disclosed in the present invention, and will not be described in detail here.

需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者系统不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者系统所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者系统中还存在另外的相同要素。It should be noted that, in this article, the terms "include", "comprises" or any other variations thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system including a series of elements includes not only those elements, but also other elements not explicitly listed, or also includes elements inherent to such process, method, article or system. In the absence of further restrictions, an element defined by the sentence "comprises a ..." does not exclude the existence of other identical elements in the process, method, article or system including the element.

上述本公开实施例序号仅仅为了描述,不代表实施例的优劣。The serial numbers of the above-mentioned embodiments of the present disclosure are only for description and do not represent the advantages or disadvantages of the embodiments.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本公开的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在如上所述的一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台图像信号处理设备(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本公开各个实施例所述的方法。Through the above description of the implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure in essence, or the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for enabling an image signal processing device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

以上仅为本公开的优选实施例,并非因此限制本公开的专利范围,凡是利用本公开说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本公开的专利保护范围内。 The above are only preferred embodiments of the present disclosure, and are not intended to limit the patent scope of the present disclosure. Any equivalent structure or equivalent process transformation made using the contents of the present disclosure and the drawings, or directly or indirectly applied in other related technical fields, are also included in the patent protection scope of the present disclosure.

Claims (13)

1. 一种图像信号处理装置，包括：An image signal processing device, comprising:
前端主干网络模块，设置为获取第一图像，对所述第一图像进行特征提取，得到共享特征参数；a front-end backbone network module, configured to obtain a first image and perform feature extraction on the first image to obtain shared feature parameters;
核心网络模块，设置为将所述共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理，得到目标特征参数，其中，所述共享特征参数为各个子任务功能单元之间具有任务相关性的特征参数；a core network module, configured to input the shared feature parameters into a plurality of preset sub-task functional units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters having task correlation among the sub-task functional units; and
后端主干网络模块，设置为根据所述目标特征参数进行图像构建，构建得到第二图像进行输出，其中，所述第二图像的质量高于所述第一图像。a back-end backbone network module, configured to construct an image according to the target feature parameters to obtain a second image for output, wherein the quality of the second image is higher than that of the first image.
2. 如权利要求1所述的图像信号处理装置，其中，所述前端主干网络模块包括预设数量个不同尺度的特征提取网络层，所述特征提取网络层设置为获取第一图像，对所述第一图像进行多通道的特征提取，得到多尺度的共享特征参数；所述核心网络模块包括与各特征提取网络层一一对应连接的特征处理网络层，每个特征处理网络层包括多个子任务功能单元，所述特征处理网络层设置为将多尺度的共享特征参数输入至各个子任务功能单元进行多通道的多任务联合处理，得到多尺度的目标特征参数；所述后端主干网络模块包括与各特征处理网络层一一对应连接的图像恢复网络层，所述图像恢复网络层设置为将多尺度的目标特征参数进行特征融合，得到融合特征参数，并根据所述融合特征参数进行图像构建，构建得到第二图像进行输出。The image signal processing device according to claim 1, wherein the front-end backbone network module comprises a preset number of feature extraction network layers of different scales, the feature extraction network layers being configured to obtain a first image and perform multi-channel feature extraction on the first image to obtain multi-scale shared feature parameters; the core network module comprises feature processing network layers connected to the feature extraction network layers in one-to-one correspondence, each feature processing network layer comprising a plurality of sub-task functional units and being configured to input the multi-scale shared feature parameters into each sub-task functional unit for multi-channel multi-task joint processing to obtain multi-scale target feature parameters; and the back-end backbone network module comprises image restoration network layers connected to the feature processing network layers in one-to-one correspondence, the image restoration network layers being configured to perform feature fusion on the multi-scale target feature parameters to obtain fused feature parameters and to construct an image according to the fused feature parameters to obtain a second image for output.

3. 如权利要求1所述的图像信号处理装置，其中，所述子任务功能单元的种类包括图像降噪单元和图像增强单元，其中，每个特征处理网络层包括依次连接的所述图像降噪单元和所述图像增强单元。The image signal processing device according to claim 1, wherein the types of the sub-task functional units include an image denoising unit and an image enhancement unit, and each feature processing network layer includes the image denoising unit and the image enhancement unit connected in sequence.
4. 如权利要求3所述的图像信号处理装置，其中，所述图像信号处理装置还包括训练模块，所述训练模块设置为：获取第一训练数据，其中，所述第一训练数据包括第一噪声的第三图像和从所述第三图像获得第二噪声的第四图像，所述第二噪声大于所述第一噪声；利用所述第一训练数据对第一训练链路中的图像降噪单元进行训练，以对所述图像降噪单元中进行降噪任务处理的第一误差函数进行监督学习，直至所述第一误差函数收敛，得到训练完成的图像降噪单元，其中，所述第一训练链路为依次连接的前端主干网络模块、图像降噪单元和后端主干网络模块；在图像降噪单元训练完成后，将所述第一误差函数进行固定，并获取第二训练数据，其中，所述第二训练数据包括第一分辨率的第五图像和从所述第五图像获得第二分辨率的第六图像，所述第一分辨率大于所述第二分辨率；利用所述第二训练数据对第二训练链路中的图像增强单元进行训练，以对所述图像增强单元中进行增强任务处理的第二误差函数进行监督学习，直至所述第二误差函数收敛，得到训练完成的图像增强单元，其中，所述第二训练链路为依次连接的前端主干网络模块、图像降噪单元、图像增强单元和后端主干网络模块。The image signal processing device according to claim 3, further comprising a training module configured to: acquire first training data, wherein the first training data includes a third image with first noise and a fourth image with second noise obtained from the third image, the second noise being greater than the first noise; train the image denoising unit in a first training link using the first training data, so as to perform supervised learning on a first error function for denoising task processing in the image denoising unit until the first error function converges, obtaining a trained image denoising unit, wherein the first training link is the front-end backbone network module, the image denoising unit, and the back-end backbone network module connected in sequence; after the image denoising unit is trained, fix the first error function and acquire second training data, wherein the second training data includes a fifth image at a first resolution and a sixth image at a second resolution obtained from the fifth image, the first resolution being greater than the second resolution; and train the image enhancement unit in a second training link using the second training data, so as to perform supervised learning on a second error function for enhancement task processing in the image enhancement unit until the second error function converges, obtaining a trained image enhancement unit, wherein the second training link is the front-end backbone network module, the image denoising unit, the image enhancement unit, and the back-end backbone network module connected in sequence.

5. 如权利要求4所述的图像信号处理装置，其中，所述子任务功能单元的种类还包括色调映射单元，其中，每个特征处理网络层包括依次连接的所述图像降噪单元、所述图像增强单元和所述色调映射单元。The image signal processing device according to claim 4, wherein the types of the sub-task functional units further include a tone mapping unit, and each feature processing network layer includes the image denoising unit, the image enhancement unit, and the tone mapping unit connected in sequence.

6. 一种图像信号处理方法，包括：获取第一图像，对所述第一图像进行特征提取，得到共享特征参数；将所述共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理，得到目标特征参数，其中，所述共享特征参数为各个子任务功能单元之间具有任务相关性的特征参数；根据所述目标特征参数进行图像构建，构建得到第二图像进行输出，其中，所述第二图像的质量高于所述第一图像。An image signal processing method, comprising: acquiring a first image and performing feature extraction on the first image to obtain shared feature parameters; inputting the shared feature parameters into a plurality of preset sub-task functional units for multi-task joint processing to obtain target feature parameters, wherein the shared feature parameters are feature parameters having task correlation among the sub-task functional units; and constructing an image according to the target feature parameters to obtain a second image for output, wherein the quality of the second image is higher than that of the first image.
7. 如权利要求6所述的图像信号处理方法，其中，所述方法还包括：获取第一图像，对所述第一图像进行多通道的特征提取，得到多尺度的共享特征参数；将多尺度的共享特征参数输入至各个子任务功能单元进行多通道的多任务联合处理，得到多尺度的目标特征参数；将多尺度的目标特征参数进行特征融合，得到融合特征参数，并根据所述融合特征参数进行图像构建，构建得到第二图像进行输出。The image signal processing method according to claim 6, further comprising: acquiring a first image and performing multi-channel feature extraction on the first image to obtain multi-scale shared feature parameters; inputting the multi-scale shared feature parameters into each sub-task functional unit for multi-channel multi-task joint processing to obtain multi-scale target feature parameters; and performing feature fusion on the multi-scale target feature parameters to obtain fused feature parameters and constructing an image according to the fused feature parameters to obtain a second image for output.

8. 如权利要求6所述的图像信号处理方法，其中，所述子任务功能单元的种类包括图像降噪单元和图像增强单元，所述将所述共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理，得到目标特征参数的步骤包括：将所述共享特征参数输入至依次连接的图像降噪单元和图像增强单元以进行多任务联合处理，得到目标特征参数。The image signal processing method according to claim 6, wherein the types of the sub-task functional units include an image denoising unit and an image enhancement unit, and the step of inputting the shared feature parameters into a plurality of preset sub-task functional units for multi-task joint processing to obtain target feature parameters comprises: inputting the shared feature parameters into the image denoising unit and the image enhancement unit connected in sequence for multi-task joint processing to obtain target feature parameters.
9. 如权利要求8所述的图像信号处理方法，其中，所述方法还包括：获取第一训练数据，其中，所述第一训练数据包括第一噪声的第三图像和从所述第三图像获得第二噪声的第四图像，所述第二噪声大于所述第一噪声；利用所述第一训练数据对图像降噪单元进行训练，以对所述图像降噪单元中进行降噪任务处理的第一误差函数进行监督学习，直至所述第一误差函数收敛，得到训练完成的图像降噪单元；在图像降噪单元训练完成后，将所述第一误差函数进行固定，并获取第二训练数据，其中，所述第二训练数据包括第一分辨率的第五图像和从所述第五图像获得第二分辨率的第六图像，所述第一分辨率大于所述第二分辨率；利用所述第二训练数据对图像增强单元进行训练，以对所述图像增强单元中进行增强任务处理的第二误差函数进行监督学习，直至所述第二误差函数收敛，得到训练完成的图像增强单元。The image signal processing method according to claim 8, further comprising: acquiring first training data, wherein the first training data includes a third image with first noise and a fourth image with second noise obtained from the third image, the second noise being greater than the first noise; training the image denoising unit using the first training data, so as to perform supervised learning on a first error function for denoising task processing in the image denoising unit until the first error function converges, obtaining a trained image denoising unit; after the image denoising unit is trained, fixing the first error function and acquiring second training data, wherein the second training data includes a fifth image at a first resolution and a sixth image at a second resolution obtained from the fifth image, the first resolution being greater than the second resolution; and training the image enhancement unit using the second training data, so as to perform supervised learning on a second error function for enhancement task processing in the image enhancement unit until the second error function converges, obtaining a trained image enhancement unit.
如权利要求9所述的图像信号处理方法,其中,所述子任务功能单元的种类还包括色调映射单元,所述将所述共享特征参数输入至多个预设的子任务功能单元进行多任务联合处理,得到目标特征参数的步骤包括:The image signal processing method according to claim 9, wherein the type of the subtask functional unit further includes a tone mapping unit, and the step of inputting the shared characteristic parameters into a plurality of preset subtask functional units for multi-task joint processing to obtain the target characteristic parameters comprises: 将所述共享特征参数输入至依次连接的图像降噪单元、图像增强单元和色调映射单元以进行多任务联合处理,得到目标特征参数。The shared feature parameters are input into an image denoising unit, an image enhancement unit and a tone mapping unit connected in sequence to perform multi-task joint processing to obtain target feature parameters. 如权利要求10所述的图像信号处理方法,其中,在所述得到训练完成的图像增强单元的步骤之后,所述方法还包括:The image signal processing method according to claim 10, wherein after the step of obtaining the trained image enhancement unit, the method further comprises: 将所述第二误差函数进行固定,并获取第三训练数据,其中,所述第三训练数据包括第一对比度的第七图像和从所述第七图像获得第二对比度的第八图像,所述第一对比度大于所述第二对比度;Fixing the second error function, and acquiring third training data, wherein the third training data includes a seventh image with a first contrast and an eighth image with a second contrast obtained from the seventh image, and the first contrast is greater than the second contrast; 利用所述第三训练数据对色调映射单元进行训练,以对所述色调映射单元中进行映射任务处理的第三误差函数进行监督学习,直至所述第三误差函 数收敛,得到训练完成的色调映射单元。The tone mapping unit is trained using the third training data to perform supervised learning on a third error function that performs mapping task processing in the tone mapping unit until the third error function The training number converges and the trained tone mapping unit is obtained. 
An image signal processing device, comprising: a memory, a processor, and an image signal processing program stored in the memory and executable on the processor, wherein the image signal processing program, when executed by the processor, implements the image signal processing method according to any one of claims 6 to 11.

A computer-readable storage medium, wherein an image signal processing program is stored on the computer-readable storage medium, and the image signal processing program, when executed by a processor, implements the image signal processing method according to any one of claims 6 to 11.
PCT/CN2024/081031 2023-04-12 2024-03-11 Image signal processing method and apparatus, device, and computer-readable storage medium Pending WO2024212750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310387238.3 2023-04-12
CN202310387238.3A CN118865048A (en) 2023-04-12 2023-04-12 Image signal processing method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2024212750A1 true WO2024212750A1 (en) 2024-10-17

Family

ID=93058727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2024/081031 Pending WO2024212750A1 (en) 2023-04-12 2024-03-11 Image signal processing method and apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN118865048A (en)
WO (1) WO2024212750A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12348743B2 (en) * 2023-06-21 2025-07-01 City University Of Hong Kong Method and system for learned video compression

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018055377A (en) * 2016-09-28 2018-04-05 日本電信電話株式会社 Multitask processing device, multitask model learning device, and program
CN111325108A (en) * 2020-01-22 2020-06-23 中能国际建筑投资集团有限公司 Multitask network model, using method, device and storage medium
CN111340146A (en) * 2020-05-20 2020-06-26 杭州微帧信息科技有限公司 Method for accelerating video recovery task through shared feature extraction network
CN111813532A (en) * 2020-09-04 2020-10-23 腾讯科技(深圳)有限公司 Image management method and device based on multitask machine learning model
WO2021098831A1 (en) * 2019-11-22 2021-05-27 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded device
CN113298740A (en) * 2021-05-27 2021-08-24 中国科学院深圳先进技术研究院 Image enhancement method and device, terminal equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119729244A (en) * 2024-11-27 2025-03-28 成都维海德科技有限公司 Video data processing method, device, equipment and computer readable storage medium
CN119729244B (en) * 2024-11-27 2025-09-30 成都维海德科技有限公司 Video data processing method, device, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN118865048A (en) 2024-10-29

Similar Documents

Publication Publication Date Title
CN111669514B (en) High dynamic range imaging method and apparatus
KR20200132682A (en) Image optimization method, apparatus, device and storage medium
CN110675336A (en) Low-illumination image enhancement method and device
WO2020231016A1 (en) Image optimization method, apparatus, device and storage medium
WO2024212750A1 (en) Image signal processing method and apparatus, device, and computer-readable storage medium
CN110148088B (en) Image processing method, image rain removal method, device, terminal and medium
US12205249B2 (en) Intelligent portrait photography enhancement system
CN114298942B (en) Image deblurring method and device, computer readable medium and electronic device
CN110958469A (en) Video processing method and device, electronic equipment and storage medium
CN113724151B (en) Image enhancement method, electronic equipment and computer readable storage medium
JPWO2020166596A1 (en) Image processing system and program
WO2025026175A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN111968039B (en) Day and night general image processing method, device and equipment based on silicon sensor camera
US20250117882A1 (en) Generation of high-resolution images
CN118887152A (en) A low-light image quality enhancement method and device based on Retinex theory
CN116563556B (en) Model training method
CN115937044A (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN113706390A (en) Image conversion model training method, image conversion method, device and medium
CN115393177A (en) Facial image processing method and electronic device
JP2024506828A (en) Image brightness adjustment method, device, electronic equipment and medium
KR102153786B1 (en) Image processing method and apparatus using selection unit
CN116740360B (en) Image processing method, device, equipment and storage medium
CN113744164B (en) Method, system and related equipment for enhancing low-illumination image at night quickly
CN114612293A (en) Image super-resolution processing method, device, equipment and storage medium
CN120374472A (en) Video processing method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 24787856

Country of ref document: EP

Kind code of ref document: A1