WO2018126484A1

WO2018126484A1 - Reconfigurable parallel image detail enhancing method and apparatus

Info

Publication number: WO2018126484A1
Application number: PCT/CN2017/070670
Authority: WO
Inventors: 刘壮; 郭若杉; 谭吉来; 李瑞玲; 韩睿; 李晨
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2017-01-09
Filing date: 2017-01-09
Publication date: 2018-07-12
Anticipated expiration: 2019-07-09

Abstract

The present invention relates to a reconfigurable parallel image detail enhancement method, comprising: parameter preloading, data buffering, horizontal and vertical filtering, coring filtering, overshoot suppression, amplitude suppression, and cached data updating. The present invention further relates to a reconfigurable parallel image detail enhancement apparatus, comprising a local memory, a memory access control unit, a general-purpose buffer, a parallel arithmetic logic unit (ALU), a state machine, and a parallel multiply-accumulator (MAC). The invention enhances the image detail signal so that textured areas become clearer. It also improves data utilization efficiency, reduces data exchange between the arithmetic components and peripheral memory, lowers memory access bandwidth pressure, and allows hardware resources to be reused.

Description

Reconfigurable parallel image detail enhancement method and device

Technical field

The present invention relates to the field of video image processing, and in particular to a reconfigurable parallel image detail enhancement method and apparatus.

Background art

At present, one of the mainstream directions of video technology is ultra-high-definition (4K resolution) display. Compared with high-definition (1920 × 1080) video, 4K video increases the pixel count from about 2 million to about 8 million (3840 × 2160 ≈ 8.3 million), which places higher demands on both the image quality and the throughput of image enhancement algorithms.

Traditional video image detail enhancement solutions were mainly designed for HD and lower standards, so they are likely to lack sufficient processing power for 4K images. At the same time, because 4K ultra-high-definition images render finer picture detail, negative effects of existing detail enhancement algorithms, such as overshoot, may be more noticeable to viewers when those algorithms are applied to 4K images.

In addition, because conventional solutions are usually implemented as application-specific integrated circuit (ASIC) chips with fixed (hard-wired) algorithms, upgrading the algorithm is very costly.

Therefore, a new video image detail enhancement solution is needed that: (1) improves image sharpness; (2) effectively suppresses negative effects of detail enhancement such as overshoot and noise amplification; (3) meets the processing requirements of real-time ultra-high-definition video streams; and (4) allows the algorithm to be upgraded while keeping cost under control.

Summary of the invention

To solve the above problems in the prior art, one aspect of the present invention provides a reconfigurable parallel image detail enhancement method, comprising the following steps:

Step 1: load the image data to be processed into a buffer. The image data to be processed is an R × Q pixel array, where R or Q equals the degree of parallelism N; the pixel array can be split into multiple one-dimensional arrays of N pixels each;

Step 2: for each pixel to be enhanced in the one-dimensional array, perform horizontal and vertical filtering in parallel to obtain detail signals in the two directions;

Step 3: apply coring filtering to the detail signals in both directions to remove the small detail signals introduced by image noise;

Step 4: perform overshoot suppression by controlling the strength of the enhanced detail signal according to the gray-level symmetry on the two sides of the neighborhood of the pixel to be enhanced and the strength of that pixel's detail signal; then add the two overshoot-suppressed detail signals to obtain the detail signals of these N pixels;

Step 5: further apply amplitude suppression to the detail signals obtained in step 4;

Step 6: perform steps 2 to 5 on each one-dimensional array of the image data to be processed in turn, completing the detail enhancement of that image data.

Preferably, the buffer comprises NM buffer units, each holding N pixels, and is provided with 4 read ports and 4 write ports.

Preferably, the horizontal and vertical filtering use a one-dimensional horizontal filter of order NH and a one-dimensional vertical filter of order NV, respectively. The filters use the gray levels of the (NH-1)/2 pixels on each of the left and right sides of the pixel and of the (NV-1)/2 pixels on each of its top and bottom sides, together with the gray level of the pixel itself, to obtain the pixel's detail signals in the two directions.

Preferably, the buffer is a multi-granularity discrete memory structure.

Preferably, the horizontal and vertical filtering is performed by spatially convolving the filter templates with the image data, and the filtering results are expressed as:

where (i, j) denotes the pixel in row i, column j of the image data; DEH(i, j) is the horizontal filtering result at (i, j); DEV(i, j) is the vertical filtering result at (i, j); P(i, j) is the gray level of the pixel in row i, column j of the image; FH(k) is the k-th element of the horizontal template; and FV(t) is the t-th element of the vertical template.

Preferably, the overshoot suppression in step 4 processes the horizontal detail signal and the vertical detail signal separately and then adds the two overshoot-suppressed detail signals to obtain the final detail signal, as follows:

Step 41: compute the absolute differences between the gray level of the pixel to be processed and the gray levels of the (NH-1)/2 pixels on each of its left and right sides and the (NV-1)/2 pixels on each of its top and bottom sides, giving four groups of gray-level absolute differences (left, right, top, bottom);

Step 42: compute the mean of each of the four groups of absolute differences, Mean_L, Mean_R, Mean_T and Mean_B, i.e. the mean gray-level differences to the left, right, top and bottom of the pixel;

Step 43: calculate a first overshoot suppression factor alpha and a second overshoot suppression factor beta, where

alpha = ka × Y_abs_mean

Step 44: calculate the overshoot suppression factor s = 1 - alpha × beta, perform overshoot suppression, and obtain the overshoot-suppressed detail signal de_ss = de × s.
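Steps 41 to 44 can be sketched for one pixel in one direction as below. This is a non-authoritative sketch: the text does not define Y_abs_mean here, so it is assumed to be the absolute difference of the two side means (|Mean_L - Mean_R| horizontally, |Mean_T - Mean_B| vertically), following the Fig. 9 flow ("the absolute difference of the absolute difference means"). The beta formula (4) and the Fig. 7/8 curves are not reproduced, so beta = kb × |de| and clipping of both factors to [0, 1] are assumptions, as are the helper names.

```python
def mean_abs_diff(center, neighbors):
    """Steps 41-42: mean of |center - p| over one side of the neighborhood."""
    return sum(abs(center - p) for p in neighbors) / len(neighbors)


def clip01(x):
    """Limit a factor to [0, 1] (assumed saturation of the Fig. 7/8 curves)."""
    return max(0.0, min(1.0, x))


def overshoot_suppress(center, side_a, side_b, de, ka, kb):
    """Steps 41-44 for one pixel in one direction.

    side_a, side_b: the (NH-1)/2 left and right neighbors (horizontal pass)
    or the (NV-1)/2 top and bottom neighbors (vertical pass).
    ASSUMPTIONS (not given in the text): Y_abs_mean = |mean_a - mean_b|,
    alpha = ka * Y_abs_mean and beta = kb * |de|, both clipped to [0, 1].
    """
    mean_a = mean_abs_diff(center, side_a)   # e.g. Mean_L
    mean_b = mean_abs_diff(center, side_b)   # e.g. Mean_R
    y_abs_mean = abs(mean_a - mean_b)        # gray-level asymmetry of the neighborhood
    alpha = clip01(ka * y_abs_mean)          # first factor (step 43)
    beta = clip01(kb * abs(de))              # second factor (step 43, assumed form)
    s = 1.0 - alpha * beta                   # suppression factor (step 44)
    return de * s                            # de_ss
```

A symmetric neighborhood gives alpha = 0 and leaves de unchanged; a strongly one-sided edge combined with a large detail signal drives s toward 0. Per the detailed description, the horizontal and vertical results are then added: de_ss = de_ss_h + de_ss_v.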

Preferably, the detail signal strength is de = de_h + de_v, where de_h is the detail signal strength in the horizontal direction and de_v is the detail signal strength in the vertical direction.

Preferably, the amplitude suppression in step 5 is performed as follows:

Step 51: multiply de_ss by the detail enhancement coefficient gain to obtain the enhanced detail signal de_gain;

Step 52: perform amplitude suppression according to the following formula to obtain the final detail signal de_final:

where Th is a preset threshold and Max_de is a preset maximum amplitude.

Preferably, the output value after the amplitude suppression in step 5 is Yout = Yin + de_final, where Yout and Yin are the output and input pixel gray levels, respectively.

Preferably, a parameter preloading step precedes step 1, in which the preset fixed parameters used in the horizontal and vertical filtering, coring filtering, overshoot suppression and amplitude suppression are loaded into the general-purpose buffer.

Preferably, the image data to be processed in step 1 is obtained by sequentially splitting the image data into R × Q pixel arrays, and is loaded into the buffer in step 1 as follows:

In the order in which the image data was split, select each piece of image data to be processed in turn and process it through steps 2 to 6, until all of the image data to be processed has been processed.

Another aspect of the present invention provides a reconfigurable parallel image detail enhancement apparatus, characterized by comprising a local memory, a memory access control unit, a general-purpose buffer, a parallel arithmetic logic unit (ALU), a state machine and a parallel multiply-accumulator (MAC);

the local memory stores the input and output image data and the parameters required by the parallel video image contrast enhancement algorithm, and supports parallel access;

the memory access control unit exchanges data between the local memory and the general-purpose buffer;

the general-purpose buffer buffers all data and intermediate results needed for one complete processing pass, and can be indexed directly by address;

the parallel arithmetic logic unit performs the non-multiplication arithmetic and logic operations of the parallel video image contrast enhancement algorithm, with a degree of parallelism of N;

the state machine generates the control signals for all functional components;

the parallel multiply-accumulator performs multiplication-related operations, with a degree of parallelism of N;

The state machine is connected via communication lines to the parallel arithmetic logic unit, the memory access control unit, the general-purpose buffer and the parallel multiply-accumulator; the local memory is connected to the memory access control unit via a communication line; the general-purpose buffer is connected via communication lines to the memory access control unit, the parallel arithmetic logic unit and the parallel multiply-accumulator; and the parallel arithmetic logic unit is connected to the parallel multiply-accumulator via a communication line.

The invention has the following beneficial effects:

1. It enhances the image detail signal, making textured areas clearer;

2. It enhances detail while effectively reducing noise and overshoot;

3. The image processing algorithm is easy to optimize and upgrade later;

4. It improves data utilization efficiency, reduces data exchange between the arithmetic components and peripheral memory, and lowers memory access bandwidth pressure;

5. Controlling the functional components through the general-purpose buffer and the state machine allows hardware resources to be reused.

DRAWINGS

Fig. 1 is a schematic structural diagram of the reconfigurable parallel image detail enhancement apparatus of the present invention;

Fig. 2 is a flow chart of the parallel image detail enhancement method provided by the present invention;

Fig. 3 is a schematic diagram of the buffer area of the general-purpose buffer according to an embodiment of the present invention;

Fig. 4 is an example of horizontal 7th-order and vertical 5th-order filtering;

Fig. 5 is an example of coring-filter noise reduction according to an embodiment of the present invention;

Figs. 6(a) to 6(d) are examples of scenes in which overshoot is likely to occur;

Fig. 7 is an example of the curve for calculating the overshoot suppression factor alpha according to an embodiment of the present invention;

Fig. 8 is an example of the curve for calculating the overshoot suppression factor beta according to an embodiment of the present invention;

Fig. 9 is an example of the overshoot suppression flow according to an embodiment of the present invention;

Fig. 10 is an example of interpolation along an edge according to an embodiment of the present invention.

Detailed description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments only explain the technical principles of the invention and are not intended to limit its scope of protection.

As shown in Fig. 1, a reconfigurable parallel image detail enhancement apparatus of the present invention comprises a local memory, a memory access control unit, a general-purpose buffer, a parallel arithmetic logic unit (ALU), a state machine and a parallel multiply-accumulator (MAC);

the memory access control unit exchanges data between the local memory and the general-purpose buffer; this embodiment uses three functionally identical memory access control units to relieve the memory access bottleneck;

the state machine generates the control signals for all functional components;

When the enhancement algorithm needs to change, only the state machine has to be reprogrammed to generate new control signals and the algorithm parameters in the local memory updated; the algorithm can then be iterated quickly without redesigning or remanufacturing the hardware circuit.
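The reconfigurability argument above can be illustrated with a toy micro-sequencer: the N-wide datapaths stay fixed, and the "program" is a list of control words that select a unit, an operation and general-buffer slots. The text does not describe the state machine's encoding, so the control-word layout, the operation names and `run_program` are all hypothetical; this is a sketch of the idea, not the patented design.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
VecOp = Callable[[Vector, Vector], Vector]

# Hypothetical operation tables for the two fixed N-wide datapaths.
UNITS: Dict[str, Dict[str, VecOp]] = {
    "ALU": {
        "sub": lambda a, b: [x - y for x, y in zip(a, b)],
        "absdiff": lambda a, b: [abs(x - y) for x, y in zip(a, b)],
        "max": lambda a, b: [max(x, y) for x, y in zip(a, b)],
    },
    "MAC": {
        "mul": lambda a, b: [x * y for x, y in zip(a, b)],
    },
}

# One control word: (unit, operation, source slot A, source slot B, destination slot).
ControlWord = Tuple[str, str, int, int, int]


def run_program(program: List[ControlWord], buffer: Dict[int, Vector]) -> Dict[int, Vector]:
    """Issue each control word in order, reading and writing general-buffer slots.

    The datapath (UNITS) never changes; a new algorithm is a new `program`
    plus new parameter vectors placed in `buffer`.
    """
    for unit, op, src_a, src_b, dst in program:
        buffer[dst] = UNITS[unit][op](buffer[src_a], buffer[src_b])
    return buffer
```

For example, a two-word program that takes an absolute difference on the ALU and scales it on the MAC turns slots `[1, 5, 3]`, `[4, 2, 3]` and a gain vector `[2, 2, 2]` into `[6, 6, 0]`; swapping in a different program changes the algorithm without touching `UNITS`.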

The invention also provides a reconfigurable parallel image detail enhancement method which, as shown in Fig. 2, comprises the following steps:

Step 1, data buffering: load the image data to be processed into a buffer. The image data to be processed is an R × Q pixel array, where R or Q equals the degree of parallelism N; the pixel array can be split into multiple one-dimensional arrays of N pixels each;

Step 2, filtering: for each pixel to be enhanced in the one-dimensional array, perform horizontal and vertical filtering in parallel to obtain detail signals in the two directions;

Step 3, noise reduction: apply coring filtering to the detail signals in both directions to remove the small detail signals introduced by image noise;

Step 4, overshoot suppression: control the strength of the enhanced detail signal according to the gray-level symmetry on the two sides of the neighborhood of the pixel to be enhanced and the strength of that pixel's detail signal, thereby suppressing overshoot; then add the two overshoot-suppressed detail signals to obtain the detail signals of these N pixels;

Step 5, amplitude suppression: further apply amplitude suppression to the detail signals obtained in step 4;

Step 6, cached data update: by updating the data in the buffer, perform steps 2 to 5 on each one-dimensional array of the image data to be processed in turn, completing the detail enhancement of that image data.

In this embodiment, a parameter preloading step precedes step 1, in which the preset fixed parameters used in the horizontal and vertical filtering, coring filtering, overshoot suppression and amplitude suppression are loaded into the general-purpose buffer.

1. Parameter preloading

This step belongs to the initialization phase of the apparatus: fixed parameters such as the horizontal and vertical filter coefficients and the thresholds used in coring filtering, overshoot suppression and amplitude suppression are preloaded into the general-purpose buffer.

Fig. 3 shows the general-purpose buffer of an embodiment of the present invention. As shown in Fig. 3, the general-purpose buffer (denoted by the capital letter M) contains NM buffer units of N pixels each, and has four read ports (r0, r1, r2, r3) and four write ports (w0, w1, w2, w3) to support high-speed reads and writes. M allows each of its NM buffer units to be read and written directly by index, which makes data reuse easy. The general-purpose buffer operates synchronously with the arithmetic units, so the high-speed arithmetic units do not have to wait for a low-speed storage unit.
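The general-purpose buffer M described above can be modeled in a few lines: NM directly indexed slots, each holding an N-pixel vector, with at most four reads and four writes per cycle. This is a behavioral sketch only; the class name is hypothetical, and the rule that reads in a cycle see the contents from before that cycle's writes is an assumption the text does not state.

```python
class GeneralBuffer:
    """Toy model of general buffer M: NM slots of N pixels, indexed directly by slot number."""

    READ_PORTS = 4    # r0..r3
    WRITE_PORTS = 4   # w0..w3

    def __init__(self, nm, n):
        self.n = n
        self.slots = [[0] * n for _ in range(nm)]

    def cycle(self, reads=(), writes=()):
        """One clock cycle.

        reads:  slot indices to read (at most 4).
        writes: (slot, vector) pairs to write (at most 4).
        Reads return the contents from before this cycle's writes (assumption).
        """
        if len(reads) > self.READ_PORTS or len(writes) > self.WRITE_PORTS:
            raise ValueError("port limit exceeded")
        out = [list(self.slots[i]) for i in reads]
        for slot, vec in writes:
            if len(vec) != self.n:
                raise ValueError("vector width must equal N")
            self.slots[slot] = list(vec)
        return out
```

Direct indexing is what makes reuse cheap: a filter template loaded once into a slot can be read by every subsequent cycle without another trip to local memory.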

2. Data buffering

The image data is split sequentially into R × Q pixel arrays, giving multiple pieces of image data to be processed. In the splitting order, each piece is selected in turn, loaded into the buffer, and processed through steps 2 to 6 until all of the image data has been processed. Each piece of image data to be processed is an R × Q pixel array, where R or Q equals the degree of parallelism N, and can be split into multiple one-dimensional arrays of N pixels each.
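The splitting described above can be sketched as follows, assuming R = N (so each column of a block is one N-pixel one-dimensional array) and an image whose height and width are exact multiples of N and Q. Border handling and the halo pixels the filters need from neighboring blocks are not described in the text and are ignored here; the function names are hypothetical.

```python
def split_blocks(image, n, q):
    """Yield N x Q blocks of `image` (a list of rows) in raster order.

    Assumes R = N and that the image height and width are multiples of
    N and Q; border handling is not described in the text.
    """
    h, w = len(image), len(image[0])
    if h % n or w % q:
        raise ValueError("sketch assumes the image tiles exactly")
    for r0 in range(0, h, n):
        for c0 in range(0, w, q):
            yield [row[c0:c0 + q] for row in image[r0:r0 + n]]


def column_vectors(block):
    """Split an N x Q block into Q one-dimensional arrays of N pixels (its columns)."""
    return [list(col) for col in zip(*block)]
```

Each column vector is what the N parallel lanes consume in one step; stepping through the columns of a block corresponds to the window sliding described in the cached data update.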

The invention is a processing method with a degree of parallelism of N, i.e. it is equivalent to N filters working simultaneously, so NH columns of N pixels (or NV rows of N pixels) must be cached in the general-purpose buffer before filtering. The parallel processing apparatus and method can also be viewed as an apparatus and method operating on N-dimensional vector data, so the rest of this document describes the invention in terms of operations on N-dimensional vectors.

The algorithm requires parallel processing of a column of pixels, i.e. column-wise memory access, which conventional memories do not support efficiently. The apparatus therefore uses a multi-granularity discrete memory structure, which can be designed with reference to patent No. 201110460585.1, entitled "Multi-granularity parallel storage system and memory".

3. Filtering

The image detail enhancement method first obtains the horizontal and vertical detail signals by filtering, using a one-dimensional horizontal filter of order NH and a one-dimensional vertical filter of order NV. In general, the higher the filter order, the stronger the filter's ability to extract detail, but the more pronounced its negative effects such as overshoot. Balancing these two factors and signal symmetry, a horizontal filter of order 5 or 7 combined with a vertical filter of order 3 or 5 is usually used to obtain the best result. Fig. 4 illustrates horizontal 7th-order and vertical 5th-order filtering of a single pixel. Obtaining the detail signal of one pixel requires the gray level of that pixel and the gray levels of the (NH-1)/2 pixels on each of its left and right sides and of the (NV-1)/2 pixels on each of its top and bottom sides.

The invention extracts the detail signal with two groups of one-dimensional filters, horizontal and vertical, by spatially convolving each filter template with the image, as follows:

Let the horizontal filter template be FH and the vertical filter template be FV, with FH(k) the k-th element of the horizontal template, FV(t) the t-th element of the vertical template, and P(i, j) the gray level of the pixel in row i, column j of the image. The horizontal filtering result DEH(i, j) and the vertical filtering result DEV(i, j) at pixel (i, j) can then be expressed as formulas (1) and (2):

where FH(0) and FV(0) are the center elements of the filter templates.
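Formulas (1) and (2) themselves are not reproduced in this text. From the definitions above, and taking FH(0) and FV(0) as the template centers, a plausible reconstruction is the following. It uses the correlation form P(i, j + k); this equals the literal convolution form P(i, j - k) when the templates are symmetric, which is an assumption here rather than something the text states.

```latex
\begin{aligned}
DEH(i,j) &= \sum_{k=-(NH-1)/2}^{(NH-1)/2} FH(k)\,P(i,\,j+k) && (1)\\
DEV(i,j) &= \sum_{t=-(NV-1)/2}^{(NV-1)/2} FV(t)\,P(i+t,\,j) && (2)
\end{aligned}
```

With NH = 7 and NV = 5, as in Fig. 4, each result uses the center pixel plus 3 pixels on each side horizontally or 2 pixels on each side vertically, matching the neighbor counts given above.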

Because the invention uses parallel processing, each element of a filter template can be treated as an N-dimensional vector, and P can be treated as the gray levels of N consecutive pixels in row i or column j. Moreover, the vector multiplication in this method is neither the mathematical outer product nor the inner product: it multiplies the elements at corresponding positions of two vectors of the same dimension, and the result is again an N-dimensional vector. For example, for two-dimensional vectors a = (a1, a2) and b = (b1, b2), a * b = (a1b1, a2b2), where a1, a2, b1 and b2 are real numbers and a1b1 and a2b2 are their real products.

During filtering, the apparatus first sends the image data to be processed and the filter coefficients held in the buffer area of the general-purpose buffer to the MAC registers in turn. The MAC has four equivalent registers of width N for N-dimensional vector multiplication and accumulation; the multiply-accumulate result can be written back to the general-purpose buffer for reuse or passed directly to other arithmetic components for subsequent processing.
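The element-wise vector multiplication and the MAC accumulation described above can be sketched as follows: each scalar template element is broadcast to an N-vector and multiplied lane by lane with one buffered pixel vector, and the products are summed. For horizontal filtering the window is NH consecutive columns of N pixels; for vertical filtering, NV consecutive rows. The patent gives no filter coefficients, so the template `[-1, 2, -1]` in the example is a hypothetical high-pass kernel, and the function names are assumptions.

```python
def vmul(a, b):
    """Element-wise product of two N-vectors (the 'vector multiplication' defined above)."""
    return [x * y for x, y in zip(a, b)]


def vadd(a, b):
    """Element-wise sum of two N-vectors."""
    return [x + y for x, y in zip(a, b)]


def mac_filter(taps, window):
    """Accumulate taps[k] * window[k] over the window, lane by lane.

    taps:   odd-length template (NH or NV elements); taps[len(taps) // 2]
            plays the role of FH(0) or FV(0).
    window: the same number of N-vectors, e.g. NH consecutive pixel
            columns of N pixels each for horizontal filtering.
    """
    if len(taps) != len(window) or len(taps) % 2 == 0:
        raise ValueError("need an odd-length template matching the window")
    n = len(window[0])
    acc = [0] * n
    for tap, vec in zip(taps, window):
        acc = vadd(acc, vmul([tap] * n, vec))   # one multiply-accumulate step
    return acc
```

For example, `mac_filter([-1, 2, -1], [[1, 1], [2, 4], [3, 9]])` returns `[0, -2]`: the first lane lies on a linear ramp (no detail), while the second lane has curvature and yields a nonzero detail signal.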

4. Noise reduction

The invention uses coring filtering to suppress the noise contained in the extracted detail signal. Coring assumes that the detail signal carries a small superimposed noise signal, so subtracting a small value, called the coring threshold, from the detail signal is taken to yield a noise-free detail signal. Specifically: first determine the sign of the detail signal and set a sign flag to 1 if the signal is positive and -1 otherwise; then take the absolute value of the detail signal and subtract the coring threshold, setting any non-positive result to 0; finally, multiply the result by the sign flag to obtain the denoised signal. Fig. 5 shows the input-output relationship of the coring filter.
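The coring steps above (sign flag, absolute value minus threshold, clamp at zero, restore sign) map directly onto a few lines of code. This is a per-lane sketch with hypothetical function names; in the apparatus, all N lanes are processed at once by the ALU and MAC.

```python
def coring(de, threshold):
    """Coring as described above: sign(de) * max(|de| - threshold, 0)."""
    sign = 1 if de > 0 else -1                  # sign flag: 1 if positive, otherwise -1
    return sign * max(abs(de) - threshold, 0)   # non-positive remainders become 0


def coring_vec(de_vec, threshold):
    """Apply coring to every lane of an N-dimensional detail-signal vector."""
    return [coring(d, threshold) for d in de_vec]
```

With a threshold of 3, detail values of 10 and -10 shrink to 7 and -7, while small values such as 2 and -2 (assumed to be noise) are zeroed.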

This step involves comparison (with zero), absolute value, subtraction, maximum and multiplication. Apart from multiplication, all of these operations are performed by the parallel ALU. Like the MAC, the ALU has four fully equivalent N-dimensional vector registers and can perform arithmetic and logic operations on N data items simultaneously.

5. Overshoot suppression

The invention controls the amount of detail enhancement according to the magnitude of the detail signal and the gray-level symmetry of the neighborhood of the corresponding pixel, thereby suppressing overshoot. As shown in Figs. 6(a) to 6(d), overshoot typically occurs in areas where the gray level changes sharply (i.e. areas rich in detail) and is asymmetric. Figs. 6(a) to 6(d) show four cases in which the horizontal luminance is asymmetric and overshoot is likely; the vertical direction is analogous.

The overshoot suppression strategy processes the horizontal and vertical detail signals separately and then adds the two overshoot-suppressed detail signals to obtain the final detail signal. The specific method is as follows:

Step 43: obtain the first overshoot suppression factor alpha from the curve shown in Fig. 7, as given in formula (3):

alpha = ka × Y_abs_mean (3)

At the same time, compute the second overshoot suppression factor beta, which depends on the detail signal strength, using formula (4):

where de is the detail signal strength and kb is a preset positive coefficient; Fig. 8 shows the beta formula graphically.

Step 44: when performing overshoot suppression, the apparatus can read the gray levels of the loaded NH columns or NV rows of pixels directly from the general-purpose buffer and use them to calculate the two suppression factors. It then calculates the overshoot control factor s = 1 - alpha × beta from alpha and beta and applies it, obtaining the overshoot-suppressed detail signal de_ss = de × s.

Because overshoot suppression is performed separately in the two directions, the final detail signal is de_ss = de_ss_h + de_ss_v, where de_ss_h and de_ss_v are the overshoot-suppressed detail signal strengths in the horizontal and vertical directions, respectively. In de_ss_X in Fig. 9, X stands for h or v, i.e. the horizontal or vertical detail signal.

On the reconfigurable parallel image detail enhancement apparatus of the invention, the execution flow is as shown in Fig. 9: load the image data into the ALU; compute the gray-level absolute differences and output the result to the MAC; accumulate the absolute differences and output the result to the ALU; compute the mean absolute differences, and then the absolute difference between those means; load ka into the MAC and compute alpha, keeping the result in a MAC register; load the denoised detail signal de and kb into the MAC and compute beta; compute the product of alpha and beta and output it to the ALU; compute the overshoot control factor s = 1 - alpha × beta and output it to the MAC; and compute the overshoot-suppressed detail signal de_ss_X = de_X × s, outputting the result to the MAC.

6. Amplitude suppression

The overshoot-suppressed detail signal may still be too strong, over-enhancing the image and degrading viewing quality, so the amplitude of the enhanced detail signal must also be controlled. This step consists of two stages:

Step 51: amplify the overshoot-suppressed detail signal de_ss by multiplying it by the detail enhancement coefficient gain in the MAC, obtaining the enhanced detail signal de_gain;

Step 52: output the result to the ALU, where amplitude suppression is performed according to the curve shown in Fig. 10 to obtain the final detail signal de_final, as given by formula (5).

The final output is Yout = Yin + de_final, where Yout and Yin are the output and input pixel gray levels, respectively. Yout is first written to the general-purpose buffer and then stored into the local memory by the memory access control unit.
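Steps 51 and 52 and the final output can be sketched as below. Formula (5) and the Fig. 10 curve are not reproduced in this text, so the sketch takes the suppression curve as a caller-supplied function; the hard clip to ±Max_de used in the example (`hard_clip`, a hypothetical name) is a stand-in, not the patent's curve. Clamping Yout to an 8-bit range is likewise an assumption.

```python
from functools import partial
from typing import Callable


def hard_clip(de_gain: float, max_de: float) -> float:
    """Hypothetical stand-in for the Fig. 10 curve: clip to [-max_de, max_de]."""
    return max(-max_de, min(max_de, de_gain))


def enhance_pixel(y_in: float,
                  de_ss: float,
                  gain: float,
                  curve: Callable[[float], float],
                  y_max: float = 255) -> float:
    de_gain = de_ss * gain            # step 51: multiply by gain (MAC)
    de_final = curve(de_gain)         # step 52: amplitude suppression (ALU)
    y_out = y_in + de_final           # Yout = Yin + de_final
    return max(0, min(y_max, y_out))  # assumption: keep result in [0, y_max]


curve = partial(hard_clip, max_de=16.0)
print(enhance_pixel(100, de_ss=10.0, gain=2.0, curve=curve))  # -> 116.0
```

Passing the curve as a function keeps the sketch neutral about formula (5): any piecewise curve using Th and Max_de can be dropped in without changing the surrounding data flow.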

7. Buffered data update

After the detail enhancement of N pixels is completed, the data in the buffer region of the general-purpose buffer must be updated: the next N data items are read and replace the first of the NH or NV N-dimensional vectors held in the buffer. Physically, this can be regarded as sliding the filter window.
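The buffer update behaves like a fixed-length queue of N-dimensional vectors: each update drops the oldest vector and appends the newly read one. The Python sketch below models this with `collections.deque(maxlen=...)`; the class and method names are hypothetical, and a hardware buffer would more likely overwrite the oldest slot in place through a rotating pointer than physically shift data.

```python
from collections import deque
from typing import Deque, List, Sequence


class SlidingWindow:
    """Holds `taps` (NH or NV) vectors of `n` pixels each."""

    def __init__(self, taps: int, n: int, initial: Sequence[Sequence[int]]):
        if len(initial) != taps:
            raise ValueError("need exactly `taps` initial vectors")
        self.n = n
        self.vectors: Deque[List[int]] = deque(maxlen=taps)
        for v in initial:
            self.push(v)

    def push(self, vector: Sequence[int]) -> None:
        """Read the next N pixels; with maxlen set, the oldest vector is dropped."""
        if len(vector) != self.n:
            raise ValueError("each vector must hold n pixels")
        self.vectors.append(list(vector))

    def window(self) -> List[List[int]]:
        return [list(v) for v in self.vectors]


w = SlidingWindow(taps=3, n=2, initial=[[1, 2], [3, 4], [5, 6]])
w.push([7, 8])
print(w.window())  # -> [[3, 4], [5, 6], [7, 8]]
```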

By updating the data in the buffer, steps 2 to 5 are performed in turn on each one-dimensional lattice of the image data to be processed, completing the detail enhancement of the image data to be processed.

The above explains the complete processing flow of the present invention. By programming the state machine and using a general-purpose buffer design, the invention reuses hardware resources and, when running complex algorithms, avoids the drawbacks of traditional dedicated-circuit designs, namely long tape-out cycles and high cost for each version iteration.

Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the device described above and the related explanations can be found in the corresponding process in the foregoing method embodiments and are not repeated here.

Those skilled in the art should appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in electronic hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.

The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the related technical features, and the technical solutions after such changes or substitutions will fall within the scope of protection of the present invention.

Claims

A reconfigurable parallel image detail enhancement method, comprising the steps of:

Step 1: loading the image data to be processed into a buffer, wherein the image data to be processed is an R*Q pixel matrix in which R or Q equals the degree of parallelism N, and the pixel matrix can be split into a plurality of one-dimensional lattices each comprising N pixels;

Step 2: performing horizontal and vertical filtering on each pixel to be enhanced in the one-dimensional lattice to obtain detail signals in the two directions;

Step 3: coring the detail signals in the two directions to filter out the small detail signals introduced by image noise;

Step 4: performing overshoot suppression, i.e., controlling the intensity of the enhanced detail signal according to the gray-level symmetry of the two sides of the neighborhood of the pixel to be enhanced and the signal intensity of the pixel to be enhanced, and adding the two overshoot-suppressed detail signals to obtain the detail signals of the N pixels;

Step 5: further performing amplitude suppression on the detail signal obtained in step 4;

Step 6: performing steps 1 to 5 in sequence on each one-dimensional lattice of the image data to be processed, thereby completing the detail enhancement of the image data to be processed.
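Step 3 (coring) suppresses small-magnitude detail values that are likely to be noise. The coring curve itself is not specified in this part of the document, so the sketch below substitutes a common soft-threshold coring function purely as an illustration; the function name and the `threshold` parameter are hypothetical.

```python
def core(de: float, threshold: float) -> float:
    """Soft-threshold coring (stand-in): zero small values, shrink the rest."""
    if abs(de) <= threshold:
        return 0.0
    return de - threshold if de > 0 else de + threshold


print([core(d, 4.0) for d in (-10.0, -3.0, 0.0, 2.5, 10.0)])
# -> [-6.0, 0.0, 0.0, 0.0, 6.0]
```

Shrinking the surviving values by the threshold (rather than passing them through unchanged) avoids a jump in the curve at the threshold, which is one common reason to prefer soft over hard coring.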

The method of claim 1, wherein said buffer comprises NM buffer units, each of size N pixels, and said buffer is provided with 4 read ports and 4 write ports.
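The buffer organisation in claim 2 can be modelled in software as NM addressable units of N pixels with a per-cycle budget of four reads and four writes. The sketch below is a behavioural model only; the class name, the cycle-counting scheme, and raising an error on a fifth access (rather than stalling) are assumptions made for illustration.

```python
from typing import List, Sequence


class GeneralBuffer:
    READ_PORTS = 4
    WRITE_PORTS = 4

    def __init__(self, nm: int, n: int):
        self.n = n
        self.units: List[List[int]] = [[0] * n for _ in range(nm)]
        self._reads = 0
        self._writes = 0

    def next_cycle(self) -> None:
        """Start a new clock cycle: all ports become free again."""
        self._reads = 0
        self._writes = 0

    def read(self, addr: int) -> List[int]:
        if self._reads >= self.READ_PORTS:
            raise RuntimeError("more than 4 reads in one cycle")
        self._reads += 1
        return list(self.units[addr])

    def write(self, addr: int, data: Sequence[int]) -> None:
        if len(data) != self.n:
            raise ValueError("a unit holds exactly n pixels")
        if self._writes >= self.WRITE_PORTS:
            raise RuntimeError("more than 4 writes in one cycle")
        self._writes += 1
        self.units[addr] = list(data)


buf = GeneralBuffer(nm=8, n=4)
buf.write(0, [1, 2, 3, 4])
print(buf.read(0))  # -> [1, 2, 3, 4]
```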

The method according to claim 2, wherein the filtering in the horizontal and vertical directions uses a one-dimensional filter of order NH in the horizontal direction and of order NV in the vertical direction, respectively; the gray levels of the (NH-1)/2 pixels on each of the left and right sides and of the (NV-1)/2 pixels on each of the upper and lower sides are combined with the gray value of the pixel itself to obtain the detail signals of the pixel in the two directions.

The method of claim 3, wherein said buffer is a multi-granularity discrete memory structure.

The method according to claim 4, wherein the filtering in the horizontal and vertical directions is specifically a spatial convolution of the filter template with the image data, and the filtering results are expressed as:

where (i, j) denotes the pixel in the i-th row and j-th column of the image data, DEH(i, j) denotes the horizontal filtering result at (i, j), DEV(i, j) denotes the vertical filtering result at (i, j), P(i, j) denotes the pixel gray level at the i-th row and j-th column of the image, FH(k) denotes the k-th element of the horizontal template, and FV(t) denotes the t-th element of the vertical template.
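Because the filtering formula itself did not survive in this text, the sketch below spells out one plausible reading of the spatial convolution using the symbols defined above: for an odd template length NH (or NV) with centre c = (NH-1)/2, DEH(i, j) = sum over k of FH(k)·P(i, j+c-k) and DEV(i, j) = sum over t of FV(t)·P(i+c-t, j), with k and t counted from 0. The zero-based indexing, the border replication at image edges, and the example template are assumptions; for the symmetric templates typical of detail filters, convolution and correlation give the same result.

```python
from typing import List, Sequence

Image = List[List[float]]


def pixel(img: Image, i: int, j: int) -> float:
    """P(i, j), replicating the border outside the image (assumption)."""
    i = min(max(i, 0), len(img) - 1)
    j = min(max(j, 0), len(img[0]) - 1)
    return img[i][j]


def filter_h(img: Image, fh: Sequence[float], i: int, j: int) -> float:
    """DEH(i, j): 1-D horizontal convolution with an odd-length template."""
    if len(fh) % 2 == 0:
        raise ValueError("NH must be odd")
    c = (len(fh) - 1) // 2
    return sum(fh[k] * pixel(img, i, j + c - k) for k in range(len(fh)))


def filter_v(img: Image, fv: Sequence[float], i: int, j: int) -> float:
    """DEV(i, j): 1-D vertical convolution with an odd-length template."""
    if len(fv) % 2 == 0:
        raise ValueError("NV must be odd")
    c = (len(fv) - 1) // 2
    return sum(fv[t] * pixel(img, i + c - t, j) for t in range(len(fv)))


img = [[0, 0, 10, 0, 0] for _ in range(3)]  # a thin vertical line
fh = fv = [-1, 2, -1]                         # hypothetical high-pass template
print(filter_h(img, fh, 1, 2), filter_v(img, fv, 1, 2))  # -> 20 0
```

The thin vertical line produces a strong horizontal detail signal and no vertical one, which is why the two directions are filtered, cored, and overshoot-suppressed separately before being summed.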

The method according to claim 5, wherein said overshoot suppression in step 4 processes the horizontal detail signal and the vertical detail signal separately and then adds the two overshoot-suppressed detail signals to obtain the final detail signal, specifically:

Step 41: computing the absolute differences between the gray value of the pixel to be processed and the gray levels of the (NH-1)/2 pixels on each of its left and right sides and of the (NV-1)/2 pixels on each of its upper and lower sides, yielding four groups of gray-level absolute differences (left, right, top and bottom);

Step 42: calculating the means of the four groups of absolute differences, Mean_L, Mean_R, Mean_T and Mean_B, i.e., the mean gray-level absolute differences on the left, right, top and bottom of the pixel;

Step 43: calculating a first overshoot suppression factor alpha and a second overshoot suppression factor beta according to

alpha = ka*Y_abs_mean

where ka is a set coefficient; Y_abs_mean is the absolute difference between the means of the gray-level absolute differences, i.e., |Mean_L-Mean_R| or |Mean_T-Mean_B|; de is the detail signal strength; and kb is a set positive coefficient;

Step 44: calculating the overshoot suppression factor s=1-alpha×beta and performing overshoot suppression to obtain the overshoot-suppressed detail signal de_ss=de×s.

The method according to claim 6, characterized in that said detail signal strength de = de_h + de_v, wherein de_h is the detail signal strength in the horizontal direction and de_v is the detail signal strength in the vertical direction.

The method according to claim 7, wherein said amplitude suppression in step 5 is as follows:

Step 51: multiplying de_ss by the detail enhancement coefficient gain to obtain the enhanced detail signal de_gain;

Step 52: performing amplitude suppression according to the following formula to obtain the final detail signal de_final;

where Th is a set threshold and Max_de is a set maximum value.

The method according to claim 8, wherein the output value after the amplitude suppression in step 5 is Yout=Yin+de_final, where Yout and Yin are the output pixel gray level and the input pixel gray level, respectively.

The method according to any one of claims 1 to 9, further comprising a parameter preloading step before step 1, the parameter preloading step comprising: loading the preset parameters used in the horizontal and vertical filtering, coring, overshoot suppression and amplitude suppression into the general-purpose buffer.

The method according to any one of claims 1 to 9, wherein the image data to be processed in step 1 is obtained by sequentially splitting the image data into R*Q pixel matrices, and is loaded into the buffer as follows:

According to the splitting order of the image data, the image data to be processed is sequentially selected and processed through steps 2 to 6, until all the image data to be processed is processed.
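Claims 1 and 11 together describe two levels of splitting: the image is cut into R*Q pixel matrices, and each matrix (with R or Q equal to N) into one-dimensional lattices of N pixels. The sketch below assumes raster order for the blocks, image dimensions that are exact multiples of R and Q, and a row-wise split when both R and Q equal N; none of these choices is stated in the claims, and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

Block = List[List[int]]


def split_blocks(img: Block, r: int, q: int) -> Iterator[Tuple[int, int, Block]]:
    """Yield (top, left, block) for each R*Q block in raster order."""
    h, w = len(img), len(img[0])
    if h % r or w % q:
        raise ValueError("image size must be a multiple of R*Q (assumption)")
    for top in range(0, h, r):
        for left in range(0, w, q):
            yield top, left, [row[left:left + q] for row in img[top:top + r]]


def split_lattices(block: Block, n: int) -> List[List[int]]:
    """Split one R*Q block into 1-D lattices of n pixels."""
    r, q = len(block), len(block[0])
    if q == n:                                  # R rows of N pixels
        return [list(row) for row in block]
    if r == n:                                  # Q columns of N pixels
        return [list(col) for col in zip(*block)]
    raise ValueError("R or Q must equal the parallelism N")


img = [[4 * i + j for j in range(4)] for i in range(4)]
for top, left, blk in split_blocks(img, r=4, q=2):
    print((top, left), split_lattices(blk, n=4))
# -> (0, 0) [[0, 4, 8, 12], [1, 5, 9, 13]]
#    (0, 2) [[2, 6, 10, 14], [3, 7, 11, 15]]
```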

A reconfigurable parallel image detail enhancement device, comprising: a local memory, a memory access control unit, a general-purpose buffer, a parallel arithmetic logic unit (ALU), a state machine, and a parallel multiply-accumulator (MAC);

The local memory is configured to save input and output image data and parameters required by a parallel video image contrast enhancement algorithm, and the memory supports parallel access;

The memory access control unit is configured to exchange data between the local memory and the general buffer;

The general purpose buffer is used to buffer all data and intermediate results required for a complete processing flow, and the buffer can be directly indexed by an address;

The parallel arithmetic logic unit is configured to perform the non-multiplication arithmetic and logic operations involved in the parallel video image contrast enhancement algorithm, with a degree of parallelism of N;

The state machine is configured to generate control signals for all functional components;

The parallel multiply-accumulator is configured to perform multiplication-related operations, with a degree of parallelism of N;

The state machine is connected to the parallel arithmetic logic unit, the memory access control unit, the general-purpose buffer and the parallel multiply-accumulator through communication lines; the local memory is connected to the memory access control unit through a communication line; the general-purpose buffer is connected to the memory access control unit, the parallel arithmetic logic unit and the parallel multiply-accumulator through communication lines; and the parallel arithmetic logic unit is connected to the parallel multiply-accumulator through a communication line.
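The connections recited in the device claim form a small graph, and tracing it shows the data path the method relies on: image data can reach the MAC from local memory only through the memory access control unit and the general-purpose buffer, while the state machine is wired to every unit except the local memory. The unit labels below are hypothetical identifiers, and treating the state-machine links as control-only (excluded from data paths) is an interpretation rather than something the claim states.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Undirected communication lines as recited in the device claim.
LINES = [
    ("state_machine", "alu"), ("state_machine", "mem_ctrl"),
    ("state_machine", "general_buffer"), ("state_machine", "mac"),
    ("local_memory", "mem_ctrl"),
    ("general_buffer", "mem_ctrl"), ("general_buffer", "alu"),
    ("general_buffer", "mac"),
    ("alu", "mac"),
]
CONTROL_ONLY = {"state_machine"}  # interpretation: control lines, not data


def adjacency() -> Dict[str, Set[str]]:
    adj: Dict[str, Set[str]] = {}
    for a, b in LINES:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def data_path(src: str, dst: str) -> Optional[List[str]]:
    """Shortest data path (breadth-first search) avoiding control-only units."""
    adj = adjacency()
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        for nxt in sorted(adj[node] - CONTROL_ONLY):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


print(data_path("local_memory", "mac"))
# -> ['local_memory', 'mem_ctrl', 'general_buffer', 'mac']
```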