CN106570462A

CN106570462A - Lip movement feature extraction method and system capable of illumination robustness improvement

Info

Publication number: CN106570462A
Application number: CN201610921372.7A
Authority: CN
Inventors: 马新军; 张宏军; 仲乾元; 李园园
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin Institute of Technology Shenzhen
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2017-04-19

Abstract

The present invention provides a de-illumination preprocessing method, and a lip movement feature extraction method and system with improved illumination robustness. The de-illumination preprocessing method comprises a median filtering step, a Gamma correction step, a multi-scale Retinex filtering step, and a contrast equalization step. The beneficial effects of the present invention are: the improvement to the LBP feature extraction method further improves lip-reading recognition to a certain extent, and the feature extraction method is not easily affected by changes in external illumination and is highly robust.

Description

Lip movement feature extraction method and system for improving illumination robustness

Technical field

The present invention relates to the technical field of image processing, and in particular to a lip movement feature extraction method and system for improving illumination robustness.

Background

With the development of natural human-computer interaction technology, lip reading has become a major research focus in this field. However, most current research on lip reading remains confined to ideal laboratory lighting, and very little work addresses lip-reading recognition under variable illumination.

Summary of the invention

The present invention provides a de-illumination preprocessing method, comprising the following steps:

Median filtering step: used to denoise the image;

Gamma correction step: used to correct the gray-level distribution of the lip image;

Multi-scale Retinex filtering step: a Gaussian surround function is used, specifically:

S(x,y) = R(x,y)L(x,y) (2)

where σ is the scale parameter of the Gaussian surround function, G(x,y) satisfies ∫∫G(x,y)dxdy = 1, and λ is the normalization constant that makes ∫∫G(x,y)dxdy = 1;

Contrast equalization step: used to perform contrast equalization on the lip image so as to improve its illumination distribution.

As a further improvement of the present invention, in the median filtering step, a 3×3 median filtering template is used to filter the input lip image so as to effectively remove impulse noise;

in the Gamma correction step, the Gamma correction formula is as shown in formula 1:

the γ value selected in the present invention is 1/2.2;

in the multi-scale Retinex filtering step, the logarithmic-domain formula is as shown in formula 4,

K is 3, ω₁ = ω₂ = ω₃ = 1/3, and the three scale factors σ are 15, 80 and 250 respectively;

in the contrast equalization step, the contrast equalization formulas are as shown in formulas 5-7:

a is the image gray-level compression factor, which effectively adjusts the dynamic gray range of the lip image, and τ is the threshold used to limit excessively large gray values, with a = 0.2 and τ = 8; formula 7 is an arctangent transformation, which is nonlinear, and applying it to the lip image effectively normalizes the image to (-τ, τ).

The present invention also provides a lip movement feature extraction method for improving illumination robustness. The lip movement feature extraction method includes the de-illumination preprocessing method and further includes the following steps: P sampling points located on the edge of a circular template are compared with the pixel value at the center of the circle to generate the LBP pattern value; when a sampling point does not fall on the center of a pixel, its value is obtained by bilinear interpolation; after the LBP features of the lip are extracted, they are not used directly as features for recognition; instead, the histogram of the LBP-coded image is used as the feature vector.

As a further improvement of the present invention, the lip image is first partitioned, and LBP histogram features are then extracted from each partition separately.

As a further improvement of the present invention, the LBP uniform pattern is adopted; a uniform pattern is one in which the LBP binary code contains at most two transitions between 0 and 1.

The present invention also provides a de-illumination preprocessing system, comprising:

Median filtering module: used to denoise the image;

Gamma correction module: used to correct the gray-level distribution of the lip image;

Multi-scale Retinex filtering module: a Gaussian surround function is used, specifically:

S(x,y) = R(x,y)L(x,y) (2)

Contrast equalization module: used to perform contrast equalization on the lip image so as to improve its illumination distribution.

As a further improvement of the present invention, in the median filtering module, a 3×3 median filtering template is used to filter the input lip image so as to effectively remove impulse noise;

in the Gamma correction module, the Gamma correction formula is as shown in formula 1:

in the multi-scale Retinex filtering module, the logarithmic-domain formula is as shown in formula 4,

in the contrast equalization module, the contrast equalization formulas are as shown in formulas 5-7:

The present invention also provides a lip movement feature extraction system for improving illumination robustness. The lip movement feature extraction system includes the de-illumination preprocessing system described in any one of claims 6 to 7, and further provides that: P sampling points located on the edge of a circular template are compared with the pixel value at the center of the circle to generate the LBP pattern value; when a sampling point does not fall on the center of a pixel, its value is obtained by bilinear interpolation; after the LBP features of the lip are extracted, they are not used directly as features for recognition; instead, the histogram of the LBP-coded image is used as the feature vector.

The beneficial effects of the present invention are: the improvement to the LBP feature extraction method further improves lip-reading recognition to a certain extent, and the feature extraction method is not easily affected by changes in external illumination and is highly robust.

Description of the drawings

Fig. 1 is a flowchart of the de-illumination preprocessing method of the present invention;

Fig. 2 is the Gamma correction curve;

Fig. 3 is a flowchart of the Retinex filtering algorithm;

Fig. 4 shows the results of the de-illumination preprocessing method;

Fig. 5 is a schematic diagram of the basic LBP operator;

Fig. 6 is a schematic diagram of circular LBP sampling with three different radii;

Fig. 7 shows the different texture features captured by the uniform LBP operator;

Fig. 8 is a schematic diagram of block-wise LBP histogram vector feature extraction;

Fig. 9 compares the recognition rates of different feature extraction methods under natural illumination;

Fig. 10 shows the confusion matrices obtained with different feature extraction methods;

Fig. 11 compares the recognition rates obtained with different de-illumination preprocessing algorithms;

Fig. 12 compares the recognition rates obtained with different lip movement feature extraction methods.

Detailed description

To address the shortcomings of the background art, the present invention proposes a new lip movement feature extraction method. The method consists of a de-illumination preprocessing chain and an illumination-invariant feature extraction operator, and improves lip-reading recognition under variable illumination from both aspects.

The de-illumination preprocessing method of the present invention is applied before lip movement feature extraction to filter out the influence of external illumination noise. The entire preprocessing algorithm consists of four parts, each of which compensates for illumination interference to some degree; its flowchart is shown in Fig. 1. The parts are described below in the order of Fig. 1.

Median filtering: while video images captured by a camera are being formed and transmitted, external interference often introduces a large amount of impulse noise. For analog signals the effect of impulse noise is small, but in digital signal transmission it greatly degrades image quality. Impulse noise can be divided mainly into salt-and-pepper noise and random-valued impulse noise. Various filtering methods can be used to denoise the image. Median filtering is widely used in image denoising because it removes impulse noise well while preserving certain details of the image. The present invention uses a 3×3 median filtering template to filter the input lip image so as to effectively remove impulse noise.
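The 3×3 median filter described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used in the invention: the function name `median_filter_3x3` is hypothetical, and the edge-replication border handling is an assumption because the text does not say how border pixels are treated.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a 2-D list of gray values.

    Border pixels reuse the nearest valid pixel (edge replication).
    This border rule is an assumption; the source does not specify one.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)  # clamp row index
                    cc = min(max(c + dc, 0), cols - 1)  # clamp column index
                    window.append(image[rr][cc])
            out[r][c] = median(window)  # middle of the 9 sorted values
    return out


# A flat patch corrupted by one "salt" (255) and one "pepper" (0) pixel.
noisy = [[10, 10, 10, 10],
         [10, 255, 10, 10],
         [10, 10, 0, 10],
         [10, 10, 10, 10]]
cleaned = median_filter_3x3(noisy)
```

Because each 3×3 window here contains at most two outliers among nine values, the median discards both impulses while leaving the flat background unchanged.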

Gamma correction: because of changes in external illumination, the captured lip images often show uneven illumination. Typically, one part of the lip is too bright because of reflected light while another part is too dark because the light is blocked. To correct the gray-level distribution of the lip image for this situation during preprocessing, the present invention applies gamma correction to the lip image after median filtering so as to improve its illumination distribution. The Gamma correction formula is shown in formula 1.

The γ value selected in the present invention is 1/2.2, and its correction curve is shown in Fig. 2. Fig. 2 shows that gamma correction stretches the gray values of the low-gray (dark) regions of the image and compresses those of the high-gray (bright) regions, which improves the illumination distribution of the whole lip image.
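Formula 1 is not reproduced in this text, so the sketch below assumes the usual power-law form, output = max · (input / max)^γ, with γ = 1/2.2 as stated above. The function name `gamma_correct` and the 8-bit range (`max_value=255`) are assumptions made for illustration.

```python
def gamma_correct(image, gamma=1 / 2.2, max_value=255):
    """Power-law gamma correction on a 2-D list of gray values.

    Assumed form of formula 1: out = max_value * (in / max_value) ** gamma.
    With gamma < 1, dark values are stretched and bright values compressed.
    """
    return [[max_value * (v / max_value) ** gamma for v in row] for row in image]


lips = [[0, 32, 64],
        [128, 192, 255]]
corrected = gamma_correct(lips)

# Gain in the dark range (32 -> 64) versus the bright range (192 -> 255).
dark_gain = corrected[0][2] - corrected[0][1]
bright_gain = corrected[1][2] - corrected[1][1]
```

In this example the 32-level step in the dark range is expanded while the 63-level step in the bright range is compressed, which matches the behaviour the text attributes to Fig. 2.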

Multi-scale Retinex filtering: multi-scale Retinex (MSR) filtering is a de-illumination filtering algorithm that has been widely used in image processing in recent years. It is built from the basic single-scale Retinex filter. The Retinex algorithm regards every image as the product of an incident component L(x,y) and a reflection component R(x,y), as shown in formula 2. Filtering an image with the Retinex algorithm therefore amounts to estimating the incident component of the image accurately and removing it from the original image. Because computing the incident component directly from the input image is a singular (ill-posed) problem in the mathematical model, it can only be estimated approximately. The present invention uses a Gaussian surround function for this task, as follows:

S(x,y) = R(x,y)L(x,y) (2)

where σ is the scale parameter of the Gaussian surround function, G(x,y) satisfies ∫∫G(x,y)dxdy = 1, and λ is the normalization constant that makes ∫∫G(x,y)dxdy = 1. The advantage of working in the logarithmic domain is that the multiplicative relationship between the incident and reflection components becomes an additive one, which simplifies the computation; in addition, the logarithm itself suppresses illumination to some extent. The block diagram of the whole filtering algorithm is shown in Fig. 3.
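The formula for the Gaussian surround function (formula 3) does not appear in this text. For orientation only, the forms below are the ones commonly used for single-scale Retinex and are consistent with formula 2 and with the definitions of σ, λ and G(x,y) given above. They are an assumption, not a recovery of the original equations; in particular, the exact exponent (σ² versus 2σ²) cannot be confirmed from the surviving text.

```latex
% Assumed standard single-scale Retinex forms; not recovered from the source.
\begin{align}
G(x,y) &= \lambda \exp\!\left(-\frac{x^{2}+y^{2}}{\sigma^{2}}\right),
  \qquad \iint G(x,y)\,dx\,dy = 1 \\
\hat{L}(x,y) &= G(x,y) * S(x,y) \\
\log R(x,y) &= \log S(x,y) - \log\bigl[G(x,y) * S(x,y)\bigr]
\end{align}
```

Here * denotes two-dimensional convolution and L̂ is the estimate of the incident component. The last line follows from taking the logarithm of formula 2, log S = log R + log L, and substituting the estimate for L.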

In the Gaussian surround function used in the present invention, σ is the most important parameter. When σ is small, the filter compresses the gray-level dynamic range strongly and brings out the details of the lip image, but it also distorts the image to some degree; when σ is large, the lip image retains high fidelity, but the dynamic range compression is correspondingly weaker. To overcome this trade-off, the present invention filters the image with the multi-scale Retinex algorithm, whose logarithmic-domain formula is shown in formula 4.

Here, so that the filter combines the advantages of single-scale Retinex filters at high, medium and low scales, K is set to 3 and the three scales are given equal weights, i.e. ω₁ = ω₂ = ω₃ = 1/3. Repeated trials showed that the filter performs best when the three scale factors σ are 15, 80 and 250.
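A minimal pure-Python sketch of the multi-scale step is given below. Formula 4 is not reproduced in this text, so the code assumes the usual weighted sum, log R = Σ ω_k [log S − log(G_k * S)], with a surround of the form λ·exp(−(x²+y²)/σ²). The function names, the +1 offset that avoids log(0), the truncation of the kernel at 3σ, and the edge-replication border rule are all assumptions made for illustration.

```python
import math


def gaussian_kernel(sigma, radius):
    """1-D surround samples exp(-x^2 / sigma^2), scaled to sum to 1 (the role of lambda)."""
    weights = [math.exp(-(i * i) / (sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(image, sigma):
    """Separable Gaussian blur with edge replication (assumed border handling)."""
    rows, cols = len(image), len(image[0])
    radius = max(1, int(3 * sigma))  # truncate the kernel at 3*sigma (assumption)
    kernel = gaussian_kernel(sigma, radius)
    offsets = range(-radius, radius + 1)
    # Horizontal pass over each row.
    tmp = [[sum(kw * row[min(max(c + d, 0), cols - 1)] for kw, d in zip(kernel, offsets))
            for c in range(cols)] for row in image]
    # Vertical pass over each column of the intermediate result.
    return [[sum(kw * tmp[min(max(r + d, 0), rows - 1)][c] for kw, d in zip(kernel, offsets))
             for c in range(cols)] for r in range(rows)]


def multi_scale_retinex(image, sigmas=(15, 80, 250), weights=None):
    """Weighted sum of single-scale Retinex outputs in the log domain."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)  # equal weights, 1/3 each for K = 3
    rows, cols = len(image), len(image[0])
    shifted = [[v + 1.0 for v in row] for row in image]  # offset avoids log(0)
    out = [[0.0] * cols for _ in range(rows)]
    for w, sigma in zip(weights, sigmas):
        surround = blur(shifted, sigma)  # estimate of the incident component
        for r in range(rows):
            for c in range(cols):
                out[r][c] += w * (math.log(shifted[r][c]) - math.log(surround[r][c]))
    return out


flat = [[100] * 5 for _ in range(4)]
flat_result = multi_scale_retinex(flat)
```

For a uniformly lit, featureless patch the estimated incident component equals the image itself, so the output is zero everywhere; this is a quick sanity check that the surround is normalized correctly. Production code would normally use an array library for the convolution.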

Contrast equalization step: after multi-scale Retinex filtering, the illumination of the lip image is already clearly improved. However, the gray levels of the filtered images fall in different ranges, and those ranges are very narrow. The last step of the de-illumination preprocessing method proposed in the present invention is therefore to apply contrast equalization to the lip image, which completes the improvement of its illumination distribution.

The contrast equalization formulas of the present invention are shown in formulas 5-7.

Here, a is the image gray-level compression factor, which effectively adjusts the dynamic gray range of the lip image, and τ is the threshold used to limit excessively large gray values. This patent takes a = 0.2 and τ = 8. Formula 7 is an arctangent transformation, which is nonlinear; applying it to the lip image effectively normalizes the image to (-τ, τ).
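Formulas 5-7 are not reproduced in this text. The sketch below assumes the two-stage normalization commonly used in illumination preprocessing chains with parameters of this kind: divide by the a-th power mean of |I|, divide again with |I| clipped at τ, then squash the result into (-τ, τ). The text calls the last step an arctangent transformation; its exact form cannot be recovered here, so the code uses τ·tanh(I/τ), which has the stated range, as a stand-in. The function name is hypothetical.

```python
import math


def contrast_equalize(image, a=0.2, tau=8.0):
    """Three-stage contrast equalization on a 2-D list of (possibly signed) values.

    All three stages are assumed forms of formulas 5-7, not recovered ones.
    The image must contain at least one non-zero value.
    """
    cols = len(image[0])
    pixels = [v for row in image for v in row]
    n = len(pixels)
    # Stage 1 (assumed formula 5): divide by the a-th power mean of |I|.
    norm1 = (sum(abs(v) ** a for v in pixels) / n) ** (1.0 / a)
    pixels = [v / norm1 for v in pixels]
    # Stage 2 (assumed formula 6): same, with |I| clipped at tau to tame outliers.
    norm2 = (sum(min(tau, abs(v)) ** a for v in pixels) / n) ** (1.0 / a)
    pixels = [v / norm2 for v in pixels]
    # Stage 3 (stand-in for formula 7): nonlinear squashing into (-tau, tau).
    pixels = [tau * math.tanh(v / tau) for v in pixels]
    return [pixels[i:i + cols] for i in range(0, n, cols)]


retinex_output = [[-0.5, 0.1, 0.3],
                  [2.0, -40.0, 0.05]]
equalized = contrast_equalize(retinex_output)
```

The small exponent a = 0.2 reduces the influence of large values on the normalization constants, and the final squashing keeps the single extreme value (-40) inside the (-τ, τ) range instead of letting it dominate.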

Fig. 4 shows the results of the de-illumination preprocessing method proposed in the present invention. Each row of the figure corresponds to a different lighting condition. The first image in each row is the unprocessed lip image, and the remaining four images show the result after each successive processing step. Fig. 4 shows that the proposed de-illumination preprocessing method works well.

The most representative local feature extraction method at present is Local Binary Patterns (LBP). Owing to its strong discriminative power and its insensitivity to monotonic gray-level changes, LBP has been widely used in face recognition and has been shown to be robust under variable illumination. Its use in lip reading, however, has rarely been reported. The present invention therefore applies this feature extraction method to lip reading in order to improve recognition under variable illumination.

The basic unit of the LBP operator is a block of 9 pixels. The center pixel serves as the threshold, and its 8 neighboring pixels are processed in turn. If the value of a neighboring pixel is greater than the center threshold, the result is 1; otherwise it is 0. Concatenating these 8 values in order gives the LBP code, and converting this binary code to a decimal number gives the LBP pattern value of the center pixel. The process is shown in Fig. 5, and the corresponding mathematical expression is shown in formula 8.
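The basic 3×3 operator described above can be sketched as follows. The comparison is strictly "greater than", as stated in the text. The clockwise bit order starting from the top-left neighbor is an assumption, since the text only says the 8 values are concatenated in order, and the function name `basic_lbp` is hypothetical.

```python
# Clockwise neighbor offsets (row, column) starting at the top-left pixel.
# This ordering is an assumption; any fixed order gives a valid LBP code.
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def basic_lbp(image, r, c):
    """LBP pattern value of the interior pixel at row r, column c."""
    center = image[r][c]
    code = 0
    for dr, dc in NEIGHBORS:
        bit = 1 if image[r + dr][c + dc] > center else 0  # threshold against the center
        code = (code << 1) | bit  # append the bit to the binary code
    return code


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
value = basic_lbp(patch, 1, 1)  # bits 00001111 -> 15
```

Adding a constant to every pixel of `patch` leaves `value` unchanged, which is the insensitivity to monotonic gray-level changes mentioned above.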

Although the basic LBP operator achieves a certain recognition effect, it suffers from a fixed sampling range and a limited number of sampling points. In the extended LBP pattern, the original square template is replaced by a circular template of radius R, and P sampling points located on the edge of the circular template are compared with the pixel value at the center of the circle to generate the LBP pattern value (when a sampling point does not fall on the center of a pixel, its value is obtained by bilinear interpolation). This effectively strengthens the sampling ability of the LBP operator: it can sample different numbers of neighboring pixels at different distances from a given pixel and thus extract more effective local features, so the resulting features discriminate significantly better than basic LBP features. Fig. 6 shows LBP sampling with different radii and numbers of sampling points.

Let the coordinates of the center pixel be (x_c, y_c). The coordinates of the sampling points on the edge of the circular template are then given by:

x_p = x_c + R cos(2πp/P) (8)

y_p = y_c + R sin(2πp/P) (9)

If a computed coordinate is not an integer, the corresponding pixel value must be obtained by interpolation. The present invention uses bilinear interpolation. The mathematical expression of the circular LBP operator is shown in formula 10.

Here, g_p is the gray value of the p-th sampling point on the edge of the circular template, and g_c is the gray value of the center pixel of the template. According to formula 10, the three LBP patterns in Fig. 6 can be written as LBP_{8,1}, LBP_{12,2.5} and LBP_{16,4} respectively.
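The sampling equations and the interpolation rule above can be sketched as follows. Formula 10 is not reproduced in this text, so the code assumes the usual form, in which sampling point p contributes 2^p when g_p exceeds g_c. The function names are hypothetical, and the rounding of coordinates to 9 decimals (to absorb floating-point noise in cos and sin) is an implementation choice of this sketch.

```python
import math


def bilinear(image, x, y):
    """Gray value at real-valued (x, y); x indexes columns, y indexes rows."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(image[0]) - 1)
    y1 = min(y0 + 1, len(image) - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def circular_lbp(image, xc, yc, P=8, R=1.0):
    """Circular LBP value at (xc, yc); the pixel must be at least R from the border."""
    center = image[yc][xc]
    code = 0
    for p in range(P):
        xp = xc + R * math.cos(2 * math.pi * p / P)  # equation (8)
        yp = yc + R * math.sin(2 * math.pi * p / P)  # equation (9)
        # Rounding keeps near-integer coordinates exactly on the pixel grid.
        g_p = bilinear(image, round(xp, 9), round(yp, 9))
        if g_p > center:
            code |= 1 << p  # assumed weighting 2^p for sampling point p
    return code


img = [[1, 1, 1, 1, 1],
       [1, 9, 9, 9, 1],
       [1, 9, 5, 9, 1],
       [1, 9, 9, 9, 1],
       [1, 1, 1, 1, 1]]
code = circular_lbp(img, 2, 2, P=8, R=1.0)
```

With P = 8 and R = 1, the four diagonal sampling points fall between pixel centers, so their values are interpolated from the four surrounding pixels; the four axis-aligned points coincide with pixel centers and are read directly.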

After the LBP features of the lip are extracted, they are not used directly as features for recognition; instead, the histogram of the LBP-coded image is used as the feature vector. The histogram vector produced by an LBP operator with 8 sampling points thus has 2⁸ = 256 dimensions, and that of an operator with 16 sampling points has 2¹⁶ = 65536 dimensions. A feature vector of such high dimension directly slows down lip-reading recognition. To solve this problem, the present invention adopts the LBP uniform pattern. A uniform pattern is an LBP binary code that contains at most two transitions between 0 and 1. Because the LBP code is circular, this means the number of transitions in a uniform pattern can only be 0 or 2. For an LBP code with 8 sampling points, there are only two codes with 0 transitions, 00000000 and 11111111, as shown in Fig. 7a) and 7b). There are P(P-1) codes with exactly 2 transitions. The uniform-pattern LBP operator is denoted LBP_{P,R}^{u2}. Because the uniform patterns carry most of the information in the lip image, while the non-uniform patterns contain a large amount of noise, the uniform pattern gives a higher lip-reading recognition rate.

Using the uniform-pattern LBP operator has two important benefits. The first is that it saves memory. The uniform-pattern operator assigns all LBP codes with more than 2 transitions to a single class, which reduces the feature dimension. The LBP histogram vector has 2^P dimensions without the uniform pattern, but only P(P-1)+3 dimensions with it. For example, with 16 sampling points the uniform-pattern LBP histogram vector has only 243 dimensions, far fewer than 65536. The second benefit is that the uniform patterns themselves capture the most important local features of the lip image, such as spots, line segments, edges and corners, as shown in Fig. 7.
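The dimension count P(P-1)+3 can be checked by brute force: enumerate every P-bit code, count its circular 0/1 transitions, and keep those with at most two. The function names below are hypothetical and serve only to illustrate the counting argument.

```python
def transitions(code, P):
    """Number of 0/1 changes when the P-bit code is read as a circular string."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))


def count_uniform_codes(P):
    """Brute-force count of codes with at most two circular transitions."""
    return sum(1 for code in range(2 ** P) if transitions(code, P) <= 2)


def uniform_histogram_size(P):
    """P(P-1) two-transition codes + 2 constant codes + 1 shared bin for the rest."""
    return P * (P - 1) + 3


uniform_8 = count_uniform_codes(8)     # 58 uniform codes for P = 8
bins_8 = uniform_histogram_size(8)     # 59 histogram bins
bins_16 = uniform_histogram_size(16)   # 243 histogram bins
```

The histogram has one more bin than there are uniform codes because every non-uniform code is counted in a single shared bin.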

Table 1. Experimental results for different LBP parameters

To determine the LBP operator best suited to the lip-reading task of the present invention, multiple sets of recognition experiments were carried out with different LBP parameters; the results are shown in Table 1. The recognition rate for each set of LBP parameters is the average over 20 trials. To assess the stability of the recognition results, the standard deviation of the recognition rate was also computed for each set of LBP parameters. Table 1 shows that with P = 8 and R = 3 the recognition rate is highest and the standard deviation is smallest, so recognition is also most stable; this patent therefore uses this set of parameters in the subsequent experiments.

The LBP feature extraction described above operates on the whole lip image, so the resulting LBP histogram vector lacks information about the microstructure of the lip image, which to some extent lowers the recognition rate of the lip-reading system. The LBP histogram describes the first-order statistics of the gray values in the LBP-coded image. A single LBP histogram, however, carries no information about where each gray value occurs in the image, so it cannot describe the microstructure of the lip image. In addition, the local information in different regions of a lip image usually differs considerably. If the LBP operator is applied to the whole lip image to obtain its LBP-coded image, and a histogram is then computed over that coded image, the local differences within the lip image are usually lost in the resulting whole-image histogram. The approach taken in this patent is therefore to partition the lip image first and then extract LBP histogram features from each partition separately, which adds microstructure information about the lip image while retaining its local information.

Based on this idea, this patent first divides a lip image of size M×N into A×B partitions of equal size, so that each partition has size (M/A)×(N/B), and an LBP histogram feature is computed for each of these partitions. After testing different partitioning schemes, this patent selected 3×3 partitioning; the process is shown in Fig. 8.
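The partitioning step can be sketched as follows: an M×N image of LBP codes is split into A×B equal blocks, a histogram is computed per block, and the histograms are concatenated into one feature vector. The function name is hypothetical; the sketch assumes M and N are divisible by A and B, and it uses 256 raw-code bins for simplicity (a uniform-pattern mapping would use fewer bins).

```python
def block_histograms(codes, A=3, B=3, bins=256):
    """Concatenate per-block histograms of an M x N image of LBP codes.

    Assumes M is divisible by A and N by B, so each block is (M/A) x (N/B).
    """
    M, N = len(codes), len(codes[0])
    block_h, block_w = M // A, N // B
    feature = []
    for a in range(A):
        for b in range(B):
            hist = [0] * bins
            for r in range(a * block_h, (a + 1) * block_h):
                for c in range(b * block_w, (b + 1) * block_w):
                    hist[codes[r][c]] += 1  # count this code within the block
            feature.extend(hist)  # blocks are appended in row-major order
    return feature


# A 6 x 9 image of made-up codes, split into 3 x 3 blocks of size 2 x 3.
codes = [[(r * 9 + c) % 256 for c in range(9)] for r in range(6)]
feature = block_histograms(codes)
```

Because each block keeps its own histogram, two images with the same overall code distribution but different spatial layouts yield different feature vectors, which is the positional information a single whole-image histogram lacks.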

The partition-based LBP feature extraction method described above extracts the local features of the lip image well and is therefore robust to external illumination changes to a certain degree. However, it cannot extract the global features of the lip image, so the lip features are not described completely. To extract further representative features from the lip image, this patent combines PCA with the LBP feature extraction method, so that both the local and the global features of the lip image are extracted and combined to further improve the lip-reading recognition rate. Another advantage of introducing PCA is that it reduces the dimensionality of the lip features. In the lip-reading database collected for this patent, the image sequence for each digit has more than 25 frames, so the lip movement feature for one digit is at least 25 times the size of the feature of a single lip image. Using PCA to reduce the dimensionality of the block-wise LBP histogram vector can therefore both improve the recognition rate of the lip-reading system and filter out part of the interference caused by illumination changes.

Using a self-built lip-reading database and an SVM classifier, this patent carried out four sets of comparative experiments to verify the effectiveness of the proposed illumination preprocessing algorithm and the LBP-based feature extraction algorithm. The first set compares different feature extraction methods under natural illumination. The lip movement feature extraction algorithm proposed in this patent is compared with three commonly used methods, PCA, DCT and LBP, to show how well the proposed lip-reading system recognizes digits. This set of experiments uses subset 1 of the database: each time, 5 samples are randomly selected from subset 1 for training, and the remaining 20 samples are used for recognition testing. For each feature extraction method this test is repeated 20 times and the results are averaged, giving the recognition performance of each feature extraction method for each digit. The results are shown in Fig. 9 and Fig. 10.
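The evaluation protocol above (5 random training samples, 20 test samples, 20 repetitions, then averaging) can be sketched as follows. The `evaluate` callback is hypothetical: it stands for training a classifier such as an SVM on the training split and returning the recognition rate on the test split. The fixed seed and the use of the population standard deviation are assumptions of this sketch.

```python
import random
from statistics import mean, pstdev


def repeated_random_split(samples, evaluate, n_train=5, repeats=20, seed=0):
    """Average recognition rate and its standard deviation over random splits.

    evaluate(train, test) is a caller-supplied function returning a rate in [0, 1].
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    rates = []
    for _ in range(repeats):
        shuffled = samples[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        rates.append(evaluate(train, test))
    return mean(rates), pstdev(rates)


def dummy_evaluate(train, test):
    """Placeholder scorer: reports the fraction of samples held out for testing."""
    return len(test) / (len(train) + len(test))


avg_rate, rate_std = repeated_random_split(list(range(25)), dummy_evaluate)
```

With 25 samples per class, each repetition leaves 20 samples for testing, matching the split described above. The dummy scorer only exercises the protocol and does not represent any recognition result.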

Fig. 9 shows that under natural illumination the DCT method achieves the highest recognition rate and outperforms the other methods. The method proposed in this patent performs on a par with PCA, and the LBP method performs worst. Across digits, 6 and 7 have the lowest recognition rates and 5 and 8 the highest, because the mouth shapes for pronouncing 5 and 8 are more distinctive than those of the other digits.

Fig. 10 shows the confusion matrices for the different lip movement feature extraction methods. Each row corresponds to the actual class of a digit, and each column to the class predicted by the SVM classifier. Fig. 10 shows that the digits 6 and 7 not only have low recognition rates but are also the pair most often confused with each other, because their mouth shapes during pronunciation are similar. The digits 0 and 1 are the next most confused pair, and their recognition rates are also not high enough. Overall, the lip-reading recognition system of this patent performs well under natural illumination, with an average recognition rate of about 80%.
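The row and column convention described above (rows are actual digits, columns are predicted digits) can be sketched as follows. The function names are hypothetical, and the labels in the example are made up to show the layout; they are not results from the experiments.

```python
def confusion_matrix(actual, predicted, n_classes=10):
    """matrix[a][p] counts samples of actual digit a that were predicted as digit p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


def per_class_rate(matrix):
    """Recognition rate of each digit: diagonal entry divided by its row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Illustrative labels only: one 6 is read as 7 and one 7 is read as 6.
actual = [6, 6, 7, 7, 5, 8]
predicted = [7, 6, 6, 7, 5, 8]
cm = confusion_matrix(actual, predicted)
rates = per_class_rate(cm)
```

Mutual confusion between two digits shows up as a pair of off-diagonal entries that mirror each other, here `cm[6][7]` and `cm[7][6]`.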

The second experiment uses the improved LBP lip movement feature extraction method proposed in this patent and compares recognition performance under different illumination preprocessing algorithms. In each run, 5 samples are randomly selected from subset 1 to train the SVM model; the test samples are the remaining 20 samples of subset 1 together with all samples of subsets 2, 3 and 4. The results are shown in Figure 11, where NONE denotes no de-illumination preprocessing, HE denotes histogram equalization, and HF denotes homomorphic filtering. As Figure 11 shows, when illumination conditions change, applying a de-illumination preprocessing algorithm improves the lip-reading recognition rate compared with extracting lip movement features directly. Histogram equalization and homomorphic filtering perform comparably, and the de-illumination preprocessing algorithm proposed in this patent performs best. Note, however, that the recognition rate drops considerably overall when illumination changes, relative to natural illumination. Although this patent compensates for the influence of illumination, the recognition performance still falls short of that obtained under constant illumination.

The third experiment uses the de-illumination preprocessing algorithm proposed in this patent and compares the recognition rates of different lip movement feature extraction algorithms. The procedure is the same as in the second experiment, and the results are shown in Figure 12. As Figure 12 shows, although feature extraction methods such as PCA and DCT achieve high recognition rates on subset 1, their recognition rates on the other subsets are far lower than those of LBP and the method proposed in this patent. LBP is therefore indeed an illumination-robust feature extraction method, and the improvement to LBP feature extraction made in this patent further raises lip-reading recognition performance to some extent. By contrast, the traditional feature extraction methods are easily affected by changes in external illumination and are less robust.

The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the present invention shall not be regarded as limited to these descriptions. For a person of ordinary skill in the technical field of the present invention, simple deductions or substitutions made without departing from the concept of the present invention shall all be regarded as falling within the protection scope of the present invention.

Claims

1. A de-illumination preprocessing method, characterized by comprising the following steps:

a median filtering step: denoising the image;

a Gamma correction step: correcting the gray-level distribution of the lip image;

a multi-scale Retinex filtering step: using a Gaussian surround function, specifically:

$$S(x, y) = R(x, y) \, L(x, y) \tag{2}$$

$$\log R(x, y) = \log\left[\frac{S(x, y)}{L(x, y)}\right] = \log S(x, y) - \log\left[S(x, y) \otimes G(x, y)\right] \tag{3}$$

where G(x, y) is the Gaussian surround function, σ is its scale parameter, and λ is the constant that normalizes G(x, y) so that ∬ G(x, y) dx dy = 1;

a contrast equalization step: performing contrast equalization on the lip image to improve the illumination distribution of the lip image.
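The log-domain Retinex decomposition of formula (3) can be sketched in plain Python as below. Several things here are assumptions rather than details taken from the claims: the explicit Gaussian form exp(-(x² + y²)/σ²), which this section does not reproduce, the truncation of the kernel at 3σ, the border replication, the small offset `eps` that guards against log(0), and all function names. Passing several values of σ with weights gives the weighted multi-scale sum; a single σ reduces to formula (3).

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian weights normalised to sum to 1 (discrete analogue of the unit integral of G)."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(k * row[min(max(x + i - radius, 0), width - 1)] for i, k in enumerate(kernel))
            for x in range(width)
        ]
        for row in image
    ]


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_surround(image, sigma):
    """S convolved with G, computed as two 1-D passes (the 2-D Gaussian is separable)."""
    kernel = gaussian_kernel(sigma)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def retinex(image, sigmas, weights=None, eps=1.0):
    """log R = sum_k w_k * (log S - log(S convolved with G_k)); one sigma gives formula (3)."""
    weights = weights or [1.0 / len(sigmas)] * len(sigmas)
    shifted = [[p + eps for p in row] for row in image]  # eps avoids log(0)
    result = [[0.0] * len(image[0]) for _ in image]
    for sigma, w in zip(sigmas, weights):
        surround = gaussian_surround(shifted, sigma)
        for y, row in enumerate(shifted):
            for x, p in enumerate(row):
                result[y][x] += w * (math.log(p) - math.log(surround[y][x]))
    return result
```

Because the illumination estimate is subtracted in the log domain, multiplying a strictly positive image by a constant (a uniform change in brightness) leaves the output unchanged when `eps=0.0`, which is the property that makes this step useful for de-illumination.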

2. The de-illumination preprocessing method according to claim 1, characterized in that, in the median filtering step, the input lip image is filtered with a 3×3 median filtering template to effectively remove impulse noise;

in the Gamma correction step, the Gamma correction formula is as shown in formula (1):

$$I'(x, y) = \left(\frac{I(x, y)}{255}\right)^{\gamma} \times 255, \quad \gamma \in (0, 1) \tag{1}$$

the value of γ selected in the present invention is 1/2.2;

in the multi-scale Retinex filtering step, the log-domain formula is as shown in formula (4):

$$\log R(x, y) = \sum_{k=1}^{K} \omega_k \left\{ \log S(x, y) - \log\left[S(x, y) \otimes G_k(x, y)\right] \right\} \tag{4}$$

where K = 3, ω₁ = ω₂ = ω₃ = 1/3, and the three scale factors σ are 15, 80 and 250, respectively;

in the contrast equalization step, the contrast equalization formulas are as shown in formulas (5) to (7):

$$I(x, y) = \frac{I(x, y)}{\left(\operatorname{mean}\left(\left|I(x', y')\right|^{a}\right)\right)^{1/a}} \tag{5}$$

$$I(x, y) = \frac{I(x, y)}{\left(\operatorname{mean}\left(\min\left(\tau, \left|I(x', y')\right|\right)^{a}\right)\right)^{1/a}} \tag{6}$$

$$I(x, y) = \tau \, \frac{2}{\pi} \arctan\left(\frac{I(x, y)}{\tau}\right) \tag{7}$$

where a is the image gray-level compression factor, which effectively adjusts the dynamic gray-level range of the lip image, and τ is a threshold that limits excessively large gray values; a = 0.2 and τ = 8; formula (7) is an arctangent transformation, which is nonlinear, and applying it to the lip image effectively normalizes the image to the range (−τ, τ).
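Formulas (1) and (5) to (7) are simple enough to write out directly. The sketch below is a plain-Python illustration using the parameter values given in the claim (γ = 1/2.2, a = 0.2, τ = 8). It assumes that the mean in formulas (5) and (6) is taken over the whole image, which this section does not state explicitly; the function names are hypothetical.

```python
import math


def gamma_correct(image, gamma=1 / 2.2):
    """Formula (1): I' = (I / 255) ** gamma * 255."""
    return [[(p / 255.0) ** gamma * 255.0 for p in row] for row in image]


def _normalise(image, a, clip=None):
    """Divide by mean(|I|^a)^(1/a); when clip is set, use min(clip, |I|) inside the mean.

    Assumption: the mean runs over every pixel of the image.
    """
    values = [abs(p) if clip is None else min(clip, abs(p)) for row in image for p in row]
    scale = (sum(v ** a for v in values) / len(values)) ** (1.0 / a)
    if scale == 0:
        return [row[:] for row in image]  # all-zero image: nothing to rescale
    return [[p / scale for p in row] for row in image]


def contrast_equalize(image, a=0.2, tau=8.0):
    """Apply formulas (5), (6) and (7) in sequence."""
    stage1 = _normalise(image, a)             # formula (5)
    stage2 = _normalise(stage1, a, clip=tau)  # formula (6)
    # Formula (7): the arctangent squashes every value into (-tau, tau).
    return [[tau * (2.0 / math.pi) * math.atan(p / tau) for p in row] for row in stage2]
```

With γ < 1 the Gamma step brightens dark regions while leaving 0 and 255 fixed, and the final arctangent keeps a few extreme pixels from dominating the normalized image.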

3. A lip movement feature extraction method for improving illumination robustness, characterized in that the lip movement feature extraction method comprises the de-illumination preprocessing method according to any one of claims 1 to 2, and further comprises the following steps: P sampling points located on the edge of a circular template are compared with the pixel value at the center of the circle to generate the LBP pattern value; when a sampling point does not fall at the center of a pixel, its value is obtained by bilinear interpolation; after the LBP features of the lip are extracted, they are not used directly as features for recognition; instead, the histogram of the LBP code map is used as the feature vector.
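A circular LBP operator with bilinear interpolation can be sketched as follows. The claim does not fix the starting angle, the bit order, the thresholding convention or the border handling, so the choices below (start at angle 0, bit k for the k-th point, a bit set when the sample is greater than or equal to the center, border pixels skipped) are assumptions, as are the function names.

```python
import math


def bilinear(image, y, x):
    """Interpolated value at a possibly non-integer position (clamped to the image)."""
    h, w = len(image), len(image[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = image[y0][x0] + fx * (image[y0][x1] - image[y0][x0])
    bottom = image[y1][x0] + fx * (image[y1][x1] - image[y1][x0])
    return top + fy * (bottom - top)


def lbp_code(image, cy, cx, points=8, radius=1.0):
    """Compare P samples on a circle of the given radius with the center pixel."""
    center = image[cy][cx]
    code = 0
    for k in range(points):
        angle = 2.0 * math.pi * k / points
        sy = cy - radius * math.sin(angle)
        sx = cx + radius * math.cos(angle)
        # Round away float noise so on-grid samples hit pixel centers exactly.
        value = bilinear(image, round(sy, 9), round(sx, 9))
        if value >= center:
            code |= 1 << k
    return code


def lbp_histogram(image, points=8, radius=1.0):
    """Histogram of LBP codes over all pixels whose sampling circle fits in the image."""
    margin = int(math.ceil(radius))
    hist = [0] * (2 ** points)
    for cy in range(margin, len(image) - margin):
        for cx in range(margin, len(image[0]) - margin):
            hist[lbp_code(image, cy, cx, points, radius)] += 1
    return hist
```

With P = 8 and radius 1, the four diagonal sampling points fall between pixel centers, which is where the bilinear interpolation is actually exercised. The histogram, rather than the raw code map, is what would be passed on as the feature vector.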

4. The lip movement feature extraction method according to claim 3, characterized in that the lip image is first divided into sub-regions, and LBP histogram features are then extracted from each sub-region separately.
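The sub-region step can be illustrated with a short sketch that splits a map of LBP codes into a grid and concatenates the per-block histograms. The grid size, the integer block boundaries and the function name are illustrative assumptions; the claim does not specify how the lip image is partitioned.

```python
def block_histograms(code_map, rows, cols, n_bins):
    """Split a 2-D map of LBP codes into rows x cols blocks and concatenate their histograms."""
    height, width = len(code_map), len(code_map[0])
    feature = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            hist = [0] * n_bins
            for y in range(y0, y1):
                for x in range(x0, x1):
                    hist[code_map[y][x]] += 1
            feature.extend(hist)
    return feature
```

A single histogram over the whole image discards where each pattern occurred; concatenating block histograms keeps coarse spatial layout, at the cost of a feature vector that is `rows * cols` times longer.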

5. The lip movement feature extraction method according to claim 3, characterized in that the uniform pattern of LBP is used, where an LBP uniform pattern is one in which the number of transitions between 0 and 1 in the LBP binary code does not exceed 2.
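The uniform-pattern criterion can be checked with a few lines of Python. The sketch assumes that the binary code is read circularly (the last bit is compared with the first), which is the usual convention, and that all non-uniform codes share one extra histogram bin; neither detail is spelled out in the claim, and the function names are hypothetical.

```python
def transitions(code, points=8):
    """Number of 0/1 changes when the P-bit code is read as a circular string."""
    bits = [(code >> k) & 1 for k in range(points)]
    return sum(bits[k] != bits[(k + 1) % points] for k in range(points))


def is_uniform(code, points=8):
    """A pattern is uniform when it has at most 2 circular transitions."""
    return transitions(code, points) <= 2


def uniform_lookup(points=8):
    """Map each code to a bin: one bin per uniform pattern, one shared bin for the rest."""
    uniform = [c for c in range(2 ** points) if is_uniform(c, points)]
    shared = len(uniform)
    index = {c: i for i, c in enumerate(uniform)}
    return [index.get(c, shared) for c in range(2 ** points)]
```

For P = 8 there are 58 uniform patterns out of 256 codes, so under the shared-bin assumption the histogram shrinks from 256 bins to 59.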

6. A de-illumination preprocessing system, characterized by comprising:

a median filtering module: for denoising the image;

a Gamma correction module: for correcting the gray-level distribution of the lip image;

a multi-scale Retinex filtering module: using a Gaussian surround function, specifically:

$$S(x, y) = R(x, y) \, L(x, y) \tag{2}$$

$$\log R(x, y) = \log\left[\frac{S(x, y)}{L(x, y)}\right] = \log S(x, y) - \log\left[S(x, y) \otimes G(x, y)\right] \tag{3}$$

a contrast equalization module: for performing contrast equalization on the lip image to improve the illumination distribution of the lip image.

7. The de-illumination preprocessing system according to claim 6, characterized in that, in the median filtering module, the input lip image is filtered with a 3×3 median filtering template to effectively remove impulse noise;

in the Gamma correction module, the Gamma correction formula is as shown in formula (1):

$$I'(x, y) = \left(\frac{I(x, y)}{255}\right)^{\gamma} \times 255, \quad \gamma \in (0, 1) \tag{1}$$

the value of γ selected in the present invention is 1/2.2;

in the multi-scale Retinex filtering module, the log-domain formula is as shown in formula (4):

$$\log R(x, y) = \sum_{k=1}^{K} \omega_k \left\{ \log S(x, y) - \log\left[S(x, y) \otimes G_k(x, y)\right] \right\} \tag{4}$$

in the contrast equalization module, the contrast equalization formulas are as shown in formulas (5) to (7):

$$I(x, y) = \frac{I(x, y)}{\left(\operatorname{mean}\left(\left|I(x', y')\right|^{a}\right)\right)^{1/a}} \tag{5}$$

$$I(x, y) = \frac{I(x, y)}{\left(\operatorname{mean}\left(\min\left(\tau, \left|I(x', y')\right|\right)^{a}\right)\right)^{1/a}} \tag{6}$$

$$I(x, y) = \tau \, \frac{2}{\pi} \arctan\left(\frac{I(x, y)}{\tau}\right) \tag{7}$$

8. A lip movement feature extraction system for improving illumination robustness, characterized in that the lip movement feature extraction system comprises the de-illumination preprocessing system according to any one of claims 6 to 7, and is further configured such that: P sampling points located on the edge of a circular template are compared with the pixel value at the center of the circle to generate the LBP pattern value; when a sampling point does not fall at the center of a pixel, its value is obtained by bilinear interpolation; after the LBP features of the lip are extracted, they are not used directly as features for recognition; instead, the histogram of the LBP code map is used as the feature vector.

9. The lip movement feature extraction system according to claim 8, characterized in that the lip image is first divided into sub-regions, and LBP histogram features are then extracted from each sub-region separately.

10. The lip movement feature extraction system according to claim 8, characterized in that the uniform pattern of LBP is used, where an LBP uniform pattern is one in which the number of transitions between 0 and 1 in the LBP binary code does not exceed 2.