CN108717542B

CN108717542B - Method, apparatus and computer-readable storage medium for identifying text region

Info

Publication number: CN108717542B
Application number: CN201810367675.8A
Authority: CN
Inventors: 杨松
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Current assignee: Beijing Xiaomi Mobile Software Co Ltd
Priority date: 2018-04-23
Filing date: 2018-04-23
Publication date: 2020-09-15
Anticipated expiration: 2038-04-23
Also published as: CN108717542A

Abstract

The present disclosure relates to a method, an apparatus, and a computer-readable storage medium for identifying text regions. In this method, feature information of an image to be recognized is first input into a text region recognition model to obtain a first probability and a second probability for each first image region in the image. Image regions containing text are then selected according to the first probability of each first image region. Next, for each selected image region, its second probability is analyzed to determine whether to merge that region with its adjacent image regions. Finally, the text region in the image to be recognized is determined from the merged image regions. Because the text region can be computed directly from the first and second probabilities of each first image region, the method provides text region recognition with high accuracy and fast recognition speed.

Description

Method, apparatus and computer-readable storage medium for identifying text region

Technical Field

The present disclosure relates to the field of image processing, and in particular to a method, an apparatus, and a computer-readable storage medium for identifying text regions.

Background

With the rapid development of Internet technology, the number of pictures on the Internet keeps growing, which makes it inconvenient for users to find the pictures they want. Because pictures contain not only imagery but also some text, people tend to filter for the pictures they want by searching for the text in them. Driven by this trend, OCR (Optical Character Recognition) technology emerged.

OCR refers to technology that recognizes optical characters by means of image processing and pattern recognition. It generally comprises two stages: text region recognition and text recognition. Text region recognition identifies where text is located in an image, and text recognition identifies the text within the text region.

Text regions are usually identified with Adaboost-based methods, which rely mainly on hand-designed features. Because of errors in the hand-designed features, the accuracy of text region recognition with such methods is not high.

Summary of the Invention

To overcome the problems in the related art, the present disclosure provides a method, an apparatus, and a computer-readable storage medium for identifying text regions.

According to a first aspect of the embodiments of the present disclosure, there is provided a method for identifying a text region, including:

inputting feature information of an image to be recognized into a text region recognition model to obtain a first probability and a second probability of each first image region in the image to be recognized, the first probability representing the probability that the first image region is a text region, and the second probability representing the probability that the first image region is connected to an adjacent image region, wherein the text region recognition model is used to identify text regions in images;

determining each first image region whose first probability is greater than a first probability threshold as a second image region;

merging two adjacent second image regions whose second probabilities are both greater than a second probability threshold;

determining the text region in the image to be recognized according to the merged image regions.

Optionally, after determining each first image region whose first probability is greater than the first probability threshold as a second image region, the method further includes:

inputting the second image regions into the text region recognition model to obtain a position offset of each second image region in the image to be recognized;

adjusting the position of each second image region according to its position offset;

wherein merging two adjacent second image regions whose second probabilities are both greater than the second probability threshold includes:

merging two adjacent, position-adjusted second image regions whose second probabilities are both greater than the second probability threshold.

Optionally, determining the text region in the image to be recognized according to the merged image regions includes:

determining the minimum bounding rectangle of the merged image regions;

determining the region occupied by the minimum bounding rectangle as the text region in the image to be recognized.

Optionally, the method further includes:

training a convolutional neural network according to feature information of sample images and the text regions in the sample images to obtain the text region recognition model.

According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for identifying a text region, including:

a probability obtaining module, configured to input feature information of an image to be recognized into a text region recognition model to obtain a first probability and a second probability of each first image region in the image to be recognized, the first probability representing the probability that the first image region is a text region, and the second probability representing the probability that the image region is connected to an adjacent image region, wherein the text region recognition model is used to identify text regions in images;

a first determination module, configured to determine each first image region whose first probability is greater than a first probability threshold as a second image region;

a merging module, configured to merge two adjacent second image regions whose second probabilities are both greater than a second probability threshold;

a second determination module, configured to determine the text region in the image to be recognized according to the merged image regions.

Optionally, the apparatus further includes:

an offset obtaining module, configured to input the second image regions into the text region recognition model to obtain a position offset of each second image region in the image to be recognized;

an adjustment module, configured to adjust the position of each second image region according to its position offset;

wherein the merging module includes:

a merging submodule, configured to merge two adjacent, position-adjusted second image regions whose second probabilities are both greater than the second probability threshold.

Optionally, the merging module includes:

a first determination submodule, configured to determine the minimum bounding rectangle of the merged image regions;

a second determination submodule, configured to determine the region occupied by the minimum bounding rectangle as the text region in the image to be recognized.

Optionally, the apparatus further includes:

a training module, configured to train a convolutional neural network according to feature information of sample images and the text regions in the sample images to obtain the text region recognition model.

According to a third aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the steps of the method for identifying a text region provided by the first aspect of the present disclosure.

In the embodiments of the present disclosure, feature information of the image to be recognized is first input into the text region recognition model to obtain the first probability and the second probability of each first image region in the image. Image regions containing text are then selected according to the first probability of each first image region. Next, for each selected image region, its second probability is analyzed to determine whether to merge that region with its adjacent image regions. Finally, the text region in the image to be recognized is determined from the merged image regions. Because the text region can be computed directly from the first and second probabilities of each first image region, a text region recognition method with high accuracy and fast recognition speed is provided.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

Fig. 1 is a flowchart of a method for identifying a text region according to an exemplary embodiment.

Fig. 2 is another flowchart of a method for identifying a text region according to an exemplary embodiment.

Fig. 3 is another flowchart of a method for identifying a text region according to an exemplary embodiment.

Fig. 4 is another flowchart of a method for identifying a text region according to an exemplary embodiment.

Fig. 5 is a block diagram of an apparatus for identifying a text region according to an exemplary embodiment.

Fig. 6 is another block diagram of an apparatus for identifying a text region according to an exemplary embodiment.

Fig. 7 is a block diagram of the merging module in an apparatus for identifying a text region according to an exemplary embodiment.

Fig. 8 is another block diagram of an apparatus for identifying a text region according to an exemplary embodiment.

Fig. 9 is a block diagram of an apparatus for identifying a text region according to an exemplary embodiment.

Detailed Description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.

An embodiment of the present disclosure provides a method for identifying a text region. Referring to Fig. 1, which is a flowchart of a method for identifying a text region according to an exemplary embodiment, the method provided by the embodiment of the present disclosure includes the following steps.

In step S11, feature information of the image to be recognized is input into the text region recognition model to obtain a first probability and a second probability of each first image region in the image to be recognized. The first probability represents the probability that the first image region is a text region, and the second probability represents the probability that the first image region is connected to an adjacent image region. The text region recognition model is used to identify text regions in images.

In step S12, each first image region whose first probability is greater than a first probability threshold is determined as a second image region.

In step S13, two adjacent second image regions whose second probabilities are both greater than a second probability threshold are merged.

In step S14, the text region in the image to be recognized is determined according to the merged image regions.

A CNN (convolutional neural network) is a feedforward neural network composed mainly of convolutional layers and fully connected layers. The fully connected layers impose a requirement on the size of the input image, whereas in practice different images may have different sizes and fail to meet that requirement, in which case the CNN may be unable to recognize the image. Therefore, before the image to be recognized is input into the fully convolutional network, it must be scaled to meet the requirement of the fully connected layers in the CNN.
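The scaling step described above can be illustrated with a minimal sketch. The patent does not specify an interpolation method, so nearest-neighbor sampling is an assumption here, and the function name `resize_nearest` and the list-of-rows image representation are hypothetical choices made only for illustration.

```python
from typing import List, Sequence


def resize_nearest(image: Sequence[Sequence[int]], out_h: int, out_w: int) -> List[List[int]]:
    """Scale a 2-D image (list of rows) to out_h x out_w by nearest-neighbor sampling.

    Assumption: the patent only says the image is scaled to the size the
    network expects; the interpolation method used here is illustrative.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map each output pixel back to the source pixel it falls on.
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


small = [[1, 2],
         [3, 4]]
scaled = resize_nearest(small, 4, 4)
```

In practice an image library would typically handle this step with a smoother interpolation; the point of the sketch is only that every input is brought to one fixed size before it reaches the network.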

In the field of image processing, the pixel value of each pixel in an image only encodes the basic primary colors and their gray levels and carries no high-level semantic information, where high-level semantic information indicates which objects are in the image, where they are located, and so on. Therefore, after the image to be recognized is scaled, text region recognition cannot be performed on it directly; the high-level semantic information of the image must first be obtained. Specifically, the scaled image can be input into a CNN to obtain a feature matrix corresponding to the image (hereinafter, the feature map). The high-level semantic information of the image is identified from the feature map, and the region where the text is located can then be identified.

In the embodiments of the present disclosure, the feature information of the image to be recognized can be extracted from its feature map and contains the high-level semantic information of the image. First, in step S11, the feature information of the image to be recognized is input into the text region recognition model to obtain the first probability and the second probability of each first image region in the image. The first probability of each first image region represents the probability that the region is a text region, and the second probability of each first image region represents the probability that the region is connected to an adjacent image region. A first image region is any image region in the image to be recognized.

Specifically, multiple anchors are preset in the image to be recognized. Some of them represent regions in the image and are called region anchors; the others indicate whether adjacent regions in the image need to be connected and are called connection anchors. Multiple rotated rectangular regions are preset in the original image, dividing the image to be recognized into multiple image regions. A rotated rectangular region can be expressed as (x, y, w, h, θ), where x and y are the coordinates of its center point, w and h are its width and height, and θ is its rotation angle relative to the horizontal direction. Each region anchor corresponds to one rotated rectangular region, that is, to one image region in the image to be recognized. Optionally, the preset rotated rectangular regions may overlap one another.
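To make the (x, y, w, h, θ) representation concrete, the following sketch lays out region anchors on a regular grid and converts a rotated rectangle to its four corner points. The grid layout, the `stride` and `size` parameters, the use of radians for θ, and all function names are assumptions for illustration; the patent states only that rotated rectangles are preset and may overlap.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class RotatedRect(NamedTuple):
    x: float      # center x
    y: float      # center y
    w: float      # width
    h: float      # height
    theta: float  # rotation relative to the horizontal (radians assumed)


def make_region_anchors(img_w: int, img_h: int, stride: int,
                        size: float, angles: Sequence[float]) -> List[RotatedRect]:
    """Place one square anchor per angle at the center of every grid cell.

    Hypothetical layout: anchors overlap whenever size > stride.
    """
    anchors = []
    for cy in range(stride // 2, img_h, stride):
        for cx in range(stride // 2, img_w, stride):
            for theta in angles:
                anchors.append(RotatedRect(cx, cy, size, size, theta))
    return anchors


def corners(r: RotatedRect) -> List[Tuple[float, float]]:
    """Return the four corner points of a rotated rectangle."""
    c, s = math.cos(r.theta), math.sin(r.theta)
    half = ((-r.w / 2, -r.h / 2), (r.w / 2, -r.h / 2),
            (r.w / 2, r.h / 2), (-r.w / 2, r.h / 2))
    # Rotate each corner offset by theta, then translate to the center.
    return [(r.x + dx * c - dy * s, r.y + dx * s + dy * c) for dx, dy in half]


# A 64x32 image, 16-pixel grid, 24-pixel anchors at two orientations.
anchors = make_region_anchors(64, 32, 16, 24.0, (0.0, math.pi / 4))
```

With a 24-pixel anchor on a 16-pixel grid, neighboring anchors overlap, which matches the optional overlap mentioned above.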

After the feature information of the image to be recognized is input into the text region recognition model, the model evaluates each first image region in the image. The recognition result for each first image region includes a first probability and a second probability, where the first probability represents the probability that the first image region is a text region and the second probability represents the probability that the first image region is connected to an adjacent image region.

Optionally, the text region recognition model in step S11 is a model for identifying text regions in images and can be obtained by training a fully convolutional network. Specifically, the text region recognition model can be obtained by performing step S17. Fig. 2 is another flowchart of a method for identifying a text region according to an exemplary embodiment. As shown in Fig. 2, the method includes step S17 in addition to steps S11 to S14.

In step S17, a convolutional neural network is trained according to feature information of sample images and the text regions in the sample images to obtain the text region recognition model.

The coefficients of a convolutional neural network are usually generated randomly, and with randomly generated coefficients the accuracy of identifying the text region of an image cannot be guaranteed. Therefore, before the convolutional neural network is used for text region recognition, it can be trained according to the user's requirement for text recognition accuracy so as to adjust its coefficients and make the text regions it identifies more accurate.

Specifically, a sample image can be input into the convolutional neural network, which outputs a result sample image containing at least one rectangular region that represents the identified text region. Because the rectangular region is identified using randomly generated coefficients, it may not accurately cover the text region in the sample image. The result sample image is compared with the target sample image (that is, the sample image annotated with a rectangular box, where the region enclosed by the box is the text region in the sample image), and the coefficients of the convolutional neural network are adjusted according to the error between the two so as to reduce that error. This procedure is repeated until the error between the result sample image output by the convolutional neural network and the target sample image meets a preset requirement. The convolutional neural network with the adjusted coefficients is the text region recognition model. The preset requirement is set in advance according to the user's requirement for text recognition accuracy.
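The training procedure above (start from random coefficients, compare output with target, adjust, repeat until the error meets a preset requirement) can be sketched as a loop. The sketch below deliberately swaps the convolutional neural network for a tiny logistic model so that it stays self-contained; the model, the gradient-descent update, the learning rate, and the names `train` and `max_error` are all assumptions and are not taken from the patent.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: Sequence[Sequence[float]], targets: Sequence[float],
          max_error: float = 0.05, lr: float = 0.5,
          max_steps: int = 10000, seed: int = 0) -> Tuple[List[float], float, float]:
    """Adjust randomly initialised coefficients until the error is small enough.

    Stand-in model: one logistic unit instead of a CNN (assumption).
    Stopping rule: mean squared error <= max_error, playing the role of
    the "preset requirement" in the text.
    """
    rng = random.Random(seed)
    n = len(samples[0])
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n)]  # random start
    bias = 0.0
    error = float("inf")
    for _ in range(max_steps):
        preds = [sigmoid(sum(w * x for w, x in zip(weights, s)) + bias)
                 for s in samples]
        error = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)
        if error <= max_error:
            break
        # Gradient of the cross-entropy loss for a logistic unit: (p - t) * x.
        for j in range(n):
            grad = sum((p - t) * s[j]
                       for p, t, s in zip(preds, targets, samples)) / len(targets)
            weights[j] -= lr * grad
        bias -= lr * sum(p - t for p, t in zip(preds, targets)) / len(targets)
    return weights, bias, error


w, b, err = train([[0.0], [0.2], [0.8], [1.0]], [0, 0, 1, 1])
```

A real implementation would train the network with a deep learning framework and a loss defined over region scores, connection scores, and offsets; only the loop structure carries over.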

After step S11, step S12 is performed: the first image regions in the image to be recognized are screened according to the first probability of each first image region determined in step S11. Specifically, in the recognition result output by the text region recognition model, a first image region with a large first probability can be considered to contain text, and one with a small first probability can be considered not to contain text. A first probability threshold can therefore be preset and compared with the first probability of each first image region. Each first image region whose first probability is greater than the first probability threshold is determined to be a region containing text and is selected as a second image region. Optionally, each first image region whose first probability is less than the first probability threshold is determined to be a region that does not contain text and is not processed further.

The first probability threshold thus expresses the following: when the first probability is greater than the threshold, the corresponding first image region contains text; when the first probability is less than the threshold, the corresponding first image region does not contain text.
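The screening in step S12 amounts to a threshold filter, sketched below. The function name, the list representation of the model output, and the default threshold of 0.5 are hypothetical; the patent does not give a threshold value.

```python
from typing import List, Sequence


def select_second_regions(first_probs: Sequence[float],
                          threshold: float = 0.5) -> List[int]:
    """Return the indices of first image regions kept as second image regions.

    A region is kept only when its first probability is strictly greater
    than the threshold, as in step S12. The default 0.5 is an assumption.
    """
    return [i for i, p in enumerate(first_probs) if p > threshold]


kept = select_second_regions([0.9, 0.2, 0.5, 0.7])
```

Regions at or below the threshold are simply left out, which corresponds to "not processed further" in the text.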

An image region selected in step S12 (that is, a second image region) only indicates that the region contains text. The size of the text is not known in advance, whereas the preset rectangular region is fixed, so the rectangular region does not necessarily contain the text completely. In other words, a selected image region may contain only part of the text, with the rest of the text lying in an adjacent image region. Therefore, in the embodiments of the present disclosure, step S13 is performed on the selected image regions to analyze further whether each one contains only part of the text.

In step S13, the second probability of each text-containing image region selected in step S12 is analyzed, and the selected image regions are merged according to it. Specifically, in the recognition result output by the text region recognition model, the second probability of each first image region contains multiple probability values, each indicating whether one of the image regions adjacent to that first image region contains the rest of the text. A large value means the corresponding adjacent image region can be considered to contain part of the same text, so the two regions are connected; a small value means the corresponding adjacent image region can be considered not to contain part of the text, so the two regions are not connected. A second probability threshold can therefore be preset and compared with the second probability values of each image region selected in step S12. Each adjacent image region whose probability value is greater than the second probability threshold is determined to contain part of the text and is merged with the selected image region.
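One way to carry out the merging in step S13 is with a union-find structure, sketched below. It reads "second probabilities both greater than the threshold" as requiring the connection probability in both directions between two neighbors to exceed the threshold; that reading, the dictionary layout `link_prob[(i, j)]`, and the function name are assumptions rather than details given in the patent.

```python
from typing import Dict, Iterable, List, Tuple


def merge_regions(kept: Iterable[int],
                  link_prob: Dict[Tuple[int, int], float],
                  threshold: float = 0.5) -> List[List[int]]:
    """Group second image regions that are connected to each other.

    link_prob[(i, j)] is region i's second probability toward its
    neighbor j (hypothetical layout). Two regions are merged only when
    both were kept in step S12 and both directions exceed the threshold.
    """
    kept = sorted(set(kept))
    parent = {i: i for i in kept}

    def find(i: int) -> int:
        # Follow parent links to the group representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), p in link_prob.items():
        if (i in parent and j in parent and p > threshold
                and link_prob.get((j, i), 0.0) > threshold):
            parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in kept:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


links = {(0, 1): 0.9, (1, 0): 0.8,   # 0 and 1 agree: merged
         (1, 2): 0.9, (2, 1): 0.3,   # 2 does not point back: not merged
         (2, 3): 0.9, (3, 2): 0.9}   # 3 was not kept in step S12: ignored
merged = merge_regions([0, 1, 2, 4], links)
```

Each resulting group is the set of regions that together cover one piece of text; a region with no qualifying neighbor stays as a group of its own.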

The above steps are performed for each text-containing image region selected in step S12. Finally, in step S14, the merged image regions are determined as the text region in the image to be recognized.

Optionally, Fig. 3 is another flowchart of a method for identifying a text region according to an exemplary embodiment. As shown in Fig. 3, after step S12 the method further includes the following steps.

In step S15, the second image regions are input into the text region recognition model to obtain a position offset of each second image region in the image to be recognized.

In step S16, the position of each second image region is adjusted according to its position offset.

Correspondingly, step S13 specifically includes step S131.

In step S131, two adjacent, position-adjusted second image regions whose second probabilities are both greater than the second probability threshold are merged.

In the embodiments of the present disclosure, the sizes of the preset rotated rectangular regions and their positions in the image are fixed, and so is the size of each first image region into which the image to be recognized is divided. Because text is located at different positions in different images, an image region selected in step S12 may cover only a small area of text. In that case, adjusting the position of the image region slightly is enough to enlarge the area of text it covers, which helps identify the text region quickly. Therefore, after a second image region is obtained in step S12, it is input into the text region recognition model to obtain its position offset (Δx, Δy, Δw, Δh, Δθ), and its position is adjusted according to that offset so that a larger part of the region contains text. The position of the adjusted image region is (x+Δx, y+Δy, w+Δw, h+Δh, θ+Δθ).
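The adjustment in step S16 is a component-wise addition of the offset to the region parameters, as the formula (x+Δx, y+Δy, w+Δw, h+Δh, θ+Δθ) states. A minimal sketch follows; the tuple representation and the function name `apply_offset` are illustrative choices.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float, float]  # (x, y, w, h, theta)


def apply_offset(rect: Sequence[float], offset: Sequence[float]) -> Rect:
    """Return (x+dx, y+dy, w+dw, h+dh, theta+dtheta)."""
    if len(rect) != 5 or len(offset) != 5:
        raise ValueError("rect and offset must each have five components")
    x, y, w, h, theta = (r + d for r, d in zip(rect, offset))
    return (x, y, w, h, theta)


adjusted = apply_offset((10.0, 20.0, 8.0, 4.0, 0.0), (1.5, -2.0, 0.5, 0.0, 0.1))
```

The offsets here are added directly, exactly as in the text; whether a given model predicts raw or normalized offsets is outside what the patent specifies.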

在图像区域进行调整之后，根据该位置调整后的图像区域的第二概率，对位置调整后的图像区域进行合并，具体实施方式如前文所述，此处不再赘述。After the image area is adjusted, the position-adjusted image areas are merged according to the second probability of the position-adjusted image area. The specific implementation manner is as described above, and will not be repeated here.

采用上述技术方案，考虑到图像区域中包含文字的区域面积可能较小的问题，在判断该图像区域是否和相邻的图像区域进行合并之前，首先微调该图像区域，在该图像区域中增大覆盖文字的面积，减少区域合并的操作次数，进一步提高了文字区域识别的速度。With the above technical solution, considering that the area of the image area containing text may be small, before judging whether the image area is merged with the adjacent image area, first fine-tune the image area, and increase the size of the image area in the image area. Covers the area of the text, reduces the number of operations for region merging, and further improves the speed of text region recognition.

可选地，图4是根据一示例性实施例示出的一种识别文字区域的方法的另一流程图。如图4所示，图1中步骤S14具体包括以下步骤。Optionally, FIG. 4 is another flowchart of a method for recognizing a text area according to an exemplary embodiment. As shown in FIG. 4 , step S14 in FIG. 1 specifically includes the following steps.

在步骤S141中，确定合并后的图像区域的最小外接矩形。In step S141, the minimum circumscribed rectangle of the merged image area is determined.

In step S142, the region where the minimum circumscribed rectangle is located is determined as the text region in the image to be recognized.

Typically, the image regions adjacent to a given image region lie in the eight directions surrounding it (above, below, left, right, and the four diagonals), and the image region may be merged with the image region in any of those eight directions. The merged region may therefore be irregular in shape. In text recognition, however, a text region is usually represented by a rectangle. Therefore, after the image regions are merged, the merged image region needs to be processed to obtain a rectangular region, so that the text lies within a single rectangular frame.

Specifically, the minimum circumscribed rectangle of the merged image region is first determined from the position of the merged image region; this rectangle contains the text in the image to be recognized. The region where the minimum circumscribed rectangle is located is then determined as the text region in the image to be recognized.
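Steps S141 and S142 can be illustrated with the sketch below. It assumes that the merged image region is represented by the corner points of its constituent regions and that "minimum circumscribed rectangle" means the minimum-area enclosing rectangle, which may be rotated; neither assumption is stated in the disclosure. The function names `convex_hull` and `min_area_rect` are hypothetical. The search relies on the known property that a minimum-area enclosing rectangle has one side collinear with an edge of the convex hull.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def min_area_rect(points):
    """Return (area, corners) of the smallest rectangle enclosing all points."""
    hull = convex_hull(points)
    if len(hull) < 3:
        raise ValueError("need at least three non-collinear points")
    best = None
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit vector along the edge
        vx, vy = -uy, ux                                 # unit normal to the edge
        # Project the hull onto the edge direction and its normal.
        us = [px * ux + py * uy for px, py in hull]
        vs = [px * vx + py * vy for px, py in hull]
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if best is None or area < best[0]:
            extents = ((min(us), min(vs)), (max(us), min(vs)),
                       (max(us), max(vs)), (min(us), max(vs)))
            corners = [(u * ux + v * vx, u * uy + v * vy) for u, v in extents]
            best = (area, corners)
    return best


# Corner points of a merged region shaped like a square rotated by 45 degrees.
diamond = [(1, 0), (2, 1), (1, 2), (0, 1), (1, 1)]
area, corners = min_area_rect(diamond)
print(area, corners)
```

If only upright rectangles are wanted, the same result reduces to taking the minimum and maximum x and y over all corner points.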

Based on the same inventive concept, an embodiment of the present disclosure further provides an apparatus for recognizing a text region. FIG. 5 is a block diagram of an apparatus for recognizing a text region according to an exemplary embodiment. Referring to FIG. 5, the apparatus 500 includes:

a probability obtaining module 501, configured to input feature information of an image to be recognized into a text region recognition model to obtain a first probability and a second probability of each first image region in the image to be recognized, where the first probability represents the probability that the first image region is a text region, the second probability represents the probability that the image region is connected to an adjacent image region, and the text region recognition model is used to recognize text regions in an image;

a first determining module 502, configured to determine each first image region whose first probability is greater than a first probability threshold as a second image region;

a merging module 503, configured to merge two adjacent second image regions whose second probabilities are both greater than a second probability threshold;

a second determining module 504, configured to determine the text region in the image to be recognized according to the merged image region.
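The filtering and merging performed by modules 502 and 503 can be sketched as below. The sketch makes several assumptions that are not in the disclosure: first image regions are laid out on a grid and identified by (row, col), each region carries a single second probability, adjacency means the eight surrounding grid cells, both thresholds default to 0.5, and merging is implemented with a union-find structure. The names `Region` and `find_text_groups` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    row: int        # grid row of the first image region (assumed layout)
    col: int        # grid column of the first image region (assumed layout)
    p_text: float   # first probability: the region is a text region
    p_link: float   # second probability: the region connects to a neighbor


def find_text_groups(regions, text_threshold=0.5, link_threshold=0.5):
    """Group regions into merged image regions; returns sorted lists of (row, col)."""
    # Keep only the second image regions (first probability above threshold).
    kept = {(r.row, r.col): r for r in regions if r.p_text > text_threshold}

    parent = {key: key for key in kept}

    def find(key):
        # Union-find root lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    # Merge two adjacent kept regions when both second probabilities exceed the threshold.
    for (row, col), region in kept.items():
        if region.p_link <= link_threshold:
            continue
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue
                neighbor = kept.get((row + dr, col + dc))
                if neighbor is not None and neighbor.p_link > link_threshold:
                    parent[find((row, col))] = find((row + dr, col + dc))

    groups = {}
    for key in kept:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(group) for group in groups.values())


regions = [
    Region(0, 0, 0.90, 0.8),
    Region(0, 1, 0.95, 0.9),
    Region(0, 2, 0.20, 0.9),  # dropped: first probability too low
    Region(0, 3, 0.90, 0.3),  # kept, but not merged: second probability too low
    Region(1, 2, 0.80, 0.7),
]
groups = find_text_groups(regions)
print(groups)
```

Each resulting group corresponds to one merged image region, from which module 504 would then determine the text region.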

Optionally, FIG. 6 is another block diagram of an apparatus for recognizing a text region according to an exemplary embodiment. As shown in FIG. 6, the apparatus 500 further includes:

an offset obtaining module 505, configured to input the second image regions into the text region recognition model to obtain the position offset of each second image region in the image to be recognized;

an adjusting module 506, configured to adjust the position of each second image region according to the position offset of that second image region.

The merging module 503 includes:

a merging sub-module 5031, configured to merge two adjacent, position-adjusted second image regions whose second probabilities are both greater than the second probability threshold.

Optionally, FIG. 7 is a block diagram of the merging module in an apparatus for recognizing a text region according to an exemplary embodiment. As shown in FIG. 7, the merging module 503 includes:

a first determining sub-module 5032, configured to determine the minimum circumscribed rectangle of the merged image region;

a second determining sub-module 5033, configured to determine the region where the minimum circumscribed rectangle is located as the text region in the image to be recognized.

Optionally, FIG. 8 is another block diagram of an apparatus for recognizing a text region according to an exemplary embodiment. As shown in FIG. 8, the apparatus 500 further includes:

a training module 507, configured to train a convolutional neural network according to feature information of sample images and the text regions in the sample images to obtain the text region recognition model.

As for the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.

The present disclosure further provides a computer-readable storage medium on which computer program instructions are stored. When executed by a processor, the program instructions implement the steps of the method for recognizing a text region provided by the present disclosure.

FIG. 9 is a block diagram of an apparatus 800 for recognizing a text region according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.

Referring to FIG. 9, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

The processing component 802 generally controls the overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 802 may include one or more processors 820 that execute instructions to perform all or some of the steps of the above method for recognizing a text region. In addition, the processing component 802 may include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.

The memory 804 is configured to store various types of data to support operation of the apparatus 800. Examples of such data include instructions for any application or method operating on the apparatus 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.

The power component 806 supplies power to the various components of the apparatus 800. The power component 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 800.

The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the apparatus 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.

The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.

The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect the open/closed state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 may also detect a change in position of the apparatus 800 or of one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method for recognizing a text region.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 804 including instructions, where the instructions are executable by the processor 820 of the apparatus 800 to perform the above method for recognizing a text region. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the present disclosure. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A method for identifying text regions, comprising:

inputting feature information of an image to be recognized into a text region recognition model to obtain a first probability and a second probability of each first image region in the image to be recognized, wherein the first probability represents the probability that the first image region is a text region, the second probability represents the probability that the first image region and an adjacent image region have associated text content, and the text region recognition model is used for recognizing text regions in an image;

determining the first image region having a first probability greater than a first probability threshold as a second image region;

merging two adjacent second image regions whose second probabilities are both greater than a second probability threshold;

determining a text region in the image to be recognized according to the merged image region;

wherein, after determining the first image region having a first probability greater than a first probability threshold as a second image region, the method further comprises:

inputting the second image region into the text region recognition model to obtain a position offset of each second image region in the image to be recognized;

adjusting the position of each second image region according to the position offset of that second image region;

wherein the merging of two adjacent second image regions whose second probabilities are both greater than a second probability threshold comprises:

merging two adjacent, position-adjusted second image regions whose second probabilities are both greater than the second probability threshold.

2. The method according to claim 1, wherein the determining a text region in the image to be recognized according to the merged image region comprises:

determining the minimum circumscribed rectangle of the merged image region;

and determining the region where the minimum circumscribed rectangle is located as the text region in the image to be recognized.

3. The method of claim 1, further comprising:

training a convolutional neural network according to feature information of a sample image and the text region in the sample image to obtain the text region recognition model.

4. An apparatus for recognizing a text region, comprising:

a probability obtaining module configured to input feature information of an image to be recognized into a text region recognition model, and obtain a first probability and a second probability of each first image region in the image to be recognized, wherein the first probability represents a probability that the first image region is a text region, and the second probability represents a probability that the first image region and an adjacent image region have associated text content, and the text region recognition model is used for recognizing the text region in the image;

a first determination module configured to determine the first image region having a first probability greater than a first probability threshold as a second image region;

a merging module configured to merge two adjacent second image regions whose second probabilities are both greater than a second probability threshold;

a second determining module configured to determine a text region in the image to be recognized according to the merged image region;

wherein the apparatus further comprises:

an offset obtaining module configured to input the second image region into the text region recognition model to obtain a position offset of each second image region in the image to be recognized;

an adjusting module configured to adjust a position of each of the second image regions according to a position offset amount of the second image region;

the merging module comprises:

a merging submodule configured to merge two adjacent, position-adjusted second image regions whose second probabilities are both greater than the second probability threshold.

5. The apparatus of claim 4, wherein the merging module comprises:

a first determining submodule configured to determine the minimum circumscribed rectangle of the merged image region;

and a second determining submodule configured to determine the region where the minimum circumscribed rectangle is located as the text region in the image to be recognized.

6. The apparatus of claim 4, further comprising:

a training module configured to train a convolutional neural network according to feature information of a sample image and the text region in the sample image to obtain the text region recognition model.

7. An apparatus for recognizing a text region, comprising:

a processor;

a memory for storing processor-executable instructions;

wherein the processor is configured to:

wherein, after determining the first image region having a first probability greater than a first probability threshold as a second image region, the processor is further to:

8. A computer-readable storage medium, on which computer program instructions are stored, which program instructions, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 3.