CN111563505A

CN111563505A - Character detection method and device based on pixel segmentation and merging

Info

Publication number: CN111563505A
Application number: CN201910114195.5A
Authority: CN
Inventors: 田伟伟; 董健; 颜水成; 卢禹锟
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2019-02-14
Filing date: 2019-02-14
Publication date: 2020-08-21

Abstract

The present invention provides a method and device for text detection based on pixel segmentation and merging. The method includes: extracting feature information of a picture to be detected and generating a feature map corresponding to the picture according to the extracted feature information; performing pixel segmentation on the feature map to obtain multiple pixels and analyzing the confidence score that each pixel is a text pixel; extracting the position information, within the text box to which it belongs, of each pixel whose score falls within a preset confidence score range, and forming multiple connected domains of pixels according to the extracted position information; and merging the pixels in the same connected domain and using the merged pixels to determine the text area on the picture to be detected. The invention first performs pixel segmentation on the feature map of the picture to be detected, then forms multiple connected domains according to the position information of the text pixels, and merges the pixels in each connected domain, so that the text area on the picture can be determined effectively and accurately from the merged pixels.

Description

A method and device for character detection based on pixel segmentation and merging

Technical field

The present invention relates to the technical field of text detection, and in particular to a text detection method and device based on pixel segmentation and merging.

Background

In the prior art, general-purpose text detection mainly covers text detection in natural scenes and text detection in scanned documents. Two approaches are currently common. The first is text detection based on object detection: a convolutional neural network extracts, for each point of feature maps at different scales, a confidence and text box coordinates, and all candidate boxes are then deduplicated by non-maximum suppression (NMS). The second performs pixel segmentation with a convolutional neural network, regressing for each pixel the confidence that it belongs to text together with the text box coordinates.

However, both approaches have problems that keep them from serving well as general-purpose text detection. The first cannot accurately fit pixel coordinate positions when the text is rotated by a large angle; the second cannot detect long text completely because of the limited size of the convolution kernel.

Summary of the invention

In view of the above problems, the present invention is proposed to provide a text detection method and device based on pixel segmentation and merging that overcome the above problems or at least partially solve them.

According to one aspect of the present invention, a text detection method based on pixel segmentation and merging is provided, comprising:

extracting feature information of a picture to be detected, and generating a feature map corresponding to the picture to be detected according to the extracted feature information;

performing pixel segmentation on the feature map to obtain a plurality of pixels, and analyzing the confidence score that each pixel is a text pixel;

extracting the position information, within the text box to which they belong, of the pixels within a preset confidence score range, and forming a plurality of connected domains of pixels according to the extracted position information;

merging the pixels in the same connected domain, and using the merged pixels to determine the text area on the picture to be detected.

Optionally, extracting the feature information of the picture to be detected and generating the corresponding feature map according to the extracted feature information comprises:

extracting the feature information of the picture to be detected based on a deep learning model with a UNet network structure;

performing up-sampling, down-sampling, and the corresponding convolution operations on the picture to be detected according to the extracted feature information, to obtain the feature map corresponding to the picture to be detected.

Optionally, the larger the confidence score that a pixel is a text pixel, the higher the probability that it is a text pixel and the lower the probability that it belongs to the picture background;

the smaller the confidence score that a pixel is a text pixel, the lower the probability that it is a text pixel and the higher the probability that it belongs to the picture background.

Optionally, extracting the position information, within the text box to which they belong, of the pixels within the preset confidence score range comprises:

obtaining, from the plurality of pixels obtained by segmentation, the pixels whose confidence score is greater than a preset score;

extracting the coordinate values, within the text box to which they belong, of the pixels whose score is greater than the preset score.

Optionally, forming a plurality of connected domains of pixels according to the extracted position information comprises:

determining, according to the coordinate values of any extracted pixel, whether that pixel is cascaded with a pixel in a specified direction within the text box to which it belongs;

if so, cascading that pixel with the pixel in the specified direction within the text box to which it belongs;

forming a plurality of connected domains from the mutually cascaded pixels, wherein the pixels belonging to the same connected domain are cascaded with one another.

Optionally, binarization is performed on the feature map, and the plurality of connected domains of pixels are formed in the binarized image according to the extracted position information.

Optionally, merging the pixels in the same connected domain and using the merged pixels to determine the text area on the picture to be detected comprises:

merging the pixels in the same connected domain, and locating the position information of the corresponding text box according to the position information of the merged pixels;

determining the text area on the picture to be detected according to the position information of the text box.

According to another aspect of the present invention, a text detection device based on pixel segmentation and merging is provided, comprising:

a generating module, adapted to extract feature information of a picture to be detected, and to generate a feature map corresponding to the picture to be detected according to the extracted feature information;

an analysis module, adapted to perform pixel segmentation on the feature map to obtain a plurality of pixels, and to analyze the confidence score that each pixel is a text pixel;

a forming module, adapted to extract the position information, within the text box to which they belong, of the pixels within a preset confidence score range, and to form a plurality of connected domains of pixels according to the extracted position information;

a merging module, adapted to merge the pixels in the same connected domain, and to use the merged pixels to determine the text area on the picture to be detected.

Optionally, the generating module is further adapted to:

Optionally, the forming module includes an obtaining unit and an extraction unit;

the obtaining unit is adapted to obtain, from the plurality of pixels obtained by segmentation, the pixels whose confidence score is greater than a preset score;

the extraction unit is adapted to extract the coordinate values, within the text box to which they belong, of the pixels whose score is greater than the preset score.

Optionally, the forming module further includes:

a judging unit, adapted to determine, according to the coordinate values of any extracted pixel, whether that pixel is cascaded with a pixel in a specified direction within the text box to which it belongs;

a forming unit, adapted to cascade that pixel with the pixel in the specified direction within the text box to which it belongs if the judging unit determines that the two are cascaded;

Optionally, the forming module is further adapted to:

Optionally, the merging module is further adapted to:

According to a further aspect of the present invention, a computer storage medium is also provided. The computer storage medium stores computer program code which, when run on a computing device, causes the computing device to perform the text detection method based on pixel segmentation and merging described in any of the above embodiments.

According to yet another aspect of the present invention, a computing device is also provided, comprising: a processor; and a memory storing computer program code. When the computer program code is run by the processor, it causes the computing device to perform the text detection method based on pixel segmentation and merging described in any of the above embodiments.

In the embodiment of the present invention, feature information of the picture to be detected is extracted and a feature map corresponding to the picture is generated according to the extracted feature information. Pixel segmentation is then performed on the feature map to obtain a plurality of pixels, and the confidence score that each pixel is a text pixel is analyzed. The position information, within the text box to which they belong, of the pixels within the preset confidence score range is extracted, a plurality of connected domains of pixels are formed according to that position information, the pixels in the same connected domain are merged, and the merged pixels are used to determine the text area on the picture to be detected. Thus the embodiment first performs pixel segmentation on the feature map of the picture to be detected, then forms multiple connected domains according to the position information of the text pixels and merges the pixels in each connected domain, with each connected domain corresponding to one piece of text, so that the text area on the picture can be determined effectively and accurately from the merged pixels. Furthermore, because the invention determines the text area from connected domains formed between pixels, rotated text and long text on the picture to be detected can also be detected effectively.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.

The above and other objects, advantages, and features of the present invention will become clearer to those skilled in the art from the following detailed description of specific embodiments in conjunction with the accompanying drawings.

Description of the drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:

FIG. 1 shows a schematic flowchart of a text detection method based on pixel segmentation and merging according to an embodiment of the present invention;

FIG. 2 shows a schematic diagram of a text detection process based on pixel segmentation and merging according to an embodiment of the present invention;

FIG. 3 shows a schematic structural diagram of a text detection device based on pixel segmentation and merging according to an embodiment of the present invention; and

FIG. 4 shows a schematic structural diagram of a text detection device based on pixel segmentation and merging according to another embodiment of the present invention.

Detailed description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.

To solve the above technical problem, an embodiment of the present invention provides a text detection method based on pixel segmentation and merging. FIG. 1 shows a schematic flowchart of the method according to an embodiment of the present invention. Referring to FIG. 1, the method includes at least steps S102 to S108.

Step S102: extract feature information of the picture to be detected, and generate a feature map corresponding to the picture to be detected according to the extracted feature information.

Step S104: perform pixel segmentation on the feature map to obtain a plurality of pixels, and analyze the confidence score that each pixel is a text pixel.

Step S106: extract the position information, within the text box to which they belong, of the pixels within the preset confidence score range, and form a plurality of connected domains of pixels according to the extracted position information.

Step S108: merge the pixels in the same connected domain, and use the merged pixels to determine the text area on the picture to be detected.

In the embodiment of the present invention, the feature information of the picture to be detected is first extracted and a feature map corresponding to the picture is generated according to the extracted feature information. Pixel segmentation is then performed on the feature map to obtain a plurality of pixels, and the confidence score that each pixel is a text pixel is analyzed. The position information, within the text box to which they belong, of the pixels within the preset confidence score range is extracted, a plurality of connected domains of pixels are formed according to that position information, the pixels in the same connected domain are merged, and the merged pixels are used to determine the text area on the picture to be detected. Thus the embodiment forms multiple connected domains according to the position information of the pixels obtained by segmenting the picture to be detected, and then merges the pixels in each connected domain so that each connected domain corresponds to one piece of text, which allows the text area on the picture to be determined effectively and accurately from the merged pixels. Furthermore, because the invention determines the text area from connected domains formed between pixels, rotated text and long text on the picture to be detected can also be detected effectively.

Referring to step S102 above, in an embodiment of the present invention, the feature information of the picture to be detected may be extracted based on a deep learning model with a UNet network structure. Up-sampling, down-sampling, and the corresponding convolution operations are then performed on the picture according to the extracted feature information to obtain its feature map; that is, the feature map is obtained by processing the picture with convolution kernels.
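As a rough illustration of what down-sampling and up-sampling do to resolution (not the UNet model itself, whose layers and weights the text does not specify), the following pure-Python sketch applies 2x2 max pooling and nearest-neighbour up-sampling to a small grid. The function names, the choice of max pooling, and the sample values are assumptions made for illustration only.

```python
def max_pool_2x2(grid):
    """Downsample by taking the maximum of each non-overlapping 2x2 window."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            max(grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def upsample_nearest_2x(grid):
    """Upsample by repeating every value twice along both axes."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # second copy of the widened row
    return out


# Hypothetical 4x4 single-channel input.
image = [
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 2, 7, 8],
]
pooled = max_pool_2x2(image)            # half the resolution
restored = upsample_nearest_2x(pooled)  # back to the input resolution
```

In a UNet-style model, learned convolutions would sit between such resolution changes; here only the resizing itself is shown.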

The feature map here may be a high-level semantic feature map, which is essentially a matrix. For example, if the high-level semantic feature map is a three-dimensional matrix of size 100 × 100 × 10, each 100 × 100 matrix within it can be regarded as one feature map.
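The 100 × 100 × 10 example can be made concrete with nested lists. This is only a sketch of the data layout, assuming the last axis indexes the ten individual maps; the zero values and the helper name `channel_slice` are placeholders.

```python
HEIGHT, WIDTH, CHANNELS = 100, 100, 10

# feature_map[row][col][channel], filled with zeros as placeholder values.
feature_map = [
    [[0.0] * CHANNELS for _ in range(WIDTH)]
    for _ in range(HEIGHT)
]


def channel_slice(fmap, channel):
    """Return the 2D map (height x width) stored in one channel."""
    return [[cell[channel] for cell in row] for row in fmap]


# One of the ten 100 x 100 feature maps.
single_map = channel_slice(feature_map, 3)
```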

In an embodiment of the present invention, within a given score range, the larger the confidence score that a pixel is a text pixel, the higher the probability that it is a text pixel and the lower the probability that it belongs to the picture background. Conversely, the smaller the confidence score, the lower the probability that it is a text pixel and the higher the probability that it belongs to the picture background.

For example, with scores in the range 0 to 1, a confidence score of 0 means the pixel belongs entirely to the background and not to text, and a confidence score of 1 means the pixel belongs entirely to text and not to the background. The closer the confidence score is to 1, the more likely the pixel is a text pixel and the less likely it belongs to the background; the closer the score is to 0, the more likely the pixel belongs to the background and the less likely it is a text pixel. Suppose a pixel's confidence score is found to be 0.1; the pixel then very probably belongs to the background and only with small probability to text.

In this embodiment, a text pixel is a pixel representing a point on a piece of text. Generally, if the image to be detected contains text, that text is formed by a combination of multiple text pixels. The probability and the confidence score both indicate how likely a pixel is to be a text pixel.

In an embodiment of the present invention, multiple convolution layers in the UNet-structured deep learning model may be used to perform convolution operations on the extracted feature information of the image to be detected, yielding the confidence score that each pixel is a text pixel. Other network models may of course also be used in embodiments of the present invention to extract the feature map corresponding to the picture to be detected; the type of network model is not specifically limited here.

Referring to step S104 above, in an embodiment of the present invention, a convolutional neural network may be used to segment the feature map into a plurality of pixels and then to analyze the confidence score that each pixel is a text pixel. The convolutional neural network here may be a UNet network; other convolutional neural networks may of course also be used, which is not specifically limited in the embodiments of the present invention.

Referring to step S106 above, in an embodiment of the present invention, when extracting the position information, within the text box to which they belong, of the pixels within the preset confidence score range, the pixels whose confidence score is greater than a preset score are first obtained from the plurality of pixels produced by segmentation; that is, the pixels with a high probability of being text pixels are selected. The coordinate values, within the text box to which they belong, of the pixels whose score is greater than the preset score are then extracted. For example, if the preset confidence score range is greater than 0.7 and less than 1, the preset score is set to 0.7. If a pixel's confidence score is greater than 0.7, the pixel has a high probability of being a text pixel and can be treated as one.
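The selection step described above can be sketched as a simple filter over a score map. The 0.7 threshold is the example value from the text; the function name, the sample scores, and the use of plain (row, column) grid coordinates instead of coordinates relative to a text box are simplifying assumptions.

```python
PRESET_SCORE = 0.7  # example preset score taken from the text


def select_text_pixels(score_map, preset_score=PRESET_SCORE):
    """Return (row, col) coordinates whose confidence exceeds the preset score."""
    return [
        (r, c)
        for r, row in enumerate(score_map)
        for c, score in enumerate(row)
        if score > preset_score
    ]


# Hypothetical per-pixel confidence scores.
score_map = [
    [0.10, 0.80, 0.90],
    [0.05, 0.70, 0.95],
]
text_pixels = select_text_pixels(score_map)
```

Because the comparison is strict, a pixel scoring exactly 0.7 is not selected, which matches the "greater than the preset score" wording.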

Still referring to step S106 above, in an embodiment of the present invention, when forming a plurality of connected domains of pixels according to the extracted position information, the coordinate values of any extracted pixel may be used to determine whether that pixel is cascaded (that is, linked) with a pixel in a specified direction within the text box to which it belongs. If so, the pixel and the pixel in the specified direction are determined to be cascaded. A plurality of connected domains can then be formed from the mutually cascaded pixels, where the pixels belonging to one connected domain are cascaded with one another.
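One common way to group mutually cascaded pixels is a union-find (disjoint-set) structure. The sketch below is an assumption about how the grouping might be implemented, not the method prescribed by the text: it treats two selected pixels as cascaded whenever they are direct neighbours in the up, down, left, or right direction, and all names are hypothetical.

```python
def find(parent, item):
    """Return the root of item's set, compressing the path along the way."""
    root = item
    while parent[root] != root:
        root = parent[root]
    while parent[item] != root:
        parent[item], item = root, parent[item]
    return root


def connected_domains(pixels):
    """Group (row, col) pixels into connected domains using 4-direction links."""
    pixel_set = set(pixels)
    parent = {p: p for p in pixel_set}
    for (r, c) in pixel_set:
        # Checking only right and down visits every 4-neighbour pair once.
        for neighbour in ((r, c + 1), (r + 1, c)):
            if neighbour in pixel_set:
                root_a = find(parent, (r, c))
                root_b = find(parent, neighbour)
                if root_a != root_b:
                    parent[root_b] = root_a  # merge the two sets
    groups = {}
    for p in pixel_set:
        groups.setdefault(find(parent, p), []).append(p)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


domains = connected_domains([(0, 0), (0, 1), (1, 1), (3, 3), (3, 4)])
```

A learned link score, where available, could replace the plain adjacency test inside the loop.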

In this embodiment, the pixels in the specified directions within the text box to which a given pixel belongs may be the four pixels above, below, to the left of, and to the right of that pixel, or they may be the eight pixels in the up, down, left, right, upper-left, upper-right, lower-left, and lower-right directions. The embodiments of the present invention do not specifically limit the specified directions.
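The difference between the four-direction and eight-direction choices can be sketched with two offset tables. The names and the sample pixel set below are hypothetical; the point is only that diagonal neighbours count under the eight-direction variant and not under the four-direction one.

```python
# (row offset, col offset) for up, down, left, right.
FOUR_DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
# The four directions plus the four diagonals.
EIGHT_DIRECTIONS = FOUR_DIRECTIONS + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def linked_neighbours(pixel, text_pixels, directions):
    """Return the text pixels adjacent to `pixel` in the given directions."""
    r, c = pixel
    return [
        (r + dr, c + dc)
        for dr, dc in directions
        if (r + dr, c + dc) in text_pixels
    ]


text_pixels = {(1, 1), (1, 2), (2, 2), (0, 0)}
four = linked_neighbours((1, 1), text_pixels, FOUR_DIRECTIONS)
eight = linked_neighbours((1, 1), text_pixels, EIGHT_DIRECTIONS)
```

With four directions, pixel (1, 1) is adjacent only to (1, 2); with eight directions, the diagonal pixels (0, 0) and (2, 2) are included as well.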

In an embodiment of the present invention, when forming a plurality of connected domains of pixels according to the extracted position information, the feature map may also first be binarized, so that the connected domains are formed in the binarized image according to the extracted position information.
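A minimal sketch of this variant, assuming the thing being binarized is a per-pixel score map and that connected domains are then found by a 4-direction flood fill over the resulting 0/1 mask. The threshold value, function names, and sample scores are illustrative assumptions.

```python
def binarize(score_map, threshold=0.7):
    """Map each score to 1 (text) if it exceeds the threshold, else 0 (background)."""
    return [[1 if s > threshold else 0 for s in row] for row in score_map]


def label_domains(mask):
    """Label each 4-connected region of 1s with a distinct positive integer."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                next_label += 1
                labels[r][c] = next_label
                stack = [(r, c)]
                while stack:  # iterative flood fill from the seed pixel
                    cr, cc = stack.pop()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if (0 <= nr < rows and 0 <= nc < cols
                                and mask[nr][nc] == 1 and labels[nr][nc] == 0):
                            labels[nr][nc] = next_label
                            stack.append((nr, nc))
    return labels, next_label


score_map = [
    [0.9, 0.8, 0.1, 0.2],
    [0.1, 0.2, 0.1, 0.9],
    [0.3, 0.1, 0.8, 0.95],
]
mask = binarize(score_map)
labels, count = label_domains(mask)
```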

Referring to step S108 above, in an embodiment of the present invention, after the pixels in the same connected domain are merged, the position information of the corresponding text box can be located according to the position information of the merged pixels, and the text area on the picture to be detected can then be determined according to the position information of the text box. For example, the position of the text box is located according to the coordinate values of the merged pixels relative to the text box to which they belong. One piece of text corresponds to one text box, and the determined text area is a combination of multiple text boxes.

In this embodiment, the position information of a piece of text can be defined by the position information of the text box surrounding it. The text box is usually rectangular, for example the minimum bounding rectangle of the text. When locating the position information of the corresponding text box, the coordinates of the four vertices of the text box may be used to represent its position; that is, the position information is given by outputting the coordinate values of the four vertices. Alternatively, the position may be represented by the coordinates of one vertex of the text box together with the width and height of the text box.
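The two box representations mentioned above (four vertices, or one vertex plus width and height) can be sketched as follows. This simplified example computes an axis-aligned bounding rectangle from the merged pixels of one connected domain; a rotated minimum-area rectangle would need a different computation. Pixels are given as (x, y) and treated as unit cells, and all function names are hypothetical.

```python
def bounding_box(pixels):
    """Axis-aligned bounding rectangle of (x, y) pixels as (x, y, width, height)."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    x_min, y_min = min(xs), min(ys)
    # +1 because each pixel is treated as a unit cell.
    return (x_min, y_min, max(xs) - x_min + 1, max(ys) - y_min + 1)


def box_to_vertices(box):
    """Convert (x, y, width, height) to four corners, clockwise from top-left."""
    x, y, w, h = box
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


def vertices_to_box(vertices):
    """Convert four axis-aligned corners back to (x, y, width, height)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


# Hypothetical merged pixels of one connected domain.
domain = [(2, 3), (3, 3), (4, 3), (5, 4)]
box = bounding_box(domain)
vertices = box_to_vertices(box)
```

The two forms carry the same information for an axis-aligned box, so converting to vertices and back returns the original tuple.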

After the text area on the picture has been detected, in an embodiment of the present invention, the text in the text area may be further recognized, for example by OCR (Optical Character Recognition), to determine which text the text area in the picture actually contains. Organizations concerned with network security can use this approach to determine whether an image contains illegal text. For example, for a picture containing text related to pornography, gambling, or drugs, the solution of the present invention can be used to detect quickly and accurately the text areas in the picture that involve such text, and OCR can then be used to recognize the specific text content. For pictures that do contain illegal text, the publishing website or the source of the picture can be traced so that appropriate measures can be taken.

Referring to FIG. 2, the text detection method of the embodiment of the present invention is now described using a UNet-structured deep learning model as an example.

After the picture to be detected (the original picture) is input into the UNet-structured deep learning model, the model extracts feature information from the picture. The extracted feature information is then input into a convolution unit, which performs up-sampling, down-sampling, and the corresponding convolution operations on the picture according to the extracted feature information, thereby obtaining the feature map corresponding to the picture. In FIG. 2, conv block pool denotes the convolution operation corresponding to down-sampling, and conv block uppool denotes the convolution operation corresponding to up-sampling. After pixel segmentation is performed on the feature map to obtain a plurality of pixels, the confidence score that each pixel is a text pixel can be analyzed (link score in FIG. 2). Here, score denotes the confidence score of a text pixel, and the 4 after link score denotes the confidence scores of the four pixels, in the up, down, left, and right directions within its text box, that are cascaded with that text pixel. The position information of the pixels whose confidence score is greater than the preset score, relative to the text box to which they belong, is then extracted. Here, the 5 after text box denotes five values: the distances from a pixel to the top, bottom, left, and right borders of the text box, and the rotation angle of the text box.
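The text names the five text box values (four border distances and a rotation angle) but does not spell out how they are turned into box corners. One plausible decoding, offered here purely as an assumption, builds the box in its own frame around the pixel and then rotates it about that pixel. The function name, argument order, the use of radians, and the image convention with y pointing down are all illustrative choices.

```python
import math


def decode_box(px, py, top, bottom, left, right, angle):
    """Corners of a rotated box from one pixel's border distances and angle (radians)."""
    # Corners in the box's own frame: pixel at the origin, y pointing down.
    local = [(-left, -top), (right, -top), (right, bottom), (-left, bottom)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotate each corner about the pixel, then translate to image coordinates.
    return [
        (px + x * cos_a - y * sin_a, py + x * sin_a + y * cos_a)
        for x, y in local
    ]


# Unrotated case: the box simply extends by the four distances.
corners = decode_box(10, 20, top=2, bottom=3, left=4, right=6, angle=0.0)
# Same distances with the box rotated by a quarter turn about the pixel.
rotated = decode_box(10, 20, top=2, bottom=3, left=4, right=6, angle=math.pi / 2)
```

Whatever the angle, the box keeps a width of left + right and a height of top + bottom; only its orientation about the pixel changes.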

Based on the same inventive concept, an embodiment of the present invention further provides a text detection apparatus based on pixel segmentation and merging. FIG. 3 is a schematic structural diagram of the text detection apparatus based on pixel segmentation and merging according to one embodiment of the present invention. Referring to FIG. 3, the apparatus 300 includes a generating module 310, an analysis module 320, a forming module 330, and a merging module 340.

The functions of the components of the text detection apparatus 300 based on pixel segmentation and merging, and the connections between them, are described below:

The generating module 310 is adapted to extract feature information of the picture to be detected and to generate a feature map corresponding to the picture to be detected according to the extracted feature information;

The analysis module 320, coupled to the generating module 310, is adapted to perform pixel segmentation on the feature map to obtain a plurality of pixel points and to analyze the confidence score of each pixel point belonging to a text pixel point;

The forming module 330, coupled to the analysis module 320, is adapted to extract the position information, within the text box to which they belong, of the pixel points within the preset confidence score range, and to form a plurality of connected domains of pixel points according to the extracted position information;

The merging module 340, coupled to the forming module 330, is adapted to merge the pixel points in the same connected domain and to determine the text area on the picture to be detected by using the merged pixel points.
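The coupling of the four modules can be pictured as a simple pipeline in which each stage consumes the output of the previous one. The following Python sketch is only an illustration of that wiring; the class name `TextDetector` and the callable-based design are hypothetical and are not taken from the source.

```python
from typing import Any, Callable


class TextDetector:
    """Chains four stages in the order generate -> analyze -> form -> merge."""

    def __init__(
        self,
        generate: Callable[[Any], Any],   # picture -> feature map (module 310)
        analyze: Callable[[Any], Any],    # feature map -> per-pixel scores (module 320)
        form: Callable[[Any], Any],       # scores -> connected domains (module 330)
        merge: Callable[[Any], Any],      # connected domains -> text areas (module 340)
    ) -> None:
        self.generate = generate
        self.analyze = analyze
        self.form = form
        self.merge = merge

    def detect(self, picture: Any) -> Any:
        feature_map = self.generate(picture)
        scores = self.analyze(feature_map)
        domains = self.form(scores)
        return self.merge(domains)
```

Passing the stages in as callables keeps the sketch independent of any particular model or library; in practice each stage would wrap the corresponding module's real implementation.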

In an embodiment of the present invention, the generating module 310 is further adapted to extract the feature information of the picture to be detected based on a deep learning model with a UNet network structure, and to perform up-sampling, down-sampling, and the corresponding convolution operations on the picture to be detected according to the extracted feature information, to obtain the feature map corresponding to the picture to be detected.

In an embodiment of the present invention, the greater the confidence score of a pixel point belonging to a text pixel point, the greater the probability that it is a text pixel point and the smaller the probability that it belongs to the picture background;

The smaller the confidence score of a pixel point belonging to a text pixel point, the smaller the probability that it is a text pixel point and the greater the probability that it belongs to the picture background.

An embodiment of the present invention further provides another text detection apparatus based on pixel segmentation and merging. FIG. 4 is a schematic structural diagram of the text detection apparatus based on pixel segmentation and merging according to another embodiment of the present invention. Referring to FIG. 4, the forming module 330 of the apparatus 300 further includes an acquiring unit 331, an extracting unit 332, a judging unit 333, and a forming unit 334.

The acquiring unit 331 is adapted to acquire, from the plurality of pixel points obtained by segmentation, the pixel points whose confidence scores are greater than the preset score.

The extracting unit 332, coupled to the acquiring unit 331, is adapted to extract the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.

The judging unit 333, coupled to the extracting unit 332, is adapted to judge, according to the extracted coordinate values of any pixel point, whether that pixel point is cascaded with the pixel point in a specified direction within the text box to which it belongs.

The forming unit 334, coupled to the judging unit 333, is adapted to cascade that pixel point with the pixel point in the specified direction within the text box to which it belongs if the judging unit 333 determines that the two are cascaded, and then to form a plurality of connected domains according to the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are cascaded with each other.
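The cooperation of units 331 to 334 can be sketched as thresholding followed by a union-find grouping over the four link directions. This is a minimal Python sketch under stated assumptions: the source does not name a grouping algorithm, so union-find is a substituted technique, and the function name `form_connected_domains`, both thresholds, and the direction order (up, down, left, right) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)

# Offsets for the four link directions, in the assumed order up, down, left, right.
OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def form_connected_domains(
    scores: Sequence[Sequence[float]],
    links: Sequence[Sequence[Sequence[float]]],
    score_thr: float = 0.5,
    link_thr: float = 0.5,
) -> List[List[Pixel]]:
    """Group pixels above score_thr that are cascaded via link scores above link_thr."""
    # Acquiring/extracting step: keep coordinates of pixels above the preset score.
    kept = {
        (r, c)
        for r, row in enumerate(scores)
        for c, s in enumerate(row)
        if s > score_thr
    }
    parent: Dict[Pixel, Pixel] = {p: p for p in kept}

    def find(p: Pixel) -> Pixel:
        # Find the set representative, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    # Judging/forming step: cascade a pixel with a kept neighbor when the link is strong.
    for (r, c) in kept:
        for k, (dr, dc) in enumerate(OFFSETS):
            q = (r + dr, c + dc)
            if q in kept and links[r][c][k] > link_thr:
                parent[find(q)] = find((r, c))

    groups: Dict[Pixel, List[Pixel]] = {}
    for p in sorted(kept):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())
```

In this sketch a link in either direction is enough to cascade two pixels; requiring both pixels to agree would be an equally plausible reading of the text.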

In an embodiment of the present invention, the forming module 330 is further adapted to binarize the feature map and to form a plurality of connected domains of pixel points in the binarized image according to the extracted position information of the pixel points.
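Binarization followed by connected domain formation can be sketched with a threshold and a breadth-first search. The following Python sketch assumes a single-channel score map, a threshold of 0.5, and 4-connectivity; none of these are stated in the source, and the names `binarize` and `label_domains` are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]  # (row, col)


def binarize(feature_map: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Map every value above the threshold to 1 (text) and everything else to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in feature_map]


def label_domains(binary: Sequence[Sequence[int]]) -> List[List[Pixel]]:
    """Collect 4-connected groups of 1-pixels using breadth-first search."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = set()
    domains: List[List[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] == 0 or (r, c) in seen:
                continue
            queue = deque([(r, c)])
            seen.add((r, c))
            domain: List[Pixel] = []
            while queue:
                y, x = queue.popleft()
                domain.append((y, x))
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            domains.append(sorted(domain))
    return domains
```

Diagonal neighbors are not joined under 4-connectivity, so two text regions that touch only at a corner remain separate domains in this sketch.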

In an embodiment of the present invention, the merging module 340 is further adapted to merge the pixel points in the same connected domain, to locate the position information of the corresponding text box according to the position information of the merged pixel points, and to determine the text area on the picture to be detected according to the position information of the text box.
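One simple way to locate a text box from the merged pixel points is to take the smallest axis-aligned rectangle that encloses each connected domain. The Python sketch below is a simplification offered for illustration: it ignores the per-pixel border distances and rotation angle described earlier, and the names `domain_to_box` and `text_regions` and the `min_pixels` filter are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]          # (row, col)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def domain_to_box(domain: Sequence[Pixel]) -> Box:
    """Smallest axis-aligned rectangle enclosing all pixels of one connected domain."""
    rows = [r for r, _ in domain]
    cols = [c for _, c in domain]
    return (min(cols), min(rows), max(cols), max(rows))


def text_regions(domains: Sequence[Sequence[Pixel]], min_pixels: int = 1) -> List[Box]:
    """One box per connected domain, skipping domains smaller than min_pixels."""
    return [domain_to_box(d) for d in domains if len(d) >= min_pixels]
```

Raising `min_pixels` is one possible way to discard isolated pixels that are unlikely to be real text.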

An embodiment of the present invention further provides a computer storage medium storing computer program code which, when run on a computing device, causes the computing device to execute the text detection method based on pixel segmentation and merging of any of the above embodiments.

An embodiment of the present invention further provides a computing device, comprising: a processor; and a memory storing computer program code; wherein the computer program code, when run by the processor, causes the computing device to execute the text detection method based on pixel segmentation and merging of any of the above embodiments.

According to any one of the above preferred embodiments or a combination of several preferred embodiments, the embodiments of the present invention can achieve the following beneficial effects:

Those skilled in the art can clearly understand that, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; for brevity, they are not repeated here.

In addition, the functional units in the embodiments of the present invention may be physically independent of each other, two or more functional units may be integrated together, or all functional units may be integrated into one processing unit. The integrated functional units may be implemented in hardware, or in software or firmware.

Those of ordinary skill in the art can understand that, if the integrated functional unit is implemented in software and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computing device (for example, a personal computer, a server, or a network device) to execute all or some of the steps of the methods of the embodiments of the present invention when running the instructions. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Alternatively, all or some of the steps of the foregoing method embodiments may be carried out by hardware under the control of program instructions (a computing device such as a personal computer, a server, or a network device). The program instructions may be stored in a computer-readable storage medium, and when they are executed by the processor of the computing device, the computing device executes all or some of the steps of the methods of the embodiments of the present invention.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that, within the spirit and principles of the present invention, the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the corresponding technical solutions to depart from the protection scope of the present invention.

An embodiment of the present invention provides A1, a text detection method based on pixel segmentation and merging, comprising:

A2. The method according to A1, wherein extracting the feature information of the picture to be detected and generating the feature map corresponding to the picture to be detected according to the extracted feature information comprises:

A3. The method according to A1 or A2, wherein,

the greater the confidence score of the pixel point belonging to a text pixel point, the greater the probability that it is a text pixel point and the smaller the probability that it belongs to the picture background;

A4. The method according to A1 or A2, wherein extracting the position information, within the text box to which they belong, of the pixel points within the preset confidence score range comprises:

A5. The method according to A4, wherein forming a plurality of connected domains of pixel points according to the extracted position information of the pixel points comprises:

A6. The method according to A1 or A2, wherein forming a plurality of connected domains of pixel points according to the extracted position information of the pixel points comprises:

A7. The method according to A1 or A2, wherein merging the pixel points in the same connected domain and determining the text area on the picture to be detected by using the merged pixel points comprises:

B8. A text detection apparatus based on pixel segmentation and merging, comprising:

B9. The apparatus according to B8, wherein the generating module is further adapted to:

B10. The apparatus according to B8 or B9, wherein,

B11. The apparatus according to B8 or B9, wherein the forming module comprises an acquiring unit and an extracting unit,

B12. The apparatus according to B11, wherein the forming module further comprises:

B13. The apparatus according to B8 or B9, wherein the forming module is further adapted to:

B14. The apparatus according to B8 or B9, wherein the merging module is further adapted to:

C15. A computer storage medium storing computer program code which, when run on a computing device, causes the computing device to execute the text detection method based on pixel segmentation and merging according to any one of A1-A7.

C16. A computing device, comprising: a processor; and a memory storing computer program code; wherein the computer program code, when run by the processor, causes the computing device to execute the text detection method based on pixel segmentation and merging according to any one of A1-A7.

Claims

1. A text detection method based on pixel segmentation and merging, comprising the following steps:

extracting feature information of a picture to be detected, and generating a feature map corresponding to the picture to be detected according to the extracted feature information;

performing pixel segmentation on the feature map to obtain a plurality of pixel points, and analyzing the confidence score of each pixel point belonging to a text pixel point;

extracting the position information, within the text box to which they belong, of the pixel points within a preset confidence score range, and forming a plurality of connected domains of pixel points according to the extracted position information of the pixel points;

and merging the pixel points in the same connected domain, and determining the text area on the picture to be detected by using the merged pixel points.

2. The method according to claim 1, wherein the extracting feature information of the picture to be detected and generating the feature map corresponding to the picture to be detected according to the extracted feature information includes:

extracting the feature information of the picture to be detected based on a deep learning model with a UNet network structure;

and performing up-sampling, down-sampling, and the corresponding convolution operations on the picture to be detected according to the extracted feature information to obtain the feature map corresponding to the picture to be detected.

3. The method of claim 1 or 2, wherein

the greater the confidence score of the pixel point belonging to a text pixel point, the greater the probability that the pixel point is a text pixel point and the smaller the probability that it belongs to the picture background;

the smaller the confidence score of the pixel point belonging to a text pixel point, the smaller the probability that the pixel point is a text pixel point and the greater the probability that it belongs to the picture background.

4. The method according to claim 1 or 2, wherein extracting the position information, within the text box to which they belong, of the pixel points within the preset confidence score range comprises:

acquiring, from the plurality of pixel points obtained by segmentation, the pixel points whose confidence scores are greater than a preset score;

and extracting the coordinate values, within the text box to which they belong, of the pixel points whose confidence scores are greater than the preset score.

5. The method of claim 4, wherein forming the plurality of connected domains of pixel points according to the extracted position information of the pixel points comprises:

judging, according to the extracted coordinate values of any pixel point, whether that pixel point is cascaded with the pixel point in a specified direction within the text box to which it belongs;

if so, cascading that pixel point with the pixel point in the specified direction within the text box to which it belongs;

and forming a plurality of connected domains according to the mutually cascaded pixel points, wherein the pixel points belonging to the same connected domain are cascaded with each other.

6. The method according to claim 1 or 2, wherein forming the plurality of connected domains of pixel points according to the extracted position information of the pixel points comprises:

binarizing the feature map, and forming a plurality of connected domains of pixel points in the binarized image according to the extracted position information of the pixel points.

7. The method according to claim 1 or 2, wherein merging the pixel points in the same connected domain and determining the text area on the picture to be detected by using the merged pixel points comprises:

merging the pixel points in the same connected domain, and locating the position information of the corresponding text box according to the position information of the merged pixel points;

and determining the text area on the picture to be detected according to the position information of the text box.

8. A text detection apparatus based on pixel segmentation and merging, comprising:

a generating module adapted to extract feature information of a picture to be detected and to generate a feature map corresponding to the picture to be detected according to the extracted feature information;

an analysis module adapted to perform pixel segmentation on the feature map to obtain a plurality of pixel points and to analyze the confidence score of each pixel point belonging to a text pixel point;

a forming module adapted to extract the position information, within the text box to which they belong, of the pixel points within a preset confidence score range, and to form a plurality of connected domains of pixel points according to the extracted position information of the pixel points;

and a merging module adapted to merge the pixel points in the same connected domain and to determine the text area on the picture to be detected by using the merged pixel points.

9. A computer storage medium having computer program code stored thereon which, when run on a computing device, causes the computing device to perform the text detection method based on pixel segmentation and merging of any one of claims 1-7.

10. A computing device, comprising: a processor; and a memory storing computer program code; wherein the computer program code, when executed by the processor, causes the computing device to perform the text detection method based on pixel segmentation and merging of any one of claims 1-7.