
CN113673497B - Text detection method, terminal and computer-readable storage medium thereof - Google Patents

Text detection method, terminal and computer-readable storage medium thereof

Info

Publication number
CN113673497B
CN113673497B · Application CN202110827395.2A
Authority
CN
China
Prior art keywords
text
feature
text detection
detection
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110827395.2A
Other languages
Chinese (zh)
Other versions
CN113673497A (en)
Inventor
尹瑾
熊剑平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202110827395.2A
Publication of CN113673497A
Application granted
Publication of CN113673497B
Legal status: Active

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

本发明提供一种文本检测方法、终端及其计算机可读存储介质,该文本检测方法,通过获取到待检测文本图像,对特征区域进行特征提取进而得到特征图;对特征图进行区域检测得到特征图的边界框;对特征图进行文本检测得到特征图中文本的轮廓边框;基于检测得到的边界框和检测得到的轮廓边框确定待检测文本的检测框,进而提升倾斜文本行的检测精确度;且本申请的文本检测方法中不需要对特征区域进行缩边处理,因此大大降低了文本漏检率,进而提高了文本检测的鲁棒性。

The present invention provides a text detection method, a terminal and a computer-readable storage medium. The text detection method obtains a text image to be detected and extracts features from its feature region to obtain a feature map; performs region detection on the feature map to obtain a bounding box of the feature map; performs text detection on the feature map to obtain the outline box of the text in the feature map; and determines the detection box of the text to be detected from the detected bounding box and outline box, which improves the detection accuracy for tilted text lines. Because the method does not require edge-shrinking of the feature region, the missed-detection rate of text is greatly reduced, improving the robustness of text detection.

Description

文本检测方法、终端及其计算机可读存储介质Text detection method, terminal and computer-readable storage medium thereof

技术领域Technical Field

本发明涉及图像识别技术领域,特别是涉及一种文本检测方法、终端及其计算机可读存储介质。The present invention relates to the technical field of image recognition, and in particular to a text detection method, a terminal and a computer-readable storage medium thereof.

背景技术Background Art

随着科技的发展,借助OCR(Optical Character Recognition,光学字符识别)技术自动识别各类票据、证件信息越来越普遍,如在银行办理业务时,需要对身份证信息进行识别;在交警执法过程中,需要对驾驶证、行驶证等证件信息进行识别。With the development of science and technology, it is becoming more and more common to automatically identify various bills and certificate information with the help of OCR (Optical Character Recognition) technology. For example, when handling business in a bank, it is necessary to identify the identity card information; during the law enforcement process of traffic police, it is necessary to identify the information of driver's licenses, vehicle registration certificates and other certificates.

目前,卡证图像主要以手机拍摄为主,图像中的内容人工打印,容易出现背景复杂、文本行倾斜和文本行粘连等问题,传统的OCR检测算法对自然场景下的卡证图像检测鲁棒性不强。At present, card images are mainly taken with mobile phones, and the content in the images is printed manually, which is prone to problems such as complex background, tilted text lines, and text line adhesion. Traditional OCR detection algorithms are not robust for card image detection in natural scenes.

发明内容Summary of the invention

本发明主要解决的技术问题是提供一种文本检测方法、终端及其计算机可读存储介质,解决现有技术中倾斜文本行的检测精确度不佳的问题。The main technical problem solved by the present invention is to provide a text detection method, a terminal and a computer-readable storage medium thereof, so as to solve the problem of poor detection accuracy of inclined text lines in the prior art.

为解决上述技术问题,本发明采用的第一个技术方案是:提供一种文本检测方法,该文本检测方法包括:获取到待检测文本图像,待检测文本图像至少包括待检测文本的特征区域;对特征区域进行特征提取,得到特征图;对特征图进行区域检测,得到特征图的边界框;对特征图进行文本检测,得到特征图中文本的轮廓边框;基于边界框和轮廓边框,确定待检测文本的检测框。To solve the above technical problems, the first technical solution adopted by the present invention is: to provide a text detection method, which includes: obtaining a text image to be detected, the text image to be detected at least includes a feature area of the text to be detected; performing feature extraction on the feature area to obtain a feature map; performing region detection on the feature map to obtain a bounding box of the feature map; performing text detection on the feature map to obtain an outline frame of the text in the feature map; and determining a detection frame of the text to be detected based on the bounding box and the outline frame.
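
As an illustration of how these five steps fit together, the following Python sketch wires them into a single pipeline. Every stage function is a placeholder for the corresponding model unit described later (feature extraction, region detection, text correction, box fusion); none of it is the patented implementation, it only shows the data flow from image to fused detection box.

```python
import numpy as np

def extract_features(region: np.ndarray) -> np.ndarray:
    # Placeholder: a real system would run a CNN backbone here.
    return region.astype(np.float32) / 255.0

def detect_region(feature_map: np.ndarray) -> np.ndarray:
    # Placeholder rectangular bounding box (x1, y1, x2, y2).
    h, w = feature_map.shape[:2]
    return np.array([0.0, 0.0, w - 1.0, h - 1.0], dtype=np.float32)

def detect_text_outline(feature_map: np.ndarray, bbox: np.ndarray) -> np.ndarray:
    # Placeholder quadrilateral outline: four (x, y) corners.
    x1, y1, x2, y2 = bbox
    return np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)

def fuse_boxes(bbox: np.ndarray, quad: np.ndarray) -> np.ndarray:
    # Placeholder fusion; see the corner-distance rules later in the text.
    return quad

def detect_text(image: np.ndarray) -> np.ndarray:
    feature_region = image                          # assume the image is already the feature region
    feature_map = extract_features(feature_region)  # feature extraction
    bbox = detect_region(feature_map)               # region detection -> rectangular bounding box
    quad = detect_text_outline(feature_map, bbox)   # text detection -> quadrilateral outline box
    return fuse_boxes(bbox, quad)                   # final detection box

if __name__ == "__main__":
    print(detect_text(np.zeros((64, 256, 3), dtype=np.uint8)))
```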

其中,对特征区域进行特征提取,得到特征图的步骤之前还包括:对待文本检测图像进行分割,得到特征区域;对特征区域进行校正,以使特征区域处于预设角度。Among them, before the step of extracting features from the feature area to obtain the feature map, the step also includes: segmenting the image to be detected for text to obtain the feature area; and correcting the feature area so that the feature area is at a preset angle.

其中,基于边界框和轮廓边框,确定待检测文本的检测框的步骤具体包括:若确定轮廓边框各角点与边界框的对应角点之间的距离均小于第一阈值,则将边界框作为文本检测框。Among them, the step of determining the detection box of the text to be detected based on the bounding box and the contour frame specifically includes: if it is determined that the distance between each corner point of the contour frame and the corresponding corner point of the bounding box is less than a first threshold, then the bounding box is used as the text detection box.

其中,基于边界框和轮廓边框,确定待检测文本的检测框的步骤具体包括:轮廓边框的长边处于水平方向时,若确定轮廓边框组成短边的两个角点在水平方向上的差值大于第二阈值,则将边界框作为文本检测框;或轮廓边框的长边处于竖直方向时,若确定轮廓边框组成短边的两个角点在竖直方向上的差值大于第二阈值,则将边界框作为文本检测框。Among them, based on the bounding box and the outline frame, the step of determining the detection frame of the text to be detected specifically includes: when the long side of the outline frame is in the horizontal direction, if it is determined that the difference between the two corner points of the short side of the outline frame in the horizontal direction is greater than a second threshold, then the bounding box is used as the text detection box; or when the long side of the outline frame is in the vertical direction, if it is determined that the difference between the two corner points of the short side of the outline frame in the vertical direction is greater than the second threshold, then the bounding box is used as the text detection box.

其中,基于边界框和轮廓边框,确定待检测文本的检测框的步骤具体包括:根据边界框和轮廓边框进行加权融合,确定待检测文本的检测框。The step of determining the detection frame of the text to be detected based on the bounding box and the contour frame specifically includes: performing weighted fusion according to the bounding box and the contour frame to determine the detection frame of the text to be detected.

其中,对特征区域进行特征提取,得到特征图,包括:基于训练后的文本检测模型对特征区域进行特征提取,得到特征图;其中,文本检测模型是对初始文本检测模型进行训练得到的,且初始文本检测模型包括特征提取单元、文本检测单元和文本修正单元。Among them, feature extraction is performed on the feature area to obtain a feature map, including: feature extraction is performed on the feature area based on a trained text detection model to obtain a feature map; wherein the text detection model is obtained by training an initial text detection model, and the initial text detection model includes a feature extraction unit, a text detection unit and a text correction unit.

其中,文本检测模型是通过如下方式获得的:获取训练样本集,训练样本集包括多个包含文本的图像样本和文本标注框;对图像样本中包含文本的第一特征区域进行分割;通过特征提取单元对第一特征区域进行特征提取得到第一特征图;文本检测单元对第一特征图进行区域检测得到第一预测框;文本修正单元对第一特征图进行文本检测得到第二预测框;通过第一预测框与文本标注框、第二预测框与文本标注框构建第一损失函数;利用第一损失函数对初始文本检测模型进行迭代训练得到文本检测模型。Among them, the text detection model is obtained in the following manner: obtaining a training sample set, the training sample set includes multiple image samples containing text and text annotation boxes; segmenting a first feature area containing text in the image sample; extracting features from the first feature area by a feature extraction unit to obtain a first feature map; a text detection unit performs area detection on the first feature map to obtain a first prediction box; a text correction unit performs text detection on the first feature map to obtain a second prediction box; constructing a first loss function through the first prediction box and the text annotation box, and the second prediction box and the text annotation box; and using the first loss function to iteratively train the initial text detection model to obtain a text detection model.

其中,训练样本集还包括文本的真实类别;获得文本检测模型的方式还包括:通过初始文本检测模型检测得到文本的预测类别;根据文本的预测类别与真实类别构建第二损失函数;利用第二损失函数对初始文本检测模型进行迭代训练得到文本检测模型。Among them, the training sample set also includes the true category of the text; the method of obtaining the text detection model also includes: obtaining the predicted category of the text through the initial text detection model detection; constructing a second loss function according to the predicted category and the true category of the text; using the second loss function to iteratively train the initial text detection model to obtain the text detection model.

为解决上述技术问题，本发明采用的第二个技术方案是：提供一种终端，该终端包括存储器、处理器以及存储于存储器中并在处理器上运行的计算机程序，处理器用于实现上述文本检测方法中的步骤。To solve the above technical problems, the second technical solution adopted by the present invention is: to provide a terminal, where the terminal includes a memory, a processor and a computer program stored in the memory and running on the processor, and the processor is used to implement the steps in the above text detection method.

为解决上述技术问题,本发明采用的第三个技术方案是:提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,计算机程序被处理器执行时实现上述文本检测方法中的步骤。In order to solve the above technical problems, the third technical solution adopted by the present invention is: providing a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps in the above text detection method are implemented.

本发明的有益效果是:区别于现有技术的情况,提供的一种文本检测方法、终端及其计算机可读存储介质,通过获取到待检测文本图像,对特征区域进行特征提取进而得到特征图;对特征图进行区域检测得到特征图的边界框;对特征图进行文本检测得到特征图中文本的轮廓边框;基于检测得到的边界框和检测得到的轮廓边框确定待检测文本的检测框,进而提升倾斜文本行的检测精确度;且本申请的文本检测方法中不需要对特征区域进行缩边处理,因此大大降低了文本漏检率,进而提高了文本检测的鲁棒性。The beneficial effects of the present invention are as follows: different from the prior art, a text detection method, a terminal and a computer-readable storage medium thereof are provided, which obtain a text image to be detected, perform feature extraction on a feature area and thereby obtain a feature map; perform area detection on the feature map to obtain a bounding box of the feature map; perform text detection on the feature map to obtain an outline frame of the text in the feature map; determine a detection frame of the text to be detected based on the detected bounding box and the detected outline frame, thereby improving the detection accuracy of inclined text lines; and in the text detection method of the present application, there is no need to perform edge shrinking processing on the feature area, thereby greatly reducing the text missed detection rate and thereby improving the robustness of text detection.

附图说明BRIEF DESCRIPTION OF THE DRAWINGS

图1是本发明提供的文本检测方法的流程示意图;FIG1 is a schematic diagram of a flow chart of a text detection method provided by the present invention;

图2是本发明提供的文本检测方法一具体实施例的流程示意图;FIG2 is a flow chart of a specific embodiment of a text detection method provided by the present invention;

图3是图2提供的文本检测方法中步骤S21一具体实施例的流程示意图;FIG3 is a flow chart of a specific embodiment of step S21 in the text detection method provided in FIG2 ;

图4是本发明提供的一实施例中的驾驶证的局部图像;FIG4 is a partial image of a driver's license in an embodiment of the present invention;

图5是本发明提供的文本检测模型一实施例的框架示意图;FIG5 is a schematic diagram of a framework of an embodiment of a text detection model provided by the present invention;

图6是本发明提供的终端一实施方式的示意框图;FIG6 is a schematic block diagram of an embodiment of a terminal provided by the present invention;

图7是本发明提供的计算机可读存储介质一实施方式的示意框图。FIG. 7 is a schematic block diagram of an embodiment of a computer-readable storage medium provided by the present invention.

具体实施方式DETAILED DESCRIPTION

下面结合说明书附图,对本申请实施例的方案进行详细说明。The scheme of the embodiment of the present application is described in detail below in conjunction with the drawings of the specification.

以下描述中,为了说明而不是为了限定,提出了诸如特定系统结构、接口、技术之类的具体细节,以便透彻理解本申请。In the following description, for the purpose of explanation rather than limitation, specific details such as specific system structures, interfaces, and technologies are provided to facilitate a thorough understanding of the present application.

为使本领域的技术人员更好地理解本发明的技术方案,下面结合附图和具体实施方式对本发明所提供的文本检测方法做进一步详细描述。In order to enable those skilled in the art to better understand the technical solution of the present invention, the text detection method provided by the present invention is further described in detail below in conjunction with the accompanying drawings and specific implementation methods.

请参阅图1,图1是本发明提供的文本检测方法的流程示意图。本实施例中提供一种文本检测方法,该文本检测方法包括如下步骤。Please refer to Figure 1, which is a flow chart of a text detection method provided by the present invention. In this embodiment, a text detection method is provided, and the text detection method includes the following steps.

S11:获取到待检测文本图像。S11: Obtain the text image to be detected.

具体地,待检测文本图像可以是终端设备的摄像头拍摄的图像,或者,也可以是终端设备中存储的图像。这里的终端设备可以指手机、平板电脑等设备,也可以为车辆上的车载设备,本发明实施例中对此并不限定。Specifically, the text image to be detected can be an image captured by a camera of a terminal device, or can also be an image stored in the terminal device. The terminal device here can refer to a device such as a mobile phone, a tablet computer, or a vehicle-mounted device on a vehicle, which is not limited in the embodiments of the present invention.

获取的待检测文本图像可以是任何需要进行文本检测的图像,比如:自然场景的图像,盲人导航时,所拍摄的盲人所在的场景下的方向标识、位置标识等的图像;学生作业图像;身份证、驾驶证等证件的文本检测等等。The acquired text image to be detected can be any image that requires text detection, such as: images of natural scenes, images of direction signs, location signs, etc. in the scene where the blind are located when navigating; images of student homework; text detection of documents such as identity cards and driver's licenses, etc.

S12:对特征区域进行特征提取,得到特征图。S12: Extract features from the feature area to obtain a feature map.

具体地,文本检测模型包括分割校正单元、特征提取单元、文本检测单元和文本修正单元。将待检测文本图像输入文本检测模型中,分割校正单元对待文本检测图像进行识别,并分割得到特征区域;之后对根据检测识别得到的特征区域的长宽比对提取的特征区域进行校正,以使特征区域处于预设角度。也可以根据特征区域中识别得到的条目所分布的位置对提取的特征区域进行校正,以使特征区域处于预设角度。通过对待检测文本图像进行分割校正是为了保证减少背景区域对文本检测的干扰,并保证文本区域相对于整张图像的比例不发生改变。Specifically, the text detection model includes a segmentation correction unit, a feature extraction unit, a text detection unit and a text correction unit. The text image to be detected is input into the text detection model, and the segmentation correction unit recognizes the text detection image and segments it to obtain the feature area; then, the extracted feature area is corrected according to the aspect ratio of the feature area obtained by detection and recognition, so that the feature area is at a preset angle. The extracted feature area can also be corrected according to the distribution position of the items identified in the feature area, so that the feature area is at a preset angle. The purpose of segmenting and correcting the text image to be detected is to ensure that the interference of the background area on the text detection is reduced, and to ensure that the proportion of the text area relative to the entire image does not change.

待特征区域调整至预设角度后,特征提取单元对特征区域进行特征提取进而得到特征区域对应的特征图。在一具体实施例中,当特征区域为卡证区域时,特征提取单元对卡证区域进行特征提取,得到卡证特征图。After the feature area is adjusted to the preset angle, the feature extraction unit extracts features from the feature area to obtain a feature map corresponding to the feature area. In a specific embodiment, when the feature area is a card area, the feature extraction unit extracts features from the card area to obtain a card feature map.

S13:对特征图进行区域检测,得到特征图的边界框。S13: Perform region detection on the feature map to obtain a bounding box of the feature map.

具体地,将特征区域对应的特征图输入到文本检测单元,文本检测单元对特征图进行特征提取后,再进行区域检测,进而得到特征图中文本的边界框。具体地,特征图中文本的边界框为矩形检测框。Specifically, the feature map corresponding to the feature area is input to the text detection unit, and the text detection unit extracts features from the feature map and then performs region detection to obtain a bounding box of the text in the feature map. Specifically, the bounding box of the text in the feature map is a rectangular detection box.

S14:对特征图进行文本检测,得到特征图中文本的轮廓边框。S14: Perform text detection on the feature map to obtain the outline of the text in the feature map.

具体地,将特征区域对应的特征图输入到文本修正单元,文本修正单元对特征图进行特征提取后,再进行文本检测,进而得到特征图中文本的四个边角相对于边界框的四个边角的偏移量,进而得到文本的轮廓边框。其中,特征图中文本的轮廓边框为四边形检测框。Specifically, the feature map corresponding to the feature area is input into the text correction unit, and the text correction unit extracts features from the feature map and then performs text detection, thereby obtaining the offsets of the four corners of the text in the feature map relative to the four corners of the bounding box, and then obtaining the outline of the text. The outline of the text in the feature map is a quadrilateral detection box.
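
The corner-offset step can be made concrete with a short sketch: each of the four corners of the rectangular bounding box is shifted by the predicted (dx, dy) offset to give the quadrilateral outline box. The corner ordering (top-left, top-right, bottom-right, bottom-left) is an assumption chosen for illustration; the patent does not fix it.

```python
import numpy as np

def quad_from_offsets(bbox, offsets):
    """Build the quadrilateral outline box from a rectangular bounding box
    and per-corner offsets.

    bbox:    (x1, y1, x2, y2) rectangle from the text detection unit.
    offsets: (4, 2) array of (dx, dy), one row per corner in the assumed
             order top-left, top-right, bottom-right, bottom-left.
    """
    x1, y1, x2, y2 = bbox
    rect_corners = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]],
                            dtype=np.float32)
    return rect_corners + np.asarray(offsets, dtype=np.float32)

# Example: the right edge of the text is shifted down, i.e. a slightly tilted line.
print(quad_from_offsets((10, 20, 210, 60),
                        [[0, 0], [0, 5], [0, 5], [0, 0]]))
```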

S15:基于边界框和轮廓边框,确定待检测文本的检测框。S15: Determine a detection box of the text to be detected based on the bounding box and the contour frame.

具体地,在一可选实施例中,判断轮廓边框各角点与边界框的对应角点之间的距离是否均小于第一阈值。如果轮廓边框的角点与边界框的对应角点之间的距离均小于第一阈值,则将边界框作为文本检测框。在另一可选实施例中,轮廓边框的长边处于水平方向时,判断轮廓边框组成短边的两个角点在水平方向上的差值是否大于第二阈值。如果组成短边的两个角点在水平方向上的差值大于第二阈值,则将边界框作为文本检测框。在另一可选实施例中,轮廓边框的长边处于竖直方向时,判断轮廓边框组成短边的两个角点在竖直方向上的差值是否大于第二阈值。如果组成短边的两个角点在竖直方向上的差值大于第二阈值,则将边界框作为文本检测框。Specifically, in an optional embodiment, it is determined whether the distances between the corner points of the outline frame and the corresponding corner points of the bounding box are all less than a first threshold. If the distances between the corner points of the outline frame and the corresponding corner points of the bounding box are all less than the first threshold, the bounding box is used as a text detection box. In another optional embodiment, when the long side of the outline frame is in the horizontal direction, it is determined whether the difference in the horizontal direction between the two corner points constituting the short side of the outline frame is greater than a second threshold. If the difference in the horizontal direction between the two corner points constituting the short side is greater than the second threshold, the bounding box is used as a text detection box. In another optional embodiment, when the long side of the outline frame is in the vertical direction, it is determined whether the difference in the vertical direction between the two corner points constituting the short side of the outline frame is greater than the second threshold. If the difference in the vertical direction between the two corner points constituting the short side is greater than the second threshold, the bounding box is used as a text detection box.
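
The decision rules above can be summarised in one function. This is only a sketch of the described logic: the threshold values, the corner ordering and the fall-back of keeping the quadrilateral outline when neither rule applies are assumptions, not values fixed by the text.

```python
import numpy as np

def choose_detection_box(bbox, quad, t1, t2):
    """bbox: (x1, y1, x2, y2) rectangular bounding box.
    quad: (4, 2) corners of the outline box, assumed ordered
          top-left, top-right, bottom-right, bottom-left."""
    x1, y1, x2, y2 = bbox
    rect = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)
    quad = np.asarray(quad, dtype=np.float32)

    # Rule 1: every outline corner is close to its matching rectangle corner.
    if np.all(np.linalg.norm(quad - rect, axis=1) < t1):
        return rect

    # Rule 2: long side horizontal -> x-difference of the two corners forming a
    # short (vertical) side; long side vertical -> y-difference of the two
    # corners forming a short (horizontal) side.
    if (x2 - x1) >= (y2 - y1):
        short_side_diff = abs(quad[0, 0] - quad[3, 0])   # left short side
    else:
        short_side_diff = abs(quad[0, 1] - quad[1, 1])   # top short side
    if short_side_diff > t2:
        return rect

    # Otherwise keep the quadrilateral outline (assumed fall-back).
    return quad
```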

本实施例中提供的文本检测方法,通过获取到待检测文本图像,采用文本检测模型对特征区域进行特征提取进而得到特征图;对特征图进行区域检测得到特征图的边界框;对特征图进行文本检测得到特征图中文本的轮廓边框;通过将检测得到的边界框和检测得到的轮廓边框进行融合,确定待检测文本的检测框,进而提升倾斜文本行的检测精确度;且本申请的文本检测方法中不需要对特征区域进行缩边处理,因此大大降低了文本漏检率,进而提高了文本检测的鲁棒性。The text detection method provided in the present embodiment obtains a text image to be detected, uses a text detection model to extract features from a feature area and obtains a feature map; performs area detection on the feature map to obtain a bounding box of the feature map; performs text detection on the feature map to obtain an outline frame of the text in the feature map; determines a detection frame for the text to be detected by fusing the detected bounding box and the detected outline frame, thereby improving the detection accuracy of inclined text lines; and the text detection method of the present application does not require edge shrinking processing of the feature area, thereby greatly reducing the text missed detection rate and thereby improving the robustness of text detection.

请参阅图2,图2是本发明提供的文本检测方法一具体实施例的流程示意图。本实施例中提供一种文本检测方法,该文本检测方法包括如下步骤。Please refer to Figure 2, which is a flowchart of a specific embodiment of a text detection method provided by the present invention. In this embodiment, a text detection method is provided, and the text detection method includes the following steps.

S21:对初始文本检测模型进行训练,得到文本检测模型。S21: Train the initial text detection model to obtain a text detection model.

具体地,初始文本检测模型是基于YOLOv3网络构建的。基于YOLOv3网络构建初始文本检测模型可以实现多尺度输出,进而可以有效降低文本检测过程中文本的漏检率。其中,初始文本检测模型包括依次连接的分割校正单元、特征提取单元、文本检测单元和文本修正单元。请参阅图3,图3是图2提供的文本检测方法中步骤S21一具体实施例的流程示意图。具体对初始文本检测模型进行训练包括如下步骤。Specifically, the initial text detection model is constructed based on the YOLOv3 network. Constructing the initial text detection model based on the YOLOv3 network can achieve multi-scale output, thereby effectively reducing the missed detection rate of text in the text detection process. Among them, the initial text detection model includes a segmentation correction unit, a feature extraction unit, a text detection unit and a text correction unit connected in sequence. Please refer to Figure 3, which is a flow chart of a specific embodiment of step S21 in the text detection method provided in Figure 2. Specifically, training the initial text detection model includes the following steps.

S211:获取训练样本集,训练样本集包括多个包含文本的图像样本、文本标注框以及文本的真实类别。S211: Obtain a training sample set, where the training sample set includes a plurality of image samples containing text, text annotation boxes, and true categories of the text.

具体地,获取多个包含文本的图像样本,包含文本的图像样本可以为终端设备的摄像头拍摄的图像。例如,该包含文本的图像样本可以为包含驾驶证的图片、包含身份证的图片,也可以包括拍摄的盲人所在的场景下的方向标识、位置标识等的图像;学生作业图像等等。包含文本的图像样本中还有标记文本的文本标注框。在另一可选实施例中,文本标注框可以为图像样本中文本区域的最小外接矩形的坐标。Specifically, a plurality of image samples containing text are obtained, and the image samples containing text may be images taken by a camera of a terminal device. For example, the image sample containing text may be a picture containing a driver's license, a picture containing an ID card, or may include an image of a direction mark, a location mark, etc., taken in a scene where a blind person is located, an image of a student's homework, etc. The image sample containing text also includes a text annotation box for marking the text. In another optional embodiment, the text annotation box may be the coordinates of the minimum circumscribed rectangle of the text area in the image sample.

请参阅图4,图4是本发明提供的一实施例中的驾驶证的局部图像。在另一可选实施例中,包含文本的图像样本还包括标记文本的真实类别。其中,文本的真实类别可以为键、值两者中的一种。例如,身份证上的文本“姓名”的真实类别为“键”。驾驶证上“品牌型号”、“住址”和“注册登记日期”的真实类别为“键”;身份证上的文本“张×”的真实类别为“值”,驾驶证上“×××××运”、“××省”和“2007-××-××”的真实类别为“值”。通过对文本的类别进行标注可以将粘连的文本进行有效区分。Please refer to Figure 4, which is a partial image of a driver's license in an embodiment of the present invention. In another optional embodiment, the image sample containing text also includes the real category of the marked text. Among them, the real category of the text can be one of the key and value. For example, the real category of the text "name" on the ID card is "key". The real category of "brand model", "address" and "registration date" on the driver's license is "key"; the real category of the text "Zhang ×" on the ID card is "value", and the real category of "××××× transport", "×× province" and "2007-××-××" on the driver's license is "value". By marking the category of the text, the adhered text can be effectively distinguished.

在另一可选实施例中,将包含文本的图像样本可以进行旋转增强处理、仿射增强处理以及光照增强处理后补充到训练样本集中,以扩大训练样本集中的数据量,进而提升训练得到的文本检测模型的鲁棒性。In another optional embodiment, image samples containing text may be subjected to rotation enhancement, affine enhancement, and illumination enhancement processing and then added to the training sample set to expand the amount of data in the training sample set, thereby improving the robustness of the trained text detection model.
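
A minimal OpenCV sketch of the three augmentations mentioned here (rotation, affine and illumination enhancement) is shown below. The parameter values are illustrative only, and in practice the text annotation boxes would have to be transformed with the same matrices so that the labels stay aligned with the augmented images.

```python
import cv2
import numpy as np

def augment(image, angle=5.0, shear=0.05, gain=1.2, bias=10):
    """Apply rotation, affine (shear) and illumination augmentation to one
    image sample; illustrative parameters, not those used by the patent."""
    h, w = image.shape[:2]

    # Rotation augmentation around the image centre.
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    out = cv2.warpAffine(image, rot, (w, h), borderValue=(255, 255, 255))

    # Affine augmentation: a small horizontal shear.
    shear_m = np.float32([[1, shear, 0], [0, 1, 0]])
    out = cv2.warpAffine(out, shear_m, (w, h), borderValue=(255, 255, 255))

    # Illumination augmentation: linear brightness/contrast change.
    return cv2.convertScaleAbs(out, alpha=gain, beta=bias)

sample = augment(np.full((64, 256, 3), 255, dtype=np.uint8))
```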

在另一可选实施例中,还需要标注图像样本中文本的所处角度。其中,文本的所处角度为文本与水平轴之间的角度。In another optional embodiment, it is also necessary to mark the angle of the text in the image sample, wherein the angle of the text is the angle between the text and the horizontal axis.

S212:对图像样本中包含文本的第一特征区域进行分割。S212: Segment the first feature region containing text in the image sample.

具体地，将上述包含文本的图像样本输入到初始文本检测模型中，分割校正单元识别包含文本的图像样本中包含文本的第一特征区域的所处位置，将包含文本的第一特征区域和背景区域分开，分割校正单元分割提取包含文本的第一特征区域；之后对根据检测识别得到的文本的长宽比对提取的第一特征区域进行校正，以使第一特征区域处于预设角度。也可以根据第一特征区域中识别得到的条目所分布的位置对提取的第一特征区域进行校正，以使第一特征区域处于预设角度。具体地，根据第一特征区域中识别到的“键”和“值”之间的位置关系以及“键”、“值”所处位置与预存的模板中的“键”和“值”之间的位置关系和“键”、“值”所处位置之间的差异确定第一特征区域所处的角度，根据第一特征区域所处的角度将其调节至预设角度。Specifically, the above image sample containing text is input into the initial text detection model. The segmentation correction unit locates the first feature region containing text in the image sample, separates the first feature region from the background region, and segments and extracts the first feature region containing text. The extracted first feature region is then corrected according to the aspect ratio of the detected text so that the first feature region lies at a preset angle. The extracted first feature region may also be corrected according to the positions of the items identified in it so that it lies at the preset angle. Specifically, the angle of the first feature region is determined from the positional relationship between the "key" and "value" items identified in the first feature region, and from the difference between the positions of these "key" and "value" items and the positional relationship and positions of the "key" and "value" items in a pre-stored template; the first feature region is then adjusted to the preset angle according to the determined angle.

在一具体实施例中,将第一特征区域旋转至预设角度。例如,预设角度可以为与水平轴之间的角度为0°,90°,180°或270°,进而便于后续对第一特征区域进行特征提取,提高第一特征区域文本检测的准确率。In a specific embodiment, the first feature region is rotated to a preset angle. For example, the preset angle may be 0°, 90°, 180° or 270° with respect to the horizontal axis, thereby facilitating subsequent feature extraction of the first feature region and improving the accuracy of text detection in the first feature region.

在另一可选实施例中,也可以根据标注的文本的所处角度对第一特征区域进行旋转调节,以使第一特征区域与水平轴之间的角度为0°,90°,180°或270°,进而便于后续对第一特征区域进行特征提取。进而提高第一特征区域的文本检测准确率。In another optional embodiment, the first feature region may be rotated and adjusted according to the angle of the annotated text, so that the angle between the first feature region and the horizontal axis is 0°, 90°, 180° or 270°, thereby facilitating subsequent feature extraction of the first feature region, thereby improving the accuracy of text detection in the first feature region.
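
The angle correction can be sketched as snapping the region to the nearest of the preset angles (0°, 90°, 180°, 270°). The sign convention of the annotated angle and the choice of OpenCV for the rotation are assumptions made for illustration; how the angle is estimated is outside this sketch.

```python
import cv2
import numpy as np

def correct_to_preset_angle(region, text_angle_deg):
    """Rotate `region` so that its text ends up at the nearest preset angle.
    `text_angle_deg` is the (annotated or estimated) angle between the text
    and the horizontal axis; the rotation sign convention is assumed."""
    target = (90 * round(text_angle_deg / 90)) % 360   # nearest of 0/90/180/270
    correction = target - text_angle_deg               # degrees left to rotate
    h, w = region.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), correction, 1.0)
    return cv2.warpAffine(region, m, (w, h), borderValue=(255, 255, 255))

aligned = correct_to_preset_angle(np.full((64, 256, 3), 255, dtype=np.uint8), 3.5)
```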

S213:对第一特征区域进行特征提取得到第一特征图。S213: Extract features from the first feature area to obtain a first feature map.

具体地,待第一特征区域调整至预设角度后,特征提取单元对第一特征区域进行特征提取进而得到第一特征区域对应的第一特征图。在一具体实施例中,当第一特征区域为卡证区域时,特征提取单元对卡证区域进行特征提取,得到卡证特征图。Specifically, after the first feature area is adjusted to a preset angle, the feature extraction unit extracts features from the first feature area to obtain a first feature map corresponding to the first feature area. In a specific embodiment, when the first feature area is a card area, the feature extraction unit extracts features from the card area to obtain a card feature map.

S214:对第一特征图进行区域检测得到第一预测框;对第一特征图进行文本检测得到第二预测框。S214: performing region detection on the first feature map to obtain a first prediction box; performing text detection on the first feature map to obtain a second prediction box.

具体地,将第一特征区域对应的第一特征图输入到文本检测单元,文本检测单元对第一特征图进行特征提取后,再进行区域检测,进而得到第一特征图中文本的第一预测框。具体地,第一特征图中文本的第一预测框为矩形检测框。Specifically, the first feature map corresponding to the first feature area is input to the text detection unit, and the text detection unit extracts features from the first feature map and then performs region detection to obtain a first prediction box of the text in the first feature map. Specifically, the first prediction box of the text in the first feature map is a rectangular detection box.

具体地,将第一特征区域对应的第一特征图输入到文本修正单元,文本修正单元对第一特征图进行特征提取后,再进行文本检测,进而得到第一特征图中文本的四个边角相对于第一预测框的四个边角的偏移量,进而得到文本的第二预测框。其中,第一特征图中文本的第二预测框为四边形检测框。Specifically, the first feature map corresponding to the first feature area is input to the text correction unit, and the text correction unit performs feature extraction on the first feature map and then performs text detection, thereby obtaining the offsets of the four corners of the text in the first feature map relative to the four corners of the first prediction box, and then obtaining the second prediction box of the text. The second prediction box of the text in the first feature map is a quadrilateral detection box.

S215:通过第一预测框与文本标注框、第二预测框与文本标注框构建第一损失函数。S215: Construct a first loss function through the first prediction box and the text annotation box, and the second prediction box and the text annotation box.

具体地,采用回归损失函数对第一预测框与文本标注框、第二预测框与文本标注框之间的误差值进行计算。在一具体实施例中,第一损失函数为如下公式(1)所示的回归损失函数。Specifically, a regression loss function is used to calculate the error values between the first prediction box and the text annotation box, and between the second prediction box and the text annotation box. In a specific embodiment, the first loss function is a regression loss function shown in the following formula (1).

L = a0 × exp(-kt) × L1 + (1 - a0 × exp(-kt)) × L2  (1)

式中：L1为矩形检测框的回归损失值，L2为最终的四边形检测框的损失函数，a0初始值为1，t是迭代次数，k为超参数，函数初始情况下L1的系数为1，L2的系数为0。Where L1 is the regression loss of the rectangular detection box, L2 is the loss of the final quadrilateral detection box, a0 has an initial value of 1, t is the number of iterations, and k is a hyperparameter; at the start of training the coefficient of L1 is 1 and the coefficient of L2 is 0.

其中,随着训练代数的增加,L1前系数逐渐衰减,L2系数逐渐增加,即矩形检测框系数逐渐降低,四边形检测框系数逐渐增加。Among them, with the increase of training generations, the L1 coefficient gradually decays and the L2 coefficient gradually increases, that is, the rectangular detection frame coefficient gradually decreases and the quadrilateral detection frame coefficient gradually increases.
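
Formula (1) and the decay behaviour described above translate directly into code. The hyperparameter value k used below is only an example, since the patent leaves k unspecified.

```python
import math

def detection_loss(l1_rect, l2_quad, t, a0=1.0, k=0.05):
    """Formula (1): L = a0*exp(-k*t)*L1 + (1 - a0*exp(-k*t))*L2.
    At t = 0 the rectangular-box loss L1 has weight 1 and the quadrilateral-box
    loss L2 has weight 0; as the iteration count t grows the weight shifts to L2."""
    w = a0 * math.exp(-k * t)
    return w * l1_rect + (1.0 - w) * l2_quad

# Weight of the rectangular-box term after 0, 10, 50 and 100 iterations:
for t in (0, 10, 50, 100):
    print(t, round(math.exp(-0.05 * t), 3))
```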

S216:利用第一损失函数对初始文本检测模型进行迭代训练得到文本检测模型。S216: Using the first loss function to iteratively train the initial text detection model to obtain a text detection model.

具体地,通过第一预测框与文本标注框、第二预测框与文本标注框之间的误差值对初始文本检测模型进行迭代训练得到文本检测模型。Specifically, the initial text detection model is iteratively trained through the error values between the first prediction box and the text annotation box, and between the second prediction box and the text annotation box to obtain the text detection model.

在一可选实施例中,初始文本检测模型的结果反向传播,根据第一损失函数反馈的损失值对初始文本检测模型的权重进行修正。在一可选实施例中,也可以对初始文本检测模型中的参数进行修正,实现对初始文本检测模型的训练。In an optional embodiment, the result of the initial text detection model is back-propagated, and the weight of the initial text detection model is modified according to the loss value fed back by the first loss function. In an optional embodiment, the parameters in the initial text detection model can also be modified to achieve the training of the initial text detection model.

将包含文本的图像样本输入到初始文本检测模型中,初始文本检测模型对文本进行预测。当第一预测框与文本标注框、第二预测框与文本标注框之间的误差值小于预设阈值,预设阈值可以自行设置,例如1%、5%等,则停止对初始文本检测模型的训练并获得文本检测模型。The image sample containing text is input into the initial text detection model, and the initial text detection model predicts the text. When the error value between the first prediction box and the text annotation box, and the error value between the second prediction box and the text annotation box is less than a preset threshold value, which can be set by itself, such as 1%, 5%, etc., the training of the initial text detection model is stopped and the text detection model is obtained.

S217:对第一特征图中检测得到的文本进行识别得到文本的预测类别。S217: Recognize the text detected in the first feature map to obtain a predicted category of the text.

具体地,将包含文本的图像样本输入到初始文本检测模型中,初始文本检测模型对图像样本中的文本类别进行预测,得到文本的预测类别。Specifically, an image sample containing text is input into an initial text detection model, and the initial text detection model predicts the text category in the image sample to obtain a predicted category of the text.

S218:根据文本的预测类别与真实类别构建第二损失函数。S218: Construct a second loss function according to the predicted category and the true category of the text.

具体地,采用交叉熵损失函数对预测类别与真实类别之间的误差值进行计算。在一具体实施例中,第二损失函数为交叉熵损失Cross-entropy Loss。Specifically, a cross-entropy loss function is used to calculate the error value between the predicted category and the true category. In a specific embodiment, the second loss function is a cross-entropy loss.
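
The second loss function can be sketched as a standard cross-entropy over the two text categories ("key" = 0, "value" = 1). The NumPy implementation below is illustrative, not the patented code.

```python
import numpy as np

def key_value_cross_entropy(logits, labels):
    """Cross-entropy over the two categories; `logits` has shape (N, 2)
    and `labels` shape (N,), with 0 = "key" and 1 = "value"."""
    shifted = logits - logits.max(axis=1, keepdims=True)            # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Two samples: a confidently predicted "key" and a weakly predicted "value".
print(key_value_cross_entropy(np.array([[3.0, 0.1], [0.2, 0.4]]),
                              np.array([0, 1])))
```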

S219:利用第二损失函数对文本检测模型进行迭代训练。S219: Iteratively train the text detection model using the second loss function.

具体地,通过预测类别与真实类别之间的误差值对文本检测模型进行迭代训练。Specifically, the text detection model is iteratively trained by the error value between the predicted category and the true category.

在一可选实施例中,文本检测模型的结果反向传播,根据第二损失函数反馈的损失值对文本检测模型的权重进行修正。在一可选实施例中,也可以对文本检测模型中的参数进行修正,实现对文本检测模型的训练。In an optional embodiment, the result of the text detection model is back-propagated, and the weight of the text detection model is modified according to the loss value fed back by the second loss function. In an optional embodiment, the parameters in the text detection model can also be modified to achieve training of the text detection model.

当文本的预测类别与真实类别之间的误差值小于预设阈值,预设阈值可以自行设置,例如1%、5%等,则停止对文本检测模型的训练。When the error value between the predicted category and the true category of the text is less than a preset threshold, which can be set by yourself, such as 1%, 5%, etc., the training of the text detection model is stopped.

S22:获取到待检测文本图像。S22: Obtain the text image to be detected.

具体地,待检测文本图像可以是终端设备的摄像头拍摄的图像,或者,也可以是终端设备中存储的图像。这里的终端设备可以指手机、平板电脑等设备,也可以为车辆上的车载设备,本发明实施例中对此并不限定。Specifically, the text image to be detected can be an image captured by a camera of a terminal device, or can also be an image stored in the terminal device. The terminal device here can refer to a device such as a mobile phone, a tablet computer, or a vehicle-mounted device on a vehicle, which is not limited in the embodiments of the present invention.

获取的待检测文本图像可以是任何需要进行文本检测的图像,比如:自然场景的图像,盲人导航时,所拍摄的盲人所在的场景下的方向标识、位置标识等的图像;学生作业图像;身份证、驾驶证等证件的文本检测等等。The acquired text image to be detected can be any image that requires text detection, such as: images of natural scenes, images of direction signs, location signs, etc. in the scene where the blind are located when navigating; images of student homework; text detection of documents such as identity cards and driver's licenses, etc.

S23:对待文本检测图像进行分割得到特征区域。S23: Segment the image to be detected for text to obtain feature regions.

具体地,将待检测文本图像输入文本检测模型中,分割校正单元识别待检测文本图像中特征区域的所处位置,将待检测文本中的特征区域和背景区域分开,分割校正单元分割提取待检测文本图像中的特征区域。Specifically, the text image to be detected is input into the text detection model, the segmentation correction unit identifies the location of the feature area in the text image to be detected, separates the feature area and the background area in the text to be detected, and the segmentation correction unit segments and extracts the feature area in the text image to be detected.

S24:对特征区域进行校正,以使特征区域处于预设角度。S24: Correcting the feature area so that the feature area is at a preset angle.

具体地,对根据检测识别得到的特征区域的长宽比对提取的特征区域进行校正,以使特征区域处于预设角度。也可以根据特征区域中识别得到的条目所分布的位置对提取的特征区域进行校正,以使特征区域处于预设角度。在一具体实施例中,将特征区域旋转至预设角度。例如,预设角度可以为与水平轴之间的角度为0°,90°,180°或270°,进而便于后续对特征区域进行特征提取,提高特征区域文本检测的准确率。通过对待检测文本图像进行分割校正是为了保证减少背景区域对文本检测的干扰,并保证文本区域相对于整张图像的比例不发生改变。Specifically, the extracted feature area is corrected according to the aspect ratio of the feature area obtained by detection and identification, so that the feature area is at a preset angle. The extracted feature area can also be corrected according to the positions of the items identified in the feature area, so that the feature area is at a preset angle. In a specific embodiment, the feature area is rotated to a preset angle. For example, the preset angle can be 0°, 90°, 180° or 270° with the horizontal axis, thereby facilitating subsequent feature extraction of the feature area and improving the accuracy of text detection in the feature area. The purpose of segmenting and correcting the text image to be detected is to ensure that the interference of the background area on the text detection is reduced, and to ensure that the proportion of the text area relative to the entire image does not change.

S25:通过文本检测模型对特征区域进行特征提取,得到特征图。S25: Extract features from the feature area through the text detection model to obtain a feature map.

请参阅图5,图5是本发明提供的文本检测模型一实施例的框架示意图。具体地,待特征区域调整至预设角度后,特征提取单元对特征区域进行特征提取进而得到特征区域对应的特征图。在一具体实施例中,当特征区域为卡证区域时,特征提取单元对卡证区域进行特征提取,得到卡证特征图。Please refer to FIG5, which is a schematic diagram of the framework of an embodiment of a text detection model provided by the present invention. Specifically, after the feature area is adjusted to a preset angle, the feature extraction unit extracts features from the feature area and obtains a feature map corresponding to the feature area. In a specific embodiment, when the feature area is a card area, the feature extraction unit extracts features from the card area to obtain a card feature map.

S26:对特征图进行区域检测,得到特征图的边界框。S26: Perform region detection on the feature map to obtain a bounding box of the feature map.

具体地,将特征区域对应的特征图输入到文本检测单元,文本检测单元对特征图进行特征提取后,再进行区域检测,进而得到特征图中文本的边界框。具体地,特征图中文本的边界框为矩形边界框。Specifically, the feature map corresponding to the feature area is input to the text detection unit, and the text detection unit extracts features from the feature map and then performs region detection to obtain a bounding box of the text in the feature map. Specifically, the bounding box of the text in the feature map is a rectangular bounding box.

S27:对特征图进行文本检测,得到特征图中文本的轮廓边框。S27: Perform text detection on the feature map to obtain the outline of the text in the feature map.

具体地,将特征区域对应的特征图输入到文本修正单元,文本修正单元对特征图进行特征提取后,再进行文本检测,进而得到特征图中文本的四个边角相对于边界框的四个边角的偏移量,进而得到文本的四边形轮廓边框。其中,特征图中文本的轮廓边框为四边形检测框。Specifically, the feature map corresponding to the feature area is input into the text correction unit, and the text correction unit extracts features from the feature map and then performs text detection, thereby obtaining the offsets of the four corners of the text in the feature map relative to the four corners of the bounding box, and then obtaining the quadrilateral outline frame of the text. The outline frame of the text in the feature map is a quadrilateral detection frame.

S28:根据边界框和轮廓边框进行加权融合,确定待检测文本的检测框。S28: Perform weighted fusion based on the bounding box and the contour frame to determine the detection box of the text to be detected.

具体地，在一可选实施例中，判断轮廓边框各角点与边界框的对应角点之间的距离是否均小于第一阈值。如果轮廓边框的角点与边界框的对应角点之间的距离均小于第一阈值，则将边界框作为文本检测框。在另一可选实施例中，轮廓边框的长边处于水平方向时，判断轮廓边框组成短边的两个角点在水平方向上的差值是否大于第二阈值。如果组成短边的两个角点在水平方向上的差值大于第二阈值，则将边界框作为文本检测框。在另一可选实施例中，轮廓边框的长边处于竖直方向时，判断轮廓边框组成短边的两个角点在竖直方向上的差值是否大于第二阈值。如果组成短边的两个角点在竖直方向上的差值大于第二阈值，则将边界框作为文本检测框。Specifically, in an optional embodiment, it is determined whether the distances between the corner points of the outline frame and the corresponding corner points of the bounding box are all less than a first threshold. If the distances between the corner points of the outline frame and the corresponding corner points of the bounding box are all less than the first threshold, the bounding box is used as a text detection box. In another optional embodiment, when the long side of the outline frame is in the horizontal direction, it is determined whether the difference in the horizontal direction between the two corner points constituting the short side of the outline frame is greater than a second threshold. If the difference in the horizontal direction between the two corner points constituting the short side is greater than the second threshold, the bounding box is used as a text detection box. In another optional embodiment, when the long side of the outline frame is in the vertical direction, it is determined whether the difference in the vertical direction between the two corner points constituting the short side of the outline frame is greater than the second threshold. If the difference in the vertical direction between the two corner points constituting the short side is greater than the second threshold, the bounding box is used as a text detection box.
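
For the weighted-fusion reading of step S28, one plausible sketch is a per-corner weighted average of the rectangular bounding box and the quadrilateral outline box. The weight value and the averaging scheme are assumptions; the patent only states that the two boxes are fused with weights.

```python
import numpy as np

def weighted_fuse(bbox, quad, w_quad=0.7):
    """Per-corner weighted average of the rectangle and the outline box.
    `w_quad` is an assumed weight for the quadrilateral outline."""
    x1, y1, x2, y2 = bbox
    rect = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)
    quad = np.asarray(quad, dtype=np.float32)
    return w_quad * quad + (1.0 - w_quad) * rect

print(weighted_fuse((10, 20, 210, 60),
                    [[12, 18], [208, 24], [211, 62], [9, 57]]))
```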

本实施例中提供的文本检测方法,通过获取到待检测文本图像,采用文本检测模型对特征区域进行特征提取进而得到特征图;对特征图进行区域检测得到特征图的边界框;对特征图进行文本检测得到特征图中文本的轮廓边框;通过将检测得到的边界框和检测得到的轮廓边框进行融合,确定待检测文本的检测框,进而提升倾斜文本行的检测精确度;且本申请的文本检测方法中不需要对特征区域进行缩边处理,因此大大降低了文本漏检率,进而提高了文本检测的鲁棒性。The text detection method provided in the present embodiment obtains a text image to be detected, uses a text detection model to extract features from a feature area and obtains a feature map; performs area detection on the feature map to obtain a bounding box of the feature map; performs text detection on the feature map to obtain an outline frame of the text in the feature map; determines a detection frame for the text to be detected by fusing the detected bounding box and the detected outline frame, thereby improving the detection accuracy of inclined text lines; and the text detection method of the present application does not require edge shrinking processing of the feature area, thereby greatly reducing the text missed detection rate and thereby improving the robustness of text detection.

参阅图6，图6是本发明提供的终端一实施方式的示意框图。如图6所示，该实施方式中的终端70包括：处理器71、存储器72以及存储在存储器72中并可在处理器71上运行的计算机程序，该计算机程序被处理器71执行时实现上述文本检测方法中的步骤，为避免重复，此处不一一赘述。Refer to FIG. 6, which is a schematic block diagram of an embodiment of a terminal provided by the present invention. As shown in FIG. 6, the terminal 70 in this embodiment includes: a processor 71, a memory 72, and a computer program stored in the memory 72 and executable on the processor 71. When the computer program is executed by the processor 71, the steps of the above text detection method are implemented; to avoid repetition, they are not described in detail here.

参阅图7,图7是本发明提供的计算机可读存储介质一实施方式的示意框图。Please refer to FIG. 7 , which is a schematic block diagram of an embodiment of a computer-readable storage medium provided by the present invention.

本申请的实施方式中还提供一种计算机可读存储介质90,计算机可读存储介质90存储有计算机程序901,计算机程序901中包括程序指令,处理器执行程序指令,实现本申请实施方式提供的任一项文本检测方法。A computer-readable storage medium 90 is also provided in an embodiment of the present application. The computer-readable storage medium 90 stores a computer program 901. The computer program 901 includes program instructions. The processor executes the program instructions to implement any text detection method provided in the embodiment of the present application.

其中,计算机可读存储介质90可以是前述实施方式的计算机设备的内部存储单元,例如计算机设备的硬盘或内存。计算机可读存储介质90也可以是计算机设备的外部存储设备,例如计算机设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。The computer-readable storage medium 90 may be an internal storage unit of the computer device of the aforementioned embodiment, such as a hard disk or memory of the computer device. The computer-readable storage medium 90 may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), etc. equipped on the computer device.

以上仅为本发明的实施方式,并非因此限制本发明的专利保护范围,凡是利用本发明说明书及附图内容所作的等效结构或等效流程变换,或直接或间接运用在其他相关的技术领域,均同理包括在本发明的专利保护范围内。The above are only implementation modes of the present invention, and are not intended to limit the patent protection scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the present invention specification and drawings, or directly or indirectly applied in other related technical fields, are also included in the patent protection scope of the present invention.

Claims (8)

1. A text detection method, characterized in that the text detection method comprises:
acquiring a text image to be detected, the text image to be detected at least comprising a feature region of the text to be detected;
performing feature extraction on the feature region to obtain a feature map;
performing region detection on the feature map to obtain a bounding box of the feature map;
performing text detection on the feature map to obtain an outline box of the text in the feature map;
determining a detection box of the text to be detected based on the bounding box and the outline box;
wherein the step of determining the detection box of the text to be detected based on the bounding box and the outline box specifically comprises:
if it is determined that the distances between the corner points of the outline box and the corresponding corner points of the bounding box are all less than a first threshold, using the bounding box as the text detection box;
wherein, when the long side of the outline box is in the horizontal direction, if it is determined that the difference in the horizontal direction between the two corner points forming a short side of the outline box is greater than a second threshold, using the bounding box as the text detection box;
or, when the long side of the outline box is in the vertical direction, if it is determined that the difference in the vertical direction between the two corner points forming a short side of the outline box is greater than the second threshold, using the bounding box as the text detection box.

2. The text detection method according to claim 1, characterized in that, before the step of performing feature extraction on the feature region to obtain a feature map, the method further comprises:
segmenting the text image to be detected to obtain the feature region;
correcting the feature region to obtain a feature region at a preset angle.

3. The text detection method according to claim 1, characterized in that the step of determining the detection box of the text to be detected based on the bounding box and the outline box specifically comprises:
performing weighted fusion of the bounding box and the outline box to determine the detection box of the text to be detected.

4. The text detection method according to claim 2, characterized in that performing feature extraction on the feature region to obtain a feature map comprises:
performing feature extraction on the feature region based on a trained text detection model to obtain the feature map; wherein the text detection model is obtained by training an initial text detection model, and the initial text detection model comprises a feature extraction unit, a text detection unit and a text correction unit.
5. The text detection method according to claim 4, characterized in that the text detection model is obtained as follows:
acquiring a training sample set, the training sample set comprising a plurality of image samples containing text and text annotation boxes;
segmenting a first feature region containing the text in the image sample;
performing feature extraction on the first feature region by the feature extraction unit to obtain a first feature map;
performing region detection on the first feature map by the text detection unit to obtain a first prediction box; performing text detection on the first feature map by the text correction unit to obtain a second prediction box;
constructing a first loss function from the first prediction box and the text annotation box, and from the second prediction box and the text annotation box;
iteratively training the initial text detection model using the first loss function to obtain the text detection model.

6. The text detection method according to claim 5, characterized in that the training sample set further comprises the true category of the text;
and the manner of obtaining the text detection model further comprises:
obtaining a predicted category of the text through detection by the initial text detection model;
constructing a second loss function from the predicted category and the true category of the text;
iteratively training the initial text detection model using the second loss function to obtain the text detection model.

7. A terminal, characterized in that the terminal comprises a memory, a processor, and a computer program stored in the memory and running on the processor, the processor being configured to execute the program data to implement the steps in the text detection method according to any one of claims 1 to 6.

8. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the text detection method according to any one of claims 1 to 6 are implemented.
CN202110827395.2A 2021-07-21 2021-07-21 Text detection method, terminal and computer-readable storage medium thereof Active CN113673497B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110827395.2A CN113673497B (en) 2021-07-21 2021-07-21 Text detection method, terminal and computer-readable storage medium thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110827395.2A CN113673497B (en) 2021-07-21 2021-07-21 Text detection method, terminal and computer-readable storage medium thereof

Publications (2)

Publication Number Publication Date
CN113673497A CN113673497A (en) 2021-11-19
CN113673497B (en) 2024-11-05

Family

ID=78539992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110827395.2A Active CN113673497B (en) 2021-07-21 2021-07-21 Text detection method, terminal and computer-readable storage medium thereof

Country Status (1)

Country Link
CN (1) CN113673497B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948533A (en) * 2019-03-19 2019-06-28 讯飞智元信息科技有限公司 A kind of Method for text detection, device, equipment and readable storage medium storing program for executing
CN111753812A (en) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and device
CN113076814A (en) * 2021-03-15 2021-07-06 腾讯科技(深圳)有限公司 Text area determination method, device, equipment and readable storage medium

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019107141A1 (en) * 2017-11-30 2019-06-06 日本電産サンキョー株式会社 Image processing device and image processing method
CN108549893B (en) * 2018-04-04 2020-03-31 华中科技大学 End-to-end identification method for scene text with any shape
CN108564084A (en) * 2018-05-08 2018-09-21 北京市商汤科技开发有限公司 character detecting method, device, terminal and storage medium
CN109492638A (en) * 2018-11-07 2019-03-19 北京旷视科技有限公司 Method for text detection, device and electronic equipment
CN109766885B (en) * 2018-12-29 2022-01-18 北京旷视科技有限公司 Character detection method and device, electronic equipment and storage medium
WO2020223859A1 (en) * 2019-05-05 2020-11-12 华为技术有限公司 Slanted text detection method, apparatus and device
CN110674804A (en) * 2019-09-24 2020-01-10 上海眼控科技股份有限公司 Text image detection method and device, computer equipment and storage medium
WO2021056255A1 (en) * 2019-09-25 2021-04-01 Apple Inc. Text detection using global geometry estimators
CN111079632A (en) * 2019-12-12 2020-04-28 上海眼控科技股份有限公司 Training method and device of text detection model, computer equipment and storage medium
CN111310758A (en) * 2020-02-13 2020-06-19 上海眼控科技股份有限公司 Text detection method and device, computer equipment and storage medium
CN111639648B (en) * 2020-05-26 2023-09-19 浙江大华技术股份有限公司 Certificate identification method, device, computing equipment and storage medium
CN111814785B (en) * 2020-06-11 2024-03-29 浙江大华技术股份有限公司 Invoice recognition method, training method of relevant model, relevant equipment and device
CN111814673B (en) * 2020-07-08 2023-05-26 重庆农村商业银行股份有限公司 Method, device, equipment and storage medium for correcting text detection bounding box
CN113065404B (en) * 2021-03-08 2023-02-24 国网河北省电力有限公司 Method and system for detecting train ticket content based on equal-width text fragments
CN113033346B (en) * 2021-03-10 2023-08-04 北京百度网讯科技有限公司 Text detection method and device and electronic equipment

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948533A (en) * 2019-03-19 2019-06-28 讯飞智元信息科技有限公司 A kind of Method for text detection, device, equipment and readable storage medium storing program for executing
CN111753812A (en) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and device
CN113076814A (en) * 2021-03-15 2021-07-06 腾讯科技(深圳)有限公司 Text area determination method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN113673497A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
US10108860B2 (en) Systems and methods for generating composite images of long documents using mobile video data
US11900755B1 (en) System, computing device, and method for document detection and deposit processing
US10885644B2 (en) Detecting specified image identifiers on objects
CN112699775B (en) Certificate identification method, device, equipment and storage medium based on deep learning
CN112669515A (en) Bill image recognition method and device, electronic equipment and storage medium
CN111967286B (en) Information bearing medium identification method, identification device, computer equipment and medium
CN112612911A (en) Image processing method, system, device and medium, and program product
CN105260733A (en) Method and device for processing image information
CN107679531A (en) Licence plate recognition method, device, equipment and storage medium based on deep learning
KR20140010164A (en) System and method for recognizing text information in object
CN110472602A (en) A kind of recognition methods of card card, device, terminal and storage medium
CN112699867B (en) A method and system for extracting element information of fixed-format target image
CN111160395A (en) Image recognition method and device, electronic equipment and storage medium
CN112825141B (en) Method and device for recognizing text, recognition equipment and storage medium
CN114820476B (en) ID card recognition method based on compliance detection
CN112396057B (en) Character recognition method, device and electronic equipment
CN111753592B (en) Traffic sign recognition method, device, computer equipment and storage medium
CN112434689A (en) Method, device and equipment for identifying information in picture and storage medium
CN103488998A (en) Identity card recognition method based on neural network and image processing technology
EP3069298A1 (en) Systems and methods for generating composite images of long documents using mobile video data
CN109635633B (en) Electronic device, bill recognition method, and storage medium
CN114445843A (en) Fixed-format card image text recognition method and device
CN114694161A (en) A text recognition method, device and storage medium for a specific format certificate
WO2018107574A1 (en) Method and device for detecting see-through register anti-counterfeiting characteristics
CN113221897B (en) Image correction method, image text recognition method, identity verification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant