
CN111325195A - Text recognition method, device and electronic device - Google Patents


Info

Publication number
CN111325195A
Authority
CN
China
Prior art keywords
text
adjacent lines
text blocks
block
merging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010097683.2A
Other languages
Chinese (zh)
Other versions
CN111325195B (en)
Inventor
余红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Digital Service Technology Co ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202410096251.8A priority Critical patent/CN117912017A/en
Priority to CN202010097683.2A priority patent/CN111325195B/en
Publication of CN111325195A publication Critical patent/CN111325195A/en
Application granted granted Critical
Publication of CN111325195B publication Critical patent/CN111325195B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/1444 Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/18 Extraction of features or characteristics of the image
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments of this specification disclose a text recognition method, a text recognition apparatus, and an electronic device. Block features are extracted from text blocks, and it is determined whether the block features of two adjacent lines of text blocks satisfy a preset feature condition. The preset feature condition is established from training samples and is the condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information. The operation to perform on the two adjacent lines of text blocks is then determined from the result: either merging them into text information or leaving them unmerged.

Description

Text recognition method, device and electronic device

Technical Field

The embodiments of this specification relate to the field of computer technology, and in particular to a text recognition method, apparatus, and electronic device.

Background

In everyday use, text often has to be recognized from carriers such as pictures and then processed according to the recognition results. How to recognize complete text information has therefore long been a topic of discussion in the industry.

Summary of the Invention

In view of this, the embodiments of this specification provide a text recognition method, apparatus, and electronic device that accurately recognize text information.

The embodiments of this specification adopt the following technical solutions:

An embodiment of this specification provides a text recognition method, including:

performing text recognition on an object to be recognized, and obtaining a text block for each recognized line of text;

extracting block features of the text blocks;

determining whether the block features of two adjacent lines of text blocks satisfy a preset feature condition, where the preset feature condition is established from training samples and is the feature condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information;

performing an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides a text recognition method, including:

performing text recognition on training samples, and obtaining a text block for each recognized line of text;

extracting block features of the text blocks;

training a merging model with the block features of the text blocks to determine the preset feature condition in the merging model, so that, once the block features of two adjacent lines of text blocks in an object to be recognized have been extracted, it can be determined whether those block features satisfy the preset feature condition, and an operation can be performed on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides a text recognition method, including:

performing text recognition on an object to be recognized, and obtaining a text block for each recognized line of text;

extracting block features of the text blocks;

processing the block features of two adjacent lines of text blocks with a merging model to obtain a determination result indicating whether the two adjacent lines of text blocks satisfy a preset feature condition, where the merging model is obtained by training on the block features of two adjacent lines of text blocks recognized from training samples so as to determine the preset feature condition;

performing an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides a text recognition apparatus, including:

a text recognition module that performs text recognition on an object to be recognized and obtains a text block for each recognized line of text;

an extraction module that extracts block features of the text blocks;

a determination module that determines whether the block features of two adjacent lines of text blocks satisfy a preset feature condition, where the preset feature condition is established from training samples and is the feature condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information;

an execution module that performs an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides a text recognition apparatus, including:

a text recognition module that performs text recognition on training samples and obtains a text block for each recognized line of text;

an extraction module that extracts block features of the text blocks;

a training module that trains a merging model with the block features of the text blocks to determine the preset feature condition in the merging model, so that, once the block features of two adjacent lines of text blocks in an object to be recognized have been extracted, it can be determined whether those block features satisfy the preset feature condition, and an operation can be performed on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides a text recognition apparatus, including:

a text recognition module that performs text recognition on an object to be recognized and obtains a text block for each recognized line of text;

an extraction module that extracts block features of the text blocks;

a model processing module that processes the block features of two adjacent lines of text blocks with a merging model to obtain a determination result indicating whether the two adjacent lines of text blocks satisfy a preset feature condition, where the merging model is obtained by training on the block features of two adjacent lines of text blocks recognized from training samples so as to determine the preset feature condition;

an execution module that performs an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides an electronic device, including:

a processor; and

a memory configured to store a computer program that, when executed, causes the processor to perform the following operations:

performing text recognition on an object to be recognized, and obtaining a text block for each recognized line of text;

extracting block features of the text blocks;

determining whether the block features of two adjacent lines of text blocks satisfy a preset feature condition, where the preset feature condition is established from training samples and is the feature condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information;

performing an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides an electronic device, including:

a processor; and

a memory configured to store a computer program that, when executed, causes the processor to perform the following operations:

performing text recognition on training samples, and obtaining a text block for each recognized line of text;

extracting block features of the text blocks;

training a merging model with the block features of the text blocks to determine the preset feature condition in the merging model, so that, once the block features of two adjacent lines of text blocks in an object to be recognized have been extracted, it can be determined whether those block features satisfy the preset feature condition, and an operation can be performed on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

An embodiment of this specification further provides an electronic device, including:

a processor; and

a memory configured to store a computer program that, when executed, causes the processor to perform the following operations:

performing text recognition on an object to be recognized, and obtaining a text block for each recognized line of text;

extracting block features of the text blocks;

processing the block features of two adjacent lines of text blocks with a merging model to obtain a determination result indicating whether the two adjacent lines of text blocks satisfy a preset feature condition, where the merging model is obtained by training on the block features of two adjacent lines of text blocks recognized from training samples so as to determine the preset feature condition;

performing an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

At least one of the technical solutions adopted in the embodiments of this specification can achieve the following beneficial effects:

The embodiments of this specification provide a scheme for automatically deciding whether text should be merged. In essence, text recognition is performed on each line of text in the object to be recognized, and a text block is obtained for each recognized line. Block features are extracted from the text blocks, and it is determined whether the block features of two adjacent lines of text blocks satisfy a preset feature condition, where the preset feature condition is established from training samples and is the feature condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information. The operation on the two adjacent lines of text blocks is determined from the result, the operation being one of merging into text information and not merging.

By combining the block features of the text blocks with the preset feature condition established from training samples, the scheme can therefore automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, based on the result, whether to merge them. The solutions described in the embodiments of this specification can improve text recognition efficiency.

Brief Description of the Drawings

The drawings described here provide a further understanding of the embodiments of this specification and form part of them. The illustrative embodiments of this specification and their descriptions are used to explain the present application and do not unduly limit it. In the drawings:

FIG. 1 is a schematic diagram of the system architecture of a text recognition scheme proposed in an embodiment of this specification;

FIG. 2 is a flowchart of a text recognition method provided by an embodiment of this specification;

FIG. 3 is a schematic diagram of text blocks in a text recognition process provided by an embodiment of this specification;

FIG. 4 is a schematic diagram of text blocks in a text recognition process provided by an embodiment of this specification;

FIG. 5 is a schematic diagram of text blocks in a text recognition process provided by an embodiment of this specification;

FIG. 6 is a flowchart of an application example of a text recognition method proposed in an embodiment of this specification;

FIG. 7 is a flowchart of a text recognition method provided by an embodiment of this specification;

FIG. 8 is a flowchart of a text recognition method provided by an embodiment of this specification;

FIG. 9 is a flowchart of an application example of a text recognition method provided by an embodiment of this specification;

FIG. 10 is a flowchart of an application example of a text recognition method provided by an embodiment of this specification;

FIG. 11 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of this specification;

FIG. 12 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of this specification;

FIG. 13 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of this specification.

Detailed Description

An analysis of the prior art shows that it offers only standalone character recognition, for example recognition by optical character recognition. The recognized characters then have to be merged manually to form complete text information.

The embodiments of this specification propose a text recognition method, apparatus, and electronic device. In essence, text recognition is performed on each line of text in the object to be recognized, and a text block is obtained for each recognized line. Block features are extracted from the text blocks, and it is determined whether the block features of two adjacent lines of text blocks satisfy a preset feature condition, where the preset feature condition is established from training samples and is the feature condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information. The operation on the two adjacent lines of text blocks is determined from the result, the operation being one of merging into text information and not merging.

The embodiments of this specification therefore provide a scheme for automatically deciding whether text should be merged. By combining the block features of the text blocks with the preset feature condition established from training samples, the scheme can automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, based on the result, whether to merge them. The solutions described in the embodiments of this specification can improve text recognition efficiency.

To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments of this specification and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this specification without creative effort fall within the protection scope of the present application.

The technical solutions provided by the embodiments of this specification are described in detail below with reference to the drawings.

FIG. 1 is a schematic diagram of the system architecture of a text recognition scheme proposed in an embodiment of this specification.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The terminal devices 101, 102, and 103 interact with the server 105 through the network 104 to receive or send messages and the like. Various client applications may be installed on the terminal devices 101, 102, and 103, for example browser applications, search applications, and instant messaging tools.

The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

The server 105 may be a server that provides various services, such as a back-end server that tests the client applications installed on the terminal devices 101, 102, and 103. It should be noted that the text recognition method provided by the embodiments of the present disclosure is generally executed by the server 105, and accordingly the text recognition apparatus is generally provided in the server 105. In that case, the terminal devices 101, 102, 103 and the network 104 may be absent.

It should also be noted that the testing of the client applications installed on the terminal devices 101, 102, and 103 may be performed by the terminal devices themselves. In that case, the text recognition method may be executed by the terminal devices 101, 102, and 103, and the text recognition apparatus may accordingly be provided in them, so the exemplary system architecture 100 may not include the server 105 and the network 104.

The server 105 may be hardware or software. When it is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When it is software, it may be implemented either as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, depending on implementation needs.

FIG. 2 is a flowchart of a text recognition method proposed in an embodiment of this specification.

Step 201: Perform text recognition on the object to be recognized, and obtain a text block for each recognized line of text.

In a specific application, the object to be recognized may display text information. The object to be recognized described in the embodiments of this specification may be an image, which may be obtained by scanning a carrier bearing text information or by digital synthesis; no specific limitation is made here.

Text recognition on each line of text in the object to be recognized may be performed with optical character recognition (OCR). OCR is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text with a character recognition method. Put simply, it recognizes the text in an image.

In the embodiments of this specification, the object to be recognized may contain multiple lines of text. A line here is not tied to a particular direction; it denotes a group of characters arranged along a straight line. Performing text recognition on each line of text in the object to be recognized and obtaining a text block for each recognized line may include:

performing text recognition on the object to be recognized;

also recognizing the arrangement of the text in the object to be recognized, to obtain the recognized lines of text;

marking each recognized line of text as a text block.

A text block as described in the embodiments of this specification is a unit of text and does not constrain the text structure.

The embodiments of this specification carry out subsequent processing of the recognized text in units of text blocks.
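The three sub-steps of step 201 can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the `TextBlock` type, the `lines_to_blocks` helper, and the `(text, top, bottom)` tuple format for OCR output are all hypothetical, and the sketch assumes horizontal lines stacked vertically with the y axis growing downward.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """One recognized line of text with its vertical extent (hypothetical type)."""
    text: str
    top: float      # y coordinate of the upper edge
    bottom: float   # y coordinate of the lower edge


def lines_to_blocks(ocr_lines):
    """Mark each recognized line as a text block.

    ocr_lines is assumed to be an iterable of (text, top, bottom) tuples
    produced by some OCR step; this format is an assumption of the sketch.
    """
    blocks = [TextBlock(text, top, bottom) for text, top, bottom in ocr_lines]
    # Sort top to bottom so that neighbours in the list are adjacent lines.
    return sorted(blocks, key=lambda b: b.top)


blocks = lines_to_blocks([("second line", 30, 40), ("first line", 10, 20)])
print([b.text for b in blocks])
```

Sorting by the top edge is one simple way to make "adjacent lines" mean "neighbouring list entries"; a layout with columns or rotated text would need a different ordering.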

Step 203: Extract the block features of the text blocks.

In the embodiments of this specification, a block feature is a feature possessed by the text block itself. The innovation of the automatic text block recognition and merging scheme proposed here is that it works in units of text blocks: it identifies the block features of each text block and, using the block features of two adjacent lines of text blocks as described below, determines whether the text blocks belong to the same text information.

In the embodiments of this specification, the block features may include one or both of the line height of each text block and the line spacing between two adjacent lines of text blocks; no specific limitation is made here.

In the embodiments of this specification, extracting the block features of the text blocks may include:

creating a coordinate space;

identifying the coordinate values of the text blocks in the coordinate space;

determining the block features of the text blocks based on the coordinate values.

Specifically, the coordinate values can be used to calculate the line height of a text block, the line spacing between two adjacent lines of text blocks, or other dimensions; no specific limitation is made here.
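As a rough illustration of deriving block features from coordinate values, the sketch below computes line height and line spacing from axis-aligned bounding boxes. The `Box` type and both function names are hypothetical, and the sketch assumes that the y axis grows downward and that line spacing means the gap between the bottom of the upper block and the top of the lower block; the specification does not fix either convention.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Bounding box of a text block in the coordinate space (hypothetical type)."""
    left: float
    top: float
    right: float
    bottom: float


def line_height(box: Box) -> float:
    """Height of one text block, taken as the vertical extent of its box."""
    return box.bottom - box.top


def line_spacing(upper: Box, lower: Box) -> float:
    """Vertical gap between two adjacent lines (assumed definition)."""
    return lower.top - upper.bottom


upper = Box(left=0, top=10, right=200, bottom=30)
lower = Box(left=0, top=36, right=200, bottom=56)
print(line_height(upper), line_spacing(upper, lower))
```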

Step 205: Determine whether the block features of two adjacent lines of text blocks satisfy a preset feature condition, where the preset feature condition is established from training samples and is the feature condition that the block features of two adjacent lines of text blocks satisfy when the two lines belong to the same text information.

The embodiments of this specification analyze or learn from training samples to derive the preset feature condition. The preset feature condition is the feature threshold condition that the block features of two adjacent lines of text blocks exhibit when the two lines belong to the same text information, and it can serve as the text block merging condition. This makes the automatic text block merging scheme proposed in the embodiments of this specification feasible.
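One way such a threshold condition could be derived from labelled training pairs is sketched below. The specification does not say how the condition is learned, so everything here is an assumption made for illustration: the one-parameter rule "merge when spacing / height is at most t", the `fit_ratio_threshold` name, and the sample format `(line_height, line_spacing, same_info)`.

```python
def fit_ratio_threshold(samples):
    """Pick a threshold t for the rule: merge when spacing / height <= t.

    samples is a list of (line_height, line_spacing, same_info) tuples, where
    same_info is True when the two adjacent lines belong to the same text
    information. Line heights are assumed to be positive.
    """
    ratios = [(spacing / height, label) for height, spacing, label in samples]
    best_t, best_correct = None, -1
    # Try each observed ratio as a candidate threshold and keep the one
    # that classifies the most training pairs correctly.
    for t in sorted({r for r, _ in ratios}):
        correct = sum((r <= t) == label for r, label in ratios)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


training_pairs = [
    (20, 5, True),    # tight spacing, same paragraph
    (20, 10, True),
    (20, 30, False),  # wide gap, different paragraphs
    (20, 40, False),
]
print(fit_ratio_threshold(training_pairs))
```

A learned threshold of 1.0 would correspond to the "line height not less than line spacing" condition given as an example in the text; a real merging model could of course use more features than this single ratio.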

本说明书实施例记载的文本信息是指由文字组成的描述指定事项的段落或文件。相邻两行文本块属于同一文本信息是指,相邻两行文本块所包含的文字描述同一事项,从形式上说,可以合并到一个段落或文件。The text information described in the embodiments of the present specification refers to paragraphs or documents composed of words that describe specified matters. The fact that two adjacent lines of text blocks belong to the same text information means that the texts contained in the two adjacent lines of text blocks describe the same matter, and can be combined into one paragraph or document formally.

Specifically, in another example of this specification, determining whether the block features of two adjacent lines of text blocks meet the preset feature condition includes:

determining whether the line height of the text blocks in the two adjacent lines is not less than the line spacing.

In this case, the line height of the two adjacent lines of text blocks being not less than the line spacing is one example of the preset feature condition.

The principle is as follows. If the actual line height is not less than the line spacing, that is, the line height is close to or greater than the line spacing, the two adjacent lines of text blocks are close to each other and most likely belong to the same text information. Conversely, if the actual line height is less than the line spacing, the two adjacent lines of text blocks are far apart and most likely do not belong to the same text information.

In this example, the line heights of the two adjacent lines of text blocks are close or equal.

Determining whether the line height of the text blocks in the two adjacent lines is not less than the line spacing includes:

determining whether the line height of the text blocks in the two adjacent lines exceeds the line spacing by at least a preset difference.

In this case, the preset difference is one kind of preset feature condition.

The principle is as follows. If the line height exceeds the line spacing by at least the preset difference, that is, the character height in the text blocks is much greater than the line spacing (large characters, small spacing), the two adjacent lines of text blocks most likely belong to the same text information. Conversely, if the line height does not exceed the line spacing, or exceeds it by less than the preset difference, that is, the characters are small relative to the line spacing, the two adjacent lines of text blocks most likely do not belong to the same text information.
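The line-height-versus-line-spacing condition can be expressed as a single comparison. The sketch below is illustrative only: the function name and the `preset_diff` parameter are hypothetical, and taking the smaller of the two line heights is an assumption (the example above treats the two heights as close or equal).

```python
def height_exceeds_spacing(height_a: float, height_b: float,
                           spacing: float, preset_diff: float = 0.0) -> bool:
    """Return True if the line height exceeds the line spacing by at least
    preset_diff. With preset_diff = 0 this is "line height not less than
    line spacing"."""
    # Assumption: use the smaller height so that both lines satisfy the condition.
    height = min(height_a, height_b)
    return height - spacing >= preset_diff


print(height_exceeds_spacing(20.0, 20.0, 6.0, preset_diff=10.0))   # close lines
print(height_exceeds_spacing(20.0, 20.0, 15.0, preset_diff=10.0))  # margin too small
```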

In another example of this specification, determining whether the block features of the text blocks in two adjacent lines meet the preset feature condition includes:

if the block features of the text blocks include the line height, determining whether the difference between the line heights of the text blocks in the two adjacent lines is not greater than a preset line height difference.

In this case, the preset line height difference is one kind of preset feature condition.

The principle is as follows. If the difference between the line heights is not greater than the preset line height difference, that is, the line heights of the two adjacent lines of text blocks are close or equal, the two lines most likely belong to the same text information. If the difference between the line heights is greater than the preset line height difference, that is, the line heights differ considerably, the two lines most likely do not belong to the same text information.
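The line height difference check is equally compact. In this sketch the function name and the `max_height_diff` parameter are hypothetical names for the preset line height difference; its value would come from training samples or manual tuning.

```python
def heights_similar(height_a: float, height_b: float,
                    max_height_diff: float) -> bool:
    """Return True if the two line heights differ by no more than the
    preset line height difference."""
    return abs(height_a - height_b) <= max_height_diff


print(heights_similar(20.0, 21.5, max_height_diff=2.0))  # similar heights
print(heights_similar(40.0, 18.0, max_height_diff=2.0))  # heading above body text
```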

The specific examples above may be used individually or in combination to make the determination; this is not limited here. In addition to these two examples, other preset feature conditions and their corresponding block features may be analyzed or identified from the training samples; this is not specifically limited here.

Step 205 yields one of two determination results: the preset feature condition is met, or the preset feature condition is not met.

Step 207: Perform an operation on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

Not merging may mean arranging the two lines of text blocks as they are laid out in the original object to be recognized, or separating the two lines of text blocks according to a preset strategy.

If the determination result is to merge, the two adjacent lines of text blocks can be merged into a paragraph or into a document. The form of the merged text can be set freely and is not limited.
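Applying the determination results to a sequence of lines can be sketched as follows. This is an assumption-laden illustration: `merge_lines`, its `decisions` argument (one boolean per adjacent pair) and the `separator` parameter are hypothetical, and unmerged lines simply stay separate in their original order.

```python
from typing import List


def merge_lines(lines: List[str], decisions: List[bool],
                separator: str = " ") -> List[str]:
    """Merge adjacent lines into paragraphs.

    decisions[i] is the determination result for the pair (lines[i], lines[i + 1]):
    True means merge, False means do not merge.
    """
    if lines and len(decisions) != len(lines) - 1:
        raise ValueError("need exactly one decision per adjacent pair")
    paragraphs: List[str] = []
    for i, line in enumerate(lines):
        if i > 0 and decisions[i - 1]:
            paragraphs[-1] += separator + line  # continue the current paragraph
        else:
            paragraphs.append(line)             # start a new paragraph
    return paragraphs


print(merge_lines(["Big sale", "this weekend", "Terms apply"], [True, False]))
```

For text without word spaces, such as Chinese, an empty `separator` would be the natural choice.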

In the embodiments of this specification, after the text blocks are merged, subsequent text processing such as text review and quality inspection can be performed; this is not specifically limited here.

The embodiments of this specification provide a scheme for automatically identifying whether text should be merged. Using the block features of the text blocks together with the preset feature condition established from training samples, the scheme can automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, according to the determination result, whether to merge them. The scheme described in the embodiments of this specification can improve text recognition efficiency.

As shown in FIG. 3, where "×" denotes a character, the two vertically adjacent lines of text blocks have a line height far greater than the line spacing. In this case, the two lines of text blocks can be determined to belong to the same text information and can be merged.

As shown in FIG. 4, the two vertically adjacent lines of text blocks have a line height close to the line spacing. In this case, the two lines of text blocks can be determined to belong to the same text information and can be merged.

As shown in FIG. 5, of the two vertically adjacent lines of text blocks, the upper line has a greater line height than the lower line. The two lines of text blocks are therefore determined not to belong to the same text information and need not be merged.

FIG. 6 is a flowchart of an application example of a text recognition method proposed in an embodiment of this specification.

Step 602: Detect business information published by a user.

The method may be executed by a business information platform, which can receive the various kinds of business information uploaded or published by users, that is, detect the business information published by users. The business information may be video, text, or pictures, where text includes, for example, posters and pictorials; this is not specifically limited here.

In the embodiments of this specification, detecting the business information published by a user may include detecting it periodically or aperiodically, or retrieving from a database business information obtained by scanning; this is not specifically limited here.

Step 604: Extract the object to be recognized from the business information.

The business information described in the embodiments of this specification may contain the object to be recognized, such as an image. In this case, the business information may be text containing pictures, or it may be a video, in which case the image is an image frame extracted from the video.

For step 606, refer to step 201 above; for step 608, refer to step 203 above; for step 610, refer to step 205 above; for step 612, refer to step 207 above. This is not specifically limited here.

FIG. 7 is a flowchart of a text recognition method proposed in an embodiment of this specification. This method describes how the merging model used in practice is formed.

Step 702: Perform text recognition on the training samples, and obtain a text block from each recognized line of text.

For the text recognition of the training samples, refer to step 201 above; details are not repeated here.

The training samples may include one or both of white samples and black samples. In a white sample, two adjacent lines of text blocks belong to the same text information and can be merged; in a black sample, two adjacent lines of text blocks cannot be merged. The numbers of white and black samples can be chosen according to need and actual conditions and are not specifically limited.

Step 704: Extract the block features of the text blocks.

For details, refer to step 203 above; they are not repeated here.

Step 706: Train a merging model using the block features of the text blocks to determine the preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in an object to be recognized are identified, it is determined whether those block features meet the preset feature condition, and an operation is performed on the two adjacent lines of text blocks according to the determination result, the operation being one of merging and not merging.

The embodiments of this specification turn the process of determining whether the block features of two adjacent lines of text blocks meet the preset feature condition into a model. Following machine learning principles, the merging model is trained so that it continually adjusts and updates the preset feature condition based on the block features of the text blocks in the training samples, until the model's results reach a preset level of performance.

The merging model described in the embodiments of this specification may be a classification model, such as a decision tree; this is not specifically limited here.

In addition, before the merging model is trained using the block features of the text blocks, the method may further include:

acquiring merge information for the text blocks of two adjacent lines in the training samples, the merge information indicating whether the two adjacent text blocks are merged;

labeling the block features according to the merge information.

The merge information records whether the two adjacent lines of text blocks can be merged, and the block features are labeled on that basis. This is a supervised learning approach and improves the training efficiency of the merging model.
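Labeling the block features with the merge information amounts to pairing each feature tuple with a boolean. The sketch below assumes the block features are a (line height, line spacing) pair; `LabeledPair` and `label_features` are hypothetical names introduced only for illustration.

```python
from typing import List, NamedTuple, Sequence, Tuple


class LabeledPair(NamedTuple):
    """Block features of two adjacent lines plus the merge label."""
    line_height: float
    line_spacing: float
    merged: bool  # True for a white sample, False for a black sample


def label_features(features: Sequence[Tuple[float, float]],
                   merge_info: Sequence[bool]) -> List[LabeledPair]:
    """Attach the merge information to each (line_height, line_spacing) pair."""
    if len(features) != len(merge_info):
        raise ValueError("features and merge_info must have the same length")
    return [LabeledPair(h, s, bool(m)) for (h, s), m in zip(features, merge_info)]


samples = label_features([(20.0, 6.0), (20.0, 40.0)], [True, False])
print(samples)
```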

In another embodiment, the merging model may instead be trained without prior labeling, using an unsupervised or semi-supervised learning method.

In other embodiments of this specification, the preset feature condition may also be determined manually.

FIG. 8 is a flowchart of a text recognition method proposed in an embodiment of this specification.

For step 801, refer to step 201 above; for step 803, refer to step 203 above. This is not specifically limited here.

Step 805: Process the block features of two adjacent lines of text blocks with the merging model to obtain a determination result of whether the two adjacent lines of text blocks meet the preset feature condition. The merging model is obtained by training on the block features of adjacent lines of text blocks identified from training samples so as to determine the preset feature condition.

Processing the block features of two adjacent lines of text blocks with the merging model specifically includes:

taking the block features of the two adjacent lines of text blocks as input, and using the merging model to determine whether those block features meet the preset feature condition, thereby obtaining the determination result.

For step 807, refer to step 207 above; this is not specifically limited here.

FIG. 9 is a flowchart of an application example of a text recognition method proposed in an embodiment of this specification.

Step 902: Acquire training pictures.

Step 904: Perform OCR on the training pictures.

Step 906: Combine each recognized line of text into a text block, and obtain the coordinates of each text block.

Step 908: Calculate the line height of each text block from its coordinates.

Step 910: Calculate the line spacing between two adjacent lines of text blocks from their coordinates.

Step 912: Label whether the two adjacent lines of text blocks are merged.

Step 914: Obtain the block features, consisting of the line height of the text blocks and the line spacing between two adjacent lines of text blocks, together with the labeled merge information.

Step 916: Train a decision tree model using the obtained block features and the labeled merge information to adjust the preset feature condition, yielding the merging model.
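To make the training step concrete, the sketch below fits a one-level decision tree (a decision stump) in plain Python. It is a deliberately simplified stand-in for the decision tree model named above, not the specification's method: it assumes a single derived feature, line height minus line spacing, and learns the threshold that best separates merged from unmerged pairs. The function names and the toy data are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float, bool]  # (line_height, line_spacing, merged)


def train_stump(samples: List[Sample]) -> float:
    """Learn threshold t for the rule: merge if line_height - line_spacing >= t."""
    if not samples:
        raise ValueError("at least one training sample is required")
    diffs = sorted({h - s for h, s, _ in samples})
    # Candidate thresholds: below the minimum, between neighbours, above the maximum.
    candidates = ([diffs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(diffs, diffs[1:])]
                  + [diffs[-1] + 1.0])

    def accuracy(t: float) -> float:
        hits = sum((h - s >= t) == merged for h, s, merged in samples)
        return hits / len(samples)

    return max(candidates, key=accuracy)


def predict(line_height: float, line_spacing: float, threshold: float) -> bool:
    """Apply the learned condition to one pair of adjacent lines."""
    return line_height - line_spacing >= threshold


# Toy labeled data (hypothetical): two white samples, two black samples.
training = [(20.0, 5.0, True), (20.0, 8.0, True),
            (20.0, 30.0, False), (20.0, 45.0, False)]
threshold = train_stump(training)
print(threshold, predict(20.0, 6.0, threshold), predict(20.0, 40.0, threshold))
```

The learned threshold plays the role of the preset feature condition. A full decision tree would split on several features and at several depths, and in practice a library implementation would normally replace this hand-written search.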

FIG. 10 is a flowchart of a text recognition method proposed in an embodiment of this specification.

Step 1001: Acquire a picture to be predicted.

Step 1003: Perform OCR on the picture to be predicted.

Step 1005: Combine each recognized line of text into a text block, and obtain the coordinates of each text block.

Step 1007: Calculate the line height of each text block from its coordinates.

Step 1009: Calculate the line spacing between two adjacent lines of text blocks from their coordinates.

Step 1011: Label whether the two adjacent lines of text blocks are merged.

Step 1013: Obtain the block features consisting of the line height of the text blocks and the line spacing between two adjacent lines of text blocks.

Step 1015: Input the obtained block features into the trained merging model, which determines whether the block features meet the preset feature condition.

Step 1017: The merging model outputs the result of whether to merge.
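The prediction flow can be sketched end to end once OCR has produced line boxes. Everything here is an assumption for illustration: OCR output is represented as hypothetical `(text, top, bottom)` tuples sorted top to bottom with y growing downward, and the trained merging model is reduced to a single threshold on line height minus line spacing.

```python
from typing import List, Tuple

Line = Tuple[str, float, float]  # (text, top, bottom), y grows downward


def merge_text_blocks(blocks: List[Line], threshold: float) -> List[str]:
    """Compute block features for each adjacent pair and merge lines whose
    features meet the (assumed) condition height - spacing >= threshold."""
    if not blocks:
        return []
    merged = [blocks[0][0]]
    for (_, top_a, bottom_a), (text_b, top_b, bottom_b) in zip(blocks, blocks[1:]):
        height = min(bottom_a - top_a, bottom_b - top_b)  # line height feature
        spacing = top_b - bottom_a                        # line spacing feature
        if height - spacing >= threshold:
            merged[-1] += " " + text_b   # same text information: merge
        else:
            merged.append(text_b)        # different text information: keep apart
    return merged


ocr_lines = [("Big sale", 10.0, 30.0),
             ("this weekend", 34.0, 54.0),
             ("Terms apply", 120.0, 130.0)]
print(merge_text_blocks(ocr_lines, threshold=1.0))
```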

FIG. 11 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of this specification.

The apparatus may include:

a text recognition module 1101, which performs text recognition on an object to be recognized and obtains a text block from each recognized line of text;

an extraction module 1102, which extracts the block features of the text blocks;

a determination module 1103, which determines whether the block features of the text blocks in two adjacent lines meet a preset feature condition, the preset feature condition being established from training samples and being the condition that the block features of two adjacent lines of text blocks satisfy when those text blocks belong to the same text information;

an execution module 1104, which performs an operation on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

The text recognition apparatus described in the embodiments of this specification provides a scheme for automatically identifying whether text should be merged. Using the block features of the text blocks together with the preset feature condition established from training samples, it can automatically identify whether two adjacent lines of text blocks belong to the same text information and decide, according to the determination result, whether to merge them. The scheme described in the embodiments of this specification can improve text recognition efficiency.

Based on the same inventive concept, the embodiments of this specification also provide an electronic device, including:

a processor; and

a memory configured to store a computer program that, when executed, causes the processor to perform the following operations:

performing text recognition on an object to be recognized, and obtaining a text block from each recognized line of text;

extracting the block features of the text blocks;

determining whether the block features of the text blocks in two adjacent lines meet a preset feature condition, the preset feature condition being established from training samples and being the condition that the block features of two adjacent lines of text blocks satisfy when those text blocks belong to the same text information;

performing an operation on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

Based on the same inventive concept, the embodiments of this specification also provide a computer-readable storage medium containing a computer program for use with an electronic device, the computer program being executable by a processor to complete the following steps:

performing text recognition on an object to be recognized, and obtaining a text block from each recognized line of text;

extracting the block features of the text blocks;

determining whether the block features of the text blocks in two adjacent lines meet a preset feature condition, the preset feature condition being established from training samples and being the condition that the block features of two adjacent lines of text blocks satisfy when those text blocks belong to the same text information;

performing an operation on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

FIG. 12 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of this specification.

The apparatus may include:

a text recognition module 1201, which performs text recognition on training samples and obtains a text block from each recognized line of text;

an extraction module 1202, which extracts the block features of the text blocks;

a training module 1203, which trains a merging model using the block features of the text blocks to determine the preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in an object to be recognized are identified, it is determined whether those block features meet the preset feature condition, and an operation is performed on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

Based on the same inventive concept, the embodiments of this specification also provide an electronic device, including:

a processor; and

a memory configured to store a computer program that, when executed, causes the processor to perform the following operations:

performing text recognition on training samples, and obtaining a text block from each recognized line of text;

extracting the block features of the text blocks;

training a merging model using the block features of the text blocks to determine the preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in an object to be recognized are identified, it is determined whether those block features meet the preset feature condition, and an operation is performed on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

Based on the same inventive concept, the embodiments of this specification also provide a computer-readable storage medium containing a computer program for use with an electronic device, the computer program being executable by a processor to complete the following steps:

performing text recognition on training samples, and obtaining a text block from each recognized line of text;

extracting the block features of the text blocks;

training a merging model using the block features of the text blocks to determine the preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in an object to be recognized are identified, it is determined whether those block features meet the preset feature condition, and an operation is performed on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

FIG. 13 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of this specification.

The apparatus may include:

a text recognition module 1301, which performs text recognition on an object to be recognized and obtains a text block from each recognized line of text;

an extraction module 1302, which extracts the block features of the text blocks;

a model processing module 1303, which processes the block features of the text blocks in two adjacent lines with a merging model to obtain a determination result of whether the text blocks in the two adjacent lines meet a preset feature condition, the merging model being obtained by training on the block features of adjacent lines of text blocks identified from training samples so as to determine the preset feature condition;

an execution module 1304, which performs an operation on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

Based on the same inventive concept, the embodiments of this specification also provide an electronic device, including:

a processor; and

a memory configured to store a computer program that, when executed, causes the processor to perform the following operations:

performing text recognition on an object to be recognized, and obtaining a text block from each recognized line of text;

extracting the block features of the text blocks;

processing the block features of the text blocks in two adjacent lines with a merging model to obtain a determination result of whether the text blocks in the two adjacent lines meet a preset feature condition, the merging model being obtained by training on the block features of adjacent lines of text blocks identified from training samples so as to determine the preset feature condition;

performing an operation on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

Based on the same inventive concept, the embodiments of this specification also provide a computer-readable storage medium containing a computer program for use with an electronic device, the computer program being executable by a processor to complete the following steps:

performing text recognition on an object to be recognized, and obtaining a text block from each recognized line of text;

extracting the block features of the text blocks;

processing the block features of the text blocks in two adjacent lines with a merging model to obtain a determination result of whether the text blocks in the two adjacent lines meet a preset feature condition, the merging model being obtained by training on the block features of adjacent lines of text blocks identified from training samples so as to determine the preset feature condition;

performing an operation on the text blocks of the two adjacent lines according to the determination result, the operation being one of merging and not merging.

In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program the device themselves to "integrate" a digital system onto a single PLD, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compilers used in program development. The source code to be compiled must be written in a specific programming language, called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by describing the method flow in one of the above hardware description languages and programming it into an integrated circuit.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (for example, software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means it includes for implementing various functions can be regarded as structures within the hardware component. Alternatively, the means for implementing various functions can be regarded both as software modules implementing a method and as structures within a hardware component.

The systems, apparatuses, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer. Specifically, the computer may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above apparatus is described in terms of units divided by function. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) containing computer-usable program code.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

Memory may include non-permanent memory in a computer-readable medium, in the form of random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprise" and "include", and any other variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for related details, refer to the corresponding parts of the method embodiments.

The above descriptions are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of its claims.

Claims (17)

1. A text recognition method, comprising:
carrying out character recognition on an object to be recognized, and respectively obtaining text blocks based on recognized characters of each line;
extracting block features of the text block;
judging whether the block features of two adjacent lines of text blocks reach a preset feature condition, wherein the preset feature condition is a feature condition which is established by using a training sample and is met by the block features of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and executing operation on the two adjacent lines of text blocks according to the judgment result, wherein the operation comprises one of merging and non-merging.
2. The method of claim 1, wherein the object to be recognized is an image.
3. The method of claim 1, prior to performing character recognition on the object to be recognized, further comprising:
detecting service information published by a user;
and extracting the object to be recognized from the service information.
4. The method of claim 1, wherein the block features include one or both of a line height of each of the text blocks and a line spacing between two adjacent lines of the text blocks.
5. The method of claim 4, wherein determining whether the block features of two adjacent lines of the text block meet a preset feature condition comprises:
judging that the line height of the two adjacent lines of text blocks is not less than the line spacing.
6. The method of claim 5, wherein determining that the line height of the two adjacent lines of the text block is not less than the line spacing comprises:
judging whether the amount by which the line height of the two adjacent lines of text blocks exceeds the line spacing reaches a preset difference value.
7. The method of claim 4, wherein determining whether the block features of two adjacent lines of the text block meet a preset feature condition comprises:
judging whether the difference between the line heights of the two adjacent lines of text blocks is not greater than a preset line height difference.
8. The method of claim 4, wherein performing an operation on the two adjacent lines of text blocks according to the determination result comprises:
if the judgment result is that the preset feature condition is met, merging the two adjacent lines of text blocks into one paragraph.
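The feature condition of claims 4 to 8 can be illustrated with a minimal Python sketch. This is not the patented implementation: the `TextBlock` layout, the function names, the default thresholds, and the choice to combine the conditions of claims 5/6 and claim 7 in a single test are all assumptions made for illustration, as is joining merged lines with a space.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    """One recognized line of text with its vertical extent (hypothetical layout)."""
    text: str
    top: float     # y-coordinate of the top edge, in pixels
    bottom: float  # y-coordinate of the bottom edge, in pixels

    @property
    def line_height(self) -> float:
        return self.bottom - self.top


def should_merge(upper: TextBlock, lower: TextBlock,
                 min_margin: float = 0.0,
                 max_height_diff: float = 4.0) -> bool:
    """Return True when two adjacent lines satisfy the assumed feature condition."""
    line_spacing = lower.top - upper.bottom
    # Claims 5 and 6: the line height is not less than the line spacing,
    # optionally exceeding it by at least a preset difference (min_margin).
    smaller_height = min(upper.line_height, lower.line_height)
    height_covers_spacing = smaller_height - line_spacing >= min_margin
    # Claim 7: the two line heights differ by no more than a preset value.
    similar_heights = abs(upper.line_height - lower.line_height) <= max_height_diff
    return height_covers_spacing and similar_heights


def merge_lines(blocks: List[TextBlock]) -> List[str]:
    """Claim 8: merge adjacent lines that meet the condition into one paragraph."""
    paragraphs: List[str] = []
    for i, block in enumerate(blocks):
        if i > 0 and should_merge(blocks[i - 1], block):
            paragraphs[-1] += " " + block.text  # separator is an assumption
        else:
            paragraphs.append(block.text)
    return paragraphs
```

In this sketch, two lines that are 20 pixels tall and 5 pixels apart are merged, while a line that follows a 45-pixel gap starts a new paragraph.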
9. A text recognition method, comprising:
carrying out character recognition on the training sample, and respectively obtaining text blocks based on recognized characters of each line;
extracting block features of the text block;
and training a merging model by using the block features of the text blocks to determine a preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in the object to be recognized are recognized, whether the block features of the two adjacent lines of text blocks reach the preset feature condition is judged, and operation is performed on the two adjacent lines of text blocks according to a judgment result, wherein the operation comprises one of merging and non-merging.
10. The method of claim 9, prior to training a merging model using block features of the text block, further comprising:
acquiring merging information of two adjacent lines of the text blocks in the training sample, wherein the merging information represents whether the two adjacent text blocks are merged or not;
and marking the block features according to the merging information.
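As a rough illustration of claims 9 and 10, the sketch below "trains" the simplest possible merging model: a single threshold on the margin between line height and line spacing, chosen to classify the most labeled pairs correctly. The sample format, the function names, and the use of a one-parameter threshold as the model are assumptions for illustration; the claims do not specify the model type.

```python
from typing import List, Tuple

# Each labeled sample: (line_height, line_spacing, merged), where `merged`
# is the merging information marked on the pair of adjacent text blocks.
Sample = Tuple[float, float, bool]


def fit_margin_threshold(samples: List[Sample]) -> float:
    """Choose the margin threshold that classifies the most samples correctly.

    The margin is line_height - line_spacing; a pair is predicted as
    "merge" when its margin is at least the threshold.
    """
    if not samples:
        raise ValueError("at least one labeled sample is required")
    candidates = sorted({height - spacing for height, spacing, _ in samples})
    best_threshold, best_correct = candidates[0], -1
    for candidate in candidates:
        correct = sum(
            ((height - spacing) >= candidate) == merged
            for height, spacing, merged in samples
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def predict_merge(line_height: float, line_spacing: float,
                  threshold: float) -> bool:
    """Apply the learned preset feature condition to a new pair of lines."""
    return (line_height - line_spacing) >= threshold
```

The learned threshold plays the role of the preset feature condition: once fitted on labeled training samples, it is applied unchanged to adjacent lines of the object to be recognized.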
11. A text recognition method, comprising:
carrying out character recognition on an object to be recognized, and respectively obtaining text blocks based on recognized characters of each line;
extracting block features of the text block;
processing the block features of two adjacent lines of text blocks by using a merging model to obtain a judgment result of whether the two adjacent lines of text blocks reach a preset feature condition, wherein the merging model is obtained by training by using the block features of two adjacent lines of text blocks identified from a training sample to determine the preset feature condition;
and executing operation on the two adjacent lines of text blocks according to the judgment result, wherein the operation comprises one of merging and non-merging.
12. A text recognition apparatus comprising:
the character recognition module is used for carrying out character recognition on the object to be recognized and respectively obtaining text blocks based on recognized characters of each line;
the extraction module is used for extracting block features of the text block;
the judging module is used for judging whether the block features of the two adjacent lines of text blocks reach a preset feature condition, wherein the preset feature condition is a feature condition which is established by using a training sample and is met by the block features of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and the execution module is used for executing operation on the two adjacent lines of text blocks according to the judgment result, wherein the operation comprises one of merging and non-merging.
13. A text recognition apparatus comprising:
the character recognition module is used for carrying out character recognition on the training sample and respectively obtaining text blocks based on recognized characters of each line;
the extraction module is used for extracting block features of the text block;
and the training module is used for training a merging model by using the block features of the text blocks so as to determine a preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in the object to be recognized are recognized, whether the block features of the two adjacent lines of text blocks reach the preset feature condition is judged, and operation is performed on the two adjacent lines of text blocks according to a judgment result, wherein the operation comprises one of merging and non-merging.
14. A text recognition apparatus comprising:
the character recognition module is used for carrying out character recognition on the object to be recognized and respectively obtaining text blocks based on recognized characters of each line;
the extraction module is used for extracting block features of the text block;
the model processing module is used for processing the block features of two adjacent lines of text blocks by using a merging model to obtain a judgment result of whether the two adjacent lines of text blocks reach a preset feature condition, wherein the merging model is obtained by training by using the block features of two adjacent lines of text blocks identified from a training sample to determine the preset feature condition;
and the execution module is used for executing operation on the two adjacent lines of text blocks according to the judgment result, wherein the operation comprises one of merging and non-merging.
15. An electronic device, comprising:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
carrying out character recognition on an object to be recognized, and respectively obtaining text blocks based on recognized characters of each line;
extracting block features of the text block;
judging whether the block features of two adjacent lines of text blocks reach a preset feature condition, wherein the preset feature condition is a feature condition which is established by using a training sample and is met by the block features of the two adjacent lines of text blocks when the two adjacent lines of text blocks belong to the same text information;
and executing operation on the two adjacent lines of text blocks according to the judgment result, wherein the operation comprises one of merging and non-merging.
16. An electronic device, comprising:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
carrying out character recognition on the training sample, and respectively obtaining text blocks based on recognized characters of each line;
extracting block features of the text block;
and training a merging model by using the block features of the text blocks to determine a preset feature condition in the merging model, so that when the block features of two adjacent lines of text blocks in the object to be recognized are recognized, whether the block features of the two adjacent lines of text blocks reach the preset feature condition is judged, and operation is performed on the two adjacent lines of text blocks according to a judgment result, wherein the operation comprises one of merging and non-merging.
17. An electronic device, comprising:
a processor; and
a memory configured to store a computer program that, when executed, causes the processor to:
carrying out character recognition on an object to be recognized, and respectively obtaining text blocks based on recognized characters of each line;
extracting block features of the text block;
processing the block features of two adjacent lines of text blocks by using a merging model to obtain a judgment result of whether the two adjacent lines of text blocks reach a preset feature condition, wherein the merging model is obtained by training by using the block features of two adjacent lines of text blocks identified from a training sample to determine the preset feature condition;
and executing operation on the two adjacent lines of text blocks according to the judgment result, wherein the operation comprises one of merging and non-merging.
CN202010097683.2A 2020-02-17 2020-02-17 Text recognition method, device and electronic equipment Active CN111325195B (en)

Priority Applications (2)

Application Number Publication Number Priority Date Filing Date Title
CN202410096251.8A CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment
CN202010097683.2A CN111325195B (en) 2020-02-17 2020-02-17 Text recognition method, device and electronic equipment

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
CN202010097683.2A CN111325195B (en) 2020-02-17 2020-02-17 Text recognition method, device and electronic equipment

Related Child Applications (1)

Application Number Relation Publication Number Priority Date Filing Date Title
CN202410096251.8A Division CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111325195A (en) 2020-06-23
CN111325195B (en) 2024-01-26

Family

ID=71172116

Family Applications (2)

Application Number Status Publication Number Priority Date Filing Date Title
CN202410096251.8A Pending CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment
CN202010097683.2A Active CN111325195B (en) 2020-02-17 2020-02-17 Text recognition method, device and electronic equipment

Family Applications Before (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202410096251.8A Pending CN117912017A (en) 2020-02-17 2020-02-17 Text recognition method and device and electronic equipment

Country Status (1)

Country Link
CN (2) CN117912017A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5410611A (en) * 1993-12-17 1995-04-25 Xerox Corporation Method for identifying word bounding boxes in text
US6801673B2 (en) * 2001-10-09 2004-10-05 Hewlett-Packard Development Company, L.P. Section extraction tool for PDF documents
CN101833544A (en) * 2009-03-10 2010-09-15 株式会社理光 Method and system for extracting word part from portable electronic document
TW201039149A (en) * 2009-04-17 2010-11-01 Yu-Chieh Wu Robust algorithms for video text information extraction and question-answer retrieval
CN102063619A (en) * 2010-11-30 2011-05-18 汉王科技股份有限公司 Character row extraction method and device
CN107784301A (en) * 2016-08-31 2018-03-09 百度在线网络技术(北京)有限公司 Method and apparatus for identifying character area in image
CN109711406A (en) * 2018-12-25 2019-05-03 中南大学 A Multi-Orientation Image Text Detection Method Based on Multi-scale Rotation Anchor Mechanism
KR101985612B1 (en) * 2018-01-16 2019-06-03 김학선 Method for manufacturing digital articles of paper-articles
CN109948518A (en) * 2019-03-18 2019-06-28 武汉汉王大数据技术有限公司 A kind of method of PDF document content text paragraph polymerization neural network based
US20190266394A1 (en) * 2018-02-26 2019-08-29 Abc Fintech Co., Ltd. Method and device for parsing table in document image
CN110321895A (en) * 2019-04-30 2019-10-11 北京市商汤科技开发有限公司 Document identification method and device, electronic device, computer-readable storage medium
CN110414505A (en) * 2019-06-27 2019-11-05 深圳中兴网信科技有限公司 Processing method, processing system and the computer readable storage medium of image
CN110413962A (en) * 2019-06-28 2019-11-05 南京智录信息科技有限公司 Rimless form analysis technology in file and picture

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903047A (en) * 2021-10-29 2022-01-07 深圳前海环融联易信息科技服务有限公司 A method of cross-line merging of key contents of the full text of trade contracts based on deep learning
CN115905865A (en) * 2022-11-22 2023-04-04 蚂蚁财富(上海)金融信息服务有限公司 Text Merging Judgment Model Training Method and Text Merging Judgment Method
WO2024109597A1 (en) * 2022-11-22 2024-05-30 蚂蚁财富(上海)金融信息服务有限公司 Training method for text merging determination model, and text merging determination method

Also Published As

Publication number Publication date
CN111325195B (en) 2024-01-26
CN117912017A (en) 2024-04-19

Similar Documents

Publication Publication Date Title
CN112800848A (en) Structured extraction method, device and equipment of information after bill identification
CN110135411A (en) Business card recognition method and device
US12159452B2 (en) Automatically predicting text in images
CN114663904A (en) PDF document layout detection method, device, equipment and medium
CN112925905B (en) Method, device, electronic equipment and storage medium for extracting video subtitles
CN113111882B (en) Card identification method, device, electronic equipment and storage medium
CN111368902A (en) Data labeling method and device
CN113222022A (en) Webpage classification identification method and device
CN111507250B (en) Image recognition method, device and storage medium
CN111460355A (en) A kind of page parsing method and device
US9418310B1 (en) Assessing legibility of images
US20250046110A1 (en) Method for extracting and structuring information
CN114332873A (en) A kind of training method and device of recognition model
CN114817633A (en) Video classification method, device, equipment and storage medium
CN111325195A (en) Text recognition method, device and electronic device
CN114445807A (en) Text region detection method and device
CN114821616A (en) Page representation model training method and device and computing equipment
CN114067339A (en) Image recognition method and device, electronic device, and computer-readable storage medium
CN111242114B (en) Character recognition method and device
CN115004261B (en) Text line detection
CN115719488B (en) Text recognition method, text recognition device, electronic equipment and storage medium
CN115841672A (en) Character detection and identification method, device and equipment
CN112115952B (en) Image classification method, device and medium based on full convolution neural network
CN112183523A (en) Text detection method and device
CN117935245A (en) Character recognition method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310000 Zhejiang Province, Hangzhou City, Xihu District, Xixi Road 543-569 (continuous odd numbers) Building 1, Building 2, 5th Floor, Room 518

Patentee after: Alipay (Hangzhou) Digital Service Technology Co.,Ltd.

Country or region after: China

Address before: 310000 801-11 section B, 8th floor, 556 Xixi Road, Xihu District, Hangzhou City, Zhejiang Province

Patentee before: Alipay (Hangzhou) Information Technology Co., Ltd.

Country or region before: China