CN116012570A - A method, device and system for recognizing text information in an image - Google Patents
A method, device and system for recognizing text information in an image
- Publication number
- CN116012570A (application number CN202111234376.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- information
- scene
- text
- scenes
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Character Input (AREA)
Abstract
The present application discloses a method, device and system for recognizing text information in an image, relating to the field of computer vision (CV), that can quickly and accurately recognize the information in an image. In this application, scene analysis is performed on the image using an image semantic segmentation algorithm, and according to the analysis result the corresponding specialized optical character recognition (OCR) capability, such as a specialized model or specialized algorithm, is invoked to recognize information scene by scene; finally, the information recognition results of the multiple specialized models or specialized algorithms are integrated to obtain the information recognition result for the whole image. This method yields more accurate and reliable information recognition results, and provides users with a more convenient, efficient and reliable intelligent information recognition experience.
Description
Technical Field
The embodiments of the present application relate to the field of computer vision (CV), and in particular to a method, device and system for recognizing text information in an image.
Background
With the popularity of smart terminals, a terminal can recognize text information in an image through optical character recognition (OCR) technology, for example recognizing the text in a document or recognizing a license plate number.
Different types of text information are presented in different forms; for example, the layout and content of table text and card text differ considerably. Currently, the text in a card image can be recognized by a card-related model or algorithm, and the text in a table image can be recognized by a table-related model or algorithm. However, for somewhat more complex images, for example an image whose content includes both a card and a table, the recognition accuracy of the text in the image urgently needs to be improved.
Summary of the Invention
The present application provides a method, device and system for recognizing text information in an image, which can quickly and accurately recognize the information in the image.
To achieve the above purpose, the embodiments of the present application adopt the following technical solutions:
In a first aspect, a method for recognizing text information in an image is provided. The method includes: first, acquiring an image that includes multiple content areas; then, recognizing the text information in the multiple content areas respectively through multiple models; and finally, integrating the recognition results of the multiple models for the text information in the multiple content areas to obtain a text recognition result for the image. As an example, the above method may be applied to an information recognition system.
With the solution provided by the first aspect, information in multiple content areas of an image can be recognized through multiple specialized models or specialized algorithms, and the information recognition results of the multiple specialized models or specialized algorithms are finally integrated to obtain the information recognition result for the whole image. The method can obtain accurate and reliable information recognition results, and provides users with a more convenient, efficient and reliable intelligent information recognition experience.
In a possible implementation, the multiple content areas correspond to multiple scenes. The method further includes: classifying the multiple content areas of the image and determining the multiple scenes corresponding to the multiple content areas. Recognizing the text information in the multiple content areas respectively through multiple models includes: recognizing the text information in the multiple content areas respectively through the multiple models corresponding to the multiple scenes. As an example, the solution provided by the present application may perform scene analysis on the image to determine the multiple scenes corresponding to the multiple content areas, so that the corresponding specialized model or specialized algorithm can be invoked scene by scene to recognize information, and accurate and reliable information recognition results can be obtained.
In a possible implementation, classifying the multiple content areas and determining the multiple scenes corresponding to the multiple content areas includes: performing image semantic segmentation on the multiple content areas based on an image semantic segmentation model or algorithm, and determining the multiple scenes corresponding to the multiple content areas. By using an image semantic segmentation model or algorithm to determine the multiple scenes corresponding to the multiple content areas, a complex image scene can be analyzed accurately and quickly, and accurate and reliable information recognition results can be obtained.
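For illustration only, the scene classification step could be organized roughly as follows. The embodiments do not prescribe a particular segmentation network, so the model interface, the label set and the bounding-box representation below are all assumptions.

```python
import numpy as np

# Hypothetical scene label ids; the embodiments list cards, license plates,
# documents, tables, books, etc. as possible scenes.
SCENE_LABELS = {0: "background", 1: "card", 2: "table", 3: "document"}

def classify_content_areas(image: np.ndarray, seg_model) -> dict:
    """Group the pixels of a semantic segmentation result into per-scene regions.

    `seg_model` is assumed to expose a `predict()` method returning an H x W
    map of label ids with the same spatial size as the input image.
    """
    label_map = seg_model.predict(image)
    scenes = {}
    for label_id, scene_name in SCENE_LABELS.items():
        if scene_name == "background":
            continue
        mask = (label_map == label_id)
        if not mask.any():
            continue
        ys, xs = np.nonzero(mask)
        # Bounding box (x0, y0, x1, y1) of the content area for this scene.
        scenes[scene_name] = (int(xs.min()), int(ys.min()),
                              int(xs.max()), int(ys.max()))
    return scenes
```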
In a possible implementation, recognizing the text information in the multiple content areas respectively through the multiple models corresponding to the multiple scenes includes: cropping the multiple content areas according to the multiple scenes corresponding to the multiple content areas to obtain multiple image regions; and distributing the multiple image regions to the corresponding models respectively, so that the text information in the multiple image regions is recognized respectively through the multiple models. In order to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, as an example, the original image may be cropped by scene and the cropped parts distributed to the corresponding specialized models or specialized algorithms, which reduces the amount of data input to each specialized model or specialized algorithm while still obtaining accurate and reliable information recognition results.
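A minimal sketch of this crop-and-dispatch variant, assuming the bounding boxes produced above and a hypothetical `recognize()` interface on each specialized OCR model:

```python
def recognize_by_cropping(image, scenes, ocr_models):
    """Crop each scene's region and send only that crop to the scene's model.

    `scenes` maps a scene name to a bounding box (x0, y0, x1, y1), as in the
    sketch above; `ocr_models` maps a scene name to a specialized OCR model
    with an assumed `recognize()` method.
    """
    results = {}
    for scene_name, (x0, y0, x1, y1) in scenes.items():
        sub_image = image[y0:y1 + 1, x0:x1 + 1]      # cropped image region
        results[scene_name] = ocr_models[scene_name].recognize(sub_image)
    return results
```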
In a possible implementation, recognizing the text information in the multiple content areas respectively through the multiple models corresponding to the multiple scenes includes: determining, according to the multiple scenes corresponding to the multiple content areas, the position information of the multiple image regions corresponding to the multiple scenes in the image; and sending the image to the multiple models and distributing the position information of the multiple image regions to the corresponding models respectively, so that the text information in the multiple image regions is recognized respectively through the multiple models. In order to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, as an example, the position information of the image region corresponding to a scene may be provided to the specialized model or specialized algorithm, so as to obtain accurate and reliable information recognition results.
In a possible implementation, the method further includes: calculating saliency values of the pixels in the image based on a visual saliency detection (VSD) algorithm; and determining a focus area of the image according to the saliency values of the pixels in the image, where the saliency values of the pixels in the focus area of the image are greater than a preset threshold. To improve the speed and accuracy of information recognition, visual saliency analysis may also be performed on the image to identify the image area containing the pixels that draw the most user attention, that is, the focus area.
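As one possible choice of VSD algorithm (the embodiments mention a spectral residual saliency detection algorithm in connection with Fig. 10), the focus-area computation could look roughly like the sketch below; the resize dimensions, blur kernels and threshold are illustrative values, not values taken from the patent.

```python
import numpy as np
import cv2  # OpenCV, used here only for resizing and smoothing

def spectral_residual_saliency(gray: np.ndarray, size: int = 64) -> np.ndarray:
    """Per-pixel saliency via the spectral residual method (one common VSD choice)."""
    small = cv2.resize(gray.astype(np.float32), (size, size))
    spectrum = np.fft.fft2(small)
    log_amplitude = np.log1p(np.abs(spectrum)).astype(np.float32)
    phase = np.angle(spectrum)
    # The spectral residual is the log spectrum minus its local average.
    residual = log_amplitude - cv2.blur(log_amplitude, (3, 3))
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = cv2.GaussianBlur(saliency.astype(np.float32), (9, 9), 2.5)
    saliency = (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
    # Scale the saliency map back to the original image size.
    return cv2.resize(saliency, (gray.shape[1], gray.shape[0]))

def focus_area_mask(saliency: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Pixels whose saliency exceeds the preset threshold form the focus area."""
    return saliency > threshold
```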
In a possible implementation, the method further includes: determining regional saliency values of the multiple content areas according to the saliency values of the pixels in the multiple content areas and the area of the focus area, respectively. Recognizing the text information in the multiple content areas respectively through multiple models includes: recognizing, through the multiple models, the text information in those content areas whose regional saliency values are greater than a preset threshold. To improve the speed and accuracy of information recognition, regional saliency analysis may also be performed on the image to identify the image areas that draw the most user attention, that is, the focus areas.
In a possible implementation, determining the regional saliency values of the multiple content areas includes: calculating the saliency OcrSaliency(A_i) of a content area A_i based on Formula 1, where the saliency of the content area A_i is used to characterize the importance of the object represented by the content area A_i; A_i = {p_j}, p_j is the j-th pixel in the content area A_i, the content area A_i includes N pixels, 1 ≤ j ≤ N, N is a positive integer and N > 1; α is a custom parameter; S(p_j) is the visual saliency value of the pixel p_j, and H(p_j) is the mask value of the pixel p_j. As an example, the regional saliency values of the multiple content areas may be calculated based on Formula 1.
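Formula 1 itself is not reproduced in the text above, so the sketch below only combines the named quantities (α, S(p_j), H(p_j), N and the focus-area mask) into one plausible aggregation; it should not be read as the claimed formula.

```python
import numpy as np

def ocr_saliency(pixel_saliency: np.ndarray, focus_mask: np.ndarray,
                 region_mask: np.ndarray, alpha: float = 1.0) -> float:
    """Illustrative regional saliency OcrSaliency(A_i) for one content area.

    Averages S(p_j) * H(p_j) over the N pixels of the region and scales the
    result by the custom parameter alpha -- a guessed aggregation of the
    quantities named in the claim, not the patent's Formula 1.
    """
    region = region_mask.astype(bool)
    n = int(region.sum())                     # N, the number of pixels in A_i
    if n <= 1:
        return 0.0
    s = pixel_saliency[region]                # S(p_j), visual saliency values
    h = focus_mask[region].astype(float)      # H(p_j), focus-area mask values
    return float(alpha * np.sum(s * h) / n)
```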
In a possible implementation, the method further includes: performing text line analysis on the image, and determining that the image includes first information, where a first part of the first information belongs to a first scene and a second part of the first information belongs to a second scene; and merging the image regions corresponding to the first scene and the second scene. By combining the otherwise isolated image scene analysis and text line analysis for image semantic segmentation, the two results can complement each other, mis-segmentation caused by shadows and similar factors can be avoided, the fault tolerance of image semantic segmentation is improved, and the reliability and accuracy of image semantic segmentation are guaranteed.
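A rough sketch of this merge step, assuming scene regions and text lines are both represented as axis-aligned bounding boxes (an assumption made only for illustration):

```python
def merge_regions_split_by_text_lines(scenes: dict, text_lines: list) -> dict:
    """Merge scene regions that a single text line spans.

    `scenes` maps a scene name to a bounding box (x0, y0, x1, y1) and
    `text_lines` is a list of text-line bounding boxes in the same format.
    """
    def overlaps(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    for line_box in text_lines:
        hit = [name for name, box in scenes.items() if overlaps(box, line_box)]
        if len(hit) < 2:
            continue
        # The same piece of information (one text line) crosses several scene
        # regions, so merge those regions into a single covering box.
        boxes = [scenes.pop(name) for name in hit]
        scenes[hit[0]] = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                          max(b[2] for b in boxes), max(b[3] for b in boxes))
    return scenes
```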
In a possible implementation, the method further includes: displaying the text recognition result of the image superimposed on the image; or displaying the text recognition result of the image floating over the image; or displaying the text recognition result of the image in the layout format of the image; or re-typesetting according to a preset rule to display the text recognition result of the image, where the preset rule includes a preset position order or a preset saliency order. The present application does not limit the display form of the recognition result, which depends on specific requirements and device capabilities.
In a possible implementation, the multiple scenes corresponding to the multiple content areas include any combination of the following: a card scene, a license plate scene, a photo scene, a signboard scene, a poster scene, a street sign scene, a road sign scene, a document scene, a table scene, a book scene, and an environment scene. The solution provided by the present application can recognize text information in complex images under any combination of scenes, with strong generality and stable performance.
In a second aspect, a method for recognizing text information in an image is provided. The method is applied to a first device and includes: the first device acquires an image that includes multiple content areas; the first device classifies the multiple content areas and determines multiple scenes corresponding to the multiple content areas; the first device sends multiple information recognition tasks corresponding to the multiple scenes to a second device; and the first device receives, from the second device, an information recognition result for the image.
With the solution provided by the second aspect, the first device performs scene analysis on the image to determine the multiple scenes corresponding to the multiple content areas, and can thereby instruct the second device to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, so as to obtain accurate and reliable information recognition results.
In a possible implementation, the first device classifying the multiple content areas and determining the multiple scenes corresponding to the multiple content areas includes: the first device performs image semantic segmentation on the multiple content areas based on an image semantic segmentation model or algorithm, and determines the multiple scenes corresponding to the multiple content areas. By using an image semantic segmentation model or algorithm to determine the multiple scenes corresponding to the multiple content areas, a complex image scene can be analyzed accurately and quickly, and accurate and reliable information recognition results can be obtained.
In a possible implementation, the method further includes: the first device crops the multiple content areas according to the multiple scenes corresponding to the multiple content areas to obtain multiple sub-images. The first device sending the multiple information recognition tasks corresponding to the multiple scenes to the second device includes: the first device distributes the multiple sub-images to the corresponding models in the second device, so that the text information in the multiple sub-images is recognized respectively through the multiple models. In order to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, as an example, the original image may be cropped by scene and distributed to the corresponding specialized models or specialized algorithms, which reduces the amount of data input to each specialized model or specialized algorithm while still obtaining accurate and reliable information recognition results.
In a possible implementation, the method further includes: the first device determines, according to the multiple scenes corresponding to the multiple content areas, the position information of the multiple image regions corresponding to the multiple scenes in the image; and the first device sends the image to the second device and distributes the position information of the multiple image regions to the corresponding models in the second device, so that the text information in the multiple image regions is recognized respectively through the multiple models. In order to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, as an example, the position information of the image region corresponding to a scene may be provided to the specialized model or specialized algorithm, so as to obtain accurate and reliable information recognition results.
In a possible implementation, the method further includes: the first device calculates saliency values of the pixels in the image based on a visual saliency detection (VSD) algorithm; and the first device determines a focus area of the image according to the saliency values of the pixels in the image, where the saliency values of the pixels in the focus area of the image are greater than a preset threshold. To improve the speed and accuracy of information recognition, visual saliency analysis may also be performed on the image to identify the image area containing the pixels that draw the most user attention, that is, the focus area.
In a possible implementation, the method further includes: the first device determines regional saliency values of the multiple content areas according to the saliency values of the pixels in the multiple content areas and the area of the focus area, respectively. The first device sending the multiple information recognition tasks corresponding to the multiple scenes to the second device includes: the first device sends, to the second device, the multiple information recognition tasks corresponding to those content areas whose regional saliency values are greater than a preset threshold. To improve the speed and accuracy of information recognition, regional saliency analysis may also be performed on the image to identify the image areas that draw the most user attention, that is, the focus areas.
In a possible implementation, the first device determining the regional saliency values of the multiple content areas includes: the first device calculates the saliency OcrSaliency(A_i) of a content area A_i based on Formula 1, where the saliency of the content area A_i is used to characterize the importance of the object represented by the content area A_i; A_i = {p_j}, p_j is the j-th pixel in the content area A_i, the content area A_i includes N pixels, 1 ≤ j ≤ N, N is a positive integer and N > 1; α is a custom parameter; S(p_j) is the visual saliency value of the pixel p_j, and H(p_j) is the mask value of the pixel p_j. As an example, the first device may calculate the regional saliency values of the multiple content areas based on Formula 1.
In a possible implementation, the method further includes: the first device performs text line analysis on the image and determines that the image includes first information, where a first part of the first information belongs to a first scene and a second part of the first information belongs to a second scene; and the first device merges the image regions corresponding to the first scene and the second scene. By combining the otherwise isolated image scene analysis and text line analysis for image semantic segmentation, the two results can complement each other, mis-segmentation caused by shadows and similar factors can be avoided, the fault tolerance of image semantic segmentation is improved, and the reliability and accuracy of image semantic segmentation are guaranteed.
In a possible implementation, the method further includes: the first device displays the text recognition result of the image superimposed on the image; or the first device displays the text recognition result of the image floating over the image; or the first device displays the text recognition result of the image in the layout format of the image; or the first device re-typesets according to a preset rule to display the text recognition result of the image, where the preset rule includes a preset position order or a preset saliency order. The present application does not limit the display form of the recognition result, which depends on specific requirements and device capabilities.
In a possible implementation, the multiple scenes corresponding to the multiple content areas include any combination of the following: a card scene, a license plate scene, a photo scene, a signboard scene, a poster scene, a street sign scene, a road sign scene, a document scene, a table scene, a book scene, and an environment scene. The solution provided by the present application can recognize text information in complex images under any combination of scenes, with strong generality and stable performance.
In a third aspect, a method for recognizing text information in an image is provided. The method is applied to a second device and includes: the second device receives, from a first device, multiple information recognition tasks for an image, where the multiple information recognition tasks correspond to multiple scenes; the second device executes the multiple information recognition tasks respectively through multiple models to obtain multiple information recognition results; and the second device integrates the multiple information recognition results obtained by the multiple models to obtain a text recognition result for the image.
With the solution provided by the third aspect, the second device can recognize information in multiple content areas of the image through multiple specialized models or specialized algorithms, and finally integrate the information recognition results of the multiple specialized models or specialized algorithms to obtain the information recognition result for the whole image. The method can obtain accurate and reliable information recognition results, and provides users with a more convenient, efficient and reliable intelligent information recognition experience.
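The first-device/second-device split can be sketched as follows; the task message layout and the `recognize()` interface are hypothetical, and the integration step appears separately in the sketch after the integration implementations below.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class RecognitionTask:
    """One per-scene task sent by the first device to the second device.

    Per the two implementations above, a task may carry either a cropped
    sub-image or the full image plus the region's position information;
    this message layout is hypothetical.
    """
    scene: str                                               # e.g. "card", "table"
    sub_image: Optional[object] = None                       # cropped region, if any
    region_box: Optional[Tuple[int, int, int, int]] = None   # (x0, y0, x1, y1)

def second_device_execute(tasks: List[RecognitionTask],
                          ocr_models: Dict[str, object],
                          full_image=None) -> list:
    """Second device: run each task on that scene's specialized model."""
    partial_results = []
    for task in tasks:
        model = ocr_models[task.scene]
        if task.sub_image is not None:
            text = model.recognize(task.sub_image)
        else:
            x0, y0, x1, y1 = task.region_box
            text = model.recognize(full_image[y0:y1 + 1, x0:x1 + 1])
        partial_results.append((task.region_box, text))
    return partial_results          # integrated afterwards into one result
```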
In a possible implementation, the method further includes: the second device sends the text recognition result of the image to the first device, so that the user can conveniently view the recognized text information in the image.
In a possible implementation, the multiple information recognition tasks respectively correspond to multiple sub-images of the image, and the multiple sub-images correspond to multiple scenes. The second device executing the multiple information recognition tasks respectively through multiple models to obtain multiple information recognition results includes: the second device distributes the multiple sub-images to the models corresponding to their respective scenes, so that the text information in the multiple sub-images is recognized respectively through the multiple models. By recognizing, scene by scene, the sub-images corresponding to the multiple scenes in the original image, the amount of data input to each specialized model or specialized algorithm can be reduced while accurate and reliable information recognition results are obtained.
In a possible implementation, the multiple information recognition tasks are respectively used to indicate the image and the position information of multiple image regions in the image, and the multiple image regions correspond to multiple scenes. The second device executing the multiple information recognition tasks respectively through multiple models to obtain multiple information recognition results includes: the second device distributes the image and the position information of the multiple image regions to the models corresponding to their respective scenes, so that the text information in the multiple image regions is recognized respectively through the multiple models. By recognizing, according to the scene and position information, the image regions corresponding to the multiple scenes in the original image, accurate and reliable information recognition results can be obtained.
In a possible implementation, the second device integrating the multiple information recognition results obtained by the multiple models to obtain the text recognition result for the image includes: the second device integrates the multiple information recognition results in the layout format of the image to obtain the text recognition result for the image; or the second device integrates the multiple information recognition results according to a preset rule to obtain the text recognition result for the image, where the preset rule includes a preset position order or a preset saliency order. The present application does not limit the integration form of the recognition results, which depends on specific requirements and device capabilities.
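A small sketch of the two preset integration rules (position order and saliency order); the tuple layout of the partial results is an assumption made for illustration.

```python
def integrate_results(partial_results: list, rule: str = "position") -> str:
    """Integrate per-region recognition results under a preset rule.

    Each element of `partial_results` is assumed to be a tuple of
    (region_box, region_saliency, text).  "position" roughly reproduces the
    image's own layout (top-to-bottom, then left-to-right); "saliency" puts
    the most salient region's text first.
    """
    if rule == "position":
        ordered = sorted(partial_results, key=lambda r: (r[0][1], r[0][0]))
    elif rule == "saliency":
        ordered = sorted(partial_results, key=lambda r: r[1], reverse=True)
    else:
        raise ValueError(f"unknown preset rule: {rule}")
    return "\n".join(text for _, _, text in ordered)
```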
In a fourth aspect, a first device is provided. The first device includes: a processing unit, configured to acquire an image including multiple content areas, and to classify the multiple content areas and determine multiple scenes corresponding to the multiple content areas; and a transceiver unit, configured to send multiple information recognition tasks corresponding to the multiple scenes to a second device, and to receive an information recognition result for the image from the second device.
With the solution provided by the fourth aspect, the first device performs scene analysis on the image to determine the multiple scenes corresponding to the multiple content areas, and can thereby instruct the second device to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, so as to obtain accurate and reliable information recognition results.
In a possible implementation, the processing unit classifying the multiple content areas and determining the multiple scenes corresponding to the multiple content areas includes: the processing unit performs image semantic segmentation on the multiple content areas based on an image semantic segmentation model or algorithm, and determines the multiple scenes corresponding to the multiple content areas. By using an image semantic segmentation model or algorithm to determine the multiple scenes corresponding to the multiple content areas, a complex image scene can be analyzed accurately and quickly, and accurate and reliable information recognition results can be obtained.
In a possible implementation, the processing unit is further configured to crop the multiple content areas according to the multiple scenes corresponding to the multiple content areas to obtain multiple sub-images; and the transceiver unit is specifically configured to distribute the multiple sub-images to the corresponding models in the second device, so that the text information in the multiple sub-images is recognized respectively through the multiple models. In order to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, as an example, the original image may be cropped by scene and distributed to the corresponding specialized models or specialized algorithms, which reduces the amount of data input to each specialized model or specialized algorithm while still obtaining accurate and reliable information recognition results.
In a possible implementation, the processing unit is further configured to determine, according to the multiple scenes corresponding to the multiple content areas, the position information of the multiple image regions corresponding to the multiple scenes in the image; and the transceiver unit is specifically configured to send the image to the second device and to distribute the position information of the multiple image regions to the corresponding models in the second device, so that the text information in the multiple image regions is recognized respectively through the multiple models. In order to invoke the corresponding specialized model or specialized algorithm scene by scene for information recognition, as an example, the position information of the image region corresponding to a scene may be provided to the specialized model or specialized algorithm, so as to obtain accurate and reliable information recognition results.
In a possible implementation, the processing unit is further configured to calculate saliency values of the pixels in the image based on a visual saliency detection (VSD) algorithm, and to determine a focus area of the image according to the saliency values of the pixels in the image, where the saliency values of the pixels in the focus area of the image are greater than a preset threshold. To improve the speed and accuracy of information recognition, visual saliency analysis may also be performed on the image to identify the image area containing the pixels that draw the most user attention, that is, the focus area.
In a possible implementation, the processing unit is further configured to determine regional saliency values of the multiple content areas according to the saliency values of the pixels in the multiple content areas and the area of the focus area, respectively; and the transceiver unit is specifically configured to send, to the second device, the multiple information recognition tasks corresponding to those content areas whose regional saliency values are greater than a preset threshold. To improve the speed and accuracy of information recognition, regional saliency analysis may also be performed on the image to identify the image areas that draw the most user attention, that is, the focus areas.
In a possible implementation, the processing unit is specifically configured to calculate the saliency OcrSaliency(A_i) of a content area A_i based on Formula 1, where the saliency of the content area A_i is used to characterize the importance of the object represented by the content area A_i; A_i = {p_j}, p_j is the j-th pixel in the content area A_i, the content area A_i includes N pixels, 1 ≤ j ≤ N, N is a positive integer and N > 1; α is a custom parameter; S(p_j) is the visual saliency value of the pixel p_j, and H(p_j) is the mask value of the pixel p_j. As an example, the first device may calculate the regional saliency values of the multiple content areas based on Formula 1.
In a possible implementation, the processing unit is further configured to perform text line analysis on the image and determine that the image includes first information, where a first part of the first information belongs to a first scene and a second part of the first information belongs to a second scene; and to merge the image regions corresponding to the first scene and the second scene. By combining the otherwise isolated image scene analysis and text line analysis for image semantic segmentation, the two results can complement each other, mis-segmentation caused by shadows and similar factors can be avoided, the fault tolerance of image semantic segmentation is improved, and the reliability and accuracy of image semantic segmentation are guaranteed.
In a possible implementation, the first device further includes a display unit, configured to display the text recognition result of the image superimposed on the image; or to display the text recognition result of the image floating over the image; or to display the text recognition result of the image in the layout format of the image; or to re-typeset according to a preset rule to display the text recognition result of the image, where the preset rule includes a preset position order or a preset saliency order. The present application does not limit the display form of the recognition result, which depends on specific requirements and device capabilities.
In a possible implementation, the multiple scenes corresponding to the multiple content areas include any combination of the following: a card scene, a license plate scene, a photo scene, a signboard scene, a poster scene, a street sign scene, a road sign scene, a document scene, a table scene, a book scene, and an environment scene. The solution provided by the present application can recognize text information in complex images under any combination of scenes, with strong generality and stable performance.
In a fifth aspect, a second device is provided. The second device includes: a transceiver unit, configured to receive, from a first device, multiple information recognition tasks for an image, where the multiple information recognition tasks correspond to multiple scenes; and a processing unit, configured to execute the multiple information recognition tasks respectively through multiple models to obtain multiple information recognition results, and to integrate the multiple information recognition results obtained by the multiple models to obtain a text recognition result for the image.
With the solution provided by the fifth aspect, the second device can recognize information in multiple content areas of the image through multiple specialized models or specialized algorithms, and finally integrate the information recognition results of the multiple specialized models or specialized algorithms to obtain the information recognition result for the whole image. The method can obtain accurate and reliable information recognition results, and provides users with a more convenient, efficient and reliable intelligent information recognition experience.
In a possible implementation, the transceiver unit is further configured to send the text recognition result of the image to the first device, so that the user can conveniently view the recognized text information in the image.
In a possible implementation, the multiple information recognition tasks respectively correspond to multiple sub-images of the image, and the multiple sub-images correspond to multiple scenes; the processing unit is specifically configured to distribute the multiple sub-images to the models corresponding to their respective scenes, so that the text information in the multiple sub-images is recognized respectively through the multiple models. By recognizing, scene by scene, the sub-images corresponding to the multiple scenes in the original image, the amount of data input to each specialized model or specialized algorithm can be reduced while accurate and reliable information recognition results are obtained.
In a possible implementation, the multiple information recognition tasks are respectively used to indicate the image and the position information of multiple image regions in the image, and the multiple image regions correspond to multiple scenes; the processing unit is specifically configured to distribute the image and the position information of the multiple image regions to the models corresponding to their respective scenes, so that the text information in the multiple image regions is recognized respectively through the multiple models. By recognizing, according to the scene and position information, the image regions corresponding to the multiple scenes in the original image, accurate and reliable information recognition results can be obtained.
In a possible implementation, the processing unit is specifically configured to integrate the multiple information recognition results in the layout format of the image to obtain the text recognition result for the image; or to integrate the multiple information recognition results according to a preset rule to obtain the text recognition result for the image, where the preset rule includes a preset position order or a preset saliency order. The present application does not limit the integration form of the recognition results, which depends on specific requirements and device capabilities.
In a sixth aspect, a first device is provided. The first device includes: a memory, configured to store a computer program; a transceiver, configured to receive or send radio signals; and a processor, configured to execute the computer program so that the first device performs the method in any possible implementation of the second aspect.
In a seventh aspect, a second device is provided. The second device includes: a memory, configured to store a computer program; a transceiver, configured to receive or send radio signals; and a processor, configured to execute the computer program so that the second device performs the method in any possible implementation of the third aspect.
In an eighth aspect, an information recognition system is provided. The information recognition system includes the first device in any possible implementation of the fourth aspect or the sixth aspect, and the second device in any possible implementation of the fifth aspect or the seventh aspect. The information recognition system is used to implement the method in any possible implementation of the first aspect.
In a ninth aspect, a computer-readable storage medium is provided, on which computer program code is stored. When the computer program code is executed by a processor, the processor implements the method in any possible implementation of the second aspect or the third aspect.
In a tenth aspect, a chip system is provided. The chip system includes a processor and a memory, and computer program code is stored in the memory; when the computer program code is executed by the processor, the processor implements the method in any possible implementation of the second aspect or the third aspect. The chip system may consist of a chip, or may include a chip and other discrete components.
In an eleventh aspect, a computer program product is provided. The computer program product includes computer instructions; when the computer instructions are run on a computer, the computer implements the method in any possible implementation of the second aspect or the third aspect.
Brief Description of the Drawings
Fig. 1 is an example diagram of an information recognition result;
Fig. 2 is a schematic diagram of a hardware structure of a first device according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a process of recognizing information in an image based on specialized models according to an embodiment of the present application;
Fig. 4 is an example diagram of an information recognition result according to an embodiment of the present application;
Fig. 5 is a first example diagram of a complex image according to an embodiment of the present application;
Fig. 6 is a second example diagram of a complex image according to an embodiment of the present application;
Fig. 7 is a third example diagram of a complex image according to an embodiment of the present application;
Fig. 8 is a first example diagram of a process of recognizing text information in an image according to an embodiment of the present application;
Fig. 9 is a second example diagram of a process of recognizing text information in an image according to an embodiment of the present application;
Fig. 10 is an example diagram of obtaining a saliency distribution map based on a spectral residual saliency detection algorithm according to an embodiment of the present application;
Fig. 11 is an example diagram of coordinate information of an image region according to an embodiment of the present application;
Fig. 12 is a comparison diagram of two image semantic segmentation effects according to an embodiment of the present application;
Fig. 13 is a third example diagram of a process of recognizing information in an image according to an embodiment of the present application;
Fig. 14 is a fourth example diagram of a process of recognizing information in an image according to an embodiment of the present application;
Fig. 15 is an example diagram of physical cropping results of two images according to an embodiment of the present application;
Fig. 16A is an example diagram of displaying an information recognition result according to an embodiment of the present application;
Fig. 16B is an example diagram of displaying another information recognition result according to an embodiment of the present application;
Fig. 17 is an interaction flowchart of a method for recognizing text information in an image according to an embodiment of the present application;
Fig. 18 is a schematic diagram of an image semantic segmentation stage according to an embodiment of the present application;
Fig. 19 is a first comparison diagram of information recognition effects according to an embodiment of the present application;
Fig. 20 is a second comparison diagram of information recognition effects according to an embodiment of the present application;
Fig. 21 is a third comparison diagram of information recognition effects according to an embodiment of the present application;
Fig. 22 is a structural block diagram of a first device according to an embodiment of the present application.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, in the description of the embodiments of this application, "multiple" means two or more than two.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly specifying the quantity of the indicated technical features. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this embodiment, unless otherwise specified, "multiple" means two or more.
An embodiment of the present application provides a method for recognizing text information in an image, which is applied in the process of recognizing information in an image based on optical character recognition (OCR) technology. The types of text information recognized through information recognition may include, but are not limited to, one or more of the following: characters, numbers, letters, symbols, and the like.
In the embodiments of the present application, an image may include, but is not limited to, a picture, an image frame, and the like. An image is used to represent one or more objects. For example, the objects represented by an image may include, but are not limited to, one or more of a card (such as an identity card, a passport, a bank card, a business card, etc.), a license plate, a photo, a signboard, a poster, a street sign, a road sign, a document, a table, a book, and the like, which is not limited in this application. Taking the image shown in Fig. 1 as an example, the objects represented by picture A in Fig. 1 include a document, a table, and a card.
The method for recognizing text information in an image provided by the embodiments of the present application may be performed by a device-side device (such as a smartphone), by a cloud-side device (such as a server), or jointly by a device-side device and a cloud-side device, which is not limited in this application.
For example, the device-side device (hereinafter referred to as the "first device") may include, but is not limited to, a smartphone, a personal computer (PC) (such as a laptop, a desktop computer, an ultra-mobile personal computer (UMPC), etc.), a tablet computer, a television, an augmented reality (AR)/virtual reality (VR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in transportation safety, a wireless terminal in a smart city, a sensor-type device (such as a monitoring terminal), an Internet of things (IoT) device (such as a smart home device), and the like. The present application does not limit the specific functions and structure of the first device.
请参考图2,图2以智能手机为例,出了本申请实施例提供的一种第一设备(即端侧设备)的硬件结构示意图。如图2所示,第一设备可以包括处理器210,存储器(包括外部存储器接口220和内部存储器221),通用串行总线(universal serial bus,USB)接口230,充电管理模块240,电源管理模块241,电池242,天线1,天线2,移动通信模块250,无线通信模块260,音频模块270,扬声器270A,受话器270B,麦克风270C,耳机接口270D,传感器模块280,按键290,马达291,指示器292,摄像组件293,显示屏294,用户标识模块 (subscriberidentity module,SIM)接口295等。其中,传感器模块280可以包括陀螺仪传感器,加速度传感器,磁传感器,触摸传感器,指纹传感器,压力传感器,气压传感器,距离传感器,接近光传感器,温度传感器,环境光传感器,骨传导传感器等。Please refer to FIG. 2 . FIG. 2 shows a schematic diagram of a hardware structure of a first device (that is, a device at the end side) provided by an embodiment of the present application, taking a smart phone as an example. As shown in Figure 2, the first device may include a
可以理解的是,本发明实施例示意的结构并不构成对第一设备的具体限定。在本申请另一些实施例中,第一设备可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件,或软件和硬件的组合实现。It can be understood that, the structure shown in the embodiment of the present invention does not constitute a specific limitation on the first device. In some other embodiments of the present application, the first device may include more or fewer components than shown in the figure, or some components may be combined, or some components may be separated, or different component arrangements may be made. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
The processor 210 may include one or more processing units. For example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a flight controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent components, or may be integrated into one or more processors.
A memory may also be provided in the processor 210 for storing instructions and data. In some embodiments, the memory in the processor 210 is a cache. This memory may hold instructions or data that the processor 210 has just used or uses cyclically. If the processor 210 needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 210, thereby improving the efficiency of the system.
In some embodiments, the processor 210 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, and/or a universal serial bus (USB) interface, and the like.
The charging management module 240 is configured to receive charging input from a charger. The power management module 241 is used to connect the battery 242, the charging management module 240, and the processor 210. The power management module 241 receives input from the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 294, the camera assembly 293, the wireless communication module 260, and the like.
The wireless communication function of the first device may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the first device may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antennas may be used in combination with a tuning switch.
In the embodiments of this application, the first device may send an information recognition task to the cloud-side device through the antenna 1 and/or the antenna 2, and receive the information recognition result from the cloud-side device.
The mobile communication module 250 can provide solutions for wireless communication applied to the first device, including 2G/3G/4G/5G/6G. The mobile communication module 250 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 250 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 250 may also amplify the signal modulated by the modem processor and convert it into an electromagnetic wave through the antenna 1 to radiate it out. In some embodiments, at least some functional modules of the mobile communication module 250 may be provided in the processor 210. In some embodiments, at least some functional modules of the mobile communication module 250 may be provided in the same component as at least some modules of the processor 210.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be transmitted into a medium- or high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 270A, the receiver 270B, etc.), or displays an image or video through the display screen 294. In some embodiments, the modem processor may be an independent component. In other embodiments, the modem processor may be independent of the processor 210 and be provided in the same component as the mobile communication module 250 or another functional module.
The wireless communication module 260 can provide solutions for wireless communication applied to the first device, including wireless local area networks (WLAN) (such as Wi-Fi networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 260 may be one or more components integrating at least one communication processing module. The wireless communication module 260 receives electromagnetic waves through the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor 210. The wireless communication module 260 may also receive the signal to be sent from the processor 210, perform frequency modulation and amplification on it, and convert it into an electromagnetic wave through the antenna 2 to radiate it out.
In some embodiments, the antenna 1 of the first device is coupled to the mobile communication module 250, and the antenna 2 is coupled to the wireless communication module 260, so that the first device can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), new radio (NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The first device implements the display function through the GPU, the display screen 294, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 294 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 294 is used to display images, videos, and the like. The display screen 294 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the first device may include 1 or N display screens 294, where N is a positive integer greater than 1.
第一设备可以通过ISP,摄像组件293,视频编解码器,GPU,显示屏294以及应用处理器等实现拍摄功能。The first device can realize the shooting function through the ISP, the camera component 293 , the video codec, the GPU, the display screen 294 and the application processor.
外部存储器接口220可以用于连接外部存储卡,例如Micro SD卡,固态硬盘等,实现扩展第一设备的存储能力。外部存储卡通过外部存储器接口220与处理器210通信,实现数据存储功能。例如将音乐,视频,图像等文件保存在外部存储卡中。The external memory interface 220 can be used to connect an external memory card, such as a Micro SD card, a solid state drive, etc., to expand the storage capacity of the first device. The external memory card communicates with the
The internal memory 221 may be used to store computer-executable program code, and the executable program code includes instructions. The internal memory 221 may include a program storage area and a data storage area. The program storage area may store an operating system, an application required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data created during use of the first device (such as audio data and a phone book). In addition, the internal memory 221 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 210 performs the various functional applications and data processing of the first device by running the instructions stored in the internal memory 221 and/or the instructions stored in the memory provided in the processor.
The first device may implement audio functions, such as music playback and recording, through the audio module 270, the speaker 270A, the receiver 270B, the microphone 270C, the application processor, and the like. For the specific working principles and functions of the audio module 270, the speaker 270A, the receiver 270B, and the microphone 270C, reference may be made to the descriptions in conventional technologies.
按键290包括开机键,音量键等。按键290可以是机械按键。也可以是触摸式按键。第一设备可以接收按键输入,产生与第一设备的用户设置以及功能控制有关的键信号输入。The keys 290 include a power key, a volume key and the like. The key 290 may be a mechanical key. It can also be a touch button. The first device can receive key input and generate key signal input related to user settings and function control of the first device.
需要说明的是,图2所示第一设备包括的硬件模块只是示例性地描述,并不对第一设备的具体结构做出限定。例如,若第一设备是电视机,那么第一设备还可以包括遥控组件等部件。若第一设备是PC,那么第一设备还可以包括键盘、鼠标等部件。It should be noted that the hardware modules included in the first device shown in FIG. 2 are only described as examples, and do not limit the specific structure of the first device. For example, if the first device is a television, the first device may further include components such as a remote control component. If the first device is a PC, the first device may further include components such as a keyboard and a mouse.
In this application, the operating system of the first device may include, but is not limited to, Microsoft, Apple, and other operating systems.
Taking a first device whose operating system has a layered architecture as an example, the software of the first device can be divided into several layers, each of which has a clear role and division of labor. The layers communicate with each other through software interfaces. For example, the software structure of the first device may be divided, from top to bottom, into the application layer (application layer for short), the application framework layer (framework layer for short), the system libraries and Android runtime, and the kernel layer (also called the driver layer).
其中,在本申请实施例中,第一设备能够提供信息识别服务。例如,第一设备的应用层中可以包括用于文本等信息识别的识文类应用程序(application,APP)。又如,第一设备可以集成有提供文本等信息识别相关功能模块。又如,第一设备可以提供文本等信息识别相关接口,如快应用或者小程序等。又如,第一设备中安装的应用中可以集成有文本等信息识别服务。本申请实施例不限定第一设备提供信息识别服务的具体方式。关于信息识别功能的具体实现方式,可以参考常规技术,这里不做赘述。Wherein, in the embodiment of the present application, the first device can provide an information identification service. For example, the application layer of the first device may include a text-aware application program (application, APP) for identifying information such as text. For another example, the first device may be integrated with functional modules related to providing text and other information identification. In another example, the first device may provide an interface related to information identification such as text, such as a quick application or a small program. For another example, the application installed in the first device may be integrated with information identification services such as text. The embodiment of the present application does not limit the specific manner in which the first device provides the information identification service. For the specific implementation of the information identification function, reference may be made to conventional technologies, which will not be repeated here.
为了更加精确、全面地进行信息识别,在一种可能的实现方式中,可以对图像的场景进行分析,然后基于具体场景对应的专项模型或专项算法对图像中的信息进行识别。In order to perform information identification more accurately and comprehensively, in a possible implementation manner, the scene of the image may be analyzed, and then the information in the image may be identified based on a specific model or a specific algorithm corresponding to the specific scene.
作为一种示例,图像的场景与图像所表示的对象有关。例如,若图像所表示的对象是卡证,则图像属于卡证场景。若图像所表示的对象是表格,则图像属于表格场景。若图像所表示的对象是文档,则图像属于文档场景。若图像所表示的对象是车牌,则图像属于车牌场景。As an example, the scene of an image is related to the object represented by the image. For example, if the object represented by the image is a card, the image belongs to the card scene. If the object represented by the image is a table, the image belongs to the table scene. If the object represented by the image is a document, the image belongs to the document scene. If the object represented by the image is a license plate, the image belongs to the license plate scene.
作为另一种示例,图像的场景还可以与图像的颜色、亮度或纹理中的一个或多个决定。例如,若表达自然风景的图像较暗,则图像属于夜晚场景。若表达自然风景的图像较亮,则图像属于白天场景。As another example, the scene of the image may also be determined by one or more of the image's color, brightness, or texture. For example, if an image expressing a natural scenery is dark, the image belongs to a night scene. If the image expressing the natural scenery is brighter, the image belongs to the daytime scene.
As an example, the process of recognizing the information in an image based on specialized models may be as shown in FIG. 3. As shown in FIG. 3, if there is text in the image, scene analysis can further be performed on the image. If it is determined that the image belongs to scene 1, model 1 is used to recognize the image and obtain the recognition result. If it is determined that the image belongs to scene 2, model 2 is used to recognize the image; if it belongs to scene 3, model 3 is used; if it belongs to scene 4, model 4 is used; and if it belongs to scene 5, model 5 is used. Here, model 1 is the specialized model for scene 1, model 2 is the specialized model for scene 2, model 3 is the specialized model for scene 3, model 4 is the specialized model for scene 4, and model 5 is the specialized model for scene 5.
However, when processing some complex images, a recognition method based on a single specialized model, similar to that shown in FIG. 3, still suffers from missed detections or false detections. As shown in FIG. 4, the objects represented by picture A include a document, a table, and a card, but a method that recognizes the information in the image based on a single specialized model can usually determine only one scene for picture A, for example, by determining the scene of the image based on the proportion and position of the image regions in which the above objects are located.
Exemplarily, if scene analysis of picture A shown in FIG. 4 determines that the picture belongs to the document scene, the specialized model corresponding to the document scene cannot successfully recognize all the information in the table region because it cannot recognize some symbols or formulas, and cannot successfully recognize all the information in the card region because it cannot recognize some text with embossed or hollowed-out characteristics. By the same reasoning, if scene analysis determines that picture A belongs to the table scene, the specialized model corresponding to the table scene cannot successfully recognize all the information in the document and card regions; and if scene analysis determines that picture A belongs to the card scene, the specialized model corresponding to the card scene cannot successfully recognize all the information in the document and table regions. In the recognition result shown in FIG. 4, the text in picture A that is successfully recognized is marked with boxes.
It should be noted that the example shown in FIG. 4 merely takes, as an example, a recognition result superimposed on the image (for example, marked in boxes). The embodiments of this application do not limit the specific display form of the recognition result. For example, the information recognition result may also be displayed by listing the recognized text and other information, displayed in document form (with information that was not successfully recognized shown as garbled characters), or displayed in the form of mask values.
To solve the problems of the conventional methods described above, the embodiments of this application provide a method for recognizing text information in an image. The method performs scene analysis on the image and, when the objects represented by the image belong to different scenes, segments the image by scene to obtain multiple image regions. Then, the corresponding specialized models or specialized algorithms are used to recognize text information in the multiple image regions respectively. Finally, the recognition results of the multiple specialized models or specialized algorithms for the multiple image regions are integrated to obtain the final recognition result of the text information in the image.
通过本申请实施例提供的一种识别图像中文本信息的方法,可以更加精确、全面地对复杂图像进行文本信息识别。其中,在本申请实施例中,复杂图像可以包括但不限于以下特点中的一种或多种:图像所表示的对象属于多个场景、图像中包含细小密集文字、图像中包含干扰背景等。Through a method for recognizing text information in an image provided by an embodiment of the present application, the text information recognition of complex images can be performed more accurately and comprehensively. Among them, in the embodiment of the present application, the complex image may include but not limited to one or more of the following characteristics: the object represented by the image belongs to multiple scenes, the image contains small and dense text, the image contains interference background, etc.
Images whose objects belong to multiple scenes are shown, for example, in FIG. 4 or FIG. 6: the document, table, and card represented by picture A in FIG. 4 belong to the document scene, the table scene, and the card scene respectively, and the objects represented by picture C shown in FIG. 5 include a table and a photo, which belong to the table scene and the picture scene respectively. An image containing small, dense text is shown, for example, in FIG. 5, where the screen in the center of picture B contains small, dense text; with conventional methods, the fine, dense text in picture B shown in FIG. 5 is not easily recognized successfully. An image containing an interfering background is shown, for example, in FIG. 7, where the background of picture D shown in FIG. 7 includes interfering text.
以图片A为例,如图8所示,本申请实施例提供的一种识别图像中文本信息的方法可以包括以下四个阶段:Taking picture A as an example, as shown in Figure 8, a method for recognizing text information in an image provided by the embodiment of the present application may include the following four stages:
阶段1:焦点区域分析阶段。Phase 1: Focus Area Analysis Phase.
其中,焦点区域分析阶段用于估计图像中用户感兴趣的区域或者用户关注的区域。Wherein, the focus area analysis stage is used to estimate the area of interest of the user or the area of concern of the user in the image.
It can be understood that, according to the human visual mechanism, when a user looks at an image, what the user pays attention to first includes, but is not limited to, one or more of the following features of the image: text, numbers, letters, symbols, graphics, logos, colors, textures, light, and the specific positions of these features. In the embodiments of this application, for the regions of the image that the user is interested in or pays attention to, the priority and accuracy of information recognition need to be guaranteed first; for the regions of the image that the user is not interested in or does not pay attention to, the priority of information recognition is lower, or such regions can be ignored.
基于人的视觉机制,在本申请实施例中,可以通过模拟人的视觉机制,确定图像中用户感兴趣或者关注的区域。Based on the human visual mechanism, in the embodiment of the present application, the region of the user's interest or attention in the image may be determined by simulating the human visual mechanism.
For example, the saliency values of the pixels in the image can be determined based on a visual saliency detection (VSD) algorithm, such as a phase-spectrum saliency detection algorithm, a spectral residual saliency detection algorithm, or a quaternion Fourier transform algorithm, so that the focus area of the image can be determined by analyzing the saliency values of the pixels in the image. For example, the saliency values of the pixels in the focus area of the image are greater than a preset threshold.
The principle of a VSD algorithm is to simulate human visual characteristics with algorithms from the computer vision (CV) field and to extract the regions of human interest in an image. Exemplarily, a VSD algorithm can use feature representations such as the color, brightness, and edges of the image to judge the difference between a target region and its surrounding pixels, and then compute the differences between pixels in the image to determine visually salient features. Based on the VSD algorithm, the first device can obtain the image features that the human visual system pays attention to, such as text, numbers, letters, symbols, graphics, logos, colors, textures, and light.
以谱残差显著性检测算法为例,谱残差显著性检测算法的原理是:通过对输入图像的对数谱分析,提取图像在光谱域的谱残差,并实现在空间域构造相应的显著性图(saliency map)。在一些实施例中,还可以通过设置不同的参数,对比其对显著性图造成的影响,并讨论算法本身的优点和限制,以得到最优的检测结果。Taking the spectral residual saliency detection algorithm as an example, the principle of the spectral residual saliency detection algorithm is: through the logarithmic spectrum analysis of the input image, the spectral residual of the image in the spectral domain is extracted, and the corresponding spectral residual is constructed in the spatial domain. Saliency map. In some embodiments, it is also possible to obtain optimal detection results by setting different parameters, comparing their effects on the saliency map, and discussing the advantages and limitations of the algorithm itself.
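As a purely illustrative sketch of the spectral residual procedure described above, the following Python function computes a normalized saliency map from a grayscale image; the filter sizes and the final normalization step are assumptions chosen for readability, not values specified by this application. Pixels whose normalized saliency exceeds a preset threshold would then constitute the focus area discussed in stage 1.

```python
import numpy as np
from scipy.ndimage import uniform_filter, gaussian_filter

def spectral_residual_saliency(gray: np.ndarray) -> np.ndarray:
    """Spectral residual saliency map for a 2-D grayscale image in [0, 1]."""
    # Fourier transform: log-amplitude spectrum and phase spectrum.
    spectrum = np.fft.fft2(gray)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)
    phase = np.angle(spectrum)

    # Spectral residual = log spectrum minus its local average.
    averaged = uniform_filter(log_amplitude, size=3)
    residual = log_amplitude - averaged

    # Back to the spatial domain; squared magnitude gives the raw saliency map.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = gaussian_filter(saliency, sigma=2.5)

    # Normalize to [0, 1] so a preset threshold can pick out the focus area.
    return (saliency - saliency.min()) / (saliency.max() - saliency.min() + 1e-8)
```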
Please refer to FIG. 10, which shows an example of a saliency distribution map obtained based on the spectral residual saliency detection algorithm. In FIG. 10, the brightness of each pixel indicates the feature value of that pixel's image features; the higher the brightness, the larger the feature value. The feature value indicates the degree to which the pixel attracts the user's attention. As shown in FIG. 10, the bright regions of the saliency distribution map correspond to the central screen region of picture B, which attracts a high degree of user attention, while the dark regions of the saliency distribution map correspond to the venue background region of picture B, which attracts a low degree of user attention.
以图8所示图片A为例,经过焦点区域分析,可以确定图片A中文字、数字、字母和符号特征所在的图像区域是用户感兴趣或者关注的区域。而图像中不包括文字、数字、字母、符号、图形、徽标、纹理等内容的区域,不是用户感兴趣或者被用户关注的区域,或者被用户感兴趣或者关注的程度较低。Taking picture A shown in FIG. 8 as an example, through focus area analysis, it can be determined that the image area where the characters, numbers, letters and symbols in picture A are located is the area that the user is interested in or concerned about. The areas in the image that do not include text, numbers, letters, symbols, graphics, logos, textures, etc. are not areas that are of interest or concern to the user, or are less interested or concerned by the user.
阶段2:图像语义分割阶段。Stage 2: Image semantic segmentation stage.
其中,图像语义分割阶段用于通过图像场景分析,确定图像所表示的多个对象和该多个对象所属的场景,以及多个场景分别在图像中的对应位置。Among them, the image semantic segmentation stage is used to determine the multiple objects represented by the image, the scenes to which the multiple objects belong, and the corresponding positions of the multiple scenes in the image through image scene analysis.
进一步的,图像语义分割阶段还用于根据上述多个对象所属的场景,将图像分割为多个图像区域。Further, the image semantic segmentation stage is also used to segment the image into multiple image regions according to the scenes to which the above multiple objects belong.
例如,对象所属的场景可以包括但不限于以下中的一种或多种:卡证场景、车牌场景、照片场景、招牌场景、海报场景、路牌场景、路标场景、文档场景、表格场景、书籍场景、环境场景等。For example, the scene to which the object belongs may include but not limited to one or more of the following: card scene, license plate scene, photo scene, signboard scene, poster scene, street sign scene, road sign scene, document scene, form scene, book scene , environmental scenes, etc.
在一些实施例中,基于图像的焦点区域中的特征分布,可以进一步确定图像所表示的对象。其中,图像所表示的对象可以包括但不限于以下中的一种或多种:卡证(如身份证、护照、银行卡、名片等)、车牌、照片、招牌、海报、路牌、路标、文档、表格、书籍。In some embodiments, the object represented by the image may be further determined based on the feature distribution in the focal area of the image. Among them, the object represented by the image may include but not limited to one or more of the following: cards (such as ID cards, passports, bank cards, business cards, etc.), license plates, photos, signboards, posters, road signs, road signs, documents , forms, books.
For example, if the focus area of the image contains preset text (such as the words on an ID card), a preset number of digits (such as 18), a preset logo or graphic, and a preset texture, it can be determined that the object represented by the image is an ID card.
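Purely as a hedged illustration of this kind of rule (the keyword, the 18-digit pattern, and the label strings below are assumptions and are not specified by this application), such a check might be sketched as:

```python
import re

def classify_focus_region(text: str) -> str:
    """Toy rule: decide whether recognized focus-area text looks like an ID card.

    The keyword check and the 18-digit rule mirror the example above; a real
    system would also check preset logos, graphics, and textures.
    """
    has_keyword = "身份证" in text or "id card" in text.lower()
    has_id_number = re.search(r"\d{17}[\dXx]", text) is not None  # 18-character citizen ID pattern
    if has_keyword and has_id_number:
        return "id_card"
    return "unknown"
```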
需要说明的是,在本申请实施例中,在经过图像语义分割之后,一个场景对应的图像区域用于表示一个或多个对象。即,一个场景可以对应图像中的一个或多个图像区域。It should be noted that, in the embodiment of the present application, after image semantic segmentation, an image region corresponding to a scene is used to represent one or more objects. That is, a scene may correspond to one or more image regions in an image.
As shown in FIG. 8, after image scene analysis is performed on the image, three image regions 8a, 8b, and 8c corresponding to the document scene, the table scene, and the card scene can be obtained respectively. The objects represented by image region 8a belong to the document scene, the objects represented by image region 8b belong to the table scene, and the objects represented by image region 8c belong to the card scene.
作为一种示例,多个场景分别在图像中的位置可以用多个场景对应的图像区域的坐标信息来表示。As an example, the respective positions of the multiple scenes in the image may be represented by coordinate information of image regions corresponding to the multiple scenes.
作为一种示例,第一设备可以采用U-Net和DenseNet确定图像所表示的多个对象所属的场景以及多个场景分别在图像中的位置。As an example, the first device may use U-Net and DenseNet to determine the scenes to which multiple objects represented by the image belong and the respective positions of the multiple scenes in the image.
其中,U-Net是一种深度全卷积网络。在本申请实施例中,U-Net可以用于进行图像语义分割。例如,U-Net可以使用包含压缩路径和扩展路径的对称U形结构进行建模,以确定图像所表示的多个对象所属的场景以及多个场景分别在图像中的位置。DenseNet是一种深度神经网络。在本申请实施例中,DenseNet可以用于进行图像检测和分类。例如,DenseNet可以使用稠密的跳层连接来达到复用各卷积层特征的目的,从而达到更好的场景识别精度。Among them, U-Net is a deep fully convolutional network. In the embodiment of this application, U-Net can be used for image semantic segmentation. For example, U-Net can be modeled using a symmetric U-shaped structure containing compressed paths and expanded paths to determine the scene to which multiple objects represented by an image belong and the positions of multiple scenes in the image. DenseNet is a deep neural network. In the embodiment of this application, DenseNet can be used for image detection and classification. For example, DenseNet can use dense layer-skip connections to achieve the purpose of reusing the features of each convolutional layer, thereby achieving better scene recognition accuracy.
例如,如图11所示,图片A所表示的多个对象文档、表格和卡证分别属于文档场景、表格场景和卡证场景,经过图像语义分割,可以确定文档场景在图像中的位置是坐标(0,0)-(x1,y1)的矩形区域,表格场景在图像中的位置是坐标(x1,y2)-(x2,y1)的矩形区域,卡证场景在图像中的位置是坐标(x1,0)-(x2,y2)的矩形区域。For example, as shown in Figure 11, multiple object documents, forms and cards represented by picture A belong to the document scene, form scene and card scene respectively. After image semantic segmentation, it can be determined that the position of the document scene in the image is the coordinate (0, 0)-(x1, y1), the position of the table scene in the image is the rectangular area of coordinates (x1, y2)-(x2, y1), and the position of the card scene in the image is the coordinate ( The rectangular area of x1,0)-(x2,y2).
其中,图11所示坐标系仅作为一种坐标系示例,本申请不限定坐标系的原点、x轴和y 轴的具体位置。另外,在一些实施例中,坐标系还可以是极坐标系或其他坐标系,多个场景对应的图像区域还可以用曲线表达式来表示,本申请不限定。Wherein, the coordinate system shown in FIG. 11 is only used as an example of a coordinate system, and the application does not limit the specific positions of the origin, x-axis and y-axis of the coordinate system. In addition, in some embodiments, the coordinate system may also be a polar coordinate system or other coordinate systems, and image regions corresponding to multiple scenes may also be represented by curve expressions, which is not limited in this application.
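As a sketch of how the per-pixel scene labels produced by semantic segmentation might be converted into the rectangular coordinate regions described above (treating label value 0 as background is an assumption):

```python
import numpy as np

def scene_regions(label_map: np.ndarray) -> dict:
    """Turn a per-pixel scene label map (H x W array of integer class ids) into
    one axis-aligned rectangle per scene, expressed as (x1, y1, x2, y2)."""
    regions = {}
    for scene_id in np.unique(label_map):
        if scene_id == 0:          # assume 0 means "no scene" / background
            continue
        ys, xs = np.nonzero(label_map == scene_id)
        regions[int(scene_id)] = (int(xs.min()), int(ys.min()),
                                  int(xs.max()) + 1, int(ys.max()) + 1)
    return regions
```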
在一些实施例中,为了确保图像语义分割的可靠性和准确性,提高图像语义分割的容错性,在进行图像场景分析之后,还可以结合文本行分析,确定图像所表示的对象所属的场景和场景在图像中的位置。In some embodiments, in order to ensure the reliability and accuracy of image semantic segmentation and improve the fault tolerance of image semantic segmentation, after image scene analysis, text line analysis can also be combined to determine the scene and location of the object represented by the image. The position of the scene in the image.
For example, assume that the image includes first information (such as a first text line) and that image scene analysis determines that the scenes to which the above multiple objects belong include a first scene and a second scene. Then, by performing text line analysis on the image, when it is determined that a first part of the first information belongs to the first scene and a second part of the first information belongs to the second scene, the image regions corresponding to the first scene and the second scene are merged.
以图12所示明暗区域反差较大的图像为例,其中,图12所示图像所表示的对象仅有一个,即文档。但是在进行图像语义分割时,如图12中的(a)所示,若仅进行图像场景分析,则由于阴影关系,该文档被分割为一明一暗两个场景对应的两个图像区域(如图12中的(a) 所示区域1和区域2)。Take the image shown in FIG. 12 with a large contrast between light and dark areas as an example, wherein the image shown in FIG. 12 represents only one object, that is, a document. However, when image semantic segmentation is performed, as shown in (a) in Figure 12, if only image scene analysis is performed, due to the shadow relationship, the document is segmented into two image regions corresponding to two scenes, one bright and one dark ( Area 1 and Area 2) are shown in (a) in Figure 12).
若结合图像场景分析和文本行分析,如图12中的(b)所示,则可以确定文本行1201横跨了区域1和区域2。根据场景的定义,我们可以知道:同一文本行只可能出现在一个场景对应的区域中。因此,对于这种情况,如图12中的(b)所示,则可以将两个场景对应的两个图像区域合并。If the image scene analysis and the text line analysis are combined, as shown in (b) in FIG. 12 , it can be determined that the text line 1201 straddles Area 1 and
By combining image scene analysis and text line analysis, which would otherwise be performed in isolation, for image semantic segmentation, the two results complement each other. This avoids mis-segmentation caused by shadows, as shown in (a) of FIG. 12, improves the fault tolerance of image semantic segmentation, and ensures the reliability and accuracy of image semantic segmentation.
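A minimal sketch of this merging rule, assuming that scene regions and detected text lines are already available as axis-aligned (x1, y1, x2, y2) rectangles (that representation is an assumption), could be:

```python
def overlaps(box_a, box_b) -> bool:
    """Axis-aligned overlap test for (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2

def merge_regions_split_by_text_lines(regions: list, text_lines: list) -> list:
    """If one text line overlaps two scene regions, merge those regions into one
    bounding rectangle (the same text line cannot belong to two scenes)."""
    merged = [list(r) for r in regions]
    for line in text_lines:
        hits = [i for i, r in enumerate(merged) if overlaps(r, line)]
        if len(hits) > 1:
            xs1, ys1, xs2, ys2 = zip(*(merged[i] for i in hits))
            union = [min(xs1), min(ys1), max(xs2), max(ys2)]
            # drop the extra hit regions, then keep the union in place of the first hit
            merged = [r for i, r in enumerate(merged) if i not in hits[1:]]
            merged[hits[0]] = union
    return [tuple(r) for r in merged]
```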
In some embodiments, in order to prevent unimportant regions from occupying the resources of the information recognition units, regional saliency calculation may also be performed after image semantic segmentation is completed, as shown in FIG. 9 (that is, the regional saliency calculation stage shown in FIG. 9, referred to as stage 5).
其中,区域显著性计算阶段用于通过显著性分析,以确定多个场景对应的图像区域分别表示的多个对象的重要程度。Wherein, the regional saliency calculation stage is used for determining the importance of multiple objects respectively represented by image regions corresponding to multiple scenes through saliency analysis.
作为一种示例,可以根据多个场景对应的图像区域中的亮度、纹理、焦点区域的掩码值等确定其重要程度。As an example, the degree of importance may be determined according to the brightness, texture, mask value of the focus area, and the like in image areas corresponding to multiple scenes.
作为一种示例,在得到多个图像区域之后,可以分别根据多个图像区域中像素的显著性值和焦点区域的面积,确定多个图像区域所表示的对象的重要程度。As an example, after the multiple image areas are obtained, the importance of the objects represented by the multiple image areas may be determined according to the saliency values of the pixels in the multiple image areas and the area of the focus area respectively.
For example, the saliency of an image region (denoted OcrSaliency) can be calculated based on Formula 1 (presented as an image in the original document), which combines the visual saliency values and the mask values of the pixels in the region:
The higher the saliency value, the more important the corresponding image region; the lower the saliency value, the less important the corresponding image region. A_i is the i-th image region of the image, where i is a positive integer and i > 1. A_i = {p_j}, where p_j is the j-th pixel in image region A_i; image region A_i contains N pixels, 1 ≤ j ≤ N, N is a positive integer, and N > 1. α is a custom parameter. S(p_j) is the visual saliency value of pixel p_j in image region A_i, which can be computed in stage 1 above based on a VSD algorithm (such as phase spectrum, spectral residual, or quaternion Fourier transform). H(p_j) is the mask value of pixel p_j in image region A_i; for example, if pixel p_j lies at a text position, then H(p_j) = 1.
在本申请实施例中,计算得到的区域显著性结果可以用于在后续信息识别时,根据实际情况最小限度的降低处理负荷。例如,在GPU受限时,对于显著性较低的区域,不向信息识别单元发送对应图像区域,优先对显著性更高的图像区域进行信息识别。In the embodiment of the present application, the calculated regional saliency results can be used to minimize the processing load according to the actual situation during subsequent information identification. For example, when the GPU is limited, the corresponding image area is not sent to the information recognition unit for the lower salient area, and the information identification is performed on the higher salient image area preferentially.
As an example, in the embodiments of this application, a saliency threshold Th may be set. If the calculated regional saliency is less than the saliency threshold Th, for example OcrSaliency(A_i) < Th, information recognition for image region A_i is abandoned.
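Since Formula 1 is only reproduced as an image in the original document, the sketch below assumes one plausible instantiation consistent with the symbol definitions above: averaging S(p_j)·H(p_j) over the N pixels of region A_i and scaling by α. The default threshold of 0.25 matches the example of FIG. 13 but is otherwise an assumption.

```python
import numpy as np

def ocr_saliency(S: np.ndarray, H: np.ndarray, alpha: float = 1.0) -> float:
    """Assumed instantiation of Formula 1 for one region A_i.

    S: visual saliency values S(p_j) of the region's pixels (from stage 1).
    H: mask values H(p_j), e.g. 1 where a pixel lies on text, else 0.
    """
    n = S.size
    return float(alpha * np.sum(S * H) / n)

def regions_to_recognize(region_maps: dict, threshold: float = 0.25) -> list:
    """Keep only the regions whose saliency reaches the threshold Th; the rest
    are skipped so they do not occupy the information recognition units."""
    return [rid for rid, (S, H) in region_maps.items()
            if ocr_saliency(S, H) >= threshold]
```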
Please refer to FIG. 13, which shows an example of the information recognition process, taking the recognition of picture B shown in FIG. 5 as an example. As shown in FIG. 13, image 1301 is segmented by image semantic segmentation into a screen region 13a in the center of the image and a background region 13b composed of chairs, curtains, the ceiling, and so on, where the screen region 13a belongs to the table scene and the background region 13b belongs to the environment scene. After regional saliency calculation is performed on the screen region 13a and the background region 13b, assume that the saliency threshold Th is 0.25, the saliency of the screen region 13a is 0.41, and the saliency of the background region 13b is 0.18. Then, as shown in FIG. 13, since the saliency of the background region 13b (0.18) is less than the saliency threshold Th (0.25), information recognition is not performed on the background region 13b. In the case shown in FIG. 13, when the results are integrated, the information recognition result of image 1301 is obtained based only on the recognition result of the screen region 13a.
通过区域显著性的计算,可以有效过滤掉图像中不重要的区域,以避免该不重要的区域对信息识别单元的资源占用,导致的信息识别单元负荷较高的问题。Through the calculation of regional saliency, unimportant areas in the image can be effectively filtered out, so as to avoid the resource occupation of the information identification unit by the unimportant areas, resulting in a high load of the information identification unit.
阶段3:任务分送阶段。Phase 3: Task distribution phase.
其中,任务分送阶段用于将多个场景对应的图像区域分送不同场景的信息识别单元,以进行相应场景下的信息识别。其中,信息识别单元中集成有用于对对应场景下进行信息识别的专项模型或者专项算法。Wherein, the task distribution stage is used to distribute image regions corresponding to multiple scenes to information recognition units of different scenes, so as to perform information recognition in corresponding scenes. Wherein, the information identification unit is integrated with a special model or a special algorithm for identifying information in a corresponding scene.
在一种可能的实现方式中,可以将完整图像(如图片A)分别发送至不同场景的信息识别单元,同时告知信息识别单元对应图像区域在图像中的位置,以便信息识别单元进行信息识别时,准确识别对应图像区域所表示的对象中的信息。In a possible implementation, the complete image (such as picture A) can be sent to the information recognition unit of different scenes respectively, and at the same time, the information recognition unit is notified of the position of the corresponding image area in the image, so that when the information recognition unit performs information recognition , accurately identify the information in the object represented by the corresponding image region.
As shown in FIG. 14, when picture A is sent to information recognition unit a, information recognition unit b, and information recognition unit c respectively, the units can at the same time be informed of the positions of image regions 8a, 8b, and 8c in picture A, so that information recognition unit a, information recognition unit b, and information recognition unit c can recognize the information in the corresponding regions of picture A according to the positions of image regions 8a, 8b, and 8c in picture A.
In another possible implementation, after the positions of the multiple scenes in the image are determined, the image may also be physically cropped according to the image regions corresponding to the multiple scenes to obtain multiple independent sub-images, and the multiple sub-images are then distributed to the information recognition units of the different scenes for text information recognition in the corresponding scenes.
As shown in FIG. 8, after physical cropping, the image regions (that is, sub-images) 8a, 8b, and 8c are sent to information recognition unit a for recognizing document information, information recognition unit b for recognizing table information, and information recognition unit c for recognizing card information, respectively.
作为一种实现方式,在对图像进行物理裁剪时,可以以多个场景对应的图像区域边缘为界限进行裁剪,如图15中的(a)所示。As an implementation manner, when performing physical cropping on an image, the cropping may be performed with edges of image regions corresponding to multiple scenes as boundaries, as shown in (a) in FIG. 15 .
As another implementation, when the image is physically cropped, the minimum enclosing rectangles around the image regions corresponding to the multiple scenes can be calculated first, and the image is then cropped along the edges of these minimum enclosing rectangles, as shown in (b) of FIG. 15.
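A minimal sketch of the minimum-enclosing-rectangle cropping in option (b), assuming each scene's region is given as a binary mask over the full image:

```python
import numpy as np

def crop_by_min_bounding_rect(image: np.ndarray, region_mask: np.ndarray) -> np.ndarray:
    """Crop the sub-image for one scene using the minimum enclosing axis-aligned
    rectangle of its region mask, as in option (b) of FIG. 15."""
    ys, xs = np.nonzero(region_mask)
    y1, y2 = ys.min(), ys.max() + 1
    x1, x2 = xs.min(), xs.max() + 1
    return image[y1:y2, x1:x2].copy()
```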
In some embodiments, during task distribution, the image regions corresponding to the multiple scenes may be distributed to different specialized interfaces (such as vertical-category interfaces or OCR interfaces), so as to be sent through the specialized interfaces to the information recognition units of the different scenes. In some cases, a scene is also called a vertical-category scene or an OCR scene. For example, the image regions 8a, 8b, and 8c of picture A shown in FIG. 8 may be distributed to a document vertical-category interface, a table vertical-category interface, and a card vertical-category interface, respectively, so as to be sent through the vertical-category interfaces to the corresponding information recognition units for document information recognition, table information recognition, and card information recognition.
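As an illustration of this dispatch step (the interface paths, scene labels, and the `send` callable below are assumptions, not interfaces defined by this application):

```python
# Hypothetical mapping from a scene label to the specialized ("vertical-category")
# interface that forwards the region to the matching information recognition unit.
SCENE_TO_INTERFACE = {
    "document": "ocr/document",
    "table":    "ocr/table",
    "card":     "ocr/card",
}

def dispatch(sub_images: dict, send) -> None:
    """sub_images maps a scene label to its cropped sub-image; `send` stands for
    whatever transport is used (an RPC stub, HTTP client, in-process call, ...)."""
    for scene, sub_image in sub_images.items():
        interface = SCENE_TO_INTERFACE.get(scene)
        if interface is not None:
            send(interface, sub_image)
```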
阶段4:结果整合阶段。Phase 4: Results integration phase.
其中,结果整合阶段用于通过将多个信息识别单元对多个场景对应的图像区域的识别结果整理合并,得到图像的信息识别结果。Wherein, the result integration stage is used to obtain the information recognition result of the image by arranging and merging the recognition results of the image regions corresponding to the multiple scenes by the multiple information recognition units.
例如,可以按照预设规则对多个场景对应的图像区域的识别结果进行整合。本申请实施例不限定具体的规则设置,也不限定结合整合时的依据。For example, the recognition results of image regions corresponding to multiple scenes may be integrated according to preset rules. The embodiment of the present application does not limit specific rule settings, nor does it limit the basis for combining and integrating.
In some embodiments, after result integration, the information recognition result of the image can be displayed by superimposing marks at the corresponding positions of the image, for example, by marking boxes around the corresponding information in the image, as shown in FIG. 8, FIG. 9, FIG. 13, or FIG. 14.
在另一些实施例中,经过结果整合,图像的信息识别结果还可以以文档形式进行结果展示,如图16A所示。In other embodiments, after the results are integrated, the image information recognition results can also be displayed in the form of documents, as shown in FIG. 16A .
It should be noted that the embodiments of this application do not specifically limit the display form of the information recognition result of the image. For example, the information recognition result of the image may also be displayed in the form of mask values. As shown in FIG. 16B, in response to the user's operation of touching image region 8a with a finger, the first device pops up a floating box, which includes the recognized text and other information in image region 8a. As another example, the first device may display the recognition result of the image in a layout format consistent with the image.
作为一种实现方式,可以根据图像中多个场景对应的图像区域的位置,按照预设规则进行整合。例如,预设规则可以是预设位置顺序,如按照由上到下、由左到右的顺序,整合对多个场景对应的图像区域的识别结果,如图16A所示。As an implementation manner, the integration may be performed according to preset rules according to positions of image regions corresponding to multiple scenes in the image. For example, the preset rule may be a preset position sequence, such as integrating recognition results of image regions corresponding to multiple scenes in a sequence from top to bottom and from left to right, as shown in FIG. 16A .
作为另一种实现方式,可以根据图像中多个场景对应的图像区域的显著性值,按照预设规则进行整合。例如,预设规则可以是预设显著性顺序,如按照显著性值由大到小的顺序,整合对多个场景对应的图像区域的识别结果。As another implementation manner, the integration may be performed according to preset rules according to the saliency values of image regions corresponding to multiple scenes in the image. For example, the preset rule may be a preset saliency order, such as integrating recognition results of image regions corresponding to multiple scenes in descending order of saliency values.
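A sketch of these two integration rules, assuming each per-region result carries its recognized text, its bounding box, and its saliency value (the field names are assumptions):

```python
def integrate_results(results: list, order: str = "position") -> str:
    """results: list of dicts like {"text": ..., "box": (x1, y1, x2, y2), "saliency": ...}.

    Integrate per-region recognition results either by position (top-to-bottom,
    then left-to-right) or by descending saliency, as described above."""
    if order == "position":
        ordered = sorted(results, key=lambda r: (r["box"][1], r["box"][0]))
    else:  # "saliency"
        ordered = sorted(results, key=lambda r: r["saliency"], reverse=True)
    return "\n".join(r["text"] for r in ordered)
```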
It should be noted that, in the embodiments of this application, the tasks of stage 1 to stage 5 above may be performed by the first device (such as a smartphone), may be performed by a cloud-side device (such as a server), or may be performed partly by the first device and partly by the cloud-side device (hereinafter referred to as the "second device"); this application is not limited in this respect.
For example, in order to reduce the computing pressure on the cloud-side device (that is, the second device), the first device may perform the tasks of stage 1, stage 2, stage 5, and stage 3 above, and the second device may perform the task of stage 4 above.
As another example, to facilitate maintenance and to protect algorithm details, the first device may perform the tasks of stage 1 and stage 2 above, and the second device may perform the tasks of stage 3, stage 5, and stage 4 above.
In the following, taking the case where the first device performs the tasks of stage 1, stage 2, stage 5, and stage 3 above and the second device performs the task of stage 4 above as an example, the method for recognizing text information in an image provided by the embodiments of this application is described in detail with reference to the accompanying drawings.
如图17所示,本申请实施例提供的一种识别图像中文本信息的方法可以包括以下S1701-S1706:As shown in Figure 17, a method for identifying text information in an image provided by the embodiment of the present application may include the following S1701-S1706:
S1701、第一设备确定图像的焦点区域。焦点区域是图像中用户感兴趣的区域或者被用户关注的区域。S1701. The first device determines a focus area of an image. The focus area is the area in the image that the user is interested in or the area that the user pays attention to.
作为一种示例,图像的焦点区域中包括但不限于以下一种或多种特征:文字、数字、字母、符号、图形、徽标、颜色、纹理、光线,以及上述特征的具体位置等。As an example, the focus area of the image includes, but is not limited to, one or more of the following features: text, numbers, letters, symbols, graphics, logos, colors, textures, light, and specific positions of the above features.
As shown in FIG. 8, through focus area analysis, it can be determined that the image regions of picture A where text, number, letter, and symbol features are located are regions that the user is interested in or pays attention to, and are therefore image regions that need to be given priority in the information recognition process. Regions of the image that do not contain text, numbers, letters, symbols, graphics, logos, textures, or similar content are not regions that the user is interested in or pays attention to, or attract a low degree of user interest or attention, and therefore have a lower priority during information recognition or can be ignored.
又如,图5所示图片B中央的银幕区域通常是用户感兴趣或者关注的区域,而椅子、幕布区域则被用户感兴趣或者关注的程度较低。图7所示图片D中的印刷文字通常是用户感兴趣或者关注的区域,而背景中的干扰文字则被用户感兴趣或者关注的程度较低。As another example, the screen area in the center of picture B shown in FIG. 5 is usually the area that the user is interested in or pays attention to, while the chair and curtain area are less interested or concerned by the user. The printed text in picture D shown in FIG. 7 is usually the area that the user is interested in or pays attention to, while the interfering text in the background is less interested or concerned by the user.
S1702、第一设备将图像分割为多个图像区域。该多个图像区域所表示的对象所属的场景不同。S1702. The first device divides the image into multiple image regions. The objects represented by the plurality of image regions belong to different scenes.
其中,多个图像区域所表示的对象所属的场景可以包括但不限于以下中的一种或多种:卡证场景、车牌场景、照片场景、招牌场景、海报场景、路牌场景、路标场景、文档场景、表格场景、书籍场景、环境场景等。Wherein, the scenes to which the objects represented by the multiple image areas may include but not limited to one or more of the following: card scene, license plate scene, photo scene, signboard scene, poster scene, road sign scene, road sign scene, document Scenes, Table Scenes, Book Scenes, Environmental Scenes, etc.
作为一种实现方式,第一设备可以通过对图像进行图像场景分析,确定图像所表示的多个对象所属的场景(也称“垂类场景”)以及多个场景分别在图像中的位置。As an implementation manner, the first device may perform image scene analysis on the image to determine the scene to which multiple objects represented by the image belong (also called "vertical scene") and the respective positions of the multiple scenes in the image.
其中,图像所表示的多个对象可以由第一设备基于图像的焦点区域中的特征分布确定。其中,图像所表示的对象可以包括但不限于以下中的一种或多种:卡证(如身份证、护照、银行卡、名片等)、车牌、照片、招牌、海报、路牌、路标、文档、表格、书籍等。Wherein, the multiple objects represented by the image may be determined by the first device based on feature distribution in the focus area of the image. Among them, the object represented by the image may include but not limited to one or more of the following: cards (such as ID cards, passports, bank cards, business cards, etc.), license plates, photos, signboards, posters, road signs, road signs, documents , forms, books, etc.
As another implementation, in order to ensure the reliability and accuracy of image semantic segmentation and improve its fault tolerance, the first device may determine the scenes to which the multiple objects represented by the image belong and the respective positions of the multiple scenes in the image by combining image scene analysis with text line analysis.
For example, assume that the image includes first information (such as a first text line) and that image scene analysis determines that the scenes to which the above multiple objects belong include a first scene and a second scene. Then, by performing text line analysis on the image, when it is determined that a first part of the first information belongs to the first scene and a second part of the first information belongs to the second scene, the first device merges the image regions corresponding to the first scene and the second scene.
例如,若经过文本行分析确定,同一文本行的一部分属于场景1,另一部分属于场景2,则将场景1和场景2对应的图像区域合并。如图12中的(b)所示,文本行1201横跨了两个场景对应的区域的,对于这种情况,第一设备将该两个场景对应的两个图像区域合并。For example, if it is determined through text line analysis that a part of the same text line belongs to scene 1 and another part belongs to
As an implementation, a shared-backbone, dual-branch analysis unit may be integrated into the first device to determine the scenes to which the multiple objects represented by the image belong and the positions of the multiple scenes in the image. Here, "dual-branch" refers to the image scene analysis branch (R branch) and the text line analysis branch (H branch), and "shared backbone" refers to the image semantic segmentation backbone. As shown in FIG. 18, one branch of the semantic segmentation backbone, the scene analysis branch (R branch), analyzes the scenes to which the objects represented by picture A belong and obtains the image scene analysis result; the other branch, the text line analysis branch (H branch), analyzes the distribution of text lines in picture A and obtains the text line analysis result. Exemplarily, in the embodiments of this application, the backbone may adopt a U-Net structure and a DenseNet structure.
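As a hedged sketch of the shared-backbone, dual-branch idea, the following PyTorch module uses a small placeholder convolutional stack in place of the U-Net/DenseNet backbone mentioned above; it is illustrative only, not the structure disclosed by this application.

```python
import torch
import torch.nn as nn

class DualBranchSegmenter(nn.Module):
    """Shared backbone with two heads: an R branch predicting a per-pixel scene
    class map and an H branch predicting a per-pixel text-line mask."""
    def __init__(self, num_scenes: int, channels: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(              # placeholder for a U-Net/DenseNet backbone
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.scene_head = nn.Conv2d(channels, num_scenes, 1)  # R branch
        self.text_head = nn.Conv2d(channels, 1, 1)            # H branch

    def forward(self, x: torch.Tensor):
        features = self.backbone(x)
        return self.scene_head(features), torch.sigmoid(self.text_head(features))
```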
S1703、第一设备确定多个图像区域所表示的对象的重要程度。S1703. The first device determines the importance of objects represented by multiple image regions.
例如,第一设备可以通过显著性分析,以确定多个图像区域所表示的对象的重要程度。For example, the first device may perform a saliency analysis to determine the importance of objects represented by multiple image regions.
作为一种示例,第一设备可以根据多个图像区域中的亮度、纹理、焦点区域的掩码值等确定其重要程度。As an example, the first device may determine the importance of multiple image regions according to the brightness, texture, mask value of the focus region, and the like.
作为一种示例,第一设备可以基于上述公式1计算得到多个图像区域的显著性值。其中,显著性值越高,则对应图像区域越重要;显著性值越低,则对应图像区域越不重要。As an example, the first device may calculate the saliency values of multiple image regions based on the foregoing formula 1. Among them, the higher the saliency value, the more important the corresponding image region; the lower the saliency value, the less important the corresponding image region.
在本申请实施例中,第一设备可以根据多个图像区域所表示的对象的重要程度执行后续任务分送阶段的任务。示例性的,第一设备可以基于具体场景,分别识别多个图像区域中,所表示的对象重要程度高的一个或多个对应图像区域中的信息。In the embodiment of the present application, the first device may perform tasks in the subsequent task distribution stage according to the importance of objects represented by the multiple image regions. Exemplarily, the first device may respectively identify information in one or more corresponding image regions among the plurality of image regions based on a specific scene, in which the represented object is of high importance.
例如,在某一图像区域的显著性值小于显著性阈值时,放弃对该图像区域进行任务分送及后续文本信息识别。通过综合考虑区域显著性,可以有效过滤掉图像中不重要的区域,以避免该不重要的区域对信息识别单元的资源占用,导致的信息识别单元负荷较高的问题。For example, when the saliency value of a certain image region is less than the saliency threshold, the task distribution and subsequent text information recognition for the image region are abandoned. By comprehensively considering the regional saliency, unimportant areas in the image can be effectively filtered out, so as to avoid the resource occupation of the information identification unit by the unimportant areas, resulting in a high load of the information identification unit.
需要说明的是,在一些实施例中,第一设备还可以不执行上述S1703,即第一设备在执行完上述S1702之后,继续执行以下S1704。It should be noted that, in some embodiments, the first device may not perform the foregoing S1703, that is, the first device continues to perform the following S1704 after performing the foregoing S1702.
S1704、第一设备将多个图像区域的信息识别任务发送给第二设备的对应信息识别单元。S1704. The first device sends the information identification tasks of the multiple image regions to the corresponding information identification unit of the second device.
作为一种示例,信息识别单元中可以集成有用于对对应场景下进行信息识别的专项模型或者专项算法。As an example, a specific model or a specific algorithm for identifying information in a corresponding scenario may be integrated in the information identification unit.
For example, assume that the objects represented by the image belong to three scenes (such as a first scene, a second scene, and a third scene), which correspond to a first image region, a second image region, and a third image region in the image respectively. Then, in S1704, the first device may send a first recognition task, a second recognition task, and a third recognition task to the second device (such as an information recognition management unit of the second device). After receiving these recognition tasks, the second device (such as the information recognition management unit of the second device) distributes the first recognition task, the second recognition task, and the third recognition task to a first recognition unit, a second recognition unit, and a third recognition unit respectively, according to the scene. The first recognition unit is used for information recognition of the image region belonging to the first scene (that is, the first image region), the second recognition unit is used for information recognition of the image region belonging to the second scene (that is, the second image region), and the third recognition unit is used for information recognition of the image region belonging to the third scene (that is, the third image region).
作为一种示例,信息识别任务中可以携带有图像和对应图像区域在图像中的位置(如对应图像区域在图像中的坐标信息)。例如,第一识别任务中携带有图像和第一图像区域在图像中的位置,第二识别任务中携带有图像和第二图像区域在图像中的位置,第三识别任务中携带有图像和第三图像区域在图像中的位置。As an example, the information recognition task may carry the image and the position of the corresponding image region in the image (such as the coordinate information of the corresponding image region in the image). For example, the first recognition task carries the image and the position of the first image region in the image, the second recognition task carries the image and the position of the second image region in the image, and the third recognition task carries the image and the position of the first The position of the three image regions in the image.
作为另一种示例,信息识别任务中可以携带有对应图像区域。例如,第一识别任务中携带有第一图像区域,第二识别任务中携带有第二图像区域,第三识别任务中携带有第三图像区域。As another example, the information recognition task may carry corresponding image regions. For example, the first recognition task carries the first image region, the second recognition task carries the second image region, and the third recognition task carries the third image region.
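For illustration, the two payload variants described above might be represented as follows (the class and field names are assumptions, not structures defined by this application):

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class RecognitionTask:
    """Illustrative payload for one per-scene recognition task.

    Either the full image plus the region's coordinates is carried (first variant
    above), or only the cropped sub-image (second variant)."""
    scene: str                                              # e.g. "document", "table", "card"
    image: Optional[np.ndarray] = None                      # full image, if sent
    region_box: Optional[Tuple[int, int, int, int]] = None  # (x1, y1, x2, y2) within the image
    sub_image: Optional[np.ndarray] = None                  # cropped region, if sent instead
```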
S1705、第二设备的多个信息识别单元分别识别对应图像区域中的信息,得到多个信息识别结果。S1705. The multiple information identification units of the second device respectively identify the information in the corresponding image area, and obtain multiple information identification results.
示例性的,第二设备的多个信息识别单元可以分别使用对应场景的专项模型或者专项算法,对多个图像区域中的一个或多个图像区域进行信息识别,得到对一个或多个图像区域的信息识别结果。Exemplarily, the multiple information identification units of the second device can respectively use a specific model or a specific algorithm corresponding to the scene to perform information identification on one or more image areas in the multiple image areas, and obtain the information on one or more image areas information identification results.
Taking the case where the objects represented by the image belong to three scenes (such as a first scene, a second scene, and a third scene) and the first recognition unit, the second recognition unit, and the third recognition unit of the second device are respectively responsible for the information recognition tasks of the image regions corresponding to these three scenes as an example, in S1705 the first recognition unit recognizes the information in the first image region to obtain a first recognition result, the second recognition unit recognizes the information in the second image region to obtain a second recognition result, and the third recognition unit recognizes the information in the third image region to obtain a third recognition result.
作为一种示例,信息识别单元输出的信息识别结果可以以坐标的形式记录,可以在图像或者图像区域的对应位置上叠加标记,还可以是文档格式的,本申请实施例不限定。As an example, the information identification result output by the information identification unit may be recorded in the form of coordinates, may be superimposed on the image or the corresponding position of the image area, and may also be in a document format, which is not limited in this embodiment of the application.
S1706: The second device obtains the information recognition result of the image according to the multiple information recognition results.
For example, the second device may obtain the information recognition result of the image by collating and merging the information recognition results that the information recognition units produce for the one or more image regions.
Taking as an example the case where the objects represented by the image belong to three scenes (a first scene, a second scene, and a third scene), and the first, second, and third recognition units respectively recognize the information in the first, second, and third image regions to obtain a first, second, and third recognition result, S1706 may specifically include: the second device obtains the information recognition result of the image according to the first recognition result, the second recognition result, and the third recognition result.
For example, the second device may integrate the information recognition results of the one or more image regions according to a preset rule. The embodiments of the present application do not limit the specific rule, nor the basis on which the results are integrated. For example, the preset rule may be a preset position order, such as integrating the recognition results of the multiple image regions from top to bottom and from left to right. As another example, the preset rule may be a preset saliency order, such as integrating the recognition results of the multiple image regions in descending order of saliency value.
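As a hedged illustration of such preset rules, the short Python sketch below merges per-region results either in position order (top to bottom, then left to right) or in descending saliency order; the dictionary keys and the line-joining of the texts are assumptions, not something prescribed by the patent text.

```python
from typing import Dict, List

def merge_results(region_results: List[Dict], rule: str = "position") -> str:
    """Integrate per-region recognition results into one result for the whole image.

    Each entry is assumed to look like:
        {"bbox": (x, y, w, h), "saliency": float, "text": str}
    """
    if rule == "position":
        # Preset position order: top to bottom, then left to right.
        ordered = sorted(region_results, key=lambda r: (r["bbox"][1], r["bbox"][0]))
    elif rule == "saliency":
        # Preset saliency order: descending saliency value.
        ordered = sorted(region_results, key=lambda r: r["saliency"], reverse=True)
    else:
        raise ValueError(f"unknown merge rule: {rule}")
    return "\n".join(r["text"] for r in ordered)
```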
Further, as an example, the second device may send the obtained information recognition result of the image to the first device, so that the first device can display it accordingly.
The method for recognizing text information in an image provided by the embodiments of the present application performs scene analysis on the image by jointly using a focus region analysis algorithm (such as a visual saliency detection algorithm) and an image semantic segmentation algorithm, invokes the corresponding specialized OCR capability according to the analysis result, such as specialized models that perform information recognition scene by scene, and finally integrates the information recognition results of the multiple specialized models to obtain the final information recognition result of the image.
Compared with conventional information recognition methods, the method for recognizing text information in an image provided by the embodiments of the present application can obtain more accurate and reliable information recognition results, and provides users with a more convenient, efficient, and reliable intelligent information recognition experience.
For example, compared with the result of recognizing the information in picture A using the conventional information recognition method shown in FIG. 1 or FIG. 4, as shown in FIG. 8, FIG. 9, or FIG. 14, the method for recognizing text information in an image provided by the embodiments of the present application can, through image semantic segmentation, segment picture A into an image region belonging to a document scene, an image region belonging to a table scene, and an image region belonging to a card scene. Further, by performing information recognition separately on the image region of the document scene, the image region of the table scene, and the image region of the card scene, the text in the picture is successfully recognized, which greatly improves the speed, success rate, and accuracy of information recognition and improves the user experience.
As another example, as shown in FIG. 19, when picture B is recognized using a conventional information recognition method, the small, dense text on the screen in the center of picture B cannot be recognized because, among other reasons, the input size of the information recognition model or specialized algorithm is limited. In contrast, the method for recognizing text information in an image provided by the embodiments of the present application can, through image semantic segmentation, segment picture B into an image region belonging to a table scene and an image region belonging to an environment scene. Further, by computing the region saliency of the multiple image regions and not sending image regions of low saliency to the information recognition units, the proportion of the image occupied by the small, dense text on the screen in the center of picture B is indirectly enlarged, which greatly improves the speed, success rate, and accuracy of information recognition and improves the user experience.
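The following minimal sketch shows one way such saliency-based filtering could look, assuming each candidate region already carries a saliency value; the threshold value and the data layout are illustrative assumptions, since the patent text does not prescribe a specific saliency measure or cutoff.

```python
from typing import Dict, List, Tuple

def filter_regions_by_saliency(regions: List[Dict],
                               threshold: float = 0.3) -> Tuple[List[Dict], List[Dict]]:
    """Split regions into those sent to the recognition units and those dropped.

    Each region is assumed to look like {"scene": str, "bbox": tuple, "saliency": float}.
    Regions below the threshold (e.g. background clutter) are not sent on, which
    saves compute and lets the remaining text occupy a larger share of the input.
    """
    kept = [r for r in regions if r["saliency"] >= threshold]
    dropped = [r for r in regions if r["saliency"] < threshold]
    return kept, dropped
```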
As another example, as shown in FIG. 20, when picture C is recognized using a conventional information recognition method based on a single specialized model, the large scene contrast between the text comments and the picture comments causes the text in the picture comments to be missed. In contrast, the method for recognizing text information in an image provided by the embodiments of the present application can, through image semantic segmentation, segment picture C into an image region belonging to a table scene and an image region belonging to a picture scene. Further, by performing information recognition separately on the image region of the table scene and the image region of the picture scene, the text in the picture comments is successfully recognized, which greatly improves the speed, success rate, and accuracy of information recognition and improves the user experience.
As another example, as shown in FIG. 21, a conventional information recognition method does not consider the saliency of the information in picture D, so the interfering text in the background region of picture D is recognized. In contrast, the method for recognizing text information in an image provided by the embodiments of the present application can filter out the interfering text in the image background by computing region saliency, which avoids the disturbance to the user caused by recognizing the interfering text, saves the computing power of the information recognition model or algorithm, and improves the user experience.
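Tying the previous sketches together, a possible end-to-end flow on the device side might look as follows. This is a sketch under stated assumptions: segment_by_scene and compute_saliency are hypothetical helpers (the patent text does not fix a particular segmentation or saliency algorithm), and RecognitionTask, dispatch, filter_regions_by_saliency, and merge_results refer to the illustrative sketches given earlier.

```python
def recognize_image(image, segment_by_scene, compute_saliency):
    """Hypothetical end-to-end flow: segment, filter by saliency, dispatch, merge.

    segment_by_scene(image) -> list of {"scene": str, "bbox": (x, y, w, h)}
    compute_saliency(image, bbox) -> float
    """
    regions = segment_by_scene(image)
    for r in regions:
        r["saliency"] = compute_saliency(image, r["bbox"])

    # Drop low-saliency regions (e.g. background clutter) before recognition.
    kept, _dropped = filter_regions_by_saliency(regions)

    tasks = [RecognitionTask(scene=r["scene"], image=image, bbox=r["bbox"]) for r in kept]
    per_region = dispatch(tasks)

    region_results = [
        {"bbox": r["bbox"], "saliency": r["saliency"], "text": text}
        for r, (_scene, text) in zip(kept, per_region)
    ]
    return merge_results(region_results, rule="position")
```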
It should be understood that the solutions of the embodiments of the present application may be used in reasonable combination, and that the explanations or descriptions of the terms appearing in the embodiments may be cross-referenced or interpreted with reference to one another across the embodiments; this is not limited herein.
It should also be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the foregoing processes does not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
It can be understood that, in order to implement the functions of any one of the foregoing embodiments, the first device includes corresponding hardware structures and/or software modules for performing the respective functions. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a given function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.
In the embodiments of the present application, the first device may be divided into functional modules; for example, each functional module may be divided in correspondence with each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a division by logical function; other division manners may be used in actual implementation.
For example, in the case where the functional modules are divided in an integrated manner, FIG. 22 is a structural block diagram of a first device provided by an embodiment of the present application. As shown in FIG. 22, the first device may include a processing unit 2210, a transceiver unit 2220, a storage unit 2230, and a display unit 2240.
The processing unit 2210 is configured to support the first device in determining the focus region of the image, segmenting the image into multiple image regions by scene, determining the importance of the objects represented by the multiple image regions, recognizing the information in the multiple image regions, integrating the recognition results of the information in the multiple image regions, and/or performing other processes related to the embodiments of the present application. The transceiver unit 2220 is configured to support the first device in sending the information recognition tasks of the multiple image regions to the corresponding information recognition units of the second device, receiving the information recognition result of the image from the second device, and/or performing other processes related to the embodiments of the present application. The storage unit 2230 is configured to support the first device in storing computer programs and the processing data and/or processing results (such as information recognition results) involved in implementing the methods provided by the embodiments of the present application. The display unit 2240 is configured to support the first device in displaying the image, displaying the information recognition result of the image, and/or displaying other interfaces related to the embodiments of the present application.
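Purely as an illustration of this module division, and not as the actual implementation, a software-side skeleton might be organized as below; all method names on the individual units are assumptions, and how each unit is realized (hardware, software, or both) is left open by the patent text.

```python
class FirstDevice:
    """Skeleton mirroring the module division of FIG. 22 (units 2210 to 2240)."""

    def __init__(self, processing_unit, transceiver_unit, storage_unit, display_unit):
        self.processing_unit = processing_unit    # 2210: focus region, segmentation, integration
        self.transceiver_unit = transceiver_unit  # 2220: send tasks / receive results
        self.storage_unit = storage_unit          # 2230: programs, processing data, results
        self.display_unit = display_unit          # 2240: show image and recognition result

    def recognize(self, image):
        regions = self.processing_unit.segment_by_scene(image)
        tasks = self.processing_unit.build_tasks(image, regions)
        self.transceiver_unit.send_tasks(tasks)          # to the second device's units
        result = self.transceiver_unit.receive_result()  # image-level recognition result
        self.storage_unit.save(result)
        self.display_unit.show(image, result)
        return result
```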
As an example, the transceiver unit 2220 may include a radio frequency circuit. Specifically, the first device may receive and send wireless signals through the radio frequency circuit. Typically, a radio frequency circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency circuit may also communicate with other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to the Global System for Mobile Communications, General Packet Radio Service, Code Division Multiple Access, Wideband Code Division Multiple Access, Long Term Evolution, e-mail, and the Short Message Service.
It should be understood that the modules in the first device may be implemented in the form of software and/or hardware, which is not specifically limited; in other words, the electronic device is presented in the form of functional modules. A "module" here may refer to an application-specific integrated circuit (ASIC), a circuit, a processor and memory executing one or more software or firmware programs, an integrated logic circuit, and/or other devices that can provide the above functions.
In an optional manner, when software is used to implement data transmission, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the present application are implemented wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a digital video disc (DVD)), or a semiconductor medium (such as a solid state disk (SSD)), or the like.
The steps of the methods or algorithms described in connection with the embodiments of the present application may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may reside in an ASIC. In addition, the ASIC may reside in an electronic device. Of course, the processor and the storage medium may also exist as discrete components in the first device.
Through the description of the foregoing embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional modules is used as an example for illustration. In practical applications, the foregoing functions may be allocated to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
Claims (30)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111234376.5A CN116012570A (en) | 2021-10-22 | 2021-10-22 | A method, device and system for recognizing text information in an image |
| PCT/CN2022/124123 WO2023066047A1 (en) | 2021-10-22 | 2022-10-09 | Method for recognizing text information in image, and device and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111234376.5A CN116012570A (en) | 2021-10-22 | 2021-10-22 | A method, device and system for recognizing text information in an image |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN116012570A true CN116012570A (en) | 2023-04-25 |
Family
ID=86021685
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202111234376.5A Pending CN116012570A (en) | 2021-10-22 | 2021-10-22 | A method, device and system for recognizing text information in an image |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116012570A (en) |
| WO (1) | WO2023066047A1 (en) |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8031940B2 (en) * | 2006-06-29 | 2011-10-04 | Google Inc. | Recognizing text in images using ranging data |
| CN108229481B (en) * | 2017-12-25 | 2020-09-11 | 中国移动通信集团江苏有限公司 | Screen content analysis method, device, computing device and storage medium |
| CN108898138A (en) * | 2018-05-30 | 2018-11-27 | 西安理工大学 | Scene text recognition methods based on deep learning |
| CN109447078B (en) * | 2018-10-23 | 2020-11-06 | 四川大学 | A detection and recognition method for sensitive text in natural scene images |
- 2021-10-22: CN application CN202111234376.5A filed, published as CN116012570A (status: Pending)
- 2022-10-09: PCT application PCT/CN2022/124123 filed, published as WO2023066047A1 (status: Ceased)
Patent Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109933756A (en) * | 2019-03-22 | 2019-06-25 | 腾讯科技(深圳)有限公司 | Image conversion method, device, device and readable storage medium based on OCR |
| WO2021000841A1 (en) * | 2019-06-30 | 2021-01-07 | 华为技术有限公司 | Method for generating user profile photo, and electronic device |
| CN110751146A (en) * | 2019-10-23 | 2020-02-04 | 北京印刷学院 | Text area detection method, device, electronic terminal and computer-readable storage medium |
| CN110930410A (en) * | 2019-10-28 | 2020-03-27 | 维沃移动通信有限公司 | Image processing method, server and terminal equipment |
| CN113255566A (en) * | 2021-06-11 | 2021-08-13 | 支付宝(杭州)信息技术有限公司 | Form image recognition method and device |
Non-Patent Citations (1)
| Title |
|---|
| 买买提依明・哈斯木; 吾守尔・斯拉木; 维尼拉・木沙江: "Design and Implementation of a Suffix Tree Construction Algorithm for Uyghur" (维吾尔文后缀树构造算法的设计与实现), Computer Engineering and Applications (计算机工程与应用), no. 08, 7 January 2013 (2013-01-07) * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117156108A (en) * | 2023-10-31 | 2023-12-01 | 中海物业管理有限公司 | Enhanced display system and method for machine room equipment monitoring picture |
| CN117156108B (en) * | 2023-10-31 | 2024-03-15 | 中海物业管理有限公司 | Enhanced display system and method for machine room equipment monitoring picture |
| WO2025139049A1 (en) * | 2023-12-25 | 2025-07-03 | 荣耀终端股份有限公司 | Method for creating schedule information, electronic device and readable storage medium |
| CN118154883A (en) * | 2024-05-11 | 2024-06-07 | 上海蜜度科技股份有限公司 | Target semantic segmentation method, system, storage medium and electronic device |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2023066047A1 (en) | 2023-04-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN111625431B (en) | Log information generation method and device and electronic equipment | |
| US11113523B2 (en) | Method for recognizing a specific object inside an image and electronic device thereof | |
| WO2023066047A1 (en) | Method for recognizing text information in image, and device and system | |
| CN114115619B (en) | Method and electronic device for displaying application program interface | |
| US20230005277A1 (en) | Pose determining method and related device | |
| WO2021036715A1 (en) | Image-text fusion method and apparatus, and electronic device | |
| WO2021258797A1 (en) | Image information input method, electronic device, and computer readable storage medium | |
| CN112508785A (en) | Picture processing method and device | |
| CN116095413B (en) | Video processing method and electronic equipment | |
| CN115937722A (en) | A device positioning method, device and system | |
| CN110968252A (en) | Display method of interactive system, interactive system and electronic equipment | |
| CN113760137A (en) | Cursor display method and electronic equipment | |
| CN115379208B (en) | Camera evaluation method and device | |
| CN114125134B (en) | Contactless operation method and device, server and electronic equipment | |
| CN117131213B (en) | Image processing method and related equipment | |
| CN116723415B (en) | Thumbnail generation method and terminal equipment | |
| CN118233558A (en) | Display method, user interface and related device | |
| CN116701795A (en) | Page display method and electronic device | |
| CN115633114A (en) | Address book letter display method, device and terminal equipment | |
| CN118445435B (en) | Media data search method and electronic device | |
| EP4290345B1 (en) | Cursor display method and electronic device | |
| US20250225710A1 (en) | Screen wallpaper display method and electronic device | |
| WO2025200671A1 (en) | Region selection method, electronic device, and computer readable storage medium | |
| WO2025140290A1 (en) | Graph drawing method and related apparatus | |
| CN118568380A (en) | Human-computer interaction method, electronic device and system |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |