WO2021081562A3 - Multi-head text recognition model for multi-lingual optical character recognition - Google Patents
- Publication number
- WO2021081562A3 (application PCT/US2021/014171)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- electronic device
- image
- language
- textual content
- optical character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
This application is directed to performing optical character recognition (OCR) using deep learning techniques. An electronic device receives an image and a language indicator that indicates that the textual content in the image corresponds to a first language. The electronic device processes the image using a multilingual text recognition model applicable to a plurality of languages. The electronic device generates a feature sequence including a plurality of probability values corresponding to the textual content of the image. The feature sequence includes a plurality of feature subsets that correspond to the plurality of languages. For each feature subset, each probability value indicates a probability that a respective textual content corresponds to a respective character in a dictionary of the corresponding language. The electronic device constructs a sparse mask based on the first language and combines the feature sequence and the sparse mask to determine the textual content.
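The masking step described in the abstract can be illustrated with a minimal sketch. It assumes a toy setup that is not taken from the application: two hypothetical dictionaries (`en`, `el`), a feature sequence stored as plain Python lists with one row per time step, a 0/1 sparse mask combined by element-wise multiplication, and a greedy per-step argmax as the decoding rule. The names `DICTIONARIES`, `build_sparse_mask`, and `decode` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical per-language dictionaries (assumption, not from the application).
# Their concatenation defines the layout of one row of the feature sequence.
DICTIONARIES: Dict[str, List[str]] = {
    "en": ["a", "b", "c"],
    "el": ["α", "β"],
}


def head_offsets(dictionaries: Dict[str, List[str]]) -> Dict[str, Tuple[int, int]]:
    """Map each language to the (start, end) index range of its feature subset."""
    offsets: Dict[str, Tuple[int, int]] = {}
    start = 0
    for lang, chars in dictionaries.items():
        offsets[lang] = (start, start + len(chars))
        start += len(chars)
    return offsets


def build_sparse_mask(dictionaries: Dict[str, List[str]], language: str) -> List[float]:
    """Return a 0/1 vector that is 1 only inside the selected language's subset."""
    start, end = head_offsets(dictionaries)[language]
    total = sum(len(chars) for chars in dictionaries.values())
    return [1.0 if start <= i < end else 0.0 for i in range(total)]


def decode(feature_sequence: List[List[float]],
           dictionaries: Dict[str, List[str]],
           language: str) -> str:
    """Combine the feature sequence with the mask, then pick the best character per step."""
    mask = build_sparse_mask(dictionaries, language)
    start, _ = head_offsets(dictionaries)[language]
    chars = dictionaries[language]
    text = []
    for step in feature_sequence:
        # Element-wise product zeroes out every probability of the other languages.
        masked = [p * m for p, m in zip(step, mask)]
        best = max(range(len(masked)), key=masked.__getitem__)
        if masked[best] == 0.0:
            raise ValueError("no positive probability inside the selected language subset")
        text.append(chars[best - start])
    return "".join(text)


# Toy feature sequence: 3 time steps x 5 probabilities (3 for "en", then 2 for "el").
features = [
    [0.10, 0.20, 0.10, 0.50, 0.10],
    [0.60, 0.10, 0.10, 0.05, 0.15],
    [0.10, 0.10, 0.20, 0.10, 0.50],
]

print(decode(features, DICTIONARIES, "en"))  # bac
print(decode(features, DICTIONARIES, "el"))  # αββ
```

In the first time step the highest raw probability belongs to a Greek character, yet with the language indicator set to `en` the mask suppresses it and the best English character is chosen instead. How the application actually combines the mask with the feature sequence, and how it decodes the result, may differ from this sketch.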
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2021/014171 WO2021081562A2 (en) | 2021-01-20 | 2021-01-20 | Multi-head text recognition model for multi-lingual optical character recognition |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/US2021/014171 WO2021081562A2 (en) | 2021-01-20 | 2021-01-20 | Multi-head text recognition model for multi-lingual optical character recognition |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2021081562A2 (en) | 2021-04-29 |
| WO2021081562A3 (en) | 2021-12-09 |
Family
ID=75620895
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2021/014171 (Ceased), WO2021081562A2 (en) | Multi-head text recognition model for multi-lingual optical character recognition | 2021-01-20 | 2021-01-20 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2021081562A2 (en) |
Families Citing this family (14)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2021237227A1 (en) * | 2021-07-01 | 2021-11-25 | Innopeak Technology, Inc. | Method and system for multi-language text recognition model with autonomous language classification |
| CN113744281A (en) * | 2021-07-20 | 2021-12-03 | 北京旷视科技有限公司 | Instance segmentation network training and instance segmentation method, device and electronic device |
| CN113657391A (en) * | 2021-08-13 | 2021-11-16 | 北京百度网讯科技有限公司 | Training method of character recognition model, and method and device for recognizing characters |
| CN113762269B (en) * | 2021-09-08 | 2024-03-22 | 深圳市网联安瑞网络科技有限公司 | Chinese character OCR recognition method, system and medium based on neural network |
| CN113744157B (en) * | 2021-09-09 | 2025-06-03 | 深圳市医诺智能科技发展有限公司 | A method and terminal for medical image denoising and enhancement |
| CN114022876B (en) * | 2021-11-15 | 2025-03-07 | 中再云图技术有限公司 | An electronic scale image text recognition method based on artificial intelligence OCR |
| CN114120321B (en) * | 2021-12-01 | 2024-07-23 | 北京比特易湃信息技术有限公司 | Text recognition method based on multi-dictionary sample weighting |
| CN114495111B (en) * | 2022-01-20 | 2024-07-26 | 北京字节跳动网络技术有限公司 | Text recognition method and device, readable medium and electronic equipment |
| CN114445812B (en) * | 2022-01-30 | 2025-12-09 | 北京有竹居网络技术有限公司 | Character recognition method, device, equipment and medium |
| CN114818738B (en) * | 2022-03-01 | 2024-08-02 | 达观数据有限公司 | A method and system for identifying customer service hotline user intention trajectory |
| CN114973224A (en) * | 2022-04-12 | 2022-08-30 | 北京百度网讯科技有限公司 | Character recognition method and device, electronic equipment and storage medium |
| CN114596566B (en) * | 2022-04-18 | 2022-08-02 | 腾讯科技(深圳)有限公司 | Text recognition method and related device |
| CN114495114B (en) * | 2022-04-18 | 2022-08-05 | 华南理工大学 | Text sequence recognition model calibration method based on CTC decoder |
| CN114821566B (en) * | 2022-05-13 | 2024-06-14 | 北京百度网讯科技有限公司 | Text recognition method, device, electronic device and storage medium |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6047251A (en) * | 1997-09-15 | 2000-04-04 | Caere Corporation | Automatic language identification system for multilingual optical character recognition |
| KR20060046128A (en) * | 2004-05-20 | 2006-05-17 | 마이크로소프트 코포레이션 | Low resolution OCR for camera input documents |
| US20110123115A1 (en) * | 2009-11-25 | 2011-05-26 | Google Inc. | On-Screen Guideline-Based Selective Text Recognition |
| US20150370785A1 (en) * | 2014-06-24 | 2015-12-24 | Google Inc. | Techniques for machine language translation of text from an image based on non-textual context information from the image |
- 2021-01-20: WO application PCT/US2021/014171 filed (WO2021081562A2); status: not active, ceased
Non-Patent Citations (1)
| Title |
|---|
| CHAE HO-YEOL: "Scene Text Recognition Performance Improvement through an Add-on of an OCR based Classifier", JOURNAL OF IKEEE, vol. 24, no. 4, 1 January 2020 (2020-01-01), pages 1086 - 1092, XP055877271, ISSN: 2288-243X, DOI: 10.7471/ikeee.2020.24.4.1086 * |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2021081562A2 (en) | 2021-04-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2021081562A3 (en) | Multi-head text recognition model for multi-lingual optical character recognition | |
| CN101002198B (en) | Systems and methods for spell correction of non-roman characters and words | |
| Vinnarasu et al. | Speech to text conversion and summarization for effective understanding and documentation | |
| US11416709B2 (en) | Method, apparatus, device and computer readable medium for generating VQA training data | |
| CN102982021A (en) | Method for disambiguating multiple readings in language conversion | |
| US11907665B2 (en) | Method and system for processing user inputs using natural language processing | |
| US11741317B2 (en) | Method and system for processing multilingual user inputs using single natural language processing model | |
| CN108536654A (en) | Identify textual presentation method and device | |
| CN112231480A (en) | A bert-based phonetic hybrid error correction model | |
| CN113268576B (en) | Deep learning-based department semantic information extraction method and device | |
| CN105760359A (en) | Question processing system and method thereof | |
| Tursun et al. | Noisy Uyghur text normalization | |
| Khosrobeigi et al. | A rule-based post-processing approach to improve Persian OCR performance | |
| US20210064820A1 (en) | Machine learning lexical discovery | |
| CN111241845B (en) | Automatic financial subject identification method and device based on semantic matching method | |
| Al-Sanabani et al. | Improved an algorithm for Arabic name matching | |
| CN112148862A (en) | Question intention identification method and device, storage medium and electronic equipment | |
| Nguyen et al. | OCR error correction for Vietnamese handwritten text using neural machine translation | |
| US20210073466A1 (en) | Semantic vector rule discovery | |
| Chiu et al. | Chinese spell checking based on noisy channel model | |
| Lu et al. | An automatic spelling correction method for classical mongolian | |
| CN111738023A (en) | Automatic image-text audio translation method and system | |
| CN112668312A (en) | Wrongly written character correction method and device, electronic equipment and storage medium | |
| Liu et al. | Chinese Spelling Correction: A Comprehensive Survey of Progress, Challenges, and Opportunities | |
| Ratnam et al. | Phonogram-based Automatic Typo Correction in Malayalam Social Media Comments |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21720671; Country of ref document: EP; Kind code of ref document: A2 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21720671; Country of ref document: EP; Kind code of ref document: A2 |