
US20140279323A1 - Systems and methods for capturing critical fields from a mobile image of a credit card bill - Google Patents


Info

Publication number
US20140279323A1
Authority
US
United States
Prior art keywords
payee
credit card
biller
found
bill
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/217,241
Inventor
Vitali Kliatskine
Grigori Nepomniachtchi
Nikolay Kotovich
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mitek Systems Inc
Original Assignee
Mitek Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitek Systems Inc
Priority to US14/217,241
Publication of US20140279323A1
Priority to US15/338,203 (US10509958B2)
Assigned to MITEK SYSTEMS, INC. Assignment of assignors interest (see document for details). Assignors: KOTOVICH, NIKOLAY; NEPOMNIACHTCHI, GRIGORI; KLIATSKINE, VITALI
Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/10544 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation by scanning of the records by radiation in the optical part of the electromagnetic spectrum
    • G06K7/10712 Fixed beam scanning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404 Methods for optical code recognition
    • G06K7/1408 Methods for optical code recognition the method being specifically adapted for the type of code
    • G06K7/1413 1D bar codes

Definitions

  • the embodiments described herein relate to processing images captured using a mobile device, and more particularly to identifying critical fields in a credit card remittance coupon and extracting the content therein.
  • a balance transfer where a customer with a balance due on a credit card can transfer some or all of the outstanding balance from one credit card to another credit card.
  • Customers typically transfer balances from one card to another to obtain a lower interest rate, more favorable payment schedule, or other benefits offered by a credit card for carrying a balance with a particular financial institution.
  • a balance transfer may also be similar to a cash advance, where a customer can transfer a sum of money from their credit card into their bank account, resulting in a balance due on the credit card but giving the customer cash in their bank account.
  • the customer already holds the credit card where the balance is being transferred, while in other situations, the customer may be opening a new credit card and transferring a balance to the new credit card.
  • Banks often compete with other banks to advertise lower interest rates and favorable payment terms on a balance transfer.
  • the balance transfer process is cumbersome for both the customer and the bank.
  • the customer must obtain several different pieces of information, including the customer's name, contact information, credit card number, the current balance and the interest rates applicable to the balance. If the balance is being transferred to a bank account, other information may be needed, such as a bank account number and routing number.
  • a bank may also want to evaluate the credit history of the customer to determine whether to accept the balance transfer application, in which case the customer will need to provide even more information, such as a social security number, driver's license number or additional financial information.
  • the receiving bank evaluates the information to determine whether to accept the balance transfer request. This process may take a significant amount of time, generally several days. Once accepted, it may take several more days or even weeks before the money is transferred.
  • Embodiments described herein provide for the identification of critical fields on a document which provide high probabilities of accurately reading the content of the document image. By improving recognition accuracy of these fields on documents such as a credit card bill, the remainder of the content on the bill can be read with high confidence.
  • Products which use image processing techniques to read bills, including such bill categories as insurance, utility, mortgage etc., use a set of rules which apply to all (or the majority) of bills within each category.
  • One of the most important tasks in mobile image capture is understanding and utilizing the category-specific rules in the form of specialized OCR, cross-validation between different document fields, usage of postal barcodes etc.
  • knowledge that the document is a credit card bill (CCB) allows the system to read its Account Number and other critical fields using both data on the bill and the code-line, and in some cases the code-line only. This reduces the error rate on critical fields by a factor of 2 to 5 compared to “generic” bills.
  • CCBs Account Number, Balance Due, Payee ZIP-code and Biller's Name.
  • FIG. 1 is an image of a credit card bill identifying one or more critical fields, according to embodiments.
  • FIG. 2 is an image of a portion of the credit card bill used for cross-validation of an account number field against a Codeline, according to embodiments.
  • FIG. 3 is an image of the credit card bill identifying a payee block which is not sufficiently isolated from other text, causing text block segmentation issues, according to embodiments.
  • FIG. 4 is an illustration of a processed image of a credit card bill identifying a last text line in one or more address boxes, according to embodiments.
  • FIG. 5 is an image of the credit card bill highlighting a payment history deduction tool to improve accuracy of capturing critical fields, according to embodiments.
  • FIG. 6 is an image of a portion of the credit card bill highlighting a balance due critical field, according to embodiments.
  • FIG. 7 is an illustration of a method of identifying a biller's name field on different portions of the credit card bill, according to embodiments.
  • FIG. 8 illustrates a method of identifying one or more fields on the credit card statement, according to one embodiment of the invention.
  • FIG. 9 illustrates a system for capturing and identifying critical fields on a credit card statement, according to one embodiment of the invention.
  • FIG. 10 is a block diagram that illustrates an embodiment of a computer/server system upon which an embodiment of the inventive methodology may be implemented.
  • Embodiments described herein pertain to systems and methods for identifying and capturing critical fields on an image of a document such as a credit card bill.
  • Each critical field is identified as such based on the resulting likelihood that if the critical field can be identified, the remaining fields on the document can also be identified with a high confidence. This therefore improves the overall ability to capture and identify content from the document and utilize it for various applications.
  • the embodiments herein focus on improving recognition accuracy of these fields on credit card bills.
  • the following document fields are being captured and used to facilitate finding, identification, and recognition of critical fields:
  • The AccountNumber field has a unique set of keyword phrases which make it possible to identify the field's location on about 90% of CCBs. In the remaining 10%, the keyword cannot be found due to some combination of poor image quality, small font, inverted text etc. Below we discuss methods which can be used when keywords cannot be found.
  • Keyword phrases are: Account, Account Number and Account No. Some keyword phrases could be printed in a single text line or in two consecutive lines. It should be noted that the set of keywords on CCBs is more restrictive than in the general case. For example, a phrase such as “Policy Number” (frequent on insurance bills) is not used.
  • Keywords are searched for in the full-page OCR result using the Fuzzy Matching technique. For example, if the OCR result contains “Account Nomber”, then the “Account Number” keyword will be found with a confidence of approximately 920 (out of a maximum of 1000) because 12 out of 13 non-space characters are the same as in “Account Number”. In FIG. 2, the keyword 201 “Account Number” was found.
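The confidence arithmetic in this example can be sketched as follows. This is only an illustration under stated assumptions: Python's standard `difflib.SequenceMatcher` stands in for the Fuzzy Matching technique, which is not specified in detail here, and the function name `keyword_confidence` is hypothetical. Spaces are dropped before comparison and the similarity ratio is scaled to the 0-1000 range used in the text.

```python
from difflib import SequenceMatcher


def keyword_confidence(ocr_text: str, keyword: str) -> int:
    """Return a 0-1000 similarity score between OCR text and a keyword.

    Assumption: SequenceMatcher's ratio approximates the Fuzzy Matching
    score; the real scoring function may differ.
    """
    a = ocr_text.replace(" ", "").lower()
    b = keyword.replace(" ", "").lower()
    if not a or not b:
        return 0
    return round(1000 * SequenceMatcher(None, a, b).ratio())


if __name__ == "__main__":
    print(keyword_confidence("Account Nomber", "Account Number"))
```

For “Account Nomber”, 12 of the 13 non-space characters line up with the keyword, so the score comes out close to the 920 quoted above.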
  • The following Account Number format covers the majority of all bills:
  • Total number of characters: from 4 to 22; number of lower-case alpha characters (excluding ‘x’): 0; number of upper-case alpha characters (excluding ‘X’): from 0 to 4; number of punctuation characters (spaces, dashes): from 0 to 4; number of masking characters (X, x, *, #): from 0 to 12.
  • Total number of characters: from 10 to 20; number of lower-case alpha characters (excluding ‘x’): 0; number of upper-case alpha characters (excluding ‘X’): 0; number of punctuation characters (spaces, dashes): from 0 to 4; number of masking characters (X, x, *, #): from 0 to 12.
  • the data could be found in proximity to keywords found in 1.1 or directly in the full-page OCR result.
  • Each location of data is assigned the format-based confidence, which reflects how close data in the found location matches the expected format.
  • the data 202 was found.
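A format check of this kind could look like the sketch below. The limits are copied from the two formats listed above. Everything else is an assumption made for illustration: the names `AccountNumberFormat`, `GENERAL_BILL`, `NARROW` and `matches_format` are hypothetical, the total length is assumed to count punctuation and masking characters, and the second, narrower format is assumed to be the one used for credit card bills. The sketch returns a yes/no answer, whereas the text describes a graded format-based confidence.

```python
from dataclasses import dataclass

MASKING = set("Xx*#")
PUNCTUATION = set(" -")


@dataclass(frozen=True)
class AccountNumberFormat:
    min_len: int
    max_len: int
    max_upper: int    # upper-case letters, excluding the masking 'X'
    max_punct: int    # spaces and dashes
    max_masking: int  # X, x, *, #


# Limits taken from the two formats listed in the text.
GENERAL_BILL = AccountNumberFormat(4, 22, max_upper=4, max_punct=4, max_masking=12)
NARROW = AccountNumberFormat(10, 20, max_upper=0, max_punct=4, max_masking=12)


def matches_format(text: str, fmt: AccountNumberFormat) -> bool:
    """Check a candidate string against the character-count limits."""
    if not fmt.min_len <= len(text) <= fmt.max_len:
        return False
    masking = sum(c in MASKING for c in text)
    punct = sum(c in PUNCTUATION for c in text)
    lower = sum(c.islower() and c not in MASKING for c in text)
    upper = sum(c.isupper() and c not in MASKING for c in text)
    # Assumption: any other symbol (e.g. '.') disqualifies the candidate.
    other = sum(not (c.isalnum() or c in MASKING or c in PUNCTUATION) for c in text)
    return (lower == 0 and other == 0 and upper <= fmt.max_upper
            and punct <= fmt.max_punct and masking <= fmt.max_masking)
```

A masked number such as "XXXX XXXX XXXX 5561" passes the narrower format, while a short alphanumeric identifier such as "AB-1234" only passes the general one.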
  • Account Number is always included in the Codeline. This makes it possible to use a cross-validation technique.
  • Account Number is captured using keywords and/or data formats definition, see 1.1-1.2. Let us refer to an Account Number result as A, see 202 on FIG. 2 .
  • Codeline is captured using an OCRA/OCRB recognition module. Let us refer to the codeline string as B, see 203 on FIG. 2.
  • Substrings of B are compared to A after removing spaces, dashes and other non-essential punctuation marks in both A and B.
  • the matching is done using Fuzzy Matching technique, explained in [1].
  • the matching threshold is configured in such a way that a single-character difference between A and a substring of B is allowed. Additional differences involving characters which are frequently misrecognized are also allowed. For example, the difference between ‘3’ recognized in a particular place of A and ‘8’ recognized in the corresponding place inside B is excused because ‘8’ is frequently recognized as ‘3’. Another example of frequently misrecognized characters is the pair ‘6’ and ‘5’.
  • If Step (c) finds a substring C which fuzzy-matches A within the threshold explained above, then A is replaced by C.
  • the reason for preferring codeline recognition results to the Account Number captured from the bill is that the former is significantly more accurate than the latter.
  • Step 1.1 found the keyword 201
  • step 1.2 found the data 202 (A) next to the keyword.
  • the codeline 203 (B) was also found.
  • A is “4888 5755 5555 5561”
  • B is “43885755555555510000250000126565000000000000006”.
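The cross-validation of A against substrings of B can be sketched as below. This is a minimal illustration, not the actual Fuzzy Matching implementation: the helper names are hypothetical, only the two confusable pairs named in the text (‘3’/‘8’ and ‘5’/‘6’) are excused, and the threshold is modelled as at most one remaining character difference.

```python
import re
from typing import Optional

# Digit pairs the text names as frequently confused by OCR.
CONFUSABLE = {frozenset("38"), frozenset("56")}


def normalize(s: str) -> str:
    """Drop spaces, dashes and other non-alphanumeric characters."""
    return re.sub(r"[^0-9A-Za-z]", "", s)


def hard_differences(a: str, c: str) -> int:
    """Count positions that differ and are not an excused confusion."""
    return sum(
        1 for x, y in zip(a, c)
        if x != y and frozenset((x, y)) not in CONFUSABLE
    )


def cross_validate(account: str, codeline: str, max_hard: int = 1) -> Optional[str]:
    """Return the codeline substring C that fuzzy-matches A, or None."""
    a, b = normalize(account), normalize(codeline)
    if not a:
        return None
    best, best_score = None, max_hard + 1
    for i in range(len(b) - len(a) + 1):
        window = b[i:i + len(a)]
        score = hard_differences(a, window)
        if score < best_score:
            best, best_score = window, score
    return best


if __name__ == "__main__":
    A = "4888 5755 5555 5561"
    B = "43885755555555510000250000126565000000000000006"
    print(cross_validate(A, B))
```

With the values of A and B above, the first 16 characters of B differ from A only in a ‘3’/‘8’ and a ‘5’/‘6’ position, both of which are excused, so that substring is returned as C.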
  • Each result found by 1.1-1.3 is assigned a confidence score which reflects how confident the system is that it found a correct field result.
  • a weighted linear combination of the following factors is used:
  • the weight of each individual factor in the overall field confidence score is established experimentally.
  • Available Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate AccountNumber and other critical fields on CCBs, as described in [1]. If the highest-confidence AccountNumber result matches the Payee information included in the Biller's db, the system will accept the result. If it does not, the system may reconfigure the AccountNumber format (see 1.2) and try to find it again by repeating steps 1.1 and 1.2, or just 1.2 when the new format is significantly more restrictive than the default one.
  • the system can use multiple OCR engines to recognize and re-recognize some characters.
  • a typical obstacle to using multiple OCR engines is the difficulty of deciding which one produced the correct result. For the same reason as in 1.3, making such a decision becomes significantly simpler on CCBs.
  • the system can utilize such knowledge to improve data capturing accuracy.
  • the mechanism of using such a hint is similar to imposing limitations on the field format (see 1.1). If a user enters one or more of the last digits of the AccountNumber, the system can utilize such knowledge to improve data capturing accuracy.
  • the mechanism of using such hint is similar to imposing limitations on the field format (see 1.2)
  • Available Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate the account number and other critical fields on CCBs.
  • In order to find the Payee ZIP-code and also to ensure its correctness, the system first finds all address blocks on the bill, corrects those using postal barcodes, then identifies which one is the Payee and takes its ZIP-code field as the result.
  • addresses are printed as left-, right- or center-justified text blocks isolated from the rest of the document text by significant white margins. Based on this information, the system may detect potential address locations on a document by building a text block structure. One way of doing that is to apply the text segmentation features available in most OCR systems, such as Fine Reader Engine by ABBYY.
  • the text segmentation method 2.1 may need a correction using postal barcodes, as illustrated in FIG. 4.
  • the bill shown in FIG. 4 has Payee block 401 printed too close to an unrelated text block 402, which may cause a failure of the text segmentation algorithm, resulting in the wrong text block 403 being found.
  • the system can use location of postal barcode 404 to define a search area above (shown as 405 ) and below the barcode, thus isolating the correct Payee address block 401 from the adjacent text block 402 .
  • the bottommost line contains City/State/ZIP information.
  • the system can utilize this knowledge by filtering out the text blocks found in 2.1-2.2 which do not have enough alphas (to represent City and State), do not contain any valid state (which is usually abbreviated to 2 characters) and/or do not contain enough digits at the end to represent a ZIP-code.
  • the last text line in each of the two true address text blocks 502 (Payor) and 503 (Payee) contains information which satisfies the conditions described above. Even if OCR makes several recognition errors, the Fuzzy Matching algorithm will establish a high degree of compliance with the expected format. On the other hand, the last line of text block 501 does not meet the expected format (for example, none of the last characters in its last row is numeric). Therefore, the text block 501 will be removed from further consideration, whereas blocks 502 and 503 will both be recognized and classified as Payee/Payor as explained in 2.7.
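The bottom-line filter described above might be sketched as follows. This is a strict, regex-based simplification: the names `looks_like_city_state_zip` and `filter_address_blocks` are hypothetical, each text block is assumed to arrive as a list of OCR text lines, and no tolerance for OCR errors is built in, unlike the Fuzzy Matching the text relies on.

```python
import re

US_STATES = frozenset("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS MO
MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV WI WY DC
""".split())

# City (letters), two-letter state, then ZIP or ZIP+4 at the end of the line.
LAST_LINE = re.compile(
    r"^(?P<city>[A-Za-z][A-Za-z .'-]+?),?\s+(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)\s*$"
)


def looks_like_city_state_zip(line: str) -> bool:
    """True if the line has a city, a valid state code and a trailing ZIP."""
    m = LAST_LINE.match(line.strip())
    return bool(m) and m.group("state").upper() in US_STATES


def filter_address_blocks(blocks: list[list[str]]) -> list[list[str]]:
    """Keep only text blocks whose bottom line fits City/State/ZIP."""
    return [b for b in blocks if b and looks_like_city_state_zip(b[-1])]
```

A block ending in "SAN DIEGO CA 92128-1234" is kept, while a block whose last line contains no trailing digits is dropped, which mirrors the treatment of blocks 502/503 versus 501.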
  • the BillPay system can build the entire address block starting with City/State/ZIP at the bottom line and including 1-3 upper lines as potential Name and Street Address components. Since the exact format of the address is not well-defined (it may have 1-4 lines, be with or without Recipient name, be with or without POBOX etc.), the system has to make multiple address interpretation attempts to achieve satisfactory interpretation of the entire text block.
  • In order to compare OCR results with the data included in the Postal db, the Fuzzy Matching mechanism is used. For example, if OCR reads “San Diego” as “San Dicgo” (‘c’ and ‘e’ are often misrecognized), Fuzzy Matching will produce a matching confidence above 80% between the two, which is sufficient to achieve the correct interpretation of the OCR result.
  • the individual components will be corrected to become identical to those included into the Postal db.
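Correcting an OCR component against known Postal db values can be illustrated with the standard library. This is a sketch under assumptions: `difflib.get_close_matches` stands in for the Fuzzy Matching mechanism, the 0.8 cutoff mirrors the "above 80%" figure in the text, and `correct_component` and the list of known city names are hypothetical.

```python
from difflib import get_close_matches
from typing import Optional


def correct_component(ocr_value: str, known_values: list[str],
                      cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known value if it clears the cutoff, else None."""
    # Compare case-insensitively but return the database spelling.
    lookup = {v.lower(): v for v in known_values}
    hits = get_close_matches(ocr_value.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


if __name__ == "__main__":
    cities = ["San Diego", "San Jose", "Santa Ana"]  # hypothetical db extract
    print(correct_component("San Dicgo", cities))
```

Here “San Dicgo” is replaced by “San Diego”, while a string that resembles none of the known values is left uncorrected (the function returns `None`).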
  • the discrepancies between address printed on the bill and its closest match in Postal db could be corrected by replacing invalid, obsolete or incomplete data as follows:
  • 92128-1284 could be replaced by 92128-1234 if the latter is a valid ZIP+4 additionally confirmed by either the street address or postal barcode, see 2.8
  • 92128 could be replaced by 92128-1234 if the latter is a valid ZIP+4 additionally confirmed by either the street address or postal barcode, see 2.8
  • the system will assign a confidence value on the scale from 0 to 1000 to each address it finds. Such confidences could be assigned overall for the entire address block or individually to each address component (Recipient Name, Street Number, Apartment Number, Street Name, POBOX Number, City, State and Zip). Larger values indicate that the system is more certain that it found, read and interpreted the address correctly.
  • the component-specific confidence reflects the number of corrections in this component required by process 2.5. For example, if 1 out of 8 non-space characters was corrected in the “CityName” address component (e.g. “San Dicgo” v. “San Diego”), a confidence of 875 may be assigned (1000*7/8).
  • the overall confidence is a weighted linear combination of individual component-specific confidences, where the weights are established experimentally.
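The two confidence calculations can be sketched as below. The component score follows the 1000*7/8 example directly; the weights in the overall score are established experimentally according to the text, so the values used in the usage example are placeholders. Function names are hypothetical, and corrections that change the length of a component are not scored in this sketch.

```python
def component_confidence(ocr_value: str, corrected_value: str) -> int:
    """Score = 1000 * (unchanged non-space chars) / (all non-space chars)."""
    a = ocr_value.replace(" ", "")
    b = corrected_value.replace(" ", "")
    if not b or len(a) != len(b):
        return 0  # assumption: length-changing corrections are not handled
    corrections = sum(x != y for x, y in zip(a, b))
    return 1000 * (len(b) - corrections) // len(b)


def overall_confidence(components: dict[str, int],
                       weights: dict[str, float]) -> int:
    """Weighted linear combination of component confidences."""
    total = sum(weights[name] for name in components)
    if total == 0:
        return 0
    return round(sum(weights[n] * c for n, c in components.items()) / total)


if __name__ == "__main__":
    city = component_confidence("San Dicgo", "San Diego")
    # Placeholder weights; real weights would be tuned experimentally.
    print(city, overall_confidence(
        {"City": city, "State": 1000, "Zip": 1000},
        {"City": 1.0, "State": 1.0, "Zip": 2.0},
    ))
```

Dividing by the sum of the weights keeps the overall score on the same 0-1000 scale as the individual components.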
  • Adjacency to Postal barcodes (see 2.8). If one and only one of two found addresses is adjacent to a postal barcode, it is likelier to be the Payor's.
  • the address block adjacent to postal barcode is given a preference.
  • Postal barcodes are often printed on bills and they help to improve accuracy of address capture.
  • the system uses Postal barcodes for 4 purposes:
  • The Payor hint contains information about the Payor (i.e. the bill's recipient).
  • the system can use such information as one of the factors in Payee vs. Payor identification (see 2.7)
  • Such hint contains information about previously paid bills in the account.
  • the system can use such information to significantly increase accuracy of capturing critical fields. Depending on which and how many critical field values were included into the hint, the field capture error may be reduced by 20-98% for repeating billers.
  • Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate payee information and other critical fields on CCBs, see [1].
  • The Balance Due field has a unique set of keyword phrases which make it possible to identify the field's location on about 90% of CCBs. In the remaining 10%, the keyword cannot be found due to some combination of poor image quality, usage of small font, inverted text etc.
  • The example in FIG. 7 shows a VISA Mileage Plus credit card bill with the “New Balance” keyword 701.
  • Keywords are searched for in the OCR result using the Fuzzy Matching technique. For example, if the OCR result contains “Bajance Due”, then the “Balance Due” keyword will be found with a confidence of 900 (out of a maximum of 1000) because 9 out of 10 non-space characters are the same as in “Balance Due”.
  • DollarAmount is one of pre-defined data formats explained in [1]. Data format is used by Bill Pay system in combination with Keyword-based search 3.1 to further narrow down the set of candidates for the field.
  • The example in FIG. 7 shows a VISA Mileage Plus credit card bill with the BalanceDue data 702 adjacent to keyword 701. Other instances of data with the “DollarAmount” format appear in 703.
  • Each location of data found in proximity to keywords found in 3.1 is assigned the format-based confidence, which reflects how close data in the found location matches expected format (in this case, “DollarAmount”).
  • The example in FIG. 7 shows a VISA Mileage Plus credit card bill with the BalanceDue data 702 cross-validated using Codeline substring 704.
  • In the example in FIG. 7, the BalanceDue data 702 is the largest of the 3 “DollarAmount” fields (702 and 703). It can also be cross-validated using Codeline substring 704.
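One way to combine the "largest DollarAmount" observation with codeline cross-validation is sketched below. Several things here are assumptions rather than statements from the text: that the codeline encodes amounts in cents without a decimal point, that preferring codeline-confirmed candidates and then taking the largest is a reasonable selection rule, and the names `parse_dollar_amounts` and `pick_balance_due`. The sample text and codeline in the usage example are made up.

```python
import re
from decimal import Decimal
from typing import Optional

# Matches values shaped like 1,234.56 or 25.00, with an optional '$'.
DOLLAR_AMOUNT = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})*|\d+)\.(\d{2})\b")


def parse_dollar_amounts(text: str) -> list[Decimal]:
    """Extract every DollarAmount-shaped value from OCR text."""
    return [
        Decimal(whole.replace(",", "") + "." + cents)
        for whole, cents in DOLLAR_AMOUNT.findall(text)
    ]


def pick_balance_due(text: str, codeline: str) -> Optional[Decimal]:
    """Prefer amounts confirmed by the codeline, then take the largest."""
    amounts = parse_dollar_amounts(text)
    if not amounts:
        return None
    # Assumption: the codeline stores amounts as cents, e.g. 1265.65 -> "126565".
    confirmed = [a for a in amounts if str(int(a * 100)) in codeline]
    return max(confirmed or amounts)


if __name__ == "__main__":
    ocr = "New Balance $1,265.65 Minimum Payment Due $25.00 Past Due $0.00"
    print(pick_balance_due(ocr, "0000250000126565"))  # hypothetical codeline
```

Very small amounts match almost any codeline under this rule, so in practice the check would need to be anchored to a known position in the codeline; the sketch leaves that out.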
  • Each result found by 3.1-3.4 is assigned a confidence score which reflects how confident the system is that it found correct field result.
  • a weighted linear combination of the following factors is used
  • Biller's name is often indicated on a bill by certain keyword phrases, the most frequent of which are:
  • Biller's Name is pointed to by the keyword phrase “Make Check Payable To” ( 801 )
  • the system will try to find actual biller's name in the proximity of the keyword using the following sequence of attempts:
  • the biller name is also included in the “recipient” field (the uppermost line of the address block) of the Payee address block. Therefore, the system will use the Payee's “recipient” field to cross-correlate (via Fuzzy Matching) with the various biller's name alternatives found in 4.2 to choose the best candidate.
  • the Biller's Name “Citi Cards” 802 closely correlates with (or is identical to, if no OCR errors were made) Payee “recipient” field 803
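Choosing among biller's name alternatives by correlation with the Payee “recipient” field could be sketched like this. As before, `difflib.SequenceMatcher` is an assumed stand-in for the Fuzzy Matching mechanism and `best_biller_name` is a hypothetical helper; the candidate strings in the usage example are invented.

```python
from difflib import SequenceMatcher


def best_biller_name(candidates: list[str], payee_recipient: str) -> str:
    """Pick the candidate most similar to the Payee block's top line.

    Raises ValueError if no candidates are supplied.
    """
    target = payee_recipient.lower()

    def score(name: str) -> float:
        return SequenceMatcher(None, name.lower(), target).ratio()

    return max(candidates, key=score)


if __name__ == "__main__":
    # "Cit1 Cards" simulates an OCR misread of the biller's name.
    print(best_biller_name(["Payment Coupon", "Cit1 Cards"], "Citi Cards"))
```

A candidate containing a single OCR error still correlates far more closely with the recipient field than unrelated text found near the keyword.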
  • Each result found by 4.1-4.4 is assigned a confidence score which reflects how confident the system is that it found correct field result.
  • a weighted linear combination of the following factors is used
  • MCCs Visa, MasterCard, AmEx, Diners, and Discover
  • Their account numbers start with a particular digit or a narrow range of digits (for example, Visa always starts with ‘4’, MasterCard with “51”-“55” and so on). This limitation translates into a narrower AccountNumber format, see 1.2.
  • AccountNumber's length is also restricted. The length depends on the credit card; 16 digits is the most common case. This limitation also translates into a narrower AccountNumber format, see 1.2.
  • Step 1 Double the value of alternate digits of the account number beginning with the second digit from the right (the first right-hand digit is the check digit).
  • Step 2 Add the individual digits comprising the products obtained in Step 1 to each of the unaffected digits in the original number.
  • Step 3 The total obtained in Step 2 must be a number ending in zero (30, 40, 50, etc.) for the account number to be validated.
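Steps 1-3 describe what is commonly known as the Luhn (mod 10) check. A minimal sketch follows, together with a prefix lookup that covers only the two prefixes named in the text (Visa and MasterCard); prefixes for the other card types are not given here and are deliberately left out. The function names are hypothetical.

```python
from typing import Optional


def luhn_valid(account_number: str) -> bool:
    """Mod 10 check following Steps 1-3."""
    digits = [int(c) for c in account_number if c.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:            # Step 1: every second digit from the right
            d *= 2
            d = d // 10 + d % 10  # Step 2: add the digits of the product
        total += d
    return total % 10 == 0        # Step 3: total must end in zero


def network_from_prefix(account_number: str) -> Optional[str]:
    """Map a leading digit range to a card type (only those in the text)."""
    digits = "".join(c for c in account_number if c.isdigit())
    if digits.startswith("4"):
        return "Visa"
    if digits[:2] in {"51", "52", "53", "54", "55"}:
        return "MasterCard"
    return None


if __name__ == "__main__":
    print(luhn_valid("4888 5755 5555 5561"))  # OCR reading A from FIG. 2
    print(luhn_valid("4388575555555551"))     # first 16 digits of codeline B
```

Applied to the FIG. 2 example, the OCR reading A (4888 5755 5555 5561) sums to 67 and fails the check, while the first 16 digits of codeline B (4388575555555551) sum to 60 and pass, which is consistent with preferring the codeline result.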
  • addresses are printed as left-, right- or center-justified text blocks isolated from the rest of document text by significant white margins. Based on this information, the system may detect potential address locations on a document by building text block structure.
  • the bottommost line contains City/State/ZIP information.
  • the system can utilize this knowledge by filtering out the text blocks found in 2.1 which do not have enough alphas (to represent City and State), do not contain any valid state (which is usually abbreviated to 2 characters) and/or do not contain enough digits at the end to represent a ZIP-code.
  • the system can build the entire address block starting with City/State/ZIP at the bottom line and including 1-3 upper lines as potential Name and Street Address components. Since the exact format of the address is not well-defined (it may have 1-4 lines, be with and without names, be with and without POBOX etc), the system has to make multiple address interpretation attempts to achieve satisfactory interpretation of the entire text block.
  • In order to compare OCR results with the data included in the Postal db, the Fuzzy Matching mechanism is used. For example, if OCR reads “San Diego” as “San Dicgo” (‘c’ and ‘e’ are often misrecognized), Fuzzy Matching will produce a matching confidence above 80% between the two, which is sufficient to achieve the correct interpretation of the OCR result.
  • the system will assign a confidence value on the scale from 0 to 1000 to each address found above. Such confidences could be assigned overall for the entire address block or individually to each address component (recipient name, street number, apartment number, street name, POBOX number, City, State and ZIP). Larger values indicate that the system is more certain that it found, read and interpreted the address correctly.
  • After one or more address blocks have been captured, the system must determine which one is the Payee and which is the Payor. The following factors help in such determination:
  • Adjacency to Postal barcodes (if more than 1 block competes for either Payee or Payor, the one adjacent to a barcode wins).
  • Postal barcodes are often printed on bills and they help to improve accuracy of address capture.
  • the system uses Postal barcodes for 3 purposes:
  • The Payor hint contains information about the Payor (i.e. the bill recipient).
  • the system can use such information for Payee vs. Payor identification (see also 2.6)
  • The Payee hint contains information about existing billers in the account.
  • the system can use such information to significantly increase accuracy of capturing critical fields.
  • the field capture error may be reduced by 20-98% for the pre-existing (i.e. repeating) billers.
  • Available Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate payee information and other critical fields on CCBs.
  • The Balance Due field has a unique set of keywords which allow us to identify the field's location on about 90% of CCBs. In the remaining 10%, the keyword cannot be found due to some combination of poor image quality, usage of small font, inverted text etc.
  • Biller's name is often indicated on a bill by certain keywords, like “Pay to”, “Make your check payable to” etc. Once one or more such keywords were found, the system will try to find actual biller's name in the proximity of the keyword using the following sequence of attempts:
  • the system will use the text found in 1-3 above as another candidate for the biller's name in addition to the one presented in the section immediately above.
  • Account Number field in all MCCs is purely numeric (unlike say Insurance bills which may include alphas).
  • MCCs (Visa, MasterCard, AmEx, Diners, and Discover) have well defined account number formats. Their account numbers start with a particular digit or a narrow range of digits (say, Visa always starts with ‘4’, MasterCard with “51”-“55” and so on).
  • Account number length is also well restricted. The length depends on the credit card; 16 digits is the most common case.
  • Step 1 Double the value of alternate digits of the account number beginning with the second digit from the right (the first right-hand digit is the check digit).
  • Step 2 Add the individual digits comprising the products obtained in Step 1 to each of the unaffected digits in the original number.
  • Step 3 The total obtained in Step 2 must be a number ending in zero (30, 40, 50, etc.) for the account number to be validated.
  • FIG. 8 is a flowchart of a method of capturing critical fields in a credit card bill, in accordance with embodiments of the invention.
  • the steps for capturing critical fields include:
  • Applying the Dynamic Capture module [1] to find alternatives for Account Number and Balance Due based on the field's keywords (such as “Account Number”, “Balance Due” etc.) and the field's format
  • FIG. 9 is one embodiment of a network and system upon which the methods described herein may be implemented, including a capture device 702 which captures an image 704 of a credit card bill, then transmits it over a network 706 to a server 708 for processing.
  • the capture device 702 also performs one or more of the processing steps described herein in addition to, or instead of, the server 708 .
  • FIG. 10 is an embodiment of a computer, processor and memory upon which a mobile device, server or other computing device may be implemented to carry out the methods described herein.
  • the server 708 may include a power supply 902 , processor 904 , network interface module 906 , memory 908 and a CCB recognition module 910 for performing the specific credit card bill recognition and identification steps described herein.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Development Economics (AREA)
  • Electromagnetism (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Toxicology (AREA)
  • General Health & Medical Sciences (AREA)
  • Finance (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Accounting & Taxation (AREA)
  • Character Discrimination (AREA)

Abstract

The embodiments herein focus on improving the recognition accuracy of critical fields on credit card bills by detecting and identifying the critical fields on a credit card bill, extracting the data from the critical fields and comparing the data with known data on a payor, payee and biller.

Description

    BACKGROUND
  • 1. Field of the Invention
  • The embodiments described herein relate to processing images captured using a mobile device, and more particularly to identifying critical fields in a credit card remittance coupon and extracting the content therein.
  • 2. Related Art
  • Financial institutions which issue credit cards frequently offer a service known as a balance transfer, where a customer with a balance due on a credit card can transfer some or all of the outstanding balance from one credit card to another credit card. Customers typically transfer balances from one card to another to obtain a lower interest rate, more favorable payment schedule, or other benefits offered by a credit card for carrying a balance with a particular financial institution. A balance transfer may also be similar to a cash advance, where a customer can transfer a sum of money from their credit card into their bank account, resulting in a balance due on the credit card but giving the customer cash in their bank account.
  • In some situations, the customer already holds the credit card where the balance is being transferred, while in other situations, the customer may be opening a new credit card and transferring a balance to the new credit card. Banks often compete with other banks to advertise lower interest rates and favorable payment terms on a balance transfer. However, it is often difficult for a customer to find out which balance transfer offers are available and what the terms of the balance transfer will be, as many balance transfer terms are dependent on the amount of the balance being transferred or the credit rating of the customer.
  • The balance transfer process is cumbersome for both the customer and the bank. The customer must obtain several different pieces of information, including the customer's name, contact information, credit card number, the current balance and the interest rates applicable to the balance. If the balance is being transferred to a bank account, other information may be needed, such as a bank account number and routing number. A bank may also want to evaluate the credit history of the customer to determine whether to accept the balance transfer application, in which case the customer will need to provide even more information, such as a social security number, driver's license number or additional financial information.
  • Once this information is entered into an application for a balance transfer, the receiving bank evaluates the information to determine whether to accept the balance transfer request. This process may take a significant amount of time, generally several days. Once accepted, it may take several more days or even weeks before the money is transferred.
  • Therefore, there is a need for streamlining the process of applying for and processing financial offers, such as credit card balance transfers.
  • SUMMARY
  • Embodiments described herein provide for the identification of critical fields on a document which provide a high probability of accurately reading the content of the document image. By improving recognition accuracy of these fields on documents such as a credit card bill, the remainder of the content on the bill can be read with high confidence.
  • Products which use image processing techniques to read bills, including such bill categories as insurance, utility, mortgage etc., use a set of rules which apply to all (or the majority) of the bills within each category. One of the most important tasks in mobile image capture is understanding and utilizing the category-specific rules in the form of specialized OCR, cross-validation between different document fields, usage of postal barcodes etc. For example, knowledge that the document is a credit card bill (CCB) allows the system to read its Account Number and other critical fields using both the data on the bill and the code-line, and in some cases the code-line only. This reduces the error rate on critical fields by a factor of 2 to 5 compared to “generic” bills.
  • The following fields are considered critical on CCBs: Account Number, Balance Due, Payee ZIP-code and Biller's Name.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • Various embodiments disclosed herein are described in detail with reference to the following figures. The drawings are provided for purposes of illustration only and merely depict typical or exemplary embodiments. These drawings are provided to facilitate the reader's understanding and shall not be considered limiting of the breadth, scope, or applicability of the embodiments. It should be noted that for clarity and ease of illustration these drawings are not necessarily made to scale.
  • FIG. 1 is an image of a credit card bill identifying one or more critical fields, according to embodiments.
  • FIG. 2 is an image of a portion of the credit card bill used for cross-validation of an account number field against a Codeline, according to embodiments.
  • FIG. 3 is an image of the credit card bill identifying a payee block which is not sufficiently isolated from other text, causing text block segmentation issues, according to embodiments.
  • FIG. 4 is an illustration of a processed image of a credit card bill identifying a last text line in one or more address boxes, according to embodiments.
  • FIG. 5 is an image of the credit card bill highlighting a payment history deduction tool to improve accuracy of capturing critical fields, according to embodiments.
  • FIG. 6 is an image of a portion of the credit card bill highlighting a balance due critical field, according to embodiments.
  • FIG. 7 is an illustration of a method of identifying a biller's name field on different portions of the credit card bill, according to embodiments.
  • FIG. 8 illustrates a method of identifying one or more fields on the credit card statement, according to one embodiment of the invention.
  • FIG. 9 illustrates a system for capturing and identifying critical fields on a credit card statement, according to one embodiment of the invention.
  • FIG. 10 is a block diagram that illustrates an embodiment of a computer/server system upon which an embodiment of the inventive methodology may be implemented.
  • The various embodiments mentioned above are described in further detail with reference to the aforementioned figures and the following detailed description of exemplary embodiments.
  • DETAILED DESCRIPTION
  • Embodiments described herein pertain to systems and methods for identifying and capturing critical fields on an image of a document such as a credit card bill. Each critical field is identified as such based on the resulting likelihood that if the critical field can be identified, the remaining fields on the document can also be identified with a high confidence. This therefore improves the overall ability to capture and identify content from the document and utilize it for various applications.
  • The following fields on CCBs are considered critical for a balance transfer (BT) application:
  • Account Number
  • Balance Due
  • Payee ZIP-code
  • Biller's Name
  • The embodiments herein focus on improving recognition accuracy of these fields on credit card bills. The following document fields are being captured and used to facilitate finding, identification, and recognition of critical fields:
  • 105 Codeline
  • 106 Payor Block
  • 107 Postal Barcodes
  • I. Capturing Account Number from Credit Cards Bills
  • 1.1 Keyword-Based Search
  • The AccountNumber field has a unique set of keyword phrases which make it possible to identify the field's location on about 90% of CCBs. On the remaining 10%, the keyword cannot be found due to some combination of poor image quality, small font, inverted text etc. Below we discuss methods which can be used when keywords cannot be found.
  • The most frequent keyword phrases are: Account, Account Number and Account No. Some keyword phrases could be printed in a single text line or in two consecutive lines. It should be noted that the set of keywords on CCBs is more restrictive than in the general case. For example, a phrase such as “Policy Number” (frequent on insurance bills) is not used.
  • Keywords are searched for in the full-page OCR result using a Fuzzy Matching technique. For example, if the OCR result contains “Account Nomber”, then the “Account Number” keyword will be found with a confidence of approximately 920 (out of 1000 max) because 12 out of 13 non-space characters are the same as in “Account Number”. On FIG. 2, the keyword 201 “Account Number” was found.
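  • The keyword confidence described above can be sketched as follows. This is only an illustration: the text does not specify the Fuzzy Matching algorithm, so the sketch assumes an edit-distance-based score, and the function names `edit_distance` and `keyword_confidence` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def keyword_confidence(ocr_text: str, keyword: str) -> int:
    """Confidence on a 0-1000 scale, ignoring spaces and letter case.

    Assumption: confidence = 1000 * (matched characters / keyword length),
    where matched = keyword length minus the edit distance.
    """
    a = ocr_text.replace(" ", "").lower()
    b = keyword.replace(" ", "").lower()
    if not b:
        return 0
    matched = max(len(b) - edit_distance(a, b), 0)
    return 1000 * matched // len(b)
```

  • Under these assumptions, “Account Nomber” scores 12/13 of the maximum against “Account Number”, in line with the approximate value of 920 quoted in the text.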
  • 1.2 Format-Based Search
  • The data format of the Account Number field on CCBs is more restrictive than in the “generic” bill case both in terms of its length and character set. More limitations apply in the case of Major Credit Cards, see Section 5.
  • For example, the following definition of the Account Number format covers the majority of all bills:
  • Total number of characters: from 4 to 22
    Number of lowercase alpha characters (excluding ‘x’): 0
    Number of uppercase alpha characters (excluding ‘X’): from 0 to 4
    Number of punctuation characters (spaces, dashes): from 0 to 4
    Number of masking characters (X, x, *, #): from 0 to 12
  • In contrast, the following (narrower) definition of the Account Number format covers the majority of credit card bills:
  • Total number of characters: from 10 to 20
    Number of lowercase alpha characters (excluding ‘x’): 0
    Number of uppercase alpha characters (excluding ‘X’): 0
    Number of punctuation characters (spaces, dashes): from 0 to 4
    Number of masking characters (X, x, *, #): from 0 to 12
  • The data could be found in proximity to keywords found in 1.1 or directly in the full-page OCR result.
  • Each location of data is assigned the format-based confidence, which reflects how close data in the found location matches the expected format. On FIG. 2, the data 202 was found.
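  • The two format definitions above could be expressed as a simple pass/fail check, sketched below. The class name `AccountNumberFormat` and its fields are hypothetical, and the sketch assumes that the “total number of characters” counts punctuation and masking characters as well as digits. It does not produce the graded format-based confidence mentioned in the text.

```python
from dataclasses import dataclass

MASKING = set("Xx*#")
PUNCTUATION = set(" -")


@dataclass(frozen=True)
class AccountNumberFormat:
    min_len: int
    max_len: int
    max_upper: int          # uppercase letters other than 'X'
    max_punct: int = 4
    max_masking: int = 12

    def matches(self, text: str) -> bool:
        lower = sum(c.islower() and c != "x" for c in text)
        upper = sum(c.isupper() and c != "X" for c in text)
        punct = sum(c in PUNCTUATION for c in text)
        masking = sum(c in MASKING for c in text)
        # Anything that is not a letter, digit, punctuation or masking character
        other = sum(not (c.isalnum() or c in PUNCTUATION or c in MASKING)
                    for c in text)
        return (self.min_len <= len(text) <= self.max_len
                and lower == 0 and other == 0
                and upper <= self.max_upper
                and punct <= self.max_punct
                and masking <= self.max_masking)


# Limits copied from the two definitions in the text
GENERIC_BILL = AccountNumberFormat(4, 22, max_upper=4)
CREDIT_CARD_BILL = AccountNumberFormat(10, 20, max_upper=0)
```

  • For instance, a value such as “AB-1234567890” would satisfy the generic definition but not the credit card one, because the narrower format allows no uppercase letters.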
  • 1.3 Cross Validation Against Codeline
  • On CCBs, the AccountNumber is always included in the Codeline. This allows the use of a cross-validation technique, which works as follows:
  • (a) The Account Number is captured using keywords and/or the data format definition, see 1.1-1.2. Let us refer to the Account Number result as A, see 202 on FIG. 2.
  • (b) The Codeline is captured using an OCR-A/OCR-B recognition module. Let us refer to the codeline string as B, see 203 on FIG. 2.
  • (c) Substrings of B are compared to A after removing spaces, dashes and other non-essential punctuation marks in both A and B. The matching is done using the Fuzzy Matching technique explained in [1]. The matching threshold is configured in such a way that a single-character difference between A and a substring of B is allowed. Additional differences involving characters which are frequently misrecognized are also allowed. For example, the difference between ‘3’ recognized in a particular place of A and ‘8’ recognized in the corresponding place inside B is excused because ‘8’ is frequently recognized as ‘3’. Another example of frequently misrecognized characters is ‘6’ and ‘5’.
  • (d) If Step (c) finds a substring C which fuzzy-matches A within the threshold explained above, then A is replaced by C. The reason for preferring the codeline recognition result to the Account Number captured from the bill is that the former is significantly more accurate than the latter.
  • Consider FIG. 2 as an example. Step 1.1 found the keyword 201, and step 1.2 found the data 202 (A) next to the keyword. The codeline 203 (B) was also found. Let us assume that A is “4888 5755 5555 5561” and B is “43885755555555510000250000126565000000000000006”. The Fuzzy Matching technique of step (c) will detect that the substring 204, C=“4388575555555551”, has 2 differences against A, involving the pairs (‘8’ and ‘3’) and (‘6’ and ‘5’). Since both pairs are frequently misrecognized (see above), the string C will be accepted by step (c). As a result, step (d) will correct A to “4388 5755 5555 5551”.
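  • The cross-validation against the codeline can be sketched as a sliding-window comparison. The sketch below is a simplification and not the Fuzzy Matching technique of [1]: it compares equal-length windows position by position, allows at most one ordinary difference, and excuses differences between the two confusable pairs named in the text. The names `CONFUSABLE` and `cross_validate` are hypothetical.

```python
import re

# Character pairs the text names as frequently misrecognized; a real system
# would presumably tune this table experimentally.
CONFUSABLE = {frozenset("38"), frozenset("56")}


def cross_validate(account: str, codeline: str, max_other_diffs: int = 1):
    """Return the codeline substring C that fuzzy-matches `account`, or None."""
    a = re.sub(r"[^0-9A-Za-z]", "", account)    # drop spaces, dashes, etc.
    b = re.sub(r"[^0-9A-Za-z]", "", codeline)
    best = None
    for start in range(len(b) - len(a) + 1):
        window = b[start:start + len(a)]
        diffs = [(x, y) for x, y in zip(a, window) if x != y]
        # Differences between confusable characters do not count
        other = sum(frozenset(pair) not in CONFUSABLE for pair in diffs)
        if other <= max_other_diffs and (best is None or len(diffs) < best[0]):
            best = (len(diffs), window)
    return best[1] if best else None
```

  • With the values of A and B from the example above, the only window within the threshold is the first 16 characters of B, so A would be replaced by “4388575555555551”.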
  • AccountNumber's Confidence Score
  • Each result found by 1.1-1.3 is assigned a confidence score which reflects how confident the system is that it found a correct field result. In computing the confidence score, a weighted linear combination of the following factors is used:
      • the keyword finding confidence, see 1.1
      • the format-based confidence, see 1.2
      • the score reflecting geometrical alignment between the keyword found in 1.1 and the data found by 1.2. For example, if the data is located immediately to the right of the keyword (with no characters in between) like 201 and 202 on FIG. 2, or the data is located immediately below the keyword (again, with no characters in between), the score reaches its maximum value of 1000. Deviations from such alignment or the presence of characters in between result in lower scores.
      • if the field's data could be found in the codeline (see 1.3), the confidence gets an additional boost depending on how well the data and codeline match.
  • The weight of each individual factor in the overall field confidence score is established experimentally.
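  • A weighted linear combination of this kind might look like the sketch below. The weights shown are placeholders chosen only for illustration (the text states that the real weights are established experimentally), the codeline boost is modeled as a fourth factor, and the names `WEIGHTS` and `field_confidence` are hypothetical.

```python
# Placeholder weights, NOT values from the source document.
WEIGHTS = {"keyword": 3, "format": 3, "alignment": 2, "codeline": 2}


def field_confidence(factors: dict, weights: dict = WEIGHTS) -> int:
    """Weighted linear combination of 0-1000 factor scores.

    Missing factors count as 0; the result is normalized by the total
    weight and clamped to the 0-1000 range.
    """
    total_weight = sum(weights.values())
    score = sum(w * factors.get(name, 0) for name, w in weights.items())
    return max(0, min(1000, round(score / total_weight)))
```

  • For example, a candidate with a keyword confidence of 923, perfect format and alignment scores, and no codeline match would receive 777 under these placeholder weights.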
  • Cross Validation Against Biller's Database
  • Available Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate the AccountNumber and other critical fields on CCBs, as described in [1]. If the highest-confidence AccountNumber result matches the Payee information included in the Biller's db, the system will accept the result. If it does not, the system may reconfigure the AccountNumber format (see 1.2) and try to find it again by repeating steps 1.1 and 1.2, or just 1.2 when the new format is significantly more restrictive than the default one.
  • Usage of Specialized OCR
  • Since the format of the Account Number field is more restrictive than in the “generic” bill case, OCR can be made more specialized and thus achieve higher recognition accuracy. For example, typical OCR confusions such as ‘2’ and ‘Z’, ‘O’ and ‘0’, ‘1’ and ‘I’, or ‘5’ and ‘S’ can be easily avoided if it is known whether the character is alphabetic or numeric.
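  • One simple way to exploit this knowledge is to post-process the OCR output of a field that is known to be numeric. The sketch below only remaps the four look-alike pairs listed above; it is an assumption about how such specialization could be applied, not a description of a specialized OCR engine, and `normalize_numeric_field` is a hypothetical name.

```python
# Look-alike pairs named in the text, mapped letter -> digit
ALPHA_TO_DIGIT = str.maketrans({"Z": "2", "O": "0", "I": "1", "S": "5"})


def normalize_numeric_field(ocr_text: str) -> str:
    """Re-read look-alike letters as digits in a field known to be numeric."""
    return ocr_text.translate(ALPHA_TO_DIGIT)
```

  • Masking characters such as ‘X’ are left untouched, so a partially masked account number survives this step unchanged.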
  • Usage of Multiple OCR Engines
  • The system can use multiple OCR engines to recognize and re-recognize some characters. A typical obstacle to using multiple OCR engines is the difficulty of deciding which one produced the correct result. For the same reason as in 1.3, making such a decision becomes significantly simpler on CCBs.
  • Using “Last Digits” Hints
  • If a user enters one or more of the last digits of the AccountNumber, the system can utilize such knowledge to improve data capturing accuracy. The mechanism of using such a hint is similar to imposing limitations on the field format (see 1.2).
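  • As a minimal sketch of how such a hint could narrow the candidates, the hypothetical function `filter_by_last_digits` below keeps only the account number candidates whose digits end with the digits the user entered.

```python
import re


def filter_by_last_digits(candidates, hint: str):
    """Keep candidates whose digit string ends with the user-supplied digits."""
    hint_digits = re.sub(r"\D", "", hint)
    return [c for c in candidates
            if re.sub(r"\D", "", c).endswith(hint_digits)]
```

  • An empty hint leaves the candidate list unchanged, so the same code path can be used whether or not the user supplied any digits.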
  • Identification of Major Credit Cards
  • Since there are very few major credit cards among all credit card billers, it is possible to identify the exact major credit card using relatively simple and fast form identification methods. Such methods are based on finding logos, certain keywords and the overall location of text blocks on the document. Handling of mobile images for the purpose of Form Identification is described in [2].
  • Once the system has identified that a bill belongs to one of the major credit cards, it can use several rules that apply to such bills (but do not apply to CCBs in general), see Section 5.
  • II. Section 2 Capturing Payee ZIP-Code from Credit Cards Bills
  • In order to find the Payee ZIP-code and also to ensure its correctness, the system first finds all address blocks on the bill, corrects those using postal barcodes, then identifies which one is Payee and takes its ZIP-code field as the result.
  • 2.1 Using Text Blocks to Find Possible Addresses
  • Usually, addresses are printed as left-, right- or center-justified text blocks isolated from the rest of the document text by significant white margins. Based on this information, the system may detect potential address locations on a document by building a text block structure. One way of doing that is to apply the text segmentation features available in most OCR systems, such as FineReader Engine by ABBYY.
  • 2.2 Using Postal Barcode to Isolate Address Text Blocks
  • On some bill layouts, where the address blocks are not sufficiently isolated, the text segmentation method 2.1 may need a correction using postal barcodes, as illustrated on FIG. 4. The bill shown on FIG. 4 has the Payee block 401 printed too close to an unrelated text block 402, which may cause a failure of the text segmentation algorithm, resulting in the wrong text block 403 being found. To correct the problem, the system can use the location of the postal barcode 404 to define a search area above (shown as 405) and below the barcode, thus isolating the correct Payee address block 401 from the adjacent text block 402.
  • 2.3 Filtering-Out the Text Blocks by the City/State/ZIP Line
  • In most US addresses, the bottommost line contains City/State/ZIP information. The system can utilize this knowledge by filtering out the text blocks found in 2.1-2.2 which do not have enough alphabetic characters (to represent City and State), do not contain any valid state (which is usually abbreviated to 2 characters) and/or do not contain enough digits at the end to represent a ZIP-code.
  • Consider the bill shown on FIG. 5: the last text lines in the two true address text blocks 502 (Payor) and 503 (Payee) contain information which satisfies the conditions described above. Even if OCR makes several recognition errors, the Fuzzy Matching algorithm will establish a high degree of compliance with the expected format. On the other hand, the last line of text block 501 does not meet the expected format (for example, none of the last characters in its last row is numeric). Therefore, the text block 501 will be removed from further consideration, whereas blocks 502 and 503 will both be recognized and classified as Payee/Payor as explained in 2.7.
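  • The City/State/ZIP filter can be approximated with a regular expression, as sketched below. This is a strict check used only to illustrate the three conditions (enough letters, a valid two-letter state, enough trailing digits); unlike the Fuzzy Matching described in the text, it does not tolerate OCR errors. The names `CITY_STATE_ZIP`, `US_STATES` and `looks_like_city_state_zip` are hypothetical.

```python
import re

US_STATES = set("""
AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD
MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC
SD TN TX UT VT VA WA WV WI WY DC
""".split())

# City (letters, spaces, a few punctuation marks), optional comma,
# two-letter state, then a 5-digit ZIP or ZIP+4
CITY_STATE_ZIP = re.compile(
    r"^(?P<city>[A-Za-z .'-]{2,}?),?\s+"
    r"(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)$"
)


def looks_like_city_state_zip(line: str) -> bool:
    """True if the last line of a text block resembles 'City ST 12345[-6789]'."""
    m = CITY_STATE_ZIP.match(line.strip())
    return bool(m) and m.group("state").upper() in US_STATES


def filter_address_blocks(blocks):
    """Keep only blocks (lists of text lines) whose last line passes the check."""
    return [b for b in blocks if b and looks_like_city_state_zip(b[-1])]
```

  • A block ending in a line such as “Payment Due Date 01/15/2014” would be dropped, while blocks ending in “San Diego, CA 92128-1234” would be kept for further interpretation.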
  • 2.4. Using Postal Database and Fuzzy Matching to Interpret Addresses
  • Once address candidates are selected using 2.1-2.3, the BillPay system can build the entire address block starting with City/State/ZIP at the bottom line and including 1-3 upper lines as potential Name and Street Address components. Since the exact format of the address is not well-defined (it may have 1-4 lines, be with or without Recipient name, be with or without POBOX etc.), the system has to make multiple address interpretation attempts to achieve satisfactory interpretation of the entire text block.
  • In order to compare OCR results with the data included into the Postal db, the Fuzzy Matching mechanism is used. For example, if OCR reads “San Diego” as “San Dicgo” (‘c’ and ‘e’ are often misrecognized), Fuzzy Matching will produce matching confidence above 80% between the two, which is sufficient to achieve the correct interpretation of OCR result.
  • 2.5. Using Postal Database to Correct Addresses
  • After the interpretation of the address block has been achieved, the individual components are corrected to become identical to those included in the Postal db. Optionally, the discrepancies between the address printed on the bill and its closest match in the Postal db could be corrected by replacing invalid, obsolete or incomplete data as follows:
  • Correcting ZIP+4
  • For example, 92128-1284 could be replaced by 92128-1234 if the latter is a valid ZIP+4 additionally confirmed by either the street address or postal barcode, see 2.8
  • Adding Missing ZIP+4
  • For example, 92128 could be replaced by 92128-1234 if the latter is a valid ZIP+4 additionally confirmed by either the street address or postal barcode, see 2.8
  • Correcting invalid street suffixes, such as “Road” into “Street” if the “Street” suffix can be confirmed by Postal db while the “Road” one cannot.
  • 2.6 Computation of the Address Confidence
  • The system will assign a confidence value on the scale from 0 to 1000 to each address it finds. Such confidences could be assigned overall for the entire address block or individually to each address component (Recipient Name, Street Number, Apartment Number, Street Name, POBOX Number, City, State and Zip). Larger values indicate that the system is more confident that it found, read and interpreted the address correctly. The component-specific confidence reflects the number of corrections in this component required by process 2.5. For example, if 1 out of 8 non-space characters was corrected in the “CityName” address component (e.g. “San Dicgo” v. “San Diego”), a confidence of 875 may be assigned (1000 × 7/8). The overall confidence is a weighted linear combination of individual component-specific confidences, where the weights are established experimentally.
  • 2.7 Identification of Payee Vs. Payor
  • After one or more of address blocks have been captured as described in 2.1-2.6, the system must make a determination as to which one is Payee's and/or Payor's. The following factors help in such determination:
  • Presence of POBOX (it's much more likely to be a Payee than Payor if POBOX is present)
  • Location within the document (e.g. Payee is somewhat more likely to be printed at the bottom, especially in the right/bottom corner)
  • Inclusion of certain words in the Recipient name item (some words like “Corporation”, “Department”, “Center” etc. indicate Payee)
  • Inclusion of frequent names in the Recipient name item (e.g. “John” is more likely to indicate a Payor than a Payee)
  • Adjacency to Postal barcodes, see 2.8. If one and only one of two found addresses is adjacent to a postal barcode, it is likelier to be Payor's.
  • Optional Payor hint, as explained in 2.9
  • Also, in a case when 3 or more addresses were found (and therefore more than one address block compete for either Payee or Payor) the address block adjacent to postal barcode is given a preference.
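  • The factors listed in 2.7 could be combined into a simple score, as in the sketch below. The numeric weights are invented purely for illustration and only encode the direction of each factor as stated in the text; the word lists contain just the examples given above. `AddressBlock`, `payee_score` and `classify` are hypothetical names.

```python
from dataclasses import dataclass

ORGANIZATION_WORDS = {"corporation", "department", "center"}  # examples from the text
FREQUENT_FIRST_NAMES = {"john"}  # the text's example; a real list would be larger


@dataclass
class AddressBlock:
    recipient: str
    has_po_box: bool
    near_page_bottom: bool
    adjacent_to_barcode: bool


def payee_score(block: AddressBlock) -> int:
    """Positive scores suggest Payee, negative scores suggest Payor."""
    words = set(block.recipient.lower().split())
    score = 0
    if block.has_po_box:
        score += 3   # "much more likely" to be the Payee
    if block.near_page_bottom:
        score += 1   # "somewhat more likely" to be the Payee
    if words & ORGANIZATION_WORDS:
        score += 2
    if words & FREQUENT_FIRST_NAMES:
        score -= 2
    if block.adjacent_to_barcode:
        score -= 1   # the text treats barcode adjacency as a Payor indicator
    return score


def classify(blocks):
    """Label the highest-scoring block Payee and the lowest-scoring one Payor."""
    ranked = sorted(blocks, key=payee_score)
    return {"payor": ranked[0], "payee": ranked[-1]}
```

  • The optional Payor hint of 2.9 could be added as one more term in the score; it is omitted here to keep the sketch short.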
  • 2.8 Using Postal Barcode Reader
  • Postal barcodes are often printed on bills and they help to improve accuracy of address capture.
  • The system uses Postal barcodes for 4 purposes:
  • To help in Payee vs. Payor identification (see 2.7)
  • To help choose the correct Payee or Payor when two found address blocks compete for the same field result
  • To correct ZIP-codes or capture them if they cannot be read from the image due to poor quality. For example (see FIG. 6), if the address 603 is illegible but the barcode 604 containing ZIP+4=“60094-4014” was found, the system can recreate the City, State and ZIP fields from the barcode.
  • To better detect address candidates, see 2.2
  • 2.9 Using Payor Hint
  • A Payor hint contains information about the Payor (i.e. the bill's recipient). The system can use such information as one of the factors in Payee vs. Payor identification (see 2.7).
  • 2.10 Using “Payment History” Hint
  • A “Payment History” hint contains information about previously paid bills in the account. The system can use such information to significantly increase the accuracy of capturing critical fields. Depending on which and how many critical field values are included in the hint, the field capture error may be reduced by 20-98% for repeating billers.
  • As an illustration, consider FIG. 6. Assume that the Bill Pay system made one recognition error (out of 16 digits) on the Account Number field 601 and recognized it as “4388 6755 5555 5551” as well as two errors (out of 17 characters) in the Biller's Name 602 and recognized it as “Chasc Card Servlces”. Let us assume that the system made no errors in capturing the Payee ZIP-code 603 (out of 9 digits). In this particular example, even though the field itself cannot be read due to poor quality, the Postal Barcode 604 was captured correctly and populated the City, State and ZIP fields of Payee block.
  • If the “payment history” for this transaction included correct reading of Account Number (“4388 5755 5555 5551”), correct Biller's Name (“Chase Card Services”) and correct biller's ZIP code (“60094-4014”), a standard fuzzy matching procedure will identify that 39 of 42 characters in all 3 critical fields combined are matched correctly, resulting in about 92% matching confidence. If the system uses a threshold of 90% for this matching (which could be made configurable), the errors in Account Number and Biller's Name may be corrected automatically.
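  • The combined match across the three critical fields can be reproduced with the sketch below. It assumes a simple position-by-position comparison after removing spaces and punctuation, which is only a stand-in for the “standard fuzzy matching procedure” mentioned in the text; the function names and the dictionary keys are hypothetical.

```python
import re


def _essential(text: str) -> str:
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^0-9A-Za-z]", "", text).lower()


def history_match(captured: dict, history: dict) -> float:
    """Fraction of history characters matched by the captured field values."""
    matched = total = 0
    for field, known in history.items():
        a = _essential(captured.get(field, ""))
        b = _essential(known)
        total += len(b)
        matched += sum(x == y for x, y in zip(a, b))
    return matched / total if total else 0.0


def apply_history(captured: dict, history: dict, threshold: float = 0.90) -> dict:
    """Replace captured values with the history values if the match is good enough."""
    if history_match(captured, history) >= threshold:
        return dict(history)
    return dict(captured)


captured = {"account": "4388 6755 5555 5551",
            "biller": "Chasc Card Servlces",
            "zip": "60094-4014"}
history = {"account": "4388 5755 5555 5551",
           "biller": "Chase Card Services",
           "zip": "60094-4014"}
print(history_match(captured, history))  # 39 of 42 characters match
```

  • With the 90% threshold from the text, `apply_history(captured, history)` returns the values from the payment history, which corrects the Account Number and the Biller's Name.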
  • 2.11 Using Biller's Database
  • Available Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate payee information and other critical fields on CCBs, see [1].
  • III. Section 3 Capturing Balance Due from Credit Cards Bills
  • 3.1 Keyword-Based Search
  • The Balance Due field has a unique set of keyword phrases which make it possible to identify the field's location on about 90% of CCBs. On the remaining 10%, the keyword cannot be found due to some combination of poor image quality, usage of small font, inverted text etc.
  • The most frequent keyword phrases are:
      • Balance Due
      • New Balance
      • New Balance Total
      • Outstanding Balance
      • Balance
      • Total Balance
      • Previous balance
      • Current Balance
      • Balance At Billing
  • These and other keywords could be printed in a single text line or in two adjacent lines (except for single-word ones).
  • The example on FIG. 7 shows a VISA Mileage Plus credit card bill with the “New Balance” keyword 701.
  • Keywords are searched for in the OCR result using a Fuzzy Matching technique. For example, if the OCR result contains “Bajance Due”, then the “Balance Due” keyword will be found with a confidence of 900 (out of 1000 max) because 9 out of 10 non-space characters are the same as in “Balance Due”.
  • 3.2 Format-Based Search
  • The Balance Due field has the so-called “DollarAmount” format, which is one of the pre-defined data formats explained in [1]. The data format is used by the Bill Pay system in combination with the keyword-based search 3.1 to further narrow down the set of candidates for the field.
  • The example on FIG. 7 shows a VISA Mileage Plus credit card bill with the BalanceDue data 702 adjacent to keyword 701. Other instances of data with the “DollarAmount” format are shown at 703.
  • Each location of data found in proximity to keywords found in 3.1 is assigned a format-based confidence, which reflects how closely the data in the found location matches the expected format (in this case, “DollarAmount”).
  • 3.3 Cross-Validation Against Codeline
  • On CCBs, Balance Due is always included in the Codeline. This allows the use of a cross-validation technique similar to the one explained in 1.3 for the Account Number field.
  • The example on FIG. 7 shows a VISA Mileage Plus credit card bill with the BalanceDue data 702 cross-validated using Codeline substring 704.
  • 3.4 Usage of the Largest Amount
  • If the regular keyword-based search (see 3.1) doesn't yield results, the system can use the largest of all amounts included in the bill and found by 3.2, as long as it can be validated against the codeline.
  • The example on FIG. 7 shows a VISA Mileage Plus credit card bill on which the BalanceDue data 702 is the largest of the 3 “DollarAmount” fields (702 and 703). It can also be cross-validated using Codeline substring 704.
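  • The fallback described in 3.4 can be sketched as follows. The text does not say how the amount is encoded in the codeline, so the sketch assumes the amount appears there in cents as a plain digit string; this encoding, the `AMOUNT` pattern and the function name `largest_validated_amount` are all assumptions.

```python
import re
from decimal import Decimal

# Dollar amounts such as "$1,250.00" or "250.00"
AMOUNT = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})*|\d+)\.(\d{2})\b")


def largest_validated_amount(ocr_text: str, codeline: str):
    """Return the largest positive amount whose cents value occurs in the codeline."""
    amounts = set()
    for m in AMOUNT.finditer(ocr_text):
        amounts.add(Decimal(m.group(1).replace(",", "") + "." + m.group(2)))
    for amount in sorted(amounts, reverse=True):
        if amount <= 0:
            continue
        cents = str(int(amount * 100))   # assumed codeline encoding
        if cents in codeline:
            return amount
    return None
```

  • A plain substring test like this can produce false positives for small amounts, so a production system would presumably check the expected position of the amount inside the codeline as well.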
  • 3.6. Confidence Score
  • Each result found by 3.1-3.4 is assigned a confidence score which reflects how confident the system is that it found the correct field result. A weighted linear combination of the following factors is used:
      • the keyword finding confidence, see 3.1
      • the format-based confidence, see 3.2
      • the score reflecting geometrical alignment between the keyword found in 3.1 and the data found by 3.2. For example, if the data is located immediately below the keyword (with no characters in between) like 701 and 702 on FIG. 7, or the data is located immediately to the right of the keyword (again with no characters in between), the score reaches its maximum value of 1000. Deviations from such alignment or the presence of characters in between result in lower scores.
      • if field's data could be found in the codeline, the confidence gets an additional boost depending on how well the data and codeline match.
      • the confidence is additionally boosted for larger amounts and penalized for smaller ones.
  • The weight of each individual factor in the overall field confidence score is established experimentally.
  • IV. Section 4 Capturing Biller's Name from Credit Cards Bills
  • 4.1 Using Keywords
  • The Biller's name is often indicated on a bill by certain keyword phrases, the most frequent of which are:
      • Make check payable to
      • Make payment to
      • Made payable to
      • Remit Payments to
      • Check made payable to
      • Check or money order payable to
  • On FIG. 8, Biller's Name is pointed to by the keyword phrase “Make Check Payable To” (801)
  • 4.2 Finding Field Adjacent to Keyword
  • Once one or more such keywords have been found, the system will try to find the actual biller's name in the proximity of the keyword using the following sequence of attempts:
  • 1. Text immediately to the right of found keyword(s). Stop if text is found, otherwise proceed to #2
  • 2. Text immediately below found keyword(s). Stop if text is found, otherwise proceed to #3
  • 3. Check if the Payee block is located below the keyword. If yes, take its topmost line.
  • On FIG. 8, the Biller's Name “Citi Cards” 802 is located immediately to the right of keyword 801
  • 4.3 Cross-Correlation Against the “Recipient” Field in the Payee Address
  • On a large portion of CCBs the biller name is also included in the “recipient” field (the topmost line of the address block) of the Payee address block. Therefore, the system will use the Payee's “recipient” field to cross-correlate (via Fuzzy Matching) with the various biller's name alternatives found in 4.2 to choose the best candidate.
  • On FIG. 8, the Biller's Name “Citi Cards” 802 closely correlates with (or is identical to, if no OCR errors were made) Payee “recipient” field 803
  • 4.4. Using “Stop Words” to Limit the Field
  • When the Biller's name is found according to 4.1-4.2, unrelated text is sometimes appended to it because it is printed to the right of the actual Biller's name. To identify and remove unrelated text, the system uses a set of so-called “stop-words”, which are commonly used to give the Payor some additional instruction related to paying the bill.
  • The list of commonly used “stop-words” includes (but is not limited to) the following phrases:
      • remit to
      • please do not
      • please return
      • and return
      • and indicate
      • do not send cash
      • in US funds
      • and include
      • and mail with
      • and mail to
      • please write
      • please make
      • please include
      • and write your
      • thank you
      • in US dollars
      • and this payment
      • in the enclosed
      • and write account
      • detach and exclude
      • payment due date
  • Consider the following example of using “stop-words”. On FIG. 8, the Biller's Name found according to 4.1-4.2 to the right of the found keyword 804 is 805 “Beaumont Hospital - DO NOT SEND CASH”. The analysis then finds a “stop-word” 806 (“DO NOT SEND CASH”) and truncates the field result 805 before the “stop-word”, hence producing the correct result 806 (“Beaumont Hospital”).
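  • The stop-word truncation can be sketched as shown below, using the phrases listed above. The sketch uses exact, case-insensitive substring search rather than fuzzy matching, assumes ASCII text so that lowercasing preserves character positions, and strips trailing separators after the cut. `STOP_WORDS` as a Python list and `truncate_at_stop_word` are hypothetical names.

```python
STOP_WORDS = [
    "remit to", "please do not", "please return", "and return",
    "and indicate", "do not send cash", "in us funds", "and include",
    "and mail with", "and mail to", "please write", "please make",
    "please include", "and write your", "thank you", "in us dollars",
    "and this payment", "in the enclosed", "and write account",
    "detach and exclude", "payment due date",
]


def truncate_at_stop_word(text: str) -> str:
    """Cut the biller name candidate before the earliest stop-word, if any."""
    lowered = text.lower()
    cut = len(text)
    for phrase in STOP_WORDS:
        idx = lowered.find(phrase)
        if idx != -1:
            cut = min(cut, idx)
    # Drop separators (spaces, hyphens, dashes, punctuation) left at the end
    return text[:cut].rstrip(" -\u2013\u2014,.;:")
```

  • Applied to the candidate from the example, “Beaumont Hospital - DO NOT SEND CASH”, this returns “Beaumont Hospital”; a candidate without any stop-word is returned unchanged apart from trailing separators.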
  • 4.5. Confidence Score
  • Each result found by 4.1-4.4 is assigned a confidence score which reflects how confident the system is that it found the correct field result. A weighted linear combination of the following factors is used:
      • the keyword finding confidence, see 4.1
      • the geometrical relationship between keyword found in 4.1 and data found by 4.2
      • cross-correlation to the Payee “recipient” field, established by 4.3
  • The weight of each individual factor is established experimentally.
  • 4.6 Using Biller's Database
  • All candidates for the biller's name captured from the bill according to 4.1-4.4 are cross-correlated against all biller names located at the captured biller's Zip-code (see Section 2). If one of the entries in the Biller's db produces a high match confidence against one of the results of 4.1-4.4, the latter will be chosen as the correct biller's name, see [1]. The matching threshold is configurable. If no Biller db entry matches the field results found by 4.1-4.3, the result with the highest confidence score (see 4.5) will be used.
  • Section 5 Capturing AccountNumber on Major Credit Cards
  • There is a set of rules applicable to all Major Credit Cards (MCC for short) which help to increase recognition accuracy on such bills.
  • 5.1 Limitations of AccountNumber's Leading Digits
  • MCCs (Visa, MasterCard, AmEx, Diners, and Discover) have well-defined account number formats. Their account numbers start with a particular digit or a narrow range of digits (for example, Visa always starts with ‘4’, MasterCard with “51”-“55”, and so on). This limitation translates into narrowing the AccountNumber's format, see 1.2.
  • 5.2 Limitations of AccountNumber's Length
  • AccountNumber's length is also restricted. The length depends on the credit card; 16 digits is the most common case. This limitation also translates into narrowing the AccountNumber's format, see 1.2.
  • 5.3 Mod 10 Rule (LUHN Formula)
  • The Account Number field on MCCs satisfies the LUHN Formula (Mod 10) rule, which is included below for reference.
  • The following steps are required to validate the account number on MCCs:
  • Step 1: Double the value of alternate digits of the account number beginning with the second digit from the right (the first right-hand digit is the check digit).
  • Step 2: Add the individual digits comprising the products obtained in Step 1 to each of the unaffected digits in the original number.
  • Step 3: The total obtained in Step 2 must be a number ending in zero (30, 40, 50, etc.) for the account number to be validated.
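Steps 1-3 above translate directly into a short routine. The sketch below is one common way to write the Mod 10 check; the function name `luhn_valid` and the decision to ignore spaces and hyphens in the input are choices made for this example.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn (Mod 10) check."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the right; index 0 is the check digit and is not doubled.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2          # Step 1: double every second digit.
            if digit > 9:
                digit -= 9      # Step 2: same as adding the product's two digits.
        total += digit
    return total % 10 == 0      # Step 3: the total must end in zero.


print(luhn_valid("4111 1111 1111 1111"))
```

Subtracting 9 from a doubled value above 9 gives the same result as adding its two digits (for example, 16 becomes 1 + 6 = 7).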
  • 5.4 Detection of Account Number Entirely by Codeline
  • Once it has been established that the bill is issued by a major credit card, the system can in most cases find the field directly in the Codeline without the need to perform OCR. This becomes possible if a single substring in the codeline satisfies all restrictions 5.1-5.3.
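A possible reading of 5.4 is a sliding search over the codeline that keeps only substrings meeting the leading-digit, length, and Mod 10 restrictions, and accepts the result only when exactly one substring survives. The sketch below assumes a purely illustrative format table covering just the Visa and MasterCard prefixes named in 5.1 with 16 digits; the names `FORMATS` and `account_from_codeline` are hypothetical.

```python
import re


def luhn_valid(digits):
    """Luhn (Mod 10) check on a string of ASCII digits."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        value = int(ch)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


# Illustrative formats only: prefix and length rules from 5.1 and 5.2.
FORMATS = [
    re.compile(r"4\d{15}"),       # Visa: starts with 4, 16 digits
    re.compile(r"5[1-5]\d{14}"),  # MasterCard: starts with 51-55, 16 digits
]


def account_from_codeline(codeline):
    """Return the account number if exactly one substring qualifies."""
    hits = set()
    for start in range(len(codeline)):
        for fmt in FORMATS:
            match = fmt.match(codeline, start)
            if match and luhn_valid(match.group()):
                hits.add(match.group())
    # An ambiguous or empty result means OCR is still needed.
    return hits.pop() if len(hits) == 1 else None


print(account_from_codeline("4111111111111111000012550"))
```

Returning `None` when more than one substring qualifies mirrors the "single substring" condition in the text.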
  • II. Capturing Biller's Address from Credit Card Bills
  • Using Text Blocks to Find Possible Addresses
  • Usually, addresses are printed as left-, right- or center-justified text blocks isolated from the rest of the document text by significant white margins. Based on this information, the system may detect potential address locations on a document by building a text block structure.
  • Filtering the Text Block by the City/State/ZIP Line
  • In most US addresses, the bottommost line contains City/State/ZIP information. The system can utilize this knowledge by filtering out the text blocks found in 2.1 which do not have enough alphas (to represent City and State), do not contain any valid state (which is usually abbreviated to 2 characters) and/or do not contain enough digits at the end to represent a Zip-code.
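The filter described above can be approximated with a regular expression applied to the bottommost line of each text block. This is only a sketch: the pattern, the helper names, and the representation of a block as a list of text lines are assumptions, and a production system would also have to tolerate OCR errors in the line.

```python
import re

# Two-letter abbreviations for the 50 US states plus DC.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN MS "
    "MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA WA WV "
    "WI WY DC".split()
)

# City (letters), optional comma, 2-letter state, 5-digit ZIP with optional +4.
CITY_STATE_ZIP = re.compile(
    r"^(?P<city>[A-Za-z][A-Za-z .'-]+?),?\s+"
    r"(?P<state>[A-Za-z]{2})\s+"
    r"(?P<zip>\d{5})(?:-?\d{4})?$"
)


def looks_like_city_state_zip(line):
    """True if the line has a city, a valid state and a ZIP code."""
    match = CITY_STATE_ZIP.match(line.strip())
    return bool(match) and match.group("state").upper() in US_STATES


def filter_address_blocks(blocks):
    """Keep only blocks (lists of lines) whose last line passes the check."""
    return [b for b in blocks if b and looks_like_city_state_zip(b[-1])]


blocks = [
    ["Beaumont Hospital", "PO Box 5042", "Royal Oak MI 48073-1234"],
    ["Total Amount Due 125.50"],
]
print(filter_address_blocks(blocks))
```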
  • Using Postal Database and Fuzzy Matching to Interpret Addresses
  • Once address candidates are selected using 2.1 and 2.2, the system can build the entire address block starting with City/State/ZIP at the bottom line and including 1-3 upper lines as potential Name and Street Address components. Since the exact format of the address is not well-defined (it may have 1-4 lines, with or without names, with or without a POBOX, etc.), the system has to make multiple interpretation attempts to achieve a satisfactory interpretation of the entire text block.
  • In order to compare OCR results with the data included in the Postal db, a Fuzzy Matching mechanism is used. For example, if OCR reads “San Diego” as “San Dicgo” (‘c’ and ‘e’ are often misrecognized), Fuzzy Matching will produce a matching confidence above 80% between the two, which is sufficient to achieve the correct interpretation of the OCR result.
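The “San Diego” example can be reproduced with a generic string-similarity measure. The sketch below uses Python's `difflib.SequenceMatcher` purely as a stand-in, since the source does not specify the Fuzzy Matching algorithm; the function name `best_postal_match` and the 0.8 cut-off are assumptions taken from the 80% figure in the example.

```python
from difflib import SequenceMatcher


def best_postal_match(ocr_text, known_names, min_confidence=0.8):
    """Return (name, confidence) for the closest known name, or None."""
    best_name, best_conf = None, 0.0
    for name in known_names:
        conf = SequenceMatcher(None, ocr_text.lower(), name.lower()).ratio()
        if conf > best_conf:
            best_name, best_conf = name, conf
    if best_name is None or best_conf < min_confidence:
        return None
    return best_name, best_conf


# "San Dicgo" is an OCR misread of "San Diego" ('c' read in place of 'e').
print(best_postal_match("San Dicgo", ["San Diego", "San Jose", "Santa Ana"]))
```

With this measure the misread city scores roughly 0.89 against “San Diego”, above the 80% level mentioned in the text, while the other candidate cities score well below it.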
  • Using Postal Database to Correct Addresses
  • After the interpretation of the address block has been achieved, the individual components are corrected to become identical to those included in the Postal database.
  • Computation of the Address Confidence
  • The system will assign a confidence value on a scale from 0 to 1000 to each address found above. Such confidences could be assigned overall for the entire address block or individually to each address component (recipient name, street number, apartment number, street name, POBOX number, City, State and ZIP). Larger values indicate that the system is more confident that it found, read and interpreted the address correctly.
  • Identification of Payee Vs. Payor
  • After one or more address blocks have been captured, the system must determine which one is the Payee and which is the Payor. The following factors help in such determination:
  • Presence of POBOX (it is much more likely to be a Payee than Payor if POBOX is printed).
  • Location within the document (e.g. Payee is somewhat more likely to be printed at the bottom, especially in the right/bottom corner)
  • Inclusion of certain words in the Recipient name item (some words like “Corporation” indicate Payee)
  • Inclusion of frequent names in the Recipient name item (e.g. “John” is more likely to indicate Payor than Payee)
  • Adjacency to Postal barcodes (if more than 1 block competes for either Payee or Payor, the one adjacent to a barcode wins).
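One way to combine the factors listed above is a simple additive score per address block, with barcode adjacency used as a tie-breaker. Everything numeric in this sketch is a placeholder: the weights, the word lists beyond “Corporation” and “John”, and the `AddressBlock` fields are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class AddressBlock:
    recipient: str
    has_pobox: bool
    in_bottom_right: bool
    near_barcode: bool


# "corporation" and "john" come from the text; the rest are hypothetical.
CORPORATE_WORDS = {"corporation", "corp", "inc", "llc"}
COMMON_FIRST_NAMES = {"john", "mary", "james"}


def payee_score(block):
    """Higher scores mean the block looks more like a Payee address."""
    score = 0.0
    if block.has_pobox:
        score += 2.0   # a POBOX strongly suggests a Payee
    if block.in_bottom_right:
        score += 0.5   # location is only a weak hint
    words = {w.strip(".,").lower() for w in block.recipient.split()}
    if words & CORPORATE_WORDS:
        score += 1.5
    if words & COMMON_FIRST_NAMES:
        score -= 1.5
    return score


def split_payee_payor(blocks):
    """Return (payee, payor); barcode adjacency breaks ties. Needs 2+ blocks."""
    payee = max(blocks, key=lambda b: (payee_score(b), b.near_barcode))
    rest = [b for b in blocks if b is not payee]
    payor = min(rest, key=lambda b: (payee_score(b), not b.near_barcode))
    return payee, payor


payee, payor = split_payee_payor([
    AddressBlock("John Smith", False, False, True),
    AddressBlock("Acme Corporation", True, True, False),
])
print(payee.recipient, "|", payor.recipient)
```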
  • Using Postal Barcode Reader
  • Postal barcodes are often printed on bills and they help to improve accuracy of address capture.
  • The system uses Postal barcodes for 3 purposes:
  • To help in Payee vs. Payor identification (see 2.6)
  • To correct ZIP-code
  • To better detect address candidates
  • Using Payor Hint
  • The Payor hint contains information about the Payor (i.e. the bill recipient). The system can use such information for Payee vs. Payor identification (see also 2.6).
  • Using Payee Hint
  • The Payee hint contains information about existing billers in the account. The system can use such information to significantly increase the accuracy of capturing critical fields. Depending on which and how many critical field values were included in the hint, the field capture error may be reduced by 20-98% for the pre-existing (i.e. repeating) billers.
  • Using Biller's Database
  • Available Biller's databases contain information about thousands of billers, including their postal addresses, names and account number formats. This information makes it possible to cross-validate payee information and other critical fields on CCBs.
  • III. Capturing Balance Due from Credit Cards Bills
  • Keyword-Based Search
  • The Balance Due field has a unique set of keywords which allow us to identify the field's location on about 90% of CCBs. In the remaining 10% the keyword cannot be found due to some combination of poor image quality, usage of a small font, inverted text, etc.
  • Cross Validation Against Codeline
  • On CCBs, Balance Due is always included in the Codeline. This allows us to use a cross-validation technique by comparing content from two different fields that should be identical.
  • Usage of the Largest Amount
  • If the regular keyword-based search (see 3.1) does not yield results, the system can use the largest of all amounts included in the bill as long as it can be validated against the codeline.
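The fallback described above can be sketched as: collect every amount read from the bill, then walk from largest to smallest and accept the first one that also appears in the codeline. The sketch assumes, hypothetically, that the codeline encodes the amount in cents with no decimal point, and it uses a naive substring test that a real system would replace with a position-aware comparison; `largest_validated_amount` is an illustrative name.

```python
import re

# Dollar amounts such as 25.00, $1,234.56 or 1234.56 in OCR text.
AMOUNT = re.compile(r"\$?\s*(\d{1,3}(?:,\d{3})*|\d+)\.(\d{2})")


def largest_validated_amount(ocr_text, codeline):
    """Return the largest amount (in cents) that also occurs in the codeline."""
    cents = set()
    for match in AMOUNT.finditer(ocr_text):
        whole = match.group(1).replace(",", "")
        cents.add(int(whole) * 100 + int(match.group(2)))
    for value in sorted(cents, reverse=True):
        # Assumption: the codeline stores amounts as cents, digits only.
        if str(value) in codeline:
            return value
    return None


text = "Minimum Payment $25.00 New Balance $1,234.56 Credit Limit $5,000.00"
codeline = "411111111111111100001234560000002500"
print(largest_validated_amount(text, codeline))
```

In this example the credit limit is the largest amount on the bill but is absent from the codeline, so the balance is returned instead.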
  • IV. Capturing Biller Name from Credit Cards Bills
  • Using the “Recipient” Field in the Payee Address
  • On a large portion of CCBs the biller name is included in the “recipient” field of the Payee Address. Therefore the system will use the “recipient” field as a candidate for the biller's name.
  • Using Keywords
  • The Biller's name is often indicated on a bill by certain keywords, like “Pay to”, “Make your check payable to”, etc. Once one or more such keywords are found, the system will try to find the actual biller's name in the proximity of the keyword using the following sequence of attempts:
      • 1. Text immediately to the right of found keyword(s). Stop if text is found, otherwise proceed to #2.
      • 2. Text immediately below found keyword(s). Stop if text is found, otherwise proceed to #3.
      • 3. Check if the Payee block is located below the keyword. If yes, take its topmost line.
  • The system will use the text found in 1-3 above as another candidate for the biller's name in addition to the one presented in the section immediately above.
  • Using Biller's Database
  • All candidates for the biller's name are cross-correlated against all possible billers located at the captured biller's Zip code (see Section 2). The entry in the Biller's db with the highest match confidence will be chosen as the correct biller.
  • V. Capturing Account Number on Major Credit Cards
  • There is a set of rules applicable to all Major Credit Cards (MCC for short) which help to increase recognition accuracy on such bills.
  • Limitations on Character Set
  • The Account Number field in all MCCs is purely numeric (unlike, say, insurance bills, which may include alphas).
  • Limitations of Account Number's Leading Digits
  • MCCs (Visa, MasterCard, AmEx, Diners, and Discover) have well-defined account number formats. Their account numbers start with a particular digit or a narrow range of digits (for example, Visa always starts with ‘4’, MasterCard with “51”-“55”, and so on).
  • Limitations of Account Number's Length
  • Account number length is also restricted. The length depends on the credit card; 16 digits is the most common case.
  • Mod 10 Rule (LUHN Formula)
  • The Account Number field on MCCs satisfies the LUHN Formula (Mod 10) rule, which is included below for reference.
  • The following steps are required to validate the account number:
  • Step 1: Double the value of alternate digits of the account number beginning with the second digit from the right (the first right-hand digit is the check digit).
  • Step 2: Add the individual digits comprising the products obtained in Step 1 to each of the unaffected digits in the original number.
  • Step 3: The total obtained in Step 2 must be a number ending in zero (30, 40, 50, etc.) for the account number to be validated.
  • Detection of Account Number Entirely by Codeline
  • Once it has been established that the bill is issued by a major credit card, the system can in most cases find the field directly in the Codeline without the need to perform OCR.
  • VI. Overall Flowchart of Capturing Critical Fields from Credit Card Bills
  • Description of Overall Flowchart
  • FIG. 8 is a flowchart of a method of capturing critical fields in a credit card bill, in accordance with embodiments of the invention. In one embodiment, the steps for capturing critical fields include:
  • 100—Input binary image. It is created as a result of processing the mobile color JPG-image of the bill as described in Patent [1].
  • 150—Codeline recognition.
  • 200—ASCII string representing result of 150
  • 250—Postal Barcode recognition
  • 300—Postal Barcodes
  • 350—Applying Dynamic Capture module [1] to find alternatives for Account Number and Balance Due based on the field's keywords (such as “Account Number”, “Balance Due” etc) and field's format
  • 400—Final Account Number and Balance Due results based on cross-validation against codeline 200
  • 450—Postal database
  • 450—Address recognition and validation using Postal db 450
  • 500—Payee and Payor address block as a result of 450
  • 550—Identification of Payee and Payor
  • 600—Payee block
  • 650—Finding candidates for Biller Name using keywords and payee
  • 700—Biller name candidates
  • 750—Biller database
  • 800—Validation of candidates 700 against database 750
  • 850—Final Biller name
  • FIG. 9 is one embodiment of a network and system upon which the methods described herein may be implemented, including a capture device 702 which captures an image 704 of a credit card bill, then transmits it over a network 706 to a server 708 for processing. In one embodiment, the capture device 702 also performs one or more of the processing steps described herein in addition to, or instead of, the server 708.
  • FIG. 10 is an embodiment of a computer, processor and memory upon which a mobile device, server or other computing device may be implemented to carry out the methods described herein. The server 708 may include a power supply 902, processor 904, network interface module 906, memory 908 and a CCB recognition module 910 for performing the specific credit card bill recognition and identification steps described herein.
  • While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not of limitation. The breadth and scope should not be limited by any of the above-described exemplary embodiments. Where this document refers to technologies that would be apparent or known to one of ordinary skill in the art, such technologies encompass those apparent or known to the skilled artisan now or at any time in the future. In addition, the described embodiments are not restricted to the illustrated example architectures or configurations, but the desired features can be implemented using a variety of alternative architectures and configurations. As will become apparent to one of ordinary skill in the art after reading this document, the illustrated embodiments and their various alternatives can be implemented without confinement to the illustrated example. One of ordinary skill in the art would also understand how alternative functional, logical or physical partitioning and configurations could be utilized to implement the desired features of the described embodiments.
  • Furthermore, although items, elements or components may be described or claimed in the singular, the plural is contemplated to be within the scope thereof unless limitation to the singular is explicitly stated. The presence of broadening words and phrases such as “one or more,” “at least,” “but not limited to” or other like phrases in some instances shall not be read to mean that the narrower case is intended or required in instances where such broadening phrases may be absent.

Claims (1)

What is claimed is:
1. A computer readable medium containing instructions which, when executed by a computer, perform a process comprising:
receiving an input image of a credit card bill;
performing a codeline recognition process;
capturing a postal service barcode;
extracting an account number and balance due from the image using an optical character recognition process;
obtaining an address of a payor and payee from the image;
identifying the payor and payee based on a comparison of the obtained addresses with a database of payor and payee addresses;
performing a keyword-based capture to obtain a biller name;
confirming a biller name; and
outputting bill data from the credit card bill to a user.
US14/217,241 2013-03-15 2014-03-17 Systems and methods for capturing critical fields from a mobile image of a credit card bill Abandoned US20140279323A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/217,241 US20140279323A1 (en) 2013-03-15 2014-03-17 Systems and methods for capturing critical fields from a mobile image of a credit card bill
US15/338,203 US10509958B2 (en) 2013-03-15 2016-10-28 Systems and methods for capturing critical fields from a mobile image of a credit card bill

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361802069P 2013-03-15 2013-03-15
US14/217,241 US20140279323A1 (en) 2013-03-15 2014-03-17 Systems and methods for capturing critical fields from a mobile image of a credit card bill

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/338,203 Continuation US10509958B2 (en) 2013-03-15 2016-10-28 Systems and methods for capturing critical fields from a mobile image of a credit card bill

Publications (1)

Publication Number Publication Date
US20140279323A1 (en) 2014-09-18

Family

ID=51532529

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/217,241 Abandoned US20140279323A1 (en) 2013-03-15 2014-03-17 Systems and methods for capturing critical fields from a mobile image of a credit card bill
US15/338,203 Active 2034-05-07 US10509958B2 (en) 2013-03-15 2016-10-28 Systems and methods for capturing critical fields from a mobile image of a credit card bill

Country Status (1)

Country Link
US (2) US20140279323A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9298997B1 (en) * 2014-03-19 2016-03-29 Amazon Technologies, Inc. Signature-guided character recognition
CN108960058A (en) * 2018-05-31 2018-12-07 平安科技(深圳)有限公司 Invoice method of calibration, device, computer equipment and storage medium
CN110991456A (en) * 2019-12-05 2020-04-10 北京百度网讯科技有限公司 Bill identification method and device
CN111931784A (en) * 2020-09-17 2020-11-13 深圳壹账通智能科技有限公司 Bill recognition method, system, computer device and computer-readable storage medium
US11087409B1 (en) 2016-01-29 2021-08-10 Ocrolus, LLC Systems and methods for generating accurate transaction data and manipulation

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11100571B1 (en) * 2014-06-10 2021-08-24 Wells Fargo Bank, N.A. Systems and methods for payee identification via camera
CN111325556A (en) * 2020-02-18 2020-06-23 中国银联股份有限公司 Information processing method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6282326B1 (en) * 1998-12-14 2001-08-28 Eastman Kodak Company Artifact removal technique for skew corrected images
US20080249936A1 (en) * 2007-04-04 2008-10-09 Devin Miller Bill paying systems and associated methods
US20130051610A1 (en) * 2008-01-18 2013-02-28 Mitek Systems Systems and methods for obtaining financial offers using mobile image capture
US20130120595A1 (en) * 2008-01-18 2013-05-16 Mitek Systems Systems for Mobile Image Capture and Remittance Processing of Documents on a Mobile Device
US20130148862A1 (en) * 2008-01-18 2013-06-13 Mitek Systems Systems and methods for obtaining financial offers using mobile image capture

Family Cites Families (89)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5326959A (en) 1992-08-04 1994-07-05 Perazza Justin J Automated customer initiated entry remittance processing system
US5920847A (en) * 1993-11-01 1999-07-06 Visa International Service Association Electronic bill pay system
US20020023055A1 (en) * 1996-03-01 2002-02-21 Antognini Walter Gerard System and method for digital bill presentment and payment
US5761686A (en) 1996-06-27 1998-06-02 Xerox Corporation Embedding encoded information in an iconic version of a text image
US6070150A (en) * 1996-10-18 2000-05-30 Microsoft Corporation Electronic bill presentment and payment system
US6968319B1 (en) * 1996-10-18 2005-11-22 Microsoft Corporation Electronic bill presentment and payment system with bill dispute capabilities
US7653600B2 (en) * 1997-05-30 2010-01-26 Capital Security Systems, Inc. Automated document cashing system
US6012048A (en) * 1997-05-30 2000-01-04 Capital Security Systems, Inc. Automated banking system for dispensing money orders, wire transfer and bill payment
US6038553A (en) 1997-09-19 2000-03-14 Affiliated Computer Services, Inc. Self service method of and system for cashing checks
US6038351A (en) 1997-10-28 2000-03-14 Cash Management Solutions Apparatus and method for multi-entity, mixed document environment document identification and processing
US6735341B1 (en) 1998-06-18 2004-05-11 Minolta Co., Ltd. Image processing device and method and recording medium for recording image processing program for same
US7377425B1 (en) 1999-11-30 2008-05-27 Diebold Self-Service Systems Division Of Diebold, Incorporated Method and system of evaluating checks deposited into a cash dispensing automated banking machine
AU2001243319A1 (en) * 2000-02-28 2001-09-12 Sprarkcharge, Inc. System, and method for prepaid anonymous and pseudonymous credit card type transactions
JP2001326847A (en) 2000-05-16 2001-11-22 Fuji Photo Film Co Ltd Image pickup device
JP4631133B2 (en) 2000-06-09 2011-02-16 コニカミノルタビジネステクノロジーズ株式会社 Apparatus, method and recording medium for character recognition processing
US7313289B2 (en) 2000-08-30 2007-12-25 Ricoh Company, Ltd. Image processing method and apparatus and computer-readable storage medium using improved distortion correction
US6754640B2 (en) * 2000-10-30 2004-06-22 William O. Bozeman Universal positive pay match, authentication, authorization, settlement and clearing system
US6993507B2 (en) * 2000-12-14 2006-01-31 Pacific Payment Systems, Inc. Bar coded bill payment system and method
US6433706B1 (en) * 2000-12-26 2002-08-13 Anderson, Iii Philip M. License plate surveillance system
CA2354372A1 (en) * 2001-02-23 2002-08-23 Efunds Corporation Electronic payment and authentication system with debit and identification data verification and electronic check capabilities
TW493143B (en) 2001-03-02 2002-07-01 Ulead Systems Inc Correction for perspective distortion image and method for artificial perspective distortion image
US20020143804A1 (en) * 2001-04-02 2002-10-03 Dowdy Jacklyn M. Electronic filer
AU2002328129A1 (en) 2001-06-22 2003-01-08 Emblaze Systems, Ltd. Mms system and method with protocol conversion suitable for mobile/portable handset display
FI113132B (en) 2001-06-28 2004-02-27 Nokia Corp Method and apparatus for improving an image
US7331523B2 (en) 2001-07-13 2008-02-19 Hand Held Products, Inc. Adaptive optical image reader
US6922487B2 (en) 2001-11-02 2005-07-26 Xerox Corporation Method and apparatus for capturing text images
US6985631B2 (en) 2002-02-20 2006-01-10 Hewlett-Packard Development Company, L.P. Systems and methods for automatically detecting a corner in a digitally captured image
US7295694B2 (en) 2002-02-22 2007-11-13 International Business Machines Corporation MICR-based optical character recognition system and method
US7020320B2 (en) 2002-03-06 2006-03-28 Parascript, Llc Extracting text written on a check
US20050141028A1 (en) * 2002-04-19 2005-06-30 Toshiba Corporation And Toshiba Tec Kabushiki Kaisha Document management system for automating operations performed on documents in data storage areas
KR20040076131A (en) 2003-02-24 2004-08-31 주식회사 한국인식기술 Robbery check confirmation method to use mobile
US7167580B2 (en) 2003-04-30 2007-01-23 Unisys Corporation Image quality assurance systems and methodologies for improving the identification of and access speed to image quality suspects in documents
US20050065893A1 (en) 2003-09-19 2005-03-24 The Alliance Group Of Texas System and Method for Commingled Remittance Payment Processing
JP2005108230A (en) 2003-09-25 2005-04-21 Ricoh Co Ltd Audio / video content recognition / processing function built-in printing system
WO2005041123A1 (en) 2003-10-24 2005-05-06 Fujitsu Limited Image distortion correcting program, image distortion correcting device and imag distortion correcting method
US20050097046A1 (en) 2003-10-30 2005-05-05 Singfield Joy S. Wireless electronic check deposit scanning and cashing machine with web-based online account cash management computer application system
US7707039B2 (en) 2004-02-15 2010-04-27 Exbiblio B.V. Automatic modification of web pages
US7593600B2 (en) 2004-03-04 2009-09-22 International Business Machines Corporation Black white image scaling for optical character recognition
US7707218B2 (en) 2004-04-16 2010-04-27 Mobot, Inc. Mobile query system and method based on visual cues
US7593595B2 (en) 2004-08-26 2009-09-22 Compulink Management Center, Inc. Photographic document imaging system
KR20060050746A (en) * 2004-08-31 2006-05-19 엘지전자 주식회사 How to process document images taken with the camera
US7689037B2 (en) * 2004-10-22 2010-03-30 Xerox Corporation System and method for identifying and labeling fields of text associated with scanned business documents
WO2006075967A1 (en) 2005-01-15 2006-07-20 Jacob Weitman Method for the use of digital cameras and cameraphones
WO2006136958A2 (en) 2005-01-25 2006-12-28 Dspv, Ltd. System and method of improving the legibility and applicability of document pictures using form based image enhancement
US7983468B2 (en) 2005-02-09 2011-07-19 Jp Morgan Chase Bank Method and system for extracting information from documents by document segregation
US20060210192A1 (en) 2005-03-17 2006-09-21 Symagery Microsystems Inc. Automatic perspective distortion detection and correction for document imaging
US20060242063A1 (en) * 2005-04-26 2006-10-26 Peterson David L Remote check deposit
US7360686B2 (en) 2005-05-11 2008-04-22 Jp Morgan Chase Bank Method and system for discovering significant subsets in collection of documents
US7526129B2 (en) 2005-06-23 2009-04-28 Microsoft Corporation Lifting ink annotations from paper
US7558418B2 (en) 2005-08-23 2009-07-07 Goldleaf Enterprise Payments, Inc. Real time image quality analysis and verification
US20070214078A1 (en) * 2005-09-28 2007-09-13 Transpayment, Inc. Bill payment apparatus and method
US7391934B2 (en) 2005-10-05 2008-06-24 Ncr Corporation Method of creating a substitute check using check image data from a remote check image capture device and an apparatus therefor
US20070084911A1 (en) 2005-10-18 2007-04-19 First Data Corporation ATM check invalidation and return systems and methods
US7747495B2 (en) * 2005-10-24 2010-06-29 Capsilon Corporation Business method using the automated processing of paper and unstructured electronic documents
GB0602357D0 (en) 2006-02-06 2006-03-15 First Ondemand Ltd Authentication of cheques and the like
US7330604B2 (en) 2006-03-02 2008-02-12 Compulink Management Center, Inc. Model-based dewarping method and apparatus
US20070288382A1 (en) 2006-05-03 2007-12-13 Avalon International, Inc. Check21 image based document and processing system
US8233714B2 (en) * 2006-08-01 2012-07-31 Abbyy Software Ltd. Method and system for creating flexible structure descriptions
US8164762B2 (en) 2006-09-07 2012-04-24 Xerox Corporation Intelligent text driven document sizing
US8626661B2 (en) * 2006-10-10 2014-01-07 Global Standard Financial, Inc. Electronic lockbox using digitally originated checks
US20080247629A1 (en) * 2006-10-10 2008-10-09 Gilder Clark S Systems and methods for check 21 image replacement document enhancements
US20080183576A1 (en) 2007-01-30 2008-07-31 Sang Hun Kim Mobile service system and method using two-dimensional coupon code
US20090108080A1 (en) * 2007-10-31 2009-04-30 Payscan America, Inc. Bar coded monetary transaction system and method
KR20070115834A (en) 2007-11-12 2007-12-06 주식회사 비즈모델라인 Mobile phones with watermarking (or encryption marking)
US7996317B1 (en) 2007-11-21 2011-08-09 Hsbc Bank Usa, N.A. Methods and systems for processing stranded payments and lockbox payments at the same designated payment location
US9852406B2 (en) * 2012-01-17 2017-12-26 Deluxe Small Business Sales, Inc. System and method for managing financial transactions based on electronic check data
US7978900B2 (en) 2008-01-18 2011-07-12 Mitek Systems, Inc. Systems for mobile image capture and processing of checks
US8379914B2 (en) * 2008-01-18 2013-02-19 Mitek Systems, Inc. Systems and methods for mobile image capture and remittance processing
US20130085935A1 (en) * 2008-01-18 2013-04-04 Mitek Systems Systems and methods for mobile image capture and remittance processing
JP2010140459A (en) 2008-02-22 2010-06-24 Ricoh Co Ltd Program, print data conversion device, and computer-readable recording medium
GB2472179B (en) 2008-05-06 2013-01-30 Compulink Man Ct Inc Camera-based document imaging
US8094918B2 (en) 2008-10-29 2012-01-10 Rdm Corporation Check and other item design for reflectance values determination prior to item manufacture
US8774516B2 (en) * 2009-02-10 2014-07-08 Kofax, Inc. Systems, methods and computer program products for determining document validity
US20100253787A1 (en) * 2009-04-02 2010-10-07 Isaac Grant Method for Object Recognition and Communication of Associated Label and Other Information
US8180137B2 (en) 2010-02-23 2012-05-15 Rdm Corporation Comparison of optical and magnetic character data for identification of character defect type
US9129340B1 (en) * 2010-06-08 2015-09-08 United Services Automobile Association (Usaa) Apparatuses, methods and systems for remote deposit capture with enhanced image detection
US8204805B2 (en) * 2010-10-28 2012-06-19 Intuit Inc. Instant tax return preparation
AU2012202173B2 (en) * 2011-04-18 2013-09-05 Castle Bookkeeping Wizard Pty Ltd System and method for processing a transaction document including one or more financial transaction entries
US9400806B2 (en) * 2011-06-08 2016-07-26 Hewlett-Packard Development Company, L.P. Image triggered transactions
JP5768590B2 (en) * 2011-08-22 2015-08-26 富士通株式会社 Image processing apparatus, image processing method, and program
US20130311362A1 (en) * 2012-04-26 2013-11-21 Mastercard International Incorporated Systems and methods for verifying payee information in electronic payments
US20140188715A1 (en) * 2012-12-31 2014-07-03 Fiserv, Inc. Systems and methods for bill payment with image capture of bill information and funding account
US20140258838A1 (en) * 2013-03-11 2014-09-11 Sap Ag Expense input utilities, systems, and methods
US20150142545A1 (en) * 2013-03-14 2015-05-21 Bill.Com, Inc. Enhanced system and method for offering and accepting discounts on invoices in a payment system
US10417674B2 (en) * 2013-03-14 2019-09-17 Bill.Com, Llc System and method for sharing transaction information by object tracking of inter-entity transactions and news streams
US20150012442A1 (en) * 2013-03-14 2015-01-08 Bill.Com, Inc. Enhanced system and method for scanning and processing of payment documentation
US10115137B2 (en) * 2013-03-14 2018-10-30 Bill.Com, Inc. System and method for enhanced access and control for connecting entities and effecting payments in a commercially oriented entity network
US20140281871A1 (en) * 2013-03-15 2014-09-18 Meditory Llc Method for mapping form fields from an image containing text
US9786011B1 (en) * 2014-09-02 2017-10-10 State Farm Mutual Automobile Insurance Company System and method for using object recognition to facilitate the collection of insurance information

Also Published As

Publication number Publication date
US10509958B2 (en) 2019-12-17
US20170109574A1 (en) 2017-04-20


Legal Events

Date Code Title Description
AS Assignment

Owner name: MITEK SYSTEMS, INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: KLIATSKINE, VITALI; NEPOMNIACHTCHI, GRIGORI; KOTOVICH, NIKOLAY; SIGNING DATES FROM 20170608 TO 20170609; REEL/FRAME: 043016/0153

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION