JP6250307B2 - Image information processing apparatus and image information processing method

Info

Publication number: JP6250307B2
Application number: JP2013117396A
Authority: JP
Inventors: 淳彦山井
Original assignee: Primagest Inc
Current assignee: Primagest Inc
Priority date: 2013-06-03
Filing date: 2013-06-03
Publication date: 2017-12-20
Anticipated expiration: 2033-06-03
Also published as: JP2014235619A

Description

The present invention relates to an image information processing apparatus and an image information processing method that register images and allow the registered images to be searched; for example, an apparatus and method that register forms as images and can search them while flexibly accommodating future specification changes.

With the diversification of business operations in recent years, a wide variety of documents, slips, and forms has come to be handled in many industries. Traditionally, the necessary information on each document was checked visually and keyed in to convert it to character data, and the entered forms were stored so that they could easily be retrieved whenever the input results later needed to be verified (a full check of the entered information).

Later, as recognition technology improved, systems were introduced that read forms with an OCR device or the like, recognize the characters of the necessary information, and convert it to digital information. Whether the digitized information had been recognized correctly was then checked separately against the original form by eye, and any recognition errors were corrected at that point.

Once digitization is complete, the original forms are kept in a warehouse or the like until their retention period expires. Even with this process, the entered forms still had to be stored so that they could easily be retrieved whenever the input results later needed to be verified (a full check of the entered information).

Subsequently, storage capacity and processing speed increased further, making it possible to capture and store the front and back of original forms directly as images. The original forms are therefore captured as images as they are, a partial region of the image is subjected to recognition processing and converted to, for example, character data, and whether the recognition was performed correctly is checked by displaying the captured image. Once a form has been converted to an image in this way, there is no longer any need to refer to the physical form.

In this type of apparatus, when captured image information is registered in a given file, it is desirable to set search keywords so that the image can later be read out, and to make the image searchable by those keywords. Conventionally, however, no system converted forms to image data, registered them as image information, and let multiple processing systems refer to the registered images.

Although they are not systems in which multiple systems search image data, many systems for searching text data (document data) have been proposed. For example, Patent Document 1 states that the keywords are preferably limited to those picked from a dictionary to suit the purpose of the document registration system; however, that technique uses the dictionary data as-is.

Japanese Patent Laid-Open No. 2003-186884

However, keywords inevitably reflect the original purpose, and different purposes call for different keywords. Whenever processing for a different purpose became necessary, the keywords had to be determined and set all over again.

Even when the items being processed are all forms, a different form requires the keyword items to be selected again from scratch, so the processing system had to be redesigned every time the processing target changed.

For example, suppose a search system for images of tax and public-fund payment slips is to be reused for searching account transfer request forms. The keys for searching payment slip images are the bank receipt date, receiving branch, receiving municipality or company name, amount, and so on. The keys for searching account transfer request images, by contrast, are the branch number, account type, and account number (hereinafter abbreviated as the account number), the registration date, the consignor company code, and so on. Because the key items are entirely different, two entirely different search systems are unavoidable.

Thus, in conventional processing systems, each business menu is paired with its own search program, so adding a business operation or adding or changing a form required changes to the system and to the search program. The same is true when image data is registered.

Avoiding this would require identifying in advance every key for every business operation that might be added in the future, which is impossible in practice.

Furthermore, while deposit and withdrawal slips rarely need to be consulted again once they have been processed, bank loan application documents, approval documents, and the like are likely to be consulted repeatedly, and users often want to check the information in a different region of the document on each search. It is therefore desirable that the full text be searchable as the search key for image information.

The present invention has been made to solve the problems described above, and its object is to provide an image information processing apparatus and an image information processing method that require no change to the search program even when the forms or business operations being processed change.

As one means of achieving this object, the invention has, for example, the following configuration. Namely, an image information processing apparatus that stores various document images relating to various business operations comprises: image storage means for storing the document images in a readable manner; character information index means for taking, from the digital information corresponding to the character information shown in a document image registered in the image storage means, the character string of each word of the character information as an index corresponding to that document image; document type index means for assigning a unique document type ID to each type of document image and associating it with the index produced by the index specifying means; and index storage means for storing the index and the document type ID as readout keywords for the document images registered in the image storage means.

Further, for example, the apparatus comprises image readout means for receiving the index and the document type ID stored in the index storage means and reading out the document image, stored in the image storage means, that is identified by all of the received index and document identification ID.

Also, for example, the document image is a form image, and when a document type to be processed is added, this is accommodated by the document type index means assigning a unique document type ID to the newly added document type and the character information index means additionally registering, in the index storage means, the character strings corresponding to the character information shown in the added document as indexes. Further, for example, the character information index means indexes the full text shown in the document to be processed.

Alternatively, an image information processing method in an image information processing apparatus comprising document image storage means for storing various document images relating to various business operations and index storage means for storing index information for the document images stored in the document image storage means comprises: recognizing the character information shown in a document image stored in the document image storage means and extracting all of the per-word character strings as indexes corresponding to that document image; assigning a unique document type ID to each type of document image; associating it with the extracted indexes; and storing the result in the index storage means as index information for the document image.

Further, for example, the method receives the index and the document type ID stored in the index storage means and reads out the document image, stored in the image storage means, that is identified by all of the received index and document identification ID.

Also, for example, the document image is a form image, and when a document type to be processed is added, this is accommodated by assigning a unique document type ID to the newly added document type and additionally registering, in the index storage means, the character strings corresponding to the character information shown in the added document as indexes.

According to the present invention, it is possible to provide an image information processing apparatus and an image information processing method that can search for a desired document without rebuilding the search program even when documents to be processed are added or changed.

FIG. 1 is a block diagram showing the overall configuration of an image information processing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing examples of forms processed in the embodiment.
FIG. 3 is a diagram for explaining an example of generating the index, the path to the image file, and so on for the forms shown in FIG. 2 in the image information processing apparatus of the embodiment.
FIG. 4 is a diagram for explaining an example of a form to be processed, an example of the character data file for the form, and an example of generating the image storage file in the image information processing apparatus of the embodiment.
FIG. 5 is a diagram for explaining an example of generating the index generation file, the index file, the path to the image file, and so on in the image information processing apparatus of the embodiment.
FIG. 6 is a diagram for explaining the relationship between the index generation file and the image reference index file in the image information processing apparatus of the embodiment.
FIG. 7 is a diagram for explaining the image search processing of the image information processing apparatus of the embodiment.

An embodiment of the present invention is described in detail below with reference to the accompanying drawings. This embodiment provides an image information processing apparatus and an image information processing method that allow a near full-text search over information captured as images.

This embodiment is characterized in that it has no notion of individual business operations and allows search keys to be set freely, so that no search system needs to be designed for each search task and a single search can cut across different business operations.

The following description covers an example of an apparatus and processing method suited to a system that searches forms processed by, for example, a financial institution. The information processing system is also designed so that performance does not degrade even after images have accumulated over a long period. The invention is not, however, limited to forms processed by financial institutions, and there is no restriction on the type of image registered and searched.

First, the general configuration of the image information processing apparatus of this embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the overall configuration of the image information processing apparatus according to an embodiment of the present invention.

In FIG. 1, reference numeral 10 denotes an operation terminal used to search registered image information. Reference numerals 110 to 130 denote systems that process the forms of a financial institution: 110 is an account transfer system that performs account transfer processing, 120 is a deposit/withdrawal slip processing system that processes deposit and withdrawal slips, and 130 is a corporate loan processing system that handles loans to corporations. These systems need not be provided separately; a single configuration shared by all of the processing may be used.

Each system has at least the configuration shown as 150: an image reader 151 that reads the image of a document to be processed; a data recognition unit 152 that recognizes the necessary characters and numerals in the image read by the image reader 151 and converts them to digital data; a keyword extraction unit 153 that extracts, from the data recognized by the data recognition unit 152, the keywords used to search for that image; and an input/output unit 155 that outputs the read image information and the extracted keywords to the document image storage/retrieval apparatus 200 and is also used to correct the recognition results of the data recognition unit 152.

In these processing systems 150, the image reader 151 reads the information shown on the front and back of the form as image information. The data recognition unit 152 then performs character recognition on the characters and numerals in predetermined regions of the read image data, and the recognition results are checked and corrected, for example by displaying the recognized data side by side with the corresponding read image.

For example, the apparatus comprises an image file 230 that stores, in a readable manner, various document images relating to various business operations, and an index file 300 that, from the digital information corresponding to the character information shown in a document image registered in the image file 230, takes the character string of each word as an index corresponding to that document image, assigns a unique document type ID to each type of document image, associates it with the index, and stores the index and the document type ID as readout keywords for the document images registered in the image file 230. The search engine 250 receives an index and a document type ID and, referring to the index file 300, serves as image readout means that reads out the document image stored in the image file 230 that is identified by all of the received index and document identification ID.

Once the recognition results have been confirmed as correct, the keywords extracted by the keyword extraction method described later and the read image information are sent to the document image storage/retrieval apparatus 200 via the input/output unit 155.

Reference numeral 200 denotes a document image storage/retrieval apparatus that accumulates the document images read by the processing systems and retrieves them on demand. It comprises an image file 230 in which document images are registered and held, a search engine 250 that searches the image files registered in the image file 230, and an indexer 260 that reads the desired index (search keyword) from the index file 300 and supplies it to the search engine 250.

Reference numeral 300 denotes an index file in which the keyword information extracted by the keyword extraction unit 153 is registered and held.

The document images handled by a financial institution, which are the processing target of this embodiment with the above configuration, are difficult to delete in practice. The law requires them to be retained for seven or ten years, and the number of sheets to be retained is very large.

For example, even a small regional bank handles about 20,000 deposit/withdrawal slips a day, or more than 5 million a year; a megabank is expected to handle more than 50 million a year. Converting these documents to searchable images removes the need to consult the paper forms one by one, so the forms can simply be stored in bulk in a warehouse or the like. And if the images are retained after the forms are discarded, almost no space is needed.

However, in an environment that handles such a large volume of form images, only relatively simple indexes can be used if the performance of searching for a desired form image is to be maintained. In particular, a search on a data item that has no index cannot be expected to respond quickly.

For these reasons, this embodiment uses the form image information obtained in the course of each individual business process, while the original forms are stored in a warehouse or the like. All of the code data is used as keys, the data item names are added to the full-text search index, and at search time the data item name and its value are searched with a logical AND. This yields a general-purpose image search system that does not depend on any application.

An outline of the search method of this embodiment is described below using a specific example.
For example, to build a bigram index for the character string 「今日は良い天気です。」 ("The weather is nice today."), a document ID is first assigned to the document, for example 1. The following index information is then generated.

「今日」 1
「日は」 1
「は良」 1
「良い」 1
「い天」 1
「天気」 1
「気で」 1
「です」 1
「す。」 1
「。」 1

Similarly, the document 「今日は大雨です。」 ("There is heavy rain today.") is assigned document ID 2, and the following search information (bigram information) is generated.
「今日」 2
「日は」 2
「は大」 2
「大雨」 2
「雨で」 2
「です」 2
「す。」 2
「。」 2

In this embodiment, the index information combining documents 1 and 2 is assigned as follows.
「今日」 1, 2
「日は」 1, 2
「は良」 1
「良い」 1
「い天」 1
「天気」 1
「気で」 1
「です」 1, 2
「す。」 1, 2
「。」 1, 2
「は大」 2
「大雨」 2
「雨で」 2

This is the bigram inverted index of this embodiment. Searching the index information built in this way for the string 「天気」 ("weather") matches the document with document ID 1. Searching for the string 「今日」 ("today") matches the documents with document IDs 1 and 2.

When searching for a document that contains both of the strings 「今日」 ("today") and 「大雨」 ("heavy rain"), combining the search results for 「今日」 and for 「大雨」 under a logical AND condition yields document ID 2. This search method forms the basis of the search method of this embodiment.
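
The bigram example above can be reproduced with a short Python sketch. This is only an illustration under stated assumptions, not the implementation described in the specification: the function names `bigrams`, `build_index`, and `search` are hypothetical, and the trailing single character is kept as its own entry because the listing above includes 「。」.

```python
from collections import defaultdict


def bigrams(text):
    """Return the overlapping two-character windows of text.

    The last window holds only the final character, which matches the
    trailing single-character entry in the listing above.
    """
    return [text[i:i + 2] for i in range(len(text))]


def build_index(docs):
    """Build an inverted index mapping each bigram to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in bigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, *terms):
    """AND search: return the IDs of documents whose postings contain every term.

    Each term is assumed to be exactly one index entry (a two-character string).
    """
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


docs = {1: "今日は良い天気です。", 2: "今日は大雨です。"}
index = build_index(docs)

print(search(index, "天気"))          # documents containing 「天気」
print(search(index, "今日"))          # documents containing 「今日」
print(search(index, "今日", "大雨"))  # documents containing both
```

Intersecting posting sets is one straightforward way to express the logical AND; a production index would more likely keep sorted posting lists on disk.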

Next, a specific example applied to actual form images is described. In this embodiment, a common image reference platform is used for every business operation. Specifically, when managing the tax and public-fund payment slip images and the account transfer request form images shown in FIG. 2, the processing that makes those images searchable by their displayed content is as follows.

The key attributes (names) are included in the index. For example, to use a common image reference platform both for referring to tax and public-fund payment slip images and for account transfer request forms, the following is done.

In a system that registers payment slips (tax and public funds) and performs image search, the following displayed items are used as search keywords.
For example, the bank receipt date, receiving branch, receiving municipality or company name, amount, and so on shown on the payment slip are obtained as code data.

When the system is built, the index for the data shown in FIG. 2 is, for example, information indicating that the document is a notice of completed payment (hereinafter abbreviated as "paid notice"). The document ID consists of the following information shown on the paid notice:
bank receipt date, for example "2013/01/15";
receiving branch, for example "102";
receiving municipality name, "○○ City";
amount, for example "40,000 yen".
Each of the above is treated as one word, and a full-text search index is generated. The document ID part of each entry is associated (linked) with the path where the image is stored.

In the example of registering account transfer request forms and performing image search, the index is, for example, information obtained in the course of registering the account transfer request form that indicates that the document is an account transfer request form, and the document ID consists of the following information shown on the form:

registration date, for example "2013/01/15";
account number, for example "47192202050264";
consignor company code, for example "0264";
consignor company name, "□□ Co., Ltd."
Each of the above is treated as one word, and a full-text search index is generated. The document ID part of each entry is associated (linked) with the path where the image is stored.

FIG. 3 shows the relationship between the full-text search index and the images in the above cases.
This embodiment has a simple structure that expresses only the relationship between logical words and images. As shown in FIG. 3, the physical index table has a very simple structure consisting of the image indexes and, for each index, a pointer to the corresponding image.
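
As a rough illustration of the index table described above, the following Python sketch maps each word to pointers to the images it was read from. It is a minimal sketch under assumptions: the names `index_table`, `register`, and `lookup` and both image paths are hypothetical, and only the word values are taken from the examples in the description.

```python
from collections import defaultdict

# Index table: each word (one index entry) maps to the set of pointers
# (here, file paths) of the images on which that word appears.
index_table = defaultdict(set)


def register(words, image_path):
    """Link every word found on one form to the path of that form's image."""
    for word in words:
        index_table[word].add(image_path)


def lookup(word):
    """Return the image paths linked to a word, as a sorted list."""
    return sorted(index_table.get(word, set()))


# Paid notice: bank receipt date, receiving branch, municipality, amount.
register(["2013/01/15", "102", "○○ City", "40,000 yen"],
         "/images/paid_notice/0001.tif")  # hypothetical path

# Account transfer request form: registration date, account number,
# consignor company code, consignor company name.
register(["2013/01/15", "47192202050264", "0264", "□□ Co., Ltd."],
         "/images/account_transfer/0001.tif")  # hypothetical path

print(lookup("102"))         # only the paid notice image
print(lookup("2013/01/15"))  # both images share this date
```

Because the table stores nothing but words and pointers, the same structure serves both form types without any per-form schema.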

Using full-text search naively, for example searching with keywords that include numbers, dates, names, or addresses, causes the following problem. If the account number "47192202050264" is placed in the index as a key and a search is run for "471922", everything that contains "471922" is hit. In business operations, a great many items are keyed by numeric identifiers.

Even within banking alone, items similar to account numbers include CIF numbers, case numbers, approval request numbers, and receipt numbers. Likewise, dates include the acceptance date, processing date, registration date, reply date, receipt date, execution date, cancellation date, and issue date. When an address is used as a keyword, there are the applicant's address, work address, guarantor's address, property location, and so on. An efficient search must distinguish among these.

Doing so would ordinarily require telling the system, using XML or the like, which data share the same attribute. But telling the system this is essentially the same as defining and structuring data items, which makes the system poor at accommodating additional business operations.

This embodiment therefore solves the problem by adding the data item names to the full-text search index and, at search time, searching the data item name and the value index with a logical AND. As a result, processing for another business operation that was not anticipated at the time of the earlier registration can be added without adding to or changing the reference program.
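
The effect of indexing data item names alongside values can be sketched as follows. This is an assumption-laden illustration rather than the specified implementation: the specification does not say how item names are encoded in the index, so here each item name is simply indexed as another word of the same document. The document IDs, the English item names, and the helpers `build_index` and `search_and` are all hypothetical.

```python
from collections import defaultdict


def build_index(documents):
    """Index every data item name and every value as a word of its document."""
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for item_name, value in fields.items():
            index[item_name].add(doc_id)
            index[value].add(doc_id)
    return index


def search_and(index, *words):
    """Return the IDs of documents that contain all of the given words."""
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()


documents = {
    "paid_notice_0001": {
        "bank receipt date": "2013/01/15",
        "receiving branch": "102",
        "amount": "40,000 yen",
    },
    "account_transfer_0001": {
        "registration date": "2013/01/15",
        "account number": "47192202050264",
        "consignor company code": "0264",
    },
}
index = build_index(documents)

# The bare date matches both forms.
print(search_and(index, "2013/01/15"))
# Adding the data item name narrows the result to the account transfer form.
print(search_and(index, "registration date", "2013/01/15"))
```

Note that this AND is evaluated per document: if a single form carried the same value under a different item name, it would still match, so the sketch narrows results by document content rather than binding a value to one specific field.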

For example, FIG. 4 shows a specific example of a paid notice processing result file, obtained when the image reader 151 of the processing system 150 reads the paid notice shown in FIG. 2, the data recognition unit 152 recognizes it, and the keyword extraction unit 153 extracts the keywords, together with a paid notice image file (image file 230).

FIG. 4 shows the state in which, from the paid notice shown in FIG. 2, the data item names and values "○○ City", "40,000 yen (tax amount received)", "102 (receiving bank branch code)", and "2013/01/05 (bank receipt date)", together with the "path to image" that is the storage location of the image file, have been generated and stored in the index file 300.

Next, the processing system 150 creates an index generation file by adding the form name and data item names to the generated payment notice processing result file. The left side of FIG. 5 shows the index generation file obtained by adding the form name and data item names to the payment notice processing result file of FIG. 4. In FIG. 5, "123" is assigned as the form ID of the payment notice.

The index generation file is sent to the document image storage/retrieval apparatus 200. There, the logical words of the processing result stored in the image file 230 are stored as indexes in the index file 300, in the file for form ID 123, for each image, together with the path information (storage coordinate information) to the image file.
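The registration flow described above (processing result, then index generation file, then index file) might be sketched as follows. The dictionary layout, the function names `make_index_generation_file` and `register`, and the image path are assumptions made for illustration; the source does not specify a file format.

```python
def make_index_generation_file(form_name, form_id, result, image_path):
    """Add the form name and form ID to a processing result and list every logical word."""
    words = [form_name]
    for item_name, value in result.items():
        words.extend([item_name, value])  # both the item name and its content become indexes
    return {"form_id": form_id, "image_path": image_path, "words": words}


def register(index_file, gen_file):
    """Store each word under the form ID together with the path to the image."""
    per_form = index_file.setdefault(gen_file["form_id"], {})
    for word in gen_file["words"]:
        per_form.setdefault(word, []).append(gen_file["image_path"])
    return index_file


# Values taken from the payment notice example; the path is hypothetical.
result = {
    "municipality": "XX City",
    "collected tax amount": "40,000 yen",
    "receiving bank branch code": "102",
    "bank receipt date": "2013/01/05",
}
gen = make_index_generation_file("payment notice", "123", result, "/images/123/0001.tif")
index_file = register({}, gen)
```

Each word is stored flat under the form ID, with no hierarchy of items and sub-items, which is what lets later forms be added without redesigning the index.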

The above description concerned the payment notice, but the account transfer request form is handled in exactly the same way: a processing result file and an index generation file are created for its items "consignor company name: □□ Co., Ltd.", "account number: 47192202020264", and "account holder / seal" (collecting municipality name, collecting branch). Then, as shown in FIG. 6, the index file is generated by merging the two indexes or by adding the new index to the payment notice index.

In other words, it suffices to add the contents of the index generation file of any form to be processed in common to the existing form index, or to merge them. For example, as shown in FIG. 3, even if forms to be processed are added or changed, only the newly added indexes need to be added, and at search time only the index (search key) needs to be specified.
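Adding a new form type by merging indexes can be sketched as a plain dictionary merge. This is an illustrative sketch only: the nested layout (form ID, then word, then list of image paths), the function name `merge_indexes`, the form ID "124" for the account transfer request form, and the paths are all assumptions.

```python
def merge_indexes(existing, added):
    """Merge `added` into `existing`, leaving entries that are already present untouched."""
    for form_id, words in added.items():
        per_form = existing.setdefault(form_id, {})
        for word, paths in words.items():
            bucket = per_form.setdefault(word, [])
            for path in paths:
                if path not in bucket:  # avoid duplicate path entries on repeated merges
                    bucket.append(path)
    return existing


# Existing index for the payment notice (form ID "123" as in the example above).
payment_index = {"123": {"XX City": ["/images/123/0001.tif"]}}
# Index for a newly added form type; the form ID "124" is hypothetical.
transfer_index = {"124": {"consignor company code 0264": ["/images/124/0001.tif"]}}

merged = merge_indexes(payment_index, transfer_index)
```

Because the merge only appends entries, the existing payment notice entries and any search code that reads them are unaffected by the new form type.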

That is, in the example of FIG. 3, "consignor company code 0264" is set as a single index. Conventionally, by contrast, an index had to be assigned to the item "consignor company" and then an index "0264" assigned beneath it. If the account transfer request form processing system had no "consignor company code" item, a new "consignor company" item had to be created, which required a system design entirely separate from the account transfer request form processing system. The present embodiment requires no such system reconstruction.

The present embodiment focuses on the fact that the purpose is to retrieve image information and that fine-grained digital data therefore need not be extracted. Search methods specific to such an image retrieval system have received almost no attention in the past and are unique to the present invention.

As described above, in order to provide an image reference platform common to all business operations, the present embodiment treats every logical word displayed on a form as an index and merely associates logical words with images. Therefore, when other forms or items are added, it suffices to add and merge them, and changes to the specifications of the image information registration/search system can be accommodated very easily.

The processing for searching the image information registered in the image file 230 as described above, using the index file 300, is described below with reference also to FIG. 7.

When image information is searched (referenced), the search is performed with a logical product (AND) condition on the form name (for example, payment notice or account transfer request form) and the data item name. For example, when a form ID is specified to the index file 300, the image files corresponding to that form ID are identified in the image file 230, and an index for identifying the particular form is then specified.

For example, when "collecting municipality name: XX City" is specified as the search key for the payment notice (abbreviated 済通) from the operation terminal 10 as shown in FIG. 7, the search engine 250 and the indexer 260 access the index file 300 and obtain the form ID "123". This identifies the payment notice image file in the image file 230 as the target image file.

If the number of images is not very large, the image files can be checked one by one to identify the desired image. If the number of images is large, however, the desired image can be obtained by specifying any or all of the indexes such as "102 (receiving bank branch code)", "2013/01/05 (bank receipt date)", and "40,000 yen (collected tax amount)".
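The two-stage narrowing described above (first the form ID, then any number of additional indexes) could look roughly like the following. This is a sketch under assumptions: the nested layout of `index_file`, the function name `find_images`, the image names, and the second set of values ("2013/01/06", "25,000 yen") are hypothetical.

```python
def find_images(index_file, form_id, *keys):
    """Return the image paths for a form ID, narrowed by every additional key given."""
    per_form = index_file.get(form_id, {})
    # With no additional keys, every image of this form type is a candidate.
    candidates = set()
    for paths in per_form.values():
        candidates.update(paths)
    # Each additional key is applied as a logical AND.
    for key in keys:
        candidates &= set(per_form.get(key, []))
    return sorted(candidates)


index_file = {
    "123": {
        "XX City": ["a.tif", "b.tif"],
        "102": ["a.tif", "b.tif"],
        "2013/01/05": ["a.tif"],
        "2013/01/06": ["b.tif"],
        "40,000 yen": ["a.tif"],
        "25,000 yen": ["b.tif"],
    }
}

all_notices = find_images(index_file, "123")
narrowed = find_images(index_file, "123", "102", "2013/01/05")
```

With few images the caller can browse `all_notices` directly; with many, each extra key shrinks the candidate set without any per-form search logic.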

As described above, according to the present embodiment, when a document type to be processed is added, it suffices to assign an additional document type ID, extract indexes from the character strings displayed on the document, and additionally register them in the index file 300; the document images registered in the image file 230 can then be searched. No new search program needs to be written: the character strings displayed on a document newly registered in the image file 230 are simply registered in the index file 300 as indexes for searching.

The above description dealt mainly with account transfer request form processing and payment notice processing in the account transfer system 110. The same applies when the deposit/withdrawal slip processing system 120 images deposit/withdrawal slips, generates the necessary indexes, and registers them in the image file 230: the full text of the character strings displayed on a slip is registered in the index file 300 as indexes, and a unique document ID is assigned to each slip type and registered in the index file 300 in association with those indexes. The same is true of the corporate loan processing system 130.

Claims

1. An image information processing apparatus for storing various document images related to various business operations, comprising:
image storage means for storing the document images in a readable manner;
character information indexing means for using, as indexes corresponding to a document image, the character strings of each word of the character information in the digital information corresponding to the character information displayed in the document image registered in the image storage means;
document type indexing means for assigning a unique document type ID to each type of document image and associating it with the indexes of the character information indexing means; and
index storage means for storing the indexes and the document type ID as read keywords for the document image registered in the image storage means,
wherein the document image stored in the image storage means can be read out by using the indexes stored in the index storage means as read keywords.

2. The image information processing apparatus according to claim 1, further comprising image reading means for receiving the index and the document type ID stored in the index storage means and reading out the document image stored in the image storage means that is specified by all of the received index and document identification ID.

3. The image information processing apparatus according to claim 1, wherein the document image is a form image, and when a document type to be processed is added, the document type indexing means assigns a unique document type ID to the newly added document type, and the character information indexing means additionally registers, as indexes in the index storage means, the character strings corresponding to the character information displayed in the added document to be processed.

4. The image information processing apparatus according to claim 1, wherein the character information indexing means sets the entire text displayed in the document to be processed as an index target.

5. An image information processing method in an image information processing apparatus comprising document image storage means for storing various document images relating to various business operations and index storage means for storing index information for the document images stored in the document image storage means, the method comprising:
recognizing the character information displayed in a document image stored in the document image storage means and extracting all character strings, word by word, as indexes corresponding to the document image;
assigning a unique document type ID to each type of document image, associating it with the extracted indexes, and storing them in the index storage means as index information for the document image; and
enabling the document image stored in the image storage means to be read out by using the indexes stored in the index storage means as read keywords.

6. The image information processing method according to claim 5, further comprising receiving the index and the document type ID stored in the index storage means and reading out the document image stored in the image storage means that is specified by all of the received index and document identification ID.

7. The image information processing method according to claim 5, wherein the document image is a form image, and when a document type to be processed is added, a unique document type ID is assigned to the newly added document type, and the character strings corresponding to the character information displayed on the added document to be processed are additionally registered as indexes in the index storage means.