JPH0210480A

JPH0210480A - Character deciding method

Info

Publication number: JPH0210480A
Application number: JP63159362A
Authority: JP
Inventors: Taiji Mori; 泰二森
Original assignee: Fuji Electric Co Ltd; Fuji Facom Corp
Current assignee: Fuji Electric Co Ltd; Fuji Facom Corp
Priority date: 1988-06-29
Filing date: 1988-06-29
Publication date: 1990-01-16

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a character discrimination method for a character recognition device comprising a character reading sensor, a central processing unit, and a character recognition unit.

[Conventional technology]

FIG. 3 is a block diagram showing the configuration of an ordinary character recognition device. In the figure, 1 is an optical character reading sensor, 2 is a CPU (central processing unit), and 3 is a character recognition unit.

The CPU 2 receives and stores the character image data read by the character reading sensor 1, then retrieves that data and passes it to the character recognition unit 3. It performs character discrimination by receiving the character code and shape feature data from the character recognition unit as the recognition result.

In character discrimination with such a device, when the target character is the numeral zero (0), the letter O (O), the circle mark (○), or the like, the similar shapes make the characters easy to confuse. A letter O (O) could be misidentified as the numeral zero (0), or vice versa.

As a way to improve the recognition rate, a known method does not judge the target character in isolation when it is one of the numeral zero (0), the letter O (O), the circle mark (○), and the like, but also takes into account the characters located immediately before and after it.

That is, when the character code that the CPU 2 receives from the character recognition unit 3 is one of the numeral zero (0), the letter O (O), the circle mark (○), and the like, the code is provisionally treated as an undetermined code, and the characters before and after it are examined. For example, if the preceding (left) character is a numeral and the following (right) character is also a numeral, the undetermined code is very likely a numeral as well, so it is determined to be the numeral zero (0). If one of the two neighbors is a numeral and the other is a kanji, hiragana, symbol, or the like, the undetermined code is again likely to be a numeral, so it is determined to be the numeral zero (0).

Similarly, if the preceding (left) character is a letter and the following (right) character is also a letter, the undetermined code is very likely a letter as well, so it is determined to be the letter O (O). If one of the two neighbors is a letter and the other is a kanji, hiragana, symbol, or the like, the undetermined code is again likely to be a letter, so it is determined to be the letter O (O).

If both neighboring characters are kanji or symbols, the undetermined code is determined to be the circle mark (○).

In this way, when the target character is one of the numeral zero (0), the letter O (O), the circle mark (○), and the like, the recognition rate can be improved compared with judging the character in isolation.
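The conventional neighbor rule described above can be sketched as follows. This is a minimal illustration, not the device's actual logic: the character-class labels (`"digit"`, `"alpha"`, `"kanji"`, `"hiragana"`, `"symbol"`) and the function name are hypothetical, and treating every combination without a numeral or letter as the circle mark is an assumption, since the text only names kanji and symbols for that case.

```python
from typing import Optional

def prior_art_decide(left: Optional[str], right: Optional[str]) -> Optional[str]:
    """Decide an undetermined code from the classes of its two neighbors.

    Returns "zero", "oh", or "circle", or None when one neighbor is a
    numeral and the other a letter (the case the prior art cannot settle).
    """
    kinds = {left, right}
    has_digit = "digit" in kinds
    has_alpha = "alpha" in kinds
    if has_digit and has_alpha:
        return None        # numeral on one side, letter on the other
    if has_digit:
        return "zero"      # numeral + numeral, or numeral + kanji/hiragana/symbol
    if has_alpha:
        return "oh"        # letter + letter, or letter + kanji/hiragana/symbol
    return "circle"        # assumed: no numeral or letter on either side
```

A call such as `prior_art_decide("digit", "alpha")` returns `None`, which corresponds to the unresolved case this patent addresses.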

[Problem to be solved by the invention]

In the prior art described above, however, when one neighbor of the undetermined code is a numeral and the other is a letter, the code is equally likely to be a numeral or a letter, and no determination can be made. The character must then be judged in isolation after all, which lowers the recognition rate.

An object of the present invention is to provide a character discrimination method that can determine an undetermined code representing the numeral zero (0), the letter O (O), the circle mark (○), or the like even when it is difficult to do so from the preceding and following characters, and that thereby improves the recognition rate.

[Means to solve the problem]

To achieve the above object, the present invention applies to a character recognition device comprising a character reading sensor, a central processing unit, and a character recognition unit, in which the central processing unit receives and stores the character image data read by the character reading sensor, then retrieves that data and passes it to the character recognition unit, and receives the character code and shape feature data from the character recognition unit as the recognition result.

In this device, when the character codes received from the character recognition unit include a code representing the numeral zero (0), the letter O (O), or the circle mark (○) (hereinafter referred to as an undetermined code), the central processing unit determines whether the characters located before and after the undetermined code are numerals, letters, kanji, hiragana, katakana, or symbols such as punctuation marks. When the character located before, after, or on both sides of the undetermined code is a predetermined specific character, the central processing unit determines from that context whether the undetermined code is the numeral zero (0), the letter O (O), or the circle mark (○). When it is not a predetermined specific character, the central processing unit makes the same determination on the basis of the shape feature data of the undetermined character received from the character recognition unit.

[Operation]

Thus, an undetermined code representing the numeral zero (0), the letter O (O), the circle mark (○), or the like is determined not only from the combination of the characters located before and after it. When that combination does not allow a determination, the shape feature data of the undetermined character (specifically, the ratio of its height to its width) is used instead.

FIG. 2 is an explanatory diagram comparing the height-to-width ratios of the letter O (O) and the numeral zero (0).

As shown in FIG. 2(a), for the letter O (O) the width is 0.9 when the height is taken as 1, whereas for the numeral zero (0) the width is 0.6 when the height is taken as 1. Accordingly, choosing a threshold such as 0.7 or 0.8 makes it possible to distinguish the two characters.
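The ratio test can be illustrated with a short sketch. The function name and the default threshold of 0.75 are assumptions chosen for illustration (the text suggests values such as 0.7 or 0.8); the 0.9 and 0.6 widths in the usage note come from FIG. 2.

```python
def classify_by_aspect(width: float, height: float, threshold: float = 0.75) -> str:
    """Classify an ambiguous glyph from its width-to-height ratio.

    A wider glyph is taken to be the letter O ("oh"), a narrower one the
    numeral zero ("zero"). A ratio exactly equal to the threshold is
    treated as "zero" here, which is an arbitrary choice.
    """
    if height <= 0:
        raise ValueError("height must be positive")
    ratio = width / height
    return "oh" if ratio > threshold else "zero"
```

With the FIG. 2 proportions, `classify_by_aspect(0.9, 1.0)` gives `"oh"` and `classify_by_aspect(0.6, 1.0)` gives `"zero"`.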

FIG. 2 shows only one example, however. The ratio varies with the typeface and font, so the threshold must be corrected for each document. To that end, whenever a character has been determined to be the numeral zero (0) or the letter O (O) from the combination of its neighbors, its shape feature data (the ratio of height to width) can be stored and used to set the threshold.
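One possible way to derive a per-document threshold from stored ratios is sketched below. The text does not specify the correction formula, so taking the midpoint between the mean ratio of confirmed zeros and the mean ratio of confirmed letter O's is an assumption, as are the class name, method names, and the initial value of 0.75.

```python
from statistics import mean

class ThresholdCalibrator:
    """Stores width-to-height ratios of characters confirmed from context."""

    def __init__(self, initial: float = 0.75):
        self.initial = initial
        self.ratios = {"zero": [], "oh": []}

    def record(self, label: str, width: float, height: float) -> None:
        """Store the ratio of a character already decided as "zero" or "oh"."""
        self.ratios[label].append(width / height)

    def threshold(self) -> float:
        """Midpoint of the two class means; the initial value until both exist."""
        if not self.ratios["zero"] or not self.ratios["oh"]:
            return self.initial
        return (mean(self.ratios["zero"]) + mean(self.ratios["oh"])) / 2
```

Until at least one zero and one letter O have been confirmed in the document, the sketch falls back to the initial threshold.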

[Example]

FIG. 1 is a flowchart showing the operation of the CPU in carrying out the present invention.

As shown in the figure, the CPU extracts character image data (step S1), recognizes the target character (step S2), and repeats these steps to obtain the recognition results for one block.

Next, the CPU searches the recognition results for that block. When it finds a character such as the numeral zero (0), the letter O (O), or the circle mark (○), it treats that character as an undetermined character and obtains the combination of the character types of its two neighbors. When such characters occur consecutively, the whole run is regarded as a single character group, and the combination of the character types on both sides of the group is obtained (step S4).
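The grouping of consecutive undetermined characters might look like the following sketch. It assumes the block has already been reduced to a list of character-class labels, with the hypothetical label `"ambiguous"` marking zero, O, and circle candidates; a group at the edge of the block gets `None` for the missing neighbor, a case the text does not discuss.

```python
from typing import List, Optional, Tuple

AMBIGUOUS = "ambiguous"  # hypothetical label for a zero / O / circle candidate

def find_groups(kinds: List[str]) -> List[Tuple[int, int, Optional[str], Optional[str]]]:
    """Return (start, end, left_kind, right_kind) for each run of ambiguous characters.

    `end` is exclusive. Neighbors outside the block are reported as None.
    """
    groups = []
    i, n = 0, len(kinds)
    while i < n:
        if kinds[i] != AMBIGUOUS:
            i += 1
            continue
        start = i
        while i < n and kinds[i] == AMBIGUOUS:
            i += 1  # extend the run over consecutive ambiguous characters
        left = kinds[start - 1] if start > 0 else None
        right = kinds[i] if i < n else None
        groups.append((start, i, left, right))
    return groups
```

Each returned tuple gives the extent of one group and the pair of neighbor classes that the later steps examine.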

The CPU judges whether the obtained combination is a letter and a numeral (step S5). If it is, the CPU judges whether the width-to-height ratio of the undetermined character is above or below the threshold (step S6). If the ratio is above the threshold, the character is determined to be the letter O (O) (step S8); if it is below, the character is determined to be the numeral zero (0) (step S7).

If step S5 finds that the combination is not a letter and a numeral, the process proceeds to step S9 and examines the combination further. If it is a letter and a letter (or a kanji), the process proceeds to step S13 and the character is determined to be the letter O (O). If step S9 finds that the combination is not a letter and a letter (or a kanji), the process proceeds to step S10 and examines the combination further. If it is a numeral and a numeral (or a kanji), the process proceeds to step S12 and the character is determined to be the numeral zero (0); otherwise the character is determined to be the circle mark (○) (step S11).

When the character is determined to be the letter O (O) in step S13 or the numeral zero (0) in step S12, the width-to-height ratio of that character is stored and used to correct the threshold used in step S6 (step S14).
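The decision flow of steps S5 to S13 can be summarized in a short sketch under stated assumptions. The class labels and the function name are hypothetical, and the branch conditions follow the flowchart text literally, so a letter or numeral paired with anything other than the same class or a kanji falls through to the circle mark. The second return value flags results decided from context, whose ratio could then be stored for the threshold correction of step S14.

```python
from typing import Optional, Tuple

def decide(left: Optional[str], right: Optional[str],
           width: float, height: float, threshold: float) -> Tuple[str, bool]:
    """Return (label, decided_from_context) for one undetermined character."""
    kinds = {left, right}
    if kinds == {"alpha", "digit"}:                         # S5: letter + numeral
        ratio = width / height                              # S6: compare ratio with threshold
        return ("oh" if ratio > threshold else "zero", False)   # S8 / S7
    if "alpha" in kinds and kinds <= {"alpha", "kanji"}:    # S9: letter + letter (or kanji)
        return ("oh", True)                                 # S13
    if "digit" in kinds and kinds <= {"digit", "kanji"}:    # S10: numeral + numeral (or kanji)
        return ("zero", True)                               # S12
    return ("circle", False)                                # S11
```

For example, `decide("alpha", "digit", 9, 10, 0.75)` falls into the ratio test and yields `("oh", False)`, while `decide("digit", "kanji", 6, 10, 0.75)` is settled by context and yields `("zero", True)`.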

[Effect of the invention]

As described above, according to the present invention, even when it is difficult to determine from the preceding and following characters whether a character is the numeral zero (0), the letter O (O), the circle mark (○), or the like, the determination can still be made. This has the advantage of improving the recognition rate for these mutually confusable characters.

[Brief explanation of the drawing]

FIG. 1 is a flowchart showing the operation of the CPU in carrying out the present invention; FIG. 2 is an explanatory diagram comparing the height-to-width ratios of the letter O (O) and the numeral zero (0); and FIG. 3 is a block diagram showing the configuration of an ordinary character recognition device.

Explanation of symbols: 1: optical character reading sensor; 2: CPU (central processing unit); 3: character recognition unit.

Agent: Patent attorney Akio Namiki

Claims

[Claims] 1) A character discrimination method for a character recognition device comprising a character reading sensor, a central processing unit, and a character recognition unit, in which the central processing unit receives and stores character image data read by the character reading sensor, then retrieves the character image data and passes it to the character recognition unit, and receives a character code and shape feature data from the character recognition unit as the recognition result, the method characterized in that: when the character codes received from the character recognition unit include a code representing the numeral zero (0), the letter O (O), or the circle mark (○) (hereinafter referred to as an undetermined code), the central processing unit determines whether the characters located before and after the undetermined code are numerals, letters, kanji, hiragana, katakana, or symbols such as punctuation marks; when the character located before, after, or on both sides of the undetermined code is a predetermined specific character, the central processing unit determines whether the undetermined code is the numeral zero (0), the letter O (O), or the circle mark (○); and when the character located before, after, or on both sides of the undetermined code is not a predetermined specific character, the central processing unit determines whether the undetermined code is the numeral zero (0), the letter O (O), or the circle mark (○) on the basis of the shape feature data of the undetermined character received from the character recognition unit.