
JP3914119B2 - Character recognition method and character recognition device - Google Patents


Info

Publication number
JP3914119B2
Authority
JP
Japan
Prior art keywords
character
extracted
feature
type
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP2002256913A
Other languages
Japanese (ja)
Other versions
JP2004094734A (en
Inventor
智久 鈴木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Toshiba Digital Solutions Corp
Original Assignee
Toshiba Corp
Toshiba Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, Toshiba Solutions Corp
Priority to JP2002256913A
Publication of JP2004094734A
Application granted
Publication of JP3914119B2
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Discrimination (AREA)

Description

【0001】
【発明の属する技術分野】
本発明は、入力画像より切り出した文字を認識処理する文字認識方法に関する。更に、本発明は、入力画像より切り出した文字が印刷活字であるか手書き文字であるかを判定し、その判定結果をもとに文字認識を行う文字認識装置に関する。
【0002】
【従来の技術】
印刷活字と手書き文字の判定結果に従い文字認識方法や文字認識辞書を切替える文字認識処理に関わる技術として、従来では、検出された文字の高さのバラツキを表す特徴量により判定を行う特開平05−189604号公報の方法(従来技術1)、検出された文字のサイズの頻度のエントロピーを特徴量として判定を行う特公平06−32074号公報の方法(従来技術2)、文字間隔の分散が実験値とほぼ等しい場合に印刷活字と判定し、そうでない場合はリジェクト文字数により判定を行う登録3045086号公報の方法(従来技術3)、検出された文字のストロークの方向性や太さ、直線性などの特徴量の内の一つまたは複数の組合せにより判定を行う特開平10−162102号公報の方法(従来技術4)、文字矩形の面積、横幅、高さ、高さと横幅の比率、矩形の中心と行の中心の距離、矩形の中心の間隔のバラツキを表す特徴量の内の一つまたは複数の組合せを用いる特開平10−214308号公報の方法(従来技術5)等が挙げられる。
【0003】
上記した各従来技術のうち、従来技術1,2,5は、印刷活字の文字サイズが一般に、ほぼ一定であることを利用しているが、印刷活字へのノイズの付着や、かすれによる文字サイズの変化によって、印刷活字でも文字サイズにバラツキが現われることがあり、従って文字サイズのバラツキを表す特徴量では印刷活字と手書き文字を明瞭に判別できない場合がある。また、「1」や「−」など、横幅や高さ、面積等が他の文字のそれからかけ離れている文字が、判定の対象となる文字群に多数含まれている場合や、文字が少ない場合等に於いて、印刷活字のサイズや面積のバラツキが大きくなる場合があり、従ってサイズや面積のバラツキによる判定は困難になる。
【0004】
また、従来技術3は、印刷活字に適した文字認識方法が手書き文字をリジェクトする確率が高く、手書き文字に適した文字認識方法が印刷活字をリジェクトする確率が高い場合には妥当な判定が行えるが、文字認識方法によってはそのようなリジェクト確率の差が全く期待できない場合もあるし、ノイズやかすれにより認識不能な文字が混入している場合は、誤った字種に適した文字認識方法で認識を行った場合の方がリジェクト文字数が少なくなってしまう場合もある。
【0005】
また、従来技術4で用いているストロークの方向性や直線性に関する特徴量は、漢字など直線状のストロークが多い文字を認識対象としている場合や、対象とする文字群の中の文字数が多く、直線状のストロークが多い場合には有効であるが、直線成分が少ない数字やアルファベットの小文字、平仮名等を扱う場合は直線成分が少ないため判定が困難である。
【0006】
【発明が解決しようとする課題】
上述したように従来では、ノイズやかすれによる文字の劣化、文字数の不足、判定対象の文字のサイズ等の例外等により、単独の特徴による判定が困難な場合が多いという問題があった。
【0007】
本発明は上記実情に鑑みなされたもので、入力画像より切り出した読み取り対象文字が、印刷活字であるか手書き文字であるかの判定を正確に行うことができる信頼性の高い文字認識が可能な文字認識方法及び文字認識装置を提供することを目的とする。
【0008】
【課題を解決するための手段】
本発明は、単独では正確な判定を可能としない特徴量を複数組み合わせて印刷活字であるか手書き文字であるかの判定を行う構成として、単独の特徴量を用いた場合より正確な判定を行うことができるようにした文字認識方法及び文字認識装置を特徴とする。
【0009】
また、本発明は、入力画像より切り出した読み取り対象文字が、印刷活字であるか手書き文字であるかその判断がつかない字種であるかを判定し、その判定結果により、印刷活字に適した文字認識手段、手書き文字に適した文字認識手段、または印刷活字と手書き文字の両方に適用可能な文字認識手段のいずれかを用いて文字認識を行う構成として、字種判定が困難な場合でも、字種判定の誤りによる精度低下を防ぐことができるようにしたことを特徴とする。
【0010】
即ち、本発明は、入力画像より抽出した文字群が手書き文字であるか印刷活字であるかをその特徴量をもとに判定し、その判定結果をもとに文字認識処理を実行する文字認識装置に於いて、前記入力画像より抽出した文字群から、印刷活字と手書き文字の判定に用いる複数の特徴量を抽出する特徴抽出手段と、前記特徴抽出手段で抽出した特徴量の各々について、値が求まる場合はその値の関数を、値が不定である場合は特徴量に対して予め定められた定数を、それぞれ注目特徴量に対応する関数値として、この関数値をもとに印刷活字と手書き文字の判定に用いる評価値を求め、この評価値が予め定められた閾値未満である場合は手書き文字、上記評価値が上記閾値以上である場合は印刷活字と判定する文字種判定手段と、前記文字種判定手段による判定結果が手書き文字である場合に前記文字抽出手段で抽出された文字を認識する手書き文字認識手段と、前記文字種判定手段による判定結果が印刷活字である場合に前記文字抽出手段で抽出された文字を認識する印刷活字認識手段とを具備したことを特徴とする。このように、特徴量を複数組み合わせて判定を行う機能をもつことで、単独の特徴量を用いた場合に比し、より正確な判定を行うことができる。また、値が不定となり得る特徴量についても評価値の計算式への組込みを可能とする仕組みを導入することで、より多くの特徴量の組込みを可能とし、より高精度な判定を行うことが可能となる。
【0011】
また、本発明は、入力画像より抽出した文字群が手書き文字であるか印刷活字であるかをその特徴量をもとに判定し、その判定結果をもとに文字認識処理を実行する文字認識装置であって、入力画像に対しノイズ除去及び二値化処理を行って認識対象領域を切り出す前処理手段と、前処理手段で切り出した認識対象領域から文字群を抽出する文字抽出手段と、前記文字抽出手段で抽出した文字群から、印刷活字と手書き文字の判定に用いる特徴量を一つまたは複数抽出する特徴抽出手段と、前記特徴抽出手段で抽出した特徴量を用いて前記文字抽出手段で抽出した文字群が、印刷活字、手書き文字、印刷活字であるか手書き文字であるか不明な不明字種のいずれであるかを判定する文字種判定手段と、前記文字種判定手段による判定結果が手書き文字である場合にその判定結果に従う辞書を用いて前記文字抽出手段で抽出された文字を認識する手書き文字認識手段と、前記文字種判定手段による判定結果が印刷活字である場合にその判定結果に従う辞書を用いて前記文字抽出手段で抽出された文字を認識する印刷活字認識手段と、前記文字種判定手段による判定結果が不明字種である場合にその判定結果に従う辞書を用いて前記文字抽出手段で抽出された文字を認識する不明字種認識手段とを具備したことを特徴とする。このように、字種判定が困難な場合に、印刷活字と手書き文字の両方に適用可能な文字認識手段(認識用辞書)を適用することにより、字種判定が困難な場合でも、字種判定の誤りによる精度低下を防ぐことができる。
【0012】
【発明の実施の形態】
本発明に於いては、単独では正確な判定を可能としない特徴量を複数組み合わせて正確な判定を行うことと、異常値を示すことがある特徴量や、抽出不能になる場合がある特徴量をも判定に利用している。
【0013】
本発明の第1実施形態は、図1に示すように、前処理手段101、文字抽出手段102、特徴抽出手段103、文字種判定手段104、手書き文字認識手段105、印刷活字認識手段106等の構成要素を有する。
【0014】
前処理手段101は、入力画像に対して、ノイズ除去や二値化、フォーム除去等を行って、認識対象領域を切り出す。文字抽出手段102は上記前処理手段101で切り出した認識対象領域から認識対象となる文字群を抽出する。特徴抽出手段103は上記文字抽出手段102で抽出した文字群から、印刷活字と手書き文字の判定を行うための特徴量を一つまたは複数抽出する。
【0015】
文字種判定手段104は、上記特徴抽出手段103で抽出した特徴量の関数として、手書き文字と印刷活字の判定を行うための評価値を計算し、予め定められた閾値との比較により、「印刷活字」であるか「手書き文字」であるかを判定し、その結果が「印刷活字」なら印刷活字認識手段106により上記抽出した文字群の認識を行い、「手書き文字」なら手書き文字認識手段105により上記抽出した文字群の認識を行う。
【0016】
また、本発明の第2実施形態は、図10に示すように、上記図1に示す第1実施形態の各構成要素に加えて、不明字種認識手段207を有する。
【0017】
文字種判定手段204は、上記特徴抽出手段203で抽出した特徴量の関数として、手書き文字と印刷活字の判定を行うための評価値を計算し、予め定められた閾値との比較により、「印刷活字」であるか「手書き文字」であるか「印刷活字であるか手書き文字かが不明な字種」であるかを判定し、その結果が「印刷活字」であれば印刷活字認識手段206により、「手書き文字」であれば手書き文字認識手段205により、また「不明字種」であれば不明字種認識手段207によって、それぞれ抽出した文字群の認識を行う。
【0018】
ここで上記文字種判定手段104,204での評価値の計算は、後述する(3)式及び(6)式に示す各計算式を用い、各特徴量uの関数g(u)の関数f(g(u),…,g(u))によって行われるが、上記計算式において値が不定となる特徴量uについては、g(u)の代わりに定数を用いることによって、値が不定となる特徴量についても計算を可能にしている。
【0019】
以下に本発明の各実施形態について具体例を挙げて説明する。尚、本発明の処理機能およびその処理手順については、汎用のコンピュータに、文字認識用のソフトウェアを組み込むことによって構成できるため、以下ではそのような構成を仮定して説明を行う。ただし、本発明は各手段を専用ハードウェアの集合体や分散処理用のコンピューターのネットワークシステムとしても構成することもでき、上述の手段の全てを具備する構成ならば、ここで挙げた構成に限らず、どのようなもので実施してもよい。
【0020】
先ず図1乃至図9を参照して本発明の第1実施形態を説明する。
【0021】
図1に於いて、前処理手段101では、入力された画像(スキャナで読み取った文書画像)に対して、処理対象の欄の周辺の画像の切り出しや、二値化、ノイズ除去、フォーム除去等の画像処理が行われる。また、認識対象が帳票上の文字である場合は、罫線やプレプリント等のフォームの除去も行われる。
【0022】
文字抽出手段102では前処理手段101の出力画像から、認識対象となる文字群の文字毎の画像と位置情報の抽出が行われる。
【0023】
特徴抽出手段103では、文字抽出手段102で抽出された文字群毎に、印刷活字と手書き文字の判定に用いる特徴量が一種類または複数種類抽出される。また、特徴量と入力によっては、特徴量の値が求まらないか無意味である場合があるので、そのような場合には、値として「不定」を抽出結果とする。
【0024】
抽出する特徴量としては、例えば、以下で説明する、「文字矩形の端の並びからのずれ」を表す特徴量u、「同じ文字の字形の不一致」を表す特徴量u、「文字認識方法毎のリジェクト文字数の違い」を表す特徴量uの3種類が挙げられる。
【0025】
ここでは、これら3種類の特徴量u、u、uを適用するものとする。この3種類の特徴量u、u、uの抽出方法について述べる。
【0026】
先ず文字矩形の端の並びからのずれを表す特徴量uの抽出方法について、図4及び図5を参照して説明する。
【0027】
文字矩形の端の並びからのずれを表す特徴量uは、文字群中の文字数をN、n番目の文字の外接矩形の上端のY座標をyn、n番目の文字の外接矩形の下端のY座標をynとおくと、次の式で求められるy,yについて、
【数1】

Figure 0003914119
【数2】
Figure 0003914119
−yを最小化するαを勾配法で求め、計算式
【数3】
Figure 0003914119
を計算することによって求める。ただし、median(x)は全nについてのxの中央値とする。
【0028】
上記の方法で求めたy、y、αにより文字の上端と下端の並びが
【数4】
Figure 0003914119
【数5】
Figure 0003914119
と近似されるので(図4参照)、
|ytn−(nα+y)|,|ybn−(nα+y)|
はn番目の文字の上端、下端の文字並びからのずれを表し、uは文字矩形の端の並びからのずれの評価尺度として機能する(図5参照)。
【0029】
上記(式3)は、N=1の場合、必ず0になり、印刷活字、手書き文字の違いとは無関係なので、N=1の場合はuを「不定」とする。
【0030】
次に、同じ文字の字形の不一致を表す特徴量uについて図6乃至図9を参照して説明する。
【0031】
同じ文字の字形の不一致を表す特徴量uは、文字の種類の数をC、文字の種類の番号をc、n番目の文字の認識結果の文字の種類の番号をc、n番目の文字の画像の前景画素数をa、n番目の文字とm番目の文字の画像の左上の角を図6に示すように合わせて重ね合わせた時に、両方の画像で黒画素である画素の個数をvm,nとおくと、
【数6】
Figure 0003914119
によって求める。
【0032】
ここでは、二つの文字画像を重ねる際に左上の角を合わせているが、図7に示すように、重心や二つの文字画像の外接矩形の中心を合わせて重ね合わせる方法、図8に示すように、外接矩形の上辺の中心(図5)を合わせて重ね合わせる方法、または図9に示すように、下辺の中心を合わせて重ね合わせる方法等であってもよい。この際、同じ文字が文字群に含まれていない場合は、上記(式6)を計算することができないので、uを「不定」とする。
【0033】
次に、文字認識方法毎のリジェクト文字数の違いを表す特徴量uについて説明する。文字認識方法毎のリジェクト文字数の違いを表す特徴量uは、印刷活字に適した文字認識方法と、手書き文字に適した文字認識方法の二種類の文字認識方法により、文字群中の一部または、全部の文字の認識を行った後、印刷活字に適した文字認識方法でのリジェクト文字数rと手書き文字に適した文字認識方法でのリジェクト文字数rから次の式で求める。
【0034】
【数7】
Figure 0003914119
【0035】
リジェクト文字数rとrを求めるために行った文字認識の結果は、そのまま廃棄してもよいが、廃棄せずに保存しておき、手書き文字認識手段と印刷活字認識手段106でキャッシュデータとして利用してもよい。
【0036】
以上では、特徴抽出手段103で抽出する特徴量の例として、3種類の特徴量を示したが、特徴抽出手段103で抽出する特徴量としては、印刷活字と手書き文字との違いを表していると考えられるものならば、上記した以外にいかなる量を用いてもよく、その種類の個数も任意である。
【0037】
文字種判定手段104では、特徴抽出手段103で抽出した特徴をもとに、文字抽出手段102で抽出した文字群が、印刷活字であるか、あるいは手書き文字であるかを示す評価値sを求めて、この評価値sが予め定められた閾値θより大きい場合は印刷活字、閾値θ以下である場合は手書き文字であるとの判定が行われる。
【0038】
特徴量の個数をd、i番目の特徴量をuとおくと、評価値sは関数f、関数群gにより
【数8】
Figure 0003914119
と求められる。
【0039】
(u)としては、
【数9】
Figure 0003914119
を用いることができる。ただし、cは予め定められた定数である。この際の定数cを選択的に用いる関数群gの概念図を図2に示している。
【0040】
関数fとしては、g(u)の線形結合
【数10】
Figure 0003914119
を用いることができる。ただし、wは予め定められた定数である。
【0041】
以上の例では、fとして、g(u)線形結合を用いているが、fとしては、g(u)の二次形式
【数11】
Figure 0003914119
をはじめ、g(u)の関数ならば、いかなる関数を用いてもよい。
【0042】
ただし、
【数12】
Figure 0003914119
であり、Wは予め定められた対称行列である。
【0043】
また、g(u)としては、
【数13】
Figure 0003914119
を用いることもできる。この際の定数cを選択的に用いる関数群gの概念図を図3に示している。
【0044】
としては、
【数14】
Figure 0003914119
をはじめ、uの関数ならば、いかなる関数を用いてもよい。ただし、β,γは予め定められた定数である。
【0045】
文字種判定手段104による判定結果が、手書き文字である場合には、文字抽出手段102で抽出された文字が手書き文字認識手段105によって認識され、文字種判定手段104による判定結果が、印刷活字である場合には、印刷活字認識手段106によって認識され、その認識結果が出力される。この際、手書き文字認識手段105と、印刷活字認識手段106とは、それぞれ学習機能を含む辞書内容及び認識アルゴリズムを異にする。
ここで、上記特徴量が「不定」となる場合とその判定方法について補足説明する。
以下の2例の様に特徴量の計算式が計算不能である場合には特徴量の値が「求まらない」とする。
例1; 上記した(3)式の特徴量uがN=1の場合、計算不能であり、値が求まらない。
例2;上記した(6)式の特徴量uは、文字列中に同じ字形の文字が出現していない場合は、計算不能であり、値が求まらない。
【0046】
次に本発明の第2実施形態を説明する。この第2実施形態は、図10に示すように、前処理手段201、文字抽出手段202、特徴抽出手段203、文字種判定手段204、手書き文字認識手段205、印刷活字認識手段206、不明字種認識手段207等の構成要素を有する。ここで、前処理手段201、文字抽出手段202、特徴抽出手段203、手書き文字認識手段205、印刷活字認識手段206は、それぞれ上記図1に示す第1実施形態と同様の機能構成であり、ここでは具体的な動作説明を省略する。
【0047】
文字種判定手段204は、特徴抽出手段203で抽出した特徴量を用いて、文字抽出手段202で抽出した文字群が、印刷活字であるか、手書き文字であるか、印刷活字であるか手書き文字かが不明な字種であるかの判定を行う。即ち、文字種判定手段204は、特徴抽出手段203で抽出した特徴をもとに、文字抽出手段202で抽出した文字群が、印刷活字であるか、あるいは手書き文字であるかを示す評価値sを求め、この評価値sが予め定められた閾値θより大きい場合は印刷活字、評価値sが予め定められた閾値θより小さい場合は手書き文字と判定する。また、評価値sが[s<=θかつs>=θh]である場合は、印刷活字であるか手書き文字かが不明な字種であると判定する。
【0048】
文字種判定手段204による判定結果が、手書き文字である場合には、文字抽出手段202で抽出された文字を手書き文字認識手段205によって認識し、印刷活字である場合には印刷活字認識手段206によって認識し、印刷活字であるか手書き文字であるかが不明な字種であると判定した場合には、不明字種認識手段207によって認識して、その認識結果を出力する。この第2実施形態に於いても手書き文字認識手段205と、印刷活字認識手段206と、不明字種認識手段207とは、それぞれ学習機能を含む辞書内容及び認識アルゴリズムを異にする。
【0049】
【発明の効果】
以上詳記したように本発明によれば、入力画像より切り出した読み取り対象文字が、印刷活字であるか手書き文字であるかの判定を正確に行うことができる信頼性の高い文字認識が可能となる。即ち、本発明によれば、単独では正確な判定を可能としない特徴量を複数組み合わせることにより、単独の特徴量を用いた場合より正確な判定を行うことができる。また、値が不定となり得る特徴量についても評価値の計算式への組込みを可能とする仕組みを導入することにより、より多くの特徴量の組込みを可能とし、より高精度な判定を行うことが可能である。また、字種判定が困難な場合に印刷活字と手書き文字の両方に適用可能な文字認識方法を適用することにより、字種判定が困難な場合でも、字種判定の誤りによる精度低下を防ぐことができる。
【図面の簡単な説明】
【図1】本発明の第1実施形態に於ける要部の構成を示すブロック図。
【図2】上記実施形態に於ける(9式)の概念図。
【図3】上記実施形態に於ける(13式)の概念図。
【図4】上記実施形態に於ける特徴量uの抽出方法を説明するための図。
【図5】上記実施形態に於ける特徴量uの抽出方法を説明するための図。
【図6】上記実施形態に於ける特徴量uの抽出方法を説明するための図。
【図7】上記実施形態に於ける特徴量uの抽出方法を説明するための図。
【図8】上記実施形態に於ける特徴量uの抽出方法を説明するための図。
【図9】上記実施形態に於ける特徴量uの抽出方法を説明するための図。
【図10】本発明の第2実施形態に於ける要部の構成を示すブロック図。
【符号の説明】
101,201…前処理手段
102,202…文字抽出手段
103,203…特徴抽出手段
104,204…文字種判定手段
105,205…手書き文字認識手段
106,206…印刷活字認識手段
207…不明字種認識手段
[0001]
[Technical field of the invention]
The present invention relates to a character recognition method for recognizing a character cut out from an input image. Furthermore, the present invention relates to a character recognition device that determines whether a character cut out from an input image is a printed type or a handwritten character, and performs character recognition based on the determination result.
[0002]
[Prior art]
Conventional techniques for character recognition that switch the recognition method or the recognition dictionary according to whether the text is judged to be printed type or handwritten include the following: the method of Japanese Patent Laid-Open No. 05-189604 (prior art 1), which makes the judgment from a feature quantity representing the variation in detected character heights; the method of Japanese Patent Publication No. 06-32074 (prior art 2), which uses the entropy of the frequency of detected character sizes as the feature quantity; the method of Registration No. 3045086 (prior art 3), which judges the text to be printed type when the variance of the character spacing is approximately equal to an experimental value and otherwise judges by the number of rejected characters; the method of Japanese Patent Laid-Open No. 10-162102 (prior art 4), which judges from one or a combination of feature quantities such as the directionality, thickness, and straightness of detected character strokes; and the method of Japanese Patent Laid-Open No. 10-214308 (prior art 5), which uses one or a combination of feature quantities representing the variation in character rectangle area, width, height, height-to-width ratio, distance between the rectangle center and the line center, and spacing between rectangle centers.
[0003]
Of these, prior arts 1, 2, and 5 rely on the fact that the character size of printed type is generally almost constant. However, noise adhering to printed characters, or faint printing, can change the apparent character size, so size variation can appear even in printed type, and a feature quantity representing size variation may fail to separate printed type from handwritten characters clearly. In addition, when the character group under judgment contains many characters such as "1" or "-" whose width, height, or area differs greatly from that of other characters, or when the group contains only a few characters, the variation in size and area of printed type can become large, which makes judgment by that variation difficult.
[0004]
Prior art 3 gives a sound judgment when the recognition method suited to printed type is likely to reject handwritten characters and the method suited to handwritten characters is likely to reject printed type. Depending on the recognition methods, however, no such difference in reject probability can be expected at all, and when characters made unrecognizable by noise or faint printing are mixed in, recognition with the method suited to the wrong character type may even yield fewer rejected characters.
[0005]
The stroke directionality and straightness feature quantities used in prior art 4 are effective when the recognition targets are characters with many straight strokes, such as kanji, or when the target character group contains many characters and therefore many straight strokes. For numerals, lowercase letters of the alphabet, hiragana, and other characters with few straight components, however, judgment is difficult.
[0006]
[Problems to be solved by the invention]
As described above, the conventional techniques have the problem that judgment based on a single feature is often difficult because of character degradation due to noise or faint printing, an insufficient number of characters, or exceptional cases such as the size of the characters under judgment.
[0007]
The present invention has been made in view of the above circumstances. Its object is to provide a character recognition method and a character recognition device capable of highly reliable character recognition by accurately determining whether the characters to be read, cut out from an input image, are printed type or handwritten.
[0008]
[Means for Solving the Problems]
The present invention is characterized by a character recognition method and a character recognition device that judge whether characters are printed type or handwritten by combining a plurality of feature quantities, none of which permits an accurate judgment on its own, so that the judgment is more accurate than with a single feature quantity.
[0009]
The present invention is further characterized in that it determines whether the characters to be read, cut out from the input image, are printed type, handwritten, or of a character type that cannot be decided, and, according to the result, performs recognition using character recognition means suited to printed type, character recognition means suited to handwritten characters, or character recognition means applicable to both. Loss of accuracy caused by a wrong character-type determination is thereby prevented even when the character type is hard to determine.
[0010]
That is, the present invention is a character recognition device that determines from feature quantities whether a character group extracted from an input image is handwritten or printed type and executes character recognition processing based on that determination, comprising: feature extraction means for extracting, from the character group extracted from the input image, a plurality of feature quantities used to distinguish printed type from handwritten characters; character type determination means which, for each feature quantity extracted by the feature extraction means, takes as the function value for that feature quantity a function of its value when the value can be obtained, or a constant predetermined for that feature quantity when the value is undefined, obtains from these function values an evaluation value used to distinguish printed type from handwritten characters, and judges the character group to be handwritten when the evaluation value is less than a predetermined threshold and printed type when it is equal to or greater than the threshold; handwritten character recognition means for recognizing the characters extracted by the character extraction means when the determination result is handwritten; and printed type recognition means for recognizing the characters extracted by the character extraction means when the determination result is printed type. Combining a plurality of feature quantities in this way permits a more accurate determination than using a single feature quantity. In addition, the mechanism that allows feature quantities whose values may be undefined to enter the evaluation formula makes it possible to incorporate more feature quantities and hence to make a more accurate determination.
[0011]
The present invention is also a character recognition device that determines from feature quantities whether a character group extracted from an input image is handwritten or printed type and executes character recognition processing based on that determination, comprising: preprocessing means for performing noise removal and binarization on the input image and cutting out a recognition target region; character extraction means for extracting a character group from the recognition target region cut out by the preprocessing means; feature extraction means for extracting, from the character group extracted by the character extraction means, one or more feature quantities used to distinguish printed type from handwritten characters; character type determination means for determining, using the extracted feature quantities, whether the character group is printed type, handwritten, or of an unknown character type for which it cannot be decided whether it is printed type or handwritten; handwritten character recognition means for recognizing the extracted characters, using a dictionary corresponding to the determination result, when the result is handwritten; printed type recognition means for recognizing the extracted characters, using a dictionary corresponding to the determination result, when the result is printed type; and unknown character type recognition means for recognizing the extracted characters, using a dictionary corresponding to the determination result, when the result is the unknown character type. By applying character recognition means (a recognition dictionary) applicable to both printed type and handwritten characters when the character type is hard to determine, loss of accuracy caused by a wrong character-type determination can be prevented even in that case.
[0012]
DETAILED DESCRIPTION OF THE INVENTION
The present invention makes an accurate determination by combining a plurality of feature quantities that do not permit accurate determination individually, and it also uses in the determination feature quantities that may take abnormal values or that may be impossible to extract.
[0013]
As shown in FIG. 1, the first embodiment of the present invention has components including preprocessing means 101, character extraction means 102, feature extraction means 103, character type determination means 104, handwritten character recognition means 105, and printed type recognition means 106.
[0014]
The preprocessing means 101 performs noise removal, binarization, form removal, and the like on the input image and cuts out a recognition target region. The character extraction means 102 extracts the character group to be recognized from the recognition target region cut out by the preprocessing means 101. The feature extraction means 103 extracts, from the character group extracted by the character extraction means 102, one or more feature quantities for distinguishing printed type from handwritten characters.
[0015]
The character type determination means 104 calculates, as a function of the feature quantities extracted by the feature extraction means 103, an evaluation value for distinguishing handwritten characters from printed type, and compares it with a predetermined threshold to decide between "printed type" and "handwritten". If the result is "printed type", the printed type recognition means 106 recognizes the extracted character group; if the result is "handwritten", the handwritten character recognition means 105 recognizes it.
[0016]
Further, as shown in FIG. 10, the second embodiment of the present invention has an unknown character type recognition means 207 in addition to the components of the first embodiment shown in FIG.
[0017]
The character type determination means 204 calculates, as a function of the feature quantities extracted by the feature extraction means 203, an evaluation value for distinguishing handwritten characters from printed type, and compares it with predetermined thresholds to decide among "printed type", "handwritten", and "character type for which it is unknown whether it is printed type or handwritten". The extracted character group is then recognized by the printed type recognition means 206 if the result is "printed type", by the handwritten character recognition means 205 if it is "handwritten", and by the unknown character type recognition means 207 if it is "unknown character type".
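The three-way decision can be sketched as follows. This is a minimal illustration, not the patented implementation: paragraph [0047] of the original text compares the evaluation value with an upper and a lower threshold, and the names `theta_h`, `theta_p`, `classify_character_type`, and `recognize`, as well as the dictionary of recognizer callables, are assumptions introduced here.

```python
def classify_character_type(s, theta_h, theta_p):
    """Three-way decision on an evaluation value s.

    theta_h and theta_p are assumed names for the lower (handwritten) and
    upper (printed type) thresholds, with theta_h <= theta_p.
    """
    if theta_h > theta_p:
        raise ValueError("theta_h must not exceed theta_p")
    if s > theta_p:
        return "printed"
    if s < theta_h:
        return "handwritten"
    # theta_h <= s <= theta_p: the type cannot be decided.
    return "unknown"


def recognize(chars, s, theta_h, theta_p, recognizers):
    """Send the character group to the recognizer chosen by the decision.

    recognizers is a hypothetical mapping from the three labels to callables
    standing in for means 205, 206, and 207.
    """
    kind = classify_character_type(s, theta_h, theta_p)
    return kind, recognizers[kind](chars)
```

Setting `theta_h` equal to `theta_p` shrinks the "unknown" band to a single value, which approximates the two-way decision of the first embodiment.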
[0018]
The character type determination means 104 and 204 calculate the evaluation value using the formulas given later as Equations (3) and (6), as a function $f(g_1(u_1), \ldots, g_N(u_N))$ of functions $g_i(u_i)$ of the individual feature quantities $u_i$. For a feature quantity $u_i$ whose value is undefined in those formulas, a constant is used in place of $g_i(u_i)$, so that the evaluation value can be calculated even when some feature quantities are undefined.
[0019]
The embodiments of the present invention are described below with specific examples. Because the processing functions and procedures of the invention can be realized by installing character recognition software on a general-purpose computer, the description below assumes such a configuration. The invention may, however, also be configured as a set of dedicated hardware units or as a networked system of computers for distributed processing. Any implementation that includes all of the means described above may be used, without being limited to the configurations mentioned here.
[0020]
First, the first embodiment of the present invention will be described with reference to FIGS. 1 to 9.
[0021]
In FIG. 1, the preprocessing means 101 applies image processing to the input image (a document image read by a scanner), such as cutting out the image around the field to be processed, binarization, noise removal, and form removal. When the recognition target is characters on a business form, form elements such as ruled lines and preprinted text are also removed.
[0022]
The character extraction means 102 extracts, from the output image of the preprocessing means 101, the image and position information of each character in the character group to be recognized.
[0023]
The feature extraction means 103 extracts, for each character group extracted by the character extraction means 102, one or more types of feature quantity used to distinguish printed type from handwritten characters. Depending on the feature quantity and the input, the value may be unobtainable or meaningless. In such cases "undefined" is returned as the extraction result.
[0024]
Examples of feature quantities to extract are the following three, each described below: $u_1$, representing "deviation from the alignment of the character rectangle edges"; $u_2$, representing "mismatch between the shapes of identical characters"; and $u_3$, representing "difference in the number of rejected characters between character recognition methods".
[0025]
These three feature quantities $u_1$, $u_2$, and $u_3$ are used here. The methods for extracting them are described next.
[0026]
First, the method of extracting the feature quantity $u_1$, representing deviation from the alignment of the character rectangle edges, is described with reference to FIGS. 4 and 5.
[0027]
The feature quantity $u_1$, representing deviation from the alignment of the character rectangle edges, is obtained as follows. Let $N$ be the number of characters in the character group, $y_{tn}$ the Y coordinate of the upper edge of the circumscribed rectangle of the $n$-th character, and $y_{bn}$ the Y coordinate of the lower edge of the circumscribed rectangle of the $n$-th character. For $y_t$ and $y_b$ given by
[Equation 1]
(equation image not reproduced)
[Equation 2]
(equation image not reproduced)
the value of $\alpha$ that minimizes $y_b - y_t$ is found by the gradient method, and $u_1$ is obtained by calculating
[Equation 3]
(equation image not reproduced)
where $\mathrm{median}(x_n)$ is the median of $x_n$ over all $n$.
[0028]
With $y_t$, $y_b$, and $\alpha$ obtained as above, the alignment of the upper and lower edges of the characters is approximated by
[Equation 4]
$$y = n\alpha + y_t$$
[Equation 5]
$$y = n\alpha + y_b$$
(see FIG. 4). Hence $|y_{tn} - (n\alpha + y_t)|$ and $|y_{bn} - (n\alpha + y_b)|$ represent the deviations of the upper and lower edges of the $n$-th character from the character alignment, and $u_1$ serves as a measure of the deviation from the alignment of the character rectangle edges (see FIG. 5).
[0029]
Equation (3) always evaluates to 0 when $N = 1$, regardless of whether the characters are printed type or handwritten, so $u_1$ is set to "undefined" when $N = 1$.
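The per-character deviations described above, and the "undefined" case for a single character, can be sketched as follows. Because the images of Equations (1) to (3) are not reproduced in this text, the sketch takes $\alpha$, $y_t$, and $y_b$ as given inputs, and the final aggregation (a mean of the summed deviations) is only an assumption standing in for Equation (3). The function names and the 1-based character index are likewise illustrative.

```python
def edge_deviations(tops, bottoms, alpha, y_t, y_b):
    """Deviation of each box top/bottom from the lines n*alpha + y_t and n*alpha + y_b."""
    top_dev = [abs(t - (n * alpha + y_t)) for n, t in enumerate(tops, start=1)]
    bottom_dev = [abs(b - (n * alpha + y_b)) for n, b in enumerate(bottoms, start=1)]
    return top_dev, bottom_dev


def alignment_feature(tops, bottoms, alpha, y_t, y_b):
    """Illustrative stand-in for u_1; returns None ("undefined") when N <= 1."""
    if len(tops) != len(bottoms):
        raise ValueError("tops and bottoms must have the same length")
    if len(tops) <= 1:
        return None
    top_dev, bottom_dev = edge_deviations(tops, bottoms, alpha, y_t, y_b)
    # Assumption: aggregate with the mean; the patent's Equation (3) may differ.
    return sum(t + b for t, b in zip(top_dev, bottom_dev)) / len(tops)
```

Perfectly aligned boxes give 0, and the value grows as the tops and bottoms scatter around the fitted lines, which is the behavior expected of handwriting.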
[0030]
Next, the feature quantity $u_2$, representing mismatch between the shapes of identical characters, is described with reference to FIGS. 6 to 9.
[0031]
The feature quantity $u_2$, representing mismatch between the shapes of identical characters, is obtained by
[Equation 6]
(equation image not reproduced)
where $C$ is the number of character classes, $c$ is a character class index, $c_n$ is the character class index of the recognition result for the $n$-th character, $a_n$ is the number of foreground pixels in the image of the $n$-th character, and $v_{m,n}$ is the number of pixels that are black in both images when the images of the $n$-th and $m$-th characters are superimposed with their upper-left corners aligned as shown in FIG. 6.
[0032]
Here the two character images are superimposed with their upper-left corners aligned, but they may instead be superimposed with their centroids or the centers of their circumscribed rectangles aligned as shown in FIG. 7, with the centers of the upper sides of their circumscribed rectangles (FIG. 5) aligned as shown in FIG. 8, or with the centers of the lower sides aligned as shown in FIG. 9. If the character group does not contain the same character more than once, Equation (6) cannot be calculated, so $u_2$ is set to "undefined".
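The quantities that enter Equation (6) can be sketched as follows, assuming binary images stored as nested lists with 1 for a black (foreground) pixel. The sketch computes only $a_n$ and $v_{m,n}$ with upper-left alignment and finds the pairs of characters that share a recognition result. It does not reproduce Equation (6) itself, whose image is not available here, and all function names are illustrative.

```python
def foreground_count(image):
    """a_n: number of foreground (black, value 1) pixels in a binary image."""
    return sum(sum(row) for row in image)


def overlap_count(image_m, image_n):
    """v_{m,n}: pixels black in both images when upper-left corners are aligned.

    zip() stops at the shorter row or column, so only the area shared by
    the two images is compared.
    """
    return sum(
        pm & pn
        for row_m, row_n in zip(image_m, image_n)
        for pm, pn in zip(row_m, row_n)
    )


def same_character_pairs(labels):
    """Index pairs (m, n), m < n, whose recognition results are identical.

    An empty result corresponds to the case where u_2 is "undefined".
    """
    return [
        (m, n)
        for m in range(len(labels))
        for n in range(m + 1, len(labels))
        if labels[m] == labels[n]
    ]
```

For printed type, two occurrences of the same character overlap almost completely, so `overlap_count` approaches `foreground_count` of either image. For handwriting the overlap is typically much smaller.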
[0033]
Next, the feature quantity $u_3$, representing the difference in the number of rejected characters between character recognition methods, is described. Some or all of the characters in the character group are recognized by two character recognition methods, one suited to printed type and one suited to handwritten characters. $u_3$ is then obtained from the number of characters rejected by the method suited to printed type, $r_p$, and the number rejected by the method suited to handwritten characters, $r_h$, by the following equation.
[0034]
[Equation 7]
(equation image not reproduced)
[0035]
The recognition results produced in order to obtain the rejected character counts $r_p$ and $r_h$ may simply be discarded, or they may be kept and reused as cached data by the handwritten character recognition means 105 and the printed type recognition means 106.
[0036]
Three feature quantities have been given above as examples of what the feature extraction means 103 extracts. Any other quantity may be used, however, provided it can be considered to reflect the difference between printed type and handwritten characters, and the number of feature types is arbitrary.
[0037]
The character type determination means 104 obtains, from the features extracted by the feature extraction means 103, an evaluation value $s$ indicating whether the character group extracted by the character extraction means 102 is printed type or handwritten. When $s$ is larger than a predetermined threshold $\theta$, the character group is judged to be printed type; when $s$ is equal to or smaller than $\theta$, it is judged to be handwritten.
[0038]
Letting $d$ be the number of feature quantities and $u_i$ the $i$-th feature quantity, the evaluation value $s$ is obtained from a function $f$ and a family of functions $g_i$ as
[Equation 8]
$$s = f(g_1(u_1), \ldots, g_d(u_d))$$
[0039]
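For $g_i(u_i)$,
[Equation 9]
$$g_i(u_i) = \begin{cases} u_i & \text{if } u_i \text{ is defined} \\ c_i & \text{if } u_i \text{ is undefined} \end{cases}$$
can be used, where $c_i$ is a predetermined constant. FIG. 2 is a conceptual diagram of the function family $g_i$, which selectively uses the constant $c_i$.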
[0040]
As the function f, a linear combination of g_i(u_i) can be used:
[Equation 10]
$$ s = \sum_{i=1}^{d} w_i \, g_i(u_i) $$
Here, w_i is a predetermined constant.
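The linear-combination evaluation can be illustrated with a minimal sketch. It assumes that an indeterminate feature quantity is represented as `None` and that g_i passes a determined value through unchanged while substituting the constant c_i otherwise; the function names are hypothetical.

```python
from typing import Optional, Sequence


def g(u: Optional[float], c: float) -> float:
    """Return u when it is determined, otherwise the predetermined constant c."""
    return c if u is None else u


def evaluation_value(
    u: Sequence[Optional[float]],
    c: Sequence[float],
    w: Sequence[float],
) -> float:
    """Linear combination s = sum_i w_i * g_i(u_i)."""
    if not (len(u) == len(c) == len(w)):
        raise ValueError("u, c and w must have the same length")
    return sum(wi * g(ui, ci) for ui, ci, wi in zip(u, c, w))


def is_print_type(s: float, theta: float) -> bool:
    """First-embodiment rule: print type when s > theta, handwritten otherwise."""
    return s > theta
```

For example, with u = [0.5, None, 2.0], c = [0.0, 1.0, 0.0] and w = [2.0, 3.0, -1.0], the second feature is indeterminate, so its constant 1.0 is used and s = 1.0 + 3.0 - 2.0 = 2.0.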
[0041]
In the above example a linear combination of g_i(u_i) is used as f, but f may be any function of g_i(u_i), for example the quadratic form
[Equation 11]
$$ s = \mathbf{g}^{\mathsf{T}} W \mathbf{g} $$
[0042]
where
[Expression 12]
$$ \mathbf{g} = \bigl(g_1(u_1), g_2(u_2), \ldots, g_d(u_d)\bigr)^{\mathsf{T}} $$
and W is a predetermined symmetric matrix.
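A quadratic-form evaluation with a symmetric matrix W could be computed as in the sketch below. The function name and the plain-list representation of W are assumptions made for illustration; the inputs are the values g_i(u_i) after any indeterminate features have already been replaced by their constants.

```python
from typing import Sequence


def quadratic_evaluation(g: Sequence[float], W: Sequence[Sequence[float]]) -> float:
    """Quadratic form s = sum_i sum_j g_i * W[i][j] * g_j for a symmetric W."""
    d = len(g)
    if len(W) != d or any(len(row) != d for row in W):
        raise ValueError("W must be a d x d matrix")
    if any(W[i][j] != W[j][i] for i in range(d) for j in range(i)):
        raise ValueError("W must be symmetric")
    return sum(g[i] * W[i][j] * g[j] for i in range(d) for j in range(d))
```

Unlike the linear combination, the off-diagonal entries of W let the evaluation value depend on pairs of feature quantities jointly.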
[0043]
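Alternatively, as g_i(u_i), the following can be used:
[Formula 13]
$$ g_i(u_i) = \begin{cases} h_i(u_i) & \text{if the value of } u_i \text{ is determined} \\ c_i & \text{if the value of } u_i \text{ is indeterminate} \end{cases} $$
FIG. 3 is a conceptual diagram of the function group g_i in this case, which selectively uses the constants c_i.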
[0044]
As h_i, for example,
[Expression 14]
Figure 0003914119
can be used, but any function of u_i may be used. Here, β_i and γ_i are predetermined constants.
[0045]
When the character type determination means 104 determines that the characters are handwritten, the characters extracted by the character extraction means 102 are recognized by the handwritten character recognition means 105; when it determines that they are print type, they are recognized by the print type recognition means 106. The recognition result is then output. The handwritten character recognition means 105 and the print type recognition means 106 have different dictionary contents and recognition algorithms, each including a learning function.
A supplementary explanation is given here of the cases in which a feature quantity is "indeterminate" and how this is decided.
When a feature quantity cannot be computed from its formula, as in the following two examples, its value is treated as "indeterminate".
Example 1: The feature quantity u_1 of equation (3) above cannot be calculated when N = 1, so its value is indeterminate.
Example 2: The feature quantity u_2 of equation (6) above cannot be calculated when no characters of the same shape appear in the character string, so its value is indeterminate.
[0046]
Next, a second embodiment of the present invention will be described. As shown in FIG. 10, the second embodiment comprises preprocessing means 201, character extraction means 202, feature extraction means 203, character type determination means 204, handwritten character recognition means 205, print type recognition means 206, and unknown character type recognition means 207. The preprocessing means 201, the character extraction means 202, the feature extraction means 203, the handwritten character recognition means 205, and the print type recognition means 206 have the same functional configuration as in the first embodiment shown in FIG. 1, so a detailed description of their operation is omitted.
[0047]
The character type determination means 204 uses the feature quantities extracted by the feature extraction means 203 to determine whether the character group extracted by the character extraction means 202 is print type, handwritten, or of unknown character type, meaning that it cannot be decided whether the characters are print type or handwritten. Specifically, the character type determination means 204 obtains from those feature quantities an evaluation value s indicating whether the character group is print type or handwritten. It determines that the character group is print type when s is larger than a predetermined threshold θ_p, and handwritten when s is smaller than a predetermined threshold θ_h. When s ≤ θ_p and s ≥ θ_h, it determines that the character type is unknown.
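The three-way decision of the second embodiment can be sketched as follows. The enum and function names are hypothetical, and the check that θ_h does not exceed θ_p is an added assumption so that the three cases cannot overlap.

```python
from enum import Enum


class CharType(Enum):
    PRINT = "print"
    HANDWRITTEN = "handwritten"
    UNKNOWN = "unknown"


def classify(s: float, theta_p: float, theta_h: float) -> CharType:
    """Print type if s > theta_p, handwritten if s < theta_h, otherwise unknown."""
    if theta_h > theta_p:
        raise ValueError("theta_h must not exceed theta_p")
    if s > theta_p:
        return CharType.PRINT
    if s < theta_h:
        return CharType.HANDWRITTEN
    # theta_h <= s <= theta_p: the character type cannot be decided
    return CharType.UNKNOWN
```

With θ_p = 0.7 and θ_h = 0.3, an evaluation value of 0.9 yields print type, 0.1 yields handwritten, and any value from 0.3 to 0.7 inclusive yields unknown.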
[0048]
When the character type determination means 204 determines that the characters are handwritten, the characters extracted by the character extraction means 202 are recognized by the handwritten character recognition means 205; when it determines that they are print type, they are recognized by the print type recognition means 206; and when it determines that the character type is unknown, they are recognized by the unknown character type recognition means 207. The recognition result is then output. In the second embodiment as well, the handwritten character recognition means 205, the print type recognition means 206, and the unknown character type recognition means 207 have different dictionary contents and recognition algorithms, each including a learning function.
[0049]
[Effects of the Invention]
As described above in detail, the present invention enables highly reliable character recognition by accurately determining whether the characters to be read, cut out from an input image, are print type or handwritten. That is, by combining a plurality of feature quantities, none of which permits an accurate determination on its own, the determination is more accurate than when a single feature quantity is used. In addition, by introducing a mechanism that allows feature quantities whose values may be indeterminate to be incorporated into the formula for the evaluation value, more feature quantities can be used and the determination becomes more accurate. Furthermore, when the character type is difficult to determine, a character recognition method applicable to both print type and handwritten characters is applied, which prevents the loss of accuracy that character type determination errors would otherwise cause.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a main part in a first embodiment of the present invention.
FIG. 2 is a conceptual diagram of (Equation 9) in the embodiment.
FIG. 3 is a conceptual diagram of (Equation 13) in the embodiment.
FIG. 4 is a diagram for explaining a method of extracting a feature quantity u 1 in the embodiment.
FIG. 5 is a diagram for explaining a method for extracting a feature amount u 1 in the embodiment.
FIG. 6 is a diagram for explaining a method of extracting a feature amount u 2 in the embodiment.
FIG. 7 is a view for explaining a method for extracting a feature quantity u 2 in the embodiment.
FIG. 8 is a diagram for explaining a method of extracting a feature quantity u 2 in the embodiment.
FIG. 9 is a view for explaining a method for extracting a feature quantity u 2 in the embodiment.
FIG. 10 is a block diagram showing a configuration of a main part in a second embodiment of the present invention.
[Explanation of symbols]
101, 201 ... Preprocessing means
102, 202 ... Character extraction means
103, 203 ... Feature extraction means
104, 204 ... Character type determination means
105, 205 ... Handwritten character recognition means
106, 206 ... Print type recognition means
207 ... Unknown character type recognition means

Claims (2)

A character recognition method for a character recognition device comprising: preprocessing means for cutting out a recognition target region from an input image; character extraction means for extracting a character group from the recognition target region cut out by the preprocessing means; feature extraction means for extracting, from the character group extracted by the character extraction means, a plurality of feature quantities used to distinguish print type from handwritten characters; character type determination means for determining, using the feature quantities extracted by the feature extraction means, whether the extracted character group is print type or handwritten; handwritten character recognition means for recognizing the characters extracted by the character extraction means when the character type determination means determines that they are handwritten; and print type recognition means for recognizing the characters extracted by the character extraction means when the character type determination means determines that they are print type, wherein
the feature quantities extracted by the feature extraction means include
a feature quantity 1 extracted based on an amount representing the deviation of the edges of the character rectangles of the character group extracted by the character extraction means from their alignment, and
a feature quantity 2 extracted, for the character group extracted by the character extraction means, using an amount representing the mismatch in shape between occurrences of the same character, and
the character type determination means,
for each feature quantity extracted by the feature extraction means, takes a function of its value as the function value corresponding to that feature quantity when the value can be obtained; takes a predetermined constant for the feature quantity 1 as the corresponding function value when the number N of characters in the character group extracted by the character extraction means is N = 1; takes a predetermined constant for the feature quantity 2 as the corresponding function value when the character group extracted by the character extraction means contains no characters of the same shape; obtains an evaluation value from these function values; and determines that the character group is handwritten when the evaluation value is less than a predetermined threshold and print type when the evaluation value is equal to or greater than the threshold.
A character recognition device comprising:
preprocessing means for performing noise removal and binarization on an input image and cutting out a recognition target region;
character extraction means for extracting a character group from the recognition target region cut out by the preprocessing means;
feature extraction means for extracting, from the character group extracted by the character extraction means, a plurality of feature quantities used to distinguish print type from handwritten characters;
character type determination means for determining, using the feature quantities extracted by the feature extraction means, whether the extracted character group is print type or handwritten;
handwritten character recognition means for recognizing the characters extracted by the character extraction means when the determination result of the character type determination means is handwritten; and
print type recognition means for recognizing the characters extracted by the character extraction means when the determination result of the character type determination means is print type, wherein
the feature extraction means comprises
means for extracting a feature quantity based on an amount representing the deviation of the edges of the character rectangles of the character group extracted by the character extraction means from their alignment, and
means for extracting a feature quantity, for the character group extracted by the character extraction means, using an amount representing the mismatch in shape between occurrences of the same character, and
the character type determination means,
for each feature quantity extracted by the feature extraction means, takes a function of its value as the function value corresponding to that feature quantity when the value can be obtained; takes a predetermined constant for the feature quantity 1 as the corresponding function value when the number N of characters in the character group extracted by the character extraction means is N = 1; takes a predetermined constant for the feature quantity 2 as the corresponding function value when the character group extracted by the character extraction means contains no characters of the same shape; obtains an evaluation value from these function values; and determines that the character group is handwritten when the evaluation value is less than a predetermined threshold and print type when the evaluation value is equal to or greater than the threshold.
JP2002256913A 2002-09-02 2002-09-02 Character recognition method and character recognition device Expired - Fee Related JP3914119B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2002256913A JP3914119B2 (en) 2002-09-02 2002-09-02 Character recognition method and character recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2002256913A JP3914119B2 (en) 2002-09-02 2002-09-02 Character recognition method and character recognition device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
JP2006001484A Division JP2006107534A (en) 2006-01-06 2006-01-06 Character recognizing method and character recognizing device

Publications (2)

Publication Number Publication Date
JP2004094734A JP2004094734A (en) 2004-03-25
JP3914119B2 JP3914119B2 (en) 2007-05-16

Family

ID=32061995

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2002256913A Expired - Fee Related JP3914119B2 (en) 2002-09-02 2002-09-02 Character recognition method and character recognition device

Country Status (1)

Country Link
JP (1) JP3914119B2 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8649600B2 (en) * 2009-07-10 2014-02-11 Palo Alto Research Center Incorporated System and method for segmenting text lines in documents
JP2013085546A (en) * 2011-10-24 2013-05-13 Nikon Corp Apparatus, method and program for processing image
JP2016182142A (en) * 2016-07-05 2016-10-20 株式会社ニコン Image processor, image processing method, and program

Also Published As

Publication number Publication date
JP2004094734A (en) 2004-03-25

Similar Documents

Publication Publication Date Title
US7336813B2 (en) System and method of determining image skew using connected components
US8023741B2 (en) Methods and systems for detecting numerals in a digital image
Aradhye A generic method for determining up/down orientation of text in roman and non-roman scripts
JPH09179937A (en) Automatic identification method of sentence image boundaries
JPH05242292A (en) Separating method
JP2011018338A (en) Method and system for classifying connected group of foreground pixel in scanned document image according to type of marking
US20110311161A1 (en) Methods and Systems for Identifying the Orientation of a Digital Image
US6690824B1 (en) Automatic recognition of characters on structured background by combination of the models of the background and of the characters
JP4280355B2 (en) Character recognition device
Yin Skew detection and block classification of printed documents
US7146047B2 (en) Image processing apparatus and method generating binary image from a multilevel image
JP3914119B2 (en) Character recognition method and character recognition device
JP2006107534A (en) Character recognizing method and character recognizing device
Tse et al. An OCR-independent character segmentation using shortest-path in grayscale document images
Sánchez et al. Automatic line and word segmentation applied to densely line-skewed historical handwritten document images
JP2000322514A (en) Pattern extraction device and character segmentation device
CN113971802B (en) Character segmentation device and method
Razak et al. A real-time line segmentation algorithm for an offline overlapped handwritten Jawi character recognition chip
Rani et al. A Generic Line Elimination Methodology using Circular Masks for Printed and Handwritten Document Images
WO2006080568A1 (en) Character reader, character reading method, and character reading control program used for the character reader
JP2827960B2 (en) Address line extraction device
JP6098065B2 (en) Image inspection apparatus, image inspection method, and program
Pun et al. A Survey on Change Detection Techniques in Document Images
JP4304920B2 (en) Character string recognition device and its program
JPH05298487A (en) Alphabet recognizing device

Legal Events

Date Code Title Description
A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20051019

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20051108

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20060106

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A712

Effective date: 20060106

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20060110

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A711

Effective date: 20060328

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20060328

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20061003

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20061204

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20070130

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20070201

R150 Certificate of patent or registration of utility model

Ref document number: 3914119

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20100209

Year of fee payment: 3

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20110209

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20120209

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20130209

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140209

Year of fee payment: 7

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

LAPS Cancellation because of no payment of annual fees