
CN1300740C - Postal code digit string recognition method


Info

Publication number
CN1300740C
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2005100235506A
Other languages
Chinese (zh)
Other versions
CN1645408A (en)
Inventor
吕岳
邬建中
文颖
原晓梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI INST OF POSTAL SCIENCE
Original Assignee
SHANGHAI INST OF POSTAL SCIENCE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2005-01-25
Filing date
2005-01-25
Publication date
2007-02-14
Application filed by SHANGHAI INST OF POSTAL SCIENCE
Priority to CNB2005100235506A
Publication of CN1645408A
Application granted
Publication of CN1300740C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Character Discrimination (AREA)

Abstract

A method for recognizing postal code digit strings, comprising the following steps: the images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of a sequence of N postal code characters are each input to K independent single-character classifiers $e_k$; each classifier $e_k$ either recognizes an input character image $x_n$ as one of the classes $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejects it; the probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is $m'$ is calculated; from these probabilities, the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated, where D is a valid postal code in the postal code dictionary Ω; and the recognition result of the input pattern is decided according to $p(D \mid X)$. The voting rule of the method takes the characteristics of each classifier into account and so exploits the strengths of each. Prior knowledge of each classifier's recognition performance is obtained from statistics over a large number of samples and used as the basis for voting, so that the combined result reaches a high recognition rate and high confidence, improving the accuracy of postal code digit string recognition.

Description

Postal code digit string recognition method
Technical field
The present invention relates to a method for recognizing postal code digit strings.
Background art
After decades of development, optical character recognition is gradually becoming practical, yet users still expect recognition systems to perform better. To improve the recognition rate and confidence, there is a growing tendency to build high-performance recognition systems by combining multiple information sources, multiple feature extraction methods, and multiple recognition methods.
A simple existing way to combine multiple classifiers for postal code digit strings is voting, for example the majority voting rule and the unanimity rule. These voting rules, however, ignore the characteristics of the individual classifiers and follow the principle of "one classifier, one vote". In practice, the classifiers use different features, rest on different principles and methods, and may be trained on different samples, so their recognition performance differs and they are complementary to some degree; that is, each classifier differs in its ability to recognize each class.
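For reference, the two baseline voting rules named above can be sketched in Python as follows. This is only an illustration of "one classifier, one vote"; the function names and the use of `None` to mark a rejection are assumptions made for this sketch and do not come from the patent.

```python
from collections import Counter
from typing import Optional, Sequence

REJECT = None  # assumed marker for "the classifier rejected the input"


def majority_vote(votes: Sequence[Optional[int]]) -> Optional[int]:
    """Return the label output by more than half of all classifiers, else reject."""
    counts = Counter(v for v in votes if v is not REJECT)
    if not counts:
        return REJECT
    label, n = counts.most_common(1)[0]
    return label if n > len(votes) / 2 else REJECT


def unanimous_vote(votes: Sequence[Optional[int]]) -> Optional[int]:
    """Return the label only if every classifier outputs it, else reject."""
    first = votes[0] if votes else REJECT
    if first is not REJECT and all(v == first for v in votes):
        return first
    return REJECT
```

Both rules weigh every classifier equally, regardless of how reliable it is on a given digit, which is the limitation the paragraph above describes.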
General multi-classifier combination is concerned with combining the recognition results for a single character, with the aim of optimizing single-character recognition. Its principle is shown in Fig. 1: after the sample to be recognized, $X_n$, is processed by K classifiers, K recognition results $S_n^{(k)}$ ($k = 1, 2, \ldots, K$) are obtained, and a combination decision over these results yields the final recognition result $C_n$. Such a combination does not take the context of the character string into account. Instead, the sequence of combined results for the characters of the string, $(C_1, \ldots, C_n, \ldots, C_N)$, is sent to a dictionary, which checks whether the recognized string is valid, as shown in Fig. 2.
In some practical applications, the goal is the best recognition of the character string as a whole rather than of each single character, because the best result for each single character does not necessarily give the best result for the whole string. In postal code recognition, for example, a code can be used by an automatic mail sorting machine only if all six digits are recognized correctly, so it is the recognition of the whole postal code digit string that must be optimized.
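A small numeric illustration of this point, written in Python. All scores and dictionary entries below are hypothetical values chosen for the example, and a 3-character string is used instead of 6 digits to keep it short.

```python
from math import prod

# Hypothetical per-position scores; digits not listed are treated as score 0.
scores = [
    {"2": 0.9, "7": 0.1},
    {"0": 0.55, "6": 0.45},
    {"1": 0.6, "7": 0.4},
]
dictionary = {"261", "207"}  # hypothetical set of valid codes

# Character-level optimum: pick the best digit at each position independently.
per_char = "".join(max(s, key=s.get) for s in scores)


def string_score(code: str) -> float:
    """Product of the per-position scores of the digits in `code`."""
    return prod(s.get(d, 0.0) for s, d in zip(scores, code))


# String-level optimum: score only valid dictionary entries as whole strings.
best_valid = max(dictionary, key=string_score)
```

Here the per-character choice is "201", which is not in the dictionary, while scoring whole dictionary entries selects "261". Checking the string against a dictionary only after per-character decisions could do no more than reject it.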
Summary of the invention
The object of the present invention is to provide a postal code digit string recognition method based on knowledge-base-driven combination of multiple classifiers.
To achieve this object, the present invention adopts the following technical scheme.
A postal code digit string recognition method comprises the following steps:
(1) The images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of a string of N postal code characters are each input to K independent single-character classifiers $e_k$, where N and K are positive integers greater than 1. For Chinese postal code digit strings, N = 6.
(2) Each single-character classifier $e_k$ either recognizes the input character image $x_n$ as one of the classes $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejects it, rejection being denoted $c_{(M+1)}$, where M is a positive integer greater than 1. For postal codes, each class is one of the digits 0 to 9, so M = 10.
(3) The probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is $m'$ is calculated.
(4) From $P(x \in C_m \mid e_k(x) = m')$, the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated, where D is a valid postal code in the postal code dictionary Ω.
(5) The recognition result of the input pattern is decided according to the probability $p(D \mid X)$.
In a preferred embodiment of the present invention, the probability $P(x \in C_m \mid e_k(x) = m')$ in step (3) may be calculated as follows.
Sample statistics are gathered on the recognition results of single-character classifier $e_k$ to form its confusion matrix:
$$CM_k = \begin{pmatrix} n_{11}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1(M+1)}^{(k)} \\ \vdots & n_{ij}^{(k)} & \vdots & \vdots \\ n_{M1}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M(M+1)}^{(k)} \end{pmatrix}, \quad k = 1, 2, \ldots, K$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$. Specifically:
(a) when $m = m'$, it is the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) when $m' = M+1$, it is the number of samples of class $C_m$ that $e_k$ rejects;
(c) when $m \neq m'$ and $m' \neq M+1$, it is the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$.
The total number of samples for which the recognition result of $e_k$ is $m' = e_k(x)$ is:
$$n_{m'}^{(k)} = \sum_{i=1}^{M} n_{im'}^{(k)}, \quad m' = 1, 2, \ldots, M+1$$
Given that the recognition result of $e_k$ is $m'$, the probability that the sample comes from class $C_m$ is:
$$P(x \in C_m \mid e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{m'}^{(k)}} = \frac{n_{mm'}^{(k)}}{\sum_{i=1}^{M} n_{im'}^{(k)}}, \quad m' = 1, 2, \ldots, M$$
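The column normalization in the last formula can be sketched with NumPy as follows. The function name and the 0-based layout (rows are true classes, the last column holds rejections) are assumptions of this sketch. The source defines the probability only for $m' = 1, \ldots, M$; normalizing the rejection column in the same way, and returning zeros for an output that was never observed, are additional assumptions.

```python
import numpy as np


def posterior_from_confusion(cm) -> np.ndarray:
    """Turn a confusion matrix into column-wise conditional probabilities.

    cm has shape (M, M+1): cm[m, m'] counts samples of true class m that the
    classifier output as m'; the last column counts rejections.
    Returns post with post[m, m'] = cm[m, m'] / sum_i cm[i, m'].
    """
    cm = np.asarray(cm, dtype=float)
    col_totals = cm.sum(axis=0)  # n_{m'} for every classifier output m'
    post = np.zeros_like(cm)
    # Assumption: an output that never occurred keeps an all-zero column.
    np.divide(cm, col_totals, out=post, where=col_totals > 0)
    return post
```

For a two-class example, `posterior_from_confusion([[90, 5, 5], [10, 85, 5]])` gives 0.9 for the probability that a sample output as class 0 truly belongs to class 0, because 90 of the 100 samples with that output did.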
In another preferred embodiment of the present invention, the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated in step (4) from $P(x \in C_m \mid e_k(x) = m')$ as follows.
Assuming that the samples used to generate the confusion matrix $CM_k$ are sufficiently numerous and reflect the spatial distribution of the recognition results, $CM_k$ is used as prior knowledge when combining the classifiers; that is, $P(x \in C_m \mid e_k(x) = m')$ is used as the voting score, and the probability that $x \in C_m$ is expressed as:
$$s^{(k)}(x \in C_m) = P(x \in C_m \mid e_k(x) = m'), \quad m = 1, 2, \ldots, M$$
Let $f(D)$ denote the frequency with which postal code D occurs. The score of X for D is then calculated as:
$$s(d_n \mid x_n) = \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})$$
$$S(D \mid X) = \prod_{n=1}^{N} s(d_n \mid x_n) = \prod_{n=1}^{N} \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})$$
Finally, the probability that X belongs to D is:
$$p(D \mid X) = e^{f(D)} \cdot S(D \mid X)$$
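A minimal Python sketch of this score computation is given below. The function names and the data layout are assumptions of the sketch: `posteriors[k][m][m_out]` holds $P(x \in C_m \mid e_k(x) = m_{out})$ for classifier k, and `outputs_per_char[n][k]` is the output of classifier k on character n. The factor in front of the product is read as $e^{f(D)}$, and the average over K classifiers is applied at every position.

```python
import math
from typing import Sequence


def char_score(posteriors, outputs: Sequence[int], digit: int) -> float:
    """s(d_n | x_n): mean over the K classifiers of P(x in C_digit | e_k(x) = outputs[k])."""
    K = len(posteriors)
    return sum(posteriors[k][digit][outputs[k]] for k in range(K)) / K


def string_probability(posteriors, outputs_per_char, code: Sequence[int], freq: float) -> float:
    """p(D | X) = exp(f(D)) * prod_n s(d_n | x_n) for one candidate code D."""
    S = 1.0
    for outputs, digit in zip(outputs_per_char, code):
        S *= char_score(posteriors, outputs, digit)
    return math.exp(freq) * S
```

With two classifiers and two classes, for instance, a position where the classifiers output 1 and 0 and the candidate digit is 1 scores the mean of the first classifier's P(class 1 | output 1) and the second classifier's P(class 1 | output 0).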
In a further preferred embodiment of the present invention, the recognition result of the input pattern is decided in step (5) from the probability $p(D \mid X)$ as follows.
If there exists D in Ω such that $p(D \mid X)$ is the maximum among the recognition results and $p(D \mid X) > \alpha$, then X = D, that is, the recognition result is D. Here α is a threshold that sets the trade-off between rejection and misrecognition (α = 0.5).
If there exists D in Ω such that $p(D \mid X)$ is the maximum among the recognition results, and there exists D' in Ω whose value $p(D' \mid X)$ is second only to that maximum, and $p(D \mid X) - p(D' \mid X) > \beta$, where β is a constant (β = 0.2), then X = D, that is, the recognition result is D.
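The two decision rules can be sketched in Python as below. The text does not state how the rules are combined, so this sketch assumes that satisfying either rule is enough to accept the best candidate and that the string is rejected otherwise; the function name, the `None` return value for rejection, and treating a missing runner-up as probability 0 are likewise assumptions.

```python
from typing import Dict, Optional


def decide(p: Dict[str, float], alpha: float = 0.5, beta: float = 0.2) -> Optional[str]:
    """Pick a code from p, a map of dictionary code D -> p(D | X), or reject with None."""
    ranked = sorted(p.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    best, p_best = ranked[0]
    p_second = ranked[1][1] if len(ranked) > 1 else 0.0
    rule1 = p_best > alpha              # best candidate is confident enough
    rule2 = p_best - p_second > beta    # best candidate clearly beats the runner-up
    # Assumption: either rule suffices to accept the best candidate.
    return best if (rule1 or rule2) else None
```

Raising `alpha` or `beta` rejects more strings and lowers the misrecognition rate, which is the trade-off the thresholds are meant to control.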
The voting rule of the postal code digit string recognition method of the present invention takes the characteristics of each classifier into account and so exploits the strengths of each. Prior knowledge of each classifier's recognition performance is obtained from statistics over a large number of samples and used as the basis for voting, so that the combined recognition result reaches a high recognition rate and high confidence. This improves the accuracy of postal code digit string recognition.
Description of drawings
The present invention is further described below with reference to the drawings and an embodiment.
Fig. 1 is a block diagram of single-character recognition by multi-classifier combination in the prior art.
Fig. 2 is a block diagram of dictionary-based verification of the recognition result in the prior art.
Fig. 3 is a functional block diagram of the method of the present invention.
Embodiment
As shown in Fig. 3, the sequence to be recognized, $X = (x_1, \ldots, x_n, \ldots, x_N)$, is processed by the single-character classifiers $e_k$; a decision is then made using the dictionary and the frequency with which each code occurs, finally yielding the recognized sequence $(d_1, d_2, \ldots, d_N)$.
A postal code digit string recognition method comprises the following steps:
(1) The images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of a string of N postal code characters are input simultaneously to K independent single-character classifiers. For Chinese postal code digit strings, N = 6.
(2) Each single-character classifier $e_k$ recognizes the input character image $x_n$ and produces a recognition result: the classifier either assigns the input pattern to one of the classes $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejects it. For postal code digits, M = 10, that is, the recognition result may be any one of {0, 1, ..., 9}.
(3) The probability that the input pattern belongs to class $c_m$ when the recognition result is $m'$ is expressed as follows.
First, the recognition behavior of classifier $e_k$ is tallied over a large number of samples to form the confusion matrix of that classifier:
$$CM_k = \begin{pmatrix} n_{11}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1(M+1)}^{(k)} \\ \vdots & n_{ij}^{(k)} & \vdots & \vdots \\ n_{M1}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M(M+1)}^{(k)} \end{pmatrix}, \quad k = 1, 2, \ldots, K$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that classifier $e_k$ recognizes as class $C_{m'}$. Specifically:
(a) if $m = m'$, it is the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) if $m' = M+1$, it is the number of samples of class $C_m$ that $e_k$ rejects;
(c) if $m \neq m'$ and $m' \neq M+1$, it is the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$.
For classifier $e_k$, the total number of samples whose recognition result is $m' = e_k(x)$ is:
$$n_{m'}^{(k)} = \sum_{i=1}^{M} n_{im'}^{(k)}, \quad m' = 1, 2, \ldots, M+1$$
Given that the recognition result of classifier $e_k$ is $m'$, the probability that the sample comes from class $C_m$ can be expressed as a conditional probability:
$$P(x \in C_m \mid e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{m'}^{(k)}} = \frac{n_{mm'}^{(k)}}{\sum_{i=1}^{M} n_{im'}^{(k)}}, \quad m' = 1, 2, \ldots, M$$
If the samples used to generate the confusion matrix $CM_k$ are sufficiently numerous and reflect the distribution of the pattern space, the confusion matrix reflects the recognition behavior of classifier $e_k$. $CM_k$ is therefore used as prior knowledge when combining the classifiers; that is, $P(x \in C_m \mid e_k(x) = m')$ is used as the voting score, and the probability that $x \in C_m$ is expressed as:
$$s^{(k)}(x \in C_m) = P(x \in C_m \mid e_k(x) = m'), \quad m = 1, 2, \ldots, M$$
(4) calculate X and belong to a certain postcode character string D=(d 1, d 2..., d N) probability:
Suppose D=(d 1, d 2..., d N) be an effective postcode among the postcode dictionary library Ω, and suppose that for certain specific application scenario, the frequency that postcode D occurs is expressed as f (D).
X is calculated as follows from the score of D:
s ( d n | x n ) = 1 K Σ k = 1 K s ( k ) ( x n ∈ C dn )
S ( D | X ) = Π n = 1 N s ( d n | x n ) = Π n = 1 N Σ k = 1 K s ( k ) ( x n ∈ C dn )
The possibility that last X belongs to D is expressed as:
p(D|X)=e f(D)·S(D|X)
(5) The following rules are used to decide the optimal recognition result of the input pattern.
Rule 1:
If there exists $D \in \Omega$ such that
$$p(D \mid X) = \max_{D \in \Omega} p(D \mid X) \quad \text{and} \quad p(D \mid X) > \alpha,$$
then X = D,
where α is a threshold that sets the trade-off between rejection and misrecognition (α = 0.5).
Rule 2:
If there exists $D \in \Omega$ such that
$$p(D \mid X) = \max_{D \in \Omega} p(D \mid X),$$
there exists $D' \in \Omega$ such that
$$p(D' \mid X) = \max_{D' \in \Omega \setminus \{D\}} p(D' \mid X),$$
and
$$p(D \mid X) - p(D' \mid X) > \beta,$$
then X = D,
where β is a constant (β = 0.2).
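Steps (1) to (5) of the embodiment can be put together in a short Python sketch. It is an illustration under stated assumptions, not the patented implementation: all function names are invented for the sketch, a classifier output of `None` stands for rejection, a rejecting classifier contributes the normalized rejection column of its confusion matrix (the text defines the conditional probability only for $m' = 1, \ldots, M$), the factor in front of the product is read as $e^{f(D)}$, and the best candidate is accepted when either Rule 1 or Rule 2 holds.

```python
import math
import numpy as np

M = 10  # digit classes 0..9; column index M holds rejections


def confusion_matrix(true_digits, outputs):
    """Count how often a classifier outputs m' (None = reject) for true class m."""
    cm = np.zeros((M, M + 1))
    for t, o in zip(true_digits, outputs):
        cm[t, M if o is None else o] += 1
    return cm


def posteriors(cm):
    """post[m, m'] = n_{mm'} / n_{m'}; outputs never observed keep zeros (assumption)."""
    totals = cm.sum(axis=0)
    return np.divide(cm, totals, out=np.zeros_like(cm), where=totals > 0)


def string_probability(posts, outputs_per_char, code, freq):
    """p(D|X) = exp(f(D)) * prod_n (1/K) * sum_k P(x_n in C_{d_n} | e_k(x_n))."""
    score = 1.0
    for outputs, d in zip(outputs_per_char, code):
        cols = [M if o is None else o for o in outputs]
        score *= sum(post[int(d), c] for post, c in zip(posts, cols)) / len(posts)
    return math.exp(freq) * score


def recognize(posts, outputs_per_char, dictionary, alpha=0.5, beta=0.2):
    """dictionary maps each valid code string D to its frequency f(D)."""
    ranked = sorted(
        ((string_probability(posts, outputs_per_char, D, f), D)
         for D, f in dictionary.items()),
        reverse=True,
    )
    p_best, best = ranked[0]
    p_second = ranked[1][0] if len(ranked) > 1 else 0.0
    # Assumption: accept when Rule 1 or Rule 2 holds; otherwise reject (None).
    if p_best > alpha or p_best - p_second > beta:
        return best
    return None
```

In use, one confusion matrix is built per classifier from labeled calibration samples, converted once with `posteriors`, and then reused for every incoming string. Only dictionary entries are scored, so the result is always a valid code or a rejection.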

Claims (6)

1. A postal code digit string recognition method, comprising the following steps:
(1) inputting the images $X = (x_1, \ldots, x_n, \ldots, x_N)$ of a string of N postal code characters each into K independent single-character classifiers $e_k$, where N and K are positive integers greater than 1;
(2) each single-character classifier $e_k$ either recognizing the input character image $x_n$ as one of the classes $\{c_1, \ldots, c_m, \ldots, c_M\}$ or rejecting it, rejection being denoted $c_{(M+1)}$, where M is a positive integer greater than 1;
(3) calculating the probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is $m'$;
(4) calculating, from $P(x \in C_m \mid e_k(x) = m')$, the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$, where D is a valid postal code in the postal code dictionary Ω;
(5) deciding the recognition result of the input pattern according to the probability $p(D \mid X)$.
2. The postal code digit string recognition method according to claim 1, characterized in that: in step (1), the number N of postal code characters is 6; and in step (2), each class in $\{c_1, \ldots, c_m, \ldots, c_M\}$ is one of the digits 0 to 9.
3. The postal code digit string recognition method according to claim 1 or 2, characterized in that: in step (3), the probability $P(x \in C_m \mid e_k(x) = m')$ that the input pattern belongs to class $c_m$ when the recognition result is $m'$ is calculated by gathering sample statistics on the recognition results of the single-character classifier $e_k$ to form the confusion matrix of that classifier:
$$CM_k = \begin{pmatrix} n_{11}^{(k)} & \cdots & n_{1M}^{(k)} & n_{1(M+1)}^{(k)} \\ \vdots & n_{ij}^{(k)} & \vdots & \vdots \\ n_{M1}^{(k)} & \cdots & n_{MM}^{(k)} & n_{M(M+1)}^{(k)} \end{pmatrix}, \quad k = 1, 2, \ldots, K$$
where $n_{mm'}^{(k)}$ is the number of samples of class $C_m$ that the single-character classifier $e_k$ recognizes as class $C_{m'}$, meaning:
(a) when $m = m'$, the number of samples of class $C_m$ that $e_k$ recognizes correctly;
(b) when $m' = M+1$, the number of samples of class $C_m$ that $e_k$ rejects;
(c) when $m \neq m'$ and $m' \neq M+1$, the number of samples of class $C_m$ that $e_k$ misrecognizes as class $C_{m'}$;
the total number of samples for which the recognition result of the single-character classifier $e_k$ is $m' = e_k(x)$ is:
$$n_{m'}^{(k)} = \sum_{i=1}^{M} n_{im'}^{(k)}, \quad m' = 1, 2, \ldots, M+1$$
and, given that the recognition result of the single-character classifier $e_k$ is $m'$, the probability that the sample comes from class $C_m$ is:
$$P(x \in C_m \mid e_k(x) = m') = \frac{n_{mm'}^{(k)}}{n_{m'}^{(k)}} = \frac{n_{mm'}^{(k)}}{\sum_{i=1}^{M} n_{im'}^{(k)}}, \quad m' = 1, 2, \ldots, M.$$
4. The postal code digit string recognition method according to claim 1 or 2, characterized in that: in step (4), the probability $p(D \mid X)$ that the recognition result of X is $D = (d_1, d_2, \ldots, d_N)$ is calculated from $P(x \in C_m \mid e_k(x) = m')$ as follows:
assuming that the samples used to generate the confusion matrix $CM_k$ are sufficiently numerous and reflect the spatial distribution of the recognition results, $CM_k$ is used as prior knowledge when combining the classifiers, that is, $P(x \in C_m \mid e_k(x) = m')$ is used as the voting score, and the probability that $x \in C_m$ is expressed as:
$$s^{(k)}(x \in C_m) = P(x \in C_m \mid e_k(x) = m'), \quad m = 1, 2, \ldots, M$$
with the frequency with which postal code D occurs denoted $f(D)$, the score of X for D is calculated as:
$$s(d_n \mid x_n) = \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})$$
$$S(D \mid X) = \prod_{n=1}^{N} s(d_n \mid x_n) = \prod_{n=1}^{N} \frac{1}{K} \sum_{k=1}^{K} s^{(k)}(x_n \in C_{d_n})$$
and finally the probability that X belongs to D is $p(D \mid X) = e^{f(D)} \cdot S(D \mid X)$.
5. The postal code digit string recognition method according to claim 1 or 2, characterized in that: in step (5), the recognition result of the input pattern is decided from the probability $p(D \mid X)$ as follows:
if there exists D in Ω such that $p(D \mid X)$ is the maximum among the recognition results and $p(D \mid X) > \alpha$, then X = D, that is, the recognition result is D, where α is a threshold that sets the trade-off between rejection and misrecognition;
if there exists D in Ω such that $p(D \mid X)$ is the maximum among the recognition results, and there exists D' in Ω whose value $p(D' \mid X)$ is second only to that maximum, and $p(D \mid X) - p(D' \mid X) > \beta$, where β is a constant, then X = D, that is, the recognition result is D.
6. The postal code digit string recognition method according to claim 5, characterized in that: the values of α and β are 0.5 and 0.2, respectively.
CNB2005100235506A 2005-01-25 2005-01-25 Postal code digit string recognition method Expired - Fee Related CN1300740C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005100235506A CN1300740C (en) 2005-01-25 2005-01-25 Postal code digit string recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005100235506A CN1300740C (en) 2005-01-25 2005-01-25 Postal code digit string recognition method

Publications (2)

Publication Number Publication Date
CN1645408A CN1645408A (en) 2005-07-27
CN1300740C CN1300740C (en) 2007-02-14

Family

ID=34875908

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005100235506A Expired - Fee Related CN1300740C (en) 2005-01-25 2005-01-25 Postal code digit string recognition method

Country Status (1)

Country Link
CN (1) CN1300740C (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100452042C (en) * 2006-06-23 2009-01-14 腾讯科技(深圳)有限公司 Digital string fuzzy match method
CN101894266A (en) * 2010-06-30 2010-11-24 北京捷通华声语音技术有限公司 Handwriting recognition method and system
CN110443159A (en) * 2019-07-17 2019-11-12 新华三大数据技术有限公司 Digit recognition method, device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0929179A (en) * 1995-07-17 1997-02-04 Toshiba Corp Address reading device
CN1154879A (en) * 1996-12-19 1997-07-23 邮电部第三研究所 Process and apparatus for recognition of postcode in course of letter sorting
JPH1034089A (en) * 1996-07-30 1998-02-10 Toshiba Corp Video coding equipment
US6269171B1 (en) * 1995-04-12 2001-07-31 Lockheed Martin Corporation Method for exploiting correlated mail streams using optical character recognition


Also Published As

Publication number Publication date
CN1645408A (en) 2005-07-27


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20070214

Termination date: 20150125

EXPY Termination of patent right or utility model