KR20010015113A

KR20010015113A - An expression method of names of places, a recognition method of names of places and a recognition apparatus of names of places

Info

Publication number: KR20010015113A
Application number: KR1020000036989A
Authority: KR
Inventors: Masashi Koga; Naohiro Furukawa; Hisashi Ikeda; Hisao Ogata; Hiroshi Sako; Hiromichi Fujisawa
Original assignee: Tsutomu Kanai; Hitachi, Ltd. (Kabushiki Kaisha Hitachi Seisakusho)
Priority date: 1999-07-01
Filing date: 2000-06-30
Publication date: 2001-02-26
Anticipated expiration: 2020-06-30
Also published as: KR100692327B1; JP2001014311A; JP3709305B2; CN100424676C; CN1287317A

Abstract

Even when place names have many variant notations (異表記), a place-name dictionary that covers every place-name string can be created with little effort, and collation can be performed quickly.

The present invention expresses the variant notations of place names by extending BNF, a notation known as one way of writing the production rules of a context-free grammar, so that it suits the notation of place names.

With production rules, a typical pattern of variant notation, for example 「ヶ」, 「ケ」 and 「が」, can be defined as a single syntax category, so the set of variant notations of a place name can be expressed concisely. Using the selection symbol adopted in BNF makes the expression more concise still. In processing, the variant notations of place names are expressed as production rules (102), and recognition processing (104) is performed using a network obtained from those rules. As a result, a dictionary that lists every member of a large set of variant notations can be created easily.

Description

AN EXPRESSION METHOD OF NAMES OF PLACES, A RECOGNITION METHOD OF NAMES OF PLACES AND A RECOGNITION APPARATUS OF NAMES OF PLACES

The present invention relates to a method for expressing a group of place names and to a method and apparatus for recognizing place-name strings. In particular, it relates to such methods and apparatus suited to the place-name string storage means and collation means of an apparatus that reads place names written on documents.

In general, some character recognition apparatus reads place-name strings from an image. A place-name string is a character string made up of a sequence of place-name words, such as a prefecture name (都道府縣, Japan's top-level administrative divisions), a municipality name and an aza name (字, one of Japan's lowest-level administrative subdivisions). Such an apparatus is built from the following three functions:

(1) extracting character patterns (character extraction);

(2) identifying the character category (character code) of each character pattern (character identification); and

(3) collating the character identification results against sequences of place-name words stored in advance (string collation).

One known prior-art method for string collation is that of Marukawa et al. (Transactions of the Information Processing Society of Japan, Vol. 35, No. 6, "An Error Correction Algorithm for Handwritten Kanji Address Recognition"). Known prior art that integrates character extraction, recognition and collation includes two methods. The first is based on hidden Markov models (O. E. Agazzi, et al., "Connected and Degraded Text Recognition Using Planar Hidden Markov Models," Proceedings of the International Conference on Acoustics, Speech, and Signal Processing). The second recognizes character strings by search (Koga, et al., "Lexical Search Approach for Character-String Recognition," Third International Association for Pattern Recognition Workshop on Document Analysis Systems, 1998).

For string collation, the prior art described above requires a place-name string dictionary, that is, a means of storing in advance the place-name strings that can appear. Place-name string dictionaries take the following three forms.

(1) A "dictionary source file" stored as a file

This is, for example, the "place-name notation rule file" described later. It must be editable so that it can be newly created or revised.

(2) A "dictionary table" stored in memory

This is, for example, the "place-name notation network" described later. It holds the contents of the dictionary file, expanded in memory into a format suited to collation.

(3) A "dictionary binary file", an intermediate stage between (1) and (2) above

To make expansion into memory easier, part of the expansion processing is performed in advance and its result is stored in a file.

The format of the dictionary source file used in the prior art is often not made clear. However, all of the prior art assumes that every place-name string that can appear is stored in the dictionary table in advance. The dictionary source file is therefore presumably a text file that lists, in advance, every place-name string that can appear.

The prior art described above requires a dictionary to be prepared for string collation. In Japanese, however, many variant notations express the same area with different character strings. It is therefore difficult to register in advance every place-name string that can appear, and writing a complete dictionary for this purpose by hand is practically impossible.

Variant notations of Japanese place names arise from four sources: differences in the characters used, omission of words, additional character strings, and street-name notation. Examples of each are described below.

(1) Variants caused by differences in the characters used

Examples include a variant pair ending in 「小澤」 (the first member of the pair is missing from the source text). Another example is the set 「市ヶ谷」, 「市ケ谷」 and 「市が谷」.

(2) Variants caused by omitting words

These include variants that omit the prefecture name and variants that omit 「大字」 or 「字」 (大字 and 字 are both among Japan's lowest-level administrative subdivisions). Omission of the prefecture name is common in mail addresses and similar text, for example 「埼玉縣川越市大字小ヶ谷」 and 「川越市大字小ヶ谷」. An example of omitting 「大字」 or 「字」 is 「埼玉縣川越市大字小ヶ谷」 and 「埼玉縣川越市小ヶ谷」.

(3) Variants caused by additional character strings

Strings that are not needed to identify the address, such as a 小字 (koaza, sub-district) name, are appended. This produces variants such as 「埼玉縣川越市大字小ヶ谷」 and 「埼玉縣川越市大字小ヶ谷字東關」.

(4) Variants caused by street-name notation

These are common in Kyoto and elsewhere, for example 「京都市下京區大政所町」 and 「烏丸通佛光寺下る」.

As described above, place names have many kinds of variant notation. Take the place name 「埼玉縣川越市小ヶ谷」 as an example. Examining its variants shows that it has a very large number of them, as listed below.

「埼玉縣川越市小ヶ谷」
「埼玉縣川越市小ケ谷」
「埼玉縣川越市小が谷」
「埼玉縣川越市大字小ヶ谷」
「埼玉縣川越市大字小ケ谷」
「埼玉縣川越市大字小が谷」
「川越市小ヶ谷」
「川越市小ケ谷」
「川越市小が谷」
「川越市大字小ヶ谷」
「川越市大字小ケ谷」
「川越市大字小が谷」

In this example, 小字 names may also be appended, as in 「埼玉縣川越市小ヶ谷東田」, 「埼玉縣川越市小ヶ谷東關」 and 「埼玉縣川越市小ヶ谷西關」. Each of the 12 variants listed above can take seven tails: no 小字 name, or one of the three 小字 names with or without 字. This gives 12 × 7 = 84 variant notations in total.
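The count of 84 can be checked mechanically. The minimal sketch below enumerates the variants. It assumes, as the 12 × 7 breakdown implies, that the tail is either empty or one of the three 小字 names with or without 字. The constant and function names are illustrative and do not come from the source.

```python
from itertools import product

# Illustrative constants (not from the source): each list is one independent choice.
PREFECTURE = ["埼玉縣", ""]   # prefecture name present or omitted
OAZA = ["大字", ""]           # 大字 present or omitted
KE = ["ヶ", "ケ", "が"]       # the three spellings of ヶ

# Assumed tail: nothing, or one 小字 name with or without the 字 marker (1 + 3 * 2 = 7 choices)
KOAZA_NAMES = ["東田", "東關", "西關"]
TAILS = [""] + [aza + name for name in KOAZA_NAMES for aza in ("字", "")]


def variants(include_tail: bool) -> list[str]:
    """Spell out every combination of the independent choices above."""
    tails = TAILS if include_tail else [""]
    return [pref + "川越市" + oaza + "小" + ke + "谷" + tail
            for pref, oaza, ke, tail in product(PREFECTURE, OAZA, KE, tails)]


base = variants(include_tail=False)
full = variants(include_tail=True)
print(len(base), len(full))  # 12 84
```

The product structure is the point. Listing the strings one by one, as the prior art requires, multiplies the independent choices, while a rule only has to state each choice once.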

In the prior art, every combination of these variants must be registered in the dictionary file by hand, so creating the dictionary file takes a great deal of effort. Kyoto City is a case with especially many variants: the combinations of town names and street-based forms within the city reach several hundred thousand. For such places, writing a complete dictionary by hand was practically impossible.

A first object of the present invention is to solve the above problem. It does so by providing a place-name notation method that makes it easy to register every variant notation in a dictionary for string collation.

Suppose a place name has many variant notations and every variant could be written into the dictionary. Even then, the prior art would need a large amount of dictionary storage, and its processing time would grow with the number of variants.

One technique that addresses this problem uses a data format called a trie to reduce dictionary storage and speed up collation. It is described in, for example, Koga, et al., "Lexical Search Approach for Character-String Recognition," Third International Association for Pattern Recognition Workshop on Document Analysis Systems, 1998. This technique represents place names as tree-structured data that branches only where the notations differ. A trie can easily be generated automatically from a set of character strings.

For example, from the three notations 「埼玉縣川越市小ヶ谷東田」, 「埼玉縣川越市小ヶ谷東關」 and 「埼玉縣川越市小ヶ谷西關」, this technique can easily generate a trie such as the following.

Hereinafter, a network that represents the concatenation of characters in place-name strings in this way is called a place-name notation network.
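The trie referred to above is not reproduced in this text. A minimal sketch shows where it branches. It uses nested Python dicts, not the data structure of Koga et al., and builds a character trie from the three notations. The nine-character common prefix is stored once, and the trie branches only after 谷 and after 東.

```python
def build_trie(words):
    """Character trie as nested dicts; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = {}
    return root


def count_nodes(node) -> int:
    """Number of character nodes, not counting end markers."""
    return sum(1 + count_nodes(child) for key, child in node.items() if key is not None)


def branch_points(node, prefix=""):
    """Yield (prefix, next characters) wherever the trie branches."""
    keys = [k for k in node if k is not None]
    if len(keys) > 1:
        yield prefix, keys
    for k in keys:
        yield from branch_points(node[k], prefix + k)


words = ["埼玉縣川越市小ヶ谷東田", "埼玉縣川越市小ヶ谷東關", "埼玉縣川越市小ヶ谷西關"]
trie = build_trie(words)
print(count_nodes(trie))        # 14 nodes for 33 characters of input
print(list(branch_points(trie)))
```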

However, when character strings differ only in part, a trie-type place-name notation network must still treat them as entirely different strings and create separate branches. As a result, the trie for the variant group of 「埼玉縣川越市小ヶ谷」, for example, becomes large, as shown next.

(remainder omitted)

As described above, even if variants are expressed with a trie-type place-name notation network, both the dictionary size and the processing time increase substantially.
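The growth can be quantified with a short sketch. It builds a character trie from all 84 variants of 「埼玉縣川越市小ヶ谷」, assuming the tail is empty or one of the three 小字 names with or without 字. The helper names are hypothetical. The trie needs 173 nodes against 894 characters in the word list. Shared prefixes save some space, but the 小…谷 subtree and the tail subtree are copied once for every prefecture/大字 combination. This is the duplication the text attributes to trie-type networks.

```python
from itertools import product


def all_variants() -> list[str]:
    """The 84 variants, assuming tails = none or [字] + one of three 小字 names."""
    tails = [""] + [aza + name for name in ("東田", "東關", "西關") for aza in ("字", "")]
    return [pref + "川越市" + oaza + "小" + ke + "谷" + tail
            for pref, oaza, ke, tail in product(("埼玉縣", ""), ("大字", ""), "ヶケが", tails)]


def build_trie(words):
    """Character trie as nested dicts; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = {}
    return root


def count_nodes(node) -> int:
    """Number of character nodes, not counting end markers."""
    return sum(1 + count_nodes(child) for key, child in node.items() if key is not None)


words = all_variants()
trie = build_trie(words)
print(len(words), sum(map(len, words)), count_nodes(trie))  # 84 894 173
```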

Accordingly, an object of the present invention is to provide a place-name expression method and a place-name string recognition method and apparatus for use with a place-name dictionary that recognizes many variant notations. The storage format they use needs little storage and allows fast collation.

According to the present invention, this object is achieved by a place-name expression method that describes a set of place-name strings. In this set, a place name denoting an area has multiple variant notations, expressed by sequences of words that are different character strings but mean the same area. For each substring constituting part or all of a place-name string, a sequence of characters or syntax categories is defined. The place-name string is then represented by a syntax category consisting of a sequence of characters or already-defined syntax categories.

The object is also achieved by expressing the place-name string with two kinds of symbol. A substitution symbol indicates which string of characters or syntax categories a syntax category is replaced by. A place-name symbol indicates that a given syntax category denotes a specific area.

The object is also achieved by collating place names within an input string. The method determines whether a substring of the input string matches one of the place-name strings given in advance. Each of those place-name strings is represented by a syntax category consisting of a sequence of characters or already-defined syntax categories, defined for each substring constituting part or all of the place-name string.

The object is also achieved by an apparatus comprising the following. Storage means store place-name strings. For each substring constituting part or all of a place-name string, a sequence of characters or syntax categories is defined, and each place-name string is represented by a syntax category consisting of a sequence of characters or already-defined syntax categories. Input means input a character string. Collation means determine whether the input character string is a place-name string stored in the storage means. Output means output the result of the collation.

The object is also achieved by further providing character reading means. These take as input an image obtained by converting the light and shade of a document surface into an electrical signal, and read the characters written on the document. The input means then input the character string from the character reading means.

Specifically, to achieve the above object, the present invention expresses the variant notations of place names using the production rules of a context-free grammar. In a context-free grammar, production rules state which strings of other syntax categories an element of a sentence (a syntax category) can be replaced by ("Introduction to Natural Language Processing", Kindai Kagaku Sha, ISBN4-7649-0143-9). BNF (Backus-Naur Form) is known as one way of writing production rules (Nakata, "Compiler", ISBN4-7828-5057-3). The present invention uses an extended BNF notation, which extends BNF to suit the expression of place names.

With these production rules, a typical variant pattern, for example 「ヶ」, 「ケ」 and 「が」, can be defined as a single syntax category, so the set of variant notations of a place name can be expressed concisely. Using the selection symbol adopted in BNF makes the expression more concise still. The present invention therefore makes it easy to create a dictionary that lists every member of a large set of variant notations.

BNF expresses the production rules of a context-free grammar with symbols for substitution, option, selection and the like. It uses the following symbols.

::= Substitution. The syntax category on the left side can be replaced by the syntax category or sequence of characters on the right side.

[ ] Option. The description inside [ ] may be present or absent.

| Selection. Either the right side or the left side of this symbol.

As an example, the production rules for the variant notations of 「埼玉縣川越市小ヶ谷」 described above are written in BNF as follows.

<wヶ>::=ヶ|ケ|が

<埼玉縣川越市小ヶ谷>::=[埼玉縣]川越市[大字]小<wヶ>谷[[字]東田|東關|西關]
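The meaning of the three symbols can be made concrete with a small expander that lists every string a rule generates. This is a sketch with hypothetical class names (Lit, Ref, Seq, Alt, Opt). It also exposes a precedence question in the tail [[字]東田|東關|西關]. If | binds more loosely than concatenation, the optional 字 attaches only to 東田 and the rule yields 60 strings. The count of 84 given earlier corresponds instead to [[字](東田|東關|西關)], which uses the grouping parentheses of the extended notation described later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal character string."""
    text: str


@dataclass(frozen=True)
class Ref:
    """A reference to a syntax category such as <wヶ>."""
    name: str


@dataclass(frozen=True)
class Seq:
    """Concatenation of its items, left to right."""
    items: tuple


@dataclass(frozen=True)
class Alt:
    """Selection: any one of the options (the | symbol)."""
    options: tuple


@dataclass(frozen=True)
class Opt:
    """Option: the body may be present or absent (the [ ] symbol)."""
    body: object


def expand(node, rules: dict) -> set:
    """Return every string a node generates (the grammar must not be recursive)."""
    if isinstance(node, Lit):
        return {node.text}
    if isinstance(node, Ref):                      # ::= substitution
        return expand(rules[node.name], rules)
    if isinstance(node, Seq):
        results = {""}
        for item in node.items:
            parts = expand(item, rules)
            results = {prefix + part for prefix in results for part in parts}
        return results
    if isinstance(node, Alt):
        return set().union(*(expand(option, rules) for option in node.options))
    if isinstance(node, Opt):
        return {""} | expand(node.body, rules)
    raise TypeError(f"unknown node: {node!r}")


rules = {"wヶ": Alt((Lit("ヶ"), Lit("ケ"), Lit("が")))}
head = (Opt(Lit("埼玉縣")), Lit("川越市"), Opt(Lit("大字")), Lit("小"), Ref("wヶ"), Lit("谷"))
koaza = Alt((Lit("東田"), Lit("東關"), Lit("西關")))

# [[字]東田|東關|西關] read with "|" binding more loosely than concatenation
literal_tail = Opt(Alt((Seq((Opt(Lit("字")), Lit("東田"))), Lit("東關"), Lit("西關"))))
# [[字](東田|東關|西關)]: the optional 字 may precede any of the three 小字 names
grouped_tail = Opt(Seq((Opt(Lit("字")), koaza)))

literal = expand(Seq(head + (literal_tail,)), rules)
grouped = expand(Seq(head + (grouped_tail,)), rules)
print(len(literal), len(grouped))  # 60 84
```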

This notation also makes it possible to keep the place-name notation network compact. In the notation, differences between substrings are expressed with the symbols 「[ ]」 and 「|」. Wherever variants differ in a substring, a path that bypasses that substring can therefore easily be set up in the network. For example, the BNF notation above can be converted into a compact network such as the one shown below. Generating such a compact network from a conventional list of character strings was difficult.
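A sketch shows why [ ] and | map naturally onto bypass paths. The construction below is a Thompson-style construction, a standard technique from regular-expression compilation swapped in here for illustration. The patent's own procedure is the function proc of Figs. 5 to 9. Expressions are encoded as hypothetical nested tuples, and the 小字 tail is assumed to read [字](東田|東關|西關). Under these assumptions the whole variant group fits in 15 vertices and 22 edges.

```python
from collections import defaultdict

NULL = ""  # an empty label is a NULL transition: the edge consumes no characters


class Network:
    """Directed graph whose edges are labelled with substrings (or NULL)."""

    def __init__(self):
        self.edges = defaultdict(list)  # vertex -> [(label, next_vertex)]
        self.size = 0

    def new_vertex(self) -> int:
        self.size += 1
        return self.size - 1

    def build(self, expr, start: int) -> int:
        """Add the edges for expr beginning at vertex start; return its end vertex."""
        if isinstance(expr, str):                # literal substring: a single edge
            end = self.new_vertex()
            self.edges[start].append((expr, end))
            return end
        kind, body = expr
        if kind == "seq":                        # concatenation: chain the parts
            for part in body:
                start = self.build(part, start)
            return start
        if kind == "alt":                        # selection: parallel branches that rejoin
            end = self.new_vertex()
            for option in body:
                self.edges[self.build(option, start)].append((NULL, end))
            return end
        if kind == "opt":                        # option: a NULL edge bypasses the body
            end = self.build(body, start)
            self.edges[start].append((NULL, end))
            return end
        raise ValueError(kind)

    def accepts(self, text: str, start: int, final: int) -> bool:
        """Search for a path from start to final that spells exactly text."""
        stack, seen = [(start, 0)], set()
        while stack:
            vertex, pos = stack.pop()
            if (vertex, pos) in seen:
                continue
            seen.add((vertex, pos))
            if vertex == final and pos == len(text):
                return True
            for label, nxt in self.edges[vertex]:
                if text.startswith(label, pos):
                    stack.append((nxt, pos + len(label)))
        return False


ke = ("alt", ["ヶ", "ケ", "が"])
rule = ("seq", [("opt", "埼玉縣"), "川越市", ("opt", "大字"), "小", ke, "谷",
                ("opt", ("seq", [("opt", "字"), ("alt", ["東田", "東關", "西關"])]))])
net = Network()
start = net.new_vertex()
final = net.build(rule, start)
print(net.size, sum(len(e) for e in net.edges.values()))  # 15 22
print(net.accepts("川越市小が谷字東關", start, final))      # True
print(net.accepts("川越市字小ヶ谷", start, final))          # False
```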

Fig. 1 is a flowchart explaining an example of place-name string recognition processing according to an embodiment of the present invention.
Fig. 2 shows an example of place-name notation expressed with edited place-name string production rules, and an example that lists the variant notations without using production rules.
Fig. 3 schematically shows a place-name notation network created from the example production rules.
Fig. 4 explains the data format used when the place-name notation network is implemented on a computer.
Fig. 5 is a flowchart explaining the process of generating a place-name notation network from place-name string production rules.
Fig. 6 explains an example of the syntax tree that is generated.
Fig. 7 is a flowchart explaining the processing of the function proc, which generates a place-name notation network from place-name notation production rules.
Fig. 8 explains how the function proc generates a place-name notation network (part 1).
Fig. 9 explains how the function proc generates a place-name notation network (part 2).
Fig. 10 shows the group of place-name notation networks generated from place-name notation production rules.
Fig. 11 is a flowchart explaining the procedure for generating a place-name notation network with the prior art.
Fig. 12 explains an example of how the prior art generates a place-name notation network.
Fig. 13 shows an example of a place-name notation network generated by the prior art.
Fig. 14 is a flowchart explaining the processing in the place-name recognition process shown in Fig. 1.
Fig. 15 is a flowchart explaining the processing in the string collation process shown in Fig. 14.
Fig. 16 is a flowchart explaining the processing of the function srch.
Fig. 17 is a block diagram showing an example configuration of a system to which the place-name string recognition processing according to an embodiment of the present invention is applied.
Fig. 18 is a block diagram showing the configuration of a place-name string production rule editing apparatus.
Fig. 19 is a block diagram showing the configuration of another embodiment of the present invention.
Fig. 20 explains an example of a screen shown on the display.

<Explanation of reference numerals for the main parts of the drawings>

101 : place-name string production rule editing process
102 : place-name string production rule file
103 : place-name notation network generation process
104 : place-name recognition process
1404 : string collation process
1701 : mail sorter
1702 : scanner
1703 : delay line
1704 : sorter
1705 : place-name recognition apparatus
1706 : input interface
1707 : arithmetic processing unit
1708 : output processing unit
1710 : memory
1711 : network interface
1712 : hard disk
1713 : removable-media storage device
1714 : place-name string production rule editing apparatus
1718 : network
1801 : mouse
1802 : keyboard
1803 : display
1804 : place-name string production rule editing program
1805 : string collation program
1806 : place-name notation network display program
1807 : place-name string production rule file
1808 : place-name notation network generation program
1809 : place-name notation network data
1810 : communication device
1811 : removable-media storage device
1812 : computer
1901 : mouse
1902 : keyboard
1903 : display
1904 : printer
1905 : input file
1906 : output file
1907 : place-name directory program
1908 : place-name supplementary information file
1909 : place-name string production rule file
1910 : communication module
1911 : interface module
1912 : place-name list data
1913 : place-name list sorting module
1914 : place-name information search module
1915 : place-name list generation module
1916 : string collation module
1917 : place-name notation expansion module
1918 : place-name notation network generation program
1919 : place-name notation network data

Embodiments of the place-name notation method and the place-name string recognition method according to the present invention are described in detail below with reference to the drawings.

Fig. 1 is a flowchart explaining an example of place-name string recognition processing according to an embodiment of the present invention, and this flow is described first. The flowcharts in the following description are drawn in the Gane-Sarson notation. This notation is described in J. Martin and C. McClure, "Software Structuring Techniques" (Japanese edition), Kindai Kagaku Sha, ISBN4-7649-0124-2 C3050 P5562E.

(1) First, before any place names are recognized, the place-name string production rule editing process (step 101) writes production rules for place-name strings based on examples of variant notations. It stores these rules in the place-name string production rule file 102. The editing process of step 101 can be carried out by a person editing on a computer.

(2) Next, the place-name notation network generation process (step 103) reads the place-name string production rule file 102. It generates the place-name notation network, which serves as the dictionary for place-name recognition (step 104). The generation process of step 103 can be implemented as a program on a computer.

(3) Next, the place-name recognition process (step 104) reads place-name strings in the input image by referring to the place-name notation network. The recognition process of step 104 can be implemented as a program on a computer.

The place-name string production rule file 102 uses the "extended BNF notation" according to the present invention. It expresses each group of variant notations of a place name as production rules of a context-free grammar. Extended BNF adds symbols such as grouping to BNF and uses the symbols described below.

::= Substitution. The syntax category on the left side can be replaced by the syntax category or sequence of characters on the right side.

[ ] Option. The description inside [ ] may be omitted.

| Selection. Either the right side or the left side of this symbol.

<W string> Syntax category.

<N number> A syntax category denoting the group of variant notations of a place-name string for a specific area. The number is the identifier of the place name and is an integer greater than 0.

These symbols are evaluated in the following order of precedence.

(1) Definitions of the variable names <W string> and <N number>

(2) The brackets [ ] and ( ). When brackets are nested two or more deep, the inner brackets are evaluated first.

(3) |

(4) ::=
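The precedence rules can be exercised with a minimal recursive-descent parser for rule lines. The parser is a sketch built on two assumptions that the precedence list leaves open: concatenation binds more tightly than |, and whitespace is insignificant. It returns a hypothetical nested-tuple tree. Plain strings stand for literal substrings, and the operators become ("seq", [...]), ("alt", [...]), ("opt", x) and ("ref", name). The parentheses ( ) only group and produce no node of their own.

```python
import re

# Tokens: whitespace (dropped), ::=, brackets and |, <category>, or a single literal character
TOKEN = re.compile(r"\s+|::=|[\[\]()|]|<[^<>]*>|[^\s\[\]()|<>]")


def tokenize(line: str) -> list[str]:
    return [t for t in TOKEN.findall(line) if not t.isspace()]


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def alternation(self):
        """Lowest precedence inside a right side: seq ('|' seq)*."""
        options = [self.sequence()]
        while self.peek() == "|":
            self.take("|")
            options.append(self.sequence())
        return options[0] if len(options) == 1 else ("alt", options)

    def sequence(self):
        """Concatenation; adjacent literal characters are merged into one substring."""
        merged = []
        while self.peek() not in (None, "|", "]", ")"):
            item = self.atom()
            if isinstance(item, str) and merged and isinstance(merged[-1], str):
                merged[-1] += item
            else:
                merged.append(item)
        return merged[0] if len(merged) == 1 else ("seq", merged)

    def atom(self):
        tok = self.take()
        if tok == "[":                  # brackets are evaluated before "|"
            body = self.alternation()
            self.take("]")
            return ("opt", body)
        if tok == "(":
            body = self.alternation()
            self.take(")")
            return body
        if tok.startswith("<"):
            return ("ref", tok[1:-1])   # <W string> or <N number>
        return tok                      # a single literal character


def parse_rule(line: str):
    """Parse 'LHS ::= RHS' and return (category name, expression tree)."""
    parser = Parser(tokenize(line))
    lhs = parser.take()
    if not lhs.startswith("<"):
        raise SyntaxError("left side must be a syntax category")
    parser.take("::=")
    rhs = parser.alternation()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return lhs[1:-1], rhs


print(parse_rule("<wヶ>::=ヶ|ケ|が"))  # ('wヶ', ('alt', ['ヶ', 'ケ', 'が']))
```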

도 2는 상술된 스텝101에 따른 편집 처리로 편집된 지명 문자열 생성 규칙에 따라 표현된 지명의 표기예와 생성 규칙을 이용하지 않고 이표기를 나열한 예를 나타낸 도면이다.Fig. 2 is a diagram showing an example of notation of a place name expressed according to the place name character string generation rule edited by the editing process according to the above-described step 101 and an example of listing this notation.

도 2(A)에 도시된 지명 문자열 생성 규칙에 따라 표현된 지명의 표기예는, 그 예로서, 「埼玉縣川越市大字小ヶ谷」(「東田」「東關」「西關」이 小字), 「埼玉縣川越市大字笠幡」(「久保」「河南」이 小字), 「埼玉縣川越市下廣谷」의 이표기를 본 발명에 따른 확장 BNF기법으로 표기한 예이다. 이와 같이, 다수의 이표기를 포함하는 지명을, 본 발명에 따라 도입한 기호를 이용함으로써, 매우 간단히 표현할 수 있다. 이에 대해, 도 2(B)에 도시된 생성 규칙을 이용하지 않고 이표기를 나열한 예는, 다수의 이표기를 나열할 뿐이므로, 도 2(A)에 도시된 4 행의 표기로부터 생성되는 이표기의 수는 106가지나 된다. 도 2(B)에 도시된 것은 그 일부이다.An example of a place name represented according to the place name character string generation rule shown in Fig. 2A is, for example, "埼玉縣川越市大字小ヶ谷" ("東田", "東關", "西關"). , "표 玉 이 川越市大字笠幡" ("久保", 河南 "is a small letter) and" 埼玉縣川越市下廣谷 "are examples of using the extended BNF technique according to the present invention. In this way, a place name containing a large number of two notations can be expressed very simply by using a symbol introduced according to the present invention. On the other hand, the example in which two notations are listed without using the generation rule shown in Fig. 2B only lists a plurality of notations, so that the notation generated from the notation of the four lines shown in Fig. 2A is shown. The number is 106. The part shown in FIG. 2 (B) is a part thereof.

The place-name string generation rule file 102 is an ordinary text file, so any general-purpose text editor can serve as the means of carrying out the rule-editing step 101.

Fig. 3 schematically shows the place-name notation network created from the example generation rules of Fig. 2(A). It is described below.

The place-name notation network is a directed graph in which each edge is a substring and each vertex corresponds to a boundary between substrings. The direction of each edge follows the order of the characters in the string. In Fig. 3, an edge labeled NULL denotes a NULL transition, that is, the string may contain nothing at that position. The circle 301, hatched in its lower right, marks the start of a place-name string, and the circles 302 to 304, hatched across the center, mark the ends of place-name strings.

Fig. 4 illustrates the data format used when the place-name notation network is implemented on a computer. On a computer, the network is represented in the format shown in Fig. 4, the left-child, right-sibling representation (T. Cormen et al., 「Introduction to Algorithms」, Kindai Kagaku Sha, pp. 201-202). In this format, the concatenation of characters is expressed by child pointers and the branches of the place-name notation network by sibling pointers.

Fig. 4(A) shows the fields of a data record: each record consists of three data items, c 401, b 402 and d 403. Item c is a character code, item b is a sibling pointer, and item d is a child pointer. Branches out of a record are linked by sibling pointers, and character strings are represented as lists linked by child pointers. For example, representing the place-name notation network of Fig. 3 as a list of such data records gives the structure shown in Fig. 4(B).

In the list-form network of Fig. 4(B), the network branches from data record 404' (corresponding to the character 「小」) to data records 404 to 406: record 404' is linked to 404 by its child pointer, and records 404, 405 and 406 are linked to one another by sibling pointers. The string 「埼玉縣」 is represented by data records 407, 408 and 409, linked by child pointers. When a data record corresponds to a NULL transition, a NULL character is stored in its character code c 401; this indicates that the data records branching at that point may be omitted. In addition, one extra data record, shown as data record 410, is placed after the record for the last character of each place-name string. A NULL pointer is stored in the child pointer d of record 410 to mark the end of the network, and the identifier of the place name is stored in its sibling pointer b.
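The record layout of Fig. 4(A) can be sketched in Python as follows. The class name Node, the helpers chain and strings, and the use of None for NULL are assumptions for illustration. The example builds a small network for 埼玉縣小(ヶ|ケ|が)谷 in which all three branches share a single 谷 record and end in an extra terminal record (like record 410) whose d is NULL and whose b holds the identifier; the identifier value is reused from the Fig. 8 description purely as a sample number.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional, Tuple, Union


@dataclass(eq=False)                        # records are compared by identity, like addresses
class Node:
    c: Optional[str] = None                 # character code; None stands for NULL
    b: Union[Node, int, None] = None        # sibling pointer, or identifier in a terminal record
    d: Optional[Node] = None                # child pointer; None marks a terminal record


def chain(text: str, tail: Node) -> Node:
    """Link one record per character via child pointers, ending at tail."""
    head = tail
    for ch in reversed(text):
        head = Node(c=ch, d=head)
    return head


def strings(node: Optional[Node], prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (notation, identifier) for every path from node to a terminal record."""
    while node is not None:
        if node.d is None:                  # terminal record: b holds the identifier
            yield prefix, node.b
            return
        step = prefix if node.c is None else prefix + node.c   # a NULL record adds nothing
        yield from strings(node.d, step)
        node = node.b                       # next branch at the same position


end = Node(b=3501104)                       # terminal record
tani = Node(c="谷", d=end)                  # one 谷 record shared by all branches
branches = Node(c="ヶ", d=tani, b=Node(c="ケ", d=tani, b=Node(c="が", d=tani)))
root = chain("埼玉縣小", branches)

for notation, ident in strings(root):
    print(notation, ident)
```

Because the three branch records all point at the same 谷 record, the structure is a graph rather than a tree, which is exactly what the left-child, right-sibling format allows.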

A list-form network such as that of Fig. 4(B) can be regarded as a graph whose nodes are the data records. Each edge of the network shown schematically in Fig. 3 is then represented by as many nodes as the edge has characters.

Fig. 5 is a flowchart of the processing in step 103 of Fig. 1 that generates the place-name notation network from the place-name string generation rules, and Fig. 6 shows an example of the syntax tree generated in that processing. They are described below.

First, control loop 501 processes, one at a time, the generation rule of each place-name string in the place-name string generation rule file 102, for example each line beginning with <N number> from the second line of Fig. 2(A) onward. For each line, step 502 first parses the string on that line and builds a syntax tree such as the one in Fig. 6. Next, step 503 creates the terminal node t_i of the place-name notation network for that group of variant notations. Hereinafter, unless otherwise noted, a "node of the place-name notation network" means a data record in the format of Fig. 4(A). In t_i, NULL is stored in the character code c, NULL in the child pointer d, and the place-name identifier of the variant group in the sibling pointer b. Then, in step 504, the function proc described below generates the place-name notation network for that variant group. After all place-name string generation rules have been processed, step 505 merges the duplicated parts of the generated place-name notation networks.

The syntax tree can be generated from a place-name string generation rule with, for example, the method of building a transition network from generation rules described in 「Introduction to Natural Language Processing」 (Kindai Kagaku Sha, ISBN 4-7649-0143-9, pp. 19-30). The syntax tree in Fig. 6, produced by step 502, is the one generated from the second line of Fig. 2(A). In Fig. 6, a circle marked 「+」 denotes concatenation of strings, a circle marked 「[ ]」 an option, and a circle marked 「｜」 a selection; squares denote strings. The extended BNF notation also uses the parentheses 「(」 and 「)」, but the syntax trees used in the embodiment have no nodes for parentheses. Instead, the order of operations fixed by the parentheses is reflected in the structure of the tree itself.
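Step 502 can also be illustrated with a small recursive-descent parser. This Python sketch is an assumption, not the transition-network method cited above: it handles only the right-hand side of one rule, treats every other character (including any <W string> reference, assumed to have been substituted beforehand) as a literal, and lets concatenation bind more tightly than 「|」, as is usual in BNF. Like the tree of Fig. 6, its output has no node for 「( )」; grouping shows up only in the shape of the tree. The tuple encoding and the name parse_rhs are hypothetical.

```python
def parse_rhs(text: str):
    """Parse the right-hand side of a place-name rule into a syntax tree.

    Hypothetical tuple encoding:
      ("str", s)        literal substring
      ("+", [n1, ...])  concatenation
      ("|", [n1, ...])  selection
      ("[]", n)         optional part
    Brackets bind tightest, then concatenation, then "|".
    """
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def parse_alt():
        nonlocal pos
        branches = [parse_seq()]
        while peek() == "|":
            pos += 1
            branches.append(parse_seq())
        return branches[0] if len(branches) == 1 else ("|", branches)

    def parse_seq():
        nonlocal pos
        items, buf = [], ""
        while peek() is not None and peek() not in "|)]":
            ch = peek()
            if ch in "([":
                if buf:
                    items.append(("str", buf))
                    buf = ""
                pos += 1
                inner = parse_alt()          # innermost brackets are resolved first
                close = ")" if ch == "(" else "]"
                if peek() != close:
                    raise ValueError(f"expected {close!r} at position {pos}")
                pos += 1
                # "( )" only groups, so it leaves no node of its own
                items.append(inner if ch == "(" else ("[]", inner))
            else:
                buf += ch                    # consecutive literals form one string node
                pos += 1
        if buf:
            items.append(("str", buf))
        return items[0] if len(items) == 1 else ("+", items)

    tree = parse_alt()
    if pos != len(text):
        raise ValueError(f"unexpected {text[pos]!r} at position {pos}")
    return tree


print(parse_rhs("小(ヶ|ケ|が)谷"))
```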

The function proc generates a place-name notation network from a syntax tree and takes two arguments, p and a. Argument p specifies the value to be stored in the child pointer d of the last node of the network being generated, that is, the node into which the new partial network leads. Argument a is the topmost node of the syntax (sub)tree to be processed; when a node is passed as a, all nodes below it are processed recursively.

Fig. 7 is a flowchart of the processing performed by the function proc, Figs. 8 and 9 illustrate how proc generates the place-name notation network, and Fig. 10 shows the group of place-name notation networks generated from the place-name string generation rules. They are described below. In Fig. 7, p, q and r are variables holding addresses of data records in the format of Fig. 4, and the symbol 「->」 denotes a data item within a data record. The flow of Fig. 7 branches into four cases according to the type of syntax tree node a.

(1) Determine whether syntax tree node a is of type 「+」, 「｜」, 「[ ]」 or 「string」 (step 701).

(2) If step 701 finds that node a is of type 「+」, i.e. concatenation, first copy argument p into variable q; that is, copy the address of the end node of the partial network to be generated by this processing (step 702).

(3) Next, process the children n_i (1 ≤ i ≤ number of children) of the syntax tree node in order from the rightmost, calling proc( ) on each to generate a partial network of the place-name notation network. Each call receives arguments such that the partial network it generates ends at q. The pointer to the start of the resulting partial network is then assigned back to q, so that it becomes the end point of the next partial network. By calling proc( ) repeatedly in this way, the partial networks generated from the children of the 「+」 node are linked one after another (steps 703 and 704).

(4) When all children have been processed, return the current q, i.e. the head of the partial network (step 705).

(5) If step 701 finds that node a is of type 「｜」, i.e. selection, first generate a partial network from one child, n_1, and assign the head address of the resulting partial network to variable q (step 706).

(6) Next, assign the value of q to variable r and generate, in order, the partial networks for the remaining children n_i (2 ≤ i ≤ number of children). The head address of each generated partial network is stored in the sibling pointer b of r and then assigned to r, and the same processing is repeated for the next child (steps 707 to 710).

(7) When all children have been processed, return q, i.e. the head address of the partial network generated first (step 711).

(8) If step 701 finds that node a is of type 「[ ]」, i.e. an option, first generate the partial network for the child of node a and store its head address in variable q, passing parameters so that the generated partial network ends at p (step 712).

(9) Next, create a node for the NULL transition with the function newNd( ) and store its address in the sibling pointer b of q. newNd( ) allocates storage for one new data record in the format of Fig. 4 and sets a NULL pointer in data item b of that record (step 713).

(10) Then store NULL in the character code c of the NULL-transition node and set p in its child pointer d (steps 714 and 715).

(11) Finally, return the head address q of the generated partial network (step 716).

(12) If step 701 finds that node a is a string, first assign the value of p to variable q (step 717).

(13) Next, repeat the following for each character C_i (1 ≤ i ≤ string length), starting from the end of the string, so that one node is created per character: allocate storage for one node with newNd( ); assign C_i to the character code c of the new node; assign the value of q to its child pointer d; and replace q with the address of the new node (steps 718 to 722).

(14) After this has been done for every character C_i, return the address q of the newly generated partial network (step 723).
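The four cases of proc can be gathered into one Python sketch, assuming the syntax tree has already been parsed. The Node class, new_nd (standing in for newNd( )), the helper append_sibling, the tuple encoding of tree nodes and the helper strings are assumptions for illustration. One deliberate deviation: before storing a new head in a sibling pointer b (steps 707 to 710 and 713), the sketch walks to the last record of the existing sibling chain. Taken literally, those steps would overwrite b and drop branches whenever a child's head already has siblings, for example an option wrapped around a selection such as [(東田|西關)].

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Optional, Tuple, Union


@dataclass(eq=False)
class Node:
    c: Optional[str] = None                 # character code; None stands for NULL
    b: Union[Node, int, None] = None        # sibling pointer, or identifier in a terminal record
    d: Optional[Node] = None                # child pointer; None marks a terminal record


def new_nd() -> Node:
    """Counterpart of newNd(): a fresh record whose sibling pointer b is NULL."""
    return Node()


def append_sibling(head: Node, node: Node) -> None:
    """Attach node after the last record of head's sibling chain."""
    while head.b is not None:
        head = head.b
    head.b = node


def proc(p: Node, a) -> Node:
    """Build the partial network for tree node a leading into p; return its head."""
    kind = a[0]
    if kind == "+":                         # steps 702-705: concatenation
        q = p
        for child in reversed(a[1]):        # rightmost child first
            q = proc(q, child)              # its head becomes the next end point
        return q
    if kind == "|":                         # steps 706-711: selection
        q = proc(p, a[1][0])
        for child in a[1][1:]:
            append_sibling(q, proc(p, child))   # every alternative leads into p
        return q
    if kind == "[]":                        # steps 712-716: option
        q = proc(p, a[1])
        null = new_nd()                     # NULL transition: c stays None
        null.d = p
        append_sibling(q, null)
        return q
    if kind == "str":                       # steps 717-723: literal string
        q = p
        for ch in reversed(a[1]):           # from the last character backwards
            node = new_nd()
            node.c = ch
            node.d = q
            q = node
        return q
    raise ValueError(f"unknown node type {kind!r}")


def strings(node: Optional[Node], prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (notation, identifier) for every path to a terminal record."""
    while node is not None:
        if node.d is None:
            yield prefix, node.b
            return
        yield from strings(node.d, prefix if node.c is None else prefix + node.c)
        node = node.b


terminal = Node(b=3501104)                  # step 503: c and d are NULL, b is the identifier
tree = ("+", [
    ("str", "小"),
    ("|", [("str", "ヶ"), ("str", "ケ"), ("str", "が")]),
    ("str", "谷"),
    ("[]", ("|", [("str", "東田"), ("str", "西關")])),
])
root = proc(terminal, tree)
print(sorted(s for s, _ in strings(root)))
```

Building right to left means each partial network can be created with its end point already known, so no record ever has to be patched after it is allocated.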

Figs. 8 and 9 show how proc builds the place-name notation network. Reference numeral 801 denotes the terminal node created in step 503 of Fig. 5, in which the identifier 「3501104」 is stored. The network then grows as each step of the flow in Fig. 7 is executed, as shown from top to bottom in Figs. 8 and 9. When node 603 of the syntax tree in Fig. 6 is processed by proc, the partial network corresponding to node 602 is generated first, as shown at 802. Next, the partial network corresponding to node 604 is generated by proc; in this case p holds the address of node 804, and the generated partial network is connected to node 804, as shown at 803.

Control loop 501 of the flow in Fig. 5 generates a separate place-name notation network for each place-name string generation rule. As a result, the rules of Fig. 2 yield the group of networks shown in Fig. 10. Step 505 then merges the parts these networks have in common, such as 「埼玉縣川越市」, producing the place-name notation network described with reference to Fig. 3.

Fig. 11 is a flowchart of the procedure for generating a place-name notation network with the prior art, Fig. 12 illustrates an example of that generation process, and Fig. 13 shows an example of a place-name notation network generated by the prior art. With reference to these figures, the following describes how a place-name notation network is generated when generation rules are not used.

The prior art is described here to show that the conventional notation of place-name strings can only yield a tree-structured place-name notation network known as a trie, whereas the network generated from the notation of the present invention is superior in both storage requirements and the processing time needed for matching. In the prior art, place-name notations are expressed as a list of place-name strings as in Fig. 2(B), and the flow of Fig. 11 generates a place-name notation network from such a list. Let S_k be the k-th string in Fig. 2(B), L_k its length, and C_i the i-th character of each string. The identifier for each string is assumed to be stored separately, and the generated network uses the data format of Fig. 4.

(1) First, create a node rr to serve as the virtual root of the place-name notation network, and set NULL in its child pointer d (steps 1101 and 1102).

(2) Loop 1103 processes every string S_k, one at a time.

(3) First, assign the address of the root to variable p. Then call the subroutine SrchNxt for each character of the string. SrchNxt determines whether a node for the character has already been created and adds a new node if it has not; it is described below (steps 1104 to 1106).

(4) When SrchNxt has processed every character of the string, create a new child node with newNd( ), store the string's identifier in its pointer field b, and assign the address of this new child node to the child pointer d of p. When loop 1103 ends, the child of rr is the root of the place-name notation network (steps 1107 to 1110).

Next, the processing of subroutine SrchNxt is described.

(1) First, assign the value of p's child pointer d to variable q, then loop over all children of p by means of q, checking whether the corresponding character code, i.e. data item c, equals C_i. If it does, a node for C_i is regarded as already created; p is advanced to that node q and the subroutine ends (steps 1111 and 1113 to 1115, loop 1112).

(2) If the check in step 1113 finds that data item c differs from C_i, assign the value of q's sibling pointer to q, and repeat the loop until q becomes NULL (step 1116).

(3) If no node for C_i has been found when the loop ends, create a new node with newNd( ); assign C_i to its character code c, NULL to its child pointer d, and the value of p's child pointer d to its sibling pointer b. Then assign the address of the new node to p's child pointer d and to p itself, and end the subroutine (steps 1117 to 1122).
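For comparison, the prior-art construction of Fig. 11 can be sketched in Python. The names build_trie, srch_nxt, count_records and lookup are hypothetical, as is the Node class. New children are prepended to the sibling list, following steps 1117 to 1122. The sketch rejects input in which one string is a prefix of another (or appears twice): step 1110, read literally, overwrites the child pointer d, and a terminal record's b holds an identifier rather than a sibling, so such input does not fit the layout as described.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(eq=False)
class Node:
    c: Optional[str] = None                 # character code; None stands for NULL
    b: Union[Node, int, None] = None        # sibling pointer, or identifier in a terminal record
    d: Optional[Node] = None                # child pointer; None marks a terminal record


def srch_nxt(p: Node, ch: str) -> Node:
    """Return the child of p labelled ch, creating it if absent (steps 1111-1122)."""
    q = p.d
    while q is not None:
        if q.c == ch:
            return q                        # node already exists: advance to it
        q = q.b
    new = Node(c=ch, b=p.d)                 # prepend as the first child of p
    p.d = new
    return new


def build_trie(entries: List[Tuple[str, int]]) -> Optional[Node]:
    """Steps 1101-1110: insert every (string, identifier) pair into a trie."""
    keys = sorted(s for s, _ in entries)
    for x, y in zip(keys, keys[1:]):
        if y.startswith(x):                 # also catches duplicates
            raise ValueError(f"{x!r} is a prefix of {y!r}")
    rr = Node()                             # virtual root; rr.d starts as NULL
    for s, ident in entries:
        p = rr
        for ch in s:
            p = srch_nxt(p, ch)
        p.d = Node(b=ident)                 # terminal record holding the identifier
    return rr.d                             # the child of rr is the real root


def count_records(node: Optional[Node]) -> int:
    """Count the records reachable from node (nothing is shared in a trie)."""
    total = 0
    while node is not None:
        total += 1
        if node.d is None:                  # terminal: b is an identifier, not a sibling
            break
        total += count_records(node.d)
        node = node.b
    return total


def lookup(root: Optional[Node], s: str) -> Optional[int]:
    """Exact-match lookup; returns the identifier, or None if s is not stored."""
    node = root
    for ch in s:
        while node is not None and node.d is not None and node.c != ch:
            node = node.b
        if node is None or node.d is None:
            return None
        node = node.d
    return node.b if node is not None and node.d is None else None


root = build_trie([("川越市小ヶ谷", 1), ("川越市笠幡", 2), ("川越市下廣谷", 3)])
print(count_records(root))                 # 11 character records + 3 terminals = 14
print(lookup(root, "川越市笠幡"))           # 2
```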

Fig. 12 shows how a place-name notation network is built by the procedure of Fig. 11, taking as the example the processing of the first three lines of Fig. 2(B). First, the network for 「川越市小ヶ谷」 is generated (1201). Next, 「川越市笠幡」 is processed; since the part 「川越市」 was already generated at 1201, no new nodes are created for it. However, when p reaches the position shown at 1202 and the character 「笠」 is processed, no child of 「市」 corresponds to 「笠」, so a new node for 「笠」 is created as a sibling of 「小」, and the node for the remaining character 「幡」 is linked as the child of the newly created node (1203). 「川越市下廣谷」 is processed in the same way: a node for 「下」 is newly created as a sibling of 「小」 and 「笠」 (1204), and the nodes for the following characters are linked to it (1205).

Fig. 13 schematically shows part of the place-name notation network generated from the variant notations listed in Fig. 2(B). Unlike the network of Fig. 3, the network generated from the conventional notation is a tree: once it branches, the branches never rejoin. This is the data representation known as a trie. Compared with Fig. 3, it contains many duplicated parts.

For example, the partial network corresponding to 「東田」, 「東關」 and 「西關」 is repeated six times. This increases the required storage. Moreover, on a computer with a hierarchical memory organization, a larger accessed memory space slows memory access through cache misses and the like, which in turn slows down the string matching process described below.

Being able to generate a place-name notation network with little duplication, as in Fig. 3, is an inherent advantage of writing place-name notations with generation rules according to the present invention, because the rules make the shared parts explicit. In the example of Fig. 2(A), for instance, the 「ヶ」 of 「小ヶ谷」 has three variant notations, but the extended BNF notation states that the string following 「ヶ」 is the same for all three. Therefore, as shown in Fig. 3, the generated network has three paths only between 「小」 and 「谷」.

By contrast, with the conventional notation of place-name strings shown in Fig. 2(B), there is no way to detect whether the variants following 「ケ」 are equivalent, so only a network such as that of Fig. 13 can be produced.
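The difference in size can be estimated with a short Python sketch. It counts records under two assumptions: the rule-based network allocates one record per character, one NULL record per option and one terminal record, as proc does; and the trie holds one record per distinct prefix of the expanded notations plus one terminal per notation, with no notation a prefix of another. The rule below is illustrative rather than a line of Fig. 2(A), the tuple encoding and function names are hypothetical, and the merging of step 505 across rules is ignored.

```python
from itertools import product

# Hypothetical syntax-tree encoding: ("str", s), ("+", [...]), ("|", [...]), ("[]", n)


def expand(node):
    """All notations generated by a syntax tree."""
    kind = node[0]
    if kind == "str":
        return [node[1]]
    if kind == "|":
        return [s for child in node[1] for s in expand(child)]
    if kind == "[]":
        return expand(node[1]) + [""]
    if kind == "+":
        return ["".join(combo) for combo in product(*(expand(c) for c in node[1]))]
    raise ValueError(f"unknown node type {kind!r}")


def network_records(node):
    """Records allocated for a rule: characters plus NULL records (terminal excluded)."""
    kind = node[0]
    if kind == "str":
        return len(node[1])
    if kind == "[]":
        return network_records(node[1]) + 1
    return sum(network_records(child) for child in node[1])


def trie_records(notations):
    """Distinct non-empty prefixes plus one terminal record per notation."""
    unique = set(notations)
    prefixes = {s[:i] for s in unique for i in range(1, len(s) + 1)}
    return len(prefixes) + len(unique)


# Illustrative rule: 川越市小(ヶ|ケ|が)谷(東田|東關|西關)
rule = ("+", [
    ("str", "川越市小"),
    ("|", [("str", "ヶ"), ("str", "ケ"), ("str", "が")]),
    ("str", "谷"),
    ("|", [("str", "東田"), ("str", "東關"), ("str", "西關")]),
])

print(network_records(rule) + 1)   # 15 (one terminal record added)
print(trie_records(expand(rule)))  # 34
```

In the trie, the 東田/東關/西關 part is stored once per variant of 「ヶ」, while the rule-based network stores it once; the gap widens as more selections and options are combined.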

Fig. 14 is a flowchart of the processing performed in the place-name recognition process 104 of Fig. 1. It is described below.

(1) First, a character line extraction process extracts the image of a character line from the input image (step 1401).

(2) Next, a character segmentation process extracts from the character line image the patterns that are likely to be characters, i.e. candidate patterns. If the character boundaries cannot be determined uniquely at this stage, character patterns are extracted under several boundary hypotheses, and a candidate pattern is output for each hypothesis (step 1402).

(3) Next, a character recognition process recognizes which character each extracted candidate pattern is and outputs the results as candidate strings. When segmentation is based on several hypotheses, or when recognition outputs several candidate characters for one pattern, the character recognition process outputs several candidate strings, one for each combination of segmentation and candidate characters (step 1403).

(4) Finally, a string matching process checks each candidate string against the place-name notation network to determine whether it is a valid place-name string. A candidate string accepted by the matching becomes the place-name recognition result (step 1404).

Fig. 15 is a flowchart of the processing performed in the string matching process 1404 described above. This process takes one string as input, determines whether at least part of the input string can be accepted as a place-name string, and if so obtains the identifier of the corresponding place-name notation. Here, L is the length of the input string and C_i its i-th character.

(1) First, loop 1501 repeats steps 1502 and 1503 while varying the matching start position s from 1 to L.

(2) Set the address of the root of the place-name notation network in the node variable p. Then call the function srch with arguments p and s. srch finds a path in the network that matches the input string and returns the address of the node at its end. If srch returns anything other than a NULL pointer, the match is regarded as successful, and the identifier stored in the node indicated by the return value is output (steps 1502 to 1504).

(3) If no match has succeeded even when s reaches L, the string matching process has failed and ends (step 1505).

In the process above, srch calls itself recursively and searches the place-name notation network depth-first for a path that matches the input string. srch takes two arguments, p and i. The argument p points to the node at which the search starts. The argument i is an integer giving the position, within the input string, of the character currently being examined. If an acceptable string is found, srch returns the address of the node at the end of that string; if none is found, it returns a NULL pointer.

FIG. 16 is a flowchart of the operation of the function srch used in the process above, which is explained below.

(1) First, check whether the argument p points to a string-end node. If it does, the input string is considered accepted, and srch returns p and terminates (step 1601).

(2) Next, determine whether all characters have already been processed. If i is greater than L, so that every character has been consumed without p reaching the end of a path in the place-name notation network, NULL is returned (step 1602).

(3) Next, check whether the data item c of p matches C_i, the i-th character of the string. If it does, srch is called recursively with p's child node p->d as the starting point of the search, processing the string from the (i+1)-th character. If this return value r is not NULL, the string is considered accepted, and srch terminates with r as its return value (step 1603).

(4) Next, check whether p is a node corresponding to a NULL transition. If it is, srch is called recursively with p's child node p->d as the starting point of the search, processing the string from the i-th character. If this return value r is not NULL, the string is considered accepted, and srch terminates with r as its return value (step 1604).

(5) Next, check whether a sibling node p->b is linked to p. If it is, srch is called recursively with the sibling node p->b as the starting point of the search, processing the string from the i-th character, and its return value is passed back to the caller (step 1605).

(6) If none of the steps above accepts the input string, no further search is possible, and srch terminates with NULL as its return value (step 1606).
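The two flowcharts (FIG. 15 and FIG. 16) can be sketched together in Python. This is a minimal, non-authoritative sketch: the node layout (fields c, d and b mirroring the data item c, the child p->d and the sibling p->b, an is_null flag for NULL-transition nodes, and an identifier stored only on string-end nodes) is an assumption inferred from the description, and the identifier "N001" and the small network for 小ヶ谷 are hypothetical. Positions are 1-based to match the text.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Node:
    """One node of the place-name notation network (assumed layout)."""
    c: Optional[str] = None       # character label (data item c); None if unlabeled
    d: Optional[Node] = None      # child node, p->d
    b: Optional[Node] = None      # sibling node, p->b
    is_null: bool = False         # True for a NULL-transition node
    ident: Optional[str] = None   # set only on string-end nodes


def srch(p: Optional[Node], i: int, text: str) -> Optional[Node]:
    """Depth-first search following the steps of FIG. 16; i is 1-based."""
    if p is None:                              # defensive guard, not a step in FIG. 16
        return None
    if p.ident is not None:                    # step 1601: reached a string-end node
        return p
    if i > len(text):                          # step 1602: input used up, no end node
        return None
    if p.c is not None and p.c == text[i - 1]:  # step 1603: label matches C_i
        r = srch(p.d, i + 1, text)
        if r is not None:
            return r
    if p.is_null:                              # step 1604: NULL transition, same i
        r = srch(p.d, i, text)
        if r is not None:
            return r
    if p.b is not None:                        # step 1605: try the sibling at same i
        return srch(p.b, i, text)
    return None                                # step 1606: no further search possible


def match(root: Node, text: str) -> Optional[str]:
    """String matching process of FIG. 15: try each start position s = 1..L."""
    for s in range(1, len(text) + 1):   # loop 1501
        r = srch(root, s, text)          # step 1502
        if r is not None:                # step 1503: match succeeded
            return r.ident               # step 1504: output the identifier
    return None                          # step 1505: matching failed


# Hypothetical network for 小ヶ谷 with variants ヶ / ケ / が / (omitted).
end = Node(ident="N001")                 # hypothetical identifier
tani = Node(c="谷", d=end)
omit = Node(is_null=True, d=tani)        # NULL transition: the ヶ may be omitted
ga = Node(c="が", d=tani, b=omit)
ke = Node(c="ケ", d=tani, b=ga)
small_ke = Node(c="ヶ", d=tani, b=ke)
root = Node(c="小", d=small_ke)

print(match(root, "川越市小ヶ谷"))  # N001
```

The variant branches share the single 谷 node, which is what keeps the network compact. Because this sketch checks step 1602 before steps 1604 and 1605, as the flowchart order describes, an end node that could only be reached through a NULL transition or a sibling after the last input character would not be found; the example network avoids this by making the end node the direct child of 谷.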

Although the embodiment described above performs character extraction, character recognition, and string matching sequentially, the present invention can easily be extended to a scheme that feeds the string matching result back into character extraction, as in Koga (古賀) et al., "Recipient reading device, postal matter sorting machine, and character string recognition method" (Japanese Patent Application No. Hei 10-28077).

FIG. 17 is a block diagram showing an example configuration of a system that applies the place-name string recognition processing of the embodiment of the present invention, and FIG. 18 is a block diagram showing the configuration of a place-name string generation rule editing device. This system is an example in which the present invention is applied to a postal sorting system. In FIGS. 17 and 18, reference numeral 1701 denotes a postal sorting machine, 1702 a scanner, 1703 a delay line, 1704 a sorter, 1705 a place-name recognition device, 1706 an input interface, 1707 an arithmetic processing unit, 1708 an output interface, 1710 a memory, 1711 a network interface, 1712 a hard disk, 1713 a removable-media storage device, 1714 a place-name string generation rule editing device, 1718 a network, 1801 a mouse, 1802 a keyboard, 1803 a display, 1804 a place-name string generation rule editing program, 1805 a string matching program, 1806 a place-name notation network display program, 1807 a place-name string generation rule file, 1808 a place-name notation network generation program, 1809 place-name notation network data, 1810 a communication device, 1811 a removable-media storage device, and 1812 a computer.

The system shown in FIG. 17 consists of one or more postal sorting machines 1701 and one or more place-name string generation rule editing devices 1714 connected by a network 1718. The postal sorting machine 1701 consists of a scanner 1702, a delay line 1703, a sorter 1704, and a place-name recognition device 1705. The place-name recognition device 1705 in turn consists of an input interface 1706, an arithmetic processing unit 1707, an output interface 1708, a memory 1710, a network interface 1711, a hard disk 1712, and a removable-media storage device 1713. The thick lines in the figure indicate the flow of mail items.

In the system shown in FIG. 17, image information of the place name written on a mail item, input from the scanner 1702, is sent to the place-name recognition device 1705. While the mail item travels along the delay line 1703, the place-name recognition device 1705 recognizes the place name written on it and sends the recognition result to the sorter 1704. The sorter 1704 sorts the mail item according to the recognition result.

As a preparatory step for sorting mail, the place-name recognition device 1705 loads the place-name notation network generation program file from the hard disk 1712 into the memory 1710 and starts it on the arithmetic processing unit 1707. Under the control of this program, the place-name recognition device 1705 receives the place-name string generation rules from the place-name string generation rule editing device 1714 through the network interface 1711, creates a place-name notation network file, and stores it on the hard disk 1712.

Instead of being received over the network from the place-name string generation rule editing device 1714, the place-name string generation rules may also be read from a removable-media storage device 1713 such as a floppy disk drive.

When sorting mail, the place-name recognition device 1705 reads the recognition program file and the place-name notation network file from the hard disk 1712 into the memory 1710 and executes the program on the arithmetic processing unit 1707. Under the control of the recognition program, the place-name recognition device 1705 receives an image from the input interface 1706, recognizes the place name written on the mail item, and outputs the recognition result through the output interface 1708.

As shown in FIG. 18, the place-name string generation rule editing device 1714 consists of a computer 1812 connected to a mouse 1801, a keyboard 1802, a display 1803, a disk device that stores the place-name string generation rule file 1807, a communication device 1810, and a removable-media storage device 1811. Editing is carried out by modifying the rule file 1807 through the place-name string generation rule editing program 1804 running on the computer 1812. The rule file 1807 is a text file, so an ordinary text editor can also be used to edit it. In addition, the place-name notation network generation program 1808 can be run on the computer 1812 to generate the place-name notation network 1809 from the rule file 1807.

With the functions described above, the place-name string generation rule editing device 1714 can check whether the place-name string generation rules being edited are grammatically correct. It can also run a program 1805 equivalent to the string matching process (1404) used in recognition, to check whether a test string entered from the keyboard 1802 is accepted.
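The two checks performed by the editing device (grammatical correctness of the rules, and acceptance of a test string) might look roughly like the following sketch. The concrete syntax of the rule file is not given in this section, so the rules are written here as a Python dict; that representation, the category names such as <ga> and <ogaya>, and the helper names are all hypothetical. Acceptance is tested directly against the rules rather than against the generated network, and the rules are assumed to be non-recursive.

```python
from typing import Dict, List, Set

# Hypothetical rule representation: category -> list of alternatives, each
# alternative a list of symbols. "<...>" marks a syntax category; any other
# symbol is literal text. [] is an empty alternative (the part is omitted).
Grammar = Dict[str, List[List[str]]]

RULES: Grammar = {
    "<ga>": [["ヶ"], ["ケ"], ["が"], []],
    "<ogaya>": [["小", "<ga>", "谷"]],
}


def is_category(sym: str) -> bool:
    return sym.startswith("<") and sym.endswith(">")


def undefined_categories(g: Grammar) -> Set[str]:
    """Grammar check: categories used on a right-hand side but never defined."""
    return {
        sym
        for alts in g.values()
        for alt in alts
        for sym in alt
        if is_category(sym) and sym not in g
    }


def end_positions(g: Grammar, sym: str, text: str, pos: int) -> Set[int]:
    """All positions where a derivation of sym starting at pos can end.
    Assumes non-recursive rules; a recursive rule would not terminate."""
    if not is_category(sym):
        return {pos + len(sym)} if text.startswith(sym, pos) else set()
    result: Set[int] = set()
    for alt in g[sym]:
        positions = {pos}
        for s in alt:
            positions = {q for p in positions for q in end_positions(g, s, text, p)}
            if not positions:
                break
        result |= positions
    return result


def accepts(g: Grammar, start: str, text: str) -> bool:
    """True if the whole test string can be derived from the start category."""
    return len(text) in end_positions(g, start, text, 0)


print(accepts(RULES, "<ogaya>", "小谷"))    # True
print(accepts(RULES, "<ogaya>", "小川谷"))  # False
```

Tracking a set of end positions, rather than a single one, lets an optional part such as <ga> either consume a character or none without needing explicit backtracking.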

In addition, the computer 1812 runs the place-name notation network display program 1806, which displays the place-name notation network 1809 in a format such as that shown in FIG. 3, so the operator can check the editing results visually. The edited place-name string generation rule file 1807 is sent to the place-name recognition device 1705 through the communication device 1810; alternatively, it may be copied by the removable-media storage device 1811 onto a removable storage medium such as a floppy disk and carried to the postal sorting machine 1701 on that medium.

FIG. 19 is a block diagram showing the configuration of another embodiment of the present invention, and FIG. 20 illustrates example screens shown on its display. This example is a place-name directory device that uses the place-name string notation method and place-name matching scheme of the present invention to retrieve information about a place from a character string representing its name. In FIG. 19, reference numeral 1901 denotes a mouse, 1902 a keyboard, 1903 a display, 1904 a printer, 1905 an input file, 1906 an output file, 1907 a place-name directory program, 1908 a place-name supplementary information file, 1909 a place-name string generation rule file, 1910 a communication module, 1911 an interface module, 1912 place-name list data, 1913 a place-name list sort module, 1914 a place-name information retrieval module, 1915 a place-name list generation module, 1916 a string matching module, 1917 a place-name notation expansion module, 1918 a place-name notation network generation program, and 1919 place-name notation network data.

The device shown in FIG. 19 provides the following services.

(1) Display or print the standard form of a place-name string entered from the keyboard.

(2) Display or print the variant notations of a place-name string entered from the keyboard.

(3) Display or print information about the area (postal code, etc.) corresponding to a place-name string entered from the keyboard.

(4) Convert place-name strings read from a file into the standard form, or into information specific to the corresponding area such as its postal code, and write the result to a file.

(5) Convert place-name strings received from the network into the standard form, or into information specific to the corresponding area such as its postal code, and send the result to the network.

Here, the standard form means the official character string denoting an area as defined, for example, by administrative divisions.

The embodiment shown in FIG. 19 consists of a place-name directory program 1907 running on a computer, connected to a mouse 1901, a keyboard 1902, a display 1903, a printer 1904, an input file 1905, an output file 1906, a place-name supplementary information file 1908, and a place-name string generation rule file 1909. Display and input/output are handled through the interface module 1911. When a string to be searched is entered, the place-name information retrieval module 1914 calls the string matching module 1916. The string matching module 1916 performs processing equivalent to the string matching process (1404) in FIG. 14: it refers to the place-name notation network data 1919, generated from the rule file 1909 by the place-name notation network generation program 1918, and determines which identifier's place-name notation the input string corresponds to.

Using the obtained identifier as a key, the place-name information retrieval module 1914 retrieves the standard form and additional information such as the postal code from the place-name supplementary information file 1908. The place-name notation expansion module 1917 enumerates every possible variant notation from the place-name notation network data 1919. The resulting group of variant notations is stored in the place-name list data 1912 and output through the interface module 1911 as needed. The place-name list sort module 1913 reorders the group of variant notations according to the operator's instructions and outputs it. Input for this processing may come from any of the keyboard 1902, the input file 1905, or the communication module 1910, and output may go to any of the display 1903, the output file 1906, or the communication module 1910.
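The enumeration performed by the notation expansion module 1917, together with a simple reordering like that of the sort module 1913, can be approximated as follows. In the text the variants are enumerated from the network data 1919; this sketch instead expands the generation rules directly, which should give the same set when the network is compiled from the same non-recursive rules (an assumption). The dict-based rule format, the category names, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]

# Hypothetical rules: "<...>" is a syntax category; [] means "omitted".
RULES: Grammar = {
    "<ga>": [["ヶ"], ["ケ"], ["が"], []],
    "<ogaya>": [["小", "<ga>", "谷"]],
    "<city>": [["川越市"]],
    "<addr>": [["<city>", "<ogaya>"], ["<ogaya>"]],
}


def is_category(sym: str) -> bool:
    return sym.startswith("<") and sym.endswith(">")


def expand(g: Grammar, sym: str) -> List[str]:
    """Enumerate every string derivable from sym (non-recursive rules only)."""
    if not is_category(sym):
        return [sym]
    out: List[str] = []
    for alt in g[sym]:
        # Cartesian product of each symbol's expansions; an empty alt yields "".
        for combo in product(*(expand(g, s) for s in alt)):
            out.append("".join(combo))
    return list(dict.fromkeys(out))  # drop duplicates, keep first-seen order


variants = expand(RULES, "<addr>")
print(sorted(variants, key=len)[:4])  # ['小谷', '小ヶ谷', '小ケ谷', '小が谷']
```

The number of variants is the product of the alternatives at each choice point, which is why storing the rules (or the shared network) is far smaller than storing every expanded string once several optional or alternating parts are combined.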

FIG. 20(A) shows an example of the screen displayed on the display 1903 of the embodiment of FIG. 19 when the operator enters the string 「川越市小ヶ谷」 and runs a search. The input string is entered in field 2005, and the search is executed by clicking button 2006 with the mouse. Strings found to correspond to the input string are displayed in window 2007. In each row, the "Standard" column indicates whether the string is the standard form, the "Place name" column shows the string itself, and the "Postal code" column shows the postal code corresponding to the string; other supplementary information about the area may be displayed instead.

The "Standard", "Place name", and "Postal code" frames in area 2004 act as buttons; clicking one of them with the mouse re-sorts the rows by that column. Window 2008 is used to specify search options: whether to display only the standard form, to display groups of variant notations based on aza (字) and ōaza (大字) units, or to display groups of variant notations based on common names (such as "** Danchi" housing complexes). Button 2002 instructs printing of the displayed contents, button 2001 switches to a mode in which input and output go through files instead of the keyboard and display, and button 2003 ends the program.

Window 2009, opened in FIG. 20(B), displays detailed information about the place name obtained from the match, such as its reading, its koaza (小字), and its postal code. This window 2009 is opened by clicking a search result shown in window 2007 with the mouse.

Place-name strings written with the notation method of the embodiment of the present invention can also be stored on a storage medium such as an FD, MO, or DVD and provided as a place-name dictionary.

As described above, according to the present invention, even when a place name has many variant notations, a place-name dictionary covering every possible place-name string can be created with little effort. In addition, a place-name dictionary in network form, which allows fast matching, can easily be created.

Claims

In a place-name notation method for representing a set of place-name strings, in which a place name denoting an area has a plurality of variant notations expressed by different arrays of words that all denote the same area,

the place-name notation method is characterized in that an array of characters or syntax categories is defined for each substring constituting part or all of the place-name string, and the place name is represented by a syntax category consisting of the array of characters or syntax categories so defined.

The method of claim 1,

wherein the place-name string is written using a substitution symbol, indicating which other string of characters or syntax categories a syntax category is replaced by, and a place-name symbol, indicating which syntax category denotes a specific area.

In a place-name string matching method,

the place-name string matching method is characterized in that a place name is matched within an input string by determining whether a substring of the input string matches any of the place-name strings represented by syntax categories, where an array of characters or syntax categories has been defined in advance for each substring constituting part or all of a place-name string, and each syntax category consists of an array of characters or previously defined syntax categories.

In a place-name string matching device,

storage means for storing place-name strings represented by syntax categories, each consisting of an array of characters or previously defined syntax categories, where an array of characters or syntax categories is defined for each substring constituting part or all of a place-name string;

input means for entering a character string;

means for checking whether the entered character string matches a place-name string stored in the storage means; and

means for outputting the result of the matching,

the place-name string matching device being characterized by comprising the means above.

In a place-name string recognition device,

character reading means for reading characters written on a document, taking as input an image obtained by converting the light and shade of the document surface into an electrical signal;

and the place-name string matching device according to claim 4 are provided,

the device being characterized in that the input means of the place-name string matching device receives character strings from the character reading means.

In a postal matter sorting system,

the postal matter sorting system is characterized in that the place-name string recognition device according to claim 5 is used to recognize a place-name string in the recipient address of a mail item, and the mail item is sorted or the recognition result is printed on the mail item.

In a place-name string recording medium,

for each place name having a plurality of variant notations expressed by different arrays of words that all denote the same area, an array of characters or syntax categories is defined for each substring constituting part or all of the place-name string,

the place-name string recording medium being characterized in that each place name is represented by a syntax category consisting of the array of characters or syntax categories so defined.

A place-name string recognition device that reads characters written on a document using an image obtained by converting the light and shade of the document surface into an electrical signal, the device being characterized by comprising: means for storing place-name strings using the place-name notation method according to claim 1; and means for recognizing a place name as an arrangement of partial images in the input image, each partial image being found similar to the corresponding character of one of the place-name strings represented by the place-name notation method.

A place-name notation method characterized in that place names are expressed according to rules for generating substrings consisting of words or parts of words.