CN112256821B - Chinese address completion method, device, equipment and storage medium - Google Patents
Chinese address completion method, device, equipment and storage medium Download PDFInfo
- Publication number
- CN112256821B CN112256821B CN202011013241.1A CN202011013241A CN112256821B CN 112256821 B CN112256821 B CN 112256821B CN 202011013241 A CN202011013241 A CN 202011013241A CN 112256821 B CN112256821 B CN 112256821B
- Authority
- CN
- China
- Prior art keywords
- address
- name information
- place name
- complete
- level
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/387—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the application provides a method, a device, equipment and a storage medium for Chinese address completion, which aim to quickly search and complete Chinese addresses and do not influence the performance of a system when a database is increased. The method comprises the following steps: storing the complete addresses in the address library according to the structure of the Trie; marking the address level of each path in the Trie on the last node of the path; searching the input address keywords in the Trie to obtain the address keywords marked with the address level; and analyzing the labeling information in the address keywords to obtain the complete address corresponding to the address keywords.
Description
Technical Field
The embodiment of the application relates to the technical field of information processing, in particular to a method, a device, equipment and a storage medium for Chinese address completion.
Background
The Chinese address completion is a technology closely related to the daily life of people, aims to complete incomplete addresses input by people into complete addresses, and has important application in various scenes and applications, such as online shopping and address filling when dealing with certificates. One of the existing address completion methods is to store a large amount of address information in a database, then query the information according to a specific field, and the other scheme is to optimize the query in an inverted index mode.
The prior art has the defects that when the database is too large, the efficiency in address inquiry can be reduced, the inquiry time is nearly linearly increased, the inverted index needs to be designed for a large number of databases in advance, and the time is consumed.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for Chinese address completion, which aim to realize quick searching and Chinese address completion.
An embodiment of the present application provides a method for completing a chinese address, where the method includes:
Storing the complete addresses in the address library according to the structure of the Trie;
marking the address level of each path in the Trie on the last node of the path;
searching the input address keywords in the Trie to obtain the address keywords marked with the address level;
And analyzing the labeling information in the address keywords to obtain the complete address corresponding to the address keywords.
Optionally, before storing the complete address in the address library according to the Trie structure, the method further includes:
Establishing an index for each of the complete addresses in the address library;
Storing the place name information in each complete address in the address library in a grading manner;
and generating a mapping of the place name information of each level and the index of the complete address to which the place name information belongs.
Optionally, storing the complete address in the address library according to the Trie structure includes:
Sequentially inserting the complete address into a path of the Trie by taking characters as a unit from the first character;
after the first place name information in the complete address is inserted, sequentially inserting the second place name information in the complete address into another path in the Trie by taking characters as a unit from the first character;
And after the second place name information is inserted, all the place name information is inserted into the Trie in sequence according to the rule for storage.
Optionally, the method further comprises:
when the first character of the current place name information is the same as the first character in the previous path, the node corresponding to the first character in the previous path is used as the root node of the path corresponding to the current place name information, and the current place name information is inserted into a Trie.
Optionally, searching the input place name information in the Trie to obtain the place name information marked with the address level, including:
Searching each character of the address keyword from the root node of the Trie to obtain a plurality of paths corresponding to the address keyword;
And determining the address level of each place name information in the address key words according to the address level marked on the last node of each path in the paths.
Optionally, the method further comprises:
and when the shortest path matched with the place name information cannot be found in the Trie, marking the place name information by using a mark outside the address level.
Optionally, resolving the labeling information of the address keyword to obtain the complete address corresponding to the address keyword, including:
determining an address level of each piece of place name information in the address key words;
Determining the complete address where the place name information is located according to the mapping of each place name information and the index of the complete address to which the place name information belongs, and obtaining a plurality of complete addresses containing the place name information;
When the address key word contains at least two place name information, calculating intersections among a plurality of complete addresses to obtain the complete addresses corresponding to the address key word;
When the address key word only contains one piece of place name information, the complete address with highest frequency of use is selected as the complete address of the address key word.
A second aspect of an embodiment of the present application provides a device for chinese address completion, the device including:
The first address storage module is used for storing the complete addresses in the address library according to the structure of the Trie;
the address level labeling module is used for labeling the address level of each path on the last node of the path in the Trie;
An address level searching module: searching the input address keywords in the Trie to obtain the address keywords marked with the address level;
complete address acquisition module: and analyzing the labeling information in the address keywords to obtain the complete address corresponding to the address keywords.
Optionally, the apparatus further comprises:
the index establishing module is used for establishing an index for each complete address in the address library;
The second storage module is used for storing the place name information in each complete address in the address library in a grading manner;
And the mapping generation module is used for generating mapping of the place name information of each level and the index of the complete address to which the place name information belongs.
Optionally, the first address storage module includes:
the first character insertion sub-module is used for sequentially inserting the complete address into one path of the Trie by taking characters as a unit from the first character;
the second character inserting sub-module is used for sequentially inserting the second place name information in the complete address into another path in the Trie by taking the character as a unit from the first character after the first place name information in the complete address is inserted;
And the third character insertion sub-module is used for sequentially inserting all the place name information into the Trie for storage according to the rule after the second place name information is inserted.
Optionally, the first address storage module further includes:
And the fourth character insertion sub-module is used for inserting the current place name information into the Trie by taking a node corresponding to the first character in the previous path as a root node of the path corresponding to the current place name information when the first character of the current place name information is the same as the first character in the previous path.
Optionally, the address level lookup module includes:
The path searching sub-module is used for searching each character of the address keyword from the root node of the Trie to obtain a plurality of paths corresponding to the address keyword;
And the first address level obtaining sub-module is used for determining the address level of each place name information in the address key words according to the address level marked on the last node of each path in the paths.
Optionally, the address level lookup module further includes:
and the second address level obtaining sub-module is used for marking the place name information by using a mark outside the address level when the shortest path matched with the place name information cannot be found in the Trie.
Optionally, the complete address obtaining module includes:
an address level determination sub-module for determining an address level of each of the place name information in the address keyword;
A complete address obtaining sub-module, configured to determine, according to a mapping between each piece of location name information and an index of a complete address to which the location name information belongs, the complete address where the location name information is located, and obtain a plurality of complete addresses including the location name information;
The first complete address determination submodule is used for calculating intersection sets among a plurality of complete addresses when the address keywords contain at least two pieces of place name information, and obtaining the complete addresses corresponding to the address keywords;
And the second complete address obtaining sub-module is used for selecting the complete address with highest use frequency as the complete address of the address key word when the address key word only contains one piece of place name information.
A third aspect of the embodiments of the present application provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method according to the first aspect of the present application.
A fourth aspect of the embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect of the application when the processor executes the computer program.
The method for completing Chinese address provided by the application is characterized in that the index of a complete address in an address library is pre-established, the place name information of each level contained in the complete address is stored in a grading manner, the mapping between the place name information of each level and the index of the complete address where the place name information is located is generated, after the pretreatment of the address library is completed, the complete address information in the address library is stored according to the structure of a Trie, wherein all the place name information is stored in each node of the Trie in a character unit, the place name information is a complete path, a plurality of place name information with the same character from the first character can share one or more nodes, the address level of the path is marked on the last node of each path, when a user inputs an address keyword, the level of all the place name information can be searched in the Trie according to the place name information of the corresponding level stored in the address library, and the complete address corresponding to all the place name information is found according to the mapping, and Chinese address completion is realized.
By using the Trie tree structure to store the complete address in the address library, the query time is reduced by using the special structure of the Trie tree, the quick search of the address library is realized, the Chinese address is quickly complemented, and the performance is not obviously reduced along with the increase of the address library.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for storing address library data according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for Chinese address completion according to an embodiment of the present application;
Fig. 3 is a block diagram of a Trie according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an apparatus for chinese address completion according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The existing Chinese address completion method is to query entry information matched with an input address in a database, so that the optimization of the performance and query conditions of the database is very dependent, and when the database is too huge, the query rate is reduced. Still another approach is to optimize the query in an inverted index manner, which requires pre-designing the database and creating the index, which is labor intensive.
In the embodiment of the application, the data in the address library is stored by adopting the structure of the Trie, so that the storage space is greatly saved, the Chinese address is complemented according to the mapping of the place name information of different levels and the corresponding complete address index, the searching speed is high, and the inquiring speed is not influenced when the database is increased.
Referring to fig. 1, fig. 1 is a flowchart of a method for storing address base data according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
s11: and establishing an index for each complete address in the address library.
In this embodiment, the address library is a database storing complete addresses, the complete addresses refer to chinese addresses containing highest-level place name information to lowest-level place name information, the index is a pointer list of the complete addresses in the address library, which corresponds to a directory in the address library, and the location of the complete addresses in the address library can be quickly determined by querying the index.
The index is built for each complete address in the address library, so that the complete address in the address library can be quickly searched, a pointer list pointing to the complete address in the address library is independently built, and the complete address in the address library is searched through the index during searching.
In this embodiment, for example, "Guangdong province, shenzhen city, nan mountain area, peach garden road" and "Beijing city, sea lake area," Zhongguancun road "are all a complete address," Guangdong province "," Beijing city "belongs to the highest-level place name information," peach garden road "and" Zhongguancun road "belong to the lowest-level place name information. The index of the Shenzhen city, the south mountain area and the peach garden road in Guangdong province can be set as "A", the index of the Beijing city, the sea lake area and the Zhongguancun road can be set as "B", namely the position of the complete address of the Shenzhen city, the south mountain area and the peach garden road in Guangdong province in the address library can be determined through the index A, so that the quick inquiry is realized, and the same effect as the index B is realized.
S12, storing the place name information in each complete address in the address library in a grading manner;
In this embodiment, a complete address is composed of a plurality of pieces of place name information with different levels, the name information with the same level is stored together, and quick search can be performed according to the level of the place name information during query.
In this embodiment, for example, all address information is divided into four levels, "province", "city", "county", "street, district", and may be respectively denoted as "P", "C", "D", "S", where the provincial level autonomous region, the special administrative region may also be regarded as "P" level place name information, the highest address level is "P" level, and the lowest address level is "S" level, for example, "Zhejiang province, guangdong province, jiangsu province, beijing city, and inner Mongolian autonomous region" are stored as P level address information.
And S13, generating a mapping of the place name information of each level and the index of the complete address to which the place name information belongs.
In this embodiment, each place name information belongs to a complete address, and each complete address is indexed in S11. And generating a mapping of each place name information and the index of the complete address to which the place name information belongs according to the affiliation of each place name information and the index of the complete address. When the place name information is stored separately, each piece of independent place name information may belong to a plurality of complete addresses, and a plurality of D-level place name information is subordinate in the C-level place name information, so when the mapping is generated, the index corresponding to the C-level place name information contains the index corresponding to the subordinate D-level place name information. The S-level place name information may have the same name in the same level, and when the map is generated, the index corresponding to the S-level place name information may include all indexes of the same-name place name information. After the mapping is generated, the index set of the complete address to which the place name information belongs can be queried according to the place name information.
In this embodiment, for example, the indexes of "Guangdong province, shenzhen city, nan mountain area, peach garden road" are "0", the indexes of "Guangdong province, shenzhen city, longpost area, longxiang main road" are "1", and then the address map of P level { Guangdong province, [0,1] }, the address map of C level { Shenzhen city [0,1] }, the address map of D level { nan mountain area, [0]; dragon' S guard zone [1] }, S level address mapping { peach garden, [0]; large dragon road, [1] }.
Through the steps from S11 to S13, the index of the complete address in the address library is established, each place name information is stored separately according to the level, and the mapping between each place name information and the index of the complete address to which the place name information belongs is established, so that the index of the complete address can be queried only according to the place name information.
Referring to fig. 2, fig. 2 is a flowchart of a method for chinese address completion according to an embodiment of the present application. As shown, the method includes the steps of:
And S21, storing the complete addresses in the address library according to the Trie structure.
In this embodiment, the Trie is also called a dictionary tree, and the Trie structure is a tree structure, which is a variant of the hash tree, and the Trie can use a common prefix to reduce the storage space, and perform well on text word frequency statistics tasks.
In this embodiment, there is no character on the root node of the Trie, there is only one character on each node except the root node, and multiple characters on one path form a character string, and all the child nodes of each node contain different character strings.
In this embodiment, the method for storing the complete address in the address library according to the Trie structure includes the following steps:
S21-1: the complete address is inserted into one path of the Trie in units of characters from the first character.
In this embodiment, a string of characters on each path of the Trie structure represents one address information of the complete address.
In this embodiment, when the complete address is inserted into the Trie, the first character of the highest-level address information in the complete address is inserted, and all the characters of the highest-level address information are sequentially inserted into a path, where the last character of the highest-level address information is the last node of the path.
S21-2: and after the first place name information in the complete address is inserted, sequentially inserting the second place name information in the complete address into another path in the Trie by taking the character as a unit from the first character.
In this embodiment, after the highest-level place name information is inserted into the Trie to form a path, the second character of the second-level place name information is inserted into the root node of a new path, and all the place name information of the second-level is inserted into the new path in turn, and similarly, the last character of the place name information of the second-level is the last node of the new path.
S21-3: and after the second place name information is inserted, all the place name information is inserted into the Trie in sequence according to the rule for storage.
In this embodiment, each place name information is sequentially inserted into the Trie tree in units of characters, wherein a string of characters on each path is one place name information. And storing all place name information in the address library into a Trie, thus completing the data storage.
For example, fig. 3 is a structural diagram of a Trie provided in an embodiment of the present application, and as shown in fig. 3, "Jiangsu Province, Nanjing City, Jianye District, Xinglong Street" is stored in the Trie structure, only one character is included in each node, and each path represents a place name information.
In another embodiment of the present application, when the first character of the current place name information is the same as the first character in the previous path, the node corresponding to the first character in the previous path is used as the root node of the path corresponding to the local name information, and the current place name information is inserted into the Trie.
When the first character of the current input place name information is used as the first character in the previous path, the node can be directly used as the root node of the current input place name information, and when characters are inserted in the next step, if the second character is still a child node of the current path, the node corresponding to the second character can still be used as the second node of the current input place name information until the same character can not be found in the current path by the subsequent continuous characters of the current place name information, the subsequent characters of the current place name information are inserted into a new node, and the subsequent characters are inserted into the new node of the current path no matter whether the previous characters occur or not, so that the insertion of the current place name information is completed. The place name information of different address levels may also share the same root node.
As shown in fig. 3, when the complete address "Jiangxi province, nanchang city, dong lake area, jiande street" is inserted into the Trie tree, the "Jiang" word of the place name information "Jiangxi province" appears before, and the node corresponding to the "Jiang" word is directly used as the root node of the place name information "Jiangxi province", and a new child node is created under the node corresponding to the "Jiang" word, so as to generate a new path. The insertion mode of the Nanchang city is the same, if the first character of the east lake area is not appeared in the previous root node, a new path is created for the east lake area, and if the build character of the Jiande street is appeared in the previous root node, a new path is created by directly taking the node corresponding to the build character as the root node.
The steps S21-1 to S21-2 are adopted to store the complete address according to the structure of the Trie, and the space is saved by sharing the prefix characters, so that the required address information can be quickly queried.
And S22, marking the address level of each path in the Trie on the last node of the path.
In this embodiment, after the complete addresses in the address library are all inserted into the Trie, the address level of each path, that is, the address level of each place name information, needs to be marked on each path. The address level is marked according to the address level of the place name information divided during storage.
In this embodiment, the way to label each path with the address level of the path is to label the last node of each path with the address level to which the path belongs. Thus, the address level of the place name information contained in the path can be obtained by analyzing the information of the last node.
As shown in FIG. 3, the last node of each path in the graph is marked with an address level, wherein the address of the P level is "Jiangsu province, Jiangxi province", the address of the C level is "Nanchang City, Nanjing City", the address of the D level is "Jianye district, Donghu district", the address of the S level is "Zhongnan street, Jiandeguan road".
In another embodiment of the application, a repeated path occurs, i.e. the location name information of different actual locations is repeated, when the address level is marked with a higher level address.
S23, searching the input address keywords in the Trie to obtain the address keywords marked with the address level.
In this embodiment, the address keyword refers to an incomplete address including a plurality of place name information, and the address keyword labeled with an address level is the address level of each place name information in the address keyword obtained by searching.
S23-1, searching each character of the address keyword from the root node of the Trie to obtain a plurality of paths corresponding to the address keyword.
In this embodiment, the address keyword is searched from the root node of the Trie to the lower node sequentially in units of characters. After the address key word is received, firstly, starting from a root node, searching a first character of first place name information in the address key word, determining a root node corresponding to the character, transferring to a next-stage subtree of the root node, searching a second character of the first place name information in the address key word, determining a corresponding subtree according to a node corresponding to the second character, continuing to transfer to a next-stage subtree to search a third character of the first place name information in the address key word, reading address level information on a certain node until the next-stage subtree is not contained in the certain node, and marking each character of the place name information with the address level information after the address level information on the certain node is read. And then searching a path corresponding to the next place name information in the address key word from the root node of the Trie in the same way as the searching way of the first place name information. Until the last character position of the address keyword is found. After the search is completed, a plurality of paths corresponding to a plurality of place name information contained in the address key word are obtained.
In this embodiment, for example, the input keywords are: as shown in the figure, the 'Jianye district, Xinglong street' is created, firstly, the root node where the 'Jian' word is located is searched, then the node where the 'Ye' word is located is searched in the next-stage subtree of the root node, the node where the 'district' word is located is searched in the next-stage subtree, and the node does not contain the next-stage subtree, so that the path corresponding to the 'Jianye district' is obtained. Then, the root node where the 'Xing' word is located is searched, and the path where the Xinglong street is located can be determined according to the method.
S23-2, determining the address level of each place name information in the address key words according to the address level marked on the last node of each path in the paths.
In this embodiment, after a plurality of paths included in the input address keyword are found through the previous step, the address level of each path is determined by reading the address level information on the last node of the path.
In this embodiment, the plurality of place name information in the same address keyword has different address levels, and after reading the effective address level information on the last node of each path, all nodes on the path are marked according to the address levels.
In this embodiment, after determining the path of the "Jianye district" in S23-1, reading the address level information on the node corresponding to the "district" word as D, and marking the "Jianye district" as DDD. Similarly, "Xinglong Street" is labeled as SSSS, and the address keywords with labeled information are "Jianye District, Xinglong Street; (DDD, SSSS)".
In another embodiment of the present application, when the shortest path matched with the place name information cannot be found in the Trie for the input address keyword, the place name information is marked with a flag other than the address level.
The shortest path which can not be found out and matched with the place name information, namely, the path which can not be found out and matched with the place name information in the Trie, for example, wrong words exist in the input address keywords, the place name information can not be matched with the shortest path because of the fact that a county is arranged in a county, the name is changed, the address library is not updated in time, and the like. At this time, the place name information is marked with a flag other than the existing address level.
For example, a county, subordinate to XX is named XX, the address library has not yet updated this region, when the input keyword is XX, XX is labeled CCC, when the search for XX does not match the shortest path, XX is labeled "???", at this time, address completion is no longer possible.
S24, analyzing the labeling information of the address keywords to obtain the complete address corresponding to the address keywords.
In this embodiment, the address keyword marked with the address level is obtained, and the complete address corresponding to the address keyword can be obtained by analyzing the address keyword, so that the completion of the chinese address is realized. The method specifically comprises the following steps:
S24-1, determining the address level of each piece of place name information in the address key words.
In this embodiment, the address keyword labeled with the address level is obtained through S23, and at this time, the address keyword containing the labeling information is obtained, and further operations are required to be performed on the place name information included in the address keyword, and the labeling information on the address keyword needs to be analyzed, so as to obtain the address level of each place name information in the keyword.
Illustratively, a regular expression is used to extract the address level to which each place name information in the address key corresponds. Jianye district, namely, a large street is created in the step S23; the (DDD, SSSS) is extracted to obtain a 'Jianye district (DDD)' and a 'Xinglong street (SSSS)'.
S24-2, determining the complete address where the place name information is located according to the mapping of each place name information and the index of the complete address to which the place name information belongs, and obtaining a plurality of complete addresses containing the place name information.
In this embodiment, the address level corresponding to each piece of location name information is obtained by resolving the labeling information in the address keyword, and the index of the complete address to which the location name information belongs can be directly searched in the location name information map of the corresponding level in the address library through the address level corresponding to each piece of location name information, so that all the complete addresses to which the location name information belongs can be determined.
For example, assume that the index of the complete address "Jiangsu province, nanjing, Jianye district, XXX way (representing multiple ways in Jianye district)" in the address library is "3,4,5, … …,15". The index of the complete address "Jiangsu province, nanjing city, Jianye district, xinghong street" in the address library is "5". The mapping of the index for the complete addresses of Jianye District and Jianye District to D-level place name information is {Jianye district, [3,4,5, … …,15] }, and the mapping of the indexes of the S-level place name information area and the complete addresses of the S-level place name information area { gorgeous street }, namely the S-level place name information area is { gorgeous street, [5,16,23] }, assuming that the S-level place name information area is a plurality of different complete addresses, such as the address area is provided with the indexes of the S-level place name information area "XX province, XX city, XX area, the index of the gorgeous street" is "16", "XX province, XX city, XX area, and the index of the gorgeous street" is "23".
After extracting two place name information of 'Jianye district (DDD)' and 'Xinghong street (SSSS)' from the address key words, searching 'Jianye district' in the D-level mapping to obtain index sets corresponding to the Jianye district as '3, 4,5, … … and 15', and searching all complete addresses belonging to the Jianye district through indexes. Searching 'Xinghuang street' in the S-level mapping to obtain index sets corresponding to the Xinghuang street as '5, 16 and 23', and searching all complete addresses to which the Xinghuang street belongs through the indexes.
S24-3: when the address key word contains at least two place name information, calculating intersection sets among a plurality of complete addresses to obtain the complete addresses corresponding to the address key word.
In this embodiment, when the address keyword at least includes two place name information, each place name information may correspond to a plurality of complete addresses, and at this time, an intersection between the plurality of complete addresses is calculated, so that a complete address corresponding to the address keyword may be obtained.
For example, in S24-2, a plurality of complete addresses are found through the indexes corresponding to "Jianye district" and "Xinglong street", at this time, the intersection between them is taken, and the complete addresses including "Jianye district" and "Xinglong street" are obtained as "Jiangsu province, nanjing city, Jianye district, Xinglong street".
S24-4: when the address key word only contains one piece of place name information, the complete address with highest frequency of use is selected as the complete address of the address key word.
In this embodiment, when only one place name information is included in the address keyword, the place name information may correspond to a plurality of complete addresses, and then the complete address with the highest frequency of use is selected as the complete address of the current address keyword according to the address generated in the past history.
For example, the input address keyword is "Changan road", a plurality of complete addresses including Changan road are obtained through inquiry, and the historical generation frequency of "Shanghai city, jing'an area, changan road" is highest, and then "Shanghai city, jing'an area, changan road" is selected as the complete address corresponding to Changan road.
In this embodiment, there is also a case where when the input address keyword does not include the next-level place name information, the address keyword of the level or more is complemented.
For example, the input address keyword is "Jianye district" to obtain all the complete addresses including Jianye district, calculate intersection of these complete addresses to obtain the complete address as "Jiangsu province, nanjing city, Jianye district", at this time, the address keyword does not have the place name information of the next level of the Jianye district, but cannot complement the next level address.
Through S21-S24, the complete address is stored according to the structure of the Trie, each path stores one piece of place name information, the last node of each path marks the address level of the place name information, when the address keyword is input, each place name information in the keyword is obtained by searching from the root node, the mark information on the last node is read, the address level of the place name information corresponding to the path is obtained, a plurality of complete addresses corresponding to the place name information of each level in the address keyword are found through the mapping of the place name information generated before and the complete address to which the place name information belongs, and the completion of the Chinese address is realized through calculating the intersection among the plurality of complete addresses. The use of Trie to store address library information can save space, and complete addresses corresponding to address keywords can be quickly searched by a hierarchical searching method. The searching efficiency is high, the searching speed is high, and the performance is not affected when the address library data is increased.
Based on the same inventive concept, an embodiment of the application provides a Chinese address completion device. Referring to fig. 4, fig. 4 is a schematic diagram of an apparatus for chinese address completion according to an embodiment of the present application. As shown in fig. 4, the apparatus includes:
A first address storage module 401, configured to store the complete address in the address library according to a Trie structure;
An address level labeling module 402, configured to label an address level of each path in the Trie on a last node of the path;
address level lookup module 403: searching the input address keywords in the Trie to obtain the address keywords marked with the address level;
Complete address acquisition module 404: and analyzing the labeling information in the address keywords to obtain the complete address corresponding to the address keywords.
Optionally, the apparatus further comprises:
the index establishing module is used for establishing an index for each complete address in the address library;
The second storage module is used for storing the place name information in each complete address in the address library in a grading manner;
And the mapping generation module is used for generating mapping of the place name information of each level and the index of the complete address to which the place name information belongs.
Optionally, the first address storage module includes:
the first character insertion sub-module is used for sequentially inserting the complete address into one path of the Trie by taking characters as a unit from the first character;
the second character inserting sub-module is used for sequentially inserting the second place name information in the complete address into another path in the Trie by taking the character as a unit from the first character after the first place name information in the complete address is inserted;
And the third character insertion sub-module is used for sequentially inserting all the place name information into the Trie for storage according to the rule after the second place name information is inserted.
Optionally, the first address storage module further includes:
And the fourth character insertion sub-module is used for inserting the current place name information into the Trie by taking a node corresponding to the first character in the previous path as a root node of the path corresponding to the current place name information when the first character of the current place name information is the same as the first character in the previous path.
Optionally, the address level lookup module includes:
The path searching sub-module is used for searching each character of the address keyword from the root node of the Trie to obtain a plurality of paths corresponding to the address keyword;
And the first address level obtaining sub-module is used for determining the address level of each place name information in the address key words according to the address level marked on the last node of each path in the paths.
Optionally, the address level lookup module further includes:
and the second address level obtaining sub-module is used for marking the place name information by using a mark outside the address level when the shortest path matched with the place name information cannot be found in the Trie.
Optionally, the complete address obtaining module includes:
an address level determination sub-module for determining an address level of each of the place name information in the address keyword;
A complete address obtaining sub-module, configured to determine, according to a mapping between each piece of location name information and an index of a complete address to which the location name information belongs, the complete address where the location name information is located, and obtain a plurality of complete addresses including the location name information;
The first complete address determination submodule is used for calculating intersection sets among a plurality of complete addresses when the address keywords contain at least two pieces of place name information, and obtaining the complete addresses corresponding to the address keywords;
And the second complete address obtaining sub-module is used for selecting the complete address with highest use frequency as the complete address of the address key word when the address key word only contains one piece of place name information.
Based on the same inventive concept, another embodiment of the present application provides a readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method for chinese address completion according to any of the above embodiments of the present application.
Based on the same inventive concept, another embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the steps in the method for completing chinese addresses according to any one of the above embodiments of the present application.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or terminal device that comprises the element.
The above method, apparatus, device and storage medium for chinese address completion provided by the present application have been described in detail, and specific examples are applied herein to illustrate the principles and embodiments of the present application, and the above examples are only used to help understand the method and core ideas of the present application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present application, the present description should not be construed as limiting the present application in view of the above.
Claims (8)
1. A method of chinese address completion, the method comprising:
storing the complete address in the address library according to the structure of the Trie, wherein the method comprises the following steps:
Sequentially inserting the complete address into a path of the Trie by taking characters as a unit from the first character;
After the first place name information in the complete address is inserted, sequentially inserting second place name information in the complete address into another path in the Trie by taking characters as a unit from a first character, wherein the first place name information is the place name information of the highest level, and the second place name information is the place name information of the second level;
after the second place name information is inserted, all the place name information is inserted into a Trie in sequence according to the rule for storage;
marking the address level of each path in the Trie on the last node of the path;
searching the input address keywords in the Trie to obtain the address keywords marked with the address level;
analyzing the labeling information in the address keywords to obtain the complete address corresponding to the address keywords, wherein the analyzing comprises the following steps:
determining an address level of each piece of place name information in the address key words;
Determining the complete address where the place name information is located according to the mapping of each place name information and the index of the complete address to which the place name information belongs, and obtaining a plurality of complete addresses containing the place name information;
When the address key word contains at least two place name information, calculating intersections among a plurality of complete addresses to obtain the complete addresses corresponding to the address key word;
When the address key word only contains one piece of place name information, the complete address with highest frequency of use is selected as the complete address of the address key word.
2. The method of claim 1, wherein before storing the complete addresses in the address library in the Trie structure, the method further comprises:
Establishing an index for each of the complete addresses in the address library;
Storing the place name information in each complete address in the address library in a grading manner;
and generating a mapping of the place name information of each level and the index of the complete address to which the place name information belongs.
3. The method according to claim 2, wherein the method further comprises:
when the first character of the current place name information is the same as the first character in the previous path, the node corresponding to the first character in the previous path is used as the root node of the path corresponding to the current place name information, and the current place name information is inserted into a Trie.
4. The method of claim 1, wherein searching the Trie for the input address keyword to obtain the address keyword labeled with the address level comprises:
Searching each character of the address keyword from the root node of the Trie to obtain a plurality of paths corresponding to the address keyword;
And determining the address level of each place name information in the address key words according to the address level marked on the last node of each path in the paths.
5. The method according to claim 4, wherein the method further comprises:
and when the shortest path matched with the place name information cannot be found in the Trie, marking the place name information by using a mark outside the address level.
6. An apparatus for chinese address completion, the apparatus comprising:
the first address storage module is used for storing the complete addresses in the address library according to the structure of the Trie, and comprises the following steps:
Determining the address level of each place name information in the address key words;
Determining the complete address where the place name information is located according to the mapping of each place name information and the index of the complete address to which the place name information belongs, and obtaining a plurality of complete addresses containing the place name information;
When the address key word contains at least two place name information, calculating intersections among a plurality of complete addresses to obtain the complete addresses corresponding to the address key word;
when the address key word only contains one piece of place name information, selecting the complete address with highest use frequency as the complete address of the address key word;
the address level labeling module is used for labeling the address level of each path on the last node of the path in the Trie;
An address level searching module: searching the input address keywords in the Trie to obtain the address keywords marked with the address level;
Complete address acquisition module: analyzing the labeling information in the address keywords to obtain the complete address corresponding to the address keywords, wherein the analyzing comprises the following steps:
determining an address level of each piece of place name information in the address key words;
Determining the complete address where the place name information is located according to the mapping of each place name information and the index of the complete address to which the place name information belongs, and obtaining a plurality of complete addresses containing the place name information;
When the address key word contains at least two place name information, calculating intersections among a plurality of complete addresses to obtain the complete addresses corresponding to the address key word;
When the address key word only contains one piece of place name information, the complete address with highest frequency of use is selected as the complete address of the address key word.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 5 when executing the computer program.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011013241.1A CN112256821B (en) | 2020-09-23 | 2020-09-23 | Chinese address completion method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011013241.1A CN112256821B (en) | 2020-09-23 | 2020-09-23 | Chinese address completion method, device, equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN112256821A CN112256821A (en) | 2021-01-22 |
| CN112256821B true CN112256821B (en) | 2024-05-17 |
Family
ID=74231990
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202011013241.1A Active CN112256821B (en) | 2020-09-23 | 2020-09-23 | Chinese address completion method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN112256821B (en) |
Families Citing this family (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113204613B (en) * | 2021-04-26 | 2022-05-03 | 北京百度网讯科技有限公司 | Address generation method, device, equipment and storage medium |
| CN113139033B (en) * | 2021-05-13 | 2024-07-09 | 平安国际智慧城市科技股份有限公司 | Text processing method, device, equipment and storage medium |
| CN113656450A (en) * | 2021-07-12 | 2021-11-16 | 大箴(杭州)科技有限公司 | Address processing method and device, electronic equipment and storage medium |
| CN113886650A (en) * | 2021-09-16 | 2022-01-04 | 北京捷通华声科技股份有限公司 | Address query method, system, device and storage medium |
| CN114818670B (en) * | 2022-03-25 | 2025-03-18 | 大箴(杭州)科技有限公司 | A method and device for completing address based on double array, and storage medium |
| CN115238692B (en) * | 2022-06-29 | 2025-10-03 | 青岛海尔科技有限公司 | A method, system, device and storage medium for identifying place names |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101132150B1 (en) * | 2010-10-12 | 2012-07-11 | (주)수지원넷소프트 | Address processing for formalizing addresses |
| CN108369582A (en) * | 2018-03-02 | 2018-08-03 | 福建联迪商用设备有限公司 | Address error correction method and terminal |
| CN109815498A (en) * | 2019-01-25 | 2019-05-28 | 深圳市小赢信息技术有限责任公司 | A kind of Chinese address standardized method, device and electronic equipment |
| CN110147420A (en) * | 2019-05-07 | 2019-08-20 | 武大吉奥信息技术有限公司 | A kind of place name address matching querying method and system based on spectrum model |
| CN110442603A (en) * | 2019-07-03 | 2019-11-12 | 平安科技(深圳)有限公司 | Address matching method, apparatus, computer equipment and storage medium |
| CN110750704A (en) * | 2019-10-23 | 2020-02-04 | 深圳计算科学研究院 | Method and device for automatically completing query |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6658356B2 (en) * | 2002-02-15 | 2003-12-02 | International Business Machines Corporation | Programmatically deriving street geometry from address data |
-
2020
- 2020-09-23 CN CN202011013241.1A patent/CN112256821B/en active Active
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR101132150B1 (en) * | 2010-10-12 | 2012-07-11 | (주)수지원넷소프트 | Address processing for formalizing addresses |
| CN108369582A (en) * | 2018-03-02 | 2018-08-03 | 福建联迪商用设备有限公司 | Address error correction method and terminal |
| CN109815498A (en) * | 2019-01-25 | 2019-05-28 | 深圳市小赢信息技术有限责任公司 | A kind of Chinese address standardized method, device and electronic equipment |
| CN110147420A (en) * | 2019-05-07 | 2019-08-20 | 武大吉奥信息技术有限公司 | A kind of place name address matching querying method and system based on spectrum model |
| CN110442603A (en) * | 2019-07-03 | 2019-11-12 | 平安科技(深圳)有限公司 | Address matching method, apparatus, computer equipment and storage medium |
| CN110750704A (en) * | 2019-10-23 | 2020-02-04 | 深圳计算科学研究院 | Method and device for automatically completing query |
Also Published As
| Publication number | Publication date |
|---|---|
| CN112256821A (en) | 2021-01-22 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112256821B (en) | Chinese address completion method, device, equipment and storage medium | |
| CN108959244B (en) | Address word segmentation method and device | |
| CN103561133B (en) | A kind of IP address attribution information index method and method for quickly querying | |
| CN107153647B (en) | Method, apparatus, system and computer program product for data compression | |
| CN107092659B (en) | Universal tree structure storage and analysis method | |
| CN111460311A (en) | Search processing method, device and equipment based on dictionary tree and storage medium | |
| CN107798054B (en) | A Trie-based range query method and device | |
| CN103123650B (en) | A kind of XML data storehouse full-text index method mapped based on integer | |
| CN108038090B (en) | A kind for the treatment of method and apparatus of Text Address | |
| CN112528174A (en) | Address finishing and complementing method based on knowledge graph and multiple matching and application | |
| US10810258B1 (en) | Efficient graph tree based address autocomplete and autocorrection | |
| CN104679801A (en) | Point of interest searching method and point of interest searching device | |
| CN113886650A (en) | Address query method, system, device and storage medium | |
| CN108228657B (en) | Method and device for realizing keyword retrieval | |
| CN109918664B (en) | Word segmentation method and device | |
| CN108446357A (en) | A kind of mass data spatial dimension querying method based on two-dimentional geographical location | |
| CN107239549A (en) | Method, device and the terminal of database terminology retrieval | |
| CN107748778B (en) | A method and device for extracting addresses | |
| CN103002061A (en) | Method and device for mutual conversion of long domain names and short domain names | |
| CN112083812A (en) | Associative word determining method and device, storage medium and electronic equipment | |
| CN118551031B (en) | Platform content intelligent recommendation method and system based on natural language processing | |
| CN106777118B (en) | A kind of quick abstracting method of geographical vocabulary based on fuzzy dictionary tree | |
| CN113946580A (en) | Mass heterogeneous log data retrieval middleware | |
| CN104008205A (en) | Content routing inquiry method and system | |
| CN115563409A (en) | A method, device, equipment and medium for address administrative division identification |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |