CN102236706A - Fast fuzzy pinyin inquiry method of mass Chinese file names - Google Patents
Fast fuzzy pinyin inquiry method of mass Chinese file names
- Publication number
- CN102236706A (application number CN 201110163943)
- Authority
- CN
- China
- Prior art keywords
- chinese
- pinyin
- filename
- file name
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a fast fuzzy pinyin query method for massive numbers of Chinese file names, comprising the following steps: 1) determining whether the query string is Chinese pinyin; if it is, converting and expanding it according to fuzzy pinyin rules to form new query strings, and if it is not, leaving the query string unchanged; 2) running the SetBackwardOracleMatching (SBOM) algorithm on the resulting query strings to build an oracle finite automaton that recognizes the pattern strings; 3) traversing the file name database and pre-filtering the file names stored in it; 4) running SBOM matching on the file names that passed the pre-filter in step 3), sorting all results that satisfy the query, and returning them. The method offers fast queries over massive numbers of files, fast queries in Chinese, and precise queries with fuzzy pinyin.
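The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fuzzy rule table `FUZZY_INITIALS`, the space-separated query format, the `(file_name, pinyin)` database layout, and the length-based pre-filter are all hypothetical, and Python's plain substring test stands in for SBOM matching, which this record does not describe in detail.

```python
from itertools import product

# Hypothetical fuzzy pinyin rules: each pair is treated as interchangeable.
# The patent's actual rule table is not given in this record.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s")]


def expand_syllable(syllable):
    """Return the syllable plus its fuzzy-initial variant, if one exists."""
    for long_form, short_form in FUZZY_INITIALS:
        if syllable.startswith(long_form):
            return [syllable, short_form + syllable[len(long_form):]]
        if syllable.startswith(short_form):
            return [syllable, long_form + syllable[len(short_form):]]
    return [syllable]


def build_patterns(query):
    """Step 1: expand a pinyin query; leave any other query unchanged.

    Assumption: pinyin syllables in the query are separated by spaces.
    """
    syllables = query.lower().split()
    if not syllables or not all(s.isascii() and s.isalpha() for s in syllables):
        return [query]  # not pinyin, so the query string stays as it is
    variants = [expand_syllable(s) for s in syllables]
    return ["".join(combo) for combo in product(*variants)]


def search(query, database):
    """Steps 2 to 4, with a substring test standing in for SBOM.

    database: iterable of (file_name, pinyin) pairs; storing the pinyin
    form next to each name is an assumption made for this sketch.
    """
    patterns = build_patterns(query)
    min_len = min(len(p) for p in patterns)
    hits = []
    for name, pinyin in database:
        # Hypothetical pre-filter: entries shorter than the shortest
        # pattern cannot contain any pattern, so skip them early.
        if max(len(name), len(pinyin)) < min_len:
            continue
        # Stand-in for multi-pattern SBOM matching.
        if any(p in name or p in pinyin for p in patterns):
            hits.append(name)
    return sorted(hits)


database = [
    ("张三.txt", "zhangsan.txt"),
    ("臧山.doc", "zangshan.doc"),
    ("李四.txt", "lisi.txt"),
]
print(search("zhang san", database))
print(search("李四", database))
```

In this sketch the query `zhang san` expands to four patterns, so it finds both the exact name and the fuzzy variant stored as `zangshan`. A real SBOM implementation would match all patterns in a single pass over each file name instead of testing them one by one.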
Description
Claims (6)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 201110163943 CN102236706B (en) | 2011-06-17 | 2011-06-17 | Fast fuzzy pinyin inquiry method of mass Chinese file names |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN 201110163943 CN102236706B (en) | 2011-06-17 | 2011-06-17 | Fast fuzzy pinyin inquiry method of mass Chinese file names |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102236706A (en) | 2011-11-09 |
| CN102236706B (en) | 2012-12-05 |
Family ID: 44887352
Family Applications (1)
| Application Number | Status | Priority Date | Filing Date | Title |
|---|---|---|---|---|
| CN 201110163943 CN102236706B (en) | Expired - Fee Related | 2011-06-17 | 2011-06-17 | Fast fuzzy pinyin inquiry method of mass Chinese file names |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102236706B (en) |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102915333A (en) * | 2012-09-05 | 2013-02-06 | 佳都新太科技股份有限公司 | Off-line checking method of massive voice recording file |
| CN103838876A (en) * | 2014-03-27 | 2014-06-04 | 烽火通信科技股份有限公司 | Method for retrieving document through pinyin and document retrieval system |
| CN104268203A (en) * | 2014-09-23 | 2015-01-07 | 深圳市中兴移动通信有限公司 | Mobile terminal and junk information effectively filtering method and device thereof |
| CN107220381A (en) * | 2017-06-28 | 2017-09-29 | 南京云问网络技术有限公司 | A kind of input text automatic error correction method towards question answering system |
| CN108132999A (en) * | 2017-12-21 | 2018-06-08 | 恒宝股份有限公司 | The processing method and system of a kind of masurium |
| CN109145161A (en) * | 2018-07-12 | 2019-01-04 | 南京师范大学 | Chinese Place Names querying method, device and equipment |
| CN110188166A (en) * | 2019-05-15 | 2019-08-30 | 北京字节跳动网络技术有限公司 | Document search method, device and electronic equipment |
| CN114564452A (en) * | 2022-03-02 | 2022-05-31 | 统信软件技术有限公司 | File positioning method, computing device and storage medium |
| CN115794745A (en) * | 2023-01-29 | 2023-03-14 | 深圳市乐凡信息科技有限公司 | File searching method, system, device and storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101388012A (en) * | 2007-09-13 | 2009-03-18 | 阿里巴巴集团控股有限公司 | Phonetic check system and method with easy confusion tone recognition |
| US20090292693A1 (en) * | 2008-05-26 | 2009-11-26 | International Business Machines Corporation | Text searching method and device and text processor |
| WO2010003129A2 (en) * | 2008-07-03 | 2010-01-07 | The Regents Of The University Of California | A method for efficiently supporting interactive, fuzzy search on structured data |
| CN101794313A (en) * | 2010-03-10 | 2010-08-04 | 中国农业大学 | File search device of embedded system |
| CN102081649A (en) * | 2010-12-31 | 2011-06-01 | 深圳联友科技有限公司 | Method and system for searching computer files |
- 2011-06-17: CN 201110163943, patent CN102236706B (en), not active, Expired - Fee Related
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102915333A (en) * | 2012-09-05 | 2013-02-06 | 佳都新太科技股份有限公司 | Off-line checking method of massive voice recording file |
| CN103838876A (en) * | 2014-03-27 | 2014-06-04 | 烽火通信科技股份有限公司 | Method for retrieving document through pinyin and document retrieval system |
| CN103838876B (en) * | 2014-03-27 | 2017-06-20 | 烽火通信科技股份有限公司 | Use the document retrieval method and system of phonetic retrieval file |
| CN104268203A (en) * | 2014-09-23 | 2015-01-07 | 深圳市中兴移动通信有限公司 | Mobile terminal and junk information effectively filtering method and device thereof |
| CN104268203B (en) * | 2014-09-23 | 2016-09-14 | 努比亚技术有限公司 | A mobile terminal and its method and device for effectively filtering junk information |
| CN107220381A (en) * | 2017-06-28 | 2017-09-29 | 南京云问网络技术有限公司 | A kind of input text automatic error correction method towards question answering system |
| CN108132999A (en) * | 2017-12-21 | 2018-06-08 | 恒宝股份有限公司 | The processing method and system of a kind of masurium |
| CN109145161A (en) * | 2018-07-12 | 2019-01-04 | 南京师范大学 | Chinese Place Names querying method, device and equipment |
| CN110188166A (en) * | 2019-05-15 | 2019-08-30 | 北京字节跳动网络技术有限公司 | Document search method, device and electronic equipment |
| CN114564452A (en) * | 2022-03-02 | 2022-05-31 | 统信软件技术有限公司 | File positioning method, computing device and storage medium |
| CN115794745A (en) * | 2023-01-29 | 2023-03-14 | 深圳市乐凡信息科技有限公司 | File searching method, system, device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN102236706B (en) | 2012-12-05 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN102236706A (en) | Fast fuzzy pinyin inquiry method of mass Chinese file names | |
| US11853334B2 (en) | Systems and methods for generating and using aggregated search indices and non-aggregated value storage | |
| US7685106B2 (en) | Sharing of full text index entries across application boundaries | |
| US8745061B2 (en) | Suffix array candidate selection and index data structure | |
| Chaudhuri et al. | Extending autocompletion to tolerate errors | |
| EP1643384B1 (en) | Query forced indexing | |
| CN107844493B (en) | File association method and system | |
| US11977581B2 (en) | System and method for searching chains of regions and associated search operators | |
| CN114722137A (en) | Security policy configuration method, device and electronic device based on sensitive data identification | |
| US12321340B2 (en) | System and method for value based region searching and associated search operators | |
| CN106708814B (en) | Retrieval method and device based on relational database | |
| CN103365992A (en) | Method for realizing dictionary search of Trie tree based on one-dimensional linear space | |
| JP4998237B2 (en) | Logical structure model creation support program, logical structure model creation support apparatus, and logical structure model creation support method | |
| US10545960B1 (en) | System and method for set overlap searching of data lakes | |
| WO2019171190A1 (en) | System and method for searching based on text blocks and associated search operators | |
| US8484221B2 (en) | Adaptive routing of documents to searchable indexes | |
| CN102521418A (en) | Pinyin storage structure and pinyin input method | |
| CN118349621A (en) | Index establishment method, index retrieval method and electronic equipment | |
| Arseneau et al. | STILT: Unifying spatial, temporal and textual search using a generalized multi-dimensional index | |
| CN114817498A (en) | User intention identification method, device, equipment and storage medium | |
| CN114969152A (en) | A fast fuzzy query method and system for road passenger station | |
| CN115982102A (en) | Meteorological big data management method and management system based on elastic search | |
| KR100659370B1 (en) | Method for Forming Document DV by Information Thesaurus Matching and Information Retrieval Method | |
| US12511316B2 (en) | Systems and methods for generating and using aggregated search indices and non-aggregated value storage | |
| Ilić et al. | Comparison of data mining algorithms, inverted index search and suffix tree clustering search |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C53 | Correction of patent of invention or patent application | |
| | CB03 | Change of inventor or designer information | Inventor after: Yuan Xinyu, Li Ying, Wu Zhaohui, Yin Jianwei. Inventor before: Yuan Xinyu, Li Ying. |
| | COR | Change of bibliographic data | Free format text: CORRECT: INVENTOR; FROM: YUAN XINYU, LI YING; TO: YUAN XINYU, LI YING, WU ZHAOHUI, YIN JIANWEI |
| | C14 | Grant of patent or utility model | |
| | GR01 | Patent grant | |
| | EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20111109; Assignee: IPANEL. TV Inc.; Assignor: Zhejiang University; Contract record no.: 2013330000103; Denomination of invention: Fast fuzzy pinyin inquiry method of mass Chinese file names; Granted publication date: 20121205; License type: Common License; Record date: 20130425 |
| | LICC | Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20121205 |
| | CF01 | Termination of patent right due to non-payment of annual fee | |