CN104133839A - Data processing method and system with intelligent detection function - Google Patents
- Publication number: CN104133839A
- Application number: CN201410291108.0A
- Authority: CN (China)
- Prior art keywords: project, data, information, module, declaring
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24564—Applying rules; Deductive queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Strategic Management (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Economics (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data processing method and system with an intelligent detection function. Step 1: collect the data of the declared projects. Step 2: read the data in the master data table and each sub-data table of the project declaration database, and judge whether the data meets the requirements. Step 3: synchronize the project data that met the requirements in Step 2 from the project declaration database to the review database; screen the qualifying projects three times, and output the data of the final award-winning projects stored in the third cache area of the review database. The advantage of the data judgment is that the system automatically extracts duplicate-checking factors and performs complex matching calculations. This reduces human factors, improves the correctness of duplicate-checking results, lightens the staff's workload and greatly improves work efficiency.
Description
Technical field
The invention relates to a data processing method and system with an intelligent detection function.
Background
Existing science and technology award management systems have the following shortcomings in data processing:
Science and technology award management involves a large volume of data, and a large amount of it must be processed every year. The screening performed during processing is not sufficiently sound, and existing systems lack automatic duplicate checking and automatic processing.
Screening large volumes of cumbersome data is difficult, and the processing workflow is poorly designed. The original system screens the data only once, on a single criterion, with considerable human intervention, which undermines fairness. Manual data processing is also labor-intensive and inefficient: the original system requires staff to look up and compare data by hand, which is slow and burdensome.
When applying for science and technology awards, applicants fill in many declaration materials. Where the materials contain a project name, a paper or monograph title, the project completers or patent documents, the names must be checked for suspected duplicate applications. At present this is done entirely by manual identification, which is not accurate enough given the large volume of declared data.
Summary of the invention
The purpose of the present invention is to solve the above problems by providing a method and system for processing project declaration data. The advantage of its data judgment is that the system automatically extracts duplicate-checking factors and performs complex matching calculations. This reduces human factors, improves the fairness and correctness of duplicate-checking results, lightens the staff's workload and greatly improves work efficiency.
To achieve the above object, the present invention adopts the following technical solutions:
A data processing method with an intelligent detection function comprises the following steps:
Step 1: a JS script automatically detects the browser version used by the current user. For non-IE browsers it displays a prompt and closes. If the browser meets the requirements, the user enters the system for data collection.
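The patent performs this check in a browser-side JS script. As a rough illustration only, the same decision can be sketched in Python against a User-Agent string. The regular expression is an assumption: IE 10 and earlier report `MSIE <version>`, and IE 11 reports the `Trident/` engine token. The sketch also checks the browser family rather than a specific version.

```python
import re

# Hypothetical server-side analogue of the Step 1 browser check (the patent
# uses a JS script in the browser). IE 10 and earlier send "MSIE <version>";
# IE 11 sends the "Trident/<version>" rendering-engine token.
IE_PATTERN = re.compile(r"MSIE \d|Trident/\d")


def is_ie_browser(user_agent: str) -> bool:
    """Return True if the User-Agent string looks like an IE-family browser."""
    return IE_PATTERN.search(user_agent) is not None


def gate_request(user_agent: str) -> str:
    """Admit IE-family browsers to data collection; otherwise prompt and close."""
    if is_ie_browser(user_agent):
        return "enter_data_collection"
    return "prompt_and_close"


ie11 = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"
print(gate_request(ie11))     # enter_data_collection
print(gate_request(firefox))  # prompt_and_close
```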
Step 2: the remote data collection terminal captures each completer's handwriting through a handwriting tablet and stores it in the handwriting feature database. It also stores the collected declared-project data in the cache area of the network server. The host computer retrieves this data from the network server's cache area and stores it in the host computer's first cache area. Collected images, Word documents and PDF documents are stored as files in the host computer's second cache area, and each document's relative path is stored in the attachment sub-table of the project declaration database.
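The following is a minimal sketch of the file-handling part of Step 2. It assumes the second cache area is a directory on the host computer and the attachment sub-table is an SQLite table named `attachment`; both names are hypothetical. Only the path relative to the cache root is stored, so the cache directory can be moved without rewriting the table.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical: the "second cache area" is a directory and the attachment
# sub-table is an SQLite table named `attachment`.
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".doc", ".docx", ".pdf"}


def store_attachment(conn: sqlite3.Connection, cache_root: Path,
                     project_no: str, filename: str, data: bytes) -> str:
    """Save a file under cache_root and record its path relative to cache_root."""
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported attachment type: {suffix}")
    rel_path = Path(project_no) / Path(filename).name  # drop client-side directories
    target = cache_root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    conn.execute(
        "INSERT INTO attachment (project_no, rel_path) VALUES (?, ?)",
        (project_no, rel_path.as_posix()),
    )
    conn.commit()
    return rel_path.as_posix()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachment (project_no TEXT, rel_path TEXT)")
with tempfile.TemporaryDirectory() as tmp:
    rel = store_attachment(conn, Path(tmp), "P2014001", "seal.png", b"\x89PNG")
    print(rel)                                           # P2014001/seal.png
    print((Path(tmp) / rel).read_bytes() == b"\x89PNG")  # True
```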
Step 3: the host computer reads the declared-project information in the first cache area and judges whether the data meets the requirements. For image documents, the document path in the attachment sub-table of the project declaration database is used to retrieve the image from the host computer's second cache area. An image recognition module then reads the image content and compares the recognized completer handwriting with the handwriting feature database. It also checks whether the completing unit's seal matches the unit name in the completing-unit data table. If the handwriting information in the handwriting feature database, the declared-project information in the project declaration data table and the image documents all meet the requirements, the method proceeds to Step 4. Otherwise, it returns to Step 2.
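The final decision in Step 3 combines three checks. The sketch below covers only that gate. It assumes handwriting comparison and seal recognition have already run, and passes their outputs in as hypothetical fields. Comparing the recognized seal text with the registered unit name after stripping whitespace is an assumption, not a rule stated in the patent.

```python
from dataclasses import dataclass


@dataclass
class CheckResults:
    """Inputs to the Step 3 gate; every field is a hypothetical stand-in."""
    handwriting_matches: bool  # result of comparing against the handwriting feature database
    seal_text: str             # unit name as read from the seal by image recognition
    registered_unit_name: str  # unit name from the completing-unit data table
    record_complete: bool      # declared-project fields passed the data checks


def normalize(name: str) -> str:
    """Ignore whitespace differences when comparing unit names."""
    return "".join(name.split())


def next_step(r: CheckResults) -> int:
    """Return 4 if every check passes, otherwise 2 (collect the data again)."""
    seal_ok = normalize(r.seal_text) == normalize(r.registered_unit_name)
    if r.handwriting_matches and seal_ok and r.record_complete:
        return 4
    return 2


print(next_step(CheckResults(True, "Example Institute", "Example  Institute", True)))  # 4
print(next_step(CheckResults(True, "Other Institute", "Example Institute", True)))     # 2
```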
Step 4: the project data that met the requirements in Step 3 is synchronized from the project declaration database to the review database using one-way synchronization. The declaration information in the review database is then screened three times, and the final data is output on a browsing page of the server.
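One-way synchronization means data flows from the project declaration database to the review database and never back. The following is a minimal sketch with two in-memory SQLite databases. It assumes a hypothetical `project` table whose `qualified` flag has been set by Step 3.

```python
import sqlite3

# Hypothetical table shared by both databases.
SCHEMA = "CREATE TABLE project (project_no TEXT PRIMARY KEY, name TEXT, qualified INTEGER)"


def one_way_sync(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy qualified projects from the declaration DB into the review DB.

    Data only flows src -> dst; nothing is ever read back from dst.
    """
    rows = src.execute(
        "SELECT project_no, name, qualified FROM project WHERE qualified = 1"
    ).fetchall()
    dst.executemany(
        "INSERT OR REPLACE INTO project (project_no, name, qualified) VALUES (?, ?, ?)",
        rows,
    )
    dst.commit()
    return len(rows)


declaration = sqlite3.connect(":memory:")
review = sqlite3.connect(":memory:")
declaration.execute(SCHEMA)
review.execute(SCHEMA)
declaration.executemany(
    "INSERT INTO project VALUES (?, ?, ?)",
    [("P1", "Smart grid sensing", 1), ("P2", "Incomplete entry", 0)],
)
print(one_way_sync(declaration, review))  # 1
review.execute("UPDATE project SET name = 'edited in review' WHERE project_no = 'P1'")
# The edit in the review DB does not propagate back to the declaration DB.
print(declaration.execute("SELECT name FROM project WHERE project_no = 'P1'").fetchone()[0])  # Smart grid sensing
```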
Step 2 specifically works as follows. Data directly related to a project is stored in the master data table of the project declaration database, and each record corresponds to a unique project number. Data indirectly related to the project is stored in the sub-data tables of the project declaration database, where the unique project number is set as a foreign key linking to the master data table. The sub-data tables of the declaration database also store historical project declaration information.
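The master/sub-table layout of Step 2 maps naturally onto a relational schema. The sketch below is illustrative only:

- table and column names are hypothetical;
- only two of the sub-tables listed in the next paragraph are shown;
- SQLite is used for convenience, and it enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Hypothetical schema: one master table keyed by the unique project number,
# plus two example sub-tables that reference it through a foreign key.
SCHEMA = """
CREATE TABLE project (
    project_no  TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    discipline  TEXT,
    tech_field  TEXT,
    source      TEXT,
    industry    TEXT
);
CREATE TABLE completer (
    id          INTEGER PRIMARY KEY,
    project_no  TEXT NOT NULL REFERENCES project(project_no),
    name        TEXT NOT NULL,
    id_number   TEXT NOT NULL,
    seq         INTEGER NOT NULL
);
CREATE TABLE ip_right (
    id              INTEGER PRIMARY KEY,
    project_no      TEXT NOT NULL REFERENCES project(project_no),
    application_no  TEXT NOT NULL,
    title           TEXT,
    inventor        TEXT,
    grant_date      TEXT
);
"""


def open_declaration_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


db = open_declaration_db()
db.execute("INSERT INTO project (project_no, name) VALUES ('P1', 'Smart grid sensing')")
# Placeholder ID numbers, not real data.
db.execute("INSERT INTO completer (project_no, name, id_number, seq) VALUES ('P1', 'Zhang San', 'ID-001', 1)")
try:
    db.execute("INSERT INTO completer (project_no, name, id_number, seq) VALUES ('P9', 'Li Si', 'ID-002', 1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # P9 has no row in the master table
```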
The directly related data includes the basic project information. The indirectly related data includes the project introduction, major scientific and technological innovations, third-party evaluations, social and economic benefits, and science and technology awards previously received. It also includes the completers, the completing units, the opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments.
The basic project information includes the project name, discipline, technical field, project source, national economic sector, and so on. The intellectual property rights include the patent application number, patent title, inventors and patent grant date. The completer information includes each completer's name, ID number and rank order. In Step 1, the data storage forms are data tables, images, Word documents and PDF documents. Each format serves a purpose: data tables make querying and statistics easy, images and PDF documents ensure authenticity, and Word documents preserve the data's original formatting for easy viewing.
In Step 3, the data stored in the sub-data tables of the project declaration database is either segmented into words or matched directly. A judgment module first determines whether the current declared-project information is a project name or a paper/monograph title; if so, it enters the keyword comparison module. If not, the module determines whether the information is a completer's name and ID number or an intellectual property number; if so, it enters the direct matching module.
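The routing decision made by the judgment module can be sketched as a simple lookup. The field names are hypothetical labels for the data types the patent names.

```python
# Hypothetical field labels for the routing done by the judgment module.
SEGMENTED_FIELDS = {"project_name", "paper_title"}     # can be recombined -> keyword comparison
DIRECT_FIELDS = {"completer_name_id", "ip_number"}     # complete identifiers -> direct matching


def route(field: str) -> str:
    """Decide which duplicate-check path a declared field takes."""
    if field in SEGMENTED_FIELDS:
        return "keyword_comparison"
    if field in DIRECT_FIELDS:
        return "direct_matching"
    return "not_checked"  # fields outside both groups are not duplicate-checked


for f in ("project_name", "ip_number", "third_party_evaluation"):
    print(f, "->", route(f))
```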
The keyword comparison module compares the keywords of the project name or paper/monograph title in the current declared-project information with those of another declared project in the historical project declaration information. If the similarity is not lower than a set value, the two are judged duplicates; otherwise they are not.
The direct matching module directly matches the completer's name and ID number, or the intellectual property number, in the current declared-project information against those of another declared project in the historical project declaration information. If they are identical, the two are judged duplicates; otherwise they are not.
A storage module stores the information of declared projects that the keyword comparison and direct matching modules judge to be non-duplicates in the host computer's cache area. It stores the information of projects judged to be duplicates in the duplicate-check table of the project declaration database on the host computer.
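The following sketch of the storage module rests on three hypothetical assumptions:

- duplicate-check results arrive as `(project_no, field, is_duplicate)` tuples;
- the duplicate-check table is an SQLite table named `duplicate_check`;
- a plain Python list can stand in for the host computer's cache area.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical: `duplicate_check` table for duplicates; a list stands in
# for the host computer's cache area.


def store_results(conn: sqlite3.Connection,
                  results: List[Tuple[str, str, bool]]) -> List[Tuple[str, str]]:
    """Split (project_no, field, is_duplicate) results into cache vs duplicate table."""
    cache: List[Tuple[str, str]] = []
    for project_no, field, is_duplicate in results:
        if is_duplicate:
            conn.execute(
                "INSERT INTO duplicate_check (project_no, field) VALUES (?, ?)",
                (project_no, field),
            )
        else:
            cache.append((project_no, field))
    conn.commit()
    return cache


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE duplicate_check (project_no TEXT, field TEXT)")
cache = store_results(conn, [("P1", "project_name", False), ("P2", "ip_number", True)])
print(cache)                                                     # [('P1', 'project_name')]
print(conn.execute("SELECT * FROM duplicate_check").fetchall())  # [('P2', 'ip_number')]
```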
The keyword comparison module compares the keywords of the project name or paper/monograph title in the current declared-project information with those of another declared project in the historical project declaration information. It judges the two duplicates if the similarity is not lower than the set value. The specific process is as follows:
A keyword extraction module takes the current declared-project information and one item of the historical project declaration information and performs word segmentation on each. It stores the resulting keywords in two corresponding arrays of the project declaration database on the host computer;
A keyword matching module traverses the two arrays in a loop, comparing keywords, to obtain the number of identical keywords and the number of keywords in each array;
A similarity module computes the similarity between the current declared-project information and the project information in the historical declaration information and compares it with the set value. If the similarity is not lower than the set value, the projects are judged duplicates; otherwise they are not.
The keyword extraction module uses the ShootSearch component for word segmentation.
Specifically, the similarity module compares the keyword counts of the two arrays and takes the smaller count. It divides the number of identical keywords by that smaller count to obtain the similarity.
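The similarity rule above can be written as similarity = same / min(n1, n2). Here same is the number of identical keywords, and n1 and n2 are the sizes of the two keyword arrays. The sketch below implements that rule with the traversal loop described in the keyword matching step. It does not reproduce ShootSearch: keywords are passed in already segmented, and the 0.8 threshold is a hypothetical set value.

```python
from typing import List


def keyword_similarity(a: List[str], b: List[str]) -> float:
    """Identical keywords divided by the size of the smaller keyword array."""
    if not a or not b:
        return 0.0
    remaining = list(b)
    same = 0
    for kw in a:                  # traverse the first array ...
        if kw in remaining:       # ... scanning the second for the same keyword
            remaining.remove(kw)  # each keyword in b is matched at most once
            same += 1
    return same / min(len(a), len(b))


def is_duplicate(a: List[str], b: List[str], threshold: float = 0.8) -> bool:
    """Duplicate when the similarity is not lower than the set value (hypothetical 0.8)."""
    return keyword_similarity(a, b) >= threshold


current = ["intelligent", "detection", "data", "processing"]
historical = ["data", "processing", "intelligent", "system"]
print(keyword_similarity(current, historical))  # 0.75
print(is_duplicate(current, historical))        # False
```

Dividing by the smaller array rather than the larger one means a short title fully contained in a longer one scores 1.0. This fits the stated goal of catching recombined projects.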
The direct matching module matches the completer's name and ID number, or the intellectual property number, in the current declared-project information directly against those of another declared project in the historical project declaration information. It works as follows:
The declared-project information collected by the remote data collection terminal is matched directly against the historical project declaration information in the sub-data tables of the declaration database, traversing the latter in a loop. If a match is found, the project is judged a duplicate; otherwise it is not.
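The following is a minimal sketch of direct matching by traversal. It assumes each declaration is flattened into a hypothetical record holding one completer's name and ID number plus one intellectual property number. The identifiers in the usage example are placeholders.

```python
from typing import Iterable, List, NamedTuple


class Declaration(NamedTuple):
    """Hypothetical record holding the directly matched fields."""
    project_no: str
    completer_name: str
    id_number: str
    ip_number: str  # patent application or publication number; "" if none


def direct_duplicates(current: Declaration, history: Iterable[Declaration]) -> List[str]:
    """Return project numbers in `history` that share an exact identifier."""
    hits = []
    for old in history:  # traverse every historical declaration
        if old.project_no == current.project_no:
            continue  # do not compare a project with itself
        same_person = (old.completer_name, old.id_number) == (current.completer_name, current.id_number)
        same_ip = current.ip_number != "" and old.ip_number == current.ip_number
        if same_person or same_ip:
            hits.append(old.project_no)
    return hits


history = [
    Declaration("P1", "Zhang San", "ID-001", "IP-A"),
    Declaration("P2", "Li Si", "ID-002", "IP-B"),
]
new = Declaration("P3", "Wang Wu", "ID-003", "IP-A")
print(direct_duplicates(new, history))  # ['P1']
```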
The declared-project information includes the project name, paper/monograph titles, the completers' names and ID numbers, and intellectual property numbers. The historical project declaration information includes the same fields for the other projects of the current year or for all projects of the last three years. An intellectual property number is a patent application number or a patent publication number.
Similarity calculation is used for duplicate checking so as to find as many duplicate projects as possible and to prevent recombined projects from being declared again. The segmented data includes project names and paper/monograph titles; the directly matched data includes completers' names, ID numbers and intellectual property numbers. Duplicate checking is an important link in the whole science and technology award system; the data processing is complex and the methods differ by data type. Checking different data in different ways avoids both false hits and missed duplicates. A completer's name and ID number, and an intellectual property number, are complete identifiers that cannot be recombined. Project names and paper titles, by contrast, can be decomposed and rearranged, so computing a similarity over segmented words detects duplicate projects more accurately.
In Step 4, the declaration information in the review database is screened three times, as follows:
First screening: online review experts are selected according to the project information of the qualifying projects in the review database, and they screen the data synchronized to the review database through online review. The declared-project data that passes is stored in the third cache area of the review database;
Second screening: the declared-project data stored in the third cache area is screened again. Senior experts are selected according to the project information and vote on the data retained by the first screening. The preliminary award-winning projects are taken from the voting results, and their data is stored in the fourth cache area of the review database;
Third screening: the declared-project data stored in the fourth cache area is screened a third time. Experts of the Science and Technology Committee are selected according to the project information and review the data retained by the second screening. The final award-winning projects are taken from the review results and stored in the fifth cache area of the review database.
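The three screenings form a pipeline in which each stage reads the previous stage's cache area. This is a structural sketch only. The three stage functions are hypothetical stand-ins for online review, senior-expert voting and committee review, and the cache areas are modelled as dictionary entries.

```python
from typing import Callable, Dict, List

Stage = Callable[[List[str]], List[str]]


def run_screening(projects: List[str], online_review: Stage,
                  senior_vote: Stage, committee_review: Stage) -> Dict[str, List[str]]:
    """Run the three screenings in order, keeping each stage's output."""
    caches: Dict[str, List[str]] = {}
    caches["third"] = online_review(projects)             # first screening
    caches["fourth"] = senior_vote(caches["third"])       # second screening
    caches["fifth"] = committee_review(caches["fourth"])  # third screening
    return caches


result = run_screening(
    ["P1", "P2", "P3", "P4"],
    online_review=lambda ps: ps[:3],                      # e.g. keep the three top-scored
    senior_vote=lambda ps: [p for p in ps if p != "P3"],  # e.g. P3 voted "no award"
    committee_review=lambda ps: ps,                       # e.g. no objections raised
)
print(result["fifth"])  # ['P1', 'P2']
```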
Screening works as follows. Weights are set for the scoring indicators, experts are selected, and the experts score the synchronized data. The expert scores are summed according to the weights to obtain each project's score. The projects are then sorted by score from high to low, and a set number of projects is taken from the top of the ranking. Voting options are first prize, second prize, third prize and no award. Review outcomes are objection and no objection.
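The following sketches the weighted scoring used during screening. It assumes each expert scores every indicator and that every project is scored by the same number of experts; otherwise a plain sum would favour projects with more reviewers. Indicator names and weights are hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # indicator -> score given by one expert


def project_score(expert_scores: List[Scores], weights: Dict[str, float]) -> float:
    """Sum every expert's weighted indicator scores into one project score."""
    return sum(weights[ind] * s for scores in expert_scores for ind, s in scores.items())


def shortlist(projects: Dict[str, List[Scores]], weights: Dict[str, float], top_n: int) -> List[str]:
    """Rank projects by score from high to low and keep the first top_n."""
    ranked = sorted(projects, key=lambda p: project_score(projects[p], weights), reverse=True)
    return ranked[:top_n]


weights = {"innovation": 0.6, "benefit": 0.4}
projects = {
    "P1": [{"innovation": 90, "benefit": 80}, {"innovation": 85, "benefit": 75}],
    "P2": [{"innovation": 70, "benefit": 95}, {"innovation": 75, "benefit": 90}],
    "P3": [{"innovation": 60, "benefit": 60}, {"innovation": 65, "benefit": 55}],
}
print(shortlist(projects, weights, 2))  # ['P1', 'P2']
```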
In each of the three screenings, experts are selected as follows:
S1: the discipline information of the science and technology projects is stored as data set A in a sub-data table of the review database. The same sub-data table also stores the experts' discipline information as data set B;
S2: take an expert's discipline information from data set B as the condition factor and determine whether it is a first-level discipline. If so, match the condition factor against the key factor of the project discipline information in data set A by traversing the expert's condition factors to see whether they contain the key factor. If they do, the expert matches and the process goes to S5; otherwise it goes to S3;
S3: determine whether the expert's condition factor is a second-level discipline. If so, match it against the key factor of the project information in data set A in the same way. If it matches, go to S5; otherwise go to S4;
S4: determine whether the expert's condition factor is a third-level discipline. If so, match it against the key factor of the project information in data set A in the same way. If it matches, go to S5; otherwise the expert does not match;
S5: randomly select the required number of experts matching the project from data set B, and store the selected records in the review database.
Specifically, step S5 uses a Random function to draw random numbers between zero and the specified number, selecting records until the specified number has been reached.
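The following sketches expert selection (S2 to S5) under assumptions the patent does not state:

- Disciplines are hypothetical dotted codes such as `520.10.20`, and a code's level is its number of segments.
- A project's key factor at level k is the first k segments of its code.
- An expert matches if any of their condition factors equals the project's key factor at the same level.

Random selection follows the described loop: draw a random index into the matching pool, skip repeats, and stop when enough experts are chosen.

```python
import random
from typing import Dict, List, Optional, Set


def discipline_levels(code: str) -> List[str]:
    """'520.10.20' -> ['520', '520.10', '520.10.20'] (levels 1, 2, 3)."""
    parts = code.split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def expert_matches(expert_codes: Set[str], project_code: str) -> bool:
    """Try level 1, then 2, then 3 (S2 -> S3 -> S4); True on the first hit."""
    for level, key_factor in enumerate(discipline_levels(project_code), start=1):
        conditions = {c for c in expert_codes if c.count(".") == level - 1}
        if key_factor in conditions:
            return True
    return False


def pick_experts(experts: Dict[str, Set[str]], project_code: str, n: int,
                 rng: Optional[random.Random] = None) -> List[str]:
    """S5: draw random indices until n distinct matching experts are chosen."""
    rng = rng or random.Random()
    pool = [name for name, codes in experts.items() if expert_matches(codes, project_code)]
    if n > len(pool):
        raise ValueError("not enough matching experts")
    chosen: List[str] = []
    seen: Set[int] = set()
    while len(chosen) < n:
        i = rng.randrange(len(pool))  # random integer in [0, len(pool))
        if i not in seen:
            seen.add(i)
            chosen.append(pool[i])
    return chosen


experts = {"A": {"520"}, "B": {"520.10"}, "C": {"610.20.30"}, "D": {"520.30"}}
print(pick_experts(experts, "520.10.20", 2, random.Random(0)))  # A and B, in random order
```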
The three screenings evaluate the data differently: the first assigns scores, the second votes on award grades, and the third takes a final vote. The expert-selection process, however, is the same in all three. The screened data is output as a Word document, which lets users adjust the data format themselves.
A data processing system with an intelligent detection function comprises a system verification module. This module uses a JS script to automatically detect the browser version used by the current user, displays a prompt and closes for non-IE browsers, and lets the user enter the system for data collection if the browser meets the requirements;
a remote data collection terminal, which stores the collected declared-project data in the cache area of the network server and stores the completers' handwriting, captured through a handwriting tablet, in the handwriting feature database;
a host computer, which retrieves the declared-project data from the network server's cache area and stores it in its first cache area. Collected images, Word documents and PDF documents are stored as files in the host computer's second cache area, and each document's relative path is stored in the attachment sub-table of the project declaration database;
a duplicate-check judgment module, which reads the declared-project information from the host computer's first cache area and judges whether the data meets the requirements. For image documents, it uses the document path in the attachment sub-table of the project declaration database to retrieve the image from the host computer's second cache area. An image recognition module then reads the image content and compares the recognized completer handwriting with the handwriting feature database. It also checks whether the completing unit's seal matches the unit name in the completing-unit data table. If the handwriting information, the declared-project information in the project declaration data table and the image documents all meet the requirements, processing passes to the screening module; otherwise the data is collected again;
an output module, which synchronizes the qualifying project data from the project declaration database to the review database. It screens the declaration information in the review database three times and outputs the final data on a browsing page of the server.
The data processing system further comprises a data allocation module. It stores data directly related to a project in the master data table of the project declaration database and data indirectly related to the project in its sub-data tables, which are linked through the project primary key. The sub-data tables of the declaration database also store historical project declaration information.
The duplicate-check judgment module further comprises a selection module. For the data stored in the sub-data tables of the project declaration database, as read by the host computer from the first cache area, it decides between word segmentation and direct matching. It first determines whether the current declared-project information is a project name or a paper/monograph title; if so, it passes the information to the keyword comparison module. If not, it determines whether the information is a completer's name and ID number or an intellectual property number; if so, it passes the information to the direct matching module;
a keyword comparison module, which compares the keywords of the project name or paper/monograph title in the current declared-project information with those of another declared project in the historical project declaration information. If the similarity is not lower than the set value, the two are judged duplicates; otherwise they are not;
a direct matching module, which directly matches the completer's name and ID number, or the intellectual property number, in the current declared-project information against those of another declared project in the historical project declaration information. If they are identical, the two are judged duplicates; otherwise they are not;
a storage module, which stores the information of declared projects judged non-duplicates by the keyword comparison and direct matching modules in the host computer's cache area. It stores the information of projects judged duplicates in the duplicate-check table of the project declaration database on the host computer.
The keyword comparison module specifically comprises:
a keyword extraction module, which takes the current declared-project information and one item of the historical project declaration information and performs word segmentation on each. It stores the resulting keywords in two corresponding arrays of the project declaration database on the host computer;
a similarity module, which computes the similarity between the current declared-project information and the project information in the historical declaration information and compares it with the set value. If the similarity is not lower than the set value, the projects are judged duplicates; otherwise they are not.
The keyword extraction module uses the ShootSearch component for word segmentation.
Specifically, the similarity module compares the keyword counts of the two arrays and takes the smaller count. It divides the number of identical keywords by that smaller count to obtain the similarity.
The direct matching module specifically comprises:
a matching module, which matches the declared-project information collected by the remote data collection terminal directly against the historical project declaration information in the sub-data tables of the declaration database, traversing the latter in a loop. If a match is found, the project is judged a duplicate; otherwise it is not.
所述筛选模块,具体包括:The screening module specifically includes:
筛选存储模块,用于调用科技项目学科信息并以数据集A的形式存储在评审数据库的子数据表中,评审数据库的子数据表中还存储有以数据集B的形式存储的专家学科信息;The screening storage module is used to call the scientific and technological project subject information and store it in the sub-data table of the review database in the form of data set A, and the expert subject information stored in the form of data set B is also stored in the sub-data table of the review database;
一级学科提取模块,用于在数据集B中选取专家学科信息并作为条件因素,判断该专家学科信息的条件因素是否为一级学科,如果是,则将该条件因素与以数据集A的形式存储的科技项目学科信息的关键因素匹配,遍历专家学科信息中的条件因素是否包含关键因素,如果包含则匹配,否则不匹配;The first-level subject extraction module is used to select expert subject information in data set B as a condition factor, and judge whether the condition factor of the expert subject information is a first-level subject, and if so, combine the condition factor with the condition factor of data set A The key factor matching of the scientific and technological project subject information stored in the form, traverses whether the conditional factors in the expert subject information contain the key factor, if it is included, it will match, otherwise it will not match;
二级学科提取模块,用于判断该专家信息的条件因素是否为二级学科,如果是,则将该条件因素与以数据集A的形式存储的科技项目信息的关键因素匹配,遍历专家信息中的条件因素是否包含关键因素,如果包含则匹配,否则不匹配;The second-level subject extraction module is used to judge whether the conditional factor of the expert information is a second-level subject, and if so, match the conditional factor with the key factor of the scientific and technological project information stored in the form of data set A, and traverse the expert information Whether the condition factor of contains the key factor, if it contains, it will match, otherwise it will not match;
三级学科提取模块,用于判断该专家信息的条件因素是否为三级学科,如果是,则将该条件因素与以数据集A的形式存储的科技项目信息的关键因素匹配,遍历专家信息中的条件因素是否包含关键因素,如果包含则匹配,否则不匹配;The third-level subject extraction module is used to judge whether the conditional factor of the expert information is a third-level subject, and if so, match the conditional factor with the key factor of the scientific and technological project information stored in the form of data set A, and traverse the expert information Whether the condition factor of contains the key factor, if it contains, it will match, otherwise it will not match;
随机数据生成模块,用于从存储的专家学科信息的数据集B中随机选取与科技项目相匹配的专家数量,并将取出的数据存储至评审数据库;A random data generation module, used to randomly select the number of experts matching the scientific and technological project from the stored data set B of expert subject information, and store the retrieved data in the review database;
所述随机数据生成模块具体为,根据Random(随机)函数返回的零到指定数目的随机数,选取数据,直到选够指定数目。The random data generation module specifically selects data according to random numbers from zero to a specified number returned by the Random (random) function until the specified number is selected.
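The random draw can be sketched as a loop over random indices. This is a minimal Python sketch under stated assumptions: `pick_experts` is a hypothetical name, `random.Random.randrange` stands in for the unspecified Random function, and the upper bound is assumed to be the size of the matched candidate pool. Re-drawing on a repeated index, so that no expert is picked twice, is also an assumption; the text does not say how repeats are handled.

```python
import random


def pick_experts(candidates, count, rng=None):
    """Randomly draw `count` distinct experts from the matched candidates.

    Each draw asks the generator for an index from 0 up to (but not
    including) len(candidates) and keeps drawing until `count` distinct
    experts have been collected.
    """
    if count > len(candidates):
        raise ValueError("not enough matching experts to draw from")
    rng = rng or random.Random()
    seen = set()
    picked = []
    while len(picked) < count:
        i = rng.randrange(len(candidates))  # random index in [0, len)
        if i in seen:  # assumption: re-draw instead of picking an expert twice
            continue
        seen.add(i)
        picked.append(candidates[i])
    return picked


pool = [f"expert-{n:02d}" for n in range(10)]
print(pick_experts(pool, 3, random.Random(42)))
```

Passing a seeded `random.Random` makes a draw reproducible, which can help when a selection has to be audited later.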
The screening module includes a primary screening module, a secondary screening module and a tertiary screening module. The primary screening module carries out the first screening of qualifying projects. It selects online review experts that match the project information, screens the data synchronized to the review database through online review, and stores the screened declared-project data in the third cache area of the review database;
The secondary screening module carries out the second screening of the declared-project data stored in the third cache area. It selects senior experts according to the project information, has them vote on the data output by the primary screening module, takes the preliminary award-winning projects from the voting results, and stores their data in the fourth cache area of the review database;
The tertiary screening module carries out the third screening of the declared-project data stored in the fourth cache area. It selects Science and Technology Committee experts according to the project information, has them review the data output by the secondary screening module, and takes the final award-winning projects from the review results. It then stores these projects in the fifth cache area of the review database and outputs their data from there.
Directly related data comprise the basic information of the project. Indirectly related data comprise the project introduction, main scientific and technological innovations, third-party evaluations, social and economic benefits, scientific and technological awards previously received, completers, completing units, opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments. The collected data cover all of these: basic project information, project introduction, main scientific and technological innovations, third-party evaluations, social and economic benefits, scientific and technological awards previously received, completers, completing units, opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments. The basic project information includes the project name, project discipline, technical field, project source, national economic industry, and so on. The intellectual property information includes the intellectual property number, intellectual property name, intellectual property owner and date of acquisition. The completer information includes each completer's name, ID number and rank order.
The project collection module stores data as data tables, images, Word documents and PDF documents. Data tables make querying and statistics easy, images and PDF documents guarantee the authenticity of the data, and Word documents preserve the original formatting for easy viewing.
The primary, secondary and tertiary screening modules evaluate the data differently. The primary screening module scores the projects, the secondary screening module votes on award grades, and the tertiary screening module puts the projects to a final vote. The expert selection process, however, is the same in all three modules.
Beneficial effects of the present invention:
The advantage of this data judgment is that the system extracts the duplicate-checking factors automatically and performs complex matching calculations. This reduces human influence, makes duplicate-checking results fairer and more accurate, lightens the staff's workload and greatly improves efficiency. Duplicate checking is a key step in the whole science and technology award system. The data processing is complex, and different data call for different processing methods, so each data type is checked in its own way to avoid both false hits and missed duplicates. The completer's name and ID number, and the intellectual property number, are complete identifiers that cannot be recombined. Project names and paper or monograph titles, by contrast, can be broken apart and recombined, so segmenting them into words and calculating similarity finds duplicate projects more accurately.
Screening data through online review reduces work costs and limits interference from other factors, which makes the screening more objective. Expert selection is random but restricted to experts whose subjects relate to the project's subject. As a result, selection is fair and impartial, the experts are better suited to each project, and the review results are of higher quality.
Description of Drawings
Fig. 1 is a schematic diagram of the main process of the present invention;
Fig. 2 is a schematic diagram of the data judgment process of the present invention;
Fig. 3 is a schematic diagram of the keyword comparison process within data judgment of the present invention;
Fig. 4 is a schematic diagram of the direct matching process within data judgment of the present invention;
Fig. 5 is a schematic diagram of the expert selection process of the present invention.
Detailed Description of the Embodiments
The present invention is further described below with reference to the accompanying drawings and embodiments.
As shown in Fig. 1, a data processing method with an intelligent detection function comprises the following steps:
Step 1: System check. The user opens the website in a browser, and a JS script automatically detects the browser version in use. For non-IE browsers, the script displays a prompt and closes the page. If the browser meets the requirements, the user enters the system for data collection.
Step 2: Collect the data of the declared projects. The remote data collection terminal captures the completers' handwriting with a writing tablet and stores it in the handwriting feature library. Data directly related to the project are stored in the master data table of the project declaration database, with each record corresponding to a unique project number. Data indirectly related to the project are stored in the sub-data tables of the project declaration database, and each sub-data table uses the project number as a foreign key referencing the master data table. Collected images, Word documents and PDF documents are stored as files in the server cache area, and each document's relative path is stored in the attachment sub-data table of the project declaration database.
Step 3: Read the data in the master data table and each sub-data table of the project declaration database, and judge whether the data meet the requirements. For image documents, the system uses the document path stored in the attachment sub-data table to retrieve the image from the server cache area, and an image recognition module reads its content. The recognized completer handwriting is compared against the handwriting feature library, and the completing unit's seal is checked against the unit name in the completing-unit data table. If the handwriting information in the handwriting feature library, the declared-project information in the project declaration data table and the image documents all meet the requirements, proceed to step 4; otherwise, return to step 2.
Step 4: Synchronize the project data collected in step 2 that meet the requirements from the project declaration database to the review database. The synchronization is one-way.
Step 5: Screen the qualifying projects in several rounds. First, select online review experts matching the project information, screen the data synchronized to the review database through online review, and store the screened declared-project data in the first cache area of the review database. Second, screen the declared-project data stored in the first cache area: select senior experts according to the project information, have them vote on the data selected in the first round, take the preliminary award-winning projects from the voting results, and store their data in the second cache area of the review database. Third, screen the declared-project data stored in the second cache area: select Science and Technology Committee experts according to the project information, have them review the data selected in the second round, take the final award-winning projects from the review results, and store them in the third cache area of the review database.
Step 6: Output the data of the final award-winning projects stored in the third cache area of the review database.
In step 2, the directly related data comprise the basic information of the project;
The indirectly related data comprise the project introduction, main scientific and technological innovations, third-party evaluations, social and economic benefits, scientific and technological awards previously received, completers, completing units, opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments;
The collected data include: basic project information, project introduction, main scientific and technological innovations, third-party evaluations, social and economic benefits, scientific and technological awards previously received, completers, completing units, opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments;
The basic project information includes the project name, project discipline, technical field, project source, national economic industry, and so on;
The intellectual property information includes the intellectual property number, intellectual property name, intellectual property owner and date of acquisition;
The completer information includes each completer's name, ID number and rank order.
In step 2, data are stored as data tables, images, Word documents and PDF documents. Data tables make querying and statistics easy, images and PDF documents guarantee the authenticity of the data, and Word documents preserve the original formatting for easy viewing.
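The step 2 storage layout, with a master data table keyed by a unique project number, sub-data tables that reference it through a foreign key, and an attachment table that stores only relative file paths, can be sketched as follows. This is a minimal illustration under stated assumptions. It uses Python's built-in `sqlite3` as a stand-in database because the text does not name one, and the table and column names (`project`, `completer`, `attachment`, `rel_path`) and the helper functions are hypothetical. Only two of the sub-data tables are shown.

```python
import sqlite3

SCHEMA = """
CREATE TABLE project (            -- master data table: one row per project
    project_no TEXT PRIMARY KEY,  -- unique project number
    name       TEXT NOT NULL,
    discipline TEXT,
    tech_field TEXT
);
CREATE TABLE completer (          -- sub-data table for indirectly related data
    project_no TEXT NOT NULL REFERENCES project(project_no),
    seq        INTEGER NOT NULL,  -- completer rank order
    name       TEXT NOT NULL,
    id_number  TEXT NOT NULL
);
CREATE TABLE attachment (         -- sub-data table holding file paths only
    project_no TEXT NOT NULL REFERENCES project(project_no),
    kind       TEXT NOT NULL CHECK (kind IN ('image', 'word', 'pdf')),
    rel_path   TEXT NOT NULL      -- path relative to the server cache area
);
"""


def open_declaration_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def save_project(conn, project, completers, attachments):
    """Write one declared project and its sub-records in a single transaction."""
    pno = project["project_no"]
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute(
            "INSERT INTO project (project_no, name, discipline, tech_field) VALUES (?, ?, ?, ?)",
            (pno, project["name"], project.get("discipline"), project.get("tech_field")),
        )
        conn.executemany(
            "INSERT INTO completer (project_no, seq, name, id_number) VALUES (?, ?, ?, ?)",
            [(pno, i + 1, c["name"], c["id_number"]) for i, c in enumerate(completers)],
        )
        conn.executemany(
            "INSERT INTO attachment (project_no, kind, rel_path) VALUES (?, ?, ?)",
            [(pno, kind, rel_path) for kind, rel_path in attachments],
        )
```

Storing relative paths rather than file contents keeps the tables small and lets the cache directory move without rewriting any rows.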
In step 3, the data stored in step 2 are either segmented into words or matched directly, and the similarity is calculated. If the similarity is lower than the set value, the data are judged to meet the requirements; otherwise, they are judged not to meet them. Similarity-based duplicate checking catches more duplicates and prevents a recombined project from being declared again. The data checked by word segmentation are the project name and the titles of papers and monographs. The data checked by direct matching are the completer's name and ID number and the intellectual property number.
The first screening in step 5 works as follows: set weights for the scoring indicators, select experts, and have the experts score the data synchronized to the review database in step 4. The expert scores are summed according to the weights to give each project's score, the projects are sorted by score from high to low, and a set number of projects is taken from the top of the sorted list.
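The weighted scoring and top-N cut of the first screening can be sketched as follows. The indicator names, the weights and both function names are hypothetical. The text says only that scores are "summed according to the weights". This sketch takes each expert's weighted sum and then averages across experts, which is an assumption made so that projects reviewed by different numbers of experts stay comparable.

```python
def project_score(expert_scores, weights):
    """Weighted score of one project.

    expert_scores: one dict per expert, mapping indicator -> score.
    weights: mapping indicator -> weight.
    Each expert's scores are combined as a weighted sum, and the per-expert
    totals are then averaged (assumption: the text only says "summed").
    """
    totals = [sum(w * scores[k] for k, w in weights.items()) for scores in expert_scores]
    return sum(totals) / len(totals)


def first_round_shortlist(projects, weights, top_n):
    """Rank projects by score from high to low and keep the first top_n."""
    ranked = sorted(projects, key=lambda pid: project_score(projects[pid], weights), reverse=True)
    return ranked[:top_n]


weights = {"innovation": 0.5, "benefit": 0.3, "ip": 0.2}  # hypothetical indicators
projects = {
    "P1": [{"innovation": 90, "benefit": 80, "ip": 70}],
    "P2": [{"innovation": 60, "benefit": 60, "ip": 60}],
    "P3": [{"innovation": 80, "benefit": 90, "ip": 100}],
}
print(first_round_shortlist(projects, weights, 2))
```

With these sample figures, P1 scores 83, P2 scores 60 and P3 scores 87, so the two-project shortlist is P3 followed by P1.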
The vote in step 5 offers the options first prize, second prize, third prize and no award.
The review in step 5 records either an objection or no objection.
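The second-round vote and the third-round review can be sketched as two small decision functions. The text lists the ballot options but not the decision rules, so three rules here are assumptions: the grade with the most votes wins, a tie goes to the less favourable grade, and a project survives review only if fewer than half of the committee experts object. The labels and function names are hypothetical.

```python
from collections import Counter

GRADES = ("first", "second", "third", "none")  # best to worst; "none" = no award


def grade_by_vote(votes):
    """Second round: the grade with the most votes wins.

    Assumption: a tie is resolved toward the less favourable grade.
    """
    if not votes:
        raise ValueError("no votes cast")
    tally = Counter(votes)
    unknown = set(tally) - set(GRADES)
    if unknown:
        raise ValueError(f"unknown ballot options: {sorted(unknown)}")
    top = max(tally.values())
    for grade in reversed(GRADES):  # worst grade first, so ties go downward
        if tally[grade] == top:
            return grade


def passes_review(opinions):
    """Third round: keep the project only if fewer than half object (assumption)."""
    objections = sum(1 for o in opinions if o == "objection")
    return objections * 2 < len(opinions)


def final_awards(votes_by_project, opinions_by_project):
    """Drop "no award" projects, then drop those that fail committee review."""
    graded = {pid: grade_by_vote(v) for pid, v in votes_by_project.items()}
    preliminary = {pid: g for pid, g in graded.items() if g != "none"}
    return {pid: g for pid, g in preliminary.items() if passes_review(opinions_by_project[pid])}
```

Splitting the rounds into separate functions mirrors the separate cache areas: each round reads only the survivors of the previous one.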
In step 6, the data screened out in step 5 are output as a Word document, which lets users adjust the data format themselves.
As shown in Fig. 2, the part of step 3 that reads the data in the master data table and each sub-data table of the project declaration database and judges whether the data meet the requirements comprises the following steps:
Step (3-1): Extract the factors from the master data table and each sub-data table of the project declaration database. The factors are the project name, the project completer's name and ID number, the intellectual property number, and the titles of papers and monographs;
Step (3-2): Judge whether the factor is the project name or a paper or monograph title. If so, go to step (3-3); otherwise, go to step (3-4);
Step (3-3): Compare the keywords of the project name or paper or monograph title with those of another project. If they match, the project is judged a duplicate; otherwise, it is not;
Step (3-4): Match the project completer's name and ID number, and the intellectual property number, directly against another project. If they are the same, the project is judged a duplicate; otherwise, it is not;
Step (3-5): Store the results. The comparison and matching results of steps (3-3) and (3-4) are stored in the project declaration database.
Duplicate checking is a key step in the whole science and technology award system. The data processing is complex, and different data call for different processing methods, so each data type is checked in its own way to avoid both false hits and missed duplicates. The completer's name and ID number, and the intellectual property number, are complete identifiers that cannot be recombined. Project names and paper or monograph titles, by contrast, can be broken apart and recombined, so segmenting them into words and calculating similarity finds duplicate projects more accurately.
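The routing in step (3-2) can be sketched as a small dispatcher. Free-text factors go through a similarity test, and complete identifiers go through exact equality. The factor labels, the function name and the pluggable `similarity` argument are hypothetical. Treating a completer as a `(name, id_number)` pair is also an assumption. Any scoring function in [0, 1] can be passed in for the keyword comparison of step (3-3).

```python
SEGMENTED = {"project_name", "paper_title"}  # free text: compare by keywords
DIRECT = {"completer", "ip_number"}          # complete identifiers: exact match


def is_duplicate(kind, current, other, similarity, threshold):
    """Route one extracted factor to the comparison the text prescribes.

    kind: which factor was extracted (labels are hypothetical).
    similarity: any function returning a score in [0, 1] for two texts.
    threshold: the set value; a score at or above it counts as a duplicate.
    """
    if kind in SEGMENTED:
        return similarity(current, other) >= threshold
    if kind in DIRECT:
        # a completer is compared as a (name, id_number) pair (assumption)
        return current == other
    raise ValueError(f"unknown factor kind: {kind!r}")
```

Keeping the similarity function as a parameter separates the routing rule from the choice of word segmenter.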
As shown in Fig. 3, the keyword comparison of step (3-3) comprises the following steps:
Step (3-3-1): Retrieve the project name or paper or monograph title of the current project and of another project from the project declaration database. The other project is either another project of the current year or any project of the previous three years;
Step (3-3-2): Extract keywords. Segment the data retrieved in step (3-3-1) with the ShootSearch component, and store the keywords of each text in its own array;
Step (3-3-3): Match keywords. Traverse the two arrays from step (3-3-2) in nested loops to count the shared keywords and the keywords in each array;
Step (3-3-4): Calculate the similarity. Compare the keyword counts of the two arrays from step (3-3-3), take the smaller count, and divide the number of shared keywords by it: similarity = shared keywords / min(keywords in array 1, keywords in array 2);
Step (3-3-5): Judge whether the project is a duplicate. Compare the similarity from step (3-3-4) with the set value. If the similarity is not lower than the set value, the project is judged a duplicate; otherwise, it is not;
Step (3-3-6): Store the data. Store the duplicate-check result from step (3-3-5) in the project declaration database.
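Steps (3-3-2) to (3-3-5) can be sketched as follows. ShootSearch is a Chinese word segmenter. This sketch swaps in a plain regular-expression tokenizer (lowercase, split on non-word characters, drop repeated words), which is adequate only for space-separated text such as English titles. Python's word-character class also matches CJK characters, so an unsegmented Chinese title would come back as a single token. The function names and the 0.8 default threshold are hypothetical, because the text leaves the set value open.

```python
import re


def segment(text):
    """Stand-in for ShootSearch: lowercase, split on non-word characters,
    and drop repeated keywords while keeping their order."""
    return list(dict.fromkeys(re.findall(r"\w+", text.lower())))


def keyword_similarity(text_a, text_b):
    """Shared keywords divided by the size of the smaller keyword array."""
    a, b = segment(text_a), segment(text_b)
    if not a or not b:
        return 0.0
    shared = 0
    for word in a:          # traverse array A ...
        for other in b:     # ... against every keyword of array B
            if word == other:
                shared += 1
                break
    return shared / min(len(a), len(b))


def is_duplicate_title(text_a, text_b, threshold=0.8):
    """Duplicate when the similarity is not lower than the set value.

    The 0.8 default is a placeholder for the unspecified set value.
    """
    return keyword_similarity(text_a, text_b) >= threshold
```

Dividing by the smaller array means a short title whose words all reappear inside a longer one scores 1.0. This is what allows a renamed or recombined project title to be caught.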
As shown in Fig. 4, the direct matching of step (3-4) comprises the following steps:
Step (3-4-1): Retrieve the completer's name and ID number, or the intellectual property number, of the current project and of another project from the project declaration database. When the factor is the completer's name or ID number, the other project is another project of the current year. When the factor is the intellectual property number, the other project is either another project of the current year or any project of the previous three years.
Step (3-4-2): Direct matching. Compare the data retrieved in step (3-4-1) directly;
Step (3-4-3): Judge whether the project is a duplicate. If the comparison in step (3-4-2) finds them identical, the project is judged a duplicate; otherwise, it is not;
Step (3-4-4): Store the data. Store the duplicate-check result from step (3-4-3) in the project declaration database.
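The comparison scope of step (3-4-1) and the exact matching of steps (3-4-2) and (3-4-3) can be sketched as follows. The `Declaration` record, its field names and the factor labels are hypothetical. Two further assumptions apply: a completer is identified by the `(name, id_number)` pair, and "the previous three years" is read as the three calendar years before the current one.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    project_no: str
    year: int
    completers: set = field(default_factory=set)  # {(name, id_number), ...}
    ip_numbers: set = field(default_factory=set)


def in_scope(current, other, factor):
    """Decide whether `other` is compared against `current` for this factor."""
    if other.project_no == current.project_no:
        return False
    if factor == "completer":   # other projects of the current year only
        return other.year == current.year
    if factor == "ip_number":   # current year or the previous three years
        return current.year - 3 <= other.year <= current.year
    raise ValueError(f"unknown factor: {factor!r}")


def direct_match(current, archive):
    """Return (project_no, factor) for every exact hit in the archive."""
    hits = []
    for factor, attr in (("completer", "completers"), ("ip_number", "ip_numbers")):
        for other in archive:
            # a non-empty set intersection means an identical identifier exists
            if in_scope(current, other, factor) and getattr(current, attr) & getattr(other, attr):
                hits.append((other.project_no, factor))
    return hits
```

Because these identifiers cannot be recombined, a set intersection is sufficient here, and no similarity threshold is needed.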
As shown in Fig. 5, the experts used in step 5 are selected as follows:
Step (I): From data set A, the set of project subjects, take the key factor, which is the project's first subject. From data set B, the set of expert subjects, take factor 1 (the expert's first subject), factor 2 (the second subject) and factor 3 (the third subject);
Step (II): Select the condition factor. The condition factors are the first subject, the second subject and the third subject;
Step (III): Judge whether the condition factor is factor 1, the first subject. If so, set n = 1 and go to step (IV); otherwise, go to step (V);
Step (IV): Match factor n against the key factor by traversing factor n to check whether it contains the key factor. If it does, it is a match and the process goes to step (VI); otherwise, it returns to step (II);
Step (V): Judge whether the condition factor is factor 2, the second subject. If so, set n = 2 and go to step (IV); otherwise, set n = 3 and go to step (IV);
Step (VI): Select the matching data from data set B according to the matching result of step (IV);
Step (VII): Randomly select the specified number of records. Using the random numbers from zero to the specified number returned by the Random function, select data until the specified number has been reached;
Step (VIII): Store the data. Store the data selected in step (VII) in the review database.
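Steps (I) to (VI) can be sketched as a filter over the expert records. The `Expert` type, its field layout and both function names are hypothetical. The text does not fully specify the loop back from step (IV) to step (II), so this sketch assumes it means "move on to the next subject field when no expert matches at the current one" and stops at the first field that produces any match. Step (VII) then draws the specified number of experts at random from the returned pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    name: str
    subjects: tuple  # (first, second, third) subject fields, each a tuple of str


def matches(expert, key_subject, n):
    """Step (IV): traverse factor n of the expert and look for the key factor."""
    return key_subject in expert.subjects[n - 1]


def matched_pool(experts, key_subject, order=(1, 2, 3)):
    """Steps (II) to (VI): try the condition factors in order and return the
    factor number used together with the first non-empty set of matches.

    Assumption: "return to step (II)" is read as trying the next factor
    when nobody matches at the current one.
    """
    for n in order:
        pool = [e for e in experts if matches(e, key_subject, n)]
        if pool:
            return n, pool
    return None, []


li = Expert("Li", (("computer science",), (), ("statistics",)))
wang = Expert("Wang", (("agronomy",), ("computer science",), ()))
print(matched_pool([li, wang], "computer science"))
```

Checking the first subject field before the others favours experts whose main discipline matches the project. Only when no such expert exists does the pool widen to secondary fields.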
A data processing system with an intelligent detection function, comprising:
a system verification module, which uses a JS script to automatically detect the browser version used by the current user. For non-IE browsers, it displays a prompt and closes the page; if the browser meets the requirements, the user enters the system for data collection;
a remote data collection terminal, which stores the collected declared-project data in the cache area of the network server, captures the completers' handwriting with a writing tablet, and stores that handwriting in the handwriting feature library;
a host computer, which retrieves the declared-project data from the cache area of the network server and stores it in the first cache area of the host computer. Collected images, Word documents and PDF documents are stored as files in the second cache area of the host computer, and each document's relative path is stored in the attachment sub-data table of the project declaration database;
a duplicate-checking judgment module, which reads the declared-project information in the first cache area through the host computer and judges whether the data meet the requirements. For image documents, it uses the document path in the attachment sub-data table of the project declaration database to retrieve the image from the second cache area of the host computer, and an image recognition module reads its content. The recognized completer handwriting is compared against the handwriting feature library, and the completing unit's seal is checked against the unit name in the completing-unit data table. If the handwriting information in the handwriting feature library, the declared-project information in the project declaration data table and the image documents all meet the requirements, the data proceed to the screening module; otherwise, the data are collected again;
an output module, which synchronizes qualifying project data from the project declaration database to the review database, screens the declaration information in the review database three times, and displays the final data on a web page served by the server.
The data processing system further includes a data allocation module. It stores data directly related to the project in the master data table of the project declaration database and data indirectly related to the project in the sub-data tables of the project declaration database, and the sub-data tables are associated through the project primary key. The sub-data tables of the declaration database also store historical project declaration information.
The duplicate-checking judgment module further includes a selection module, which performs word segmentation or direct matching on the data stored in the sub-data tables of the project declaration database that the host computer reads from the first cache area. It judges whether the current declared-project information is a project name or a paper or monograph title. If so, the information is passed to the keyword comparison module. If not, the module judges whether the information is the project completer's name and ID number or an intellectual property number, and if so, passes it to the direct matching module;
a keyword comparison module, which compares the keywords of the project name or paper or monograph title in the current declared-project information with those of the project name or paper or monograph title of another declared project in the historical declaration information. If the similarity is not lower than the set value, the project is judged a duplicate; otherwise, it is not;
a direct matching module, which matches the completer's name and ID number, or the intellectual property number, of the current declared project directly against those of another declared project in the historical declaration information. If they are the same, the project is judged a duplicate; otherwise, it is not;
a storage module, which stores the information of declared projects judged non-duplicate by the keyword comparison module and the direct matching module in the cache area of the host computer, and stores the information of declared projects judged duplicate in the duplicate-check table of the project declaration database in the host computer.
The keyword comparison module specifically includes:
a keyword extraction module, which takes the current declared-project information and one record of the historical declaration information, segments each into words, and stores the resulting keywords in two arrays associated with the project declaration database in the host computer;
a similarity module, which calculates the similarity between the current declared-project information and the project information in the historical declaration record and compares it with the set value. If the similarity is not lower than the set value, the project is judged a duplicate; otherwise, it is not.
The keyword extraction module uses the ShootSearch component to segment the information it takes from the current declared project and the historical declaration record.
Specifically, the similarity module compares the keyword counts of the two arrays, takes the smaller count, and divides the number of shared keywords by it to obtain the similarity.
The direct matching module specifically includes:
a matching module, which traverses the declared-project information collected by the remote data collection terminal and matches it directly, in a loop, against the historical project declaration information in the sub-data tables of the declaration database. If they match, the project is judged a duplicate; otherwise, it is not.
所述筛选模块,具体包括:The screening module specifically includes:
The screening storage module is used to retrieve the subject information of science and technology projects and store it, in the form of data set A, in a sub-data table of the review database; the sub-data table of the review database also stores the experts' subject information in the form of data set B;
The first-level subject extraction module is used to select expert subject information from data set B as a condition factor and judge whether that condition factor is a first-level subject; if so, the condition factor is matched against the key factor of the project subject information stored as data set A by traversing the condition factors in the expert subject information to check whether they contain the key factor: if they do, the expert matches; otherwise, the expert does not match;
The second-level subject extraction module is used to judge whether the condition factor of the expert information is a second-level subject; if so, the condition factor is matched against the key factor of the project information stored as data set A by traversing the condition factors in the expert information to check whether they contain the key factor: if they do, the expert matches; otherwise, the expert does not match;
The third-level subject extraction module is used to judge whether the condition factor of the expert information is a third-level subject; if so, the condition factor is matched against the key factor of the project information stored as data set A by traversing the condition factors in the expert information to check whether they contain the key factor: if they do, the expert matches; otherwise, the expert does not match;
The random data generation module is used to randomly select the required number of experts matching the science and technology project from data set B of stored expert subject information, and to store the selected data in the review database;
Specifically, the random data generation module selects data using random numbers, from zero up to a specified number, returned by a Random function, and repeats the selection until the specified number of records has been chosen.
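The expert-selection modules above amount to a subject-level filter followed by random sampling. The following is a minimal Python sketch under stated assumptions: the `Expert` record, its level-to-subjects mapping, and the set-membership test standing in for "contains the key factor" are all hypothetical, since the patent does not define data formats. Random indices are drawn from zero up to the pool size until the requested number of distinct experts has been picked.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Expert:
    """Hypothetical data set B record: subject codes grouped by level (1, 2, 3)."""
    name: str
    subjects: dict = field(default_factory=dict)  # level -> set of subject codes


def matches(expert: Expert, key_factor: str, level: int) -> bool:
    """Traverse the expert's condition factors at `level` and test whether the
    project's key factor is among them (assumed here to mean equality)."""
    return any(factor == key_factor for factor in expert.subjects.get(level, set()))


def select_experts(dataset_b, key_factor, level, count, rng=random):
    """Keep experts that match at the given subject level, then draw `count`
    distinct experts using random indices in [0, len(candidates))."""
    candidates = [e for e in dataset_b if matches(e, key_factor, level)]
    if count > len(candidates):
        raise ValueError("not enough matching experts for the requested count")
    picked_indices = []
    while len(picked_indices) < count:
        i = rng.randrange(len(candidates))  # random number from 0 to pool size - 1
        if i not in picked_indices:         # skip repeats so each expert is drawn once
            picked_indices.append(i)
    return [candidates[i] for i in picked_indices]


# Usage: pick two second-level "Power Systems" experts with a seeded generator.
pool = [
    Expert("E1", {1: {"Engineering"}, 2: {"Power Systems"}}),
    Expert("E2", {2: {"Power Systems"}}),
    Expert("E3", {2: {"Chemistry"}}),
    Expert("E4", {2: {"Power Systems"}, 3: {"Relay Protection"}}),
]
panel = select_experts(pool, "Power Systems", 2, 2, random.Random(7))
print([e.name for e in panel])
```

In practice `random.sample(candidates, count)` achieves the same draw in one call; the explicit loop is kept only to mirror the "select until the specified number is reached" wording.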
The screening module comprises a primary screening module, a secondary screening module and a tertiary screening module. The primary screening module performs the first round of screening on projects that meet the requirements: it selects the corresponding online review experts according to the project information, screens the data synchronized into the review database through online review, and stores the screened declared-project data in the third cache area of the review database;
The secondary screening module performs the second round of screening on the declared-project data stored in the third cache area: it selects senior experts according to the project information, has them vote on the data taken out by the primary screening module, extracts the preliminary award-winning projects from the voting results, and stores their data in the fourth cache area of the review database;
The tertiary screening module performs the third round of screening on the declared-project data stored in the fourth cache area: it selects experts of the Science and Technology Committee according to the project information, has them audit the data taken out by the secondary screening module, extracts the final award-winning projects from the audit results, stores them in the fifth cache area of the review database, and outputs the data of the final award-winning projects stored there.
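The three screening rounds form a staged pipeline in which each round reads the previous round's cache area and writes its own. A minimal sketch, assuming the review database can be modelled as a dictionary keyed by hypothetical cache names (`cache3` to `cache5`) and that each round is a function from a list of project records to the retained subset; the real modules involve online review, voting and committee audit rather than the placeholder filters used here.

```python
from typing import Callable, Dict, List

Project = Dict[str, object]                       # hypothetical declared-project record
Round = Callable[[List[Project]], List[Project]]  # one screening round


def run_review(projects: List[Project], first: Round, second: Round,
               third: Round) -> Dict[str, List[Project]]:
    """Run the three screening rounds, staging each result in its cache area."""
    review_db: Dict[str, List[Project]] = {}
    review_db["cache3"] = first(projects)              # online review results
    review_db["cache4"] = second(review_db["cache3"])  # preliminary award winners
    review_db["cache5"] = third(review_db["cache4"])   # final award winners (output)
    return review_db


# Usage with placeholder rounds; the field names and thresholds are invented.
declared = [
    {"name": "P1", "score": 88, "votes": 7, "approved": True},
    {"name": "P2", "score": 60, "votes": 9, "approved": True},
    {"name": "P3", "score": 92, "votes": 3, "approved": True},
    {"name": "P4", "score": 85, "votes": 6, "approved": False},
]
db = run_review(
    declared,
    lambda ps: [p for p in ps if p["score"] >= 80],
    lambda ps: [p for p in ps if p["votes"] >= 5],
    lambda ps: [p for p in ps if p["approved"]],
)
print([p["name"] for p in db["cache5"]])  # prints ['P1']
```

Keeping every intermediate cache in the returned dictionary makes each round's output auditable, which matches the description's use of separate cache areas per round.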
Directly related data include the basic project information; indirectly related data include the project introduction, main scientific and technological innovations, third-party evaluations, social and economic benefits, science and technology awards previously received, completers, completing units, opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments. The collected data comprise the basic project information, project introduction, main scientific and technological innovations, third-party evaluations, social and economic benefits, science and technology awards previously received, completers, completing units, opinions of the recommending unit, intellectual property rights, papers and monographs, and attachments. The basic project information includes the project name, project subject, technical field, project source, national economic industry, and so on; the intellectual property information includes the IP number, IP name, IP owner, and the date on which the IP was obtained; the completer information includes each completer's name, ID card number, and order among the completers.
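The collected data described above map naturally onto a small record structure. A sketch using Python dataclasses; the class and field names are hypothetical English renderings of the listed items, and the split between directly related data (the basic information) and indirectly related data follows the paragraph above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class BasicInfo:                 # directly related data
    project_name: str
    subject: str
    technical_field: str
    source: str
    industry: str                # national economic industry


@dataclass
class IntellectualProperty:
    ip_number: str
    ip_name: str
    owner: str
    obtained_on: date


@dataclass
class Completer:
    name: str
    id_number: str
    order: int                   # position in the list of completers


@dataclass
class DeclaredProject:
    basic: BasicInfo
    # Indirectly related data:
    introduction: str = ""
    innovations: str = ""
    third_party_evaluation: str = ""
    socioeconomic_benefits: str = ""
    prior_awards: List[str] = field(default_factory=list)
    completers: List[Completer] = field(default_factory=list)
    completing_units: List[str] = field(default_factory=list)
    recommender_opinion: str = ""
    ip_rights: List[IntellectualProperty] = field(default_factory=list)
    papers: List[str] = field(default_factory=list)
    attachments: List[str] = field(default_factory=list)

    def ranked_completers(self) -> List[Completer]:
        """Completers sorted by their declared order."""
        return sorted(self.completers, key=lambda c: c.order)


# Usage with placeholder values.
p = DeclaredProject(
    basic=BasicInfo("Example project", "Electrical Engineering", "Power grid",
                    "Provincial program", "Electricity supply"),
    completers=[Completer("B", "id-2", 2), Completer("A", "id-1", 1)],
)
print([c.name for c in p.ranked_completers()])  # prints ['A', 'B']
```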
In the project collection module, data are stored as data tables, pictures, Word documents and PDF documents, that is, in several different formats: data tables make querying and statistics easy, pictures and PDF documents ensure the authenticity of the data, and Word documents preserve the data's original formatting for easy viewing.
The primary, secondary and tertiary screening modules differ in how they evaluate the data: the primary screening module is used for scoring, the secondary screening module for voting and grading, and the tertiary screening module for a final vote. However, the expert selection process is the same in all three modules.
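The three rounds differ only in how expert opinions are aggregated. A sketch of one plausible aggregation per round, assuming mean scoring against a threshold for the first round, a most-voted grade for the second, and a strict-majority yes/no vote for the third; the patent states only "scoring", "voting and grading" and "voting", so the thresholds and tie rules here are assumptions.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def score_round(scores: Dict[str, List[float]], threshold: float) -> List[str]:
    """Round 1: keep projects whose mean expert score reaches `threshold`."""
    return [p for p, s in scores.items() if s and mean(s) >= threshold]


def grade_round(votes: Dict[str, List[str]]) -> Dict[str, str]:
    """Round 2: each expert votes a grade; a project receives its most-voted
    grade (ties go to the grade voted first, per Counter ordering)."""
    return {p: Counter(v).most_common(1)[0][0] for p, v in votes.items() if v}


def final_vote(ballots: Dict[str, List[bool]]) -> List[str]:
    """Round 3: approve a project when a strict majority of ballots say yes."""
    return [p for p, b in ballots.items() if sum(b) * 2 > len(b)]


print(score_round({"P1": [80, 90], "P2": [60, 70]}, 75))             # prints ['P1']
print(grade_round({"P1": ["A", "B", "A"], "P3": ["B", "C", "C"]}))   # prints {'P1': 'A', 'P3': 'C'}
print(final_vote({"P1": [True, True, False], "P3": [True, False]}))  # prints ['P1']
```

Because expert selection is shared across rounds, only these aggregation functions need to vary from one screening module to the next.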
The advantage of the data judgment in the present invention is that the system automatically extracts the duplicate-checking factors and performs complex matching calculations, which reduces human factors, improves the fairness and correctness of the duplicate-checking results, reduces the staff's workload, and greatly improves work efficiency. Duplicate checking is an important step in the entire science and technology award system; the data processing is complex and the processing methods differ. Different kinds of data are checked for duplicates in different ways to avoid both false positives and missed duplicates. A completer's name and ID card number, and an intellectual property number, are complete identifiers that cannot be recombined. Project names and the titles of papers and monographs, by contrast, can be decomposed and recombined, so computing similarity after word segmentation finds duplicate projects more accurately.
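The two checking strategies described above, exact comparison for identifiers and similarity scoring for decomposable titles, can be sketched as follows. The patent computes similarity over word segments; because Chinese word segmentation needs a dictionary, this sketch swaps in character bigrams with Jaccard similarity as a simple, dictionary-free stand-in. The record fields and the 0.6 threshold are hypothetical.

```python
from typing import Dict, List, Set


def bigrams(text: str) -> Set[str]:
    """Character bigrams of `text` with whitespace removed (stand-in for word segmentation)."""
    s = "".join(text.split()).lower()
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' bigram sets."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return len(x & y) / len(x | y)


def duplicate_flags(new: Dict, old: Dict, threshold: float = 0.6) -> List[str]:
    """List the reasons a new declaration may duplicate an existing one."""
    flags = []
    # Complete identifiers cannot be recombined, so compare them exactly.
    people_new = {(c["name"], c["id_number"]) for c in new["completers"]}
    people_old = {(c["name"], c["id_number"]) for c in old["completers"]}
    if people_new & people_old:
        flags.append("shared completer")
    if set(new["ip_numbers"]) & set(old["ip_numbers"]):
        flags.append("shared IP number")
    # Titles can be decomposed and reordered, so score their similarity instead.
    if jaccard(new["title"], old["title"]) >= threshold:
        flags.append("similar title")
    return flags


# Two hypothetical titles ("power grid dispatch data processing method"),
# differing only by the particle 的; all other values are placeholders.
a = {"title": "电网调度数据处理方法", "completers": [{"name": "A", "id_number": "X1"}], "ip_numbers": ["ZL-1"]}
b = {"title": "电网调度数据的处理方法", "completers": [{"name": "B", "id_number": "X2"}], "ip_numbers": ["ZL-1"]}
print(round(jaccard(a["title"], b["title"]), 3))  # prints 0.727
print(duplicate_flags(a, b))                      # prints ['shared IP number', 'similar title']
```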
Screening data through online review reduces costs, and the screening process is less exposed to interference from other factors, which makes it more objective. The expert selection is advantageous because experts are chosen at random and according to the association between the project's subject and the experts' subjects, which makes the selection fair and impartial, the experts better targeted, and the review results of higher quality.
Although specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications or variations that they can make on the basis of the technical solution of the invention without creative effort remain within the scope of protection of the invention.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201410291108.0A CN104133839A (en) | 2014-06-24 | 2014-06-24 | Data processing method and system with intelligent detection function |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN104133839A true CN104133839A (en) | 2014-11-05 |
Family ID: 51806517
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201410291108.0A CN104133839A (en), Withdrawn | Data processing method and system with intelligent detection function | 2014-06-24 | 2014-06-24 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN104133839A (en) |
Citations (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1928902A (en) * | 2005-09-06 | 2007-03-14 | 廖吉安 | Project appraisal method and system |
| CN101593324A (en) * | 2009-06-17 | 2009-12-02 | 浙江师范大学 | Network multi-level approval method and system based on trusted computing application technology |
| US20100036870A1 (en) * | 2008-08-05 | 2010-02-11 | Lee Edward Lowry | Mechanisms to support multiple name space aware projects |
| CN102376023A (en) * | 2010-11-17 | 2012-03-14 | 苏州德融嘉信信用管理技术有限公司 | Integration method for evaluation material for credit evaluation |
| US20130006893A1 (en) * | 2004-01-07 | 2013-01-03 | Thomas Henderson | Targeted dividend reinvestment plans and methods of establishing same |
| CN103176962A (en) * | 2013-03-08 | 2013-06-26 | 深圳先进技术研究院 | Statistical method and statistical system of text similarity |
| CN103235774A (en) * | 2013-04-27 | 2013-08-07 | 杭州电子科技大学 | A method for extracting feature words from a scientific and technological project application |
| CN103440329A (en) * | 2013-09-04 | 2013-12-11 | 北京邮电大学 | Authoritative author and high-quality paper recommending system and recommending method |
Non-Patent Citations (5)
| Title |
|---|
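| 周宁: 《信息组织》 (Information Organization), 30 November 2004, 武汉大学出版社 (Wuhan University Press) * |
| 徐健: 《术语相似度计算方法研究》 (Research on Term Similarity Computation Methods), 30 September 2012, 中山大学出版社 (Sun Yat-sen University Press) * |
| 王龙: 《突破DreamWeaver 4.0创作实例五十讲》 (Breaking Through DreamWeaver 4.0: Fifty Lectures on Authoring Examples), 31 August 2001, 中国水利水电出版社 (China Water & Power Press) * |
| 肖波: "KTDictSeg 一种简单快速准确的中文分词方法" (KTDictSeg: a simple, fast and accurate Chinese word segmentation method), 《CSDN博客》 (CSDN Blog) * |
| 郎为民: 《射频识别(RFID)技术原理与应用》 (Principles and Applications of Radio Frequency Identification (RFID) Technology), 30 June 2006, 机械工业出版社 (China Machine Press) * |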
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105187457A (en) * | 2015-10-27 | 2015-12-23 | 上海斐讯数据通信技术有限公司 | Client-based account automatic registration method, system and server |
| CN106709531A (en) * | 2017-01-20 | 2017-05-24 | 中国烟草总公司郑州烟草研究院 | Multi-process matching recognition method and device for used substance of cigarette material |
| CN106709531B (en) * | 2017-01-20 | 2020-10-13 | 中国烟草总公司郑州烟草研究院 | Method and device for identifying substances used by multi-process matched tobacco materials |
| CN107798047B (en) * | 2017-07-26 | 2021-03-02 | 深圳壹账通智能科技有限公司 | Duplicate work order detection method, apparatus, server and medium |
| CN107798047A (en) * | 2017-07-26 | 2018-03-13 | 上海壹账通金融科技有限公司 | Repeat work order detection method, device, server and medium |
| WO2019205036A1 (en) * | 2018-04-25 | 2019-10-31 | 深圳市大疆创新科技有限公司 | Data processing method and apparatus |
| CN110785734A (en) * | 2018-04-25 | 2020-02-11 | 深圳市大疆创新科技有限公司 | Data processing method and device |
| CN110263044A (en) * | 2019-06-21 | 2019-09-20 | 深圳前海微众银行股份有限公司 | Date storage method, device, equipment and computer readable storage medium |
| CN110263044B (en) * | 2019-06-21 | 2023-03-31 | 深圳前海微众银行股份有限公司 | Data storage method, device, equipment and computer readable storage medium |
| CN112329425A (en) * | 2020-11-12 | 2021-02-05 | 深圳供电局有限公司 | Intelligent review method and storage medium for scientific research projects |
| CN112417840A (en) * | 2020-11-12 | 2021-02-26 | 深圳供电局有限公司 | Scientific research project intelligent review system and computer equipment |
| CN112214986A (en) * | 2020-11-12 | 2021-01-12 | 深圳供电局有限公司 | An intelligent analysis device for repeated declaration of scientific research projects |
| CN112329425B (en) * | 2020-11-12 | 2023-09-15 | 深圳供电局有限公司 | An intelligent review method and storage medium for scientific research projects |
| CN112417840B (en) * | 2020-11-12 | 2023-09-15 | 深圳供电局有限公司 | Scientific research project intelligent review system and computer equipment |
| CN112214986B (en) * | 2020-11-12 | 2023-11-14 | 深圳供电局有限公司 | An intelligent analysis device for repeated declaration of scientific research projects |
Similar Documents
| Publication | Title |
|---|---|
| CN104133839A (en) | Data processing method and system with intelligent detection function |
| CN104133842A (en) | Data processing method and data processing system with intelligent expert detection function |
| CN104133838A (en) | Data processing method and system with system detection function |
| CN104731941B (en) | Method for capturing data from unstructured financial report based on XBRL technology |
| CN103049575B (en) | A kind of academic conference search system of topic adaptation |
| CN112632405B (en) | Recommendation method, recommendation device, recommendation equipment and storage medium |
| CN110619568A (en) | Risk assessment report generation method, device, equipment and storage medium |
| CN105760439B (en) | A kind of personage's cooccurrence relation map construction method based on specific behavior co-occurrence network |
| CN111104798A (en) | Analysis method, system and computer readable storage medium for criminal plot in legal document |
| CN110489653A (en) | Public feelings information querying method and device, system, electronic equipment, storage medium |
| CN101593200A (en) | Chinese Web Page Classification Method Based on Keyword Frequency Analysis |
| CN110688900A (en) | A method of withdrawing meter management based on image recognition |
| CN110335180A (en) | Case is put on record material intelligence checking device |
| CN112418695A (en) | Multi-dimensional portrait construction method and recommendation method for scientific researchers in tobacco field |
| CN111782917B (en) | Method and device for visual analysis of financial punishment data |
| CN101178721A (en) | Method for classifying and managing useful poser information in forum |
| CN112581037B (en) | Background investigation method and system for multidimensional talent evaluation |
| CN104156386A (en) | Data processing method and system with image recognition function |
| CN114547400B (en) | A trade indicator visualization system and method based on international trade commodity data |
| CN104133841A (en) | Data processing method and data processing system with system detection and image identification functions |
| KR101265975B1 (en) | Future technology value appraisal system and method |
| CN104133840A (en) | Data processing method and data processing system with system detection and biological recognition functions |
| CN102368266A (en) | Sorting method of unlabelled pictures for network search |
| CN106372123A (en) | Tag-based related content recommendation method and system |
| CN117312303A (en) | Automatic data asset checking method, device, electronic equipment and medium |
Legal Events
| Code | Title | Description |
|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| C53 | Correction of patent of invention or patent application | |
| CB03 | Change of inventor or designer information | Inventor after: Wu Guanbin; Li Hongmei; Li Yong; Xu Naiyuan; Chen Suhong; Wang Meng; Gao Ying; Fu Peng; Wang Huihui. Inventor before: Wu Guanbin; Li Hongmei; Li Yong; Xu Naiyuan; Chen Suhong; Fu Peng; Wang Huihui. |
| COR | Change of bibliographic data | Free format text: CORRECT: INVENTOR; FROM: WU GUANBIN LI HONGMEI LI YONG XU NAIYUAN CHEN SUHONG FU PENG WANG HUIHUI TO: WU GUANBIN LI HONGMEI LI YONG XU NAIYUAN CHEN SUHONG WANG MENG GAO YING FU PENG WANG HUIHUI |
| CB02 | Change of applicant information | Address after: 100031 Xicheng District West Chang'an Avenue, No. 86, Beijing. Applicant after: STATE GRID CORPORATION OF CHINA; ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.; SHANDONG ECLOUD INFORMATION TECHNOLOGY Co.,Ltd. Address before: 100031 Xicheng District West Chang'an Avenue, No. 86, Beijing. Applicant before: State Grid Corporation of China; ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER Co.; SHANDONG ECLOUD INFORMATION TECHNOLOGY Co.,Ltd. |
| CB02 | Change of applicant information | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20141105 |
| WW01 | Invention patent application withdrawn after publication | |