CN104133839A

CN104133839A - Data processing method and system with intelligent detection function

Info

Publication number: CN104133839A
Application number: CN201410291108.0A
Authority: CN
Inventors: 吴观斌; 李红梅; 李勇; 许乃媛; 陈素红; 傅蓬; 王慧慧
Original assignee: Shandong Yi Yun Information Technology Co Ltd; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; State Grid Corp of China SGCC
Current assignee: Shandong Yi Yun Information Technology Co Ltd; Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd; State Grid Corp of China SGCC
Priority date: 2014-06-24
Filing date: 2014-06-24
Publication date: 2014-11-05

Abstract

The invention discloses a data processing method and system with an intelligent detection function. Step 1: collect the data of the declared projects. Step 2: read the data in the main data table and each sub-data table of the project declaration database, and judge whether the data meets the requirements. Step 3: synchronize the project data that meets the requirements in Step 2 from the project declaration database to the review database, screen the qualifying projects three times, store the data of the final award-winning projects in the third cache area of the review database, and output it. The advantage of the data judgment is that the system automatically extracts the duplicate-checking factors and performs complex matching calculations, which reduces human factors, improves the correctness of the duplicate-checking results, reduces the workload of staff, and greatly improves work efficiency.

Description

A data processing method and system with intelligent detection function

Technical field

The invention relates to a data processing method and system with an intelligent detection function.

Background

Current science and technology award management systems have the following shortcomings in data processing:

Science and technology award management involves a large volume of data, and a large amount of data must be processed every year. During processing, the data screening is not reasonable enough, and existing systems lack automatic duplicate checking and automatic processing functions.

Screening a large amount of tedious data is difficult, and the processing is not reasonable enough: the original system screens the data only once, on a single screening basis and with many opportunities for human intervention, so it lacks fairness and rationality. Manual data processing involves a heavy workload and low efficiency: the original system requires data to be looked up and compared by hand, which is inefficient and burdensome.

Applying for a science and technology award requires many application materials. When the materials contain project names, names of papers and monographs, project completers, or patent documents, it is necessary to judge from these entries whether an application is suspected of being a duplicate. At present this work is done entirely by hand, and because the volume of application data is large, manual identification is not accurate enough.

Summary of the invention

The purpose of the present invention is to solve the above problems by providing a project declaration data processing method and system. The advantage of its data judgment is that the system automatically extracts the duplicate-checking factors and performs complex matching calculations, which reduces human factors, improves the fairness and correctness of the duplicate-checking results, reduces the workload of staff, and greatly improves work efficiency.

To achieve the above purpose, the present invention adopts the following technical solution:

A data processing method with an intelligent detection function, comprising the following steps:

Step 1: a JS script automatically detects the browser version used by the current user; for browsers outside the IE family, a prompt is given and the page is closed; if the browser meets the requirements, the user enters the system for data collection;

Step 2: the remote data collection terminal collects the completers' handwriting through a handwriting tablet and stores it in the handwriting feature library; the collected data of the declared project is stored in the cache area of the network server, and the host computer retrieves that data from the cache area of the network server and stores it in the first cache area of the host computer; collected pictures, Word documents and PDF documents are stored as files in the second cache area of the host computer, and the relative paths of the documents are stored in the attachment sub-data table of the project declaration database;

Step 3: the host computer reads the declared project information in the first cache area and judges whether the data meets the requirements; for picture documents, the document path in the attachment sub-data table of the project declaration database is used to retrieve the picture document from the second cache area of the host computer, the image recognition module reads the content of the picture document, the recognized completer handwriting is compared with the handwriting feature library, and the seal of the completing unit is checked against the completing unit name in the completing unit data table; if the handwriting information in the handwriting feature library, the declared project information in the project declaration data table and the picture documents all meet the requirements, proceed to Step 4; otherwise return to Step 2;

Step 4: the project data that meets the requirements in Step 3 is synchronized from the project declaration database to the review database, and this synchronization is one-way; the declaration information in the review database is screened three times, and the final data is output on the browsing page of the server.

Step 2 specifically comprises: storing data directly related to the project in the main data table of the project declaration database, where each record corresponds to a unique project number; storing data indirectly related to the project in the sub-data tables of the project declaration database, where the unique project number is set as a foreign key that associates each sub-data table with the main data table; the sub-data tables of the declaration database also store historical project declaration information.
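The main-table and sub-table layout described here can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the patent does not name a database engine, and the table names (`project`, `completer`), column names and sample values below are hypothetical.

```python
import sqlite3

def create_declaration_schema(conn):
    """Create a hypothetical main table and one sub-table linked by project number."""
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    # Main data table: one row per project, keyed by the unique project number.
    conn.execute("""
        CREATE TABLE project (
            project_no TEXT PRIMARY KEY,
            name       TEXT NOT NULL,
            discipline TEXT
        )""")
    # Sub-data table: indirectly related data, tied to the main table
    # through the project number as a foreign key.
    conn.execute("""
        CREATE TABLE completer (
            id         INTEGER PRIMARY KEY,
            project_no TEXT NOT NULL REFERENCES project(project_no),
            full_name  TEXT NOT NULL,
            id_number  TEXT NOT NULL,
            seq        INTEGER NOT NULL
        )""")

conn = sqlite3.connect(":memory:")
create_declaration_schema(conn)
conn.execute(
    "INSERT INTO project VALUES ('P-001', 'Grid fault detection', 'Electrical engineering')"
)
conn.execute(
    "INSERT INTO completer (project_no, full_name, id_number, seq) "
    "VALUES ('P-001', 'Example Person', 'ID-0001', 1)"
)
# Join the sub-table back to the main table through the project number.
rows = conn.execute(
    "SELECT p.name, c.full_name FROM project p "
    "JOIN completer c ON c.project_no = p.project_no"
).fetchall()
```

With the foreign key in place, a sub-table row that refers to a project number missing from the main table is rejected by the database rather than silently stored.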

The directly related data comprises the basic project information; the indirectly related data comprises the project introduction, main scientific and technological innovations, third-party evaluation, social and economic benefits, previous science and technology awards, completers, completing units, opinions of the recommending unit, intellectual property, papers and monographs, and attachments;

The basic project information includes: project name, project discipline, technical field, project source, the national economic industry to which the project belongs, and so on; the intellectual property includes: patent application number, patent name, inventor and patent grant date; the completers include: the completer's name, ID card number and completer order. The data storage forms in Step 1 include: data tables, pictures, Word documents and PDF documents. The data is stored in different formats for different reasons: data tables are convenient for querying and statistics, pictures and PDF documents ensure the authenticity of the data, and Word documents preserve the original format of the data and are easy to view.

In Step 3, the data stored in each sub-data table of the project declaration database is either segmented into words or directly matched. A judgment module judges whether the current declared project information is a project name or the name of a paper or monograph; if so, it is passed to the keyword comparison module; if not, the module further judges whether it is a project completer's name and ID card number or an intellectual property number, and if so, it is passed to the direct matching module.

The keyword comparison module compares the keywords of the project name or paper or monograph name in the current declared project information with those of the project name or paper or monograph name of another declared project in the historical project declaration information; if the similarity is not lower than the set value, the item is judged to be a duplicate, otherwise not;

The direct matching module directly matches the project completer's name and ID card number, or the intellectual property number, in the current declared project information against the project completer's name and ID card number, or the intellectual property number, of another declared project in the historical project declaration information; if they are identical, the item is judged to be a duplicate, otherwise not;

The storage module stores the information of declared projects that the keyword comparison module and the direct matching module judge not to be duplicates in the cache area of the host computer, and stores the information of declared projects judged to be duplicates in the duplicate-check table of the project declaration database on the host computer.
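The routing between the two comparison modules, and the filing of results by the storage module, might look like the following minimal sketch. The field-type labels and function names are hypothetical, and plain Python lists stand in for the host computer's cache area and the duplicate-check table.

```python
# Hypothetical field-type labels; the patent names the kinds of data, not these strings.
KEYWORD_FIELDS = {"project_name", "paper_title"}
DIRECT_FIELDS = {"completer_identity", "ip_number"}

def route_field(field_type):
    """Decide which module should check a field, or None if it is not checked."""
    if field_type in KEYWORD_FIELDS:
        return "keyword_comparison"
    if field_type in DIRECT_FIELDS:
        return "direct_matching"
    return None

def file_result(item, is_duplicate, cache, duplicate_table):
    """Store non-duplicates in the cache and duplicates in the duplicate-check table."""
    if is_duplicate:
        duplicate_table.append(item)
    else:
        cache.append(item)

cache, duplicate_table = [], []
file_result("P-001", False, cache, duplicate_table)
file_result("P-002", True, cache, duplicate_table)
```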

The specific process by which the keyword comparison module compares the keywords of the project name or paper or monograph name in the current declared project information with those of another declared project in the historical project declaration information, judging the item a duplicate if the similarity is not lower than the set value and not a duplicate otherwise, is as follows:

The keyword extraction module takes the current declared project information and one item of the historical project declaration information, segments both into words, and stores the resulting keywords in two separate arrays corresponding to the project declaration database on the host computer;

The keyword matching module traverses the keywords in the two arrays in a loop and compares them, yielding the number of identical keywords and the number of keywords in each array;

The similarity module computes the similarity between the current declared project information and the project information in the historical project declaration information and compares it with the set value; if the similarity is not lower than the set value, the item is judged to be a duplicate, otherwise not.

When the keyword extraction module takes the current declared project information and one item of the historical project declaration information and segments the extracted information, the word segmentation is performed with the ShootSearch component.

The similarity module specifically compares the numbers of keywords in the two arrays corresponding to the project declaration database, takes the smaller number, and divides the number of identical keywords by that smaller number to obtain the similarity.
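The similarity rule (identical keywords divided by the smaller keyword count) can be written out as a short sketch. It assumes the names have already been segmented into keyword lists, since the ShootSearch segmenter is not reproduced here; it also treats each list as a set, so repeated keywords count once, and the default threshold of 0.8 is a placeholder, not a value given in the patent.

```python
def keyword_similarity(current_keywords, historical_keywords):
    """Shared keywords divided by the size of the smaller keyword group."""
    current = set(current_keywords)        # assumption: repeated keywords count once
    historical = set(historical_keywords)
    smaller = min(len(current), len(historical))
    if smaller == 0:
        return 0.0                         # nothing to compare against
    shared = len(current & historical)
    return shared / smaller

def is_duplicate(current_keywords, historical_keywords, threshold=0.8):
    """Duplicate when the similarity is not lower than the set value (placeholder 0.8)."""
    return keyword_similarity(current_keywords, historical_keywords) >= threshold

current = ["smart", "grid", "fault", "detection"]
historical = ["grid", "fault", "detection"]
similarity = keyword_similarity(current, historical)  # 3 shared / 3 in the smaller group
```

Dividing by the smaller group means a shorter name that is wholly contained in a longer one scores 1.0, which fits the stated goal of catching recombined project names.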

The direct matching, by the direct matching module, of the project completer's name and ID card number or the intellectual property number in the current declared project information against those of another declared project in the historical project declaration information specifically comprises:

The declared project information collected by the remote data collection terminal is matched directly, by looping over the records, against the historical project declaration information in the sub-data tables of the declaration database; if a match is found, the item is judged to be a duplicate, otherwise not.
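A minimal sketch of the traversal-and-exact-match check follows. The record layout (a `project_no` plus a set of `identifiers`) and all sample values are assumptions made for illustration; a completer is represented as a (name, ID number) pair and an intellectual property number as a string.

```python
def find_direct_matches(current_identifiers, historical_records):
    """Return (identifier, historical project number) pairs that match exactly."""
    matches = []
    for record in historical_records:           # traverse every historical project
        for identifier in current_identifiers:  # compare each current identifier
            if identifier in record["identifiers"]:
                matches.append((identifier, record["project_no"]))
    return matches

history = [
    {"project_no": "H-1", "identifiers": {("Example Person", "ID-0001"), "PATENT-0001"}},
    {"project_no": "H-2", "identifiers": {"PATENT-0002"}},
]
current = [("Example Person", "ID-0001"), "PATENT-0003"]
matches = find_direct_matches(current, history)
is_repeated = bool(matches)  # any exact hit marks the declaration as a duplicate
```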

The declared project information includes the project name, names of papers and monographs, the project completers' names and ID card numbers, and intellectual property numbers. The historical project declaration information includes the project names, names of papers and monographs, project completers' names and ID card numbers, and intellectual property numbers of other projects in the current year or of all projects in the last three years. The intellectual property number is a patent application number or a patent publication number.

Duplicate checking by similarity calculation is used to find duplicate projects to a greater extent and to prevent recombined projects from being declared again. The data subject to word segmentation includes project names and names of papers and monographs; the data subject to direct matching includes completers' names and ID card numbers and intellectual property numbers. Duplicate checking is an important part of the whole science and technology award system; the data processing is complex and the processing methods differ. Different data is checked in different ways in order to avoid false positives and misses. A completer's name and ID card number, and an intellectual property number, are complete identifiers that cannot be recombined. Project names and names of papers and monographs can be broken apart and recombined, so computing similarity over segmented words finds duplicate items more accurately.

In said Step 3, the declaration information in the review database is screened three times, specifically:

The first screening is performed on the qualifying projects in the review database: online review experts are selected according to the project information, and the data synchronized to the review database is screened through online review; the screened declared project data is stored in the third cache area of the review database;

The second screening is performed on the declared project data stored in the third cache area: senior experts are selected according to the project information and vote on the data retained by the first screening; the preliminary award-winning projects are taken from the voting results, and their data is stored in the fourth cache area of the review database;

The third screening is performed on the declared project data stored in the fourth cache area: experts of the science and technology committee are selected according to the project information and review the data retained by the second screening; the final award-winning projects are taken from the review results and stored in the fifth cache area of the review database.

During screening, weights are set for the scoring indicators, experts are selected, and the experts score the synchronized data; the expert scores are summed according to the weights to obtain each project's score, the projects are sorted by score from high to low, and a set number of projects is taken from the sorted result. The voting options are: first prize, second prize, third prize and no award. The review outcomes are: objection and no objection.
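The weighted scoring and ranking can be illustrated as follows. The indicator names, weights and scores are made up, and averaging the weighted totals across experts is an assumption: the text says only that expert scores are summed according to the weights.

```python
def project_score(score_sheets, weights):
    """Weighted total per expert, averaged over experts (the averaging is assumed)."""
    per_expert = [
        sum(weights[indicator] * sheet[indicator] for indicator in weights)
        for sheet in score_sheets
    ]
    return sum(per_expert) / len(per_expert)

def select_top(scores_by_project, weights, top_n):
    """Sort projects by score from high to low and keep the first top_n."""
    totals = {
        project_no: project_score(sheets, weights)
        for project_no, sheets in scores_by_project.items()
    }
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]

# Hypothetical indicators and integer weights, chosen to keep the arithmetic exact.
weights = {"innovation": 3, "benefit": 2}
scores = {
    "A": [{"innovation": 90, "benefit": 80}, {"innovation": 70, "benefit": 60}],
    "B": [{"innovation": 80, "benefit": 90}],
    "C": [{"innovation": 70, "benefit": 70}],
}
shortlist = select_top(scores, weights, top_n=2)
```

Here project A's two experts give weighted totals of 430 and 330 (mean 380), B scores 420 and C scores 350, so the shortlist of two is B followed by A.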

In the three screenings, the expert selection process is specifically as follows:

S1: the discipline information of the science and technology projects is stored as data set A in a sub-data table of the review database; the sub-data tables of the review database also store the experts' discipline information as data set B;

S2: expert discipline information is selected from data set B and used as the condition factor; it is judged whether the condition factor of this expert discipline information is a first-level discipline; if so, the condition factor is matched against the key factor of the project discipline information stored as data set A by checking whether the condition factor in the expert discipline information contains the key factor; if it does, the two match and the process proceeds to step S5; otherwise they do not match and the process proceeds to step S3;

S3: it is judged whether the condition factor of the expert information is a second-level discipline; if so, the condition factor is matched against the key factor of the project information stored as data set A by checking whether the condition factor in the expert information contains the key factor; if it does, the two match and the process proceeds to step S5; otherwise they do not match and the process proceeds to step S4;

S4: it is judged whether the condition factor of the expert information is a third-level discipline; if so, the condition factor is matched against the key factor of the project information stored as data set A by checking whether the condition factor in the expert information contains the key factor; if it does, the two match and the process proceeds to step S5; otherwise they do not match;

S5: the required number of experts matching the science and technology project is randomly selected from the stored data set B of expert discipline information, and the selected data is stored in the review database;

Step S5 specifically comprises: selecting data according to random numbers between zero and a specified number returned by the Random function, until the specified number of experts has been selected.
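Steps S1 to S5 can be sketched as a level-by-level discipline match followed by a random draw. The data layout is hypothetical: each expert carries a mapping from discipline level (1, 2, 3) to a discipline name, "contains" is read as a substring test, and Python's `random.Random` stands in for the Random function named in the text.

```python
import random

def expert_matches(expert_disciplines, project_discipline):
    """Check first-, second-, then third-level disciplines for the project's discipline."""
    for level in (1, 2, 3):
        name = expert_disciplines.get(level)
        # Assumption: "contains" means the project discipline is a substring.
        if name and project_discipline in name:
            return True
    return False

def pick_experts(experts, project_discipline, count, rng=None):
    """Randomly draw matching experts until `count` are chosen or none remain."""
    rng = rng or random.Random()
    candidates = [
        e for e in experts if expert_matches(e["disciplines"], project_discipline)
    ]
    chosen = []
    while candidates and len(chosen) < count:
        index = rng.randrange(len(candidates))  # random index from zero upward
        chosen.append(candidates.pop(index))    # pop so no expert is drawn twice
    return chosen

experts = [
    {"name": "E1", "disciplines": {1: "Electrical engineering"}},
    {"name": "E2", "disciplines": {1: "Engineering", 2: "Electrical engineering"}},
    {"name": "E3", "disciplines": {1: "Chemistry"}},
    {"name": "E4", "disciplines": {1: "Engineering", 3: "Electrical engineering and automation"}},
]
panel = pick_experts(experts, "Electrical engineering", count=2, rng=random.Random(0))
```

Removing each drawn expert from the candidate list is a design choice of this sketch; it guarantees distinct experts and stops cleanly when fewer candidates match than were requested.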

The three screenings differ in how they evaluate the data: the first screening is scoring, the second is voting on award grades, and the third is a vote. The expert selection process, however, is the same in all three. The screened data is output as a Word document, which makes it easy for users to adjust the data format themselves.

A data processing system with an intelligent detection function, comprising: a system verification module, used to automatically detect, through a JS script, the browser version used by the current user, to give a prompt and close the page for browsers outside the IE family, and to admit the user to the system for data collection if the browser meets the requirements;

a remote data collection terminal, used to store the collected data of the declared project in the cache area of the network server, and to collect the completers' handwriting through a handwriting tablet and store it in the handwriting feature library;

a host computer, used to retrieve the declared project data from the cache area of the network server and store it in the first cache area of the host computer; collected pictures, Word documents and PDF documents are stored as files in the second cache area of the host computer, and the relative paths of the documents are stored in the attachment sub-data table of the project declaration database;

a duplicate-check judgment module, used to judge whether the data meets the requirements based on the declared project information that the host computer reads from the first cache area; for picture documents, the document path in the attachment sub-data table of the project declaration database is used to retrieve the picture document from the second cache area of the host computer, the image recognition module reads the content of the picture document, the recognized completer handwriting is compared with the handwriting feature library, and the seal of the completing unit is checked against the completing unit name in the completing unit data table; if the handwriting information in the handwriting feature library, the declared project information in the project declaration data table and the picture documents all meet the requirements, the data is passed to the screening module; otherwise the data is collected again;

an output module, used to synchronize the project data that meets the requirements from the project declaration database to the review database, to screen the declaration information in the review database three times, and to output the final data on the browsing page of the server.

The data processing system further comprises a data allocation module, specifically used to store data directly related to the project in the main data table of the project declaration database and data indirectly related to the project in the sub-data tables of the project declaration database, the sub-data tables being associated with one another through the project primary key; the sub-data tables of the declaration database also store historical project declaration information.

The duplicate-check judgment module further comprises a selection module, used to segment into words or directly match the data that the host computer reads from the first cache area and that is stored in each sub-data table of the project declaration database; it judges whether the current declared project information is a project name or the name of a paper or monograph; if so, the information is passed to the keyword comparison module; if not, it further judges whether the information is a project completer's name and ID card number or an intellectual property number, and if so, the information is passed to the direct matching module;

a keyword comparison module, used to compare the keywords of the project name or paper or monograph name in the current declared project information with those of the project name or paper or monograph name of another declared project in the historical project declaration information; if the similarity is not lower than the set value, the item is judged to be a duplicate, otherwise not;

a direct matching module, used to directly match the project completer's name and ID card number, or the intellectual property number, in the current declared project information against the project completer's name and ID card number, or the intellectual property number, of another declared project in the historical project declaration information; if they are identical, the item is judged to be a duplicate, otherwise not;

a storage module, used to store the information of declared projects that the keyword comparison module and the direct matching module judge not to be duplicates in the cache area of the host computer, and to store the information of declared projects judged to be duplicates in the duplicate-check table of the project declaration database on the host computer.

The keyword comparison module specifically comprises:

a keyword extraction module, used to take the current declared project information and one item of the historical project declaration information, segment both into words, and store the resulting keywords in two separate arrays corresponding to the project declaration database on the host computer;

a similarity module, used to compute the similarity between the current declared project information and the project information in the historical project declaration information and to compare it with the set value; if the similarity is not lower than the set value, the item is judged to be a duplicate, otherwise not.

The direct matching module specifically comprises:

a matching module, used to match the declared project information collected by the remote data collection terminal directly, by looping over the records, against the historical project declaration information in the sub-data tables of the declaration database; if a match is found, the item is judged to be a duplicate, otherwise not.

The screening module specifically comprises:

a screening storage module, used to retrieve the discipline information of the science and technology projects and store it as data set A in a sub-data table of the review database; the sub-data tables of the review database also store the experts' discipline information as data set B;

a first-level discipline extraction module, used to select expert discipline information from data set B as the condition factor and to judge whether the condition factor of this expert discipline information is a first-level discipline; if so, the condition factor is matched against the key factor of the project discipline information stored as data set A by checking whether the condition factor in the expert discipline information contains the key factor; if it does, the two match, otherwise they do not;

a second-level discipline extraction module, used to judge whether the condition factor of the expert information is a second-level discipline; if so, the condition factor is matched against the key factor of the project information stored as data set A by checking whether the condition factor in the expert information contains the key factor; if it does, the two match, otherwise they do not;

a third-level discipline extraction module, used to judge whether the condition factor of the expert information is a third-level discipline; if so, the condition factor is matched against the key factor of the project information stored as data set A by checking whether the condition factor in the expert information contains the key factor; if it does, the two match, otherwise they do not;

a random data generation module, used to randomly select the required number of experts matching the science and technology project from the stored data set B of expert discipline information, and to store the selected data in the review database;

The random data generation module specifically selects data according to random numbers between zero and a specified number returned by the Random function, until the specified number of experts has been selected.

The screening module comprises a first screening module, a second screening module and a third screening module. The first screening module is used to perform the first screening on the qualifying projects: online review experts are selected according to the project information, and the data synchronized to the review database is screened through online review; the screened declared project data is stored in the third cache area of the review database;

The second screening module is used to perform the second screening on the declared project data stored in the third cache area: senior experts are selected according to the project information and vote on the data retained by the first screening module; the preliminary award-winning projects are taken from the voting results, and their data is stored in the fourth cache area of the review database;

所述三次筛选模块用于对存储到第四缓存区中的申报项目数据进行数据第三次筛选；根据项目信息遴选科技委员会专家，对二次筛选模块中取出的数据进行专家审核，从审核结果中取出最终获奖项目，将最终获奖项目存储到评审数据库中的第五缓存区；将存储在评审数据库中的第五缓存区中的最终获奖项目的数据输出。The third screening module is used for the third screening of the declared project data stored in the fourth buffer area; according to the project information, the experts of the Science and Technology Committee are selected, and the data taken out of the secondary screening module is reviewed by experts. The final award-winning project is taken out, and the final award-winning project is stored in the fifth buffer area in the review database; the data of the final award-winning item stored in the fifth buffer area in the review database is output.

The directly related data comprises the basic project information. The indirectly related data comprises the project summary, main scientific and technological innovations, third-party evaluations, social and economic benefits, previous science and technology awards, completers, completing organizations, opinions of the recommending organization, intellectual property, papers and monographs, and attachments. The collected data comprises: basic project information, project summary, main scientific and technological innovations, third-party evaluations, social and economic benefits, previous science and technology awards, completers, completing organizations, opinions of the recommending organization, intellectual property, papers and monographs, and attachments. The basic project information includes: project name, project discipline, technical field, project source, and the national economic sector to which the project belongs. The intellectual property data includes: intellectual property number, intellectual property name, intellectual property holder, and date of acquisition. The completer data includes: the completer's name, ID card number, and position in the order of completers.

The data storage forms in the project collection module include data tables, pictures, Word documents and PDF documents. Data is stored in different formats for different purposes: data tables are convenient for queries and statistics; pictures and PDF documents ensure the authenticity of the data; Word documents preserve the original format of the data and are easy to view.

The primary, secondary and tertiary screening modules differ in how they evaluate the data: the primary screening module is used for scoring, the secondary screening module for voting on award grades, and the tertiary screening module for a final vote. The expert selection process, however, is the same in all three modules.

Beneficial effects of the present invention:

The advantage of the data judgment is that the system automatically extracts the duplicate-checking factors and performs the complex matching calculations, which reduces human factors, improves the fairness and correctness of the duplicate-checking results, lightens the workload of staff, and greatly improves work efficiency. Duplicate checking is an important link in the whole science and technology award system; the data processing is complex and the processing methods differ. Different data is checked for duplicates in different ways in order to avoid false positives and missed duplicates. A completer's name and ID card number, and an intellectual property number, are complete identifiers that cannot be recombined. Project names and the titles of papers and monographs can be split apart and recombined, so segmenting them into words and calculating a similarity finds duplicate projects more accurately.

The advantages of screening data through online review are lower working costs and a screening process with less interference from other factors, which makes the screening more objective. The advantage of the expert selection is that experts are chosen at random and according to the association between the project discipline and the expert's discipline, so that the selection is fair and impartial, the experts are better suited to the project, and the quality of the review results is higher.

Description of the drawings

Fig. 1 is a schematic diagram of the main flow of the present invention;

Fig. 2 is a schematic diagram of the data judgment flow of the present invention;

Fig. 3 is a schematic diagram of the keyword comparison flow of the data judgment of the present invention;

Fig. 4 is a schematic diagram of the direct matching flow of the data judgment of the present invention;

Fig. 5 is a schematic diagram of the expert selection flow of the present invention.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and embodiments.

As shown in Fig. 1, a data processing method with an intelligent detection function comprises the following steps:

Step 1: system check. The website is opened in a browser, and a JS script automatically detects the browser version used by the current user; for browsers outside the IE family, a prompt is given and the page is closed; if the browser in use meets the requirements, the user enters the system for data collection;

Step 2: collect the data of the declared projects. The remote data collection terminal captures each completer's handwriting through a handwriting tablet and stores it in the handwriting feature library. The collected data directly related to a project is stored in the master data table of the project declaration database, each record corresponding to a unique project number; the data indirectly related to the project is stored in the sub-data tables of the project declaration database, in which the project number is set as a foreign key associated with the master data table. The collected pictures, Word documents and PDF documents are stored as files in the server cache area, and the relative path of each document is stored in the attachment sub-data table of the project declaration database;

Step 3: read the data in the master data table and the sub-data tables of the project declaration database and judge whether the data meets the requirements. For picture documents, the document path in the attachment sub-data table is used to retrieve the picture document from the server cache area, and the image recognition module reads its content: the recognized completer handwriting is compared against the handwriting feature library, and the seal of the completing organization is checked against the organization name in the completing-organization data table. If the handwriting information in the handwriting feature library, the declared-project information in the project declaration data table, and the picture documents all meet the requirements, proceed to Step 4; otherwise, return to Step 2;

Step 4: synchronize the project data judged in Step 3 to meet the requirements from the project declaration database to the review database; the synchronization is one-way;

Step 5: screen the projects that meet the requirements several times. For the first screening, online review experts are selected according to the project information, and the data synchronized to the review database is screened through online review; the declared-project data that passes is stored in the first cache area of the review database. For the second screening, which is applied to the declared-project data stored in the first cache area, senior experts are selected according to the project information and vote on the data selected in the first screening; the preliminary award-winning projects are taken from the voting results, and their data is stored in the second cache area of the review database. For the third screening, which is applied to the declared-project data stored in the second cache area, experts of the Science and Technology Committee are selected according to the project information and review the data selected in the second screening; the final award-winning projects are taken from the review results and stored in the third cache area of the review database;

Step 6: output the data of the final award-winning projects stored in the third cache area of the review database.
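The one-way synchronization of Step 4 can be illustrated with a minimal sketch. It assumes both databases are modeled as in-memory dictionaries keyed by project number; the function name `sync_to_review` and the record layout are hypothetical and are not taken from the patent.

```python
from typing import Dict, Iterable

Record = Dict[str, str]


def sync_to_review(declaration_db: Dict[str, Record],
                   review_db: Dict[str, Record],
                   qualified: Iterable[str]) -> int:
    """Copy qualified projects from the declaration store to the review store.

    Data flows in one direction only: nothing is ever written back to
    declaration_db.
    """
    copied = 0
    for project_no in qualified:
        # Copy the record so that later changes made in the review store
        # cannot reach the declaration store.
        review_db[project_no] = dict(declaration_db[project_no])
        copied += 1
    return copied
```

Copying each record, rather than sharing a reference, is what keeps the flow one-way in this sketch: edits made during review stay in the review store.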

In Step 2, the directly related data comprises the basic project information;

The indirectly related data comprises the project summary, main scientific and technological innovations, third-party evaluations, social and economic benefits, previous science and technology awards, completers, completing organizations, opinions of the recommending organization, intellectual property, papers and monographs, and attachments;

The collected data comprises: basic project information, project summary, main scientific and technological innovations, third-party evaluations, social and economic benefits, previous science and technology awards, completers, completing organizations, opinions of the recommending organization, intellectual property, papers and monographs, and attachments;

The basic project information includes: project name, project discipline, technical field, project source, and the national economic sector to which the project belongs;

The intellectual property data includes: intellectual property number, intellectual property name, intellectual property holder, and date of acquisition;

The completer data includes: the completer's name, ID card number, and position in the order of completers.

In Step 2, the data storage forms include data tables, pictures, Word documents and PDF documents. Data is stored in different formats for different purposes: data tables are convenient for queries and statistics; pictures and PDF documents ensure the authenticity of the data; Word documents preserve the original format of the data and are easy to view.
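The Step 2 storage layout (a master table keyed by project number, sub-tables linked to it by foreign key, and attachments stored as relative paths) can be sketched with an in-memory SQLite database. The patent does not give a schema, so every table and column name below (`project_main`, `completer`, `attachment`, `seq_no`, and so on) is a hypothetical choice made for illustration.

```python
import sqlite3


def create_declaration_db() -> sqlite3.Connection:
    """Create a minimal in-memory project declaration database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        -- Master table: data directly related to the project.
        CREATE TABLE project_main (
            project_no   TEXT PRIMARY KEY,
            project_name TEXT NOT NULL,
            discipline   TEXT
        );
        -- Sub-table: completers, linked to the master table by project number.
        CREATE TABLE completer (
            id         INTEGER PRIMARY KEY,
            project_no TEXT NOT NULL REFERENCES project_main(project_no),
            name       TEXT NOT NULL,
            id_number  TEXT NOT NULL,
            seq_no     INTEGER NOT NULL
        );
        -- Sub-table: attachments; only the relative file path is stored.
        CREATE TABLE attachment (
            id            INTEGER PRIMARY KEY,
            project_no    TEXT NOT NULL REFERENCES project_main(project_no),
            relative_path TEXT NOT NULL
        );
    """)
    return conn
```

With the foreign-key pragma enabled, an attachment row that refers to an unknown project number is rejected with `sqlite3.IntegrityError`, which is one way to guarantee that every sub-table record belongs to exactly one project.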

In Step 3, the data stored in Step 2 is compared by word segmentation or by direct matching, and a similarity is calculated; if the similarity is lower than the set value, the data is judged to meet the requirements; otherwise it is judged not to meet them. Checking for duplicates by similarity finds more duplicate projects and prevents recombined projects from being declared again. The segmented data comprises project names and titles of papers and monographs; the directly matched data comprises completers' names and ID card numbers, and intellectual property numbers.

The specific process of the first screening in Step 5 is: set the weights of the scoring indicators, select the experts, have the experts score the data synchronized in Step 4, sum the expert scores according to the weights to obtain each project's score, sort the projects by score from high to low, and take the set number of projects from the sorted result.
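The scoring procedure can be sketched as follows. The text says only that expert scores are summed according to the weights, so this sketch assumes that each expert gives one score per indicator and that the project score is the weighted sum over all experts and indicators. The indicator names and function names are hypothetical.

```python
from typing import Dict, List


def project_score(expert_scores: List[Dict[str, float]],
                  weights: Dict[str, float]) -> float:
    """Weighted sum of every expert's per-indicator scores for one project."""
    return sum(weights[indicator] * score
               for scores in expert_scores
               for indicator, score in scores.items())


def top_projects(all_scores: Dict[str, List[Dict[str, float]]],
                 weights: Dict[str, float],
                 limit: int) -> List[str]:
    """Rank projects by score, highest first, and keep the first `limit`."""
    totals = {project_no: project_score(scores, weights)
              for project_no, scores in all_scores.items()}
    ranked = sorted(totals, key=lambda project_no: totals[project_no],
                    reverse=True)
    return ranked[:limit]
```

If the number of experts differs between projects, dividing each total by the number of experts would make the scores comparable; the source does not say which variant is intended.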

In Step 5, the voting options are: first prize, second prize, third prize, and no award.

In Step 5, the review outcomes are: objection and no objection.

In Step 6, the data selected in Step 5 is output as a Word-format document, which allows users to adjust the data format themselves.

As shown in Fig. 2, in Step 3, reading the data in the master data table and the sub-data tables of the project declaration database and judging whether the data meets the requirements comprises the following steps:

Step (3-1): extract the factors from the master data table and the sub-data tables of the project declaration database; the factors include: the project name, the names and ID card numbers of the project's completers, the intellectual property numbers, and the titles of papers and monographs;

Step (3-2): judge whether the factor is a project name or a paper or monograph title; if so, go to Step (3-3); otherwise go to Step (3-4);

Step (3-3): compare the keywords of the project name or the paper or monograph title with those of another project; if they are the same, the project is judged to be a duplicate; otherwise it is not;

Step (3-4): directly match the completers' names and ID card numbers and the intellectual property numbers of the project against those of another project; if they are the same, the project is judged to be a duplicate; otherwise it is not;

Step (3-5): store the results; the comparison and matching results of Step (3-3) and Step (3-4) are stored in the project declaration database.

Duplicate checking is an important link in the whole science and technology award system; the data processing is complex and the processing methods differ. Different data is checked for duplicates in different ways in order to avoid false positives and missed duplicates. A completer's name and ID card number, and an intellectual property number, are complete identifiers that cannot be recombined. Project names and the titles of papers and monographs can be split apart and recombined, so segmenting them into words and calculating a similarity finds duplicate projects more accurately.

As shown in Fig. 3, the keyword comparison of Step (3-3) comprises the following steps:

Step (3-3-1): take the project name or the paper or monograph title of the current project and of another project from the project declaration database; the other project is one of the other projects of the current year or one of all the projects of the previous three years;

Step (3-3-2): extract keywords; the data taken out in Step (3-3-1) is segmented into words with the ShootSearch component, and the resulting keywords of each project are stored in separate arrays;

Step (3-3-3): match keywords; the two arrays obtained in Step (3-3-2) are traversed and compared in a loop to obtain the number of shared keywords and the number of keywords in each array;

Step (3-3-4): calculate the similarity; the keyword counts of the two arrays obtained in Step (3-3-3) are compared, the smaller count is taken, and the number of shared keywords is divided by the smaller count to obtain the similarity;

Step (3-3-5): judge whether the project is a duplicate; the similarity obtained in Step (3-3-4) is compared with the set value; if the similarity is not lower than the set value, the project is judged to be a duplicate; otherwise it is not;

Step (3-3-6): store the data; the data judged to be duplicates in Step (3-3-5) is stored in the project declaration database.
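Steps (3-3-2) to (3-3-5) can be sketched as below. The ShootSearch component is named in the text but its interface is not described, so the sketch substitutes a plain whitespace tokenizer as a stand-in; the function names and the use of sets (so that a repeated word is counted once) are assumptions of this sketch.

```python
from typing import Set


def segment(title: str) -> Set[str]:
    """Stand-in for the ShootSearch word segmenter: split on whitespace."""
    return set(title.lower().split())


def similarity(title_a: str, title_b: str) -> float:
    """Shared keyword count divided by the smaller of the two keyword counts."""
    words_a, words_b = segment(title_a), segment(title_b)
    if not words_a or not words_b:
        return 0.0
    shared = len(words_a & words_b)
    return shared / min(len(words_a), len(words_b))


def is_duplicate(title_a: str, title_b: str, threshold: float) -> bool:
    """A pair is a duplicate when the similarity is not lower than the set value."""
    return similarity(title_a, title_b) >= threshold
```

Dividing by the smaller keyword count means that a title whose keywords are all contained in a longer title scores 1.0, which is how a reworded or padded resubmission is caught.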

As shown in Fig. 4, the direct matching of Step (3-4) comprises the following steps:

Step (3-4-1): take the completers' names and ID card numbers, or the intellectual property numbers, of the current project and of another project from the project declaration database. When the factor is a completer's name or ID card number, the other project is one of the other projects of the current year; when the factor is an intellectual property number, the other project is one of the other projects of the current year or one of all the projects of the previous three years.

Step (3-4-2): direct matching; the data taken out in Step (3-4-1) is compared directly;

Step (3-4-3): judge whether the project is a duplicate; if the values compared in Step (3-4-2) are identical, the project is judged to be a duplicate; otherwise it is not;

Step (3-4-4): store the data; the data judged to be duplicates in Step (3-4-3) is stored in the project declaration database.
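The direct matching steps, including the different comparison scopes for the two kinds of factor, can be sketched as follows. The `Project` record, the factor labels `"completer"` and `"ip_number"`, and the rule that a single shared value is enough to flag a duplicate are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Project:
    project_no: str
    year: int
    completers: Set[Tuple[str, str]] = field(default_factory=set)  # (name, ID number)
    ip_numbers: Set[str] = field(default_factory=set)


def in_scope(current: Project, other: Project, factor: str) -> bool:
    """Completers: other projects of the current year only.
    IP numbers: current year plus the previous three years."""
    if other.project_no == current.project_no:
        return False
    if other.year == current.year:
        return True
    return factor == "ip_number" and current.year - 3 <= other.year < current.year


def direct_duplicates(current: Project, candidates: List[Project],
                      factor: str) -> List[str]:
    """Return the project numbers that share an identical value with `current`."""
    hits = []
    for other in candidates:
        if not in_scope(current, other, factor):
            continue
        if factor == "ip_number":
            same = current.ip_numbers & other.ip_numbers
        else:
            same = current.completers & other.completers
        if same:
            hits.append(other.project_no)
    return hits
```

Name and ID card number are compared as a pair, so two different people who happen to share a name are not flagged.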

As shown in Fig. 5, the expert selection process in Step 5 is as follows:

Step (I): take the key factor, namely the project's first discipline, from data set A (the set of project disciplines), and take factor 1 (the first discipline), factor 2 (the second discipline) and factor 3 (the third discipline) from data set B (the set of expert disciplines);

Step (II): select the condition factor; the condition factors include: the first discipline, the second discipline and the third discipline;

Step (III): judge whether the condition factor is factor 1, namely the first discipline; if so, go to Step (IV) with n = 1; otherwise go to Step (V);

Step (IV): match factor n against the key factor by checking whether factor n contains the key factor: if it does, the two match; otherwise they do not. If they match, go to Step (VI); otherwise return to Step (II);

Step (V): judge whether the condition factor is factor 2, namely the second discipline; if so, go to Step (IV) with n = 2; otherwise go to Step (IV) with n = 3;

Step (VI): select the matching data from data set B according to the matching result of Step (IV);

Step (VII): randomly select the specified number of records; data is selected according to random numbers, returned by the Random function, in the range from zero to the specified number, until the specified number of records has been selected;

Step (VIII): store the data; the data taken out in Step (VII) is stored in the review database.
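Steps (I) to (VII) can be sketched as a cascading discipline match followed by a random draw. The dictionary keys `"first"`, `"second"` and `"third"`, the function names, and the draw-and-remove loop (which avoids picking the same expert twice) are assumptions of this sketch; the text states only that a Random function is used until enough experts are selected.

```python
import random
from typing import Dict, List, Optional


def matches(expert: Dict[str, str], key_discipline: str) -> bool:
    """Try the expert's first, second and third discipline in turn."""
    if not key_discipline:
        return False
    for level in ("first", "second", "third"):
        if key_discipline in expert.get(level, ""):
            return True
    return False


def select_experts(experts: List[Dict[str, str]], key_discipline: str,
                   count: int,
                   rng: Optional[random.Random] = None) -> List[Dict[str, str]]:
    """Randomly pick up to `count` experts whose disciplines match the project."""
    if rng is None:
        rng = random.Random()
    pool = [expert for expert in experts if matches(expert, key_discipline)]
    chosen: List[Dict[str, str]] = []
    while pool and len(chosen) < count:
        index = rng.randrange(len(pool))  # random index from zero up to the pool size
        chosen.append(pool.pop(index))    # remove so no expert is picked twice
    return chosen
```

When fewer experts match than are requested, the sketch returns all of them rather than looping indefinitely; how the original system handles that case is not stated in the text.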

A data processing system with an intelligent detection function, comprising:

A system check module, used to automatically detect, through a JS script, the browser version used by the current user; for browsers outside the IE family, a prompt is given and the page is closed; if the browser in use meets the requirements, the user enters the system for data collection;

The screening module specifically comprises:

The advantage of the data judgment of the present invention is that the system automatically extracts the duplicate-checking factors and performs the complex matching calculations, which reduces human factors, improves the fairness and correctness of the duplicate-checking results, lightens the workload of staff, and greatly improves work efficiency. Duplicate checking is an important link in the whole science and technology award system; the data processing is complex and the processing methods differ. Different data is checked for duplicates in different ways in order to avoid false positives and missed duplicates. A completer's name and ID card number, and an intellectual property number, are complete identifiers that cannot be recombined. Project names and the titles of papers and monographs can be split apart and recombined, so segmenting them into words and calculating a similarity finds duplicate projects more accurately.

Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the present invention without creative effort still fall within the scope of protection of the present invention.

Claims

1. A data processing method with an intelligent detection function, characterized by comprising the following steps:

Step 1: automatically detect, through a JS script, the browser version used by the current user; for browsers outside the IE family, give a prompt and close the page; if the browser in use meets the requirements, enter the system for data collection;

Step 2: the remote data collection terminal captures the completers' handwriting through a handwriting tablet and stores it in the handwriting feature library; the collected data of the declared projects is stored in the cache area of the network server; the host computer retrieves the declared-project data from the cache area of the network server and stores it in the first cache area of the host computer; the collected pictures, Word documents and PDF documents are stored as files in the second cache area of the host computer, and the relative path of each document is stored in the attachment sub-data table of the project declaration database;

Step 3: the host computer reads the declared-project information in the first cache area and judges whether the data meets the requirements; for picture documents, the document path in the attachment sub-data table of the project declaration database is used to retrieve the picture document from the second cache area of the host computer, and the image recognition module reads its content: the recognized completer handwriting is compared against the handwriting feature library, and the seal of the completing organization is checked against the organization name in the completing-organization data table; if the handwriting information in the handwriting feature library, the declared-project information in the project declaration data table, and the picture documents all meet the requirements, proceed to Step 4; otherwise, return to Step 2;

Step 4: synchronize the project data that meets the requirements in Step 3 from the project declaration database to the review database, the synchronization being one-way; screen the declaration information in the review database three times, and output the final data on a browser page of the server;

Step 2 specifically comprises: the data directly related to a project is stored in the master data table of the project declaration database, each record corresponding to a unique project number; the data indirectly related to the project is stored in the sub-data tables of the project declaration database, in which the unique project number is set as a foreign key associated with the master data table; the sub-data tables of the declaration database also store historical project declaration information.

2. The data processing method with an intelligent detection function according to claim 1, characterized in that, in Step 2, the data stored in the sub-data tables of the project declaration database is compared by word segmentation or by direct matching: a judgment module judges whether the information of the currently declared project is a project name or a paper or monograph title; if so, the keyword comparison module is entered; if not, it is further judged whether the information is a completer's name and ID card number or an intellectual property number, and if so, the direct matching module is entered;

The keyword comparison module compares the keywords of the project name or the paper or monograph title of the currently declared project with those of the project name or the paper or monograph title of another declared project in the historical project declaration information; if the similarity is not lower than the set value, the project is judged to be a duplicate; otherwise it is not;

The direct matching module directly matches the completer's name and ID card number or the intellectual property number of the currently declared project against the completer's name and ID card number or the intellectual property number of another declared project in the historical project declaration information; if they are identical, the project is judged to be a duplicate; otherwise it is not;

A storage module stores the information of the declared projects judged not to be duplicates by the keyword comparison module and the direct matching module in the cache area of the host computer, and stores the information of the declared projects judged to be duplicates in the duplicate-check table of the project declaration database on the host computer.

3. The data processing method with an intelligent detection function according to claim 2, characterized in that the keyword comparison module compares the keywords of the project name or the paper or monograph title of the currently declared project with those of the project name or the paper or monograph title of another declared project in the historical project declaration information, and, if the similarity is not lower than the set value, judges the project to be a duplicate, and otherwise not; the specific process is:

A keyword extraction module takes out the information of the currently declared project and the project information in the historical project declaration information, segments the extracted information into words, and stores the resulting keywords in two arrays corresponding to the project declaration database on the host computer;

A keyword matching module traverses the keywords in the two arrays and compares them in a loop, obtaining the number of shared keywords and the number of keywords in each array;

A similarity module obtains the similarity between the information of the currently declared project and the project information in the historical project declaration information and compares it with the set value; if the similarity is not lower than the set value, the project is judged to be a duplicate; otherwise it is not;

When the keyword extraction module takes out the information of the currently declared project and the project information in the historical project declaration information and segments it into words, the ShootSearch component is used for the word segmentation;

The similarity module specifically compares the numbers of keywords in the two arrays, takes the smaller number, and divides the number of shared keywords by the smaller number to obtain the similarity.

4. The data processing method with an intelligent detection function according to claim 1, characterized in that, in Step 3, screening the declaration information in the review database three times specifically comprises:

Performing a first screening of the projects in the review database that meet the requirements: online review experts are selected according to the project information, and the data synchronized to the review database is screened through online review; the declared-project data that passes is stored in the third cache area of the review database;

Performing a second screening of the declared-project data stored in the third cache area: senior experts are selected according to the project information and vote on the data selected in the first screening; the preliminary award-winning projects are taken from the voting results, and their data is stored in the fourth cache area of the review database;

Performing a third screening of the declared-project data stored in the fourth cache area: experts of the Science and Technology Committee are selected according to the project information and review the data selected in the second screening; the final award-winning projects are taken from the review results and stored in the fifth cache area of the review database.

5. a kind of data processing method with intelligent detecting function as claimed in claim 1, is characterized in that, in described three screenings, expert's the process of selecting is specially:

S1: science and technology item discipline information is stored in the form of data set A in the subdatasheet in review data storehouse, also stores the expert's discipline information with the form storage of data set B in the subdatasheet in review data storehouse;

S2: choose expert's discipline information and as condition element in data set B, whether the condition element that judges this expert's discipline information is one-level subject, if, this condition element is mated with the key factor of the science and technology item discipline information of the form storage with data set A, whether the condition element in traversal expert discipline information comprises key factor, if comprised, and coupling, and enter step S5, otherwise do not mate, enter step S3;

S3: whether the condition element that judges this expert info is secondary subject, if, this condition element is mated with the key factor of the science and technology item information of the form storage with data set A, whether the condition element in traversal expert info comprises key factor, if comprised, coupling, enter step S5, otherwise do not mate, enter step S4;

S4: whether the condition element that judges this expert info is three grades of subjects, if, this condition element is mated with the key factor of the science and technology item information of the form storage with data set A, whether the condition element in traversal expert info comprises key factor, if comprised, coupling, enter S5, otherwise do not mate;

S5: the number of experts matching the science and technology project is chosen at random from the stored data set B of expert discipline information, and the selected data is stored in the review database.
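The selection steps S1 to S5 above can be sketched in Python as follows. This is only an illustrative reading of the claim, not the patented implementation: the `Expert` record, its `disciplines` layout, and the function names are hypothetical, and "contains the key factor" is assumed here to mean plain substring containment.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Expert:
    name: str
    # Discipline names keyed by level: 1 = first-level, 2 = second-level,
    # 3 = third-level. This layout is an assumption made for illustration.
    disciplines: dict = field(default_factory=dict)


def expert_matches(expert, key_factor):
    """Steps S2 to S4: test the first-, second-, then third-level discipline."""
    for level in (1, 2, 3):
        condition_element = expert.disciplines.get(level)
        # Assumed matching rule: the condition element contains the key factor.
        if condition_element is not None and key_factor in condition_element:
            return True
    return False


def select_experts(data_set_b, key_factor, count, rng=None):
    """Step S5: draw up to `count` matching experts at random."""
    rng = rng or random.Random()
    matched = [e for e in data_set_b if expert_matches(e, key_factor)]
    # Never ask for more experts than actually matched.
    return rng.sample(matched, min(count, len(matched)))
```

Passing a seeded `random.Random` instance as `rng` makes the draw reproducible, which may be useful when a selection has to be audited afterwards.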

6. A data processing system with an intelligent detection function as claimed in claim 1, characterized in that it comprises:

a system check module, for automatically detecting, by means of a JS script, the browser version used by the current user, and for prompting the user and closing when a non-IE series browser is detected; when a browser that meets the requirements is detected, the system is entered to carry out data acquisition;

a remote data acquisition terminal, for storing the collected data information of the declared projects in the cache area of the web server; the handwriting of the project completers is collected by the remote data acquisition terminal through a handwriting pad and stored in the handwriting feature library;

a host computer, for retrieving the data information of the declared projects from the cache area of the web server and storing it in the first cache area of the host computer; the collected picture, Word and PDF document data is stored as files in the second cache area of the host computer, and the relative paths of the documents are stored in the attachment sub-data table of the project application database;

a duplicate-checking judgment module, for reading the information of the declared projects from the first cache area of the host computer and judging whether the data meets the requirements; for picture documents, the document path in the attachment sub-data table of the project application database is used to retrieve the picture document from the second cache area of the host computer, a picture recognition module reads the content of the picture document, and it is checked whether the recognized completer handwriting conforms to the handwriting information in the handwriting feature library and whether the completing-unit seal conforms to the completing unit in the project application data table; if the information of the declared project and the picture documents all meet the requirements, the screening module is entered; if not, the data is collected again;

an output module, for synchronizing the project data that meets the requirements from the project application database to the review database; the declared information in the review database is screened three times, and the final data is output on a browser page of the server;

the data processing system further comprises a data allocation module, specifically for storing the data directly related to a project in the main data table of the project application database and storing the data indirectly related to the project in the sub-data tables of the project application database, the sub-data tables being associated with one another through the project primary key; the sub-data tables of the application database also store the historical project declaration information.
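The main-table and sub-table layout handled by the data allocation module can be pictured with a small `sqlite3` sketch. The table and column names below are hypothetical and chosen only for illustration; the claim does not specify a schema or a database engine.

```python
import sqlite3


def build_application_db():
    """Create an in-memory stand-in for the project application database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Main data table: data directly related to the project.
        CREATE TABLE project_main (
            project_id   INTEGER PRIMARY KEY,
            project_name TEXT NOT NULL
        );
        -- Sub-data tables: indirectly related data, keyed by project_id.
        CREATE TABLE project_completer (
            project_id     INTEGER NOT NULL REFERENCES project_main(project_id),
            completer_name TEXT NOT NULL,
            id_card_number TEXT NOT NULL
        );
        CREATE TABLE project_attachment (
            project_id    INTEGER NOT NULL REFERENCES project_main(project_id),
            relative_path TEXT NOT NULL
        );
        """
    )
    return conn


def completers_with_attachments(conn, project_id):
    """Join two sub-data tables through the shared project primary key."""
    rows = conn.execute(
        """
        SELECT c.completer_name, a.relative_path
        FROM project_completer AS c
        JOIN project_attachment AS a ON a.project_id = c.project_id
        WHERE c.project_id = ?
        ORDER BY c.completer_name, a.relative_path
        """,
        (project_id,),
    )
    return rows.fetchall()
```

Keeping only the relative path of each attachment in the sub-data table mirrors the host computer clause above, where the files themselves live in the second cache area.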

7. The data processing system with an intelligent detection function as claimed in claim 6, characterized in that the duplicate-checking judgment module further comprises a selection module; the selection module is for applying word segmentation or direct matching to the data of the sub-data tables of the project application database read by the host computer from the first cache area, by judging whether the information of the currently declared project is a project name or a paper or treatise title; if so, the keyword comparison module is entered; if not, it is further determined whether the information is a project completer's name and ID card number or an intellectual property number; if so, the direct matching module is entered;

a keyword comparison module, for comparing the keywords of the project name or paper or treatise title of the currently declared project with those of the project name or paper or treatise title of another declared project in the historical project declaration information; if the similarity is not less than a set value, the project is judged to be a duplicate; otherwise it is not a duplicate;

a direct matching module, for directly matching the completer's name and ID card number, or the intellectual property number, of the currently declared project against the completer's name and ID card number, or the intellectual property number, of another declared project in the historical project declaration information; if they are identical, the project is judged to be a duplicate; otherwise it is not a duplicate;

a storage module, for storing the information of declared projects judged not to be duplicates by the keyword comparison module and the direct matching module in the cache area of the host computer, and for storing the information of declared projects judged to be duplicates in the duplicate-checking table of the project application database in the host computer.
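The routing performed by the selection, direct matching and storage modules of this claim can be sketched as follows. The field labels, the `(field, value)` record shape, and the function names are assumptions made for illustration, and the keyword comparison is passed in as a callable because the claim leaves its internals to the keyword comparison module.

```python
# Hypothetical field labels; the claim only names the kinds of information.
KEYWORD_FIELDS = {"project_name", "paper_title"}
DIRECT_FIELDS = {"completer_identity", "ip_number"}


def is_duplicate(field, current, historical, keyword_compare):
    """Selection module: pick keyword comparison or direct matching."""
    if field in KEYWORD_FIELDS:
        return keyword_compare(current, historical)
    if field in DIRECT_FIELDS:
        # Direct matching module: identical values count as a duplicate.
        return current == historical
    return False


def route_records(current_records, history_records, keyword_compare):
    """Storage module: split records into the cache and the duplicate table."""
    cache_area, duplicate_table = [], []
    for field, value in current_records:
        duplicated = any(
            is_duplicate(field, value, old_value, keyword_compare)
            for old_field, old_value in history_records
            if old_field == field
        )
        (duplicate_table if duplicated else cache_area).append((field, value))
    return cache_area, duplicate_table
```

A completer's identity is assumed here to be a `(name, id_card_number)` tuple, so that both parts must agree for a direct match.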

8. The data processing system with an intelligent detection function as claimed in claim 7, characterized in that the keyword comparison module specifically comprises:

a keyword extraction module, for taking the information of the currently declared project and one item of the historical project declaration information, performing word segmentation on the information taken, and storing the resulting keywords respectively in two arrays corresponding to the project application database in the host computer;

a similarity module, for deriving the similarity between the information of the currently declared project and the project information in the historical project declaration information, and comparing this similarity with the set value; if the similarity is not less than the set value, the project is judged to be a duplicate; otherwise it is not a duplicate.
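A minimal sketch of the keyword extraction and similarity modules follows. This claim does not fix a similarity measure, so the Jaccard ratio is swapped in here purely as an assumption, and whitespace splitting stands in for a real word segmenter (Chinese titles would need a dedicated tokenizer). The default threshold of 0.8 is likewise a placeholder for the "set value".

```python
def extract_keywords(text):
    """Stand-in word segmentation: lower-case and split on whitespace."""
    return text.lower().split()


def similarity(keywords_a, keywords_b):
    """Jaccard similarity of the two keyword arrays (an assumed measure)."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def is_duplicate_title(current, historical, threshold=0.8):
    """Judge a duplicate when the similarity is not less than the set value."""
    score = similarity(extract_keywords(current), extract_keywords(historical))
    return score >= threshold
```

The comparison uses `>=` so that a similarity exactly equal to the set value is still judged a duplicate, as the claim wording requires.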

9. The data processing system with an intelligent detection function as claimed in claim 6, characterized in that the screening module comprises a first screening module, a second screening module and a third screening module; the first screening module is for screening the projects that meet the requirements for the first time: the corresponding online review experts are selected according to the project information, and the data synchronized to the review database is screened through online review; the declared project data that passes the screening is stored in the third cache area of the review database;

the second screening module is for screening a second time the declared project data stored in the third cache area: senior experts are selected according to the project information, the data taken out by the first screening module is put to an expert vote, the preliminary award-winning projects are taken from the voting results, and the data of the preliminary award-winning projects is stored in the fourth cache area of the review database;

the third screening module is for screening a third time the declared project data stored in the fourth cache area: experts of the Technology Committee are selected according to the project information, the data taken out by the second screening module undergoes expert review, the final award-winning projects are taken from the review results, and the final award-winning projects are stored in the fifth cache area of the review database; the data of the final award-winning projects stored in the fifth cache area of the review database is then output.
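The data flow through the three screening modules can be sketched as a chain of filters, each writing its survivors to the next cache area. The decisions themselves are made by human reviewers in the claim, so the three predicate arguments below are hypothetical stand-ins for recorded review outcomes, and the dictionary keyed by cache-area number is only an illustrative structure.

```python
def run_screening(projects, online_review, expert_vote, committee_review):
    """Pass projects through the three screenings and record each cache area."""
    cache_areas = {}
    # First screening: online review experts filter the synchronized data.
    cache_areas[3] = [p for p in projects if online_review(p)]
    # Second screening: senior experts vote on what reached the third cache.
    cache_areas[4] = [p for p in cache_areas[3] if expert_vote(p)]
    # Third screening: the Technology Committee reviews the fourth cache.
    cache_areas[5] = [p for p in cache_areas[4] if committee_review(p)]
    return cache_areas
```

The list under key `5` corresponds to the final award-winning projects that the output step reads from the fifth cache area.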

10. The data processing system with an intelligent detection function as claimed in claim 6, characterized in that the screening module specifically comprises:

a screening storage module, for retrieving the discipline information of the science and technology projects and storing it as data set A in a sub-data table of the review database, the sub-data table of the review database also storing the discipline information of the experts as data set B;

a first-level discipline extraction module, for choosing an expert's discipline information from data set B as the condition element and judging whether this condition element is a first-level discipline; if so, the condition element is matched against the key factor of the project discipline information stored in data set A, by traversing the expert discipline information and checking whether the condition element contains the key factor; if it does, the two match; otherwise they do not match;

a second-level discipline extraction module, for judging whether the condition element of this expert information is a second-level discipline; if so, the condition element is matched against the key factor of the project information stored in data set A, by traversing the expert information and checking whether the condition element contains the key factor; if it does, the two match; otherwise they do not match;

a third-level discipline extraction module, for judging whether the condition element of this expert information is a third-level discipline; if so, the condition element is matched against the key factor of the project information stored in data set A, by traversing the expert information and checking whether the condition element contains the key factor; if it does, the two match; otherwise they do not match;

a random data generation module, for choosing at random, from the stored data set B of expert discipline information, the number of experts matching the science and technology project, and storing the selected data in the review database.