CN119557419A - A file management method and system - Google Patents
A file management method and system
- Publication number
- CN119557419A, CN202411625274.XA
- Authority
- CN
- China
- Prior art keywords
- archive
- file
- score
- importance
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the technical field of archive management and discloses an archive management method and system. The method comprises: obtaining archive basic data, wherein the archive basic data comprises the archive formation time, the archive content and the confidentiality period of the archive; performing a classification score calculation on the archive basic data to obtain an archive classification score; obtaining an archive category based on the archive classification score and a preset classification score standard; calculating importance rating factors according to the archive basic data and the archive category; performing an importance score calculation based on the importance rating factors to obtain an archive importance score; and selecting a storage period for archiving according to the archive importance score and a preset storage period standard. The method can improve the storage efficiency and retrieval speed of archives.
Description
Technical Field
The present invention relates to the field of archive management technologies, and in particular, to a method and a system for archive management.
Background
In the current age of information explosion, with rapid social and economic development and the widespread adoption of information technology, the documents generated by institutions and individuals are richer and more diverse than ever. These materials include not only traditional paper documents but also large numbers of electronic documents, audio and video materials, and other media. This diversification has caused the number of archives to grow rapidly and the demand for storage space to increase greatly. Traditional physical storage, such as filing cabinets and archive boxes, struggles to cope with such volumes of data and cannot meet the requirements of efficient management and long-term preservation.
In the prior art, archive management has begun to move to digital storage, with massive volumes of archive material managed and stored through electronic archive management systems. Institutions digitize large numbers of paper archives by scanning, photographing and similar means, and store them on hard disks, cloud servers or other network platforms. This saves physical space, reduces reliance on physical storage facilities, and markedly improves retrieval efficiency. Users can quickly locate the archive information they need through keyword search, category browsing and other means, which shortens search time and improves working efficiency. Digital storage also benefits the long-term preservation and security of archives, reduces the risk of information loss caused by physical damage or loss, and helps ensure the integrity and availability of archive materials.
Nevertheless, current archive management approaches still have a number of shortcomings. Existing methods focus mainly on how archives are stored and do not adopt differentiated management and storage strategies for different types of archives. For example, most archives are simply classified and stored by creation time and category, while factors such as security level and integrity are ignored, so the actual significance of different archives cannot be accurately assessed, which in turn affects how archive storage periods are set. In addition, because the management means are limited, archive management as a whole is disorganized and information retrieval is inefficient.
Disclosure of Invention
The invention provides a file management method and system, which are used for improving file management efficiency.
In order to solve the above technical problems, the present invention provides a method for managing files, including:
acquiring archive basic data, wherein the archive basic data comprises the archive formation time, the archive content and the confidentiality period of the archive;
performing a classification score calculation according to the archive basic data to obtain an archive classification score;
obtaining an archive category based on the archive classification score and a preset classification score standard;
calculating importance rating factors according to the archive basic data and the archive category;
performing an importance score calculation based on the importance rating factors to obtain an archive importance score;
and selecting a storage period for archiving according to the archive importance score and a preset storage period standard.
In an optional implementation manner, the calculating the classification score according to the archive basic data to obtain an archive classification score includes:
the archive formation time score is calculated by the following formula:
$$A = 100 - Y_{\mathrm{now}} + Y_{\mathrm{start}}$$
wherein $A$ is the archive formation time score, $Y_{\mathrm{now}}$ is the current year, and $Y_{\mathrm{start}}$ is the archive formation year;
the archive classification score is calculated by the following formula:
$$W = A + B + C$$
wherein $W$ is the archive classification score, $B$ is the archive content score, and $C$ is the archive confidentiality score;
wherein the archive confidentiality score is calculated based on the security level of the archive.
In an alternative embodiment, the calculation of the profile content score includes:
extracting keywords from the archive content to obtain a preset number of archive keywords;
looking up the keyword score corresponding to each archive keyword among the keyword scores preset in a keyword database;
and summing the keyword scores to obtain the archive content score.
In an alternative embodiment, the obtaining the archive category based on the archive classification score and a preset classification score criterion includes:
the archive categories comprise general archives, confidential archives and top-secret archives;
and when the archive classification score exceeds a preset top-secret archive score threshold, the archive category is top-secret archive.
In an alternative embodiment, the calculating the importance rating factor according to the archive basic data and the archive category includes:
the importance rating factors comprise a category factor, an integrity factor and a confidentiality factor;
the category factor is calculated by the following formula:
wherein $\alpha$ is the category factor, $m$ is the number of categories of the archive data, $A_i$ is the importance weight corresponding to the $i$-th category, and $C_i$ is the classification factor of the $i$-th category;
the confidentiality factor is calculated by the following formula:
wherein $S$ is the confidentiality factor, $L$ is the security level of the current archive, and $M$ is the preset maximum security level;
the integrity factor is calculated by the following formula:
wherein $\eta$ is the integrity factor, $T$ is the integrity of the archive data, and $\mu$ is the quantity of archive data.
In an optional implementation manner, the calculating the importance score based on the importance rating factor to obtain the file importance score includes:
the archive importance score is calculated by the following formula:
$$I = \alpha \times (S \times \beta - \eta) / t$$
wherein $I$ is the archive importance score, $\alpha$ is the category factor, $\eta$ is the integrity factor, $S$ is the confidentiality factor, $t$ is the normalization factor, and $\beta$ is the importance parameter.
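As a minimal sketch of the importance score formula above, the following Python function evaluates I = α × (S × β - η) / t for given factor values. The function name and the sample numbers are illustrative assumptions rather than values from the invention, and the sketch does not cover how α, S and η are themselves obtained.

```python
def importance_score(alpha, s, eta, beta, t):
    """Evaluate I = alpha * (S * beta - eta) / t.

    alpha: category factor, s: confidentiality factor, eta: integrity factor,
    beta: importance parameter, t: normalization factor.
    """
    if t == 0:
        # The formula divides by t, so a zero normalization factor is rejected.
        raise ValueError("normalization factor t must be non-zero")
    return alpha * (s * beta - eta) / t


# Hypothetical factor values, chosen only to exercise the formula.
example = importance_score(alpha=2.0, s=0.8, eta=0.1, beta=10.0, t=1.0)
```

With these hypothetical inputs the bracketed term is 0.8 × 10 - 0.1 = 7.9, so the score is approximately 15.8.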
In an alternative embodiment, the selecting the storage period for archiving according to the file importance score and a preset storage period standard includes:
the storage period standard classifies the storage period as permanent, long-term or short-term;
wherein the storage period is permanent when the archive importance score is greater than a first score threshold.
In a second aspect, the present invention provides a system for managing files, including:
an archive acquisition module, configured to acquire archive basic data, wherein the archive basic data comprises the archive formation time, the archive content and the confidentiality period of the archive;
a classification scoring module, configured to perform a classification score calculation according to the archive basic data to obtain an archive classification score;
an archive classification module, configured to obtain an archive category based on the archive classification score and a preset classification score standard;
a factor calculation module, configured to calculate importance rating factors according to the archive basic data and the archive category;
a score calculation module, configured to perform an importance score calculation based on the importance rating factors to obtain an archive importance score;
and an archive filing module, configured to select a storage period for archiving according to the archive importance score and a preset storage period standard.
In a third aspect, the present invention further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements a method for managing an archive according to any one of the above methods when the processor executes the computer program.
In a fourth aspect, the present invention further provides a computer readable storage medium, where the computer readable storage medium includes a stored computer program, where when the computer program runs, the computer readable storage medium is controlled to execute a method for managing an archive according to any one of the foregoing methods.
Compared with the prior art, the archive management method and system of the invention obtain archive basic data comprising the archive formation time, the archive content and the confidentiality period of the archive; calculate an archive classification score from the archive basic data using a classification scoring formula; obtain an archive category based on the archive classification score and a preset classification score standard; calculate importance rating factors according to the archive basic data and the archive category; calculate an archive importance score from the importance rating factors using an importance scoring formula; and archive according to the archive importance score and a preset storage period standard. The invention thus provides a management system that classifies, scores and archives by a systematic method, improving the efficiency and accuracy of archive management. Basic data such as formation time, content and confidentiality period are processed by an automated workflow, the classification score is calculated with the classification scoring formula, and the category is then determined according to the scoring standard, so that the significance and importance of each archive are evaluated objectively. Specifically, the archive importance score is calculated by a formula that jointly considers the category, degree of confidentiality and integrity of the archive together with a normalization factor and an importance parameter. The method also strengthens information security: stricter security measures are applied to archives with a high security level, which helps prevent sensitive information from being leaked.
In addition, through systematic classification and importance evaluation, the method can better meet the needs of different users and provide personalized information services, and users can set the weights to adapt it to different usage scenarios. In summary, the calculation method improves the efficiency and security of archive management.
A storage period for archiving is selected according to the archive importance score and a preset storage period standard. Dividing archives into permanent, long-term and short-term storage optimizes resource allocation: archives of extremely high importance and sensitivity (such as national laws and records of important historical events) are preserved permanently for future reference and research, archives of moderate importance are kept for an appropriate length of time, and archives of lower importance are destroyed promptly after short-term storage to release storage space.
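The selection of a storage period can be sketched as a simple threshold lookup. The text only states that a score above the first score threshold gives permanent storage; the second threshold that separates long-term from short-term storage, the function name and the sample threshold values below are assumptions made for illustration.

```python
def storage_period(importance_score, first_threshold, second_threshold):
    """Map an archive importance score to a storage period.

    first_threshold comes from the text (score above it -> permanent).
    second_threshold is a hypothetical cut-off between long-term and short-term.
    """
    if second_threshold > first_threshold:
        raise ValueError("second_threshold must not exceed first_threshold")
    if importance_score > first_threshold:
        return "permanent"
    if importance_score > second_threshold:
        return "long-term"
    return "short-term"


# Hypothetical thresholds chosen only for the example.
periods = [storage_period(score, first_threshold=80, second_threshold=50)
           for score in (95, 65, 20)]
```

In practice the thresholds would come from the preset storage period standard of the deploying organization.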
In summary, the method provides a specific calculation procedure that allocates storage resources according to the importance and storage period of each archive, protects information security, and establishes an information index system that makes archives easier to retrieve and use, thereby improving both the storage efficiency and the retrieval efficiency of archives.
Drawings
FIG. 1 is a flowchart of a method for managing files according to a first embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a file management system according to a second embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to solve the above problems, referring to fig. 1, a first embodiment of the present invention provides a method for managing files, including the following steps:
S11, acquiring archive basic data, wherein the archive basic data comprises the archive formation time, the archive content and the confidentiality period of the archive;
S12, performing a classification score calculation according to the archive basic data to obtain an archive classification score;
S13, obtaining an archive category based on the archive classification score and a preset classification score standard;
S14, calculating importance rating factors according to the archive basic data and the archive category;
S15, performing an importance score calculation based on the importance rating factors to obtain an archive importance score;
S16, selecting a storage period for archiving according to the archive importance score and a preset storage period standard.
In step S11, archive basic data comprising the archive formation time, the archive content and the confidentiality period of the archive is acquired.
In one embodiment, the basic data of an electronic archive, including its formation time, content and confidentiality period, is obtained by metadata extraction, which reads information such as the creation date and modification date directly from common electronic document formats. For example, for a document in PDF format, the metadata can be extracted with Python libraries such as PyPDF or pdfminer, which provide APIs for reading the title, author, creation date, modification date and similar fields. Likewise, for Microsoft Office documents (e.g., DOCX, XLSX), the metadata can be read with a Java library such as Apache POI or with the python-docx library for Python. These tools support reading the basic information of a document and can also help identify its version history and revision records, which improves the accuracy and completeness of the archive data. For electronic archives stored in a database, the required information can be obtained directly, for example by SQL queries. The archive management system also provides an API that third-party applications can call to obtain archive data automatically. Where full automation is not possible, manual input or verification may be used.
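As one hedged illustration of metadata extraction, the sketch below reads the creation date from a DOCX or XLSX file using only the Python standard library rather than the libraries named above. It relies on the fact that these files are ZIP packages that normally carry a `docProps/core.xml` part; the function names are hypothetical, and a package without that part would raise a `KeyError`.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(source):
    """Return selected core properties from a DOCX/XLSX path or file object."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    wanted = {
        "title": "dc:title",
        "creator": "dc:creator",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    properties = {}
    for name, path in wanted.items():
        node = root.find(path, NS)
        # A missing element is reported as None instead of raising.
        properties[name] = node.text if node is not None else None
    return properties


def formation_year(properties):
    """Take the archive formation year from the ISO-8601 'created' timestamp."""
    created = properties.get("created")
    return int(created[:4]) if created else None
```

A call such as `formation_year(read_core_properties("report.docx"))` would then give the year to use as the archive formation year, where `report.docx` is a placeholder path.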
In another embodiment, optical character recognition (OCR) is used to convert scanned documents or other non-text formats into editable and searchable text. OCR automatically recognizes the text content of a document through image processing and pattern recognition and converts it into a computer-readable text format. For example, the Tesseract OCR engine recognizes text in multiple languages, runs on multiple operating systems, and can be trained to improve recognition of a particular font or scene. Commercial software such as Adobe Acrobat Pro DC also integrates OCR functions and can process complex documents, such as those containing charts, tables and mixed fonts. With OCR, the method can extract key content from documents in non-text formats as a data source for subsequent classification, archiving and retrieval.
It is worth noting that the archive basic data comprises the archive formation time, the archive content and the confidentiality period of the archive. For example, scientific papers and policy documents are two typical archive types. A scientific paper generally contains research objectives, research methods, experimental data, analysis results, conclusions and references. A research paper on the effects of climate change, for instance, will detail the selection of the study area, the climate model used, the type of data collected, the method of data analysis, the main trends found and their significance for future climate predictions. Policy documents record the policies and regulations, implementation guidelines and related legal provisions issued by a government or organization. A new environmental policy document, for instance, includes the background to the policy, its scope of application, its main measures, the implementing bodies, the supervision mechanisms and the penalties for violations.
In step S12, a classification score calculation is performed according to the archive basic data to obtain an archive classification score.
In one embodiment, the archive formation time score is calculated by the following formula:
$$A = 100 - Y_{\mathrm{now}} + Y_{\mathrm{start}}$$
wherein $A$ is the archive formation time score, $Y_{\mathrm{now}}$ is the current year, and $Y_{\mathrm{start}}$ is the archive formation year;
the archive classification score is calculated by the following formula:
$$W = A + B + C$$
wherein $W$ is the archive classification score, $B$ is the archive content score, and $C$ is the archive confidentiality score;
wherein the archive confidentiality score is calculated according to the security level of the archive; for example, the archive confidentiality score of a confidential archive is 30 points.
The archive content score is calculated by extracting keywords from the archive content to obtain a preset number of archive keywords, looking up the score of each archive keyword among the keyword scores preset in a keyword database, and summing the keyword scores.
It is worth noting that the formation time formula scores an archive by the difference between its formation year and the current year: as written, the score decreases by one point for each year since formation, so more recently formed archives score higher. The archive confidentiality score is calculated from the security level of the archive; for example, the confidentiality score of a confidential archive is 30 points, while that of a general archive is 10 points. The specific scores are determined by the actual usage scenario and are not limited by the present invention. For example, suppose a research paper archive was formed in 2005 and the current year is 2024. First, the archive formation time score is calculated as 100 - 2024 + 2005 = 81 points. Then keywords are extracted from the archive content; the extracted keywords are "climate change", "model prediction" and "data analysis", whose scores in the keyword database are 20, 15 and 10 points respectively, so the archive content score is 45 points. The archive is a confidential archive, so its confidentiality score is 30 points. Finally, the archive classification score is 81 + 45 + 30 = 156 points.
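The worked example can be reproduced with a short Python sketch. The score tables hold only the example values quoted in the text; the function names, the dictionary layout and the choice to score an unknown keyword as 0 are assumptions made for illustration.

```python
# Example values from the text; a real deployment would load these from the
# keyword database and the organization's own confidentiality scoring rules.
KEYWORD_SCORES = {"climate change": 20, "model prediction": 15, "data analysis": 10}
CONFIDENTIALITY_SCORES = {"confidential": 30, "general": 10}


def formation_time_score(formation_year, current_year):
    """A = 100 - Y_now + Y_start."""
    return 100 - current_year + formation_year


def content_score(keywords, keyword_scores=KEYWORD_SCORES):
    """B = sum of the preset scores of the extracted keywords.

    Assumption: a keyword missing from the database contributes 0.
    """
    return sum(keyword_scores.get(keyword, 0) for keyword in keywords)


def classification_score(formation_year, current_year, keywords, security_level):
    """W = A + B + C."""
    a = formation_time_score(formation_year, current_year)
    b = content_score(keywords)
    c = CONFIDENTIALITY_SCORES[security_level]
    return a + b + c


w = classification_score(
    formation_year=2005,
    current_year=2024,
    keywords=["climate change", "model prediction", "data analysis"],
    security_level="confidential",
)
```

Here `w` equals 156, matching the 81 + 45 + 30 breakdown in the example.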
In one embodiment, keywords are extracted with a topic model. The archive content is first preprocessed, including word segmentation and stop-word removal, to keep the text clean and normalized. An LDA (Latent Dirichlet Allocation) model is then trained on the preprocessed document set to infer the topic distribution of each document and the word distribution of each topic. Each topic is represented by a probability distribution over words, and analyzing these distributions identifies the core words of each topic. Finally, the word with the highest probability in each topic is selected as a keyword, the score of each keyword is looked up in the preset keyword database, and the scores are summed to obtain the archive content score. The preset number of keywords is 3 in this example, which is not limiting. To build the archive keyword database, a topic model (e.g., LDA) or another keyword extraction technique (e.g., TF-IDF, TextRank) may be used to extract keywords from the preprocessed text. The extracted keywords are deduplicated and normalized to keep them consistent and accurate. The keywords and their related attributes (such as frequency of occurrence and importance score) are stored in a database, which may be a relational database (such as MySQL or PostgreSQL) or a NoSQL database (such as MongoDB). The table structure should include fields such as keyword, document ID, number of occurrences and TF-IDF value for subsequent query and analysis. In this way, the keywords of an archive can be managed and used efficiently, which improves retrieval efficiency in archive content analysis.
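The sketch below illustrates keyword extraction with TF-IDF, one of the alternative techniques mentioned above, rather than LDA, because it fits in a few lines of standard-library Python. The stop-word list, the regular-expression tokenizer, the smoothed IDF variant and the function names are all assumptions made for this illustration; text in languages without whitespace word boundaries would need a proper word segmenter instead.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; real systems use a much larger one.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on"}


def tokenize(text):
    """Lower-case the text, split on non-letters and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_keywords(documents, index, count=3):
    """Return the `count` highest TF-IDF words of documents[index]."""
    tokenized = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents each word occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    term_freq = Counter(tokenized[index])
    total = sum(term_freq.values())
    if total == 0:
        return []
    n_docs = len(documents)
    scores = {
        # Smoothed inverse document frequency, so no division by zero occurs.
        word: (freq / total) * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1)
        for word, freq in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:count]
```

The returned keywords would then be looked up in the keyword database and their scores summed to give the archive content score.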
In another embodiment, multiple linear regression is used to calculate the archive classification score. Multiple linear regression is a statistical technique for analyzing the relationship between several independent variables and a dependent variable. In an archive management system, a predictive model is built from historical and real-time archive data, such as archive category, archive content and archive security level. The goal of multiple linear regression is to use these known input variables (independent variables) to predict the target output variable (dependent variable). It determines how strongly each independent variable influences the dependent variable by forming a weighted combination of the independent variables, which yields a linear equation. The regression coefficients in this equation, such as $a_1$, $b_1$ and $c_1$, are obtained by fitting historical data and reflect the size of each independent variable's contribution to the dependent variable. In the present invention, multiple linear regression is used to predict the classification of an archive: the basic information of the current archive is collected and fed into the regression model as independent variables, the model combines them in the linear equation using the previously fitted regression coefficients, and the classification of the archive is predicted from the result.
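A minimal sketch of fitting such a regression by ordinary least squares is shown below, using only the Python standard library. The choice of features (time, content and confidentiality scores), the use of the normal equations, and all function names are assumptions made for illustration; the source does not specify the fitting procedure, and a production system would more likely rely on a numerical library.

```python
def solve_linear_system(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_linear_regression(features, targets):
    """Least-squares fit; returns [intercept, coef_1, ..., coef_p]."""
    rows = [[1.0] + [float(v) for v in x] for x in features]
    p = len(rows[0])
    # Normal equations: (X^T X) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, x):
    """Evaluate the fitted linear equation for one feature vector."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], x))
```

Given historical archives with known scores, `fit_linear_regression` returns the intercept and coefficients, and `predict` applies them to the features of a new archive.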
In step S13, an archive category is obtained based on the archive classification score and a preset classification score standard.
In one embodiment, the archive categories include general archives, confidential archives and top-secret archives.
When the archive classification score exceeds a preset top-secret archive score threshold, the archive category is top-secret archive. In one embodiment, the classification score standard is defined by setting a top-secret archive threshold. The archive classification score is scored out of 100 points. For example, if the top-secret archive threshold is set to 80 points, an archive scoring more than 80 points is classified as a top-secret archive. The invention does not limit the value of the top-secret archive threshold. Confidential archives and general archives are likewise distinguished according to set thresholds. The approach is reasonable because jointly considering the archive formation time score, the archive content score and the archive confidentiality score reflects the importance and sensitivity of an archive more comprehensively. Specifically, the formation time score reflects the historical significance and timeliness of the archive, the content score evaluates its core content and informational value through keyword extraction, and the confidentiality score measures its sensitivity and confidentiality requirements. Together these three scores determine the final category of the archive. Archives of different importance and sensitivity can thus be archived appropriately, which facilitates subsequent retrieval and management and improves the efficiency and security of archive retrieval. For example, confidential and top-secret archives may be subject to stricter access control measures, such as multi-factor authentication, access logging and periodic audits, so that only authorized personnel can access such sensitive information, which helps prevent leakage and misuse.
General archives can adopt looser access control measures, such as simple user name and password verification, so that more people can conveniently consult and share them, which improves working efficiency. The invention does not limit the access control measures.
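The threshold-based classification can be sketched as follows. The 80-point top-secret threshold is the example value from the text; the 60-point confidential threshold and the function name are hypothetical, since the text leaves those thresholds to the implementer.

```python
TOP_SECRET_THRESHOLD = 80    # example value given in the text
CONFIDENTIAL_THRESHOLD = 60  # hypothetical value, not specified in the text


def archive_category(classification_score,
                     top_secret_threshold=TOP_SECRET_THRESHOLD,
                     confidential_threshold=CONFIDENTIAL_THRESHOLD):
    """Map an archive classification score to one of the three categories."""
    if classification_score > top_secret_threshold:
        return "top-secret"
    if classification_score > confidential_threshold:
        return "confidential"
    return "general"


categories = [archive_category(score) for score in (92, 70, 35)]
```

The category returned here could then select the access control policy, for example multi-factor authentication for the two higher categories.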
In another embodiment, in addition to being classified as general, confidential or top-secret, archives may be classified by frequency of use, as commonly used, occasionally used or rarely used. The classification thresholds may be set so that archives accessed more than 50 times in the past year are commonly used archives, archives accessed 5 to 50 times are occasionally used archives, and archives accessed fewer than 5 times are rarely used archives. For example, an annual financial report that was consulted and cited many times over the past year is classified as a commonly used archive and stored in a readily accessible location; a historical project summary report that is consulted only in particular situations is classified as an occasionally used archive and stored in secondary storage; and a ten-year-old set of meeting minutes that is almost never consulted is classified as a rarely used archive and placed in cold storage.
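The frequency-of-use classification follows directly from the thresholds stated above, as in this short sketch. The function name and the mapping of each category to a storage tier label are illustrative assumptions.

```python
def usage_category(accesses_past_year):
    """Classify an archive by how often it was accessed in the past year."""
    if accesses_past_year > 50:
        return "commonly used"
    if accesses_past_year >= 5:
        return "occasionally used"
    return "rarely used"


# Illustrative storage tiers corresponding to the examples in the text.
STORAGE_TIER = {
    "commonly used": "readily accessible storage",
    "occasionally used": "secondary storage",
    "rarely used": "cold storage",
}

tier = STORAGE_TIER[usage_category(3)]
```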
In step S14, an importance rating factor is calculated from the archive base data and the archive category.
In one embodiment, the importance rating factors include a category factor, an integrity factor and a confidentiality factor;
the category factor is calculated by the following formula:
where α is the category factor, m is the number of categories of the archive data, A_i is the importance weight corresponding to the i-th category, and C_i is the classification factor of the i-th category;
the confidentiality factor is calculated by the following formula:
where S is the confidentiality factor, L is the confidentiality level of the current archive, and M is the preset maximum confidentiality level;
the integrity factor is calculated by the following formula:
where η is the integrity factor, T is the integrity of the archive data, and μ is the number of archive data items.
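The formula images are not reproduced in this text. The worked example given with the importance score (an importance weight of 0.8 and a classification factor of 0.9 giving a category factor of 0.72, and a confidentiality level of 3 with a maximum of 5 giving a confidentiality factor of 0.6) is consistent with the forms below. They are a reconstruction under that assumption, not the verbatim formulas of the patent; in particular, the summation over the m categories is inferred from the variable definitions. The integrity factor formula cannot be recovered from the surrounding text and is therefore not reconstructed.

```latex
\alpha = \sum_{i=1}^{m} A_i \, C_i ,
\qquad
S = \frac{L}{M}
```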
It should be noted that the importance rating factors, namely the category factor, the integrity factor and the confidentiality factor, are calculated from the archive basic data and the archive category. The category factor reflects the relative importance of the category to which the archive belongs: different archive categories (e.g., general archive, confidential archive) have different importance weights, and the classification factor indicates the specific importance of the archive within its category. The confidentiality factor reflects the confidentiality requirements and sensitivity of the archive; the higher the confidentiality level, the more sensitive the archive and the stricter the protection it requires. The integrity factor reflects the integrity and reliability of the archive data; the higher the data integrity, the more complete and reliable the information in the archive. Combining these factors gives an overall assessment of the importance of the archive. For example, if an archive belongs to a high-importance category (such as confidential archives), has a high confidentiality level and has good data integrity, then its category factor, confidentiality factor and integrity factor are all high, and the resulting importance rating is also high, indicating that the archive is of great significance for management and protection.
In one embodiment, the data integrity may be obtained as follows. First, the contents of the archive are checked comprehensively to determine whether all necessary information is present, such as the title, author, date and content of the file. Second, the missing or incomplete information items in the archive are counted, such as missing signatures, ambiguous dates or missing portions of content. Then the data integrity is calculated from the total number of information items and the number of missing items: data integrity = (total number of information items - number of missing information items) / total number of information items. For example, if an archive has 10 information items in total, 2 of which are missing, the data integrity is (10 - 2) / 10 = 0.8, i.e., 80%. In this way the integrity of the archive is quantified and provides the input for the subsequent integrity factor calculation.
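The data integrity calculation can be sketched as below. The ratio itself is the one stated in the text; the record layout (a dictionary of information items) and the rule that an item counts as missing when it is absent, `None` or blank are assumptions made for this sketch.

```python
def data_integrity(record: dict, required_items: list) -> float:
    """Return (total items - missing items) / total items.

    An item is treated as missing when it is absent from the record,
    is None, or is an empty or whitespace-only string (an assumption).
    """
    if not required_items:
        raise ValueError("at least one required information item is needed")
    missing = 0
    for item in required_items:
        value = record.get(item)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing += 1
    total = len(required_items)
    return (total - missing) / total


# Mirrors the example in the text: 10 information items, 2 of them missing.
items = [f"item_{i}" for i in range(10)]
record = {name: "present" for name in items[:8]}
integrity = data_integrity(record, items)
```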
In step S15, an importance score is calculated based on the importance rating factor to obtain a file importance score.
In one embodiment, the profile importance score is calculated by the following formula:
I=α×(S×β-η)/t
Wherein I is file importance score, alpha is category factor, eta is integrity factor, S is confidentiality factor, t is normalization factor, and beta is importance parameter.
It should be noted that the archive importance score represents the overall importance of the archive. The category factor reflects the relative importance of the category to which the archive belongs. The integrity factor reflects the integrity and reliability of the archive data. The confidentiality factor reflects the confidentiality requirements and sensitivity of the archive. The normalization factor t normalizes the importance score so that the same scoring scale applies to different archives. The importance parameter β adjusts the weight of the confidentiality factor in the importance score. For example, suppose an archive is in the confidential category, with an importance weight of 0.8 and a classification factor of 0.9, so that the category factor is 0.8 × 0.9 = 0.72. Its confidentiality level is 3 and the preset maximum confidentiality level is 5, so the confidentiality factor is 3 / 5 = 0.6. Its integrity factor is 0.8. The normalization factor is 0.001 and the importance parameter is 1.5. Substituting these values into the formula gives I = 0.72 × (0.6 × 1.5 - 0.8) / 0.001 = 72, which is then used to determine the storage period.
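The worked example can be checked numerically with a short sketch. `importance_score` follows the formula I = α × (S × β - η) / t as given in the text. The helper forms for the category factor (sum of weight times classification factor) and the confidentiality factor (level divided by maximum level) are assumptions inferred from the example values, and all function names are hypothetical.

```python
import math


def category_factor(weights, classification_factors):
    """Assumed form: sum of importance weight times classification factor."""
    return sum(a * c for a, c in zip(weights, classification_factors))


def confidentiality_factor(level: int, max_level: int) -> float:
    """Assumed form: confidentiality level divided by the preset maximum."""
    if max_level <= 0:
        raise ValueError("max_level must be positive")
    return level / max_level


def importance_score(alpha: float, s: float, eta: float,
                     beta: float, t: float) -> float:
    """I = alpha * (S * beta - eta) / t, as written in the text."""
    if t == 0:
        raise ValueError("normalization factor t must be non-zero")
    return alpha * (s * beta - eta) / t


alpha = category_factor([0.8], [0.9])        # about 0.72
s = confidentiality_factor(3, 5)             # 0.6
score = importance_score(alpha, s, eta=0.8, beta=1.5, t=0.001)

# Floating-point arithmetic gives a value very close to 72, so compare
# with a tolerance rather than for exact equality.
matches_example = math.isclose(score, 72.0, rel_tol=1e-9)
```

Note that, as the formula is written, the integrity factor is subtracted, so a larger η lowers the score for fixed values of the other quantities.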
In step S16, a storage period is selected for archiving according to the file importance score and a preset storage period criterion.
In one embodiment, the storage period standard divides the storage period into permanent, long-term and short-term. When the archive importance score is greater than 80 points, the storage period is permanent; such archives often have extremely high historical, legal or strategic importance and must be kept for future reference and study. When the archive importance score is between 50 and 80 points, the storage period is long-term, specifically 20 to 50 years; such archives remain highly useful for a certain period, such as long-term cooperation agreements. When the archive importance score is lower than 50 points, the storage period is short-term, usually 1 to 10 years; such archives are needed in the short term but have little long-term storage value, such as weekly work plans.
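The selection of a storage period from the importance score can be sketched as follows. The 80-point and 50-point boundaries and the year ranges come from the text; treating scores of exactly 50 and exactly 80 as long-term is an assumption, since the text only says "between 50 and 80". The function and dictionary names are hypothetical.

```python
# Year ranges per storage period, as stated in the text.
# None means the archive is kept permanently.
RETENTION_YEARS = {
    "permanent": None,
    "long-term": (20, 50),
    "short-term": (1, 10),
}


def retention_period(importance_score: float) -> str:
    """Select the storage period category for an archive importance score."""
    if importance_score > 80:
        return "permanent"
    if importance_score >= 50:   # boundary handling is an assumption
        return "long-term"
    return "short-term"


period = retention_period(72)        # the score from the worked example
years = RETENTION_YEARS[period]
```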
It should be noted that archives that have exceeded their storage period are processed in the following steps. First, the archive management system evaluates and audits the archives; the evaluation covers their actual use, historical significance and the like. Second, the archives are appraised and screened according to the evaluation result: archives that no longer merit preservation are destroyed, and an extension of the storage period is applied for archives that still do. Destruction may include physical cold destruction (such as storing a storage medium in a warehouse) and electronic destruction (such as deleting an electronic file), ensuring that the archive content is not leaked; after destruction, a destruction certificate must be issued and a new archive record generated. An archive whose storage period is extended should have a stated reason and period for the extension, and its storage period record should be updated so that its management and use meet the new storage requirements. After processing, the relevant records, including evaluation reports, approval documents and destruction certificates, are archived; these records are both an important basis for archive management and important material for future audits and traceability. Through these steps, archives that have exceeded their storage period are handled properly and unnecessary waste of storage resources is avoided.
In summary, the present invention provides a method and a system for managing archives, which classify, score and file archives in a systematic way to improve the efficiency and accuracy of archive management. The method first obtains the archive basic data, including the archive formation time, the archive content and the confidentiality deadline of the archive. From this basic information, the system calculates a classification score using a classification score formula that takes into account the formation time, content and confidentiality level of the archive. Based on the classification score and a preset classification score standard, the system determines the archive category: general archive, confidential archive or top-secret archive. The system then calculates the importance rating factors from the archive basic data and the archive category, namely the category factor, the integrity factor and the confidentiality factor. The category factor reflects the relative importance of the category to which the archive belongs, the confidentiality factor reflects the confidentiality requirements and sensitivity of the archive, and the integrity factor reflects the integrity and reliability of the archive data. Based on these factors, the system calculates the archive importance score using the importance score formula. Finally, the system selects an appropriate storage period for filing according to the archive importance score and the preset storage period standard.
The storage period standard divides the storage period into permanent, long-term and short-term. When the archive importance score is greater than eighty points, the storage period is permanent; such archives have extremely high historical, legal or strategic importance and need to be kept for future reference and research. When the score is between fifty and eighty points, the storage period is long-term, typically twenty to fifty years; such archives remain highly useful for a certain period, such as long-term partnership agreements. When the score is below fifty points, the storage period is short-term, typically one to ten years; such archives are needed in the short term but have little long-term storage value, such as weekly work plans. Through this classification and scoring method, the invention optimizes the allocation of archive resources and improves the efficiency and security of archive management. The method also strengthens information security, in particular by applying stricter security measures to archives with a high confidentiality level so that sensitive information is not leaked. Furthermore, users can set the weights themselves to suit different usage scenarios, which improves the storage and retrieval efficiency of archive management.
Referring to fig. 2, a second embodiment of the present invention provides a system for managing files, including:
the file acquisition module is used for acquiring file basic data, wherein the file basic data comprises file forming time, file content and confidentiality deadline of a file;
the classification scoring module is used for performing classification scoring calculation according to the file basic data to obtain file classification scores;
the archive classification module is used for obtaining archive classification based on the archive classification score and a preset classification score standard;
the factor calculating module is used for calculating an importance rating factor according to the archive basic data and the archive category;
The score calculating module is used for calculating the importance scores based on the importance rating factors to obtain file importance scores;
and the archive filing module is used for selecting the archive period for filing according to the archive importance score and a preset archive period standard.
Preferably, the archive acquisition module is configured to:
And acquiring archive basic data, wherein the archive basic data comprise archive forming time, archive content and confidentiality deadlines of the archive.
Preferably, the classification scoring module is configured to:
According to the archive basic data, an archive classification score is obtained through calculation according to a classification score formula, and the method comprises the following steps:
the archive formation time score is calculated by the following formula:
A = 100 - Y_now + Y_start
where A is the archive formation time score, Y_now is the current year, and Y_start is the archive formation year;
the profile classification score is calculated by the following formula:
W=A+B+C
wherein W is the classification score of the file, B is the score of the file content, C is the confidentiality score of the file;
wherein the file security score is calculated according to the security level of the file, and the file security score of the confidential file is 30 points.
The calculation of the archive content score comprises the following steps:
extracting keywords from the file content to obtain a preset number of file keywords;
Searching the keyword scores of the file keywords based on the keyword scores preset in the keyword database;
and summing the keyword scores to obtain archive content scores.
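The classification score calculation can be sketched as follows. The formulas A = 100 - Y_now + Y_start and W = A + B + C, the keyword lookup-and-sum step and the 30-point confidentiality score for a confidential archive come from the text. The contents of `KEYWORD_SCORES`, the choice to score unknown keywords as 0 and the function names are assumptions; keyword extraction itself is left out, and the sketch starts from keywords that have already been extracted.

```python
# Hypothetical keyword database: keyword -> preset score.
KEYWORD_SCORES = {"contract": 10, "budget": 8, "personnel": 6}


def formation_time_score(current_year: int, formation_year: int) -> int:
    """A = 100 - Y_now + Y_start."""
    return 100 - current_year + formation_year


def content_score(keywords, keyword_scores=KEYWORD_SCORES) -> int:
    """B: sum of the preset scores of the extracted keywords.

    Keywords not found in the database contribute 0 (an assumption).
    """
    return sum(keyword_scores.get(word, 0) for word in keywords)


def classification_score(a: int, b: int, c: int) -> int:
    """W = A + B + C."""
    return a + b + c


a = formation_time_score(current_year=2024, formation_year=1950)   # 26
b = content_score(["contract", "budget", "minutes"])               # 18
c = 30  # confidentiality score of a confidential archive, per the text
w = classification_score(a, b, c)                                  # 74
```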
Preferably, the archive classification module is configured to:
Obtaining a archive category based on the archive classification score and a preset classification score standard, including:
the archive categories comprise general archives, confidential archives and top-secret archives;
and when the file classification score exceeds a preset confidential file score threshold, the file archiving category is confidential files.
Preferably, the factor calculation module is configured to:
calculating an importance rating factor according to the archive basic data and the archive category, wherein the importance rating factor comprises the following steps:
The importance rating factors comprise category factors, integrity factors and confidentiality factors;
the category factor is calculated by the following formula:
where α is the category factor, m is the number of categories of the archive data, A_i is the importance weight corresponding to the i-th category, and C_i is the classification factor of the i-th category;
the confidentiality factor is calculated by the following formula:
where S is the confidentiality factor, L is the confidentiality level of the current archive, and M is the preset maximum confidentiality level;
the integrity factor is calculated by the following formula:
where η is the integrity factor, T is the integrity of the archive data, and μ is the number of archive data items.
Preferably, the scoring calculation module is configured to:
and calculating the importance score based on the importance rating factor to obtain an archive importance score, wherein the method comprises the following steps:
the profile importance score is calculated by the following formula:
I=α×(S×β-η)/t
Wherein I is file importance score, alpha is category factor, eta is integrity factor, S is confidentiality factor, t is normalization factor, and beta is importance parameter.
Preferably, the archive module is used for:
Selecting a storage period for archiving according to the file importance score and a preset storage period standard, wherein the method comprises the following steps:
The shelf life criteria classify shelf life as permanent, long-term or short-term;
wherein, when the file importance score is greater than 80 points, the keeping period is permanent.
It should be noted that, the archive management system provided in the embodiment of the present invention is configured to execute all the flow steps of the archive management method in the foregoing embodiment, and the working principles and the beneficial effects of the two correspond to each other one by one, so that a detailed description is omitted.
The embodiment of the invention also provides an electronic device. The electronic device comprises a processor, a memory and a computer program, such as a data acquisition program, stored in the memory and executable on the processor. When the processor executes the computer program, the steps in the above embodiments of the archive management method are implemented, for example step S11 shown in fig. 1. Alternatively, when the processor executes the computer program, the functions of the modules/units in the above device embodiments are implemented, such as the archive acquisition module.
For example, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specified functions, and the instruction segments describe the execution of the computer program in the electronic device.
The electronic equipment can be a desktop computer, a notebook computer, a palm computer, an intelligent tablet and other computing equipment. The electronic device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the above components are merely examples of electronic devices and are not limiting of electronic devices, and may include more or fewer components than those described above, or may combine certain components, or different components, e.g., the electronic devices may also include input-output devices, network access devices, buses, etc.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the electronic device and connects the various parts of the whole electronic device through various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the electronic device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data created through use of the electronic device (such as audio data or a phonebook). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.
If the integrated modules/units of the electronic device are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer readable storage medium. Based on this understanding, the present invention may implement all or part of the flow of the methods of the above embodiments by means of a computer program that instructs the relevant hardware. The computer program may be stored in a computer readable storage medium, and when executed by a processor it implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content included in the computer readable medium may be expanded or restricted as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above-described apparatus embodiments are merely illustrative, and the units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the invention, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.
Claims (10)
1. A method for managing files, comprising:
Acquiring archive basic data, wherein the archive basic data comprise archive forming time, archive content and confidentiality deadlines of the archive;
Performing classification scoring calculation according to the archive basic data to obtain archive classification scores;
obtaining a archive category based on the archive classification score and a preset classification score standard;
Calculating an importance rating factor according to the archive basic data and the archive category;
Carrying out importance score calculation based on the importance rating factors to obtain file importance scores;
and selecting a storage period for archiving according to the file importance score and a preset storage period standard.
2. A method of managing a profile according to claim 1, wherein said performing a classification score calculation based on said profile base data to obtain a profile classification score comprises:
the archive formation time score is calculated by the following formula:
A = 100 - Y_now + Y_start
where A is the archive formation time score, Y_now is the current year, and Y_start is the archive formation year;
the profile classification score is calculated by the following formula:
W=A+B+C
wherein W is the classification score of the file, B is the score of the file content, C is the confidentiality score of the file;
Wherein the profile security score is calculated based on the security level of the profile.
3. A method of archive management according to claim 2, wherein the calculation of archive content scores comprises:
extracting keywords from the file content to obtain a preset number of file keywords;
Searching the keyword score corresponding to the archive keyword based on the keyword score preset in the keyword database;
and summing the keyword scores to obtain archive content scores.
4. A method of archive management according to claim 1, wherein the obtaining archive categories based on the archive classification scores and a preset classification score criterion includes:
the archive categories comprise general archives, confidential archives and top-secret archives;
And when the file classification score exceeds a preset confidential file score threshold, the file classification class is confidential files.
5. A method of managing an archive of claim 1 wherein said calculating an importance rating factor based on said archive base data and said archive category comprises:
The importance rating factors comprise category factors, integrity factors and confidentiality factors;
the category factor is calculated by the following formula:
where α is the category factor, m is the number of categories of the archive data, A_i is the importance weight corresponding to the i-th category, and C_i is the classification factor of the i-th category;
the confidentiality factor is calculated by the following formula:
where S is the confidentiality factor, L is the confidentiality level of the current archive, and M is the preset maximum confidentiality level;
the integrity factor is calculated by the following formula:
where η is the integrity factor, T is the integrity of the archive data, and μ is the number of archive data items.
6. A method of archive management according to claim 5, wherein said calculating an importance score based on said importance rating factors, obtaining an archive importance score, comprises:
the profile importance score is calculated by the following formula:
I=α×(S×β-η)/t
Wherein I is file importance score, alpha is category factor, eta is integrity factor, S is confidentiality factor, t is normalization factor, and beta is importance parameter.
7. A method of managing files according to claim 1, wherein selecting a retention period for archiving based on the file importance score and a predetermined retention period criterion comprises:
The shelf life criteria classify shelf life as permanent, long-term or short-term;
wherein the retention period is permanent when the profile importance score is greater than a first score threshold.
8. A system for managing files, comprising:
the file acquisition module is used for acquiring file basic data, wherein the file basic data comprises file forming time, file content and confidentiality deadline of a file;
The classification scoring module is used for performing classification scoring calculation according to the file basic data to obtain file classification scores;
the archive classification module is used for obtaining archive classification based on the archive classification score and a preset classification score standard;
the factor calculating module is used for calculating an importance rating factor according to the archive basic data and the archive category;
the score calculating module is used for calculating an importance score based on the importance rating factors to obtain an archive importance score;
and the archive filing module is used for selecting the archive period for filing according to the archive importance score and a preset archive period standard.
9. An electronic device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing a method of managing an archive according to any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the method of managing an archive according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411625274.XA CN119557419B (en) | 2024-11-14 | 2024-11-14 | File management method and system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119557419A (en) | 2025-03-04 |
| CN119557419B (en) | 2025-09-26 |
Family
ID=94765736
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411625274.XA Active CN119557419B (en) | 2024-11-14 | 2024-11-14 | File management method and system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119557419B (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN119847998A (en) * | 2025-03-18 | 2025-04-18 | 青岛大学 | File classification management system based on big data |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8442951B1 (en) * | 2011-12-07 | 2013-05-14 | International Business Machines Corporation | Processing archive content based on hierarchical classification levels |
| CN118796760A (en) * | 2024-06-14 | 2024-10-18 | 北京当代档案事务咨询中心 | A method, system, electronic device and storage medium for managing electronic files |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119557419B (en) | 2025-09-26 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |