
CN113139071A - Document processing system and method for classifying documents by machine learning - Google Patents

Document processing system and method for classifying documents by machine learning

Info

Publication number
CN113139071A
Authority
CN
China
Prior art keywords
classification
machine learning
document
codes
folder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110087670.1A
Other languages
Chinese (zh)
Other versions
CN113139071B (en)
Inventor
廖俊杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Avision Inc
Original Assignee
Avision Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avision Inc
Publication of CN113139071A
Application granted
Publication of CN113139071B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to a document processing system and method for classifying documents by machine learning. The system comprises an input module, a processing module and one or more storage modules. During a first modeling procedure, the storage module is preset with a classification folder, and the classification folder corresponds to a code. After the first modeling procedure is completed, the input module can receive one or more document images; the processing module performs a calculation based on a set of machine learning model information and the document image to generate a calculation result, and stores the document image in the classification folder according to the calculation result. Document images are thus evaluated in real time and classified automatically according to the codes of the corresponding classification folders, which improves the accuracy and efficiency of document classification.

Description

Document processing system and method for classifying documents by machine learning
Technical Field
The present invention relates to document processing systems and methods, and more particularly, to a document processing system and method for classifying documents using machine learning.
Background
With advances in technology, electronic devices that make office work more convenient, such as copiers, scanners and multi-function office machines, have sprung up rapidly. A paper document placed on the paper feeding component or paper placing component of such a device is scanned by the device's scanning component into an electronic document for storage, which makes document storage more convenient.
When people visit places such as government offices or banks, they must fill in paper application forms whose formats differ by service and submit them to the staff handling the case. The staff scan the paper forms into electronic document images with a copier, scanner or multi-function office machine, and then classify and file the images either by manually scanning the bar codes (barcode or patch code) on the forms with a bar code scanner or by manually checking specific text and form layouts, thereby completing the service. However, when the bar code scanner fails or the bar code is stained and cannot be read, processing takes longer and efficiency drops, so members of the public grow impatient with the long wait and complain about the staff. Likewise, when staff classify and file electronic document images by specific text and form layouts and are interrupted, for example by enquiries from the public or from colleagues, the distraction may lead to misfiled or unclassified images and may harm the interests of the public.
Similarly, when people visit a hospital or clinic, they must fill in patient data, which counter staff then scan into electronic document images with a copier, scanner or multi-function office machine and classify and file. When the bar code scanner fails or the bar code is stained and cannot be read, processing takes longer, and the resulting wait may prevent patients from being seen promptly and put them at risk. When counter staff who classify and file electronic document images by specific text and form layouts are interrupted, for example by patients or nurses, the distraction may lead to misfiled or unclassified images, leaving the patient data in an abnormal state and possibly causing later medical disputes.
In short, scanning paper documents into electronic document images with a copier, scanner or multi-function peripheral does help document processing. In the subsequent classification and filing of those images, however, an unreadable bar code prolongs and slows the work, and manual classification is easily disturbed by the surroundings and becomes inaccurate. Manual classification and filing of electronic document images in the prior art is therefore time-consuming, inefficient and error-prone.
Disclosure of Invention
In view of the above shortcomings of the prior art, a primary objective of the present invention is to provide a document processing system and method for classifying documents by machine learning. Codes are set for the classification folders in advance, document images are evaluated by machine learning, and each document image is automatically classified and stored according to the calculation result and the folder codes, which improves the accuracy and efficiency of classification.
The main technical means adopted to achieve the above objective is a document processing method for classifying documents by machine learning. The method is executed on a document processing system that is preset with one or more classification folders, each corresponding to a code, and comprises the following steps:
receiving one or more document images;
performing a calculation based on the document image and a set of machine learning model information generated by a completed first modeling procedure, to generate a calculation result;
and storing the document image in the corresponding classification folder according to the calculation result and the codes of the one or more classification folders.
In this method, the document image and the machine learning model information are used to compute the calculation result, the calculation result is checked against the codes of the one or more classification folders, and the document image is stored in the corresponding classification folder. This simplifies the classification process, and automated classification effectively improves the efficiency and accuracy of classifying documents.
Another main technical means for achieving the above objective is a document processing system for classifying documents by machine learning, comprising:
an input module that acquires one or more document images;
a storage module preset with a classification folder, the classification folder corresponding to a code;
a processing module connected to the input module and to the storage module;
wherein the processing module receives the document image and performs a calculation based on the document image and a set of machine learning model information generated by a completed first modeling procedure to generate a calculation result, and the processing module compares the calculation result with the code of the classification folder to store the document image in the classification folder.
In this system, after the input module obtains the document image, the processing module performs a calculation based on the document image and the machine learning model information generated by the completed first modeling procedure to generate a calculation result, compares the calculation result with the code of the classification folder to determine whether the document image is stored in the classification folder, and stores the document image in the classification folder of the storage module. This simplifies the classification process, and automated classification effectively improves the efficiency and accuracy of classifying documents.
As described above, the present invention provides a document processing system and method for classifying documents by machine learning.
In order to make the aforementioned and other objects of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The following description will explain embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a block diagram of a system architecture according to a preferred embodiment of the present invention;
FIG. 2 is another system architecture diagram of the preferred embodiment of the present invention;
FIG. 3 is a flow chart of a first method of the preferred embodiment of the present invention;
FIG. 4 is a flow chart of a second method of the preferred embodiment of the present invention;
FIG. 5 is a flow chart of a third method in accordance with the preferred embodiment of the present invention;
FIG. 6 is a flow chart of a fourth method of the preferred embodiment of the present invention;
FIG. 7 is a flow chart of a fifth method of the preferred embodiment of the present invention;
FIG. 8 is a flow chart of a sixth method of the preferred embodiment of the present invention.
The reference numbers illustrate:
11: an input module;
12: a processing module;
13: a storage module;
131: a classification folder;
132: a code;
14: an expansion storage module;
141: an expansion classification folder;
142: a code.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from the present disclosure. While the invention will be described in connection with the preferred embodiments, there is no intent to limit its features to those embodiments. On the contrary, the invention is described in connection with the embodiments for the purpose of covering alternatives or modifications that may be extended based on the claims of the present invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The invention may be practiced without these particulars. Moreover, some specific details have been left out of the description in order to avoid obscuring the focus of the present invention.
Referring to FIG. 1, a document processing system for classifying documents by machine learning according to a preferred embodiment of the present invention includes an input module 11, a processing module 12 and at least one storage module 13. The processing module 12 is electrically connected to the input module 11 and to the storage module 13. In this embodiment, the document processing system may be a copier, a scanner, or a multifunction peripheral (MFP) having a document scanning function. The system may also include two or more storage modules 13, each storing different data.
In this embodiment, the input module 11 obtains one or more document images from one or more paper documents that the user wants to scan, and the document images are stored in the storage module 13 after being processed by the processing module 12. Specifically, the storage module 13 is preset with a classification folder 131 that corresponds to a code 132, and the code 132 of the classification folder 131 is compared with the document image to determine whether the document image is stored in the classification folder 131. When two or more storage modules 13 are included, one or more classification folders 131 may be provided in each storage module 13, each corresponding to one code 132, so that the classification folder 131 in which a document image is to be stored is determined by comparing the document image with the codes 132 of the classification folders 131. In the preferred embodiment, the storage modules 13 may each be preset with classification folders 131 whose corresponding codes 132 are assigned automatically, without manual or prior labeling, which simplifies the classification decision. Alternatively, where specific identification is required, the codes 132 may be set for the classification folders 131 when the folders are preset in the storage modules 13. These arrangements are examples only and are not limiting.
In use, after the user places one or more paper documents into a copier, scanner or multi-function peripheral with a document scanning function, the input module 11 obtains the corresponding one or more document images and outputs them to the processing module 12. The processing module 12 performs a calculation based on the received document image and a set of machine learning model information generated by a completed first modeling procedure to generate a calculation result, and compares the calculation result with the code 132 of the classification folder 131 to store the document image in the classification folder 131.
Specifically, the calculation result for a document image includes a code. The processing module 12 compares the code of the calculation result with the code 132 of the classification folder 131 and, if the two codes are identical, stores the document image in the classification folder 131.
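The comparison and storage step can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how codes or folders are represented, so the string codes, the `folders` mapping and the function name `store_by_code` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict, Optional


def store_by_code(image_path: Path, result_code: str,
                  folders: Dict[str, Path]) -> Optional[Path]:
    """Copy the document image into the folder whose code equals result_code.

    `folders` maps each classification-folder code to a directory
    (a hypothetical representation, not taken from the patent).
    Returns the stored path, or None when no folder code matches.
    """
    target = folders.get(result_code)
    if target is None:
        # No identical code found: the caller may run an auxiliary determination.
        return None
    target.mkdir(parents=True, exist_ok=True)
    destination = target / image_path.name
    shutil.copy2(image_path, destination)
    return destination
```

Returning `None` rather than raising keeps the "no identical code" outcome an ordinary result that a caller can route to a fallback step.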
In addition, when the user classifies document images with the document processing system of the present invention and the processing module 12 cannot determine from the calculation result whether the document image should be stored in the classification folder 131, the processing module 12 further executes an auxiliary determination procedure. In this procedure, the processing module 12 obtains one or more items of image feature information from the received document image, processes the image feature information to generate an auxiliary determination result, and compares the auxiliary determination result with the code 132 of the classification folder 131 to determine whether the document image is stored in the classification folder 131. Specifically, the auxiliary determination result includes a code, and the processing module 12 compares that code with the code 132 of the classification folder 131 to decide whether to store the document image there. The auxiliary determination procedure thereby improves the accuracy and applicability of document image classification. In this embodiment, the image feature information includes optical character recognition (OCR) information, document image size information, document image color information, and the like.
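One way to picture the auxiliary determination is a small rule table scored against the image feature information. The patent does not specify how the features are processed, so everything here is an assumption: the `ImageFeatures` and `AuxRule` types, the keyword, size and color rules, and the scoring scheme are hypothetical, and the OCR text is assumed to have been extracted already.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ImageFeatures:
    ocr_text: str      # text recognized by OCR (assumed already extracted)
    width_mm: float    # document image size information
    height_mm: float
    is_color: bool     # document image color information


@dataclass
class AuxRule:
    code: str                                   # folder code this rule votes for
    keywords: Tuple[str, ...] = ()
    size_mm: Optional[Tuple[float, float]] = None
    is_color: Optional[bool] = None


def rule_score(rule: AuxRule, feats: ImageFeatures, tol_mm: float = 5.0) -> int:
    """Count how many of the rule's conditions the features satisfy."""
    text = feats.ocr_text.lower()
    score = sum(1 for kw in rule.keywords if kw.lower() in text)
    if rule.size_mm is not None:
        w, h = rule.size_mm
        if abs(feats.width_mm - w) <= tol_mm and abs(feats.height_mm - h) <= tol_mm:
            score += 1
    if rule.is_color is not None and rule.is_color == feats.is_color:
        score += 1
    return score


def auxiliary_determination(feats: ImageFeatures,
                            rules: List[AuxRule]) -> Optional[str]:
    """Return the code of the best-scoring rule, or None if no rule matches."""
    best_code, best_score = None, 0
    for rule in rules:
        score = rule_score(rule, feats)
        if score > best_score:
            best_code, best_score = rule.code, score
    return best_code
```

The returned code would then be compared with the folder codes in the same way as the code of the calculation result.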
As mentioned above, the machine learning model information that the processing module 12 uses in the calculation with the document image is generated by the first modeling procedure. In that procedure, the processing module 12 first sets up the classification folder 131 with its corresponding code 132; the code 132 may be preset automatically by the processing module 12 or set manually. The input module 11 then receives a plurality of document images for machine learning, and the processing module 12 processes them with a machine learning program to generate the set of machine learning model information. The set of machine learning model information includes a plurality of sets of coefficients, one of which corresponds to the code 132 of the classification folder 131. Once the processing module 12 has generated the set of machine learning model information, the first modeling procedure is complete. The first modeling procedure thus establishes the machine learning model information needed to classify document images automatically, which improves the efficiency and accuracy of classification.
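The idea of "a plurality of sets of coefficients", one per folder code, can be illustrated with a deliberately simple linear model. The patent does not name a learning algorithm, so the nearest-centroid classifier below is a swapped-in stand-in, and the feature vectors, the names `first_modeling` and `calculate`, and the sample codes are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One coefficient set per folder code: (weights, bias).
Coefficients = Tuple[List[float], float]


def first_modeling(samples: Sequence[Tuple[Sequence[float], str]]) -> Dict[str, Coefficients]:
    """Build one coefficient set per code from labeled feature vectors.

    Stand-in algorithm (nearest centroid written as a linear scorer):
    weights = class centroid, bias = -0.5 * ||centroid||^2.
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, code in samples:
        if code not in sums:
            sums[code] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[code][i] += value
        counts[code] += 1
    model: Dict[str, Coefficients] = {}
    for code, total in sums.items():
        centroid = [value / counts[code] for value in total]
        bias = -0.5 * sum(c * c for c in centroid)
        model[code] = (centroid, bias)
    return model


def calculate(model: Dict[str, Coefficients], features: Sequence[float]) -> str:
    """Return the code whose coefficient set gives the highest linear score."""
    def score(code: str) -> float:
        weights, bias = model[code]
        return sum(w * x for w, x in zip(weights, features)) + bias
    return max(model, key=score)
```

In this sketch the "calculation result" is simply the winning code; a real system would likely also expose a confidence value so that uncertain results can be sent to the auxiliary determination procedure.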
In another embodiment, referring to FIG. 2, when the user has document images of different types to classify, the document processing system further includes one or more expansion storage modules 14, each preset with an expansion classification folder 141, and the processing module 12 further executes an expansion modeling procedure so that the additional types of document images can be classified. In the expansion modeling procedure, the processing module 12 sets up the expansion classification folder 141, which corresponds to a code 142; the processing module 12 receives a plurality of document images for machine learning and processes them with the machine learning program to generate a new set of machine learning model information. The new set of machine learning model information includes a plurality of new sets of coefficients, which correspond to the code 132 of the classification folder 131 and the code 142 of the expansion classification folder 141. Once the processing module 12 has generated the new set of machine learning model information, the expansion modeling procedure is complete. The new machine learning model information is then used in the calculation for both the new types of document images and the previously classified types, and each document image is stored in the corresponding classification folder 131 or expansion classification folder 141, which improves the extensibility and flexibility of the document processing system.
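The key property of the expansion modeling procedure is that the new model information covers both the original codes and the expansion codes. A minimal sketch of that bookkeeping is shown here; the trainer is a placeholder (a per-code mean vector standing in for a coefficient set), and the names `expansion_modeling` and `mean_trainer` as well as the sample codes are assumptions rather than anything specified in the patent.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

Sample = Tuple[Sequence[float], str]   # (feature vector, folder code)
Model = Dict[str, List[float]]         # one coefficient set per code


def mean_trainer(samples: Iterable[Sample]) -> Model:
    """Placeholder trainer: the per-code mean vector stands in for a coefficient set."""
    grouped: Dict[str, List[Sequence[float]]] = {}
    for features, code in samples:
        grouped.setdefault(code, []).append(features)
    return {code: [sum(col) / len(rows) for col in zip(*rows)]
            for code, rows in grouped.items()}


def expansion_modeling(original_samples: Iterable[Sample],
                       new_samples: Iterable[Sample],
                       folder_codes: Set[str],
                       expansion_codes: Set[str],
                       train: Callable[[Iterable[Sample]], Model] = mean_trainer) -> Model:
    """Retrain on all samples so the new model covers every original and expansion code."""
    all_codes = set(folder_codes) | set(expansion_codes)
    combined = list(original_samples) + list(new_samples)
    seen = {code for _, code in combined}
    if seen - all_codes:
        raise ValueError(f"samples labeled with unknown codes: {sorted(seen - all_codes)}")
    if all_codes - seen:
        raise ValueError(f"no training images for codes: {sorted(all_codes - seen)}")
    return train(combined)
```

Retraining on the combined set, rather than training a separate model for the new folders, mirrors the statement that the new coefficient sets correspond to both code 132 and code 142.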
To tie the foregoing embodiments together, a specific use scenario is described below as a non-limiting example. Suppose the user wants to classify two bills of different types. After the input module 11 obtains the document images of the two bills, the processing module 12 performs the calculation on each document image in turn with the machine learning model information to obtain the corresponding calculation results, and compares each calculation result with the codes 132 of the classification folders 131 to determine the classification folder 131 in which each bill's document image should be stored. If no matching code 132 of a classification folder 131 is found for the document image of one of the bills, the processing module 12 executes the auxiliary determination procedure to determine the classification folder 131 in which it should be stored. If document images of further types of bills need to be classified, a corresponding number of expansion storage modules 14 are provided, each preset with a corresponding expansion classification folder 141, and the processing module 12 completes the expansion modeling procedure to obtain new machine learning model information, which is used to compute a calculation result corresponding to one of the classification folders 131 or one of the expansion classification folders 141.
Therefore, according to the above embodiments and specific application, the input module 11 outputs the received document images to the processing module 12, the processing module 12 performs calculation according to the machine learning model information and the received document images to generate a calculation result, the processing module 12 compares the calculation result with the codes 132 of the classification folder 131 to determine whether the document images should be classified and stored in the classification folder 131, and the processing module 12 stores the document images in the corresponding classification folder 131, so as to simplify the classification process and effectively improve the efficiency and accuracy of classifying the documents by an automated classification method.
In addition, the auxiliary judgment process can provide further auxiliary judgment for the file images which cannot be judged at present, so that the accuracy and the applicability of classifying the file images are improved.
In addition, the expansion modeling procedure extends the system to classify additional types of document images, which improves the extensibility and flexibility of the invention.
Based on the above embodiments and application, the present invention further provides a document processing method for classifying documents by machine learning, shown in FIG. 3. The method is executed on the document processing system described above, which is preset with one or more classification folders 131, each corresponding to a code 132, and includes the following steps:
receiving one or more document images (S20);
performing a calculation based on the document image and a set of machine learning model information generated by a completed first modeling procedure, to generate a calculation result (S30);
based on the calculation result and the code 132 of the classification folder 131, the document image is stored in the corresponding classification folder 131 (S40).
As shown in FIG. 4, the step of storing the document image in the corresponding classification folder 131 according to the calculation result and the code 132 of the classification folder 131 (S40) further includes the following step:
comparing the calculation result with the codes 132 of the classification folders 131 to store the document image in the corresponding classification folder 131 (S41).
Referring to FIG. 5, the step of comparing the calculation result with the codes 132 of the classification folders 131 to store the document image in the corresponding classification folder 131 (S41) further includes the following steps:
determining, according to the code of the calculation result, whether an identical code 132 of a classification folder 131 is found (S411);
if so, storing the document image in the corresponding classification folder 131 (S412).
Referring to FIG. 5, if the determination in step S411 is negative, that is, no identical code 132 of a classification folder 131 is found for the code of the calculation result, the document processing system further provides an auxiliary determination procedure (S413).
Referring to FIGS. 5 and 6, the auxiliary determination procedure includes the following steps:
acquiring one or more image feature information of the document image (S4131);
processing according to the image characteristic information to generate an auxiliary judgment result (S4132);
comparing the codes of the auxiliary judgment result with the codes 132 of the one or more classification folders 131 to store the document images in the corresponding classification folders 131 (S4133); the image feature information includes Optical Character Recognition (OCR), document image size information, document image color information, and the like.
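The control flow of steps S30 and S411 through S4133 can be summarized in a short sketch. The step labels in the comments come from the description above; the function name `classify_document` and the two callables passed in are hypothetical placeholders for the model calculation and the auxiliary determination, whose internals the patent leaves open.

```python
from typing import Callable, Optional, Set


def classify_document(image: str,
                      calculate: Callable[[str], Optional[str]],
                      auxiliary: Callable[[str], Optional[str]],
                      folder_codes: Set[str]) -> Optional[str]:
    """Return the folder code under which the image should be stored, or None.

    `calculate` and `auxiliary` are caller-supplied stand-ins that each
    return a code (or None) for the given document image.
    """
    code = calculate(image)            # S30: calculation result includes a code
    if code in folder_codes:           # S411: is an identical folder code found?
        return code                    # S412: store in that folder
    aux_code = auxiliary(image)        # S413: S4131 get features, S4132 process them
    if aux_code in folder_codes:       # S4133: compare auxiliary code with folder codes
        return aux_code
    return None                        # still undetermined; left unclassified
```

For example, with `folder_codes = {"A01", "B02"}`, an image whose calculation result carries an unknown code is passed to `auxiliary`, and only a code that exists among the folder codes leads to storage.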
In this embodiment, please refer to fig. 7, wherein the first modeling procedure further includes the following steps:
setting a code 132 corresponding to the classification folder 131 (S51);
receiving a plurality of document images (S52); wherein, the received file image is used for machine learning;
executing a machine learning program to process the document images to generate machine learning model information (S53); wherein the machine learning model information includes a plurality of sets of coefficients, one of the plurality of sets of coefficients corresponding to the code 132 of the classification folder 131.
In this embodiment, if a new document image is to be classified, the document processing system further includes more than one expanded classification folder 141, and the method further provides an expanded modeling program, and please refer to fig. 8, where the expanded modeling program further includes the following steps:
setting a code 142 of the expanded classification folder 141 (S61);
receiving a plurality of document images (S62); wherein, the received file image is used for machine learning;
processing the document images by a machine learning program to generate a new set of machine learning model information (S63); wherein the new set of machine learning model information includes a plurality of new sets of coefficients corresponding to the code 132 of the classification folder 131 and the code 142 of the expanded classification folder 141.
In summary, the above-mentioned embodiments are provided only for illustrating the principles and effects of the present invention, and not for limiting the present invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (17)

1. A document processing method for classifying documents using machine learning, the method being performed on a document processing system, the document processing system being preset with one or more classification folders, the classification folders having corresponding codes, the method comprising the steps of:
receiving one or more document images;
performing a calculation according to the document image and a set of machine learning model information generated by completing a first modeling procedure, to generate a calculation result; and
storing the document image in the corresponding classification folder according to the calculation result and the code of the classification folder.
2. The method of claim 1, wherein the step of storing the document image in the corresponding classification folder according to the calculation result and the code of the classification folder further comprises the step of:
comparing the calculation result with the codes of the classification folders to store the document image in the corresponding classification folder.
3. The method of claim 2, wherein the step of comparing the calculation result with the codes of the classification folders to store the document image in the corresponding classification folder further comprises the steps of:
determining whether the code of the calculation result matches the code of one of the classification folders; and
if so, storing the document image in the corresponding classification folder.
4. The method of claim 3, wherein, in the step of determining whether the code of the calculation result matches the code of one of the classification folders, if no match is found, the method further provides an auxiliary judgment procedure.
5. The method of claim 4, wherein the auxiliary judgment procedure comprises the steps of:
acquiring one or more pieces of image feature information of the document image;
processing the image feature information to generate an auxiliary judgment result; and
comparing the code of the auxiliary judgment result with the codes of the classification folders to store the document image in the corresponding classification folder.
6. The method of claim 1, wherein the first modeling procedure further comprises the steps of:
setting codes corresponding to the classification folders;
receiving a plurality of document images; and
executing a machine learning program to process the plurality of document images to generate the machine learning model information.
7. The method of claim 6, wherein the machine learning model information includes a plurality of sets of coefficients, one of the plurality of sets of coefficients corresponding to the code of the classification folder.
8. The method of claim 1, wherein the document processing system further comprises one or more expanded classification folders, and wherein the method further provides an expanded modeling procedure.
9. The method of claim 8, wherein the expanded modeling procedure further comprises the steps of:
setting a code of the expanded classification folder;
receiving a plurality of document images; and
processing the plurality of document images through the machine learning program to generate a new set of machine learning model information.
10. The method of claim 9, wherein the new set of machine learning model information includes a plurality of new sets of coefficients, the plurality of new sets of coefficients corresponding to the codes of the classification folder and the expanded classification folder.
11. The method of claim 1, wherein the document processing system includes a copier, a scanner, or a multi-function peripheral.
12. A document processing system for classifying documents using machine learning, comprising:
an input module that acquires one or more document images;
a storage module preset with classification folders, the classification folders having corresponding codes; and
a processing module connected to the input module and the storage module, respectively;
wherein the processing module receives the document image and performs a calculation on the document image and a set of machine learning model information generated by completing a first modeling procedure to generate a calculation result, and the processing module compares the calculation result with the codes of the classification folders to store the document image in the corresponding classification folder.
13. The system of claim 12, wherein the processing module compares the code of the calculation result with the codes of the classification folders, and stores the document image in the corresponding classification folder when the code of the calculation result matches the code of a classification folder.
14. The system of claim 13, wherein when the code of the calculation result does not match the code of any classification folder, the processing module further executes an auxiliary judgment procedure and obtains one or more pieces of image feature information from the document image, the processing module processes the image feature information to generate an auxiliary judgment result, and the processing module compares the code of the auxiliary judgment result with the codes of the classification folders to store the document image in the corresponding classification folder.
15. The system of claim 14, wherein the image feature information includes optical character recognition information, document image size information, or document image color information.
16. The system of claim 12, wherein when the processing module executes the first modeling procedure, the input module receives a plurality of document images, the processing module processes the plurality of document images through a machine learning program to generate the machine learning model information, the machine learning model information includes a plurality of sets of coefficients, and one of the plurality of sets of coefficients corresponds to the code of the classification folder.
17. The system of claim 12, further comprising one or more expansion modules, the expansion modules being preset with expanded classification folders, wherein the processing module executes an expanded modeling procedure to set codes corresponding to the expanded classification folders, receives a plurality of document images, and processes the plurality of document images through the machine learning program to generate a new set of machine learning model information, wherein the new set of machine learning model information includes a plurality of new sets of coefficients, and the new sets of coefficients correspond to the codes of the classification folder and the codes of the expanded classification folder.
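
As a rough illustration of the storing flow recited in the claims (compare the code of the calculation result with the folder codes, and fall back to an auxiliary judgment when nothing matches), the following sketch may help. It is an assumption-laden example, not the claimed implementation: the auxiliary judgment is reduced to keyword matching on optical character recognition text, the behaviour when the auxiliary judgment also fails is not specified by the source (here the image is simply left unclassified), and `auxiliary_judgment`, `store_image`, `keyword_codes`, the file names, and the codes are all hypothetical.

```python
from typing import Dict, List, Optional

Folders = Dict[str, List[str]]     # folder code -> stored document image names


def auxiliary_judgment(ocr_text: str,
                       keyword_codes: Dict[str, str]) -> Optional[str]:
    """Derive a code from image feature information (here: OCR text keywords)."""
    lowered = ocr_text.lower()
    for keyword, code in keyword_codes.items():
        if keyword in lowered:
            return code
    return None


def store_image(name: str, result_code: Optional[str], ocr_text: str,
                folders: Folders,
                keyword_codes: Dict[str, str]) -> Optional[str]:
    """Store the image under a matching folder code; return the code used."""
    code = result_code
    if code not in folders:                      # no folder code matched
        code = auxiliary_judgment(ocr_text, keyword_codes)
    if code in folders:
        folders[code].append(name)
        return code
    return None                                  # left unclassified (assumption)


folders: Folders = {"A01": [], "B02": []}
keyword_codes = {"invoice": "A01", "receipt": "B02"}

first = store_image("scan1.png", "A01", "", folders, keyword_codes)
second = store_image("scan2.png", None, "RECEIPT no. 7", folders, keyword_codes)
third = store_image("scan3.png", "Z99", "memo", folders, keyword_codes)
```

Here `scan1.png` is stored by a direct code match, `scan2.png` only after the auxiliary judgment, and `scan3.png` is not stored because neither path yields a known folder code.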
CN202110087670.1A 2020-01-30 2021-01-22 File processing system and method for classifying files by machine learning Active CN113139071B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
TW109102766 2020-01-30
TW109102766A TWI750572B (en) 2020-01-30 2020-01-30 Document processing system and method for document classification using machine learning

Publications (2)

Publication Number Publication Date
CN113139071A true CN113139071A (en) 2021-07-20
CN113139071B CN113139071B (en) 2023-10-24

Family

ID=76811210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110087670.1A Active CN113139071B (en) 2020-01-30 2021-01-22 File processing system and method for classifying files by machine learning

Country Status (3)

Country Link
US (1) US11663526B2 (en)
CN (1) CN113139071B (en)
TW (1) TWI750572B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7268764B1 (en) * 2022-01-14 2023-05-08 凸版印刷株式会社 Image processing device, image processing method and image processing program
CN118939604B (en) * 2024-07-24 2025-02-21 广西智汇通人力资源有限公司 Data processing method and system based on archive information

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699904A (en) * 2013-12-25 2014-04-02 大连理工大学 Image computer-aided diagnosis method for multi-sequence nuclear magnetic resonance images
CN105320945A (en) * 2015-10-30 2016-02-10 小米科技有限责任公司 Image classification method and apparatus
CN107220975A (en) * 2017-07-31 2017-09-29 合肥工业大学 Uterine neck image intelligent auxiliary judgment system and its processing method
CN108109680A (en) * 2017-12-20 2018-06-01 南通艾思达智能科技有限公司 A kind of method of settlement of insurance claim image bag sorting
CN109977073A (en) * 2019-03-11 2019-07-05 厦门纵横集团科技股份有限公司 A kind of law court's electronics folder automation filing system and its method
US20200019853A1 (en) * 2018-07-13 2020-01-16 Primax Electronics Ltd. Product testing system with auxiliary judging function and auxiliary testing method applied thereto

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8495679B2 (en) * 2000-06-30 2013-07-23 Thomson Licensing Method and apparatus for delivery of television programs and targeted de-coupled advertising
US10043112B2 (en) * 2014-03-07 2018-08-07 Qualcomm Incorporated Photo management
US10382792B2 (en) * 2015-02-13 2019-08-13 Lg Electronics Inc. Method and apparatus for encoding and decoding video signal by means of transform-domain prediction
US11580398B2 (en) * 2016-10-14 2023-02-14 KLA-Tenor Corp. Diagnostic systems and methods for deep learning models configured for semiconductor applications
US10395362B2 (en) * 2017-04-07 2019-08-27 Kla-Tencor Corp. Contour based defect detection
TWI662511B (en) * 2017-10-03 2019-06-11 財團法人資訊工業策進會 Hierarchical image classification method and system
US11500533B2 (en) * 2018-02-14 2022-11-15 Lg Electronics Inc. Mobile terminal for displaying a preview image to be captured by a camera and control method therefor
TW202004519A (en) * 2018-06-05 2020-01-16 正修學校財團法人正修科技大學 Method for automatically classifying images
US11537506B1 (en) * 2018-10-26 2022-12-27 Amazon Technologies, Inc. System for visually diagnosing machine learning models
JP2021043775A (en) * 2019-09-12 2021-03-18 富士ゼロックス株式会社 Information processing device and program
US11423308B1 (en) * 2019-09-20 2022-08-23 Apple Inc. Classification for image creation
JP7439435B2 (en) * 2019-09-30 2024-02-28 富士フイルムビジネスイノベーション株式会社 Information processing device and program
JP7357579B2 (en) * 2020-03-30 2023-10-06 シャープ株式会社 Image processing device, image processing method and program
US20220075845A1 (en) * 2020-05-18 2022-03-10 Best Apps, Llc Computer aided systems and methods for creating custom products

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG CHUNFENG et al.: "Heterogeneous transfer learning based on stack sparse auto-encoders for fault diagnosis", 《2018 CHINESE AUTOMATIC CONGRESS(CAC)》, pages 1 - 2 *
ZHAO Keyang et al.: "Machine learning-assisted tumor diagnosis" (机器学习辅助肿瘤诊断), Tumor (《肿瘤》), vol. 38, no. 10, pages 987 - 991 *

Also Published As

Publication number Publication date
US11663526B2 (en) 2023-05-30
TWI750572B (en) 2021-12-21
CN113139071B (en) 2023-10-24
US20210240974A1 (en) 2021-08-05
TW202129513A (en) 2021-08-01

Similar Documents

Publication Publication Date Title
US20210209359A1 (en) Image processing apparatus, control method for image processing apparatus, and non-transitory storage medium
CN101184137B (en) Image processing method and device, image reading and forming device
US8185398B2 (en) Reading device with shortcut read function
EP1473641A2 (en) Information processing apparatus, method, storage medium and program
US11620844B2 (en) Image processing apparatus, control method of image processing apparatus, and storage medium
US10530957B2 (en) Image filing method
US8300944B2 (en) Image processing method, image processing apparatus, image reading apparatus, image forming apparatus, image processing system, and storage medium
US7463772B1 (en) De-warping of scanned images
CN113139071A (en) Document processing system and method for classifying documents by machine learning
CN105787425A (en) Information processing apparatus, system, and information processing method
JP6435934B2 (en) Document image processing program, image processing apparatus and character recognition apparatus using the program
JP2009206658A (en) Image processing method, image processor, image forming apparatus, program, and storage medium
JP4859054B2 (en) Image processing apparatus, image processing method, program, and recording medium
US20210118316A1 (en) Document checking system and grading system
JP5962449B2 (en) Determination program, determination method, and determination apparatus
US10834281B2 (en) Document size detecting by matching between image of entire document and read size image
JP3093493B2 (en) Image storage and retrieval device
US8451461B2 (en) Information processor, information processing system, and computer readable medium
US11657632B2 (en) Image processing device, image reading device, image processing method, and non-transitory computer readable medium, using two pieces of image data
CN101609453A (en) A kind of separator page and the method and apparatus that utilizes the document classification of this separator page
JP6303742B2 (en) Image processing apparatus, image processing method, and image processing program
JP2007041709A (en) Document processing system, document processing system control method, document processing apparatus, computer program, and computer-readable storage medium
CN112364868A (en) Rotation correction method and device for electronic file
CN116389649B (en) Paper scanning input storage method, equipment and computer readable storage medium
JP2020004345A (en) Image collation system, image collation method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant