TWI793432B - Document management method and system for engineering project - Google Patents
- Publication number
- TWI793432B (application TW109126902A)
- Authority
- TW
- Taiwan
- Prior art keywords
- engineering project
- database
- natural language
- document
- project
- Prior art date
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
The present invention relates to an engineering project document management method and system, and in particular to one that automatically assigns documents to related parties and files them, thereby automating document handling.
During an engineering project, the documents generated are numerous and varied. They come in many file types, such as photos, PDF files, and Word files; they often involve many stakeholders; and the engineering management items they concern are detailed and wide-ranging. In conventional practice, documents and to-do items are usually assigned to related parties by hand during project execution, and related materials are manually classified under the various engineering management items.
When the volume of data is very large, however, errors are frequent: documents are assigned to the wrong party, misclassified, or overlooked. Secondary problems follow, such as difficulty tracking the progress of assigned items and time-consuming searches for related materials. In recent years, advances in technology have produced many engineering project platforms that speed up document delivery and establish collaborative workflows, or that apply techniques such as natural language processing in an attempt to automate document handling as far as possible.
For example, among conventional technologies, Republic of China (Taiwan) Invention Patent No. I682286, "Document search system using text analysis results and natural language input," proposed by 劉秉錦 et al., discloses applying natural language processing (NLP) to a document search system. The technique parses keywords from stored files to speed up searching, and can also accept, through a cloud platform, natural-language input containing keywords from the user in order to search documents, thereby improving search efficiency. Similarly, Republic of China (Taiwan) Utility Model Patent No. M583974, "Document information extraction and filing system," proposed by 洽吧智能股份有限公司, discloses using text detection and text recognition to identify and extract the information recorded in a document and to store the extracted information in the corresponding database for use.
These conventional technologies, however, still leave the dispatch and classification of documents largely to manual work, which is time-consuming and error-prone. They also lack any function for finding correlations among data items in a large body of documents, linking the workflows of all parties together, and automatically recording the history. As a result, the parties cannot communicate or connect with one another, and project progress and construction costs cannot be tracked clearly.
In view of this, the inventors, after careful experimentation and research and with persistent effort, conceived the present "engineering project document management method and system," which overcomes the above shortcomings. A brief description of the invention follows.
In view of the shortcomings of the conventional technologies, the present invention combines natural language processing with big-data data mining, association rules, and deep learning to automate the platform's document assignment and classification, improve the efficiency of document handling, and provide functions such as fast document search. The invention uses natural language processing to automatically recognize documents of different file types and build a keyword database. Association rules are established between the keyword database and two other databases, one for contract stakeholders and one for engineering project management, so that documents are automatically assigned to the associated stakeholders and classified into the corresponding engineering project management database. In addition, the keyword database speeds up searching: through a cloud-based interactive platform, users can enter natural-language keywords to search for documents, reducing the time spent searching large databases.
The present invention provides an engineering project document management method comprising: creating a plurality of electronic documents containing a plurality of characters and uploading them to an engineering project document management platform; identifying, by a natural language parser included in the platform, a plurality of keywords contained in the electronic documents and storing them in a keyword database; assigning the electronic documents to at least one related party and classifying them under at least one engineering management item according to the keywords and based on association rules; and providing a natural language document query component with which a user finds the corresponding electronic document by searching the keywords in the keyword database.
Preferably, the method further comprises one of the following: uploading the electronic documents to a document database; performing character detection, by a character detection component included in the natural language parser, to determine the locations of the characters; performing character recognition, by a character recognition component included in the natural language parser, to recognize the detected characters; performing named entity recognition, by a named entity recognition component included in the natural language parser, to identify the keywords contained in the recognized characters; collecting the identified keywords to build the keyword database; and providing the natural language document query component so that a user can search the keywords in the keyword database directly in natural language and follow the link to the queried electronic document, thereby locating it among the electronic documents in the document database.
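The query step can be pictured as a lookup in an inverted index from keywords to documents. The following is a minimal sketch under that assumption; the function names, file names, and keywords are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

def build_keyword_index(doc_keywords):
    """Map each keyword to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for kw in keywords:
            index[kw].add(doc_id)
    return index

def find_documents(index, query_keywords):
    """Return IDs of documents that contain every queried keyword."""
    sets = [index.get(kw, set()) for kw in query_keywords]
    if not sets:
        return set()
    return set.intersection(*sets)

# Hypothetical keyword database contents (document -> identified keywords).
doc_keywords = {
    "contract_001.pdf": {"concrete", "Supplier A", "2020-08-01"},
    "photo_017.jpeg": {"rebar", "inspection"},
    "minutes_003.docx": {"concrete", "inspection"},
}
index = build_keyword_index(doc_keywords)
```

Calling `find_documents(index, ["concrete", "inspection"])` narrows the result to documents carrying both keywords. A real query component would first have to reduce the user's natural-language input to keywords, which this sketch leaves out.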
Preferably, the method further comprises one of the following: the character detection component detects the locations of the characters by applying one of a morphological operation method, the MSER method, the NMS method, the CTPN method, the SegLink method, the EAST method, the R-CNN method, the Fast R-CNN method, the PSENet method, and combinations thereof; the character recognition component recognizes the detected characters by applying a deep learning method, the deep learning method being one of a convolutional neural network, a deep convolutional neural network, a recurrent neural network, a convolutional recurrent neural network, convolutional recurrent neural network optical character recognition, attention optical character recognition, and combinations thereof; the named entity recognition component, with reference to the natural language rule dataset, applies one of a rule-based method, an unsupervised learning method, a feature-based supervised learning method, the deep learning method, and combinations thereof to build the keyword database; and the named entity recognition component, with reference to the natural language rule dataset, performs text word segmentation, segmentation tagging, part-of-speech tagging, entity tagging, entity extraction, proper-name extraction, coreference resolution, relation extraction, or syntactic parsing to build the keyword database.
Preferably, the method further comprises one of the following: building a deep learning dataset for training the deep learning method; building a natural language rule dataset for the named entity recognition component to reference; training the deep learning method with the deep learning dataset and the natural language rule dataset as the training set; building a related-party database that contains a plurality of related parties and the keywords corresponding to each related party; building an engineering project database that contains a plurality of engineering management items and the keywords corresponding to each item; applying an association rule algorithm to establish association rules between the keywords and each related party in the related-party database and each engineering management item in the engineering project database; and applying the deep learning method to optimize the association rules.
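The patent does not specify which association rule algorithm is used. As an illustration only, the sketch below derives single-keyword rules of the form keyword -> related party from past assignments, using the standard support and confidence measures. The records, thresholds, and names are hypothetical.

```python
from collections import Counter

def mine_rules(history, min_support=0.2, min_confidence=0.6):
    """Derive keyword -> target rules from (keywords, target) records.

    support    = fraction of all records containing both keyword and target
    confidence = fraction of records with the keyword that have the target
    """
    n = len(history)
    keyword_count = Counter()
    pair_count = Counter()
    for keywords, target in history:
        for kw in set(keywords):
            keyword_count[kw] += 1
            pair_count[(kw, target)] += 1
    rules = {}
    for (kw, target), both in pair_count.items():
        support = both / n
        confidence = both / keyword_count[kw]
        if support >= min_support and confidence >= min_confidence:
            rules[(kw, target)] = (support, confidence)
    return rules

# Hypothetical history of manually assigned documents.
history = [
    ({"concrete", "pour"}, "site engineer"),
    ({"concrete", "invoice"}, "accountant"),
    ({"concrete", "slump"}, "site engineer"),
    ({"invoice", "payment"}, "accountant"),
    ({"rebar", "inspection"}, "site engineer"),
]
rules = mine_rules(history)
```

Here "invoice" always co-occurs with the accountant, so that rule is kept, while "concrete" -> accountant falls below the confidence threshold and is dropped. The same function could be run with engineering management items as targets.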
The present invention further provides an engineering project document management system comprising: a back-end data layer that stores one of a natural language rule dataset, a deep learning dataset, a keyword database, a related-party database, an engineering project database, and a document database; an intermediate logic layer that runs an engineering project document management platform and a natural language parser included in the platform; and a front-end presentation layer that runs a front-end component and a natural language document query component included in the platform. A user uploads, from a user device and through the front-end component, a plurality of electronic documents containing a plurality of characters to the platform. The natural language parser identifies a plurality of keywords contained in the electronic documents and stores them in the keyword database. The platform then assigns the electronic documents to at least one related party and classifies them under an engineering management item according to the keywords and based on association rules, and the natural language document query component searches the keywords in the keyword database to find the corresponding electronic document.
The above summary is intended to provide a simplified overview of the present disclosure so that the reader has a basic understanding of it. It is not a complete description of the invention, and it is not intended to identify important or key elements of the embodiments or to delimit the scope of the invention.
10: engineering project document management system of the present invention
100: front-end presentation layer
101: desktop computer
103: notebook computer
105: tablet device
107: smartphone
110: user device
120: user
130: document upload user interface
150: server load-balancing device
200: intermediate logic layer
250: server load-balancing device
300: back-end data layer
500: engineering project document management method of the present invention
501-506: implementation steps
Figure 1 is a schematic diagram of the system architecture of the engineering project document management system of the present invention;
Figure 2 is a schematic diagram of the operation of the engineering project document management system of the present invention;
Figure 3 is a schematic diagram of an electronic document used by the engineering project document management platform of the present invention;
Figure 4 is a schematic diagram of the rectangular text boxes marked on a document by the character detection component of the present invention;
Figure 5 is a schematic diagram of the module architecture of the character detection component, the character recognition component, and the named entity recognition component included in the natural language parser of the present invention;
Figure 6 is a schematic diagram of the platform user interface that the engineering project document management platform of the present invention provides to the user through the front-end presentation layer;
Figure 7 is a block diagram of the operating principle of the engineering project document management system of the present invention; and
Figure 8 is a flowchart of the implementation steps of the engineering project document management method of the present invention.
The present invention can be fully understood from the following embodiments, so that a person skilled in the art can carry it out. The implementation of the invention is not, however, limited to the forms of the following embodiments. The drawings do not limit size, dimension, or scale, and in actual implementation the size, dimension, and scale of the invention are not limited by the drawings.
The term "preferably" as used herein is non-exclusive and should be understood as "preferably, but not limited to." Any steps described or recited in the specification or claims may be performed in any order and are not limited to the order stated in the claims. The scope of the invention should be determined only by the appended claims and their equivalents, not by the illustrative embodiments. The term "comprising" and its variants, where they appear in the specification and claims, are open-ended terms without limiting meaning and do not exclude other features or steps.
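Figure 1 is a schematic diagram of the system architecture of the engineering project document management system of the present invention, and Figure 2 is a schematic diagram of its operation. The engineering project document management system 10 has a three-tier architecture consisting of a back-end data layer 300, an intermediate logic layer 200, and a front-end presentation layer 100. The back-end data layer 300 includes one or more database servers, the intermediate logic layer 200 includes one or more application servers, and the front-end presentation layer 100 includes one or more web servers. The servers communicate with one another over the Internet. Within each layer, the servers are connected through server load-balancing devices 150 and 250, which distribute the workload sensibly when connections are busy, make effective use of server capacity, and speed up the response of each layer.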
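The patent does not say which balancing policy devices 150 and 250 apply. Purely as an illustration, the sketch below uses a least-connections policy, one common choice; the server names are hypothetical.

```python
def pick_server(active_connections):
    """Return the server with the fewest active connections (ties: first listed)."""
    return min(active_connections, key=active_connections.get)

def dispatch(requests, servers):
    """Assign each request to the least-loaded server and track the load.

    Requests never finish in this sketch, so the load only grows.
    """
    load = {name: 0 for name in servers}
    assignment = {}
    for req in requests:
        target = pick_server(load)
        load[target] += 1
        assignment[req] = target
    return assignment, load

assignment, load = dispatch(["r1", "r2", "r3", "r4", "r5"], ["app-1", "app-2"])
```

Because no request ever completes here, the policy degenerates to alternating between the two servers; with real connection lifetimes it would favor whichever server frees up first.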
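The database servers in the back-end data layer 300 mainly store several databases, including the natural language rule dataset, the deep learning dataset, the keyword database, the related-party database, the engineering project database, and the document database. The application servers in the intermediate logic layer 200 mainly run the back-end components of the engineering project document management platform, including the natural language parser, which in turn includes the character detection component, the character recognition component, and the named entity recognition component.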
The engineering project document management platform comprises front-end components and back-end components. The front-end components include a front-end user interface component and the natural language document query component; the back-end components include the natural language parser. The web servers in the front-end presentation layer 100 mainly run the platform's front-end components, so that a user 120 can operate the platform from a user device 110 through the front-end user interface component and access the keyword database, the related-party database, or the engineering project database. The user device 110 is preferably a desktop computer 101, a notebook computer 103, a tablet device 105, or a smartphone 107.
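The related-party database stored in the back-end data layer 300 holds the keywords associated with each related party in the engineering project, so that the keyword database can be linked to the related-party database. The engineering project database holds the keywords corresponding to each engineering management item in the engineering project, so that the keyword database can be linked to the engineering project database. The keyword database, the related-party database, and the engineering project database are stored in a structured data format; in a semi-structured data format, for example binary text stored as CSV, log files, XML, or JSON; or in an unstructured data format, for example ordinary computer files, so that the front-end and back-end components can access, search, train on, or learn from them.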
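To make the link between the keyword database and the related-party database concrete, here is a minimal sketch that stores related-party records as JSON, one of the semi-structured formats mentioned, and assigns a document to every party sharing a keyword with it. The record layout, field names, and sample data are assumptions for illustration; the patent does not define a schema.

```python
import json

# Hypothetical related-party records in a semi-structured (JSON) layout.
RELATED_PARTIES_JSON = """
[
  {"name": "site engineer", "keywords": ["concrete", "rebar", "inspection"]},
  {"name": "accountant", "keywords": ["invoice", "payment"]}
]
"""

def assign_document(doc_keywords, parties):
    """Return names of related parties sharing at least one keyword with the document."""
    doc_set = set(doc_keywords)
    return [p["name"] for p in parties if doc_set & set(p["keywords"])]

parties = json.loads(RELATED_PARTIES_JSON)
```

With these records, a document whose keywords are `["concrete", "invoice"]` would be sent to both parties. The engineering project database could be matched the same way, with engineering management items in place of names.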
During an engineering project, the related documents generated are numerous and varied. The electronic documents come in many types, for example but not limited to jpeg, pdf, doc, and docx, and the related parties and engineering management items involved are wide-ranging. In conventional practice, the various documents are typically dispatched by hand, to-do items are assigned by hand to the person responsible within the project, and related materials are classified by hand under the different engineering management items. With very large volumes of data, this manual approach often produces errors such as wrong assignment, wrong classification, and omission. The engineering project document management platform of the present invention therefore includes a natural language parser that automatically recognizes the text or characters contained in an electronic document and can automatically dispatch the document to related parties, classify it, and file it under the engineering management item it belongs to, thereby automating document handling.
Figure 3 is a schematic diagram of an electronic document used by the engineering project document management platform of the present invention. It shows an electronic contract document of a kind common in engineering projects. Such a document may be uploaded to the platform as a Word file in .doc format, as a scanned PDF file in .pdf format, or, quite possibly, as an image file in .jpeg format photographed with a mobile phone. Thus even the single contract document shown in Figure 3 may reach the platform in jpeg, pdf, or doc format. The file format of an electronic document is preferably a PDF file format, a PowerPoint file format, a PowerPoint-compatible file format, a Word file format, a Word-compatible file format, an Excel file format, an Excel-compatible file format, a JPG file format, a JPEG file format, or a PNG file format.
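Because the same contract can arrive as encoded text or as an image, a platform like this needs some way to decide how each upload is processed. The sketch below routes files by extension; the route names and the grouping of formats are assumptions for illustration, not part of the patent.

```python
from pathlib import Path

# Hypothetical grouping: office formats carry encoded text, images need detection and OCR.
TEXT_FORMATS = {".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx"}
IMAGE_FORMATS = {".jpg", ".jpeg", ".png"}

def choose_route(filename):
    """Return 'text', 'image', 'mixed', or 'unsupported' for an uploaded file."""
    suffix = Path(filename).suffix.lower()
    if suffix in TEXT_FORMATS:
        return "text"
    if suffix in IMAGE_FORMATS:
        return "image"
    if suffix == ".pdf":
        # A PDF may hold encoded text, scanned page images, or both.
        return "mixed"
    return "unsupported"
```

A scanned PDF and a photographed contract would both end up on a path that runs character detection and recognition, while a Word file could go straight to named entity recognition.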
Figure 5 is a schematic diagram of the module architecture of the character detection component, the character recognition component, and the named entity recognition component included in the natural language parser of the present invention. The textual information in the electronic documents described above may be stored in files of various types in several forms, such as character encodings, images of text, or picture objects containing images of text. To recognize the text contained in an electronic document, the natural language parser is configured with a character detection component, a character recognition component, and a named entity recognition component, and performs character detection, character recognition, and named entity recognition in that order.
Figure 4 is a schematic diagram of the rectangular text boxes that the character detection component marks on a document. The character detection component mainly performs scene text detection, which, intuitively, means locating every piece of text or every character in the document and marking each with a rectangular text box. To perform scene text detection, the character detection component preferably applies a morphological operation method, the MSER method, the NMS method, the CTPN method, the SegLink method, the EAST method, the R-CNN method, the Fast R-CNN method, the PSENet method, or a combination of these.
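Of the listed methods, non-maximum suppression (NMS) is the simplest to show in isolation: detectors often propose several overlapping boxes for the same text, and NMS keeps only the best-scoring one. The sketch below is a plain greedy NMS over axis-aligned boxes; the box coordinates, scores, and threshold are made-up example values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more than the threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept

# Two near-duplicate boxes around one text line, plus a separate line below.
boxes = [(10, 10, 110, 40), (12, 12, 112, 42), (10, 60, 90, 90)]
scores = [0.9, 0.8, 0.7]
```

Here the second box overlaps the first heavily and is suppressed, leaving one rectangle per text line, which is the kind of output Figure 4 depicts.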
The character recognition component mainly performs scene text recognition, which recognizes the text inside the region marked by each text box in order to extract it. The character recognition component may preferably use optical character recognition (OCR) for preliminary text recognition. Recognition then proceeds in two steps, character segmentation and classification: projection-based cutting first separates individual characters, which are then fed into convolutional layers for classification.
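Projection-based cutting can be sketched as follows, assuming the text-line image has already been binarized so that ink pixels are 1 and background pixels are 0. Columns with no ink mark the gaps between characters. The tiny image is a made-up example, and the classification step is not shown.

```python
def split_columns(image):
    """Split a binary text-line image (rows of 0/1) into character column ranges.

    Returns half-open (start, end) column intervals where ink is present.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection profile: amount of ink in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x
        elif not ink and start is not None:
            segments.append((start, x))
            start = None
    if start is not None:
        segments.append((start, width))
    return segments

line = [
    [0, 1, 1, 0, 0, 1, 0, 1, 1],
    [0, 1, 0, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 1, 0, 1, 1],
]
```

This simple rule breaks down when characters touch or when a single character has an internal vertical gap, which is common in Chinese. That limitation is one reason segmentation-free recognition is attractive.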
In one embodiment, the present invention uses deep learning for text recognition, which makes it possible to omit the character segmentation step and treat text recognition as a sequence learning problem. Although input images vary in scale and texts vary in length, after processing by a deep convolutional neural network (DCNN) and a recurrent neural network (RNN), followed by a translation step at the output stage, the whole text image can be recognized. For example, convolutional recurrent neural network optical character recognition (CRNN-OCR) or attention OCR may preferably be chosen for scene text recognition. Because both CRNN-OCR and attention OCR use a composite network of a convolutional neural network (CNN) plus an RNN in the feature learning stage, they can learn directly from sequence labels without detailed annotation, are not limited by the length of sequence-like objects, and require only height normalization in both the training and testing phases. Compared with the prior art, they perform better at recognizing individual words and take up less storage space.
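The "translation" at the output stage is not detailed in the text. In CRNN-style recognizers it is commonly a CTC transcription step, so the sketch below assumes that: it picks the best class per frame, merges consecutive repeats, and removes blanks. The alphabet and scores are made up, and class index 0 is assumed to be the blank.

```python
BLANK = 0  # assumed index of the CTC blank class

def ctc_greedy_decode(frame_scores, alphabet):
    """Decode per-frame class scores into text with greedy CTC decoding.

    frame_scores: one list of class scores per frame; index 0 is the blank.
    alphabet: class index i (i >= 1) maps to alphabet[i - 1].
    """
    best = [max(range(len(f)), key=f.__getitem__) for f in frame_scores]
    chars, previous = [], BLANK
    for idx in best:
        # Emit a character only when it is not blank and not a repeat of the last frame.
        if idx != BLANK and idx != previous:
            chars.append(alphabet[idx - 1])
        previous = idx
    return "".join(chars)

frames = [
    [0.1, 0.7, 0.1, 0.1],  # a
    [0.2, 0.6, 0.1, 0.1],  # a again: merged with the previous frame
    [0.9, 0.0, 0.1, 0.0],  # blank: separates the two a's
    [0.1, 0.8, 0.1, 0.0],  # a
    [0.0, 0.1, 0.9, 0.0],  # b
]
```

The blank is what lets the decoder distinguish a genuinely doubled character from one character spread across several frames, without any explicit segmentation.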
The named entity recognition component mainly performs the task of named entity recognition, also called proper-name recognition. Its purpose is to identify, within the recognized text, entities or proper nouns that carry specific meaning, chiefly things that can be identified by a proper noun or name: personal names, place names, organization names, proper nouns, times, numbers, quantities, currencies, and percentages. It is a common task for natural language parsers and is very widely used. The named entity recognition component of the present invention has been functionally enhanced to identify further entities, such as product names, model numbers, and prices. All identified entities and proper names are stored in the back-end data layer 300 to form the keyword database.
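For entity types with a regular surface form, such as prices, dates, and model numbers, a rule-based recognizer can be as simple as a set of patterns. The sketch below illustrates that idea only; the labels, patterns, and sample sentence are hypothetical and far narrower than what a trained recognizer would cover.

```python
import re

# Hypothetical patterns for three entity types.
PATTERNS = {
    "PRICE": re.compile(r"NT\$\s?\d[\d,]*"),
    "DATE": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "MODEL": re.compile(r"\b[A-Z]{2,}-\d{2,}\b"),
}

def extract_entities(text):
    """Return (label, matched_text) pairs sorted by position in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(found)]

sample = "Order 20 units of pump model WP-300 at NT$12,500 each, delivery by 2020-08-15."
```

The extracted pairs are the kind of entries that would be written to the keyword database. Names of people and organizations do not follow fixed patterns, which is where the learning-based methods come in.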
In English text, words are separated by spaces or other symbols, so word segmentation is not an issue. A Chinese sentence, by contrast, is a continuous run of characters with no explicit boundaries, and how to segment Chinese sentences into words has long been a technical challenge, particularly how to handle one word with several meanings and several words with one meaning. The present invention therefore uses a series of techniques based on natural language processing (NLP), including text mining and data mining, to pre-process the relevant data. Text mining can uncover unknown, implicit, and useful information in structured, unstructured, or semi-structured text data; it is used to compile, organize, and analyze large volumes of Chinese text, with analysis based on data such as the frequency and number of word occurrences, and is mostly applied in fields such as trend forecasting and decision support.
In one embodiment, the present invention applies several classes of methods to perform proper-name recognition or entity recognition: rule-based methods, unsupervised learning approaches, feature-based supervised learning approaches, and deep learning approaches, for example but not limited to a bidirectional recurrent neural network (BRNN) combined with a convolutional recurrent neural network (CRNN).
In one embodiment, the present invention applies the word segmentation and entity recognition system (CoreNLP) developed by Academia Sinica as the core natural language processing algorithm and for named entity recognition, together with word segmentation rules that resolve segmentation ambiguity in Chinese sentences. In one embodiment, the Chinese word segmentation method comprises the following steps in order: preliminary segmentation, segmentation tagging, unknown-word detection, Chinese personal name extraction, extraction of transliterated European and American names, compound word extraction, bottom-up merging and sorting, and re-segmentation. The segmentation method consists mainly of a lexicon and a set of segmentation rules. The lexicon holds correct words that were compiled in advance and manually reviewed and corrected; they serve as the reference vocabulary for segmenting Chinese sentences, and these reference words (tokens) make up texts.
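The multi-step procedure above is specific to the system the patent names and is not reproduced here. To show how a lexicon alone can drive a preliminary segmentation, the sketch below uses forward maximum matching, a much simpler, well-known baseline swapped in for illustration. The lexicon entries and `max_len` are example values.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Segment a sentence by greedily taking the longest lexicon word at each position.

    Characters not covered by any lexicon entry are emitted as single-character tokens.
    """
    tokens, i = [], 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens

# Hypothetical lexicon of reference words.
lexicon = {"工程", "專案", "文件", "管理", "平台", "工程專案"}
```

Greedy matching prefers "工程專案" over "工程" followed by "專案", and it leaves unknown words as single characters. Steps such as unknown-word detection and name extraction exist to repair exactly those cases.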
The process of obtaining basic tokens through word segmentation is also called text segmentation. Once segmentation and tagging are complete, part-of-speech tags must be assigned and a dictionary serving as an index must be built, so that words and texts can be converted into sequences of numbers that a processor can recognize and that deep learning techniques can read, recognize, learn from, and train on. A natural language rule data set and a deep learning data set are also established as training sets, to train the natural language parser or to provide material for it to learn from. The training sets are built mainly by applying different recognition methods (sentence-rule-based, supervised, unsupervised, and deep learning), each of which produces a corresponding database, to help the natural language recognition unit produce accurate recognition results.
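The conversion from tokens to a sequence of numbers can be sketched as follows. This is a minimal example under assumed conventions: the reserved entries `<pad>` and `<unk>`, the function names, and the sample documents are hypothetical and are not taken from the described system.

```python
def build_vocab(tokenized_docs):
    """Build a token-to-index dictionary from already segmented documents."""
    # Index 0 is reserved for padding and 1 for unknown tokens (assumed convention).
    vocab = {"<pad>": 0, "<unk>": 1}
    for doc in tokenized_docs:
        for token in doc:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab):
    """Map tokens to their indices; unseen tokens map to the <unk> index."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


# Hypothetical segmented training documents.
docs = [["工程專案", "文件", "管理"], ["施工", "日誌", "文件"]]
vocab = build_vocab(docs)
ids = encode(["施工", "文件", "平台"], vocab)
print(ids)
```

The resulting integer sequences are the form in which a deep learning model would typically consume the text. The token "平台" does not occur in the sample documents, so it maps to the unknown index.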
The character detection component, the character recognition component, and the named entity recognition component of the natural language parser all use state-of-the-art deep learning techniques and therefore need a large amount of sample data as a training set for the parser's components to learn from. After training and calibration, the natural language parser can correctly perform tasks including text recognition and entity recognition, with an accuracy between 90% and 95%. The training sets include, for example but not limited to, the natural language rule data set and the deep learning data set stored on the database server of the back-end data layer 300.
Once the keyword database has been built, association rules must be established between the keyword database and the stakeholder database, and between the keyword database and the engineering project database, so that these databases are linked. The stakeholder database stored in the back-end data layer 300 contains the keywords associated with each stakeholder in the engineering project, and the engineering project database contains the keywords associated with each engineering management item in the project. After an association rule algorithm analyzes the data and computes term frequencies, correct correspondences can be established between the keyword database, the stakeholder database, and the engineering project database, so that the platform can, on instruction, dispatch a document to the corresponding stakeholders or classify it under the corresponding engineering management item.
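The dispatch step can be pictured with a deliberately simplified sketch that ranks stakeholders and engineering management items by how many keywords they share with a document. Plain overlap counting is an assumption used here in place of the association rules. The database contents, the names `dispatch`, `STAKEHOLDERS`, and `PROJECT_ITEMS`, and the `min_overlap` parameter are all hypothetical.

```python
def dispatch(doc_keywords, stakeholder_db, project_db, min_overlap=1):
    """Return (stakeholders, project items) whose keywords overlap the document's.

    Simplified stand-in: overlap counting replaces the association rules.
    """
    doc_keywords = set(doc_keywords)

    def matches(db):
        # Score each entry by the number of shared keywords.
        scored = [(len(doc_keywords & keywords), name) for name, keywords in db.items()]
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
        return [name for score, name in ranked if score >= min_overlap]

    return matches(stakeholder_db), matches(project_db)


# Hypothetical contents of the stakeholder and engineering project databases.
STAKEHOLDERS = {
    "site_engineer": {"concrete", "rebar", "inspection"},
    "contract_manager": {"contract", "payment", "claim"},
}
PROJECT_ITEMS = {
    "quality_control": {"inspection", "defect"},
    "contract_administration": {"contract", "claim", "change order"},
}

people, items = dispatch({"concrete", "inspection", "schedule"}, STAKEHOLDERS, PROJECT_ITEMS)
print(people, items)
```

In this sample, a document mentioning concrete and inspection goes to the site engineer and is filed under quality control, while entries with no shared keywords are left out.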
For example, the stakeholder database stores the keyword records, vocabulary records, and text records from every document each stakeholder has handled in the past. Each record has an identifying field and contains one or more data items. The data mining performed by the association rule algorithm searches the many sets of data items for item sets that occur frequently, in order to judge whether the association among the items is strong enough. An item set that occurs often enough is taken to be meaningful, so the algorithm sets a threshold during computation, preferably called the support, as the basis for judging whether the frequency of a given item set meets the threshold.
When an item set occurs more often than the support threshold, it is called a frequent item set. If two data items A and B appear together in an item set and their joint frequency exceeds the support threshold, the algorithm judges the candidate items A and B to form a frequent item set, meaning that A and B are associated. The confidence of this frequent item set is then checked through conditional probability: the probability that item B also occurs given that item A occurs, expressed as Support(A∩B)/Support(A). All data items whose support and confidence both meet the preset criteria are collected to form meaningful association rules.
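The support and confidence checks can be worked through on a small example. The sketch below is a minimal illustration restricted to rules between pairs of items, not a full frequent item set miner such as Apriori. The sample records, thresholds, and function names are hypothetical. Support(A∩B) in the text corresponds here to the fraction of records containing both A and B.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for record in transactions if itemset <= set(record))
    return hits / len(transactions)


def confidence(a, b, transactions):
    """Conditional probability of b given a: Support(A and B) / Support(A)."""
    support_a = support(a, transactions)
    if support_a == 0:
        return 0.0
    return support(set(a) | set(b), transactions) / support_a


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return (a, b, support, confidence) for every pair rule a -> b meeting both thresholds."""
    items = sorted({item for record in transactions for item in record})
    rules = []
    for x, y in combinations(items, 2):
        pair_support = support({x, y}, transactions)
        if pair_support < min_support:
            continue  # the pair is not a frequent item set
        for a, b in ((x, y), (y, x)):
            conf = confidence({a}, {b}, transactions)
            if conf >= min_confidence:
                rules.append((a, b, pair_support, conf))
    return rules


# Hypothetical keyword records from previously handled documents.
RECORDS = [
    {"concrete", "inspection", "contractor"},
    {"concrete", "inspection"},
    {"concrete", "schedule"},
    {"inspection", "contractor"},
]

for a, b, s, c in mine_pair_rules(RECORDS, min_support=0.5, min_confidence=0.6):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

In the sample records, "concrete" and "inspection" appear together in two of four records, giving a support of 0.5, and "concrete" appears in three, giving a confidence of 0.5/0.75 ≈ 0.67 for the rule concrete → inspection.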
In one embodiment, the present invention applies deep learning techniques to optimize the association rules. The deep learning techniques referred to in the present invention preferably include, for example but not limited to: artificial neural network (ANN), deep neural network (DNN), recurrent neural network (RNN), convolutional neural network (CNN), convolutional recurrent neural network (CRNN), generative adversarial network (GAN), deep belief network (DBN), fully convolutional neural network (FCN), multi-column convolutional neural network (MCNN), long short-term memory (LSTM), bidirectional recurrent neural network (BRNN), deep recurrent neural network (DRNN), residual network (DRN), restricted Boltzmann machine (RBM), multilayer perceptron (MLP), autoencoder, attention network, ensemble learning, unsupervised classification methods, supervised classification methods, boosted tree methods, gradient boosted tree methods, strong gradient boosting machine methods, weak gradient boosting machine methods, regression tree methods, random forest methods, decision tree methods, weak learning methods, strong learning methods, strong voting methods, weak voting methods, support vector machine classifiers, or rule-based methods.
FIG. 6 is a schematic diagram of the platform user interface that the engineering project document management platform of the present invention provides to users through the front-end display layer. By executing front-end components on the front-end display layer 100, the platform presents a series of platform user interfaces. By accessing these interfaces on a user device, a user can operate the engineering project document management platform and upload electronic documents to it.
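As shown in FIG. 6, after the user launches a browser on the user device and enters the correct uniform resource locator (URL) in the address bar, the user can access the document upload user interface 130 shown in FIG. 6. Following the instructions in the document upload user interface 130, the user uploads an electronic document to the engineering project document management platform. The natural language parser then runs automatically, recognizes the text contained in the electronic document, and, based on the proper names or entities finally identified, dispatches the document to the stakeholders and automatically files it in the corresponding engineering project database.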
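The front-end components of the engineering project document management platform of the present invention also include a natural language document query component. A query field is embedded in a user interface provided by the front-end components, for example the document upload user interface 130 shown in FIG. 6, so that the user can enter terms in natural language. Based on the terms entered, the natural language document query component executes a keyword detection algorithm based on the association rules, searches the keyword database for matching keywords, follows the references in the keyword database to the corresponding electronic documents in the document database, and retrieves those documents for the user to consult and view. Users can thus quickly find the documents they need among a large number of documents, greatly reducing search time.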
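The lookup from query terms to keywords to documents can be sketched with an inverted index. This is a simplified illustration: plain substring matching stands in for the keyword detection algorithm based on the association rules, and the document identifiers, keywords, and function names are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(doc_keywords):
    """Invert a {document id: keywords} mapping into {keyword: document ids}."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword].add(doc_id)
    return index


def query(text, index):
    """Return document ids ranked by how many indexed keywords occur in `text`."""
    hits = defaultdict(int)
    for keyword, doc_ids in index.items():
        # Substring matching is an assumed stand-in for keyword detection.
        if keyword in text:
            for doc_id in doc_ids:
                hits[doc_id] += 1
    # Most matches first; ties are broken by document id for determinism.
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


# Hypothetical keyword database entries pointing at documents in the document database.
DOCS = {
    "doc-001": {"concrete", "inspection"},
    "doc-002": {"contract", "payment"},
    "doc-003": {"inspection", "schedule"},
}

index = build_keyword_index(DOCS)
results = query("show me the concrete inspection records", index)
print(results)
```

The keyword database plays the role of the index here: a query only touches keywords and the document references attached to them, so the documents themselves are opened only after a match is found.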
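FIG. 7 is a block diagram of the operating principle of the engineering project document management system of the present invention. The overall operation of the proposed engineering project document management system 10 is shown in FIG. 7. First, the natural language parser is trained: paper documents serving as samples are digitized into electronic documents by photographing, scanning, or other means, or are created directly as electronic documents, and are uploaded to the cloud platform server as a training set. Together with the natural language rule data set and the deep learning data set, this training set is used to train the natural language parser or to provide material for it to learn from, so as to generate the association rules and build the keyword database, the stakeholder database, and the engineering project database.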
After the natural language parser has completed its learning, users begin uploading electronic documents to the engineering project document management platform of the present invention. When the platform receives an uploaded electronic document, the natural language parser starts identifying the keywords the document contains and uses the identified keywords to update the keyword database. Following the association rules, the platform then automatically dispatches the electronic document to the stakeholders and files it under the specific engineering management item of the engineering project to which it belongs. Conversely, users can operate the natural language document query component to quickly locate the corresponding electronic documents in the document database and then access, retrieve, or view them. Users can thus quickly find the documents they need among a large number of documents, greatly reducing search time.
FIG. 8 is a flow chart of the implementation steps of the engineering project document management method of the present invention. In summary, the engineering project document management method 500 preferably comprises the following steps: creating a plurality of electronic documents containing text and uploading them to the document database of the engineering project document management platform, the platform including a natural language parser (step 501); performing character detection with the character detection component of the natural language parser to determine the locations of the text (step 502); performing character recognition with the character recognition component of the natural language parser to recognize the detected text (step 503); performing named entity recognition with the named entity recognition component of the natural language parser to identify the keywords contained in the recognized text, and storing the keywords in the keyword database (step 504); dispatching the electronic documents to at least one stakeholder and classifying them under at least one engineering management item according to the keywords and the association rules (step 505); and providing a natural language document query component that lets users search the keywords in the keyword database directly in natural language and follow them to the queried electronic document, so as to find the queried electronic document among the electronic documents in the document database (step 506).
The above embodiments of the present invention may be combined with or substituted for one another in any manner to derive further embodiments, none of which departs from the scope the present invention intends to protect. Further embodiments of the present invention are provided as follows:
Embodiment 1: An engineering project document management method, comprising: creating a plurality of electronic documents containing text and uploading them to an engineering project document management platform; identifying, with a natural language parser included in the engineering project document management platform, a plurality of keywords contained in the electronic documents, and storing the keywords in a keyword database; dispatching the electronic documents to at least one stakeholder and classifying them under at least one engineering management item according to the keywords and association rules; and providing a natural language document query component with which a user finds the corresponding electronic document by searching the keywords in the keyword database.
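Embodiment 2: The engineering project document management method of Embodiment 1, further comprising one of the following: uploading the electronic documents to a document database; performing character detection with a character detection component of the natural language parser to determine the locations of the text; performing character recognition with a character recognition component of the natural language parser to recognize the detected text; performing named entity recognition with a named entity recognition component of the natural language parser to identify the keywords contained in the recognized text; collecting the identified keywords to build the keyword database; and providing a natural language document query component that lets users search the keywords in the keyword database directly in natural language and follow them to the queried electronic document, so as to find the queried electronic document among the electronic documents in the document database.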
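Embodiment 3: The engineering project document management method of Embodiment 2, further comprising one of the following: the character detection component detects the locations of the text by applying one of a morphological operation method, an MSER method, an NMS method, a CTPN method, a SegLink method, an EAST method, an R-CNN method, a Fast R-CNN method, a PSENet method, and combinations thereof; the character recognition component recognizes the detected text by applying a deep learning method, the deep learning method being one of a convolutional neural network, a deep convolutional neural network, a recurrent neural network, a convolutional recurrent neural network, convolutional recurrent neural network optical character recognition, attention-based optical character recognition, and combinations thereof; the named entity recognition component, with reference to the natural language rule data set, applies one of a rule-based method, an unsupervised learning method, a feature-based supervised learning method, the deep learning method, and combinations thereof, to build the keyword database; and the named entity recognition component, with reference to the natural language rule data set, performs a text segmentation operation, a segmentation tagging operation, a part-of-speech tagging operation, an entity tagging operation, entity extraction, proper name extraction, a coreference resolution operation, a relation extraction operation, or a syntactic parsing operation, to build the keyword database.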
Embodiment 4: The engineering project document management method of Embodiment 3, further comprising one of the following: establishing a deep learning data set to train the deep learning method and provide material for it to learn from; establishing a natural language rule data set for the named entity recognition component to reference; using the deep learning data set and the natural language rule data set as training sets to train the deep learning method; establishing a stakeholder database containing a plurality of stakeholders and the keywords corresponding to each stakeholder; establishing an engineering project database containing a plurality of engineering management items and the keywords corresponding to each engineering management item; executing an association rule algorithm to establish the association rules between the keywords and each stakeholder in the stakeholder database and each engineering management item in the engineering project database; and applying the deep learning method to optimize the association rules.
Embodiment 5: The engineering project document management method of Embodiment 4, wherein the deep learning method is an artificial neural network, a deep neural network, a recurrent neural network, a convolutional neural network, a convolutional recurrent neural network, a generative adversarial network, a deep belief network, a fully convolutional neural network, a multi-column convolutional neural network, a long short-term memory model, a bidirectional recurrent neural network, a deep recurrent neural network, a residual network, a restricted Boltzmann machine, a multilayer perceptron, an autoencoder, an attention network, ensemble learning, an unsupervised classification method, a supervised classification method, a rule-based method, a boosted tree method, a gradient boosted tree method, a strong gradient boosting machine method, a weak gradient boosting machine method, a regression tree method, a random forest method, a decision tree method, a weak learning method, a strong learning method, a strong voting method, a weak voting method, or a support vector machine classifier.
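Embodiment 6: The engineering project document management method of Embodiment 1, wherein the electronic documents are construction records, construction logs, construction photographs, meeting minutes, supervisory inspections, self-inspection checklists, or contract documents.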
Embodiment 7: An engineering project document management system, comprising: a back-end data layer that stores one of a natural language rule data set, a deep learning data set, a keyword database, a stakeholder database, an engineering project database, and a document database; a relay logic layer that executes an engineering project document management platform and a natural language parser included in the platform; and a front-end display layer that executes front-end components and a natural language document query component included in the platform, wherein a user, on a user device, uploads a plurality of electronic documents containing text to the engineering project document management platform through the front-end components, so that the natural language parser identifies a plurality of keywords contained in the electronic documents and stores them in the keyword database, the platform dispatches the electronic documents to at least one stakeholder and classifies them under an engineering management item according to the keywords and association rules, and the corresponding electronic document is found by searching the keywords in the keyword database through the natural language document query component.
Embodiment 8: The engineering project document management system of Embodiment 7, wherein the back-end data layer includes one or more database servers, the relay logic layer includes one or more application servers, and the front-end display layer includes one or more web servers, wherein the application servers are assigned workloads through a first server load balancing device and the web servers are assigned workloads through a second server load balancing device.
Embodiment 9: The engineering project document management system of Embodiment 7, wherein the file format of the electronic documents is the PDF file format, the PowerPoint file format, a PowerPoint-compatible file format, the Word file format, a Word-compatible file format, the Excel file format, an Excel-compatible file format, the JPG file format, the JPEG file format, or the PNG file format.
Embodiment 10: The engineering project document management system of Embodiment 7, wherein the user device is a desktop computer, a notebook computer, a tablet device, or a smartphone.
The embodiments of the present invention may be combined with or substituted for one another in any manner to derive further embodiments, none of which departs from the scope the present invention intends to protect. The scope of protection of the present invention is defined by the appended claims.
500: Engineering project document management method of the present invention
501-506: Implementation steps
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109126902A TWI793432B (en) | 2020-08-07 | 2020-08-07 | Document management method and system for engineering project |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW109126902A TWI793432B (en) | 2020-08-07 | 2020-08-07 | Document management method and system for engineering project |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202207109A (en) | 2022-02-16 |
| TWI793432B (en) | 2023-02-21 |
Family
ID=81323352
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW109126902A TWI793432B (en) | 2020-08-07 | 2020-08-07 | Document management method and system for engineering project |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI793432B (en) |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TWI823815B (en) * | 2023-05-10 | 2023-11-21 | 犀動智能科技股份有限公司 | Abstract generation methods and systems and computer program products |
| US12332895B2 (en) * | 2023-10-27 | 2025-06-17 | International Business Machines Corporation | High-performance resource and job scheduling |
| TWI839316B (en) * | 2023-11-03 | 2024-04-11 | 國立中央大學 | Tracking system and integration of existing positioning system docking parts device |
| CN120067061B (en) * | 2025-04-28 | 2025-09-19 | 航天中认软件测评科技(北京)有限责任公司 | A method and device for processing FPGA engineering files |
Citations (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW200719172A (en) * | 2005-11-04 | 2007-05-16 | Webgenie Information Ltd | Method for automatically detecting similar documents |
| TW201033823A (en) * | 2008-12-09 | 2010-09-16 | Ibm | Systems and methods for analyzing electronic text |
| TWI438639B (en) * | 2011-09-26 | 2014-05-21 | Univ Ming Chuan | Method and system for document classification |
| TW201506650A (en) * | 2013-05-09 | 2015-02-16 | Hon Hai Prec Ind Co Ltd | System and method for sorting documents |
| CN109583796A (en) * | 2019-01-08 | 2019-04-05 | 河南省灵山信息科技有限公司 | A kind of data digging system and method for Logistics Park OA operation analysis |
| CN110019018A (en) * | 2017-09-22 | 2019-07-16 | 三星Sds株式会社 | File recommended method and file recommendation apparatus |
| TWM583974U (en) * | 2019-03-21 | 2019-09-21 | 洽吧智能股份有限公司 | Document information retrieval and filing system |
| CN110413767A (en) * | 2019-08-05 | 2019-11-05 | 浙江核新同花顺网络信息股份有限公司 | System and method based on spatial term rendering content |
| TWI682286B (en) * | 2018-08-31 | 2020-01-11 | 愛酷智能科技股份有限公司 | System for document searching using results of text analysis and natural language input |
| TWM590730U (en) * | 2019-06-10 | 2020-02-11 | 李蓉芳 | Document management system base on AI |
| CN111475467A (en) * | 2020-03-27 | 2020-07-31 | 平安科技(深圳)有限公司 | A file management method, cloud file management system and terminal |
-
2020
- 2020-08-07 TW TW109126902A patent/TWI793432B/en active
Non-Patent Citations (1)
| Title |
|---|
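| Online literature: 陳威翰, 陳介豪, "A Study on Applying Natural Language Processing Technology to Assist a Stakeholder Platform for Engineering Project Contracts" (運用自然語言處理技術輔助工程專案合約利害關係人平台之研究), Master's Program in Construction Management, Department of Civil Engineering, National Central University, 2020/06/05, https://ir.lib.ncu.edu.tw/handle/987654321/82938#.YRTsDYgzaUk, https://hdl.handle.net/11296/qbb7q2 * |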
Also Published As
| Publication number | Publication date |
|---|---|
| TW202207109A (en) | 2022-02-16 |