TWI891171B - Object identification method and system and computer program product - Google Patents
- Publication number: TWI891171B (application TW112149297A)
- Authority: TW (Taiwan)
- Prior art keywords: data, feature, name data, processing unit, matching
- Landscapes: Character Discrimination (AREA)
Abstract
An object identification method implemented by an object identification system comprises: using an appearance feature matching model to analyze, according to a feature data set, the appearance that an object to be identified exhibits in a photographic result, so as to generate a feature recognition result corresponding to the object to be identified; after an optical character recognition model recognizes a piece of key text data presented on the object to be identified in the photographic result, using a text relevance analysis model to analyze the key text data according to a product name data set, so as to generate a product name recognition result corresponding to the object to be identified; and, based on at least one of the feature recognition result and the product name recognition result, taking one of multiple pieces of object name data as a piece of target object name data and outputting the target object name data.
Description
The present invention relates to an identification method, and more particularly to an object identification method for identifying objects in an image. The present invention also relates to an object identification system for identifying objects in an image, and a computer program product that enables an electronic device to identify objects in an image.
In recent years, image recognition technology has been widely applied to object identification, that is, using computer vision to recognize the type of an object.
One common approach in the prior art is to have a computer learn features such as the shape and outline of objects through machine learning, thereby realizing the function of recognizing specific objects in images.
However, if a pile of objects actually contains multiple kinds of objects with similar appearances (for example, multiple products of the same brand with mutually similar packaging stacked together), the prior art may have difficulty correctly identifying the type of each object. How to improve the prior art's recognition of similar-looking objects therefore becomes an issue worth exploring.
To improve on the prior art, one objective of the present invention is to provide an object identification method that helps correctly identify multiple kinds of objects with similar appearances.
The object identification method of the present invention is implemented by an object identification system that includes a processing unit and a storage unit. The storage unit stores a feature data set, a product name data set, an appearance feature matching model implemented based on image recognition technology, an optical character recognition model, and a text relevance analysis model with text processing capabilities. The feature data set includes multiple pieces of object feature data respectively corresponding to multiple object types, and the product name data set includes multiple pieces of object name data respectively corresponding to the object types and also respectively corresponding to the pieces of object feature data. The object identification method comprises: (A) the processing unit obtains a photographic result showing at least one object to be identified, and uses the appearance feature matching model to analyze, according to the feature data set, the appearance that the object to be identified exhibits in the photographic result, so as to generate a feature recognition result corresponding to the object to be identified and related to the pieces of object name data; (B) when the processing unit uses the optical character recognition model to recognize a piece of key text data presented on the object to be identified in the photographic result, the processing unit uses the text relevance analysis model to analyze the key text data according to the product name data set, so as to generate a product name recognition result corresponding to the object to be identified and related to the pieces of object name data; (C) based on at least one of the feature recognition result and the product name recognition result, the processing unit takes one of the pieces of object name data as a piece of target object name data and outputs the target object name data.
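The (A)-(C) flow can be sketched in Python. Everything below is an illustrative stand-in — the three "models" are toy functions and the product names are borrowed from the specification's later "Vitality" examples — not the patented implementation:

```python
# Minimal runnable sketch of steps (A)-(C); the three "models" are trivial
# stand-in functions that only illustrate the data flow, not real inference.

def identify_object(photo, appearance_model, ocr_model, relevance_model):
    # (A) feature recognition result: candidate product names with confidences
    feature_result = appearance_model(photo)
    # (B) if key text data is recognized on the object, build a name result
    key_text = ocr_model(photo)
    name_result = relevance_model(key_text) if key_text else {}
    # (C) output one target object name based on at least one of the results
    if not name_result:
        return max(feature_result, key=feature_result.get)
    common = set(feature_result) & set(name_result)
    if common:
        return sorted(common)[0]  # a name present in both results wins
    merged = {**feature_result, **name_result}
    return max(merged, key=merged.get)

# Hypothetical stand-in models for illustration only
appearance = lambda photo: {"Vitality B complex": 0.8, "Vitality C": 0.6}
ocr = lambda photo: "Vitality C"
relevance = lambda text: {"Vitality C": 0.9}

print(identify_object("shelf.jpg", appearance, ocr, relevance))
```

Note that when the OCR stand-in finds no text, step (C) falls back to the appearance result alone, mirroring the "at least one of" wording in the claim.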
In some embodiments of the object identification method of the present invention, in step (A), the processing unit first selects, from the pieces of object feature data, the one or more pieces of matching object feature data that best match the appearance of the object to be identified, and then generates the feature recognition result according to the piece(s) of object name data respectively corresponding to the piece(s) of matching object feature data; in step (B), the processing unit first selects, from the pieces of object name data, the one or more pieces of matching object name data that best match the key text data, and then generates the product name recognition result according to the piece(s) of matching object name data.
In some embodiments of the object identification method of the present invention, in step (A), the feature recognition result includes the piece(s) of object name data respectively corresponding to the piece(s) of matching object feature data, and one or more confidence scores respectively corresponding to the piece(s) of matching object feature data; in step (B), the product name recognition result includes the piece(s) of matching object name data, and one or more further confidence scores respectively corresponding to the piece(s) of matching object name data; in step (C), the processing unit determines the target object name data by judging whether one of the pieces of object name data is included in both the feature recognition result and the product name recognition result. If so, that piece of object name data serves as the target object name data; if not, then among the pieces of object name data included in the feature recognition result or the product name recognition result, the piece with the highest corresponding confidence score serves as the target object name data.
In some embodiments of the object identification method of the present invention, in step (A), the processing unit selects, from the pieces of object feature data, the single piece of matching object feature data that best matches the appearance of the object to be identified, and then generates the feature recognition result, which includes the piece of object name data corresponding to the piece of matching object feature data. In step (B), the processing unit first uses the optical character recognition model to judge whether a piece of text information exists on the object to be identified in the photographic result and, if so, obtains the key text data from the text information; further, when the processing unit has recognized the key text data, it first selects, from the pieces of object name data, the single piece of matching object name data that best matches the key text data, and then generates the product name recognition result, which includes the piece of matching object name data. In step (C), if the processing unit judged in step (B) that the text information exists on the object to be identified, the processing unit takes the piece of object name data included in the product name recognition result as the target object name data; if the processing unit judged in step (B) that no text information exists on the object to be identified, the processing unit takes the piece of object name data included in the feature recognition result as the target object name data.
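This single-match variant can be sketched as follows; the `best_name` matcher and the product names are hypothetical stand-ins for the text relevance model, using plain string similarity for illustration:

```python
from difflib import SequenceMatcher

# Sketch of the single-match variant: when OCR finds text on the object, the
# product name recognition result is used outright; otherwise the single best
# appearance match decides. All names and the matcher are illustrative.

def decide_single_match(feature_name, key_text, name_matcher):
    if key_text is not None:
        return name_matcher(key_text)  # text information exists: name result
    return feature_name                # no text information: feature result

names = ["Vitality B complex", "Vitality C", "Vitality E"]
best_name = lambda text: max(
    names, key=lambda n: SequenceMatcher(None, text, n).ratio())

print(decide_single_match("Vitality E", "Vitality C tablets", best_name))
print(decide_single_match("Vitality E", None, best_name))
```

The first call shows the text-based result overriding an appearance match that happened to pick a different product; the second shows the fallback when no text exists.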
In some embodiments of the object identification method of the present invention, in step (B), the processing unit first uses the optical character recognition model to judge whether a piece of text information composed of multiple characters exists on the object to be identified in the photographic result and, if so, extracts a portion of the text information as the key text data at least according to the size and/or position of the characters in the photographic result.
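One way such an extraction could look is sketched below; the (text, x, y, width, height) box format is a hypothetical OCR output, and ranking by character size rests on the assumption that a product name is printed in the largest type:

```python
# Illustrative sketch: from hypothetical OCR output given as
# (text, x, y, width, height) boxes, keep the largest text as the key
# text data, assuming product names are printed in the biggest characters.

def extract_key_text(ocr_boxes):
    if not ocr_boxes:
        return None  # no text information exists on the object
    # rank by rendered character size, using box height as a proxy
    text, _x, _y, _w, _h = max(ocr_boxes, key=lambda box: box[4])
    return text

boxes = [
    ("Vitality C", 40, 10, 200, 48),    # large title near the top
    ("30 tablets", 40, 80, 120, 20),
    ("Made in Taiwan", 40, 300, 150, 12),
]
print(extract_key_text(boxes))
```

Position could be weighted in as well (e.g., preferring text near the top of the package), per the "size and/or position" wording.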
Another objective of the present invention is to provide an object identification system that helps correctly identify multiple kinds of objects with similar appearances.
The object identification system of the present invention includes a processing unit and a storage unit electrically connected to the processing unit. The storage unit stores a feature data set, a product name data set, an appearance feature matching model implemented based on image recognition technology, an optical character recognition model, and a text relevance analysis model with text processing capabilities. The feature data set includes multiple pieces of object feature data respectively corresponding to multiple object types, and the product name data set includes multiple pieces of object name data respectively corresponding to the object types and also respectively corresponding to the pieces of object feature data. The processing unit is configured to: obtain a photographic result showing at least one object to be identified, and use the appearance feature matching model to analyze, according to the feature data set, the appearance that the object to be identified exhibits in the photographic result, so as to generate a feature recognition result corresponding to the object to be identified and related to the pieces of object name data; when the optical character recognition model recognizes a piece of key text data presented on the object to be identified in the photographic result, use the text relevance analysis model to analyze the key text data according to the product name data set, so as to generate a product name recognition result corresponding to the object to be identified and related to the pieces of object name data; and, based on at least one of the feature recognition result and the product name recognition result, take one of the pieces of object name data as a piece of target object name data and output the target object name data.
In some embodiments of the object identification system of the present invention, the processing unit first selects, from the pieces of object feature data, the one or more pieces of matching object feature data that best match the appearance of the object to be identified, and then generates the feature recognition result according to the piece(s) of object name data respectively corresponding to the piece(s) of matching object feature data; further, the processing unit first selects, from the pieces of object name data, the one or more pieces of matching object name data that best match the key text data, and then generates the product name recognition result according to the piece(s) of matching object name data.
In some embodiments of the object identification system of the present invention, the feature recognition result includes the piece(s) of object name data respectively corresponding to the piece(s) of matching object feature data, and one or more confidence scores respectively corresponding to the piece(s) of matching object feature data; the product name recognition result includes the piece(s) of matching object name data, and one or more further confidence scores respectively corresponding to the piece(s) of matching object name data; the processing unit determines the target object name data by judging whether one of the pieces of object name data is included in both the feature recognition result and the product name recognition result. If so, that piece of object name data serves as the target object name data; if not, then among the pieces of object name data included in the feature recognition result or the product name recognition result, the piece with the highest corresponding confidence score serves as the target object name data.
In some embodiments of the object identification system of the present invention, the processing unit selects, from the pieces of object feature data, the single piece of matching object feature data that best matches the appearance of the object to be identified, and then generates the feature recognition result, which includes the piece of object name data corresponding to the piece of matching object feature data. The processing unit first uses the optical character recognition model to judge whether a piece of text information exists on the object to be identified in the photographic result and, if so, obtains the key text data from the text information; further, when the processing unit has recognized the key text data, it first selects, from the pieces of object name data, the single piece of matching object name data that best matches the key text data, and then generates the product name recognition result, which includes the piece of matching object name data. If the processing unit judges that the text information exists on the object to be identified, it takes the piece of object name data included in the product name recognition result as the target object name data; if it judges that no text information exists on the object to be identified, it takes the piece of object name data included in the feature recognition result as the target object name data.
In some embodiments of the object identification system of the present invention, the processing unit first uses the optical character recognition model to judge whether a piece of text information composed of multiple characters exists on the object to be identified in the photographic result and, if so, extracts a portion of the text information as the key text data at least according to the size and/or position of the characters in the photographic result.
A further objective of the present invention is to provide a computer program product that enables an electronic device to correctly identify multiple kinds of objects with similar appearances.
The computer program product of the present invention includes an application program that includes an appearance feature matching model implemented based on image recognition technology, an optical character recognition model, and a text relevance analysis model with text processing capabilities. Given that a computer device has obtained a feature data set and a product name data set, when the electronic device loads and runs the application program, the application program enables the computer device to implement the object identification method of any of the foregoing embodiments, wherein the feature data set includes multiple pieces of object feature data respectively corresponding to multiple object types, and the product name data set includes multiple pieces of object name data respectively corresponding to the object types and also respectively corresponding to the pieces of object feature data.
The effect of the present invention is that the object identification system can consider both the appearance of the object to be identified and the text present on it in order to comprehensively judge which object type the object to be identified should be recognized as. The object identification system thereby helps avoid recognition errors caused by different types of objects to be identified having similar appearances, and can more accurately identify the object type to which each object to be identified belongs.
Before the present invention is described in detail, it should be noted that, unless otherwise defined, the term "electrically connected" in this specification describes a "coupled" relationship between pieces of computer hardware (e.g., electronic systems, devices, apparatuses, units, components), and broadly covers both "wired electrical connections," in which multiple pieces of computer hardware are physically connected to one another through conductive/semiconductor materials, and "wireless electrical connections," in which wireless data transmission is achieved using wireless communication technologies (such as, but not limited to, wireless networks, Bluetooth, and electromagnetic induction). On the other hand, unless otherwise defined, "electrically connected" in this specification also broadly covers "direct electrical connections," in which multiple pieces of computer hardware are directly coupled to one another, and "indirect electrical connections," in which multiple pieces of computer hardware are indirectly coupled through other computer hardware.
This specification provides multiple embodiments of the same invention; therefore, in the following description, similar elements in different embodiments are denoted by the same reference numerals.
Referring to FIG. 1, a first embodiment of the object identification system 1 of the present invention is used to recognize objects in an image so as to identify the types of the objects in the image.
The object identification system 1 includes a processing unit 11 and a storage unit 12 electrically connected to the processing unit 11. Specifically, in this embodiment, the object identification system 1 is a computer device, the processing unit 11 is a processor implemented as an integrated circuit and having data computation and instruction transmission/reception functions, and the storage unit 12 is a data storage device for storing digital data (e.g., a hard disk, though other types of computer-readable recording media are also possible).
However, in similar implementations of this embodiment, the processing unit 11 may also be a circuit assembly including a processor and a circuit board, and the storage unit 12 may also be a collection of multiple storage devices of the same or different types. Furthermore, in other embodiments, the object identification system 1 may be implemented as multiple computer devices electrically connected to one another (e.g., multiple servers); in such embodiments, the processing unit 11 may be implemented as the collection of the processors/circuit assemblies of those computer devices, and the storage unit 12 as the collection of their storage devices. Accordingly, the hardware implementation of the object identification system 1 is not limited to this embodiment.
In this embodiment, the processing unit 11 is adapted to be electrically connected to a camera device 2, which is, for example, a camera or video camera configured to shoot downward; the processing unit 11 can thereby obtain the images captured by the camera device 2 in real time and analyze them to identify the objects therein. In some embodiments, the camera device 2 may be included in the object identification system 1 as a camera unit; in other embodiments, however, if the processing unit 11 does not need to analyze the camera device 2's output in real time, the processing unit 11 need not be configured to be electrically connected to the camera device 2.
The storage unit 12 stores a feature data set D1, a product name data set D2, an appearance feature matching model M1, an optical character recognition model M2, and a text relevance analysis model M3.
The feature data set D1 includes multiple pieces of object feature data D11, which respectively correspond to multiple object types. In this embodiment, each object type corresponds to a specific product; for example, three of the object types may respectively be three health food products such as brand A's vitamin B complex, brand A's vitamin C, and brand A's vitamin E. Moreover, in this embodiment, each piece of object feature data D11 is an image or photograph showing the packaging appearance of the product of the corresponding object type, such as the front face or the entire side surface of the product's packaging box, though this is not a limitation. It is worth mentioning that, in this embodiment, increasing the object types that the object identification system 1 can recognize only requires adding product photographs of the object types to be recognized to the feature data set D1, with no need to spend time on deep learning over photographs of the products; this embodiment therefore offers excellent extensibility in the object types it can recognize. It should be added, however, that in other embodiments each piece of object feature data D11 may instead be the result of deep learning on photographs through machine learning; in other words, each piece of object feature data D11 may be appearance feature data learned through machine learning, so the actual form of the object feature data D11 is not limited to this embodiment.
It should be added that, in the application environment of another embodiment, the camera device 2 is a depth camera, so the result captured by the camera device 2 includes, in addition to the image, depth information corresponding to the image. Moreover, in that other embodiment, each piece of object feature data D11 includes an image (or photograph) and a piece of size data, where the size data indicates the object size (e.g., length, width, and height) of the corresponding object type.
The product name data set D2 includes multiple pieces of object name data D21, which respectively correspond to the object types and also respectively correspond to the pieces of object feature data D11 of the feature data set D1. More specifically, in this embodiment, each piece of object name data D21 is a character string indicating, for example in natural-language text, the product name of the corresponding object type. Continuing the earlier example of brand A's vitamin B complex, vitamin C, and vitamin E, the three pieces of object name data D21 in the product name data set D2 corresponding to those three object types may, for example, be implemented as "Vitality B complex," "Vitality C," and "Vitality E," respectively, though this is not a limitation. It should be added that, in other embodiments, each piece of object name data D21 is not limited to a string in natural-language form and may instead be implemented as the product number or identification code of the product; the actual form of the object name data D21 is therefore not limited to this embodiment.
The appearance feature matching model M1 is a software module implemented based on image recognition technology and can be loaded and run by the processing unit 11. The image recognition technology may, for example, be deep-learning-based image segmentation; specifically, the appearance feature matching model M1 may be implemented using semantic segmentation, instance segmentation, or panoramic segmentation. Since the appearance feature matching model M1 can be implemented with many existing techniques and its details are not the focus of this specification, they are not elaborated here.
The optical character recognition model M2 is a software module that can be loaded and run by the processing unit 11 and is implemented with existing optical character recognition (OCR) technology.
The text relevance analysis model M3 is a software module with text processing capabilities that can be loaded and run by the processing unit 11. More specifically, in this embodiment, the text relevance analysis model M3 is a large language model (LLM) implemented with neural-network technology, such as, but not limited to, Generative Pre-trained Transformers (GPT), the Language Model for Dialogue Applications (LaMDA), or LLaMA (Large Language Model Meta AI). In another embodiment, however, the text relevance analysis model M3 may also be implemented using a CLIP (Contrastive Language-Image Pre-Training) model; the text relevance analysis model M3 is therefore not limited to being implemented as a large language model. It should be understood that the training and operation of both large language models and CLIP models are existing technology, so their details are not elaborated here.
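The specification leaves the model choice open (an LLM or CLIP). As a lightweight, runnable stand-in — explicitly not the patent's model — plain string similarity illustrates the same interface: key text data in, object name data ranked by relevance out:

```python
from difflib import SequenceMatcher

# Stand-in for the text relevance analysis model M3: score every piece of
# object name data D21 against the key text data and rank by similarity.
# Real embodiments would use an LLM or CLIP; this is only illustrative.

def name_recognition(key_text, product_names):
    scored = {name: SequenceMatcher(None, key_text, name).ratio()
              for name in product_names}
    return sorted(scored, key=scored.get, reverse=True)

names = ["Vitality B complex", "Vitality C", "Vitality E"]
print(name_recognition("Vitality C 1000mg", names)[0])
```

An LLM or CLIP replacement would keep this interface but score semantic relevance rather than character overlap, which matters when the OCR text paraphrases or abbreviates the product name.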
Referring to FIG. 2, the following describes in detail, by way of example, how the object identification system 1 of this embodiment implements an object identification method.
First, in step S1, the processing unit 11 obtains a photographic result showing at least one object to be identified. Specifically, in this embodiment, the processing unit 11 controls the camera device 2 to shoot, thereby receiving from the camera device 2 the photographic result it produces, the photographic result being, for example, a photograph. It should be added that, in embodiments where the processing unit 11 is not electrically connected to the camera device 2, the photographic result may, for example, be pre-stored in the storage unit 12, so the processing unit 11 may also obtain the photographic result by reading the storage unit 12. Alternatively, the processing unit 11 may receive the photographic result from an external electronic device not shown in the figures (e.g., a computer, a mobile phone, or a flash drive).
In addition, in actual implementation, the photographic result may show multiple objects to be identified, or only a single one. For ease of description, it is assumed here that the photographic result shows only a single object to be identified.
After the processing unit 11 obtains the photographic result, the process proceeds to step S2.
In step S2, the processing unit 11 uses the appearance feature matching model M1 (i.e., loads and runs the appearance feature matching model M1) to analyze the appearance of the object to be identified in the photographic result according to the feature data set D1, so as to generate a feature recognition result that corresponds to the object to be identified and is related to the object name data D21.
Specifically, the processing unit 11 uses the appearance feature matching model M1 to compare the appearance of the object to be identified in the photographic result with the object feature data D11 for appearance similarity, selects from the object feature data D11 one or more pieces of matching object feature data D11' that best match the appearance of the object to be identified (i.e., that have the highest appearance similarity; FIG. 1 exemplarily shows one), and then generates the feature recognition result based on the object name data D21 respectively corresponding to the matching object feature data D11'.
Furthermore, in this embodiment, the processing unit 11 selects a predetermined number (for example, but not limited to, two) of pieces of matching object feature data D11' that best match the appearance of the object to be identified, and then generates the feature recognition result based on the object name data D21 respectively corresponding to those pieces of matching object feature data D11'. Moreover, in this embodiment, the feature recognition result includes the object name data D21 respectively corresponding to the pieces of matching object feature data D11' (for example, the aforementioned "Vitality B Complex", "Vitality C", etc.), as well as a plurality of confidence scores respectively corresponding to the pieces of matching object feature data D11'. Each confidence score may, for example, lie between 0 and 1, and for each piece of matching object feature data D11' and its corresponding confidence score, the confidence score may be understood as the degree of confidence of the processing unit 11 that the appearance of the object to be identified in the photographic result matches that piece of matching object feature data D11'.
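The top-k selection with confidence scores described above can be sketched as follows. This is an illustrative sketch only: the patent does not specify how the matching model M1 computes similarity, so the function below assumes the appearance-similarity scores have already been produced and merely ranks them; all product names and score values are hypothetical.

```python
# Hypothetical sketch of step S2: rank stored feature records (D11) by a
# precomputed appearance-similarity score and keep the top-k matches,
# reporting each score as a 0-1 confidence value.
def top_k_feature_matches(similarities, k=2):
    """similarities: dict mapping product-name data (D21) to a 0-1
    appearance-similarity score assumed to come from the matching model (M1)."""
    ranked = sorted(similarities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]  # [(name, confidence), ...], best first

scores = {"Vitality B Complex": 0.91, "Vitality C": 0.78, "Vitality E": 0.40}
result = top_k_feature_matches(scores, k=2)
```

If fewer than k candidates exist, the function simply returns all of them, which matches the single-match variant mentioned later in the description.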
After the processing unit 11 generates the feature recognition result, the process proceeds to step S3.
In step S3, the processing unit 11 uses the optical character recognition model M2 (i.e., loads and runs the optical character recognition model M2) to determine whether text information consisting of multiple characters is present on the object to be identified in the photographic result. If the determination is yes, the process proceeds to step S4; otherwise, the process proceeds to step S7.
In step S4, which follows step S3, once it is determined that the text information is present on the object to be identified in the photographic result, the processing unit 11 performs text pre-processing on the text information to extract a portion of it, and uses the extracted portion as key text data.
Specifically, the processing unit 11 performs the text pre-processing by, for example, segmenting the text information into sentences and words based on the arrangement of its characters (e.g., their order, direction, and position), thereby splitting the text information into multiple text parts. The processing unit 11 then selects one or more of these text parts as the key text data based on, for example, their character counts, their semantics, their font sizes in the photographic result, and/or their relative positional relationships. In other words, the key text data is one or more text parts that are presented on the object to be identified in the photographic result and recognized by the processing unit 11 using the optical character recognition model M2.
For example, the processing unit 11 may select, as the key text data, one or two text parts whose character count is below a threshold, whose semantics indicate a product name or brand, whose font size is the largest or second largest among all text parts, and which are located in the relatively central portion of the image region occupied by the text information. It should be added that which text parts the processing unit 11 selects as the key text data may also be determined from the results of supervised machine learning, and is not limited to the foregoing example.
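The selection heuristic described above can be sketched as a filter-then-rank step. This is a minimal illustration assuming OCR has already produced text parts with measured font sizes; the field names and the length threshold are hypothetical, and the real system may combine further cues (semantics, position) or a learned model instead.

```python
# Hypothetical sketch of step S4's text pre-processing: discard text parts
# that are too long to be a product name, then keep the remaining part with
# the largest font size as the key text data.
def pick_key_text(parts, max_chars=20):
    """parts: list of dicts with 'text' and 'font_size' keys (assumed OCR output)."""
    short = [p for p in parts if len(p["text"]) < max_chars]
    if not short:
        return None  # no candidate -> no key text data (the step S7 branch)
    return max(short, key=lambda p: p["font_size"])["text"]

parts = [
    {"text": "Vitality B Complex", "font_size": 48},                 # large product name
    {"text": "60 tablets, take one daily with water", "font_size": 12},  # usage text
    {"text": "Brand A", "font_size": 24},                            # brand mark
]
key = pick_key_text(parts)
```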
After the processing unit 11 extracts the key text data from the text information, the process proceeds to step S5.
In step S5, the processing unit 11 uses the text relevance analysis model M3 (i.e., loads and runs the text relevance analysis model M3) to analyze the key text data according to the product name data set D2, so as to generate a product name recognition result that corresponds to the object to be identified and is related to the object name data D21.
Specifically, the processing unit 11 uses the text relevance analysis model M3 to compare the key text data with the object name data D21 for text similarity, selects from the object name data D21 one or more pieces of matching object name data D21' that best match the key text data (i.e., that have the most similar textual composition; FIG. 1 exemplarily shows one), and then generates the product name recognition result based on the matching object name data D21'.
Furthermore, in this embodiment, the processing unit 11 selects a predetermined number (for example, but not limited to, two) of pieces of matching object name data D21' that best match the key text data, and then generates the product name recognition result based on those pieces of matching object name data D21'. Moreover, in this embodiment, the product name recognition result includes the matching object name data D21' (for example, the aforementioned "Vitality B Complex", "Vitality C", etc.), as well as another plurality of confidence scores respectively corresponding to the pieces of matching object name data D21'. Each confidence score may, for example, lie between 0 and 1, and for each piece of matching object name data D21' and its corresponding confidence score, the confidence score may be understood as the degree of confidence of the processing unit 11 that the key text data on the object to be identified matches that piece of matching object name data D21'.
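The text-similarity ranking of step S5 can be sketched with a plain string matcher. Here `difflib.SequenceMatcher` is a deliberately crude stand-in for the LLM- or CLIP-based relevance model M3 described earlier, used only to make the top-k-with-confidence shape of the result concrete; the product names are hypothetical.

```python
import difflib

# Hypothetical sketch of step S5: score each stored product-name record (D21)
# against the OCR key text by string similarity, and keep the top-k with the
# similarity ratio used as a 0-1 confidence score.
def top_k_name_matches(key_text, name_data, k=2):
    scored = [(name, difflib.SequenceMatcher(None, key_text, name).ratio())
              for name in name_data]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]  # [(name, confidence), ...], best first

names = ["Vitality B Complex", "Calcium Plus", "Fish Oil"]
matches = top_k_name_matches("Vitality B Complex", names, k=2)
```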
After the processing unit 11 generates the product name recognition result, the process proceeds to step S6.
In step S6, the processing unit 11 takes one of the object name data D21 as target object name data D21* (exemplarily shown in FIG. 1) based on the feature recognition result and the product name recognition result, and outputs the target object name data D21*. The processing unit 11 determining the target object name data D21* is equivalent to determining that the object to be identified belongs to the object type corresponding to the target object name data D21* (for example, the aforementioned brand-A vitamin B complex).
Specifically, in this embodiment, the processing unit 11 determines the target object name data D21* by judging whether any one piece of the object name data D21 is included in both the feature recognition result and the product name recognition result. If so, the processing unit 11 takes that piece of object name data D21 as the target object name data D21*. For example, suppose the two pieces of object name data D21 included in the feature recognition result are "Vitality B Complex" and "Vitality C", while the two pieces included in the product name recognition result (i.e., the matching object name data D21') are "Vitality B Complex" and "Vitality E". Since both the feature recognition result and the product name recognition result include the object name data D21 "Vitality B Complex", the processing unit 11 takes the object name data D21 "Vitality B Complex" as the target object name data D21*. On the other hand, if the judgment is no (i.e., the object name data D21 included in the feature recognition result and in the product name recognition result are entirely different), the processing unit 11 takes, among all the object name data D21 included in either the feature recognition result or the product name recognition result, the piece whose corresponding confidence score is highest as the target object name data D21*.
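The decision rule of step S6 (take a name present in both results, otherwise fall back to the single highest confidence score across both) can be sketched directly; the data values below are hypothetical.

```python
# Hypothetical sketch of step S6: intersect the two recognition results; if
# no name is shared, pick the name with the highest confidence overall.
def choose_target(feature_result, name_result):
    """Both arguments: lists of (product_name, confidence) pairs."""
    feature_names = {name for name, _ in feature_result}
    common = [name for name, _ in name_result if name in feature_names]
    if common:
        return common[0]  # a name backed by both appearance and text
    # entirely disjoint results: highest confidence score wins
    return max(feature_result + name_result, key=lambda pair: pair[1])[0]

feature = [("Vitality B Complex", 0.91), ("Vitality C", 0.78)]
name = [("Vitality B Complex", 0.95), ("Vitality E", 0.60)]
target = choose_target(feature, name)
```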
On the other hand, the processing unit 11 may output the target object name data D21* by, for example, displaying it on a display device (not shown) so that a user can confirm the object type identified by the object identification system 1. However, in other embodiments, the processing unit 11 may instead output the target object name data D21* by transmitting it to the storage unit 12 for storage, or by transmitting it to an external electronic device; the output method is not limited to this embodiment.
In step S7, which follows step S3, once it is determined that no text information is present on the object to be identified in the photographic result, the processing unit 11 has no way to obtain key text data from which to generate a product name recognition result. In this case, the processing unit 11 takes one of the object name data D21 as target object name data D21* based on the feature recognition result alone, and outputs the target object name data D21* (for example, by displaying it on the display device). More specifically, when the processing unit 11 cannot obtain key text data, it takes, among the object name data D21 of the feature recognition result, the piece whose corresponding confidence score is highest as the target object name data D21*.
It should be added that, in other similar embodiments, the processing unit 11 may, in step S2, always select only the single piece of matching object feature data D11' that best matches the appearance of the object to be identified. The feature recognition result may then include only a single piece of object name data D21 and a single confidence score corresponding to that piece of matching object feature data D11'. Likewise, the processing unit 11 may, in step S5, always select only the single piece of matching object name data D21' that best matches the key text data, so that the product name recognition result includes only a single piece of matching object name data D21' and another single confidence score corresponding to it.
The above is an example of how the object identification system 1 of this embodiment implements the object identification method.
The above example assumes that the photographic result in step S1 shows only a single object to be identified. If, in step S1, the photographic result shows multiple objects to be identified, then, unlike the above example, in step S2 the processing unit 11 uses the appearance feature matching model M1 to analyze the appearance of each object to be identified in the photographic result, thereby generating multiple feature recognition results respectively corresponding to the objects to be identified, and then continues to execute step S3 and the subsequent steps for each object to be identified in the photographic result. In particular, by executing step S3 and the subsequent steps for every object to be identified, the processing unit 11 determines multiple pieces of target object name data D21* respectively corresponding to the objects to be identified. Moreover, in a preferred practical implementation, the processing unit 11, for example, outputs all the target object name data D21* at once, which amounts to generating and outputting an object name list.
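The multi-object flow, one pipeline pass per detected object with the chosen names collected into a single list, can be sketched as follows; `identify_one` stands in for steps S2 to S7 applied to one object, and the crop labels and lookup are purely hypothetical.

```python
# Hypothetical sketch of the multi-object case: run the single-object
# pipeline on every detected object and emit all names as one list (the
# "object name list" output in a single pass).
def identify_all(object_crops, identify_one):
    return [identify_one(crop) for crop in object_crops]

# Stand-in for the per-object pipeline: here just a table lookup.
lookup = {"crop_1": "Vitality B Complex", "crop_2": "Vitality C"}
name_list = identify_all(["crop_1", "crop_2"], lookup.get)
```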
By implementing the object identification method, for each object to be identified presented in the photographic result, this embodiment can first analyze the appearance of the object in the photographic result to generate the feature recognition result, then, after determining that text information is present on the object, obtain the key text data from it and generate the product name recognition result by analyzing the key text data, and finally decide, based on both the feature recognition result and the product name recognition result, which piece of object name data D21 to output as the target object name data D21*. In other words, this embodiment can consider both the appearance of the object to be identified (e.g., its shape, outline, and color) and the text present on it to comprehensively judge which object type the object should be identified as. Therefore, if the photographic result shows multiple objects to be identified that look similar but are actually of different types (for example, several health supplements or other products of the same brand with similar packaging, or several boxed parcels of similar shape and size), this embodiment helps avoid misidentification caused by the similar appearance of the objects, and can more accurately identify the object type of each object to be identified. Accordingly, this embodiment is particularly suitable for applications such as inventory counting of incoming and outgoing goods, or identifying items to be checked out at self-checkout machines in physical stores. Furthermore, this embodiment may also be used in conjunction with a robotic arm device (not shown). For example, in step S1, the processing unit 11 may control the photographing device 2 to capture an image and obtain the photographic result while the object to be identified is held by the robotic arm device and moved into the photographing range of the photographing device 2; where the robotic arm device subsequently moves the object to be identified (for example, into which container it is placed) is then determined based on the target object name data D21* selected by the processing unit 11.
In an advanced implementation of this embodiment, each piece of object feature data D11 is a template image, and each template image is realized as a photograph of an object belonging to the corresponding object type. N pieces of the object feature data D11 (N being an integer greater than or equal to 1) serve as N pieces of marked object feature data. Each piece of marked object feature data includes a key region marked in advance by a user, and the key region corresponds to a region of interest in the marked object feature data that serves as the image recognition target. As a concrete example, the N object types corresponding to the N pieces of marked object feature data may be N kinds of pharmaceuticals or health foods whose packaging is similar but whose ingredients (or ingredient dosages) differ, and the key region of each piece of marked object feature data may be the portion of the corresponding object (for example, on its packaging box) where the product ingredients (or ingredient dosages) are presented in text. In step S2 of the object identification method of this implementation, after the processing unit 11 selects the matching object feature data D11', the processing unit 11 further determines, for each piece of matching object feature data D11', whether it belongs to the N pieces of marked object feature data. If the determination is no, the processing unit 11 continues with the subsequent steps S3 to S7. However, if the determination is yes, the object to be identified is one that is particularly easily confused. In this case, the processing unit 11 uses the appearance feature matching model M1 again to compare, for appearance similarity, a key portion of the overall appearance of the object to be identified in the photographic result with the N key regions of the N pieces of marked object feature data, selects from the N pieces of marked object feature data the one piece of matching marked object feature data whose key region best matches the key portion of the object to be identified, and generates the feature recognition result based on the object name data D21 corresponding to that piece of matching marked object feature data. Here, the key portion refers to the part of the overall appearance of the object to be identified in the photographic result that corresponds to the key regions of the N pieces of marked object feature data, such as the label on a packaging box that describes its ingredients or dosages in text. Moreover, in the subsequent steps S3 and S4, the processing unit 11 uses the optical character recognition model M2 to determine whether text information is present in the key portion of the object to be identified, and, if so, performs the text pre-processing on the text information in the key portion to obtain the key text data.
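The two-pass matching of this advanced implementation can be sketched as follows. The sketch assumes the whole-appearance pass already produced a best match and that region-of-interest similarity scores are available from a second pass of model M1; `roi_scores` and all product names are hypothetical stand-ins.

```python
# Hypothetical sketch of the advanced embodiment: if the best whole-appearance
# match belongs to the user-marked (easily confused) set, re-decide using only
# the similarity of the key region (ROI) against each marked template.
def refine_with_roi(best_name, marked_names, roi_scores):
    """best_name: winner of the whole-appearance pass.
    marked_names: names whose templates carry a user-marked key region.
    roi_scores: dict mapping each marked name to its ROI match score."""
    if best_name not in marked_names:
        return best_name  # not a confusable item: keep the first-pass result
    return max(marked_names, key=lambda name: roi_scores[name])

marked = {"Vitamin B 50 mg", "Vitamin B 100 mg"}  # similar packaging, different dosage
roi_scores = {"Vitamin B 50 mg": 0.42, "Vitamin B 100 mg": 0.88}
refined = refine_with_roi("Vitamin B 50 mg", marked, roi_scores)
```

The design mirrors the text: a cheap global match first, and the more discriminative ROI comparison only for the templates known to be confusable.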
It should be particularly noted that steps S1 to S7 of this embodiment and the flowchart of FIG. 2 merely illustrate one possible implementation of the object identification method of the present invention. It should be understood that even if steps S1 to S7 are combined, split, or reordered, as long as the resulting process achieves substantially the same effect in substantially the same way as this embodiment, it still falls within the implementable scope of the object identification method of the present invention. Therefore, steps S1 to S7 of this embodiment and the flowchart of FIG. 2 are not intended to limit the implementable scope of the present invention.
The present invention also provides a second embodiment of the object identification system 1, which differs from the first embodiment in the object identification method implemented by the object identification system 1.
Specifically, in step S2 of the object identification method of the second embodiment, the processing unit 11 selects, from the object feature data D11, the single piece of matching object feature data D11' that best matches the appearance of the object to be identified, and then generates the feature recognition result based on the object name data D21 corresponding to that piece of matching object feature data D11'. More specifically, the feature recognition result includes the object name data D21 corresponding to the matching object feature data D11'.
Furthermore, in step S5 of the second embodiment, when the processing unit 11 determines that the text information is present on the object to be identified and obtains the key text data from it (i.e., the key text data has been recognized), the processing unit 11 first selects, from the object name data D21, the single piece of matching object name data D21' that best matches the key text data, and then generates the product name recognition result based on that piece of matching object name data D21'. More specifically, the product name recognition result includes the matching object name data D21'.
Further, in step S6 of the second embodiment (i.e., when the processing unit 11 has obtained the key text data), the processing unit 11 directly takes the matching object name data D21' included in the product name recognition result as the target object name data D21*; that is, the product name recognition result governs, and the feature recognition result is not considered. On the other hand, in step S7 of the second embodiment (i.e., when the processing unit 11 cannot obtain the key text data), the processing unit 11 directly takes the object name data D21 included in the feature recognition result as the target object name data D21*.
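The second embodiment's priority rule (the product name result governs whenever key text was recognized, and the feature result is used only as a fallback) reduces to a simple selection; the example values are hypothetical.

```python
# Hypothetical sketch of the second embodiment's steps S6/S7: the name match
# decides whenever key text exists; otherwise the feature match decides.
def pick_target(key_text, name_match, feature_match):
    return name_match if key_text is not None else feature_match

with_text = pick_target("Vitality B", "Vitality B Complex", "Vitality C")
without_text = pick_target(None, "ignored", "Vitality C")
```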
The present invention also provides an embodiment of a computer program product, wherein the computer program product includes an application program that can be stored on a computer-readable recording medium and loaded and run by a computer device (such as a laptop or desktop computer), the application program including, for example, the appearance feature matching model M1, the optical character recognition model M2, and the text relevance analysis model M3. When the computer device has obtained the feature data set D1 and the product name data set D2 (for example, entered manually by a user or received from an external electronic device), loading and running the application program of the computer program product causes the computer device to implement the object identification method described in any of the foregoing implementations.
In summary, by implementing the object identification method, the object identification system 1 can consider both the appearance of an object to be identified and the text present on it to comprehensively judge which object type the object should be identified as. The object identification system 1 thus helps avoid identification errors caused by objects whose appearances are similar to one another, and can more accurately identify the object type to which each object to be identified belongs, thereby achieving the purpose of the present invention.
However, the above is merely an embodiment of the present invention and should not be used to limit the scope of its implementation; all simple equivalent changes and modifications made according to the claims and the specification of the present invention remain within the scope covered by this patent.
1: object identification system; 11: processing unit; 12: storage unit; D1: feature data set; D11: object feature data; D11': matching object feature data; D2: product name data set; D21: object name data; D21': matching object name data; D21*: target object name data; M1: appearance feature matching model; M2: optical character recognition model; M3: text relevance analysis model; 2: photographing device; S1-S7: steps
Other features and effects of the present invention will be clearly presented in the embodiments described with reference to the drawings, in which: FIG. 1 is a block diagram exemplarily showing a first embodiment of the object identification system of the present invention; and FIG. 2 is a flowchart exemplarily illustrating how the first embodiment implements an object identification method.
S1-S7: steps
Claims (7)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW112149297A TWI891171B (en) | 2023-12-18 | 2023-12-18 | Object identification method and system and computer program product |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202526832A TW202526832A (en) | 2025-07-01 |
| TWI891171B true TWI891171B (en) | 2025-07-21 |
Family
ID=97224780
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| TW112149297A TWI891171B (en) | 2023-12-18 | 2023-12-18 | Object identification method and system and computer program product |
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI891171B (en) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109767238A (en) * | 2018-12-15 | 2019-05-17 | 深圳壹账通智能科技有限公司 | Product identification method, device, device and storage medium based on image recognition |
| US20190213418A1 (en) * | 2018-01-10 | 2019-07-11 | Trax Technology Solutions Pte Ltd. | Using price in visual product recognition |
| CN116758525A (en) * | 2023-06-16 | 2023-09-15 | 齐鲁工业大学(山东省科学院) | Medicine box real-time identification method and system based on deep learning |
| TWI822261B (en) * | 2022-08-17 | 2023-11-11 | 第一商業銀行股份有限公司 | Product checkout system and method |
- 2023-12-18: TW application TW112149297A, patent TWI891171B (active)
Also Published As
| Publication number | Publication date |
|---|---|
| TW202526832A (en) | 2025-07-01 |
Similar Documents
| Publication | Title |
|---|---|
| US10949702B2 (en) | System and a method for semantic level image retrieval |
| Wei et al. | Deep learning for retail product recognition: Challenges and techniques |
| Durga et al. | A ResNet deep learning based facial recognition design for future multimedia applications |
| Xu et al. | GKNet: Grasp keypoint network for grasp candidates detection |
| US10824916B2 (en) | Weakly supervised learning for classifying images |
| CN112926700B (en) | Class identification method and device for target image |
| US9576221B2 (en) | Systems, methods, and devices for image matching and object recognition in images using template image classifiers |
| US9384619B2 (en) | Searching media content for objects specified using identifiers |
| Pan et al. | Image Augmentation-Based Food Recognition with Convolutional Neural Networks |
| US12430895B2 (en) | Method and apparatus for updating object recognition model |
| CN111242083B (en) | Text processing method, device, equipment and medium based on artificial intelligence |
| US12236699B1 (en) | Multimodal data heterogeneous transformer-based asset recognition method, system, and device |
| Varol et al. | Toward retail product recognition on grocery shelves |
| US20240303848A1 (en) | Electronic device and method for determining human height using neural networks |
| Kaur et al. | Combining weakly and webly supervised learning for classifying food images |
| CN117954045A (en) | Automatic medicine sorting management system and method based on prescription data analysis |
| Masaki et al. | Distant traffic light recognition using semantic segmentation |
| EP2023266A1 (en) | Searching media content for objects specified using identifiers |
| US8724890B2 (en) | Vision-based object detection by part-based feature synthesis |
| Nguyen et al. | CDeRSNet: Towards high performance object detection in Vietnamese document images |
| TWI891171B (en) | Object identification method and system and computer program product |
| EP3115927A1 (en) | Method and apparatus for processing a scene |
| Shabbir et al. | Variation of deep features analysis for facial expression recognition system |
| My et al. | A Comprehensive Review of Pill Image Recognition |
| Guan et al. | An improved YOLOv5 algorithm for underwater garbage recognition |