
WO2008004559A1 - Clustering system, and defect kind judging device - Google Patents

Clustering system, and defect kind judging device

Info

Publication number
WO2008004559A1
WO2008004559A1 PCT/JP2007/063325 JP2007063325W
Authority
WO
WIPO (PCT)
Prior art keywords
cluster
feature
distance
feature quantity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2007/063325
Other languages
French (fr)
Japanese (ja)
Inventor
Makoto Kurumisawa
Akio Suguro
Koji Ohnishi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AGC Inc
Original Assignee
Asahi Glass Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Asahi Glass Co Ltd filed Critical Asahi Glass Co Ltd
Priority to JP2008523694A priority Critical patent/JP5120254B2/en
Priority to CN200780025547.5A priority patent/CN101484910B/en
Publication of WO2008004559A1 publication Critical patent/WO2008004559A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • The present invention relates to a clustering system and a defect type determination device that cut out a partial image of a defect portion from an image of a detection target object, extract feature signals of the defect from the partial image, and classify the defect type.
  • Conventionally, clustering based on a distance between unknown data and learning data, for example the Mahalanobis (Mahalanobis generalized) distance, is generally performed. That is, unknown data is classified by determining whether or not it belongs to a cluster treated as a population learned in advance. For example, the determination of which population cluster the unknown data belongs to is made based on the Mahalanobis distance to each of multiple clusters (see, for example, Patent Document 1).
  • In another technique, each feature quantity used for classification is weighted so as to optimize the discrimination performed at classification time, and the cluster to which the data belongs is determined using the weighted feature quantities (see, for example, Patent Document 4).
  • Patent Document 1 Japanese Patent Laid-Open No. 2005-214682
  • Patent Document 2 Japanese Patent Laid-Open No. 2001-56861
  • Patent Document 3 Japanese Patent Application Laid-Open No. 07-105166
  • Patent Document 4 Japanese Patent Laid-Open No. 2002-99916
  • In Patent Document 4, the feature quantities are weighted based on the determination rate to improve discrimination accuracy, but there is no concept of optimizing the feature quantities for each cluster as described above. As in Patent Document 3, the feature quantities are not fully exploited, so there is a drawback that highly accurate classification is not achieved.
  • The present invention has been made in view of such circumstances, and its object is to provide a clustering system and a defect type determination device that can classify classification target data into the cluster to which it belongs faster and with higher accuracy than the conventional examples, for example classifying defects on a glass surface into clusters corresponding to defect types.
  • The present invention differs from the conventional example, in which the distance between the classification target data and each cluster is calculated with the same set of feature quantities to determine the classification destination. Instead, a set of feature quantities that brings out the differences between clusters is set for each cluster, and the distance is obtained with different feature quantities for each cluster, so classification is performed with higher accuracy. Since each feature quantity set is selected based on the characteristics of the learning data belonging to its cluster, it consists of feature quantities that distinguish that cluster from the other clusters.
  • the present invention employs the following configuration.
  • The clustering system of the present invention classifies input data into clusters, each formed from a population of learning data, according to the feature quantities (parameters) of the input data. It comprises: a feature quantity set storage unit that stores, in correspondence with each cluster, a feature quantity set, that is, a combination of feature quantities used for classification; a feature quantity extraction unit that extracts the preset feature quantities from the input data; a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates and outputs as a set distance the distance between the center of the cluster's population and the input data, based on the feature quantities included in that feature quantity set; and a rank extraction unit that arranges the set distances in ascending order.
  • In a preferred clustering system of the present invention, a plurality of the feature quantity sets are set for each cluster.
  • A preferred clustering system of the present invention classifies the input data according to classification criteria set for each cluster based on the rank of the set distances obtained for each feature quantity set.
  • The cluster classification unit detects which cluster the input data belongs to based on the rank of the set distances; the cluster having the most set distances in the higher ranks is detected as the cluster to which the input data belongs.
  • The cluster classification unit has a threshold for the number of higher-ranked set distances, and detects a cluster as the one to which the input data belongs only if its count of higher-ranked set distances is equal to or greater than the threshold.
  • The distance calculation unit multiplies each set distance by a correction coefficient set for the corresponding feature quantity set, thereby standardizing the set distances across the feature quantity sets.
  • A preferred clustering system of the present invention further includes a feature quantity set creation unit that creates the feature quantity set for each cluster. For each of a plurality of feature quantity combinations, the feature quantity set creation unit takes the average value of the learning data of the cluster's population as the origin, obtains the average distance between this origin and each learning datum of the populations of the other clusters, and selects the combination of feature quantities with the largest average distance as the feature quantity set used to distinguish that cluster from the other clusters.
  • The defect type determination device of the present invention is provided with any of the above-described clustering systems; the input data is image data of a product defect, and the defects are classified by defect type using feature quantities indicating the defect extracted from the image data.
  • the product is a glass article, and the defects of the glass article are classified by defect type.
  • The defect detection device of the present invention is provided with the above-described defect type determination device and detects the defect type of a product.
  • The manufacturing state determination device of the present invention is provided with the above-described defect type determination device, determines the type of a product defect, and detects the cause of the defect in the manufacturing process based on a correspondence between defect types and their occurrence factors.
  • A preferred manufacturing state determination device of the present invention is provided with any one of the above-described clustering systems; the input data is a set of feature quantities indicating manufacturing conditions in the manufacturing process of the product, and the data is classified by the manufacturing state of each step of the manufacturing process.
  • Preferably, the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified according to the manufacturing state of each step of the manufacturing process.
  • The manufacturing state detection device of the present invention is provided with the above-described manufacturing state determination device and detects the type of manufacturing state in each step of the manufacturing process of a product.
  • The product manufacturing management device of the present invention is provided with the above-described manufacturing state determination device, detects the type of manufacturing state in each step of the manufacturing process of a product, and performs process control in the manufacturing process based on control items corresponding to the detected type.
  • According to the present invention, an optimal combination of feature quantities, one that places each cluster far from the other clusters, is selected in advance from the plurality of feature quantities of the classification target data and stored for each cluster; the distance between the classification target data and each cluster is calculated, and the data is classified into the cluster with the smallest calculated distance, so the classification target data can be accurately classified into the corresponding cluster.
  • Furthermore, a plurality of such combinations are set for each cluster, the calculated distances between the classification target data and all the clusters are arranged in order, and the data is classified into the cluster appearing most often within a preset number of top ranks, so classification can be performed with higher accuracy than before.
  • FIG. 1 is a block diagram showing a configuration example of a clustering system according to first and second embodiments of the present invention.
  • FIG. 2 is a table explaining a process for selecting a feature quantity set based on the discrimination reference value η.
  • FIG. 3 is a table explaining a process for selecting a feature quantity set based on the discrimination reference value η.
  • FIG. 4 shows histograms illustrating the effect of the discrimination reference value η on feature quantity set selection.
  • FIG. 5 is a flowchart showing an operation example in a process of selecting a feature amount set for each cluster according to the first embodiment.
  • FIG. 6 is a flowchart showing an operation example in clustering processing for classification target data according to the first embodiment.
  • FIG. 7 is a flowchart showing an operation example for generating a rule pattern table used for clustering processing in the second embodiment.
  • FIG. 8 is a flowchart showing an operation example in clustering processing for classification target data according to the second embodiment.
  • FIG. 9 is a flowchart showing another operation example of the clustering process for the classification target data according to the second embodiment.
  • FIG. 10 is a flowchart showing an operation example in clustering processing for classification target data according to the third embodiment.
  • FIG. 12 is a flowchart showing an operation example of evaluation value calculation in the flowchart of FIG. 11.
  • FIG. 14 is a table showing learning data belonging to each cluster.
  • FIG. 16 is a conceptual diagram illustrating a method for calculating the overall corrected determination rate.
  • FIG. 17 is a result table showing the result of classifying the learning data of FIG. 14 by the clustering system in the first embodiment.
  • FIG. 18 is a result table showing the results of classifying the learning data of FIG. 14 by the clustering system in the second embodiment.
  • FIG. 19 is a result table showing the results of classifying the learning data of FIG. 14 by the clustering system in the second embodiment.
  • FIG. 20 is a block diagram showing a configuration example of an inspection apparatus using the clustering system of the present invention.
  • FIG. 21 is a flowchart showing an operation example of feature quantity set selection in the inspection apparatus of FIG.
  • FIG. 22 is a flowchart showing an operation example of clustering processing in the inspection apparatus of FIG.
  • FIG. 23 is a block diagram showing a configuration example of a defect type determination device using the clustering system of the present invention.
  • FIG. 24 is a block diagram showing a configuration example of a manufacturing management apparatus using the clustering system of the present invention.
  • FIG. 25 is a block diagram showing a configuration example of another manufacturing management apparatus using the clustering system of the present invention.
  • The clustering system of the present invention classifies the input data to be classified into clusters, each formed with learning data as its population, according to the feature quantities of the input data. A feature quantity set storage unit stores, in correspondence with each cluster, a feature quantity set that is a combination of feature quantities used for classification; a feature quantity extraction unit extracts the feature quantities from the input data based on the preset feature quantity sets; a distance calculation unit calculates, based on the feature quantities included in each feature quantity set, the distance between the cluster's population and the input data as a set distance; and a rank extraction unit arranges the set distances in ascending order and classifies the data into a cluster according to that order.
  • FIG. 1 is a block diagram showing a configuration example of the clustering system according to the embodiment.
  • the clustering system of this embodiment has a feature value set creation unit 1, a feature value extraction unit 2, a distance calculation unit 3, a feature value set storage unit 4, and a cluster database 5.
  • The feature quantity set storage unit 4 stores, in correspondence with the identification information of each cluster, a feature quantity set indicating a combination of feature quantities of the classification target data, set individually for each cluster. For example, if the classification target data is a set of feature quantities {a, b, c, d}, the feature quantity set for each cluster is set as a combination of feature quantity types such as [a, b], [a, b, c, d], or [c]. In the following description, any combination drawn from the set of feature quantities, all of them, two or more of them (in the above example, any two or three), or any single one, is referred to as a "combination of feature quantities".
  • The feature quantity set corresponding to each cluster is obtained, using learning data classified into each cluster in advance, as the combination of feature quantities for which that cluster has the largest distance from the other clusters, and is stored in the feature quantity set storage unit 4.
  • For example, the feature quantity set for cluster A is set as the combination of feature quantities that maximizes the distance between the vector of average values of each feature quantity of the learning data belonging to cluster A and the average vectors of each feature quantity of the learning data belonging to the other clusters B and C.
  • The classification target data and the learning data of the population in each cluster are composed of the same set of feature quantities.
  • Using the identification information of the cluster to be calculated as a key, the distance calculation unit 3 reads from the cluster database 5 the vector of average values of each feature quantity of that cluster's learning data and, for the feature quantity set of this cluster, calculates the distance between the feature quantity vector extracted from the classification target data and this average vector.
  • When calculating the distance, the distance calculation unit 3 eliminates differences in data units between the feature quantities and standardizes their numerical ranges; each feature quantity v(i) is normalized as follows.
  • V(i) = (v(i) − avg(i)) / std(i) … (1)
  • v(i) is the feature quantity,
  • avg(i) is the average value of the feature quantity in the learning data of the cluster to be calculated,
  • std(i) is the standard deviation of the feature quantity in the learning data of the cluster to be calculated, and
  • V(i) is the normalized feature quantity. Therefore, when calculating the distance, the distance calculation unit 3 performs this standardization of each feature quantity for each feature quantity set.
  • For each feature quantity of the classification target data used in the distance calculation, the distance calculation unit 3 performs this normalization using the average value and standard deviation of the corresponding feature quantity of the learning data.
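The standardization of equation (1) can be sketched as follows; this is an illustrative example (NumPy, with hypothetical data and variable names), not code from the patent:

```python
import numpy as np

def standardize(v, avg, std):
    """Per-cluster standardization of eq. (1): V(i) = (v(i) - avg(i)) / std(i)."""
    return (v - avg) / std

# Hypothetical per-cluster statistics from the learning data (rows: samples).
learning = np.array([[1.2, 30.0],
                     [0.9, 25.0],
                     [1.5, 40.0]])
avg = learning.mean(axis=0)          # avg(i) for each feature quantity
std = learning.std(axis=0, ddof=1)   # std(i) for each feature quantity

target = np.array([1.1, 33.0])       # classification target data
V = standardize(target, avg, std)    # normalized feature quantities V(i)
print(V)
```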
  • The Mahalanobis squared distance MHD is obtained by the following equation (2):
  • MHD = V^T · R^(−1) · V / n … (2)
  • Each element V(i) of the matrix V in equation (2) is the feature quantity obtained by normalizing the multidimensional feature quantity v(i) of the unknown data with the average value avg(i) and the standard deviation std(i) of that feature quantity in the learning data of the corresponding cluster, according to equation (1) above.
  • n is the degree of freedom; here it indicates the number of feature quantities in the feature quantity set (described later).
  • The Mahalanobis squared distance is the sum of the differences of the n transformed feature quantities; dividing by n makes the unit distance of the population average equal to 1.
  • V^T is the transposed matrix of the matrix V whose elements are the feature quantities V(i), and R^(−1) is the inverse of the correlation matrix R between the feature quantities in the learning data of the cluster.
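A minimal sketch of the distance of equation (2), assuming NumPy and illustrative data; the correlation matrix R is estimated from the cluster's learning data as described above:

```python
import numpy as np

def mahalanobis_sq(V, R_inv):
    """Eq. (2): Mahalanobis squared distance (V^T R^-1 V) / n."""
    n = len(V)                    # n = number of features in the feature set
    return float(V @ R_inv @ V) / n

# Hypothetical cluster learning data (rows: samples, cols: features).
learning = np.array([[1.2, 30.0], [0.9, 25.0], [1.5, 40.0], [1.0, 28.0]])
R = np.corrcoef(learning, rowvar=False)   # correlation matrix R
R_inv = np.linalg.inv(R)                  # R^-1

V = np.array([0.5, -0.3])   # feature quantities normalized by eq. (1)
print(mahalanobis_sq(V, R_inv))
```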
  • The feature quantity set creation unit 1 creates, for each cluster, the feature quantity set used when the distance calculation unit 3 calculates the distance between the classification target data and that cluster, and writes the result for each cluster into the feature quantity set storage unit 4 in correspondence with the cluster identification information.
  • For each cluster, the feature quantity set creation unit 1 calculates the discrimination reference value η by the following equation (3), based on the distance between the centroid vector (barycentric vector) of the learning data belonging to the target cluster and the centroid vector of the learning data belonging to all the other clusters. In the following, a combination of feature quantities is referred to as a feature quantity set.
  • η = ω_i · ω_o · (μ_i − μ_o)^2 / (ω_i · σ_i^2 + ω_o · σ_o^2) … (3)
  • μ_i is the centroid vector, the average value of the feature quantities in the feature quantity set, of the learning data belonging to the target cluster (the in-cluster population).
  • σ_i is the standard deviation of the vectors formed from the feature quantities of the learning data belonging to the in-cluster population.
  • ω_i is the ratio of the number of learning data belonging to the in-cluster population to the learning data belonging to all clusters.
  • μ_o is the centroid vector, the average value of the feature quantities in the feature quantity set, of the learning data belonging to clusters other than the target cluster (the out-of-cluster population).
  • σ_o is the standard deviation of the vectors formed from the feature quantities of the learning data belonging to the out-of-cluster population.
  • ω_o is the ratio of the number of learning data belonging to the out-of-cluster population to the learning data belonging to all clusters.
  • In equation (3), the feature quantity set creation unit 1 calculates and uses feature quantities normalized for each feature quantity according to equation (1). The ratios ω_i and ω_o may also be set in advance to numerical values that increase the separation.
  • Using equation (3), the feature quantity set creation unit 1 calculates, for each target cluster, the discrimination reference value against the other clusters for some or all combinations of the feature quantities constituting the learning data, lists the calculated values in descending order, and outputs the ordered list of discrimination reference values η.
  • The feature quantity set creation unit 1 then stores the combination of feature quantities corresponding to the largest discrimination reference value in the feature quantity set storage unit 4 as the feature quantity set of the target cluster, together with the discrimination reference value and the cluster identification information.
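As an illustrative sketch of equation (3) (NumPy; the function and variable names are assumptions, and treating the in-cluster and out-of-cluster spreads as scalar variances is a simplification):

```python
import numpy as np

def discrimination_criterion(in_cluster, out_cluster, features):
    """Eq. (3): eta = w_i*w_o*|mu_i - mu_o|^2 / (w_i*s_i^2 + w_o*s_o^2)."""
    xi = in_cluster[:, features]     # in-cluster learning data, selected features
    xo = out_cluster[:, features]    # out-of-cluster learning data
    n_i, n_o = len(xi), len(xo)
    w_i = n_i / (n_i + n_o)          # omega_i: in-cluster data ratio
    w_o = n_o / (n_i + n_o)          # omega_o: out-of-cluster data ratio
    gap = np.sum((xi.mean(axis=0) - xo.mean(axis=0)) ** 2)  # |mu_i - mu_o|^2
    return w_i * w_o * gap / (w_i * xi.var(ddof=1) + w_o * xo.var(ddof=1))

# Usage: score the combination of the 1st and 3rd feature quantities.
rng = np.random.default_rng(0)
inside = rng.normal(0.0, 1.0, size=(10, 4))    # standardized learning data
outside = rng.normal(2.0, 1.0, size=(20, 4))
print(discrimination_criterion(inside, outside, [0, 2]))
```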
  • The discrimination reference value η is computed by the feature quantity set creation unit 1 when the feature quantity set of each cluster is set. If there are four feature quantities a, b, c, and d, the discrimination reference value η is calculated for every combination: all four feature quantities, several of them, or any single one.
  • The feature quantity set creation unit 1 selects the combination with the highest numerical value, for example the combination of feature quantities b and c in FIG. 2(a).
  • As another method for determining the feature quantity set based on the reference value η, there is the BSS (backward stepwise selection) method shown in FIG. 2(b): the reference value η is first calculated using all n feature quantities included in the set of the classification target data; the discrimination reference value is then calculated for every combination of n − 1 feature quantities extracted from that set, and the combination with the maximum of these n − 1-feature discrimination reference values is selected.
  • The feature quantities are reduced one by one in this manner, each time selecting, from the reduced feature quantity set, the combination further reduced by one with the maximum discrimination reference value, and the feature quantity set creation unit 1 may be configured to select a combination that can discriminate with this reduced number of feature quantities.
  • There is also the FSS (forward stepwise selection) method: each of the n feature quantities included in the set of the classification target data is taken one at a time, the discrimination reference value η of each single feature quantity is calculated, and the feature quantity with the maximum value is selected. Next, combinations of two feature quantities, that feature quantity paired with each of the other feature quantities, are generated, the discrimination reference value of each combination is calculated, and the combination with the maximum value is selected. Then combinations of three feature quantities containing this combination are generated and their discrimination reference values are calculated.
  • In this way, starting from the immediately preceding combination with the maximum discrimination reference value, the number of feature quantities in the combination is increased by one at a time, adding a feature quantity not yet in the combination, and the discrimination reference value η of each enlarged combination is calculated; finally, from all the combinations for which the discrimination reference value was calculated, the combination with the maximum value is selected as the feature quantity set. The feature quantity set creation unit 1 may be configured in this way.
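A greedy FSS loop might look like the following sketch, reusing discrimination_criterion() from the previous example; the names and the stopping behavior (exhausting all features and keeping the best combination seen) are illustrative assumptions:

```python
import numpy as np

def forward_select(in_cluster, out_cluster, n_features):
    """Grow the combination one feature quantity at a time, keeping the
    combination with the largest criterion value seen over the whole run."""
    remaining = set(range(n_features))
    chosen = []
    best_combo, best_eta = [], -np.inf
    while remaining:
        # Score every one-feature extension of the current combination.
        eta, f = max((discrimination_criterion(in_cluster, out_cluster,
                                               chosen + [f]), f)
                     for f in remaining)
        chosen.append(f)
        remaining.remove(f)
        if eta > best_eta:
            best_eta, best_combo = eta, list(chosen)
    return best_combo, best_eta
```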
  • FIGS. 3 and 4 show the effectiveness of selecting feature quantity sets used for clustering based on the discrimination reference value.
  • FIG. 3 shows the combinations of feature quantities a and g, a and h, and d and e as candidate combinations for selecting feature quantity sets, and illustrates, with cluster 1 versus clusters 2 and 3, the selection of a feature quantity set with better classification characteristics than the conventional example.
  • In FIG. 3, μ1 corresponds to μ_i above, μ2 to μ_o, σ1 to σ_i, σ2 to σ_o, ω1 to ω_i, and ω2 to ω_o.
  • Among these, the combination of feature quantities a and h has the largest discrimination reference value η.
  • This combination is taken as the combination separating cluster 1 from the other clusters.
  • The separation in the classification results between cluster 1 and the other clusters (clusters 2 and 3) can be confirmed in FIG. 4.
  • In FIG. 4, the horizontal axis is the logarithm of the Mahalanobis distance calculated using each combination of feature quantities.
  • The vertical axis represents the number of data (histogram count) having the corresponding value.
  • A value of 1.4 on the horizontal axis means that the logarithm of the Mahalanobis distance is 1.2 or more and less than 1.4 (the bin to the left of 1.4).
  • "1.4~" indicates 1.4 or more.
  • The Mahalanobis distances in Fig. 4 are calculated for the classification target data belonging to cluster 1 and to the other clusters, using the feature quantity set corresponding to cluster 1.
  • Fig. 4 (a) shows an example of calculating the Mahalanobis distance using a combination of features a and g.
  • Fig. 4 (b) shows an example of calculating the Mahalanobis distance using a combination of features a and h.
  • Fig. 4 (c) shows an example of calculating the Mahalanobis distance using the combination of feature quantities d and e. Looking at the histograms in Fig. 4, it can be seen that when the discrimination reference value is large, cluster 1 is well separated from the other clusters.
  • FIG. 5 is a flowchart showing an operation example of the feature quantity set creation unit 1 of the clustering system according to the first embodiment
  • FIG. 6 is a flowchart showing an operation example of the clustering of the classification target data.
  • The classification target data is a set of feature quantities of scratches on a glass article.
  • These feature quantities, such as "a: length of the scratch", "b: area of the scratch", "c: width of the scratch", "d: transmittance of a predetermined region including the scratch", and "e: reflectance of a predetermined region including the scratch", are obtained from image processing and measurement results. The set of feature quantities (hereinafter referred to as the feature quantity set) is therefore {a, b, c, d, e}.
  • the distance used for clustering is calculated as the Mahalanobis distance using the standardized feature value.
  • examples of the glass article in the present embodiment include plate glass and a glass substrate for display.
  • A. Feature value set creation processing (corresponding to the flowchart in FIG. 5)
  • The user detects scratches on the glass, captures images of them to obtain image data, and performs image processing to extract feature quantities, for example measuring the length of the scratched part, thereby collecting feature quantity data consisting of the set of feature quantities. The user then sorts this feature quantity data as learning data into the clusters to be classified, based on information known in advance such as the cause and shape of each scratch, and stores the population of learning data of each cluster in the cluster database 5 from a processing terminal (not shown) in correspondence with the cluster identification information (step S1).
  • The feature quantity set creation unit 1 reads the population of learning data from the cluster database 5 in correspondence with the identification information of each cluster.
  • For each cluster, the feature quantity set creation unit 1 calculates the average value and standard deviation of each feature quantity in the cluster population, and calculates the standardized feature quantity of each learning datum from equation (1) using that average value and standard deviation.
  • The feature quantity set creation unit 1 calculates the discrimination reference value according to equation (3) for every feature quantity set, that is, for every combination of the feature quantities included in the set.
  • For each cluster, the feature quantity set creation unit 1 uses the standardized feature quantities of the in-cluster population to calculate, for each feature quantity set, the average vector (centroid vector) μ_i and the standard deviation σ_i of the learning-data vectors corresponding to that feature quantity set, and uses the standardized feature quantities of the out-of-cluster population to calculate the centroid vector μ_o and the standard deviation σ_o of the learning-data vectors corresponding to that feature quantity set, together with the ratio ω_i of the number of in-cluster learning data to the total number of learning data and the ratio ω_o of the number of out-of-cluster learning data to the total.
  • Using these centroid vectors μ_i and μ_o, standard deviations σ_i and σ_o, and ratios ω_i and ω_o, the feature quantity set creation unit 1 calculates, according to equation (3), the discrimination reference value for discriminating each cluster from the other clusters, for every combination of feature quantities in each cluster's feature quantity sets.
  • The feature quantity set creation unit 1 lists the discrimination reference values in descending order for each cluster, selects the feature quantity set corresponding to the largest value, and takes it as the feature quantity set, the combination of feature quantities, used for calculating the distance that determines membership of each cluster (step S2).
  • For use in the distance calculation by the distance calculation unit 3, the feature quantity set creation unit 1 calculates, for each feature quantity set, the correlation matrix R between the feature quantities in the in-cluster population, and the average value avg(i) and standard deviation std(i) of the feature quantities of the learning data in that population (step S3).
  • The feature quantity set creation unit 1 calculates a correction coefficient η^(−1/2) from the discrimination reference value η.
  • This correction coefficient standardizes the distances between the feature quantity sets: because the distance to the other clusters varies from cluster to cluster, the distances computed with the different feature quantity sets must be standardized to increase classification accuracy.
  • The feature quantity set creation unit 1 stores, as distance calculation data in the feature quantity set storage unit 4, the feature quantity set, the correction coefficient corresponding to the feature quantity set (in this embodiment η^(−1/2)), the inverse matrix R^(−1), the average values avg(i), and the standard deviations std(i), in association with the identification information of each cluster (step S4).
  • the feature quantity extraction unit 2 reads a feature quantity set corresponding to each cluster from the feature quantity set storage unit 4 based on the identification signal of each cluster.
  • The feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in the read feature quantity set from the classification target data, and stores the extracted feature quantities in the internal storage unit in association with each cluster's identification information (step S11).
  • Next, for each feature quantity extracted from the classification target data, the distance calculation unit 3 reads the corresponding average value avg(i) and standard deviation std(i) from the feature quantity set storage unit 4, standardizes the feature quantity by performing the calculation of equation (1), and replaces the feature quantity stored in the internal storage unit with the standardized one.
  • The distance calculation unit 3 generates the matrix V whose elements are the values V(i) obtained as described above, calculates its transposed matrix V^T, and sequentially calculates the Mahalanobis distance between the classification target data and each cluster according to equation (2), storing the results in the internal storage unit in correspondence with the identification information of each cluster (step S12).
  • The distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient η^(−1/2) corresponding to the feature quantity set to obtain a corrected distance, and replaces each Mahalanobis distance with it (step S13). When applying the correction coefficient, it may also be multiplied after taking the logarithm or square root of the Mahalanobis distance.
  • The distance calculation unit 3 compares the corrected distances of the clusters in the internal storage unit (step S14), detects the minimum corrected distance, and determines the cluster whose identification information corresponds to that distance as the classification destination; the classified data is stored in the cluster database 5 in correspondence with the identification information of that cluster (step S15).
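Putting steps S11 to S15 together, a classification pass of the first embodiment might be sketched as follows (NumPy; the dictionary layout and names are assumptions, not the patent's data structures):

```python
import numpy as np

def classify(target, clusters):
    """Assign `target` to the cluster with the smallest corrected distance.

    `clusters` maps a cluster id to its per-feature-set data:
      features: column indices of the cluster's feature quantity set
      avg, std: per-feature statistics of the cluster's learning data
      R_inv:    inverse correlation matrix of those features
      coef:     correction coefficient, e.g. eta ** -0.5
    """
    best_id, best_dist = None, np.inf
    for cid, c in clusters.items():
        v = target[c["features"]]
        V = (v - c["avg"]) / c["std"]               # eq. (1): standardize
        mhd = float(V @ c["R_inv"] @ V) / len(V)    # eq. (2): Mahalanobis^2 / n
        corrected = c["coef"] * mhd                 # step S13: corrected distance
        if corrected < best_dist:                   # step S14: minimum distance
            best_id, best_dist = cid, corrected
    return best_id, best_dist
```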
  • In the first embodiment, the feature quantity set used for clustering is described as one per cluster. However, as in the second embodiment described below, a plurality of feature quantity sets may be set for each cluster: the Mahalanobis distance corresponding to each feature quantity set is calculated, the corrected distances are computed and rearranged in ascending order, and the cluster to which the classification target data belongs is determined from the corrected distances within the top predetermined ranks, according to rules set in advance.
  • In the second embodiment, the distance calculation unit 3 detects which cluster the classification target data belongs to based on rule patterns, classification criteria set for each cluster from the ranking of the distances between the classification target data and each cluster obtained for each feature quantity set.
  • FIG. 7 is a flowchart showing an example of the pattern-learning operation over the order of distances for setting the rule patterns. FIGS. 8 and 9 are flowcharts showing examples of the clustering operation according to the second embodiment.
  • In the first embodiment, the feature quantity set creation unit 1 calculated, for each cluster, the discrimination reference value for a plurality of candidate feature quantity sets as combinations of feature quantities, and set the feature quantity set corresponding to the maximum of the obtained values as that cluster's feature quantity set.
  • In the second embodiment, the feature quantity set creation unit 1 instead takes, for each cluster, one or more pairings with the other clusters, and by selecting the feature quantity set with the maximum value for each pairing, obtains a plurality of discrimination reference values and sets a plurality of feature quantity sets for separating each cluster from the others.
  • The feature quantity set creation unit 1 obtains the distance calculation data for each feature quantity set, and stores the plurality of feature quantity sets, together with the distance calculation data of each, in the feature quantity set storage unit 4 in association with the cluster identification information.
  • Next, the feature quantity extraction unit 2 reads out the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster.
  • The feature quantity extraction unit 2 extracts, for each cluster, the feature quantities of the types in each read feature quantity set from the learning data, and stores the extracted feature quantities in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S21).
  • For each feature quantity set, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the average value avg(i) and standard deviation std(i) corresponding to each feature quantity extracted from the learning data, standardizes the feature quantities by performing the calculation of equation (1), and replaces the feature quantities stored in the internal storage unit with the standardized ones.
  • The distance calculation unit 3 generates the matrix V whose elements are the values V(i) obtained as described above, calculates its transposed matrix V^T, and sequentially calculates the Mahalanobis distance between the learning data and each cluster according to equation (2), storing the results in the internal storage unit for each feature quantity set in correspondence with the identification information of each cluster (step S22).
  • The distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient η^(−1/2) corresponding to the feature quantity set to obtain a corrected distance, and replaces each Mahalanobis distance with it (step S23).
  • The distance calculation unit 3 sorts the corrected distances of the clusters in the internal storage unit in ascending order (smaller corrected distances rank higher), that is, it arranges the cluster identification information so that clusters with smaller corrected distances to the data come first (step S24).
  • The distance calculation unit 3 detects the cluster identification information corresponding to each of the corrected distances from the smallest (highest rank) to the n-th, and counts the number of occurrences of each cluster's identification information among those n, that is, votes for each cluster.
  • The distance calculation unit 3 then detects rule patterns, count patterns of each cluster's identification information that are common to the learning data belonging to the same cluster. For example, with n set to 10, if the learning data of cluster B consistently yields a count pattern of 5 for cluster A, 3 for cluster B, and 2 for cluster C, this pattern is recorded as rule R1.
  • Also, for example, when cluster A occupies the first and second ranks from the top, the data is assigned to cluster A regardless of the counts of the other clusters, even if the count for cluster B is 8; this is recorded as rule R3.
  • In this way, the regularity of the cluster counts observed for the learning data classified into the same cluster is detected and stored internally as a rule pattern table for each cluster's identification information.
  • one rule may be set for each cluster, or a plurality of rules may be set.
  • Although the distance calculation unit 3 extracts the rule patterns here, rule patterns based on the count numbers or on the rank order may also be set arbitrarily.
  • Some clusters have characteristics similar to those of other clusters, and in such cases it can be more accurate to classify the classification target data from the relationship among multiple clusters, that is, from the count pattern of each cluster or from the pattern of ranks from the top; this embodiment addresses that point.
  • When the classification target data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster. Then, for each cluster, the feature quantity extraction unit 2 extracts from the classification target data the feature quantities of the types in each read feature quantity set, and stores the extracted feature quantities in the internal storage unit for each feature quantity set in association with each cluster's identification information (step S31).
  • For each feature quantity set, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the average value avg(i) and standard deviation std(i) corresponding to each feature quantity extracted from the classification target data, standardizes the feature quantities by performing the calculation of equation (1), and replaces the feature quantities stored in the internal storage unit with the standardized ones.
  • The distance calculation unit 3 generates the matrix V whose elements are the values V(i) obtained as described above, and calculates the Mahalanobis distance to each cluster according to equation (2) (step S32).
  • The distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient η^(−1/2) corresponding to the feature quantity set to obtain a corrected distance, and replaces each Mahalanobis distance with it (step S33).
  • The distance calculation unit 3 rearranges the corrected distances of the clusters in the internal storage unit in ascending order, that is, arranges the cluster identification information so that clusters with smaller corrected distances to the classification target data rank higher (step S34).
  • The distance calculation unit 3 detects the cluster identification information corresponding to the corrected distances from the smallest (highest rank) to the n-th, and counts the number of each cluster's identification information among those n, that is, votes for each cluster.
  • The distance calculation unit 3 then checks whether the count pattern (or arrangement pattern) of each cluster within the top n for the classification target data exists in the internally stored rule pattern table (step S35).
  • If, as a result of this collation, the distance calculation unit 3 finds a rule pattern in the table that matches the target pattern of the classification target data, it determines that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, and classifies the data into that cluster (step S36).
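A sketch of the top-n voting and rule-pattern lookup of steps S35 and S36 (pure Python; the rule-table encoding is an illustrative assumption):

```python
from collections import Counter

def top_n_votes(ranked_cluster_ids, n=10):
    """Count how often each cluster id appears in the top n ranks
    (ranked_cluster_ids is sorted by ascending corrected distance)."""
    return Counter(ranked_cluster_ids[:n])

# Hypothetical rule table learned from the training data (FIG. 7):
# a count pattern maps to the cluster the data actually belongs to.
rule_table = {
    (("A", 5), ("B", 3), ("C", 2)): "B",   # e.g. rule R1 from the text
}

def classify_by_rules(ranked_cluster_ids, n=10):
    """Match the observed count pattern against the rule table;
    return None when no rule matches (then fall back to plain voting)."""
    votes = top_n_votes(ranked_cluster_ids, n)
    pattern = tuple(sorted(votes.items()))
    return rule_table.get(pattern)

# Usage: ranked ids for one classification target datum (5 A, 3 B, 2 C).
ranked = ["A", "A", "B", "A", "C", "B", "A", "B", "C", "A"]
print(classify_by_rules(ranked))   # -> "B", matching rule R1
```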
  • In FIG. 9, the processing from step S31 to step S35 is the same as the processing shown in FIG. 8; in step S35, the distance calculation unit 3 performs the collation of the stored rule patterns against the target pattern of the classification target data, as already described.
  • The distance calculation unit 3 detects whether or not a rule pattern matching the target pattern is found in the collation result. If a matching rule pattern is found, the process proceeds to step S47; if no matching rule pattern is found, the process proceeds to step S48 (step S46).
  • In step S47, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matching rule, classifies the data accordingly, and stores it in the cluster database 5 in correspondence with the identification information of the destination cluster.
  • In step S48, on the other hand, the distance calculation unit 3 detects the identification information with the largest count, that is, the largest number of votes, classifies the classification target data into the corresponding cluster, and stores the classified data in the cluster database 5 in association with the identification information of that cluster.
  • The second embodiment described above prepares a table of rule patterns over the top n smallest (most similar) calculated distances between the classification target data and each cluster.
  • Alternatively, as in the third embodiment described below, a plurality of feature quantity sets may be set for each cluster, the Mahalanobis distance corresponding to each feature quantity set calculated, the corrected distances computed, and the cluster with the most corrected distances within the top predetermined ranks selected as the cluster to which the classification target data belongs.
  • In this case, the processing of step S48 in FIG. 9 is performed directly, without the process of setting rules from the learning data.
  • FIG. 10 is a flowchart showing an example of clustering operation in the third embodiment.
  • In FIG. 10, the processing from step S31 to step S34 is the same as the processing shown in FIG. 8; as described above, in step S34 the distance calculation unit 3 rearranges the corrected distances of the clusters in the internal storage unit in ascending order, that is, arranges the cluster identification information in ascending order of the corrected distance to the classification target data (step S34).
  • The distance calculation unit 3 detects the cluster identification information corresponding to each of the corrected distances from the smallest (highest rank) to the n-th, and counts the number of each cluster's identification information among those n; in other words, a voting process is performed for each cluster (step S55).
  • The distance calculation unit 3 detects the identification information with the largest count (number of votes) in the voting result, designates the corresponding cluster as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in correspondence with the identification information of that cluster (step S56).
  • For example, if the number of votes for the identification information of cluster A is five, that for cluster B is three, and that for cluster C is two, the distance calculation unit 3 detects the identification information of cluster A, which has the most votes.
  • A threshold may also be set for the number of votes: if the number of votes for the identification information of cluster A is below the threshold, the distance calculation unit 3 judges that the data does not belong to any cluster.
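A sketch of the third embodiment's voting with a rejection threshold (steps S55 and S56); the threshold value and names are illustrative assumptions:

```python
from collections import Counter

def classify_by_vote(ranked_cluster_ids, n=10, threshold=4):
    """Vote over the top n ranks and take the most-voted cluster;
    reject (return None) if the winner's vote count is below threshold."""
    votes = Counter(ranked_cluster_ids[:n])
    winner, count = votes.most_common(1)[0]
    return winner if count >= threshold else None   # None: no cluster

# Usage with the example from the text: A=5, B=3, C=2 votes.
ranked = ["A", "B", "A", "C", "A", "B", "A", "C", "B", "A"]
print(classify_by_vote(ranked))   # -> "A"
```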
  • Clustering is performed with the expectation that the population of each feature is a normal distribution, but depending on the type of feature (area, length, etc.), the distribution may not be a normal distribution and the population may be biased. In some cases, the calculation of the distance between the classification target data and each cluster, that is, the accuracy in determining the similarity between the classification target data and each cluster may be reduced. Therefore, depending on the feature value, it is necessary to convert the feature value of the population by a predetermined method and to improve the accuracy of similarity determination by bringing it close to the normal distribution.
  • For this purpose, the feature quantity is converted by an arithmetic expression such as a logarithm, an n-th root such as the square root or cube root, a factorial, or an expression including a function obtained by numerical calculation, to bring the distribution close to a normal distribution.
  • FIG. 11 is a flowchart showing an operation example of the setting process of the conversion method of each feature quantity.
  • This conversion method is set for each cluster in units of feature values included in the cluster.
  • This conversion method is set using learning data belonging to each cluster.
  • The following processing is described as being performed by the feature quantity set creation unit 1, although a separate processing unit corresponding to this processing may also be provided.
  • the feature value set creation unit 1 uses the identification information of the cluster to be classified as a key, reads the learning data included in this cluster from the cluster database 5, and calculates (normalizes) the feature value of each learning data (step S61). ).
  • The feature quantity set creation unit 1 performs feature quantity conversion by applying, to each of the read learning data, one of the internally stored arithmetic expressions for feature quantity conversion (step S62).
  • The feature quantity set creation unit 1 calculates an evaluation value indicating whether the distribution obtained by the conversion process is close to the normal distribution (step S63).
  • The feature quantity set creation unit 1 checks whether the evaluation value has been calculated for all the internally stored arithmetic expressions, that is, all those preset as conversion methods. If the evaluation values of the distributions obtained by converting the feature quantities with all the arithmetic expressions have been calculated, the process proceeds to step S65; otherwise, the process returns to step S62 to process the next arithmetic expression (step S64).
  • Among the distributions obtained with the set arithmetic expressions, the feature quantity set creation unit 1 detects the one with the smallest evaluation value, that is, the distribution closest to the normal distribution, determines the arithmetic expression that produced it as the conversion method, and sets it internally as the conversion method for that feature quantity of the cluster (step S65).
  • the feature quantity set creation unit 1 performs the above-described processing for each feature quantity of each cluster, and sets a conversion method corresponding to each feature quantity in each cluster.
  • Next, the calculation of the evaluation value in step S63 will be described with reference to FIG. 12, a flowchart explaining an example of the processing for obtaining the evaluation value of the distribution produced by an arithmetic expression.
  • The feature quantity set creation unit 1 converts the feature quantity of each learning datum belonging to the target cluster by the set arithmetic expression (step S71).
  • The feature quantity set creation unit 1 calculates the average value μ and standard deviation σ of the distribution (population) obtained from the converted feature quantities (step S72).
  • Using this average value μ and standard deviation σ of the population, the feature quantity set creation unit 1 calculates the z value (1) as (x − μ)/σ (step S73).
  • The feature quantity set creation unit 1 calculates the cumulative probability of each value in the population (step S74).
  • Based on the calculated cumulative probability in the population, the feature quantity set creation unit 1 calculates the z value (2) as the value of the inverse of the cumulative distribution function of the standard normal distribution (step S75).
  • The feature quantity set creation unit 1 calculates the difference between the z value (1) and the z value (2), that is, the error between the two z values, over the feature quantity distribution (step S76).
  • The feature quantity set creation unit 1 calculates the sum of the squared errors between the two z values as the evaluation value (step S77).
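Steps S71 to S77 amount to comparing the sample's z-scores with standard-normal quantiles (a Q-Q-style squared-error sum). A sketch, assuming SciPy and the common (r − 0.5)/n plotting positions for the cumulative probability:

```python
import numpy as np
from scipy.stats import norm

def normality_evaluation(values):
    """Sum of squared differences between the sample z-scores (z value (1))
    and the standard-normal quantiles of the empirical cumulative
    probabilities (z value (2)); smaller = closer to a normal distribution."""
    x = np.sort(np.asarray(values, dtype=float))
    mu, sigma = x.mean(), x.std(ddof=1)
    z1 = (x - mu) / sigma                      # z value (1), step S73
    ranks = np.arange(1, len(x) + 1)
    cum_prob = (ranks - 0.5) / len(x)          # cumulative probability, step S74
    z2 = norm.ppf(cum_prob)                    # z value (2), step S75
    return float(np.sum((z1 - z2) ** 2))       # squared-error sum, steps S76-S77

# Compare candidate conversions and keep the one with the smaller value.
data = np.random.lognormal(mean=0.0, sigma=0.6, size=200)
print(normality_evaluation(data), normality_evaluation(np.log(data)))
```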
  • FIG. 13 is a flowchart showing an operation example of calculating feature amount data of classification target data.
  • The distance calculation unit 3 extracts the feature quantities to be identified from the input classification target data in correspondence with the feature quantity set of each cluster, and performs the normalization process already described (step S81).
  • In the classification target data, the distance calculation unit 3 converts the feature quantities used for classification into the target cluster according to the conversion method (arithmetic expression) set for the feature quantities of that cluster (step S82).
  • The distance calculation unit 3 calculates the distance to the cluster to be classified (step S83).
  • The distance calculation unit 3 checks whether the feature quantities have been converted, by the conversion method set for each cluster's feature quantities, and the distances calculated for all the clusters to be classified. If so, the process proceeds to step S85; if clusters to be classified remain, the process returns to step S82 (step S84). Then, in each of the first to third embodiments, the processing that follows completion of the distance calculation is started (step S85).
  • The Mahalanobis distance used in this embodiment assumes, when calculating the distance between the target data and each cluster, that each feature quantity follows a normal distribution. Therefore, the closer the distribution of each feature quantity of the population is to a normal distribution, the more accurate the distance (similarity) to each cluster becomes, and the better the classification accuracy for each cluster can be expected to be.
• Fig. 15 shows the determination results obtained by the conventional calculation method, using feature quantities a and g as the combination of feature quantities and calculating the Mahalanobis distance for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3.
• The cluster 1 column is the Mahalanobis distance to cluster 1, the cluster 2 column is the Mahalanobis distance to cluster 2, and the cluster 3 column is the Mahalanobis distance to cluster 3.
• The category column indicates the cluster to which each piece of learning data actually belongs, and the determination result indicates the cluster with the smallest Mahalanobis distance from the learning data.
• The learning data whose numbers in the category and the determination result match are the data classified correctly.
• The column number indicates the cluster to which the learning data actually belongs, and the row number indicates the determined cluster.
• The "8" at mark R1 indicates that 8 of the 10 learning data in cluster 1 were determined to be cluster 1, and the "2" at mark R2 indicates that 2 of the 10 learning data in cluster 1 were determined to be cluster 3.
• p0 indicates the match rate between the correct answers and the determined answers, and p1 indicates the probability that the two coincide by chance.
• κ is the overall correction determination rate, which can be obtained by the following formula; the higher κ is, the higher the classification accuracy:
• κ = (p0 − p1) / (1 − p1)
• a is the number of data belonging to cluster 1 that are classified as cluster 1, and b is the number of data belonging to cluster 1 that are classified as cluster 2, so a + b indicates the number of data belonging to cluster 1.
• d is the number of data belonging to cluster 2 that are classified as cluster 2, and c is the number of data belonging to cluster 2 that are classified as cluster 1, so c + d indicates the number of data belonging to cluster 2.
• a + c is the number classified as cluster 1 among all the data a + b + c + d, and b + d is the number classified as cluster 2 among all the data a + b + c + d. A computational sketch of p0, p1, and κ from these counts is given below.
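As a concrete illustration, the following is a minimal sketch of how p0, p1, and the overall correction determination rate (the formula above coincides with Cohen's kappa) can be computed from a confusion matrix; the counts a, b, c, and d in the example are hypothetical, not values from the figures.

```python
import numpy as np

def correction_determination_rate(confusion):
    """p0, p1, and kappa from a confusion matrix whose rows are the
    true clusters and whose columns are the determined clusters."""
    m = np.asarray(confusion, dtype=float)
    total = m.sum()
    p0 = np.trace(m) / total                                       # match rate of correct answers
    p1 = float((m.sum(axis=1) / total) @ (m.sum(axis=0) / total))  # chance agreement
    return p0, p1, (p0 - p1) / (1 - p1)

# hypothetical two-cluster counts a, b, c, d
a, b, c, d = 8, 2, 1, 9
print(correction_determination_rate([[a, b], [c, d]]))  # -> (0.85, 0.5, 0.7)
```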
• Fig. 17 shows the determination results obtained by calculating the Mahalanobis distance for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3, using the calculation method of the first embodiment.
• The way of viewing Figs. 17(a) and (b) is the same as for Fig. 15, and their description is therefore omitted.
• The correct answer rate p0, the coincidence probability p1, and the overall correction determination rate κ are equivalent to those of the conventional calculation method in Fig. 15.
• Here, the feature quantity set corresponding to each cluster was calculated by the method of selecting, from all the combinations described above, the combination having the maximum discriminant reference value λ for each cluster.
• The feature quantity set corresponding to cluster 1 is the combination of feature quantities a and h, that corresponding to cluster 2 is the combination of feature quantities a and d, and that corresponding to cluster 3 is the combination of feature quantities a and g.
• Fig. 18 shows the determination results obtained by calculating the Mahalanobis distance for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3, using the calculation method of the second embodiment.
• The way of viewing Figs. 18(a) and 18(b) is the same as for Fig. 15, and their description is omitted.
• The correct answer rate p0 is 0.8333, the coincidence probability p1 is 0.3333, and the overall correction determination rate κ is 0.75, so the classification accuracy is improved compared with the conventional calculation method of Fig. 15.
• Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations described above, the combinations having the first to third highest discriminant reference values λ for each cluster.
• Three combinations of feature quantities a·h, a·g, and d·e were used as the feature quantity set corresponding to cluster 1; three combinations of feature quantities a·f, a·d, and a·b as the feature quantity set corresponding to cluster 2; and three combinations of feature quantities e·g, a·c, and a·g as the feature quantity set corresponding to cluster 3.
• Fig. 19 shows the determination results obtained by using the calculation method of the second embodiment: the Mahalanobis distance is calculated for each piece of learning data shown in Fig. 14, for cluster 1 to cluster 3, the calculated Mahalanobis distance is further multiplied by the correction coefficient (λ)^(-1/2), and the distances are then ranked.
• The way of viewing Figs. 19(a) and 19(b) is the same as for Fig. 15.
• The correct answer rate p0 is 0.8333, the coincidence probability p1 is 0.3333, and the overall correction determination rate κ is 0.75, so the classification accuracy is improved compared with the conventional calculation method of Fig. 15.
• Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations described above, the combinations having the first to third highest discriminant reference values λ for each cluster.
• Three combinations of feature quantities a·h, a·g, and d·e were used as the feature quantity set corresponding to cluster 1; three combinations of feature quantities a·f, a·d, and a·b as the feature quantity set corresponding to cluster 2; and three combinations of feature quantities e·g, a·c, and a·g as the feature quantity set corresponding to cluster 3. The Mahalanobis distances were then used for voting: the number of times each cluster appears among the first to third smallest distances is counted, and the cluster with the largest count is taken as the cluster to which the classification target data belongs (a sketch of this voting step follows below).
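A minimal sketch of the voting step, including the optional correction by (λ)^(-1/2) mentioned for Fig. 19; the cluster identifiers and distances in the example are hypothetical placeholders.

```python
from collections import Counter

def vote_cluster(set_distances, lambdas=None, top=3):
    """set_distances: (cluster_id, distance) pairs, one per feature
    quantity set. If `lambdas` gives the discriminant reference value
    of each pair's feature set, each distance is first multiplied by
    lambda ** -0.5 to standardize distances between feature sets."""
    if lambdas is not None:
        set_distances = [(cid, d * lam ** -0.5)
                         for (cid, d), lam in zip(set_distances, lambdas)]
    ranked = sorted(set_distances, key=lambda cd: cd[1])[:top]
    return Counter(cid for cid, _ in ranked).most_common(1)[0][0]

# hypothetical distances: three clusters x three feature sets each
dists = [("c1", 0.8), ("c1", 1.1), ("c1", 2.0),
         ("c2", 0.9), ("c2", 3.0), ("c2", 3.5),
         ("c3", 1.5), ("c3", 2.5), ("c3", 4.0)]
print(vote_cluster(dists))  # -> "c1" (two of the three smallest distances)
```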
• Fig. 21 is a flowchart explaining an operation example of selecting a feature quantity set, and Fig. 22 is a flowchart explaining an operation example of the clustering process.
• Step S1 in the flowchart of Fig. 5 corresponds to steps S101 to S105 in the flowchart of Fig. 21, and steps S2 to S4 in Fig. 21 are the same as those in the flowchart of Fig. 5.
• A flaw to be collected as learning data is illuminated by the illumination device 102 of the image acquisition unit 101, and image data of the flaw portion is acquired by the imaging device 103 (step S102). Then, the flaw feature quantities of each piece of learning data are calculated from the image data acquired by the image acquisition unit 101 (step S103).
• Next, the feature quantities of the obtained learning data are assigned to classification destinations determined by visual inspection, and the learning data of each cluster are specified (step S104).
• The processing from step S101 to step S102 is repeated until the learning data of each cluster reaches a predetermined number (a preset number of samples), for example, about 300 pieces each.
• Thereafter, the clustering unit 105 performs the processing from step S2 onward.
• The clustering unit 105 is the clustering system of the first or second embodiment.
• Steps S31 to S34, S55, and S56 in Fig. 22 are the same as those in the flowcharts described earlier, and their description is omitted.
• The illumination device 102 illuminates the glass substrate that is the object to be inspected 100, and the imaging device 103 captures the surface of the glass substrate and outputs the captured image to the image acquisition unit 101.
• When the defect candidate detection unit 104 detects a portion differing from the flat surface shape in the captured image input from the image acquisition unit 101, that portion is set as a defect candidate to be classified (step S201).
• The defect candidate detection unit 104 then cuts out the image data of the defect candidate portion from the captured image as classification target data.
• The defect candidate detection unit 104 calculates the feature quantities from the image data of the classification target data, and outputs the classification target data including the extracted feature quantity set to the clustering unit 105 (step S202); a toy sketch of these two steps follows below.
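A toy sketch of steps S201 and S202, assuming the captured image is a grayscale numpy array in which a pixel deviating strongly from the flat-surface intensity marks a defect candidate; the threshold values and the three features computed are illustrative only, not the feature quantities fixed by the specification.

```python
import numpy as np
from scipy import ndimage

def detect_and_describe(image, background=200.0, tol=30.0):
    """Step S201: flag portions differing from the flat surface and
    label each connected region as a defect candidate. Step S202:
    cut out each candidate and compute illustrative feature quantities."""
    mask = np.abs(image.astype(float) - background) > tol
    labels, _ = ndimage.label(mask)
    candidates = []
    for region in ndimage.find_objects(labels):
        h = region[0].stop - region[0].start
        w = region[1].stop - region[1].start
        candidates.append({
            "length": float(max(h, w)),         # cf. scratch length
            "width": float(min(h, w)),          # cf. scratch width
            "area": float(mask[region].sum()),  # cf. scratch area
        })
    return candidates
```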
• As described above, the inspection apparatus of the present invention can classify scratches on a glass substrate with high accuracy by scratch type.
• In the defect type determination device shown in Fig. 23, the clustering unit 105 corresponds to the clustering system of the present invention already described.
• The image acquisition device 201 includes the image acquisition unit 101, the illumination device 102, and the imaging device 103 of Fig. 20.
• The learning data of each cluster into which the classification target data is classified have already been acquired and prepared in the cluster database 5 of the clustering unit 105; the feature quantity set selection of Fig. 5 has therefore also been completed.
• A defect candidate is detected from the captured image input from the image acquisition device 202 attached to each manufacturing device; its image data is cut out, and the feature quantities are extracted and output to the data collection device 203.
• The control device 200 transfers the classification target data input to the data collection device 203 to the clustering unit 105. As described above, the clustering unit 105 classifies the input classification target data into the clusters corresponding to the scratch types.
• The manufacturing management apparatus of the present invention is composed of a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, a defective device determination unit 305, and a defect type determination device 306.
• The defect type determination device 306 is the same as the defect type determination device described in section B above.
• The defect type determination device 306 performs image processing on the captured images from the image acquisition devices 201 and 202 provided in the manufacturing devices 301 and 302, respectively, extracts feature quantities in the corresponding defect candidate detection unit 104, and classifies the classification target data.
• The defective device determination unit 305 has a table indicating the relationship between the identification information of each classified cluster and the generation factor corresponding to that cluster; based on the cluster identification information input from the defect type determination device 306, it reads the corresponding generation factor from the table and determines the manufacturing device that is the cause. That is, the defective device determination unit 305 detects the cause of the defect in the product manufacturing process according to the cluster identification information.
• The defective device determination unit 305 notifies the operator through the notification unit 303, and stores in the recording unit 304, in association with the date and time of the determination, the identification number of the cluster into which the defect was classified, the generation factor, and the identification information of the manufacturing device, as a history. Further, the control device 300 stops the manufacturing device determined by the defective device determination unit 305 or adjusts its control parameters.
• Another manufacturing management apparatus of the present invention includes a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, and a clustering unit 105.
• The clustering unit 105 has the same configuration as that described in sections A and B above.
• The feature quantities of the classification target data are manufacturing conditions (material quantity, processing temperature, pressure, processing speed, etc.) in the manufacturing process of an industrial product, for example a glass substrate, and are classified according to the manufacturing state of each stage of the manufacturing process.
• These feature quantities are input to the clustering unit 105 as process information detected by sensors provided in the respective manufacturing devices 301 and 302.
• According to the feature quantities of the classification target data, the clustering unit 105 classifies the manufacturing state of each process of the glass manufacturing process into clusters such as "normal state", "state in which defects are likely to occur", and "dangerous state requiring adjustment". Then, the clustering unit 105 notifies the operator of the classification result through the notification unit 303, outputs the cluster identification information of the classification result to the control device 300, and stores in the recording unit 304, in association with the date and time of the determination, the identification number of the cluster into which the manufacturing state of each process was classified, the manufacturing condition that is the most problematic feature quantity, and the identification information of the manufacturing device, as a history.
• The control device 300 has a table indicating the correspondence between the cluster identification information and the adjustment items and data for returning the manufacturing conditions to normal; it reads the adjustment items and data corresponding to the cluster identification information input from the clustering unit 105, and controls the corresponding manufacturing device with the read data (a sketch of this lookup follows below).
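As a concrete illustration, such a table lookup might be realized as follows; the cluster identifiers, adjustment items, and set points are hypothetical placeholders, and `set_parameter` is an assumed device interface, none of which come from the specification.

```python
# hypothetical table: cluster id -> adjustment items and their data
ADJUSTMENT_TABLE = {
    "defects_likely": [("processing_temperature", 620.0)],
    "dangerous":      [("processing_speed", 0.8), ("pressure", 1.2)],
}

def adjust_manufacturing_device(cluster_id, device):
    """Read the adjustment items for the classified cluster and control
    the corresponding manufacturing device with the read data (sketch)."""
    for item, value in ADJUSTMENT_TABLE.get(cluster_id, []):
        device.set_parameter(item, value)  # assumed device interface
```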
• A program for realizing the functions of the clustering system in Fig. 1 may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed to perform the clustering processing of the classification target data.
• The "computer system" here includes an OS and hardware such as peripheral devices.
• The "computer system" also includes a WWW system equipped with a homepage providing environment (or display environment).
• The "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into the computer system.
• The "computer-readable recording medium" also includes a medium that holds the program for a certain period of time, such as a volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
• The program may be transmitted from a computer system storing it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium.
• The "transmission medium" for transmitting a program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line.
• The program may realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program) that realizes the above-mentioned functions in combination with a program already recorded in the computer system.
• The present invention can be applied to fields in which information having various kinds of features is classified and discriminated with high accuracy, such as the detection of defects in glass articles, and can also be used for manufacturing state detection devices and product manufacturing management devices. The entire contents of the specification, claims, drawings, and abstract of Japanese Patent Application No. 2006-186628 filed on July 6, 2006 are incorporated herein by reference.


Abstract

Provided is a clustering system capable of classifying target data more rapidly and precisely than prior-art examples. The clustering system classifies input data into clusters formed from populations of learning data according to the feature quantities of the input data. The clustering system comprises: a feature quantity set storage unit that stores, in correspondence with each cluster, feature quantity sets, i.e. combinations of feature quantities to be used for classification; a feature quantity extraction unit that extracts preset feature quantities from the input data; a distance calculation unit that calculates and outputs, for each feature quantity set corresponding to each cluster, the distance between the center of the population of the cluster and the input data as a set distance, based on the feature quantities contained in the feature quantity set; and a rank extraction unit that arranges the set distances in ascending order.

Description

Specification

Clustering System and Defect Type Determination Device

Technical Field

[0001] The present invention relates to a clustering system and a defect type determination device that cut out a partial image of a defect portion from an image of an object under inspection, extract feature signals of the defect from the partial image, and classify the type of the defect.

Background Art

[0002] Clustering techniques based on the distance between unknown data and learning data, for example the Mahalanobis generalized distance, have conventionally been in common use. That is, unknown data is classified, and clustering processing performed, by determining whether the data belongs to a cluster serving as a population learned in advance. For example, which population's cluster the unknown data belongs to is determined from the magnitudes of the Mahalanobis distances to a plurality of clusters (see, for example, Patent Document 1).

In addition, in order to calculate the above-described distance efficiently, clustering processing is performed by selecting a plurality of feature quantities.

[0003] A technique of determining the cluster to which unknown data belongs by voting on the results obtained from a large number of classifiers is also common; the identification results of the outputs of different sensors, or the identification results obtained when identifying unknown data in different regions of one image, are used (see, for example, Patent Document 2).

With the above clustering technique, in diagnosing disease from parameters obtained from a blood test, that is, clustering according to which disease the data belongs to, there is a method in which pairs of clusters are formed from the plurality of clusters, it is determined for every pair which of the two clusters the test data is judged to resemble, and the data is classified into the cluster judged most often according to the tally of those judgments (see, for example, Patent Document 3).

[0004] When classifying each defect on an LCD glass substrate into preset defect types, clustering is performed in which each feature quantity used for classification is optimized in accordance with the identification performed at classification, each feature quantity is weighted to correspond to this optimization, and the cluster to which a defect belongs is determined using the optimized feature quantities (see, for example, Patent Document 4).

Patent Document 1: Japanese Patent Laid-Open No. 2005-214682
Patent Document 2: Japanese Patent Laid-Open No. 2001-56861
Patent Document 3: Japanese Patent Laid-Open No. 07-105166
Patent Document 4: Japanese Patent Laid-Open No. 2002-99916

Disclosure of the Invention

Problems to Be Solved by the Invention

[0005] However, in the clustering shown in Patent Document 3, the individual combinations are not optimized, so the feature quantities serving as discrimination material are not fully exploited; moreover, as the number of clusters to be discriminated grows, the number of combinations becomes enormous and the time required for the determination processing increases.

Also, in the clustering shown in Patent Document 4, the feature quantities are weighted based on the determination rate in an attempt to improve discrimination accuracy, but there is no concept of optimizing the feature quantities for each cluster; as in Patent Document 3 above, the feature quantities are not fully exploited, so highly accurate classification is not achieved.

[0006] The present invention has been made in view of such circumstances, and provides a clustering system and a defect type determination device that exploit the feature quantities extracted from the classification target data when determining the cluster to which the data belongs, and that classify the classification target data faster and more accurately than the conventional examples, for example classifying defects on a glass surface into clusters corresponding to defect types.

Means for Solving the Problems

[0007] To solve the above problems, the present invention, unlike the conventional example in which the distance between the classification target data and each cluster is calculated with the same kinds of feature quantities to determine the classification destination, sets for each cluster a set of feature quantities that can yield a difference between clusters, and obtains the distance to each cluster with different feature quantities, so that classification is performed with higher accuracy than before. Since the above feature quantity set is determined based on the characteristics of the learning data belonging to each cluster, it is composed of feature quantities that allow the cluster to be distinguished from the other clusters.

That is, the present invention employs the following configurations.

[0008] The clustering system of the present invention classifies input data into each of the clusters formed by populations of learning data, according to the feature quantities (parameters) of the input data, and comprises: a feature quantity set storage unit in which a feature quantity set (parameter set), a combination of feature quantities used for classification, is stored in correspondence with each cluster; a feature quantity extraction unit that extracts preset feature quantities from the input data; a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates and outputs, as a set distance, the distance between the center of the population of the cluster and the input data, based on the feature quantities included in the feature quantity set; and a rank extraction unit that arranges the set distances in ascending order.

[0009] In a preferred clustering system of the present invention, a plurality of the feature quantity sets are set for each cluster.

[0010] A preferred clustering system of the present invention further comprises a cluster classification unit that detects which cluster the input data belongs to by means of a rule pattern indicating classification criteria for each cluster, set based on the ranking of the set distances obtained for each feature quantity set.

[0011] In a preferred clustering system of the present invention, the cluster classification unit detects which cluster the input data belongs to from the ranking of the set distances, and detects the cluster having the most highly ranked set distances as the cluster to which the input data belongs.

[0012] In a preferred clustering system of the present invention, the cluster classification unit has a threshold for the number of highly ranked set distances, and detects a cluster as the one to which the input data belongs if its number of highly ranked set distances is equal to or greater than the threshold.

[0013] In a preferred clustering system of the present invention, the distance calculation unit multiplies each set distance by a correction coefficient set in correspondence with the feature quantity set, thereby standardizing the set distances between feature quantity sets.

[0014] A preferred clustering system of the present invention further comprises a feature quantity set creation unit that creates the feature quantity set for each cluster; for each of a plurality of combinations of the feature quantities, the feature quantity set creation unit takes the average value of the learning data of the population of each cluster as the origin, obtains the average value of the distances between this origin and each piece of learning data of the populations of the other clusters, and selects the combination of feature quantities giving the largest average value as the feature quantity set used to distinguish that cluster from the other clusters.

[0015] A defect type determination device of the present invention is provided with any one of the clustering systems described above; the input data is image data of a product defect, and the defects in the image data are classified by defect type according to the feature quantities indicating the defect.

In a preferred defect type determination device of the present invention, the product is a glass article, and the defects of the glass article are classified by defect type.

[0016] A defect detection device of the present invention is provided with the above defect type determination device and detects the type of a product defect.

[0017] A manufacturing state determination device of the present invention is provided with the defect type determination device described above, determines the type of a product defect, and detects the cause of the defect in the manufacturing process based on the correspondence between the type and its generation factor.

[0018] A preferred manufacturing state determination device of the present invention is provided with any one of the clustering systems described above; the input data are feature quantities indicating the manufacturing conditions in the manufacturing process of the product, and these feature quantities are classified according to the manufacturing state of each step of the manufacturing process.

In a preferred manufacturing state determination device of the present invention, the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified according to the manufacturing state of each step of the manufacturing process.

[0019] A manufacturing state detection device of the present invention is provided with the manufacturing state determination device described above and detects the type of manufacturing state in each step of the manufacturing process of a product.

[0020] A product manufacturing management device of the present invention is provided with the manufacturing state determination device described above, detects the type of manufacturing state in each step of the manufacturing process of a product, and performs process control in the steps of the manufacturing process based on control items corresponding to the type.

Effects of the Invention

[0021] As described above, according to the present invention, for each classification-destination cluster, the optimal combination of feature quantities that maximizes the distance to the other clusters is set in advance from the plurality of feature quantities of the classification target data, the distance between the classification target data and each cluster is calculated, and the classification target data is classified into the cluster with the smallest calculated distance; the classification target data can therefore be classified into the corresponding cluster more accurately than by conventional techniques.

Also, according to the present invention, a plurality of the above combinations are set for each cluster, the calculated distances between the classification target data and all the clusters are arranged in ascending order, and the classification target data is classified into the cluster appearing most often within a preset number of top ranks, so classification with higher accuracy than before can be performed.

Brief Description of the Drawings

[0022]
[Fig. 1] A block diagram showing a configuration example of the clustering system according to the first and second embodiments of the present invention.
[Fig. 2] A table explaining the process of selecting a feature set based on the discriminant reference value λ.
[Fig. 3] A table explaining the process of selecting a feature set based on the discriminant reference value λ.
[Fig. 4] A diagram showing histograms explaining the effect of the discriminant reference value λ on feature set selection.
[Fig. 5] A flowchart showing an operation example of the process of selecting a feature quantity set for each cluster according to the first embodiment.
[Fig. 6] A flowchart showing an operation example of the clustering process for classification target data according to the first embodiment.
[Fig. 7] A flowchart showing an operation example of generating a rule pattern table used for the clustering process in the second embodiment.
[Fig. 8] A flowchart showing an operation example of the clustering process for classification target data according to the second embodiment.
[Fig. 9] A flowchart showing an operation example of another clustering process for classification target data according to the second embodiment.
[Fig. 10] A flowchart showing an operation example of the clustering process for classification target data according to the third embodiment.
[Fig. 11] A flowchart showing an operation example of setting an arithmetic expression as a feature quantity conversion method.
[Fig. 12] A flowchart showing an operation example of the evaluation value calculation in the flowchart of Fig. 11.
[Fig. 13] A flowchart showing an operation example of distance calculation using feature quantities converted by the set conversion method.
[Fig. 14] A table showing the learning data belonging to each cluster.
[Fig. 15] A result table showing the result of classifying the learning data of Fig. 14 by the conventional clustering method.
[Fig. 16] A conceptual diagram explaining the method of calculating the overall correction determination rate.
[Fig. 17] A result table showing the result of classifying the learning data of Fig. 14 by the clustering system of the first embodiment.
[Fig. 18] A result table showing the result of classifying the learning data of Fig. 14 by the clustering system of the second embodiment.
[Fig. 19] A result table showing the result of classifying the learning data of Fig. 14 by the clustering system of the second embodiment.
[Fig. 20] A block diagram showing a configuration example of an inspection apparatus using the clustering system of the present invention.
[Fig. 21] A flowchart showing an operation example of feature quantity set selection in the inspection apparatus of Fig. 20.
[Fig. 22] A flowchart showing an operation example of the clustering process in the inspection apparatus of Fig. 20.
[Fig. 23] A block diagram showing a configuration example of a defect type determination device using the clustering system of the present invention.
[Fig. 24] A block diagram showing a configuration example of a manufacturing management apparatus using the clustering system of the present invention.
[Fig. 25] A block diagram showing a configuration example of another manufacturing management apparatus using the clustering system of the present invention.

Explanation of Reference Numerals

1 ... feature quantity set creation unit
2 ... feature quantity extraction unit
3 ... distance calculation unit
4 ... feature quantity set storage unit
5 ... cluster database
100 ... object to be inspected
101 ... image acquisition unit
102 ... illumination device
103 ... imaging device
104 ... defect candidate detection unit
105 ... clustering unit
200, 300 ... control device
201, 202 ... image acquisition device
301, 302 ... manufacturing device
303 ... notification unit
304 ... recording unit

Best Mode for Carrying Out the Invention

The clustering system of the present invention classifies input data to be classified into clusters, each formed from a population of learning data, according to the feature quantities of the input data. It has a feature quantity set storage unit in which a feature quantity set, a combination of feature quantities used for classification, is stored in correspondence with each cluster; a feature quantity extraction unit extracts feature quantities from the input data based on the preset feature quantity sets; a distance calculation unit calculates, for each feature quantity set corresponding to each cluster, the distance between the population and the input data as a set distance, based on the feature quantities included in that set; and a rank extraction unit arranges the set distances in ascending order and performs classification into clusters according to that order.

[0025] <First Embodiment>

Hereinafter, the clustering system according to the first embodiment of the present invention will be described with reference to the drawings. Fig. 1 is a block diagram showing a configuration example of the clustering system according to this embodiment.

As shown in Fig. 1, the clustering system of this embodiment has a feature quantity set creation unit 1, a feature quantity extraction unit 2, a distance calculation unit 3, a feature quantity set storage unit 4, and a cluster database 5.

The feature quantity set storage unit 4 stores, in correspondence with the identification information of each cluster, a feature quantity set indicating a combination of feature quantities of the classification target data, set individually for each cluster. For example, when the classification target data is a set of feature quantities {a, b, c, d}, the feature quantity set of each cluster is set as a combination of feature quantities such as [a, b], [a, b, c, d], or [c]. In the following description, any of the following taken from the set of feature quantities is defined as a "combination of feature quantities": all the feature quantities, several of them (in the above example, any two or three feature quantities), or a single one.

[0026] Here, when clusters A, B, and C are set as the classification-destination clusters, the feature quantity set corresponding to each cluster is obtained, using the learning data classified into each cluster in advance, as the combination of feature quantities that maximizes the distance between that cluster and the other clusters, and is stored in the feature quantity set storage unit 4.

For example, the feature quantity set for cluster A is set as the combination of feature quantities that maximizes the distance between the vector formed by the average values of the feature quantities of the learning data belonging to cluster A and the vector formed by the average values of the feature quantities of the learning data belonging to the other clusters B and C.

The classification target data and the learning data of the population of each cluster are composed of the same set of feature quantities.

[0027] When calculating the distance between the input classification target data and each cluster, the feature quantity extraction unit 2 reads the feature quantity set corresponding to the cluster under calculation from the feature quantity set storage unit 4, extracts the feature quantities corresponding to this feature quantity set from the plural feature quantities of the classification target data, and outputs the extracted feature quantities to the distance calculation unit 3.

The distance calculation unit 3 reads from the cluster database 5, using the identification information of the cluster under calculation as a key, the vector formed by the average values of the feature quantities of that cluster's learning data, and, based on that cluster's feature quantity set, calculates the distance between the vector of feature quantities extracted from the classification target data and the vector of average values of the feature quantities of the learning data (the centroid vector indicating the centroid position of the plural learning data in the cluster).

[0028] When calculating the distance, the distance calculation unit 3 normalizes each feature quantity v(i) of the classification target data by the following equation (1), in order to eliminate differences in data units between feature quantities and to standardize the numerical values across feature quantities.

V(i) = (v(i) − avg(i)) / std(i)   … (1)

Here, v(i) is the feature quantity, avg(i) is the average value of that feature quantity in the learning data of the cluster under calculation, std(i) is the standard deviation of that feature quantity in the learning data of the cluster under calculation, and V(i) is the normalized feature quantity. Therefore, when calculating a distance, the distance calculation unit 3 must perform this standardization of each feature quantity for each feature quantity set.

The distance calculation unit 3 performs this normalization process for each feature quantity used in the distance calculation of the classification target data, using the average value and standard deviation of the corresponding feature quantity of the learning data.

[0029] As the distance, any of the standardized Euclidean distance using the standardized feature quantities described above, the Mahalanobis distance, the Minkowski distance, and so on may be used.

When the Mahalanobis distance is used, the Mahalanobis squared distance MHD is obtained by the following equation (2).

MHD = (1/n) · (V^T R^(−1) V)   … (2)

Each element V(i) of the matrix V in equation (2) is the feature quantity obtained by the above equation (1) from the multidimensional feature quantity v(i) of the unknown data, using the average value avg(i) and the standard deviation std(i) of the corresponding feature quantity of the learning data in the cluster concerned. n is the number of degrees of freedom; in this embodiment it denotes the number of feature quantities in the feature quantity set (described later).

Thus, the Mahalanobis squared distance is a value obtained by adding up the contributions of the n converted feature quantities, and dividing by n makes the unit distance of the population average equal to 1. V^T is the transpose of the matrix V whose elements are the feature quantities V(i), and R^(−1) is the inverse of the correlation matrix R between the feature quantities in the learning data of the cluster. A sketch of this calculation follows below.
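A minimal sketch of equations (1) and (2), assuming the cluster's learning data is given as an (m samples × n features) numpy array already restricted to that cluster's feature quantity set.

```python
import numpy as np

def mahalanobis_squared(v, learning):
    """Eq. (1) and eq. (2): normalize the unknown feature vector `v`
    with the cluster's statistics, then compute (1/n) * V^T R^-1 V,
    with R the correlation matrix of the cluster's learning data."""
    avg = learning.mean(axis=0)                      # avg(i)
    std = learning.std(axis=0)                       # std(i)
    V = (v - avg) / std                              # eq. (1)
    R = np.corrcoef(learning, rowvar=False)          # correlation matrix R
    return float(V @ np.linalg.inv(R) @ V) / len(V)  # eq. (2)
```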

[0030] The feature quantity set creation unit 1 calculates, for each cluster, the feature quantity set that the distance calculation unit 3 uses when calculating the distance between the classification target data and that cluster, and writes the calculation result into the feature quantity set storage unit 4 in correspondence with the identification information of each cluster.

When calculating a feature quantity set, the feature quantity set creation unit 1 calculates, for each cluster, the value λ of the discriminant criterion by the following equation (3), based on the distance between the centroid (barycentric) vector of the learning data belonging to the target cluster for which the feature quantity set is being generated and the centroid vector of the learning data belonging to all the other clusters. Hereinafter, a combination of feature quantities is described as a feature quantity set.

[0031] λ = ω_i · ω_o · (μ_i − μ_o)² / (ω_i · σ_i² + ω_o · σ_o²)   … (3)

In equation (3), μ_i is the centroid vector formed by the average values of the feature quantities, over the feature quantity set, of the learning data belonging to the target cluster (the in-cluster population); σ_i is the standard deviation of the vectors formed by the feature quantities of the learning data belonging to the in-cluster population; and ω_i is the ratio of the number of learning data belonging to the in-cluster population to the learning data belonging to all clusters. Similarly, μ_o is the centroid vector formed by the average values of the feature quantities, over the feature quantity set, of the learning data belonging to the clusters other than the target cluster (the out-of-cluster population); σ_o is the standard deviation of the vectors formed by the feature quantities of the learning data belonging to the out-of-cluster population; and ω_o is the ratio of the number of learning data belonging to the out-of-cluster population to the learning data belonging to all clusters. Here, values obtained by taking the log (logarithm) or the square root of (μ_i − μ_o) in equation (3) may also be used. When calculating each vector, the feature quantity set creation unit 1 calculates and uses feature quantities normalized for each feature quantity by equation (1). Eigenvalues calculated in advance so as to increase the separation may also be set as the ratios ω_i and ω_o.

[0032] Then, for each target cluster, the feature quantity set creation unit 1 uses equation (3) to calculate the above discriminant reference value λ against the other clusters for some or all combinations of the feature quantities constituting the learning data, arranges the calculated discriminant reference values in descending order, and outputs a ranked list of the discriminant reference values λ.

Here, the feature quantity set creation unit 1 stores the combination of feature quantities corresponding to the largest discriminant reference value λ in the feature quantity set storage unit 4, together with the value of the discriminant reference value, as the feature quantity set of the target cluster, in correspondence with the identification information of the cluster. A sketch of this criterion and of the exhaustive selection follows below.
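A sketch of equation (3) and the exhaustive selection it drives; reducing the vector-valued spread of each population to a scalar by summing per-feature variances is an assumption, since the specification describes the standard deviations without fixing the reduction.

```python
import numpy as np
from itertools import combinations

def discriminant_criterion(inside, outside):
    """Eq. (3) for one candidate combination: `inside` holds the
    normalized learning data of the target cluster, `outside` that of
    all other clusters (rows = samples, columns = selected features)."""
    n_i, n_o = len(inside), len(outside)
    w_i, w_o = n_i / (n_i + n_o), n_o / (n_i + n_o)  # ratios omega_i, omega_o
    gap = float(np.sum((inside.mean(axis=0) - outside.mean(axis=0)) ** 2))
    var_i = float(inside.var(axis=0).sum())          # assumed scalar spread
    var_o = float(outside.var(axis=0).sum())
    return w_i * w_o * gap / (w_i * var_i + w_o * var_o)

def best_feature_set(inside, outside):
    """Score every combination of feature indices (cf. Fig. 2(a)) and
    return (lambda, combination) for the largest lambda."""
    n = inside.shape[1]
    return max((discriminant_criterion(inside[:, list(c)], outside[:, list(c)]), c)
               for r in range(1, n + 1)
               for c in combinations(range(n), r))
```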

[0033] In determining the discriminant reference value λ described above, as shown in Fig. 2(a), when the learning data and classification target data have the four feature quantities a, b, c, and d, the feature quantity set creation unit 1 calculates, when setting the feature quantity set of each cluster, the discriminant reference value λ for every combination: all four feature quantities, several of them, or a single one.

The feature quantity set creation unit 1 then selects the combination with the highest value, for example the combination of feature quantities b and c in Fig. 2(a).

[0034] As another method for the discriminant reference value λ, as shown in Fig. 2(b), the feature quantity set creation unit 1 may be configured to use the BSS method: the discriminant reference value λ is calculated using all n feature quantities contained in the set of classification target data; the discriminant reference value is then calculated for every combination obtained by removing one feature quantity from the set of n, and the combination with the maximum value among those n−1 discriminant reference values is selected; the discriminant reference value is then calculated for every n−2-feature combination of those n−1 feature quantities. In this way, the feature quantities are removed from the set one at a time, the combination reduced by one more feature being selected from the reduced set by calculating the discriminant reference value, so that a combination that can discriminate with a small number of feature quantities is selected.

[0035] As yet another method for the discriminant reference value λ, as shown in Fig. 2(c), the feature quantity set creation unit 1 may be configured to use the FSS method: each of the n feature quantities contained in the set of classification target data is read out one at a time, the discriminant reference value λ of each single feature quantity is calculated, and the feature quantity with the maximum discriminant reference value is selected. Next, combinations of two feature quantities, consisting of this feature quantity and each of the remaining feature quantities, are generated, the discriminant reference value λ is calculated for each, and the combination with the maximum discriminant reference value is selected. Next, combinations of three feature quantities, consisting of this combination plus each feature quantity not contained in it, are generated, and their discriminant reference values λ are calculated. In this way, the combination with the maximum discriminant reference value λ is successively selected from the immediately preceding combinations, one feature quantity not present in the combination is added, and the discriminant reference value λ of the enlarged combination is calculated; finally, from all the combinations for which the discriminant reference value was calculated, the combination with the maximum discriminant reference value λ is selected as the feature quantity set (a greedy sketch follows below).
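A greedy FSS sketch corresponding to Fig. 2(c); `score` is a callable mapping a tuple of feature indices to the discriminant reference value λ, for example built from the sketch above.

```python
def forward_selection(score, all_features):
    """Grow the combination one feature at a time, always adding the
    feature that maximizes lambda, and remember the best combination
    seen over the whole pass (Fig. 2(c))."""
    selected, remaining = (), list(all_features)
    best = (float("-inf"), ())
    while remaining:
        lam, feat = max((score(selected + (f,)), f) for f in remaining)
        selected += (feat,)
        remaining.remove(feat)
        best = max(best, (lam, selected))
    return best  # (largest lambda, corresponding feature combination)
```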

[0036] 次に、判別基準値えによって、クラスタリングに用いる特徴量セットの選択の有効性 を、図 3および 4により示す。  Next, FIGS. 3 and 4 show the effectiveness of selecting feature quantity sets used for clustering based on the discrimination reference value.

図 3には、特徴量 a, b, c, d, eから、特徴量セットを選択する組合せとして、特徴量 aおよび gの組合せと、特徴量 aおよび hの組合せと、特徴量 dおよび eとの組合せを抽 出し、これらの組合せから、クラスタ 1と、クラスタ 2および 3とにおいて、従来例に比し て高 ヽ分類特性を有する特徴量セットの選択にっ ヽて説明する。  FIG. 3 shows combinations of feature amounts a and g, combinations of feature amounts a and h, and feature amounts d and e as combinations for selecting feature amount sets from feature amounts a, b, c, d, and e. From these combinations, cluster 1 and clusters 2 and 3 will be described with reference to selection of feature quantity sets having higher classification characteristics than conventional examples.

In Fig. 3, μ1 corresponds to the aforementioned in-cluster mean μ, μ2 to the out-of-cluster mean, σ1 to the in-cluster standard deviation σ, σ2 to the out-of-cluster standard deviation, ω1 to the in-cluster ratio ω, and ω2 to the out-of-cluster ratio.

[0037] Among these combinations, the one with the largest discrimination criterion value λ is the combination of feature quantities a and h. This combination is used to separate cluster 1 from the other clusters, and the result of classifying cluster 1 against the other clusters (clusters 2 and 3) is confirmed in Fig. 4. In Fig. 4, the horizontal axis shows the logarithm of the Mahalanobis distance calculated using each combination of feature quantities, and the vertical axis shows the number of data to be separated having the corresponding value (a histogram). Here, the value 1.4 on the horizontal axis means that the logarithm of the Mahalanobis distance is less than 1.4 and not less than 1.2 (the value to its left); the other values on the horizontal axis are read in the same way, and "1.4≤" in Fig. 4 denotes values of 1.4 or more. The Mahalanobis distances in Fig. 4 are calculated, using the feature quantity set corresponding to cluster 1, for the classification target data belonging to cluster 1 and to the other clusters respectively.

Fig. 4(a) shows the Mahalanobis distances calculated with the combination of feature quantities a and g, Fig. 4(b) those calculated with the combination of a and h, and Fig. 4(c) those calculated with the combination of d and e. The histograms in Fig. 4 show that the larger the discrimination criterion value λ, the better cluster 1 is separated from the other clusters.

[0038] Next, the operation of the clustering system according to the first embodiment of Fig. 1 is described with reference to Figs. 5 and 6. Fig. 5 is a flowchart showing an operation example of the feature quantity set creation unit 1 of the clustering system according to the first embodiment, and Fig. 6 is a flowchart showing an operation example of the clustering of classification target data.

In the following description, suppose that the classification target data is a set of feature quantities of scratches on a glass article. From image processing and measurement results, feature quantities such as "a: length of the scratch", "b: area of the scratch", "c: width of the scratch", "d: transmittance of a predetermined region including the scratch", and "e: reflectance of a predetermined region including the scratch" are obtained. The set of feature quantities (hereinafter, the feature quantity collection) is therefore {a, b, c, d, e}. In this embodiment, the distance used for clustering is calculated as the Mahalanobis distance using the normalized feature quantities. Examples of the glass article in this embodiment include plate glass and glass substrates for displays.

[0039] A. Feature quantity set creation processing (corresponding to the flowchart of Fig. 5)

The user detects scratches on the glass, captures images of them to obtain image data, and extracts feature quantities from the image data by image processing, such as measuring the length of the scratched part, thereby collecting feature quantity data consisting of the set of feature quantities. The user then assigns the feature quantity data as learning data to each cluster to be classified, such as by cause or shape of the scratch, based on information known in advance about causes and shapes, forming the learning-data population of each cluster, and stores it in the cluster database 5 from a processing terminal (not shown) in association with the identification information of each cluster (step S1).

[0040] Next, when a control command to generate the feature quantity set for each cluster is input from the processing terminal, the feature quantity set creation unit 1 reads the learning-data population from the cluster database 5 in accordance with the identification information of each cluster.

Then, for each cluster, the feature quantity set creation unit 1 calculates the mean and standard deviation of each feature quantity in the in-cluster population and, using these, calculates the normalized feature quantity of each learning datum according to Eq. (1).
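Eq. (1) appears earlier in the document; the sketch below assumes it is the usual z-score standardization (the same form is assumed for Eq. (2), applied later to classification target data).

```python
import numpy as np

def normalize(samples):
    """Standardize each feature column of a (n_samples, n_features)
    array: subtract the per-feature mean avg.(i) and divide by the
    standard deviation std.(i), as assumed for Eq. (1)."""
    avg = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)
    return (samples - avg) / std, avg, std
```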

[0041] Next, the feature quantity set creation unit 1 calculates the discrimination criterion value λ according to Eq. (3) for every feature quantity set, that is, for every combination of feature quantities contained in the feature quantity collection.

At this time, for each cluster and each feature quantity set, the feature quantity set creation unit 1 calculates, from the normalized feature quantities of the in-cluster population, the mean vector (centroid vector) μ of the feature quantities belonging to the set and the standard deviation σ of the learning-data vectors formed from those feature quantities, and, from the normalized feature quantities of the out-of-cluster population, the corresponding centroid vector and standard deviation of the out-of-cluster learning-data vectors. It also calculates the ratio ω of the number of learning data in the in-cluster population to the total number of learning data, and the corresponding ratio for the out-of-cluster population.

[0042] Using these centroid vectors, standard deviations, and ratios, the feature quantity set creation unit 1 then calculates, according to Eq. (3), the discrimination criterion value λ, which measures the separation of each cluster from the other clusters, for every feature quantity set, that is, for every combination of feature quantities in the feature quantity collection, for each cluster.

When the calculation of all the discrimination criterion values λ is completed, the feature quantity set creation unit 1 sorts them in descending order for each cluster and selects the feature quantity set corresponding to the largest value as the feature quantity set, that is, the combination of feature quantities used for calculating the distance when judging whether data belongs to that cluster (step S2).
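Eq. (3) itself is defined earlier in the document and is not reproduced here; the sketch below therefore substitutes a generic Fisher-style separation ratio built from the quantities named above (centroid vectors, standard deviations, and data-count ratios of the in-cluster and out-of-cluster populations). Only the exhaustive search over feature combinations and the selection of the maximum (step S2) is the point being illustrated; the placeholder criterion may differ from the patent's λ.

```python
import numpy as np
from itertools import combinations

def lambda_placeholder(in_data, out_data):
    """Stand-in for Eq. (3): ratio of between-population separation
    to the ratio-weighted within-population scatter."""
    n_in, n_out = len(in_data), len(out_data)
    w_in, w_out = n_in / (n_in + n_out), n_out / (n_in + n_out)
    mu_in, mu_out = in_data.mean(axis=0), out_data.mean(axis=0)
    scatter = (w_in * in_data.var(axis=0, ddof=1).sum()
               + w_out * out_data.var(axis=0, ddof=1).sum())
    return np.sum((mu_in - mu_out) ** 2) / scatter

def best_feature_set(in_data, out_data, names):
    """Step S2 in outline: evaluate the criterion for every non-empty
    combination of feature columns and keep the maximum."""
    best = None
    for r in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), r):
            lam = lambda_placeholder(in_data[:, list(idx)],
                                     out_data[:, list(idx)])
            if best is None or lam > best[0]:
                best = (lam, tuple(names[i] for i in idx))
    return best  # (largest criterion value, its feature quantity set)
```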

[0043] Next, for use in the distance calculation by the distance calculation unit 3, the feature quantity set creation unit 1 calculates the correlation coefficient matrix R between the feature quantities of each feature quantity set, together with the mean avg.(i) and standard deviation std.(i) of the feature quantities of the learning data in each in-cluster population (step S3).

[0044] Next, the feature quantity set creation unit 1 calculates the correction coefficient λ^(−1/2) from the discrimination criterion value λ. This correction coefficient standardizes the distances across the feature quantity sets: since the distance to the other clusters varies from cluster to cluster, standardization between the feature quantity sets is needed to raise the classification accuracy. The correction coefficient is not limited to λ^(−1/2); log(λ) or any other function of λ that achieves the standardization between the feature quantity sets may be used.

In addition, when calculating the centroid vector of the feature quantity set of the out-of-cluster population in Eq. (3), one of the following three types of learning data is selected as the learning data of the out-of-cluster population:

a. all learning data of the out-of-cluster population among all the learning data;

b. specific learning data of the out-of-cluster population corresponding to the purpose of the classification; or c. the learning data of the out-of-cluster population that was used for selecting the feature quantities. Here, the purpose of classification in b. is to distinguish the data clearly from the cluster of interest; as the learning data, the learning data contained in the other clusters from which this distinction is to be made are used.

The feature quantity set creation unit 1 then stores, in the feature quantity set storage unit 4 as distance calculation data, in association with the identification information of each cluster: the feature quantity set; the correction coefficient corresponding to the feature quantity set, which in this embodiment is λ^(−1/2); the inverse matrix R^(−1); the mean avg.(i); and the standard deviation std.(i) (step S4).

[0045] B. Clustering processing (corresponding to the flowchart of Fig. 6)

When classification target data is input, the feature quantity extraction unit 2 reads the feature quantity set corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster.

Then, corresponding to the types of feature quantity in the read feature quantity set, the feature quantity extraction unit 2 extracts the feature quantities from the classification target data for each cluster, and stores the extracted feature quantities in its internal storage unit in association with the identification information of each cluster (step S11).

[0046] Next, for each feature quantity extracted from the classification target data, the distance calculation unit 3 reads the corresponding mean avg.(i) and standard deviation std.(i) from the feature quantity set storage unit 4, normalizes the feature quantity by performing the calculation of Eq. (2), and replaces the feature quantity stored in the internal storage unit with the normalized one.

The distance calculation unit 3 then generates the matrix V consisting of the elements V(i) obtained as described above, calculates its transpose V^T, and, according to Eq. (3), successively calculates the Mahalanobis distance between the classification target data and each cluster, storing it in the internal storage unit in association with the identification information of each cluster (step S12).

[0047] Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) corresponding to the feature quantity set to obtain the corrected distance, and replaces each Mahalanobis distance with it (step S13). When applying the correction coefficient, the multiplication may also be performed after taking the logarithm or the square root of the Mahalanobis distance.

The distance calculation unit 3 then compares the corrected distances to the clusters held in the internal storage unit (step S14), detects the minimum corrected distance, takes the cluster whose identification information corresponds to that corrected distance as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S15).
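Steps S11 to S15 can be condensed into the following sketch. It assumes that the distance calculation data of step S4 is available as plain arrays, that the Mahalanobis distance takes the standard form V R⁻¹ Vᵀ implied by the stored V, Vᵀ, and R⁻¹, and that the correction coefficient is applied to the square root of that value, one of the variants the text allows.

```python
import numpy as np

def classify(x, clusters):
    """Nearest-cluster classification in outline (steps S11-S15).
    `clusters` maps a cluster id to a dict with: 'idx' (indices of
    its feature quantity set), 'avg' and 'std' (per-feature
    statistics), 'Rinv' (inverse correlation matrix R^-1), and
    'coef' (the correction coefficient lambda**-0.5)."""
    corrected = {}
    for cid, c in clusters.items():
        v = (x[c["idx"]] - c["avg"]) / c["std"]    # Eq. (2): standardize
        d2 = v @ c["Rinv"] @ v                     # squared Mahalanobis distance
        corrected[cid] = c["coef"] * np.sqrt(d2)   # step S13: corrected distance
    return min(corrected, key=corrected.get)       # steps S14-S15
```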

[0048] <Second Embodiment>

In the first embodiment described above, one feature quantity set per cluster was used for clustering. Alternatively, as in the second embodiment described below, a plurality of feature quantity sets may be set for each cluster; the Mahalanobis distance corresponding to each feature quantity set is calculated, the corrected distances are obtained and sorted in ascending order, and the cluster to which the classification target data belongs is determined from the corrected distances within a predetermined number of top ranks, according to rules set in advance.

[0049] That is, from the distances between the classification target data and each cluster obtained for each feature quantity set, the distance calculation unit 3 of this embodiment detects which cluster the classification target data belongs to by means of rule patterns that express the criteria for classifying the data into each cluster, set on the basis of the ranking of these distances.

The configuration of the second embodiment is the same as that of the first embodiment shown in Fig. 1; the same reference numerals are given to the respective components, and only the operations that differ from the first embodiment are described with reference to Fig. 7. The second embodiment includes a process of setting the above rule patterns from learning data. Fig. 7 is a flowchart showing an operation example of the pattern learning on the distance ranking that sets the rule patterns, and Figs. 8 and 9 are flowcharts showing operation examples of the clustering in the second embodiment.

[0050] In the first embodiment, when creating the feature quantity sets, the feature quantity set creation unit 1 calculated, for each cluster, the discrimination criterion value λ for a plurality of feature quantity sets (combinations of feature quantities), and set the feature quantity set corresponding to the maximum of the obtained values as the feature quantity set of that cluster.

In the second embodiment, on the other hand, the feature quantity set creation unit 1 obtains a plurality of discrimination criterion values λ for each cluster, against one or more combinations of the other clusters or against all of the other clusters, by setting the feature quantity set with the maximum value for each number of combined feature quantities, and thereby sets a plurality of feature quantity sets for separating each cluster from the other clusters.

The feature quantity set creation unit 1 then obtains the distance calculation data for each feature quantity set, and stores the plurality of feature quantity sets and the distance calculation data of each set in the feature quantity set storage unit 4 in association with the identification information of the cluster.

[0051] In Fig. 7, when learning data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster.

Then, corresponding to the types of feature quantity in each read feature quantity set, the feature quantity extraction unit 2 extracts the feature quantities from the learning data for each cluster, and stores them in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S21).

[0052] Next, for each feature quantity extracted from the learning data, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the mean avg.(i) and standard deviation std.(i) corresponding to that feature quantity for each feature quantity set, normalizes it by performing the calculation of Eq. (2), and replaces the feature quantity stored in the internal storage unit with the normalized one.

The distance calculation unit 3 then generates the matrix V consisting of the elements V(i) obtained as described above, calculates its transpose V^T, and, according to Eq. (3), successively calculates the Mahalanobis distance between the learning data and each cluster, storing it in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S22).

[0053] Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) corresponding to the feature quantity set to obtain the corrected distance, and replaces each Mahalanobis distance with it (step S23).

The distance calculation unit 3 then sorts the corrected distances to the clusters held in the internal storage unit in ascending order (the smaller the corrected distance, the higher the rank); that is, the identification information of the clusters is arranged so that clusters with smaller corrected distances to the data come first (step S24).

[0054] Next, the distance calculation unit 3 detects the identification information of the cluster corresponding to each of the n smallest (top-ranked) corrected distances, and counts the occurrences of each cluster's identification information among those n; that is, it performs a voting process for each cluster.

The distance calculation unit 3 then detects, from the count patterns of the cluster identification information of each learning datum, a rule pattern common to the learning data contained in the same cluster. For example, with n = 10, if it is detected that the learning data of cluster B yield a count pattern of 5 for cluster A, 3 for cluster B, and 2 for cluster C, this is taken as rule R1.

Also, if the learning data of cluster C have in common that, whenever 3 counts of cluster C are detected, the data always belong to cluster C, even with 7 counts for cluster A and 0 for cluster B, then a rule R2 is set: if the count of cluster C is 3 or more, the data is assigned to cluster C regardless of the counts of the other clusters.

Also, for the learning data of cluster A, if cluster A occupies the first and second ranks from the top of the ordering pattern, a rule R3 is set: the data is assigned to cluster A regardless of the counts of the other clusters, even if the count of cluster B is 8.

[0055] As described above, the regularity of the per-cluster counts exhibited by the learning data classified into the same cluster is detected and stored internally as a pattern table for the identification information of each cluster. One rule may be set for each cluster, or a plurality of rules may be set. Although the above description has the distance calculation unit 3 extract the rule patterns, the user may also set the count-based or ordering-based rule patterns arbitrarily in order to change the accuracy of classification into each cluster.

Some clusters have characteristics of their feature information similar to those of other clusters, and classification of the target data can sometimes be performed more accurately from the relationship among a plurality of clusters, that is, from the target pattern consisting of the per-cluster counts or of the ordering from the top; this embodiment complements that point.

[0056] Next, the clustering processing of the second embodiment, which uses the rules described in the above table, is explained with reference to the flowchart of Fig. 8.

When classification target data is input, the feature quantity extraction unit 2 reads the plurality of feature quantity sets corresponding to each cluster from the feature quantity set storage unit 4 according to the identification signal of each cluster. Then, corresponding to the types of feature quantity in each read feature quantity set, the feature quantity extraction unit 2 extracts the feature quantities from the classification target data for each cluster, and stores them in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S31).

[0057] Next, for each feature quantity extracted from the classification target data, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the mean avg.(i) and standard deviation std.(i) corresponding to that feature quantity for each feature quantity set, normalizes it by performing the calculation of Eq. (2), and replaces the feature quantity stored in the internal storage unit with the normalized one.

The distance calculation unit 3 then generates the matrix V consisting of the elements V(i) obtained as described above, calculates its transpose V^T, and, according to Eq. (3), successively calculates the Mahalanobis distance between the classification target data and each cluster, storing it in the internal storage unit for each feature quantity set in association with the identification information of each cluster (step S32).

[0058] Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) corresponding to the feature quantity set to obtain the corrected distance, and replaces each Mahalanobis distance with it (step S33).

The distance calculation unit 3 then sorts the corrected distances to the clusters held in the internal storage unit in ascending order; that is, the identification information of the clusters is arranged so that clusters with smaller corrected distances to the classification target data come first (step S34).

After the sorting, the distance calculation unit 3 detects the identification information of the cluster corresponding to each of the n smallest (top-ranked) corrected distances, and counts the occurrences of each cluster's identification information among those n; that is, it performs a voting process for each cluster.

[0059] Next, the distance calculation unit 3 checks whether the count pattern (or ordering pattern) over the clusters in the top n ranks of the classification target data exists in the internally stored table (step S35).

When this collation detects that a rule pattern matching the target pattern of the classification target data is described in the table, the distance calculation unit 3 judges that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, and classifies the data into that cluster (step S36).

[0060] Another clustering process of the second embodiment using the rules described in the above table is explained with reference to the flowchart of Fig. 9.

In this other clustering process shown in Fig. 9, the processing from step S31 to step S35 is the same as that shown in Fig. 8; as already described for step S35, the distance calculation unit 3 collates the target pattern of the classification target data against the rule patterns stored in the table.

The distance calculation unit 3 then detects whether a rule pattern matching the target pattern was found in the collation result; if a matching rule pattern was found, the processing proceeds to step S47, whereas if no matching rule pattern was found, the processing proceeds to step S48 (step S46).

[0061] When it detects that a matching rule pattern was found, the distance calculation unit 3 judges that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, classifies the data into that cluster, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S47). When it detects that no matching rule pattern was found, on the other hand, the distance calculation unit 3 detects the identification information with the largest count, that is, the largest number of votes, and classifies the classification target data into the cluster corresponding to that identification information.

The distance calculation unit 3 then stores the classified data in the cluster database 5 in association with the identification information of the cluster to which it belongs (step S48).
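The decision flow of Figs. 8 and 9 (sort, vote over the top n, match against the rule table, and fall back to the most-voted cluster) can be sketched as follows. The encoding of the rules as predicates over the ranked cluster ids and vote counts, and the ordering of the rules, are hypothetical; the example rules at the end paraphrase R1 to R3 from the text.

```python
from collections import Counter

def classify_by_rules(distances, rules, n=10):
    """Fig. 9 in outline. `distances` is a list of
    (corrected_distance, cluster_id) pairs over all feature quantity
    sets; `rules` is a list of (predicate, cluster_id) pairs, each
    predicate receiving the top-n ranked cluster ids and the vote
    counts. Falls back to the most-voted cluster (step S48) when no
    rule pattern matches."""
    ranked = [cid for _, cid in sorted(distances)[:n]]  # step S34
    votes = Counter(ranked)                             # voting
    for predicate, cid in rules:                        # steps S35/S46
        if predicate(ranked, votes):
            return cid                                  # step S47
    return votes.most_common(1)[0][0]                   # step S48

# Hypothetical encodings of the example rules from the text:
rules = [
    (lambda r, v: v["C"] >= 3, "C"),                            # rule R2
    (lambda r, v: r[:2] == ["A", "A"], "A"),                    # rule R3
    (lambda r, v: (v["A"], v["B"], v["C"]) == (5, 3, 2), "B"),  # rule R1
]
```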

[0062] <Third Embodiment>

In the second embodiment described above, a table of rule patterns over the top n ranks, counted from the smallest calculated distances between the classification target data and the clusters (that is, the greatest similarities), is prepared, and each classification target datum is clustered according to whether its pattern corresponds to a rule pattern in this table. Alternatively, as in the third embodiment described below, a plurality of feature quantity sets may be set for each cluster, the Mahalanobis distance corresponding to each feature quantity set calculated, and the corrected distances obtained; the cluster having the most corrected distances within a predetermined number of top ranks is then taken as the cluster to which the classification target data belongs.

The configuration of the third embodiment is the same as that of the first and second embodiments shown in Fig. 1; the same reference numerals are given to the respective components, and only the operations that differ from the second embodiment are described with reference to Fig. 10. In the third embodiment, there is no process of setting the above rules from learning data, and step S48 of Fig. 9 is performed directly. Fig. 10 is a flowchart showing an operation example of the clustering in the third embodiment.

[0063] In this other clustering process shown in Fig. 10, the processing from step S31 to step S34 is the same as that shown in Fig. 8; as already described, in step S34 the distance calculation unit 3 sorts the corrected distances to the clusters held in the internal storage unit in ascending order, that is, arranges the identification information of the clusters so that those with smaller corrected distances to the classification target data come first (step S34).

Next, the distance calculation unit 3 detects the identification information of the cluster corresponding to each of the n smallest (top-ranked) corrected distances, and counts the occurrences of each cluster's identification information among those n; that is, it performs a voting process for each cluster (step S55).

The distance calculation unit 3 then detects, in the voting result, the identification information with the largest count (number of votes), takes the cluster corresponding to that identification information as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in association with the identification information of that cluster (step S56).

[0064] The user may also set in advance, in the distance calculation unit 3, a cutoff threshold on the number of votes for each piece of identification information; when the number of votes of the identification information with the most votes falls short of this threshold, the data is treated as belonging to no cluster.

For example, when classifying target data against three clusters A, B, and C, if the number of votes for the identification information of cluster A is 5, that for cluster B is 3, and that for cluster C is 2, the distance calculation unit 3 detects cluster A as the identification information with the most votes.

However, if the threshold for cluster A is set to 6, the distance calculation unit 3 judges that the data belongs to no cluster, because the number of votes for the identification information of cluster A falls short of the threshold.

This makes it possible to improve the reliability of the classification of target data in clustering over clusters whose feature quantities differ only slightly from those of other clusters.
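A sketch of the third embodiment's vote-based decision (steps S34, S55, S56) with the optional vote-count cutoff of paragraph [0064]; the dictionary of per-cluster thresholds is an assumed representation.

```python
from collections import Counter

def classify_by_votes(distances, thresholds=None, n=10):
    """Return the most-voted cluster id among the clusters of the
    top-n corrected distances, or None ("belongs to no cluster")
    when its vote count falls short of that cluster's threshold."""
    ranked = [cid for _, cid in sorted(distances)[:n]]  # step S34
    cid, count = Counter(ranked).most_common(1)[0]      # steps S55-S56
    if thresholds and count < thresholds.get(cid, 0):
        return None                                     # paragraph [0064]
    return cid
```

With votes of 5, 3, and 2 for clusters A, B, and C and a threshold of 6 set for cluster A, this returns None, matching the worked example above.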

[0065] <Feature quantity conversion method>

Clustering is performed in the expectation that the population of each feature quantity follows a normal distribution. Depending on the type of feature quantity (area, length, and so on), however, the population may have a skewed, non-normal distribution, and the accuracy of calculating the distance between the classification target data and each cluster, that is, of judging the similarity between them, may deteriorate. For such feature quantities, it is therefore necessary to convert the feature quantities of the population by a predetermined method so as to bring the distribution closer to a normal distribution and improve the accuracy of the similarity judgment.

As the method of conversion toward a normal distribution, the feature quantity is converted by one of: the logarithm; an n-th root such as the square root (√) or cube root (∛); a factorial; or an arithmetic expression including a function obtained by numerical calculation.

[0066] The process of setting the conversion method for each feature quantity is explained below with reference to Fig. 11, a flowchart showing an operation example of this setting process. The conversion method is set for each cluster, for each feature quantity contained in the cluster, using the learning data belonging to that cluster. The following processing is described as being performed by the feature quantity set creation unit 1, but a separate processing unit for it may equally be provided.

Using the identification information of the cluster to be classified as a key, the feature quantity set creation unit 1 reads the learning data contained in that cluster from the cluster database 5 and calculates (normalizes) the feature quantities of each learning datum (step S61).

[0067] Next, the feature quantity set creation unit 1 converts the feature quantities by applying one of the internally stored arithmetic expressions for feature quantity conversion to each of the read learning data (step S62).

When the conversion of the feature quantities of all the learning data is completed, the feature quantity set creation unit 1 calculates an evaluation value indicating how close the distribution obtained by the conversion is to a normal distribution (step S63).

[0068] Next, the feature quantity set creation unit 1 checks whether evaluation values have been calculated for all of the internally stored arithmetic expressions, that is, for all of the conversion methods set in advance. If it detects that the evaluation value of the distribution obtained by converting the feature quantities has been calculated for every arithmetic expression, the processing proceeds to step S65; if it detects that the calculation has not been completed for all the expressions, the processing returns to step S62 in order to process the next arithmetic expression (step S64).

When the feature quantity conversion by all the arithmetic expressions is completed, the feature quantity set creation unit 1 detects, among the distributions obtained with the set arithmetic expressions, the one with the smallest evaluation value, that is, the distribution closest to a normal distribution, determines the arithmetic expression used to create that distribution as the conversion method, and sets it internally as the conversion method for that feature quantity of the cluster (step S65).

The feature quantity set creation unit 1 performs the above processing for each feature quantity of each cluster, and thereby sets a conversion method corresponding to each feature quantity in each cluster.

[0069] Next, the calculation of the evaluation value in step S63 is explained with reference to Fig. 12, a flowchart explaining an operation example of the process of obtaining the evaluation value of the distribution produced by an arithmetic expression.

The feature quantity set creation unit 1 converts the feature quantity of each learning datum belonging to the target cluster by the currently set arithmetic expression (step S71).

After converting the feature quantities of all the learning data, the feature quantity set creation unit 1 calculates the mean μ and standard deviation σ of the distribution (population) of the converted feature quantities (step S72).

Then, using the mean μ and standard deviation σ of the population, the feature quantity set creation unit 1 calculates the z value (1) as (x − μ)/σ (step S73).

[0070] Next, the feature quantity set creation unit 1 calculates the cumulative probability within the population (step S74).

After this calculation, the feature quantity set creation unit 1 calculates the z value (2) as the value of the inverse of the cumulative distribution function of the standard normal distribution at the obtained cumulative probability in the population (step S75).

The feature quantity set creation unit 1 then obtains the difference between the two z values of the feature quantity distribution, that is, the error between z value (1) and z value (2) (step S76). Having obtained the z-value errors, the feature quantity set creation unit 1 calculates their sum, that is, the sum of their squares, as the evaluation value (step S77).

The smaller the error between the two z values described above, the closer the distribution is to a normal distribution; if there is no z-value error, the distribution is normal, whereas the further the distribution departs from normality, the larger the error becomes.
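Figs. 11 and 12 together can be sketched as follows, assuming: a candidate set of conversion expressions (identity, logarithm, square root, cube root) applied to positive-valued feature quantities such as lengths and areas; the plotting position (i − 0.5)/n for the cumulative probability of step S74; and SciPy's `norm.ppf` as the inverse of the standard normal cumulative distribution function of step S75.

```python
import numpy as np
from scipy.stats import norm

# Assumed candidate conversion expressions (the text also allows
# other n-th roots, factorials, and numerically derived functions).
TRANSFORMS = {
    "identity": lambda x: x,
    "log": np.log,
    "sqrt": np.sqrt,
    "cbrt": np.cbrt,
}

def normality_error(values):
    """Fig. 12 (steps S71-S77): sum of squared differences between
    z value (1), computed from the sample mean and standard
    deviation, and z value (2), the inverse standard normal CDF at
    each sample's cumulative probability. Smaller = closer to a
    normal distribution."""
    x = np.sort(np.asarray(values, dtype=float))
    z1 = (x - x.mean()) / x.std(ddof=1)             # step S73
    p = (np.arange(1, len(x) + 1) - 0.5) / len(x)   # step S74
    z2 = norm.ppf(p)                                # step S75
    return float(np.sum((z1 - z2) ** 2))            # steps S76-S77

def best_transform(values):
    """Fig. 11 (steps S62-S65): keep the conversion expression whose
    transformed distribution has the smallest evaluation value."""
    return min(TRANSFORMS,
               key=lambda k: normality_error(TRANSFORMS[k](values)))
```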

[0071] Next, the calculation of the feature quantities of the classification target data, performed before the clustering processing of the first to third embodiments, is explained with reference to Fig. 13, a flowchart showing an operation example of calculating the feature quantity data of the classification target data.

The distance calculation unit 3 extracts the feature quantities to be identified from the input classification target data, corresponding to the feature quantity set that has been set for each cluster, and performs the normalization processing already described (step S81).

Next, the distance calculation unit 3 converts the feature quantities of the classification target data used for classification into the target cluster by the conversion method (arithmetic expression) set for the feature quantities of that cluster (step S82).

The distance calculation unit 3 then calculates the distance to the target cluster as described in the first to third embodiments (step S83).

[0072] Next, the distance calculation unit 3 checks whether, for all the clusters to be classified against, the feature quantities have been converted by the conversion method set for each cluster's feature quantities and the distance to the cluster has been calculated from the converted feature quantities. If it detects that the distances to all the target clusters have been obtained, the processing proceeds to step S85; if it detects that target clusters remain, the processing returns to step S82 (step S84). Then, in each of the first to third embodiments, the processing that follows the completion of the distance calculation is started (step S85).

With the above processing, since the Mahalanobis distance used in this embodiment expects the feature quantities to be normally distributed when obtaining the distance between the classification target data and each cluster, the closer the distribution of each feature quantity of the population is to a normal distribution, the more accurately the distance (similarity) to each cluster can be obtained, and an improvement in the accuracy of classification into each cluster can be expected.

Examples

[0073] <Calculation example>

Next, using the clustering systems of the first, second, and third embodiments described above, the classification accuracy relative to the conventional example was confirmed with the sample data shown in Fig. 14. Although the number of samples is small, it can be seen that a correct-answer rate equal to or better than that of the conventional example is obtained despite the small number of feature quantities used. In Fig. 14, ten learning data are defined for each of categories 1, 2, and 3 as clusters, and each learning datum has the eight feature quantities a, b, c, d, e, f, g, and h. In this example, the feature quantity sets used for clustering are determined from the learning data belonging to each cluster shown in Fig. 14, and clustering is then performed using the same learning set as the classification target data.

[0074] As a calculation result, Fig. 15 shows the judgment results of the conventional calculation method, in which the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14 is calculated using the feature quantities a and g as the combination of feature quantities. In Fig. 15(a), the Cluster1 column gives the Mahalanobis distance to cluster 1, the Cluster2 column the Mahalanobis distance to cluster 2, and the Cluster3 column the Mahalanobis distance to cluster 3. The category column shows the cluster to which each learning datum actually belongs, and the judgment-result column shows the cluster at the minimum Mahalanobis distance from the learning datum. Rows where the numbers of the category and the judgment result agree represent correctly classified feature quantity data.

[0075] In Fig. 15(b), the column numbers show the clusters to which the learning data actually belong, and the row numbers show the judged clusters. For example, the "8" at mark R1 shows that 8 of the 10 data of cluster 1 were judged to be cluster 1, and the "2" at mark R2 shows that 2 of the 10 data of cluster 1 were judged to be cluster 3. p0 denotes the rate of agreement between the correct answers and the judgments, p1 denotes the probability that the two agree by chance, and κ is the overall corrected judgment rate, obtained by the following formulas; the higher κ, the higher the classification accuracy.

κ = (p0 − p1) / (1 − p1)

p0 = (a + d) / (a + b + c + d)

p1 = [(a + b)·(a + c) + (b + d)·(c + d)] / (a + b + c + d)²

[0076] The relationship among a, b, c, and d in the above formulas is explained with reference to Fig. 16.

a is the number of data belonging to cluster 1 that were classified as cluster 1, b is the number of data belonging to cluster 1 that were classified as cluster 2, and a + b is the number of data belonging to cluster 1. Similarly, d is the number of data belonging to cluster 2 that were classified as cluster 2, c is the number of data belonging to cluster 2 that were classified as cluster 1, and c + d is the number of data belonging to cluster 2. a + c is the number classified as cluster 1 among all the data a + b + c + d, and b + d is the number classified as cluster 2 among all the data a + b + c + d.
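With a, b, c, and d defined this way, the three quantities can be computed directly; a minimal sketch with hypothetical counts (not taken from Fig. 15) follows.

```python
def kappa(a, b, c, d):
    """Overall corrected judgment rate for the 2x2 counts of Fig. 16."""
    n = a + b + c + d
    p0 = (a + d) / n                                     # agreement rate
    p1 = ((a + b) * (a + c) + (b + d) * (c + d)) / n**2  # chance agreement
    return (p0 - p1) / (1 - p1)

# Hypothetical counts: 8 of 10 cluster-1 data and 9 of 10 cluster-2
# data judged correctly.
print(kappa(8, 2, 1, 9))  # 0.7
```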

[0077] Next, Fig. 17 shows the judgment results obtained by calculating, with the calculation method of the first embodiment, the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14. Figs. 17(a) and (b) are read in the same way as Fig. 15, so their explanation is omitted. The correct-answer rate p0, the chance-agreement probability p1, and the overall corrected judgment rate κ can be seen to be equivalent to those of the conventional calculation method of Fig. 15. Here, the feature quantity set corresponding to each cluster was calculated by the method, described above, of selecting from all the combinations the combination with the maximum discrimination criterion value λ for each cluster: the combination of feature quantities a and h was used as the feature quantity set corresponding to cluster 1, the combination of a and d for cluster 2, and the combination of a and g for cluster 3.

[0078] Next, Fig. 18 shows the judgment results obtained by calculating, with the calculation method of the second embodiment, the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14. Figs. 18(a) and (b) are read in the same way as Fig. 15, so their explanation is omitted. The correct-answer rate p0 is 0.8333, the chance-agreement probability p1 is 0.3333, and the overall corrected judgment rate κ is 0.75, showing that the classification accuracy is improved compared with the conventional calculation method of Fig. 15. Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations, the combinations with the top three discrimination criterion values λ for each cluster: the three combinations a·h, a·g, and d·e were used as the feature quantity sets corresponding to cluster 1, the three combinations a·f, a·d, and a·b for cluster 2, and the three combinations e·g, a·c, and a·g for cluster 3.

For the voting judgment, the distances were arranged in ascending order of Mahalanobis distance, the number of appearances of each cluster within the three smallest was counted, and the cluster with the largest count was taken as the cluster to which the classification target data belongs.

[0079] Next, Fig. 19 shows the judgment results obtained with the calculation method of the second embodiment by calculating the Mahalanobis distance to each learning datum of clusters 1 to 3 shown in Fig. 14, multiplying the calculated Mahalanobis distances by the correction coefficient (λ)^(−1/2), and then ranking the distances. Figs. 19(a) and (b) are read in the same way as Fig. 15, so their explanation is omitted. The correct-answer rate p0 is 0.8333, the chance-agreement probability p1 is 0.3333, and the overall corrected judgment rate κ is 0.75, showing that the classification accuracy is improved compared with the conventional calculation method of Fig. 15. Here, the feature quantity sets corresponding to each cluster were calculated by the method of selecting, from all the combinations, the combinations with the top three discrimination criterion values λ for each cluster: the three combinations a·h, a·g, and d·e were used as the feature quantity sets corresponding to cluster 1, the three combinations a·f, a·d, and a·b for cluster 2, and the three combinations e·g, a·c, and a·g for cluster 3. For the voting judgment, the distances were arranged in ascending order of Mahalanobis distance, the number of appearances of each cluster within the three smallest was counted, and the cluster with the largest count was taken as the cluster to which the classification target data belongs.

[0080] The classification results shown in FIGS. 15, 17, 18, and 19 described above show that the present embodiment performs clustering faster and with higher accuracy than the conventional example, confirming the superiority of the present embodiment over the conventional example.

[0081] <Application examples of the present invention>

A. Inspection apparatus

As shown in FIG. 20, an inspection apparatus (defect detection apparatus) that classifies the types of scratches on the surface of an object to be inspected, for example a glass substrate, is described. FIG. 21 is a flowchart explaining an operation example of selecting a feature quantity set, and FIG. 22 is a flowchart explaining an operation example of the clustering process.

First, the operation of selecting the feature quantity set is described. The collection of learning data in step S1 of the flowchart of FIG. 5 corresponds to steps S101 to S105 of the flowchart of FIG. 21.

Steps S2 to S4 in FIG. 21 are the same as in the flowchart of FIG. 5, so their explanation is omitted.

[0082] Through operator operations, samples of learning data corresponding to each of the clusters into which the scratch types are to be classified are collected (step S101).

The illumination device 102 illuminates the scratch shapes collected as learning data, and the image acquisition unit 101 acquires image data of the scratch portions via the imaging device 103 (step S102). Then, the feature quantities of the scratches in each piece of learning data are calculated from the image data acquired by the image acquisition unit 101 (step S103).

The feature quantities of the obtained learning data are assigned to the classification destinations determined by visual inspection, thereby specifying the learning data belonging to each cluster (step S104).

The processing of steps S101 to S102 is repeated until the learning data of each cluster reaches a predetermined number (a preset number of samples), for example about 300 pieces each. Once the predetermined number is reached, the clustering unit 105 performs the processing from step S2 onward, already described with reference to FIG. 5. Here, the clustering unit 105 is the clustering system of the first or second embodiment.
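Expressed as code, the collection phase amounts to a loop of the following kind. The sketch is an illustration only: it folds steps S101 to S104 into one loop for brevity, and the four callables are hypothetical stand-ins for the operator's sample collection, the imaging hardware, the feature computation, and the visual assignment, none of which are prescribed by the patent.

```python
TARGET_PER_CLUSTER = 300  # the predetermined sample count suggested in the text

def collect_learning_data(cluster_ids, acquire_image, extract_features, assign_visually):
    learning = {c: [] for c in cluster_ids}
    while any(len(v) < TARGET_PER_CLUSTER for v in learning.values()):
        image = acquire_image()              # S102: illuminate and image a scratch
        features = extract_features(image)   # S103: compute its feature quantities
        cluster = assign_visually(image)     # S104: destination decided by eye
        if len(learning[cluster]) < TARGET_PER_CLUSTER:
            learning[cluster].append(features)
    return learning                          # then hand over to step S2 onward
```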

[0083] Next, the clustering process in the inspection apparatus of FIG. 4 is described with reference to FIG. 22.

Here, steps S31 to S34, S55, and S56 in FIG. 22 are the same as in the flowchart of FIG. 10, so their explanation is omitted.

In the inspection apparatus of FIG. 20, when inspection starts, the illumination device 102 illuminates the glass substrate that is the object to be inspected 100, and the imaging device 103 photographs the surface of the glass substrate and outputs the captured image to the image acquisition unit 101. When the defect candidate detection unit 104 detects a portion of the captured image input from the image acquisition unit 101 that differs from the flat surface shape, it treats that portion as a defect candidate to be classified (step S201).
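The patent does not state how a portion differing from the flat surface shape is found. Purely as an assumption, the sketch below uses the common approach of differencing against a defect-free reference image and labeling connected regions above a threshold; the reference image, the threshold value, and the scipy dependency are all choices of the sketch.

```python
import numpy as np
from scipy import ndimage

def detect_defect_candidates(image, reference, threshold=30):
    """Return one bounding box per candidate region (cf. step S201)."""
    diff = np.abs(image.astype(int) - reference.astype(int))
    labels, _ = ndimage.label(diff > threshold)  # connected deviating regions
    return ndimage.find_objects(labels)
```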

[0084] Next, the defect candidate detection unit 104 cuts out the image data of the defect candidate portion from the captured image as classification target data.

The defect candidate detection unit 104 then calculates feature quantities from the image data of the classification target data, and outputs the classification target data, consisting of the set of extracted feature quantities, to the clustering unit 105 (step S202).
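The concrete feature quantities are not specified at this point in the text. The following sketch computes a few plausible ones (area, aspect ratio, total and peak intensity) from a cut-out grayscale patch; the choice of quantities and the convention that background pixels are zero are assumptions made for illustration.

```python
import numpy as np

def scratch_features(patch):
    """patch: 2-D grayscale array cut out around one defect candidate,
    with background pixels set to zero (an assumed convention)."""
    ys, xs = np.nonzero(patch)
    if ys.size == 0:
        return np.zeros(4)
    height = np.ptp(ys) + 1
    width = np.ptp(xs) + 1
    return np.array([
        float(ys.size),       # area in pixels
        width / height,       # aspect ratio
        float(patch.sum()),   # total intensity
        float(patch.max()),   # peak brightness
    ])
```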

The subsequent clustering processing has already been described with reference to the steps of FIG. 10 and is therefore omitted here. As described above, the inspection apparatus of the present invention can classify scratches on a glass substrate with high accuracy by scratch type.

[0085] B. Defect type determination device

In the defect type determination device shown in FIG. 23, the clustering unit 105 corresponds to the clustering system of the present invention described above.

The image acquisition device 201 is composed of the image acquisition unit 101, the illumination device 102, and the imaging device 103 shown in FIG. 20.

The learning data of each cluster into which the classification target data is to be classified has already been acquired and is held in the cluster database 5 of the clustering unit 105. Accordingly, the selection of the feature quantity sets in FIG. 5 has also been completed.

[0086] Defect candidates are detected from the captured images input from the image acquisition device 202 attached to each manufacturing device; the corresponding image data is cut out, feature quantities are extracted, and the result is output to the data collection device 203. The control device 200 transfers the classification target data input to the data collection device 203 to the clustering unit 105. Then, as already described, the clustering unit 105 classifies the input classification target data into the clusters corresponding to the scratch types.

[0087] C. Manufacturing management device

As shown in FIG. 24, the manufacturing management device of the present invention is composed of a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, a defective device determination unit 305, and a defect type determination device 306. Here, the defect type determination device 306 is the same as the defect type determination device described in section B above.

The defect type determination device 306 performs image processing on the captured images from the image acquisition devices 201 and 202, provided on the manufacturing device 301 and the manufacturing device 302 respectively, extracts feature quantities in the corresponding defect candidate detection unit 104, and classifies the classification target data.

[0088] Next, the defective device determination unit 305 has a table indicating the relationship between the identification information of each cluster and the cause of occurrence corresponding to that cluster. It reads from the table the cause of occurrence corresponding to the identification information of the classification-destination cluster input from the defect type determination device 306, and determines the manufacturing device that is the cause. That is, the defective device determination unit 305 detects the cause of defect occurrence in the product manufacturing process in accordance with the cluster identification information.

The defective device determination unit 305 then notifies the operator via the notification unit 303, and causes the recording unit 304 to store, as a history associated with the date and time of the determination, the identification number of the cluster into which the defect was classified, the cause of occurrence, and the identification information of the manufacturing device concerned. The control device 300 also stops the manufacturing device determined by the defective device determination unit 305 or controls its control parameters.
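A minimal sketch of this table lookup and history recording follows. Only the existence of the table is disclosed, so the table contents, the cause names, and the print/list stand-ins for the notification unit 303 and recording unit 304 are hypothetical.

```python
from datetime import datetime

# Hypothetical table of the defective device determination unit 305:
# cluster identification -> (cause of occurrence, manufacturing device).
CAUSE_TABLE = {
    "cluster1": ("roller flaw", "manufacturing device 301"),
    "cluster2": ("cullet contamination", "manufacturing device 302"),
}

history = []  # stands in for the recording unit 304

def judge_defective_device(cluster_id):
    cause, device = CAUSE_TABLE[cluster_id]
    print(f"{device}: {cause}")  # stands in for the notification unit 303
    history.append((datetime.now(), cluster_id, cause, device))
    return device                # the control device 300 may then stop it
```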

[0089] D. Manufacturing management device

As shown in FIG. 25, another manufacturing management device of the present invention is composed of a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, and a clustering unit 105. Here, the clustering unit 105 has the same configuration as described in sections A and B above.

In the clustering unit 105, unlike in cases A to C described above, the feature data of the classification target data consists of feature quantities representing the manufacturing conditions (quantities of materials, processing temperature, pressure, processing speed, and the like) in the manufacturing process of an industrial product, for example a glass substrate, and the data is classified by the manufacturing state of each step of the manufacturing process. The feature quantities are input to the clustering unit 105 as process information detected by the sensors provided in the manufacturing devices 301 and 302.

[0090] That is, based on the feature quantities of the classification target data, the clustering unit 105 classifies the manufacturing state of the glass manufacturing process in each step of each manufacturing device into clusters such as "normal state", "state in which defects occur easily and adjustment is required", and "dangerous state in which adjustment is required". The clustering unit 105 then notifies the operator of the classification result via the notification unit 303, outputs the identification information of the resulting cluster to the control device 300, and causes the recording unit 304 to store, as a history associated with the date and time of the determination, the identification number of the cluster into which the manufacturing state of each step was classified, the manufacturing condition that is the most problematic feature quantity, and the identification information of the manufacturing device concerned.

The control device 300 has a table indicating the correspondence between cluster identification information and the adjustment items, together with their data, for returning the manufacturing conditions to normal. It reads the adjustment items and data corresponding to the cluster identification information input from the clustering unit 105, and controls the corresponding manufacturing device with the read data.
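The corresponding lookup in the control device 300 can be sketched in the same spirit. The state names, the adjustment items and set-point values, and the set_parameter device interface are all hypothetical, since the patent discloses only the existence and purpose of the table.

```python
# Hypothetical table of the control device 300: cluster identification ->
# adjustment items and data for returning manufacturing conditions to normal.
ADJUSTMENT_TABLE = {
    "defect-prone": [("processing temperature", 1520.0)],
    "dangerous":    [("processing temperature", 1500.0), ("processing speed", 0.8)],
}

def restore_normal(cluster_id, manufacturing_device):
    for item, value in ADJUSTMENT_TABLE.get(cluster_id, []):
        manufacturing_device.set_parameter(item, value)  # assumed interface
```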

[0091] A program for realizing the functions of the clustering system of FIG. 1 may be recorded on a computer-readable recording medium, and the clustering of the classification target data may be performed by loading the program recorded on this recording medium into a computer system and executing it. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer system" also includes a WWW system provided with a homepage providing environment (or display environment). The "computer-readable recording medium" refers to a portable medium such as a flexible disk, magneto-optical disk, ROM, or CD-ROM, or a storage device such as a hard disk built into the computer system. The "computer-readable recording medium" further includes a medium that holds the program for a certain period of time, such as the volatile memory (RAM) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

[0092] The program may also be transmitted from a computer system that stores the program in a storage device or the like to another computer system via a transmission medium, or by transmission waves in a transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may also realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

Industrial applicability

The present invention can be applied to fields in which information having many types of feature quantities is classified and discriminated with high accuracy, such as the detection of defects in glass articles, and can further be used in manufacturing state detection devices and product manufacturing management devices. The entire contents of the specification, claims, drawings, and abstract of Japanese Patent Application No. 2006-186628, filed on July 6, 2006, are cited here and incorporated as the disclosure of the specification of the present invention.

Claims

[1] A clustering system that classifies input data into each of clusters formed by populations of learning data according to feature quantities of the input data, the clustering system comprising:
a feature quantity set storage unit that stores, in correspondence with each of the clusters, a feature quantity set that is a combination of feature quantities used for classification;
a feature quantity extraction unit that extracts preset feature quantities from the input data;
a distance calculation unit that, for each feature quantity set corresponding to each cluster, calculates and outputs, as a set distance, the distance between the center of the population of that cluster and the input data, based on the feature quantities included in the feature quantity set; and
a rank extraction unit that arranges the set distances in ascending order.
[2] The clustering system according to claim 1, wherein a plurality of the feature quantity sets are set for each cluster.
[3] The clustering system according to claim 2, further comprising a cluster classification unit that detects which cluster the input data belongs to, based on a rule pattern indicating classification criteria for classifying the input data into each cluster, the rule pattern being set based on the ranks of the set distances obtained for each feature quantity set.
[4] The clustering system according to claim 3, wherein the cluster classification unit detects which cluster the input data belongs to from the ranks of the set distances, and detects the cluster for which many set distances are highly ranked as the cluster to which the input data belongs.
[5] The clustering system according to claim 4, wherein the cluster classification unit has a threshold for the number of highly ranked set distances, and detects a cluster as the cluster to which the input data belongs if the number of its highly ranked set distances is equal to or greater than the threshold.
[6] The clustering system according to any one of claims 1 to 5, wherein the distance calculation unit multiplies each set distance by a correction coefficient set in correspondence with the feature quantity set, thereby standardizing the set distances among the feature quantity sets.
[7] The clustering system according to any one of claims 1 to 6, further comprising a feature quantity set creation unit that creates the feature quantity set for each cluster, wherein, for each of a plurality of combinations of feature quantities, the feature quantity set creation unit takes the average value of the learning data of the population of each cluster as an origin, obtains the average value of the distances between this origin and each piece of learning data of the populations of the other clusters, and selects the combination of feature quantities having the largest average value as the feature quantity set used for discriminating that cluster from the other clusters.
[8] A defect type determination device provided with the clustering system according to any one of claims 1 to 7, wherein the input data is image data of a defect of a product, and defects in the image data are classified by defect type according to feature quantities representing the defects.
[9] The defect type determination device according to claim 8, wherein the product is a glass article and the defects of the glass article are classified by defect type.
[10] A defect detection device that detects the type of a defect of a product, provided with the defect type determination device according to claim 8 or 9.
[11] A manufacturing state determination device provided with the defect type determination device according to claim 8 or 9, which determines the type of a defect of a product and detects the cause of defect occurrence in the manufacturing process based on the correspondence between the type and the cause of occurrence corresponding to that type.
[12] A manufacturing state determination device provided with the clustering system according to any one of claims 1 to 7, wherein the input data is feature quantities representing manufacturing conditions in the manufacturing process of a product, and the feature quantities are classified by the manufacturing state of each step of the manufacturing process.
[13] The manufacturing state determination device according to claim 12, wherein the product is a glass article, and the feature quantities in the manufacturing process of the glass article are classified by the manufacturing state of each step of the manufacturing process.
[14] A manufacturing state detection device provided with the manufacturing state determination device according to claim 12 or 13, which detects the type of manufacturing state in each step of the manufacturing process of a product.
[15] A product manufacturing management device provided with the manufacturing state determination device according to claim 12 or 13, which detects the type of manufacturing state in each step of the manufacturing process of a product and performs process control in the steps of the manufacturing process based on control items corresponding to the type.
